Twitter Spam Detection using Natural Language Processing


Twitter Spam Detection using NLP

Abstract:

With the growing desire for social life in today's world, Twitter plays an essential role in allowing people to connect socially, whether by tweeting to others or by exploring topics across many fields. These days, however, the platform has been infested by spammers who, to drive traffic to their spam websites, attach their URLs to informative tweets even though the content behind the URL has no relationship to the tweet message. The result is what is known as a spam tweet. This study proposes a new method that combines an encoder-decoder model with a vectorizer to determine whether a tweet posted by a user is spam.

Introduction: 

  • Twitter is currently one of the most popular social media platforms for keeping users connected. Millions of people include linked URLs in their tweets, and the topics they discuss are usually related to the content of the linked page. Spammers, however, have taken notice of this practice: they frequently attach the URL of a product advertisement or other unrelated content to informative, topic-related tweets. These URLs may lead to drug sales, malware downloads, phishing, and other malicious content [1], [2], all of which degrade the user experience and, in some cases, can bring the network down entirely.
  • It is therefore necessary to create a spam-free environment on Twitter by detecting spam tweets and filtering them out from legitimate ones. Many approaches have been tried for detecting spam in other domains (email, SMS, etc.), but here the authors propose a new framework that classifies a tweet as spam or not spam using a few machine learning techniques.
  • Several organizations have implemented ways to keep Twitter spam-free. Trend Micro, for example, uses a blacklisting service, part of its Web Reputation Technology system, to filter spam URLs for users who have its services installed [3]. Twitter's own blacklist filtering is aided by Bot Maker [4], a spam recognition system. However, because of the delay before new spam URLs are added to a blacklist, blacklists have been unable to protect victims against inbound spam [5].
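To make the blacklisting idea concrete, here is a minimal Python sketch of domain-based URL blacklist filtering. It is not Trend Micro's or Twitter's implementation; the `BLACKLIST` contents, the `is_blacklisted` and `filter_tweets` helpers, and the example URLs are all hypothetical. The example also illustrates the delay problem noted above: a spam URL on a domain that has not yet been listed passes straight through.

```python
from __future__ import annotations

from urllib.parse import urlparse

# Hypothetical blacklist of known spam domains (illustrative values only).
BLACKLIST = {"spam-pills.example", "free-prizes.example"}


def is_blacklisted(url: str, blacklist: set[str]) -> bool:
    """Return True if the URL's host, or any parent domain of it, is blacklisted."""
    host = (urlparse(url).hostname or "").lower()
    parts = host.split(".")
    # Check "a.b.c", then "b.c", then "c" so subdomains of a listed domain also match.
    return any(".".join(parts[i:]) in blacklist for i in range(len(parts)))


def filter_tweets(tweets: list[tuple[str, str]], blacklist: set[str]) -> list[tuple[str, str]]:
    """Keep only (tweet text, URL) pairs whose URL is not blacklisted."""
    return [(text, url) for text, url in tweets if not is_blacklisted(url, blacklist)]


if __name__ == "__main__":
    tweets = [
        ("Great article on climate data", "https://news.example/climate"),
        ("Breaking: election results", "http://www.spam-pills.example/buy"),
        # A newly registered spam domain that is not on the list yet, so it slips through.
        ("Top 10 travel tips", "http://new-spam.example/offer"),
    ]
    for text, url in filter_tweets(tweets, BLACKLIST):
        print(text, "->", url)
```

A lookup like this is cheap, but it can only block what has already been reported, which is why the approaches below analyze tweet and URL content instead.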

Existing model of the system:

  • Wang et al. introduced a K-L divergence approach in [6], which was used to characterize the distribution pattern of spam messages, and the Multi-Scale Drift Detection Test (MDDT) was used to detect concept drift in spam messages.
  • Madisetty et al. [10] proposed a deep convolutional neural network (CNN) approach to spam detection, in which the CNNs use five different word embedding approaches to learn from Twitter data. For spam identification, these CNNs were then combined with a feature-based model.
  • The authors of [11] used a real-time, URL-based approach to spam filtering.
  • In [12], URLs are scanned and evaluated with several APIs to determine whether or not they are malicious.
  • [13] employed the Naive Bayes algorithm to analyze both general data and Twitter-specific data. That work also combined a spam word list with a commercial URL-based security solution, using adaptive data categorization to detect spam.
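The K-L divergence mentioned for Wang et al. [6] compares two probability distributions: D_KL(P || Q) = sum over words w of P(w) * log(P(w) / Q(w)), which is zero when the distributions match and grows as they diverge. The Python sketch below applies it to word distributions from an older and a newer window of spam messages to flag a possible drift. It only illustrates the measure; it is not the method of [6] and does not implement MDDT. The add-one smoothing, the 0.1 threshold, the helper names, and the example messages are all assumptions.

```python
import math
from collections import Counter


def word_distribution(messages, vocab, alpha=1.0):
    """Word-probability distribution over `vocab` with add-alpha smoothing (assumed)."""
    counts = Counter(w for m in messages for w in m.lower().split())
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(p, q):
    """D_KL(P || Q) in nats; smoothing guarantees q[w] > 0 for every word."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def detect_drift(old_window, new_window, threshold=0.1):
    """Hypothetical helper: return (divergence, drifted?) for two lists of messages.

    The threshold of 0.1 is an arbitrary illustrative value, not taken from [6].
    """
    vocab = {w for m in old_window + new_window for w in m.lower().split()}
    p = word_distribution(new_window, vocab)  # P: recent spam
    q = word_distribution(old_window, vocab)  # Q: older spam
    d = kl_divergence(p, q)
    return d, d > threshold


if __name__ == "__main__":
    old = ["cheap pills online", "buy cheap pills now"]
    new = ["win a free iphone", "claim your free prize now"]
    divergence, drifted = detect_drift(old, new)
    print(f"KL divergence = {divergence:.3f}, drift detected: {drifted}")
```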
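For the Naive Bayes approach attributed to [13], a from-scratch multinomial Naive Bayes text classifier in plain Python might look like the sketch below. The class name `NaiveBayesSpamClassifier`, the whitespace tokenization, the add-one smoothing, and the toy training tweets are assumptions for illustration; the general and Twitter-specific features used in [13] are not reproduced.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesSpamClassifier:
    """Hypothetical multinomial Naive Bayes over word counts with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {lbl: sum(c.values()) for lbl, c in self.word_counts.items()}
        return self

    def _log_posterior(self, text, label):
        # log P(label) + sum of log P(word | label) over the words in the text
        score = math.log(self.class_counts[label] / sum(self.class_counts.values()))
        denom = self.total_words[label] + len(self.vocab)
        for w in text.lower().split():
            if w in self.vocab:  # words never seen in training are ignored
                score += math.log((self.word_counts[label][w] + 1) / denom)
        return score

    def predict(self, text):
        """Return the label with the highest (unnormalized) log posterior."""
        return max(self.class_counts, key=lambda lbl: self._log_posterior(text, lbl))


if __name__ == "__main__":
    texts = [
        "win a free prize now",
        "cheap pills buy now",
        "meeting notes for the project",
        "lunch with the team today",
    ]
    labels = ["spam", "spam", "ham", "ham"]
    clf = NaiveBayesSpamClassifier().fit(texts, labels)
    for tweet in ["free pills now", "the team meeting today"]:
        print(tweet, "->", clf.predict(tweet))
```

Working in log space avoids the numeric underflow that multiplying many small word probabilities would cause.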

Proposed model of the system:


  • Model A: Encoder-Decoder. The Encoder-Decoder model is a technique for solving sequence-to-sequence prediction problems using recurrent neural networks. It has worked well for problems such as text summarization and question answering. In this project, the Encoder-Decoder model is used to summarize the vectorized data: larger vector data are fed into the encoder as input, and a smaller (summarized) version of that vectorized data is obtained from the decoder as output. The layers of the Encoder-Decoder model are shown in Figure 1.
  • Model and algorithm 
  • Step 1: Collect tweets extracted through the Twitter APIs (T1, T2, T3, …, Tn).
  • Step 2: Label the URLs with their corresponding tweets and extract the URL data (U1, U2, U3, …, Un).
  • Step 3: Map a word embedding vector to each tweet (T1 → V1, T2 → V2, …, Tn → Vn).
  • Step 4: Pass the large URL data through the encoder model layers, removing stop words and converting the larger vectors into smaller vectors (U1 → E1, U2 → E2, …, Un → En).
  • Step 5: Calculate the similarity score between the tweet vectors V1, V2, …, Vn and the corresponding encoded URL vectors E1, E2, …, En.
  • Step 6: Determine the score for each posted tweet with its respective URL and classify it:
      for i = 1 to K (for each tweet):
          if the similarity score of tweet i > 75%: put tweet i in the not-spam category
          else: put tweet i in the spam category
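Steps 3 to 6 can be sketched in Python as follows. This is a simplified stand-in, not the project's implementation: the trained word embeddings and Encoder-Decoder model are replaced by stop-word removal plus term-frequency vectors, similarity is measured with cosine similarity, and the small `STOP_WORDS` set, the helper names, and the sample tweet and URL texts are hypothetical. Only the 75% threshold comes from Step 6.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "on", "for",
              "is", "are", "with", "this", "that"}
THRESHOLD = 0.75  # Step 6: similarity above 75% means "not spam"


def vectorize(text):
    """Lower-case, tokenize, drop stop words, and return term-frequency counts.

    Stand-in for Steps 3-4 (embedding the tweet / encoding the URL content).
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)


def cosine_similarity(v, e):
    """Step 5: cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * e[term] for term, count in v.items())  # Counter returns 0 if missing
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    norm_e = math.sqrt(sum(c * c for c in e.values()))
    if norm_v == 0 or norm_e == 0:
        return 0.0
    return dot / (norm_v * norm_e)


def classify(tweets_with_urls):
    """Step 6: label each (tweet text, URL page text) pair as 'spam' or 'not spam'."""
    results = []
    for tweet_text, url_text in tweets_with_urls:
        v = vectorize(tweet_text)  # V_i
        e = vectorize(url_text)    # E_i
        score = cosine_similarity(v, e)
        label = "not spam" if score > THRESHOLD else "spam"
        results.append((score, label))
    return results


if __name__ == "__main__":
    pairs = [
        ("New study on climate change and rising sea levels",
         "The new study on climate change shows rising sea levels"),
        ("New study on climate change and rising sea levels",
         "Buy cheap pills online with free shipping"),
    ]
    for (tweet, _), (score, label) in zip(pairs, classify(pairs)):
        print(f"{score:.2f}  {label:8}  {tweet}")
```

Swapping `vectorize` for real embedding and encoder outputs would leave the similarity and threshold logic unchanged, which keeps the decision rule of Step 6 independent of how the vectors are produced.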

 

System requirements:


HARDWARE:

  • Processor: Intel i3
  • Hard disk: 500 GB
  • RAM: 2 GB
  • Operating system: Windows 7 or above

SOFTWARE:

  • Technology: NumPy, Matplotlib, Pandas, Seaborn, scikit-learn
  • IDE: Anaconda Navigator
  • Tool: Jupyter Notebook
