Building a Model for Dynamic Pricing Mechanism using Sentimental Analysis
Objective: Model for Dynamic Pricing Mechanism using Sentimental Analysis – The main aim of this project is to collect customer reviews from various sources and classify them as positive sentiment or negative sentiment.
In today’s world, a lot of sentiments are being flooded across social media and other websites. They come from a variety of sectors like hospitality, Advertisement, and Retail sectors. The sentiments are being collected from various sources and they are being analyzed in order to improve the revenue of the entire company. The company keeps track of the changes in sentiment towards a particular product and they try to meet the customer’s demand. The prediction result shows that there is going to be an increase in the revenue of the entire company using the dynamic pricing model when compared to the previous methods.
The exponential growth of social media such as Twitter and community forums has revolutionized communication and content publishing but is also increasingly exploited for the propagation of hate speech and the organization of hate-based activities [1, 3]. The anonymity and mobility afforded by such media have made the breeding and spread of hate speech – eventually leading to hate crime – effortless in a virtual landscape beyond the realms of traditional law enforcement. The term ‘hate speech’ was formally defined as ‘any communication that disparages a person or a group on the basis of some characteristics (to be referred to as types of hate or hate classes) such as race, color, ethnicity, gender, sexual orientation, nationality, religion, or other characteristics. In the UK, there has been a significant increase in hate speech towards the migrant and Muslim communities following recent events including leaving the EU, the Manchester, and the London attacks. In the EU, surveys, and reports focusing on young people in the EEA (European Economic Area) region show rising hate speech. And related crimes based on religious beliefs, ethnicity, sexual orientation, or gender, as 80% of respondents have encountered hate speech online and 40% felt attacked or threatened. Statistics also show that in the US, hate speech and crime is on the rise since the Trump election. The urgency of this matter has been increasingly recognized, as a range of international initiatives has been launched towards the qualification of the problems and the development of counter-measures.
The data given is in the form of a comma-separated values file with tweets and their corresponding sentiments. The training dataset is a CSV file of type tweet_id, sentiment, tweet where the tweet_id is a unique integer identifying the tweet, sentiment is either 1 (positive) or 0 (negative), and tweet is the tweet enclosed in “”. Similarly, the test dataset is a CSV file of type tweet_id, tweet.
Existing methods primarily cast the problem as a supervised document classification task . These can be divided into two categories: one relies on manual feature engineering that are then consumed by algorithms such as SVM, Naive Bayes, and Logistic Regression [3, 9, 11, 15, 19, 23, 35–39] (classic methods); the other represents the more recent deep learning paradigm that employs neural networks to automatically learn multi-layers of abstract features from raw data [13, 26, 30, 34] (deep learning methods).
- Existing studies on hate speech detection have primarily reported their results using micro-average Precision, Recall, and F1 [1, 13, 30, 36, 37, 40].
- The problem with this is that in an unbalanced dataset where instances of one class (to be called the ‘dominant class’) significantly outnumber others (to be called ‘minority classes’), micro-averaging can mask the real performance of minority classes.
In the proposed system, the customer datasets are collected from various sources and the essential features are extracted using the feature extraction phase. Then, the tweets are given to the classifier module in order to classify the sentiments of the customers. A dynamic pricing mechanism system has been designed in order to meet the demands of the customers. The current system contains the following modules: Dataset module, Pre – Processing module, Feature extraction module, classifier module, and the evaluation module.
- Also, as we shall show in the following, this problem may not be easily mitigated by conventional methods of over-or under-sampling.
- Because the real challenge is the lack of unique, discriminative linguistic characteristics in negative Tweets compared to positive ones.
- As a proxy to quantify and compare the linguistic characteristics of positive and negative Tweets, we propose to study the ‘uniqueness’ of the vocabulary for each class.
In this report, we proposed a solution to the detection of positive tweets and negative tweets on Twitter through machine learning using Bag of Words and TF IDF values. We performed a comparative analysis of Logistic Regression, Naive Bayes, Decision Tree, Random Forest, and Gradient Boosting on various sets of feature values and model parameters. The results showed that Logistic Regression performs comparatively better with the TF IDF approach. We presented the current problems for this task and our system that achieves reasonable accuracy (89%) as well as recall (84%). Given all the challenges that remain, there is a need for more research on this problem, including both technical and practical matters.
Hard Disk: 120 GB or above
Ram:4 GB(min) or above
Operating system: Windows7. Or above
Coding Language: Python
Tool: Anaconda Navigator