An Efficient Approach in Selection of Information-Gaining Features Using Sentiment Analysis
Keywords:
KNN, IDF, CART, IG, TF-IDF, social
Abstract
Sentiment analysis, also known as opinion mining, has become increasingly important as the number of online review and social networking sites continues to grow rapidly. People's opinions on items, services, programs, and even politics, whether positive or negative, are heavily influenced by feedback from others who have used the same item. The massive volume of data generated in this way is commonly referred to as big data, and its use has expanded into every facet of the global economy. Feature selection is the process of choosing a subset of input variables that best separates the input data while reducing the impact of noisy or unsuitable variables, which leads to more efficient and more accurate prediction. Term Frequency-Inverse Document Frequency (TF-IDF) is used as the feature extraction method. In the TF-IDF representation, the inverse document frequency (IDF) normalizes the term frequency of each word, reducing the weight given to a word's raw occurrence count. The focus here is on opinion mining using Information Gain (IG) based feature selection. The IG of a feature is derived from its individual contribution to lowering entropy. Discarding superfluous features from the initial feature set reduces the processing time required by the learning algorithms while also improving classification accuracy. The proposed method is evaluated using the Naive Bayes, Classification and Regression Tree (CART), and K-Nearest Neighbor (KNN) classifiers. The evaluation indicates that the proposed method yields favorable results.
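The two steps the abstract describes, TF-IDF weighting and IG-based feature ranking, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes pre-tokenized documents, an unsmoothed natural-log IDF, and IG computed on the binary presence or absence of a term. The function names (`tf_idf`, `entropy`, `information_gain`, `select_top_k`) and the four toy reviews are hypothetical, and the classifier stage (Naive Bayes, CART, KNN) is omitted.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumption: tf = count / document length, idf = ln(N / df), no smoothing.
    """
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in counts.items()}
        )
    return weights


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(docs, labels, term):
    """Entropy reduction from splitting documents on presence of `term`."""
    present = [y for doc, y in zip(docs, labels) if term in doc]
    absent = [y for doc, y in zip(docs, labels) if term not in doc]
    n = len(labels)
    # Weighted entropy of the two partitions; empty partitions contribute nothing.
    conditional = sum(len(p) / n * entropy(p) for p in (present, absent) if p)
    return entropy(labels) - conditional


def select_top_k(docs, labels, k):
    """Rank the vocabulary by IG and keep the k highest-scoring terms."""
    vocab = sorted({t for doc in docs for t in doc})
    ranked = sorted(
        vocab, key=lambda t: information_gain(docs, labels, t), reverse=True
    )
    return ranked[:k]


# Hypothetical toy corpus: four tokenized reviews with sentiment labels.
docs = [
    ["great", "phone", "love", "it"],
    ["great", "battery", "love", "it"],
    ["terrible", "phone", "hate", "it"],
    ["terrible", "battery", "hate", "it"],
]
labels = ["pos", "pos", "neg", "neg"]

print(select_top_k(docs, labels, 4))
print(tf_idf(docs)[0])
```

On this toy corpus the sentiment-bearing terms ("great", "love", "terrible", "hate") each reach an IG of 1 bit, while terms spread evenly across both classes ("it", "phone", "battery") score 0 and would be discarded. The term "it" also receives a TF-IDF weight of 0 because it appears in every document.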