Information Technology & Mathematical Foundations of Informatics
N. H. Quan, L. D. Son, N. Q. Uy, “An empirical study of Vietnamese sentiment analysis.” 104
AN EMPIRICAL STUDY OF VIETNAMESE SENTIMENT ANALYSIS
Nguyen Hoang Quan*, Le Dinh Son, Nguyen Quang Uy
Abstract: Sentiment analysis is the area of research that studies people’s opinions,
sentiments, evaluations, attitudes and emotions expressed in written text. It has become
one of the most active research fields in Natural Language Processing. In recent
years, sentiment analysis for Vietnamese text has received considerable attention. At
the fourth International Workshop on Vietnamese Language and Speech Processing,
researchers competed in solving this problem on
a benchmark dataset provided by the workshop committee. However, each team
addressed the problem with different methods. In this paper, we present a comparison of
a large number of machine learning algorithms for tackling this problem. The results
of the experiments give insight into the ability of various machine learning
algorithms when used for Vietnamese sentiment analysis.
Keywords: Sentiment analysis, Opinion mining, Machine learning.
1. INTRODUCTION
Sentiment analysis is the task of determining users’ opinions about products,
movies, events, policies, etc. Within this topic, sentiment classification is one of the
most important tasks; it aims to classify the opinion expressed in a sentence or document into
one of several categories such as positive, negative and neutral. Predicting users’
sentiment is extremely important because users’ opinions are becoming more and
more valuable. Public interest is the main factor that affects the profit of products
such as movies and books. Consequently, this problem is of interest to both
researchers and companies.
For a comprehensive survey of sentiment analysis and opinion mining, readers
are referred to [1]. The major tasks in sentiment analysis include:
• Subjectivity classification: aims to separate subjective documents from objective
documents.
• Polarity sentiment classification: aims to classify a subjective document
into one of three classes: “positive”, “negative” and “neutral”.
• Rating: aims to rate documents containing personal opinions from 1 star to 5
stars (very negative to very positive).
For sentiment analysis of the Vietnamese language, the VLSP 2016 (the fourth
International Workshop on Vietnamese Language and Speech Processing)
evaluation campaign is the first effort to provide benchmark data and to
perform a systematic comparison between Vietnamese sentiment analysis systems.
The scope of the VLSP 2016 campaign is polarity classification, in which
participating systems need to classify Vietnamese reviews/documents into one of
three categories: “positive”, “negative”, or “neutral”.
The campaign attracted eight participating teams, and the five best results were
published in the proceedings of the workshop [2, 3, 4, 5, 6]. Overall, various
machine learning algorithms were applied by researchers to tackle this problem in the VLSP
2016 competition. However, each team used only some of the popular algorithms,
with different parameter settings and feature extraction methods. Therefore, it is
difficult to assess whether one method is superior to the others when used for Vietnamese
sentiment analysis. The objective of this paper is to systematically
compare a large number of machine learning techniques on the
Vietnamese sentiment analysis problem. We also carefully tuned the parameter settings
of the tested algorithms, and only the best result of each method is reported.
The results of this paper give better insight into the ability of
different machine learning techniques in solving the Vietnamese sentiment analysis
problem.
Science and technology research
Tạp chí Nghiên cứu KH&CN quân sự (Journal of Military Science and Technology Research), No. 48, 04-2017
The remainder of this paper is organized as follows. Section 2 describes
our system in detail. Section 3 describes the experimental setup. The results of the
experiments are presented and discussed in Section 4. Section 5 concludes the
paper and points to avenues for future work.
2. SYSTEM DESCRIPTION
Figure 1 illustrates the processing steps of our system for sentiment classification.
After preprocessing the data by removing low-frequency words, in the feature extraction
step we extract a sentence or document feature vector using TF and TF-IDF
features. These feature vectors are then input to a classifier, such as a Support Vector
Machine or a Multilayer Neural Network, to determine the sentiment label of the
sentence or document.
Figure 1. Sentiment classification system (Document → Preprocess → Feature Extraction → Feature Vector → Classification → Sentiment Label).
A. Features
In this paper, two methods are used to extract features from each sentence or
document:
a) TF (Term Frequency) [7]: term frequency is often used to represent the
relationship between a word and a document. The simplest choice is to use
the raw frequency of a term in a document, i.e. the number of times that term t
occurs in document d. If we denote the raw frequency of t by f_{t,d}, then the simple TF
scheme is tf(t, d) = f_{t,d}.
b) TF-IDF (Term Frequency * Inverse Document Frequency) [8]: TF-IDF is
usually used in information retrieval to determine which words are important.
It combines local information (the TF score of a word within a document) with
global information (the IDF score of the word across the whole collection). In our
experiments, we use the TF-IDF score of unigrams (single words) to extract the
features from a sentence or a document.
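As a rough illustration of the two feature types, the sketch below computes raw TF and a TF-IDF weight for unigrams in plain Python. The toy reviews, the helper names (`term_frequency`, `tf_idf`) and the particular IDF variant, idf(t) = ln(N / df(t)), are assumptions made for illustration; the paper does not state which IDF formula or tokenizer was used.

```python
import math
from collections import Counter


def term_frequency(tokens):
    """Raw term frequency: tf(t, d) = number of times t occurs in d."""
    return Counter(tokens)


def tf_idf(docs):
    """Return one {term: tf * idf} dict per document.

    Assumes idf(t) = ln(N / df(t)), where N is the number of documents and
    df(t) is the number of documents containing t. Other variants exist.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document
    weighted = []
    for tokens in docs:
        tf = term_frequency(tokens)
        weighted.append(
            {t: count * math.log(n_docs / doc_freq[t]) for t, count in tf.items()}
        )
    return weighted


# Hypothetical, whitespace-tokenized toy reviews (not from the VLSP data).
docs = [
    "battery good screen good".split(),
    "battery poor".split(),
    "screen bright".split(),
]
tf_vectors = [term_frequency(d) for d in docs]
tfidf_vectors = tf_idf(docs)
```

In this toy collection, "good" occurs twice in the first review and in no other review, so it receives the largest weight there, whereas "battery" appears in two of the three reviews and is weighted down.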
B. Algorithms
A large number of machine learning algorithms are used in this paper for
comparison. These algorithms include:
• Support Vector Machine (SVM) [9]: SVM is a classic supervised machine
learning algorithm. The goal of SVM is to determine the hyperplane that has the
largest distance to the support vectors. Owing to its effectiveness in classification,
SVM is widely used in many areas such as handwriting recognition and opinion
mining.
• Multilayer Neural Network (MLNN) [10]: In the neural network classifier, the
sentence’s features are fed into a multi-layer, fully connected network. The
last layer uses the softmax function to classify the feature vector. In particular, the MLNN was
trained with a stochastic gradient descent optimizer (learning rate 0.1), and it uses the
sigmoid function as the activation function in the hidden layer.
• K-Nearest Neighbors: In pattern recognition, the k-nearest neighbors
algorithm (k-NN) is a non-parametric method used for classification [11]. The
input consists of the k closest training examples in the feature space. The output is
a class membership. An object is classified by a majority vote of its neighbors,
with the object being assigned to the class most common among its k nearest
neighbors (k is a positive integer, typically small).
• Decision Tree [12]: A decision tree is a classification algorithm that uses a
tree-like graph for making decisions [13]. Each internal node of the tree represents a
"test" on an attribute (e.g. whether a coin flip comes up heads or tails), each branch
represents an outcome of the test, and each leaf node represents a class label
(the decision taken after computing all attributes). The paths from root to leaf
represent classification rules.
• Random forests: Random forests [14] aim to correct for
decision trees' habit of overfitting to their training set. They are an
ensemble learning method used for classification and other tasks, which operates by
constructing a multitude of decision trees at training time and outputting the class
that is the mode of the classes (classification) or the mean prediction (regression) of
the individual trees.
• Passive-Aggressive: The passive-aggressive algorithms [15] are a family of algorithms for
large-scale learning. They are similar to the Perceptron in that they do not
require a learning rate. However, contrary to the Perceptron, they include a
regularization parameter C in order to avoid overfitting.
• AdaBoost: AdaBoost, short for "Adaptive Boosting", is an ensemble method
that can be used in conjunction with many other types of learning algorithms to
improve their performance. The output of the other learning algorithms ('weak
learners') is combined into a weighted sum that represents the final output of the
boosted classifier. AdaBoost is adaptive in the sense that subsequent weak learners
are tweaked in favor of those instances misclassified by previous classifiers.
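To make one of the listed classifiers concrete, the following is a minimal sketch of the k-nearest-neighbor majority vote in plain Python. It assumes Euclidean distance and hypothetical two-dimensional feature vectors standing in for TF-IDF vectors; the function name `knn_predict` is illustrative only, and the distance measure used in the experiments is not stated in the paper.

```python
import math
from collections import Counter


def knn_predict(train_vectors, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = [
        (math.dist(vec, query), label)
        for vec, label in zip(train_vectors, train_labels)
    ]
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # most_common(1) returns the label with the highest vote count.
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical 2-D feature vectors (assumption: real inputs are TF-IDF vectors).
train_vectors = [(0.0, 0.1), (0.2, 0.0), (1.0, 0.9), (0.9, 1.0), (1.1, 1.0)]
train_labels = ["negative", "negative", "positive", "positive", "positive"]

label = knn_predict(train_vectors, train_labels, query=(0.8, 0.8), k=3)
```

With k = 3, the three training points closest to the query all carry the "positive" label, so the vote is unanimous; a query near the origin would instead be outvoted two to one in favor of "negative".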
For all these algorithms, we used their implementations in the Scikit-learn library to
conduct the experiments. Scikit-learn is a popular machine learning library written
in Python [16].
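A minimal sketch of how such an experiment could be wired together in Scikit-learn is shown below. The toy reviews and labels are hypothetical, and the choice of `TfidfVectorizer` combined with `LinearSVC` in a pipeline is an assumption made for illustration; the paper does not list the exact classes or preprocessing calls that were used.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy data; the real experiments used the VLSP 2016 reviews.
train_texts = [
    "battery lasts long and the screen is great",
    "great camera and great sound",
    "battery dies fast and the screen is bad",
    "bad camera and bad sound",
]
train_labels = ["positive", "positive", "negative", "negative"]

# Unigram TF-IDF features followed by a linear SVM (assumed configuration).
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 1)), LinearSVC(C=1.0))
model.fit(train_texts, train_labels)

# Predict labels for two unseen toy reviews.
predictions = model.predict(["great screen", "bad battery"])
```

Swapping `LinearSVC` for another estimator, for example `sklearn.svm.SVC(kernel="rbf")` or `sklearn.neighbors.KNeighborsClassifier`, leaves the rest of the pipeline unchanged, which is what makes a like-for-like comparison of classifiers on the same features straightforward.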
3. EXPERIMENTAL SETTINGS
A. Dataset
To train and test our systems, we used the data provided in the VLSP evaluation
campaign for the sentiment analysis task. It contains user reviews of technological
devices in three categories: “negative”, “positive” and “neutral”. We divided
the dataset into two parts: one for training and another for testing. The numbers of
positive, negative and neutral samples and the total number of samples in the training
and test sets are shown in Table 1.
Table 1. Dataset for training and testing algorithms.
         Positive   Neutral   Negative   Total
Train    1400       1400      1400       4200
Test     300        300       300        900
B. Evaluation
The performance of the sentiment classification systems is evaluated using
three popular metrics: precision, recall, and the F1 score. Let A be
the set of reviews that the system predicted as positive and B the set of reviews with the
positive label. The precision, recall, and F1 score of the positive label are then
computed as follows (and similarly for the negative and neutral labels):
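$$\text{precision} = \frac{|A \cap B|}{|A|}$$
$$\text{recall} = \frac{|A \cap B|}{|B|}$$
$$F_1 = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}}$$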
After calculating precision, recall and F1 for each label, the final values of
precision, recall and F1 of a method are obtained by averaging the values over the
three labels.
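The evaluation procedure described above can be sketched in plain Python as follows. The gold labels and predictions are hypothetical, the function names are illustrative, and returning 0.0 when a denominator is zero is an assumption, since the paper does not say how that case is handled.

```python
def per_label_scores(gold, predicted, label):
    """Precision, recall and F1 for one label, using the set definitions above."""
    a = {i for i, p in enumerate(predicted) if p == label}  # predicted as label
    b = {i for i, g in enumerate(gold) if g == label}       # truly carry label
    overlap = len(a & b)
    # Assumption: a metric with a zero denominator is reported as 0.0.
    precision = overlap / len(a) if a else 0.0
    recall = overlap / len(b) if b else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def macro_average(gold, predicted, labels):
    """Average each metric over the labels, as described in the text."""
    scores = [per_label_scores(gold, predicted, lab) for lab in labels]
    n = len(labels)
    return tuple(sum(s[i] for s in scores) / n for i in range(3))


# Hypothetical gold labels and predictions for six reviews.
gold = ["positive", "positive", "negative", "negative", "neutral", "neutral"]
pred = ["positive", "negative", "negative", "negative", "neutral", "positive"]

precision, recall, f1 = macro_average(gold, pred, ["positive", "negative", "neutral"])
```

Note that the averaged F1 is the mean of the three per-label F1 values, not the harmonic mean of the averaged precision and recall, so the three final numbers need not satisfy the F1 formula among themselves.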
4. RESULTS AND DISCUSSION
For each method, we used the two features (TF and TF-IDF) presented
in Section 2. We also tested each algorithm with various values of its
parameters. After a number of experiments, we found that the results with the TF
feature are often worse than the results with the TF-IDF feature. The reason could
be that the TF feature ignores the information relating to the length of the document,
and this information may be useful for classifying the opinions contained in the
document. Therefore, the results of the TF feature were discarded and we only present
and discuss the results of the TF-IDF feature in this section.
The best result of each algorithm, together with the parameters that achieved it,
is shown in Table 2. Among the three performance metrics, F1 is the most important
measure. Therefore, in this table, we focus on comparing the algorithms based
on F1; precision and recall are given only for reference. In Table 2, the best
algorithm (the highest value of F1) is printed in bold and the worst algorithm
(the lowest value of F1) is printed in bold italic.
Table 2. The best results of each algorithm with its parameters.
Classifier                                                                Precision   Recall   F1-score
Nearest Neighbor (K=4)                                                    0.54        0.52     0.50
SVM (radial basis function kernel, gamma=2, C=1)                          0.66        0.64     0.65
SVM (LinearSVC)                                                           0.64        0.64     0.63
SVM (Stochastic Gradient Descent)                                         0.57        0.56     0.56
Decision Tree (max_depth=5)                                               0.54        0.45     0.42
Random Forest (max_depth=5, n_estimators=10, max_features=1)              0.45        0.44     0.41
Neural Network (Multi-layer Perceptron, alpha=1, hidden_layer_sizes=100)  0.58        0.57     0.56
Passive Aggressive (C=1.0, n_iter=5)                                      0.63        0.63     0.62
Adaptive Boosting (n_estimators=50)                                       0.57        0.56     0.56
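For readers who want to reproduce the ranking discussed in this section, the short sketch below sorts the classifiers by the F1 values reported in Table 2. The F1 numbers are copied from the table; the shortened classifier names and variable names are labels chosen here for illustration.

```python
# F1 scores copied from Table 2; the short names abbreviate the table rows.
f1_scores = {
    "Nearest Neighbor": 0.50,
    "SVM (RBF kernel)": 0.65,
    "SVM (LinearSVC)": 0.63,
    "SVM (SGD)": 0.56,
    "Decision Tree": 0.42,
    "Random Forest": 0.41,
    "Neural Network": 0.56,
    "Passive Aggressive": 0.62,
    "AdaBoost": 0.56,
}

# Sort from highest to lowest F1; tied entries keep their table order.
ranking = sorted(f1_scores.items(), key=lambda item: item[1], reverse=True)
best_name, best_f1 = ranking[0]
worst_name, worst_f1 = ranking[-1]

for rank, (name, score) in enumerate(ranking, start=1):
    print(f"{rank}. {name}: {score:.2f}")
```

Three classifiers share an F1 of 0.56, so their relative order in the printed list reflects only the row order of the table, not a difference in performance.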
It can be seen from this table that the best result is achieved by the support vector
machine with the radial basis function (RBF) kernel, while the worst result is obtained by
random forest. Overall, the results of the support vector machine variants are
rather solid: the RBF-kernel SVM and LinearSVC rank first and second, and the SVM
trained with stochastic gradient descent (F1 = 0.56) ties with the neural network
and AdaBoost in the middle of the ranking.
Figure 2. Comparison between nine algorithms based on F1-score.
The table also shows that tree-based algorithms did not perform well on this
problem. The two tree-based algorithms, Decision Tree and Random Forest, are
the worst methods among the nine tested methods. Among the four remaining algorithms,
the passive-aggressive algorithm also performs well: it is ranked
third among the nine algorithms, while the results of the neural network and AdaBoost are
equal and lie in the middle of the tested algorithms. Finally, the result of Nearest
Neighbor is not convincing either: this method is better only than the two tree-based
algorithms (Decision Tree and Random Forest). Figure 2 presents in detail the
comparison between the nine tested algorithms based on their F1 values.
5. CONCLUSION
In this paper, we have examined the performance of various machine learning
algorithms in solving the Vietnamese sentiment analysis problem. Nine popular
algorithms were selected and tested. A recently released dataset for Vietnamese
sentiment analysis (the VLSP 2016 dataset) was used in the experiments. The results
of the experiments showed that the methods based on support vector machines
achieved the best performance, while the tree-based methods (Decision Tree and
Random Forest) did not perform well. The results also showed that using the TF-IDF
feature is better than using the TF feature. These results provide insight into the
ability of various machine learning techniques in solving this problem.
There are a number of research areas for future work that arise from this
paper. First, we would like to investigate better techniques for feature
extraction. In this paper, we have shown that the TF-IDF feature is better than the TF
feature. In the future, we will study word2vec methods to extract features for
Vietnamese sentiment analysis. Second, recent research has shown that applying
deep learning to natural language processing has brought significant improvements.
Therefore, it will be interesting to examine whether deep learning can be useful for
Vietnamese sentiment analysis. Last but not least, we are planning to collect
a larger Vietnamese dataset and conduct our research on it.
REFERENCES
[1]. K. Ravi and V. Ravi, “A survey on opinion mining and sentiment analysis: Tasks,
approaches and applications,” Knowledge-Based Systems, vol. 89, pp. 14, 2015.
[2]. Le Anh Cuong, Ng. T. Minh Huyen, Ng. Viet Hung, “VLSP 2016 Shared
Task: Vietnamese Analysis”, VLSP 2016, Ha Noi, 2016.
[3]. Vi Ngo Van, Minh Hoang Van, Tam Nguyen Thanh: “Sentiment Analysis for
Vietnamese using Support Vector Machines with application to Facebook”,
VLSP 2016, Ha Noi, 2016.
[4]. Hy Nguyen, Tung Le, Viet-Thang Luong, Dinh Dien: “A Simple Supervised
Learning Approach to Sentiment Classification at VLSP 2016”, VLSP 2016,
Ha Noi, 2016.
[5]. Minh Nhat Quang Pham, Tran The Trung: “A Lightweight Ensemble Method
for Sentiment Classification Task”, VLSP 2016, Ha Noi, 2016.
[6]. Quynh-Trang Thi Pham, Xuan-Truong Nguyen, Van-Hien Tran, Thi-Cham
Nguyen, Mai-Vu Tran: “DSKTLAB: Vietnamese Sentiment Analysis for
Product Reviews”, VLSP 2016, Ha Noi, 2016.
[7]. Rajaraman, A.; Ullman, J. D. (2011). "Data Mining". Mining of Massive
Datasets. pp. 1–17.
[8]. Bun, Khoo Khyou; Ishizuka, M. "Emerging Topic Tracking
System". Proceedings Third International Workshop on Advanced Issues of
E-Commerce and Web-Based Information Systems. WECWIS 2001.
[9]. C. Cortes and V. Vapnik, “Support-vector networks,” Mach. Learn., vol. 20,
no. 3, pp. 273–297, Sep. 1995.
[10]. Rosenblatt, Frank. “Principles of Neurodynamics: Perceptrons and the
Theory of Brain Mechanisms”. Spartan Books, Washington DC, 1961.
[11]. Altman, N. S. (1992). "An introduction to kernel and nearest-neighbor
nonparametric regression". The American Statistician. 46 (3): 175–185.
[12]. Rokach, Lior; Maimon, O. (2008). “Data mining with decision trees: theory
and applications”. World Scientific Pub Co Inc. ISBN 978-9812771711.
[13]. Rokach, Lior; Maimon, O. (2008). “Data mining with decision trees: theory
and applications”. World Scientific Pub Co Inc. ISBN 978-9812771711.
[14]. Ho, Tin Kam (1998). "The Random Subspace Method for Constructing
Decision Forests". IEEE Transactions on Pattern Analysis and
Machine Intelligence. 20 (8): 832–844. doi:10.1109/34.709601.
[15]. Koby Crammer, Ofer Dekel, Joseph Keshet, Shai Shalev-Shwartz, Yoram
Singer, “Online Passive-Aggressive Algorithms”, School of Computer Science
and Engineering The Hebrew University Jerusalem, 91904, Israel, 2006
[16]. Luis Pedro Coelho, Willi Richert, “Building Machine Learning Systems with
Python”, ISBN 978-1-78439-277-2, 2015.
SUMMARY
AN EMPIRICAL STUDY
OF SENTIMENT ANALYSIS IN VIETNAMESE
Sentiment analysis covers the fields of research on people’s opinions, sentiments,
evaluations, attitudes and emotions based on text. It
has become one of the most active research fields in natural language
processing. In recent years, sentiment analysis for Vietnamese text
has received considerable attention. At the fourth international workshop
on Vietnamese language and speech processing, there was a competition among
researchers in solving this problem on a benchmark dataset
provided by the workshop organizers. However, each researcher addressed the
problem with different methods. In this paper, we
present a comparison of many machine learning algorithms for solving this problem. The
experimental results give deeper insight into the ability of different machine
learning algorithms when used for sentiment analysis in Vietnamese.
Keywords: Sentiment analysis, Opinion mining, Machine learning.
Received 17 February 2017
Revised 04 April 2017
Accepted 05 April 2017
Address: Military Technical Academy;
*Email: nghoangquan@gmail.com