Document Details

Document Type : Thesis 
Document Title :
HYBRID APPROACH FOR ARABIC SENTIMENT ANALYSIS USING MACHINE LEARNING AND LEXICON-BASED CLASSIFICATION
نهج مزدوج لتحليل الآراء و المشاعر في النصوص العربية باستخدام تصنيف تعلم الآلة و التصنيف القائم على المعجم
 
Subject : Faculty of Computing and Information Technology 
Document Language : Arabic 
Abstract : With the remarkable growth of social media platforms, a large volume of data is scattered across networks, and this data contains valuable information that can be of great help in many areas. Extracting useful information from it has become a challenge because of its volume, variety and velocity. Sentiment analysis (SA), also called opinion mining (OM), is a research field in text mining and one of the techniques that can help extract information from large amounts of data. SA is the task of extracting the opinion or classifying the polarity of a given sentence or text, determining whether it carries positive, negative or neutral sentiment. SA has been well studied for the English language; Arabic SA, however, is considered more challenging due to the nature of the Arabic language, its rules and the multiple dialects that exist across the Arab countries. A second challenge in SA classification is the need for labeled training data: labeling is usually carried out manually by humans and is therefore time-consuming. A third challenge is identifying the most suitable machine learning algorithm for classifying the data accurately. In this thesis, a new hybrid sentiment analysis model is proposed for Modern Standard Arabic (MSA) text, with the goal of efficiently classifying a given unlabeled Arabic text with optimal accuracy. The proposed model combines a lexicon-based approach with a machine learning approach through an ensemble learning technique, majority voting. The data labeling problem is addressed by using the lexicon-based approach to label the text. Majority voting-based ensemble learning is used to improve classification performance: multiple classifiers are used instead of a single classifier, and their results are combined. Several sets of classifiers and test datasets were considered.
The experimental results show that the proposed model, using the classifier set of Naive Bayes, Logistic Regression and Stochastic Gradient Descent, outperforms all single-classifier models in terms of accuracy, recall, precision and F-score.
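The two ideas the abstract combines, lexicon-based labeling and majority voting, can be illustrated with a minimal sketch. This is not the thesis implementation: the toy lexicon, the function names `lexicon_label` and `majority_vote`, the whitespace tokenization and the tie-breaking rule are all assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical toy lexicon (word -> polarity score); not taken from the thesis.
TOY_LEXICON = {"جيد": 1, "ممتاز": 1, "سيئ": -1, "رديء": -1}


def lexicon_label(text, lexicon=TOY_LEXICON):
    """Label a text by summing the polarity scores of its whitespace tokens."""
    score = sum(lexicon.get(token, 0) for token in text.split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def majority_vote(predictions):
    """Return the label predicted by the most classifiers.

    Ties go to the tied label that appears first in `predictions`
    (an arbitrary choice made for this sketch).
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


# Lexicon-based labeling of unlabeled text (assumed example sentences).
auto_label = lexicon_label("الفيلم ممتاز")

# Combining the outputs of three classifiers for one text.
combined = majority_vote(["positive", "negative", "positive"])
```

In a real pipeline the three votes would come from trained classifiers such as those named in the abstract; here they are hard-coded so the voting step can be read in isolation.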
Supervisor : Dr. Mounira Taileb 
Thesis Type : Master Thesis 
Publishing Year : 1441 AH / 2019 AD
 
Added Date : Wednesday, September 25, 2019 

Researchers

Researcher Name (Arabic) : أمل ضيف الله الكبكبي
Researcher Name (English) : Alkabkabi, Amal Dhaifallah
Researcher Type : Researcher
Dr Grade : Master
Email :

Files

File Name : 45044.pdf
Type : pdf
Description :
