Sentiment analysis is one of the most useful Natural Language Processing (NLP) tasks: it determines the tone (positive or negative) of a text such as a product review or a comment.
This Machine Learning (ML) model was trained on a dataset published on Kaggle that combines 100k Arabic reviews of hotels, books, movies, products, and a few airlines. The reviews were cleaned by removing Arabic diacritics and non-Arabic characters. Predictions are computed from log-odds statistics, and the method's accuracy exceeds 75%, which is reasonable performance for a model of only 28.2 KB.
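The paragraph above names two steps, text cleaning and log-odds scoring, without showing them. The following is a minimal sketch of how such a pipeline could look, assuming a simple bag-of-words model. It is not ArPHP's implementation: `cleanArabic()` and `scoreReview()` are hypothetical helpers, the word weights and the zero prior are made-up illustrative values rather than the trained model, and stripping U+064B..U+0652 (diacritics), U+0670 (superscript alef) and U+0640 (tatweel) is our own choice of what counts as "diacritics".

```php
<?php
// Hypothetical sketch of a log-odds sentiment scorer. The weights below are
// illustrative values, NOT the model shipped with ArPHP.

// Remove Arabic diacritics (U+064B..U+0652), superscript alef (U+0670) and
// tatweel (U+0640), then replace every non-Arabic-letter character with a space.
function cleanArabic(string $text): string
{
    $text = preg_replace('/[\x{064B}-\x{0652}\x{0670}\x{0640}]/u', '', $text);
    $text = preg_replace('/[^\x{0621}-\x{064A}\s]/u', ' ', $text);
    return trim(preg_replace('/\s+/u', ' ', $text));
}

// Start from the prior log-odds of the positive class and add the log-odds
// contribution of every known word; unknown words contribute nothing.
function scoreReview(string $review, array $weights, float $priorLogOdds = 0.0): array
{
    $logOdds = $priorLogOdds;
    $words = preg_split('/\s+/u', cleanArabic($review), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($words as $word) {
        $logOdds += $weights[$word] ?? 0.0;
    }

    // The logistic function converts log-odds into P(positive).
    $pPositive = 1.0 / (1.0 + exp(-$logOdds));
    $isPositive = $pPositive >= 0.5;

    // Same keys as the arSentiment() result used in the example below;
    // 'probability' here is the probability of the predicted class.
    return [
        'isPositive'  => $isPositive,
        'probability' => $isPositive ? $pPositive : 1.0 - $pPositive,
    ];
}

// Hypothetical per-word log-odds (positive = leans positive).
$weights = [
    'رائعة'  => 1.8,
    'لذيذ'   => 1.5,
    'بطيئة'  => -1.2,
    'مكسورة' => -1.6,
];

foreach (['الإطلالة رائعة والطعام لذيذ', 'الخدمة كانت بطيئة'] as $review) {
    $r = scoreReview($review, $weights);
    printf("%s | %s | %.1f%%\n", $review,
           $r['isPositive'] ? 'Positive' : 'Negative',
           100 * $r['probability']);
}
```

A model of this shape only needs to store one number per vocabulary word plus a prior, which is one reason a log-odds classifier can stay very small.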
It has also been tested on HARD (Hotel Arabic-Reviews Dataset), where it achieved 82% accuracy on the balanced reviews dataset (105,698 reviews in total).

For simplicity, we assume that the words of the first language spoken by the Semitic peoples consisted of bi-radicals (i.e., two sounds/letters). Under this assumption, most Arabic word roots can be treated as a bi-radical expanded by the addition of a third letter, with the resulting meaning semantically related to that of the original bi-radical (Ref: The biradical origin of semitic roots).
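| Arabic Review (sample input) | English translation | Sentiment (auto-generated) | Probability (auto-generated) |
|---|---|---|---|
| الخدمة كانت بطيئة | The service was slow | Negative | 57.2% |
| الإطلالة رائعة والطعام لذيذ | The view is wonderful and the food is delicious | Positive | 91.5% |
| التبريد لا يعمل والواي فاي ضعيفة | The air conditioning doesn't work and the Wi-Fi is weak | Negative | 73.0% |
| النظافة مميزة وموظفي الاستقبال متعاونين | The cleanliness is outstanding and the reception staff are helpful | Positive | 58.6% |
| جاءت القطعة مكسورة والعلبة مفتوحة | The item arrived broken and the box was open | Negative | 60.5% |
| المنتج مطابق للمواصفات والتسليم سريع | The product matches the specifications and delivery was fast | Positive | 56.1% |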
```php
<?php
// Assumes ArPHP was installed with Composer; adjust the path to your project.
require __DIR__ . '/vendor/autoload.php';

$Arabic = new \ArPHP\I18N\Arabic();

// Sample reviews to classify
$reviews = array('الخدمة كانت بطيئة',
                 'الإطلالة رائعة والطعام لذيذ',
                 'التبريد لا يعمل والواي فاي ضعيفة',
                 'النظافة مميزة وموظفي الاستقبال متعاونين',
                 'جاءت القطعة مكسورة والعلبة مفتوحة',
                 'المنتج مطابق للمواصفات والتسليم سريع');

// Table header
echo <<< END
<center>
<table border="0" cellspacing="2" cellpadding="5" width="60%">
<tr>
<td bgcolor="#27509D" align="center" width="50%">
<b><font color="#ffffff">Arabic Review (sample input)</font></b>
</td>
<td bgcolor="#27509D" align="center" width="25%">
<b><font color="#ffffff">Sentiment (auto generated)</font></b>
</td>
<td bgcolor="#27509D" align="center" width="25%">
<b><font color="#ffffff">Probability (auto generated)</font></b>
</td>
</tr>
END;

// One table row per review; arSentiment() returns 'isPositive' and 'probability'
foreach ($reviews as $review) {
    $analysis = $Arabic->arSentiment($review);

    if ($analysis['isPositive']) {
        $sentiment = 'Positive';
        $bgcolor   = '#E0F0FF';
    } else {
        $sentiment = 'Negative';
        $bgcolor   = '#FFF0FF';
    }

    // Probability as a percentage with one decimal place
    $probability = sprintf('%0.1f', round(100 * $analysis['probability'], 1));

    // Escape the review text before writing it into the HTML page
    echo '<tr><td bgcolor="'.$bgcolor.'" align="right">';
    echo '<font face="Tahoma">'.htmlspecialchars($review).'</font></td>';
    echo '<td bgcolor="'.$bgcolor.'" align="center">'.$sentiment.'</td>';
    echo '<td bgcolor="'.$bgcolor.'" align="center">'.$probability.'%</td></tr>';
}

echo '</table></center>';
```