Sentiment Analysis of Arabic Tweets
Posted on December 12, 2021 • 1 minutes • 134 words
In this project, sentiment analysis is performed on Arabic tweets. An Arabic-BERT-base model from Hugging Face is used; it was pretrained on ~8.2 billion words, which amount to ~95 GB of text. The model achieved an accuracy of 90.16% and an AUC of 0.966 on the test set.
1- The dataset was found on Kaggle at this link: https://www.kaggle.com/mksaad/arabic-sentiment-twitter-corpus
2- Some preprocessing was done before using BERT including:
- Normalize Unicode encoding
- Removing URLs, @, and trailing whitespace
- Add [CLS] and [SEP] to the beginning and end of each sentence
- Set a max_length through truncating or padding
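The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the project's actual code. Several details are assumptions: the NFKC normalization form, removing whole @mentions (rather than only the @ character), whitespace splitting as a stand-in for BERT's WordPiece tokenizer, and the helper names `clean_tweet` and `encode`.

```python
import re
import unicodedata

# Assumed patterns: http(s)/www links and whole @mentions.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")


def clean_tweet(text: str) -> str:
    """Normalize Unicode, drop URLs and @mentions, tidy whitespace."""
    text = unicodedata.normalize("NFKC", text)  # normalization form is an assumption
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    # Collapse runs of whitespace and strip leading/trailing spaces.
    return re.sub(r"\s+", " ", text).strip()


def encode(tokens, max_length, pad_token="[PAD]"):
    """Wrap tokens in [CLS]/[SEP], then truncate or pad to max_length.

    Returns the token sequence and an attention mask (1 = real token, 0 = padding).
    """
    if max_length < 2:
        raise ValueError("max_length must leave room for [CLS] and [SEP]")
    body = tokens[: max_length - 2]  # reserve two slots for the special tokens
    seq = ["[CLS]"] + body + ["[SEP]"]
    n_pad = max_length - len(seq)
    mask = [1] * len(seq) + [0] * n_pad
    return seq + [pad_token] * n_pad, mask


# Whitespace splitting stands in for the real subword tokenizer here.
tokens = clean_tweet("hello   world @someone https://example.com ").split()
sequence, attention_mask = encode(tokens, max_length=6)
```

In practice, a Hugging Face tokenizer called with `padding="max_length"`, `truncation=True`, and `max_length=...` adds the special tokens and handles padding and truncation in one step, so a hand-written `encode` like this is mainly useful for seeing what happens.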
3- A BERT classifier is used, along with the Adam optimizer and a learning rate scheduler.
4- Cross-entropy was used as the objective function.
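To make the objective concrete, here is a small sketch of cross-entropy computed from a classifier's raw output scores (logits) for a single example. It assumes two classes (index 0 and index 1); the function name `cross_entropy` and the sample logits are illustrative and not taken from the project.

```python
import math


def cross_entropy(logits, label):
    """Cross-entropy loss for one example: -log softmax(logits)[label]."""
    # Subtract the max logit before exponentiating for numerical stability.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


# A confident, correct prediction gives a small loss ...
loss_correct = cross_entropy([2.0, 0.0], label=0)
# ... while the same logits with the other label give a larger one.
loss_wrong = cross_entropy([2.0, 0.0], label=1)
```

With equal logits the model is maximally uncertain between the two classes, and the loss equals ln 2 (about 0.693).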
5- Two performance metrics were used for evaluation: accuracy and AUC.
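For readers unfamiliar with the two metrics, the sketch below computes both from scratch on a tiny made-up example. It assumes binary labels (0 and 1) and that the model outputs a score for the positive class. The function names, the toy data, and the 0.5 decision threshold are illustrative assumptions, not the project's evaluation code.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive example
    is scored higher than a randomly chosen negative one (ties count half).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, purely illustrative.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]              # predicted score for the positive class
y_pred = [int(s >= 0.5) for s in scores]    # assumed 0.5 threshold

acc = accuracy(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Accuracy depends on the chosen threshold, whereas AUC only depends on how the scores rank positives against negatives, which is why the two are often reported together. The pairwise loop is quadratic in the number of examples; for a real test set, a library routine such as scikit-learn's `roc_auc_score` is the more practical choice.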