Document Details

Document Type : Thesis 
Document Title :
Pre-trained Transformer-Based Approach for Extractive and Abstractive Summarization of Arabic Text
نهج قائم على المحولات المدربة مسبقاً للتلخيص الاستخراجي والتجريدي للنص العربي
 
Subject : Faculty of Computing and Information Technology 
Document Language : Arabic 
Abstract : Automatic Text Summarization (ATS) is a prominent research topic in Natural Language Processing (NLP) due to the variety and proliferation of information sources on the Internet. In this research study, we explored ATS systems based on two different approaches: extractive summarization and abstractive summarization. Extractive summarization selects the most important phrases and sentences from the original input text and assembles them into a summary without rephrasing them. Abstractive summarization, on the other hand, summarizes the original text in entirely new terms and sentences. A large body of research has been published on summarizing English text using advanced methodologies to achieve strong results. However, due to the nature of the Arabic language and the scarcity of basic reference datasets, research on Arabic text summarization is progressing more slowly. Several pre-trained language models have recently shown excellent performance on many NLP tasks. For this reason, this study experiments with different pre-trained models for summarizing Arabic text. For Arabic extractive summarization, we fine-tuned and compared the performance of the base AraBERT model, the QARiB model, and the AraELECTRA model, trained on the KALIMAT and EASC Arabic datasets. The generated summaries were then evaluated with the ROUGE evaluation package using the ROUGE-1, ROUGE-2, and ROUGE-L measures. The best results were achieved by the AraBERT model, which obtained scores of 0.44, 0.26, and 0.44, respectively, on the KALIMAT dataset. For Arabic abstractive summarization, we used the Text-to-Text Transfer Transformer (T5 model), which yielded good results: we fine-tuned AraT5, its recently released Arabic version, on a dataset of 267,000 Arabic articles. The model was evaluated with ROUGE-1, ROUGE-2, ROUGE-L, and BLEU, obtaining scores of 0.494, 0.339, 0.469, and 0.4224, respectively. On a second dataset containing 300,000 articles and headlines, it achieved scores of 0.53, 0.3, 0.36, and 0.48. In addition, the AraT5 model outperformed the most recent work based on the Sequence-to-Sequence (Seq2Seq) model.
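As an illustration of the evaluation step described in the abstract, the short Python sketch below scores a candidate summary against a reference summary with ROUGE-1, ROUGE-2, and ROUGE-L. It assumes the open-source rouge-score package and uses placeholder texts; the thesis only states that "the ROUGE evaluation package" was used, so this is a minimal sketch rather than the exact pipeline.

# Minimal sketch: scoring one candidate summary against a reference with
# ROUGE-1, ROUGE-2, and ROUGE-L. Assumes the rouge-score package
# (pip install rouge-score); the thesis does not name the exact
# implementation it used, so treat this as illustrative only.
from rouge_score import rouge_scorer

# Placeholder Arabic texts; in the thesis these would come from the KALIMAT
# and EASC datasets (extractive) or the article/headline corpora (abstractive).
reference_summary = "الملخص المرجعي للنص"
candidate_summary = "الملخص الذي ولده النموذج"

# Stemming in rouge-score targets English, so it is disabled here for Arabic.
scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"],
                                  use_stemmer=False)
scores = scorer.score(reference_summary, candidate_summary)

for metric, result in scores.items():
    print(f"{metric}: F1 = {result.fmeasure:.3f}")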
Supervisor : Dr. Amal Almansour 
Thesis Type : Master Thesis 
Publishing Year : 1445 AH / 2023 AD
 
Added Date : Friday, November 10, 2023 

Researchers

Researcher Name (Arabic) : ياسمين عينيه
Researcher Name (English) : Einieh, Yasmin
Researcher Type : Researcher
Grade : Master
Email :

Files

File Name : 49526.pdf
File Type : PDF
