Sunday, June 2

Summarize and Translate using LLMs

 


Introduction:

Hello! In this blog, we will explore how to fine-tune large language models (LLMs) for a specific task: summarizing English news articles in Arabic and generating an Arabic title.

We will focus on two powerful models, Gemma-7b and Llama3-8b, and walk through each step required to achieve this task:

  • Dataset Creation: How to gather and prepare the data necessary for fine-tuning. 
  • Prompt Creation: Crafting effective prompts to guide the models in performing the desired tasks.
  • Model Fine-Tuning: Using Unsloth AI to fine-tune our models specifically for summarization and title generation. 

By the end of this blog, you will have a clear understanding of how to adapt these LLMs to perform task-oriented applications, leveraging their capabilities to produce meaningful outputs in a different language. Let’s get started!

1- Data Preparation:

For this task, we utilized a sample of the XLSum dataset, which includes a diverse collection of news articles, their summaries, and titles, all in English. To tailor this dataset for our specific needs, we followed these steps:

  1. Sample Selection: We selected a representative sample from the XLSum dataset.
  2. Translation: While keeping the news articles in English, we translated the summaries and titles into Arabic.
  3. JSON Representation: We created a new column in our dataset that contains the JSON representation of both the translated summary and title.

This structured approach ensures that our data is well-organized and ready for the fine-tuning process. The resulting dataset is available on Hugging Face:



Our Dataset for this task: AhmedBou/EngText-ArabicSummary 
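The three preparation steps above can be sketched as follows. Note that `translate_to_arabic` is a hypothetical placeholder for whatever translation model or API you use (our translations were produced separately); the point is how the translated summary and title end up in a single JSON-encoded column.

```python
import json

def translate_to_arabic(text: str) -> str:
    # Hypothetical placeholder: in practice this would call a
    # translation model or API. Here it simply echoes the input.
    return text

def build_example(article: str, summary_en: str, title_en: str) -> dict:
    """Keep the article in English; translate the summary and title,
    then store both in one JSON-encoded column."""
    summary_ar = translate_to_arabic(summary_en)
    title_ar = translate_to_arabic(title_en)
    return {
        "text": article,
        "json_output": json.dumps(
            {"summary": summary_ar, "title": title_ar},
            ensure_ascii=False,  # keep Arabic characters readable
        ),
    }

row = build_example("Some English news article...", "ملخص", "عنوان")
print(row["json_output"])
```

Storing the target as a single JSON string means the model learns to emit valid JSON directly, which makes its output easy to parse downstream.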


2- Effective Prompt:

In our fine-tuning process, crafting effective prompts is essential to guide the model on the specific task of summarizing English news articles in Arabic and generating an Arabic title.

Prompt Structure

Our prompt consists of the following elements:

  1. System Message: A message to set the context for the task.
  2. Fixed Instruction: A consistent instruction since we are fine-tuning for a specific task.
  3. Input: The news article in English.
  4. Response: The JSON representation of the translated summary and title.


This structured approach ensures that our dataset is complete and well-prepared, facilitating effective task-oriented fine-tuning. 
By maintaining consistency in our prompts, we enhance the model's ability to understand and perform the task accurately.
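The four elements above can be assembled into a single template. This is a minimal sketch in the common Alpaca-style layout; the exact wording is illustrative, not necessarily the prompt used in training.

```python
PROMPT_TEMPLATE = """Below is an instruction that describes a task, \
paired with an input that provides further context. \
Write a response that completes the request.

### Instruction:
Summarize the following English news article in Arabic and generate \
an Arabic title. Return the result as JSON with the keys "summary" \
and "title".

### Input:
{article}

### Response:
{response}"""

def format_example(article: str, response_json: str = "") -> str:
    # During training, response_json holds the target JSON string;
    # at inference time it is left empty for the model to complete.
    return PROMPT_TEMPLATE.format(article=article, response=response_json)

print(format_example("An English news article goes here."))
```

Because the instruction is fixed, every training example differs only in its input and response, which keeps the fine-tuning signal focused on the task itself.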

3- Task-Oriented Fine-Tuning of LLMs:

To fine-tune the Gemma-7b and Llama3-8b models for our specific task, we leveraged the power of Unsloth AI, which makes the fine-tuning process 2.2x faster and reduces VRAM usage by 80%. This efficiency allowed us to perform the fine-tuning on a free Colab notebook, making the process accessible and cost-effective.

Fine-tuned model: AhmedBou/Llama-3-EngText-ArabicSummary


Tools and Frameworks

We utilized Hugging Face's TRL (Transformer Reinforcement Learning) library, specifically the SFTTrainer class, to facilitate the fine-tuning. This tool simplifies the training process and integrates seamlessly with our workflow.
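A minimal sketch of this setup is shown below. It assumes a Colab-style GPU runtime with `unsloth`, `trl`, and `datasets` installed; the hyperparameters are illustrative defaults, not the exact values we used.

```python
# Sketch only: requires a CUDA GPU with unsloth, trl, and datasets installed.
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

# Load a 4-bit quantized base model to keep VRAM usage low.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters so only a small fraction of weights is trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
)

# The dataset's "text" field holds the full formatted prompt + response.
dataset = load_dataset("AhmedBou/EngText-ArabicSummary", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        learning_rate=2e-4,
        max_steps=60,  # illustrative; tune for your data size
        output_dir="outputs",
    ),
)
trainer.train()
```

The same sketch applies to Gemma-7b by swapping in its corresponding Unsloth base checkpoint.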

Conclusion:

After fine-tuning the Gemma-7b and Llama3-8b models, we observed that the Llama3-8b model performed better in several key aspects.

It consistently respected the output format as JSON and provided more meaningful summaries and titles that adhered to Arabic grammar. This highlights the effectiveness of the Llama3-8b model for our specific task of summarizing English news articles in Arabic and generating Arabic titles.

Challenge for Readers

We invite you to take on a challenge to further explore and validate our findings. Using the test set we provided, score both models' outputs against the reference summaries and titles. You can use evaluation metrics such as BLEU, ROUGE, the Jaccard index, or fuzzy-matching scores from the RapidFuzz library. This will give you a quantitative basis for deciding which model performs best.

Steps to Follow

  1. Prepare the Test Set: Load the provided test set.
  2. Generate Outputs: Use both Gemma-7b and Llama3-8b models to generate summaries and titles.
  3. Evaluate Outputs: Calculate the evaluation metrics (BLEU, ROUGE, Jaccard index, or RapidFuzz similarity) to compare the models.
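As a starting point, here is the simplest of these metrics: a token-level Jaccard index between a model output and its reference. This toy implementation is illustrative only; for BLEU and ROUGE you would use libraries such as Hugging Face's `evaluate`, and the example strings below are made up.

```python
def jaccard_similarity(reference: str, candidate: str) -> float:
    """Token-level Jaccard index between two strings:
    |intersection| / |union| of their whitespace-split word sets."""
    ref_tokens = set(reference.split())
    cand_tokens = set(candidate.split())
    if not ref_tokens and not cand_tokens:
        return 1.0  # two empty strings are considered identical
    return len(ref_tokens & cand_tokens) / len(ref_tokens | cand_tokens)

# Compare a reference Arabic summary with two hypothetical model outputs.
reference = "ارتفعت أسعار النفط بشكل حاد هذا الأسبوع"
output_a = "ارتفعت أسعار النفط هذا الأسبوع"
output_b = "انخفضت أسعار الذهب"

print(jaccard_similarity(reference, output_a))  # higher word overlap
print(jaccard_similarity(reference, output_b))  # lower word overlap
```

Averaging such a score over the whole test set, separately for summaries and titles, gives a quick first comparison between the two models before moving on to BLEU or ROUGE.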

Finally

To explore the Python code used in this project, visit my GitHub.
Additionally, don't miss our YouTube video for a visual walkthrough of our journey.
I'm always eager to connect, so feel free to reach out to me on LinkedIn.

Thank you, and stay tuned for more captivating projects and insights!



