Sunday, February 25

MedZa Assistant: An Optimized RAG Chatbot Based on Gemini and Browser Data

 


Welcome to our blog! In today's digital landscape, chatbots like ChatGPT often struggle to provide accurate and up-to-date responses because of their knowledge cutoff.

To address this, we've developed a local chatbot solution that combines RAG (Retrieval-Augmented Generation) with Gemini and real-time browser data retrieved through the DuckDuckGo API.

In the following example, we can see how ChatGPT couldn't answer a simple question about Sora, an OpenAI model!

Our solution, which we call MedZa Assistant, not only delivered the correct answer but also substantiated it with references, ensuring credibility and guarding against model hallucination.




Our approach ensures precise and timely answers, surpassing the limitations imposed by the knowledge cutoff. Let's dive in!

Getting Fresh Data from the Internet with DuckDuckGo API

In this first part, we look at how we use the DuckDuckGo API to gather the latest information from the internet. When a user asks a question, our chatbot doesn't rely only on what it already knows: it goes online and checks what's new. By tapping into the DuckDuckGo API, we retrieve the top results on the topic along with their sources. This keeps our knowledge base current and lets us return the freshest answers possible to our users.

But why go through all this trouble? It isn't just about being current. Models sometimes get confused or make things up, a failure known as "hallucination." By grounding the chatbot in fresh, sourced content from the DuckDuckGo API, we give it real-world material to draw on and keep its answers on track. So users not only get the most recent answers, the chatbot also gets a little boost in its smarts along the way.

The following are the top 2 results returned by the DuckDuckGo API when we asked "What is Python?":
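As a minimal sketch of this retrieval step, the snippet below assumes the open-source duckduckgo_search Python package (the exact client used in our code may differ); the query string is just the example from above.

# Minimal sketch of the web-retrieval step, assuming the duckduckgo_search package.
from duckduckgo_search import DDGS

def fetch_top_results(query: str, k: int = 2) -> list[dict]:
    """Return the top-k text results (title, snippet, source URL) for a query."""
    with DDGS() as ddgs:
        # Each result is a dict with keys such as 'title', 'body', and 'href'.
        return list(ddgs.text(query, max_results=k))

if __name__ == "__main__":
    for hit in fetch_top_results("What is Python?"):
        print(hit["title"], "-", hit["href"])
        print(hit["body"], "\n")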



Making Chatbot Answers Better with Google Gemini

In this second part, we look at how we use the Google Gemini Pro model, together with targeted prompts, to handle each returned web result separately. After gathering information from different sources via the DuckDuckGo API, we feed each piece of data into Gemini Pro with a prompt designed to summarize it effectively. By breaking the content into smaller parts and summarizing each one individually, we keep the summaries clear and accurate.

But we don't stop there. After generating a summary for each piece of information, we compile them into a complete data digest. This digest provides a thorough overview of the topic, capturing the main points from multiple sources in a brief, easy-to-understand format. Each summary is linked back to its original source, giving users the option to explore further if they want to.
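The sketch below illustrates this per-result summarization and digest step, assuming the google-generativeai SDK and the results returned by the retrieval sketch above; the prompt wording is illustrative, not our exact prompt.

# Hedged sketch of per-result summarization with Gemini Pro and digest building.
import google.generativeai as genai

genai.configure(api_key="YOUR_GEMINI_API_KEY")  # placeholder key
model = genai.GenerativeModel("gemini-pro")

SUMMARY_PROMPT = (
    "Summarize the following search result in 2-3 sentences, "
    "keeping only factual information:\n\n{text}"
)

def summarize_result(result: dict) -> dict:
    """Summarize one retrieved result and keep a link to its source."""
    response = model.generate_content(SUMMARY_PROMPT.format(text=result["body"]))
    return {"summary": response.text, "source": result["href"]}

def build_digest(results: list[dict]) -> str:
    """Compile individual summaries into a single digest with references."""
    parts = [f"- {s['summary']} (source: {s['source']})"
             for s in map(summarize_result, results)]
    return "\n".join(parts)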

So, when users engage with our chatbot, they can trust that the answers are not only accurate and recent but also carefully selected from reliable sources on the internet. Thanks to the integration of Google Gemini Pro and this curation process, we aim to offer a seamless and informative experience while staying transparent about our data sources.

The following is the result we get from Google Gemini Pro alone when we ask about LLMOps; as you can see, it is pure hallucination.



And here's the MedZa Assistant's answer:



Bringing MedZa Assistant to the Web with Streamlit

In an effort to make the MedZa Assistant more accessible, we leveraged the Streamlit library to develop a user-friendly web application.

This application allows users to interact with the chatbot directly through their web browser, eliminating the need for any downloads or installations.

Users can now visit the gallery and access the MedZa Assistant with just a few clicks, whether they're seeking information, assistance, or just a friendly chat.
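Below is a minimal Streamlit sketch of the chat front-end; the medza_answer helper is a hypothetical stand-in for the retrieval-plus-Gemini pipeline described above, stubbed out here so the sketch runs on its own.

# Minimal Streamlit chat front-end sketch for the assistant.
import streamlit as st

def medza_answer(question: str) -> str:
    # Placeholder: the real app calls the retrieval + Gemini pipeline here.
    return f"(answer to: {question})"

st.title("MedZa Assistant")

if "history" not in st.session_state:
    st.session_state.history = []

question = st.chat_input("Ask me anything...")
if question:
    st.session_state.history.append(("user", question))
    st.session_state.history.append(("assistant", medza_answer(question)))

for role, message in st.session_state.history:
    with st.chat_message(role):
        st.write(message)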

A little backstory on the name "MedZa"! The secret behind this unique moniker is that it's a combination of its creators' names: me, Ahmed, and Hamza. 🤫 We took the last part of each of our names and merged them to create "MedZa," symbolizing our collaboration and dedication to building a helpful and innovative assistant for everyone to enjoy.

Future Directions and Improvements

Looking ahead, our focus shifts to enhancing the chatbot's capabilities beyond just question-answering.

While the current version excels in providing accurate responses, we aim to expand its functionality to include conversational interactions and logic-based queries. Additionally, we plan to refine its ability to handle straightforward questions like arithmetic calculations independently, without relying on additional context.

By addressing these areas, we aim to create a more versatile and intuitive chatbot experience for users.


Finally

To explore the Python code used in this project, visit my GitHub

I'm always eager to connect, so feel free to reach out to me on LinkedIn

Connect with Hamza Boulahia on LinkedIn

Take a minute to visit Hamza Boulahia's amazing blog



Sunday, January 21

RAG with Google Gemini on Arabic Docs

 

In the dynamic landscape of natural language processing, Google Gemini has emerged as a revolutionary tool, pushing the boundaries of language comprehension. In this blog, we explore the capabilities of Gemini models, with a particular focus on their prowess in understanding foreign languages like Arabic.

Build with Gemini: Developer API Key

One of the exciting aspects of Google Gemini is its accessibility through the developer API key. Google generously provides developers with the opportunity to tap into the potential of Gemini models for free, allowing innovation and experimentation without financial barriers.

Get your API key in Google AI Studio.



Meet the stars of the show:
Gemini-pro: Optimized for text-only prompts, this model masters the art of linguistic finesse.
Gemini-pro-vision: For text-and-images prompts, this model integrates visual context seamlessly.

Let's Start:

In this blog post, I will guide you step by step through the implementation of a RAG model using the Gemini model. Each step of the process will be meticulously explained, providing you with a clear roadmap for incorporating this advanced language understanding into your projects. What's more, to make this journey even more accessible, the Python code for the entire implementation will be included in a user-friendly Python notebook.

We initiated the evaluation by conducting a swift test to assess the model's prowess in generating Arabic content from Arabic queries. Additionally, we examined its ability to answer questions based on a set of information using a miniature version of the RAG (Retrieval-Augmented Generation) approach.

The results shed light on the model's effectiveness in handling Arabic language intricacies and its capacity to provide contextually relevant responses within the defined information scope.


Step 1: Data Import with Langchain:


Our project commences by importing data from external sources, encompassing PDFs, CSVs, and websites.

To facilitate this process, we leverage both the Langchain and html2text libraries. For our assessment of the model's capabilities, we opt to scrape information from the Wikipedia page on gravity, considering both Arabic and English versions. This dual-language approach ensures a diverse dataset, allowing us to thoroughly evaluate the model's proficiency in handling multilingual content and extracting meaningful insights.
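A hedged sketch of this loading step is shown below, assuming LangChain's AsyncHtmlLoader together with its html2text-based transformer (import paths vary slightly between LangChain versions); the Wikipedia URLs are illustrative examples for the English and Arabic gravity articles.

# Hedged sketch of loading web pages and converting HTML to plain text.
from langchain.document_loaders import AsyncHtmlLoader
from langchain.document_transformers import Html2TextTransformer

urls = [
    "https://en.wikipedia.org/wiki/Gravity",  # English article
    "https://ar.wikipedia.org/wiki/%D8%AC%D8%A7%D8%B0%D8%A8%D9%8A%D8%A9",  # Arabic article
]

html_docs = AsyncHtmlLoader(urls).load()            # download raw HTML
docs = Html2TextTransformer().transform_documents(html_docs)  # HTML -> text

print(docs[0].page_content[:500])  # preview the extracted text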


Step 2: Data Splitting & Chunk Creation with Langchain:

To streamline the handling of website data from the Wikipedia page, we employed Langchain's RecursiveCharacterTextSplitter.

This powerful tool enabled us to efficiently split the retrieved content into smaller, manageable chunks. This step is pivotal as it prepares the data for embedding and storage in a vector store. By breaking down the information into more digestible units, we enhance the model's ability to comprehend and generate nuanced responses based on the intricacies of the input.
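A minimal sketch of the chunking step follows; the chunk_size and chunk_overlap values are illustrative, not the exact ones used in the original notebook.

# Minimal sketch of splitting the loaded documents into chunks.
from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,    # max characters per chunk
    chunk_overlap=100,  # overlap keeps context across chunk boundaries
)
chunks = splitter.split_documents(docs)  # `docs` from the loading step above
print(f"Created {len(chunks)} chunks")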

Step 3: Gemini Embedding Mastery:


For the embedding phase, we harnessed the power of the Google Gemini embedding model, specifically utilizing the embedding-001 variant. This model played a pivotal role in embedding all the previously processed data chunks, ensuring a rich representation of the information.
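The sketch below shows the embedding setup, assuming the langchain-google-genai integration; the model name follows the post (embedding-001), and the API key is a placeholder.

# Hedged sketch of the Gemini embedding setup.
from langchain_google_genai import GoogleGenerativeAIEmbeddings

embeddings = GoogleGenerativeAIEmbeddings(
    model="models/embedding-001",
    google_api_key="YOUR_GEMINI_API_KEY",  # placeholder
)
vector = embeddings.embed_query("ما هي الجاذبية؟")  # "What is gravity?" in Arabic
print(len(vector))  # dimensionality of the embedding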

Step 4: Vector Store with Langchain DocArrayInMemorySearch:

To efficiently store and organize these embeddings, we employed Langchain's vector store functionality, leveraging the DocArrayInMemorySearch from the Langchain vectorstores.

This combination not only facilitates seamless storage of the embedded data but also sets the stage for streamlined querying and retrieval. Now, with our chunks embedded and securely stored, they are poised for efficient retrieval as the project progresses.
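A minimal sketch of the vector-store step is shown below; `chunks` and `embeddings` come from the previous steps, and the k value is illustrative.

# Minimal sketch of building the in-memory vector store and a retriever.
from langchain.vectorstores import DocArrayInMemorySearch

db = DocArrayInMemorySearch.from_documents(chunks, embeddings)
retriever = db.as_retriever(search_kwargs={"k": 4})  # top-4 chunks per query

relevant = retriever.get_relevant_documents("ما هي الجاذبية؟")
for doc in relevant:
    print(doc.page_content[:200])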


Step 5: Prompt Injection & Results Harvest from Gemini Model:


In the pursuit of generating precise and contextually rich answers, our approach involves leveraging the vector store retriever to extract the top chunks deemed most relevant to address user queries. This crucial step ensures that the context necessary for a comprehensive response is readily available.

Subsequently, employing the versatile capabilities of Langchain, we construct a seamless workflow. The user's question and the retrieved context are seamlessly passed through a Langchain chain, which incorporates a meticulously designed prompt template. This template plays a crucial role in structuring the input for the Google Gemini model.

This integrated process sets the stage for the Google Gemini model to perform prompt injection, effectively generating answers that draw upon the contextual information stored in the vectorized chunks. Through this methodical approach, we aim to provide users with accurate and insightful responses tailored to their inquiries.
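The sketch below ties the pieces together: retrieved context plus the user question are formatted with a prompt template and sent to gemini-pro. It assumes the langchain-google-genai chat wrapper, and the prompt wording is illustrative rather than the original one.

# Hedged sketch of the final RAG step: retrieve context, format the prompt, ask Gemini.
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain.prompts import PromptTemplate

llm = ChatGoogleGenerativeAI(model="gemini-pro",
                             google_api_key="YOUR_GEMINI_API_KEY")  # placeholder

prompt = PromptTemplate.from_template(
    "Answer the question using only the context below.\n\n"
    "Context:\n{context}\n\nQuestion: {question}\nAnswer:"
)

def answer(question: str) -> str:
    context = "\n\n".join(d.page_content
                          for d in retriever.get_relevant_documents(question))
    return llm.invoke(prompt.format(context=context, question=question)).content

print(answer("ما هي الجاذبية؟"))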

My Personal Opinion:

In our evaluation, the model showcases impressive capabilities and yields outstanding results when it comes to English.
However, the performance takes a hit when dealing with Arabic content. This discrepancy can be attributed to the limitations of the embedding model and the retriever, which struggle to retrieve the relevant context needed to answer Arabic user queries effectively.
It's worth considering the adoption of a more advanced embedding model, possibly a multilingual one, to enhance results in Arabic. This adjustment could potentially address the current limitations and improve the overall performance for a more robust user experience.

A task for you!

For a hands-on exploration, consider experimenting with alternative tools to enhance the performance of the model.
Try integrating a different embedding model, perhaps a multilingual one from the HuggingFace library. Additionally, explore the use of an alternative vector store, like Chroma DB, to store and retrieve embedded data. After making these adjustments, compare the results with our current setup. Your findings could provide valuable insights into optimizing the system for improved performance and responsiveness.

Finally

To explore the Python code used in this project, visit my GitHub 
Additionally, don't miss our YouTube video for a visual walkthrough of our journey. 
I'm always eager to connect, so feel free to reach out to me on LinkedIn

Thank you, and stay tuned for more captivating projects and insights!



Tuesday, September 5

Fine-Tuning Large Language Models for a Specialized Arabic Task


I. Introduction: Large Language Models for Arabic Tags Generation

In this blog post, our primary focus will be on the process of fine-tuning four different large language models (LLMs) using an Arabic dataset. We'll delve into the intricacies of adapting these models to perform specialized tasks in Arabic natural language processing. 

The good news is that you won't need any complex setup; a Google Colab notebook will suffice for this entire workflow, making it accessible and efficient for anyone interested in exploring the world of LLM fine-tuning.

1. Task Overview: Tags Generation

In this task, we explore the remarkable capabilities of different open-source large language models (LLMs) in understanding and generating Arabic words. 

Our objective is straightforward: to use LLMs to automatically generate descriptive tags for Arabic quotes. 
This task not only demonstrates the linguistic prowess of LLMs but also showcases their potential in Arabic language applications.

2. Large Language Models for the Challenge

In this section, we're gearing up to put four remarkable language models to the test, and the best part is that they're all readily available on the HuggingFace library. 



Here's a quick introduction to each one:

1. RedPajama ([Link]): RedPajama is developed by Togethercomputer.


2. Dolly V2 ([Link]): Dolly V2 is developed by Databricks.


3. OPT ([Link]): OPT was developed by Facebook (Meta).


4. GPT Neo 2.7B ([Link]): GPT Neo is an impressive model from EleutherAI.


An important point to note is that none of these language models was initially tailored for Arabic language tasks, and their exposure to Arabic data during pretraining is likely limited. This presents an exciting challenge for us as we explore their adaptability and potential in the context of Arabic tags generation.

II. Fine-tuning strategy and the Used Dataset

In our pursuit of optimizing language model fine-tuning for specialized Arabic tasks, we employ a cutting-edge technique known as 4-bit quantization. This method, used within Quantized Low-Rank Adaptation (QLoRA), offers a game-changing advantage.

1. Fine-Tuning on Low Resources: 4-bit Quantization

The 4-bit quantization technique allows us to fine-tune large language models (LLMs) using just a single GPU while preserving the high performance typically associated with full 16-bit models. To put it into perspective, this groundbreaking approach signifies a pivotal shift in the AI landscape, as it empowers us to achieve remarkable results efficiently and with reduced computational demands.
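As a hedged sketch of what a QLoRA-style setup looks like with transformers, bitsandbytes, and peft, the snippet below loads one of the post's models (RedPajama-INCITE-Instruct-3B-v1) in 4-bit and attaches LoRA adapters; the hyperparameters are illustrative, not our exact training configuration.

# Hedged sketch of 4-bit loading and LoRA adapter setup (QLoRA-style).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "togethercomputer/RedPajama-INCITE-Instruct-3B-v1"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)

model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                         task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters are trainable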

If you're eager to delve deeper into the intricacies of this remarkable technique, we invite you to explore it further. For a comprehensive understanding of 4-bit quantization and the QLoRA method, we encourage you to visit the following links:

Link 1: PEFT: Parameter-Efficient Fine-Tuning of Billion-Scale Models on Low-Resource Hardware

Link 2: Making LLMs even more accessible with bitsandbytes, 4-bit quantization and QLoRA


2. Fueling the Models: The Arabic Dataset and Where to Find It

Fine-tuning quantized LLMs is a powerful technique for adapting pre-trained language models to specific tasks or datasets. 

Fine-tuning the quantized model on the target task or dataset allows us to adapt the model to the new domain, improving its performance. With the right training procedure and hyperparameters, we can create highly performant quantized LLMs that are tailored to our specific needs.

To achieve this, I've curated a substantial dataset containing Arabic quotes along with their corresponding tags. It's open source and readily accessible on the HuggingFace library. This dataset serves as a valuable resource for training and fine-tuning language models for Arabic tags generation.
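Loading the dataset from the Hub is a one-liner with the datasets library; the repository id below is a placeholder, so substitute the actual dataset name.

# Minimal sketch of loading the quotes-and-tags dataset from the HuggingFace Hub.
from datasets import load_dataset

dataset = load_dataset("your-username/arabic-quotes-with-tags")  # placeholder id
print(dataset)
print(dataset["train"][0])  # e.g. a quote with its associated tags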



III. Comparative Study of Results and Model Hosting

1. Crafting the Metric: Evaluating LLM Performance

To assess the performance of each language model, we employed a tailored metric that we designed specifically for our evaluation. This custom metric serves as a vital yardstick in gauging the effectiveness of the models in generating Arabic tags for quotes. 

This metric takes two Arabic tag strings (one containing the generated tags and one containing the original validation tags), preprocesses them, computes their Jaccard similarity, and returns a normalized score between 0 and 1, where 1 indicates a perfect match and 0 indicates no overlap.

By creating this evaluation criterion, we ensure that the assessment aligns with our task, enabling a more precise and informative evaluation of each LLM's performance.
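The sketch below shows one way to implement such a Jaccard-style tag metric for a single pair of tag strings (averaging over the dataset would follow); the preprocessing, splitting on Arabic and Latin commas, is illustrative rather than our exact cleaning steps.

# Hedged sketch of the Jaccard similarity metric over tag strings.
def tag_similarity(generated: str, reference: str) -> float:
    """Jaccard similarity between generated and reference tag strings (0 to 1)."""
    def to_set(tags: str) -> set:
        return {t.strip() for t in tags.replace("،", ",").split(",") if t.strip()}

    gen, ref = to_set(generated), to_set(reference)
    if not gen and not ref:
        return 1.0
    return len(gen & ref) / len(gen | ref)

# Example: two tags shared out of four distinct tags -> Jaccard = 0.5
print(tag_similarity("حكمة, حياة, نجاح", "حكمة, حياة, صبر"))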


2. Unveiling Performance: Results of Each LLM

Now, it's time to unveil the results, and we have a clear winner! RedPajama-INCITE-Instruct-3B-v1 achieved the highest score. However, it's worth noting that the competition was extremely close. 



This closeness can be attributed to a couple of factors. First, the models we used are relatively small in size (all 4 models are under 3 billion parameters). Second, they haven't had extensive exposure to Arabic data during their pretraining phase. 

These two factors combined make the Arabic language challenge even more remarkable, as it underscores the models' adaptability and their ability to perform well despite limited exposure to Arabic data.


3. From Training to Deployment: Hosting the Winning Model

Hosting your model on the HuggingFace library is surprisingly straightforward and can be achieved with just a few lines of code. All you'll need is your HuggingFace token. 


Once your model is deployed, you can immediately start using it and even share it with your friends and colleagues for testing purposes. 



Detailed instructions for this process are provided in the Python notebook accompanying this blog post. If you'd like to explore more about HuggingFace model hosting, you can find additional information in this link: Deploy LLMs with Hugging Face Inference Endpoints
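In code, the push itself can look like the sketch below; the repository name is a placeholder and the token is your own HuggingFace access token.

# Minimal sketch of pushing the fine-tuned model and tokenizer to the Hub.
from huggingface_hub import login

login(token="YOUR_HF_TOKEN")  # placeholder token

model.push_to_hub("your-username/redpajama-3b-arabic-tags")      # placeholder repo
tokenizer.push_to_hub("your-username/redpajama-3b-arabic-tags")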


Conclusion

In summary, our exploration of large language models (LLMs) for Arabic tags generation has yielded impressive results. Despite model size constraints and limited Arabic data exposure, our top-performing model, RedPajama-INCITE-Instruct-3B-v1, showcased remarkable adaptability. The use of 4-bit quantization with QLoRA added efficiency to our process.

To explore the Python code used in this project, visit my GitHub repository
Additionally, don't miss our YouTube video for a visual walkthrough of our journey. 
I'm always eager to connect, so feel free to reach out to me on LinkedIn

Thank you, and stay tuned for more captivating projects and insights!



Sunday, May 21

Demystifying Semantic Search and Question Answering




Introduction to Semantic Search

Semantic search is a fundamental NLP task that aims to bridge the gap between user queries and the underlying meaning of textual content. 

It goes beyond simple keyword matching, focusing on understanding context and delivering accurate results. 

While LLMs have advanced NLP, semantic search remains crucial, providing a complementary approach to achieve precise and context-aware search results. In this blog, we explore the importance of semantic search and its integration with question answering models, along with creating a user-friendly web interface.

1. Data Preprocessing



   - Understanding the Importance of Data Preprocessing

Data preprocessing plays a crucial role in semantic search as it lays the foundation for accurate and meaningful analysis. 

By extracting text from PDFs and employing techniques like sentence tokenization with Sentence Transformer, we enhance the quality of the data, enabling better understanding and analysis of the content during the semantic search process.

   - Extracting Text from PDFs: A Crucial First Step

Extracting text from PDFs is a crucial first step in the data preprocessing phase. To accomplish this, we employ the powerful pdfplumber Python library. 

By leveraging its functionality, we can efficiently extract the text content from PDF files, ensuring that we have the necessary textual data to perform subsequent semantic search and question answering tasks accurately and effectively.
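A minimal sketch of this extraction step with pdfplumber is shown below; the file path is a placeholder.

# Minimal sketch of extracting text from a PDF with pdfplumber.
import pdfplumber

def extract_text(pdf_path: str) -> str:
    """Concatenate the text of every page in a PDF."""
    with pdfplumber.open(pdf_path) as pdf:
        return "\n".join(page.extract_text() or "" for page in pdf.pages)

document_text = extract_text("example.pdf")  # placeholder path
print(document_text[:500])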

   - Sentence Tokenization with Sentence Transformer: Enhancing Text Analysis

Sentence tokenization with the Sentence Transformers Python package is a pivotal step in enhancing text analysis for semantic search. 

By breaking down the extracted text into individual sentences, we can achieve a finer level of granularity, facilitating improved similarity measurements. 

Leveraging the power of Sentence Transformers, we can identify and compare relevant semantic units within the text, leading to more precise and context-aware search results.
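In the hedged sketch below, NLTK handles the actual sentence splitting and sentence-transformers turns each sentence into an embedding; the embedding model name (all-MiniLM-L6-v2) is an illustrative choice, and the original implementation may differ.

# Hedged sketch of sentence splitting and sentence embedding.
import nltk
from sentence_transformers import SentenceTransformer

nltk.download("punkt", quiet=True)
sentences = nltk.sent_tokenize(document_text)  # text from the PDF step above

encoder = SentenceTransformer("all-MiniLM-L6-v2")
sentence_embeddings = encoder.encode(sentences, show_progress_bar=True)
print(sentence_embeddings.shape)  # (num_sentences, embedding_dim)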


2. Semantic Search



   - Harnessing the Power of Semantic Search

Unlike traditional keyword-based search, semantic search focuses on understanding the context and meaning behind user queries, leading to more accurate and relevant results. 

By analyzing the semantic relationships, concepts, and entities within the text, semantic search enables a deeper level of comprehension and improves the search experience. 

Whether it's finding specific information within scientific papers, legal documents, or news articles, semantic search empowers us to unlock the full potential of vast knowledge repositories and retrieve "precisely" what we need.

   - Exploring Different Similarity Methods for Semantic Search (Faiss, Annoy, and TF-IDF)

In this tutorial, we explore three different similarity methods for semantic search: 

Faiss (Facebook AI Similarity Search): a library for efficient similarity search, developed by Facebook (Meta).

Annoy (Approximate Nearest Neighbors Oh Yeah): a package developed by Spotify.

TF-IDF (Term Frequency-Inverse Document Frequency): a classic information retrieval technique.

By leveraging these techniques, we aim to obtain the top two most relevant results from each method, thereby improving search accuracy and effectiveness.
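As a minimal sketch of one of the three methods, the snippet below retrieves the top-2 sentences with Faiss; Annoy and TF-IDF follow the same pattern with their own index types, and the query string is a hypothetical example.

# Minimal sketch of top-2 retrieval with Faiss over the sentence embeddings.
import numpy as np
import faiss

index = faiss.IndexFlatL2(sentence_embeddings.shape[1])
index.add(np.asarray(sentence_embeddings, dtype="float32"))

query_vec = encoder.encode(["What is the main contribution of the paper?"])
distances, ids = index.search(np.asarray(query_vec, dtype="float32"), 2)

top_sentences = [sentences[i] for i in ids[0]]
print(top_sentences)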

   - Context Injection: Unveiling the Most Relevant Content

In the context of semantic search, the BERT Question Answering (Q&A) model requires both a context and a question (used as the query) to provide accurate answers. 

To ensure we have the necessary context for the Q&A model, we employ context injection by concatenating all the top results obtained from the similarity methods (from the previous step). 

This process creates a comprehensive and informative context that can be utilized effectively by the question answering model to generate precise and contextualized responses.


3. Question Answering with BERT



   - Unleashing the Potential of Question Answering with BERT

By utilizing the Huggingface transformers package and leveraging a pre-trained model called "bert-large-uncased-whole-word-masking-finetuned-squad" which was fine-tuned on the SQuAD dataset, we can achieve accurate and context-aware question answering. 




This model has the following configuration:

  • 24 layers
  • 1024 hidden dimensions
  • 16 attention heads
  • 336M parameters

This model is capable of comprehending the nuances of the context and question, allowing it to provide detailed answers based on the information present within the text.
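A minimal sketch of the question-answering step with the HuggingFace transformers pipeline and the SQuAD-fine-tuned BERT model named above; the context here is the injected context built from the search step, and the question is the same hypothetical example as before.

# Minimal sketch of extractive QA with the fine-tuned BERT model.
from transformers import pipeline

qa = pipeline(
    "question-answering",
    model="bert-large-uncased-whole-word-masking-finetuned-squad",
)

context = " ".join(top_sentences)  # injected context from the similarity methods
result = qa(question="What is the main contribution of the paper?",
            context=context)
print(result["answer"], result["score"])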

While the release of advanced language models like ChatGPT and other LLMs has garnered significant attention, the impact on the popularity of BERT-based question answering models can be observed from the download graph. 

Over time, the number of downloads for BERT models has decreased significantly, reflecting the shift in focus towards broader language models. 

However, it's important to note that BERT still remains a valuable and effective choice for accurate and contextualized question answering tasks.


4. Putting It All Together - Creating a Web Interface with Gradio

   - Enhancing User Experience: 



Creating a user-friendly web interface is essential for enhancing the overall user experience of our semantic search and question answering system. 

Gradio, a Python package, proves to be a key tool in this regard. With its intuitive and straightforward design, Gradio allows us to easily build interactive interfaces without extensive web development knowledge. 

By providing a simple and elegant way to showcase our semantic search and question answering functionalities, Gradio empowers users to interact seamlessly with our system, making it accessible and user-friendly for both technical and non-technical users alike.
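Below is a hedged sketch of such a Gradio front-end; the answer_question wiring is simplified (it truncates the document instead of running the full search step), and the file-handling detail assumes Gradio 3.x behavior, where gr.File passes a tempfile-like object.

# Hedged sketch of a Gradio interface for the PDF question-answering system.
import gradio as gr

def answer_question(pdf_file, question):
    # In Gradio 3.x, gr.File passes a tempfile-like object; .name is the path.
    text = extract_text(pdf_file.name)
    # A real app would run the semantic-search step here to build the context;
    # this sketch simply truncates the document text.
    return qa(question=question, context=text[:4000])["answer"]

demo = gr.Interface(
    fn=answer_question,
    inputs=[gr.File(label="PDF document"), gr.Textbox(label="Your question")],
    outputs=gr.Textbox(label="Answer"),
    title="Semantic Search & Question Answering",
)
demo.launch()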


5. Conclusion

In this blog, we explored the fascinating world of semantic search and question answering. We delved into the importance of data preprocessing, highlighting the crucial steps of extracting text from PDFs and performing sentence tokenization using Sentence Transformers. We then ventured into the realm of semantic search, discovering the power of different similarity methods like Faiss, Annoy, and TF-IDF to unveil the most relevant content. Leveraging the BERT Question Answering model, we witnessed how it provides accurate and contextualized answers, fueled by its understanding of language and fine-tuning on the SQuAD dataset.

Finally, we witnessed the seamless integration of all these components through the creation of a web interface using Gradio. This interface enhanced the user experience, allowing users to effortlessly interact with our semantic search and question-answering system.



Tuesday, March 14

Image Generator Web Application


Are you looking for an easy and fun way to generate images from text prompts? Look no further than a web application built with Google Colab, Stable Diffusion, and Gradio! In this blog post, we'll explore how to create an image generator web application using these tools. With just a few clicks, you can create a web app that generates stunning images based on text prompts. And the best part? You don't need any coding experience to get started.




What is Stable Diffusion? 

Stable Diffusion is a powerful deep learning model that can generate high-quality images from text prompts. It is based on the Diffusion Probabilistic Models (DPMs) framework, which is a class of generative models that can capture complex dependencies between variables in a probabilistic manner. Stable Diffusion can generate images that are highly detailed and diverse, making it an excellent tool for artists, designers, and researchers alike.


What is Gradio?

Gradio is a Python library that allows you to quickly create custom user interfaces for your machine learning models. With Gradio, you can create an interactive web interface that lets users experiment with different prompts and see the results in real-time. 
Gradio also supports a wide range of input and output types, making it easy to integrate with a variety of machine learning models and applications. 


Your App in 3 quick steps!

To create an image generator web application, you'll need to follow these three steps:

1. Create a Google Colab notebook and install the Stable Diffusion package. 
2. Use Stable Diffusion to generate images from text prompts. 
3. Use Gradio to create an interactive and shareable web interface for your image generator (a minimal code sketch follows below).
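The sketch below puts the three steps together using the diffusers library and the Stable Diffusion 2 checkpoint; it assumes a GPU runtime (such as Colab), and the pipeline settings are illustrative.

# Hedged sketch: generate images with Stable Diffusion 2 and serve them via Gradio.
import torch
from diffusers import StableDiffusionPipeline
import gradio as gr

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2", torch_dtype=torch.float16
).to("cuda")

def generate(prompt: str):
    return pipe(prompt).images[0]

gr.Interface(
    fn=generate,
    inputs=gr.Textbox(label="Text prompt"),
    outputs=gr.Image(label="Generated image"),
    title="Image Generator",
).launch(share=True)  # share=True gives a public link from Colab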

Conclusion

In conclusion, the image generator web application is an easy and fun way to generate stunning images from text prompts. With just a few clicks and no coding experience necessary, you can create a user-friendly interface that allows users to experiment with different prompts and see the generated images in real time. If you enjoyed this post, don't forget to subscribe to our blog for more exciting and informative content. And if you have any questions or feedback, please don't hesitate to reach out to us via our contact links. We look forward to hearing from you!


Project Resources 


Tech used in this project: Python, Gradio, Stable Diffusion 2.0 



Thursday, March 9

ML-Olympiad Water-Quality-Prediction

 



Introduction

Greetings everyone, I am excited to share my journey participating in the water quality estimation competition. 
The competition required us to build a machine learning model based on the training data provided and predict the water quality estimation for the test dataset accurately. I put my knowledge of machine learning and data analysis into practice to preprocess, analyze, and visualize the data. I explored various regression techniques and hyperparameters to find the best model for this task. After numerous iterations, I was able to build a model that achieved high accuracy in predicting the water quality estimation for the test dataset. 
My hard work and dedication paid off as I secured the 18th position in the competition. I am sharing the code I used for this prediction task (regression) below, hoping that it can help and inspire others to pursue their interests in machine learning.


Machine Learning Models

I utilized three different machine learning models to predict the quality estimation for the test dataset. These models were the Sequential Neural Network, the XGBoost Regressor, and the Random Forest Regressor. 
Through rigorous experimentation and testing, I found that the XGBoost Regressor and the Random Forest Regressor performed the best in terms of prediction accuracy. 



Both models outperformed the Sequential Neural Network in this task, which is a reasonable outcome given the nature of the data. 

The XGBoost Regressor and the Random Forest Regressor are both tree-based models that excel in handling tabular data with multiple levels of categorical data. These models can capture complex interactions between variables, making them particularly well-suited for this type of problem. Ultimately, the XGBoost Regressor had the best performance based on the RMSE metric, followed closely by the Random Forest Regressor. 
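As a hedged sketch of how the two best-performing models were compared, the snippet below trains both on a train/validation split and reports RMSE; the file path, target column name, and hyperparameters are illustrative, not the competition's exact setup.

# Hedged sketch of comparing XGBoost and Random Forest regressors by RMSE.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from xgboost import XGBRegressor

train = pd.read_csv("train.csv")             # placeholder path to competition data
X = train.drop(columns=["target"])           # placeholder target column name
y = train["target"]
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    "XGBoost": XGBRegressor(n_estimators=500, learning_rate=0.05),
    "Random Forest": RandomForestRegressor(n_estimators=300, random_state=42),
}

for name, model in models.items():
    model.fit(X_tr, y_tr)
    rmse = mean_squared_error(y_val, model.predict(X_val)) ** 0.5
    print(f"{name}: RMSE = {rmse:.4f}")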

Conclusion

I believe that the combination of these two models can provide a robust solution for similar regression problems in the future.
I would like to extend an invitation to try your own model and submit a late entry for the water quality estimation competition. This is an excellent opportunity to put your skills to the test and see how well your model performs against others in the competition. The competition data and rules are still available, so don't hesitate to give it a shot. You might be surprised at how well your model performs. Plus, this competition is an excellent opportunity to learn new techniques, explore new algorithms, and build your portfolio. 
So, why not take a shot and see how your model stacks up against others? Good luck, and happy modeling!


Project Resources

Tech used in this project: Python, Keras, Sklearn, Random Forest, XGBoost
GitHub project link: https://github.com/BoulahiaAhmed/ML-Olympiad--Water-Quality-Prediction





