Identifying Credit Card Fraud through Machine Learning Techniques

Introduction

Credit card fraud is a pervasive problem that affects both consumers and financial institutions. With the increase in online transactions, the risk of fraud has also increased. In this project, we aimed to develop a model to detect credit card fraud using machine learning techniques.

Dataset Description

The dataset used in this project consisted of 284,807 transactions, of which 492 were identified as fraudulent. This is a fraud rate of about 0.17%, so the data was highly imbalanced: the vast majority of transactions were non-fraudulent.
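To make the imbalance concrete, here is a small sketch that recomputes the fraud rate from the two counts quoted above. The variable names are only illustrative. It also shows why accuracy alone is a poor yardstick here: a model that labels every transaction as genuine would already be right almost all of the time.

```python
# Counts quoted in the dataset description.
total_transactions = 284_807
fraud_transactions = 492

# Share of fraudulent transactions in the whole dataset.
fraud_rate = fraud_transactions / total_transactions

# Number of genuine transactions for every fraudulent one.
genuine_per_fraud = (total_transactions - fraud_transactions) / fraud_transactions

# Accuracy of a trivial model that always predicts "genuine".
baseline_accuracy = 1 - fraud_rate

print(f"Fraud rate: {fraud_rate:.4%}")
print(f"Genuine transactions per fraud: {genuine_per_fraud:.0f}")
print(f"Accuracy of always predicting genuine: {baseline_accuracy:.3%}")
```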

In the target label, the value 1 marks a fraudulent transaction and the value 0 marks a non-fraudulent one.

Link for the Dataset: https://www.kaggle.com/datasets/shayannaveed/credit-card-fraud-detection

Methods and Algorithms

To address the class imbalance, we applied two resampling techniques: oversampling with the Synthetic Minority Oversampling Technique (SMOTE) and under-sampling with Random Under-Sampling (RUS).

These techniques helped to balance the data and improve the performance of our machine learning models.

SMOTE creates synthetic examples for the minority class only. The algorithm picks a minority example, selects one of its nearest minority-class neighbours in the feature space, draws a line between the two, and creates a new sample at a random position along that line.
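The interpolation idea can be sketched in a few lines of plain Python. This is a toy illustration, not the code used in the project: the function name `smote_sketch`, the tiny 2-D points, and the choice of `k=2` neighbours are all assumptions made for the example. A real project would normally rely on a tested library implementation.

```python
import math
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority example as the starting point.
        base = rng.choice(minority)
        # Find its k nearest minority neighbours (excluding the point itself).
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # Place the new sample at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical minority-class points with two features each.
fraud_points = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote_sketch(fraud_points, n_new=6)
print(new_points)
```

Because every new point lies on a segment between two existing minority points, the synthetic samples stay inside the region already occupied by the minority class.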

Random Under-Sampling Technique

RUS randomly selects examples from the majority class and removes them from the training dataset, continuing until a more balanced class distribution is reached.
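A minimal sketch of the same idea, using only the standard library. The helper `random_under_sample` and the toy records are hypothetical and only meant to show the mechanics. Keeping a random subset of the majority class is equivalent to discarding the rest at random.

```python
import random


def random_under_sample(majority, target_size, seed=0):
    """Keep a random subset of the majority class and discard the rest."""
    if target_size > len(majority):
        raise ValueError("target_size cannot exceed the majority class size")
    rng = random.Random(seed)
    # sample() draws without replacement, so no example is duplicated.
    return rng.sample(majority, target_size)


# Toy data: 1000 genuine records and 10 fraudulent ones.
genuine = [("genuine", i) for i in range(1000)]
fraud = [("fraud", i) for i in range(10)]

kept_genuine = random_under_sample(genuine, target_size=len(fraud))
balanced = kept_genuine + fraud
print(len(kept_genuine), len(fraud))
```

The obvious cost is that the discarded majority examples carry information the model never sees, which is one reason to combine under-sampling with oversampling rather than use it alone.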

Using these two techniques, we balanced the training data: the original training set, with 200,405 genuine transactions and 354 fraudulent transactions, became a balanced dataset with 198,276 examples of each class.
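The counts above imply how much work each technique did. The sketch below is only arithmetic on the quoted numbers, under the assumption that under-sampling removed genuine transactions only and that SMOTE added fraudulent ones only.

```python
# Training-set class counts quoted in the text.
genuine_before, fraud_before = 200_405, 354
per_class_after = 198_276

# Assumption: RUS only removes genuine rows, SMOTE only adds fraud rows.
genuine_removed = genuine_before - per_class_after
fraud_synthesised = per_class_after - fraud_before

# Share of the fraud class in the balanced set that is synthetic.
synthetic_share = fraud_synthesised / per_class_after

print(f"Genuine transactions removed: {genuine_removed}")
print(f"Synthetic fraud examples added: {fraud_synthesised}")
print(f"Synthetic share of fraud class: {synthetic_share:.1%}")
```

Under that assumption, nearly all fraud examples in the balanced training set are synthetic, so the evaluation should be run on untouched test data.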

This is an important step in training a model for a classification task, because it helps prevent the model from becoming biased towards the majority class.

Machine Learning Models

In our study, we implemented two machine learning models: a Random Forest classifier and an XGBoost classifier. The Random Forest model achieved higher precision and a higher F1 score than the XGBoost classifier, while XGBoost achieved higher recall. In other words, XGBoost caught a larger share of the fraudulent transactions, but the Random Forest's fraud predictions were more often correct.

Evaluating the results

The first picture presents the test results of the Random Forest model, and the second presents the results of the XGBoost model. Both include accuracy, recall, precision, and F1 score.

These metrics can be useful for understanding the strengths and weaknesses of a model and for comparing the performance of different models.
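For readers who want to see how these four metrics are derived, here is a small sketch that computes them from true and predicted labels. The function `classification_metrics` and the toy labels are made up for illustration. They are not the project's results.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary labels (1 = fraud)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    # Precision: of the transactions flagged as fraud, how many really were fraud.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of the real fraud cases, how many were flagged.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels, not taken from the project.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]

metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this toy case recall is higher than precision: most fraud cases are caught, but some genuine transactions are flagged as well. This is the same kind of trade-off as the one between the two models compared here.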

Random Forest Test Results

XGBoost Test Results

Conclusion

In this project, we developed a credit card fraud detection model using machine learning techniques. Balancing the training data with oversampling and under-sampling helped improve the performance of our models. The Random Forest model performed best in terms of precision and F1 score, while the XGBoost classifier had the better recall score.

Project resources:

Tech used in this project: Python, scikit-learn, Random Forest, XGBoost
GitHub project link: https://github.com/BoulahiaAhmed/Credit-Card-Fraud-Detection

Late Night Coding

Welcome to my blog!🤗 My name is Ahmed Boulahia, I'm a data scientist with a passion for sharing my knowledge and expertise. You will find some of my projects on this blog. I hope you find them both interesting and informative.

Sunday, January 8
