Dublin Core
Title
Credit Card Fraud Detection Using Machine Learning Algorithms and Data Analysis Techniques
Abstract
In today’s world, the use of card-based and online payment methods is increasing rapidly, and this growth brings cybersecurity and fraud concerns with it. Credit card fraud rates have never been higher, and they continue to rise.
Improving credit card fraud detection systems is therefore a top priority for banks, credit card payment providers and all other participants in the digital payments market. It also matters to the large share of the population that uses credit cards daily, for everything from everyday purchases to high-value international transactions.
The goal was to train multiple models to classify whether a given transaction should be treated as fraudulent, and the results were evaluated with standard machine learning metrics. The best-performing model was an ensemble of Decision Tree, Logistic Regression and K-Nearest Neighbors models, which reached an overall accuracy of 99.91% with a feature selection algorithm applied. An ensemble method combines the predictions of multiple models, aiming to achieve better metrics than any single model. In addition to the ensemble, we trained Logistic Regression, K-Nearest Neighbors, Support Vector Machine and Neural Network models, which reached accuracies of 88.37%, 85.48%, 00.73% and 98.11%, respectively, with feature selection applied.
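The ensemble idea can be illustrated with a minimal scikit-learn sketch. It assumes hard (majority) voting, a `SelectKBest` selector keeping 10 features, feature scaling, and synthetic data from `make_classification`; none of these specific choices (nor the hyperparameters) are stated in the abstract, so this is not the study's actual pipeline.

```python
# Minimal sketch (assumptions: hard voting, SelectKBest with k=10, synthetic
# data). Combines Decision Tree, Logistic Regression and k-NN by majority vote.
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Heavily imbalanced synthetic data: roughly 1% positive ("fraud") class.
X, y = make_classification(n_samples=5000, n_features=30, n_informative=8,
                           weights=[0.99, 0.01], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ("dt", DecisionTreeClassifier(max_depth=5, random_state=0)),
        ("lr", LogisticRegression(max_iter=1000)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
    ],
    voting="hard",  # each model casts one vote; the majority label wins
)

# Scale features (k-NN is distance-based), keep the 10 features with the
# highest ANOVA F-score, then fit the voting ensemble.
model = make_pipeline(StandardScaler(), SelectKBest(f_classif, k=10), ensemble)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(f"accuracy: {accuracy_score(y_test, y_pred):.4f}")
```

Because only about 1% of the synthetic labels are positive, a classifier that always predicts "legitimate" would already score about 99% accuracy, so precision and recall on the fraud class are worth reporting alongside accuracy.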
This research also covers data preprocessing, a crucial step when building a credit card fraud detection model. Such systems must be fast and precise to be usable, because they operate on large, imbalanced datasets.
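The abstract does not say which preprocessing steps were used for the imbalanced data. As one common option, shown here purely as an assumption, random undersampling discards majority-class rows until every class is the same size. The function name `random_undersample` is hypothetical and uses only the Python standard library.

```python
import random
from collections import Counter

def random_undersample(rows, labels, seed=0):
    """Hypothetical helper: randomly drop rows from larger classes until every
    class has as many rows as the smallest class. Returns (rows, labels)."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    n_min = min(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        for row in rng.sample(group, n_min):  # sample without replacement
            out_rows.append(row)
            out_labels.append(label)
    # Shuffle so the classes are interleaved rather than grouped together.
    order = list(range(len(out_rows)))
    rng.shuffle(order)
    return [out_rows[i] for i in order], [out_labels[i] for i in order]

# Toy example: 990 legitimate (0) and 10 fraudulent (1) transactions.
rows = [[float(i)] for i in range(1000)]
labels = [0] * 990 + [1] * 10
bal_rows, bal_labels = random_undersample(rows, labels)
print(Counter(bal_labels))  # 10 rows of each class
```

Resampling of this kind should be applied to the training split only, so that the test set keeps the real class distribution the deployed system would face.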
By the end of the study, readers will have better insight into credit card transactions, will be familiar with different methods for detecting credit card fraud, and will know which model best suits the needs of this use case.
Keywords
credit card, fraud, transaction, machine learning algorithms, classification, dataset preprocessing