]]> Neuromarketing combines neuroscience and marketing to analyze consumer behavior through tools such as electroencephalography (EEG), which captures subconscious and emotional responses. This thesis applies machine learning (ML) techniques to EEG data to predict purchase decisions, addressing the limitations of traditional marketing methods. Using the NeuMa dataset, which includes EEG and eye-tracking data, key features such as frontal alpha asymmetry (FAA), power spectral density (PSD), and alpha-beta power ratios were extracted to build predictive models. Four ML algorithms, Support Vector Machines (SVM), Random Forest (RF), Artificial Neural Networks (ANN), and Convolutional Neural Networks (CNN), were evaluated on accuracy, ROC AUC, and execution time. SVM emerged as the best-performing model, achieving 94.3% accuracy and 99% ROC AUC with efficient processing time, making it suitable for neuromarketing research. The results confirm the critical role of EEG features from the frontal region, particularly FAA and alpha-beta power ratios, in predicting consumer preferences. These metrics reflect emotional and subconscious responses, which underlines their importance in purchase decisions. The study demonstrates the value of integrating EEG with ML for consumer analysis, offering a scalable, unbiased, and data-driven approach to marketing research. By combining neuroscience with modern ML methods, it provides a foundation for improving consumer preference analysis and highlights the potential of EEG-based metrics and ML models to enhance marketing strategies, moving beyond traditional self-report methods toward more objective and accurate insights.
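The frontal features named above can be illustrated with a minimal sketch. It is not the thesis pipeline: it assumes the common convention FAA = ln(right alpha power) - ln(left alpha power), electrodes F3 and F4, an 8 to 13 Hz alpha band and a 13 to 30 Hz beta band, and it estimates band power with a naive DFT periodogram on synthetic signals. All function names and signal parameters are hypothetical.

```python
import cmath
import math


def band_power(signal, fs, low, high):
    """Integrate a naive DFT periodogram of `signal` over low <= f < high (Hz)."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]  # remove the DC offset
    total = 0.0
    for k in range(1, n // 2 + 1):
        freq = k * fs / n
        if low <= freq < high:
            coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                        for t, x in enumerate(centered))
            total += abs(coeff) ** 2 / (fs * n)  # periodogram value at this bin
    return total * fs / n  # multiply by the bin width to integrate


def frontal_alpha_asymmetry(left, right, fs, alpha=(8.0, 13.0)):
    """Assumed convention: ln(right alpha power) - ln(left alpha power)."""
    return (math.log(band_power(right, fs, *alpha))
            - math.log(band_power(left, fs, *alpha)))


def alpha_beta_ratio(signal, fs, alpha=(8.0, 13.0), beta=(13.0, 30.0)):
    """Ratio of alpha-band power to beta-band power for one channel."""
    return band_power(signal, fs, *alpha) / band_power(signal, fs, *beta)


def synthetic_channel(alpha_amp, beta_amp, fs=128, seconds=2):
    """Toy channel: a 10 Hz (alpha) sine plus a 20 Hz (beta) sine."""
    n = fs * seconds
    return [alpha_amp * math.sin(2 * math.pi * 10 * t / fs)
            + beta_amp * math.sin(2 * math.pi * 20 * t / fs)
            for t in range(n)]


FS = 128  # hypothetical sampling rate in Hz
left_f3 = synthetic_channel(alpha_amp=1.0, beta_amp=0.5)
right_f4 = synthetic_channel(alpha_amp=2.0, beta_amp=0.5)

faa = frontal_alpha_asymmetry(left_f3, right_f4, FS)
ratio = alpha_beta_ratio(left_f3, FS)
print(f"FAA: {faa:.3f}, alpha/beta ratio (F3): {ratio:.3f}")
```

In practice a windowed estimator such as Welch's method would replace the naive periodogram, and band limits and electrode pairs vary between studies.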

]]>
We explore random forests and decision trees, using their ability to reveal intricate patterns in real estate data. The research also applies time series modeling to identify and understand the patterns that characterize real estate dynamics over time. The analysis of the SARIMAX, ARIMA, and Holt-Winters time series models shows ARIMA's consistent accuracy, while SARIMAX and Holt-Winters excel in stability and trend capture, respectively. Among the machine learning models, Decision Trees offer interpretability, while Random Forests show lower error rates and higher accuracy. On the US dataset, the Mean Absolute Percentage Error (MAPE) is 3.35% for SARIMAX, 1.66% for ARIMA, and 3.54% for Holt-Winters; Decision Trees reach 2.97% and Random Forests 2.10%. On the BiH dataset, the MAPE is 5.08% for SARIMAX, 1.22% for ARIMA, and 2.17% for Holt-Winters; Decision Trees reach 0.83% and Random Forests 0.82%.]]> Today, with applications widely used in distributed environments, awareness of them is essential to the smooth and effective development of information systems, particularly in fields where large volumes of information must be processed, such as e-business and banking.]]>
Therefore, improving credit card fraud detection systems is the main priority for all banks, for systems that provide credit card-based payments, and for all other participants in the digital payments market. This priority also follows from the large share of the population that uses credit cards daily, for everything from everyday payments to high-value international transactions.

The goal was to train multiple models to determine whether given transactions should be treated as fraudulent, and the results were measured with standard machine learning metrics. The best-performing model was an ensemble of Decision Tree, Logistic Regression, and K-Nearest Neighbors models, which reached an overall accuracy of 99.91% with a feature selection algorithm applied. An ensemble method combines multiple models into one, aiming for better metrics than its individual members achieve. We also trained Logistic Regression, K-Nearest Neighbors, Support Vector Machine, and Neural Network models individually, which reached accuracies of 88.37%, 85.48%, 00.73%, and 98.11%, respectively, with selected features.
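The idea of combining three classifiers can be sketched as follows. The abstract does not say how the ensemble merges its members, so this example assumes hard (majority) voting. The labels and per-model predictions are made-up toy values, not results from the study, and the function names are hypothetical.

```python
from collections import Counter


def majority_vote(*model_predictions):
    """Combine per-model label lists by hard voting: one vote per model per sample."""
    combined = []
    for votes in zip(*model_predictions):
        # most_common(1) returns the label with the highest vote count
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy data: 1 = fraud, 0 = legitimate (illustrative values only).
y_true = [0, 0, 0, 1, 1, 0, 1, 0]
tree_pred = [0, 0, 1, 1, 1, 0, 0, 0]    # hypothetical Decision Tree output
logreg_pred = [0, 1, 0, 1, 0, 0, 1, 0]  # hypothetical Logistic Regression output
knn_pred = [0, 0, 0, 0, 1, 1, 1, 0]     # hypothetical K-Nearest Neighbors output

ensemble_pred = majority_vote(tree_pred, logreg_pred, knn_pred)

for name, pred in [("tree", tree_pred), ("logreg", logreg_pred),
                   ("knn", knn_pred), ("ensemble", ensemble_pred)]:
    print(f"{name}: accuracy = {accuracy(y_true, pred):.2f}")
```

Voting helps here only because the three toy models make their mistakes on different samples; members that err on the same transactions would gain nothing from being combined.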

This research also covers data preprocessing, a crucial step in building a model for a credit card fraud detection system. Such systems must be fast and precise to be usable, because they deal with large, imbalanced datasets.

By the end of the study, readers will have better insight into credit card transactions, will be familiar with the different methods for detecting credit card fraud, and will understand which model best suits the needs of this case.

]]> Enterprise Resource Planning (ERP) systems are of immense importance in simplifying business operations. However, most ERP projects fail owing to their complexity and scope. This research uses machine learning methods to predict the outcomes of ERP projects and to identify the factors that determine whether they fail or succeed. The dissertation collected data on successful and unsuccessful ERP deployments, covering industry, project size, budget and schedule overruns, team experience, and technical challenges, among other aspects.
The research applies machine learning methods such as logistic regression, decision trees, and random forests to assess the importance of the relevant predictors of a project's outcome. By training and testing these models on a sample of both successful and unsuccessful ERP projects, it seeks factors and patterns that could help forecast troubling tendencies. The aim is to devise a functional framework that project managers can use to take action before issuing their project plans for ERP systems. Such a predictive model could significantly reduce ERP failure rates and thereby help businesses carry out successful implementations and improve their returns on technology investment.]]> This proposed study investigates the potential of machine learning methodologies to facilitate early cancer risk assessment by analyzing complex medical datasets. The primary objective is to assess whether machine learning models can reliably identify patients at heightened risk before the disease becomes clinically evident. Through this approach, the study aims to contribute to the development of predictive systems that can trigger early interventions and encourage proactive health monitoring.
The research seeks to answer the core question: “Can machine learning models effectively assess the risk of early-stage cancer using molecular-level data, such as gene expression profiles, prior to the onset of clinical symptoms?”
Sub-questions to be explored include the accuracy of early-stage cancer detection using machine learning, the types of data that most influence prediction performance, and the feasibility of using such models to prompt timely medical evaluations in the absence of traditional diagnostic markers.
The findings are expected to support advancements in personalized medicine by laying the groundwork for tools that assist in identifying high-risk individuals, potentially transforming the current approach to cancer screening and prevention.
]]> Heart strokes remain one of the leading health risks in the world today. Timely prediction can significantly improve patient outcomes and healthcare resource allocation. This study aims to harness machine learning techniques to develop efficient predictive models for the early detection of heart strokes.
The research is based on a dataset created by combining five different datasets. The combined dataset encompasses patient demographics, clinical measurements, and historical medical records. The analysis focused on five machine learning models: Logistic Regression, Decision Tree, K-Nearest Neighbors, Random Forest, and Support Vector Machine.
The goal was not only to test different algorithms, but also to understand how data preparation, feature selection, and model choice impact the final results. The models were trained and tested on both the original dataset and an extended version, where new features were added by combining existing ones.
The results showed that Logistic Regression, Decision Tree, and KNN performed better on the original data. The Decision Tree model achieved an accuracy of 87.8% and an F1 score of 0.881, while Logistic Regression and K-Nearest Neighbors attained F1 scores of 0.850 and 0.849, respectively. Random Forest and SVM, on the other hand, improved significantly with the extended dataset. Random Forest performed best overall, with an F1 score of 0.920 and an accuracy of 91.6% on the extended dataset.
SVM also benefited from the extended dataset, improving its F1 score from 0.879 to 0.892, which highlights how specific models can leverage additional features for improved generalization.
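For readers less familiar with the two metrics reported above, the following minimal sketch shows how accuracy and the F1 score are derived from the confusion counts of a binary classifier. The labels are made-up toy values, not data from the study, and the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary problem with the given positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn


def accuracy_and_f1(y_true, y_pred, positive=1):
    """Accuracy is the share of correct predictions; F1 is the harmonic mean
    of precision and recall for the positive class."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, f1


# Toy labels: 1 = stroke, 0 = no stroke (illustrative values only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy = {acc:.3f}, F1 = {f1:.3f}")
```

Because F1 ignores true negatives, it can differ noticeably from accuracy, which is one reason the two are reported side by side.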
This tool could help detect risks earlier, allowing for timely interventions and prevention, thereby reducing the burden of strokes on healthcare systems and improving patient care. Limitations include data quality and availability, as well as potential bias in healthcare records.]]> Diabetes is a growing global health issue, and early prediction is key to preventing its effects. This thesis develops predictive models for diabetes using several machine learning methods, namely Logistic Regression, Decision Trees, K-Nearest Neighbors (KNN), Random Forest, Support Vector Machine (SVM), and XGBoost, on the Diabetes Health Indicators dataset, which covers clinical, lifestyle, and demographic factors. Feature selection identifies the most important predictors of diabetes, and model performance is evaluated across the classes using accuracy, precision, recall, and F1-score, their macro and weighted averages, and error metrics (MSE and RMSE). SVM and Random Forest performed best overall, each with an accuracy of 0.86. They also performed well on the weighted and macro averages, with overall recall and F1-scores of 0.86. SVM has the highest precision at 0.88, followed by Random Forest at 0.87. Because they handle both classes in a balanced way, these models are dependable for diabetes prediction tasks. Decision Tree, KNN, XGBoost, and Logistic Regression produced weaker results across the same range of metrics.
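The difference between the macro and weighted averages mentioned above can be shown with a minimal sketch. It assumes the usual definitions: the macro average is the unweighted mean of the per-class scores, and the weighted average weights each class by its number of true samples. The labels are made-up toy values, not data from the thesis, and the function names are hypothetical.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """F1 per class via 2*TP / (2*TP + FP + FN), treating each class as positive in turn."""
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in sorted(set(y_true)):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_and_weighted_f1(y_true, y_pred):
    """Macro: plain mean over classes. Weighted: mean weighted by class support."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)  # number of true samples per class
    macro = sum(scores.values()) / len(scores)
    weighted = sum(scores[c] * support[c] for c in scores) / len(y_true)
    return macro, weighted


# Toy imbalanced labels: 1 = diabetic, 0 = non-diabetic (illustrative only).
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 8 + [1, 0]  # one of the two positive cases is missed

macro_f1, weighted_f1 = macro_and_weighted_f1(y_true, y_pred)
print(f"macro F1 = {macro_f1:.3f}, weighted F1 = {weighted_f1:.3f}")
```

On imbalanced data the weighted average is dominated by the majority class, so a gap between the two averages signals weaker performance on the minority class.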

]]>