Machine Learning Based Real-Time Ad Click Fraud Detection and IP Mitigation System

Paper Details
Manuscript ID: 2126-0423-8928
Vol.: 2, Issue: 4, Pages: 135-146, Apr 2026
Subject: Computer Science
Language: English
ISSN: 3068-1995
Online ISSN: 3068-109X
DOI: https://doi.org/10.64823/ijter.2604016
Abstract

The rapid expansion of digital advertising has created a lucrative target for fraudulent actors who exploit the pay-per-click model by generating illegitimate clicks through automated bots, click farms, and malicious scripts. These fraudulent activities distort campaign analytics, exhaust advertiser budgets, and progressively erode trust in online advertising platforms. Rule-based detection systems, which rely on fixed thresholds and manually defined heuristics, are easily circumvented by adversaries who deliberately engineer their traffic to remain below detection limits while still causing meaningful financial damage. This paper presents a modular, data-driven system that addresses this challenge by combining a 21-feature behavioural representation with a trained XGBoost gradient boosting classifier to detect fraudulent clicks in real time. The system handles the severe class imbalance inherent to fraud datasets through cost-sensitive learning and threshold optimization guided by the precision-recall curve. Transparency is embedded at the core: SHAP-based explainability generates per-prediction feature-level rationales that are surfaced directly on the advertiser dashboard, converting opaque model decisions into actionable human-readable insights. The complete solution is implemented as a full-stack web application with a React frontend, Node.js/Express backend, Python Flask ML microservice, and MongoDB data layer. Evaluated on the large-scale TalkingData AdTracking benchmark, the deployed XGBoost model achieves an AUC of 0.9549 and a recall of 0.7673, while the hybrid LightGBM–XGBoost ensemble reaches 0.9815 AUC, demonstrating strong predictive performance and practical deployability.
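
The two imbalance-handling steps named in the abstract, cost-sensitive learning and threshold selection along the precision-recall curve, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names are hypothetical, the F-beta score is an assumed selection criterion (the abstract does not state which one is used), and the negative-to-positive ratio is only a commonly used starting value for a positive-class weight such as XGBoost's `scale_pos_weight` parameter.

```python
from typing import List, Tuple


def positive_class_weight(labels: List[int]) -> float:
    """Ratio of negatives to positives (assumed heuristic for weighting the rare fraud class)."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0:
        raise ValueError("need at least one positive (fraud) label")
    return negatives / positives


def precision_recall_at(
    labels: List[int], scores: List[float], threshold: float
) -> Tuple[float, float]:
    """Precision and recall when every score >= threshold is flagged as fraud."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if s >= threshold and y == 1)
    fp = sum(1 for y, s in pairs if s >= threshold and y == 0)
    fn = sum(1 for y, s in pairs if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def best_threshold(
    labels: List[int], scores: List[float], beta: float = 1.0
) -> Tuple[float, float]:
    """Try each distinct score as a threshold; return (threshold, F-beta) with the highest F-beta."""
    best = (1.0, -1.0)
    for t in sorted(set(scores)):
        p, r = precision_recall_at(labels, scores, t)
        denom = beta**2 * p + r
        f_beta = (1 + beta**2) * p * r / denom if denom else 0.0
        if f_beta > best[1]:  # strict '>' keeps the lowest threshold on ties
            best = (t, f_beta)
    return best


if __name__ == "__main__":
    # Toy data, invented for illustration only: 1 = fraudulent click, 0 = legitimate.
    labels = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
    scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.45, 0.60, 0.70, 0.90]
    print("positive-class weight:", positive_class_weight(labels))
    print("chosen (threshold, F1):", best_threshold(labels, scores))
```

Raising `beta` above 1 favours recall over precision, which may suit settings where a missed fraudulent click costs more than a false alarm; that trade-off is a design choice and is not specified in the abstract.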

Keywords
Ad Click Fraud, XGBoost, LightGBM, SHAP, Feature Engineering, Real-Time Detection, IP Mitigation
Cite this Article

A Kameswara Rao, J Sujay, R Radhika (2026). Machine Learning Based Real-Time Ad Click Fraud Detection and IP Mitigation System. International Journal of Technology & Emerging Research (IJTER), 2(4), 135-146. https://doi.org/10.64823/ijter.2604016

BibTeX
@article{ijter2026212604238928,
  author = {A Kameswara Rao and J Sujay and R Radhika},
  title = {Machine Learning Based Real-Time Ad Click Fraud Detection and IP Mitigation System},
  journal = {International Journal of Technology \& Emerging Research},
  year = {2026},
  volume = {2},
  number = {4},
  pages = {135--146},
  doi = {10.64823/ijter.2604016},
  issn = {3068-109X},
  url = {https://www.ijter.org/article/212604238928/machine-learning-based-real-time-ad-click-fraud-detection-and-ip-mitigation-system},
  abstract = {The rapid expansion of digital advertising has created a lucrative target for fraudulent actors who exploit the pay-per-click model by generating illegitimate clicks through automated bots, click farms, and malicious scripts. These fraudulent activities distort campaign analytics, exhaust advertiser budgets, and progressively erode trust in online advertising platforms. Rule-based detection systems, which rely on fixed thresholds and manually defined heuristics, are easily circumvented by adversaries who deliberately engineer their traffic to remain below detection limits while still causing meaningful financial damage.
  This paper presents a modular, data-driven system that addresses this challenge by combining a 21-feature behavioural representation with a trained XGBoost gradient boosting classifier to detect fraudulent clicks in real time. The system handles the severe class imbalance inherent to fraud datasets through cost-sensitive learning and threshold optimization guided by the precision-recall curve. Transparency is embedded at the core: SHAP-based explainability generates per-prediction feature-level rationales that are surfaced directly on the advertiser dashboard, converting opaque model decisions into actionable human-readable insights. The complete solution is implemented as a full-stack web application with a React frontend, Node.js/Express backend, Python Flask ML microservice, and MongoDB data layer. Evaluated on the large-scale TalkingData AdTracking benchmark, the deployed XGBoost model achieves an AUC of 0.9549 and a recall of 0.7673, while the hybrid LightGBM--XGBoost ensemble reaches 0.9815 AUC, demonstrating strong predictive performance and practical deployability.},
  keywords = {Ad Click Fraud, XGBoost, LightGBM, SHAP, Feature Engineering, Real-Time Detection, IP Mitigation},
  month = apr,
}
Copyright & License

Copyright © 2025. Authors retain the copyright of this article. This article is an open access article distributed under the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.