From Data to Decisions: How Quality Drives Machine Learning Success

Ravikumar Mani Naidu Gunasekaran

doi:10.58812/esiscs.v2i01.1099

Published: Aug 31, 2024

Keywords:

AI Performance; AI Reliability; Big Data Quality; Data Bias; Data Cleansing; Data Consistency; Data Governance; Data Integrity; Data Lifecycle Management; Data Preprocessing; Data Validation; Data-Driven Decisions; Machine Learning Accuracy; Predictive Analytics

Ravikumar Mani Naidu Gunasekaran

Independent researcher, California, United States

Abstract

In the era of data-driven decision-making, machine learning (ML) has emerged as a critical tool for extracting insights and enabling intelligent automation across industries. However, the success of ML models depends fundamentally on the quality of the data used throughout the analytics pipeline. This article explores the relationship between data quality and machine learning performance, emphasizing how data integrity directly affects model accuracy, reliability, and fairness. Key dimensions of data quality, including accuracy, completeness, consistency, and timeliness, are examined in the context of real-world ML applications. The article further discusses common data challenges such as missing values, noise, bias, and data drift, highlighting their implications for predictive outcomes. Additionally, it presents practical approaches to improving data quality through data preprocessing, validation, governance frameworks, and automated monitoring systems. By bridging the gap between raw data and actionable insights, this study underscores that high-quality data is not merely a prerequisite but a strategic enabler of successful machine learning initiatives. Organizations that prioritize data integrity can build more robust models, make better decisions, and sustain a competitive advantage in an increasingly data-centric world.
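
As an illustration of the quality dimensions and challenges named above, the following is a minimal sketch, not the article's own method, of how completeness, consistency, timeliness, and data drift might be measured on a small batch of records. The field names (`id`, `age`, `updated`), the one-year staleness window, and the helper functions are hypothetical and chosen only for this example; the drift signal is a simple mean shift, one of many possible measures.

```python
from datetime import datetime, timedelta
from statistics import mean, pstdev


def completeness(rows, fields):
    """Fraction of non-missing (non-None) values for each field."""
    total = len(rows)
    return {f: sum(1 for r in rows if r.get(f) is not None) / total for f in fields}


def duplicate_count(rows, key):
    """Rows whose key repeats an earlier row (a basic consistency check)."""
    seen, dupes = set(), 0
    for r in rows:
        k = r.get(key)
        if k in seen:
            dupes += 1
        else:
            seen.add(k)
    return dupes


def stale_fraction(rows, ts_field, now, max_age):
    """Fraction of rows last updated more than max_age ago (timeliness)."""
    return sum(1 for r in rows if now - r[ts_field] > max_age) / len(rows)


def mean_shift(reference, current):
    """Drift signal: shift of the current mean, in reference standard deviations."""
    sd = pstdev(reference)
    diff = abs(mean(current) - mean(reference))
    if sd == 0:
        return 0.0 if diff == 0 else float("inf")
    return diff / sd


# Hypothetical records: one missing age, one repeated id, one stale row.
now = datetime(2024, 8, 31)
rows = [
    {"id": 1, "age": 34, "updated": now - timedelta(days=1)},
    {"id": 2, "age": None, "updated": now - timedelta(days=400)},
    {"id": 2, "age": 51, "updated": now - timedelta(days=3)},
    {"id": 3, "age": 29, "updated": now - timedelta(days=2)},
]

report = {
    "completeness": completeness(rows, ["id", "age"]),
    "duplicates": duplicate_count(rows, "id"),
    "stale": stale_fraction(rows, "updated", now, timedelta(days=365)),
    # Training-time feature values versus values seen later in production.
    "drift": mean_shift([10.0, 12.0, 11.0, 13.0], [15.0, 16.0, 14.0, 17.0]),
}
print(report)
```

In a monitoring setting, each of these numbers would typically be compared against a threshold agreed under the organization's governance framework; the appropriate thresholds depend on the application and are not specified here.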

How to Cite

Gunasekaran, R. M. N. (2024). From Data to Decisions: How Quality Drives Machine Learning Success. The Eastasouth Journal of Information System and Computer Science, 2(01), 120–130. https://doi.org/10.58812/esiscs.v2i01.1099

Issue

Vol. 2 No. 01 (2024): The Eastasouth Journal of Information System and Computer Science (ESISCS)

Section

Articles

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
