Autonomous Network Troubleshooting with AIOps and Machine Learning: A Data-Driven Architecture for Correlation, Prediction, and Remediation

Mohit Bajpai

Abstract

Modern networks carry business-critical traffic across data centers, cloud platforms, branch locations, APIs, telecom circuits, security gateways, and software-defined overlays. In this environment, traditional troubleshooting is no longer limited by the availability of alarms; it is limited by the volume, fragmentation, and operational interpretation of telemetry. This updated paper presents an expanded AIOps and machine learning framework for automating network troubleshooting through telemetry ingestion, data enrichment, anomaly detection, event correlation, root-cause ranking, predictive risk scoring, and governed closed-loop remediation. The paper extends the original architecture by adding data governance, model lifecycle management, explainability, human approval gates, operational KPIs, and security controls. It also introduces a telecommunications implementation scenario in which Remedy tickets, Kafka streams, Kong APIs, topology data, and AIOps model outputs are combined to reduce alert noise, accelerate diagnosis, and improve network resilience. The proposed approach is not intended to replace network engineers; rather, it converts repetitive investigation patterns into repeatable, auditable, and continuously improving operational workflows.
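
The abstract names the pipeline stages but not the algorithms behind them. The following is a minimal sketch, not the author's implementation, of how anomaly detection, event correlation, root-cause ranking, and a human approval gate could chain together. It makes several assumptions. Detection is a simple z-score against a per-device baseline. Correlation groups anomalies that share an upstream device within a time window. Ranking favors the device with the most anomalous devices downstream of it. The topology map `UPSTREAM`, the device names, the thresholds, and every function name are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean, pstdev

# Hypothetical topology: device -> upstream (parent) device; None marks a root.
UPSTREAM = {"branch-sw1": "core-rtr1", "branch-sw2": "core-rtr1", "core-rtr1": None}


@dataclass
class Sample:
    device: str
    metric: str
    value: float
    ts: int  # epoch seconds


def detect_anomalies(history, current, z_threshold=3.0):
    """Flag samples whose z-score against the per-(device, metric) baseline exceeds z_threshold."""
    flagged = []
    for s in current:
        baseline = history.get((s.device, s.metric), [])
        if len(baseline) < 2:
            continue  # too little history to form a baseline
        mu, sigma = mean(baseline), pstdev(baseline)
        if sigma == 0:
            is_anomalous = s.value != mu  # flat baseline: any change is unusual
        else:
            is_anomalous = abs(s.value - mu) / sigma > z_threshold
        if is_anomalous:
            flagged.append(s)
    return flagged


def topology_root(device):
    """Walk upstream links until reaching a device with no parent."""
    while UPSTREAM.get(device):
        device = UPSTREAM[device]
    return device


def correlate(anomalies, window=300):
    """Group anomalies that share a topology root and start within `window` seconds of the incident."""
    incidents = []
    for a in sorted(anomalies, key=lambda s: s.ts):
        root = topology_root(a.device)
        for inc in incidents:
            if inc["root"] == root and a.ts - inc["start"] <= window:
                inc["events"].append(a)
                break
        else:  # no matching incident: open a new one
            incidents.append({"root": root, "start": a.ts, "events": [a]})
    return incidents


def downstream_count(device, devices):
    """Count devices in `devices` that sit anywhere below `device` in the topology."""
    count = 0
    for other in devices:
        node = UPSTREAM.get(other)
        while node:
            if node == device:
                count += 1
                break
            node = UPSTREAM.get(node)
    return count


def rank_root_causes(incident):
    """Rank candidates: devices that explain the most downstream anomalies come first."""
    devices = {e.device for e in incident["events"]}
    return sorted(devices, key=lambda d: (-downstream_count(d, devices), d))


def propose_remediation(incident, risk_score, auto_threshold=0.2, approved=False):
    """Approval gate: release only low-risk or human-approved actions; nothing is executed here."""
    target = rank_root_causes(incident)[0]
    action = f"run diagnostics on {target}"
    if risk_score <= auto_threshold or approved:
        return {"action": action, "status": "released"}
    return {"action": action, "status": "pending_approval"}


history = {
    ("core-rtr1", "cpu"): [20, 22, 21, 19, 23],
    ("branch-sw1", "loss"): [0.1, 0.0, 0.2, 0.1, 0.1],
    ("branch-sw2", "loss"): [0.0, 0.1, 0.1, 0.2, 0.1],
}
current = [
    Sample("core-rtr1", "cpu", 95.0, 1000),
    Sample("branch-sw1", "loss", 8.0, 1030),
    Sample("branch-sw2", "loss", 7.5, 1060),
]
incidents = correlate(detect_anomalies(history, current))
print(len(incidents))  # 1
print(rank_root_causes(incidents[0]))  # ['core-rtr1', 'branch-sw1', 'branch-sw2']
print(propose_remediation(incidents[0], risk_score=0.7)["status"])  # pending_approval
```

Correlating on a shared upstream device collapses the three raw alerts into one incident with a single top-ranked candidate. This is the kind of alert-noise reduction the abstract describes. A production deployment would replace the z-score with trained models and route the pending action into the ticketing and approval workflow rather than returning a dictionary.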

How to Cite
Bajpai, M. (2024). Autonomous Network Troubleshooting with AIOps and Machine Learning: A Data-Driven Architecture for Correlation, Prediction, and Remediation. The Eastasouth Journal of Information System and Computer Science, 2(02), 275–283. https://doi.org/10.58812/esiscs.v2i02.1101