The Future of SRE and Observability: Leveraging AI, Automation, and Culture for Resilience


Vasudevan Senathi Ramdoss

Abstract

Today’s software systems are more complex than ever, which makes resilient engineering practices essential for the teams that build and operate them. This paper examines how Site Reliability Engineering (SRE) and observability are evolving with new technologies such as AI, predictive analytics, and automation, and how these tools help teams build systems that are reliable, scalable, and efficient. Keeping pace requires more than modern tooling: organizations must also rethink their culture and make reliability a responsibility shared across teams. In this view, SRE and observability are not only technical solutions but also a way to align teams around common goals. The paper also emphasizes continuous improvement and the need to adapt as technology and user demands change.

How to Cite
Ramdoss, V. S. (2023). The Future of SRE and Observability: Leveraging AI, Automation, and Culture for Resilience. The Eastasouth Journal of Information System and Computer Science, 1(01), 60–64. https://doi.org/10.58812/esiscs.v1i01.434
