Performance Evaluation of Machine Learning Inference Workloads in Containerized Cloud Computing Environments
Abstract
Machine learning (ML) systems are increasingly deployed in cloud-native environments where scalability, portability, and resource efficiency are essential. Containerization with Docker and Kubernetes is often the deployment method of choice for ML inference services, since containers make it straightforward to scale, migrate, and efficiently pack workloads. However, the performance of ML inference services inside containerized cloud environments remains underexplored. This study evaluates the inference performance of several ML models in such an environment and identifies the major factors that affect it. The models are implemented with Python-based frameworks and deployed as microservices in Docker containers, and experiments are performed by sending simultaneous prediction requests from multiple users to the deployed models. The study establishes baseline benchmarks that quantify the impact of containerization on inference speed and efficiency. These results offer practical guidance for building scalable AI systems and lay the groundwork for future work, such as optimizing ML deployment pipelines, incorporating privacy-preserving inference techniques, and improving container orchestration for AI workloads.
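The abstract does not include the study's code, but the experiment design it describes, many simultaneous users issuing prediction requests against a deployed Python model and recording latency, can be sketched as follows. The linear stand-in model, the thread counts, and the latency statistics are illustrative assumptions, not the paper's actual setup; in the study the models sit behind Docker-containerized microservices rather than a local function call.

```python
from concurrent.futures import ThreadPoolExecutor
import statistics
import time

# Stand-in for a deployed model's predict endpoint: a tiny linear scorer.
# (Assumption for illustration; the study deploys real Python-based ML
# models as Dockerized microservices queried over the network.)
WEIGHTS = [0.4, -0.2, 0.1, 0.3]

def predict(features):
    return sum(w * x for w, x in zip(WEIGHTS, features)) > 0.5

def timed_request(features):
    """One simulated user request; returns its latency in milliseconds."""
    start = time.perf_counter()
    predict(features)
    return (time.perf_counter() - start) * 1000.0

def run_benchmark(n_users=32, requests_per_user=10):
    """Send simultaneous prediction requests from multiple simulated users,
    mirroring the experiment design described in the abstract."""
    payloads = [[0.1, 0.2, 0.3, 0.4]] * (n_users * requests_per_user)
    with ThreadPoolExecutor(max_workers=n_users) as pool:
        latencies = list(pool.map(timed_request, payloads))
    return {
        "requests": len(latencies),
        "mean_ms": statistics.mean(latencies),
        "max_ms": max(latencies),
    }
```

Replacing the local `predict` call with an HTTP request to a container's exposed port would turn this into an end-to-end benchmark of the containerized service, which is the kind of measurement the baseline benchmarks above are built from.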
Article Details

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.