The Use of Chaos Engineering to Enhance the Resilience of Microservice Architectures: A Case Study of "Online Boutique"
pdf (Georgian)

Keywords

Chaos Engineering
Microservices
Kubernetes
DevOps
Prometheus
Grafana

How to Cite

Kuchava, G., Kartvelishvili, I., & Vashalomidze, S. (2026). The Use of Chaos Engineering to Enhance the Resilience of Microservice Architectures: A Case Study of "Online Boutique". International Scientific-Practical Conference: „Modern Challenges and Achievements in Information and Communication Technologies“ Transactions, 4, 469-473. https://papers.4science.ge/index.php/mcaaict/article/view/456

Abstract

Modern cloud-native microservice architectures, despite their flexibility and scalability, are prone to latent failures, cascading errors, and unpredictable system behavior. This article applies chaos engineering, a proactive DevOps practice, to Google’s “Online Boutique” application deployed on Google Kubernetes Engine (GKE). Fault-injection experiments run with Chaos Mesh (CPU Hog, Memory Hog, Network Latency, Packet Loss) revealed critical vulnerabilities, including “silent failures,” memory exhaustion (OOMKilled), and inadequate timeout/retry mechanisms. Guided by KPIs collected with Prometheus and Grafana (response time, error rate, CPU/memory usage) under user traffic simulated with Locust, the system was optimized by implementing the Horizontal Pod Autoscaler (HPA), adjusting resource limits, and strengthening timeout/retry mechanisms. Repeated experiments demonstrated statistically significant improvements in system resilience, underscoring the importance of chaos engineering as an integral part of DevOps. The article provides practical recommendations for improving system reliability and integrating chaos engineering into the software development life cycle (SDLC).
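As an illustration of the kind of timeout/retry hardening the abstract mentions, the following is a minimal Python sketch of a bounded retry with exponential backoff and jitter. It is not the authors' implementation: the function name `call_with_retry`, its parameters, and the default values are hypothetical, and the wrapped `operation` is assumed to enforce its own timeout and raise `TimeoutError` or `ConnectionError` on a transient failure.

```python
import random
import time


def call_with_retry(operation, attempts=3, base_delay=0.1, max_delay=2.0,
                    retry_on=(TimeoutError, ConnectionError), sleep=time.sleep):
    """Run `operation`, retrying transient failures with backoff and jitter.

    Hypothetical helper for illustration only. `operation` is a zero-argument
    callable that is assumed to apply its own timeout.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                # Retry budget exhausted: re-raise so the failure is visible
                # to the caller rather than silently swallowed.
                raise
            # Exponential backoff capped at max_delay; a random wait in
            # [0, delay] avoids synchronized retries from many clients.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
```

Capping the number of attempts and re-raising the last error keeps a degraded dependency from turning into a silent failure, while the randomized delay limits the extra load that retries place on a service that is already under stress.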

References

Demchenko, N. (2025). Optimizing DevOps Efficiency Using Chaos Engineering. Business and Technology University. 80 p. (In Georgian)

Basiri, A., Behnam, N., de Rooij, R., Hochstein, L., Kosewski, L., Reynolds, J., & Rosenthal, C. (2016). Chaos Engineering: Building Confidence in System Behavior through Controlled Experiments. Netflix Technology Blog.

Rosenthal, C., & Jones, N. (2020). Chaos Engineering: System Resiliency in Practice. O’Reilly Media. 267 p.

Zhang, X., Liu, Y., & Wang, J. (2022). Performance Optimization in Kubernetes-Based Microservices. Journal of Cloud Computing, 11(3), 45–56.