Abstract
Modern cloud-native microservice architectures, despite their flexibility and scalability, often face complex challenges such as latent failures, cascading errors, and unpredictable system behavior. This article explores the application of chaos engineering, a proactive DevOps practice, using the example of Google’s “Online Boutique” application deployed on Google Kubernetes Engine (GKE). Experiments conducted with Chaos Mesh (CPU Hog, Memory Hog, Network Latency, Packet Loss) revealed critical vulnerabilities, including “silent failures,” memory exhaustion (OOMKilled pod terminations), and inadequate timeout/retry mechanisms. Key performance indicators (response time, error rate, CPU and memory usage) were measured with Prometheus and Grafana under user traffic simulated with Locust, and the system was then optimized: the Horizontal Pod Autoscaler (HPA) was introduced, resource limits were adjusted, and timeout/retry mechanisms were strengthened. Repeated experiments demonstrated statistically significant improvements in system resilience, underscoring the value of chaos engineering as an integral part of DevOps. The article concludes with practical recommendations for enhancing system reliability and integrating chaos engineering into the software development life cycle (SDLC).
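As a concrete illustration of the load-generation step mentioned above, the following is a minimal Locust sketch, not taken from the article itself; the request paths, task weights, and wait times are assumptions chosen for the Online Boutique frontend.

```python
# Minimal Locust sketch for simulating user traffic against the Online Boutique
# frontend during chaos experiments; paths and weights are illustrative assumptions.
from locust import HttpUser, task, between


class BoutiqueUser(HttpUser):
    # Each simulated user waits 1-3 seconds between requests.
    wait_time = between(1, 3)

    @task(3)
    def browse_home(self):
        # Load the storefront landing page.
        self.client.get("/")

    @task(1)
    def view_cart(self):
        # Open the shopping cart page (path assumed from the demo frontend).
        self.client.get("/cart")
```

Run, for example, with `locust -f locustfile.py --host http://<frontend-external-ip>`, where the host is the externally exposed address of the frontend service on GKE; response times and error rates observed here can then be correlated with the Prometheus/Grafana KPIs during each fault injection.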