
Chaos Engineering: Not so Chaotic

by | 20.06.2021 | Engineering

Cloud computing and DevOps can feel very complex when we talk about them at length. Some topics only look complicated, though, and become simple once we understand the concept behind them. Today, we will discuss one such topic that sounds complex but is simple: Chaos Engineering.

When we think of cloud computing, we think of servers, data, zones, regions and so on. These things also bring inevitable headaches such as network outages and power failures. Chaos engineering comes to the rescue by helping systems withstand such failures. So, if you are new to chaos engineering, you have come to the right place. Without further ado, let’s roll.

What is Chaos Engineering?

In short, chaos engineering is the discipline of experimenting on a system to determine whether it can keep working during disturbances. The experiments build confidence in the system’s ability to withstand specific failures, such as network or power outages.

With Chaos Engineering, we intentionally break parts of a system to see how it performs when an individual component fails. We apply specific stresses to uncover potential outages, locate weaknesses and improve the system’s resiliency. We can also simulate traffic spikes and other unpredictable situations.
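To make the idea of breaking one component concrete, here is a minimal sketch. It assumes a toy service made of interchangeable replicas; the ReplicaPool class and the replica names are hypothetical and only model the idea, they do not belong to any real tool.

```python
import random


class ReplicaPool:
    """A toy service made of interchangeable replicas (hypothetical model)."""

    def __init__(self, names):
        self.healthy = set(names)

    def kill_random(self, rng):
        """Chaos step: take one healthy replica out of service."""
        victim = rng.choice(sorted(self.healthy))
        self.healthy.discard(victim)
        return victim

    def handle_request(self):
        """Serve from any healthy replica; fail only if none are left."""
        if not self.healthy:
            raise RuntimeError("no healthy replicas")
        return "ok"


def run_experiment(replica_count, seed=0):
    """Kill one replica, then check whether the service still answers."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    pool = ReplicaPool(f"replica-{i}" for i in range(replica_count))
    victim = pool.kill_random(rng)
    try:
        pool.handle_request()
        survived = True
    except RuntimeError:
        survived = False
    return victim, survived


if __name__ == "__main__":
    for count in (3, 1):
        victim, survived = run_experiment(count)
        print(f"{count} replica(s): killed {victim}, still serving: {survived}")
```

With three replicas the service keeps answering after one is killed. With a single replica it does not, which is exactly the kind of weakness an experiment like this is meant to expose.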

How Does Chaos Engineering Work?

Now that we have seen what chaos engineering is, let us find out how it works. Chaos engineering involves several steps, and we are going to have a look at each of them.

Defining Steady-State Hypothesis

The first thing we have to do is describe the system’s steady state, meaning its normal, measurable behaviour, and form a hypothesis about where complications can occur. Then, we inject that failure into the system and wait for the outcome.
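A steady-state hypothesis is easier to work with when it is written down as numbers. The sketch below is one possible way to do that; the SteadyState class, its field names and the limits (1% errors, 300 ms) are assumptions chosen for illustration, not values from any standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SteadyState:
    """Normal behaviour expressed as measurable limits (hypothetical values)."""

    max_error_rate: float      # fraction of failed requests, 0.0 to 1.0
    max_p95_latency_ms: float  # 95th percentile latency in milliseconds

    def holds(self, error_rate, p95_latency_ms):
        """True when the observed numbers stay inside both limits."""
        return (error_rate <= self.max_error_rate
                and p95_latency_ms <= self.max_p95_latency_ms)


# Hypothesis: even with a fault injected, the system stays inside these limits.
hypothesis = SteadyState(max_error_rate=0.01, max_p95_latency_ms=300.0)

if __name__ == "__main__":
    print(hypothesis.holds(error_rate=0.002, p95_latency_ms=180.0))  # True
    print(hypothesis.holds(error_rate=0.050, p95_latency_ms=180.0))  # False
```

Writing the limits down before the experiment keeps the result honest: the hypothesis either holds or it does not.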

Simulating Real-World Events

Simulating real-world events simply means testing the system with realistic scenarios, such as a network outage, and monitoring how it performs under those stressful circumstances.
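One simple way to simulate such events in code is to wrap a call so that it sometimes fails or slows down. The following is a minimal sketch under that assumption; with_chaos, InjectedFault, fetch_profile and fetch_with_fallback are hypothetical names invented for this example.

```python
import random
import time


class InjectedFault(Exception):
    """Raised in place of a real failure during an experiment."""


def with_chaos(func, failure_rate, extra_latency_s=0.0, seed=None):
    """Wrap func so that calls are delayed and fail with the given probability."""
    rng = random.Random(seed)

    def wrapper(*args, **kwargs):
        time.sleep(extra_latency_s)      # simulate a slow network
        if rng.random() < failure_rate:  # simulate an outage
            raise InjectedFault(f"injected failure in {func.__name__}")
        return func(*args, **kwargs)

    return wrapper


def fetch_profile(user_id):
    """Stand-in for a real downstream call (hypothetical)."""
    return {"id": user_id, "name": "demo"}


def fetch_with_fallback(fetch, user_id):
    """The behaviour under test: degrade gracefully instead of crashing."""
    try:
        return fetch(user_id)
    except InjectedFault:
        return {"id": user_id, "name": "unknown"}


if __name__ == "__main__":
    flaky = with_chaos(fetch_profile, failure_rate=0.5, seed=42)
    print([fetch_with_fallback(flaky, i)["name"] for i in range(10)])
```

The caller never sees an exception: every request gets either the real answer or the fallback, which is the behaviour the experiment is checking for.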

Confirming the Steady-State

This step is straightforward: we note the changes that occurred during the experiment, which gives us an insight into the system’s behaviour.

Collecting Metrics and Observing Dashboard

In this step, we collect the metrics (basically, the system’s performance) by observing the dashboard. These metrics indicate how customers would be affected and help us measure the failure against our hypothesis.
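As a rough illustration of measuring a failure against a hypothesis, the sketch below summarises request samples into an error rate and a 95th percentile latency. The sample data and the limits are invented for the example, and the percentile uses the simple nearest-rank method.

```python
import math


def error_rate(samples):
    """Fraction of samples whose request failed."""
    failures = sum(1 for ok, _ in samples if not ok)
    return failures / len(samples)


def p95_latency(samples):
    """95th percentile latency using the nearest-rank method."""
    latencies = sorted(ms for _, ms in samples)
    rank = math.ceil(95 * len(latencies) / 100)
    return latencies[rank - 1]


def summarise(samples):
    """Collapse raw samples into the two numbers we compare."""
    return {"error_rate": error_rate(samples), "p95_ms": p95_latency(samples)}


# Each sample is (succeeded, latency_ms); the numbers are invented.
baseline = [(True, 100 + i) for i in range(20)]
during_fault = [(i % 5 != 0, 100 + 20 * i) for i in range(20)]

if __name__ == "__main__":
    limits = {"error_rate": 0.01, "p95_ms": 300}  # hypothetical hypothesis
    for label, samples in (("baseline", baseline), ("during fault", during_fault)):
        observed = summarise(samples)
        held = all(observed[key] <= limits[key] for key in limits)
        print(label, observed, "hypothesis held:", held)
```

In this made-up data the baseline stays inside the limits, while the run with the fault shows a 20% error rate and a p95 of 460 ms, so the hypothesis is rejected and there is something to fix.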

Making Changes and Fixing Issues

After running the experiment, we have an idea of what needs to be changed, and we can improve the system. In addition, we can now identify what would lead to an outage or break the system.

Process of Chaos Engineering (Source: Medium)

Advantages of Chaos Engineering

Reducing Failures

Chaos Engineering helps reduce system failures by testing, through various experiments, whether the system can keep running during network outages or other failures.

Improving Durability

As Chaos Engineering reduces failures, it improves the system’s overall durability, because the system becomes able to tolerate unpredictable conditions.

Improving Service Availability

We can gain new insights about an application through chaos engineering. These insights help improve the application’s service, keep it available during network chaos and leave room for future improvements.

Preventing Revenue Loss

Chaos Engineering helps prevent revenue loss because it identifies the causes of a network outage or failure in advance, before they cost the business money.

Lower Maintenance Costs

Chaos Engineering helps lower maintenance costs because it tests your system before it runs in production, so problems are found early and the system behaves better under unpredictable conditions.

Disadvantages of Chaos Engineering

Takes an Extended Amount of Time

One of the significant weaknesses of chaos engineering is that the process is long. That costs a company a lot of time because it can hold back the deployment of an application.

Trial and Error Process

To gain proper insight into an application through Chaos Engineering, we have to run many fault-injection experiments, which lengthens the process. Moreover, we have to repeat them again and again to check all the possibilities.


Not all applications can withstand being broken intentionally, and as a result, we can’t apply Chaos Engineering to all applications.

Tools of Chaos Engineering

Now, let us look at some of the tools we can use for Chaos Engineering.

Chaos Monkey

Chaos Monkey is the original Chaos Engineering tool, created at Netflix in 2010. It is still a go-to chaos testing tool.


Gremlin

Gremlin is one of the most popular chaos testing tools. Its free version helps in simulating high CPU load and shutting down machines.

Chaos Toolkit

Chaos Toolkit is an open-source initiative that makes chaos testing more accessible. It has an open API, and experiments are described in a standard JSON format.
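To give a feel for the JSON format, the sketch below builds a small experiment as a Python dictionary and serialises it. The key names follow the Chaos Toolkit experiment format as I understand it, so please check them against the official documentation before relying on them. The URL, the script path and the probe and action names are placeholders invented for this example.

```python
import json

# An illustrative experiment; the URL and script path are placeholders.
experiment = {
    "version": "1.0.0",
    "title": "Service stays reachable when a worker process is stopped",
    "description": "Illustrative only: endpoint and script are hypothetical.",
    "steady-state-hypothesis": {
        "title": "Health endpoint answers with HTTP 200",
        "probes": [
            {
                "type": "probe",
                "name": "health-endpoint-responds",
                "tolerance": 200,  # expected HTTP status code
                "provider": {
                    "type": "http",
                    "url": "http://localhost:8080/health",
                },
            }
        ],
    },
    "method": [
        {
            "type": "action",
            "name": "stop-worker",
            "provider": {"type": "process", "path": "./stop-worker.sh"},
        }
    ],
    "rollbacks": [],
}


def to_json():
    """Serialise the experiment the way it would be stored in a file."""
    return json.dumps(experiment, indent=2)


if __name__ == "__main__":
    print(to_json())
```

Saved to a file such as experiment.json, a declaration like this is what the toolkit’s command-line runner is meant to execute, checking the hypothesis before and after the actions in the method.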


Pumba

Pumba is a chaos testing and network emulation tool for Docker.


Litmus

Litmus is a chaos engineering tool for stateful workloads on Kubernetes.


Value Proposition of Chaos Engineering

From the discussion above, we can now understand how chaos engineering works. Businesses should consider implementing Chaos Engineering before deploying their applications.

As we have seen, Chaos Engineering requires failure injection, so it suits applications that are built to withstand such experiments. Large businesses build applications that need many machines and a lot of data, and those applications benefit greatly from chaos testing. So, Chaos Engineering is highly advisable before deploying big applications, which would suffer the most during an outage or failure.

Small and mid-cap businesses can try out Chaos Engineering with free tools, since their applications are comparatively smaller and budget can be an issue.

Final Thoughts

We now know the fundamentals of Chaos Engineering, and we have discussed how we can use and implement it. If you ask me, I would run chaos tests on my application with open-source tools, because it pays off in the long run and gives my application stability during any outage. So, hopefully, we have learned a lot about Chaos Engineering and will apply it too.


Happy Learning!

