Chaos Engineering: Not so Chaotic

Published 20.06.2021

Categories Engineering

Tags devops

Cloud computing and developer operations can feel very complex when we talk about them at length. Yet some things that look complicated turn out to be simple once we understand the underlying concepts. Today, we will discuss one such thing that sounds complex but is simple: Chaos Engineering.

When cloud computing comes to mind, we think about things like servers, data, zones and regions. These things also bring inevitable headaches such as network outages and power failures. Chaos engineering comes to the rescue by helping systems withstand those failures. So, if you are new to chaos engineering, you have come to the right place. Without further ado, let’s roll.

What is Chaos Engineering?

In short, chaos engineering is the discipline of experimenting on a system to determine whether it can keep working during disturbances. The goal is to build confidence in the system’s ability to cope when network or power failures occur.

With Chaos Engineering, we intentionally break parts of a system to see how it performs when an individual component fails. We apply specific stresses to uncover potential outages, locate weaknesses and improve the system’s resiliency. We can also observe how the system handles traffic spikes and other unpredictable situations.

How Does Chaos Engineering Work?

Now that we have seen what chaos engineering is, let us find out how it works. Several steps are involved, and we are going to look at each of them.

Defining Steady-State Hypothesis

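The first thing we have to do is define the steady state: the measurable behaviour that tells us the system is working normally. Then we form an idea of where complications can occur, inject that failure into the system and wait for the outcome, expecting the steady state to hold.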
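As a minimal sketch, a steady-state hypothesis can be written down as a set of measurable thresholds. The class name, the two metrics and the threshold values below are illustrative assumptions, not part of any particular tool:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SteadyState:
    """Hypothetical definition of 'normal' behaviour for a service."""

    max_error_rate: float      # allowed fraction of failed requests (0.01 = 1%)
    max_p95_latency_ms: float  # allowed 95th-percentile latency in milliseconds

    def holds(self, error_rate: float, p95_latency_ms: float) -> bool:
        """Return True if the observed metrics stay within both thresholds."""
        return (
            error_rate <= self.max_error_rate
            and p95_latency_ms <= self.max_p95_latency_ms
        )


# Assumed thresholds, chosen only for illustration.
hypothesis = SteadyState(max_error_rate=0.01, max_p95_latency_ms=300.0)

within_limits = hypothesis.holds(error_rate=0.004, p95_latency_ms=210.0)
too_many_errors = hypothesis.holds(error_rate=0.050, p95_latency_ms=210.0)
```

Writing the hypothesis as numbers makes the later steps easier, because the outcome of an experiment becomes a simple yes or no.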

Simulating Real-World Events

Simulating real-world events means testing the system against realistic scenarios, such as a network outage or a traffic spike, and monitoring how it performs under those stressful circumstances.
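One simple way to picture fault injection is a wrapper that makes a call fail some of the time. This is only a sketch: `with_faults`, `InjectedFault` and `fetch_profile` are hypothetical names, and real chaos tools inject faults at the infrastructure level rather than inside application code.

```python
import random


class InjectedFault(Exception):
    """Raised when the wrapper simulates an outage."""


def with_faults(func, failure_rate, rng):
    """Wrap func so that a fraction of calls (failure_rate) raise InjectedFault."""
    def wrapper(*args, **kwargs):
        if rng.random() < failure_rate:
            raise InjectedFault("simulated outage")
        return func(*args, **kwargs)
    return wrapper


def fetch_profile(user_id):
    """Stand-in for a real call to a dependency."""
    return {"id": user_id, "name": "example"}


# A seeded generator keeps the experiment repeatable.
rng = random.Random(42)
flaky_fetch = with_faults(fetch_profile, failure_rate=0.3, rng=rng)

outcomes = []
for user_id in range(1000):
    try:
        flaky_fetch(user_id)
        outcomes.append(True)
    except InjectedFault:
        outcomes.append(False)

observed_failure_rate = outcomes.count(False) / len(outcomes)
print(f"observed failure rate: {observed_failure_rate:.2f}")
```

The caller's reaction to `InjectedFault` is what the experiment is really observing.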

Confirming the Steady-State

This step is straightforward: we note down the changes that occurred during the experiment, which gives us an insight into the system’s behaviour.

Collecting Metrics and Observing Dashboard

In this step, we collect the metrics (basically, the system’s performance) by observing the dashboard. These metrics reflect what customers experience and help us measure the failure against our hypothesis.
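To make "measuring against the hypothesis" concrete, here is a small sketch that turns raw samples into two metrics and compares them with thresholds. The sample data and the threshold values are made up for illustration, and the percentile uses the simple nearest-rank method.

```python
import math


def error_rate(successes):
    """Fraction of failed requests, given a list of True/False outcomes."""
    return sum(1 for ok in successes if not ok) / len(successes)


def percentile(values, pct):
    """Nearest-rank percentile: the value at rank ceil(pct% of n) in sorted order."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Made-up samples standing in for data read from a dashboard.
latencies_ms = [120, 135, 150, 160, 180, 210, 240, 260, 310, 900]
successes = [True] * 9 + [False]

observed_error_rate = error_rate(successes)
observed_p95 = percentile(latencies_ms, 95)

# Assumed thresholds from the steady-state hypothesis.
MAX_ERROR_RATE = 0.05
MAX_P95_LATENCY_MS = 500

hypothesis_holds = (
    observed_error_rate <= MAX_ERROR_RATE
    and observed_p95 <= MAX_P95_LATENCY_MS
)
print("hypothesis holds" if hypothesis_holds else "hypothesis violated")
```

With these samples the hypothesis is violated, which is exactly the kind of result that tells us where the system needs work.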

Making Changes and Fixing Issues

After running the experiment, we will have an idea of what needs to be changed to improve the system. In addition, we can now identify what would lead to an outage and break the system.
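What the fix looks like depends entirely on what the experiment revealed. As one hedged example, if a dependency turned out to fail briefly and take the whole request down with it, a retry with a fallback value might be enough. The names `call_with_retry` and `FlakyService` are hypothetical:

```python
def call_with_retry(operation, attempts, fallback):
    """Try operation up to `attempts` times; return `fallback` if every try fails."""
    for _ in range(attempts):
        try:
            return operation()
        except ConnectionError:
            continue  # transient failure: try again
    return fallback


class FlakyService:
    """Hypothetical dependency that fails a fixed number of times, then recovers."""

    def __init__(self, failures_before_success):
        self.remaining_failures = failures_before_success

    def get_price(self):
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise ConnectionError("simulated outage")
        return 42


# Two failures, three attempts: the third call succeeds.
recovered = call_with_retry(FlakyService(2).get_price, attempts=3, fallback=None)

# Five failures, three attempts: we give up and use the fallback.
degraded = call_with_retry(FlakyService(5).get_price, attempts=3, fallback=None)
```

After a change like this, the same experiment can be run again to check that the steady state now holds.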

Figure: Process of Chaos Engineering (Source: Medium)

Advantages of Chaos Engineering

Reducing Failures

Chaos Engineering helps reduce system failures by using experiments to test whether the system can keep running during network outages or other failures.

Improving Durability

As Chaos Engineering reduces failures, it improves the overall durability of the system, which becomes better able to tolerate unpredictable conditions.

Improving Service Availability

Through chaos engineering, we gain new insights about an application. These help us improve the service, keep it available during network chaos and leave room for future improvements.

Preventing Revenue Loss

Chaos Engineering helps prevent revenue loss by identifying the causes of a network outage or failure in advance, so that a business can fix them before they cost money.

Lower Maintenance Costs

Chaos Engineering helps lower maintenance costs because it lets us test a system before running it in production, which leads to better outcomes under unpredictable conditions.

Disadvantages of Chaos Engineering

Takes an Extended Amount of Time

One of the significant weaknesses of chaos engineering is that it is a long process, which can cost a company a lot of time because it holds back the deployment of an application.

Trial and Error Process

To gain proper insight into an application through Chaos Engineering, we have to run many fault-injection experiments, which lengthens the process. Moreover, we have to repeat them again and again to check all the possibilities.


Not Applicable to All Applications

Not all applications can withstand being broken intentionally, and as a result, we can’t apply Chaos Engineering to every application.

Tools of Chaos Engineering

Now, let us look at some of the tools we can use to practise Chaos Engineering.

Chaos Monkey

Chaos Monkey is the original Chaos Engineering tool, created at Netflix in 2010. It is still a go-to chaos testing tool.


Gremlin

Gremlin is one of the most popular chaos testing tools. Its free version helps in simulating high CPU load and shutting down machines.

Chaos Toolkit

Chaos Toolkit is an open-source initiative that makes chaos testing more accessible. It has an open API and a standard JSON format for describing experiments.
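To give a feel for that JSON format, the sketch below builds a small experiment as a Python dictionary and prints it as JSON. The field names reflect our understanding of the Chaos Toolkit experiment format and should be checked against its documentation. The title, URL, container name and command are placeholders invented for illustration.

```python
import json

# Illustrative experiment; every name, URL and command here is a placeholder.
experiment = {
    "version": "1.0.0",
    "title": "Service stays reachable when one worker is stopped",
    "description": "Stop a single worker and check the health endpoint.",
    "steady-state-hypothesis": {
        "title": "Health endpoint answers with HTTP 200",
        "probes": [
            {
                "type": "probe",
                "name": "health-endpoint-responds",
                "tolerance": 200,  # expected HTTP status code
                "provider": {
                    "type": "http",
                    "url": "http://localhost:8080/health",
                },
            }
        ],
    },
    "method": [
        {
            "type": "action",
            "name": "stop-one-worker",
            "provider": {
                "type": "process",
                "path": "docker",
                "arguments": "stop example-worker-1",
            },
        }
    ],
    "rollbacks": [],
}

experiment_json = json.dumps(experiment, indent=2)
print(experiment_json)
```

The structure mirrors the process described earlier: a steady-state hypothesis checked by probes, followed by a method that injects the failure.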


Pumba

Pumba is a chaos testing and network emulation tool for Docker.


Litmus

Litmus is a chaos engineering tool for stateful workloads on Kubernetes.

Value Proposition of Chaos Engineering

From the discussion above, we can now understand how chaos engineering works. Businesses should consider implementing Chaos Engineering before deploying their applications.

As we have seen, Chaos Engineering requires failure injection, so it is best suited to applications that are built to withstand such experiments. Large businesses build applications that require more machines and data, and those applications can benefit greatly from chaos testing. So, Chaos Engineering is highly advisable before deploying big applications, which would suffer the most during an outage or failure.

As for small and mid-sized businesses, they can try out Chaos Engineering with free tools, as their applications will be comparatively smaller and budget can be an issue.

Final Thoughts

We now know the fundamentals of Chaos Engineering, and we have also discussed how we can use and implement it. If you ask me, I would chaos test my application with open-source tools, because it pays off in the long run and gives my application stability during an outage. Hopefully, we have learned a lot about Chaos Engineering and will apply it too.

Happy Learning!
