Prometheus v2.30

What’s new in Prometheus v2.30?

by | 29.09.2021 | Changelog

Prometheus v2.30 was released a few days ago, and it is an exciting update. The release focuses less on new features and more on improvements to configurability and resource efficiency, along with several bug fixes. This article walks through all of those changes.

So, without further ado, let’s start!

Faster server restart times via snapshotting

This release speeds up server restarts. For large Prometheus servers that track many concurrent time series, a restart can take multiple minutes, because the server must rebuild its in-memory state for recent time series data from the write-ahead log (WAL) on disk. This is slow: Prometheus effectively has to replay the entire ingestion process for every sample in the WAL to reconstruct the final time series chunks in memory.

Prometheus v2.30 introduces an experimental snapshotting feature, which you enable with the flag --enable-feature=memory-snapshot-on-shutdown. With this feature enabled, Prometheus writes out a more raw snapshot of its current in-memory state on shutdown, which can be read back into memory more efficiently when the server restarts. In one early experiment, this reduced restart times by 50-80%.

Keep in mind that Prometheus writes this snapshot only when it completes an orderly shutdown, not periodically during regular operation (this keeps the overall write load down while Prometheus is running). The feature therefore only speeds up restarts after clean shutdowns. After a crash or an unclean shutdown, the most recent data is still replayed from the WAL, as slowly as before.
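The difference between replaying a WAL and loading a snapshot can be pictured with a toy model. The sketch below is only an illustration and does not reflect Prometheus's actual on-disk formats: the record layout, the use of JSON, and all function names are assumptions made for the example.

```python
import json


def replay_wal(wal_records):
    """Rebuild in-memory state by re-ingesting every WAL record, one sample at a time."""
    head = {}
    for series, timestamp_ms, value in wal_records:
        head.setdefault(series, []).append([timestamp_ms, value])
    return head


def write_snapshot(head):
    """On a clean shutdown: serialize the finished in-memory state in one go."""
    return json.dumps(head)


def load_snapshot(snapshot):
    """On restart: read the state back directly, with no per-sample ingestion."""
    return json.loads(snapshot)


# Hypothetical WAL contents: (series, timestamp in ms, value).
wal = [
    ("up{job='api'}", 1000, 1.0),
    ("up{job='db'}", 1000, 1.0),
    ("up{job='api'}", 16000, 1.0),
    ("up{job='db'}", 16000, 0.0),
]

head_from_wal = replay_wal(wal)
snapshot = write_snapshot(head_from_wal)
head_from_snapshot = load_snapshot(snapshot)
```

Both paths end in the same state. The snapshot path skips the per-sample work, but it is only available if write_snapshot actually ran, which is why a crash still falls back to the replay.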

Controlling scrape intervals and timeouts via relabeling

Special target labels, together with target relabeling, already let Prometheus users control certain scrape behaviours, such as the address, the HTTP path, or the HTTP parameters sent along with the scrape request. Prometheus 2.30 extends this configurability to two more parameters:

Scrape intervals

The scrape interval defines how frequently a target is scraped. It can now be controlled with the new __scrape_interval__ label, for example __scrape_interval__="15s".

Scrape timeouts

The scrape timeout defines how long a target scrape may take before it times out. It can now be controlled with the new __scrape_timeout__ label, for example __scrape_timeout__="15s".

This also makes it possible to set per-target scrape intervals within the same scrape configuration section. And since target labels control the behaviour, service discovery mechanisms can now adjust it dynamically. For example, you could temporarily tell Prometheus to scrape specific targets more frequently to get higher-resolution data in specific scenarios, or less frequently to cause less load.
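To make the mechanism concrete, here is a minimal Python sketch of how relabeling could assign per-target intervals. It is a simplified model, not Prometheus's implementation: the tier label, the rule values, and the helper names are hypothetical, only a literal "replace" action is modelled, durations with a single unit are the only ones parsed, and the check that the timeout must not exceed the interval is an assumption of this sketch.

```python
import re

_UNITS_MS = {"ms": 1, "s": 1000, "m": 60_000, "h": 3_600_000}


def parse_duration_ms(text):
    """Parse a single-unit duration such as '15s' or '500ms' into milliseconds."""
    match = re.fullmatch(r"(\d+)(ms|s|m|h)", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * _UNITS_MS[match.group(2)]


def apply_replace_rule(labels, rule):
    """Simplified 'replace' rule: set target_label when the regex fully matches."""
    joined = ";".join(labels.get(name, "") for name in rule["source_labels"])
    if re.fullmatch(rule["regex"], joined) is None:
        return dict(labels)
    return {**labels, rule["target_label"]: rule["replacement"]}


def relabel(labels, rules):
    """Apply all rules in order and return the resulting label set."""
    for rule in rules:
        labels = apply_replace_rule(labels, rule)
    return labels


def scrape_settings(labels, default_interval="15s", default_timeout="10s"):
    """Return (interval_ms, timeout_ms), preferring the special labels if present."""
    interval = parse_duration_ms(labels.get("__scrape_interval__", default_interval))
    timeout = parse_duration_ms(labels.get("__scrape_timeout__", default_timeout))
    if timeout > interval:  # assumption of this sketch
        raise ValueError("scrape timeout must not exceed scrape interval")
    return interval, timeout


# Hypothetical rules: scrape "critical" targets faster and "batch" targets slower.
rules = [
    {"source_labels": ["tier"], "regex": "critical",
     "target_label": "__scrape_interval__", "replacement": "5s"},
    {"source_labels": ["tier"], "regex": "critical",
     "target_label": "__scrape_timeout__", "replacement": "4s"},
    {"source_labels": ["tier"], "regex": "batch",
     "target_label": "__scrape_interval__", "replacement": "2m"},
]

critical = relabel({"__address__": "10.0.0.1:9100", "tier": "critical"}, rules)
batch = relabel({"__address__": "10.0.0.2:9100", "tier": "batch"}, rules)
plain = relabel({"__address__": "10.0.0.3:9100"}, rules)
```

All three targets could live in the same scrape configuration section: the critical one ends up with a 5s interval, the batch one with 2m, and the unlabeled one keeps the section defaults.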

Improvement of storage efficiency by tuning timestamp tolerances

This release also improves storage efficiency. Prometheus uses a double-delta compression algorithm to store sample timestamps within each time series, and this algorithm performs best when the intervals between subsequent timestamps are entirely regular. Although Prometheus already tries to scrape targets on a regular schedule, actual scrape timestamps can deviate slightly (by a few milliseconds) from the intended schedule. A Go regression that caused more timer jitter made this worse. Earlier versions of Prometheus already offered the boolean flag --scrape.adjust-timestamps, which adjusts scrape timestamps by up to 2ms to align them with the intended scrape schedule and thus achieve better timestamp compression.

Prometheus v2.30 adds support for tuning this scrape timestamp tolerance with the experimental flag --scrape.timestamp-tolerance=<duration>.
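Why a few milliseconds of jitter matter for double-delta encoding can be seen in a small Python sketch. It is a simplified model rather than Prometheus's code: the function names, the sample jitter values, and the symmetric tolerance check are assumptions made for illustration.

```python
def delta_of_deltas(timestamps_ms):
    """Return the second-order differences that a double-delta encoder has to store."""
    deltas = [b - a for a, b in zip(timestamps_ms, timestamps_ms[1:])]
    return [b - a for a, b in zip(deltas, deltas[1:])]


def align(actual_ms, scheduled_ms, tolerance_ms):
    """Snap a scrape timestamp to its scheduled time if it is within the tolerance."""
    if abs(actual_ms - scheduled_ms) <= tolerance_ms:
        return scheduled_ms
    return actual_ms


interval_ms = 15_000
scheduled = [i * interval_ms for i in range(5)]
jitter_ms = [0, 2, 1, 2, 5]  # hypothetical timer jitter per scrape
actual = [s + j for s, j in zip(scheduled, jitter_ms)]

raw = delta_of_deltas(actual)
with_2ms = delta_of_deltas([align(a, s, 2) for a, s in zip(actual, scheduled)])
with_5ms = delta_of_deltas([align(a, s, 5) for a, s in zip(actual, scheduled)])

print(raw)       # [-3, 2, 2]
print(with_2ms)  # [0, 0, 5]
print(with_5ms)  # [0, 0, 0]
```

Zeros are the cheapest values to encode, so the more timestamps fall within the tolerance, the better the series compresses. The trade-off is that a larger tolerance moves stored timestamps further away from the moment the scrape actually happened.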

Other Enhancements

Prometheus v2.30 also brings some smaller improvements, such as reducing memory usage during WAL loading by 24% and CPU usage by 19%. There are also two new optional per-scrape metrics, scrape_timeout_seconds and scrape_sample_limit, in addition to up and friends.

Bug Fixes

The release also brings several bug fixes. For exemplars, it fixes a panic when resizing exemplar storage from 0 to a non-zero size. In the TSDB, prometheus_tsdb_head_active_appenders is now decremented correctly when an append has no samples. promtool rules backfill now returns 1 if the backfill was unsuccessful, and it avoids creating overlapping blocks. Finally, for config, it fixes a panic when reloading a configuration with a null relabel action.

Conclusion

Throughout this article, we have seen how Prometheus v2.30 brings many useful enhancements that should give users a better experience with the product. Try out the new version here and contribute to the project by clicking here. Have an excellent experience with Prometheus, and I will see you all in the next one.
