Why Do DevOps Engineers Love Fluentd?

by | 10.10.2021 | Engineering

Let’s rewind the clock and talk a bit about a key microservices tool that has held the internet’s attention ever since it was introduced. One of the Cloud Native Computing Foundation’s (CNCF) earliest projects is the much-talked-about Fluentd. Originally built at Treasure Data and now hosted by the CNCF, Fluentd is an open source data collector that sits alongside other CNCF projects such as Kubernetes, Prometheus, and OpenTracing.

When Fluentd hit the scene in 2011, it promised to help users build their own logging layers, and it has since earned praise from both Amazon Web Services and Google for use on their platforms. The latter even uses a modified version of Fluentd as a default logging agent.

Fluentd and where it’s used on the web. Source: Fluentd

What is Fluentd

Since 2011, Fluentd has grown into a well-integrated solution with the kind of maturity that newcomers to the field look for. The system is modular, and it connects to other platforms through plugins. Here’s the amazing thing: given how long Fluentd has been around, users can find plugins for most major online tools and databases.

You can find plugins for data sources (including Ruby applications, Docker containers, SNMP, or MQTT protocols) and data outputs (like Elastic Stack, SQL databases, Sentry, Datadog, or Slack). That’s not to mention several other kinds of filters and middleware. And if you’re still not satisfied with the pre-built solutions, you can even create your own plugin in Ruby.

Users might be familiar with the less discussed processes that run behind most applications, such as log parsing, filtering, and forwarding. Fluentd simplifies these operations by defining them all in a single agent configuration file. The format closely resembles Apache or Nginx configuration files and should therefore be familiar to operators. This makes a Fluentd setup easier to follow and maintain than most purpose-built solutions that are stitched together from a convoluted mix of pipelining scripts.

How Fluentd Works

Fluentd’s main strength lies in building pipelines that move log data from log generators (such as a host or application) to their preferred destinations (data sinks such as Elasticsearch).

Sources send their log messages to Fluentd, which forwards them to a destination chosen by a series of configurable matching rules. As part of this process, Fluentd converts each message from its original format into a standardized, JSON-like structure so that it can be read consistently at the destination. Each event carries a tag that identifies where it came from, a timestamp, and the record itself, which is a set of key-value pairs.
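To make the conversion step more concrete, here is a minimal Python sketch of how one raw access log line might be turned into a structured event. It is not Fluentd’s own parser: the regular expression, the `to_event` helper, and the `apache.access` tag are all assumptions made for illustration.

```python
import json
import re
from datetime import datetime

# Simplified pattern for an Apache-style access log line (an assumption for
# illustration; real parsers handle more fields and edge cases).
ACCESS_LOG = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<code>\d{3}) (?P<size>\d+)'
)


def to_event(tag, line):
    """Turn one raw log line into a dict with a tag, a time, and a record."""
    m = ACCESS_LOG.match(line)
    if m is None:
        raise ValueError("line does not match the expected format")
    fields = m.groupdict()
    when = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    record = {
        "host": fields["host"],
        "user": fields["user"],
        "method": fields["method"],
        "path": fields["path"],
        "code": int(fields["code"]),
        "size": int(fields["size"]),
    }
    # The timestamp is stored as seconds since the Unix epoch.
    return {"tag": tag, "time": int(when.timestamp()), "record": record}


line = '192.168.0.1 - - [28/Feb/2013:12:00:00 +0900] "GET / HTTP/1.1" 200 777'
event = to_event("apache.access", line)
print(json.dumps(event, indent=2))
```

Once every message has this shape, a destination no longer needs to know what the original line looked like, only which keys to read.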

Fluentd and its plugin support for other systems. Source: Fluentd

Fluentd achieves all this through a routing engine that sends messages to one or more destinations based on routing data such as the source, format, or metadata. Fluentd also supports operations common to data platforms, including filtering messages, adding custom fields, and basic data stream manipulation.
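The routing idea can be sketched in a few lines of Python. This is a simplified model rather than Fluentd’s actual engine: it assumes dot-separated tags, a `*` wildcard that stands for exactly one tag part, a `**` wildcard that stands for zero or more parts, and a first-match-wins rule. The `ROUTES` table and the destination names are hypothetical.

```python
def tag_matches(pattern, tag):
    """Match a dot-separated tag against a pattern.

    '*' stands for exactly one tag part and '**' for zero or more parts.
    """
    def match(p, t):
        if not p:
            return not t
        if p[0] == "**":
            # '**' may absorb zero or more of the remaining parts.
            return any(match(p[1:], t[i:]) for i in range(len(t) + 1))
        if t and (p[0] == "*" or p[0] == t[0]):
            return match(p[1:], t[1:])
        return False

    return match(pattern.split("."), tag.split("."))


# Hypothetical routing table: the first matching rule wins.
ROUTES = [
    ("app.error", ["elasticsearch", "slack"]),
    ("app.*", ["elasticsearch"]),
    ("**", ["archive"]),
]


def route(tag):
    """Return the destinations of the first rule whose pattern matches."""
    for pattern, destinations in ROUTES:
        if tag_matches(pattern, tag):
            return destinations
    return []


for tag in ["app.error", "app.access", "system.kernel.warn"]:
    print(tag, "->", route(tag))
```

Ordering the rules from most specific to most general keeps the catch-all `**` rule from swallowing events that a narrower rule should handle.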

Being a Cloud Native Computing Foundation (CNCF) project, Fluentd integrates easily with Docker and Kubernetes, where it can be deployed either as a container or as a Kubernetes DaemonSet. It is also often used in logging stacks as an alternative to Logstash and is central to the popular EFK (Elasticsearch, Fluentd, Kibana) stack.

The Advantages of Fluentd

Many tools run into trouble when moving data between systems, whether because of how they handle data types or how they buffer events. Fluentd, by contrast, decouples log sources from log destinations, so it works the same way regardless of where the data comes from or where it is going.

If your service or application has a relevant Fluentd plugin, Fluentd can transfer logs to or from it. Because Fluentd converts all incoming logs into a standard JSON structure, it can connect any supported log source to any supported log destination.

Fluentd also has built-in support for manipulating data streams, including log parsing, conversion, and data processing. Users can also define custom log formats, apply custom labels to individual messages for advanced filtering, and inject or remove fields.
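As a rough illustration of injecting fields, removing fields, and filtering, the following Python sketch chains three small record filters. The helper names (`add_field`, `remove_field`, `keep_if`, `run_filters`) and the field names are hypothetical and are not part of Fluentd’s API; in Fluentd itself this work is done by filter plugins declared in the configuration file.

```python
def add_field(key, value):
    """Build a filter that injects a field into every record."""
    def apply(record):
        return {**record, key: value}
    return apply


def remove_field(key):
    """Build a filter that strips a field from every record."""
    def apply(record):
        return {k: v for k, v in record.items() if k != key}
    return apply


def keep_if(predicate):
    """Build a filter that drops records failing the predicate."""
    def apply(record):
        return record if predicate(record) else None
    return apply


def run_filters(record, filters):
    """Pass a record through each filter in order; None means dropped."""
    for f in filters:
        record = f(record)
        if record is None:
            return None
    return record


# Hypothetical filter chain and field names, chosen for illustration.
chain = [
    keep_if(lambda r: r.get("level") != "debug"),
    add_field("env", "production"),
    remove_field("password"),
]

print(run_filters({"level": "error", "msg": "disk full", "password": "x"}, chain))
print(run_filters({"level": "debug", "msg": "tick"}, chain))
```

The first record comes out with an `env` field and without its `password` field, while the debug record is dropped before it reaches any destination.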

The community has yet to build plugins that support advanced processing inside Fluentd itself, but the same can be achieved by integrating with stream processing software such as Norikra and Amazon Kinesis, a testament to its compatibility and ease of use. The community has also focused on keeping Fluentd simple and lightweight: it requires only around 40MB of RAM.

Users can also choose a lighter version called Fluent Bit, which drops much of Fluentd’s functionality but only requires around 450KB of RAM. Fluent Bit is a pocket-sized rendition of the larger system, with around 30 plugins compared to Fluentd’s 600+. Don’t be fooled by the size though: it still supports many of the common log types and destinations used with Fluentd, including Elasticsearch, Splunk, InfluxDB, HTTP, and local files.

The Disadvantages of Fluentd

Things aren’t always hunky-dory, even with an established tool like Fluentd:

1. Performance has always been the bane of Fluentd. While its performance-critical parts are written in C, the rest of the tool, including its plugins, is written primarily in Ruby. This decision makes Fluentd more flexible and cost effective, but it creates a huge gap in processing speed even on the best of hardware. Each Fluentd instance can only process around 18,000 events per second.

2. Users may need to enable multi-process workers to increase throughput, but this can create compatibility issues with plugins that do not support the feature.

3. Being an open source project, it depends heavily on community-created deployments and templates for quickly deploying a generic Fluentd instance to Docker or Kubernetes. Users still need to configure, test, and maintain the instance for their specific requirements and infrastructure.

4. Enterprise support for Fluentd is only available to customers of Treasure Data, i.e., the maintainers of Fluentd. Users who are not Treasure Data customers have to take their support issues to public channels.

5. Fluentd adds an intermediate layer between log sources and log destinations. This can slow down the logging pipeline and lead to an ever-growing backlog if the sources generate events faster than Fluentd can parse, process, and forward them. Unlike Amazon or Azure systems that have built-in mechanisms for dealing with buffering, Fluentd will drop events once its buffer fills up completely.
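The backlog problem in the last point can be illustrated with a small simulation. This Python sketch models a bounded buffer that loses new events once it is full; it is only a model of the behavior described above, not Fluentd’s actual chunk-based buffer, whose overflow behavior is configurable. The 18,000 events per second figure is the one quoted earlier, while the produce rate and buffer limit are hypothetical.

```python
def simulate(produce_rate, drain_rate, buffer_limit, seconds):
    """Simulate a bounded buffer, one tick per second.

    Returns (forwarded, dropped, still_buffered) event counts.
    """
    buffered = forwarded = dropped = 0
    for _ in range(seconds):
        # New events arrive; anything beyond the buffer limit is lost.
        accepted = min(produce_rate, buffer_limit - buffered)
        dropped += produce_rate - accepted
        buffered += accepted
        # The forwarder drains as much as it can in this tick.
        sent = min(drain_rate, buffered)
        buffered -= sent
        forwarded += sent
    return forwarded, dropped, buffered


# 18,000 events/s is the per-instance figure quoted above; the other
# numbers are hypothetical.
result = simulate(produce_rate=25_000, drain_rate=18_000,
                  buffer_limit=50_000, seconds=10)
print(result)  # (180000, 38000, 32000)
```

With sources producing 7,000 events per second more than the instance can forward, the buffer absorbs the difference for a few seconds and then starts losing events. When the produce rate stays below the drain rate, nothing is dropped.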

Fluentd: Important Components and Use Cases

Before diving into the deep end and implementing Fluentd, it’s important to know a bit more about its main elements and components:

  1. Pushing logs from containers is handled by a built-in logging driver for Fluentd. This means no additional agent is required in the container to move logs to Fluentd, as whatever the container writes to STDOUT is shipped directly. This approach requires no additional log files or duplicate storage.
  2. Fluentd supports parsing formats such as json, regex, csv, syslog, apache, and nginx, as well as third-party formats, through built-in parser plugins.
  3. Fluentd can also collect metrics on its data streams, which gives users metadata for spotting servers that need more resources based on data transfer rates. It doesn’t have its own method for collecting system or container metrics. It can, however, scrape metrics from a Prometheus exporter.
  4. Fluentd has a separate plugin, http_pull, which is centered around pulling data from HTTP endpoints such as metrics and health checks.
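As a rough idea of what scraping a Prometheus exporter involves, the sketch below turns a few lines of Prometheus text exposition output into structured records. It is a simplified, assumed parser written for illustration, not the code of any Fluentd plugin: escaped quotes inside label values and optional timestamps are not handled, and the sample metrics are made up.

```python
import re

# Simplified pattern for one sample line of the Prometheus text format:
#   metric_name{label="value",...} 123.4
SAMPLE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)$'
)
LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_metrics(text):
    """Turn exposition text into a list of name/labels/value records."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        m = SAMPLE.match(line)
        if m is None:
            continue  # ignore lines this simplified pattern cannot read
        labels = dict(LABEL.findall(m.group("labels") or ""))
        records.append({
            "name": m.group("name"),
            "labels": labels,
            "value": float(m.group("value")),
        })
    return records


# Made-up exporter output, for illustration only.
sample = """\
# HELP http_requests_total Total HTTP requests.
# TYPE http_requests_total counter
http_requests_total{method="get",code="200"} 1027
http_requests_total{method="post",code="500"} 3
process_open_fds 12
"""

records = parse_metrics(sample)
for record in records:
    print(record)
```

Each resulting record is a plain set of key-value pairs, so it can travel through the same pipeline as any other log event.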
Common Fluentd tech stack. Source: Medium

Final Verdict

It’s easy to dismiss a yesteryear program like Fluentd, which lacks some of the benefits of the much larger and more complicated logging systems on platforms like Azure or Amazon. However, users shouldn’t ignore all that Fluentd has achieved as a lightweight open source tool.

If you’ve used data logging, parsing, or log forwarding applications, either for local programs or web-based platforms, chances are that you’ve run into Fluentd or one of its many plugins. It doesn’t hurt for the curious mind to discover new plugins and put them to use in day-to-day work. Peer deeper into the community’s work and immerse yourself in all the benefits it has for multi-application compatibility. Tune in next time as we discuss another application that coders of all kinds can use for their projects.

Happy Learning!

