Let’s rewind the clocks and talk about a key microservices tool that caught the internet’s attention from the moment it was introduced: Fluentd, one of the oldest projects in the Cloud Native Computing Foundation (CNCF). Built by Treasure Data and later contributed to the CNCF, Fluentd is an open source data collector designed to unify logging across microservice platforms, sitting alongside projects such as Kubernetes, Prometheus, and OpenTracing.
When Fluentd first hit the scene, it promised to help users build their own unified logging layers, earning praise from both Amazon Web Services and Google for use on their platforms. The latter even uses a modified version of Fluentd as a default logging agent.
What is Fluentd?
Since its debut in 2011, Fluentd has matured into the kind of well-integrated solution that newcomers to the field look for. The system is modular, connecting to other platforms through plugins. Here’s the amazing thing: given how long Fluentd has been around, users can find plugins and services for most major online tools and databases.
You can find plugins for data sources (including Ruby applications, Docker containers, and the SNMP and MQTT protocols) and for data outputs (like the Elastic Stack, SQL databases, Sentry, Datadog, or Slack), not to mention several other kinds of filters and middleware. And if the pre-built solutions still don’t satisfy you, you can write your own plugin in Ruby.
Users might be familiar with the lesser-discussed processes behind most applications, such as log parsing, filtering, and forwarding. Fluentd ties these operations together by declaring them all in a single agent configuration file. The format closely resembles Apache or Nginx configuration files and should therefore be familiar to operators. This often performs better than the convoluted mix of purpose-built pipeline scripts it replaces.
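As a sketch, a minimal agent configuration might tail an Apache access log and print each parsed event; the directives echo the Apache/Nginx config style (the file paths and tag here are illustrative, not defaults):

```
# Read new lines from an access log and parse each one as an Apache entry
<source>
  @type tail
  path /var/log/httpd/access.log
  pos_file /var/log/fluentd/access.log.pos   # remembers the read position across restarts
  tag apache.access
  <parse>
    @type apache2
  </parse>
</source>

# Send everything tagged apache.access to stdout (handy while testing)
<match apache.access>
  @type stdout
</match>
```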
Fluentd’s main operational forte is acting as a common exchange point: it provides a platform for building pipelines in which log data flows from log generators (such as a host or application) to preferred destinations (data sinks such as Elasticsearch).
Most sources pass their log records as messages through Fluentd, which forwards them to a destination according to a series of configurable yet easily identifiable rules. As part of this process, Fluentd converts each message from its original format into a standardized JSON event, which is tagged so that it can be routed and read properly at the destination. Each event carries a tag, a timestamp, and a JSON record holding the parsed fields.
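That conversion can be pictured with a short Ruby sketch. This is not Fluentd’s internal code; the regex, tag, and field names are illustrative. It parses a raw access-log line, attaches a tag and timestamp, and emits a standardized JSON event:

```ruby
require 'json'
require 'time'

# Illustrative pattern for the start of an Apache-style access-log line
LINE_PATTERN = /^(?<host>\S+) \S+ \S+ \[(?<time>[^\]]+)\] "(?<method>\S+) (?<path>\S+)/

# Turn a raw log line into a tagged, timestamped JSON-ready event hash
def to_event(tag, line)
  m = LINE_PATTERN.match(line)
  return nil unless m
  {
    'tag'    => tag,
    'time'   => Time.strptime(m[:time], '%d/%b/%Y:%H:%M:%S %z').to_i,
    'record' => { 'host' => m[:host], 'method' => m[:method], 'path' => m[:path] }
  }
end

event = to_event('apache.access',
  '192.168.0.1 - - [28/Feb/2013:12:00:00 +0900] "GET /index.html HTTP/1.1" 200 777')
puts JSON.generate(event)
```

Downstream, any output plugin can consume the same tagged JSON shape, which is what makes sources and destinations interchangeable.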
Fluentd achieves all this through a routing engine that directs messages to one or more destinations based on routing data such as the source tag, format, or metadata. Fluentd also supports operations common to data platforms, including filtering messages, buffering streams, injecting custom fields, and basic data stream manipulation.
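Tag-based routing lets one pipeline fan events out to different sinks. In the sketch below, the Elasticsearch output assumes the fluent-plugin-elasticsearch plugin is installed, and the tags and hostname are placeholders:

```
# Web-tier events go to Elasticsearch...
<match app.web.**>
  @type elasticsearch
  host es.example.internal
  port 9200
</match>

# ...and everything else under app.* falls through to local files
<match app.**>
  @type file
  path /var/log/fluent/app
</match>
```

Match blocks are evaluated top to bottom, so the more specific pattern must come first.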
Being a Cloud Native Computing Foundation (CNCF) project, Fluentd integrates simply with Docker and Kubernetes, where it can be deployed either as a container or as a Kubernetes DaemonSet. It is also often used in logging stacks as an alternative to Logstash and is central to the popular EFK (Elasticsearch, Fluentd, Kibana) stack.
Unlike many tools that run into communication issues, whether from data type handling or buffering, Fluentd decouples data transfers from any particular source or destination: its capabilities are the same regardless of where data comes from or where it goes.
If your service or application has a relevant Fluentd plugin, it can send data logs to Fluentd or receive them from it. Because Fluentd converts all incoming logs into standard JSON, it can connect any supported log source to any supported log destination.
Fluentd also has built-in capabilities for manipulating and editing data streams, including log parsing, format conversion, and data processing. Users can define custom log formats, apply custom labels to individual messages to drive advanced filtering, and inject or remove fields.
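As a sketch of this stream manipulation, the built-in record_transformer and grep filters can inject fields and drop noise (the tags, field names, and pattern here are illustrative):

```
# Add the host name and an environment label to every record
<filter app.**>
  @type record_transformer
  <record>
    hostname "#{Socket.gethostname}"
    environment production
  </record>
</filter>

# Drop records whose message field looks like a health check
<filter app.**>
  @type grep
  <exclude>
    key message
    pattern /healthcheck/
  </exclude>
</filter>
```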
The community has yet to build plugins for advanced stream processing within Fluentd itself, but the same can be achieved by integrating with stream-processing software such as Norikra or Amazon Kinesis, a testament to its compatibility and ease of use. The community has also focused on keeping Fluentd simple and light, requiring only about 40MB of RAM.
There is also a lighter version, Fluent Bit, which strips out much of Fluentd’s normal functionality but requires only around 450KB of RAM. Fluent Bit is a pocket-sized rendition of the larger system, with around 30 plugins compared to Fluentd’s 600+. Don’t be fooled by the size, though: it still supports many common log types and destinations used with Fluentd, including Elasticsearch, Splunk, InfluxDB, HTTP endpoints, and local files.
Things aren’t always hunky-dory, though, even with an established tool like Fluentd:
1. Performance issues have always been the bane of Fluentd: while performance-critical parts are written in C, the core and most plugins are written primarily in Ruby. This decision makes the tool flexible and easy to extend, but it creates a sizable gap in processing speed even on the best of hardware. Each Fluentd instance can only process around 18,000 events per second.
2. Users may need to enable multi-process workers to increase throughput, but this creates compatibility issues with plugins that do not support the feature.
3. Being an open source project, Fluentd depends heavily on community-created deployments and templates for quickly standing up a generic instance on Docker or Kubernetes. Users still need to configure, test, and maintain the instance to fit their specific requirements and infrastructure.
4. Enterprise support is only available to customers of Treasure Data, i.e., the maintainers of Fluentd. Users with support issues on other systems have to go through public channels.
5. Fluentd architectures add an intermediate layer between log sources and log destinations. This can slow the logging pipeline, creating ever-growing backlogs if sources generate events faster than Fluentd can parse, process, and forward them. Unlike Amazon’s or Azure’s managed systems, which handle buffering for you, Fluentd will by default drop events once its buffer fills entirely.
Before diving into the deep end and implementing Fluentd, it’s important to know a bit more about its main elements and components:
- Docker includes a built-in Fluentd logging driver, so no additional agent is required inside the container to move logs into Fluentd: anything the containerized process writes to STDOUT is shipped directly by the driver. This avoids extra log files and duplicated storage.
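On the Fluentd side, receiving those driver-shipped logs only takes a forward input; a container started with `docker run --log-driver=fluentd ...` then streams its STDOUT straight into the pipeline. The port and bind address below are the driver’s documented defaults:

```
# Accept events from the Docker Fluentd logging driver
<source>
  @type forward
  port 24224
  bind 0.0.0.0
</source>
```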
- Fluentd ships with parser plugins for standard file analysis and scraping, covering formats such as JSON, CSV, syslog, Apache, and Nginx, along with a regexp parser for arbitrary formats; third-party plugins cover many more.
- Fluentd has a component for collecting metrics about its own data streams and pushes, which helps users identify servers that need more resources based on data transfer rates. It has no built-in mechanism for collecting system or container metrics itself, but it can scrape metrics from a Prometheus exporter.
- Fluentd has a separate plugin, http_pull, for scraping and pulling data from HTTP endpoints such as metrics or health checks.
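A hedged sketch of such a source, with parameter names following the third-party fluent-plugin-http-pull project’s documentation and a placeholder URL:

```
# Poll a health-check endpoint every 30 seconds and tag the result
<source>
  @type http_pull
  tag health.app
  url http://localhost:8080/healthz
  interval 30s
  format json
</source>
```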
It’s easy to dismiss a yesteryear program like Fluentd as untouched by the advances of larger, more complicated logging systems on platforms like Azure or Amazon. However, users shouldn’t ignore all that Fluentd has achieved as an open source project, nor its lightweight capabilities.
If you’ve used log collection, parsing, or forwarding tools, whether for local programs or web-based platforms, chances are you’ve run into Fluentd or its multitude of plugins. It doesn’t hurt for the curious mind to discover new plugins and put them to use in day-to-day work. Peer deeper into the community’s work and immerse yourself in all the benefits it offers for multi-application compatibility. Tune in next time as we discuss another application that coders of all stripes can use in their projects.