Hi,

As discussed on IRC the other day [1], we want to propose a distributed log processing architecture based on Heka [2], building on Alicja Kwasniewska's ELK work in <https://review.openstack.org/#/c/252968/>. Please take a look at the design document I've started working on [3]. The document is still a work in progress, but the "Problem statement" and "Proposed change" sections should give you a good overview of the architecture we have in mind.

In the proposed architecture, each cluster node runs an instance of Heka for collecting and processing logs. Instead of sending the processed logs to a centralized Logstash instance, each node sends them directly to Elasticsearch, which itself can be distributed across multiple nodes for high availability and scaling.

The proposed architecture is based on Heka, and it doesn't use Logstash. That being said, it is important to note that the intent of this proposal is not strictly to replace Logstash with Heka. The intent is to propose a distributed architecture with Heka running on each cluster node, rather than having Logstash run as a centralized log processing component. For such a distributed architecture we think that Heka is more appropriate, with a smaller memory footprint and better performance in general. In addition, Heka is more than a log processing tool, as it's designed to process streams of any type of data, including events, logs and metrics.

Some elements of comparison between Heka and Logstash:

* Logstash was designed for log processing. Heka is "unified data processing" software, designed to process streams of any type of data. So Heka is about running one service on each box instead of many. Using a single service for processing different types of data also makes it possible to do correlations, and to derive metrics from logs and events. See Rob Miller's presentation [4] for more details.

* The virtual size of the Logstash Docker image is 447 MB, while the virtual size of a Heka image built from the same base image (debian:jessie) is 177 MB. For comparison, the virtual size of the Elasticsearch image is 345 MB.

* Heka is written in Go and has no dependencies. Go programs are compiled to native code. This is in contrast to Logstash, which uses JRuby and as such requires running a Java Virtual Machine. Besides this native versus interpreted code aspect, this also raises the question of which JVM to use (Oracle, OpenJDK?) and which version (6, 7, 8?).

* There are six types of Heka plugins: Inputs, Splitters, Decoders, Filters, Encoders, and Outputs. Heka plugins are written in Go or Lua. Lua plugins run in a sandbox, so Heka can shut down a misbehaving plugin. Lua plugins may also be added to Heka dynamically, with no config changes or Heka restart. This is an important property in container environments such as Mesos, where workloads change dynamically.

* To avoid losing logs under high load, it is often recommended to use Logstash together with Redis [5]. Redis plays the role of a buffer, where logs are queued when Logstash or Elasticsearch cannot keep up with the load. Heka, as "unified data processing" software, includes its own resilient message queue, making it unnecessary to use an external queue (Redis for example).

* Heka is faster than Logstash for processing logs, and its memory footprint is smaller. I ran tests where 3,400,000 log messages were read from 500 input files and then written to a single output file:

  - Without filtering, Heka processed the 3,400,000 log messages in 12 s, consuming 500 MB of RAM. Logstash processed them in 1 min 35 s, consuming 1.1 GB of RAM.
  - With a grok filter added to parse and structure logs, Logstash processed the 3,400,000 log messages in 2 min 15 s, consuming 1.5 GB of RAM. Using an equivalent filtering plugin, Heka processed them in 27 s, consuming 730 MB of RAM.

  See my GitHub repo [6] for more information about the test environment.

Also, I want to say that our team has been using Heka in production for about a year, in clusters of up to 200 nodes. Heka has proven to be very robust, efficient, and flexible enough to address our log processing and monitoring use cases. We've also acquired solid experience with it.

Any comments are welcome!

Thanks.

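
P.S. As a rough sanity check on the test numbers above, here is a small Python sketch that converts the elapsed times into throughput and speedup figures. It only restates the results quoted in this message. The names (`MESSAGES`, `runs`, `throughput`, `speedup`) are made up for illustration, and the durations are simply the quoted times converted to seconds.

```python
# Throughput implied by the test figures quoted in this message.
MESSAGES = 3_400_000  # log messages processed in every run

# (tool, scenario) -> elapsed time in seconds, as quoted above
runs = {
    ("Heka", "no filter"): 12,
    ("Logstash", "no filter"): 95,     # 1 min 35 s
    ("Heka", "with filter"): 27,
    ("Logstash", "with filter"): 135,  # 2 min 15 s
}


def throughput(seconds):
    """Messages per second for a run that took `seconds`."""
    return MESSAGES / seconds


def speedup(scenario):
    """How many times faster Heka was than Logstash in a scenario."""
    return runs[("Logstash", scenario)] / runs[("Heka", scenario)]


if __name__ == "__main__":
    for (tool, scenario), seconds in runs.items():
        print(f"{tool:8s} {scenario:12s} {throughput(seconds):>10,.0f} msg/s")
    for scenario in ("no filter", "with filter"):
        print(f"Heka speedup, {scenario}: {speedup(scenario):.1f}x")
```

By this arithmetic Heka comes out roughly 8 times faster without filtering and 5 times faster with it, on this particular test setup.
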
[1] <http://eavesdrop.openstack.org/meetings/kolla/2016/kolla.2016-01-06-16.32.html>
[2] <http://hekad.readthedocs.org>
[3] <https://docs.google.com/document/d/1RdckXedts4THPb6giAZvoy3ESiJ5GXau3PYIgbGR-fA/edit?usp=sharing>
[4] <http://www.slideshare.net/devopsdays/heka-rob-miller>
[5] <http://blog.sematext.com/2015/09/28/recipe-rsyslog-redis-logstash/>
[6] <https://github.com/elemoine/heka-logstash-comparison>