I am new to Kafka, so please excuse me if this is a very basic question.

I have a cluster set up with 3 ZooKeeper nodes and 9 brokers. I have network
security logs flowing into the Kafka cluster. I am using Logstash to read
them from the cluster and ingest them into an Elasticsearch cluster.

My current settings are mostly defaults. I created a topic with 8
partitions, and I have 4 Logstash consumers reading that topic and feeding my
ES cluster. My problem is that I can't keep up with real time at the moment;
I am constantly falling behind, and logs are building up on my Kafka cluster.
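For reference, each of my four Logstash instances uses what is essentially the stock ZooKeeper-based Kafka input (this is a sketch from memory, not my exact file; the Elasticsearch host is a placeholder):

```
input {
  kafka {
    zk_connect       => "localhost:2181"   # same ZooKeeper the offset checker points at
    group_id         => "logstash"
    topic_id         => "bro-logs"
    consumer_threads => 1                  # plugin default: one consumer thread per instance
  }
}
output {
  elasticsearch {
    hosts => ["es-node:9200"]              # placeholder host
  }
}
```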

When I run:
$ /opt/kafka/bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker
--group logstash --zookeeper localhost:2181 --topic bro-logs

I get the following:
Group     Topic     Pid  Offset    logSize   Lag      Owner
logstash  bro-logs  0    25937394  29935485  3998091  logstash_OP-01-VM-553-1457301346564-d14fd84a-0
logstash  bro-logs  1    25929594  29935506  4005912  logstash_OP-01-VM-553-1457301346564-d14fd84a-0
logstash  bro-logs  2    26710728  29935519  3224791  logstash_OP-01-VM-554-1457356976268-fa8c24b9-0
logstash  bro-logs  3    3887940   6372075   2484135  logstash_OP-01-VM-554-1457356976268-fa8c24b9-0
logstash  bro-logs  4    3978342   6372074   2393732  logstash_OP-01-VM-555-1457368235387-c6b8bd1f-0
logstash  bro-logs  5    3984965   6372075   2387110  logstash_OP-01-VM-555-1457368235387-c6b8bd1f-0
logstash  bro-logs  6    4017715   6372076   2354361  logstash_OP-01-VM-556-1457368464998-8edb13df-0
logstash  bro-logs  7    4022484   6372074   2349590  logstash_OP-01-VM-556-1457368464998-8edb13df-0

From what I understand, the Lag column is telling me that there is a whole
bunch of logs sitting in the cluster waiting to be processed.
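If I understand it right, Lag is just logSize minus the committed offset for each partition, so adding it up across the eight partitions (numbers copied from the output above) shows how far behind I am in total:

```python
# Lag per partition as reported by ConsumerOffsetChecker (logSize - offset)
lags = [3998091, 4005912, 3224791, 2484135, 2393732, 2387110, 2354361, 2349590]

total_lag = sum(lags)
print(total_lag)  # 23197722 -- roughly 23.2 million messages waiting to be consumed
```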

So my question is: should I spin up more Logstash consumers to read from
the Kafka cluster and feed the ES cluster? Should I increase or decrease
the number of partitions? What can be done to increase the rate at which
logs are read from the cluster and ingested into Elasticsearch?

Like I said, I'm very new to Kafka.

Thanks for the help
Tim
