Hi list, I need some input on best practices for writing Java Kafka (0.10.1.0) consumers.
*The scenario:* A distributed Java system sending/receiving messages, currently based on Akka + RabbitMQ. There is a reasonably low number of channels (~a dozen), mapped to Kafka topics, but this can potentially grow to a high number (~thousands) as the system scales.

*The requirement:* Use Kafka as a replacement for RabbitMQ (basically as a queue), with offset auto-commit disabled.

According to the KafkaConsumer Javadoc there are mainly two options:
https://kafka.apache.org/0101/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html#multithreaded
#1. One thread per consumer (and per topic)
#2. Decouple message consumption from processing

*The problem with #1:* While this solution is very easy to implement, every Kafka consumer seems to add (too much) load to the system. System load gets very high (7.0+ on a dual-core VM), which means 99% CPU utilization with NO I/O wait. I actually tried 2 different implementations:
1) The main Thread.run() polls from Kafka and puts messages into a local concurrent queue; a convenience method (getMessages) retrieves messages from the locally populated queue.
2) The consumer.poll() logic is placed directly in the getMessages call.
No big difference. Taking thread dumps and checking them against the system threads, everything seems to point the finger at the consumer.poll() logic. Is there any server- or client-side tuning that could help? Any suggestions on what to investigate further in order to get a clear answer on why those few threads add so much CPU usage?

*Before trying #2:* I have not tried this solution yet. My main concern is that my consumer threads will have to handle multiple topics. Is there a "best practice" or limit in terms of topics per consumer thread?

Any help is much appreciated.
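For reference, a minimal sketch of option #2 (which is close to implementation 1 above), written against the JDK only so it runs without a broker. `RecordSource` is a hypothetical stand-in for `KafkaConsumer.poll(long)`, and `startFetcher`/`getMessages` are illustrative names (the latter mirrors the method described above). A single fetcher thread owns the source, because a `KafkaConsumer` must not be shared between threads; with auto-commit disabled, offset commits would also have to be issued from that thread. A single `KafkaConsumer` can subscribe to several topics via `subscribe(Collection<String>)`, so the fake source below tags records with two topic names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

public class DecoupledConsumerSketch {

    /** Hypothetical stand-in for KafkaConsumer.poll(long timeoutMs); not a Kafka API. */
    interface RecordSource {
        List<String> poll(long timeoutMs);
    }

    /**
     * Fetcher loop: the only thread that touches the source. It runs a fixed number of
     * polls here only so the demo terminates; a real fetcher would loop until shut down.
     */
    static Thread startFetcher(RecordSource source, BlockingQueue<String> queue, int polls) {
        Thread fetcher = new Thread(() -> {
            try {
                for (int i = 0; i < polls; i++) {
                    // Pass a non-zero timeout so a real poll blocks while idle instead of spinning.
                    for (String record : source.poll(100)) {
                        queue.put(record); // blocks when the queue is full (back-pressure)
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // treat interruption as shutdown
            }
        }, "fetcher");
        fetcher.start();
        return fetcher;
    }

    /** Like the post's getMessages(): waits up to waitMs for one record, then drains up to max. */
    static List<String> getMessages(BlockingQueue<String> queue, int max, long waitMs) {
        List<String> out = new ArrayList<>();
        try {
            String first = queue.poll(waitMs, TimeUnit.MILLISECONDS);
            if (first != null) {
                out.add(first);
                queue.drainTo(out, max - 1);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return out;
    }

    public static void main(String[] args) throws InterruptedException {
        // Fake source: each poll returns one record per topic, tagged with the poll number.
        int[] pollCount = {0};
        RecordSource fake = timeoutMs -> {
            int n = pollCount[0]++;
            return Arrays.asList("topicA-" + n, "topicB-" + n);
        };
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(16);
        startFetcher(fake, queue, 3).join();
        System.out.println(getMessages(queue, 10, 100));
        // prints [topicA-0, topicB-0, topicA-1, topicB-1, topicA-2, topicB-2]
    }
}
```

The bounded queue is a design choice: if the processing side falls behind, `put()` blocks the fetcher rather than letting the local buffer grow without limit.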