For #2 and #3, you would get better stability if ZooKeeper and Kafka ran on dedicated machines.
Have you profiled the nodes where multiple processes (ZooKeeper / Kafka / Druid) ran? What did disk and network I/O look like?

Cheers

On Wed, Feb 14, 2018 at 9:38 AM, Avinash Herle <avinash.herl...@gmail.com> wrote:

> Hi,
>
> I'm using Kafka version 0.11.0.2. In my cluster, I've 4 nodes running Kafka
> of which 3 nodes also running Zookeeper. I've a few producer processes that
> publish to Kafka and multiple consumer processes, a streaming engine
> (Spark) that ingests from Kafka and also publishes data to Kafka, and a
> distributed data store (Druid) which reads all messages from Kafka and
> stores in the DB. Druid also uses the same Zookeeper cluster being used by
> Kafka for cluster state management.
>
> *Kafka Configs:*
> 1) No replication being used
> 2) Number of network threads 30
> 3) Number of IO threads 8
> 4) Machines have 64GB RAM and 16 cores
> 5) 3 topics with 64 partitions per topic
>
> *Questions:*
>
> 1) *Partitions going offline*
> I frequently see partitions going offline because of which the scheduling
> delay of the Spark application increases and input rate gets jittery. I
> tried enabling replication too to see if it helped with the problem. It
> didn't quite make a difference. What could be the cause of this issue? Lack
> of resources or cluster misconfigurations? Can the cause be large number of
> receiver processes?
>
> *2) Colocation of Zookeeper and Kafka:*
> As I mentioned above, I'm running 3 nodes with both Zookeeper and Kafka
> colocated. Both the components are containerized, so they are running
> inside docker containers. I found a few blogs that suggested not colocating
> them for performance reasons. Is it necessary to run them on dedicated
> machines?
>
> *3) Using same Zookeeper cluster across different components*
> In my cluster, I use the same Zookeeper cluster for state management of the
> Kafka cluster and the Druid cluster. Could this cause instability of the
> overall system?
>
> Hope I've covered all the necessary information needed. Please let me know
> if more information about my cluster is needed.
>
> Thanks in advance,
> Avinash
> --
>
> Excuse brevity and typos. Sent from mobile device.
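
P.S. On the profiling question at the top of this reply: if no monitoring is in place yet, a rough first look at network throughput on a shared node could be taken with something like the sketch below. It assumes a Linux host and reads the standard /proc/net/dev counters; the function names (parse_net_dev, throughput, sample) and the 5-second interval are my own illustrative choices, not anything from Kafka or Druid. It covers network only; a tool such as iostat would be needed for the disk side.

```python
import time


def parse_net_dev(text):
    """Return {interface: (rx_bytes, tx_bytes)} from /proc/net/dev content."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        if len(fields) < 16:  # 8 receive columns + 8 transmit columns
            continue
        # Column 0 is received bytes, column 8 is transmitted bytes.
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def throughput(before, after, seconds):
    """Return {interface: (rx_bytes_per_s, tx_bytes_per_s)} between two samples."""
    rates = {}
    for name, (rx1, tx1) in after.items():
        if name in before:
            rx0, tx0 = before[name]
            rates[name] = ((rx1 - rx0) / seconds, (tx1 - tx0) / seconds)
    return rates


def sample(path="/proc/net/dev", interval=5.0):
    """Read the counters twice, `interval` seconds apart (Linux only)."""
    with open(path) as f:
        before = parse_net_dev(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_net_dev(f.read())
    return throughput(before, after, interval)


if __name__ == "__main__":
    for iface, (rx, tx) in sorted(sample().items()):
        print(f"{iface}: rx {rx / 1e6:.2f} MB/s, tx {tx / 1e6:.2f} MB/s")
```

Running this on a node that hosts both a broker and a ZooKeeper container while the partitions are going offline, and comparing against a quiet period, would at least show whether the NIC is anywhere near saturation.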