Hello kafka-users,

I have 50 topics, each with 32 partitions, into which data is being ingested continuously.
Data is published to these 50 topics externally (I have no control over the producers), which causes data skew among the partitions of each topic. For example, for topic-1, partition-1 may contain 100 events while partition-2 has 10K events, and so on for all 50 topics.

*Consuming data from all 50 topics using Kafka Streams:*
- Running 4 consumer instances, all within the same consumer group.
- Number of threads per consumer process: 8

Because the data is not evenly distributed among the partitions (skewed partitions across topics), I see that 1 or 2 consumer instances (JVMs) process far fewer records than the other 2 instances. My guess is that these instances are assigned the partitions with less data.

*Can someone help me balance the consumers here (distribute the consumer workload evenly across the 4 consumer instances)? The expectation is that all 4 consumer instances should process approximately the same number of events.*

Looking forward to hearing your inputs.

Thanks in advance.
*Ankit.*
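
P.S. To make the skew concrete, here is a minimal sketch of what I think is happening. It is only an illustration: it does not use any Kafka API, the class `SkewSketch` and the method `perInstanceLoad` are hypothetical names, the event counts are made up, and the round-robin assignment (partition p goes to instance p % 4) is a simplifying assumption rather than the actual assignor behaviour.

```java
import java.util.Arrays;

// Hypothetical illustration only: no Kafka client is involved, and the
// round-robin assignment below is an assumption, not the real assignor.
class SkewSketch {

    // Sums the events each instance would process if partition p
    // were assigned to instance (p % instances).
    static long[] perInstanceLoad(long[] partitionEvents, int instances) {
        long[] load = new long[instances];
        for (int p = 0; p < partitionEvents.length; p++) {
            load[p % instances] += partitionEvents[p];
        }
        return load;
    }

    public static void main(String[] args) {
        // Made-up event counts for 8 partitions of a single topic.
        long[] events = {100, 10_000, 100, 10_000, 100, 10_000, 100, 10_000};
        long[] load = perInstanceLoad(events, 4);
        // prints [200, 20000, 200, 20000]
        System.out.println(Arrays.toString(load));
    }
}
```

Every instance owns the same number of partitions (2 each), yet two of them see 100x fewer events, which matches what I observe across my 4 JVMs.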