Hi,

Just to let you know, I have made some enhancements to the Low Level Reliable
Receiver based Kafka Consumer (
http://spark-packages.org/package/dibbhatt/kafka-spark-consumer).

Earlier versions started one Receiver task per partition of your Kafka
topic. Now you can configure the desired number of Receiver tasks, and
every Receiver can handle a subset of the topic partitions.

There were some use cases where the consumer needs to handle gigantic topics
(100+ partitions). Using my receiver created that many Receiver tasks, and
hence that many CPU cores were needed just for the Receivers. That was an
issue.


In the latest code, I have changed that behavior. The maximum number of
Receivers is still the number of partitions, but if you specify fewer
Receiver tasks, every Receiver will handle a subset of the partitions and
consume them using the Kafka Low Level consumer API.
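
As a rough illustration of the idea (a sketch only, not the code from the
package), one simple way to spread partitions over a smaller number of
Receivers is round-robin. The function name assign_partitions and the
round-robin policy are assumptions made for this example; the actual
consumer may distribute partitions differently.

```python
def assign_partitions(num_partitions, num_receivers):
    """Split partition ids 0..num_partitions-1 among receivers, round-robin.

    Hypothetical helper for illustration; not part of kafka-spark-consumer.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    # The number of receivers is capped at the number of partitions,
    # and at least one receiver is always used.
    receivers = max(1, min(num_receivers, num_partitions))
    assignment = [[] for _ in range(receivers)]
    for partition in range(num_partitions):
        assignment[partition % receivers].append(partition)
    return assignment


if __name__ == "__main__":
    # 100 partitions handled by 8 receivers instead of 100.
    for receiver_id, partitions in enumerate(assign_partitions(100, 8)):
        print(receiver_id, partitions)
```

With a split like this, a 100-partition topic needs only as many Receiver
cores as the number of Receivers you configure, rather than 100.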

Every Receiver manages the offsets of its partition(s) in ZK in the usual way.


You can see the latest consumer here:
http://spark-packages.org/package/dibbhatt/kafka-spark-consumer



Regards,
Dibyendu
