Hi,

Just to let you know, I have made some enhancements to the Low Level Reliable Receiver based Kafka Consumer (http://spark-packages.org/package/dibbhatt/kafka-spark-consumer).
The earlier version launched one Receiver task for every partition of your Kafka topic. Now you can configure the desired number of Receiver tasks, and every Receiver can handle a subset of the topic partitions.

There were some use cases where the consumer needed to handle gigantic topics (100+ partitions). My receiver then created that many Receiver tasks, so that many CPU cores were needed just for the Receivers. That was an issue.

In the latest code, I have changed that behavior. The maximum number of Receivers is still the number of partitions, but if you specify fewer Receiver tasks, every Receiver will handle a subset of the partitions and consume them using the Kafka low-level consumer API. Every Receiver manages its partitions' offsets in ZK as usual.

You can see the latest consumer here: http://spark-packages.org/package/dibbhatt/kafka-spark-consumer

Regards,
Dibyendu
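To illustrate the partition split described above: the post does not say how partitions are divided among Receivers, so the following is only a minimal sketch assuming a simple round-robin split. The class `PartitionAssignment` and method `assign` are hypothetical names for illustration, not part of the package's API. The sketch caps the Receiver count at the partition count and ignores the actual Kafka consumption and ZK offset handling.

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionAssignment {
    // Hypothetical helper (assumption: round-robin), not the package's real logic.
    // Distributes partition ids 0..numPartitions-1 across the Receivers.
    static List<List<Integer>> assign(int numPartitions, int requestedReceivers) {
        if (numPartitions <= 0 || requestedReceivers <= 0) {
            throw new IllegalArgumentException("numPartitions and requestedReceivers must be positive");
        }
        // The number of Receivers can never exceed the number of partitions.
        int receivers = Math.min(requestedReceivers, numPartitions);
        List<List<Integer>> assignment = new ArrayList<>();
        for (int r = 0; r < receivers; r++) {
            assignment.add(new ArrayList<>());
        }
        // Partition p goes to Receiver p % receivers.
        for (int p = 0; p < numPartitions; p++) {
            assignment.get(p % receivers).add(p);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // A 100-partition topic handled by 4 Receivers: 25 partitions each.
        List<List<Integer>> a = assign(100, 4);
        for (int r = 0; r < a.size(); r++) {
            System.out.println("receiver " + r + " -> " + a.get(r).size() + " partitions");
        }
        // Requesting more Receivers than partitions is capped at the partition count.
        System.out.println(assign(3, 10).size());
    }
}
```

With this split, a 100-partition topic needs only 4 cores for Receivers instead of 100, which is the resource saving the change is aiming for.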