Thanks Neelesh. I also have a plan to migrate the offset management to topic-based storage (Kafka 0.8.2) rather than ZK-based. That will make the consumer much faster. If you plan to contribute your work to this consumer, that would be great as well.
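For context on the topic-based approach mentioned above: in Kafka 0.8.2, committed offsets can be stored in an internal compacted topic keyed by (group, topic, partition), so only the latest commit per key is retained. The sketch below simulates that compaction semantics only; the function names are illustrative and are not the Kafka client API.

```python
# Sketch of compacted-topic offset storage semantics (illustrative names,
# not the Kafka 0.8.2 client API).

def commit_offset(log, group, topic, partition, offset):
    """Append an offset-commit message keyed by (group, topic, partition)."""
    log.append(((group, topic, partition), offset))

def compact(log):
    """Log compaction: retain only the latest value seen for each key."""
    latest = {}
    for key, value in log:
        latest[key] = value
    return latest

log = []
commit_offset(log, "grp1", "orders", 0, 100)
commit_offset(log, "grp1", "orders", 0, 250)  # supersedes the earlier commit
commit_offset(log, "grp1", "orders", 1, 75)

offsets = compact(log)
# Only the latest commit per (group, topic, partition) survives compaction.
```

Because reading back a group's offsets is then a scan of this compacted topic rather than many ZK reads, the broker-side approach scales better under frequent commits, which is the motivation stated in the thread.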
Dibyendu

On Wed, Apr 1, 2015 at 11:07 PM, Neelesh <neele...@gmail.com> wrote:

> Hi Dibyendu,
> Thanks for your work on this project. Spark 1.3 now has direct Kafka
> streams, but it still does not provide enough control over partitions and
> topics. For example, the streams are fairly statically configured:
> RDD.getPartitions() is computed only once, which makes it difficult to use
> in a SaaS environment where topics are created and deactivated on the fly
> (one topic per customer, for example). But it's easy to build a wrapper
> around your receivers.
> Maybe there is a way to combine direct streams with your receivers, but I
> don't yet fully understand how the 1.3 direct streams work.
>
> Another thread: Kafka 0.8.2 supports non-ZK offset management, which I
> think is more scalable than bombarding ZK. I'm working on supporting the
> new offset management strategy for Kafka with kafka-spark-consumer.
>
> Thanks!
> -neelesh
>
> On Wed, Apr 1, 2015 at 9:49 AM, Dibyendu Bhattacharya <
> dibyendu.bhattach...@gmail.com> wrote:
>
>> Hi,
>>
>> Just to let you know, I have made some enhancements to the Low Level
>> Reliable Receiver based Kafka Consumer
>> (http://spark-packages.org/package/dibbhatt/kafka-spark-consumer).
>>
>> Earlier versions used as many Receiver tasks as there are partitions in
>> your Kafka topic. Now you can configure the desired number of Receiver
>> tasks, and every Receiver can handle a subset of the topic partitions.
>>
>> There were some use cases where the consumer needed to handle gigantic
>> topics (having 100+ partitions), and my receiver created that many
>> Receiver tasks, and hence that many CPU cores were needed just for the
>> Receivers. That was an issue.
>>
>> In the latest code, I have changed that behavior. The upper limit for the
>> number of Receivers is still your number of partitions, but if you
>> specify fewer Receiver tasks, every receiver will handle a subset of
>> partitions and consume them using the Kafka Low Level consumer API.
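The partition-to-receiver assignment described above can be sketched as a simple round-robin distribution. This is a hedged illustration of the idea, not the consumer's actual code:

```python
# Sketch: distribute N topic partitions across R receiver tasks.

def assign_partitions(num_partitions, num_receivers):
    """Round-robin partitions over receivers.

    The receiver count is capped at the partition count, matching the
    behavior described in the thread (max receivers = partitions)."""
    num_receivers = min(num_receivers, num_partitions)
    assignment = {r: [] for r in range(num_receivers)}
    for p in range(num_partitions):
        assignment[p % num_receivers].append(p)
    return assignment

# e.g. a 100-partition topic consumed by 10 receivers: each receiver task
# owns 10 partitions, instead of needing 100 cores for 100 receivers.
layout = assign_partitions(100, 10)
```

This is why the change matters for the 100+ partition use case: cores scale with the configured receiver count, not with the topic's partition count.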
>>
>> Every receiver manages its partitions' offsets in ZK in the usual way.
>>
>> You can see the latest consumer here:
>> http://spark-packages.org/package/dibbhatt/kafka-spark-consumer
>>
>> Regards,
>> Dibyendu
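For reference on the "offsets in ZK" scheme mentioned in the thread: Kafka's conventional ZooKeeper layout keeps each consumer group's offset in a znode per partition, under /consumers/&lt;group&gt;/offsets/&lt;topic&gt;/&lt;partition&gt;. A minimal sketch of that path scheme (illustrative only; kafka-spark-consumer's exact paths may differ):

```python
def zk_offset_path(group, topic, partition):
    """Conventional ZooKeeper znode path for a consumer group's offset."""
    return "/consumers/%s/offsets/%s/%d" % (group, topic, partition)

# A receiver owning partitions 0-2 writes each partition's offset to its
# own znode, so every partition's progress is tracked independently:
paths = [zk_offset_path("grp1", "orders", p) for p in range(3)]
```

One znode write per committed offset per partition is exactly the "bombarding ZK" cost Neelesh refers to, and what the topic-based 0.8.2 approach avoids.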