Thanks Neelesh.

I also have a plan to migrate the offset management from ZooKeeper to
the topic-based offset storage introduced in Kafka 0.8.2. That should
make the consumer much faster, since offset commits will no longer go
through ZooKeeper. If you have a plan to contribute your work to this
consumer, that would be great as well.
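
For reference, here is a minimal sketch of a topic-based offset commit
with the Kafka 0.8.2 API (the group, topic, and broker address are
illustrative; coordinator re-connection and error handling are omitted):

    import java.util.LinkedHashMap;
    import java.util.Map;

    import kafka.api.ConsumerMetadataRequest;
    import kafka.common.OffsetAndMetadata;
    import kafka.common.TopicAndPartition;
    import kafka.javaapi.ConsumerMetadataResponse;
    import kafka.javaapi.OffsetCommitRequest;
    import kafka.javaapi.OffsetCommitResponse;
    import kafka.network.BlockingChannel;

    public class TopicBasedOffsetCommit {
        public static void main(String[] args) {
            String group = "my-consumer-group";   // illustrative group id
            String clientId = "offset-demo";
            int correlationId = 0;

            // Any broker can tell us which broker coordinates offsets
            // for this group.
            BlockingChannel channel = new BlockingChannel("localhost", 9092,
                BlockingChannel.UseDefaultBufferSize(),
                BlockingChannel.UseDefaultBufferSize(),
                5000 /* read timeout ms */);
            channel.connect();
            channel.send(new ConsumerMetadataRequest(group,
                ConsumerMetadataRequest.CurrentVersion(), correlationId++,
                clientId));
            ConsumerMetadataResponse meta =
                ConsumerMetadataResponse.readFrom(channel.receive().buffer());
            System.out.println("offset coordinator: " + meta.coordinator());
            // Real code would reconnect to meta.coordinator() and check
            // its error code before committing.

            // Version 1 of OffsetCommitRequest stores the offset in the
            // __consumer_offsets topic instead of ZooKeeper.
            Map<TopicAndPartition, OffsetAndMetadata> offsets =
                new LinkedHashMap<TopicAndPartition, OffsetAndMetadata>();
            offsets.put(new TopicAndPartition("my-topic", 0),
                new OffsetAndMetadata(100L, "", System.currentTimeMillis()));
            channel.send(new OffsetCommitRequest(group, offsets,
                correlationId++, clientId, (short) 1).underlying());
            OffsetCommitResponse response =
                OffsetCommitResponse.readFrom(channel.receive().buffer());
            System.out.println("commit had errors: " + response.hasError());
            channel.disconnect();
        }
    }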

Dibyendu



On Wed, Apr 1, 2015 at 11:07 PM, Neelesh <neele...@gmail.com> wrote:

> Hi Dibyendu,
>    Thanks for your work on this project. Spark 1.3 now has direct Kafka
> streams, but it still does not provide enough control over partitions
> and topics. For example, the streams are fairly statically configured:
> RDD.getPartitions() is computed only once, which makes it difficult to
> use in a SaaS environment where topics are created and deactivated on
> the fly (one topic per customer, for example). But it's easy to build a
> wrapper around your receivers.
> Maybe there is a way to combine direct streams with your receivers, but
> I don't yet fully understand how the 1.3 direct streams work.
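>
> To illustrate the static configuration, here is a minimal sketch
> against the Spark 1.3 Java API (the broker address and topic name are
> made up); the topic set is fixed at the moment the stream is created:
>
>     import java.util.HashMap;
>     import java.util.HashSet;
>     import java.util.Map;
>     import java.util.Set;
>
>     import kafka.serializer.StringDecoder;
>     import org.apache.spark.SparkConf;
>     import org.apache.spark.streaming.Duration;
>     import org.apache.spark.streaming.api.java.JavaPairInputDStream;
>     import org.apache.spark.streaming.api.java.JavaStreamingContext;
>     import org.apache.spark.streaming.kafka.KafkaUtils;
>
>     public class DirectStreamDemo {
>         public static void main(String[] args) throws Exception {
>             SparkConf conf = new SparkConf()
>                 .setAppName("direct-demo").setMaster("local[2]");
>             JavaStreamingContext jssc =
>                 new JavaStreamingContext(conf, new Duration(10000));
>
>             Map<String, String> kafkaParams = new HashMap<String, String>();
>             kafkaParams.put("metadata.broker.list", "localhost:9092");
>
>             // The topic set is captured once, here; topics created later
>             // (e.g. for a new customer) are never picked up by this stream.
>             Set<String> topics = new HashSet<String>();
>             topics.add("customer-topic-1");
>
>             JavaPairInputDStream<String, String> stream =
>                 KafkaUtils.createDirectStream(jssc,
>                     String.class, String.class,
>                     StringDecoder.class, StringDecoder.class,
>                     kafkaParams, topics);
>
>             stream.print();
>             jssc.start();
>             jssc.awaitTermination();
>         }
>     }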
>
> On another thread: Kafka 0.8.2 supports non-ZooKeeper offset
> management, which I think is more scalable than bombarding ZK with
> commits. I'm working on supporting the new offset management strategy
> for Kafka in kafka-spark-consumer.
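>
> For the high-level consumer this is just configuration in 0.8.2; a
> minimal sketch (the ZooKeeper address and group id are illustrative):
>
>     import java.util.Properties;
>
>     public class KafkaOffsetStorageConfig {
>         public static Properties consumerProps() {
>             Properties props = new Properties();
>             props.put("zookeeper.connect", "localhost:2181"); // illustrative
>             props.put("group.id", "my-consumer-group");       // illustrative
>             // Kafka 0.8.2: keep offsets in the __consumer_offsets topic
>             // instead of ZooKeeper.
>             props.put("offsets.storage", "kafka");
>             // During migration, commit to both Kafka and ZK until every
>             // consumer in the group has upgraded.
>             props.put("dual.commit.enabled", "true");
>             return props;
>         }
>     }
>
> A low-level consumer such as kafka-spark-consumer would instead issue
> OffsetCommitRequest/OffsetFetchRequest against the offset coordinator
> itself.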
>
> Thanks!
> -neelesh
>
> On Wed, Apr 1, 2015 at 9:49 AM, Dibyendu Bhattacharya <
> dibyendu.bhattach...@gmail.com> wrote:
>
>> Hi,
>>
>> Just to let you know, I have made some enhancements to the Low Level
>> Reliable Receiver based Kafka Consumer (
>> http://spark-packages.org/package/dibbhatt/kafka-spark-consumer).
>>
>> The earlier version used as many receiver tasks as there are partitions
>> in your Kafka topic. Now you can configure the desired number of
>> receiver tasks, and every receiver can handle a subset of the topic's
>> partitions.
>>
>> There were some use cases where the consumer needed to handle gigantic
>> topics (100+ partitions), and my receiver created that many receiver
>> tasks, and hence that many CPU cores were needed just for receiving.
>> That was an issue.
>>
>>
>> In the latest code, I have changed that behavior. The upper limit on
>> the number of receivers is still your number of partitions, but if you
>> specify fewer receiver tasks, every receiver will handle a subset of
>> the partitions and consume them using the Kafka low-level consumer API,
>> as in the sketch below.
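>>
>> As a rough sketch (the ReceiverLauncher entry point and the property
>> keys below are written from memory of the project README and may not
>> be exact), launching fewer receivers than partitions looks like this:
>>
>>     import java.util.Properties;
>>
>>     import org.apache.spark.SparkConf;
>>     import org.apache.spark.storage.StorageLevel;
>>     import org.apache.spark.streaming.Duration;
>>     import org.apache.spark.streaming.api.java.JavaDStream;
>>     import org.apache.spark.streaming.api.java.JavaStreamingContext;
>>
>>     import consumer.kafka.MessageAndMetadata;
>>     import consumer.kafka.ReceiverLauncher;
>>
>>     public class FewReceiversDemo {
>>         public static void main(String[] args) throws Exception {
>>             SparkConf conf = new SparkConf()
>>                 .setAppName("receiver-demo").setMaster("local[4]");
>>             JavaStreamingContext jssc =
>>                 new JavaStreamingContext(conf, new Duration(10000));
>>
>>             Properties props = new Properties();
>>             props.put("zookeeper.hosts", "localhost"); // keys approximate
>>             props.put("zookeeper.port", "2181");
>>             props.put("kafka.topic", "giant-topic");   // e.g. 100 partitions
>>
>>             // Ask for 5 receivers; each will own ~20 of the 100
>>             // partitions instead of 1 receiver (and 1 core) each.
>>             int numberOfReceivers = 5;
>>             JavaDStream<MessageAndMetadata> stream = ReceiverLauncher.launch(
>>                 jssc, props, numberOfReceivers, StorageLevel.MEMORY_ONLY());
>>
>>             stream.print();
>>             jssc.start();
>>             jssc.awaitTermination();
>>         }
>>     }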
>>
>> Every receiver still manages its partitions' offsets in ZooKeeper in
>> the usual way.
>>
>>
>> You can see the latest consumer here:
>> http://spark-packages.org/package/dibbhatt/kafka-spark-consumer
>>
>>
>>
>> Regards,
>> Dibyendu
>>
>>
>
