Should probably get everyone on the same page at https://issues.apache.org/jira/browse/SPARK-12177
On Mon, Dec 21, 2015 at 5:33 AM, Mario Ds Briggs <mario.bri...@in.ibm.com> wrote:

> Hi Cody,
>
> I took a shot, and here's what it looks like:
>
> https://github.com/mariobriggs/spark/tree/kafka0.9-streaming/external/kafka-newconsumer/src/main/scala/org/apache/spark/streaming/kafka
>
> A few points of note:
>
> - I took the liberty of moving only the DirectStream & KafkaRDD part. I
> assume that's the right priority.
>
> - In the APIs on KafkaUtils, I didn't include the Deserializers in the
> signature. I expect the user to set these in the input kafkaParams map
> (see the sketch below). If that is a bad choice, I'm open to putting
> them back.
>
> - To support getPreferredLocations on KafkaRDD, I pull in the partition
> leader from the Consumer in KafkaRDD::getPartitions(). I could have kept
> it as before, but that meant the KafkaRDD constructors take in a
> 'leaders' argument.
>
> A quick look at the API is here:
> https://github.com/mariobriggs/spark/blob/kafka0.9-streaming/external/kafka-newconsumer/README.md
>
> After doing this, I realized that since the signatures of
> createDirectStream & createRDD change (Decoders removed or replaced),
> these can be added to KafkaUtils without conflict and remain separate
> from the existing implementation (the older Kafka API).
>
> thanks
> Mario
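For illustration, here is a minimal sketch (not taken from Mario's branch) of what carrying the deserializers in the kafkaParams map could look like. "key.deserializer" and "value.deserializer" are the standard config keys the new consumer (Kafka 0.9+) reads its deserializer classes from; the broker list, group id, and choice of StringDeserializer are illustrative assumptions.

```scala
import org.apache.kafka.common.serialization.StringDeserializer

// Illustrative values only; the two deserializer keys are the standard
// config keys the new consumer uses to instantiate its key/value
// deserializers, so they no longer need to appear in method signatures.
val kafkaParams = Map[String, Object](
  "bootstrap.servers"  -> "broker1:9092,broker2:9092",
  "group.id"           -> "my-streaming-job",
  "key.deserializer"   -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer]
)
// The proposed createDirectStream/createRDD would then take this map and
// pick the deserializers up from it (exact signatures per the README above).
```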
> ----- Original message -----
> From: Mario Ds Briggs/India/IBM
> To: Cody Koeninger <c...@koeninger.org>
> Cc: "dev@spark.apache.org" <dev@spark.apache.org>
> Subject: Re: Spark Streaming Kafka - DirectKafkaInputDStream: Using the
> new Kafka Consumer API
> Date: Mon, Dec 7, 2015 3:58 PM
>
> Sounds sane for a first cut.
>
> Since all creation methods take a kafkaParams, I was thinking along the
> lines of maybe a temporary property in there that triggers usage of the
> new consumer.
>
> thanks
> Mario
>
> From: Cody Koeninger <c...@koeninger.org>
> To: Mario Ds Briggs/India/IBM@IBMIN
> Cc: "dev@spark.apache.org" <dev@spark.apache.org>
> Date: 04/12/2015 08:45 pm
> Subject: Re: Spark Streaming Kafka - DirectKafkaInputDStream: Using the
> new Kafka Consumer API
>
> Brute force way to do it might be to just have a separate
> streaming-kafka-new-consumer subproject, or something along those lines.
>
> On Fri, Dec 4, 2015 at 3:12 AM, Mario Ds Briggs
> <mario.bri...@in.ibm.com> wrote:
>
>> >> forcing people on kafka 8.x to upgrade their brokers is
>> questionable. <<
>>
>> I agree, and I was more thinking there may be a way to support both for
>> a period of time (which of course means some more code to maintain :-)).
>>
>> thanks
>> Mario
>>
>> From: Cody Koeninger <c...@koeninger.org>
>> To: Mario Ds Briggs/India/IBM@IBMIN
>> Cc: "dev@spark.apache.org" <dev@spark.apache.org>
>> Date: 04/12/2015 12:15 am
>> Subject: Re: Spark Streaming Kafka - DirectKafkaInputDStream: Using the
>> new Kafka Consumer API
>>
>> Honestly, my feeling on any new API is to wait for a point release
>> before taking it seriously :)
>>
>> Auth and encryption seem like the only compelling reason to move, but
>> forcing people on kafka 8.x to upgrade their brokers is questionable.
>>
>> On Thu, Dec 3, 2015 at 11:30 AM, Mario Ds Briggs
>> <mario.bri...@in.ibm.com> wrote:
>>
>>> Hi,
>>>
>>> I wanted to pick Cody's mind on what he thinks about
>>> DirectKafkaInputDStream/KafkaRDD internally using the new Kafka
>>> consumer API. I know the latter is documented as beta quality, but I
>>> still wanted to know if he sees any blockers as to why we shouldn't go
>>> there shortly. On my side, the consideration is that Kafka 0.9.0.0
>>> introduced authentication and encryption (beta again) between clients
>>> & brokers, but these are available only in the newer Consumer API and
>>> not in the older low-level/high-level APIs.
>>>
>>> From briefly studying the implementation of
>>> DirectKafkaInputDStream/KafkaRDD and the new Consumer API, my thinking
>>> is that it is possible to support the exact current implementation you
>>> have using the new APIs.
>>>
>>> One area that isn't so straightforward: the constructor of KafkaRDD
>>> fixes the offsetRange (I did read about the deterministic behavior you
>>> were after), and I couldn't find a direct method in the new Consumer
>>> API to get the current 'latest' offset. However, one can do a
>>> consumer.seekToEnd() and then call consumer.position(), as sketched
>>> below.
>>>
>>> Of course, one other benefit is that the new Consumer API abstracts
>>> away having to deal with finding the leader for a partition, so we can
>>> get rid of that code.
>>>
>>> Would be great to get your thoughts.
>>>
>>> thanks in advance
>>> Mario
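A minimal sketch of the seekToEnd()/position() workaround Mario describes, assuming the Kafka 0.9.0.0 new-consumer client; latestOffset is a hypothetical helper name, and the manual assign() is an assumption to keep the example free of consumer-group rebalancing:

```scala
import java.util.Arrays
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.common.TopicPartition

// Resolve the current "latest" offset for one partition. seekToEnd() is
// lazy in the new consumer; the seek is only evaluated when position()
// is called, which then returns the end-of-log offset.
def latestOffset(consumer: KafkaConsumer[_, _], tp: TopicPartition): Long = {
  consumer.assign(Arrays.asList(tp)) // manual assignment, no group rebalance
  consumer.seekToEnd(tp)             // varargs in the 0.9 API
  consumer.position(tp)              // forces the seek; returns the offset
}
```

An offset obtained this way could then cap the offsetRange that the KafkaRDD constructor fixes, preserving the deterministic batches the direct stream is built around.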