Sorry, I removed the others by mistake. Thanks a lot, Mario, for explaining. Appreciate it.
On Sun, May 1, 2016 at 11:51 PM, Mario Ds Briggs <mario.bri...@in.ibm.com> wrote:

> Not sure if it was a mistake that you removed others and the group on this
> response.
>
> >> the data duplication inefficiency (replication to WAL) <<
>
> You have covered this in "direct mode's offset-based Kafka fetch without
> the extra cost of WAL". That was exactly what I was referring to.
>
> >> single version of the truth of the offsets processed <<
>
> From the docs at
> http://spark.apache.org/docs/latest/streaming-kafka-integration.html
>
> *Exactly-once semantics:* ... *there is a small chance some records may
> get consumed twice under some failures. This occurs because of
> inconsistencies between data reliably received by Spark Streaming and
> offsets tracked by Zookeeper.*
>
> thanks
> Mario
>
> From: Renyi Xiong <renyixio...@gmail.com>
> To: Mario Ds Briggs/India/IBM@IBMIN
> Date: 01/05/2016 03:34 am
> Subject: Re: Spark streaming Kafka receiver WriteAheadLog question
> ------------------------------
>
> Hi,
>
> Thanks a lot, Cody and Mario, for your comments.
>
> Actually my question is whether it is possible to have the benefits of both
> direct and receiver mode, i.e.
>
> 1. direct mode's offset-based Kafka fetch without the extra cost of WAL
> 2. receiver mode's Kafka pre-fetch without the extra latency of direct
> mode.
>
> Mario,
>
> I don't quite get your comment b. Did you mean the WAL is due to receiver
> mode's nature? Can you explain a little bit more?
>
> thanks a lot,
> Renyi.
>
> On Tue, Apr 26, 2016 at 4:09 AM, Mario Ds Briggs <mario.bri...@in.ibm.com> wrote:
>
> That was my initial thought as well. But then I was wondering if this
> approach could help remove
> a - the little extra latency overhead we have with the DirectApproach
> (compared to Receiver), and
> b - the data duplication inefficiency (replication to WAL) and the single
> version of the truth of the offsets processed (under some failures) in the
> Receiver approach.
>
> thanks
> Mario
>
> ----- Message from Cody Koeninger <c...@koeninger.org> on Mon, 25 Apr 2016 09:23:32 -0500 -----
>
> To: Renyi Xiong <renyixio...@gmail.com>
> cc: dev <dev@spark.apache.org>
> Subject: Re: Spark streaming Kafka receiver WriteAheadLog question
>
> If you want to refer back to Kafka based on offset ranges, why not use
> createDirectStream?
>
> On Fri, Apr 22, 2016 at 11:49 PM, Renyi Xiong <renyixio...@gmail.com> wrote:
> > Hi,
> >
> > Is it possible for the Kafka receiver generated WriteAheadLogBackedBlockRDD to
> > hold the corresponding Kafka offset range, so that during recovery the RDD can
> > refer back to the Kafka queue instead of paying the cost of the write-ahead log?
> >
> > I guess there must be a reason here. Could anyone please help me understand?
> >
> > Thanks,
> > Renyi.
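
P.S. For anyone reading this thread later, here is a minimal sketch of the two modes being compared, against the Spark 1.x spark-streaming-kafka (Kafka 0.8) API. The broker, ZooKeeper quorum, topic, group id, and checkpoint path below are placeholders for illustration only, not values from this discussion:

    // Sketch only: contrasts the direct approach (offsets carried by the RDD,
    // no WAL) with the receiver approach (pre-fetch + WAL, offsets in ZooKeeper).
    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.{HasOffsetRanges, KafkaUtils}

    object KafkaModesSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("kafka-modes-sketch")
          // Receiver mode only: persist received blocks to a write-ahead log
          // so they can be replayed after a driver failure.
          .set("spark.streaming.receiver.writeAheadLog.enable", "true")
        val ssc = new StreamingContext(conf, Seconds(10))
        ssc.checkpoint("/tmp/checkpoint") // placeholder path

        // 1) Direct approach: no receiver, no WAL. Each batch records the Kafka
        //    offset ranges it covers and re-reads from Kafka on recovery.
        val direct = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
          ssc,
          Map("metadata.broker.list" -> "broker1:9092"), // placeholder broker
          Set("mytopic"))                                // placeholder topic
        direct.foreachRDD { rdd =>
          // Offsets are carried by the RDD itself -- the "single version of the
          // truth" mentioned above.
          val ranges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
          ranges.foreach(r => println(s"${r.topic} ${r.partition} ${r.fromOffset} -> ${r.untilOffset}"))
        }

        // 2) Receiver approach: blocks are pre-fetched by a receiver and, with the
        //    WAL enabled above, also written to the log (the duplication cost the
        //    thread refers to). Offsets are tracked in ZooKeeper.
        val receiver = KafkaUtils.createStream(
          ssc,
          "zk1:2181",          // placeholder ZooKeeper quorum
          "my-consumer-group", // placeholder group id
          Map("mytopic" -> 1),
          StorageLevel.MEMORY_AND_DISK_SER) // no replication needed once the WAL is on
        receiver.map(_._2).print()

        ssc.start()
        ssc.awaitTermination()
      }
    }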