That was my initial thought as well. But then I was wondering whether this approach could help remove (a) the small extra latency overhead we have with the direct approach (compared to the Receiver approach) and (b) the data duplication inefficiency (replication to the WAL) and the lack of a single version of the truth for the offsets processed (under some failures) in the Receiver approach.
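
To make the idea concrete, here is a toy sketch in plain Python. It is not Spark or Kafka code: ToyLog, ReceivedBlock and OffsetRange are hypothetical stand-ins I made up for illustration. It assumes the log still retains the messages for the recorded offsets at recovery time. The block keeps its in-memory data plus the offset range it was built from, and if the data is lost it refetches by offset range rather than reading a write ahead log copy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OffsetRange:
    """Hypothetical offset range: [from_offset, until_offset)."""
    topic: str
    partition: int
    from_offset: int   # inclusive
    until_offset: int  # exclusive


class ToyLog:
    """In-memory stand-in for one topic partition (not a Kafka client)."""

    def __init__(self, topic, partition, messages):
        self.topic = topic
        self.partition = partition
        self._messages = list(messages)

    def fetch(self, rng):
        # Return the messages covered by the offset range.
        return self._messages[rng.from_offset:rng.until_offset]


class ReceivedBlock:
    """A received block that remembers which offsets it was built from."""

    def __init__(self, data, offset_range):
        self.data = list(data)          # in-memory copy, may be lost
        self.offset_range = offset_range

    def lose_data(self):
        # Simulate losing the in-memory copy (e.g. an executor failure).
        self.data = None

    def read(self, log):
        if self.data is not None:
            return self.data
        # Recovery path: refetch from the log by offset range
        # instead of reading a write ahead log copy.
        return log.fetch(self.offset_range)


def demo():
    log = ToyLog("events", 0, [f"msg-{i}" for i in range(10)])
    rng = OffsetRange("events", 0, 3, 7)
    block = ReceivedBlock(log.fetch(rng), rng)
    before = block.read(log)
    block.lose_data()
    after = block.read(log)   # served from the log, not from memory
    return before, after
```

Calling demo() returns the block contents before and after the simulated loss; in this toy model the two are equal because the log is the only source of truth for the data.
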
thanks
Mario

----- Message from Cody Koeninger <c...@koeninger.org> on Mon, 25 Apr 2016 09:23:32 -0500 -----
To: Renyi Xiong <renyixio...@gmail.com>
cc: dev <dev@spark.apache.org>
Subject: Re: Spark streaming Kafka receiver WriteAheadLog question

If you want to refer back to Kafka based on offset ranges, why not use createDirectStream?

On Fri, Apr 22, 2016 at 11:49 PM, Renyi Xiong <renyixio...@gmail.com> wrote:
> Hi,
>
> Is it possible for Kafka receiver generated WriteAheadLogBackedBlockRDD to
> hold corresponded Kafka offset range so that during recovery the RDD can
> refer back to Kafka queue instead of paying the cost of write ahead log?
>
> I guess there must be a reason here. Could anyone please help me understand?
>
> Thanks,
> Renyi.