Re: Getting kafka offsets at beginning of spark streaming application

2016-01-11 Thread Cody Koeninger
You can use HasOffsetRanges to get the offsets from the RDD; see http://spark.apache.org/docs/latest/streaming-kafka-integration.html. Although, if you're already saving the offsets to a DB, why not just use that as the starting point of your application? On Mon, Jan 11, 2016 at 11:00 AM, kundan ku
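
For readers following along, here is a minimal sketch of the HasOffsetRanges approach mentioned above. It assumes the Kafka 0.8 direct-stream integration (the spark-streaming-kafka artifact from Spark 1.x, contemporary with this thread). The object name, the trackOffsets helper, and the saveOffsets callback are hypothetical stand-ins for illustration and are not part of Spark.

```scala
import org.apache.spark.streaming.dstream.InputDStream
import org.apache.spark.streaming.kafka.{HasOffsetRanges, OffsetRange}

object OffsetLogging {
  // `saveOffsets` is a hypothetical callback standing in for a DB write.
  def trackOffsets(
      stream: InputDStream[(String, String)],
      saveOffsets: Array[OffsetRange] => Unit): Unit = {
    stream.foreachRDD { rdd =>
      // The cast only succeeds on the RDD produced directly by the direct
      // stream, i.e. before any map/filter/shuffle has been applied.
      val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges

      // ... process the batch here ...

      // One OffsetRange per Kafka partition consumed in this batch.
      offsetRanges.foreach { o =>
        println(s"${o.topic} ${o.partition} ${o.fromOffset} ${o.untilOffset}")
      }

      // Persist after processing, so a restart can resume from untilOffset.
      saveOffsets(offsetRanges)
    }
  }
}
```

Saving the offsets only after the batch has been processed gives at-least-once behaviour on restart; whether that is acceptable depends on the application.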

Re: Getting kafka offsets at beginning of spark streaming application

2016-01-11 Thread kundan kumar
Hi Cody, My use case is as follows: my application dies at time X and I write the offsets to a DB. When my application starts again at time Y (a few minutes later), Spark Streaming reads the latest offsets using the createDirectStream method. Now here I want to get the exact offset that

Re: Getting kafka offsets at beginning of spark streaming application

2016-01-11 Thread Cody Koeninger
I'm not 100% sure what you're asking. If you're asking if it's possible to start a stream at a particular set of offsets, yes, one of the createDirectStream methods takes a map from TopicAndPartition to starting offset. If you're asking if it's possible to query Kafka for the offset corresponding to
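
As a rough illustration of starting a stream at a particular set of offsets, the sketch below uses the createDirectStream overload that takes a Map[TopicAndPartition, Long]. It again assumes the Kafka 0.8 direct API from Spark 1.x. The topic name "events", the broker address, the offset values, and the loadOffsetsFromDb helper are all made up for the example; in a real application the helper would read whatever was last saved to the DB.

```scala
import kafka.common.TopicAndPartition
import kafka.message.MessageAndMetadata
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object ResumeFromSavedOffsets {
  // Hypothetical stand-in for reading the offsets previously saved to a DB.
  def loadOffsetsFromDb(): Map[TopicAndPartition, Long] =
    Map(
      TopicAndPartition("events", 0) -> 1234L,
      TopicAndPartition("events", 1) -> 5678L
    )

  def main(args: Array[String]): Unit = {
    // The master URL is expected to come from spark-submit.
    val conf = new SparkConf().setAppName("resume-from-saved-offsets")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Assumed broker address; replace with the real broker list.
    val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")

    // Starting offset for each topic-partition the stream should consume.
    val fromOffsets = loadOffsetsFromDb()

    // Turns each Kafka record into the (key, value) pair the stream emits.
    val messageHandler =
      (mmd: MessageAndMetadata[String, String]) => (mmd.key, mmd.message)

    val stream = KafkaUtils.createDirectStream[
      String, String, StringDecoder, StringDecoder, (String, String)](
      ssc, kafkaParams, fromOffsets, messageHandler)

    stream.map(_._2).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

With this overload the stream begins exactly at the supplied offsets rather than at the position chosen by auto.offset.reset, so only the partitions listed in the map are consumed.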