You can use HasOffsetRanges to get the offsets from the RDD; see http://spark.apache.org/docs/latest/streaming-kafka-integration.html
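As a rough illustration, here is a minimal sketch in Scala of reading the offset ranges from each batch's RDD, assuming the Kafka 0.8 direct API (spark-streaming-kafka); `directStream` and the persistence call are placeholders, not something from this thread:

import org.apache.spark.streaming.kafka.{HasOffsetRanges, OffsetRange}

directStream.foreachRDD { rdd =>
  // The RDDs produced by createDirectStream implement HasOffsetRanges.
  // The cast has to be done on the RDD the stream hands you, before any
  // transformation that changes the partitioning.
  val offsetRanges: Array[OffsetRange] = rdd.asInstanceOf[HasOffsetRanges].offsetRanges

  offsetRanges.foreach { o =>
    // fromOffset is the first offset of this batch for the partition, i.e.
    // exactly where createDirectStream started reading.
    println(s"${o.topic} ${o.partition} ${o.fromOffset} ${o.untilOffset}")
    // saveOffsetsToDb(o.topic, o.partition, o.fromOffset)  // hypothetical helper
  }
}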
Although if you're already saving the offsets to a DB, why not just use that as the starting point of your application? (A sketch of doing so follows the quoted thread below.)

On Mon, Jan 11, 2016 at 11:00 AM, kundan kumar <iitr.kun...@gmail.com> wrote:

> Hi Cody,
>
> My use case is something like the following:
>
> My application dies at time X and I write the offsets to a DB.
>
> When my application restarts at time Y (a few minutes later), Spark
> Streaming reads the latest offsets using the createDirectStream method.
> Here I want to get the exact offsets that createDirectStream picks up at
> the beginning of the batch. I need this to create an initial RDD.
>
> Please let me know if anything is unclear.
>
> Thanks!!!
>
> On Mon, Jan 11, 2016 at 8:54 PM, Cody Koeninger <c...@koeninger.org> wrote:
>
>> I'm not 100% sure what you're asking.
>>
>> If you're asking whether it's possible to start a stream at a particular
>> set of offsets, yes, one of the createDirectStream methods takes a map
>> from topic-and-partition to starting offset.
>>
>> If you're asking whether it's possible to query Kafka for the offset
>> corresponding to a particular time, yes, but the granularity of that API
>> is very poor because it's based on filesystem timestamps. You're better
>> off keeping your own index of time to offset.
>>
>> On Mon, Jan 11, 2016 at 3:09 AM, Abhishek Anand <abhis.anan...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> Is there a way to fetch the offsets from which Spark Streaming starts
>>> reading from Kafka when my application starts?
>>>
>>> What I am trying to do is create an initial RDD spanning from the offset
>>> at a particular time (passed as input on the command line) to the offset
>>> from which my Spark Streaming starts.
>>>
>>> E.g.
>>>
>>> Partition 0 -> 1000 to (offset at which my Spark Streaming starts)
>>>
>>> Thanks!!
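For reference, a minimal sketch of starting the direct stream from offsets you stored yourself, again assuming the Kafka 0.8 direct API; the broker list, topic, partitions, and offset values are placeholders that a real application would load from the DB:

import kafka.common.TopicAndPartition
import kafka.message.MessageAndMetadata
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val ssc = new StreamingContext(new SparkConf().setAppName("offset-restore"), Seconds(10))

// In a real application these would come from your DB instead of being hard-coded.
val fromOffsets: Map[TopicAndPartition, Long] = Map(
  TopicAndPartition("mytopic", 0) -> 1000L,
  TopicAndPartition("mytopic", 1) -> 1042L
)

val kafkaParams = Map("metadata.broker.list" -> "broker1:9092,broker2:9092")

// This overload of createDirectStream takes explicit starting offsets plus a
// message handler that turns each MessageAndMetadata into the record you want.
val messageHandler = (mmd: MessageAndMetadata[String, String]) => (mmd.key(), mmd.message())

val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder, (String, String)](
  ssc, kafkaParams, fromOffsets, messageHandler)

The offsets persisted by the first snippet are what you would feed into this fromOffsets map, and the same values can seed the initial RDD the original poster is asking about.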