Hi Cody,

My use case is as follows:
My application dies at time X, and I write the offsets to a DB. When the application restarts at time Y (a few minutes later), Spark Streaming reads the latest offsets using the createDirectStream method. I want to get the exact offset that createDirectStream picks up at the beginning of the batch. I need this to create an initial RDD.

Please let me know if anything is unclear.

Thanks!

On Mon, Jan 11, 2016 at 8:54 PM, Cody Koeninger <c...@koeninger.org> wrote:

> I'm not 100% sure what you're asking.
>
> If you're asking if it's possible to start a stream at a particular set of
> offsets, yes, one of the createDirectStream methods takes a map from
> topicpartition to starting offset.
>
> If you're asking if it's possible to query Kafka for the offset
> corresponding to a particular time, yes, but the granularity for that API
> is very poor, because it's based on filesystem timestamp. You're better
> off keeping an index of time to offset on your own.
>
> On Mon, Jan 11, 2016 at 3:09 AM, Abhishek Anand <abhis.anan...@gmail.com>
> wrote:
>
>> Hi,
>>
>> Is there a way so that I can fetch the offsets from where the spark
>> streaming starts reading from Kafka when my application starts?
>>
>> What I am trying is to create an initial RDD with offsets at a particular
>> time passed as input from the command line and the offsets from where my
>> spark streaming starts.
>>
>> Eg -
>>
>> Partition 0 -> 1000 to (offset at which my spark streaming starts)
>>
>> Thanks!!
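
For illustration, the per-partition range in the quoted example ("Partition 0 -> 1000 to (offset at which my spark streaming starts)") could be computed roughly as below. This is only a sketch, written in plain Python so it runs without Spark. The `OffsetRange` tuple is a local stand-in for Spark's class of the same name, and the function name, topic name, and offset values are hypothetical. It assumes the stream's starting offsets are already known to the application, for example because the same map is passed to createDirectStream as the starting offsets.

```python
from collections import namedtuple

# Local stand-in for Spark's OffsetRange (topic, partition, fromOffset, untilOffset),
# defined here so the sketch runs without Spark installed.
OffsetRange = namedtuple(
    "OffsetRange", ["topic", "partition", "from_offset", "until_offset"]
)


def initial_offset_ranges(requested_from, stream_start):
    """Build one range per partition: [requested offset, stream start offset).

    Both arguments map (topic, partition) -> offset. Partitions missing from
    requested_from, or whose requested offset is not below the stream start,
    are skipped because there is nothing to read for the initial RDD.
    """
    ranges = []
    for (topic, partition), start in sorted(stream_start.items()):
        begin = requested_from.get((topic, partition))
        if begin is None or begin >= start:
            continue
        ranges.append(OffsetRange(topic, partition, begin, start))
    return ranges


# Hypothetical values: offsets passed on the command line, and the offsets
# the stream is started from.
requested = {("events", 0): 1000, ("events", 1): 500}
starting = {("events", 0): 1250, ("events", 1): 500}

ranges = initial_offset_ranges(requested, starting)
for r in ranges:
    print(r)
```

In a real job, ranges like these would presumably be turned into Spark `OffsetRange` objects and handed to `KafkaUtils.createRDD` to build the initial RDD. The upper bound is treated as exclusive here, so the initial RDD and the stream would not overlap.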