Hi Cody,

My use case is as follows:

My application dies at time X, and I write the offsets to a DB.

When my application restarts at time Y (a few minutes later), Spark
Streaming reads from the latest offsets via the createDirectStream method.
I want to get the exact offsets that createDirectStream picks up at the
beginning of the first batch. I need them to create an initialRDD that
covers the data between the offsets saved at X and those starting offsets.
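
To make the ask concrete, here is a minimal sketch in plain Python (no
Spark involved) of the per-partition ranges I am after. The function name,
the topic name "events" and the starting offset 1340 are made up for
illustration; only the saved offset 1000 for partition 0 comes from my
earlier example. It assumes I already have both maps in hand, which is
exactly the part I do not know how to get from createDirectStream.

```python
def initial_offset_ranges(saved_offsets, stream_start_offsets):
    """Compute the gap to backfill for each partition.

    Both arguments map (topic, partition) -> offset. The result maps
    (topic, partition) -> (from_offset, until_offset), with until_offset
    exclusive. Partitions with no saved offset or no gap are skipped.
    """
    ranges = {}
    for tp, start in stream_start_offsets.items():
        saved = saved_offsets.get(tp)
        if saved is None:
            continue  # nothing was written to the DB for this partition
        if saved < start:
            ranges[tp] = (saved, start)
    return ranges


# Hypothetical values: offsets written to the DB at time X ...
saved = {("events", 0): 1000, ("events", 1): 2500}
# ... and the offsets the stream starts from at time Y (assumed known here).
started = {("events", 0): 1340, ("events", 1): 2500}

print(initial_offset_ranges(saved, started))
# prints {('events', 0): (1000, 1340)}
```

Each resulting range is what I would then turn into the initial RDD, for
example through something like KafkaUtils.createRDD, which is not shown here.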

Please let me know if anything is unclear.

Thanks !!!


On Mon, Jan 11, 2016 at 8:54 PM, Cody Koeninger <c...@koeninger.org> wrote:

> I'm not 100% sure what you're asking.
>
> If you're asking if it's possible to start a stream at a particular set of
> offsets, yes, one of the createDirectStream methods takes a map from
> topicpartition to starting offset.
>
> If you're asking if it's possible to query Kafka for the offset
> corresponding to a particular time, yes, but the granularity for that API
> is very poor, because it's based on filesystem timestamp.  You're better
> off keeping an index of time to offset on your own.
>
> On Mon, Jan 11, 2016 at 3:09 AM, Abhishek Anand <abhis.anan...@gmail.com>
> wrote:
>
>> Hi,
>>
>> Is there a way so that I can fetch the offsets from where the spark
>> streaming starts reading from Kafka when my application starts ?
>>
>> What I am trying is to create an initial RDD with offsets at a particular
>> time passed as input from the command line and the offsets from where my
>> spark streaming starts.
>>
>> Eg -
>>
>> Partition 0 -> 1000 to (offset at which my spark streaming starts)
>>
>> Thanks !!
>>
>>
>>
>
