You can use HasOffsetRanges to get the offsets from the RDD; see
http://spark.apache.org/docs/latest/streaming-kafka-integration.html
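
For reference, a minimal sketch of pulling the offset ranges out of each
batch, assuming the 0.8 direct stream API from spark-streaming-kafka. The
broker address and topic name are placeholders, and ssc is an existing
StreamingContext:

    import kafka.serializer.StringDecoder
    import org.apache.spark.streaming.kafka.{HasOffsetRanges, KafkaUtils, OffsetRange}

    // ssc is assumed to be an already-created StreamingContext.
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("my-topic"))

    stream.foreachRDD { rdd =>
      // Each batch RDD produced by the direct stream implements HasOffsetRanges.
      val offsetRanges: Array[OffsetRange] = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      offsetRanges.foreach { o =>
        println(s"${o.topic} ${o.partition} from=${o.fromOffset} until=${o.untilOffset}")
      }
      // ... process the rdd here, then persist offsetRanges to your DB.
    }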

Although if you're already saving the offsets to a DB, why not just use
that as the starting point of your application?
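
If you do restart from the DB offsets, a rough sketch of the
createDirectStream overload that takes explicit starting offsets looks like
the following. loadOffsetsFromDb is a hypothetical helper standing in for
however you read the offsets back, e.g. Map(TopicAndPartition("my-topic", 0) -> 1000L):

    import kafka.common.TopicAndPartition
    import kafka.message.MessageAndMetadata
    import kafka.serializer.StringDecoder
    import org.apache.spark.streaming.kafka.KafkaUtils

    // Hypothetical helper: offsets persisted on shutdown, read back at startup.
    val fromOffsets: Map[TopicAndPartition, Long] = loadOffsetsFromDb()

    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")

    // The message handler decides what each record becomes; here, (key, value).
    val messageHandler =
      (mmd: MessageAndMetadata[String, String]) => (mmd.key, mmd.message)

    val stream = KafkaUtils.createDirectStream[
      String, String, StringDecoder, StringDecoder, (String, String)](
      ssc, kafkaParams, fromOffsets, messageHandler)

The stream then begins exactly at the stored offsets, and the HasOffsetRanges
cast above still works on each batch.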

On Mon, Jan 11, 2016 at 11:00 AM, kundan kumar <iitr.kun...@gmail.com>
wrote:

> Hi Cody,
>
> My use case is something like the following:
>
> My application dies at time X, and I write the offsets to a DB.
>
> Now, when my application starts at time Y (a few minutes later), Spark
> Streaming reads the latest offsets using the createDirectStream method. Here
> I want to get the exact offsets that createDirectStream picks up at the
> beginning of the batch. I need these to create an initialRDD.
>
> Please let me know if anything is unclear.
>
> Thanks !!!
>
>
> On Mon, Jan 11, 2016 at 8:54 PM, Cody Koeninger <c...@koeninger.org>
> wrote:
>
>> I'm not 100% sure what you're asking.
>>
>> If you're asking if it's possible to start a stream at a particular set
>> of offsets, yes, one of the createDirectStream overloads takes a map from
>> TopicAndPartition to starting offset.
>>
>> If you're asking if it's possible to query Kafka for the offset
>> corresponding to a particular time, yes, but the granularity of that API
>> is very poor, because it's based on filesystem timestamps.  You're better
>> off keeping your own index of time to offset.
>>
>> On Mon, Jan 11, 2016 at 3:09 AM, Abhishek Anand <abhis.anan...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> Is there a way to fetch the offsets from which Spark Streaming starts
>>> reading from Kafka when my application starts?
>>>
>>> What I am trying to do is create an initial RDD spanning from the offset
>>> at a particular time (passed as input on the command line) to the offsets
>>> from which my Spark Streaming job starts.
>>>
>>> Eg -
>>>
>>> Partition 0 -> 1000 to (offset at which my Spark Streaming job starts)
>>>
>>> Thanks !!
>>>
>>>
>>>
>>
>
