Michael, apparently, the parameter "auto.offset.reset" has a different meaning in Spark's Kafka implementation than what is described in the documentation.
The Kafka docs at <https://kafka.apache.org/documentation.html> specify the
effect of "auto.offset.reset" as:

> What to do when there is no initial offset in ZooKeeper or if an offset is
> out of range:
> * smallest : automatically reset the offset to the smallest offset
> * largest : automatically reset the offset to the largest offset
> * anything else: throw exception to the consumer

However, Spark's implementation seems to drop the part "when there is no
initial offset", as can be seen in
https://github.com/apache/spark/blob/master/external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaInputDStream.scala#L102
-- it will simply wipe the stored offset from ZooKeeper. I guess it's
actually a bug, because the parameter's effect differs from what is
documented, but it's good for you (and me) because it allows you to specify
"I want all that I can get" or "I want to start reading right now", even if
there is an offset stored in ZooKeeper. (I have put a rough sketch of how
this parameter can be passed in at the very bottom of this mail.)

Tobias

On Sun, Jun 15, 2014 at 11:27 PM, Tobias Pfeiffer <t...@preferred.jp> wrote:
> Hi,
>
> there are apparently helpers to tell you the offsets
> <https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example#id-0.8.0SimpleConsumerExample-FindingStartingOffsetforReads>,
> but I have no idea how to pass that to the Kafka stream consumer. I am
> interested in that as well.
>
> Tobias
>
> On Thu, Jun 12, 2014 at 5:53 AM, Michael Campbell
> <michael.campb...@gmail.com> wrote:
>> Is there a way in the Apache Spark Kafka Utils to specify an offset to start
>> reading? Specifically, from the start of the queue, or failing that, a
>> specific point?
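
To make the sketch mentioned above concrete: this is roughly how
"auto.offset.reset" could be passed via the kafkaParams map to the
receiver-based KafkaUtils.createStream (Spark 1.x API). The ZooKeeper
address, consumer group, topic name and batch interval are placeholders I
invented for illustration, not anything from this thread:

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.kafka.KafkaUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}

object ReadKafkaFromBeginning {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("read-from-beginning")
    val ssc = new StreamingContext(conf, Seconds(10))

    // "smallest" = start from the earliest message Kafka still has,
    // "largest"  = start from "now"; with Spark's current implementation this
    // takes effect even if an offset for this group is stored in ZooKeeper.
    val kafkaParams = Map(
      "zookeeper.connect" -> "localhost:2181",    // placeholder
      "group.id" -> "my-consumer-group",          // placeholder
      "auto.offset.reset" -> "smallest")

    // topic name -> number of consumer threads (placeholders)
    val topics = Map("my-topic" -> 1)

    val stream = KafkaUtils.createStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics, StorageLevel.MEMORY_AND_DISK_SER)

    // just print the message payloads
    stream.map(_._2).print()

    ssc.start()
    ssc.awaitTermination()
  }
}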