Michael, apparently, the parameter "auto.offset.reset" has a different meaning in Spark's Kafka implementation than what is described in the documentation.
The Kafka docs at <https://kafka.apache.org/documentation.html> specify the
effect of "auto.offset.reset" as:

> What to do when there is no initial offset in ZooKeeper or if an offset is
> out of range:
> * smallest : automatically reset the offset to the smallest offset
> * largest : automatically reset the offset to the largest offset
> * anything else: throw exception to the consumer

However, Spark's implementation seems to drop the part "when there is no
initial offset", as can be seen in
https://github.com/apache/spark/blob/master/external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaInputDStream.scala#L102
-- it will simply wipe the stored offset from ZooKeeper. I guess it's
actually a bug, because the parameter's effect differs from what is
documented, but it's good for you (and me) because it allows you to specify
"I want all that I can get" or "I want to start reading right now", even if
there is an offset stored in ZooKeeper. (I have put a rough sketch of how
this parameter can be passed in at the very bottom of this mail.)

Tobias

On Sun, Jun 15, 2014 at 11:27 PM, Tobias Pfeiffer <t...@preferred.jp> wrote:
> Hi,
>
> there are apparently helpers to tell you the offsets
> <https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example#id-0.8.0SimpleConsumerExample-FindingStartingOffsetforReads>,
> but I have no idea how to pass that to the Kafka stream consumer. I am
> interested in that as well.
>
> Tobias
>
> On Thu, Jun 12, 2014 at 5:53 AM, Michael Campbell
> <michael.campb...@gmail.com> wrote:
>> Is there a way in the Apache Spark Kafka Utils to specify an offset to start
>> reading? Specifically, from the start of the queue, or failing that, a
>> specific point?
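
To make the sketch mentioned above concrete: this is roughly how
"auto.offset.reset" could be passed via the kafkaParams map to the
receiver-based KafkaUtils.createStream (Spark 1.x API). The ZooKeeper
address, consumer group, topic name and batch interval are placeholders I
invented for illustration, not anything from this thread:

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.kafka.KafkaUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}

object ReadKafkaFromBeginning {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("read-from-beginning")
    val ssc = new StreamingContext(conf, Seconds(10))

    // "smallest" = start from the earliest message Kafka still has,
    // "largest"  = start from "now"; with Spark's current implementation this
    // takes effect even if an offset for this group is stored in ZooKeeper.
    val kafkaParams = Map(
      "zookeeper.connect" -> "localhost:2181",    // placeholder
      "group.id" -> "my-consumer-group",          // placeholder
      "auto.offset.reset" -> "smallest")

    // topic name -> number of consumer threads (placeholders)
    val topics = Map("my-topic" -> 1)

    val stream = KafkaUtils.createStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics, StorageLevel.MEMORY_AND_DISK_SER)

    // just print the message payloads
    stream.map(_._2).print()

    ssc.start()
    ssc.awaitTermination()
  }
}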