Hi,

As I understand it, your problem is similar to this JIRA:

https://issues.apache.org/jira/browse/SPARK-1647

The issue in this case is that Kafka cannot replay the messages, because
the offsets have already been committed. I think the existing KafkaUtils
(the default high-level Kafka consumer) has this issue as well.
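
For example, with the high-level consumer, offsets are auto-committed to
ZooKeeper on a timer, independent of whether Spark has actually processed
or checkpointed the received data. A minimal sketch (property names are
from the Kafka 0.8 consumer config; the addresses and values here are
just illustrative):

// Kafka 0.8 high-level consumer: offsets get committed to ZK on a
// timer, regardless of whether the received data survived a failure.
import java.util.Properties
import kafka.consumer.ConsumerConfig

val props = new Properties()
props.put("zookeeper.connect", "localhost:2181")  // illustrative address
props.put("group.id", "spark-receiver")           // illustrative group id
props.put("auto.commit.enable", "true")           // the default behaviour
props.put("auto.commit.interval.ms", "60000")     // commit every minute
val consumerConfig = new ConsumerConfig(props)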

There is a similar discussion in this thread as well:

http://apache-spark-user-list.1001560.n3.nabble.com/Data-loss-Spark-streaming-and-network-receiver-td12337.html

I think it is possible to tackle this in the consumer code I have
written. If we store the topic, partition_id, and consumed offset in ZK
after every checkpoint, then after Spark recovers from the failover, the
present PartitionManager code can start reading from the last
checkpointed offset (instead of the last committed offset, as it does
now). In that case it can replay the data since the last checkpoint.
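
Roughly the idea, as a minimal Scala sketch against the plain ZooKeeper
client (the znode path and helper names are mine for illustration, not
the actual PartitionManager code):

import org.apache.zookeeper.{CreateMode, KeeperException, ZooDefs, ZooKeeper}

// Sketch: after each Spark checkpoint, persist the consumed offset for
// every (topic, partition) in ZK; on recovery, read it back and resume
// from there instead of from the last Kafka-committed offset.
object CheckpointedOffsets {

  private def offsetPath(topic: String, partition: Int): String =
    s"/consumers/spark-checkpoint/$topic/$partition"

  // ZooKeeper has no recursive create, so build each level if missing.
  private def ensurePath(zk: ZooKeeper, path: String): Unit = {
    var current = ""
    for (part <- path.split("/").filter(_.nonEmpty)) {
      current += "/" + part
      try {
        zk.create(current, Array.empty[Byte],
          ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT)
      } catch {
        case _: KeeperException.NodeExistsException => // already exists
      }
    }
  }

  // Call after a checkpoint completes.
  def save(zk: ZooKeeper, topic: String, partition: Int, offset: Long): Unit = {
    val path = offsetPath(topic, partition)
    ensurePath(zk, path)
    zk.setData(path, offset.toString.getBytes("UTF-8"), -1) // -1: any version
  }

  // Call on recovery; None means nothing checkpointed yet, so fall back
  // to the last committed offset as the code does today.
  def load(zk: ZooKeeper, topic: String, partition: Int): Option[Long] = {
    val path = offsetPath(topic, partition)
    if (zk.exists(path, false) == null) None
    else Some(new String(zk.getData(path, false, null), "UTF-8").toLong)
  }
}

On recovery, the PartitionManager would consult load() first and only
fall back to the committed offset when no checkpointed offset exists.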

I will think it over.

Regards,
Dibyendu



On Mon, Aug 25, 2014 at 11:23 PM, RodrigoB <rodrigo.boav...@aspect.com>
wrote:

> Hi Dibyendu,
>
> My colleague has taken a look at the spark kafka consumer github you have
> provided and started experimenting.
>
> We found that somehow, when Spark has a failure after a data checkpoint,
> the expected re-computations corresponding to the metadata checkpoints
> are not recovered, so we lose Kafka messages and RDD computations in
> Spark. The impression is that this code replaces quite a bit of the
> Spark Kafka Streaming code, where maybe (not sure) metadata checkpoints
> are done every batch interval.
>
> Was it intentional to depend solely on the Kafka commit to recover data
> and recomputations between data checkpoints? If so, how can we make this work?
>
> Thanks,
> Rod
>
>
>
