I have been using the Spark Streaming Kafka direct approach recently. I found
that when I start the application, it always starts consuming from the latest
offset. I would like it to resume from the offset where the previous run of
the application, using the same Kafka consumer group, left off. Does this mean
I have to maintain the consumer offsets myself, for example by recording them
in ZooKeeper and reloading the last offset from ZooKeeper when restarting the
application?
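
To make the idea concrete, here is a minimal, Spark-free sketch of the bookkeeping I have in mind: save the end offset of each partition after a batch, and reload it on restart. Everything in it is hypothetical. `FileOffsetStore` is a JSON file standing in for ZooKeeper, `log` is an in-memory dict standing in for Kafka partitions, and `run_batch` is a made-up helper. In a real job the starting offsets would presumably be passed to the direct stream through the `fromOffsets` argument of `KafkaUtils.createDirectStream`.

```python
import json
import os


class FileOffsetStore:
    """Hypothetical stand-in for ZooKeeper: offsets per consumer group in a JSON file."""

    def __init__(self, path):
        self.path = path

    def _read(self):
        if not os.path.exists(self.path):
            return {}
        with open(self.path) as f:
            return json.load(f)

    def load(self, group):
        """Return {(topic, partition): next_offset} for the group, or {} if none saved."""
        result = {}
        for key, offset in self._read().get(group, {}).items():
            topic, partition = key.rsplit(":", 1)
            result[(topic, int(partition))] = offset
        return result

    def save(self, group, offsets):
        """Persist offsets for the group; write to a temp file, then rename over the old one."""
        data = self._read()
        data[group] = {f"{t}:{p}": o for (t, p), o in offsets.items()}
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(data, f)
        os.replace(tmp, self.path)


def run_batch(log, store, group, max_messages=2):
    """Process one batch per partition, starting from the saved offset.

    `log` maps (topic, partition) to a list of messages. With no saved
    offset the batch starts at the latest position, which mirrors the
    behaviour described above.
    """
    offsets = store.load(group)
    processed = []
    for tp, messages in sorted(log.items()):
        start = offsets.get(tp, len(messages))
        end = min(start + max_messages, len(messages))
        processed.extend(messages[start:end])
        offsets[tp] = end  # next offset to read, saved only after processing
    store.save(group, offsets)
    return processed
```

Saving after processing means that a crash between the two steps replays the last batch on restart, so this sketch gives at-least-once rather than exactly-once behaviour.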

I have seen the following discussions:
https://github.com/apache/spark/pull/4805
https://issues.apache.org/jira/browse/SPARK-6249

Has any conclusion been reached? Do we need to maintain the offsets ourselves,
or will Spark Streaming add a feature that simplifies offset management?

https://forums.databricks.com/questions/2936/need-to-maintain-the-consumer-offset-by-myself-whe.html
