I have been using the Spark Streaming Kafka direct approach recently. I found that whenever I start the application, it always starts consuming from the latest offset. What I want is for the application, on startup, to resume from the offset where its previous run stopped, using the same Kafka consumer group. Does this mean I have to maintain the consumer offsets myself, for example by recording them in ZooKeeper and reloading the last offset from ZooKeeper when the application restarts?
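To make the idea concrete, here is a minimal sketch of the load, process, save cycle I have in mind. It is plain Python and does not touch Spark or Kafka: `OffsetStore` is a hypothetical stand-in for ZooKeeper (one JSON file per consumer group), and `log` is an in-memory stand-in for the Kafka partitions. As far as I understand, in a real job the starting offsets would be passed to `createDirectStream` through its `fromOffsets` argument, and the end offsets of each batch would be read from the batch RDD's offset ranges, but that wiring is not shown here.

```python
import json
import os


class OffsetStore:
    """Hypothetical stand-in for ZooKeeper: one JSON file per consumer group."""

    def __init__(self, directory, group):
        self.path = os.path.join(directory, group + ".json")

    def load(self):
        """Return {(topic, partition): next_offset}, empty if nothing is saved."""
        if not os.path.exists(self.path):
            return {}
        with open(self.path) as f:
            raw = json.load(f)
        offsets = {}
        for key, offset in raw.items():
            topic, partition = key.rsplit(":", 1)
            offsets[(topic, int(partition))] = offset
        return offsets

    def save(self, offsets):
        """Write the offsets to a temp file, then rename it over the old file."""
        raw = {f"{topic}:{partition}": offset
               for (topic, partition), offset in offsets.items()}
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(raw, f)
        os.replace(tmp, self.path)


def process_available(store, log, handle):
    """Process every record past the saved offset, then save the new offsets.

    `log` maps (topic, partition) to a list of records and plays the role of
    Kafka. Offsets are saved only after processing, so a crash in between
    replays records (at-least-once) instead of skipping them.
    """
    offsets = store.load()
    for topic_partition, records in log.items():
        # Assumption: a group with no saved offset starts from the beginning.
        start = offsets.get(topic_partition, 0)
        for record in records[start:]:
            handle(record)
        offsets[topic_partition] = len(records)
    store.save(offsets)
    return offsets
```

Calling `process_available` again with a fresh `OffsetStore` for the same directory and group simulates a restart: only records appended since the last saved offset are handled, while a different group name starts from the beginning.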
I have seen the following discussions: https://github.com/apache/spark/pull/4805 https://issues.apache.org/jira/browse/SPARK-6249 Was any conclusion reached? Do I need to maintain the offsets myself, or will Spark Streaming support a feature that simplifies offset management? https://forums.databricks.com/questions/2936/need-to-maintain-the-consumer-offset-by-myself-whe.html
