Kafka now has built-in support for managing offsets itself instead of ZK, and it is easy to use and to migrate to from the current ZK implementation. I think the real question here is whether we need to manage offsets at the Spark Streaming level or leave that to the user.
If you want to manage offsets at the user level, letting Spark t
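If offsets are left to the user, the usual pattern is to save the processed offset ranges to the application's own store once a batch is safely written, and read them back on restart. A minimal sketch of that idea in plain Scala — every name here is hypothetical, this is not Spark's API:

```scala
import scala.collection.mutable

// Hypothetical types for illustration; not Spark or Kafka classes.
case class OffsetRange(topic: String, partition: Int, fromOffset: Long, untilOffset: Long)

class OffsetStore {
  // In a real app this map would be a database or other durable store.
  private val committed = mutable.Map.empty[(String, Int), Long]

  // After a batch's output is safely written, record each range's end offset.
  def commit(ranges: Seq[OffsetRange]): Unit =
    ranges.foreach(r => committed((r.topic, r.partition)) = r.untilOffset)

  // On restart, resume from the last committed offset (0 = earliest here).
  def resumeFrom(topic: String, partition: Int): Long =
    committed.getOrElse((topic, partition), 0L)
}
```

Because the offsets are committed by the application only after its own output succeeds, a replayed batch overwrites rather than duplicates.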
The only dependency on Zookeeper I see is here:
https://github.com/apache/spark/blob/1c5475f1401d2233f4c61f213d1e2c2ee9673067/external/kafka/src/main/scala/org/apache/spark/streaming/kafka/ReliableKafkaReceiver.scala#L244-L247
If that's the only line that depends on Zookeeper, we could probably tr
There are already private methods in the code for interacting with Kafka's
offset management API.
There's a JIRA for making those methods public, but TD has been reluctant
to merge it:
https://issues.apache.org/jira/browse/SPARK-10963
I think adding any ZK-specific behavior to Spark is a bad idea.
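One way to keep ZK-specific behavior out of Spark itself is to hide the commit backend behind a small interface, so ZK, Kafka's own offset management, or a user database are all interchangeable. A rough sketch of that shape (hypothetical names, not actual Spark code):

```scala
// Hypothetical abstraction for illustration only.
trait OffsetCommitter {
  def commit(group: String, topic: String, partition: Int, offset: Long): Unit
  def fetch(group: String, topic: String, partition: Int): Option[Long]
}

// In-memory stand-in; a real backend would talk to ZK, Kafka's offset
// management API, or an external store behind the same interface.
class InMemoryCommitter extends OffsetCommitter {
  private val store = scala.collection.mutable.Map.empty[(String, String, Int), Long]
  def commit(group: String, topic: String, partition: Int, offset: Long): Unit =
    store((group, topic, partition)) = offset
  def fetch(group: String, topic: String, partition: Int): Option[Long] =
    store.get((group, topic, partition))
}
```

With that seam in place, nothing in the streaming code needs to know which storage system holds the offsets.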