Very nice!  I was also wondering about the offset auto-commit in KafkaUtils.
Since incoming Kafka data is only replicated across Spark nodes in memory,
while the high-level consumer auto-commits offsets to ZooKeeper on a timer
regardless of whether that data has actually been processed, it seems
possible to lose up to a batch of data if tasks hang or crash.
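
To make the concern concrete, here is a minimal sketch of how the current
KafkaUtils receiver gets wired up (untested; the ZooKeeper address, group id,
and topic are placeholders I made up, not values from Spark itself):

import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils
import kafka.serializer.StringDecoder

val ssc = new StreamingContext("local[2]", "offset-demo", Seconds(2))

// kafkaParams are handed straight to Kafka's high-level consumer.  With
// auto.commit.enable=true (the default), offsets are committed to
// ZooKeeper on a timer, independent of whether the received block has
// been processed by Spark -- hence the potential loss window.
val kafkaParams = Map(
  "zookeeper.connect"       -> "zk-host:2181",   // placeholder
  "group.id"                -> "my-group",       // placeholder
  "auto.commit.enable"      -> "true",
  "auto.commit.interval.ms" -> "1000"
)

val stream = KafkaUtils.createStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, Map("my-topic" -> 1), StorageLevel.MEMORY_ONLY_SER_2)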

It seems you have avoided this case by using the Kafka SimpleConsumer and
managing the offsets explicitly, presumably committing them only after the
received data has been safely stored.  I think this will be a great addition
to Spark Streaming, but I'm curious what others think.
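
For comparison, this is roughly what the explicit-offset pattern looks like
with the 0.8 SimpleConsumer API (a rough sketch only; the broker host, topic,
and the checkpointOffset helper are hypothetical, not taken from the patch):

import kafka.api.FetchRequestBuilder
import kafka.consumer.SimpleConsumer

val consumer = new SimpleConsumer("broker-host", 9092,   // placeholder broker
                                  100000, 64 * 1024, "offset-demo-client")

val topic = "my-topic"   // placeholder
val partition = 0
var offset = 0L          // in practice: restored from durable storage

// Hypothetical helper: persist the offset durably (e.g. to ZooKeeper).
def checkpointOffset(topic: String, partition: Int, offset: Long): Unit = ???

val request = new FetchRequestBuilder()
  .clientId("offset-demo-client")
  .addFetch(topic, partition, offset, 100000)
  .build()

val response = consumer.fetch(request)
for (msgAndOffset <- response.messageSet(topic, partition)) {
  // 1. Process / reliably store the message first ...
  // 2. ... and only then advance the offset.
  offset = msgAndOffset.nextOffset
}
checkpointOffset(topic, partition, offset)  // commit only after processing
consumer.close()

The key point is the ordering: the offset is only persisted once the data it
covers has been handled, so a crash replays messages instead of dropping them.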




