Hello all,

We at Tresata wrote a library that provides batch integration between Spark and Kafka. It supports:
* distributed writes of RDDs to Kafka
* distributed reads of RDDs from Kafka
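Distributed writes like the above are typically done partition-wise: each Spark task opens one producer for its partition, sends that partition's records, and closes the producer. Here is a minimal sketch of that pattern, using plain Scala collections in place of RDD partitions and a stub in place of a real Kafka producer; all names are illustrative and not the spark-kafka API.

```scala
// Stub standing in for a Kafka producer (e.g. KafkaProducer from kafka-clients);
// it just records what was sent so the pattern is easy to follow.
class StubProducer {
  val sent = scala.collection.mutable.ArrayBuffer.empty[(String, String)]
  def send(topic: String, value: String): Unit = sent += ((topic, value))
  def close(): Unit = ()
}

object PartitionWrite {
  // Mirrors the rdd.foreachPartition { iter => ... } idiom: one producer is
  // opened per partition, the partition's records are sent, then the producer
  // is closed. Returns the producers so the example is inspectable.
  def writePartitions(partitions: Seq[Iterator[String]], topic: String): Seq[StubProducer] =
    partitions.map { iter =>
      val producer = new StubProducer          // opened once per partition/task
      iter.foreach(record => producer.send(topic, record))
      producer.close()                         // flush and release the connection
      producer
    }
}
```

Opening the producer inside the per-partition closure (rather than on the driver) matters because producer connections are not serializable and must be created on the executors.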
Our main use cases are (in lambda architecture speak):
* periodic appends from Kafka to the immutable master dataset on HDFS using Spark
* making non-streaming data available in Kafka via periodic data drops from HDFS using Spark, to facilitate merging the speed and batch layers in Spark Streaming
* distributed writes from Spark Streaming

See here: https://github.com/tresata/spark-kafka

Best,
Koert