Hello all,

At Tresata we wrote a library that provides batch integration between Spark and Kafka: distributed writes of RDDs to Kafka, and distributed reads of RDDs from Kafka.

Our main use cases are (in lambda-architecture jargon):
* periodic appends from Kafka to the immutable master dataset on HDFS using Spark
* making non-streaming data available in Kafka via periodic data drops from HDFS using Spark; this is to facilitate merging the speed and batch layers in Spark Streaming
* distributed writes from Spark Streaming
See here: https://github.com/tresata/spark-kafka

Best, Koert