Hello all,

We at Tresata wrote a library that provides batch integration between Spark and Kafka. It supports:
* distributed writes of RDDs to Kafka
* distributed reads of RDDs from Kafka
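Distributed writes like the above are typically done partition-wise: each Spark task opens one producer for its partition, sends that partition's records, and closes the producer. Here is a minimal sketch of that pattern, using plain Scala collections in place of RDD partitions and a stub in place of a real Kafka producer; all names are illustrative and not the spark-kafka API.

```scala
// Stub standing in for a Kafka producer (e.g. KafkaProducer from kafka-clients);
// it just records what was sent so the pattern is easy to follow.
class StubProducer {
  val sent = scala.collection.mutable.ArrayBuffer.empty[(String, String)]
  def send(topic: String, value: String): Unit = sent += ((topic, value))
  def close(): Unit = ()
}

object PartitionWrite {
  // Mirrors the rdd.foreachPartition { iter => ... } idiom: one producer is
  // opened per partition, the partition's records are sent, then the producer
  // is closed. Returns the producers so the example is inspectable.
  def writePartitions(partitions: Seq[Iterator[String]], topic: String): Seq[StubProducer] =
    partitions.map { iter =>
      val producer = new StubProducer          // opened once per partition/task
      iter.foreach(record => producer.send(topic, record))
      producer.close()                         // flush and release the connection
      producer
    }
}
```

Opening the producer inside the per-partition closure (rather than on the driver) matters because producer connections are not serializable and must be created on the executors.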
Our main use cases are (in lambda architecture speak):
* periodic appends from Kafka to the immutable master dataset on HDFS using Spark
* making non-streaming data available in Kafka via periodic data drops from HDFS using Spark, to facilitate merging the speed and batch layers in Spark Streaming
* distributed writes from Spark Streaming

See here: https://github.com/tresata/spark-kafka

Best,
Koert