Spark Streaming also has built-in support for Kafka, and as of Spark 1.2, it supports using an HDFS write-ahead log to ensure zero data loss while streaming: https://databricks.com/blog/2015/01/15/improved-driver-fault-tolerance-and-zero-data-loss-in-spark-streaming.html
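
For what it's worth, here is a rough sketch of what that setup could look like with the receiver-based Kafka stream and the write-ahead log turned on. The ZooKeeper quorum, consumer group, topic name, and HDFS paths are placeholders I made up for illustration, not anything from this thread. The batch interval and the plain-text output are also just assumptions; treat it as a starting point rather than a tested job.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object KafkaToHdfs {
  // Hypothetical values: replace with your own cluster settings.
  val checkpointDir = "hdfs:///tmp/kafka-mirror-checkpoint"
  val zkQuorum      = "zk1:2181"
  val groupId       = "hdfs-mirror"
  val topics        = Map("my-topic" -> 1) // topic -> receiver threads

  def createContext(): StreamingContext = {
    val conf = new SparkConf()
      .setAppName("KafkaToHdfs")
      // Log received data to the checkpoint directory before processing it.
      .set("spark.streaming.receiver.writeAheadLog.enable", "true")

    val ssc = new StreamingContext(conf, Seconds(60))
    // The write-ahead log lives under the checkpoint directory,
    // so it should be on a fault-tolerant file system such as HDFS.
    ssc.checkpoint(checkpointDir)

    // With the log enabled, in-memory replication is redundant,
    // so a non-replicated storage level is used here.
    val stream = KafkaUtils.createStream(
      ssc, zkQuorum, groupId, topics, StorageLevel.MEMORY_AND_DISK_SER)

    // Keep only the message payload and write one directory per batch.
    stream.map(_._2).saveAsTextFiles("hdfs:///tmp/kafka-mirror/batch")
    ssc
  }

  def main(args: Array[String]): Unit = {
    // Recover from the checkpoint after a driver restart, or start fresh.
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()
  }
}
```

The master is left unset on purpose so it can be supplied by spark-submit, which should work the same way on Mesos.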
-Will

On Fri, Mar 13, 2015 at 3:28 PM, Alberto Miorin <amiorin78+ka...@gmail.com> wrote:

> I'll try this too. It looks very promising.
>
> Thx
>
> On Fri, Mar 13, 2015 at 8:25 PM, Gwen Shapira <gshap...@cloudera.com> wrote:
>
> > There's a KafkaRDD that can be used in Spark:
> > https://github.com/tresata/spark-kafka. It doesn't exactly replace
> > Camus, but should be useful in building Camus-like system in Spark.
> >
> > On Fri, Mar 13, 2015 at 12:15 PM, Alberto Miorin
> > <amiorin78+ka...@gmail.com> wrote:
> > > We use spark on mesos. I don't want to partition our cluster because of one
> > > YARN job (camus).
> > >
> > > Best
> > >
> > > Alberto
> > >
> > > On Fri, Mar 13, 2015 at 7:43 PM, Otis Gospodnetic <otis.gospodne...@gmail.com> wrote:
> > >
> > >> Just curious - why - is Camus not suitable/working?
> > >>
> > >> Thanks,
> > >> Otis
> > >> --
> > >> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> > >> Solr & Elasticsearch Support * http://sematext.com/
> > >>
> > >>
> > >> On Fri, Mar 13, 2015 at 2:33 PM, Alberto Miorin <amiorin78+ka...@gmail.com> wrote:
> > >>
> > >> > I was wondering if anybody has already tried to mirror a kafka topic to
> > >> > hdfs just copying the log files from the topic directory of the broker
> > >> > (like 00000000000023244237.log).
> > >> >
> > >> > The file format is very simple :
> > >> > https://twitter.com/amiorin/status/576448691139121152/photo/1
> > >> >
> > >> > Implementing an InputFormat should not be so difficult.
> > >> >
> > >> > Any drawbacks?