It seemed really counter-intuitive; I can only imagine it happened because nobody wanted to refactor the existing KafkaInputDStream to use the SimpleConsumer instead of the High Level Consumer. Unless I'm misreading the source, that's what the new DirectKafkaInputDStream is doing, whereas KafkaInputDStream is still using kafka.consumer.Consumer.
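For anyone comparing the two code paths, here is a rough sketch of how the receiver-based API and the new direct API look in 1.3.0. The topic name, broker address, ZooKeeper address, and group id below are just placeholders, and you need the spark-streaming-kafka artifact on the classpath:

// Rough sketch only; "events", "zk1:2181", "broker1:9092" and "demo-group" are made up.
// Requires the spark-streaming-kafka_2.10 1.3.0 dependency.
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object DirectVsReceiver {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-direct-demo").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Old receiver-based path (KafkaInputDStream, high-level consumer, offsets in ZooKeeper):
    val receiverStream = KafkaUtils.createStream(
      ssc, "zk1:2181", "demo-group", Map("events" -> 1))

    // New direct path (DirectKafkaInputDStream, simple consumer, offset ranges tracked by Spark):
    val directStream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, Map("metadata.broker.list" -> "broker1:9092"), Set("events"))

    directStream.map(_._2).print()
    ssc.start()
    ssc.awaitTermination()
  }
}

Because the direct stream just re-reads the needed offset ranges from Kafka on recovery, it doesn't need the HDFS write-ahead log at all, which I think is Gwen's point below.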
-Will

On Fri, Mar 13, 2015 at 3:42 PM, Gwen Shapira <gshap...@cloudera.com> wrote:
> I really like the new approach. The WAL in HDFS never made much sense
> to me (I mean, Kafka is a log. I know they don't want the Kafka
> dependency, but a log for a log makes no sense).
>
> Still experimental, but I think that's the right direction.
>
> On Fri, Mar 13, 2015 at 12:38 PM, Alberto Miorin
> <amiorin78+ka...@gmail.com> wrote:
> > We are currently using spark streaming 1.2.1 with kafka and write-ahead log.
> > I will only say one thing: "a nightmare". ;-)
> >
> > Let's see if things are better with 1.3.0:
> > http://spark.apache.org/docs/1.3.0/streaming-kafka-integration.html
> >
> > On Fri, Mar 13, 2015 at 8:33 PM, William Briggs <wrbri...@gmail.com> wrote:
> >
> >> Spark Streaming also has built-in support for Kafka, and as of Spark 1.2,
> >> it supports using an HDFS write-ahead log to ensure zero data loss while
> >> streaming:
> >> https://databricks.com/blog/2015/01/15/improved-driver-fault-tolerance-and-zero-data-loss-in-spark-streaming.html
> >>
> >> -Will
> >>
> >> On Fri, Mar 13, 2015 at 3:28 PM, Alberto Miorin <amiorin78+ka...@gmail.com>
> >> wrote:
> >>
> >>> I'll try this too. It looks very promising.
> >>>
> >>> Thx
> >>>
> >>> On Fri, Mar 13, 2015 at 8:25 PM, Gwen Shapira <gshap...@cloudera.com>
> >>> wrote:
> >>>
> >>> > There's a KafkaRDD that can be used in Spark:
> >>> > https://github.com/tresata/spark-kafka. It doesn't exactly replace
> >>> > Camus, but should be useful in building a Camus-like system in Spark.
> >>> >
> >>> > On Fri, Mar 13, 2015 at 12:15 PM, Alberto Miorin
> >>> > <amiorin78+ka...@gmail.com> wrote:
> >>> > > We use Spark on Mesos. I don't want to partition our cluster because
> >>> > > of one YARN job (Camus).
> >>> > >
> >>> > > Best
> >>> > >
> >>> > > Alberto
> >>> > >
> >>> > > On Fri, Mar 13, 2015 at 7:43 PM, Otis Gospodnetic <
> >>> > > otis.gospodne...@gmail.com> wrote:
> >>> > >
> >>> > >> Just curious - why - is Camus not suitable/working?
> >>> > >>
> >>> > >> Thanks,
> >>> > >> Otis
> >>> > >> --
> >>> > >> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> >>> > >> Solr & Elasticsearch Support * http://sematext.com/
> >>> > >>
> >>> > >> On Fri, Mar 13, 2015 at 2:33 PM, Alberto Miorin <amiorin78+ka...@gmail.com>
> >>> > >> wrote:
> >>> > >>
> >>> > >> > I was wondering if anybody has already tried to mirror a kafka topic to
> >>> > >> > hdfs by just copying the log files from the topic directory of the broker
> >>> > >> > (like 00000000000023244237.log).
> >>> > >> >
> >>> > >> > The file format is very simple:
> >>> > >> > https://twitter.com/amiorin/status/576448691139121152/photo/1
> >>> > >> >
> >>> > >> > Implementing an InputFormat should not be so difficult.
> >>> > >> >
> >>> > >> > Any drawbacks?
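Coming back to the original question at the bottom of the thread: here is a rough sketch, in plain Scala without the Hadoop APIs, of walking an 0.8.x .log segment. It assumes uncompressed messages with magic byte 0, and the path is made up; a RecordReader for such an InputFormat would need to do roughly this, plus split on entry boundaries and handle compressed message sets.

// Rough sketch of decoding a copied 0.8.x segment file; the path is hypothetical.
import java.io.{DataInputStream, EOFException, FileInputStream}

object LogSegmentReader {
  def main(args: Array[String]): Unit = {
    val in = new DataInputStream(new FileInputStream("/tmp/00000000000023244237.log"))
    try {
      while (true) {
        val offset = in.readLong()        // 8-byte logical offset
        val messageSize = in.readInt()    // 4-byte size of the message that follows
        val crc = in.readInt()            // 4-byte CRC32 of the remaining message bytes
        val magic = in.readByte()         // 1-byte format version (0 in 0.8.x)
        val attributes = in.readByte()    // 1-byte attributes (compression codec bits)
        val keyLen = in.readInt()         // -1 means null key
        val key = if (keyLen >= 0) { val b = new Array[Byte](keyLen); in.readFully(b); b } else null
        val valueLen = in.readInt()       // -1 means null value
        val value = if (valueLen >= 0) { val b = new Array[Byte](valueLen); in.readFully(b); b } else null
        // Assumes the payload is UTF-8 text; adjust for binary payloads.
        println(s"offset=$offset size=$messageSize value=${if (value != null) new String(value, "UTF-8") else "null"}")
      }
    } catch {
      case _: EOFException => // end of segment
    } finally {
      in.close()
    }
  }
}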