It seemed really counter-intuitive; I can only imagine it happened because
nobody wanted to refactor the existing KafkaInputDStream to use the
SimpleConsumer instead of the high-level consumer (unless I'm misreading
the source, that's what the new DirectKafkaInputDStream does, whereas
KafkaInputDStream still uses kafka.consumer.Consumer).
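
For anyone who hasn't looked at the new API yet, here's a minimal sketch of how
the direct approach is wired up through KafkaUtils.createDirectStream in 1.3
(the broker list, topic name, and batch interval below are just placeholders):

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val conf = new SparkConf().setAppName("direct-kafka-sketch")
    val ssc = new StreamingContext(conf, Seconds(10))

    // The direct stream talks to the brokers with the simple consumer API and
    // tracks offsets itself; no ZooKeeper-based high-level consumer group.
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092,broker2:9092")
    val topics = Set("my-topic")

    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics)

    stream.map(_._2).print()
    ssc.start()
    ssc.awaitTermination()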

-Will

On Fri, Mar 13, 2015 at 3:42 PM, Gwen Shapira <gshap...@cloudera.com> wrote:

> I really like the new approach. The WAL in HDFS never made much sense
> to me (I mean, Kafka is a log. I know they don't want the Kafka
> dependency, but a log for a log makes no sense).
>
> Still experimental, but I think that's the right direction.
>
> On Fri, Mar 13, 2015 at 12:38 PM, Alberto Miorin
> <amiorin78+ka...@gmail.com> wrote:
> > We are currently using Spark Streaming 1.2.1 with Kafka and the
> > write-ahead log. I will only say one thing: "a nightmare". ;-)
> >
> > Let's see if things are better with 1.3.0:
> > http://spark.apache.org/docs/1.3.0/streaming-kafka-integration.html
> >
> > On Fri, Mar 13, 2015 at 8:33 PM, William Briggs <wrbri...@gmail.com> wrote:
> >
> >> Spark Streaming also has built-in support for Kafka, and as of Spark 1.2,
> >> it supports using an HDFS write-ahead log to ensure zero data loss while
> >> streaming:
> >>
> >> https://databricks.com/blog/2015/01/15/improved-driver-fault-tolerance-and-zero-data-loss-in-spark-streaming.html
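
For context, a rough sketch of what that 1.2-style setup looks like, assuming
the receiver-based KafkaUtils.createStream with the WAL flag turned on and a
checkpoint directory (the ZooKeeper quorum, group id, topic, and paths are
placeholders):

    import org.apache.spark.SparkConf
    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val conf = new SparkConf()
      .setAppName("kafka-wal-sketch")
      // Write received blocks to a WAL on HDFS so a driver/receiver failure
      // does not lose buffered data.
      .set("spark.streaming.receiver.writeAheadLog.enable", "true")

    val ssc = new StreamingContext(conf, Seconds(10))
    ssc.checkpoint("hdfs:///checkpoints/kafka-wal-sketch")  // WAL lives under the checkpoint dir

    // High-level consumer via ZooKeeper; in-memory replication is redundant
    // once the WAL is on, hence MEMORY_AND_DISK_SER instead of *_2.
    val stream = KafkaUtils.createStream(
      ssc, "zk1:2181,zk2:2181", "mirror-group", Map("my-topic" -> 1),
      StorageLevel.MEMORY_AND_DISK_SER)

    stream.map(_._2).print()
    ssc.start()
    ssc.awaitTermination()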
> >>
> >> -Will
> >>
> >> On Fri, Mar 13, 2015 at 3:28 PM, Alberto Miorin <amiorin78+ka...@gmail.com> wrote:
> >>
> >>> I'll try this too. It looks very promising.
> >>>
> >>> Thx
> >>>
> >>> On Fri, Mar 13, 2015 at 8:25 PM, Gwen Shapira <gshap...@cloudera.com>
> >>> wrote:
> >>>
> >>> > There's a KafkaRDD that can be used in Spark:
> >>> > https://github.com/tresata/spark-kafka. It doesn't exactly replace
> >>> > Camus, but should be useful in building a Camus-like system in Spark.
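
Without speaking for the tresata API, here's a sketch of the same idea using
the KafkaRDD that Spark 1.3 exposes via KafkaUtils.createRDD; the brokers,
topic, offsets, and output path are placeholders, and a real Camus-like job
would look up the partition high watermarks rather than hard-code the ranges:

    import kafka.serializer.StringDecoder
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.streaming.kafka.{KafkaUtils, OffsetRange}

    val sc = new SparkContext(new SparkConf().setAppName("kafka-batch-sketch"))
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")

    // One RDD partition per offset range; a Camus-like job would first fetch
    // the latest offset per partition and build these ranges from that.
    val ranges = Array(
      OffsetRange.create("my-topic", 0, 0L, 100000L),
      OffsetRange.create("my-topic", 1, 0L, 100000L))

    val rdd = KafkaUtils.createRDD[String, String, StringDecoder, StringDecoder](
      sc, kafkaParams, ranges)

    // Persist the batch to HDFS, one directory per run.
    rdd.map(_._2).saveAsTextFile("hdfs:///data/my-topic/run-0001")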
> >>> >
> >>> > On Fri, Mar 13, 2015 at 12:15 PM, Alberto Miorin
> >>> > <amiorin78+ka...@gmail.com> wrote:
> >>> > > We use Spark on Mesos. I don't want to partition our cluster because
> >>> > > of one YARN job (Camus).
> >>> > >
> >>> > > Best
> >>> > >
> >>> > > Alberto
> >>> > >
> >>> > > On Fri, Mar 13, 2015 at 7:43 PM, Otis Gospodnetic <
> >>> > > otis.gospodne...@gmail.com> wrote:
> >>> > >
> >>> > >> Just curious - why? Is Camus not suitable/working?
> >>> > >>
> >>> > >> Thanks,
> >>> > >> Otis
> >>> > >> --
> >>> > >> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> >>> > >> Solr & Elasticsearch Support * http://sematext.com/
> >>> > >>
> >>> > >>
> >>> > >> On Fri, Mar 13, 2015 at 2:33 PM, Alberto Miorin <amiorin78+ka...@gmail.com> wrote:
> >>> > >>
> >>> > >> > I was wondering if anybody has already tried to mirror a Kafka topic
> >>> > >> > to HDFS by just copying the log files from the topic directory of the
> >>> > >> > broker (like 00000000000023244237.log).
> >>> > >> >
> >>> > >> > The file format is very simple:
> >>> > >> > https://twitter.com/amiorin/status/576448691139121152/photo/1
> >>> > >> >
> >>> > >> > Implementing an InputFormat should not be so difficult.
> >>> > >> >
> >>> > >> > Any drawbacks?
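
In case it helps, here's a rough sketch of what walking one of those segment
files looks like, assuming the 0.8-style on-disk format (magic byte 0) and
uncompressed messages; an InputFormat's RecordReader would do essentially the
same thing over an HDFS split:

    import java.io.{DataInputStream, EOFException, FileInputStream}

    // Rough sketch: iterate a 0.8-style segment file (magic byte 0, no timestamp field).
    // Assumes uncompressed messages; a compressed entry instead carries a whole nested
    // message set in its value, which is one real drawback of reading the files directly.
    def dumpSegment(path: String): Unit = {
      val in = new DataInputStream(new FileInputStream(path))
      try {
        while (true) {
          val offset = in.readLong()   // 8-byte logical offset
          val size   = in.readInt()    // 4-byte message size
          val crc    = in.readInt()    // 4-byte CRC32 of the bytes that follow
          val magic  = in.readByte()   // 0 in the 0.8 format
          val attrs  = in.readByte()   // low bits: compression codec
          val keyLen = in.readInt()    // -1 means null key
          val key    = if (keyLen >= 0) { val b = new Array[Byte](keyLen); in.readFully(b); b } else null
          val valLen = in.readInt()    // -1 means null value
          val value  = if (valLen >= 0) { val b = new Array[Byte](valLen); in.readFully(b); b } else null
          println(s"offset=$offset size=$size magic=$magic value=" +
            (if (value == null) "null" else new String(value, "UTF-8")))
        }
      } catch {
        case _: EOFException => // reached the end of the segment
      } finally {
        in.close()
      }
    }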
> >>> > >> >
> >>> > >>
> >>> >
> >>>
> >>
> >>
>
