The Flume solution looks very good. Thanks!
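For anyone finding this in the archives: the Flafka wiring from the post
Will linked below boils down to a Kafka source feeding an HDFS sink
through a channel. A minimal sketch of the agent config, assuming the
Flume Kafka source property names; the agent/component names, ZooKeeper
host, topic, and HDFS path are all placeholders to adjust for your
cluster:

  tier1.sources  = kafka-source
  tier1.channels = mem-channel
  tier1.sinks    = hdfs-sink

  # Kafka source: consumes the topic through the consumer API, so
  # partition leadership and broker failover are handled for you.
  tier1.sources.kafka-source.type = org.apache.flume.source.kafka.KafkaSource
  tier1.sources.kafka-source.zookeeperConnect = zkhost:2181
  tier1.sources.kafka-source.topic = mytopic
  tier1.sources.kafka-source.channels = mem-channel

  # In-memory channel buffering events between source and sink.
  tier1.channels.mem-channel.type = memory
  tier1.channels.mem-channel.capacity = 10000

  # HDFS sink: writes raw event bodies, rolling a new file every 5 minutes.
  tier1.sinks.hdfs-sink.type = hdfs
  tier1.sinks.hdfs-sink.channel = mem-channel
  tier1.sinks.hdfs-sink.hdfs.path = hdfs:///tmp/kafka/mytopic
  tier1.sinks.hdfs-sink.hdfs.fileType = DataStream
  tier1.sinks.hdfs-sink.hdfs.rollInterval = 300

Then start the agent with something like:

  flume-ng agent -n tier1 -f kafka-hdfs.properties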
On Fri, Mar 13, 2015 at 8:15 PM, William Briggs <wrbri...@gmail.com> wrote:

> I would think that this is not a particularly great solution, as you
> will end up running into quite a few edge cases, and I can't see this
> scaling particularly well - how do you know which server to copy logs
> from in a clustered and replicated environment? What happens when Kafka
> detects a failure and moves partition replicas to a different node? The
> reason the Kafka Consumer APIs exist is to shield you from having to
> think about these things. In addition, you would be tightly coupling
> yourself to Kafka's internal log format; in my experience, this sort of
> thing rarely ends well.
>
> Depending on your use case, Flume is a reasonable solution if you don't
> want to use Camus; it has a Kafka source that allows you to stream data
> out of Kafka and into HDFS:
> http://blog.cloudera.com/blog/2014/11/flafka-apache-flume-meets-apache-kafka-for-event-processing/
>
> -Will
>
> On Fri, Mar 13, 2015 at 2:33 PM, Alberto Miorin
> <amiorin78+ka...@gmail.com> wrote:
>
>> I was wondering if anybody has already tried to mirror a Kafka topic
>> to HDFS just by copying the log files from the topic directory of the
>> broker (like 00000000000023244237.log).
>>
>> The file format is very simple:
>> https://twitter.com/amiorin/status/576448691139121152/photo/1
>>
>> Implementing an InputFormat should not be so difficult.
>>
>> Any drawbacks?
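P.S. On the on-disk format Alberto mentions: as far as I can tell from
the 0.8.x code, each entry in a .log segment is just an 8-byte offset, a
4-byte size, and then that many message bytes (CRC, magic byte,
attributes, key, value). The framing is indeed easy to walk, but the
message payload is the part that is free to change between releases,
which is exactly Will's coupling point. A throwaway Java sketch under
those assumptions (class name made up, only valid against the 0.8-era
segment layout):

  import java.io.DataInputStream;
  import java.io.EOFException;
  import java.io.FileInputStream;
  import java.io.IOException;

  public class SegmentDump {
      public static void main(String[] args) throws IOException {
          try (DataInputStream in =
                   new DataInputStream(new FileInputStream(args[0]))) {
              while (true) {
                  long offset;
                  try {
                      offset = in.readLong();  // 8-byte logical offset, big-endian
                  } catch (EOFException eof) {
                      break;                   // clean end of the segment file
                  }
                  int size = in.readInt();     // 4-byte length of the message
                  byte[] message = new byte[size];
                  in.readFully(message);       // CRC, magic, attributes, key, value
                  System.out.printf("offset=%d size=%d%n", offset, size);
              }
          }
      }
  }

Note this only dumps the framing; actually decoding the message bytes
would tie you to the internal format, per the discussion above.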