The Flume solution looks very good.

Thx.

On Fri, Mar 13, 2015 at 8:15 PM, William Briggs <wrbri...@gmail.com> wrote:

> I would think this is not a particularly great solution: you will run
> into quite a few edge cases, and I can't see it scaling well. How do
> you know which server to copy logs from in a clustered, replicated
> environment? What happens when Kafka detects a failure and moves
> partition replicas to a different node? The Kafka Consumer APIs exist
> precisely to shield you from having to think about these things. In
> addition, you would be tightly coupling yourself to Kafka's internal
> log format; in my experience, that sort of thing rarely ends well.
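>
> To make that concrete, here is a rough sketch of the 0.8-era
> high-level consumer (the topic name, group id, and ZooKeeper address
> below are placeholders, not anything from your setup). Notice that
> partitions and replica placement never appear in the client code:
>
>     import java.util.HashMap;
>     import java.util.List;
>     import java.util.Map;
>     import java.util.Properties;
>
>     import kafka.consumer.Consumer;
>     import kafka.consumer.ConsumerConfig;
>     import kafka.consumer.ConsumerIterator;
>     import kafka.consumer.KafkaStream;
>     import kafka.javaapi.consumer.ConsumerConnector;
>     import kafka.message.MessageAndMetadata;
>
>     public class TopicTail {
>         public static void main(String[] args) {
>             Properties props = new Properties();
>             props.put("zookeeper.connect", "zkhost:2181"); // placeholder
>             props.put("group.id", "hdfs-mirror");          // placeholder
>             ConsumerConnector consumer =
>                 Consumer.createJavaConsumerConnector(
>                     new ConsumerConfig(props));
>
>             Map<String, Integer> topicCount =
>                 new HashMap<String, Integer>();
>             topicCount.put("mytopic", 1); // one stream for the topic
>             Map<String, List<KafkaStream<byte[], byte[]>>> streams =
>                 consumer.createMessageStreams(topicCount);
>
>             // Which partition lives on which broker, and replicas moving
>             // after a failure, are handled by the rebalance logic; the
>             // client just iterates over messages.
>             ConsumerIterator<byte[], byte[]> it =
>                 streams.get("mytopic").get(0).iterator();
>             while (it.hasNext()) {
>                 MessageAndMetadata<byte[], byte[]> msg = it.next();
>                 System.out.println(new String(msg.message()));
>             }
>         }
>     }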
>
> Depending on your use case, Flume is a reasonable solution if you don't
> want to use Camus; it has a Kafka source that lets you stream data out
> of Kafka and into HDFS:
> http://blog.cloudera.com/blog/2014/11/flafka-apache-flume-meets-apache-kafka-for-event-processing/
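>
> A minimal Flafka-style agent config might look like this; it's a
> sketch, with the agent name "tier1", the topic, and the HDFS path as
> placeholders, so double-check the property names against the Flume
> docs for your version:
>
>     tier1.sources  = source1
>     tier1.channels = channel1
>     tier1.sinks    = sink1
>
>     # Kafka source: reads the topic and puts events on the channel
>     tier1.sources.source1.type = org.apache.flume.source.kafka.KafkaSource
>     tier1.sources.source1.zookeeperConnect = zkhost:2181
>     tier1.sources.source1.topic = mytopic
>     tier1.sources.source1.channels = channel1
>
>     tier1.channels.channel1.type = memory
>     tier1.channels.channel1.capacity = 10000
>
>     # HDFS sink: rolls files under a per-day directory
>     tier1.sinks.sink1.type = hdfs
>     tier1.sinks.sink1.hdfs.path = /tmp/kafka/mytopic/%y-%m-%d
>     tier1.sinks.sink1.hdfs.fileType = DataStream
>     tier1.sinks.sink1.hdfs.useLocalTimeStamp = true
>     tier1.sinks.sink1.channel = channel1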
>
> -Will
>
> On Fri, Mar 13, 2015 at 2:33 PM, Alberto Miorin
> <amiorin78+ka...@gmail.com> wrote:
>
>> I was wondering if anybody has already tried to mirror a Kafka topic
>> to HDFS by just copying the log files from the topic directory on the
>> broker (like 00000000000023244237.log).
>>
>> The file format is very simple:
>> https://twitter.com/amiorin/status/576448691139121152/photo/1
>>
>> Implementing an InputFormat for it should not be too difficult.
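>>
>> As a rough sketch, decoding a segment along the lines of the layout in
>> that photo might look like this (the class and method names are made
>> up; compressed message sets, where the value is itself a nested
>> message set, are not handled):
>>
>>     import java.io.DataInputStream;
>>     import java.io.EOFException;
>>     import java.io.FileInputStream;
>>     import java.io.IOException;
>>
>>     public class SegmentDump {
>>         public static void main(String[] args) throws IOException {
>>             // args[0]: a copied segment, e.g. 00000000000023244237.log
>>             DataInputStream in =
>>                 new DataInputStream(new FileInputStream(args[0]));
>>             try {
>>                 while (true) {
>>                     long offset = in.readLong(); // 8-byte logical offset
>>                     in.readInt();                // 4-byte message size
>>                     in.readInt();                // 4-byte CRC32
>>                     in.readByte();               // magic (format version)
>>                     in.readByte();               // attributes (codec bits)
>>                     byte[] key = lengthPrefixed(in);   // -1 length = null
>>                     byte[] value = lengthPrefixed(in);
>>                     System.out.println(offset + " -> " + value.length
>>                             + " value bytes");
>>                 }
>>             } catch (EOFException end) {
>>                 // normal end of the segment
>>             }
>>         }
>>
>>         static byte[] lengthPrefixed(DataInputStream in)
>>                 throws IOException {
>>             int len = in.readInt();
>>             byte[] buf = new byte[Math.max(len, 0)];
>>             in.readFully(buf);
>>             return buf;
>>         }
>>     }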
>>
>> Any drawbacks?
>>
>
>
