Re: A more scalable Kafka to Hadoop InputFormat

Bhavesh Mistry Thu, 30 Oct 2014 20:20:33 -0700

Hi Casey,

Do you think the Camus job could benefit from this? It pulls data from Kafka
and stores it in timestamp buckets.


Here is a link to the source:
https://github.com/linkedin/camus/blob/master/camus-etl-kafka/src/main/java/com/linkedin/camus/etl/kafka/mapred/EtlInputFormat.java

Thanks,

Bhavesh

On Thu, Oct 30, 2014 at 7:27 AM, Casey Green <cgr...@conductor.com> wrote:

> Hi Folks,
>
> I'm open sourcing a scalable Kafka InputFormat.  As far as I know, my
> version differs from other Kafka InputFormats out there in that input
> splits are mapped to Kafka log files rather than to entire Kafka
> partitions.  Mapping Kafka log files to input splits scales your
> Map/Reduce job with the amount of data left to consume in a queue,
> whereas mapping input splits to entire partitions always gives you a
> constant number of input splits.
>
> I wrote up a blog post about it here:
> http://www.conductor.com/nightlight/data-stream-processing-bulk-kafka-hadoop/
> and the source code for my KafkaInputFormat is on GitHub:
> https://github.com/Conductor/kangaroo
> Your questions, comments, and feedback are welcome and much appreciated!
>
> Thanks,
> Casey Green
>
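
To make the split-count argument in the quoted message concrete, here is a
minimal sketch that contrasts the two mappings. It is not the Kangaroo or
Camus code. The LogSegment type, the function names, and the sample offsets
are all hypothetical, and it assumes each log file covers a known,
contiguous offset range and that the consumer has one committed offset per
partition.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class LogSegment:
    """Hypothetical model of one Kafka log file: offsets [start, end)."""
    partition: int
    start: int
    end: int


def splits_per_partition(segments: List[LogSegment],
                         committed: Dict[int, int]) -> List[int]:
    """One split per partition that still has unconsumed data."""
    pending = {s.partition for s in segments
               if s.end > committed.get(s.partition, 0)}
    return sorted(pending)


def splits_per_segment(segments: List[LogSegment],
                       committed: Dict[int, int]) -> List[Tuple[int, int, int]]:
    """One split per log file that still has unconsumed data.

    Each split is (partition, first offset to read, end offset). The start
    is clipped to the committed offset so consumed data is not re-read.
    """
    splits = []
    for s in segments:
        done = committed.get(s.partition, 0)
        if s.end > done:
            splits.append((s.partition, max(s.start, done), s.end))
    return splits


# Sample data (made up): 2 partitions, 3 log files of 1000 offsets each.
segments = [LogSegment(p, start, start + 1000)
            for p in (0, 1) for start in (0, 1000, 2000)]
# The consumer is halfway through partition 0 and has not read partition 1.
committed = {0: 1500, 1: 0}

by_partition = splits_per_partition(segments, committed)
by_segment = splits_per_segment(segments, committed)
print(len(by_partition), "splits when mapping partitions")
print(len(by_segment), "splits when mapping log files")
```

With this sample data the partition mapping yields 2 splits and the log-file
mapping yields 5. Adding more unconsumed log files raises only the second
number, which is the scaling behavior described in the quoted message.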
