Hi Casey,

Do you think the Camus job could benefit from this? It pulls data from Kafka and stores it in timestamp-based buckets.
Here is a link to the source:
https://github.com/linkedin/camus/blob/master/camus-etl-kafka/src/main/java/com/linkedin/camus/etl/kafka/mapred/EtlInputFormat.java

Thanks,
Bhavesh

On Thu, Oct 30, 2014 at 7:27 AM, Casey Green <cgr...@conductor.com> wrote:

> Hi Folks,
>
> I'm open sourcing a scalable Kafka InputFormat. As far as I know or am
> aware of, my version is unique compared to other Kafka InputFormats out
> there, in that input splits are mapped to Kafka log files, rather than
> entire Kafka partitions. Mapping Kafka log files to input splits scales
> your Map/Reduce job by the amount of data left to consume in a queue,
> whereas mapping input splits to entire partitions always gives you a
> constant number of input splits.
>
> I wrote up a blog post about it here<
> http://www.conductor.com/nightlight/data-stream-processing-bulk-kafka-hadoop/>,
> and the source code for my KafkaInputFormat is on github<
> https://github.com/Conductor/kangaroo>. Your questions, comments and
> feedback are welcomed and much appreciated!
>
> Thanks,
> Casey Green
>
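
For anyone following the thread, here is a rough sketch of the difference Casey describes between one split per partition and one split per log file. It is only an illustration, not code from kangaroo or Camus: `LogSegment`, `splits_per_partition`, `splits_per_log_file`, and the offsets below are all hypothetical names and made-up numbers.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class LogSegment:
    """Hypothetical description of one Kafka log file within a partition."""
    partition: int
    start_offset: int  # first offset stored in this file
    end_offset: int    # one past the last offset stored in this file


def splits_per_partition(segments: List[LogSegment]) -> int:
    """One split per partition, however much data is waiting."""
    return len({s.partition for s in segments})


def splits_per_log_file(segments: List[LogSegment],
                        committed: Dict[int, int]) -> int:
    """One split per log file that still holds unconsumed offsets.

    `committed` maps partition -> next offset to consume (assumed 0 if absent).
    """
    return sum(1 for s in segments
               if s.end_offset > committed.get(s.partition, 0))


# Made-up layout: partition 0 has three log files, partition 1 has two.
segments = [
    LogSegment(0, 0, 1000), LogSegment(0, 1000, 2000), LogSegment(0, 2000, 3000),
    LogSegment(1, 0, 1000), LogSegment(1, 1000, 2000),
]
# The consumer has already read the first log file of partition 0.
committed = {0: 1000, 1: 0}

print(splits_per_partition(segments))            # constant per topic
print(splits_per_log_file(segments, committed))  # grows with the backlog
```

In this toy model the per-partition count stays the same as the backlog grows, while the per-log-file count follows the number of files that still have data to read, which is the scaling behaviour described in the quoted message.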