We’ve been using Miniway’s hadoop-consumer in production for over a year without any problems. It stores offsets in ZooKeeper rather than HDFS, and it uses the more recent MapReduce API.
https://github.com/miniway/kafka-hadoop-consumer

On Feb 13, 2014, at 11:18 AM, Marcelo Valle <mva...@redoop.org> wrote:

> Hello,
>
> I've been studying different options to consume messages from Kafka into
> Hadoop (HDFS) and found three options:
>
> LinkedIn Camus - https://github.com/linkedin/camus
> kafka-hadoop-loader - https://github.com/michal-harish/kafka-hadoop-loader
> hadoop-consumer -
> https://github.com/apache/kafka/tree/0.8/contrib/hadoop-consumer
>
> I suppose Camus is the most robust tool, and from a performance point of
> view it is the best too. But it is more complex to use and develop than the
> other options, and it does not support raw text messages; only
> Avro-serialized messages can be used.
>
> kafka-hadoop-loader has had no support for a year and doesn't work
> with Hadoop 2, so it is discarded.
>
> hadoop-consumer is native in Kafka trunk, is simple and easy to use, and
> supports Avro and raw text, but I have doubts about its performance and
> fault tolerance.
>
> Am I right in my conclusions?
> Do you know of any alternative?
> Can you help me choose the best?
>
> Thanks!