For the last 6 months, we've been using this: https://github.com/wikimedia-incubator/kafka-hadoop-consumer
In combination with this wrapper script: https://github.com/wikimedia/kraken/blob/master/bin/kafka-hadoop-consume

It's not great, but it works!

On Aug 9, 2013, at 2:06 PM, Felix GV <fe...@mate1inc.com> wrote:

> I think the answer is that there is currently no strong community-backed
> solution to consume non-Avro data from Kafka to HDFS.
>
> A lot of people do it, but I think most people adapted and expanded the
> contrib code to fit their needs.
>
> --
> Felix
>
>
> On Fri, Aug 9, 2013 at 1:27 PM, Oleg Ruchovets <oruchov...@gmail.com> wrote:
>
>> Yes, I am definitely interested in such capabilities. We are also using
>> Kafka 0.7.
>> Guys, I already asked, but nobody answered: what is the community using to
>> consume from Kafka to HDFS?
>> My assumption was that if Camus supports only Avro it will not be suitable
>> for everyone, but people transfer from Kafka to Hadoop somehow. So the question
>> is: what are the alternatives to Camus for transferring messages from Kafka to
>> HDFS?
>> Thanks
>> Oleg.
>>
>>
>> On Fri, Aug 9, 2013 at 6:21 AM, Andrew Psaltis <psaltis.and...@gmail.com> wrote:
>>
>>> Felix,
>>> The Camus route is the direction I have headed for a lot of the reasons
>>> that you described. The only wrinkle is we are still on Kafka 0.7.3, so I am
>>> in the process of backporting this patch:
>>> https://github.com/linkedin/camus/commit/87917a2aea46da9d21c8f67129f6463af52f7aa8
>>> that is described here:
>>> https://groups.google.com/forum/#!topic/camus_etl/VcETxkYhzg8 -- so that
>>> we can handle reading and writing non-avro'ized (if that is a word) data.
>>>
>>> I hope to have that done sometime in the morning and would be happy to
>>> share it if others can benefit from it.
>>>
>>> Thanks,
>>> Andrew
>>>
>>>
>>> On Thursday, August 8, 2013 7:18:27 PM UTC-6, Felix GV wrote:
>>>
>>>> The contrib code is simple and probably wouldn't require too much work to
>>>> fix, but it's a lot less robust than Camus, so you would ideally need to do
>>>> some work to make it solid against all edge cases, failure scenarios and
>>>> performance bottlenecks...
>>>>
>>>> I would definitely recommend investing in Camus instead, since it already
>>>> covers a lot of the challenges I'm mentioning above, and also has more
>>>> community support behind it at the moment (as far as I can tell, anyway),
>>>> so it is more likely to keep getting improvements than the contrib code.
>>>>
>>>> --
>>>> Felix
>>>>
>>>>
>>>> On Thu, Aug 8, 2013 at 9:28 AM, <psaltis...@gmail.com> wrote:
>>>>
>>>>> We also have a need today to ETL from Kafka into Hadoop, and we do not
>>>>> currently use Avro nor have any plans to.
>>>>>
>>>>> So is the official direction based on this discussion to ditch the Kafka
>>>>> contrib code and direct people to use Camus without Avro as Ken described,
>>>>> or are both solutions going to survive?
>>>>>
>>>>> I can put time into the contrib code and/or work on documenting the
>>>>> tutorial on how to make Camus work without Avro.
>>>>>
>>>>> Which is the preferred route, for the long term?
>>>>>
>>>>> Thanks,
>>>>> Andrew
>>>>>
>>>>> On Wednesday, August 7, 2013 10:50:53 PM UTC-6, Ken Goodhope wrote:
>>>>>> Hi Andrew,
>>>>>>
>>>>>> Camus can be made to work without Avro. You will need to implement a
>>>>>> message decoder and a data writer. We need to add a better tutorial
>>>>>> on how to do this, but it isn't that difficult. If you decide to go down
>>>>>> this path, you can always ask questions on this list. I try to make sure
>>>>>> each email gets answered. But it can take me a day or two.
>>>>>>
>>>>>> -Ken
>>>>>>
>>>>>> On Aug 7, 2013, at 9:33 AM, ao...@wikimedia.org wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> Over at the Wikimedia Foundation, we're trying to figure out the
>>>>>>> best way to do our ETL from Kafka into Hadoop. We don't currently use Avro
>>>>>>> and I'm not sure if we are going to. I came across this post.
>>>>>>>
>>>>>>> If the plan is to remove the hadoop-consumer from Kafka contrib, do
>>>>>>> you think we should not consider it as one of our viable options?
>>>>>>>
>>>>>>> Thanks!
>>>>>>> -Andrew