Re: Kafka/Hadoop consumers and producers
We also have a need today to ETL from Kafka into Hadoop, and we do not currently use Avro, nor have any plans to.

So, based on this discussion, is the official direction to ditch the Kafka contrib code and direct people to use Camus without Avro, as Ken described, or are both solutions going to survive? I can put time into the contrib code and/or work on documenting the tutorial on how to make Camus work without Avro.

Which is the preferred route for the long term?

Thanks,
Andrew

On Wednesday, August 7, 2013 10:50:53 PM UTC-6, Ken Goodhope wrote:
> Hi Andrew,
>
> Camus can be made to work without Avro. You will need to implement a message decoder and a data writer. We need to add a better tutorial on how to do this, but it isn't that difficult. If you decide to go down this path, you can always ask questions on this list. I try to make sure each email gets answered, but it can take me a day or two.
>
> -Ken
>
> On Aug 7, 2013, at 9:33 AM, ao...@wikimedia.org wrote:
>
> > Hi all,
> >
> > Over at the Wikimedia Foundation, we're trying to figure out the best way to do our ETL from Kafka into Hadoop. We don't currently use Avro and I'm not sure if we are going to. I came across this post.
> >
> > If the plan is to remove the hadoop-consumer from Kafka contrib, do you think we should not consider it as one of our viable options?
> >
> > Thanks!
> > -Andrew
> >
> > --
> > You received this message because you are subscribed to the Google Groups "Camus - Kafka ETL for Hadoop" group.
> > To unsubscribe from this group and stop receiving emails from it, send an email to camus_etl+unsubscr...@googlegroups.com.
> > For more options, visit https://groups.google.com/groups/opt_out.
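[Editor's note] The "message decoder" Ken mentions is the hook that turns raw Kafka message bytes into a record plus a timestamp, which Camus then uses to bucket output into time-partitioned HDFS paths. Camus itself is Java (the real hook is a MessageDecoder implementation), but the logic is easy to sketch. The following Python is only an illustrative stand-in for a non-Avro, JSON-per-message decoder; the class and field names are invented for the example and are not Camus's actual API:

```python
import json
from datetime import datetime, timezone

class JsonMessageDecoder:
    """Sketch of a non-Avro decoder: raw Kafka bytes -> (record, timestamp).

    The timestamp is what lets an ETL job bucket records into
    time-partitioned HDFS directories (e.g. topic/2013/08/08/...).
    """

    def __init__(self, timestamp_field="timestamp"):
        self.timestamp_field = timestamp_field

    def decode(self, payload: bytes):
        record = json.loads(payload.decode("utf-8"))
        # Fall back to "now" if the message carries no timestamp,
        # so a record without one still lands somewhere.
        ts = record.get(self.timestamp_field)
        when = (datetime.fromtimestamp(ts, tz=timezone.utc)
                if ts is not None else datetime.now(timezone.utc))
        return record, when

    def output_path(self, topic: str, when: datetime) -> str:
        # Hour-granularity layout, similar in spirit to what Camus writes.
        return f"{topic}/{when:%Y/%m/%d/%H}"

decoder = JsonMessageDecoder()
msg = b'{"event": "pageview", "timestamp": 1375948800}'
record, when = decoder.decode(msg)
print(decoder.output_path("webrequest", when))  # webrequest/2013/08/08/08
```

A real port would also need the "data writer" half Ken mentions, so the decoded payloads land in HDFS in a non-Avro container (plain text or SequenceFiles, for instance).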
Message Serialization
I've read that LinkedIn uses Avro for their message serialization. Was there any particular reason this was chosen, say, over something like Thrift or Protocol Buffers? Was the main motivating factor the native handling of Avro in Hadoop?
Re: Message Serialization
I did a comparison of Thrift vs. PB vs. Avro about 3 years ago. At the time, Avro was faster than PB, which was faster than Thrift. Avro also has schema evolution (mentioned in the Kafka paper).

On Thu, Aug 8, 2013 at 10:08 AM, Mark wrote:
> I've read that LinkedIn uses Avro for their message serialization. Was there any particular reason this was chosen say over something like Thrift or ProtocolBuffers? Was the main motivating factor the native handling of Avro in Hadoop?
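[Editor's note] The schema evolution point is worth unpacking: Avro resolves the writer's schema against the reader's schema at read time, so producers and consumers can upgrade independently — a reader with a newer schema fills missing fields from defaults, and a reader with an older schema skips fields it doesn't know. Here is a toy Python simulation of that resolution rule; it only mimics the behavior for illustration and is not the avro library's API:

```python
# Toy model of Avro schema resolution: a "schema" is an ordered list of
# (field_name, default) pairs; REQUIRED marks fields with no default.
REQUIRED = object()

def resolve(reader_schema, record):
    """Project a decoded record onto the reader's schema, Avro-style:
    missing fields take the reader's default; extra fields are dropped."""
    out = {}
    for name, default in reader_schema:
        if name in record:
            out[name] = record[name]
        elif default is not REQUIRED:
            out[name] = default
        else:
            raise ValueError(f"no value or default for field {name!r}")
    return out

# v2 adds an optional 'referrer' field with a default of None.
v1 = [("user", REQUIRED), ("url", REQUIRED)]
v2 = [("user", REQUIRED), ("url", REQUIRED), ("referrer", None)]

old_record = {"user": "mark", "url": "/home"}
new_record = {"user": "mark", "url": "/home", "referrer": "/search"}

print(resolve(v2, old_record))  # new reader, old data: default fills the gap
print(resolve(v1, new_record))  # old reader, new data: extra field dropped
```

This is why, in Avro, newly added fields need defaults: without one, old data becomes unreadable under the new schema.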
Re: Kafka/Hadoop consumers and producers
The contrib code is simple and probably wouldn't require too much work to fix, but it's a lot less robust than Camus, so you would ideally need to do some work to make it solid against all edge cases, failure scenarios, and performance bottlenecks...

I would definitely recommend investing in Camus instead, since it already covers a lot of the challenges I'm mentioning above, and also has more community support behind it at the moment (as far as I can tell, anyway), so it is more likely to keep getting improvements than the contrib code.

--
Felix

On Thu, Aug 8, 2013 at 9:28 AM, wrote:
> We also have a need today to ETL from Kafka into Hadoop and we do not currently nor have any plans to use Avro.
>
> So is the official direction based on this discussion to ditch the Kafka contrib code and direct people to use Camus without Avro as Ken described or are both solutions going to survive?
>
> I can put time into the contrib code and/or work on documenting the tutorial on how to make Camus work without Avro.
>
> Which is the preferred route, for the long term?
>
> Thanks,
> Andrew
Re: Kafka/Hadoop consumers and producers
Felix,

The Camus route is the direction I have headed, for a lot of the reasons that you described. The only wrinkle is we are still on Kafka 0.7.3, so I am in the process of back-porting this patch: https://github.com/linkedin/camus/commit/87917a2aea46da9d21c8f67129f6463af52f7aa8 which is described here: https://groups.google.com/forum/#!topic/camus_etl/VcETxkYhzg8 -- so that we can handle reading and writing non-Avro'ized (if that is a word) data.

I hope to have that done sometime in the morning and would be happy to share it if others can benefit from it.

Thanks,
Andrew

On Thursday, August 8, 2013 7:18:27 PM UTC-6, Felix GV wrote:
> The contrib code is simple and probably wouldn't require too much work to fix, but it's a lot less robust than Camus, so you would ideally need to do some work to make it solid against all edge cases, failure scenarios and performance bottlenecks...
>
> I would definitely recommend investing in Camus instead, since it already covers a lot of the challenges I'm mentioning above, and also has more community support behind it at the moment (as far as I can tell, anyway), so it is more likely to keep getting improvements than the contrib code.
>
> --
> Felix
Re: Message Serialization
I think we discuss that a little in this paper: http://sites.computer.org/debull/A12june/pipeline.pdf

-Jay

On Thu, Aug 8, 2013 at 10:08 AM, Mark wrote:
> I've read that LinkedIn uses Avro for their message serialization. Was there any particular reason this was chosen say over something like Thrift or ProtocolBuffers? Was the main motivating factor the native handling of Avro in Hadoop?