Re: Kafka/Hadoop consumers and producers
We also have a need today to ETL from Kafka into Hadoop, and we do not currently use Avro, nor have any plans to.

So, based on this discussion, is the official direction to ditch the Kafka contrib code and direct people to use Camus without Avro, as Ken described, or are both solutions going to survive? I can put time into the contrib code and/or work on documenting the tutorial on how to make Camus work without Avro.

Which is the preferred route for the long term?

Thanks,
Andrew

On Wednesday, August 7, 2013 10:50:53 PM UTC-6, Ken Goodhope wrote:
> Hi Andrew,
>
> Camus can be made to work without Avro. You will need to implement a message decoder and a data writer. We need to add a better tutorial on how to do this, but it isn't that difficult. If you decide to go down this path, you can always ask questions on this list. I try to make sure each email gets answered, but it can take me a day or two.
>
> -Ken
>
> On Aug 7, 2013, at 9:33 AM, ao...@wikimedia.org wrote:
>
> > Hi all,
> >
> > Over at the Wikimedia Foundation, we're trying to figure out the best way to do our ETL from Kafka into Hadoop. We don't currently use Avro and I'm not sure if we are going to. I came across this post.
> >
> > If the plan is to remove the hadoop-consumer from Kafka contrib, do you think we should not consider it as one of our viable options?
> >
> > Thanks!
> > -Andrew
> >
> > --
> > You received this message because you are subscribed to the Google Groups "Camus - Kafka ETL for Hadoop" group.
> > To unsubscribe from this group and stop receiving emails from it, send an email to camus_etl+unsubscr...@googlegroups.com.
> > For more options, visit https://groups.google.com/groups/opt_out.
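[Editor's note] The "message decoder" Ken mentions is the hook that turns raw Kafka message bytes into a record plus a timestamp, which Camus then uses to bucket output into time-partitioned HDFS paths. Camus itself is Java (the real hook is a MessageDecoder implementation), but the logic is easy to sketch. The following Python is only an illustrative stand-in for a non-Avro, JSON-per-message decoder; the class and field names are invented for the example and are not Camus's actual API:

```python
import json
from datetime import datetime, timezone

class JsonMessageDecoder:
    """Sketch of a non-Avro decoder: raw Kafka bytes -> (record, timestamp).

    The timestamp is what lets an ETL job bucket records into
    time-partitioned HDFS directories (e.g. topic/2013/08/08/...).
    """

    def __init__(self, timestamp_field="timestamp"):
        self.timestamp_field = timestamp_field

    def decode(self, payload: bytes):
        record = json.loads(payload.decode("utf-8"))
        # Fall back to "now" if the message carries no timestamp,
        # so a record without one still lands somewhere.
        ts = record.get(self.timestamp_field)
        when = (datetime.fromtimestamp(ts, tz=timezone.utc)
                if ts is not None else datetime.now(timezone.utc))
        return record, when

    def output_path(self, topic: str, when: datetime) -> str:
        # Hour-granularity layout, similar in spirit to what Camus writes.
        return f"{topic}/{when:%Y/%m/%d/%H}"

decoder = JsonMessageDecoder()
msg = b'{"event": "pageview", "timestamp": 1375948800}'
record, when = decoder.decode(msg)
print(decoder.output_path("webrequest", when))  # webrequest/2013/08/08/08
```

A real port would also need the "data writer" half Ken mentions, so the decoded payloads land in HDFS in a non-Avro container (plain text or SequenceFiles, for instance).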
Message Serialization
I've read that LinkedIn uses Avro for their message serialization. Was there any particular reason this was chosen, say, over something like Thrift or Protocol Buffers? Was the main motivating factor the native handling of Avro in Hadoop?
Re: Message Serialization
I did a comparison of Thrift vs. PB vs. Avro about 3 years ago. At the time, Avro was faster than PB, which was faster than Thrift. Avro also has schema evolution (mentioned in the Kafka paper).

On Thu, Aug 8, 2013 at 10:08 AM, Mark wrote:
> I've read that LinkedIn uses Avro for their message serialization. Was there any particular reason this was chosen say over something like Thrift or ProtocolBuffers? Was the main motivating factor the native handling of Avro in Hadoop?
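[Editor's note] The schema evolution point is worth unpacking: Avro resolves the writer's schema against the reader's schema at read time, so producers and consumers can upgrade independently — a reader with a newer schema fills missing fields from defaults, and a reader with an older schema skips fields it doesn't know. Here is a toy Python simulation of that resolution rule; it only mimics the behavior for illustration and is not the avro library's API:

```python
# Toy model of Avro schema resolution: a "schema" is an ordered list of
# (field_name, default) pairs; REQUIRED marks fields with no default.
REQUIRED = object()

def resolve(reader_schema, record):
    """Project a decoded record onto the reader's schema, Avro-style:
    missing fields take the reader's default; extra fields are dropped."""
    out = {}
    for name, default in reader_schema:
        if name in record:
            out[name] = record[name]
        elif default is not REQUIRED:
            out[name] = default
        else:
            raise ValueError(f"no value or default for field {name!r}")
    return out

# v2 adds an optional 'referrer' field with a default of None.
v1 = [("user", REQUIRED), ("url", REQUIRED)]
v2 = [("user", REQUIRED), ("url", REQUIRED), ("referrer", None)]

old_record = {"user": "mark", "url": "/home"}
new_record = {"user": "mark", "url": "/home", "referrer": "/search"}

print(resolve(v2, old_record))  # new reader, old data: default fills the gap
print(resolve(v1, new_record))  # old reader, new data: extra field dropped
```

This is why, in Avro, newly added fields need defaults: without one, old data becomes unreadable under the new schema.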
Re: Kafka/Hadoop consumers and producers
The contrib code is simple and probably wouldn't require too much work to fix, but it's a lot less robust than Camus, so you would ideally need to do some work to make it solid against all edge cases, failure scenarios, and performance bottlenecks...

I would definitely recommend investing in Camus instead, since it already covers a lot of the challenges I'm mentioning above, and also has more community support behind it at the moment (as far as I can tell, anyway), so it is more likely to keep getting improvements than the contrib code.

--
Felix

On Thu, Aug 8, 2013 at 9:28 AM, wrote:
> We also have a need today to ETL from Kafka into Hadoop and we do not currently nor have any plans to use Avro.
>
> So is the official direction based on this discussion to ditch the Kafka contrib code and direct people to use Camus without Avro as Ken described or are both solutions going to survive?
>
> I can put time into the contrib code and/or work on documenting the tutorial on how to make Camus work without Avro.
>
> Which is the preferred route, for the long term?
>
> Thanks,
> Andrew
Re: Kafka/Hadoop consumers and producers
Felix,

The Camus route is the direction I have headed, for a lot of the reasons that you described. The only wrinkle is we are still on Kafka 0.7.3, so I am in the process of back-porting this patch: https://github.com/linkedin/camus/commit/87917a2aea46da9d21c8f67129f6463af52f7aa8 which is described here: https://groups.google.com/forum/#!topic/camus_etl/VcETxkYhzg8 -- so that we can handle reading and writing non-Avro'ized (if that is a word) data.

I hope to have that done sometime in the morning and would be happy to share it if others can benefit from it.

Thanks,
Andrew

On Thursday, August 8, 2013 7:18:27 PM UTC-6, Felix GV wrote:
> The contrib code is simple and probably wouldn't require too much work to fix, but it's a lot less robust than Camus, so you would ideally need to do some work to make it solid against all edge cases, failure scenarios and performance bottlenecks...
>
> I would definitely recommend investing in Camus instead, since it already covers a lot of the challenges I'm mentioning above, and also has more community support behind it at the moment (as far as I can tell, anyway), so it is more likely to keep getting improvements than the contrib code.
>
> --
> Felix
Re: Message Serialization
I think we discuss that a little in this paper: http://sites.computer.org/debull/A12june/pipeline.pdf

-Jay

On Thu, Aug 8, 2013 at 10:08 AM, Mark wrote:
> I've read that LinkedIn uses Avro for their message serialization. Was there any particular reason this was chosen say over something like Thrift or ProtocolBuffers? Was the main motivating factor the native handling of Avro in Hadoop?