I think I can help answer that. Avro would be nice because a) it's
going to be part of the core infrastructure of Hadoop and Hadoop is
how most of our long-term stored data is accessed and b) the schema
model is really nice (both keeping the schema in the file with the
serialized data and having a nice definition of how a schema itself is
serialized). Currently, we use LZO-compressed protobuf files instead,
to great effect. See <https://github.com/kevinweil/elephant-bird>.
More importantly, see
<http://www.slideshare.net/kevinweil/hadoop-at-twitter-hadoop-summit-2010?src=related_normal&rel=4673096>
for why we care about the kind of solutions Avro storage could
provide.
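
To illustrate point (b), here is a minimal sketch of the "schema travels
with the data" idea. It is not the Avro container format or the Avro API:
the length-prefixed JSON layout and the names write_container and
read_container are made up for illustration only.

```python
import io
import json
import struct


def write_container(stream, schema, records):
    """Write a toy self-describing file: schema header first, then records.

    Hypothetical layout (NOT Avro's): each block is a 4-byte big-endian
    length followed by that many bytes of UTF-8 JSON.
    """
    header = json.dumps(schema).encode("utf-8")
    stream.write(struct.pack(">I", len(header)))
    stream.write(header)
    for record in records:
        # Only field values are stored; field names live in the schema.
        values = [record[name] for name in schema["fields"]]
        body = json.dumps(values).encode("utf-8")
        stream.write(struct.pack(">I", len(body)))
        stream.write(body)


def read_container(stream):
    """Read the schema back from the file itself, then decode the records."""
    (size,) = struct.unpack(">I", stream.read(4))
    schema = json.loads(stream.read(size).decode("utf-8"))
    records = []
    while True:
        prefix = stream.read(4)
        if not prefix:
            break  # end of file
        (size,) = struct.unpack(">I", prefix)
        values = json.loads(stream.read(size).decode("utf-8"))
        records.append(dict(zip(schema["fields"], values)))
    return schema, records


# The reader needs no out-of-band knowledge of the schema.
schema = {"name": "Tweet", "fields": ["id", "text"]}
buf = io.BytesIO()
write_container(buf, schema, [{"id": 1, "text": "hello"}, {"id": 2, "text": "world"}])
buf.seek(0)
recovered_schema, recovered_records = read_container(buf)
```

The point of the sketch is only that a reader can decode old files
without being told the schema separately, which is what makes long-term
stored data easier to work with.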

On Thu, Dec 30, 2010 at 2:00 AM, David Dabbs <dmda...@gmail.com> wrote:
> Ryan, would you mind pointing us to any doc or history articulating why you
> feel Avro is preferable for "data storage and anywhere else we
> currently do custom serialization"? Your experiences would be valuable input
> for work I hope to soon begin.
>
> Thank you,
>
> David
>
>
>
>
> -----Original Message-----
> From: Ryan King [mailto:r...@twitter.com]
> Sent: Tuesday, December 28, 2010 2:10 PM
> To: client-...@cassandra.apache.org
> Cc: dev@cassandra.apache.org
> Subject: Re: Avro RPC?
>
> On Tue, Dec 28, 2010 at 9:42 AM, Edward Capriolo <edlinuxg...@gmail.com>
> wrote:
>> On Tue, Dec 28, 2010 at 11:30 AM, Eric Evans <eev...@rackspace.com> wrote:
>>> On Wed, 2010-12-22 at 11:00 -0600, Eric Evans wrote:
>>>> So, Avro RPC.  Is anyone using this?  Is there anyone interested in
>>>> seeing it maintained?
>>>>
>>>> I'm concentrating on CQL[1][2], which for me will culminate in the
>>>> creation of a new, application-specific transport, one that doesn't
>>>> use either of the frameworks.  To me, the existing RPC framework is
>>>> just something to piggy-back on until things are otherwise working,
>>>> and I'm starting to think Thrift might be a better piggy here (read:
>>>> it has more momentum).
>>>
>>> There haven't been very many people sounding off on this, but those that
>>> have seem to be OK with calling it quits on the Avro interface.  Since I
>>> brought this up during the holiday season, I'll give it another week
>>> just in case someone who really cares has been offline.
>>>
>>> To be clear though, I'm not really interested in pushing this forward
>>> anymore, so it's not enough to simply want it, we need someone(s)
>>> willing to step up.
>>>
>>> --
>>> Eric Evans
>>> eev...@rackspace.com
>>>
>>>
>>
>>
>> @Eric I agree with many of your sentiments.
>>
>> The "avro summary" was/is somewhere between wishful thinking and
>> educated guesswork. In ~ May 2010 a shiny new Avro project gained
>> top-level Apache status. Meanwhile Thrift had no full-time committers
>> and had some glaring bugs open against Thrift 0.4.0 (some around
>> PHP) that annoyed everyone.
>>
>> However, Thrift did have a release, 0.5.0. Some projects use
>> Thrift: HBase, Cassandra, and Hive. Thrift still delivers on
>> bindings for a number of languages.
>>
>> Avro is playing catch-up to Thrift. They are still evolving the
>> project while also trying to add support for more languages. As far
>> as I can tell there is no flagship project built around Avro
>> end-to-end.
>>
>> It would be a shame to see the Avro support go away from Cassandra
>> because of all the hard work that was put into it. However, the
>> maintenance cost might outweigh its benefits.
>
> Agreed. Thrift's progress has improved a lot since we first talked
> about using Avro RPC. In the meantime Avro RPC progress has slowed
> (the focus is on the Avro storage implementations).
>
> I think it'd be fair to give up on Avro RPC for client rpc for now*.
> It doesn't deliver enough over Thrift anymore.
>
> -ryan
>
> * I'm still a fan of using avro for data storage and anywhere else we
> currently do custom serialization
>
>
>
