I think I can help answer that. Avro would be nice because a) it's going to be part of the core infrastructure of Hadoop, and Hadoop is how most of our long-term stored data is accessed, and b) the schema model is really nice (both keeping the schema in the file with the serialized data and having a clear definition of how a schema itself is serialized).

Currently, we use LZO-compressed protobuf files instead, to great effect. See <https://github.com/kevinweil/elephant-bird>. More importantly, see <http://www.slideshare.net/kevinweil/hadoop-at-twitter-hadoop-summit-2010?src=related_normal&rel=4673096> for why we care about the kind of solutions Avro storage could provide.
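
To make point (b) a bit more concrete, here is a rough sketch of the "schema travels with the data" idea. It is only a toy: it does not use the Avro library and it is not Avro's actual container format. The function names `write_container` and `read_container`, the `EXAMPLE_SCHEMA` record, and the length-prefixed layout are all made up for illustration. The only thing borrowed from Avro is that the schema is written as JSON and sits in the file header, so a reader needs nothing but the file.

```python
import io
import json
import struct

# Hypothetical record schema, written in the JSON style Avro uses for schemas.
EXAMPLE_SCHEMA = {
    "type": "record",
    "name": "Tweet",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "text", "type": "string"},
    ],
}


def write_container(stream, schema, records):
    """Write the schema as a JSON header, then length-prefixed records.

    Toy layout (an assumption, not the Avro spec):
    4-byte big-endian length + JSON schema, then for each record a
    4-byte big-endian length + JSON array of field values.
    """
    header = json.dumps(schema).encode("utf-8")
    stream.write(struct.pack(">I", len(header)))
    stream.write(header)
    field_names = [field["name"] for field in schema["fields"]]
    for record in records:
        # Field names are not repeated per record; they live in the header.
        body = json.dumps([record[name] for name in field_names]).encode("utf-8")
        stream.write(struct.pack(">I", len(body)))
        stream.write(body)


def read_container(stream):
    """Return (schema, records) using only what is stored in the stream."""
    (header_len,) = struct.unpack(">I", stream.read(4))
    schema = json.loads(stream.read(header_len).decode("utf-8"))
    field_names = [field["name"] for field in schema["fields"]]
    records = []
    while True:
        prefix = stream.read(4)
        if len(prefix) < 4:  # end of stream
            break
        (body_len,) = struct.unpack(">I", prefix)
        values = json.loads(stream.read(body_len).decode("utf-8"))
        records.append(dict(zip(field_names, values)))
    return schema, records
```

Writing a couple of records to an `io.BytesIO()` with `write_container`, seeking back to the start, and calling `read_container` returns both the schema and the records without the reader having been told the schema ahead of time. With our protobuf files, by contrast, the reader needs the message definition from somewhere outside the file.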

On Thu, Dec 30, 2010 at 2:00 AM, David Dabbs <dmda...@gmail.com> wrote:
> Ryan, would you mind pointing us to any doc or history articulating why you
> feel Avro is preferable for "data storage and anywhere else we
> currently do custom serialization"? Your experiences would be valuable input
> for work I hope to soon begin.
>
> Thank you,
>
> David
>
> -----Original Message-----
> From: Ryan King [mailto:r...@twitter.com]
> Sent: Tuesday, December 28, 2010 2:10 PM
> To: client-...@cassandra.apache.org
> Cc: dev@cassandra.apache.org
> Subject: Re: Avro RPC?
>
> On Tue, Dec 28, 2010 at 9:42 AM, Edward Capriolo <edlinuxg...@gmail.com>
> wrote:
>> On Tue, Dec 28, 2010 at 11:30 AM, Eric Evans <eev...@rackspace.com> wrote:
>>> On Wed, 2010-12-22 at 11:00 -0600, Eric Evans wrote:
>>>> So, Avro RPC. Is anyone using this? Is there anyone interested in
>>>> seeing it maintained?
>>>>
>>>> I'm concentrating on CQL[1][2], which for me will culminate in the
>>>> creation of a new, application-specific transport, one that doesn't
>>>> use either of the frameworks. To me, the existing RPC framework is
>>>> just something to piggy-back on until things are otherwise working,
>>>> and I'm starting to think Thrift might be a better piggy here (read:
>>>> it has more momentum).
>>>
>>> There hasn't been very many people sounding off on this, but those that
>>> have seem to be OK with calling it quits on the Avro interface. Since I
>>> brought this up during the holiday season, I'll give it another week
>>> just in case someone who really cares has been offline.
>>>
>>> To be clear though, I'm not really interested in pushing this forward
>>> anymore, so it's not enough to simply want it, we need someone(s)
>>> willing to step up.
>>>
>>> --
>>> Eric Evans
>>> eev...@rackspace.com
>>>
>>
>> @Eric I agree with many of your sentiments.
>>
>> The "avro summary" was/is somewhere between wishful thinking and
>> educated guesswork. In ~ May 2010 a shiny new Avro project went top
>> level apache status. Meanwhile thrift had no full time committers and
>> had some glaring bugs that had been open in thrift 0.4.0 (some around
>> php) that annoyed everyone.
>>
>> However thrift did have a release 0.5.0. There are some projects that
>> use thrift, Hbase, Cassandra, and Hive. Thrift still delivers on
>> bindings for a number of languages.
>>
>> Avro is in catchup mode to thrift. They are still evolving the
>> project, while still trying to add support for more languages. As far
>> as I can tell there is no flagship project build around Avro
>> end-to-end.
>>
>> It would be a shame to see the Avro support go away from Cassandra
>> because of all the hard work that was put into it. However the
>> maintenance cost might outweigh it's benefits.
>
> Agreed. Thrift's progress has improved a lot since we first talked
> about using Avro RPC. In the meantime Avro RPC progress has slowed
> (the focus is on the Avro storage implementations).
>
> I think it'd be fair to give up on Avro RPC for client rpc for now*.
> It doesn't deliver enough over Thrift anymore.
>
> -ryan
>
> * I'm still a fan of using avro for data storage and anywhere else we
> currently do custom serialization