RE: hive - snappy and sequence file vs RC file

2012-06-27 Thread Chalcy Raja
Subject: Re: hive - snappy and sequence file vs RC file
Can you share the reason for choosing snappy as your compression codec? As @omalley mentioned, RCFile will compress the data more densely and will avoid reading data not required by your Hive query. And I think Facebook uses it to store

Re: hive - snappy and sequence file vs RC file

2012-06-26 Thread yongqiang he
Can you share the reason for choosing snappy as your compression codec? As @omalley mentioned, RCFile will compress the data more densely and will avoid reading data not required by your Hive query. And I think Facebook uses it to store tens of PB (if not hundreds of PB) of data. Thanks, Yongqiang

Re: hive - snappy and sequence file vs RC file

2012-06-26 Thread Owen O'Malley
SequenceFile compared to RCFile:
* More widely deployed.
* Available from MapReduce and Pig.
* Doesn't compress as small (in RCFile, all of each column's values are put together).
* Uncompresses and deserializes all of the columns, even if you are only reading a few.
In either case, for long te
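Owen's point that RCFile puts all of each column's values together can be illustrated with a toy experiment. This is a sketch under stated assumptions, not RCFile's actual on-disk format: it serializes the same hypothetical records row by row and column by column, then compresses both with Python's built-in zlib. Snappy is not in the Python standard library, so zlib is swapped in as a stand-in compressor; the record fields and function names are made up for illustration.

```python
import zlib

# Hypothetical toy records: (id, region, status, amount). Not real Hive data.
REGIONS = ["us-east", "us-west", "eu-west", "ap-south"]


def make_records(n):
    """Build n synthetic rows; region cycles through 4 values, status is constant."""
    return [(str(i), REGIONS[i % 4], "ACTIVE", str((i * 37) % 1000)) for i in range(n)]


def row_major(records):
    """Row-oriented layout (SequenceFile-like): a row's fields sit side by side."""
    return "\n".join(",".join(r) for r in records).encode("utf-8")


def column_major(records):
    """Column-grouped layout (RCFile-like idea): one column's values sit side by side."""
    columns = zip(*records)
    return "\n".join("\n".join(col) for col in columns).encode("utf-8")


def compressed_size(data):
    # zlib stands in for Snappy here; both exploit repeated byte sequences.
    return len(zlib.compress(data, 6))


records = make_records(2000)
row_bytes = row_major(records)
col_bytes = column_major(records)
print("raw row-major:    ", len(row_bytes))
print("raw column-major: ", len(col_bytes))
print("zlib row-major:   ", compressed_size(row_bytes))
print("zlib column-major:", compressed_size(col_bytes))
```

Both layouts hold the same field values and come out the same raw length, so any difference after compression comes from the ordering alone. With a constant column and a low-cardinality column, the column-grouped version should compress noticeably smaller, because long runs of similar values give the compressor more to work with.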

RE: hive - snappy and sequence file vs RC file

2012-06-26 Thread Chalcy Raja
Thanks, Bejoy! I'll let you know which way we are going. Thanks, Chalcy
From: Bejoy Ks [mailto:bejoy...@yahoo.com] Sent: Tuesday, June 26, 2012 9:22 AM To: user@hive.apache.org Subject: Re: hive - snappy and sequence file vs RC file
Hi Chalcy AFAIK, RC File format is good when your qu

Re: hive - snappy and sequence file vs RC file

2012-06-26 Thread Bejoy Ks
Hi Chalcy, AFAIK, the RC File format is good when your queries deal with only some specific columns rather than the whole row. For general-purpose use, Sequence File is a better choice. It is also widely adopted, so more tools support Sequence Files. Regards, Bejoy KS
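Bejoy's rule of thumb (RC File pays off when queries touch only a few columns) can be sketched with a toy model. This is not how Hive actually reads RCFile; as a simplification it assumes a column-grouped block compresses each column separately, so a reader can skip columns the query does not select, while a row-oriented block must be decompressed and split in full. `COLUMNS` and all function names are hypothetical, and zlib again stands in for Snappy.

```python
import zlib

# Hypothetical column names for a toy table; not a real Hive schema.
COLUMNS = ["id", "region", "status", "amount"]


def build_row_block(records):
    """Row-oriented block: the whole block is compressed as one unit."""
    text = "\n".join(",".join(r) for r in records)
    return zlib.compress(text.encode("utf-8"))


def build_column_block(records):
    """Column-grouped block: each column is compressed separately."""
    return {
        name: zlib.compress("\n".join(values).encode("utf-8"))
        for name, values in zip(COLUMNS, zip(*records))
    }


def scan_row_block(block, wanted):
    """Decompresses and splits every row, then keeps only the wanted fields.

    Returns (projected rows, number of uncompressed bytes touched)."""
    raw = zlib.decompress(block)
    idx = [COLUMNS.index(c) for c in wanted]
    rows = [line.split(",") for line in raw.decode("utf-8").split("\n")]
    return [[row[i] for i in idx] for row in rows], len(raw)


def scan_column_block(block, wanted):
    """Decompresses only the wanted columns; the others are never touched.

    Returns (projected rows, number of uncompressed bytes touched)."""
    raw_cols = [zlib.decompress(block[c]) for c in wanted]
    cols = [raw.decode("utf-8").split("\n") for raw in raw_cols]
    rows = [list(vals) for vals in zip(*cols)]
    return rows, sum(len(raw) for raw in raw_cols)


records = [(str(i), ["us-east", "eu-west"][i % 2], "ACTIVE", str(i * 10)) for i in range(1000)]
row_result, row_touched = scan_row_block(build_row_block(records), ["amount"])
col_result, col_touched = scan_column_block(build_column_block(records), ["amount"])
print(row_result == col_result, row_touched, col_touched)
```

Both scans return the same projected rows, but the column-grouped scan touches only the `amount` bytes. If `wanted` lists every column, the column-grouped block decompresses everything too, so in this toy the advantage disappears for whole-row workloads.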