RE: hive - snappy and sequence file vs RC file

2012-06-27 Thread Chalcy Raja
Subject: Re: hive - snappy and sequence file vs RC file
Can you share the reason for choosing snappy as your compression codec? Like @omalley mentioned, RCFile will compress the data more densely, and will avoid reading data not required in your Hive query. And I think Facebook uses it to store

Re: hive - snappy and sequence file vs RC file

2012-06-26 Thread yongqiang he
Can you share the reason for choosing snappy as your compression codec? Like @omalley mentioned, RCFile will compress the data more densely, and will avoid reading data not required in your Hive query. And I think Facebook uses it to store tens of PB (if not hundreds of PB) of data.
Thanks
Yongqiang
On

Re: hive - snappy and sequence file vs RC file

2012-06-26 Thread Owen O'Malley
SequenceFile compared to RCFile:
* More widely deployed.
* Available from MapReduce and Pig.
* Doesn't compress as small (in RCFile, all of each column's values are put together).
* Uncompresses and deserializes all of the columns, even if you are only reading a few.
In either case, for long te
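
The two layout points in the list above (column values grouped together, and reading only the columns a query needs) can be illustrated with a toy sketch. This is not the real SequenceFile or RCFile format: the sample table, the JSON serialization, and all function names are hypothetical, and `zlib` from the Python standard library stands in for snappy.

```python
import json
import zlib

# Hypothetical sample table with three columns: id, country, status.
rows = [(i, "country_%d" % (i % 5), "status_%d" % (i % 3)) for i in range(3000)]


def write_row_layout(records):
    """Row-oriented sketch: whole records are serialized and compressed together."""
    return zlib.compress(json.dumps(records).encode("utf-8"))


def write_column_layout(records):
    """Column-grouped sketch: each column's values are compressed as a separate blob."""
    columns = list(zip(*records))
    return [zlib.compress(json.dumps(col).encode("utf-8")) for col in columns]


def read_column_from_rows(blob, index):
    """Every column is decompressed and deserialized, even though one is wanted."""
    records = json.loads(zlib.decompress(blob).decode("utf-8"))
    return [rec[index] for rec in records], len(blob)


def read_column_from_columns(blobs, index):
    """Only the blob of the requested column is decompressed."""
    blob = blobs[index]
    return json.loads(zlib.decompress(blob).decode("utf-8")), len(blob)


row_blob = write_row_layout(rows)
column_blobs = write_column_layout(rows)

# Compare total compressed sizes of the two layouts.
print("row layout bytes:", len(row_blob))
print("column layout bytes:", sum(len(b) for b in column_blobs))

# Read only the "country" column (index 1) from each layout.
_, row_bytes_read = read_column_from_rows(row_blob, 1)
_, col_bytes_read = read_column_from_columns(column_blobs, 1)
print("compressed bytes read, row layout:", row_bytes_read)
print("compressed bytes read, column layout:", col_bytes_read)
```

Both readers return the same column values; the difference is how many compressed bytes each has to touch. How much smaller the column-grouped output is depends on the data, so the sizes printed here say nothing about real Hive tables.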

RE: hive - snappy and sequence file vs RC file

2012-06-26 Thread Chalcy Raja
Thanks, Bejoy! I'll let you know which way we are going.
Thanks,
Chalcy
From: Bejoy Ks [mailto:bejoy...@yahoo.com]
Sent: Tuesday, June 26, 2012 9:22 AM
To: user@hive.apache.org
Subject: Re: hive - snappy and sequence file vs RC file
Hi Chalcy
AFAIK, RC File format is good when your qu

Re: hive - snappy and sequence file vs RC file

2012-06-26 Thread Bejoy Ks
From: Chalcy Raja
To: "user@hive.apache.org"
Sent: Tuesday, June 26, 2012 6:35 PM
Subject: hive - snappy and sequence file vs RC file
Hi Hive users,
We are going to use snappy for compression. What is the best file format, sequence file or RC file?

hive - snappy and sequence file vs RC file

2012-06-26 Thread Chalcy Raja
Hi Hive users,
We are going to use snappy for compression. What is the best file format, sequence file or RC file? Both are splittable and therefore will work well for us. RC file performance seems to be better than sequence file. It looks like Sqoop may support the --as-sequencefile tag somet