Subject: Re: hive - snappy and sequence file vs RC file
Can you share the reason for choosing Snappy as your compression codec?
As @omalley mentioned, RCFile will compress the data more densely
and will avoid reading data not required by your Hive query. And I
think Facebook uses it to store tens of PB (if not hundreds of PB) of data.
Thanks
Yongqiang
On
SequenceFile compared to RCFile:
* More widely deployed.
* Available from MapReduce and Pig.
* Doesn't compress as small (in RCFile, all of each column's values are
stored together).
* Decompresses and deserializes all of the columns, even if you are only
reading a few.
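To illustrate the last two points, here is a minimal Python sketch, not taken from Hive or RCFile. It assumes a made-up three-column table (amount, country, status) and uses the standard library's zlib as a stand-in for Snappy, which is not in the standard library. It compares compressing the data record by record against compressing each column on its own, and shows that a reader can decompress only the one column a query needs. The helper names (`row_major_bytes`, `column_major_blobs`, `read_column`) are hypothetical.

```python
import random
import zlib

# Hypothetical sample table: (amount, country, status). The values are
# synthetic and only illustrate the row vs. column layout difference.
COUNTRIES = ["US", "IN", "DE"]
rng = random.Random(42)  # fixed seed so every run produces the same data
ROWS = [(rng.randint(0, 999_999), COUNTRIES[i % 3], "ok") for i in range(10_000)]
NUM_COLUMNS = 3


def row_major_bytes(rows):
    """Serialize record by record, like a row-oriented file."""
    return "".join(f"{a},{c},{s}\n" for a, c, s in rows).encode()


def column_major_blobs(rows):
    """Serialize and compress each column separately, like a columnar layout.

    Returns one zlib-compressed blob per column.
    """
    blobs = []
    for col in range(NUM_COLUMNS):
        data = "".join(f"{row[col]}\n" for row in rows).encode()
        blobs.append(zlib.compress(data))
    return blobs


def read_column(blobs, col):
    """Decompress only the requested column; the other blobs are not touched."""
    return zlib.decompress(blobs[col]).decode().splitlines()


row_blob = zlib.compress(row_major_bytes(ROWS))
col_blobs = column_major_blobs(ROWS)

row_size = len(row_blob)
col_size = sum(len(b) for b in col_blobs)
print(f"row-major compressed:    {row_size} bytes")
print(f"column-major compressed: {col_size} bytes")

# A query that only needs `country` decompresses one small blob,
# instead of decompressing and splitting every full record.
countries = read_column(col_blobs, 1)
print(countries[:3])
```

On data like this the column-major total comes out smaller, because the repeated `country` and `status` values collapse to almost nothing once they sit next to each other. RCFile itself first splits rows into row groups and then stores each column of a group together, so real savings depend on the data and the codec.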
In either case, for long te
Thanks, Bejoy! I'll let you know which way we go.
Thanks,
Chalcy
From: Bejoy Ks [mailto:bejoy...@yahoo.com]
Sent: Tuesday, June 26, 2012 9:22 AM
To: user@hive.apache.org
Subject: Re: hive - snappy and sequence file vs RC file
Hi Chalcy
AFAIK, RC File format is good when your qu
From: Chalcy Raja
To: "user@hive.apache.org"
Sent: Tuesday, June 26, 2012 6:35 PM
Subject: hive - snappy and sequence file vs RC file
Hi Hive users,
We are going to use Snappy for compression.
Which file format is better, SequenceFile or RCFile? Both are splittable
and therefore will work well for us. RCFile performance seems to be better
than SequenceFile's. It looks like Sqoop may support the --as-sequencefile option
somet