Subject: Re: hive - snappy and sequence file vs RC file
Can you share the reason for choosing Snappy as your compression codec?
As @omalley mentioned, RCFile compresses the data more densely and
avoids reading data that your Hive query does not need. I think
Facebook uses it to store tens of PB (if not hundreds of PB) of data.
Thanks
Yongqiang
On
SequenceFile compared to RCFile:
* More widely deployed.
* Available from MapReduce and Pig
* Doesn't compress as small (in RCFile all of each column's values are put
together)
* Uncompresses and deserializes all of the columns, even if you are only
reading a few
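
To make the last two points concrete, here is a rough Python sketch of the difference between a row-oriented layout and a column-grouped one. It is only an illustration under stated assumptions: it does not use the real SequenceFile or RCFile formats, zlib stands in for Snappy because it ships with Python, and the sample table and all function names are made up for this example.

```python
import zlib

# Hypothetical sample table (id, country, status); values are illustrative only.
ROWS = [(str(i), ["US", "UK", "DE"][i % 3], "ok" if i % 5 else "err")
        for i in range(1000)]
NUM_COLS = 3


def write_row_oriented(rows):
    """Row layout: whole rows are serialized and compressed together."""
    text = "\n".join("\t".join(row) for row in rows)
    return zlib.compress(text.encode("utf-8"))


def write_column_grouped(rows):
    """Column-grouped layout: each column's values are compressed on their own."""
    columns = zip(*rows)
    return [zlib.compress("\n".join(col).encode("utf-8")) for col in columns]


def read_column_row_oriented(block, col_index):
    """Decompress everything and split every row to reach one column."""
    text = zlib.decompress(block).decode("utf-8")
    values = [line.split("\t")[col_index] for line in text.split("\n")]
    return values, NUM_COLS  # every column was decoded


def read_column_grouped(blocks, col_index):
    """Decompress only the block that holds the requested column."""
    text = zlib.decompress(blocks[col_index]).decode("utf-8")
    return text.split("\n"), 1  # one column was decoded


if __name__ == "__main__":
    row_block = write_row_oriented(ROWS)
    col_blocks = write_column_grouped(ROWS)
    print("row-oriented bytes:  ", len(row_block))
    print("column-grouped bytes:", sum(len(b) for b in col_blocks))
    print("columns decoded:",
          read_column_row_oriented(row_block, 1)[1], "vs",
          read_column_grouped(col_blocks, 1)[1])
```

Both readers return the same values for the requested column; the difference is how much data has to be decompressed and parsed to get there. The compressed sizes it prints depend entirely on the data, so treat them as a curiosity rather than a benchmark.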
In either case, for long te
Thanks! Bejoy. I'll let you know which way we are going.
Thanks,
Chalcy
From: Bejoy Ks [mailto:bejoy...@yahoo.com]
Sent: Tuesday, June 26, 2012 9:22 AM
To: user@hive.apache.org
Subject: Re: hive - snappy and sequence file vs RC file
Hi Chalcy
AFAIK, the RC File format is a good fit when your queries touch a few specific
columns rather than all the data in a row. For general-purpose use, Sequence
File is a better choice. It is also widely adopted, so more tools support
Sequence Files.
Regards
Bejoy KS