Can you share your reason for choosing Snappy as your compression codec?
As @omalley mentioned, RCFile compresses the data more densely and
avoids reading data that your Hive query does not need. I believe
Facebook uses it to store tens of PB (if not hundreds of PB) of data.
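
To make the two points above concrete, here is a rough sketch in Python. It is not RCFile and does not use any Hive or Hadoop API. The record layout (user_id, country, amount, status), the helper names, and the synthetic data are all made up for illustration, and zlib stands in for the codec only because it ships with Python. It stores the same records row by row and column by column, compresses each column separately, and then reads back a single column without touching the others.

```python
import random
import zlib

# Hypothetical schema, chosen only for this illustration.
COLUMNS = ("user_id", "country", "amount", "status")


def make_rows(n=5000, seed=0):
    """Build synthetic records: two high-entropy fields, two repetitive ones."""
    rng = random.Random(seed)
    countries = ["US", "IN", "BR", "DE"]
    rows = []
    for i in range(n):
        rows.append((
            str(rng.randrange(10**7, 10**8)),       # random 8-digit id
            countries[(i * len(countries)) // n],   # clustered, low cardinality
            str(rng.randrange(1000, 10000)),        # random 4-digit amount
            "active" if i % 50 else "closed",       # mostly one value
        ))
    return rows


def row_major(rows):
    """Serialize record by record, the way a row-oriented file would."""
    return "\n".join(",".join(r) for r in rows).encode()


def column_major(rows):
    """Group each column's values together and compress each column alone."""
    return {
        name: zlib.compress("\n".join(values).encode())
        for name, values in zip(COLUMNS, zip(*rows))
    }


def read_column(blobs, name):
    """Decompress one column only; the other blobs are never touched."""
    return zlib.decompress(blobs[name]).decode().split("\n")


rows = make_rows()
row_blob = zlib.compress(row_major(rows))
col_blobs = column_major(rows)

print("row-major compressed bytes:    ", len(row_blob))
print("column-major compressed bytes: ", sum(len(b) for b in col_blobs.values()))
print("bytes read for 'country' only: ", len(col_blobs["country"]))
```

On clustered data like this I would expect the column-major total to come out somewhat smaller, since the repetitive columns are no longer interrupted by the random ones, but the numbers are only illustrative and you should measure on your own tables. The last line is the projection point: a query that needs only country decompresses that one blob, while the row-major layout has to decompress everything.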

Thanks
Yongqiang
On Tue, Jun 26, 2012 at 9:49 AM, Owen O'Malley <omal...@apache.org> wrote:
> SequenceFile compared to RCFile:
>   * More widely deployed.
>   * Available from MapReduce and Pig
>   * Doesn't compress as small (in RCFile all of each column's values are put
> together)
>   * Uncompresses and deserializes all of the columns, even if you are only
> reading a few
>
> In either case, for long term storage, you should seriously consider the
> default codec since that will provide much tighter compression (at the cost
> of cpu to compress it).
>
> -- Owen
