Re: Why does ORC use Deflater instead of native ZlibCompressor?

Owen O'Malley Thu, 23 Jun 2016 14:35:49 -0700

On Fri, Jun 17, 2016 at 11:31 PM, Aleksei Statkevich <
astatkev...@rocketfuel.com> wrote:


> Hello,
>
> I recently looked at ORC encoding and noticed
> that hive.ql.io.orc.ZlibCodec uses java's java.util.zip.Deflater and not
> Hadoop's native ZlibCompressor.
>
> Can someone please tell me what is the reason for it?
>

It is more subtle than that. The first thing to notice is that if your
Hadoop has direct decompression
(org.apache.hadoop.io.compress.zlib.ZlibDirectDecompressor), it will be
used. ZlibCompressor isn't used because ORC needs a different API. In
particular, ORC uses block compression rather than stream compression, so
that the reader can jump over compression blocks during predicate push
down. (If you are skipping over a lot of values, ORC doesn't need to
decompress those bytes.)
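
To illustrate the block-versus-stream point, here is a minimal sketch
using only java.util.zip. It is not ORC's actual code: the class name,
the method names, and the 4-byte length prefix in front of each block are
all assumptions made up for this example. Each block is deflated on its
own, so a reader can step over unwanted blocks by reading only their
length headers.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical illustration only; the layout below is NOT the ORC file format.
public class BlockCompressionSketch {

    // Deflate one block independently so it can be inflated without its neighbours.
    static byte[] compressBlock(byte[] raw) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(raw);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release the native zlib state
        }
    }

    // Inflate a single block produced by compressBlock.
    static byte[] decompressBlock(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new DataFormatException("truncated or unsupported block");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            inflater.end();
        }
    }

    // Assumed layout: [4-byte big-endian compressed length][compressed bytes] per block.
    static byte[] writeBlocks(byte[][] blocks) {
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        for (byte[] block : blocks) {
            byte[] c = compressBlock(block);
            file.write(ByteBuffer.allocate(4).putInt(c.length).array(), 0, 4);
            file.write(c, 0, c.length);
        }
        return file.toByteArray();
    }

    // Return block `index`, skipping earlier blocks without decompressing them.
    static byte[] readBlock(byte[] file, int index) throws DataFormatException {
        ByteBuffer in = ByteBuffer.wrap(file);
        for (int i = 0; i < index; i++) {
            int len = in.getInt();
            in.position(in.position() + len); // jump over the compressed bytes
        }
        byte[] c = new byte[in.getInt()];
        in.get(c);
        return decompressBlock(c);
    }

    public static void main(String[] args) throws DataFormatException {
        byte[][] blocks = {
            "rows 0-999".getBytes(StandardCharsets.UTF_8),
            "rows 1000-1999".getBytes(StandardCharsets.UTF_8),
            "rows 2000-2999".getBytes(StandardCharsets.UTF_8),
        };
        byte[] file = writeBlocks(blocks);
        // Only the third block is inflated; the first two are skipped by header.
        System.out.println(new String(readBlock(file, 2), StandardCharsets.UTF_8));
    }
}
```

With a single continuous deflate stream, reaching the third range would
mean inflating everything before it, which is the cost that independent
blocks avoid.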

.. Owen



>
> Also, how does performance of Deflater (which also uses native
> implementation) compare to Hadoop's native zlib implementation?
>
> Thanks,
> Aleksei
>
>
