On Fri, Jun 17, 2016 at 11:31 PM, Aleksei Statkevich <astatkev...@rocketfuel.com> wrote:
> Hello,
>
> I recently looked at ORC encoding and noticed
> that hive.ql.io.orc.ZlibCodec uses Java's java.util.zip.Deflater and not
> Hadoop's native ZlibCompressor.
>
> Can someone please tell me what is the reason for it?
>

It is more subtle than that. The first piece to notice is that if your Hadoop has the direct decompression (org.apache.hadoop.io.compress.zlib.ZlibDirectDecompressor), it will be used.

The reason ZlibCompressor isn't used is that ORC needs a different API. In particular, ORC doesn't use stream compression, but rather block compression. That is done so that it can jump over compression blocks for predicate pushdown. (If you are skipping over a lot of values, ORC doesn't need to decompress the bytes.)

.. Owen

>
> Also, how does performance of Deflater (which also uses native
> implementation) compare to Hadoop's native zlib implementation?
>
> Thanks,
> Aleksei
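
As a rough illustration of the block compression described in the reply above, here is a minimal sketch using only java.util.zip. It is not ORC's code: the class name, the method names, and the 4-byte length prefix in front of each block are all assumptions made for the example, and ORC's real chunk header is laid out differently. The point is only that when every block is deflated independently and its compressed length is recorded, a reader can step over blocks it does not need without inflating them.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical sketch of block compression; not taken from ORC.
class BlockCompressionSketch {

    // Deflate each block on its own and write it as:
    // [4-byte big-endian compressed length][compressed bytes].
    // The length prefix is an assumption for this sketch only.
    static byte[] writeBlocks(byte[][] blocks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        for (byte[] block : blocks) {
            // A fresh Deflater per block keeps blocks independent.
            Deflater deflater = new Deflater();
            deflater.setInput(block);
            deflater.finish();
            ByteArrayOutputStream compressed = new ByteArrayOutputStream();
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                compressed.write(buf, 0, n);
            }
            deflater.end();
            byte[] bytes = compressed.toByteArray();
            out.write(ByteBuffer.allocate(4).putInt(bytes.length).array(), 0, 4);
            out.write(bytes, 0, bytes.length);
        }
        return out.toByteArray();
    }

    // Return the decompressed contents of block number `wanted` (0-based).
    // Earlier blocks are skipped by length alone and are never inflated.
    static byte[] readBlock(byte[] data, int wanted) {
        ByteBuffer in = ByteBuffer.wrap(data);
        for (int i = 0; i < wanted; i++) {
            int skipLen = in.getInt();
            in.position(in.position() + skipLen); // jump over the block
        }
        int len = in.getInt();
        Inflater inflater = new Inflater();
        inflater.setInput(data, in.position(), len);
        ByteArrayOutputStream plain = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalStateException("truncated or unsupported block");
                }
                plain.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt block", e);
        } finally {
            inflater.end();
        }
        return plain.toByteArray();
    }

    public static void main(String[] args) {
        byte[][] blocks = {
            "alpha alpha alpha".getBytes(StandardCharsets.UTF_8),
            "beta beta beta".getBytes(StandardCharsets.UTF_8),
            "gamma gamma gamma".getBytes(StandardCharsets.UTF_8),
        };
        byte[] data = writeBlocks(blocks);
        // Only the third block is inflated; the first two are skipped.
        System.out.println(new String(readBlock(data, 2), StandardCharsets.UTF_8));
    }
}
```

With a single continuous deflate stream, by contrast, reaching the third block would require inflating everything before it, which is why a stream-oriented compressor API is a poor fit for this access pattern.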