Actually, that should work. I'm a little concerned about the extra memory copy that Hadoop's ZlibCompressor does, but it should still be a win. If you want to work on it, why don't you create a JIRA on the ORC project? Don't forget that you'll need to handle the other options in CompressionCodec.modify.
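
Roughly, I'd expect the setup to look something like the sketch below. It is untested, assumes the Hadoop native zlib library is loaded (otherwise ZlibCompressor isn't available at all), and the class/method names are just placeholders; the buffer management and the modify() options would need more care in the real codec:

    import java.io.IOException;
    import org.apache.hadoop.io.compress.zlib.ZlibCompressor;

    // Untested sketch: compress one ORC chunk with Hadoop's native
    // ZlibCompressor configured for raw deflate (no zlib header/trailer),
    // which should line up with what ORC gets today from
    // new Deflater(level, /*nowrap=*/ true).
    public class NativeZlibChunkCompress {
      public static int compressChunk(byte[] in, byte[] out) throws IOException {
        ZlibCompressor compressor = new ZlibCompressor(
            ZlibCompressor.CompressionLevel.DEFAULT_COMPRESSION,
            ZlibCompressor.CompressionStrategy.DEFAULT_STRATEGY,
            ZlibCompressor.CompressionHeader.NO_HEADER,   // raw deflate, like nowrap = true
            64 * 1024);                                    // direct buffer size
        try {
          // setInput copies into a direct buffer -- the extra copy I
          // mentioned above.
          compressor.setInput(in, 0, in.length);
          compressor.finish();
          int outLen = 0;
          while (!compressor.finished() && outLen < out.length) {
            outLen += compressor.compress(out, outLen, out.length - outLen);
          }
          // If the data didn't shrink enough to fit, ORC stores the chunk
          // uncompressed instead; signal that with -1 here.
          return compressor.finished() ? outLen : -1;
        } finally {
          compressor.end();   // release the native zlib state
        }
      }
    }

Mapping whatever modify() passes for speed/strategy onto ZlibCompressor.CompressionLevel and CompressionStrategy should be straightforward, but it has to be done explicitly rather than falling through to Deflater.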
.. Owen

On Thu, Jun 23, 2016 at 3:59 PM, Aleksei Statkevich <astatkev...@rocketfuel.com> wrote:

> Hi Owen,
>
> Thanks for the response. I saw that DirectDecompressor will be used if
> available and the difference was only in compression.
> Keeping in mind what you said, I looked at the code again. I see that the
> only specific piece that ORC uses is "nowrap" = true in Deflater. As far as
> I understand from the description, it should directly correspond
> to CompressionHeader.NO_HEADER in ZlibCompressor. In this case,
> ZlibCompressor with the right setup can be a replacement for Deflater. What
> do you think?
>
> Aleksei
>
> Aleksei Statkevich | Engineering Manager
>
> On Thu, Jun 23, 2016 at 2:35 PM, Owen O'Malley <omal...@apache.org> wrote:
>
>> On Fri, Jun 17, 2016 at 11:31 PM, Aleksei Statkevich <
>> astatkev...@rocketfuel.com> wrote:
>>
>>> Hello,
>>>
>>> I recently looked at ORC encoding and noticed
>>> that hive.ql.io.orc.ZlibCodec uses java's java.util.zip.Deflater and not
>>> Hadoop's native ZlibCompressor.
>>>
>>> Can someone please tell me what is the reason for it?
>>
>> It is more subtle than that. The first piece to notice is that if your
>> Hadoop has the direct decompression
>> (org.apache.hadoop.io.compress.zlib.ZlibDirectDecompressor), it will be
>> used. The reason that the ZlibCompressor isn't used is because ORC needs a
>> different API. In particular, ORC doesn't use stream compression, but
>> rather block compression. That is done so that it can jump over compression
>> blocks for predicate push down. (If you are skipping over a lot of values,
>> ORC doesn't need to decompress the bytes.)
>>
>> .. Owen
>>
>>> Also, how does performance of Deflater (which also uses native
>>> implementation) compare to Hadoop's native zlib implementation?
>>>
>>> Thanks,
>>> Aleksei
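
As background for the block-compression point above: in the ORC format each compression chunk is preceded by a 3-byte little-endian header holding (chunkLength << 1) | isOriginal, and that stored length is what lets a reader seek past chunks it does not need. A rough illustration of that skip follows; the class and method names are made up for the sketch and are not ORC's actual reader code:

    import java.nio.ByteBuffer;

    // Illustrative sketch only: walk an ORC-style compressed stream chunk by
    // chunk, jumping over chunks without inflating them.  A plain streaming
    // Deflater/Inflater API has no equivalent of this jump.
    public class ChunkSkipper {

      /** Reads the 3-byte little-endian header: (chunkLength << 1) | isOriginal. */
      private static int readHeader(ByteBuffer buf) {
        int b0 = buf.get() & 0xff;
        int b1 = buf.get() & 0xff;
        int b2 = buf.get() & 0xff;
        return b0 | (b1 << 8) | (b2 << 16);
      }

      /** Skips n chunks without decompressing any of them. */
      public static void skipChunks(ByteBuffer buf, int n) {
        for (int i = 0; i < n; i++) {
          int header = readHeader(buf);
          int chunkLength = header >>> 1;   // upper 23 bits: stored chunk length
          // The low bit says whether the chunk was left uncompressed; either
          // way the stored length tells the reader exactly how far to jump.
          buf.position(buf.position() + chunkLength);
        }
      }
    }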