Actually, that should work. I'm a little concerned about the extra memory copy that Hadoop's ZlibCompressor does, but it should still be a win. If you want to work on it, why don't you create a JIRA on the ORC project? Don't forget that you'll need to handle the other options in CompressionCodec.modify.
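
Roughly, I'd expect the setup to look something like the sketch below. It is untested, assumes the Hadoop native zlib library is loaded (otherwise ZlibCompressor isn't available at all), and the class/method names are just placeholders; the buffer management and the modify() options would need more care in the real codec:

    import java.io.IOException;
    import org.apache.hadoop.io.compress.zlib.ZlibCompressor;

    // Untested sketch: compress one ORC chunk with Hadoop's native
    // ZlibCompressor configured for raw deflate (no zlib header/trailer),
    // which should line up with what ORC gets today from
    // new Deflater(level, /*nowrap=*/ true).
    public class NativeZlibChunkCompress {
      public static int compressChunk(byte[] in, byte[] out) throws IOException {
        ZlibCompressor compressor = new ZlibCompressor(
            ZlibCompressor.CompressionLevel.DEFAULT_COMPRESSION,
            ZlibCompressor.CompressionStrategy.DEFAULT_STRATEGY,
            ZlibCompressor.CompressionHeader.NO_HEADER,   // raw deflate, like nowrap = true
            64 * 1024);                                    // direct buffer size
        try {
          // setInput copies into a direct buffer -- the extra copy I
          // mentioned above.
          compressor.setInput(in, 0, in.length);
          compressor.finish();
          int outLen = 0;
          while (!compressor.finished() && outLen < out.length) {
            outLen += compressor.compress(out, outLen, out.length - outLen);
          }
          // If the data didn't shrink enough to fit, ORC stores the chunk
          // uncompressed instead; signal that with -1 here.
          return compressor.finished() ? outLen : -1;
        } finally {
          compressor.end();   // release the native zlib state
        }
      }
    }

Mapping whatever modify() passes for speed/strategy onto ZlibCompressor.CompressionLevel and CompressionStrategy should be straightforward, but it has to be done explicitly rather than falling through to Deflater.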
.. Owen

On Thu, Jun 23, 2016 at 3:59 PM, Aleksei Statkevich <astatkev...@rocketfuel.com> wrote:

> Hi Owen,
>
> Thanks for the response. I saw that DirectDecompressor will be used if
> available and the difference was only in compression.
> Keeping in mind what you said, I looked at the code again. I see that the
> only specific piece that ORC uses is "nowrap" = true in Deflater. As far as
> I understand from the description, it should directly correspond
> to CompressionHeader.NO_HEADER in ZlibCompressor. In this case,
> ZlibCompressor with the right setup can be a replacement for Deflater. What
> do you think?
>
> Aleksei
>
> Aleksei Statkevich | Engineering Manager
>
> On Thu, Jun 23, 2016 at 2:35 PM, Owen O'Malley <omal...@apache.org> wrote:
>
>> On Fri, Jun 17, 2016 at 11:31 PM, Aleksei Statkevich <
>> astatkev...@rocketfuel.com> wrote:
>>
>>> Hello,
>>>
>>> I recently looked at ORC encoding and noticed
>>> that hive.ql.io.orc.ZlibCodec uses java's java.util.zip.Deflater and not
>>> Hadoop's native ZlibCompressor.
>>>
>>> Can someone please tell me what is the reason for it?
>>
>> It is more subtle than that. The first piece to notice is that if your
>> Hadoop has the direct decompression
>> (org.apache.hadoop.io.compress.zlib.ZlibDirectDecompressor), it will be
>> used. The reason that the ZlibCompressor isn't used is because ORC needs a
>> different API. In particular, ORC doesn't use stream compression, but
>> rather block compression. That is done so that it can jump over compression
>> blocks for predicate push down. (If you are skipping over a lot of values,
>> ORC doesn't need to decompress the bytes.)
>>
>> .. Owen
>>
>>> Also, how does performance of Deflater (which also uses native
>>> implementation) compare to Hadoop's native zlib implementation?
>>>
>>> Thanks,
>>> Aleksei
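
As background for the block-compression point above: in the ORC format each compression chunk is preceded by a 3-byte little-endian header holding (chunkLength << 1) | isOriginal, and that stored length is what lets a reader seek past chunks it does not need. A rough illustration of that skip follows; the class and method names are made up for the sketch and are not ORC's actual reader code:

    import java.nio.ByteBuffer;

    // Illustrative sketch only: walk an ORC-style compressed stream chunk by
    // chunk, jumping over chunks without inflating them.  A plain streaming
    // Deflater/Inflater API has no equivalent of this jump.
    public class ChunkSkipper {

      /** Reads the 3-byte little-endian header: (chunkLength << 1) | isOriginal. */
      private static int readHeader(ByteBuffer buf) {
        int b0 = buf.get() & 0xff;
        int b1 = buf.get() & 0xff;
        int b2 = buf.get() & 0xff;
        return b0 | (b1 << 8) | (b2 << 16);
      }

      /** Skips n chunks without decompressing any of them. */
      public static void skipChunks(ByteBuffer buf, int n) {
        for (int i = 0; i < n; i++) {
          int header = readHeader(buf);
          int chunkLength = header >>> 1;   // upper 23 bits: stored chunk length
          // The low bit says whether the chunk was left uncompressed; either
          // way the stored length tells the reader exactly how far to jump.
          buf.position(buf.position() + chunkLength);
        }
      }
    }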