[ 
https://issues.apache.org/jira/browse/HIVE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13732806#comment-13732806
 ] 

Prasanth J commented on HIVE-4123:
----------------------------------

I have updated the Excel sheet, which compares the existing RLE (baseline) 
against the new RLE. The latest patch, revised after code review, achieves a 
better compression ratio than both the old patch and the existing RLE. I have 
also added encoding and decoding times to the sheet, but those numbers are not 
very reliable because they were measured over only 1 iteration. For a more 
reliable measurement, I also ran encoding/decoding over a 25M-row file for 5 
iterations and averaged the last 3 iterations. HIVE-4123.2.git.patch.txt took 
2072ms on average to encode the 25M-row file and 920ms to decode the encoded 
file. HIVE-4123.6.txt, on the other hand, took 1374ms on average to encode the 
25M-row file and 874ms to decode the encoded file. 
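
For anyone who wants to repeat this kind of measurement, the procedure above 
(run 5 iterations, average the last 3) can be sketched as follows. This is 
only an illustration in Python, not the harness actually used for the patch; 
the function names time_iterations, average_last and percent_reduction are 
made up for this example, and the workload passed in is a placeholder.

```python
import time


def time_iterations(workload, iterations):
    """Run workload() `iterations` times; return each run's time in ms."""
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        workload()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def average_last(samples, keep_last):
    """Average only the final `keep_last` samples, treating earlier runs as warm-up."""
    if keep_last <= 0 or keep_last > len(samples):
        raise ValueError("keep_last must be between 1 and len(samples)")
    tail = samples[-keep_last:]
    return sum(tail) / len(tail)


def percent_reduction(old_ms, new_ms):
    """Relative time saved by the new measurement, as a percentage of the old one."""
    return (old_ms - new_ms) / old_ms * 100.0


if __name__ == "__main__":
    # Placeholder workload (hypothetical); substitute the real encode or decode call.
    samples = time_iterations(lambda: sum(range(100_000)), iterations=5)
    print("average of last 3 runs: %.2f ms" % average_last(samples, keep_last=3))

    # Averages reported in the comment above.
    print("encode time reduction: %.1f%%" % percent_reduction(2072, 1374))
    print("decode time reduction: %.1f%%" % percent_reduction(920, 874))
```

Applied to the averages reported above, percent_reduction gives roughly a 
33.7% reduction in encoding time and a 5.0% reduction in decoding time for 
HIVE-4123.6.txt relative to HIVE-4123.2.git.patch.txt.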


                
> The RLE encoding for ORC can be improved
> ----------------------------------------
>
>                 Key: HIVE-4123
>                 URL: https://issues.apache.org/jira/browse/HIVE-4123
>             Project: Hive
>          Issue Type: New Feature
>          Components: File Formats
>    Affects Versions: 0.12.0
>            Reporter: Owen O'Malley
>            Assignee: Prasanth J
>              Labels: orcfile
>             Fix For: 0.12.0
>
>         Attachments: HIVE-4123.1.git.patch.txt, HIVE-4123.2.git.patch.txt, 
> HIVE-4123.3.patch.txt, HIVE-4123.4.patch.txt, HIVE-4123.5.txt, 
> HIVE-4123.6.txt, ORC-Compression-Ratio-Comparison.xlsx
>
>
> The run length encoding of integers can be improved:
> * tighter bit packing
> * allow delta encoding
> * allow longer runs
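
To illustrate why delta encoding and tighter bit packing can help, here is a 
minimal Python sketch. It is not the encoding implemented in the attached 
patches or the ORC file format itself; the helper names and the sample values 
are assumptions chosen only to show the idea that differences between nearby 
integers usually need fewer bits than the integers themselves.

```python
def delta_encode(values):
    """Return (first value, list of successive differences)."""
    deltas = [b - a for a, b in zip(values, values[1:])]
    return values[0], deltas


def delta_decode(base, deltas):
    """Rebuild the original sequence from the base value and the deltas."""
    out = [base]
    for d in deltas:
        out.append(out[-1] + d)
    return out


def zigzag(n):
    """Map signed to unsigned so small magnitudes stay small: 0,-1,1,-2 -> 0,1,2,3."""
    return 2 * n if n >= 0 else -2 * n - 1


def min_bit_width(values):
    """Smallest fixed bit width that can hold every non-negative value (at least 1)."""
    return max(1, max(values).bit_length())


if __name__ == "__main__":
    values = [1000, 1003, 1001, 1006, 1004]  # made-up sample data
    base, deltas = delta_encode(values)
    packed = [zigzag(d) for d in deltas]

    print("bits per value, direct:", min_bit_width(values))
    print("bits per value, deltas:", min_bit_width(packed))
    assert delta_decode(base, deltas) == values  # round trip is lossless
```

For this sample the raw values need 10 bits each, while the zigzag-mapped 
deltas fit in 4 bits each, at the cost of storing the base value once.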
