[ https://issues.apache.org/jira/browse/HIVE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13732806#comment-13732806 ]
Prasanth J commented on HIVE-4123:
----------------------------------

Updated the Excel sheet. It compares the existing RLE (baseline) against the new RLE. The latest patch, produced after code review, achieves a better compression ratio than both the old patch and the existing RLE.

I have also added the encoding and decoding times to the Excel sheet. Those times are not very reliable, since they were measured over only 1 iteration. For a steadier measurement, I also ran encoding/decoding over a 25M-row file for 5 iterations and took the average of the last 3 iterations:

* HIVE-4123.2.git.patch.txt took 2072ms on average to encode the 25M-row file and 920ms to decode the encoded file.
* HIVE-4123.6.txt took 1374ms on average to encode the 25M-row file and 874ms to decode the encoded file.

> The RLE encoding for ORC can be improved
> ----------------------------------------
>
>                 Key: HIVE-4123
>                 URL: https://issues.apache.org/jira/browse/HIVE-4123
>             Project: Hive
>          Issue Type: New Feature
>          Components: File Formats
>    Affects Versions: 0.12.0
>            Reporter: Owen O'Malley
>            Assignee: Prasanth J
>              Labels: orcfile
>             Fix For: 0.12.0
>
>         Attachments: HIVE-4123.1.git.patch.txt, HIVE-4123.2.git.patch.txt,
> HIVE-4123.3.patch.txt, HIVE-4123.4.patch.txt, HIVE-4123.5.txt,
> HIVE-4123.6.txt, ORC-Compression-Ratio-Comparison.xlsx
>
>
> The run length encoding of integers can be improved:
> * tighter bit packing
> * allow delta encoding
> * allow longer runs
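
As a rough illustration of why two of the improvements listed in the issue description (delta encoding and tighter bit packing) can shrink integer columns, here is a minimal Java sketch. It is not the encoding implemented in any of the attached patches. The class name DeltaBitWidthSketch and its methods are hypothetical, and the sketch assumes non-negative, non-decreasing input so that every delta is non-negative; a real encoder would also need a scheme for negative deltas.

```java
import java.util.Arrays;

// Illustrative only: NOT the encoding from the HIVE-4123 patches.
// All names here are hypothetical and exist only for this example.
final class DeltaBitWidthSketch {

    // Smallest fixed bit width that can hold every value in the array.
    // Assumes non-negative values; returns at least 1.
    static int bitsNeeded(long[] values) {
        long max = 0;
        for (long v : values) {
            if (v < 0) {
                throw new IllegalArgumentException("negative value: " + v);
            }
            max = Math.max(max, v);
        }
        return Math.max(1, 64 - Long.numberOfLeadingZeros(max));
    }

    // Differences between consecutive values (length is n - 1).
    static long[] deltas(long[] values) {
        long[] out = new long[Math.max(0, values.length - 1)];
        for (int i = 1; i < values.length; i++) {
            out[i - 1] = values[i] - values[i - 1];
        }
        return out;
    }

    // Inverse of deltas(): rebuilds the values from the first value and the deltas.
    static long[] undelta(long base, long[] deltas) {
        long[] out = new long[deltas.length + 1];
        out[0] = base;
        for (int i = 0; i < deltas.length; i++) {
            out[i + 1] = out[i] + deltas[i];
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up sample data: large values that grow slowly.
        long[] ids = {1000000L, 1000003L, 1000004L, 1000010L, 1000011L};
        long[] d = deltas(ids);

        System.out.println("deltas: " + Arrays.toString(d));
        System.out.println("bits per raw value: " + bitsNeeded(ids));
        System.out.println("bits per delta: " + bitsNeeded(d));
        System.out.println("round trip ok: " + Arrays.equals(undelta(ids[0], d), ids));
    }
}
```

For the sample values above, each raw value needs 20 bits, while each delta fits in 3 bits once the first value is stored separately. Savings on real columns depend entirely on the data.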