[jira] [Commented] (HIVE-9660) store end offset of compressed data for RG in RowIndex in ORC

Sergey Shelukhin (JIRA) Sat, 19 Mar 2016 00:27:11 -0700

    [ https://issues.apache.org/jira/browse/HIVE-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15200794#comment-15200794 ]


Sergey Shelukhin commented on HIVE-9660:
----------------------------------------

The fundamental problem with this patch is that logical writers (e.g. the RLE 
writer) buffer data, so the stream position at the end of a row group (RG) 
does not yet account for values still held in the writer's buffer. For some 
writers, such as the bit writer, we cannot even force a flush at the end of 
the RG. A forced flush would have solved this problem at a small size cost, 
since all encoding segments would have to terminate at RG boundaries.
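
To make the buffering problem concrete, here is a minimal sketch using a toy 
run-length writer. It is not the ORC RLE writer; the class, its 
(length, value) byte layout and the flush_at_boundary switch are all 
assumptions made up for illustration. It shows that the offset visible at an 
RG boundary lags behind the data when runs are buffered, and that flushing at 
boundaries gives exact offsets at the cost of splitting runs. The bit writer 
case, where a flush cannot be forced, is not modelled.

```python
class ToyRleWriter:
    """Toy run-length writer (hypothetical, not ORC's): buffers the current run."""

    def __init__(self, flush_at_boundary=False):
        self.out = bytearray()   # bytes already handed to the output stream
        self.run_value = None    # value of the run still being buffered
        self.run_length = 0      # length of the run still being buffered
        self.flush_at_boundary = flush_at_boundary

    def write(self, value):
        # Extend the buffered run while the value repeats (values must be 0..255).
        if value == self.run_value and self.run_length < 255:
            self.run_length += 1
            return
        self.flush()
        self.run_value, self.run_length = value, 1

    def flush(self):
        # Emit the buffered run as a (length, value) pair, if there is one.
        if self.run_length:
            self.out += bytes([self.run_length, self.run_value])
        self.run_value, self.run_length = None, 0

    def end_row_group(self):
        # Offset that an index entry could record at the RG boundary.
        if self.flush_at_boundary:
            self.flush()
        return len(self.out)


def write_row_groups(row_groups, flush_at_boundary):
    """Return (offset recorded at the end of each RG, total stream size)."""
    writer = ToyRleWriter(flush_at_boundary)
    offsets = []
    for rg in row_groups:
        for value in rg:
            writer.write(value)
        offsets.append(writer.end_row_group())
    writer.flush()
    return offsets, len(writer.out)


row_groups = [[7, 7, 7], [7, 7, 9]]
buffered = write_row_groups(row_groups, flush_at_boundary=False)
flushed = write_row_groups(row_groups, flush_at_boundary=True)
print(buffered)  # ([0, 2], 4)
print(flushed)   # ([2, 6], 6)
```

Without the flush, the first RG ends with nothing written, because its three 
values are still buffered and later merge into a run shared with the second 
RG. With the flush, every recorded offset is exact, but the shared run is 
split and the stream grows from 4 to 6 bytes.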

> store end offset of compressed data for RG in RowIndex in ORC
> -------------------------------------------------------------
>
>                 Key: HIVE-9660
>                 URL: https://issues.apache.org/jira/browse/HIVE-9660
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>         Attachments: HIVE-9660.WIP2.patch
>
>
> Right now the end offset is estimated, which in some cases results in a 
> large amount of extra data being read.
> We can add a separate array to RowIndex (positions_v2?) that stores the 
> number of compressed buffers for each RG, or the end offset, or something 
> similar, to remove this estimation magic.
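
As a rough illustration of the trade-off in the description above, the sketch 
below compares the byte range read for one RG when the end offset is 
estimated against the range read when it is stored. The estimation rule used 
here (the next RG's start plus a worst-case slop, capped at the stream 
length) is an assumption for illustration, as are the function name, the 
end_offsets list and all the numbers. None of it is taken from the patch.

```python
def read_range(rg, starts, stream_length, slop, end_offsets=None):
    """Return the (start, end) byte range to read for row group `rg`.

    starts      -- start offset of each RG within the compressed stream
    slop        -- assumed worst-case overshoot past the next RG's start
    end_offsets -- hypothetical stored end offsets (the "positions_v2" idea)
    """
    start = starts[rg]
    if end_offsets is not None:
        # Exact end offset is available: no estimation needed.
        return start, end_offsets[rg]
    is_last = rg == len(starts) - 1
    if is_last:
        return start, stream_length
    # Estimated: read past the next RG's start by a worst-case margin.
    return start, min(stream_length, starts[rg + 1] + slop)


starts = [0, 1000, 2000]          # made-up RG start offsets
stream_length = 1_000_000         # made-up stream size
slop = 2 * (3 + 256 * 1024)       # made-up margin: two buffers plus headers

est = read_range(0, starts, stream_length, slop)
exact = read_range(0, starts, stream_length, slop, end_offsets=[1100, 2100, 3000])
print(est[1] - est[0])      # bytes read with estimation
print(exact[1] - exact[0])  # bytes read with a stored end offset
```

With these made-up numbers, estimation reads 525294 bytes for the first RG, 
while the stored end offset reads only 1100.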



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
