[ 
https://issues.apache.org/jira/browse/HIVE-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15232946#comment-15232946
 ] 

Prasanth Jayachandran commented on HIVE-9660:
---------------------------------------------

I still don't think we need a config for the writer. I can see that the config 
was added so that we can avoid writing wrong lengths, or disable the feature. 
But the problem is that we won't be able to identify the files that were 
already written with wrong lengths. So I would recommend bumping up the 
writerVersion to reflect this jira (HIVE-9660). With this we can identify files 
that were written after HIVE-9660. In the future, if we find anything wrong, we 
can bump up the writerVersion again and make the reader resilient by ignoring 
lengths from files written with the HIVE-9660 writerVersion. There should also 
be a reader config to either use the lengths when available or fall back to the 
old codepath.
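
A minimal sketch of the version gating described above, not the actual ORC 
reader code. All names here are hypothetical and chosen for illustration: the 
WriterVersion enum and its HIVE_9660 and FUTURE_FIX values, the 
useStoredLengths method, and the readerUseLengths flag standing in for the 
reader config.

```java
import java.util.EnumSet;
import java.util.Set;

final class LengthGatingSketch {
    // Hypothetical writer versions, ordered from oldest to newest.
    enum WriterVersion {
        ORIGINAL(0),
        HIVE_9660(1),   // assumed: first version that stores lengths in the RowIndex
        FUTURE_FIX(2);  // assumed: a later bump made after a bug is found

        final int id;

        WriterVersion(int id) {
            this.id = id;
        }

        // True if this version is at least as new as 'other'.
        boolean includes(WriterVersion other) {
            return id >= other.id;
        }
    }

    // Decide whether the reader should trust the lengths stored in a file.
    // 'distrusted' holds versions whose stored lengths are known to be wrong.
    static boolean useStoredLengths(WriterVersion fileVersion,
                                    boolean readerUseLengths,
                                    Set<WriterVersion> distrusted) {
        if (!readerUseLengths) {
            return false; // reader config forces the old estimation codepath
        }
        if (!fileVersion.includes(WriterVersion.HIVE_9660)) {
            return false; // file predates the feature, so no lengths are stored
        }
        return !distrusted.contains(fileVersion);
    }

    public static void main(String[] args) {
        Set<WriterVersion> none = EnumSet.noneOf(WriterVersion.class);
        Set<WriterVersion> bad = EnumSet.of(WriterVersion.HIVE_9660);
        System.out.println(useStoredLengths(WriterVersion.ORIGINAL, true, none));
        System.out.println(useStoredLengths(WriterVersion.HIVE_9660, true, none));
        System.out.println(useStoredLengths(WriterVersion.HIVE_9660, true, bad));
    }
}
```

The point of the sketch is that the writer needs no config: the version 
recorded in the file tells the reader which files carry lengths, and a later 
version bump lets the reader skip only the files written by a faulty version.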

> store end offset of compressed data for RG in RowIndex in ORC
> -------------------------------------------------------------
>
>                 Key: HIVE-9660
>                 URL: https://issues.apache.org/jira/browse/HIVE-9660
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>         Attachments: HIVE-9660.01.patch, HIVE-9660.02.patch, 
> HIVE-9660.03.patch, HIVE-9660.04.patch, HIVE-9660.05.patch, 
> HIVE-9660.06.patch, HIVE-9660.07.patch, HIVE-9660.07.patch, HIVE-9660.patch, 
> HIVE-9660.patch
>
>
> Right now the end offset is estimated, which in some cases results in tons of 
> extra data being read.
> We can add a separate array to RowIndex (positions_v2?) that stores number of 
> compressed buffers for each RG, or end offset, or something, to remove this 
> estimation magic



