[ 
https://issues.apache.org/jira/browse/HIVE-7800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158594#comment-14158594
 ] 

Daniel Weeks commented on HIVE-7800:
------------------------------------

This patch actually resolves a few different issues:

1) If the file schema size and table schema size differ across partitions, it 
no longer throws an index out of bounds exception.
2) There was an odd case: if the calculated input splits resulted in a mapper 
not processing the first split (due to the row group boundary checking), the 
array writable backing the materialized rows was initialized to the full 
table length rather than the projected column length.  Column index access 
could not handle that case, which caused problems.
3) A previously included check did not allow the file schema to vary from the 
table schema (i.e. a query could not request a column that doesn't exist in 
the underlying file).  That check prevented schema evolution and has been 
removed.  Columns missing from the file schema should be null padded in the 
final result.
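
To illustrate the intended behavior, here is a minimal sketch in plain Java. 
It is not the code from the patch: the class name, method name, and the use of 
Object[] in place of Hive's array writable are assumptions made for 
illustration only.  The output row is sized to the projected column count (not 
the full table width), and any requested index beyond the end of the file 
schema is null padded rather than causing an index out of bounds exception.

```java
import java.util.Arrays;

// Hypothetical sketch; names and types are illustrative, not Hive's API.
final class ColumnIndexProjectionSketch {

    // fileRow: values read from one file, ordered by the file schema.
    // requestedIndexes: table-schema positions of the projected columns.
    static Object[] projectByIndex(Object[] fileRow, int[] requestedIndexes) {
        // Size the backing array to the projection, not the table width.
        Object[] out = new Object[requestedIndexes.length];
        for (int i = 0; i < requestedIndexes.length; i++) {
            int idx = requestedIndexes[i];
            // A column absent from this file's (shorter) schema becomes null.
            out[i] = (idx >= 0 && idx < fileRow.length) ? fileRow[idx] : null;
        }
        return out;
    }

    public static void main(String[] args) {
        // The file has two columns; the table has since gained a third.
        Object[] fileRow = {"a", 1};
        int[] requested = {0, 2};
        System.out.println(Arrays.toString(projectByIndex(fileRow, requested)));
        // prints "[a, null]"
    }
}
```

Bounds checking per requested index covers both directions of mismatch: a 
file with fewer columns than the table yields nulls, and a file with more 
columns simply has its extra columns ignored.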

> Parquet Column Index Access Schema Size Checking
> ------------------------------------------------
>
>                 Key: HIVE-7800
>                 URL: https://issues.apache.org/jira/browse/HIVE-7800
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.14.0
>            Reporter: Daniel Weeks
>            Assignee: Daniel Weeks
>            Priority: Critical
>             Fix For: 0.14.0
>
>         Attachments: HIVE-7800.1.patch, HIVE-7800.2.patch, HIVE-7800.3.patch
>
>
> When a Parquet-formatted table has partitions whose files have schemas of 
> different sizes, using column index access can result in an index out of 
> bounds exception.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
