[ https://issues.apache.org/jira/browse/HIVE-7800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158594#comment-14158594 ]
Daniel Weeks commented on HIVE-7800:
------------------------------------

This patch actually resolves a few different issues:

1) If the file schema size and table schema size differ across partitions, it no longer throws an index-out-of-bounds exception.

2) There was an odd case where, if the calculated input splits resulted in a mapper not processing the first split (due to the row group boundary checking), the array writable used to back the materialized rows would be initialized to the full table length rather than the projected column length. This caused problems in the column index access case, which was not able to handle it.

3) A previous check did not allow the file schema to vary from the table schema (i.e. a query could not request a column that doesn't exist in the underlying file). That check prevented schema evolution and was removed. Columns missing from the file schema should be null padded in the final result.

> Parquet Column Index Access Schema Size Checking
> ------------------------------------------------
>
>                 Key: HIVE-7800
>                 URL: https://issues.apache.org/jira/browse/HIVE-7800
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.14.0
>            Reporter: Daniel Weeks
>            Assignee: Daniel Weeks
>            Priority: Critical
>             Fix For: 0.14.0
>
>         Attachments: HIVE-7800.1.patch, HIVE-7800.2.patch, HIVE-7800.3.patch
>
>
> In the case that a parquet formatted table has partitions where the files
> have different size schema, using column index access can result in an index
> out of bounds exception.
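
For illustration only, the behaviour described in items 1 to 3 of the comment can be sketched roughly as below. This is not the code from the attached patches: the class name IndexAccessSketch and the helper projectByIndex are hypothetical, and plain Object arrays stand in for Hive's writable types. The sketch assumes that columns are matched by position, that the output row is sized to the projected columns rather than the full table width, and that any requested position the file does not contain becomes null.

```java
import java.util.Arrays;

// Hypothetical sketch, not Hive's actual reader code.
final class IndexAccessSketch {

    /**
     * Builds an output row by column position.
     *
     * @param fileRow          values read from one file, in file-schema order
     * @param projectedIndexes table-schema positions requested by the query
     * @return a row sized to the projection; positions the file lacks are null
     */
    static Object[] projectByIndex(Object[] fileRow, int[] projectedIndexes) {
        // Size the backing array to the projection, not to the table width.
        Object[] out = new Object[projectedIndexes.length];
        for (int i = 0; i < projectedIndexes.length; i++) {
            int idx = projectedIndexes[i];
            // Bounds check instead of failing: a missing column is null padded.
            out[i] = (idx >= 0 && idx < fileRow.length) ? fileRow[idx] : null;
        }
        return out;
    }

    public static void main(String[] args) {
        // An older partition whose files have two columns.
        Object[] oldFileRow = {1, "a"};
        // The query asks for table columns 0 and 2; column 2 was added later.
        Object[] row = projectByIndex(oldFileRow, new int[] {0, 2});
        System.out.println(Arrays.toString(row)); // prints [1, null]
    }
}
```

The bounds check replaces a hard failure on schema mismatch, which is what lets older partitions with fewer columns still be read after the table schema grows.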