[ https://issues.apache.org/jira/browse/HIVE-9502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299208#comment-14299208 ]
Brock Noland commented on HIVE-9502:
------------------------------------

Thank you Sergio! I have committed this to trunk and branch-1.1.

> Parquet cannot read Map types from files written with Hive <= 0.12
> ------------------------------------------------------------------
>
>                 Key: HIVE-9502
>                 URL: https://issues.apache.org/jira/browse/HIVE-9502
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.14.0
>            Reporter: Sergio Peña
>            Assignee: Sergio Peña
>             Fix For: 1.1.0
>
>         Attachments: HIVE-9502.1.patch, HIVE-9502.2.patch, HIVE-9502.3.patch,
>                      HIVE-9502.4.patch, alltypesparquet
>
>
> When reading a Parquet file written by Hive <= 0.12, the following error is
> thrown:
> {noformat}
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
>     at org.apache.hadoop.hive.ql.io.parquet.serde.AbstractParquetMapInspector.getMap(AbstractParquetMapInspector.java:73)
>     at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:519)
>     at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:443)
>     at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:427)
>     at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:582)
>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
>     at org.apache.hadoop.hive.ql.exec.LimitOperator.processOp(LimitOperator.java:51)
>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
>     at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87)
>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
>     at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
>     at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:539)
>     ... 9 more
> {noformat}
> This is because old versions of Hive (<= 0.12) write Map types using the
> following schema:
> {noformat}
> optional group m1 (MAP_KEY_VALUE) {
>   repeated group map {
>     required binary key;
>     optional binary value;
>   }
> }
> {noformat}
> PARQUET-113 mentions new annotations for Parquet nested types.
> https://github.com/rdblue/incubator-parquet-format/blob/PARQUET-113-add-list-and-map-spec/LogicalTypes.md#maps
> And now the correct schema is:
> {noformat}
> optional group m1f (MAP) {
>   repeated group map (MAP_KEY_VALUE) {
>     required binary key;
>     optional binary value;
>   }
> }
> {noformat}
> We should remain backward compatible with the old schema as well.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
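
For illustration, the backward-compatibility requirement in the description can be sketched as a check that accepts both map layouts. This is a minimal Java sketch under stated assumptions, not the committed patch: `Node`, `Annotation`, and `isMapLayout` are hypothetical names, the schema model is a simplified stand-in for the real Parquet types, and the second field of each entry is assumed to be named `value`.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified schema model; this is NOT the parquet-mr or Hive API.
final class MapSchemaCompat {
    enum Annotation { NONE, MAP, MAP_KEY_VALUE }

    static final class Node {
        final String name;
        final boolean repeated;
        final Annotation annotation;
        final List<Node> children; // empty for primitive fields

        Node(String name, boolean repeated, Annotation annotation, Node... children) {
            this.name = name;
            this.repeated = repeated;
            this.annotation = annotation;
            this.children = Collections.unmodifiableList(Arrays.asList(children));
        }
    }

    // Layout described for Hive <= 0.12: MAP_KEY_VALUE on the outer group,
    // no annotation on the repeated inner group.
    static Node legacyMap() {
        return new Node("m1", false, Annotation.MAP_KEY_VALUE,
            new Node("map", true, Annotation.NONE,
                new Node("key", false, Annotation.NONE),
                new Node("value", false, Annotation.NONE)));
    }

    // Layout described as correct after PARQUET-113: MAP on the outer group,
    // MAP_KEY_VALUE on the repeated inner group.
    static Node currentMap() {
        return new Node("m1f", false, Annotation.MAP,
            new Node("map", true, Annotation.MAP_KEY_VALUE,
                new Node("key", false, Annotation.NONE),
                new Node("value", false, Annotation.NONE)));
    }

    // Accepts either outer annotation, then requires exactly one repeated
    // child group holding two fields (the key and the value).
    static boolean isMapLayout(Node outer) {
        boolean outerOk = outer.annotation == Annotation.MAP
            || outer.annotation == Annotation.MAP_KEY_VALUE;
        if (!outerOk || outer.children.size() != 1) {
            return false;
        }
        Node entries = outer.children.get(0);
        return entries.repeated && entries.children.size() == 2;
    }

    public static void main(String[] args) {
        System.out.println("legacy layout accepted:  " + isMapLayout(legacyMap()));
        System.out.println("current layout accepted: " + isMapLayout(currentMap()));
    }
}
```

The sketch checks structure (one repeated group with two fields) instead of relying only on where the annotation sits, which is one plausible way for a reader to treat both layouts the same.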