[ https://issues.apache.org/jira/browse/HIVE-8909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14220487#comment-14220487 ]

Brock Noland commented on HIVE-8909:
------------------------------------

Not sure which test this is from:

{noformat}
Caused by: parquet.io.ParquetDecodingException: Can not read value at 0 in block 0 in file pfile:/Users/noland/workspaces/hive-apache/hive/itests/qtest/target/warehouse/parquet_jointable2/000000_0
  at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:213)
  at parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:204)
  at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:102)
  at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:71)
  at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:71)
  at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:65)
  ... 16 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 2
  at org.apache.hadoop.hive.ql.io.parquet.convert.HiveStructConverter.set(HiveStructConverter.java:96)
  at org.apache.hadoop.hive.ql.io.parquet.convert.ETypeConverter$BinaryConverter.addBinary(ETypeConverter.java:219)
  at parquet.column.impl.ColumnReaderImpl$2$6.writeValue(ColumnReaderImpl.java:306)
  at parquet.column.impl.ColumnReaderImpl.writeCurrentValueToConverter(ColumnReaderImpl.java:353)
  at parquet.io.RecordReaderImplementation.read(RecordReaderImplementation.java:402)
  at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:194)
  ... 21 more
{noformat}

All failed parquet tests with the patch:
{noformat}
  <testcase name="testCliDriver_parquet_array_null_element" classname="org.apache.hadoop.hive.cli.TestCliDriver" time="4.945">
  <testcase name="testCliDriver_parquet_create" classname="org.apache.hadoop.hive.cli.TestCliDriver" time="4.416">
  <testcase name="testCliDriver_parquet_decimal" classname="org.apache.hadoop.hive.cli.TestCliDriver" time="5.478">
  <testcase name="testCliDriver_parquet_join" classname="org.apache.hadoop.hive.cli.TestCliDriver" time="8.928">
  <testcase name="testCliDriver_parquet_types" classname="org.apache.hadoop.hive.cli.TestCliDriver" time="4.094">
{noformat}

> Hive doesn't correctly read Parquet nested types
> ------------------------------------------------
>
>                 Key: HIVE-8909
>                 URL: https://issues.apache.org/jira/browse/HIVE-8909
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.13.1
>            Reporter: Ryan Blue
>            Assignee: Ryan Blue
>         Attachments: HIVE-8909-1.patch, HIVE-8909-2.patch, HIVE-8909.2.patch, HIVE-8909.3.patch, parquet-test-data.tar.gz
>
>
> Parquet's Avro and Thrift object models don't produce the same parquet type 
> representation for lists and maps that Hive does. In the Parquet community, 
> we've defined what should be written and backward-compatibility rules for 
> existing data written by parquet-avro and parquet-thrift in PARQUET-113. We 
> need to implement those rules in the Hive Converter classes.
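
For context, the list-representation mismatch described above can be sketched roughly as follows. These schema fragments follow the Parquet list spec being standardized in PARQUET-113; the field name {{my_list}} is illustrative, not taken from the failing test data.

The standard three-level structure:

{noformat}
optional group my_list (LIST) {
  repeated group list {
    optional binary element (UTF8);
  }
}
{noformat}

versus the two-level structure historically written by parquet-avro:

{noformat}
optional group my_list (LIST) {
  repeated binary array (UTF8);
}
{noformat}

A converter that expects only one of these shapes can mis-index the children of the repeated group when handed the other, which would be consistent with the ArrayIndexOutOfBoundsException in HiveStructConverter.set above.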



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
