[ https://issues.apache.org/jira/browse/HIVE-10086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14385006#comment-14385006 ]
Hive QA commented on HIVE-10086:
--------------------------------

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12707844/HIVE-10086.5.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 8678 tests executed
*Failed tests:*
{noformat}
TestMinimrCliDriver-smb_mapjoin_8.q - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_table_with_subschema
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3187/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3187/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3187/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12707844 - PreCommit-HIVE-TRUNK-Build

> Hive throws error when accessing Parquet file schema using field name match
> ---------------------------------------------------------------------------
>
>              Key: HIVE-10086
>              URL: https://issues.apache.org/jira/browse/HIVE-10086
>          Project: Hive
>       Issue Type: Bug
> Affects Versions: 1.0.0
>         Reporter: Sergio Peña
>         Assignee: Sergio Peña
>      Attachments: HIVE-10086.5.patch, HiveGroup.parquet
>
>
> When Hive table schema contains a portion of the schema of a Parquet file,
> then the access to the values should work if the field names match the
> schema. This does not work when a struct<> data type is in the schema, and
> the Hive schema contains just a portion of the struct elements. Hive throws
> an error instead.
> This is the example and how to reproduce:
> First, create a parquet table, and add some values on it:
> {code}
> CREATE TABLE test1 (id int, name string, address struct<number:int,street:string,zip:string>) STORED AS PARQUET;
> INSERT INTO TABLE test1 SELECT 1, 'Roger', named_struct('number',8600,'street','Congress Ave.','zip','87366') FROM srcpart LIMIT 1;
> {code}
> Note: {{srcpart}} could be any table. It is just used to leverage the INSERT
> statement.
> The above table example generates the following Parquet file schema:
> {code}
> message hive_schema {
>   optional int32 id;
>   optional binary name (UTF8);
>   optional group address {
>     optional int32 number;
>     optional binary street (UTF8);
>     optional binary zip (UTF8);
>   }
> }
> {code}
> Afterwards, I create a table that contains just a portion of the schema, and
> load the Parquet file generated above, a query will fail on that table:
> {code}
> CREATE TABLE test1 (name string, address struct<street:string>) STORED AS PARQUET;
> LOAD DATA LOCAL INPATH '/tmp/HiveGroup.parquet' OVERWRITE INTO TABLE test1;
> hive> SELECT name FROM test1;
> OK
> Roger
> Time taken: 0.071 seconds, Fetched: 1 row(s)
> hive> SELECT address FROM test1;
> OK
> Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.UnsupportedOperationException: Cannot inspect org.apache.hadoop.io.IntWritable
> Time taken: 0.085 seconds
> {code}
> I would expect that Parquet can access the matched names, but Hive throws an
> error instead.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
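
The behavior the description asks for, matching struct fields by name instead of by position, can be illustrated with a small standalone sketch. This is not Hive's or Parquet's reader code; the function and constant names below are hypothetical, and reading the IntWritable error as a positional lookup is an assumption rather than something the report states. The sketch only models the {{address}} group from the schema above.

```python
# Fields of the "address" group as stored in the Parquet file (from the issue).
ADDRESS_FILE_FIELDS = [("number", int), ("street", str), ("zip", str)]
# Values of that group for the single inserted row, in file order.
ADDRESS_VALUES = [8600, "Congress Ave.", "87366"]
# Fields the second table declares: struct<street:string>.
ADDRESS_TABLE_FIELDS = [("street", str)]


def read_by_position(values, table_fields):
    """Hypothetical positional lookup: the i-th table field takes the i-th file value."""
    return {name: values[i] for i, (name, _) in enumerate(table_fields)}


def read_by_name(file_fields, values, table_fields):
    """Hypothetical name-based lookup: each table field takes the file value with the same name."""
    position = {name: i for i, (name, _) in enumerate(file_fields)}
    return {
        name: values[position[name]] if name in position else None
        for name, _ in table_fields
    }


def check_types(record, table_fields):
    """Raise TypeError when a value does not have the type the table declares."""
    for name, expected in table_fields:
        value = record[name]
        if value is not None and not isinstance(value, expected):
            raise TypeError(
                f"field {name!r}: expected {expected.__name__}, "
                f"got {type(value).__name__}"
            )


if __name__ == "__main__":
    by_position = read_by_position(ADDRESS_VALUES, ADDRESS_TABLE_FIELDS)
    by_name = read_by_name(ADDRESS_FILE_FIELDS, ADDRESS_VALUES, ADDRESS_TABLE_FIELDS)
    print("by position:", by_position)
    print("by name:", by_name)
    check_types(by_name, ADDRESS_TABLE_FIELDS)
    try:
        check_types(by_position, ADDRESS_TABLE_FIELDS)
    except TypeError as err:
        print("positional lookup fails:", err)
```

In this model the positional lookup hands the integer {{number}} value to a field declared as a string, which resembles the type mismatch in the reported error, while the name-based lookup returns the {{street}} value the reporter expects.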