[ https://issues.apache.org/jira/browse/HIVE-10086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386033#comment-14386033 ]

Chao commented on HIVE-10086:
-----------------------------

I think the test failure on smb_mapjoin_8.q is not related - I've seen it before, and also the same test succeeded on my local machine.

> Hive throws error when accessing Parquet file schema using field name match
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-10086
>                 URL: https://issues.apache.org/jira/browse/HIVE-10086
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 1.0.0
>            Reporter: Sergio Peña
>            Assignee: Sergio Peña
>         Attachments: HIVE-10086.5.patch, HiveGroup.parquet
>
>
> When Hive table schema contains a portion of the schema of a Parquet file,
> then the access to the values should work if the field names match the
> schema. This does not work when a struct<> data type is in the schema, and
> the Hive schema contains just a portion of the struct elements. Hive throws
> an error instead.
> This is the example and how to reproduce:
> First, create a parquet table, and add some values on it:
> {code}
> CREATE TABLE test1 (id int, name string, address struct<number:int,street:string,zip:string>) STORED AS PARQUET;
> INSERT INTO TABLE test1 SELECT 1, 'Roger', named_struct('number',8600,'street','Congress Ave.','zip','87366') FROM srcpart LIMIT 1;
> {code}
> Note: {{srcpart}} could be any table. It is just used to leverage the INSERT
> statement.
> The above table example generates the following Parquet file schema:
> {code}
> message hive_schema {
>   optional int32 id;
>   optional binary name (UTF8);
>   optional group address {
>     optional int32 number;
>     optional binary street (UTF8);
>     optional binary zip (UTF8);
>   }
> }
> {code}
> Afterwards, I create a table that contains just a portion of the schema, and
> load the Parquet file generated above, a query will fail on that table:
> {code}
> CREATE TABLE test1 (name string, address struct<street:string>) STORED AS PARQUET;
> LOAD DATA LOCAL INPATH '/tmp/HiveGroup.parquet' OVERWRITE INTO TABLE test1;
> hive> SELECT name FROM test1;
> OK
> Roger
> Time taken: 0.071 seconds, Fetched: 1 row(s)
> hive> SELECT address FROM test1;
> OK
> Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.UnsupportedOperationException: Cannot inspect org.apache.hadoop.io.IntWritable
> Time taken: 0.085 seconds
> {code}
> I would expect that Parquet can access the matched names, but Hive throws an
> error instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
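
For readers of the archive, here is a minimal sketch of the behaviour the report asks for: resolving a partial table schema against the file schema by field name, including inside a struct. It is plain Python over dictionaries and is not Hive's or Parquet's actual code. The names FILE_SCHEMA, REQUESTED, ROW, project_by_name and read_row are hypothetical and only mirror the example in the report.

```python
# Illustrative only: name-based projection of a nested schema.
# This is NOT how Hive or Parquet implement it; all names are hypothetical.

# File schema from the report (groups modelled as nested dicts).
FILE_SCHEMA = {
    "id": "int32",
    "name": "binary",
    "address": {"number": "int32", "street": "binary", "zip": "binary"},
}

# Partial table schema: only `name` and `address.street` are requested.
REQUESTED = {"name": "string", "address": {"street": "string"}}

# The single row inserted in the report.
ROW = {
    "id": 1,
    "name": "Roger",
    "address": {"number": 8600, "street": "Congress Ave.", "zip": "87366"},
}


def project_by_name(file_schema, requested):
    """Keep only the file fields whose names appear in `requested`."""
    projected = {}
    for field, wanted in requested.items():
        if field not in file_schema:
            continue  # field absent from the file: skipped in this sketch
        actual = file_schema[field]
        if isinstance(wanted, dict) and isinstance(actual, dict):
            # Recurse into the struct and match its members by name too.
            projected[field] = project_by_name(actual, wanted)
        else:
            projected[field] = actual
    return projected


def read_row(row, projected):
    """Read a row through the projected schema, looking values up by name."""
    out = {}
    for field, sub in projected.items():
        value = row.get(field)
        if isinstance(sub, dict) and isinstance(value, dict):
            out[field] = read_row(value, sub)
        else:
            out[field] = value
    return out


if __name__ == "__main__":
    schema = project_by_name(FILE_SCHEMA, REQUESTED)
    print(schema)
    print(read_row(ROW, schema))
```

Looking members up by name keeps address.street bound to the string column regardless of where it sits in the file's struct. The IntWritable in the reported exception would be consistent with the first struct member (number) being read where street was expected, but that is only a guess from the trace.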