[jira] [Commented] (HIVE-10086) Hive throws error when accessing Parquet file schema using field name match

Chao (JIRA) Sun, 29 Mar 2015 17:06:12 -0700

    [ https://issues.apache.org/jira/browse/HIVE-10086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386033#comment-14386033 ]


Chao commented on HIVE-10086:
-----------------------------

I think the test failure on smb_mapjoin_8.q is unrelated to this patch. I've 
seen it fail before, and the same test passes on my local machine.

> Hive throws error when accessing Parquet file schema using field name match
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-10086
>                 URL: https://issues.apache.org/jira/browse/HIVE-10086
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 1.0.0
>            Reporter: Sergio Peña
>            Assignee: Sergio Peña
>         Attachments: HIVE-10086.5.patch, HiveGroup.parquet
>
>
> When a Hive table schema contains only a portion of a Parquet file's schema, 
> access to the values should still work as long as the field names match. This 
> does not work when the schema contains a struct<> data type and the Hive 
> schema declares only some of the struct's elements. Hive throws an error 
> instead.
> The following example reproduces the problem.
> First, create a Parquet table and add some values to it:
> {code}
> CREATE TABLE test1 (id int, name string, address 
> struct<number:int,street:string,zip:string>) STORED AS PARQUET;
> INSERT INTO TABLE test1 SELECT 1, 'Roger', 
> named_struct('number',8600,'street','Congress Ave.','zip','87366') FROM 
> srcpart LIMIT 1;
> {code}
> Note: {{srcpart}} could be any non-empty table. It only serves as a row 
> source for the INSERT statement.
> The above table example generates the following Parquet file schema:
> {code}
> message hive_schema {
>   optional int32 id;
>   optional binary name (UTF8);
>   optional group address {
>     optional int32 number;
>     optional binary street (UTF8);
>     optional binary zip (UTF8);
>   }
> }
> {code} 
> Afterwards, I create a table that contains just a portion of the schema and 
> load the Parquet file generated above. A query on the struct column of that 
> table fails:
> {code}
> CREATE TABLE test1 (name string, address struct<street:string>) STORED AS 
> PARQUET;
> LOAD DATA LOCAL INPATH '/tmp/HiveGroup.parquet' OVERWRITE INTO TABLE test1;
> hive> SELECT name FROM test1;
> OK
> Roger
> Time taken: 0.071 seconds, Fetched: 1 row(s)
> hive> SELECT address FROM test1;
> OK
> Failed with exception 
> java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.UnsupportedOperationException: Cannot inspect 
> org.apache.hadoop.io.IntWritable
> Time taken: 0.085 seconds
> {code}
> I would expect the Parquet reader to access the fields whose names match, but 
> Hive throws an error instead.
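
For illustration only, here is a toy Python model of the difference between matching struct fields by position and by name. It is not Hive or Parquet code: the helper names (project_by_position, project_by_name) are hypothetical, and it assumes the failure in the report above comes from the struct being read positionally, which would hand the int32 {{number}} value to a string inspector and would be consistent with the "Cannot inspect org.apache.hadoop.io.IntWritable" message.

```python
# Toy model, not Hive code: a Parquet group is an ordered list of
# (field name, value) pairs, mirroring the "address" group in the file schema.
file_address = [("number", 8600), ("street", "Congress Ave."), ("zip", "87366")]


def project_by_position(group, wanted_fields):
    """Hypothetical positional lookup: the i-th requested field gets the
    i-th value stored in the file, regardless of its name."""
    return {name: group[i][1] for i, name in enumerate(wanted_fields)}


def project_by_name(group, wanted_fields):
    """Hypothetical name-based lookup: each requested field is resolved by
    name; fields absent from the file come back as None."""
    values = dict(group)
    return {name: values.get(name) for name in wanted_fields}


# The second table declares only: address struct<street:string>
hive_fields = ["street"]

by_position = project_by_position(file_address, hive_fields)
by_name = project_by_name(file_address, hive_fields)

print(by_position)  # {'street': 8600}  (an int where a string is expected)
print(by_name)      # {'street': 'Congress Ave.'}
```

Under that assumption, the positional variant returns an integer for a column declared as string, while the name-based variant returns the value the report expects.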



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
