Sergio Peña created HIVE-10086:
----------------------------------

             Summary: Hive throws error when accessing Parquet file schema 
using field name match
                 Key: HIVE-10086
                 URL: https://issues.apache.org/jira/browse/HIVE-10086
             Project: Hive
          Issue Type: Bug
    Affects Versions: 1.0.0
            Reporter: Sergio Peña
            Assignee: Sergio Peña


When a Hive table schema contains only a subset of the fields in a Parquet 
file's schema, reading the values should still work as long as the field names 
match. This does not work when the file schema contains a struct<> column and 
the Hive schema declares only some of the struct's fields. Hive throws an error 
instead.

Steps to reproduce:

First, create a Parquet table and insert a row into it:
{code}
CREATE TABLE test1 (id int, name string, address 
struct<number:int,street:string,zip:string>) STORED AS PARQUET;

INSERT INTO TABLE test1 SELECT 1, 'Roger', 
named_struct('number',8600,'street','Congress Ave.','zip','87366') FROM srcpart 
LIMIT 1;
{code}

Note: {{srcpart}} can be any existing table with at least one row. It is used 
only to give the INSERT ... SELECT statement a source to select from.

The table above produces a Parquet file with the following schema:
{code}
message hive_schema {
  optional int32 id;
  optional binary name (UTF8);
  optional group address {
    optional int32 number;
    optional binary street (UTF8);
    optional binary zip (UTF8);
  }
}
{code} 

Next, I create a table whose schema contains just a portion of the file schema 
and load the Parquet file generated above. Selecting {{name}} works, but 
selecting {{address}} fails:
{code}
CREATE TABLE test1 (name string, address struct<street:string>) STORED AS 
PARQUET;

LOAD DATA LOCAL INPATH '/tmp/HiveGroup.parquet' OVERWRITE INTO TABLE test1;

hive> SELECT name FROM test1;
OK
Roger
Time taken: 0.071 seconds, Fetched: 1 row(s)

hive> SELECT address FROM test1;
OK
Failed with exception 
java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.UnsupportedOperationException: Cannot inspect 
org.apache.hadoop.io.IntWritable
Time taken: 0.085 seconds
{code}

I would expect Hive to read the fields whose names match the Parquet schema, 
but it throws an error instead.
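
To make the expected behaviour concrete, here is a minimal Python sketch of 
resolving a partial table schema against a file schema by field name, including 
inside a struct. It is only an illustration: the dict-based schema model and the 
function names {{project_by_name}} and {{first_field_by_position}} are 
hypothetical and are not Hive or Parquet APIs. The positional lookup is included 
only for contrast. It lands on an int field where the table expects a string, 
which is consistent with the IntWritable in the error above, but this report 
does not confirm that as the cause.

```python
# Hypothetical, simplified schema model (not a Hive or Parquet API):
# a dict maps field name -> type, and a nested dict stands for a struct/group.
FILE_SCHEMA = {
    "id": "int32",
    "name": "binary",
    "address": {"number": "int32", "street": "binary", "zip": "binary"},
}

# The partial schema declared by the second table in the reproduction steps.
TABLE_SCHEMA = {"name": "string", "address": {"street": "string"}}


def project_by_name(file_schema, table_schema):
    """Keep only the file fields that the table schema names, recursing into structs."""
    projected = {}
    for field, wanted in table_schema.items():
        if field not in file_schema:
            continue  # field absent from the file; skipped in this sketch
        found = file_schema[field]
        if isinstance(wanted, dict) and isinstance(found, dict):
            # Both sides are structs: match their members by name as well.
            projected[field] = project_by_name(found, wanted)
        else:
            projected[field] = found
    return projected


def first_field_by_position(file_schema, column):
    """Positional lookup, for contrast: return the first member of the file's struct."""
    return next(iter(file_schema[column].items()))


projected = project_by_name(FILE_SCHEMA, TABLE_SCHEMA)
positional = first_field_by_position(FILE_SCHEMA, "address")

print(projected)   # name-based match keeps only address.street
print(positional)  # position 0 of address is the int field, not street
```

With name-based matching, {{address}} resolves to the single {{street}} member, 
which is the result I would expect from the query above.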



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
