[jira] [Commented] (HIVE-10086) Hive throws error when accessing Parquet file schema using field name match

Hive QA (JIRA) Thu, 26 Mar 2015 15:35:33 -0700

    [ https://issues.apache.org/jira/browse/HIVE-10086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14382839#comment-14382839 ]


Hive QA commented on HIVE-10086:
--------------------------------



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12707516/HIVE-10086.2.patch

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 8678 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_table_with_subschema
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_remote_script
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_scriptfile1
org.apache.hive.jdbc.TestSSL.testSSLFetchHttp
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3171/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3171/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3171/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12707516 - PreCommit-HIVE-TRUNK-Build

> Hive throws error when accessing Parquet file schema using field name match
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-10086
>                 URL: https://issues.apache.org/jira/browse/HIVE-10086
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 1.0.0
>            Reporter: Sergio Peña
>            Assignee: Sergio Peña
>         Attachments: HIVE-10086.3.patch, HiveGroup.parquet
>
>
> When a Hive table schema contains only a portion of a Parquet file's schema,
> values should still be accessible as long as the field names match. This
> does not work when the schema contains a struct<> data type and the Hive
> schema declares only some of the struct's fields; Hive throws an error
> instead.
> Here is an example showing how to reproduce the problem.
> First, create a Parquet table and insert some values into it:
> {code}
> CREATE TABLE test1 (id int, name string,
>   address struct<number:int,street:string,zip:string>) STORED AS PARQUET;
> INSERT INTO TABLE test1 SELECT 1, 'Roger',
>   named_struct('number',8600,'street','Congress Ave.','zip','87366')
>   FROM srcpart LIMIT 1;
> {code}
> Note: {{srcpart}} could be any table. It is only used to drive the
> INSERT ... SELECT statement.
> The above table example generates the following Parquet file schema:
> {code}
> message hive_schema {
>   optional int32 id;
>   optional binary name (UTF8);
>   optional group address {
>     optional int32 number;
>     optional binary street (UTF8);
>     optional binary zip (UTF8);
>   }
> }
> {code} 
> Next, create a table that declares only a portion of that schema and load
> the Parquet file generated above into it. Selecting the top-level {{name}}
> column works, but selecting the {{address}} struct fails:
> {code}
> CREATE TABLE test1 (name string, address struct<street:string>) STORED AS 
> PARQUET;
> LOAD DATA LOCAL INPATH '/tmp/HiveGroup.parquet' OVERWRITE INTO TABLE test1;
> hive> SELECT name FROM test1;
> OK
> Roger
> Time taken: 0.071 seconds, Fetched: 1 row(s)
> hive> SELECT address FROM test1;
> OK
> Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.UnsupportedOperationException: Cannot inspect org.apache.hadoop.io.IntWritable
> Time taken: 0.085 seconds
> {code}
> I would expect Hive to read the struct fields whose names match the Parquet
> schema, but it throws an error instead.
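
One plausible reading of the "Cannot inspect org.apache.hadoop.io.IntWritable" error, offered here as an assumption rather than a confirmed diagnosis, is that the {{address}} struct is matched against the file by field position instead of by field name: the first field of the file's {{address}} group is {{number}} (int32), while the table's struct expects {{street}} (string) in that slot. The following minimal Java sketch models that difference with plain maps. StructProjection, readByPosition, readByName and inspectAsString are hypothetical names, not Hive's reader or ObjectInspector classes, and the sketch reports java.lang.Integer rather than IntWritable because it does not use Hadoop writables.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of struct field resolution; not Hive's actual code.
public class StructProjection {

    // Positional lookup: the i-th Hive field takes the i-th value of the
    // file's struct, regardless of the field names.
    static List<Object> readByPosition(Map<String, Object> fileStruct, List<String> hiveFields) {
        List<Object> fileValues = new ArrayList<>(fileStruct.values());
        List<Object> out = new ArrayList<>();
        for (int i = 0; i < hiveFields.size(); i++) {
            out.add(i < fileValues.size() ? fileValues.get(i) : null);
        }
        return out;
    }

    // Name-based lookup: each Hive field is resolved by its name in the
    // file's struct; a field absent from the file reads as null here.
    static List<Object> readByName(Map<String, Object> fileStruct, List<String> hiveFields) {
        List<Object> out = new ArrayList<>();
        for (String name : hiveFields) {
            out.add(fileStruct.get(name));
        }
        return out;
    }

    // Stand-in for a string inspector: rejects values that are not strings,
    // with a message shaped like the one in the report.
    static String inspectAsString(Object value) {
        if (value == null || value instanceof String) {
            return (String) value;
        }
        throw new UnsupportedOperationException("Cannot inspect " + value.getClass().getName());
    }

    public static void main(String[] args) {
        // The address group as written to the file, in file-schema order.
        Map<String, Object> address = new LinkedHashMap<>();
        address.put("number", 8600);
        address.put("street", "Congress Ave.");
        address.put("zip", "87366");
        // The second table declares only address struct<street:string>.
        List<String> hiveFields = Arrays.asList("street");

        try {
            System.out.println("by position: " + inspectAsString(readByPosition(address, hiveFields).get(0)));
        } catch (UnsupportedOperationException e) {
            System.out.println("by position: " + e.getMessage());
        }
        System.out.println("by name: " + inspectAsString(readByName(address, hiveFields).get(0)));
    }
}
```

Running main prints "by position: Cannot inspect java.lang.Integer" followed by "by name: Congress Ave.", which mirrors the failing query and the expected behavior described in the report.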



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
