[ 
https://issues.apache.org/jira/browse/HIVE-3257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13414212#comment-13414212
 ] 

Zhenxiao Luo commented on HIVE-3257:
------------------------------------

The problem is in
ql/src/java/org/apache/hadoop/hive/ql/io/avro/AvroGenericRecordReader.java:

In getSchema(), the FileSplit path does not have the scheme part of the path 
URI, in this case "pfile:".

The matching function pathIsInPartition() checks whether the split path starts 
with partitionPath.

In hadoop0.23, partitionPath still holds the pfile: prefix, while the FileSplit 
does not. So pathIsInPartition() returns false.

In hadoop0.20, both partitionPath and the FileSplit hold the pfile: prefix. So 
pathIsInPartition() returns true.
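
To illustrate the mismatch, here is a minimal sketch using java.net.URI rather than the Hadoop Path class. The pathIsInPartition() below is a hypothetical stand-in for the prefix check described above, not the actual Hive code, and the paths are made up for the example.

```java
import java.net.URI;

class SchemePrefixMismatch {
    // Hypothetical stand-in for the prefix check: true when the split's
    // string form starts with the partition path string.
    static boolean pathIsInPartition(URI split, String partitionPath) {
        return split.toString().startsWith(partitionPath);
    }

    public static void main(String[] args) {
        // Assumed example paths; only the "pfile:" scheme comes from the report.
        String partitionPath = "pfile:/tmp/warehouse/episodes";

        // hadoop0.20-like case: the split keeps its scheme.
        URI splitWithScheme = URI.create("pfile:/tmp/warehouse/episodes/episodes.avro");
        // hadoop0.23-like case: the scheme has been dropped from the split.
        URI splitWithoutScheme = URI.create("/tmp/warehouse/episodes/episodes.avro");

        System.out.println(pathIsInPartition(splitWithScheme, partitionPath));
        System.out.println(pathIsInPartition(splitWithoutScheme, partitionPath));
    }
}
```

The first check matches and the second does not, even though both splits point at the same file.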

The root of the problem is in:
shims/src/common-secure/java/org/apache/hadoop/hive/shims/HadoopShimsSecure.java
ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java

In getSplits(), hadoop0.23 removes the scheme part of the path URI in 
CombineFileInputFormat, in this case "pfile:". This differs from the 
hadoop0.20 behavior.

The same problem happens in HIVE-2737, HIVE-2778, HIVE-2784.

We have already committed patches that work around it by checking whether 
the path is schemeless or not.

I will do the same thing for AvroGenericRecordReader.
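
A rough sketch of what a schemeless check could look like, again with java.net.URI instead of Hadoop's Path. This is only an assumption about the shape of the workaround, not the committed patch; the class name and the paths are hypothetical.

```java
import java.net.URI;

class SchemelessPathMatch {
    // If the split has no scheme, compare it against the scheme-less path
    // component of the partition; otherwise compare the full URI strings.
    static boolean pathIsInPartition(URI split, URI partition) {
        boolean schemeless = split.getScheme() == null;
        String partitionStr = schemeless ? partition.getPath() : partition.toString();
        return split.toString().startsWith(partitionStr);
    }

    public static void main(String[] args) {
        // Assumed example paths; only the "pfile:" scheme comes from the report.
        URI partition = URI.create("pfile:/tmp/warehouse/episodes");
        URI withScheme = URI.create("pfile:/tmp/warehouse/episodes/episodes.avro");
        URI withoutScheme = URI.create("/tmp/warehouse/episodes/episodes.avro");

        System.out.println(pathIsInPartition(withScheme, partition));
        System.out.println(pathIsInPartition(withoutScheme, partition));
    }
}
```

With this check, a split matches its partition whether or not the scheme survived getSplits().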
                
> Fix avro_joins.q testcase failure when building hive on hadoop0.23
> ------------------------------------------------------------------
>
>                 Key: HIVE-3257
>                 URL: https://issues.apache.org/jira/browse/HIVE-3257
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Zhenxiao Luo
>            Assignee: Zhenxiao Luo
>
> avro_joins.q fails when building hive on hadoop0.23 for both MR1 and 
> MR2, with an execution exception.
> This query fails during execution:
> SELECT e.title, e.air_date, d.first_name, d.last_name, d.extra_field, 
> e.air_date
> FROM doctors4 d JOIN episodes e ON (d.number=e.doctor)
> ORDER BY d.last_name, e.title
> Execution failed with exit status: 2
> Obtaining error information
> Task failed!
> Task ID:
> Stage-1
> Logs:
> /home/cloudera/Code/hive/build/ql/tmp//hive.log
> FAILED: Execution Error, return code 2 from 
> org.apache.hadoop.hive.ql.exec.MapRedTask


        
