[ https://issues.apache.org/jira/browse/HIVE-3257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13414212#comment-13414212 ]
Zhenxiao Luo commented on HIVE-3257:
------------------------------------

The problem is in
ql/src/java/org/apache/hadoop/hive/ql/io/avro/AvroGenericRecordReader.java:

In getSchema(), the FileSplit does not have the scheme part of the path URI, in this case "pfile:". The matching function pathIsInPartition() checks whether the split path starts with partitionPath.

In hadoop0.23, partitionPath still holds the pfile: prefix, while the FileSplit does not. So pathIsInPartition() returns false.

In hadoop0.20, both partitionPath and the FileSplit hold the pfile: prefix. So pathIsInPartition() returns true.

The root of the problem is in:
shims/src/common-secure/java/org/apache/hadoop/hive/shims/HadoopShimsSecure.java
ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java

In getSplits(), hadoop0.23 removes the scheme part of the path URI in the CombineFileInputFormat, in this case "pfile:". This differs from the hadoop0.20 behavior.

The same problem occurs in HIVE-2737, HIVE-2778, and HIVE-2784. We have already committed patches for those issues; the workaround includes checking whether the path is schemeless. I will do the same thing for AvroGenericRecordReader.

> Fix avro_joins.q testcase failure when building hive on hadoop0.23
> ------------------------------------------------------------------
>
> Key: HIVE-3257
> URL: https://issues.apache.org/jira/browse/HIVE-3257
> Project: Hive
> Issue Type: Bug
> Reporter: Zhenxiao Luo
> Assignee: Zhenxiao Luo
>
> avro_joins.q is failing when building hive on hadoop0.23 for both MR1 and
> MR2. It has an execution exception:
> This query fails during execution:
> SELECT e.title, e.air_date, d.first_name, d.last_name, d.extra_field,
> e.air_date
> FROM doctors4 d JOIN episodes e ON (d.number=e.doctor)
> ORDER BY d.last_name, e.title
> Execution failed with exit status: 2
> Obtaining error information
> Task failed!
> Task ID:
> Stage-1
> Logs:
> /home/cloudera/Code/hive/build/ql/tmp//hive.log
> FAILED: Execution Error, return code 2 from
> org.apache.hadoop.hive.ql.exec.MapRedTask
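
For illustration, the scheme mismatch described in the comment, and the kind of schemeless check used as a workaround, can be sketched roughly as follows. This is not the committed patch: the class name, the helper names, and the example paths are hypothetical, and the sketch uses java.net.URI in place of Hadoop's Path so that it runs without Hadoop on the classpath.

```java
import java.net.URI;

// Hypothetical sketch; not the actual AvroGenericRecordReader code.
final class SchemelessPathMatch {

    // Keep only the path component, dropping any scheme such as "pfile:".
    static String stripScheme(String location) {
        String path = URI.create(location).getPath();
        // Opaque URIs have no path component; fall back to the input unchanged.
        return path == null ? location : path;
    }

    // Plain prefix match, as the comment describes pathIsInPartition() doing.
    static boolean naivePathIsInPartition(String split, String partition) {
        return split.startsWith(partition);
    }

    // Workaround: if the split path carries no scheme, compare it against
    // the partition path with its scheme removed as well.
    static boolean pathIsInPartition(String split, String partition) {
        boolean schemeless = URI.create(split).getScheme() == null;
        String effectivePartition = schemeless ? stripScheme(partition) : partition;
        return split.startsWith(effectivePartition);
    }

    public static void main(String[] args) {
        // Assumed example paths: the split has lost its "pfile:" prefix,
        // as reported for hadoop0.23, while the partition path keeps it.
        String partition = "pfile:/tmp/warehouse/episodes";
        String split = "/tmp/warehouse/episodes/part-00000";

        System.out.println("naive match:        " + naivePathIsInPartition(split, partition));
        System.out.println("scheme-aware match: " + pathIsInPartition(split, partition));
    }
}
```

When both paths still carry the pfile: prefix, as reported for hadoop0.20, the scheme-aware check reduces to the same prefix comparison as before.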