[
https://issues.apache.org/jira/browse/HIVE-5891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13832744#comment-13832744
]
Ashutosh Chauhan commented on HIVE-5891:
----------------------------------------
[~sunrui] Can you add your test case as .q file in patch. You need to set
hive.auto.convert.join=true; to effect map-join in .q tests. Also, I think it
may result in lot of update to .q.out files since we may now use different
alias than $INTNAME which appears in .q.out files.
Also, [~yhuai] can you take a look to see if this has any affect on Demux
Operator. I don't think so, but you ll know better.
> Alias conflict when merging multiple mapjoin tasks into their common child
> mapred task
> --------------------------------------------------------------------------------------
>
> Key: HIVE-5891
> URL: https://issues.apache.org/jira/browse/HIVE-5891
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 0.12.0
> Reporter: Sun Rui
> Attachments: HIVE-5891.1.patch
>
>
> Use the following test case with HIVE 0.12:
> {quote}
> create table src(key int, value string);
> load data local inpath 'src/data/files/kv1.txt' overwrite into table src;
> select * from (
> select c.key from
> (select a.key from src a join src b on a.key=b.key group by a.key) tmp
> join src c on tmp.key=c.key
> union all
> select c.key from
> (select a.key from src a join src b on a.key=b.key group by a.key) tmp
> join src c on tmp.key=c.key
> ) x;
> {quote}
> We will get a NullPointerException from Union Operator:
> {quote}
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException:
> Hive Runtime Error while processing row {"_col0":0}
> at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:175)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
> at
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime
> Error while processing row {"_col0":0}
> at
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:544)
> at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:157)
> ... 4 more
> Caused by: java.lang.NullPointerException
> at
> org.apache.hadoop.hive.ql.exec.UnionOperator.processOp(UnionOperator.java:120)
> at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
> at
> org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:88)
> at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
> at
> org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:652)
> at
> org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:655)
> at
> org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:758)
> at
> org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:220)
> at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
> at
> org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:91)
> at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
> at
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:534)
> ... 5 more
> {quote}
>
> The root cause is in
> CommonJoinTaskDispatcher.mergeMapJoinTaskIntoItsChildMapRedTask().
> +--------------+ +--------------+
> | MapJoin task | | MapJoin task |
> +--------------+ +--------------+
> \ /
> \ /
> +--------------+
> | Union task |
> +--------------+
>
> CommonJoinTaskDispatcher merges the two MapJoin tasks into their common
> child: Union task. The two MapJoin tasks have the same alias name for their
> big tables: $INTNAME, which is the name of the temporary table of a join
> stream. The aliasToWork map uses alias as key, so eventually only the MapJoin
> operator tree of one MapJoin task is saved into the aliasToWork map of the
> Union task, while the MapJoin operator tree of another MapJoin task is lost.
> As a result, Union operator won't be initialized because not all of its
> parents gets intialized (The Union operator itself indicates it has two
> parents, but actually it has only 1 parent because another parent is lost).
> This issue does not exist in HIVE 0.11 and thus is a regression bug in HIVE
> 0.12.
> The propsed solution is to use the query ID as prefix for the join stream
> name to avoid conflict and add sanity check code in CommonJoinTaskDispatcher
> that merge of a MapJoin task into its child MapRed task is skipped if there
> is any alias conflict. Please review the patch. I am not sure if the patch
> properly handles the case of DemuxOperator.
> BTW, anyone knows the origin of "$INTNAME"? it is so confusing, maybe we can
> replace it with a meaningful name.
--
This message was sent by Atlassian JIRA
(v6.1#6144)