Zhang Xinyu created HIVE-3326:
---------------------------------

             Summary: plan for multiple mapjoin followed by a normal join is wrong
                 Key: HIVE-3326
                 URL: https://issues.apache.org/jira/browse/HIVE-3326
             Project: Hive
          Issue Type: Bug
          Components: SQL
    Affects Versions: 0.8.1
         Environment: OS X 10.8; java 1.6.0_33
            Reporter: Zhang Xinyu
Example queries:

    create table yudi(c1 int, c2 int, c3 int, c4 int);
    create table wangmu(c1 int, c2 int, c3 int, c4 int);
    select /*+mapjoin(b,c)*/ *
    from yudi a
      join yudi b on a.c1=b.c1
      join wangmu c on b.c2=c.c2
      join yudi d on a.c3=d.c3;

Running the query in explain mode gives this plan:

    hive> explain select /*+mapjoin(b,c)*/ * from yudi a join yudi b on a.c1=b.c1 join wangmu c on b.c2=c.c2 join yudi d on a.c3=d.c3;
    OK
    STAGE DEPENDENCIES:
      Stage-8 is a root stage
      Stage-2 depends on stages: Stage-8
      Stage-7 depends on stages: Stage-2
      Stage-3 depends on stages: Stage-7
      Stage-1 depends on stages: Stage-3

    STAGE PLANS:
      Stage: Stage-8
        Map Reduce Local Work
          Alias -> Map Local Tables:
            b
              <Not Important>

      Stage: Stage-2
        Map Reduce
          Alias -> Map Operator Tree:
            a
              <Not Important>
          Local Work:
            Map Reduce Local Work

      Stage: Stage-7
        Map Reduce Local Work
          Alias -> Map Local Tables:
            c
              <Not Important>

      Stage: Stage-3
        Map Reduce
          Alias -> Map Operator Tree:
            file:/var/folders/4w/3_nk1cwd4pd023mzx64p3r480000gn/T/dukezhang/hive_2012-08-01_14-01-37_152_5814747325029961632/-mr-10002
              <Not Important>
          Local Work:
            Map Reduce Local Work

      Stage: Stage-1
        Map Reduce
          Alias -> Map Operator Tree:
            d
              TableScan
            file:/var/folders/4w/3_nk1cwd4pd023mzx64p3r480000gn/T/dukezhang/hive_2012-08-01_14-01-37_152_5814747325029961632/-mr-10002
              Select Operator
          Reduce Operator Tree:
            <Not Important>

As the plan shows, the mapper of Stage-1 reads the result of Stage-2 ('.../-mr-10002'). It should instead read the result of Stage-3, presumably '.../-mr-10003'.
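To make the expected data flow explicit, here is a small, hypothetical Java model of the plan above; it is not Hive code, and PlanCheck and expectedInput are illustrative names. It records each stage's parent from STAGE DEPENDENCIES and the intermediate directory each map-reduce stage writes. The '-mr-10002' name for Stage-2 comes from the EXPLAIN output, while '-mr-10003' for Stage-3 is only the guess made above. Walking up from a stage to the nearest ancestor that writes output gives the directory its mapper should read.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the EXPLAIN plan above, not Hive's own classes.
public class PlanCheck {
    // Each stage mapped to the stage it depends on (null for the root stage).
    static final Map<String, String> PARENT = new LinkedHashMap<>();
    // Intermediate directory written by each map-reduce stage.
    // Stage-2's name is from the plan; Stage-3's name is an assumption.
    static final Map<String, String> OUTPUT = new LinkedHashMap<>();

    static {
        PARENT.put("Stage-8", null);      // local work: hash table for b
        PARENT.put("Stage-2", "Stage-8"); // map join a with b
        PARENT.put("Stage-7", "Stage-2"); // local work: hash table for c
        PARENT.put("Stage-3", "Stage-7"); // map join with c
        PARENT.put("Stage-1", "Stage-3"); // normal join with d
        OUTPUT.put("Stage-2", "-mr-10002");
        OUTPUT.put("Stage-3", "-mr-10003");
    }

    // Walk up the dependency chain, skipping local-work stages that write
    // no intermediate directory, and return the nearest ancestor's output.
    static String expectedInput(String stage) {
        String p = PARENT.get(stage);
        while (p != null && !OUTPUT.containsKey(p)) {
            p = PARENT.get(p);
        }
        return p == null ? null : OUTPUT.get(p);
    }

    public static void main(String[] args) {
        String planned = "-mr-10002"; // what the EXPLAIN output shows for Stage-1
        String expected = expectedInput("Stage-1");
        System.out.println("Stage-1 reads " + planned + ", expected " + expected);
    }
}
```

Run as is, it prints "Stage-1 reads -mr-10002, expected -mr-10003". The same walk gives '-mr-10002' for Stage-3, which matches what Stage-3 actually reads in the plan, so only Stage-1's input is out of line.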
While looking into this bug, I found the following code (GenMapRedUtils.java, around line 431):

    if (oldMapJoin == null) {
      if (opProcCtx.getParseCtx().getListMapJoinOpsNoReducer().contains(mjOp)
          || local || (oldTask != null) && (parTasks != null)) {
        taskTmpDir = mjCtx.getTaskTmpDir();
        tt_desc = mjCtx.getTTDesc();
        rootOp = mjCtx.getRootMapJoinOp();
      }
    } else {
      GenMRMapJoinCtx oldMjCtx = opProcCtx.getMapJoinCtx(oldMapJoin);
      assert oldMjCtx != null;
      taskTmpDir = oldMjCtx.getTaskTmpDir();
      tt_desc = oldMjCtx.getTTDesc();
      rootOp = oldMjCtx.getRootMapJoinOp();
    }

My query goes into the 'else' block and gets the wrong taskTmpDir. When I hack the code so that the query goes into the 'if' block instead, it works.
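The effect of the branch quoted above can be reduced to a toy Java sketch. MjCtx and chooseTmpDir are hypothetical names, not Hive APIs, and the three conditions of the inner 'if' are collapsed into one boolean. The sketch assumes, as the report suggests, that the current map join's context holds Stage-3's directory while the older map join's context holds Stage-2's, so whenever an older map join exists the 'else' branch hands back the stale directory.

```java
// Hypothetical, simplified model of the quoted branch; not Hive code.
public class TmpDirChoice {
    // Stand-in for GenMRMapJoinCtx, keeping only the temp directory.
    static final class MjCtx {
        final String taskTmpDir;
        MjCtx(String taskTmpDir) { this.taskTmpDir = taskTmpDir; }
    }

    // Same shape as the excerpt: with no older map join, the current context
    // is used if the (collapsed) 'if' condition holds; otherwise the older
    // map join's context wins.
    static String chooseTmpDir(MjCtx current, MjCtx old, boolean ifConditionHolds) {
        if (old == null) {
            // taskTmpDir is not assigned when the condition fails; null stands in for that.
            return ifConditionHolds ? current.taskTmpDir : null;
        }
        return old.taskTmpDir;
    }

    public static void main(String[] args) {
        MjCtx current = new MjCtx("-mr-10003"); // assumed output of Stage-3
        MjCtx old = new MjCtx("-mr-10002");     // output of Stage-2, per the plan
        System.out.println("else branch: " + chooseTmpDir(current, old, true));
        // Passing null for the older context mimics forcing the 'if' branch.
        System.out.println("if branch: " + chooseTmpDir(current, null, true));
    }
}
```

It prints "else branch: -mr-10002" followed by "if branch: -mr-10003", mirroring the wrong and the hacked behaviour described in the report.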