[ https://issues.apache.org/jira/browse/HIVE-8622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14203758#comment-14203758 ]
Xuefu Zhang commented on HIVE-8622:
-----------------------------------

Here is my pseudocode showing my attempt to solve this seemingly complex problem:

{code}
// Notation:
//   MJWork  - a work with a map join operator
//   HTSWork - a work with a HashTableSinkOperator

// Each MJWork will build a SparkWork for its small table works. This info is held in a
// map <MJWork, SparkWork>, originally empty and named childSparkWorkMap.
Map<MJWork, SparkWork> childSparkWorkMap = new HashMap<MJWork, SparkWork>();

// Each work, including a MJWork, also belongs to a parent SparkWork. Originally, all works
// belong to the original SparkWork. This info is held in another map <BaseWork, SparkWork>
// named parentSparkWorkMap.
Map<BaseWork, SparkWork> parentSparkWorkMap = new HashMap<BaseWork, SparkWork>();
List<BaseWork> works = sparkWork.getAllWorks(); // sparkWork is the original SparkWork to be split
for (BaseWork work : works) {
  parentSparkWorkMap.put(work, sparkWork);
}

// Dependency map among all SparkWorks. This is our final result.
Map<SparkWork, SparkWork> dependencyMap = new HashMap<SparkWork, SparkWork>();

// Process the original SparkWork from leaves backwards to roots.
List<BaseWork> leaves = sparkWork.getLeaves();
for (BaseWork leaf : leaves) {
  move(leaf, sparkWork);
}

/**
 * Move a work from the original SparkWork to the target SparkWork.
 */
void move(BaseWork work, SparkWork target) {
  List<BaseWork> parents = sparkWork.getParents(work);
  SparkWork currentParentSparkWork = parentSparkWorkMap.get(work);
  if (currentParentSparkWork != target) {
    // TODO: move the work from currentParentSparkWork to target.
    parentSparkWorkMap.put(work, target); // update new parent
  }

  if (!(work instanceof MJWork)) {
    for (BaseWork parent : parents) {
      // move each parent to the same parent SparkWork as work
      move(parent, target);
    }
  } else {
    // It's a MJWork: its small table works go into a new child SparkWork.
    SparkWork childSparkWork = new SparkWork();
    dependencyMap.put(target, childSparkWork);
    childSparkWorkMap.put((MJWork) work, childSparkWork);
    for (BaseWork parent : parents) {
      if (parent instanceof HTSWork) {
        move(parent, childSparkWork);
      } else {
        move(parent, target);
      }
    }
  }
}
{code}

> Split map-join plan into 2 SparkTasks in 3 stages [Spark Branch]
> ----------------------------------------------------------------
>
>                 Key: HIVE-8622
>                 URL: https://issues.apache.org/jira/browse/HIVE-8622
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Suhas Satish
>            Assignee: Chao
>         Attachments: HIVE-8622.2-spark.patch, HIVE-8622.3-spark.patch, HIVE-8622.patch
>
>
> This is a sub-task of map-join for spark
> https://issues.apache.org/jira/browse/HIVE-7613
> This can use the baseline patch for map-join
> https://issues.apache.org/jira/browse/HIVE-8616

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
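
For anyone who wants to try the traversal in the pseudocode above outside Hive, here is a minimal, self-contained Java sketch. It is an illustration under assumptions, not the patch itself: SplitSketch, Work and Kind are hypothetical stand-ins for the real SparkWork and BaseWork classes, plans are identified by plain strings such as "plan-0", and the dependency map holds a list per plan (a deviation from the Map<SparkWork, SparkWork> in the pseudocode) so that two map joins in one plan would not overwrite each other. The TODO step that physically moves a work between SparkWorks is reduced to updating an owner map.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the splitting logic; none of these types are Hive classes.
class SplitSketch {
    enum Kind { PLAIN, MAP_JOIN, HASH_TABLE_SINK }

    // A work node that knows its parents (the works feeding into it).
    static final class Work {
        final String name;
        final Kind kind;
        final List<Work> parents = new ArrayList<>();

        Work(String name, Kind kind, Work... parents) {
            this.name = name;
            this.kind = kind;
            this.parents.addAll(Arrays.asList(parents));
        }
    }

    // Work name -> name of the plan that currently owns it.
    final Map<String, String> owner = new LinkedHashMap<>();
    // Plan name -> small-table plans it depends on.
    final Map<String, List<String>> dependencies = new LinkedHashMap<>();
    private int nextPlanId = 1;

    // Assign a work to the target plan, then walk backwards through its parents.
    void move(Work work, String target) {
        owner.put(work.name, target);
        if (work.kind != Kind.MAP_JOIN) {
            for (Work parent : work.parents) {
                move(parent, target);
            }
            return;
        }
        // A map join gets a fresh child plan for its hash-table-sink parents.
        String child = "plan-" + nextPlanId++;
        dependencies.computeIfAbsent(target, k -> new ArrayList<>()).add(child);
        for (Work parent : work.parents) {
            move(parent, parent.kind == Kind.HASH_TABLE_SINK ? child : target);
        }
    }

    // Builds a tiny plan with one map join and splits it starting from the leaf.
    static SplitSketch demo() {
        Work bigScan = new Work("bigScan", Kind.PLAIN);
        Work smallScan = new Work("smallScan", Kind.PLAIN);
        Work hts = new Work("hts", Kind.HASH_TABLE_SINK, smallScan);
        Work mapJoin = new Work("mapJoin", Kind.MAP_JOIN, bigScan, hts);
        Work reduce = new Work("reduce", Kind.PLAIN, mapJoin);

        SplitSketch sketch = new SplitSketch();
        sketch.move(reduce, "plan-0");
        return sketch;
    }

    public static void main(String[] args) {
        SplitSketch sketch = demo();
        System.out.println(sketch.owner);
        System.out.println(sketch.dependencies);
    }
}
```

In this toy plan the big-table side (reduce, mapJoin, bigScan) stays in "plan-0", while hts and smallScan end up in "plan-1", and "plan-0" is recorded as depending on "plan-1". A work reachable from several children is simply visited more than once here; the real implementation would likely need to guard against that.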