[jira] [Comment Edited] (HIVE-8622) Split map-join plan into 2 SparkTasks in 3 stages [Spark Branch]

Xuefu Zhang (JIRA) Sun, 09 Nov 2014 06:52:36 -0800

    [ 
https://issues.apache.org/jira/browse/HIVE-8622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14203758#comment-14203758
 ]


Xuefu Zhang edited comment on HIVE-8622 at 11/9/14 2:51 PM:
------------------------------------------------------------

Here is my pseudocode showing my attempt to solve this seemingly complex problem:
{code}
// Notation:
// MJWork - a work with map join operator
// HTSWork - a work with HashTableSinkOperator

// Each MJWork will build a SparkWork for its small table works. This info is
// held in a map <MJWork, SparkWork>, originally empty and named childSparkWorkMap.
Map<MJWork, SparkWork> childSparkWorkMap = new HashMap<MJWork, SparkWork>();

// Dependency graph among all SparkWorks. This is our final result.
Map<SparkWork, List<SparkWork>> dependencyMap = new HashMap<SparkWork, List<SparkWork>>();

// Process the original SparkWork from leaves backwards to roots.
List<BaseWork> leaves = sparkWork.getLeaves();
for (BaseWork leaf : leaves) {
  move(leaf, sparkWork);
}

/**
 * Move a work from original SparkWork to the target SparkWork
 */
void move(BaseWork work, SparkWork target) {
  List<BaseWork> parents = sparkWork.getParents(work);
  if (sparkWork != target) {
    // TODO: move the work from the original SparkWork to target.
  }
 
  if (!(work instanceof MJWork)) {
    for(BaseWork parent : parents) {
      // move each parent to the same parent SparkWork of work
      move(parent, target);
    }
  } else {
    // it's a MJWork.
    SparkWork childSparkWork = new SparkWork();
    // TODO: update dependencyMap, target depends on childSparkWork
    childSparkWorkMap.put(work, childSparkWork);
    for(BaseWork parent : parents) {
      if (parent instanceof HTSWork) {
        move(parent, childSparkWork);
      } else {
        move(parent, target);
      }
    }
  }
}
{code}
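
The recursion above can be tried out in isolation with a minimal, self-contained Java sketch. It is only an illustration under stated assumptions: `Work`, `Kind`, and `Plan` are hypothetical stand-ins for Hive's BaseWork, the MJWork/HTSWork distinction, and SparkWork, and the two TODOs are filled in with the simplest behavior that fits the comments (remove the work from the original plan, and record that the target depends on the new child). It also assumes a tree-shaped plan in which each work is reached once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class MapJoinSplitSketch {
  // Hypothetical stand-in for "is this an MJWork / HTSWork / other work".
  enum Kind { PLAIN, MAP_JOIN, HASH_TABLE_SINK }

  // Hypothetical stand-in for BaseWork; parents are stored on the work itself.
  static final class Work {
    final String name;
    final Kind kind;
    final List<Work> parents;
    Work(String name, Kind kind, Work... parents) {
      this.name = name;
      this.kind = kind;
      this.parents = Arrays.asList(parents);
    }
  }

  // Hypothetical stand-in for SparkWork: a named set of works.
  static final class Plan {
    final String name;
    final Set<Work> works = new LinkedHashSet<>();
    Plan(String name) { this.name = name; }
  }

  final Plan original;
  final Map<Work, Plan> childSparkWorkMap = new LinkedHashMap<>();
  final Map<Plan, List<Plan>> dependencyMap = new LinkedHashMap<>();

  MapJoinSplitSketch(Plan original) { this.original = original; }

  // Process the original plan from the leaves backwards to the roots.
  void split(List<Work> leaves) {
    for (Work leaf : leaves) {
      move(leaf, original);
    }
  }

  void move(Work work, Plan target) {
    if (target != original) {
      // Assumed meaning of the first TODO: relocate the work.
      original.works.remove(work);
      target.works.add(work);
    }
    if (work.kind != Kind.MAP_JOIN) {
      // Parents of an ordinary work stay in the same plan as the work.
      for (Work parent : work.parents) {
        move(parent, target);
      }
      return;
    }
    // A map-join work gets its own child plan for the small-table side.
    Plan child = new Plan("child-of-" + work.name);
    childSparkWorkMap.put(work, child);
    // Assumed meaning of the second TODO: target depends on child.
    dependencyMap.computeIfAbsent(target, k -> new ArrayList<>()).add(child);
    for (Work parent : work.parents) {
      move(parent, parent.kind == Kind.HASH_TABLE_SINK ? child : target);
    }
  }

  public static void main(String[] args) {
    Work big = new Work("bigTableScan", Kind.PLAIN);
    Work small = new Work("smallTableScan", Kind.PLAIN);
    Work hts = new Work("hashTableSink", Kind.HASH_TABLE_SINK, small);
    Work mj = new Work("mapJoin", Kind.MAP_JOIN, big, hts);
    Plan original = new Plan("original");
    original.works.addAll(Arrays.asList(big, small, hts, mj));

    MapJoinSplitSketch sketch = new MapJoinSplitSketch(original);
    sketch.split(Arrays.asList(mj));

    for (Work w : original.works) {
      System.out.println("original: " + w.name);
    }
    for (Work w : sketch.childSparkWorkMap.get(mj).works) {
      System.out.println("child: " + w.name);
    }
  }
}
```

In this toy plan the hash table sink and the scan feeding it end up in the child plan, while the big-table scan and the map join stay in the original plan. If a work can be reached along more than one path, the sketch would create a second child plan for the same map join, so a real implementation would presumably need a visited check.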


was (Author: xuefuz):
Here is my pseudocode showing my attempt to solve this seemingly complex problem:
{code}
// Notation:
// MJWork - a work with map join operator
// HTSWork = a work with HashTableSinkOperator

// Each MJWork will build a SparkWork for its small table works. This info is
// held in a map <MJWork, SparkWork>, originally empty and named childSparkWorkMap.
Map<MJWork, SparkWork> childSparkWorkMap = new HashMap<MJWork, SparkWork>();

// Each work, including a MJWork, also belongs to a parent SparkWork.
// Originally, all works belong to the original SparkWork.
// The info is held in another map <BaseWork, SparkWork> named parentSparkWorkMap.
Map<BaseWork, SparkWork> parentSparkWorkMap = new HashMap<BaseWork, SparkWork>();
// sparkWork is the original SparkWork to be split
List<BaseWork> works = sparkWork.getAllWorks();
for (BaseWork work : works) {
  parentSparkWorkMap.put(work, sparkWork);
}

// Dependency map among all SparkWorks. This is our final result.
Map<SparkWork, SparkWork> dependencyMap = new HashMap<SparkWork, SparkWork>();

// Process the original SparkWork from leaves backwards to roots.
List<BaseWork> leaves = sparkWork.getLeaves();
for (BaseWork leaf : leaves) {
  move(leaf, sparkWork);
}

/**
 * Move a work from original SparkWork to the target SparkWork
 */
void move(BaseWork work, SparkWork target) {
  List<BaseWork> parents = sparkWork.getParents(work);
  SparkWork currentParentSparkWork = parentSparkWorkMap.get(work);
  if (currentParentSparkWork != target) {
    // TODO: move the work from currentParentSparkWork to target.
    parentSparkWorkMap.put(work, target); // update new parent
  }
 
  if (!(work instanceof MJWork)) {
    for(BaseWork parent : parents) {
      // move each parent to the same parent SparkWork of work
      move(parent, target);
    }
  } else {
    // it's a MJWork.
    SparkWork childSparkWork = new SparkWork();
    dependencyMap.put(target, childSparkWork);
    childSparkWorkMap.put(work, childSparkWork);
    for(BaseWork parent : parents) {
      if (parent instanceof HTSWork) {
        move(parent, childSparkWork);
      } else {
        move(parent, target);
      }
    }
  }
}

{code}

> Split map-join plan into 2 SparkTasks in 3 stages [Spark Branch]
> ----------------------------------------------------------------
>
>                 Key: HIVE-8622
>                 URL: https://issues.apache.org/jira/browse/HIVE-8622
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Suhas Satish
>            Assignee: Chao
>         Attachments: HIVE-8622.2-spark.patch, HIVE-8622.3-spark.patch, 
> HIVE-8622.patch
>
>
> This is a sub-task of map-join for spark 
> https://issues.apache.org/jira/browse/HIVE-7613
> This can use the baseline patch for map-join
> https://issues.apache.org/jira/browse/HIVE-8616



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
