[jira] Commented: (HIVE-1695) MapJoin followed by ReduceSink should be done as single MapReduce Job

Sreekanth Ramakrishnan (JIRA) Wed, 01 Dec 2010 02:33:36 -0800

    [ 
https://issues.apache.org/jira/browse/HIVE-1695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12965652#action_12965652
 ]


Sreekanth Ramakrishnan commented on HIVE-1695:
----------------------------------------------

Attaching a sample plan based on the above approach

{noformat}
ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_TABREF test a) (TOK_TABREF test1 b) (= (. 
(TOK_TABLE_OR_COL a) key) (. (TOK_TABLE_OR_COL b) key)))) (TOK_INSERT 
(TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_HINTLIST (TOK_HINT 
TOK_MAPJOIN (TOK_HINTARGLIST b))) (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) key))) 
(TOK_ORDERBY (TOK_TABSORTCOLNAMEASC (. (TOK_TABLE_OR_COL a) key)))))

STAGE DEPENDENCIES:
  Stage-4 is a root stage
  Stage-1 depends on stages: Stage-4
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-4
    Map Reduce Local Work
      Alias -> Map Local Tables:
        b 
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
        b 
          TableScan
            alias: b
            HashTable Sink Operator
              condition expressions:
                0 {key}
                1 
              handleSkewJoin: false
              keys:
                0 [Column[key]]
                1 [Column[key]]
              Position of Big Table: 0

  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        a 
          TableScan
            alias: a
            Map Join Operator
              condition map:
                   Inner Join 0 to 1
              condition expressions:
                0 {key}
                1 
              handleSkewJoin: false
              keys:
                0 [Column[key]]
                1 [Column[key]]
              outputColumnNames: _col0
              Position of Big Table: 0
              Reduce Output Operator
                key expressions:
                      expr: _col0
                      type: int
                sort order: +
                tag: -1
                value expressions:
                      expr: _col0
                      type: int
      Local Work:
        Map Reduce Local Work
      Reduce Operator Tree:
        Extract
          File Output Operator
            compressed: false
            GlobalTableId: 0
            table:
                input format: org.apache.hadoop.mapred.TextInputFormat
                output format: 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat

  Stage: Stage-0
    Fetch Operator
      limit: -1
{noformat}

Currently, trying to fix the issue in the map failure due to the above plan. 
Stil figuring out how to add a Select Operator at the end of the map join 
operator.

> MapJoin followed by ReduceSink should be done as single MapReduce Job
> ---------------------------------------------------------------------
>
>                 Key: HIVE-1695
>                 URL: https://issues.apache.org/jira/browse/HIVE-1695
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Amareshwari Sriramadasu
>
> Currently MapJoin followed by ReduceSink runs as two MapReduce jobs : One map 
> only job followed by a Map-Reduce job. It can be combined into single 
> MapReduce Job.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HIVE-1695) MapJoin followed by ReduceSink should be done as single MapReduce Job

Reply via email to