[jira] [Commented] (HIVE-8701) Combine nested map joins into the parent map join if possible [Spark Branch]

Szehon Ho (JIRA) Wed, 19 Nov 2014 17:39:07 -0800

    [ 
https://issues.apache.org/jira/browse/HIVE-8701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218861#comment-14218861
 ]


Szehon Ho commented on HIVE-8701:
---------------------------------

Note auto_join2 query plan now looks different than all these after some bug 
fixes.

After enabling mapjoin and testing various scenarios, we actually have the 
opposite issue, that SparkMapJoinOptimizer already combines mapjoin into a 
single BaseWork by default, via removal of RS for big table parent.  It might 
run out of memory as calculation only factors in the small-tables for that 
operator, and not all small-table that will run in that work.

So might actually need to enhance calculation to take into account linked 
parent mapjoin that will share the same executor, and not convert to mapjoin in 
that case.


> Combine nested map joins into the parent map join if possible [Spark Branch]
> ----------------------------------------------------------------------------
>
>                 Key: HIVE-8701
>                 URL: https://issues.apache.org/jira/browse/HIVE-8701
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Xuefu Zhang
>            Assignee: Szehon Ho
>
> With the work in HIVE-8616 enabled, the generated plan shows that the nested 
> map join operator isn't merged to its parent when possible. This is 
> demonstrated in auto_join2.q. The MR plan shown that this optimization is in 
> place. We should do the same for Spark.
> {code}
> STAGE PLANS:
>   Stage: Stage-1
>     Spark
>       Edges:
>         Map 2 <- Map 3 (NONE, 0)
>         Map 3 <- Map 1 (NONE, 0)
>       DagName: xzhang_20141102074141_ac089634-bf01-4386-b1cf-3e7f2e99f6eb:3
>       Vertices:
>         Map 1 
>             Map Operator Tree:
>                 TableScan
>                   alias: src2
>                   Statistics: Num rows: 58 Data size: 5812 Basic stats: 
> COMPLETE Column stats: NONE
>                   Filter Operator
>                     predicate: key is not null (type: boolean)
>                     Statistics: Num rows: 29 Data size: 2906 Basic stats: 
> COMPLETE Column stats: NONE
>                     Reduce Output Operator
>                       key expressions: key (type: string)
>                       sort order: +
>                       Map-reduce partition columns: key (type: string)
>                       Statistics: Num rows: 29 Data size: 2906 Basic stats: 
> COMPLETE Column stats: NONE
>         Map 2 
>             Map Operator Tree:
>                 TableScan
>                   alias: src3
>                   Statistics: Num rows: 29 Data size: 5812 Basic stats: 
> COMPLETE Column stats: NONE
>                   Filter Operator
>                     predicate: UDFToDouble(key) is not null (type: boolean)
>                     Statistics: Num rows: 15 Data size: 3006 Basic stats: 
> COMPLETE Column stats: NONE
>                     Map Join Operator
>                       condition map:
>                            Inner Join 0 to 1
>                       condition expressions:
>                         0 {_col0}
>                         1 {value}
>                       keys:
>                         0 (_col0 + _col5) (type: double)
>                         1 UDFToDouble(key) (type: double)
>                       outputColumnNames: _col0, _col11
>                       input vertices:
>                         0 Map 3
>                       Statistics: Num rows: 17 Data size: 1813 Basic stats: 
> COMPLETE Column stats: NONE
>                       Select Operator
>                         expressions: _col0 (type: string), _col11 (type: 
> string)
>                         outputColumnNames: _col0, _col1
>                         Statistics: Num rows: 17 Data size: 1813 Basic stats: 
> COMPLETE Column stats: NONE
>                         File Output Operator
>                           compressed: false
>                           Statistics: Num rows: 17 Data size: 1813 Basic 
> stats: COMPLETE Column stats: NONE
>                           table:
>                               input format: 
> org.apache.hadoop.mapred.TextInputFormat
>                               output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                               serde: 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>         Map 3 
>             Map Operator Tree:
>                 TableScan
>                   alias: src1
>                   Statistics: Num rows: 58 Data size: 5812 Basic stats: 
> COMPLETE Column stats: NONE
>                   Filter Operator
>                     predicate: key is not null (type: boolean)
>                     Statistics: Num rows: 29 Data size: 2906 Basic stats: 
> COMPLETE Column stats: NONE
>                     Map Join Operator
>                       condition map:
>                            Inner Join 0 to 1
>                       condition expressions:
>                         0 {key}
>                         1 {key}
>                       keys:
>                         0 key (type: string)
>                         1 key (type: string)
>                       outputColumnNames: _col0, _col5
>                       input vertices:
>                         1 Map 1
>                       Statistics: Num rows: 31 Data size: 3196 Basic stats: 
> COMPLETE Column stats: NONE
>                       Filter Operator
>                         predicate: (_col0 + _col5) is not null (type: boolean)
>                         Statistics: Num rows: 16 Data size: 1649 Basic stats: 
> COMPLETE Column stats: NONE
>                         Reduce Output Operator
>                           key expressions: (_col0 + _col5) (type: double)
>                           sort order: +
>                           Map-reduce partition columns: (_col0 + _col5) 
> (type: double)
>                           Statistics: Num rows: 16 Data size: 1649 Basic 
> stats: COMPLETE Column stats: NONE
>                           value expressions: _col0 (type: string)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-8701) Combine nested map joins into the parent map join if possible [Spark Branch]

Reply via email to