[ https://issues.apache.org/jira/browse/HIVE-8701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218875#comment-14218875 ]
Suhas Satish commented on HIVE-8701: ------------------------------------ [~szehon] - Can you illustrate how you plan to optimize for lower memory utilization in the case of nested map-joins, with an example ? > Combine nested map joins into the parent map join if possible [Spark Branch] > ---------------------------------------------------------------------------- > > Key: HIVE-8701 > URL: https://issues.apache.org/jira/browse/HIVE-8701 > Project: Hive > Issue Type: Sub-task > Components: Spark > Reporter: Xuefu Zhang > Assignee: Szehon Ho > > With the work in HIVE-8616 enabled, the generated plan shows that the nested > map join operator isn't merged to its parent when possible. This is > demonstrated in auto_join2.q. The MR plan shown that this optimization is in > place. We should do the same for Spark. > {code} > STAGE PLANS: > Stage: Stage-1 > Spark > Edges: > Map 2 <- Map 3 (NONE, 0) > Map 3 <- Map 1 (NONE, 0) > DagName: xzhang_20141102074141_ac089634-bf01-4386-b1cf-3e7f2e99f6eb:3 > Vertices: > Map 1 > Map Operator Tree: > TableScan > alias: src2 > Statistics: Num rows: 58 Data size: 5812 Basic stats: > COMPLETE Column stats: NONE > Filter Operator > predicate: key is not null (type: boolean) > Statistics: Num rows: 29 Data size: 2906 Basic stats: > COMPLETE Column stats: NONE > Reduce Output Operator > key expressions: key (type: string) > sort order: + > Map-reduce partition columns: key (type: string) > Statistics: Num rows: 29 Data size: 2906 Basic stats: > COMPLETE Column stats: NONE > Map 2 > Map Operator Tree: > TableScan > alias: src3 > Statistics: Num rows: 29 Data size: 5812 Basic stats: > COMPLETE Column stats: NONE > Filter Operator > predicate: UDFToDouble(key) is not null (type: boolean) > Statistics: Num rows: 15 Data size: 3006 Basic stats: > COMPLETE Column stats: NONE > Map Join Operator > condition map: > Inner Join 0 to 1 > condition expressions: > 0 {_col0} > 1 {value} > keys: > 0 (_col0 + _col5) (type: double) > 1 UDFToDouble(key) (type: double) > outputColumnNames: _col0, _col11 > input vertices: > 0 Map 3 > Statistics: Num rows: 17 Data size: 1813 Basic stats: > COMPLETE Column stats: NONE > Select Operator > expressions: _col0 (type: string), _col11 (type: > string) > outputColumnNames: _col0, _col1 > Statistics: Num rows: 17 Data size: 1813 Basic stats: > COMPLETE Column stats: NONE > File Output Operator > compressed: false > Statistics: Num rows: 17 Data size: 1813 Basic > stats: COMPLETE Column stats: NONE > table: > input format: > org.apache.hadoop.mapred.TextInputFormat > output format: > org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat > serde: > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe > Map 3 > Map Operator Tree: > TableScan > alias: src1 > Statistics: Num rows: 58 Data size: 5812 Basic stats: > COMPLETE Column stats: NONE > Filter Operator > predicate: key is not null (type: boolean) > Statistics: Num rows: 29 Data size: 2906 Basic stats: > COMPLETE Column stats: NONE > Map Join Operator > condition map: > Inner Join 0 to 1 > condition expressions: > 0 {key} > 1 {key} > keys: > 0 key (type: string) > 1 key (type: string) > outputColumnNames: _col0, _col5 > input vertices: > 1 Map 1 > Statistics: Num rows: 31 Data size: 3196 Basic stats: > COMPLETE Column stats: NONE > Filter Operator > predicate: (_col0 + _col5) is not null (type: boolean) > Statistics: Num rows: 16 Data size: 1649 Basic stats: > COMPLETE Column stats: NONE > Reduce Output Operator > key expressions: (_col0 + _col5) (type: double) > sort order: + > Map-reduce partition columns: (_col0 + _col5) > (type: double) > Statistics: Num rows: 16 Data size: 1649 Basic > stats: COMPLETE Column stats: NONE > value expressions: _col0 (type: string) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)