Xuefu Zhang created HIVE-8701: --------------------------------- Summary: Combine nested map joins into the parent map join if possible [Spark Branch] Key: HIVE-8701 URL: https://issues.apache.org/jira/browse/HIVE-8701 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang
With the work in HIVE-8616 enabled, the generated plan shows that the nested map join operator isn't merged to its parent when possible. This is demonstrated in auto_join2.q. The MR plan shown that this optimization is in place. We should do the same for Spark. {code} STAGE PLANS: Stage: Stage-1 Spark Edges: Map 2 <- Map 3 (NONE, 0) Map 3 <- Map 1 (NONE, 0) DagName: xzhang_20141102074141_ac089634-bf01-4386-b1cf-3e7f2e99f6eb:3 Vertices: Map 1 Map Operator Tree: TableScan alias: src2 Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: key is not null (type: boolean) Statistics: Num rows: 29 Data size: 2906 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator key expressions: key (type: string) sort order: + Map-reduce partition columns: key (type: string) Statistics: Num rows: 29 Data size: 2906 Basic stats: COMPLETE Column stats: NONE Map 2 Map Operator Tree: TableScan alias: src3 Statistics: Num rows: 29 Data size: 5812 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: UDFToDouble(key) is not null (type: boolean) Statistics: Num rows: 15 Data size: 3006 Basic stats: COMPLETE Column stats: NONE Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {_col0} 1 {value} keys: 0 (_col0 + _col5) (type: double) 1 UDFToDouble(key) (type: double) outputColumnNames: _col0, _col11 input vertices: 0 Map 3 Statistics: Num rows: 17 Data size: 1813 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: _col0 (type: string), _col11 (type: string) outputColumnNames: _col0, _col1 Statistics: Num rows: 17 Data size: 1813 Basic stats: COMPLETE Column stats: NONE File Output Operator compressed: false Statistics: Num rows: 17 Data size: 1813 Basic stats: COMPLETE Column stats: NONE table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe Map 3 Map Operator Tree: TableScan alias: src1 Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: key is not null (type: boolean) Statistics: Num rows: 29 Data size: 2906 Basic stats: COMPLETE Column stats: NONE Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {key} 1 {key} keys: 0 key (type: string) 1 key (type: string) outputColumnNames: _col0, _col5 input vertices: 1 Map 1 Statistics: Num rows: 31 Data size: 3196 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (_col0 + _col5) is not null (type: boolean) Statistics: Num rows: 16 Data size: 1649 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator key expressions: (_col0 + _col5) (type: double) sort order: + Map-reduce partition columns: (_col0 + _col5) (type: double) Statistics: Num rows: 16 Data size: 1649 Basic stats: COMPLETE Column stats: NONE value expressions: _col0 (type: string) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)