[ https://issues.apache.org/jira/browse/HIVE-7985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137768#comment-14137768 ]
Mostafa Mokhtar commented on HIVE-7985: --------------------------------------- Q56 also has the same issues Query {code} with ss as ( select i_item_id,sum(ss_ext_sales_price) total_sales from store_sales, date_dim, customer_address, item where item.i_item_id in (select i.i_item_id from item i where i_color in ('purple','burlywood','indian')) and ss_item_sk = i_item_sk and ss_sold_date_sk = d_date_sk and d_year = 2001 and d_moy = 1 and ss_addr_sk = ca_address_sk and ca_gmt_offset = -6 group by i_item_id), cs as ( select i_item_id,sum(cs_ext_sales_price) total_sales from catalog_sales, date_dim, customer_address, item where item.i_item_id in (select i.i_item_id from item i where i_color in ('purple','burlywood','indian')) and cs_item_sk = i_item_sk and cs_sold_date_sk = d_date_sk and d_year = 2001 and d_moy = 1 and cs_bill_addr_sk = ca_address_sk and ca_gmt_offset = -6 group by i_item_id), ws as ( select i_item_id,sum(ws_ext_sales_price) total_sales from web_sales, date_dim, customer_address, item where item.i_item_id in (select i.i_item_id from item i where i_color in ('purple','burlywood','indian')) and ws_item_sk = i_item_sk and ws_sold_date_sk = d_date_sk and d_year = 2001 and d_moy = 1 and ws_bill_addr_sk = ca_address_sk and ca_gmt_offset = -6 group by i_item_id) select i_item_id ,sum(total_sales) total_sales from (select * from ss union all select * from cs union all select * from ws) tmp1 group by i_item_id order by total_sales limit 100 {code} Plan {code} Warning: Map Join MAPJOIN[177][bigTable=?] in task 'Map 8' is a cross product Warning: Map Join MAPJOIN[178][bigTable=?] in task 'Map 21' is a cross product Warning: Map Join MAPJOIN[179][bigTable=web_sales] in task 'Map 14' is a cross product Warning: Map Join MAPJOIN[180][bigTable=?] in task 'Map 20' is a cross product Warning: Map Join MAPJOIN[181][bigTable=?] in task 'Map 22' is a cross product Warning: Map Join MAPJOIN[182][bigTable=store_sales] in task 'Map 17' is a cross product Warning: Map Join MAPJOIN[174][bigTable=?] in task 'Map 13' is a cross product Warning: Map Join MAPJOIN[175][bigTable=?] in task 'Map 10' is a cross product Warning: Map Join MAPJOIN[176][bigTable=catalog_sales] in task 'Map 2' is a cross product OK STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-1 Tez Edges: Map 10 <- Map 13 (BROADCAST_EDGE) Map 13 <- Map 16 (BROADCAST_EDGE) Map 14 <- Map 11 (BROADCAST_EDGE), Map 21 (BROADCAST_EDGE) Map 17 <- Map 12 (BROADCAST_EDGE), Map 22 (BROADCAST_EDGE) Map 2 <- Map 1 (BROADCAST_EDGE), Map 10 (BROADCAST_EDGE) Map 20 <- Map 9 (BROADCAST_EDGE) Map 21 <- Map 8 (BROADCAST_EDGE) Map 22 <- Map 20 (BROADCAST_EDGE) Map 8 <- Map 19 (BROADCAST_EDGE) Reducer 15 <- Map 14 (SIMPLE_EDGE), Union 4 (CONTAINS) Reducer 18 <- Map 17 (SIMPLE_EDGE), Union 4 (CONTAINS) Reducer 3 <- Map 2 (SIMPLE_EDGE), Union 4 (CONTAINS) Reducer 5 <- Union 4 (SIMPLE_EDGE) Reducer 6 <- Reducer 5 (SIMPLE_EDGE) Reducer 7 <- Reducer 6 (SIMPLE_EDGE) DagName: mmokhtar_20140916234444_107f445a-d48f-4a42-89a1-8c53eaa8dec0:1 Vertices: Map 1 Map Operator Tree: TableScan alias: i filterExpr: ((i_color) IN ('purple', 'burlywood', 'indian') and i_item_id is not null) (type: boolean) Statistics: Num rows: 48000 Data size: 68732712 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: ((i_color) IN ('purple', 'burlywood', 'indian') and i_item_id is not null) (type: boolean) Statistics: Num rows: 12000 Data size: 17183178 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: i_item_id (type: string) outputColumnNames: _col0 Statistics: Num rows: 12000 Data size: 17183178 Basic stats: COMPLETE Column stats: NONE Group By Operator keys: _col0 (type: string) mode: hash outputColumnNames: _col0 Statistics: Num rows: 12000 Data size: 17183178 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator key expressions: _col0 (type: string) sort order: + Map-reduce partition columns: _col0 (type: string) Statistics: Num rows: 12000 Data size: 17183178 Basic stats: COMPLETE Column stats: NONE Execution mode: vectorized Map 10 Map Operator Tree: TableScan alias: customer_address filterExpr: (ca_gmt_offset = UDFToFloat((- 6))) (type: boolean) Statistics: Num rows: 800000 Data size: 811903688 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (ca_gmt_offset = UDFToFloat((- 6))) (type: boolean) Statistics: Num rows: 400000 Data size: 405951844 Basic stats: COMPLETE Column stats: NONE Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {_col0} {_col31} {_col32} 1 {ca_address_sk} keys: 0 1 outputColumnNames: _col0, _col31, _col32, _col56 input vertices: 0 Map 13 Statistics: Num rows: 440000 Data size: 446547040 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator sort order: Statistics: Num rows: 440000 Data size: 446547040 Basic stats: COMPLETE Column stats: NONE value expressions: _col0 (type: int), _col31 (type: int), _col32 (type: string), _col56 (type: int) Execution mode: vectorized Map 11 Map Operator Tree: TableScan alias: i filterExpr: ((i_color) IN ('purple', 'burlywood', 'indian') and i_item_id is not null) (type: boolean) Statistics: Num rows: 48000 Data size: 68732712 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: ((i_color) IN ('purple', 'burlywood', 'indian') and i_item_id is not null) (type: boolean) Statistics: Num rows: 12000 Data size: 17183178 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: i_item_id (type: string) outputColumnNames: _col0 Statistics: Num rows: 12000 Data size: 17183178 Basic stats: COMPLETE Column stats: NONE Group By Operator keys: _col0 (type: string) mode: hash outputColumnNames: _col0 Statistics: Num rows: 12000 Data size: 17183178 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator key expressions: _col0 (type: string) sort order: + Map-reduce partition columns: _col0 (type: string) Statistics: Num rows: 12000 Data size: 17183178 Basic stats: COMPLETE Column stats: NONE Execution mode: vectorized Map 12 Map Operator Tree: TableScan alias: i filterExpr: ((i_color) IN ('purple', 'burlywood', 'indian') and i_item_id is not null) (type: boolean) Statistics: Num rows: 48000 Data size: 68732712 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: ((i_color) IN ('purple', 'burlywood', 'indian') and i_item_id is not null) (type: boolean) Statistics: Num rows: 12000 Data size: 17183178 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: i_item_id (type: string) outputColumnNames: _col0 Statistics: Num rows: 12000 Data size: 17183178 Basic stats: COMPLETE Column stats: NONE Group By Operator keys: _col0 (type: string) mode: hash outputColumnNames: _col0 Statistics: Num rows: 12000 Data size: 17183178 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator key expressions: _col0 (type: string) sort order: + Map-reduce partition columns: _col0 (type: string) Statistics: Num rows: 12000 Data size: 17183178 Basic stats: COMPLETE Column stats: NONE Execution mode: vectorized Map 13 Map Operator Tree: TableScan alias: item filterExpr: i_item_id is not null (type: boolean) Statistics: Num rows: 48000 Data size: 68732712 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: i_item_id is not null (type: boolean) Statistics: Num rows: 24000 Data size: 34366356 Basic stats: COMPLETE Column stats: NONE Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {d_date_sk} 1 {i_item_sk} {i_item_id} keys: 0 1 outputColumnNames: _col0, _col31, _col32 input vertices: 0 Map 16 Statistics: Num rows: 26400 Data size: 37802992 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator sort order: Statistics: Num rows: 26400 Data size: 37802992 Basic stats: COMPLETE Column stats: NONE value expressions: _col0 (type: int), _col31 (type: int), _col32 (type: string) Execution mode: vectorized Map 14 Map Operator Tree: TableScan alias: web_sales Statistics: Num rows: 143966864 Data size: 19001610332 Basic stats: COMPLETE Column stats: NONE Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {_col0} {_col31} {_col32} {_col56} 1 {ws_item_sk} {ws_bill_addr_sk} {ws_ext_sales_price} {ws_sold_date_sk} keys: 0 1 outputColumnNames: _col0, _col31, _col32, _col56, _col74, _col78, _col94, _col105 input vertices: 0 Map 21 Statistics: Num rows: 158363552 Data size: 20901771264 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (((_col74 = _col31) and (_col105 = _col0)) and (_col78 = _col56)) (type: boolean) Statistics: Num rows: 19795444 Data size: 2612721408 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: _col94 (type: float), _col32 (type: string) outputColumnNames: _col22, _col85 Statistics: Num rows: 19795444 Data size: 2612721408 Basic stats: COMPLETE Column stats: NONE Map Join Operator condition map: Left Semi Join 0 to 1 condition expressions: 0 {_col22} {_col85} 1 keys: 0 _col85 (type: string) 1 _col0 (type: string) outputColumnNames: _col22, _col85 input vertices: 1 Map 11 Statistics: Num rows: 21774988 Data size: 2873993728 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: _col85 (type: string), _col22 (type: float) outputColumnNames: _col0, _col1 Statistics: Num rows: 21774988 Data size: 2873993728 Basic stats: COMPLETE Column stats: NONE Group By Operator aggregations: sum(_col1) keys: _col0 (type: string) mode: hash outputColumnNames: _col0, _col1 Statistics: Num rows: 21774988 Data size: 2873993728 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator key expressions: _col0 (type: string) sort order: + Map-reduce partition columns: _col0 (type: string) Statistics: Num rows: 21774988 Data size: 2873993728 Basic stats: COMPLETE Column stats: NONE value expressions: _col1 (type: double) Execution mode: vectorized Map 16 Map Operator Tree: TableScan alias: date_dim filterExpr: ((d_year = 2001) and (d_moy = 1)) (type: boolean) Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: ((d_year = 2001) and (d_moy = 1)) (type: boolean) Statistics: Num rows: 18262 Data size: 20435178 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator sort order: Statistics: Num rows: 18262 Data size: 20435178 Basic stats: COMPLETE Column stats: NONE value expressions: d_date_sk (type: int) Execution mode: vectorized Map 17 Map Operator Tree: TableScan alias: store_sales Statistics: Num rows: 550076554 Data size: 47370018896 Basic stats: COMPLETE Column stats: NONE Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {_col0} {_col31} {_col32} {_col56} 1 {ss_item_sk} {ss_addr_sk} {ss_ext_sales_price} {ss_sold_date_sk} keys: 0 1 outputColumnNames: _col0, _col31, _col32, _col56, _col73, _col77, _col86, _col94 input vertices: 0 Map 22 Statistics: Num rows: 605084224 Data size: 52107022336 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (((_col73 = _col31) and (_col94 = _col0)) and (_col77 = _col56)) (type: boolean) Statistics: Num rows: 75635528 Data size: 6513377792 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: _col86 (type: float), _col32 (type: string) outputColumnNames: _col14, _col74 Statistics: Num rows: 75635528 Data size: 6513377792 Basic stats: COMPLETE Column stats: NONE Map Join Operator condition map: Left Semi Join 0 to 1 condition expressions: 0 {_col14} {_col74} 1 keys: 0 _col74 (type: string) 1 _col0 (type: string) outputColumnNames: _col14, _col74 input vertices: 1 Map 12 Statistics: Num rows: 83199080 Data size: 7164715520 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: _col74 (type: string), _col14 (type: float) outputColumnNames: _col0, _col1 Statistics: Num rows: 83199080 Data size: 7164715520 Basic stats: COMPLETE Column stats: NONE Group By Operator aggregations: sum(_col1) keys: _col0 (type: string) mode: hash outputColumnNames: _col0, _col1 Statistics: Num rows: 83199080 Data size: 7164715520 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator key expressions: _col0 (type: string) sort order: + Map-reduce partition columns: _col0 (type: string) Statistics: Num rows: 83199080 Data size: 7164715520 Basic stats: COMPLETE Column stats: NONE value expressions: _col1 (type: double) Execution mode: vectorized Map 19 Map Operator Tree: TableScan alias: date_dim filterExpr: ((d_year = 2001) and (d_moy = 1)) (type: boolean) Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: ((d_year = 2001) and (d_moy = 1)) (type: boolean) Statistics: Num rows: 18262 Data size: 20435178 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator sort order: Statistics: Num rows: 18262 Data size: 20435178 Basic stats: COMPLETE Column stats: NONE value expressions: d_date_sk (type: int) Execution mode: vectorized Map 2 Map Operator Tree: TableScan alias: catalog_sales Statistics: Num rows: 286549727 Data size: 37743959324 Basic stats: COMPLETE Column stats: NONE Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {_col0} {_col31} {_col32} {_col56} 1 {cs_bill_addr_sk} {cs_item_sk} {cs_ext_sales_price} {cs_sold_date_sk} keys: 0 1 outputColumnNames: _col0, _col31, _col32, _col56, _col77, _col86, _col94, _col105 input vertices: 0 Map 10 Statistics: Num rows: 315204704 Data size: 41518358528 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (((_col86 = _col31) and (_col105 = _col0)) and (_col77 = _col56)) (type: boolean) Statistics: Num rows: 39400588 Data size: 5189794816 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: _col94 (type: float), _col32 (type: string) outputColumnNames: _col22, _col85 Statistics: Num rows: 39400588 Data size: 5189794816 Basic stats: COMPLETE Column stats: NONE Map Join Operator condition map: Left Semi Join 0 to 1 condition expressions: 0 {_col22} {_col85} 1 keys: 0 _col85 (type: string) 1 _col0 (type: string) outputColumnNames: _col22, _col85 input vertices: 1 Map 1 Statistics: Num rows: 43340648 Data size: 5708774400 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: _col85 (type: string), _col22 (type: float) outputColumnNames: _col0, _col1 Statistics: Num rows: 43340648 Data size: 5708774400 Basic stats: COMPLETE Column stats: NONE Group By Operator aggregations: sum(_col1) keys: _col0 (type: string) mode: hash outputColumnNames: _col0, _col1 Statistics: Num rows: 43340648 Data size: 5708774400 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator key expressions: _col0 (type: string) sort order: + Map-reduce partition columns: _col0 (type: string) Statistics: Num rows: 43340648 Data size: 5708774400 Basic stats: COMPLETE Column stats: NONE value expressions: _col1 (type: double) Execution mode: vectorized Map 20 Map Operator Tree: TableScan alias: item filterExpr: i_item_id is not null (type: boolean) Statistics: Num rows: 48000 Data size: 68732712 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: i_item_id is not null (type: boolean) Statistics: Num rows: 24000 Data size: 34366356 Basic stats: COMPLETE Column stats: NONE Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {d_date_sk} 1 {i_item_sk} {i_item_id} keys: 0 1 outputColumnNames: _col0, _col31, _col32 input vertices: 0 Map 9 Statistics: Num rows: 26400 Data size: 37802992 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator sort order: Statistics: Num rows: 26400 Data size: 37802992 Basic stats: COMPLETE Column stats: NONE value expressions: _col0 (type: int), _col31 (type: int), _col32 (type: string) Execution mode: vectorized Map 21 Map Operator Tree: TableScan alias: customer_address filterExpr: (ca_gmt_offset = UDFToFloat((- 6))) (type: boolean) Statistics: Num rows: 800000 Data size: 811903688 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (ca_gmt_offset = UDFToFloat((- 6))) (type: boolean) Statistics: Num rows: 400000 Data size: 405951844 Basic stats: COMPLETE Column stats: NONE Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {_col0} {_col31} {_col32} 1 {ca_address_sk} keys: 0 1 outputColumnNames: _col0, _col31, _col32, _col56 input vertices: 0 Map 8 Statistics: Num rows: 440000 Data size: 446547040 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator sort order: Statistics: Num rows: 440000 Data size: 446547040 Basic stats: COMPLETE Column stats: NONE value expressions: _col0 (type: int), _col31 (type: int), _col32 (type: string), _col56 (type: int) Execution mode: vectorized Map 22 Map Operator Tree: TableScan alias: customer_address filterExpr: (ca_gmt_offset = UDFToFloat((- 6))) (type: boolean) Statistics: Num rows: 800000 Data size: 811903688 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (ca_gmt_offset = UDFToFloat((- 6))) (type: boolean) Statistics: Num rows: 400000 Data size: 405951844 Basic stats: COMPLETE Column stats: NONE Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {_col0} {_col31} {_col32} 1 {ca_address_sk} keys: 0 1 outputColumnNames: _col0, _col31, _col32, _col56 input vertices: 0 Map 20 Statistics: Num rows: 440000 Data size: 446547040 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator sort order: Statistics: Num rows: 440000 Data size: 446547040 Basic stats: COMPLETE Column stats: NONE value expressions: _col0 (type: int), _col31 (type: int), _col32 (type: string), _col56 (type: int) Execution mode: vectorized Map 8 Map Operator Tree: TableScan alias: item filterExpr: i_item_id is not null (type: boolean) Statistics: Num rows: 48000 Data size: 68732712 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: i_item_id is not null (type: boolean) Statistics: Num rows: 24000 Data size: 34366356 Basic stats: COMPLETE Column stats: NONE Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {d_date_sk} 1 {i_item_sk} {i_item_id} keys: 0 1 outputColumnNames: _col0, _col31, _col32 input vertices: 0 Map 19 Statistics: Num rows: 26400 Data size: 37802992 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator sort order: Statistics: Num rows: 26400 Data size: 37802992 Basic stats: COMPLETE Column stats: NONE value expressions: _col0 (type: int), _col31 (type: int), _col32 (type: string) Execution mode: vectorized Map 9 Map Operator Tree: TableScan alias: date_dim filterExpr: ((d_year = 2001) and (d_moy = 1)) (type: boolean) Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: ((d_year = 2001) and (d_moy = 1)) (type: boolean) Statistics: Num rows: 18262 Data size: 20435178 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator sort order: Statistics: Num rows: 18262 Data size: 20435178 Basic stats: COMPLETE Column stats: NONE value expressions: d_date_sk (type: int) Execution mode: vectorized Reducer 15 Reduce Operator Tree: Group By Operator aggregations: sum(VALUE._col0) keys: KEY._col0 (type: string) mode: mergepartial outputColumnNames: _col0, _col1 Select Operator expressions: _col0 (type: string), _col1 (type: double) outputColumnNames: _col0, _col1 Select Operator expressions: _col0 (type: string), _col1 (type: double) outputColumnNames: _col0, _col1 Group By Operator aggregations: sum(_col1) keys: _col0 (type: string) mode: hash outputColumnNames: _col0, _col1 Reduce Output Operator key expressions: _col0 (type: string) sort order: + Map-reduce partition columns: _col0 (type: string) value expressions: _col1 (type: double) Reducer 18 Reduce Operator Tree: Group By Operator aggregations: sum(VALUE._col0) keys: KEY._col0 (type: string) mode: mergepartial outputColumnNames: _col0, _col1 Select Operator expressions: _col0 (type: string), _col1 (type: double) outputColumnNames: _col0, _col1 Select Operator expressions: _col0 (type: string), _col1 (type: double) outputColumnNames: _col0, _col1 Select Operator expressions: _col0 (type: string), _col1 (type: double) outputColumnNames: _col0, _col1 Group By Operator aggregations: sum(_col1) keys: _col0 (type: string) mode: hash outputColumnNames: _col0, _col1 Reduce Output Operator key expressions: _col0 (type: string) sort order: + Map-reduce partition columns: _col0 (type: string) value expressions: _col1 (type: double) Reducer 3 Reduce Operator Tree: Group By Operator aggregations: sum(VALUE._col0) keys: KEY._col0 (type: string) mode: mergepartial outputColumnNames: _col0, _col1 Select Operator expressions: _col0 (type: string), _col1 (type: double) outputColumnNames: _col0, _col1 Select Operator expressions: _col0 (type: string), _col1 (type: double) outputColumnNames: _col0, _col1 Select Operator expressions: _col0 (type: string), _col1 (type: double) outputColumnNames: _col0, _col1 Group By Operator aggregations: sum(_col1) keys: _col0 (type: string) mode: hash outputColumnNames: _col0, _col1 Reduce Output Operator key expressions: _col0 (type: string) sort order: + Map-reduce partition columns: _col0 (type: string) value expressions: _col1 (type: double) Reducer 5 Reduce Operator Tree: Group By Operator aggregations: sum(VALUE._col0) keys: KEY._col0 (type: string) mode: mergepartial outputColumnNames: _col0, _col1 Statistics: Num rows: 37078679 Data size: 3936870912 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: _col0 (type: string), _col1 (type: double) outputColumnNames: _col0, _col1 Statistics: Num rows: 37078679 Data size: 3936870912 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator key expressions: _col1 (type: double) sort order: + Statistics: Num rows: 37078679 Data size: 3936870912 Basic stats: COMPLETE Column stats: NONE TopN Hash Memory Usage: 0.04 value expressions: _col0 (type: string) Execution mode: vectorized Reducer 6 Reduce Operator Tree: Select Operator expressions: VALUE._col0 (type: string), KEY.reducesinkkey0 (type: double) outputColumnNames: _col0, _col1 Statistics: Num rows: 37078679 Data size: 3936870912 Basic stats: COMPLETE Column stats: NONE Limit Number of rows: 100 Statistics: Num rows: 100 Data size: 10600 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator sort order: Statistics: Num rows: 100 Data size: 10600 Basic stats: COMPLETE Column stats: NONE TopN Hash Memory Usage: 0.04 value expressions: _col0 (type: string), _col1 (type: double) Execution mode: vectorized Reducer 7 Reduce Operator Tree: Select Operator expressions: VALUE._col0 (type: string), VALUE._col1 (type: double) outputColumnNames: _col0, _col1 Statistics: Num rows: 100 Data size: 10600 Basic stats: COMPLETE Column stats: NONE Limit Number of rows: 100 Statistics: Num rows: 100 Data size: 10600 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: _col0 (type: string), _col1 (type: double) outputColumnNames: _col0, _col1 Statistics: Num rows: 100 Data size: 10600 Basic stats: COMPLETE Column stats: NONE File Output Operator compressed: false Statistics: Num rows: 100 Data size: 10600 Basic stats: COMPLETE Column stats: NONE table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe Execution mode: vectorized Union 4 Vertex: Union 4 Stage: Stage-0 Fetch Operator limit: -1 Processor Tree: ListSink {code} > With CBO enabled cross product is generated when a subquery is present > ---------------------------------------------------------------------- > > Key: HIVE-7985 > URL: https://issues.apache.org/jira/browse/HIVE-7985 > Project: Hive > Issue Type: Bug > Components: CBO > Affects Versions: 0.14.0 > Reporter: Mostafa Mokhtar > Assignee: Laljo John Pullokkaran > Fix For: 0.14.0 > > > This is a regression introduced in the latest build of the CBO branch. > Removing the subquery for item will remove the cross products > Query > {code} > select i_item_id,sum(ss_ext_sales_price) total_sales from store_sales, > date_dim, item where item.i_item_id in (select i.i_item_id from item i where > i_color in ('purple','burlywood','indian')) and ss_item_sk = i_item_sk and > ss_sold_date_sk = d_date_sk and d_year = 2001 and d_moy = 1 group by > i_item_id; > {code} > {code} > Warning: Map Join MAPJOIN[38][bigTable=?] in task 'Map 1' is a cross product > Warning: Map Join MAPJOIN[39][bigTable=store_sales] in task 'Map 4' is a > cross product > OK > STAGE DEPENDENCIES: > Stage-1 is a root stage > Stage-0 depends on stages: Stage-1 > STAGE PLANS: > Stage: Stage-1 > Tez > Edges: > Map 1 <- Map 3 (BROADCAST_EDGE) > Map 4 <- Map 1 (BROADCAST_EDGE), Map 2 (BROADCAST_EDGE) > Reducer 5 <- Map 4 (SIMPLE_EDGE) > DagName: mmokhtar_20140904141313_9c253f7e-aad1-4ca4-9be1-ea45e3d34496:1 > Vertices: > Map 1 > Map Operator Tree: > TableScan > alias: item > filterExpr: (true and i_item_id is not null) (type: boolean) > Statistics: Num rows: 462000 Data size: 663862160 Basic > stats: COMPLETE Column stats: NONE > Filter Operator > predicate: i_item_id is not null (type: boolean) > Statistics: Num rows: 231000 Data size: 331931080 Basic > stats: COMPLETE Column stats: NONE > Map Join Operator > condition map: > Inner Join 0 to 1 > condition expressions: > 0 {i_item_sk} {i_item_id} > 1 {d_date_sk} > keys: > 0 > 1 > outputColumnNames: _col0, _col1, _col25 > input vertices: > 1 Map 3 > Statistics: Num rows: 254100 Data size: 365124192 Basic > stats: COMPLETE Column stats: NONE > Select Operator > expressions: _col0 (type: int), _col1 (type: string), > _col25 (type: int) > outputColumnNames: _col0, _col1, _col25 > Statistics: Num rows: 254100 Data size: 365124192 > Basic stats: COMPLETE Column stats: NONE > Reduce Output Operator > sort order: > Statistics: Num rows: 254100 Data size: 365124192 > Basic stats: COMPLETE Column stats: NONE > value expressions: _col0 (type: int), _col1 (type: > string), _col25 (type: int) > Execution mode: vectorized > Map 2 > Map Operator Tree: > TableScan > alias: i > filterExpr: ((i_color) IN ('purple', 'burlywood', 'indian') > and i_item_id is not null) (type: boolean) > Statistics: Num rows: 462000 Data size: 663862160 Basic > stats: COMPLETE Column stats: NONE > Filter Operator > predicate: ((i_color) IN ('purple', 'burlywood', > 'indian') and i_item_id is not null) (type: boolean) > Statistics: Num rows: 115500 Data size: 165965540 Basic > stats: COMPLETE Column stats: NONE > Select Operator > expressions: i_item_id (type: string) > outputColumnNames: _col0 > Statistics: Num rows: 115500 Data size: 165965540 Basic > stats: COMPLETE Column stats: NONE > Group By Operator > keys: _col0 (type: string) > mode: hash > outputColumnNames: _col0 > Statistics: Num rows: 115500 Data size: 165965540 > Basic stats: COMPLETE Column stats: NONE > Reduce Output Operator > key expressions: _col0 (type: string) > sort order: + > Map-reduce partition columns: _col0 (type: string) > Statistics: Num rows: 115500 Data size: 165965540 > Basic stats: COMPLETE Column stats: NONE > Execution mode: vectorized > Map 3 > Map Operator Tree: > TableScan > alias: date_dim > filterExpr: ((d_year = 2001) and (d_moy = 1)) (type: > boolean) > Statistics: Num rows: 73049 Data size: 81741831 Basic > stats: COMPLETE Column stats: NONE > Filter Operator > predicate: ((d_year = 2001) and (d_moy = 1)) (type: > boolean) > Statistics: Num rows: 18262 Data size: 20435178 Basic > stats: COMPLETE Column stats: NONE > Reduce Output Operator > sort order: > Statistics: Num rows: 18262 Data size: 20435178 Basic > stats: COMPLETE Column stats: NONE > value expressions: d_date_sk (type: int) > Execution mode: vectorized > Map 4 > Map Operator Tree: > TableScan > alias: store_sales > Statistics: Num rows: 82510879939 Data size: 7203833257964 > Basic stats: COMPLETE Column stats: NONE > Map Join Operator > condition map: > Inner Join 0 to 1 > condition expressions: > 0 {ss_sold_date_sk} {ss_item_sk} {ss_ext_sales_price} > 1 {_col0} {_col1} {_col25} > keys: > 0 > 1 > outputColumnNames: _col0, _col2, _col15, _col27, _col28, > _col52 > input vertices: > 1 Map 1 > Statistics: Num rows: 90761969664 Data size: > 7924217282560 Basic stats: COMPLETE Column stats: NONE > Filter Operator > predicate: ((_col2 = _col27) and (_col0 = _col52)) > (type: boolean) > Statistics: Num rows: 22690492416 Data size: > 1981054320640 Basic stats: COMPLETE Column stats: NONE > Select Operator > expressions: _col15 (type: float), _col28 (type: > string) > outputColumnNames: _col15, _col59 > Statistics: Num rows: 22690492416 Data size: > 1981054320640 Basic stats: COMPLETE Column stats: NONE > Map Join Operator > condition map: > Left Semi Join 0 to 1 > condition expressions: > 0 {_col15} {_col59} > 1 > keys: > 0 _col59 (type: string) > 1 _col0 (type: string) > outputColumnNames: _col15, _col59 > input vertices: > 1 Map 2 > Statistics: Num rows: 24959541248 Data size: > 2179159818240 Basic stats: COMPLETE Column stats: NONE > Select Operator > expressions: _col59 (type: string), _col15 (type: > float) > outputColumnNames: _col0, _col1 > Statistics: Num rows: 24959541248 Data size: > 2179159818240 Basic stats: COMPLETE Column stats: NONE > Group By Operator > aggregations: sum(_col1) > keys: _col0 (type: string) > mode: hash > outputColumnNames: _col0, _col1 > Statistics: Num rows: 24959541248 Data size: > 2179159818240 Basic stats: COMPLETE Column stats: NONE > Reduce Output Operator > key expressions: _col0 (type: string) > sort order: + > Map-reduce partition columns: _col0 (type: > string) > Statistics: Num rows: 24959541248 Data size: > 2179159818240 Basic stats: COMPLETE Column stats: NONE > value expressions: _col1 (type: double) > Execution mode: vectorized > Reducer 5 > Reduce Operator Tree: > Group By Operator > aggregations: sum(VALUE._col0) > keys: KEY._col0 (type: string) > mode: mergepartial > outputColumnNames: _col0, _col1 > Statistics: Num rows: 12479770624 Data size: 1089579909120 > Basic stats: COMPLETE Column stats: NONE > Select Operator > expressions: _col0 (type: string), _col1 (type: double) > outputColumnNames: _col0, _col1 > Statistics: Num rows: 12479770624 Data size: 1089579909120 > Basic stats: COMPLETE Column stats: NONE > File Output Operator > compressed: false > Statistics: Num rows: 12479770624 Data size: > 1089579909120 Basic stats: COMPLETE Column stats: NONE > table: > input format: org.apache.hadoop.mapred.TextInputFormat > output format: > org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat > serde: > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe > Stage: Stage-0 > Fetch Operator > limit: -1 > Processor Tree: > ListSink > {code} > No cross product generated for this query > {code} > select i_item_id,sum(ss_ext_sales_price) total_sales > from > store_sales, > date_dim, > item > where i_color in ('purple','burlywood','indian') > and ss_item_sk = i_item_sk > and ss_sold_date_sk = d_date_sk > and d_year = 2001 > and d_moy = 1 > group by i_item_id; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)