[
https://issues.apache.org/jira/browse/HIVE-7985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137768#comment-14137768
]
Mostafa Mokhtar commented on HIVE-7985:
---------------------------------------
Q56 also has the same issues
Query
{code}
with ss as (
select i_item_id,sum(ss_ext_sales_price) total_sales
from
store_sales,
date_dim,
customer_address,
item
where item.i_item_id in (select
i.i_item_id
from item i
where i_color in ('purple','burlywood','indian'))
and ss_item_sk = i_item_sk
and ss_sold_date_sk = d_date_sk
and d_year = 2001
and d_moy = 1
and ss_addr_sk = ca_address_sk
and ca_gmt_offset = -6
group by i_item_id),
cs as (
select i_item_id,sum(cs_ext_sales_price) total_sales
from
catalog_sales,
date_dim,
customer_address,
item
where
item.i_item_id in (select
i.i_item_id
from item i
where i_color in ('purple','burlywood','indian'))
and cs_item_sk = i_item_sk
and cs_sold_date_sk = d_date_sk
and d_year = 2001
and d_moy = 1
and cs_bill_addr_sk = ca_address_sk
and ca_gmt_offset = -6
group by i_item_id),
ws as (
select i_item_id,sum(ws_ext_sales_price) total_sales
from
web_sales,
date_dim,
customer_address,
item
where
item.i_item_id in (select
i.i_item_id
from item i
where i_color in ('purple','burlywood','indian'))
and ws_item_sk = i_item_sk
and ws_sold_date_sk = d_date_sk
and d_year = 2001
and d_moy = 1
and ws_bill_addr_sk = ca_address_sk
and ca_gmt_offset = -6
group by i_item_id)
select i_item_id ,sum(total_sales) total_sales
from (select * from ss
union all
select * from cs
union all
select * from ws) tmp1
group by i_item_id
order by total_sales
limit 100
{code}
Plan
{code}
Warning: Map Join MAPJOIN[177][bigTable=?] in task 'Map 8' is a cross product
Warning: Map Join MAPJOIN[178][bigTable=?] in task 'Map 21' is a cross product
Warning: Map Join MAPJOIN[179][bigTable=web_sales] in task 'Map 14' is a cross
product
Warning: Map Join MAPJOIN[180][bigTable=?] in task 'Map 20' is a cross product
Warning: Map Join MAPJOIN[181][bigTable=?] in task 'Map 22' is a cross product
Warning: Map Join MAPJOIN[182][bigTable=store_sales] in task 'Map 17' is a
cross product
Warning: Map Join MAPJOIN[174][bigTable=?] in task 'Map 13' is a cross product
Warning: Map Join MAPJOIN[175][bigTable=?] in task 'Map 10' is a cross product
Warning: Map Join MAPJOIN[176][bigTable=catalog_sales] in task 'Map 2' is a
cross product
OK
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1
STAGE PLANS:
Stage: Stage-1
Tez
Edges:
Map 10 <- Map 13 (BROADCAST_EDGE)
Map 13 <- Map 16 (BROADCAST_EDGE)
Map 14 <- Map 11 (BROADCAST_EDGE), Map 21 (BROADCAST_EDGE)
Map 17 <- Map 12 (BROADCAST_EDGE), Map 22 (BROADCAST_EDGE)
Map 2 <- Map 1 (BROADCAST_EDGE), Map 10 (BROADCAST_EDGE)
Map 20 <- Map 9 (BROADCAST_EDGE)
Map 21 <- Map 8 (BROADCAST_EDGE)
Map 22 <- Map 20 (BROADCAST_EDGE)
Map 8 <- Map 19 (BROADCAST_EDGE)
Reducer 15 <- Map 14 (SIMPLE_EDGE), Union 4 (CONTAINS)
Reducer 18 <- Map 17 (SIMPLE_EDGE), Union 4 (CONTAINS)
Reducer 3 <- Map 2 (SIMPLE_EDGE), Union 4 (CONTAINS)
Reducer 5 <- Union 4 (SIMPLE_EDGE)
Reducer 6 <- Reducer 5 (SIMPLE_EDGE)
Reducer 7 <- Reducer 6 (SIMPLE_EDGE)
DagName: mmokhtar_20140916234444_107f445a-d48f-4a42-89a1-8c53eaa8dec0:1
Vertices:
Map 1
Map Operator Tree:
TableScan
alias: i
filterExpr: ((i_color) IN ('purple', 'burlywood', 'indian')
and i_item_id is not null) (type: boolean)
Statistics: Num rows: 48000 Data size: 68732712 Basic stats:
COMPLETE Column stats: NONE
Filter Operator
predicate: ((i_color) IN ('purple', 'burlywood', 'indian')
and i_item_id is not null) (type: boolean)
Statistics: Num rows: 12000 Data size: 17183178 Basic
stats: COMPLETE Column stats: NONE
Select Operator
expressions: i_item_id (type: string)
outputColumnNames: _col0
Statistics: Num rows: 12000 Data size: 17183178 Basic
stats: COMPLETE Column stats: NONE
Group By Operator
keys: _col0 (type: string)
mode: hash
outputColumnNames: _col0
Statistics: Num rows: 12000 Data size: 17183178 Basic
stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: string)
sort order: +
Map-reduce partition columns: _col0 (type: string)
Statistics: Num rows: 12000 Data size: 17183178 Basic
stats: COMPLETE Column stats: NONE
Execution mode: vectorized
Map 10
Map Operator Tree:
TableScan
alias: customer_address
filterExpr: (ca_gmt_offset = UDFToFloat((- 6))) (type:
boolean)
Statistics: Num rows: 800000 Data size: 811903688 Basic
stats: COMPLETE Column stats: NONE
Filter Operator
predicate: (ca_gmt_offset = UDFToFloat((- 6))) (type:
boolean)
Statistics: Num rows: 400000 Data size: 405951844 Basic
stats: COMPLETE Column stats: NONE
Map Join Operator
condition map:
Inner Join 0 to 1
condition expressions:
0 {_col0} {_col31} {_col32}
1 {ca_address_sk}
keys:
0
1
outputColumnNames: _col0, _col31, _col32, _col56
input vertices:
0 Map 13
Statistics: Num rows: 440000 Data size: 446547040 Basic
stats: COMPLETE Column stats: NONE
Reduce Output Operator
sort order:
Statistics: Num rows: 440000 Data size: 446547040 Basic
stats: COMPLETE Column stats: NONE
value expressions: _col0 (type: int), _col31 (type:
int), _col32 (type: string), _col56 (type: int)
Execution mode: vectorized
Map 11
Map Operator Tree:
TableScan
alias: i
filterExpr: ((i_color) IN ('purple', 'burlywood', 'indian')
and i_item_id is not null) (type: boolean)
Statistics: Num rows: 48000 Data size: 68732712 Basic stats:
COMPLETE Column stats: NONE
Filter Operator
predicate: ((i_color) IN ('purple', 'burlywood', 'indian')
and i_item_id is not null) (type: boolean)
Statistics: Num rows: 12000 Data size: 17183178 Basic
stats: COMPLETE Column stats: NONE
Select Operator
expressions: i_item_id (type: string)
outputColumnNames: _col0
Statistics: Num rows: 12000 Data size: 17183178 Basic
stats: COMPLETE Column stats: NONE
Group By Operator
keys: _col0 (type: string)
mode: hash
outputColumnNames: _col0
Statistics: Num rows: 12000 Data size: 17183178 Basic
stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: string)
sort order: +
Map-reduce partition columns: _col0 (type: string)
Statistics: Num rows: 12000 Data size: 17183178 Basic
stats: COMPLETE Column stats: NONE
Execution mode: vectorized
Map 12
Map Operator Tree:
TableScan
alias: i
filterExpr: ((i_color) IN ('purple', 'burlywood', 'indian')
and i_item_id is not null) (type: boolean)
Statistics: Num rows: 48000 Data size: 68732712 Basic stats:
COMPLETE Column stats: NONE
Filter Operator
predicate: ((i_color) IN ('purple', 'burlywood', 'indian')
and i_item_id is not null) (type: boolean)
Statistics: Num rows: 12000 Data size: 17183178 Basic
stats: COMPLETE Column stats: NONE
Select Operator
expressions: i_item_id (type: string)
outputColumnNames: _col0
Statistics: Num rows: 12000 Data size: 17183178 Basic
stats: COMPLETE Column stats: NONE
Group By Operator
keys: _col0 (type: string)
mode: hash
outputColumnNames: _col0
Statistics: Num rows: 12000 Data size: 17183178 Basic
stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: string)
sort order: +
Map-reduce partition columns: _col0 (type: string)
Statistics: Num rows: 12000 Data size: 17183178 Basic
stats: COMPLETE Column stats: NONE
Execution mode: vectorized
Map 13
Map Operator Tree:
TableScan
alias: item
filterExpr: i_item_id is not null (type: boolean)
Statistics: Num rows: 48000 Data size: 68732712 Basic stats:
COMPLETE Column stats: NONE
Filter Operator
predicate: i_item_id is not null (type: boolean)
Statistics: Num rows: 24000 Data size: 34366356 Basic
stats: COMPLETE Column stats: NONE
Map Join Operator
condition map:
Inner Join 0 to 1
condition expressions:
0 {d_date_sk}
1 {i_item_sk} {i_item_id}
keys:
0
1
outputColumnNames: _col0, _col31, _col32
input vertices:
0 Map 16
Statistics: Num rows: 26400 Data size: 37802992 Basic
stats: COMPLETE Column stats: NONE
Reduce Output Operator
sort order:
Statistics: Num rows: 26400 Data size: 37802992 Basic
stats: COMPLETE Column stats: NONE
value expressions: _col0 (type: int), _col31 (type:
int), _col32 (type: string)
Execution mode: vectorized
Map 14
Map Operator Tree:
TableScan
alias: web_sales
Statistics: Num rows: 143966864 Data size: 19001610332 Basic
stats: COMPLETE Column stats: NONE
Map Join Operator
condition map:
Inner Join 0 to 1
condition expressions:
0 {_col0} {_col31} {_col32} {_col56}
1 {ws_item_sk} {ws_bill_addr_sk} {ws_ext_sales_price}
{ws_sold_date_sk}
keys:
0
1
outputColumnNames: _col0, _col31, _col32, _col56, _col74,
_col78, _col94, _col105
input vertices:
0 Map 21
Statistics: Num rows: 158363552 Data size: 20901771264
Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: (((_col74 = _col31) and (_col105 = _col0)) and
(_col78 = _col56)) (type: boolean)
Statistics: Num rows: 19795444 Data size: 2612721408
Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: _col94 (type: float), _col32 (type: string)
outputColumnNames: _col22, _col85
Statistics: Num rows: 19795444 Data size: 2612721408
Basic stats: COMPLETE Column stats: NONE
Map Join Operator
condition map:
Left Semi Join 0 to 1
condition expressions:
0 {_col22} {_col85}
1
keys:
0 _col85 (type: string)
1 _col0 (type: string)
outputColumnNames: _col22, _col85
input vertices:
1 Map 11
Statistics: Num rows: 21774988 Data size: 2873993728
Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: _col85 (type: string), _col22 (type:
float)
outputColumnNames: _col0, _col1
Statistics: Num rows: 21774988 Data size:
2873993728 Basic stats: COMPLETE Column stats: NONE
Group By Operator
aggregations: sum(_col1)
keys: _col0 (type: string)
mode: hash
outputColumnNames: _col0, _col1
Statistics: Num rows: 21774988 Data size:
2873993728 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: string)
sort order: +
Map-reduce partition columns: _col0 (type:
string)
Statistics: Num rows: 21774988 Data size:
2873993728 Basic stats: COMPLETE Column stats: NONE
value expressions: _col1 (type: double)
Execution mode: vectorized
Map 16
Map Operator Tree:
TableScan
alias: date_dim
filterExpr: ((d_year = 2001) and (d_moy = 1)) (type: boolean)
Statistics: Num rows: 73049 Data size: 81741831 Basic stats:
COMPLETE Column stats: NONE
Filter Operator
predicate: ((d_year = 2001) and (d_moy = 1)) (type: boolean)
Statistics: Num rows: 18262 Data size: 20435178 Basic
stats: COMPLETE Column stats: NONE
Reduce Output Operator
sort order:
Statistics: Num rows: 18262 Data size: 20435178 Basic
stats: COMPLETE Column stats: NONE
value expressions: d_date_sk (type: int)
Execution mode: vectorized
Map 17
Map Operator Tree:
TableScan
alias: store_sales
Statistics: Num rows: 550076554 Data size: 47370018896 Basic
stats: COMPLETE Column stats: NONE
Map Join Operator
condition map:
Inner Join 0 to 1
condition expressions:
0 {_col0} {_col31} {_col32} {_col56}
1 {ss_item_sk} {ss_addr_sk} {ss_ext_sales_price}
{ss_sold_date_sk}
keys:
0
1
outputColumnNames: _col0, _col31, _col32, _col56, _col73,
_col77, _col86, _col94
input vertices:
0 Map 22
Statistics: Num rows: 605084224 Data size: 52107022336
Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: (((_col73 = _col31) and (_col94 = _col0)) and
(_col77 = _col56)) (type: boolean)
Statistics: Num rows: 75635528 Data size: 6513377792
Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: _col86 (type: float), _col32 (type: string)
outputColumnNames: _col14, _col74
Statistics: Num rows: 75635528 Data size: 6513377792
Basic stats: COMPLETE Column stats: NONE
Map Join Operator
condition map:
Left Semi Join 0 to 1
condition expressions:
0 {_col14} {_col74}
1
keys:
0 _col74 (type: string)
1 _col0 (type: string)
outputColumnNames: _col14, _col74
input vertices:
1 Map 12
Statistics: Num rows: 83199080 Data size: 7164715520
Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: _col74 (type: string), _col14 (type:
float)
outputColumnNames: _col0, _col1
Statistics: Num rows: 83199080 Data size:
7164715520 Basic stats: COMPLETE Column stats: NONE
Group By Operator
aggregations: sum(_col1)
keys: _col0 (type: string)
mode: hash
outputColumnNames: _col0, _col1
Statistics: Num rows: 83199080 Data size:
7164715520 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: string)
sort order: +
Map-reduce partition columns: _col0 (type:
string)
Statistics: Num rows: 83199080 Data size:
7164715520 Basic stats: COMPLETE Column stats: NONE
value expressions: _col1 (type: double)
Execution mode: vectorized
Map 19
Map Operator Tree:
TableScan
alias: date_dim
filterExpr: ((d_year = 2001) and (d_moy = 1)) (type: boolean)
Statistics: Num rows: 73049 Data size: 81741831 Basic stats:
COMPLETE Column stats: NONE
Filter Operator
predicate: ((d_year = 2001) and (d_moy = 1)) (type: boolean)
Statistics: Num rows: 18262 Data size: 20435178 Basic
stats: COMPLETE Column stats: NONE
Reduce Output Operator
sort order:
Statistics: Num rows: 18262 Data size: 20435178 Basic
stats: COMPLETE Column stats: NONE
value expressions: d_date_sk (type: int)
Execution mode: vectorized
Map 2
Map Operator Tree:
TableScan
alias: catalog_sales
Statistics: Num rows: 286549727 Data size: 37743959324 Basic
stats: COMPLETE Column stats: NONE
Map Join Operator
condition map:
Inner Join 0 to 1
condition expressions:
0 {_col0} {_col31} {_col32} {_col56}
1 {cs_bill_addr_sk} {cs_item_sk} {cs_ext_sales_price}
{cs_sold_date_sk}
keys:
0
1
outputColumnNames: _col0, _col31, _col32, _col56, _col77,
_col86, _col94, _col105
input vertices:
0 Map 10
Statistics: Num rows: 315204704 Data size: 41518358528
Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: (((_col86 = _col31) and (_col105 = _col0)) and
(_col77 = _col56)) (type: boolean)
Statistics: Num rows: 39400588 Data size: 5189794816
Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: _col94 (type: float), _col32 (type: string)
outputColumnNames: _col22, _col85
Statistics: Num rows: 39400588 Data size: 5189794816
Basic stats: COMPLETE Column stats: NONE
Map Join Operator
condition map:
Left Semi Join 0 to 1
condition expressions:
0 {_col22} {_col85}
1
keys:
0 _col85 (type: string)
1 _col0 (type: string)
outputColumnNames: _col22, _col85
input vertices:
1 Map 1
Statistics: Num rows: 43340648 Data size: 5708774400
Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: _col85 (type: string), _col22 (type:
float)
outputColumnNames: _col0, _col1
Statistics: Num rows: 43340648 Data size:
5708774400 Basic stats: COMPLETE Column stats: NONE
Group By Operator
aggregations: sum(_col1)
keys: _col0 (type: string)
mode: hash
outputColumnNames: _col0, _col1
Statistics: Num rows: 43340648 Data size:
5708774400 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: string)
sort order: +
Map-reduce partition columns: _col0 (type:
string)
Statistics: Num rows: 43340648 Data size:
5708774400 Basic stats: COMPLETE Column stats: NONE
value expressions: _col1 (type: double)
Execution mode: vectorized
Map 20
Map Operator Tree:
TableScan
alias: item
filterExpr: i_item_id is not null (type: boolean)
Statistics: Num rows: 48000 Data size: 68732712 Basic stats:
COMPLETE Column stats: NONE
Filter Operator
predicate: i_item_id is not null (type: boolean)
Statistics: Num rows: 24000 Data size: 34366356 Basic
stats: COMPLETE Column stats: NONE
Map Join Operator
condition map:
Inner Join 0 to 1
condition expressions:
0 {d_date_sk}
1 {i_item_sk} {i_item_id}
keys:
0
1
outputColumnNames: _col0, _col31, _col32
input vertices:
0 Map 9
Statistics: Num rows: 26400 Data size: 37802992 Basic
stats: COMPLETE Column stats: NONE
Reduce Output Operator
sort order:
Statistics: Num rows: 26400 Data size: 37802992 Basic
stats: COMPLETE Column stats: NONE
value expressions: _col0 (type: int), _col31 (type:
int), _col32 (type: string)
Execution mode: vectorized
Map 21
Map Operator Tree:
TableScan
alias: customer_address
filterExpr: (ca_gmt_offset = UDFToFloat((- 6))) (type:
boolean)
Statistics: Num rows: 800000 Data size: 811903688 Basic
stats: COMPLETE Column stats: NONE
Filter Operator
predicate: (ca_gmt_offset = UDFToFloat((- 6))) (type:
boolean)
Statistics: Num rows: 400000 Data size: 405951844 Basic
stats: COMPLETE Column stats: NONE
Map Join Operator
condition map:
Inner Join 0 to 1
condition expressions:
0 {_col0} {_col31} {_col32}
1 {ca_address_sk}
keys:
0
1
outputColumnNames: _col0, _col31, _col32, _col56
input vertices:
0 Map 8
Statistics: Num rows: 440000 Data size: 446547040 Basic
stats: COMPLETE Column stats: NONE
Reduce Output Operator
sort order:
Statistics: Num rows: 440000 Data size: 446547040 Basic
stats: COMPLETE Column stats: NONE
value expressions: _col0 (type: int), _col31 (type:
int), _col32 (type: string), _col56 (type: int)
Execution mode: vectorized
Map 22
Map Operator Tree:
TableScan
alias: customer_address
filterExpr: (ca_gmt_offset = UDFToFloat((- 6))) (type:
boolean)
Statistics: Num rows: 800000 Data size: 811903688 Basic
stats: COMPLETE Column stats: NONE
Filter Operator
predicate: (ca_gmt_offset = UDFToFloat((- 6))) (type:
boolean)
Statistics: Num rows: 400000 Data size: 405951844 Basic
stats: COMPLETE Column stats: NONE
Map Join Operator
condition map:
Inner Join 0 to 1
condition expressions:
0 {_col0} {_col31} {_col32}
1 {ca_address_sk}
keys:
0
1
outputColumnNames: _col0, _col31, _col32, _col56
input vertices:
0 Map 20
Statistics: Num rows: 440000 Data size: 446547040 Basic
stats: COMPLETE Column stats: NONE
Reduce Output Operator
sort order:
Statistics: Num rows: 440000 Data size: 446547040 Basic
stats: COMPLETE Column stats: NONE
value expressions: _col0 (type: int), _col31 (type:
int), _col32 (type: string), _col56 (type: int)
Execution mode: vectorized
Map 8
Map Operator Tree:
TableScan
alias: item
filterExpr: i_item_id is not null (type: boolean)
Statistics: Num rows: 48000 Data size: 68732712 Basic stats:
COMPLETE Column stats: NONE
Filter Operator
predicate: i_item_id is not null (type: boolean)
Statistics: Num rows: 24000 Data size: 34366356 Basic
stats: COMPLETE Column stats: NONE
Map Join Operator
condition map:
Inner Join 0 to 1
condition expressions:
0 {d_date_sk}
1 {i_item_sk} {i_item_id}
keys:
0
1
outputColumnNames: _col0, _col31, _col32
input vertices:
0 Map 19
Statistics: Num rows: 26400 Data size: 37802992 Basic
stats: COMPLETE Column stats: NONE
Reduce Output Operator
sort order:
Statistics: Num rows: 26400 Data size: 37802992 Basic
stats: COMPLETE Column stats: NONE
value expressions: _col0 (type: int), _col31 (type:
int), _col32 (type: string)
Execution mode: vectorized
Map 9
Map Operator Tree:
TableScan
alias: date_dim
filterExpr: ((d_year = 2001) and (d_moy = 1)) (type: boolean)
Statistics: Num rows: 73049 Data size: 81741831 Basic stats:
COMPLETE Column stats: NONE
Filter Operator
predicate: ((d_year = 2001) and (d_moy = 1)) (type: boolean)
Statistics: Num rows: 18262 Data size: 20435178 Basic
stats: COMPLETE Column stats: NONE
Reduce Output Operator
sort order:
Statistics: Num rows: 18262 Data size: 20435178 Basic
stats: COMPLETE Column stats: NONE
value expressions: d_date_sk (type: int)
Execution mode: vectorized
Reducer 15
Reduce Operator Tree:
Group By Operator
aggregations: sum(VALUE._col0)
keys: KEY._col0 (type: string)
mode: mergepartial
outputColumnNames: _col0, _col1
Select Operator
expressions: _col0 (type: string), _col1 (type: double)
outputColumnNames: _col0, _col1
Select Operator
expressions: _col0 (type: string), _col1 (type: double)
outputColumnNames: _col0, _col1
Group By Operator
aggregations: sum(_col1)
keys: _col0 (type: string)
mode: hash
outputColumnNames: _col0, _col1
Reduce Output Operator
key expressions: _col0 (type: string)
sort order: +
Map-reduce partition columns: _col0 (type: string)
value expressions: _col1 (type: double)
Reducer 18
Reduce Operator Tree:
Group By Operator
aggregations: sum(VALUE._col0)
keys: KEY._col0 (type: string)
mode: mergepartial
outputColumnNames: _col0, _col1
Select Operator
expressions: _col0 (type: string), _col1 (type: double)
outputColumnNames: _col0, _col1
Select Operator
expressions: _col0 (type: string), _col1 (type: double)
outputColumnNames: _col0, _col1
Select Operator
expressions: _col0 (type: string), _col1 (type: double)
outputColumnNames: _col0, _col1
Group By Operator
aggregations: sum(_col1)
keys: _col0 (type: string)
mode: hash
outputColumnNames: _col0, _col1
Reduce Output Operator
key expressions: _col0 (type: string)
sort order: +
Map-reduce partition columns: _col0 (type: string)
value expressions: _col1 (type: double)
Reducer 3
Reduce Operator Tree:
Group By Operator
aggregations: sum(VALUE._col0)
keys: KEY._col0 (type: string)
mode: mergepartial
outputColumnNames: _col0, _col1
Select Operator
expressions: _col0 (type: string), _col1 (type: double)
outputColumnNames: _col0, _col1
Select Operator
expressions: _col0 (type: string), _col1 (type: double)
outputColumnNames: _col0, _col1
Select Operator
expressions: _col0 (type: string), _col1 (type: double)
outputColumnNames: _col0, _col1
Group By Operator
aggregations: sum(_col1)
keys: _col0 (type: string)
mode: hash
outputColumnNames: _col0, _col1
Reduce Output Operator
key expressions: _col0 (type: string)
sort order: +
Map-reduce partition columns: _col0 (type: string)
value expressions: _col1 (type: double)
Reducer 5
Reduce Operator Tree:
Group By Operator
aggregations: sum(VALUE._col0)
keys: KEY._col0 (type: string)
mode: mergepartial
outputColumnNames: _col0, _col1
Statistics: Num rows: 37078679 Data size: 3936870912 Basic
stats: COMPLETE Column stats: NONE
Select Operator
expressions: _col0 (type: string), _col1 (type: double)
outputColumnNames: _col0, _col1
Statistics: Num rows: 37078679 Data size: 3936870912 Basic
stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col1 (type: double)
sort order: +
Statistics: Num rows: 37078679 Data size: 3936870912 Basic
stats: COMPLETE Column stats: NONE
TopN Hash Memory Usage: 0.04
value expressions: _col0 (type: string)
Execution mode: vectorized
Reducer 6
Reduce Operator Tree:
Select Operator
expressions: VALUE._col0 (type: string), KEY.reducesinkkey0
(type: double)
outputColumnNames: _col0, _col1
Statistics: Num rows: 37078679 Data size: 3936870912 Basic
stats: COMPLETE Column stats: NONE
Limit
Number of rows: 100
Statistics: Num rows: 100 Data size: 10600 Basic stats:
COMPLETE Column stats: NONE
Reduce Output Operator
sort order:
Statistics: Num rows: 100 Data size: 10600 Basic stats:
COMPLETE Column stats: NONE
TopN Hash Memory Usage: 0.04
value expressions: _col0 (type: string), _col1 (type:
double)
Execution mode: vectorized
Reducer 7
Reduce Operator Tree:
Select Operator
expressions: VALUE._col0 (type: string), VALUE._col1 (type:
double)
outputColumnNames: _col0, _col1
Statistics: Num rows: 100 Data size: 10600 Basic stats:
COMPLETE Column stats: NONE
Limit
Number of rows: 100
Statistics: Num rows: 100 Data size: 10600 Basic stats:
COMPLETE Column stats: NONE
Select Operator
expressions: _col0 (type: string), _col1 (type: double)
outputColumnNames: _col0, _col1
Statistics: Num rows: 100 Data size: 10600 Basic stats:
COMPLETE Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 100 Data size: 10600 Basic stats:
COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format:
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde:
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Execution mode: vectorized
Union 4
Vertex: Union 4
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
{code}
> With CBO enabled cross product is generated when a subquery is present
> ----------------------------------------------------------------------
>
> Key: HIVE-7985
> URL: https://issues.apache.org/jira/browse/HIVE-7985
> Project: Hive
> Issue Type: Bug
> Components: CBO
> Affects Versions: 0.14.0
> Reporter: Mostafa Mokhtar
> Assignee: Laljo John Pullokkaran
> Fix For: 0.14.0
>
>
> This is a regression introduced in the latest build of the CBO branch.
> Removing the subquery for item will remove the cross products
> Query
> {code}
> select i_item_id,sum(ss_ext_sales_price) total_sales from store_sales,
> date_dim, item where item.i_item_id in (select i.i_item_id from item i where
> i_color in ('purple','burlywood','indian')) and ss_item_sk = i_item_sk and
> ss_sold_date_sk = d_date_sk and d_year = 2001 and d_moy = 1 group by
> i_item_id;
> {code}
> {code}
> Warning: Map Join MAPJOIN[38][bigTable=?] in task 'Map 1' is a cross product
> Warning: Map Join MAPJOIN[39][bigTable=store_sales] in task 'Map 4' is a
> cross product
> OK
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 depends on stages: Stage-1
> STAGE PLANS:
> Stage: Stage-1
> Tez
> Edges:
> Map 1 <- Map 3 (BROADCAST_EDGE)
> Map 4 <- Map 1 (BROADCAST_EDGE), Map 2 (BROADCAST_EDGE)
> Reducer 5 <- Map 4 (SIMPLE_EDGE)
> DagName: mmokhtar_20140904141313_9c253f7e-aad1-4ca4-9be1-ea45e3d34496:1
> Vertices:
> Map 1
> Map Operator Tree:
> TableScan
> alias: item
> filterExpr: (true and i_item_id is not null) (type: boolean)
> Statistics: Num rows: 462000 Data size: 663862160 Basic
> stats: COMPLETE Column stats: NONE
> Filter Operator
> predicate: i_item_id is not null (type: boolean)
> Statistics: Num rows: 231000 Data size: 331931080 Basic
> stats: COMPLETE Column stats: NONE
> Map Join Operator
> condition map:
> Inner Join 0 to 1
> condition expressions:
> 0 {i_item_sk} {i_item_id}
> 1 {d_date_sk}
> keys:
> 0
> 1
> outputColumnNames: _col0, _col1, _col25
> input vertices:
> 1 Map 3
> Statistics: Num rows: 254100 Data size: 365124192 Basic
> stats: COMPLETE Column stats: NONE
> Select Operator
> expressions: _col0 (type: int), _col1 (type: string),
> _col25 (type: int)
> outputColumnNames: _col0, _col1, _col25
> Statistics: Num rows: 254100 Data size: 365124192
> Basic stats: COMPLETE Column stats: NONE
> Reduce Output Operator
> sort order:
> Statistics: Num rows: 254100 Data size: 365124192
> Basic stats: COMPLETE Column stats: NONE
> value expressions: _col0 (type: int), _col1 (type:
> string), _col25 (type: int)
> Execution mode: vectorized
> Map 2
> Map Operator Tree:
> TableScan
> alias: i
> filterExpr: ((i_color) IN ('purple', 'burlywood', 'indian')
> and i_item_id is not null) (type: boolean)
> Statistics: Num rows: 462000 Data size: 663862160 Basic
> stats: COMPLETE Column stats: NONE
> Filter Operator
> predicate: ((i_color) IN ('purple', 'burlywood',
> 'indian') and i_item_id is not null) (type: boolean)
> Statistics: Num rows: 115500 Data size: 165965540 Basic
> stats: COMPLETE Column stats: NONE
> Select Operator
> expressions: i_item_id (type: string)
> outputColumnNames: _col0
> Statistics: Num rows: 115500 Data size: 165965540 Basic
> stats: COMPLETE Column stats: NONE
> Group By Operator
> keys: _col0 (type: string)
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 115500 Data size: 165965540
> Basic stats: COMPLETE Column stats: NONE
> Reduce Output Operator
> key expressions: _col0 (type: string)
> sort order: +
> Map-reduce partition columns: _col0 (type: string)
> Statistics: Num rows: 115500 Data size: 165965540
> Basic stats: COMPLETE Column stats: NONE
> Execution mode: vectorized
> Map 3
> Map Operator Tree:
> TableScan
> alias: date_dim
> filterExpr: ((d_year = 2001) and (d_moy = 1)) (type:
> boolean)
> Statistics: Num rows: 73049 Data size: 81741831 Basic
> stats: COMPLETE Column stats: NONE
> Filter Operator
> predicate: ((d_year = 2001) and (d_moy = 1)) (type:
> boolean)
> Statistics: Num rows: 18262 Data size: 20435178 Basic
> stats: COMPLETE Column stats: NONE
> Reduce Output Operator
> sort order:
> Statistics: Num rows: 18262 Data size: 20435178 Basic
> stats: COMPLETE Column stats: NONE
> value expressions: d_date_sk (type: int)
> Execution mode: vectorized
> Map 4
> Map Operator Tree:
> TableScan
> alias: store_sales
> Statistics: Num rows: 82510879939 Data size: 7203833257964
> Basic stats: COMPLETE Column stats: NONE
> Map Join Operator
> condition map:
> Inner Join 0 to 1
> condition expressions:
> 0 {ss_sold_date_sk} {ss_item_sk} {ss_ext_sales_price}
> 1 {_col0} {_col1} {_col25}
> keys:
> 0
> 1
> outputColumnNames: _col0, _col2, _col15, _col27, _col28,
> _col52
> input vertices:
> 1 Map 1
> Statistics: Num rows: 90761969664 Data size:
> 7924217282560 Basic stats: COMPLETE Column stats: NONE
> Filter Operator
> predicate: ((_col2 = _col27) and (_col0 = _col52))
> (type: boolean)
> Statistics: Num rows: 22690492416 Data size:
> 1981054320640 Basic stats: COMPLETE Column stats: NONE
> Select Operator
> expressions: _col15 (type: float), _col28 (type:
> string)
> outputColumnNames: _col15, _col59
> Statistics: Num rows: 22690492416 Data size:
> 1981054320640 Basic stats: COMPLETE Column stats: NONE
> Map Join Operator
> condition map:
> Left Semi Join 0 to 1
> condition expressions:
> 0 {_col15} {_col59}
> 1
> keys:
> 0 _col59 (type: string)
> 1 _col0 (type: string)
> outputColumnNames: _col15, _col59
> input vertices:
> 1 Map 2
> Statistics: Num rows: 24959541248 Data size:
> 2179159818240 Basic stats: COMPLETE Column stats: NONE
> Select Operator
> expressions: _col59 (type: string), _col15 (type:
> float)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 24959541248 Data size:
> 2179159818240 Basic stats: COMPLETE Column stats: NONE
> Group By Operator
> aggregations: sum(_col1)
> keys: _col0 (type: string)
> mode: hash
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 24959541248 Data size:
> 2179159818240 Basic stats: COMPLETE Column stats: NONE
> Reduce Output Operator
> key expressions: _col0 (type: string)
> sort order: +
> Map-reduce partition columns: _col0 (type:
> string)
> Statistics: Num rows: 24959541248 Data size:
> 2179159818240 Basic stats: COMPLETE Column stats: NONE
> value expressions: _col1 (type: double)
> Execution mode: vectorized
> Reducer 5
> Reduce Operator Tree:
> Group By Operator
> aggregations: sum(VALUE._col0)
> keys: KEY._col0 (type: string)
> mode: mergepartial
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 12479770624 Data size: 1089579909120
> Basic stats: COMPLETE Column stats: NONE
> Select Operator
> expressions: _col0 (type: string), _col1 (type: double)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 12479770624 Data size: 1089579909120
> Basic stats: COMPLETE Column stats: NONE
> File Output Operator
> compressed: false
> Statistics: Num rows: 12479770624 Data size:
> 1089579909120 Basic stats: COMPLETE Column stats: NONE
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format:
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> serde:
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> Stage: Stage-0
> Fetch Operator
> limit: -1
> Processor Tree:
> ListSink
> {code}
> No cross product generated for this query
> {code}
> select i_item_id,sum(ss_ext_sales_price) total_sales
> from
> store_sales,
> date_dim,
> item
> where i_color in ('purple','burlywood','indian')
> and ss_item_sk = i_item_sk
> and ss_sold_date_sk = d_date_sk
> and d_year = 2001
> and d_moy = 1
> group by i_item_id;
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)