[ https://issues.apache.org/jira/browse/HIVE-9695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ashutosh Chauhan updated HIVE-9695:
-----------------------------------
    Component/s:     (was: Physical Optimizer)
                 Logical Optimizer

> Redundant filter operator in reducer Vertex when CBO is disabled
> ----------------------------------------------------------------
>
>                 Key: HIVE-9695
>                 URL: https://issues.apache.org/jira/browse/HIVE-9695
>             Project: Hive
>          Issue Type: Improvement
>          Components: Logical Optimizer
>    Affects Versions: 2.0.0
>            Reporter: Mostafa Mokhtar
>            Assignee: Jesus Camacho Rodriguez
>         Attachments: HIVE-9695.01.patch, HIVE-9695.01.patch, HIVE-9695.patch
>
>
> There is a redundant filter operator in reducer Vertex when CBO is disabled.
> Query
> {code}
> select
>     ss_item_sk, ss_ticket_number, ss_store_sk
> from
>     store_sales a, store_returns b, store
> where
>     a.ss_item_sk = b.sr_item_sk
>         and a.ss_ticket_number = b.sr_ticket_number
>         and ss_sold_date_sk between 2450816 and 2451500
>         and sr_returned_date_sk between 2450816 and 2451500
>         and s_store_sk = ss_store_sk;
> {code}
> Plan snippet
> {code}
>   Statistics: Num rows: 57439344 Data size: 1838059008 Basic stats: COMPLETE Column stats: COMPLETE
>   Filter Operator
>     predicate: (((((_col1 = _col27) and (_col8 = _col34)) and _col22 BETWEEN 2450816 AND 2451500) and _col45 BETWEEN 2450816 AND 2451500) and (_col49 = _col6)) (type: boolean)
> {code}
> Full plan with CBO disabled
> {code}
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
>     Tez
>       Edges:
>         Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 3 (BROADCAST_EDGE), Map 4 (SIMPLE_EDGE)
>       DagName: mmokhtar_20150214182626_ad6820c7-b667-4652-ab25-cb60deed1a6d:13
>       Vertices:
>         Map 1
>             Map Operator Tree:
>                 TableScan
>                   alias: b
>                   filterExpr: ((sr_item_sk is not null and sr_ticket_number is not null) and sr_returned_date_sk BETWEEN 2450816 AND 2451500) (type: boolean)
>                   Statistics: Num rows: 2370038095 Data size: 170506118656 Basic stats: COMPLETE Column stats: COMPLETE
>                   Filter Operator
>                     predicate: (sr_item_sk is not null and sr_ticket_number is not null) (type: boolean)
>                     Statistics: Num rows: 706893063 Data size: 6498502768 Basic stats: COMPLETE Column stats: COMPLETE
>                     Reduce Output Operator
>                       key expressions: sr_item_sk (type: int), sr_ticket_number (type: int)
>                       sort order: ++
>                       Map-reduce partition columns: sr_item_sk (type: int), sr_ticket_number (type: int)
>                       Statistics: Num rows: 706893063 Data size: 6498502768 Basic stats: COMPLETE Column stats: COMPLETE
>                       value expressions: sr_returned_date_sk (type: int)
>             Execution mode: vectorized
>         Map 3
>             Map Operator Tree:
>                 TableScan
>                   alias: store
>                   filterExpr: s_store_sk is not null (type: boolean)
>                   Statistics: Num rows: 1704 Data size: 3256276 Basic stats: COMPLETE Column stats: COMPLETE
>                   Filter Operator
>                     predicate: s_store_sk is not null (type: boolean)
>                     Statistics: Num rows: 1704 Data size: 6816 Basic stats: COMPLETE Column stats: COMPLETE
>                     Reduce Output Operator
>                       key expressions: s_store_sk (type: int)
>                       sort order: +
>                       Map-reduce partition columns: s_store_sk (type: int)
>                       Statistics: Num rows: 1704 Data size: 6816 Basic stats: COMPLETE Column stats: COMPLETE
>             Execution mode: vectorized
>         Map 4
>             Map Operator Tree:
>                 TableScan
>                   alias: a
>                   filterExpr: (((ss_item_sk is not null and ss_ticket_number is not null) and ss_store_sk is not null) and ss_sold_date_sk BETWEEN 2450816 AND 2451500) (type: boolean)
>                   Statistics: Num rows: 28878719387 Data size: 2405805439460 Basic stats: COMPLETE Column stats: COMPLETE
>                   Filter Operator
>                     predicate: ((ss_item_sk is not null and ss_ticket_number is not null) and ss_store_sk is not null) (type: boolean)
>                     Statistics: Num rows: 8405840828 Data size: 110101408700 Basic stats: COMPLETE Column stats: COMPLETE
>                     Reduce Output Operator
>                       key expressions: ss_item_sk (type: int), ss_ticket_number (type: int)
>                       sort order: ++
>                       Map-reduce partition columns: ss_item_sk (type: int), ss_ticket_number (type: int)
>                       Statistics: Num rows: 8405840828 Data size: 110101408700 Basic stats: COMPLETE Column stats: COMPLETE
>                       value expressions: ss_store_sk (type: int), ss_sold_date_sk (type: int)
>             Execution mode: vectorized
>         Reducer 2
>             Reduce Operator Tree:
>               Merge Join Operator
>                 condition map:
>                      Inner Join 0 to 1
>                 condition expressions:
>                   0 {KEY.reducesinkkey0} {VALUE._col5} {KEY.reducesinkkey1} {VALUE._col20}
>                   1 {KEY.reducesinkkey0} {KEY.reducesinkkey1} {VALUE._col17}
>                 outputColumnNames: _col1, _col6, _col8, _col22, _col27, _col34, _col45
>                 Statistics: Num rows: 57439343 Data size: 1148786860 Basic stats: COMPLETE Column stats: COMPLETE
>                 Map Join Operator
>                   condition map:
>                        Inner Join 0 to 1
>                   condition expressions:
>                     0 {_col1} {_col6} {_col8} {_col22} {_col27} {_col34} {_col45}
>                     1 {s_store_sk}
>                   keys:
>                     0 _col6 (type: int)
>                     1 s_store_sk (type: int)
>                   outputColumnNames: _col1, _col6, _col8, _col22, _col27, _col34, _col45, _col49
>                   input vertices:
>                     1 Map 3
>                   Statistics: Num rows: 57439344 Data size: 1838059008 Basic stats: COMPLETE Column stats: COMPLETE
>                   Filter Operator
>                     predicate: (((((_col1 = _col27) and (_col8 = _col34)) and _col22 BETWEEN 2450816 AND 2451500) and _col45 BETWEEN 2450816 AND 2451500) and (_col49 = _col6)) (type: boolean)
>                     Statistics: Num rows: 1794979 Data size: 57439328 Basic stats: COMPLETE Column stats: COMPLETE
>                     Select Operator
>                       expressions: _col1 (type: int), _col8 (type: int), _col6 (type: int)
>                       outputColumnNames: _col0, _col1, _col2
>                       Statistics: Num rows: 1794979 Data size: 21539748 Basic stats: COMPLETE Column stats: COMPLETE
>                       File Output Operator
>                         compressed: false
>                         Statistics: Num rows: 1794979 Data size: 21539748 Basic stats: COMPLETE Column stats: COMPLETE
>                         table:
>                             input format: org.apache.hadoop.mapred.TextInputFormat
>                             output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                             serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
>       Processor Tree:
>         ListSink
> {code}
> Full plan with CBO enabled
> {code}
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
>     Tez
>       Edges:
>         Map 4 <- Map 1 (BROADCAST_EDGE)
>         Reducer 3 <- Map 2 (SIMPLE_EDGE), Map 4 (SIMPLE_EDGE)
>       DagName: mmokhtar_20150214182525_63a9838f-db9f-40e9-8ae1-77c77143dccf:12
>       Vertices:
>         Map 1
>             Map Operator Tree:
>                 TableScan
>                   alias: store
>                   filterExpr: s_store_sk is not null (type: boolean)
>                   Statistics: Num rows: 1704 Data size: 3256276 Basic stats: COMPLETE Column stats: COMPLETE
>                   Filter Operator
>                     predicate: s_store_sk is not null (type: boolean)
>                     Statistics: Num rows: 1704 Data size: 6816 Basic stats: COMPLETE Column stats: COMPLETE
>                     Select Operator
>                       expressions: s_store_sk (type: int)
>                       outputColumnNames: _col0
>                       Statistics: Num rows: 1704 Data size: 6816 Basic stats: COMPLETE Column stats: COMPLETE
>                       Reduce Output Operator
>                         key expressions: _col0 (type: int)
>                         sort order: +
>                         Map-reduce partition columns: _col0 (type: int)
>                         Statistics: Num rows: 1704 Data size: 6816 Basic stats: COMPLETE Column stats: COMPLETE
>             Execution mode: vectorized
>         Map 2
>             Map Operator Tree:
>                 TableScan
>                   alias: b
>                   filterExpr: (sr_item_sk is not null and sr_ticket_number is not null) (type: boolean)
>                   Statistics: Num rows: 2370038095 Data size: 170506118656 Basic stats: COMPLETE Column stats: COMPLETE
>                   Filter Operator
>                     predicate: (sr_item_sk is not null and sr_ticket_number is not null) (type: boolean)
>                     Statistics: Num rows: 706893063 Data size: 3670930516 Basic stats: COMPLETE Column stats: COMPLETE
>                     Select Operator
>                       expressions: sr_item_sk (type: int), sr_ticket_number (type: int)
>                       outputColumnNames: _col0, _col1
>                       Statistics: Num rows: 706893063 Data size: 3670930516 Basic stats: COMPLETE Column stats: COMPLETE
>                       Reduce Output Operator
>                         key expressions: _col0 (type: int), _col1 (type: int)
>                         sort order: ++
>                         Map-reduce partition columns: _col0 (type: int), _col1 (type: int)
>                         Statistics: Num rows: 706893063 Data size: 3670930516 Basic stats: COMPLETE Column stats: COMPLETE
>             Execution mode: vectorized
>         Map 4
>             Map Operator Tree:
>                 TableScan
>                   alias: a
>                   filterExpr: ((ss_store_sk is not null and ss_item_sk is not null) and ss_ticket_number is not null) (type: boolean)
>                   Statistics: Num rows: 28878719387 Data size: 2405805439460 Basic stats: COMPLETE Column stats: COMPLETE
>                   Filter Operator
>                     predicate: ((ss_store_sk is not null and ss_item_sk is not null) and ss_ticket_number is not null) (type: boolean)
>                     Statistics: Num rows: 8405840828 Data size: 76478045388 Basic stats: COMPLETE Column stats: COMPLETE
>                     Select Operator
>                       expressions: ss_item_sk (type: int), ss_store_sk (type: int), ss_ticket_number (type: int)
>                       outputColumnNames: _col0, _col1, _col2
>                       Statistics: Num rows: 8405840828 Data size: 76478045388 Basic stats: COMPLETE Column stats: COMPLETE
>                       Map Join Operator
>                         condition map:
>                              Inner Join 0 to 1
>                         condition expressions:
>                           0 {_col0} {_col1} {_col2}
>                           1
>                         keys:
>                           0 _col1 (type: int)
>                           1 _col0 (type: int)
>                         outputColumnNames: _col0, _col1, _col2
>                         input vertices:
>                           1 Map 1
>                         Statistics: Num rows: 8405840896 Data size: 100870090752 Basic stats: COMPLETE Column stats: COMPLETE
>                         Reduce Output Operator
>                           key expressions: _col0 (type: int), _col2 (type: int)
>                           sort order: ++
>                           Map-reduce partition columns: _col0 (type: int), _col2 (type: int)
>                           Statistics: Num rows: 8405840896 Data size: 100870090752 Basic stats: COMPLETE Column stats: COMPLETE
>                           value expressions: _col1 (type: int)
>             Execution mode: vectorized
>         Reducer 3
>             Reduce Operator Tree:
>               Merge Join Operator
>                 condition map:
>                      Inner Join 0 to 1
>                 condition expressions:
>                   0 {KEY.reducesinkkey0} {VALUE._col0} {KEY.reducesinkkey1}
>                   1
>                 outputColumnNames: _col0, _col1, _col2
>                 Statistics: Num rows: 75912751 Data size: 910953012 Basic stats: COMPLETE Column stats: COMPLETE
>                 Select Operator
>                   expressions: _col0 (type: int), _col2 (type: int), _col1 (type: int)
>                   outputColumnNames: _col0, _col1, _col2
>                   Statistics: Num rows: 75912751 Data size: 910953012 Basic stats: COMPLETE Column stats: COMPLETE
>                   File Output Operator
>                     compressed: false
>                     Statistics: Num rows: 75912751 Data size: 910953012 Basic stats: COMPLETE Column stats: COMPLETE
>                     table:
>                         input format: org.apache.hadoop.mapred.TextInputFormat
>                         output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                         serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
>       Processor Tree:
>         ListSink
> {code}