[ https://issues.apache.org/jira/browse/HIVE-8769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14543989#comment-14543989 ]
Pengcheng Xiong commented on HIVE-8769: --------------------------------------- The TestMultiSessionsHS2WithLocalClusterSpark failing is mentioned in HIVE-9990. TestMinimrCliDriver.testCliDriver_schemeAuthority passed on my laptop. Both of them are unrelated to my patch. [~ashutoshc] and [~jpullokkaran], could you please take a look? Thanks. > Physical optimizer : Incorrect CE results in a shuffle join instead of a Map > join (PK/FK pattern not detected) > -------------------------------------------------------------------------------------------------------------- > > Key: HIVE-8769 > URL: https://issues.apache.org/jira/browse/HIVE-8769 > Project: Hive > Issue Type: Bug > Components: Physical Optimizer > Affects Versions: 0.14.0 > Reporter: Mostafa Mokhtar > Assignee: Pengcheng Xiong > Attachments: HIVE-8769.01.patch, HIVE-8769.02.patch > > > TPC-DS Q82 is running slower than hive 13 because the join type is not > correct. > The estimate for item x inventory x date_dim is 227 Million rows while the > actual is 3K rows. > Hive 13 finishes in 753 seconds. > Hive 14 finishes in 1,267 seconds. > Hive 14 + force map join finished in 431 seconds. > Query > {code} > select i_item_id > ,i_item_desc > ,i_current_price > from item, inventory, date_dim, store_sales > where i_current_price between 30 and 30+30 > and inv_item_sk = i_item_sk > and d_date_sk=inv_date_sk > and d_date between '2002-05-30' and '2002-07-30' > and i_manufact_id in (437,129,727,663) > and inv_quantity_on_hand between 100 and 500 > and ss_item_sk = i_item_sk > group by i_item_id,i_item_desc,i_current_price > order by i_item_id > limit 100 > {code} > Plan > {code} > STAGE PLANS: > Stage: Stage-1 > Tez > Edges: > Map 7 <- Map 1 (BROADCAST_EDGE), Map 2 (BROADCAST_EDGE) > Reducer 4 <- Map 3 (SIMPLE_EDGE), Map 7 (SIMPLE_EDGE) > Reducer 5 <- Reducer 4 (SIMPLE_EDGE) > Reducer 6 <- Reducer 5 (SIMPLE_EDGE) > DagName: mmokhtar_20141106005353_7a2eb8df-12ff-4fe9-89b4-30f1e4e3fb90:1 > Vertices: > Map 1 > Map Operator Tree: > TableScan > alias: item > filterExpr: ((i_current_price BETWEEN 30 AND 60 and > (i_manufact_id) IN (437, 129, 727, 663)) and i_item_sk is not null) (type: > boolean) > Statistics: Num rows: 462000 Data size: 663862160 Basic > stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: ((i_current_price BETWEEN 30 AND 60 and > (i_manufact_id) IN (437, 129, 727, 663)) and i_item_sk is not null) (type: > boolean) > Statistics: Num rows: 115500 Data size: 34185680 Basic > stats: COMPLETE Column stats: COMPLETE > Select Operator > expressions: i_item_sk (type: int), i_item_id (type: > string), i_item_desc (type: string), i_current_price (type: float) > outputColumnNames: _col0, _col1, _col2, _col3 > Statistics: Num rows: 115500 Data size: 33724832 Basic > stats: COMPLETE Column stats: COMPLETE > Reduce Output Operator > key expressions: _col0 (type: int) > sort order: + > Map-reduce partition columns: _col0 (type: int) > Statistics: Num rows: 115500 Data size: 33724832 > Basic stats: COMPLETE Column stats: COMPLETE > value expressions: _col1 (type: string), _col2 (type: > string), _col3 (type: float) > Execution mode: vectorized > Map 2 > Map Operator Tree: > TableScan > alias: date_dim > filterExpr: (d_date BETWEEN '2002-05-30' AND '2002-07-30' > and d_date_sk is not null) (type: boolean) > Statistics: Num rows: 73049 Data size: 81741831 Basic > stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: (d_date BETWEEN '2002-05-30' AND '2002-07-30' > and d_date_sk is not null) (type: boolean) > Statistics: Num rows: 36524 Data size: 3579352 Basic > stats: COMPLETE Column stats: COMPLETE > Select Operator > expressions: d_date_sk (type: int) > outputColumnNames: _col0 > Statistics: Num rows: 36524 Data size: 146096 Basic > stats: COMPLETE Column stats: COMPLETE > Reduce Output Operator > key expressions: _col0 (type: int) > sort order: + > Map-reduce partition columns: _col0 (type: int) > Statistics: Num rows: 36524 Data size: 146096 Basic > stats: COMPLETE Column stats: COMPLETE > Select Operator > expressions: _col0 (type: int) > outputColumnNames: _col0 > Statistics: Num rows: 36524 Data size: 146096 Basic > stats: COMPLETE Column stats: COMPLETE > Group By Operator > keys: _col0 (type: int) > mode: hash > outputColumnNames: _col0 > Statistics: Num rows: 18262 Data size: 73048 Basic > stats: COMPLETE Column stats: COMPLETE > Dynamic Partitioning Event Operator > Target Input: inventory > Partition key expr: inv_date_sk > Statistics: Num rows: 18262 Data size: 73048 > Basic stats: COMPLETE Column stats: COMPLETE > Target column: inv_date_sk > Target Vertex: Map 7 > Execution mode: vectorized > Map 3 > Map Operator Tree: > TableScan > alias: store_sales > filterExpr: ss_item_sk is not null (type: boolean) > Statistics: Num rows: 82510879939 Data size: 6873789738208 > Basic stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: ss_item_sk is not null (type: boolean) > Statistics: Num rows: 82510879939 Data size: 330043519756 > Basic stats: COMPLETE Column stats: COMPLETE > Select Operator > expressions: ss_item_sk (type: int) > outputColumnNames: _col0 > Statistics: Num rows: 82510879939 Data size: > 330043519756 Basic stats: COMPLETE Column stats: COMPLETE > Reduce Output Operator > key expressions: _col0 (type: int) > sort order: + > Map-reduce partition columns: _col0 (type: int) > Statistics: Num rows: 82510879939 Data size: > 330043519756 Basic stats: COMPLETE Column stats: COMPLETE > Execution mode: vectorized > Map 7 > Map Operator Tree: > TableScan > alias: inventory > filterExpr: (inv_quantity_on_hand BETWEEN 100 AND 500 and > inv_item_sk is not null) (type: boolean) > Statistics: Num rows: 1,627,857,000 Data size: 19208695084 > Basic stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: (inv_quantity_on_hand BETWEEN 100 AND 500 and > inv_item_sk is not null) (type: boolean) > Statistics: Num rows: 813,928,500 Data size: 9604347540 > Basic stats: COMPLETE Column stats: COMPLETE > Select Operator > expressions: inv_item_sk (type: int), inv_date_sk > (type: int) > outputColumnNames: _col0, _col2 > Statistics: Num rows: 813,928,500 Data size: 6511428000 > Basic stats: COMPLETE Column stats: COMPLETE > Map Join Operator > condition map: > Inner Join 0 to 1 > condition expressions: > 0 {_col2} > 1 {_col0} {_col1} {_col2} {_col3} > keys: > 0 _col0 (type: int) > 1 _col0 (type: int) > outputColumnNames: _col2, _col3, _col4, _col5, _col6 > input vertices: > 1 Map 1 > Statistics: Num rows: 203,482,128 Data size: > 59416781376 Basic stats: COMPLETE Column stats: COMPLETE > Map Join Operator > condition map: > Inner Join 0 to 1 > condition expressions: > 0 {_col3} {_col4} {_col5} {_col6} > 1 > keys: > 0 _col2 (type: int) > 1 _col0 (type: int) > outputColumnNames: _col3, _col4, _col5, _col6 > input vertices: > 1 Map 2 > Statistics: Num rows: 227,514,273 Data size: > 66434167716 Basic stats: COMPLETE Column stats: COMPLETE > Select Operator > expressions: _col3 (type: int), _col4 (type: > string), _col5 (type: string), _col6 (type: float) > outputColumnNames: _col3, _col4, _col5, _col6 > Statistics: Num rows: 227514273 Data size: > 66434167716 Basic stats: COMPLETE Column stats: COMPLETE > Reduce Output Operator > key expressions: _col3 (type: int) > sort order: + > Map-reduce partition columns: _col3 (type: int) > Statistics: Num rows: 227514273 Data size: > 66434167716 Basic stats: COMPLETE Column stats: COMPLETE > value expressions: _col4 (type: string), _col5 > (type: string), _col6 (type: float) > Execution mode: vectorized > Reducer 4 > Reduce Operator Tree: > Merge Join Operator > condition map: > Inner Join 0 to 1 > condition expressions: > 0 > 1 {VALUE._col3} {VALUE._col4} {VALUE._col5} > outputColumnNames: _col5, _col6, _col7 > Statistics: Num rows: 5156859392 Data size: 1485175504896 > Basic stats: COMPLETE Column stats: COMPLETE > Select Operator > expressions: _col5 (type: string), _col6 (type: string), > _col7 (type: float) > outputColumnNames: _col0, _col1, _col2 > Statistics: Num rows: 5156859392 Data size: 1485175504896 > Basic stats: COMPLETE Column stats: COMPLETE > Group By Operator > keys: _col0 (type: string), _col1 (type: string), _col2 > (type: float) > mode: hash > outputColumnNames: _col0, _col1, _col2 > Statistics: Num rows: 5156859392 Data size: 1485175504896 > Basic stats: COMPLETE Column stats: COMPLETE > Reduce Output Operator > key expressions: _col0 (type: string), _col1 (type: > string), _col2 (type: float) > sort order: +++ > Map-reduce partition columns: _col0 (type: string), > _col1 (type: string), _col2 (type: float) > Statistics: Num rows: 5156859392 Data size: > 1485175504896 Basic stats: COMPLETE Column stats: COMPLETE > Reducer 5 > Reduce Operator Tree: > Group By Operator > keys: KEY._col0 (type: string), KEY._col1 (type: string), > KEY._col2 (type: float) > mode: mergepartial > outputColumnNames: _col0, _col1, _col2 > Statistics: Num rows: 5156859392 Data size: 1485175504896 > Basic stats: COMPLETE Column stats: COMPLETE > Select Operator > expressions: _col0 (type: string), _col1 (type: string), > _col2 (type: float) > outputColumnNames: _col0, _col1, _col2 > Statistics: Num rows: 5156859392 Data size: 1485175504896 > Basic stats: COMPLETE Column stats: COMPLETE > Reduce Output Operator > key expressions: _col0 (type: string) > sort order: + > Statistics: Num rows: 5156859392 Data size: 1485175504896 > Basic stats: COMPLETE Column stats: COMPLETE > TopN Hash Memory Usage: 0.04 > value expressions: _col1 (type: string), _col2 (type: > float) > Execution mode: vectorized > Reducer 6 > Reduce Operator Tree: > Select Operator > expressions: KEY.reducesinkkey0 (type: string), VALUE._col0 > (type: string), VALUE._col1 (type: float) > outputColumnNames: _col0, _col1, _col2 > Statistics: Num rows: 5156859392 Data size: 1485175504896 > Basic stats: COMPLETE Column stats: COMPLETE > Limit > Number of rows: 100 > Statistics: Num rows: 100 Data size: 28800 Basic stats: > COMPLETE Column stats: COMPLETE > File Output Operator > compressed: false > Statistics: Num rows: 100 Data size: 28800 Basic stats: > COMPLETE Column stats: COMPLETE > table: > input format: org.apache.hadoop.mapred.TextInputFormat > output format: > org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat > serde: > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe > Execution mode: vectorized > Stage: Stage-0 > Fetch Operator > limit: 100 > Processor Tree: > ListSink > {code} > Actual rows counts > {code} > VERTICES TOTAL_TASKS FAILED_ATTEMPTS KILLED_TASKS DURATION_SECONDS > CPU_TIME_MILLIS GC_TIME_MILLIS INPUT_RECORDS OUTPUT_RECORDS > Map 1 1 0 0 1.01 > 1,280 0 462,000 65 > Map 2 1 0 0 0.41 > 400 23 10,000 62 > Map 3 2574 0 0 947.79 > 442,220,640 1,887,714 82,510,879,939 82,510,879,939 > Map 7 9 0 0 869.22 > 42,490 1,215 56,133,127 3,081 > Reducer 4 1009 0 0 389.75 > 69,471,510 1,149,529 82,510,883,020 33 > Reducer 5 253 0 0 93.73 > 938,930 26,150 33 33 > Reducer 6 1 0 0 2.08 > 730 10 33 33 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)