asko created HIVE-9821:
--------------------------

Summary: The explain command produces the same physical execution plan with CBO disabled and with CBO enabled
Key: HIVE-9821
URL: https://issues.apache.org/jira/browse/HIVE-9821
Project: Hive
Issue Type: Bug
Components: CBO
Affects Versions: 0.14.0
Reporter: asko
Priority: Critical

bq. Test case (the JOIN subtree had been flattened after CBO, in the final plan stage of the Calcite optimizer):
{quote}
--set hive.cbo.enable=true;
--ANALYZE TABLE customer COMPUTE STATISTICS for columns;
--ANALYZE TABLE orders COMPUTE STATISTICS for columns;
--ANALYZE TABLE lineitem COMPUTE STATISTICS for columns;
--ANALYZE TABLE region COMPUTE STATISTICS for columns;
--ANALYZE TABLE supplier COMPUTE STATISTICS for columns;
--ANALYZE TABLE partsupp COMPUTE STATISTICS for columns;
--ANALYZE TABLE part COMPUTE STATISTICS for columns;
--ANALYZE TABLE nation COMPUTE STATISTICS for columns;
explain
select
  o_year,
  sum(case when nation = 'BRAZIL' then volume else 0.0 end) / sum(volume) as mkt_share
from (
  select
    year(o_orderdate) as o_year,
    l_extendedprice * (1-l_discount) as volume,
    n2.n_name as nation
  from nation n1
  join region r on n1.n_regionkey = r.r_regionkey and r.r_name = 'AMERICA'
  join customer c on c.c_nationkey = n1.n_nationkey
  join orders o on c.c_custkey = o.o_custkey
  join lineitem l on l.l_orderkey = o.o_orderkey
    and o.o_orderdate >= '1995-01-01' and o.o_orderdate < '1996-12-31'
  join part p on p.p_partkey = l.l_partkey and p.p_type = 'ECONOMY ANODIZED STEEL'
  join supplier s on s.s_suppkey = l.l_suppkey
  join nation n2 on s.s_nationkey = n2.n_nationkey
) all_nation
group by o_year
order by o_year;
{quote}
bq. This test is a modified version of q8 from TPC-H_full. Uncommenting the commented-out lines enables CBO.
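The comparison described here (one EXPLAIN run with the leading lines commented out, one with them uncommented) can be checked mechanically. The following is only a minimal sketch, not part of Hive: it assumes the two EXPLAIN outputs have already been captured as strings by whatever means, and the helper names normalize_plan, diff_plans and plans_identical are hypothetical.

```python
import difflib
from typing import List


def normalize_plan(text: str) -> List[str]:
    """Drop blank lines and trailing whitespace so only real plan changes remain."""
    return [line.rstrip() for line in text.splitlines() if line.strip()]


def diff_plans(plan_cbo_off: str, plan_cbo_on: str) -> List[str]:
    """Return a unified diff between the two EXPLAIN outputs (empty if identical)."""
    a = normalize_plan(plan_cbo_off)
    b = normalize_plan(plan_cbo_on)
    return list(
        difflib.unified_diff(a, b, fromfile="cbo_off", tofile="cbo_on", lineterm="")
    )


def plans_identical(plan_cbo_off: str, plan_cbo_on: str) -> bool:
    """True when the two plans match after normalization."""
    return not diff_plans(plan_cbo_off, plan_cbo_on)
```

An empty diff corresponds to the situation reported in this issue: the plan text does not change when CBO is switched on.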
The results of the two runs are the same:
{quote} STAGE DEPENDENCIES: Stage-1 is a root stage Stage-2 depends on stages: Stage-1, Stage-7 Stage-3 depends on stages: Stage-2, Stage-10 Stage-4 depends on stages: Stage-3 Stage-5 depends on stages: Stage-4 Stage-7 is a root stage Stage-9 is a root stage Stage-10 depends on stages: Stage-9, Stage-12 Stage-12 is a root stage Stage-0 depends on stages: Stage-5 STAGE PLANS: Stage: Stage-1 Map Reduce Map Operator Tree: TableScan alias: l Statistics: Num rows: 27137974 Data size: 759863296 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: ((l_partkey is not null and l_suppkey is not null) and l_orderkey is not null) (type: boolean) Statistics: Num rows: 3392247 Data size: 94982919 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: l_orderkey (type: int), l_partkey (type: int), l_suppkey (type: int), l_extendedprice (type: double), l_discount (type: double) outputColumnNames: _col0, _col1, _col2, _col3, _col4 Statistics: Num rows: 3392247 Data size: 94982919 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator key expressions: _col1 (type: int) sort order: + Map-reduce partition columns: _col1 (type: int) Statistics: Num rows: 3392247 Data size: 94982919 Basic stats: COMPLETE Column stats: NONE value expressions: _col0 (type: int), _col2 (type: int), _col3 (type: double), _col4 (type: double) TableScan alias: p Statistics: Num rows: 928322 Data size: 24136384 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: ((p_type = 'ECONOMY ANODIZED STEEL') and p_partkey is not null) (type: boolean) Statistics: Num rows: 232081 Data size: 6034109 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: p_partkey (type: int) outputColumnNames: _col0 Statistics: Num rows: 232081 Data size: 6034109 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator key expressions: _col0 (type: int) sort order: + Map-reduce partition columns: _col0 (type: int) Statistics: Num 
rows: 232081 Data size: 6034109 Basic stats: COMPLETE Column stats: NONE Reduce Operator Tree: Join Operator condition map: Inner Join 0 to 1 keys: 0 _col1 (type: int) 1 _col0 (type: int) outputColumnNames: _col0, _col2, _col3, _col4 Statistics: Num rows: 3731471 Data size: 104481213 Basic stats: COMPLETE Column stats: NONE File Output Operator compressed: false table: input format: org.apache.hadoop.mapred.SequenceFileInputFormat output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe Stage: Stage-2 Map Reduce Map Operator Tree: TableScan Reduce Output Operator key expressions: _col2 (type: int) sort order: + Map-reduce partition columns: _col2 (type: int) Statistics: Num rows: 3731471 Data size: 104481213 Basic stats: COMPLETE Column stats: NONE value expressions: _col0 (type: int), _col3 (type: double), _col4 (type: double) TableScan Reduce Output Operator key expressions: _col0 (type: int) sort order: + Map-reduce partition columns: _col0 (type: int) Statistics: Num rows: 48474 Data size: 387799 Basic stats: COMPLETE Column stats: NONE value expressions: _col3 (type: string) Reduce Operator Tree: Join Operator condition map: Inner Join 0 to 1 keys: 0 _col2 (type: int) 1 _col0 (type: int) outputColumnNames: _col0, _col3, _col4, _col10 Statistics: Num rows: 4104618 Data size: 114929336 Basic stats: COMPLETE Column stats: NONE File Output Operator compressed: false table: input format: org.apache.hadoop.mapred.SequenceFileInputFormat output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe Stage: Stage-3 Map Reduce Map Operator Tree: TableScan Reduce Output Operator key expressions: _col0 (type: int) sort order: + Map-reduce partition columns: _col0 (type: int) Statistics: Num rows: 4104618 Data size: 114929336 Basic stats: COMPLETE Column stats: NONE value expressions: _col3 (type: double), _col4 (type: 
double), _col10 (type: string) TableScan Reduce Output Operator key expressions: _col0 (type: int) sort order: + Map-reduce partition columns: _col0 (type: int) Statistics: Num rows: 920636 Data size: 7365096 Basic stats: COMPLETE Column stats: NONE value expressions: _col2 (type: string) Reduce Operator Tree: Join Operator condition map: Inner Join 0 to 1 keys: 0 _col0 (type: int) 1 _col0 (type: int) outputColumnNames: _col3, _col4, _col10, _col13 Statistics: Num rows: 4515079 Data size: 126422272 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: year(_col13) (type: int), (_col3 * (1.0 - _col4)) (type: double), _col10 (type: string) outputColumnNames: _col0, _col1, _col2 Statistics: Num rows: 4515079 Data size: 126422272 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: _col0 (type: int), CASE WHEN ((_col2 = 'BRAZIL')) THEN (_col1) ELSE (0.0) END (type: double), _col1 (type: double) outputColumnNames: _col0, _col1, _col2 Statistics: Num rows: 4515079 Data size: 126422272 Basic stats: COMPLETE Column stats: NONE Group By Operator aggregations: sum(_col1), sum(_col2) keys: _col0 (type: int) mode: hash outputColumnNames: _col0, _col1, _col2 Statistics: Num rows: 4515079 Data size: 126422272 Basic stats: COMPLETE Column stats: NONE File Output Operator compressed: false table: input format: org.apache.hadoop.mapred.SequenceFileInputFormat output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe Stage: Stage-4 Map Reduce Map Operator Tree: TableScan Reduce Output Operator key expressions: _col0 (type: int) sort order: + Map-reduce partition columns: _col0 (type: int) Statistics: Num rows: 4515079 Data size: 126422272 Basic stats: COMPLETE Column stats: NONE value expressions: _col1 (type: double), _col2 (type: double) Reduce Operator Tree: Group By Operator aggregations: sum(VALUE._col0), sum(VALUE._col1) keys: KEY._col0 (type: int) mode: mergepartial 
outputColumnNames: _col0, _col1, _col2 Statistics: Num rows: 2257539 Data size: 63211121 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: _col0 (type: int), (_col1 / _col2) (type: double) outputColumnNames: _col0, _col1 Statistics: Num rows: 2257539 Data size: 63211121 Basic stats: COMPLETE Column stats: NONE File Output Operator compressed: false table: input format: org.apache.hadoop.mapred.SequenceFileInputFormat output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe Stage: Stage-5 Map Reduce Map Operator Tree: TableScan Reduce Output Operator key expressions: _col0 (type: int) sort order: + Statistics: Num rows: 2257539 Data size: 63211121 Basic stats: COMPLETE Column stats: NONE value expressions: _col1 (type: double) Reduce Operator Tree: Select Operator expressions: KEY.reducesinkkey0 (type: int), VALUE._col0 (type: double) outputColumnNames: _col0, _col1 Statistics: Num rows: 2257539 Data size: 63211121 Basic stats: COMPLETE Column stats: NONE File Output Operator compressed: false Statistics: Num rows: 2257539 Data size: 63211121 Basic stats: COMPLETE Column stats: NONE table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe Stage: Stage-7 Map Reduce Map Operator Tree: TableScan alias: s Statistics: Num rows: 176269 Data size: 1410156 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (s_nationkey is not null and s_suppkey is not null) (type: boolean) Statistics: Num rows: 44068 Data size: 352545 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: s_suppkey (type: int), s_nationkey (type: int) outputColumnNames: _col0, _col1 Statistics: Num rows: 44068 Data size: 352545 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator key expressions: _col1 (type: int) sort order: + 
Map-reduce partition columns: _col1 (type: int) Statistics: Num rows: 44068 Data size: 352545 Basic stats: COMPLETE Column stats: NONE value expressions: _col0 (type: int) TableScan alias: n1 Statistics: Num rows: 173 Data size: 18037 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: n_nationkey is not null (type: boolean) Statistics: Num rows: 87 Data size: 9070 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: n_nationkey (type: int), n_name (type: string) outputColumnNames: _col0, _col1 Statistics: Num rows: 87 Data size: 9070 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator key expressions: _col0 (type: int) sort order: + Map-reduce partition columns: _col0 (type: int) Statistics: Num rows: 87 Data size: 9070 Basic stats: COMPLETE Column stats: NONE value expressions: _col1 (type: string) Reduce Operator Tree: Join Operator condition map: Inner Join 0 to 1 keys: 0 _col1 (type: int) 1 _col0 (type: int) outputColumnNames: _col0, _col3 Statistics: Num rows: 48474 Data size: 387799 Basic stats: COMPLETE Column stats: NONE File Output Operator compressed: false table: input format: org.apache.hadoop.mapred.SequenceFileInputFormat output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe Stage: Stage-9 Map Reduce Map Operator Tree: TableScan alias: o Statistics: Num rows: 1592161 Data size: 171953408 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: ((((o_orderdate >= '1995-01-01') and (o_orderdate < '1996-12-31')) and o_custkey is not null) and o_orderkey is not null) (type: boolean) Statistics: Num rows: 44227 Data size: 4776516 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: o_orderkey (type: int), o_custkey (type: int), o_orderdate (type: string) outputColumnNames: _col0, _col1, _col2 Statistics: Num rows: 44227 Data size: 4776516 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator 
key expressions: _col1 (type: int) sort order: + Map-reduce partition columns: _col1 (type: int) Statistics: Num rows: 44227 Data size: 4776516 Basic stats: COMPLETE Column stats: NONE value expressions: _col0 (type: int), _col2 (type: string) TableScan alias: c Statistics: Num rows: 3043428 Data size: 24347428 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (c_custkey is not null and c_nationkey is not null) (type: boolean) Statistics: Num rows: 760857 Data size: 6086857 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: c_custkey (type: int), c_nationkey (type: int) outputColumnNames: _col0, _col1 Statistics: Num rows: 760857 Data size: 6086857 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator key expressions: _col0 (type: int) sort order: + Map-reduce partition columns: _col0 (type: int) Statistics: Num rows: 760857 Data size: 6086857 Basic stats: COMPLETE Column stats: NONE value expressions: _col1 (type: int) Reduce Operator Tree: Join Operator condition map: Inner Join 0 to 1 keys: 0 _col1 (type: int) 1 _col0 (type: int) outputColumnNames: _col0, _col2, _col4 Statistics: Num rows: 836942 Data size: 6695542 Basic stats: COMPLETE Column stats: NONE File Output Operator compressed: false table: input format: org.apache.hadoop.mapred.SequenceFileInputFormat output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe Stage: Stage-10 Map Reduce Map Operator Tree: TableScan Reduce Output Operator key expressions: _col4 (type: int) sort order: + Map-reduce partition columns: _col4 (type: int) Statistics: Num rows: 836942 Data size: 6695542 Basic stats: COMPLETE Column stats: NONE value expressions: _col0 (type: int), _col2 (type: string) TableScan Reduce Output Operator key expressions: _col0 (type: int) sort order: + Map-reduce partition columns: _col0 (type: int) Statistics: Num rows: 620 Data size: 4964 Basic stats: COMPLETE Column 
stats: NONE Reduce Operator Tree: Join Operator condition map: Inner Join 0 to 1 keys: 0 _col4 (type: int) 1 _col0 (type: int) outputColumnNames: _col0, _col2 Statistics: Num rows: 920636 Data size: 7365096 Basic stats: COMPLETE Column stats: NONE File Output Operator compressed: false table: input format: org.apache.hadoop.mapred.SequenceFileInputFormat output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe Stage: Stage-12 Map Reduce Map Operator Tree: TableScan alias: n1 Statistics: Num rows: 2254 Data size: 18037 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (n_regionkey is not null and n_nationkey is not null) (type: boolean) Statistics: Num rows: 564 Data size: 4513 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: n_nationkey (type: int), n_regionkey (type: int) outputColumnNames: _col0, _col1 Statistics: Num rows: 564 Data size: 4513 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator key expressions: _col1 (type: int) sort order: + Map-reduce partition columns: _col1 (type: int) Statistics: Num rows: 564 Data size: 4513 Basic stats: COMPLETE Column stats: NONE value expressions: _col0 (type: int) TableScan alias: r Statistics: Num rows: 63 Data size: 695 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: ((r_name = 'AMERICA') and r_regionkey is not null) (type: boolean) Statistics: Num rows: 16 Data size: 176 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: r_regionkey (type: int) outputColumnNames: _col0 Statistics: Num rows: 16 Data size: 176 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator key expressions: _col0 (type: int) sort order: + Map-reduce partition columns: _col0 (type: int) Statistics: Num rows: 16 Data size: 176 Basic stats: COMPLETE Column stats: NONE Reduce Operator Tree: Join Operator condition map: Inner Join 0 to 1 keys: 0 _col1 (type: int) 1 _col0 
(type: int) outputColumnNames: _col0 Statistics: Num rows: 620 Data size: 4964 Basic stats: COMPLETE Column stats: NONE File Output Operator compressed: false table: input format: org.apache.hadoop.mapred.SequenceFileInputFormat output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe Stage: Stage-0 Fetch Operator limit: -1 Processor Tree: ListSink
{quote}
bq. Is CBO invalid?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)