Fang-Yu Rao created IMPALA-13490:
------------------------------------
Summary: TpcdsCpuCostPlannerTest#testNonTpcdsDdl() could fail after IMPALA-13469
Key: IMPALA-13490
URL: https://issues.apache.org/jira/browse/IMPALA-13490
Project: IMPALA
Issue Type: Bug
Components: Frontend
Affects Versions: Impala 4.5.0
Reporter: Fang-Yu Rao
Assignee: Riza Suminto
We found that testNonTpcdsDdl() in
[TpcdsCpuCostPlannerTest.java|https://github.com/apache/impala/blob/master/fe/src/test/java/org/apache/impala/planner/TpcdsCpuCostPlannerTest.java]
can fail after IMPALA-13469 with the following error.
In the single-node plan, the expected 'segment-costs' value does not match the
actual one: [8689789, 17851, 3700630] expected versus
[8689789, 272154, 4822204] actual.
+*Error Message*+
{code:java}
Section PLAN of query at line 651:
create table t partitioned by (c_nationkey) sort by (c_custkey) as
select c_custkey, max(o_totalprice) as maxprice, c_nationkey
from tpch.orders join tpch.customer on c_custkey = o_custkey
where c_nationkey < 10
group by c_custkey, c_nationkey
Actual does not match expected result:
Max Per-Host Resource Reservation: Memory=19.44MB Threads=1
Per-Host Resource Estimates: Memory=35MB
F00:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
| Per-Instance Resources: mem-estimate=34.94MB mem-reservation=19.44MB
thread-reservation=1 runtime-filters-memory=1.00MB
| max-parallelism=1 segment-costs=[8689789, 272154, 4822204]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
WRITE TO HDFS [tpcds_partitioned_parquet_snap.t, OVERWRITE=false,
PARTITION-KEYS=(c_nationkey)]
| partitions=25
| output exprs: c_custkey, max(o_totalprice), c_nationkey
| mem-estimate=100.00KB mem-reservation=0B thread-reservation=0 cost=4822204
|
04:SORT
| order by: c_nationkey ASC NULLS LAST, c_custkey ASC NULLS LAST
| mem-estimate=6.00MB mem-reservation=6.00MB spill-buffer=2.00MB
thread-reservation=0
| tuple-ids=3 row-size=18B cardinality=228.68K cost=272154
| in pipelines: 04(GETNEXT), 03(OPEN)
|
03:AGGREGATE [FINALIZE]
| output: max(o_totalprice)
| group by: c_custkey, c_nationkey
| mem-estimate=10.00MB mem-reservation=8.50MB spill-buffer=512.00KB
thread-reservation=0
| tuple-ids=2 row-size=18B cardinality=228.68K cost=1349818
| in pipelines: 03(GETNEXT), 00(OPEN)
|
02:HASH JOIN [INNER JOIN]
| hash predicates: o_custkey = c_custkey
| fk/pk conjuncts: o_custkey = c_custkey
| runtime filters: RF000[bloom] <- c_custkey
| mem-estimate=1.94MB mem-reservation=1.94MB spill-buffer=64.00KB
thread-reservation=0
| tuple-ids=0,1 row-size=26B cardinality=228.68K cost=441187
| in pipelines: 00(GETNEXT), 01(OPEN)
|
|--01:SCAN HDFS [tpch.customer]
| HDFS partitions=1/1 files=1 size=23.08MB
| predicates: c_nationkey < CAST(10 AS SMALLINT)
| stored statistics:
| table: rows=150.00K size=23.08MB
| columns: all
| extrapolated-rows=disabled max-scan-range-rows=150.00K
| mem-estimate=16.00MB mem-reservation=8.00MB thread-reservation=0
| tuple-ids=1 row-size=10B cardinality=15.00K cost=864778
| in pipelines: 01(GETNEXT)
|
00:SCAN HDFS [tpch.orders]
HDFS partitions=1/1 files=1 size=162.56MB
runtime filters: RF000[bloom] -> o_custkey
stored statistics:
table: rows=1.50M size=162.56MB
columns: all
extrapolated-rows=disabled max-scan-range-rows=1.18M
mem-estimate=16.00MB mem-reservation=8.00MB thread-reservation=0
tuple-ids=0 row-size=16B cardinality=1.50M cost=6034006
in pipelines: 00(GETNEXT)
Expected:
Max Per-Host Resource Reservation: Memory=19.44MB Threads=1
Per-Host Resource Estimates: Memory=35MB
F00:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
| Per-Instance Resources: mem-estimate=34.94MB mem-reservation=19.44MB
thread-reservation=1 runtime-filters-memory=1.00MB
| max-parallelism=1 segment-costs=[8689789, 17851, 3700630]
WRITE TO HDFS [tpcds_partitioned_parquet_snap.t, OVERWRITE=false,
PARTITION-KEYS=(c_nationkey)]
| partitions=25
| output exprs: c_custkey, max(o_totalprice), c_nationkey
| mem-estimate=100.00KB mem-reservation=0B thread-reservation=0 cost=3700630
|
04:SORT
| order by: c_nationkey ASC NULLS LAST, c_custkey ASC NULLS LAST
| mem-estimate=6.00MB mem-reservation=6.00MB spill-buffer=2.00MB
thread-reservation=0
| tuple-ids=3 row-size=18B cardinality=15.00K cost=17851
| in pipelines: 04(GETNEXT), 03(OPEN)
|
03:AGGREGATE [FINALIZE]
| output: max(o_totalprice)
| group by: c_custkey, c_nationkey
| mem-estimate=10.00MB mem-reservation=8.50MB spill-buffer=512.00KB
thread-reservation=0
| tuple-ids=2 row-size=18B cardinality=15.00K cost=1349818
| in pipelines: 03(GETNEXT), 00(OPEN)
|
02:HASH JOIN [INNER JOIN]
| hash predicates: o_custkey = c_custkey
| fk/pk conjuncts: o_custkey = c_custkey
| runtime filters: RF000[bloom] <- c_custkey
| mem-estimate=1.94MB mem-reservation=1.94MB spill-buffer=64.00KB
thread-reservation=0
| tuple-ids=0,1 row-size=26B cardinality=228.68K cost=441187
| in pipelines: 00(GETNEXT), 01(OPEN)
|
|--01:SCAN HDFS [tpch.customer]
| HDFS partitions=1/1 files=1 size=23.08MB
| predicates: c_nationkey < CAST(10 AS SMALLINT)
| stored statistics:
| table: rows=150.00K size=23.08MB
| columns: all
| extrapolated-rows=disabled max-scan-range-rows=150.00K
| mem-estimate=16.00MB mem-reservation=8.00MB thread-reservation=0
| tuple-ids=1 row-size=10B cardinality=15.00K cost=864778
| in pipelines: 01(GETNEXT)
|
00:SCAN HDFS [tpch.orders]
HDFS partitions=1/1 files=1 size=162.56MB
runtime filters: RF000[bloom] -> o_custkey
stored statistics:
table: rows=1.50M size=162.56MB
columns: all
extrapolated-rows=disabled max-scan-range-rows=1.18M
mem-estimate=16.00MB mem-reservation=8.00MB thread-reservation=0
tuple-ids=0 row-size=16B cardinality=1.50M cost=6034006
in pipelines: 00(GETNEXT)
{code}
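The node costs printed in the two single-node plans make the mismatch easy to localize. The sketch below is only a reading of those numbers, not of the planner code. It assumes (inferred from the output, not confirmed against the Impala source) that the first segment is the sum of the costs of nodes 00 to 03, the second is the 04:SORT cost, and the third is the HDFS write cost. All names in the snippet are made up for illustration.

```python
# Node costs copied from the single-node plans in the error message.
# These four nodes report the same cost in the actual and expected plans.
SHARED_COSTS = {
    "00:SCAN HDFS [tpch.orders]": 6034006,
    "01:SCAN HDFS [tpch.customer]": 864778,
    "02:HASH JOIN": 441187,
    "03:AGGREGATE": 1349818,
}

ACTUAL = {
    "segment_costs": [8689789, 272154, 4822204],
    "sort": 272154,    # 04:SORT, cardinality=228.68K
    "write": 4822204,  # WRITE TO HDFS
}
EXPECTED = {
    "segment_costs": [8689789, 17851, 3700630],
    "sort": 17851,     # 04:SORT, cardinality=15.00K
    "write": 3700630,  # WRITE TO HDFS
}


def rebuild_segments(plan):
    """Rebuild segment-costs under the ASSUMED grouping described in the text."""
    return [sum(SHARED_COSTS.values()), plan["sort"], plan["write"]]


def differing_segments(actual, expected):
    """Return the indexes at which two segment-costs lists disagree."""
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]


if __name__ == "__main__":
    for name, plan in (("actual", ACTUAL), ("expected", EXPECTED)):
        print(name, rebuild_segments(plan) == plan["segment_costs"])
    print(differing_segments(ACTUAL["segment_costs"], EXPECTED["segment_costs"]))
```

Under that assumption the rebuilt lists match the printed ones, and only the second and third segments differ. Both sit downstream of 03:AGGREGATE, whose cardinality is 228.68K in the actual plan and 15.00K in the expected one.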
Moreover, in the distributed plan the 'Memory' value of 'Max Per-Host Resource
Reservation' does not match the expected one: 35.69MB actual versus 32.88MB
expected.
{code}
Section DISTRIBUTEDPLAN of query at line 651:
create table t partitioned by (c_nationkey) sort by (c_custkey) as
select c_custkey, max(o_totalprice) as maxprice, c_nationkey
from tpch.orders join tpch.customer on c_custkey = o_custkey
where c_nationkey < 10
group by c_custkey, c_nationkey
Actual does not match expected result:
Max Per-Host Resource Reservation: Memory=35.69MB Threads=5
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Per-Host Resource Estimates: Memory=66MB
F03:PLAN FRAGMENT [HASH(c_nationkey)] hosts=2 instances=2
| Per-Instance Resources: mem-estimate=8.01MB mem-reservation=6.00MB
thread-reservation=1
| max-parallelism=2 segment-costs=[316495, 4822204] cpu-comparison-result=3
[max(2 (self) vs 3 (sum children))]
WRITE TO HDFS [tpcds_partitioned_parquet_snap.t, OVERWRITE=false,
PARTITION-KEYS=(c_nationkey)]
| partitions=25
| output exprs: c_custkey, max(o_totalprice), c_nationkey
| mem-estimate=100.00KB mem-reservation=0B thread-reservation=0 cost=4822204
|
08:SORT
| order by: c_nationkey ASC NULLS LAST, c_custkey ASC NULLS LAST
| mem-estimate=6.00MB mem-reservation=6.00MB spill-buffer=2.00MB
thread-reservation=0
| tuple-ids=3 row-size=18B cardinality=228.68K cost=272154
| in pipelines: 08(GETNEXT), 06(OPEN)
|
07:EXCHANGE [HASH(c_nationkey)]
| mem-estimate=2.01MB mem-reservation=0B thread-reservation=0
| tuple-ids=2 row-size=18B cardinality=228.68K cost=44341
| in pipelines: 06(GETNEXT)
|
F02:PLAN FRAGMENT [HASH(c_custkey,c_nationkey)] hosts=2 instances=2
Per-Instance Resources: mem-estimate=12.01MB mem-reservation=4.75MB
thread-reservation=1
max-parallelism=2 segment-costs=[1394159, 382905] cpu-comparison-result=3
[max(2 (self) vs 3 (sum children))]
06:AGGREGATE [FINALIZE]
| output: max:merge(o_totalprice)
| group by: c_custkey, c_nationkey
| mem-estimate=10.00MB mem-reservation=4.75MB spill-buffer=256.00KB
thread-reservation=0
| tuple-ids=2 row-size=18B cardinality=228.68K cost=1349818
| in pipelines: 06(GETNEXT), 00(OPEN)
|
05:EXCHANGE [HASH(c_custkey,c_nationkey)]
| mem-estimate=2.01MB mem-reservation=0B thread-reservation=0
| tuple-ids=2 row-size=18B cardinality=228.68K cost=44341
| in pipelines: 00(GETNEXT)
|
F00:PLAN FRAGMENT [RANDOM] hosts=2 instances=2
Per-Host Shared Resources: mem-estimate=1.00MB mem-reservation=1.00MB
thread-reservation=0 runtime-filters-memory=1.00MB
Per-Instance Resources: mem-estimate=26.17MB mem-reservation=13.00MB
thread-reservation=1
max-parallelism=2 segment-costs=[7810011, 382905] cpu-comparison-result=3
[max(2 (self) vs 3 (sum children))]
03:AGGREGATE [STREAMING]
| output: max(o_totalprice)
| group by: c_custkey, c_nationkey
| mem-estimate=10.00MB mem-reservation=5.00MB spill-buffer=256.00KB
thread-reservation=0
| tuple-ids=2 row-size=18B cardinality=228.68K cost=1349818
| in pipelines: 00(GETNEXT)
|
02:HASH JOIN [INNER JOIN, BROADCAST]
| hash-table-id=00
| hash predicates: o_custkey = c_custkey
| fk/pk conjuncts: o_custkey = c_custkey
| mem-estimate=0B mem-reservation=0B spill-buffer=64.00KB thread-reservation=0
| tuple-ids=0,1 row-size=26B cardinality=228.68K cost=426187
| in pipelines: 00(GETNEXT), 01(OPEN)
|
|--F04:PLAN FRAGMENT [RANDOM] hosts=2 instances=2
| | Per-Instance Resources: mem-estimate=3.09MB mem-reservation=2.94MB
thread-reservation=1 runtime-filters-memory=1.00MB
| | max-parallelism=2 segment-costs=[18986]
| JOIN BUILD
| | join-table-id=00 plan-id=01 cohort-id=01
| | build expressions: c_custkey
| | runtime filters: RF000[bloom] <- c_custkey
| | mem-estimate=1.94MB mem-reservation=1.94MB spill-buffer=64.00KB
thread-reservation=0 cost=15000
| |
| 04:EXCHANGE [BROADCAST]
| | mem-estimate=160.48KB mem-reservation=0B thread-reservation=0
| | tuple-ids=1 row-size=10B cardinality=15.00K cost=3986
| | in pipelines: 01(GETNEXT)
| |
| F01:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
| Per-Instance Resources: mem-estimate=16.05MB mem-reservation=8.00MB
thread-reservation=1
| max-parallelism=1 segment-costs=[865507]
| 01:SCAN HDFS [tpch.customer, RANDOM]
| HDFS partitions=1/1 files=1 size=23.08MB
| predicates: c_nationkey < CAST(10 AS SMALLINT)
| stored statistics:
| table: rows=150.00K size=23.08MB
| columns: all
| extrapolated-rows=disabled max-scan-range-rows=150.00K
| mem-estimate=16.00MB mem-reservation=8.00MB thread-reservation=0
| tuple-ids=1 row-size=10B cardinality=15.00K cost=864778
| in pipelines: 01(GETNEXT)
|
00:SCAN HDFS [tpch.orders, RANDOM]
HDFS partitions=1/1 files=1 size=162.56MB
runtime filters: RF000[bloom] -> o_custkey
stored statistics:
table: rows=1.50M size=162.56MB
columns: all
extrapolated-rows=disabled max-scan-range-rows=1.18M
mem-estimate=16.00MB mem-reservation=8.00MB thread-reservation=0
tuple-ids=0 row-size=16B cardinality=1.50M cost=6034006
in pipelines: 00(GETNEXT)
Expected:
Max Per-Host Resource Reservation: Memory=32.88MB Threads=5
Per-Host Resource Estimates: Memory=63MB
F03:PLAN FRAGMENT [HASH(c_nationkey)] hosts=2 instances=2
| Per-Instance Resources: mem-estimate=6.17MB mem-reservation=6.00MB
thread-reservation=1
| max-parallelism=2 segment-costs=[20759, 3700630] cpu-comparison-result=3
[max(2 (self) vs 3 (sum children))]
WRITE TO HDFS [tpcds_partitioned_parquet_snap.t, OVERWRITE=false,
PARTITION-KEYS=(c_nationkey)]
| partitions=25
| output exprs: c_custkey, max(o_totalprice), c_nationkey
| mem-estimate=100.00KB mem-reservation=0B thread-reservation=0 cost=3700630
|
08:SORT
| order by: c_nationkey ASC NULLS LAST, c_custkey ASC NULLS LAST
| mem-estimate=6.00MB mem-reservation=6.00MB spill-buffer=2.00MB
thread-reservation=0
| tuple-ids=3 row-size=18B cardinality=15.00K cost=17851
| in pipelines: 08(GETNEXT), 06(OPEN)
|
07:EXCHANGE [HASH(c_nationkey)]
| mem-estimate=175.84KB mem-reservation=0B thread-reservation=0
| tuple-ids=2 row-size=18B cardinality=15.00K cost=2908
| in pipelines: 06(GETNEXT)
|
F02:PLAN FRAGMENT [HASH(c_custkey,c_nationkey)] hosts=2 instances=2
Per-Instance Resources: mem-estimate=10.17MB mem-reservation=1.94MB
thread-reservation=1
max-parallelism=2 segment-costs=[91447, 25116] cpu-comparison-result=3
[max(2 (self) vs 3 (sum children))]
06:AGGREGATE [FINALIZE]
| output: max:merge(o_totalprice)
| group by: c_custkey, c_nationkey
| mem-estimate=10.00MB mem-reservation=1.94MB spill-buffer=64.00KB
thread-reservation=0
| tuple-ids=2 row-size=18B cardinality=15.00K cost=88539
| in pipelines: 06(GETNEXT), 00(OPEN)
|
05:EXCHANGE [HASH(c_custkey,c_nationkey)]
| mem-estimate=175.84KB mem-reservation=0B thread-reservation=0
| tuple-ids=2 row-size=18B cardinality=15.00K cost=2908
| in pipelines: 00(GETNEXT)
|
F00:PLAN FRAGMENT [RANDOM] hosts=2 instances=2
Per-Host Shared Resources: mem-estimate=1.00MB mem-reservation=1.00MB
thread-reservation=0 runtime-filters-memory=1.00MB
Per-Instance Resources: mem-estimate=26.17MB mem-reservation=13.00MB
thread-reservation=1
max-parallelism=2 segment-costs=[7810011, 25116] cpu-comparison-result=3
[max(2 (self) vs 3 (sum children))]
03:AGGREGATE [STREAMING]
| output: max(o_totalprice)
| group by: c_custkey, c_nationkey
| mem-estimate=10.00MB mem-reservation=5.00MB spill-buffer=256.00KB
thread-reservation=0
| tuple-ids=2 row-size=18B cardinality=15.00K cost=1349818
| in pipelines: 00(GETNEXT)
|
02:HASH JOIN [INNER JOIN, BROADCAST]
| hash-table-id=00
| hash predicates: o_custkey = c_custkey
| fk/pk conjuncts: o_custkey = c_custkey
| mem-estimate=0B mem-reservation=0B spill-buffer=64.00KB thread-reservation=0
| tuple-ids=0,1 row-size=26B cardinality=228.68K cost=426187
| in pipelines: 00(GETNEXT), 01(OPEN)
|
|--F04:PLAN FRAGMENT [RANDOM] hosts=2 instances=2
| | Per-Instance Resources: mem-estimate=3.09MB mem-reservation=2.94MB
thread-reservation=1 runtime-filters-memory=1.00MB
| | max-parallelism=2 segment-costs=[18986]
| JOIN BUILD
| | join-table-id=00 plan-id=01 cohort-id=01
| | build expressions: c_custkey
| | runtime filters: RF000[bloom] <- c_custkey
| | mem-estimate=1.94MB mem-reservation=1.94MB spill-buffer=64.00KB
thread-reservation=0 cost=15000
| |
| 04:EXCHANGE [BROADCAST]
| | mem-estimate=160.48KB mem-reservation=0B thread-reservation=0
| | tuple-ids=1 row-size=10B cardinality=15.00K cost=3986
| | in pipelines: 01(GETNEXT)
| |
| F01:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
| Per-Instance Resources: mem-estimate=16.05MB mem-reservation=8.00MB
thread-reservation=1
| max-parallelism=1 segment-costs=[865507]
| 01:SCAN HDFS [tpch.customer, RANDOM]
| HDFS partitions=1/1 files=1 size=23.08MB
| predicates: c_nationkey < CAST(10 AS SMALLINT)
| stored statistics:
| table: rows=150.00K size=23.08MB
| columns: all
| extrapolated-rows=disabled max-scan-range-rows=150.00K
| mem-estimate=16.00MB mem-reservation=8.00MB thread-reservation=0
| tuple-ids=1 row-size=10B cardinality=15.00K cost=864778
| in pipelines: 01(GETNEXT)
|
00:SCAN HDFS [tpch.orders, RANDOM]
HDFS partitions=1/1 files=1 size=162.56MB
runtime filters: RF000[bloom] -> o_custkey
stored statistics:
table: rows=1.50M size=162.56MB
columns: all
extrapolated-rows=disabled max-scan-range-rows=1.18M
mem-estimate=16.00MB mem-reservation=8.00MB thread-reservation=0
tuple-ids=0 row-size=16B cardinality=1.50M cost=6034006
in pipelines: 00(GETNEXT)
{code}
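The memory difference in the distributed plan can be traced the same way. The sketch below adds up the mem-reservation values printed for each fragment. That these values sum to the 'Max Per-Host Resource Reservation' figure is an observation about this particular output, not a statement of how the planner computes it, and the names in the snippet are made up for illustration.

```python
from decimal import Decimal

# mem-reservation values (in MB) copied from the distributed plans.
# "F00 shared" is the "Per-Host Shared Resources" line of fragment F00.
ACTUAL_MB = {
    "F03": "6.00",
    "F02": "4.75",
    "F00": "13.00",
    "F00 shared": "1.00",
    "F04": "2.94",
    "F01": "8.00",
}
# The expected plan prints the same values except for fragment F02.
EXPECTED_MB = dict(ACTUAL_MB, F02="1.94")


def total_mb(reservations):
    """Add up the reservations exactly, using Decimal to avoid float rounding."""
    return sum((Decimal(v) for v in reservations.values()), Decimal("0"))


def changed_fragments(actual, expected):
    """Return {fragment: (actual, expected)} for every value that differs."""
    return {
        name: (actual[name], expected[name])
        for name in actual
        if actual[name] != expected[name]
    }


if __name__ == "__main__":
    print(total_mb(ACTUAL_MB), total_mb(EXPECTED_MB))
    print(changed_fragments(ACTUAL_MB, EXPECTED_MB))
```

The totals come out to 35.69 and 32.88, matching the two headline values. The only term that changes is F02, where 06:AGGREGATE [FINALIZE] reserves 4.75MB in the actual plan and 1.94MB in the expected one.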