[ https://issues.apache.org/jira/browse/IMPALA-13490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17892878#comment-17892878 ]
Fang-Yu Rao commented on IMPALA-13490:
--------------------------------------
Hi [~rizaon], I assigned this ticket to you since you are more familiar with this
area. Please feel free to reassign the JIRA as you see fit. Thanks!
> TpcdsCpuCostPlannerTest#testNonTpcdsDdl() could fail after IMPALA-13469
> -----------------------------------------------------------------------
>
> Key: IMPALA-13490
> URL: https://issues.apache.org/jira/browse/IMPALA-13490
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Affects Versions: Impala 4.5.0
> Reporter: Fang-Yu Rao
> Assignee: Riza Suminto
> Priority: Major
> Labels: broken-build
>
> We found that testNonTpcdsDdl() in
> [TpcdsCpuCostPlannerTest.java|https://github.com/apache/impala/blob/master/fe/src/test/java/org/apache/impala/planner/TpcdsCpuCostPlannerTest.java]
> could fail after IMPALA-13469 with the following error.
>
> It looks like the expected value of 'segment-costs' does not match the actual
> one in the single node plan.
> +*Error Message*+
> {code:java}
> Section PLAN of query at line 651:
> create table t partitioned by (c_nationkey) sort by (c_custkey) as
> select c_custkey, max(o_totalprice) as maxprice, c_nationkey
> from tpch.orders join tpch.customer on c_custkey = o_custkey
> where c_nationkey < 10
> group by c_custkey, c_nationkey
> Actual does not match expected result:
> Max Per-Host Resource Reservation: Memory=19.44MB Threads=1
> Per-Host Resource Estimates: Memory=35MB
> F00:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
> | Per-Instance Resources: mem-estimate=34.94MB mem-reservation=19.44MB
> thread-reservation=1 runtime-filters-memory=1.00MB
> | max-parallelism=1 segment-costs=[8689789, 272154, 4822204]
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> WRITE TO HDFS [tpcds_partitioned_parquet_snap.t, OVERWRITE=false,
> PARTITION-KEYS=(c_nationkey)]
> | partitions=25
> | output exprs: c_custkey, max(o_totalprice), c_nationkey
> | mem-estimate=100.00KB mem-reservation=0B thread-reservation=0 cost=4822204
> |
> 04:SORT
> | order by: c_nationkey ASC NULLS LAST, c_custkey ASC NULLS LAST
> | mem-estimate=6.00MB mem-reservation=6.00MB spill-buffer=2.00MB
> thread-reservation=0
> | tuple-ids=3 row-size=18B cardinality=228.68K cost=272154
> | in pipelines: 04(GETNEXT), 03(OPEN)
> |
> 03:AGGREGATE [FINALIZE]
> | output: max(o_totalprice)
> | group by: c_custkey, c_nationkey
> | mem-estimate=10.00MB mem-reservation=8.50MB spill-buffer=512.00KB
> thread-reservation=0
> | tuple-ids=2 row-size=18B cardinality=228.68K cost=1349818
> | in pipelines: 03(GETNEXT), 00(OPEN)
> |
> 02:HASH JOIN [INNER JOIN]
> | hash predicates: o_custkey = c_custkey
> | fk/pk conjuncts: o_custkey = c_custkey
> | runtime filters: RF000[bloom] <- c_custkey
> | mem-estimate=1.94MB mem-reservation=1.94MB spill-buffer=64.00KB
> thread-reservation=0
> | tuple-ids=0,1 row-size=26B cardinality=228.68K cost=441187
> | in pipelines: 00(GETNEXT), 01(OPEN)
> |
> |--01:SCAN HDFS [tpch.customer]
> | HDFS partitions=1/1 files=1 size=23.08MB
> | predicates: c_nationkey < CAST(10 AS SMALLINT)
> | stored statistics:
> | table: rows=150.00K size=23.08MB
> | columns: all
> | extrapolated-rows=disabled max-scan-range-rows=150.00K
> | mem-estimate=16.00MB mem-reservation=8.00MB thread-reservation=0
> | tuple-ids=1 row-size=10B cardinality=15.00K cost=864778
> | in pipelines: 01(GETNEXT)
> |
> 00:SCAN HDFS [tpch.orders]
> HDFS partitions=1/1 files=1 size=162.56MB
> runtime filters: RF000[bloom] -> o_custkey
> stored statistics:
> table: rows=1.50M size=162.56MB
> columns: all
> extrapolated-rows=disabled max-scan-range-rows=1.18M
> mem-estimate=16.00MB mem-reservation=8.00MB thread-reservation=0
> tuple-ids=0 row-size=16B cardinality=1.50M cost=6034006
> in pipelines: 00(GETNEXT)
> Expected:
> Max Per-Host Resource Reservation: Memory=19.44MB Threads=1
> Per-Host Resource Estimates: Memory=35MB
> F00:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
> | Per-Instance Resources: mem-estimate=34.94MB mem-reservation=19.44MB
> thread-reservation=1 runtime-filters-memory=1.00MB
> | max-parallelism=1 segment-costs=[8689789, 17851, 3700630]
> WRITE TO HDFS [tpcds_partitioned_parquet_snap.t, OVERWRITE=false,
> PARTITION-KEYS=(c_nationkey)]
> | partitions=25
> | output exprs: c_custkey, max(o_totalprice), c_nationkey
> | mem-estimate=100.00KB mem-reservation=0B thread-reservation=0 cost=3700630
> |
> 04:SORT
> | order by: c_nationkey ASC NULLS LAST, c_custkey ASC NULLS LAST
> | mem-estimate=6.00MB mem-reservation=6.00MB spill-buffer=2.00MB
> thread-reservation=0
> | tuple-ids=3 row-size=18B cardinality=15.00K cost=17851
> | in pipelines: 04(GETNEXT), 03(OPEN)
> |
> 03:AGGREGATE [FINALIZE]
> | output: max(o_totalprice)
> | group by: c_custkey, c_nationkey
> | mem-estimate=10.00MB mem-reservation=8.50MB spill-buffer=512.00KB
> thread-reservation=0
> | tuple-ids=2 row-size=18B cardinality=15.00K cost=1349818
> | in pipelines: 03(GETNEXT), 00(OPEN)
> |
> 02:HASH JOIN [INNER JOIN]
> | hash predicates: o_custkey = c_custkey
> | fk/pk conjuncts: o_custkey = c_custkey
> | runtime filters: RF000[bloom] <- c_custkey
> | mem-estimate=1.94MB mem-reservation=1.94MB spill-buffer=64.00KB
> thread-reservation=0
> | tuple-ids=0,1 row-size=26B cardinality=228.68K cost=441187
> | in pipelines: 00(GETNEXT), 01(OPEN)
> |
> |--01:SCAN HDFS [tpch.customer]
> | HDFS partitions=1/1 files=1 size=23.08MB
> | predicates: c_nationkey < CAST(10 AS SMALLINT)
> | stored statistics:
> | table: rows=150.00K size=23.08MB
> | columns: all
> | extrapolated-rows=disabled max-scan-range-rows=150.00K
> | mem-estimate=16.00MB mem-reservation=8.00MB thread-reservation=0
> | tuple-ids=1 row-size=10B cardinality=15.00K cost=864778
> | in pipelines: 01(GETNEXT)
> |
> 00:SCAN HDFS [tpch.orders]
> HDFS partitions=1/1 files=1 size=162.56MB
> runtime filters: RF000[bloom] -> o_custkey
> stored statistics:
> table: rows=1.50M size=162.56MB
> columns: all
> extrapolated-rows=disabled max-scan-range-rows=1.18M
> mem-estimate=16.00MB mem-reservation=8.00MB thread-reservation=0
> tuple-ids=0 row-size=16B cardinality=1.50M cost=6034006
> in pipelines: 00(GETNEXT)
> {code}
> Moreover, the expected value of 'Memory' does not match the actual one in the
> distributed plan.
> {code}
> Section DISTRIBUTEDPLAN of query at line 651:
> create table t partitioned by (c_nationkey) sort by (c_custkey) as
> select c_custkey, max(o_totalprice) as maxprice, c_nationkey
> from tpch.orders join tpch.customer on c_custkey = o_custkey
> where c_nationkey < 10
> group by c_custkey, c_nationkey
> Actual does not match expected result:
> Max Per-Host Resource Reservation: Memory=35.69MB Threads=5
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> Per-Host Resource Estimates: Memory=66MB
> F03:PLAN FRAGMENT [HASH(c_nationkey)] hosts=2 instances=2
> | Per-Instance Resources: mem-estimate=8.01MB mem-reservation=6.00MB
> thread-reservation=1
> | max-parallelism=2 segment-costs=[316495, 4822204] cpu-comparison-result=3
> [max(2 (self) vs 3 (sum children))]
> WRITE TO HDFS [tpcds_partitioned_parquet_snap.t, OVERWRITE=false,
> PARTITION-KEYS=(c_nationkey)]
> | partitions=25
> | output exprs: c_custkey, max(o_totalprice), c_nationkey
> | mem-estimate=100.00KB mem-reservation=0B thread-reservation=0 cost=4822204
> |
> 08:SORT
> | order by: c_nationkey ASC NULLS LAST, c_custkey ASC NULLS LAST
> | mem-estimate=6.00MB mem-reservation=6.00MB spill-buffer=2.00MB
> thread-reservation=0
> | tuple-ids=3 row-size=18B cardinality=228.68K cost=272154
> | in pipelines: 08(GETNEXT), 06(OPEN)
> |
> 07:EXCHANGE [HASH(c_nationkey)]
> | mem-estimate=2.01MB mem-reservation=0B thread-reservation=0
> | tuple-ids=2 row-size=18B cardinality=228.68K cost=44341
> | in pipelines: 06(GETNEXT)
> |
> F02:PLAN FRAGMENT [HASH(c_custkey,c_nationkey)] hosts=2 instances=2
> Per-Instance Resources: mem-estimate=12.01MB mem-reservation=4.75MB
> thread-reservation=1
> max-parallelism=2 segment-costs=[1394159, 382905] cpu-comparison-result=3
> [max(2 (self) vs 3 (sum children))]
> 06:AGGREGATE [FINALIZE]
> | output: max:merge(o_totalprice)
> | group by: c_custkey, c_nationkey
> | mem-estimate=10.00MB mem-reservation=4.75MB spill-buffer=256.00KB
> thread-reservation=0
> | tuple-ids=2 row-size=18B cardinality=228.68K cost=1349818
> | in pipelines: 06(GETNEXT), 00(OPEN)
> |
> 05:EXCHANGE [HASH(c_custkey,c_nationkey)]
> | mem-estimate=2.01MB mem-reservation=0B thread-reservation=0
> | tuple-ids=2 row-size=18B cardinality=228.68K cost=44341
> | in pipelines: 00(GETNEXT)
> |
> F00:PLAN FRAGMENT [RANDOM] hosts=2 instances=2
> Per-Host Shared Resources: mem-estimate=1.00MB mem-reservation=1.00MB
> thread-reservation=0 runtime-filters-memory=1.00MB
> Per-Instance Resources: mem-estimate=26.17MB mem-reservation=13.00MB
> thread-reservation=1
> max-parallelism=2 segment-costs=[7810011, 382905] cpu-comparison-result=3
> [max(2 (self) vs 3 (sum children))]
> 03:AGGREGATE [STREAMING]
> | output: max(o_totalprice)
> | group by: c_custkey, c_nationkey
> | mem-estimate=10.00MB mem-reservation=5.00MB spill-buffer=256.00KB
> thread-reservation=0
> | tuple-ids=2 row-size=18B cardinality=228.68K cost=1349818
> | in pipelines: 00(GETNEXT)
> |
> 02:HASH JOIN [INNER JOIN, BROADCAST]
> | hash-table-id=00
> | hash predicates: o_custkey = c_custkey
> | fk/pk conjuncts: o_custkey = c_custkey
> | mem-estimate=0B mem-reservation=0B spill-buffer=64.00KB
> thread-reservation=0
> | tuple-ids=0,1 row-size=26B cardinality=228.68K cost=426187
> | in pipelines: 00(GETNEXT), 01(OPEN)
> |
> |--F04:PLAN FRAGMENT [RANDOM] hosts=2 instances=2
> | | Per-Instance Resources: mem-estimate=3.09MB mem-reservation=2.94MB
> thread-reservation=1 runtime-filters-memory=1.00MB
> | | max-parallelism=2 segment-costs=[18986]
> | JOIN BUILD
> | | join-table-id=00 plan-id=01 cohort-id=01
> | | build expressions: c_custkey
> | | runtime filters: RF000[bloom] <- c_custkey
> | | mem-estimate=1.94MB mem-reservation=1.94MB spill-buffer=64.00KB
> thread-reservation=0 cost=15000
> | |
> | 04:EXCHANGE [BROADCAST]
> | | mem-estimate=160.48KB mem-reservation=0B thread-reservation=0
> | | tuple-ids=1 row-size=10B cardinality=15.00K cost=3986
> | | in pipelines: 01(GETNEXT)
> | |
> | F01:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
> | Per-Instance Resources: mem-estimate=16.05MB mem-reservation=8.00MB
> thread-reservation=1
> | max-parallelism=1 segment-costs=[865507]
> | 01:SCAN HDFS [tpch.customer, RANDOM]
> | HDFS partitions=1/1 files=1 size=23.08MB
> | predicates: c_nationkey < CAST(10 AS SMALLINT)
> | stored statistics:
> | table: rows=150.00K size=23.08MB
> | columns: all
> | extrapolated-rows=disabled max-scan-range-rows=150.00K
> | mem-estimate=16.00MB mem-reservation=8.00MB thread-reservation=0
> | tuple-ids=1 row-size=10B cardinality=15.00K cost=864778
> | in pipelines: 01(GETNEXT)
> |
> 00:SCAN HDFS [tpch.orders, RANDOM]
> HDFS partitions=1/1 files=1 size=162.56MB
> runtime filters: RF000[bloom] -> o_custkey
> stored statistics:
> table: rows=1.50M size=162.56MB
> columns: all
> extrapolated-rows=disabled max-scan-range-rows=1.18M
> mem-estimate=16.00MB mem-reservation=8.00MB thread-reservation=0
> tuple-ids=0 row-size=16B cardinality=1.50M cost=6034006
> in pipelines: 00(GETNEXT)
> Expected:
> Max Per-Host Resource Reservation: Memory=32.88MB Threads=5
> Per-Host Resource Estimates: Memory=63MB
> F03:PLAN FRAGMENT [HASH(c_nationkey)] hosts=2 instances=2
> | Per-Instance Resources: mem-estimate=6.17MB mem-reservation=6.00MB
> thread-reservation=1
> | max-parallelism=2 segment-costs=[20759, 3700630] cpu-comparison-result=3
> [max(2 (self) vs 3 (sum children))]
> WRITE TO HDFS [tpcds_partitioned_parquet_snap.t, OVERWRITE=false,
> PARTITION-KEYS=(c_nationkey)]
> | partitions=25
> | output exprs: c_custkey, max(o_totalprice), c_nationkey
> | mem-estimate=100.00KB mem-reservation=0B thread-reservation=0 cost=3700630
> |
> 08:SORT
> | order by: c_nationkey ASC NULLS LAST, c_custkey ASC NULLS LAST
> | mem-estimate=6.00MB mem-reservation=6.00MB spill-buffer=2.00MB
> thread-reservation=0
> | tuple-ids=3 row-size=18B cardinality=15.00K cost=17851
> | in pipelines: 08(GETNEXT), 06(OPEN)
> |
> 07:EXCHANGE [HASH(c_nationkey)]
> | mem-estimate=175.84KB mem-reservation=0B thread-reservation=0
> | tuple-ids=2 row-size=18B cardinality=15.00K cost=2908
> | in pipelines: 06(GETNEXT)
> |
> F02:PLAN FRAGMENT [HASH(c_custkey,c_nationkey)] hosts=2 instances=2
> Per-Instance Resources: mem-estimate=10.17MB mem-reservation=1.94MB
> thread-reservation=1
> max-parallelism=2 segment-costs=[91447, 25116] cpu-comparison-result=3 [max(2
> (self) vs 3 (sum children))]
> 06:AGGREGATE [FINALIZE]
> | output: max:merge(o_totalprice)
> | group by: c_custkey, c_nationkey
> | mem-estimate=10.00MB mem-reservation=1.94MB spill-buffer=64.00KB
> thread-reservation=0
> | tuple-ids=2 row-size=18B cardinality=15.00K cost=88539
> | in pipelines: 06(GETNEXT), 00(OPEN)
> |
> 05:EXCHANGE [HASH(c_custkey,c_nationkey)]
> | mem-estimate=175.84KB mem-reservation=0B thread-reservation=0
> | tuple-ids=2 row-size=18B cardinality=15.00K cost=2908
> | in pipelines: 00(GETNEXT)
> |
> F00:PLAN FRAGMENT [RANDOM] hosts=2 instances=2
> Per-Host Shared Resources: mem-estimate=1.00MB mem-reservation=1.00MB
> thread-reservation=0 runtime-filters-memory=1.00MB
> Per-Instance Resources: mem-estimate=26.17MB mem-reservation=13.00MB
> thread-reservation=1
> max-parallelism=2 segment-costs=[7810011, 25116] cpu-comparison-result=3
> [max(2 (self) vs 3 (sum children))]
> 03:AGGREGATE [STREAMING]
> | output: max(o_totalprice)
> | group by: c_custkey, c_nationkey
> | mem-estimate=10.00MB mem-reservation=5.00MB spill-buffer=256.00KB
> thread-reservation=0
> | tuple-ids=2 row-size=18B cardinality=15.00K cost=1349818
> | in pipelines: 00(GETNEXT)
> |
> 02:HASH JOIN [INNER JOIN, BROADCAST]
> | hash-table-id=00
> | hash predicates: o_custkey = c_custkey
> | fk/pk conjuncts: o_custkey = c_custkey
> | mem-estimate=0B mem-reservation=0B spill-buffer=64.00KB
> thread-reservation=0
> | tuple-ids=0,1 row-size=26B cardinality=228.68K cost=426187
> | in pipelines: 00(GETNEXT), 01(OPEN)
> |
> |--F04:PLAN FRAGMENT [RANDOM] hosts=2 instances=2
> | | Per-Instance Resources: mem-estimate=3.09MB mem-reservation=2.94MB
> thread-reservation=1 runtime-filters-memory=1.00MB
> | | max-parallelism=2 segment-costs=[18986]
> | JOIN BUILD
> | | join-table-id=00 plan-id=01 cohort-id=01
> | | build expressions: c_custkey
> | | runtime filters: RF000[bloom] <- c_custkey
> | | mem-estimate=1.94MB mem-reservation=1.94MB spill-buffer=64.00KB
> thread-reservation=0 cost=15000
> | |
> | 04:EXCHANGE [BROADCAST]
> | | mem-estimate=160.48KB mem-reservation=0B thread-reservation=0
> | | tuple-ids=1 row-size=10B cardinality=15.00K cost=3986
> | | in pipelines: 01(GETNEXT)
> | |
> | F01:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
> | Per-Instance Resources: mem-estimate=16.05MB mem-reservation=8.00MB
> thread-reservation=1
> | max-parallelism=1 segment-costs=[865507]
> | 01:SCAN HDFS [tpch.customer, RANDOM]
> | HDFS partitions=1/1 files=1 size=23.08MB
> | predicates: c_nationkey < CAST(10 AS SMALLINT)
> | stored statistics:
> | table: rows=150.00K size=23.08MB
> | columns: all
> | extrapolated-rows=disabled max-scan-range-rows=150.00K
> | mem-estimate=16.00MB mem-reservation=8.00MB thread-reservation=0
> | tuple-ids=1 row-size=10B cardinality=15.00K cost=864778
> | in pipelines: 01(GETNEXT)
> |
> 00:SCAN HDFS [tpch.orders, RANDOM]
> HDFS partitions=1/1 files=1 size=162.56MB
> runtime filters: RF000[bloom] -> o_custkey
> stored statistics:
> table: rows=1.50M size=162.56MB
> columns: all
> extrapolated-rows=disabled max-scan-range-rows=1.18M
> mem-estimate=16.00MB mem-reservation=8.00MB thread-reservation=0
> tuple-ids=0 row-size=16B cardinality=1.50M cost=6034006
> in pipelines: 00(GETNEXT)
> {code}
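
For anyone triaging the diff by hand, here is a minimal sketch (not part of Impala's test framework) that lists which plan nodes differ between the actual and expected single node plans quoted above. The node lines are copied from the PLAN section; the helper names parse_nodes and diff_nodes are hypothetical. It also checks one observation from the numbers: in this plan, the first 'segment-costs' entry equals the sum of the costs of nodes 00 through 03. That is only an observation about this output, not a statement of how the planner computes segment costs.

```python
import re

# Node lines copied from the PLAN section quoted above (actual result).
ACTUAL_PLAN = """\
04:SORT
| tuple-ids=3 row-size=18B cardinality=228.68K cost=272154
03:AGGREGATE [FINALIZE]
| tuple-ids=2 row-size=18B cardinality=228.68K cost=1349818
02:HASH JOIN [INNER JOIN]
| tuple-ids=0,1 row-size=26B cardinality=228.68K cost=441187
|--01:SCAN HDFS [tpch.customer]
| tuple-ids=1 row-size=10B cardinality=15.00K cost=864778
00:SCAN HDFS [tpch.orders]
tuple-ids=0 row-size=16B cardinality=1.50M cost=6034006
"""

# The same node lines from the expected result.
EXPECTED_PLAN = """\
04:SORT
| tuple-ids=3 row-size=18B cardinality=15.00K cost=17851
03:AGGREGATE [FINALIZE]
| tuple-ids=2 row-size=18B cardinality=15.00K cost=1349818
02:HASH JOIN [INNER JOIN]
| tuple-ids=0,1 row-size=26B cardinality=228.68K cost=441187
|--01:SCAN HDFS [tpch.customer]
| tuple-ids=1 row-size=10B cardinality=15.00K cost=864778
00:SCAN HDFS [tpch.orders]
tuple-ids=0 row-size=16B cardinality=1.50M cost=6034006
"""

NODE_HEADER = re.compile(r"\d{2}:[A-Z]+(?: [A-Z]+)*")
NODE_STATS = re.compile(r"cardinality=(\S+) cost=(\d+)")


def parse_nodes(plan_text):
    """Map a node label such as '04:SORT' to (cardinality string, cost)."""
    nodes = {}
    current = None
    for line in plan_text.splitlines():
        stats = NODE_STATS.search(line)
        if stats and current is not None:
            nodes[current] = (stats.group(1), int(stats.group(2)))
            current = None
            continue
        # Strip the tree-drawing prefix ('|', '-', spaces) before matching.
        header = NODE_HEADER.match(line.lstrip("|- "))
        if header:
            current = header.group(0)
    return nodes


def diff_nodes(actual, expected):
    """Return {label: (actual stats, expected stats)} for nodes that differ."""
    return {
        label: (actual[label], expected[label])
        for label in actual
        if label in expected and actual[label] != expected[label]
    }


actual = parse_nodes(ACTUAL_PLAN)
expected = parse_nodes(EXPECTED_PLAN)
changed = diff_nodes(actual, expected)

for label, (got, want) in changed.items():
    print(f"{label}: actual={got} expected={want}")

# Observation only: the sum of the costs of nodes 00-03 matches the first
# 'segment-costs' entry (8689789) reported in both plans.
first_segment = sum(
    actual[label][1]
    for label in ("00:SCAN HDFS", "01:SCAN HDFS", "02:HASH JOIN", "03:AGGREGATE")
)
print("sum of costs for nodes 00-03:", first_segment)
```

On these excerpts, only 03:AGGREGATE and 04:SORT are reported. Both show cardinality=228.68K in the actual plan against 15.00K in the expected one, and the SORT cost changes along with it, so the aggregation's output cardinality estimate looks like a reasonable place to start.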