[ 
https://issues.apache.org/jira/browse/IMPALA-13490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17892878#comment-17892878
 ] 

Fang-Yu Rao commented on IMPALA-13490:
--------------------------------------

Hi [~rizaon], I assigned this ticket to you since you are more familiar with 
this area. Please feel free to reassign the JIRA as you see fit. Thanks!

> TpcdsCpuCostPlannerTest#testNonTpcdsDdl() could fail after IMPALA-13469
> -----------------------------------------------------------------------
>
>                 Key: IMPALA-13490
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13490
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 4.5.0
>            Reporter: Fang-Yu Rao
>            Assignee: Riza Suminto
>            Priority: Major
>              Labels: broken-build
>
> We found that testNonTpcdsDdl() in 
> [TpcdsCpuCostPlannerTest.java|https://github.com/apache/impala/blob/master/fe/src/test/java/org/apache/impala/planner/TpcdsCpuCostPlannerTest.java]
>  could fail after IMPALA-13469 with the following error.
>  
> It looks like the expected value of 'segment-costs' ([8689789, 17851, 3700630]) 
> does not match the actual one ([8689789, 272154, 4822204]) in the single-node 
> plan.
> +*Error Message*+
> {code:java}
> Section PLAN of query at line 651:
> create table t partitioned by (c_nationkey) sort by (c_custkey) as
> select c_custkey, max(o_totalprice) as maxprice, c_nationkey
>   from tpch.orders join tpch.customer on c_custkey = o_custkey
>  where c_nationkey < 10
>  group by c_custkey, c_nationkey
> Actual does not match expected result:
> Max Per-Host Resource Reservation: Memory=19.44MB Threads=1
> Per-Host Resource Estimates: Memory=35MB
> F00:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
> |  Per-Instance Resources: mem-estimate=34.94MB mem-reservation=19.44MB 
> thread-reservation=1 runtime-filters-memory=1.00MB
> |  max-parallelism=1 segment-costs=[8689789, 272154, 4822204]
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> WRITE TO HDFS [tpcds_partitioned_parquet_snap.t, OVERWRITE=false, 
> PARTITION-KEYS=(c_nationkey)]
> |  partitions=25
> |  output exprs: c_custkey, max(o_totalprice), c_nationkey
> |  mem-estimate=100.00KB mem-reservation=0B thread-reservation=0 cost=4822204
> |
> 04:SORT
> |  order by: c_nationkey ASC NULLS LAST, c_custkey ASC NULLS LAST
> |  mem-estimate=6.00MB mem-reservation=6.00MB spill-buffer=2.00MB 
> thread-reservation=0
> |  tuple-ids=3 row-size=18B cardinality=228.68K cost=272154
> |  in pipelines: 04(GETNEXT), 03(OPEN)
> |
> 03:AGGREGATE [FINALIZE]
> |  output: max(o_totalprice)
> |  group by: c_custkey, c_nationkey
> |  mem-estimate=10.00MB mem-reservation=8.50MB spill-buffer=512.00KB 
> thread-reservation=0
> |  tuple-ids=2 row-size=18B cardinality=228.68K cost=1349818
> |  in pipelines: 03(GETNEXT), 00(OPEN)
> |
> 02:HASH JOIN [INNER JOIN]
> |  hash predicates: o_custkey = c_custkey
> |  fk/pk conjuncts: o_custkey = c_custkey
> |  runtime filters: RF000[bloom] <- c_custkey
> |  mem-estimate=1.94MB mem-reservation=1.94MB spill-buffer=64.00KB 
> thread-reservation=0
> |  tuple-ids=0,1 row-size=26B cardinality=228.68K cost=441187
> |  in pipelines: 00(GETNEXT), 01(OPEN)
> |
> |--01:SCAN HDFS [tpch.customer]
> |     HDFS partitions=1/1 files=1 size=23.08MB
> |     predicates: c_nationkey < CAST(10 AS SMALLINT)
> |     stored statistics:
> |       table: rows=150.00K size=23.08MB
> |       columns: all
> |     extrapolated-rows=disabled max-scan-range-rows=150.00K
> |     mem-estimate=16.00MB mem-reservation=8.00MB thread-reservation=0
> |     tuple-ids=1 row-size=10B cardinality=15.00K cost=864778
> |     in pipelines: 01(GETNEXT)
> |
> 00:SCAN HDFS [tpch.orders]
>    HDFS partitions=1/1 files=1 size=162.56MB
>    runtime filters: RF000[bloom] -> o_custkey
>    stored statistics:
>      table: rows=1.50M size=162.56MB
>      columns: all
>    extrapolated-rows=disabled max-scan-range-rows=1.18M
>    mem-estimate=16.00MB mem-reservation=8.00MB thread-reservation=0
>    tuple-ids=0 row-size=16B cardinality=1.50M cost=6034006
>    in pipelines: 00(GETNEXT)
> Expected:
> Max Per-Host Resource Reservation: Memory=19.44MB Threads=1
> Per-Host Resource Estimates: Memory=35MB
> F00:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
> |  Per-Instance Resources: mem-estimate=34.94MB mem-reservation=19.44MB 
> thread-reservation=1 runtime-filters-memory=1.00MB
> |  max-parallelism=1 segment-costs=[8689789, 17851, 3700630]
> WRITE TO HDFS [tpcds_partitioned_parquet_snap.t, OVERWRITE=false, 
> PARTITION-KEYS=(c_nationkey)]
> |  partitions=25
> |  output exprs: c_custkey, max(o_totalprice), c_nationkey
> |  mem-estimate=100.00KB mem-reservation=0B thread-reservation=0 cost=3700630
> |
> 04:SORT
> |  order by: c_nationkey ASC NULLS LAST, c_custkey ASC NULLS LAST
> |  mem-estimate=6.00MB mem-reservation=6.00MB spill-buffer=2.00MB 
> thread-reservation=0
> |  tuple-ids=3 row-size=18B cardinality=15.00K cost=17851
> |  in pipelines: 04(GETNEXT), 03(OPEN)
> |
> 03:AGGREGATE [FINALIZE]
> |  output: max(o_totalprice)
> |  group by: c_custkey, c_nationkey
> |  mem-estimate=10.00MB mem-reservation=8.50MB spill-buffer=512.00KB 
> thread-reservation=0
> |  tuple-ids=2 row-size=18B cardinality=15.00K cost=1349818
> |  in pipelines: 03(GETNEXT), 00(OPEN)
> |
> 02:HASH JOIN [INNER JOIN]
> |  hash predicates: o_custkey = c_custkey
> |  fk/pk conjuncts: o_custkey = c_custkey
> |  runtime filters: RF000[bloom] <- c_custkey
> |  mem-estimate=1.94MB mem-reservation=1.94MB spill-buffer=64.00KB 
> thread-reservation=0
> |  tuple-ids=0,1 row-size=26B cardinality=228.68K cost=441187
> |  in pipelines: 00(GETNEXT), 01(OPEN)
> |
> |--01:SCAN HDFS [tpch.customer]
> |     HDFS partitions=1/1 files=1 size=23.08MB
> |     predicates: c_nationkey < CAST(10 AS SMALLINT)
> |     stored statistics:
> |       table: rows=150.00K size=23.08MB
> |       columns: all
> |     extrapolated-rows=disabled max-scan-range-rows=150.00K
> |     mem-estimate=16.00MB mem-reservation=8.00MB thread-reservation=0
> |     tuple-ids=1 row-size=10B cardinality=15.00K cost=864778
> |     in pipelines: 01(GETNEXT)
> |
> 00:SCAN HDFS [tpch.orders]
>    HDFS partitions=1/1 files=1 size=162.56MB
>    runtime filters: RF000[bloom] -> o_custkey
>    stored statistics:
>      table: rows=1.50M size=162.56MB
>      columns: all
>    extrapolated-rows=disabled max-scan-range-rows=1.18M
>    mem-estimate=16.00MB mem-reservation=8.00MB thread-reservation=0
>    tuple-ids=0 row-size=16B cardinality=1.50M cost=6034006
>    in pipelines: 00(GETNEXT)
> {code}
> Moreover, the expected value of 'Memory' in 'Max Per-Host Resource 
> Reservation' (32.88MB) does not match the actual one (35.69MB) in the 
> distributed plan.
> {code}
> Section DISTRIBUTEDPLAN of query at line 651:
> create table t partitioned by (c_nationkey) sort by (c_custkey) as
> select c_custkey, max(o_totalprice) as maxprice, c_nationkey
>   from tpch.orders join tpch.customer on c_custkey = o_custkey
>  where c_nationkey < 10
>  group by c_custkey, c_nationkey
> Actual does not match expected result:
> Max Per-Host Resource Reservation: Memory=35.69MB Threads=5
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> Per-Host Resource Estimates: Memory=66MB
> F03:PLAN FRAGMENT [HASH(c_nationkey)] hosts=2 instances=2
> |  Per-Instance Resources: mem-estimate=8.01MB mem-reservation=6.00MB 
> thread-reservation=1
> |  max-parallelism=2 segment-costs=[316495, 4822204] cpu-comparison-result=3 
> [max(2 (self) vs 3 (sum children))]
> WRITE TO HDFS [tpcds_partitioned_parquet_snap.t, OVERWRITE=false, 
> PARTITION-KEYS=(c_nationkey)]
> |  partitions=25
> |  output exprs: c_custkey, max(o_totalprice), c_nationkey
> |  mem-estimate=100.00KB mem-reservation=0B thread-reservation=0 cost=4822204
> |
> 08:SORT
> |  order by: c_nationkey ASC NULLS LAST, c_custkey ASC NULLS LAST
> |  mem-estimate=6.00MB mem-reservation=6.00MB spill-buffer=2.00MB 
> thread-reservation=0
> |  tuple-ids=3 row-size=18B cardinality=228.68K cost=272154
> |  in pipelines: 08(GETNEXT), 06(OPEN)
> |
> 07:EXCHANGE [HASH(c_nationkey)]
> |  mem-estimate=2.01MB mem-reservation=0B thread-reservation=0
> |  tuple-ids=2 row-size=18B cardinality=228.68K cost=44341
> |  in pipelines: 06(GETNEXT)
> |
> F02:PLAN FRAGMENT [HASH(c_custkey,c_nationkey)] hosts=2 instances=2
> Per-Instance Resources: mem-estimate=12.01MB mem-reservation=4.75MB 
> thread-reservation=1
> max-parallelism=2 segment-costs=[1394159, 382905] cpu-comparison-result=3 
> [max(2 (self) vs 3 (sum children))]
> 06:AGGREGATE [FINALIZE]
> |  output: max:merge(o_totalprice)
> |  group by: c_custkey, c_nationkey
> |  mem-estimate=10.00MB mem-reservation=4.75MB spill-buffer=256.00KB 
> thread-reservation=0
> |  tuple-ids=2 row-size=18B cardinality=228.68K cost=1349818
> |  in pipelines: 06(GETNEXT), 00(OPEN)
> |
> 05:EXCHANGE [HASH(c_custkey,c_nationkey)]
> |  mem-estimate=2.01MB mem-reservation=0B thread-reservation=0
> |  tuple-ids=2 row-size=18B cardinality=228.68K cost=44341
> |  in pipelines: 00(GETNEXT)
> |
> F00:PLAN FRAGMENT [RANDOM] hosts=2 instances=2
> Per-Host Shared Resources: mem-estimate=1.00MB mem-reservation=1.00MB 
> thread-reservation=0 runtime-filters-memory=1.00MB
> Per-Instance Resources: mem-estimate=26.17MB mem-reservation=13.00MB 
> thread-reservation=1
> max-parallelism=2 segment-costs=[7810011, 382905] cpu-comparison-result=3 
> [max(2 (self) vs 3 (sum children))]
> 03:AGGREGATE [STREAMING]
> |  output: max(o_totalprice)
> |  group by: c_custkey, c_nationkey
> |  mem-estimate=10.00MB mem-reservation=5.00MB spill-buffer=256.00KB 
> thread-reservation=0
> |  tuple-ids=2 row-size=18B cardinality=228.68K cost=1349818
> |  in pipelines: 00(GETNEXT)
> |
> 02:HASH JOIN [INNER JOIN, BROADCAST]
> |  hash-table-id=00
> |  hash predicates: o_custkey = c_custkey
> |  fk/pk conjuncts: o_custkey = c_custkey
> |  mem-estimate=0B mem-reservation=0B spill-buffer=64.00KB 
> thread-reservation=0
> |  tuple-ids=0,1 row-size=26B cardinality=228.68K cost=426187
> |  in pipelines: 00(GETNEXT), 01(OPEN)
> |
> |--F04:PLAN FRAGMENT [RANDOM] hosts=2 instances=2
> |  |  Per-Instance Resources: mem-estimate=3.09MB mem-reservation=2.94MB 
> thread-reservation=1 runtime-filters-memory=1.00MB
> |  |  max-parallelism=2 segment-costs=[18986]
> |  JOIN BUILD
> |  |  join-table-id=00 plan-id=01 cohort-id=01
> |  |  build expressions: c_custkey
> |  |  runtime filters: RF000[bloom] <- c_custkey
> |  |  mem-estimate=1.94MB mem-reservation=1.94MB spill-buffer=64.00KB 
> thread-reservation=0 cost=15000
> |  |
> |  04:EXCHANGE [BROADCAST]
> |  |  mem-estimate=160.48KB mem-reservation=0B thread-reservation=0
> |  |  tuple-ids=1 row-size=10B cardinality=15.00K cost=3986
> |  |  in pipelines: 01(GETNEXT)
> |  |
> |  F01:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
> |  Per-Instance Resources: mem-estimate=16.05MB mem-reservation=8.00MB 
> thread-reservation=1
> |  max-parallelism=1 segment-costs=[865507]
> |  01:SCAN HDFS [tpch.customer, RANDOM]
> |     HDFS partitions=1/1 files=1 size=23.08MB
> |     predicates: c_nationkey < CAST(10 AS SMALLINT)
> |     stored statistics:
> |       table: rows=150.00K size=23.08MB
> |       columns: all
> |     extrapolated-rows=disabled max-scan-range-rows=150.00K
> |     mem-estimate=16.00MB mem-reservation=8.00MB thread-reservation=0
> |     tuple-ids=1 row-size=10B cardinality=15.00K cost=864778
> |     in pipelines: 01(GETNEXT)
> |
> 00:SCAN HDFS [tpch.orders, RANDOM]
>    HDFS partitions=1/1 files=1 size=162.56MB
>    runtime filters: RF000[bloom] -> o_custkey
>    stored statistics:
>      table: rows=1.50M size=162.56MB
>      columns: all
>    extrapolated-rows=disabled max-scan-range-rows=1.18M
>    mem-estimate=16.00MB mem-reservation=8.00MB thread-reservation=0
>    tuple-ids=0 row-size=16B cardinality=1.50M cost=6034006
>    in pipelines: 00(GETNEXT)
> Expected:
> Max Per-Host Resource Reservation: Memory=32.88MB Threads=5
> Per-Host Resource Estimates: Memory=63MB
> F03:PLAN FRAGMENT [HASH(c_nationkey)] hosts=2 instances=2
> |  Per-Instance Resources: mem-estimate=6.17MB mem-reservation=6.00MB 
> thread-reservation=1
> |  max-parallelism=2 segment-costs=[20759, 3700630] cpu-comparison-result=3 
> [max(2 (self) vs 3 (sum children))]
> WRITE TO HDFS [tpcds_partitioned_parquet_snap.t, OVERWRITE=false, 
> PARTITION-KEYS=(c_nationkey)]
> |  partitions=25
> |  output exprs: c_custkey, max(o_totalprice), c_nationkey
> |  mem-estimate=100.00KB mem-reservation=0B thread-reservation=0 cost=3700630
> |
> 08:SORT
> |  order by: c_nationkey ASC NULLS LAST, c_custkey ASC NULLS LAST
> |  mem-estimate=6.00MB mem-reservation=6.00MB spill-buffer=2.00MB 
> thread-reservation=0
> |  tuple-ids=3 row-size=18B cardinality=15.00K cost=17851
> |  in pipelines: 08(GETNEXT), 06(OPEN)
> |
> 07:EXCHANGE [HASH(c_nationkey)]
> |  mem-estimate=175.84KB mem-reservation=0B thread-reservation=0
> |  tuple-ids=2 row-size=18B cardinality=15.00K cost=2908
> |  in pipelines: 06(GETNEXT)
> |
> F02:PLAN FRAGMENT [HASH(c_custkey,c_nationkey)] hosts=2 instances=2
> Per-Instance Resources: mem-estimate=10.17MB mem-reservation=1.94MB 
> thread-reservation=1
> max-parallelism=2 segment-costs=[91447, 25116] cpu-comparison-result=3 [max(2 
> (self) vs 3 (sum children))]
> 06:AGGREGATE [FINALIZE]
> |  output: max:merge(o_totalprice)
> |  group by: c_custkey, c_nationkey
> |  mem-estimate=10.00MB mem-reservation=1.94MB spill-buffer=64.00KB 
> thread-reservation=0
> |  tuple-ids=2 row-size=18B cardinality=15.00K cost=88539
> |  in pipelines: 06(GETNEXT), 00(OPEN)
> |
> 05:EXCHANGE [HASH(c_custkey,c_nationkey)]
> |  mem-estimate=175.84KB mem-reservation=0B thread-reservation=0
> |  tuple-ids=2 row-size=18B cardinality=15.00K cost=2908
> |  in pipelines: 00(GETNEXT)
> |
> F00:PLAN FRAGMENT [RANDOM] hosts=2 instances=2
> Per-Host Shared Resources: mem-estimate=1.00MB mem-reservation=1.00MB 
> thread-reservation=0 runtime-filters-memory=1.00MB
> Per-Instance Resources: mem-estimate=26.17MB mem-reservation=13.00MB 
> thread-reservation=1
> max-parallelism=2 segment-costs=[7810011, 25116] cpu-comparison-result=3 
> [max(2 (self) vs 3 (sum children))]
> 03:AGGREGATE [STREAMING]
> |  output: max(o_totalprice)
> |  group by: c_custkey, c_nationkey
> |  mem-estimate=10.00MB mem-reservation=5.00MB spill-buffer=256.00KB 
> thread-reservation=0
> |  tuple-ids=2 row-size=18B cardinality=15.00K cost=1349818
> |  in pipelines: 00(GETNEXT)
> |
> 02:HASH JOIN [INNER JOIN, BROADCAST]
> |  hash-table-id=00
> |  hash predicates: o_custkey = c_custkey
> |  fk/pk conjuncts: o_custkey = c_custkey
> |  mem-estimate=0B mem-reservation=0B spill-buffer=64.00KB 
> thread-reservation=0
> |  tuple-ids=0,1 row-size=26B cardinality=228.68K cost=426187
> |  in pipelines: 00(GETNEXT), 01(OPEN)
> |
> |--F04:PLAN FRAGMENT [RANDOM] hosts=2 instances=2
> |  |  Per-Instance Resources: mem-estimate=3.09MB mem-reservation=2.94MB 
> thread-reservation=1 runtime-filters-memory=1.00MB
> |  |  max-parallelism=2 segment-costs=[18986]
> |  JOIN BUILD
> |  |  join-table-id=00 plan-id=01 cohort-id=01
> |  |  build expressions: c_custkey
> |  |  runtime filters: RF000[bloom] <- c_custkey
> |  |  mem-estimate=1.94MB mem-reservation=1.94MB spill-buffer=64.00KB 
> thread-reservation=0 cost=15000
> |  |
> |  04:EXCHANGE [BROADCAST]
> |  |  mem-estimate=160.48KB mem-reservation=0B thread-reservation=0
> |  |  tuple-ids=1 row-size=10B cardinality=15.00K cost=3986
> |  |  in pipelines: 01(GETNEXT)
> |  |
> |  F01:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
> |  Per-Instance Resources: mem-estimate=16.05MB mem-reservation=8.00MB 
> thread-reservation=1
> |  max-parallelism=1 segment-costs=[865507]
> |  01:SCAN HDFS [tpch.customer, RANDOM]
> |     HDFS partitions=1/1 files=1 size=23.08MB
> |     predicates: c_nationkey < CAST(10 AS SMALLINT)
> |     stored statistics:
> |       table: rows=150.00K size=23.08MB
> |       columns: all
> |     extrapolated-rows=disabled max-scan-range-rows=150.00K
> |     mem-estimate=16.00MB mem-reservation=8.00MB thread-reservation=0
> |     tuple-ids=1 row-size=10B cardinality=15.00K cost=864778
> |     in pipelines: 01(GETNEXT)
> |
> 00:SCAN HDFS [tpch.orders, RANDOM]
>    HDFS partitions=1/1 files=1 size=162.56MB
>    runtime filters: RF000[bloom] -> o_custkey
>    stored statistics:
>      table: rows=1.50M size=162.56MB
>      columns: all
>    extrapolated-rows=disabled max-scan-range-rows=1.18M
>    mem-estimate=16.00MB mem-reservation=8.00MB thread-reservation=0
>    tuple-ids=0 row-size=16B cardinality=1.50M cost=6034006
>    in pipelines: 00(GETNEXT)
> {code}
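
The carets in the test output mark only the first mismatching line, so the other differing fields are easy to miss. As a rough aid for reading the two plan dumps above, the following sketch compares `key=value` fields line by line. The helper name `diff_plan_fields` and the regular expression are hypothetical and are not part of Impala's test framework; the sample lines are copied from the single-node plan. In this output, the second and third entries of 'segment-costs' coincide with the costs of the SORT and WRITE TO HDFS nodes.

```python
import re

# Hypothetical helper, not part of Impala's planner test framework.
# Matches key=value tokens such as cost=272154 or cardinality=228.68K,
# and bracketed lists such as segment-costs=[8689789, 272154, 4822204].
TOKEN = re.compile(r"([\w-]+)=(\[[^\]]*\]|\S+)")


def diff_plan_fields(actual, expected):
    """Return (line_no, key, actual_value, expected_value) for each field
    that appears on the same line of both plans with different values."""
    diffs = []
    pairs = zip(actual.splitlines(), expected.splitlines())
    for line_no, (a_line, e_line) in enumerate(pairs, 1):
        a_fields = dict(TOKEN.findall(a_line))
        e_fields = dict(TOKEN.findall(e_line))
        for key, a_value in a_fields.items():
            if key in e_fields and e_fields[key] != a_value:
                diffs.append((line_no, key, a_value, e_fields[key]))
    return diffs


# Three lines taken from the single-node plan in the error message.
ACTUAL = """\
|  max-parallelism=1 segment-costs=[8689789, 272154, 4822204]
|  mem-estimate=100.00KB mem-reservation=0B thread-reservation=0 cost=4822204
|  tuple-ids=3 row-size=18B cardinality=228.68K cost=272154
"""
EXPECTED = """\
|  max-parallelism=1 segment-costs=[8689789, 17851, 3700630]
|  mem-estimate=100.00KB mem-reservation=0B thread-reservation=0 cost=3700630
|  tuple-ids=3 row-size=18B cardinality=15.00K cost=17851
"""

for line_no, key, a_value, e_value in diff_plan_fields(ACTUAL, EXPECTED):
    print(f"line {line_no}: {key}: actual={a_value} expected={e_value}")
```

The sketch assumes both plans have the same number of lines in the same order, which holds for the dumps in this report but would not hold if the plan shape itself changed.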


