[
https://issues.apache.org/jira/browse/IMPALA-15019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Quanlong Huang updated IMPALA-15019:
------------------------------------
Attachment: tpcds-q4-calcite-plan.txt
> Calcite planner has higher memory estimation
> --------------------------------------------
>
> Key: IMPALA-15019
> URL: https://issues.apache.org/jira/browse/IMPALA-15019
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Reporter: Quanlong Huang
> Assignee: Steve Carlin
> Priority: Major
> Attachments: row-size-comparison.txt, tpcds-q4-calcite-plan.txt,
> tpcds-q4-original-plan.txt
>
>
> Comparing the EXPLAIN outputs between the original planner and
> calcite-planner, it seems the calcite planner always uses a larger row-size,
> which might result in higher memory estimation.
> For instance, for the following query:
> {code:sql}
> EXPLAIN SELECT count(*) FROM functional.alltypes
> WHERE year=2009 AND int_col=1 AND string_col='1';{code}
> The original planner uses row-size=17B in the scan node, while the
> calcite-planner uses row-size=21B.
> Original planner:
> {noformat}
> +-------------------------------------------------------------+
> | Explain String |
> +-------------------------------------------------------------+
> | Max Per-Host Resource Reservation: Memory=32.00KB Threads=3 |
> | Per-Host Resource Estimates: Memory=80MB |
> | Codegen disabled by planner |
> | |
> | PLAN-ROOT SINK |
> | | |
> | 03:AGGREGATE [FINALIZE] |
> | | output: count:merge(*) |
> | | row-size=8B cardinality=1 |
> | | |
> | 02:EXCHANGE [UNPARTITIONED] |
> | | |
> | 01:AGGREGATE |
> | | output: count(*) |
> | | row-size=8B cardinality=3 |
> | | |
> | 00:SCAN HDFS [functional.alltypes] |
> | partition predicates: `year` = 2009 |
> | HDFS partitions=12/24 files=12 size=238.68KB |
> | predicates: int_col = 1, string_col = '1' |
> | row-size=17B cardinality=115 |
> +-------------------------------------------------------------+{noformat}
> Calcite-planner:
> {noformat}
> +--------------------------------------------------------------------------------------+
> | Explain String |
> +--------------------------------------------------------------------------------------+
> | Max Per-Host Resource Reservation: Memory=32.00KB Threads=3 |
> | Per-Host Resource Estimates: Memory=80MB |
> | Codegen disabled by planner |
> | |
> | PLAN-ROOT SINK |
> | | |
> | 03:AGGREGATE [FINALIZE] |
> | | output: count:merge() |
> | | row-size=8B cardinality=1 |
> | | |
> | 02:EXCHANGE [UNPARTITIONED] |
> | | |
> | 01:AGGREGATE |
> | | output: count() |
> | | row-size=8B cardinality=3 |
> | | |
> | 00:SCAN HDFS [functional.alltypes] |
> | partition predicates: functional.alltypes.year = 2009 |
> | HDFS partitions=12/24 files=12 size=238.68KB |
> | predicates: functional.alltypes.int_col = 1, functional.alltypes.string_col = '1' |
> | row-size=21B cardinality=115 |
> +--------------------------------------------------------------------------------------+{noformat}
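
One way to spot this kind of difference across a whole plan is to pull the row-size of every node out of both EXPLAIN outputs and diff them. The following is only a rough sketch: the function names (row_sizes, diff_row_sizes) are hypothetical, not part of Impala, and it assumes node labels and IDs line up between the two planners, which holds for the small plan above but may not hold for larger ones.

```python
import re

# Matches plan-node headers such as "00:SCAN HDFS [functional.alltypes]".
NODE_RE = re.compile(r"(\d+):([A-Z][A-Z ]*[A-Z])")
# Matches the "row-size=17B" attribute printed under each node.
ROW_SIZE_RE = re.compile(r"row-size=(\d+)B")


def row_sizes(explain_text):
    """Return {node_label: row_size_in_bytes} for one EXPLAIN output."""
    sizes = {}
    current = None
    for line in explain_text.splitlines():
        node = NODE_RE.search(line)
        if node:
            # Remember which node the following attribute lines belong to.
            current = f"{node.group(1)}:{node.group(2)}"
            continue
        size = ROW_SIZE_RE.search(line)
        if size and current is not None:
            sizes[current] = int(size.group(1))
    return sizes


def diff_row_sizes(original_text, calcite_text):
    """Return {node_label: (original, calcite)} for nodes whose row-size differs."""
    original = row_sizes(original_text)
    calcite = row_sizes(calcite_text)
    return {
        node: (original[node], calcite[node])
        for node in original
        if node in calcite and original[node] != calcite[node]
    }
```

Applied to the two plans above, only the scan node should be reported, with 17B for the original planner and 21B for the calcite-planner.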
> TPCDS-Q4 was also compared as a more complex example. The original planner
> has a lower memory requirement:
> {noformat}
> Max Per-Host Resource Reservation: Memory=511.00MB Threads=50
> Per-Host Resource Estimates: Memory=2.57GB{noformat}
> The calcite-planner has a higher memory requirement:
> {noformat}
> Max Per-Host Resource Reservation: Memory=539.88MB Threads=50
> Per-Host Resource Estimates: Memory=2.68GB{noformat}
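
To put a number on the gap, the two summary lines can be parsed and compared as a relative increase. This is a minimal sketch with hypothetical helper names (memory_summary, pct_increase). It assumes the units printed by EXPLAIN are binary multiples (1KB = 1024B); that assumption does not affect the percentages when both values use the same unit.

```python
import re

# Assumption: units are binary multiples, e.g. 1MB = 1024 * 1024 bytes.
UNITS = {"B": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}
MEM_RE = re.compile(r"Memory=([\d.]+)(B|KB|MB|GB)")


def memory_summary(explain_text):
    """Return (reservation_bytes, estimate_bytes) from the EXPLAIN header lines."""
    reservation = estimate = None
    for line in explain_text.splitlines():
        match = MEM_RE.search(line)
        if not match:
            continue
        value = float(match.group(1)) * UNITS[match.group(2)]
        if "Resource Reservation" in line:
            reservation = value
        elif "Resource Estimates" in line:
            estimate = value
    return reservation, estimate


def pct_increase(old, new):
    """Relative increase of new over old, in percent."""
    return (new - old) / old * 100.0
```

For the TPCDS-Q4 numbers above, this gives roughly a 5.7% higher reservation (511.00MB to 539.88MB) and roughly a 4.3% higher estimate (2.57GB to 2.68GB) for the calcite-planner.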