Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/22926 )
Change subject: IMPALA-14574: Lower memory estimate by analyzing PipelineMembership
......................................................................


Patch Set 15:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/22926/14//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/22926/14//COMMIT_MSG@92
PS14, Line 92: scale 10. No major regression is observed.
> Were there any improvements to note? I guess perf tests run one at a time,
I thought there was no spill before or after the patch, but I checked again just now and found that queries 28 and 78 spill after the patch. Query 28 reduces Cluster Memory Admitted (across all executors) from 98.04GB to 34.04GB, and its latency still improves by 6.06% (1575ms faster). Query 78 reduces Cluster Memory Admitted from 468.81GB to 289.46GB, and its latency regresses by 8.75% (12261ms slower). No other query spills after this patch.

I have not run concurrency tests. However, if enough concurrent queries spill to disk, we risk running out of scratch space. The TOTAL estimation mode should be more resilient against scratch space exhaustion, since its higher estimate makes admission control stop admitting new queries earlier.


http://gerrit.cloudera.org:8080/#/c/22926/14/be/src/service/query-options.cc
File be/src/service/query-options.cc:

http://gerrit.cloudera.org:8080/#/c/22926/14/be/src/service/query-options.cc@1437
PS14, Line 1437: RETURN_IF_ERROR(GetThriftEnum(value, "Memory estimate mode",
> nit: it might be better to stick with "Memory estimate mode" here. Consiste
Done


http://gerrit.cloudera.org:8080/#/c/22926/14/fe/src/main/java/org/apache/impala/planner/PipelineBasedMemEstimator.java
File fe/src/main/java/org/apache/impala/planner/PipelineBasedMemEstimator.java:

http://gerrit.cloudera.org:8080/#/c/22926/14/fe/src/main/java/org/apache/impala/planner/PipelineBasedMemEstimator.java@54
PS14, Line 54: totalMaxAdjacentOf_ = initializedMemMap(allFragments);
> typo: getPerInstanceMinMemOverestimation?
Done


http://gerrit.cloudera.org:8080/#/c/22926/14/fe/src/main/java/org/apache/impala/planner/PipelineBasedMemEstimator.java@60
PS14, Line 60:
> Why does the order of allFragments matter?
The map value, a PipeMem object, memorizes the originating fragment where the pipeline starts. This is why we traverse bottom-up. Added a comment here.


http://gerrit.cloudera.org:8080/#/c/22926/14/fe/src/main/java/org/apache/impala/planner/PipelineBasedMemEstimator.java@104
PS14, Line 104: * 2). Sum of all maximum memory growth estimate rooted at OPEN pipelines adjacent to X.
> I'd add trace logging before the loop that indicates something about the pl
Added in the constructor.


http://gerrit.cloudera.org:8080/#/c/22926/14/fe/src/main/java/org/apache/impala/planner/PipelineBasedMemEstimator.java@167
PS14, Line 167: openPipelines.add(pm);
> Are GETNEXT and OPEN really the only two options here? What guarantees that
The PipelineMembership class is always initialized with either the GETNEXT or OPEN phase. Added a Precondition in the PipelineMembership constructor to ensure this.
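For reference, the guard is essentially a one-liner along these lines (a sketch only; the exact constructor arguments in the patch may differ):

  // In the PipelineMembership constructor, using Guava Preconditions and the
  // generated org.apache.impala.thrift.TExecNodePhase enum: reject any phase
  // other than GETNEXT or OPEN so callers only need to handle those two cases.
  Preconditions.checkState(
      phase == TExecNodePhase.GETNEXT || phase == TExecNodePhase.OPEN,
      "Unexpected pipeline phase: %s", phase);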
http://gerrit.cloudera.org:8080/#/c/22926/14/fe/src/main/java/org/apache/impala/planner/PipelineBasedMemEstimator.java@219
PS14, Line 219: PlanNodeId openPipeId = openPipe.getId();
> This is pretty deeply nested.
> Consider breaking some of this function into
Done


http://gerrit.cloudera.org:8080/#/c/22926/14/fe/src/main/java/org/apache/impala/planner/PlanFragment.java
File fe/src/main/java/org/apache/impala/planner/PlanFragment.java:

http://gerrit.cloudera.org:8080/#/c/22926/14/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@470
PS14, Line 470: // There are several PlanNodes that do not reserve initial memory (0), but set
> We could also fix this by ensuring every node sets an initial memory reserv
Yes. It is done this way to avoid underestimation in the memory overlapping loop. Some PlanNodes / DataSinks do require initial memory just to start, such as the 4MB buffer in PlanRootSink that enables result spooling. Other PlanNodes do not really consume memory right away when they start, and thus do not declare any initial memory. Specifically for ExchangeNode, there are open Jiras describing its current memory estimation issues:
https://issues.apache.org/jira/browse/IMPALA-12594
https://issues.apache.org/jira/browse/IMPALA-11510


http://gerrit.cloudera.org:8080/#/c/22926/14/fe/src/main/java/org/apache/impala/planner/Planner.java
File fe/src/main/java/org/apache/impala/planner/Planner.java:

http://gerrit.cloudera.org:8080/#/c/22926/14/fe/src/main/java/org/apache/impala/planner/Planner.java@714
PS14, Line 714: long reduction =
> What does this look like in-context? It might be nice to provide a human-re
Changed this log line slightly. It will show up like this:

I20260105 22:23:44.934326 2029861 Planner.java:716] Using MAX_PIPELINE estimate: maxPerHostEstimateBytes=43.31GB, originalMemEstimateBytes=68.61GB, minMemReservationBytes=6.10GB, reduction=25.30GB


http://gerrit.cloudera.org:8080/#/c/22926/14/fe/src/main/java/org/apache/impala/planner/Planner.java@1097
PS14, Line 1097:
> We provide a warning about incomplete stats elsewhere. I'm surprised this h
That code exists in Frontend.java:
https://github.com/apache/impala/blob/e2b0a027b7a9c5ccb127ca318f273ea1f18f127c/fe/src/main/java/org/apache/impala/service/Frontend.java#L2014-L2015
which runs in a later phase of query compilation. The check here is invoked earlier to avoid using the pipeline-based memory estimation when an accurate estimate cannot be produced due to missing stats.
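Conceptually, the early check boils down to something like this (a rough sketch of the intent, not the exact patch; the helper and variable names here are illustrative):

  // Fall back to the original, more conservative per-host estimate when any
  // scanned table is missing stats, since the tighter pipeline-based bound
  // assumes reasonably accurate per-node estimates.
  for (ScanNode scan : scanNodes) {
    if (scan.isTableMissingStats()) {
      return originalMemEstimateBytes;
    }
  }
  // ...otherwise proceed with the MAX_PIPELINE estimate.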
http://gerrit.cloudera.org:8080/#/c/22926/14/testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q23a.test
File testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q23a.test:

http://gerrit.cloudera.org:8080/#/c/22926/14/testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q23a.test@513
PS14, Line 513: Per-Host Resource Estimates: Memory=41.59GB
> This is a substantial change. Nice sign that it could be beneficial in some
The max Per Node Peak Memory Usage before and after the patch is around 24 GB. These are the actual profile numbers from a 10-node, MT_DOP=12, 3TB setup:

Baseline (before patch):
Max Per-Host Resource Reservation: Memory=8.99GB Threads=279
Per-Host Resource Estimates: Memory=120.90GB
Per Node Peak Memory Usage:
  coordinator-0.coordinator-int.impala-1764109762-62np.svc.cluster.local:27000(2.01 MB)
  impala-executor-000-5.impala-executor.impala-1764109762-62np.svc.cluster.local:27010(22.53 GB)
  impala-executor-000-3.impala-executor.impala-1764109762-62np.svc.cluster.local:27010(23.31 GB)
  impala-executor-000-8.impala-executor.impala-1764109762-62np.svc.cluster.local:27010(23.56 GB)
  impala-executor-000-4.impala-executor.impala-1764109762-62np.svc.cluster.local:27010(23.95 GB)
  impala-executor-000-7.impala-executor.impala-1764109762-62np.svc.cluster.local:27010(22.74 GB)
  impala-executor-000-0.impala-executor.impala-1764109762-62np.svc.cluster.local:27010(23.38 GB)
  impala-executor-000-9.impala-executor.impala-1764109762-62np.svc.cluster.local:27010(23.26 GB)
  impala-executor-000-6.impala-executor.impala-1764109762-62np.svc.cluster.local:27010(23.15 GB)
  impala-executor-000-2.impala-executor.impala-1764109762-62np.svc.cluster.local:27010(23.49 GB)
  impala-executor-000-1.impala-executor.impala-1764109762-62np.svc.cluster.local:27010(23.77 GB)

After patch:
Max Per-Host Resource Reservation: Memory=8.99GB Threads=279
Per-Host Resource Estimates: Memory=43.66GB
Per Node Peak Memory Usage:
  coordinator-0.coordinator-int.impala-1761163328-fw8c.svc.cluster.local:27000(4.01 MB)
  impala-executor-000-3.impala-executor.impala-1761163328-fw8c.svc.cluster.local:27010(23.72 GB)
  impala-executor-000-1.impala-executor.impala-1761163328-fw8c.svc.cluster.local:27010(23.41 GB)
  impala-executor-000-4.impala-executor.impala-1761163328-fw8c.svc.cluster.local:27010(23.53 GB)
  impala-executor-000-8.impala-executor.impala-1761163328-fw8c.svc.cluster.local:27010(23.40 GB)
  impala-executor-000-0.impala-executor.impala-1761163328-fw8c.svc.cluster.local:27010(23.65 GB)
  impala-executor-000-9.impala-executor.impala-1761163328-fw8c.svc.cluster.local:27010(22.79 GB)
  impala-executor-000-5.impala-executor.impala-1761163328-fw8c.svc.cluster.local:27010(23.54 GB)
  impala-executor-000-6.impala-executor.impala-1761163328-fw8c.svc.cluster.local:27010(23.68 GB)
  impala-executor-000-7.impala-executor.impala-1761163328-fw8c.svc.cluster.local:27010(22.66 GB)
  impala-executor-000-2.impala-executor.impala-1761163328-fw8c.svc.cluster.local:27010(24.11 GB)


--
To view, visit http://gerrit.cloudera.org:8080/22926
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iba4f40993d8360bfdc1eb592a626e285dfed1627
Gerrit-Change-Number: 22926
Gerrit-PatchSet: 15
Gerrit-Owner: Riza Suminto <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Comment-Date: Tue, 06 Jan 2026 06:52:56 +0000
Gerrit-HasComments: Yes
