Hello Yida Wu, Daniel Becker, Abhishek Rawat, Jason Fehr, Csaba Ringhofer, Wenzhe Zhou, Impala Public Jenkins,
I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21762 to look at the new patch set (#36).

Change subject: IMPALA-13333: Limit memory estimation if PlanNode can spill
......................................................................

IMPALA-13333: Limit memory estimation if PlanNode can spill

SortNode, AggregationNode, and HashJoinNode (on the build side) can spill
to disk. However, their memory estimation does not account for this
capability and assumes each operator will hold all of its rows in memory.
This causes memory overestimation when cardinality is also overestimated.

In reality, query execution on a single host is often subject to a much
lower memory upper bound and is not allowed to exceed it. This upper bound
is dictated by, among others:
- MEM_LIMIT
- MEM_LIMIT_COORDINATORS
- MEM_LIMIT_EXECUTORS
- MAX_MEM_ESTIMATE_FOR_ADMISSION
- impala.admission-control.max-query-mem-limit.<pool_name> from admission
  control.

This patch adds a SpillableOperator interface that defines an alternative
to PlanNode.computeNodeResourceProfile() or DataSink.computeResourceProfile()
for the case where a lower memory upper bound can be derived from the
configs mentioned above. This interface is implemented by SortNode,
AggregationNode, HashJoinNode, and JoinBuildSink.

The in-memory vs. spill-to-disk bias is controlled through the
MEM_ESTIMATE_SCALE_FOR_SPILLING_OPERATOR option, a scale in [0.0, 1.0]
that controls the estimated peak memory of query operators with
spill-to-disk capability. A value closer to 1.0 biases the planner towards
keeping as many rows as possible in memory, while a value closer to 0.0
biases it towards spilling rows to disk under memory pressure. Note that
lowering MEM_ESTIMATE_SCALE_FOR_SPILLING_OPERATOR can make a query that
was previously rejected by the admission controller admittable, but the
query may also spill to disk more and run a higher risk of exhausting
scratch space.

There are some caveats to this memory bounding patch:
- It checks whether spill-to-disk is enabled in the coordinator, but
  individual backend executors might not have it configured. A mismatch of
  spill-to-disk configs between the coordinator and a backend executor,
  however, is rare and can be considered a misconfiguration.
- It does not check the actual total scratch space available for
  spill-to-disk. However, query execution will be forced to spill anyway
  if memory usage exceeds the memory limits mentioned above. Raising the
  MEM_LIMIT / MEM_LIMIT_EXECUTORS option can help increase the memory
  estimate and the likelihood that the query is assigned to a larger
  executor group set, which usually has more total scratch space.
- The memory bound is divided evenly among all instances of a fragment
  kind on a single host. In theory, though, they should be able to share
  and grow their memory usage independently beyond the memory estimate as
  long as the max memory reservation is not set.
- It does not consider other memory-related configs such as the
  clamp_query_mem_limit_backend_mem_limit or disable_pool_mem_limits
  flags, but the admission controller will still enforce them if set.

Testing:
- Passed FE and custom cluster tests with core exploration.
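For readers skimming the change, here is a minimal, self-contained sketch of
the bounded-estimate idea described above. The class and method names below
are stand-ins for illustration only and are not the patch's actual API; only
the option name and the general behavior come from the commit message.

// Hypothetical sketch, not the actual Impala code: shows how a per-host memory
// bound and the MEM_ESTIMATE_SCALE_FOR_SPILLING_OPERATOR scale could cap an
// operator's in-memory estimate. All names here are invented for illustration.
public class SpillAwareEstimateSketch {

  /**
   * Returns a capped memory estimate for a spillable operator.
   *
   * @param inMemoryEstimate bytes needed to hold all rows in memory
   * @param perHostMemBound  the lowest applicable per-host memory limit
   *                         (e.g. MEM_LIMIT / MEM_LIMIT_EXECUTORS / pool limit),
   *                         or <= 0 if no such bound is known
   * @param numInstances     instances of this fragment kind on the host; the
   *                         bound is split evenly among them
   * @param spillScale       MEM_ESTIMATE_SCALE_FOR_SPILLING_OPERATOR in [0.0, 1.0];
   *                         closer to 1.0 keeps more rows in memory, closer to
   *                         0.0 biases towards spilling under memory pressure
   * @param minReservation   minimum buffer reservation needed to run while spilling
   */
  static long boundedEstimate(long inMemoryEstimate, long perHostMemBound,
      int numInstances, double spillScale, long minReservation) {
    // Without a usable bound, keep the original (possibly overestimated) value.
    if (perHostMemBound <= 0) return inMemoryEstimate;
    long perInstanceBound = perHostMemBound / Math.max(numInstances, 1);
    // Scale the bound to express the in-memory vs. spill-to-disk bias, but never
    // go below the minimum reservation the operator needs to make progress.
    long scaledBound =
        Math.max((long) (perInstanceBound * spillScale), minReservation);
    return Math.min(inMemoryEstimate, scaledBound);
  }

  public static void main(String[] args) {
    // Example: an overestimated cardinality suggests 40 GB in memory, but the
    // host is limited to 10 GB across 2 instances; with scale 0.5 the estimate
    // is capped at 2.5 GB.
    long est = boundedEstimate(40L << 30, 10L << 30, 2, 0.5, 64L << 20);
    System.out.println("bounded estimate (bytes): " + est);
  }
}

Since the option is a regular query option, it could be tuned per query with
a statement like SET MEM_ESTIMATE_SCALE_FOR_SPILLING_OPERATOR=0.5;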
Change-Id: I290c4e889d4ab9e921e356f0f55a9c8b11d0854e
---
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/JoinBuildSink.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/ResourceProfile.java
M fe/src/main/java/org/apache/impala/planner/ResourceProfileBuilder.java
M fe/src/main/java/org/apache/impala/planner/SortNode.java
A fe/src/main/java/org/apache/impala/planner/SpillableOperator.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/util/JniRequestPoolService.java
M fe/src/main/java/org/apache/impala/util/RequestPoolService.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M fe/src/test/java/org/apache/impala/planner/TpcdsCpuCostPlannerTest.java
M fe/src/test/resources/llama-site-3-groups.xml
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-ddl-iceberg.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-ddl-parquet.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q22.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q23a.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q23b.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q38.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q51.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q57.test
A testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q64-mem_limit-10g.test
A testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q64-mem_limit_executors-20g.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q64.test
A testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q67-mem_limit-10g.test
A testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q67-mem_limit_executors-20g.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q67.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q72.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q87.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q95.test
A testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q97-mem_limit-10g.test
A testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q97-mem_limit_executors-20g.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q97.test
M tests/custom_cluster/test_executor_groups.py
43 files changed, 11,222 insertions(+), 203 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/62/21762/36

--
To view, visit http://gerrit.cloudera.org:8080/21762
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I290c4e889d4ab9e921e356f0f55a9c8b11d0854e
Gerrit-Change-Number: 21762
Gerrit-PatchSet: 36
Gerrit-Owner: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Reviewer: Abhishek Rawat <ara...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Jason Fehr <jf...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Yida Wu <wydbaggio...@gmail.com>