Yida Wu has uploaded this change for review. ( http://gerrit.cloudera.org:8080/24055 )

Change subject: IMPALA-14763: Prevent admissiond OOM during request decompression
......................................................................

IMPALA-14763: Prevent admissiond OOM during request decompression

When admissiond is close to its memory limit and a very large queued
query is dequeued, decompressing the compressed request can push
memory usage over the limit and cause an OOM.

IMPALA-14493 added a memory safeguard on Submission for uncompressed
requests by checking TCMalloc's BYTES_IN_USE, but after IMPALA-14661
we also need to handle the decompression cases.

This patch adds a memory safeguard for compressed requests, whose
decompression happens mainly on Submission or Dequeue. All of the
rejection logic moves into a static function,
RejectForAdmissionServiceMemory(), and a new memory tracker,
pending_decompression_mem_tracker, tracks the total uncompressed
size of pending compressed requests.

RejectForAdmissionServiceMemory() compares the current tcmalloc
bytes-in-use plus the additional memory to reserve against the
process memory limit. For compressed requests, we first add the
request’s uncompressed size to pending_decompression_mem_tracker,
then pass the total pending uncompressed size as the additional
reserved memory to RejectForAdmissionServiceMemory(), which ensures
thread safety. For uncompressed requests, the additional memory is
zero. If the check fails, RejectForAdmissionServiceMemory() returns
an error and admissiond rejects the query.

Additionally, to prevent early decompression of queued compressed
requests when GetQueryStatus() is called, the
AC_AFTER_ADMISSION_OUTCOME debug action in WaitOnQueued() is removed
if the request is compressed.

Testing:
Added a new test that checks compressed requests are rejected on
Submission.
Manually verified that the safeguard also works at Dequeue; an
automated test for the Dequeue case was too flaky to include.
Passed test_admission_controller.py in exhaustive mode.

Change-Id: I196455f445f0644d89467a23b4ec1f64f184f2db
---
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/admission-controller.h
M tests/custom_cluster/test_admission_controller.py
3 files changed, 145 insertions(+), 71 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/24055/1
-- 
To view, visit http://gerrit.cloudera.org:8080/24055
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I196455f445f0644d89467a23b4ec1f64f184f2db
Gerrit-Change-Number: 24055
Gerrit-PatchSet: 1
Gerrit-Owner: Yida Wu <[email protected]>
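
For readers of this change, the reserve-then-check flow described in the commit message can be sketched roughly as below. This is a standalone illustration, not the code in admission-controller.cc. The class PendingDecompressionTracker, the function TryAdmit(), the string return type, the strict ">" comparison, and the point at which the reservation is released are all assumptions made for the sketch; the actual patch uses Impala's own memory tracker and status types.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for pending_decompression_mem_tracker: an atomic
// counter of the uncompressed bytes of all pending compressed requests.
class PendingDecompressionTracker {
 public:
  // Reserves `bytes` and returns the new total, so every caller checks a
  // total that already includes its own reservation.
  int64_t Consume(int64_t bytes) {
    assert(bytes >= 0);
    return total_.fetch_add(bytes) + bytes;
  }
  // Returns a reservation, e.g. after rejection or after decompression.
  void Release(int64_t bytes) { total_.fetch_sub(bytes); }
  int64_t total() const { return total_.load(); }

 private:
  std::atomic<int64_t> total_{0};
};

// Sketch of the check: reject when the bytes in use plus the memory to
// reserve exceed the process limit. An empty string means "OK"; the real
// function presumably returns an Impala status object instead (assumption).
std::string RejectForAdmissionServiceMemory(
    int64_t bytes_in_use, int64_t additional_bytes, int64_t process_limit) {
  if (bytes_in_use + additional_bytes > process_limit) {
    return "admission service memory limit exceeded";
  }
  return "";
}

// Hypothetical caller. For compressed requests the uncompressed size is
// reserved first and the total pending size is passed to the check; for
// uncompressed requests the additional memory is zero.
bool TryAdmit(PendingDecompressionTracker& tracker, int64_t bytes_in_use,
              int64_t process_limit, bool compressed,
              int64_t uncompressed_size) {
  int64_t additional = 0;
  if (compressed) additional = tracker.Consume(uncompressed_size);
  const std::string err =
      RejectForAdmissionServiceMemory(bytes_in_use, additional, process_limit);
  if (!err.empty()) {
    // Assumption: a rejected request gives its reservation back right away.
    if (compressed) tracker.Release(uncompressed_size);
    return false;
  }
  // Assumption: on success the caller releases after decompression is done.
  return true;
}
```

Reserving before checking means two concurrent compressed requests each see the other's reservation, so they cannot both pass a check that only one of them fits under. In the sketch, bytes_in_use is passed in as a plain number so the logic can be exercised without tcmalloc.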
