Yida Wu has uploaded this change for review. ( http://gerrit.cloudera.org:8080/24055 )


Change subject: IMPALA-14763: Prevent admissiond OOM during request 
decompression
......................................................................

IMPALA-14763: Prevent admissiond OOM during request decompression

When admissiond is close to its memory limit and a very large
queued query is dequeued, decompression of this compressed
request can push memory usage over the limit and cause an OOM.

Previously, IMPALA-14493 added a check of TCMalloc's
BYTES_IN_USE as a memory safeguard on Submission for
uncompressed requests, but after IMPALA-14661 we also need to
consider decompression. This patch adds a memory safeguard for
compressed requests, whose decompression mainly happens on
Submission or Dequeue.

We put all the rejection logic into a static function
RejectForAdmissionServiceMemory(), and introduce a new memory
tracker, pending_decompression_mem_tracker, to track the total
uncompressed size of pending compressed requests.
RejectForAdmissionServiceMemory() compares the current
tcmalloc bytes-in-use plus the additional memory to reserve
against the process memory limit.
For compressed requests, we first add the request’s uncompressed
size to pending_decompression_mem_tracker, then pass the total
pending uncompressed size as the additional reserved memory to
RejectForAdmissionServiceMemory(), ensuring thread safety.
For uncompressed requests, the additional memory is zero.
If the check fails, RejectForAdmissionServiceMemory() returns
an error and admissiond rejects the query.
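
The flow above can be illustrated with a minimal, standalone
sketch. It is not the code in this patch: PendingBytesSketch,
ExceedsLimitSketch() and TryReserveForDecompression() are
hypothetical names, the tracker is reduced to a single atomic
counter, and the strict '>' comparison and the release on
rejection are assumptions.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-in for pending_decompression_mem_tracker: an atomic
// running total of the uncompressed bytes of pending compressed requests.
class PendingBytesSketch {
 public:
  // Adds 'bytes' and returns the new total in one atomic step, so
  // concurrent callers each observe a total that includes their own bytes.
  int64_t Consume(int64_t bytes) { return total_.fetch_add(bytes) + bytes; }
  void Release(int64_t bytes) { total_.fetch_sub(bytes); }
  int64_t Total() const { return total_.load(); }

 private:
  std::atomic<int64_t> total_{0};
};

// Mirrors the comparison described for RejectForAdmissionServiceMemory():
// bytes in use plus the additional reservation against the process limit.
// Returns true when the request should be rejected. The strict '>' is an
// assumption of this sketch.
inline bool ExceedsLimitSketch(
    int64_t bytes_in_use, int64_t additional_bytes, int64_t mem_limit) {
  return bytes_in_use + additional_bytes > mem_limit;
}

// Compressed-request path: reserve first, then check with the total pending
// size. Releasing the reservation on rejection is an assumption.
inline bool TryReserveForDecompression(PendingBytesSketch& pending,
    int64_t bytes_in_use, int64_t uncompressed_size, int64_t mem_limit) {
  const int64_t total_pending = pending.Consume(uncompressed_size);
  if (ExceedsLimitSketch(bytes_in_use, total_pending, mem_limit)) {
    pending.Release(uncompressed_size);
    return false;  // caller would reject the query
  }
  return true;  // caller may decompress, then release the reservation
}
```

For an uncompressed request the same check would be called
with an additional size of zero, e.g.
ExceedsLimitSketch(bytes_in_use, 0, mem_limit).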

Additionally, to prevent early decompression for queued
compressed requests when GetQueryStatus() is called, the
AC_AFTER_ADMISSION_OUTCOME debug action in WaitOnQueued() is
removed if the request is compressed.

Testing:
Added a new test that checks compressed requests are rejected
on Submission. Manually verified that the safeguard also works
at Dequeue; an automated test for the Dequeue case was too
flaky to include.
Passed test_admission_controller.py in exhaustive mode.

Change-Id: I196455f445f0644d89467a23b4ec1f64f184f2db
---
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/admission-controller.h
M tests/custom_cluster/test_admission_controller.py
3 files changed, 145 insertions(+), 71 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/24055/1
--
To view, visit http://gerrit.cloudera.org:8080/24055
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I196455f445f0644d89467a23b4ec1f64f184f2db
Gerrit-Change-Number: 24055
Gerrit-PatchSet: 1
Gerrit-Owner: Yida Wu <[email protected]>
