[
https://issues.apache.org/jira/browse/IMPALA-14771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yida Wu resolved IMPALA-14771.
------------------------------
Fix Version/s: Impala 5.0.0
Target Version: Impala 5.0.0
Resolution: Fixed
> Admissiond crash in DequeueLoop caused by dangling ScheduleState
> ----------------------------------------------------------------
>
> Key: IMPALA-14771
> URL: https://issues.apache.org/jira/browse/IMPALA-14771
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 5.0.0
> Reporter: Yida Wu
> Assignee: Yida Wu
> Priority: Major
> Fix For: Impala 5.0.0
>
>
> IMPALA-14661 adds support for compressing admission requests, but the test
> TestAdmissionControllerWithACService was not correctly applying the start
> flag matrix, so compression was not actually enabled during testing.
> After fixing the test, we found that admissiond can hit a
> [DCHECK|https://github.com/apache/impala/blob/master/be/src/scheduling/schedule-state.cc#L215C3-L215C50]
> in DequeueLoop due to a dangling ScheduleState after
> ClearDecompressedCache().
> {code:java}
> #3 0x000000000187b3fe in impala::ScheduleState::GetPerExecutorMemoryEstimate (this=this@entry=0xc03f800) at /impala/Impala/be/src/scheduling/schedule-state.cc:215
> #4 0x000000000187bce5 in impala::ScheduleState::UpdateMemoryRequirements (this=this@entry=0xc03f800, pool_cfg=..., coord_mem_limit_admission=12884901888, executor_mem_limit_admission=12884901888) at /impala/Impala/be/src/scheduling/schedule-state.cc:329
> #5 0x0000000001812328 in impala::AdmissionController::FindGroupToAdmitOrReject (this=this@entry=0x951fc00, membership_snapshot=..., pool_config=..., root_cfg=..., admit_from_queue=admit_from_queue@entry=true, pool_stats=pool_stats@entry=0xc5c1bb0, queue_node=0xc622730, coordinator_resource_limited=@0x7f3991e415fe: false, is_trivial=0x7f3991e415ff) at /impala/Impala/be/src/scheduling/admission-controller.cc:2501
> #6 0x0000000001812cf4 in impala::AdmissionController::TryDequeue (this=this@entry=0x951fc00) at /impala/Impala/be/src/scheduling/admission-controller.cc:2686
> #7 0x000000000181497a in impala::AdmissionController::DequeueLoop (this=0x951fc00) at /impala/Impala/be/src/scheduling/admission-controller.cc:2646
> #8 0x0000000001816f83 in boost::_mfi::mf0<void, impala::AdmissionController>::operator() (p=<optimized out>, this=<optimized out>) at /impala/Impala/toolchain/toolchain-packages-gcc10.4.0/boost-1.74.0-p1/include/boost/bind/mem_fn_template.hpp:49
> {code}
> The reason is that ScheduleState [depends on the decompressed
> TQueryExecRequest|https://github.com/apache/impala/blob/master/be/src/scheduling/admission-controller.cc#L2424].
> When a query is enqueued, [ClearDecompressedCache() is
> called|https://github.com/apache/impala/blob/master/be/src/scheduling/admission-controller.cc#L1793]
> to save memory, which frees the decompressed exec request. However,
> queue_node->group_states still holds ScheduleState objects that reference
> this freed request.
> When the query is later dequeued, these stale objects are reused and trigger
> the DCHECK failure.
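
The lifetime hazard described above can be illustrated with a minimal, self-contained C++ sketch. It is not Impala's code and not necessarily how the issue was fixed. `ExecRequest`, `GroupState`, `QueueNode`, `ClearCacheOnly()` and `ClearCacheAndStates()` are hypothetical stand-ins for the decompressed TQueryExecRequest, ScheduleState, the queue node and ClearDecompressedCache(). The sketch uses `std::weak_ptr` so that a freed request can be detected safely instead of being dereferenced, and it assumes that one way to avoid the problem is to drop the dependent states together with the cache and rebuild them on dequeue.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for the decompressed TQueryExecRequest.
struct ExecRequest {
  std::int64_t per_executor_mem_estimate;
};

// Hypothetical stand-in for ScheduleState. It observes the request through a
// weak_ptr, so a freed request is detectable rather than a dangling pointer.
class GroupState {
 public:
  explicit GroupState(const std::shared_ptr<const ExecRequest>& request)
      : request_(request) {}

  bool IsStale() const { return request_.expired(); }

  // Writes the estimate to *out and returns true, or returns false if the
  // underlying request has already been freed.
  bool GetPerExecutorMemEstimate(std::int64_t* out) const {
    std::shared_ptr<const ExecRequest> request = request_.lock();
    if (request == nullptr) return false;
    *out = request->per_executor_mem_estimate;
    return true;
  }

 private:
  std::weak_ptr<const ExecRequest> request_;
};

// Hypothetical stand-in for a queued admission request.
struct QueueNode {
  std::int64_t compressed_payload = 0;  // stands in for the compressed bytes
  std::shared_ptr<const ExecRequest> decompressed;
  std::vector<GroupState> group_states;

  // Recreates the decompressed request from the payload if it was cleared.
  void EnsureDecompressed() {
    if (decompressed == nullptr) {
      decompressed = std::make_shared<ExecRequest>(ExecRequest{compressed_payload});
    }
  }

  // Builds fresh states that all refer to the current decompressed request.
  void BuildGroupStates(std::size_t num_groups) {
    EnsureDecompressed();
    assert(decompressed != nullptr);
    group_states.clear();
    for (std::size_t i = 0; i < num_groups; ++i) {
      group_states.emplace_back(decompressed);
    }
  }

  // Problematic pattern: frees the request but keeps states that refer to it.
  void ClearCacheOnly() { decompressed.reset(); }

  // Assumed remedy: drop the dependent states together with the cache.
  void ClearCacheAndStates() {
    decompressed.reset();
    group_states.clear();
  }
};

// Counts the states whose request has been freed.
std::size_t CountStaleStates(const QueueNode& node) {
  std::size_t stale = 0;
  for (const GroupState& state : node.group_states) {
    if (state.IsStale()) ++stale;
  }
  return stale;
}
```

In this model, calling `ClearCacheOnly()` after `BuildGroupStates()` leaves every state stale, which mirrors the situation the dequeue path ran into. Calling `ClearCacheAndStates()` and then `BuildGroupStates()` again on dequeue yields states backed by a live request.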