Yida Wu created IMPALA-14234:
--------------------------------
Summary: AdmissionD DCHECK hit during statestore and coordinator failover
Key: IMPALA-14234
URL: https://issues.apache.org/jira/browse/IMPALA-14234
Project: IMPALA
Issue Type: Bug
Components: Backend
Reporter: Yida Wu
Assignee: Yida Wu
In certain cases, admissiond hits a DCHECK during statestore and coordinator failover.
Repro steps:
1. Start the minicluster with two coordinators and one global admissiond that allows at most one running request per pool.
{code:java}
$IMPALA_HOME/bin/start-impala-cluster.py \
  --admissiond_args='--default_pool_max_requests=1' --num_coordinators=2 \
  --enable_admission_service
{code}
2. Run a long-running query (query 1) on coordinator 1; it is admitted. Run a short query (query 2) on coordinator 2; it is queued. Wait until query 2 times out in the queue.
3. Kill statestored first, then kill coordinator 1.
4. Run a short query (query 3) on coordinator 2, then start statestored again.
Intermittently, admissiond then hits the DCHECK, as the following log shows:
{code:java}
I0716 15:39:22.721899 5746 admission-controller.cc:2665] Could not dequeue query id=ec4b5866ca3ad3a7:53ec93f300000000 reason: number of running queries 1 is at or over limit 1.
I0716 15:39:22.822168 5746 admission-controller.cc:2665] Could not dequeue query id=ec4b5866ca3ad3a7:53ec93f300000000 reason: number of running queries 1 is at or over limit 1.
I0716 15:39:22.922407 5746 admission-controller.cc:2665] Could not dequeue query id=ec4b5866ca3ad3a7:53ec93f300000000 reason: number of running queries 1 is at or over limit 1.
I0716 15:39:23.022684 5746 admission-controller.cc:2665] Could not dequeue query id=ec4b5866ca3ad3a7:53ec93f300000000 reason: number of running queries 1 is at or over limit 1.
I0716 15:39:23.122916 11038 cluster-membership-mgr.cc:280] Local impala server needs update
I0716 15:39:23.122927 11038 cluster-membership-mgr.cc:295] Received delta membership update
I0716 15:39:23.122938 11038 admission-controller.cc:1960] Detected that coordinator c14e143286ccb6aa:be447b1fe338f587 is no longer in the cluster membership. Cancelling 1 queries for this coordinator.
I0716 15:39:23.122975 11038 admission-controller.cc:1839] ReleaseQuery for 49496fa9222bdcb1:00f17a2500000000 called with 1 unreleased backends. Releasing automatically.
F0716 15:39:23.123034 5746 admission-controller.cc:2227] Check failed: current_membership_version >= previous_membership_version (2 vs. 4)
{code}
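To make the failed invariant concrete, here is a minimal C++ sketch. It is a hypothetical model, not Impala's actual admission-controller code: the class `MembershipVersionTracker` and its `Observe()`/`previous()` methods are invented for illustration. It assumes only what the check message shows, namely that admissiond remembers the last membership version it processed (`previous_membership_version`) and expects each newly seen version (`current_membership_version`) to be at least as large. The failure above reports a current version of 2 against a remembered version of 4.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model (not Impala code) of a component that caches the last
// cluster-membership version it processed and requires versions to never
// go backwards.
class MembershipVersionTracker {
 public:
  // Returns true and records the version if it is >= the cached one.
  // Returns false (leaving the cache unchanged) when the version moved
  // backwards, i.e. the situation in which the real DCHECK fires.
  bool Observe(int64_t current_version) {
    if (current_version < previous_version_) return false;
    previous_version_ = current_version;
    return true;
  }

  // Last version accepted by Observe().
  int64_t previous() const { return previous_version_; }

 private:
  int64_t previous_version_ = 0;
};
```

In a debug build the real check is a DCHECK, which aborts the process (the `F` line in the log). The sketch returns false instead so the violation can be observed: observing 4 and then 2 reproduces the "(2 vs. 4)" ordering from the log.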