[
https://issues.apache.org/jira/browse/IMPALA-12057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18012057#comment-18012057
]
ASF subversion and git services commented on IMPALA-12057:
----------------------------------------------------------
Commit 59fdd7169a4523a2c4916096d550855e49c8a35a in impala's branch
refs/heads/master from Yida Wu
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=59fdd7169 ]
IMPALA-10866: Add testcases for failure cases involving the admission service
The admission service uses the statestore as the only source of
truth to determine whether a coordinator is down. If the statestore
reports a coordinator is down, all running and queued queries
associated with it should be cancelled or rejected.
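As an illustration of the rule above (not the actual admissiond code, which is C++), here is a minimal Python sketch. The class `AdmissionState`, the method `on_membership_update`, and the coordinator names are hypothetical and exist only for this example. It assumes the statestore delivers a snapshot of the coordinators it currently considers live.

```python
from dataclasses import dataclass, field


@dataclass
class AdmissionState:
    """Hypothetical model of admission bookkeeping; not Impala's real classes."""
    # Both maps go from query id to the id of the owning coordinator.
    running: dict = field(default_factory=dict)
    queued: dict = field(default_factory=dict)

    def on_membership_update(self, live_coordinators):
        """Apply a statestore membership snapshot.

        Returns (cancelled, rejected): the running queries that were
        cancelled and the queued queries that were rejected because
        their coordinator is absent from the snapshot.
        """
        live = set(live_coordinators)
        cancelled = sorted(q for q, c in self.running.items() if c not in live)
        rejected = sorted(q for q, c in self.queued.items() if c not in live)
        for query_id in cancelled:
            del self.running[query_id]
        for query_id in rejected:
            del self.queued[query_id]
        return cancelled, rejected


state = AdmissionState(
    running={"q1": "coord-a", "q2": "coord-b"},
    queued={"q3": "coord-a", "q4": "coord-b"},
)
# The statestore snapshot no longer lists coord-a.
cancelled, rejected = state.on_membership_update(["coord-b"])
```

In this sketch the membership snapshot is the only input consulted, which mirrors the "only source of truth" statement above; admissiond does not probe the coordinator itself.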
In IMPALA-12057, we introduced logic to reject queued queries if
the corresponding coordinator has been removed, along with tests
for that behavior.
This patch adds additional test cases to cover other failure
scenarios, such as the coordinator or the statestore going down
with running queries, and verifies that the behavior is as expected
in each case.
Tests:
Passed exhaustive tests.
Change-Id: If617326cbc6fe2567857d6323c6413d98c92d009
Reviewed-on: http://gerrit.cloudera.org:8080/23217
Reviewed-by: Riza Suminto <[email protected]>
Reviewed-by: Abhishek Rawat <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
> admissiond fails to admit queued queries if coordinator's membership id
> changes
> -------------------------------------------------------------------------------
>
> Key: IMPALA-12057
> URL: https://issues.apache.org/jira/browse/IMPALA-12057
> Project: IMPALA
> Issue Type: Bug
> Reporter: Abhishek Rawat
> Assignee: Yida Wu
> Priority: Critical
> Fix For: Impala 5.0.0
>
>
> If a coordinator's subscription id changes (due to a restart or a
> reconnection with statestored), admissiond cannot tell whether the
> coordinator was only briefly disconnected, has rejoined the cluster, and
> still holds the query state, or whether it was restarted and no longer
> knows anything about the queued query.
> Ideally, in such cases admissiond should learn from the coordinator and
> statestored that the queued queries are still valid and that the
> subscription id has changed, so that the admission controller can submit
> the queued queries.
> Until we support that, we should at least fail these queries immediately.
> Currently, the admission controller goes into an infinite loop waiting on
> these queued queries:
> {code:java}
> I0411 13:52:22.694419 67 admission-controller.cc:2206] Could not dequeue query id=c748095c589ccfb6:3819937100000000 reason: Coordinator not registered with the statestore.
> I0411 13:52:22.795398 67 admission-controller.cc:2206] Could not dequeue query id=c748095c589ccfb6:3819937100000000 reason: Coordinator not registered with the statestore.
> ....
> I0411 15:14:11.063143 67 admission-controller.cc:2206] Could not dequeue query id=c748095c589ccfb6:3819937100000000 reason: Coordinator not registered with the statestore.
> I0411 15:14:11.164698 67 admission-controller.cc:2206] Could not dequeue query id=c748095c589ccfb6:3819937100000000 reason: Coordinator not registered with the statestore.
> {code}
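The difference between retrying forever and failing immediately, as requested in the description above, can be sketched as follows. This is a hedged Python illustration, not the logic in admission-controller.cc: the function `process_queue`, the field name `coordinator_subscription_id`, and the example subscription ids are all hypothetical. Resource checks are omitted, so any query whose coordinator is still registered is simply admitted.

```python
NOT_REGISTERED = "Coordinator not registered with the statestore."


def process_queue(queue, registered_subscription_ids):
    """Make one pass over the queued queries.

    queue: list of dicts with 'id' and 'coordinator_subscription_id'.
    registered_subscription_ids: subscription ids the statestore
        currently reports.

    Returns (admitted, rejected). A query whose recorded subscription id
    is no longer registered is rejected right away instead of being left
    in the queue to be retried on every pass.
    """
    registered = set(registered_subscription_ids)
    admitted, rejected = [], []
    for query in queue:
        if query["coordinator_subscription_id"] in registered:
            admitted.append(query["id"])
        else:
            rejected.append((query["id"], NOT_REGISTERED))
    queue.clear()  # nothing stays behind, so the pass cannot loop forever
    return admitted, rejected


queue = [
    {"id": "c748095c589ccfb6:3819937100000000",
     "coordinator_subscription_id": "sub-old"},
    {"id": "other-query", "coordinator_subscription_id": "sub-live"},
]
# The first coordinator re-registered under a new id, so "sub-old" is gone.
admitted, rejected = process_queue(queue, {"sub-new", "sub-live"})
```

Rejecting here is the conservative choice the description asks for: without a way to confirm that the re-registered coordinator still holds the query state, the sketch treats the old subscription id as permanently invalid.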
--
This message was sent by Atlassian Jira
(v8.20.10#820010)