[
https://issues.apache.org/jira/browse/IMPALA-14466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18023439#comment-18023439
]
ASF subversion and git services commented on IMPALA-14466:
----------------------------------------------------------
Commit 63dee747122c613fe4399c7320d5796b1b8626b4 in impala's branch
refs/heads/master from Yida Wu
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=63dee7471 ]
IMPALA-14466: Remote client should not cache admissiond's IP when retrying
AdmitQuery RPC
The remote admission client's retry logic for the AdmitQuery RPC did not
handle cases where the admissiond restarts with a new IP address.
The client would reuse the old proxy and keep retrying against the old,
stale IP, causing queries to time out.
This change fixes the issue by calling GetProxy() inside the retry
loop. This forces the client to re-resolve the admissiond's network
address on each retry attempt, allowing it to discover the new endpoint
and reconnect successfully.
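The effect of resolving the proxy on every attempt can be shown with a minimal C++ sketch. It is an illustration under stated assumptions, not the actual client code: Proxy, GetProxy(), AdmitQueryWithRetry() and DemoHostsFix() are hypothetical stand-ins for AdmissionControlService::GetProxy() and the AdmitQuery retry loop. A counting lambda plays the role of /etc/hosts, returning the unreachable 127.0.0.2 for the first two lookups and 127.0.0.1 afterwards.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical stand-in for an RPC proxy bound to one resolved address.
struct Proxy {
  std::string ip;
};

// Hypothetical GetProxy(): resolves the host name at call time and binds a
// fresh proxy to whatever address the lookup returns right now.
Proxy GetProxy(const std::function<std::string()>& resolve_host) {
  return Proxy{resolve_host()};
}

// Retry loop with GetProxy() called on every attempt, so an address change
// is picked up by the next retry. 'rpc_ok' simulates whether the RPC to a
// given IP succeeds. Returns the 1-based attempt that succeeded, or -1.
int AdmitQueryWithRetry(const std::function<std::string()>& resolve_host,
                        const std::function<bool(const std::string&)>& rpc_ok,
                        int max_attempts) {
  assert(max_attempts > 0);
  for (int attempt = 1; attempt <= max_attempts; ++attempt) {
    Proxy proxy = GetProxy(resolve_host);  // re-resolve on each retry
    if (rpc_ok(proxy.ip)) return attempt;
    // A real client would back off here before the next attempt.
  }
  return -1;
}

// Mirrors the manual test: the first two lookups see the unreachable
// 127.0.0.2, then the hosts entry is "fixed" and lookups see 127.0.0.1.
int DemoHostsFix() {
  int lookups = 0;
  auto resolve = [&lookups]() {
    return ++lookups <= 2 ? std::string("127.0.0.2") : std::string("127.0.0.1");
  };
  auto reachable = [](const std::string& ip) { return ip == "127.0.0.1"; };
  return AdmitQueryWithRetry(resolve, reachable, 5);  // succeeds on attempt 3
}
```

If the GetProxy() call were hoisted above the loop, the same simulation would resolve only once, never leave 127.0.0.2, and return -1.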
Tests:
Passed admissiond-related exhaustive ee tests.
Since changing host entries automatically is difficult, the fix was
tested manually by editing /etc/hosts with the following steps:
1. Start with --admission_service_host=localhost.
2. Change the 'localhost' entry in /etc/hosts to an inaccessible IP,
such as 127.0.0.2.
3. Submit a query; it blocks in the retry logic.
4. While the query is blocked, change 'localhost' in /etc/hosts
back to 127.0.0.1.
5. The query succeeds.
Change-Id: I5857de84ce69902b902099f668e87d747f944aff
Reviewed-on: http://gerrit.cloudera.org:8080/23472
Reviewed-by: Abhishek Rawat <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
> Admissiond remote client retry may not work if ip changes
> ---------------------------------------------------------
>
> Key: IMPALA-14466
> URL: https://issues.apache.org/jira/browse/IMPALA-14466
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 5.0.0
> Reporter: Yida Wu
> Assignee: Yida Wu
> Priority: Major
>
> The {{RemoteAdmissionControlClient}} gets the service proxy for
> {{admissiond}} once, before entering the retry loop. If the {{admissiond}}
> service restarts and its endpoint IP changes, the client keeps retrying
> against the old, stale IP address until the query fails with a timeout.
> *Reproduce:*
> # Start Impala with the {{admissiond}} service.
> # Make the {{admission_service_host}} unreachable: assuming it is set to
> "localhost", change the localhost entry in /etc/hosts to the inaccessible IP 127.0.0.2.
> # Submit a query; it will be blocked in the retry logic.
> # Change the localhost entry in /etc/hosts back to 127.0.0.1.
> # *Expected:* The client's retry logic should discover the new
> {{admissiond}} IP and succeed.
> # *Actual:* The client continues retrying the old IP and the query times out.
> Moving the {{AdmissionControlService::GetProxy()}} call inside the {{while}}
> loop could be a solution.
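The failure mode reported in the issue can be reproduced in miniature with a C++ sketch of the pre-fix shape, where the proxy is obtained once before the retry loop. All names here (Proxy, GetProxy(), AdmitQueryHoistedProxy(), DemoStaleProxy()) are hypothetical and are not the real RemoteAdmissionControlClient code. A counting lambda stands in for /etc/hosts, returning 127.0.0.2 on the first lookup and 127.0.0.1 on any later one.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical stand-in for an RPC proxy bound to one resolved address.
struct Proxy {
  std::string ip;
};

// Hypothetical GetProxy(): resolves the host name at call time.
Proxy GetProxy(const std::function<std::string()>& resolve_host) {
  return Proxy{resolve_host()};
}

// Pre-fix shape: the proxy is obtained once, before the while loop, so every
// retry targets the address resolved up front. Returns the 1-based attempt
// that succeeded, or -1 if all attempts failed.
int AdmitQueryHoistedProxy(const std::function<std::string()>& resolve_host,
                           const std::function<bool(const std::string&)>& rpc_ok,
                           int max_attempts) {
  assert(max_attempts > 0);
  Proxy proxy = GetProxy(resolve_host);  // resolved only once
  int attempt = 0;
  while (attempt < max_attempts) {
    ++attempt;
    if (rpc_ok(proxy.ip)) return attempt;
    // Even if the hosts entry now points elsewhere, proxy.ip never changes.
  }
  return -1;
}

// Simulates the reproduction steps: the first lookup sees 127.0.0.2; later
// lookups would see 127.0.0.1, but the hoisted proxy never asks again.
int DemoStaleProxy() {
  int lookups = 0;
  auto resolve = [&lookups]() {
    return ++lookups == 1 ? std::string("127.0.0.2") : std::string("127.0.0.1");
  };
  auto reachable = [](const std::string& ip) { return ip == "127.0.0.1"; };
  return AdmitQueryHoistedProxy(resolve, reachable, 5);  // all retries fail
}
```

Every attempt reuses the stale 127.0.0.2, which matches the "Actual" behaviour above; calling GetProxy() inside the loop is what lets a later attempt see the corrected address.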
--
This message was sent by Atlassian Jira
(v8.20.10#820010)