Quanlong Huang created IMPALA-14228:
---------------------------------------
Summary: Coordinator should retry requests on the new active catalogd after HA failover
Key: IMPALA-14228
URL: https://issues.apache.org/jira/browse/IMPALA-14228
Project: IMPALA
Issue Type: Bug
Components: Backend
Reporter: Quanlong Huang
Assignee: Quanlong Huang
During catalogd HA failover, the active catalogd address changes. RPCs that the
coordinator sent to the previously active catalogd fail (e.g. when it crashes),
causing query failures. They should be retried on the current active catalogd.
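To reach the current active catalogd, a caller has to read the address again instead of holding on to a copy taken earlier. The minimal sketch below assumes a hypothetical ActiveCatalogdAddress holder; it is not Impala's ExecEnv code, and the address is simplified to a "host:port" string. It shows that a snapshot taken before failover keeps pointing at the old catalogd, while a fresh read sees the new one.

{code:cpp}
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical holder for the coordinator's view of the active catalogd.
// Illustration only; not Impala's actual implementation.
class ActiveCatalogdAddress {
 public:
  explicit ActiveCatalogdAddress(std::string addr)
      : addr_(std::make_shared<const std::string>(std::move(addr))) {}

  // Returns a snapshot of the current address. The snapshot itself never
  // changes, even if a later failover installs a new address.
  std::shared_ptr<const std::string> Get() const {
    std::lock_guard<std::mutex> lock(mu_);
    return addr_;
  }

  // Called when the coordinator learns about a new active catalogd.
  void Set(std::string addr) {
    auto next = std::make_shared<const std::string>(std::move(addr));
    std::lock_guard<std::mutex> lock(mu_);
    addr_ = std::move(next);
  }

 private:
  mutable std::mutex mu_;
  std::shared_ptr<const std::string> addr_;
};
{code}

For example, after active.Set("catalogd-2:26000"), a snapshot obtained earlier from active.Get() still reads "catalogd-1:26000". Only a new Get() call returns "catalogd-2:26000". The host names here are made up for illustration.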
However, although the coordinator does retry such RPCs, every retry keeps using
the previously active catalogd address. E.g. in the PrioritizeLoad RPC:
{code:cpp}
Status CatalogOpExecutor::PrioritizeLoad(const TPrioritizeLoadRequest& req,
    TPrioritizeLoadResponse* result) {
  int attempt = 0; // Used for debug action only.
  CatalogServiceConnection::RpcStatus rpc_status =
      CatalogServiceConnection::DoRpcWithRetry(
          env_->catalogd_lightweight_req_client_cache(),
          // A fixed address is used during retry.
          *ExecEnv::GetInstance()->GetCatalogdAddress().get(),
          // ... (rest of the call omitted in this excerpt)
{code}
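One possible direction, shown here only as a hypothetical sketch and not as Impala's DoRpcWithRetry, is to resolve the active catalogd address inside the retry loop rather than once before it. RpcResult, DoRpcWithRetryOnActive and the two callbacks are illustrative names, and backoff between attempts is omitted for brevity.

{code:cpp}
#include <functional>
#include <string>

// Hypothetical outcome of one RPC attempt; not Impala's RpcStatus.
enum class RpcResult { kOk, kConnectionError };

// Retries an RPC, re-resolving the target address before every attempt.
// `resolve_address` stands in for reading the current active catalogd
// address (e.g. what ExecEnv::GetCatalogdAddress() returns); `rpc` performs
// one attempt against the given address. Both are assumptions.
RpcResult DoRpcWithRetryOnActive(
    const std::function<std::string()>& resolve_address,
    const std::function<RpcResult(const std::string&)>& rpc,
    int max_attempts) {
  RpcResult result = RpcResult::kConnectionError;
  for (int attempt = 0; attempt < max_attempts; ++attempt) {
    // Read the address on each attempt so that a failover which happened
    // between attempts is picked up instead of reusing a stale address.
    const std::string address = resolve_address();
    result = rpc(address);
    if (result == RpcResult::kOk) break;
  }
  return result;
}
{code}

Passing a resolver instead of an address lets each attempt use whatever address the coordinator currently considers active. A resolver that always returns the same captured string reproduces the reported behavior: every attempt goes to the old catalogd.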
As a result, test_warmed_up_metadata_after_failover is still flaky even after
the fix for IMPALA-14227.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)