Quanlong Huang created IMPALA-14228:
---------------------------------------

             Summary: Coordinator should retry requests on the new active 
catalogd after HA failover
                 Key: IMPALA-14228
                 URL: https://issues.apache.org/jira/browse/IMPALA-14228
             Project: IMPALA
          Issue Type: Bug
          Components: Backend
            Reporter: Quanlong Huang
            Assignee: Quanlong Huang


During catalogd HA failover, the active catalogd address changes. RPCs that the 
coordinator sent to the previous active catalogd will fail (e.g. when it 
crashes), causing query failures. They should be retried on the current 
active catalogd.

However, although the coordinator retries the RPC, it keeps using the previous 
active catalogd address. E.g. in PrioritizeLoad RPCs:
{code:cpp}
Status CatalogOpExecutor::PrioritizeLoad(const TPrioritizeLoadRequest& req,
    TPrioritizeLoadResponse* result) {
  int attempt = 0; // Used for debug action only.
  CatalogServiceConnection::RpcStatus rpc_status =
      CatalogServiceConnection::DoRpcWithRetry(
          env_->catalogd_lightweight_req_client_cache(),
          *ExecEnv::GetInstance()->GetCatalogdAddress().get(), // a fixed address is used during retry
{code}
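
One possible direction, shown only as a minimal standalone sketch and not as Impala's actual code or the eventual fix: let the retry loop take an address resolver instead of a resolved address, so that each attempt re-reads the current active catalogd. The names RpcResult and DoRpcWithAddressRefresh, and the use of plain strings for addresses, are hypothetical and chosen for illustration; they are not part of CatalogServiceConnection.

```cpp
#include <functional>
#include <string>

// Hypothetical stand-in for an RPC outcome; this is not Impala's Status class.
struct RpcResult {
  bool ok;
  std::string error;
};

// Hypothetical helper: retries `rpc` up to `max_attempts` times.
// `resolve_address` is invoked before every attempt, so an address change
// caused by a failover between attempts is picked up by the next retry.
inline RpcResult DoRpcWithAddressRefresh(
    const std::function<std::string()>& resolve_address,
    const std::function<RpcResult(const std::string&)>& rpc,
    int max_attempts) {
  RpcResult last{false, "no attempts made"};
  for (int attempt = 0; attempt < max_attempts; ++attempt) {
    // Re-read the active address on each attempt instead of capturing it once.
    const std::string address = resolve_address();
    last = rpc(address);
    if (last.ok) return last;
  }
  return last;
}
```

In this sketch the caller would pass something that reads the current active catalogd address on demand, rather than dereferencing it once before the retry loop starts.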

Due to this, test_warmed_up_metadata_after_failover is still flaky after fixing 
IMPALA-14227.



