This is an automated email from the ASF dual-hosted git repository.

stigahuang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git


The following commit(s) were added to refs/heads/master by this push:
     new 9d6997b7c IMPALA-14280: (Addendum) Waits for updating active catalogd address
9d6997b7c is described below

commit 9d6997b7c00512295401f815630e8f02876ecb74
Author: stiga-huang <[email protected]>
AuthorDate: Tue Aug 5 22:03:46 2025 +0800

    IMPALA-14280: (Addendum) Waits for updating active catalogd address
    
    Some tests for catalogd HA failover have a lightweight verifier function
    that finishes quickly before coordinator notices catalogd HA failover,
    e.g. when the verifier function runs a statement that doesn't trigger
    catalogd RPCs.
    
    If the test finishes in such a state, coordinator will use the stale
    active catalogd address in cleanup, i.e. when dropping unique_database,
    and the statement fails quickly since that catalogd is passive now.
    Retrying the statement immediately usually won't help since coordinator
    hasn't updated the active catalogd address yet.
    
    Note that we also retry the verifier function immediately when it fails
    because coordinator talks to the stale catalogd address. This works
    since the previous active catalogd is not running, so the catalogd RPCs
    fail and get retried. The retry interval is 3s (configured by
    catalog_client_rpc_retry_interval_ms) and we retry at least 2 times
    (customized by catalog_client_connection_num_retries in the tests).
    That duration is usually enough for coordinator to update the active
    catalogd address, but depending on it is a bit tricky.
    
    This patch adds a wait before the verifier function to make sure
    coordinator has updated the active catalogd address. This also makes
    sure the cleanup of unique_database won't fail due to a stale active
    catalogd address.
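
The wait described above amounts to polling a value until it matches an expected one, instead of relying on RPC retry timing. The following is a minimal, self-contained sketch of that pattern, not Impala's wait_for_metric_value implementation: the helper wait_for_value, the fake metric reader, and the host-a/host-b addresses are all hypothetical names used for illustration only.

```python
import time


def wait_for_value(get_value, expected_value, timeout_s=10.0, interval_s=0.5):
    """Poll get_value() until it returns expected_value or timeout_s elapses.

    Hypothetical helper for illustration; it is not part of the Impala
    test framework.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        value = get_value()
        if value == expected_value:
            return value
        if time.monotonic() >= deadline:
            raise AssertionError(
                "expected {!r}, last saw {!r}".format(expected_value, value))
        time.sleep(interval_s)


# Simulated coordinator metric: reports the stale address twice, then the
# new active catalogd address. The addresses are made-up placeholders.
readings = ["host-a:26000", "host-a:26000", "host-b:26000"]


def fake_active_catalogd_address():
    # Return the next reading, staying on the last one once the list is drained.
    return readings.pop(0) if len(readings) > 1 else readings[0]


result = wait_for_value(fake_active_catalogd_address, "host-b:26000",
                        timeout_s=5.0, interval_s=0.01)
```

Polling against a deadline makes the test's precondition explicit: the verifier only runs once the observed address equals the expected one, and a timeout produces a clear failure instead of a flaky cleanup error.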
    
    Tests:
     - Ran test_catalogd_ha.py
    
    Change-Id: I45e4a20170fdcce8282f1762f81a290689777aed
    Reviewed-on: http://gerrit.cloudera.org:8080/23252
    Reviewed-by: Riza Suminto <[email protected]>
    Reviewed-by: Wenzhe Zhou <[email protected]>
    Tested-by: Quanlong Huang <[email protected]>
---
 tests/custom_cluster/test_catalogd_ha.py | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/tests/custom_cluster/test_catalogd_ha.py b/tests/custom_cluster/test_catalogd_ha.py
index e5bda08f3..b149d2a44 100644
--- a/tests/custom_cluster/test_catalogd_ha.py
+++ b/tests/custom_cluster/test_catalogd_ha.py
@@ -769,6 +769,11 @@ class TestCatalogdHA(CustomClusterTestSuite):
     assert catalogd_service_2.get_metric_value(
         "catalog-server.ha-number-active-status-change") > 0
     assert catalogd_service_2.get_metric_value("catalog-server.active-status")
+    # Make sure coordinator has updated the active catalogd address.
+    self.cluster.get_first_impalad().service.wait_for_metric_value(
+        "catalog.active-catalogd-address",
+        expected_value="{}:{}".format(catalogd_service_2.hostname,
+                                      catalogd_service_2.service_port))
 
     for i in range(2):
       try:
@@ -784,6 +789,7 @@ class TestCatalogdHA(CustomClusterTestSuite):
           continue
         assert False, str(e)
 
+    # Recover the cluster to have two catalogds
     active_catalogd.start()
     return standby_catalogd, active_catalogd
 
