[ 
https://issues.apache.org/jira/browse/IMPALA-13536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17904616#comment-17904616
 ] 

ASF subversion and git services commented on IMPALA-13536:
----------------------------------------------------------

Commit 41c145f5add442dbd089bc77641c1e117486bc08 in impala's branch 
refs/heads/master from jasonmfehr
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=41c145f5a ]

IMPALA-13536: Fix Workload Management Init with Catalog HA

When running an Impala cluster with catalogd HA enabled, the standby
catalogd would loop while waiting for the first catalog update to
arrive, repeatedly logging the same error and never joining the server
thread defined in catalogd-main.cc.

Before this patch, when the standby daemon became active, the first
catalog update was finally received, and the workload management
initialization process ran a second time in the newly active daemon
because this daemon saw that it was active.

This patch modifies the catalogd workload management initialization
code so it waits until the active catalogd has been determined. At
that point, the standby daemon skips workload management
initialization while the active daemon runs it after it receives the
first catalog update.
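
The gating described above can be sketched roughly as follows. This is a
minimal illustrative model in Python, not the catalogd implementation
(which is C++). The class WorkloadMgmtInit, its methods, and the returned
strings are all hypothetical names chosen for this sketch.

```python
import threading


class WorkloadMgmtInit:
    """Hypothetical model of the init gating; not Impala's actual code."""

    def __init__(self):
        self.role_known = threading.Event()    # set once active/standby is decided
        self.first_update = threading.Event()  # set on the first catalog update
        self.is_active = False
        self.init_runs = 0

    def set_role(self, is_active):
        # Called when the HA election result is known (assumed callback).
        self.is_active = is_active
        self.role_known.set()

    def on_catalog_update(self):
        # Called when a catalog update arrives (assumed callback).
        self.first_update.set()

    def run(self, timeout_s=5.0):
        # Step 1: wait until the active catalogd has been determined.
        if not self.role_known.wait(timeout_s):
            return "timeout"
        # Step 2: the standby skips initialization instead of looping.
        if not self.is_active:
            return "skipped"
        # Step 3: the active daemon waits for the first catalog update.
        if not self.first_update.wait(timeout_s):
            return "timeout"
        self.init_runs += 1
        return "initialized"
```

In this model run() is invoked once per daemon, so a standby that is later
promoted by set_role(True) does not run initialization again.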

Testing was accomplished by modifying the workload management
initialization custom cluster tests to assert that the init process
is not re-run when a catalogd switches from standby to active and
also to remove the assumption that the first catalogd was active. The
test_catalog_ha test was deleted since all its assertions are handled
by the setup_method of the new TestWorkloadManagementCatalogHA class.
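
As a rough illustration of the "init is not re-run" assertion, a test
could count how many log lines match an initialization message. The
helper below is a standalone sketch, not the Impala test framework's
assert_log_contains, and the log message it is used with would be a
hypothetical placeholder.

```python
import re


def count_matching_lines(log_text, line_regex):
    """Return how many lines of log_text contain a match for line_regex."""
    pattern = re.compile(line_regex)
    return sum(1 for line in log_text.splitlines() if pattern.search(line))
```

A check in this style would expect a count of zero for the init message
in the log of a catalogd that went from standby to active.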

Ozone tests with and without erasure coding were also run and passed.

Change-Id: Id3797a0a9cf0b8ae844d9b7d46b607d93824f69a
Reviewed-on: http://gerrit.cloudera.org:8080/22118
Reviewed-by: Riza Suminto <[email protected]>
Reviewed-by: Michael Smith <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> Tests failing in TestWorkloadManagementInitWait on Ozone
> --------------------------------------------------------
>
>                 Key: IMPALA-13536
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13536
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Daniel Becker
>            Assignee: Jason Fehr
>            Priority: Blocker
>              Labels: broken-build
>
> Some of our tests failed in the test class 
> test_workload_mgmt_init.py::TestWorkloadManagementInitWait.
> test_upgrade_1_0_0_to_1_1_0():
> {code:java}
> custom_cluster/test_workload_mgmt_init.py:222: in test_upgrade_1_0_0_to_1_1_0
>     self.check_schema_version("1.1.0")
> custom_cluster/test_workload_mgmt_init.py:129: in check_schema_version
>     self.assert_table_prop(tbl_name, "wm_schema_version", schema_version)
> custom_cluster/test_workload_mgmt_init.py:104: in assert_table_prop
>     assert found, "did not find expected table prop '{}' with value '{}' on table " \
> E   AssertionError: did not find expected table prop 'wm_schema_version' with value '1.1.0' on table 'sys.impala_query_log'
> E   assert False
> {code}
> test_invalid_wm_schema_version_live_table_prop():
> {code:java}
> custom_cluster/test_workload_mgmt_init.py:375: in test_invalid_wm_schema_version_live_table_prop
>     self._run_invalid_table_prop_test(self.QUERY_TBL_LIVE, "wm_schema_version")
> custom_cluster/test_workload_mgmt_init.py:325: in _run_invalid_table_prop_test
>     "found on the '{}' property of table '{}'".format(prop_name, table))
> common/impala_test_suite.py:1351: in assert_catalogd_log_contains
>     daemon, level, line_regex, expected_count, timeout_s, dry_run)
> common/impala_test_suite.py:1397: in assert_log_contains
>     (expected_count, log_file_path, line_regex, found, line)
> E   AssertionError: Expected 1 lines in file /data0/jenkins/workspace/impala-asf-master-core-ozone-erasure-coding/repos/Impala/logs/custom_cluster_tests/catalogd.impala-ec2-centos79-m6i-4xlarge-xldisk-126a.vpc.cloudera.com.jenkins.log.FATAL.20241107-042724.11427 matching regex 'could not parse version string '' found on the 'wm_schema_version' property of table 'sys.impala_query_live'', but found 0 lines. Last line was: 
> E   . Impalad exiting.
> {code}


