Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/22634 )
Change subject: IMPALA-13850: Wait until CatalogD active before resetting metadata
......................................................................

Patch Set 5:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/22634/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/22634/5//COMMIT_MSG@13
PS5, Line 13: registration.
Let's mention that the lock is held by GatherCatalogUpdatesThread, which is calling GetCatalogDelta(). That call waits for the Java lock versionLock_, which is held by the thread doing CatalogServiceCatalog.reset().

http://gerrit.cloudera.org:8080/#/c/22634/5//COMMIT_MSG@23
PS5, Line 23: catalog.num-tables, or /catalog page content.
I think coordinators will still wait for the initial catalog updates from statestore:
https://github.com/apache/impala/blob/ddd4f4f8d68addce1542d57f94c637210a090150/be/src/service/impala-server.cc#L3159
before marking themselves as ready:
https://github.com/apache/impala/blob/ddd4f4f8d68addce1542d57f94c637210a090150/be/src/service/impala-server.cc#L3381-L3382

Will the coordinator also get killed by k8s due to the health check timeout?

http://gerrit.cloudera.org:8080/#/c/22634/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/22634/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2322
PS5, Line 2322: DebugUtils.executeDebugAction(BackendConfig.INSTANCE.debugActions(),
              :     DebugUtils.RESET_METADATA_LOOP_LOCKED);
What about moving this outside of the loop so the wait time is independent of the number of dbs? I think we have different numbers of dbs between core and exhaustive builds.
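To illustrate the point about the debug action placement, here is a minimal Python sketch. It is a model only, not the Java code in CatalogServiceCatalog.reset(): the function names, the `debug_sleep` callback, and the sleep value are all hypothetical stand-ins for DebugUtils.executeDebugAction() with RESET_METADATA_LOOP_LOCKED. It records simulated sleeps instead of really sleeping.

```python
# Hypothetical model, not Impala code. `debug_sleep` stands in for the
# debug action that injects a delay during reset().

def reset_delay_inside_loop(db_names, sleep_ms, debug_sleep):
    """Debug action runs once per db, so the delay scales with len(db_names)."""
    for _ in db_names:
        debug_sleep(sleep_ms)
        # ... per-db reset work would happen here ...


def reset_delay_outside_loop(db_names, sleep_ms, debug_sleep):
    """Debug action runs once, so the delay is the same for any number of dbs."""
    debug_sleep(sleep_ms)
    for _ in db_names:
        pass  # ... per-db reset work would happen here ...


def total_delay_ms(reset_fn, db_names, sleep_ms):
    """Sum the simulated sleeps that reset_fn requests."""
    recorded = []
    reset_fn(db_names, sleep_ms, recorded.append)
    return sum(recorded)
```

With three dbs and a 100 ms action, the in-loop variant accumulates 300 ms while the out-of-loop variant stays at 100 ms, so a test timeout tuned on one build would not drift when another build loads more dbs.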
http://gerrit.cloudera.org:8080/#/c/22634/5/tests/custom_cluster/test_catalogd_ha.py
File tests/custom_cluster/test_catalogd_ha.py:

http://gerrit.cloudera.org:8080/#/c/22634/5/tests/custom_cluster/test_catalogd_ha.py@109
PS5, Line 109: assert page.status_code == requests.codes.ok
Can we also verify the coordinator is healthy?

--
To view, visit http://gerrit.cloudera.org:8080/22634
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I58cc66dcccedb306ff11893f2916ee5ee6a3efc1
Gerrit-Change-Number: 22634
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Reviewer: Abhishek Rawat <ara...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 24 Mar 2025 02:43:11 +0000
Gerrit-HasComments: Yes