Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/22816 )
Change subject: IMPALA-13829: Postpone catalog deleteLog GC for waitForHmsEvent requests
......................................................................


Patch Set 3:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/22816/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/22816/2//COMMIT_MSG@27
PS2, Line 27: postponing
> nit: postponing
Done


http://gerrit.cloudera.org:8080/#/c/22816/2//COMMIT_MSG@33
PS2, Line 33: A new flag, catalog_delete_log_gc_frequency, is added for this. The
            : deleteLog GC happens in every N+1 (N=catalog_delete_log_gc_frequency)
            : topic updates.
> How many seconds between GC does it translate to in default config?
Catalog updates are only sent out when there are catalog changes (due to DDL/DML/HMS events). Assuming the catalog keeps changing, catalog updates are supposed to be sent every 2 seconds (configured by statestore_update_frequency_ms). However, the catalog update thread can be blocked by table locks held by concurrent DDLs, so the actual interval is usually larger than 2s. Sometimes it can be minutes, depending on how long the DDL holds the table lock.

When the GC happens, only items older than the last 1000 topic updates are cleared. So an item could survive for 2000 rounds of topic updates. Assuming catalog updates are sent at the fastest speed, 2000 rounds of topic updates means 4000s.

sync_hms_events_wait_time_s is the timeout used to wait for HMS events to be processed. It doesn't matter if deleteLog GC happens in the middle of the wait, since we assume 1000 rounds of catalog updates are enough for the impalad to receive the GCed deletions.


http://gerrit.cloudera.org:8080/#/c/22816/2/be/src/catalog/catalog-server.cc
File be/src/catalog/catalog-server.cc:

http://gerrit.cloudera.org:8080/#/c/22816/2/be/src/catalog/catalog-server.cc@177
PS2, Line 177: catalog_delete_log_gc_frequency
> Please add validator that this flag is always a positive number.
Please do
Done


http://gerrit.cloudera.org:8080/#/c/22816/2/fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java:

http://gerrit.cloudera.org:8080/#/c/22816/2/fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java@137
PS2, Line 137: <=
> nit: <= for safety.
Done


http://gerrit.cloudera.org:8080/#/c/22816/2/tests/metadata/test_event_processing.py
File tests/metadata/test_event_processing.py:

http://gerrit.cloudera.org:8080/#/c/22816/2/tests/metadata/test_event_processing.py@626
PS2, Line 626: assert False, "Failed to drop dat
> Assert that this is success?
It seems we don't need an assertion here. If it fails, it raises an exception:

tests/metadata/test_event_processing.py:626: in test_hms_event_sync_on_deletion
    client.execute("create database " + db)
tests/common/impala_connection.py:505: in execute
    fetch_exec_summary=fetch_exec_summary)
tests/beeswax/impala_beeswax.py:195: in execute
    handle = self.__execute_query(query_string.strip(), user=user)
tests/beeswax/impala_beeswax.py:291: in __execute_query
    handle = self.execute_query_async(query_string, user=user)
tests/beeswax/impala_beeswax.py:285: in execute_query_async
    handle = self.__do_rpc(lambda: self.imp_service.query(query,))
tests/beeswax/impala_beeswax.py:470: in __do_rpc
    raise ImpalaBeeswaxException(self.__build_error_message(b), b)
E   ImpalaBeeswaxException: Query 9c413ce6f3c038e5:c6ebe8ce00000000 failed:
E   AnalysisException: Database already exists: test_hms_event_sync_on_deletion_115fe4d9_db


http://gerrit.cloudera.org:8080/#/c/22816/2/tests/metadata/test_event_processing.py@640
PS2, Line 640: self.hive_client.drop_table(db, tbl_name, deleteData=True)
             : LOG.info("Dropped table {}.{} in Hive".format(db, tbl_name))
> Wrap L671 to L639 in try block.
Done


--
To view, visit http://gerrit.cloudera.org:8080/22816
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2441440bca2b928205dd514047ba742a5e8bf05e
Gerrit-Change-Number: 22816
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Michael Smith <michael.sm...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Fri, 25 Apr 2025 06:59:14 +0000
Gerrit-HasComments: Yes
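
For reference, the timing estimate in the reply on COMMIT_MSG line 33 can be reproduced with a small back-of-the-envelope sketch. This is not code from the patch: the function and parameter names are hypothetical, the GC frequency default of 1000 is an assumption inferred from the "2000 rounds" figure in the reply, and the off-by-one from "every N+1 topic updates" is ignored just as the reply ignores it. Only the 1000-update retention window and the 2-second update interval come from the thread.

```python
# Back-of-the-envelope estimate of how long a deleteLog item can survive.
# All names here are hypothetical; this is not Impala code.

def max_survival_rounds(gc_frequency=1000, retained_updates=1000):
    """Approximate worst-case number of topic updates an item survives.

    gc_frequency: topic updates between two GC runs (assumed default).
    retained_updates: a GC run keeps items from the last N topic updates
                      (1000 per the review reply).
    """
    if gc_frequency <= 0 or retained_updates <= 0:
        raise ValueError("both arguments must be positive")
    # An item that just falls inside the retained window at one GC run
    # is only cleared by the next run, about gc_frequency updates later.
    return gc_frequency + retained_updates


def max_survival_seconds(gc_frequency=1000, retained_updates=1000,
                         update_interval_ms=2000):
    """Same bound in seconds, assuming updates go out at the fastest rate
    (every update_interval_ms, i.e. statestore_update_frequency_ms)."""
    rounds = max_survival_rounds(gc_frequency, retained_updates)
    return rounds * update_interval_ms / 1000.0


if __name__ == "__main__":
    print(max_survival_rounds())   # 2000
    print(max_survival_seconds())  # 4000.0
```

Since the update thread can be blocked by table locks, the real interval is usually longer than 2s, so the value in seconds is the time an item survives at the fastest update rate rather than a hard ceiling.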