Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/21065 )
Change subject: [WIP]IMPALA-12832: Implicit invalidate metadata on event failures ...................................................................... Patch Set 8: (15 comments) http://gerrit.cloudera.org:8080/#/c/21065/8//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21065/8//COMMIT_MSG@25 PS8, Line 25: - Passed FE tests We also need manual tests on failures in handling the failure, i.e. when an event fail, we invalidate the tables. The invalidation could still failed by bugs like IMPALA-12831. Currently, IMPALA-12831 is unresolved yet. We can use it in manual tests. Or we can add another flag to inject failure in the error handling, and see if auto_global_invalidate_metadata works in this scenario. http://gerrit.cloudera.org:8080/#/c/21065/8/be/src/catalog/catalog-server.cc File be/src/catalog/catalog-server.cc: http://gerrit.cloudera.org:8080/#/c/21065/8/be/src/catalog/catalog-server.cc@174 PS8, Line 174: DEFINE_string nit: define this as hidden using DEFINE_string_hidden() http://gerrit.cloudera.org:8080/#/c/21065/7/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/21065/7/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1292 PS7, Line 1292: e); : catalog_.invalidateTableIfExists(dbName_, tblName_); : } nit: we can use warnLog() http://gerrit.cloudera.org:8080/#/c/21065/8/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/21065/8/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@a2811 PS8, Line 2811: Could you explain why we remove this? http://gerrit.cloudera.org:8080/#/c/21065/8/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@677 PS8, Line 677: BackendConfig.INSTANCE.getProcessEventFailureEventTypes() nit: this can also be replaced with catalog_.getFailureEventsForTesting() http://gerrit.cloudera.org:8080/#/c/21065/8/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@679 PS8, Line 679: Double nit: use primitive type 'double' http://gerrit.cloudera.org:8080/#/c/21065/8/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1368 PS8, Line 1368: return false; Maybe we can use CatalogServiceCatalog.invalidateDb() here. But it seems low priority since we haven't found bugs on db events yet? http://gerrit.cloudera.org:8080/#/c/21065/8/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1787 PS8, Line 1787: tableBefore_.getDbName(), tableBefore_.getTableName()); Please add a comment saying the new table will be invalidated in MetastoreTableEvent#onFailure(), i.e. the parent method. http://gerrit.cloudera.org:8080/#/c/21065/8/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@2902 PS8, Line 2902: is not processed by then. So try to get the table again here if it is null */ This seems fixing an existing bug. Can we split this into another patch? CommitCompactionEvent and ReloadEvent also have this issue, i.e. using catalog_.getTable() in the constructor. https://github.com/apache/impala/blob/0c0a3fff39839b400370433568f37a317b7d4800/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java#L3072 https://github.com/apache/impala/blob/0c0a3fff39839b400370433568f37a317b7d4800/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java#L2879 http://gerrit.cloudera.org:8080/#/c/21065/8/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@3105 PS8, Line 3105: tableWriteId.getDbName(), tableWriteId.getTblName()); I think we should deduplicate the table names before invalidating them. They could be the same table. http://gerrit.cloudera.org:8080/#/c/21065/8/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java: http://gerrit.cloudera.org:8080/#/c/21065/8/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@1006 PS8, Line 1006: try { Can we add a log for this? http://gerrit.cloudera.org:8080/#/c/21065/8/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@1188 PS8, Line 1188: if (!event.onFailure(e)) { throw e; } Can we add a log when throwing the exception? e.g. "Unable to handle failure in processing event". http://gerrit.cloudera.org:8080/#/c/21065/8/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@1190 PS8, Line 1190: LOG.error(event.debugString("Failed to handle event processing failure")); The invalidation in onFailure() could still fail by bugs like IMPALA-12831. Please add 'ex' in the log. LOG.error(event.debugString("Failed to handle event processing failure"), ex); http://gerrit.cloudera.org:8080/#/c/21065/8/tests/metadata/test_event_processing_error.py File tests/metadata/test_event_processing_error.py: http://gerrit.cloudera.org:8080/#/c/21065/8/tests/metadata/test_event_processing_error.py@28 PS8, Line 28: "ALTER_PARTITION,INSERT,ABORT_TXN,COMMIT_TXN'" Can we add all event types that we currently can handle? e.g. DROP_PARTITION, RELOAD, ALLOC_WRITE_ID_EVENT, COMMIT_COMPACTION are missing here. How about batched event types like ALTER_PARTITIONS, INSERT_PARTITIONS? Should we add them as well? For unsupported event types like ALTER_DATABASE, can we test the feature of auto_global_invalidate_metadata? On the other hand, it's uncertain whether all events are covered in these tests. Can we add individual test for each event type? So we are sure the failure handling of them are all covered. E.g. @CustomClusterTestSuite.with_args( impalad_args="--use_local_catalog=true", catalogd_args="--catalog_topic_mode=minimal --inject_process_event_failure_event_types=ALTER_TABLE") def test_alter_table_event_failure(self, unique_database): # Run Hive statements to generate ALTER_TABLE events EventProcessorUtils.wait_for_event_processing(self) assert EventProcessorUtils.get_event_processor_status() == "ACTIVE" http://gerrit.cloudera.org:8080/#/c/21065/8/tests/metadata/test_event_processing_error.py@61 PS8, Line 61: def test_sync_event_based_replication(self): IIUC, we just want some mix workloads. If so, we can merge these 3 into one so don't need to restart the cluster 3 times. @CustomClusterTestSuite.with_args( impalad_args="--use_local_catalog=true", catalogd_args=CATALOGD_ARGS) def test_event_processing_failures(self, unique_database): TestEventProcessingBase._run_test_insert_events_impl(self.hive_client, self.client, self.cluster, unique_database, True) assert EventProcessorUtils.get_event_processor_status() == "ACTIVE" TestEventProcessingBase._run_test_insert_events_impl(self.hive_client, self.client, self.cluster, unique_database, False) assert EventProcessorUtils.get_event_processor_status() == "ACTIVE" TestEventProcessingBase._run_event_based_replication_tests_impl(self.hive_client, self.client, self.cluster, self.filesystem_client) assert EventProcessorUtils.get_event_processor_status() == "ACTIVE" -- To view, visit http://gerrit.cloudera.org:8080/21065 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia67fc04c995802d3b6b56f79564bf0954b012c6c Gerrit-Change-Number: 21065 Gerrit-PatchSet: 8 Gerrit-Owner: Anonymous Coward <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Sai Hemanth Gantasala <[email protected]> Gerrit-Comment-Date: Tue, 27 Feb 2024 08:42:40 +0000 Gerrit-HasComments: Yes
