Hello Quanlong Huang, Riza Suminto, cclive1...@gmail.com, Sai Hemanth Gantasala, Csaba Ringhofer, Impala Public Jenkins,
I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/21031

to look at the new patch set (#44).

Change subject: IMPALA-12709: Add support for hierarchical metastore event processing
......................................................................

IMPALA-12709: Add support for hierarchical metastore event processing

At present, the metastore event processor is single-threaded.
Notification events are processed sequentially, with a maximum of 1000
events fetched and processed in a single batch. Multiple locks are used
to address the concurrency issues that may arise when catalog DDL
operation processing and metastore event processing try to access or
update catalog objects concurrently. Waiting for a lock, or for the
file metadata of a table to load, can slow down event processing and
delay the events that follow, even though those events may not depend
on the blocked event. Altogether, it takes a very long time to
synchronize all the HMS events.

The existing metastore event processing is turned into multi-level
event processing with the enable_hierarchical_event_processing flag. It
is not enabled by default. The idea is to segregate the events based on
their dependencies, maintain the order in which events occur within a
dependency, and process them independently as much as possible. The
following three main classes represent the three-level threaded event
processing:
1. EventExecutorService
   It provides the necessary methods to initialize, start, clear, stop
   and process metastore events in hierarchical mode. It is instantiated
   from MetastoreEventsProcessor and its methods are invoked from
   MetastoreEventsProcessor. Upon receiving an event to process,
   EventExecutorService queues the event to the appropriate
   DbEventExecutor for processing.
2. DbEventExecutor
   An instance of this class has an execution thread and manages the
   events of multiple databases with DbProcessors.
   An instance of DbProcessor is maintained to store the context of each
   database within the DbEventExecutor. On each scheduled execution, the
   input events on a DbProcessor are segregated to the appropriate
   TableProcessors for processing, and the database events that are
   eligible for processing are processed. Once a DbEventExecutor is
   assigned to a database, a DbProcessor is created, and the subsequent
   events belonging to that database are queued to the same
   DbEventExecutor thread for further processing. Hence, linearizability
   is ensured in dealing with events within the database. Each instance
   of DbEventExecutor has a fixed list of TableEventExecutors.
3. TableEventExecutor
   An instance of this class has an execution thread and processes the
   events of multiple tables with TableProcessors. An instance of
   TableProcessor is maintained to store the context of each table
   within a TableEventExecutor. On each scheduled execution, events from
   the TableProcessors are processed. Once a TableEventExecutor is
   assigned to a table, a TableProcessor is created, and the subsequent
   table events are processed by the same TableEventExecutor thread.
   Hence, linearizability is guaranteed in processing the events of a
   particular table.
   - All the events of a table are processed in the same order in which
     they occurred.
   - Events of different tables are processed in parallel when those
     tables are assigned to different TableEventExecutors.

The following new events are added:
1. DbBarrierEvent
   This event wraps a database event. It is used to synchronize all the
   TableProcessors belonging to the database before the database event
   is processed. It acts as a barrier that holds back the table events
   that occurred after the database event until the database event has
   been processed on the DbProcessor.
2. RenameTableBarrierEvent
   This event wraps an alter table event for a rename. It is used to
   synchronize the source and target TableProcessors to process the
   rename table event.
   It ensures that the source TableProcessor removes the table first and
   only then allows the target TableProcessor to create the renamed
   table.
3. PseudoCommitTxnEvent and PseudoAbortTxnEvent
   CommitTxnEvent and AbortTxnEvent can involve multiple tables in a
   transaction, and processing these events modifies multiple table
   objects. Pseudo events are introduced so that one pseudo event is
   created for each table involved in the transaction, and these pseudo
   events are processed independently at their respective
   TableProcessors.

The following new flags are introduced:
1. enable_hierarchical_event_processing
   Enables hierarchical event processing on catalogd.
2. num_db_event_executors
   Sets the number of database-level event executors.
3. num_table_event_executors_per_db_event_executor
   Sets the number of table-level event executors within a database
   event executor.
4. min_event_processor_idle_ms
   Sets the minimum time to retain idle db processors and table
   processors on the database event executors and table event executors
   respectively, when they do not have events to process.
5. max_outstanding_events_on_executors
   Sets the maximum number of outstanding events to process on the event
   executors.

Changed the type of hms_event_polling_interval_s from int to double to
support polling intervals with millisecond precision.

TODOs:
1. We need to redefine the lag in hierarchical processing mode.
2. We need a mechanism to capture the actual event processing time in
   hierarchical processing mode. Currently, with
   enable_hierarchical_event_processing set to true, lastSyncedEventId_
   and lastSyncedEventTimeSecs_ are updated when an event is dispatched
   to EventExecutorService for processing on the respective
   DbEventExecutor and/or TableEventExecutor. So lastSyncedEventId_ and
   lastSyncedEventTimeSecs_ do not actually indicate that the events
   have been processed.
3. Hierarchical processing mode currently has a mechanism to show the
   total number of outstanding events on all the db and table executors.
   Observability needs to be enhanced further in this mode.
A JIRA [IMPALA-13801] has been filed to address these TODOs.

Testing:
- Executed the existing end-to-end tests.
- Added fe and end-to-end tests with
  enable_hierarchical_event_processing.
- Added event processing performance tests. They are marked to be
  skipped.
- Executed the existing tests with hierarchical processing mode enabled.
  TestEventSyncFailures::test_hms_event_sync_timeout fails when
  hierarchical processing mode is enabled, because lastSyncedEventId_
  does not actually indicate that an event has been processed in this
  mode. This needs to be fixed/verified with the above JIRA
  [IMPALA-13801].

Change-Id: I76d8a739f9db6d40f01028bfd786a85d83f9e5d6
---
M be/src/catalog/catalog-server.cc
M be/src/common/global-flags.cc
M be/src/util/backend-gflag-util.cc
M be/src/util/event-metrics.cc
M be/src/util/event-metrics.h
M common/thrift/BackendGflags.thrift
M common/thrift/JniCatalog.thrift
M common/thrift/metrics.json
M docs/topics/impala_metadata.xml
M fe/src/compat-apache-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/TableWriteId.java
A fe/src/main/java/org/apache/impala/catalog/events/DbBarrierEvent.java
A fe/src/main/java/org/apache/impala/catalog/events/DbEventExecutor.java
M fe/src/main/java/org/apache/impala/catalog/events/DeleteEventLog.java
A fe/src/main/java/org/apache/impala/catalog/events/EventExecutorService.java
M fe/src/main/java/org/apache/impala/catalog/events/ExternalEventsProcessor.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
M fe/src/main/java/org/apache/impala/catalog/events/NoOpEventProcessor.java
A fe/src/main/java/org/apache/impala/catalog/events/RenameTableBarrierEvent.java
A fe/src/main/java/org/apache/impala/catalog/events/TableEventExecutor.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
M fe/src/test/java/org/apache/impala/catalog/CatalogTableWriteIdTest.java
M fe/src/test/java/org/apache/impala/catalog/MetastoreApiTestUtils.java
A fe/src/test/java/org/apache/impala/catalog/events/EventExecutorServiceTest.java
M fe/src/test/java/org/apache/impala/catalog/events/EventsProcessorStressTest.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M fe/src/test/java/org/apache/impala/catalog/events/SynchronousHMSEventProcessorForTests.java
M fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsSyncToLatestEventIdTest.java
A tests/custom_cluster/test_event_processing_perf.py
M tests/custom_cluster/test_events_custom_configs.py
M tests/util/event_processor_utils.py
35 files changed, 3,999 insertions(+), 167 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/21031/44
--
To view, visit http://gerrit.cloudera.org:8080/21031
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I76d8a739f9db6d40f01028bfd786a85d83f9e5d6
Gerrit-Change-Number: 21031
Gerrit-PatchSet: 44
Gerrit-Owner: Anonymous Coward <k.venureddy2...@gmail.com>
Gerrit-Reviewer: Anonymous Coward <cclive1...@gmail.com>
Gerrit-Reviewer: Anonymous Coward <k.venureddy2...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Reviewer: Sai Hemanth Gantasala <saihema...@cloudera.com>
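
For readers who want a concrete picture of the routing described in the commit message, here is a minimal sketch of the idea, not the code in this change. The class HierarchicalRoutingSketch, its Event type and all method names are hypothetical. The commit message does not say how an executor is chosen for a database or table, so picking a slot by hashing the name is purely an assumption made for illustration. The two constructor arguments stand in for num_db_event_executors and num_table_event_executors_per_db_event_executor. Barrier and pseudo events are not modeled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of two-level event routing; not Impala's actual classes. */
final class HierarchicalRoutingSketch {
  /** Minimal stand-in for a metastore notification event. */
  static final class Event {
    final long id;
    final String db;
    final String table;

    Event(long id, String db, String table) {
      this.id = id;
      this.db = db;
      this.table = table;
    }
  }

  private final int numDbExecutors;
  private final int numTableExecutorsPerDbExecutor;
  // Key "dbSlot/tableSlot" -> event ids in arrival order for that executor.
  private final Map<String, List<Long>> queues = new LinkedHashMap<>();

  HierarchicalRoutingSketch(int numDbExecutors, int numTableExecutorsPerDbExecutor) {
    if (numDbExecutors <= 0 || numTableExecutorsPerDbExecutor <= 0) {
      throw new IllegalArgumentException("executor counts must be positive");
    }
    this.numDbExecutors = numDbExecutors;
    this.numTableExecutorsPerDbExecutor = numTableExecutorsPerDbExecutor;
  }

  /** Assumed policy: pick the database-level executor by hashing the db name. */
  int dbSlot(String db) {
    return Math.floorMod(db.hashCode(), numDbExecutors);
  }

  /** Assumed policy: pick the table-level executor by hashing "db.table". */
  int tableSlot(String db, String table) {
    return Math.floorMod((db + "." + table).hashCode(), numTableExecutorsPerDbExecutor);
  }

  private String key(String db, String table) {
    return dbSlot(db) + "/" + tableSlot(db, table);
  }

  /** Appends the event to the queue of the executor that owns its table. */
  void dispatch(Event e) {
    queues.computeIfAbsent(key(e.db, e.table), k -> new ArrayList<>()).add(e.id);
  }

  /** Returns the ids queued on the executor that owns the given table. */
  List<Long> queueFor(String db, String table) {
    return queues.getOrDefault(key(db, table), new ArrayList<>());
  }

  public static void main(String[] args) {
    HierarchicalRoutingSketch sketch = new HierarchicalRoutingSketch(2, 3);
    sketch.dispatch(new Event(1, "db1", "t1"));
    sketch.dispatch(new Event(2, "db1", "t2"));
    sketch.dispatch(new Event(3, "db1", "t1"));
    sketch.dispatch(new Event(4, "db2", "t1"));
    // Events 1 and 3 land on the same queue, with 1 ahead of 3.
    System.out.println(sketch.queueFor("db1", "t1"));
  }
}
```

Because a given table always maps to the same slot, its events stay in arrival order on one queue, while tables that map to different slots could be drained by different threads. Which other tables share a queue depends on the hash values, so the sketch makes no claim about the exact queue contents.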