Hello Quanlong Huang, [email protected], Sai Hemanth Gantasala, Csaba
Ringhofer, Impala Public Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/21031
to look at the new patch set (#28).
Change subject: IMPALA-12709: Add support for hierarchical metastore event
processing
......................................................................
IMPALA-12709: Add support for hierarchical metastore event processing
At present, the metastore event processor is single-threaded. Notification
events are processed sequentially, with at most 1000 events
fetched and processed in a single batch. Multiple locks are used to
address the concurrency issues that may arise when catalog DDL
operation processing and metastore event processing try to
access/update the catalog objects concurrently. Waiting for a lock or
for file metadata loading of a table can slow event processing and
delay the events that follow, even when those events do not depend
on the slow event. Altogether it takes a very
long time to synchronize all the HMS events.
Existing metastore event processing is turned into multi-level
event processing with the enable_hierarchical_event_processing flag. It
is not enabled by default. The idea is to segregate the events based on
their dependency, maintain the order of events as they occur within
a dependency, and process them independently as much as possible:
1. All the events of a table are processed in the order in which
they occurred.
2. Events of different tables are processed in parallel.
3. When a database is altered, all the table events related to
the database that occurred after the alter db event are processed
only after the alter database event is processed.
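The first two ordering rules above can be illustrated with a minimal
Python sketch. It is not the code in this patch (the real implementation
is in Java); the tuple layout and the helper names group_by_table,
process_table_events and process_in_parallel are hypothetical, chosen
only for illustration.

```python
from collections import OrderedDict
from concurrent.futures import ThreadPoolExecutor


def group_by_table(events):
    """Group (event_id, db, table) tuples by table, keeping arrival order."""
    groups = OrderedDict()
    for event in events:
        _, db, table = event
        groups.setdefault((db, table), []).append(event)
    return groups


def process_table_events(table_events):
    # Hypothetical stand-in for applying the events to the catalog.
    # Events of one table are handled sequentially, in arrival order.
    return [event_id for event_id, _, _ in table_events]


def process_in_parallel(events, num_workers=4):
    """Process each table's events in order; different tables run in parallel."""
    groups = group_by_table(events)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        futures = {key: pool.submit(process_table_events, evs)
                   for key, evs in groups.items()}
        return {key: future.result() for key, future in futures.items()}
```

Each table gets exactly one task, so per-table order is preserved while
unrelated tables do not wait on each other.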
The following new events are added:
1. DBBarrierEvent
This event wraps a database event. It is used to synchronize all
the table event processors before processing the database event.
It acts as a barrier to restrict the processing of table events
that occurred after the database event until the database event
is processed on the DB processor.
2. RenameTableBarrierEvent
This event wraps an alter table event for rename. It is used to
synchronize the source and target table event processors to
process the rename table event. It ensures the source table
processor removes the table first and then allows the target
table processor to create the renamed table.
3. PseudoCommitTxnEvent and PseudoAbortTxnEvent
CommitTxnEvent and AbortTxnEvent can involve multiple tables in
a transaction and processing these events modifies multiple table
objects. Pseudo events are introduced such that a pseudo event is
created for each table involved in the transaction, and these
pseudo events are processed independently at the respective table
processor.
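The barrier behaviour described for DBBarrierEvent can be modelled with
a small single-threaded Python simulation. This is only a sketch under
assumptions: the class DbBarrier, the function simulate and the
round-robin scheduling are hypothetical and do not mirror the Java
classes added by this change. It also assumes the barrier has been
placed in the queue of every table processor it waits for.

```python
from collections import deque


class DbBarrier:
    """Hypothetical model of a barrier wrapping one database event."""

    def __init__(self, db_event_id, tables):
        self.db_event_id = db_event_id
        self.waiting_for = set(tables)  # table processors yet to arrive
        self.processed = False          # set once the DB event is applied

    def arrive(self, table):
        self.waiting_for.discard(table)

    def all_arrived(self):
        return not self.waiting_for


def simulate(table_queues, barrier):
    """Round-robin over table queues; returns the processing log."""
    queues = {table: deque(items) for table, items in table_queues.items()}
    log = []
    while any(queues.values()) or not barrier.processed:
        for table, queue in queues.items():
            if not queue:
                continue
            if queue[0] is barrier:
                # The table processor reports in and waits at the barrier.
                barrier.arrive(table)
                if barrier.processed:
                    queue.popleft()
                continue
            log.append((table, queue.popleft()))
        # The DB processor runs the database event once every table
        # processor has reached the barrier.
        if barrier.all_arrived() and not barrier.processed:
            log.append(("db", barrier.db_event_id))
            barrier.processed = True
    return log
```

In this model, table events queued before the barrier are always logged
before the database event, and table events queued after it are logged
only once the database event has been processed.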
The following new flags are introduced:
1. enable_hierarchical_event_processing
To enable hierarchical event processing on catalogd.
2. db_event_executors
To set the number of db level event executors.
3. table_event_executors
To set the number of table level event executors within a db
event executor.
4. remove_processor_threshold
To set the threshold to remove db processors and table processors
on the respective executors when they are idle.
5. max_outstanding_events_on_executors
To set the limit of maximum outstanding events to process on all
event executors.
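To show how db_event_executors and table_event_executors relate, the
following Python sketch maps a table to a (db executor, table executor)
pair. The assignment scheme (a stable CRC32 hash of the name) is an
assumption made for illustration only; this message does not say how
the patch assigns databases and tables to executors, and the helpers
stable_index and assign are hypothetical.

```python
import zlib


def stable_index(name, buckets):
    """Deterministic bucket index for a name (assumed scheme, not the patch's)."""
    return zlib.crc32(name.encode("utf-8")) % buckets


def assign(db, table, db_event_executors, table_event_executors):
    """Pick a db executor, then a table executor within that db executor."""
    db_idx = stable_index(db, db_event_executors)
    table_idx = stable_index(db + "." + table, table_event_executors)
    return db_idx, table_idx


def total_table_executors(db_event_executors, table_event_executors):
    # table_event_executors is counted per db event executor.
    return db_event_executors * table_event_executors
```

Because the mapping is deterministic, all events of one table land on
the same executor, which is what keeps their relative order intact.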
Changed the type of hms_event_polling_interval_s from int to double to
support millisecond-precision intervals.
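A short Python sketch of why a fractional value matters for the polling
interval: an integer number of seconds cannot express sub-second
polling, while a double can. The helper polling_interval_ms is
hypothetical and is not part of the patch.

```python
def polling_interval_ms(interval_s):
    """Convert a polling interval in (possibly fractional) seconds to whole ms."""
    return int(round(interval_s * 1000))
```

For example, an interval of 0.5 seconds corresponds to 500 ms, which an
int-typed flag in seconds could not represent.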
TODOs:
1. We need to redefine the lag in hierarchical processing mode.
2. We need a mechanism to capture the actual event processing time in
hierarchical processing mode.
3. FE tests need some changes to run in hierarchical processing mode,
since event processing happens across multiple threads. However, the current
tests expect event processing to be complete when processEvents() is invoked.
Hierarchical processing is not enabled by default, so these tests won't fail.
Testing:
- Executed existing end-to-end tests.
- Added end-to-end test with enable_hierarchical_event_processing.
- Added event processing performance tests. They are marked to be skipped.
- Executed all the existing end-to-end tests with hierarchical processing
mode enabled.
Change-Id: I76d8a739f9db6d40f01028bfd786a85d83f9e5d6
---
M be/src/catalog/catalog-server.cc
M be/src/common/global-flags.cc
M be/src/util/backend-gflag-util.cc
M be/src/util/event-metrics.cc
M be/src/util/event-metrics.h
M common/thrift/BackendGflags.thrift
M common/thrift/JniCatalog.thrift
M common/thrift/metrics.json
M fe/src/compat-apache-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/TableWriteId.java
A fe/src/main/java/org/apache/impala/catalog/events/DBBarrierEvent.java
A fe/src/main/java/org/apache/impala/catalog/events/DBEventExecutor.java
M fe/src/main/java/org/apache/impala/catalog/events/DeleteEventLog.java
M fe/src/main/java/org/apache/impala/catalog/events/ExternalEventsProcessor.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
M fe/src/main/java/org/apache/impala/catalog/events/NoOpEventProcessor.java
A fe/src/main/java/org/apache/impala/catalog/events/RenameTableBarrierEvent.java
A fe/src/main/java/org/apache/impala/catalog/events/TableEventExecutor.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
M fe/src/main/java/org/apache/impala/util/DebugUtils.java
M fe/src/test/java/org/apache/impala/catalog/CatalogTableWriteIdTest.java
M fe/src/test/java/org/apache/impala/catalog/events/EventsProcessorStressTest.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M fe/src/test/java/org/apache/impala/catalog/events/SynchronousHMSEventProcessorForTests.java
M fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsSyncToLatestEventIdTest.java
A tests/custom_cluster/test_event_processing_perf.py
M tests/custom_cluster/test_events_custom_configs.py
M tests/util/event_processor_utils.py
33 files changed, 2,356 insertions(+), 108 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/21031/28
--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I76d8a739f9db6d40f01028bfd786a85d83f9e5d6
Gerrit-Change-Number: 21031
Gerrit-PatchSet: 28
Gerrit-Owner: Anonymous Coward <[email protected]>
Gerrit-Reviewer: Anonymous Coward <[email protected]>
Gerrit-Reviewer: Anonymous Coward <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Sai Hemanth Gantasala <[email protected]>