[email protected] has posted comments on this change. ( http://gerrit.cloudera.org:8080/21031 )
Change subject: IMPALA-12709: Add support for hierarchical metastore event processing ...................................................................... Patch Set 27: (9 comments) http://gerrit.cloudera.org:8080/#/c/21031/26//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21031/26//COMMIT_MSG@30 PS26, Line 30: only after the alter database event is processed. > Consider the scenario: a database event, and then there are a bunch of tabl we cannot rename a db http://gerrit.cloudera.org:8080/#/c/21031/26/be/src/catalog/catalog-server.cc File be/src/catalog/catalog-server.cc: http://gerrit.cloudera.org:8080/#/c/21031/26/be/src/catalog/catalog-server.cc@217 PS26, Line 217: DEFINE_int32(hms_event_polling_interval_ms, 0, > Can you add some description about why we need a ms interval when we alread This flag is to set the polling interval of catalogd in milliseconds precision. It is not specific to enable_hierarchical_event_processing flag. But when enable_hierarchical_event_processing is true, we can set hms_event_polling_interval_ms value to < 1000(ms) because the event fetching thread just dispatches the events to appropriate event executors for processing(i.e., actual event processing do not happen on the event fetching thread). This allows the catalogd to catchup faster with HMS. http://gerrit.cloudera.org:8080/#/c/21031/26/be/src/common/global-flags.cc File be/src/common/global-flags.cc: http://gerrit.cloudera.org:8080/#/c/21031/26/be/src/common/global-flags.cc@294 PS26, Line 294: "hms_event_polling_interval_ms is used to specify the polling interval in " > Should we say here that 'hms_event_polling_interval_ms' is used by hierarch hms_event_polling_interval_ms can be used to configure polling intercal in ms precision. It is not specific to enable_hierarchical_event_processing flag. http://gerrit.cloudera.org:8080/#/c/21031/26/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/21031/26/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@637 PS26, Line 637: processEventProcessorDebugAction(ddlRequest.getQuery_options().getDebug_action()); > Do we need a debug action when we have this: https://github.com/apache/impa It is used in /tests/custom_cluster/test_event_processing_perf.py http://gerrit.cloudera.org:8080/#/c/21031/26/fe/src/main/java/org/apache/impala/util/DebugUtils.java File fe/src/main/java/org/apache/impala/util/DebugUtils.java: http://gerrit.cloudera.org:8080/#/c/21031/26/fe/src/main/java/org/apache/impala/util/DebugUtils.java@99 PS26, Line 99: public static final String EVENT_PROCESSOR = "event_processor"; > nit: EVENT_PROCESSOR_STATE sounds more intuitive to me. We can set the event processor debug action like below: event_processor:state@resume event_processor:state@hold So have not used state again in the action label as it becomes redundant. In future if we want to add any other actions to event_processor, we can extend with event_processor:new_action@action_value format. http://gerrit.cloudera.org:8080/#/c/21031/26/fe/src/test/java/org/apache/impala/catalog/events/SynchronousHMSEventProcessorForTests.java File fe/src/test/java/org/apache/impala/catalog/events/SynchronousHMSEventProcessorForTests.java: http://gerrit.cloudera.org:8080/#/c/21031/26/fe/src/test/java/org/apache/impala/catalog/events/SynchronousHMSEventProcessorForTests.java@31 PS26, Line 31: pollingFrequencyInMilliSec > Why change polling frequency to millisec when hierarchical event processor JniCatalog.getEventsProcessor() reads the event polling interval configuration(hms_event_polling_interval_ms, hms_event_polling_interval_s) and pass the configured interval to MetastoreEventsProcessor.getInstance() static method. A new instance of MetastoreEventsProcessor is created with that interval. So have passed time in millisec directly. Since SynchronousHMSEventProcessorForTests extends MetastoreEventsProcessor, have made the corresponding changes here. http://gerrit.cloudera.org:8080/#/c/21031/26/tests/custom_cluster/test_event_processing_perf.py File tests/custom_cluster/test_event_processing_perf.py: http://gerrit.cloudera.org:8080/#/c/21031/26/tests/custom_cluster/test_event_processing_perf.py@310 PS26, Line 310: start_event_id = self.__hold_event_processing() > Why is this commented? Had commented since tests take long time to run. I have uncommented it now. http://gerrit.cloudera.org:8080/#/c/21031/26/tests/custom_cluster/test_event_processing_perf.py@352 PS26, Line 352: def __drop_databases(self, exec_fn): > Should compare the performance with and without a hierarchical event proces These are time consuming tests. Have disabled the tests from running by default. Have added separate tests to run with and without hierarchical event processing enabled for different cases(external, transaction, non-partitioned, partitioned etc). At the end of the each test, have logged the execution time for each operation. Need to capture the time logs manually for each test. http://gerrit.cloudera.org:8080/#/c/21031/26/tests/custom_cluster/test_events_custom_configs.py File tests/custom_cluster/test_events_custom_configs.py: http://gerrit.cloudera.org:8080/#/c/21031/26/tests/custom_cluster/test_events_custom_configs.py@1345 PS26, Line 1345: > Why do we need to set the log level to trace? Removed it -- To view, visit http://gerrit.cloudera.org:8080/21031 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I76d8a739f9db6d40f01028bfd786a85d83f9e5d6 Gerrit-Change-Number: 21031 Gerrit-PatchSet: 27 Gerrit-Owner: Anonymous Coward <[email protected]> Gerrit-Reviewer: Anonymous Coward <[email protected]> Gerrit-Reviewer: Anonymous Coward <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Sai Hemanth Gantasala <[email protected]> Gerrit-Comment-Date: Tue, 26 Nov 2024 12:31:49 +0000 Gerrit-HasComments: Yes
