Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/20131 )
Change subject: IMPALA-12152: Add query options to wait for HMS events sync up ...................................................................... Patch Set 31: (19 comments) http://gerrit.cloudera.org:8080/#/c/20131/30//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20131/30//COMMIT_MSG@37 PS30, Line 37: The wait time can : be set to the largest lag of event processing that has been observed in : the cluster. > Done. Usually users have some charts to track the lag. It's shown in the ca Done http://gerrit.cloudera.org:8080/#/c/20131/30//COMMIT_MSG@51 PS30, Line 51: Compilation: 1s321ms > Maybe "Continuing without syncing Metastore events" is better. We don't kno Ack http://gerrit.cloudera.org:8080/#/c/20131/31//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20131/31//COMMIT_MSG@39 PS31, Line 39: catalogd WebUI and metrics Is it events-processor.lag-time? Can you spell out the metrics name in commit message please? http://gerrit.cloudera.org:8080/#/c/20131/31//COMMIT_MSG@84 PS31, Line 84: Currently, UPDATE_TBL_COL_STAT_EVENT, UPDATE_PART_COL_STAT_EVENT, : OPEN_TXN events are ignored by the event processor. If the latest event : happens to be in these types and there are no more other events, the : last synced event id can never reach the latest event id. Question: Does this mean SYNC_HMS_EVENTS_WAIT_TIME_S will be spent in full waiting and fail? http://gerrit.cloudera.org:8080/#/c/20131/30/be/src/catalog/catalog-server.cc File be/src/catalog/catalog-server.cc: http://gerrit.cloudera.org:8080/#/c/20131/30/be/src/catalog/catalog-server.cc@465 PS30, Line 465: status = catalog_server_->catalog()->WaitForHmsEvent(req, &resp); > > 1. Not wait indefinitely? OK. Can you put comment in ImpalaService.thrift or somewhere else that is relevant to not change SYNC_HMS_EVENTS_WAIT_TIME_S and SYNC_HMS_EVENTS_STRICT_MODE to non-default value cluster wide (ie., though --default_query_options)? To me, this is a class of query option that might be detrimental to performance (and might be cascading) if set to arbitrary high default value, cluster wide. We should warn to be careful about it, even only through docs, until further code improvement is implemented. http://gerrit.cloudera.org:8080/#/c/20131/31/fe/src/main/java/org/apache/impala/analysis/StatementBase.java File fe/src/main/java/org/apache/impala/analysis/StatementBase.java: http://gerrit.cloudera.org:8080/#/c/20131/31/fe/src/main/java/org/apache/impala/analysis/StatementBase.java@73 PS31, Line 73: public String getParsedDb() Can we add @Nullable annotation here? Some IDE can do static analysis and warn if access does not check for null possibility. http://gerrit.cloudera.org:8080/#/c/20131/30/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/20131/30/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@4232 PS30, Line 4232: req.timeout_s * 1000L > That's a fair point. The network latency might be trivial and can be ignore Ack. Please file a follow up JIRA for that. http://gerrit.cloudera.org:8080/#/c/20131/31/fe/src/main/java/org/apache/impala/catalog/events/ExternalEventsProcessor.java File fe/src/main/java/org/apache/impala/catalog/events/ExternalEventsProcessor.java: http://gerrit.cloudera.org:8080/#/c/20131/31/fe/src/main/java/org/apache/impala/catalog/events/ExternalEventsProcessor.java@46 PS31, Line 46: long getLastSyncedEventId(); Maybe add default value for new interface method here? http://gerrit.cloudera.org:8080/#/c/20131/31/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java: http://gerrit.cloudera.org:8080/#/c/20131/31/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@1523 PS31, Line 1523: long timeoutMs, Reference<Boolean> success nit: I think you should create a class two the returned String with success state. I see we use apache.impala.common.Reference a lot in CatalogD code. The pattern make sense in C++, but feels convoluted in Java where we can easily create a container class and maybe reuse it somewhere else relevant. For example, occurrence of "Reference<Boolean> tblWasRemoved, Reference<Boolean> dbWasAdded" deserve to have its own container class to emphasize why both are significant and exist together. http://gerrit.cloudera.org:8080/#/c/20131/31/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@1549 PS31, Line 1549: hmsEventSyncSleepIntervalMs_ Should this be min(timeoutMs, hmsEventSyncSleepIntervalMs_) ? http://gerrit.cloudera.org:8080/#/c/20131/31/fe/src/main/java/org/apache/impala/service/Frontend.java File fe/src/main/java/org/apache/impala/service/Frontend.java: http://gerrit.cloudera.org:8080/#/c/20131/31/fe/src/main/java/org/apache/impala/service/Frontend.java@2294 PS31, Line 2294: queryCtx.setWarnings(errorMsg); Why we need to setWarnings(errorMsg) if errorMsg is already printed in Query timeline? Why not set any warning during planning as an InfoString under FrontendProfile and let backend to grab it from ClientRequestState? (see also: ClientRequestState::SetFrontendProfile) http://gerrit.cloudera.org:8080/#/c/20131/31/tests/common/impala_test_suite.py File tests/common/impala_test_suite.py: http://gerrit.cloudera.org:8080/#/c/20131/31/tests/common/impala_test_suite.py@383 PS31, Line 383: get_client Rename to 'default_impala_client'. It clarify the intent of using previously initialized client targeting Impala node vs creating a new client. FYI, we also have hive_client inside ImpalaTestSuite, so 'default_impala_client' will not be ambiguous. It will be nice to converge the naming with this patch of mine: https://gerrit.cloudera.org/c/22336/ http://gerrit.cloudera.org:8080/#/c/20131/31/tests/common/impala_test_suite.py@1378 PS31, Line 1378: {"sync_hms_events_wait_time_s": timeout_s, "sync_hms_events_strict_mode": True} Can define a dictionary constant and share this with wait_for_db_to_appear. http://gerrit.cloudera.org:8080/#/c/20131/30/tests/custom_cluster/test_events_custom_configs.py File tests/custom_cluster/test_events_custom_configs.py: http://gerrit.cloudera.org:8080/#/c/20131/30/tests/custom_cluster/test_events_custom_configs.py@1530 PS30, Line 1530: > Done. I think this is a custom cluster test that runs serially so it's ok t self.get_client(vector.get_value('protocol')) is good. Thanks. http://gerrit.cloudera.org:8080/#/c/20131/30/tests/custom_cluster/test_events_custom_configs.py@1541 PS30, Line 1541: > The query will fail immediately regardless of the wait time since catalogd Ack http://gerrit.cloudera.org:8080/#/c/20131/31/tests/custom_cluster/test_events_custom_configs.py File tests/custom_cluster/test_events_custom_configs.py: http://gerrit.cloudera.org:8080/#/c/20131/31/tests/custom_cluster/test_events_custom_configs.py@1427 PS31, Line 1427: handle = client.execute_async(query) Close this handle before you run another execute_async at L1444. Do note that HS2 client currently can not run 2 concurrent queries together at the same time. https://gerrit.cloudera.org/c/22362/ http://gerrit.cloudera.org:8080/#/c/20131/31/tests/metadata/test_event_processing.py File tests/metadata/test_event_processing.py: http://gerrit.cloudera.org:8080/#/c/20131/31/tests/metadata/test_event_processing.py@214 PS31, Line 214: class TestEventSyncWaiting(ImpalaTestSuite): Override add_test_dimensions so you can define query option that you must have for this test. def add_test_dimensions(cls): super(TestEventSyncWaiting, cls).add_test_dimensions() cls.ImpalaTestMatrix.add_dimension(create_single_exec_option_dimension()) add_mandatory_exec_option(cls, 'sync_hms_events_wait_time_s', 10) add_mandatory_exec_option(cls, 'sync_hms_events_strict_mode', True) Thus, you dont need EVENT_SYNC_QUERY_OPTIONS anymore and simply use vector.get_exec_option_dict() http://gerrit.cloudera.org:8080/#/c/20131/31/tests/metadata/test_event_processing_base.py File tests/metadata/test_event_processing_base.py: http://gerrit.cloudera.org:8080/#/c/20131/31/tests/metadata/test_event_processing_base.py@30 PS31, Line 30: def _run_test_insert_events_impl(cls, hive_client, impala_client, impala_cluster, : unique_database, is_transactional=False): This method modify and clear existing configuration of impala_client. To ensure correctness, please make sure impala_client is a new client and not one of default impala clients inside ImpalaTestSuite. http://gerrit.cloudera.org:8080/#/c/20131/31/tests/metadata/test_event_processing_base.py@144 PS31, Line 144: def _run_event_based_replication_tests_impl(cls, hive_client, impala_client, : impala_cluster, filesystem_client, transactional=True): This method modify and clear existing configuration of impala_client. To ensure correctness, please make sure impala_client is a new client and not one of default impala clients inside ImpalaTestSuite. -- To view, visit http://gerrit.cloudera.org:8080/20131 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I36ac941bb2c2217b09fcfa2eb567b011b38efa2a Gerrit-Change-Number: 20131 Gerrit-PatchSet: 31 Gerrit-Owner: Quanlong Huang <[email protected]> Gerrit-Reviewer: Anonymous Coward <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Jason Fehr <[email protected]> Gerrit-Reviewer: Michael Smith <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Riza Suminto <[email protected]> Gerrit-Reviewer: Sai Hemanth Gantasala <[email protected]> Gerrit-Reviewer: Steve Carlin <[email protected]> Gerrit-Comment-Date: Tue, 21 Jan 2025 18:59:29 +0000 Gerrit-HasComments: Yes
