Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20131 )

Change subject: IMPALA-12152: Add query options to wait for HMS events sync up
......................................................................


Patch Set 31:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/20131/30//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20131/30//COMMIT_MSG@7
PS30, Line 7: IMPALA-12152: Add query options to wait for HMS events sy
> nit: IMPALA-12152: Add query options to wait for HMS events sync up
Done


http://gerrit.cloudera.org:8080/#/c/20131/30//COMMIT_MSG@20
PS30, Line 20: SYNC_HMS_EVENTS_WAIT_TIME_S
> nit: SYNC_HMS_EVENTS_WAIT_TIME_S
Done. Currently, no limits. Users can set abitrary values.


http://gerrit.cloudera.org:8080/#/c/20131/30//COMMIT_MSG@22
PS30, Line 22: SYNC_HMS_EVENTS_STRICT_MODE
> nit: SYNC_HMS_EVENTS_STRICT_MODE
Done


http://gerrit.cloudera.org:8080/#/c/20131/30//COMMIT_MSG@37
PS30, Line 37: The wait time can
             : be set to the largest lag of event processing that has been 
observed in
             : the cluster.
> How to observe this? can you mention it in commit message?
Done. Usually users have some charts to track the lag. It's shown in the 
catalogd WebUI and metrics.


http://gerrit.cloudera.org:8080/#/c/20131/30//COMMIT_MSG@51
PS30, Line 51:  Compilation: 1s321ms
> "Failed" might be interpreted by user as mistake. "Why query still running
Maybe "Continuing without syncing Metastore events" is better. We don't know if 
there are unprocessed events. We could be lucky that there are none. Using the 
new message when strict mode is off.


http://gerrit.cloudera.org:8080/#/c/20131/30//COMMIT_MSG@57
PS30, Line 57:
             : A ne
> nit: is added to CatalogService API so that coordinator can wait
Done


http://gerrit.cloudera.org:8080/#/c/20131/30/be/src/catalog/catalog-server.cc
File be/src/catalog/catalog-server.cc:

http://gerrit.cloudera.org:8080/#/c/20131/30/be/src/catalog/catalog-server.cc@465
PS30, Line 465:       status = catalog_server_->catalog()->WaitForHmsEvent(req, 
&resp);
> 1. Not wait indefinitely?

It's a common problem for all catalog RPCs, e.g. execDdl, execResetMetadata, 
getPartialCatalogObject, etc. I think some work in this patch might help: 
https://gerrit.cloudera.org/c/21803/ . E.g. coordinator can send a cancel 
request to catalogd to cancel all threads of a query id, including this waiting 
thread in catalogd.

> 2. Not filling up / holding resources of CatalogService API with multiple 
> sleeping RPC thread?

That's a good point that we should take care of. I think we need some stress 
tests to measure the overhead of these. If we want to control the concurrency, 
we can add a semaphore like partialObjectFetchAccess_ in CatalogServiceCatalog 
(IMPALA-7626). Maybe sleeping threads won't consume too much resource. If it's 
OK so far, I'll file a follow-up JIRA for this.


http://gerrit.cloudera.org:8080/#/c/20131/30/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/20131/30/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@4232
PS30, Line 4232: req.timeout_s * 1000L
> Should this also account for time spent sending TWaitForHmsEventRequest tho
That's a fair point. The network latency might be trivial and can be ignored 
since this is suggested to be set to several seconds/minutes. However, once we 
have the blocking mechanism (like semaphore mentioned in another comment), the 
time between coordinator sending the request and catalogd goes into this method 
could be significant. We can consider it after that.

Another point is that we suggest setting this timeout to the max lag of event 
processing. So here we get the latest event id and start waiting for it to be 
synced. If we account for the time coordinator sending the request, we need to 
consider the max time of that. Maybe we can do it in the same future patch.


http://gerrit.cloudera.org:8080/#/c/20131/30/tests/custom_cluster/test_events_custom_configs.py
File tests/custom_cluster/test_events_custom_configs.py:

http://gerrit.cloudera.org:8080/#/c/20131/30/tests/custom_cluster/test_events_custom_configs.py@1415
PS30, Line 1415:
> Add 'vector' parameter and create a new client based on vector.get_value('p
Good point! I just realized there are some test dimensions for this class. I 
don't think we need to test on them. Moving this new test to 
tests/metadata/test_event_processing.py. The other reason is we don't need 
custom flags here.


http://gerrit.cloudera.org:8080/#/c/20131/30/tests/custom_cluster/test_events_custom_configs.py@1523
PS30, Line 1523:
> Add 'vector' parameter and create a new client based on vector.get_value('p
Done. Try using the created client of the specific protocol.


http://gerrit.cloudera.org:8080/#/c/20131/30/tests/custom_cluster/test_events_custom_configs.py@1530
PS30, Line 1530:
> Don't check against QUERY_STATES directly. It will break this test if somed
Done. I think this is a custom cluster test that runs serially so it's ok to 
use the existing clients. Any concerns on using them?


http://gerrit.cloudera.org:8080/#/c/20131/30/tests/custom_cluster/test_events_custom_configs.py@1541
PS30, Line 1541:
> Can you also validate if the duration is aligned with given sync_hms_events
The query will fail immediately regardless of the wait time since catalogd 
found EventProcessor is disabled. We are currently lack of a test for a real 
timeout. Added it in the next patch set.



--
To view, visit http://gerrit.cloudera.org:8080/20131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I36ac941bb2c2217b09fcfa2eb567b011b38efa2a
Gerrit-Change-Number: 20131
Gerrit-PatchSet: 31
Gerrit-Owner: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Anonymous Coward <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Jason Fehr <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Sai Hemanth Gantasala <[email protected]>
Gerrit-Reviewer: Steve Carlin <[email protected]>
Gerrit-Comment-Date: Tue, 21 Jan 2025 13:35:54 +0000
Gerrit-HasComments: Yes

Reply via email to