[Impala-ASF-CR] IMPALA-14618: Ensure processor clear waits for active event processing

Quanlong Huang (Code Review) Mon, 12 Jan 2026 05:08:16 -0800

Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/23823 )

Change subject: IMPALA-14618: Ensure processor clear waits for active event
processing
......................................................................

Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/23823/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/23823/1//COMMIT_MSG@12
PS1, Line 12: by those processors.
> DbProcessors in a DbEventExecutor share single thread. So only one of the D
Ack

http://gerrit.cloudera.org:8080/#/c/23823/1//COMMIT_MSG@14
PS1, Line 14: it could update
: the cache with stale information after the metadata rebuild
> Below is the sequence of operations from logs:
What seems a bug is how CREATE_DATABASE event gets a null db here (so proceeds
with addDb()):
https://github.com/apache/impala/blob/45051a27672c3a8a5be304ab2c71fc5c02946788/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L996

If catalog_.getDb() is invoked after reset(), it should see the db.
If catalog_.getDb() is invoked during reset(), it should wait until the db is
loaded in reset(). But we just add the waiting for the initial reset which
seems a bug.
https://github.com/apache/impala/blob/45051a27672c3a8a5be304ab2c71fc5c02946788/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java#L538
https://github.com/apache/impala/blob/45051a27672c3a8a5be304ab2c71fc5c02946788/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java#L500

This allows the CREATE_DATABASE event to saw the db doesn't exist and then
decide to add it. Note that acquiring the catalog version write lock happens
after the existence check. So reset() adds the db with the correct table list.
But then CREATE_DATABASE event replace it with a db that has empty table list.
It's OK if the following CREATE_TABLE event can add back the table. But the
event is skipped since event processor is paused and then restarted at a newer
event id.

I think the issue could also happen when Hierarchical Event Processing is
disabled. At the begining of reset(), we set the event processor status to
PAUSED and then move forward. But the event processor thread could still be in
processing a CREATE_DATABASE event and then causes the same issue.

Maybe fixing the issue in getDb() is better.

http://gerrit.cloudera.org:8080/#/c/23823/1//COMMIT_MSG@18
PS1, Line 18: This change ensures that processor clear waits for any ongoing
event
: processing to complete before clearing internal state.
> IMO, we need to wait for event processing to completely pause before procee
A single ALTER_TABLE event could take minutes or longer to process if it
reloads a large table. That already impacts the reset performance which then
impacts the catalogd HA warm failover. If we fix the other bug mentioned above,
we don't need this waiting. Otherwise, it's OK to let the thread modify the
stale catalog objects concurrently, if the catalog object can't be referenced
from CatalogServiceCatalog anymore.

--
To view, visit http://gerrit.cloudera.org:8080/23823
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib04ee2fa425cb4be397232a0065b6fdbfc1e244c
Gerrit-Change-Number: 23823
Gerrit-PatchSet: 2
Gerrit-Owner: Anonymous Coward <[email protected]>
Gerrit-Reviewer: Anonymous Coward <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Sai Hemanth Gantasala <[email protected]>
Gerrit-Comment-Date: Mon, 12 Jan 2026 13:06:28 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-14618: Ensure processor clear waits for active event processing

Reply via email to