k.venureddy2...@gmail.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22997 )

Change subject: IMPALA-13801: Support greatest synced event with hierarchical 
metastore event processing
......................................................................


Patch Set 22:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/22997/17//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/22997/17//COMMIT_MSG@24
PS17, Line 24:       processing event at appropriate DbEventExecutor and
             :       TableEventExecutor).
             :    c) Event process time from EventExecutorService point of
> Just a note, as now table events are processed in parallel, we can improve
ack. Will do it as follow-up


http://gerrit.cloudera.org:8080/#/c/22997/17/fe/src/main/java/org/apache/impala/catalog/events/DbEventExecutor.java
File fe/src/main/java/org/apache/impala/catalog/events/DbEventExecutor.java:

http://gerrit.cloudera.org:8080/#/c/22997/17/fe/src/main/java/org/apache/impala/catalog/events/DbEventExecutor.java@294
PS17, Line 294: Event.getDi
> nit: "startTime" might be better
Done. Have changed it to processingStartTime


http://gerrit.cloudera.org:8080/#/c/22997/17/fe/src/main/java/org/apache/impala/catalog/events/EventExecutorService.java
File 
fe/src/main/java/org/apache/impala/catalog/events/EventExecutorService.java:

http://gerrit.cloudera.org:8080/#/c/22997/17/fe/src/main/java/org/apache/impala/catalog/events/EventExecutorService.java@83
PS17, Line 83: iter event. Delimiter is a kind of metastore event
             :   // that do not require event
> nit: can we give an example? E.g. add "e.g. INSERT events".
Done. Have updated description.

Delimeter event can be :
1. A CommitTxnEvent that do not have any write event info for a given 
transaction.
2. An AbortTxnEvent that do not have write ids for a given transaction.
3. An IgnoredEvent.


http://gerrit.cloudera.org:8080/#/c/22997/17/fe/src/main/java/org/apache/impala/catalog/events/EventExecutorService.java@88
PS17, Line 88: cessing
> It seems moving this marker into MetastoreEvent as a new field is cleaner t
Done


http://gerrit.cloudera.org:8080/#/c/22997/17/fe/src/main/java/org/apache/impala/catalog/events/EventExecutorService.java@441
PS17, Line 441: ngEventCount();
> nit: use event.getEventDesc() to also show the eventType.
Have used event.debugLog() now.


http://gerrit.cloudera.org:8080/#/c/22997/17/fe/src/main/java/org/apache/impala/catalog/events/EventExecutorService.java@472
PS17, Line 472: elimite
> nit: use value.getFirst().getEventDesc() to also show the eventType.
Have used event.debugLog() now.


http://gerrit.cloudera.org:8080/#/c/22997/18/fe/src/main/java/org/apache/impala/catalog/events/EventExecutorService.java
File 
fe/src/main/java/org/apache/impala/catalog/events/EventExecutorService.java:

http://gerrit.cloudera.org:8080/#/c/22997/18/fe/src/main/java/org/apache/impala/catalog/events/EventExecutorService.java@472
PS18, Line 472:    * @param isDelimiter True if the event is a delimiter event. 
False otherwise
> In which case will this happen? Could you add a comment about it? Do we nee
Done


http://gerrit.cloudera.org:8080/#/c/22997/18/fe/src/main/java/org/apache/impala/catalog/events/EventExecutorService.java@491
PS18, Line 491: og_.put(event.getEventId(), ev
> nit: "dispatched event count" sounds like something that is monotonically i
Have changed it to - "Current count of dispatched events that are being tracked"


http://gerrit.cloudera.org:8080/#/c/22997/19/fe/src/main/java/org/apache/impala/catalog/events/EventExecutorService.java
File 
fe/src/main/java/org/apache/impala/catalog/events/EventExecutorService.java:

http://gerrit.cloudera.org:8080/#/c/22997/19/fe/src/main/java/org/apache/impala/catalog/events/EventExecutorService.java@518
PS19, Line 518: ll the subsequent delimiter e
> nit: "processed event count" also sounds like something that is monotonical
Done


http://gerrit.cloudera.org:8080/#/c/22997/21/fe/src/main/java/org/apache/impala/catalog/events/EventExecutorService.java
File 
fe/src/main/java/org/apache/impala/catalog/events/EventExecutorService.java:

http://gerrit.cloudera.org:8080/#/c/22997/21/fe/src/main/java/org/apache/impala/catalog/events/EventExecutorService.java@547
PS21, Line 547:     Long nextUnprocessedEventId = 
inProgressLog_.higherKey(greatestSyncedEventId);
> What about keys in inProgressLog_ that are lower than processedLog_.firstKe
processedLog_.firstKey() is the greatest Synced EventId. Greatest Synced 
EventId is the event such that all events with id less than or equal to the 
event are definitely synced.
processedLog_ is initialized/added with eventProcessor_.getLastSyncedEventId() 
at the EventExecutorService#start(). So processedLog_.firstKey() is always the 
known greatest Synced EventId from the beginning.


What about keys in inProgressLog_ that are lower than processedLog_.firstKey()?

==> No event with id less than processedLog_.firstKey() can exist in 
inProgressLog_.


I think the algorithm assums that processedLog_.firstKey() <= 
inProgressLog_.firstKey() when inProgressLog_ is not empty. So there are no 
such keys.

==> Algorithm always starts with one entry in processedLog_. It has the known 
greatest Synced EventId. processedLog_.firstKey() is always less than 
inProgressLog_.firstKey().


But it seems wrong for the following sequence:

At the begining (e.g. after startup or global IM), both maps are empty.

==> Only inProgressLog_ is empty. processedLog_ is initialized with 
eventProcessor_.getLastSyncedEventId() when eventProcessor is made 
EventProcessorStatus.ACTIVE both at startup and global IM in method 
EventExecutorService#makeActive() which is called from 
EventExecutorService#start().


Have modified the example you have mentioned a bit to make it clear.

Lets say, Keys are added into inProgressLog_, e.g. [101, 102, 103, 104].
Algorithm would have initialized processedLog_ with [100] since 100 is greatest 
Synced EventId.

Consider the following order of processing:
1. Assume 103 is processed and moved to the processedLog_.
inProgressLog_ is [101, 102, 104]
processedLog_ before prune is [100, 103].
pruneProcessedLog() doesn't prune anything. processedLog_ is [100, 103].
++++
pruneProcessedLog() does following:
nextUnprocessedEventId is 101
greatestSyncedEventId = processedLog_.lowerKey(101) i.e., 100
processedLog_ = [100, 103]
++++
greatestSyncedEventId is still 100 because 101 and 102 are not processed. 
processedLog_.firstKey() returns 100.


2. Assume 101 is processed now and moved to the processedLog_.
inProgressLog_ is [102, 104]
processedLog_ before prune is [100, 101, 103]. pruneProcessedLog() makes 
processedLog_ as [101, 103].
++++
nextUnprocessedEventId is 102
greatestSyncedEventId = processedLog_.lowerKey(102) i.e., 101
processedLog_ = [101, 103]
++++
greatestSyncedEventId is 101. processedLog_.firstKey() returns 101.


3. Assume 102 is processed now and moved to the processedLog_.
inProgressLog_ is [104]
processedLog_ before prune is [101, 102, 103]. pruneProcessedLog() makes 
processedLog_ as [103].
++++
nextUnprocessedEventId is 104
greatestSyncedEventId = processedLog_.lowerKey(104) i.e., 103
processedLog_ = [103]
++++
greatestSyncedEventId is 103. processedLog_.firstKey() returns 103.


4. Event 104 is processed and moved to the processedLog_.
inProgressLog_ is []
processedLog_ before prune is [103, 104]. pruneProcessedLog() makes 
processedLog_ as [104].
++++
nextUnprocessedEventId is null
greatestSyncedEventId = processedLog_.lastKey() i.e., 104
processedLog_ = [104]
++++
greatestSyncedEventId is 104. processedLog_.firstKey() returns 104.


http://gerrit.cloudera.org:8080/#/c/22997/19/fe/src/test/java/org/apache/impala/catalog/events/EventExecutorServiceTest.java
File 
fe/src/test/java/org/apache/impala/catalog/events/EventExecutorServiceTest.java:

http://gerrit.cloudera.org:8080/#/c/22997/19/fe/src/test/java/org/apache/impala/catalog/events/EventExecutorServiceTest.java@805
PS19, Line 805:         eventExecutorService.getGreatestSyncedEventId());
> Can we add test coverage on using addTo/removeFromInProgressLog() and prune
Done



--
To view, visit http://gerrit.cloudera.org:8080/22997
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I26240f36aaf85125428dc39a66a2a1e4d3197e85
Gerrit-Change-Number: 22997
Gerrit-PatchSet: 22
Gerrit-Owner: Anonymous Coward <k.venureddy2...@gmail.com>
Gerrit-Reviewer: Anonymous Coward <k.venureddy2...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Reviewer: Sai Hemanth Gantasala <saihema...@cloudera.com>
Gerrit-Comment-Date: Mon, 22 Sep 2025 16:39:52 +0000
Gerrit-HasComments: Yes

Reply via email to