k.venureddy2...@gmail.com has posted comments on this change. ( http://gerrit.cloudera.org:8080/22997 )
Change subject: IMPALA-13801: Support greatest synced event with hierarchical metastore event processing ...................................................................... Patch Set 22: (11 comments) http://gerrit.cloudera.org:8080/#/c/22997/17//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/22997/17//COMMIT_MSG@24 PS17, Line 24: processing event at appropriate DbEventExecutor and : TableEventExecutor). : c) Event process time from EventExecutorService point of > Just a note, as now table events are processed in parallel, we can improve ack. Will do it as follow-up http://gerrit.cloudera.org:8080/#/c/22997/17/fe/src/main/java/org/apache/impala/catalog/events/DbEventExecutor.java File fe/src/main/java/org/apache/impala/catalog/events/DbEventExecutor.java: http://gerrit.cloudera.org:8080/#/c/22997/17/fe/src/main/java/org/apache/impala/catalog/events/DbEventExecutor.java@294 PS17, Line 294: Event.getDi > nit: "startTime" might be better Done. Have changed it to processingStartTime http://gerrit.cloudera.org:8080/#/c/22997/17/fe/src/main/java/org/apache/impala/catalog/events/EventExecutorService.java File fe/src/main/java/org/apache/impala/catalog/events/EventExecutorService.java: http://gerrit.cloudera.org:8080/#/c/22997/17/fe/src/main/java/org/apache/impala/catalog/events/EventExecutorService.java@83 PS17, Line 83: iter event. Delimiter is a kind of metastore event : // that do not require event > nit: can we give an example? E.g. add "e.g. INSERT events". Done. Have updated description. Delimeter event can be : 1. A CommitTxnEvent that do not have any write event info for a given transaction. 2. An AbortTxnEvent that do not have write ids for a given transaction. 3. An IgnoredEvent. http://gerrit.cloudera.org:8080/#/c/22997/17/fe/src/main/java/org/apache/impala/catalog/events/EventExecutorService.java@88 PS17, Line 88: cessing > It seems moving this marker into MetastoreEvent as a new field is cleaner t Done http://gerrit.cloudera.org:8080/#/c/22997/17/fe/src/main/java/org/apache/impala/catalog/events/EventExecutorService.java@441 PS17, Line 441: ngEventCount(); > nit: use event.getEventDesc() to also show the eventType. Have used event.debugLog() now. http://gerrit.cloudera.org:8080/#/c/22997/17/fe/src/main/java/org/apache/impala/catalog/events/EventExecutorService.java@472 PS17, Line 472: elimite > nit: use value.getFirst().getEventDesc() to also show the eventType. Have used event.debugLog() now. http://gerrit.cloudera.org:8080/#/c/22997/18/fe/src/main/java/org/apache/impala/catalog/events/EventExecutorService.java File fe/src/main/java/org/apache/impala/catalog/events/EventExecutorService.java: http://gerrit.cloudera.org:8080/#/c/22997/18/fe/src/main/java/org/apache/impala/catalog/events/EventExecutorService.java@472 PS18, Line 472: * @param isDelimiter True if the event is a delimiter event. False otherwise > In which case will this happen? Could you add a comment about it? Do we nee Done http://gerrit.cloudera.org:8080/#/c/22997/18/fe/src/main/java/org/apache/impala/catalog/events/EventExecutorService.java@491 PS18, Line 491: og_.put(event.getEventId(), ev > nit: "dispatched event count" sounds like something that is monotonically i Have changed it to - "Current count of dispatched events that are being tracked" http://gerrit.cloudera.org:8080/#/c/22997/19/fe/src/main/java/org/apache/impala/catalog/events/EventExecutorService.java File fe/src/main/java/org/apache/impala/catalog/events/EventExecutorService.java: http://gerrit.cloudera.org:8080/#/c/22997/19/fe/src/main/java/org/apache/impala/catalog/events/EventExecutorService.java@518 PS19, Line 518: ll the subsequent delimiter e > nit: "processed event count" also sounds like something that is monotonical Done http://gerrit.cloudera.org:8080/#/c/22997/21/fe/src/main/java/org/apache/impala/catalog/events/EventExecutorService.java File fe/src/main/java/org/apache/impala/catalog/events/EventExecutorService.java: http://gerrit.cloudera.org:8080/#/c/22997/21/fe/src/main/java/org/apache/impala/catalog/events/EventExecutorService.java@547 PS21, Line 547: Long nextUnprocessedEventId = inProgressLog_.higherKey(greatestSyncedEventId); > What about keys in inProgressLog_ that are lower than processedLog_.firstKe processedLog_.firstKey() is the greatest Synced EventId. Greatest Synced EventId is the event such that all events with id less than or equal to the event are definitely synced. processedLog_ is initialized/added with eventProcessor_.getLastSyncedEventId() at the EventExecutorService#start(). So processedLog_.firstKey() is always the known greatest Synced EventId from the beginning. What about keys in inProgressLog_ that are lower than processedLog_.firstKey()? ==> No event with id less than processedLog_.firstKey() can exist in inProgressLog_. I think the algorithm assums that processedLog_.firstKey() <= inProgressLog_.firstKey() when inProgressLog_ is not empty. So there are no such keys. ==> Algorithm always starts with one entry in processedLog_. It has the known greatest Synced EventId. processedLog_.firstKey() is always less than inProgressLog_.firstKey(). But it seems wrong for the following sequence: At the begining (e.g. after startup or global IM), both maps are empty. ==> Only inProgressLog_ is empty. processedLog_ is initialized with eventProcessor_.getLastSyncedEventId() when eventProcessor is made EventProcessorStatus.ACTIVE both at startup and global IM in method EventExecutorService#makeActive() which is called from EventExecutorService#start(). Have modified the example you have mentioned a bit to make it clear. Lets say, Keys are added into inProgressLog_, e.g. [101, 102, 103, 104]. Algorithm would have initialized processedLog_ with [100] since 100 is greatest Synced EventId. Consider the following order of processing: 1. Assume 103 is processed and moved to the processedLog_. inProgressLog_ is [101, 102, 104] processedLog_ before prune is [100, 103]. pruneProcessedLog() doesn't prune anything. processedLog_ is [100, 103]. ++++ pruneProcessedLog() does following: nextUnprocessedEventId is 101 greatestSyncedEventId = processedLog_.lowerKey(101) i.e., 100 processedLog_ = [100, 103] ++++ greatestSyncedEventId is still 100 because 101 and 102 are not processed. processedLog_.firstKey() returns 100. 2. Assume 101 is processed now and moved to the processedLog_. inProgressLog_ is [102, 104] processedLog_ before prune is [100, 101, 103]. pruneProcessedLog() makes processedLog_ as [101, 103]. ++++ nextUnprocessedEventId is 102 greatestSyncedEventId = processedLog_.lowerKey(102) i.e., 101 processedLog_ = [101, 103] ++++ greatestSyncedEventId is 101. processedLog_.firstKey() returns 101. 3. Assume 102 is processed now and moved to the processedLog_. inProgressLog_ is [104] processedLog_ before prune is [101, 102, 103]. pruneProcessedLog() makes processedLog_ as [103]. ++++ nextUnprocessedEventId is 104 greatestSyncedEventId = processedLog_.lowerKey(104) i.e., 103 processedLog_ = [103] ++++ greatestSyncedEventId is 103. processedLog_.firstKey() returns 103. 4. Event 104 is processed and moved to the processedLog_. inProgressLog_ is [] processedLog_ before prune is [103, 104]. pruneProcessedLog() makes processedLog_ as [104]. ++++ nextUnprocessedEventId is null greatestSyncedEventId = processedLog_.lastKey() i.e., 104 processedLog_ = [104] ++++ greatestSyncedEventId is 104. processedLog_.firstKey() returns 104. http://gerrit.cloudera.org:8080/#/c/22997/19/fe/src/test/java/org/apache/impala/catalog/events/EventExecutorServiceTest.java File fe/src/test/java/org/apache/impala/catalog/events/EventExecutorServiceTest.java: http://gerrit.cloudera.org:8080/#/c/22997/19/fe/src/test/java/org/apache/impala/catalog/events/EventExecutorServiceTest.java@805 PS19, Line 805: eventExecutorService.getGreatestSyncedEventId()); > Can we add test coverage on using addTo/removeFromInProgressLog() and prune Done -- To view, visit http://gerrit.cloudera.org:8080/22997 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I26240f36aaf85125428dc39a66a2a1e4d3197e85 Gerrit-Change-Number: 22997 Gerrit-PatchSet: 22 Gerrit-Owner: Anonymous Coward <k.venureddy2...@gmail.com> Gerrit-Reviewer: Anonymous Coward <k.venureddy2...@gmail.com> Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com> Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com> Gerrit-Reviewer: Sai Hemanth Gantasala <saihema...@cloudera.com> Gerrit-Comment-Date: Mon, 22 Sep 2025 16:39:52 +0000 Gerrit-HasComments: Yes