donalmag commented on PR #8871: URL: https://github.com/apache/ozone/pull/8871#issuecomment-3136206919
> ### Slight Alternative Approach: Co-locating a listener on each OM
>
> I haven't totally worked out this proposal but wanted to put it out here since it seems to address some of the previous concerns. Instead of a full OM, the listener can be a small process that is co-located on each OM node. If its workload ends up being especially light, it may even be able to run as a thread within the OM itself.
>
> The main OM would be configured to move old Ratis log files to a backup directory instead of deleting them. This keeps its working directory clean and will not affect startup time due to a large number of files. I did a quick look through Ratis and it [doesn't look like this is supported currently](https://github.com/apache/ratis/blob/f21e350d4e7330d1b2f24001ac62040ebba205f4/ratis-server/src/main/java/org/apache/ratis/server/raftlog/segmented/SegmentedRaftLogWorker.java#L483), but it could be added. The listener can read log entries from the backup dir, and then from the main OM dir. As a listener, it will be notified of the cluster's apply index, which it can use to determine which log files correspond to valid events. It will also know the current leader through election events, so the instances running on followers can pause. This listener can then push events to the plugged-in consumers based on the Ratis logs, and purge them from the backup dir once the consumers have acked them. It does not need to consume the Ratis logs that come through the Ratis listener API, since it will use the local copies on the OM.
>
> We would still need to hash out how the at-least-once delivery specification from Ozone to the consumer will fit with leader changes in this model.

@errose28 - Thanks for the feedback. This approach is interesting, as it is similar to a separate POC we had developed internally.
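The delivery loop in the quoted proposal might be sketched roughly as follows. This is a toy model, not the real Ratis or Ozone API: `LogEntry`, `EventConsumer`, and the in-memory `backupDir` are all hypothetical stand-ins, and the apply-index and ack bookkeeping is deliberately simplified to show the purge-after-ack ordering.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/**
 * Rough sketch of the co-located listener loop from the quoted proposal.
 * All types here are hypothetical stand-ins, not real Ratis/Ozone APIs.
 */
public class ListenerSketch {

  /** A log entry identified by its Ratis log index. */
  record LogEntry(long index, String payload) {}

  /** Hypothetical pluggable consumer; returns true once the entry is acked. */
  interface EventConsumer {
    boolean consume(LogEntry entry);
  }

  // Stand-in for the backup directory of moved Ratis log segments.
  final Deque<LogEntry> backupDir = new ArrayDeque<>();
  long clusterApplyIndex;            // learned via the Ratis listener role
  boolean localNodeIsLeader = true;  // learned via election events

  void onApplyIndexUpdate(long index) { clusterApplyIndex = index; }

  /** Push entries up to the cluster apply index; purge only acked ones. */
  List<Long> pump(EventConsumer consumer) {
    List<Long> delivered = new ArrayList<>();
    if (!localNodeIsLeader) {
      return delivered; // instances on followers pause
    }
    while (!backupDir.isEmpty()
        && backupDir.peekFirst().index() <= clusterApplyIndex) {
      LogEntry entry = backupDir.peekFirst();
      if (!consumer.consume(entry)) {
        break; // not acked: keep the entry and retry later (at-least-once)
      }
      backupDir.removeFirst(); // purge from backup dir only after the ack
      delivered.add(entry.index());
    }
    return delivered;
  }

  public static void main(String[] args) {
    ListenerSketch listener = new ListenerSketch();
    for (long i = 1; i <= 10; i++) {
      listener.backupDir.addLast(new LogEntry(i, "txn-" + i));
    }
    listener.onApplyIndexUpdate(7); // cluster has applied through index 7
    List<Long> delivered = listener.pump(e -> true);
    // Only entries at or below the apply index are delivered and purged.
    System.out.println(delivered);                 // [1, 2, 3, 4, 5, 6, 7]
    System.out.println(listener.backupDir.size()); // 3
  }
}
```

Note the ordering: the entry is removed from the backup dir only after the consumer acks, which is what gives at-least-once delivery (an entry may be redelivered after a crash, but never lost).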
The reason we did not go with that approach was that we didn't think we could guarantee that an entry in the Ratis log was actually applied successfully without first attempting the metadata update itself. Your comment seems to imply that should be possible though? Can you explain how we could verify this? My understanding is that `lastAppliedTxn` is just the last txn the leader executed; there is no guarantee it was executed successfully - is that assumption incorrect? E.g. if the notify process is on txn 2 and the leader is on txn 10, how do we confirm which of txns 2-10 were applied successfully?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
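The concern in the question above can be illustrated with a toy model (all names hypothetical; this is not the Ozone state machine). If applying a transaction advances the applied index regardless of whether the update itself succeeded, then the index the listener observes cannot distinguish which of txns 2-10 produced an error; some per-transaction result record (or re-attempting the metadata update) would be needed.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Toy model of the concern above: the applied index advances even when an
 * individual transaction fails, so the index alone cannot tell a listener
 * which of txns 2..10 actually succeeded. All names are hypothetical.
 */
public class AppliedIndexSketch {

  enum Result { OK, ERROR }

  long lastAppliedTxn;                                      // visible to the listener
  final Map<Long, Result> results = new LinkedHashMap<>();  // not visible

  /** Applying a txn bumps the index whether or not it succeeded. */
  void apply(long txn, boolean succeeds) {
    results.put(txn, succeeds ? Result.OK : Result.ERROR);
    lastAppliedTxn = txn; // advances unconditionally
  }

  public static void main(String[] args) {
    AppliedIndexSketch sm = new AppliedIndexSketch();
    for (long txn = 1; txn <= 10; txn++) {
      sm.apply(txn, txn != 4 && txn != 9); // pretend txns 4 and 9 fail
    }
    // The listener only learns the apply index...
    System.out.println("lastAppliedTxn = " + sm.lastAppliedTxn); // 10
    // ...while the per-txn outcomes live only in the result record:
    System.out.println("failed = " + sm.results.entrySet().stream()
        .filter(e -> e.getValue() == Result.ERROR)
        .map(e -> e.getKey().toString())
        .toList()); // [4, 9]
  }
}
```

In this model, `lastAppliedTxn == 10` is true even though txns 4 and 9 failed, which is exactly why the index alone is not enough to confirm successful application.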
