errose28 commented on PR #8871: URL: https://github.com/apache/ozone/pull/8871#issuecomment-3177110525
Here is the recording of today's discussion: https://cloudera.zoom.us/rec/share/gV55URMxUdXpdq8UYHy4EtyoZ3a1Vro1lJYUWeQDA5smHHuVLyr4K7zqY5ddgbIK.h5bSNPYJvhDEbwaV Passcode: jj6oSj4? Here is a high level summary of what was discussed. Feel free to add other details I may have missed. --- The high level design discussed involves the Ozone Manager persisting a bounded number of events to RocksDB as they occur. Plugins decoupled from the OM can consume these events by iterating the RocksDB column family which stores these events. Plugins would have freedom to make many decisions as needed for their use case: - How they want to persist the index of the event they last left off on - Which node should have the currently active plugin - Whether the plugin pushes to a single source or multiple sources - Whether event delivery to the source is blocking or non-blocking - What schema they want to output Internally, the Ozone Manager can persist events to its state machine as part of the `applyTransaction` execution of relevant requests. This ensures high availability of events for at least once delivery. Events can be written to a new column family in RocksDB using a key based on an incrementing transaction ID. The OM would have a configuration like `ozone.om.events.max` which would define the maximum number of events that would remain persisted. Once this number is exceeded, older events may be deleted by the OM with a RocksDB range delete. This provides a spec for plugins consuming events: They can be guaranteed to see at least the last `ozone.om.max.events`, and they can use the transaction ID of the last event they processed to resume processing newer events. To run plugins, a few options were discussed: 1. Run each plugin as a thread within the OM - **Pros**: - Easiest process and configuration management - Easy for plugins to check if the current OM is the leader, and use this as their run condition - Plugins can add metrics to the OM's JMX endpoint for monitoring. - **Cons**: - Potentially more load on the OM if there are a lot of plugins pushing information or churning iterator objects. - Adding or removing plugins would likely require (rolling) restart of OMs since the classpath must be updated - If a plugin's thread crashes unexpectedly, either the OM must be restarted to revive it, or a management CLI to start or stop plugins must be added. 2. Run each plugin as a process on each OM using a read-only RocksDB handle to consume events. - **Pros**: - No extra load on the OM process. - Plugins can be added or removed at any time independent of all other plugins or OM. - **Cons**: - Process management complexity increases with number of plugins. - Cannot do a local leader check if that is to be used as a run condition. Best option is probably to query metrics of the current host to check its leader status. - Would need to add a new JMX endpoint if plugin metrics were to be added. 3. Run one process on each OM that internally runs all configured plugins - Mostly the same pros and cons as the previous option. - **Pros**: - Less process management than the second option, while still removing all load from the OM. - Plugins can be added, removed, or restarted independent of the OM, but not other plugins. - **Cons**: - Still need to manage a separate process with its own configuration and metrics. - Cannot do a local leader check, still need to rely on the host OM's metrics. For either approach, we could create a base plugin class that would handle most of the common tasks. The OM already supports dynamically loading plugins that implement a given interface like `IAccessAuthorizer`. We can use a similar model here: - `OMEventListener` (or other name): The interface that all plugins must implement. It would probably just contain a single `run` method that would be started with the process and have complete autonomy over how it reads and processes events. - `OMEventListenerBase` (or other name): An abstract class that provides a common starting implementation that most plugins would likely want to use for consuming events: - Running a thread that fetches all the events from the DB from the last processed index, deserializing them, and invoking a callback to the implementation to consume them. - Periodically persisting the last event index that was processed. - This does not need to happen for every event due to at least once delivery. - Persistence options include: - Writing to a local file (harder to resume work on leader change) - Writing the information to a file in Ozone - Checking if the current node is the leader to determine whether the plugin should continue pushing events or sleep - This failover mechanism also does not need to be exact due to at least once delivery. A follower may continue to deliver events before realizing it is no longer the leader without violating this protocol. The biggest open item at the end of this discussion was whether to run the plugins within the OM as threads or as a separate process. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
