errose28 commented on PR #8871:
URL: https://github.com/apache/ozone/pull/8871#issuecomment-3177110525

   Here is the recording of today's discussion:
   
https://cloudera.zoom.us/rec/share/gV55URMxUdXpdq8UYHy4EtyoZ3a1Vro1lJYUWeQDA5smHHuVLyr4K7zqY5ddgbIK.h5bSNPYJvhDEbwaV
   Passcode: jj6oSj4?
   
   Here is a high level summary of what was discussed. Feel free to add other 
details I may have missed.
   
   ---
   
   The high level design discussed involves the Ozone Manager persisting a 
bounded number of events to RocksDB as they occur. Plugins decoupled from the 
OM can consume these events by iterating the RocksDB column family which stores 
these events. Plugins would have freedom to make many decisions as needed for 
their use case:
   - How they want to persist the index of the event they last left off on
   - Which node should have the currently active plugin
   - Whether the plugin pushes to a single source or multiple sources
   - Whether event delivery to the source is blocking or non-blocking
   - What schema they want to output
    
   Internally, the Ozone Manager can persist events to its state machine as 
part of the `applyTransaction` execution of relevant requests. This ensures 
high availability of events for at least once delivery. Events can be written 
to a new column family in RocksDB using a key based on an incrementing 
transaction ID. The OM would have a configuration like `ozone.om.events.max` 
which would define the maximum number of events that would remain persisted. 
Once this number is exceeded, older events may be deleted by the OM with a 
RocksDB range delete.
   
   This provides a spec for plugins consuming events: They can be guaranteed to 
see at least the last `ozone.om.max.events`, and they can use the transaction 
ID of the last event they processed to resume processing newer events.
   
   To run plugins, a few options were discussed:
   
   1. Run each plugin as a thread within the OM
       - **Pros**:
           - Easiest process and configuration management
           - Easy for plugins to check if the current OM is the leader, and use 
this as their run condition
           - Plugins can add metrics to the OM's JMX endpoint for monitoring.
       - **Cons**:
           - Potentially more load on the OM if there are a lot of plugins 
pushing information or churning iterator objects.
           - Adding or removing plugins would likely require (rolling) restart 
of OMs since the classpath must be updated
           - If a plugin's thread crashes unexpectedly, either the OM must be 
restarted to revive it, or a management CLI to start or stop plugins must be 
added.
   2. Run each plugin as a process on each OM using a read-only RocksDB handle 
to consume events.
       - **Pros**:
           - No extra load on the OM process.
           - Plugins can be added or removed at any time independent of all 
other plugins or OM.
       - **Cons**:
           - Process management complexity increases with number of plugins.
           - Cannot do a local leader check if that is to be used as a run 
condition. Best option is probably to query metrics of the current host to 
check its leader status.
           - Would need to add a new JMX endpoint if plugin metrics were to be 
added.
   3. Run one process on each OM that internally runs all configured plugins
       - Mostly the same pros and cons as the previous option.
       - **Pros**:
           - Less process management than the second option, while still 
removing all load from the OM.
           - Plugins can be added, removed, or restarted independent of the OM, 
but not other plugins.
       - **Cons**:
           - Still need to manage a separate process with its own configuration 
and metrics.
           - Cannot do a local leader check, still need to rely on the host 
OM's metrics.
   
   For either approach, we could create a base plugin class that would handle 
most of the common tasks. The OM already supports dynamically loading plugins 
that implement a given interface like `IAccessAuthorizer`. We can use a similar 
model here:
   - `OMEventListener` (or other name): The interface that all plugins must 
implement. It would probably just contain a single `run` method that would be 
started with the process and have complete autonomy over how it reads and 
processes events.
   - `OMEventListenerBase` (or other name): An abstract class that provides a 
common starting implementation that most plugins would likely want to use for 
consuming events:
       - Running a thread that fetches all the events from the DB from the last 
processed index, deserializing them, and invoking a callback to the 
implementation to consume them.
       - Periodically persisting the last event index that was processed.
           - This does not need to happen for every event due to at least once 
delivery.
           - Persistence options include:
               - Writing to a local file (harder to resume work on leader 
change)
               - Writing the information to a file in Ozone
       - Checking if the current node is the leader to determine whether the 
plugin should continue pushing events or sleep
           - This failover mechanism also does not need to be exact due to at 
least once delivery. A follower may continue to deliver events before realizing 
it is no longer the leader without violating this protocol.
   
   The biggest open item at the end of this discussion was whether to run the 
plugins within the OM as threads or as a separate process.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to