Re: [PR] HDDS-13513 Ozone Event Notification Design [ozone]

via GitHub Mon, 11 Aug 2025 15:46:28 -0700


errose28 commented on PR #8871:
URL: https://github.com/apache/ozone/pull/8871#issuecomment-3177110525

Here is the recording of today's discussion:

https://cloudera.zoom.us/rec/share/gV55URMxUdXpdq8UYHy4EtyoZ3a1Vro1lJYUWeQDA5smHHuVLyr4K7zqY5ddgbIK.h5bSNPYJvhDEbwaV
Passcode: jj6oSj4?

Here is a high level summary of what was discussed. Feel free to add other
details I may have missed.

---

The high level design discussed involves the Ozone Manager persisting a
bounded number of events to RocksDB as they occur. Plugins decoupled from the
OM can consume these events by iterating the RocksDB column family which stores
these events. Plugins would have freedom to make many decisions as needed for
their use case:
- How they want to persist the index of the event they last left off on
- Which node should have the currently active plugin
- Whether the plugin pushes to a single source or multiple sources
- Whether event delivery to the source is blocking or non-blocking
- What schema they want to output

Internally, the Ozone Manager can persist events to its state machine as
part of the `applyTransaction` execution of relevant requests. This ensures
high availability of events for at least once delivery. Events can be written
to a new column family in RocksDB using a key based on an incrementing
transaction ID. The OM would have a configuration like `ozone.om.events.max`
which would define the maximum number of events that would remain persisted.
Once this number is exceeded, older events may be deleted by the OM with a
RocksDB range delete.

This provides a spec for plugins consuming events: They can be guaranteed to
see at least the last `ozone.om.max.events`, and they can use the transaction
ID of the last event they processed to resume processing newer events.

To run plugins, a few options were discussed:

1. Run each plugin as a thread within the OM
- **Pros**:
- Easiest process and configuration management
- Easy for plugins to check if the current OM is the leader, and use
this as their run condition
- Plugins can add metrics to the OM's JMX endpoint for monitoring.
- **Cons**:
- Potentially more load on the OM if there are a lot of plugins
pushing information or churning iterator objects.
- Adding or removing plugins would likely require (rolling) restart
of OMs since the classpath must be updated
- If a plugin's thread crashes unexpectedly, either the OM must be
restarted to revive it, or a management CLI to start or stop plugins must be
added.
2. Run each plugin as a process on each OM using a read-only RocksDB handle
to consume events.
- **Pros**:
- No extra load on the OM process.
- Plugins can be added or removed at any time independent of all
other plugins or OM.
- **Cons**:
- Process management complexity increases with number of plugins.
- Cannot do a local leader check if that is to be used as a run
condition. Best option is probably to query metrics of the current host to
check its leader status.
- Would need to add a new JMX endpoint if plugin metrics were to be
added.
3. Run one process on each OM that internally runs all configured plugins
- Mostly the same pros and cons as the previous option.
- **Pros**:
- Less process management than the second option, while still
removing all load from the OM.
- Plugins can be added, removed, or restarted independent of the OM,
but not other plugins.
- **Cons**:
- Still need to manage a separate process with its own configuration
and metrics.
- Cannot do a local leader check, still need to rely on the host
OM's metrics.

For either approach, we could create a base plugin class that would handle
most of the common tasks. The OM already supports dynamically loading plugins
that implement a given interface like `IAccessAuthorizer`. We can use a similar
model here:
- `OMEventListener` (or other name): The interface that all plugins must
implement. It would probably just contain a single `run` method that would be
started with the process and have complete autonomy over how it reads and
processes events.
- `OMEventListenerBase` (or other name): An abstract class that provides a
common starting implementation that most plugins would likely want to use for
consuming events:
- Running a thread that fetches all the events from the DB from the last
processed index, deserializing them, and invoking a callback to the
implementation to consume them.
- Periodically persisting the last event index that was processed.
- This does not need to happen for every event due to at least once
delivery.
- Persistence options include:
- Writing to a local file (harder to resume work on leader
change)
- Writing the information to a file in Ozone
- Checking if the current node is the leader to determine whether the
plugin should continue pushing events or sleep
- This failover mechanism also does not need to be exact due to at
least once delivery. A follower may continue to deliver events before realizing
it is no longer the leader without violating this protocol.

The biggest open item at the end of this discussion was whether to run the
plugins within the OM as threads or as a separate process.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] HDDS-13513 Ozone Event Notification Design [ozone]

Reply via email to