errose28 commented on code in PR #8871:
URL: https://github.com/apache/ozone/pull/8871#discussion_r2246298742


##########
hadoop-hdds/docs/content/design/event-notification-schema.md:
##########
@@ -0,0 +1,396 @@
+---
+title: Event notification schema discussion
+summary: Event notifications schema discussion
+date: 2025-06-29
+jira: HDDS-13513
+status: design
+author: Colm Dougan, Donal Magennis
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+## Overview
+
+This document outlines the schema requirements for event notification
+within Ozone and discusses the suitability of 2 widely used event
+notification schemas (S3 and HDFS) as candidates to use as a basis for
+the transmission format for notifications within Ozone.
+
+# General schema requirements
+
+## File/Directory creation/modification
+
+event notifications should be raised to inform consumers of completed
+operations which modify the filesystem and specifically the requests:
+
+#### CreateRequest
+
+we should emit some **create** event
+
+required fields:
+- path (volume + bucket + key)
+- isfile
+
+nice to have fields:
+- overwrite
+- recursive
+
+#### CreateFileRequest
+
+we should emit some **create** event
+
+required fields:
+- path (volume + bucket + key)
+- isfile

Review Comment:
   The `isfile` field seems redundant given the event name. The same applies to `CreateDirectoryRequest`.
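
A minimal sketch of this point, assuming hypothetical event names (`FileCreated`, `DirectoryCreated`) that are not defined anywhere in the proposal: if each request type maps to its own event name, a consumer can tell files from directories by the name alone, so the payload would not need `isfile`.

```python
# Hypothetical event names; the design doc does not define them.
EVENT_NAME_BY_REQUEST = {
    "CreateFileRequest": "FileCreated",
    "CreateDirectoryRequest": "DirectoryCreated",
}


def build_create_event(request_type, volume, bucket, key):
    """Build a create event whose name already implies file vs directory."""
    return {
        "eventName": EVENT_NAME_BY_REQUEST[request_type],
        "path": f"/{volume}/{bucket}/{key}",
    }


def is_file(event):
    """Consumer-side helper: derive the object kind from the event name."""
    return event["eventName"] == "FileCreated"
```

A generic `CreateRequest` is left out of the mapping on purpose, since the excerpt does not say whether its event name would distinguish files from directories.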



##########
hadoop-hdds/docs/content/design/event-notifications.md:
##########
@@ -0,0 +1,193 @@
+---
+title: Event notification support in Ozone
+summary: Event notifications for all bucket/event types in ozone
+date: 2025-06-28
+jira: HDDS-13513
+status: design
+author: Donal Magennis, Colm Dougan
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Abstract
+
+Implement an event notification system for Apache Ozone, providing the ability for users to consume events occurring on the Ozone filesystem.
+This is similar to https://issues.apache.org/jira/browse/HDDS-5984 but aims to encapsulate all events and not solely S3 buckets.  
+This document proposes a potential solution and discusses some of the challenges/open questions.
+
+# Introduction
+
+Apache Ozone does not currently provide the ability to consume filesystem events, similar to how HDFS does with Inotify or S3 with bucket notifications.  
+These events are an integral part of integration with external systems to support real-time, scalable, and programmatic monitoring of changes in the data or metadata stored in Ozone.  
+These external systems can use notifications of objects created/deleted to trigger data processing workflows, replication and monitoring alerts.
+
+# Goals
+
+Provide support for all events across the Ozone filesystem for FSO and non-FSO buckets, including renames and changes to ACLs.
+Do not impact the performance of client requests.
+Guarantee at-least-once delivery.
+
+# Non-Goals
+
+Filtering of events or paths/buckets
+Persistent storage of notification messages
+Asynchronous delivery
+
+# Supported OMRequests
+
+OMDirectoryCreateRequest
+OMKeyCommitRequest
+OMKeyDeleteRequest
+OMKeyRenameRequest
+OMKeyAddAclRequest
+OMKeyRemoveAclRequest
+OMKeySetAclRequest
+OMKeySetTimesRequest
+
+# Design
+
+## Overview
+
+Introduce a new OzoneManager node in the ratis ring in LISTENER mode.  
+The node will be defined up-front in ozone-site.xml.
+This node will maintain metadata similar to the other OM nodes; however, it will generate notifications for events after they have been successfully committed.
+This node will not be able to become a leader, so the notification process will not impact client requests.  
+
+![OzoneEventNotification.png](OzoneEventNotification.png)
+
+#### Component
+
+Implementation of this feature requires changes to the OzoneManager:
+    - Implement an agnostic notification interface
+    - Add support for a LISTENER node
+    - Add a hook in the OMRequest execution flow to generate notifications
+
+#### Component
+
+Introduces a new field on OMAdminProtocol.proto to identify the notify node:
+
+```protobuf
+message OMNodeInfo {
+required string nodeID = 1;
+required string hostAddress = 2;
+required uint32 rpcPort = 3;
+required uint32 ratisPort = 4;
+optional NodeState nodeState = 5 [default=ACTIVE];
+optional bool isNotifyNode = 6 [default=false];
+}
+```
+
+## Performance
+
+While this is a synchronous-only approach, any latency between the notification target and the OM should not impact the performance of client requests, as the notification does not run on the leader.

Review Comment:
   I'm not sure "synchronous" is the best term to use here. [Ceph's 
definitions](https://docs.ceph.com/en/quincy/radosgw/notifications/#synchronous-notifications)
 are that synchronous blocks the writer and asynchronous does not. It's 
probably clearer to use similar definitions here, so the system would be 
asynchronous. I think the point being made here is that the notifier will block 
until the consumer acks the event but that this will not block the writer.
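
To make the distinction concrete, here is a rough in-process analogy, not the proposed implementation (in the design the notifier runs on a separate LISTENER OM node). The `Notifier` class and its method names are hypothetical. The write path only enqueues the committed event and returns, while a separate thread blocks on each delivery until the consumer acknowledges it.

```python
import queue
import threading


class Notifier:
    """Hypothetical notifier: blocks on the consumer, never on the writer."""

    def __init__(self, deliver):
        # deliver(event) is assumed to return only once the consumer has acked.
        self._deliver = deliver
        self._events = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def on_commit(self, event):
        # Called from the write path: enqueue and return immediately.
        self._events.put(event)

    def _run(self):
        while True:
            event = self._events.get()
            if event is None:  # shutdown marker
                break
            # Blocks until the consumer acks; events are delivered in order.
            self._deliver(event)

    def close(self):
        self._events.put(None)
        self._thread.join()
```

Under Ceph's definitions this is asynchronous from the writer's point of view, even though the notifier itself waits for each ack before sending the next event.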



##########
hadoop-hdds/docs/content/design/event-notification-schema.md:
##########
@@ -0,0 +1,396 @@
+#### CreateFileRequest
+
+we should emit some **create** event
+
+required fields:
+- path (volume + bucket + key)
+- isfile
+
+nice to have fields:
+- overwrite
+- recursive
+
+#### CreateDirectoryRequest
+
+we should emit some **create** event
+
+required fields:
+- path (volume + bucket + key)
+- isfile
+
+#### CommitKeyRequest
+
+we should emit some **commit/close** event
+
+required fields:
+- path (volume + bucket + key)
+
+nice to have fields:
+- data size
+- hsync?
+
+#### DeleteKeyRequest
+
+we should emit some **delete** event
+
+required fields:
+- path (volume + bucket + key)
+
+nice to have fields:
+- recursive (if known)
+
+#### RenameKeyRequest
+
+we should emit some **rename** event
+
+required fields:
+- fromPath (volume + bucket + key)
+- toPath (volume + bucket + toKeyName)
+
+nice to have fields:
+- recursive (if known)
+- is directory (if known)
+
+NOTE: in the case of an FSO directory rename there is a dilemma
+(discussed later in this document) as to whether we should emit a single
+event for a directory rename (specifying only the old/new directory names)
+or whether we should emit granular events for all the child objects impacted by
+the rename.
+
+## ACLs
+
+event notifications should be raised to inform consumers that ACL events
+have happened. The relevant requests are:
+
+* AddAclRequest
+* SetAclRequest
+* RemoveAclRequest
+
+The fields provided could vary based on the implementation complexity.
+
+Minimally we have a requirement that we be informed that "some ACL update
+happened" to a certain key (or prefix).
+
+Ideally the details would include the full context of the change made as
+per the request. (perhaps by mirroring the full request details as a JSON
+sub-object) e.g. :
+
+```json
+   ...
+
+   "acls": [
+    {
+      "type": "GROUP",
+      "name": "mygroup",
+      "rights": "\u0000\u0001",
+      "aclScope": "ACCESS"
+    }
+   ]
+```
+
+The precise details we would need to revisit with guidance from the
+community but this is just to set broad brush expectations.
+
+## SetTimes
+
+event notifications should be raised to inform consumers that
+mtime/atime has changed, as per **SetTimesRequest**
+
+# Transmission format
+
+This section discusses 2 widely used transmission formats for event
+notifications (S3 and HDFS) and their suitability as candidates for
+adoption within Ozone.
+
+It is not assumed that these are the only options available but they are
+good examples to test against our requirements and discuss trade-offs.
+
+## 1. S3 Event Notification schema
+
+The S3 event notification schema:
+
+[https://docs.aws.amazon.com/AmazonS3/latest/userguide/notification-how-to-event-types-and-destinations.html#supported-notification-event-types](https://docs.aws.amazon.com/AmazonS3/latest/userguide/notification-how-to-event-types-and-destinations.html#supported-notification-event-types)
+
+has become a standard for change notifications in S3 compatible storage services such as S3 itself, Ceph, MinIO etc.
+
+Notification events are produced as a list of JSON records.
+
+To illustrate we can look at a sample "create" event from the Ceph docs
+(https://docs.ceph.com/en/quincy/radosgw/notifications/#events):
+
+```json
+
+{"Records":[
+    {
+        "eventVersion":"2.1",
+        "eventSource":"ceph:s3",
+        "awsRegion":"us-east-1",
+        "eventTime":"2019-11-22T13:47:35.124724Z",
+        "eventName":"ObjectCreated:Put",
+        "userIdentity":{
+            "principalId":"tester"
+        },
+        "requestParameters":{
+            "sourceIPAddress":""
+        },
+        "responseElements":{
+            "x-amz-request-id":"503a4c37-85eb-47cd-8681-2817e80b4281.5330.903595",
+            "x-amz-id-2":"14d2-zone1-zonegroup1"
+        },
+        "s3":{
+            "s3SchemaVersion":"1.0",
+            "configurationId":"mynotif1",
+            "bucket":{
+                "name":"mybucket1",
+                "ownerIdentity":{
+                    "principalId":"tester"
+                },
+                "arn":"arn:aws:s3:us-east-1::mybucket1",
+                "id":"503a4c37-85eb-47cd-8681-2817e80b4281.5332.38"
+            },
+            "object":{
+                "key":"myimage1.jpg",
+                "size":"1024",
+                "eTag":"37b51d194a7513e45b56f6524f2d51f2",
+                "versionId":"",
+                "sequencer": "F7E6D75DC742D108",
+                "metadata":[],
+                "tags":[]
+            }
+        },
+        "eventId":"",
+        "opaqueData":"[email protected]"
+    }
+]}
+```
+
+As we can see above, there are a number of boilerplate fields to inform us
+of various aspects of the completed operation, but there are a few fundamental
+aspects to highlight:
+
+1. the "key" informs us of the key that the operation was performed on.
+
+2. the "eventName" informs us of the type of operation that was
+   performed.  The 2 most notable eventNames are **ObjectCreated:Put** and
+   **ObjectRemoved:Deleted** which pertain to key creation and deletion respectively.
+
+3. operation specific fields can be included within the "object" sub-object (in
+   the above example we can see that "size" and "eTag" of the created object are included)
+
+## Applicability to Ozone
+
+For non-FSO Ozone buckets / operations there is a clear mapping between
+operations such as CreateKey / CommitKey / DeleteKey / RenameKey and the
+standard S3 event notification semantics.
+
+Examples:
+
+1. CommitKey could be mapped to an ObjectCreated:Put "/path/to/keyToCreate" notification event
+
+2. DeleteKey could be mapped to an ObjectRemoved:Deleted "/path/to/keyToDelete" notification event
+
+3. RenameKey (assuming a file based key) in standard S3 event notification semantics would produce 2 events:
+
+- an ObjectRemoved:Deleted event for the source path of the rename
+- an ObjectCreated:Put event for the destination path of the rename
+
+The challenge in adopting S3 Event notification semantics within Ozone
+would be in at least 2 areas:
+
+### 1. FSO hierarchical operations which impact multiple child keys
+
+Example: directory renames
+
+To illustrate with an example: let's say we have the following simple directory structure:
+
+```
+  /vol1/bucket1/myfiles/f1
+  /vol1/bucket1/myfiles/f2
+  /vol1/bucket1/myfiles/subdir/f1
+```
+
+If a user performs a directory rename such as:
+
+```
+  ozone fs -mv /vol1/bucket1/myfiles /vol1/bucket1/myfiles-RENAMED
+```
+
+Within standard S3 event notification semantics we would expect to see 6 notifications
+emitted in that case:
+
+```
+  eventName=ObjectRemoved:Deleted, key=/vol1/bucket1/myfiles/f1
+  eventName=ObjectRemoved:Deleted, key=/vol1/bucket1/myfiles/f2
+  eventName=ObjectRemoved:Deleted, key=/vol1/bucket1/myfiles/subdir/f1
+  eventName=ObjectCreated:Put, key=/vol1/bucket1/myfiles-RENAMED/f1
+  eventName=ObjectCreated:Put, key=/vol1/bucket1/myfiles-RENAMED/f2
+  eventName=ObjectCreated:Put, key=/vol1/bucket1/myfiles-RENAMED/subdir/f1
+```
+
+However, with an approach of simply producing notifications based on Ratis
+state machine events then all we would have to go on from the
+RenameKeyRequest would be the fromKeyName and the toKeyName of the
+*parent* of the directory being renamed (and not the impacted child
+objects).

Review Comment:
   Yes, this is probably the best way to do it, with a directory rename or delete generating a single event. These are atomic operations on the Ozone cluster, so ideally the consumer would see them that way as well.
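
As a rough illustration of what a single-event rename asks of a consumer, the sketch below applies one directory rename to the set of keys the consumer already tracks. The function name and the idea that the consumer keeps its own key set are assumptions for illustration, not part of the proposal.

```python
def apply_directory_rename(known_keys, from_path, to_path):
    """Rewrite every tracked key under from_path so it lives under to_path.

    Hypothetical consumer-side helper: the single rename event carries only
    the old and new directory paths, so child keys are derived locally.
    """
    src = from_path.rstrip("/")
    dst = to_path.rstrip("/")
    prefix = src + "/"  # trailing slash avoids matching sibling names
    result = set()
    for key in known_keys:
        if key == src:
            result.add(dst)
        elif key.startswith(prefix):
            result.add(dst + "/" + key[len(prefix):])
        else:
            result.add(key)
    return result
```

With the example from the doc, renaming `/vol1/bucket1/myfiles` to `/vol1/bucket1/myfiles-RENAMED` rewrites `f1`, `f2` and `subdir/f1` in one step, which mirrors the atomicity of the operation on the cluster.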



##########
hadoop-hdds/docs/content/design/event-notification-schema.md:
##########
@@ -0,0 +1,396 @@
+However, with an approach of simply producing notifications based on Ratis
+state machine events then all we would have to go on from the
+RenameKeyRequest would be the fromKeyName and the toKeyName of the
+*parent* of the directory being renamed (and not the impacted child
+objects).
+
+Therefore to produce notifications using the standard S3 event
+notification semantics for FSO directory renames we would need to
+consider the trade-offs between compatibility with the normal S3
+semantics for renames vs a custom event type for directory renames.
+
+### most compatible approach
+
+We could introduce some additional processing before emitting notification
+events in the case of a directory rename which "gathers together" (prior
+to the change being committed to the DB) the child objects impacted by
+the directory rename and emits pairs of delete/create events for each
+key (as described above)
+
+Pros:
+- standard S3 event notification rename semantics
+
+Cons:
+- additional processing to pull together the events.  This could mean an
+  unknown amount of additional processing for large directory renames.
+- could be a performance drag if performed on the leader
+
+### custom event type
+
+Conversely - we could opt to not try to be fully compliant with existing S3 event notification
+semantics since the schema was designed for non-hierarchical filesystems and
+instead create some custom event extension (e.g. ObjectRenamed:) and
+emit just a single event for directory renames which specifies only the parent
+paths impacted by the rename:
+
+e.g.
+```
+  eventName=ObjectRenamed:Renamed, fromKey=myfiles, toKey=myfiles-RENAMED
+```
+
+.. it would then be up to the notification consumer to deal with the
+different rename event semantics (i.e. that only the parent names were
+notified and not the impacted child objects).
+
+This is the same semantics used in the HDFS inotify directory rename
+event (see below).
+
+Pros:
+- no additional processing when emitting events
+
+Cons:
+- non-standard S3 event notification semantics
+
+NOTE: directory rename is just one example of a hierarchical FSO
+operation which impacts child objects.  There may be other Ozone
+hierarchical FSO operations which will need to be catered for in a similar
+way (recursive delete?)

Review Comment:
   I think directory delete and rename are the only two that fit this category. 
We do have atomic recursive directory delete.



##########
hadoop-hdds/docs/content/design/event-notification-schema.md:
##########
@@ -0,0 +1,396 @@
+## SetTimes
+
+event notifications should be raised to inform consumers that
+mtime/atime has changed, as per **SetTimesRequest**

Review Comment:
   We don't support atime ([only 
mtime](https://github.com/apache/ozone/blob/5d1b43d44fa435b5a304bd1d99e8fa4a60d092cf/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/key/OMKeySetTimesRequest.java#L164))
 because atime turns all read operations into write operations which kills 
performance.



##########
hadoop-hdds/docs/content/design/event-notification-schema.md:
##########
@@ -0,0 +1,396 @@
+---
+title: Event notification schema discussion
+summary: Event notifications schema discussion
+date: 2025-06-29
+jira: HDDS-13513
+status: design
+author: Colm Dougan, Donal Magennis
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+## Overview
+
+This document outlines the schema requirements for event notification
+within Ozone and discusses the suitability of 2 widely used event
+notification schemas (S3 and HDFS) as candidates to use as a basis for
+the transmission format for notifications within Ozone.
+
+# General schema requirements
+
+## File/Directory creation/modification
+
+event notifications should be raised to inform consumers of completed
+operations which modify the filesystem and specifically the requests:
+
+#### CreateRequest
+
+we should emit some **create** event
+
+required fields:
+- path (volume + bucket + key)
+- isfile
+
+nice to have fields:
+- overwrite
+- recursive
+
+#### CreateFileRequest
+
+we should emit some **create** event
+
+required fields:
+- path (volume + bucket + key)
+- isfile
+
+nice to have fields:
+- overwrite
+- recursive
+
+#### CreateDirectoryRequest
+
+we should emit some **create** event
+
+required fields:
+- path (volume + bucket + key)
+- isfile
+
+#### CommitKeyRequest
+
+we should emit some **commit/close** event
+
+required fields:
+- path (volume + bucket + key)
+
+nice to have fields:
+- data size
+- hsync?
+
+#### DeleteKeyRequest
+
+we should emit some **delete** event
+
+required fields:
+- path (volume + bucket + key)
+
+nice to have fields:
+- recursive (if known)
+
+#### RenameKeyRequest
+
+we should emit some **rename** event
+
+required fields:
+- fromPath (volume + bucket + key)
+- toPath (volume + bucket + toKeyName)
+
+nice to have fields:
+- recursive (if known)
+- is directory (if known)
+
+NOTE: in the case of an FSO directory rename there is a dilemma
+(discussed later in this document) as to whether we should emit a single
+event for a directory rename (specifying only the old/new directory names)
+or whether we should emit granular events for all the child objects impacted by
+the rename.
+
+## ACLs
+
+event notifications should be raised to inform consumers that ACL events
+have happened. The relevant requests are:
+
+* AddAclRequest
+* SetAclRequest
+* RemoveAclRequest
+
+The fields provided could vary based on the implementation complexity.
+
+Minimally we have a requirement that we be informed that "some ACL update
+happened" to a certain key (or prefix).
+
+Ideally the details would include the full context of the change made as
+per the request (perhaps by mirroring the full request details as a JSON
+sub-object), e.g.:
+
+```json
+   ...
+
+   "acls": [
+    {
+      "type": "GROUP",
+      "name": "mygroup",
+      "rights": "\u0000\u0001",
+      "aclScope": "ACCESS"
+    }
+   ]
+```
+
+The precise details we would need to revisit with guidance from the
+community but this is just to set broad brush expectations.
+
+## SetTimes
+
+event notifications should be raised to inform consumers that
+mtime/atime has changed, as per **SetTimesRequest**
+
+# Transmission format
+
+This section discusses 2 widely used transmission formats for event
+notifications (S3 and HDFS) and their suitability as candidates for
+adoption within Ozone.
+
+It is not assumed that these are the only options available but they are
+good examples to test against our requirements and discuss trade-offs.
+
+## 1. S3 Event Notification schema
+
+The S3 event notification schema:
+
+[https://docs.aws.amazon.com/AmazonS3/latest/userguide/notification-how-to-event-types-and-destinations.html#supported-notification-event-types](https://docs.aws.amazon.com/AmazonS3/latest/userguide/notification-how-to-event-types-and-destinations.html#supported-notification-event-types)
+
+has become a standard for change notifications in S3 compatible storage
+services such as S3 itself, Ceph, MinIO etc.
+
+Notification events are produced as a list of JSON records.
+
+To illustrate we can look at a sample "create" event from the Ceph docs
+(https://docs.ceph.com/en/quincy/radosgw/notifications/#events):
+
+```json
+
+{"Records":[
+    {
+        "eventVersion":"2.1",
+        "eventSource":"ceph:s3",
+        "awsRegion":"us-east-1",
+        "eventTime":"2019-11-22T13:47:35.124724Z",
+        "eventName":"ObjectCreated:Put",
+        "userIdentity":{
+            "principalId":"tester"
+        },
+        "requestParameters":{
+            "sourceIPAddress":""
+        },
+        "responseElements":{
+            "x-amz-request-id":"503a4c37-85eb-47cd-8681-2817e80b4281.5330.903595",
+            "x-amz-id-2":"14d2-zone1-zonegroup1"
+        },
+        "s3":{
+            "s3SchemaVersion":"1.0",
+            "configurationId":"mynotif1",
+            "bucket":{
+                "name":"mybucket1",
+                "ownerIdentity":{
+                    "principalId":"tester"
+                },
+                "arn":"arn:aws:s3:us-east-1::mybucket1",
+                "id":"503a4c37-85eb-47cd-8681-2817e80b4281.5332.38"
+            },
+            "object":{
+                "key":"myimage1.jpg",
+                "size":"1024",
+                "eTag":"37b51d194a7513e45b56f6524f2d51f2",
+                "versionId":"",
+                "sequencer": "F7E6D75DC742D108",
+                "metadata":[],
+                "tags":[]
+            }
+        },
+        "eventId":"",
+        "opaqueData":"[email protected]"
+    }
+]}
+```
+
+As we can see above, there are a number of boilerplate fields to inform us
+of various aspects of the completed operation, but there are a few fundamental
+aspects to highlight:
+
+1. the "key" informs us of the key that the operation was performed on.
+
+2. the "eventName" informs us of the type of operation that was
+   performed.  The 2 most notable eventNames are **ObjectCreated:Put** and
+   **ObjectRemoved:Deleted** which pertain to key creation and deletion
+   respectively.
+
+3. operation specific fields can be included within the "object" sub-object (in
+   the above example we can see that "size" and "eTag" of the created object
+   are included)
+
+## Applicability to Ozone
+
+For non-FSO Ozone buckets / operations there is a clear mapping between
+operations such as CreateKey / CommitKey / DeleteKey / RenameKey and the
+standard S3 event notification semantics.
+
+Examples:
+
+1. CommitKey could be mapped to an ObjectCreated:Put "/path/to/keyToCreate" notification event
+
+2. DeleteKey could be mapped to an ObjectRemoved:Deleted "/path/to/keyToDelete" notification event
+
+3. RenameKey (assuming a file based key) in standard S3 event notification semantics would produce 2 events:
+
+- an ObjectRemoved:Deleted event for the source path of the rename
+- an ObjectCreated:Put event for the destination path of the rename
+
+The challenge in adopting S3 Event notification semantics within Ozone
+would be in at least 2 areas:
+
+### 1. FSO hierarchical operations which impact multiple child keys
+
+Example: directory renames
+
+To illustrate with an example: let's say we have the following simple directory structure:
+
+```
+  /vol1/bucket1/myfiles/f1
+  /vol1/bucket1/myfiles/f2
+  /vol1/bucket1/myfiles/subdir/f1
+```
+
+If a user performs a directory rename such as:
+
+```
+  ozone fs -mv /vol1/bucket1/myfiles /vol1/bucket1/myfiles-RENAMED
+```
+
+Within standard S3 event notification semantics we would expect to see 6 notifications
+emitted in that case:
+
+```
+  eventName=ObjectRemoved:Deleted, key=/vol1/bucket1/myfiles/f1
+  eventName=ObjectRemoved:Deleted, key=/vol1/bucket1/myfiles/f2
+  eventName=ObjectRemoved:Deleted, key=/vol1/bucket1/myfiles/subdir/f1
+  eventName=ObjectCreated:Put, key=/vol1/bucket1/myfiles-RENAMED/f1
+  eventName=ObjectCreated:Put, key=/vol1/bucket1/myfiles-RENAMED/f2
+  eventName=ObjectCreated:Put, key=/vol1/bucket1/myfiles-RENAMED/subdir/f1
+```
+
+However, with an approach of simply producing notifications based on Ratis
+state machine events then all we would have to go on from the
+RenameKeyRequest would be the fromKeyName and the toKeyName of the
+*parent* of the directory being renamed (and not the impacted child
+objects).
+
+Therefore to produce notifications using the standard S3 event
+notification semantics for FSO directory renames we would need to
+consider the trade-offs between compatibility with the normal S3
+semantics for renames vs a custom event type for directory renames.
+
+### most compatible approach
+
+We could introduce some additional processing before emitting notification
+events in the case of a directory rename which "gathers together" (prior
+to the change being committed to the DB) the child objects impacted by
+the directory rename and emits pairs of delete/create events for each
+key (as described above)
+
+Pros:
+- standard S3 event notification rename semantics
+
+Cons:
+- additional processing to pull together the events.  This could mean an
+  unknown amount of additional processing for large directory renames.
+- could be a performance drag if performed on the leader
+
+### custom event type
+
+Conversely, we could opt not to be fully compliant with existing S3 event notification
+semantics, since the schema was designed for non-hierarchical filesystems, and
+instead create some custom event extension (e.g. ObjectRenamed:) and
+emit just a single event for directory renames which specifies only the parent
+paths impacted by the rename:
+
+e.g.
+```
+  eventName=ObjectRenamed:Renamed, fromKey=myfiles, toKey=myfiles-RENAMED
+```
+
+It would then be up to the notification consumer to deal with the
+different rename event semantics (i.e. that only the parent names were
+notified and not the impacted child objects).
+
+This is the same semantics used in the HDFS inotify directory rename
+event (see below).
+
+Pros:
+- no additional processing when emitting events
+
+Cons:
+- non-standard S3 event notification semantics

Review Comment:
   I think this is acceptable for FSO buckets. FSO buckets accessed through the S3
API provide compatibility on a best-effort basis, but there are some things that
just won't work. For example, writing an object called `/bucket/prefix` and then
writing another object called `/bucket/prefix/key` is valid in S3 and OBS but
not FSO (the second object would try to create `prefix` as a directory). IMO it
is ok for S3 event notifications to make similar tradeoffs when used with FSO
buckets.
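
   To make the tradeoff concrete, here is a rough sketch of what a consumer
could do if it needs per-key events from a single directory-rename event. It
is only an illustration: the function `expand_directory_rename` and the
`known_keys` listing are hypothetical (the consumer would have to maintain or
fetch that listing itself), and the event names are the ones used in the
proposal text above, not a confirmed Ozone API.

```python
from typing import Iterable, List, Tuple


def expand_directory_rename(
    from_key: str, to_key: str, known_keys: Iterable[str]
) -> List[Tuple[str, str]]:
    """Expand one directory-rename event into delete/create pairs.

    known_keys is a hypothetical listing of keys the consumer already
    knows about; nothing in the proposal says the event carries it.
    """
    # Match on "dir/" so that "myfiles" does not also match "myfiles-RENAMED".
    prefix = from_key.rstrip("/") + "/"
    target = to_key.rstrip("/") + "/"
    children = sorted(k for k in known_keys if k.startswith(prefix))
    deletes = [("ObjectRemoved:Deleted", k) for k in children]
    creates = [("ObjectCreated:Put", target + k[len(prefix):]) for k in children]
    return deletes + creates


events = expand_directory_rename(
    "/vol1/bucket1/myfiles",
    "/vol1/bucket1/myfiles-RENAMED",
    [
        "/vol1/bucket1/myfiles/f1",
        "/vol1/bucket1/myfiles/f2",
        "/vol1/bucket1/myfiles/subdir/f1",
        "/vol1/bucket1/other/f1",  # outside the renamed directory, ignored
    ],
)
for name, key in events:
    print(f"eventName={name}, key={key}")
```

This moves the cost of the expansion from the OM leader to the consumer, which
is the point of the custom event type, at the price of the consumer needing its
own view of the namespace.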



##########
hadoop-hdds/docs/content/design/event-notifications.md:
##########
@@ -0,0 +1,193 @@
+---
+title: Event notification support in Ozone
+summary: Event notifications for all bucket/event types in ozone
+date: 2025-06-28
+jira: HDDS-13513
+status: design
+author: Donal Magennis, Colm Dougan
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Abstract
+
+Implement an event notification system for Apache Ozone, providing the ability for users to consume events occurring on the Ozone filesystem.
+This is similar to https://issues.apache.org/jira/browse/HDDS-5984 but aims to encapsulate all events and not solely S3 buckets.  
+This document proposes a potential solution and discusses some of the challenges/open questions.
+
+# Introduction
+
+Apache Ozone does not currently provide the ability to consume filesystem events, similar to how HDFS does with Inotify or S3 with bucket notifications.  
+These events are an integral part of integration with external systems to support real-time, scalable, and programmatic monitoring of changes in the data or metadata stored in Ozone.  
+These external systems can use notifications of objects created/deleted to trigger data processing workflows, replication and monitoring alerts.
+
+# Goals
+
+Provide support for all events across the Ozone filesystem for FSO and non-FSO buckets, including renames and changes to ACLs.
+Do not impact the performance of client requests.
+Guarantee at-least-once delivery.
+
+# Non-Goals
+
+Filtering of events or paths/buckets
+Persistent storage of notification messages
+Asynchronous delivery

Review Comment:
   Interesting that [S3 does not guarantee event order](https://docs.aws.amazon.com/AmazonS3/latest/userguide/notification-how-to-event-types-and-destinations.html#event-ordering-and-duplicate-events).
It seems like this would make it difficult to build any sort of
replication-based system around these events. We may be able to support this
depending on how we are working with the Ratis logs.
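
   As a rough illustration of why an ordering field matters under
at-least-once delivery: if each event carried a monotonically increasing index
(assumed here to be something like a Ratis log index; the `txn_index` field is
hypothetical and not part of the proposed schema), a consumer could discard
duplicates and stale events per key. This sketch drops late events rather than
reordering them.

```python
from typing import Dict, Iterable, Iterator, Tuple

# (txn_index, event_name, key); txn_index is a hypothetical ordering field.
Event = Tuple[int, str, str]


def apply_in_order(events: Iterable[Event]) -> Iterator[Event]:
    """Yield only events newer than the last one applied for the same key."""
    last_applied: Dict[str, int] = {}
    for txn_index, name, key in events:
        if txn_index <= last_applied.get(key, -1):
            continue  # duplicate delivery or an event that arrived too late
        last_applied[key] = txn_index
        yield (txn_index, name, key)


delivered = [
    (1, "ObjectCreated:Put", "/vol1/bucket1/f1"),
    (1, "ObjectCreated:Put", "/vol1/bucket1/f1"),      # redelivered duplicate
    (3, "ObjectRemoved:Deleted", "/vol1/bucket1/f1"),
    (2, "ObjectCreated:Put", "/vol1/bucket1/f1"),      # arrived out of order
]
applied = list(apply_in_order(delivered))
for event in applied:
    print(event)
```

Without such a field, the consumer cannot tell whether the late `Put` or the
`Deleted` event reflects the final state of the key.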


