jojochuang commented on code in PR #8496:
URL: https://github.com/apache/ozone/pull/8496#discussion_r2116540843
##########
hadoop-hdds/docs/content/feature/Snapshot.md:
##########
@@ -25,53 +25,270 @@ summary: Ozone Snapshot
 ## Introduction
-Snapshot feature for Apache Ozone object store allows users to take point-in-time consistent image of a given bucket. Snapshot feature enables various use cases, including:
- * Backup and Restore: Create hourly, daily, weekly, monthly snapshots for backup and recovery when needed.
- * Archival and Compliance: Take snapshots for compliance purpose and archive them as required.
- * Replication and Disaster Recovery (DR): Snapshots provide frozen immutable images of the bucket on the source Ozone cluster. Snapshots can be used for replicating these immutable bucket images to remote DR sites.
- * Incremental Replication: DistCp with SnapshotDiff offers an efficient way to incrementally sync up source and destination buckets.
+Snapshot feature for Apache Ozone object store allows users to take a point-in-time consistent image of a given bucket. The snapshot is a read-only, frozen image of the bucket’s state at the time of snapshot creation. Snapshot feature enables various use cases, including:
-## Snapshot APIs
+* **Backup and Restore** – Create hourly, daily, weekly, monthly snapshots for backup and recovery when needed.
+* **Archival and Compliance** – Take snapshots for compliance purposes and archive them as required.
+* **Replication and Disaster Recovery (DR)** – Snapshots provide frozen, immutable images of the bucket on the source Ozone cluster. These can be used for replicating bucket images to remote DR sites.
+* **Incremental Replication** – DistCp with SnapshotDiff offers an efficient way to incrementally sync up source and destination buckets.
-Snapshot feature is available through 'ozone fs' and 'ozone sh' CLI. This feature can also be programmatically accessed from Ozone `ObjectStore` Java client.
-The feature provides following functionalities:
-* Create Snapshot: Create an instantaneous snapshot for a given bucket
+## Architecture
+
+Ozone Snapshot architecture leverages the immutability of data blocks in Ozone. Data blocks, once written, remain immutable for their lifetime and are only reclaimed when the corresponding key metadata is removed from the namespace. All Ozone metadata (volume, bucket, keys, directories) is stored in the Ozone Manager (OM) metadata store (RocksDB). When a user takes a snapshot of a bucket, the system internally creates a point-in-time copy of the bucket’s namespace metadata on the OM. Since Ozone doesn’t allow in-place updates to DataNode blocks, the integrity of data referenced by the snapshot is preserved. The OM’s key deletion service is aware of snapshots: it will not permanently delete any key as long as that key is still referenced by the active bucket or any existing snapshot. A background KeyDeletingService and DirectoryDeleting Service (garbage collectors) identify keys that are no longer referenced by any snapshot or the live bucket, and reclaim those blocks.
+
+Ozone also provides a SnapshotDiff feature. When a user issues a SnapshotDiff between two snapshots, the OM efficiently computes all the differences (added, deleted, modified, or renamed keys) between the two snapshots and returns a paginated list of changes. Snapshot diff results are cached to speed up subsequent requests for the same snapshot pair.
+
+## System Architecture Deep Dive
+
+Internally, Ozone implements snapshots by **versioning the OM metadata for each bucket** snapshot. The OM maintains a snapshot metadata table that records the state of the bucket’s key directory tree at the moment of snapshot creation. No data is physically copied at snapshot creation – the operation simply marks a consistent snapshot of the OM’s RocksDB state (hence snapshots are created instantaneously).
+Under the hood, Ozone relies on RocksDB’s abilities (like checkpoint) to preserve point-in-time views of the metadata. Each snapshot is identified by a unique ID and name, and each key entry in the OM DB carries information about which snapshots (if any) it belongs to. This approach ensures that **common data is not duplicated** across snapshots: if a key has not changed between two snapshots, both snapshots reference the same underlying data blocks.

Review Comment:
   Currently we do not allow compaction on snapshots.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
