vtutrinov commented on code in PR #6989: URL: https://github.com/apache/ozone/pull/6989#discussion_r2097266132
########## hadoop-hdds/docs/content/design/storage-policy.md: ########## @@ -0,0 +1,397 @@ +--- +title: Ozone Storage Policy Support +summary: Support Ozone storage strategy, and support to write key into the specified type of storage medium. +date: 2024-07-25 +jira: HDDS-11233 +status: draft +--- +<!-- + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + http://www.apache.org/licenses/LICENSE-2.0 + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. See accompanying LICENSE file. +--> + +# Terminology + +## Terminology + +- Storage Policy: Defines where key data replicas should be stored in specific storage tiers. +- Storage Type: The types of disks/Container replicas in a Datanode, storage type could include RAM_DISK, SSD, HDD, ARCHIVE, etc. +- Storage Tier: A set of Container replicas in a cluster that satisfy the storage policy. +- Volume: In this document, unless otherwise specified, a volume refers to the volume of a Datanode.. +- prefix: The prefix in this article, unless otherwise specified, refers to the prefix of the storage policy type, not the ACL prefix. The prefix of the storage policy type is used to configure the prefix of the storage policy for the specified prefix. + +## Storage Policy vs Storage Type vs Storage Tier + + + +The relation of Storage Policy, Storage Type and Storage Tier + +- The storage policy is the property of key/bucket/ prefix (Managed by OM); +- The storage tier is the property of Pipeline and Container (Managed by SCM); Review Comment: Will we deal with the storage tier as an entry of the cluster topology? ########## hadoop-hdds/docs/content/design/storage-policy.md: ########## @@ -0,0 +1,397 @@ +--- +title: Ozone Storage Policy Support +summary: Support Ozone storage strategy, and support to write key into the specified type of storage medium. +date: 2024-07-25 +jira: HDDS-11233 +status: draft +--- +<!-- + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + http://www.apache.org/licenses/LICENSE-2.0 + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. See accompanying LICENSE file. +--> + +# Terminology + +## Terminology + +- Storage Policy: Defines where key data replicas should be stored in specific storage tiers. +- Storage Type: The types of disks/Container replicas in a Datanode, storage type could include RAM_DISK, SSD, HDD, ARCHIVE, etc. +- Storage Tier: A set of Container replicas in a cluster that satisfy the storage policy. +- Volume: In this document, unless otherwise specified, a volume refers to the volume of a Datanode.. +- prefix: The prefix in this article, unless otherwise specified, refers to the prefix of the storage policy type, not the ACL prefix. The prefix of the storage policy type is used to configure the prefix of the storage policy for the specified prefix. + +## Storage Policy vs Storage Type vs Storage Tier + + + +The relation of Storage Policy, Storage Type and Storage Tier + +- The storage policy is the property of key/bucket/ prefix (Managed by OM); +- The storage tier is the property of Pipeline and Container (Managed by SCM); +- The storage type is the property of volume and Container replicas (Managed by DN); +- Only the storage policy can be modified by the user directly via ozone command; + +Example: + +For a keyA, its storage policy is Hot, its Container 1 tier is SSD tier, and Container 1 has three replicas, all of which are of the SSD storage type. + +# User Scenarios + +- User A needs a bucket that supports high-performance IO, so create a bucket with the storage policy set to Hot. Data written by User A to bucket will automatically be distributed across the SSD disks in the cluster. +- User B needs higher IO performance for the directory/prefix /project/metadata, so set the storage policy for the prefix /project/metadata to Hot. Subsequently, data written to /project/metadata will be automatically distributed across the SSD disks in the cluster. +- User C has already written key1 to the cluster and requires better IO performance. The storage policy for key1 can be set to Hot, and then a migration can be triggered to move key1 to the SSD disks. +- Use D use command `aws s3 cp myfile.txt s3://my-bucket/myfile.txt --storage-class XXX` upload a file the Ozone SSD tier + +# Current Status + +- Ozone currently has some support for tiered storage such as storage type, and some parts of this article may already be implemented. +- Currently, in Ozone, when a key is created, the key's Block can appear on any volume of a Datanode. When a key is created, SCM first needs to allocate a Block for the key through Pipelines. The Client then writes the Block to the corresponding Datanode based on the Pipeline information. In this process, the smallest element managed by the SCM Pipeline is the Datanode, and when the Datanode creates a Container, the Container may appear on any volume with enough remaining space. Under the current architecture, Ozone does not support writing data to specific disks + +# Goal Requirements Specification + +### **Support for Storage Policy Writing and Management** + +- **Writing keys**: Allow keys to be written to specified storage tiers based on storage policies. +- **Policy Management**: Enable setting, unsetting, and inheriting storage policies for keys, prefixes, and buckets. Inherit policies based on the longest matching prefix or bucket if no specific policy is set. + +### **Support for Data Migration Across Different Storage Policies** + +- **Data Migration**: Support data migration across different storage policies via manual triggers, ensuring data is moved to the appropriate storage tiers. + +### **Adaptation of AWS S3 StorageClass** + +- **S3 StorageClass Mapping**: Map AWS S3 storage classes to Ozone storage policies, supporting related API operations (PutObject, CopyObject, Multipart Upload, GetObject, HeadObject, ListObjects). + +### **Management and Monitoring Tools** + +- **Storage Policy Commands**: Provide tools to view storage policies of containers, datanode usage, and pipeline information. +- **Metrics and Monitoring**: Enable visibility into storage policy compliance, container storage types, and space information across different storage policies. + +### **Future Enhancements** + +- **Intelligent Storage Policies**: Plan to support automatic data migration based on access frequency, similar to S3 Intelligent-Tiering. +- **Bucket StorageClass Lifecycle Rules: Support setting storage policies Lifecycle Rules at the bucket level.** +- **Recon Support**: Enhance Recon to display relevant storage tier information. + +# Detailed Requirements Specification + +## Storage Policy and Storage Types + +### Supported Storage Types + +- Specify the Storage Type for each volume through configuration. If no Storage Type is specified, the default value will be DISK. +- Support Storage Type:SSD / DISK / ARCHIVE / RAM_DISK + +### Supported Storage Policies + +Support storage policy: Hot , Warm, Cold + +### Storage Policies Map To Storage Tiers + +| Storage Policy | Storage Tier for Write | Fallback Tier for Write | +| --- | --- | --- | +| Hot | SSD | DISK | Review Comment: Is this the final list of desired/planned storage policies? Wouldn't we like to implement policies like in HDFS - https://hadoop.apache.org/docs/r3.4.1/hadoop-project-dist/hadoop-hdfs/ArchivalStorage.html ? ########## hadoop-hdds/docs/content/design/storage-policy.md: ########## @@ -0,0 +1,397 @@ +--- +title: Ozone Storage Policy Support +summary: Support Ozone storage strategy, and support to write key into the specified type of storage medium. +date: 2024-07-25 +jira: HDDS-11233 +status: draft +--- +<!-- + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + http://www.apache.org/licenses/LICENSE-2.0 + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. See accompanying LICENSE file. +--> + +# Terminology + +## Terminology + +- Storage Policy: Defines where key data replicas should be stored in specific storage tiers. +- Storage Type: The types of disks/Container replicas in a Datanode, storage type could include RAM_DISK, SSD, HDD, ARCHIVE, etc. +- Storage Tier: A set of Container replicas in a cluster that satisfy the storage policy. +- Volume: In this document, unless otherwise specified, a volume refers to the volume of a Datanode.. +- prefix: The prefix in this article, unless otherwise specified, refers to the prefix of the storage policy type, not the ACL prefix. The prefix of the storage policy type is used to configure the prefix of the storage policy for the specified prefix. + +## Storage Policy vs Storage Type vs Storage Tier + + + +The relation of Storage Policy, Storage Type and Storage Tier + +- The storage policy is the property of key/bucket/ prefix (Managed by OM); +- The storage tier is the property of Pipeline and Container (Managed by SCM); +- The storage type is the property of volume and Container replicas (Managed by DN); +- Only the storage policy can be modified by the user directly via ozone command; + +Example: + +For a keyA, its storage policy is Hot, its Container 1 tier is SSD tier, and Container 1 has three replicas, all of which are of the SSD storage type. + +# User Scenarios + +- User A needs a bucket that supports high-performance IO, so create a bucket with the storage policy set to Hot. Data written by User A to bucket will automatically be distributed across the SSD disks in the cluster. +- User B needs higher IO performance for the directory/prefix /project/metadata, so set the storage policy for the prefix /project/metadata to Hot. Subsequently, data written to /project/metadata will be automatically distributed across the SSD disks in the cluster. +- User C has already written key1 to the cluster and requires better IO performance. The storage policy for key1 can be set to Hot, and then a migration can be triggered to move key1 to the SSD disks. +- Use D use command `aws s3 cp myfile.txt s3://my-bucket/myfile.txt --storage-class XXX` upload a file the Ozone SSD tier + +# Current Status + +- Ozone currently has some support for tiered storage such as storage type, and some parts of this article may already be implemented. +- Currently, in Ozone, when a key is created, the key's Block can appear on any volume of a Datanode. When a key is created, SCM first needs to allocate a Block for the key through Pipelines. The Client then writes the Block to the corresponding Datanode based on the Pipeline information. In this process, the smallest element managed by the SCM Pipeline is the Datanode, and when the Datanode creates a Container, the Container may appear on any volume with enough remaining space. Under the current architecture, Ozone does not support writing data to specific disks + +# Goal Requirements Specification + +### **Support for Storage Policy Writing and Management** + +- **Writing keys**: Allow keys to be written to specified storage tiers based on storage policies. +- **Policy Management**: Enable setting, unsetting, and inheriting storage policies for keys, prefixes, and buckets. Inherit policies based on the longest matching prefix or bucket if no specific policy is set. + +### **Support for Data Migration Across Different Storage Policies** + +- **Data Migration**: Support data migration across different storage policies via manual triggers, ensuring data is moved to the appropriate storage tiers. + +### **Adaptation of AWS S3 StorageClass** + +- **S3 StorageClass Mapping**: Map AWS S3 storage classes to Ozone storage policies, supporting related API operations (PutObject, CopyObject, Multipart Upload, GetObject, HeadObject, ListObjects). + +### **Management and Monitoring Tools** + +- **Storage Policy Commands**: Provide tools to view storage policies of containers, datanode usage, and pipeline information. +- **Metrics and Monitoring**: Enable visibility into storage policy compliance, container storage types, and space information across different storage policies. + +### **Future Enhancements** + +- **Intelligent Storage Policies**: Plan to support automatic data migration based on access frequency, similar to S3 Intelligent-Tiering. +- **Bucket StorageClass Lifecycle Rules: Support setting storage policies Lifecycle Rules at the bucket level.** +- **Recon Support**: Enhance Recon to display relevant storage tier information. + +# Detailed Requirements Specification + +## Storage Policy and Storage Types + +### Supported Storage Types + +- Specify the Storage Type for each volume through configuration. If no Storage Type is specified, the default value will be DISK. +- Support Storage Type:SSD / DISK / ARCHIVE / RAM_DISK + +### Supported Storage Policies + +Support storage policy: Hot , Warm, Cold + +### Storage Policies Map To Storage Tiers + +| Storage Policy | Storage Tier for Write | Fallback Tier for Write | +| --- | --- | --- | +| Hot | SSD | DISK | +| Warm | DISK | none | +| Cold | ARCHIVE | none | +- **Storage Tier For Write**: The priority storage tier where data is written when storage policy is specified. +- **Fallback Tier for Write**: If the specified storage policy cannot be satisfied with the priority storage tier, the SCM will attempt to use this fallback tier to meet the policy requirements. + +### Storage Tier Map To Storage Type + +| Tier | StorageType of Pipeline | One Replication +Container Replicas Storage Type | Three replication +Container Replicas Storage Type | EC +Container Replicas Storage Type | +| --- | --- | --- | --- | --- | +| SSD | SSD | SSD | 3 SSD | n SSD | +| DISK | DISK | DISK | 3 DISK | n DISK | +| ARCHIVE | ARCHIVE | ARCHIVE | 3 ARCHIVE | n ARCHIVE | + +### Fallback Storage Type For Container replicas Replication/Migration + +| Container Replicas Type | Container Replicas Fallback Storage Type [1] | +| --- | --- | +| SSD | DISK | +| DISK | none | +| ARCHIVE | none | +- Container Replicas Fallback Storage Type: During the Container replicas replication or migration process, if the SCM cannot find a suitable volume type that matches the original Container replica's storage type, it will attempt to use this fallback storage tier. + +[1] For a Container replicas, it will not know the Storage Policy of the Container’s key or the tier of the SCM Container located, the Container replicas just know its own expected storage type, So column name is “Fallback Storage Type” + +## Support for Ozone Storage Policy Writing and Management + +### Support storage policy writing + +- Support specifying a storage policy when writing a key. + - If a storage policy is specified when writing a key, the key storage policy is the specified storage policy. + - If no storage policy is specified, the default behavior refers to the "Inheritance of storage policy" section. + - If a key neither inheriting any storage policy nor specified a storage policy when writing a key, then the key storage policy will be default storage policy (can refers to the "default storage policy" section) + - If the priority storage policy is not satisfied, support writing to the fallback tier if the fallbackStrategy is “allow” + +### Support fallback strategy configuration + +- fallbackStrategy + - Allow (default): In this case, the behavior is similar to HDFS, with automatic fallback, and it does not trigger errors or additional alerts; + - Prohibit: Prohibit fallback; if a tier that satisfies the storage policy cannot be found, the write operation fails directly. + +### Inheritance of storage policies + +- If no storage policy is specified (undefined storage policy) when writing a key, the key's storage policy inherits the longest matching prefix. If there is no matching prefix, it inherits the storage policy of the bucket. If the bucket has no effective storage policy [1], the key's storage policy will be the default storage policy . +- If a key is created with an effective storage policy, the storage policy of the key will not change with the storage policy changing of the bucket or prefix. + +[1] Effective storage policy means a non-empty storage policy. + +### undefined storage policy + +- If the user does not specify any storage policy when creating a key, the user's storage policy is undefined storage policy. +- Even if the user's key inherits the storage policy of the bucket/prefix, the user's storage policy is still undefined storage policy. +- Undefined storage policy does not mean the key no storage policy. if the key inherits a storage policy, the key actual storage policy is the inherited storage policy. +- The undefined storage policy will change as the changing of the prefix/bucket storage policy, including when the key is renamed to a prefix with a different storage policy. + +### default storage policy + +- If a key neither inheriting any storage policy nor specified a storage policy when writing a key, then the key storage policy will be default storage policy +- The default storage policy is the storage policy for existing keys before the storage policy feature is launched. That is, all keys have at least the default storage policy, even if the key was created before the storage policy feature was launched. +- If the user has not configured a default storage policy, the default storage policy should be Warm. +- The default storage policy can be configured. + +## Storage policy management + +### key storage policy + +- Support setting and unsetting the storage policy for keys. + - After unsetting the storage policy, the actual storage policy of the key refers to the "Inheritance of storage policy" section, or is the default storage policy if key do not inherit any storage policy. +- Support displaying the storage policy in the key list/info results(Include whether the storage policy is the default storage policy). + +### Bucket storage policy management + +- Support setting and unsetting storage policies for buckets. + - After unsetting the storage policy, the storage policy is the default storage policy. +- Support specifying a storage policy when creating a bucket. + - If no storage policy is specified, the storage policy is the default storage policy. +- Support displaying the storage policy in the bucket list/info results (Include whether the storage policy is the default storage policy). + +### Prefix management + +- Support creating, deleting, setting, getting, and listing prefixes. +- The display of the prefix storage policy should display whether the storage policy is the default storage policy +- A prefix can only have one type of policy. +- Prefixes do not support unsetting storage policies; deleting a prefix is equivalent to unsetting the storage policy. + +### Support for persistent storage of storage policy changes: + +- Use ozone admin storagepolicies satisfyStoragePolicy to trigger the migration of corresponding changes, and mark the corresponding storage policy changes as completed. + +### FSO type buckets: + +- Use prefixes to implement directory-level storage policy management, not directly support setting storage policies for directories. +- Do not support setting storage policies for directory-type keys. + +## Adaptation of AWS S3 + +### Adaptation of AWS S3 StorageClass + +Not all the StorageClass will be support by the Ozone + +A possible solution + +| AWS S3 StorageClass | Ozone StoragePolicy | +| --- | --- | +| STANDARD | Hot | +| STANDARD_IA | Warm | +| GLACIER | COLD | + +> According to AWS S3 documentation, STANDARD is the highest performance S3 StorageClass, but its name is STANDARD, which is not easy to convert it to OZONE SSD + +> AWS StorageClass Valid Values: STANDARD | REDUCED_REDUNDANCY | STANDARD_IA | ONEZONE_IA | INTELLIGENT_TIERING | GLACIER | DEEP_ARCHIVE | OUTPOSTS | GLACIER_IR | SNOW | EXPRESS_ONEZONE + +### Adaptation of AWS S3 Related API + +refer to +[Using Amazon S3 storage classes - Amazon Simple Storage Service](https://docs.aws.amazon.com/AmazonS3/latest/userguide/storage-class-intro.html) + +- PutObject: + - Support specifying the StorageClass parameter in the PutObject request to determine the storage policy for the object. +- CopyObject: + - Support specifying a new storage policy (StorageClass) in the CopyObject request and applying the new storage policy when migrating the object from the source location to the target location. If no new storage policy is specified, inherit the storage policy of the source object. +- Multipart Upload + - Support specifying the StorageClass parameter in the CreateMultipartUpload request and following the StorageClass parameter of CreateMultipartUpload in UploadPart. +- GetObject operation: + - Return the current storage policy of the object in the GetObject response. +- HeadObject: + - Support the HeadObject request to return the metadata of the object, including its storage policy (StorageClass). +- ListObjects: + - Include the storage policy (StorageClass) information of each object in the ListObjects response. +- Bucket StorageClass Lifecycle Rules: + - Support setting storage policies Lifecycle Rules at the bucket level and automatically managing the storage policy conversion of objects through policies and lifecycle rules. For example, automatically transfer objects from SSD to HDD or from HDD to NVMe SSD based on the object's age or access frequency. +- ~~RestoreObject operation: [Not Supported]~~ + +## Support for Storage Policy Management Commands/Metrics + +Lists commands that need to be added/adapted to storage policies, but may not be all commands + +- Storage policy management + - `ozone admin storagepolicies list`, lists all supported policies in the current cluster. + - `ozone admin storagepolicies get ${key}`, get the key storage policy (include the inherited storage policy) + - `ozone admin storagepolicies check ${key}`, check whether the key storage policy is satisfied + - `ozone admin storagepolicies satisfyStoragePolicy -bucket / -prefix / -key`, triggers migrations to satisfy the corresponding data's storage policy. + - `ozone admin storagepolicies satisfyStoragePolicy status -bucket / -prefix / -key`, checks the migration status of the storage policy. + - `ozone admin storagepolicies checkSatisfyStoragePolicy -bucket / -prefix / -key`, checks whether the specified resource's storage policy is satisfied. + - Containers may be migrated to a fallback tier. At this time, the storage policy of the corresponding resource is not satisfied. +- `ozone admin container info`, supports displaying the storage policy of Containers. +- `ozone admin datanode usageinfo`, supports displaying space information according to different storage policies. +- `ozone admin pipeline list`, supports displaying the storage tier supported by Pipelines. +- `ozone admin pipeline create`, supports creating Pipelines that support specified storage tier. +- Datanode volume related Metrics support displaying the storage policies of volumes. +- SCM supports displaying the data space information of different storage policies in the cluster. + +## Storage Policy Satisfier Service + +- Support data migration across different storage policies. + - Storage policy migrations are triggered by ozone admin storagepolicies satisfyStoragePolicy. Modifying the storage policy of a bucket/prefix/key does not directly trigger storage policy migrations. +- Support manually setting storage policies for buckets/prefixes/keys and implementing asynchronous migrations. +- Responsible for managing storage policy-related migration work. +- Responsible for responding to user requests and checking whether the specified resources satisfy the storage policy. + +## Permissions Management + +- Changing the storage policy of buckets/prefixes/keys requires administrator permissions. + +## Storage Policy Supported Replicas Types + +- Support replica type storage policies. +- Support EC type storage policies. + - EC types cannot support some storage policies that mix different storage media. + +## Storage Policy Supported Bucket Types + +- Support all types of buckets. + +## Compatible HDFS Commands + +HDFS storage policy-related commands are not supported. + +Do not support specified storage policy when creating a file using HDFS Filesystem interface, the Hadoop FileSystem (org.apache.hadoop.fs.FileSystem) does not support passing the storage policy by Create method. + +- Ozone needs to record the access frequency of keys/buckets. + +## Recon Side Feature + +Support displaying relevant storage tier information in Recon. Need to design more detailed. // TODO + +# Architecture Design + +This section aims to describe the architecture and design concepts of the system without delving into specific implementation details. Its purpose is to validate the feasibility and rationality of the technical route, as well as to assess the workload and technical dependencies of related tasks. + +## Writing Key Process + +Writing data to the specified tier: + +- When a user creates a key, they set the storage policy and fallbackStrategy parameters and send the request to the OM: + - CreateKey + - CreateFile + - CreateMultipartKey +- Community Ozone: + - CreateStreamKey + - CreateStreamFile + - CreateMultipartStreamKey +- OM adds the storage policy information to the allocateBlock request sent to the SCM. The actual storage strategy should be calculated based on the storage strategy requirements above. +- SCM, upon receiving the request, selects the Pipeline based on the storage policy and allocates a Container. + - The storage type of each Container replicas should be added to for the Datanodes in the Pipeline, just like the replicaIndex for the EC key. + - If the expected storage tier Pipeline cannot be found, attempt to find the fallback tier defined by the storage policy if the fallbackStrategy is allowed. If it cannot be found, the creation process fails. + - If the SCM needs to create a new Container for a specific storage tier, SCM needs to record the Container’s storage tier for the Container. So the ContainerData should add a storage tier field. +- After receiving the Pipeline with the specified storage type information, the client writes the Chunk replica with the storage type information. +- Datanode writes data to the specified Container replica or creates a Container replica in the specified volume based on the storage type . +- OMKeyInfo needs to persist in the storage policy field, if the storage policy does not be specified, the storage policy should be null. + +## DN Container Replica Creation + +- DN should select the appropriate volume based on the storage type. +- If the Container includes storage type information, DN should store this information in the Container's replicas YAML configuration. +- When DN starts, it should read the storage type information from the Container's YAML configuration and load it into memory. +- If the storage type of Container is not specified, it defaults to Disk. + +## Storage Policy Management + +- Implement key, prefix, and bucket storage policy-related commands as needed, and add related fields to persist tier information. + +## SCM Pipeline / ContainerReplica Management and Block Allocation + +- Pipeline needs to add tier-related fields to distinguish between different tiers of Pipelines. +- For Pipelines tier: Review Comment: The current implementation of the background pipeline creator creates a limited list of RATIS/THREE-pipelines per datanode. The design doc proposes to deal with specific pipelines for different storage policies. Is there any proposal doc on how we gonna deal with the limitation mentioned earlier? (How should we deal with the distribution of the pipelines of different types: 30% per storage tire or some other option? What if the pipelines of one of the storrage tire will not be used?) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
