errose28 commented on code in PR #8388:
URL: https://github.com/apache/ozone/pull/8388#discussion_r2098245692
########## hadoop-hdds/docs/content/design/dn-min-space-configuration.md: ##########
@@ -0,0 +1,108 @@
---
title: Minimum free space configuration for datanode volumes
summary: Proposal for the minimum free space a datanode volume must keep available to function correctly.
date: 2025-05-05
jira: HDDS-12928
status: implemented
author: Sumit Agrawal
---
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at
  http://www.apache.org/licenses/LICENSE-2.0
  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

# Abstract
A volume on a datanode stores container data and metadata (a RocksDB instance is co-located on the volume).
Many operations run in parallel, such as importing and exporting containers, writing and deleting data blocks,
repairing containers, and creating and deleting containers. Space is also needed for the volume DB to perform compaction at regular intervals.
It is hard to capture exact usage and the free space actually available, so a minimum free space must be configured
so that datanode operations can proceed without corruption, the environment does not get stuck, and reads of existing data remain possible.

This free space is used during volume allocation: a volume is chosen only if `required space < (volume available space - min free space - reserved space)`.
Container creation and container import must ensure this constraint is met, and block writes must ensure this space is available before new blocks are written.
Note: any issues related to enforcing free space are tracked in separate JIRAs.
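A minimal sketch of the admission check described above. Class, record, and method names (`VolumeSpaceCheck`, `VolumeUsage`, `hasEnoughSpace`) are illustrative assumptions, not the actual Ozone implementation:

```java
/**
 * Illustrative sketch of the volume admission check described in the abstract.
 * All names and fields here are hypothetical; the real Ozone code differs.
 */
public final class VolumeSpaceCheck {

  /** Space (in bytes) reported for a single datanode volume. */
  public record VolumeUsage(long available, long minFreeSpace, long reserved) { }

  /**
   * A volume can accept new data only if the requested size still fits after
   * keeping both the configured minimum free space and the reserved space.
   */
  public static boolean hasEnoughSpace(VolumeUsage usage, long requiredSpace) {
    long usable = usage.available() - usage.minFreeSpace() - usage.reserved();
    return requiredSpace < usable;
  }

  public static void main(String[] args) {
    // Example: 100 GB available, 20 GB min free space, 10 GB reserved.
    VolumeUsage usage = new VolumeUsage(100L << 30, 20L << 30, 10L << 30);
    System.out.println(hasEnoughSpace(usage, 5L << 30));   // true: 5 GB fits in the 70 GB usable
    System.out.println(hasEnoughSpace(usage, 80L << 30));  // false: 80 GB exceeds the 70 GB usable
  }
}
```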
# Existing configuration (before HDDS-12928)
Two configurations are provided:
- hdds.datanode.volume.min.free.space (default: 5GB)
- hdds.datanode.volume.min.free.space.percent

1. If neither is configured, the default value of 5GB is used.
2. If both are configured, hdds.datanode.volume.min.free.space takes priority.
3. Otherwise, whichever configuration is set is used.

# Problem Statement

- With the 5GB default, full-disk scenarios are not reliably avoided because of errors in ensuring free space availability.
  An imported container is itself 5GB, which is right at the boundary, and other operations run in parallel.
- The volume DB can grow as disk size grows, since a larger disk holds more containers and blocks and therefore more metadata.
- The volume DB size also varies with the mix of small and large files, since more small files mean more metadata.

The solution involves:
- an appropriate default minimum free space
- accounting for variation in disk size

# Approach 1: Combination of minimum free space and a percentage of disk size

Configuration:
1. Minimum free space: hdds.datanode.volume.min.free.space, default value `20GB`
2. Disk size variation: hdds.datanode.volume.min.free.space.percent, default 0.1% (ratio 0.001)

Minimum free space = Max(`<min free space>`, `<percent of disk space>`)

| Disk space | Min free space (percent: 1%) | Min free space (percent: 0.1%) |
|------------|------------------------------|--------------------------------|
| 100 GB     | 20 GB                        | 20 GB (min space default)      |
| 1 TB       | 20 GB                        | 20 GB (min space default)      |
| 10 TB      | 100 GB                       | 20 GB (min space default)      |
| 100 TB     | 1 TB                         | 100 GB                         |

Considering the table above:
- 0.1% should be sufficient to cover almost all cases, as no datanode volume DB has been observed to exceed 1-2 GB.

# Approach 2: Only a minimum free space configuration

Given the above, a 20 GB default should be sufficient for most disks, since disks in practice are usually 10-15 TB.
Larger disks are rarely used; instead, multiple volumes on multiple disks are attached to the same datanode.

In this scenario, `hdds.datanode.volume.min.free.space` by itself is enough, and the percent-based configuration can be removed.

### Compatibility
If `hdds.datanode.volume.min.free.space.percent` is already configured, removing it should have no impact,
since the default minimum free space is increased to 20GB, which covers most use cases.

# Approach 3: Combination of maximum free space and a percentage of disk size

Configuration:
1. Maximum free space: hdds.datanode.volume.min.free.space, default value `20GB`
2. Disk size variation: hdds.datanode.volume.min.free.space.percent, default 10% (ratio 0.1)

Minimum free space = **Min**(`<max free space>`, `<percent of disk space>`)
> The difference from Approach 1 is the Min function over the two configurations above.

| Disk space | Min free space (20GB, 10% of disk) |
|------------|------------------------------------|
| 10 GB      | 1 GB (= Min(20GB, 1GB))            |
| 100 GB     | 10 GB (= Min(20GB, 10GB))          |
| 1 TB       | 20 GB (= Min(20GB, 100GB))         |
| 10 TB      | 20 GB (= Min(20GB, 1TB))           |
| 100 TB     | 20 GB (= Min(20GB, 10TB))          |

This approach is more useful for test environments, where disk space is small and no additional configuration should be needed.

# Conclusion
1. Going with Approach 1 (a sketch of the resulting computation is shown below).
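A minimal sketch of the Approach 1 computation referenced in the conclusion. The defaults are taken from the document; the class and method names are illustrative, not the actual Ozone code:

```java
/**
 * Illustrative sketch of Approach 1: the effective minimum free space is the
 * larger of the configured absolute value and the configured percentage of
 * the volume's capacity. Names are hypothetical, not the Ozone implementation.
 */
public final class MinFreeSpacePolicy {

  // Defaults proposed in the document.
  static final long DEFAULT_MIN_FREE_SPACE_BYTES = 20L << 30; // 20 GB
  static final double DEFAULT_MIN_FREE_SPACE_RATIO = 0.001;   // 0.1 %

  /** Effective minimum free space = Max(absolute value, percent of capacity). */
  static long effectiveMinFreeSpace(long volumeCapacityBytes,
      long configuredMinFreeSpaceBytes, double configuredRatio) {
    long percentBased = (long) (volumeCapacityBytes * configuredRatio);
    return Math.max(configuredMinFreeSpaceBytes, percentBased);
  }

  public static void main(String[] args) {
    long tenTb = 10L << 40;
    long hundredTb = 100L << 40;
    // 10 TB disk at 0.1 % is about 10 GB, so the 20 GB floor wins.
    System.out.println(effectiveMinFreeSpace(tenTb,
        DEFAULT_MIN_FREE_SPACE_BYTES, DEFAULT_MIN_FREE_SPACE_RATIO) >> 30); // 20
    // 100 TB disk at 0.1 % is about 100 GiB, which exceeds the floor.
    System.out.println(effectiveMinFreeSpace(hundredTb,
        DEFAULT_MIN_FREE_SPACE_BYTES, DEFAULT_MIN_FREE_SPACE_RATIO) >> 30); // 102
  }
}
```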
Review Comment:
@sumitagrawl please re-read the **Proposal to address all requirements** section in my reply. I think it states the proposal very clearly, but the things you are referring to in your reply are not mentioned there.

> You mean we need have another config for min.free.space?

No, two configs, one for min free space and one for DU reserved, that each use the same value schema. I said very clearly in the previous response: "Only two config keys: hdds.datanode.min.free.space and hdds.datanode.du.reserved".

> I do not feel being in name of similar config for space, we should go with this approach, These are if different purpose.

This is your take as a developer. You need to look at this from a user's perspective. Our consistent failure to consider this perspective is why the system is difficult to use. Configs representing the same "type" of configuration, be it an address, percentage, disk space, time duration, etc., must accept the same types of values. Users are not going to understand the nuance of why two similar configs accept different value formats, and in a few months I probably won't either.

> Making similar just in name of both represent free space will make configuration complex for min.free.space as user need config for all disk.

This is not part of the proposal. Please re-read it. Min space can be configured with one value across all disks, OR it can use a volume mapping.

> There is no usecase till not for min.free.space for this.

Lack of a use case is not a valid reason to create a separate value schema for configs that work on the same type. There is also no use case for setting `hdds.heartbeat.interval` to `7d`, but the same value makes perfect sense for `hdds.container.scrub.data.scan.interval`. Yet they use the same value schema because they both represent time intervals. Your suggestion is analogous to rejecting the `d` suffix for `hdds.heartbeat.interval` because it would never be set that long.

@adoroszlai

> we need to consider that even old version may encounter values understood only by new one, and fail.

We definitely need to formalize our configuration compatibility guarantees. This probably warrants a dedicated discussion somewhere more visible. My initial take is that we should always support "new software, old config", but that supporting "old software, new config" is not sustainable because it closes our config to extensions. Especially on the server side this would seem like a deployment error. Maybe our client-side config compatibility guarantees would be different from the server's.
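To illustrate the "same value schema for both keys" argument above, here is a hedged sketch of a single value type that accepts either an absolute size or a percentage for both `hdds.datanode.min.free.space` and `hdds.datanode.du.reserved`. The class name and parsing rules are assumptions for illustration only, not Ozone's actual configuration framework:

```java
import java.util.Locale;

/**
 * Hypothetical shared value schema: both config keys would accept either an
 * absolute size ("20GB", "512MB") or a percentage ("1%"). This is not Ozone's
 * configuration framework; names and parsing rules are illustrative only.
 */
public final class SpaceValue {

  private final long absoluteBytes;   // valid when isPercent == false
  private final double percent;       // valid when isPercent == true
  private final boolean isPercent;

  private SpaceValue(long absoluteBytes, double percent, boolean isPercent) {
    this.absoluteBytes = absoluteBytes;
    this.percent = percent;
    this.isPercent = isPercent;
  }

  /** Parses values like "20GB", "512MB", or "1%". */
  public static SpaceValue parse(String raw) {
    String value = raw.trim().toUpperCase(Locale.ROOT);
    if (value.endsWith("%")) {
      double pct = Double.parseDouble(value.substring(0, value.length() - 1));
      return new SpaceValue(0, pct / 100.0, true);
    }
    if (value.endsWith("GB")) {
      return new SpaceValue(Long.parseLong(value.replace("GB", "").trim()) << 30, 0, false);
    }
    if (value.endsWith("MB")) {
      return new SpaceValue(Long.parseLong(value.replace("MB", "").trim()) << 20, 0, false);
    }
    // Bare number: interpret as bytes.
    return new SpaceValue(Long.parseLong(value), 0, false);
  }

  /** Resolves the configured value against a given volume capacity. */
  public long resolveBytes(long volumeCapacityBytes) {
    return isPercent ? (long) (volumeCapacityBytes * percent) : absoluteBytes;
  }

  public static void main(String[] args) {
    long capacity = 10L << 40; // a 10 TB volume
    // Both config keys could accept the same formats:
    System.out.println(parse("20GB").resolveBytes(capacity) >> 30); // 20
    System.out.println(parse("1%").resolveBytes(capacity) >> 30);   // 102 (~1% of 10 TB)
  }
}
```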