Hi Sammi and others, Thanks for working on the Disk Balancer feature! It is going to be very useful.
Some comments: - Why does SCM need a DiskBalancerManager? Should the DiskBalancerService just be a local service in Datanode? - In DiskBalancerTask, it holds the container read lock. Would it be possible that a file gets deleted when another thread is reading it and results in an exception? - In DefaultContainerChoosingPolicy, it has a CACHE for caching container iterators. Would it cause ConcurrentModificationException if the underlying containerMap is modified? - In DefaultVolumeChoosingPolicy.chooseVolume(..), getCurrentUsage() is called multiple times so it can return different values. - Is DiskBalancerService.DISK_BALANCER_TMP_DIR the same as StorageVolume.TMP_DIR_NAME? If yes, use StorageVolume.TMP_DIR_NAME instead of adding DiskBalancerService.DISK_BALANCER_TMP_DIR. - Do not throw RuntimeException -- how to handle it when it happens? - Since this is a new feature, how about not supporting SCHEMA_V1/V2 ? If we want to support them, we should add some tests. - DatanodeDiskBalancerInfoProto should use DatanodeIDProto instead of DatanodeDetailsProto - Remove DatanodeDiskBalancerInfoType since it is never used. - Remove the new Container.importContainerData(Path containerPath) method since it is never used. - Rename Container.copyContainerData to copyContainerDirectory. Otherwise, it sounds like copy/import the ContainerData class. - In DiskBalancerService, use ContainerID instead of Long for container id. - Do not use Optional for parameters; see https://www.reddit.com/r/java/comments/sat1j4/opinions_on_using_optional_as_parameter/ - Use Objects.requireNonNull(..) instead of Preconditions.checkNotNull(..). Regards, Tsz-Wo On Fri, Aug 8, 2025 at 9:57 AM Ethan Rose <er...@apache.org> wrote: > Hi Sammi, thanks for working on this feature. > > There should not be any incompatible changes introduced with this feature. > > This section needs more information because it only talks about a feature > flag, which is orthogonal to any upgrade/downgrade concerns because feature > flags do not prevent downgrades like finalization does. > > - What happens if the feature flag is enabled, and later disabled? > - If an in-progress container move is interrupted by a downgrade, what > does the old version do? > - Are components persisting any state about the ongoing operations, and > if so, what happens to that during a downgrade then upgrade flow? > > The disk balancer has a feature flag that currently defaults to false. In > this > comment <https://github.com/apache/ozone/pull/8869#issuecomment-3130009094 > > > I explained why feature flags for features that are only triggered by > admins are redundant. I’ve heard other devs from HDFS also express concern > over everything in that project being behind feature flags and trying to > avoid sending Ozone down that path as well. IMO if we want to indicate that > this is not ready for prod use, it would be better to hide the CLI command > <https://picocli.info/quick-guide.html#_hidden_options_and_parameters> and > add a disclaimer to the help message indicating this. The current config > flag approach provides no user facing information as to why it is disabled > by default or what the implications of enabling it are. Additionally, why > is this set tofalse by default but considered ready to merge with master? > > Ethan > > On Fri, Aug 8, 2025 at 4:53 AM Sammi Chen <sammic...@apache.org> wrote: > > > Hi Ozone developers, > > > > I would like to propose merging the HDDS-5713 Disk Balancer feature > branch > > into master. > > > > HDDS-5713 adds the support of the disk volume utilization balancer > function > > by selecting and moving containers from high utilized data volume to > lower > > utilized data volume, to achieve an all disk volumes' utilization even > > state. > > > > > > Feature Jira Link: > > https://issues.apache.org/jira/browse/HDDS-5713 > > > > > > Checklist for feature branch merge: > > > > > https://cwiki.apache.org/confluence/display/OZONE/Disk+Balancer+For+Datanode+-+HDDS-5713 > > < > > > https://cwiki.apache.org/confluence/display/OZONE/Disk+Balancer+For+Datanode+-+HDDS-5713 > > > > > > > > > This vote will be open for at least a week. > > > > Thanks, > > Sammi Chen > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: dev-unsubscr...@ozone.apache.org > > For additional commands, e-mail: dev-h...@ozone.apache.org > > >