Hi Sammi and others, Thanks for working hard on addressing the review comments!
The branch [1] is 313 commits behind master. Let's update it (and resolve the conflicts, if there are any)? BTW, let's start a new email thread and use [VOTE 2] in the subject for the new VOTE. It is earlier to follow. Tsz-Wo [1] https://github.com/apache/ozone/tree/HDDS-5713 On Sun, Dec 7, 2025 at 11:45 PM Sammi Chen <[email protected]> wrote: > Thanks Sumit for the review and comments. > > Sorry for a late update. With the aggregated feedback and a discussion with > Symious, we decided to remove SCM as the proxy of disk balancer command to > datanode, so CLI will send directly to datanode. Here are the JIRAs of this > major change, HDDS-13598 <https://issues.apache.org/jira/browse/HDDS-13598 > > > , HDDS-13878 <https://issues.apache.org/jira/browse/HDDS-13878>. And > introduced a feature flag HDDS-13497 > <https://issues.apache.org/jira/browse/HDDS-13497>. > > With the above changes, all major comments are addressed. Please help to > take another review and cast your vote for disk balancer branch merge. > > > Thanks, > Sammi > > On Fri, 5 Sept 2025 at 20:08, Sumit Agrawal > <[email protected]> wrote: > > > Hi All, > > > > Few feedback from my side over design: > > 1. SCM act as proxy for CLI command like start / stop / config changes > but > > it has issue due to SCM async nature (API result consistency), > > a. Command results for successful apply or failure of change can not be > > shown in *expected* *time* (for a happy path will take at least 2 HB > time, > > i.e. 1 min) > > b. Command rejection failure is high, based on SCM state (especially > for > > all DN cases) like DN state un-healthy or not available or restarted, SCM > > restart, ... > > IMO, we can remove this CLI command execution via SCM but can be direct > > from Tool CLI itself as its redundant job which CLI also can do > > efficiently. > > > > 2. The DN side having a file as persistence of configuration is complex > and > > error prone. DB can be used instead to keep configuration as there for > open > > containers. > > a. file format versioning code for upgrade -- having additional code > > b. file consistency, recovery on failure and atomic write -- additional > > code is there > > c. file path configuration > > d. amount of code is high 2-3kloc for this which can be handled in a > few > > 100s. > > So I think we should replace the file with DB for keeping configuration > > in an atomic way which DB has inbuilt features and *proto* provides > > implicit *upgrade capability.* > > > > IMO, we need to make the above 2 changes with respect to code > > implementation. Others can share opinions. > > > > Regards > > Sumit > > > > On Fri, Aug 22, 2025 at 9:33 PM Tsz-Wo Nicholas Sze <[email protected]> > > wrote: > > > > > > The SlidingWindow class is not related to disk balancer, ... > > > > > > Ethan, thanks for pointing it out! Then, we should not remove it here. > > > > > > Tsz-Wo > > > > > > > > > On Thu, Aug 21, 2025 at 1:18 PM Ethan Rose <[email protected]> wrote: > > > > > > > Thanks for working to update this for merge. > > > > > > > > The SlidingWindow class is not related to disk balancer, it has been > > > added > > > > to master to be used by the volume scanner. It was added in > HDDS-12852 > > > > <https://issues.apache.org/jira/browse/HDDS-12852> and is being used > > in > > > > HDDS-13108 <https://issues.apache.org/jira/browse/HDDS-13108> / > > > > https://github.com/apache/ozone/pull/8843. Please don't remove it. > > > > > > > > Please update the wiki with the justification for why there are no > > > > incompatible changes as described in this thread. Currently it still > > > > says "There > > > > should not be any incompatible changes introduced with this feature." > > but > > > > provides no justification. > > > > > > > > Regarding the feature flag, the status communicated in this thread > must > > > > also be clear to end users. Until this is considered stable and > enabled > > > by > > > > default, please: > > > > - Update the user doc added in > > https://github.com/apache/ozone/pull/8837 > > > > to > > > > indicate that the feature is considered experimental/beta. > > Additionally, > > > I > > > > don't see a mention of the feature flag in this doc. > > > > - Update the config description added in > > > > https://github.com/apache/ozone/pull/8869 to indicate the same. > > > > - Update the CLI help message to indicate the same. > > > > - Hide the CLI so it does not appear in help messages. > > > > > > > > Additionally, please make all the CLI flags kebab-case. They are > using > > a > > > > mix of kebab and camel case right now. Ideally the command itself > would > > > be > > > > `disk-balancer` as well, but since we already have > `containerbalancer` > > we > > > > might want to leave that part as is for consistency. > > > > > > > > Thanks, > > > > Ethan > > > > > > > > > > > > > -- > > *Sumit Agrawal* | Senior Staff Engineer > > cloudera.com <https://www.cloudera.com> > > [image: Cloudera] <https://www.cloudera.com/> > > [image: Cloudera on Twitter] <https://twitter.com/cloudera> [image: > > Cloudera on Facebook] <https://www.facebook.com/cloudera> [image: > Cloudera > > on LinkedIn] <https://www.linkedin.com/company/cloudera> > > ------------------------------ > > >
