Hi Sammi and others,

Thanks for working hard on addressing the review comments!

The branch [1] is 313 commits behind master.  Let's update it (and resolve
the conflicts, if there are any)?

BTW, let's start a new email thread and use [VOTE 2] in the subject for the
new VOTE.  It is earlier to follow.

Tsz-Wo
[1] https://github.com/apache/ozone/tree/HDDS-5713

On Sun, Dec 7, 2025 at 11:45 PM Sammi Chen <[email protected]> wrote:

> Thanks Sumit for the review and comments.
>
> Sorry for a late update. With the aggregated feedback and a discussion with
> Symious, we decided to remove SCM as the proxy of disk balancer command to
> datanode, so CLI will send directly to datanode. Here are the JIRAs of this
> major change, HDDS-13598 <https://issues.apache.org/jira/browse/HDDS-13598
> >
>  , HDDS-13878 <https://issues.apache.org/jira/browse/HDDS-13878>.  And
> introduced a feature flag HDDS-13497
> <https://issues.apache.org/jira/browse/HDDS-13497>.
>
> With the above changes, all major comments are addressed.  Please help to
> take another review and cast your vote for disk balancer branch merge.
>
>
> Thanks,
> Sammi
>
> On Fri, 5 Sept 2025 at 20:08, Sumit Agrawal
> <[email protected]> wrote:
>
> > Hi All,
> >
> > Few feedback from my side over design:
> > 1. SCM act as proxy for CLI command like start / stop / config changes
> but
> > it has issue due to SCM async nature (API result consistency),
> >   a. Command results for successful apply or failure of change can not be
> > shown in *expected* *time* (for a happy path will take at least 2 HB
> time,
> > i.e. 1 min)
> >   b. Command rejection failure is high, based on SCM state (especially
> for
> > all DN cases) like DN state un-healthy or not available or restarted, SCM
> > restart, ...
> > IMO, we can remove this CLI command execution via SCM but can be direct
> > from Tool CLI itself as its redundant job which CLI also can do
> > efficiently.
> >
> > 2. The DN side having a file as persistence of configuration is complex
> and
> > error prone. DB can be used instead to keep configuration as there for
> open
> > containers.
> >   a. file format versioning code for upgrade -- having additional code
> >   b. file consistency, recovery on failure and atomic write -- additional
> > code is there
> >   c. file path configuration
> >   d. amount of code is high 2-3kloc for this which can be handled in a
> few
> > 100s.
> >   So I think we should replace the file with DB for keeping configuration
> > in an atomic way which DB has inbuilt features and *proto* provides
> > implicit *upgrade capability.*
> >
> > IMO, we need to make the above 2 changes with respect to code
> > implementation. Others can share opinions.
> >
> > Regards
> > Sumit
> >
> > On Fri, Aug 22, 2025 at 9:33 PM Tsz-Wo Nicholas Sze <[email protected]>
> > wrote:
> >
> > > > The SlidingWindow class is not related to disk balancer, ...
> > >
> > > Ethan, thanks for pointing it out!  Then, we should not remove it here.
> > >
> > > Tsz-Wo
> > >
> > >
> > > On Thu, Aug 21, 2025 at 1:18 PM Ethan Rose <[email protected]> wrote:
> > >
> > > > Thanks for working to update this for merge.
> > > >
> > > > The SlidingWindow class is not related to disk balancer, it has been
> > > added
> > > > to master to be used by the volume scanner. It was added in
> HDDS-12852
> > > > <https://issues.apache.org/jira/browse/HDDS-12852> and is being used
> > in
> > > > HDDS-13108 <https://issues.apache.org/jira/browse/HDDS-13108> /
> > > > https://github.com/apache/ozone/pull/8843. Please don't remove it.
> > > >
> > > > Please update the wiki with the justification for why there are no
> > > > incompatible changes as described in this thread. Currently it still
> > > > says "There
> > > > should not be any incompatible changes introduced with this feature."
> > but
> > > > provides no justification.
> > > >
> > > > Regarding the feature flag, the status communicated in this thread
> must
> > > > also be clear to end users. Until this is considered stable and
> enabled
> > > by
> > > > default, please:
> > > > - Update the user doc added in
> > https://github.com/apache/ozone/pull/8837
> > > > to
> > > > indicate that the feature is considered experimental/beta.
> > Additionally,
> > > I
> > > > don't see a mention of the feature flag in this doc.
> > > > - Update the config description added in
> > > > https://github.com/apache/ozone/pull/8869 to indicate the same.
> > > > - Update the CLI help message to indicate the same.
> > > > - Hide the CLI so it does not appear in help messages.
> > > >
> > > > Additionally, please make all the CLI flags kebab-case. They are
> using
> > a
> > > > mix of kebab and camel case right now. Ideally the command itself
> would
> > > be
> > > > `disk-balancer` as well, but since we already have
> `containerbalancer`
> > we
> > > > might want to leave that part as is for consistency.
> > > >
> > > > Thanks,
> > > > Ethan
> > > >
> > >
> >
> >
> > --
> > *Sumit Agrawal* | Senior Staff Engineer
> > cloudera.com <https://www.cloudera.com>
> > [image: Cloudera] <https://www.cloudera.com/>
> > [image: Cloudera on Twitter] <https://twitter.com/cloudera> [image:
> > Cloudera on Facebook] <https://www.facebook.com/cloudera> [image:
> Cloudera
> > on LinkedIn] <https://www.linkedin.com/company/cloudera>
> > ------------------------------
> >
>

Reply via email to