Re: Changing the default Datanode Volume Choosing policy

Kihwal Lee Tue, 05 May 2020 09:23:21 -0700

Successfully running on 1,000 clusters over 5 years proves the feature is
stable.  It does not, however, give me assurance that it will perform well
in our env.


It would be nice to have some data on its performance. One obvious
concern is running into grossly unbalanced I/O load among drives. Since
our multi-tenant clusters have high utilization for both CPU and I/O,
ganging up on a drive tends to hurt job throughput and cause SLA misses.

I would feel more comfortable if the feature took I/O balancing into
consideration at the same time.  Sorry, I haven't looked at the code, so if
it is already doing this, that's good news.

Thanks
Kihwal

On Thu, Apr 30, 2020 at 4:34 AM Stephen O'Donnell
<sodonn...@cloudera.com.invalid> wrote:

> I am hoping Arpit Agarwal & Tsz-wo-Sze will comment here too, but I will
> ping them directly if they do not.
>
> 5 years ago, when they raised those concerns, the feature was new and
> little used. Their concerns, I think, were based on a theory that the
> feature might not perform well. However, since then the feature has proven
> stable and trouble-free. In supporting many of Cloudera's clusters over the
> last 5 years, and despite us having about 1000 clusters using this setting,
> I don't recall a single issue caused by it. On the other hand, we fielded a
> lot of support issues around the default round robin policy, where smaller
> disks filled up, needing to run the disk balancer etc.
>
> As the feature seems to work well in practice, I would be inclined to leave
> what appears to be stable as it is, and only make changes if we see issues
> in real usage.
>
> On Thu, Apr 30, 2020 at 10:03 AM Ayush Saxena <ayush...@gmail.com> wrote:
>
> > Hey Stephen,
> > Thanx for initiating this.
> > Just had a look at HDFS-8538. It seems Arpit Agarwal & Tsz-wo-Sze had a
> > couple of concerns regarding the write throughput and performance. It
> > concluded with a solution in the end, as mentioned here:
> >
> > https://issues.apache.org/jira/browse/HDFS-8538?focusedCommentId=14606094&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14606094
> >
> > Do you plan to incorporate the same and then continue, or is the concern
> > raised back then no longer relevant? Any pointers on those concerns and
> > comments? It would be great if you got a nod from them too.
> >
> > Thanx
> > -Ayush
> >
> > On Thu, 30 Apr 2020 at 09:32, Akira Ajisaka <aajis...@apache.org> wrote:
> >
> >> +1 to change the default policy in Hadoop 3.4+.
> >>
> >> -Akira
> >>
> >> On Wed, Apr 29, 2020 at 1:28 AM Clay Baenziger (BLOOMBERG/ 919 3RD A) <
> >> cbaenzi...@bloomberg.net> wrote:
> >>
> >> > I can confirm that my group has run with Available Space for a number
> >> > of years on the 2.7.x line quite successfully.
> >> >
> >> > -Clay
> >> >
> >> > From: weic...@cloudera.com.INVALID At: 04/28/20 11:50:27
> >> > To: sodonn...@cloudera.com.invalid
> >> > Cc:  hdfs-dev@hadoop.apache.org
> >> > Subject: Re: Changing the default Datanode Volume Choosing policy
> >> >
> >> > +1 to switch it on in Hadoop 3.4.0
> >> >
> >> > (1) it doesn't break any existing applications I am aware of.
> >> > (2) No noticeable performance regression in any cases observed.
> >> >
> >> > I feel compelled to make a feature the default if it is strictly
> >> > better. Hopefully we can make Hadoop easier to use in this way too.
> >> >
> >> > On Tue, Apr 28, 2020 at 8:36 AM Stephen O'Donnell
> >> > <sodonn...@cloudera.com.invalid> wrote:
> >> >
> >> > > Hi,
> >> > >
> >> > > A long time back there was a Jira raised to change the default volume
> >> > > choosing policy from Round Robin to Available Space:
> >> > >
> >> > > https://issues.apache.org/jira/browse/HDFS-8538
> >> > >
> >> > > At the time there were some objections / concerns about using
> >> > > available space.
> >> > >
> >> > > In the 5 years since then, at Cloudera we have seen about 1000
> >> > > clusters running with Available Space enabled, and we have not seen
> >> > > any issues caused by it. It feels like this policy should be the
> >> > > default, as we have to change it more often than not.
> >> > >
> >> > > To recap, the Available Space policy places blocks on disks with more
> >> > > free space with a higher probability until all disks are within a
> >> > > threshold of free space from each other. After that it behaves in a
> >> > > round robin fashion. This means if a disk is replaced, it will slowly
> >> > > catch up to the usage of the others, and if you have disks of
> >> > > different sizes, they will self balance.
> >> > >
> >> > > I would like to ask:
> >> > >
> >> > > 1. Are there others in the community running the Available Space
> >> > > volume choosing policy, and if so, have you seen any issues, or does
> >> > > it run smoothly?
> >> > >
> >> > > 2. Does anyone have any strong objections to changing the default to
> >> > > Available Space from 3.4 onwards?
> >> > >
> >> > > Thanks,
> >> > >
> >> > > Stephen.
> >> > >
> >> >
> >> >
> >> >
> >>
> >
>
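
For readers following the thread, the behaviour Stephen recaps in the original message (prefer disks with more free space with a higher probability until all disks are within a threshold of each other, then fall back to round robin) can be sketched roughly as below. This is a simplified illustration and not the Hadoop source: the class name AvailableSpaceSketch, the parameters thresholdBytes and preferHighFraction, and the simulate helper are all made up for this example, and it ignores details such as whether a block actually fits on the chosen disk. It also says nothing about I/O load, which is the concern raised at the top of this message.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

/** Simplified, illustrative model of an available-space volume chooser (not the Hadoop source). */
public class AvailableSpaceSketch {
    private final long thresholdBytes;       // hypothetical: largest free-space gap still treated as balanced
    private final double preferHighFraction; // hypothetical: probability of picking a disk with more free space
    private final Random rng;
    private int rrAll = 0, rrHigh = 0, rrLow = 0; // independent round-robin cursors

    public AvailableSpaceSketch(long thresholdBytes, double preferHighFraction, long seed) {
        this.thresholdBytes = thresholdBytes;
        this.preferHighFraction = preferHighFraction;
        this.rng = new Random(seed);
    }

    /** Returns the index of the volume chosen for the next block. */
    public int choose(long[] freeBytes) {
        if (freeBytes.length == 0) {
            throw new IllegalArgumentException("no volumes");
        }
        long min = Long.MAX_VALUE, max = Long.MIN_VALUE;
        for (long f : freeBytes) {
            min = Math.min(min, f);
            max = Math.max(max, f);
        }
        // All disks within the threshold of each other: plain round robin.
        if (max - min <= thresholdBytes) {
            return Math.floorMod(rrAll++, freeBytes.length);
        }
        // Otherwise split disks into those well above the minimum and the rest.
        List<Integer> high = new ArrayList<>();
        List<Integer> low = new ArrayList<>();
        for (int i = 0; i < freeBytes.length; i++) {
            if (freeBytes[i] > min + thresholdBytes) {
                high.add(i);
            } else {
                low.add(i);
            }
        }
        // Both lists are non-empty here: the max disk is "high", the min disk is "low".
        if (rng.nextDouble() < preferHighFraction) {
            return high.get(Math.floorMod(rrHigh++, high.size()));
        }
        return low.get(Math.floorMod(rrLow++, low.size()));
    }

    /** Writes `writes` blocks of `blockSize` and returns the resulting free space per disk. */
    public static long[] simulate(long[] initialFree, int writes, long blockSize,
                                  long threshold, double fraction, long seed) {
        long[] free = initialFree.clone();
        AvailableSpaceSketch policy = new AvailableSpaceSketch(threshold, fraction, seed);
        for (int i = 0; i < writes; i++) {
            free[policy.choose(free)] -= blockSize;
        }
        return free;
    }

    public static void main(String[] args) {
        // Two nearly full disks and one freshly replaced disk, in arbitrary units.
        long[] free = simulate(new long[] {1000, 1000, 1500}, 2000, 1, 10, 0.75, 42L);
        System.out.println(Arrays.toString(free));
    }
}
```

In this toy run the third disk starts with more free space and receives most of the writes until the gap closes, after which the writes spread evenly. Note that while the gap is open, the emptier disk takes a disproportionate share of the write traffic, which is exactly the kind of skew that performance data would need to quantify.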
