Thanks, Wei-Chiu, for the explanation. In that case, I give my +1 to the proposed change.
73
Kihwal

On Tue, May 12, 2020 at 10:31 AM Wei-Chiu Chuang <weic...@apache.org> wrote:
> I don't think we have I/O-based balancing. That would surely make a great
> research project, but it doesn't seem trivial to me.
>
> Also worth noting: the implementation doesn't try to achieve very
> fine-grained balance in space. As long as all volumes have available space
> within a threshold of each other (10GB by default), it falls back to the
> round-robin policy.
>
> On Tue, May 5, 2020 at 9:23 AM Kihwal Lee <kih...@verizonmedia.com.invalid>
> wrote:
>
> > Successfully running on 1,000 clusters over 5 years proves the feature is
> > stable. It does not, however, give me assurance that it will perform well
> > in our env.
> >
> > It would be nice if there were some data on its performance. One obvious
> > concern is running into grossly unbalanced I/O load among drives. Since
> > our multi-tenant clusters have high utilization for both CPU and I/O,
> > ganging up on a drive tends to hurt job throughput and cause SLA misses.
> >
> > I would feel more comfortable if the feature took I/O balancing into
> > consideration at the same time. Sorry, I didn't look at the code, so if it
> > is already doing this, that's good news.
> >
> > Thanks
> > Kihwal
> >
> > On Thu, Apr 30, 2020 at 4:34 AM Stephen O'Donnell
> > <sodonn...@cloudera.com.invalid> wrote:
> >
> > > I am hoping Arpit Agarwal & Tsz-wo-Sze will comment here too, but I will
> > > ping them directly if they do not.
> > >
> > > 5 years ago, when they raised those concerns, the feature was new and
> > > little used. Their concerns, I think, were based on a theory that the
> > > feature might not perform well. However, since then the feature has
> > > proven stable and trouble-free. In supporting many of Cloudera's clusters
> > > over the last 5 years, and despite us having about 1000 clusters using
> > > this setting, I don't recall a single issue caused by it.
> > > On the other hand, we fielded a lot of support issues around the default
> > > round-robin policy, where smaller disks filled up and the disk balancer
> > > had to be run, etc.
> > >
> > > As the feature seems to work well in practice, I would be inclined to
> > > leave what appears to be stable as it is, and only make changes if we see
> > > issues in real usage.
> > >
> > > On Thu, Apr 30, 2020 at 10:03 AM Ayush Saxena <ayush...@gmail.com> wrote:
> > >
> > > > Hey Stephen,
> > > > Thanx for initiating this.
> > > > I just had a look at HDFS-8538. It seems there were a couple of concerns
> > > > regarding write throughput and performance, raised by Arpit Agarwal &
> > > > Tsz-wo-Sze. It concluded with a solution at the end, as mentioned here:
> > > >
> > > > https://issues.apache.org/jira/browse/HDFS-8538?focusedCommentId=14606094&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14606094
> > > >
> > > > Do you plan to incorporate the same and then continue, or is the concern
> > > > raised then no longer there? Any pointers on those concerns and
> > > > comments? It would be great if you got a nod from them too.
> > > >
> > > > Thanx
> > > > -Ayush
> > > >
> > > > On Thu, 30 Apr 2020 at 09:32, Akira Ajisaka <aajis...@apache.org> wrote:
> > > >
> > > > > +1 to change the default policy in Hadoop 3.4+.
> > > > >
> > > > > -Akira
> > > > >
> > > > > On Wed, Apr 29, 2020 at 1:28 AM Clay Baenziger (BLOOMBERG/ 919 3RD A)
> > > > > <cbaenzi...@bloomberg.net> wrote:
> > > > >
> > > > > > I can confirm that my group has run with Available Space for a
> > > > > > number of years on the 2.7.x line quite successfully.
> > > > > >
> > > > > > -Clay
> > > > > >
> > > > > > From: weic...@cloudera.com.INVALID At: 04/28/20 11:50:27
> > > > > > To: sodonn...@cloudera.com.invalid
> > > > > > Cc: hdfs-dev@hadoop.apache.org
> > > > > > Subject: Re: Changing the default Datanode Volume Choosing policy
> > > > > >
> > > > > > +1 to switch it on in Hadoop 3.4.0
> > > > > >
> > > > > > (1) It doesn't break any existing applications I am aware of.
> > > > > > (2) No noticeable performance regression has been observed in any case.
> > > > > >
> > > > > > I feel compelled to make a feature the default if it is strictly
> > > > > > better. Hopefully we can make Hadoop easier to use in this way too.
> > > > > >
> > > > > > On Tue, Apr 28, 2020 at 8:36 AM Stephen O'Donnell
> > > > > > <sodonn...@cloudera.com.invalid> wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > A long time back there was a Jira raised to change the default
> > > > > > > volume choosing policy from Round Robin to Available Space:
> > > > > > >
> > > > > > > https://issues.apache.org/jira/browse/HDFS-8538
> > > > > > >
> > > > > > > At the time there were some objections / concerns about using
> > > > > > > available space.
> > > > > > >
> > > > > > > In the 5 years since then, at Cloudera we have seen about 1000
> > > > > > > clusters running with Available Space enabled, and we have not
> > > > > > > seen any issues caused by it. It feels like this policy should be
> > > > > > > the default, as we have to change it more often than not.
> > > > > > >
> > > > > > > To recap, the Available Space policy places blocks on disks with
> > > > > > > more free space with a higher probability until all disks are
> > > > > > > within a threshold of free space from each other. After that it
> > > > > > > behaves in a round-robin fashion.
> > > > > > > This means that if a disk is replaced, it will slowly catch up to
> > > > > > > the usage of the others, and if you have disks of different sizes,
> > > > > > > they will self-balance.
> > > > > > >
> > > > > > > I would like to ask:
> > > > > > >
> > > > > > > 1. Are there others in the community running the Available Space
> > > > > > > volume choosing policy, and if so, have you seen any issues, or
> > > > > > > does it run smoothly?
> > > > > > >
> > > > > > > 2. Does anyone have any strong objections to changing the default
> > > > > > > to Available Space from 3.4 onwards?
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Stephen.
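For readers unfamiliar with the policy, the behaviour Stephen and Wei-Chiu describe (prefer volumes with more free space until all volumes are within a threshold of each other, then fall back to round robin) can be modelled roughly as below. This is a simplified Python sketch, not Hadoop's Java AvailableSpaceVolumeChoosingPolicy. The 10GB threshold comes from Wei-Chiu's note. The 0.75 preference fraction, the high/low grouping rule, and the names `Volume` and `AvailableSpaceSketch` are assumptions made for illustration.

```python
import random
from dataclasses import dataclass

GB = 1024 ** 3


@dataclass
class Volume:
    name: str
    available: int  # free bytes on this volume (hypothetical model)


class AvailableSpaceSketch:
    """Simplified model of an available-space volume choosing policy.

    Assumption: a volume is "high" if its free space exceeds the emptiest
    candidate's free space by more than `threshold`; all others are "low".
    """

    def __init__(self, threshold=10 * GB, preference=0.75, rng=None):
        self.threshold = threshold    # "balanced" band, 10GB per the thread
        self.preference = preference  # assumed bias toward high volumes
        self.rng = rng or random.Random()
        # Separate round-robin cursors for each group of volumes.
        self._next = {"all": 0, "high": 0, "low": 0}

    def _round_robin(self, group, volumes):
        idx = self._next[group] % len(volumes)
        self._next[group] += 1
        return volumes[idx]

    def choose(self, volumes, block_size):
        # Only volumes that can hold the block are candidates.
        candidates = [v for v in volumes if v.available >= block_size]
        if not candidates:
            raise RuntimeError("no volume has enough space for the block")
        lowest = min(v.available for v in candidates)
        highest = max(v.available for v in candidates)
        if highest - lowest <= self.threshold:
            # All volumes are within the threshold: plain round robin.
            return self._round_robin("all", candidates)
        high = [v for v in candidates if v.available - lowest > self.threshold]
        low = [v for v in candidates if v.available - lowest <= self.threshold]
        # Weight each group by its size, so the preference applies per volume.
        high_weight = len(high) * self.preference
        low_weight = len(low) * (1 - self.preference)
        if self.rng.random() < high_weight / (high_weight + low_weight):
            return self._round_robin("high", high)
        return self._round_robin("low", low)


if __name__ == "__main__":
    # Hypothetical node: three fuller disks and one freshly replaced disk.
    vols = [Volume("d1", 100 * GB), Volume("d2", 100 * GB),
            Volume("d3", 100 * GB), Volume("new", 1000 * GB)]
    policy = AvailableSpaceSketch(rng=random.Random(42))
    block = 128 * 1024 ** 2  # 128MB blocks
    counts = {v.name: 0 for v in vols}
    for _ in range(1000):
        chosen = policy.choose(vols, block)
        chosen.available -= block
        counts[chosen.name] += 1
    print(counts)
```

Under these assumptions, a single empty replacement disk next to three fuller ones receives about half of all new blocks (0.75 / (0.75 + 3 × 0.25)) until it is within the threshold, while the other disks keep receiving the remaining writes; once the spread is within 10GB, placement is strictly round robin.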