I can confirm that my group has run with Available Space for a number of years on the 2.7.x line quite successfully.
-Clay

From: weic...@cloudera.com.INVALID
At: 04/28/20 11:50:27
To: sodonn...@cloudera.com.invalid
Cc: hdfs-dev@hadoop.apache.org
Subject: Re: Changing the default Datanode Volume Choosing policy

+1 to switch it on in Hadoop 3.4.0

(1) it doesn't break any existing applications I am aware of.
(2) No noticeable performance regression in any cases observed.

I feel compelled to make a feature the default if it is strictly better.
Hopefully we can make Hadoop easier to use in this way too.

On Tue, Apr 28, 2020 at 8:36 AM Stephen O'Donnell <sodonn...@cloudera.com.invalid> wrote:

> Hi,
>
> A long time back there was a Jira raised to change the default volume
> choosing policy from Round Robin to Available Space:
>
> https://issues.apache.org/jira/browse/HDFS-8538
>
> At the time there were some objections / concerns about using available
> space.
>
> In the 5 years since then, at Cloudera we have seen about 1000 clusters
> running with Available Space enabled, and we have not seen any issues
> caused by it. It feels like this policy should be the default, as we have
> to change it more often than not.
>
> To recap, the Available Space places blocks on disks with more free space
> with a higher probability until all disks are within a threshold of free
> space from each other. After that it behaves in a round robin fashion. This
> means if a disk is replaced, it will slowly catch up to the usage of the
> others, and if you have disks of different sizes, they will self balance.
>
> I would like to ask:
>
> 1. Are there others in the community running the Available Space volume
> choosing policy, and if so, have you seen any issues, or does it run
> smoothly?
>
> 2. Does anyone have any strong objections in changing the default to
> Available Space from 3.4 onwards?
>
> Thanks,
>
> Stephen.
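
For readers who want to see the behaviour recapped in the quoted message, here is a minimal Java sketch of it. It is a simplified illustration written from that description, not Hadoop's actual implementation: the class name `AvailableSpaceSketch`, the `balancedThreshold` and `preferenceFraction` parameters, and the way volumes are split into two groups are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of an available-space volume chooser (not Hadoop's class). */
class AvailableSpaceSketch {
    // Volumes whose free space differs by at most this much count as balanced.
    private final long balancedThreshold;
    // Probability of placing a block on the group of emptier volumes.
    private final double preferenceFraction;
    private final Random random;
    // Independent round-robin counters for each group of volumes.
    private int nextAll = 0;
    private int nextHigh = 0;
    private int nextLow = 0;

    AvailableSpaceSketch(long balancedThreshold, double preferenceFraction, Random random) {
        this.balancedThreshold = balancedThreshold;
        this.preferenceFraction = preferenceFraction;
        this.random = random;
    }

    /** Returns the index of the volume that should receive the next block. */
    int choose(long[] freeBytes) {
        if (freeBytes.length == 0) {
            throw new IllegalArgumentException("no volumes available");
        }
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (long free : freeBytes) {
            min = Math.min(min, free);
            max = Math.max(max, free);
        }

        // All volumes are within the threshold of each other: plain round robin.
        if (max - min <= balancedThreshold) {
            return Math.floorMod(nextAll++, freeBytes.length);
        }

        // Otherwise split the volumes into an emptier group and a fuller group.
        List<Integer> high = new ArrayList<>();
        List<Integer> low = new ArrayList<>();
        for (int i = 0; i < freeBytes.length; i++) {
            if (freeBytes[i] > min + balancedThreshold) {
                high.add(i);
            } else {
                low.add(i);
            }
        }

        // Both groups are non-empty here: the emptiest volume is in "high"
        // and the fullest volume is in "low".
        if (random.nextDouble() < preferenceFraction) {
            return high.get(Math.floorMod(nextHigh++, high.size()));
        }
        return low.get(Math.floorMod(nextLow++, low.size()));
    }
}
```

Calling `choose` repeatedly and decrementing the chosen entry of the free-space array mimics the scenario from the message: a freshly replaced disk receives most new blocks until it is within the threshold of the others, after which placement falls back to round robin.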