On Jan 7, 2025, at 13:53, Mohr, Rick <moh...@ornl.gov> wrote:

> Lustre won't automatically spread files across multiple mdts.
Actually, since Lustre 2.15 the clients will distribute new subdirectory
creation across MDTs: either round-robin across MDTs for the top-level
directories, or based on free space/inodes on the MDTs if they are imbalanced.

I guess you are running an older Lustre version (e.g. 2.12)?  To get the MDT
space balancing, both the clients and the servers need to be running a new
enough version.
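
For reference, lctl can report the running Lustre version, so it is easy to
check what each node has (the same command works on clients and servers):

    # run on a client and on each MDS/OSS to confirm the versions match
    lctl get_param version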

> You can use "lfs mkdir -i <mdt_index> <dir>" to create a remote directory on a
> specific mdt which will cause all files in that directory to also reside on the
> same mdt.

Using single-stripe remote directories is the recommended way to balance MDT
usage.  For example, "lfs mkdir -i $(((RANDOM % 2) + 1)) $DIR" could be added
to job launch scripts to put each new output directory onto a random non-zero MDT.
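
A minimal sketch of what that might look like (the path and the JOB_ID
variable are just examples, not from your system):

    # pick a random non-zero MDT for this job's output directory
    # (assumes 3 MDTs with indices 0-2 and an existing parent directory;
    #  JOB_ID is whatever your scheduler provides)
    MDT_IDX=$(( (RANDOM % 2) + 1 ))
    OUTDIR=/mnt/lustre/scratch/$USER/job_$JOB_ID
    lfs mkdir -i $MDT_IDX $OUTDIR

Depending on configuration, unprivileged users may not be permitted to create
remote directories, so this is worth testing with a normal user account first.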

A default remote MDT directory layout can be set on existing directories (e.g. 
per user directory) to redirect new subdirectory creation to the other MDTs:

    lfs setdirstripe -D -c 1 -i 1 /mnt/lustre/user1
    lfs setdirstripe -D -c 1 -i 2 /mnt/lustre/user2
    :
    :

Note that this will *not* affect existing files/directories, only new
*subdirectory* creation in the specified directories.  It needs to be run on
all directories where new subdirectories are being created, so the priority
should be top-level user directories and wherever jobs are run.  You don't
strictly need to run "lfs setdirstripe" on *all* of the subdirectories; it
could be limited to, say, directories modified in the last month.
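
As a rough sketch (the paths and the choice of indices are assumptions about
your layout), the top-level user directories could be assigned alternating
non-zero MDTs like this:

    # alternate default subdirectory creation between MDT0001 and MDT0002
    # for each existing top-level user directory
    i=0
    for dir in /mnt/lustre/*/ ; do
        lfs setdirstripe -D -c 1 -i $(( (i % 2) + 1 )) "$dir"
        i=$((i + 1))
    done

The same idea can be narrowed with find (e.g. only directories modified in the
last 30 days) if walking every directory is too expensive.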

> You can also stripe a directory across multiple mdts using "lfs mkdir -c
> <stripe_count> <dir>".

The use of many-striped directories is *not* recommended, unless there are 
single directories with millions of entries.  Striped directories should 
definitely *not* be used for MDT balancing, since this will cause significant 
overhead and consume a large number of MDT inodes.
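
If you do have one of those rare huge directories, a minimal example
(the directory name is hypothetical) would be:

    # stripe one very large directory across 2 MDTs, then verify its layout
    lfs mkdir -c 2 /mnt/lustre/project/huge_dir
    lfs getdirstripe /mnt/lustre/project/huge_dir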

> My guess is that remote directories or striped directories were never created
> so all files end up on MDT0 by default.  Look for the section titled "Creating
> a sub-directory on a specific MDT" in the Lustre manual for more details.
> Hopefully that will resolve your issues.

Yes, this was a common problem on Lustre 2.12 and earlier deployments, and that 
is why the DNE3 MDT space balancing was implemented and enabled by default in 
newer releases.
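
Before and after making changes it is worth confirming where new entries will
go and keeping an eye on the per-MDT inode counts, e.g.:

    # default layout that new subdirectories will inherit
    lfs getdirstripe -D /mnt/lustre/user1
    # layout of the directory itself (stripe count and MDT index)
    lfs getdirstripe /mnt/lustre/user1
    # per-target inode usage, to watch the balance
    lfs df -i /mnt/lustre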

Cheers, Andreas


> --Rick


On 1/7/25, 3:29 AM, "lustre-discuss on behalf of Ihsan Ur Rahman"
<lustre-discuss-boun...@lists.lustre.org on behalf of ihsanur...@gmail.com> wrote:


Hello Lustre folks,
I am new to this forum and to Lustre as well.

We have a Lustre system, and the users are getting an error that no space is
left on the device.  After checking, we realised that the inodes are full on
one of the MDTs.

lfs df -ihv /mnt/lustre/
UUID                    Inodes      IUsed      IFree IUse% Mounted on
lustre-MDT0000_UUID     894.0M     894.0M         58  100% /mnt/lustre[MDT:0]
lustre-MDT0001_UUID     894.0M        313     894.0M    1% /mnt/lustre[MDT:1]
lustre-MDT0002_UUID     894.0M        313     894.0M    1% /mnt/lustre[MDT:2]
lustre-OST0000_UUID       4.0G      26.2M       4.0G    1% /mnt/lustre[OST:0]
lustre-OST0001_UUID       4.0G      26.1M       4.0G    1% /mnt/lustre[OST:1]
lustre-OST0002_UUID       4.0G      28.1M       4.0G    1% /mnt/lustre[OST:2]
lustre-OST0003_UUID       4.0G      26.6M       4.0G    1% /mnt/lustre[OST:3]
lustre-OST0004_UUID       4.0G      28.2M       4.0G    1% /mnt/lustre[OST:4]
lustre-OST0005_UUID       4.0G      27.3M       4.0G    1% /mnt/lustre[OST:5]
lustre-OST0006_UUID       4.0G      27.5M       4.0G    1% /mnt/lustre[OST:6]
lustre-OST0007_UUID       4.0G      28.0M       4.0G    1% /mnt/lustre[OST:7]
lustre-OST0008_UUID       4.0G      27.5M       4.0G    1% /mnt/lustre[OST:8]
lustre-OST0009_UUID       4.0G      26.4M       4.0G    1% /mnt/lustre[OST:9]
lustre-OST000a_UUID       4.0G      27.9M       4.0G    1% /mnt/lustre[OST:10]
lustre-OST000b_UUID       4.0G      28.4M       4.0G    1% /mnt/lustre[OST:11]
lustre-OST000c_UUID       4.0G      28.3M       4.0G    1% /mnt/lustre[OST:12]
lustre-OST000d_UUID       4.0G      27.8M       4.0G    1% /mnt/lustre[OST:13]
lustre-OST000e_UUID       4.0G      27.6M       4.0G    1% /mnt/lustre[OST:14]
lustre-OST000f_UUID       4.0G      27.1M       4.0G    1% /mnt/lustre[OST:15]
lustre-OST0010_UUID       4.0G      26.5M       4.0G    1% /mnt/lustre[OST:16]
lustre-OST0011_UUID       4.0G      27.3M       4.0G    1% /mnt/lustre[OST:17]
lustre-OST0012_UUID       4.0G      27.1M       4.0G    1% /mnt/lustre[OST:18]
lustre-OST0013_UUID       4.0G      28.8M       4.0G    1% /mnt/lustre[OST:19]
lustre-OST0014_UUID       4.0G      28.2M       4.0G    1% /mnt/lustre[OST:20]
lustre-OST0015_UUID       4.0G      26.1M       4.0G    1% /mnt/lustre[OST:21]
lustre-OST0016_UUID       4.0G      27.2M       4.0G    1% /mnt/lustre[OST:22]
lustre-OST0017_UUID       4.0G      28.7M       4.0G    1% /mnt/lustre[OST:23]
lustre-OST0018_UUID       4.0G      28.5M       4.0G    1% /mnt/lustre[OST:24]
lustre-OST0019_UUID       4.0G      28.3M       4.0G    1% /mnt/lustre[OST:25]
lustre-OST001a_UUID       4.0G      27.3M       4.0G    1% /mnt/lustre[OST:26]
lustre-OST001b_UUID       4.0G      27.0M       4.0G    1% /mnt/lustre[OST:27]
lustre-OST001c_UUID       4.0G      28.8M       4.0G    1% /mnt/lustre[OST:28]
lustre-OST001d_UUID       4.0G      28.5M       4.0G    1% /mnt/lustre[OST:29]

filesystem_summary:       2.6G     894.0M       1.7G   34% /mnt/lustre

After some searching on Google, I found that there may be open files on the
compute nodes which keep the inodes in use.  With the command below I got the
list of nodes where files were open:

    lctl get_param mdt.*.exports.*.open_files

I logged in to each node and, using lsof, identified those files and killed
the processes that had them open, but it still does not work for us.

Our primary goal is to bring the inode usage down from 100% to below 90%, and
then share the inode load across the other two MDTs.
We need your guidance and support.

Regards,

Ihsan

Cheers, Andreas
—
Andreas Dilger
Lustre Principal Architect
Whamcloud/DDN




_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
