OK, I disabled quotas, waited a while, then re-enabled them. I still get the same numbers. The only thing I can think is that somehow the count is correct, despite the huge difference. Robinhood and lfs find both show about 1.7M files, dirs, and links. The quota is showing a bit over 3.1M inodes used. We have only one MDS and one MGS. Any ideas where the discrepancy may lie? Orphans? Is there a lost+found area in Lustre?
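(For what it's worth: Lustre does have a hidden lost+found, at `.lustre/lost+found` under the filesystem root, and LFSCK is the tool that detects orphaned objects and relinks them there. A dry-run sketch of checking for orphans follows; the commands are echoed rather than executed, and the MDT device name is an assumption, so substitute your real one.)

```shell
#!/bin/sh
# Dry-run sketch only: each command is echoed, not executed. Remove the
# leading "echo" to actually run it on the MDS. LFSCK's namespace/layout
# phases can relink orphaned objects into the hidden .lustre/lost+found
# directory at the filesystem root.
MDT=lustre1-MDT0000   # assumed device name; substitute your real MDT

echo lctl lfsck_start -M "$MDT" -t all   # start namespace/layout checks
echo lctl lfsck_query -M "$MDT"          # poll until the run completes
echo ls /lustre1/.lustre/lost+found      # inspect any recovered orphans
```

Whether LFSCK will also correct the per-group inode accounting on a ZFS backend is something the ZFS/Lustre folks would have to confirm.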
—
Dan Szkola
FNAL

> On Oct 10, 2023, at 8:24 AM, Daniel Szkola <[email protected]> wrote:
>
> Hi Robert,
>
> Thanks for the response. Do you remember exactly how you did it? Did you
> bring everything down at any point? I know you can do this:
>
> lctl conf_param fsname.quota.mdt=none
>
> but is that all you did? Did you wait, or bring everything down, before
> re-enabling? I'm worried because that allegedly just enables/disables
> enforcement, and space accounting is always on. Andreas stated that
> quotas are controlled by ZFS, but quota support has never been enabled
> on any of the ZFS volumes in our Lustre filesystem.
>
> —
> Dan Szkola
> FNAL
>
>> On Oct 10, 2023, at 2:17 AM, Redl, Robert <[email protected]> wrote:
>>
>> Dear Dan,
>>
>> I had a similar problem some time ago. We are also using ZFS for the
>> MDT and OSTs. For us, the used disk space was reported wrong. The
>> problem was fixed by switching quota support off on the MGS and then
>> on again.
>>
>> Cheers,
>> Robert
>>
>>> On Oct 9, 2023, at 17:55, Daniel Szkola via lustre-discuss
>>> <[email protected]> wrote:
>>>
>>> Thanks, I will look into the ZFS quota, since we are using ZFS for
>>> all storage, MDT and OSTs.
>>>
>>> In our case, there is a single MDS/MDT. I have used Robinhood and
>>> lfs find (by group) commands to verify what the numbers should be.
>>>
>>> —
>>> Dan Szkola
>>> FNAL
>>>
>>>> On Oct 9, 2023, at 10:13 AM, Andreas Dilger <[email protected]> wrote:
>>>>
>>>> The quota accounting is controlled by the backing filesystem of the
>>>> OSTs and MDTs.
>>>>
>>>> For ldiskfs/ext4 you could run e2fsck to re-count all of the inode
>>>> and block usage.
>>>>
>>>> For ZFS you would have to ask on the ZFS list to see if there is
>>>> some way to re-count the quota usage.
>>>>
>>>> The "inode" quota is accounted from the MDTs, while the "block"
>>>> quota is accounted from the OSTs.
>>>> You might be able to run "lfs quota -v -g group" to see whether one
>>>> particular MDT is returning too many inodes.
>>>>
>>>> Possibly, if you have directories that are striped across many MDTs,
>>>> that would inflate the used inode count. For example, if every one
>>>> of the 426k directories reported by RBH was striped across 4 MDTs,
>>>> then you would see the inode count add up to 3.6M.
>>>>
>>>> If that were the case, then I would really, really advise against
>>>> striping every directory in the filesystem. That will cause problems
>>>> far worse than just inflating the inode quota accounting.
>>>>
>>>> Cheers, Andreas
>>>>
>>>>> On Oct 9, 2023, at 22:33, Daniel Szkola via lustre-discuss
>>>>> <[email protected]> wrote:
>>>>>
>>>>> Is there really no way to force a recount of files used by the
>>>>> quota? All indications are that we have accounts where files were
>>>>> removed and this is not reflected in the used file count in the
>>>>> quota. The space used seems correct, but the inodes-used numbers
>>>>> are way too high. There must be a way to clear these numbers and
>>>>> have a fresh count done.
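(The per-MDT check Andreas suggests above can be scripted. A minimal sketch follows; the sample text is illustrative, not verbatim `lfs quota -v` output, so the awk field index is an assumption to verify against your version's actual column layout.)

```shell
#!/bin/sh
# Sketch: sum the per-MDT "files" column out of `lfs quota -v -g group`
# output. The sample below is illustrative, not verbatim lfs output;
# check the field positions on your Lustre version before relying on
# the awk index used here.
sample='lustre1-MDT0000_UUID 524288 - 1048576 - 3136761 - -
lustre1-OST0000_UUID 1394853459 - 1913344192 - 132863 - -'

# In this illustrative layout, field 6 holds the files count; only
# MDT lines are summed, since inode quota is accounted on the MDTs.
echo "$sample" | awk '/MDT/ { mdt += $6 } END { print mdt }'
```

With a single MDS/MDT, as in this thread, the one MDT line has to account for the entire inflated figure on its own.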
>>>>>
>>>>> —
>>>>> Dan Szkola
>>>>> FNAL
>>>>>
>>>>>> On Oct 4, 2023, at 11:37 AM, Daniel Szkola via lustre-discuss
>>>>>> <[email protected]> wrote:
>>>>>>
>>>>>> Also, quotas on the OSTs don't add up to anywhere near 3 million
>>>>>> files either:
>>>>>>
>>>>>> [root@lustreclient scratch]# ssh ossnode0 lfs quota -g somegroup -I 0 /lustre1
>>>>>> Disk quotas for grp somegroup (gid 9544):
>>>>>>     Filesystem      kbytes  quota       limit  grace   files  quota  limit  grace
>>>>>>             1394853459      0  1913344192      -  132863      0      0      -
>>>>>> [root@lustreclient scratch]# ssh ossnode0 lfs quota -g somegroup -I 1 /lustre1
>>>>>> Disk quotas for grp somegroup (gid 9544):
>>>>>>     Filesystem      kbytes  quota       limit  grace   files  quota  limit  grace
>>>>>>             1411579601      0  1963246413      -  120643      0      0      -
>>>>>> [root@lustreclient scratch]# ssh ossnode1 lfs quota -g somegroup -I 2 /lustre1
>>>>>> Disk quotas for grp somegroup (gid 9544):
>>>>>>     Filesystem      kbytes  quota       limit  grace   files  quota  limit  grace
>>>>>>             1416507527      0  1789950778      -  190687      0      0      -
>>>>>> [root@lustreclient scratch]# ssh ossnode1 lfs quota -g somegroup -I 3 /lustre1
>>>>>> Disk quotas for grp somegroup (gid 9544):
>>>>>>     Filesystem      kbytes  quota       limit  grace   files  quota  limit  grace
>>>>>>             1636465724      0  1926578117      -  195034      0      0      -
>>>>>> [root@lustreclient scratch]# ssh ossnode2 lfs quota -g somegroup -I 4 /lustre1
>>>>>> Disk quotas for grp somegroup (gid 9544):
>>>>>>     Filesystem      kbytes  quota       limit  grace   files  quota  limit  grace
>>>>>>             2202272244      0  3020159313      -  185097      0      0      -
>>>>>> [root@lustreclient scratch]# ssh ossnode2 lfs quota -g somegroup -I 5 /lustre1
>>>>>> Disk quotas for grp somegroup (gid 9544):
>>>>>>     Filesystem      kbytes  quota       limit  grace   files  quota  limit  grace
>>>>>>             1324770165      0  1371244768      -  145347      0      0      -
>>>>>> [root@lustreclient scratch]# ssh ossnode3 lfs quota -g somegroup -I 6 /lustre1
>>>>>> Disk quotas for grp somegroup (gid 9544):
>>>>>>     Filesystem      kbytes  quota       limit  grace   files  quota  limit  grace
>>>>>>             2892027349      0  3221225472      -  169386      0      0      -
>>>>>> [root@lustreclient scratch]# ssh ossnode3 lfs quota -g somegroup -I 7 /lustre1
>>>>>> Disk quotas for grp somegroup (gid 9544):
>>>>>>     Filesystem      kbytes  quota       limit  grace   files  quota  limit  grace
>>>>>>             2076201636      0  2474853207      -  171552      0      0      -
>>>>>>
>>>>>> —
>>>>>> Dan Szkola
>>>>>> FNAL
>>>>>>
>>>>>>> On Oct 4, 2023, at 8:45 AM, Daniel Szkola via lustre-discuss
>>>>>>> <[email protected]> wrote:
>>>>>>>
>>>>>>> No combination of ossnodek runs has helped with this.
>>>>>>>
>>>>>>> Again, robinhood shows 1796104 files for the group, and an 'lfs
>>>>>>> find -G gid' found 1796104 files as well.
>>>>>>>
>>>>>>> So why is the quota command showing over 3 million inodes used?
>>>>>>>
>>>>>>> There must be a way to force it to recount, or to clear all stale
>>>>>>> quota data and have it regenerated.
>>>>>>>
>>>>>>> Anyone?
>>>>>>>
>>>>>>> —
>>>>>>> Dan Szkola
>>>>>>> FNAL
>>>>>>>
>>>>>>>
>>>>>>>> On Sep 27, 2023, at 9:42 AM, Daniel Szkola via lustre-discuss
>>>>>>>> <[email protected]> wrote:
>>>>>>>>
>>>>>>>> We have a Lustre filesystem that we just upgraded to 2.15.3;
>>>>>>>> however, this problem has been going on for some time.
>>>>>>>>
>>>>>>>> The quota command shows this:
>>>>>>>>
>>>>>>>> Disk quotas for grp somegroup (gid 9544):
>>>>>>>>     Filesystem    used  quota  limit  grace    files    quota    limit    grace
>>>>>>>>       /lustre1  13.38T    40T    45T      -  3136761*  2621440  3670016  expired
>>>>>>>>
>>>>>>>> The group is not using nearly that many files. We have robinhood
>>>>>>>> installed and it shows this:
>>>>>>>>
>>>>>>>> Using config file '/etc/robinhood.d/lustre1.conf'.
>>>>>>>> group,     type,      count,   volume,  spc_used,  avg_size
>>>>>>>> somegroup, symlink,    59071,  5.12 MB, 103.16 MB,  91
>>>>>>>> somegroup, dir,       426619,  5.24 GB,   5.24 GB,  12.87 KB
>>>>>>>> somegroup, file,     1310414, 16.24 TB,  13.37 TB,  13.00 MB
>>>>>>>>
>>>>>>>> Total: 1796104 entries, volume: 17866508365925 bytes (16.25 TB),
>>>>>>>> space used: 14704924899840 bytes (13.37 TB)
>>>>>>>>
>>>>>>>> Any ideas what is wrong here?
>>>>>>>>
>>>>>>>> —
>>>>>>>> Dan Szkola
>>>>>>>> FNAL
>>>
>>> _______________________________________________
>>> lustre-discuss mailing list
>>> [email protected]
>>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
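(One sanity check that falls out of the thread: the eight per-OST `files` counts quoted above sum to 1310609, essentially matching Robinhood's 1310414 file count, with the small gap plausibly being churn between measurements. That points the excess at the MDT inode accounting rather than the OSTs. A sketch of the arithmetic, with the values copied from the quoted `lfs quota` output:)

```shell
#!/bin/sh
# Sum the per-OST "files" counts from the thread (OST indices 0-7,
# in order), to compare against the Robinhood/lfs-find file count.
ost_files="132863 120643 190687 195034 185097 145347 169386 171552"

total=0
for n in $ost_files; do
  total=$((total + n))
done
echo "$total"   # prints 1310609
```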
