OK, I disabled quotas, waited a while, then re-enabled them. I still get the same numbers. The only thing I can think is that somehow the count is correct, despite the huge difference. Robinhood and lfs find both show about 1.7M files, dirs, and links. The quota is showing a bit over 3.1M inodes used. We have only one MDS and one MGS. Any ideas where the discrepancy may lie? Orphans? Is there a lost+found area in Lustre?
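(For what it's worth: Lustre does have a hidden lost+found, at `.lustre/lost+found` under the filesystem root, and LFSCK is the tool that detects orphaned objects and relinks them there. A dry-run sketch of checking for orphans follows; the commands are echoed rather than executed, and the MDT device name is an assumption, so substitute your real one.)

```shell
#!/bin/sh
# Dry-run sketch only: each command is echoed, not executed. Remove the
# leading "echo" to actually run it on the MDS. LFSCK's namespace/layout
# phases can relink orphaned objects into the hidden .lustre/lost+found
# directory at the filesystem root.
MDT=lustre1-MDT0000   # assumed device name; substitute your real MDT

echo lctl lfsck_start -M "$MDT" -t all   # start namespace/layout checks
echo lctl lfsck_query -M "$MDT"          # poll until the run completes
echo ls /lustre1/.lustre/lost+found      # inspect any recovered orphans
```

Whether LFSCK will also correct the per-group inode accounting on a ZFS backend is something the ZFS/Lustre folks would have to confirm.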
—
Dan Szkola
FNAL

> On Oct 10, 2023, at 8:24 AM, Daniel Szkola <[email protected]> wrote:
>
> Hi Robert,
>
> Thanks for the response. Do you remember exactly how you did it? Did you
> bring everything down at any point? I know you can do this:
>
> lctl conf_param fsname.quota.mdt=none
>
> but is that all you did? Did you wait, or bring everything down, before
> re-enabling? I'm worried because that allegedly just enables/disables
> enforcement, and space accounting is always on. Andreas stated that
> quotas are controlled by ZFS, but quota support has never been enabled
> on any of the ZFS volumes in our Lustre filesystem.
>
> —
> Dan Szkola
> FNAL
>
>> On Oct 10, 2023, at 2:17 AM, Redl, Robert <[email protected]> wrote:
>>
>> Dear Dan,
>>
>> I had a similar problem some time ago. We are also using ZFS for the
>> MDT and OSTs. For us, the used disk space was reported wrong. The
>> problem was fixed by switching quota support off on the MGS and then
>> on again.
>>
>> Cheers,
>> Robert
>>
>>> On Oct 9, 2023, at 17:55, Daniel Szkola via lustre-discuss
>>> <[email protected]> wrote:
>>>
>>> Thanks, I will look into the ZFS quota, since we are using ZFS for
>>> all storage, MDT and OSTs.
>>>
>>> In our case, there is a single MDS/MDT. I have used Robinhood and
>>> lfs find (by group) commands to verify what the numbers should be.
>>>
>>> —
>>> Dan Szkola
>>> FNAL
>>>
>>>> On Oct 9, 2023, at 10:13 AM, Andreas Dilger <[email protected]> wrote:
>>>>
>>>> The quota accounting is controlled by the backing filesystem of the
>>>> OSTs and MDTs.
>>>>
>>>> For ldiskfs/ext4 you could run e2fsck to re-count all of the inode
>>>> and block usage.
>>>>
>>>> For ZFS you would have to ask on the ZFS list to see if there is
>>>> some way to re-count the quota usage.
>>>>
>>>> The "inode" quota is accounted from the MDTs, while the "block"
>>>> quota is accounted from the OSTs.
>>>> You might be able to run "lfs quota -v -g group" to see whether one
>>>> particular MDT is returning too many inodes.
>>>>
>>>> Possibly, if you have directories that are striped across many MDTs,
>>>> that would inflate the used inode count. For example, if every one
>>>> of the 426k directories reported by RBH was striped across 4 MDTs,
>>>> then you would see the inode count add up to 3.6M.
>>>>
>>>> If that were the case, then I would really, really advise against
>>>> striping every directory in the filesystem. That will cause problems
>>>> far worse than just inflating the inode quota accounting.
>>>>
>>>> Cheers, Andreas
>>>>
>>>>> On Oct 9, 2023, at 22:33, Daniel Szkola via lustre-discuss
>>>>> <[email protected]> wrote:
>>>>>
>>>>> Is there really no way to force a recount of files used by the
>>>>> quota? All indications are that we have accounts where files were
>>>>> removed and this is not reflected in the used file count in the
>>>>> quota. The space used seems correct, but the inodes-used numbers
>>>>> are way too high. There must be a way to clear these numbers and
>>>>> have a fresh count done.
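(The per-MDT check Andreas suggests above can be scripted. A minimal sketch follows; the sample text is illustrative, not verbatim `lfs quota -v` output, so the awk field index is an assumption to verify against your version's actual column layout.)

```shell
#!/bin/sh
# Sketch: sum the per-MDT "files" column out of `lfs quota -v -g group`
# output. The sample below is illustrative, not verbatim lfs output;
# check the field positions on your Lustre version before relying on
# the awk index used here.
sample='lustre1-MDT0000_UUID 524288 - 1048576 - 3136761 - -
lustre1-OST0000_UUID 1394853459 - 1913344192 - 132863 - -'

# In this illustrative layout, field 6 holds the files count; only
# MDT lines are summed, since inode quota is accounted on the MDTs.
echo "$sample" | awk '/MDT/ { mdt += $6 } END { print mdt }'
```

With a single MDS/MDT, as in this thread, the one MDT line has to account for the entire inflated figure on its own.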
>>>>>
>>>>> —
>>>>> Dan Szkola
>>>>> FNAL
>>>>>
>>>>>> On Oct 4, 2023, at 11:37 AM, Daniel Szkola via lustre-discuss
>>>>>> <[email protected]> wrote:
>>>>>>
>>>>>> Also, quotas on the OSTs don't add up to anywhere near 3 million
>>>>>> files either:
>>>>>>
>>>>>> [root@lustreclient scratch]# ssh ossnode0 lfs quota -g somegroup -I 0 /lustre1
>>>>>> Disk quotas for grp somegroup (gid 9544):
>>>>>>     Filesystem      kbytes  quota       limit  grace   files  quota  limit  grace
>>>>>>             1394853459      0  1913344192      -  132863      0      0      -
>>>>>> [root@lustreclient scratch]# ssh ossnode0 lfs quota -g somegroup -I 1 /lustre1
>>>>>> Disk quotas for grp somegroup (gid 9544):
>>>>>>     Filesystem      kbytes  quota       limit  grace   files  quota  limit  grace
>>>>>>             1411579601      0  1963246413      -  120643      0      0      -
>>>>>> [root@lustreclient scratch]# ssh ossnode1 lfs quota -g somegroup -I 2 /lustre1
>>>>>> Disk quotas for grp somegroup (gid 9544):
>>>>>>     Filesystem      kbytes  quota       limit  grace   files  quota  limit  grace
>>>>>>             1416507527      0  1789950778      -  190687      0      0      -
>>>>>> [root@lustreclient scratch]# ssh ossnode1 lfs quota -g somegroup -I 3 /lustre1
>>>>>> Disk quotas for grp somegroup (gid 9544):
>>>>>>     Filesystem      kbytes  quota       limit  grace   files  quota  limit  grace
>>>>>>             1636465724      0  1926578117      -  195034      0      0      -
>>>>>> [root@lustreclient scratch]# ssh ossnode2 lfs quota -g somegroup -I 4 /lustre1
>>>>>> Disk quotas for grp somegroup (gid 9544):
>>>>>>     Filesystem      kbytes  quota       limit  grace   files  quota  limit  grace
>>>>>>             2202272244      0  3020159313      -  185097      0      0      -
>>>>>> [root@lustreclient scratch]# ssh ossnode2 lfs quota -g somegroup -I 5 /lustre1
>>>>>> Disk quotas for grp somegroup (gid 9544):
>>>>>>     Filesystem      kbytes  quota       limit  grace   files  quota  limit  grace
>>>>>>             1324770165      0  1371244768      -  145347      0      0      -
>>>>>> [root@lustreclient scratch]# ssh ossnode3 lfs quota -g somegroup -I 6 /lustre1
>>>>>> Disk quotas for grp somegroup (gid 9544):
>>>>>>     Filesystem      kbytes  quota       limit  grace   files  quota  limit  grace
>>>>>>             2892027349      0  3221225472      -  169386      0      0      -
>>>>>> [root@lustreclient scratch]# ssh ossnode3 lfs quota -g somegroup -I 7 /lustre1
>>>>>> Disk quotas for grp somegroup (gid 9544):
>>>>>>     Filesystem      kbytes  quota       limit  grace   files  quota  limit  grace
>>>>>>             2076201636      0  2474853207      -  171552      0      0      -
>>>>>>
>>>>>> —
>>>>>> Dan Szkola
>>>>>> FNAL
>>>>>>
>>>>>>> On Oct 4, 2023, at 8:45 AM, Daniel Szkola via lustre-discuss
>>>>>>> <[email protected]> wrote:
>>>>>>>
>>>>>>> No combination of ossnodek runs has helped with this.
>>>>>>>
>>>>>>> Again, robinhood shows 1796104 files for the group, and an 'lfs
>>>>>>> find -G gid' found 1796104 files as well.
>>>>>>>
>>>>>>> So why is the quota command showing over 3 million inodes used?
>>>>>>>
>>>>>>> There must be a way to force it to recount, or to clear all stale
>>>>>>> quota data and have it regenerated.
>>>>>>>
>>>>>>> Anyone?
>>>>>>>
>>>>>>> —
>>>>>>> Dan Szkola
>>>>>>> FNAL
>>>>>>>
>>>>>>>
>>>>>>>> On Sep 27, 2023, at 9:42 AM, Daniel Szkola via lustre-discuss
>>>>>>>> <[email protected]> wrote:
>>>>>>>>
>>>>>>>> We have a Lustre filesystem that we just upgraded to 2.15.3;
>>>>>>>> however, this problem has been going on for some time.
>>>>>>>>
>>>>>>>> The quota command shows this:
>>>>>>>>
>>>>>>>> Disk quotas for grp somegroup (gid 9544):
>>>>>>>>     Filesystem    used  quota  limit  grace    files    quota    limit    grace
>>>>>>>>       /lustre1  13.38T    40T    45T      -  3136761*  2621440  3670016  expired
>>>>>>>>
>>>>>>>> The group is not using nearly that many files. We have robinhood
>>>>>>>> installed and it shows this:
>>>>>>>>
>>>>>>>> Using config file '/etc/robinhood.d/lustre1.conf'.
>>>>>>>> group,     type,      count,   volume,  spc_used,  avg_size
>>>>>>>> somegroup, symlink,    59071,  5.12 MB, 103.16 MB,  91
>>>>>>>> somegroup, dir,       426619,  5.24 GB,   5.24 GB,  12.87 KB
>>>>>>>> somegroup, file,     1310414, 16.24 TB,  13.37 TB,  13.00 MB
>>>>>>>>
>>>>>>>> Total: 1796104 entries, volume: 17866508365925 bytes (16.25 TB),
>>>>>>>> space used: 14704924899840 bytes (13.37 TB)
>>>>>>>>
>>>>>>>> Any ideas what is wrong here?
>>>>>>>>
>>>>>>>> —
>>>>>>>> Dan Szkola
>>>>>>>> FNAL
>>>
>>> _______________________________________________
>>> lustre-discuss mailing list
>>> [email protected]
>>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
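(One sanity check that falls out of the thread: the eight per-OST `files` counts quoted above sum to 1310609, essentially matching Robinhood's 1310414 file count, with the small gap plausibly being churn between measurements. That points the excess at the MDT inode accounting rather than the OSTs. A sketch of the arithmetic, with the values copied from the quoted `lfs quota` output:)

```shell
#!/bin/sh
# Sum the per-OST "files" counts from the thread (OST indices 0-7,
# in order), to compare against the Robinhood/lfs-find file count.
ost_files="132863 120643 190687 195034 185097 145347 169386 171552"

total=0
for n in $ost_files; do
  total=$((total + n))
done
echo "$total"   # prints 1310609
```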
