Hi,
do you have the command history? I know it's unlikely, but maybe you
set a quota of 45 TB instead of 4.5, so it didn't prevent the pool
from becoming full? 4.5 TB written out in bytes has a lot of digits. ;-)
Did you set it via the Dashboard or the CLI? Did you verify that you
had set the correct quota? For example:
getfattr -n ceph.quota.max_bytes /mnt/test-quota/
# file: mnt/test-quota/
ceph.quota.max_bytes="100034150"
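If in doubt, you could simply set it again, explicitly in bytes (the
path below is just my example path again); 4.5 TB would be:

setfattr -n ceph.quota.max_bytes -v 4500000000000 /mnt/test-quota/

That's 13 digits, so a missing or extra digit is easy to overlook.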
The fact that IO was blocked only after you set a quota of 600 GB
sounds a bit like the original quota might not have been 4.5 TB. But
without further "proof" it's hard to tell.
Regards,
Eugen
Quoting Massimo Sgaravatto <[email protected]>:
Dear all
We have a portion of a CephFS file system that maps to a Ceph pool called
cephfs_data_ssd.
If I perform a "du -sh" on this portion of the file system, I see that the
value matches the "STORED" field of the "ceph df" output for the
cephfs_data_ssd pool.
So far so good.
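As a side note, assuming the client exposes the CephFS virtual extended
attributes, the recursive size that CephFS itself accounts for can also
be read directly, which is usually much faster than a "du -sh" (the path
below is just a placeholder):
getfattr -n ceph.dir.rbytes /path/to/that/directory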
I set a quota of 4.5 TB for this file system area.
During the weekend, this pool (and other pools of the same device class)
became nearfull.
A "ceph df" showed that the problem was indeed in the the cephfs_data_ssd
pool, with a reported usage of 7 TiB of data (21 TiB in replica 3):
POOL             ID  PGS  STORED   OBJECTS  USED    %USED  MAX AVAIL
cephfs_data_ssd  62   32  7.1 TiB  2.02M    21 TiB  89.06   898 GiB
This sounds strange to me because I set a quota of 4.5 TB in that area, and
because a "du -sh" of the relevant directory showed a usage of 600 GB.
When I lowered the quota from 4.5 TB to 600 GB, the jobs writing in that
area failed with "disk quota exceeded" errors, and after a while the
space was released.
The only explanation I can think of is that, as far as I understand,
CephFS can take a while to release the space of deleted files
(https://docs.ceph.com/en/reef/dev/delayed-delete/).
This would also be consistent with the fact that some jobs appear to have
been performing a lot of writes and deletions (they kept writing a ~5 GB
checkpoint file and deleting the previous one after each iteration).
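A possible way to check whether delayed deletion is what was holding the
space (assuming access to the admin socket of the active MDS; the daemon
name and the exact counter names may vary between releases) would be to
watch the stray and purge-queue perf counters while the pool fills up:

ceph daemon mds.<active-mds-name> perf dump | grep -E 'num_strays|pq_executing'

A large or steadily growing number of strays / executing purge items
would point at deferred deletion rather than new data.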
How can I tell from the log files whether this was indeed the problem?
Or do you have some other possible explanations for this problem?
And, most importantly, how can I prevent scenarios such as this one?
Thanks, Massimo
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]