Hi everyone!

Just thought I would let everyone know: The issue appears to have been the Ceph NFS service associated with the filesystem.

I removed all the files, waited a while, disconnected all the clients, waited a while, then deleted the NFS shares - the disk space and objects abruptly began freeing up.

I'm sorry that I can't contribute any more useful diagnostic information, but maybe this is the extra bit of data that crystallizes someone's theory about the issue.

On 21/03/2024 10:33 am, Anthony D'Atri wrote:
Grep through the ls output for ‘rados bench’ leftovers; it’s easy to leave them 
behind.
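
A minimal sketch of that check, assuming the listing was saved from "rados ls" with one object name per line. It also assumes bench objects carry the "benchmark_data" prefix, which is my understanding of the rados bench default naming; the sample names are hypothetical.

```python
def find_bench_leftovers(listing_lines, prefix="benchmark_data"):
    """Return object names that look like 'rados bench' leftovers.

    The prefix is an assumption about the default naming; adjust it if
    your benchmark runs used a different one.
    """
    stripped = (line.strip() for line in listing_lines)
    return [name for name in stripped if name.startswith(prefix)]


# Hypothetical listing, standing in for the saved 'rados ls' output.
sample_listing = [
    "10000000abc.00000000",
    "benchmark_data_node1_12345_object0",
    "benchmark_data_node1_12345_object1",
    "10000000abc.00000001",
]

leftovers = find_bench_leftovers(sample_listing)
for name in leftovers:
    print(name)
```

For a real pool, the list would be read from the saved listing file instead of the inline sample.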

On Mar 20, 2024, at 5:28 PM, Igor Fedotov <igor.fedo...@croit.io> wrote:

Hi Thorne,

Unfortunately, I'm unaware of any tools high-level enough to easily map files to 
RADOS objects without a deep understanding of how this works. You might want to 
try the "rados ls" command to get a list of all the objects in the CephFS data 
pool, then learn how that mapping is performed and parse your listing.


Thanks,

Igor
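
As a rough illustration of parsing such a listing: CephFS data objects are normally named "<inode number in hex>.<object index as 8 hex digits>", so a listing can be grouped by inode. This is a sketch under that assumption, not a complete mapping tool; the sample names are hypothetical.

```python
import re
from collections import defaultdict

# Assumed naming: "<inode in hex>.<object index, 8 hex digits>".
DATA_OBJECT = re.compile(r"^([0-9a-f]+)\.([0-9a-f]{8})$")


def group_by_inode(listing_lines):
    """Count data objects per decimal inode; collect names that do not match."""
    per_inode = defaultdict(int)
    unmatched = []
    for raw in listing_lines:
        name = raw.strip()
        if not name:
            continue
        match = DATA_OBJECT.match(name)
        if match:
            per_inode[int(match.group(1), 16)] += 1
        else:
            unmatched.append(name)
    return dict(per_inode), unmatched


# Hypothetical listing, standing in for saved 'rados ls' output.
sample_listing = [
    "10000000abc.00000000",
    "10000000abc.00000001",
    "10000000def.00000000",
    "benchmark_data_node1_12345_object0",
]

counts, others = group_by_inode(sample_listing)
for inode, n_objects in sorted(counts.items()):
    print(inode, n_objects)
```

On a mounted client, a decimal inode from this output could then be looked up with something like "find <mountpoint> -inum <inode>". Inodes that hold many objects but match no file would be the ones worth a closer look.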

On 3/20/2024 1:30 AM, Thorne Lawler wrote:

Igor,

Those files are VM disk images, and they're under constant heavy use, so yes, 
there /is/ constant severe write load against this disk.

Apart from writing more test files into the filesystems, there must be Ceph 
diagnostic tools to describe what those objects are being used for, surely?

We're talking about an extra 10TB of space. How hard can it be to determine 
which file those objects are associated with?

On 19/03/2024 8:39 pm, Igor Fedotov wrote:

Hi Thorne,

Given the number of files on the CephFS volume, I presume you don't have a 
severe write load against it. Is that correct?

If so, we can assume that the numbers you're sharing mostly refer to your 
experiment. At peak I can see a bytes_used increase of 629,461,893,120 bytes 
(45978612027392 - 45349150134272). With a replication factor of 3, this roughly 
matches your written data (200GB, I presume?).
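
The arithmetic can be checked with a few lines. The figures are the ones quoted in this thread; the replication factor of 3 is the one assumed in the paragraph above.

```python
# bytes_used values quoted in the thread.
BYTES_USED_BEFORE = 45349150134272
BYTES_USED_PEAK = 45978612027392
REPLICAS = 3  # replication factor assumed above

raw_increase = BYTES_USED_PEAK - BYTES_USED_BEFORE
logical_bytes = raw_increase / REPLICAS  # data written before replication

print(f"raw increase: {raw_increase:,} bytes")
print(f"per replica:  {logical_bytes / 1e9:.1f} GB ({logical_bytes / 2**30:.1f} GiB)")
```

The per-replica figure is close to 200 GB whether it is read in decimal or binary units, which fits the estimate above.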


More interesting is that after the file's removal we can see a delta of 
419,450,880 bytes (= 45349569585152 - 45349150134272). I can see two options to 
explain this (apart from someone else having written additional data to CephFS 
during the experiment):

1. File removal hadn't completed at the last probe, half an hour after the 
file's removal. Did you see a stale object counter when making that probe?

2. Some space is leaking. If that's the case, this could be the reason for your 
issue if huge(?) files on CephFS are created and removed periodically. So if 
we're certain that the leak really occurred (and option 1 above isn't the case), 
it makes sense to run more experiments, writing and removing a bunch of huge 
files on the volume, to confirm the space leakage.
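
For repeated experiments of this kind, the counters could be pulled out of the JSON report rather than read by eye. The sketch below assumes the output of "ceph df detail --format json" has a top-level "pools" list whose entries carry "name" and a "stats" dictionary with "objects" and "bytes_used"; check this against your release. The pool name "cephfs_data" is hypothetical, and the sample document reuses figures from this thread purely for illustration.

```python
import json


def pool_counters(df_json_text, pool_name):
    """Return (objects, bytes_used) for one pool from a 'ceph df' JSON report.

    Assumes the layout {"pools": [{"name": ..., "stats": {...}}, ...]}.
    """
    report = json.loads(df_json_text)
    for pool in report.get("pools", []):
        if pool.get("name") == pool_name:
            stats = pool.get("stats", {})
            return stats.get("objects"), stats.get("bytes_used")
    raise KeyError(f"pool not found: {pool_name}")


# Hypothetical report, standing in for captured command output.
sample_report = json.dumps({
    "pools": [
        {
            "name": "cephfs_data",
            "id": 2,
            "stats": {"objects": 3885835, "bytes_used": 45349150134272},
        }
    ]
})

objects, bytes_used = pool_counters(sample_report, "cephfs_data")
print(objects, bytes_used)
```

Sampling this before a write, after it, and at intervals after the delete would show whether the counters return to the baseline.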

On 3/18/2024 3:12 AM, Thorne Lawler wrote:
Thanks Igor,

I have tried that, and the number of objects and bytes_used took a long time to 
drop, but they seem to have dropped back to almost the original level:

  * Before creating the file:
      o 3885835 objects
      o 45349150134272 bytes_used
  * After creating the file:
      o 3931663 objects
      o 45924147249152 bytes_used
  * Immediately after deleting the file:
      o 3935995 objects
      o 45978612027392 bytes_used
  * Half an hour after deleting the file:
      o 3886013 objects
      o 45349569585152 bytes_used
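
For reference, the change at each probe relative to the first one can be worked out as follows. The figures are the ones listed above; the variable names are only illustrative.

```python
# (label, objects, bytes_used) for each probe listed above.
probes = [
    ("before creating the file", 3885835, 45349150134272),
    ("after creating the file", 3931663, 45924147249152),
    ("immediately after deleting the file", 3935995, 45978612027392),
    ("half an hour after deleting the file", 3886013, 45349569585152),
]

_, base_objects, base_bytes = probes[0]

# Difference of every probe against the first (baseline) probe.
deltas = [
    (label, objects - base_objects, used - base_bytes)
    for label, objects, used in probes
]

for label, d_objects, d_bytes in deltas:
    print(f"{label}: {d_objects:+,} objects, {d_bytes:+,} bytes")
```

The last probe ends up 178 objects and 419,450,880 bytes above the baseline.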

Unfortunately, this is all production infrastructure, so there is always other 
activity taking place.

What tools are there to visually inspect the object map and see how it relates 
to the filesystem?

Not sure if there is anything like that at the CephFS level, but you can use the 
rados tool to view objects in the CephFS data pool and try to build a mapping 
between them and the CephFS file list. It could be a bit tricky, though.
On 15/03/2024 7:18 pm, Igor Fedotov wrote:
ceph df detail --format json-pretty
--

Regards,

Thorne Lawler - Senior System Administrator
*DDNS* | ABN 76 088 607 265
First registrar certified ISO 27001-2013 Data Security Standard ITGOV40172
P +61 499 449 170

_DDNS

/_*Please note:* The information contained in this email message and any 
attached files may be confidential information, and may also be the subject of 
legal professional privilege. _If you are not the intended recipient any use, 
disclosure or copying of this email is unauthorised. _If you received this 
email in error, please notify Discount Domain Name Services Pty Ltd on 03 9815 
6868 to report this matter and delete all copies of this transmission together 
with any attachments. /

--
Igor Fedotov
Ceph Lead Developer

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263
Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io