[ceph-users] Re: Emergency support request for ceph MDS troubleshooting

2025-01-20 Thread Frank Schilder
Hi all, the job is taken. Thanks to anyone considering. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Frank Schilder Sent: Monday, January 20, 2025 11:53 AM To: ceph-users@ceph.io Subject: [ceph-users] Emergenc

[ceph-users] Re: MDS hung in purge_stale_snap_data after populating cache

2025-01-20 Thread Eugen Block
A colleague of mine suggested creating a coredump once the MDS has become stale and then inspecting it with gdb. But if you think it’s more promising to increase the buffer, or maybe it’s quicker to test, then do that first. Quoting Frank Schilder: which is 3758096384. I'm not even sure

[ceph-users] Notes from CSC Weekly 2025-01-20

2025-01-20 Thread Dan van der Ster
Hi all, The CSC Weekly (formerly known as the CLT Weekly) was shortened today due to a US holiday. All planned agenda items were postponed to next week. There was however one technical point to be communicated today: Radek would like all component leads to review PR https://github.com/ceph/ceph/p

[ceph-users] ceph orch ls --refresh

2025-01-20 Thread Alex Hussein-Kershaw (HE/HIM)
Hi Folks, Curious about the --refresh argument on this command (and similar). From experience so far, and a bit of a code read, it seems that: * Passing --refresh will kick the serve loop to run. * It won't wait for it to complete, so data returned may not be the latest. * Re-running the "
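Given that behaviour (the refresh is kicked off but not awaited), here is a minimal sketch of a "trigger refresh, then poll until the data looks current" wrapper. The helper is generic on purpose: the exact CLI semantics described above come from observation and a code read, not documentation, so the ceph invocation is left as an injected callable rather than hard-coded.

```python
import time

def refresh_and_wait(trigger_refresh, fetch, is_fresh, timeout=30.0, interval=1.0):
    """Kick an asynchronous refresh, then poll until the data looks fresh.

    trigger_refresh: callable that requests a refresh (e.g. something that
        runs 'ceph orch ls --refresh') without waiting for completion.
    fetch: callable returning the current data (e.g. 'ceph orch ls' output).
    is_fresh: predicate deciding whether fetch() output is up to date,
        e.g. by comparing a REFRESHED timestamp against when we started.
    """
    trigger_refresh()
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        data = fetch()
        if is_fresh(data):
            return data
        time.sleep(interval)
    raise TimeoutError("refresh did not complete within timeout")
```

The point of the wrapper is exactly the observation in the mail: passing --refresh kicks the serve loop but returns immediately, so anything that needs fresh data has to re-poll itself.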

[ceph-users] Re: MDS hung in purge_stale_snap_data after populating cache

2025-01-20 Thread Frank Schilder
> which is 3758096384. I'm not even sure what the unit is, probably bytes? Sorry, it is bytes. Our items are about 100 bytes on average; that's how we arrive at approximately 37462448 executions of purge_stale_snap_data until the queue is filled up. Best regards, = Frank Schilder AIT R
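For reference, the arithmetic behind those figures: the limit is the hard-coded throttle value quoted elsewhere in the thread, and the ~100-byte average item size is the rough estimate above, so the item count is only an order-of-magnitude check.

```python
# Hard-coded write_buf_throttle limit from src/osdc/Journaler.h:
# UINT_MAX - (UINT_MAX >> 3), i.e. 7/8 of UINT_MAX.
UINT_MAX = 2**32 - 1          # 4294967295
limit = UINT_MAX - (UINT_MAX >> 3)

# With ~100-byte items on average (rough figure from the thread),
# the throttle admits roughly this many purge_stale_snap_data items
# before the journal write buffer fills up and the MDS stalls:
avg_item_size = 100           # bytes, estimate
approx_items = limit // avg_item_size

print(limit)         # 3758096384 (~3.5 GiB)
print(approx_items)  # 37580963, i.e. the same ~37.5 million ballpark
```

This lines up with the observed ~37 million executions before the hang.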

[ceph-users] Re: MDS hung in purge_stale_snap_data after populating cache

2025-01-20 Thread Frank Schilder
> which is 3758096384. I'm not even sure what the unit is, probably bytes? As far as I understand the unit is "list items". They can have variable length. On our system about 400G are allocated while filling up the bufferlist. Best regards, = Frank Schilder AIT Risø Campus Bygnin

[ceph-users] Re: MDS hung in purge_stale_snap_data after populating cache

2025-01-20 Thread Frank Schilder
Hi Eugen, I think the default is just a "reasonably large number" that's not too large. Looking at the code line you found: write_buf_throttle(cct, "write_buf_throttle", UINT_MAX - (UINT_MAX >> 3)), my gut feeling is that rebuilding it with this change (factor 4): write_buf_throttle(cct, "

[ceph-users] Re: MDS hung in purge_stale_snap_data after populating cache

2025-01-20 Thread Eugen Block
Hi, right, I haven't found a parameter to tune for this. Some throttling parameters are tunable, though. For example, when I created https://tracker.ceph.com/issues/66310, I assumed that the default for mgr_mon_messages is too low (which shows up as throttle-mgr_mon_messages in the p

[ceph-users] Re: MDS hung in purge_stale_snap_data after populating cache

2025-01-20 Thread Frank Schilder
Hi Eugen, yeah, I think you found it. That would also mean there is no parameter to scale that. I wonder if it is possible to skip the initial run of purge_stale_snap_data, have a lot of trash in the cache and use the forward-scrub to deal with the stray items. Well, we got in touch with some

[ceph-users] Re: Help needed: s3cmd set ACL command produces S3 error: 400 (InvalidArgument) in squid ceph version.

2025-01-20 Thread Saif Mohammad
Thanks Stephan ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Re: Help needed: s3cmd set ACL command produces S3 error: 400 (InvalidArgument) in squid ceph version.

2025-01-20 Thread Stephan Hohn
Hi Mohammad, this seems to be a bug in the current squid version. https://tracker.ceph.com/issues/69527 Cheers Stephan On Mon, Jan 20, 2025 at 11:56 AM, Saif Mohammad < samdto...@gmail.com> wrote: > Hello Community, > > We are trying to set ACL for one of the objects by s3cmd tool within t

[ceph-users] Re: MDS hung in purge_stale_snap_data after populating cache

2025-01-20 Thread Eugen Block
It looks like a hard-coded max for the throttle: write_buf_throttle(cct, "write_buf_throttle", UINT_MAX - (UINT_MAX >> 3)), which is 3758096384. I'm not even sure what the unit is, probably bytes? https://github.com/ceph/ceph/blob/v16.2.15/src/osdc/Journaler.h#L410 Quoting Frank Schilder:

[ceph-users] osd won't restart

2025-01-20 Thread Dominique Ramaekers
Hi, A strange thing just happened (ceph v19.2.0). I added two disks to a host. The kernel recognized the two disks nicely and they appeared as available devices in ceph. After 15 minutes no OSDs had been created, so I looked at the logs: /usr/bin/docker: stderr --> Creating keyring file for osd.36 /usr/

[ceph-users] Help needed: s3cmd set ACL command produces S3 error: 400 (InvalidArgument) in squid ceph version.

2025-01-20 Thread Saif Mohammad
Hello Community, We are trying to set the ACL for one of the objects in a bucket to public using the s3cmd tool with the command below, but we are unable to set it in the squid ceph version. The same worked in the reef version, where we successfully set it public. Please let

[ceph-users] Emergency support request for ceph MDS troubleshooting

2025-01-20 Thread Frank Schilder
Dear all, this is a request to companies/consultants with development experience on ceph for a contract to help us out of our current file system outage. If you can offer help, please send a PM directly back to me. Short description with links to what we found out already: We experience a tota

[ceph-users] Re: MDS hung in purge_stale_snap_data after populating cache

2025-01-20 Thread Frank Schilder
Hi Eugen, thanks for your input. I can't query the hung MDS, but the others say this here: ceph tell mds.ceph-14 perf dump throttle-write_buf_throttle { "throttle-write_buf_throttle": { "val": 0, "max": 3758096384, "get_started": 0, "get": 5199, "get_su
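A small sketch of pulling the fill ratio out of such a dump, using a trimmed-down copy of the JSON above (the real perf dump contains many more counters than shown here):

```python
import json

# Abbreviated example of 'ceph tell mds.X perf dump throttle-write_buf_throttle'
# output; the counter values are the ones quoted in the mail.
perf_dump = json.loads("""
{
  "throttle-write_buf_throttle": {
    "val": 0,
    "max": 3758096384,
    "get_started": 0,
    "get": 5199
  }
}
""")

def throttle_saturation(dump, name="throttle-write_buf_throttle"):
    """Return the current fill ratio (val/max) of a throttle in a perf dump."""
    t = dump[name]
    return t["val"] / t["max"]

print(throttle_saturation(perf_dump))  # 0.0 -- this MDS is not blocked
```

A healthy MDS reports val near 0; an MDS stuck behind this throttle would show val at or near max.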

[ceph-users] Re: MDS hung in purge_stale_snap_data after populating cache

2025-01-20 Thread Eugen Block
Hi Frank, are you able to query the daemon while it's trying to purge the snaps? pacific:~ # ceph tell mds.{your_daemon} perf dump throttle-write_buf_throttle ... "max": 3758096384, I don't know yet where that "max" setting comes from, but I'll keep looking. Quoting Frank Schilder: