[ceph-users] Re: MDS hung in purge_stale_snap_data after populating cache

2025-01-23 Thread Frank Schilder
… From: Eugen Block Sent: Monday, January 20, 2025 6:49 PM To: Frank Schilder Cc: ceph-users@ceph.io Subject: Re: [ceph-users] Re: MDS hung in purge_stale_snap_data after populating cache
A colleague of mine s…

2025-01-20 Thread Eugen Block
… From: Frank Schilder Sent: Monday, January 20, 2025 1:51 PM To: Eugen Block Cc: ceph-users@ceph.io Subject: [ceph-users] Re: MDS hung in purge_stale_snap_data after populating cache
which is 3758096384. I'm not even sure what the unit is, …

2025-01-20 Thread Frank Schilder
… From: Frank Schilder Sent: Monday, January 20, 2025 1:51 PM To: Eugen Block Cc: ceph-users@ceph.io Subject: [ceph-users] Re: MDS hung in purge_stale_snap_data after populating cache
> which is 3758096384. I …

2025-01-20 Thread Frank Schilder
… From: Frank Schilder Sent: Monday, January 20, 2025 1:38 PM To: Eugen Block Cc: ceph-users@ceph.io Subject: Re: [ceph-users] Re: MDS hung in purge_stale_snap_data after populating cache
Hi Eugen, I think the default is …

2025-01-20 Thread Frank Schilder
… From: Eugen Block Sent: Monday, January 20, 2025 1:25 PM To: Frank Schilder Cc: ceph-users@ceph.io Subject: Re: [ceph-users] Re: MDS hung in purge_stale_snap_data after populating cache
Hi, right, I ha…

2025-01-20 Thread Eugen Block
… Monday, January 20, 2025 12:40 PM To: Frank Schilder Cc: ceph-users@ceph.io Subject: Re: [ceph-users] Re: MDS hung in purge_stale_snap_data after populating cache
It looks like a hard-coded max for the throttle: write_buf_throttle(cct, "write_buf_throttle", UINT_MAX - (UINT_MAX >> …

2025-01-20 Thread Frank Schilder
… Frank Schilder Cc: ceph-users@ceph.io Subject: Re: [ceph-users] Re: MDS hung in purge_stale_snap_data after populating cache
It looks like a hard-coded max for the throttle: write_buf_throttle(cct, "write_buf_throttle", UINT_MAX - (UINT_MAX >> 3)), which is 3758096384. I'm n…
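The number quoted for the hard-coded throttle maximum can be checked directly. A minimal Python sketch, assuming a 32-bit C/C++ unsigned int (so UINT_MAX is 2**32 - 1); write_buf_throttle_max is a hypothetical helper name, not Ceph code. The GiB conversion additionally assumes the throttle counts bytes, which the thread itself leaves open.

```python
# Assumption: the C/C++ unsigned int is 32 bits wide, so UINT_MAX == 2**32 - 1.
UINT_MAX = 2**32 - 1


def write_buf_throttle_max(uint_max=UINT_MAX):
    """Hypothetical helper reproducing UINT_MAX - (UINT_MAX >> 3)."""
    # uint_max >> 3 is floor division by 8: 536870911 for a 32-bit UINT_MAX.
    return uint_max - (uint_max >> 3)


limit = write_buf_throttle_max()
print(limit)          # 3758096384
print(hex(limit))     # 0xe0000000
# Only meaningful if the throttle counts bytes (unconfirmed in the thread).
print(limit / 2**30)  # 3.5 (GiB)
```

Since x >> 3 is x // 8, the cap is roughly seven eighths of UINT_MAX and works out to exactly 0xE0000000.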

2025-01-20 Thread Eugen Block
…ed. Best regards, Frank Schilder
From: Eugen Block Sent: Monday, January 20, 2025 11:12 AM To: ceph-users@ceph.io Subject: [ceph-users] Re: MDS hung in purge_stale_snap_data after populating …

2025-01-20 Thread Frank Schilder
… From: Eugen Block Sent: Monday, January 20, 2025 11:12 AM To: ceph-users@ceph.io Subject: [ceph-users] Re: MDS hung in purge_stale_snap_data after populating cache
Hi Frank, are you able to …

2025-01-20 Thread Eugen Block
… From: Frank Schilder Sent: Friday, January 17, 2025 3:02 PM To: Bailey Allison; ceph-users@ceph.io Subject: [ceph-users] Re: MDS hung in purge_stale_snap_data after populating cache
Hi Bailey.
ceph-14 (rank=0): num_stray=205532
ceph-13 (rank=1): …

2025-01-19 Thread Frank Schilder
…ocklist_on_timeout false
Best regards, Frank Schilder
From: Bailey Allison Sent: Thursday, January 16, 2025 10:08 PM To: ceph-users@ceph.io Subject: [ceph-users] Re: MDS hung in purge_stale_snap_data after populating cache
Fra…

2025-01-17 Thread Bailey Allison
… From: Bailey Allison Sent: Thursday, January 16, 2025 10:08 PM To: ceph-users@ceph.io Subject: [ceph-users] Re: MDS hung in purge_stale_snap_data after populating cache
Frank, are you able to share an up-to-date ceph config dump a…

2025-01-17 Thread Frank Schilder
… 10:08 PM To: ceph-users@ceph.io Subject: [ceph-users] Re: MDS hung in purge_stale_snap_data after populating cache
Frank, are you able to share an up-to-date ceph config dump and ceph daemon mds.X perf dump | grep strays from the cluster? We're just getting through our comically long …

2025-01-16 Thread Bailey Allison
Frank, are you able to share an up-to-date ceph config dump and ceph daemon mds.X perf dump | grep strays from the cluster? We're just getting through our comically long Ceph outage, so I'd like to be able to share the love here hahahaha
Regards, Bailey Allison, Service Team Lead, 45Driv…
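The grep for strays can also be done on the perf dump JSON itself, which keeps each counter paired with its section name. A minimal Python sketch, assuming the output of ceph daemon mds.X perf dump has been captured as JSON text; find_stray_counters is a hypothetical helper, and the sample section names, counter names and values are made up for illustration, not taken from the cluster in this thread.

```python
import json


def find_stray_counters(perf_dump, needle="stray", prefix=""):
    """Recursively collect counters whose key contains `needle`.

    Walks any nested JSON object, so it does not rely on the exact
    section layout of a particular Ceph release.
    """
    hits = {}
    for key, value in perf_dump.items():
        name = f"{prefix}.{key}" if prefix else key
        if needle in key:
            # Matching counter; compound values (dicts) are kept whole.
            hits[name] = value
        elif isinstance(value, dict):
            # Descend into sections, e.g. a hypothetical "mds_cache".
            hits.update(find_stray_counters(value, needle, name))
    return hits


# Illustrative, hypothetical perf-dump fragment (names and values assumed).
raw = '{"mds_cache": {"num_strays": 42, "strays_created": 7}, "mds": {"inodes": 1000}}'
sample = json.loads(raw)

for name, value in sorted(find_stray_counters(sample).items()):
    print(name, value)  # mds_cache.num_strays 42, then mds_cache.strays_created 7
```

Feeding in real output works the same way, e.g. via json.load on a saved file; unlike a line-based grep, each hit is reported with its full section path.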

2025-01-16 Thread Frank Schilder
I think I finally found the moment where everything goes downhill. Please take a look at this comment: https://tracker.ceph.com/issues/69547?next_issue_id=69546#note-4 . This looks a lot like a timeout, but I have no clue what to look for. Any hint is greatly appreciated. Thanks and best regar…

2025-01-16 Thread Frank Schilder
The MDS was up overnight and started showing CPU load again. I added a screenshot to the imgur post (https://imgur.com/a/mds-hung-purge-stale-snap-data-after-populating-cache-RF7ExSP). Unfortunately, it's only the messenger threads. The MDS itself seems to be idling. Best regards, …