Hi,
As promised, here is a summary now that the situation is back to normal
(HEALTH_OK). The main issue seems to be related to a new feature in
Squid regarding scrub queueing that leads to some scrubs waiting too
long before being scheduled. A parameter,
osd_scrub_disable_reservation_queuing=true, allows restoring the
previous scrub queueing behaviour and working around the issue. The
issue is tracked at https://tracker.ceph.com/issues/69078, as mentioned
by Frederic, with quite a few details.
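For reference, applying (and later reverting) the workaround looks
roughly like this; osd.0 below is just an example OSD ID, adapt to your
cluster:

```shell
# Apply the workaround cluster-wide (restores pre-Squid scrub queueing):
ceph config set osd osd_scrub_disable_reservation_queuing true

# Verify that an OSD sees the new value (osd.0 is an example):
ceph config show osd.0 osd_scrub_disable_reservation_queuing

# Once all late scrubs have been absorbed, go back to the default:
ceph config rm osd osd_scrub_disable_reservation_queuing
```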
When a large number of scrubs are late, going back to a normal
situation can take quite some time. In particular, we found that
inappropriate values of osd_mclock_max_capacity_iops_hdd/ssd
(we are using the mclock scheduler) on some OSDs can make this "recovery"
(not a PG recovery) longer. And if you can afford it, setting
osd_mclock_profile to high_recovery_ops during this "recovery period"
clearly helps.
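In case it helps, the mclock-related checks and changes we mean are
along these lines (osd.12 is just an example OSD ID):

```shell
# Inspect the IOPS capacity mclock assumes for a given OSD; an
# unrealistically low value throttles background work, scrubs included:
ceph config show osd.12 osd_mclock_max_capacity_iops_hdd

# List any per-OSD capacity overrides stored in the mon config DB:
ceph config dump | grep osd_mclock_max_capacity_iops

# Temporarily favour background work (scrubs/recovery) over client I/O:
ceph config set osd osd_mclock_profile high_recovery_ops

# ...and return to the default profile once caught up:
ceph config set osd osd_mclock_profile balanced
```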
Best regards,
Michel
Le 09/06/2025 à 14:27, Michel Jouvin a écrit :
Hi Frederic,
Thanks for the pointer, I unfortunately missed it when trying to
Google the issue... I made the suggested change 2 hours ago and I
confirm that the situation seems to be improving. It clearly
unblocked deep scrubs, with their active number doubled compared to
before the change, while the shallow scrub numbers remained basically
constant:
1414 active+clean+scrubbing+deep
449 active+clean+scrubbing
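(For the record, per-state PG counts like the ones above can be
obtained by tallying the STATE column of `ceph pg dump pgs_brief`,
something like:)

```shell
# Count PGs per state; column 2 of pgs_brief is the PG state:
ceph pg dump pgs_brief 2>/dev/null | awk '{print $2}' | sort | uniq -c | sort -rn
```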
I read the issue mentioned in the previous thread about the same
problem and I confirm it matches our observations. Like the other
reporter, I'll let the system come back to a normal situation with all
the scrubs done and then try to revert the scrub queueing setting. In
the issue, the assumption seemed to be that a deep scrub was blocking
the shallow scrubs, whereas I tended to think the opposite, as the
number of late shallow scrubs was so high compared to deep scrubs (I'm
skeptical that 1 deep scrub could block 1000 shallow scrubs, as was
initially the case, but I don't have precise facts to back this idea).
I'll update the issue with my findings and try to send a final message
to the list if it works as expected.
Best regards,
Michel
Le 09/06/2025 à 12:08, Frédéric Nass a écrit :
Hi Michel,
You're probably facing this [1].
Best regards,
Frédéric.
[1]
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/FWDE4FSNPLKG4SWKT6IMBPYUXJK6VE63/
----- Le 9 Juin 25, à 11:09, Michel Jouvin
michel.jou...@ijclab.in2p3.fr a écrit :
Apologies, I realize that I mentioned new deep scrub parameters in
Squid where it should have been "new shallow scrub parameters". But it
doesn't change the reasoning.
Michel
Le 09/06/2025 à 11:01, Michel Jouvin a écrit :
Hi,
We upgraded one of our production clusters (480 OSDs, 13.5K PGs, most
pools EC 9+6) from 18.2.7 to 19.2.2 on May 26. It was healthy when we
upgraded it and remained so until last Thursday (June 5) when, running
`ceph -s`, I saw that 1 deep scrub and ~1000 scrubs were late:
1 pgs not deep-scrubbed in time
1168 pgs not scrubbed in time
Looking again on Friday morning (~16 hours later), these numbers had
increased a lot:
27 pgs not deep-scrubbed in time
3252 pgs not scrubbed in time
Checking our configuration on Friday, we found that osd_max_scrubs was
set to 1 instead of using the new default of 3 introduced in Reef
(probably a leftover of a config change after a problem 18 months ago).
We unset the specific value and this led to a reduction of these
numbers over the next 24 hours (~2700 scrubs late), but since then (2
days) the count initially remained stable and is now increasing slowly.
This morning the situation is:
294 pgs not deep-scrubbed in time
3013 pgs not scrubbed in time
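For completeness, checking and removing the old override so the
default applies again was done along these lines (osd.0 is just an
example OSD ID):

```shell
# Show the effective value on one OSD (osd.0 as an example):
ceph config show osd.0 osd_max_scrubs

# Remove our stale override so the default (3 since Reef) applies:
ceph config rm osd osd_max_scrubs
```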
My guess is that the real issue is the late shallow scrubs, which
result in many OSDs reaching the limit of 3 concurrent scrubs, with
the consequence that some deep scrubs cannot run either (I have in
mind that the limit applies to both shallow and deep scrubs; am I
right?).
I checked the main cluster logs and didn't find any error or warning
related to OSDs, such as slow ops or slow requests... The only thing we
have spotted through our monitoring system is a dramatic decrease
(~75%) of IOPS on each OSD server right after the 19.2.2 upgrade, but
that is not necessarily the sign of a problem. I guess it may in
particular be a consequence of the new deep scrub parameters,
osd_shallow_scrub_chunk_min/max, which are probably intended to reduce
the deep scrub IOPS load. The release notes for Squid also mention a
change in osd_op_num_shards_hdd and osd_op_num_threads_per_shard_hdd;
I don't know if they may also have an impact.
Up to now, no user has reported any issue, so it seems to be a problem
with scrubs only. I'm wondering where to start looking, and whether
any issue related to 19.2.2 is already known. We increased the deep
scrub interval from 10 to 14 days a few days before the upgrade (we
saw that there was permanently 1 deep scrub late, a different PG each
time) and kept the standard 7-day interval for scrubs. Looking at the
number of scrubs and deep scrubs per day, it doesn't look weird (see
below).
I guess that restarting all OSDs would clear the problem, but we'd
like to understand what happened and be sure it is not something
related to Squid before upgrading our other production cluster. Any
hint/advice will be highly appreciated. I took a snapshot of `ceph pg
dump pgs_brief` regularly and I'll try to identify whether there are
some stuck scrubs and which OSDs are involved, but with ~500 OSDs and
18 OSD servers, it may not be obvious...
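A rough sketch of what I have in mind for spotting stuck scrubs from
the snapshots (the snapshot filenames below are hypothetical):

```shell
# Take a timestamped snapshot of PG states:
ceph pg dump pgs_brief 2>/dev/null > pgs_$(date +%Y%m%d%H%M).txt

# PGs that were scrubbing in an older snapshot AND are still scrubbing
# in a newer one are candidates for being stuck; intersect the PG IDs:
comm -12 \
  <(awk '$2 ~ /scrubbing/ {print $1}' pgs_202506090800.txt | sort) \
  <(awk '$2 ~ /scrubbing/ {print $1}' pgs_202506091200.txt | sort)
```

The acting OSDs of any PG listed that way can then be read from the
ACTING column of the same pgs_brief output.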
Best regards,
Michel
_Distribution of scrubs_ (first number is the number of scrubs during
the day)
(number increases on June 6 after setting osd_max_scrubs=3)
295  2025-05-25
1578 2025-05-26
300  2025-05-27
392  2025-05-28
578  2025-05-29
707  2025-05-30
611  2025-05-31
819  2025-06-01
679  2025-06-02
724  2025-06-03
698  2025-06-04
726  2025-06-05
1577 2025-06-06
1393 2025-06-07
1962 2025-06-08
645  2025-06-09
_Distribution of deep scrubs per day_
(the number increases on June 6 after setting osd_max_scrubs=3 and
starts to decrease again on June 8, when the number of late scrubs
increases again, probably because we hit the limit of 3 scrubs per OSD)
22   2025-05-12
63   2025-05-13
101  2025-05-14
127  2025-05-15
173  2025-05-16
238  2025-05-17
305  2025-05-18
387  2025-05-19
450  2025-05-20
564  2025-05-21
675  2025-05-22
716  2025-05-23
871  2025-05-24
1071 2025-05-25
801  2025-05-26
188  2025-05-27
292  2025-05-28
335  2025-05-29
409  2025-05-30
371  2025-05-31
514  2025-06-01
440  2025-06-02
504  2025-06-03
478  2025-06-04
546  2025-06-05
1132 2025-06-06
1022 2025-06-07
662  2025-06-08
227  2025-06-09
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io