Right, that’s what I found out as well here (https://heiterbiswolkig.blogs.nde.ag/2024/09/06/pgs-not-deep-scrubbed-in-time/), but I also kind of hoped that this would have been corrected in the meantime. I don’t remember right now if I created a tracker. I’ll check when I have time.

Zitat von Michel Jouvin <michel.jou...@ijclab.in2p3.fr>:

Eugen,

Thanks for 'ceph config help', I always forget about it! But it doesn't really help in this case:

-----

osd_deep_scrub_interval - Deep scrub each PG (i.e., verify data checksums) at least this often
  (float, advanced)
  Default: 604800.000000
  Can update at runtime: true
  Services: [osd]
-----

And clearly it is wrong: it should mention osd + mgr, or probably better 'global'. If you modify it only for the osd service (which is what we did), you end up with deep scrubs being properly scheduled by the OSDs, but with your cluster reported in WARNING state with an incredibly high number of late deep scrubs, which can be worrying...
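As a sketch, assuming the value is in seconds (consistent with the 604800.000000 default above, i.e. 7 days), computing a 14-day interval and applying it to the global section could look like this; the `ceph` call is shown commented out since it needs a live cluster:

```shell
# Deep-scrub interval in seconds: 14 days (example value, adjust to taste)
interval=$((14 * 24 * 3600))
echo "$interval"    # prints 1209600

# Apply it to the 'global' section so the OSDs and the daemon reporting
# the health warning agree on the same value (illustrative only):
# ceph config set global osd_deep_scrub_interval "$interval"
```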

Michel

Le 26/05/2025 à 09:56, Eugen Block a écrit :
It’s reported by the mgr, so you’ll either have to pass global, or both mgr and osd, to the configuration change. You can also run 'ceph config help {CONFIG}' to see which services are related to that configuration value.

Zitat von Michel Jouvin <michel.jou...@ijclab.in2p3.fr>:

The page I checked, https://docs.ceph.com/en/reef/rados/configuration/osd-config-ref/, just describes the parameters, mentioning that they can be defined for all OSDs or just a specific one. But I found nothing about the fact that some of these parameters must be defined as global, since they control both the OSD behaviour and the alarms generated by the mon...

Michel

Le 26/05/2025 à 09:31, Gregory Orange a écrit :
This is a great illustration of the need for this to be global. Is it
documented that way?

There was a discussion on Slack a couple of weeks ago where someone was
asserting that it should be an osd value whereas we always use global -
well, ever since we hit the same problem as you, a few years ago!


On 26/5/25 15:21, Michel Jouvin wrote:
Sorry for the noise, I found the mistake right after sending this
message. We did a `ceph config set osd osd_deep_scrub_interval` instead
of a `ceph config set global...`. As a result, only the OSDs saw the
change. After fixing this, the cluster was back to HEALTH_OK immediately!
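For anyone hitting the same thing, a hedged sketch of the cleanup (assuming the standard `ceph config` subcommands; the 14-day value below is just an example and needs a live cluster to run):

```shell
# See which sections currently hold the option
ceph config dump | grep osd_deep_scrub_interval

# Drop the osd-only override and set it globally instead, so the OSDs
# and the health reporting use the same value
ceph config rm osd osd_deep_scrub_interval
ceph config set global osd_deep_scrub_interval 1209600   # 14 days, in seconds

# Verify what the osd service now resolves the option to
ceph config get osd osd_deep_scrub_interval
```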

Michel

Le 26/05/2025 à 09:17, Michel Jouvin a écrit :
Hi,

Last week we increased osd_deep_scrub_interval from 10 days to 14 days,
as we tended to permanently have 1 PG with a late deep scrub (the PG
changing all the time). We did it with `ceph config set ...`. From
what we have seen, the deep scrubs are now spread over 14 days (the
oldest are 14 days old), meaning that the OSDs took this change into
account (without being restarted). But the number of late deep scrubs
reported by `ceph -s` is ~700, which is unexpected. Does it mean that
the mons (which are in charge of the report, if I am right) have not
seen the change (they have not been restarted)?

Cheers,

Michel


_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



