Right, that’s what I found out as well here (https://heiterbiswolkig.blogs.nde.ag/2024/09/06/pgs-not-deep-scrubbed-in-time/), but I also kind of hoped that this would have been corrected in the meantime. I don’t remember right now if I created a tracker. I’ll check when I have time.

Zitat von Michel Jouvin <michel.jou...@ijclab.in2p3.fr>:

Eugen,

Thanks for 'ceph config help', I always forget about it! But it doesn't really help in this case:

-----

osd_deep_scrub_interval - Deep scrub each PG (i.e., verify data checksums) at least this often
  (float, advanced)
  Default: 604800.000000
  Can update at runtime: true
  Services: [osd]
-----

And clearly it is wrong: it should mention osd + mgr, or probably better 'global'. If you modify it only for the osd service (which is what we did), you end up with deep scrubs being properly scheduled by the OSDs, but with your cluster reported in WARNING state with an incredibly high number of late deep scrubs, which can be worrying...
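As a sketch, assuming the value is in seconds (consistent with the 604800.000000 default above, i.e. 7 days), computing a 14-day interval and applying it to the global section could look like this; the `ceph` call is shown commented out since it needs a live cluster:

```shell
# Deep-scrub interval in seconds: 14 days (example value, adjust to taste)
interval=$((14 * 24 * 3600))
echo "$interval"    # prints 1209600

# Apply it to the 'global' section so the OSDs and the daemon reporting
# the health warning agree on the same value (illustrative only):
# ceph config set global osd_deep_scrub_interval "$interval"
```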

Michel

Le 26/05/2025 à 09:56, Eugen Block a écrit :
It’s reported by the mgr, so you’ll either have to pass global, or both mgr and osd, to the configuration change. You can also run 'ceph config help {CONFIG}' to see which services are related to that configuration value.

Zitat von Michel Jouvin <michel.jou...@ijclab.in2p3.fr>:

The page I checked, https://docs.ceph.com/en/reef/rados/configuration/osd-config-ref/, just describes the parameters, mentioning that they can be defined for all OSDs or just a specific one. But I found nothing about the fact that some of these parameters must be defined as global, since they control both the OSD behaviour and the alarms generated by the mon...

Michel

Le 26/05/2025 à 09:31, Gregory Orange a écrit :
This is a great illustration of the need for this to be global. Is it
documented that way?

There was a discussion on Slack a couple of weeks ago where someone was
asserting that it should be an osd value whereas we always use global -
well, ever since we hit the same problem as you, a few years ago!


On 26/5/25 15:21, Michel Jouvin wrote:
Sorry for the noise, I found the mistake right after sending this
message. We did a `ceph config set osd osd_deep_scrub_interval` instead
of a `ceph config set global...`. As a result, only the OSDs saw the
change. After fixing this, the cluster was back to HEALTH_OK immediately!
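For anyone hitting the same thing, a hedged sketch of the cleanup (assuming the standard `ceph config` subcommands; the 14-day value below is just an example and needs a live cluster to run):

```shell
# See which sections currently hold the option
ceph config dump | grep osd_deep_scrub_interval

# Drop the osd-only override and set it globally instead, so the OSDs
# and the health reporting use the same value
ceph config rm osd osd_deep_scrub_interval
ceph config set global osd_deep_scrub_interval 1209600   # 14 days, in seconds

# Verify what the osd service now resolves the option to
ceph config get osd osd_deep_scrub_interval
```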

Michel

Le 26/05/2025 à 09:17, Michel Jouvin a écrit :
Hi,

Last week we increased osd_deep_scrub_interval from 10 days to 14 days,
as we tended to permanently have 1 PG with a late deep scrub (the PG
changing all the time). We did it with `ceph config set ...`. From
what we have seen, the deep scrubs are now spread over 14 days (the
oldest are 14 days old), meaning that the OSDs took this change into
account (without being restarted). But the number of late deep scrubs
reported by `ceph -s` is ~700, which is unexpected. Does it mean that
the mons (which are in charge of the report, if I am right) have not
seen the change (they have not been restarted)?

Cheers,

Michel


_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



