Hi, nice coincidence that you mention that today; I've just debugged the exact same problem on a setup where deep_scrub_interval was increased.
The solution was to set the deep_scrub_interval directly on all pools instead (which was better for this particular setup anyway):

    ceph osd pool set <pool> deep_scrub_interval <deep_scrub_in_seconds>

(a short loop for applying this to every pool is sketched after the quoted message below)

Here's the code that generates the warning:
https://github.com/ceph/ceph/blob/v14.2.4/src/mon/PGMap.cc#L3058

* There's no obvious bug in the code; no reason why it shouldn't work with
  the option, unless "pool->opts.get(pool_opts_t::DEEP_SCRUB_INTERVAL, x)"
  returns the wrong thing when the option isn't configured for a pool
* I've used "config diff" to check that all mons use the correct value for
  deep_scrub_interval
* mon_warn_pg_not_deep_scrubbed_ratio is a little odd because the warning
  triggers at (mon_warn_pg_not_deep_scrubbed_ratio + 1) * deep_scrub_interval,
  which is somewhat unexpected, so by default at 125% of the configured
  interval

Paul

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Mon, Dec 9, 2019 at 5:17 PM Robert LeBlanc <rob...@leblancnet.us> wrote:
> I've increased the deep_scrub interval on the OSDs on our Nautilus cluster
> with the following added to the [osd] section:
>
> osd_deep_scrub_interval = 2600000
>
> And I started seeing
>
> 1518 pgs not deep-scrubbed in time
>
> in ceph -s. So I added
>
> mon_warn_pg_not_deep_scrubbed_ratio = 1
>
> since the default would start warning with a whole week left to scrub. But
> the messages persist. The cluster has been running for a month with these
> settings. Here is an example of the output. As you can see, some of these
> are not even two weeks old, nowhere close to the 75% of 4 weeks.
>
> pg 6.1f49 not deep-scrubbed since 2019-11-09 23:04:55.370373
> pg 6.1f47 not deep-scrubbed since 2019-11-18 16:10:52.561204
> pg 6.1f44 not deep-scrubbed since 2019-11-18 15:48:16.825569
> pg 6.1f36 not deep-scrubbed since 2019-11-20 05:39:00.309340
> pg 6.1f31 not deep-scrubbed since 2019-11-27 02:48:45.347680
> pg 6.1f30 not deep-scrubbed since 2019-11-11 21:34:15.795622
> pg 6.1f2d not deep-scrubbed since 2019-11-24 11:37:39.502829
> pg 6.1f27 not deep-scrubbed since 2019-11-25 07:38:58.689315
> pg 6.1f25 not deep-scrubbed since 2019-11-20 00:13:43.048569
> pg 6.1f1a not deep-scrubbed since 2019-11-09 15:08:43.516666
> pg 6.1f19 not deep-scrubbed since 2019-11-25 10:24:47.884332
> 1468 more pgs...
> Mon Dec 9 08:12:01 PST 2019
>
> There is very little data on the cluster, so it's not a problem of
> deep-scrubs taking too long:
>
> $ ceph df
> RAW STORAGE:
>     CLASS     SIZE        AVAIL       USED        RAW USED     %RAW USED
>     hdd       6.3 PiB     6.1 PiB     153 TiB     154 TiB           2.39
>     nvme      5.8 TiB     5.6 TiB     138 GiB     197 GiB           3.33
>     TOTAL     6.3 PiB     6.2 PiB     154 TiB     154 TiB           2.39
>
> POOLS:
>     POOL                           ID     STORED      OBJECTS     USED        %USED     MAX AVAIL
>     .rgw.root                       1     3.0 KiB           7     3.0 KiB         0       1.8 PiB
>     default.rgw.control             2         0 B           8         0 B         0       1.8 PiB
>     default.rgw.meta                3     7.4 KiB          24     7.4 KiB         0       1.8 PiB
>     default.rgw.log                 4      11 GiB         341      11 GiB         0       1.8 PiB
>     default.rgw.buckets.data        6     100 TiB      41.84M     100 TiB      1.82       4.2 PiB
>     default.rgw.buckets.index       7      33 GiB         574      33 GiB         0       1.8 PiB
>     default.rgw.buckets.non-ec      8     8.1 MiB          22     8.1 MiB         0       1.8 PiB
>
> Please help me figure out what I'm doing wrong with these settings.
>
> Thanks,
> Robert LeBlanc
> ----------------
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
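P.S. the loop mentioned above, for applying the per-pool setting everywhere in one go; a minimal sketch only, assuming you want to reuse the 2600000-second value from your [osd] section (plain bash, not tested against your cluster):

    # assumption: 2600000 s (~30 days) is the interval you actually want on every pool
    for pool in $(ceph osd pool ls); do
        # store deep_scrub_interval as a per-pool option so the mon's warning
        # check picks it up via the pool options instead of the global
        # osd_deep_scrub_interval
        ceph osd pool set "$pool" deep_scrub_interval 2600000
    done

    # spot-check one pool (name taken from your "ceph df" output)
    ceph osd pool get default.rgw.buckets.data deep_scrub_interval

And as a sanity check on the numbers: if the (ratio + 1) * interval formula above is really what the mon applies, then with mon_warn_pg_not_deep_scrubbed_ratio = 1 and a 2600000 s (~30 day) interval a PG should only be flagged after about 5200000 s (~60 days) without a deep scrub, so PGs last deep-scrubbed in November shouldn't appear in that list at all.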
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com