The code checks the PG with the oldest scrub_stamp/deep_scrub_stamp to see whether the osd_scrub_min_interval/osd_deep_scrub_interval time has elapsed. So the output you are showing, with the very old scrub stamps, shouldn’t happen under default settings. As soon as deep-scrub is re-enabled, the 5 PGs with the oldest stamp should be the first to run.
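To make that check concrete, here is a rough Python sketch of the selection as described above. It is not the actual OSD code: the function name, the PG ids, and the dictionary layout are hypothetical, the one-week interval is taken from the default mentioned in this thread, and the sketch assumes only active+clean PGs are eligible.

```python
from datetime import datetime, timedelta

# Assumption: one week, matching the default "osd deep scrub interval"
# mentioned in this thread. This constant is only a stand-in for the setting.
OSD_DEEP_SCRUB_INTERVAL = timedelta(weeks=1)


def next_deep_scrub_candidate(pgs, now, interval=OSD_DEEP_SCRUB_INTERVAL):
    """Return the id of the PG with the oldest deep_scrub_stamp if it is due.

    pgs maps a PG id to a (state, deep_scrub_stamp) tuple. Returns None when
    no PG is eligible or the oldest stamp is still within the interval.
    """
    # Assumption: only PGs that are active+clean can be scrubbed.
    eligible = {
        pgid: stamp
        for pgid, (state, stamp) in pgs.items()
        if state == "active+clean"
    }
    if not eligible:
        return None
    # Pick the PG whose last deep scrub is furthest in the past.
    oldest = min(eligible, key=eligible.get)
    if now - eligible[oldest] >= interval:
        return oldest
    return None


# Hypothetical PG ids; the stamps echo dates from the listing in this thread.
pgs = {
    "2.1f": ("active+clean", datetime(2013, 11, 6)),
    "2.20": ("active+clean", datetime(2014, 5, 19)),
    "2.21": ("active+degraded", datetime(2013, 11, 1)),
}
print(next_deep_scrub_candidate(pgs, now=datetime(2014, 5, 20)))  # prints 2.1f
```

In this sketch the degraded PG is skipped even though its stamp is the oldest, and the PG scrubbed a day earlier is not yet due.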
A PG must be in the active+clean state to be scrubbed. If any weren’t active+clean, then even a manual scrub would do nothing.

Now that I’m looking at the code, I see that your symptom is possible if the value of osd_scrub_min_interval or osd_scrub_max_interval is larger than your osd_deep_scrub_interval. If osd_scrub_min_interval is greater than osd_deep_scrub_interval, there won't be a deep scrub until osd_scrub_min_interval has elapsed. If an OSD is under load and osd_scrub_max_interval is greater than osd_deep_scrub_interval, there won't be a deep scrub until osd_scrub_max_interval has elapsed.

Please check the 3 interval config values. Verify that your PGs are active+clean just to be sure.

David

On May 20, 2014, at 5:21 PM, Mike Dawson <mike.daw...@cloudapt.com> wrote:

> Today I noticed that deep-scrub is consistently missing some of my Placement
> Groups, leaving me with the following distribution of PGs and the last day
> they were successfully deep-scrubbed.
>
> # ceph pg dump all | grep active | awk '{ print $20}' | sort -k1 | uniq -c
>       5 2013-11-06
>     221 2013-11-20
>       1 2014-02-17
>      25 2014-02-19
>      60 2014-02-20
>       4 2014-03-06
>       3 2014-04-03
>       6 2014-04-04
>       6 2014-04-05
>      13 2014-04-06
>       4 2014-04-08
>       3 2014-04-10
>       2 2014-04-11
>      50 2014-04-12
>      28 2014-04-13
>      14 2014-04-14
>       3 2014-04-15
>      78 2014-04-16
>      44 2014-04-17
>       8 2014-04-18
>       1 2014-04-20
>      16 2014-05-02
>      69 2014-05-04
>     140 2014-05-05
>     569 2014-05-06
>    9231 2014-05-07
>     103 2014-05-08
>     514 2014-05-09
>    1593 2014-05-10
>     393 2014-05-16
>    2563 2014-05-17
>    1283 2014-05-18
>    1640 2014-05-19
>    1979 2014-05-20
>
> I have been running the default "osd deep scrub interval" of once per week,
> but have disabled deep-scrub on several occasions in an attempt to avoid the
> associated degraded cluster performance I have written about before.
>
> To get the PGs longest in need of a deep-scrub started, I set the
> nodeep-scrub flag, and wrote a script to manually kick off deep-scrub
> according to age. It is processing as expected.
>
> Do you consider this a feature request or a bug? Perhaps the code that
> schedules PGs to deep-scrub could be improved to prioritize PGs that have
> needed a deep-scrub the longest.
>
> Thanks,
> Mike Dawson
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com