Hi Eugen,

Reading the code, the muted alert was cleared because the mute was non-sticky and the number of affected PGs increased, which was deemed a good enough reason to alert the admin again.
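To make that decision easier to follow, here is a minimal, self-contained sketch of the check for a non-sticky, count-based mute. It is not the Ceph code: `Mute`, `MuteAction`, `evaluate_mute` and `walk_through` are hypothetical names, TTL expiry is ignored, and the starting count of 80 is made up. The "ratchet down" step is an assumption about what happens beyond the quoted excerpt; it would explain how the mute's recorded count can fall to 60 while deep-scrubs catch up, so that a single newly overdue PG (61) then clears the mute.

```cpp
#include <cassert>
#include <map>
#include <string>

// Simplified stand-in for a health mute; NOT Ceph's real type.
struct Mute {
  bool sticky = false;  // true when the mute was created with --sticky
  int count = 0;        // affected-item count the mute currently tolerates
};

enum class MuteAction { Keep, ClearAlertGone, ClearCountIncreased };

// Decide what happens to one mute, given the currently active health checks
// (check code -> number of affected items, e.g. PGs).
MuteAction evaluate_mute(Mute& mute, const std::string& code,
                         const std::map<std::string, int>& active_checks) {
  if (mute.sticky) {
    return MuteAction::Keep;  // sticky mutes skip the checks below
  }
  auto q = active_checks.find(code);
  if (q == active_checks.end()) {
    return MuteAction::ClearAlertGone;  // alert disappeared, mute goes with it
  }
  if (mute.count) {
    // Count-based mute: a higher count than the recorded one clears it.
    if (q->second > mute.count) {
      return MuteAction::ClearCountIncreased;
    }
    // Assumption (not in the quoted excerpt): a lower count is recorded
    // as the new baseline.
    if (q->second < mute.count) {
      mute.count = q->second;
    }
  }
  return MuteAction::Keep;
}

// Replays the scenario from the thread with a hypothetical starting count.
void walk_through() {
  const std::string code = "PG_NOT_DEEP_SCRUBBED";
  Mute mute{false, 80};
  std::map<std::string, int> checks{{code, 60}};

  // Deep-scrubs brought the count down: mute stays, baseline drops to 60.
  assert(evaluate_mute(mute, code, checks) == MuteAction::Keep);
  assert(mute.count == 60);

  // One more PG becomes overdue: 61 > 60, so the mute is cleared.
  checks[code] = 61;
  assert(evaluate_mute(mute, code, checks) == MuteAction::ClearCountIncreased);
}
```

In this sketch a mute with `sticky = true` returns `Keep` in every case, which is the behaviour the --sticky option is meant to provide.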
Have you tried the --sticky argument of the 'ceph health mute' command?

Cheers,
Frédéric.

----- On 25 June 2025, at 9:21, Eugen Block ebl...@nde.ag wrote:

> Hi,
>
> I'm trying to understand the "ceph health mute" behavior. In this
> case, I'm referring to the warning PG_NOT_DEEP_SCRUBBED. If you mute
> it for a week and the cluster continues deep-scrubbing, the mute
> will clear at some point although there are still "PGs not
> deep-scrubbed in time" warnings. I could verify this in a tiny lab with
> 19.2.2: after setting osd_deep_scrub_interval to 10 minutes, the warning
> pops up. Then I mute that warning, issue deep-scrubs for several
> pools, and at some point I see this in the mon log:
>
> Jun 25 08:53:28 host1 ceph-mon[823315]: log_channel(cluster) log [WRN] : Health check update: 61 pgs not deep-scrubbed in time (PG_NOT_DEEP_SCRUBBED)
> Jun 25 08:53:28 host1 ceph-mon[823315]: Health check update: 61 pgs not deep-scrubbed in time (PG_NOT_DEEP_SCRUBBED)
> Jun 25 08:53:29 host1 ceph-mon[823315]: pgmap v164176: 389 pgs: 389 active+clean; 428 MiB data, 57 GiB used, 279 GiB / 336 GiB avail
> ...
> Jun 25 08:53:31 host1 ceph-mon[823315]: log_channel(cluster) log [INF] : Health alert mute PG_NOT_DEEP_SCRUBBED cleared (count increased from 60 to 61)
> Jun 25 08:53:31 host1 ceph-mon[823315]: Health alert mute PG_NOT_DEEP_SCRUBBED cleared (count increased from 60 to 61)
>
> I don't really understand what the code does [0] (I'm not a dev):
>
> ---snip---
> if (!p->second.sticky) {
>   auto q = all.checks.find(p->first);
>   if (q == all.checks.end()) {
>     mon.clog->info() << "Health alert mute " << p->first
>                      << " cleared (health alert cleared)";
>     p = pending_mutes.erase(p);
>     changed = true;
>     continue;
>   }
>   if (p->second.count) {
>     // count-based mute
>     if (q->second.count > p->second.count) {
>       mon.clog->info() << "Health alert mute " << p->first
>                        << " cleared (count increased from "
>                        << p->second.count
>                        << " to " << q->second.count << ")";
>       p = pending_mutes.erase(p);
>       changed = true;
>       continue;
> ---snip---
>
> Could anyone shed some light on what I'm not understanding? Why would the
> mute clear although there are still PGs not deep-scrubbed?
>
> Thanks!
> Eugen
>
> [0] https://github.com/ceph/ceph/blob/d78ffd1247d6cef5cbd829e77204185dc0d3a8ba/src/mon/HealthMonitor.cc#L431
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io