Hi,

I'm trying to understand the "ceph health mute" behavior, specifically for the PG_NOT_DEEP_SCRUBBED warning. If you mute it for a week and the cluster continues deep-scrubbing, the mute is cleared at some point even though there are still PGs that have not been deep-scrubbed in time. I could reproduce this in a tiny lab cluster running 19.2.2: after setting osd_deep_scrub_interval to 10 minutes, the warning pops up. I then mute the warning, issue deep-scrubs for several pools, and at some point I see this in the mon log:

Jun 25 08:53:28 host1 ceph-mon[823315]: log_channel(cluster) log [WRN] : Health check update: 61 pgs not deep-scrubbed in time (PG_NOT_DEEP_SCRUBBED)
Jun 25 08:53:28 host1 ceph-mon[823315]: Health check update: 61 pgs not deep-scrubbed in time (PG_NOT_DEEP_SCRUBBED)
Jun 25 08:53:29 host1 ceph-mon[823315]: pgmap v164176: 389 pgs: 389 active+clean; 428 MiB data, 57 GiB used, 279 GiB / 336 GiB avail
...
Jun 25 08:53:31 host1 ceph-mon[823315]: log_channel(cluster) log [INF] : Health alert mute PG_NOT_DEEP_SCRUBBED cleared (count increased from 60 to 61)
Jun 25 08:53:31 host1 ceph-mon[823315]: Health alert mute PG_NOT_DEEP_SCRUBBED cleared (count increased from 60 to 61)


I don't really understand what the code does [0] (I'm not a dev):

---snip---
    if (!p->second.sticky) {
      auto q = all.checks.find(p->first);
      if (q == all.checks.end()) {
        mon.clog->info() << "Health alert mute " << p->first
                          << " cleared (health alert cleared)";
        p = pending_mutes.erase(p);
        changed = true;
        continue;
      }
      if (p->second.count) {
        // count-based mute
        if (q->second.count > p->second.count) {
          mon.clog->info() << "Health alert mute " << p->first
                            << " cleared (count increased from "
                            << p->second.count
                            << " to " << q->second.count << ")";
          p = pending_mutes.erase(p);
          changed = true;
          continue;
---snip---
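
My tentative reading of that snippet, written as a small standalone model so the two "mute cleared" branches are easier to follow. This is only a sketch and not Ceph code: the Mute and Check structs and the evaluate_mute function are made-up names, and I'm assuming that the count stored with the mute is simply compared against the count the health check currently reports.

```cpp
#include <map>
#include <string>

// Hypothetical, simplified stand-ins for the structures used in HealthMonitor.cc.
struct Mute {
  bool sticky = false;  // sticky mutes skip both checks below
  long count = 0;       // count stored with the mute (0 = not count-based)
};

struct Check {
  long count = 0;       // count currently reported by the health check
};

// Returns the reason the mute would be cleared, or an empty string if it is kept.
std::string evaluate_mute(const std::string& code, const Mute& mute,
                          const std::map<std::string, Check>& checks) {
  if (mute.sticky) {
    return "";
  }
  auto q = checks.find(code);
  if (q == checks.end()) {
    // The health check is no longer raised at all.
    return "health alert cleared";
  }
  if (mute.count && q->second.count > mute.count) {
    // Count-based mute: the alert got worse than the stored count.
    return "count increased from " + std::to_string(mute.count) + " to " +
           std::to_string(q->second.count);
  }
  return "";
}
```

With a stored count of 60 and a reported count of 61, this model returns "count increased from 60 to 61", which matches the message in the mon log above. What it does not tell me is how the stored count ended up at 60 in the first place.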

Could anyone shed some light on what I'm missing? Why would the mute be cleared although there are still PGs not deep-scrubbed in time?

Thanks!
Eugen

[0] https://github.com/ceph/ceph/blob/d78ffd1247d6cef5cbd829e77204185dc0d3a8ba/src/mon/HealthMonitor.cc#L431
