I suspect that (deep-)scrubbing also affects the epoch increment. I see those increments in completely idle clusters as well, and the timestamps of deep-scrubs and epoch changes seem to align more or less. It's not easy to tell without digging really deep into the debug logs.
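
One rough way to check that alignment, sketched in Python. It assumes you have already collected two lists of timestamps yourself (osdmap epoch changes and deep-scrub events); how you extract them is left open. The helper names `nearest_gap` and `fraction_aligned` and the 5-second tolerance are hypothetical choices, not anything from Ceph:

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def nearest_gap(event, sorted_times):
    """Return the smallest absolute time difference between `event`
    and any entry of `sorted_times` (must be sorted and non-empty)."""
    i = bisect_left(sorted_times, event)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = sorted_times[max(i - 1, 0): i + 1]
    return min(abs(event - t) for t in candidates)


def fraction_aligned(epoch_times, scrub_times, tolerance=timedelta(seconds=5)):
    """Fraction of epoch changes that have a scrub event within `tolerance`.
    The tolerance is an arbitrary assumption; tune it to your log resolution."""
    scrubs = sorted(scrub_times)
    if not epoch_times or not scrubs:
        return 0.0
    hits = sum(1 for e in epoch_times if nearest_gap(e, scrubs) <= tolerance)
    return hits / len(epoch_times)
```

A fraction close to 1.0 would support the suspicion; a low fraction would suggest that something other than scrubbing drives most of the epoch changes.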

Quoting Joel Davidow <jdavi...@nso.edu>:

I'm seeing a similar increase in osdmap epochs, with the only difference between `ceph osd dump` outputs being the epoch and the modified date. The duration of osdmap epochs varies a lot on a scale of seconds but is generally less than 10 minutes, with a maximum of about half an hour. This is a cephadm cluster running 18.2.4 (upgraded from 16.2.15) with RGW only. No changes to scrub configs. No snapshots, though `purged_snaps scrubs` are logged at debug 10. I've set mon debug to 5, 10, 15, and 20 but haven't found anything in those logs that suggests a cause. There is a chance I've missed something in the debug 20 log, though, as it has 1244292 lines. A portion of the debug 15 log is below. Any suggestions on next steps and/or possible causes are welcome. Thanks.

```
Mar 31 16:20:46 firelord bash[2426394]: debug 2025-03-31T16:20:46.443+0000 7f02de959640 10 mon.firelord@0(leader).osd e405676 should_propose
Mar 31 16:20:46 firelord bash[2426394]: debug 2025-03-31T16:20:46.647+0000 7f02de959640 10 mon.firelord@0(leader).log v61415522 preprocess_query log(2 entries from seq 12229 at 2025-03-31T16:20:46.448717+0000) v1 from osd.132 v2:10.224.190.122:6833/3799949454
Mar 31 16:20:46 firelord bash[2426394]: debug 2025-03-31T16:20:46.647+0000 7f02de959640 10 mon.firelord@0(leader).log v61415522 preprocess_log log(2 entries from seq 12229 at 2025-03-31T16:20:46.448717+0000) v1 from osd.132
Mar 31 16:20:46 firelord bash[2426394]: debug 2025-03-31T16:20:46.647+0000 7f02de959640 10 mon.firelord@0(leader).log v61415522 prepare_update log(2 entries from seq 12229 at 2025-03-31T16:20:46.448717+0000) v1 from osd.132 v2:10.224.190.122:6833/3799949454
Mar 31 16:20:46 firelord bash[2426394]: debug 2025-03-31T16:20:46.647+0000 7f02de959640 10 mon.firelord@0(leader).log v61415522 prepare_log log(2 entries from seq 12229 at 2025-03-31T16:20:46.448717+0000) v1 from osd.132
Mar 31 16:20:46 firelord bash[2426394]: debug 2025-03-31T16:20:46.647+0000 7f02de959640 10 mon.firelord@0(leader).log v61415522 logging 2025-03-31T16:20:46.448717+0000 osd.132 (osd.132) 12229 : cluster [DBG] purged_snaps scrub starts
Mar 31 16:20:46 firelord bash[2426394]: debug 2025-03-31T16:20:46.647+0000 7f02de959640 10 mon.firelord@0(leader).log v61415522 logging 2025-03-31T16:20:46.449046+0000 osd.132 (osd.132) 12230 : cluster [DBG] purged_snaps scrub ok
Mar 31 16:20:46 firelord bash[2426394]: debug 2025-03-31T16:20:46.723+0000 7f02de959640 10 mon.firelord@0(leader).osd e405676 preprocess_query osd_beacon(pgs [6.1ef,6.3d,6.2d2,6.5ee,6.102,7.85,6.6da,8.e] lec 405676 last_purged_snaps_scrub 2025-03-30T19:54:39.335243+0000 osd_beacon_report_interval 300 v405676) v3 from osd.177 v2:10.224.190.121:6872/1659276092
Mar 31 16:20:46 firelord bash[2426394]: debug 2025-03-31T16:20:46.723+0000 7f02de959640 10 mon.firelord@0(leader) e15 no_reply to osd.177 v2:10.224.190.121:6872/1659276092 osd_beacon(pgs [6.1ef,6.3d,6.2d2,6.5ee,6.102,7.85,6.6da,8.e] lec 405676 last_purged_snaps_scrub 2025-03-30T19:54:39.335243+0000 osd_beacon_report_interval 300 v405676) v3
Mar 31 16:20:46 mon.0 bash[2426394]: debug 2025-03-31T16:20:46.723+0000 7f02de959640 7 mon.0@0(leader).osd e405676 prepare_update osd_beacon(pgs [6.1ef,6.3d,6.2d2,6.5ee,6.102,7.85,6.6da,8.e] lec 405676 last_purged_snaps_scrub 2025-03-30T19:54:39.335243+0000 osd_beacon_report_interval 300 v405676) v3 from osd.177 v2:X.X.X.121:6872/1659276092
Mar 31 16:20:46 mon.0 bash[2426394]: debug 2025-03-31T16:20:46.723+0000 7f02de959640 10 mon.0@0(leader).osd e405676 prepare_beacon osd_beacon(pgs [6.1ef,6.3d,6.2d2,6.5ee,6.102,7.85,6.6da,8.e] lec 405676 last_purged_snaps_scrub 2025-03-30T19:54:39.335243+0000 osd_beacon_report_interval 300 v405676) v3 from osd.177
Mar 31 16:20:46 mon.0 bash[2426394]: debug 2025-03-31T16:20:46.839+0000 7f02de959640 10 mon.0@0(leader) e15 no_reply to mgr.24259957 X.X.X.154:0/1735542615 monmgrreport(gid 24259957, 0 checks, 0 progress events) v3
Mar 31 16:20:46 mon.0 bash[2426394]: debug 2025-03-31T16:20:46.839+0000 7f02de959640 10 mon.0@0(leader).mgrstat prepare_report 2721 pgs: 2710 active+clean, 11 active+clean+scrubbing+deep; 259 TiB data, 434 TiB used, 4.0 PiB / 4.4 PiB avail; 206 KiB/s rd, 208 op/s, 0 health checks, 0 progress events
Mar 31 16:20:46 mon.0 bash[2426394]: debug 2025-03-31T16:20:46.899+0000 7f02e115e640 10 mon.0@0(leader).elector(306) send_peer_ping to peer 1
Mar 31 16:20:46 mon.0 bash[2426394]: debug 2025-03-31T16:20:46.899+0000 7f02e115e640 10 mon.0@0(leader).elector(306) send_peer_ping to peer 2
Mar 31 16:20:46 mon.0 bash[2426394]: debug 2025-03-31T16:20:46.899+0000 7f02e115e640 10 mon.0@0(leader).elector(306) send_peer_ping to peer 3
Mar 31 16:20:46 mon.0 bash[2426394]: debug 2025-03-31T16:20:46.899+0000 7f02e115e640 10 mon.0@0(leader).elector(306) send_peer_ping to peer 4
Mar 31 16:20:46 mon.0 bash[2426394]: debug 2025-03-31T16:20:46.899+0000 7f02de959640 10 mon.0@0(leader).elector(306) assimilate_connection_reports
Mar 31 16:20:46 mon.0 bash[2426394]: message repeated 3 times: [ debug 2025-03-31T16:20:46.899+0000 7f02de959640 10 mon.0@0(leader).elector(306) assimilate_connection_reports]
Mar 31 16:20:46 mon.0 bash[2426394]: debug 2025-03-31T16:20:46.987+0000 7f02de959640 10 mon.0@0(leader).elector(306) assimilate_connection_reports
Mar 31 16:20:46 mon.0 bash[2426394]: debug 2025-03-31T16:20:46.987+0000 7f02de959640 10 mon.0@0(leader).elector(306) assimilate_connection_reports
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.127+0000 7f02e115e640 10 mon.0@0(leader).log v61415522 encode_pending v61415523
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.127+0000 7f02e115e640 10 mon.0@0(leader).log v61415522 encode_pending pruning channel cluster 19229268 -> 19229271
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.127+0000 7f02e115e640 10 mon.0@0(leader).mgrstat 53693941
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.127+0000 7f02e115e640 10 mon.0@0(leader) e15 log_health updated 0 previous 0
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.127+0000 7f02e115e640 10 mon.0@0(leader).osd e405676 encode_pending e 405677
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.127+0000 7f02e115e640 1 mon.0@0(leader).osd e405676 do_prune osdmap full prune enabled
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.127+0000 7f02e115e640 10 mon.0@0(leader).osd e405676 should_prune could only prune 5 epochs (405171..405176), which is less than the required minimum (10000)
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.143+0000 7f02e115e640 10 mon.0@0(leader).osd e405676 update_pending_pgs
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.143+0000 7f02e115e640 10 mon.0@0(leader).osd e405676 scan_for_creating_pgs already created 1
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.143+0000 7f02e115e640 10 mon.0@0(leader).osd e405676 scan_for_creating_pgs already created 2
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.143+0000 7f02e115e640 10 mon.0@0(leader).osd e405676 scan_for_creating_pgs already created 3
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.143+0000 7f02e115e640 10 mon.0@0(leader).osd e405676 scan_for_creating_pgs already created 4
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.143+0000 7f02e115e640 10 mon.0@0(leader).osd e405676 scan_for_creating_pgs already created 5
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.143+0000 7f02e115e640 10 mon.0@0(leader).osd e405676 scan_for_creating_pgs already created 6
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.143+0000 7f02e115e640 10 mon.0@0(leader).osd e405676 scan_for_creating_pgs already created 7
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.143+0000 7f02e115e640 10 mon.0@0(leader).osd e405676 scan_for_creating_pgs already created 8
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.143+0000 7f02e115e640 10 mon.0@0(leader).osd e405676 update_pending_pgs 0 pools queued
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.143+0000 7f02e115e640 10 mon.0@0(leader).osd e405676 update_pending_pgs 0 pgs removed because they're created
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.143+0000 7f02e115e640 10 mon.0@0(leader).osd e405676 update_pending_pgs queue remaining: 0 pools
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.143+0000 7f02e115e640 10 mon.0@0(leader).osd e405676 update_pending_pgs 0/0 pgs added from queued pools
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.143+0000 7f02e115e640 10 mon.0@0(leader).osd e405676 encode_pending encoding full map with reef features 1080873258835847684
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.143+0000 7f02e115e640 10 mon.0@0(leader) e15 log_health updated 0 previous 0
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.147+0000 7f02dc154640 10 mon.0@0(leader) e15 refresh_from_paxos
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.147+0000 7f02dc154640 10 mon.0@0(leader).log v61415523 update_from_paxos
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.147+0000 7f02dc154640 10 mon.0@0(leader).log v61415523 update_from_paxos version 61415523 summary v 61415522
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.147+0000 7f02dc154640 10 mon.0@0(leader).log v61415523 update_from_paxos latest full 61415484
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.147+0000 7f02dc154640 7 mon.0@0(leader).log v61415523 update_from_paxos applying incremental log 61415523 2025-03-31T16:20:46.448717+0000 osd.132 (osd.132) 12229 : cluster [DBG] purged_snaps scrub starts
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.147+0000 7f02dc154640 7 mon.0@0(leader).log v61415523 update_from_paxos applying incremental log 61415523 2025-03-31T16:20:46.449046+0000 osd.132 (osd.132) 12230 : cluster [DBG] purged_snaps scrub ok
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.147+0000 7f02dc154640 10 mon.0@0(leader).log v61415523 summary.channel_info {=0,1,audit=6468256,6478256,cephadm=125508,135508,cluster=19229271,19239273}
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.147+0000 7f02dc154640 10 mon.0@0(leader).log v61415523 check_subs
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.147+0000 7f02dc154640 10 mon.0@0(leader).log v61415523 check_sub client wants log-info ver 61415523
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.147+0000 7f02dc154640 10 mon.0@0(leader).log v61415523 _create_sub_incremental level 1 ver 61415523 cur summary ver 61415523
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.147+0000 7f02dc154640 10 mon.0@0(leader).log v61415523 _create_sub_incremental incremental message ready (0 entries)
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.147+0000 7f02dc154640 10 mon.0@0(leader).log v61415523 check_sub sending message to mgr.24259957 with 0 entries (version 61415523)
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.147+0000 7f02dc154640 10 mon.0@0(leader).auth v35561 update_from_paxos
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.155+0000 7f02dc154640 10 mon.0@0(leader).config load_config unrecognized option 'mgr/pg_autoscaler/noautoscale'
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.155+0000 7f02dc154640 10 mon.0@0(leader).config load_config got 87 keys
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.159+0000 7f02dc154640 10 mon.0@0(leader).mgr e164 prime_mgr_client
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.159+0000 7f02dc154640 10 mon.0@0(leader).mgrstat 53693940
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.163+0000 7f02dc154640 10 mon.0@0(leader).mgrstat update_from_paxos v53693940 service_map e14807793 0 progress events
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.163+0000 7f02dc154640 10 mon.0@0(leader).mgrstat check_subs
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.163+0000 7f02dc154640 10 mon.0@0(leader).mgrstat check_sub next 14807794 vs service_map.epoch 14807793
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.163+0000 7f02dc154640 10 mon.0@0(leader).health update_from_paxos
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.163+0000 7f02dc154640 10 mon.0@0(leader).log v61415523 create_pending v 61415524
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.163+0000 7f02dc154640 7 mon.0@0(leader).log v61415523 _updated_log for osd.132 v2:X.X.X.122:6833/3799949454
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.163+0000 7f02dc154640 2 mon.0@0(leader) e15 send_reply 0x556dba76d680 0x556dba48ca80 log(last 12230) v1
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.171+0000 7f02dc154640 10 mon.0@0(leader) e15 refresh_from_paxos
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.171+0000 7f02dc154640 15 mon.0@0(leader).osd e405676 update_from_paxos paxos e 405677, my e 405676
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.171+0000 7f02dc154640 7 mon.0@0(leader).osd e405676 update_from_paxos loading creating_pgs last_scan_epoch 405676 with 0 pgs
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.171+0000 7f02dc154640 7 mon.0@0(leader).osd e405676 update_from_paxos applying incremental 405677
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.171+0000 7f02dc154640 1 mon.0@0(leader).osd e405677 e405677: 355 total, 355 up, 355 in
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.175+0000 7f02dc154640 10 mon.0@0(leader).osd e405677 check_osdmap_subs
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.175+0000 7f02dc154640 10 mon.0@0(leader).osd e405677 check_osdmap_sub 0x556dbf539860 next 405677 (onetime)
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.175+0000 7f02dc154640 5 mon.0@0(leader).osd e405677 send_incremental [405677..405677] to mgr.24259957
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.175+0000 7f02dc154640 10 mon.0@0(leader).osd e405677 build_incremental [405677..405677] with features 3f01cfbffffdffff
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.175+0000 7f02dc154640 10 mon.0@0(leader).osd e405677 check_osdmap_sub 0x556dc40a27e0 next 405677 (onetime)
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.175+0000 7f02dc154640 5 mon.0@0(leader).osd e405677 send_incremental [405677..405677] to client.?
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.175+0000 7f02dc154640 10 mon.0@0(leader).osd e405677 build_incremental [405677..405677] with features 3f01cfbffffdffff
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.175+0000 7f02dc154640 10 mon.0@0(leader).osd e405677 check_osdmap_sub 0x556dd03810e0 next 405677 (onetime)
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.175+0000 7f02dc154640 5 mon.0@0(leader).osd e405677 send_incremental [405677..405677] to client.?
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.175+0000 7f02dc154640 10 mon.0@0(leader).osd e405677 build_incremental [405677..405677] with features 3f01cfbffffdffff
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.175+0000 7f02dc154640 10 mon.0@0(leader).osd e405677 check_osdmap_sub 0x556dc12e36e0 next 405677 (onetime)
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.175+0000 7f02dc154640 5 mon.0@0(leader).osd e405677 send_incremental [405677..405677] to client.5361270
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.175+0000 7f02dc154640 10 mon.0@0(leader).osd e405677 build_incremental [405677..405677] with features 3f01cfbffffdffff
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.175+0000 7f02dc154640 10 mon.0@0(leader).osd e405677 check_osdmap_sub 0x556de3549920 next 405677 (onetime)
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.175+0000 7f02dc154640 5 mon.0@0(leader).osd e405677 send_incremental [405677..405677] to client.5350016
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.175+0000 7f02dc154640 10 mon.0@0(leader).osd e405677 build_incremental [405677..405677] with features 3f01cfbffffdffff
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.175+0000 7f02dc154640 10 mon.0@0(leader).osd e405677 check_osdmap_sub 0x556dbb13c660 next 405677 (onetime)
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.175+0000 7f02dc154640 5 mon.0@0(leader).osd e405677 send_incremental [405677..405677] to client.5370846
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.175+0000 7f02dc154640 10 mon.0@0(leader).osd e405677 build_incremental [405677..405677] with features 3f01cfbffffdffff
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.175+0000 7f02dc154640 10 mon.0@0(leader).osd e405677 committed, telling random osd.138 all about it
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.175+0000 7f02dc154640 10 mon.0@0(leader).osd e405677 build_incremental [405676..405677] with features 3f01cfbffffdffff
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.175+0000 7f02dc154640 10 mon.0@0(leader).osd e405677 update_logger
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.175+0000 7f02dc154640 10 mon.0@0(leader).log v61415523 update_from_paxos
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.175+0000 7f02dc154640 10 mon.0@0(leader).log v61415523 update_from_paxos version 61415523 summary v 61415523
Mar 31 16:20:47 mon.0 bash[2426394]: debug 2025-03-31T16:20:47.175+0000 7f02dc154640 10 mon.0@0(leader).auth v35561 update_from_paxos
```
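
For anyone wanting to measure the epoch intervals from a log like the one above, here is a minimal Python sketch. It assumes the mon log contains a line of the form `... .osd e405677 e405677: 355 total, ...` whenever a new osdmap is applied, as in this excerpt. The regex and the function names `epoch_commits` and `epoch_gaps` are illustrative only, and other Ceph versions or log formats may need a different pattern.

```python
import re
from datetime import datetime

# Matches the level-1 "eN: X total, Y up, Z in" line seen in the excerpt.
# The pattern is an assumption based on that excerpt, not a stable log format.
EPOCH_RE = re.compile(
    r"debug (?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+[+-]\d{4}) "
    r"\S+\s+\d+ mon\.\S+\.osd e(?P<epoch>\d+) e(?P=epoch): \d+ total"
)


def epoch_commits(log_text):
    """Return a list of (epoch, timestamp) tuples found in the log text.
    Uses finditer over the whole text, so it also works when several
    log entries have been joined onto one line."""
    commits = []
    for m in EPOCH_RE.finditer(log_text):
        ts = datetime.strptime(m.group("ts"), "%Y-%m-%dT%H:%M:%S.%f%z")
        commits.append((int(m.group("epoch")), ts))
    return commits


def epoch_gaps(commits):
    """Return (epoch, seconds since the previous epoch) for each epoch
    after the first one in `commits`."""
    return [
        (later[0], (later[1] - earlier[1]).total_seconds())
        for earlier, later in zip(commits, commits[1:])
    ]
```

Feeding it the full mon log as a string (for example, read from a file) gives one `(epoch, seconds)` pair per epoch change, which can then be lined up against scrub timestamps.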
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

