Hi all,

We are observing a problem that has been reported before, but I can't find a
resolution. It is related to the earlier thread "failed to load OSD map for
epoch 2898146, got 0 bytes"
(https://www.spinics.net/lists/ceph-users/msg84485.html).

We run a cluster on the latest Octopus release and observe a constant increase
in the osdmap epoch every few seconds. There is no change in content between
two successive epochs:

# diff map.3075085.txt map.3075086.txt
1c1
< epoch 3075085
---
> epoch 3075086
4c4
< modified 2024-11-18T15:38:45.512100+0100
---
> modified 2024-11-18T15:38:47.858092+0100

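For reference, the dumps above were produced in the usual way (ceph osd getmap <epoch> -o map.bin, then osdmaptool --print map.bin > map.<epoch>.txt). A small sketch to confirm programmatically that two successive dumps differ only in the epoch and modified header lines; the sample contents below are illustrative, modeled on the diff above:

```python
# Sketch: verify that two osdmap text dumps differ only in the
# 'epoch' and 'modified' header lines (the two lines that
# legitimately change on every epoch). Sample data is illustrative.

IGNORED_PREFIXES = ("epoch ", "modified ")

def substantive_lines(text):
    """Drop the header lines that change on every new epoch."""
    return [l for l in text.splitlines()
            if not l.startswith(IGNORED_PREFIXES)]

def maps_identical(dump_a, dump_b):
    """True if the dumps are identical apart from epoch/modified."""
    return substantive_lines(dump_a) == substantive_lines(dump_b)

# Illustrative dump fragments (real dumps are much longer):
a = """epoch 3075085
fsid 00000000-0000-0000-0000-000000000000
modified 2024-11-18T15:38:45.512100+0100
flags sortbitwise,recovery_deletes,purged_snapdirs,pglog_hardlimit
"""
b = a.replace("3075085", "3075086") \
     .replace("15:38:45.512100", "15:38:47.858092")

print(maps_identical(a, b))  # True: only epoch/modified differ
```

In our case this check returns True for every pair of successive epochs I sampled, i.e. the maps are being re-published without any substantive change.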
This is exactly what others have reported too, for example in "steady
increasing of osd map epoch since octopus"
(https://www.spinics.net/lists/ceph-users/msg69443.html). It's a real problem,
since it dramatically shortens the time window during which an OSD can be down
before its latest OSD map is purged from the cluster. This, in turn, leads to
serious follow-up problems on OSD restart, as reported in the thread I
referred to at the beginning.

Related to that, I also see the mgrs increasing the pgmap version constantly,
every 2 seconds. However, I believe this is intentional.

I don't observe the redundant pgp_num_actual setting by the mgrs that was
reported in https://tracker.ceph.com/issues/51433 .

I can't find a resolution anywhere. Any help would be very much appreciated.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io