Hi Henning,

I think the increasing strays_created is normal. It is a counter that increases 
monotonically whenever a file is deleted, and it is only reset when the MDS is 
restarted.

num_strays is the actual number of strays in your system, and they do not 
necessarily reside in memory.
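
If you want to see how these behave over time, a minimal sketch (using mds.0 as 
in your output below; adjust the MDS name to your deployment) is to sample the 
counters periodically:

    while true; do date; ceph daemon mds.0 perf dump | grep stray; sleep 60; done

num_strays should stay roughly stable or shrink, while strays_created keeps 
growing as files are deleted.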

Weiwen Hu

> On 24 May 2023, at 20:22, Henning Achterrath <ach...@uni-bonn.de> wrote:
> 
> Hello again,
> 
> In two days, strays_created has increased by about one and a half million, and 
> the RAM usage of the MDS remains high at about 50 GB. We are very unsure whether 
> this is normal behavior.
> 
> Today:
>     "num_strays": 53695,
>     "num_strays_delayed": 4,
>     "num_strays_enqueuing": 0,
>     "strays_created": 3618390,
>     "strays_enqueued": 3943542,
>     "strays_reintegrated": 144545,
>     "strays_migrated": 38,
> 
> On 22.05.23
> 
> ceph daemon mds.0 perf dump | grep stray
>     "num_strays": 49846,
>     "num_strays_delayed": 21,
>     "num_strays_enqueuing": 0,
>     "strays_created": 2042124,
>     "strays_enqueued": 2396076,
>     "strays_reintegrated": 44207,
>     "strays_migrated": 38,
> 
> Maybe someone can explain to us what these counters mean in detail. The perf 
> schema is not very revealing.
> 
> Our idea is to temporarily add a standby-replay (hot-standby) MDS to ensure the 
> journal is replayable before we resume the upgrade.
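> 
> If we go that route, it would probably be something like the following (a sketch 
> only; <fs_name> stands for the file system's actual name):
> 
>     ceph fs set <fs_name> allow_standby_replay true
>     ceph fs status <fs_name>
> 
> since, as far as we understand, a standby-replay daemon continuously replays the 
> active MDS's journal.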
> 
> I would be grateful for any advice.
> 
> best regards
> Henning
> 
>> On 23.05.23 17:24, Henning Achterrath wrote:
>> In addition, I would like to mention that the number of "strays_created" 
>> also increases after this action, but the number of num_strays is lower now. 
>> If desired, we can provide debug logs from the MDS from the time when it was in 
>> the stopping state and we did a systemctl restart of mds1.
>> The only active MDS has a RAM usage of about 50 GB. The memory limit is 
>> 32 GB, but we get no warnings about that. Maybe the separate purge_queue is 
>> consuming a lot of RAM and it does not count towards the limit? Usually we get 
>> notified when the MDS exceeds the memory limit.
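>> 
>> For reference, the configured limit and the MDS's own cache accounting can be 
>> compared with something like the following (mds.0 here stands in for the active 
>> MDS):
>> 
>>     ceph config get mds mds_cache_memory_limit
>>     ceph daemon mds.0 cache status
>> 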
>> thank you
>>> On 22.05.23 15:23, t.kulschew...@uni-bonn.de wrote:
>>> Hi Venky,
>>> 
>>> thank you for your help. We managed to shut down mds.1:
>>> We set "ceph fs set max_mds 1" and waited for about 30 minutes. In the 
>>> first couple of minutes, strays were migrated from mds.1 to mds.0. After that, 
>>> the stray export hung and mds.1 remained in state_stopping. After 
>>> about 30 minutes, we restarted mds.1. This resulted in one active MDS and 
>>> two standby MDSs. However, we are not sure whether the remaining strays could 
>>> be migrated.
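>>> 
>>> (The resulting ranks can be seen with the usual status output, e.g. 
>>> "ceph fs status" or "ceph mds stat"; that only shows which daemons are active 
>>> or standby, not whether every stray was exported.)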
>>> 
>>> When we took a closer look at the perf counters of the MDS, we noticed that 
>>> the number of strays_enqueued is quite high and constantly increasing. Is 
>>> this to be expected? What does the counter "strays_enqueued" mean in detail?
>>> 
>>> ceph daemon mds.0 perf dump | grep stray
>>>     "num_strays": 49846,
>>>     "num_strays_delayed": 21,
>>>     "num_strays_enqueuing": 0,
>>>     "strays_created": 2042124,
>>>     "strays_enqueued": 2396076,
>>>     "strays_reintegrated": 44207,
>>>     "strays_migrated": 38,
>>> 
>>> Would it be safe to perform "ceph orch upgrade resume" at this point? At 
>>> the moment, the MONs and OSDs are running 17.2.6, while the MDSs and RGWs 
>>> are running 17.2.5, so we will have to upgrade the MDSs and RGWs eventually.
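>>> 
>>> (Before resuming, the paused upgrade's state can be inspected with, e.g.:
>>> 
>>>     ceph orch upgrade status
>>> 
>>> and then continued with "ceph orch upgrade resume", as mentioned above.)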
>>> 
>>> Best, Tobias
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
