By "upgrade", I mean the upgrade from the non-dockerized 16.2.5 to cephadm 
version 16.2.6. So I think you need to disable standby-replay and reduce the 
number of ranks to 1, then stop all the non-dockerized MDS daemons and deploy 
new MDS daemons with cephadm. Scale back up only after you finish the 
migration. Did you also try that?
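
The sequence described above could be sketched roughly as follows. This is a minimal sketch, not a verified procedure: the file system name "cephfs", the daemon ID "storage1", and the counts are placeholders, and the script only builds and prints the commands instead of running them.

```python
from typing import List


def migration_commands(fs_name: str, old_mds_ids: List[str],
                       placement_count: int,
                       final_max_mds: int) -> List[List[str]]:
    """Build (but do not run) the commands for moving MDS daemons to cephadm.

    All arguments are placeholders supplied by the caller; adjust them to
    the actual cluster before using any of this.
    """
    cmds: List[List[str]] = [
        # 1. Disable standby-replay and shrink to a single rank.
        #    Wait for the cluster to settle (e.g. check `ceph status`)
        #    before moving on.
        ["ceph", "fs", "set", fs_name, "allow_standby_replay", "false"],
        ["ceph", "fs", "set", fs_name, "max_mds", "1"],
    ]
    # 2. Stop the package-based (non-dockerized) MDS daemons.
    #    systemctl has to be run on the host where each daemon lives.
    for mds_id in old_mds_ids:
        cmds.append(["systemctl", "stop", f"ceph-mds@{mds_id}"])
    # 3. Deploy the new MDS daemons through the orchestrator.
    cmds.append(["ceph", "orch", "apply", "mds", fs_name,
                 f"--placement={placement_count}"])
    # 4. Scale back up only after the migration is finished.
    cmds.append(["ceph", "fs", "set", fs_name, "max_mds", str(final_max_mds)])
    # Re-enable standby-replay only if it was in use before.
    cmds.append(["ceph", "fs", "set", fs_name, "allow_standby_replay", "true"])
    return cmds


for cmd in migration_commands("cephfs", ["storage1"], 2, 2):
    print(" ".join(cmd))
```

Printing the commands first makes it easy to review the order before running each step by hand.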

In fact, a similar issue has been reported several times on this list when 
upgrading the MDS to 16.2.6, e.g. [1]. I have run into it too, so I’m pretty 
confident that you are facing the same issue.

[1]: 
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/KQ5A5OWRIUEOJBC7VILBGDIKPQGJQIWN/

On 4 Oct 2021, at 19:00, Petr Belyaev <p.bely...@alohi.com> wrote:

Hi Weiwen,

Yes, we did that during the upgrade. In fact, we did it multiple times even 
after the upgrade to see if it would resolve the issue (disabling hot standby, 
scaling everything down to a single MDS, swapping it with the new one, scaling 
back up).

The upgrade itself went fine; the problems started during the migration to 
cephadm (which was done after migrating everything to Pacific).
It only occurs with dockerized MDS daemons. Non-dockerized MDS nodes, also on 
Pacific, run fine.

Petr

On 4 Oct 2021, at 12:43, 胡 玮文 <huw...@outlook.com> wrote:

Hi Petr,

Please read https://docs.ceph.com/en/latest/cephfs/upgrading/ for the MDS 
upgrade procedure.

In short, when upgrading to 16.2.6, you need to disable standby-replay and 
reduce the number of ranks to 1.

Weiwen Hu


From: Petr Belyaev <p.bely...@alohi.com>
Sent: 4 October 2021, 18:00
To: ceph-users@ceph.io
Subject: [ceph-users] MDS not becoming active after migrating to cephadm

Hi,

We’ve recently upgraded from Nautilus to Pacific, and tried moving our services 
to cephadm/ceph orch.
For some reason, MDS nodes deployed through orch never become active (or even 
standby-replay). Non-dockerized MDS nodes can still be deployed and work 
fine. The non-dockerized MDS version is 16.2.6; the docker image version is 
16.2.5-387-g7282d81d (which came as the default).
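
The two version strings above can be compared with a few lines of Python. This is only a sketch: it assumes the usual major.minor.patch prefix and ignores any build suffix such as "-387-g7282d81d", and the helper name release_tuple is made up for illustration.

```python
import re
from typing import Tuple


def release_tuple(version: str) -> Tuple[int, int, int]:
    """Return (major, minor, patch) from the start of a version string.

    Anything after the patch number (e.g. a "-387-g..." build suffix)
    is ignored, so this only compares point releases.
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


package_mds = release_tuple("16.2.6")
container_mds = release_tuple("16.2.5-387-g7282d81d")

# True when the container image is on an older point release
# than the package-based daemons.
print(container_mds < package_mds)  # prints True
```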

In the MDS log, the only related message is the monitors assigning the MDS as 
standby. Increasing the log level does not help much; it only adds beacon 
messages.
The monitor log also shows no differences compared to a non-dockerized MDS 
startup.
The output of the mds metadata command is identical to that of a 
non-dockerized MDS.

The only difference I can see in the log is the value in curly braces after the 
node name, e.g. mds.storage{0:1234ff}. For dockerized MDS, the first value is 
ffffffff, for non-dockerized it’s zero. Compat flags are identical.
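
To compare those values across many log lines, the token can be pulled apart with a small parser. This is a minimal sketch under stated assumptions: both fields are treated as hexadecimal, the first one is interpreted as a signed 32-bit number so that ffffffff becomes -1, and reading that as "no rank assigned" is a guess rather than something confirmed in this thread. The names MdsToken and parse_mds_token are made up for illustration.

```python
import re
from typing import NamedTuple, Optional

# Matches tokens such as "mds.storage{0:1234ff}".
_MDS_TOKEN = re.compile(r"mds\.([\w.-]+)\{([0-9a-fA-F]+):([0-9a-fA-F]+)\}")


class MdsToken(NamedTuple):
    name: str
    first: int   # signed 32-bit; -1 is assumed to mean "no rank assigned"
    second: int  # second value, parsed as hexadecimal (assumption)


def parse_mds_token(text: str) -> Optional[MdsToken]:
    """Extract the first mds.<name>{a:b} token from a log line, if any."""
    match = _MDS_TOKEN.search(text)
    if match is None:
        return None
    name, first_hex, second_hex = match.groups()
    first = int(first_hex, 16)
    # Reinterpret the unsigned 32-bit value as signed.
    if first >= 0x80000000:
        first -= 0x100000000
    return MdsToken(name, first, int(second_hex, 16))


print(parse_mds_token("mds.storage{0:1234ff}"))
print(parse_mds_token("mds.storage{ffffffff:1234ff}"))
```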

Could someone please advise me why the dockerized MDS gets stuck as a 
standby? Maybe some config values are missing or something?

Best regards,
Petr
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
