Hi Raymond,

I'm pinging this old thread because we hit the same issue last week.

Is it possible that when you upgraded to nautilus you ran `ceph osd
require-osd-release nautilus` but did not run `ceph mon enable-msgr2`?
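
In case it helps, this is roughly how we checked where we stood (the
grep is just illustrative):

    # which release do the osds currently require?
    ceph osd dump | grep require_osd_release

    # are the mons advertising msgr2 addresses yet?
    ceph mon dump

With msgr2 enabled, each mon in the monmap shows both a v2 (port 3300)
and a v1 (port 6789) address; before that you only see the :6789 one.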

We were in that state (intentionally), and started getting the `unable
to obtain rotating service keys` error after around half the osds had
been restarted with require_osd_release=nautilus.
Those restarted osds bind to the v2 port, and they seemingly get
confused about how to communicate with the mons.
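
If you want to double check on your side, something like this on a mon
host (as root; adjust to taste) shows which messenger ports ceph-mon is
bound to:

    # the v1 (legacy) messenger listens on 6789, v2 (msgr2) on 3300
    ss -tlnp | grep ceph-mon

A mon without msgr2 enabled only listens on 6789, while the restarted
nautilus osds already bring up their v2 ports.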

As soon as we did `ceph mon enable-msgr2` to enable v2 on the mons, the
osds could boot without issue.
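
For the record, the recovery on our side was nothing more than (run
from any node with an admin keyring):

    ceph mon enable-msgr2

    # sanity check: every mon should now also show a v2 :3300 address
    ceph mon dump | grep 3300

after which the osds were able to get their rotating keys and boot.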

I guess this is a heads-up not to skip any step of the nautilus
upgrade, even though the docs make `ceph mon enable-msgr2` look
optional.
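
From memory (please double check against the release notes for your
exact version), the tail end of the nautilus upgrade is roughly:

    # only once every osd is running nautilus
    ceph osd require-osd-release nautilus

    # switch the mons over to the new messenger as well
    ceph mon enable-msgr2

    # then update ceph.conf / mon_host so clients see the v2 addresses

and skipping that second command is exactly what bit us.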

Cheers, Dan


On Tue, Jan 28, 2020 at 8:12 PM Raymond Clotfelter <r...@ksu.edu> wrote:
>
> I have a server with 12 OSDs on it. Five of them are unable to start, and 
> give the following error message in their logs:
>
> 2020-01-28 13:00:41.760 7f61fb490c80  0 monclient: wait_auth_rotating timed 
> out after 30
> 2020-01-28 13:00:41.760 7f61fb490c80 -1 osd.178 411005 unable to obtain 
> rotating service keys; retrying
>
> These OSDs were up and running until they suddenly died on me. I tried to 
> restart them and they failed to come up. I rebooted the node and they did 
> not recover. All 5 died within a few hours, and all 5 were down by the time 
> I started poking at them. I previously had this happen with 2 other OSDs, 
> one each on 2 other servers that also have 12 OSDs each. I ended up just 
> purging and recreating those OSDs. I would really like to find a way to fix 
> this problem that does not involve purging the OSDs.
>
> I have tried stopping and starting all monitors and managers, one at a time, 
> and all at the same time. Additionally, all servers in the cluster have been 
> restarted over the past couple of days for various other reasons.
>
> I am on Ceph 14.2.6 on Debian buster, using the Debian packages. All of 
> my servers are kept in time sync via ntp, and I have verified multiple 
> times that everything remains in sync.
>
> I have googled the error message and tried all of the solutions offered 
> there, but nothing makes any difference.
>
> I would appreciate any constructive advice.
>
> Thanks.
>
> -- ray
>
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
