Hi Raymond, I'm pinging this old thread because we hit the same issue last week.

Is it possible that when you upgraded to Nautilus you ran `ceph osd require-osd-release nautilus` but did not run `ceph mon enable-msgr2`?

We were in that state (intentionally), and started getting the `unable to obtain rotating service keys` error after around half the OSDs were restarted with require_osd_release=nautilus. Those restarted OSDs bind on the v2 port, and they seemingly get confused about how to communicate with the mons. As soon as we ran `ceph mon enable-msgr2` to enable v2 on the mons, the OSDs could boot without issue.

I guess this is a heads-up not to skip any step of the Nautilus upgrade, even though the docs make `ceph mon enable-msgr2` look optional.

Cheers,
Dan

On Tue, Jan 28, 2020 at 8:12 PM Raymond Clotfelter <r...@ksu.edu> wrote:
>
> I have a server with 12 OSDs on it. Five of them are unable to start, and
> give the following error message in their logs:
>
> 2020-01-28 13:00:41.760 7f61fb490c80 0 monclient: wait_auth_rotating timed out after 30
> 2020-01-28 13:00:41.760 7f61fb490c80 -1 osd.178 411005 unable to obtain rotating service keys; retrying
>
> These OSDs were up and running when they initially just died on me. I tried
> to restart them and they failed to come up. I rebooted the node and they did
> not recover. All 5 died within a few hours and were all 5 down by the time I
> started poking them. I previously had this happen with 2 other OSDs, one each
> on 2 servers each with 12 OSDs. I ended up just purging and recreating those
> OSDs. I would really like to find a solution to fix this problem that does
> not involve purging the OSDs.
>
> I have tried stopping and starting all monitors and managers, one at a time,
> and all at the same time. Additionally, all servers in the cluster have been
> restarted over the past couple of days for various other reasons.
>
> I am on Ceph 14.2.6, Debian buster and am using the Debian packages.
> All of my servers are kept in time sync via ntp, and this has been verified
> multiple times that everything remains in time sync.
>
> I have googled the error message and tried all of the solutions offered from
> that, but nothing makes any difference.
>
> I would appreciate any constructive advice.
>
> Thanks.
>
> -- ray
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
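
In case it helps anyone who finds this thread later, here is a rough shell sketch of the check implied in the reply: whether the mons already advertise v2 addresses. The helper name `all_mons_have_v2` is made up for illustration, and the sketch assumes that `ceph mon dump` prints each mon's addresses with `v1:` / `v2:` prefixes (for example `[v2:10.0.0.1:3300/0,v1:10.0.0.1:6789/0]`). Treat it as a starting point and adjust it if your output looks different.

```shell
#!/bin/sh
# Sketch only. all_mons_have_v2 is a hypothetical helper, and the
# "v1:" / "v2:" address prefixes in `ceph mon dump` output are an assumption.

# Reads `ceph mon dump` text on stdin.
# Exit status 0: at least one mon address line was seen and every such
#                line contains a v2 address.
# Exit status 1: no mon address lines were seen, or at least one of them
#                has no v2 address.
all_mons_have_v2() {
    awk '
        /v[12]:/ { total++; if ($0 ~ /v2:/) with_v2++ }
        END      { exit !(total > 0 && total == with_v2) }
    '
}

# Intended use on a cluster node (not run automatically here):
#
#   ceph osd dump | grep require_osd_release
#   if ceph mon dump | all_mons_have_v2; then
#       echo "all mons advertise a v2 address"
#   else
#       echo "some mon has no v2 address; see: ceph mon enable-msgr2"
#   fi
```

If the check fails while `require_osd_release` is already `nautilus`, the cluster is probably in the state described in the reply, and `ceph mon enable-msgr2` is the step that got our OSDs booting again.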