Hey folks,
I have a Luminous 12.2.6 cluster which suffered a power failure
recently. On recovery, one of my OSDs is continually crashing and
restarting, with the error below:
----
9ae00 con 0
-3> 2018-07-15 09:50:58.313242 7f131c5a9700 10 monclient: tick
-2> 2018-07-15 09:50:58.313277 7f131c5a9700 10 monclient:
_check_auth_rotating have uptodate secrets (they expire after 2018-07-15
09:50:28.313274)
-1> 2018-07-15 09:50:58.313320 7f131c5a9700 10 log_client
log_queue is 8 last_log 10 sent 0 num 8 unsent 10 sending 10
0> 2018-07-15 09:50:58.320255 7f131c5a9700 -1
/build/ceph-12.2.6/src/common/LogClient.cc: In function 'Message*
LogClient::_get_mon_log_message()' thread 7f131c5a9700 time 2018-07-15
09:50:58.313336
/build/ceph-12.2.6/src/common/LogClient.cc: 294: FAILED
assert(num_unsent <= log_queue.size())
----
I've found a few recent references to this "FAILED assert" message
(assuming that's the cause of the problem), such as
https://bugzilla.redhat.com/show_bug.cgi?id=1599718 and
http://tracker.ceph.com/issues/18209, with the most recent occurrence
being 3 days ago (http://tracker.ceph.com/issues/18209#note-12).
Is there any resolution to this issue, or anything I can try to recover
this OSD?
Thanks!
D
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com