There have been other issues related to hangs during realm
reconfiguration, e.g. http://tracker.ceph.com/issues/20937. We decided to
revert the use of SIGHUP to trigger realm reconfiguration in
https://github.com/ceph/ceph/pull/16807. I just started a backport of
that for luminous.
On 12/11/20
That's the issue I remember (#20763)!
The hang happened to me once, on this cluster, after the upgrade from jewel
to 12.2.2; then on Friday I disabled automatic bucket resharding due to
some other problems, and I didn't get any logrotate-related hangs through the
weekend. I wonder if these could be rel
Hi!
This sounds like http://tracker.ceph.com/issues/20763 (or indeed
http://tracker.ceph.com/issues/20866).
It is still present in 12.2.2 (I just tried it). My workaround is to stop
logrotate from sending SIGHUP to radosgw (remove "radosgw" from
/etc/logrotate.d/ceph), and to rotate the log
I noticed this morning that all four of our rados gateways (luminous
12.2.2) hung at logrotate time overnight. The last message logged was:
2017-12-08 03:21:01.897363 7fac46176700 0 ERROR: failed to clone shard,
completion_mgr.get_next() returned ret=-125
one of the 3 nodes recorded more de