Re: [ceph-users] Luminous rgw hangs after sighup

2017-12-11 Thread Casey Bodley
There have been other issues related to hangs during realm reconfiguration, e.g. http://tracker.ceph.com/issues/20937. We decided to revert the use of SIGHUP to trigger realm reconfiguration in https://github.com/ceph/ceph/pull/16807. I just started a backport of that for luminous. On 12/11/20

Re: [ceph-users] Luminous rgw hangs after sighup

2017-12-11 Thread Graham Allan
That's the issue I remember (#20763)! The hang happened to me once on this cluster, after the upgrade from jewel to 12.2.2. Then on Friday I disabled automatic bucket resharding due to some other problems, and I didn't get any logrotate-related hangs through the weekend. I wonder if these could be rel

Re: [ceph-users] Luminous rgw hangs after sighup

2017-12-11 Thread Martin Emrich
Hi! This sounds like http://tracker.ceph.com/issues/20763 (or indeed http://tracker.ceph.com/issues/20866). It is still present in 12.2.2 (just tried it). My workaround is to exclude radosgw from being SIGHUPed by logrotate (remove "radosgw" from /etc/logrotate.d/ceph), and to rotate the log

[ceph-users] Luminous rgw hangs after sighup

2017-12-08 Thread Graham Allan
I noticed this morning that all four of our rados gateways (luminous 12.2.2) hung at logrotate time overnight. The last message logged was: 2017-12-08 03:21:01.897363 7fac46176700 0 ERROR: failed to clone shard, completion_mgr.get_next() returned ret=-125. One of the 3 nodes recorded more de