Hi!

This sounds like http://tracker.ceph.com/issues/20763 (or indeed 
http://tracker.ceph.com/issues/20866).

It is still present in 12.2.2 (I just tried it). My workaround is to exclude 
radosgw from being SIGHUPed by logrotate (remove "radosgw" from 
/etc/logrotate.d/ceph), then rotate the logs manually from time to time and 
completely restart the radosgw processes one after the other on my radosgw 
cluster.
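For reference, the workaround amounts to roughly the following. This is only a sketch: the exact postrotate stanza in /etc/logrotate.d/ceph, the log file path, and the systemd unit name vary between distributions and deployments, so treat all of them as assumptions and check your own setup first.

```shell
# 1. Stop logrotate from SIGHUPing radosgw: in /etc/logrotate.d/ceph,
#    remove "radosgw" from the postrotate kill list, e.g. a line like
#    (exact contents differ per packaging):
#
#      killall -q -1 ceph-mon ceph-mds ceph-osd radosgw
#
#    becomes
#
#      killall -q -1 ceph-mon ceph-mds ceph-osd

# 2. From time to time, rotate the radosgw log by hand and restart the
#    daemons one at a time (unit and log names are assumptions; find
#    yours with `systemctl list-units 'ceph-radosgw*'`):
mv /var/log/ceph/ceph-rgw.$(hostname -s).log{,.1}
systemctl restart ceph-radosgw@rgw.$(hostname -s).service
```

Restarting the gateways one after the other keeps the rest of the cluster serving requests while each instance comes back up.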

Regards,

Martin

On 08.12.17 at 18:58, "ceph-users on behalf of Graham Allan" 
<ceph-users-boun...@lists.ceph.com on behalf of g...@umn.edu> wrote:

    I noticed this morning that all four of our rados gateways (luminous 
    12.2.2) hung at logrotate time overnight. The last message logged was:
    
    > 2017-12-08 03:21:01.897363 7fac46176700  0 ERROR: failed to clone shard, 
completion_mgr.get_next() returned ret=-125
    
    One of the nodes recorded more detail:
    > 2017-12-08 06:51:04.452108 7f80fbfdf700  1 rgw realm reloader: Pausing 
frontends for realm update...
    > 2017-12-08 06:51:04.452126 7f80fbfdf700  1 rgw realm reloader: Frontends 
paused
    > 2017-12-08 06:51:04.452891 7f8202436700  0 ERROR: failed to clone shard, 
completion_mgr.get_next() returned ret=-125
    I remember seeing this happen on our test cluster a while back with 
    Kraken. I can't find the tracker issue I originally found related to 
    this, but it also sounds like it could be a regression of bug #20339 or 
    #20686?
    
    I recorded some strace output from one of the radosgw instances before 
    restarting, if it's useful to open an issue.
    
    -- 
    Graham Allan
    Minnesota Supercomputing Institute - g...@umn.edu
    _______________________________________________
    ceph-users mailing list
    ceph-users@lists.ceph.com
    http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
    
