On 12/10/2014 17:48, Gregory Farnum wrote:
> On Sun, Oct 12, 2014 at 7:46 AM, Loic Dachary <l...@dachary.org> wrote:
>> Hi,
>>
>> On a 0.80.6 cluster the command
>>
>> ceph tell osd.6 version
>>
>> hangs forever. I checked that it establishes a TCP connection to the OSD,
>> raised the OSD debug level to 20 and I do not see
>>
>> https://github.com/ceph/ceph/blob/firefly/src/osd/OSD.cc#L4991
>>
>> in the logs. All other OSDs answer to the same "version" command as they
>> should. And ceph daemon osd.6 version on the machine running OSD 6 responds
>> as it should. There is also an ever-growing number of slow requests on this
>> OSD. But no error in the logs. In other words, except for taking forever to
>> answer any kind of request the OSD looks fine.
>>
>> Another OSD running on the same machine is behaving well.
>>
>> Any idea what that behaviour relates to?
>
> What commands have you run? The admin socket commands don't require
> nearly as many locks, nor do they go through the same event loops that
> messages do. You might have found a deadlock or something. (In which
> case just restarting the OSD would probably fix it, but you should
> grab a core dump first.)
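
For anyone reproducing this, here is a minimal sketch of how the two paths could be compared without the shell blocking forever. It assumes GNU coreutils `timeout` is available. `probe` is a hypothetical helper written for this message, not part of Ceph, and the 10-second limit is arbitrary.

```shell
#!/usr/bin/env bash
# Sketch only: run a command with a time limit and report what happened,
# so a hung "ceph tell" does not block the terminal.
# "probe" is a hypothetical helper name, not a Ceph tool.

probe() {
    local limit="$1"; shift
    # GNU timeout exits with status 124 when the time limit is reached.
    timeout "$limit" "$@" >/dev/null 2>&1
    local rc=$?
    if [ "$rc" -eq 0 ]; then
        echo "ok"
    elif [ "$rc" -eq 124 ]; then
        echo "timeout"
    else
        echo "error($rc)"
    fi
}

# Intended use on the host running the OSD (left commented out so the
# helper can be sourced on a machine without Ceph):
#   probe 10 ceph tell osd.6 version      # goes through the messenger
#   probe 10 ceph daemon osd.6 version    # goes through the admin socket
```

With the symptoms described above, the first call would be expected to report a timeout while the second reports ok, which is consistent with the admin socket path taking fewer locks. If gdb is installed, `gcore` run against the OSD pid before the restart is one way to get the core dump.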

# /etc/init.d/ceph stop osd.6
=== osd.6 ===
Stopping Ceph osd.6 on g3...kill 23690...kill 23690...done
root@g3:/var/lib/ceph/osd/ceph-6/current# /etc/init.d/ceph start osd.6
=== osd.6 ===
Starting Ceph osd.6 on g3...
starting osd.6 at :/0 osd_data /var/lib/ceph/osd/ceph-6 /var/lib/ceph/osd/ceph-6/journal
root@g3:/var/lib/ceph/osd/ceph-6/current# ceph tell osd.6 version
{ "version": "ceph version 0.80.6 (f93610a4421cb670b08e974c6550ee715ac528ae)"}
root@g3:/var/lib/ceph/osd/ceph-6/current# ceph tell osd.6 version

and now it blocks. It looks like a deadlock happens shortly after it boots.

--
Loïc Dachary, Artisan Logiciel Libre
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com