Have you tested on the new ceph-fuse? This does sound vaguely familiar and is an issue I'd generally expect to have the fix backported for, once it was identified.

On Thu, Nov 2, 2017 at 11:40 AM Andras Pataki <apat...@flatironinstitute.org> wrote:
> We've been running into a strange problem with Ceph using ceph-fuse and
> the filesystem. All the back end nodes are on 10.2.10, the fuse clients
> are on 10.2.7.
>
> After some hours of runs, some processes get stuck waiting for fuse like:
>
> [root@worker1144 ~]# cat /proc/58193/stack
> [<ffffffffa08cd241>] wait_answer_interruptible+0x91/0xe0 [fuse]
> [<ffffffffa08cd653>] __fuse_request_send+0x253/0x2c0 [fuse]
> [<ffffffffa08cd6d2>] fuse_request_send+0x12/0x20 [fuse]
> [<ffffffffa08d69d6>] fuse_send_write+0xd6/0x110 [fuse]
> [<ffffffffa08d84d5>] fuse_perform_write+0x2f5/0x5a0 [fuse]
> [<ffffffffa08d8a21>] fuse_file_aio_write+0x2a1/0x340 [fuse]
> [<ffffffff811fdfbd>] do_sync_write+0x8d/0xd0
> [<ffffffff811fe82d>] vfs_write+0xbd/0x1e0
> [<ffffffff811ff34f>] SyS_write+0x7f/0xe0
> [<ffffffff816975c9>] system_call_fastpath+0x16/0x1b
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> The cluster is healthy (all OSDs up, no slow requests, etc.). More
> details of my investigation efforts are in the bug report I just submitted:
> http://tracker.ceph.com/issues/22008
>
> It looks like the fuse client is asking for some caps that it never
> thinks it receives from the MDS, so the thread waiting for those caps on
> behalf of the writing client never wakes up. The restart of the MDS
> fixes the problem (since ceph-fuse re-negotiates caps).
>
> Any ideas/suggestions?
>
> Andras
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
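
For anyone trying to spot the same symptom across many workers, the manual `cat /proc/<pid>/stack` check quoted above could be automated roughly as follows. This is only a sketch under assumptions: the function names and the choice of marker frames (`wait_answer_interruptible`, `__fuse_request_send`, taken from the stack trace in the report) are illustrative, reading `/proc/<pid>/stack` normally requires root, and a task that shows up once may simply be waiting on a request that is about to complete. Only PIDs that stay in the list across repeated runs are likely to be genuinely stuck.

```python
import glob
import os

# Kernel frames seen in the reported trace while a task waits on a FUSE reply.
# Which frames to match is an assumption; adjust for your kernel version.
FUSE_WAIT_FRAMES = ("wait_answer_interruptible", "__fuse_request_send")


def is_waiting_on_fuse(stack_text):
    """Return True if a /proc/<pid>/stack dump contains a FUSE wait frame."""
    return any(frame in stack_text for frame in FUSE_WAIT_FRAMES)


def find_fuse_waiters(proc_root="/proc"):
    """Return sorted PIDs whose kernel stack currently shows a FUSE wait."""
    waiters = []
    for path in glob.glob(os.path.join(proc_root, "[0-9]*", "stack")):
        try:
            with open(path) as f:
                stack = f.read()
        except OSError:
            # The process exited in the meantime, or we lack permission.
            continue
        if is_waiting_on_fuse(stack):
            waiters.append(int(path.split(os.sep)[-2]))
    return sorted(waiters)


if __name__ == "__main__":
    for pid in find_fuse_waiters():
        print(pid)
```

Running it a few times some seconds apart and comparing the output gives a quick list of candidate PIDs to attach to the tracker issue.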