> On Jul 30, 2015, at 12:48 PM, Z Zhang <zhangz.da...@outlook.com> wrote:
>
> We also hit a similar issue from time to time on CentOS with a 3.10.x kernel.
> With iostat we can see the kernel rbd client's util at 100%, but no r/w io,
> and we can't umount/unmap this rbd client. After restarting the OSDs, it goes
> back to normal.
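The same symptom can be spotted without iostat by watching the in-flight and busy-time counters in /sys/block/<dev>/stat. Below is a minimal sketch, assuming the mapped device is rbd0 and the 11-field stat layout of 3.x kernels (the device name, sampling interval and 90% threshold are only placeholders): a device that stays busy for a whole sampling interval while completing nothing matches the "100% util, zero throughput" picture described above.

#!/usr/bin/env python
# Sample /sys/block/<dev>/stat twice and flag the "busy but nothing
# completes" pattern.  Assumes the pre-4.18 11-field stat layout:
# field 0 = reads completed, 4 = writes completed,
# 8 = requests in flight, 9 = io_ticks (ms the device was busy).
import time


def read_stat(dev):
    with open('/sys/block/%s/stat' % dev) as f:
        fields = [int(x) for x in f.read().split()]
    return {'completed': fields[0] + fields[4],
            'in_flight': fields[8],
            'io_ticks': fields[9]}


def looks_stuck(dev, interval=5.0):
    before = read_stat(dev)
    time.sleep(interval)
    after = read_stat(dev)
    # iostat's %util is derived from the io_ticks delta over the interval.
    busy_pct = 100.0 * (after['io_ticks'] - before['io_ticks']) / (interval * 1000)
    no_progress = (after['completed'] == before['completed']
                   and after['in_flight'] > 0)
    return busy_pct > 90 and no_progress


if __name__ == '__main__':
    dev = 'rbd0'  # placeholder: the mapped rbd device to watch
    state = 'stuck' if looks_stuck(dev) else 'fine'
    print('%s looks %s' % (dev, state))

Running this while the hang is happening should report the device as stuck if requests are sitting in flight without ever completing.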
Are your rbd kernel client and ceph OSDs running on the same machine? Or have you encountered this problem even when the kernel client and the ceph OSDs are separated?

> @Ilya, could you pls point us to the possible fixes in 3.18.19 for this
> issue? Then we can try to back-port them to our old kernel, because we can't
> jump to a major kernel version.
>
> Thanks.
>
> David Zhang
>
> From: chaofa...@owtware.com
> Date: Thu, 30 Jul 2015 10:30:12 +0800
> To: idryo...@gmail.com
> CC: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] which kernel version can help avoid kernel client deadlock
>
> > On Jul 29, 2015, at 12:40 AM, Ilya Dryomov <idryo...@gmail.com> wrote:
> >
> > > On Tue, Jul 28, 2015 at 7:20 PM, van <chaofa...@owtware.com> wrote:
> > >
> > > > On Jul 28, 2015, at 7:57 PM, Ilya Dryomov <idryo...@gmail.com> wrote:
> > > >
> > > > > On Tue, Jul 28, 2015 at 2:46 PM, van <chaofa...@owtware.com> wrote:
> > > > >
> > > > > > Hi, Ilya,
> > > > > >
> > > > > > In the dmesg there are also a lot of libceph socket errors, which
> > > > > > I think may be caused by my stopping the ceph service without
> > > > > > unmapping the rbd device.
> > > > >
> > > > > Well, sure enough, if you kill all OSDs, the filesystem mounted on
> > > > > top of the rbd device will get stuck.
> > > >
> > > > Sure, it will get stuck if the OSDs are stopped. And since rados
> > > > requests have a retry policy, the stuck requests recover after I start
> > > > the daemons again.
> > > >
> > > > But in my case the OSDs are running in a normal state and the librbd
> > > > API can read/write normally. Meanwhile, a heavy fio test on the
> > > > filesystem mounted on top of the rbd device gets stuck.
> > > >
> > > > I wonder if this is triggered by running the rbd kernel client on
> > > > machines that also run ceph daemons, i.e. the annoying loopback mount
> > > > deadlock issue.
> > > >
> > > > In my opinion, if it were due to the loopback mount deadlock, the OSDs
> > > > would become unresponsive, no matter whether the requests come from
> > > > user space (like the API) or from the kernel client. Am I right?
> > >
> > > Not necessarily.
> > >
> > > > If so, my case seems to be triggered by another bug. Anyway, it seems
> > > > that I should at least separate the client and the daemons.
> > >
> > > Try 3.18.19 if you can. I'd be interested in your results.
> >
> > It's strange: after I dropped the page cache and restarted my OSDs, the
> > same heavy IO tests on the rbd folder now work fine. The deadlock does not
> > seem that easy to trigger. Maybe I need longer tests.
> >
> > I'll try 3.18.19 LTS, thanks.
> >
> > > Thanks,
> > >
> > > Ilya
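van's observation that the librbd API still reads and writes while the kernel client is stuck can be checked directly from user space with the python rados/rbd bindings. A minimal sketch, assuming a readable /etc/ceph/ceph.conf, a pool named rbd and an existing image named test-image (both names are placeholders for your own setup):

#!/usr/bin/env python
# Do a small write/read through librbd (user space), bypassing the kernel
# client entirely.  Pool and image names below are placeholders.
import rados
import rbd

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('rbd')            # placeholder pool name
try:
    image = rbd.Image(ioctx, 'test-image')   # placeholder image name
    try:
        image.write(b'A' * 4096, 0)          # 4 KiB write at offset 0
        data = image.read(0, 4096)           # read it back
        print('librbd round-trip ok, %d bytes' % len(data))
    finally:
        image.close()
finally:
    ioctx.close()
    cluster.shutdown()

If this round-trip completes promptly while I/O on the filesystem mounted on the mapped /dev/rbdX device hangs, the OSDs are still servicing requests and the problem is specific to the kernel client path.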
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com