> On Jul 30, 2015, at 12:48 PM, Z Zhang <zhangz.da...@outlook.com> wrote:
>
> We also hit a similar issue from time to time on CentOS with a 3.10.x kernel.
> With iostat we can see the kernel rbd client's util at 100%, but no r/w io,
> and we can't umount/unmap this rbd client. After restarting the OSDs, it goes
> back to normal.
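The same symptom can be spotted without iostat by watching the in-flight and busy-time counters in /sys/block/<dev>/stat. Below is a minimal sketch, assuming the mapped device is rbd0 and the 11-field stat layout of 3.x kernels (the device name, sampling interval and 90% threshold are only placeholders): a device that stays busy for a whole sampling interval while completing nothing matches the "100% util, zero throughput" picture described above.

#!/usr/bin/env python
# Sample /sys/block/<dev>/stat twice and flag the "busy but nothing
# completes" pattern.  Assumes the pre-4.18 11-field stat layout:
# field 0 = reads completed, 4 = writes completed,
# 8 = requests in flight, 9 = io_ticks (ms the device was busy).
import time


def read_stat(dev):
    with open('/sys/block/%s/stat' % dev) as f:
        fields = [int(x) for x in f.read().split()]
    return {'completed': fields[0] + fields[4],
            'in_flight': fields[8],
            'io_ticks': fields[9]}


def looks_stuck(dev, interval=5.0):
    before = read_stat(dev)
    time.sleep(interval)
    after = read_stat(dev)
    # iostat's %util is derived from the io_ticks delta over the interval.
    busy_pct = 100.0 * (after['io_ticks'] - before['io_ticks']) / (interval * 1000)
    no_progress = (after['completed'] == before['completed']
                   and after['in_flight'] > 0)
    return busy_pct > 90 and no_progress


if __name__ == '__main__':
    dev = 'rbd0'  # placeholder: the mapped rbd device to watch
    state = 'stuck' if looks_stuck(dev) else 'fine'
    print('%s looks %s' % (dev, state))

Running this while the hang is happening should report the device as stuck if requests are sitting in flight without ever completing.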
Are your rbd kernel client and ceph OSDs running on the same machine? Or have you encountered this problem even when the kernel client and the ceph OSDs are separated?

> @Ilya, could you pls point us to the possible fixes in 3.18.19 for this
> issue? Then we can try to back-port them to our old kernel, because we can't
> jump to a major kernel version.
>
> Thanks.
>
> David Zhang
>
> From: chaofa...@owtware.com
> Date: Thu, 30 Jul 2015 10:30:12 +0800
> To: idryo...@gmail.com
> CC: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] which kernel version can help avoid kernel client deadlock
>
> > On Jul 29, 2015, at 12:40 AM, Ilya Dryomov <idryo...@gmail.com> wrote:
> >
> > > On Tue, Jul 28, 2015 at 7:20 PM, van <chaofa...@owtware.com> wrote:
> > >
> > > > On Jul 28, 2015, at 7:57 PM, Ilya Dryomov <idryo...@gmail.com> wrote:
> > > >
> > > > > On Tue, Jul 28, 2015 at 2:46 PM, van <chaofa...@owtware.com> wrote:
> > > > >
> > > > > > Hi, Ilya,
> > > > > >
> > > > > > In the dmesg there are also a lot of libceph socket errors, which
> > > > > > I think may be caused by my stopping the ceph service without
> > > > > > unmapping the rbd device.
> > > > >
> > > > > Well, sure enough, if you kill all OSDs, the filesystem mounted on
> > > > > top of the rbd device will get stuck.
> > > >
> > > > Sure, it will get stuck if the OSDs are stopped. And since rados
> > > > requests have a retry policy, the stuck requests recover after I start
> > > > the daemons again.
> > > >
> > > > But in my case the OSDs are running in a normal state and the librbd
> > > > API can read/write normally. Meanwhile, a heavy fio test on the
> > > > filesystem mounted on top of the rbd device gets stuck.
> > > >
> > > > I wonder if this is triggered by running the rbd kernel client on
> > > > machines that also run ceph daemons, i.e. the annoying loopback mount
> > > > deadlock issue.
> > > >
> > > > In my opinion, if it were due to the loopback mount deadlock, the OSDs
> > > > would become unresponsive, no matter whether the requests come from
> > > > user space (like the API) or from the kernel client. Am I right?
> > >
> > > Not necessarily.
> > >
> > > > If so, my case seems to be triggered by another bug. Anyway, it seems
> > > > that I should at least separate the client and the daemons.
> > >
> > > Try 3.18.19 if you can. I'd be interested in your results.
> >
> > It's strange: after I dropped the page cache and restarted my OSDs, the
> > same heavy IO tests on the rbd folder now work fine. The deadlock does not
> > seem that easy to trigger. Maybe I need longer tests.
> >
> > I'll try 3.18.19 LTS, thanks.
> >
> > > Thanks,
> > >
> > > Ilya
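van's observation that the librbd API still reads and writes while the kernel client is stuck can be checked directly from user space with the python rados/rbd bindings. A minimal sketch, assuming a readable /etc/ceph/ceph.conf, a pool named rbd and an existing image named test-image (both names are placeholders for your own setup):

#!/usr/bin/env python
# Do a small write/read through librbd (user space), bypassing the kernel
# client entirely.  Pool and image names below are placeholders.
import rados
import rbd

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('rbd')            # placeholder pool name
try:
    image = rbd.Image(ioctx, 'test-image')   # placeholder image name
    try:
        image.write(b'A' * 4096, 0)          # 4 KiB write at offset 0
        data = image.read(0, 4096)           # read it back
        print('librbd round-trip ok, %d bytes' % len(data))
    finally:
        image.close()
finally:
    ioctx.close()
    cluster.shutdown()

If this round-trip completes promptly while I/O on the filesystem mounted on the mapped /dev/rbdX device hangs, the OSDs are still servicing requests and the problem is specific to the kernel client path.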
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com