Re: [ceph-users] Kernel RBD hang on OSD Failure

2015-12-17, Tom Christensen
I've just checked 1072 and 872, they both look the same, a single op for the object in question, in retry+read state, appears to be retrying forever.
On Thu, Dec 17, 2015 at 10:05 AM, Tom Christensen wrote:
> I had already nuked the previous hang, but we have another one:
>
> osdc output:
>
> 7
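A stuck op like this is easiest to confirm by comparing two reads of the client's osdc file in debugfs (typically under /sys/kernel/debug/ceph/<fsid>.client<id>/osdc) taken some time apart: a tid still listed in the later read never completed. The Python below is a minimal sketch of that comparison, not a Ceph tool. It assumes only that each in-flight request line starts with a numeric tid and that long-lived watch (linger) requests appear after a header beginning with "LINGER". The columns after the tid differ between kernel versions, so they are kept verbatim, and the sample lines are hypothetical.

```python
import re

# Assumption: a request line starts with a numeric tid; everything after it
# is kernel-version dependent, so it is captured verbatim, not parsed.
_REQ_LINE = re.compile(r"^\s*(\d+)\s+(.*)$")


def parse_osdc(text):
    """Map tid -> rest of line for the in-flight requests in one osdc read."""
    reqs = {}
    for line in text.splitlines():
        if line.startswith("LINGER"):
            # Assumption: linger (watch/notify) requests follow this header and
            # stay registered by design, so they are not treated as stuck.
            break
        m = _REQ_LINE.match(line)
        if m:
            reqs[int(m.group(1))] = m.group(2)
    return reqs


def stuck_requests(before, after):
    """Return {tid: line} for requests present in both reads (never completed)."""
    old, new = parse_osdc(before), parse_osdc(after)
    return {tid: new[tid] for tid in sorted(old.keys() & new.keys())}


if __name__ == "__main__":
    # Hypothetical snapshots; real osdc columns vary by kernel version.
    t0 = "101 osd5 2.1a rbd_data.example.0001 read\n102 osd3 2.3f rbd_data.example.0002 read\n"
    t1 = "102 osd3 2.3f rbd_data.example.0002 retry+read\n103 osd1 2.07 rbd_data.example.0003 write\n"
    print(stuck_requests(t0, t1))  # {102: 'osd3 2.3f rbd_data.example.0002 retry+read'}
```

Keying on the tid alone keeps the check independent of how a given kernel lays out the remaining columns.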

Re: [ceph-users] Kernel RBD hang on OSD Failure

2015-12-12, Ilya Dryomov
On Sat, Dec 12, 2015 at 6:37 PM, Tom Christensen wrote:
> We had a kernel map get hung up again last night/this morning. The rbd is
> mapped but unresponsive, if I try to unmap it I get the following error:
> rbd: sysfs write failed
> rbd: unmap failed: (16) Device or resource busy
>
> Now that t

Re: [ceph-users] Kernel RBD hang on OSD Failure

2015-12-10, Matt Conner
Hi Ilya, I had already recovered but I managed to recreate the problem again. I ran the commands against rbd_data.f54f9422698a8. which was one of those listed in osdc this time. We have 2048 PGs in the pool so the list is long. As for when I fetched the object using rados, it grab
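With 2048 PGs (a power of two), the PG seed an object lands on is simply the low 11 bits of its object-name hash. Ceph performs this reduction with a "stable mod" that also handles pg_num values that are not powers of two. Below is a Python transcription of ceph_stable_mod, offered as an illustrative sketch: it takes the 32-bit name hash as an input rather than computing Ceph's rjenkins hash, and the hash and pool id in the usage lines are hypothetical. On a live cluster, `ceph osd map <pool> <object>` reports the PG and acting OSDs directly.

```python
def pg_num_mask(pg_num):
    """Smallest all-ones bitmask covering pg_num - 1 (2047 for 2048 PGs)."""
    return (1 << (pg_num - 1).bit_length()) - 1


def ceph_stable_mod(x, b, bmask):
    """Transcription of Ceph's stable mod: fold x into [0, b) using bmask."""
    if (x & bmask) < b:
        return x & bmask
    # Value falls in the not-yet-split upper half: drop the top mask bit.
    return x & (bmask >> 1)


def pg_seed(obj_hash, pg_num):
    """PG seed for a 32-bit object hash in a pool with pg_num PGs."""
    return ceph_stable_mod(obj_hash, pg_num, pg_num_mask(pg_num))


if __name__ == "__main__":
    obj_hash = 0xDEADBEEF  # hypothetical 32-bit hash of an object name
    pool_id = 1            # hypothetical pool id
    # PG ids are printed as <pool>.<seed in hex>.
    print(f"{pool_id}.{pg_seed(obj_hash, 2048):x}")  # prints 1.6ef
```

The stable mod is why PGs can be split one at a time without reshuffling every object: values above pg_num fall back onto their "parent" PG until that PG is split.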

Re: [ceph-users] Kernel RBD hang on OSD Failure

2015-12-08, Ilya Dryomov
On Tue, Dec 8, 2015 at 11:53 AM, Tom Christensen wrote:
> To be clear, we are also using format 2 RBDs, so we didn't really expect it
> to work until recently as it was listed as unsupported. We are under the
> understanding that as of 3.19 RBD format 2 should be supported. Are we
> incorrect in

Re: [ceph-users] Kernel RBD hang on OSD Failure

2015-12-08, Tom Christensen
To be clear, we are also using format 2 RBDs, so we didn't really expect it to work until recently as it was listed as unsupported. We are under the understanding that as of 3.19 RBD format 2 should be supported. Are we incorrect in that understanding?
On Tue, Dec 8, 2015 at 3:44 AM, Tom Christe

Re: [ceph-users] Kernel RBD hang on OSD Failure

2015-12-08, Tom Christensen
We haven't submitted a ticket as we've just avoided using the kernel client. We've periodically tried with various kernels and various versions of ceph over the last two years, but have just given up each time and reverted to using rbd-fuse, which although not super stable, at least doesn't hang t

Re: [ceph-users] Kernel RBD hang on OSD Failure

2015-12-08, Ilya Dryomov
On Tue, Dec 8, 2015 at 10:57 AM, Tom Christensen wrote:
> We aren't running NFS, but regularly use the kernel driver to map RBDs and
> mount filesystems in same. We see very similar behavior across nearly all
> kernel versions we've tried. In my experience only very few versions of the
> kernel

Re: [ceph-users] Kernel RBD hang on OSD Failure

2015-12-08, Tom Christensen
We aren't running NFS, but regularly use the kernel driver to map RBDs and mount filesystems in same. We see very similar behavior across nearly all kernel versions we've tried. In my experience only very few versions of the kernel driver survive any sort of crush map change/update while somethin

Re: [ceph-users] Kernel RBD hang on OSD Failure

2015-12-07, Blair Bethwaite
Hi Matt,
(CC'ing in ceph-users too - similar reports there: http://www.spinics.net/lists/ceph-users/msg23037.html)
We've seen something similar for KVM [lib]RBD clients acting as NFS gateways within our OpenStack cloud, the NFS services were locking up and causing client timeouts whenever we star