Re: [ceph-users] hung rbd requests for one pool

2017-04-26 Thread Phil Lacroute
A quick update just to close out this thread: After investigating with netstat I found one ceph-osd process had three TCP connections in established state but with no connection state on the peer system (the client node that previously had been using the RBD image). The qemu process on the cli
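
The netstat comparison described above can be sketched in Python. This is only an illustration, not the procedure actually used in the thread: it assumes `netstat -anp` style output has been captured from both hosts, and the function names and the client port numbers in the usage note are made up.

```python
def established(netstat_text):
    """Return the set of (local, remote) address pairs in ESTABLISHED state."""
    pairs = set()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Assumed columns: proto recv-q send-q local remote state [pid/program]
        if len(fields) >= 6 and fields[0].startswith("tcp") and fields[5] == "ESTABLISHED":
            pairs.add((fields[3], fields[4]))
    return pairs


def half_open(server_text, client_text):
    """List connections the server holds that the client has no entry for."""
    client_pairs = established(client_text)
    # The client sees the same connection with local and remote swapped.
    return sorted(
        (local, remote)
        for local, remote in established(server_text)
        if (remote, local) not in client_pairs
    )
```

Feeding it the OSD host's output and the client host's output returns the connections that exist only on the OSD side, for example a hypothetical `192.168.206.13:6804` to `192.168.206.17:40002` entry that the client no longer knows about.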

Re: [ceph-users] hung rbd requests for one pool

2017-04-24 Thread Jason Dillaman
I would double-check your file descriptor limits on both sides -- OSDs and the client. 131 sockets shouldn't make a difference. Port is open on any possible firewalls you have running?

On Mon, Apr 24, 2017 at 8:14 PM, Phil Lacroute wrote:
> Yes it is the correct IP and port:
>
> ceph3:~$ netstat
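
One way to run the file descriptor check suggested above on a Linux host is to compare a process's open descriptors against its limit via `/proc`. This is a rough sketch, not something from the thread; the helper names are hypothetical, and reading another user's process normally requires root.

```python
import os


def parse_max_open_files(limits_text):
    """Extract the (soft, hard) 'Max open files' values from /proc/<pid>/limits text."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Columns: "Max open files", soft limit, hard limit, units
            soft, hard = line.split()[3:5]
            return soft, hard
    return None


def fd_report(pid):
    """Return (open descriptor count, soft limit, hard limit) for a process."""
    with open(f"/proc/{pid}/limits") as f:
        limits = parse_max_open_files(f.read())
    if limits is None:
        raise ValueError("no 'Max open files' line found")
    used = len(os.listdir(f"/proc/{pid}/fd"))
    return used, limits[0], limits[1]
```

Calling `fd_report(22934)` on the OSD host (22934 being the ceph-osd PID shown in the netstat output in this thread) would show how close that daemon is to its limit; the same call applies to the qemu process on the client.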

Re: [ceph-users] hung rbd requests for one pool

2017-04-24 Thread Phil Lacroute
Yes it is the correct IP and port:

ceph3:~$ netstat -anp | fgrep 192.168.206.13:6804
tcp 0 0 192.168.206.13:6804 0.0.0.0:* LISTEN 22934/ceph-osd

I turned up the logging on the osd and I don’t think it received the request. However I also noticed a large num

Re: [ceph-users] hung rbd requests for one pool

2017-04-24 Thread Jason Dillaman
Just to cover all the bases, is 192.168.206.13:6804 really associated with a running daemon for OSD 11?

On Mon, Apr 24, 2017 at 4:23 PM, Phil Lacroute wrote:
> Jason,
>
> Thanks for the suggestion. That seems to show it is not the OSD that got
> stuck:
>
> ceph7:~$ sudo rbd -c debug/ceph.conf in

Re: [ceph-users] hung rbd requests for one pool

2017-04-24 Thread Peter Maloney
On 04/24/17 22:23, Phil Lacroute wrote:
> Jason,
>
> Thanks for the suggestion. That seems to show it is not the OSD that
> got stuck:
>
> ceph7:~$ sudo rbd -c debug/ceph.conf info app/image1
> …
> 2017-04-24 13:13:49.761076 7f739aefc700 1 --
> 192.168.206.17:0/1250293899 --> 192.168.206.13:6804/

Re: [ceph-users] hung rbd requests for one pool

2017-04-24 Thread Phil Lacroute
Jason,

Thanks for the suggestion. That seems to show it is not the OSD that got stuck:

ceph7:~$ sudo rbd -c debug/ceph.conf info app/image1
…
2017-04-24 13:13:49.761076 7f739aefc700 1 -- 192.168.206.17:0/1250293899 --> 192.168.206.13:6804/22934 -- osd_op(client.4384.0:3 1.af6f1e38 rbd_header.

Re: [ceph-users] hung rbd requests for one pool

2017-04-24 Thread Jason Dillaman
On Mon, Apr 24, 2017 at 2:53 PM, Phil Lacroute wrote:
> 2017-04-24 11:30:57.058233 7f5512ffd700 1 -- 192.168.206.17:0/3282647735
> --> 192.168.206.13:6804/22934 -- osd_op(client.4375.0:3 1.af6f1e38
> rbd_header.1058238e1f29 [call rbd.get_size,call rbd.get_object_prefix] snapc
> 0=[] ack+read+know