That doesn't appear to be an error -- it's just stating that the kernel client found a dead client that was holding the exclusive lock, so it broke the dead client's lock on the image (by blacklisting that client).
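
If you want to double-check from the cluster side, the blacklist entry and the image's current watchers can be inspected with e.g. the following (image name taken from the quoted thread below; see also the caps sketch at the very bottom of this message for the permission change discussed further down):

# ceph osd blacklist ls        <-- lists the currently blacklisted client addresses
# rbd status 4copy/foo         <-- shows the image's current watchers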
On Fri, Jan 25, 2019 at 5:09 AM ST Wong (ITSC) <s...@itsc.cuhk.edu.hk> wrote:
>
> Oops, while I can map and mount the filesystem, I still found errors as below,
> and rebooting the client machine freezes it, so I have to power reset it.
>
> Jan 25 17:57:30 acapp1 kernel: XFS (rbd0): Mounting V5 Filesystem
> Jan 25 17:57:30 acapp1 kernel: rbd: rbd0: client74700 seems dead, breaking lock   <--
> Jan 25 17:57:30 acapp1 kernel: XFS (rbd0): Starting recovery (logdev: internal)
> Jan 25 17:57:30 acapp1 kernel: XFS (rbd0): Ending recovery (logdev: internal)
> Jan 25 17:58:07 acapp1 kernel: rbd: rbd1: capacity 10737418240 features 0x5
> Jan 25 17:58:14 acapp1 kernel: XFS (rbd1): Mounting V5 Filesystem
> Jan 25 17:58:14 acapp1 kernel: rbd: rbd1: client74700 seems dead, breaking lock   <--
> Jan 25 17:58:15 acapp1 kernel: XFS (rbd1): Starting recovery (logdev: internal)
> Jan 25 17:58:15 acapp1 kernel: XFS (rbd1): Ending recovery (logdev: internal)
>
> Would you help? Thanks.
> /st
>
> -----Original Message-----
> From: ceph-users <ceph-users-boun...@lists.ceph.com> On Behalf Of ST Wong (ITSC)
> Sent: Friday, January 25, 2019 5:58 PM
> To: dilla...@redhat.com
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] RBD client hangs
>
> Hi, it works. Thanks a lot.
>
> /st
>
> -----Original Message-----
> From: Jason Dillaman <jdill...@redhat.com>
> Sent: Tuesday, January 22, 2019 9:29 PM
> To: ST Wong (ITSC) <s...@itsc.cuhk.edu.hk>
> Cc: Ilya Dryomov <idryo...@gmail.com>; ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] RBD client hangs
>
> Your "mon" cap should be "profile rbd" instead of "allow r" [1].
>
> [1] http://docs.ceph.com/docs/master/rbd/rados-rbd-cmds/#create-a-block-device-user
>
> On Mon, Jan 21, 2019 at 9:05 PM ST Wong (ITSC) <s...@itsc.cuhk.edu.hk> wrote:
> >
> > Hi,
> >
> > > Is this an upgraded or a fresh cluster?
> > It's a fresh cluster.
> >
> > > Does client.acapp1 have the permission to blacklist other clients? You
> > > can check with "ceph auth get client.acapp1".
> >
> > No, it's our first Ceph cluster with a basic setup for testing, without any
> > blacklist permissions configured.
> >
> > --------------- cut here -----------
> > # ceph auth get client.acapp1
> > exported keyring for client.acapp1
> > [client.acapp1]
> >         key = <key here>
> >         caps mds = "allow r"
> >         caps mgr = "allow r"
> >         caps mon = "allow r"
> >         caps osd = "allow rwx pool=2copy, allow rwx pool=4copy"
> > --------------- cut here -----------
> >
> > Thanks a lot.
> > /st
> >
> > -----Original Message-----
> > From: Ilya Dryomov <idryo...@gmail.com>
> > Sent: Monday, January 21, 2019 7:33 PM
> > To: ST Wong (ITSC) <s...@itsc.cuhk.edu.hk>
> > Cc: ceph-users@lists.ceph.com
> > Subject: Re: [ceph-users] RBD client hangs
> >
> > On Mon, Jan 21, 2019 at 11:43 AM ST Wong (ITSC) <s...@itsc.cuhk.edu.hk> wrote:
> > >
> > > Hi, we're trying Mimic on a VM farm. It consists of 4 OSD hosts (8 OSDs)
> > > and 3 MONs. We tried mounting as RBD and CephFS (fuse and kernel
> > > mount) on different clients without problem.
> >
> > Is this an upgraded or a fresh cluster?
> >
> > > Then one day we performed a failover test and stopped one of the OSDs. Not
> > > sure if it's related, but after that testing the RBD client freezes when
> > > trying to mount the rbd device.
> > >
> > > Steps to reproduce:
> > >
> > > # modprobe rbd
> > >
> > > (dmesg)
> > > [ 309.997587] Key type dns_resolver registered
> > > [ 310.043647] Key type ceph registered
> > > [ 310.044325] libceph: loaded (mon/osd proto 15/24)
> > > [ 310.054548] rbd: loaded
> > >
> > > # rbd -n client.acapp1 map 4copy/foo
> > > /dev/rbd0
> > >
> > > # rbd showmapped
> > > id pool  image snap device
> > > 0  4copy foo   -    /dev/rbd0
> > >
> > > Then it hangs if I try to mount or reboot the server after rbd map.
> > > There are a lot of errors in dmesg, e.g.
> > >
> > > Jan 20 03:43:32 acapp1 kernel: rbd: rbd0: blacklist of client74700 failed: -13
> > > Jan 20 03:43:32 acapp1 kernel: rbd: rbd0: failed to acquire lock: -13
> > > Jan 20 03:43:32 acapp1 kernel: rbd: rbd0: no lock owners detected
> > > Jan 20 03:43:32 acapp1 kernel: rbd: rbd0: client74700 seems dead, breaking lock
> > > Jan 20 03:43:32 acapp1 kernel: rbd: rbd0: blacklist of client74700 failed: -13
> > > Jan 20 03:43:32 acapp1 kernel: rbd: rbd0: failed to acquire lock: -13
> > > Jan 20 03:43:32 acapp1 kernel: rbd: rbd0: no lock owners detected
> >
> > Does client.acapp1 have the permission to blacklist other clients? You can
> > check with "ceph auth get client.acapp1". If not, follow step 6 of
> > http://docs.ceph.com/docs/master/releases/luminous/#upgrade-from-jewel-or-kraken.
> >
> > Thanks,
> >
> >                 Ilya
>
> --
> Jason

--
Jason
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
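
P.S. For the archives: the "blacklist of client74700 failed: -13" errors quoted above are the client being denied permission to blacklist (-13 is EACCES), which is what the "profile rbd" mon cap fixes. A rough sketch of the corresponding caps update, assuming the existing mds/mgr/osd caps shown in the thread should stay as they are (note that "ceph auth caps" replaces the entire cap set, so restate every cap you want to keep):

# ceph auth caps client.acapp1 mds 'allow r' mgr 'allow r' mon 'profile rbd' osd 'allow rwx pool=2copy, allow rwx pool=4copy'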