Hi,

> Is this an upgraded or a fresh cluster?
It's a fresh cluster.

> Does client.acapp1 have the permission to blacklist other clients?  You can 
> check with "ceph auth get client.acapp1".  

No, it's our first Ceph cluster, a basic setup for testing; we haven't
configured any blacklist permissions.

--------------- cut here -----------
# ceph auth get client.acapp1
exported keyring for client.acapp1
[client.acapp1]
        key = <key here>
        caps mds = "allow r"
        caps mgr = "allow r"
        caps mon = "allow r"
        caps osd = "allow rwx pool=2copy, allow rwx pool=4copy"
--------------- cut here -----------
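
By the way, would switching to the rbd profile caps be the right fix here?
Something like the following (just my reading of the docs, not tested yet):

--------------- cut here -----------
# ceph auth caps client.acapp1 mon 'profile rbd' \
        osd 'profile rbd pool=2copy, profile rbd pool=4copy'
--------------- cut here -----------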

Thanks a lot.
/st



-----Original Message-----
From: Ilya Dryomov <idryo...@gmail.com> 
Sent: Monday, January 21, 2019 7:33 PM
To: ST Wong (ITSC) <s...@itsc.cuhk.edu.hk>
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] RBD client hangs

On Mon, Jan 21, 2019 at 11:43 AM ST Wong (ITSC) <s...@itsc.cuhk.edu.hk> wrote:
>
> Hi, we’re trying mimic on a VM farm.  It consists of 4 OSD hosts (8 OSDs)
> and 3 MONs.  We tried mounting as RBD and CephFS (fuse and kernel mount) on
> different clients without problems.

Is this an upgraded or a fresh cluster?

>
> Then one day we performed a failover test and stopped one of the OSDs.  Not
> sure if it’s related, but after that test the RBD client freezes when trying
> to mount the rbd device.
>
> Steps to reproduce:
>
> # modprobe rbd
>
> (dmesg)
> [  309.997587] Key type dns_resolver registered
> [  310.043647] Key type ceph registered
> [  310.044325] libceph: loaded (mon/osd proto 15/24)
> [  310.054548] rbd: loaded
>
> # rbd -n client.acapp1 map 4copy/foo
> /dev/rbd0
>
> # rbd showmapped
> id pool  image snap device
> 0  4copy foo   -    /dev/rbd0
>
> Then it hangs if I try to mount the device or reboot the server after rbd
> map.  There are lots of errors in dmesg, e.g.:
>
> Jan 20 03:43:32 acapp1 kernel: rbd: rbd0: blacklist of client74700 failed: -13
> Jan 20 03:43:32 acapp1 kernel: rbd: rbd0: failed to acquire lock: -13
> Jan 20 03:43:32 acapp1 kernel: rbd: rbd0: no lock owners detected
> Jan 20 03:43:32 acapp1 kernel: rbd: rbd0: client74700 seems dead, breaking lock
> Jan 20 03:43:32 acapp1 kernel: rbd: rbd0: blacklist of client74700 failed: -13
> Jan 20 03:43:32 acapp1 kernel: rbd: rbd0: failed to acquire lock: -13
> Jan 20 03:43:32 acapp1 kernel: rbd: rbd0: no lock owners detected

Does client.acapp1 have the permission to blacklist other clients?  You can 
check with "ceph auth get client.acapp1".  If not, follow step 6 of 
http://docs.ceph.com/docs/master/releases/luminous/#upgrade-from-jewel-or-kraken.
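
(-13 is EACCES: the kernel client is being denied permission to run the
"osd blacklist" command.  If I remember the notes correctly, that step boils
down to re-issuing the user's caps along the lines of

  ceph auth caps client.<ID> mon 'allow r, allow command "osd blacklist"' \
      osd '<existing OSD caps for user>'

with the OSD caps kept exactly as they are.)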

Thanks,

                Ilya
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
