Hi,
I'm running a Ceph cluster with four iSCSI exporter nodes and oVirt on the client
side. In the tcmu-runner logs I see the following happening every few seconds:
###
2019-10-22 10:11:11.231 1710 [WARN] tcmu_rbd_lock:762 rbd/image.lun0: Acquired
exclusive lock.
2019-10-22 10:11:11.395 1710 [ERROR] tcmu_rbd_has_lock:516 rbd/image.lun2:
Could not check lock ownership. Error: Cannot send after transport endpoint
shutdown.
2019-10-22 10:11:12.346 1710 [WARN] tcmu_notify_lock_lost:222 rbd/image.lun0:
Async lock drop. Old state 1
2019-10-22 10:11:12.353 1710 [INFO] alua_implicit_transition:566
rbd/image.lun0: Starting lock acquisition operation.
2019-10-22 10:11:13.325 1710 [INFO] alua_implicit_transition:566
rbd/image.lun0: Starting lock acquisition operation.
2019-10-22 10:11:13.852 1710 [ERROR] tcmu_rbd_has_lock:516 rbd/image.lun2:
Could not check lock ownership. Error: Cannot send after transport endpoint
shutdown.
2019-10-22 10:11:13.854 1710 [ERROR] tcmu_rbd_has_lock:516 rbd/image.lun1:
Could not check lock ownership. Error: Cannot send after transport endpoint
shutdown.
2019-10-22 10:11:13.863 1710 [ERROR] tcmu_rbd_has_lock:516 rbd/image.lun1:
Could not check lock ownership. Error: Cannot send after transport endpoint
shutdown.
2019-10-22 10:11:14.202 1710 [INFO] alua_implicit_transition:566
rbd/image.lun0: Starting lock acquisition operation.
2019-10-22 10:11:14.285 1710 [ERROR] tcmu_rbd_has_lock:516 rbd/image.lun2:
Could not check lock ownership. Error: Cannot send after transport endpoint
shutdown.
2019-10-22 10:11:15.217 1710 [WARN] tcmu_rbd_lock:762 rbd/image.lun0: Acquired
exclusive lock.
2019-10-22 10:11:15.873 1710 [ERROR] tcmu_rbd_has_lock:516 rbd/image.lun2:
Could not check lock ownership. Error: Cannot send after transport endpoint
shutdown.
2019-10-22 10:11:16.696 1710 [WARN] tcmu_notify_lock_lost:222 rbd/image.lun0:
Async lock drop. Old state 1
2019-10-22 10:11:16.696 1710 [INFO] alua_implicit_transition:566
rbd/image.lun0: Starting lock acquisition operation.
2019-10-22 10:11:16.696 1710 [WARN] tcmu_notify_lock_lost:222 rbd/image.lun0:
Async lock drop. Old state 2
2019-10-22 10:11:16.992 1710 [ERROR] tcmu_rbd_has_lock:516 rbd/image.lun2:
Could not check lock ownership. Error: Cannot send after transport endpoint
shutdown.
###
This happens on all four of my iSCSI exporter nodes. The blacklist gives me the
following (the number of blacklisted entries does not really shrink):
###
ceph osd blacklist ls
listed 10579 entries
###
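In case it helps, this is roughly how I would inspect and clean up single entries
by hand (just a sketch; the address below is a placeholder, not one of my real
clients):
###
# list the first few entries to see which client addresses are affected
ceph osd blacklist ls | head -n 5
# remove a single stale entry by its address/nonce
ceph osd blacklist rm 192.168.121.1:0/3012034728
###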
On the client side I configured multipath like this:
###
device {
        vendor "LIO-ORG"
        hardware_handler "1 alua"
        path_grouping_policy "failover"
        path_selector "queue-length 0"
        failback 60
        path_checker tur
        prio alua
        prio_args exclusive_pref_bit
        fast_io_fail_tmo 25
        no_path_retry queue
}
###
And multipath -ll shows me all four paths as "active ready" without errors.
To me this looks like tcmu-runner cannot acquire (or keep) the exclusive lock
and it is flapping between nodes. In addition, in the Ceph GUI / dashboard I can
see the "active/optimized" state of the LUNs flapping between nodes ...
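A quick way to cross-check this (just a sketch, using one of the image names from
the log above) would be to watch the exclusive-lock owner and the watchers of an
image directly with rbd:
###
# shows the current exclusive-lock holder of the image
rbd lock list rbd/image.lun0
# shows the watchers, i.e. which gateways currently have the image open
rbd status rbd/image.lun0
###
If the lock owner shown there changes every few seconds, that would match the
flapping I see in the dashboard.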
I have the following versions installed (CentOS 7.7, Ceph 13.2.6):
###
rpm -qa |egrep "ceph|iscsi|tcmu|rst|kernel"
python-cephfs-13.2.6-0.el7.x86_64
ceph-selinux-13.2.6-0.el7.x86_64
kernel-3.10.0-957.5.1.el7.x86_64
kernel-3.10.0-957.1.3.el7.x86_64
kernel-tools-libs-3.10.0-1062.1.2.el7.x86_64
libcephfs2-13.2.6-0.el7.x86_64
libtcmu-1.4.0-106.gd17d24e.el7.x86_64
ceph-common-13.2.6-0.el7.x86_64
ceph-osd-13.2.6-0.el7.x86_64
tcmu-runner-1.4.0-106.gd17d24e.el7.x86_64
kernel-3.10.0-1062.1.2.el7.x86_64
ceph-iscsi-3.3-1.el7.noarch
kernel-headers-3.10.0-1062.1.2.el7.x86_64
kernel-3.10.0-862.14.4.el7.x86_64
ceph-base-13.2.6-0.el7.x86_64
kernel-tools-3.10.0-1062.1.2.el7.x86_64
###
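If more verbose logs would help, I can raise the tcmu-runner log level; a minimal
sketch (assuming the default config location /etc/tcmu/tcmu.conf, where 3 = info
and 4 = debug):
###
# /etc/tcmu/tcmu.conf
log_level = 4

# restart the runner so the new level takes effect
systemctl restart tcmu-runner
###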
Greets,
Kilian