Hi,
does anyone here use Ceph iSCSI with VMware ESXi? We seem to be hitting
the 5-second timeout limit on the software iSCSI HBA in ESXi. It happens
whenever there is increased load on the cluster, such as a deep scrub or a
rebalance. Is this normal behaviour in production, or is there something
special we need to tune?
We are on the latest Nautilus, 12 x 10 TB OSDs (4 servers), 25 Gbit/s Ethernet,
erasure-coded RBD pool with 128 PGs, around 200 PGs per OSD in total.


ESXi Log:

2020-10-04T01:57:04.314Z cpu34:2098959)WARNING: iscsi_vmk: iscsivmk_ConnReceiveAtomic:517: vmhba64:CH:1 T:0 CN:0: Failed to receive data: Connection closed by peer
2020-10-04T01:57:04.314Z cpu34:2098959)iscsi_vmk: iscsivmk_ConnRxNotifyFailure:1235: vmhba64:CH:1 T:0 CN:0: Connection rx notifying failure: Failed to Receive. State=Bound
2020-10-04T01:57:04.566Z cpu19:2098979)WARNING: iscsi_vmk: iscsivmk_StopConnection:741: vmhba64:CH:1 T:0 CN:0: iSCSI connection is being marked "OFFLINE" (Event:4)
2020-10-04T01:57:04.654Z cpu7:2097866)WARNING: VMW_SATP_ALUA: satp_alua_issueCommandOnPath:788: Probe cmd 0xa3 failed for path "vmhba64:C2:T0:L0" (0x5/0x20/0x0). Check if failover mode is still ALUA.


OSD Log:

[303088.450088] Did not receive response to NOPIN on CID: 0, failing connection for I_T Nexus iqn.1994-05.com.redhat:esxi1,i,0x00023d000002,iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw,t,0x01
[324926.694077] Did not receive response to NOPIN on CID: 0, failing connection for I_T Nexus iqn.1994-05.com.redhat:esxi2,i,0x00023d000001,iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw,t,0x01
[407067.404538] ABORT_TASK: Found referenced iSCSI task_tag: 5891
[407076.077175] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 5891
[411677.887690] ABORT_TASK: Found referenced iSCSI task_tag: 6722
[411683.297425] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 6722
[481459.755876] ABORT_TASK: Found referenced iSCSI task_tag: 7930
[481460.787968] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 7930
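
In case it helps, the kernel timestamps above give a rough idea of how long each abort took to complete. A minimal Python sketch, assuming the log format shown above (the function name abort_durations and the regexes are only an illustration, not part of any Ceph or LIO tool):

```python
import re

# Kernel timestamps are seconds since boot, e.g. "[407067.404538] ABORT_TASK: ..."
FOUND = re.compile(
    r"\[\s*(\d+\.\d+)\] ABORT_TASK: Found referenced iSCSI task_tag: (\d+)")
DONE = re.compile(
    r"\[\s*(\d+\.\d+)\] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: (\d+)")


def abort_durations(lines):
    """Map task_tag -> seconds between 'Found referenced' and 'TMR_FUNCTION_COMPLETE'."""
    started = {}    # task_tag -> timestamp of the "Found referenced" line
    durations = {}  # task_tag -> elapsed seconds until the abort completed
    for line in lines:
        m = FOUND.search(line)
        if m:
            started[int(m.group(2))] = float(m.group(1))
            continue
        m = DONE.search(line)
        if m and int(m.group(2)) in started:
            tag = int(m.group(2))
            durations[tag] = round(float(m.group(1)) - started.pop(tag), 6)
    return durations


# The ABORT_TASK lines copied from the log above.
LOG = """\
[407067.404538] ABORT_TASK: Found referenced iSCSI task_tag: 5891
[407076.077175] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 5891
[411677.887690] ABORT_TASK: Found referenced iSCSI task_tag: 6722
[411683.297425] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 6722
[481459.755876] ABORT_TASK: Found referenced iSCSI task_tag: 7930
[481460.787968] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 7930
"""

for tag, secs in abort_durations(LOG.splitlines()).items():
    print(f"task_tag {tag}: abort took {secs:.2f} s")
```

For the three ABORT_TASK pairs above this gives roughly 8.67 s, 5.41 s and 1.03 s, so two of the three aborts took longer than 5 seconds to complete.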

Cheers,
Martin


_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io