Thanks! Does that mean that occasional iSCSI path drop-outs are somewhat expected? We are already using SSDs for WAL/DB on each OSD server, so at least that part is covered.
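
For reference, a minimal sketch of the ceph-volume invocation we mean; the device names are just placeholders, not our actual layout, and the WAL ends up on the DB device when only --block.db is given:

  ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/nvme0n1p1
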
Do you think that buying an additional 6 or 12 HDDs would help with IOPS for the VMs? (Rough back-of-envelope numbers are below, after the quoted thread.)

Regards,
Martin

> On 4 Oct 2020, at 15:17, Martin Verges <martin.ver...@croit.io> wrote:
>
> Hello,
>
> no, iSCSI + VMware works without such problems.
>
> > We are on latest Nautilus, 12 x 10 TB OSDs (4 servers), 25 Gbit/s Ethernet, erasure coded rbd pool with 128 PGs, around 200 PGs per OSD total.
>
> Nautilus is a good choice
> 12*10TB HDD is not good for VMs
> 25 Gbit/s on HDD is way too much for that system
> 200 PGs per OSD is too much, I would suggest 75-100 PGs per OSD
>
> You can improve latency on HDD clusters using external DB/WAL on NVMe. That might help you.
>
> --
> Martin Verges
> Managing director
>
> Mobile: +49 174 9335695
> E-Mail: martin.ver...@croit.io
> Chat: https://t.me/MartinVerges
>
> croit GmbH, Freseniusstr. 31h, 81247 Munich
> CEO: Martin Verges - VAT-ID: DE310638492
> Com. register: Amtsgericht Munich HRB 231263
>
> Web: https://croit.io
> YouTube: https://goo.gl/PGE1Bx
>
>
> Am So., 4. Okt. 2020 um 14:37 Uhr schrieb Golasowski Martin <martin.golasow...@vsb.cz>:
> Hi,
> does anyone here use CEPH iSCSI with VMware ESXi? It seems that we are hitting the 5 second timeout limit on the software HBA in ESXi. It appears whenever there is increased load on the cluster, like deep scrub or rebalance. Is it normal behaviour in production? Or is there something special we need to tune?
>
> We are on latest Nautilus, 12 x 10 TB OSDs (4 servers), 25 Gbit/s Ethernet, erasure coded rbd pool with 128 PGs, around 200 PGs per OSD total.
>
>
> ESXi Log:
>
> 2020-10-04T01:57:04.314Z cpu34:2098959)WARNING: iscsi_vmk: iscsivmk_ConnReceiveAtomic:517: vmhba64:CH:1 T:0 CN:0: Failed to receive data: Connection closed by peer
> 2020-10-04T01:57:04.314Z cpu34:2098959)iscsi_vmk: iscsivmk_ConnRxNotifyFailure:1235: vmhba64:CH:1 T:0 CN:0: Connection rx notifying failure: Failed to Receive. State=Bound
> 2020-10-04T01:57:04.566Z cpu19:2098979)WARNING: iscsi_vmk: iscsivmk_StopConnection:741: vmhba64:CH:1 T:0 CN:0: iSCSI connection is being marked "OFFLINE" (Event:4)
> 2020-10-04T01:57:04.654Z cpu7:2097866)WARNING: VMW_SATP_ALUA: satp_alua_issueCommandOnPath:788: Probe cmd 0xa3 failed for path "vmhba64:C2:T0:L0" (0x5/0x20/0x0). Check if failover mode is still ALUA.
>
>
> OSD Log:
>
> [303088.450088] Did not receive response to NOPIN on CID: 0, failing connection for I_T Nexus iqn.1994-05.com.redhat:esxi1,i,0x00023d000002,iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw,t,0x01
> [324926.694077] Did not receive response to NOPIN on CID: 0, failing connection for I_T Nexus iqn.1994-05.com.redhat:esxi2,i,0x00023d000001,iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw,t,0x01
> [407067.404538] ABORT_TASK: Found referenced iSCSI task_tag: 5891
> [407076.077175] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 5891
> [411677.887690] ABORT_TASK: Found referenced iSCSI task_tag: 6722
> [411683.297425] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 6722
> [481459.755876] ABORT_TASK: Found referenced iSCSI task_tag: 7930
> [481460.787968] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 7930
>
> Cheers,
> Martin
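
P.S. The back-of-envelope numbers behind the extra-HDD question, purely illustrative and assuming roughly 100-200 random IOPS per 7.2k RPM HDD:

  12 HDDs x ~100-200 IOPS = roughly 1,200-2,400 raw random IOPS
  18 HDDs x ~100-200 IOPS = roughly 1,800-3,600 raw random IOPS
  24 HDDs x ~100-200 IOPS = roughly 2,400-4,800 raw random IOPS

With an erasure coded pool every client write touches k+m OSDs, so the client-visible write ceiling is roughly the raw figure divided by k+m (with an illustrative 4+2 profile, 2,400 / 6 is about 400 write IOPS). More spindles raise that aggregate ceiling roughly linearly, but they do not change the per-operation latency of a single HDD.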
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io