Hello,

On Thu, Feb 23, 2023 at 1:35 PM Ritesh Raj Sarraf <r...@debian.org> wrote:

> Hello Milan,
>
> On Mon, 2023-02-20 at 09:51 +0100, Milan Oravec wrote:
> > >
> > > So did you trigger a cluster takeover? My guess is that it is your
> > > target initiating the connection drops while failing over to the
> > > other node.
> > >
> > >
> >
> >
> > There was no intentional cluster takeover. This happens during normal
> > operation.
> >
> > > How a target behaves during the transition is left to the target.
> > > The initiator will keep querying for recovery until it either times
> > > out or recovers.
> > >
> >
> >
> > Recovery seems to work fine. I've tested it by disconnecting one path
> > to the target and all was OK: the second path was used, and when the
> > first one recovered, traffic switched back over to it.
> >
>
>
> >
> > I can understand this is a very complex problem; what do you suggest
> > I do to debug this issue? We have 5 hosts connected to the Fujitsu
> > target, but the errors appear only on the two that are running KVM
> > guests. The other servers, running mail and camera surveillance, do
> > not seem to be affected. They are Debian systems too, but not running
> > multipath, so I've made a test:
> >
> > So far I've disabled all KVM guests on one of the two KVM hosts used
> > for virtualization, mounted one KVM guest file system locally (/mnt),
> > and performed a stress test on that mount:
> >
>
> By KVM, do you mean just pure KVM? Or the management suite too,
> libvirt?
>

Yes, libvirt is used to manage KVM, and virt-manager on my desktop to
connect to it. It is a simple setup without clustering.
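
For reference, a stress test of that kind could look roughly like this
(a sketch only; the fio parameters and file names here are illustrative,
not the exact invocation used):

  # illustrative random read/write load against the locally mounted
  # guest file system at /mnt
  fio --name=stress --filename=/mnt/fio-testfile --rw=randrw \
      --bs=4k --size=4G --ioengine=libaio --iodepth=16 --numjobs=4 \
      --direct=1 --runtime=300 --time_based --group_reporting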


>
> >
> > When I let one KVM guest (Debian) run on this host system, the
> > following errors appear in dmesg:
> >
> > [Mon Feb 20 09:49:39 2023]  connection1:0: pdu (op 0x5 itt 0x53)
> > rejected. Reason code 0x9
> > [Mon Feb 20 09:49:41 2023]  connection1:0: pdu (op 0x5 itt 0x7c)
> > rejected. Reason code 0x9
> > [Mon Feb 20 09:49:41 2023]  connection1:0: pdu (op 0x5 itt 0x51)
> > rejected. Reason code 0x9
> > [Mon Feb 20 09:50:07 2023]  connection1:0: pdu (op 0x5 itt 0x33)
> > rejected. Reason code 0x9
> >
> > The guest was using the virtio disk driver (vda), so I switched it to
> > generic SATA (sda) for this guest, but the errors in the log remain.
> >
> > It seems that KVM is triggering these errors and making our NAS
> > unstable.
> >
> > What could KVM be doing differently? It is using /dev/mapper/dm-XX as
> > its disk devices, so there is no direct iSCSI access.
> >
> > Any ideas on what I should try next?
> >
> >
>
> I'm afraid I do not have many direct hints for you here. Given that
> this issue does not happen when KVM is not involved, it would imply
> that the erratic behavior originates from that software's integration.
>
>
Do you know someone who can help with this? Here is an example of a
running KVM guest's configuration:

libvirt+ 244533      1  5 Feb03 ?        1-03:54:37 qemu-system-x86_64
-enable-kvm -name guest=me_test,process=qemu:me_test,debug-threads=on -S
-object
secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-108-me_test/master-key.aes
-machine pc-1.1,accel=kvm,usb=off,dump-guest-core=off -cpu host -m 8096
-realtime mlock=off -smp 4,sockets=1,cores=4,threads=1 -uuid
1591f345-96b5-4077-9d32-b2991003753d -no-user-config -nodefaults -chardev
socket,id=charmonitor,fd=57,server,nowait -mon
chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown
-boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2
-device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x4 -drive
if=none,id=drive-ide0-1-0,readonly=on -device
ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0,bootindex=1 -drive
file=/dev/mapper/me_test,format=raw,if=none,id=drive-virtio-disk0,cache=none,discard=unmap,aio=native
-device
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2,write-cache=on
-netdev tap,fd=60,id=hostnet0,vhost=on,vhostfd=61 -device
virtio-net-pci,netdev=hostnet0,id=net0,mac=aa:bb:cc:00:10:31,bus=pci.0,addr=0x3
-chardev pty,id=charserial0 -device
isa-serial,chardev=charserial0,id=serial0 -chardev
spicevmc,id=charchannel0,name=vdagent -device
virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0
-spice port=5929,addr=0.0.0.0,disable-ticketing,seamless-migration=on
-device
qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x2
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 -sandbox
on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny
-msg timestamp=on

Maybe there is something in there that the iSCSI target does not like.
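
Reading the command line above, the disk-related part boils down to the
guest image on /dev/mapper/me_test being opened with cache=none
(O_DIRECT), aio=native and discard=unmap, attached as a virtio-blk
device, so the guest's I/O and discards go straight through dm-multipath
to the iSCSI session. To compare the affected KVM hosts with the
unaffected ones, the negotiated iSCSI session parameters and the path
states can be dumped with something like this (a sketch, assuming the
open-iscsi and multipath-tools command line tools):

  # negotiated session parameters (MaxBurstLength, FirstBurstLength,
  # ImmediateData, ...) plus the attached SCSI devices
  iscsiadm -m session -P 3

  # multipath topology and per-path status (on the multipath hosts)
  multipath -ll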


>
> > > There will be errors in your system journal for this particular
> > > setup.
> > >
> > > Errors like:
> > >
> > > * connection drops
> > > * iscsi session drops/terminations
> > > * SCSI errors
> > > * multipath path checker errors
> > >
> > > All of these will be errors that are eventually recovered. That is
> > > why we need close integration between these layers when building a
> > > storage solution on top.
> > >
> >
> >
> > This is a very complex ecosystem. I know that error reporting is a
> > good thing :) and helps with troubleshooting problems. But when
> > everything is all right there should be no errors, right?
> >
>
> In an ideal scenario, yes, there will be no errors. But on a SAN setup,
> cluster failovers are a feature of the SAN target, and as such during
> that transition some errors are expected on the initiator, which are
> eventually recovered.
>
> Recovery is the critical part here. When states do not recover to
> normal, it is an error on either the target or the initiator, or at
> times even the middleman (the network).
>

This part seems to work OK; so far no data loss has been detected.
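
A quick way to watch that recovery while it happens (assuming
multipath-tools and the systemd journal; the commands below are only a
sketch) is something like:

  # follow the path checker and failover messages
  journalctl -f -u multipathd

  # in another terminal, poll the per-path state
  watch -n1 multipath -ll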


>
> > >
> > >
> > >
> > > Note: These days I only have a software LIO target to test/play
> > > with,
> > > where I have not seen any real issues/errors. How each SAN Target
> > > behaves is something highly specific to the target, in your case
> > > the
> > > Fujitsu target.
> > >
> > >
> >
> >
> > Are you running KVM virtualization on top of your SAN target?
> >
>
> My LIO target runs in a KVM guest. So does the iSCSI initiator.
>

Pure KVM or libvirtd too?

Thank you, kind regards

Milan


>
>
> --
> Ritesh Raj Sarraf | http://people.debian.org/~rrs
> Debian - The Universal Operating System
>
