Hello,

On Thu, Feb 23, 2023 at 1:35 PM Ritesh Raj Sarraf <r...@debian.org> wrote:
> Hello Milan,
>
> On Mon, 2023-02-20 at 09:51 +0100, Milan Oravec wrote:
> > > So did you trigger a cluster takeover ? My guess is it is your
> > > target initiating the connection drops, while taking over to the
> > > other node.
> >
> > There was no intentional cluster takeover. This happens during
> > normal operation.
> >
> > > How a target behaves during the transition is left to the target.
> > > The initiator will keep querying for recovery, until either it
> > > times out or recovers.
> >
> > Recovery seems to work fine. I've tested it by disconnecting one
> > path to the target and all was OK: the second path was used, and
> > when the first one recovered, traffic switched back over to it.
> >
> > I understand this is a very complex problem; what do you suggest I
> > do to debug these issues? We have 5 hosts connected to the Fujitsu
> > target, but errors appear only on the two which are running KVM
> > guests. The other servers, running mail hosts and camera
> > surveillance, do not seem to be affected. They are Debian systems
> > but not running multipath, so I've made a test:
> >
> > So far I've disabled all KVM guests on one of the two KVM hosts
> > used for virtualization, mounted one KVM guest file system locally
> > (/mnt) and performed a stress test on that mount:
>
> By KVM, do you mean just pure KVM ? Or the management suite too,
> libvirt ?

Yes, libvirt is used to manage KVM, and virt-manager on my desktop to
connect to it. It is a simple setup without clustering.

> > When I let one KVM guest (Debian) run on this host system, the
> > following errors appear in dmesg:
> >
> > [Mon Feb 20 09:49:39 2023] connection1:0: pdu (op 0x5 itt 0x53)
> > rejected. Reason code 0x9
> > [Mon Feb 20 09:49:41 2023] connection1:0: pdu (op 0x5 itt 0x7c)
> > rejected. Reason code 0x9
> > [Mon Feb 20 09:49:41 2023] connection1:0: pdu (op 0x5 itt 0x51)
> > rejected. Reason code 0x9
> > [Mon Feb 20 09:50:07 2023] connection1:0: pdu (op 0x5 itt 0x33)
> > rejected. Reason code 0x9
> >
> > The guest is using virtio disk drivers (vda), so I switched it to
> > generic SATA (sda) for this guest, but the errors in the log remain.
> >
> > It seems that KVM is triggering these errors and making our NAS
> > unstable.
> >
> > What can KVM do differently? It is using /dev/mapper/dm-XX as disk
> > devices, so there is no direct iSCSI access.
> >
> > Any ideas what I should try next?
>
> I'm afraid I do not have many direct hints for you here. Given that
> this issue does not happen when KVM is not involved, that would imply
> that the erratic behavior originates from that software's
> integration.

Do you know someone who can help with this?
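In the meantime I want to try reproducing the guests' I/O pattern
without KVM, similar to the stress test on the locally mounted guest
filesystem mentioned above, but with direct I/O. If I read the -drive
options in the configuration below correctly, cache=none,aio=native
means O_DIRECT plus Linux native AIO, so something along these lines
should be close (this is only a sketch; the fio parameters and the
/mnt/stress.img scratch file are made up, not what actually ran):

  # random read/write with O_DIRECT and libaio, roughly mimicking the
  # guest's cache=none,aio=native virtio disk
  fio --name=guestlike --filename=/mnt/stress.img --size=4G \
      --ioengine=libaio --direct=1 --rw=randrw --bs=4k \
      --iodepth=16 --numjobs=4 --runtime=600 --time_based --group_reporting

If that alone triggers the "pdu ... rejected" messages, KVM itself
would be off the hook.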
Here is an example of a running KVM guest configuration (ps output):

libvirt+  244533       1  5 Feb03 ?      1-03:54:37 qemu-system-x86_64
  -enable-kvm -name guest=me_test,process=qemu:me_test,debug-threads=on -S
  -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-108-me_test/master-key.aes
  -machine pc-1.1,accel=kvm,usb=off,dump-guest-core=off
  -cpu host -m 8096 -realtime mlock=off
  -smp 4,sockets=1,cores=4,threads=1
  -uuid 1591f345-96b5-4077-9d32-b2991003753d
  -no-user-config -nodefaults
  -chardev socket,id=charmonitor,fd=57,server,nowait
  -mon chardev=charmonitor,id=monitor,mode=control
  -rtc base=utc -no-shutdown -boot strict=on
  -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2
  -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x4
  -drive if=none,id=drive-ide0-1-0,readonly=on
  -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0,bootindex=1
  -drive file=/dev/mapper/me_test,format=raw,if=none,id=drive-virtio-disk0,cache=none,discard=unmap,aio=native
  -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2,write-cache=on
  -netdev tap,fd=60,id=hostnet0,vhost=on,vhostfd=61
  -device virtio-net-pci,netdev=hostnet0,id=net0,mac=aa:bb:cc:00:10:31,bus=pci.0,addr=0x3
  -chardev pty,id=charserial0
  -device isa-serial,chardev=charserial0,id=serial0
  -chardev spicevmc,id=charchannel0,name=vdagent
  -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0
  -spice port=5929,addr=0.0.0.0,disable-ticketing,seamless-migration=on
  -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x2
  -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5
  -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny
  -msg timestamp=on

Maybe there is something in this configuration that is undesirable for
the iSCSI target.

> > > There will be errors in your system journal for this particular
> > > setup. Errors like:
> > >
> > > * connection drops
> > > * iscsi session drops/terminations
> > > * SCSI errors
> > > * multipath path checker errors
> > >
> > > All these will be errors which will be recovered eventually. That
> > > is why we have the need for close integration in between these
> > > layers, when building a storage solution on top.
> >
> > This is a very complex ecosystem. I know that error reporting is a
> > good thing :) and helps to troubleshoot problems. But when
> > everything is all right there should be no errors, right?
>
> In an ideal scenario, yes, there will be no errors. But on a SAN
> setup, cluster failovers are a feature of the SAN target, and as such
> during that transition some errors are expected on the initiator,
> which are eventually recovered.
>
> Recovery is the critical part here. When states do not recover to
> normal, it is an error in either the target or the initiator. Or even
> the middleman (network) at times.

This part seems to work OK, so far no data loss has been detected.

> > > Note: These days I only have a software LIO target to test/play
> > > with, where I have not seen any real issues/errors. How each SAN
> > > Target behaves is something highly specific to the target, in
> > > your case the Fujitsu target.
> >
> > Are you running KVM virtualization atop of your SAN target?
>
> My LIO target runs in a KVM guest. So does the iSCSI initiator too.

Pure KVM or libvirtd too?

Thank you, kind regards
Milan

> --
> Ritesh Raj Sarraf | http://people.debian.org/~rrs
> Debian - The Universal Operating System
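P.S. For completeness, this is roughly what I collect on the initiator
when the rejects show up, in case any of it is useful (assuming the
usual open-iscsi, multipath-tools and libvirt utilities; me_test is the
guest from the configuration above):

  # iSCSI session details, including connection state and attached devices
  iscsiadm -m session -P 3

  # multipath topology and current path states
  multipath -ll

  # kernel messages from the last hour, filtered for the storage stack
  journalctl -k --since "1 hour ago" | grep -iE 'iscsi|multipath|sd[a-z]'

  # the libvirt view of the guest's disk definition
  virsh dumpxml me_test | grep -A 8 '<disk'

I can post any of this output if it helps.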