On 14-06-11 15:48, Florian Haas wrote:
> On 2011-06-14 15:41, Jelle de Jong wrote:
>> On 14-06-11 15:22, Florian Haas wrote:
>>> On 2011-06-10 17:28, Jelle de Jong wrote:
>>>> The problem is most of my kvm guest file-systems get corrupted when
>>>> migrating my iscsi target on heavy disk load on the kvm guests.
>>> Have you tried setting DefaultTime2Retain like I suggested on Feb 24?
>> # root@godfrey:~# crm configure show
>> http://paste.debian.net/119798/
>
> DefaultTime2Retain is a parameter that is being negotiated between the
> target and the initiator, and the _minimum_ of proposed
> DefaultTime2Retain values wins. The default DefaultTime2Retain for
> open-iscsi is 0, thus if the initiator proposes 0 and the target 60, 0 wins.
>
> You'll have to set this on the initiator and the target.

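For illustration only, the negotiation rule quoted above can be sketched in a few lines of shell. The helper name negotiate_min is hypothetical and is not part of open-iscsi or tgt; it only models "the minimum of the proposed values wins" and does not talk to any iSCSI component.

```shell
#!/bin/sh
# Sketch of the quoted rule: each side proposes a DefaultTime2Retain value
# and the smaller proposal is the one that takes effect.
# negotiate_min is a hypothetical helper used only for illustration.
negotiate_min() {
    initiator=$1
    target=$2
    if [ "$initiator" -lt "$target" ]; then
        echo "$initiator"
    else
        echo "$target"
    fi
}

# open-iscsi default (0) against a target proposing 60: prints 0
negotiate_min 0 60
# both sides proposing 60: prints 60
negotiate_min 60 60
```

This is why raising the value on the target alone has no effect while the initiator still proposes 0.
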
Florian, thank you for taking the time to help! Much appreciated!

# root@godfrey:~# tgtadm --lld iscsi --mode target --op show
# root@hennessy:~# iscsiadm --mode node --targetname ... --portal ...
# root@viktoriya:~# iscsiadm -m session -P 1 --show
# root@viktoriya:~# iscsiadm --mode node --targetname ... --portal ...
# root@hennessy:~# cat /etc/iscsi/iscsid.conf
http://paste.debian.net/119805/

# found:
node.session.iscsi.DefaultTime2Retain = 0
node.session.iscsi.DefaultTime2Wait = 2

# stopped open-iscsi, added the following, then started it again:
echo 'node.session.iscsi.DefaultTime2Retain = 60' | tee --append /etc/iscsi/iscsid.conf
echo 'node.session.iscsi.DefaultTime2Wait = 5' | tee --append /etc/iscsi/iscsid.conf
rm -rv /etc/iscsi/nodes/*

# reconnected to target, restarted open-iscsi
# iscsiadm --mode node --targetname ...
http://paste.debian.net/119807/

# found:
node.session.iscsi.DefaultTime2Retain = 60
node.session.iscsi.DefaultTime2Wait = 5
node.session.timeo.replacement_timeout = 480
node.conn[0].timeo.noop_out_interval = 15
node.conn[0].timeo.noop_out_timeout = 30

Migration test by doing crm node standby on the active target:

# crm configure show
http://paste.debian.net/119832/

I already had to tune the ocf:heartbeat:iSCSILogicalUnit timeout to 80s.

# repeating error message during migration until migration completes
ERROR: Called "tgtadm --lld iscsi --op delete --mode logicalunit --tid 1 --lun=1"
ERROR: Exit code 22
ERROR: Command output: "tgtadm: this logical unit is still active"

# disk errors during iscsi/drbd migration on kvm host system
http://paste.debian.net/119830/

# lvm logical volume is damaged after this...

# the kvm guest system was running bonnie++ -d /tmp/bonnie/ -n 128
# and the guest reported disk errors and bonnie crashed
# dmesg: http://paste.debian.net/119831/

Other kvm guests running mysql got corrupted databases.

However, none of the kvm guests ended up with a read-only file system
this time, and the file-system damage was recoverable, instead of the
complete destruction after running fsck seen in previous tests...

Please advise :) Shouldn't a migration of the iscsi/drbd target be
possible on a busy system without damage to the guests?

Thanks in advance,

Kind regards,

Jelle de Jong

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker