Hi,

I am getting closer... Some updates for those who are interested.

Did you turn caching off for your VMs disks?

That's a good point. Indeed, caching was not explicitly turned off, and
I just noticed that the default setting of the cache attribute of the
driver element in libvirt has changed. [1]
I would expect that libvirt flushes all caches before finalizing the
migration process. But it is probably best to turn off caches anyway.

I have now configured:

<disk type='block' device='disk'>
       <driver name='qemu' type='raw' cache='none'/>

I would also switch to native IO (aio) if your kernel/qemu support
that. Otherwise qemu allocates several dedicated IO threads, and that is
much slower than aio. There were some problems with aio in the past, but
it should work fine on recent enough distros.


This is interesting. After switching to native io out of curiosity:

<driver name='qemu' type='raw' cache='none' io='native'/>

the situation looked much better - to my surprise, I did not experience any further corruption with this virtual machine.

Then I added a second and a third VM to the setup, only to get errors again on those machines. I noticed that those additional VMs had older qemu machine types set (pc-0.11 and pc-0.12). After upgrading the domains to machine type pc-1.0:

<os>
    <type arch='x86_64' machine='pc-1.0'>hvm</type>
    <boot dev='hd'/>
</os>

I did not trigger any file system corruption again. So, at the moment it looks like it is important to:
- turn caching off
- use native aio
- *and* use an up-to-date machine type

Failure to meet any of these criteria would result in fs corruption.
Does this make sense at all?
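
To check whether a given domain already meets all three criteria, something like the following could work (just a sketch: the domain name vm1 and the dump path are examples, and on a live host the XML would come from `virsh dumpxml` rather than the inline sample used here for illustration):

```shell
#!/bin/sh
# Sketch: audit a libvirt domain XML dump for the three settings above.
# On a live host, generate the dump first:
#   virsh dumpxml vm1 > /tmp/vm1.xml
# An inline sample dump is used here so the check can be tried standalone.
cat > /tmp/vm1.xml <<'EOF'
<domain type='kvm'>
  <os>
    <type arch='x86_64' machine='pc-1.0'>hvm</type>
  </os>
  <devices>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='native'/>
    </disk>
  </devices>
</domain>
EOF

ok=yes
# 1) caching turned off, 2) native aio, 3) no old pc-0.x machine type
grep -q "cache='none'"  /tmp/vm1.xml || { echo "WARN: caching not disabled"; ok=no; }
grep -q "io='native'"   /tmp/vm1.xml || { echo "WARN: native aio not set"; ok=no; }
grep -q "machine='pc-0" /tmp/vm1.xml && { echo "WARN: old machine type"; ok=no; }
[ "$ok" = yes ] && echo "all three settings look fine"
```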


Maybe that depends on the combination of libvirt/qemu versions and the
migration mode used?

qemu is at 1.0 (1.0+noroms-0ubuntu14.8)
libvirt is at 0.9.8 (0.9.8-2ubuntu17.10)

And, do you always have fs corruption, independently of IO load?


It seems that I have to generate some IO load to trigger the corruption.


Did you try to stop all but one iSCSI connection to eliminate multipathing?


Not exactly. That would be my next step if I still have problems.
What I did was to use one iSCSI path directly (by using /dev/disk/by-path/... as the source of the block device). This seemed to work - but it is hard to tell whether I simply did not trigger a bug in my setup.

That everything worked with a single path (or at least seemed to) is not consistent with the observations above. Therefore I still do not trust the setup and will run some more long-term tests.

May I ask a few more questions?

Do you manage the multipath daemon with pacemaker? In my setup, multipath is started at boot time and is not managed by pacemaker.

Where do you lose the dependencies between targets and initiator?
I use two advisory orders:

order o-iscsitarget_before_iscsiinitiator 0: rg-iscsitarget clone-iscsiinitiator

order o-iscsiinitiator_before_libvirt 0: clone-iscsiinitiator clone-libvirtd

to have the possibility to restart the targets (needed for failover) and to restart the iSCSI initiators (to easily scan for new targets). Is this good practice?
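
For reference, this is how I understand the advisory scores compared with a mandatory ordering in crm shell syntax (a sketch using the resource names above; as far as I know, score 0 makes the order advisory, so it is only applied when both actions are pending anyway, while inf would cascade restarts):

```
# Advisory (score 0): ordering is honoured when both resources are
# (re)started together, but restarting rg-iscsitarget alone does not
# force a restart of clone-iscsiinitiator.
order o-iscsitarget_before_iscsiinitiator 0: rg-iscsitarget clone-iscsiinitiator
order o-iscsiinitiator_before_libvirt 0: clone-iscsiinitiator clone-libvirtd

# A mandatory order would instead cascade restarts down the chain:
# order o-iscsitarget_before_iscsiinitiator inf: rg-iscsitarget clone-iscsiinitiator
```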


Thanks a lot and best regards,

Sven


_______________________________________________
Pacemaker mailing list: [email protected]
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
