Hi everyone, last week I spent some time testing the live migration capabilities now that Nova has started to use Neutron's multiple port bindings. All testing, unless otherwise specified, was done with Rocky RC1 or later commits on CentOS 7.5 using DevStack.
test summary
~~~~~~~~~~~~
I have tested the following scenarios with different levels of success:

linux bridge to linux bridge: worked
ovs iptables to ovs iptables: worked
ovs conntrack to ovs conntrack: worked
ovs iptables to ovs conntrack: worked
ovs conntrack to ovs iptables: worked
linux bridge to ovs: migration succeeded but network connectivity was broken, see bug 1788009
ovs to linux bridge: failed, libvirt error due to lack of a destination bridge name, see bug 1788012
ovs to ovs-dpdk: failed, QEMU bug encountered on migrate; Nova XML generation appears correct
ovs-dpdk to ovs: failed, another QEMU bug encountered on migrate; Nova XML generation appears correct
centos -> ubuntu: failed, emulator not found, see bug 1788028

Note that since iptables to conntrack migration now works, operators will be able to change the firewall driver once they have upgraded to Rocky via a rolling update using live migration.

host config
~~~~~~~~~~~
Note that not all nodes were running the exact same commits, as I added additional nodes later in my testing. All nodes were at least at this level:

nova sha: afe4512bf66c89a061b1a7ccd3e7ac8e3b1b284d
neutron sha: 1dda2bca862b1268c0f5ae39b7508f1b1cab6f15

Nova was configured with [compute]/live_migration_wait_for_vif_plug = True (a minimal sketch of the relevant nova.conf section is included below), and the nova commit above contains the revert of the slow migration change.

test details
~~~~~~~~~~~~
In both ovs-dpdk tests the migration failed and the VM continued to run on the source node, however it had no network connectivity. On hard reboot of the VM, it went to the error state because the vif binding was unset: the vif:binding-details:host_id was set to None, so the vif_type was also set to None. I have opened a Nova bug to track the fact that the VM is left in an invalid state even though its status is active, see bug 1788014.

When I was testing live migration between OVS with the iptables and conntrack firewall drivers, I also did minimal testing to ensure the firewall worked. I did this by booting three VMs: VM A and VM B in the same security group, and VM C in a separate security group. VM A and VM B were initially on different OVS compute nodes, with VM A using the iptables and VM B using the conntrack security group driver. VM C was on the conntrack node. Before the migration, VM C was set up to ping VM B, which is blocked by security groups; VM A was also configured to ping VM B, which is allowed by security groups. VM B was then live migrated from the conntrack node to the iptables node and back while observing the ping output of VM A and VM C. During this process it was observed that VM A continued to ping VM B successfully, and at no point was VM C able to ping VM B. While this is by no means a complete test, it indicates that security groups appear to be configured before network connectivity is restored on live migration, as expected. A rough sketch of the commands used for this check is also included below.

I also noticed that the interval where network connectivity was lost during live migration was longer when going from the iptables node to the conntrack node than the reverse. I did not investigate why, but I suspect this is related to flow timeouts in the conntrack module.

other testing
~~~~~~~~~~~~~
About two weeks ago I also tested the NUMA-aware vSwitch spec. During that testing I confirmed that new instances were NUMA-affined correctly. I also confirmed that while live migration succeeded, the NUMA pinning was not updated. As this was expected, I have not opened a bug for it since it will be addressed in Stein by the NUMA-aware live migration spec.
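For reference, the wait-for-vif-plug behaviour mentioned in the host config section is controlled by a single option on each compute node. This is only a minimal sketch of the relevant nova.conf section; the rest of the file is omitted.

  [compute]
  # wait for the network-vif-plugged event from Neutron before asking
  # libvirt to start the actual migration
  live_migration_wait_for_vif_plug = True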
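For completeness, the security group check described under test details can be reproduced with commands roughly like the ones below. This is only a rough sketch rather than the exact script I used: the image, flavor, network, security group, host and VM names (cirros, m1.tiny, private, demo-sg, node-iptables, node-conntrack, vm-a/b/c) are placeholders, and the exact client options may differ between client versions.

  # VM A and B share demo-sg, which allows ICMP from members of the same
  # group; VM C gets its own group with no rule allowing traffic to demo-sg
  openstack security group create demo-sg
  openstack security group rule create --protocol icmp --remote-group demo-sg demo-sg
  openstack security group create demo-sg-other

  # boot the three VMs on the desired hosts (admin-only host targeting
  # via the availability zone hint)
  openstack server create --image cirros --flavor m1.tiny --network private \
      --security-group demo-sg --availability-zone nova:node-iptables vm-a
  openstack server create --image cirros --flavor m1.tiny --network private \
      --security-group demo-sg --availability-zone nova:node-conntrack vm-b
  openstack server create --image cirros --flavor m1.tiny --network private \
      --security-group demo-sg-other --availability-zone nova:node-conntrack vm-c

  # start "ping <vm-b address>" in the consoles of vm-a and vm-c, then
  # migrate vm-b back and forth while watching both pings
  openstack server migrate --live node-iptables vm-b
  openstack server migrate --live node-conntrack vm-b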
future testing
~~~~~~~~~~~~~~

OVS-DPDK to OVS-DPDK
====================
If I have time I will try to test live migration between two OVS-DPDK hosts. This has worked since before Nova supported vhost-user. I did not test this case yet, but it is possible the QEMU bug I hit in my OVS to OVS-DPDK testing could also break OVS-DPDK to OVS-DPDK migration.

ovs to ovn
==========
If I have time I may also test OVS to OVN migration. This should just work, but I suspect that the same bug I hit with mixed OVS and Linux bridge clouds may exist and the VXLAN tunnel mesh may not be created.

BUGS
____

nova
~~~~
When live migration fails due to an internal error, rollback is not handled correctly:
- https://bugs.launchpad.net/nova/+bug/1788014
libvirt: Nova assumes the destination emulator path is the same as the source and fails to migrate if this is not true:
- https://bugs.launchpad.net/nova/+bug/1788028

neutron
~~~~~~~
Neutron bridge name is not always set for ml2/ovs:
- https://bugs.launchpad.net/neutron/+bug/1788009
Bridge name not set in vif:binding-details by ml2/linux-bridge:
- https://bugs.launchpad.net/neutron/+bug/1788012
Neutron does not form a mesh tunnel overlay between different ml2 drivers:
- https://bugs.launchpad.net/neutron/+bug/1788023

regards
sean
