Thx Wei - that code should (so I was told) kill the DESTINATION VM on a failed migration, i.e. perform cleanup - so that is OK?

On Fri, 22 Nov 2019 at 10:59, Wei ZHOU <[email protected]> wrote:

> Hi Andrija,
>
> As I remember, it happened on our production a few years ago.
>
> https://github.com/apache/cloudstack/blob/master/engine/orchestration/src/main/java/com/cloud/vm/VirtualMachineManagerImpl.java#L2962-L2983
>
> -Wei
>
> On Fri, 22 Nov 2019 at 09:34, Andrija Panic <[email protected]> wrote:
>
> > Thx both, thx Wei - that all sounds interesting.
> >
> > As for "vm migration fails and no retry in cloudstack" - this should NOT
> > trigger stopping the VM - at least from what I saw so far - the host will
> > simply be in ErrorMaintenance - can you confirm VMs are not stopped in
> > this case?
> >
> > On Fri, 22 Nov 2019 at 08:54, Wei ZHOU <[email protected]> wrote:
> >
> > > Hi Andrija,
> > >
> > > We have faced some vm migration issues. There are three categories
> > > actually:
> > > 1. vm migration fails due to different hardware or software on source
> > > and destination hosts, for example, cpu models. vm will be still
> > > running on source hosts. you may find some errors in agent.log.
> > > 2. vm migration fails due to some libvirt/qemu bugs. you may find some
> > > errors in /var/log/libvirt/qemu/ folder (on ubuntu) on the source or
> > > destination host. mostly the vm will be still running on source host.
> > > In rare cases the vm is stopped.
> > > 3. vm is stopped due to some cloudstack bugs. for example, when we put
> > > a host to maintenance, the vm will be stopped if (1) no other host is
> > > Up in same cluster, or (2) vm migration fails and no retry in
> > > cloudstack, or (3) multiple vms are migrated to same destination at
> > > the same time but there is not enough memory on the destination.
> > >
> > > We need to fix the issues mentioned in part 3 above in cloudstack.
> > >
> > > In Leaseweb, to improve the vm migration
> > > (1) we use custom cpu model, see
> > > http://docs.cloudstack.apache.org/projects/cloudstack-installation/en/master/hypervisor/kvm.html#configure-cpu-model-for-kvm-guest-optional
> > > (2) we have built our own qemu packages with some bug fixes for
> > > installation
> > > (3) we have some fixes in our fork from 4.7.1. We have not tested with
> > > 4.13/4.14.
> > > We still see failed vm migration sometimes. However the vms will not
> > > be stopped if migration fails.
> > >
> > > -Wei
> > >
> > > On Fri, 22 Nov 2019 at 01:54, Andrija Panic <[email protected]> wrote:
> > >
> > > > ( @Sven, not being able to migrate a VM with an ISO attached - don't
> > > > recall testing/doing that recently - but it is technically perfectly
> > > > possible, unless we don't support it via CloudStack - feel free to
> > > > open a GitHub issue with correct steps to reproduce etc)
> > > >
> > > > On Fri, 22 Nov 2019 at 01:47, Andrija Panic <[email protected]> wrote:
> > > >
> > > > > That sucks...thx both.
> > > > >
> > > > > @both - which ACS version do you use (and encounter such issues?)
> > > > >
> > > > > Ubuntu comes with a whole other set of issues (I was losing my
> > > > > nerves around very idiotic things, last time a week ago...) -
> > > > > though most can be managed with some workarounds.
> > > > > But yes, Qemu/libvirt should be better with Ubuntu - free of RedHat
> > > > > s$^%tty business politics - i.e. in CentOS 6.x you were able to
> > > > > live migrate a VM WITH all the volumes to another host/storage. On
> > > > > CentOS 7 you can't do that any more, unless you are using
> > > > > qemu-kvm-ev (but not the regular one from the SIG CentOS repo, you
> > > > > need the one from the oVirt project)
> > > > >
> > > > > I'm just trying to understand if this is happening also on e.g.
> > > > > ACS 4.11 - so as to stop digging around the problem (and assume
> > > > > it's purely CentOS which is broken - why all great things need to
> > > > > come to an end...damn it)
> > > > >
> > > > > (well I could also test the same ACS code on Ubuntu and see if
> > > > > there are no issues there with live migrations..)
> > > > >
> > > > > Thanks
> > > > > Andrija
> > > > >
> > > > > On Thu, 21 Nov 2019 at 23:39, Jean-Francois Nadeau
> > > > > <[email protected]> wrote:
> > > > >
> > > > >> Hi Andrija,
> > > > >>
> > > > >> We experienced that problem with stock packages on CentOS 7.4.
> > > > >> Live migration would frequently fail and leave the VM dead. We
> > > > >> since moved to RHEV packages for qemu. Libvirt is still stock per
> > > > >> CentOS 7.6 (4.5). I want to say the situation improved but I
> > > > >> can't tell yet if we have a 100% success rate on live migrations
> > > > >> (as it should be !)
> > > > >>
> > > > >> Redhat also have been messing up severely with stock libvirt
> > > > >> versions between 7.4/7.5/7.6 in such a way it broke live
> > > > >> migration compatibility (cpu definitions). I'm at the crossroads
> > > > >> right now to entirely ditch centos/redhat in favor of Ubuntu to
> > > > >> have well tested stock packages.
> > > > >>
> > > > >> best,
> > > > >>
> > > > >> -Jfn
> > > > >>
> > > > >> On Thu, Nov 21, 2019 at 5:25 PM Andrija Panic
> > > > >> <[email protected]> wrote:
> > > > >>
> > > > >> > Hi guys.
> > > > >> >
> > > > >> > I wanted to see if any of you have seen similar/same in master,
> > > > >> > as below.
> > > > >> >
> > > > >> > I've been testing some work/PRs (against the current master)
> > > > >> > and I've seen that VMs will crash/be stopped occasionally when
> > > > >> > live migration is happening. I experienced this on a NEW/EMPTY
> > > > >> > env, with 2 KVM hosts, and only SSVM and CPVM - not a capacity
> > > > >> > issue or similar.
> > > > >> >
> > > > >> > This is happening with CentOS 7 (CentOS 7.3 I believe, but we
> > > > >> > also updated packages to the latest stock ones and the same
> > > > >> > issue was happening again).
> > > > >> >
> > > > >> > This is still under investigation, but I was wondering if
> > > > >> > anyone else has seen a similar thing happening?
> > > > >> >
> > > > >> > Best,
> > > > >> >
> > > > >> > --
> > > > >> >
> > > > >> > Andrija Panić
> > > > >
> > > > > --
> > > > >
> > > > > Andrija Panić
> > > >
> > > > --
> > > >
> > > > Andrija Panić
> >
> > --
> >
> > Andrija Panić

--

Andrija Panić
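
For reference on Wei's category 1 (mismatched cpu models between source and destination) and the custom cpu model he mentions: below is a minimal sketch, not CloudStack code, of a check one could run before migrating. It assumes the KVM agent reads guest.cpu.mode, guest.cpu.model and guest.cpu.features from agent.properties, which is how I understand the install guide linked above; the helper names and the sample values are made up for illustration.

```python
# Hypothetical helper, not part of CloudStack: compare the guest CPU settings
# from two hosts' agent.properties contents before attempting a live migration.
# Assumption: these three keys are the ones the KVM agent reads.
CPU_KEYS = ("guest.cpu.mode", "guest.cpu.model", "guest.cpu.features")


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def _normalize(key, value):
    """Make the space-separated feature list order-insensitive."""
    if value is None:
        return None
    if key == "guest.cpu.features":
        return " ".join(sorted(value.split()))
    return value


def cpu_setting_diffs(source_text, dest_text):
    """Return {key: (source_value, dest_value)} for CPU keys that differ."""
    src = parse_properties(source_text)
    dst = parse_properties(dest_text)
    diffs = {}
    for key in CPU_KEYS:
        a = _normalize(key, src.get(key))
        b = _normalize(key, dst.get(key))
        if a != b:
            diffs[key] = (a, b)
    return diffs


if __name__ == "__main__":
    # Illustrative file contents; the model names are examples only.
    host_a = "guest.cpu.mode=custom\nguest.cpu.model=SandyBridge\n"
    host_b = "# kvm host b\nguest.cpu.mode=custom\nguest.cpu.model=Haswell\n"
    for key, (a, b) in cpu_setting_diffs(host_a, host_b).items():
        print(f"{key}: source={a} destination={b}")
```

Matching settings only rule out one cause: with anything other than a custom model, two hosts can carry identical properties and still expose different CPUs to the guest, so agent.log and /var/log/libvirt/qemu/ remain the places to look when a migration fails.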
