Re: [Xen-devel] migration regression in xen-4.11 and qemu-2.11 and qcow2

2018-05-17 Olaf Hering
On Thu, 17 May 2018 08:30:58 +0200, Olaf Hering wrote: > I think the issue fixed by 5d6c599fe1d69a1bf8c5c4d3c58be2b31cd625ad is not > specific to HVM. It seems domain_suspend_common_guest_suspended would call > that changed function only for HVM. It seems the logic is wrong. It is not > about

Re: [Xen-devel] migration regression in xen-4.11 and qemu-2.11 and qcow2

2018-05-17 Olaf Hering
On Mon, 7 May 2018 17:19:46 +0200, Olaf Hering wrote: > With qemu-2.11 the sender thinks everything is alright and the domU is moved. Another case of breakage in qemu-2.11: if the target host does not even have access to the disk image, the sender still thinks everything is alright. qemu does not

Re: [Xen-devel] migration regression in xen-4.11 and qemu-2.11 and qcow2

2018-05-16 Olaf Hering
On Wed, 16 May 2018 16:53:28 +0200, Olaf Hering wrote: > On Thu, 10 May 2018 11:40:18 +0100, > Anthony PERARD wrote: > > I did fix the bug in QEMU 2.11 (5d6c599fe1d69a1bf8c5c4d3c58be2b31cd625ad) > > so Xen 4.11 does include it in the qemu-xen tree. > Is this supposed to be called also for PV

Re: [Xen-devel] migration regression in xen-4.11 and qemu-2.11 and qcow2

2018-05-16 Olaf Hering
On Thu, 10 May 2018 11:40:18 +0100, Anthony PERARD wrote: > I did fix the bug in QEMU 2.11 (5d6c599fe1d69a1bf8c5c4d3c58be2b31cd625ad) > so Xen 4.11 does include it in the qemu-xen tree. Is this supposed to be called for PV as well? In my testing qmp_xen_save_devices_state shows up only on HVM. O

Re: [Xen-devel] migration regression in xen-4.11 and qemu-2.11 and qcow2

2018-05-16 Olaf Hering
On Thu, 10 May 2018 09:03:30 -0700 (PDT), Stefano Stabellini wrote: > You could add a property to vmstate_xen_platform of xen_platform.c, but > you need to pay attention to legacy compatibility. Inevitably, there > will be older versions that do not have the new vmstate_xen_platform > field or d

Re: [Xen-devel] migration regression in xen-4.11 and qemu-2.11 and qcow2

2018-05-14 Olaf Hering
On Thu, 10 May 2018 11:40:18 +0100, Anthony PERARD wrote: > I'm not sure if that information is going to help, but that's what I have > for now about the lock of block images. I think the issue is not with two dom0s locking the same file, but with one qemu process trying to lock the same region

Re: [Xen-devel] migration regression in xen-4.11 and qemu-2.11 and qcow2

2018-05-10 Stefano Stabellini
On Thu, 10 May 2018, Olaf Hering wrote: > On Wed, 9 May 2018 14:43:17 -0700 (PDT), > Stefano Stabellini wrote: > > > 512b109ec962 is a very old commit: why is it causing problems for Xen > > 4.10 and Xen 4.11 HVM migration? What is the error exactly? Sorry, I > > might be missing some context. >

Re: [Xen-devel] migration regression in xen-4.11 and qemu-2.11 and qcow2

2018-05-10 Anthony PERARD
On Tue, May 08, 2018 at 01:31:43PM +0200, Olaf Hering wrote: > It is unclear why that was never noticed in xen-4.10; qemu-2.9 did not have > that bug. > Also, whether a KVM or a Xen guest is migrated should make zero difference for the > qcow2 driver... Hi Olaf, I did try to fix a migration-related is

Re: [Xen-devel] migration regression in xen-4.11 and qemu-2.11 and qcow2

2018-05-09 Olaf Hering
On Wed, 9 May 2018 14:43:17 -0700 (PDT), Stefano Stabellini wrote: > 512b109ec962 is a very old commit: why is it causing problems for Xen > 4.10 and Xen 4.11 HVM migration? What is the error exactly? Sorry, I > might be missing some context. It is papering over the real issue; that's why one can

Re: [Xen-devel] migration regression in xen-4.11 and qemu-2.11 and qcow2

2018-05-09 Stefano Stabellini
CC'ing Linux x86 maintainers. On Wed, 9 May 2018, Olaf Hering wrote: > On Wed, 9 May 2018 14:08:14 -0700 (PDT), > Stefano Stabellini wrote: > > > I cannot find 512b109ec962 or "xen: unplug the emulated devices at > > resume time" anywhere, neither in qemu.org/master nor in the qemu-xen > > trees

Re: [Xen-devel] migration regression in xen-4.11 and qemu-2.11 and qcow2

2018-05-09 Olaf Hering
On Wed, 9 May 2018 14:08:14 -0700 (PDT), Stefano Stabellini wrote: > I cannot find 512b109ec962 or "xen: unplug the emulated devices at > resume time" anywhere, neither in qemu.org/master nor in the qemu-xen > trees. What am I missing? It is a 7-year-old kernel patch. Olaf

Re: [Xen-devel] migration regression in xen-4.11 and qemu-2.11 and qcow2

2018-05-09 Stefano Stabellini
On Wed, 9 May 2018, Olaf Hering wrote: > On Tue, 8 May 2018 18:40:26 +0200, > Olaf Hering wrote: > > > It looks like the IDE unplug is not permanent. > > Stefano, > > Jochen pointed me to commit 512b109ec962 ("xen: unplug the emulated devices > at resume time"), which I think is wrong. The ke

Re: [Xen-devel] migration regression in xen-4.11 and qemu-2.11 and qcow2

2018-05-09 Olaf Hering
On Tue, 8 May 2018 18:40:26 +0200, Olaf Hering wrote: > It looks like the IDE unplug is not permanent. Stefano, Jochen pointed me to commit 512b109ec962 ("xen: unplug the emulated devices at resume time"), which I think is wrong. The kernel will most likely not be able to switch from a PV ba

Re: [Xen-devel] migration regression in xen-4.11 and qemu-2.11 and qcow2

2018-05-08 Olaf Hering
On Tue, 8 May 2018 13:31:43 +0200, Olaf Hering wrote: > On the sending side offset 0xc9 is unlocked on the other fd, which allows > F_WRLCK to succeed: > > It seems on the receiving side some code forgets to unlock offset 0xc9, > which causes F_WRLCK to fail: It looks like the IDE unplug is
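The locking behaviour described above (a lock at offset 0xc9 held through one fd makes F_WRLCK through another fd fail until it is released) can be reproduced in isolation. This is a minimal sketch, assuming Linux open file description (OFD) locks, which QEMU prefers for image locking on Linux when the kernel supports them, and assuming the lock held on the other fd is a read lock. Only the offset 0xc9 comes from the thread; the helpers ofd_lock_byte and run_demo are hypothetical and do not model QEMU's actual code paths.

```c
#define _GNU_SOURCE /* exposes F_OFD_SETLK in <fcntl.h> on glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Offset taken from the thread; which permission QEMU maps to it is not modelled. */
#define LOCK_OFFSET 0xc9

/* Hypothetical helper: take or release (F_UNLCK) a one-byte OFD lock at
 * LOCK_OFFSET without blocking. Returns 0 on success, otherwise errno
 * (EAGAIN or EACCES when another open file description holds a conflicting lock). */
static int ofd_lock_byte(int fd, short type)
{
    struct flock fl;
    memset(&fl, 0, sizeof(fl)); /* l_pid must be 0 for OFD locks */
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = LOCK_OFFSET;
    fl.l_len = 1;
    return fcntl(fd, F_OFD_SETLK, &fl) == 0 ? 0 : errno;
}

struct demo_result {
    int write_while_shared; /* F_WRLCK on fd2 while fd1 still holds F_RDLCK */
    int write_after_unlock; /* F_WRLCK on fd2 after fd1 released its lock */
};

/* Hypothetical demo: one process, two independent open file descriptions
 * of the same file. OFD locks belong to the open file description, not to
 * the process, so the two descriptors conflict with each other. */
static struct demo_result run_demo(void)
{
    char path[] = "/tmp/ofd-demo-XXXXXX";
    int fd1 = mkstemp(path);
    assert(fd1 >= 0);
    int fd2 = open(path, O_RDWR); /* separate open file description */
    assert(fd2 >= 0);
    unlink(path); /* file stays alive until both fds are closed */

    struct demo_result r;
    int rc = ofd_lock_byte(fd1, F_RDLCK);
    assert(rc == 0);
    r.write_while_shared = ofd_lock_byte(fd2, F_WRLCK); /* expected to fail */

    rc = ofd_lock_byte(fd1, F_UNLCK); /* "unlocked on the other fd" */
    assert(rc == 0);
    (void)rc;
    r.write_after_unlock = ofd_lock_byte(fd2, F_WRLCK); /* expected to succeed */

    close(fd2);
    close(fd1);
    return r;
}
```

With classic F_SETLK locks, both descriptors would share the same owner (the process), so the second lock would simply replace the first instead of conflicting. With OFD locks, a single qemu process that forgets to drop a lock on one fd can block itself on another fd, which fits the receiving-side failure described above.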

Re: [Xen-devel] migration regression in xen-4.11 and qemu-2.11 and qcow2

2018-05-08 Olaf Hering
On Mon, 7 May 2018 17:19:46 +0200, Olaf Hering wrote: > What I gathered during debugging so far is that somehow qemu on the receiving > side locks a region twice: After further debugging with many wild printfs: On the receiving side blockdev_init sets BDRV_O_INACTIVE because RUN_STATE_INMIGRA

[Xen-devel] migration regression in xen-4.11 and qemu-2.11 and qcow2

2018-05-07 Olaf Hering
I assume osstest does not test real-world live migration; therefore the following regression remained unnoticed:
name="hvm"
builder="hvm"
memory=555
vcpus=4
serial="pty"
boot="c"
disk=[ 'qcow2:/nfs/vdisk.qcow2,hda,w', ]
device_model_version="qemu-xen"
xl create -cf hvm.cfg
sleep N
xl migrate hvm