On 22/05/17 12:03, George Dunlap wrote: > On Mon, May 22, 2017 at 11:18 AM, George Dunlap > <george.dun...@citrix.com> wrote: >> On 22/05/17 07:35, Hao, Xudong wrote: >>> Bug detailed description: >>> >>> ---------------- >>> >>> Create one RHEL7.3 HVM and do live migration continuously, while doing the >>> 200+ or 300+ times live-migration, tool stack report error and migration >>> failed. >>> >>> >>> >>> Environment : >>> >>> ---------------- >>> >>> HW: Skylake server >>> >>> Xen: Xen 4.9.0 RC4 >>> >>> Dom0: Linux 4.11.0 >>> >>> >>> >>> Reproduce steps: >>> >>> ---------------- >>> >>> 1. Compile Xen 4.9 Rc4 and dom0 kernel 4.11.0, boot to dom0 >>> >>> 2. Boot RHEL7.3 HVM guest >>> >>> 3. Migrate guest to localhost, sleep 10 seconds >>> >>> 4. Repeat doing the step3. >>> >>> >>> >>> Current result: >>> >>> ---------------- >>> >>> VM Migration fail. >>> >>> >>> >>> Base error log: >>> >>> ---------------- >>> >>> xl migrate 24hrs_lm_guest_2 localhost >>> >>> root@localhost's password: >>> >>> migration target: Ready to receive domain. >>> >>> Saving to migration stream new xl format (info 0x3/0x0/1761) >>> >>> Loading new save file <incoming migration stream> (new xl fmt info >>> 0x3/0x0/1761) >>> >>> Savefile contains xl domain config in JSON format >>> >>> Parsing config from <saved> >>> >>> xc: info: Saving domain 273, type x86 HVM >>> >>> xc: info: Found x86 HVM domain from Xen 4.9 >>> >>> xc: info: Restoring domain >>> >>> xc: error: set HVM param 12 = 0x00000000feffe000 (85 = Interrupted system >>> call should ): Internal error >>> >>> xc: error: Restore failed (85 = Interrupted system call should ): Internal >>> error >> Interesting -- it appears that setting HVM_PARAM_IDENT_PT (#12) can fail >> with -ERESTART. But the comment for ERESTART makes it explicit that it >> should be internal only -- it should cause a hypercall continuation (so >> that the hypercall restarts automatically), rather than returning to the >> guest. >> >> But the hypercall continuation code seems to have disappeared from >> do_hvm_op() at some point? >> >> /me digs a bit more... > The problem turns out to be commit ae20ccf ("dm_op: convert > HVMOP_set_mem_type"), which says: > > This patch removes the need for handling HVMOP restarts, so that > infrastructure is removed. > > While it's true that there are no more operations which need iteration > information restored, but there are two operations which may still > need to be restarted to avoid deadlocks with other operations. > > Attached is a patch which restores hypercall continuation checking. > Xudong, can you give it a test? > > Thanks, > -George > > From 3d4ce135ea3b396bb63752c39e6234366d590c16 Mon Sep 17 00:00:00 2001 > From: George Dunlap <george.dun...@citrix.com> > Date: Mon, 22 May 2017 11:38:31 +0100 > Subject: [PATCH] Restore HVM_OP hypercall continuation (partial revert of > ae20ccf) > > Commit ae20ccf removed the hypercall continuation logic from the end > of do_hvm_op(), claiming: > > "This patch removes the need for handling HVMOP restarts, so that > infrastructure is removed." > > That turns out to be false. The removal of HVMOP_set_mem_type removed > the need to store a start iteration value in the hypercall > continuation, but a grep through hvm.c for ERESTART turns up at least > two places where do_hvm_op() may still need a hypercall continuation: > > * HVMOP_set_hvm_param can return -ERESTART when setting > HVM_PARAM_IDENT_PT in the event that it fails to acquire the domctl > lock > > * HVMOP_flush_tlbs can return -ERESTART if several vcpus call it at > the same time > > In both cases, a simple restart (with no stored iteration information) > is necessary. > > Add a check for -ERESTART again, along with a comment at the top of > the function regarding the lack of decoding any information from the > op value. > > Reported-by: Xudong Hao <xudong....@intel.com> > Signed-off-by: George Dunlap <george.dun...@citrix.com> > --- > CC: Andrew Cooper <andrew.coop...@citrix.com> > CC: Jan Beulich <jbeul...@suse.com> > CC: Paul Durrant <paul.durr...@citrix.com>
Reviewed-by: Andrew Cooper <andrew.coop...@citrix.com> (with the final hunk removed) > --- > xen/arch/x86/hvm/hvm.c | 12 +++++++++++- > 1 file changed, 11 insertions(+), 1 deletion(-) > > diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c > index 81691e2..e3e817d 100644 > --- a/xen/arch/x86/hvm/hvm.c > +++ b/xen/arch/x86/hvm/hvm.c > @@ -4544,6 +4544,13 @@ long do_hvm_op(unsigned long op, > XEN_GUEST_HANDLE_PARAM(void) arg) > { > long rc = 0; > > + /* > + * NB: hvm_op can be part of a restarted hypercall; but at the > + * moment the only hypercalls which do continuations don't need to > + * store any iteration information (since they're just re-trying > + * the acquisition of a lock). > + */ > + > switch ( op ) > { > case HVMOP_set_evtchn_upcall_vector: > @@ -4636,6 +4643,10 @@ long do_hvm_op(unsigned long op, > XEN_GUEST_HANDLE_PARAM(void) arg) > } > } > > + if ( rc == -ERESTART ) > + rc = hypercall_create_continuation(__HYPERVISOR_hvm_op, "lh", > + op, arg); > + > return rc; > } > > @@ -4869,4 +4880,3 @@ void hvm_set_segment_register(struct vcpu *v, enum > x86_segment seg, > * indent-tabs-mode: nil > * End: > */ > - > -- 2.1.4
_______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel