On 22/05/17 12:03, George Dunlap wrote:
> On Mon, May 22, 2017 at 11:18 AM, George Dunlap
> <george.dun...@citrix.com> wrote:
>> On 22/05/17 07:35, Hao, Xudong wrote:
>>> Bug detailed description:
>>>
>>> ----------------
>>>
>>> Create one RHEL7.3 HVM and do live migration continuously, while doing the 
>>> 200+ or 300+ times live-migration, tool stack report error and migration 
>>> failed.
>>>
>>>
>>>
>>> Environment :
>>>
>>> ----------------
>>>
>>> HW: Skylake server
>>>
>>> Xen: Xen 4.9.0 RC4
>>>
>>> Dom0: Linux 4.11.0
>>>
>>>
>>>
>>> Reproduce steps:
>>>
>>> ----------------
>>>
>>> 1.      Compile Xen 4.9 Rc4 and dom0 kernel 4.11.0, boot to dom0
>>>
>>> 2.      Boot RHEL7.3 HVM guest
>>>
>>> 3.      Migrate guest to localhost, sleep 10 seconds
>>>
>>> 4.      Repeat doing the step3.
>>>
>>>
>>>
>>> Current result:
>>>
>>> ----------------
>>>
>>> VM Migration fail.
>>>
>>>
>>>
>>> Base error log:
>>>
>>> ----------------
>>>
>>> xl migrate 24hrs_lm_guest_2 localhost
>>>
>>> root@localhost's password:
>>>
>>> migration target: Ready to receive domain.
>>>
>>> Saving to migration stream new xl format (info 0x3/0x0/1761)
>>>
>>> Loading new save file <incoming migration stream> (new xl fmt info 
>>> 0x3/0x0/1761)
>>>
>>> Savefile contains xl domain config in JSON format
>>>
>>> Parsing config from <saved>
>>>
>>> xc: info: Saving domain 273, type x86 HVM
>>>
>>> xc: info: Found x86 HVM domain from Xen 4.9
>>>
>>> xc: info: Restoring domain
>>>
>>> xc: error: set HVM param 12 = 0x00000000feffe000 (85 = Interrupted system 
>>> call should ): Internal error
>>>
>>> xc: error: Restore failed (85 = Interrupted system call should ): Internal 
>>> error
>> Interesting -- it appears that setting HVM_PARAM_IDENT_PT (#12) can fail
>> with -ERESTART.  But the comment for ERESTART makes it explicit that it
>> should be internal only -- it should cause a hypercall continuation (so
>> that the hypercall restarts automatically), rather than returning to the
>> guest.
>>
>> But the hypercall continuation code seems to have disappeared from
>> do_hvm_op() at some point?
>>
>> /me digs a bit more...
> The problem turns out to be commit ae20ccf ("dm_op: convert
> HVMOP_set_mem_type"), which says:
>
>     This patch removes the need for handling HVMOP restarts, so that
>     infrastructure is removed.
>
> While it's true that there are no more operations which need iteration
> information restored, but there are two operations which may still
> need to be restarted to avoid deadlocks with other operations.
>
> Attached is a patch which restores hypercall continuation checking.
> Xudong, can you give it a test?
>
> Thanks,
>  -George
>
> From 3d4ce135ea3b396bb63752c39e6234366d590c16 Mon Sep 17 00:00:00 2001
> From: George Dunlap <george.dun...@citrix.com>
> Date: Mon, 22 May 2017 11:38:31 +0100
> Subject: [PATCH] Restore HVM_OP hypercall continuation (partial revert of
>  ae20ccf)
>
> Commit ae20ccf removed the hypercall continuation logic from the end
> of do_hvm_op(), claiming:
>
> "This patch removes the need for handling HVMOP restarts, so that
> infrastructure is removed."
>
> That turns out to be false.  The removal of HVMOP_set_mem_type removed
> the need to store a start iteration value in the hypercall
> continuation, but a grep through hvm.c for ERESTART turns up at least
> two places where do_hvm_op() may still need a hypercall continuation:
>
>  * HVMOP_set_hvm_param can return -ERESTART when setting
> HVM_PARAM_IDENT_PT in the event that it fails to acquire the domctl
> lock
>
>  * HVMOP_flush_tlbs can return -ERESTART if several vcpus call it at
>    the same time
>
> In both cases, a simple restart (with no stored iteration information)
> is necessary.
>
> Add a check for -ERESTART again, along with a comment at the top of
> the function regarding the lack of decoding any information from the
> op value.
>
> Reported-by: Xudong Hao <xudong....@intel.com>
> Signed-off-by: George Dunlap <george.dun...@citrix.com>
> ---
> CC: Andrew Cooper <andrew.coop...@citrix.com>
> CC: Jan Beulich <jbeul...@suse.com>
> CC: Paul Durrant <paul.durr...@citrix.com>

Reviewed-by: Andrew Cooper <andrew.coop...@citrix.com> (with the final
hunk removed)

> ---
>  xen/arch/x86/hvm/hvm.c | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index 81691e2..e3e817d 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -4544,6 +4544,13 @@ long do_hvm_op(unsigned long op, 
> XEN_GUEST_HANDLE_PARAM(void) arg)
>  {
>      long rc = 0;
>  
> +    /* 
> +     * NB: hvm_op can be part of a restarted hypercall; but at the
> +     * moment the only hypercalls which do continuations don't need to
> +     * store any iteration information (since they're just re-trying
> +     * the acquisition of a lock).
> +     */
> +    
>      switch ( op )
>      {
>      case HVMOP_set_evtchn_upcall_vector:
> @@ -4636,6 +4643,10 @@ long do_hvm_op(unsigned long op, 
> XEN_GUEST_HANDLE_PARAM(void) arg)
>      }
>      }
>  
> +    if ( rc == -ERESTART )
> +        rc = hypercall_create_continuation(__HYPERVISOR_hvm_op, "lh",
> +                                           op, arg);
> +
>      return rc;
>  }
>  
> @@ -4869,4 +4880,3 @@ void hvm_set_segment_register(struct vcpu *v, enum 
> x86_segment seg,
>   * indent-tabs-mode: nil
>   * End:
>   */
> -
> -- 2.1.4

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Reply via email to