On 10/03/15 16:36, Jan Beulich wrote:
> Use + on outputs instead of = and a matching input. Allow not just
> memory for the _eflags operand (it turns out that recent gcc produces
> worse code when also doing this for _dst.val, so the latter is being
> avoided).
>
> Signed-off-by: Jan Beulich <jbeul...@suse.com>
>
> --- a/xen/arch/x86/x86_emulate/x86_emulate.c
> +++ b/xen/arch/x86/x86_emulate/x86_emulate.c
> @@ -428,7 +428,7 @@ typedef union {
>  /* Before executing instruction: restore necessary bits in EFLAGS. */
>  #define _PRE_EFLAGS(_sav, _msk, _tmp)                           \
>  /* EFLAGS = (_sav & _msk) | (EFLAGS & ~_msk); _sav &= ~_msk; */ \
> -"movl %"_sav",%"_LO32 _tmp"; "                                  \
> +"movl %"_LO32 _sav",%"_LO32 _tmp"; "                            \
>  "push %"_tmp"; "                                                \
>  "push %"_tmp"; "                                                \
>  "movl %"_msk",%"_LO32 _tmp"; "                                  \
> @@ -448,7 +448,7 @@ typedef union {
>  "pushf; "                                       \
>  "pop  %"_tmp"; "                                \
>  "andl %"_msk",%"_LO32 _tmp"; "                  \
> -"orl  %"_LO32 _tmp",%"_sav"; "
> +"orl  %"_LO32 _tmp",%"_LO32 _sav"; "
>  
>  /* Raw emulation: instruction has two explicit operands. */
>  #define __emulate_2op_nobyte(_op,_src,_dst,_eflags,_wx,_wy,_lx,_ly,_qx,_qy)\
> @@ -460,18 +460,16 @@ do{ unsigned long _tmp;                 
>              _PRE_EFLAGS("0","4","2")                                       \
>              _op"w %"_wx"3,%1; "                                            \
>              _POST_EFLAGS("0","4","2")                                      \
> -            : "=m" (_eflags), "=m" ((_dst).val), "=&r" (_tmp)              \
> -            : _wy ((_src).val), "i" (EFLAGS_MASK),                         \
> -              "m" (_eflags), "m" ((_dst).val) );                           \
> +            : "+g" (_eflags), "+m" ((_dst).val), "=&r" (_tmp)              \
> +            : _wy ((_src).val), "i" (EFLAGS_MASK) );                       \

I believe the old ASM was buggy, not just inefficient.

Having read the Extended ASM documentation quite carefully, the
following statement is relevant

"Only input operands may use numbers in constraints, and they must each
refer to an output operand. Only a number (or the symbolic assembler
name) in the constraint can guarantee that one operand is in the same
place as another. The mere fact tha|t 'foo' |||is the value of both
operands is not enough to guarantee that they are in the same place in
the generated assembler code."

Because the input operands do not use numbers, the asm must read from %5
and write to %0 to guarantee that the _eflags temporary is used properly.

I believe that this transformation does now make the asm correct, as the
output and input sides are now guaranteed to match the %0 used to
reference the _eflags temporary.

Did you observe any code changes simply from changing = constraints to
+, or did we get very lucky in with the generated code?

I think it might be a very wise idea to switch to using symbolic names. 
This code is very complicated and has many ways to go subtly wrong.

~Andrew
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Reply via email to