On 10/03/15 16:36, Jan Beulich wrote: > Use + on outputs instead of = and a matching input. Allow not just > memory for the _eflags operand (it turns out that recent gcc produces > worse code when also doing this for _dst.val, so the latter is being > avoided). > > Signed-off-by: Jan Beulich <jbeul...@suse.com> > > --- a/xen/arch/x86/x86_emulate/x86_emulate.c > +++ b/xen/arch/x86/x86_emulate/x86_emulate.c > @@ -428,7 +428,7 @@ typedef union { > /* Before executing instruction: restore necessary bits in EFLAGS. */ > #define _PRE_EFLAGS(_sav, _msk, _tmp) \ > /* EFLAGS = (_sav & _msk) | (EFLAGS & ~_msk); _sav &= ~_msk; */ \ > -"movl %"_sav",%"_LO32 _tmp"; " \ > +"movl %"_LO32 _sav",%"_LO32 _tmp"; " \ > "push %"_tmp"; " \ > "push %"_tmp"; " \ > "movl %"_msk",%"_LO32 _tmp"; " \ > @@ -448,7 +448,7 @@ typedef union { > "pushf; " \ > "pop %"_tmp"; " \ > "andl %"_msk",%"_LO32 _tmp"; " \ > -"orl %"_LO32 _tmp",%"_sav"; " > +"orl %"_LO32 _tmp",%"_LO32 _sav"; " > > /* Raw emulation: instruction has two explicit operands. */ > #define __emulate_2op_nobyte(_op,_src,_dst,_eflags,_wx,_wy,_lx,_ly,_qx,_qy)\ > @@ -460,18 +460,16 @@ do{ unsigned long _tmp; > _PRE_EFLAGS("0","4","2") \ > _op"w %"_wx"3,%1; " \ > _POST_EFLAGS("0","4","2") \ > - : "=m" (_eflags), "=m" ((_dst).val), "=&r" (_tmp) \ > - : _wy ((_src).val), "i" (EFLAGS_MASK), \ > - "m" (_eflags), "m" ((_dst).val) ); \ > + : "+g" (_eflags), "+m" ((_dst).val), "=&r" (_tmp) \ > + : _wy ((_src).val), "i" (EFLAGS_MASK) ); \
I believe the old ASM was buggy, not just inefficient. Having read the Extended ASM documentation quite carefully, the following statement is relevant "Only input operands may use numbers in constraints, and they must each refer to an output operand. Only a number (or the symbolic assembler name) in the constraint can guarantee that one operand is in the same place as another. The mere fact tha|t 'foo' |||is the value of both operands is not enough to guarantee that they are in the same place in the generated assembler code." Because the input operands do not use numbers, the asm must read from %5 and write to %0 to guarantee that the _eflags temporary is used properly. I believe that this transformation does now make the asm correct, as the output and input sides are now guaranteed to match the %0 used to reference the _eflags temporary. Did you observe any code changes simply from changing = constraints to +, or did we get very lucky in with the generated code? I think it might be a very wise idea to switch to using symbolic names. This code is very complicated and has many ways to go subtly wrong. ~Andrew
_______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel