Il 22/12/2013 13:24, Aurelien Jarno ha scritto:
> On Sat, Dec 21, 2013 at 03:08:21PM +0100, Paolo Bonzini wrote:
>> Il 21/12/2013 00:00, Richard Henderson ha scritto:
>>> +        if (real_bswap && have_movbe) {
>>> +            tcg_out_modrm_offset(s, OPC_MOVBE_GyMy + P_DATA16 + seg,
>>> +                                 datalo, base, ofs);
>>> +            tcg_out_ext16u(s, datalo, datalo);
>>
>> Do partial register stalls still exist on Atom and Haswell?  I don't
>> remember exactly what you had to do to prevent them, but IIRC you first
>> moved zero to the register and then overwrote the the low 16 bits.
> 
> Note that for unsigned 16-bit load you can do either movzw + bswap or 
> movbe + movzw.

Yeah, I was asking if xor + movbe would be faster.  Benchmarking could
tell, but anyway xor + movbe is likely the smallest code you can produce.

Paolo


Reply via email to