Re: [PATCH] tcg/i386: Check for shorter instruction sequence for ARITH_AND

Richard Henderson Mon, 07 Aug 2023 11:59:16 -0700

On 8/7/23 07:28, Helge Deller wrote:

The tcg uses tgen_arithi(ARITH_AND) during fast CPU TLB lookups,
which e.g. translates to:


0x7ff5b011556a:  48 81 e6 00 f0 ff ff     andq     $0xfffffffffffff000, %rsi

In case the upper 48 bits are all set, the shorter sequence to operate
on the lower 16 bits of the target reg (si) can be used, which will then
be a 2 bytes shorter instruction sequence:

0x7f4488097b31:  66 81 e6 00 f0           andw     $0xf000, %si

Signed-off-by: Helge Deller <del...@gmx.de>
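
For reference, the condition and the two encodings described in the patch can be sketched as below. This is only an illustration, not the code from the patch and not QEMU's tgen_arithi(): the helper names are hypothetical, the destination register is hard-coded to %rsi/%si to match the disassembly above, and the 64-bit form assumes the mask is representable as a sign-extended 32-bit immediate.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a QEMU function): true when the upper 48 bits
 * of the AND mask are all ones, so only the low 16 bits of the register
 * can be changed by the operation. */
static bool and_mask_fits_16bit(uint64_t mask)
{
    return (mask >> 16) == 0xffffffffffffULL;
}

/* Hypothetical emitter for "andq $imm32, %rsi" (REX.W 81 /4 id).
 * Assumes the mask equals its low 32 bits sign-extended to 64 bits.
 * Returns the number of bytes written (7). */
static size_t emit_andq_rsi_imm32(uint8_t *buf, uint64_t mask)
{
    uint32_t imm = (uint32_t)mask;

    buf[0] = 0x48;                  /* REX.W: 64-bit operand size */
    buf[1] = 0x81;                  /* group 1 opcode, full-size immediate */
    buf[2] = 0xe6;                  /* ModRM: mod=11, reg=/4 (AND), rm=rsi */
    buf[3] = (uint8_t)(imm);        /* immediate, little-endian */
    buf[4] = (uint8_t)(imm >> 8);
    buf[5] = (uint8_t)(imm >> 16);
    buf[6] = (uint8_t)(imm >> 24);
    return 7;
}

/* Hypothetical emitter for "andw $imm16, %si" (66 81 /4 iw).
 * The 0x66 operand-size prefix shrinks the immediate from 4 bytes to 2,
 * which is what makes it a length-changing prefix.
 * Returns the number of bytes written (5). */
static size_t emit_andw_si_imm16(uint8_t *buf, uint64_t mask)
{
    uint16_t imm = (uint16_t)mask;

    buf[0] = 0x66;                  /* operand-size override prefix */
    buf[1] = 0x81;                  /* same opcode as the 64-bit form */
    buf[2] = 0xe6;                  /* ModRM: mod=11, reg=/4 (AND), rm=si */
    buf[3] = (uint8_t)(imm);        /* immediate, little-endian */
    buf[4] = (uint8_t)(imm >> 8);
    return 5;
}
```

With the mask 0xfffffffffffff000 the check succeeds, and the two emitters produce the 7-byte and 5-byte sequences shown in the disassembly above.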



Current Intel optimization guidelines

https://www.intel.com/content/www/us/en/content-details/671488/intel-64-and-ia-32-architectures-optimization-reference-manual.html

Section 3.4.2.3, Length Changing Prefixes, suggests that using 16-bit operands slows decode from 1 cycle to 6 cycles.

Section 3.5.2.3, Partial Register Stalls, says that Skylake has fixed the major issues that older microarchitectures had with such stalls, but that these operations have two additional cycles of delay.


So on balance I don't think this is a good tradeoff.


r~
