Confirmed my suspicions. If I zero the upper bits of the register (I
used something akin to "AND RCX, $F"), there is no speed loss.
My hypothesis, at least on my Intel(R) Core(TM) i7-10750H, is that
using TEST on a sub-register causes a false dependency if the bits
outside of the sub-register are not zero, even though the register
isn't being modified.
Gareth aka. Kit
On 02/10/2020 11:57, J. Gareth Moreton via fpc-devel wrote:
So... I've done some tests, replacing TEST RCX, $4 with TEST CL, $4
and the like in a number-crunching function, and it seems to cause a
notable penalty, even though none of the instructions are in my
critical loop. So I think it's something that needs to be avoided in
most cases. I think the reason why it worked in my Int and Frac
functions is because the processor knows the upper 48 bits of the
register are zero.
Long story short... best not to do it unless you have some additional
insight into what the registers contain.
Gareth aka. Kit
On 02/10/2020 08:15, J. Gareth Moreton via fpc-devel wrote:
Ah brilliant, thank you.
I have used Agner Fog's material before for cycle counting. When I
implemented my 3 MOV -> XCHG optimisation
(https://bugs.freepascal.org/view.php?id=36511), I used Agner Fog's
empirical results to determine when it's best to apply this
optimisation where speed is concerned. On a lot of older processors,
it's not worth it because XCHG took 3 cycles and the 3 MOVs generally
took only 2 (due to how the dependency chain is set up). Only when
XCHG's cycle count dropped to 1 or 2, or when optimising for size,
does it pay off.
So it looks like a partial read of the lower bits is absolutely fine,
since you're not changing anything.
Gareth aka. Kit
On 02/10/2020 01:40, Nikolay Nikolov via fpc-devel wrote:
On 10/1/20 11:36 PM, J. Gareth Moreton via fpc-devel wrote:
I thought that might be the case - thanks Nikolay. And I meant to
say lower bits of a REGISTER, not an instruction!
Admittedly I'm cycle-counting and byte-counting again! I was
looking for ways to reduce 13 bytes of padding in one of my pure
assembly language routines and realised I could make a saving
there. The only thing I can think of that I have to watch out for
logically is that if I change, say, TEST EAX, $80 to TEST AL, $80,
the latter will set the sign flag if the most-significant bit of AL
is 1 (after the 'and' operation), while the former always clears the
sign flag.
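To make the sign-flag difference concrete, here is a minimal Python
sketch that models only the SF and ZF results of TEST at a given
operand width. The helper name flags_after_test is made up for
illustration; it is not part of FPC or of any library.

```python
def flags_after_test(value, mask, width):
    """Model the SF and ZF results of x86 TEST for an operand of `width` bits.

    Hypothetical helper for illustration only. TEST computes
    (value AND mask), discards the result and sets the flags from it.
    """
    operand_mask = (1 << width) - 1
    result = value & mask & operand_mask
    sign_flag = (result >> (width - 1)) & 1  # top bit of the operand size
    zero_flag = 1 if result == 0 else 0
    return sign_flag, zero_flag


eax = 0x000000FF
print(flags_after_test(eax, 0x80, 32))  # TEST EAX, $80 -> (0, 0)
print(flags_after_test(eax, 0x80, 8))   # TEST AL, $80  -> (1, 0)
```

ZF agrees at both widths, so a following JZ/JNZ behaves the same; only
code that inspects SF afterwards (JS/JNS, for example) would notice the
change.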
I have used such subregisters before in the FPC RTL, in
fpc_int_real and fpc_frac_real in rtl/x86_64/math.inc, where I read
AX instead of the larger RAX, but that's only after a "SHR RAX, 48"
instruction that guarantees that everything above the 16th bit is
zero, and after testing other implementation candidates in a kind of
informal competition. (Surprisingly, I think "shr $48, %rax; and
$0x7ff0,%ax; cmp $0x4330,%ax" runs faster than moving 64-bit
constants into temporary registers (since 64-bit immediates aren't
supported outside of MOV) and using 'and' and 'cmp' on %rax directly.)
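For anyone unfamiliar with the bit trick, the following Python sketch
mirrors the three instructions on the raw bits of an IEEE 754 double.
It assumes the comparison is used as an "exponent is at least 52"
check (0x433 is the biased exponent 1023 + 52), which is my reading of
the constants rather than something stated above; the function name is
made up.

```python
import struct


def has_no_fraction_bits(x):
    """Return True if the double x has a biased exponent >= 0x433.

    Hypothetical model of "shr $48, %rax; and $0x7ff0, %ax;
    cmp $0x4330, %ax", assuming an unsigned ">=" test follows the cmp.
    """
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    top16 = bits >> 48        # shr $48, %rax: sign, exponent, 4 mantissa bits
    masked = top16 & 0x7FF0   # and $0x7ff0, %ax: keep only the exponent field
    return masked >= 0x4330   # cmp $0x4330, %ax


print(has_no_fraction_bits(1.5))          # False
print(has_no_fraction_bits(2.0 ** 52))    # True
print(has_no_fraction_bits(-(2.0 ** 53))) # True
```

Infinities and NaNs also pass this check, since their exponent field is
0x7FF.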
I think you always get a read penalty when using the high-byte
registers because the processor has to do an implicit shift operation.
I don't remember the reason, but I recall reading they are less
efficient in Agner Fog's optimization manual. Here's the relevant
quote:
"Any use of the high 8-bit registers AH, BH, CH, DH should be
avoided because it can cause false dependences and less efficient
code."
It's from the chapter "Partial registers" (page 61) of this document:
https://www.agner.org/optimize/optimizing_assembly.pdf
Highly recommended reading, as it addresses exactly the topic of
partial registers. In general, it is the partial register writes of
16-bit or 8-bit subregisters that cause problems - either false read
dependencies (usually on AMD) or extra penalties for
joining/splitting registers (on Intel, at least in the P6 era).
Best regards,
Nikolay
_______________________________________________
fpc-devel maillist - fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel