On 28.02.21 at 11:11, J. Gareth Moreton via fpc-devel wrote:
Hi everyone,

So to get to the point, I've spotted another potential peephole optimisation specifically on x86_64:

     movq    (%rdx),%rax
     shrq    $32,%rax

Is it acceptable to change this to the following?

     movl    4(%rdx),%eax

Yes. If (%rdx) is naturally aligned (i.e. to an 8-byte boundary), then 4(%rdx) is aligned to at least a 4-byte boundary and is thus naturally aligned for a 32-bit load.


Logically it's equivalent thanks to the guarantee that the upper 32 bits of the destination register will be zeroed, but I know there can sometimes be a penalty for reading from memory that isn't aligned to, say, a 16-byte boundary.

x86 is very robust against misaligned accesses, and the access in the example is naturally aligned anyway. Any alignment beyond natural alignment is coincidental.
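
To illustrate why the two sequences produce the same value, here is a minimal sketch in C. It models the x86 loads as explicit little-endian reads from a byte buffer; the helper names (load_le, load64_then_shift, load32_at_offset4) are made up for this illustration and are not taken from the compiler sources.

```c
#include <stddef.h>
#include <stdint.h>

/* Little-endian read of n bytes (n <= 8), modelling an x86 memory load. */
static uint64_t load_le(const unsigned char *p, size_t n)
{
    uint64_t v = 0;
    for (size_t i = 0; i < n; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* Models: movq (%rdx),%rax ; shrq $32,%rax */
static uint64_t load64_then_shift(const unsigned char *p)
{
    return load_le(p, 8) >> 32;
}

/* Models: movl 4(%rdx),%eax
   A 32-bit register write zero-extends into the full 64-bit register. */
static uint64_t load32_at_offset4(const unsigned char *p)
{
    return load_le(p + 4, 4);
}
```

Both functions return the upper four bytes of the 8-byte field, zero-extended to 64 bits. The sketch only covers the resulting register value; it does not model flags (shrq modifies them, a plain movl does not), so a real peephole pass would presumably have to check that the flags are not used afterwards.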


A "movl; shrl $16" version may be possible with movzx, but I'm not certain whether that would be less efficient, since the offset is now 2 rather than 4.

Gareth aka. Kit
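
The 16-bit variant mentioned above can be sketched the same way. This assumes the intended rewrite is "movl (%rdx),%eax; shrl $16,%eax" into "movzwl 2(%rdx),%eax"; the helper names are again hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Little-endian read of n bytes (n <= 4), modelling an x86 memory load. */
static uint32_t read_le(const unsigned char *p, size_t n)
{
    uint32_t v = 0;
    for (size_t i = 0; i < n; i++)
        v |= (uint32_t)p[i] << (8 * i);
    return v;
}

/* Models: movl (%rdx),%eax ; shrl $16,%eax */
static uint32_t load32_then_shift16(const unsigned char *p)
{
    return read_le(p, 4) >> 16;
}

/* Models: movzwl 2(%rdx),%eax (16-bit load, zero-extended) */
static uint32_t load16_at_offset2(const unsigned char *p)
{
    return read_le(p + 2, 2);
}
```

If (%rdx) is aligned to a 4-byte boundary, 2(%rdx) is aligned to a 2-byte boundary, which is the natural alignment for a 16-bit load, so the alignment argument carries over.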



