Re: [fpc-devel] Double-checking an optimisation

2022-01-09 Thread J. Gareth Moreton via fpc-devel
On 09/01/2022 15:28, Martin Frb via fpc-devel wrote: Btw, have you seen this? https://www.agner.org/optimize/optimizing_assembly.pdf Page 70, it says that under some conditions a branch may be faster than a conditional move. I'm definitely saving a local copy of that!  It could prove insightf

Re: [fpc-devel] Double-checking an optimisation

2022-01-09 Thread Martin Frb via fpc-devel
On 09/01/2022 04:53, J. Gareth Moreton via fpc-devel wrote: On 09/01/2022 01:47, Martin Frb via fpc-devel wrote: I take it, it also is one (or two?) bytes longer? If that is in a loop that otherwise fits exactly within a 32-byte-aligned block, then that could cause a slowdown too. (If the l

Re: [fpc-devel] Double-checking an optimisation

2022-01-09 Thread Florian Klämpfl via fpc-devel
On 09.01.2022 at 15:08, J. Gareth Moreton via fpc-devel wrote: On 09/01/2022 12:35, Florian Klämpfl via fpc-devel wrote: It removes a jump and a label, which might permit other long-range optimisations, but it's 3 instructions that are in a dependency chain. Didn't y

Re: [fpc-devel] Double-checking an optimisation

2022-01-09 Thread J. Gareth Moreton via fpc-devel
On 09/01/2022 12:35, Florian Klämpfl via fpc-devel wrote: It removes a jump and a label, which might permit other long-range optimisations, but it's 3 instructions that are in a dependency chain. Didn't you implement something which transformed the code above into xorl %ebx,%ebx

Re: [fpc-devel] Double-checking an optimisation

2022-01-09 Thread Florian Klämpfl via fpc-devel
On 09.01.2022 at 01:37, J. Gareth Moreton via fpc-devel wrote: Hi everyone, So a merge request of mine was just approved that allows the peephole optimizer access to more registers when it needs one for temporary storage. It allows it to make an optimisation on x86_64-win64 that wasn't possib

Re: [fpc-devel] Double-checking an optimisation

2022-01-08 Thread J. Gareth Moreton via fpc-devel
On 09/01/2022 01:47, Martin Frb via fpc-devel wrote: I take it, it also is one (or two?) bytes longer? If that is in a loop that otherwise fits exactly within a 32-byte-aligned block, then that could cause a slowdown too. (If the loop is 16 bytes long, but aligned to a 32-byte boundary + 16, then

Re: [fpc-devel] Double-checking an optimisation

2022-01-08 Thread Martin Frb via fpc-devel
On 09/01/2022 01:37, J. Gareth Moreton via fpc-devel wrote: Hi everyone, So a merge request of mine was just approved that allows the peephole optimizer access to more registers when it needs one for temporary storage.  It allows it to make an optimisation on x86_64-win64 that wasn't possible

[fpc-devel] Double-checking an optimisation

2022-01-08 Thread J. Gareth Moreton via fpc-devel
Hi everyone, So a merge request of mine was just approved that allows the peephole optimizer access to more registers when it needs one for temporary storage.  It allows it to make an optimisation on x86_64-win64 that wasn't possible before due to the lack of available volatile registers.  In