Thanks David, I'm still learning some of the nuances of the Intel and AMD processors, but most of it is just logical analysis. Admittedly my main drive has been to shrink down the size of the binary, since Delphi and Free Pascal have always been a little bit bloated in comparison. Not that it is necessarily a bad thing, but saving space without sacrificing performance can only be a good thing, especially for those with limited bandwidth or for saving those few precious bytes when burning files to a CD or DVD.
There have been a few instances in the compiled compiler (my main test case) where an entire register is freed up due to my deep optimisation, and that means the corresponding "push" and "pop" at either end of the procedure can be removed (along with the corresponding stack unwinding information), although I haven't started programming that yet.

I am ready to submit this part of my deep optimiser as a patch. I'm just waiting for Florian's acceptance or rejection of my debug strip patch - https://bugs.freepascal.org/view.php?id=33798 (the 3rd attempt!) - only because it shares some debugging code with said patch (it was useful to monitor how the registers inside references were changed). If it's rejected, it just means I'll have to change some of that debugging code a bit.

Gareth aka. Kit

On Mon 11/06/18 20:27 , David Pethes pub...@satd.sk sent:

Hi,

nice work.

On 8. 6. 2018 0:46, J. Gareth Moreton wrote:
> The deep optimiser changes this to:
>
> movq %rcx,%rax
> movq %rdx,%rsi
> movq %rcx,%rbx
>
> It determines, for the third MOV, it can
> change %rax for %rcx to minimise a
> pipeline stall, and then knows that %rbx
> and %rcx contain the same value, so can
> remove the 4th MOV completely. Given that
> modern processors usually have at least 3
> ALUs and the interdependencies have been
> removed, this will likely give a speed
> increase of one cycle over these few
> commands.

Note that modern CPUs can use move elimination for reg-to-reg moves, so it doesn't cost any execution resources (it's "free"). Despite that, it's still a win, because it saves both bytes in the I-cache and decoder bandwidth (which can indirectly save a cycle or more elsewhere).
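As an illustration of the kind of rewrite quoted above, here is a minimal Python sketch of copy propagation over a straight-line run of register-to-register MOVs. It is not the Free Pascal deep optimiser's code: the function name `propagate_copies` and the four-instruction input are assumptions made up for this example (the input is merely one sequence that would reduce to the three MOVs quoted), and the sketch ignores every other instruction, partial registers, and control flow.

```python
def propagate_copies(moves):
    """Simplify a straight-line list of reg-to-reg MOVs.

    `moves` is a list of (src, dst) pairs in AT&T operand order.
    Hypothetical helper for illustration only; a real peephole pass must
    also handle every other instruction that reads or writes a register.
    """
    value_of = {}  # register -> id of the value it currently holds
    home = {}      # value id -> register that first held that value
    result = []
    for src, dst in moves:
        if src not in value_of:
            # First time we see this register: give its contents a new id.
            value_id = len(home)
            value_of[src] = value_id
            home[value_id] = src
        value_id = value_of[src]
        if value_of.get(dst) == value_id:
            # dst already holds the same value, so the MOV is redundant.
            continue
        # Read from the original holder if it still has the value; this
        # removes the dependency on the intermediate copy.
        origin = home[value_id]
        if value_of.get(origin) != value_id:
            origin = src
        result.append((origin, dst))
        value_of[dst] = value_id
    return result


# Assumed input, chosen so the result matches the three MOVs quoted above.
before = [("%rcx", "%rax"), ("%rdx", "%rsi"), ("%rax", "%rbx"), ("%rbx", "%rcx")]
for src, dst in propagate_copies(before):
    print(f"movq {src},{dst}")
```

The third MOV is rewritten to read %rcx instead of %rax, and the fourth is dropped because %rbx and %rcx are already known to hold the same value.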
David
_______________________________________________
fpc-devel maillist - fpc-devel@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel