So, a progress update. I've tied part of my deep optimiser into the peephole optimiser, specifically PostPeepholeOptMov, and it's had some unexpected benefits. One of the things it does is start with a MOV instruction that copies one register's contents into another, then look at subsequent references to see if it can swap out one register for the other, to reduce the chance of a pipeline stall. In some cases it notices that every such register reference in a block has been switched, and so it can safely remove the original MOV.
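To make the idea concrete, here is a minimal sketch in Python. It is not FPC's PostPeepholeOptMov and makes several simplifying assumptions: the block is straight-line code containing only register-to-register MOVs in AT&T operand order (source, destination), sub-registers, flags and memory operands are ignored, and instead of the liveness-based removal described above it uses a simpler rule, dropping any MOV whose destination already holds the value being copied. The names propagate_copies and show are hypothetical.

```python
from typing import Dict, List, Tuple

# A register-to-register move in AT&T operand order: (source, destination).
Move = Tuple[str, str]


def propagate_copies(block: List[Move]) -> List[Move]:
    """Simplified forward copy propagation over a block of reg-reg MOVs.

    - Each source operand is replaced by the oldest register known to hold
      the same value, which shortens chains of dependent MOVs.
    - A MOV whose destination already holds that value is dropped.
    """
    # alias[r] = register whose (unchanged) value r currently holds.
    # Invariant: no register is both a key and a value in this dict.
    alias: Dict[str, str] = {}
    result: List[Move] = []
    for src, dst in block:
        root = alias.get(src, src)
        if alias.get(dst, dst) == root:
            continue  # dst already holds this value, so the MOV is redundant
        result.append((root, dst))
        # dst is about to be overwritten, so registers that were copies of
        # its old value lose that link (conservative: we simply forget it).
        for reg in [r for r, a in alias.items() if a == dst]:
            del alias[reg]
        alias[dst] = root
    return result


def show(block: List[Move]) -> str:
    """Render moves as AT&T-syntax movq lines."""
    return "\n".join(f"movq %{s},%{d}" for s, d in block)


if __name__ == "__main__":
    before = [("rcx", "rax"), ("rdx", "rsi"), ("rax", "rbx"), ("rbx", "rcx")]
    print(show(propagate_copies(before)))
```

Run on the four MOVs from TCPUProcDef.ppuload_platform, this sketch gives the same three-instruction result as the deep optimiser: the third MOV reads %rcx instead of %rax, and the fourth is dropped because %rcx already holds the value in %rbx. Forgetting aliases when a register is overwritten loses some opportunities, but it keeps the sketch obviously safe.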
What this means is that, as well as reducing the chance of a pipeline stall, it's removing unnecessary assignments. My main test case has been compiling the compiler itself, since it's sufficiently complex, is easy to crash if incorrect machine code is produced, and gives plenty of examples of optimisation.

As a very brief example, in compiler/x86_64/symcpu.pas, the first four instructions generated for TCPUProcDef.ppuload_platform are:

    movq %rcx,%rax
    movq %rdx,%rsi
    movq %rax,%rbx
    movq %rbx,%rcx

The deep optimiser changes this to:

    movq %rcx,%rax
    movq %rdx,%rsi
    movq %rcx,%rbx

It determines that, in the third MOV, it can replace %rax with %rcx to minimise the chance of a pipeline stall, and it then knows that %rbx and %rcx contain the same value, so it can remove the fourth MOV completely. Given that modern processors usually have at least three ALUs and the interdependencies have been removed, this will likely save about one cycle over these few instructions.

Before I submit any patches, though, I still need to test it under Linux and on i386.

Kit

_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel