Re: [fpc-devel] -O3 peephole proposal... run Pass 1 again if Pass 2 returns True

J. Gareth Moreton via fpc-devel Mon, 01 Mar 2021 03:01:05 -0800

Okay, so I ran a test to see what would happen if I cycled back to pass 1 whenever pass 2 made changes. Other than some edge cases in some packages (a bug caused an infinite loop at this stage of the make process), it causes no difference in code generation in the RTL. The only notable difference is the compiler being much slower!

So I think we can write off this idea. Granted, there are still some situations where pass 2 methods call pass 1 methods - the most notable one is a slightly convoluted optimisation in OptPass2MOV - if the instruction that follows is JMP, it calls OptPass2JMP on it right there and then, rather than wait for PeepHoleOptPass2Cpu to reach the instruction. The reason for this is that many of OptPass2JMP's optimisations either insert a MOV that can be optimised with the original MOV, or insert a RET and turn the original MOV into a dead store. However, this optimisation cannot be moved into pass 1 without a drop in optimisation quality (notably, OptPass2JMP performs worse at converting blocks of MOVs into CMOVcc instructions).

Following on from comments from Florian in i38555, I'll see about factoring out the specific MOV/MOV and MOV/RET optimisations from OptPass1MOV at some point so they can be called separately. Not only does it minimise problems and design violations of calling pass 1 methods from pass 2, but it will also provide a speed gain in pass 2 from not having to check everything that OptPass1MOV has to offer.


Gareth aka. Kit

On 28/02/2021 04:15, J. Gareth Moreton via fpc-devel wrote:

Just as an example, when compiling the System unit on r48813, there exists this block of disassembly:
.Lj4072:
    ...
    leaq    (%rsi,%r13),%rax
    leaq    -1(%rax),%r12
# Peephole Optimization: SubMov2LeaSub
    subq    $1,%rax
    ...
With my improvement over at i38555, the optimiser can remove the sub instruction because %rax doesn't get used afterwards, hence:
.Lj4072:
    ...
    jne    .Lj4070
    leaq    (%rsi,%r13),%rax
# Peephole Optimization: SubMov2Lea
    leaq    -1(%rax),%r12
    ...
SubMov2Lea (and SubMov2LeaSub) is a Pass 2 optimisation because of the potential to do deeper optimisations on the MOV instruction (which are in Pass 1). After the optimisation is made, and with the knowledge that %rax's value is discarded afterwards, careful observation will reveal that the two LEA instructions can be merged:
.Lj4072:
    ...
    jne    .Lj4070
# Peephole Optimization: SubMov2Lea
    leaq    -1(%rsi,%r13),%r12
    ...
I've been working in a separate branch to improve the optimisations in OptPass1LEA to detect this (it currently doesn't because the two destination registers aren't identical), and this is why I call OptPass1LEA from OptPass2SUB in the patch provided on i38555, although as I originally described, this feels somewhat hacky and has a risk of opening up more bugs. A safer and more thorough approach, although slower, would be to call Pass 1 again where the register tracking is up to date, for example (when calling from OptPass2SUB, because the first LEA is the previous instruction, the register tracking is ahead by one instruction upon entering OptPass1LEA).
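
To make the merge condition concrete, here is a minimal sketch in Python. It is not the code in OptPass1LEA; the Lea class, the try_merge function and the first_dest_dead_after flag are all hypothetical names, and the liveness information is simply passed in rather than taken from the compiler's register tracking. It only models the case shown above: the second LEA uses the first LEA's destination as its sole base register.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Lea:
    """Hypothetical model of `leaq offset(base,index,scale),dest` (AT&T syntax)."""
    offset: int
    base: Optional[str]
    index: Optional[str]
    scale: int
    dest: str

    def __str__(self) -> str:
        disp = str(self.offset) if self.offset else ""
        regs = f"%{self.base}" if self.base else ""
        if self.index:
            regs += f",%{self.index}"
            if self.scale != 1:
                regs += f",{self.scale}"
        return f"leaq    {disp}({regs}),%{self.dest}"


def try_merge(first: Lea, second: Lea, first_dest_dead_after: bool) -> Optional[Lea]:
    """Fold `second` into `first` when second only adds a constant to first's result."""
    # The second LEA must read the first LEA's destination as a plain base register.
    if second.base != first.dest or second.index is not None:
        return None
    # If the destinations differ, the intermediate value must not be needed later.
    if second.dest != first.dest and not first_dest_dead_after:
        return None
    offset = first.offset + second.offset
    # The combined displacement still has to fit in a signed 32-bit field.
    if not -2**31 <= offset < 2**31:
        return None
    return Lea(offset, first.base, first.index, first.scale, second.dest)


first = Lea(0, "rsi", "r13", 1, "rax")
second = Lea(-1, "rax", None, 1, "r12")
print(try_merge(first, second, first_dest_dead_after=True))
```

The liveness flag is the part that depends on register tracking being up to date: if %rax were still in use after the second LEA, the sketch refuses to merge.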
Gareth aka. Kit


On 28/02/2021 01:51, J. Gareth Moreton via fpc-devel wrote:
Hi everyone,
I'm currently developing some new optimisations for LEA instructions after I discovered some new potential ones after fixing i38527. That aside though, sometimes these optimisations only become apparent after Pass 2 has completed. I've tried to change the order of things so the optimisation is made in Pass 1, but there's no easy combination that ensures the best optimisations take place (i.e. I make a change to improve one optimisation, and another one is made worse at the same time).
I've taken to calling OptPass1XXX routines from OptPass2XXX routines in places where this is likely to happen, and so far this produces the best code - however, it feels hacky and problems may occur with register tracking if OptPass1XXX is called on a different instruction to the current one (e.g. one optimisation I've found requires calling GetLastInstruction and then calling OptPass1LEA on the result if it's a LEA instruction).
So to help clean up the code and provide the best output, I would like to propose a cross-platform change to the peephole optimizer:
- Under -O3, if a change was made in Pass 2 (implied if any of the OptPass2XXX routines return True), the peephole optimiser cycles back to Pass 1 and tries again.
There are a few variants for this:
- After Pass 1 is called after Pass 2, it then goes to the Post-peephole Pass regardless of whether anything was changed.
- It goes through the whole process again in that after Pass 1 is called again, Pass 2 is then called again, and if Pass 2 returns True again, then it goes back to Pass 1 and does it as many times as needed (or until it hits an upper limit to prevent an infinite loop due to a compiler bug). Only once Pass 2 returns False does it go to the Post-peephole Pass.
- The third variant is that variant 1 is done for -O2 and variant 2 is done for -O3 (and no extra run of Pass 1 for -O1).
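
As a rough illustration of how the variants differ, here is a minimal control-flow sketch in Python. It is not the compiler's actual driver: run_peephole, the variant numbering (0 meaning no extra run of Pass 1) and the max_reruns cap of 4 are assumptions made for this example, and each pass is modelled as a callable that returns True if it changed anything.

```python
from typing import Callable


def run_peephole(pass1: Callable[[], bool],
                 pass2: Callable[[], bool],
                 post_pass: Callable[[], None],
                 variant: int,
                 max_reruns: int = 4) -> int:
    """Run the passes and return how many times Pass 1 was executed.

    variant 0: Pass 1, Pass 2, Post-peephole (no extra run).
    variant 1: one extra Pass 1 if Pass 2 changed something, then Post-peephole.
    variant 2: repeat Pass 1 / Pass 2 until Pass 2 reports no change,
               or until max_reruns is hit (guard against an infinite loop).
    """
    pass1()
    pass1_runs = 1
    changed = pass2()
    reruns = 0
    while changed and variant != 0 and reruns < max_reruns:
        pass1()
        pass1_runs += 1
        reruns += 1
        if variant == 1:
            break  # variant 1 goes to the Post-peephole Pass regardless
        changed = pass2()
    post_pass()
    return pass1_runs


# Third variant from the list above: pick the behaviour from the -O level.
VARIANT_FOR_LEVEL = {1: 0, 2: 1, 3: 2}

pass2_results = iter([True, True, False])
runs = run_peephole(lambda: False, lambda: next(pass2_results),
                    lambda: None, VARIANT_FOR_LEVEL[3])
print(runs)  # Pass 1 ran three times before Pass 2 reported no change
```

The cap matters for variant 2 only; with a Pass 2 that always reports a change, the loop stops after max_reruns extra runs of Pass 1 instead of spinning forever.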
The obvious side-effect is that it causes the compiler to run slightly slower, but this could potentially be mitigated by merging the Pre-Peephole Pass with Pass 1, thus eliminating a distinct pass, while any missed optimisations that occur due to this are picked up in the second call to Pass 1 (it will most likely be picked up in the first call to Pass 1 due to PeepHoleOptPass1Cpu returning True and signalling another iteration).
What are everyone's thoughts?

Gareth aka. Kit

