[fpc-devel] Peephole optimizer passes

J. Gareth Moreton via fpc-devel Tue, 25 Jan 2022 12:14:43 -0800

Hi everyone,

So I've found with the peephole optimizer, at least on x86, that if you run pass 2 more than once, it often catches even more optimisations that otherwise get missed. At the same time I've found some bugs that get triggered when pass 2 is run again (which is why I asked about RegLoadedWithNewValue in another chain).

I'm working out how best to permit this, given pass 2 has only ever been run once, and it's a cross-platform thing that will cause slowdown across the board, although I figure if it only runs pass 2 multiple times on -O3 and above, then it running more slowly is permissible.
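As a rough illustration of the idea above, repeating a pass can be written as a loop that stops as soon as the pass reports no change or a cap is reached. This is only a sketch in Python, not FPC's actual code: the names `max_pass2_runs`, `run_until_stable` and `drop_one_self_move`, and the cap of 4 runs, are assumptions made up for the example.

```python
from typing import Callable, List

# A pass mutates the instruction list in place and reports whether it changed anything.
PeepholePass = Callable[[List[str]], bool]


def max_pass2_runs(opt_level: int) -> int:
    """Hypothetical policy: repeat pass 2 only at -O3 and above (the cap of 4 is arbitrary)."""
    return 4 if opt_level >= 3 else 1


def run_until_stable(code: List[str], peephole_pass: PeepholePass, max_runs: int) -> int:
    """Run the pass until it makes no change or max_runs is reached; return the run count."""
    runs = 0
    while runs < max_runs:
        runs += 1
        if not peephole_pass(code):
            break  # nothing changed, so another run cannot find anything new
    return runs


def drop_one_self_move(code: List[str]) -> bool:
    """Toy stand-in for a pass: delete the first 'mov X,X' it finds, then stop."""
    for i, ins in enumerate(code):
        parts = ins.split()
        if len(parts) == 2 and parts[0] == "mov":
            src, _, dst = parts[1].partition(",")
            if src == dst:
                del code[i]
                return True
    return False
```

With this toy pass, a single run removes only one redundant move, while the capped loop keeps going until a run finds nothing left to change. The extra final run that confirms stability is part of the compile-time cost mentioned above.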

Additionally, I've found that running certain elements of pass 1 again also yields some new optimisations, although in this instance I figure it's best to just run these optimisations again in pass 2 instead of falling back to pass 1, although I'll have to experiment to see if this catches all eventualities.

On another note, I do wonder if the pre-peephole pass should be merged into pass 1, and then pass 1 be run up to 3 times on -O2 instead of twice so the level of optimisation is identical. Then again, I'm not certain if other platforms do some special instruction manipulation that would be incompatible with a regular pass.


Gareth aka. Kit

P.S. Just some examples... in ninl, for example - before:

.Lj1162:
    movq    %r13,%rcx
    call    NCON_$$_GENENUMNODE$TENUMSYM$$TORDCONSTNODE
    movq    %rax,56(%rsp)
    movq    56(%rsp),%rdi
    jmp    .Lj1141
    .balign 16,0x90

After:

.Lj1162:
    movq    %r13,%rcx
    call    NCON_$$_GENENUMNODE$TENUMSYM$$TORDCONSTNODE
    movq    %rax,56(%rsp)
    movq    %rax,%rdi
    jmp    .Lj1141
    .balign 16,0x90
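The change in this first example, where a load from a stack slot that was just written is replaced by a copy of the register that was stored, can be sketched on the assembly text itself. The Python below is a toy illustration only and not how FPC's optimizer works internally: `forward_store_to_load` is a hypothetical name, it only handles adjacent `movq` pairs, and its simple operand split would not cope with indexed addressing such as `(%rax,%rbx,8)`.

```python
import re
from typing import List

# Matches a two-operand movq in AT&T syntax, e.g. "movq %rax,56(%rsp)".
MOVQ = re.compile(r"^\s*movq\s+(\S+),(\S+)\s*$")


def forward_store_to_load(lines: List[str]) -> List[str]:
    """Rewrite 'movq %reg,mem' followed directly by 'movq mem,%reg2' so the second reads %reg."""
    out = list(lines)
    for i in range(len(out) - 1):
        store = MOVQ.match(out[i])
        load = MOVQ.match(out[i + 1])
        if not store or not load:
            continue
        store_src, store_dst = store.groups()
        load_src, load_dst = load.groups()
        if (
            store_src.startswith("%")          # the stored value comes from a register
            and not store_dst.startswith("%")  # and goes to memory
            and load_src == store_dst          # the next instruction reads the same location
            and load_dst.startswith("%")       # into a register
        ):
            out[i + 1] = f"    movq    {store_src},{load_dst}"
    return out
```

The store itself is kept, since the stack slot may still be read later; only the dependent load is turned into a register-to-register move.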

In SysUtils, this sequence appears surprisingly often on x86_64-win64:

.Lj7572:
    movq    -40(%rbp),%rax
    cmpb    $0,-292(%rax)
    jne    .Lj7577
    movq    -40(%rbp),%rcx
    movl    $1,%r8d
    movq    -48(%rbp),%rdx
    call    SYSUTILS$_$DATETIMETOSTRING$hxuwovHuJEHC_$$_STORESTR$PCHAR$LONGINT
    movb    %sil,%dil
    movb    -4(%rbp),%sil
    movb    %sil,%dil
    movb    -4(%rbp),%sil
    jmp    .Lj7447
    .p2align 4,,10
    .p2align 3

And this is optimised by additional passes and optimisations:

.Lj7572:
    movq    -40(%rbp),%rax
    cmpb    $0,-292(%rax)
    jne    .Lj7577
    movq    -40(%rbp),%rcx
    movl    $1,%r8d
    movq    -48(%rbp),%rdx
    call    SYSUTILS$_$DATETIMETOSTRING$hxuwovHuJEHC_$$_STORESTR$PCHAR$LONGINT
    movb    -4(%rbp),%dil
    movb    %dil,%sil
    jmp    .Lj7447
    .p2align 4,,10
    .p2align 3


