https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95294

--- Comment #2 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Maciej W. Rozycki <ma...@gcc.gnu.org>:

https://gcc.gnu.org/g:e552abe2ba27ccec0d7c0d050b4617f2f636dd63

commit r11-5796-ge552abe2ba27ccec0d7c0d050b4617f2f636dd63
Author: Maciej W. Rozycki <ma...@linux-mips.org>
Date:   Sat Dec 5 18:26:27 2020 +0000

    PR target/95294: VAX: Convert backend to MODE_CC representation

    In the VAX ISA the INSV bit-field insert instruction is the only
    computational operation that keeps the condition codes, held in the PSL
    or Processor Status Longword register, intact.  The instruction is
    flexible enough that it could potentially be used for data moves
    post-reload, but reportedly it is not the best choice performance-wise,
    and we have no addition operation available that would keep the
    condition codes unchanged.

    Furthermore, as usual with a complex CISC ISA, for many operations we
    have several machine instructions or instruction sequences to choose
    from, each setting the condition codes in a different manner.

    Therefore use the approach where the condition codes only get
    introduced by reload: define instruction splitters for RTL insns that
    change the condition codes in some way, by default considering them
    clobbered.

    Then, to prevent the code generated from regressing too much, provide
    insns that include a `compare' operation setting the condition codes
    in parallel to the main operation.  The manner in which the condition
    codes are set by each insn is conveyed by whatever the SELECT_CC_MODE
    macro expands to.
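
    For illustration, here is a minimal sketch of that idiom; it is not
    the literal vax.md source, and the `VAXint' mode iterator and the
    VAX_PSL_REGNUM register number merely stand in for whatever the port
    actually defines:

        ;; Sketch only: an assumed QI/HI/SI integer mode iterator.
        (define_mode_iterator VAXint [QI HI SI])

        ;; Before reload the addition makes no mention of the condition
        ;; codes at all; once reload has completed, a splitter
        ;; resurfaces them, conservatively as a clobber of the PSL.
        (define_insn_and_split "add<mode>3"
          [(set (match_operand:VAXint 0 "nonimmediate_operand" "=g")
                (plus:VAXint (match_operand:VAXint 1 "general_operand" "g")
                             (match_operand:VAXint 2 "general_operand" "g")))]
          ""
          "#"
          "reload_completed"
          [(parallel
             [(set (match_dup 0)
                   (plus:VAXint (match_dup 1) (match_dup 2)))
              (clobber (reg:CC VAX_PSL_REGNUM))])]
          "")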

    Given that individual patterns provided for the same RTL basic
    operation may set the condition codes differently, keeping the
    information away from the insn patterns themselves would cause a
    maintenance nightmare and would be bound to fail in a horrible way
    sooner or later.  Therefore let the patterns themselves choose which
    condition modes they support, by having one or more subst iterators
    applied, and then have individual comparison operators require the
    specific condition mode each according to the codes used by the
    operation.
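
    For instance, a comparison operator predicate restricted to the codes
    the CCNZ mode can service might look like this sketch (the actual
    predicates.md definitions may differ in the exact code set):

        ;; Hypothetical sketch: only comparisons decidable from the N
        ;; and Z condition codes alone are allowed in CCNZmode.
        (define_predicate "vax_ccnz_comparison_operator"
          (match_code "eq,ne,ge,lt"))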

    While subst iterators only support one alternative each, there is
    actually no problem with applying multiple ones to a single insn, with
    the result as intended, and if the corresponding subst attribute
    supplies an empty NO-SUBST-VALUE, then no mess even results.  Make use
    of this observation.
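
    By way of illustration only (the actual vax.md definitions are more
    elaborate), a subst iterator producing a CCNZ-setting variant of a
    pattern might be written along these lines:

        ;; Hypothetical sketch: replace the post-reload clobber of the
        ;; PSL with a set of it in CCNZmode, made in parallel to the
        ;; main operation, so that a subsequent comparison against zero
        ;; can be eliminated.
        (define_subst "ccnz_subst"
          [(set (match_operand 0 "") (match_operand 1 ""))
           (clobber (reg:CC VAX_PSL_REGNUM))]
          ""
          [(set (reg:CCNZ VAX_PSL_REGNUM)
                (compare:CCNZ (match_dup 1) (const_int 0)))
           (set (match_dup 0) (match_dup 1))])

        ;; An empty NO-SUBST-VALUE keeps the original pattern's name
        ;; unchanged; the substituted variant gets a `_ccnz' suffix.
        (define_subst_attr "ccnz" "ccnz_subst" "" "_ccnz")

    A pattern then opts in simply by carrying `<ccnz>' in its name, and
    further such iterators, say for CCN or CCZ, stack onto the same
    pattern in the same manner.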

    Add appropriate subst iterators to all the computational patterns
    then, according to the condition codes they usably set, including
    DImode ones and a substitute DImode comparison instruction in the
    absence of a CMPQ machine instruction.  However do not provide a
    `cbranchdi4' named pattern, as without further development it
    regresses code quality by resorting to the `__cmpdi2' libcall where a
    simpler operation would do, e.g. to check for negativity the TSTL
    machine instruction may be executed over the upper longword only.
    This is good material for further work.

    Do not apply subst iterators to the increment- or decrement-and-branch
    patterns at this time; these may yet have to be reviewed, in particular
    whether `*jsobneq_minus_one' is still relevant in the context of the
    recent integer constant cost review.

    Also add a couple of peepholes to help eliminate comparisons in some
    problematic cases, such as with the BIT instruction, which performs
    bitwise-AND for the condition codes only and has no direct counterpart
    producing an actual result, because the BIC instruction, which does do
    bitwise-AND and produces a result, implements the operation with a
    bitwise negation of its input `mask' operand.  Or the FFS instruction,
    which sets the Z condition code according to its `field' input operand
    rather than the result produced.  Or the bit-field comparisons we have
    no generic middle-end support for.
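
    As a hypothetical example of the kind of peephole meant here (again
    a sketch rather than the literal vax.md code), the BIT case might be
    handled along these lines, with the replacement relying on an insn
    that produces no result other than the condition codes and is output
    as BIT:

        ;; Sketch: if a bitwise-AND is followed by a comparison of its
        ;; result against zero and the result is dead afterwards, fold
        ;; the pair into a CC-only AND, i.e. the BIT instruction.
        (define_peephole2
          [(parallel
             [(set (match_operand:VAXint 0 "register_operand")
                   (and:VAXint (match_operand:VAXint 1 "general_operand")
                               (match_operand:VAXint 2 "general_operand")))
              (clobber (reg:CC VAX_PSL_REGNUM))])
           (set (reg:CCNZ VAX_PSL_REGNUM)
                (compare:CCNZ (match_dup 0) (const_int 0)))]
          "peep2_reg_dead_p (2, operands[0])"
          [(set (reg:CCNZ VAX_PSL_REGNUM)
                (compare:CCNZ (and:VAXint (match_dup 1) (match_dup 2))
                              (const_int 0)))])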

    Code size stats are as follows, obtained from 17640 and 9086 executables
    built in `check-c' and `check-c++' GCC testing respectively:

                          check-c                 check-c++
                  samples average  median  samples average  median
    ---------------------------------------------------------------
    regressions      1813  0.578%  0.198%      289  0.349%  0.175%
    unchanged       15160  0.000%  0.000%     8662  0.000%  0.000%
    progressions      667 -0.589% -0.194%      135 -0.944% -0.191%
    ---------------------------------------------------------------
    total           17640  0.037%  0.000%     9086 -0.003%  0.000%

    Outliers:

    old     new     change  %change filename
    ----------------------------------------------------
    2406    2950    +544    +22.610 20111208-1.exe
    4314    5329    +1015   +23.528 pr39417.exe
    2235    3055    +820    +36.689 990404-1.exe
    2631    4213    +1582   +60.129 pr57521.exe
    3063    5579    +2516   +82.142 20000422-1.exe

    and:

    old     new     change  %change filename
    ----------------------------------------------------
    6317    4845    -1472   -23.302 vector-compare-1.exe
    6313    4845    -1468   -23.254 vector-compare-1.exe
    6474    5002    -1472   -22.737 vector-compare-1.exe
    6470    5002    -1468   -22.689 vector-compare-1.exe

    We have some code quality regressions like:

        10861:      9e ef d9 12     movab 11b40 <p>,r0
        10865:      00 00 50
        10868:      90 a0 03 a0     movb 0x3(r0),0x2(r0)
        1086c:      02
        1086d:      d1 60 8f 61     cmpl (r0),$0x64646261
        10871:      62 64 64
        10874:      13 07           beql 1087d <main_test+0x21>

    to:

        10861:      9e ef e1 12     movab 11b48 <p>,r0
        10865:      00 00 50
        10868:      90 a0 03 a0     movb 0x3(r0),0x2(r0)
        1086c:      02
        1086d:      d1 ef d5 12     cmpl 11b48 <p>,$0x64646261
        10871:      00 00 8f 61
        10875:      62 64 64
        10878:      13 07           beql 10881 <main_test+0x25>

    (from `memmove-2.x2') due to the constant propagation passes eagerly
    replacing pseudo registers with direct symbol references where possible,
    which does not happen with CC0 even though the passes do run regardless.

    There are further code quality regressions due to earlier compilation
    stages trying to push expression evaluation earlier where possible, so
    as to keep data dependencies further apart from each other.  This
    works well for computations and architectures that do not involve
    condition codes set as a side effect of calculations.  However for
    integer negation it makes for assembly code like:

            movb *8(%ap),%r0
            mnegb %r0,%r1
            tstb %r0
            jeql .L2

    the RTL equivalent of which the comparison elimination pass cannot
    really do anything about, because the comparison is made on the source
    rather than the target operand of the negation (we could add a
    peephole for this, but it seems a futile effort, as one would have to
    iterate over all the possible such cases), even though this is really
    equivalent to:

            movb *8(%ap),%r0
            mnegb %r0,%r1
            jeql .L2

    or, if R0 is dead at the conclusion of the branch, even:

            mnegb *8(%ap),%r1
            jeql .L2

    Since the compiler insists on doing the comparison on the source of
    the negation, it obviously has to load it into a temporary so as to
    avoid accessing the original memory location twice, hence the sequence
    of three instructions rather than just a single one.  A similar
    phenomenon can be observed with the XOR operation and in other cases.

    In some cases a comparison does get eliminated, and yet useless moves
    into registers done in preparation for it remain, such as with:

            movb *8(%ap),%r2
            movb *12(%ap),%r1
            subb3 %r1,%r2,%r0
            jlssu .L2

    where R1 and R2 are both dead at the conclusion and therefore:

            subb3 *12(%ap),*8(%ap),%r0
            jlssu .L2

    would obviously do, but there used to be a comparison before the
    branch:

            cmpb %r2,%r1

    All this looks like material for future improvement.

    Test cases for comparison elimination and the peepholes will be supplied
    separately.

            gcc/
            PR target/95294
            * config/vax/elf.h (REGISTER_NAMES): Append `%psl'.
            * config/vax/vax-modes.def (CCN, CCNZ, CCZ): New modes.
            * config/vax/vax-protos.h (vax_select_cc_mode): New prototype.
            (vax_maybe_split_dimode_move): Likewise.
            (vax_notice_update_cc): Remove prototype.
            * config/vax/vax.c (TARGET_FLAGS_REGNUM): New macro.
            (TARGET_CC_MODES_COMPATIBLE): Likewise.
            (TARGET_MD_ASM_ADJUST): Likewise.
            (vax_select_cc_mode): New function.
            (vax_cc_modes_compatible): Likewise.
            (vax_md_asm_adjust): Likewise.
            (vax_notice_update_cc): Remove function.
            (vax_output_int_move): Factor out code checking if a DImode move
            may have to be split...
            (vax_maybe_split_dimode_move): ... into this new function.
            * config/vax/vax.h (FIRST_PSEUDO_REGISTER): Bump up.
            (FIXED_REGISTERS): Append an entry for PSL.
            (CALL_USED_REGISTERS): Likewise.
            (NOTICE_UPDATE_CC, OUTPUT_JUMP): Remove macros.
            (SELECT_CC_MODE): New macro.
            (REGISTER_NAMES): Append `psl'.
            * config/vax/predicates.md (const_zero_operand)
            (vax_cc_comparison_operator, vax_ccn_comparison_operator)
            (vax_ccnz_comparison_operator, vax_ccz_comparison_operator):
            New predicates.
            * config/vax/builtins.md: Rewrite for MODE_CC representation.
            * config/vax/vax.md: Likewise.
