On 11/19/20 8:36 PM, Maciej W. Rozycki wrote:
> In the VAX ISA, the INSV bitfield-insert instruction is the only
> computational operation that leaves the condition codes, held in the
> PSL or Processor Status Longword register, intact. The instruction is
> flexible enough that it could potentially be used for data moves
> post-reload, but reportedly it is not the best choice
> performance-wise, and we have no addition operation available that
> would keep the condition codes unchanged.
>
> Furthermore, as is usual with a complex CISC ISA, for many operations
> we have several machine instructions or instruction sequences to
> choose from, each setting the condition codes in a different manner.
>
> Therefore take the approach where the condition codes only get
> introduced by reload, by defining instruction splitters for RTL insns
> that change the condition codes in some way, considering them
> clobbered by default.
>
> Then, to prevent the generated code from regressing too much, provide
> insns that include a `compare' operation setting the condition codes
> in parallel to the main operation. The manner in which each insn sets
> the condition codes is conveyed by whatever the SELECT_CC_MODE macro
> expands to.
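>
> To make the scheme concrete, here is a hand-written sketch of the two
> flavors such a pattern might come in; the constraints, templates and
> the VAX_PSL_REGNUM macro are illustrative assumptions rather than
> code copied from the patch:
>
> ```
> ;; Plain variant: the PSL is merely clobbered by the addition.
> (define_insn "*addsi3"
>   [(set (match_operand:SI 0 "nonimmediate_operand" "=g")
>         (plus:SI (match_operand:SI 1 "general_operand" "g")
>                  (match_operand:SI 2 "general_operand" "g")))
>    (clobber (reg:CC VAX_PSL_REGNUM))]
>   ""
>   "addl3 %1,%2,%0")
>
> ;; Parallel-compare variant: the same operation additionally sets
> ;; the condition codes, in the mode SELECT_CC_MODE has chosen.
> (define_insn "*addsi3_ccnz"
>   [(set (reg:CCNZ VAX_PSL_REGNUM)
>         (compare:CCNZ
>           (plus:SI (match_operand:SI 1 "general_operand" "g")
>                    (match_operand:SI 2 "general_operand" "g"))
>           (const_int 0)))
>    (set (match_operand:SI 0 "nonimmediate_operand" "=g")
>         (plus:SI (match_dup 1) (match_dup 2)))]
>   ""
>   "addl3 %1,%2,%0")
> ```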
>
> Given that individual patterns provided for the same RTL basic
> operation may set the condition codes differently, keeping this
> information away from the insn patterns themselves would cause a
> maintenance nightmare and would be bound to fail in a horrible way
> sooner or later. Therefore instead let the patterns themselves choose
> which condition modes they support, by having one or more subst
> iterators applied, and then have individual comparison operators
> require the specific condition mode, each according to the codes used
> by the operation.
>
> While subst iterators only support one alternative each, there is
> actually no problem with applying multiple ones to a single insn, with
> the result as intended, and if the corresponding subst attribute
> supplies an empty NO-SUBST-VALUE, then no mess results either. Make
> use of this observation.
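>
> As an illustration of the stacking, a subst iterator of this kind
> might be written as follows; the names and the VAX_PSL_REGNUM macro
> are invented for this sketch, not taken verbatim from the patch:
>
> ```
> ;; Replace a bare PSL clobber with a parallel compare in CCNZ mode.
> (define_subst "ccnz_subst"
>   [(set (match_operand 0 "" "")
>         (match_operand 1 "" ""))
>    (clobber (reg:CC VAX_PSL_REGNUM))]
>   ""
>   [(set (reg:CCNZ VAX_PSL_REGNUM)
>         (compare:CCNZ (match_dup 1) (const_int 0)))
>    (set (match_dup 0) (match_dup 1))])
>
> ;; The NO-SUBST-VALUE is empty, so applying several such iterators
> ;; to one insn leaves the original variant's name unchanged.
> (define_subst_attr "ccnz" "ccnz_subst" "" "_ccnz")
> ```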
>
> Add appropriate subst iterators to all the computational patterns
> then, according to the condition codes they usably set, including the
> DImode ones, and add a substitute DImode comparison instruction in the
> absence of a CMPQ machine instruction. However, do not provide a
> `cbranchdi4' named pattern, as without further development it
> regresses code quality by resorting to the `__cmpdi2' libcall where a
> simpler operation would do, e.g. to check for negativity the TSTL
> machine instruction may be executed over the upper longword only.
> This is good material for further work.
>
> Do not apply subst iterators to the increment- or decrement-and-branch
> patterns at this time; these may yet have to be reviewed, in particular
> whether `*jsobneq_minus_one' is still relevant in the context of the
> recent integer constant cost review.
>
> Also add a couple of peepholes to help eliminate comparisons in some
> problematic cases, such as with the BIT instruction, which is a
> bitwise-AND for the condition codes only and has no direct counterpart
> for the actual calculation, because the BIC instruction, which does do
> a bitwise-AND and produces a result, implements the operation with a
> bitwise negation of its input `mask' operand. Or with the FFS
> instruction, which sets the Z condition code according to its `field'
> input operand rather than the result produced. Or with the bitfield
> comparisons we have no generic middle-end support for.
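>
> For instance, the flags-only AND test might look like the sketch
> below (names, constraints and the VAX_PSL_REGNUM macro are
> illustrative assumptions); since BIT computes the AND solely to set
> the condition codes, there is no matching variant that also stores a
> result, which is what makes a peephole necessary:
>
> ```
> (define_insn "*bitsi"
>   [(set (reg:CCNZ VAX_PSL_REGNUM)
>         (compare:CCNZ
>           (and:SI (match_operand:SI 0 "general_operand" "g")
>                   (match_operand:SI 1 "general_operand" "g"))
>           (const_int 0)))]
>   ""
>   "bitl %0,%1")
> ```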
>
> Code size stats are as follows, obtained from 17640 and 9086 executables
> built in `check-c' and `check-c++' GCC testing respectively:
>
>                    check-c                 check-c++
>             samples  average  median  samples  average  median
> --------------------------------------------------------------
> regressions    1813   0.578%  0.198%      289   0.349%  0.175%
> unchanged     15160   0.000%  0.000%     8662   0.000%  0.000%
> progressions    667  -0.589% -0.194%      135  -0.944% -0.191%
> --------------------------------------------------------------
> total         17640   0.037%  0.000%     9086  -0.003%  0.000%
>
> Outliers:
>
>    old    new  change  %change  filename
> ----------------------------------------------------
>   2406   2950    +544  +22.610  20111208-1.exe
>   4314   5329   +1015  +23.528  pr39417.exe
>   2235   3055    +820  +36.689  990404-1.exe
>   2631   4213   +1582  +60.129  pr57521.exe
>   3063   5579   +2516  +82.142  20000422-1.exe
>
> and:
>
>    old    new  change  %change  filename
> ----------------------------------------------------
>   6317   4845   -1472  -23.302  vector-compare-1.exe
>   6313   4845   -1468  -23.254  vector-compare-1.exe
>   6474   5002   -1472  -22.737  vector-compare-1.exe
>   6470   5002   -1468  -22.689  vector-compare-1.exe
>
> We have some code quality regressions like:
>
> 10861: 9e ef d9 12 movab 11b40 <p>,r0
> 10865: 00 00 50
> 10868: 90 a0 03 a0 movb 0x3(r0),0x2(r0)
> 1086c: 02
> 1086d: d1 60 8f 61 cmpl (r0),$0x64646261
> 10871: 62 64 64
> 10874: 13 07 beql 1087d <main_test+0x21>
>
> to:
>
> 10861: 9e ef e1 12 movab 11b48 <p>,r0
> 10865: 00 00 50
> 10868: 90 a0 03 a0 movb 0x3(r0),0x2(r0)
> 1086c: 02
> 1086d: d1 ef d5 12 cmpl 11b48 <p>,$0x64646261
> 10871: 00 00 8f 61
> 10875: 62 64 64
> 10878: 13 07 beql 10881 <main_test+0x25>
>
> (from `memmove-2.x2') due to the constant propagation passes eagerly
> replacing pseudo registers with direct symbol references where possible,
> which does not happen with CC0 even though the passes do run regardless.
>
> There are further code quality regressions due to earlier compilation
> stages trying to push expression evaluation earlier where possible so
> as to move data dependencies further apart from each other. This
> works well for computations and architectures that do not involve
> condition codes set as a side effect of calculations. However for
> integer negation this makes the assembly code produced look like:
>
> movb *8(%ap),%r0
> mnegb %r0,%r1
> tstb %r0
> jeql .L2
>
> the RTL equivalent of which the comparison elimination pass cannot
> really do anything about, because the comparison is made on the source
> rather than the target operand of the negation (we could add a
> peephole for this, but it seems a futile effort, as one would have to
> iterate over all the possible such cases), even though this is really
> equivalent to:
>
> movb *8(%ap),%r0
> mnegb %r0,%r1
> jeql .L2
>
> or, if R0 is dead at the conclusion of the branch, even:
>
> mnegb *8(%ap),%r1
> jeql .L2
>
> Since the compiler insists on doing the comparison on the source of the
> negation it obviously has to load it into a temporary so as to avoid
> accessing the original memory location twice, hence the sequence of
> three instructions rather than just a single one. A similar phenomenon
> can be observed with the XOR operation and in other cases.
>
> In some cases a comparison does get eliminated, however the useless
> moves into registers done in preparation for it remain, such as with:
>
> movb *8(%ap),%r2
> movb *12(%ap),%r1
> subb3 %r1,%r2,%r0
> jlssu .L2
>
> where R1 and R2 are both dead at conclusion and therefore:
>
> subb3 *12(%ap),*8(%ap),%r0
> jlssu .L2
>
> would obviously do, but a comparison was to be made before the branch:
>
> cmpb %r2,%r1
>
> All this looks like material for future improvement.
>
> Test cases for comparison elimination and the peepholes will be supplied
> separately.
>
> gcc/
>         PR target/95294
>         * config/vax/elf.h (REGISTER_NAMES): Append `%psl'.
>         * config/vax/vax-modes.def (CCN, CCNZ, CCZ): New modes.
>         * config/vax/vax-protos.h (vax_select_cc_mode): New prototype.
>         (vax_maybe_split_dimode_move): Likewise.
>         (vax_notice_update_cc): Remove prototype.
>         * config/vax/vax.c (TARGET_FLAGS_REGNUM): New macro.
>         (TARGET_CC_MODES_COMPATIBLE): Likewise.
>         (TARGET_MD_ASM_ADJUST): Likewise.
>         (vax_select_cc_mode): New function.
>         (vax_cc_modes_compatible): Likewise.
>         (vax_md_asm_adjust): Likewise.
>         (vax_notice_update_cc): Remove function.
>         (vax_output_int_move): Factor out code checking if a DImode
>         move may have to be split...
>         (vax_maybe_split_dimode_move): ... into this new function.
>         * config/vax/vax.h (FIRST_PSEUDO_REGISTER): Bump up.
>         (FIXED_REGISTERS): Append an entry for PSL.
>         (CALL_USED_REGISTERS): Likewise.
>         (NOTICE_UPDATE_CC, OUTPUT_JUMP): Remove macros.
>         (SELECT_CC_MODE): New macro.
>         (REGISTER_NAMES): Append `psl'.
>         * config/vax/predicates.md (const_zero_operand)
>         (vax_cc_comparison_operator, vax_ccn_comparison_operator)
>         (vax_ccnz_comparison_operator, vax_ccz_comparison_operator):
>         New predicates.
>         * config/vax/builtins.md: Rewrite for MODE_CC representation.
>         * config/vax/vax.md: Likewise.
So I just spot-checked this and the bits I looked at were quite sane.
Given that, and the fact that the port is on the chopping block if we
don't get the conversion done, I think it is fine to ACK this now; if
we want to iterate on things further we certainly can.
Jeff