On 11/19/20 8:36 PM, Maciej W. Rozycki wrote:
> In the VAX ISA, the INSV bitfield-insert instruction is the only
> computational operation that leaves the condition codes, held in the
> PSL or Processor Status Longword register, intact. The instruction is
> flexible enough that it could potentially be used for data moves
> post-reload, but reportedly it is not the best choice
> performance-wise, and we have no addition operation available that
> would keep the condition codes unchanged.
>
> Furthermore, as is usual with a complex CISC ISA, for many operations
> we have several machine instructions or instruction sequences to
> choose from, each setting the condition codes in a different manner.
>
> Therefore take the approach where the condition codes only get
> introduced by reload, by defining instruction splitters for RTL insns
> that change the condition codes in some way, considering them
> clobbered by default.
>
> Then, to prevent the generated code from regressing too much, provide
> insns that include a `compare' operation setting the condition codes
> in parallel to the main operation. The manner in which each insn sets
> the condition codes is conveyed by whatever the SELECT_CC_MODE macro
> expands to.
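>
> To make the scheme concrete, here is a hand-written sketch of the two
> flavors such a pattern might come in; the constraints, templates and
> the VAX_PSL_REGNUM macro are illustrative assumptions rather than
> code copied from the patch:
>
> ```
> ;; Plain variant: the PSL is merely clobbered by the addition.
> (define_insn "*addsi3"
>   [(set (match_operand:SI 0 "nonimmediate_operand" "=g")
>         (plus:SI (match_operand:SI 1 "general_operand" "g")
>                  (match_operand:SI 2 "general_operand" "g")))
>    (clobber (reg:CC VAX_PSL_REGNUM))]
>   ""
>   "addl3 %1,%2,%0")
>
> ;; Parallel-compare variant: the same operation additionally sets
> ;; the condition codes, in the mode SELECT_CC_MODE has chosen.
> (define_insn "*addsi3_ccnz"
>   [(set (reg:CCNZ VAX_PSL_REGNUM)
>         (compare:CCNZ
>           (plus:SI (match_operand:SI 1 "general_operand" "g")
>                    (match_operand:SI 2 "general_operand" "g"))
>           (const_int 0)))
>    (set (match_operand:SI 0 "nonimmediate_operand" "=g")
>         (plus:SI (match_dup 1) (match_dup 2)))]
>   ""
>   "addl3 %1,%2,%0")
> ```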
>
> Given that individual patterns provided for the same RTL basic
> operation may set the condition codes differently, keeping this
> information away from the insn patterns themselves would cause a
> maintenance nightmare and would be bound to fail in a horrible way
> sooner or later. Therefore instead let the patterns themselves choose
> which condition modes they support, by having one or more subst
> iterators applied, and then have individual comparison operators
> require the specific condition mode, each according to the codes used
> by the operation.
>
> While subst iterators only support one alternative each, there is
> actually no problem with applying multiple ones to a single insn, with
> the result as intended, and if the corresponding subst attribute
> supplies an empty NO-SUBST-VALUE, then no mess results either. Make
> use of this observation.
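>
> As an illustration of the stacking, a subst iterator of this kind
> might be written as follows; the names and the VAX_PSL_REGNUM macro
> are invented for this sketch, not taken verbatim from the patch:
>
> ```
> ;; Replace a bare PSL clobber with a parallel compare in CCNZ mode.
> (define_subst "ccnz_subst"
>   [(set (match_operand 0 "" "")
>         (match_operand 1 "" ""))
>    (clobber (reg:CC VAX_PSL_REGNUM))]
>   ""
>   [(set (reg:CCNZ VAX_PSL_REGNUM)
>         (compare:CCNZ (match_dup 1) (const_int 0)))
>    (set (match_dup 0) (match_dup 1))])
>
> ;; The NO-SUBST-VALUE is empty, so applying several such iterators
> ;; to one insn leaves the original variant's name unchanged.
> (define_subst_attr "ccnz" "ccnz_subst" "" "_ccnz")
> ```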
>
> Add appropriate subst iterators to all the computational patterns
> then, according to the condition codes they usably set, including the
> DImode ones, and add a substitute DImode comparison instruction in the
> absence of a CMPQ machine instruction. However, do not provide a
> `cbranchdi4' named pattern, as without further development it
> regresses code quality by resorting to the `__cmpdi2' libcall where a
> simpler operation would do, e.g. to check for negativity the TSTL
> machine instruction may be executed over the upper longword only.
> This is good material for further work.
>
> Do not apply subst iterators to the increment- or decrement-and-branch
> patterns at this time; these may yet have to be reviewed, in particular
> whether `*jsobneq_minus_one' is still relevant in the context of the
> recent integer constant cost review.
>
> Also add a couple of peepholes to help eliminate comparisons in some
> problematic cases, such as with the BIT instruction, which is a
> bitwise-AND for the condition codes only and has no direct counterpart
> for the actual calculation, because the BIC instruction, which does do
> a bitwise-AND and produces a result, implements the operation with a
> bitwise negation of its input `mask' operand. Or with the FFS
> instruction, which sets the Z condition code according to its `field'
> input operand rather than the result produced. Or with the bitfield
> comparisons we have no generic middle-end support for.
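>
> For instance, the flags-only AND test might look like the sketch
> below (names, constraints and the VAX_PSL_REGNUM macro are
> illustrative assumptions); since BIT computes the AND solely to set
> the condition codes, there is no matching variant that also stores a
> result, which is what makes a peephole necessary:
>
> ```
> (define_insn "*bitsi"
>   [(set (reg:CCNZ VAX_PSL_REGNUM)
>         (compare:CCNZ
>           (and:SI (match_operand:SI 0 "general_operand" "g")
>                   (match_operand:SI 1 "general_operand" "g"))
>           (const_int 0)))]
>   ""
>   "bitl %0,%1")
> ```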
>
> Code size stats are as follows, obtained from 17640 and 9086 executables
> built in `check-c' and `check-c++' GCC testing respectively:
>
>                    check-c                 check-c++
>             samples  average  median  samples  average  median
> --------------------------------------------------------------
> regressions    1813   0.578%  0.198%      289   0.349%  0.175%
> unchanged     15160   0.000%  0.000%     8662   0.000%  0.000%
> progressions    667  -0.589% -0.194%      135  -0.944% -0.191%
> --------------------------------------------------------------
> total         17640   0.037%  0.000%     9086  -0.003%  0.000%
>
> Outliers:
>
>    old    new  change  %change  filename
> ----------------------------------------------------
>   2406   2950    +544  +22.610  20111208-1.exe
>   4314   5329   +1015  +23.528  pr39417.exe
>   2235   3055    +820  +36.689  990404-1.exe
>   2631   4213   +1582  +60.129  pr57521.exe
>   3063   5579   +2516  +82.142  20000422-1.exe
>
> and:
>
>    old    new  change  %change  filename
> ----------------------------------------------------
>   6317   4845   -1472  -23.302  vector-compare-1.exe
>   6313   4845   -1468  -23.254  vector-compare-1.exe
>   6474   5002   -1472  -22.737  vector-compare-1.exe
>   6470   5002   -1468  -22.689  vector-compare-1.exe
>
> We have some code quality regressions like:
>
> 10861: 9e ef d9 12 movab 11b40 <p>,r0
> 10865: 00 00 50
> 10868: 90 a0 03 a0 movb 0x3(r0),0x2(r0)
> 1086c: 02
> 1086d: d1 60 8f 61 cmpl (r0),$0x64646261
> 10871: 62 64 64
> 10874: 13 07 beql 1087d <main_test+0x21>
>
> to:
>
> 10861: 9e ef e1 12 movab 11b48 <p>,r0
> 10865: 00 00 50
> 10868: 90 a0 03 a0 movb 0x3(r0),0x2(r0)
> 1086c: 02
> 1086d: d1 ef d5 12 cmpl 11b48 <p>,$0x64646261
> 10871: 00 00 8f 61
> 10875: 62 64 64
> 10878: 13 07 beql 10881 <main_test+0x25>
>
> (from `memmove-2.x2') due to the constant propagation passes eagerly
> replacing pseudo registers with direct symbol references where possible,
> which does not happen with CC0 even though the passes do run regardless.
>
> There are further code quality regressions due to earlier compilation
> stages trying to push expression evaluation earlier where possible so
> as to move data dependencies further apart from each other. This
> works well for computations and architectures that do not involve
> condition codes set as a side effect of calculations. However for
> integer negation this makes the assembly code produced look like:
>
> movb *8(%ap),%r0
> mnegb %r0,%r1
> tstb %r0
> jeql .L2
>
> the RTL equivalent of which the comparison elimination pass cannot
> really do anything about, because the comparison is made on the source
> rather than the target operand of the negation (we could add a
> peephole for this, but it seems a futile effort, as one would have to
> iterate over all the possible such cases), even though this is really
> equivalent to:
>
> movb *8(%ap),%r0
> mnegb %r0,%r1
> jeql .L2
>
> or, if R0 is dead at the conclusion of the branch, even:
>
> mnegb *8(%ap),%r1
> jeql .L2
>
> Since the compiler insists on doing the comparison on the source of the
> negation it obviously has to load it into a temporary so as to avoid
> accessing the original memory location twice, hence the sequence of
> three instructions rather than just a single one. A similar phenomenon
> can be observed with the XOR operation and in other cases.
>
> In some cases a comparison does get eliminated, however the useless
> moves into registers done in preparation for it remain, such as with:
>
> movb *8(%ap),%r2
> movb *12(%ap),%r1
> subb3 %r1,%r2,%r0
> jlssu .L2
>
> where R1 and R2 are both dead at conclusion and therefore:
>
> subb3 *12(%ap),*8(%ap),%r0
> jlssu .L2
>
> would obviously do, but a comparison was to be made before the branch:
>
> cmpb %r2,%r1
>
> All this looks like material for future improvement.
>
> Test cases for comparison elimination and the peepholes will be supplied
> separately.
>
> gcc/
>         PR target/95294
>         * config/vax/elf.h (REGISTER_NAMES): Append `%psl'.
>         * config/vax/vax-modes.def (CCN, CCNZ, CCZ): New modes.
>         * config/vax/vax-protos.h (vax_select_cc_mode): New prototype.
>         (vax_maybe_split_dimode_move): Likewise.
>         (vax_notice_update_cc): Remove prototype.
>         * config/vax/vax.c (TARGET_FLAGS_REGNUM): New macro.
>         (TARGET_CC_MODES_COMPATIBLE): Likewise.
>         (TARGET_MD_ASM_ADJUST): Likewise.
>         (vax_select_cc_mode): New function.
>         (vax_cc_modes_compatible): Likewise.
>         (vax_md_asm_adjust): Likewise.
>         (vax_notice_update_cc): Remove function.
>         (vax_output_int_move): Factor out code checking if a DImode
>         move may have to be split...
>         (vax_maybe_split_dimode_move): ... into this new function.
>         * config/vax/vax.h (FIRST_PSEUDO_REGISTER): Bump up.
>         (FIXED_REGISTERS): Append an entry for PSL.
>         (CALL_USED_REGISTERS): Likewise.
>         (NOTICE_UPDATE_CC, OUTPUT_JUMP): Remove macros.
>         (SELECT_CC_MODE): New macro.
>         (REGISTER_NAMES): Append `psl'.
>         * config/vax/predicates.md (const_zero_operand)
>         (vax_cc_comparison_operator, vax_ccn_comparison_operator)
>         (vax_ccnz_comparison_operator, vax_ccz_comparison_operator):
>         New predicates.
>         * config/vax/builtins.md: Rewrite for MODE_CC representation.
>         * config/vax/vax.md: Likewise.
So I just spot-checked this and the bits I looked at were quite sane.
Given that, and the fact that the port is on the chopping block if we
don't get the conversion done, I think it is fine to ACK this now; if
we want to iterate on things further we certainly can.
Jeff