On Wed, Sep 11, 2024 at 11:47:54AM +0200, Ilya Leoshkevich wrote: > On Fri, 2024-08-16 at 09:41 +0200, Stefan Schulze Frielinghaus wrote: > > Currently subregs originating from *tf_to_fprx2_0 and *tf_to_fprx2_1 > > survive register allocation. This in turn leads to wrong register > > renaming. Keeping the current approach would mean we need two insns > > for > > *tf_to_fprx2_0 and *tf_to_fprx2_1, respectively. Something along the > > lines > > > > (define_insn "*tf_to_fprx2_0" > > [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" > > "=f") 0) > > (unspec:DF [(match_operand:TF 1 "general_operand" "v")] > > UNSPEC_TF_TO_FPRX2_0))] > > "TARGET_VXE" > > "#") > > > > (define_insn "*tf_to_fprx2_0" > > [(set (match_operand:DF 0 "nonimmediate_operand" "=f") > > (unspec:DF [(match_operand:TF 1 "general_operand" "v")] > > UNSPEC_TF_TO_FPRX2_0))] > > "TARGET_VXE" > > "vpdi\t%v0,%v1,%v0,1 > > [(set_attr "op_type" "VRR")]) > > > > and similar for *tf_to_fprx2_1. Note, pre register allocation > > operand 0 > > has mode FPRX2 and afterwards DF once subregs have been eliminated. > > > > Since we always copy a whole vector register into a floating-point > > register pair, another way to fix this is to merge *tf_to_fprx2_0 and > > *tf_to_fprx2_1 into a single insn which means we don't have to use > > subregs at all. The downside of this is that the assembler template > > contains two instructions, now. The upside is that we don't have to > > come up with some artificial insn before RA which might be more > > readable/maintainable. That is implemented by this patch. > > > > In commit r11-4872-ge627cda5686592, the output operand specifier %V > > was > > introduced which is used in tf_to_fprx2 only, now. I didn't come up > > with its counterpart like %F for floating-point registers. Instead I > > printed the register pair in the output function directly. This > > spares > > us a new and "rare" format specifier for a single insn. I don't have > > a > > strong opinion which option to choose, however, we should either add > > %F > > in order to mimic the same behaviour as %V or getting rid of %V and > > inline the logic in the output function. I lean towards the latter. > > Any preferences? > > --- > > gcc/config/s390/s390.md | 2 + > > gcc/config/s390/vector.md | 66 +++++++++++--------- > > -- > > gcc/testsuite/gcc.target/s390/pr115860-1.c | 26 +++++++++ > > 3 files changed, 60 insertions(+), 34 deletions(-) > > create mode 100644 gcc/testsuite/gcc.target/s390/pr115860-1.c > > [...] > > > + char buf[64]; > > + switch (which_alternative) > > + { > > + case 0: > > + if (REGNO (operands[0]) == REGNO (operands[1])) > > + return "vpdi\t%V0,%v1,%V0,5"; > > + else > > + return "ldr\t%f0,%f1;vpdi\t%V0,%v1,%V0,5"; > > + case 1: > > + { > > + const char *reg_pair = reg_names[REGNO (operands[0]) + 1]; > > + snprintf (buf, sizeof (buf), "ld\t%%f0,%%1;ld\t%%%s,8+%%1", > > reg_pair); > > I wonder if there is a corner case where 8+ does not fit into short > displacement?
That is covered by constraint AR, i.e., for short displacement, and AT for long displacement.