On Wed, Sep 11, 2024 at 01:22:30PM +0200, Ilya Leoshkevich wrote: > On Wed, 2024-09-11 at 12:35 +0200, Stefan Schulze Frielinghaus wrote: > > On Wed, Sep 11, 2024 at 11:47:54AM +0200, Ilya Leoshkevich wrote: > > > On Fri, 2024-08-16 at 09:41 +0200, Stefan Schulze Frielinghaus > > > wrote: > > > > Currently subregs originating from *tf_to_fprx2_0 and > > > > *tf_to_fprx2_1 > > > > survive register allocation. This in turn leads to wrong > > > > register > > > > renaming. Keeping the current approach would mean we need two > > > > insns > > > > for > > > > *tf_to_fprx2_0 and *tf_to_fprx2_1, respectively. Something along > > > > the > > > > lines > > > > > > > > (define_insn "*tf_to_fprx2_0" > > > > [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" > > > > "=f") 0) > > > > (unspec:DF [(match_operand:TF 1 "general_operand" "v")] > > > > UNSPEC_TF_TO_FPRX2_0))] > > > > "TARGET_VXE" > > > > "#") > > > > > > > > (define_insn "*tf_to_fprx2_0" > > > > [(set (match_operand:DF 0 "nonimmediate_operand" "=f") > > > > (unspec:DF [(match_operand:TF 1 "general_operand" "v")] > > > > UNSPEC_TF_TO_FPRX2_0))] > > > > "TARGET_VXE" > > > > "vpdi\t%v0,%v1,%v0,1 > > > > [(set_attr "op_type" "VRR")]) > > > > > > > > and similar for *tf_to_fprx2_1. Note, pre register allocation > > > > operand 0 > > > > has mode FPRX2 and afterwards DF once subregs have been > > > > eliminated. > > > > > > > > Since we always copy a whole vector register into a floating- > > > > point > > > > register pair, another way to fix this is to merge *tf_to_fprx2_0 > > > > and > > > > *tf_to_fprx2_1 into a single insn which means we don't have to > > > > use > > > > subregs at all. The downside of this is that the assembler > > > > template > > > > contains two instructions, now. The upside is that we don't have > > > > to > > > > come up with some artificial insn before RA which might be more > > > > readable/maintainable. That is implemented by this patch. > > > > > > > > In commit r11-4872-ge627cda5686592, the output operand specifier > > > > %V > > > > was > > > > introduced which is used in tf_to_fprx2 only, now. I didn't come > > > > up > > > > with its counterpart like %F for floating-point registers. > > > > Instead I > > > > printed the register pair in the output function directly. This > > > > spares > > > > us a new and "rare" format specifier for a single insn. I don't > > > > have > > > > a > > > > strong opinion which option to choose, however, we should either > > > > add > > > > %F > > > > in order to mimic the same behaviour as %V or getting rid of %V > > > > and > > > > inline the logic in the output function. I lean towards the > > > > latter. > > > > Any preferences? > > > > --- > > > > gcc/config/s390/s390.md | 2 + > > > > gcc/config/s390/vector.md | 66 +++++++++++----- > > > > ---- > > > > -- > > > > gcc/testsuite/gcc.target/s390/pr115860-1.c | 26 +++++++++ > > > > 3 files changed, 60 insertions(+), 34 deletions(-) > > > > create mode 100644 gcc/testsuite/gcc.target/s390/pr115860-1.c > > > > > > [...] > > > > > > > + char buf[64]; > > > > + switch (which_alternative) > > > > + { > > > > + case 0: > > > > + if (REGNO (operands[0]) == REGNO (operands[1])) > > > > + return "vpdi\t%V0,%v1,%V0,5"; > > > > + else > > > > + return "ldr\t%f0,%f1;vpdi\t%V0,%v1,%V0,5"; > > > > + case 1: > > > > + { > > > > + const char *reg_pair = reg_names[REGNO (operands[0]) + > > > > 1]; > > > > + snprintf (buf, sizeof (buf), > > > > "ld\t%%f0,%%1;ld\t%%%s,8+%%1", > > > > reg_pair); > > > > > > I wonder if there is a corner case where 8+ does not fit into short > > > displacement? > > > > That is covered by constraint AR, i.e., for short displacement, and > > AT > > for long displacement. > > Don't they cover only %1, and not 8+%1? Can't there be a situation > where %1 barely fits and 8+%1 doesn't fit? A quick glance shows that > the code doesn't leave any allowance for this: > > "AR" > s390_mem_constraint("AR") > s390_check_qrst_address('R') > s390_short_displacement() > INTVAL (disp) >= 0 && INTVAL (disp) < 4096
Isn't this covered by int s390_mem_constraint (const char *str, rtx op) { char c = str[0]; switch (c) { case 'A': /* Check for offsettable variants of memory constraints. */ if (!MEM_P (op) || MEM_VOLATILE_P (op)) return 0; if ((reload_completed || reload_in_progress) ? !offsettable_memref_p (op) : !offsettable_nonstrict_memref_p (op)) return 0; where /* Return true if OP is a memory reference whose address contains no side effects and remains valid after the addition of a positive integer less than the size of the object being referenced. We assume that the original address is valid and do not check it. This uses strict_memory_address_p as a subroutine, so don't use it before reload. */ bool offsettable_memref_p (rtx op) { return ((MEM_P (op)) && offsettable_address_addr_space_p (1, GET_MODE (op), XEXP (op, 0), MEM_ADDR_SPACE (op))); } guarantees that whatever we add to the address such that we stay inside the object, the address is valid. Or do I miss something?