Secondary reload and pseudos

Stefan Schulze Frielinghaus via Gcc Fri, 08 Nov 2024 04:25:36 -0800

Hi all,

For older s390 machines, which do not support vector extensions, I'm trying to
implement 2-byte GPR<->FPR moves.  Since GPRs are right-aligned and FPRs
left-aligned, I cannot trivially copy between them.  However, at least since
z9-ec (TARGET_DFP) we have the instructions ldgr/lgdr to move 8-byte
values between GPRs and FPRs.  A 2-byte GPR-to-FPR move could be implemented
by shifting the input GPR left by 48 bits and storing the result in a scratch
GPR, which is then copied into an FPR by an 8-byte move.  This means that
for a GPR-to-FPR move I need a scratch register, which I could
allocate during reload.  So far I came up with


if (TARGET_DFP && !TARGET_VX && GET_MODE_SIZE (mode) == 2
    && ((in_p && (REG_P (x) || (SUBREG_P (x) && REG_P (SUBREG_REG (x))))
         && reg_classes_intersect_p (rclass, FP_REGS))
        || (!in_p && FP_REGNO_P (true_regnum (x))
            && reg_classes_intersect_p (rclass, GENERAL_REGS))))
  {
    sri->icode = CODE_FOR_reload_half_into_fpr;
    return NO_REGS;
  }

in TARGET_SECONDARY_RELOAD and the corresponding expander

(define_expand "reload_half_into_fpr"
  [(parallel [(match_operand    0 "register_operand" "=f")
              (match_operand    1 "register_operand"  "d")
              (match_operand:DI 2 "register_operand" "=d")])]
  "TARGET_DFP"
{
  gcc_assert (FP_REGNO_P (true_regnum (operands[0])));
  operands[1] = simplify_gen_subreg (DImode, operands[1],
                                     GET_MODE (operands[1]), 0);
  emit_insn (gen_rtx_SET (operands[2], operands[1]));
  emit_insn (gen_rtx_SET (operands[2],
                          gen_rtx_ASHIFT (DImode, operands[2],
                                          GEN_INT (48))));
  operands[2] = simplify_gen_subreg (DFmode, operands[2],
                                     GET_MODE (operands[2]), 0);
  emit_insn (gen_rtx_SET (gen_rtx_REG (DFmode, true_regnum (operands[0])),
                          operands[2]));
  DONE;
})
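
For illustration only, the bit manipulation performed by the three emitted
insns can be modelled in plain C.  This is a sketch, not GCC code: the
function names are hypothetical, a 64-bit integer stands in for the contents
of both the GPR and the FPR, and the FPR-to-GPR direction is assumed to be the
mirror image (lgdr followed by a right shift by 48).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the GPR-to-FPR sequence: the 2-byte value sits
   right-aligned in a 64-bit GPR and must end up left-aligned in the FPR.  */
uint64_t
half_gpr_to_fpr_image (uint64_t gpr)
{
  /* Step 1: view the GPR as a full 8-byte value and copy it to the scratch.  */
  uint64_t scratch = gpr;
  /* Step 2: shift left by 48; the upper 48 bits of the GPR are shifted out.  */
  scratch <<= 48;
  /* Step 3: the 8-byte GPR-to-FPR move (ldgr) copies the bits unchanged.  */
  return scratch;
}

/* Assumed inverse, used only to check the model: an 8-byte FPR-to-GPR move
   (lgdr) followed by a logical right shift by 48.  */
uint16_t
fpr_image_to_half (uint64_t fpr)
{
  return (uint16_t) (fpr >> 48);
}
```

For example, half_gpr_to_fpr_image (0x3C00) yields 0x3C00000000000000: the
2-byte payload ends up in the leftmost 16 bits, and whatever was in the upper
48 bits of the GPR is discarded.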

The expander works for some minimal examples, but bootstrap fails in
lra_assign() at the following check:

if (! lra_hard_reg_split_p && ! lra_asm_error_p && flag_checking)
  /* Check correctness of allocation but only when there are no hard reg
     splits and asm errors as in the case of errors explicit insns involving
     hard regs are added or the asm is removed and this can result in
     incorrect allocation.  */
  for (i = FIRST_PSEUDO_REGISTER; i < max_regno; i++)
    if (lra_reg_info[i].nrefs != 0
        && reg_renumber[i] >= 0
        && overlaps_hard_reg_set_p (lra_reg_info[i].conflict_hard_regs,
                                    PSEUDO_REGNO_MODE (i), reg_renumber[i]))
      gcc_unreachable ();

where, for a reduced test case and pseudo 68, we have

(gdb) print reg_renumber[68]
$2 = 25
(gdb) call debug_hard_reg_set (lra_reg_info[68].conflict_hard_regs)
 0-5 12 14-23 25 33 35 38-53

That is, the assigned hard register 25 is itself a member of the conflict set,
which is why gcc_unreachable() fires.
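
The failing condition can be reproduced with a toy model of the check.  This
is only a sketch under stated assumptions: the type and function names are
hypothetical stand-ins (not GCC's HARD_REG_SET API), the set is a single
64-bit mask, and the pseudo is assumed to occupy exactly one hard register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for a hard register set: one bit per register, at most 64.  */
typedef uint64_t toy_hard_reg_set;

/* Return SET with all registers in [FIRST, LAST] added.  */
toy_hard_reg_set
toy_add_range (toy_hard_reg_set set, unsigned first, unsigned last)
{
  assert (first <= last && last < 64);
  for (unsigned r = first; r <= last; r++)
    set |= UINT64_C (1) << r;
  return set;
}

/* Return true if any of the NREGS registers starting at REGNO is in SET.  */
bool
toy_overlaps_p (toy_hard_reg_set set, unsigned regno, unsigned nregs)
{
  assert (regno + nregs <= 64);
  for (unsigned r = regno; r < regno + nregs; r++)
    if (set & (UINT64_C (1) << r))
      return true;
  return false;
}

/* The conflict set printed by gdb for pseudo 68:
   0-5 12 14-23 25 33 35 38-53.  */
toy_hard_reg_set
pseudo68_conflicts (void)
{
  toy_hard_reg_set s = 0;
  s = toy_add_range (s, 0, 5);
  s = toy_add_range (s, 12, 12);
  s = toy_add_range (s, 14, 23);
  s = toy_add_range (s, 25, 25);
  s = toy_add_range (s, 33, 33);
  s = toy_add_range (s, 35, 35);
  s = toy_add_range (s, 38, 53);
  return s;
}
```

With reg_renumber[68] == 25, toy_overlaps_p (pseudo68_conflicts (), 25, 1) is
true, which corresponds to the branch that reaches gcc_unreachable ().  The
neighbouring register 24 is not in the set, so the same query for 24 is false.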

To get straight to the point: I fear that in the expander the final assignment
to gen_rtx_REG (DFmode, true_regnum (operands[0])) might be problematic.  In
the reload dump I see that the expander ends up writing not into the original
target, pseudo 68, but into its designated hard register f10, due to
true_regnum():

   70: r68:HF=r91:HF
      REG_DEAD r91:HF
    Inserting the move before:
   93: r102:DI=r91:HF#0
   94: r102:DI=r102:DI<<0x30
   95: %f10:DF=r102:DI#0

Deleting move 70
   70: r68:HF=r91:HF
      REG_DEAD r91:HF
deleting insn with uid = 70.

I fear that this might have consequences when determining liveness and the
like for pseudo 68.  Long story short: am I allowed to write into the
designated hard register instead of its corresponding pseudo?

The tricky part seems to be that the initial mode of the target FPR is a 2-byte
mode, whereas in the end I have to emit a move insn with an 8-byte mode.  If
the target FPR is a hard reg, I'm allowed to change the mode.  In the case of
a pseudo, however, I cannot use a subreg since this would lead to infinitely
recursive reloads.  This leaves me with assigning to its designated hard reg,
gen_rtx_REG (DFmode, true_regnum (operands[0])), which might interfere with the
liveness (or whatever) of the corresponding pseudo.

To work around this and enforce a hard register, I went for allocating
a scratch register during secondary reload plus a splitter that triggers after
reload, where I'm certain that I only deal with hard registers, i.e., something
along the lines of

(define_expand "reload_half_into_fpr"
  [(parallel [(match_operand    0 "register_operand" "=f")
              (match_operand    1 "register_operand"  "d")
              (match_operand:DI 2 "register_operand" "=d")])]
  "TARGET_DFP"
{
  emit_insn (gen_reload_half_into_fpr_helper (operands[0], operands[1],
                                              operands[2]));
  DONE;
})

(define_insn_and_split "reload_half_into_fpr_helper"
  [(set (match_operand             0 "register_operand" "=f")
        (unspec [(match_operand    1 "register_operand"  "d")
                 (match_operand:DI 2 "register_operand" "=d")]
                UNSPEC_GPRTOFPR))]
  "TARGET_DFP"
  "#"
  "&& reload_completed"
  [(const_int 0)]
{
  emit_insn (gen_rtx_SET (operands[2],
                          gen_rtx_REG (DImode, REGNO (operands[1]))));
  emit_insn (gen_rtx_SET (operands[2],
                          gen_rtx_ASHIFT (DImode, operands[2],
                                          GEN_INT (48))));
  emit_insn (gen_rtx_SET (gen_rtx_REG (DFmode, REGNO (operands[0])),
                          gen_rtx_REG (DFmode, REGNO (operands[2]))));
  DONE;
})

That restores bootstrap.  However, this feels a bit hacky, and I'm wondering
first of all whether the initial implementation is actually wrong, and whether
there is a more elegant solution.  Any thoughts?

Cheers,
Stefan
