Hi all,

For older s390 machines, which do not support vector extensions, I'm
trying to implement 2-byte GPR<->FPR moves.  Since GPRs are right-aligned
and FPRs left-aligned, I cannot trivially copy between them.  However, at
least since z9-ec (TARGET_DFP) we have the instructions ldgr/lgdr to move
8-byte values between GPRs and FPRs.  A 2-byte GPR into FPR move could be
implemented by shifting the input GPR left by 48 bits and storing the
result in a scratch GPR, which is then copied as an 8-byte move into an
FPR.  This means that for a GPR into FPR move I need a scratch register,
which I could allocate during reload.  So far I came up with

  if (TARGET_DFP
      && !TARGET_VX
      && GET_MODE_SIZE (mode) == 2
      && ((in_p
           && (REG_P (x) || (SUBREG_P (x) && REG_P (SUBREG_REG (x))))
           && reg_classes_intersect_p (rclass, FP_REGS))
          || (!in_p
              && FP_REGNO_P (true_regnum (x))
              && reg_classes_intersect_p (rclass, GENERAL_REGS))))
    {
      sri->icode = CODE_FOR_reload_half_into_fpr;
      return NO_REGS;
    }

in TARGET_SECONDARY_RELOAD and the corresponding expander

  (define_expand "reload_half_into_fpr"
    [(parallel [(match_operand 0 "register_operand" "=f")
                (match_operand 1 "register_operand" "d")
                (match_operand:DI 2 "register_operand" "=d")])]
    "TARGET_DFP"
  {
    gcc_assert (FP_REGNO_P (true_regnum (operands[0])));
    operands[1] = simplify_gen_subreg (DImode, operands[1],
                                       GET_MODE (operands[1]), 0);
    emit_insn (gen_rtx_SET (operands[2], operands[1]));
    emit_insn (gen_rtx_SET (operands[2],
                            gen_rtx_ASHIFT (DImode, operands[2],
                                            GEN_INT (48))));
    operands[2] = simplify_gen_subreg (DFmode, operands[2],
                                       GET_MODE (operands[2]), 0);
    emit_insn (gen_rtx_SET (gen_rtx_REG (DFmode, true_regnum (operands[0])),
                            operands[2]));
    DONE;
  })

This works for some minimal examples but bootstrap fails in lra_assign()

  if (! lra_hard_reg_split_p && ! lra_asm_error_p && flag_checking)
    /* Check correctness of allocation but only when there are no hard reg
       splits and asm errors as in the case of errors explicit insns involving
       hard regs are added or the asm is removed and this can result in
       incorrect allocation.  */
    for (i = FIRST_PSEUDO_REGISTER; i < max_regno; i++)
      if (lra_reg_info[i].nrefs != 0
          && reg_renumber[i] >= 0
          && overlaps_hard_reg_set_p (lra_reg_info[i].conflict_hard_regs,
                                      PSEUDO_REGNO_MODE (i), reg_renumber[i]))
        gcc_unreachable ();

where we have for a reduced test and pseudo 68

  (gdb) print reg_renumber[68]
  $2 = 25
  (gdb) call debug_hard_reg_set (lra_reg_info[68].conflict_hard_regs)
   0-5 12 14-23 25 33 35 38-53

which is why gcc_unreachable() fires.
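
To spell out why those two gdb values trip the check, here is a deliberately simplified model in plain C.  It is not the LRA code: the toy_* names and the 64-bit bitmask are made up for illustration, and unlike overlaps_hard_reg_set_p it ignores the mode, i.e. it assumes the pseudo occupies exactly one hard register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for HARD_REG_SET: one bit per hard register.
   Only good for register numbers 0..63, which covers the dump above.  */
typedef uint64_t toy_hard_reg_set;

/* Add the inclusive range FIRST..LAST to SET.  */
static toy_hard_reg_set
toy_add_range (toy_hard_reg_set set, int first, int last)
{
  assert (0 <= first && first <= last && last < 64);
  for (int r = first; r <= last; r++)
    set |= (uint64_t) 1 << r;
  return set;
}

/* Single-register membership test (assumption: the mode is ignored).  */
static bool
toy_in_set_p (toy_hard_reg_set set, int regno)
{
  assert (0 <= regno && regno < 64);
  return (set >> regno) & 1;
}

/* Conflict set printed by gdb for pseudo 68:
   0-5 12 14-23 25 33 35 38-53  */
static toy_hard_reg_set
toy_conflicts_of_pseudo_68 (void)
{
  toy_hard_reg_set s = 0;
  s = toy_add_range (s, 0, 5);
  s = toy_add_range (s, 12, 12);
  s = toy_add_range (s, 14, 23);
  s = toy_add_range (s, 25, 25);
  s = toy_add_range (s, 33, 33);
  s = toy_add_range (s, 35, 35);
  s = toy_add_range (s, 38, 53);
  return s;
}

/* True if the assignment of HARD_REGNO to pseudo 68 would be rejected.  */
static bool
toy_allocation_conflicts_p (int hard_regno)
{
  return toy_in_set_p (toy_conflicts_of_pseudo_68 (), hard_regno);
}
```

With reg_renumber[68] == 25, toy_allocation_conflicts_p (25) is true, which corresponds to the failing condition; hard register 24, for comparison, is not in the set.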

To get straight to the point: I fear that in the expander the final
assignment to gen_rtx_REG (DFmode, true_regnum (operands[0])) might be
problematic.  In the reload dump I see that, due to true_regnum(), the
expander ends up writing not into the former target, namely pseudo 68,
but into its designated hard register f10:

      70: r68:HF=r91:HF
          REG_DEAD r91:HF
    Inserting the move before:
      93: r102:DI=r91:HF#0
      94: r102:DI=r102:DI<<0x30
      95: %f10:DF=r102:DI#0
    Deleting move 70
      70: r68:HF=r91:HF
          REG_DEAD r91:HF
    deleting insn with uid = 70.

I fear that this might have consequences while determining liveness et
al. for pseudo 68.  Long story short: am I allowed to write into the
designated hard register instead of its corresponding pseudo?

The tricky part seems to be that the initial mode of the target FPR is a
2-byte mode, whereas in the end I have to emit a move insn with an 8-byte
mode.  If the target FPR is a hard reg, I'm allowed to change the mode.
However, in case of a pseudo I cannot use a subreg, since this would lead
to infinitely recursive reloads.  This leaves me with assigning to its
designated hard reg gen_rtx_REG (DFmode, true_regnum (operands[0])),
which might interfere with the liveness (or whatever) of its
corresponding pseudo.
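
To make the 2-byte vs. 8-byte issue concrete, the intended data movement can be modelled in plain C as below.  This is only a sketch of the bit layout, under the assumption that a 64-bit integer stands in for the raw contents of a GPR or FPR; the toy_* names are made up and nothing here is GCC or s390 code.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the GPR -> FPR direction.  GPR holds the 2-byte value
   right-aligned, i.e. in its low 16 bits; the upper 48 bits may hold
   anything.  The shift left-aligns the value and, as a side effect,
   pushes the unrelated upper bits out of the register.  The result is
   the 8-byte image that would then be copied into the FPR as a whole.  */
static uint64_t
toy_half_gpr_to_fpr (uint64_t gpr)
{
  uint64_t scratch = gpr << 48;   /* 2-byte value now in the top 16 bits */
  return scratch;                 /* 8-byte copy, as with ldgr */
}

/* Model of reading the 2-byte value back out of the left-aligned
   FPR image.  */
static uint16_t
toy_half_from_fpr (uint64_t fpr)
{
  assert ((fpr & 0x0000ffffffffffffULL) == 0);  /* low 48 bits are zero */
  return (uint16_t) (fpr >> 48);
}
```

For instance, a GPR containing 0xdeadbeefcafe3c00 yields the FPR image 0x3c00000000000000, so only the low 2 bytes of the GPR survive, but the insn doing the copy necessarily operates on all 8 bytes.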

In order to work around this and enforce a hard register, I went for
allocating a scratch register during secondary reload and a splitter
which triggers after reload, where I'm certain that I only deal with hard
registers, i.e., something along the lines of

  (define_expand "reload_half_into_fpr"
    [(parallel [(match_operand 0 "register_operand" "=f")
                (match_operand 1 "register_operand" "d")
                (match_operand:DI 2 "register_operand" "=d")])]
    "TARGET_DFP"
  {
    emit_insn (gen_reload_half_into_fpr_helper (operands[0], operands[1],
                                                operands[2]));
    DONE;
  })

  (define_insn_and_split "reload_half_into_fpr_helper"
    [(set (match_operand 0 "register_operand" "=f")
          (unspec [(match_operand 1 "register_operand" "d")
                   (match_operand:DI 2 "register_operand" "=d")]
                  UNSPEC_GPRTOFPR))]
    "TARGET_DFP"
    "#"
    "&& reload_completed"
    [(const_int 0)]
  {
    emit_insn (gen_rtx_SET (operands[2],
                            gen_rtx_REG (DImode, REGNO (operands[1]))));
    emit_insn (gen_rtx_SET (operands[2],
                            gen_rtx_ASHIFT (DImode, operands[2],
                                            GEN_INT (48))));
    emit_insn (gen_rtx_SET (gen_rtx_REG (DFmode, REGNO (operands[0])),
                            gen_rtx_REG (DFmode, REGNO (operands[2]))));
    DONE;
  })

That restores bootstrap.  However, this feels a bit hacky and I'm
wondering whether the initial implementation is actually wrong in the
first place, or whether there exists a more elegant solution.

Any thoughts?

Cheers,
Stefan