Hi Jeff,
> The pattern's operand 0 explicitly allows MEMs as do the constraints. > So forcing the operand into a register just seems like it's papering > over the real problem. The added of force_reg code is address the problem preduced after address the error combine. The more restrict condtion of the pattern forbidden mem->mem pattern which will produced in -O0. I think the implementation forgot to do this force_reg operation before when doing the intrinis expansion The reason this problem isn't exposed before is because the reload pass will converts mem->mem to mem->reg; reg->mem based on the constraint. > I wonder if we should just remove the memory destination from this > pattern. Ultimately isn't that case just trying to optimize a constant > store into memory -- perhaps we just need a distinct pattern for that. > We generally try to avoid that for movXX patterns, but this seems a bit > different. The pattern like scalar mov pattern, need to block mem->mem case. I think mem->reg, reg->mem, reg->reg patterns are defined in the same insn is more readable, I wonder how you feel about that? And there's another `*mov<mode>_whole` pattern that needs to be restricted here as well, I'll try to send a separate patch to address that like bellow. (define_insn "*mov<mode>_whole" [(set (match_operand:V_WHOLE 0 "reg_or_mem_operand" "=vr, m,vr") (match_operand:V_WHOLE 1 "reg_or_mem_operand" " m,vr,vr"))] "TARGET_VECTOR" ...) Change to: (define_insn "*mov<mode>_whole" [(set (match_operand:V_WHOLE 0 "reg_or_mem_operand" "=vr, m,vr") (match_operand:V_WHOLE 1 "reg_or_mem_operand" " m,vr,vr"))] "TARGET_VECTOR && (register_operand (operands[0], <MODE>mode) || register_operand(operands[1], <MODE>mode))" ...) > This comment doesn't make sense in conjuction with your earlier details. > In particular combine doesn't run at -O0, so your earlier comment that > combine creates the problem seems inconsistent with the comment above. As the above says, the code addresses the problem which produced after addressing the combine problem. > Umm, wow. I haven't thought deeply about this, but the complexity of > that insn condition is a huge red flag that our operand predicates > aren't correct for this pattern. This condition is large because the vsetvl info need (compare to scalar mov or *mov<mode>_whole pattern), but I think this condition is enough clear to understand. Let me explain briefly. (register_operand (operands[0], <MODE>mode) && MEM_P (operands[3])) || (MEM_P (operands[0]) && register_operand(operands[3], <MODE>mode)) This two conditons mean allow mem->reg and reg->mem pattern. (register_operand (operands[0], <MODE>mode) && satisfies_constraint_Wc1 (operands[1])) This condition mean the mask must be all trues for reg->reg_or_imm pattern since reg->reg insn doen't support mask operand. Best, Lehua ------------------ Original ------------------ From: "Jeff Law" <jeffreya...@gmail.com>; Date: Wed, Aug 9, 2023 00:10 AM To: "Lehua Ding"<lehua.d...@rivai.ai>;"gcc-patches"<gcc-patches@gcc.gnu.org>; Cc: "juzhe.zhong"<juzhe.zh...@rivai.ai>;"rdapp.gcc"<rdapp....@gmail.com>;"kito.cheng"<kito.ch...@gmail.com>;"palmer"<pal...@rivosinc.com>; Subject: Re: [PATCH] RISC-V: Fix error combine of pred_mov pattern On 8/8/23 05:57, Lehua Ding wrote: > Hi, > > This patch fix PR110943 which will produce some error code. This is because > the error combine of some pred_mov pattern. Consider this code: > > ``` > #include <riscv_vector.h> > > void foo9 (void *base, void *out, size_t vl) > { > int64_t scalar = *(int64_t*)(base + 100); > vint64m2_t v = __riscv_vmv_v_x_i64m2 (0, 1); > *(vint64m2_t*)out = v; > } > ``` > > RTL before combine pass: > > ``` > (insn 11 10 12 2 (set (reg/v:RVVM2DI 134 [ v ]) > (if_then_else:RVVM2DI (unspec:RVVMF32BI [ > (const_vector:RVVMF32BI repeat [ > (const_int 1 [0x1]) > ]) > (const_int 1 [0x1]) > (const_int 2 [0x2]) repeated x2 > (const_int 0 [0]) > (reg:SI 66 vl) > (reg:SI 67 vtype) > ] UNSPEC_VPREDICATE) > (const_vector:RVVM2DI repeat [ > (const_int 0 [0]) > ]) > (unspec:RVVM2DI [ > (reg:SI 0 zero) > ] UNSPEC_VUNDEF))) "/app/example.c":6:20 1089 {pred_movrvvm2di}) > (insn 14 13 0 2 (set (mem:RVVM2DI (reg/v/f:DI 136 [ out ]) [1 MEM[(vint64m2_t *)out_4(D)]+0 S[32, 32] A128]) > (reg/v:RVVM2DI 134 [ v ])) "/app/example.c":7:23 717 {*movrvvm2di_whole}) > ``` > > RTL after combine pass: > ``` > (insn 14 13 0 2 (set (mem:RVVM2DI (reg:DI 138) [1 MEM[(vint64m2_t *)out_4(D)]+0 S[32, 32] A128]) > (if_then_else:RVVM2DI (unspec:RVVMF32BI [ > (const_vector:RVVMF32BI repeat [ > (const_int 1 [0x1]) > ]) > (const_int 1 [0x1]) > (const_int 2 [0x2]) repeated x2 > (const_int 0 [0]) > (reg:SI 66 vl) > (reg:SI 67 vtype) > ] UNSPEC_VPREDICATE) > (const_vector:RVVM2DI repeat [ > (const_int 0 [0]) > ]) > (unspec:RVVM2DI [ > (reg:SI 0 zero) > ] UNSPEC_VUNDEF))) "/app/example.c":7:23 1089 {pred_movrvvm2di}) > ``` > > This combine change the semantics of insn 14. I refine the conditon of @pred_mov > pattern to a more restrict. It's Ok for trunk? > > Best, > Lehua > > >PR target/110943 > > gcc/ChangeLog: > >* config/riscv/riscv-vector-builtins.cc (function_expander::function_expander): > force_reg mem operand. >* config/riscv/vector.md: Refine condition. > > gcc/testsuite/ChangeLog: > >* gcc.target/riscv/rvv/base/zvfhmin-intrinsic.c: Update. >* gcc.target/riscv/rvv/base/pr110943.c: New test. So at a high level this doesn't look correct to me. The pattern's operand 0 explicitly allows MEMs as do the constraints. So forcing the operand into a register just seems like it's papering over the real problem. I wonder if we should just remove the memory destination from this pattern. Ultimately isn't that case just trying to optimize a constant store into memory -- perhaps we just need a distinct pattern for that. We generally try to avoid that for movXX patterns, but this seems a bit different. > create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110943.c > > diff --git a/gcc/config/riscv/riscv-vector-builtins.cc b/gcc/config/riscv/riscv-vector-builtins.cc > index 528dca7ae85..cd40fb2060f 100644 > --- a/gcc/config/riscv/riscv-vector-builtins.cc > +++ b/gcc/config/riscv/riscv-vector-builtins.cc > @@ -3471,7 +3471,13 @@ function_expander::function_expander (const function_instance &instance, > exp (exp_in), target (target_in), opno (0) > { > if (!function_returns_void_p ()) > - create_output_operand (&m_ops[opno++], target, TYPE_MODE (TREE_TYPE (exp))); > + { > + if (target != NULL_RTX && MEM_P (target)) > + /* Use force_reg to prevent illegal mem-to-mem pattern on -O0. */ This comment doesn't make sense in conjuction with your earlier details. In particular combine doesn't run at -O0, so your earlier comment that combine creates the problem seems inconsistent with the comment above. > diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md > index e56a2bf4bed..f0484b1162c 100644 > --- a/gcc/config/riscv/vector.md > +++ b/gcc/config/riscv/vector.md > @@ -1509,8 +1509,9 @@ > (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE) > (match_operand:V_VLS 3 "vector_move_operand" " m, m, m, vr, vr, vr, viWc0, viWc0") > (match_operand:V_VLS 2 "vector_merge_operand" " 0, vu, vu, vu, vu, 0, vu, 0")))] > - "TARGET_VECTOR && (MEM_P (operands[0]) || MEM_P (operands[3]) > - || CONST_VECTOR_P (operands[1]))" > + "TARGET_VECTOR && ((register_operand (operands[0], <MODE>mode) && MEM_P (operands[3])) || > + (MEM_P (operands[0]) && register_operand (operands[3], <MODE>mode)) || > + (register_operand (operands[0], <MODE>mode) && satisfies_constraint_Wc1 (operands[1])))" Umm, wow. I haven't thought deeply about this, but the complexity of that insn condition is a huge red flag that our operand predicates aren't correct for this pattern. From a formatting standpoint bring the wrapped operator down and indent. ie (condition 1 || condition 2 || (condition 3 && other test 4)) Jeff