Hi Robin,
Thanks for these nice comments!
> -  emit_insn (gen_vcond_mask (vmode, vmode, d->target, d->op0, d->op1, mask));
> +  /* Swap op0 and op1 since the order is opposite to pred_merge.  */
> +  rtx ops2[] = {d->target, d->op1, d->op0, mask};
> +  emit_vlmax_merge_insn (code_for_pred_merge (vmode),
> +			 riscv_vector::RVV_MERGE_OP, ops2);
>    return true;
>  }
> This seems a separate, general fix that just surfaced in the course of
> this patch? Would be nice to have this factored out but as we already have
> it, no need I guess.
Yes, it's because I changed @vcond_mask_<mode><vm> from define_expand to
define_insn_and_split. If I didn't change it, I would need to manually make
sure that d->target, d->op1 and d->op0 satisfy the predicates of
@vcond_mask (the vregs pass checks this, so mem operands must be forbidden).
If I use emit_vlmax_merge_insn directly, it uses expand_insn internally,
which automatically converts the operands for me so that they satisfy the
predicate conditions. This is one difference between gen_xxx and
expand_insn. And I think calling emit_vlmax_merge_insn to generate
pred_merge is the most appropriate and uniform way.
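Roughly, the difference between the two paths looks like this (a
GCC-internal sketch, not standalone-runnable; simplified from what
emit_vlmax_merge_insn does internally, operand order as in the
pred_merge comment above):

```
/* gen_* builds the pattern verbatim, so every operand must already
   match the insn's predicates:  */
emit_insn (gen_vcond_mask (vmode, vmode, d->target, d->op0, d->op1, mask));

/* expand_insn goes through expand_operand, which legitimizes each
   operand first (e.g. forces a MEM into a register) before emitting:  */
expand_operand ops[4];
create_output_operand (&ops[0], d->target, vmode);
create_input_operand (&ops[1], d->op1, vmode);
create_input_operand (&ops[2], d->op0, vmode);
create_input_operand (&ops[3], mask, GET_MODE (mask));
expand_insn (code_for_pred_merge (vmode), 4, ops);
```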
> +  if (is_dummy_mask)
> +    {
> +      /* Use TU, MASK ANY policy.  */
> +      if (needs_fp_rounding (code, mode))
> +	emit_nonvlmax_fp_tu_insn (icode, RVV_UNOP_TU, cond_ops, len);
> +      else
> +	emit_nonvlmax_tu_insn (icode, RVV_UNOP_TU, cond_ops, len);
> +    }
> We have quite a bit of code duplication across the expand_cond_len functions
> now (binop, ternop, unop). Not particular to your patch but I'd suggest to
> unify this later.
Indeed. Leave it to me and I'll send another patch later to reduce this
duplicated code.
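One possible shape for that unification (the name and exact signature below
are hypothetical, sketch only; the binop/ternop/unop variants differ mainly
in the operand count passed down):

```
/* Hypothetical shared helper for expand_cond_len_{unop,binop,ternop}.  */
static void
expand_cond_len_op (rtx_code code, machine_mode mode, insn_code icode,
		    int op_num, rtx *cond_ops, rtx len, bool is_dummy_mask)
{
  if (is_dummy_mask)
    {
      /* Use TU, MASK ANY policy.  */
      if (needs_fp_rounding (code, mode))
	emit_nonvlmax_fp_tu_insn (icode, op_num, cond_ops, len);
      else
	emit_nonvlmax_tu_insn (icode, op_num, cond_ops, len);
    }
  else
    {
      /* ... real-mask path, likewise shared ...  */
    }
}
```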
> +TEST_ALL (DEF_LOOP)
> +
> +/* NOTE: int abs operator is converted to vmslt + vneg.v */
> +/* { dg-final { scan-assembler-times {\tvneg\.v\tv[0-9]+,v[0-9]+,v0\.t} 12 { xfail { any-opts "--param riscv-autovec-lmul=m2" } } } } */
> Why does this fail with LMUL == 2 (also in the following tests)? A comment
> would be nice here.
This is because the iteration count of 5 in the testcase causes GCC to
remove the loop and turn it into two straight-line basic blocks, which
doubles the number of vnegs. I'm going to increase the iteration count
(it should be big enough that this doesn't happen even when LMUL=m8) so
that the optimization is no longer triggered.
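For reference, the loops these tests instantiate (via the TEST_ALL/DEF_LOOP
macros) have roughly this shape; the plain C++ rendition below is an
illustration, not the actual testcase:

```cpp
#include <cstdlib>

// Conditional abs: where pred[i] is true take |a[i]|, else keep b[i].
// With RVV autovectorization the abs becomes a masked compare (vmslt)
// plus a masked negate (vneg.v ...,v0.t), which is what the
// scan-assembler-times pattern counts.
void cond_abs (int *r, const int *a, const int *b, const int *pred, int n)
{
  for (int i = 0; i < n; ++i)
    r[i] = pred[i] ? std::abs (a[i]) : b[i];
}
```

With only 5 iterations, a large enough LMUL lets GCC drop the loop
entirely and emit the vectorized body twice, so the vneg.v count doubles
and the fixed count in the scan-assembler directive no longer matches.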
V2 patch: https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628210.html
--
Best,
Lehua