(Sorry, I missed this because I was on vacation.)

On 11/08/2024 22:00, Robin Dapp wrote:
This patch adds a zero else operand to the masked loads.

The patch is OK, but I have a question below.

gcc/ChangeLog:

        * config/gcn/predicates.md (maskload_else_operand): New
        predicate.
        * config/gcn/gcn-valu.md: Use new predicate.
---
  gcc/config/gcn/gcn-valu.md   | 6 ++++--
  gcc/config/gcn/predicates.md | 3 +++
  2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md
index b24cf9be32e..2344bc00ffc 100644
--- a/gcc/config/gcn/gcn-valu.md
+++ b/gcc/config/gcn/gcn-valu.md
@@ -4002,7 +4002,8 @@ (define_expand "while_ultsidi"
  (define_expand "maskload<mode>di"
    [(match_operand:V_MOV 0 "register_operand")
     (match_operand:V_MOV 1 "memory_operand")
-   (match_operand 2 "")]
+   (match_operand 2 "")
+   (match_operand:V_MOV 3 "maskload_else_operand")]
    ""
    {
      rtx exec = force_reg (DImode, operands[2]);
@@ -4040,7 +4041,8 @@ (define_expand "mask_gather_load<mode><vnsi>"
     (match_operand:<VnSI> 2 "register_operand")
     (match_operand 3 "immediate_operand")
     (match_operand:SI 4 "gcn_alu_operand")
-   (match_operand:DI 5 "")]
+   (match_operand:DI 5 "")
+   (match_operand:V_MOV 6 "maskload_else_operand")]
    ""
    {
      rtx exec = force_reg (DImode, operands[5]);
diff --git a/gcc/config/gcn/predicates.md b/gcc/config/gcn/predicates.md
index 3f59396a649..9bc806cf990 100644
--- a/gcc/config/gcn/predicates.md
+++ b/gcc/config/gcn/predicates.md
@@ -228,3 +228,6 @@ (define_predicate "ascending_zero_int_parallel"
    return gcn_stepped_zero_int_parallel_p (op, 1);
  })
+(define_predicate "maskload_else_operand"
+  (and (match_code "const_int,const_vector")
+       (match_test "op == CONST0_RTX (GET_MODE (op))")))

This forces maskload and mask_gather_load to only accept zero here, but in fact the hardware would allow us to accept any value (including undefined).

Here's the expand code:

  /* Masked lanes are required to hold zero.  */
  emit_move_insn (operands[0], gcn_vec_constant (<MODE>mode, 0));

  emit_insn (gen_gather<mode>_expr_exec (operands[0], addr, as, v,
                                         operands[0], exec));

In other words, initialize the whole vector to zero, and then use the gather_load instruction to implement the masked load (GCN does not have a contiguous-memory vector load instruction).

We could easily omit the initialization instruction, or pass through the new value.

Would there be any advantage to accepting other values, or is forcing zero actually the right choice?

Thanks for the patch

Andrew

Reply via email to