(Sorry, I missed this because I was on vacation.)
On 11/08/2024 22:00, Robin Dapp wrote:
This patch adds a zero else operand to the masked loads.
The patch is OK, but I have a question below.
gcc/ChangeLog:
* config/gcn/predicates.md (maskload_else_operand): New
predicate.
* config/gcn/gcn-valu.md: Use new predicate.
---
gcc/config/gcn/gcn-valu.md | 6 ++++--
gcc/config/gcn/predicates.md | 3 +++
2 files changed, 7 insertions(+), 2 deletions(-)
diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md
index b24cf9be32e..2344bc00ffc 100644
--- a/gcc/config/gcn/gcn-valu.md
+++ b/gcc/config/gcn/gcn-valu.md
@@ -4002,7 +4002,8 @@ (define_expand "while_ultsidi"
(define_expand "maskload<mode>di"
[(match_operand:V_MOV 0 "register_operand")
(match_operand:V_MOV 1 "memory_operand")
- (match_operand 2 "")]
+ (match_operand 2 "")
+ (match_operand:V_MOV 3 "maskload_else_operand")]
""
{
rtx exec = force_reg (DImode, operands[2]);
@@ -4040,7 +4041,8 @@ (define_expand "mask_gather_load<mode><vnsi>"
(match_operand:<VnSI> 2 "register_operand")
(match_operand 3 "immediate_operand")
(match_operand:SI 4 "gcn_alu_operand")
- (match_operand:DI 5 "")]
+ (match_operand:DI 5 "")
+ (match_operand:V_MOV 6 "maskload_else_operand")]
""
{
rtx exec = force_reg (DImode, operands[5]);
diff --git a/gcc/config/gcn/predicates.md b/gcc/config/gcn/predicates.md
index 3f59396a649..9bc806cf990 100644
--- a/gcc/config/gcn/predicates.md
+++ b/gcc/config/gcn/predicates.md
@@ -228,3 +228,6 @@ (define_predicate "ascending_zero_int_parallel"
return gcn_stepped_zero_int_parallel_p (op, 1);
})
+(define_predicate "maskload_else_operand"
+ (and (match_code "const_int,const_vector")
+ (match_test "op == CONST0_RTX (GET_MODE (op))")))
This forces maskload and mask_gather_load to only accept zero here, but
in fact the hardware would allow us to accept any value (including
undefined).
Here's the expand code:
/* Masked lanes are required to hold zero. */
emit_move_insn (operands[0], gcn_vec_constant (<MODE>mode, 0));
emit_insn (gen_gather<mode>_expr_exec (operands[0], addr, as, v,
operands[0], exec));
In other words, initialize the whole vector to zero, and then use the
gather_load instruction to implement the masked load (GCN does not have
a contiguous-memory vector load instruction).
We could easily omit the initialization instruction, or pass through the
new value.
Would there be any advantage to accepting other values, or is forcing
zero actually the right choice?
Thanks for the patch
Andrew