On 7/26/24 2:55 AM, HAO CHEN GUI wrote:
Hi Jeff,
在 2024/7/24 5:57, Jeff Law 写道:
On 7/21/24 7:58 PM, HAO CHEN GUI wrote:
Hi,
This patch adds const0 move checking for CLEAR_BY_PIECES. The original
vec_duplicate handles duplicates of non-constant inputs. But 0 is a
constant. So even a platform doesn't support vec_duplicate, it could
still do clear by pieces if it supports const0 move by that mode.
Compared to the previous version, the main change is to do const0
direct move for by-piece clear if the target supports const0 move by
that mode.
https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643063.html
Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. There are several regressions on aarch64. They could be
fixed by enhancing const0 move on V2x8QImode. Is it OK for trunk?
Can you be more specific about the aarch64 regressions? Execution? Scan-asm?
ICE?
Ideally we'd include a patch to fix those regressions as well.
For aarch64, it supports const0 move on V2x8QImode (the first vector mode it
could find for 16-byte length). But the generated instructions are not
preferable. aarch64 has a better "stp" instruction which leverages its zero
register to store 16-byte zeros into memory.
I tested following experiment patch on aarch64. It converts a const0 move on
V2x8QImode to a const0 move on TImode which generates "stp" instruction. It
fixed all regressions.
We'll probably need Richard S. or someone else to chime in on the actual
patch, but yea, if they can leverage stp, it's likely going to be better
than actual vectors.
Do we have a testcase for this issue or was it something you just
happened to notice?
diff --git a/gcc/config/aarch64/aarch64-simd.md
b/gcc/config/aarch64/aarch64-simd.md
index 01b084d8ccb..8aa72940b12 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -7766,7 +7766,14 @@ (define_expand "mov<mode>"
(match_operand:VSTRUCT_QD 1 "general_operand"))]
"TARGET_FLOAT"
{
- if (can_create_pseudo_p ())
+ if (<MODE>mode == V2x8QImode
+ && operands[1] == CONST0_RTX (V2x8QImode)
+ && MEM_P (operands[0]))
+
+ operands[0] = adjust_address (operands[0], TImode, 0);
+ operands[1] = CONST0_RTX (TImode);
+ }
There's may be other modes where they'll want to do something similar.
Hence my note above that we'll likely need to get Richard S. or someone
else with more knowledge of the port involved.
1. Originally it tests vec_duplicate_optab for clear by pieces. According to
the words in md.texi, this pattern should only handle non-constant inputs.
Constant vector should go through the move pattern instead. So shall we remove
vec_duplicate_optab checking thoroughly?
I think so. The docs are pretty clear that the vec_duplicate pattern
should only handle non-constant values.
2. The clear by pieces final calls emit_move_insn to do the memory set. I
noticed that emit_move_insn doesn't check the predicate of move expand. So any
operand (mem/reg/constant) should be fine when the certain mode of mov_optab
exists. So shall we need following predicate checking for const0 move?
insn_operand_matches (icode, 1, CONST0_RTX (mode))
I'm pretty confident that if we can't move (const_int 0) into any of the
standard integer modes (up to and including word_mode) that all kinds of
things would break.
Perhaps just make it an assert?
3. For clear by pieces, currently the zero is not directly passed to generate
function. It uses help fucntion - builtin_memset_read_str to store the zero
in a pseudo then do the move with the pseudo. I saw i386 can benefit with this
process as it calls gen_memset_value_from_prev to generate the value from
previous one. The i386 can utilize its variable length register for storing
different modes of constant zero. If we pass zero directly to generate function,
this optimization will lost. Shall we set a target hook for it?
I probably wouldn't do a target hook for this. I'd let the x86 backend
deal with it in their expanders.
jeff