On 06/12/2024 18:14, Christophe Lyon wrote:
On Fri, 6 Dec 2024 at 12:41, Richard Earnshaw (lists)
<richard.earns...@arm.com> wrote:
On 04/12/2024 20:56, Christophe Lyon wrote:
On Wed, 4 Dec 2024 at 12:39, Richard Earnshaw (lists)
<richard.earns...@arm.com> wrote:
On 25/11/2024 20:08, Christophe Lyon wrote:
In this PR, we have to handle a case where MVE predicates are supplied
as a const_int, where individual predicates have illegal boolean
values (such as 0xc for a 4-bit boolean predicate). To avoid the ICE,
fix the constant (any non-zero value is converted to all 1s) and emit
a warning.
On MVE, V8BI and V4BI multi-bit masks are interpreted byte-by-byte at
instruction level, but end-users should describe lanes rather than
bytes (so all bytes of a true-predicated lane should be '1'), see
https://developer.arm.com/documentation/101028/0012/14--M-profile-Vector-Extension--MVE--intrinsics.
Since gen_lowpart can ICE on a subreg, we force predicates in a subreg
into a reg, after removing subreg of the same size as the target
(HImode) which would be made redundant by gen_lowpart and confuse the
DLSTP optimization.
2024-11-20 Christophe Lyon <christophe.l...@linaro.org>
Jakub Jelinek <ja...@redhat.com>
PR target/114801
gcc/
* config/arm/arm-mve-builtins.cc
(function_expander::add_input_operand): Handle CONST_INT
predicates.
gcc/testsuite/
* gcc.target/arm/mve/pr108443.c: Update predicate constant.
* gcc.target/arm/mve/pr114801.c: New test.
---
gcc/config/arm/arm-mve-builtins.cc | 37 ++++++++++++++++++-
gcc/testsuite/gcc.target/arm/mve/pr108443.c | 4 +--
gcc/testsuite/gcc.target/arm/mve/pr114801.c | 39 +++++++++++++++++++++
3 files changed, 77 insertions(+), 3 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/arm/mve/pr114801.c
diff --git a/gcc/config/arm/arm-mve-builtins.cc
b/gcc/config/arm/arm-mve-builtins.cc
index 255aed25600..5ff32ce06b7 100644
--- a/gcc/config/arm/arm-mve-builtins.cc
+++ b/gcc/config/arm/arm-mve-builtins.cc
@@ -2352,7 +2352,42 @@ function_expander::add_input_operand (insn_code icode,
rtx x)
mode = GET_MODE (x);
}
else if (VALID_MVE_PRED_MODE (mode))
- x = gen_lowpart (mode, x);
+ {
+ if (CONST_INT_P (x) && (mode == V8BImode || mode == V4BImode))
+ {
+ /* In V8BI or V4BI each element has 2 or 4 bits, if those bits aren't
+ all the same, gen_lowpart might ICE. Canonicalize all the 2 or 4
+ bits to all ones if any of them is non-zero. V8BI and V4BI
+ multi-bit masks are interpreted byte-by-byte at instruction level,
+ but such constants should describe lanes, rather than bytes. See
+
https://developer.arm.com/documentation/101028/0012/14--M-profile-Vector-Extension--MVE--intrinsics.
*/
Apart from being an overly long line, deep links like this are generally not very stable.
I suggest we just say something like "See the section on MVE intrinsics in the Arm
ACLE specification".
Right, I was wondering what was the best practice, I think I've seen
such links recently, not sure where.
I'll update the comment, and the commit message.
+ unsigned HOST_WIDE_INT xi = UINTVAL (x);
+ xi |= ((xi & 0x5555) << 1) | ((xi & 0xaaaa) >> 1);
+ if (mode == V4BImode)
+ xi |= ((xi & 0x3333) << 2) | ((xi & 0xcccc) >> 2);
+ if (xi != UINTVAL (x))
+ inform (location, "constant predicate argument %d (%wx) does"
+ " not map to %d lane numbers, converted to %wx",
+ opno, UINTVAL (x) & 0xffff, mode == V8BImode ? 8 : 4,
+ xi & 0xffff);
I think this should be a warning (so that werror can work with it). Otherwise
such messages can't be faulted.
OK, I will change this.
+
+ x = gen_int_mode (xi, HImode);
+ }
+ else if (SUBREG_P (x))
+ {
+ /* Already of the right size, drop the subreg which will be made
+ redundant by gen_lowpart below. */
+ if (GET_MODE_SIZE (GET_MODE (x)) == GET_MODE_SIZE (HImode)
+ || SUBREG_BYTE (x) == 0)
+ x = SUBREG_REG (x);
+
+ /* gen_lowpart on a SUBREG can ICE. */
+ if (gen_lowpart_common (mode, x) == 0)
+ x = force_reg (GET_MODE (x), x);
+ }
+
+ x = gen_lowpart (mode, x);
I wonder if this is overly complex. Wouldn't it be better to just write here:
else if (!REG_P (x))
x = force_reg (GET_MODE (x), x);
and then let the optimizers clean things up?
The !REG_P(x) condition is not right, because force_reg crashes if x
== (const_int 1) for instance (we have mode=VOID in
int_mode_for_mode).
The first 'if' which looks at the mode sizes it to avoid regressions
in DLSTP transform, despite recent improvements there.
Without this, dlstp-compile-asm-2.c has regressions in test6 and test9.
For some reason we have:
(set (reg:HI 137 [ p1 ]
(subreg/s/v:HI (reg/v:SI 129 [ p1 ]) 0)) (nil))
(set (reg:V4BI 138 [ p1 ])
(subreg:V4BI (reg:HI 137 [ p1 ]) 0)) (nil))
which not hoisted, while with the check above we hoist this:
(set (reg:V4BI 136 [ p1 ])
(subreg:V4BI (reg/v:SI 129 [ p1 ]) 0)) (nil))
I was thinking this could be OK for gcc-15, and I would look at
removing this hack for gcc-16.
The gen_lowpart_common (mode, x) == 0 part is to call force_reg only
if gen_lowpart might have trouble with the current x, but avoid
unnecessary force_reg that could confuse DLSTP transform too.
In practice this is not exercised by the testsuite, but Jakub raised
concerns after gen_lowpart possibly ICEing.
So how about using force_subreg_lowpart at the end, rather than gen_lowpart?
Do you mean gen_lowpart_SUBREG? (force_subreg_lowpart does not exist?)
gen_lowpart_SUBREG ICEs when x is a const_int, so do you mean:
if (CONST_INT_P (x))
x = gen_lowpart (mode, x);
else
x = gen_lowpart_SUBREG (mode, x);
Without the checks of the mode sizes, we still have regressions in
DLSTP transform anyway.
Sorry, I meant force_lowpart_subreg(). In explow.cc just after
force_subreg.
R.
Thanks,
Christophe
R.
Thanks,
Christophe
R.