On 7/21/23 11:31, Palmer Dabbelt wrote:
On Fri, 21 Jul 2023 10:55:52 PDT (-0700), Vineet Gupta wrote:
DF +0.0 is bitwise all zeros so int x0 store to mem can be used to
optimize it.
void zd(double *) { *d = 0.0; }
currently:
| fmv.d.x fa5,zero
| fsd fa5,0(a0)
| ret
With patch
| sd zero,0(a0)
| ret
This came to light when testing the in-flight f-m-o patch where an ICE
was gettinh triggered due to lack of this pattern but turns out this
is an independent optimization of its own [1]
[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624857.html
Apparently this is a regression in gcc-13, introduced by commit
ef85d150b5963 ("RISC-V: Enable TARGET_SUPPORTS_WIDE_INT") and the fix
thus is a partial revert of that change.
Given that it can ICE, we should probably backport it to 13.
FWIW ICE is on an in-flight for-gcc-14 patch, not something in tree
already. And this will merge ahead of that.
I'm fine with backport though.
Ran thru full multilib testsuite, there was 1 false failure due to
Did you run the test with autovec?
I have standard 32/64 mutlilibs, but no 'v' in arch so autovec despite
being enabled at -O2 and above will not kick in.
I think we should add a 'v' multilib.
There's also a pmode_reg_or_0_operand, some of those don't appear
protected from FP values. So we might need something like
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index cd5b19457f8..d8ce9223343 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -63,7 +63,7 @@ (define_expand "movmisalign<mode>"
(define_expand "len_mask_gather_load<VNX1_QHSD:mode><VNX1_QHSDI:mode>"
[(match_operand:VNX1_QHSD 0 "register_operand")
- (match_operand 1 "pmode_reg_or_0_operand")
+ (match_operand:P 1 "pmode_reg_or_0_operand")
(match_operand:VNX1_QHSDI 2 "register_operand")
(match_operand 3 "<VNX1_QHSD:gs_extension>")
(match_operand 4 "<VNX1_QHSD:gs_scale>")
a bunch of times, as there's a ton of them? I'm not entirely sure if
that
could manifest as an actual bug, though...
What does 'P' do here ?
+
+void zd(double *d) { *d = 0.0; }
+void zf(float *f) { *f = 0.0; }
+
+/* { dg-final { scan-assembler-not "\tfmv\\.d\\.x\t" } } */
+/* { dg-final { scan-assembler-not "\tfmv\\.s\\.x\t" } } */
IIUC the pattern to emit fmv suffers from the same bug -- it's fixed
in the same
way, but I think we might be able to come up with a test for it:
`fmv.d.x FREG,
x0` would be the fastest way to generate 0.0, so maybe something like
double sum(double *d) {
double sum = 0;
for (int i = 0; i < 8; ++i)
sum += d[i];
return sum;
}
would do it? That's generating the fmv on 13 for me, though, so maybe
I'm
missing something?`
I need to unpack this first :-)
diff --git a/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
b/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
index 1036044291e7..89eb48bed1b9 100644
--- a/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
+++ b/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
@@ -18,7 +18,7 @@ d2ll (double d)
/* { dg-final { scan-assembler "th.fmv.hw.x" } } */
/* { dg-final { scan-assembler "fmv.x.w" } } */
/* { dg-final { scan-assembler "th.fmv.x.hw" } } */
-/* { dg-final { scan-assembler-not "sw" } } */
-/* { dg-final { scan-assembler-not "fld" } } */
-/* { dg-final { scan-assembler-not "fsd" } } */
-/* { dg-final { scan-assembler-not "lw" } } */
+/* { dg-final { scan-assembler-not "\tsw\t" } } */
+/* { dg-final { scan-assembler-not "\tfld\t" } } */
+/* { dg-final { scan-assembler-not "\tfsd\t" } } */
+/* { dg-final { scan-assembler-not "\tlw\t" } } */
I think that autovec one is the only possible dependency that might
have snuck
in, so we should be safe otherwise. Thanks!
I'm not sure if this specific comment is related to the xthead test or
continuation of above.
For xthead it is real issue since I saw a random "lw" in lto assembler
output.