On 10/02/2023 09:11, Jakub Jelinek wrote:
I've tried to fix the -flto thing and I can't figure out how. The problem
seems to be that there are two dump files from the two compiler invocations
and it scans the wrong one. Aarch64 has the same problem.

Two dumps are because it is in a dg-do run test.
I think it would be better to separate it, have for all cases one
test with defaulted dg-do (in vect.exp that is either dg-do run or dg-do
compile:
# If the target system supports vector instructions, the default action
# for a test is 'run', otherwise it's 'compile'.
) without the dg-final and then another one with the same TYPE which would
be forcibly dg-do compile with dg-final and
dg-additional-options "-ffat-lto-objects", then you get a single dump only.

If I change the testcase to "dg-do compile" then it does indeed only produce one dump, but it's still the wrong one.

The command it runs is this (I removed some noise):

  x86_64-none-linux-gnu-gcc vect-simd-clone-16.c -flto -ffat-lto-objects \
              -msse2 -ftree-vectorize -fno-tree-loop-distribute-patterns \
              -fno-vect-cost-model -fno-common -O2 \
              -fdump-tree-vect-details -fopenmp-simd -mavx

With "-S" (dg-do compile) I get

  vect-simd-clone-16.c.172t.vect

Otherwise (dg-do run) I get

  a-vect-simd-clone-16.c.172t.vect
  a.ltrans0.ltrans.172t.vect

The "ltrans0" dump has the "foo.simdclone" output that we're looking for, but dejagnu appears to be scanning the other, which does not. The filenames vary between the two commands, but the contents are identical.
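
One way to double-check which dump actually contains the clone calls is to count matching lines per file. The following is only a minimal sketch, assuming the dump files are in the current directory and end in ".vect" as in the listings above; count_in_dumps is a hypothetical helper name, not something provided by DejaGnu or the GCC testsuite.

```shell
# Hypothetical helper (not part of DejaGnu or the GCC testsuite):
# print "<count> <file>" for every *.vect dump in the current directory,
# where <count> is the number of lines containing the given fixed string.
count_in_dumps() {
  pattern=$1
  for f in *.vect; do
    # If no dump exists the glob stays unexpanded; skip it.
    [ -e "$f" ] || continue
    # -F: treat the pattern as a literal string, so "." is not a wildcard.
    # -c: print the number of matching lines (0 if there are none).
    printf '%s %s\n' "$(grep -c -F -- "$pattern" "$f")" "$f"
  done
}
```

Calling `count_in_dumps 'foo.simdclone'` after the "dg-do run" style command should then show at a glance which of the two dumps has a non-zero count.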

+/* { dg-final { scan-tree-dump-times "simdclone" 18 "optimized" { target x86_64-*-* } } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 7 "optimized" { target amdgcn-*-* } } } */

And scan-tree-dump-times " = foo.simdclone" 2 "optimized"; I'd think that
should be the right number for all of x86_64, amdgcn and aarch64.  And
please don't forget about i?86-*-* too.

I've switched the pattern and changed to using the "vect" dump (instead of
"optimized") so that the later transformations don't mess up the counts.
However there are still other reasons why the count varies. It might be that
those can be turned off by options somehow, but probably testing those cases
is valuable too. The values are 2, 3, or 4, now, instead of 18, so that's an
improvement.

But it still varies between the architectures, so it is an extra maintenance
nightmare.

+/* TODO: aarch64 */

For aarch64, one would need to include it in
check_effective_target_vect_simd_clones first...

I've done so and tested it, but that's not included in the patch because
there were other testcases that started reporting fails. None of the new
testcases fail for Aarch64.

Sure, that would be for a separate patch.

Anyway, if you want, commit the patch as is and tweak the testcases if
possible incrementally.

I will do so now. It would be nice to fix the testcase oddities, but I don't know how.

I wrote the above yesterday, but apparently the email didn't send ... since then some bugs have been reported. I'll try to investigate today, although I think Richi has a fix already.

Thanks

Andrew
