On 08/04/2024 11:45, Thomas Schwinge wrote:
Hi!

On 2024-03-28T08:00:50+0100, I wrote:
On 2024-03-22T15:54:48+0000, Andrew Stubbs <a...@baylibre.com> wrote:
This patch alters the default (preferred) vector size to 32 on RDNA devices to
better match the actual hardware.  64-lane vectors will continue to be
used where they are hard-coded (such as function prologues).

We run these devices in wavefrontsize64 for compatibility, but they actually
only have 32-lane vectors, natively.  If the upper part of a V64 is masked
off (as it is in V32) then RDNA devices will skip execution of the upper part
for most operations, so this adjustment shouldn't leave too much performance on
the table.  One exception is memory instructions, so full wavefrontsize32
support would be better.
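
To make the masking argument concrete, here is a minimal sketch in plain C (not GCC code). It models the 64-bit EXEC mask of a wavefrontsize64 wave. The names 'exec_mask_for_lanes' and 'upper_half_idle' are made up for illustration, and whether the hardware actually skips the idle half is its own choice; this only shows the condition under which it may.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative model only: build an EXEC mask with the low N_LANES bits
   set, as if a vector of N_LANES lanes ran inside a 64-lane wavefront.  */
static uint64_t
exec_mask_for_lanes (unsigned n_lanes)
{
  if (n_lanes >= 64)
    return UINT64_MAX;
  return (UINT64_C (1) << n_lanes) - 1;
}

/* True if the upper 32 lanes are all masked off, i.e. the case in which
   RDNA hardware may skip executing the upper half.  */
static bool
upper_half_idle (uint64_t exec)
{
  return (exec >> 32) == 0;
}
```

With a V32 mask, 'upper_half_idle (exec_mask_for_lanes (32))' is true, whereas a full V64 mask keeps both halves busy.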

The advantage is that we avoid the missing V64 operations (such as permute and
vec_extract).

Committed to mainline.

In my GCN target '-march=gfx1100' testing, this commit
"amdgcn: Prefer V32 on RDNA devices" does resolve (or, make latent?) a
number of execution test FAILs (that is, regressions compared to earlier
'-march=gfx90a' etc. testing).

This commit also resolves (for my '-march=gfx1100' testing) one
pre-existing FAIL (that is, already seen in earlier '-march=gfx90a'
etc. testing):

     PASS: gcc.dg/tree-ssa/scev-14.c (test for excess errors)
     [-FAIL:-]{+PASS:+} gcc.dg/tree-ssa/scev-14.c scan-tree-dump ivopts 
"Overflowness wrto loop niter:\tNo-overflow"

That means, this test case specifically (or, just its 'scan-tree-dump'?)
needs to be adjusted for GCN V64 testing?

This commit, as you'd also mentioned elsewhere, however also causes a
number of regressions in 'gcc.target/gcn/gcn.exp', see list below.

Those can be "fixed" with 'dg-additional-options -march=gfx90a' (or
similar) in the affected test cases (let me know if you'd like me to
'git push' that), but I suppose something more elaborate may be in order?
(Conditionalize those on 'target { ! gcn_rdna }', and add respective
scanning for 'target gcn_rdna'?  I can help with effective-target
'gcn_rdna' (or similar), if you'd like me to.)
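
As a rough sketch of that second idea: a test case could keep its existing V64 scan for non-RDNA targets and add a V32 scan for RDNA. Everything RDNA-specific here is an assumption: the effective-target keyword 'gcn_rdna' does not exist yet, and the 'smaxv32si3_exec' pattern name is only guessed by analogy with the V64 name. The loop and the options are an illustrative stand-in, not a copy of an existing test.

```c
/* { dg-do compile } */
/* { dg-options "-O2 -ftree-vectorize" } */

#define N 1024

/* Conditional signed max: only elements with pred[i] != 0 take the
   maximum; the others keep a[i].  */
void
cond_smax (int *restrict r, const int *restrict a,
           const int *restrict b, const int *restrict pred)
{
  for (int i = 0; i < N; i++)
    r[i] = pred[i] ? (a[i] > b[i] ? a[i] : b[i]) : a[i];
}

/* Existing-style V64 scan, restricted to non-RDNA devices.
   'gcn_rdna' is a hypothetical effective-target keyword.  */
/* { dg-final { scan-assembler "smaxv64si3_exec" { target { ! gcn_rdna } } } } */

/* Assumed V32 counterpart for RDNA devices; the pattern name is a guess.  */
/* { dg-final { scan-assembler "smaxv32si3_exec" { target gcn_rdna } } } */
```

The sketch uses 'scan-assembler' rather than 'scan-assembler-times' because the expected V32 counts are not known here and would have to be checked against real compiler output.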

And/or, have a '-mpreferred-simd-mode=v64' (or similar) to be used for
such test cases, to override 'if (TARGET_RDNA2_PLUS)' etc. in
'gcn_vectorize_preferred_simd_mode'?

The latter I have quickly implemented, see attached
"GCN: '--param=gcn-preferred-vector-lane-width=[default,32,64]'".  OK to
push to trunk branch?

(This '--param' will also be useful for another bug/regression I'm about
to file.)
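
For readers without the attachment, the shape of the override is roughly as follows. This is a stand-alone model in plain C, not the attached patch: the enum, its values, and 'preferred_lanes' are hypothetical names, and 'is_rdna' stands in for the 'TARGET_RDNA2_PLUS' check.

```c
#include <stdbool.h>

/* Hypothetical mirror of the three '--param' values: default, 32, 64.  */
enum preferred_lanes_setting
{
  PREFERRED_LANES_DEFAULT,
  PREFERRED_LANES_32,
  PREFERRED_LANES_64
};

/* Return the number of lanes the vectorizer should prefer.  An explicit
   setting wins; otherwise RDNA devices get 32 and everything else 64.  */
static int
preferred_lanes (bool is_rdna, enum preferred_lanes_setting setting)
{
  switch (setting)
    {
    case PREFERRED_LANES_32:
      return 32;
    case PREFERRED_LANES_64:
      return 64;
    case PREFERRED_LANES_DEFAULT:
    default:
      return is_rdna ? 32 : 64;
    }
}
```

In this model, a test case that scans for V64 patterns would request 64 explicitly and so behave the same on RDNA and non-RDNA devices.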

Best, probably, both these things, to properly test both V32 and V64?

That part remains to be done, but is best done by someone who actually
knows "GCN" assembly/GCC back end -- that is, not me.

I'm not sure that this is the *best* solution to the problem (in general, it's probably best to test the actual code that will be generated in practice). Still, I think this option will be useful for testing performance in each configuration and for other correctness issues, and these tests are not testing that feature.

However, "vector lane width" sounds like it's configuring the number of bits in each lane. I think "vectorization factor" is unambiguous.

OK to commit, with the name change.

Andrew



Grüße
  Thomas


     PASS: gcc.target/gcn/cond_fmaxnm_1.c (test for excess errors)
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_1.c scan-assembler-not 
\\tv_writelane_b32\\tv[0-9]+, vcc_..
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_1.c scan-assembler-times 
smaxv64df3_exec 3
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_1.c scan-assembler-times 
smaxv64sf3_exec 3
     PASS: gcc.target/gcn/cond_fmaxnm_1_run.c (test for excess errors)
     PASS: gcc.target/gcn/cond_fmaxnm_1_run.c execution test

     PASS: gcc.target/gcn/cond_fmaxnm_2.c (test for excess errors)
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_2.c scan-assembler-not 
\\tv_writelane_b32\\tv[0-9]+, vcc_..
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_2.c scan-assembler-times 
smaxv64df3_exec 3
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_2.c scan-assembler-times 
smaxv64sf3_exec 3
     PASS: gcc.target/gcn/cond_fmaxnm_2_run.c (test for excess errors)
     PASS: gcc.target/gcn/cond_fmaxnm_2_run.c execution test

     PASS: gcc.target/gcn/cond_fmaxnm_3.c (test for excess errors)
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_3.c scan-assembler-not 
\\tv_writelane_b32\\tv[0-9]+, vcc_..
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_3.c scan-assembler-times 
movv64df_exec 3
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_3.c scan-assembler-times 
movv64sf_exec 3
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_3.c scan-assembler-times 
smaxv64sf3 3
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_3.c scan-assembler-times 
smaxv64sf3 3
     PASS: gcc.target/gcn/cond_fmaxnm_3_run.c (test for excess errors)
     PASS: gcc.target/gcn/cond_fmaxnm_3_run.c execution test

     PASS: gcc.target/gcn/cond_fmaxnm_4.c (test for excess errors)
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_4.c scan-assembler-not 
\\tv_writelane_b32\\tv[0-9]+, vcc_..
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_4.c scan-assembler-times 
movv64df_exec 3
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_4.c scan-assembler-times 
movv64sf_exec 3
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_4.c scan-assembler-times 
smaxv64sf3 3
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_4.c scan-assembler-times 
smaxv64sf3 3
     PASS: gcc.target/gcn/cond_fmaxnm_4_run.c (test for excess errors)
     PASS: gcc.target/gcn/cond_fmaxnm_4_run.c execution test

     PASS: gcc.target/gcn/cond_fmaxnm_5.c (test for excess errors)
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_5.c scan-assembler-not 
\\tv_writelane_b32\\tv[0-9]+, vcc_..
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_5.c scan-assembler-times 
smaxv64df3_exec 3
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_5.c scan-assembler-times 
smaxv64sf3_exec 3
     PASS: gcc.target/gcn/cond_fmaxnm_5_run.c (test for excess errors)
     PASS: gcc.target/gcn/cond_fmaxnm_5_run.c execution test

     PASS: gcc.target/gcn/cond_fmaxnm_6.c (test for excess errors)
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_6.c scan-assembler-not 
\\tv_writelane_b32\\tv[0-9]+, vcc_..
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_6.c scan-assembler-times 
smaxv64df3_exec 3
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_6.c scan-assembler-times 
smaxv64sf3_exec 3
     PASS: gcc.target/gcn/cond_fmaxnm_6_run.c (test for excess errors)
     PASS: gcc.target/gcn/cond_fmaxnm_6_run.c execution test

     PASS: gcc.target/gcn/cond_fmaxnm_7.c (test for excess errors)
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_7.c scan-assembler-not 
\\tv_writelane_b32\\tv[0-9]+, vcc_..
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_7.c scan-assembler-times 
smaxv64df3_exec 3
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_7.c scan-assembler-times 
smaxv64sf3_exec 3
     PASS: gcc.target/gcn/cond_fmaxnm_7_run.c (test for excess errors)
     PASS: gcc.target/gcn/cond_fmaxnm_7_run.c execution test

     PASS: gcc.target/gcn/cond_fmaxnm_8.c (test for excess errors)
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_8.c scan-assembler-not 
\\tv_writelane_b32\\tv[0-9]+, vcc_..
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_8.c scan-assembler-times 
smaxv64df3_exec 3
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fmaxnm_8.c scan-assembler-times 
smaxv64sf3_exec 3
     PASS: gcc.target/gcn/cond_fmaxnm_8_run.c (test for excess errors)
     PASS: gcc.target/gcn/cond_fmaxnm_8_run.c execution test

     PASS: gcc.target/gcn/cond_fminnm_1.c (test for excess errors)
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_1.c scan-assembler-not 
\\tv_writelane_b32\\tv[0-9]+, vcc_..
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_1.c scan-assembler-times 
sminv64df3_exec 3
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_1.c scan-assembler-times 
sminv64sf3_exec 3
     PASS: gcc.target/gcn/cond_fminnm_1_run.c (test for excess errors)
     PASS: gcc.target/gcn/cond_fminnm_1_run.c execution test

     PASS: gcc.target/gcn/cond_fminnm_2.c (test for excess errors)
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_2.c scan-assembler-not 
\\tv_writelane_b32\\tv[0-9]+, vcc_..
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_2.c scan-assembler-times 
sminv64df3_exec 3
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_2.c scan-assembler-times 
sminv64sf3_exec 3
     PASS: gcc.target/gcn/cond_fminnm_2_run.c (test for excess errors)
     PASS: gcc.target/gcn/cond_fminnm_2_run.c execution test

     PASS: gcc.target/gcn/cond_fminnm_3.c (test for excess errors)
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_3.c scan-assembler-not 
\\tv_writelane_b32\\tv[0-9]+, vcc_..
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_3.c scan-assembler-times 
movv64df_exec 3
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_3.c scan-assembler-times 
movv64sf_exec 3
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_3.c scan-assembler-times 
sminv64sf3 3
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_3.c scan-assembler-times 
sminv64sf3 3
     PASS: gcc.target/gcn/cond_fminnm_3_run.c (test for excess errors)
     PASS: gcc.target/gcn/cond_fminnm_3_run.c execution test

     PASS: gcc.target/gcn/cond_fminnm_4.c (test for excess errors)
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_4.c scan-assembler-not 
\\tv_writelane_b32\\tv[0-9]+, vcc_..
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_4.c scan-assembler-times 
movv64df_exec 3
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_4.c scan-assembler-times 
movv64sf_exec 3
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_4.c scan-assembler-times 
sminv64sf3 3
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_4.c scan-assembler-times 
sminv64sf3 3
     PASS: gcc.target/gcn/cond_fminnm_4_run.c (test for excess errors)
     PASS: gcc.target/gcn/cond_fminnm_4_run.c execution test

     PASS: gcc.target/gcn/cond_fminnm_5.c (test for excess errors)
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_5.c scan-assembler-not 
\\tv_writelane_b32\\tv[0-9]+, vcc_..
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_5.c scan-assembler-times 
sminv64df3_exec 3
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_5.c scan-assembler-times 
sminv64sf3_exec 3
     PASS: gcc.target/gcn/cond_fminnm_5_run.c (test for excess errors)
     PASS: gcc.target/gcn/cond_fminnm_5_run.c execution test

     PASS: gcc.target/gcn/cond_fminnm_6.c (test for excess errors)
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_6.c scan-assembler-not 
\\tv_writelane_b32\\tv[0-9]+, vcc_..
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_6.c scan-assembler-times 
sminv64df3_exec 3
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_6.c scan-assembler-times 
sminv64sf3_exec 3
     PASS: gcc.target/gcn/cond_fminnm_6_run.c (test for excess errors)
     PASS: gcc.target/gcn/cond_fminnm_6_run.c execution test

     PASS: gcc.target/gcn/cond_fminnm_7.c (test for excess errors)
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_7.c scan-assembler-not 
\\tv_writelane_b32\\tv[0-9]+, vcc_..
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_7.c scan-assembler-times 
sminv64df3_exec 3
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_7.c scan-assembler-times 
sminv64sf3_exec 3
     PASS: gcc.target/gcn/cond_fminnm_7_run.c (test for excess errors)
     PASS: gcc.target/gcn/cond_fminnm_7_run.c execution test

     PASS: gcc.target/gcn/cond_fminnm_8.c (test for excess errors)
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_8.c scan-assembler-not 
\\tv_writelane_b32\\tv[0-9]+, vcc_..
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_8.c scan-assembler-times 
sminv64df3_exec 3
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_fminnm_8.c scan-assembler-times 
sminv64sf3_exec 3
     PASS: gcc.target/gcn/cond_fminnm_8_run.c (test for excess errors)
     PASS: gcc.target/gcn/cond_fminnm_8_run.c execution test

     @@ -124634,12 +124634,12 @@ PASS: gcc.target/gcn/cond_shift_3.c 
scan-assembler-not movv64di_exec/2
     PASS: gcc.target/gcn/cond_shift_3.c scan-assembler-not v_cndmask_b32
     PASS: gcc.target/gcn/cond_shift_3.c scan-assembler-times 
\\tv_ashrrev_i32\\tv[0-9]+, 3, v[0-9]+ 1
     PASS: gcc.target/gcn/cond_shift_3.c scan-assembler-times 
\\tv_lshlrev_b32\\tv[0-9]+, 3, v[0-9]+ 10
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_shift_3.c scan-assembler-times 
vashlv64di3_exec 2
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_shift_3.c scan-assembler-times 
vashlv64si3_exec 18
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_shift_3.c scan-assembler-times 
vashrv64di3_exec 1
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_shift_3.c scan-assembler-times 
vashrv64si3_exec 1
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_shift_3.c scan-assembler-times 
vlshrv64di3_exec 1
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_shift_3.c scan-assembler-times 
vlshrv64si3_exec 1
     PASS: gcc.target/gcn/cond_shift_3_run.c (test for excess errors)
     PASS: gcc.target/gcn/cond_shift_3_run.c execution test

     PASS: gcc.target/gcn/cond_shift_4.c (test for excess errors)
     @@ -124647,77 +124647,77 @@ PASS: gcc.target/gcn/cond_shift_4.c 
scan-assembler-not movv64di_exec/2
     PASS: gcc.target/gcn/cond_shift_4.c scan-assembler-not v_cndmask_b32
     PASS: gcc.target/gcn/cond_shift_4.c scan-assembler-times 
\\tv_ashrrev_i32\\tv[0-9]+, 3, v[0-9]+ 1
     PASS: gcc.target/gcn/cond_shift_4.c scan-assembler-times 
\\tv_lshlrev_b32\\tv[0-9]+, 3, v[0-9]+ 10
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_shift_4.c scan-assembler-times 
vashlv64di3_exec 2
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_shift_4.c scan-assembler-times 
vashlv64si3_exec 18
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_shift_4.c scan-assembler-times 
vashrv64di3_exec 1
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_shift_4.c scan-assembler-times 
vashrv64si3_exec 1
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_shift_4.c scan-assembler-times 
vlshrv64di3_exec 1
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_shift_4.c scan-assembler-times 
vlshrv64si3_exec 1
     PASS: gcc.target/gcn/cond_shift_4_run.c (test for excess errors)
     PASS: gcc.target/gcn/cond_shift_4_run.c execution test

     PASS: gcc.target/gcn/cond_shift_8.c (test for excess errors)
     PASS: gcc.target/gcn/cond_shift_8.c scan-assembler-not movv64di_exec/0
     PASS: gcc.target/gcn/cond_shift_8.c scan-assembler-not movv64si_exec/0
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_shift_8.c scan-assembler-times 
vashlv64di3_exec 2
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_shift_8.c scan-assembler-times 
vashlv64si3_exec 18
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_shift_8.c scan-assembler-times 
vashrv64di3_exec 1
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_shift_8.c scan-assembler-times 
vashrv64si3_exec 1
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_shift_8.c scan-assembler-times 
vlshrv64di3_exec 1
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_shift_8.c scan-assembler-times 
vlshrv64si3_exec 1
     PASS: gcc.target/gcn/cond_shift_8_run.c (test for excess errors)
     PASS: gcc.target/gcn/cond_shift_8_run.c execution test

     PASS: gcc.target/gcn/cond_shift_9.c (test for excess errors)
     PASS: gcc.target/gcn/cond_shift_9.c scan-assembler-not movv64di_exec/1
     PASS: gcc.target/gcn/cond_shift_9.c scan-assembler-not movv64si_exec/2
     PASS: gcc.target/gcn/cond_shift_9.c scan-assembler-not v_cndmask_b32
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_shift_9.c scan-assembler-times 
vashlv64di3_exec 2
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_shift_9.c scan-assembler-times 
vashlv64si3_exec 18
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_shift_9.c scan-assembler-times 
vashrv64di3_exec 1
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_shift_9.c scan-assembler-times 
vashrv64si3_exec 1
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_shift_9.c scan-assembler-times 
vlshrv64di3_exec 1
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_shift_9.c scan-assembler-times 
vlshrv64si3_exec 1
     PASS: gcc.target/gcn/cond_shift_9_run.c (test for excess errors)
     PASS: gcc.target/gcn/cond_shift_9_run.c execution test

     PASS: gcc.target/gcn/cond_smax_1.c (test for excess errors)
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_smax_1.c scan-assembler-not 
\\ts_cmpk_lg_u32\\tvcc_lo, 0
     PASS: gcc.target/gcn/cond_smax_1.c scan-assembler-not 
\\tv_cmpx_gt_i32\\tvcc, s[0-9]+, v[0-9]+
     PASS: gcc.target/gcn/cond_smax_1.c scan-assembler-not 
\\tv_writelane_b32\\tv[0-9]+, vcc_??, 0
     PASS: gcc.target/gcn/cond_smax_1.c scan-assembler-not smaxv64si3/0
     PASS: gcc.target/gcn/cond_smax_1.c scan-assembler-times 
\\tv_cmp_gt_i32\\tvcc, s[0-9]+, v[0-9]+ 80
     PASS: gcc.target/gcn/cond_smax_1.c scan-assembler-times 
\\tv_cmp_gt_i64\\tvcc, v[[0-9]+:[0-9]+], v[[0-9]+:[0-9]+] 10
     PASS: gcc.target/gcn/cond_smax_1.c scan-assembler-times 
\\tv_cmp_ne_u64\\ts\\[[0-9]+:[0-9]+\\], v\\[[0-9]+:[0-9]+\\], -1 10
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_smax_1.c scan-assembler-times 
smaxv64si3_exec 30
     PASS: gcc.target/gcn/cond_smax_1_run.c (test for excess errors)
     PASS: gcc.target/gcn/cond_smax_1_run.c execution test

     PASS: gcc.target/gcn/cond_smin_1.c (test for excess errors)
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_smin_1.c scan-assembler-not 
\\ts_cmpk_lg_u32\\tvcc_lo, 0
     PASS: gcc.target/gcn/cond_smin_1.c scan-assembler-not 
\\tv_cmpx_gt_i32\\tvcc, s[0-9]+, v[0-9]+
     PASS: gcc.target/gcn/cond_smin_1.c scan-assembler-not 
\\tv_writelane_b32\\tv[0-9]+, vcc_??, 0
     PASS: gcc.target/gcn/cond_smin_1.c scan-assembler-not sminv64si3/0
     PASS: gcc.target/gcn/cond_smin_1.c scan-assembler-times 
\\tv_cmp_gt_i32\\tvcc, s[0-9]+, v[0-9]+ 80
     PASS: gcc.target/gcn/cond_smin_1.c scan-assembler-times 
\\tv_cmp_lt_i64\\tvcc, v[[0-9]+:[0-9]+], v[[0-9]+:[0-9]+] 10
     PASS: gcc.target/gcn/cond_smin_1.c scan-assembler-times 
\\tv_cmp_ne_u64\\ts\\[[0-9]+:[0-9]+\\], v\\[[0-9]+:[0-9]+\\], -1 10
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_smin_1.c scan-assembler-times 
sminv64si3_exec 30
     PASS: gcc.target/gcn/cond_smin_1_run.c (test for excess errors)
     PASS: gcc.target/gcn/cond_smin_1_run.c execution test

     PASS: gcc.target/gcn/cond_umax_1.c (test for excess errors)
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_umax_1.c scan-assembler-not 
\\ts_cmpk_lg_u32\\tvcc_lo, 0
     PASS: gcc.target/gcn/cond_umax_1.c scan-assembler-not 
\\tv_writelane_b32\\tv[0-9]+, vcc_??, 0
     PASS: gcc.target/gcn/cond_umax_1.c scan-assembler-not umaxv64si3/0
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_umax_1.c scan-assembler-times 
\\tv_cmp_gt_i32\\tvcc, s[0-9]+, v[0-9]+ 56
     PASS: gcc.target/gcn/cond_umax_1.c scan-assembler-times 
\\tv_cmp_gt_u64\\tvcc, v[[0-9]+:[0-9]+], v[[0-9]+:[0-9]+] 8
     PASS: gcc.target/gcn/cond_umax_1.c scan-assembler-times 
\\tv_cmp_ne_u64\\ts\\[[0-9]+:[0-9]+\\], v\\[[0-9]+:[0-9]+\\], 1 8
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_umax_1.c scan-assembler-times 
umaxv64si3_exec 20
     PASS: gcc.target/gcn/cond_umax_1_run.c (test for excess errors)
     PASS: gcc.target/gcn/cond_umax_1_run.c execution test

     PASS: gcc.target/gcn/cond_umin_1.c (test for excess errors)
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_umin_1.c scan-assembler-not 
\\ts_cmpk_lg_u32\\tvcc_lo, 0
     PASS: gcc.target/gcn/cond_umin_1.c scan-assembler-not 
\\tv_writelane_b32\\tv[0-9]+, vcc_??, 0
     PASS: gcc.target/gcn/cond_umin_1.c scan-assembler-not uminv64si3/0
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_umin_1.c scan-assembler-times 
\\tv_cmp_gt_i32\\tvcc, s[0-9]+, v[0-9]+ 56
     PASS: gcc.target/gcn/cond_umin_1.c scan-assembler-times 
\\tv_cmp_lt_u64\\tvcc, v[[0-9]+:[0-9]+], v[[0-9]+:[0-9]+] 8
     PASS: gcc.target/gcn/cond_umin_1.c scan-assembler-times 
\\tv_cmp_ne_u64\\ts\\[[0-9]+:[0-9]+\\], v\\[[0-9]+:[0-9]+\\], 1 8
     [-PASS:-]{+FAIL:+} gcc.target/gcn/cond_umin_1.c scan-assembler-times 
uminv64si3_exec 20
     PASS: gcc.target/gcn/cond_umin_1_run.c (test for excess errors)
     PASS: gcc.target/gcn/cond_umin_1_run.c execution test

     PASS: gcc.target/gcn/simd-math-1.c (test for excess errors)
     [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect 
"v64df_acos"
     [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect 
"v64df_acosh"
     [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect 
"v64df_asin"
     [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect 
"v64df_asinh"
     [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect 
"v64df_atan"
     [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect 
"v64df_atan2"
     [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect 
"v64df_atanh"
     [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect 
"v64df_copysign"
     [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect 
"v64df_cos"
     [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect 
"v64df_cosh"
     [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect 
"v64df_erf"
     [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect 
"v64df_exp"
     [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect 
"v64df_exp2"
     [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect 
"v64df_fmod"
     XFAIL: gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64df_gamma"
     [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect 
"v64df_hypot"
     XFAIL: gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64df_lgamma"
     [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect 
"v64df_log"
     [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect 
"v64df_log10"
     XFAIL: gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64df_log2"
     [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect 
"v64df_pow"
     [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect 
"v64df_remainder"
     [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect 
"v64df_rint"
     [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect 
"v64df_scalb"
     [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect 
"v64df_significand"
     [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect 
"v64df_sin"
     [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect 
"v64df_sinh"
     [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect 
"v64df_sqrt"
     [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect 
"v64df_tan"
     [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect 
"v64df_tanh"
     [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect 
"v64df_tgamma"
     [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect 
"v64sf_acosf"
     [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect 
"v64sf_acoshf"
     [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect 
"v64sf_asinf"
     [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect 
"v64sf_asinhf"
     [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect 
"v64sf_atan2f"
     [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect 
"v64sf_atanf"
     [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect 
"v64sf_atanhf"
     [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect 
"v64sf_copysignf"
     [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect 
"v64sf_cosf"
     [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect 
"v64sf_coshf"
     [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect 
"v64sf_erff"
     [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect 
"v64sf_exp2f"
     [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect 
"v64sf_expf"
     [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect 
"v64sf_fmodf"
     XFAIL: gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64sf_gammaf"
     [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect 
"v64sf_hypotf"
     XFAIL: gcc.target/gcn/simd-math-1.c scan-tree-dump vect "v64sf_lgammaf"
     [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect 
"v64sf_log10f"
     [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect 
"v64sf_log2f"
     [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect 
"v64sf_logf"
     [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect 
"v64sf_powf"
     [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect 
"v64sf_remainderf"
     [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect 
"v64sf_rintf"
     [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect 
"v64sf_scalbf"
     [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect 
"v64sf_significandf"
     [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect 
"v64sf_sinf"
     [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect 
"v64sf_sinhf"
     [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect 
"v64sf_sqrtf"
     [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect 
"v64sf_tanf"
     [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect 
"v64sf_tanhf"
     [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-1.c scan-tree-dump vect 
"v64sf_tgammaf"

     @@ -125130,7 +125130,7 @@ PASS: gcc.target/gcn/simd-math-5-char-run.c 
(test for excess errors)
     PASS: gcc.target/gcn/simd-math-5-char-run.c execution test
     PASS: gcc.target/gcn/simd-math-5-char.c (test for excess errors)
     XFAIL: gcc.target/gcn/simd-math-5-char.c scan-assembler-times 
__divmodv64si4@rel32@lo 1
     [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-5-char.c scan-assembler-times 
__divv64hi3@rel32@lo 1
     PASS: gcc.target/gcn/simd-math-5-char.c scan-assembler-times 
__divv64qi3@rel32@lo 0
     FAIL: gcc.target/gcn/simd-math-5-char.c scan-assembler-times 
__modv64qi3@rel32@lo 1
     PASS: gcc.target/gcn/simd-math-5-char.c scan-assembler-times 
__udivv64qi3@rel32@lo 0

     @@ -125171,8 +125171,8 @@ PASS: gcc.target/gcn/simd-math-5-long-run.c 
(test for excess errors)
     PASS: gcc.target/gcn/simd-math-5-long-run.c execution test
     PASS: gcc.target/gcn/simd-math-5-long.c (test for excess errors)
     XFAIL: gcc.target/gcn/simd-math-5-long.c scan-assembler-times 
__divmodv64di4@rel32@lo 1
     [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-5-long.c scan-assembler-times 
__divv64di3@rel32@lo 1
     [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-5-long.c scan-assembler-times 
__modv64di3@rel32@lo 1
     PASS: gcc.target/gcn/simd-math-5-long.c scan-assembler-times 
__udivv64di3@rel32@lo 0
     PASS: gcc.target/gcn/simd-math-5-long.c scan-assembler-times 
__umodv64di3@rel32@lo 0

     PASS: gcc.target/gcn/simd-math-5-short.c (test for excess errors)
     XFAIL: gcc.target/gcn/simd-math-5-short.c scan-assembler-times 
__divmodv64si4@rel32@lo 1
     PASS: gcc.target/gcn/simd-math-5-short.c scan-assembler-times 
__divv64hi3@rel32@lo 0
     [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-5-short.c scan-assembler-times 
__divv64si3@rel32@lo 1
     FAIL: gcc.target/gcn/simd-math-5-short.c scan-assembler-times 
__modv64hi3@rel32@lo 1
     PASS: gcc.target/gcn/simd-math-5-short.c scan-assembler-times 
__udivv64hi3@rel32@lo 0
     PASS: gcc.target/gcn/simd-math-5-short.c scan-assembler-times 
__umodv64hi3@rel32@lo 0

     PASS: gcc.target/gcn/simd-math-5.c (test for excess errors)
     XFAIL: gcc.target/gcn/simd-math-5.c scan-assembler-times 
__divmodv64si4@rel32@lo 1
     PASS: gcc.target/gcn/simd-math-5.c scan-assembler-times __divsi3@rel32@lo 1
     [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-5.c scan-assembler-times 
__divv64si3@rel32@lo 1
     [-PASS:-]{+FAIL:+} gcc.target/gcn/simd-math-5.c scan-assembler-times 
__modv64si3@rel32@lo 1
     PASS: gcc.target/gcn/simd-math-5.c scan-assembler-times 
__udivmodv64si4@rel32@lo 0
     PASS: gcc.target/gcn/simd-math-5.c scan-assembler-times __udivsi3@rel32@lo 0
     PASS: gcc.target/gcn/simd-math-5.c scan-assembler-times 
__udivv64si3@rel32@lo 0
     @@ -125242,13 +125242,13 @@ PASS: gcc.target/gcn/simd-math-5.c 
scan-assembler-times __umodv64si3@rel32@lo 0

     PASS: gcc.target/gcn/smax_1.c (test for excess errors)
     PASS: gcc.target/gcn/smax_1.c scan-assembler-times \\tv_cmp_gt_i64\\tvcc, 
v[[0-9]+:[0-9]+], v[[0-9]+:[0-9]+] 10
     FAIL: gcc.target/gcn/smax_1.c scan-assembler-times \\tv_cmpx_gt_i32\\tvcc, 
s[0-9]+, v[0-9]+ 80
     [-PASS:-]{+FAIL:+} gcc.target/gcn/smax_1.c scan-assembler-times 
vec_cmpv64didi 10
     PASS: gcc.target/gcn/smax_1_run.c (test for excess errors)
     PASS: gcc.target/gcn/smax_1_run.c execution test

     PASS: gcc.target/gcn/smin_1.c (test for excess errors)
     PASS: gcc.target/gcn/smin_1.c scan-assembler-times \\tv_cmp_lt_i64\\tvcc, 
v[[0-9]+:[0-9]+], v[[0-9]+:[0-9]+] 10
     FAIL: gcc.target/gcn/smin_1.c scan-assembler-times \\tv_cmpx_gt_i32\\tvcc, 
s[0-9]+, v[0-9]+ 80
     [-PASS:-]{+FAIL:+} gcc.target/gcn/smin_1.c scan-assembler-times 
vec_cmpv64didi 10
     PASS: gcc.target/gcn/smin_1_run.c (test for excess errors)
     PASS: gcc.target/gcn/smin_1_run.c execution test

     PASS: gcc.target/gcn/sram-ecc-3.c (test for excess errors)
     [-PASS:-]{+FAIL:+} gcc.target/gcn/sram-ecc-3.c scan-assembler 
(\\*zero_extendv64qiv64si_sdwa|\\*zero_extendv64qiv64si_shift)

     PASS: gcc.target/gcn/sram-ecc-4.c (test for excess errors)
     [-PASS:-]{+FAIL:+} gcc.target/gcn/sram-ecc-4.c scan-assembler 
(\\*zero_extendv64hiv64si_sdwa|\\*zero_extendv64hiv64si_shift)

     PASS: gcc.target/gcn/sram-ecc-7.c (test for excess errors)
     [-PASS:-]{+FAIL:+} gcc.target/gcn/sram-ecc-7.c scan-assembler 
(\\*zero_extendv64qiv64si_sdwa|\\*zero_extendv64qiv64si_shift)

     PASS: gcc.target/gcn/sram-ecc-8.c (test for excess errors)
     [-PASS:-]{+FAIL:+} gcc.target/gcn/sram-ecc-8.c scan-assembler 
(\\*zero_extendv64hiv64si_sdwa|\\*zero_extendv64hiv64si_shift)

     PASS: gcc.target/gcn/umax_1.c (test for excess errors)
     PASS: gcc.target/gcn/umax_1.c scan-assembler-times \\tv_cmp_gt_u64\\tvcc, 
v[[0-9]+:[0-9]+], v[[0-9]+:[0-9]+] 8
     FAIL: gcc.target/gcn/umax_1.c scan-assembler-times \\tv_cmpx_gt_i32\\tvcc, 
s[0-9]+, v[0-9]+ 56
     [-PASS:-]{+FAIL:+} gcc.target/gcn/umax_1.c scan-assembler-times 
vec_cmpv64didi 8
     PASS: gcc.target/gcn/umax_1_run.c (test for excess errors)
     PASS: gcc.target/gcn/umax_1_run.c execution test

     PASS: gcc.target/gcn/umin_1.c (test for excess errors)
     PASS: gcc.target/gcn/umin_1.c scan-assembler-times \\tv_cmp_lt_u64\\tvcc, 
v[[0-9]+:[0-9]+], v[[0-9]+:[0-9]+] 8
     FAIL: gcc.target/gcn/umin_1.c scan-assembler-times \\tv_cmpx_gt_i32\\tvcc, 
s[0-9]+, v[0-9]+ 56
     [-PASS:-]{+FAIL:+} gcc.target/gcn/umin_1.c scan-assembler-times 
vec_cmpv64didi 8
     PASS: gcc.target/gcn/umin_1_run.c (test for excess errors)
     PASS: gcc.target/gcn/umin_1_run.c execution test


Grüße
  Thomas


gcc/ChangeLog:

        * config/gcn/gcn.cc (gcn_vectorize_preferred_simd_mode): Prefer V32 on
        RDNA devices.
---
  gcc/config/gcn/gcn.cc | 26 ++++++++++++++++++++++++++
  1 file changed, 26 insertions(+)

diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index 498146dcde9..efb73af50c4 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -5226,6 +5226,32 @@ gcn_vector_mode_supported_p (machine_mode mode)
  static machine_mode
  gcn_vectorize_preferred_simd_mode (scalar_mode mode)
  {
+  /* RDNA devices have 32-lane vectors with limited support for 64-bit vectors
+     (in particular, permute operations are only available for cases that don't
+     span the 32-lane boundary).
+
+     From the RDNA3 manual: "Hardware may choose to skip either half if the
+     EXEC mask for that half is all zeros...". This means that preferring
+     32-lanes is a good stop-gap until we have proper wave32 support.  */
+  if (TARGET_RDNA2_PLUS)
+    switch (mode)
+      {
+      case E_QImode:
+       return V32QImode;
+      case E_HImode:
+       return V32HImode;
+      case E_SImode:
+       return V32SImode;
+      case E_DImode:
+       return V32DImode;
+      case E_SFmode:
+       return V32SFmode;
+      case E_DFmode:
+       return V32DFmode;
+      default:
+       return word_mode;
+      }
+
    switch (mode)
      {
      case E_QImode:
--
2.41.0


