On Fri, 16 Feb 2024, Andrew Stubbs wrote:

> On 16/02/2024 10:17, Richard Biener wrote:
> > On Fri, 16 Feb 2024, Thomas Schwinge wrote:
> >
> >> Hi!
> >>
> >> On 2023-10-20T12:51:03+0100, Andrew Stubbs <a...@codesourcery.com> wrote:
> >>> I've committed this patch
> >>
> >> ... as commit c7ec7bd1c6590cf4eed267feab490288e0b8d691
> >> "amdgcn: add -march=gfx1030 EXPERIMENTAL", which the later RDNA3/gfx1100
> >> support builds on top of, and that's what I'm currently working on
> >> getting proper GCC/GCN target (not offloading) results for.
> >>
> >> Now looking at 'gcc.dg/vect/bb-slp-cond-1.c', which is reasonably simple,
> >> and hopefully representative for other SLP execution test FAILs
> >> (regressions compared to my earlier non-gfx1100 testing).
> >>
> >>     $ build-gcc/gcc/xgcc -Bbuild-gcc/gcc/
> >>     source-gcc/gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c
> >>     --sysroot=install/amdgcn-amdhsa -ftree-vectorize
> >>     -fno-tree-loop-distribute-patterns -fno-vect-cost-model -fno-common
> >>     -O2 -fdump-tree-slp-details -fdump-tree-vect-details -isystem
> >>     build-gcc/amdgcn-amdhsa/gfx1100/newlib/targ-include -isystem
> >>     source-gcc/newlib/libc/include
> >>     -Bbuild-gcc/amdgcn-amdhsa/gfx1100/newlib/
> >>     -Lbuild-gcc/amdgcn-amdhsa/gfx1100/newlib -wrapper
> >>     setarch,--addr-no-randomize -fdump-tree-all-all -fdump-ipa-all-all
> >>     -fdump-rtl-all-all -save-temps -march=gfx1100
> >>
> >> The '-march=gfx1030' 'a-bb-slp-cond-1.s' is identical (apart from
> >> 'TARGET_PACKED_WORK_ITEMS' in 'gcn_target_asm_function_prologue'), so I
> >> suppose will also exhibit the same failure mode, once again?
> >>
> >> Compared to '-march=gfx90a', the differences begin in
> >> 'a-bb-slp-cond-1.c.266r.expand' (only!), down to 'a-bb-slp-cond-1.s'.
> >>
> >> Changed like:
> >>
> >>     @@ -38,10 +38,10 @@ int main ()
> >>      #pragma GCC novector
> >>        for (i = 1; i < N; i++)
> >>          if (a[i] != i%4 + 1)
> >>     -      abort ();
> >>     +      __builtin_printf("%d %d != %d\n", i, a[i], i%4 + 1);
> >>
> >>        if (a[0] != 5)
> >>     -    abort ();
> >>     +    __builtin_printf("%d %d != %d\n", 0, a[0], 5);
> >>
> >> ..., we see:
> >>
> >>     $ flock /tmp/gcn.lock build-gcc/gcc/gcn-run a.out
> >>     40 5 != 1
> >>     41 6 != 2
> >>     42 7 != 3
> >>     43 8 != 4
> >>     44 5 != 1
> >>     45 6 != 2
> >>     46 7 != 3
> >>     47 8 != 4
> >>
> >> '40..47' are the 'i = 10..11' in 'foo', and the expectation is
> >> 'a[i * stride + 0..3] != 0'. So, either some earlier iteration has
> >> scribbled zero values over these (vector lane masking issue, perhaps?),
> >> or some other code generation issue?
> >
> > So we're indeed BB vectorizing this to
> >
> >   _54 = MEM <vector(4) int> [(int *)_14];
> >   vect_iftmp.12_56 = .VCOND (_54, { 0, 0, 0, 0 }, { 1, 2, 3, 4 }, { 5, 6, 7, 8 }, 115);
> >   MEM <vector(4) int> [(int *)_14] = vect_iftmp.12_56;
> >
> > I don't understand the assembly very well but it might be that
> > the mask computation for the .VCOND scribbles the mask used
> > to constrain operation to 4 lanes?
> >
> > .L3:
> >         s_mov_b64 exec, 15
> >         v_add_co_u32 v4, s[22:23], s32, v3
> >         v_mov_b32 v5, s33
> >         v_add_co_ci_u32 v5, s[22:23], 0, v5, s[22:23]
> >         flat_load_dword v7, v[4:5] offset:0
> >         s_waitcnt 0
> >         flat_load_dword v0, v[10:11] offset:0
> >         s_waitcnt 0
> >         flat_load_dword v6, v[8:9] offset:0
> >         s_waitcnt 0
> >         v_cmp_ne_u32 s[18:19], v7, 0
> >         v_cndmask_b32 v0, v6, v0, s[18:19]
> >         flat_store_dword v[4:5], v0 offset:0
> >         s_add_i32 s12, s12, 1
> >         s_add_u32 s32, s32, s28
> >         s_addc_u32 s33, s33, s29
> >         s_cmp_lg_u32 s12, s13
> >         s_cbranch_scc1 .L3
>
> This basic block has EXEC set to 15 (4 lanes) throughout. The mask for the
> VCOND a.k.a. v_cndmask_b32 is in s[18:19]. Those things seem OK.
>
> I see the testcase avoids vec_extract V64SI to V4SI for gfx1100, even though
> it would be a no-op conversion, because the general case requires a permute
> instruction and named pattern insns can't have non-constant conditions. Is
> vec_extract allowed to FAIL? That might give a better result in this case.
>
> However, I must be doing something different because vect/bb-slp-cond-1.c
> passes for me, on gfx1100.

I didn't try to run it. When doing make check-gcc, it fails to use
gcn-run for test invocation; what's the trick to make it do that?

Richard.