Hi!

On 2024-02-16T12:41:06+0000, Andrew Stubbs <a...@baylibre.com> wrote:
> On 16/02/2024 12:26, Richard Biener wrote:
>> On Fri, 16 Feb 2024, Andrew Stubbs wrote:
>>> On 16/02/2024 10:17, Richard Biener wrote:
>>>> On Fri, 16 Feb 2024, Thomas Schwinge wrote:
>>>>> On 2023-10-20T12:51:03+0100, Andrew Stubbs <a...@codesourcery.com> wrote:
>>>>>> I've committed this patch
>>>>>
>>>>> ... as commit c7ec7bd1c6590cf4eed267feab490288e0b8d691
>>>>> "amdgcn: add -march=gfx1030 EXPERIMENTAL", which the later RDNA3/gfx1100
>>>>> support builds on top of, and that's what I'm currently working on
>>>>> getting proper GCC/GCN target (not offloading) results for.
>>>>>
>>>>> Now looking at 'gcc.dg/vect/bb-slp-cond-1.c', which is reasonably simple,
>>>>> and hopefully representative for other SLP execution test FAILs
>>>>> (regressions compared to my earlier non-gfx1100 testing).
>>>>>
>>>>>     $ build-gcc/gcc/xgcc -Bbuild-gcc/gcc/
>>>>>     source-gcc/gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c
>>>>>     --sysroot=install/amdgcn-amdhsa -ftree-vectorize
>>>>>     -fno-tree-loop-distribute-patterns -fno-vect-cost-model -fno-common
>>>>>     -O2 -fdump-tree-slp-details -fdump-tree-vect-details -isystem
>>>>>     build-gcc/amdgcn-amdhsa/gfx1100/newlib/targ-include -isystem
>>>>>     source-gcc/newlib/libc/include
>>>>>     -Bbuild-gcc/amdgcn-amdhsa/gfx1100/newlib/
>>>>>     -Lbuild-gcc/amdgcn-amdhsa/gfx1100/newlib -wrapper
>>>>>     setarch,--addr-no-randomize -fdump-tree-all-all -fdump-ipa-all-all
>>>>>     -fdump-rtl-all-all -save-temps -march=gfx1100
>>>>>
>>>>> The '-march=gfx1030' 'a-bb-slp-cond-1.s' is identical (apart from
>>>>> 'TARGET_PACKED_WORK_ITEMS' in 'gcn_target_asm_function_prologue'), so I
>>>>> suppose will also exhibit the same failure mode, once again?
>>>>>
>>>>> Compared to '-march=gfx90a', the differences begin in
>>>>> 'a-bb-slp-cond-1.c.266r.expand' (only!), down to 'a-bb-slp-cond-1.s'.
>>>>>
>>>>> Changed like:
>>>>>
>>>>>     @@ -38,10 +38,10 @@ int main ()
>>>>>      #pragma GCC novector
>>>>>        for (i = 1; i < N; i++)
>>>>>          if (a[i] != i%4 + 1)
>>>>>     -      abort ();
>>>>>     +      __builtin_printf("%d %d != %d\n", i, a[i], i%4 + 1);
>>>>>
>>>>>        if (a[0] != 5)
>>>>>     -    abort ();
>>>>>     +    __builtin_printf("%d %d != %d\n", 0, a[0], 5);
>>>>>
>>>>> ..., we see:
>>>>>
>>>>>     $ flock /tmp/gcn.lock build-gcc/gcc/gcn-run a.out
>>>>>     40 5 != 1
>>>>>     41 6 != 2
>>>>>     42 7 != 3
>>>>>     43 8 != 4
>>>>>     44 5 != 1
>>>>>     45 6 != 2
>>>>>     46 7 != 3
>>>>>     47 8 != 4
>>>>>
>>>>> '40..47' are the 'i = 10..11' in 'foo', and the expectation is
>>>>> 'a[i * stride + 0..3] != 0'.  So, either some earlier iteration has
>>>>> scribbled zero values over these (vector lane masking issue, perhaps?),
>>>>> or some other code generation issue?
>>> [...], I must be doing something different because vect/bb-slp-cond-1.c
>>> passes for me, on gfx1100.

That's strange.  I've looked at your log file (looks good), and used your
toolchain to compile, and your 'gcn-run' to invoke, and still do get:

    $ flock /tmp/gcn.lock ~/gcn-run ~/bb-slp-cond-1.exe
    GCN Kernel Aborted
    Kernel aborted

Andrew, later on, please try what happens when you put an unconditional
'abort' call into a test case?

>> I didn't try to run it - when doing make check-gcc fails to using
>> gcn-run for test invocation

Note that for such individual test cases, invoking the compiler and then
'gcn-run' manually would seem easiest?

>> what's the trick to make it do that?

I can tell you've probably not done much "embedded" or simulator testing
of GCC targets?  ;-P

> There's a config file for nvptx here:
> https://github.com/SourceryTools/nvptx-tools/blob/master/nvptx-none-run.exp

Yes, and I have some updates to that one pending, to be finished once I've
generally got my testing set up again, to a sufficient degree...

> You can probably make the obvious adjustments. I think Thomas has a GCN
> version with a few more features.

Right.  I'm attaching my current 'amdgcn-amdhsa-run.exp'.

I'm aware that the 'set_board_info gcc,[...] [...]' may be obsolete/wrong
(as Andrew also noted privately) -- likewise, at least in part, for
GCC/nvptx, which is where I copied all that from.  (Will revise later;
not relevant for this discussion, here.)

Similar to what I've recently added to libgomp, there is 'flock'ing here,
so that you may use 'make -j[...] check' for (partial) parallelism, but
still all execution testing runs serialized.  I found this to greatly
help denoise the test results.  (Not ideal, of course, but improving that
is for later, too.)

You may want to disable the 'HSA_STATUS_ERROR_OUT_OF_RESOURCES' thing if
that doesn't work like that in your case.  (I've no idea what
'amdgpu_gpu_recover' would do if the GPU is also used for display.)
But this, again, greatly helps denoise test results, at least for the one
system I'm currently testing on.

I intend to publish proper documentation of all this, later on -- happy
to answer any questions in the meantime.

If you don't already have a common directory for DejaGnu board files, put
'amdgcn-amdhsa-run.exp' into '~/tmp/amdgcn-amdhsa/', for example, and add
a 'dejagnu.exp' file next to it:

    lappend boards_dir ~/tmp/amdgcn-amdhsa

Prepare:

    $ DEJAGNU=$HOME/tmp/amdgcn-amdhsa/dejagnu.exp
    $ export DEJAGNU
    $ AMDGCN_AMDHSA_RUN=[...]/build-gcc/gcc/gcn-run
    $ export AMDGCN_AMDHSA_RUN
    $ # If necessary:
    $ AMDGCN_AMDHSA_LD_LIBRARY_PATH=/opt/rocm/lib
    $ LD_LIBRARY_PATH=$AMDGCN_AMDHSA_LD_LIBRARY_PATH${LD_LIBRARY_PATH+:$LD_LIBRARY_PATH}
    $ export LD_LIBRARY_PATH

..., and then run:

    $ make -j8 check-gcc-c RUNTESTFLAGS='--target_board=amdgcn-amdhsa-run/-march=gfx1030 vect.exp'

Oh, and I saw that on <https://gcc.gnu.org/wiki/Offloading>, Tobias has
recently put into a new "Using the GPU as stand-alone system" section
some similar information.  (..., but this should, in my opinion, be on a
different page, as it's explicitly *not* about what we understand as
offloading.)

> I usually use the CodeSourcery magic stack of scripts for testing
> installed toolchains on remote devices, so I'm not too familiar with
> using Dejagnu directly.

Tsk...  ;'-|


Grüße
 Thomas

# DejaGnu board file for amdgcn-amdhsa.

set_board_info target_install {amdgcn-amdhsa}

load_generic_config "sim"

if { [info exists env(AMDGCN_AMDHSA_LOCK_FILE)] } then {
    set_board_info sim,lock_file "$env(AMDGCN_AMDHSA_LOCK_FILE)"
} else {
    #TODO What's a good default filename?
    set_board_info sim,lock_file "/tmp/gcn.lock"
}

if { [info exists env(AMDGCN_AMDHSA_RUN)] } then {
    set_board_info sim "$env(AMDGCN_AMDHSA_RUN)"
} else {
    set_board_info sim "gcn-run"
}

# This isn't a simulator, but rather a "launcher".
unset_board_info is_simulator
unset_board_info slow_simulator

process_multilib_options ""

set_board_info gcc,stack_size 8192
set_board_info gcc,no_trampolines 1
set_board_info gcc,no_label_values 1
set_board_info gcc,signal_suppress 1

set_board_info compiler "[find_gcc]"
set_board_info cflags "[newlib_include_flags]"
set_board_info ldflags "[newlib_link_flags]"
set_board_info ldscript ""

#TODO Work around <http://mid.mail-archive.com/B457CE4A2BB446B7930A9BA1E38DBCCC@pleaset> 'ERROR: (DejaGnu) proc "::tcl::tm::UnknownHandler {::tcl::MacOSXPkgUnknown ::tclPkgUnknown} msgcat 1.4" does not exist.'...
# Otherwise, our use of 'clock format' may cause spurious errors such as:
#     ERROR: gcc.c-torture/compile/pr44686.c   -O0 : unknown dg option: ::tcl::tm::UnknownHandler ::tclPkgUnknown msgcat 1.4 for " dg-require-profiling 1 "-fprofile-generate" "
# ..., and all testing thus breaking apart.
set dummy [clock format [clock seconds]]
unset dummy

proc sim__open_lock_file { lock_file } {
    # Try to open the lock file for reading, so that this also works if
    # somebody else created the file.
    if [catch {open $lock_file r} result] {
        verbose -log "Couldn't open '$lock_file' for reading: $result"
        # Try to create the lock file.
        if [catch {open $lock_file a+} result] {
            verbose -log "Couldn't create '$lock_file': $result"
            # If this again failed, somebody else created it, concurrently.  If
            # in the following we're now not able to open it for reading, we've
            # got a fundamental problem, and let it fail.
            set result [open $lock_file r]
        }
    }
    return $result
}

# The default 'sim_load' would eventually call into 'sim_spawn', 'sim_wait',
# but it's easier here to just override the former one, and put safeguards
# into the latter two.

proc sim_spawn { dest cmdline args } {
    perror "TODO 'sim_spawn'"
    verbose -log "TODO 'sim_spawn'"
    return -1
}

proc sim_wait { dest timeout } {
    perror "TODO 'sim_wait'"
    verbose -log "TODO 'sim_wait'"
    return -1
}

proc sim_load { dest prog args } {
    set inpfile ""
    if { [llength $args] > 1 } {
        if { [lindex $args 1] != "" } {
            set inpfile "[lindex $args 1]"
        }
    }

    # The launcher arguments are the program followed by the program arguments.
    set pargs [lindex $args 0]
    set largs [concat $prog $pargs]
    set args [lreplace $args 0 0 $largs]

    set launcher [board_info $dest sim]

    # To support parallel testing ('make -j[...] check') in light of flaky test
    # results for concurrent GPU usage, we'd like to serialize execution tests.
    set lock_file [board_info $dest sim,lock_file]
    if { $lock_file != "" } {
        set lock_fd [sim__open_lock_file $lock_file]
        set lock_clock_begin [clock seconds]
        # Blocks until we hold the lock on the already-open file descriptor.
        exec flock 0 <@ $lock_fd
        set lock_clock_end [clock seconds]
        verbose -log "Got flock('$lock_file') at [clock format $lock_clock_end] after [expr {$lock_clock_end - $lock_clock_begin}] s" 2
    }

    # Note, not using 'remote_exec $dest' here.
    set result [eval [list remote_exec host $launcher] $args $inpfile]

    #TODO If we ran into 'HSA_STATUS_ERROR_OUT_OF_RESOURCES'...
    if { [lindex $result 0] != 0 && [string match "*HSA_STATUS_ERROR_OUT_OF_RESOURCES*" [lindex $result 1]] } {
        verbose -log "Trying to recover from 'HSA_STATUS_ERROR_OUT_OF_RESOURCES', and then re-execute."

        #TODO ..., reset the GPU....
        exec sudo cat /sys/kernel/debug/dri/0/amdgpu_gpu_recover

        #TODO ..., and try again.
        set result [eval [list remote_exec host $launcher] $args $inpfile]
    }

    # We don't tell 'launcher' execution failure from 'prog' execution failure.
    # Maybe we should, or maybe it doesn't matter.  (When there's an error,
    # there's an error.)

    if { $lock_file != "" } {
        # Unlock (implicit with 'close').
        close $lock_fd
    }

    if { [lindex $result 0] == 0 } {
        return [list "pass" [lindex $result 1]]
    } else {
        return [list "fail" [lindex $result 1]]
    }
}

# <https://inbox.sourceware.org/1392398663.17835.120.camel@ubuntu-sellcey>
proc sim_exec { dest srcfile args } {
    perror "TODO 'sim_exec'"
    verbose -log "TODO 'sim_exec'"
    return -1
}
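
For running a single test by hand under the same lock that 'sim_load' takes, the locking can be sketched in plain shell.  This is only a sketch under assumptions: 'run_serialized' is a made-up helper name (not part of GCC or DejaGnu), and it assumes that util-linux 'flock' is installed.  The default lock file and the 'AMDGCN_AMDHSA_LOCK_FILE' override are the ones from the board file above.

```shell
#!/bin/sh
# Sketch only: 'run_serialized' is a hypothetical helper, not something
# provided by GCC or DejaGnu.  It mirrors the locking in 'sim_load': open
# the lock file for reading, take an exclusive 'flock' on the open file
# descriptor, run the command, and drop the lock when the descriptor closes.
run_serialized() {
    lock_file=${AMDGCN_AMDHSA_LOCK_FILE:-/tmp/gcn.lock}

    # Create the lock file if nobody has done so yet.  A failure here is
    # tolerated, as somebody else may have created it concurrently.
    [ -e "$lock_file" ] || true >>"$lock_file"

    (
        # Block until we hold the lock on file descriptor 9.
        flock 9 || exit 1
        # Run the command; its exit status becomes the subshell's status.
        "$@"
    ) 9<"$lock_file"
}
```

For example, 'run_serialized build-gcc/gcc/gcn-run a.out' should then behave like the 'flock /tmp/gcn.lock build-gcc/gcc/gcn-run a.out' invocation quoted earlier, and wait its turn while a 'make -j[...] check' using this board file is running.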