Hi Anoop,

1. gfx902 warning: the ROCm compiler developers put this warning there intentionally. Essentially, they are warning you that APUs are not fully optimized in ROCm. In particular, I believe libraries like MIOpen do not support APUs. But as long as your code does not use libraries like these, I think you should be fine.
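One quick way to act on point 1 is to check whether the compiled binary links against MIOpen at all. The helper below is a minimal sketch, not part of gem5 or ROCm. The name check_apu_libs is hypothetical. It assumes ldd is available where you build and that MIOpen's shared library follows the usual libMIOpen naming. It only sees dynamically linked libraries, so anything loaded with dlopen() at run time is missed. The thread names only MIOpen, so any extra name patterns are your own choice.

```sh
# Hypothetical helper, not part of gem5 or ROCm.
# check_apu_libs BINARY [PATTERN...]
# Prints each dynamic dependency of BINARY whose name contains "libMIOpen"
# (or any extra PATTERN). Exit status: 0 = match found, 1 = none, 2 = ldd failed.
# Assumes `ldd` is installed; libraries loaded via dlopen() are not detected.
# The body runs in a subshell so its variables do not leak to the caller.
check_apu_libs() (
  binary=$1
  shift
  deps=$(ldd "$binary" 2>/dev/null) || exit 2
  printf '%s\n' "$deps" | awk -v extra="$*" '
    # Pattern list: libMIOpen plus any caller-supplied, space-separated names.
    BEGIN { n = split("libMIOpen " extra, pats, " ") }
    {
      # $1 is the library name (leading whitespace is trimmed by awk).
      for (i = 1; i <= n; i++) {
        if (index($1, pats[i]) > 0) { print $1; found = 1; next }
      }
    }
    END { exit (found ? 0 : 1) }'
)
```

For example, `check_apu_libs ./bin/hsto.gem5 && echo 'links against MIOpen'` prints the matching library names. Exit status 1 means no match, and 2 means ldd itself failed, for instance because the file is not a dynamic executable.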
2. The target is not found in gem5 because you need to pass --gfx-version=gfx902 on the command line: https://gem5.googlesource.com/public/gem5/+/refs/heads/develop/configs/example/apu_se.py#366 (or set it in your board, if you are using one). This assumes you have already updated your Makefile to compile for gfx902. Essentially, without this flag gem5 assumes a different GPU version than the one your binary was compiled for, so there is a mismatch.

Hope this helps,
Matt

On Thu, Oct 19, 2023 at 4:08 AM Anoop Mysore <mysan...@gmail.com> wrote:
> Thank you both!
> I was able to manually copy over the instruction execution support, and it works. But there are more changes in Vega that might be useful for getting some of the CHAI benchmarks running, so I would like to move to gfx902 as suggested.
>
> However, when I try to compile for gfx902 with hipcc from ROCm 4.0.1, it throws "Warning: The specified HIP target: gfx902 is unknown. Correct compilation is not guaranteed."
> And understandably, in simulation there is an error complaining that the right kernel for the device is not found. I'm assuming I would need to update the ROCm stack (for a newer hipcc), but I wasn't able to find an architecture-support list to figure out which version to install. The latest, v5.1, fails due to bad IOCTLs. Is there perhaps an intermediate version that works? Or have I got this all wrong somehow?
>
> On Mon, Sep 11, 2023 at 9:15 PM Matt Sinclair <mattdsinclair.w...@gmail.com> wrote:
>
>> Yeah, I haven't tried CHAI, but I believe gfx902 would work with it (if you need APUs).
>>
>> Matt S.
>>
>> On Mon, Sep 11, 2023 at 12:56 PM Poremba, Matthew <matthew.pore...@amd.com> wrote:
>>
>>> Hi Anoop,
>>>
>>> That instruction was recently added to gem5, but for the Vega ISA only:
>>> https://gem5-review.googlesource.com/c/public/gem5/+/67072 .
>>> It could probably be ported to GCN3 by copying the code exactly into the corresponding GCN3 files. However, you'll notice that in that relation chain many more instructions were implemented for Vega only, so you will run into similar issues. Alternatively, I think there is a working Vega APU (gfx902?); MattS would know more about its status. I am not sure of your use case, but if you can use a dGPU, Vega with the gfx900 version (or full-system mode) is another way to use the Vega ISA.
>>>
>>> For Docker automatically quitting, you will have to use `docker run -it …` to start an interactive session.
>>>
>>> -Matt
>>>
>>> From: Anoop Mysore <mysan...@gmail.com>
>>> Sent: Monday, September 11, 2023 10:33 AM
>>> To: Poremba, Matthew <matthew.pore...@amd.com>
>>> Cc: Matt Sinclair <mattdsinclair.w...@gmail.com>; The gem5 Users mailing list <gem5-users@gem5.org>
>>> Subject: Re: [gem5-users] Re: Error in an application running on gem5 GCN3 (with apu_se.py)
>>>
>>> Thanks, Matt. Yes, the printfs in the GPU kernel code were the cause of the s_sendmsg issue.
>>>
>>> However, the ds_add_u32 instruction is still an issue.
>>> I am already compiling with -O1 like so:
>>>
>>> /opt/rocm/hip/bin/hipcc --amdgpu-target=gfx801,gfx803 \
>>>     main.cpp kernel.cu kernel.cpp \
>>>     -o ./bin/hsto.gem5 \
>>>     -I/home/anoop/new/gem5-resources/src/gpu/chai/HIP-U-gem5/HSTO/../.gem5/include \
>>>     -lz -lm -lc -lpthread -O1 \
>>>     -L/home/anoop/new/gem5-resources/src/gpu/chai/HIP-U-gem5/HSTO/../.gem5/util/m5/build/x86/out -lm5
>>>
>>> The exact error is:
>>> src/gpu-compute/scoreboard_check_stage.cc:158: panic: next instruction: ds_add_u32 v7, v8 is of unknown type
>>>
>>> The corresponding line in the simulator
>>> <https://github.com/gem5/gem5/blob/48a40cf2f5182a82de360b7efa497d82e06b1631/src/gpu-compute/scoreboard_check_stage.cc#L158>,
>>> and the decoder section for it
>>> <https://github.com/gem5/gem5/blob/48a40cf2f5182a82de360b7efa497d82e06b1631/src/arch/amdgpu/gcn3/insts/instructions.cc#L30929>.
>>> Because the LDS/GDS is involved, I'm unsure how to implement this; any help would be appreciated.
>>>
>>> Also, GDB still doesn't seem to be working with my gem5. And without prints in the kernel, it's cumbersome to get any useful insight into failing programs.
>>>
>>> I added within the Dockerfile: RUN apt install -y gdb
>>>
>>> I am invoking gdb with:
>>>
>>> docker run -u $UID:$GID --volume $(pwd):$(pwd) -w $(pwd) gem5:new \
>>>     gdb --args gem5/build/GCN3_X86/gem5.debug gem5/configs/example/apu_se.py \
>>>     --cpu-type=DerivO3CPU --num-cpus=4 --mem-size=1GB --ruby \
>>>     --mem-type=SimpleMemory -c \
>>>     gem5-resources/src/gpu/chai/HIP-U-gem5/HSTO/bin/hsto.gem5
>>>
>>> Log:
>>>
>>> GNU gdb (Ubuntu 9.2-0ubuntu1~20.04.1) 9.2
>>> Copyright (C) 2020 Free Software Foundation, Inc.
>>> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
>>> This is free software: you are free to change and redistribute it.
>>> There is NO WARRANTY, to the extent permitted by law.
>>> Type "show copying" and "show warranty" for details.
>>> This GDB was configured as "x86_64-linux-gnu".
>>> Type "show configuration" for configuration details.
>>> For bug reporting instructions, please see:
>>> <http://www.gnu.org/software/gdb/bugs/>.
>>> Find the GDB manual and other documentation resources online at:
>>> <http://www.gnu.org/software/gdb/documentation/>.
>>>
>>> For help, type "help".
>>> Type "apropos word" to search for commands related to "word"...
>>> Reading symbols from gem5/build/GCN3_X86/gem5.debug...
>>> (gdb) quit
>>>
>>> PS: `quit` was entered automatically; I did not type it.
>>>
>>> Am I doing anything wrong here?
>>>
>>> On Fri, Sep 8, 2023 at 4:50 PM Poremba, Matthew <matthew.pore...@amd.com> wrote:
>>>
>>> Hi Anoop,
>>>
>>> Based on that register count, I am going to guess you built the application with -O0 or some other debugging flags? If you do this, the compiler uses an extremely large number of registers. I assume that is so a real GPU will not run any other applications simultaneously.
>>>
>>> Similarly, if you are seeing s_sendmsg, I am going to guess there is a printf() in your GPU kernel. These aren't currently supported in gem5, but they would be very nice to have.
>>>
>>> If these are true, you will need to remove any printfs and compile with at least -O1 to run in gem5.
>>>
>>> -Matt
>>>
>>> From: Anoop Mysore <mysan...@gmail.com>
>>> Sent: Friday, September 8, 2023 7:33 AM
>>> To: Matt Sinclair <mattdsinclair.w...@gmail.com>
>>> Cc: The gem5 Users mailing list <gem5-users@gem5.org>; Poremba, Matthew <matthew.pore...@amd.com>
>>> Subject: Re: [gem5-users] Re: Error in an application running on gem5 GCN3 (with apu_se.py)
>>>
>>> Hi Matt,
>>> I'm facing a few other problems:
>>>
>>> 1. `panic: panic condition (numWfs * vregDemandPerWI) > (numVectorALUs * numVecRegsPerSimd) occurred: WG with 1 WFs and 29285 VGPRs per WI can not be allocated to CU that has 8192 VGPRs`
>>>
>>> The corresponding line of the code in gem5:
>>> https://github.com/gem5/gem5/blob/f29bfc0640c88a79eb7f94454ce31b3237ec0066/src/gpu-compute/compute_unit.cc#L565
>>>
>>> One of the variables (vregDemandPerWI) is ultimately derived from reading the kernel code in the executable. Is it possible to reduce this VGPR demand somehow, or is increasing the number of VGPRs (to what seems like an unrealistically high value) the only solution? There is a similar error for SGPRs as well.
>>>
>>> 2. Some kernels (compiled for gfx801/gfx803) contain instructions such as ds_add_u32
>>> <https://www.amd.com/system/files/TechDocs/gcn3-instruction-set-architecture.pdf>
>>> (Data Share instructions, page 12-161) and s_sendmsg (send message to host CPU), which do not have their relevant decoding code available
>>> <https://github.com/gem5/gem5/blob/48a40cf2f5182a82de360b7efa497d82e06b1631/src/arch/amdgpu/gcn3/insts/instructions.cc#L30929>.
>>> Is this intentional, or was it just punted for later? Is there anything to keep in mind when coding for these?
>>>
>>> On Thu, Aug 17, 2023 at 5:13 PM Matt Sinclair <mattdsinclair.w...@gmail.com> wrote:
>>>
>>> Hi Anoop,
>>>
>>> I'm glad that increasing -n helped. It's hard to say exactly what the problem is without digging in further, but the ROCm stack often launches additional processes to do a variety of things (e.g., check which version of LLVM is being used). In gem5, each of these requires a separate CPU thread context, which increasing -n provides in SE mode. So if I had to guess, I would say that this is what is happening.
>>>
>>> If you added gdb locally to your Docker image, and you built the image properly, then I would expect gdb to work with gem5.
>>>
>>> Thanks,
>>> Matt
>>>
>>> On Wed, Aug 16, 2023 at 11:41 PM Anoop Mysore <mysan...@gmail.com> wrote:
>>>
>>> Thank you, Matt. Having 10 CPUs (up from the previous 3) in the simulated system seems to make it work! (At least, I don't see that error at that point anymore.) Is "resource temporarily unavailable" commonly due to CPU count? I'm curious to know how you made that connection.
>>>
>>> Re gdb: I am indeed using a local Docker build (gem5/util/dockerfiles/gcn-gpu) with an added gdb installation. Is that what you meant?
>>>
>>> Will send in a PR to the repo as soon as I'm done :)
>>>
>>> On Wed, Aug 16, 2023, 5:03 PM Matt Sinclair <mattdsinclair.w...@gmail.com> wrote:
>>>
>>> Hi Anoop,
>>>
>>> A few things here:
>>>
>>> - Regarding the original failure (at least the !FS part), this normally happens either because the GPU target ISA (e.g., gfx900) you used in your Makefile is not supported, or because you didn't properly specify which GPU ISA you are using when running the program. So, what is your command line for running this application, and what ISA are you specifying in your Makefile?
>>>
>>> - If the "what()" is the real source of the error, then I think this could be related to the number of CPU thread contexts you are running with in gem5. What did you set "-n" to?
>>>
>>> - Regarding gdb, @Matt P: did you remove gdb from what is installed in the Docker image a while back? If so, I think Anoop would need to add it back and create a local Docker image or something like that.
>>>
>>> - Setting aside the above, it would be wonderful if you contributed the CHAI benchmarks to gem5-resources once you get them working! Please let us know if we can do anything to help with that.
>>>
>>> Thanks,
>>> Matt
>>>
>>> On Wed, Aug 16, 2023 at 9:51 AM Anoop Mysore via gem5-users <gem5-users@gem5.org> wrote:
>>>
>>> Curiously, running the gem5.debug executable with gdb within Docker results in:
>>>
>>> Reading symbols from gem5/build/GCN3_X86/gem5.debug...
>>> (gdb) quit
>>>
>>> (The quit wasn't a command I provided; gdb just quits automatically.) Is gdb working with gem5 GCN3 in Docker?
>>>
>>> I ran gem5.opt with the ExecAll and SyscallAll debug flags; the tail of the debug output and the simerr log are attached.
>>>
>>> I don't see anything peculiar other than a tgkill syscall sending SIGABRT to a thread, after which the program halts within a few instructions.
>>>
>>> On Tue, Aug 15, 2023 at 9:00 PM Anoop Mysore <mysan...@gmail.com> wrote:
>>>
>>> I am trying to port the CHAI benchmarks <https://github.com/chai-benchmarks/chai> similarly to gem5-resources/src/gpu/pannotia <https://github.com/gem5/gem5-resources/tree/stable/src/gpu/pannotia>.
>>> I was able to HIPify all the code files (through the perl script plus some manual changes) and ran the BFS program. I see the following error message at the point of launching the CPU threads here
>>> <https://github.com/mysoreanoop/chai/blob/678c18fd551fbf12f4abbb05ab7164f1b588be68/HIP-U-gem5/BFS/main.cpp#L273>
>>> (fork of HIPified CHAI). I do not see any of the prints from the CPU threads, which leads me to believe the error has to do with the threads not being launched, or something related.
>>>
>>> (This looks related; I incorporated the suggestion of linking against -pthread: https://stackoverflow.com/a/6485728)
>>>
>>> The stderr log is below; any help is appreciated.
>>>
>>> _________
>>>
>>> ....
>>>
>>> AM: Launching CPU
>>>
>>> terminate called after throwing an instance of 'std::system_error'
>>>
>>> what(): Resource temporarily unavailable
>>>
>>> build/GCN3_X86/sim/faults.cc:60: panic: panic condition !FullSystem occurred: fault (General-Protection) detected @ PC (0x7ffff6afa941=>0x7ffff6afa942).(0=>1)
>>> Memory Usage: 19704072 KBytes
>>>
>>> Program aborted at tick 441590522500
>>>
>>> --- BEGIN LIBC BACKTRACE ---
>>> gem5/build/GCN3_X86/gem5.opt(+0x550200)[0x55a709b31200]
>>> gem5/build/GCN3_X86/gem5.opt(+0x57d46e)[0x55a709b5e46e]
>>> /lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7f18881a0420]
>>> /lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7f188734800b]
>>> /lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7f1887327859]
>>> gem5/build/GCN3_X86/gem5.opt(+0x4be295)[0x55a709a9f295]
>>> gem5/build/GCN3_X86/gem5.opt(+0x5f6169)[0x55a709bd7169]
>>> gem5/build/GCN3_X86/gem5.opt(+0x9fd9ed)[0x55a709fde9ed]
>>> gem5/build/GCN3_X86/gem5.opt(+0x15b1d10)[0x55a70ab92d10]
>>> gem5/build/GCN3_X86/gem5.opt(+0x15b2fd5)[0x55a70ab93fd5]
>>> gem5/build/GCN3_X86/gem5.opt(+0x15b5620)[0x55a70ab96620]
>>> gem5/build/GCN3_X86/gem5.opt(+0x15b6348)[0x55a70ab97348]
>>> gem5/build/GCN3_X86/gem5.opt(+0x15c2954)[0x55a70aba3954]
>>> gem5/build/GCN3_X86/gem5.opt(+0x56a082)[0x55a709b4b082]
>>> gem5/build/GCN3_X86/gem5.opt(+0x59e2c4)[0x55a709b7f2c4]
>>> gem5/build/GCN3_X86/gem5.opt(+0x59e8a3)[0x55a709b7f8a3]
>>> gem5/build/GCN3_X86/gem5.opt(+0x4ed462)[0x55a709ace462]
>>> gem5/build/GCN3_X86/gem5.opt(+0x4af427)[0x55a709a90427]
>>> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x2a8738)[0x7f1888459738]
>>> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x8dd8)[0x7f188822ef48]
>>> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalCodeWithName+0x8fb)[0x7f188837be3b]
>>> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyFunction_Vectorcall+0x94)[0x7f1888459114]
>>> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x74d6d)[0x7f1888225d6d]
>>> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x7d86)[0x7f188822def6]
>>> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalCodeWithName+0x8fb)[0x7f188837be3b]
>>> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(PyEval_EvalCodeEx+0x42)[0x7f188837c1c2]
>>> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(PyEval_EvalCode+0x1f)[0x7f188837c5af]
>>> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x1cfbf1)[0x7f1888380bf1]
>>> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x25f537)[0x7f1888410537]
>>> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x74d6d)[0x7f1888225d6d]
>>> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x12fd)[0x7f188822746d]
>>> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x8006b)[0x7f188823106b]
>>> --- END LIBC BACKTRACE ---
>>> Failed to execute default signal handler!
>>>
>>> _________
>>>
>>> _______________________________________________
>>> gem5-users mailing list -- gem5-users@gem5.org
>>> To unsubscribe send an email to gem5-users-le...@gem5.org
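The following is a hedged sketch of point 2 from end to end. It also folds in earlier advice from this thread: compile with at least -O1, and give the simulated system enough CPUs with -n. Only --gfx-version=gfx902 and -n come directly from the thread. Everything else is an assumption to adapt:
- The function names, the DRY_RUN switch, the HIPCC/GEM5_BIN variables, and the default gem5 binary path gem5/build/VEGA_X86/gem5.opt are illustrative.
- The thread does not settle which ROCm release ships a hipcc that accepts gfx902.
- --amdgpu-target is the spelling used in the ROCm 4.0.1 command quoted above; newer hipcc releases document --offload-arch instead.

```sh
# Illustrative sketch, not part of gem5: names, paths, and DRY_RUN are assumptions.
# With DRY_RUN set, commands are printed instead of executed.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# build_gfx902 OUTPUT SOURCE...
# HIPCC should point at a hipcc that recognizes gfx902 (not the ROCm 4.0.1 one
# from the thread). Newer hipcc releases prefer --offload-arch=gfx902.
build_gfx902() (
  out=$1
  shift
  # At least -O1, as suggested in the thread: -O0/debug builds may request far
  # more VGPRs/SGPRs than a simulated CU provides.
  run "${HIPCC:-hipcc}" --amdgpu-target=gfx902 -O1 "$@" -o "$out"
)

# run_gfx902 BINARY [NUM_CPUS]
# GEM5_BIN defaults to a hypothetical Vega build path; override it as needed.
run_gfx902() {
  # --gfx-version must match the target the binary was compiled for.
  # -n sets the number of simulated CPUs; the ROCm runtime may need spare
  # thread contexts in SE mode (10 worked earlier in the thread).
  run "${GEM5_BIN:-gem5/build/VEGA_X86/gem5.opt}" \
    gem5/configs/example/apu_se.py \
    -n "${2:-10}" --gfx-version=gfx902 -c "$1"
}
```

With DRY_RUN=1, `run_gfx902 bin/hsto.gem5` only prints the gem5 command line, so you can compare it against your own. Unset DRY_RUN to execute the commands. If you run gem5 through the gcn-gpu Docker image, put the real command inside your `docker run …` invocation. Add -it if you want gdb to stay interactive, per Matt P's note.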