Yeah, I haven't tried CHAI but I believe gfx902 would work with it (if you need APUs).
Matt S.

On Mon, Sep 11, 2023 at 12:56 PM Poremba, Matthew <matthew.pore...@amd.com> wrote:

> [Public]
>
> Hi Anoop,
>
> That instruction was recently added to gem5, but for Vega ISA only: https://gem5-review.googlesource.com/c/public/gem5/+/67072 . It could be ported to GCN3 probably by copying the code exactly into the corresponding GCN3 files. You’ll notice however in that relation chain there are many more instructions implemented for Vega only, so there will be similar issues to this. Alternately, I think there is a Vega APU working (gfx902?). MattS would know more about the status of that. I am not sure of your use case but if you can use a dGPU, Vega with gfx900 version or full system mode is another option to use Vega ISA.
>
> For the docker automatically quitting, you will have to do `docker run *-it* …` to start an interactive session.
>
> -Matt
>
> *From:* Anoop Mysore <mysan...@gmail.com>
> *Sent:* Monday, September 11, 2023 10:33 AM
> *To:* Poremba, Matthew <matthew.pore...@amd.com>
> *Cc:* Matt Sinclair <mattdsinclair.w...@gmail.com>; The gem5 Users mailing list <gem5-users@gem5.org>
> *Subject:* Re: [gem5-users] Re: Error in an application running on gem5 GCN3 (with apu_se.py)
>
> *Caution:* This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
>
> Thanks, Matt. Yes, the printfs in the GPU kernel code were the issue for s_sendmsg.
>
> However, the ds_add_u32 instruction is still an issue.
> I am already compiling with -O1 like so:
>
> /opt/rocm/hip/bin/hipcc --amdgpu-target=gfx801,gfx803 \
>   main.cpp kernel.cu kernel.cpp \
>   -o ./bin/hsto.gem5 \
>   -I/home/anoop/new/gem5-resources/src/gpu/chai/HIP-U-gem5/HSTO/../.gem5/include \
>   -lz -lm -lc -lpthread -O1 \
>   -L/home/anoop/new/gem5-resources/src/gpu/chai/HIP-U-gem5/HSTO/../.gem5/util/m5/build/x86/out -lm5
>
> The exact error is:
> src/gpu-compute/scoreboard_check_stage.cc:158: panic: next instruction: ds_add_u32 v7, v8 is of unknown type
>
> The corresponding line in the simulator <https://github.com/gem5/gem5/blob/48a40cf2f5182a82de360b7efa497d82e06b1631/src/gpu-compute/scoreboard_check_stage.cc#L158>, and decoder section of it <https://github.com/gem5/gem5/blob/48a40cf2f5182a82de360b7efa497d82e06b1631/src/arch/amdgpu/gcn3/insts/instructions.cc#L30929>. Because of the involvement of the LDS/GDS, I'm unsure how to implement this -- any help would be appreciated.
>
> Also, GDB still doesn't seem to be working with my gem5. And without prints in the kernel, it's cumbersome to get any useful insight on failing programs.
>
> I added within the Dockerfile: RUN apt install -y gdb
>
> I am invoking gdb with:
>
> docker run -u $UID:$GID --volume $(pwd):$(pwd) -w $(pwd) gem5:new gdb --args gem5/build/GCN3_X86/gem5.debug gem5/configs/example/apu_se.py --cpu-type=DerivO3CPU --num-cpus=4 --mem-size=1GB --ruby --mem-type=SimpleMemory -c gem5-resources/src/gpu/chai/HIP-U-gem5/HSTO/bin/hsto.gem5
>
> Log:
>
> GNU gdb (Ubuntu 9.2-0ubuntu1~20.04.1) 9.2
> Copyright (C) 2020 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.
> Type "show copying" and "show warranty" for details.
> This GDB was configured as "x86_64-linux-gnu".
> Type "show configuration" for configuration details.
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>.
> Find the GDB manual and other documentation resources online at:
> <http://www.gnu.org/software/gdb/documentation/>.
>
> For help, type "help".
> Type "apropos word" to search for commands related to "word"...
> Reading symbols from gem5/build/GCN3_X86/gem5.debug...
> (gdb) quit
>
> PS: `quit` was automatically taken in.
>
> Is there anything I'm doing wrong here?
>
> On Fri, Sep 8, 2023 at 4:50 PM Poremba, Matthew <matthew.pore...@amd.com> wrote:
>
> [Public]
>
> Hi Anoop,
>
> Based on that register count, I am going to guess you built the application with -O0 or some other debugging flags? If you do this, the compiler makes some super large number of registers. I assume that is so a real GPU will not run any other applications simultaneously.
>
> Similarly, if you are seeing s_sendmsg I am going to guess there is a printf() in your GPU kernel. These aren’t currently supported in gem5, but something that would be very nice to have.
>
> If these are true you will need to remove any printfs and compile with at least -O1 to run in gem5.
>
> -Matt
>
> *From:* Anoop Mysore <mysan...@gmail.com>
> *Sent:* Friday, September 8, 2023 7:33 AM
> *To:* Matt Sinclair <mattdsinclair.w...@gmail.com>
> *Cc:* The gem5 Users mailing list <gem5-users@gem5.org>; Poremba, Matthew <matthew.pore...@amd.com>
> *Subject:* Re: [gem5-users] Re: Error in an application running on gem5 GCN3 (with apu_se.py)
>
> *Caution:* This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
>
> Hi Matt,
> I'm facing a few other problems:
>
> 1. `panic: panic condition (numWfs * vregDemandPerWI) > (numVectorALUs * numVecRegsPerSimd) occurred: WG with 1 WFs and 29285 VGPRs per WI can not be allocated to CU that has 8192 VGPRs`
>
> The corresponding line of the code in gem5: https://github.com/gem5/gem5/blob/f29bfc0640c88a79eb7f94454ce31b3237ec0066/src/gpu-compute/compute_unit.cc#L565
>
> One of the variables (vregDemandPerWI) is ultimately derived from reading the executable for the kernel code. Is it possible to reduce this VGPR demand somehow, or is increasing the VGPRs (to what seems like an unrealistically high value) the only solution? Similar error for SGPRs as well.
>
> 2. Some kernels (compiled for gfx801/3) have instructions such as ds_add_u32 <https://www.amd.com/system/files/TechDocs/gcn3-instruction-set-architecture.pdf> (Data Store instruction page: 12-161), s_sendmsg (send message to host CPU) -- which do not have their relevant decoding code available <https://github.com/gem5/gem5/blob/48a40cf2f5182a82de360b7efa497d82e06b1631/src/arch/amdgpu/gcn3/insts/instructions.cc#L30929>. Is this intentional or was this just punted for later -- anything to keep in mind when coding for these?
>
> On Thu, Aug 17, 2023 at 5:13 PM Matt Sinclair <mattdsinclair.w...@gmail.com> wrote:
>
> Hi Anoop,
>
> I'm glad that increasing -n helped. It's hard to say what exactly the problem is without digging in further, but often the ROCm stack will launch additional processes to do a variety of things (e.g., check which version of LLVM is being used). In gem5, each of these requires a separate CPU thread context -- which increasing -n handles in SE mode. So if I had to guess, I would say that this is what is happening.
>
> If you added gdb locally to your docker, and you built the docker properly, then I would expect gdb to work with gem5.
>
> Thanks,
> Matt
>
> On Wed, Aug 16, 2023 at 11:41 PM Anoop Mysore <mysan...@gmail.com> wrote:
>
> Thank you, Matt, having 10 CPUs (up from previous 3) in the simulated system seems to make it work! (At least, I don't see that error at that point anymore). Is "resource temporarily unavailable" commonly due to CPU count? Curious to know how you made that connection.
>
> Re gdb: I am indeed using a local docker build (gem5/util/dockerfiles/gcn-gpu) with an added gdb installation -- is that what you meant?
>
> Will send in a PR to the repo soon as I'm done :)
>
> On Wed, Aug 16, 2023, 5:03 PM Matt Sinclair <mattdsinclair.w...@gmail.com> wrote:
>
> Hi Anoop,
>
> A few things here:
>
> - Regarding the original failure (at least the !FS part), this is normally happening either because of the GPU Target ISA (e.g., gfx900) you used in your Makefile (e.g., it is not supported) or because you didn't properly specify what GPU ISA you are using when running the program. So, what is your command line for running this application and what ISA are you specifying in your Makefile?
>
> - If the "what()" is the real source of the error, then I think this could be related to the number of CPU thread contexts you are running with gem5. What did you set "-n" to?
>
> - Regarding gdb, @Matt P: did you remove gdb from what is installed in the Docker a while back? If so, I think Anoop would need to add it back and create a local docker or something like that.
>
> - Setting aside the above, it would be wonderful if you contribute the CHAI benchmarks to gem5-resources once you get them working! Please let us know if we can do anything to help with that.
>
> Thanks,
> Matt
>
> On Wed, Aug 16, 2023 at 9:51 AM Anoop Mysore via gem5-users <gem5-users@gem5.org> wrote:
>
> Curiously, running the gem5.debug executable with gdb within docker results in:
>
> Reading symbols from gem5/build/GCN3_X86/gem5.debug...
> (gdb) quit
> (the quit wasn't a command I provided, it just quits automatically). Is gdb working with gem5 GCN3 in Docker?
>
> I ran gem5.opt with ExecAll and SyscallAll debug flags, the debug tail and the simerr logs are attached.
>
> I don't see anything peculiar other than a tgkill syscall with a SIGABRT sent to a thread thereafter halting within a few instructions.
>
> On Tue, Aug 15, 2023 at 9:00 PM Anoop Mysore <mysan...@gmail.com> wrote:
>
> I am trying to port CHAI benchmarks <https://github.com/chai-benchmarks/chai> similarly to gem5-resources/src/gpu/pannotia <https://github.com/gem5/gem5-resources/tree/stable/src/gpu/pannotia>. I was able to HIPify (through the perl script + some manual changes) all the code files, and ran the BFS program. I see the following error message at the point of launching the CPU threads here <https://github.com/mysoreanoop/chai/blob/678c18fd551fbf12f4abbb05ab7164f1b588be68/HIP-U-gem5/BFS/main.cpp#L273> (fork of HIPified CHAI). I do not see any of the prints from the CPU threads which leads me to believe the error is to do with the threads not being launched or a related error.
>
> (This looks related; incorporated the suggestion of linking against -pthread: https://stackoverflow.com/a/6485728)
>
> The stderr log is below; any help is appreciated.
>
> _________
>
> ....
>
> AM: Launching CPU
>
> terminate called after throwing an instance of 'std::system_error'
>
> what(): Resource temporarily unavailable
>
> build/GCN3_X86/sim/faults.cc:60: panic: panic condition !FullSystem occurred: fault (General-Protection) detected @ PC (0x7ffff6afa941=>0x7ffff6afa942).(0=>1)
> Memory Usage: 19704072 KBytes
>
> Program aborted at tick 441590522500
>
> --- BEGIN LIBC BACKTRACE ---
> gem5/build/GCN3_X86/gem5.opt(+0x550200)[0x55a709b31200]
> gem5/build/GCN3_X86/gem5.opt(+0x57d46e)[0x55a709b5e46e]
> /lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7f18881a0420]
> /lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7f188734800b]
> /lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7f1887327859]
> gem5/build/GCN3_X86/gem5.opt(+0x4be295)[0x55a709a9f295]
> gem5/build/GCN3_X86/gem5.opt(+0x5f6169)[0x55a709bd7169]
> gem5/build/GCN3_X86/gem5.opt(+0x9fd9ed)[0x55a709fde9ed]
> gem5/build/GCN3_X86/gem5.opt(+0x15b1d10)[0x55a70ab92d10]
> gem5/build/GCN3_X86/gem5.opt(+0x15b2fd5)[0x55a70ab93fd5]
> gem5/build/GCN3_X86/gem5.opt(+0x15b5620)[0x55a70ab96620]
> gem5/build/GCN3_X86/gem5.opt(+0x15b6348)[0x55a70ab97348]
> gem5/build/GCN3_X86/gem5.opt(+0x15c2954)[0x55a70aba3954]
> gem5/build/GCN3_X86/gem5.opt(+0x56a082)[0x55a709b4b082]
> gem5/build/GCN3_X86/gem5.opt(+0x59e2c4)[0x55a709b7f2c4]
> gem5/build/GCN3_X86/gem5.opt(+0x59e8a3)[0x55a709b7f8a3]
> gem5/build/GCN3_X86/gem5.opt(+0x4ed462)[0x55a709ace462]
> gem5/build/GCN3_X86/gem5.opt(+0x4af427)[0x55a709a90427]
> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x2a8738)[0x7f1888459738]
> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x8dd8)[0x7f188822ef48]
> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalCodeWithName+0x8fb)[0x7f188837be3b]
> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyFunction_Vectorcall+0x94)[0x7f1888459114]
> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x74d6d)[0x7f1888225d6d]
> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x7d86)[0x7f188822def6]
> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalCodeWithName+0x8fb)[0x7f188837be3b]
> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(PyEval_EvalCodeEx+0x42)[0x7f188837c1c2]
> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(PyEval_EvalCode+0x1f)[0x7f188837c5af]
> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x1cfbf1)[0x7f1888380bf1]
> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x25f537)[0x7f1888410537]
> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x74d6d)[0x7f1888225d6d]
> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x12fd)[0x7f188822746d]
> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x8006b)[0x7f188823106b]
> --- END LIBC BACKTRACE ---
> Failed to execute default signal handler!
>
> _________
>
> _______________________________________________
> gem5-users mailing list -- gem5-users@gem5.org
> To unsubscribe send an email to gem5-users-le...@gem5.org
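For anyone hitting the same VGPR panic quoted in this thread: the condition in the message, (numWfs * vregDemandPerWI) > (numVectorALUs * numVecRegsPerSimd), is a plain capacity comparison. Below is a minimal Python sketch of that arithmetic, not gem5's actual code. The function name is made up for illustration, and the split of the reported 8192 VGPRs into 4 SIMDs x 2048 registers is an assumption; only the product (8192) comes from the panic message.

```python
# Sketch of the capacity comparison named in the panic message.
# Parameter names mirror the message; the function itself is hypothetical.

def vgpr_demand_fits(num_wfs, vreg_demand_per_wi,
                     num_vector_alus, num_vec_regs_per_simd):
    """Return True if the workgroup's VGPR demand fits in the compute unit."""
    demand = num_wfs * vreg_demand_per_wi
    capacity = num_vector_alus * num_vec_regs_per_simd
    return demand <= capacity


# Values from the panic: 1 WF, 29285 VGPRs per work-item, 8192 VGPRs per CU.
# The 4 x 2048 breakdown is assumed; only the total of 8192 is from the log.
fits = vgpr_demand_fits(1, 29285, 4, 2048)
print("fits in CU:", fits)
```

With the -O0 register count from the thread the demand is far above the CU capacity, which is why the advice was to rebuild with at least -O1 rather than to raise the VGPR count.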
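On the gdb-quits-immediately problem: the fix suggested in the thread is to start the container with `docker run -it` so gdb gets an interactive terminal. The sketch below only assembles and prints that command line; it does not run Docker. The helper name and the placeholder working directory are assumptions for illustration; the image name, binary, config script and simulator arguments are the ones quoted in the thread.

```python
import shlex


def gdb_docker_command(image, workdir, gem5_binary, config_script, sim_args):
    """Build a `docker run` argv that keeps an interactive terminal for gdb."""
    return [
        "docker", "run",
        "-it",  # interactive session, as suggested in the thread
        "--volume", f"{workdir}:{workdir}",
        "-w", workdir,
        image,
        "gdb", "--args", gem5_binary, config_script, *sim_args,
    ]


cmd = gdb_docker_command(
    image="gem5:new",
    workdir="/path/to/workdir",  # placeholder; the thread uses $(pwd)
    gem5_binary="gem5/build/GCN3_X86/gem5.debug",
    config_script="gem5/configs/example/apu_se.py",
    sim_args=[
        "--cpu-type=DerivO3CPU", "--num-cpus=4", "--mem-size=1GB",
        "--ruby", "--mem-type=SimpleMemory",
        "-c", "gem5-resources/src/gpu/chai/HIP-U-gem5/HSTO/bin/hsto.gem5",
    ],
)
print(shlex.join(cmd))
```

The `-u $UID:$GID` option from the original command is left out here only to keep the sketch short; it can be added back in front of `--volume` unchanged.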