Hi Anoop,

1.  gfx902 warning: the ROCm compiler folks emit this "intentionally".
Essentially, they are warning you that ROCm is not 100% optimized for
APUs.  In particular, I believe libraries like MIOpen do not have APU
support.  But as long as your code does not use libraries like that, I
think you should be fine.

2.  gem5 cannot find the target because you need to pass
--gfx-version=gfx902 on the command line:
https://gem5.googlesource.com/public/gem5/+/refs/heads/develop/configs/example/apu_se.py#366
(or set it in your board, if you are using one).  This assumes you have
already updated your Makefile to compile for gfx902.  Essentially, without
this option gem5 thinks you are running a different version of ROCm, and
there is a mismatch.
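
As a rough sketch of the two steps in point 2 (compile for gfx902, then give
gem5 the same version), the commands might look like the following.  The
hipcc path and the --amdgpu-target flag mirror the compile line quoted
further down in this thread; the source file, output path, the VEGA_X86
build directory, and -n 10 are placeholders or assumptions to adapt to your
setup.  The script only prints the commands so you can check them first.

```shell
#!/bin/sh
# Dry-run sketch: build the compile and run commands from one shared GPU
# target so the two cannot drift apart. All paths below are placeholders.
GFX_VERSION=gfx902

# Compile step (same hipcc flag style as the gfx801/gfx803 line in this thread).
HIPCC_CMD="/opt/rocm/hip/bin/hipcc --amdgpu-target=${GFX_VERSION} main.cpp -o ./bin/app.gem5 -O1"

# Run step: the build directory (VEGA_X86) and -n 10 are assumptions.
GEM5_CMD="gem5/build/VEGA_X86/gem5.opt gem5/configs/example/apu_se.py -n 10 --gfx-version=${GFX_VERSION} -c ./bin/app.gem5"

# Print instead of executing; run the lines yourself once they look right.
echo "$HIPCC_CMD"
echo "$GEM5_CMD"
```

Keeping the version in a single variable is just one way to avoid the
mismatch described above; setting it in your Makefile and run script by hand
works equally well.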

Hope this helps,
Matt

On Thu, Oct 19, 2023 at 4:08 AM Anoop Mysore <mysan...@gmail.com> wrote:

> Thank you both!
> I was able to manually copy over the instruction execution support, and it
> works. But there are more changes in Vega that might be useful for getting
> some of the CHAI benchmarks running -- so I would like to move to gfx902 as
> suggested.
>
> However, when I try to compile for gfx902 with hipcc from ROCm 4.0.1, it
> throws "Warning: The specified HIP target: gfx902 is unknown. Correct
> compilation is not guaranteed."
> And understandably, in simulation there's an error that complains that the
> right kernel for the device is not found. I'm assuming I would need to
> update the ROCm stack (for a newer hipcc) -- but I wasn't able to find an
> architecture-support list to figure out which version to install. The
> latest, v5.1, fails due to bad IOCTLs, so is there perhaps an intermediate
> version that works? Or have I got this all wrong somehow?
>
> On Mon, Sep 11, 2023 at 9:15 PM Matt Sinclair <
> mattdsinclair.w...@gmail.com> wrote:
>
>> Yeah, I haven't tried CHAI but I believe gfx902 would work with it (if
>> you need APUs).
>>
>> Matt S.
>>
>> On Mon, Sep 11, 2023 at 12:56 PM Poremba, Matthew <
>> matthew.pore...@amd.com> wrote:
>>
>>> [Public]
>>>
>>> Hi Anoop,
>>>
>>>
>>>
>>>
>>>
>>> That instruction was recently added to gem5, but for the Vega ISA only:
>>> https://gem5-review.googlesource.com/c/public/gem5/+/67072 .  It could
>>> probably be ported to GCN3 by copying the code exactly into the
>>> corresponding GCN3 files.  You’ll notice, however, that in that relation
>>> chain there are many more instructions implemented for Vega only, so there
>>> will be similar issues to this.  Alternatively, I think there is a working
>>> Vega APU (gfx902?).  MattS would know more about the status of that.  I am
>>> not sure of your use case, but if you can use a dGPU, Vega with the gfx900
>>> version or full system mode is another option for using the Vega ISA.
>>>
>>>
>>>
>>> For the docker automatically quitting, you will have to do `docker run
>>> -it …` to start an interactive session.
>>>
>>>
>>>
>>>
>>>
>>> -Matt
>>>
>>>
>>>
>>> *From:* Anoop Mysore <mysan...@gmail.com>
>>> *Sent:* Monday, September 11, 2023 10:33 AM
>>> *To:* Poremba, Matthew <matthew.pore...@amd.com>
>>> *Cc:* Matt Sinclair <mattdsinclair.w...@gmail.com>; The gem5 Users
>>> mailing list <gem5-users@gem5.org>
>>> *Subject:* Re: [gem5-users] Re: Error in an application running on gem5
>>> GCN3 (with apu_se.py)
>>>
>>>
>>>
>>> *Caution:* This message originated from an External Source. Use proper
>>> caution when opening attachments, clicking links, or responding.
>>>
>>>
>>>
>>> Thanks, Matt. Yes, the printfs in the GPU kernel code were the issue for
>>> s_sendmsg.
>>>
>>> However, the ds_add_u32 instruction is still an issue. I am already
>>> compiling with -O1 like so:
>>>
>>> /opt/rocm/hip/bin/hipcc --amdgpu-target=gfx801,gfx803
>>>
>>>     main.cpp kernel.cu kernel.cpp
>>>
>>>     -o ./bin/hsto.gem5
>>>
>>>
>>> -I/home/anoop/new/gem5-resources/src/gpu/chai/HIP-U-gem5/HSTO/../.gem5/include
>>>
>>>     -lz -lm -lc -lpthread -O1
>>>
>>>
>>> -L/home/anoop/new/gem5-resources/src/gpu/chai/HIP-U-gem5/HSTO/../.gem5/util/m5/build/x86/out
>>> -lm5
>>>
>>>
>>>
>>> The exact error is:
>>> src/gpu-compute/scoreboard_check_stage.cc:158: panic: next instruction:
>>> ds_add_u32 v7, v8 is of unknown type
>>>
>>>
>>>
>>> The corresponding line in the simulator
>>> <https://github.com/gem5/gem5/blob/48a40cf2f5182a82de360b7efa497d82e06b1631/src/gpu-compute/scoreboard_check_stage.cc#L158>,
>>> and decoder section of it
>>> <https://github.com/gem5/gem5/blob/48a40cf2f5182a82de360b7efa497d82e06b1631/src/arch/amdgpu/gcn3/insts/instructions.cc#L30929>.
>>> Because of the involvement of the LDS/GDS, I'm unsure how to implement this
>>> -- any help would be appreciated.
>>>
>>>
>>>
>>> Also, GDB still doesn't seem to be working with my gem5. And without
>>> prints in the kernel, it's cumbersome to get any useful insight on failing
>>> programs.
>>>
>>> I added within the Dockerfile: RUN apt install -y gdb
>>>
>>> I am invoking gdb with:
>>>
>>> docker run -u $UID:$GID --volume $(pwd):$(pwd) -w $(pwd) gem5:new gdb
>>> --args gem5/build/GCN3_X86/gem5.debug gem5/configs/example/apu_se.py
>>> --cpu-type=DerivO3CPU --num-cpus=4 --mem-size=1GB --ruby
>>> --mem-type=SimpleMemory -c
>>> gem5-resources/src/gpu/chai/HIP-U-gem5/HSTO/bin/hsto.gem5
>>>
>>>
>>>
>>> Log:
>>>
>>> GNU gdb (Ubuntu 9.2-0ubuntu1~20.04.1) 9.2
>>> Copyright (C) 2020 Free Software Foundation, Inc.
>>> License GPLv3+: GNU GPL version 3 or later <
>>> http://gnu.org/licenses/gpl.html>
>>> This is free software: you are free to change and redistribute it.
>>> There is NO WARRANTY, to the extent permitted by law.
>>> Type "show copying" and "show warranty" for details.
>>> This GDB was configured as "x86_64-linux-gnu".
>>> Type "show configuration" for configuration details.
>>> For bug reporting instructions, please see:
>>> <http://www.gnu.org/software/gdb/bugs/>.
>>> Find the GDB manual and other documentation resources online at:
>>>     <http://www.gnu.org/software/gdb/documentation/>.
>>>
>>> For help, type "help".
>>> Type "apropos word" to search for commands related to "word"...
>>> Reading symbols from gem5/build/GCN3_X86/gem5.debug...
>>> (gdb) quit
>>>
>>>
>>>
>>> PS: `quit` was entered automatically; I did not type it.
>>>
>>> Is there anything I'm doing wrong here?
>>>
>>> On Fri, Sep 8, 2023 at 4:50 PM Poremba, Matthew <matthew.pore...@amd.com>
>>> wrote:
>>>
>>> [Public]
>>>
>>>
>>>
>>> Hi Anoop,
>>>
>>>
>>>
>>>
>>>
>>> Based on that register count, I am going to guess you built the
>>> application with -O0 or some other debugging flags?  If you do this, the
>>> compiler requests a very large number of registers.  I assume that is so a
>>> real GPU will not run any other applications simultaneously.
>>>
>>>
>>>
>>> Similarly, if you are seeing s_sendmsg I am going to guess there is a
>>> printf() in your GPU kernel.  These aren’t currently supported in gem5, but
>>> they would be very nice to have.
>>>
>>>
>>>
>>> If these are true you will need to remove any printfs and compile with
>>> at least -O1 to run in gem5.
>>>
>>>
>>>
>>>
>>>
>>> -Matt
>>>
>>>
>>>
>>> *From:* Anoop Mysore <mysan...@gmail.com>
>>> *Sent:* Friday, September 8, 2023 7:33 AM
>>> *To:* Matt Sinclair <mattdsinclair.w...@gmail.com>
>>> *Cc:* The gem5 Users mailing list <gem5-users@gem5.org>; Poremba,
>>> Matthew <matthew.pore...@amd.com>
>>> *Subject:* Re: [gem5-users] Re: Error in an application running on gem5
>>> GCN3 (with apu_se.py)
>>>
>>>
>>>
>>>
>>> Hi Matt,
>>> I'm facing a few other problems:
>>>
>>> 1. `panic: panic condition (numWfs * vregDemandPerWI) > (numVectorALUs *
>>> numVecRegsPerSimd) occurred: WG with 1 WFs and 29285 VGPRs per WI can not
>>> be allocated to CU that has 8192 VGPRs`
>>>
>>> The corresponding line of the code in gem5:
>>> https://github.com/gem5/gem5/blob/f29bfc0640c88a79eb7f94454ce31b3237ec0066/src/gpu-compute/compute_unit.cc#L565
>>>
>>> One of the variables (vregDemandPerWI) is ultimately derived from
>>> reading the executable for the kernel code. Is it possible to reduce this
>>> VGPR demand somehow, or is increasing the VGPRs (to what seems like an
>>> unrealistically high value) the only solution? There is a similar error
>>> for SGPRs as well.
>>>
>>> 2. Some kernels (compiled for gfx801/3) have instructions such as
>>> ds_add_u32
>>> <https://www.amd.com/system/files/TechDocs/gcn3-instruction-set-architecture.pdf>
>>> (Data Share instruction, page 12-161) and s_sendmsg (send message to host
>>> CPU), which do not have their relevant decoding code available
>>> <https://github.com/gem5/gem5/blob/48a40cf2f5182a82de360b7efa497d82e06b1631/src/arch/amdgpu/gcn3/insts/instructions.cc#L30929>.
>>> Is this intentional, or was this just punted for later? Is there anything
>>> to keep in mind when coding for these?
>>>
>>> On Thu, Aug 17, 2023 at 5:13 PM Matt Sinclair <
>>> mattdsinclair.w...@gmail.com> wrote:
>>>
>>> Hi Anoop,
>>>
>>>
>>>
>>> I'm glad that increasing -n helped.  It's hard to say what exactly the
>>> problem is without digging in further, but often the ROCm stack will launch
>>> additional processes to do a variety of things (e.g., check which version
>>> of LLVM is being used).  In gem5, each of these requires a separate CPU
>>> thread context -- which increasing -n handles in SE mode.  So if I had to
>>> guess, I would say that this is what is happening.
>>>
>>>
>>>
>>> If you added gdb locally to your docker, and you built the docker
>>> properly, then I would expect gdb to work with gem5.
>>>
>>>
>>>
>>> Thanks,
>>>
>>> Matt
>>>
>>>
>>>
>>> On Wed, Aug 16, 2023 at 11:41 PM Anoop Mysore <mysan...@gmail.com>
>>> wrote:
>>>
>>> Thank you, Matt, having 10 CPUs (up from previous 3) in the simulated
>>> system seems to make it work! (At least, I don't see that error at that
>>> point anymore). Is "resource temporarily unavailable" commonly due to CPU
>>> count? Curious to know how you made that connection.
>>>
>>>
>>>
>>> Re gdb: I am indeed using a local docker build
>>> (gem5/util/dockerfiles/gcn-gpu) with an added gdb installation -- is that
>>> what you meant?
>>>
>>>
>>>
>>> Will send in a PR to the repo as soon as I'm done :)
>>>
>>> On Wed, Aug 16, 2023, 5:03 PM Matt Sinclair <
>>> mattdsinclair.w...@gmail.com> wrote:
>>>
>>> Hi Anoop,
>>>
>>>
>>>
>>> A few things here:
>>>
>>>
>>>
>>> - Regarding the original failure (at least the !FS part), this normally
>>> happens either because of the GPU target ISA (e.g., gfx900) you used in
>>> your Makefile (e.g., it is not supported) or because you didn't properly
>>> specify which GPU ISA you are using when running the program.  So, what is
>>> your command line for running this application, and what ISA are you
>>> specifying in your Makefile?
>>>
>>> - If the "what()" is the real source of the error, then I think this
>>> could be related to the number of CPU thread contexts you are running with
>>> gem5.  What did you set "-n" to?
>>>
>>> - Regarding gdb, @Matt P: did you remove gdb from what is installed in
>>> the Docker a while back?  If so, I think Anoop would need to add it back
>>> and create a local docker or something like that.
>>>
>>> - Setting aside the above, it would be wonderful if you contribute the
>>> CHAI benchmarks to gem5-resources once you get them working!  Please let us
>>> know if we can do anything to help with that.
>>>
>>>
>>>
>>> Thanks,
>>>
>>> Matt
>>>
>>>
>>>
>>> On Wed, Aug 16, 2023 at 9:51 AM Anoop Mysore via gem5-users <
>>> gem5-users@gem5.org> wrote:
>>>
>>> Curiously, running the gem5.debug executable with gdb within docker
>>> results in:
>>>
>>> Reading symbols from gem5/build/GCN3_X86/gem5.debug...
>>> (gdb) quit
>>> (the quit wasn't a command I provided, it just quits automatically). Is
>>> gdb working with gem5 GCN3 in Docker?
>>>
>>>
>>>
>>> I ran gem5.opt with ExecAll and SyscallAll debug flags, the debug tail
>>> and the simerr logs are attached.
>>>
>>> I don't see anything peculiar other than a tgkill syscall sending SIGABRT
>>> to a thread, which then halts within a few instructions.
>>>
>>>
>>>
>>> On Tue, Aug 15, 2023 at 9:00 PM Anoop Mysore <mysan...@gmail.com> wrote:
>>>
>>> I am trying to port CHAI benchmarks
>>> <https://github.com/chai-benchmarks/chai> similarly to
>>> gem5-resources/src/gpu/pannotia
>>> <https://github.com/gem5/gem5-resources/tree/stable/src/gpu/pannotia>.
>>> I was able to HIPify (through the perl script + some manual changes) all
>>> the code files, and ran the BFS program. I see the following error message
>>> at the point of launching the CPU threads here
>>> <https://github.com/mysoreanoop/chai/blob/678c18fd551fbf12f4abbb05ab7164f1b588be68/HIP-U-gem5/BFS/main.cpp#L273>
>>> (fork of HIPified CHAI). I do not see any of the prints from the CPU
>>> threads, which leads me to believe the error has to do with the threads
>>> not being launched or a related error.
>>>
>>>
>>>
>>> (This looks related; incorporated the suggestion of linking against
>>> -pthread: https://stackoverflow.com/a/6485728)
>>>
>>>
>>>
>>> The stderr log is below; any help is appreciated.
>>>
>>> _________
>>>
>>> ....
>>>
>>> AM: Launching CPU
>>>
>>> terminate called after throwing an instance of 'std::system_error'
>>>
>>> what():  Resource temporarily unavailable
>>>
>>> build/GCN3_X86/sim/faults.cc:60: panic: panic condition !FullSystem
>>> occurred: fault (General-Protection) detected @ PC
>>> (0x7ffff6afa941=>0x7ffff6afa942).(0=>1)
>>> Memory Usage: 19704072 KBytes
>>>
>>> Program aborted at tick 441590522500
>>>
>>> --- BEGIN LIBC BACKTRACE ---
>>> gem5/build/GCN3_X86/gem5.opt(+0x550200)[0x55a709b31200]
>>> gem5/build/GCN3_X86/gem5.opt(+0x57d46e)[0x55a709b5e46e]
>>> /lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7f18881a0420]
>>> /lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7f188734800b]
>>> /lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7f1887327859]
>>> gem5/build/GCN3_X86/gem5.opt(+0x4be295)[0x55a709a9f295]
>>> gem5/build/GCN3_X86/gem5.opt(+0x5f6169)[0x55a709bd7169]
>>> gem5/build/GCN3_X86/gem5.opt(+0x9fd9ed)[0x55a709fde9ed]
>>> gem5/build/GCN3_X86/gem5.opt(+0x15b1d10)[0x55a70ab92d10]
>>> gem5/build/GCN3_X86/gem5.opt(+0x15b2fd5)[0x55a70ab93fd5]
>>> gem5/build/GCN3_X86/gem5.opt(+0x15b5620)[0x55a70ab96620]
>>> gem5/build/GCN3_X86/gem5.opt(+0x15b6348)[0x55a70ab97348]
>>> gem5/build/GCN3_X86/gem5.opt(+0x15c2954)[0x55a70aba3954]
>>> gem5/build/GCN3_X86/gem5.opt(+0x56a082)[0x55a709b4b082]
>>> gem5/build/GCN3_X86/gem5.opt(+0x59e2c4)[0x55a709b7f2c4]
>>> gem5/build/GCN3_X86/gem5.opt(+0x59e8a3)[0x55a709b7f8a3]
>>> gem5/build/GCN3_X86/gem5.opt(+0x4ed462)[0x55a709ace462]
>>> gem5/build/GCN3_X86/gem5.opt(+0x4af427)[0x55a709a90427]
>>> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x2a8738)[0x7f1888459738]
>>>
>>> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x8dd8)[0x7f188822ef48]
>>>
>>> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalCodeWithName+0x8fb)[0x7f188837be3b]
>>>
>>> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyFunction_Vectorcall+0x94)[0x7f1888459114]
>>> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x74d6d)[0x7f1888225d6d]
>>>
>>> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x7d86)[0x7f188822def6]
>>>
>>> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalCodeWithName+0x8fb)[0x7f188837be3b]
>>>
>>> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(PyEval_EvalCodeEx+0x42)[0x7f188837c1c2]
>>>
>>> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(PyEval_EvalCode+0x1f)[0x7f188837c5af]
>>> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x1cfbf1)[0x7f1888380bf1]
>>> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x25f537)[0x7f1888410537]
>>> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x74d6d)[0x7f1888225d6d]
>>>
>>> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x12fd)[0x7f188822746d]
>>> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x8006b)[0x7f188823106b]
>>> --- END LIBC BACKTRACE ---
>>> Failed to execute default signal handler!
>>>
>>> _________
>>>
>>> _______________________________________________
>>> gem5-users mailing list -- gem5-users@gem5.org
>>> To unsubscribe send an email to gem5-users-le...@gem5.org
>>>
>>>