Hi Anoop,

I'm glad that increasing -n helped. It's hard to say exactly what the problem is without digging in further, but the ROCm stack often launches additional processes to do a variety of things (e.g., check which version of LLVM is being used). In gem5, each of these requires a separate CPU thread context, which increasing -n provides in SE mode. So if I had to guess, I would say that this is what is happening.
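On how that message connects to thread contexts: "Resource temporarily unavailable" is the usual C library text for EAGAIN, which POSIX pthread_create returns when the system cannot provide the resources for another thread, and which std::thread surfaces as a std::system_error. The assumption here, not checked against the gem5 source in this thread, is that SE mode reports the same error when no free CPU thread context is left. A minimal sketch of the lookup follows; classify_what and CANDIDATES are hypothetical names used only for illustration, and the strings returned by os.strerror depend on the platform's C library.

```python
import errno
import os

# Hypothetical table: errno values a failed thread launch might plausibly report.
CANDIDATES = {
    errno.EAGAIN: "EAGAIN",  # not enough resources to create another thread
    errno.ENOMEM: "ENOMEM",  # out of memory
    errno.EPERM: "EPERM",    # operation not permitted
}


def classify_what(message: str) -> str:
    """Return the errno name whose strerror() text appears in `message`."""
    for code, name in CANDIDATES.items():
        if os.strerror(code) in message:
            return name
    return "unknown"


if __name__ == "__main__":
    # The what() line from the stderr log quoted later in this thread.
    print(classify_what("what(): Resource temporarily unavailable"))
```

If the what() text maps to EAGAIN, the number of available thread contexts (the -n value in SE mode) is a reasonable first thing to check.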
If you added gdb locally to your docker, and you built the docker properly, then I would expect gdb to work with gem5.

Thanks,
Matt

On Wed, Aug 16, 2023 at 11:41 PM Anoop Mysore <mysan...@gmail.com> wrote:

> Thank you, Matt, having 10 CPUs (up from previous 3) in the simulated
> system seems to make it work! (At least, I don't see that error at that
> point anymore). Is "resource temporarily unavailable" commonly due to CPU
> count? Curious to know how you made that connection.
>
> Re gdb: I am indeed using a local docker build
> (gem5/util/dockerfiles/gcn-gpu) with an added gdb installation -- is that
> what you meant?
>
> Will send in a PR to the repo soon as I'm done :)
>
> On Wed, Aug 16, 2023, 5:03 PM Matt Sinclair <mattdsinclair.w...@gmail.com>
> wrote:
>
>> Hi Anoop,
>>
>> A few things here:
>>
>> - Regarding the original failure (at least the !FS part), this is
>> normally happening either because of the GPU Target ISA (e.g., gfx900) you
>> used in your Makefile (e.g., it is not supported) or because you didn't
>> properly specify what GPU ISA you are using when running the program. So,
>> what is your command line for running this application and what ISA are you
>> specifying in your Makefile?
>> - If the "what()" is the real source of the error, then I think this
>> could be related to the number of CPU thread contexts you are running with
>> gem5. What did you set "-n" to?
>> - Regarding gdb, @Matt P: did you remove gdb from what is installed in
>> the Docker a while back? If so, I think Anoop would need to add it back
>> and create a local docker or something like that.
>> - Setting aside the above, it would be wonderful if you contribute the
>> CHAI benchmarks to gem5-resources once you get them working! Please let us
>> know if we can do anything to help with that.
>>
>> Thanks,
>> Matt
>>
>> On Wed, Aug 16, 2023 at 9:51 AM Anoop Mysore via gem5-users <
>> gem5-users@gem5.org> wrote:
>>
>>> Curiously, running the gem5.debug executable with gdb within docker results
>>> in:
>>> Reading symbols from gem5/build/GCN3_X86/gem5.debug...
>>> (gdb) quit
>>> (the quit wasn't a command I provided, it just quits automatically). Is
>>> gdb working with gem5 GCN3 in Docker?
>>>
>>> I ran gem5.opt with ExecAll and SyscallAll debug flags, the debug tail
>>> and the simerr logs are attached.
>>> I don't see anything peculiar other than a tgkill syscall with a SIGABRT
>>> sent to a thread thereafter halting within a few instructions.
>>>
>>> On Tue, Aug 15, 2023 at 9:00 PM Anoop Mysore <mysan...@gmail.com> wrote:
>>>
>>>> I am trying to port CHAI benchmarks
>>>> <https://github.com/chai-benchmarks/chai> similarly to
>>>> gem5-resources/src/gpu/pannotia
>>>> <https://github.com/gem5/gem5-resources/tree/stable/src/gpu/pannotia>.
>>>> I was able to HIPify (through the perl script + some manual changes) all
>>>> the code files, and ran the BFS program. I see the following error message
>>>> at the point of launching the CPU threads here
>>>> <https://github.com/mysoreanoop/chai/blob/678c18fd551fbf12f4abbb05ab7164f1b588be68/HIP-U-gem5/BFS/main.cpp#L273>
>>>> (fork
>>>> of HIPified CHAI). I do not see any of the prints from the CPU threads
>>>> which leads me to believe the error is to do with the threads not being
>>>> launched or a related error.
>>>>
>>>> (This looks related; incorporated the suggestion of linking against
>>>> -pthread: https://stackoverflow.com/a/6485728)
>>>>
>>>> The stderr log is below; any help is appreciated.
>>>> _________
>>>> ....
>>>> AM: Launching CPU
>>>> terminate called after throwing an instance of 'std::system_error'
>>>> what(): Resource temporarily unavailable
>>>> build/GCN3_X86/sim/faults.cc:60: panic: panic condition !FullSystem
>>>> occurred: fault (General-Protection) detected @ PC
>>>> (0x7ffff6afa941=>0x7ffff6afa942).(0=>1)
>>>> Memory Usage: 19704072 KBytes
>>>>
>>>> Program aborted at tick 441590522500
>>>> --- BEGIN LIBC BACKTRACE ---
>>>> gem5/build/GCN3_X86/gem5.opt(+0x550200)[0x55a709b31200]
>>>> gem5/build/GCN3_X86/gem5.opt(+0x57d46e)[0x55a709b5e46e]
>>>> /lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7f18881a0420]
>>>> /lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7f188734800b]
>>>> /lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7f1887327859]
>>>> gem5/build/GCN3_X86/gem5.opt(+0x4be295)[0x55a709a9f295]
>>>> gem5/build/GCN3_X86/gem5.opt(+0x5f6169)[0x55a709bd7169]
>>>> gem5/build/GCN3_X86/gem5.opt(+0x9fd9ed)[0x55a709fde9ed]
>>>> gem5/build/GCN3_X86/gem5.opt(+0x15b1d10)[0x55a70ab92d10]
>>>> gem5/build/GCN3_X86/gem5.opt(+0x15b2fd5)[0x55a70ab93fd5]
>>>> gem5/build/GCN3_X86/gem5.opt(+0x15b5620)[0x55a70ab96620]
>>>> gem5/build/GCN3_X86/gem5.opt(+0x15b6348)[0x55a70ab97348]
>>>> gem5/build/GCN3_X86/gem5.opt(+0x15c2954)[0x55a70aba3954]
>>>> gem5/build/GCN3_X86/gem5.opt(+0x56a082)[0x55a709b4b082]
>>>> gem5/build/GCN3_X86/gem5.opt(+0x59e2c4)[0x55a709b7f2c4]
>>>> gem5/build/GCN3_X86/gem5.opt(+0x59e8a3)[0x55a709b7f8a3]
>>>> gem5/build/GCN3_X86/gem5.opt(+0x4ed462)[0x55a709ace462]
>>>> gem5/build/GCN3_X86/gem5.opt(+0x4af427)[0x55a709a90427]
>>>> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x2a8738)[0x7f1888459738]
>>>> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x8dd8)[0x7f188822ef48]
>>>> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalCodeWithName+0x8fb)[0x7f188837be3b]
>>>> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyFunction_Vectorcall+0x94)[0x7f1888459114]
>>>> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x74d6d)[0x7f1888225d6d]
>>>> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x7d86)[0x7f188822def6]
>>>> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalCodeWithName+0x8fb)[0x7f188837be3b]
>>>> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(PyEval_EvalCodeEx+0x42)[0x7f188837c1c2]
>>>> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(PyEval_EvalCode+0x1f)[0x7f188837c5af]
>>>> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x1cfbf1)[0x7f1888380bf1]
>>>> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x25f537)[0x7f1888410537]
>>>> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x74d6d)[0x7f1888225d6d]
>>>> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x12fd)[0x7f188822746d]
>>>> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x8006b)[0x7f188823106b]
>>>> --- END LIBC BACKTRACE ---
>>>> Failed to execute default signal handler!
>>>> _________
>>>>
>>> _______________________________________________
>>> gem5-users mailing list -- gem5-users@gem5.org
>>> To unsubscribe send an email to gem5-users-le...@gem5.org
>>>
>>