[gem5-users] Re: Gem5 gpu

Matt Sinclair via gem5-users Fri, 09 Aug 2024 14:17:52 -0700

Hi Ravikant,

From looking at the details below, it appears you are using the GPUSE gem5
support.  In this version, I don’t believe we ever officially got AlexNet
or VGG working.  For fwd_conv there was a prior message on this mailing
list about some of the issues with it, but I’m having a hard time finding
it on my phone. Maybe check the gem5 message archive?


Regarding AlexNet and VGG, I spent a bunch of time on them, but every time
I fixed a bug there was another one a layer or two later.  We have made a
number of fixes since I last tried, but it seems the state is the same,
sadly.  The failure you are running into is a common one, roughly gem5’s
equivalent of a “segfault”.  The error message is telling you that the
program is writing to a memory address that is not mapped.  Unfortunately,
since many different bugs lead to this error, there isn’t a single place I
can point you to.  But if you are willing to help us debug, here are some
ideas:

- if you run DNNMark with its DEBUG flag set, it will print more
information about what it was trying to do around where the failure
occurred, which might help us provide more useful help.
- are you running with stable or develop?  If you are using stable, I
recommend trying develop — we are pushing bug fixes there often.
- I have not tried DNNMark with GPUFS yet, but if you are willing to give
it a try (and/or don’t need GPUSE for your research), I would recommend
trying DNNMark with GPUFS.  We have been focusing most of our effort on
supporting GPUFS in recent months because it is much easier to support
newer ROCm versions in it.
- if none of the above solves your problem, you would then need to get a
trace to identify where this unmapped address is coming from and fix that.
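
As a rough sketch of that last step (not an official gem5 tool), the small
Python helper below pulls the faulting address and abort tick out of a panic
message like the one quoted further down, and suggests gem5 debug options
that start an Exec trace shortly before the abort.  The helper name
summarize_panic, the 100,000,000-tick window, and the trace.out file name
are assumptions made for illustration; --debug-flags, --debug-start, and
--debug-file are existing gem5 options, but whether an Exec trace is enough
to find the root cause here is untested.

```python
import re

# Patterns for the two lines of the panic output we care about.
FAULT_RE = re.compile(r"Tried to (\w+) unmapped address\s+(0x[0-9a-fA-F]+)")
TICK_RE = re.compile(r"Program aborted at tick (\d+)")


def summarize_panic(log_text, window_ticks=100_000_000):
    """Extract the fault details and suggest trace options (hypothetical helper).

    window_ticks is an arbitrary illustrative choice: how many ticks before
    the abort the trace should start.
    """
    # Email quoting can wrap the message and prefix lines with "> ",
    # so strip the quote markers and join everything onto one line.
    lines = [ln.lstrip("> ").strip() for ln in log_text.splitlines()]
    flat = " ".join(ln for ln in lines if ln)

    fault = FAULT_RE.search(flat)
    tick = TICK_RE.search(flat)
    if fault is None or tick is None:
        return None

    abort_tick = int(tick.group(1))
    start_tick = max(0, abort_tick - window_ticks)
    return {
        "access": fault.group(1),
        "address": int(fault.group(2), 16),
        "abort_tick": abort_tick,
        # These go before the config script on the gem5 command line.
        "debug_args": [
            "--debug-flags=Exec",
            f"--debug-start={start_tick}",
            "--debug-file=trace.out",  # assumed output file name
        ],
    }
```

The returned debug_args would be placed between build/VEGA_X86/gem5.opt and
configs/example/apu_se.py in the original command.  Starting the trace close
to the abort tick keeps the output manageable, and the window can be widened
if the faulting address is computed earlier than that.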

Ultimately if you are willing to help debug this issue we can try to
provide some guidance on how to fix it (and then hopefully you will
consider contributing the bug fix back!).  Let us know what happens with
the above and we can go from there.

Hope this helps,
Matt

On Thu, Aug 8, 2024 at 10:19 AM Ravikant Bhardwaj via gem5-users <
gem5-users@gem5.org> wrote:

> Hi,
>
> I am currently working with the GPU model of gem5. While running the
> AlexNet benchmark from the DNNMark suite in 24-0, I first got an error
> that my memory size was too small, so I increased the memory size to 8GB.
> After increasing the memory size I get a different error which I am not
> able to resolve. I am attaching the error and the command which I have
> used.
>
> Error--
>
> src/arch/x86/faults.cc:167: panic: Tried to write unmapped address
> 0x7fffffffdf80.
> PC: (0x7ffff8009475=>0x7ffff8009478).(0=>1), Instr:   MOV_M_R : st   edx,
> SS:[t0 + rsp]
> Memory Usage: 20324256 KBytes
> Program aborted at tick 190131659500
> --- BEGIN LIBC BACKTRACE ---
> build/VEGA_X86/gem5.opt(_ZN4gem515print_backtraceEv+0x30)[0x5601f24103f0]
> build/VEGA_X86/gem5.opt(_ZN4gem512abortHandlerEi+0x4c)[0x5601f243b8ac]
> /lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7f94cc539420]
> /lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7f94cb70300b]
> /lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7f94cb6e2859]
> build/VEGA_X86/gem5.opt(+0xdebc05)[0x5601f224dc05]
>
> build/VEGA_X86/gem5.opt(_ZN4gem56X86ISA9PageFault6invokeEPNS_13ThreadContextERKNS_14RefCountingPtrINS_10StaticInstEEE+0x1b6)[0x5601f3222856]
>
> build/VEGA_X86/gem5.opt(_ZN4gem513BaseSimpleCPU9advancePCERKSt10shared_ptrINS_9FaultBaseEE+0xde)[0x5601f3a07e1e]
>
> build/VEGA_X86/gem5.opt(_ZN4gem515TimingSimpleCPU11advanceInstERKSt10shared_ptrINS_9FaultBaseEE+0xc6)[0x5601f39ff026]
>
> build/VEGA_X86/gem5.opt(_ZN4gem515TimingSimpleCPU17finishTranslationEPNS_21WholeTranslationStateE+0x111)[0x5601f3a02c81]
>
> build/VEGA_X86/gem5.opt(_ZN4gem515DataTranslationIPNS_15TimingSimpleCPUEE6finishERKSt10shared_ptrINS_9FaultBaseEERKS4_INS_7RequestEEPNS_13ThreadContextENS_7BaseMMU4ModeE+0xe3)[0x5601f3a060f3]
>
> build/VEGA_X86/gem5.opt(_ZN4gem56X86ISA3TLB15translateTimingERKSt10shared_ptrINS_7RequestEEPNS_13ThreadContextEPNS_7BaseMMU11TranslationENS9_4ModeE+0xd2)[0x5601f3263e82]
>
> build/VEGA_X86/gem5.opt(_ZN4gem515TimingSimpleCPU8writeMemEPhjmNS_5FlagsImEEPmRKSt6vectorIbSaIbEE+0x758)[0x5601f3a04888]
>
> build/VEGA_X86/gem5.opt(_ZN4gem517SimpleExecContext8writeMemEPhjmNS_5FlagsImEEPmRKSt6vectorIbSaIbEE+0x57)[0x5601f3a099a7]
> build/VEGA_X86/gem5.opt(+0x22559e6)[0x5601f36b79e6]
>
> build/VEGA_X86/gem5.opt(_ZNK4gem510X86ISAInst2St11initiateAccEPNS_11ExecContextEPNS_5trace10InstRecordE+0x168)[0x5601f36f46f8]
>
> build/VEGA_X86/gem5.opt(_ZN4gem515TimingSimpleCPU14completeIfetchEPNS_6PacketE+0x16d)[0x5601f39ffefd]
>
> build/VEGA_X86/gem5.opt(_ZN4gem510EventQueue10serviceOneEv+0x175)[0x5601f2427eb5]
>
> build/VEGA_X86/gem5.opt(_ZN4gem59doSimLoopEPNS_10EventQueueE+0x70)[0x5601f245cb30]
> build/VEGA_X86/gem5.opt(_ZN4gem58simulateEm+0x28b)[0x5601f245d1bb]
> build/VEGA_X86/gem5.opt(+0x2a12620)[0x5601f3e74620]
> build/VEGA_X86/gem5.opt(+0xdcc388)[0x5601f222e388]
> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x2a8748)[0x7f94cc7f0748]
>
> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x8dd8)[0x7f94cc5c5f48]
>
> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalCodeWithName+0x8fb)[0x7f94cc712e4b]
>
> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyFunction_Vectorcall+0x94)[0x7f94cc7f0124]
> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x74d6d)[0x7f94cc5bcd6d]
>
> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x7d86)[0x7f94cc5c4ef6]
>
> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalCodeWithName+0x8fb)[0x7f94cc712e4b]
>
> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(PyEval_EvalCodeEx+0x42)[0x7f94cc7131d2]
>
> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(PyEval_EvalCode+0x1f)[0x7f94cc7135bf]
> /lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x1cfc01)[0x7f94cc717c01]
> --- END LIBC BACKTRACE ---
> For more info on how to address this issue, please visit
> https://www.gem5.org/documentation/general_docs/common-errors/
>
>
>
> Command-
>
> docker run --rm -v ${PWD}:${PWD} -v
> ${PWD}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0
> -w ${PWD} ghcr.io/gem5/gcn-gpu:v24-0 build/VEGA_X86/gem5.opt
> configs/example/apu_se.py -n3
> --benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_alexnet
> -c dnnmark_test_alexnet --options="-config
> gem5-resources/src/gpu/DNNMark/config_example/alexnet.dnnmark -mmap
> gem5-resources/src/gpu/DNNMark/mmap.bin" --mem-size=8GB
>
>
> I am also getting a similar error for test_fwd_conv and VGG.
> Regards,
> Ravikant
> _______________________________________________
> gem5-users mailing list -- gem5-users@gem5.org
> To unsubscribe send an email to gem5-users-le...@gem5.org
>
