Thanks, this is helpful.  Kyle and I went through the error.  We haven't
run on a machine with enough memory for batch size 100 (which is what
bwd_activation assumes by default), but we have gotten it to run with
batch sizes up to 50.

We think the failure you were seeing was essentially happening because we
weren't testing bwd_activation in the nightly/weekly regressions, and thus
missed that the file we use to generate the MIOpen cachefiles for the
DNNMark kernels did not have the appropriate kernel for bwd_activation.
Kyle created a patch to fix this problem:
https://gem5-review.googlesource.com/c/public/gem5-resources/+/56789.

You will need to pull this patch and rerun generate_cachefiles before
trying to run again.  Moreover, since we only know it works up to batch
size 50, you may want to change the batch size here:
https://gem5.googlesource.com/public/gem5-resources/+/refs/heads/stable/src/gpu/DNNMark/config_example/activation_config.dnnmark#6,
to something <= 50 (N represents the batch size).  Alternatively, if you
need a batch size > 50, you can try running again on the larger machine you
mentioned before, but since we haven't run it on such a large machine yet,
we don't know exactly what will happen.
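
If it helps, here is a minimal sketch of the batch-size change as a small
Python helper.  It is not part of gem5-resources: the name cap_batch_size
is hypothetical, and it assumes the config stores the batch size on a line
of the form "n=100" (key in either case, optional spaces).  Please check
the actual file before relying on it.

```python
import re

# Assumption: the batch size is stored on a line of the form "n=100".
_BATCH_LINE = re.compile(r"^(\s*[nN]\s*=\s*)(\d+)(\s*)$")


def cap_batch_size(config_text: str, max_batch: int = 50) -> str:
    """Return config_text with any batch size above max_batch lowered to it."""
    out = []
    for line in config_text.splitlines():
        m = _BATCH_LINE.match(line)
        if m and int(m.group(2)) > max_batch:
            # Keep the original key and spacing, replace only the number.
            line = f"{m.group(1)}{max_batch}{m.group(3)}"
        out.append(line)
    # Preserve a trailing newline if the input had one.
    return "\n".join(out) + ("\n" if config_text.endswith("\n") else "")


if __name__ == "__main__":
    # Hypothetical config fragment, used only to exercise the helper.
    sample = "[Activation]\nn=100\nc=3\n"
    print(cap_batch_size(sample))
```

Lines that already hold a batch size <= 50 are left untouched, so running
it more than once on the same file should be harmless.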

Hope this helps,
Matt

On Fri, Feb 11, 2022 at 12:11 PM 1575883782 via gem5-users <
gem5-users@gem5.org> wrote:

> Yes, I am running DNNMark inside docker, and the version is v21-2. I run
> the command via the remote-container plugin of VS Code.
>
> ---Original---
> *From:* "Matt Sinclair via gem5-users"<gem5-users@gem5.org>
> *Date:* Sat, Feb 12, 2022 01:41 AM
> *To:* "gem5 users mailing list"<gem5-users@gem5.org>;
> *Cc:* "1575883782"<1575883...@qq.com>;"Kyle Roarty"<kroa...@wisc.edu>;"Matt
> Sinclair"<mattdsinclair.w...@gmail.com>;
> *Subject:* [gem5-users] Re: Gem5 GCN3 DNNMark benchmark error
> (fwd_softmax is ok, but others are not)
>
> One more question for you, original poster: are you running DNNMark inside
> the docker resources we provided:
> http://resources.gem5.org/resources/dnn-mark?
>
> Or are you trying to get this running on your machine directly?
>
> Matt
>
> On Fri, Feb 11, 2022 at 11:37 AM Matt Sinclair <
> mattdsinclair.w...@gmail.com> wrote:
>
>> Kyle, can you please help with this?  I don't recall when we last tested
>> bwd_act.
>>
>> Matt
>>
>> On Fri, Feb 11, 2022 at 2:18 AM 1575883782 via gem5-users <
>> gem5-users@gem5.org> wrote:
>>
>>> Hi,
>>>
>>> I was trying to run the DNNMark benchmark with the GCN3 GPU model, following
>>> the instructions
>>> on http://resources.gem5.org/resources/dnn-mark
>>> <https://www.gem5.org/documentation/general_docs/gpu_models/GCN3>.
>>>
>>> I succeeded in running fwd_softmax, but when I run other layers, for
>>> example "bwd_activation", I run into problems.
>>>
>>>
>>> I tried to run the gem5 DNNMark bwd_activation benchmark on 2 computers.
>>>
>>>
>>> The first computer has 32 GB of memory. gem5 could run fwd_softmax
>>> successfully, but was always killed while running bwd_activation. The only
>>> error message was "Killed" + process id. I guess this computer does not
>>> have enough memory to run it.
>>>
>>>
>>> The second computer has 256 GB of memory. gem5 could run fwd_softmax
>>> successfully, but ran into problems while running bwd_activation. I solved
>>> some of them, but not all. The error messages are:
>>>
>>>
>>> > I0909 01:46:50.680040   100 dnn_wrapper.h:341] enter dnnmarkActivationBackward func
>>> > build/GCN3_X86/sim/mem_pool.cc:110: warn: Reached m5ops MMIO region
>>> > build/GCN3_X86/sim/mem_pool.cc:110: warn: Reached m5ops MMIO region
>>> > build/GCN3_X86/sim/mem_pool.cc:110: warn: Reached m5ops MMIO region
>>> > build/GCN3_X86/sim/mem_pool.cc:110: warn: Reached m5ops MMIO region
>>> > build/GCN3_X86/arch/x86/faults.cc:170: panic: Tried to read unmapped address 0.
>>> > PC: 0x7fffeef84b80, Instr:   FMUL2_M : ldfp87   %ufp1, DS:[rdx]
>>> > Memory Usage: 46436124 KBytes
>>> > Program aborted at tick 10680071080500
>>> >
>>>
>>>
>>> Sometimes the errors are:
>>>
>>> > panic: Tried to write unmapped address 0x2b95d881.
>>>
>>> or
>>>
>>> > panic: Tried to write unmapped address 0x3.
>>>
>>>
>>> According to my log, the problem happens in the
>>> "dnnmarkActivationBackward" func.
>>>
>>> > LOG(INFO) << "enter dnnmarkActivationBackward func";
>>> > #ifdef AMD_MIOPEN
>>> >   MIOPEN_CALL(miopenActivationBackward(
>>> >               mode == COMPOSED ?
>>> >               handle.GetMIOpen(idx) : handle.GetMIOpen(),
>>> >               activation_desc.Get(),
>>> >               alpha,
>>> >               top_desc.Get(), y,
>>> >               top_desc.Get(), dy,
>>> >               bottom_desc.Get(), x,
>>> >               beta,
>>> >               bottom_desc.Get(), dx));
>>> > #endif
>>> >   LOG(INFO) << "exit dnnmarkActivationBackward func";
>>>
>>>
>>> It seems to be a MIOpen interface function. I don't know how to solve it.
>>> Could someone help me?
>>>
>>>
>>> PS:
>>>
>>> My gem5 version is v21-2, and the docker image is v21-2.
>>>
>>> My run command is: build/GCN3_X86/gem5.opt --outdir=$outdir
>>> configs/example/apu_se.py -n 10 --mem-size=8GB 
>>> --benchmark-root=$BenchmarkRoot/test_bwd_activation -c 
>>> dnnmark_test_bwd_activation --options="-config 
>>> "$ConfigRoot"/activation_config.dnnmark -mmap "$MMAPFile" -debuginfo 1"
>>>
>>> Both computers have no AMD GPU.
>>>
>>> _______________________________________________
>>> gem5-users mailing list -- gem5-users@gem5.org
>>> To unsubscribe send an email to gem5-users-le...@gem5.org