[gem5-dev] Re: Problem on simulating GCN3 GPU: Running DNNMark too slow.

Matt Sinclair via gem5-dev Tue, 09 May 2023 14:35:15 -0700

Hi,

Trying to answer your various questions:

1.  Similar to #2 below, I am unclear what "blocked" means.  It sounds like
the program is just running, but is slower than you were hoping it would
be?  If so, unfortunately, this is a well-known problem with detailed
simulators like gem5 -- they can take a long time to simulate a workload.
Another possibility is that you aren't using enough CPU thread contexts;
see #2 below.  If you are willing to, you can decrease the batch size,
which usually makes the program simulate faster.  For FWD_FC in
particular, you would do this by decreasing n (e.g., from 100 to 4, 8, or
16):
https://gem5.googlesource.com/public/gem5-resources/+/refs/heads/develop/src/gpu/DNNMark/config_example/fc_config.dnnmark#6
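
As a rough sketch of the batch-size change in #1, the snippet below edits a
stand-in file rather than the real fc_config.dnnmark.  The `n=100` line
syntax and the `num_output` key are assumptions made for illustration, so
check the actual file before applying the same sed expression to it.

```shell
# Stand-in for fc_config.dnnmark; the key=value layout is assumed, not copied.
cfg="$(mktemp)"
printf 'n=100\nnum_output=4096\n' > "$cfg"

# Rewrite only a line that is exactly "n=<number>", leaving other keys alone.
new_n=4
sed "s/^n=[0-9][0-9]*\$/n=${new_n}/" "$cfg" > "${cfg}.small"

# Show the edited copy; the original stand-in file is left untouched.
cat "${cfg}.small"
```

Writing to a separate copy keeps the original config available for
comparison runs with the larger batch size.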

2.  Define blocked -- what does this mean?  The bigger benchmarks here are
very large ML workloads, and it would not surprise me if they took days
(or maybe weeks) to run end-to-end in gem5.  Are you seeing kernels
progress (e.g., use the GPUKernelInfo debug flag to print when kernels
launch and exit)?  If you are seeing kernels progress, it's just a really
large workload and you'd have to be more patient.  My group is working on
ways to cut down runtime for workloads like this, but we have not
specifically tested them on these workloads, and there is no ETA on when
that would be available/fully working.

It is also possible that you aren't running with enough CPU thread contexts
and the program is infinitely looping there (ROCm launches additional CPU
processes when setting up a GPU program, and these require gem5 to have
additional CPU thread contexts).  Without knowing where the program seems
to be blocked, it's hard to say whether this is the problem.  You could
try increasing -n on the command line (e.g., from 3 to 5, or from 5 to 10)
to see if this resolves the current problem.  This will not resolve the
slow-simulation issue above, though.
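
A minimal sketch of both suggestions in #2, assuming the paths from the
quoted test_fwd_fc command: it assembles the gem5 part of the command with
the GPUKernelInfo debug flag enabled and -n raised from 3 to 5.  The
variable names are only illustrative, and the snippet prints the command
rather than running it.

```shell
# Values copied from the quoted commands, except the two marked changes.
DNNMARK=gem5-resources/src/gpu/DNNMark
THREADS=5                             # changed: was 3 in the original run
DEBUG="--debug-flags=GPUKernelInfo"   # changed: print kernel launch/exit

# gem5 options such as --debug-flags go before the config script.
GEM5_CMD="gem5/build/GCN3_X86/gem5.opt ${DEBUG} \
gem5/configs/example/apu_se.py -n ${THREADS} \
--benchmark-root=${DNNMARK}/build/benchmarks/test_fwd_fc \
-c dnnmark_test_fwd_fc"

# Print the command instead of running it.  To execute it, prefix the docker
# invocation from the original message and append the --options="..." part.
echo "${GEM5_CMD}"
```

If kernel launch and exit messages keep appearing in the output, the run is
making progress and is simply slow.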

3.  I have never personally tried modeling a Transformer in DNNMark, so
this might be a better question for the DNNMark authors.  But ultimately
what you are suggesting is the right way to model things in DNNMark -- in
the config files you can specify a series of layers, one connected after
another.  So, if you knew what the layers in a Transformer are, in theory
you could express it in a config file.  This assumes that DNNMark supports
all of the layers in a Transformer, though, and I do not know whether it
does (you would need to ask the DNNMark authors).

4.  This seems like a question for DNNMark's authors.  On the gem5 side, we
are just running DNNMark unmodified.  What I can recommend is that you
start with the base files (e.g.,
https://gem5.googlesource.com/public/gem5-resources/+/refs/heads/develop/src/gpu/DNNMark/benchmarks/test_alexnet/test_alexnet.cc)
and the config files (e.g.,
https://gem5.googlesource.com/public/gem5-resources/+/refs/heads/develop/src/gpu/DNNMark/config_example/alexnet.dnnmark)
and go from there.  When I started with DNNMark, I would watch the LOG
messages it prints to the screen, then grep for those messages and examine
the code that emits them.
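
The grep step in #4 could look roughly like this.  The directory tree, file
name, and log message below are hypothetical stand-ins so the snippet runs
on its own; in practice you would point grep at
gem5-resources/src/gpu/DNNMark and search for a message you actually saw.

```shell
# Throwaway tree standing in for the DNNMark source directory.
src="$(mktemp -d)"
mkdir -p "$src/core/include"
printf 'LOG(INFO) << "Hypothetical message";\n' > "$src/core/include/example.h"

# -r: recurse, -n: show line numbers, -F: treat the message as a fixed string.
hits="$(grep -rnF 'Hypothetical message' "$src")"
echo "$hits"
```

Using -F avoids surprises when a log message contains characters that grep
would otherwise interpret as regular-expression syntax.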

5.  What is "ruby memory" -- is this L1, L2, or main memory size?
Something else?  There are documents like this:
https://www.gem5.org/2020/06/01/towards-full.html,
https://www.gem5.org/2020/05/30/enabling-multi-gpu.html,
https://www.gem5.org/2020/05/27/modern-gpu-applications.html, and
https://www.gem5.org/documentation/general_docs/gpu_models/GCN3.  The GPU
Ruby system uses the same building blocks as the CPU Ruby models:
https://www.gem5.org/documentation/learning_gem5/part3/MSIintro/.  Not sure
what exactly you are looking for though.

Thanks,
Matt

On Tue, May 9, 2023 at 4:34 AM 429442672 <429442...@qq.com> wrote:

>
> hi everyone,
>
> I have successfully built and run DNNMark using the command:
>
> sudo docker run --rm -v ${PWD}:${PWD} -v
> ${PWD}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0
> -w ${PWD} gcn-gpu
> gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py -n 3
> --benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_softmax
> -cdnnmark_test_fwd_softmax
> --options="-config
> gem5-resources/src/gpu/DNNMark/config_example/softmax_config.dnnmark -mmap
> gem5-resources/src/gpu/DNNMark/mmap.bin"
>
> with the output
>
> Exiting because exiting with last active thread context
>
> which may mean I have correctly set up the running environment.
>
>
> However, I tried several benchmarks in
>
>
> but met the following problems:
>
> 1. problem on running test_fwd_fc
>
> When i run test_fwd_fc using:
>
> sudo docker run --rm -v ${PWD}:${PWD} -v
> ${PWD}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0
> -w ${PWD} gcn-gpu gem5/build/GCN3_X86/gem5.opt
> gem5/configs/example/apu_se.py -n3
> --benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_fc
> -c dnnmark_test_fwd_fc
> --options="-config
> gem5-resources/src/gpu/DNNMark/config_example/fc_config.dnnmark -mmap
> gem5-resources/src/gpu/DNNMark/DNNMark_data.dat"
>
> the program has been running for a few hours, even though I have modified
> the input data (mmap.bin -> DNNMark_data.dat) to a smaller size of 300MB
> (2GB by default).
> I have also tried several other benchmarks. The only ones that finished
> are test_fwd_pool and test_bwd_pool; when I ran benchmarks such as
> conv, pool, and fc, the program blocked without any output.
> Is there anything I did wrong here? Or are these benchmarks too
> compute-intensive, leading to slow runs?
> May I ask for any suggestions for running these benchmarks?
>
> 2. problem on running test_VGG and test_alexnet.
>
> I run them with the commands:
>
> sudo docker run --rm -v ${PWD}:${PWD} -v
> ${PWD}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0
> -w ${PWD} gcn-gpu gem5/build/GCN3_X86/gem5.opt
> gem5/configs/example/apu_se.py -n3
> --benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_alexnet
> -c dnnmark_test_alexnet
> --options="-config
> gem5-resources/src/gpu/DNNMark/config_example/alexnet.dnnmark -mmap
> gem5-resources/src/gpu/DNNMark/mmap.bin"
>
> and
>
> sudo docker run --rm -v ${PWD}:${PWD} -v
> ${PWD}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0
> -w ${PWD} gcn-gpu gem5/build/GCN3_X86/gem5.opt
> gem5/configs/example/apu_se.py -n3
> --benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_VGG
> -c dnnmark_test_VGG
> --options="-config
> gem5-resources/src/gpu/DNNMark/config_example/VGG.dnnmark -mmap
> gem5-resources/src/gpu/DNNMark/mmap.bin"
>
>
> but they are also blocked.
> May I ask for any suggestions for running these benchmarks?
>
> 3. question on modifying DNN network.
>
> May I ask how to modify the DNN network architecture? For example, is it
> possible to make a transformer block and run it on gem5? It seems that I
> can change the configs in /DNNMark/config_example, following the
> example of alexnet.dnnmark, without modifying the code in
> DNNMark/benchmarks/test_alexnet. Is that correct?
>
> 4. How can I get a trace when running DNNMark?
>
> Running DNNMark seems like a black box. Is it possible to get a trace of
> a DNNMark run? For example, the process of data loading, computing, etc.
>
> 5. question on apu_se.py
>
> It seems that all the benchmarks require apu_se.py. Is there any more
> detailed documentation that introduces what apu_se.py does and how to
> modify it? For example, how can I add more Ruby memory to the GPU?
>
>
>
>
> The documentation for the gem5 GCN GPU is pretty sparse; if it is
> possible, could anyone provide some help for me?
>
> Thank you all very much!
>

_______________________________________________
gem5-dev mailing list -- gem5-dev@gem5.org
To unsubscribe send an email to gem5-dev-le...@gem5.org
