Thank you so much for taking the time to answer my questions.

Regarding question 1:

Yes, by "blocked" I meant what you described: "the program is just running".

I followed your suggestion and made some modifications:


a. for src/gpu/DNNMark/config_example/fc_config.dnnmark:

b. I generated a 30 MB data file to use as input instead of mmap.bin.
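
For reference, a data file of a chosen size can be produced with a short script. This is only a sketch: it assumes the -mmap file is read as a flat array of 32-bit floats, which this thread does not confirm, and the function name, value range, and seed are illustrative.

```python
import os
import random
import struct
import tempfile


def write_float_file(path, size_mib, seed=0):
    """Write size_mib MiB of pseudo-random float32 values to path.

    Assumption (not confirmed in this thread): the benchmark treats the
    -mmap file as a flat array of 32-bit floats.
    Returns the resulting file size in bytes.
    """
    rng = random.Random(seed)
    floats_per_chunk = 256 * 1024  # 256 Ki floats * 4 bytes = 1 MiB
    packer = struct.Struct("<%df" % floats_per_chunk)
    with open(path, "wb") as out:
        for _ in range(size_mib):
            chunk = [rng.uniform(-1.0, 1.0) for _ in range(floats_per_chunk)]
            out.write(packer.pack(*chunk))
    return os.path.getsize(path)


if __name__ == "__main__":
    # Small demo in a temporary directory; pass size_mib=30 for a 30 MiB file.
    with tempfile.TemporaryDirectory() as tmp:
        demo = os.path.join(tmp, "DNNMark_data.dat")
        print(write_float_file(demo, size_mib=1), "bytes written")
```

Writing one MiB at a time keeps memory use small even for much larger files.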



Then I ran:

sudo docker run --rm -v ${PWD}:${PWD} -v
${PWD}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 -w
${PWD} gcn-gpu gem5/build/GCN3_X86/gem5.opt
gem5/configs/example/apu_se.py -n4
--benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_fc -c
dnnmark_test_fwd_fc
--options="-config
gem5-resources/src/gpu/DNNMark/config_example/fc_config.dnnmark -mmap
gem5-resources/src/gpu/DNNMark/DNNMark_data.dat"





After a few hours, I got the following output:



build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...)
build/GCN3_X86/arch/generic/debugfaults.hh:145: warn: MOVNTDQ: Ignoring non-temporal hint, modeling as cacheable!
build/GCN3_X86/sim/mem_state.cc:99: panic: Someone allocated physical memory at VA 0x10000000 without creating a VMA!
Memory Usage: 22622544 KBytes
Program aborted at tick 10636412834000


--- BEGIN LIBC BACKTRACE ---
gem5/build/GCN3_X86/gem5.opt(+0x19efd50)[0x560c0deabd50]
gem5/build/GCN3_X86/gem5.opt(+0x1a1425e)[0x560c0ded025e]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7f33bb73f420]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7f33ba8e400b]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7f33ba8c3859]
gem5/build/GCN3_X86/gem5.opt(+0x4c20a5)[0x560c0c97e0a5]
gem5/build/GCN3_X86/gem5.opt(+0x1a80f2b)[0x560c0df3cf2b]
gem5/build/GCN3_X86/gem5.opt(+0x1a81623)[0x560c0df3d623]
gem5/build/GCN3_X86/gem5.opt(+0x1a928ab)[0x560c0df4e8ab]
gem5/build/GCN3_X86/gem5.opt(+0x12a3c92)[0x560c0d75fc92]
gem5/build/GCN3_X86/gem5.opt(+0x12dc7f5)[0x560c0d7987f5]
gem5/build/GCN3_X86/gem5.opt(+0x1304b15)[0x560c0d7c0b15]
gem5/build/GCN3_X86/gem5.opt(+0x1304cc0)[0x560c0d7c0cc0]
gem5/build/GCN3_X86/gem5.opt(+0x1a9427f)[0x560c0df5027f]
gem5/build/GCN3_X86/gem5.opt(+0x129bef0)[0x560c0d757ef0]
gem5/build/GCN3_X86/gem5.opt(+0x1a7333f)[0x560c0df2f33f]
gem5/build/GCN3_X86/gem5.opt(+0x16b9804)[0x560c0db75804]
gem5/build/GCN3_X86/gem5.opt(+0x16b3dc8)[0x560c0db6fdc8]
gem5/build/GCN3_X86/gem5.opt(+0x16b4b80)[0x560c0db70b80]
gem5/build/GCN3_X86/gem5.opt(+0x1a03665)[0x560c0debf665]
gem5/build/GCN3_X86/gem5.opt(+0x1a2bab4)[0x560c0dee7ab4]
gem5/build/GCN3_X86/gem5.opt(+0x1a2c093)[0x560c0dee8093]
gem5/build/GCN3_X86/gem5.opt(+0xadded2)[0x560c0cf99ed2]
gem5/build/GCN3_X86/gem5.opt(+0x4b6757)[0x560c0c972757]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x2a8748)[0x7f33bb9f6748]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x8dd8)[0x7f33bb7cbf48]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalCodeWithName+0x8fb)[0x7f33bb918e4b]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyFunction_Vectorcall+0x94)[0x7f33bb9f6124]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x74d6d)[0x7f33bb7c2d6d]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x7d86)[0x7f33bb7caef6]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalCodeWithName+0x8fb)[0x7f33bb918e4b]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(PyEval_EvalCodeEx+0x42)[0x7f33bb9191d2]
--- END LIBC BACKTRACE ---
Failed to execute default signal handler!
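
When a run prints for hours before failing, it can help to pull the warn and panic lines out of a saved log. The sketch below assumes only the "file:line: level: message" shape visible in the output above and that each message sits on a single line in the raw log; the function name is illustrative.

```python
import re

# Matches lines such as
#   build/GCN3_X86/sim/mem_state.cc:99: panic: Someone allocated ...
LINE = re.compile(r"^(?P<file>\S+):(?P<line>\d+): (?P<level>\w+): (?P<msg>.*)$")


def summarize(log_text):
    """Group 'file:line: level: message' lines by their level.

    Lines that do not follow this shape (backtrace frames, memory usage,
    and so on) are ignored.
    """
    by_level = {}
    for raw in log_text.splitlines():
        match = LINE.match(raw.strip())
        if match:
            entry = (match.group("file"), int(match.group("line")), match.group("msg"))
            by_level.setdefault(match.group("level"), []).append(entry)
    return by_level


if __name__ == "__main__":
    sample = (
        "build/GCN3_X86/sim/syscall_emul.cc:74: warn: ignoring syscall mprotect(...)\n"
        "build/GCN3_X86/sim/mem_state.cc:99: panic: Someone allocated physical "
        "memory at VA 0x10000000 without creating a VMA!\n"
        "Program aborted at tick 10636412834000\n"
    )
    for level, entries in summarize(sample).items():
        for source, number, message in entries:
            print(level, source, number, message)
```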




I don't know what I did wrong.
Have you ever tried running this benchmark, or benchmarks like alexnet or VGG?
Could you give me some advice on running test_fwd_fc successfully?


Thank you!


------------------ Original Message ------------------
From: "Matt Sinclair" <mattdsinclair.w...@gmail.com>
Sent: Wednesday, May 10, 2023, 5:34
To: "429442672" <429442...@qq.com>
Cc: "gem5-dev" <gem5-dev@gem5.org>; "gem5-users" <gem5-us...@gem5.org>; "Poremba, Matthew" <matthew.pore...@amd.com>
Subject: Re: Problem on simulating GCN3 GPU: Running DNNMark too slow.



Hi,


Trying to answer your various questions:


1. Similar to #2 below, I am unclear what "blocked" means. It sounds like the
program is just running, but is slower than you were hoping it would be? If
so, unfortunately, this is a well-known problem with detailed simulators like
gem5: they can take a long time to simulate a workload. However, there is
another possibility, where you aren't using enough thread contexts; see #2
below. If you are willing to, you can decrease the batch size, and usually the
program simulates faster. For FWD_FC in particular, you would do this by
decreasing n (e.g., from 100 to 4, 8, or 16):
https://gem5.googlesource.com/public/gem5-resources/+/refs/heads/develop/src/gpu/DNNMark/config_example/fc_config.dnnmark#6.
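
As a rough illustration of the batch-size change, the script below rewrites the n value in the text of a config file. It assumes the batch size is stored on its own line in the form n=<integer>, which should be checked against your copy of fc_config.dnnmark; the helper name set_batch_size and the sample text are made up for this sketch.

```python
import re


def set_batch_size(config_text, new_n):
    """Return config_text with every 'n=<integer>' line set to new_n.

    Assumption: the batch size is written as 'n=<integer>' on its own line.
    Only lines whose key is exactly 'n' are changed.
    """
    pattern = re.compile(r"^([ \t]*n[ \t]*=[ \t]*)\d+[ \t]*$", re.MULTILINE)
    updated, count = pattern.subn(lambda m: m.group(1) + str(new_n), config_text)
    if count == 0:
        raise ValueError("no 'n=<integer>' line found")
    return updated


if __name__ == "__main__":
    # Made-up sample text, not the contents of the real config file.
    sample = "n=100\nc=256\nh=1\nw=1\n"
    print(set_batch_size(sample, 4))
```

Raising an error when nothing matches avoids silently running the original, larger batch size.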



2. Define "blocked": what does this mean? The bigger benchmarks here are very
large ML workloads, and it would not surprise me if they took days (or maybe
weeks) to run end-to-end in gem5. Are you seeing kernels progressing (e.g., use
the GPUKernelInfo debug flag to print when kernels launch and exit)? If you are
seeing kernels progress, it's just a really large workload and you'd have to be
more patient. My group is working on ways to cut down runtime for workloads
like this, but we have nothing specifically tested for these workloads and no
ETA on when that would be available/fully working.


It is also possible that you aren't running with enough CPU thread contexts and
the program is infinitely looping there (ROCm launches additional CPU processes
when setting up a GPU program, and these require gem5 to have additional CPU
thread contexts). Without knowing where the program seems to be blocked, it's
hard to say whether this is the problem. But you could try increasing -n on the
command line (e.g., from 3 to 5, or from 5 to 10) to see if this resolves the
current problem. This will not resolve the above issue though.
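
To make these two suggestions concrete, here is a sketch that assembles the simulator part of the command (everything after the Docker image name in the original message) with a configurable number of CPU thread contexts and the GPUKernelInfo debug flag. It assumes gem5's --debug-flags option, which goes before the config script; the helper name and its defaults are illustrative, and the paths are the ones from the original message.

```python
import shlex


def gem5_command(benchmark, config, mmap_file, num_cpus=5,
                 debug_flags=("GPUKernelInfo",)):
    """Build the gem5 argument list for one DNNMark benchmark.

    benchmark   e.g. "test_fwd_fc"
    config      e.g. "fc_config.dnnmark"
    mmap_file   e.g. "DNNMark_data.dat"
    num_cpus    value passed to apu_se.py's -n (CPU thread contexts)
    """
    root = "gem5-resources/src/gpu/DNNMark"
    cmd = ["gem5/build/GCN3_X86/gem5.opt"]
    if debug_flags:
        # gem5's own options must come before the config script.
        cmd.append("--debug-flags=" + ",".join(debug_flags))
    cmd += [
        "gem5/configs/example/apu_se.py",
        "-n", str(num_cpus),
        "--benchmark-root=%s/build/benchmarks/%s" % (root, benchmark),
        "-c", "dnnmark_" + benchmark,
        "--options=-config %s/config_example/%s -mmap %s/%s"
        % (root, config, root, mmap_file),
    ]
    return cmd


if __name__ == "__main__":
    argv = gem5_command("test_fwd_fc", "fc_config.dnnmark", "DNNMark_data.dat")
    # shlex.join quotes the --options argument so it can be pasted into a shell.
    print(shlex.join(argv))
```

Keeping the arguments as a list makes it easy to try several -n values in a loop without re-editing a long command line by hand.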



3. I have never personally tried modeling a Transformer in DNNMark, so this
might be a better question for the DNNMark authors. But ultimately what you are
suggesting is the right way to model things in DNNMark: in the config files you
can specify a series of layers, one connected after another. So, if you know
what the layers in a Transformer are, in theory you could express it in a
config file. This assumes that DNNMark supports all of the layers in a
Transformer though, and I do not know whether it does (you would need to ask
the DNNMark authors).



4. This seems like a question for DNNMark's authors. On the gem5 side, we are
simply running DNNMark. But ultimately what I can recommend is that you start
with the base files (e.g.,
https://gem5.googlesource.com/public/gem5-resources/+/refs/heads/develop/src/gpu/DNNMark/benchmarks/test_alexnet/test_alexnet.cc)
and the config files (e.g.,
https://gem5.googlesource.com/public/gem5-resources/+/refs/heads/develop/src/gpu/DNNMark/config_example/alexnet.dnnmark)
and go from there. When I started with DNNMark, I would look at the LOG
messages it prints to the screen, then grep for those messages and examine the
code around them.
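
The grep step can also be done with a small script when a shell is not convenient. This is a sketch only: the function name, the file extensions, and the demo message are illustrative, and the search root would be your local DNNMark checkout.

```python
import os
import tempfile


def find_in_sources(root, needle, extensions=(".cc", ".h", ".cpp", ".hpp")):
    """Yield (path, line_number, line) for source lines containing needle."""
    for dirpath, _dirs, files in os.walk(root):
        for name in sorted(files):
            if not name.endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            with open(path, errors="replace") as src:
                for number, line in enumerate(src, start=1):
                    if needle in line:
                        yield path, number, line.rstrip("\n")


if __name__ == "__main__":
    # Demo on a throwaway directory; point root at the DNNMark sources instead.
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "demo.cc"), "w") as f:
            f.write("// setup\nprint_message(\"Demo message\");\n")
        for path, number, line in find_in_sources(tmp, "Demo message"):
            print("%s:%d: %s" % (os.path.basename(path), number, line))
```

Searching for a distinctive fragment of the printed message, rather than the whole line, tends to work better because messages are often assembled from several pieces.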



5. What is "ruby memory": is this L1, L2, or main memory size? Something else?
There are documents like these:
https://www.gem5.org/2020/06/01/towards-full.html,
https://www.gem5.org/2020/05/30/enabling-multi-gpu.html,
https://www.gem5.org/2020/05/27/modern-gpu-applications.html, and
https://www.gem5.org/documentation/general_docs/gpu_models/GCN3. The GPU Ruby
system uses the same building blocks as the CPU Ruby models:
https://www.gem5.org/documentation/learning_gem5/part3/MSIintro/. I am not sure
what exactly you are looking for though.



Thanks,
Matt



On Tue, May 9, 2023 at 4:34 AM 429442672 <429442...@qq.com> wrote:



Hi everyone,

I have successfully built and run DNNMark using the command:


sudo docker run --rm -v ${PWD}:${PWD} -v
${PWD}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 -w
${PWD} gcn-gpu
gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py -n 3
--benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_softmax
-c dnnmark_test_fwd_softmax
--options="-config
gem5-resources/src/gpu/DNNMark/config_example/softmax_config.dnnmark -mmap
gem5-resources/src/gpu/DNNMark/mmap.bin"

with the output:

Exiting because exiting with last active thread context

which suggests that I have set up the running environment correctly.




However, I tried several other benchmarks and ran into the following problems:


1. Problem running test_fwd_fc


When I run test_fwd_fc using:
sudo docker run --rm -v ${PWD}:${PWD} -v
${PWD}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 -w
${PWD} gcn-gpu gem5/build/GCN3_X86/gem5.opt
gem5/configs/example/apu_se.py -n3
--benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_fc -c
dnnmark_test_fwd_fc
--options="-config
gem5-resources/src/gpu/DNNMark/config_example/fc_config.dnnmark -mmap
gem5-resources/src/gpu/DNNMark/DNNMark_data.dat"

the program runs for a few hours without finishing, even though I have replaced
the input data (mmap.bin -> DNNMark_data.dat) with a smaller 300 MB file (2 GB
by default).
I have also tried several other benchmarks. The only ones that completed are
test_fwd_pool and test_bwd_pool; when I ran benchmarks such as conv, pool, and
fc, the program blocked without any output.
Is there anything I did wrong here? Or are these benchmarks too
compute-intensive, which makes them slow to run?
Do you have any suggestions for running these benchmarks?


2. Problem running test_VGG and test_alexnet


I ran them with the commands:
sudo docker run --rm -v ${PWD}:${PWD} -v
${PWD}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 -w
${PWD} gcn-gpu gem5/build/GCN3_X86/gem5.opt
gem5/configs/example/apu_se.py -n3
--benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_alexnet
-c dnnmark_test_alexnet
--options="-config
gem5-resources/src/gpu/DNNMark/config_example/alexnet.dnnmark -mmap
gem5-resources/src/gpu/DNNMark/mmap.bin"

and
sudo docker run --rm -v ${PWD}:${PWD} -v
${PWD}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0 -w
${PWD} gcn-gpu gem5/build/GCN3_X86/gem5.opt
gem5/configs/example/apu_se.py -n3
--benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_VGG -c
dnnmark_test_VGG
--options="-config gem5-resources/src/gpu/DNNMark/config_example/VGG.dnnmark
-mmap gem5-resources/src/gpu/DNNMark/mmap.bin"

but they are also blocked.
Do you have any suggestions for running these benchmarks?


3. Question on modifying the DNN network


How can I modify the DNN network architecture? For example, is it possible to
build a transformer block and run it on gem5? It seems that I can change the
config files in /DNNMark/config_example, following the example of
alexnet.dnnmark, without modifying the code in
DNNMark/benchmarks/test_alexnet. Is that correct?


4. How can I get a trace when running DNNMark?


Running DNNMark seems like a black box. Is it possible to get a trace of a
DNNMark run? For example, the process of data loading, computing, etc.




5. Question on apu_se.py


It seems that all the benchmarks require apu_se.py. Is there more detailed
documentation that explains what apu_se.py does and how to modify it? For
example, how can I add more Ruby memory to the GPU?








There is very little documentation or introductory material for the gem5 GCN
GPU model. If possible, could anyone provide some help?


Thank you all very much!
_______________________________________________
gem5-dev mailing list -- gem5-dev@gem5.org
To unsubscribe send an email to gem5-dev-le...@gem5.org
