Hi,

Trying to answer your various questions:

1. Similar to #2 below, I am unclear what "blocked" means. It sounds like the program is running, but more slowly than you were hoping? If so, unfortunately, this is a well-known problem with detailed simulators like gem5 -- they can take a long time to simulate a workload. There is another possibility, though: you may not be using enough CPU thread contexts (see #2 below). If you are willing to, you can decrease the batch size, which usually makes the program simulate faster. For FWD_FC in particular, you would do this by decreasing n (e.g., from 100 to 4, 8, or 16): https://gem5.googlesource.com/public/gem5-resources/+/refs/heads/develop/src/gpu/DNNMark/config_example/fc_config.dnnmark#6 .

2. Define blocked -- what does this mean? The bigger benchmarks here are very large ML workloads; it would not surprise me if they took days (or maybe weeks) to run end-to-end in gem5. Are you seeing kernels progress (e.g., use the GPUKernelInfo debug flag to print when kernels launch and exit)? If kernels are progressing, it's just a really large workload and you will have to be more patient. My group is working on ways to cut down the runtime of workloads like this, but we have nothing specifically tested for these workloads and no ETA on when that would be available or fully working.

It is also possible that you aren't running with enough CPU thread contexts and the program is looping forever there (ROCm launches additional CPU processes when setting up a GPU program, and these require gem5 to have additional CPU thread contexts). Without knowing where the program seems to be blocked, it's hard to say whether this is the problem. But you could try increasing -n on the command line (e.g., from 3 to 5, or from 5 to 10) to see if this resolves it. This will not resolve the slow-simulation issue above, though.

3. I have never personally tried modeling a Transformer in DNNMark, so this might be a better question for the DNNMark authors. But ultimately what you are suggesting is the right way to model things in DNNMark -- in the config files you can specify a series of layers, one connected after another. So, if you know what the layers in a Transformer are, in theory you could express it in a config file. This assumes that DNNMark supports all of the layers in a Transformer, and I do not know whether it does (you would need to ask the DNNMark authors).

4. This seems like a question for DNNMark's authors. In gem5, we are just running DNNMark. What I can recommend is that you start with the base files (e.g., https://gem5.googlesource.com/public/gem5-resources/+/refs/heads/develop/src/gpu/DNNMark/benchmarks/test_alexnet/test_alexnet.cc) and the config files (e.g., https://gem5.googlesource.com/public/gem5-resources/+/refs/heads/develop/src/gpu/DNNMark/config_example/alexnet.dnnmark) and go from there. When I started with DNNMark, I would look at the LOG messages it prints to the screen, then grep for those messages and examine the surrounding code.

5. What is "ruby memory" -- is this the L1, L2, or main memory size? Something else? There are documents like these:
https://www.gem5.org/2020/06/01/towards-full.html
https://www.gem5.org/2020/05/30/enabling-multi-gpu.html
https://www.gem5.org/2020/05/27/modern-gpu-applications.html
https://www.gem5.org/documentation/general_docs/gpu_models/GCN3
The GPU Ruby system uses the same building blocks as the CPU Ruby models: https://www.gem5.org/documentation/learning_gem5/part3/MSIintro/. I'm not sure what exactly you are looking for, though.
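
To make the suggestions in #1 and #2 concrete, here is a minimal sketch of a Python helper that assembles the gem5 command line with the GPUKernelInfo debug flag enabled and a larger -n. The helper name build_gem5_command and its parameters are hypothetical, the paths are copied from the commands in this thread, and the value 5 for -n is just one of the values suggested above. It only builds and prints the command; you would still run it after "gcn-gpu" in your docker invocation.

```python
import shlex


def build_gem5_command(benchmark, config, mmap, num_cpus=5,
                       debug_flags=("GPUKernelInfo",)):
    """Return the gem5 argv list for one DNNMark benchmark.

    Hypothetical helper: paths follow the layout used in this thread
    and may need adjusting for your checkout.
    """
    dnnmark = "gem5-resources/src/gpu/DNNMark"
    cmd = ["gem5/build/GCN3_X86/gem5.opt"]
    if debug_flags:
        # gem5's own options go before the config script.
        cmd.append("--debug-flags=" + ",".join(debug_flags))
    cmd += [
        "gem5/configs/example/apu_se.py",
        # More CPU thread contexts, since ROCm launches extra CPU processes.
        "-n", str(num_cpus),
        f"--benchmark-root={dnnmark}/build/benchmarks/{benchmark}",
        "-c", f"dnnmark_{benchmark}",
        # Arguments passed through to the DNNMark binary itself.
        f"--options=-config {dnnmark}/config_example/{config} "
        f"-mmap {dnnmark}/{mmap}",
    ]
    return cmd


if __name__ == "__main__":
    cmd = build_gem5_command("test_fwd_fc", "fc_config.dnnmark", "mmap.bin")
    # shlex.join quotes the --options argument so it can be pasted into a shell.
    print(shlex.join(cmd))
```

If kernel launch and exit messages keep appearing with the flag enabled, the run is progressing and is just slow; if nothing appears for a long time, raising num_cpus is the next thing to try.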

Thanks,
Matt

On Tue, May 9, 2023 at 4:34 AM 429442672 <429442...@qq.com> wrote:
>
> Hi everyone,
>
> I have successfully built and run DNNMark using the command:
>
> sudo docker run --rm -v ${PWD}:${PWD} -v
> ${PWD}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0
> -w ${PWD} gcn-gpu
> gem5/build/GCN3_X86/gem5.opt gem5/configs/example/apu_se.py -n 3
> --benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_softmax
> -cdnnmark_test_fwd_softmax
> --options="-config
> gem5-resources/src/gpu/DNNMark/config_example/softmax_config.dnnmark -mmap
> gem5-resources/src/gpu/DNNMark/mmap.bin"
>
> with the output
>
> Exiting because exiting with last active thread context
>
> which may mean I have set up the running environment correctly.
>
>
> However, I tried several benchmarks in
>
>
> but met the following problems:
>
> 1. Problem running test_fwd_fc
>
> When I run test_fwd_fc using:
>
> sudo docker run --rm -v ${PWD}:${PWD} -v
> ${PWD}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0
> -w ${PWD} gcn-gpu gem5/build/GCN3_X86/gem5.opt
> gem5/configs/example/apu_se.py -n3
> --benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_fwd_fc
> -c dnnmark_test_fwd_fc
> --options="-config
> gem5-resources/src/gpu/DNNMark/config_example/fc_config.dnnmark -mmap
> gem5-resources/src/gpu/DNNMark/DNNMark_data.dat"
>
> the program runs for a few hours, even though I have modified the
> input data (mmap.bin -> DNNMark_data.dat) to a smaller size, 300MB (2GB by
> default).
> I have also tried several other benchmarks. The only benchmarks I have
> completed are test_fwd_pool and test_bwd_pool. When I run benchmarks such as
> conv, pool, and fc, the program is blocked, without any output.
> Is there anything I did wrong here? Or are these benchmarks too
> compute-intensive to run, leading to slow simulation?
> May I ask for any suggestions for running these benchmarks?
>
> 2. Problem running test_VGG and test_alexnet
>
> I run them with the commands:
>
> sudo docker run --rm -v ${PWD}:${PWD} -v
> ${PWD}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0
> -w ${PWD} gcn-gpu gem5/build/GCN3_X86/gem5.opt
> gem5/configs/example/apu_se.py -n3
> --benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_alexnet
> -c dnnmark_test_alexnet
> --options="-config
> gem5-resources/src/gpu/DNNMark/config_example/alexnet.dnnmark -mmap
> gem5-resources/src/gpu/DNNMark/mmap.bin"
>
> and
>
> sudo docker run --rm -v ${PWD}:${PWD} -v
> ${PWD}/gem5-resources/src/gpu/DNNMark/cachefiles:/root/.cache/miopen/2.9.0
> -w ${PWD} gcn-gpu gem5/build/GCN3_X86/gem5.opt
> gem5/configs/example/apu_se.py -n3
> --benchmark-root=gem5-resources/src/gpu/DNNMark/build/benchmarks/test_VGG
> -c dnnmark_test_VGG
> --options="-config
> gem5-resources/src/gpu/DNNMark/config_example/VGG.dnnmark -mmap
> gem5-resources/src/gpu/DNNMark/mmap.bin"
>
>
> but they are also blocked.
> May I ask for any suggestions for running these benchmarks?
>
> 3. Question on modifying the DNN network
>
> May I ask how to modify the DNN network architecture? For example, is it
> possible to build a transformer block and run it on gem5? It seems that I
> can change the configs in /DNNMark/config_example, following the
> example of alexnet.dnnmark, without modifying the code in
> DNNMark/benchmarks/test_alexnet. May I ask whether that is correct?
>
> 4. How can I get a trace when running DNNMark?
>
> Running DNNMark seems like a black box. Is it possible to get a trace of
> a DNNMark run? For example, the process of data loading, computing, etc.
>
> 5. Question on apu_se.py
>
> It seems that all the benchmarks require apu_se.py. May I ask whether there are
> any more detailed documents that explain what apu_se.py does and how to
> modify it? For example, how can I add more ruby memory to the GPU?
>
>
>
>
> There is very little documentation or introductory material for the gem5 GCN
> GPU model. If possible, could anyone provide some help?
>
> Thank you all very much!
>
_______________________________________________
gem5-dev mailing list -- gem5-dev@gem5.org
To unsubscribe send an email to gem5-dev-le...@gem5.org