Hi Matt,

Thanks for the quick reply.

I am running the benchmarks on research clusters where Docker is not permitted, so I have to build and install everything locally. I have modified the coherence protocol, and porting those changes to a newer gem5 version may take some time, so I am stuck with v21.0.0 for now. The modifications are basically flags that identify certain packet types, so I am assuming I haven't broken the protocol. Also, I have run the square benchmark, 2DConvolution and FDTD-2D to completion for smaller input sizes (checked against the CPU execution results). If this version of gem5 supports anything higher than ROCm 1.6.x, I will try to build and use it.

To build HCC, I used the following command. I looked at the CMakeLists.txt of the other dependencies, but they don't seem to use the HSA_AMDGPU_GPU_TARGET variable:

    cmake -DCMAKE_INSTALL_PREFIX=rocm/hcc -DROCM_ROOT=rocm -DHSA_AMDGPU_GPU_TARGET="gfx801" -DCMAKE_BUILD_TYPE=Release ..

And I build Polybench using:

    hipcc --amdgpu-target=gfx801 -O2 2DConvolution.cpp -Igem5/include -Lgem5/util/m5/build/x86/out -Lgcc/lib64 -o 2DConvolution.exe -lm5

I do remember that while compiling HCC, the bin/cmake-tests build was failing because it used the generated clang++, which was unable to find libstdc++.so. Maybe the generated clang++ ignores LIBRARY_PATH at compile time. So I modified the generated CMake file to add "-Lgcc/lib64" so that make and make install complete. The downside is that I have to pass "-Lgcc/lib64" explicitly when compiling benchmarks with hipcc. Also, square completes, so I think LD_LIBRARY_PATH works at runtime.

I did see the commits you recently merged, but I wasn't sure whether I could retroactively apply them to v21.0.0, which also has my own modifications. Should I go ahead and make the VIPER_TCC changes? Also, I will definitely try to submit the benchmarks if they work out.

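On the "-Lgcc/lib64" point above, one way to avoid typing the flag for every benchmark is a small shell wrapper around hipcc. This is only a minimal sketch: the function name hipcc_local and the variables HIPCC_BIN and GCC_LIB_DIR are made up for illustration, and it assumes the locally built libstdc++ really lives under gcc/lib64.

```shell
# Hypothetical helper, not part of ROCm or gem5.
# GCC_LIB_DIR: directory holding the locally built libstdc++ (assumed path).
# HIPCC_BIN:   compiler driver to invoke; defaults to hipcc on PATH.
GCC_LIB_DIR="${GCC_LIB_DIR:-gcc/lib64}"
HIPCC_BIN="${HIPCC_BIN:-hipcc}"

hipcc_local() {
    # Prepend the library search path, then forward all arguments unchanged.
    "$HIPCC_BIN" "-L${GCC_LIB_DIR}" "$@"
}
```

With this sourced in the shell, the Polybench build would become: hipcc_local --amdgpu-target=gfx801 -O2 2DConvolution.cpp -Igem5/include -Lgem5/util/m5/build/x86/out -o 2DConvolution.exe -lm5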
Regards,
Sampad

On Sat, Oct 9, 2021 at 12:34 PM Matt Sinclair via gem5-users <gem5-users@gem5.org> wrote:

> Hi Sampad,
>
> I have not seen anyone attempt to run workloads in a way you are
> attempting, so I can't offer every solution, but here are a few things I
> noticed:
>
> - Why are you still using ROCm 1.6.x? And why did you build it from
> source? I strongly recommend using the built-in docker support (which
> supports ROCm 4.0 now). The error #4 you are having is almost definitely
> because something you built from source is not built correctly. But the
> possible causes of this error are disparate, so I can't suggest anything
> specific about how to fix it. Basically, that error means something went
> wrong when running the application, which almost always (in my experience)
> is due to not installing ROCm correctly. If you need to continue on with
> ROCm 1.6.x, I would recommend looking at the old commits before ROCm 4.0
> support was added, and use the docker support there.
>
> - Error #3 likely comes from how you are compiling the program with
> hipcc/hcc. Depending on which commit you are using, you need to only use
> gfx801, gfx803, gfx900, or gfx902. Since you seem to be using a slightly
> older setup, probably the issue is you are compiling for something other
> than gfx801 (also if you are compiling for gfx803 or gfx900, did you use
> the -dgpu flag on the command line?). It is likely error #1 is related to
> this too.
>
> - Error #2 will require getting a Ruby trace and looking at what's
> happening with those addresses (ProtocolTrace debug flag is the most
> important flag to use). You may find the following useful:
> https://www.gem5.org/documentation/learning_gem5/part3/MSIdebugging/.
> Having said that, note that I recently merged two fixes to the VIPER TCC
> that may be relevant/useful:
> https://gem5-review.googlesource.com/c/public/gem5/+/51368,
> https://gem5-review.googlesource.com/c/public/gem5/+/51367
>
> Finally, Polybench is not officially supported. If you get them working,
> it would be great if you submit them to gem5-resources (
> resources.gem5.org/) to allow others to also use them!
>
> Thanks,
> Matt
>
> On Sat, Oct 9, 2021 at 9:47 AM Sampad Mohapatra via gem5-users <
> gem5-users@gem5.org> wrote:
>
>> Hi All,
>>
>> I am running gem5 v21.0.0.0, rocm v1.6.x (built from source). The
>> simulations run one host CPU (its pair runs a tiny binary and ends exec
>> quickly) to launch GPU benchmark (hipified Polybench GPU) and one CPU of a
>> separate core-pair(its 2nd core runs a lightweight binary and ends exec
>> quickly) to launch a SPEC-17 CPU benchmark on a 3x3 Mesh network. And I am
>> facing 4 different kinds of errors and am requesting some help regarding
>> them. The GPU benchmarks do "malloc"s of size ranging from 2GB - 10GB. The
>> errors appear on various combination of CPU and GPU benchmarks.
>>
>> (1) The below error appears and disappears on different simulation runs
>> """""
>> fdtd2d: ../ROCR-Runtime/src/core/runtime/amd_gpu_agent.cpp:577: virtual
>> void amd::GpuAgent::InitDma(): Assertion `queues_[QueueBlitOnly] != __null
>> && "Queue creation failed"' failed.
>> """""
>>
>> (2) Similar errors with varying values
>> """""
>> panic: Possible Deadlock detected. Aborting!
>> version: 4 request.paddr: 0x190b80c uncoalescedTable: 4 current time:
>> 12393604096000 issue_time: 12393350811000 difference: 253285000
>> Request Tables:
>>
>> Listing pending packets from 4 instructions Addr: [0x2379b, line
>> 0x23780] with 0 pending packets
>> Addr: [0x237ae, line 0x23780] with 64 pending packets
>> Addr: [0x237b0, line 0x23780] with 56 pending packets
>> Addr: [0x237b5, line 0x23780] with 61 pending packets
>> Memory Usage: 57420616 KBytes
>> """""
>>
>> (3) The below error appears and disappears on different simulation runs:
>> """""
>> There is no device can be used to do the computation
>> """""
>>
>> (4) The below error appears and disappears on different simulation runs:
>> """""
>> fatal: syscall mincore (#27) unimplemented.
>> """""
>>
>> Thanks and Regards,
>> Sampad Mohapatra
>> _______________________________________________
>> gem5-users mailing list -- gem5-users@gem5.org
>> To unsubscribe send an email to gem5-users-le...@gem5.org
>> %(web_page_url)slistinfo%(cgiext)s/%(_internal_name)s