Hi Matt,

Thanks for the quick reply.

I am running the benchmarks on research clusters where running Docker is
not permitted, so I have to build and install everything locally.
I have made modifications to the coherence protocol, and porting them to a
newer gem5 version may take some time, so I am stuck with v21.0.0 for
now.
The modifications are basically flags to identify certain packet
types, so I am assuming that I haven't broken the protocol.
Also, I have run the *square* benchmark and *2DConvolution* and *FDTD-2D* to
completion (output checked against the CPU execution result) for smaller
input sizes.
If this version of gem5 supports anything higher than ROCm 1.6.x, I will
try to build and use it.

To build hcc, I used the following command. I looked at the
CMakeLists.txt files of the other dependencies, but they don't seem to use
the HSA_AMDGPU_GPU_TARGET variable:
cmake -DCMAKE_INSTALL_PREFIX=rocm/hcc -DROCM_ROOT=rocm \
    -DHSA_AMDGPU_GPU_TARGET="gfx801" -DCMAKE_BUILD_TYPE=Release ..

And I build Polybench using:
hipcc --amdgpu-target=gfx801 -O2 2DConvolution.cpp -Igem5/include \
    -Lgem5/util/m5/build/x86/out -Lgcc/lib64 -o 2DConvolution.exe -lm5

I do remember that while compiling HCC, the *bin/cmake-tests* build was
failing because it used the generated *clang++*, which was unable to find
*libstdc++.so*.
The generated clang++ may be ignoring LIBRARY_PATH at compile time.
So I modified the generated CMake file to add "-Lgcc/lib64" so
that *make* and *make install* complete. The downside is that I have to
explicitly pass "-Lgcc/lib64" when compiling benchmarks with hipcc.
Also, *square* completes, so I think LD_LIBRARY_PATH works at run time.
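
To avoid typing the extra flag for every benchmark, the hipcc command line
could be assembled by a small shell helper. This is only a minimal sketch:
the function name hipcc_cmd and the variable GCC_LIB_DIR are made up for
illustration, the paths are the relative ones from the command above, and the
helper only prints the command (a dry run) rather than invoking hipcc.

```shell
# Hypothetical helper (not part of ROCm or gem5): prints the hipcc command
# line used above, with the extra -L flag for the local GCC libraries.
# GCC_LIB_DIR is an assumed variable name; adjust it to the local install.
GCC_LIB_DIR="gcc/lib64"

hipcc_cmd() {
    # $1 = benchmark source file; the output name swaps .cpp for .exe
    src="$1"
    out="${src%.cpp}.exe"
    echo hipcc --amdgpu-target=gfx801 -O2 "$src" \
        -Igem5/include -Lgem5/util/m5/build/x86/out "-L${GCC_LIB_DIR}" \
        -o "$out" -lm5
}

# Dry run: show the command that would be executed for one benchmark.
hipcc_cmd 2DConvolution.cpp
```

Once the printed line looks right, it can be pasted into the shell or the
echo can be dropped so that the function calls hipcc directly.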

I did see the commits you recently merged, but I wasn't sure whether I can
retroactively apply them to v21.0.0, which also has my own modifications.
Should I go ahead and make the VIPER_TCC changes?

Also, I will definitely try to submit the benchmarks if they work out.

Regards,
Sampad

On Sat, Oct 9, 2021 at 12:34 PM Matt Sinclair via gem5-users <
gem5-users@gem5.org> wrote:

> Hi Sampad,
>
> I have not seen anyone attempt to run workloads the way you are
> attempting, so I can't offer every solution, but here are a few things I
> noticed:
>
> - Why are you still using ROCm 1.6.x?  And why did you build it from
> source?  I strongly recommend using the built-in docker support (which
> supports ROCm 4.0 now).  The error #4 you are having is almost definitely
> because something you built from source is not built correctly. But the
> possible causes of this error are disparate, so I can't suggest anything
> specific about how to fix it.  Basically, that error means something went
> wrong when running the application, which almost always (in my experience)
> is due to not installing ROCm correctly.  If you need to continue on with
> ROCm 1.6.x, I would recommend looking at the old commits before ROCm 4.0
> support was added, and use the docker support there.
>
> - Error #3 likely comes from how you are compiling the program with
> hipcc/hcc.  Depending on which commit you are using, you must use only
> gfx801, gfx803, gfx900, or gfx902.  Since you seem to be using a slightly
> older setup, the issue is probably that you are compiling for something
> other than gfx801 (also, if you are compiling for gfx803 or gfx900, did you
> use the -dgpu flag on the command line?).  It is likely error #1 is related
> to this too.
>
> - Error #2 will require getting a Ruby trace and looking at what's
> happening with those addresses (ProtocolTrace debug flag is the most
> important flag to use).  You may find the following useful:
> https://www.gem5.org/documentation/learning_gem5/part3/MSIdebugging/.
> Having said that, note that I recently merged two fixes to the VIPER TCC
> that may be relevant/useful:
> https://gem5-review.googlesource.com/c/public/gem5/+/51368,
> https://gem5-review.googlesource.com/c/public/gem5/+/51367
>
> Finally, Polybench is not officially supported.  If you get the benchmarks
> working, it would be great if you submitted them to gem5-resources
> (resources.gem5.org/) to allow others to also use them!
>
> Thanks,
> Matt
>
> On Sat, Oct 9, 2021 at 9:47 AM Sampad Mohapatra via gem5-users <
> gem5-users@gem5.org> wrote:
>
>> Hi All,
>>
>> I am running gem5 v21.0.0.0 with ROCm v1.6.x (built from source). The
>> simulations use one host CPU (the other core of its pair runs a tiny
>> binary and finishes quickly) to launch a GPU benchmark (hipified Polybench
>> GPU) and one CPU of a separate core pair (its second core runs a
>> lightweight binary and finishes quickly) to launch a SPEC-17 CPU benchmark
>> on a 3x3 mesh network. I am facing four different kinds of errors and am
>> requesting some help with them. The GPU benchmarks perform "malloc"s of
>> sizes ranging from 2 GB to 10 GB. The errors appear with various
>> combinations of CPU and GPU benchmarks.
>>
>> (1) The below error appears and disappears on different simulation runs
>> """""
>> fdtd2d: ../ROCR-Runtime/src/core/runtime/amd_gpu_agent.cpp:577: virtual
>> void amd::GpuAgent::InitDma(): Assertion `queues_[QueueBlitOnly] != __null
>> && "Queue creation failed"' failed.
>> """""
>>
>> (2) Similar errors with varying values
>> """""
>> panic: Possible Deadlock detected. Aborting!
>> version: 4 request.paddr: 0x190b80c uncoalescedTable: 4 current time:
>> 12393604096000 issue_time: 12393350811000 difference: 253285000
>> Request Tables:
>>
>> Listing pending packets from 4 instructions
>>         Addr: [0x2379b, line 0x23780] with 0 pending packets
>>         Addr: [0x237ae, line 0x23780] with 64 pending packets
>>         Addr: [0x237b0, line 0x23780] with 56 pending packets
>>         Addr: [0x237b5, line 0x23780] with 61 pending packets
>> Memory Usage: 57420616 KBytes
>> """""
>>
>> (3) The below error appears and disappears on different simulation runs:
>> """""
>> There is no device can be used to do the computation
>> """""
>>
>> (4) The below error appears and disappears on different simulation runs:
>> """""
>> fatal: syscall mincore (#27) unimplemented.
>> """""
>>
>> Thanks and Regards,
>> Sampad Mohapatra
>> _______________________________________________
>> gem5-users mailing list -- gem5-users@gem5.org
>> To unsubscribe send an email to gem5-users-le...@gem5.org