However, it's not only related to mclk and sclk. I tried this:

rocm-smi --setsclk 2
rocm-smi --setmclk 3
rocm-smi

======================== ROCm System Management Interface ========================
================================================================================================
GPU  Temp   AvgPwr  SCLK     MCLK     PCLK  Fan  Perf    PwrCap  SCLK OD  MCLK OD  GPU%
GPU[0] : WARNING: Empty SysFS value: pclk
GPU[0] : WARNING: Unable to read /sys/class/drm/card0/device/gpu_busy_percent
0    34.0c  N/A     1240Mhz  1333Mhz  N/A   0%   manual  N/A     0%       0%       N/A
================================================================================================
======================== End of ROCm SMI Log ========================
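Aside from rocm-smi, the selected DPM levels can also be watched straight from
sysfs while the test runs. Below is a minimal polling sketch (not a program
from this thread); it assumes the APU shows up as card0, as in the warnings
above, and that the usual amdgpu pp_dpm_sclk / pp_dpm_mclk files are exposed:

// Sketch: print the amdgpu DPM clock tables once per second so SCLK/MCLK
// behaviour can be watched while cl_slow_test runs.  Assumes card0.
#include <chrono>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <thread>

static std::string read_sysfs(const std::string &path)
{
    std::ifstream f(path);
    std::stringstream ss;
    ss << f.rdbuf();               // the DPM tables are only a few lines long
    return ss.str();
}

int main()
{
    const std::string dev = "/sys/class/drm/card0/device/";
    for (;;) {
        std::cout << "pp_dpm_sclk:\n" << read_sysfs(dev + "pp_dpm_sclk")
                  << "pp_dpm_mclk:\n" << read_sysfs(dev + "pp_dpm_mclk") << "\n";
        std::this_thread::sleep_for(std::chrono::seconds(1));
    }
}

The level marked with '*' in those files is the one currently in use, so it is
easy to see whether the forced levels stick while the OpenCL program is running.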
./cl_slow_test 1
got 1 platforms 1 devices
speed 3919.777100 avg 3919.777100 mbytes/s
speed 3809.373291 avg 3864.575195 mbytes/s
speed 585.796814 avg 2771.649170 mbytes/s
speed 188.721848 avg 2125.917236 mbytes/s
speed 188.916367 avg 1738.517090 mbytes/s

So despite forcing max sclk and mclk, the memory speed is still slow.

--
Lauri

On Tue, Mar 12, 2019 at 11:21 PM Lauri Ehrenpreis <lauri...@gmail.com> wrote:
> In the case when memory is slow, rocm-smi outputs this:
>
> ======================== ROCm System Management Interface ========================
> ================================================================================================
> GPU  Temp   AvgPwr  SCLK    MCLK    PCLK  Fan  Perf  PwrCap  SCLK OD  MCLK OD  GPU%
> GPU[0] : WARNING: Empty SysFS value: pclk
> GPU[0] : WARNING: Unable to read /sys/class/drm/card0/device/gpu_busy_percent
> 0    30.0c  N/A     400Mhz  933Mhz  N/A   0%   auto  N/A     0%       0%       N/A
> ================================================================================================
> ======================== End of ROCm SMI Log ========================
>
> The normal memory speed case gives the following:
>
> ======================== ROCm System Management Interface ========================
> ================================================================================================
> GPU  Temp   AvgPwr  SCLK    MCLK    PCLK  Fan  Perf  PwrCap  SCLK OD  MCLK OD  GPU%
> GPU[0] : WARNING: Empty SysFS value: pclk
> GPU[0] : WARNING: Unable to read /sys/class/drm/card0/device/gpu_busy_percent
> 0    35.0c  N/A     400Mhz  1200Mhz N/A   0%   auto  N/A     0%       0%       N/A
> ================================================================================================
> ======================== End of ROCm SMI Log ========================
>
> So there is a difference in MCLK - can this cause such a huge slowdown?
>
> --
> Lauri
>
> On Tue, Mar 12, 2019 at 6:39 PM Kuehling, Felix <felix.kuehl...@amd.com> wrote:
>
>> [adding the list back]
>>
>> I'd suspect a problem related to memory clock. This is an APU where
>> system memory is shared with the CPU, so if the SMU changes memory
>> clocks, that would affect CPU memory access performance. If the problem
>> only occurs when OpenCL is running, then the compute power profile could
>> have an effect here.
>>
>> Lauri, can you monitor the clocks during your tests using rocm-smi?
>>
>> Regards,
>>   Felix
>>
>> On 2019-03-11 1:15 p.m., Tom St Denis wrote:
>> > Hi Lauri,
>> >
>> > I don't have ROCm installed locally (not on that team at AMD), but I
>> > can rope in some of the KFD folks and see what they say :-).
>> >
>> > (In the meantime I should look into installing the ROCm stack on my
>> > Ubuntu disk for experimentation...)
>> >
>> > The only other thing that comes to mind is some sort of stutter due to
>> > power/clock gating (or gfx off, etc.). But that typically affects the
>> > display/GPU side, not the CPU side.
>> >
>> > Felix: Any known issues with Raven and ROCm interacting over memory
>> > bus performance?
>> >
>> > Tom
>> >
>> > On Mon, Mar 11, 2019 at 12:56 PM Lauri Ehrenpreis <lauri...@gmail.com> wrote:
>> >
>> >     Hi!
>> >
>> >     The 100x memory slowdown is hard to believe indeed. I attached the
>> >     test program to my first e-mail; it depends only on the
>> >     rocm-opencl-dev package. Would you mind compiling it and checking
>> >     if it slows down memory for you as well?
>> >
>> >     Steps:
>> >     1) g++ cl_slow_test.cpp -o cl_slow_test -I /opt/rocm/opencl/include/ -L /opt/rocm/opencl/lib/x86_64/ -lOpenCL
>> >     2) log out from the desktop environment and disconnect HDMI/DisplayPort etc.
>> >     3) log in over ssh
>> >     4) run the program: ./cl_slow_test 1
>> >
>> >     For me it reproduced even without step 2, but less reliably:
>> >     moving the mouse, for example, could make the memory speed fast again.
>> >
>> >     --
>> >     Lauri
>> >
>> >     On Mon, Mar 11, 2019 at 6:33 PM Tom St Denis <tstdeni...@gmail.com> wrote:
>> >
>> >         Hi Lauri,
>> >
>> >         There's really no connection between the two other than that they
>> >         run in the same package. I too run a 2400G (as my workstation)
>> >         and I got the same ~6.6 GB/sec transfer rate, but without a CL
>> >         app running... The only logical explanation is that your CL app
>> >         is bottlenecking the APU's memory bus, but you claim "simply
>> >         opening a context is enough", so something else is going on.
>> >
>> >         Your last reply, though, says "with it running in the
>> >         background", so it's entirely possible the CPU isn't busy but
>> >         the package memory controller (shared between the CPU and GPU)
>> >         is. For instance, running xonotic in a 1080p window on my 4K
>> >         display reduced the memory test to 5.8 GB/sec, and that's hardly
>> >         a heavy, memory-bound GPU app.
>> >
>> >         The only other possible connection is that the GPU is generating
>> >         so much heat that it's throttling the package, which is also
>> >         unlikely if you have a proper HSF attached (I use the ones that
>> >         came in the retail boxes).
>> >
>> >         Cheers,
>> >         Tom
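The cl_slow_test.cpp source itself is only attached to the original mail and
is not reproduced in this thread. For context, below is a rough sketch of a
comparable test: it optionally opens an OpenCL context (assuming that is what
"./cl_slow_test 1" selects) and then times plain host-side memcpy, which is
roughly what the output above appears to measure. Buffer size and iteration
count are guesses.

// Sketch of a host-memory bandwidth test with an otherwise idle OpenCL
// context open.  Not the original cl_slow_test.cpp; the meaning of argv[1],
// the buffer size and the iteration count are assumptions.
#define CL_TARGET_OPENCL_VERSION 120
#include <CL/cl.h>
#include <chrono>
#include <cstdio>
#include <cstring>
#include <vector>

int main(int argc, char **argv)
{
    bool open_context = (argc > 1 && argv[1][0] == '1');
    cl_context ctx = nullptr;

    if (open_context) {
        cl_platform_id platform;
        cl_uint nplat = 0, ndev = 0;
        if (clGetPlatformIDs(1, &platform, &nplat) != CL_SUCCESS || nplat == 0) {
            fprintf(stderr, "no OpenCL platform found\n");
            return 1;
        }
        printf("got %u platforms\n", nplat);

        cl_device_id device;
        if (clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, &ndev) != CL_SUCCESS) {
            fprintf(stderr, "no GPU device found\n");
            return 1;
        }
        printf("%u devices\n", ndev);

        cl_int err = CL_SUCCESS;
        ctx = clCreateContext(nullptr, 1, &device, nullptr, nullptr, &err);
        if (err != CL_SUCCESS) {
            fprintf(stderr, "clCreateContext failed: %d\n", err);
            return 1;
        }
    }

    // Plain CPU-side copy: if merely having the context open drops system
    // memory bandwidth, the MB/s figure below falls even though the GPU idles.
    const size_t bytes = 256u << 20;                 // 256 MiB per copy
    std::vector<char> src(bytes, 1), dst(bytes);
    double sum = 0.0;
    for (int i = 0; i < 5; ++i) {
        auto t0 = std::chrono::steady_clock::now();
        memcpy(dst.data(), src.data(), bytes);
        auto t1 = std::chrono::steady_clock::now();
        double secs = std::chrono::duration<double>(t1 - t0).count();
        double mbps = (bytes / (1024.0 * 1024.0)) / secs;
        sum += mbps;
        printf("speed %f avg %f mbytes/s\n", mbps, sum / (i + 1));
    }

    if (ctx)
        clReleaseContext(ctx);
    return 0;
}

Built the same way as in step 1 above, it prints speed/avg lines in the same
format, so slow and fast runs can be compared directly.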