To reproduce the problem, only the small cl_slow_test.cpp attached to my first e-mail is needed.
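
A minimal sketch of how the reproduction steps quoted further down in this thread could be scripted. The g++ line and "./cl_slow_test 1" are taken from those steps; the slow_lines helper and the 1000 mbytes/s threshold are assumptions made for illustration and are not part of the attached program:

```shell
#!/bin/sh
# Hypothetical helper (not part of cl_slow_test): print every
# "speed <x> avg <y> mbytes/s" line whose speed is below the limit
# given as the first argument (in mbytes/s).
slow_lines() {
    awk -v limit="$1" '$1 == "speed" && ($2 + 0) < (limit + 0) { print }'
}

# Build and run only if the attached source is in the current directory.
# The compiler invocation is the one quoted in the reproduction steps.
if [ -f cl_slow_test.cpp ]; then
    g++ cl_slow_test.cpp -o cl_slow_test \
        -I /opt/rocm/opencl/include/ -L /opt/rocm/opencl/lib/x86_64/ -lOpenCL &&
        ./cl_slow_test 1 | slow_lines 1000   # 1000 is an assumed threshold
fi
```

Fed the cl_slow_test output quoted later in the thread, this would print the three lines at roughly 585, 188 and 188 mbytes/s and stay silent for the two fast ones.
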
System information:

CPU: Ryzen5 2400G
Main board: Gigabyte AMD B450 AORUS mini itx: https://www.gigabyte.com/Motherboard/B450-I-AORUS-PRO-WIFI-rev-10#kf
BIOS: F5 8.47 MB 2019/01/25 (latest)
Kernel: https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.0/ (amd64)
OS: Ubuntu 18.04 LTS

rocm-opencl-dev installation:

wget -qO - http://repo.radeon.com/rocm/apt/debian/rocm.gpg.key | sudo apt-key add -
echo 'deb [arch=amd64] http://repo.radeon.com/rocm/apt/debian/ xenial main' | sudo tee /etc/apt/sources.list.d/rocm.list
sudo apt install rocm-opencl-dev

Exactly the same issue also happens with this board:
https://www.gigabyte.com/Motherboard/GA-AB350-Gaming-3-rev-1x#kf

I have MSI and ASRock mini-ITX boards ready as well. So far I haven't got amdgpu and OpenCL working on them, but I'll try again tomorrow.

--
Lauri

On Wed, Mar 13, 2019 at 8:51 PM Kuehling, Felix <felix.kuehl...@amd.com> wrote:

> Hi Lauri,
>
> I still think the SMU is doing something funny, but rocm-smi isn't
> showing enough information to really see what's going on.
>
> On APUs the SMU firmware is embedded in the system BIOS. Unlike discrete
> GPUs, the SMU firmware is not loaded by the driver. You could try
> updating your system BIOS to the latest version available from your main
> board vendor and see if that makes a difference. It may include a newer
> version of the SMU firmware, potentially with a fix.
>
> If that doesn't help, we'd have to reproduce the problem in house to see
> what's happening, which may require the same main board and BIOS version
> you're using. We can ask our SMU firmware team if they've ever
> encountered your type of problem. But I don't want to give you too much
> hope. It's a tricky problem involving HW, firmware and multiple driver
> components in a fairly unusual configuration.
>
> Regards,
> Felix
>
> On 2019-03-13 7:28 a.m., Lauri Ehrenpreis wrote:
> > What I observe is that moving the mouse made the memory speed go up
> > and also it made mclk=1200Mhz in rocm-smi output.
> > However if I force mclk to 1200Mhz myself then memory speed is still
> > slow.
> >
> > So rocm-smi output when memory speed went fast due to mouse movement:
> > rocm-smi
> > ======================== ROCm System Management Interface ========================
> > ================================================================================================
> > GPU  Temp   AvgPwr  SCLK     MCLK     PCLK  Fan  Perf    PwrCap  SCLK OD  MCLK OD  GPU%
> > GPU[0] : WARNING: Empty SysFS value: pclk
> > GPU[0] : WARNING: Unable to read /sys/class/drm/card0/device/gpu_busy_percent
> > 0    44.0c  N/A     400Mhz   1200Mhz  N/A   0%   manual  N/A     0%       0%       N/A
> > ================================================================================================
> > ======================== End of ROCm SMI Log ========================
> >
> > And rocm-smi output when I forced memclk=1200MHz myself:
> > rocm-smi --setmclk 2
> > rocm-smi
> > ======================== ROCm System Management Interface ========================
> > ================================================================================================
> > GPU  Temp   AvgPwr  SCLK     MCLK     PCLK  Fan  Perf    PwrCap  SCLK OD  MCLK OD  GPU%
> > GPU[0] : WARNING: Empty SysFS value: pclk
> > GPU[0] : WARNING: Unable to read /sys/class/drm/card0/device/gpu_busy_percent
> > 0    39.0c  N/A     400Mhz   1200Mhz  N/A   0%   manual  N/A     0%       0%       N/A
> > ================================================================================================
> > ======================== End of ROCm SMI Log ========================
> >
> > So only difference is that temperature shows 44c when memory speed was
> > fast and 39c when it was slow. But mclk was 1200MHz and sclk was
> > 400MHz in both cases.
> > Can it be that rocm-smi just has a bug in reporting and mclk was not
> > actually 1200MHz when I forced it with rocm-smi --setmclk 2?
> > That would explain the different behaviour.
> >
> > If so, is there a programmatic way to really guarantee the
> > high-speed mclk? Basically I want to do something similar in my program
> > to what happens when I move the mouse in a desktop env, and this way
> > guarantee the normal memory speed each time the program starts.
> >
> > --
> > Lauri
> >
> >
> > On Tue, Mar 12, 2019 at 11:36 PM Deucher, Alexander
> > <alexander.deuc...@amd.com> wrote:
> >
> > Forcing the sclk and mclk high may impact the CPU frequency since
> > they share TDP.
> >
> > Alex
> > ------------------------------------------------------------------------
> > *From:* amd-gfx <amd-gfx-boun...@lists.freedesktop.org> on behalf of
> > Lauri Ehrenpreis <lauri...@gmail.com>
> > *Sent:* Tuesday, March 12, 2019 5:31 PM
> > *To:* Kuehling, Felix
> > *Cc:* Tom St Denis; amd-gfx@lists.freedesktop.org
> > *Subject:* Re: Slow memory access when using OpenCL without X11
> > However it's not only related to mclk and sclk.
> > I tried this:
> > rocm-smi --setsclk 2
> > rocm-smi --setmclk 3
> > rocm-smi
> > ======================== ROCm System Management Interface ========================
> > ================================================================================================
> > GPU  Temp   AvgPwr  SCLK     MCLK     PCLK  Fan  Perf    PwrCap  SCLK OD  MCLK OD  GPU%
> > GPU[0] : WARNING: Empty SysFS value: pclk
> > GPU[0] : WARNING: Unable to read /sys/class/drm/card0/device/gpu_busy_percent
> > 0    34.0c  N/A     1240Mhz  1333Mhz  N/A   0%   manual  N/A     0%       0%       N/A
> > ================================================================================================
> > ======================== End of ROCm SMI Log ========================
> >
> > ./cl_slow_test 1
> > got 1 platforms 1 devices
> > speed 3919.777100 avg 3919.777100 mbytes/s
> > speed 3809.373291 avg 3864.575195 mbytes/s
> > speed 585.796814 avg 2771.649170 mbytes/s
> > speed 188.721848 avg 2125.917236 mbytes/s
> > speed 188.916367 avg 1738.517090 mbytes/s
> >
> > So despite forcing max sclk and mclk the memory speed is still slow..
> >
> > --
> > Lauri
> >
> >
> > On Tue, Mar 12, 2019 at 11:21 PM Lauri Ehrenpreis
> > <lauri...@gmail.com> wrote:
> >
> > In the case when memory is slow, rocm-smi outputs this:
> > ======================== ROCm System Management Interface ========================
> > ================================================================================================
> > GPU  Temp   AvgPwr  SCLK     MCLK     PCLK  Fan  Perf    PwrCap  SCLK OD  MCLK OD  GPU%
> > GPU[0] : WARNING: Empty SysFS value: pclk
> > GPU[0] : WARNING: Unable to read /sys/class/drm/card0/device/gpu_busy_percent
> > 0    30.0c  N/A     400Mhz   933Mhz   N/A   0%   auto    N/A     0%       0%       N/A
> > ================================================================================================
> > ======================== End of ROCm SMI Log ========================
> >
> > The normal memory speed case gives the following:
> > ======================== ROCm System Management Interface ========================
> > ================================================================================================
> > GPU  Temp   AvgPwr  SCLK     MCLK     PCLK  Fan  Perf    PwrCap  SCLK OD  MCLK OD  GPU%
> > GPU[0] : WARNING: Empty SysFS value: pclk
> > GPU[0] : WARNING: Unable to read /sys/class/drm/card0/device/gpu_busy_percent
> > 0    35.0c  N/A     400Mhz   1200Mhz  N/A   0%   auto    N/A     0%       0%       N/A
> > ================================================================================================
> > ======================== End of ROCm SMI Log ========================
> >
> > So there is a difference in MCLK - can this cause such a huge
> > slowdown?
> >
> > --
> > Lauri
> >
> > On Tue, Mar 12, 2019 at 6:39 PM Kuehling, Felix
> > <felix.kuehl...@amd.com> wrote:
> >
> > [adding the list back]
> >
> > I'd suspect a problem related to memory clock.
> > This is an APU where system memory is shared with the CPU, so if the
> > SMU changes memory clocks that would affect CPU memory access
> > performance. If the problem only occurs when OpenCL is running, then
> > the compute power profile could have an effect here.
> >
> > Lauri, can you monitor the clocks during your tests using rocm-smi?
> >
> > Regards,
> > Felix
> >
> > On 2019-03-11 1:15 p.m., Tom St Denis wrote:
> > > Hi Lauri,
> > >
> > > I don't have ROCm installed locally (not on that team at AMD) but I
> > > can rope in some of the KFD folk and see what they say :-).
> > >
> > > (in the mean time I should look into installing the ROCm stack on my
> > > Ubuntu disk for experimentation...).
> > >
> > > Only other thing that comes to mind is some sort of stutter due to
> > > power/clock gating (or gfx off/etc). But that typically affects the
> > > display/gpu side not the CPU side.
> > >
> > > Felix: Any known issues with Raven and ROCm interacting over memory
> > > bus performance?
> > >
> > > Tom
> > >
> > > On Mon, Mar 11, 2019 at 12:56 PM Lauri Ehrenpreis
> > > <lauri...@gmail.com> wrote:
> > >
> > > Hi!
> > >
> > > The 100x memory slowdown is hard to believe indeed. I attached the
> > > test program with my first e-mail which depends only on
> > > rocm-opencl-dev package. Would you mind compiling it and checking
> > > if it slows down memory for you as well?
> > >
> > > steps:
> > > 1) g++ cl_slow_test.cpp -o cl_slow_test -I /opt/rocm/opencl/include/ -L /opt/rocm/opencl/lib/x86_64/ -lOpenCL
> > > 2) logout from desktop env and disconnect hdmi/displayport etc
> > > 3) log in over ssh
> > > 4) run the program ./cl_slow_test 1
> > >
> > > For me it reproduced even without step 2 as well but less
> > > reliably. Moving the mouse, for example, could make the memory speed
> > > fast again.
> > >
> > > --
> > > Lauri
> > >
> > >
> > > On Mon, Mar 11, 2019 at 6:33 PM Tom St Denis
> > > <tstdeni...@gmail.com> wrote:
> > >
> > > Hi Lauri,
> > >
> > > There's really no connection between the two other than they
> > > run in the same package. I too run a 2400G (as my
> > > workstation) and I got the same ~6.6GB/sec transfer rate but
> > > without a CL app running ... The only logical reason is your
> > > CL app is bottlenecking the APU's memory bus but you claim
> > > "simply opening a context is enough" so something else is
> > > going on.
> > >
> > > Your last reply though says "with it running in the
> > > background" so it's entirely possible the CPU isn't busy but
> > > the package memory controller (shared between both the CPU and
> > > GPU) is busy. For instance running xonotic in a 1080p window
> > > on my 4K display reduced the memory test to 5.8GB/sec and
> > > that's hardly a heavy memory bound GPU app.
> > >
> > > The only other possible connection is the GPU is generating so
> > > much heat that it's throttling the package which is also
> > > unlikely if you have a proper HSF attached (I use the ones
> > > that came in the retail boxes).
> > >
> > > Cheers,
> > > Tom
> > >
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx