On Thu Apr 24, 2025 at 4:44 PM BST, Alex Deucher wrote:
> On Tue, Apr 22, 2025 at 11:59 AM Alexey Klimov <alexey.kli...@linaro.org> wrote:
>>
>> On Tue Apr 22, 2025 at 2:00 PM BST, Alex Deucher wrote:
>> > On Mon, Apr 21, 2025 at 10:21 PM Alexey Klimov <alexey.kli...@linaro.org> wrote:
>> >>
>> >> On Thu Apr 17, 2025 at 2:08 PM BST, Alex Deucher wrote:
>> >> > On Wed, Apr 16, 2025 at 8:43 PM Fugang Duan <fugang.d...@cixtech.com> wrote:
>> >> >>
>> >> >> From: Alex Deucher <alexdeuc...@gmail.com> Sent: April 16, 2025 22:49
>> >> >> >To: Alexey Klimov <alexey.kli...@linaro.org>
>> >> >> >On Wed, Apr 16, 2025 at 9:48 AM Alexey Klimov <alexey.kli...@linaro.org> wrote:
>> >> >> >>
>> >> >> >> On Wed Apr 16, 2025 at 4:12 AM BST, Fugang Duan wrote:
>> >> >> >> > From: Alexey Klimov <alexey.kli...@linaro.org> Sent: April 16, 2025 2:28
>> >> >> >> >>#regzbot introduced: v6.12..v6.13
>> >> >> >> >>The only change related to hdp_v5_0_flush_hdp() was
>> >> >> >> >>cf424020e040 drm/amdgpu/hdp5.0: do a posting read when flushing HDP
>> >> >> >> >>
>> >> >> >> >>Reverting that commit ^^ did help and resolved that problem. Before
>> >>
>> >> [..]
>> >>
>> >> > OK. that patch won't change anything then. Can you try this patch
>> >> > instead?
>> >>
>> >> Config I am using is basically defconfig wrt memory parameters, yeah, I
>> >> use 4k.
>> >>
>> >> So I tested that patch, thank you, and some other different
>> >> configurations --
>> >> nothing helped. Exactly the same behaviour with the same backtrace.
>> >
>> > Did you test the first (4k check) or the second (don't remap on ARM) patch?
>>
>> The second one. I think you mentioned that the first one won't help for 4k pages.
>>
>> >> So it seems that it is a firmware problem after all?
>> >
>> > There is no GPU firmware involved in this operation. It's just a
>> > posted write. E.g., we write to a register to flush the HDP write
>> > queue and then read the register back to make sure the write posted.
>> > If the second patch didn't help, then perhaps there is some issue with
>> > MMIO access on your platform?
>>
>> I didn't mean GPU firmware at all. I only had UEFI/EL3 firmware in mind.
>>
>> Completely out of the blue, based on nothing, do you think that
>> adding a delay or some memory barrier between the write and the read might help?
>> If the host data path code is also executed during common desktop
>> usage by an ordinary user, I wonder why it doesn't break later. But yeah, I also
>> think this is a problem with this motherboard. Thank you.
>
> I think I found the problem. The previous patch wasn't doing what I
> expected. Please try this patch instead.

This one works!

[ 4.483750] [drm] amdgpu kernel modesetting enabled.
[ 4.491985] amdgpu: IO link not available for non x86 platforms
[ 4.497189] amdgpu: Virtual CRAT table created for CPU
[ 4.497559] amdgpu: Topology: Add CPU node
[ 4.509623] amdgpu 0000:c3:00.0: amdgpu: detected ip block number 0 <nv_common>
[ 4.512905] amdgpu 0000:c3:00.0: amdgpu: detected ip block number 1 <gmc_v10_0>
[ 4.513254] amdgpu 0000:c3:00.0: amdgpu: detected ip block number 2 <navi10_ih>
[ 4.513595] amdgpu 0000:c3:00.0: amdgpu: detected ip block number 3 <psp>
[ 4.513932] amdgpu 0000:c3:00.0: amdgpu: detected ip block number 4 <smu>
[ 4.514278] amdgpu 0000:c3:00.0: amdgpu: detected ip block number 5 <dm>
[ 4.514625] amdgpu 0000:c3:00.0: amdgpu: detected ip block number 6 <gfx_v10_0>
[ 4.514980] amdgpu 0000:c3:00.0: amdgpu: detected ip block number 7 <sdma_v5_2>
[ 4.515334] amdgpu 0000:c3:00.0: amdgpu: detected ip block number 8 <vcn_v3_0>
[ 4.515699] amdgpu 0000:c3:00.0: amdgpu: detected ip block number 9 <jpeg_v3_0>
[ 4.516087] amdgpu 0000:c3:00.0: amdgpu: Fetched VBIOS from VFCT
[ 4.516466] amdgpu: ATOM BIOS: 113-V502MECH-0OC
[ 4.749748] amdgpu 0000:c3:00.0: amdgpu: Trusted Memory Zone (TMZ) feature disabled as experimental (default)
[ 4.777435] amdgpu 0000:c3:00.0: BAR 2 [mem 0x1810000000-0x18101fffff 64bit pref]: releasing
[ 4.793256] amdgpu 0000:c3:00.0: BAR 0 [mem 0x1800000000-0x180fffffff 64bit pref]: releasing
[ 4.844639] amdgpu 0000:c3:00.0: BAR 0 [mem 0x1800000000-0x19ffffffff 64bit pref]: assigned
[ 4.849774] amdgpu 0000:c3:00.0: BAR 2 [mem 0x1a00000000-0x1a001fffff 64bit pref]: assigned
[ 4.957411] amdgpu 0000:c3:00.0: amdgpu: VRAM: 8176M 0x0000008000000000 - 0x00000081FEFFFFFF (8176M used)
[ 4.967618] amdgpu 0000:c3:00.0: amdgpu: GART: 512M 0x0000000000000000 - 0x000000001FFFFFFF
[ 4.992963] [drm] amdgpu: 8176M of VRAM memory ready
[ 5.004032] [drm] amdgpu: 7888M of GTT memory ready.
[ 6.224159] amdgpu 0000:c3:00.0: amdgpu: STB initialized to 2048 entries
[ 6.284328] amdgpu 0000:c3:00.0: amdgpu: Found VCN firmware Version ENC: 1.33 DEC: 4 VEP: 0 Revision: 3
[ 6.361142] amdgpu 0000:c3:00.0: amdgpu: reserve 0xa00000 from 0x81fd000000 for PSP TMR
[ 6.471231] amdgpu 0000:c3:00.0: amdgpu: RAS: optional ras ta ucode is not available
[ 6.492967] amdgpu 0000:c3:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[ 6.492993] amdgpu 0000:c3:00.0: amdgpu: smu driver if version = 0x0000000f, smu fw if version = 0x00000013, smu fw program = 0, version = 0x003b3100 (59.49.0)
[ 6.513659] amdgpu 0000:c3:00.0: amdgpu: SMU driver if version not matched
[ 6.513699] amdgpu 0000:c3:00.0: amdgpu: use vbios provided pptable
[ 6.588418] amdgpu 0000:c3:00.0: amdgpu: SMU is initialized successfully!
[ 6.800975] kfd kfd: amdgpu: Allocated 3969056 bytes on gart
[ 6.806709] kfd kfd: amdgpu: Total number of KFD nodes to be created: 1
[ 6.813516] amdgpu: Virtual CRAT table created for GPU
[ 6.819229] amdgpu: Topology: Add dGPU node [0x73ff:0x1002]
[ 6.824865] kfd kfd: amdgpu: added device 1002:73ff
[ 6.829821] amdgpu 0000:c3:00.0: amdgpu: SE 2, SH per SE 2, CU per SH 8, active_cu_number 28
[ 6.838355] amdgpu 0000:c3:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[ 6.846007] amdgpu 0000:c3:00.0: amdgpu: ring gfx_0.1.0 uses VM inv eng 1 on hub 0
[ 6.853658] amdgpu 0000:c3:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 4 on hub 0
[ 6.861398] amdgpu 0000:c3:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 5 on hub 0
[ 6.869137] amdgpu 0000:c3:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
[ 6.876877] amdgpu 0000:c3:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
[ 6.884615] amdgpu 0000:c3:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
[ 6.892356] amdgpu 0000:c3:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
[ 6.900094] amdgpu 0000:c3:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
[ 6.907921] amdgpu 0000:c3:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
[ 6.915748] amdgpu 0000:c3:00.0: amdgpu: ring kiq_0.2.1.0 uses VM inv eng 12 on hub 0
[ 6.923663] amdgpu 0000:c3:00.0: amdgpu: ring sdma0 uses VM inv eng 13 on hub 0
[ 6.931050] amdgpu 0000:c3:00.0: amdgpu: ring sdma1 uses VM inv eng 14 on hub 0
[ 6.938439] amdgpu 0000:c3:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 8
[ 6.946089] amdgpu 0000:c3:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 8
[ 6.953916] amdgpu 0000:c3:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 8
[ 6.961742] amdgpu 0000:c3:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 8
[ 6.970485] amdgpu 0000:c3:00.0: amdgpu: Using BACO for runtime pm
[ 6.977167] [drm] Initialized amdgpu 3.63.0 for 0000:c3:00.0 on minor 0
[ 7.234638] amdgpu 0000:c3:00.0: [drm] fb0: amdgpudrmfb frame buffer device

root@orion:~ # uname -a
Linux orion 6.15.0-rc3test6+ #1 SMP Sun Apr 27 01:12:10 BST 2025 aarch64 GNU/Linux

Thank you for taking a look into this.

Best regards,
Alexey
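
For anyone reading the archive later: the "write a register, then read it
back so the write posts" pattern described in the quoted thread can be
sketched roughly as below. This is only an illustration in plain userspace C,
not the amdgpu code or the patch under test. The struct fake_mmio type and the
reg_write(), reg_read() and flush_with_posting_read() helpers are hypothetical
names, and an ordinary volatile variable stands in for the real MMIO register.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a device register block; real hardware
 * would be reached through an ioremap()ed MMIO mapping instead. */
struct fake_mmio {
	volatile uint32_t flush_reg;
};

/* Hypothetical accessors; a real driver uses its own MMIO helpers. */
static inline void reg_write(volatile uint32_t *reg, uint32_t val)
{
	*reg = val;
}

static inline uint32_t reg_read(volatile uint32_t *reg)
{
	return *reg;
}

/* Write to the flush register, then read the same register back.
 * On a real bus the read cannot complete until the earlier posted
 * write has reached the device, which is the point of the read. */
static uint32_t flush_with_posting_read(struct fake_mmio *dev)
{
	reg_write(&dev->flush_reg, 0);
	return reg_read(&dev->flush_reg);
}

/* Small self-check for the sketch above. */
static void posting_read_demo(void)
{
	struct fake_mmio dev = { .flush_reg = 0xdeadbeef };

	assert(flush_with_posting_read(&dev) == 0);
	assert(dev.flush_reg == 0);
}
```

On real hardware both accesses have to target a mapping the platform
actually routes to the device, which is the MMIO question raised in the
quoted thread.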