I think I have a better solution. Please try these patches instead. Thanks!
For the RX6600, you only need patch 0003. The rest of the series fixes up other chips.

Thanks,

Alex

On Sat, Apr 26, 2025 at 9:01 PM Alexey Klimov <alexey.kli...@linaro.org> wrote:
>
> On Thu Apr 24, 2025 at 4:44 PM BST, Alex Deucher wrote:
> > On Tue, Apr 22, 2025 at 11:59 AM Alexey Klimov <alexey.kli...@linaro.org> wrote:
> >>
> >> On Tue Apr 22, 2025 at 2:00 PM BST, Alex Deucher wrote:
> >> > On Mon, Apr 21, 2025 at 10:21 PM Alexey Klimov <alexey.kli...@linaro.org> wrote:
> >> >>
> >> >> On Thu Apr 17, 2025 at 2:08 PM BST, Alex Deucher wrote:
> >> >> > On Wed, Apr 16, 2025 at 8:43 PM Fugang Duan <fugang.d...@cixtech.com> wrote:
> >> >> >>
> >> >> >> From: Alex Deucher <alexdeuc...@gmail.com> Sent: April 16, 2025 22:49
> >> >> >> >To: Alexey Klimov <alexey.kli...@linaro.org>
> >> >> >> >On Wed, Apr 16, 2025 at 9:48 AM Alexey Klimov <alexey.kli...@linaro.org> wrote:
> >> >> >> >>
> >> >> >> >> On Wed Apr 16, 2025 at 4:12 AM BST, Fugang Duan wrote:
> >> >> >> >> > From: Alexey Klimov <alexey.kli...@linaro.org> Sent: April 16, 2025 2:28
> >> >> >> >> >>#regzbot introduced: v6.12..v6.13
> >> >> >> >> >>The only change related to hdp_v5_0_flush_hdp() was
> >> >> >> >> >>cf424020e040 drm/amdgpu/hdp5.0: do a posting read when flushing HDP
> >> >> >> >> >>
> >> >> >> >> >>Reverting that commit ^^ did help and resolved that problem.
> >> >> >> >> >>Before
> >>
> >> [..]
> >>
> >> >> > OK. that patch won't change anything then. Can you try this patch
> >> >> > instead?
> >> >>
> >> >> Config I am using is basically defconfig wrt memory parameters, yeah, i
> >> >> use 4k.
> >> >>
> >> >> So I tested that patch, thank you, and some other different
> >> >> configurations --
> >> >> nothing helped. Exactly the same behaviour with the same backtrace.
> >> >
> >> > Did you test the first (4k check) or the second (don't remap on ARM)
> >> > patch?
> >>
> >> The second one. I think you mentioned that first one won't help for 4k
> >> pages.
> >>
> >>
> >> >> So it seems that it is firmware problem after all?
> >> >
> >> > There is no GPU firmware involved in this operation. It's just a
> >> > posted write. E.g., we write to a register to flush the HDP write
> >> > queue and then read the register back to make sure the write posted.
> >> > If the second patch didn't help, then perhaps there is some issue with
> >> > MMIO access on your platform?
> >>
> >> I didn't mean GPU firmware at all. I only had uefi/EL3 firmwares in mind.
> >>
> >> Completely out of the blue, based on nothing, do you think that
> >> adding delay/some mem barrier between write and read might help?
> >> I wonder if host data path code should be executed during common desktop
> >> usage as a common user then why it doesn't break later. But yeah, I also
> >> think this is this motherboard problem. Thank you.
> >
> > I think I found the problem. The previous patch wasn't doing what I
> > expected. Please try this patch instead.
>
> This one works!
>
> [ 4.483750] [drm] amdgpu kernel modesetting enabled.
> [ 4.491985] amdgpu: IO link not available for non x86 platforms
> [ 4.497189] amdgpu: Virtual CRAT table created for CPU
> [ 4.497559] amdgpu: Topology: Add CPU node
> [ 4.509623] amdgpu 0000:c3:00.0: amdgpu: detected ip block number 0 <nv_common>
> [ 4.512905] amdgpu 0000:c3:00.0: amdgpu: detected ip block number 1 <gmc_v10_0>
> [ 4.513254] amdgpu 0000:c3:00.0: amdgpu: detected ip block number 2 <navi10_ih>
> [ 4.513595] amdgpu 0000:c3:00.0: amdgpu: detected ip block number 3 <psp>
> [ 4.513932] amdgpu 0000:c3:00.0: amdgpu: detected ip block number 4 <smu>
> [ 4.514278] amdgpu 0000:c3:00.0: amdgpu: detected ip block number 5 <dm>
> [ 4.514625] amdgpu 0000:c3:00.0: amdgpu: detected ip block number 6 <gfx_v10_0>
> [ 4.514980] amdgpu 0000:c3:00.0: amdgpu: detected ip block number 7 <sdma_v5_2>
> [ 4.515334] amdgpu 0000:c3:00.0: amdgpu: detected ip block number 8 <vcn_v3_0>
> [ 4.515699] amdgpu 0000:c3:00.0: amdgpu: detected ip block number 9 <jpeg_v3_0>
> [ 4.516087] amdgpu 0000:c3:00.0: amdgpu: Fetched VBIOS from VFCT
> [ 4.516466] amdgpu: ATOM BIOS: 113-V502MECH-0OC
> [ 4.749748] amdgpu 0000:c3:00.0: amdgpu: Trusted Memory Zone (TMZ) feature disabled as experimental (default)
> [ 4.777435] amdgpu 0000:c3:00.0: BAR 2 [mem 0x1810000000-0x18101fffff 64bit pref]: releasing
> [ 4.793256] amdgpu 0000:c3:00.0: BAR 0 [mem 0x1800000000-0x180fffffff 64bit pref]: releasing
> [ 4.844639] amdgpu 0000:c3:00.0: BAR 0 [mem 0x1800000000-0x19ffffffff 64bit pref]: assigned
> [ 4.849774] amdgpu 0000:c3:00.0: BAR 2 [mem 0x1a00000000-0x1a001fffff 64bit pref]: assigned
> [ 4.957411] amdgpu 0000:c3:00.0: amdgpu: VRAM: 8176M 0x0000008000000000 - 0x00000081FEFFFFFF (8176M used)
> [ 4.967618] amdgpu 0000:c3:00.0: amdgpu: GART: 512M 0x0000000000000000 - 0x000000001FFFFFFF
> [ 4.992963] [drm] amdgpu: 8176M of VRAM memory ready
> [ 5.004032] [drm] amdgpu: 7888M of GTT memory ready.
> [ 6.224159] amdgpu 0000:c3:00.0: amdgpu: STB initialized to 2048 entries
> [ 6.284328] amdgpu 0000:c3:00.0: amdgpu: Found VCN firmware Version ENC: 1.33 DEC: 4 VEP: 0 Revision: 3
> [ 6.361142] amdgpu 0000:c3:00.0: amdgpu: reserve 0xa00000 from 0x81fd000000 for PSP TMR
> [ 6.471231] amdgpu 0000:c3:00.0: amdgpu: RAS: optional ras ta ucode is not available
> [ 6.492967] amdgpu 0000:c3:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
> [ 6.492993] amdgpu 0000:c3:00.0: amdgpu: smu driver if version = 0x0000000f, smu fw if version = 0x00000013, smu fw program = 0, version = 0x003b3100 (59.49.0)
> [ 6.513659] amdgpu 0000:c3:00.0: amdgpu: SMU driver if version not matched
> [ 6.513699] amdgpu 0000:c3:00.0: amdgpu: use vbios provided pptable
> [ 6.588418] amdgpu 0000:c3:00.0: amdgpu: SMU is initialized successfully!
> [ 6.800975] kfd kfd: amdgpu: Allocated 3969056 bytes on gart
> [ 6.806709] kfd kfd: amdgpu: Total number of KFD nodes to be created: 1
> [ 6.813516] amdgpu: Virtual CRAT table created for GPU
> [ 6.819229] amdgpu: Topology: Add dGPU node [0x73ff:0x1002]
> [ 6.824865] kfd kfd: amdgpu: added device 1002:73ff
> [ 6.829821] amdgpu 0000:c3:00.0: amdgpu: SE 2, SH per SE 2, CU per SH 8, active_cu_number 28
> [ 6.838355] amdgpu 0000:c3:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
> [ 6.846007] amdgpu 0000:c3:00.0: amdgpu: ring gfx_0.1.0 uses VM inv eng 1 on hub 0
> [ 6.853658] amdgpu 0000:c3:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 4 on hub 0
> [ 6.861398] amdgpu 0000:c3:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 5 on hub 0
> [ 6.869137] amdgpu 0000:c3:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
> [ 6.876877] amdgpu 0000:c3:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
> [ 6.884615] amdgpu 0000:c3:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
> [ 6.892356] amdgpu 0000:c3:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
> [ 6.900094] amdgpu 0000:c3:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
> [ 6.907921] amdgpu 0000:c3:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
> [ 6.915748] amdgpu 0000:c3:00.0: amdgpu: ring kiq_0.2.1.0 uses VM inv eng 12 on hub 0
> [ 6.923663] amdgpu 0000:c3:00.0: amdgpu: ring sdma0 uses VM inv eng 13 on hub 0
> [ 6.931050] amdgpu 0000:c3:00.0: amdgpu: ring sdma1 uses VM inv eng 14 on hub 0
> [ 6.938439] amdgpu 0000:c3:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 8
> [ 6.946089] amdgpu 0000:c3:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 8
> [ 6.953916] amdgpu 0000:c3:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 8
> [ 6.961742] amdgpu 0000:c3:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 8
> [ 6.970485] amdgpu 0000:c3:00.0: amdgpu: Using BACO for runtime pm
> [ 6.977167] [drm] Initialized amdgpu 3.63.0 for 0000:c3:00.0 on minor 0
> [ 7.234638] amdgpu 0000:c3:00.0: [drm] fb0: amdgpudrmfb frame buffer device
> root@orion:~ # uname -a
> Linux orion 6.15.0-rc3test6+ #1 SMP Sun Apr 27 01:12:10 BST 2025 aarch64 GNU/Linux
>
> Thank you for taking a look into this.
>
> Best regards,
> Alexey
>

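For readers following the thread, the "posting read" discussed above can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not amdgpu code: `fake_dev`, `dev_write`, `dev_read`, `REG_FLUSH` and `REG_MEMSIZE` are hypothetical names invented for the illustration. It models the general bus rule that a read may not pass earlier posted writes to the same device, which is why reading a different register (as the attached patches do) should post the flush write just as well as reading the flush register back.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_REGS    4
#define MAX_PENDING 8

/* Hypothetical register indices, for illustration only. */
enum { REG_FLUSH = 0, REG_MEMSIZE = 1 };

struct pending_write {
	uint32_t reg;
	uint32_t val;
};

/* Toy device: writes are queued ("posted") until a read drains them. */
struct fake_dev {
	uint32_t regs[NUM_REGS];
	struct pending_write queue[MAX_PENDING];
	int npending;
};

/* A posted write returns immediately; the value may still be in flight. */
static void dev_write(struct fake_dev *d, uint32_t reg, uint32_t val)
{
	assert(reg < NUM_REGS && d->npending < MAX_PENDING);
	d->queue[d->npending].reg = reg;
	d->queue[d->npending].val = val;
	d->npending++;
}

/* A read must not pass earlier posted writes, so it drains them first. */
static uint32_t dev_read(struct fake_dev *d, uint32_t reg)
{
	assert(reg < NUM_REGS);
	for (int i = 0; i < d->npending; i++)
		d->regs[d->queue[i].reg] = d->queue[i].val;
	d->npending = 0;
	return d->regs[reg];
}

/* Write the flush register, then read some other register to post it. */
static void flush_with_posting_read(struct fake_dev *d)
{
	dev_write(d, REG_FLUSH, 0);
	(void)dev_read(d, REG_MEMSIZE);
}
```

In this model, calling `flush_with_posting_read()` leaves no pending writes, even though the register that was read is not the one that was written. Real hardware ordering is of course more subtle than a single queue.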
From 9c8c80ba970816e653e9f100b2e33dafebf11cce Mon Sep 17 00:00:00 2001
From: Alex Deucher <alexander.deuc...@amd.com>
Date: Wed, 30 Apr 2025 12:50:02 -0400
Subject: [PATCH 5/5] drm/amdgpu/hdp7: use memcfg register to post the write for HDP flush

Reading back the remapped HDP flush register seems to cause
problems on some platforms. All we need is a read, so read back
the memcfg register.

Fixes: 689275140cb8 ("drm/amdgpu/hdp7.0: do a posting read when flushing HDP")
Reported-by: Alexey Klimov <alexey.kli...@linaro.org>
Link: https://lists.freedesktop.org/archives/amd-gfx/2025-April/123150.html
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4119
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3908
Signed-off-by: Alex Deucher <alexander.deuc...@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/hdp_v7_0.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/hdp_v7_0.c b/drivers/gpu/drm/amd/amdgpu/hdp_v7_0.c
index 49f7eb4fbd117..2c9239a22f398 100644
--- a/drivers/gpu/drm/amd/amdgpu/hdp_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/hdp_v7_0.c
@@ -32,7 +32,12 @@ static void hdp_v7_0_flush_hdp(struct amdgpu_device *adev,
 {
 	if (!ring || !ring->funcs->emit_wreg) {
 		WREG32((adev->rmmio_remap.reg_offset + KFD_MMIO_REMAP_HDP_MEM_FLUSH_CNTL) >> 2, 0);
-		RREG32((adev->rmmio_remap.reg_offset + KFD_MMIO_REMAP_HDP_MEM_FLUSH_CNTL) >> 2);
+		/* We just need to read back a register to post the write.
+		 * Reading back the remapped register causes problems on
+		 * some platforms so just read back the memory size register.
+		 */
+		if (adev->nbio.funcs->get_memsize)
+			adev->nbio.funcs->get_memsize(adev);
 	} else {
 		amdgpu_ring_emit_wreg(ring, (adev->rmmio_remap.reg_offset + KFD_MMIO_REMAP_HDP_MEM_FLUSH_CNTL) >> 2, 0);
 	}
--
2.49.0

From 697d39b740db06b16141402fe9ba3327290e994e Mon Sep 17 00:00:00 2001
From: Alex Deucher <alexander.deuc...@amd.com>
Date: Wed, 30 Apr 2025 12:48:51 -0400
Subject: [PATCH 4/5] drm/amdgpu/hdp6: use memcfg register to post the write for HDP flush

Reading back the remapped HDP flush register seems to cause
problems on some platforms. All we need is a read, so read back
the memcfg register.

Fixes: abe1cbaec6cf ("drm/amdgpu/hdp6.0: do a posting read when flushing HDP")
Reported-by: Alexey Klimov <alexey.kli...@linaro.org>
Link: https://lists.freedesktop.org/archives/amd-gfx/2025-April/123150.html
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4119
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3908
Signed-off-by: Alex Deucher <alexander.deuc...@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/hdp_v6_0.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/hdp_v6_0.c b/drivers/gpu/drm/amd/amdgpu/hdp_v6_0.c
index a88d25a06c29b..6ccd31c8bc692 100644
--- a/drivers/gpu/drm/amd/amdgpu/hdp_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/hdp_v6_0.c
@@ -35,7 +35,12 @@ static void hdp_v6_0_flush_hdp(struct amdgpu_device *adev,
 {
 	if (!ring || !ring->funcs->emit_wreg) {
 		WREG32((adev->rmmio_remap.reg_offset + KFD_MMIO_REMAP_HDP_MEM_FLUSH_CNTL) >> 2, 0);
-		RREG32((adev->rmmio_remap.reg_offset + KFD_MMIO_REMAP_HDP_MEM_FLUSH_CNTL) >> 2);
+		/* We just need to read back a register to post the write.
+		 * Reading back the remapped register causes problems on
+		 * some platforms so just read back the memory size register.
+		 */
+		if (adev->nbio.funcs->get_memsize)
+			adev->nbio.funcs->get_memsize(adev);
 	} else {
 		amdgpu_ring_emit_wreg(ring, (adev->rmmio_remap.reg_offset + KFD_MMIO_REMAP_HDP_MEM_FLUSH_CNTL) >> 2, 0);
 	}
--
2.49.0

From b8d7fa0010c3dd412a72f1c89db81ac89e107261 Mon Sep 17 00:00:00 2001
From: Alex Deucher <alexander.deuc...@amd.com>
Date: Wed, 30 Apr 2025 12:46:56 -0400
Subject: [PATCH 2/5] drm/amdgpu/hdp5: use memcfg register to post the write for HDP flush

Reading back the remapped HDP flush register seems to cause
problems on some platforms. All we need is a read, so read back
the memcfg register.

Fixes: cf424020e040 ("drm/amdgpu/hdp5.0: do a posting read when flushing HDP")
Reported-by: Alexey Klimov <alexey.kli...@linaro.org>
Link: https://lists.freedesktop.org/archives/amd-gfx/2025-April/123150.html
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4119
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3908
Signed-off-by: Alex Deucher <alexander.deuc...@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/hdp_v5_0.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/hdp_v5_0.c b/drivers/gpu/drm/amd/amdgpu/hdp_v5_0.c
index 43195c0797480..086a647308df0 100644
--- a/drivers/gpu/drm/amd/amdgpu/hdp_v5_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/hdp_v5_0.c
@@ -32,7 +32,12 @@ static void hdp_v5_0_flush_hdp(struct amdgpu_device *adev,
 {
 	if (!ring || !ring->funcs->emit_wreg) {
 		WREG32((adev->rmmio_remap.reg_offset + KFD_MMIO_REMAP_HDP_MEM_FLUSH_CNTL) >> 2, 0);
-		RREG32((adev->rmmio_remap.reg_offset + KFD_MMIO_REMAP_HDP_MEM_FLUSH_CNTL) >> 2);
+		/* We just need to read back a register to post the write.
+		 * Reading back the remapped register causes problems on
+		 * some platforms so just read back the memory size register.
+		 */
+		if (adev->nbio.funcs->get_memsize)
+			adev->nbio.funcs->get_memsize(adev);
 	} else {
 		amdgpu_ring_emit_wreg(ring, (adev->rmmio_remap.reg_offset + KFD_MMIO_REMAP_HDP_MEM_FLUSH_CNTL) >> 2, 0);
 	}
--
2.49.0

From 35557a60c2e5fdf9db8e1f06cadc9cd470ea26a0 Mon Sep 17 00:00:00 2001
From: Alex Deucher <alexander.deuc...@amd.com>
Date: Wed, 30 Apr 2025 12:47:37 -0400
Subject: [PATCH 3/5] drm/amdgpu/hdp5.2: use memcfg register to post the write for HDP flush

Reading back the remapped HDP flush register seems to cause
problems on some platforms. All we need is a read, so read back
the memcfg register.

Fixes: f756dbac1ce1 ("drm/amdgpu/hdp5.2: do a posting read when flushing HDP")
Reported-by: Alexey Klimov <alexey.kli...@linaro.org>
Link: https://lists.freedesktop.org/archives/amd-gfx/2025-April/123150.html
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4119
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3908
Signed-off-by: Alex Deucher <alexander.deuc...@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/hdp_v5_2.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/hdp_v5_2.c b/drivers/gpu/drm/amd/amdgpu/hdp_v5_2.c
index fcb8dd2876bcc..40940b4ab4007 100644
--- a/drivers/gpu/drm/amd/amdgpu/hdp_v5_2.c
+++ b/drivers/gpu/drm/amd/amdgpu/hdp_v5_2.c
@@ -33,7 +33,17 @@ static void hdp_v5_2_flush_hdp(struct amdgpu_device *adev,
 	if (!ring || !ring->funcs->emit_wreg) {
 		WREG32_NO_KIQ((adev->rmmio_remap.reg_offset + KFD_MMIO_REMAP_HDP_MEM_FLUSH_CNTL) >> 2,
 			0);
-		RREG32_NO_KIQ((adev->rmmio_remap.reg_offset + KFD_MMIO_REMAP_HDP_MEM_FLUSH_CNTL) >> 2);
+		if (amdgpu_sriov_vf(adev)) {
+			/* this is fine because SR_IOV doesn't remap the register */
+			RREG32_NO_KIQ((adev->rmmio_remap.reg_offset + KFD_MMIO_REMAP_HDP_MEM_FLUSH_CNTL) >> 2);
+		} else {
+			/* We just need to read back a register to post the write.
+			 * Reading back the remapped register causes problems on
+			 * some platforms so just read back the memory size register.
+			 */
+			if (adev->nbio.funcs->get_memsize)
+				adev->nbio.funcs->get_memsize(adev);
+		}
 	} else {
 		amdgpu_ring_emit_wreg(ring,
 			(adev->rmmio_remap.reg_offset + KFD_MMIO_REMAP_HDP_MEM_FLUSH_CNTL) >> 2,
--
2.49.0

From 2582d03b1812919053f0b733dc37f32065c8c5b9 Mon Sep 17 00:00:00 2001
From: Alex Deucher <alexander.deuc...@amd.com>
Date: Wed, 30 Apr 2025 12:45:04 -0400
Subject: [PATCH 1/5] drm/amdgpu/hdp4: use memcfg register to post the write for HDP flush

Reading back the remapped HDP flush register seems to cause
problems on some platforms. All we need is a read, so read back
the memcfg register.

Fixes: c9b8dcabb52a ("drm/amdgpu/hdp4.0: do a posting read when flushing HDP")
Reported-by: Alexey Klimov <alexey.kli...@linaro.org>
Link: https://lists.freedesktop.org/archives/amd-gfx/2025-April/123150.html
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4119
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3908
Signed-off-by: Alex Deucher <alexander.deuc...@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/hdp_v4_0.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/hdp_v4_0.c b/drivers/gpu/drm/amd/amdgpu/hdp_v4_0.c
index f1dc13b3ab38e..cbbeadeb53f72 100644
--- a/drivers/gpu/drm/amd/amdgpu/hdp_v4_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/hdp_v4_0.c
@@ -41,7 +41,12 @@ static void hdp_v4_0_flush_hdp(struct amdgpu_device *adev,
 {
 	if (!ring || !ring->funcs->emit_wreg) {
 		WREG32((adev->rmmio_remap.reg_offset + KFD_MMIO_REMAP_HDP_MEM_FLUSH_CNTL) >> 2, 0);
-		RREG32((adev->rmmio_remap.reg_offset + KFD_MMIO_REMAP_HDP_MEM_FLUSH_CNTL) >> 2);
+		/* We just need to read back a register to post the write.
+		 * Reading back the remapped register causes problems on
+		 * some platforms so just read back the memory size register.
+		 */
+		if (adev->nbio.funcs->get_memsize)
+			adev->nbio.funcs->get_memsize(adev);
 	} else {
 		amdgpu_ring_emit_wreg(ring, (adev->rmmio_remap.reg_offset + KFD_MMIO_REMAP_HDP_MEM_FLUSH_CNTL) >> 2, 0);
 	}
--
2.49.0
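
The read-back selection that the series introduces (most visibly in the hdp5.2 patch, which keeps the remapped read for SR-IOV) can be summarised with a stand-alone sketch. Everything here is a hypothetical stand-in for illustration: `struct dev`, `struct nbio_funcs`, `post_flush_write()` and the counters are not the driver's types, and the "reads" are just counters rather than MMIO accesses. The sketch assumes, as the patches appear to, that the callback table itself is always present and only `get_memsize` may be missing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct dev;

/* Hypothetical stand-in for an optional per-chip callback table. */
struct nbio_funcs {
	uint64_t (*get_memsize)(struct dev *d);
};

struct dev {
	bool is_vf;                     /* models an SR-IOV virtual function */
	const struct nbio_funcs *funcs; /* assumed non-NULL in this sketch */
	unsigned int remapped_reads;    /* read-backs of the remapped register */
	unsigned int memsize_reads;     /* read-backs via get_memsize */
};

enum readback { READBACK_NONE, READBACK_REMAPPED, READBACK_MEMSIZE };

/* Pick the register used to post the flush write, mirroring the patch logic. */
static enum readback post_flush_write(struct dev *d)
{
	assert(d != NULL && d->funcs != NULL);
	if (d->is_vf) {
		/* The register is not remapped here, so reading it back is fine. */
		d->remapped_reads++;
		return READBACK_REMAPPED;
	}
	if (d->funcs->get_memsize) {
		(void)d->funcs->get_memsize(d);
		return READBACK_MEMSIZE;
	}
	/* No callback available: this sketch performs no read-back at all. */
	return READBACK_NONE;
}

/* Illustrative callback: counts the read and returns a made-up size. */
static uint64_t fake_get_memsize(struct dev *d)
{
	d->memsize_reads++;
	return 8176;
}
```

One consequence worth noting, at least in this model: when `get_memsize` is absent on a non-SR-IOV device, nothing is read back, so the write is not explicitly posted on that path.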