I think we should implement the write/wait combined command in gfx10. Did we ever released any firmware which couldn't do this?
Christian. Am 28.10.2019 13:07 schrieb "Zhu, Changfeng" <changfeng....@amd.com>: Hi Christian, Should we also realize the function of gfx_v9_0_wait_reg_mem in gfx10 like gfx9 since gfx10 also realize write/wait command in a single packet after CL#1761300? Or we can add dummy read in gmc10 by using emit_wait like Luben's way? BR, Changfeng. -----Original Message----- From: Koenig, Christian <christian.koe...@amd.com> Sent: Monday, October 28, 2019 6:47 PM To: Zhu, Changfeng <changfeng....@amd.com>; amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander <alexander.deuc...@amd.com>; Pelloux-prayer, Pierre-eric <pierre-eric.pelloux-pra...@amd.com>; Huang, Ray <ray.hu...@amd.com>; Tuikov, Luben <luben.tui...@amd.com> Subject: Re: [PATCH] drm/amdgpu: GFX9, GFX10: GRBM requires 1-cycle delay Hi Changfeng, > So how can we deal with the firmware between mec version(402) and mec > version(421)? Well of hand I see only two options: Either print a warning or completely reject loading the driver. Completely rejecting loading the driver is probably not a good idea and the issue is actually extremely unlikely to cause any problems. So printing a warning that the user should update their firmware is probably the best approach. Regards, Christian. Am 28.10.19 um 04:01 schrieb Zhu, Changfeng: > Hi Christian, > > Re- that won't work, you can't add this to > amdgpu_ring_emit_reg_write_reg_wait_helper or break all read triggered > registers (like the semaphore ones). > > Do you mean that I should use reg_wait registers(wait_reg_mem) like Luben to > replace read triggered registers for adding dummy read? > > Re-Additional to that it will never work on GFX9, since the CP firmware there > uses the integrated write/wait command and you can't add an additional dummy > read there. > > Yes, I see the integrated write/wait command and they are realized in > gfx_v9_0_wait_reg_mem: > Emily's patch: > drm/amdgpu: Remove the sriov checking and add firmware checking > decides when to go into gfx_v9_0_wait_reg_mem and when go into > amdgpu_ring_emit_reg_write_reg_wait_helper. > > However there are two problems now. > 1.Before the fw_version_ok fw version, the code goes into > amdgpu_ring_emit_reg_write_reg_wait_helper. In this case, should not we add > dummy read in amdgpu_ring_emit_reg_write_reg_wait_helper? > 2.After the fw_version_ok fw version, the code goes into > gfx_v9_0_wait_reg_mem. However, it realizes write/wait command in firmware. > Then how can we add this dummy read? According to Yang,Zilong, the CP > firmware has realized dummy in firmware in CL: > Vega20 CL#1762470 @3/27/2019 > Navi10 CL#1761300 @3/25/2019 > Accodring to CL#1762470, > The firmware which realized dummy read is(Raven for example): > Mec version: > #define F32_MEC_UCODE_VERSION "#421" > #define F32_MEC_FEATURE_VERSION 46 > Pfp version: > #define F32_PFP_UCODE_VERSION "#183" > #define F32_PFP_FEATURE_VERSION 46 > In Emily's patch: > The CP firmware whichuses the integrated write/wait command begins from > version: > + case CHIP_RAVEN: > + if ((adev->gfx.me_fw_version >= 0x0000009c) && > + (adev->gfx.me_feature_version >= 42) && > + (adev->gfx.pfp_fw_version >= 0x000000b1(177)) && > + (adev->gfx.pfp_feature_version >= 42)) > + adev->gfx.me_fw_write_wait = true; > + > + if ((adev->gfx.mec_fw_version >= 0x00000192(402)) && > + (adev->gfx.mec_feature_version >= 42)) > + adev->gfx.mec_fw_write_wait = true; > + break; > > So how can we deal with the firmware between mec version(402) and mec > version(421)? > It will realize write/wait command in CP firmware but it doesn't have dummy > read. > > BR, > Changfeng. > > -----Original Message----- > From: Koenig, Christian <christian.koe...@amd.com> > Sent: Friday, October 25, 2019 11:54 PM > To: Zhu, Changfeng <changfeng....@amd.com>; > amd-gfx@lists.freedesktop.org > Cc: Deucher, Alexander <alexander.deuc...@amd.com>; Pelloux-prayer, > Pierre-eric <pierre-eric.pelloux-pra...@amd.com>; Huang, Ray > <ray.hu...@amd.com>; Tuikov, Luben <luben.tui...@amd.com> > Subject: Re: [PATCH] drm/amdgpu: GFX9, GFX10: GRBM requires 1-cycle > delay > > Hi Changfeng, > > that won't work, you can't add this to > amdgpu_ring_emit_reg_write_reg_wait_helper or break all read triggered > registers (like the semaphore ones). > > Additional to that it will never work on GFX9, since the CP firmware there > uses the integrated write/wait command and you can't add an additional dummy > read there. > > Regards, > Christian. > > Am 25.10.19 um 16:22 schrieb Zhu, Changfeng: >> I try to write a patch based on the patch of Tuikov,Luben. >> >> Inspired by Luben,here is the patch: >> >> From 1980d8f1ed44fb9a84a5ea1f6e2edd2bc25c629a Mon Sep 17 00:00:00 >> 2001 >> From: changzhu <changfeng....@amd.com> >> Date: Thu, 10 Oct 2019 11:02:33 +0800 >> Subject: [PATCH] drm/amdgpu: add dummy read by engines for some GCVM status >> registers >> >> The GRBM register interface is now capable of bursting 1 cycle per >> register wr->wr, wr->rd much faster than previous muticycle per >> transaction done interface. This has caused a problem where status >> registers requiring HW to update have a 1 cycle delay, due to the >> register update having to go through GRBM. >> >> SW may operate on an incorrect value if they write a register and >> immediately check the corresponding status register. >> >> Registers requiring HW to clear or set fields may be delayed by 1 cycle. >> For example, >> >> 1. write VM_INVALIDATE_ENG0_REQ mask = 5a 2. read >> VM_INVALIDATE_ENG0_ACKb till the ack is same as the request mask = 5a >> a. HW will reset VM_INVALIDATE_ENG0_ACK = 0 until invalidation >> is complete 3. write VM_INVALIDATE_ENG0_REQ mask = 5a 4. read >> VM_INVALIDATE_ENG0_ACK till the ack is same as the request mask = 5a >> a. First read of VM_INVALIDATE_ENG0_ACK = 5a instead of 0 >> b. Second read of VM_INVALIDATE_ENG0_ACK = 0 because the remote GRBM >> h/w >> register takes one extra cycle to be cleared >> c. In this case,SW wil see a false ACK if they exit on first read >> >> Affected registers (only GC variant) | Recommended Dummy Read >> --------------------------------------+---------------------------- >> VM_INVALIDATE_ENG*_ACK | VM_INVALIDATE_ENG*_REQ >> VM_L2_STATUS | VM_L2_STATUS >> VM_L2_PROTECTION_FAULT_STATUS | VM_L2_PROTECTION_FAULT_STATUS >> VM_L2_PROTECTION_FAULT_ADDR_HI/LO32 | VM_L2_PROTECTION_FAULT_ADDR_HI/LO32 >> VM_L2_IH_LOG_BUSY | VM_L2_IH_LOG_BUSY >> MC_VM_L2_PERFCOUNTER_HI/LO | MC_VM_L2_PERFCOUNTER_HI/LO >> ATC_L2_PERFCOUNTER_HI/LO | ATC_L2_PERFCOUNTER_HI/LO >> ATC_L2_PERFCOUNTER2_HI/LO | ATC_L2_PERFCOUNTER2_HI/LO >> >> It also needs dummy read by engines for these gc registers. >> >> Change-Id: Ie028f37eb789966d4593984bd661b248ebeb1ac3 >> Signed-off-by: changzhu <changfeng....@amd.com> >> --- >> drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 5 +++++ >> drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 2 ++ >> drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 2 ++ >> drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 4 ++++ >> drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c | 18 ++++++++++++++++++ >> 5 files changed, 31 insertions(+) >> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c >> index 4b3f58dbf36f..c2fbf6087ecf 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c >> @@ -392,6 +392,11 @@ void amdgpu_ring_emit_reg_write_reg_wait_helper(struct >> amdgpu_ring *ring, >> uint32_t ref, uint32_t mask) >> { >> amdgpu_ring_emit_wreg(ring, reg0, ref); >> + >> + /* wait for a cycle to reset vm_inv_eng0_ack */ >> + if (ring->funcs->vmhub == AMDGPU_GFXHUB_0) >> + amdgpu_ring_emit_rreg(ring, reg0); >> + >> amdgpu_ring_emit_reg_wait(ring, reg1, mask, mask); >> } >> >> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c >> b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c >> index ef1975a5323a..104c47734316 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c >> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c >> @@ -5155,6 +5155,7 @@ static const struct amdgpu_ring_funcs >> gfx_v10_0_ring_funcs_gfx = { >> .patch_cond_exec = gfx_v10_0_ring_emit_patch_cond_exec, >> .preempt_ib = gfx_v10_0_ring_preempt_ib, >> .emit_tmz = gfx_v10_0_ring_emit_tmz, >> + .emit_rreg = gfx_v10_0_ring_emit_rreg, >> .emit_wreg = gfx_v10_0_ring_emit_wreg, >> .emit_reg_wait = gfx_v10_0_ring_emit_reg_wait, >> }; >> @@ -5188,6 +5189,7 @@ static const struct amdgpu_ring_funcs >> gfx_v10_0_ring_funcs_compute = { >> .test_ib = gfx_v10_0_ring_test_ib, >> .insert_nop = amdgpu_ring_insert_nop, >> .pad_ib = amdgpu_ring_generic_pad_ib, >> + .emit_rreg = gfx_v10_0_ring_emit_rreg, >> .emit_wreg = gfx_v10_0_ring_emit_wreg, >> .emit_reg_wait = gfx_v10_0_ring_emit_reg_wait, >> }; >> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c >> b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c >> index 2f03bf533d41..d00b53de0fdc 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c >> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c >> @@ -6253,6 +6253,7 @@ static const struct amdgpu_ring_funcs >> gfx_v9_0_ring_funcs_gfx = { >> .init_cond_exec = gfx_v9_0_ring_emit_init_cond_exec, >> .patch_cond_exec = gfx_v9_0_ring_emit_patch_cond_exec, >> .emit_tmz = gfx_v9_0_ring_emit_tmz, >> + .emit_rreg = gfx_v9_0_ring_emit_rreg, >> .emit_wreg = gfx_v9_0_ring_emit_wreg, >> .emit_reg_wait = gfx_v9_0_ring_emit_reg_wait, >> .emit_reg_write_reg_wait = gfx_v9_0_ring_emit_reg_write_reg_wait, >> @@ -6289,6 +6290,7 @@ static const struct amdgpu_ring_funcs >> gfx_v9_0_ring_funcs_compute = { >> .insert_nop = amdgpu_ring_insert_nop, >> .pad_ib = amdgpu_ring_generic_pad_ib, >> .set_priority = gfx_v9_0_ring_set_priority_compute, >> + .emit_rreg = gfx_v9_0_ring_emit_rreg, >> .emit_wreg = gfx_v9_0_ring_emit_wreg, >> .emit_reg_wait = gfx_v9_0_ring_emit_reg_wait, >> .emit_reg_write_reg_wait = gfx_v9_0_ring_emit_reg_write_reg_wait, >> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c >> b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c >> index 3b00bce14cfb..dce6b651da1f 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c >> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c >> @@ -346,6 +346,10 @@ static uint64_t >> gmc_v10_0_emit_flush_gpu_tlb(struct amdgpu_ring *ring, >> >> amdgpu_ring_emit_wreg(ring, hub->vm_inv_eng0_req + eng, req); >> >> + /* wait for a cycle to reset vm_inv_eng0_ack */ >> + if (ring->funcs->vmhub == AMDGPU_GFXHUB_0) >> + amdgpu_ring_emit_rreg(ring, hub->vm_inv_eng0_req + eng); >> + >> /* wait for the invalidate to complete */ >> amdgpu_ring_emit_reg_wait(ring, hub->vm_inv_eng0_ack + eng, >> 1 << vmid, 1 << vmid); >> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c >> b/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c >> index 3460c00f3eaa..baaa33467882 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c >> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c >> @@ -38,6 +38,7 @@ >> #include "navi10_sdma_pkt_open.h" >> #include "nbio_v2_3.h" >> #include "sdma_v5_0.h" >> +#include "nvd.h" >> >> MODULE_FIRMWARE("amdgpu/navi10_sdma.bin"); >> MODULE_FIRMWARE("amdgpu/navi10_sdma1.bin"); >> @@ -1147,6 +1148,22 @@ static void sdma_v5_0_ring_emit_vm_flush(struct >> amdgpu_ring *ring, >> amdgpu_gmc_emit_flush_gpu_tlb(ring, vmid, pd_addr); >> } >> >> +static void sdma_v5_0_ring_emit_rreg(struct amdgpu_ring *ring, >> +uint32_t reg) { >> + struct amdgpu_device *adev = ring->adev; >> + >> + amdgpu_ring_write(ring, PACKET3(PACKET3_COPY_DATA, 4)); >> + amdgpu_ring_write(ring, 0 | /* src: register*/ >> + (5 << 8) | /* dst: memory */ >> + (1 << 20)); /* write confirm */ >> + amdgpu_ring_write(ring, reg); >> + amdgpu_ring_write(ring, 0); >> + amdgpu_ring_write(ring, lower_32_bits(adev->wb.gpu_addr + >> + adev->virt.reg_val_offs * 4)); >> + amdgpu_ring_write(ring, upper_32_bits(adev->wb.gpu_addr + >> + adev->virt.reg_val_offs * 4)); >> +} >> + >> static void sdma_v5_0_ring_emit_wreg(struct amdgpu_ring *ring, >> uint32_t reg, uint32_t val) >> { >> @@ -1597,6 +1614,7 @@ static const struct amdgpu_ring_funcs >> sdma_v5_0_ring_funcs = { >> .test_ib = sdma_v5_0_ring_test_ib, >> .insert_nop = sdma_v5_0_ring_insert_nop, >> .pad_ib = sdma_v5_0_ring_pad_ib, >> + .emit_rreg = sdma_v5_0_ring_emit_rreg, >> .emit_wreg = sdma_v5_0_ring_emit_wreg, >> .emit_reg_wait = sdma_v5_0_ring_emit_reg_wait, >> .init_cond_exec = sdma_v5_0_ring_init_cond_exec,
_______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx