[PATCH v8 0/5] rework bo mem stats tracking

2024-11-15 Thread Yunxiang Li
Right now every time the fdinfo is read, we go through the vm lists and lock all the BOs to calculate the statistics. This causes a lot of lock contention when the VM is actively used. It gets worse if there are a lot of shared BOs or a lot of submissions. We have seen submissions lock-up

[PATCH v8 1/5] drm: add drm_memory_stats_is_zero

2024-11-15 Thread Yunxiang Li
Add a helper to check whether the memory stats are zero; this will be used to check for memory accounting errors. Signed-off-by: Yunxiang Li Reviewed-by: Christian König CC: dri-de...@lists.freedesktop.org --- drivers/gpu/drm/drm_file.c | 10 ++ include/drm/drm_file.h | 1 + 2 files chan
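A minimal sketch of what such a helper might look like, written as standalone C rather than kernel code. The struct `mem_stats` and its field names are assumptions for illustration (loosely modeled on the drm-* fdinfo categories); the snippet above is truncated before the diff, so this is not the patch's actual code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a per-file memory stats struct. */
struct mem_stats {
	uint64_t shared;
	uint64_t priv;
	uint64_t resident;
	uint64_t purgeable;
	uint64_t active;
};

/* Returns true only when every counter is zero. */
static bool mem_stats_is_zero(const struct mem_stats *stats)
{
	return stats->shared == 0 &&
	       stats->priv == 0 &&
	       stats->resident == 0 &&
	       stats->purgeable == 0 &&
	       stats->active == 0;
}
```

A check like this is useful at teardown: if the counters are not all zero once every buffer has been released, some add/remove pair was unbalanced.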

[PATCH v8 4/5] drm/amdgpu: remove unused function parameter

2024-11-15 Thread Yunxiang Li
amdgpu_vm_bo_invalidate doesn't use the adev parameter and not all callers have a reference to adev handy, so remove it for cleanliness. Signed-off-by: Yunxiang Li Reviewed-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 4 ++-- drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c

[PATCH v8 5/5] drm/amdgpu: track bo memory stats at runtime

2024-11-15 Thread Yunxiang Li
Before, every time fdinfo is queried we try to lock all the BOs in the VM and calculate memory usage from scratch. This works okay if the fdinfo is rarely read and the VMs don't have a ton of BOs. If either of these conditions is not true, we get a massive performance hit. In this new revision, we
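The general idea of tracking stats at runtime, instead of recomputing them on every read, can be sketched in standalone C as follows. All names here (`vm_stats`, `bo`, the `PL_*` placements) are hypothetical; the snippet is truncated before it describes the new design, so this illustrates incremental accounting in general and not the amdgpu implementation.

```c
#include <stdint.h>

/* Hypothetical placements a buffer can live in. */
enum placement { PL_SYSTEM, PL_GTT, PL_VRAM, PL_COUNT };

/* Per-VM counters, kept up to date as buffers change state. */
struct vm_stats {
	uint64_t resident[PL_COUNT];
};

/* Hypothetical buffer object: a size and a current placement. */
struct bo {
	uint64_t size;
	enum placement where;
};

/* Account a buffer when it is added to the VM. */
static void vm_stats_add(struct vm_stats *st, const struct bo *bo)
{
	st->resident[bo->where] += bo->size;
}

/* Remove a buffer's contribution when it leaves the VM. */
static void vm_stats_del(struct vm_stats *st, const struct bo *bo)
{
	st->resident[bo->where] -= bo->size;
}

/* Update the counters when a buffer moves to another placement. */
static void vm_stats_move(struct vm_stats *st, struct bo *bo,
			  enum placement to)
{
	vm_stats_del(st, bo);
	bo->where = to;
	vm_stats_add(st, bo);
}

/* Reading is a plain load: no walk over the buffer list. */
static uint64_t vm_stats_read(const struct vm_stats *st, enum placement p)
{
	return st->resident[p];
}
```

The trade-off is that every state change pays a small bookkeeping cost, while a read no longer needs to visit or lock each buffer.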

[PATCH v8 3/5] Documentation/gpu: Clarify drm memory stats definition

2024-11-15 Thread Yunxiang Li
Define how to handle buffers with multiple possible placements so we don't get incompatible implementations. Call out the resident requirement for drm-purgeable- explicitly. Remove the requirement for there to be only drm-memory- or only drm-resident-, it's not what's implemented and having both is b

[PATCH v8 2/5] drm: make drm-active- stats optional

2024-11-15 Thread Yunxiang Li
When memory stats are generated fresh every time by going through all the BOs, their active information is quite easy to get. But if the stats are tracked with BO's state this becomes harder since the job scheduling part doesn't really deal with individual buffers. Make drm-active- optional to enable

[PATCH 5/5] drm/connector: make mode_valid accept const struct drm_display_mode

2024-11-15 Thread Dmitry Baryshkov
The mode_valid() callbacks of drm_encoder, drm_crtc and drm_bridge accept const struct drm_display_mode argument. Change the mode_valid callback of drm_connector to also accept const argument. Signed-off-by: Dmitry Baryshkov --- drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c | 8 -

[PATCH 4/5] drm/connector: make mode_valid_ctx accept const struct drm_display_mode

2024-11-15 Thread Dmitry Baryshkov
The mode_valid() callbacks of drm_encoder, drm_crtc and drm_bridge accept const struct drm_display_mode argument. Change the mode_valid_ctx callback of drm_connector to also accept const argument. Signed-off-by: Dmitry Baryshkov --- drivers/gpu/drm/i915/display/intel_dp_mst.c | 2 +- include/drm

[PATCH 3/5] drm/sti: hda: pass const struct drm_display_mode* to hda_get_mode_idx()

2024-11-15 Thread Dmitry Baryshkov
Make hda_get_mode_idx() accept const struct drm_display_mode pointer instead of just raw struct drm_display_mode. This is preparation for converting the mode_valid() callback of drm_connector to accept const struct drm_display_mode argument. Signed-off-by: Dmitry Baryshkov --- drivers/gpu/drm/

[PATCH 2/5] drm/amdgpu: don't change mode in amdgpu_dm_connector_mode_valid()

2024-11-15 Thread Dmitry Baryshkov
Make amdgpu_dm_connector_mode_valid() duplicate the mode during the test rather than modifying the passed mode. This is preparation for converting the mode_valid() callback of drm_connector to accept const struct drm_display_mode argument. Signed-off-by: Dmitry Baryshkov --- drivers/gpu/drm/amd
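The pattern described here, copying the mode before running a test that needs a writable one, can be sketched in standalone C. The struct `display_mode`, the adjustment step, and the clock limit are all hypothetical; this is not the amdgpu code, only an illustration of why a local copy lets the callback take a const pointer.

```c
#include <stdbool.h>

/* Hypothetical, much-reduced stand-in for a display mode. */
struct display_mode {
	int hdisplay;
	int vdisplay;
	int clock_khz;
};

/* Hypothetical test-time adjustment that needs a writable mode. */
static void adjust_for_test(struct display_mode *m)
{
	m->clock_khz /= 2;
}

/* The caller's mode is const; only the local copy is modified. */
static bool mode_is_valid(const struct display_mode *mode, int max_clock_khz)
{
	struct display_mode scratch = *mode;

	adjust_for_test(&scratch);
	return scratch.clock_khz <= max_clock_khz;
}
```

Because the function never writes through `mode`, callers can pass modes they do not own without worrying about side effects.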

[PATCH 1/5] drm/encoder_slave: make mode_valid accept const struct drm_display_mode

2024-11-15 Thread Dmitry Baryshkov
The mode_valid() callbacks of drm_encoder, drm_crtc and drm_bridge accept const struct drm_display_mode argument. Change the mode_valid callback of drm_encoder_slave to also accept const argument. Signed-off-by: Dmitry Baryshkov --- drivers/gpu/drm/i2c/ch7006_drv.c | 2 +- drivers/gpu/d

[PATCH 0/5] drm/connector: make mode_valid() callback accept const mode pointer

2024-11-15 Thread Dmitry Baryshkov
base-commit: 7d2faa8dbb7055a115fe0cd6068d7090094a573d change-id: 20241115-drm-connector-mode-valid-const-ae3db0ef6cb7 Best regards, -- Dmitry Baryshkov

[pull] amdgpu, amdkfd drm-next-6.13

2024-11-15 Thread Alex Deucher
Hi Dave, Simona, Fixes for 6.13. The following changes since commit 35a6e15aabc016a241379c09d6c367519709b95b: Merge tag 'drm-etnaviv-next-2024-11-07' of https://git.pengutronix.de/git/lst/linux into drm-next (2024-11-08 12:32:06 +1000) are available in the Git repository at: https://gitl

Re: [PATCH 15/15] drm/amdgpu/vcn: update work handler for per instance powergating

2024-11-15 Thread Alex Deucher
On Fri, Nov 15, 2024 at 7:27 AM Christian König wrote: > > On 13.11.24 at 22:44, Alex Deucher wrote: > > Only gate/ungate the relevant instances. > > That won't work, that is the whole problem why we started this series in > the first place. Why won't it work? From my perspective, it was not th

Re: [PATCH 2/2] drm/amdgpu: fix vcn sw init failed

2024-11-15 Thread Alex Deucher
On Fri, Nov 15, 2024 at 7:34 AM Christian König wrote: > > On 13.11.24 at 22:43, Alex Deucher wrote: > > On Wed, Nov 13, 2024 at 12:32 AM Lazar, Lijo wrote: > >> > >> > >> On 11/13/2024 10:54 AM, Alex Deucher wrote: > >>> On Wed, Nov 13, 2024 at 12:03 AM Lazar, Lijo wrote: > > >

Re: [PATCH] drm/amd/pm: fix and simplify workload handling

2024-11-15 Thread Alex Deucher
On Fri, Nov 15, 2024 at 7:14 AM Lazar, Lijo wrote: > > > > On 11/15/2024 2:36 AM, Alex Deucher wrote: > > smu->workload_mask is IP specific and should not be messed with in > > the common code. The mask bits vary across SMU versions. > > > > Move all handling of smu->workload_mask in to the backen

[PATCH] drm/amdgpu: update MODULE_PARM_DESC for freesync_video

2024-11-15 Thread Alex Deucher
To better describe what it does. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3756 Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd

Re: [PATCH v9 2/4] drm/doc: Document device wedged event

2024-11-15 Thread Andy Shevchenko
On Fri, Nov 15, 2024 at 10:19:42AM +0100, Christian König wrote: > On 15.11.24 at 06:07, Raag Jadav wrote: > > Add documentation for device wedged event in a new 'Device wedging' > > chapter. It describes basic definitions and consumer expectations > > along with an example. > > > > v8: Improve

Re: [PATCH 2/2] drm/amdgpu: fix vcn sw init failed

2024-11-15 Thread Christian König
On 13.11.24 at 22:43, Alex Deucher wrote: On Wed, Nov 13, 2024 at 12:32 AM Lazar, Lijo wrote: On 11/13/2024 10:54 AM, Alex Deucher wrote: On Wed, Nov 13, 2024 at 12:03 AM Lazar, Lijo wrote: On 11/13/2024 10:16 AM, Alex Deucher wrote: On Tue, Nov 12, 2024 at 10:24 PM Lazar, Lijo wrote:

Re: [PATCH 15/15] drm/amdgpu/vcn: update work handler for per instance powergating

2024-11-15 Thread Christian König
On 13.11.24 at 22:44, Alex Deucher wrote: Only gate/ungate the relevant instances. That won't work, that is the whole problem why we started this series in the first place. Regards, Christian. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 8 1 fil

Re: [PATCH] drm/amdkfd: make sure ring buffer is flushed before update wptr

2024-11-15 Thread Lazar, Lijo
On 11/15/2024 3:28 PM, Victor Zhao wrote: > In a consecutive packet submission, for example unmap and query status, > when CP is reading wptr caused by unmap packet doorbell ring, if in some > case CP operates slower (e.g. doorbell_mode=1) and wptr has been updated > to next packet (query status

Re: [PATCH] drm/amdkfd: make sure ring buffer is flushed before update wptr

2024-11-15 Thread Christian König
On 15.11.24 at 10:58, Victor Zhao wrote: In a consecutive packet submission, for example unmap and query status, when CP is reading wptr caused by unmap packet doorbell ring, if in some case CP operates slower (e.g. doorbell_mode=1) and wptr has been updated to next packet (query status), but th

Re: [PATCH] drm/amd/pm: fix and simplify workload handling

2024-11-15 Thread Lazar, Lijo
On 11/15/2024 2:36 AM, Alex Deucher wrote: > smu->workload_mask is IP specific and should not be messed with in > the common code. The mask bits vary across SMU versions. > > Move all handling of smu->workload_mask in to the backends and > simplify the code. Store the user's preference in smu-

RE: [PATCH] drm/amd/pm: fix and simplify workload handling

2024-11-15 Thread Feng, Kenneth
[AMD Official Use Only - AMD Internal Distribution Only] Reviewed-by: Kenneth Feng kenneth.f...@amd.com -Original Message- From: Deucher, Alexander Sent: Friday, November 15, 2024 5:06 AM To: amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander ; Feng, Kenneth ; Lazar, Lijo Subject: [

[PATCH] drm/amdkfd: make sure ring buffer is flushed before update wptr

2024-11-15 Thread Victor Zhao
In a consecutive packet submission, for example unmap and query status, when CP is reading wptr caused by unmap packet doorbell ring, if in some case CP operates slower (e.g. doorbell_mode=1) and wptr has been updated to next packet (query status), but the query status packet content has not been f
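The ordering problem described here (the write pointer becoming visible before the packet contents) has a CPU-side analogue that can be sketched with C11 atomics. This is only an analogy under stated assumptions: in the driver the consumer is the CP hardware and the fix would involve a device-visible flush, whereas the hypothetical `ring`, `ring_submit`, and `ring_consume` below model a software producer and consumer.

```c
#include <stdatomic.h>
#include <stdint.h>

#define RING_SLOTS 64u

/* Hypothetical ring: packet storage plus a published write pointer. */
struct ring {
	uint32_t buf[RING_SLOTS];
	_Atomic uint32_t wptr;
};

/* Producer: write the packet first, then publish the new wptr. */
static void ring_submit(struct ring *r, uint32_t packet)
{
	uint32_t w = atomic_load_explicit(&r->wptr, memory_order_relaxed);

	r->buf[w % RING_SLOTS] = packet;
	/*
	 * Release store: the packet write above is ordered before the
	 * wptr update, so a consumer that sees the new wptr also sees
	 * the packet contents.
	 */
	atomic_store_explicit(&r->wptr, w + 1, memory_order_release);
}

/* Consumer: returns 1 and fills *out if a packet is available at rptr. */
static int ring_consume(struct ring *r, uint32_t rptr, uint32_t *out)
{
	uint32_t w = atomic_load_explicit(&r->wptr, memory_order_acquire);

	if (rptr == w)
		return 0;
	*out = r->buf[rptr % RING_SLOTS];
	return 1;
}
```

Without the release/acquire pairing, the consumer could observe the advanced pointer and still read stale slot contents, which is the failure mode the patch description outlines.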

Re: [PATCH v9 2/4] drm/doc: Document device wedged event

2024-11-15 Thread Christian König
On 15.11.24 at 06:07, Raag Jadav wrote: Add documentation for device wedged event in a new 'Device wedging' chapter. It describes basic definitions and consumer expectations along with an example. v8: Improve documentation (Christian, Rodrigo) v9: Add prerequisites section (Christian) Signed-

[PATCH v9 4/4] drm/i915: Use device wedged event

2024-11-15 Thread Raag Jadav
Now that we have device wedged event provided by DRM core, make use of it and support both driver rebind and bus-reset based recovery. With this in place, userspace will be notified of wedged device on gt reset failure. Signed-off-by: Raag Jadav --- drivers/gpu/drm/i915/gt/intel_reset.c | 3 +++

[PATCH] drm/amdgpu: reduce the mmio writes in kiq setting

2024-11-15 Thread Prike Liang
There's no need to perform both MMIO writes while programming the KIQ setting registers; reducing the MMIO writes saves driver load time. Signed-off-by: Prike Liang --- drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 8 ++-- drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 4 +--- dr

[PATCH v9 3/4] drm/xe: Use device wedged event

2024-11-15 Thread Raag Jadav
This was previously attempted as xe specific reset uevent but dropped in commit 77a0d4d1cea2 ("drm/xe/uapi: Remove reset uevent for now") as part of refactoring. Now that we have device wedged event provided by DRM core, make use of it and support both driver rebind and bus-reset based recovery. W

[PATCH v9 0/4] Introduce DRM device wedged event

2024-11-15 Thread Raag Jadav
This series introduces device wedged event in DRM subsystem and uses it in xe and i915 drivers. Detailed description in commit message. This was earlier attempted as xe specific uevent in v1 and v2. https://patchwork.freedesktop.org/series/136909/ Similar work by André Almeida. https://lore.kerne

[PATCH v2 1/2] drm/amd/display: remove redundant is_dsc_possible check

2024-11-15 Thread Bhavin Sharma
Since is_dsc_possible is already checked just above, there's no need to check it again before filling out the DSC settings. Signed-off-by: Bhavin Sharma --- drivers/gpu/drm/amd/display/dc/dsc/dc_dsc.c | 13 + 1 file changed, 5 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/d

[PATCH v9 1/4] drm: Introduce device wedged event

2024-11-15 Thread Raag Jadav
Introduce device wedged event, which notifies userspace of 'wedged' (hanged/unusable) state of the DRM device through a uevent. This is useful especially in cases where the device is no longer operating as expected and has become unrecoverable from driver context. Purpose of this implementation is
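On the consumer side, a userspace listener would need to decide which recovery action to take after receiving the event. The sketch below assumes, purely for illustration, that the event carries a comma-separated list of recovery method names such as "rebind,bus-reset"; the snippet is truncated before it states the actual payload format, so both the format and the helper `list_has_method` are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Returns true if `method` appears as a whole token in the
 * comma-separated `list`. Partial matches do not count.
 */
static bool list_has_method(const char *list, const char *method)
{
	size_t len = strlen(method);
	const char *p = list;

	while (*p) {
		const char *end = strchr(p, ',');
		size_t tok = end ? (size_t)(end - p) : strlen(p);

		if (tok == len && strncmp(p, method, len) == 0)
			return true;
		if (!end)
			break;
		p = end + 1;
	}
	return false;
}
```

Matching whole tokens rather than substrings keeps a method like "reset" from being mistaken for "bus-reset".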

[PATCH v9 2/4] drm/doc: Document device wedged event

2024-11-15 Thread Raag Jadav
Add documentation for device wedged event in a new 'Device wedging' chapter. It describes basic definitions and consumer expectations along with an example. v8: Improve documentation (Christian, Rodrigo) v9: Add prerequisites section (Christian) Signed-off-by: Raag Jadav --- Documentation/gpu/

[PATCH v2 0/2] Remove redundant check

2024-11-15 Thread Bhavin Sharma
Change in V2: in patch 1/2: - Remove mode_422 condition check because that is fixed in amd-staging-drm-next Link for v1: https://lore.kernel.org/dri-devel/2024120900.63869-1-bhavin.sha...@siliconsignals.io/T/#t Bhavin Sharma (2): drm/amd/display: remove redundant is_dsc_

[PATCH v2 2/2] drm/amd/pm: remove redundant tools_size check

2024-11-15 Thread Bhavin Sharma
The check for tools_size being non-zero is redundant as tools_size is explicitly set to a non-zero value (0x19000). Removing the if condition simplifies the code without altering functionality. Signed-off-by: Bhavin Sharma --- .../amd/pm/powerplay/smumgr/vega12_smumgr.c | 24 +-

[PATCH 1/2] drm/amdgpu: Add init level for post reset reinit

2024-11-15 Thread Lijo Lazar
When device needs to be reset before initialization, it's not required for all IPs to be initialized before a reset. In such cases, it needs to identify whether the IP/feature is initialized for the first time or whether it's reinitialized after a reset. Add RESET_RECOVERY init level to identify p

[PATCH 2/2] drm/amdgpu: Check whether in reset recovery state

2024-11-15 Thread Lijo Lazar
Some in_reset checks are in fact checking whether the state is reinitialization after reset. Replace with reset_in_recovery calls to identify that it's really checking for recovery stage after reset. Signed-off-by: Lijo Lazar --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +- drivers/gpu/drm