RE: [PATCH] drm/amdgpu: disable GPU RAS bad page feature for specific ASIC

2024-09-10 Thread Zhou1, Tao
[AMD Official Use Only - AMD Internal Distribution Only] > -----Original Message----- > From: Lazar, Lijo > Sent: Tuesday, September 10, 2024 1:21 PM > To: Zhou1, Tao ; amd-gfx@lists.freedesktop.org > Subject: Re: [PATCH] drm/amdgpu: disable GPU RAS bad page feature for specific > ASIC > > > > On

[PATCH] drm/amdgpu: Fix a typo

2024-09-10 Thread Andrew Kreimer
Fix a typo in comments. Reported-by: Matthew Wilcox Signed-off-by: Andrew Kreimer --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom

Re: [RFC 1/4] drm/sched: Add locking to drm_sched_entity_modify_sched

2024-09-10 Thread Tvrtko Ursulin
On 09/09/2024 13:46, Philipp Stanner wrote: On Mon, 2024-09-09 at 13:37 +0100, Tvrtko Ursulin wrote: On 09/09/2024 13:18, Christian König wrote: Am 09.09.24 um 14:13 schrieb Philipp Stanner: On Mon, 2024-09-09 at 13:29 +0200, Christian König wrote: Am 09.09.24 um 11:44 schrieb Philipp Stan

Re: [PATCH 0/2] drm/amd: fix VRR race condition during IRQ handling

2024-09-10 Thread Tobias Jakobi
On 9/9/24 19:18, Harry Wentland wrote: On 2024-09-09 13:11, Alex Deucher wrote: On Sun, Sep 8, 2024 at 7:23 AM Tobias Jakobi wrote: On 9/8/24 09:35, Christopher Snowhill wrote: On Mon Sep 2, 2024 at 2:40 AM PDT, tjakobi wrote: From: Tobias Jakobi Hello, this fixes a nasty race conditio

Re: [PATCH 1/8] drm/sched: Add locking to drm_sched_entity_modify_sched

2024-09-10 Thread Christian König
Am 09.09.24 um 19:19 schrieb Tvrtko Ursulin: From: Tvrtko Ursulin Without the locking amdgpu currently can race between amdgpu_ctx_set_entity_priority() (via drm_sched_entity_modify_sched()) and drm_sched_job_arm(), leading to the latter accessing potentially inconsistent entity->sched_list and e

Re: [PATCH 2/8] drm/sched: Always wake up correct scheduler in drm_sched_entity_push_job

2024-09-10 Thread Christian König
Am 09.09.24 um 19:19 schrieb Tvrtko Ursulin: From: Tvrtko Ursulin Since drm_sched_entity_modify_sched() can modify the entity's run queue, let's make sure to only dereference the pointer once so both adding and waking up are guaranteed to be consistent. Alternative of moving the spin_unlock to

Re: [PATCH v4 20/80] drm/imx/ipuv3: Run DRM default client setup

2024-09-10 Thread Philipp Zabel
On Mo, 2024-09-09 at 13:30 +0200, Thomas Zimmermann wrote: > Call drm_client_setup_with_color_mode() to run the kernel's default > client setup for DRM. Set fbdev_probe in struct drm_driver, so that > the client setup can start the common fbdev client. > > Signed-off-by: Thomas Zimmermann > Cc: P

Re: [PATCH 4/8] drm/sched: Optimise drm_sched_entity_push_job

2024-09-10 Thread Christian König
Am 09.09.24 um 19:19 schrieb Tvrtko Ursulin: From: Tvrtko Ursulin In FIFO mode we can avoid dropping the lock only to immediately re-acquire by adding a new drm_sched_rq_update_fifo_locked() helper. Signed-off-by: Tvrtko Ursulin Cc: Christian König Cc: Alex Deucher Cc: Luben Tuikov Cc:

[PATCH] drm/amdgpu: disable GPU RAS bad page feature for specific ASIC

2024-09-10 Thread Tao Zhou
The feature is not applicable to specific app platform. v2: update the disablement condition and commit description v3: move the setting to amdgpu_ras_check_supported Signed-off-by: Tao Zhou Reviewed-by: Hawking Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 5 + 1 file changed, 5 ins

Re: [PATCH 5/8] drm/sched: Stop setting current entity in FIFO mode

2024-09-10 Thread Christian König
Am 09.09.24 um 19:19 schrieb Tvrtko Ursulin: From: Tvrtko Ursulin It does not seem there is a need to set the current entity in FIFO mode since it only serves as being a "cursor" in round-robin mode. Even if scheduling mode is changed at runtime the change in behaviour is simply to restart from

RE: [PATCH] drm/amdgpu: disable GPU RAS bad page feature for specific ASIC

2024-09-10 Thread Zhang, Hawking
[AMD Official Use Only - AMD Internal Distribution Only] Reviewed-by: Hawking Zhang Regards, Hawking -----Original Message----- From: Zhou1, Tao Sent: Tuesday, September 10, 2024 16:38 To: amd-gfx@lists.freedesktop.org Cc: Zhou1, Tao ; Zhang, Hawking Subject: [PATCH] drm/amdgpu: disable GPU RA

Re: [PATCH 6/8] drm/sched: Re-order struct drm_sched_rq members for clarity

2024-09-10 Thread Christian König
Am 09.09.24 um 19:19 schrieb Tvrtko Ursulin: From: Tvrtko Ursulin Let's re-order the members to make it clear which are protected by the lock and at the same time document it via kerneldoc. Signed-off-by: Tvrtko Ursulin Cc: Christian König Cc: Alex Deucher Cc: Luben Tuikov Cc: Matthew Brost

Re: [PATCH 7/8] drm/sched: Re-group and rename the entity run-queue lock

2024-09-10 Thread Christian König
Am 09.09.24 um 19:19 schrieb Tvrtko Ursulin: From: Tvrtko Ursulin Christian suggested to rename the lock and improve the documentation of what it protects. And to also re-order the structure members so all protected by the lock are together in a block. Signed-off-by: Tvrtko Ursulin Cc: Christ

Re: [PATCH 8/8] drm/sched: Further optimise drm_sched_entity_push_job

2024-09-10 Thread Christian König
Am 09.09.24 um 19:19 schrieb Tvrtko Ursulin: From: Tvrtko Ursulin Having removed one re-lock cycle on the entity->lock in a patch titled "drm/sched: Optimise drm_sched_entity_push_job", with only a tiny bit larger refactoring we can do the same optimisation on the rq->lock. (Currently both drm_

Re: [PATCH 5/8] drm/sched: Stop setting current entity in FIFO mode

2024-09-10 Thread Philipp Stanner
On Mon, 2024-09-09 at 18:19 +0100, Tvrtko Ursulin wrote: > From: Tvrtko Ursulin > > It does not seem there is a need to set the current entity in FIFO > mode > since it only serves as being a "cursor" in round-robin mode. Even if > scheduling mode is changed at runtime the change in behaviour is

Re: [PATCH 6/8] drm/sched: Re-order struct drm_sched_rq members for clarity

2024-09-10 Thread Philipp Stanner
On Mon, 2024-09-09 at 18:19 +0100, Tvrtko Ursulin wrote: > From: Tvrtko Ursulin > > Let's re-order the members to make it clear which are protected by the > lock > and at the same time document it via kerneldoc. I'd prefer if commit messages follow the idiomatic kernel style of that order: 1.

Re: [PATCH 8/8] drm/sched: Further optimise drm_sched_entity_push_job

2024-09-10 Thread Philipp Stanner
On Mon, 2024-09-09 at 18:19 +0100, Tvrtko Ursulin wrote: > From: Tvrtko Ursulin > > Having removed one re-lock cycle on the entity->lock in a patch > titled > "drm/sched: Optimise drm_sched_entity_push_job", with only a tiny bit > larger refactoring we can do the same optimisation on the rq->lock

Re: [PATCH 6/8] drm/sched: Re-order struct drm_sched_rq members for clarity

2024-09-10 Thread Philipp Stanner
On Tue, 2024-09-10 at 11:42 +0100, Tvrtko Ursulin wrote: > > On 10/09/2024 11:05, Philipp Stanner wrote: > > On Mon, 2024-09-09 at 18:19 +0100, Tvrtko Ursulin wrote: > > > From: Tvrtko Ursulin > > > > > > Let's re-order the members to make it clear which are protected by > > > the > > > lock > >

[PATCH v2] drm/ci: uprev mesa, IGT and deqp-runner

2024-09-10 Thread Vignesh Raman
Uprev mesa, IGT to the latest version and deqp-runner to v0.20.0. Also update expectation files. Acked-by: Helen Koike Reviewed-by: Daniel Stone Signed-off-by: Vignesh Raman --- v1: - Flaky test report will be sent to maintainers after this patch series is reviewed. v2: - Uprev mesa and rer

[PATCH 2/2] drm/amdgpu: Retry i2c transfer once if it fails

2024-09-10 Thread Kent Russell
During init, there can be some collisions on the i2c bus that result in the EEPROM read failing. This has been mitigated in the PMFW to a degree, but there is still a small chance that the bus will be busy. When the read fails during RAS init, that disables page retirement altogether, which is obvi

[PATCH 1/2] drm/amdkfd: Move queue fs deletion after destroy check

2024-09-10 Thread Kent Russell
We were removing the kernfs entry for queue info before checking if the queue could be destroyed. If it failed to get destroyed (e.g. during some GPU resets), then we would try to delete it later during pqm teardown, but the file was already removed. This led to a kernel WARN trying to remove size,

Re: [PATCH] drm/amdgpu: disable GPU RAS bad page feature for specific ASIC

2024-09-10 Thread Lazar, Lijo
On 9/10/2024 2:07 PM, Tao Zhou wrote: > The feature is not applicable to specific app platform. > > v2: update the disablement condition and commit description > v3: move the setting to amdgpu_ras_check_supported > > Signed-off-by: Tao Zhou > Reviewed-by: Hawking Zhang Reviewed-by: Lijo Laz

Re: [PATCH] drm/amdgpu: disable GPU RAS bad page feature for specific ASIC

2024-09-10 Thread Lazar, Lijo
On second thought, this may be made more generic by just checking APU flag - holds true for any APU in general. Thanks, Lijo On 9/10/2024 7:24 PM, Lazar, Lijo wrote: > > > On 9/10/2024 2:07 PM, Tao Zhou wrote: >> The feature is not applicable to specific app platform. >> >> v2: update the d

Re: [PATCH v2 3/3] drm/amdgpu/sdma6: implement ring reset callback for sdma6

2024-09-10 Thread Alex Deucher
On Mon, Sep 9, 2024 at 11:48 PM wrote: > > From: Jiadong Zhu > > Implement sdma queue reset callback using mes_reset_queue_mmio. > > v2: check instance id before reset queue. > > Signed-off-by: Jiadong Zhu Series is: Reviewed-by: Alex Deucher > --- > drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c |

RE: [PATCH 2/2] drm/amdgpu: Retry i2c transfer once if it fails

2024-09-10 Thread Lazar, Lijo
[AMD Official Use Only - AMD Internal Distribution Only] The ideal place is - smu_v13_0_6_request_i2c_xfer Restricts the change to specific SOCs with collision problem. Gives a bit more survival chance with a retry on every chunk requested. Thanks, Lijo -----Original Message----- From: a

RE: [PATCH 2/2] drm/amdgpu: Retry i2c transfer once if it fails

2024-09-10 Thread Russell, Kent
[AMD Official Use Only - AMD Internal Distribution Only] Sounds good, thanks for the input Lijo! I'll get that tested and verify that it works too. Kent > -----Original Message----- > From: Lazar, Lijo > Sent: Tuesday, September 10, 2024 10:06 AM > To: Russell, Kent ; amd-gfx@lists.freedeskto

[PATCH] drm/amdgpu: add amdgpu_jpeg_sched_mask debugfs

2024-09-10 Thread Sathishkumar S
JPEG_4_0_3 has up to 32 jpeg cores and a single mjpeg video decode will use all available cores on the hardware. This debugfs entry helps to disable or enable job submission to a cluster of cores or one specific core in the ip for debugging. The entry is populated only if there are at least two or m

[PATCH] drm/amdgpu/gfx9.4.3: drop extra wrapper

2024-09-10 Thread Alex Deucher
Drop wrapper used in one place. gfx_v9_4_3_xcc_cp_enable() is used in one place. gfx_v9_4_3_xcc_cp_compute_enable() is used everywhere else. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 8 +--- 1 file changed, 1 insertion(+), 7 deletions(-) diff --git a/driver

[PATCH] drm/amdgpu: fix spelling in amd_shared.h

2024-09-10 Thread Alex Deucher
Fix spelling in documentation. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/include/amd_shared.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/include/amd_shared.h b/drivers/gpu/drm/amd/include/amd_shared.h index 745fd052840d..3f91926a50e9 100644

RE: [PATCH] drm/amdgpu: fix spelling in amd_shared.h

2024-09-10 Thread Russell, Kent
[AMD Official Use Only - AMD Internal Distribution Only] Reviewed-by: Kent Russell > -----Original Message----- > From: amd-gfx On Behalf Of Alex > Deucher > Sent: Tuesday, September 10, 2024 10:50 AM > To: amd-gfx@lists.freedesktop.org > Cc: Deucher, Alexander > Subject: [PATCH] drm/amdgpu:

Re: [PATCH 8/8] drm/sched: Further optimise drm_sched_entity_push_job

2024-09-10 Thread Christian König
Am 10.09.24 um 11:46 schrieb Tvrtko Ursulin: On 10/09/2024 10:08, Christian König wrote: Am 09.09.24 um 19:19 schrieb Tvrtko Ursulin: From: Tvrtko Ursulin Having removed one re-lock cycle on the entity->lock in a patch titled "drm/sched: Optimise drm_sched_entity_push_job", with only a tiny

Re: [PATCH 8/8] drm/sched: Further optimise drm_sched_entity_push_job

2024-09-10 Thread Christian König
Am 10.09.24 um 12:25 schrieb Philipp Stanner: On Mon, 2024-09-09 at 18:19 +0100, Tvrtko Ursulin wrote: From: Tvrtko Ursulin Having removed one re-lock cycle on the entity->lock in a patch titled "drm/sched: Optimise drm_sched_entity_push_job", with only a tiny bit larger refactoring we can do

Re: [PATCH] drm/amdkfd: fix vm-pasid lookup for multiple partitions

2024-09-10 Thread Philip Yang
On 2024-09-09 14:46, Christian König wrote: Am 09.09.24 um 18:02 schrieb Kim, Jonathan: [Public] -Original Message- From: Christian König Sent: Thursday, September 5, 202

Re: [PATCH v2 2/2] drm/amdgpu: track bo memory stats at runtime

2024-09-10 Thread Christian König
Am 24.06.24 um 16:08 schrieb Yunxiang Li: Before, every time fdinfo is queried we try to lock all the BOs in the VM and calculate memory usage from scratch. This works okay if the fdinfo is rarely read and the VMs don't have a ton of BOs. If either of these conditions is not true, we get a massiv

Re: 6.11/regression/bisected - after commit 1b04dcca4fb1, launching some RenPy games causes computer hang

2024-09-10 Thread Leo Li
On 2024-09-08 19:30, Mikhail Gavrilov wrote: I have done additional tests: 1. The computer does not hang with 6900XT instead the screen flickers when moving the cursor. 2. The computer does not hang with 7900XTX if I turn off VRR. But the screen flickers when moving the cursor, as on 6900XT. T

Re: [PATCH 1/4] new helper: drm_gem_prime_handle_to_dmabuf()

2024-09-10 Thread Alex Deucher
Thanks. I cherry-picked these to my tree. Sorry for the delay. Alex On Fri, Aug 23, 2024 at 3:53 AM Al Viro wrote: > > On Fri, Aug 23, 2024 at 09:21:14AM +0200, Thomas Zimmermann wrote: > > > Acked-by: Thomas Zimmermann > > > > Thank you so much. > > OK, Acked-by added, branch force-pushed to

RE: [PATCH v2 2/2] drm/amdgpu: track bo memory stats at runtime

2024-09-10 Thread Li, Yunxiang (Teddy)
[Public] > Ok that looks extremely ugly. Please just add a separate function and call > that > from the TTM move function. Should I still remove the adev argument? It is never used and causes a few call sites having to find an adev unnecessarily. > Please either drop that or compare each memor

[PATCH] drm/colorop: get DATA blob ref at duplicate_state

2024-09-10 Thread Harry Wentland
Signed-off-by: Harry Wentland --- It was a stupid mistake on my part. The duplicate_state function needs to take a reference to the blob. This should fix it. Please give it a try if you can. I'll roll it into the patch that introduces the bug in my v6. Harry drivers/gpu/drm/drm_colorop.c | 3

Re: [PATCH v5 00/44] Color Pipeline API w/ VKMS

2024-09-10 Thread Alex Goins
Hi Harry, Thanks for this. I just want to remind about a few things that would be required for NVIDIA hardware, as discussed at the Display Next Hackfest -- fully understand that they aren't currently included in this series because they aren't required on AMD hardware. Allowing color ops to be no

Re: 6.11/regression/bisected - after commit 1b04dcca4fb1, launching some RenPy games causes computer hang

2024-09-10 Thread Leo Li
Hi Mikhail, Can you give this patch a try to see if it helps? https://gist.github.com/leeonadoh/3271e90ec95d768424c572c970ada743 Thanks, Leo On 2024-09-10 11:47, Leo Li wrote: On 2024-09-08 19:30, Mikhail Gavrilov wrote: I have done additional tests: 1. The computer does not hang with 6900X

Re: 6.11/regression/bisected - after commit 1b04dcca4fb1, launching some RenPy games causes computer hang

2024-09-10 Thread Mikhail Gavrilov
On Tue, Sep 10, 2024 at 8:47 PM Leo Li wrote: > > Thanks Mikhail, I think I know what's going on now. > > The `scale-monitor-framebuffer` experimental setting is what puts us down the > bad code path. It seems VRR has nothing to do with this issue, just setting > `scale-monitor-framebuffer` is eno

RE: [PATCH] drm/amdgpu: disable GPU RAS bad page feature for specific ASIC

2024-09-10 Thread Zhou1, Tao
[AMD Official Use Only - AMD Internal Distribution Only] It's not true, the feature on gpu side is ASIC specific even for APU. Regards, Tao > -----Original Message----- > From: Lazar, Lijo > Sent: Tuesday, September 10, 2024 9:58 PM > To: Zhou1, Tao ; amd-gfx@lists.freedesktop.org > Cc: Zhang,

Re: [PATCH 3/6] drm/amdgpu: screen freeze and userq driver crash

2024-09-10 Thread Paneer Selvam, Arunpravin
Hi Christian, On 9/5/2024 4:50 PM, Christian König wrote: Am 30.08.24 um 20:43 schrieb Arunpravin Paneer Selvam: Screen freeze and userq fence driver crash while playing Xonotic Signed-off-by: Arunpravin Paneer Selvam --- drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c | 12 +++-

Re: [PATCH 1/6] drm/amdgpu: Implement userqueue signal/wait IOCTL

2024-09-10 Thread Paneer Selvam, Arunpravin
Hi Christian, On 9/5/2024 2:10 PM, Christian König wrote: Am 30.08.24 um 20:43 schrieb Arunpravin Paneer Selvam: This patch introduces new IOCTL for userqueue secure semaphore. The signal IOCTL called from userspace application creates a drm syncobj and array of bo GEM handles and passed in as

Re: [PATCH] drm/amdgpu/gfx9.4.3: drop extra wrapper

2024-09-10 Thread Lazar, Lijo
On 9/10/2024 8:19 PM, Alex Deucher wrote: > Drop wrapper used in one place. gfx_v9_4_3_xcc_cp_enable() > is used in one place. gfx_v9_4_3_xcc_cp_compute_enable() > is used everywhere else. > > Signed-off-by: Alex Deucher Reviewed-by: Lijo Lazar Thanks, Lijo > --- > drivers/gpu/drm/amd/a

Re: [PATCH v2 2/2] drm/amdgpu: track bo memory stats at runtime

2024-09-10 Thread Christian König
Am 10.09.24 um 19:40 schrieb Li, Yunxiang (Teddy): [Public] Ok that looks extremely ugly. Please just add a separate function and call that from the TTM move function. Should I still remove the adev argument? It is never used and causes a few call sites having to find an adev unnecessarily.

Re: [PATCH v1] drm/amdgpu: fix typo in the comment

2024-09-10 Thread Christian König
Am 11.09.24 um 06:27 schrieb Yan Zhen: Correctly spelled comments make it easier for the reader to understand the code. Replace 'udpate' with 'update' in the comment & replace 'recieved' with 'received' in the comment & replace 'dsiable' with 'disable' in the comment & replace 'Initiailize' with