[PATCH] drm/amdgpu/sriov: Check pending job finished or not to identify has bad job

2024-11-07 Thread Shikang Fan
drm_sched_free_job_work is a queued work function, so even after a job has finished in hardware, it still takes some time for drm_sched_free_job_work to delete it from the pending queue. This patch iterates over the pending job list and waits for each job to finish within the specified timeout (1s by default) to avoid jobs t

Re: [PATCH] drm/amdgpu: fix warning when removing sysfs

2024-11-07 Thread Christian König
On 08.11.24 at 03:21, jesse.zh...@amd.com wrote: Fix similar warning when running IGT: [ 155.585721] kernfs: can not remove 'enforce_isolation', no directory [ 155.592201] WARNING: CPU: 3 PID: 6960 at fs/kernfs/dir.c:1683 kernfs_remove_by_name_ns+0xb9/0xc0 [ 155.601145] Modules linked in: x

RE: [PATCH] drm/amdgpu: fix warning when removing sysfs

2024-11-07 Thread Huang, Tim
[AMD Official Use Only - AMD Internal Distribution Only] Hi Jesse, > -----Original Message----- > From: jesse.zh...@amd.com > Sent: Friday, November 8, 2024 10:22 AM > To: amd-gfx@lists.freedesktop.org > Cc: Deucher, Alexander; Koenig, Christian; Prosyak, Vitaly; Huang, Tim; Zhang, Jesse

[PATCH] drm/amdgpu: fix warning when removing sysfs

2024-11-07 Thread jesse.zh...@amd.com
Fix similar warning when running IGT: [ 155.585721] kernfs: can not remove 'enforce_isolation', no directory [ 155.592201] WARNING: CPU: 3 PID: 6960 at fs/kernfs/dir.c:1683 kernfs_remove_by_name_ns+0xb9/0xc0 [ 155.601145] Modules linked in: xt_MASQUERADE xt_comment nft_compat veth bridge stp

Re: [PATCH] drm/amdgpu: Add documentation for enforce isolation feature

2024-11-07 Thread Deucher, Alexander
[Public] Reviewed-by: Alex Deucher From: SHANMUGAM, SRINIVASAN Sent: Thursday, November 7, 2024 2:59 PM To: Koenig, Christian; Deucher, Alexander Cc: amd-gfx@lists.freedesktop.org; SHANMUGAM, SRINIVASAN; Kamal, Asad Subject: [PATCH] drm/amdgpu: Add documen

[PATCH] drm/amdgpu: Add documentation for enforce isolation feature

2024-11-07 Thread Srinivasan Shanmugam
This feature enables process isolation on the graphics engine by serializing access to it and adding a cleaner shader which clears LDS (Local Data Store) and GPRs (General Purpose Registers) between jobs. Cc: Christian König Cc: Alex Deucher Signed-off-by: Srinivasan Shanmugam Suggested-by: Ale

[PATCH] drm/amdgpu/gfx11: Enable cleaner shader for GFX11.0.0/11.0.2 GPUs

2024-11-07 Thread Srinivasan Shanmugam
Enable the cleaner shader for GFX11.0.0/11.0.2 GPUs to provide data isolation between GPU workloads. The cleaner shader is responsible for clearing the Local Data Store (LDS), Vector General Purpose Registers (VGPRs), and Scalar General Purpose Registers (SGPRs), which helps prevent data leakage an

Re: [PATCH v6 4/5] drm: add drm_memory_stats_is_zero

2024-11-07 Thread Christian König
On 07.11.24 at 15:43, Tvrtko Ursulin wrote: On 07/11/2024 14:17, Li, Yunxiang (Teddy) wrote: [AMD Official Use Only - AMD Internal Distribution Only] From: Tvrtko Ursulin Sent: Thursday, November 7, 2024 5:41 On 25/10/2024 18:41, Yunxiang Li wrote: Add a helper to check if the memory stats

Re: no-retry page fault on 6.11.6 kernel with radeon VII

2024-11-07 Thread Alex Deucher
On Thu, Nov 7, 2024 at 3:03 AM Kenneth Topp wrote: > > Greetings, > > I'm getting no-retry page fault fatal errors (kills Xwayland): > > [ 177.470230 <6.102062 >] myhost kernel: amdgpu :03:00.0: > amdgpu: [gfxhub0] no-retry page fault (src_id:0 ring:158 vmid:3 > pasid:32776) > [ 177.4704

[pull] drm-fixes-6.12

2024-11-07 Thread Alex Deucher
Hi Dave, Simona, Fixes for 6.12. The following changes since commit 59b723cd2adbac2a34fc8e12c74ae26ae45bf230: Linux 6.12-rc6 (2024-11-03 14:05:52 -1000) are available in the Git repository at: https://gitlab.freedesktop.org/agd5f/linux.git tags/amd-drm-fixes-6.12-2024-11-07 for you to fe

Re: [PATCH v2 05/10] sysfs: treewide: constify attribute callback of bin_is_visible()

2024-11-07 Thread Pratyush Yadav
On Sun, Nov 03 2024, Thomas Weißschuh wrote: > The is_bin_visible() callbacks should not modify the struct > bin_attribute passed as argument. > Enforce this by marking the argument as const. > > As there are not many callback implementers perform this change > throughout the tree at once. > > Sig

Re: [PATCH v6 4/5] drm: add drm_memory_stats_is_zero

2024-11-07 Thread Tvrtko Ursulin
On 25/10/2024 18:41, Yunxiang Li wrote: Add a helper to check if the memory stats is zero, this will be used to check for memory accounting errors. Signed-off-by: Yunxiang Li --- drivers/gpu/drm/drm_file.c | 9 + include/drm/drm_file.h | 1 + 2 files changed, 10 insertions(+)

RE: [PATCH v6 5/5] drm/amdgpu: track bo memory stats at runtime

2024-11-07 Thread Li, Yunxiang (Teddy)
[Public] > From: Tvrtko Ursulin > Sent: Thursday, November 7, 2024 5:48 > On 31/10/2024 13:48, Li, Yunxiang (Teddy) wrote: > > [Public] > > > >> From: Christian König > >> Sent: Thursday, October 31, 2024 8:54 On 25.10.24 at 19:41, > >> Yunxiang Li wrote: > >>> Before, every time fdinfo is quer

Re: [PATCH v3 1/2] drm/display/dsc: Refactor DRM MST DSC Determination Policy

2024-11-07 Thread Dan Carpenter
://anongit.freedesktop.org/drm/drm drm-next patch link: https://lore.kernel.org/r/20241106150444.424579-2-Jerry.Zuo%40amd.com patch subject: [PATCH v3 1/2] drm/display/dsc: Refactor DRM MST DSC Determination Policy config: i386-randconfig-141-20241107 (https://download.01.org/0day-ci/archive/20241107

Re: [PATCH 2/2] drm/amdkfd: use cache GTT buffer for PQ and wb pool

2024-11-07 Thread Christian König
On 07.11.24 at 06:58, Lazar, Lijo wrote: On 11/6/2024 8:42 PM, Alex Deucher wrote: On Wed, Nov 6, 2024 at 1:49 AM Victor Zhao wrote: From: Monk Liu As cache GTT buffer is snooped, this way the coherence between CPU write and GPU fetch is guaranteed, but original code uses WC + unsnooped for

RE: [PATCH v6 4/5] drm: add drm_memory_stats_is_zero

2024-11-07 Thread Li, Yunxiang (Teddy)
[AMD Official Use Only - AMD Internal Distribution Only] > From: Tvrtko Ursulin > Sent: Thursday, November 7, 2024 5:41 > On 25/10/2024 18:41, Yunxiang Li wrote: > > Add a helper to check if the memory stats is zero, this will be used > > to check for memory accounting errors. > > > > Signed-off-

Re: [PATCH v2 03/10] PCI/sysfs: Calculate bin_attribute size through bin_size()

2024-11-07 Thread Bjorn Helgaas
On Sun, Nov 03, 2024 at 05:03:32PM +, Thomas Weißschuh wrote: > Stop abusing the is_bin_visible() callback to calculate the attribute > size. Instead use the new, dedicated bin_size() one. > > Signed-off-by: Thomas Weißschuh Acked-by: Bjorn Helgaas Thanks for doing this! > --- > drivers/

Re: [PATCH v2 02/10] sysfs: introduce callback attribute_group::bin_size

2024-11-07 Thread Krzysztof Wilczyński
Hello, [...] > > There exist the sysfs_update_groups(), but the BAR resource sysfs objects > > are currently, at least not yet, added to any attribute group. > > then maybe they should be added to one :) Yeah. There is work in progress that will take care of some of this. Krzysztof

Re: [PATCH v6 5/5] drm/amdgpu: track bo memory stats at runtime

2024-11-07 Thread Tvrtko Ursulin
On 31/10/2024 13:48, Li, Yunxiang (Teddy) wrote: [Public] From: Christian König Sent: Thursday, October 31, 2024 8:54 On 25.10.24 at 19:41, Yunxiang Li wrote: Before, every time fdinfo is queried we try to lock all the BOs in the VM and calculate memory usage from scratch. This works okay

RE: [PATCH v2 4/4] drm/amdgpu: Implement virt req_ras_err_count

2024-11-07 Thread Zhou1, Tao
[AMD Official Use Only - AMD Internal Distribution Only] For amdgpu_ras_block_to_sriov, can we return block directly? As the definition of enum amdgpu_ras_block is same as that of enum amd_sriov_ras_telemetry_gpu_block. Anyway, the framework is fine for me, the series is: Acked-by: Tao Zhou >

Re: [PATCH] amdkfd: Explicitly specify data type amdkfd_process_info in related functions

2024-11-07 Thread Zhu Lingshan
On 11/5/2024 5:49 AM, Felix Kuehling wrote: > On 2024-10-31 23:20, Zhu Lingshan wrote: >> On 10/22/2024 4:01 PM, Zhu Lingshan wrote: >>> On 10/22/2024 12:20 PM, Felix Kuehling wrote: On 2024-10-14 23:51, Zhu Lingshan wrote: > This commit specifies data type struct amdkfd_process_info >

RE: [PATCH] drm/amdgpu/mes12: correct kiq unmap latency

2024-11-07 Thread Zhang, Hawking
[AMD Official Use Only - AMD Internal Distribution Only] Reviewed-by: Hawking Zhang Regards, Hawking -Original Message- From: Xiao, Jack Sent: Thursday, November 7, 2024 15:39 To: amd-gfx@lists.freedesktop.org; Deucher, Alexander ; Zhang, Hawking Cc: Xiao, Jack Subject: [PATCH] drm/a

Re: [PATCH 6.6 28/28] maple_tree: correct tree corruption on spanning store

2024-11-07 Thread Lorenzo Stoakes
On Thu, Oct 24, 2024 at 09:22:25PM +0800, Yu Kuai wrote: > diff --git a/lib/maple_tree.c b/lib/maple_tree.c > index 5328e08723d7..c57b6fc4db2e 100644 > --- a/lib/maple_tree.c > +++ b/lib/maple_tree.c > @@ -2239,6 +2239,8 @@ static inline void mas_node_or_none(struct ma_state > *mas, > > /* > *

Re: [PATCH 6.6 00/28] fix CVE-2024-46701

2024-11-07 Thread Yu Kuai
Hi, On 2024/11/06 22:43, Lorenzo Stoakes wrote: NACK. Do this some other way that isn't a terrible mess. You've reverted my CRITICAL fix, then didn't cc- me so I'm grumpy. Even if you bizarrely brought it back later. Don't fail to cc- people you revert in future, please, especially in stable. It

Re: [PATCH 6.6 00/28] fix CVE-2024-46701

2024-11-07 Thread Lorenzo Stoakes
NACK. Do this some other way that isn't a terrible mess. You've reverted my CRITICAL fix, then didn't cc- me so I'm grumpy. Even if you bizarrely brought it back later. Don't fail to cc- people you revert in future, please, especially in stable. It's not only discourteous it's also an actual se

no-retry page fault on 6.11.6 kernel with radeon VII

2024-11-07 Thread Kenneth Topp
Greetings, I'm getting no-retry page fault fatal errors (kills Xwayland): [ 177.470230 <6.102062 >] myhost kernel: amdgpu :03:00.0: amdgpu: [gfxhub0] no-retry page fault (src_id:0 ring:158 vmid:3 pasid:32776) [ 177.470483 <0.000253 >] myhost kernel: amdgpu :03:00.0: amdgpu: for

Re: [PATCH 6.6 00/28] fix CVE-2024-46701

2024-11-07 Thread Liam R. Howlett
* Greg KH [241106 01:16]: > On Thu, Oct 24, 2024 at 09:19:41PM +0800, Yu Kuai wrote: > > From: Yu Kuai > > > > Fix patch is patch 27, relied patches are from: > > > > - patches from set [1] to add helpers to maple_tree, the last patch to > > improve fork() performance is not backported; > > S

Re: [PATCH 6.6 00/28] fix CVE-2024-46701

2024-11-07 Thread James Bottomley
On Wed, 2024-11-06 at 15:19 +, Chuck Lever III wrote: > This is the first I've heard of this CVE. It > would help if the patch authors got some > notification when these are filed. Greg did it; it came from the kernel CNA: https://www.cve.org/CVERecord?id=CVE-2024-46701 The way it seems to w

Re: [PATCH 6.6 28/28] maple_tree: correct tree corruption on spanning store

2024-11-07 Thread Yu Kuai
Hi, On 2024/11/06 23:02, Lorenzo Stoakes wrote: On Thu, Oct 24, 2024 at 09:22:25PM +0800, Yu Kuai wrote: diff --git a/lib/maple_tree.c b/lib/maple_tree.c index 5328e08723d7..c57b6fc4db2e 100644 --- a/lib/maple_tree.c +++ b/lib/maple_tree.c @@ -2239,6 +2239,8 @@ static inline void mas_node_or_none(

Re: [PATCH 6.6 00/28] fix CVE-2024-46701

2024-11-07 Thread Chuck Lever III
> On Nov 6, 2024, at 1:16 AM, Greg KH wrote: > > On Thu, Oct 24, 2024 at 09:19:41PM +0800, Yu Kuai wrote: >> From: Yu Kuai >> >> Fix patch is patch 27, relied patches are from: I assume patch 27 is: libfs: fix infinite directory reads for offset dir https://lore.kernel.org/stable/202410241

Re: [PATCH 6.6 00/28] fix CVE-2024-46701

2024-11-07 Thread Yu Kuai
Hi, On 2024/11/06 23:19, Chuck Lever III wrote: On Nov 6, 2024, at 1:16 AM, Greg KH wrote: On Thu, Oct 24, 2024 at 09:19:41PM +0800, Yu Kuai wrote: From: Yu Kuai Fix patch is patch 27, relied patches are from: I assume patch 27 is: libfs: fix infinite directory reads for offset dir https