Re: [PATCH 004/156] drm/nouveau: pass drm to nv50_dmac_create(), rather than device+disp

2024-04-16 Thread Philipp Stanner
On Wed, 2024-04-17 at 09:37 +1000, Ben Skeggs wrote: > - zero reason to do otherwise to do _what_ otherwise? that refers to the title, it seems. But one could describe why it was ever done that way (older architecture? bug? mistake?) in the first place. The commit messages in this entire series a

Re: [RFC PATCH 7/8] rust: add firmware abstractions

2024-05-22 Thread Philipp Stanner
On Wed, 2024-05-22 at 08:53 +0900, FUJITA Tomonori wrote: > Hi, > Thanks for working on the firmware API! > > On Mon, 20 May 2024 19:24:19 +0200 > Danilo Krummrich wrote: > > > Add an abstraction around the kernel's firmware API to request > > firmware > > images. The abstraction provides functio

Re: LLM GPU Support

2024-06-05 Thread Philipp Stanner
On Tue, 2024-06-04 at 10:27 -0500, Blake McBride wrote: > Greetings, > > I have used the nouveau driver with my Nvidia card on Linux.  Works > fine.  However, my problem has to do with running LLM on my GPU with > your driver.  My impression is, it doesn't work.  Am I correct? Yo, "it doesn't wo

[PATCH] drm/nouveau: Improve variable names in nouveau_sched_init()

2024-07-11 Thread Philipp Stanner
's "timeout" parameter. The actual "hang_limit" parameter is directly set to 0. Define a new variable and rename the existing one to make naming congruent with the function API. Signed-off-by: Philipp Stanner --- drivers/gpu/drm/nouveau/nouveau_sched.c | 5 +++-- 1 file

[PATCH v2] drm/nouveau: Improve variable names in nouveau_sched_init()

2024-07-11 Thread Philipp Stanner
's "timeout" parameter. The actual "hang_limit" parameter is directly set to 0. Rename "job_hang_limit" to "timeout". Signed-off-by: Philipp Stanner --- Changes in v2: - Remove variable "hang_limit". (Danilo) --- drivers/gpu/drm/nouveau/nouveau_sc

Re: [Nouveau] [PATCH 03/44] drm/nouveau/nvkm: bump maximum number of NVJPG

2023-10-24 Thread Philipp Stanner
On Tue, 2023-09-19 at 06:21 +1000, Ben Skeggs wrote: > From: Ben Skeggs > > RM (and GH100) support 8 NVJPG instances. I don't think the commit message provides enough information. Instinctively I would read the RM as "remove", thus "remove support [for] 8 NVJPG instances" ??? Two sentences (wit

Re: [Nouveau] [PATCH 17/44] drm/nouveau/mmu/tu102-: prepare for GSP-RM

2023-10-24 Thread Philipp Stanner
On Tue, 2023-09-19 at 06:21 +1000, Ben Skeggs wrote: > From: Ben Skeggs > > - (temporarily) disable if GSP-RM detected, will be added later disable _what_? The other commit messages at least briefly name the component. This one should as well. Furthermore, I'd say that the wording should be som

Re: [Nouveau] [PATCH 31/44] drm/nouveau/nvkm: support loading fws into sg_table

2023-10-24 Thread Philipp Stanner
On Tue, 2023-09-19 at 06:21 +1000, Ben Skeggs wrote: > From: Ben Skeggs > > - preparation for GSP-RM, which has massive FW images > - based on a patch by Dave Airlie Probably more canonical to use one of the standard phrases such as Suggested-by > > Signed-off-by: Ben Skeggs > --- >  .../drm

Re: [Nouveau] [PATCH 33/44] drm/nouveau/gsp/r535: add support for rm control

2023-10-24 Thread Philipp Stanner
On Tue, 2023-09-19 at 06:21 +1000, Ben Skeggs wrote: > From: Ben Skeggs > > Adds the plumbing to start making RM control calls, and initialises > objects to represent internal RM objects provided to us during init. > > These will be used by subsequent patches. > > Signed-off-by: Ben Skeggs > -

Re: [Nouveau] [PATCH 42/44] drm/nouveau/nvenc/r535: initial support

2023-10-24 Thread Philipp Stanner
On Tue, 2023-09-19 at 06:21 +1000, Ben Skeggs wrote: > From: Ben Skeggs > > Adds support for allocating VIDEO_ENCODER classes from RM. > > Signed-off-by: Ben Skeggs > --- >  drivers/gpu/drm/nouveau/include/nvif/class.h  |   4 + >  .../drm/nouveau/include/nvkm/engine/nvenc.h   |   2 + >  .../535

Re: [Nouveau] [bug report] drm/nouveau/gsp/r535: add support for booting GSP-RM

2023-11-07 Thread Philipp Stanner
On Tue, 2023-11-07 at 17:34 +0300, Dan Carpenter wrote: > Hello Ben Skeggs, Hi, FYI, Ben is not maintaining Nouveau anymore. The MAINTAINERS file has been updated in that regard. P. > > The patch 176fdcbddfd2: "drm/nouveau/gsp/r535: add support for > booting GSP-RM" from Sep 19, 2023 (linux-ne

Re: [PATCH 1/2] nouveau: handle EBUSY and EAGAIN for GSP aux errors.

2024-11-13 Thread Philipp Stanner
On Mon, 2024-11-11 at 13:41 +1000, Dave Airlie wrote: > From: Dave Airlie > > The upper layer transfer functions expect EBUSY as a return > for when retries should be done. > > Fix the AUX error translation, but also check for both errors > in a few places. > > Fixes: eb284f4b3781 ("drm/nouveau

[PATCH v3] drm/sched: Use struct for drm_sched_init() params

2025-02-07 Thread Philipp Stanner
_init()"). Introduce a new struct for the scheduler init parameters and port all users. Signed-off-by: Philipp Stanner Reviewed-by: Liviu Dudau Acked-by: Matthew Brost # for Xe Reviewed-by: Boris Brezillon # for Panfrost and Panthor Reviewed-by: Christian Gmeiner # for Etnaviv Reviewed

[PATCH v4] drm/sched: Use struct for drm_sched_init() params

2025-02-11 Thread Philipp Stanner
me in nouveau_sched_init()"). Introduce a new struct for the scheduler init parameters and port all users. Signed-off-by: Philipp Stanner Reviewed-by: Liviu Dudau Acked-by: Matthew Brost # for Xe Reviewed-by: Boris Brezillon # for Panfrost and Panthor Reviewed-by: Christian Gmeiner # for Etnavi

Re: [PATCH] drm/sched: Use struct for drm_sched_init() params

2025-01-22 Thread Philipp Stanner
On Wed, 2025-01-22 at 15:34 +0100, Christian König wrote: > Am 22.01.25 um 15:08 schrieb Philipp Stanner: > > drm_sched_init() has a great many parameters and upcoming new > > functionality for the scheduler might add even more. Generally, the > > great number of parameters re

Re: [PATCH] drm/sched: Use struct for drm_sched_init() params

2025-01-22 Thread Philipp Stanner
On Wed, 2025-01-22 at 16:06 +0100, Christian König wrote: > Am 22.01.25 um 15:48 schrieb Philipp Stanner: > > On Wed, 2025-01-22 at 15:34 +0100, Christian König wrote: > > > Am 22.01.25 um 15:08 schrieb Philipp Stanner: > > > > drm_sched_init() has a great ma

[PATCH] drm/sched: Use struct for drm_sched_init() params

2025-01-22 Thread Philipp Stanner
me in nouveau_sched_init()"). Introduce a new struct for the scheduler init parameters and port all users. Signed-off-by: Philipp Stanner --- Howdy, I have a patch-series in the pipe that will add a `flags` argument to drm_sched_init(). I thought it would be wise to first rework the API as detailed in this p

Re: [PATCH] drm/sched: Use struct for drm_sched_init() params

2025-01-23 Thread Philipp Stanner
On Thu, 2025-01-23 at 09:10 +0100, Philipp Stanner wrote: > On Wed, 2025-01-22 at 19:07 -0300, Maíra Canal wrote: > > Hi Philipp, > > > > On 22/01/25 11:08, Philipp Stanner wrote: > > > drm_sched_init() has a great many parameters and upcoming new > > > fu

Re: [PATCH] drm/sched: Use struct for drm_sched_init() params

2025-01-22 Thread Philipp Stanner
> > On Wed, 22 Jan 2025 15:08:20 +0100 > > > > Philipp Stanner wrote: > > > >   > > > > > --- a/drivers/gpu/drm/panthor/panthor_sched.c > > > > > +++ b/drivers/gpu/drm/panthor/panthor_sched.c > > > > > @@ -3272,6 +3272,7 @@ group_creat

Re: [PATCH] drm/sched: Use struct for drm_sched_init() params

2025-01-23 Thread Philipp Stanner
On Wed, 2025-01-22 at 19:07 -0300, Maíra Canal wrote: > Hi Philipp, > > On 22/01/25 11:08, Philipp Stanner wrote: > > drm_sched_init() has a great many parameters and upcoming new > > functionality for the scheduler might add even more. Generally, the > > great n

[PATCH v2] drm/sched: Use struct for drm_sched_init() params

2025-01-28 Thread Philipp Stanner
me in nouveau_sched_init()"). Introduce a new struct for the scheduler init parameters and port all users. Signed-off-by: Philipp Stanner --- Changes in v2: - Point out that the hang-limit is deprecated. (Christian) - Initialize the structs to 0 at declaration. (Planet Earth) - Don't set stuff

[RFC PATCH 0/5] drm/sched: Fix memory leaks in drm_sched_fini()

2025-03-24 Thread Philipp Stanner
reliable, clean scheduler API. Philipp Philipp Stanner (5): drm/sched: Fix teardown leaks with waitqueue drm/sched: Prevent teardown waitque from blocking too long drm/sched: Warn if pending list is not empty drm/nouveau: Add new callback for scheduler teardown drm/nouveau: Remove wait

[RFC PATCH 2/5] drm/sched: Prevent teardown waitque from blocking too long

2025-03-24 Thread Philipp Stanner
callback is not implemented. Suggested-by: Danilo Krummrich Signed-off-by: Philipp Stanner --- drivers/gpu/drm/scheduler/sched_main.c | 47 +- include/drm/gpu_scheduler.h| 11 ++ 2 files changed, 42 insertions(+), 16 deletions(-) diff --git a/drivers/gpu

[RFC PATCH 1/5] drm/sched: Fix teardown leaks with waitqueue

2025-03-24 Thread Philipp Stanner
From: Philipp Stanner The GPU scheduler currently does not ensure that its pending_list is empty before performing various other teardown tasks in drm_sched_fini(). If there are still jobs in the pending_list, this is problematic because after scheduler teardown, no one will call

[RFC PATCH 4/5] drm/nouveau: Add new callback for scheduler teardown

2025-03-24 Thread Philipp Stanner
There is a new callback for always tearing the scheduler down in a leak-free, deadlock-free manner. Port Nouveau as its first user by providing the scheduler with a callback that ensures the fence context gets killed in drm_sched_fini(). Signed-off-by: Philipp Stanner --- drivers/gpu/drm

[RFC PATCH 3/5] drm/sched: Warn if pending list is not empty

2025-03-24 Thread Philipp Stanner
drm_sched_fini() can leak jobs under certain circumstances. Warn if that happens. Signed-off-by: Philipp Stanner --- drivers/gpu/drm/scheduler/sched_main.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c

[RFC PATCH 5/5] drm/nouveau: Remove waitque for sched teardown

2025-04-05 Thread Philipp Stanner
nouveau_sched_fence_context_kill() the waitque is not necessary anymore. Remove the waitque. Signed-off-by: Philipp Stanner --- drivers/gpu/drm/nouveau/nouveau_sched.c | 20 +++- drivers/gpu/drm/nouveau/nouveau_sched.h | 9 +++-- drivers/gpu/drm/nouveau/nouveau_uvmm.c | 8 3 files

[PATCH] drm/nouveau: Prevent signalled fences in pending list

2025-03-27 Thread Philipp Stanner
fence") Signed-off-by: Philipp Stanner --- I'm not entirely sure what Fixes-Tag is appropriate. The last time the line causing the signalled fence in the list was touched is the commit listed above. --- drivers/gpu/drm/nouveau/nouveau_fence.c | 41 - drivers/

[PATCH 2/3] drm/nouveau: Remove surplus if-branch

2025-04-10 Thread Philipp Stanner
. Remove the if-branch. Signed-off-by: Philipp Stanner --- drivers/gpu/drm/nouveau/nouveau_fence.c | 24 +++- 1 file changed, 11 insertions(+), 13 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c index 33535987d8ed

[PATCH 3/5] drm/sched: Warn if pending list is not empty

2025-04-09 Thread Philipp Stanner
drm_sched_fini() can leak jobs under certain circumstances. Warn if that happens. Signed-off-by: Philipp Stanner --- drivers/gpu/drm/scheduler/sched_main.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c

[PATCH 0/3] drm/nouveau: Fix & improve nouveau_fence_done()

2025-04-10 Thread Philipp Stanner
a bug, or rather: the archetype of a race, since (as Christian pointed out) nouveau_fence_update() later at some point will remove the signaled fence (by signaling it again). P. Philipp Stanner (3): drm/nouveau: Prevent signaled fences in pending list drm/nouveau: Remove surplus if-branch

[PATCH 3/3] drm/nouveau: Add helper to check base fence

2025-04-10 Thread Philipp Stanner
(). Signed-off-by: Philipp Stanner --- drivers/gpu/drm/nouveau/nouveau_fence.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c index db6f4494405c..0d58a81b3402 100644 --- a/drivers/gpu/drm

[PATCH 1/3] drm/nouveau: Prevent signaled fences in pending list

2025-04-10 Thread Philipp Stanner
nouveau_fence_base_is_signaled(). Cc: # 4.10+, precise commit not to be determined Signed-off-by: Philipp Stanner --- drivers/gpu/drm/nouveau/nouveau_fence.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau

[PATCH 4/5] drm/nouveau: Add new callback for scheduler teardown

2025-04-10 Thread Philipp Stanner
There is a new callback for always tearing the scheduler down in a leak-free, deadlock-free manner. Port Nouveau as its first user by providing the scheduler with a callback that ensures the fence context gets killed in drm_sched_fini(). Signed-off-by: Philipp Stanner --- drivers/gpu/drm

Memory leaks from r535_gsp_oneinit()

2025-03-29 Thread Philipp Stanner
I see two small memory leaks on a Fedora 41 desktop with a custom built kernel @ commit: 27d4815149ba drm/sched: Group exported prototypes by object type GPU is an RTX 5000 Ada The leaks are there immediately after booting the machine. They don't seem to reoccur, although I have not verified thi

[PATCH 0/5] drm/sched: Fix memory leaks in drm_sched_fini()

2025-04-07 Thread Philipp Stanner
t can provide users with a more reliable, clean scheduler API. Philipp Philipp Stanner (5): drm/sched: Fix teardown leaks with waitqueue drm/sched: Prevent teardown waitque from blocking too long drm/sched: Warn if pending list is not empty drm/nouveau: Add new callback for scheduler te

[PATCH] drm/nouveau: Remove forgotten TODO

2025-04-09 Thread Philipp Stanner
y assumed that the TODO is not needed anymore. Besides, its content is useless anyways since it does not specify *what* should have been done. Remove the TODO. Signed-off-by: Philipp Stanner --- drivers/gpu/drm/nouveau/nouveau_chan.h | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/gpu

[PATCH 0/2] dma-fence: Rename dma_fence_is_signaled()

2025-04-09 Thread Philipp Stanner
at it becomes very, very explicit when reading code that this is a place where fences can get signaled. This series obsoletes this patch: [2] P. [1] https://lore.kernel.org/all/20250403101353.42880-2-pha...@kernel.org/ [2] https://lore.kernel.org/all/20250408122217.61530-2-pha...@kernel.org/ Ph

[PATCH 1/2] dma-fence: Rename dma_fence_is_signaled()

2025-04-09 Thread Philipp Stanner
nouveau_fence_done() uses the function to check a fence, which causes a race. Give the function a more obvious name. Signed-off-by: Philipp Stanner --- drivers/dma-buf/dma-fence-array.c | 2 +- drivers/dma-buf/dma-fence-chain.c | 6 +++--- drivers/dma-buf/dma-

[PATCH 2/2] dma-fence: Improve docu for dma_fence_check_and_signal()

2025-04-09 Thread Philipp Stanner
style. Signed-off-by: Philipp Stanner --- include/linux/dma-fence.h | 26 -- 1 file changed, 20 insertions(+), 6 deletions(-) diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h index dc2ad171458b..3df370b2cc7c 100644 --- a/include/linux/dma-fence.h +++ b/in

[PATCH 5/5] drm/nouveau: Remove waitque for sched teardown

2025-04-07 Thread Philipp Stanner
nouveau_sched_fence_context_kill() the waitque is not necessary anymore. Remove the waitque. Signed-off-by: Philipp Stanner --- drivers/gpu/drm/nouveau/nouveau_sched.c | 20 +++- drivers/gpu/drm/nouveau/nouveau_sched.h | 9 +++-- drivers/gpu/drm/nouveau/nouveau_uvmm.c | 8 3 files

[PATCH v2 0/2] drm/nouveau: Don't set signaled fences' error codes

2025-04-15 Thread Philipp Stanner
" before. I've tested this with KASAN & kmemleak. P. Philipp Stanner (2): drm/nouveau: Fix WARN_ON in nouveau_fence_context_kill() drm/nouveau: nouveau_fence: Standardize list iterations drivers/gpu/drm/nouveau/nouveau_fence.c | 12 +++- 1 file changed, 7 insertions(+), 5 deletions(-) -- 2.48.1

[PATCH v2 1/2] drm/nouveau: Fix WARN_ON in nouveau_fence_context_kill()

2025-04-15 Thread Philipp Stanner
Fixes: ea13e5abf807 ("drm/nouveau: signal pending fences when channel has been killed") Suggested-by: Christian König Signed-off-by: Philipp Stanner --- drivers/gpu/drm/nouveau/nouveau_fence.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/nouveau/nou

[PATCH v2 2/2] drm/nouveau: nouveau_fence: Standardize list iterations

2025-04-15 Thread Philipp Stanner
nouveau_fence.c iterates over lists in a non-canonical way. Since the operations done are just basic for-each-loops, they should be written in the standard form. Use for_each_safe() instead of the custom loop iterations. Signed-off-by: Philipp Stanner --- drivers/gpu/drm/nouveau

[PATCH 2/4] drm/nouveau: Simplify calls to nvif_event_block()

2025-04-24 Thread Philipp Stanner
nouveau_fence_signal() returns a de-facto boolean to indicate when nvif_event_block() shall be called. The code can be made more compact and readable by calling nvif_event_block() in nouveau_fence_update() directly. Make those calls in nouveau_fence.c more canonical. Signed-off-by: Philipp

[PATCH 4/4] drm/nouveau: Check dma_fence in canonical way

2025-04-24 Thread Philipp Stanner
In nouveau_fence_done(), a fence is checked for being signaled by manually evaluating the base fence's bits. This can be done in a canonical manner through dma_fence_is_signaled(). Replace the bit-check with dma_fence_is_signaled(). Signed-off-by: Philipp Stanner --- drivers/gpu/drm/no

[PATCH 0/4] drm/nouveau: Simplify nouveau_fence.c

2025-04-24 Thread Philipp Stanner
/ Philipp Stanner (4): drm/nouveau: nouveau_fence: Standardize list iterations drm/nouveau: Simplify calls to nvif_event_block() drm/nouveau: Simplify nouveau_fence_done() drm/nouveau: Check dma_fence in canonical way drivers/gpu/drm/nouveau/nouveau_fence.c | 72 +++-- 1 file

[PATCH 3/4] drm/nouveau: Simplify nouveau_fence_done()

2025-04-24 Thread Philipp Stanner
nouveau_fence_done() contains an if branch that checks whether a nouveau_fence has either of the two existing nouveau_fence backend ops, which will always evaluate to true. Remove the surplus check. Signed-off-by: Philipp Stanner --- drivers/gpu/drm/nouveau/nouveau_fence.c | 24

[PATCH 1/4] drm/nouveau: nouveau_fence: Standardize list iterations

2025-04-24 Thread Philipp Stanner
nouveau_fence.c iterates over lists in a non-canonical way. Since the operations done are just basic for-each-loops and list-empty checks, they should be written in the standard form. Use standard list operations. Signed-off-by: Philipp Stanner --- drivers/gpu/drm/nouveau/nouveau_fence.c | 21

[PATCH 2/5] drm/sched: Prevent teardown waitque from blocking too long

2025-04-10 Thread Philipp Stanner
callback is not implemented. Suggested-by: Danilo Krummrich Signed-off-by: Philipp Stanner --- drivers/gpu/drm/scheduler/sched_main.c | 47 +- include/drm/gpu_scheduler.h| 11 ++ 2 files changed, 42 insertions(+), 16 deletions(-) diff --git a/drivers/gpu

Re: [PATCH 1/5] drm/sched: Fix teardown leaks with waitqueue

2025-04-17 Thread Philipp Stanner
On Mon, 2025-04-07 at 17:22 +0200, Philipp Stanner wrote: > From: Philipp Stanner > > The GPU scheduler currently does not ensure that its pending_list is > empty before performing various other teardown tasks in > drm_sched_fini(). > > If there are still jobs in the

[PATCH v2 6/6] drm/sched: Port unit tests to new cleanup design

2025-04-24 Thread Philipp Stanner
the unit tests. Remove the manual cleanup code. Signed-off-by: Philipp Stanner --- .../gpu/drm/scheduler/tests/mock_scheduler.c | 34 --- 1 file changed, 21 insertions(+), 13 deletions(-) diff --git a/drivers/gpu/drm/scheduler/tests/mock_scheduler.c b/drivers/gpu/drm/scheduler

[PATCH v2] drm/nouveau: Remove surplus accel_done

2025-04-24 Thread Philipp Stanner
d, therefore, must be a relict forgotten in a previous cleanup. Remove the TODO and accel_done. Signed-off-by: Philipp Stanner --- Changes in v2: - Remove accel_done, too. (Danilo) --- drivers/gpu/drm/nouveau/nouveau_chan.h | 2 -- drivers/gpu/drm/nouveau/nouveau_dma.h | 1 - 2 files changed

[PATCH v2 2/6] drm/sched: Prevent teardown waitque from blocking too long

2025-04-24 Thread Philipp Stanner
callback is not implemented. Suggested-by: Danilo Krummrich Signed-off-by: Philipp Stanner --- drivers/gpu/drm/scheduler/sched_main.c | 47 +- include/drm/gpu_scheduler.h| 11 ++ 2 files changed, 42 insertions(+), 16 deletions(-) diff --git a/drivers/gpu

[PATCH v2 4/6] drm/nouveau: Add new callback for scheduler teardown

2025-04-24 Thread Philipp Stanner
There is a new callback for always tearing the scheduler down in a leak-free, deadlock-free manner. Port Nouveau as its first user by providing the scheduler with a callback that ensures the fence context gets killed in drm_sched_fini(). Signed-off-by: Philipp Stanner --- drivers/gpu/drm

[PATCH v2 5/6] drm/nouveau: Remove waitque for sched teardown

2025-04-24 Thread Philipp Stanner
nouveau_sched_fence_context_kill() the waitque is not necessary anymore. Remove the waitque. Signed-off-by: Philipp Stanner --- drivers/gpu/drm/nouveau/nouveau_sched.c | 20 +++- drivers/gpu/drm/nouveau/nouveau_sched.h | 9 +++-- drivers/gpu/drm/nouveau/nouveau_uvmm.c | 8 3 files

[PATCH v2 0/6] drm/sched: Fix memory leaks in drm_sched_fini()

2025-04-24 Thread Philipp Stanner
ks fine and solves the problem (though we did discover an unrelated problem inside Nouveau in the process). It also works with the unit tests. I'm looking forward to your input and feedback. I really hope we can work this RFC into something that can provide users with a more reliable, clean

[PATCH v2 1/6] drm/sched: Fix teardown leaks with waitqueue

2025-04-24 Thread Philipp Stanner
From: Philipp Stanner The GPU scheduler currently does not ensure that its pending_list is empty before performing various other teardown tasks in drm_sched_fini(). If there are still jobs in the pending_list, this is problematic because after scheduler teardown, no one will call

[PATCH v2 3/6] drm/sched: Warn if pending list is not empty

2025-04-24 Thread Philipp Stanner
drm_sched_fini() can leak jobs under certain circumstances. Warn if that happens. Signed-off-by: Philipp Stanner --- drivers/gpu/drm/scheduler/sched_main.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c

[PATCH 1/5] drm/sched: Fix teardown leaks with waitqueue

2025-04-10 Thread Philipp Stanner
From: Philipp Stanner The GPU scheduler currently does not ensure that its pending_list is empty before performing various other teardown tasks in drm_sched_fini(). If there are still jobs in the pending_list, this is problematic because after scheduler teardown, no one will call

Re: Memory leaks from r535_gsp_oneinit()

2025-04-04 Thread Philipp Stanner
On Fri, 2025-03-21 at 09:56 +0100, Philipp Stanner wrote: > I see two small memory leaks on a Fedora 41 desktop with a custom > built > kernel @ commit: > > 27d4815149ba drm/sched: Group exported prototypes by object type > > GPU is an RTX 5000 Ada > > The leaks

[PATCH v2] drm/nouveau: Prevent signalled fences in pending list

2025-04-03 Thread Philipp Stanner
signalling a fence has additional effects is to add those effects to a callback and register it on that fence. Move the code from nouveau_fence_signal() into a dma_fence callback. Register that callback when creating the fence. Cc: # 4.10+ Signed-off-by: Philipp Stanner --- Changes in v2: - Remove

Re: [PATCH v3 1/5] drm/sched: Fix teardown leaks with waitqueue

2025-05-22 Thread Philipp Stanner
On Thu, 2025-05-22 at 14:37 +0100, Tvrtko Ursulin wrote: > > On 22/05/2025 09:27, Philipp Stanner wrote: > > From: Philipp Stanner > > > > The GPU scheduler currently does not ensure that its pending_list > > is > > empty before performing various other

[PATCH v3 0/5] Fix memory leaks in drm_sched_fini()

2025-05-22 Thread Philipp Stanner
ovide users with a more reliable, clean scheduler API. Philipp Philipp Stanner (5): drm/sched: Fix teardown leaks with waitqueue drm/sched/tests: Port tests to new cleanup method drm/sched: Warn if pending list is not empty drm/nouveau: Add new callback for scheduler teardown drm/nouveau: Remove

[PATCH v3 1/5] drm/sched: Fix teardown leaks with waitqueue

2025-05-22 Thread Philipp Stanner
From: Philipp Stanner The GPU scheduler currently does not ensure that its pending_list is empty before performing various other teardown tasks in drm_sched_fini(). If there are still jobs in the pending_list, this is problematic because after scheduler teardown, no one will call

[PATCH v3 2/5] drm/sched/tests: Port tests to new cleanup method

2025-05-22 Thread Philipp Stanner
a new error field for the fence error. Keep the job status as DRM_MOCK_SCHED_JOB_DONE for now, since there is no party for which checking for a CANCELED status would be useful currently. Signed-off-by: Philipp Stanner --- .../gpu/drm/scheduler/tests/mock_scheduler.c | 67

[PATCH v3 3/5] drm/sched: Warn if pending list is not empty

2025-05-22 Thread Philipp Stanner
drm_sched_fini() can leak jobs under certain circumstances. Warn if that happens. Signed-off-by: Philipp Stanner --- drivers/gpu/drm/scheduler/sched_main.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c

[PATCH v3 5/5] drm/nouveau: Remove waitque for sched teardown

2025-05-22 Thread Philipp Stanner
nouveau_sched_fence_context_kill() the waitque is not necessary anymore. Remove the waitque. Signed-off-by: Philipp Stanner --- drivers/gpu/drm/nouveau/nouveau_sched.c | 20 +++- drivers/gpu/drm/nouveau/nouveau_sched.h | 9 +++-- drivers/gpu/drm/nouveau/nouveau_uvmm.c | 8 3 files

[PATCH v3 4/5] drm/nouveau: Add new callback for scheduler teardown

2025-05-22 Thread Philipp Stanner
There is a new callback for always tearing the scheduler down in a leak-free, deadlock-free manner. Port Nouveau as its first user by providing the scheduler with a callback that ensures the fence context gets killed in drm_sched_fini(). Signed-off-by: Philipp Stanner --- drivers/gpu/drm

[PATCH 1/2] dma-buf: Add __dma_fence_is_signaled()

2025-05-22 Thread Philipp Stanner
ed. Use it internally. Suggested-by: Tvrtko Ursulin Signed-off-by: Philipp Stanner --- include/linux/dma-fence.h | 24 ++-- 1 file changed, 22 insertions(+), 2 deletions(-) diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h index 48b5202c531d..ac951a54a007 10

[PATCH 2/2] drm/nouveau: Don't signal when killing the fence context

2025-05-22 Thread Philipp Stanner
which only checks, never signals. Signed-off-by: Philipp Stanner --- drivers/gpu/drm/nouveau/nouveau_fence.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c index d5654e26d5bc..993b3dcb5db0

[RFC PATCH 2/6] drm/sched/tests: Implement cancel_job()

2025-06-03 Thread Philipp Stanner
hardware fence. That should be repaired and cleaned up, but it's probably better to do that in a separate series. Signed-off-by: Philipp Stanner --- .../gpu/drm/scheduler/tests/mock_scheduler.c | 71 +++ drivers/gpu/drm/scheduler/tests/sched_tests.h | 4 +- 2 files change

Re: [PATCH] drm/nouveau/gsp: fix potential leak of memory used during acpi init

2025-06-17 Thread Philipp Stanner
On Tue, 2025-06-17 at 14:00 +1000, Ben Skeggs wrote: > If any of the ACPI calls fail, memory allocated for the input buffer > would be leaked.  Fix failure paths to free allocated memory. > > Also add checks to ensure the allocations succeeded in the first > place. > > Reported-by: Danilo Krummri

[RFC PATCH 1/6] drm/sched: Avoid memory leaks with cancel_job() callback

2025-06-03 Thread Philipp Stanner
the hardware fence associated with the job. Afterwards, the scheduler can safely use the established free_job() callback for freeing the job. Implement the new backend_ops callback cancel_job(). Suggested-by: Tvrtko Ursulin Signed-off-by: Philipp Stanner --- drivers/gpu/drm/scheduler

[RFC PATCH 3/6] drm/sched: Warn if pending list is not empty

2025-06-03 Thread Philipp Stanner
drm_sched_fini() can leak jobs under certain circumstances. Warn if that happens. Signed-off-by: Philipp Stanner --- drivers/gpu/drm/scheduler/sched_main.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c

[RFC PATCH 5/6] drm/nouveau: Add new callback for scheduler teardown

2025-06-03 Thread Philipp Stanner
There is a new callback for always tearing the scheduler down in a leak-free, deadlock-free manner. Port Nouveau as its first user by providing the scheduler with a callback that ensures the fence context gets killed in drm_sched_fini(). Signed-off-by: Philipp Stanner --- drivers/gpu/drm

[RFC PATCH 0/6] drm/sched: Avoid memory leaks by canceling job-by-job

2025-06-03 Thread Philipp Stanner
ps://lore.kernel.org/dri-devel/20250418113211.69956-1-tvrtko.ursu...@igalia.com/ Philipp Stanner (6): drm/sched: Avoid memory leaks with cancel_job() callback drm/sched/tests: Implement cancel_job() drm/sched: Warn if pending list is not empty drm/nouveau: Make fence container helper usable driver-wide

[RFC PATCH 4/6] drm/nouveau: Make fence container helper usable driver-wide

2025-06-03 Thread Philipp Stanner
: Philipp Stanner --- drivers/gpu/drm/nouveau/nouveau_fence.c | 20 +++- drivers/gpu/drm/nouveau/nouveau_fence.h | 6 ++ 2 files changed, 13 insertions(+), 13 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c index

[RFC PATCH 6/6] drm/nouveau: Remove waitque for sched teardown

2025-06-03 Thread Philipp Stanner
nouveau_sched_fence_context_kill() the waitque is not necessary anymore. Remove the waitque. Signed-off-by: Philipp Stanner --- drivers/gpu/drm/nouveau/nouveau_sched.c | 20 +++- drivers/gpu/drm/nouveau/nouveau_sched.h | 9 +++-- drivers/gpu/drm/nouveau/nouveau_uvmm.c | 8 3 files

[PATCH v2 0/7] drm/sched: Fix memory leaks with cancel_job() callback

2025-07-07 Thread Philipp Stanner
are still in drm_sched.pending_list. This series solves the leaks in a backwards-compatible manner by adding a new, optional callback. If that callback is implemented, the scheduler uses it to cancel all jobs from pending_list and then frees them. Philipp Stanner (7): drm/sched: Avoid memory

[PATCH v2 1/7] drm/sched: Avoid memory leaks with cancel_job() callback

2025-07-07 Thread Philipp Stanner
-tvrtko.ursu...@igalia.com/ Signed-off-by: Philipp Stanner Reviewed-by: Maíra Canal --- drivers/gpu/drm/scheduler/sched_main.c | 34 -- include/drm/gpu_scheduler.h| 18 ++ 2 files changed, 39 insertions(+), 13 deletions(-) diff --git a/drivers/gpu/drm

[PATCH v2 5/7] drm/nouveau: Make fence container helper usable driver-wide

2025-07-07 Thread Philipp Stanner
: Philipp Stanner --- drivers/gpu/drm/nouveau/nouveau_fence.c | 20 +++- drivers/gpu/drm/nouveau/nouveau_fence.h | 6 ++ 2 files changed, 13 insertions(+), 13 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c index

[PATCH v2 2/7] drm/sched/tests: Implement cancel_job() callback

2025-07-07 Thread Philipp Stanner
the code where necessary. Signed-off-by: Philipp Stanner --- .../gpu/drm/scheduler/tests/mock_scheduler.c | 66 +++ 1 file changed, 23 insertions(+), 43 deletions(-) diff --git a/drivers/gpu/drm/scheduler/tests/mock_scheduler.c b/drivers/gpu/drm/scheduler/tests/mock_scheduler.c

[PATCH v2 3/7] drm/sched/tests: Add unit test for cancel_job()

2025-07-07 Thread Philipp Stanner
The scheduler unit tests now provide a new callback, cancel_job(). This callback gets used by drm_sched_fini() for all still pending jobs to cancel them. Implement a new unit test to test this. Signed-off-by: Philipp Stanner --- drivers/gpu/drm/scheduler/tests/tests_basic.c | 43

[PATCH v2 6/7] drm/nouveau: Add new callback for scheduler teardown

2025-07-07 Thread Philipp Stanner
There is a new callback for always tearing the scheduler down in a leak-free, deadlock-free manner. Port Nouveau as its first user by providing the scheduler with a callback that ensures the fence context gets killed in drm_sched_fini(). Signed-off-by: Philipp Stanner --- drivers/gpu/drm

[PATCH v2 7/7] drm/nouveau: Remove waitque for sched teardown

2025-07-07 Thread Philipp Stanner
nouveau_sched_fence_context_kill() the waitque is not necessary anymore. Remove the waitque. Signed-off-by: Philipp Stanner --- drivers/gpu/drm/nouveau/nouveau_sched.c | 20 +++- drivers/gpu/drm/nouveau/nouveau_sched.h | 9 +++-- drivers/gpu/drm/nouveau/nouveau_uvmm.c | 8 3 files

[PATCH v2 4/7] drm/sched: Warn if pending list is not empty

2025-07-07 Thread Philipp Stanner
drm_sched_fini() can leak jobs under certain circumstances. Warn if that happens. Signed-off-by: Philipp Stanner --- drivers/gpu/drm/scheduler/sched_main.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c

Re: [PATCH] drm/nouveau/gsp: fix potential leak of memory used during acpi init

2025-07-07 Thread Philipp Stanner
On Tue, 2025-06-17 at 14:00 +1000, Ben Skeggs wrote: > If any of the ACPI calls fail, memory allocated for the input buffer > would be leaked.  Fix failure paths to free allocated memory. > > Also add checks to ensure the allocations succeeded in the first > place. If you've got a kmemleak trace,

Re: [PATCH] drm/nouveau/gsp: fix potential leak of memory used during acpi init

2025-07-09 Thread Philipp Stanner
On Mon, 2025-07-07 at 16:31 +0200, Danilo Krummrich wrote: > On 7/7/25 10:27 AM, Philipp Stanner wrote: > > On Tue, 2025-06-17 at 14:00 +1000, Ben Skeggs wrote: > > > If any of the ACPI calls fail, memory allocated for the input buffer > > > would be leaked.  Fix fail

[PATCH v3 3/7] drm/sched/tests: Add unit test for cancel_job()

2025-07-09 Thread Philipp Stanner
The scheduler unit tests now provide a new callback, cancel_job(). This callback gets used by drm_sched_fini() for all still pending jobs to cancel them. Implement a new unit test to test this. Signed-off-by: Philipp Stanner Reviewed-by: Tvrtko Ursulin --- drivers/gpu/drm/scheduler/tests

[PATCH v3 1/7] drm/sched: Avoid memory leaks with cancel_job() callback

2025-07-09 Thread Philipp Stanner
-tvrtko.ursu...@igalia.com/ Signed-off-by: Philipp Stanner Reviewed-by: Maíra Canal --- drivers/gpu/drm/scheduler/sched_main.c | 34 -- include/drm/gpu_scheduler.h | 18 ++ 2 files changed, 39 insertions(+), 13 deletions(-) diff --git a/drivers/gpu/drm

[PATCH v3 2/7] drm/sched/tests: Implement cancel_job() callback

2025-07-09 Thread Philipp Stanner
the code where necessary. Signed-off-by: Philipp Stanner --- .../gpu/drm/scheduler/tests/mock_scheduler.c | 68 +++ drivers/gpu/drm/scheduler/tests/sched_tests.h | 1 - 2 files changed, 25 insertions(+), 44 deletions(-) diff --git a/drivers/gpu/drm/scheduler/tests

[PATCH v3 4/7] drm/sched: Warn if pending_list is not empty

2025-07-09 Thread Philipp Stanner
drm_sched_fini() can leak jobs under certain circumstances. Warn if that happens. Signed-off-by: Philipp Stanner --- drivers/gpu/drm/scheduler/sched_main.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c

[PATCH v3 6/7] drm/nouveau: Add new callback for scheduler teardown

2025-07-09 Thread Philipp Stanner
There is a new callback for always tearing the scheduler down in a leak-free, deadlock-free manner. Port Nouveau as its first user by providing the scheduler with a callback that ensures the fence context gets killed in drm_sched_fini(). Signed-off-by: Philipp Stanner Acked-by: Danilo Krummrich

[PATCH v3 0/7] drm/sched: Fix memory leaks with cancel_job() callback

2025-07-09 Thread Philipp Stanner
manner by adding a new, optional callback. If that callback is implemented, the scheduler uses it to cancel all jobs from pending_list and then frees them. Philipp Stanner (7): drm/sched: Avoid memory leaks with cancel_job() callback drm/sched/tests: Implement cancel_job() callback drm/sched/

[PATCH v3 5/7] drm/nouveau: Make fence container helper usable driver-wide

2025-07-09 Thread Philipp Stanner
: Philipp Stanner Acked-by: Danilo Krummrich --- drivers/gpu/drm/nouveau/nouveau_fence.c | 20 +++- drivers/gpu/drm/nouveau/nouveau_fence.h | 6 ++ 2 files changed, 13 insertions(+), 13 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm

[PATCH v3 7/7] drm/nouveau: Remove waitque for sched teardown

2025-07-09 Thread Philipp Stanner
nouveau_sched_cancel_job() the waitque is not necessary anymore. Remove the waitque. Signed-off-by: Philipp Stanner Acked-by: Danilo Krummrich --- drivers/gpu/drm/nouveau/nouveau_sched.c | 20 +++- drivers/gpu/drm/nouveau/nouveau_sched.h | 9 +++-- drivers/gpu/drm/nouveau/nouveau_uvmm.c

[PATCH v4 1/8] drm/panfrost: Fix scheduler workqueue bug

2025-07-10 Thread Philipp Stanner
-4d55-aa47-c35cd7861...@igalia.com/ Signed-off-by: Philipp Stanner --- drivers/gpu/drm/panfrost/panfrost_job.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c index 5657106c2f7d..15e2d505550f 10

[PATCH v4 2/8] drm/sched: Avoid memory leaks with cancel_job() callback

2025-07-10 Thread Philipp Stanner
-tvrtko.ursu...@igalia.com/ Signed-off-by: Philipp Stanner Reviewed-by: Maíra Canal --- drivers/gpu/drm/scheduler/sched_main.c | 34 -- include/drm/gpu_scheduler.h | 18 ++ 2 files changed, 39 insertions(+), 13 deletions(-) diff --git a/drivers/gpu/drm

[PATCH v4 3/8] drm/sched/tests: Implement cancel_job() callback

2025-07-10 Thread Philipp Stanner
the code where necessary. Signed-off-by: Philipp Stanner --- .../gpu/drm/scheduler/tests/mock_scheduler.c | 68 +++ drivers/gpu/drm/scheduler/tests/sched_tests.h | 1 - 2 files changed, 25 insertions(+), 44 deletions(-) diff --git a/drivers/gpu/drm/scheduler/tests
