Re: [PATCH v11 1/3] drm/amdgpu: Add ioctl to get all gem handles for a process

2025-08-08 Thread Christian König
On 07.08.25 22:22, David Francis wrote: > Add new ioctl DRM_IOCTL_AMDGPU_GEM_LIST_HANDLES. > > This ioctl returns a list of bos with their handles, sizes, > and flags and domains. > > This ioctl is meant to be used during CRIU checkpoint and > provide information needed to reconstruct the bos > i

Re: [PATCH] drm/amdgpu: fix nullptr error of amdgpu_vm_handle_moved

2025-08-08 Thread Christian König
On 08.08.25 05:14, Heng Zhou wrote: > If a amdgpu_bo_va is fpriv->prt_va, the bo of this one is always NULL. > So, such kind of amdgpu_bo_va should be updated separately before > amdgpu_vm_handle_moved. > > Signed-off-by: Heng Zhou > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 6

Re: [PATCH] drm/amdgpu: add to custom amdgpu_drm_release drm_dev_enter/exit

2025-08-08 Thread Christian König
> > Cc: Christian König > Cc: Alex Deucher > Signed-off-by: Vitaly Prosyak Reviewed-by: Christian König > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 5 - > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu

Re: [PATCH v10 2/3] drm/amdgpu: Add mapping info option for GEM_OP ioctl

2025-08-07 Thread Christian König
On 07.08.25 16:00, David Francis wrote: > Add new GEM_OP_IOCTL option GET_MAPPING_INFO, which > returns a list of mappings associated with a given bo, along with > their positions and offsets. > > Signed-off-by: David Francis > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 97 ++

Re: [PATCH v10 1/3] drm/amdgpu: Add ioctl to get bo info

2025-08-07 Thread Christian König
On 07.08.25 16:00, David Francis wrote: > Add new ioctl DRM_IOCTL_AMDGPU_GEM_BO_INFO. > > This ioctl returns a list of bos with their handles, sizes, > and flags and domains. > > This ioctl is meant to be used during CRIU checkpoint and > provide information needed to reconstruct the bos > in CRI

Re: Is amdgpu open to converting logging to drm_* functions

2025-08-07 Thread Christian König
On 07.08.25 15:22, Alex Deucher wrote: > On Thu, Aug 7, 2025 at 9:13 AM Brahmajit Das wrote: >> >> Hello Alex, Christian, >> >> I'm a mentee at Linux kernel Bug Fixing Summer 2025. I came across the >> TODO list on the kernel doc, and specifically this section[0]. Would >> amdgpu be open to this c

Re: [PATCH] drm/amdgpu: skip disabling audio when device is unplugged

2025-08-07 Thread Christian König
Regards, > Shixiong Ou. > > > On 2025/8/7 18:03, Christian König wrote: >> On 07.08.25 11:47, oushixiong1...@163.com wrote: >>> From: Shixiong Ou >>> >>> When Stopping lightdm and removing amdgpu driver are executed, the following >>> error is trigge

Re: Is amdgpu open to converting logging to drm_* functions

2025-08-07 Thread Christian König
IIRC we settled on printing everything DRM-related (e.g. display, device lifetime, clients etc...) with the DRM macros. But everything related to general HW information like PCI slot configuration, BARs, speed etc... is printed with the general kernel functions, e.g. dev_err(), dev_warn() B

Re: [PATCH] drm/amdgpu: skip disabling audio when device is unplugged

2025-08-07 Thread Christian König
On 07.08.25 11:47, oushixiong1...@163.com wrote: > From: Shixiong Ou > > When Stopping lightdm and removing amdgpu driver are executed, the following > error is triggered probably: > > Unable to handle kernel paging request at virtual address 5e00 > . > [ 2] [T10084] Call trace:

Re: [PATCH] drm/amdgpu: Fix race condition in amdgpu_vm_wait_idle during process kill

2025-08-07 Thread Christian König
On 07.08.25 10:46, Liu01 Tong wrote: > The early commit b8adc31cc0ca ("drm/amdgpu: Avoid extra evict-restore > process.") changed amdgpu_vm_wait_idle to use drm_sched_entity_flush > instead of dma_resv_wait_timeout to avoid KFD eviction fence signaling. > But this introduce a race condition when pr

Re: [PATCH RFC 0/6] amdgpu: Avoid powering on the dGPU on vkEnumeratePhysicalDevices()

2025-08-06 Thread Christian König
On 06.08.25 12:15, Philipp Zabel wrote: > On Mi, 2025-08-06 at 10:58 +0200, Christian König wrote: >> On 31.07.25 07:36, Philipp Zabel wrote: >>> This is an attempt at fixing amd#2295 [1]: >>> >>> On an AMD Rembrandt laptop with 680M iGPU and 6700S dGPU, call

Re: [PATCH 1/6] drm/amdgpu: Power up UVD 3 for FW validation

2025-08-06 Thread Christian König
On 06.08.25 02:35, Timur Kristóf wrote: >> > > Alex Hi, These are my observations about how the UVD clock works on SI: 1. It seems that the SMC needs to know whether UVD is enabled or not, and the UVD clocks are included as part of the power states. S

Re: [PATCH] drm/amdgpu: fix incorrect comment format

2025-08-06 Thread Christian König
On 06.08.25 05:34, Cryolitia PukNgae via B4 Relay wrote: > From: Cryolitia PukNgae > > Comments should not have a leading plus sign. Good catch, potentially a leftover from a merge conflict or similar. > > Signed-off-by: Cryolitia PukNgae Acked-by: Christian König >

Re: [PATCH RFC 0/6] amdgpu: Avoid powering on the dGPU on vkEnumeratePhysicalDevices()

2025-08-06 Thread Christian König
On 31.07.25 07:36, Philipp Zabel wrote: > This is an attempt at fixing amd#2295 [1]: > > On an AMD Rembrandt laptop with 680M iGPU and 6700S dGPU, calling > vkEnumeratePhysicalDevices() wakes up the sleeping dGPU, even if all > the application wants is to find and use the iGPU. This causes a

Re: [PATCH] drm/amdgpu: Raven: don't allow mixing GTT and VRAM

2025-08-05 Thread Christian König
On 28.07.25 18:38, Alex Deucher wrote: Anyway, back to your suggestion, I think we can probably drop the checks as you should always get a compatible memory buffer due to amdgpu_bo_get_preferred_domain(). Pinning should fail if we can't pin in the required domain. amdgpu_displa

Re: [PATCH 1/6] drm/amdgpu: Power up UVD 3 for FW validation

2025-08-04 Thread Christian König
On 04.08.25 19:45, Alex Deucher wrote: > On Mon, Aug 4, 2025 at 12:00 PM Timur Kristóf wrote: >> >> On Mon, 2025-08-04 at 11:20 -0400, Alex Deucher wrote: >>> On Mon, Aug 4, 2025 at 9:58 AM Timur Kristóf >>> wrote: Unlike later versions, UVD 3 has firmware validation. For this to w

Re: [PATCH] drm/amdgpu: keep job->vm in amdgpu_job_prepare_job

2025-08-04 Thread Christian König
On 23.07.25 11:06, YuanShang wrote: > job->vm is used in function amdgpu_job_run to get the page > table re-generation counter and decide whether the job should be skipped. Support for resubmitting jobs was removed, so that code should probably be removed as well. We should probably move the cal

Re: [PATCH v4] drm/amdgpu: Avoid extra evict-restore process.

2025-07-18 Thread Christian König
35235] ? ima_bprm_check+0xa2/0xd0 > [677852.635240] search_binary_handler+0xda/0x260 > [677852.635245] exec_binprm+0x58/0x1a0 > [677852.635249] bprm_execve.part.0+0x16f/0x210 > [677852.635254] bprm_execve+0x45/0x80 > [677852.635257] do_execveat_common.isra.0+0x190/0x200 > > Suggested-by:

Re: [PATCH] drm/amdgpu: Raven: don't allow mixing GTT and VRAM

2025-07-17 Thread Christian König
me more problematic side effects (drawing more power etc...) > It would seem that all devices > would have this issue, no? Also, I'm not familiar with how > kms_plane_alpha_blend works, but does this also support that test > failing as the cause? Correct, it affects all APUs which

Re: [PATCH v3] drm/amdgpu: Avoid extra evict-restore process.

2025-07-17 Thread Christian König
35235] ? ima_bprm_check+0xa2/0xd0 > [677852.635240] search_binary_handler+0xda/0x260 > [677852.635245] exec_binprm+0x58/0x1a0 > [677852.635249] bprm_execve.part.0+0x16f/0x210 > [677852.635254] bprm_execve+0x45/0x80 > [677852.635257] do_execveat_common.isra.0+0x190/0x200 > > Suggested-by:

Re: [RFC] drm/amdgpu/sdma5.2: Avoid latencies caused by the powergating workaround

2025-07-16 Thread Christian König
On 16.07.25 16:06, Tvrtko Ursulin wrote: > > On 16/07/2025 14:00, Christian König wrote: >> On 16.07.25 14:51, Tvrtko Ursulin wrote: >>>>>>>> be disabled once GFX/SDMA is no longer active.  In this particular >>>>>>>> case ther

Re: [RFC] drm/amdgpu/sdma5.2: Avoid latencies caused by the powergating workaround

2025-07-16 Thread Christian König
On 16.07.25 14:51, Tvrtko Ursulin wrote: >> be disabled once GFX/SDMA is no longer active.  In this particular >> case there was a race condition somewhere in the internal handshaking >> with SDMA which led to SDMA missing doorbells sometimes and not >> executing the job even if the

Re: [PATCH v5 2/3] drm/amdgpu: Reset the clear flag in buddy during resume

2025-07-16 Thread Christian König
On 16.07.25 12:47, Christian König wrote: > On 16.07.25 12:28, Arunpravin Paneer Selvam wrote: >> Hi Dave, >> >> I am trying to push this series into drm-misc-fixes, but I get the below >> error when dim push-branch drm-misc-fixes. >> >> dim:ERROR:e24c180b4

Re: [PATCH v5 2/3] drm/amdgpu: Reset the clear flag in buddy during resume

2025-07-16 Thread Christian König
w Auld) >>    - Having this function being able to flip the state either way would be >> good. (Matthew Brost) >> >> v3(Matthew Auld): >>    - Do merge step first to avoid the use of extra reset flag. >> >> Signed-off-by: Arunpravin Paneer Selvam

Re: [PATCH 10/33] drm/amdgpu/gfx9.4.3: re-emit unprocessed state on kcq reset

2025-07-16 Thread Christian König
Patches #9-#22 Reviewed-by: Christian König On 15.07.25 18:12, Alex Deucher wrote: > Ping? > > Alex > > On Fri, Jul 11, 2025 at 6:48 PM Alex Deucher > wrote: >> >> Re-emit the unprocessed state after resetting the queue. >> >> Signed-off-by: Al

Re: [PATCH v4 2/3] drm/amdgpu: Reset the clear flag in buddy during resume

2025-07-16 Thread Christian König
ng able to flip the state either way would be > good. (Matthew Brost) > > v3(Matthew Auld): > - Do merge step first to avoid the use of extra reset flag. You've lost me with that :) > > Signed-off-by: Arunpravin Paneer Selvam > Suggested-by: Christian König >

Re: [PATCH v6 06/11] drm/amdgpu: track the userq bo va for its obj management

2025-07-15 Thread Christian König
On 15.07.25 14:05, Liang, Prike wrote: > [Public] > > Regards, > Prike > >> -Original Message- >> From: Koenig, Christian >> Sent: Friday, July 11, 2025 8:11 PM >> To: Liang, Prike ; amd-gfx@lists.freedesktop.org >> Cc: Deucher, Alexander >> Subject: Re: [PATCH v6 06/11] drm/amdgp

Re: [PATCH v6 07/11] drm/amdgpu: validate userq's last fence prior to destroying

2025-07-15 Thread Christian König
On 15.07.25 13:50, Liang, Prike wrote: > [Public] > > Regards, > Prike > >> -Original Message- >> From: Koenig, Christian >> Sent: Friday, July 11, 2025 8:13 PM >> To: Liang, Prike ; amd-gfx@lists.freedesktop.org >> Cc: Deucher, Alexander >> Subject: Re: [PATCH v6 07/11] drm/amdgp

Re: [PATCH v6 03/11] drm/amdgpu: rework the userq doorbell object destroy

2025-07-15 Thread Christian König
On 15.07.25 10:07, Liang, Prike wrote: > [Public] > > Regards, > Prike > >> -Original Message- >> From: Koenig, Christian >> Sent: Friday, July 11, 2025 8:01 PM >> To: Liang, Prike ; amd-gfx@lists.freedesktop.org >> Cc: Deucher, Alexander >> Subject: Re: [PATCH v6 03/11] drm/amdgp

Re: [PATCH v6 04/11] drm/amdgpu: validate userq buffer virtual address and size

2025-07-15 Thread Christian König
On 15.07.25 10:19, Liang, Prike wrote: >>> + /* Validate the userq virtual address.*/ >>> + if (amdgpu_userq_input_va_validate(&fpriv->vm, args->in.queue_va, args->in.queue_size) || >>> + amdgpu_userq_input_va_validate(&fpriv->vm, args->in.rptr_va, PAGE_SIZE) || >>> + amdgpu_

Re: [PATCH 08/33] drm/amdgpu: track ring state associated with a fence

2025-07-14 Thread Christian König
the unprocessed state so that it can be > re-emitted after the queue is reset. > > Signed-off-by: Alex Deucher This clearly needs a follow up cleanup, but Reviewed-by: Christian König for now. > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 90 +++

Re: [PATCH 06/33] drm/amdgpu: clean up jpeg reset functions

2025-07-14 Thread Christian König
On 12.07.25 00:39, Alex Deucher wrote: > Make them consistent and use the reset flags. > > Signed-off-by: Alex Deucher I'm not very keen on spreading amdgpu_sriov_vf() around everywhere. But for now Acked-by: Christian König for patches #6 and #7. > --- > driver

Re: [PATCH 05/33] drm/amdgpu/vcn: don't enable per queue resets on SR-IOV

2025-07-14 Thread Christian König
On 12.07.25 00:39, Alex Deucher wrote: > Power control is only available in bare metal. SR-IOV > will need a different method. > > Signed-off-by: Alex Deucher Reviewed-by: Christian König > --- > drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c | 3 ++- > drivers/gpu/drm/amd/am

Re: [PATCH 02/33] drm/amdgpu/jpeg2: add additional ring reset error checking

2025-07-14 Thread Christian König
On 12.07.25 00:39, Alex Deucher wrote: > Start and stop can fail, so add checks. > > Fixes: 500c04d2a708 ("drm/amdgpu: Add ring reset callback for JPEG2_0_0") > Signed-off-by: Alex Deucher > Cc: Sathishkumar S Reviewed-by: Christian König for patches #2-#4. >
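The point of this patch (stop and start can fail, so check both) can be sketched in plain C. The `ring_ops` table, `ring_reset()` and the stub callbacks below are hypothetical stand-ins, not the amdgpu JPEG 2.0 code; the sketch only shows the error propagation, assuming the usual kernel convention of 0 on success and a negative errno on failure.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical hooks standing in for a ring's stop/start callbacks. */
struct ring_ops {
	int (*stop)(void *ctx);
	int (*start)(void *ctx);
};

/* Propagate failures from stop and start instead of ignoring them. */
static int ring_reset(const struct ring_ops *ops, void *ctx)
{
	int r;

	r = ops->stop(ctx);
	if (r)
		return r;	/* do not restart a ring that failed to stop */

	r = ops->start(ctx);
	if (r)
		return r;

	return 0;
}

/* Hypothetical stubs for the usage checks; ctx counts the calls made. */
static int stub_ok(void *ctx)
{
	++*(int *)ctx;
	return 0;
}

static int stub_timeout(void *ctx)
{
	++*(int *)ctx;
	return -ETIMEDOUT;
}
```

With a failing stop, start is never called and the caller sees the original error code.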

Re: [PATCH 01/33] drm/amdgpu: clean up sdma reset functions

2025-07-14 Thread Christian König
On 12.07.25 00:39, Alex Deucher wrote: > Make them consistent and drop unneeded extra variables. > > Signed-off-by: Alex Deucher Acked-by: Christian König > --- > drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c | 14 +++--- > drivers/gpu/drm/amd/amdgp

Re: [RFC] drm/amdgpu/sdma5.2: Avoid latencies caused by the powergating workaround

2025-07-11 Thread Christian König
On 11.07.25 15:58, Tvrtko Ursulin wrote: > > On 11/07/2025 14:39, Alex Deucher wrote: >> On Fri, Jul 11, 2025 at 9:22 AM Tvrtko Ursulin >> wrote: >>> >>> >>> On 11/07/2025 13:45, Christian König wrote: >>>> On 11.07.25 14:23, Tvrtko Ur

Re: [PATCH] drm/scheduler: Fix sched hang when killing app with dependent jobs

2025-07-11 Thread Christian König
On 11.07.25 15:13, Philipp Stanner wrote: > On Thu, 2025-07-10 at 08:33 +, cao, lin wrote: >> >> [AMD Official Use Only - AMD Internal Distribution Only] >> >> >> >> Hi Christian, >> >> >> Thanks for your suggestion, I modified the patch as: > > Looks promising. You'll send a v2 I guess :) We

Re: [RFC] drm/amdgpu/sdma5.2: Avoid latencies caused by the powergating workaround

2025-07-11 Thread Christian König
. > the real improvement in max submission latency is severely understated by > these numbers. Well that would indeed be quite nice to have. Regards, Christian. > > Signed-off-by: Tvrtko Ursulin > References: 94b1e028e15c ("drm/amdgpu/sdma5.2: add begin/end_use ring >

Re: [PATCH v6 10/11] drm/amdgpu: validate the queue va for resuming the queue

2025-07-11 Thread Christian König
On 11.07.25 11:39, Prike Liang wrote: > It requires validating the userq VA whether is mapped before > trying to resume the queue. > > Signed-off-by: Prike Liang Yeah that looks sane to me. Patch is Reviewed-by: Christian König > --- > drivers/gpu/drm/amd/amdgpu/

Re: [PATCH v6 09/11] drm/amdgpu: validate the shared bo for tracking usage size

2025-07-11 Thread Christian König
On 11.07.25 11:39, Prike Liang wrote: > It requires validating the shared BO before updating its usage > size; otherwise, there is a potential NULL pointer error when the > BO released improperly. Clear NAK to that. You are obviously working around a bug elsewhere. Regards, Christian. > > Signe

Re: [PATCH v6 07/11] drm/amdgpu: validate userq's last fence prior to destroying

2025-07-11 Thread Christian König
On 11.07.25 11:39, Prike Liang wrote: > The userq requires validating queue status before destroying > it, if user tries to destroy a busy userq by IOCTL then the > driver should report an error for this illegal usage. Clear NAK, destroying a busy userqueue is perfectly valid! Regards, Christian.

Re: [PATCH v6 06/11] drm/amdgpu: track the userq bo va for its obj management

2025-07-11 Thread Christian König
On 11.07.25 11:39, Prike Liang wrote: > The user queue object destroy requires ensuring its > VA keeps mapping prior to the queue being destroyed. > Otherwise, it seems a bug in the user space or VA > freed wrongly, and the kernel driver should report an > invalidated error to the user IOCLT req

Re: [PATCH v6 04/11] drm/amdgpu: validate userq buffer virtual address and size

2025-07-11 Thread Christian König
On 11.07.25 11:39, Prike Liang wrote: > It needs to validate the userq object virtual address to > determin whether it is residented in a valid vm mapping. > > Signed-off-by: Prike Liang > Reviewed-by: Alex Deucher > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c | 38 +

Re: [PATCH v6 03/11] drm/amdgpu: rework the userq doorbell object destroy

2025-07-11 Thread Christian König
On 11.07.25 11:39, Prike Liang wrote: > This patch aims to unify and destroy the userq doorbell objects at > mes_userq_mqd_destroy(), and this change will also help with unpinning > and destroying the userq doorbell objects for amdgpu_userq_mgr_fini() > during releasing the drm files. > > Signed-o

Re: [PATCH v6 5/5] drm/amdgpu: do not resume device in thaw for normal hibernation

2025-07-10 Thread Christian König
On 10.07.25 14:13, Mario Limonciello wrote: > On 7/10/2025 2:23 AM, Samuel Zhang wrote: >> For normal hibernation, GPU do not need to be resumed in thaw since it is >> not involved in writing the hibernation image. Skip resume in this case >> can reduce the hibernation time. >> >> On VM with 8 * 19

Re: [PATCH] drm/scheduler: Fix sched hang when killing app with dependent jobs

2025-07-10 Thread Christian König
First of all you need to CC the scheduler maintainers, try to use the get_maintainer.pl script. Adding them on CC. On 10.07.25 08:36, Lin.Cao wrote: > When Application A submits jobs (a1, a2, a3) and application B submits > job b1 with a dependency on a2's scheduler fence, killing application A >

[PATCH 2/2] drm/amdgpu: rework how PTE flags are generated v2

2025-07-09 Thread Christian König
he HW flags and translate them to the HW flags while filling in the PTEs. Only tested on Navi 23 for now, so probably needs quite a bit more work. v2: fix KFD and SVM handling Signed-off-by: Christian König --- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 7 ++- drivers/gpu/drm/amd/a

[PATCH 1/2] drm/amdgpu: rework gmc_v9_0_get_coherence_flags v2

2025-07-09 Thread Christian König
Avoid using the mapping here. v2: use amdgpu_xgmi_same_hive() as suggested by Felix Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd

Re: [PATCH v4 1/5] drm/ttm: add new api ttm_device_prepare_hibernation()

2025-07-09 Thread Christian König
On 09.07.25 08:44, Samuel Zhang wrote: > This new api is used for hibernation to move GTT BOs to shmem after > VRAM eviction. shmem will be flushed to swap disk later to reduce > the system memory usage for hibernation. > > Signed-off-by: Samuel Zhang Reviewed-by:

Re: [PATCH v6 14/15] drm/sched: Queue all free credits in one worker invocation

2025-07-09 Thread Christian König
On 08.07.25 17:31, Tvrtko Ursulin wrote: > > On 08/07/2025 14:02, Christian König wrote: >> On 08.07.25 14:54, Tvrtko Ursulin wrote: >>> >>> On 08/07/2025 13:37, Christian König wrote: >>>> On 08.07.25 11:51, Tvrtko Ursulin wrote: >>>>> T

Re: [PATCH v2 1/1] drm/amdkfd: return -ENOTTY for unsupported IOCTLs

2025-07-09 Thread Christian König
On 09.07.25 06:56, Lazar, Lijo wrote: > On 7/8/2025 8:40 PM, Deucher, Alexander wrote: >> [Public] >> >> >> I seem to recall -ENOTSUPP being frowned upon for IOCTLs. >> >> > Going by documentation - > https://dri.freedesktop.org/docs/drm/gpu/drm-uapi.html > Good point. > EOPNOTSUPP: > Feature (li
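As we read the DRM uAPI documentation cited here, ENOTTY carries the meaning "this IOCTL does not exist", while EOPNOTSUPP means a feature is not supported by the driver. A minimal sketch of a dispatcher that keeps the two apart; the command numbers, `struct demo_dev` and `demo_ioctl()` are hypothetical and not KFD code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical command numbers, for illustration only. */
enum demo_cmd {
	DEMO_CMD_GET_VERSION = 1,
	DEMO_CMD_NEW_FEATURE = 2,
};

struct demo_dev {
	bool has_new_feature;	/* hypothetical capability bit */
};

/* Returns 0 on success or a negative errno, following kernel convention. */
static int demo_ioctl(const struct demo_dev *dev, unsigned int cmd)
{
	switch (cmd) {
	case DEMO_CMD_GET_VERSION:
		return 0;
	case DEMO_CMD_NEW_FEATURE:
		/* The ioctl exists, but this device lacks the feature. */
		if (!dev->has_new_feature)
			return -EOPNOTSUPP;
		return 0;
	default:
		/* Unknown command: this ioctl does not exist. */
		return -ENOTTY;
	}
}
```

Userspace can then tell "kernel too old for this ioctl" (ENOTTY) apart from "ioctl known, but not usable on this device" (EOPNOTSUPP).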

Re: [PATCH] drm/amdgpu: fix the logic to validate fpriv and root bo

2025-07-09 Thread Christian König
On 09.07.25 09:16, Sunil Khatri wrote: > Fix the smatch warning, > smatch warnings: > drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:2146 amdgpu_pt_info_read() > error: we previously assumed 'fpriv' could be null (see line 2146) > > "if (!fpriv && !fpriv->vm.root.bo)", It has to be an OR condition >
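The smatch warning comes down to the operator in a NULL guard. A minimal sketch with hypothetical stand-in structures; only the field path `fpriv->vm.root.bo` is taken from the report, and `pt_info_unavailable()` is made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; only the fields needed for the check are modeled. */
struct demo_root { void *bo; };
struct demo_vm { struct demo_root root; };
struct demo_fpriv { struct demo_vm vm; };

/*
 * Buggy form from the report: !fpriv && !fpriv->vm.root.bo
 *  - fpriv == NULL: !fpriv is true, so && evaluates fpriv->vm.root.bo
 *    and dereferences NULL.
 *  - fpriv != NULL: !fpriv is false, so the condition is always false
 *    and a missing root BO is never detected.
 *
 * Fixed form: with ||, a NULL fpriv makes the result true without
 * evaluating the right operand, so the dereference only happens when
 * fpriv is valid.
 */
static bool pt_info_unavailable(const struct demo_fpriv *fpriv)
{
	return !fpriv || !fpriv->vm.root.bo;
}
```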

Re: [PATCH v6 14/15] drm/sched: Queue all free credits in one worker invocation

2025-07-08 Thread Christian König
On 08.07.25 14:54, Tvrtko Ursulin wrote: > > On 08/07/2025 13:37, Christian König wrote: >> On 08.07.25 11:51, Tvrtko Ursulin wrote: >>> There is no reason to queue just a single job if scheduler can take more >>> and re-queue the worker to queue more. >
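The idea in this patch, filling all free credits in one worker invocation rather than pushing a single job and re-queueing the worker, can be modeled in a few lines. This is a simplified sketch, not the drm_sched code: `struct sched_model` and `run_worker_once()` are hypothetical, and each job simply consumes a fixed number of credits.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model, not the drm_sched API. */
struct job {
	unsigned int credits;	/* credits this job consumes */
};

struct sched_model {
	struct job *queue;	/* pending jobs in submission order */
	size_t head, len;
	unsigned int free_credits;
	size_t pushed;		/* jobs handed to the "hardware" so far */
};

/*
 * One worker invocation: keep pushing jobs while the next one fits in
 * the free credits. Returns how many jobs were pushed in this call.
 */
static size_t run_worker_once(struct sched_model *s)
{
	size_t n = 0;

	while (s->head < s->len &&
	       s->queue[s->head].credits <= s->free_credits) {
		s->free_credits -= s->queue[s->head].credits;
		s->head++;
		n++;
	}
	s->pushed += n;
	return n;
}
```

With four free credits and jobs costing 1, 1, 2 and 4 credits, a single invocation pushes the first three; the fourth waits until credits are returned.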

Re: [PATCH v6 14/15] drm/sched: Queue all free credits in one worker invocation

2025-07-08 Thread Christian König
-off-by: Tvrtko Ursulin > Cc: Christian König > Cc: Danilo Krummrich > Cc: Matthew Brost > Cc: Philipp Stanner > --- > drivers/gpu/drm/scheduler/sched_internal.h | 2 - > drivers/gpu/drm/scheduler/sched_main.c | 132 ++--- > drivers/gpu/drm/schedul

Re: [PATCH] drm/amdgpu: fix MQD debugfs undefined symbol when DEBUG_FS=n

2025-07-08 Thread Christian König
On 08.07.25 12:15, Sunil Khatri wrote: > Fix undefined reference to amdgpu_mqd_info_fops during > debugfs_create_file if DEBUG_FS=n > > Signed-off-by: Sunil Khatri Yeah, that's exactly the reason why I wanted to put this into amdgpu_debugfs.c. For now Reviewed-by: Christi

Re: [PATCH v5 8/9] drm/amdgpu: validate userq activity status for GEM_VA unmap

2025-07-08 Thread Christian König
On 08.07.25 11:28, Liang, Prike wrote: > [Public] > > Regards, > Prike > >> -Original Message- >> From: Koenig, Christian >> Sent: Monday, July 7, 2025 5:43 PM >> To: Liang, Prike ; amd-gfx@lists.freedesktop.org >> Cc: Deucher, Alexander >> Subject: Re: [PATCH v5 8/9] drm/amdgpu:

Re: [PATCH 01/36] drm/amdgpu/gfx9: fix kiq locking in KCQ reset

2025-07-08 Thread Christian König
Patches #1-#3 are Reviewed-by: Christian König I need to wrap my head around the SDMA stuff, but at the moment I won't have time for that. It would be really good if somebody else could take a look at that as well. Regards, Christian. On 07.07.25 21:03, Alex Deucher wrote: > The ring te

Re: [PATCH v5 7/9] drm/amdgpu: add userq va unmap validated helper

2025-07-08 Thread Christian König
On 08.07.25 09:32, Liang, Prike wrote: > [Public] > > Regards, > Prike > >> -Original Message- >> From: Koenig, Christian >> Sent: Monday, July 7, 2025 5:37 PM >> To: Liang, Prike ; amd-gfx@lists.freedesktop.org >> Cc: Deucher, Alexander >> Subject: Re: [PATCH v5 7/9] drm/amdgpu:

Re: [PATCH v3 2/5] drm/amdgpu: move GTT to shmem after eviction for hibernation

2025-07-08 Thread Christian König
KMD, then shmem to swap disk in kernel > hibernation code to make room for hibernation image. > > Signed-off-by: Samuel Zhang Reviewed-by: Christian König > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 10 +- > 1 file changed, 9 insertions(+), 1 deletion(-) >

Re: [PATCH v3 1/5] drm/ttm: add new api ttm_device_prepare_hibernation()

2025-07-08 Thread Christian König
On 08.07.25 09:42, Samuel Zhang wrote: > This new api is used for hibernation to move GTT BOs to shmem after > VRAM eviction. shmem will be flushed to swap disk later to reduce > the system memory usage for hibernation. > > Signed-off-by: Samuel Zhang > --- > drivers/gpu/drm/ttm/ttm_device.c

Re: [PATCH v5 3/9] drm/amdgpu: rework the userq doorbell object destroy

2025-07-08 Thread Christian König
On 08.07.25 09:00, Liang, Prike wrote: > [Public] > >> -Original Message- >> From: Koenig, Christian >> Sent: Monday, July 7, 2025 5:28 PM >> To: Liang, Prike ; amd-gfx@lists.freedesktop.org >> Cc: Deucher, Alexander >> Subject: Re: [PATCH v5 3/9] drm/amdgpu: rework the userq doorbell ob

Re: [PATCH v3 2/3] drm/amdgpu: Reset the clear flag in buddy during resume

2025-07-08 Thread Christian König
ng able to flip the state either way would be > good. (Matthew Brost) > > v3(Matthew Auld): > - Do merge step first to avoid the use of extra reset flag. > > Signed-off-by: Arunpravin Paneer Selvam > Suggested-by: Christian König > Cc: sta...@vger.kernel.org > Fi

Re: [PATCH v3 1/3] drm/amdgpu: Add WARN_ON to the resource clear function

2025-07-08 Thread Christian König
; - Add back the resource clear flag set function call after > being wiped during eviction (Christian). > - Modified the patch subject name. > > Signed-off-by: Arunpravin Paneer Selvam > Suggested-by: Christian König > Cc: sta...@vger.kernel.org > Fixes: a68c7eaa7a

Re: [PATCH v2 1/1] drm/amdkfd: return -ENOTTY for unsupported IOCTLs

2025-07-08 Thread Christian König
On 08.07.25 06:22, Geoffrey McRae wrote: > Some kfd ioctls may not be available depending on the kernel version the > user is running, as such we need to report -ENOTTY so userland can > determine the cause of the ioctl failure. In general sounds like a good idea, but ENOTTY is potentially a bit m

Re: [PATCH 1/2] drm/ttm: rename ttm_bo_put to _fini

2025-07-07 Thread Christian König
On 07.07.25 18:25, Matthew Brost wrote: > On Mon, Jul 07, 2025 at 02:38:07PM +0200, Christian König wrote: >> On 03.07.25 00:01, Matthew Brost wrote: >>>> diff --git a/drivers/gpu/drm/ttm/tests/ttm_bo_test.c >>>> b/drivers/gpu/drm/ttm/tests/ttm_bo_test.c >

Re: [PATCH] drm/amdgpu: Replace HQD terminology with slots naming

2025-07-07 Thread Christian König
On 07.07.25 15:22, Alex Deucher wrote: > On Mon, Jul 7, 2025 at 5:48 AM Christian König > wrote: >> >> On 04.07.25 09:26, Jesse Zhang wrote: >>> The term "HQD" is CP-specific and doesn't >>> accurately describe the queue resources for othe

Re: [PATCH 1/2] drm/ttm: rename ttm_bo_put to _fini

2025-07-07 Thread Christian König
On 03.07.25 00:01, Matthew Brost wrote: >> diff --git a/drivers/gpu/drm/ttm/tests/ttm_bo_test.c >> b/drivers/gpu/drm/ttm/tests/ttm_bo_test.c >> index 6c77550c51af..5426b435f702 100644 >> --- a/drivers/gpu/drm/ttm/tests/ttm_bo_test.c >> +++ b/drivers/gpu/drm/ttm/tests/ttm_bo_test.c >> @@ -379,7 +37

Re: Switching TTM over to GEM refcounts v2

2025-07-07 Thread Christian König
On 02.07.25 18:15, Matthew Brost wrote: > On Wed, Jul 02, 2025 at 01:00:26PM +0200, Christian König wrote: >> Hi everyone, >> >> v2 of this patch set. I've either pushed or removed the other >> patches from v1, so only two remain. >> >> Pretty straight

Re: WARNING: drivers/gpu/drm/drm_gem.c:286 at drm_gem_object_handle_put_unlocked+0xb1/0xf0 [drm]

2025-07-07 Thread Christian König
On 07.07.25 11:30, Borislav Petkov wrote: > Hi all, > > I see the below on -rc5 + tip, on a RN machine. Yeah, that's a known issue. Thomas and I are working on that. Regards, Christian. > > --- > > [5.592468] cdc_ncm 2-2:2.0 eth0: register 'cdc_ncm' at > usb-:03:00.3-2, CDC NCM (NO

Re: [PATCH v5 8/9] drm/amdgpu: validate userq activity status for GEM_VA unmap

2025-07-07 Thread Christian König
On 04.07.25 12:33, Prike Liang wrote: > The userq VA unmap requires validating queue status before unamapping > it, if user tries to unmap a busy userq by GEM VA IOCTL then the > driver should report an error for this illegal usage. Clear NAK to the whole approach. We should never deny unmapping

Re: [PATCH v5 7/9] drm/amdgpu: add userq va unmap validated helper

2025-07-07 Thread Christian König
On 04.07.25 12:33, Prike Liang wrote: > This helper can validate the userq whether can be > unmapped prior to the userq VA GEM unmap. > > Signed-off-by: Prike Liang > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c | 78 +++ > drivers/gpu/drm/amd/amdgpu/amdgpu_userq.h | 3

Re: [PATCH v5 3/9] drm/amdgpu: rework the userq doorbell object destroy

2025-07-07 Thread Christian König
On 04.07.25 12:33, Prike Liang wrote: > This patch aims to unify and destroy the userq doorbell objects at > mes_userq_mqd_destroy(), and this change will also help with unpinning > and destroying the userq doorbell objects for amdgpu_userq_mgr_fini() > during releasing the drm files. > > Signe

Re: [PATCH] drm/amdgpu: Replace HQD terminology with slots naming

2025-07-07 Thread Christian König
he generic nature of the resource counting > 2. Updates the UAPI struct member from `userq_num_hqds` to `userq_num_slots` > 3. Maintains the same functionality while using more appropriate terminology > > Signed-off-by: Jesse Zhang Acked-by: Christian König BTW: Why us

Re: [PATCH v2 2/5] drm/amdgpu: move GTT to shmem after eviction for hibernation

2025-07-07 Thread Christian König
On 04.07.25 12:12, Samuel Zhang wrote: > When hibernate with data center dGPUs, huge number of VRAM BOs evicted > to GTT and takes too much system memory. This will cause hibernation > fail due to insufficient memory for creating the hibernation image. > > Move GTT BOs to shmem in KMD, then shm

Re: [PATCH v2 1/5] drm/ttm: add ttm_device_prepare_hibernation() api

2025-07-07 Thread Christian König
On 04.07.25 12:12, Samuel Zhang wrote: > This new api is used for hibernation to move GTT BOs to shmem after > VRAM eviction. shmem will be flushed to swap disk later to reduce > the system memory usage for hibernation. > > Signed-off-by: Samuel Zhang > --- > drivers/gpu/drm/ttm/ttm_device.c | 2

Re: [PATCH] drm/amdgpu: Fix lifetime of struct amdgpu_task_info after ring reset

2025-07-04 Thread Christian König
rg/dri-devel/CAPM=9tz0rQP8VZWKWyuF8kUMqRScxqoa6aVdwWw9=5yyxyy...@mail.gmail.com/ > Signed-off-by: André Almeida Reviewed-by: Christian König > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 5 ++--- > 1 file changed, 2 insertions(+), 3 deletions(-) > > diff --git a/drivers/

Re: [PATCH v1 1/3] drm/buddy: add a flag to disable trimming of non cleared blocks

2025-07-03 Thread Christian König
On 02.07.25 18:12, Pierre-Eric Pelloux-Prayer wrote: > A vkcts test case is triggering a case where the drm buddy allocator > wastes lots of memory and performs badly: > > dEQP-VK.memory.allocation.basic.size_8KiB.reverse.count_4000 > > For each memory pool type, the test will allocate 4000 8kB

Re: [PATCH v9 3/4] drm/amdgpu: add debugfs support for VM pagetable per client

2025-07-03 Thread Christian König
0 for success, error for failure. > */ > int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm, > -int32_t xcp_id) > +int32_t xcp_id, struct drm_file *file) > { > struct amdgpu_bo *root_bo; > struct amdgpu_bo_vm *ro

Re: [PATCH 1/2] Revert "drm/amdgpu: fix slab-use-after-free in amdgpu_userq_mgr_fini+0x70c"

2025-07-03 Thread Christian König
will be fixed in the next patch. > > Cc: Alex Deucher > Cc: Christian Koenig > Signed-off-by Vitaly Prosyak Reviewed-by: Christian König > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 16 +++- > drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 3 --- > 2

Re: [PATCH] drm/amdgpu: delete function amdgpu_flush

2025-07-03 Thread Christian König
On 02.07.25 18:40, Philip Yang wrote: > > On 2025-07-01 03:28, Christian König wrote: >> Clear NAK to removing this! >> >> The amdgpu_flush function is vital for correct operation. > no fflush call from libdrm/amdgpu, so amdgpu_flush is only called from fclose ->

Re: [PATCH v2 1/3] drm/amdgpu: Dirty cleared blocks on free

2025-07-02 Thread Christian König
On 02.07.25 13:58, Arunpravin Paneer Selvam wrote: > Hi Christian, > > On 7/2/2025 1:27 PM, Christian König wrote: >> On 01.07.25 21:08, Arunpravin Paneer Selvam wrote: >>> Set the dirty bit when the memory resource is not cleared >>> during BO release. >>

[PATCH 2/2] drm/ttm: replace TTMs refcount with the DRM refcount v2

2025-07-02 Thread Christian König
re-enable disabled test Signed-off-by: Christian König --- .../gpu/drm/ttm/tests/ttm_bo_validate_test.c | 8 +- drivers/gpu/drm/ttm/tests/ttm_kunit_helpers.c | 2 - drivers/gpu/drm/ttm/ttm_bo.c | 148 +- drivers/gpu/drm/ttm/ttm_bo_internal.h | 9

[PATCH 1/2] drm/ttm: rename ttm_bo_put to _fini

2025-07-02 Thread Christian König
Give TTM BOs a separate cleanup function. The next step in removing the TTM BO reference counting and replacing it with the GEM object reference counting. Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 2 +- drivers/gpu/drm/drm_gem_vram_helper.c
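A toy model of the direction this series describes: a single refcount embedded in the base (GEM) object, with the BO-specific teardown done in a separate fini function called from the release path. All names here are hypothetical; this is not the TTM or GEM API, and `container_of` is defined locally to mirror the kernel macro of the same name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical base object: the only refcount lives here. */
struct gem_obj {
	unsigned int refcount;
	void (*free)(struct gem_obj *obj);
};

/* Hypothetical buffer object embedding the base object. */
struct bo {
	struct gem_obj base;
	size_t size;
};

static int bo_fini_count;	/* makes the teardown observable */

static void gem_get(struct gem_obj *obj) { obj->refcount++; }

static void gem_put(struct gem_obj *obj)
{
	if (--obj->refcount == 0)
		obj->free(obj);	/* last reference gone */
}

/* BO-specific cleanup, separate from reference counting. */
static void bo_fini(struct bo *bo)
{
	(void)bo;
	bo_fini_count++;	/* backing storage would be released here */
}

static void bo_free(struct gem_obj *obj)
{
	struct bo *bo = container_of(obj, struct bo, base);

	bo_fini(bo);
	free(bo);
}

static struct bo *bo_create(void)
{
	struct bo *bo = calloc(1, sizeof(*bo));

	if (!bo)
		return NULL;
	bo->base.refcount = 1;
	bo->base.free = bo_free;
	return bo;
}
```

Every holder takes and drops references through the base object, so there is no second counter that can drift out of sync with the first.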

Switching TTM over to GEM refcounts v2

2025-07-02 Thread Christian König
Hi everyone, v2 of this patch set. I've either pushed or removed the other patches from v1, so only two remain. Pretty straightforward conversion and shouldn't result in any visible technical difference. Please review and/or comment. Regards, Christian.

Re: [PATCH 1/3] drm/amdgpu: move GTT to SHM after eviction for hibernation

2025-07-02 Thread Christian König
On 02.07.25 09:28, Samuel Zhang wrote: > > On 2025/7/1 16:22, Christian König wrote: >> On 01.07.25 10:18, Zhang, GuoQing (Sam) wrote: >>> [AMD Official Use Only - AMD Internal Distribution Only] >>> >>> >>> Hi Christian, >>>

Re: [PATCH v2 1/3] drm/amdgpu: Dirty cleared blocks on free

2025-07-02 Thread Christian König
ff-by: Arunpravin Paneer Selvam > Suggested-by: Christian König > Cc: sta...@vger.kernel.org > Fixes: a68c7eaa7a8f ("drm/amdgpu: Enable clear page functionality") > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 1 - > drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mg

Re: [PATCH v2] drm/amdgpu: Unify Device Aperture in amdgpu_info_ioctl for KGD/KFD

2025-07-01 Thread Christian König
be 0x or 0xffff to avoid sign extension problems. (Christian) >> >> Cc: David Yat Sin >> Cc: Christian König >> Cc: Alex Deucher >> Signed-off-by: Srinivasan Shanmugam > > Reviewed-by: Alex Deucher Reviewed-by: Christian König as well. > But don't

Re: [PATCH v4 10/11] drm/amdgpu: only bound the eviction fence to userq bo

2025-07-01 Thread Christian König
On 01.07.25 15:21, Liang, Prike wrote: > [Public] > > Regards, > Prike > >> -Original Message- >> From: Koenig, Christian >> Sent: Wednesday, June 25, 2025 3:50 PM >> To: Liang, Prike ; amd-gfx@lists.freedesktop.org >> Cc: Deucher, Alexander >> Subject: Re: [PATCH v4 10/11] drm/am

Re: [PATCH v4 07/11] drm/amdgpu: add user queue vm identifier

2025-07-01 Thread Christian König
On 01.07.25 15:12, Liang, Prike wrote: > [Public] > > Regards, > Prike > >> -Original Message- >> From: Koenig, Christian >> Sent: Wednesday, June 25, 2025 3:52 PM >> To: Liang, Prike ; amd-gfx@lists.freedesktop.org >> Cc: Deucher, Alexander >> Subject: Re: [PATCH v4 07/11] drm/am

Re: [PATCH 3/3] drm/amdgpu: skip kfd resume_process for dev_pm_ops.thaw()

2025-07-01 Thread Christian König
at approach here looks fishy to me, but I don't know how to properly fix it either. @Alex any idea? Regards, Christian. > > > Regards > Sam > > > On 2025/6/30 19:58, Christian König wrote: >> On 30.06.25 12:41, Samuel Zhang wrote: >>> The hibernation su

Re: [PATCH 1/3] drm/amdgpu: move GTT to SHM after eviction for hibernation

2025-07-01 Thread Christian König
On 01.07.25 10:18, Zhang, GuoQing (Sam) wrote: > [AMD Official Use Only - AMD Internal Distribution Only] > > Hi Christian, > > Thank you for the feedback. > > For “return ret < 0 ? ret : 0;”, it is equivalent to “return ret;” since ret > is always <= 0 after the loop. No it i

Re: [PATCH] drm/amdgpu: delete function amdgpu_flush

2025-07-01 Thread Christian König
Clear NAK to removing this! The amdgpu_flush function is vital for correct operation. The intention is to block closing the file handle in child processes and wait for all previous operations to complete. Regards, Christian. On 01.07.25 07:35, YuanShang Mao (River) wrote: > [AMD Official Use O

Re: [PATCH v7 1/5] drm: move the debugfs accel driver code to drm layer

2025-06-30 Thread Christian König
On 30.06.25 16:36, Sunil Khatri wrote: > Move the debugfs accel driver code to the drm layer > and it is an intermediate step to move all debugfs > related handling into drm_debugfs.c > > Signed-off-by: Sunil Khatri > Reviewed-by: Christian König > --- > driver

Re: [PATCH v6 2/5] drm: move debugfs functionality from drm_drv.c to drm_debugfs.c

2025-06-30 Thread Christian König
On 30.06.25 15:34, Khatri, Sunil wrote: > > On 6/30/2025 5:11 PM, Christian König wrote: >> >> On 27.06.25 11:49, Sunil Khatri wrote: >>> move the debugfs functions from drm_drv.c to drm_debugfs.c >>> >>> move this root node to the debugfs for easi

Re: [PATCH v6 5/5] drm/amdgpu: add support of debugfs for mqd information

2025-06-30 Thread Christian König
On 27.06.25 11:49, Sunil Khatri wrote: > Add debugfs support for mqd for each queue of the client. > > The address exposed to debugfs could be used to dump > the mqd. > > Signed-off-by: Sunil Khatri > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c | 52 +++ > drivers/gpu/dr

Re: [PATCH v6 4/5] drm/amdgpu: add debugfs support for VM pagetable per client

2025-06-30 Thread Christian König
On 27.06.25 11:49, Sunil Khatri wrote: > Add a debugfs file under the client directory which shares > the root page table base address of the VM. > > This address could be used to dump the pagetable for debug > memory issues. > > Signed-off-by: Sunil Khatri > --- > drivers/gpu/drm/amd/amdgpu

Re: [PATCH v6 3/5] drm: add debugfs support on per client-id basis

2025-06-30 Thread Christian König
; > Also create a debugfs file which show the process > information for the client and create a symlink back > to the parent drm device from each client. > > Signed-off-by: Sunil Khatri Reviewed-by: Christian König > --- > drivers/gpu/drm/drm_debugfs.c | 80 +++

Re: [PATCH 3/3] drm/amdgpu: skip kfd resume_process for dev_pm_ops.thaw()

2025-06-30 Thread Christian König
On 30.06.25 12:41, Samuel Zhang wrote: > The hibernation successful workflow: > - prepare: evict VRAM and swapout GTT BOs > - freeze > - create the hibernation image in system memory > - thaw: swapin and restore BOs Why should a thaw happen here in between? > - complete > - write hibernation imag

Re: [PATCH 1/3] drm/amdgpu: move GTT to SHM after eviction for hibernation

2025-06-30 Thread Christian König
On 30.06.25 12:41, Samuel Zhang wrote: > When hibernate with data center dGPUs, huge number of VRAM BOs evicted > to GTT and takes too much system memory. This will cause hibernation > fail due to insufficient memory for creating the hibernation image. > > Move GTT BOs to shmem in KMD, then shmem

Re: [PATCH v6 1/5] drm: move the debugfs accel driver code to drm layer

2025-06-30 Thread Christian König
On 27.06.25 11:49, Sunil Khatri wrote: > Move the debugfs accel driver code to the drm layer > and it is an intermediate step to move all debugfs > related handling into drm_debugfs.c > > Signed-off-by: Sunil Khatri Reviewed-by: Christian König > --- > drivers/a
