Re: [PATCH v7 2/3] drm/buddy: Separate clear and dirty free block trees

2025-10-04 Thread Arunpravin Paneer Selvam
der) +    return 0; +    } Do we need all these changes in __force_merge? Can't we just always pick the dirty tree and keep everything else the same? If something is non-merged there must be a dirty block preventing that, and when force merging everything unmerged

Re: [PATCH] drm/amdkfd: Fix svm_bo and vram page refcount

2025-10-03 Thread Philip Yang
On 2025-10-03 17:46, Felix Kuehling wrote: On 2025-10-03 17:18, Philip Yang wrote: On 2025-10-03 17:05, Felix Kuehling wrote: On 2025-09-26 17:03, Philip Yang wrote: zone_device_page_init uses set_page_count to set the vram page refcount to 1; there is a race if step 2 happens between step 1 and

Re: [PATCH] drm/amdkfd: Fix svm_bo and vram page refcount

2025-10-03 Thread Felix Kuehling
On 2025-10-03 17:18, Philip Yang wrote: On 2025-10-03 17:05, Felix Kuehling wrote: On 2025-09-26 17:03, Philip Yang wrote: zone_device_page_init uses set_page_count to set the vram page refcount to 1; there is a race if step 2 happens between step 1 and 3. 1. CPU page fault handler gets vram page,

Re: [PATCH] drm/amdkfd: Fix svm_bo and vram page refcount

2025-10-03 Thread Felix Kuehling
[+Linux MM and HMM maintainers] Please see below my question about the safety of using zone_device_page_init. On 2025-10-03 18:02, Philip Yang wrote: On 2025-10-03 17:46, Felix Kuehling wrote: On 2025-10-03 17:18, Philip Yang wrote: On 2025-10-03 17:05, Felix Kuehling wrote: On 2025-09-

Re: [PATCH 2/3] drm/amdkfd: svm unmap use page aligned address

2025-10-03 Thread Philip Yang
On 2025-10-02 18:04, Chen, Xiaogang wrote: On 10/2/2025 12:43 PM, Philip Yang wrote: svm_range_unmap_from_gpus uses page-aligned start and end addresses; the end address is inclusive. Fixes: 38c55f6719f7 ("drm/amdkfd: Handle lack of READ permissions in SVM mapping") Signed-off-by: Philip Yang

Re: [PATCH] drm/amdkfd: Fix svm_bo and vram page refcount

2025-10-03 Thread Philip Yang
On 2025-10-03 17:05, Felix Kuehling wrote: On 2025-09-26 17:03, Philip Yang wrote: zone_device_page_init uses set_page_count to set the vram page refcount to 1; there is a race if step 2 happens between step 1 and 3. 1. CPU page fault handler gets vram page, migrates the vram page to system page 2. G

Re: [PATCH] drm/amdkfd: Fix svm_bo and vram page refcount

2025-10-03 Thread Felix Kuehling
On 2025-09-26 17:03, Philip Yang wrote: zone_device_page_init uses set_page_count to set the vram page refcount to 1; there is a race if step 2 happens between step 1 and 3. 1. CPU page fault handler gets vram page, migrates the vram page to system page 2. GPU page fault migrates to the vram page, set pa

Re: [PATCH v2 3/3] drm/amdkfd: Don't stuck in svm restore worker

2025-10-03 Thread Felix Kuehling
On 2025-10-03 14:34, Chen, Xiaogang wrote: On 10/3/2025 1:27 PM, Philip Yang wrote: On 2025-10-03 14:22, Chen, Xiaogang wrote: [AMD Official Use Only - AMD Internal Distribution Only] MADV_FREE is handled at madvise_free_single_vma(madvise_dontneed_free) from madvise_vma_behavior at mm

Re: [Patch v2 1/2] drm/amdgpu: use user provided hmm_range buffer in amdgpu_ttm_tt_get_user_pages

2025-10-03 Thread Chen, Xiaogang
On 9/26/2025 5:53 AM, Khatri, Sunil wrote: On 9/24/2025 10:27 PM, Kuehling, Felix wrote: On 2025-09-24 06:01, Sunil Khatri wrote: Update amdgpu_ttm_tt_get_user_pages and all dependent functions, along with their callers, to use a user-allocated hmm_range buffer instead of having the hmm layer allocate the

Re: [PATCH] drm/amdgpu: fix handling of harvesting for ip_discovery firmware

2025-10-03 Thread Alex Deucher
Ping? On Fri, Sep 26, 2025 at 7:44 PM Alex Deucher wrote: > > Chips which use the IP discovery firmware loaded by the driver > reported incorrect harvesting information in the ip discovery > table in sysfs because the driver only uses the ip discovery > firmware for populating sysfs and not for d

Re: [PATCH] drm/amdkfd: Fix two comments in kfd_ioctl.h

2025-10-03 Thread Alex Deucher
On Fri, Oct 3, 2025 at 4:09 PM Felix Kuehling wrote: > > Queue read and write pointers are "to KFD", not "from KFD". > > Suggested-by: Robert Liu > Signed-off-by: Felix Kuehling Reviewed-by: Alex Deucher > --- > include/uapi/linux/kfd_ioctl.h | 4 ++-- > 1 file changed, 2 insertions(+), 2 de

Re: [PATCH 1/3] drm/amdgpu: svm check hmm range kzalloc return NULL

2025-10-03 Thread Chen, Xiaogang
On 10/3/2025 10:12 AM, Philip Yang wrote: On 2025-10-02 17:48, Chen, Xiaogang wrote: On 10/2/2025 12:43 PM, Philip Yang wrote: Add a NULL check on the hmm_range kzalloc return value. If get_pages fails, free hmm_range and set it to NULL to avoid a double free in get_pages_done. Fixes:

RE: [PATCH 07/19 v6.1.y] minmax: make generic MIN() and MAX() macros available everywhere

2025-10-03 Thread Farber, Eliav
On Wed, Sep 24, 2025 at 08:23:08PM +0000, Eliav Farber wrote: > From: Linus Torvalds > > [ Upstream commit 1a251f52cfdc417c84411a056bc142cbd77baef4 ] As this didn't go into 6.6.y yet, I'll stop here on this series for now. Please fix up for newer kernels first and then resend these. The fix fo

Re: [PATCH V11 06/47] drm/colorop: Add 1D Curve subtype

2025-10-03 Thread Shengyu Qu
On 2025/9/26 2:22, Harry Wentland wrote: On 2025-09-25 04:11, Pekka Paalanen wrote: On Tue, 23 Sep 2025 11:41:24 -0600 Alex Hung wrote: On 9/23/25 10:16, Alex Hung wrote: On 9/23/25 01:59, Pekka Paalanen wrote: On Mon, 22 Sep 2025 21:16:45 -0600 Alex Hung wrote: On 9/18/25 02:40, Pekka

RE: [PATCH 2/3] drm/amd: Stop overloading power limit with limit type

2025-10-03 Thread Lazar, Lijo
[Public] Patches 1 and 3 are Reviewed-by: Lijo Lazar. For this one, I suggest splitting it into two patches: one to pass the limit type and the other to save/restore the fast PPT limit. You could add LIMIT_TYPE_COUNT to enum smu_ppt_limit_type and keep an array in smu_user_dpm_profile. However, this

RE: [PATCH] drm/amd: Drop superfluous call to set_power_limit()

2025-10-03 Thread Lazar, Lijo
[Public] This is because currently only Vangogh has a Fast PPT limit. The limit for it is not the same as the default one; it will be higher than max_power_limit. Since this is Vangogh-only, it's left to the implementation to handle that check. Thanks, Lijo -----Original Message----- From: Lim

RE: [PATCH 07/19 v6.1.y] minmax: make generic MIN() and MAX() macros available everywhere

2025-10-03 Thread Farber, Eliav
> On Mon, Sep 29, 2025 at 02:39:26PM +0000, Farber, Eliav wrote: > > > On Wed, Sep 24, 2025 at 08:23:08PM +0000, Eliav Farber wrote: > > > > From: Linus Torvalds > > > > > > > > [ Upstream commit 1a251f52cfdc417c84411a056bc142cbd77baef4 ] > > > > > > > > > > > > As this didn't go into 6.6.y yet,

Re: [PATCH 07/19 v6.1.y] minmax: make generic MIN() and MAX() macros available everywhere

2025-10-03 Thread Greg KH
On Mon, Sep 29, 2025 at 02:39:26PM +0000, Farber, Eliav wrote: > > On Wed, Sep 24, 2025 at 08:23:08PM +0000, Eliav Farber wrote: > > > From: Linus Torvalds > > > > > > [ Upstream commit 1a251f52cfdc417c84411a056bc142cbd77baef4 ] > > > > > > > > As this didn't go into 6.6.y yet, I'll stop here on

RE: [PATCH 07/19 v6.1.y] minmax: make generic MIN() and MAX() macros available everywhere

2025-10-03 Thread Farber, Eliav
> On Wed, Sep 24, 2025 at 08:23:08PM +0000, Eliav Farber wrote: > > From: Linus Torvalds > > > > [ Upstream commit 1a251f52cfdc417c84411a056bc142cbd77baef4 ] > > > > As this didn't go into 6.6.y yet, I'll stop here on this series for now. > Please fix up for newer kernels first and then resend th

Re: [RFC v8 00/21] DRM scheduling cgroup controller

2025-10-03 Thread Philipp Stanner
+Cc Sima, Dave On Mon, 2025-09-29 at 16:07 +0200, Danilo Krummrich wrote: > On Wed Sep 3, 2025 at 5:23 PM CEST, Tvrtko Ursulin wrote: > > This is another respin of this old work^1 which since v7 is a total rewrite > > and > > completely changes how the control is done. > > I only got some of the

Re: [PATCH v7 3/3] drm/buddy: Add KUnit tests for allocator performance under fragmentation

2025-10-02 Thread Arunpravin Paneer Selvam
On 9/26/2025 4:30 PM, Matthew Auld wrote: On 23/09/2025 10:02, Arunpravin Paneer Selvam wrote: Add KUnit test cases that create severe memory fragmentation and measure allocation/free performance. The tests simulate two scenarios - 1. Allocation under severe fragmentation     - Allocate the

Re: [PATCH] drm: amd: Use kmalloc_array to prevent overflow of dynamic size calculation

2025-10-02 Thread kernel test robot
Hi Bhanu, kernel test robot noticed the following build warnings: [auto build test WARNING on amd-pstate/linux-next] [also build test WARNING on amd-pstate/bleeding-edge v6.17] [cannot apply to linus/master next-20251002] [If your patch is applied to the wrong git tree, kindly drop us a note. And

RE: [PATCH v5 4/8] drm/amdgpu: keeping waiting userq fence infinitely

2025-10-02 Thread Liang, Prike
[Public] Hi Alex, Apologies for overlooking your earlier review comments. I just saw that patches 1-4 have already been reviewed. Can we proceed to land the series (patches 1-6) in drm-next? Regards, Prike > -----Original Message----- > From: Liang, Prike > Sent: Monday, September 29, 2025

Re: [PATCH 2/3] drm/amdkfd: svm unmap use page aligned address

2025-10-02 Thread Chen, Xiaogang
On 10/2/2025 12:43 PM, Philip Yang wrote: svm_range_unmap_from_gpus uses page-aligned start and end addresses; the end address is inclusive. Fixes: 38c55f6719f7 ("drm/amdkfd: Handle lack of READ permissions in SVM mapping") Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 4

Re: [PATCH 1/3] drm/amdgpu: svm check hmm range kzalloc return NULL

2025-10-02 Thread Chen, Xiaogang
On 10/2/2025 12:43 PM, Philip Yang wrote: Add a NULL check on the hmm_range kzalloc return value. If get_pages fails, free hmm_range and set it to NULL to avoid a double free in get_pages_done. Fixes: 29e6f5716115 ("drm/amdgpu: use user provided hmm_range buffer in amdgpu_ttm_tt_get_user_

Re: [PATCH v4 2/2] amd/amdkfd: enhance kfd process check in switch partition

2025-10-01 Thread Philip Yang
On 2025-09-26 15:52, Chen, Xiaogang wrote: On 9/24/2025 10:29 AM, Yifan Zhang wrote: The current switch partition code only checks whether kfd_processes_table is empty. A kfd_processes_table entry is deleted in kfd_process_notifier_release, but kfd_process teardown happens in kfd_process_wq_release. Consider two

Re: [PATCH v2] drm/client: Remove holds_console_lock parameter from suspend/resume

2025-10-01 Thread Rodrigo Vivi
On Wed, Oct 01, 2025 at 04:37:03PM +0200, Thomas Zimmermann wrote: > No caller of the client resume/suspend helpers holds the console > lock. The last such cases were removed from radeon in the patch > series at [1]. Now remove the related parameter and the TODO items. > > v2: > - update placehold

Re: [PATCH] drm/client: Remove holds_console_lock parameter from suspend/resume

2025-10-01 Thread Nirmoy Das
On 01.10.25 16:39, Thomas Zimmermann wrote: Hi Am 01.10.25 um 14:46 schrieb Nirmoy Das: On 01.10.25 14:15, Thomas Zimmermann wrote: No caller of the client resume/suspend helpers holds the console lock. The last such cases were removed from radeon in the patch series at [1]. Now remove the

Re: [PATCH] drm/client: Remove holds_console_lock parameter from suspend/resume

2025-10-01 Thread Thomas Zimmermann
Hi Am 01.10.25 um 14:46 schrieb Nirmoy Das: On 01.10.25 14:15, Thomas Zimmermann wrote: No caller of the client resume/suspend helpers holds the console lock. The last such cases were removed from radeon in the patch series at [1]. Now remove the related parameter and the TODO items. Signed-o

Re: [PATCH] drm/client: Remove holds_console_lock parameter from suspend/resume

2025-10-01 Thread Nirmoy Das
On 01.10.25 14:15, Thomas Zimmermann wrote: No caller of the client resume/suspend helpers holds the console lock. The last such cases were removed from radeon in the patch series at [1]. Now remove the related parameter and the TODO items. Signed-off-by: Thomas Zimmermann Link: https://patch

Re: [Patch v2 1/2] drm/amdgpu: use user provided hmm_range buffer in amdgpu_ttm_tt_get_user_pages

2025-10-01 Thread Kuehling, Felix
On 2025-09-26 06:53, Khatri, Sunil wrote: On 9/24/2025 10:27 PM, Kuehling, Felix wrote: On 2025-09-24 06:01, Sunil Khatri wrote: Update amdgpu_ttm_tt_get_user_pages and all dependent functions, along with their callers, to use a user-allocated hmm_range buffer instead of having the hmm layer allocate the buf

Re: [PATCH 3/3] drm/amdgpu: Add amdgpu drm ras ioctl for ras module

2025-10-01 Thread Alex Deucher
On Tue, Sep 30, 2025 at 4:27 AM YiPeng Chai wrote: > > Add amdgpu drm ras ioctl for ras module. Please describe the IOCTL and how it is used and what functionality it provides. Additionally please provide a link to the proposed open source userspace tools that will use it. We can't merge the ke

Re: [PATCH V4 17/18] amdkfd: set_debug_trap ioctl only works on a primary kfd_process target

2025-09-30 Thread Kuehling, Felix
On 2025-09-25 22:49, Zhu, Lingshan wrote: On 9/25/2025 5:50 AM, Kuehling, Felix wrote: On 2025-09-23 03:26, Zhu Lingshan wrote: The user space program passes down a pid to kfd through the set_debug_trap ioctl, which can help find the corresponding user space program and its mm struct. However, this

Re: [PATCH V11 17/47] drm/vkms: Use s32 for internal color pipeline precision

2025-09-30 Thread Harry Wentland
On 2025-09-30 03:07, Pekka Paalanen wrote: On Thu, 14 Aug 2025 21:50:06 -0600 Alex Hung wrote: From: Harry Wentland Certain operations require us to preserve values below 0.0 and above 1.0 (0x0 and 0xffff respectively in 16 bpc unorm). One such operation is a BT709 encoding operation foll

Re: DRM Jobqueue design (was "[RFC v8 00/21] DRM scheduling cgroup controller")

2025-09-30 Thread Danilo Krummrich
On Tue Sep 30, 2025 at 11:00 AM CEST, Philipp Stanner wrote: > +Cc Sima, Dave > > On Mon, 2025-09-29 at 16:07 +0200, Danilo Krummrich wrote: >> On Wed Sep 3, 2025 at 5:23 PM CEST, Tvrtko Ursulin wrote: >> > This is another respin of this old work^1 which since v7 is a total >> > rewrite and >> > c

RE: [PATCH v5 4/8] drm/amdgpu: keeping waiting userq fence infinitely

2025-09-30 Thread Liang, Prike
Original Message- > From: amd-gfx On Behalf Of Liang, > Prike > Sent: Tuesday, September 30, 2025 4:20 PM > To: amd-gfx@lists.freedesktop.org > Cc: Deucher, Alexander ; Koenig, Christian > > Subject: RE: [PATCH v5 4/8] drm/amdgpu: keeping waiting userq fence infini

Re: [PATCH 3/3] drm/amd/display: Set stricter clock dividers on DCE 6-10

2025-09-28 Thread timur . kristof
On Sun, 2025-09-28 at 16:14 +0200, Christian König wrote: > > > On 26.09.25 20:26, Timur Kristóf wrote: > > Set stricter dividers to stabilize the PLL's feedback loop. > > In practice, the actual output isn't exactly the target > > clock, but slowly oscillates around it. This makes it > > more st

Re: [PATCH 0/5] Adjustments to common mode behavior

2025-09-27 Thread Timur Kristóf
On 9/24/25 19:48, Mario Limonciello wrote: A slightly related question, would you be OK with changing the link detection code to return dc_connection_none when DDC cannot read an EDID header on digital signals, similar to how the non-DC code does it? I personally think lining up all th

Re: [RFC 1/2] drm/ttm: Allow drivers to specify maximum beneficial TTM pool size

2025-09-27 Thread Christian König
On 19.09.25 15:11, Tvrtko Ursulin wrote: > GPUs typically benefit from contiguous memory via reduced TLB pressure and > improved caching performance, where the maximum size of contiguous block > which adds a performance benefit is related to hardware design. > > TTM pool allocator by default tr

Re: [RFC v8 11/12] drm/sched: Remove FIFO and RR and simplify to a single run queue

2025-09-27 Thread Philipp Stanner
> > GFP_KERNEL); > > > - if (!sched->sched_rq[i]) > > > - goto Out_unroll; > > > - drm_sched_rq_init(sched, sched->sched_rq[i]); > > > - } > > > + drm_sched_rq_init(sched, sched->rq); > > > >

Re: [RFC v8 07/12] drm/sched: Account entity GPU time

2025-09-27 Thread Philipp Stanner
On Thu, 2025-09-25 at 12:52 +0100, Tvrtko Ursulin wrote: > > On 24/09/2025 10:11, Philipp Stanner wrote: > > On Wed, 2025-09-03 at 11:18 +0100, Tvrtko Ursulin wrote: > > > To implement fair scheduling we need a view into the GPU time consumed by > > > entities. The problem we have is that jobs and ent

Re: [Patch v2 1/2] drm/amdgpu: use user provided hmm_range buffer in amdgpu_ttm_tt_get_user_pages

2025-09-26 Thread Kuehling, Felix
On 2025-09-24 06:01, Sunil Khatri wrote: Update amdgpu_ttm_tt_get_user_pages and all dependent functions, along with their callers, to use a user-allocated hmm_range buffer instead of having the hmm layer allocate the buffer. This is needed to make hmm_range pointers easily accessible without accessing the bo a

RE: [PATCH] drm/amdkfd: Fix svm_bo and vram page refcount

2025-09-26 Thread Kasiviswanathan, Harish
[AMD Official Use Only - AMD Internal Distribution Only] Acked-by: Harish Kasiviswanathan -----Original Message----- From: amd-gfx On Behalf Of Philip Yang Sent: Friday, September 26, 2025 5:04 PM To: amd-gfx@lists.freedesktop.org Cc: Kuehling, Felix ; Yang, Philip Subject: [PATCH] drm/amdkf

Re: [PATCH v4 2/2] amd/amdkfd: enhance kfd process check in switch partition

2025-09-26 Thread Chen, Xiaogang
On 9/24/2025 10:29 AM, Yifan Zhang wrote: The current switch partition code only checks whether kfd_processes_table is empty. A kfd_processes_table entry is deleted in kfd_process_notifier_release, but kfd_process teardown happens in kfd_process_wq_release. Consider two processes: Process A (workqueue) -> kfd_proc

Re: [PATCH] drm/amd/amdgpu: Fix the mes version that support inv_tlbs

2025-09-26 Thread Chen, Michael
[AMD Official Use Only - AMD Internal Distribution Only] Reviewed-by: Michael Chen From: amd-gfx on behalf of Shaoyun Liu Sent: Wednesday, September 24, 2025 9:43 PM To: amd-gfx@lists.freedesktop.org Cc: Liu, Shaoyun Subject: [PATCH] drm/amd/amdgpu: Fix the m

Re: [PATCH v7 2/3] drm/buddy: Separate clear and dirty free block trees

2025-09-26 Thread Matthew Auld
lock, true); + if (order >= min_order) + return 0; + } Do we need all these changes in __force_merge? Can't we just always pick the dirty tree and keep everything else the same? If something is no

RE: [PATCH] drm/amd/amdgpu: Fix the mes version that support inv_tlbs

2025-09-26 Thread Liu, Shaoyun
[AMD Official Use Only - AMD Internal Distribution Only] ping -----Original Message----- From: Liu, Shaoyun Sent: Wednesday, September 24, 2025 9:44 PM To: amd-gfx@lists.freedesktop.org Cc: Liu, Shaoyun Subject: [PATCH] drm/amd/amdgpu: Fix the mes version that support inv_tlbs MES version 0x83

Re: [PATCH v7 1/3] drm/buddy: Optimize free block management with RB tree

2025-09-26 Thread Matthew Auld
On 23/09/2025 10:02, Arunpravin Paneer Selvam wrote: Replace the freelist (O(n)) used for free block management with a red-black tree, providing more efficient O(log n) search, insert, and delete operations. This improves scalability and performance when managing large numbers of free blocks per

Re: [PATCH v4 1/2] amd/amdkfd: resolve a race in amdgpu_amdkfd_device_fini_sw

2025-09-26 Thread Chen, Xiaogang
3:11 AM To: Yang, Philip ; Zhang, Yifan ; amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander ; Kuehling, Felix ; Yang, Philip ; Lazar, Lijo Subject: Re: [PATCH v4 1/2] amd/amdkfd: resolve a race in amdgpu_amdkfd_device_fini_sw On 9/24/2025 5:48 PM, Philip Yang wrote: On 2025-09-24 11:29, Y

Re: [PATCH v4 1/2] amd/amdkfd: resolve a race in amdgpu_amdkfd_device_fini_sw

2025-09-26 Thread Zhang, Yifan
terrupt_lock, flags); - kfifo_free(&node->ih_fifo); -} - From: Lazar, Lijo Sent: Friday, September 26, 2025 2:49 PM To: Zhang, Yifan ; Yang, Philip ; amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander ; Kuehling, Felix Subject: RE: [PATCH v4

Re: [v2] drm/amdgpu: Merge amdgpu_vm_set_pasid into amdgpu_vm_init

2025-09-26 Thread Christian König
On 26.09.25 09:58, Jesse.Zhang wrote: > As KFD no longer uses a separate PASID, the global > amdgpu_vm_set_pasid() function is no longer necessary. > Merge its functionality directly into amdgpu_vm_init() to simplify code flow > and eliminate redundant locking. > > v2: remove superfluous check >

Re: [PATCH v7 3/3] drm/buddy: Add KUnit tests for allocator performance under fragmentation

2025-09-26 Thread Matthew Auld
On 23/09/2025 10:02, Arunpravin Paneer Selvam wrote: Add KUnit test cases that create severe memory fragmentation and measure allocation/free performance. The tests simulate two scenarios - 1. Allocation under severe fragmentation - Allocate the entire 4 GiB space as 8 KiB blocks with 64 Ki

Re: [Patch v2 1/2] drm/amdgpu: use user provided hmm_range buffer in amdgpu_ttm_tt_get_user_pages

2025-09-26 Thread Khatri, Sunil
On 9/24/2025 10:27 PM, Kuehling, Felix wrote: On 2025-09-24 06:01, Sunil Khatri wrote: Update amdgpu_ttm_tt_get_user_pages and all dependent functions, along with their callers, to use a user-allocated hmm_range buffer instead of having the hmm layer allocate the buffer. This is needed to get hmm_range poin

Re: [PATCH] drm/amdgpu: notify amdgpu gpu reset state via uevent

2025-09-26 Thread Lazar, Lijo
26, 2025 3:13:39 PM To: Lazar, Lijo ; amd-gfx@lists.freedesktop.org Cc: Zhang, Hawking ; Deucher, Alexander Subject: RE: [PATCH] drm/amdgpu: notify amdgpu gpu reset state via uevent [Public] >> I guess the primary reason to have drm_ event and amdgpu having that is >> b

RE: [PATCH] drm/amdgpu: notify amdgpu gpu reset state via uevent

2025-09-26 Thread Wang, Yang(Kevin)
edesktop.org Cc: Zhang, Hawking ; Deucher, Alexander Subject: Re: [PATCH] drm/amdgpu: notify amdgpu gpu reset state via uevent [Public] The intention is to notify users of the device about the event. I guess the primary reason to have drm_ event and amdgpu having that is because all the

Re: [PATCH] drm/amdgpu: notify amdgpu gpu reset state via uevent

2025-09-26 Thread Lazar, Lijo
Yang(Kevin) Sent: Friday, September 26, 2025 1:04:56 PM To: Lazar, Lijo ; amd-gfx@lists.freedesktop.org Cc: Zhang, Hawking ; Deucher, Alexander Subject: RE: [PATCH] drm/amdgpu: notify amdgpu gpu reset state via uevent [Public] KERNEL[173.150476] change /devices/pci0000:00/0000:00:03.1/0000:03

Re: [PATCH 0/3] Fixes for hybrid sleep

2025-09-26 Thread Kenneth Crudup
(Thanks Mario for the hint re: replying via Lore!) On 9/24/25 13:52, Mario Limonciello (AMD) wrote: Ionut Nechita recently reported a hibernate failure, but debugging showed it's actually not a hibernate failure but a hybrid sleep failure. Multiple changes related to the change of

Re: [Patch v1] drm/amdgpu/userqueue: validate userptrs for userqueues

2025-09-26 Thread Christian König
On 25.09.25 11:01, Sunil Khatri wrote: > userptrs could be changed by the user at any time and > hence, while locking all the bos before the GPU starts processing, > validate all the userptr bos. > > Signed-off-by: Sunil Khatri > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c | 76

RE: [PATCH] drm/amdgpu: notify amdgpu gpu reset state via uevent

2025-09-26 Thread Wang, Yang(Kevin)
ember 26, 2025 14:55 To: Wang, Yang(Kevin) ; amd-gfx@lists.freedesktop.org Cc: Zhang, Hawking ; Deucher, Alexander Subject: RE: [PATCH] drm/amdgpu: notify amdgpu gpu reset state via uevent [Public] Presently, there is also this one: drm_dev_wedged_event. Perhaps it's better to modify t

RE: [PATCH] drm/amdgpu: notify amdgpu gpu reset state via uevent

2025-09-25 Thread Lazar, Lijo
[Public] Presently, there is also this one: drm_dev_wedged_event. Perhaps it's better to modify it to include additional info, like pre- and post-reset along with the cause of the reset? Thanks, Lijo -----Original Message----- From: amd-gfx On Behalf Of Yang Wang Sent: Friday, September 26, 2025 12:0

RE: [PATCH v4 1/2] amd/amdkfd: resolve a race in amdgpu_amdkfd_device_fini_sw

2025-09-25 Thread Lazar, Lijo
no more queueing is allowed from any node. Thanks, Lijo -----Original Message----- From: Zhang, Yifan Sent: Thursday, September 25, 2025 3:25 PM To: Lazar, Lijo ; Yang, Philip ; amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander ; Kuehling, Felix Subject: RE: [PATCH v4 1

Re: [PATCH v7 1/3] drm/buddy: Optimize free block management with RB tree

2025-09-25 Thread Arunpravin Paneer Selvam
Hi Matthew, Ping ? Regards, Arun. On 9/23/2025 2:32 PM, Arunpravin Paneer Selvam wrote: Replace the freelist (O(n)) used for free block management with a red-black tree, providing more efficient O(log n) search, insert, and delete operations. This improves scalability and performance when mana

Re: [PATCH V4 14/18] amdkfd: record kfd process id into kfd process_info

2025-09-25 Thread Zhu, Lingshan
On 9/25/2025 5:45 AM, Kuehling, Felix wrote: > On 2025-09-23 03:26, Zhu Lingshan wrote: >> This commit records the id of the owner >> kfd_process into a kfd process_info when >> create it. >> >> Signed-off-by: Zhu Lingshan >> --- >>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h   | 2 ++ >>   d

Re: [PATCH V4 08/18] amdkfd: identify a secondary kfd process by its id

2025-09-25 Thread Zhu, Lingshan
On 9/25/2025 5:41 AM, Kuehling, Felix wrote: > On 2025-09-23 03:25, Zhu Lingshan wrote: >> This commit introduces a new id field for >> struct kfd process, which helps identify >> a kfd process among multiple contexts that >> all belong to a single user space program. >> >> The sysfs entry of a se

Re: [PATCH V4 17/18] amdkfd: set_debug_trap ioctl only works on a primary kfd_process target

2025-09-25 Thread Zhu, Lingshan
On 9/25/2025 5:50 AM, Kuehling, Felix wrote: > On 2025-09-23 03:26, Zhu Lingshan wrote: >> The user space program passes down a pid to kfd >> through the set_debug_trap ioctl, which can help >> find the corresponding user space program and >> its mm struct. >> >> However, this information is insufficie

RE: [PATCH v4 1/2] amd/amdkfd: resolve a race in amdgpu_amdkfd_device_fini_sw

2025-09-25 Thread Zhang, Yifan
5 3:11 AM To: Yang, Philip ; Zhang, Yifan ; amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander ; Kuehling, Felix ; Yang, Philip ; Lazar, Lijo Subject: Re: [PATCH v4 1/2] amd/amdkfd: resolve a race in amdgpu_amdkfd_device_fini_sw On 9/24/2025 5:48 PM, Philip Yang wrote: > > On 2025-09-24

Re: [PATCH v2 1/3] PM: hibernate: Fix hybrid-sleep

2025-09-25 Thread Rafael J. Wysocki
On Thu, Sep 25, 2025 at 5:59 PM Mario Limonciello (AMD) wrote: > > Hybrid sleep will hibernate the system followed by running through > the suspend routine. Since both the hibernate and the suspend routine > will call pm_restrict_gfp_mask(), pm_restore_gfp_mask() must be called > before starting

Re: [PATCH] drm/amd: Check whether secure display TA loaded successfully

2025-09-25 Thread Mario Limonciello
On 9/25/2025 4:16 PM, Alex Deucher wrote: On Thu, Sep 25, 2025 at 3:50 PM Mario Limonciello wrote: On 9/25/2025 2:46 PM, Alex Deucher wrote: On Thu, Sep 25, 2025 at 3:39 PM Mario Limonciello wrote: [Why] Not all renoir hardware supports secure display. If the TA is present but the fe

Re: [PATCH v2 1/3] PM: hibernate: Fix hybrid-sleep

2025-09-25 Thread Mario Limonciello (AMD) (kernel.org)
On 9/25/2025 12:55 PM, Rafael J. Wysocki wrote: On Thu, Sep 25, 2025 at 7:51 PM Rafael J. Wysocki wrote: On Thu, Sep 25, 2025 at 7:47 PM Rafael J. Wysocki wrote: On Thu, Sep 25, 2025 at 5:59 PM Mario Limonciello (AMD) wrote: Hybrid sleep will hibernate the system followed by running t

Re: [PATCH 2/5] drm/amd/display: Add missing DCE6 SCL_HORZ_FILTER_INIT* SRIs

2025-09-25 Thread Timur Kristóf
Alex Deucher wrote (Thu, Sep 25, 2025, 23:28): > On Thu, Sep 25, 2025 at 2:45 PM Timur Kristóf > wrote: > > > > Without these, it's impossible to program these registers. > > > > Fixes: 102b2f587ac8 ("drm/amd/display: dce_transform: DCE6 Scaling > Horizontal Filter Init (v2)") > >

Re: [PATCH] drm/amd: Check whether secure display TA loaded successfully

2025-09-25 Thread Alex Deucher
On Thu, Sep 25, 2025 at 5:47 PM Mario Limonciello wrote: > > > > On 9/25/2025 4:16 PM, Alex Deucher wrote: > > On Thu, Sep 25, 2025 at 3:50 PM Mario Limonciello > > wrote: > >> > >> > >> > >> On 9/25/2025 2:46 PM, Alex Deucher wrote: > >>> On Thu, Sep 25, 2025 at 3:39 PM Mario Limonciello > >>>

Re: [PATCH 2/5] drm/amd/display: Add missing DCE6 SCL_HORZ_FILTER_INIT* SRIs

2025-09-25 Thread Alex Deucher
On Thu, Sep 25, 2025 at 5:33 PM Timur Kristóf wrote: > > > > Alex Deucher wrote (Thu, Sep 25, 2025, > 23:28): >> >> On Thu, Sep 25, 2025 at 2:45 PM Timur Kristóf >> wrote: >> > >> > Without these, it's impossible to program these registers. >> > >> > Fixes: 102b2f587ac8 ("drm/am

Re: [PATCH 2/5] drm/amd/display: Add missing DCE6 SCL_HORZ_FILTER_INIT* SRIs

2025-09-25 Thread Alex Deucher
On Thu, Sep 25, 2025 at 2:45 PM Timur Kristóf wrote: > > Without these, it's impossible to program these registers. > > Fixes: 102b2f587ac8 ("drm/amd/display: dce_transform: DCE6 Scaling Horizontal > Filter Init (v2)") > Signed-off-by: Timur Kristóf I think it would make sense to just squash pa

Re: [PATCH] drm/amd: Check whether secure display TA loaded successfully

2025-09-25 Thread Alex Deucher
On Thu, Sep 25, 2025 at 3:50 PM Mario Limonciello wrote: > > > > On 9/25/2025 2:46 PM, Alex Deucher wrote: > > On Thu, Sep 25, 2025 at 3:39 PM Mario Limonciello > > wrote: > >> > >> [Why] > >> Not all renoir hardware supports secure display. If the TA is present > >> but the feature isn't suppor

Re: [PATCH] drm/amd: Check whether secure display TA loaded successfully

2025-09-25 Thread Mario Limonciello
On 9/25/2025 2:46 PM, Alex Deucher wrote: On Thu, Sep 25, 2025 at 3:39 PM Mario Limonciello wrote: [Why] Not all renoir hardware supports secure display. If the TA is present but the feature isn't supported it will fail to load or send commands. This shows ERR messages to the user that mak

Re: [PATCH] drm/amd: Check whether secure display TA loaded successfully

2025-09-25 Thread Alex Deucher
On Thu, Sep 25, 2025 at 3:39 PM Mario Limonciello wrote: > > [Why] > Not all renoir hardware supports secure display. If the TA is present > but the feature isn't supported it will fail to load or send commands. > This shows ERR messages to the user that make it seem like there is > a problem. >

Re: [PATCH v3 0/3] Fixes for hybrid sleep

2025-09-25 Thread Rafael J. Wysocki
On Thu, Sep 25, 2025 at 8:51 PM Mario Limonciello (AMD) wrote: > > From: Mario Limonciello > > Ionut Nechita recently reported a hibernate failure, but debugging > showed it's actually not a hibernate failure but a hybrid sleep > failure. > > Multiple changes related to the change of when

Re: [PATCH v4 1/2] amd/amdkfd: resolve a race in amdgpu_amdkfd_device_fini_sw

2025-09-25 Thread Chen, Xiaogang
On 9/24/2025 5:48 PM, Philip Yang wrote: On 2025-09-24 11:29, Yifan Zhang wrote: There is a race between amdgpu_amdkfd_device_fini_sw and interrupt handling: amdgpu_amdkfd_device_fini_sw runs between kfd_cleanup_nodes and kfree(kfd), and a KGD interrupt is generated. Kernel panic log: BUG: kernel NULL point

Re: [PATCH V11 06/47] drm/colorop: Add 1D Curve subtype

2025-09-25 Thread Harry Wentland
On 2025-09-25 04:11, Pekka Paalanen wrote: On Tue, 23 Sep 2025 11:41:24 -0600 Alex Hung wrote: On 9/23/25 10:16, Alex Hung wrote: On 9/23/25 01:59, Pekka Paalanen wrote: On Mon, 22 Sep 2025 21:16:45 -0600 Alex Hung wrote: On 9/18/25 02:40, Pekka Paalanen wrote: ... The problem

Re: [PATCH v2 1/3] PM: hibernate: Fix hybrid-sleep

2025-09-25 Thread Rafael J. Wysocki
On Thu, Sep 25, 2025 at 7:51 PM Rafael J. Wysocki wrote: > > On Thu, Sep 25, 2025 at 7:47 PM Rafael J. Wysocki wrote: > > > > On Thu, Sep 25, 2025 at 5:59 PM Mario Limonciello (AMD) > > wrote: > > > > > > Hybrid sleep will hibernate the system followed by running through > > > the suspend routin

Re: [PATCH v2 1/3] PM: hibernate: Fix hybrid-sleep

2025-09-25 Thread Rafael J. Wysocki
On Thu, Sep 25, 2025 at 7:47 PM Rafael J. Wysocki wrote: > > On Thu, Sep 25, 2025 at 5:59 PM Mario Limonciello (AMD) > wrote: > > > > Hybrid sleep will hibernate the system followed by running through > > the suspend routine. Since both the hibernate and the suspend routine > > will call pm_rest

Re: [RFC v8 07/12] drm/sched: Account entity GPU time

2025-09-25 Thread Tvrtko Ursulin
On 24/09/2025 10:11, Philipp Stanner wrote: On Wed, 2025-09-03 at 11:18 +0100, Tvrtko Ursulin wrote: To implement fair scheduling we need a view into the GPU time consumed by entities. Problem we have is that jobs and entities objects have decoupled lifetimes, where at the point we have a view

Re: [PATCH 1/3] PM: hibernate: Fix hybrid-sleep

2025-09-25 Thread kernel test robot
Hi Mario, kernel test robot noticed the following build errors: [auto build test ERROR on amd-pstate/linux-next] [also build test ERROR on amd-pstate/bleeding-edge linus/master v6.17-rc7 next-20250924] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting pa

Re: [PATCH] drm/amdgpu: Fix for GPU reset being blocked by KIQ I/O.

2025-09-25 Thread Philipp Stanner
On Thu, 2025-09-25 at 17:43 +0800, Heng Zhou wrote: > There is some probability that reset workqueue is blocked by KIQ I/O for 10+ > seconds after gpu hangs. > So we need to add a in_reset check during each KIQ register poll. > > Signed-off-by: Heng Zhou > --- You should create such patches wit

Re: [PATCH v2 0/3] Fixes for hybrid sleep

2025-09-25 Thread Rafael J. Wysocki
On Thu, Sep 25, 2025 at 5:59 PM Mario Limonciello (AMD) wrote: > > Ionut Nechita reported recently a hibernate failure, but in debugging > the issue it's actually not a hibernate failure; but a hybrid sleep > failure. > > Multiple changes related to the change of when swap is disabled in > the sus

Re: [RFC v8 12/12] drm/sched: Embed run queue singleton into the scheduler

2025-09-25 Thread Tvrtko Ursulin
On 24/09/2025 13:01, Philipp Stanner wrote: On Wed, 2025-09-03 at 11:18 +0100, Tvrtko Ursulin wrote: Now that the run queue to scheduler relationship is always 1:1 we can embed it (the run queue) directly in the scheduler struct and save on some allocation error handling code and such. Looks

Re: [PATCH 2/3] PM: hibernate: Add pm_hibernation_mode_is_suspend()

2025-09-25 Thread kernel test robot
Hi Mario, kernel test robot noticed the following build warnings: [auto build test WARNING on amd-pstate/linux-next] [also build test WARNING on amd-pstate/bleeding-edge linus/master v6.17-rc7 next-20250924] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitt

Re: [PATCH 0/3] DC: Reject too high pixel clocks on DCE6-10

2025-09-25 Thread Mario Limonciello
On 9/24/2025 6:38 AM, Timur Kristóf wrote: Reject modes with a pixel clock higher than the maximum display clock. These were never supported, but we haven't noticed the issue until the YCbCr 422 fallback was recently added. For example, the DP 1.2 standard technically supports 4K 120Hz YCbCr 422

RE: [PATCH v4 1/2] amd/amdkfd: resolve a race in amdgpu_amdkfd_device_fini_sw

2025-09-25 Thread Lazar, Lijo
ar, Lijo ; Yang, Philip ; amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander ; Kuehling, Felix Subject: RE: [PATCH v4 1/2] amd/amdkfd: resolve a race in amdgpu_amdkfd_device_fini_sw [Public] The interrupts are from KGD, still active after flush ih_wq and kfd_dev is freed. And aft

Re: [RFC v8 11/12] drm/sched: Remove FIFO and RR and simplify to a single run queue

2025-09-25 Thread Tvrtko Ursulin
the life -* time rule of all entities having to be torn down -* before their scheduler. Then, however, locking could -* be dropped alltogether from this function. -* -

Re: [RFC v8 08/12] drm/sched: Remove idle entity from tree

2025-09-25 Thread Tvrtko Ursulin
last job is consumed. This keeps the tree smaller which is nicer and more efficient as entities are removed and re-added on every popped job. This reads suspiciously as if it could be an independent patch, not necessarily tied to this series. I see it depends on the _pop() function you added. I

Re: [RFC v8 09/12] drm/sched: Add fair scheduling policy

2025-09-25 Thread Tvrtko Ursulin
  ktime_t min_vruntime, +   enum drm_sched_priority rq_prio) +{ + struct drm_sched_entity_stats *stats = entity->stats; + enum drm_sched_priority prio = entity->priority; + ktime_t vruntime; + + BUILD_BUG_ON(DRM_SCHED_PRIO

Re: [PATCH] drm/amdgpu: Merge amdgpu_vm_set_pasid into amdgpu_vm_init

2025-09-25 Thread Christian König
On 25.09.25 12:32, Jesse.Zhang wrote: > As KFD no longer uses a separate PASID, the global > amdgpu_vm_set_pasid() function is no longer necessary. > Merge its functionality directly into amdgpu_vm_init() to simplify code flow > and eliminate redundant locking. > > Suggested-by: Christian König >

RE: [PATCH v4 1/2] amd/amdkfd: resolve a race in amdgpu_amdkfd_device_fini_sw

2025-09-25 Thread Zhang, Yifan
kfd_interrupt_exit(knode); + } destroy_workqueue(kfd->ih_wq); for (i = 0; i < num_nodes; i++) { -Original Message- From: Lazar, Lijo Sent: Thursday, September 25, 2025 3:06 PM To: Lazar, Lijo ; Zhang, Yifan ; Yang, Philip ; amd-gfx@lists.freedesktop.o

RE: [PATCH v4 1/2] amd/amdkfd: resolve a race in amdgpu_amdkfd_device_fini_sw

2025-09-24 Thread Zhang, Yifan
ree(knode); spin_lock_irqsave(&node->interrupt_lock, flags);| //NULL Pointer -Original Message- From: Lazar, Lijo Sent: Thursday, September 25, 2025 2:19 PM To: Yang, Philip ; Zhang, Yifan ; amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander ; Kuehling, Felix Subject: RE: [P

RE: [PATCH v4 1/2] amd/amdkfd: resolve a race in amdgpu_amdkfd_device_fini_sw

2025-09-24 Thread Lazar, Lijo
ndividual node NULL pointers? Thanks, Lijo -Original Message- From: Yang, Philip Sent: Thursday, September 25, 2025 4:19 AM To: Zhang, Yifan ; amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander ; Kuehling, Felix ; Yang, Philip ; Lazar, Lijo Subject: Re: [PATCH v4 1/2] amd/amdkfd: res

Re: [PATCH 1/5] drm/amd/display: Only enable common modes for eDP and LVDS

2025-09-24 Thread Harry Wentland
connector types. > > In the past there was an experiment done to disable common mode adding > for eDP and LVDS from commit 6d396e7ac1ce3 ("drm/amd/display: Disable > common modes for LVDS") and commit 7948afb46af92 ("drm/amd/display: > Disable common modes for eDP"

Re: [RFC v8 12/12] drm/sched: Embed run queue singleton into the scheduler

2025-09-24 Thread Philipp Stanner
On Wed, 2025-09-03 at 11:18 +0100, Tvrtko Ursulin wrote: > Now that the run queue to scheduler relationship is always 1:1 we can > embed it (the run queue) directly in the scheduler struct and save on > some allocation error handling code and such. Looks reasonable to me. What I suggest is to do

RE: [PATCH] drm/amdgpu: Introduce dynamic pf-vf critical region handling in SRIOV

2025-09-24 Thread Gande, Shravan kumar
[AMD Official Use Only - AMD Internal Distribution Only] looks good. Reviewed-by: Shravan Kumar Gande Thanks, Shravan -Original Message- From: Pan, Ellen Sent: Wednesday, August 6, 2025 8:02 PM To: amd-gfx@lists.freedesktop.org Cc: Gande, Shravan kumar ; Luo, Zhigang ; Pan, Ellen Su

Re: [PATCH] drm/amdgpu: Fix pipelining jobs with timeline syncobj dependencies

2025-09-24 Thread Christian König
On 17.09.25 11:59, David Rosca wrote: > drm_syncobj_find_fence returns fence chain for timeline syncobjs. > Scheduler expects normal fences as job dependencies to be able to > determine whether the fences come from the same entity or sched > and skip waiting on them. > With fence chain as job de
