On 28. 03. 24, 20:50, roman...@amd.com wrote:
From: Hersen Wu
[Why] The number of MST slots equals pbn / pbn_div.
Today, pbn_div comes from dm_mst_get_pbn_divider ->
dc_link_bandwidth_kbps. dp_link_bandwidth_kbps
already includes the effect of FEC overhead. As a
result, we should not include effec
[AMD Official Use Only - AMD Internal Distribution Only]
Reviewed-by: Hawking Zhang
Thanks for the clarification.
Regards,
Hawking
-Original Message-
From: Yang, Stanley
Sent: Thursday, July 18, 2024 12:06
To: Zhang, Hawking ; amd-gfx@lists.freedesktop.org
Subject: RE: [PATCH Review V2
Reviewed-by: Tom Chung
On 7/18/2024 12:03 PM, Srinivasan Shanmugam wrote:
Fixes the below with gcc W=1:
Function parameter or struct member 'pstate_keepout' not described in
'optc1_program_timing'
Cc: Tom Chung
Cc: Rodrigo Siqueira
Cc: Roman Li
Cc: Alex Hung
Cc: Aurabindo Pillai
Cc: Harr
Hi Easwar,
On Thu, Jul 11, 2024 at 05:27:31AM +, Easwar Hariharan wrote:
> I2C v7, SMBus 3.2, and I3C 1.1.1 specifications have replaced "master/slave"
> with more appropriate terms. Inspired by Wolfram's series to fix drivers/i2c/,
> fix the terminology for users of I2C_ALGOBIT bitbanging int
- Add a new start parameter in trim function to specify exact
address from where to start the trimming. This would help us
in situations like if drivers would like to do address alignment
for specific requirements.
- Add a new flag DRM_BUDDY_TRIM_DISABLE. Drivers can use this
flag to disab
Add address alignment support to the DCC VRAM buffers.
v2:
- adjust size based on the max_texture_channel_caches values
only for GFX12 DCC buffers.
- used AMDGPU_GEM_CREATE_GFX12_DCC flag to apply change only
for DCC buffers.
- roundup non power of two DCC buffer adjusted size to nea
On 18.07.24 at 12:32, Arunpravin Paneer Selvam wrote:
Add address alignment support to the DCC VRAM buffers.
v2:
- adjust size based on the max_texture_channel_caches values
only for GFX12 DCC buffers.
- used AMDGPU_GEM_CREATE_GFX12_DCC flag to apply change only
for DCC buffers.
[Public]
Series is:
Reviewed-by: Alex Deucher
From: Sunil Khatri
Sent: Thursday, July 18, 2024 12:42 AM
To: Deucher, Alexander ; Koenig, Christian
Cc: amd-gfx@lists.freedesktop.org ; Khatri,
Sunil
Subject: [PATCH v1 6/6] drm/amdgpu: add print support for sdma
On Mon, Jul 15, 2024 at 05:46:38PM -0400, Aurabindo Pillai wrote:
>
>
> On 7/12/24 1:45 PM, Abhishek Tamboli wrote:
> > Add detail description for the read_mpcc_state function in the
> > mpc_funcs struct to resolve the documentation warning.
> >
> > A kernel-doc warning was addressed:
> > ./driv
On Mon, Jul 15, 2024 at 8:00 PM wrote:
>
> On Mon, Jul 15, 2024 at 4:05 AM Łukasz Bartosik wrote:
> >
> > On Sat, Jul 13, 2024 at 11:45 PM wrote:
> > >
> > > On Fri, Jul 12, 2024 at 9:44 AM Łukasz Bartosik
> > > wrote:
> > > >
> > > > On Wed, Jul 3, 2024 at 12:14 AM wrote:
> > > > >
> > > > >
Hi Thomas,
On 6/24/24 6:15 PM, Thomas Weißschuh wrote:
> Hi Hans!
>
> thanks for your feedback!
>
> On 2024-06-24 11:11:40+, Hans de Goede wrote:
>> On 6/23/24 10:51 AM, Thomas Weißschuh wrote:
>>> The value of "min_input_signal" returned from ATIF on a Framework AMD 13
>>> is "12". This lea
On 15.07.24 06:39, Chris Hixon wrote:
> System: HP ENVY x360 Convertible 15-ds1xxx; AMD Ryzen 7 4700U with
> Radeon Graphics
>
> Problem commits (introduced in v6.9-rc1):
> 6296562f30b1 HID: amd_sfh: Extend MP2 register access to SFH
> 2105e8e00da4 HID: amd_sfh: Improve boot time when SFH is avail
Applied. Thanks!
Alex
On Thu, Jul 18, 2024 at 9:13 AM Ma Ke wrote:
>
> In radeon_add_common_modes(), the return value of drm_cvt_mode() is
> assigned to mode, which will lead to a possible NULL pointer dereference
> on failure of drm_cvt_mode(). Add a check to avoid npd.
>
> Cc: sta...@vger.ker
Add support for SDMA to the KIQ map/unmap functions.
Signed-off-by: Alex Deucher
---
drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 85 +-
drivers/gpu/drm/amd/amdgpu/nvd.h | 2 +
2 files changed, 73 insertions(+), 14 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/
Handle the second pipe in mes_v11_0_set_hw_resources().
Signed-off-by: Alex Deucher
---
drivers/gpu/drm/amd/amdgpu/mes_v11_0.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
index 8ce51b9236c1..27d54ec82208 1
This adds preliminary support for GC per queue reset. In this
case, only the jobs currently in the queue are lost. If this
fails, we fall back to a full adapter reset.
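The fallback order described above (per-queue reset first, full adapter reset only if that fails) can be sketched in plain C. This is a minimal illustration and not the amdgpu code: `recover_hung_queue`, `queue_reset_fn`, `reset_ok`, and `reset_fail` are hypothetical names invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Which recovery path was ultimately taken. */
enum reset_path {
	RESET_QUEUE_ONLY,   /* only the hung queue was reset */
	RESET_FULL_ADAPTER, /* fell back to resetting the whole adapter */
};

/* Hypothetical callback type: returns 0 on success, negative on failure. */
typedef int (*queue_reset_fn)(unsigned int queue_id);

/* Try the cheaper per-queue reset first; fall back if it fails or is absent. */
enum reset_path recover_hung_queue(queue_reset_fn reset_queue,
				   unsigned int queue_id)
{
	if (reset_queue && reset_queue(queue_id) == 0)
		return RESET_QUEUE_ONLY;
	return RESET_FULL_ADAPTER;
}

/* Example callbacks used to exercise both paths. */
int reset_ok(unsigned int queue_id)
{
	(void)queue_id;
	return 0;
}

int reset_fail(unsigned int queue_id)
{
	(void)queue_id;
	return -1;
}
```

Calling `recover_hung_queue(reset_fail, 3)` or passing a NULL callback takes the full-adapter path, while `recover_hung_queue(reset_ok, 3)` stays on the per-queue path.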
Alex Deucher (19):
drm/amdgpu/mes: add API for legacy queue reset
drm/amdgpu/mes11: add API for legacy queue reset
drm/amd
Add API for resetting kernel queues.
Signed-off-by: Alex Deucher
---
drivers/gpu/drm/amd/amdgpu/mes_v11_0.c | 33 ++
1 file changed, 33 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
index 27d54ec82208..f611183
Add API for resetting kernel queues.
Signed-off-by: Alex Deucher
---
drivers/gpu/drm/amd/amdgpu/mes_v12_0.c | 33 ++
1 file changed, 33 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
index c9f74231ad59..14b8c88
Add API for resetting user queues.
Signed-off-by: Alex Deucher
---
drivers/gpu/drm/amd/amdgpu/mes_v12_0.c | 21 +
1 file changed, 21 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
index 14b8c88fb0e0..aea6225df539 1
Add API for resetting kernel queues.
Signed-off-by: Alex Deucher
---
drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 24
drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h | 16
2 files changed, 40 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
b/dr
If a specific job is hung, try and reset just the
ring associated with the job.
v2: move to amdgpu_job.c
Signed-off-by: Alex Deucher
---
drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 17 +
1 file changed, 17 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
b/driv
Add API for resetting user queues.
Signed-off-by: Alex Deucher
---
drivers/gpu/drm/amd/amdgpu/mes_v11_0.c | 21 +
1 file changed, 21 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
index f611183e1ebf..bf8fb6a1becb 1
From: Prike Liang
Update the reset counter for the amdgpu_cs_query_reset_state()
Signed-off-by: Prike Liang
Reviewed-by: Alex Deucher
Signed-off-by: Alex Deucher
---
drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_
Rename to gfx_v11_0_kgq_init_queue() to better align with
the other naming in the file.
Signed-off-by: Alex Deucher
---
drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
b/drivers/gpu/drm/amd/a
Use this to reset just a single ring.
Signed-off-by: Alex Deucher
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index 582053f1cd56..c7f15edeb367 100644
--- a/
From: Jiadong Zhu
There is a race condition in which the CP firmware modifies
the MQD during the reset sequence after the driver has updated
it for remapping. We have to wait until CP_HQD_ACTIVE becomes
false and then remap the queue.
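The wait-then-remap ordering can be sketched as a bounded polling loop. This is only an illustration under assumptions: `wait_hqd_inactive`, `hqd_active_fn`, and `active_countdown` are hypothetical names, the register read is abstracted behind a callback, and a real driver would also delay between polls.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical register reader: returns true while the queue is active. */
typedef bool (*hqd_active_fn)(void *ctx);

/*
 * Poll until the queue reports inactive, giving up after max_polls reads.
 * Returns 0 when it is safe to remap, -ETIMEDOUT otherwise.
 */
int wait_hqd_inactive(hqd_active_fn is_active, void *ctx,
		      unsigned int max_polls)
{
	for (unsigned int i = 0; i < max_polls; i++) {
		if (!is_active(ctx))
			return 0;
	}
	return -ETIMEDOUT;
}

/* Example reader: reports active until the counter in ctx reaches zero. */
bool active_countdown(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return true;
	}
	return false;
}
```

The caller would remap the queue only when the function returns 0, and treat a timeout as a failed reset.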
v2: fix KIQ locking (Alex)
Signed-off-by: Jiadong Zhu
Reviewed-by: Alex Deucher
Signed-o
Add ring reset callbacks for gfx and compute.
v2: fix gfx handling
v3: wait for KIQ to complete
Signed-off-by: Alex Deucher
---
drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 91 ++
1 file changed, 91 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
b/driver
From: Prike Liang
Resetting a kernel compute queue through the MES FW always fails, which
may be caused by the KIQ failing to process the KCQ unmap. So, until the MES
FW works properly, fall back to having the driver execute the dequeue and
reset via SPI directly. Besides, rework the ring reset function and make
the busy ri
From: Jiadong Zhu
The KIQ unmap_queues command only dequeues the queue.
We have to map the queue back with a clean MQD.
Signed-off-by: Jiadong Zhu
Reviewed-by: Alex Deucher
Signed-off-by: Alex Deucher
---
drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 36 ---
1 file change
Add ring reset callbacks for gfx and compute.
Signed-off-by: Alex Deucher
---
drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 18 ++
1 file changed, 18 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
index ce5cb60b8628..56606c
From: Jiadong Zhu
There is a race condition in which the CP firmware modifies
the MQD during the reset sequence after the driver has updated
it for remapping. We have to wait until CP_HQD_ACTIVE becomes
false and then remap the queue.
Signed-off-by: Jiadong Zhu
Reviewed-by: Alex Deucher
Signed-off-by: Alex Deucher
---
dr
From: Jiadong Zhu
There is a race condition in which the CP firmware modifies
the MQD during the reset sequence after the driver has updated
it for remapping. We have to wait until CP_HQD_ACTIVE becomes
false and then remap the queue.
v2: fix KIQ locking (Alex)
Signed-off-by: Jiadong Zhu
Reviewed-by: Alex Deucher
Signed-o
Add API for resetting user queues.
Signed-off-by: Alex Deucher
---
drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 43 +
drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h | 9 ++
2 files changed, 52 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
b/drivers/gpu/d
Add ring reset callback for compute.
Signed-off-by: Alex Deucher
---
drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 38 +
1 file changed, 38 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
index 98fe6c40da64..6cf90
From: Jiadong Zhu
The reset_queue api could be used from kfd or kgd.
v2: add use_mmio parameter for mes_reset_legacy_queue.
Signed-off-by: Jiadong Zhu
Reviewed-by: Alex Deucher
Signed-off-by: Alex Deucher
---
drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 20
1 file changed,
From: Jiadong Zhu
The KIQ unmap_queues command only dequeues the queue.
We have to map the queue back with a clean MQD.
Signed-off-by: Jiadong Zhu
Reviewed-by: Alex Deucher
Signed-off-by: Alex Deucher
---
drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 36 ++---
1 file change
From: Jiadong Zhu
There is a race condition in which the CP firmware modifies
the MQD during the reset sequence after the driver has updated
it for remapping. We have to wait until CP_HQD_ACTIVE becomes
false and then remap the queue.
v2: fix KIQ locking (Alex)
Signed-off-by: Jiadong Zhu
Reviewed-by: Alex Deucher
Signed-o
Resetting a kernel compute queue through the MES FW always fails, which
may be caused by the KIQ failing to process the KCQ unmap. So, until the MES
FW works properly, fall back to having the driver execute the dequeue and
reset via SPI directly. Besides, rework the ring reset function and make
the busy ring type reset in eac
From: Jiadong Zhu
The KIQ unmap_queues command only dequeues the queue.
We have to map the queue back with a clean MQD.
v2: fix up error handling (Alex)
Signed-off-by: Jiadong Zhu
Reviewed-by: Alex Deucher
Signed-off-by: Alex Deucher
---
drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 46 +++
This will be used in more places in the future so
add a mutex.
Signed-off-by: Alex Deucher
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h| 2 ++
drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 10 +++---
3 files changed, 10 insertions(+), 3 d
From: Jiadong Zhu
Use MMIO to do the queue reset. Enter safe mode
when writing registers.
Signed-off-by: Jiadong Zhu
Reviewed-by: Alex Deucher
Signed-off-by: Alex Deucher
---
drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 37 +++
1 file changed, 37 insertions(+)
diff --git a/
Add ring reset callback for compute.
Signed-off-by: Alex Deucher
---
drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 38 +++
1 file changed, 38 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 675a1a8e2515..78495df29
Add ring reset callbacks for gfx and compute.
Signed-off-by: Alex Deucher
---
drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c | 18 ++
1 file changed, 18 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
b/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
index 63b073fd4dc7..9ed6c8
Need to enter safe mode before touching GC MMIO.
Signed-off-by: Alex Deucher
---
drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
index 348bc1b1784a..9bd
From: Jiadong Zhu
Use MMIO to do the queue reset. Enter safe mode
before writing MMIO registers.
v2: set register instance offset according to xcc id.
Signed-off-by: Jiadong Zhu
Reviewed-by: Alex Deucher
Signed-off-by: Alex Deucher
---
drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 36
From: Jiadong Zhu
Add reset_hw_queue in kiq_pm4_funcs callbacks.
Signed-off-by: Jiadong Zhu
Reviewed-by: Alex Deucher
Signed-off-by: Alex Deucher
---
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h | 4
1 file changed, 4 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
b/dri
To match other GFX IPs.
Signed-off-by: Alex Deucher
---
drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 26 +++---
1 file changed, 19 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index e9d93bf909db..b833
From: Jiadong Zhu
Implement queue reset for graphics and compute queues.
v2: use amdgpu_gfx_rlc funcs to enter/exit safe mode.
v3: use gfx_v11_0_request_gfx_index_mutex()
Signed-off-by: Jiadong Zhu
Reviewed-by: Alex Deucher
Signed-off-by: Alex Deucher
---
drivers/gpu/drm/amd/amdgpu/mes_v11_0.
From: Jiadong Zhu
Add me/pipe/queue parameters for queue reset input.
v2: fix build (Alex)
Signed-off-by: Jiadong Zhu
Reviewed-by: Alex Deucher
Signed-off-by: Alex Deucher
---
drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 3 ++-
drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h | 14 +-
driv
It will be used by the queue reset code.
Signed-off-by: Alex Deucher
---
drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 4 ++--
drivers/gpu/drm/amd/amdgpu/gfx_v11_0.h | 3 +++
2 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
b/drivers/gpu/drm/amd/a
It's not supported under SR-IOV at the moment.
Signed-off-by: Alex Deucher
---
drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 3 +++
drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 3 +++
2 files changed, 6 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
b/drivers/gpu/drm/amd/amdgpu/gfx_
It's not supported under SR-IOV at the moment.
Signed-off-by: Alex Deucher
---
drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 6 ++
1 file changed, 6 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
index 20be1b9ecdc3..021f7394b252 100644
It's not supported under SR-IOV at the moment.
Signed-off-by: Alex Deucher
---
drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 6 ++
1 file changed, 6 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index b833943faa53..a8f26a311faf 100644
It's not supported under SR-IOV at the moment.
Signed-off-by: Alex Deucher
---
drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c | 6 ++
1 file changed, 6 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
b/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
index ba121491f5a7..4ee36a172471 100644
Applied. Thanks!
On Thu, Jul 18, 2024 at 10:12 AM Ma Ke wrote:
>
> In amdgpu_connector_add_common_modes(), the return value of drm_cvt_mode()
> is assigned to mode, which will lead to a NULL pointer dereference on
> failure of drm_cvt_mode(). Add a check to avoid npd.
>
> Cc: sta...@vger.kernel.
Applied. Thanks!
Alex
On Thu, Jul 18, 2024 at 10:17 AM Ma Ke wrote:
>
> Return 0 to avoid returning an uninitialized variable r.
>
> Cc: sta...@vger.kernel.org
> Fixes: 230dd6bb6117 ("drm/amd/amdgpu: implement mode2 reset on smu_v13_0_10")
> Signed-off-by: Ma Ke
> ---
> Changes in v2:
> - adde
Hi,
On 18.07.24 16:06, Alex Deucher wrote:
This adds preliminary support for GC per queue reset. In this
case, only the jobs currently in the queue are lost. If this
fails, we fall back to a full adapter reset.
First of all, thank you so much for working on this! It's great to
finally see pr
On Thu, Jul 18, 2024 at 10:15 AM Alex Deucher wrote:
>
> This adds preliminary support for GC per queue reset. In this
> case, only the jobs currently in the queue are lost. If this
> fails, we fall back to a full adapter reset.
Also available here via git:
https://gitlab.freedesktop.org/agd5f/
In order to allow ROCm GDB to handle reset queues, raise an
EC_QUEUE_RESET exception so that the debugger can subscribe and
query this exception.
Reset queues should still be considered suspendable with a status
flag of KFD_DBG_QUEUE_RESET_MASK.
However they should not be resumable since user spac
Support per-queue reset for GFX9. The recommendation is for the driver
to target reset the HW queue via a SPI MMIO register write.
Since this requires pipe and HW queue info and MEC FW is limited to
doorbell reports of hung queues after an unmap failure, scan the HW
queue slots defined by SET_RES
On 2024-07-17 16:10, Felix Kuehling wrote:
@@ -603,8 +606,6 @@ struct queue {
void *gang_ctx_bo;
uint64_t gang_ctx_gpu_addr;
void *gang_ctx_cpu_ptr;
-
On 2024-07-17 16:16, Felix Kuehling wrote:
Sorry, I see that this patch still doesn't propagate errors returned from
kfd_queue_release_buffers correctly. And the later patches in the
series don't seem to fix it either. See inline.
Add ring reset callback for gfx. Untested.
Signed-off-by: Alex Deucher
---
drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 75 ++-
drivers/gpu/drm/amd/amdgpu/vid.h | 1 +
2 files changed, 75 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
Add ring reset callback for gfx.
Signed-off-by: Alex Deucher
---
drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 47 +++
1 file changed, 47 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index ae23a7848237..5c4b0c8669b6
Add ring reset callback for gfx. Untested.
Signed-off-by: Alex Deucher
---
drivers/gpu/drm/amd/amdgpu/cikd.h | 1 +
drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c | 76 ++-
2 files changed, 76 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/cikd.h
b/dr
On 2024-07-18 15:57, Philip Yang wrote:
>
> On 2024-07-17 16:16, Felix Kuehling wrote:
>> Sorry, I see that this patch still doesn't propagate errors returned from
>> kfd_queue_releasre_buffers correctly. And the later patches in the series
>> don't seem to fix it either. See inline.
> kfd_qu
On 2024-07-18 1:25, Chen, Xiaogang wrote:
>
> On 7/17/2024 6:02 PM, Felix Kuehling wrote:
>>
>> On 2024-06-26 11:06, Xiaogang.Chen wrote:
>>> From: Xiaogang Chen
>>>
> >>> When a user adds a new vm range that overlaps with existing svm pranges
> >>> the current kfd creates a cloned prange and split
This patch series does additional queue buffer validation in the queue
creation IOCTLs and fails the queue creation if the buffers are not mapped
on the GPU with the expected size.
Ensure queue buffers residency by tracking the GPUVM virtual addresses
for queue buffers to return error if the user tries to free
Change amdgpu_amdkfd_bo_mapped_to_dev to use drm_priv as parameter
instead of adev, to support spatial partition. This is only used by CRIU
checkpoint restore now. No functional change.
Signed-off-by: Philip Yang
Reviewed-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h |
On queue update, ensure the new ring buffer is mapped on the GPU with the
correct size. Decrease the old ring_bo queue_refcount and increase the new
ring_bo queue_refcount.
Signed-off-by: Philip Yang
Reviewed-by: Felix Kuehling
---
.../amd/amdkfd/kfd_process_queue_manager.c| 32 ++-
1 file c
Add an atomic queue_refcount to struct bo_va, and return -EBUSY to fail
unmapping the BO from the GPU if the bo_va queue_refcount is not zero.
Creating a queue increases the bo_va queue_refcount and destroying a queue
decreases it, to ensure the queue buffers stay mapped on the GPU while the
queue is active.
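The reference-counting rule described above can be sketched in plain userspace C. This is a minimal illustration, not the driver code: `struct bo_va_sketch`, `queue_acquire`, `queue_release`, and `try_unmap` are hypothetical names, and C11 atomics stand in for the kernel's atomic type.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a GPU mapping that queues may depend on. */
struct bo_va_sketch {
	atomic_int queue_refcount; /* number of queues using this mapping */
	bool mapped;               /* true while mapped on the GPU */
};

/* Queue creation takes a reference on the buffer mapping. */
void queue_acquire(struct bo_va_sketch *va)
{
	atomic_fetch_add(&va->queue_refcount, 1);
}

/* Queue destruction drops the reference again. */
void queue_release(struct bo_va_sketch *va)
{
	atomic_fetch_sub(&va->queue_refcount, 1);
}

/* Refuse to unmap while any queue still references the mapping. */
int try_unmap(struct bo_va_sketch *va)
{
	if (atomic_load(&va->queue_refcount) != 0)
		return -EBUSY;
	va->mapped = false;
	return 0;
}
```

With one queue holding a reference, `try_unmap` returns -EBUSY and leaves the mapping in place; after the matching `queue_release`, the same call succeeds.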
When creating a KFD user compute queue, check that the queue eop buffer
size, cwsr area size, and ctl stack size equal the sizes in the KFD node
properties.
Check the entire cwsr area, which may be split into multiple svm ranges
aligned to the granularity boundary.
Signed-off-by: Philip Yang
Reviewed-by: Felix Kuehl
The queue CWSR area may be registered to the GPU as svm memory; on queue
creation, ensure the svm range is mapped to the GPU with the
KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED flag.
Add queue_refcount to struct svm_range to track queue CWSR area usage.
Because unmap mmu notifier callback return value is ignored, if
application unmap
Find the user queue rptr, ring buf, eop buffer and cwsr area BOs, check
that the BOs are mapped on the GPU with the correct size, and take a BO
reference.
Signed-off-by: Philip Yang
Reviewed-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 4 +++
drivers/gpu/drm/amd/amdkfd/kfd_queue.c | 38
Pass a pointer reference to amdgpu_bo_unref so that it clears the correct
pointer. Otherwise amdgpu_bo_unref clears only the local variable and the
original pointer is not set to NULL, which could cause a use-after-free bug.
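The difference between clearing a local copy and clearing the original pointer can be shown with a small userspace sketch. All names here (`struct obj`, `obj_unref`, `struct holder`, `holder_release_buggy`, `holder_release_fixed`) are hypothetical and only mimic the pattern of an unref helper that takes a pointer to the pointer.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted object. */
struct obj {
	int refcount;
};

/* Drop one reference and clear the caller's pointer through *pobj. */
void obj_unref(struct obj **pobj)
{
	struct obj *o = *pobj;

	if (!o)
		return;
	if (--o->refcount == 0)
		free(o);
	*pobj = NULL;
}

/* Hypothetical owner that stores the object pointer in a field. */
struct holder {
	struct obj *bo;
};

/* Buggy pattern: only the local copy is cleared, h->bo stays set. */
void holder_release_buggy(struct holder *h)
{
	struct obj *bo = h->bo;

	obj_unref(&bo);
}

/* Fixed pattern: pass the address of the field so the field is cleared. */
void holder_release_fixed(struct holder *h)
{
	obj_unref(&h->bo);
}
```

After the buggy variant, the field still holds the old address even though a reference was dropped, which is the stale pointer that can later be used after free. The fixed variant leaves the field NULL.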
Signed-off-by: Philip Yang
Reviewed-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amd
Use the queue eop buffer size, cwsr area size, and ctl stack size
calculation from Thunk, and store the values in the KFD node properties.
These will be used to validate the queue eop buffer size, cwsr area size,
and ctl stack size when creating a KFD user compute queue.
Those will be exposed to user space via sysfs KF
Add helper function kfd_queue_acquire_buffers to get queue wptr_bo
reference from queue write_ptr if it is mapped to the KFD node with
expected size.
Add wptr_bo to structure queue_properties because structure queue is
allocated after queue buffers are validated, then we can remove wptr_bo
paramet
On 2024-07-18 17:05, Philip Yang wrote:
> This patch series does additional queue buffer validation in the queue
> creation IOCTLs and fails the queue creation if the buffers are not mapped
> on the GPU with the expected size.
>
> Ensure queue buffers residency by tracking the GPUVM virtual addresses
> for q
Hi Dave, Sima,
Fixes for 6.11.
The following changes since commit 1cff1010bef6f325d895db0306b59dc7232ed9b7:
drm/amdgpu/mes12: add missing opcode string (2024-07-12 11:46:46 -0400)
are available in the Git repository at:
https://gitlab.freedesktop.org/agd5f/linux.git
tags/amd-drm-fixes-6.1
Certain GPUs have better copy performance over xGMI on specific
SDMA engines depending on the source and destination GPU.
Allow users to create SDMA queues on these recommended engines.
Close to 2x overall performance has been observed with this
optimization.
Signed-off-by: Jonathan Kim
---
driv