logging behaviour.
Finally, bail early on failure to remove a single queue as something
has gone really wrong post-suspend and a GPU reset is going to occur
anyway so it's more efficient to just release the device lock.
Signed-off-by: Jonathan Kim
---
.../drm/amd/amdkfd/kfd_device_queue_mana
cgroup excluded from this device.
Signed-off-by: Jonathan Kim
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 4 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 8 +--
drivers/gpu/drm/amd/amdkfd/kfd_device.c| 69 ++
drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 5 +-
drivers
ASICs post GFX 9 are being flagged as SDMA per queue reset supported
in the KGD but KFD and scheduler FW currently have no support.
Limit SDMA queue reset capabilities to GFX 9.
Signed-off-by: Jonathan Kim
---
drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 6 +++---
1 file changed, 3 insertions
Clause instructions with precise memory enabled currently hang the
shader so set capabilities flag to disabled since it's unsafe to use
for debugging.
Signed-off-by: Jonathan Kim
---
drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 4
1 file changed, 4 deletions(-)
diff --git a/drivers/gp
Remove unused declaration of gws_debug_workaround.
Signed-off-by: Jonathan Kim
---
drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 59619f794b6b..43950f3e6672 100644
Similar to compute queue reset, flag SDMA queue reset capabilities to
user space for safe testing.
Signed-off-by: Jonathan Kim
---
drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 5 +
drivers/gpu/drm/amd/amdkfd/kfd_topology.h | 1 +
include/uapi/linux/kfd_sysfs.h| 3 +++
3 files
, create a common call for all reset types to simplify
the handling of module parameter settings that block gpu resets.
Signed-off-by: Jonathan Kim
---
.../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 1 +
.../drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c | 4 +-
.../drm/amd/amdgpu
Deprecate KFD XGMI peer info calls in favour of calling directly from
simplified XGMI peer info functions.
v2: generalize bandwidth interface to return range in one call
Signed-off-by: Jonathan Kim
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 42
drivers/gpu/drm/amd/amdgpu
Even though GWS no longer exists, to maintain runtime usage for
cooperative launch, SW set legacy GWS size.
Signed-off-by: Jonathan Kim
---
drivers/gpu/drm/amd/amdkfd/kfd_device.c | 6 +-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
b
Deprecate KFD XGMI peer info calls in favour of calling directly from
simplified XGMI peer info functions.
Signed-off-by: Jonathan Kim
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 42 --
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 5 ---
drivers/gpu/drm/amd/amdgpu
Per queue reset should be bypassed when gpu recovery is disabled
with module parameter.
Signed-off-by: Jonathan Kim
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 6 ++
1 file changed, 6 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
b/drivers/gpu
GFX 12 does not require a page size cap for the trap handler because
it does not require a CWSR work around like GFX 11 did.
v2: set default cap
Signed-off-by: Jonathan Kim
---
drivers/gpu/drm/amd/amdkfd/kfd_device.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers
GFX 12 does not require a page size cap for the trap handler because
it does not require a CWSR work around like GFX 11 did.
Signed-off-by: Jonathan Kim
---
drivers/gpu/drm/amd/amdkfd/kfd_device.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
b
Flag KFD support for per-queue reset on GFX9 devices.
Signed-off-by: Jonathan Kim
---
drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 2 ++
include/uapi/linux/kfd_sysfs.h| 3 ++-
2 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
b
nd delete queue from list later as per the
description instead of the destroy queue referencing hack.
Signed-off-by: Jonathan Kim
---
drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_
hat a potential subsequent queue
reset call can check against this queue as well.
Signed-off-by: Jonathan Kim
---
drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 10 +-
drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h | 1 +
2 files changed, 10 insertions(+), 1 deletion(-)
diff
per-partition-per-PASID
instead.
Signed-off-by: Jonathan Kim
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c| 12
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h| 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 4 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 7 ++-
drivers
enough SDMA xGMI engines report the recommended
engines in the first place.
v2: fixups in description
Fixes: a0f548d7871e ("drm/amdkfd: allow users to target recommended SDMA
engines")
Signed-off-by: Jonathan Kim
---
drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 5 ++---
1 file changed, 2
initializes synchronously after
the KGD partition mode is set regardless of user or system setup.
Fixes: a0f548d7871e ("drm/amdkfd: allow users to target recommended SDMA
engines")
Signed-off-by: Jonathan Kim
---
drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 2 +-
1 file changed, 1 inser
If queue reset fails, tell the CP to reset the pipe.
Since queues multiplex context per pipe and we've issued a device-wide
preemption prior to the hang, we can assume the hung pipe only has one
queue to reset on pipe reset.
Signed-off-by: Jonathan Kim
---
.../gpu/drm/amd/a
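A minimal sketch of the escalation this hit describes: try the per-queue reset first and ask for a pipe reset only if that fails. The full GPU reset as a last resort comes from another hit in this listing ("If queue reset fails, fall back to GPU reset"). Every name here (`recovery_ops`, `recover_hung_queue`, the hook stubs) is hypothetical and chosen for illustration; none of it is taken from the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Which recovery step ended up handling the hang. */
enum recovery_level {
	RECOVERY_QUEUE_RESET,
	RECOVERY_PIPE_RESET,
	RECOVERY_GPU_RESET,
};

/* Hypothetical per-ASIC hooks; each returns true on success. */
struct recovery_ops {
	bool (*reset_queue)(void *hw_queue);
	bool (*reset_pipe)(void *hw_queue);
};

/* Try the narrowest reset first and widen the scope only on failure. */
static enum recovery_level recover_hung_queue(const struct recovery_ops *ops,
					      void *hw_queue)
{
	assert(ops != NULL);

	if (ops->reset_queue && ops->reset_queue(hw_queue))
		return RECOVERY_QUEUE_RESET;

	/* Assumes the hung pipe holds a single queue, as the message argues. */
	if (ops->reset_pipe && ops->reset_pipe(hw_queue))
		return RECOVERY_PIPE_RESET;

	/* Nothing narrower worked: the caller must schedule a GPU reset. */
	return RECOVERY_GPU_RESET;
}

/* Stand-in hooks for experimenting with the ladder. */
static bool hook_succeeds(void *hw_queue) { (void)hw_queue; return true; }
static bool hook_fails(void *hw_queue) { (void)hw_queue; return false; }
```

Returning the level instead of performing the GPU reset inline keeps the decision about the most disruptive step with the caller, which is one possible design and not necessarily the one the patch uses.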
to NULL on free
- update DRM_ERR on reset to drm_err app warning message
v2: move reset queue flag for house keeping to process device.
split detect and reset into separate functions.
make reset call safe during power saving modes.
clean up some other nitpicks.
Signed-off-by: Jonathan Kim
space will no longer
be able to access reset queues.
v2: move per-queue reset flag to this patch
rebase based on patch 1 changes
Signed-off-by: Jonathan Kim
---
.../drm/amd/amdkfd/kfd_device_queue_manager.c | 31 ---
drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 1 +
include/uapi
call safe during power saving modes.
clean up some other nitpicks.
Signed-off-by: Jonathan Kim
---
.../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 2 +
.../drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c | 4 +-
.../drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c | 4 +-
.../drm/amd/amdgpu
and refactor sdma resource
bit setting logic.
Signed-off-by: Jonathan Kim
---
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 16 ++
.../drm/amd/amdkfd/kfd_device_queue_manager.c | 38 +-
drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 5 +-
.../amd/amdkfd
The number of watchpoints should be set and constrained per logical
partition device, not by the socket device.
Signed-off-by: Jonathan Kim
---
drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 20 ++--
drivers/gpu/drm/amd/amdkfd/kfd_device.c | 4 ++--
drivers/gpu/drm/amd/amdkfd
Certain GPUs have better copy performance over xGMI on specific
SDMA engines depending on the source and destination GPU.
Allow users to create SDMA queues on these recommended engines.
Close to 2x overall performance has been observed with this
optimization.
Signed-off-by: Jonathan Kim
SET_RESOURCES first to identify the user queue
candidates to reset.
Only signal reset events to processes that have had a queue reset.
If queue reset fails, fall back to GPU reset.
Signed-off-by: Jonathan Kim
---
.../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 1 +
.../drm/amd/amdgpu
space will no longer
be able to access reset queues.
Signed-off-by: Jonathan Kim
---
.../drm/amd/amdkfd/kfd_device_queue_manager.c | 30 +++
include/uapi/linux/kfd_ioctl.h| 4 +++
2 files changed, 29 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/amd
MES internally has a timeout allowance of 2 seconds.
Increase driver timeout to 3 seconds to be safe.
Signed-off-by: Jonathan Kim
---
drivers/gpu/drm/amd/amdgpu/mes_v11_0.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
b/drivers/gpu
Due to a CP interrupt bug, bad packet garbage exception codes are raised.
Do a range check so that the debugger and runtime do not receive garbage
codes.
Update the user api to guard exception code type checking as well.
Signed-off-by: Jonathan Kim
Tested-by: Jesse Zhang
---
.../gpu/drm/amd
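As an illustration of the range check mentioned in the hit above, the sketch below rejects codes outside a known range before turning them into a bitmask, so a garbage code never reaches the debugger or runtime. The enum values and the names `example_ec_is_valid` and `example_ec_mask` are assumptions made for this example; the real exception codes and macros live in the KFD uapi headers and may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical exception code space, not the real KFD values. */
enum example_ec {
	EXAMPLE_EC_NONE = 0,	/* reserved: no exception */
	EXAMPLE_EC_FIRST = 1,
	EXAMPLE_EC_MAX = 32,	/* one past the last valid code */
};

static_assert(EXAMPLE_EC_MAX <= 65, "every valid code must map to a 64-bit mask bit");

/* A code is usable only if it lies strictly between NONE and MAX. */
static bool example_ec_is_valid(uint32_t code)
{
	return code > EXAMPLE_EC_NONE && code < EXAMPLE_EC_MAX;
}

/* Garbage codes map to an empty mask instead of an undefined shift. */
static uint64_t example_ec_mask(uint32_t code)
{
	return example_ec_is_valid(code) ? 1ULL << (code - 1) : 0ULL;
}
```

Guarding the mask helper as well as the validity check means callers that only build masks are protected too, which is the intent suggested by "guard exception code type checking".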
Prevent dropping the KFD process reference at the end of a debug
IOCTL call where the acquired process value is an error.
Signed-off-by: Jonathan Kim
---
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
b
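The reference fix in the hit above can be pictured with a userspace sketch: when the lookup returns an error value instead of a process, the pointer is cleared before the shared cleanup path so that the cleanup does not drop a reference that was never taken. The helpers `err_ptr`, `is_err` and `ptr_err` are simplified stand-ins for the kernel's error-pointer helpers, and `example_proc`, `example_lookup` and `example_debug_ioctl` are hypothetical names, not the driver's.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct example_proc { int refcount; };

/* Simplified userspace stand-ins for kernel-style error pointers. */
static void *err_ptr(long err) { return (void *)(intptr_t)err; }
static int is_err(const void *p) { return (uintptr_t)p >= (uintptr_t)-4095; }
static long ptr_err(const void *p) { return (long)(intptr_t)p; }

/* Takes a reference on success, returns an error pointer otherwise. */
static struct example_proc *example_lookup(struct example_proc *proc, int found)
{
	assert(proc != NULL);
	if (!found)
		return err_ptr(-ESRCH);
	proc->refcount++;
	return proc;
}

static void example_unref(struct example_proc *proc) { proc->refcount--; }

static long example_debug_ioctl(struct example_proc *proc, int found)
{
	long ret = 0;
	struct example_proc *target = example_lookup(proc, found);

	if (is_err(target)) {
		ret = ptr_err(target);
		target = NULL;	/* no reference was taken, so none may be dropped */
		goto out;
	}

	/* ... the debug operation on target would run here ... */

out:
	if (target)
		example_unref(target);
	return ret;
}
```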
Fix up on mes process context flush to prevent non-mes devices from
spamming error messages or running into undefined behaviour during
process termination.
Fixes: 73204d028eb5 ("drm/amdkfd: fix mes set shader debugger process
management")
Signed-off-by: Jonathan Kim
---
drivers/g
flush call and the MES debugger calls use the same MES
interface but are separated as KFD calls to avoid conflicting with each
other.
Signed-off-by: Jonathan Kim
Tested-by: Alice Wong
---
drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 31 +++
drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
GC IP 9.4.2 and up support TA reporting of the number
of xGMI links between peers.
Tested-by: Vignesh Chander
Signed-off-by: Jonathan Kim
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu
handling and running KFD tests.
The only time ADD_QUEUE.skip_process_ctx_clear is required is for
debugger use cases where a debugged process is always runtime enabled
when adding a queue.
Tested-by: Shikai Guo
Signed-off-by: Jonathan Kim
---
drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 6
adding a queue.
Tested-by: Shikai Guo
Signed-off-by: Jonathan Kim
---
drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 6 --
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
b/drivers/gpu/drm/amd/amdkfd
Remove redundant assignment when skipping process ctx clear.
Signed-off-by: Jonathan Kim
---
drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
b/drivers/gpu/drm/amd/amdkfd
The MES cached process context must be cleared on adding any queue for
the first time.
For proper debug support, the MES will clear its cached process context
on the first call to SET_SHADER_DEBUGGER.
This allows TTMPs to be persistently enabled in a safe manner.
Signed-off-by: Jonatha
do not want these to be cooperative
dispatches.
v2: fix up indentation and comments.
remove unnecessary perf warning on oversubscription.
change 0 init to 0 memset to deal with padding.
Signed-off-by: Jonathan Kim
---
drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 2 ++
drivers/gpu/drm
Update the list of devices that require the cwsr trap handling
workaround for debugging use cases.
Signed-off-by: Jonathan Kim
---
drivers/gpu/drm/amd/amdkfd/kfd_debug.c| 5 ++---
drivers/gpu/drm/amd/amdkfd/kfd_debug.h| 6 ++
drivers/gpu/drm/amd/amdkfd
do not want these to be cooperative
dispatches.
NOTE: FIXME MES FW enablement checks are a placeholder at the moment and
will be updated when the binary revision number is finalized.
Signed-off-by: Jonathan Kim
---
drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 2 +-
drivers/gpu/drm
-off-by: Jonathan Kim
---
drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index 1a4cdee86759..eeedc3ddffeb 100644
--- a/drivers/gpu/drm/amd/am
Queue count should decrement on queue destruction regardless of HWS
support type.
Signed-off-by: Jonathan Kim
---
drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
b
Null check should be done on queue struct itself and not on the
process queue list node.
Signed-off-by: Jonathan Kim
---
drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
b/drivers/gpu/drm/amd
. Once the binaries have been created, this check may
be subject to change.
v2: do a trap_en safety check in case old mes doesn't accept
unused trap_en d-word.
remove unnecessary process termination work around.
Signed-off-by: Jonathan Kim
---
drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
access issues.
Remove KFD GFX OFF enable toggle clutter by moving these calls into the
KGD debug calls themselves.
v2: toggle gfx off around address watch hi/lo settings as well.
Signed-off-by: Jonathan Kim
---
.../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 4 +++
.../drm/amd/amdgpu
change.
Signed-off-by: Jonathan Kim
---
drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 5 ++-
drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h | 4 ++-
drivers/gpu/drm/amd/amdgpu/mes_v11_0.c| 1 +
drivers/gpu/drm/amd/amdkfd/kfd_debug.c| 31 ++-
.../drm/amd/amdkfd
Exception handling for vmfaults should be raised with additional data.
Reported-by: Mukul Joshi
Signed-off-by: Jonathan Kim
---
drivers/gpu/drm/amd/amdkfd/kfd_events.c | 34 +++--
1 file changed, 20 insertions(+), 14 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdkfd
ng failure.
Also allow the debugger to clear exceptions when doing a snapshot.
Signed-off-by: Jonathan Kim
Reviewed-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 6 +++
.../drm/amd/amdkfd/kfd_device_queue_manager.c | 36 +
.../drm/amd/amdkfd/kfd_device_queue
option of clearing the target exception on query.
Signed-off-by: Jonathan Kim
Reviewed-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 7 ++
drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 120 +++
drivers/gpu/drm/amd/amdkfd/kfd_debug.h | 6 ++
3 files ch
subsequent
successful call.
v2: add num_xcc to device snapshot and fixup new kfd_node reference
Signed-off-by: Jonathan Kim
---
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 7 ++-
drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 73
drivers/gpu/drm/amd/amdkfd/kfd_debug.h
Allow the debugger to query a single queue, device and process
exception.
The KFD should also return the GPU or Queue id of the exception.
The debugger also has the option of clearing exceptions after
being queried.
Signed-off-by: Jonathan Kim
Reviewed-by: Felix Kuehling
---
drivers/gpu/drm
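A small sketch of the query-with-optional-clear behaviour described in the hit above, assuming pending exceptions are kept as a 64-bit status mask with code N stored in bit N-1. That layout and the name `example_query_exception` are assumptions for illustration only; the returned GPU or queue id is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Report whether exception `code` is pending in *status and, if the
 * caller asks for it, clear it as part of the same query.
 */
static bool example_query_exception(uint64_t *status, unsigned int code,
				    bool clear)
{
	uint64_t bit;
	bool raised;

	assert(status != NULL);

	/* Codes outside 1..64 cannot be represented in the mask. */
	if (code == 0 || code > 64)
		return false;

	bit = 1ULL << (code - 1);
	raised = (*status & bit) != 0;

	if (raised && clear)
		*status &= ~bit;

	return raised;
}
```

Folding the clear into the query lets the debugger consume an exception in one call instead of issuing a separate acknowledge step.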
v10.c. This is because the IV from SQ interrupts are
packed into a new contiguous format unlike GFX9. To make this clear,
a separate interrupt handling code file was created.
v2: use new kfd_node struct in prototypes.
Signed-off-by: Jonathan Kim
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amd
The debugger subscribes to notification for requested exceptions on attach.
Allow the debugger to change its subscription later on.
Signed-off-by: Jonathan Kim
Reviewed-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 3 ++
drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 36
Allow the debugger to set wave behaviour to either operate normally,
halt at launch, trap on every instruction, terminate immediately or
stall on allocation.
v2: fixup with new kfd_node struct reference for mes check
Signed-off-by: Jonathan Kim
---
.../drm/amd/amdgpu
watch points are allocated or not.
v2: fixup with new kfd_node struct reference for mes and watch point
checks
Signed-off-by: Jonathan Kim
---
.../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 51 +++
.../drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c | 2 +
.../drm/amd/amdgpu
Bump the minor version to declare debugging capability is now
available.
Signed-off-by: Jonathan Kim
Reviewed-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 1 -
include/uapi/linux/kfd_ioctl.h | 3 ++-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a
engine, return the runtime status
as enabled but with an error.
In addition, like any other multi-process debug supported devices,
disable trap temporary setup per-process to avoid performance impact from
setup overhead.
Signed-off-by: Jonathan Kim
Reviewed-by: Felix Kuehling
---
drivers/gpu/drm
From: Jay Cornwall
Trap handler behavior will differ when a debugger is attached.
Make the debug trap flag available in the trap handler TMA.
Update it when the debug trap ioctl is invoked.
Signed-off-by: Jay Cornwall
Reviewed-by: Felix Kuehling
Signed-off-by: Jonathan Kim
Reviewed-by
.
Signed-off-by: Jonathan Kim
Reviewed-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 32 +++
drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h | 20
drivers/gpu/drm/amd/amdgpu/mes_v11_0.c| 12 +++
drivers/gpu/drm/amd/include
cise at the cost of performance. This setting is not
permitted on debug devices that support only a global setting of this
option.
Return the previous set flags to the debugger as well.
v2: fixup with new kfd_node struct reference mes checks
Signed-off-by: Jonathan Kim
---
drivers/gpu/drm/amd/am
Implement the per-device calls to enable or disable HW debug mode
for GFX11.
Signed-off-by: Jonathan Kim
Reviewed-by: Felix Kuehling
---
.../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v11.c| 38 +++
1 file changed, 38 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu
be overridden or fully replaced.
In order for the debugger to know what is permissible, return the
supported override mask back to the debugger along with the previously
enabled overrides.
v2: fixup with new kfd_node struct reference for mes check
Signed-off-by: Jonathan Kim
---
.../drm/amd
't suspend or
resume queues).
v2: fixup new kfd_node struct reference for mes fw check.
also fixup missing EC_QUEUE_NEW flagging on newly created queue.
Signed-off-by: Jonathan Kim
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c| 5 +
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h| 1 +
dr
.
For runtime exceptions, this will unblock the runtime enable
function which will be explained and implemented in a follow up
patch.
Signed-off-by: Jonathan Kim
Reviewed-by: Felix Kuehling
---
.../gpu/drm/amd/amdkfd/cik_event_interrupt.c | 4 +-
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
.
For memory violation exceptions, extra exception data will be saved.
The debugger will be able to query the saved exception states through a
query operation that will be provided by follow-up patches.
v2: use new kfd_node struct in prototype.
Signed-off-by: Jonathan Kim
---
drivers/gpu/drm/amd/amdkfd
the required register values that the HWS needs to write on debug enable
and disable.
Signed-off-by: Jonathan Kim
Reviewed-by: Felix Kuehling
---
.../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 42 ++-
1 file changed, 41 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm
nable functions are implemented in a follow up patch.
v2: spot fix with new kfd_node references
Signed-off-by: Jonathan Kim
---
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 5 +
drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 148 ++-
drivers/gpu/drm/amd/amdkfd/kfd_debug.h
ode struct reference
Signed-off-by: Jonathan Kim
---
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 143 ++-
drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 6 +-
drivers/gpu/drm/amd/amdkfd/kfd_debug.h | 4 +
drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 1 +
4 files changed,
SET_RESOURCES so that a debugged
process will never migrate away from its pinned VMID.
The KFD is responsible for reserving and releasing this pinned VMID
accordingly whenever the debugger attaches and detaches respectively.
v2: spot fix ups using new kfd_node references
Signed-off-by: Jonathan Kim
Flush delayed restore work in kfd_suspend_all_queues instead of
cancelling. Cancelling the work before it runs results in the queues
becoming permanently disabled. Flushing the work ensures that the
queue suspend/resume state stays balanced.
Signed-off-by: Jonathan Kim
Reviewed-by: Felix
changing the implicit wait count setting. Once set, resume all work.
Signed-off-by: Jonathan Kim
Reviewed-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 3 +
.../drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c | 116 ++
drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
.
v2: add null grace period function pointers to VI packet manager.
Signed-off-by: Jonathan Kim
---
.../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 2 +
.../drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c | 2 +
.../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c| 43
.../drm/amd/amdgpu
.
v2: spot fixup new kfd_node references
Signed-off-by: Jonathan Kim
---
drivers/gpu/drm/amd/amdkfd/kfd_debug.h| 5 ++
.../drm/amd/amdkfd/kfd_device_queue_manager.c | 51 +++
.../drm/amd/amdkfd/kfd_device_queue_manager.h | 3 ++
.../drm/amd/amdkfd/kfd_packet_manager_v9.c
ll be fixed for GFX11 onwards.
Also remove a bunch of deprecated misplaced references for GFX10.3.
Signed-off-by: Jonathan Kim
Reviewed-by: Felix Kuehling
---
.../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c| 96
.../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.h| 28
.../dr
rder to correctly set this up, set the special reserved CP bit by
default whenever the MQD is initialized.
v2: add missing 0-init of SPI_GDBG_TRAP_DATA0/1
Signed-off-by: Jonathan Kim
---
drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c| 26 +++
drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
wave inheritance
of that mode is upheld.
Also ensure that exception overrides are reset to their original state
prior to debug enable or disable.
Signed-off-by: Jonathan Kim
Reviewed-by: Felix Kuehling
---
.../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 92 +++
.../gpu/drm/amd/a
Introduce the required KGD debug calls that will execute hardware debug
mode setting.
Signed-off-by: Jonathan Kim
Reviewed-by: Felix Kuehling
---
.../gpu/drm/amd/include/kgd_kfd_interface.h | 34 +++
1 file changed, 34 insertions(+)
diff --git a/drivers/gpu/drm/amd/include
ption events will notify the debugger through a pollable FIFO
file descriptor that the debugger provides to the KFD to manage.
Finally, on process termination of either the debugger or the target,
debugging must be disabled if this has not already been done.
Signed-off-by: Jonathan Kim
Reviewed-by:
rface coordinates exception handling with the
HSA runtime.
Usage is available in the kern docs at uapi/linux/kfd_ioctl.h.
v2: add num_xcc to device snapshot entry.
fixup missing EC_QUEUE_PACKET_RESERVED mask.
Signed-off-by: Jonathan Kim
---
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 48 ++
in
igned-off-by: Jonathan Kim
---
drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 101 --
drivers/gpu/drm/amd/amdkfd/kfd_topology.h | 6 ++
include/uapi/linux/kfd_sysfs.h| 15
3 files changed, 117 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/amd/a
access issues.
Remove KFD GFX OFF enable toggle clutter by moving these calls into the
KGD debug calls themselves.
Signed-off-by: Jonathan Kim
---
.../drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c | 7
.../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c| 33 ++-
.../gpu/drm/amd/amdgpu
Bump the minor version to declare debugging capability is now
available.
v2: bump to 1.13 after upstream rebase.
Signed-off-by: Jonathan Kim
Reviewed-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 1 -
include/uapi/linux/kfd_ioctl.h | 3 ++-
2 files changed, 2
-off-by: Jonathan Kim
Reviewed-by: Felix Kuehling
---
.../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 51 +++
.../drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c | 2 +
.../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c| 78 ++
.../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.h| 8 ++
.../drm
flag setup on APUs
Signed-off-by: Jay Cornwall
Reviewed-by: Felix Kuehling
Signed-off-by: Jonathan Kim
---
drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 11 +++
drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 2 ++
drivers/gpu/drm/amd/amdkfd/kfd_process.c | 15 +++
3 files changed
and remove deprecated launch mode options
Signed-off-by: Jonathan Kim
---
.../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 12 +++
.../drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c | 1 +
.../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c| 25 +
.../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.h
't suspend or
resume queues).
v3: update safer copy context save header
v2: add gfx11/mes support.
prevent header copy on suspend from overwriting user fields.
simplify resume_queues function.
address other nit-picks
Signed-off-by: Jonathan Kim
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
flag for now.
v2: add gfx11 support.
Signed-off-by: Jonathan Kim
---
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 2 +
drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 58
drivers/gpu/drm/amd/amdkfd/kfd_debug.h | 1 +
3 files changed, 61 insertions(+)
diff --git a/drivers/gpu
eanup ttmp_setup for runtime_enable.
v2: fix up hierarchy of semantics in description.
Signed-off-by: Jonathan Kim
---
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 143 ++-
drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 6 +-
drivers/gpu/drm/amd/amdkfd/kfd_debug.h | 4 +
driver
.
For runtime exceptions, this will unblock the runtime enable
function which will be explained and implemented in a follow up
patch.
v2: missing closing brace in set workaround function got fixed
in patch 17.
Signed-off-by: Jonathan Kim
---
.../gpu/drm/amd/amdkfd/cik_event_interrupt.c | 4
queue and device snapshot.
change device snapshot implementation to match queue snapshot
implementation.
Signed-off-by: Jonathan Kim
Reviewed-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 7 ++-
drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 72
drivers
pport. fix fw checks. remove asic family name comments.
Signed-off-by: Jonathan Kim
---
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 5 +
drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 148 ++-
drivers/gpu/drm/amd/amdkfd/kfd_debug.h | 29 +
drivers/gpu/drm/amd/amdkfd/kfd
-by: Jonathan Kim
---
.../drm/amd/amdkfd/kfd_device_queue_manager.c | 93 +++
.../drm/amd/amdkfd/kfd_device_queue_manager.h | 5 +
.../drm/amd/amdkfd/kfd_packet_manager_v9.c| 9 ++
.../gpu/drm/amd/amdkfd/kfd_pm4_headers_ai.h | 5 +-
4 files changed, 111 insertions(+), 1
-off-by: Jonathan Kim
Reviewed-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 3 +
.../drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c | 116 ++
drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 4 +-
3 files changed, 121 insertions(+), 2 deletions(-)
diff --git
v2: change buf_size arg to num_queues for clarity.
fix minimum entry size calculation.
Signed-off-by: Jonathan Kim
Reviewed-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 6 +++
.../drm/amd/amdkfd/kfd_device_queue_manager.c | 36 +
.../drm/
application.
disable debugging for now on gfx11 due to broken fw.
Signed-off-by: Jonathan Kim
---
drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h | 2 +
drivers/gpu/drm/amd/amdgpu/mes_v11_0.c| 7 +--
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 2 -
drivers/gpu/drm/amd/amdkfd/kfd_debug.c
.
v3: remove unneeded comment. also add missing kfd_debug.h include
in dqm file.
v2: remove asic family code name comment in per vmid support check
Signed-off-by: Jonathan Kim
---
drivers/gpu/drm/amd/amdkfd/kfd_debug.h| 5 ++
.../drm/amd/amdkfd/kfd_device_queue_manager.c | 51