[PATCH v11 00/28] AMDGPU usermode queues

2024-09-09 Thread Shashank Sharma
This patch series introduces the base code for AMDGPU usermode queues for gfx
workloads. Usermode queues are a method of submitting GPU workloads to the
graphics hardware without any interaction with the kernel/DRM schedulers. With
this method, a userspace graphics application can create its own workqueue
and submit work directly to the GPU HW.

The general idea of how userqueues are supposed to work:
- The application creates the following GPU objects:
  - A queue object to hold the workload packets.
  - A read pointer object.
  - A write pointer object.
  - A doorbell page.
  - Other supporting buffer objects as per the target IP engine (shadow,
    GDS etc.; this information is available via AMDGPU_INFO_IOCTL).
- The application picks a 32-bit offset in the doorbell page for this
  queue.
- The application uses the usermode_queue_create IOCTL introduced in
  this patch, by passing the GPU addresses of these objects (read ptr,
  write ptr, queue base address, shadow, gds) with doorbell object and
  32-bit doorbell offset in the doorbell page.
- The kernel creates the queue and maps it in the HW.
- The application maps the GPU buffers in process address space.
- The application can start submitting the data in the queue as soon as
  the kernel IOCTL returns.
- After filling the workload data in the queue, the app must write the
  number of dwords added to the queue into the doorbell offset and the
  WPTR buffer. The GPU will start fetching the data as soon as this is done.
- This series adds usermode queue support for all three MES based IPs
  (GFX, SDMA and Compute).
- This series also adds eviction fences to handle migration of the
  userqueue mapped buffers by TTM.
- For synchronization of userqueues, we have added a secure semaphore
  IOCTL, which is being reviewed separately here:
  https://patchwork.freedesktop.org/patch/611971/

libDRM UAPI changes for this series can be found here:
(This also contains an example test utility which demonstrates
the usage of userqueue UAPI)
https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/287

MESA changes consuming this series can be seen in the MR here:
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29010

Alex Deucher (1):
  drm/amdgpu: UAPI for user queue management

Arvind Yadav (4):
  drm/amdgpu: enable SDMA usermode queues
  drm/amdgpu: Add input fence to sync bo unmap
  drm/amdgpu: fix MES GFX mask
  Revert "drm/amdgpu: don't allow userspace to create a doorbell BO"

Shashank Sharma (18):
  drm/amdgpu: add usermode queue base code
  drm/amdgpu: add new IOCTL for usermode queue
  drm/amdgpu: add helpers to create userqueue object
  drm/amdgpu: create MES-V11 usermode queue for GFX
  drm/amdgpu: create context space for usermode queue
  drm/amdgpu: map usermode queue into MES
  drm/amdgpu: map wptr BO into GART
  drm/amdgpu: generate doorbell index for userqueue
  drm/amdgpu: cleanup leftover queues
  drm/amdgpu: enable GFX-V11 userqueue support
  drm/amdgpu: enable compute/gfx usermode queue
  drm/amdgpu: update userqueue BOs and PDs
  drm/amdgpu: add kernel config for gfx-userqueue
  drm/amdgpu: add gfx eviction fence helpers
  drm/amdgpu: add userqueue suspend/resume functions
  drm/amdgpu: suspend gfx userqueues
  drm/amdgpu: resume gfx userqueues
  Revert "drm/amdgpu/gfx11: only enable CP GFX shadowing on SR-IOV"

 drivers/gpu/drm/amd/amdgpu/Kconfig|   8 +
 drivers/gpu/drm/amd/amdgpu/Makefile   |  10 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu.h   |  11 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c|   5 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c   |  10 +
 .../drm/amd/amdgpu/amdgpu_eviction_fence.c| 297 
 .../drm/amd/amdgpu/amdgpu_eviction_fence.h|  67 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c   |  68 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c   |  11 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c   |   3 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h   |   2 +-
 .../gpu/drm/amd/amdgpu/amdgpu_userq_fence.c   | 713 ++
 .../gpu/drm/amd/amdgpu/amdgpu_userq_fence.h   |  74 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 644 
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c|  42 +-
 drivers/gpu/drm/amd/amdgpu/mes_v11_0.c|  16 +-
 .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  | 395 ++
 .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.h  |  30 +
 drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c|   5 +
 .../gpu/drm/amd/include/amdgpu_userqueue.h| 100 +++
 drivers/gpu/drm/amd/include/v11_structs.h |   4 +-
 include/uapi/drm/amdgpu_drm.h | 252 +++
 22 files changed, 2722 insertions(+), 45 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.h
 create mode 100644 drivers/gpu/drm/amd/

[PATCH v11 01/28] drm/amdgpu: UAPI for user queue management

2024-09-09 Thread Shashank Sharma
From: Alex Deucher 

This patch introduces a new UAPI/IOCTL for usermode graphics
queues. The userspace app will fill this structure and request that
the graphics driver add a graphics work queue for it. The
output of this UAPI is a queue id.

This UAPI maps the queue into the GPU, so the graphics app can start
submitting work to the queue as soon as the call returns.

V2: Addressed review comments from Alex and Christian
- Make the doorbell offset's comment clearer
- Change the output parameter name to queue_id

V3: Integration with doorbell manager

V4:
- Updated the UAPI doc (Pierre-Eric)
- Created a Union for engine specific MQDs (Alex)
- Added Christian's R-B
V5:
- Add variables for GDS and CSA in MQD structure (Alex)
- Make MQD data a ptr-size pair instead of union (Alex)

V9:
   - renamed struct drm_amdgpu_userq_mqd_gfx_v11 to struct
 drm_amdgpu_userq_mqd as it is being used for SDMA and
 compute queues as well

V10:
- keeping the drm_amdgpu_userq_mqd IP independent, moving the
  _gfx_v11 objects into a separate structure in another patch.
  (Alex)

Cc: Alex Deucher 
Cc: Christian Koenig 
Reviewed-by: Christian König 
Signed-off-by: Alex Deucher 
Signed-off-by: Shashank Sharma 
---
 include/uapi/drm/amdgpu_drm.h | 90 +++
 1 file changed, 90 insertions(+)

diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index 3e488b0119eb..bd8d47a3 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -54,6 +54,7 @@ extern "C" {
 #define DRM_AMDGPU_VM  0x13
 #define DRM_AMDGPU_FENCE_TO_HANDLE 0x14
 #define DRM_AMDGPU_SCHED   0x15
+#define DRM_AMDGPU_USERQ   0x16
 
 #define DRM_IOCTL_AMDGPU_GEM_CREATEDRM_IOWR(DRM_COMMAND_BASE + 
DRM_AMDGPU_GEM_CREATE, union drm_amdgpu_gem_create)
 #define DRM_IOCTL_AMDGPU_GEM_MMAP  DRM_IOWR(DRM_COMMAND_BASE + 
DRM_AMDGPU_GEM_MMAP, union drm_amdgpu_gem_mmap)
@@ -71,6 +72,7 @@ extern "C" {
 #define DRM_IOCTL_AMDGPU_VMDRM_IOWR(DRM_COMMAND_BASE + 
DRM_AMDGPU_VM, union drm_amdgpu_vm)
 #define DRM_IOCTL_AMDGPU_FENCE_TO_HANDLE DRM_IOWR(DRM_COMMAND_BASE + 
DRM_AMDGPU_FENCE_TO_HANDLE, union drm_amdgpu_fence_to_handle)
 #define DRM_IOCTL_AMDGPU_SCHED DRM_IOW(DRM_COMMAND_BASE + 
DRM_AMDGPU_SCHED, union drm_amdgpu_sched)
+#define DRM_IOCTL_AMDGPU_USERQ DRM_IOWR(DRM_COMMAND_BASE + 
DRM_AMDGPU_USERQ, union drm_amdgpu_userq)
 
 /**
  * DOC: memory domains
@@ -319,6 +321,94 @@ union drm_amdgpu_ctx {
union drm_amdgpu_ctx_out out;
 };
 
+/* user queue IOCTL */
+#define AMDGPU_USERQ_OP_CREATE 1
+#define AMDGPU_USERQ_OP_FREE   2
+
+/* Flag to indicate secure buffer related workload, unused for now */
+#define AMDGPU_USERQ_MQD_FLAGS_SECURE  (1 << 0)
+/* Flag to indicate AQL workload, unused for now */
+#define AMDGPU_USERQ_MQD_FLAGS_AQL (1 << 1)
+
+/*
+ * MQD (memory queue descriptor) is a set of parameters which allow
+ * the GPU to uniquely define and identify a usermode queue. This
+ * structure defines the MQD for GFX-V11 IP ver 0.
+ */
+struct drm_amdgpu_userq_in {
+   /** AMDGPU_USERQ_OP_* */
+   __u32   op;
+   /** Queue handle for USERQ_OP_FREE */
+   __u32   queue_id;
+   /** the target GPU engine to execute workload (AMDGPU_HW_IP_*) */
+   __u32   ip_type;
+   /**
+* @flags: flags to indicate special function for queue like secure
+* buffer (TMZ). Unused for now.
+*/
+   __u32   flags;
+   /**
+* @doorbell_handle: the handle of doorbell GEM object
+* associated to this client.
+*/
+   __u32   doorbell_handle;
+   /**
+* @doorbell_offset: 32-bit offset of the doorbell in the doorbell bo.
+* Kernel will generate absolute doorbell offset using doorbell_handle
+* and doorbell_offset in the doorbell bo.
+*/
+   __u32   doorbell_offset;
+
+   /**
+* @queue_va: Virtual address of the GPU memory which holds the queue
+* object. The queue holds the workload packets.
+*/
+   __u64   queue_va;
+   /**
+* @queue_size: Size of the queue in bytes, this needs to be 256-byte
+* aligned.
+*/
+   __u64   queue_size;
+   /**
+* @rptr_va : Virtual address of the GPU memory which holds the ring 
RPTR.
+* This object must be at least 8 byte in size and aligned to 8-byte 
offset.
+*/
+   __u64   rptr_va;
+   /**
+* @wptr_va : Virtual address of the GPU memory which holds the ring 
WPTR.
+* This object must be at least 8 byte in size and aligned to 8-byte 
offset.
+*
+* Queue, RPTR and WPTR can come from the same object, as long as the 
size
+* and alignment related requirements are met.
+*/
+   __u64   wptr_va;
+   /**
+* @mqd: Queue descriptor for USERQ_OP_CREATE
+ 

[PATCH v11 02/28] drm/amdgpu: add usermode queue base code

2024-09-09 Thread Shashank Sharma
This patch adds IP-independent skeleton code for amdgpu
usermode queues. It contains:
- A new file with the init functions of usermode queues.
- A queue context manager in the driver private data.

V1: Worked on design review comments from RFC patch series:
(https://patchwork.freedesktop.org/series/112214/)
- Alex: Keep a list of queues, instead of single queue per process.
- Christian: Use the queue manager instead of global ptrs,
   Don't keep the queue structure in amdgpu_ctx

V2:
 - Reformatted code, split the big patch into two

V3:
- Integration with doorbell manager

V4:
- Align the structure member names to the largest member's column
  (Luben)
- Added SPDX license (Luben)

V5:
- Do not add amdgpu.h in amdgpu_userqueue.h (Christian).
- Move struct amdgpu_userq_mgr into amdgpu_userqueue.h (Christian).

V6: Rebase
V9: Rebase
V10: Rebase + Alex's R-B

Cc: Alex Deucher 
Cc: Christian Koenig 
Reviewed-by: Christian König 
Reviewed-by: Alex Deucher 
Signed-off-by: Shashank Sharma 
Change-Id: I6585d012a7ead1105bf43a7b91f361d7dd20a9a9
---
 drivers/gpu/drm/amd/amdgpu/Makefile   |  2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu.h   |  3 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c   |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c   |  6 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 40 
 .../gpu/drm/amd/include/amdgpu_userqueue.h| 61 +++
 6 files changed, 113 insertions(+)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
 create mode 100644 drivers/gpu/drm/amd/include/amdgpu_userqueue.h

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
b/drivers/gpu/drm/amd/amdgpu/Makefile
index 34943b866687..dcf64b965bdf 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -250,6 +250,8 @@ amdgpu-y += \
 # add amdkfd interfaces
 amdgpu-y += amdgpu_amdkfd.o
 
+# add gfx usermode queue
+amdgpu-y += amdgpu_userqueue.o
 
 ifneq ($(CONFIG_HSA_AMD),)
 AMDKFD_PATH := ../amdkfd
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 6e6580ab7e04..57a418eec3d3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -112,6 +112,7 @@
 #include "amdgpu_xcp.h"
 #include "amdgpu_seq64.h"
 #include "amdgpu_reg_state.h"
+#include "amdgpu_userqueue.h"
 #if defined(CONFIG_DRM_AMD_ISP)
 #include "amdgpu_isp.h"
 #endif
@@ -493,6 +494,7 @@ struct amdgpu_fpriv {
struct mutexbo_list_lock;
struct idr  bo_list_handles;
struct amdgpu_ctx_mgr   ctx_mgr;
+   struct amdgpu_userq_mgr userq_mgr;
/** GPU partition selection */
uint32_txcp_id;
 };
@@ -1052,6 +1054,7 @@ struct amdgpu_device {
boolenable_uni_mes;
struct amdgpu_mes   mes;
struct amdgpu_mqd   mqds[AMDGPU_HW_IP_NUM];
+   const struct amdgpu_userq_funcs *userq_funcs[AMDGPU_HW_IP_NUM];
 
/* df */
struct amdgpu_dfdf;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 82bde5132dc6..d92f01f3ea44 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -50,6 +50,7 @@
 #include "amdgpu_reset.h"
 #include "amdgpu_sched.h"
 #include "amdgpu_xgmi.h"
+#include "amdgpu_userqueue.h"
 #include "../amdxcp/amdgpu_xcp_drv.h"
 
 /*
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index d9fde38f6ee2..019a377620ce 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -45,6 +45,7 @@
 #include "amdgpu_ras.h"
 #include "amdgpu_reset.h"
 #include "amd_pcie.h"
+#include "amdgpu_userqueue.h"
 
 void amdgpu_unregister_gpu_instance(struct amdgpu_device *adev)
 {
@@ -1392,6 +1393,10 @@ int amdgpu_driver_open_kms(struct drm_device *dev, 
struct drm_file *file_priv)
 
amdgpu_ctx_mgr_init(&fpriv->ctx_mgr, adev);
 
+   r = amdgpu_userq_mgr_init(&fpriv->userq_mgr, adev);
+   if (r)
+   DRM_WARN("Can't setup usermode queues, use legacy workload 
submission only\n");
+
file_priv->driver_priv = fpriv;
goto out_suspend;
 
@@ -1461,6 +1466,7 @@ void amdgpu_driver_postclose_kms(struct drm_device *dev,
 
amdgpu_ctx_mgr_fini(&fpriv->ctx_mgr);
amdgpu_vm_fini(adev, &fpriv->vm);
+   amdgpu_userq_mgr_fini(&fpriv->userq_mgr);
 
if (pasid)
amdgpu_pasid_free_delayed(pd->tbo.base.resv, pasid);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
new file mode 100644
index ..effc0c7c02cf
--- /dev/null
+++ b/drivers/gpu/dr

[PATCH v11 03/28] drm/amdgpu: add new IOCTL for usermode queue

2024-09-09 Thread Shashank Sharma
This patch adds:
- A new IOCTL function to create and destroy usermode queues.
- A new structure to keep all the user queue data in one place.
- A function to generate a unique index for the queue.

V1: Worked on review comments from RFC patch series:
  - Alex: Keep a list of queues, instead of single queue per process.
  - Christian: Use the queue manager instead of global ptrs,
   Don't keep the queue structure in amdgpu_ctx

V2: Worked on review comments:
 - Christian:
   - Formatting of text
   - There is no need for queuing of userqueues, with idr in place
 - Alex:
   - Remove use_doorbell, its unnecessary
   - Reuse amdgpu_mqd_props for saving mqd fields

 - Code formatting and re-arrangement

V3:
 - Integration with doorbell manager

V4:
 - Accommodate MQD union related changes in UAPI (Alex)
 - Do not set the queue size twice (Bas)

V5:
 - Remove wrapper functions for queue indexing (Christian)
 - Do not save the queue id/idr in queue itself (Christian)
 - Move the idr allocation in the IP independent generic space
  (Christian)

V6:
 - Check the validity of input IP type (Christian)

V7:
 - Move uq_func from uq_mgr to adev (Alex)
 - Add missing free(queue) for error cases (Yifan)

V9:
 - Rebase

V10: Addressed review comments from Christian, and added R-B:
 - Do not initialize the local variable
 - Convert DRM_ERROR to DEBUG.

V11:
  - check the input flags to be zero (Alex)

Cc: Alex Deucher 
Cc: Christian Koenig 
Reviewed-by: Christian Koenig 
Reviewed-by: Alex Deucher 
Signed-off-by: Shashank Sharma 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c   |   1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 120 ++
 .../gpu/drm/amd/include/amdgpu_userqueue.h|   2 +
 3 files changed, 123 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index d92f01f3ea44..79db64d30c18 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -2951,6 +2951,7 @@ const struct drm_ioctl_desc amdgpu_ioctls_kms[] = {
DRM_IOCTL_DEF_DRV(AMDGPU_GEM_VA, amdgpu_gem_va_ioctl, 
DRM_AUTH|DRM_RENDER_ALLOW),
DRM_IOCTL_DEF_DRV(AMDGPU_GEM_OP, amdgpu_gem_op_ioctl, 
DRM_AUTH|DRM_RENDER_ALLOW),
DRM_IOCTL_DEF_DRV(AMDGPU_GEM_USERPTR, amdgpu_gem_userptr_ioctl, 
DRM_AUTH|DRM_RENDER_ALLOW),
+   DRM_IOCTL_DEF_DRV(AMDGPU_USERQ, amdgpu_userq_ioctl, 
DRM_AUTH|DRM_RENDER_ALLOW),
 };
 
 static const struct drm_driver amdgpu_kms_driver = {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
index effc0c7c02cf..cf7fe68d9277 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -23,6 +23,126 @@
  */
 
 #include "amdgpu.h"
+#include "amdgpu_vm.h"
+#include "amdgpu_userqueue.h"
+
+static struct amdgpu_usermode_queue *
+amdgpu_userqueue_find(struct amdgpu_userq_mgr *uq_mgr, int qid)
+{
+   return idr_find(&uq_mgr->userq_idr, qid);
+}
+
+static int
+amdgpu_userqueue_destroy(struct drm_file *filp, int queue_id)
+{
+   struct amdgpu_fpriv *fpriv = filp->driver_priv;
+   struct amdgpu_userq_mgr *uq_mgr = &fpriv->userq_mgr;
+   struct amdgpu_device *adev = uq_mgr->adev;
+   const struct amdgpu_userq_funcs *uq_funcs;
+   struct amdgpu_usermode_queue *queue;
+
+   mutex_lock(&uq_mgr->userq_mutex);
+
+   queue = amdgpu_userqueue_find(uq_mgr, queue_id);
+   if (!queue) {
+   DRM_DEBUG_DRIVER("Invalid queue id to destroy\n");
+   mutex_unlock(&uq_mgr->userq_mutex);
+   return -EINVAL;
+   }
+
+   uq_funcs = adev->userq_funcs[queue->queue_type];
+   uq_funcs->mqd_destroy(uq_mgr, queue);
+   idr_remove(&uq_mgr->userq_idr, queue_id);
+   kfree(queue);
+
+   mutex_unlock(&uq_mgr->userq_mutex);
+   return 0;
+}
+
+static int
+amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq *args)
+{
+   struct amdgpu_fpriv *fpriv = filp->driver_priv;
+   struct amdgpu_userq_mgr *uq_mgr = &fpriv->userq_mgr;
+   struct amdgpu_device *adev = uq_mgr->adev;
+   const struct amdgpu_userq_funcs *uq_funcs;
+   struct amdgpu_usermode_queue *queue;
+   int qid, r = 0;
+
+   if (args->in.flags) {
+   DRM_ERROR("Usermode queue flags not supported yet\n");
+   return -EINVAL;
+   }
+
+   mutex_lock(&uq_mgr->userq_mutex);
+
+   uq_funcs = adev->userq_funcs[args->in.ip_type];
+   if (!uq_funcs) {
+   DRM_ERROR("Usermode queue is not supported for this IP (%u)\n", 
args->in.ip_type);
+   r = -EINVAL;
+   goto unlock;
+   }
+
+   queue = kzalloc(sizeof(struct amdgpu_usermode_queue), GFP_KERNEL);
+   if (!queue) {
+   DRM_ERROR("Failed 

[PATCH v11 04/28] drm/amdgpu: add helpers to create userqueue object

2024-09-09 Thread Shashank Sharma
This patch introduces amdgpu_userqueue_object and helper
functions to create and destroy this object. The helper
functions create/destroy a base amdgpu_bo, kmap/unmap it and
save the respective GPU and CPU addresses in the encapsulating
userqueue object.

These helpers will be used to create/destroy the userqueue MQD, WPTR
and FW areas.

V7:
- Forked out this new patch from V11-gfx-userqueue patch to prevent
  that patch from growing very big.
- Using amdgpu_bo_create instead of amdgpu_bo_create_kernel in prep
  for eviction fences (Christian)

V9:
 - Rebase
V10:
 - Added Alex's R-B

Cc: Alex Deucher 
Cc: Christian Koenig 
Reviewed-by: Alex Deucher 
Signed-off-by: Shashank Sharma 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 62 +++
 .../gpu/drm/amd/include/amdgpu_userqueue.h| 13 
 2 files changed, 75 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
index cf7fe68d9277..501324dde343 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -32,6 +32,68 @@ amdgpu_userqueue_find(struct amdgpu_userq_mgr *uq_mgr, int 
qid)
return idr_find(&uq_mgr->userq_idr, qid);
 }
 
+int amdgpu_userqueue_create_object(struct amdgpu_userq_mgr *uq_mgr,
+  struct amdgpu_userq_obj *userq_obj,
+  int size)
+{
+   struct amdgpu_device *adev = uq_mgr->adev;
+   struct amdgpu_bo_param bp;
+   int r;
+
+   memset(&bp, 0, sizeof(bp));
+   bp.byte_align = PAGE_SIZE;
+   bp.domain = AMDGPU_GEM_DOMAIN_GTT;
+   bp.flags = AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS |
+  AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED;
+   bp.type = ttm_bo_type_kernel;
+   bp.size = size;
+   bp.resv = NULL;
+   bp.bo_ptr_size = sizeof(struct amdgpu_bo);
+
+   r = amdgpu_bo_create(adev, &bp, &userq_obj->obj);
+   if (r) {
+   DRM_ERROR("Failed to allocate BO for userqueue (%d)", r);
+   return r;
+   }
+
+   r = amdgpu_bo_reserve(userq_obj->obj, true);
+   if (r) {
+   DRM_ERROR("Failed to reserve BO to map (%d)", r);
+   goto free_obj;
+   }
+
+   r = amdgpu_ttm_alloc_gart(&(userq_obj->obj)->tbo);
+   if (r) {
+   DRM_ERROR("Failed to alloc GART for userqueue object (%d)", r);
+   goto unresv;
+   }
+
+   r = amdgpu_bo_kmap(userq_obj->obj, &userq_obj->cpu_ptr);
+   if (r) {
+   DRM_ERROR("Failed to map BO for userqueue (%d)", r);
+   goto unresv;
+   }
+
+   userq_obj->gpu_addr = amdgpu_bo_gpu_offset(userq_obj->obj);
+   amdgpu_bo_unreserve(userq_obj->obj);
+   memset(userq_obj->cpu_ptr, 0, size);
+   return 0;
+
+unresv:
+   amdgpu_bo_unreserve(userq_obj->obj);
+
+free_obj:
+   amdgpu_bo_unref(&userq_obj->obj);
+   return r;
+}
+
+void amdgpu_userqueue_destroy_object(struct amdgpu_userq_mgr *uq_mgr,
+  struct amdgpu_userq_obj *userq_obj)
+{
+   amdgpu_bo_kunmap(userq_obj->obj);
+   amdgpu_bo_unref(&userq_obj->obj);
+}
+
 static int
 amdgpu_userqueue_destroy(struct drm_file *filp, int queue_id)
 {
diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h 
b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
index b739274c72e1..bbd29f68b8d4 100644
--- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
+++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
@@ -29,6 +29,12 @@
 
 struct amdgpu_mqd_prop;
 
+struct amdgpu_userq_obj {
+   void *cpu_ptr;
+   uint64_t gpu_addr;
+   struct amdgpu_bo *obj;
+};
+
 struct amdgpu_usermode_queue {
int queue_type;
uint64_tdoorbell_handle;
@@ -37,6 +43,7 @@ struct amdgpu_usermode_queue {
struct amdgpu_mqd_prop  *userq_prop;
struct amdgpu_userq_mgr *userq_mgr;
struct amdgpu_vm*vm;
+   struct amdgpu_userq_obj mqd;
 };
 
 struct amdgpu_userq_funcs {
@@ -60,4 +67,10 @@ int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr 
*userq_mgr, struct amdgpu_devi
 
 void amdgpu_userq_mgr_fini(struct amdgpu_userq_mgr *userq_mgr);
 
+int amdgpu_userqueue_create_object(struct amdgpu_userq_mgr *uq_mgr,
+  struct amdgpu_userq_obj *userq_obj,
+  int size);
+
+void amdgpu_userqueue_destroy_object(struct amdgpu_userq_mgr *uq_mgr,
+struct amdgpu_userq_obj *userq_obj);
 #endif
-- 
2.45.1



[PATCH v11 06/28] drm/amdgpu: create context space for usermode queue

2024-09-09 Thread Shashank Sharma
The MES FW expects us to allocate at least one page each as context
space for process and gang related context data. This
patch creates a joint object for the two, and calculates the GPU
address offsets of these spaces.

V1: Addressed review comments on RFC patch:
Alex: Make this function IP specific

V2: Addressed review comments from Christian
- Allocate only one object for total FW space, and calculate
  offsets for each of these objects.

V3: Integration with doorbell manager

V4: Review comments:
- Remove shadow from FW space list from cover letter (Alex)
- Alignment of macro (Luben)

V5: Merged patches 5 and 6 into this single patch
Addressed review comments:
- Use lower_32_bits instead of mask (Christian)
- gfx_v11_0 instead of gfx_v11 in function names (Alex)
- Shadow and GDS objects are now coming from userspace (Christian,
  Alex)

V6:
- Add a comment to replace amdgpu_bo_create_kernel() with
  amdgpu_bo_create() during fw_ctx object creation (Christian).
- Move proc_ctx_gpu_addr, gang_ctx_gpu_addr and fw_ctx_gpu_addr out
  of generic queue structure and make it gen11 specific (Alex).

V7:
   - Using helper function to create/destroy userqueue objects.
   - Removed FW object space allocation.

V8:
   - Updating FW object address from user values.

V9:
   - updated function names from gfx_v11_* to mes_v11_*

V10:
   - making this patch independent of IP based changes, moving any
 GFX object related changes in GFX specific patch (Alex)

Cc: Alex Deucher 
Cc: Christian Koenig 
Acked-by: Christian Koenig 
Signed-off-by: Shashank Sharma 
Signed-off-by: Arvind Yadav 
---
 .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  | 33 +++
 .../gpu/drm/amd/include/amdgpu_userqueue.h|  1 +
 2 files changed, 34 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
index 63fd48a5b8b0..2486ea2d72fe 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
@@ -27,6 +27,31 @@
 #include "mes_v11_0.h"
 #include "mes_v11_0_userqueue.h"
 
+#define AMDGPU_USERQ_PROC_CTX_SZ PAGE_SIZE
+#define AMDGPU_USERQ_GANG_CTX_SZ PAGE_SIZE
+
+static int mes_v11_0_userq_create_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
+   struct amdgpu_usermode_queue *queue,
+   struct drm_amdgpu_userq_in 
*mqd_user)
+{
+   struct amdgpu_userq_obj *ctx = &queue->fw_obj;
+   int r, size;
+
+   /*
+* The FW expects at least one page space allocated for
+* process ctx and gang ctx each. Create an object
+* for the same.
+*/
+   size = AMDGPU_USERQ_PROC_CTX_SZ + AMDGPU_USERQ_GANG_CTX_SZ;
+   r = amdgpu_userqueue_create_object(uq_mgr, ctx, size);
+   if (r) {
+   DRM_ERROR("Failed to allocate ctx space bo for userqueue, 
err:%d\n", r);
+   return r;
+   }
+
+   return 0;
+}
+
 static int mes_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
  struct drm_amdgpu_userq_in *args_in,
  struct amdgpu_usermode_queue *queue)
@@ -73,6 +98,13 @@ static int mes_v11_0_userq_mqd_create(struct 
amdgpu_userq_mgr *uq_mgr,
goto free_mqd;
}
 
+   /* Create BO for FW operations */
+   r = mes_v11_0_userq_create_ctx_space(uq_mgr, queue, mqd_user);
+   if (r) {
+   DRM_ERROR("Failed to allocate BO for userqueue (%d)", r);
+   goto free_mqd;
+   }
+
return 0;
 
 free_mqd:
@@ -88,6 +120,7 @@ static void
 mes_v11_0_userq_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr,
struct amdgpu_usermode_queue *queue)
 {
+   amdgpu_userqueue_destroy_object(uq_mgr, &queue->fw_obj);
kfree(queue->userq_prop);
amdgpu_userqueue_destroy_object(uq_mgr, &queue->mqd);
 }
diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h 
b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
index bbd29f68b8d4..643f31474bd8 100644
--- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
+++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
@@ -44,6 +44,7 @@ struct amdgpu_usermode_queue {
struct amdgpu_userq_mgr *userq_mgr;
struct amdgpu_vm*vm;
struct amdgpu_userq_obj mqd;
+   struct amdgpu_userq_obj fw_obj;
 };
 
 struct amdgpu_userq_funcs {
-- 
2.45.1



[PATCH v11 05/28] drm/amdgpu: create MES-V11 usermode queue for GFX

2024-09-09 Thread Shashank Sharma
A memory queue descriptor (MQD) of a userqueue defines it in
the HW's context. As the MQD format can vary between different
graphics IPs, we need GFX-generation-specific handlers to create MQDs.

This patch:
- Adds a new file which will be used for MES based userqueue
  functions targeting GFX and SDMA IP.
- Introduces MQD handler functions for the usermode queues.

V1: Worked on review comments from Alex:
- Make MQD functions GEN and IP specific

V2: Worked on review comments from Alex:
- Reuse the existing adev->mqd[ip] for MQD creation
- Formatting and arrangement of code

V3:
- Integration with doorbell manager

V4: Review comments addressed:
- Do not create a new file for userq, reuse gfx_v11_0.c (Alex)
- Align name of structure members (Luben)
- Don't break up the Cc tag list and the Sob tag list in commit
  message (Luben)
V5:
   - No need to reserve the bo for MQD (Christian).
   - Some more changes to support IP specific MQD creation.

V6:
   - Add a comment reminding us to replace the amdgpu_bo_create_kernel()
 calls while creating MQD object to amdgpu_bo_create() once eviction
 fences are ready (Christian).

V7:
   - Re-arrange userqueue functions in adev instead of uq_mgr (Alex)
   - Use memdup_user instead of copy_from_user (Christian)

V9:
   - Moved userqueue code from gfx_v11_0.c to new file mes_v11_0.c so
 that it can be reused for SDMA userqueues as well (Shashank, Alex)

V10: Addressed review comments from Alex
   - Making this patch independent of IP engine(GFX/SDMA/Compute) and
 specific to MES V11 only, using the generic MQD structure.
   - Splitting out a separate patch to enable GFX support from here.
   - Verify mqd va address to be non-NULL.
   - Add a separate header file.

Cc: Alex Deucher 
Cc: Christian Koenig 
Signed-off-by: Shashank Sharma 
Signed-off-by: Arvind Yadav 
Change-Id: I855f895a4822ef015957542bc17eabb166b792e6
---
 drivers/gpu/drm/amd/amdgpu/Makefile   |  3 +-
 .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  | 98 +++
 .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.h  | 30 ++
 3 files changed, 130 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
 create mode 100644 drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.h

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
b/drivers/gpu/drm/amd/amdgpu/Makefile
index dcf64b965bdf..d9bf70251eba 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -173,7 +173,8 @@ amdgpu-y += \
 amdgpu-y += \
amdgpu_mes.o \
mes_v11_0.o \
-   mes_v12_0.o
+   mes_v12_0.o \
+   mes_v11_0_userqueue.o
 
 # add UVD block
 amdgpu-y += \
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
new file mode 100644
index ..63fd48a5b8b0
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
@@ -0,0 +1,98 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright 2024 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+#include "amdgpu.h"
+#include "amdgpu_gfx.h"
+#include "v11_structs.h"
+#include "mes_v11_0.h"
+#include "mes_v11_0_userqueue.h"
+
+static int mes_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
+ struct drm_amdgpu_userq_in *args_in,
+ struct amdgpu_usermode_queue *queue)
+{
+   struct amdgpu_device *adev = uq_mgr->adev;
+   struct amdgpu_mqd *mqd_hw_default = &adev->mqds[queue->queue_type];
+   struct drm_amdgpu_userq_in *mqd_user = args_in;
+   struct amdgpu_mqd_prop *userq_props;
+   int r;
+
+   /* Structure to initialize MQD for userqueue using generic MQD init 
function */
+   userq_props = kzalloc(sizeof(struct amdgpu_m

[PATCH v11 07/28] drm/amdgpu: map usermode queue into MES

2024-09-09 Thread Shashank Sharma
This patch adds new functions to map/unmap a usermode queue into
the FW, using the MES ring. As soon as this mapping is done, the
queue is considered ready to accept workloads.

V1: Addressed review comments from Alex on the RFC patch series
- Map/Unmap should be IP specific.
V2:
Addressed review comments from Christian:
- Fix the wptr_mc_addr calculation (moved into another patch)
Addressed review comments from Alex:
- Do not add fptrs for map/unmap

V3:  Integration with doorbell manager
V4:  Rebase
V5:  Use gfx_v11_0 for function names (Alex)
V6:  Removed queue->proc/gang/fw_ctx_address variables and doing the
 address calculations locally to keep the queue structure GEN
 independent (Alex)
V7:  Added R-B from Alex
V8:  Rebase
V9:  Rebase
V10: Rebase

Cc: Alex Deucher 
Cc: Christian Koenig 
Reviewed-by: Alex Deucher 
Signed-off-by: Shashank Sharma 
---
 .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  | 74 +++
 1 file changed, 74 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
index 2486ea2d72fe..a1bc6f488928 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
@@ -30,6 +30,69 @@
 #define AMDGPU_USERQ_PROC_CTX_SZ PAGE_SIZE
 #define AMDGPU_USERQ_GANG_CTX_SZ PAGE_SIZE
 
+static int mes_v11_0_userq_map(struct amdgpu_userq_mgr *uq_mgr,
+  struct amdgpu_usermode_queue *queue,
+  struct amdgpu_mqd_prop *userq_props)
+{
+   struct amdgpu_device *adev = uq_mgr->adev;
+   struct amdgpu_userq_obj *ctx = &queue->fw_obj;
+   struct mes_add_queue_input queue_input;
+   int r;
+
+   memset(&queue_input, 0x0, sizeof(struct mes_add_queue_input));
+
+   queue_input.process_va_start = 0;
+   queue_input.process_va_end = (adev->vm_manager.max_pfn - 1) << AMDGPU_GPU_PAGE_SHIFT;
+
+   /* set process quantum to 10 ms and gang quantum to 1 ms as default */
+   queue_input.process_quantum = 10;
+   queue_input.gang_quantum = 1;
+   queue_input.paging = false;
+
+   queue_input.process_context_addr = ctx->gpu_addr;
+   queue_input.gang_context_addr = ctx->gpu_addr + AMDGPU_USERQ_PROC_CTX_SZ;
+   queue_input.inprocess_gang_priority = AMDGPU_MES_PRIORITY_LEVEL_NORMAL;
+   queue_input.gang_global_priority_level = AMDGPU_MES_PRIORITY_LEVEL_NORMAL;
+
+   queue_input.process_id = queue->vm->pasid;
+   queue_input.queue_type = queue->queue_type;
+   queue_input.mqd_addr = queue->mqd.gpu_addr;
+   queue_input.wptr_addr = userq_props->wptr_gpu_addr;
+   queue_input.queue_size = userq_props->queue_size >> 2;
+   queue_input.doorbell_offset = userq_props->doorbell_index;
+   queue_input.page_table_base_addr = amdgpu_gmc_pd_addr(queue->vm->root.bo);
+
+   amdgpu_mes_lock(&adev->mes);
+   r = adev->mes.funcs->add_hw_queue(&adev->mes, &queue_input);
+   amdgpu_mes_unlock(&adev->mes);
+   if (r) {
+   DRM_ERROR("Failed to map queue in HW, err (%d)\n", r);
+   return r;
+   }
+
+   DRM_DEBUG_DRIVER("Queue (doorbell:%d) mapped successfully\n", userq_props->doorbell_index);
+   return 0;
+}
+
+static void mes_v11_0_userq_unmap(struct amdgpu_userq_mgr *uq_mgr,
+ struct amdgpu_usermode_queue *queue)
+{
+   struct amdgpu_device *adev = uq_mgr->adev;
+   struct mes_remove_queue_input queue_input;
+   struct amdgpu_userq_obj *ctx = &queue->fw_obj;
+   int r;
+
+   memset(&queue_input, 0x0, sizeof(struct mes_remove_queue_input));
+   queue_input.doorbell_offset = queue->doorbell_index;
+   queue_input.gang_context_addr = ctx->gpu_addr + AMDGPU_USERQ_PROC_CTX_SZ;
+
+   amdgpu_mes_lock(&adev->mes);
+   r = adev->mes.funcs->remove_hw_queue(&adev->mes, &queue_input);
+   amdgpu_mes_unlock(&adev->mes);
+   if (r)
+   DRM_ERROR("Failed to unmap queue in HW, err (%d)\n", r);
+}
+
 static int mes_v11_0_userq_create_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
struct amdgpu_usermode_queue *queue,
struct drm_amdgpu_userq_in *mqd_user)
@@ -105,8 +168,18 @@ static int mes_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
goto free_mqd;
}
 
+   /* Map userqueue into FW using MES */
+   r = mes_v11_0_userq_map(uq_mgr, queue, userq_props);
+   if (r) {
+   DRM_ERROR("Failed to init MQD\n");
+   goto free_ctx;
+   }
+
return 0;
 
+free_ctx:
+   amdgpu_userqueue_destroy_object(uq_mgr, &queue->fw_obj);
+
 free_

[PATCH v11 09/28] drm/amdgpu: generate doorbell index for userqueue

2024-09-09 Thread Shashank Sharma
The userspace sends us the doorbell object and the relative doorbell
index in the object to be used for the usermode queue, but the FW
expects the absolute doorbell index on the PCI BAR in the MQD. This
patch adds a function to convert this relative doorbell index to an
absolute doorbell index.

V5:  Fix the db object reference leak (Christian)
V6:  Pin the doorbell bo in userqueue_create() function, and unpin it
 in userqueue destroy (Christian)
V7:  Added missing kfree for queue in error cases
 Added Alex's R-B
V8:  Rebase
V9:  Changed the function names from gfx_v11* to mes_v11*
V10: Rebase

Cc: Alex Deucher 
Cc: Christian Koenig 
Reviewed-by: Alex Deucher 
Signed-off-by: Shashank Sharma 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 59 +++
 .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  |  1 +
 .../gpu/drm/amd/include/amdgpu_userqueue.h|  1 +
 3 files changed, 61 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
index 501324dde343..3c9f804478d5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -94,6 +94,53 @@ void amdgpu_userqueue_destroy_object(struct amdgpu_userq_mgr *uq_mgr,
amdgpu_bo_unref(&userq_obj->obj);
 }
 
+static uint64_t
+amdgpu_userqueue_get_doorbell_index(struct amdgpu_userq_mgr *uq_mgr,
+struct amdgpu_usermode_queue *queue,
+struct drm_file *filp,
+uint32_t doorbell_offset)
+{
+   uint64_t index;
+   struct drm_gem_object *gobj;
+   struct amdgpu_userq_obj *db_obj = &queue->db_obj;
+   int r;
+
+   gobj = drm_gem_object_lookup(filp, queue->doorbell_handle);
+   if (gobj == NULL) {
+   DRM_ERROR("Can't find GEM object for doorbell\n");
+   return -EINVAL;
+   }
+
+   db_obj->obj = amdgpu_bo_ref(gem_to_amdgpu_bo(gobj));
+   drm_gem_object_put(gobj);
+
+   /* Pin the BO before generating the index, unpin in queue destroy */
+   r = amdgpu_bo_pin(db_obj->obj, AMDGPU_GEM_DOMAIN_DOORBELL);
+   if (r) {
+   DRM_ERROR("[Usermode queues] Failed to pin doorbell object\n");
+   goto unref_bo;
+   }
+
+   r = amdgpu_bo_reserve(db_obj->obj, true);
+   if (r) {
+   DRM_ERROR("[Usermode queues] Failed to reserve doorbell object\n");
+   goto unpin_bo;
+   }
+
+   index = amdgpu_doorbell_index_on_bar(uq_mgr->adev, db_obj->obj,
+doorbell_offset, sizeof(u64));
+   DRM_DEBUG_DRIVER("[Usermode queues] doorbell index=%lld\n", index);
+   amdgpu_bo_unreserve(db_obj->obj);
+   return index;
+
+unpin_bo:
+   amdgpu_bo_unpin(db_obj->obj);
+
+unref_bo:
+   amdgpu_bo_unref(&db_obj->obj);
+   return r;
+}
+
 static int
 amdgpu_userqueue_destroy(struct drm_file *filp, int queue_id)
 {
@@ -114,6 +161,8 @@ amdgpu_userqueue_destroy(struct drm_file *filp, int queue_id)
 
uq_funcs = adev->userq_funcs[queue->queue_type];
uq_funcs->mqd_destroy(uq_mgr, queue);
+   amdgpu_bo_unpin(queue->db_obj.obj);
+   amdgpu_bo_unref(&queue->db_obj.obj);
idr_remove(&uq_mgr->userq_idr, queue_id);
kfree(queue);
 
@@ -129,6 +178,7 @@ amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq *args)
struct amdgpu_device *adev = uq_mgr->adev;
const struct amdgpu_userq_funcs *uq_funcs;
struct amdgpu_usermode_queue *queue;
+   uint64_t index;
int qid, r = 0;
 
if (args->in.flags) {
@@ -157,6 +207,15 @@ amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq *args)
queue->flags = args->in.flags;
queue->vm = &fpriv->vm;
 
+   /* Convert relative doorbell offset into absolute doorbell index */
+   index = amdgpu_userqueue_get_doorbell_index(uq_mgr, queue, filp, args->in.doorbell_offset);
+   if (index == (uint64_t)-EINVAL) {
+   DRM_ERROR("Failed to get doorbell for queue\n");
+   kfree(queue);
+   goto unlock;
+   }
+   queue->doorbell_index = index;
+
r = uq_funcs->mqd_create(uq_mgr, &args->in, queue);
if (r) {
DRM_ERROR("Failed to create Queue\n");
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
index 90511abaef05..bc9ce5233a7d 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
@@ -220,6 +220,7 @@ static int mes_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
userq_props->hqd_base_gpu_addr = mqd_user->queue_va;
  

[PATCH v11 08/28] drm/amdgpu: map wptr BO into GART

2024-09-09 Thread Shashank Sharma
To support oversubscription, MES FW expects WPTR BOs to
be mapped into GART before they are submitted to usermode
queues. This patch adds a function to do exactly that.

V4: fix the wptr value before mapping lookup (Bas, Christian).

V5: Addressed review comments from Christian:
- Either pin object or allocate from GART, but not both.
- All the handling must be done with the VM locks held.

V7: Addressed review comments from Christian:
- Do not take vm->eviction_lock
- Use amdgpu_bo_gpu_offset to get the wptr_bo GPU offset

V8:  Rebase
V9:  Changed the function names from gfx_v11* to mes_v11*
V10: Remove unused adev (Harish)

Cc: Alex Deucher 
Cc: Christian Koenig 
Signed-off-by: Shashank Sharma 
Signed-off-by: Arvind Yadav 
---
 .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  | 76 +++
 .../gpu/drm/amd/include/amdgpu_userqueue.h|  1 +
 2 files changed, 77 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
index a1bc6f488928..90511abaef05 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
@@ -30,6 +30,73 @@
 #define AMDGPU_USERQ_PROC_CTX_SZ PAGE_SIZE
 #define AMDGPU_USERQ_GANG_CTX_SZ PAGE_SIZE
 
+static int
+mes_v11_0_map_gtt_bo_to_gart(struct amdgpu_bo *bo)
+{
+   int ret;
+
+   ret = amdgpu_bo_reserve(bo, true);
+   if (ret) {
+   DRM_ERROR("Failed to reserve bo. ret %d\n", ret);
+   goto err_reserve_bo_failed;
+   }
+
+   ret = amdgpu_ttm_alloc_gart(&bo->tbo);
+   if (ret) {
+   DRM_ERROR("Failed to bind bo to GART. ret %d\n", ret);
+   goto err_map_bo_gart_failed;
+   }
+
+   amdgpu_bo_unreserve(bo);
+   bo = amdgpu_bo_ref(bo);
+
+   return 0;
+
+err_map_bo_gart_failed:
+   amdgpu_bo_unreserve(bo);
+err_reserve_bo_failed:
+   return ret;
+}
+
+static int
+mes_v11_0_create_wptr_mapping(struct amdgpu_userq_mgr *uq_mgr,
+ struct amdgpu_usermode_queue *queue,
+ uint64_t wptr)
+{
+   struct amdgpu_bo_va_mapping *wptr_mapping;
+   struct amdgpu_vm *wptr_vm;
+   struct amdgpu_userq_obj *wptr_obj = &queue->wptr_obj;
+   int ret;
+
+   wptr_vm = queue->vm;
+   ret = amdgpu_bo_reserve(wptr_vm->root.bo, false);
+   if (ret)
+   return ret;
+
+   wptr &= AMDGPU_GMC_HOLE_MASK;
+   wptr_mapping = amdgpu_vm_bo_lookup_mapping(wptr_vm, wptr >> PAGE_SHIFT);
+   amdgpu_bo_unreserve(wptr_vm->root.bo);
+   if (!wptr_mapping) {
+   DRM_ERROR("Failed to lookup wptr bo\n");
+   return -EINVAL;
+   }
+
+   wptr_obj->obj = wptr_mapping->bo_va->base.bo;
+   if (wptr_obj->obj->tbo.base.size > PAGE_SIZE) {
+   DRM_ERROR("Requested GART mapping for wptr bo larger than one page\n");
+   return -EINVAL;
+   }
+
+   ret = mes_v11_0_map_gtt_bo_to_gart(wptr_obj->obj);
+   if (ret) {
+   DRM_ERROR("Failed to map wptr bo to GART\n");
+   return ret;
+   }
+
+   queue->wptr_obj.gpu_addr = amdgpu_bo_gpu_offset_no_check(wptr_obj->obj);
+   return 0;
+}
+
 static int mes_v11_0_userq_map(struct amdgpu_userq_mgr *uq_mgr,
   struct amdgpu_usermode_queue *queue,
   struct amdgpu_mqd_prop *userq_props)
@@ -61,6 +128,7 @@ static int mes_v11_0_userq_map(struct amdgpu_userq_mgr *uq_mgr,
queue_input.queue_size = userq_props->queue_size >> 2;
queue_input.doorbell_offset = userq_props->doorbell_index;
+   queue_input.page_table_base_addr = amdgpu_gmc_pd_addr(queue->vm->root.bo);
+   queue_input.wptr_mc_addr = queue->wptr_obj.gpu_addr;
 
amdgpu_mes_lock(&adev->mes);
r = adev->mes.funcs->add_hw_queue(&adev->mes, &queue_input);
@@ -168,6 +236,13 @@ static int mes_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
goto free_mqd;
}
 
+   /* FW expects WPTR BOs to be mapped into GART */
+   r = mes_v11_0_create_wptr_mapping(uq_mgr, queue, userq_props->wptr_gpu_addr);
+   if (r) {
+   DRM_ERROR("Failed to create WPTR mapping\n");
+   goto free_ctx;
+   }
+
/* Map userqueue into FW using MES */
r = mes_v11_0_userq_map(uq_mgr, queue, userq_props);
if (r) {
@@ -194,6 +269,7 @@ mes_v11_0_userq_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr,
struct amdgpu_usermode_queue *queue)
 {
mes_v11_0_userq_unmap(uq_mgr, queue);
+   amdgpu_bo_unref(&queue->wptr_obj.obj);
amdgpu_userqueue_destroy_object(uq_mgr, &queue->fw_obj);
kfree(queue->userq_prop);
amdgpu_u

[PATCH v11 10/28] drm/amdgpu: cleanup leftover queues

2024-09-09 Thread Shashank Sharma
This patch adds code to clean up any leftover userqueues which
a user might have failed to destroy due to a crash or any other
programming error.

V7:  Added Alex's R-B
V8:  Rebase
V9:  Rebase
V10: Rebase

Cc: Alex Deucher 
Cc: Christian Koenig 
Reviewed-by: Alex Deucher 
Suggested-by: Bas Nieuwenhuizen 
Signed-off-by: Bas Nieuwenhuizen 
Signed-off-by: Shashank Sharma 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 27 ++-
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
index 3c9f804478d5..64a063ec3b27 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -26,6 +26,19 @@
 #include "amdgpu_vm.h"
 #include "amdgpu_userqueue.h"
 
+static void
+amdgpu_userqueue_cleanup(struct amdgpu_userq_mgr *uq_mgr,
+struct amdgpu_usermode_queue *queue,
+int queue_id)
+{
+   struct amdgpu_device *adev = uq_mgr->adev;
+   const struct amdgpu_userq_funcs *uq_funcs = adev->userq_funcs[queue->queue_type];
+
+   uq_funcs->mqd_destroy(uq_mgr, queue);
+   idr_remove(&uq_mgr->userq_idr, queue_id);
+   kfree(queue);
+}
+
 static struct amdgpu_usermode_queue *
 amdgpu_userqueue_find(struct amdgpu_userq_mgr *uq_mgr, int qid)
 {
@@ -146,8 +159,6 @@ amdgpu_userqueue_destroy(struct drm_file *filp, int queue_id)
 {
struct amdgpu_fpriv *fpriv = filp->driver_priv;
struct amdgpu_userq_mgr *uq_mgr = &fpriv->userq_mgr;
-   struct amdgpu_device *adev = uq_mgr->adev;
-   const struct amdgpu_userq_funcs *uq_funcs;
struct amdgpu_usermode_queue *queue;
 
mutex_lock(&uq_mgr->userq_mutex);
@@ -159,13 +170,9 @@ amdgpu_userqueue_destroy(struct drm_file *filp, int queue_id)
return -EINVAL;
}
 
-   uq_funcs = adev->userq_funcs[queue->queue_type];
-   uq_funcs->mqd_destroy(uq_mgr, queue);
amdgpu_bo_unpin(queue->db_obj.obj);
amdgpu_bo_unref(&queue->db_obj.obj);
-   idr_remove(&uq_mgr->userq_idr, queue_id);
-   kfree(queue);
-
+   amdgpu_userqueue_cleanup(uq_mgr, queue, queue_id);
mutex_unlock(&uq_mgr->userq_mutex);
return 0;
 }
@@ -276,6 +283,12 @@ int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_devi
 
 void amdgpu_userq_mgr_fini(struct amdgpu_userq_mgr *userq_mgr)
 {
+   uint32_t queue_id;
+   struct amdgpu_usermode_queue *queue;
+
+   idr_for_each_entry(&userq_mgr->userq_idr, queue, queue_id)
+   amdgpu_userqueue_cleanup(userq_mgr, queue, queue_id);
+
idr_destroy(&userq_mgr->userq_idr);
mutex_destroy(&userq_mgr->userq_mutex);
 }
-- 
2.45.1



[PATCH v11 13/28] drm/amdgpu: enable compute/gfx usermode queue

2024-09-09 Thread Shashank Sharma
This patch does the necessary changes required to
enable compute workload support using the existing
usermode queues infrastructure.

V9:  Patch introduced
V10: Add custom IP specific mqd structure for compute (Alex)
V11: Rename drm_amdgpu_userq_mqd_compute_gfx_v11 to
 drm_amdgpu_userq_mqd_compute_gfx11 (Marek)

Cc: Alex Deucher 
Cc: Christian Koenig 
Acked-by: Christian König 
Signed-off-by: Arvind Yadav 
Signed-off-by: Shashank Sharma 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c |  4 +++-
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c|  2 ++
 .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  | 23 +++
 include/uapi/drm/amdgpu_drm.h | 10 
 4 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
index 2c5747cc492e..5173718c3848 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -189,7 +189,9 @@ amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq *args)
int qid, r = 0;
 
/* Usermode queues are only supported for GFX IP as of now */
-   if (args->in.ip_type != AMDGPU_HW_IP_GFX && args->in.ip_type != AMDGPU_HW_IP_DMA) {
+   if (args->in.ip_type != AMDGPU_HW_IP_GFX &&
+   args->in.ip_type != AMDGPU_HW_IP_DMA &&
+   args->in.ip_type != AMDGPU_HW_IP_COMPUTE) {
DRM_ERROR("Usermode queue doesn't support IP type %u\n", args->in.ip_type);
return -EINVAL;
}
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
index e68874fd0ff9..82a8df56240e 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
@@ -1554,6 +1554,7 @@ static int gfx_v11_0_sw_init(void *handle)
adev->gfx.mec.num_pipe_per_mec = 4;
adev->gfx.mec.num_queue_per_pipe = 4;
adev->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_mes_v11_0_funcs;
+   adev->userq_funcs[AMDGPU_HW_IP_COMPUTE] = &userq_mes_v11_0_funcs;
break;
case IP_VERSION(11, 0, 1):
case IP_VERSION(11, 0, 4):
@@ -1567,6 +1568,7 @@ static int gfx_v11_0_sw_init(void *handle)
adev->gfx.mec.num_pipe_per_mec = 4;
adev->gfx.mec.num_queue_per_pipe = 4;
adev->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_mes_v11_0_funcs;
+   adev->userq_funcs[AMDGPU_HW_IP_COMPUTE] = &userq_mes_v11_0_funcs;
break;
default:
adev->gfx.me.num_me = 1;
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
index dc5359742774..e70b8e429e9c 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
@@ -268,6 +268,29 @@ static int mes_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
userq_props->use_doorbell = true;
userq_props->doorbell_index = queue->doorbell_index;
 
+   if (queue->queue_type == AMDGPU_HW_IP_COMPUTE) {
+   struct drm_amdgpu_userq_mqd_compute_gfx11 *compute_mqd;
+
+   if (mqd_user->mqd_size != sizeof(*compute_mqd)) {
+   DRM_ERROR("Invalid compute IP MQD size\n");
+   r = -EINVAL;
+   goto free_mqd;
+   }
+
+   compute_mqd = memdup_user(u64_to_user_ptr(mqd_user->mqd), mqd_user->mqd_size);
+   if (IS_ERR(compute_mqd)) {
+   DRM_ERROR("Failed to read user MQD\n");
+   r = -ENOMEM;
+   goto free_mqd;
+   }
+
+   userq_props->eop_gpu_addr = compute_mqd->eop_va;
+   userq_props->hqd_pipe_priority = AMDGPU_GFX_PIPE_PRIO_NORMAL;
+   userq_props->hqd_queue_priority = AMDGPU_GFX_QUEUE_PRIORITY_MINIMUM;
+   userq_props->hqd_active = false;
+   kfree(compute_mqd);
+   }
+
queue->userq_prop = userq_props;
 
r = mqd_hw_default->init_mqd(adev, (void *)queue->mqd.cpu_ptr, userq_props);
diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index 3ea067242b19..6eac46e0f3fd 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -438,6 +438,16 @@ struct drm_amdgpu_userq_mqd_sdma_gfx11 {
__u64   csa_va;
 };
 
+/* GFX V11 Compute IP specific MQD parameters */
+struct drm_amdgpu_userq_mqd_compute_gfx11 {
+   /**
+* @eop_va: Virtual address of the GPU memory to hold the EOP buffer.
+* This must be from a separate GPU object, and must be at least 1 page
+* sized.
+*/
+   __u64   eop_va;
+};
+
 /* vm ioctl */
 #define AMDGPU_VM_OP_RESERVE_VMID  1
 #define AMDGPU_VM_OP_UNRESERVE_VMID2
-- 
2.45.1



[PATCH v11 11/28] drm/amdgpu: enable GFX-V11 userqueue support

2024-09-09 Thread Shashank Sharma
This patch enables GFX-v11 IP support in the usermode queue base
code. Specifically, it:
- adds a GFX-v11 specific MQD structure
- sets IP functions to create and destroy MQDs
- handles MQD objects coming from userspace

V10: introduced this separate patch for GFX V11 enabling (Alex).
V11: Addressed review comments:
 - update the comments in GFX mqd structure informing user about using
   the INFO IOCTL for object sizes (Alex)
 - rename struct drm_amdgpu_userq_mqd_gfx_v11 to
   drm_amdgpu_userq_mqd_gfx11 (Marek)

Cc: Alex Deucher 
Cc: Christian Koenig 
Signed-off-by: Shashank Sharma 
Signed-off-by: Arvind Yadav 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c |  6 
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c|  3 ++
 .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  | 28 +++
 include/uapi/drm/amdgpu_drm.h | 19 +
 4 files changed, 56 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
index 64a063ec3b27..5cb984c509c2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -188,6 +188,12 @@ amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq *args)
uint64_t index;
int qid, r = 0;
 
+   /* Usermode queues are only supported for GFX IP as of now */
+   if (args->in.ip_type != AMDGPU_HW_IP_GFX) {
DRM_ERROR("Usermode queue doesn't support IP type %u\n", args->in.ip_type);
+   return -EINVAL;
+   }
+
if (args->in.flags) {
DRM_ERROR("Usermode queue flags not supported yet\n");
return -EINVAL;
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
index d3e8be82a172..e68874fd0ff9 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
@@ -49,6 +49,7 @@
 #include "gfx_v11_0_3.h"
 #include "nbio_v4_3.h"
 #include "mes_v11_0.h"
+#include "mes_v11_0_userqueue.h"
 
 #define GFX11_NUM_GFX_RINGS1
 #define GFX11_MEC_HPD_SIZE 2048
@@ -1552,6 +1553,7 @@ static int gfx_v11_0_sw_init(void *handle)
adev->gfx.mec.num_mec = 2;
adev->gfx.mec.num_pipe_per_mec = 4;
adev->gfx.mec.num_queue_per_pipe = 4;
+   adev->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_mes_v11_0_funcs;
break;
case IP_VERSION(11, 0, 1):
case IP_VERSION(11, 0, 4):
@@ -1564,6 +1566,7 @@ static int gfx_v11_0_sw_init(void *handle)
adev->gfx.mec.num_mec = 1;
adev->gfx.mec.num_pipe_per_mec = 4;
adev->gfx.mec.num_queue_per_pipe = 4;
+   adev->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_mes_v11_0_funcs;
break;
default:
adev->gfx.me.num_me = 1;
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
index bc9ce5233a7d..bcfa0d1ef7bf 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
@@ -180,6 +180,34 @@ static int mes_v11_0_userq_create_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
return r;
}
 
+   /* Shadow, GDS and CSA objects come directly from userspace */
+   if (mqd_user->ip_type == AMDGPU_HW_IP_GFX) {
+   struct v11_gfx_mqd *mqd = queue->mqd.cpu_ptr;
+   struct drm_amdgpu_userq_mqd_gfx11 *mqd_gfx_v11;
+
+   if (mqd_user->mqd_size != sizeof(*mqd_gfx_v11) || !mqd_user->mqd) {
+   DRM_ERROR("Invalid GFX MQD\n");
+   return -EINVAL;
+   }
+
+   mqd_gfx_v11 = memdup_user(u64_to_user_ptr(mqd_user->mqd), mqd_user->mqd_size);
+   if (IS_ERR(mqd_gfx_v11)) {
+   DRM_ERROR("Failed to read user MQD\n");
+   amdgpu_userqueue_destroy_object(uq_mgr, ctx);
+   return -ENOMEM;
+   }
+
+   mqd->shadow_base_lo = mqd_gfx_v11->shadow_va & 0xFFFFFFFC;
+   mqd->shadow_base_hi = upper_32_bits(mqd_gfx_v11->shadow_va);
+
+   mqd->gds_bkup_base_lo = mqd_gfx_v11->gds_va & 0xFFFFFFFC;
+   mqd->gds_bkup_base_hi = upper_32_bits(mqd_gfx_v11->gds_va);
+
+   mqd->fw_work_area_base_lo = mqd_gfx_v11->csa_va & 0xFFFFFFFC;
+   mqd->fw_work_area_base_hi = upper_32_bits(mqd_gfx_v11->csa_va);
+   kfree(mqd_gfx_v11);
+   }
+
return 0;
 }
 
diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index bd8d47a3..895d64982498 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/incl

[PATCH v11 14/28] drm/amdgpu: update userqueue BOs and PDs

2024-09-09 Thread Shashank Sharma
This patch updates the VM_IOCTL to allow userspace to synchronize
the mapping/unmapping of a BO in the page table.

The major changes are:
- it adds a drm_timeline object as an input parameter to the VM IOCTL.
- this object is used by the kernel to sync the update of the BO in
  the page table during the mapping of the object.
- the kernel also synchronizes the tlb flush of the page table entry of
  this object during the unmapping (Added in this series:
  https://patchwork.freedesktop.org/series/131276/ and
  https://patchwork.freedesktop.org/patch/584182/)
- the userspace can wait on this timeline, and then the BO is ready to
  be consumed by the GPU.

V2:
 - remove the eviction fence coupling

V3:
 - added the drm timeline support instead of input/output fence
   (Christian)

V4:
 - made timeline 64-bit (Christian)
 - bug fix (Arvind)

V5: GLCTS bug fix (Arvind)
V6: Rename syncobj_handle -> timeline_syncobj_out
Rename point -> timeline_point_in (Marek)

Cc: Alex Deucher 
Cc: Christian Koenig 
Cc: Felix Kuehling 
Signed-off-by: Arvind Yadav 
Signed-off-by: Shashank Sharma 
Change-Id: I0942942641e095408a95d4ab6e2e9d813f0f78db
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c   | 14 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 89 ++-
 .../gpu/drm/amd/include/amdgpu_userqueue.h|  3 +
 include/uapi/drm/amdgpu_drm.h |  4 +
 4 files changed, 107 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index ebb3f87ef4f6..f4529f2fad97 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -647,7 +647,7 @@ static void amdgpu_gem_va_update_vm(struct amdgpu_device *adev,
if (!amdgpu_vm_ready(vm))
return;
 
-   r = amdgpu_vm_clear_freed(adev, vm, NULL);
+   r = amdgpu_vm_clear_freed(adev, vm, &vm->last_update);
if (r)
goto error;
 
@@ -825,10 +825,20 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data,
default:
break;
}
-   if (!r && !(args->flags & AMDGPU_VM_DELAY_UPDATE) && !adev->debug_vm)
+   if (!r && !(args->flags & AMDGPU_VM_DELAY_UPDATE) && !adev->debug_vm) {
amdgpu_gem_va_update_vm(adev, &fpriv->vm, bo_va,
args->operation);
 
+   if (args->timeline_syncobj_out && args->timeline_point_in) {
+   r = amdgpu_userqueue_update_bo_mapping(filp, bo_va, args->operation,
+  args->timeline_syncobj_out,
+  args->timeline_point_in);
+   if (r) {
+   DRM_ERROR("Failed to update userqueue mapping (%u)\n", r);
+   }
+   }
+   }
+
 error:
drm_exec_fini(&exec);
drm_gem_object_put(gobj);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
index 5173718c3848..c9cc935caabd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -21,7 +21,7 @@
  * OTHER DEALINGS IN THE SOFTWARE.
  *
  */
-
+#include <drm/drm_syncobj.h>
 #include "amdgpu.h"
 #include "amdgpu_vm.h"
 #include "amdgpu_userqueue.h"
@@ -154,6 +154,87 @@ amdgpu_userqueue_get_doorbell_index(struct amdgpu_userq_mgr *uq_mgr,
return r;
 }
 
+static int
+amdgpu_userqueue_validate_vm_bo(void *_unused, struct amdgpu_bo *bo)
+{
+   struct ttm_operation_ctx ctx = { false, false };
+   int ret;
+
+   amdgpu_bo_placement_from_domain(bo, bo->allowed_domains);
+
+   ret = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
+   if (ret)
+   DRM_ERROR("Fail to validate\n");
+
+   return ret;
+}
+
+int amdgpu_userqueue_update_bo_mapping(struct drm_file *filp, struct amdgpu_bo_va *bo_va,
+  uint32_t operation, uint32_t syncobj_handle,
+  uint64_t point)
+{
+   struct amdgpu_bo *bo = bo_va ? bo_va->base.bo : NULL;
+   struct amdgpu_fpriv *fpriv = filp->driver_priv;
+   struct amdgpu_vm *vm = &fpriv->vm;
+   struct drm_syncobj *syncobj;
+   struct dma_fence_chain *chain;
+   struct dma_fence *last_update;
+
+   /*  Find the sync object */
+   syncobj = drm_syncobj_find(filp, syncobj_handle);
+   if (!syncobj)
+   return -ENOENT;
+
+   /* Allocate the chain node */
+   chain = dma_fence_chain_alloc();
+   if (!chain) {
+   drm_syncobj_put(syncobj);
+   return -ENOMEM;
+   }
+
+   /*  Determine the last update fence */
+   if ((bo

[PATCH v11 12/28] drm/amdgpu: enable SDMA usermode queues

2024-09-09 Thread Shashank Sharma
From: Arvind Yadav 

This patch does necessary modifications to enable the SDMA
usermode queues using the existing userqueue infrastructure.

V9:  introduced this patch in the series
V10: use header file instead of extern (Alex)
V11: rename drm_amdgpu_userq_mqd_sdma_gfx_v11 to
 drm_amdgpu_userq_mqd_sdma_gfx11 (Marek)

Cc: Christian König 
Cc: Alex Deucher 
Reviewed-by: Christian König 
Signed-off-by: Shashank Sharma 
Signed-off-by: Arvind Yadav 
Signed-off-by: Srinivasan Shanmugam 
Change-Id: I782acfc08fef0fa5302e665173788fc03dbc51e1
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c  |  2 +-
 .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c   | 18 ++
 drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c |  2 ++
 include/uapi/drm/amdgpu_drm.h  | 10 ++
 4 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
index 5cb984c509c2..2c5747cc492e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -189,7 +189,7 @@ amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq *args)
int qid, r = 0;
 
/* Usermode queues are only supported for GFX IP as of now */
-   if (args->in.ip_type != AMDGPU_HW_IP_GFX) {
+   if (args->in.ip_type != AMDGPU_HW_IP_GFX && args->in.ip_type != AMDGPU_HW_IP_DMA) {
DRM_ERROR("Usermode queue doesn't support IP type %u\n", args->in.ip_type);
return -EINVAL;
}
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
index bcfa0d1ef7bf..dc5359742774 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
@@ -206,6 +206,24 @@ static int mes_v11_0_userq_create_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
mqd->fw_work_area_base_lo = mqd_gfx_v11->csa_va & 0xFFFFFFFC;
mqd->fw_work_area_base_hi = upper_32_bits(mqd_gfx_v11->csa_va);
kfree(mqd_gfx_v11);
+   } else if (mqd_user->ip_type == AMDGPU_HW_IP_DMA) {
+   struct v11_sdma_mqd *mqd = queue->mqd.cpu_ptr;
+   struct drm_amdgpu_userq_mqd_sdma_gfx11 *mqd_sdma_v11;
+
+   if (mqd_user->mqd_size != sizeof(*mqd_sdma_v11) || !mqd_user->mqd) {
+   DRM_ERROR("Invalid SDMA MQD\n");
+   return -EINVAL;
+   }
+
+   mqd_sdma_v11 = memdup_user(u64_to_user_ptr(mqd_user->mqd), mqd_user->mqd_size);
+   if (IS_ERR(mqd_sdma_v11)) {
+   DRM_ERROR("Failed to read sdma user MQD\n");
+   amdgpu_userqueue_destroy_object(uq_mgr, ctx);
+   return -ENOMEM;
+   }
+
+   mqd->sdmax_rlcx_csa_addr_lo = mqd_sdma_v11->csa_va & 0xFFFFFFFC;
+   mqd->sdmax_rlcx_csa_addr_hi = upper_32_bits(mqd_sdma_v11->csa_va);
}
 
return 0;
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
index 208a1fa9d4e7..62f6f015c685 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
@@ -43,6 +43,7 @@
 #include "sdma_common.h"
 #include "sdma_v6_0.h"
 #include "v11_structs.h"
+#include "mes_v11_0_userqueue.h"
 
 MODULE_FIRMWARE("amdgpu/sdma_6_0_0.bin");
 MODULE_FIRMWARE("amdgpu/sdma_6_0_1.bin");
@@ -1340,6 +1341,7 @@ static int sdma_v6_0_sw_init(void *handle)
else
DRM_ERROR("Failed to allocated memory for SDMA IP Dump\n");
 
+   adev->userq_funcs[AMDGPU_HW_IP_DMA] = &userq_mes_v11_0_funcs;
return r;
 }
 
diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index 895d64982498..3ea067242b19 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -428,6 +428,16 @@ struct drm_amdgpu_userq_mqd_gfx11 {
__u64   csa_va;
 };
 
+/* GFX V11 SDMA IP specific MQD parameters */
+struct drm_amdgpu_userq_mqd_sdma_gfx11 {
+   /**
+* @csa_va: Virtual address of the GPU memory to hold the CSA buffer.
+* This must be from a separate GPU object, and use AMDGPU_INFO IOCTL
+* to get the size.
+*/
+   __u64   csa_va;
+};
+
 /* vm ioctl */
 #define AMDGPU_VM_OP_RESERVE_VMID  1
 #define AMDGPU_VM_OP_UNRESERVE_VMID2
-- 
2.45.1



[PATCH v11 15/28] drm/amdgpu: add kernel config for gfx-userqueue

2024-09-09 Thread Shashank Sharma
This patch:
- adds a kernel config option "CONFIG_DRM_AMDGPU_NAVI3X_USERQ"
- moves the userqueue initialization code for all IPs under
  this flag

so that the userqueue works only when the config is enabled.

V9:  Introduce this patch
V10: Call it CONFIG_DRM_AMDGPU_NAVI3X_USERQ instead of
 CONFIG_DRM_AMDGPU_USERQ_GFX (Christian)
V11: Add GFX in the config help description message.

Cc: Alex Deucher 
Cc: Christian Koenig 
Reviewed-by: Christian König 
Signed-off-by: Shashank Sharma 
Signed-off-by: Arvind Yadav 
Change-Id: I509a1fc9eb9ae1a1e042ae4456737333a606
---
 drivers/gpu/drm/amd/amdgpu/Kconfig | 8 
 drivers/gpu/drm/amd/amdgpu/Makefile| 4 +++-
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 4 
 drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c | 3 +++
 4 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/Kconfig 
b/drivers/gpu/drm/amd/amdgpu/Kconfig
index 0051fb1b437f..b7f41177b3b9 100644
--- a/drivers/gpu/drm/amd/amdgpu/Kconfig
+++ b/drivers/gpu/drm/amd/amdgpu/Kconfig
@@ -92,6 +92,14 @@ config DRM_AMDGPU_WERROR
  Add -Werror to the build flags for amdgpu.ko.
  Only enable this if you are warning code for amdgpu.ko.
 
+config DRM_AMDGPU_NAVI3X_USERQ
+   bool "Enable Navi 3x gfx usermode queues"
+   depends on DRM_AMDGPU
+   default n
+   help
+ Choose this option to enable GFX usermode queue support for
+ GFX/SDMA/Compute workload submission. This feature is supported
+ on Navi 3X only.
+
 source "drivers/gpu/drm/amd/acp/Kconfig"
 source "drivers/gpu/drm/amd/display/Kconfig"
 source "drivers/gpu/drm/amd/amdkfd/Kconfig"
diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
b/drivers/gpu/drm/amd/amdgpu/Makefile
index d9bf70251eba..beb8442b4e3a 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -174,7 +174,9 @@ amdgpu-y += \
amdgpu_mes.o \
mes_v11_0.o \
mes_v12_0.o \
-   mes_v11_0_userqueue.o
+
+# add GFX userqueue support
+amdgpu-$(CONFIG_DRM_AMDGPU_NAVI3X_USERQ) += mes_v11_0_userqueue.o
 
 # add UVD block
 amdgpu-y += \
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
index 82a8df56240e..f3d034f2d4fb 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
@@ -1553,8 +1553,10 @@ static int gfx_v11_0_sw_init(void *handle)
adev->gfx.mec.num_mec = 2;
adev->gfx.mec.num_pipe_per_mec = 4;
adev->gfx.mec.num_queue_per_pipe = 4;
+#ifdef CONFIG_DRM_AMDGPU_NAVI3X_USERQ
adev->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_mes_v11_0_funcs;
adev->userq_funcs[AMDGPU_HW_IP_COMPUTE] = &userq_mes_v11_0_funcs;
+#endif
break;
case IP_VERSION(11, 0, 1):
case IP_VERSION(11, 0, 4):
@@ -1567,8 +1569,10 @@ static int gfx_v11_0_sw_init(void *handle)
adev->gfx.mec.num_mec = 1;
adev->gfx.mec.num_pipe_per_mec = 4;
adev->gfx.mec.num_queue_per_pipe = 4;
+#ifdef CONFIG_DRM_AMDGPU_NAVI3X_USERQ
adev->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_mes_v11_0_funcs;
adev->userq_funcs[AMDGPU_HW_IP_COMPUTE] = &userq_mes_v11_0_funcs;
+#endif
break;
default:
adev->gfx.me.num_me = 1;
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c 
b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
index 62f6f015c685..bb11917ad855 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
@@ -1341,7 +1341,10 @@ static int sdma_v6_0_sw_init(void *handle)
else
DRM_ERROR("Failed to allocated memory for SDMA IP Dump\n");
 
+#ifdef CONFIG_DRM_AMDGPU_NAVI3X_USERQ
adev->userq_funcs[AMDGPU_HW_IP_DMA] = &userq_mes_v11_0_funcs;
+#endif
+
return r;
 }
 
-- 
2.45.1



[PATCH v11 21/28] drm/amdgpu: add gfx eviction fence helpers

2024-09-09 Thread Shashank Sharma
This patch adds a basic eviction fence framework for the gfx buffers.
The idea is:
- One eviction fence is created per gfx process, at kms_open.
- This fence is attached to all the gem buffers created
  by this process.
- This fence is detached from all the gem buffers at postclose_kms.

This framework will be further used for usermode queues.

V2: Addressed review comments from Christian
- keep fence_ctx and fence_seq directly in fpriv
- eviction_fence should be dynamically allocated
- do not save eviction fence instance in BO, there could be many
  such fences attached to one BO
- use dma_resv_replace_fence() in detach

V3: Addressed review comments from Christian
- eviction fence create and destroy functions should be called only once
  from fpriv create/destroy
- use dma_fence_put() in eviction_fence_destroy

V4: Addressed review comments from Christian:
- create a separate ev_fence_mgr structure
- cleanup fence init part
- do not add a domain for fence owner KGD

Cc: Christian Koenig 
Cc: Alex Deucher 
Signed-off-by: Shashank Sharma 
Signed-off-by: Arvind Yadav 
Change-Id: I7a8d27d7172bafbfe34aa9decf2cd36655948275
---
 drivers/gpu/drm/amd/amdgpu/Makefile   |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu.h   |   6 +-
 .../drm/amd/amdgpu/amdgpu_eviction_fence.c| 148 ++
 .../drm/amd/amdgpu/amdgpu_eviction_fence.h|  65 
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c   |   9 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c   |   3 +
 6 files changed, 231 insertions(+), 2 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
b/drivers/gpu/drm/amd/amdgpu/Makefile
index ff5621697c68..0643078d1225 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -66,7 +66,7 @@ amdgpu-y += amdgpu_device.o amdgpu_doorbell_mgr.o 
amdgpu_kms.o \
amdgpu_fw_attestation.o amdgpu_securedisplay.o \
amdgpu_eeprom.o amdgpu_mca.o amdgpu_psp_ta.o amdgpu_lsdma.o \
amdgpu_ring_mux.o amdgpu_xcp.o amdgpu_seq64.o amdgpu_aca.o 
amdgpu_dev_coredump.o \
-   amdgpu_userq_fence.o
+   amdgpu_userq_fence.o amdgpu_eviction_fence.o
 
 amdgpu-$(CONFIG_PROC_FS) += amdgpu_fdinfo.o
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 76ada47b1875..0013bfc74024 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -113,6 +113,7 @@
 #include "amdgpu_seq64.h"
 #include "amdgpu_reg_state.h"
 #include "amdgpu_userqueue.h"
+#include "amdgpu_eviction_fence.h"
 #if defined(CONFIG_DRM_AMD_ISP)
 #include "amdgpu_isp.h"
 #endif
@@ -481,7 +482,6 @@ struct amdgpu_flip_work {
bool				async;
 };
 
-
 /*
  * file private structure
  */
@@ -495,6 +495,10 @@ struct amdgpu_fpriv {
struct idr  bo_list_handles;
struct amdgpu_ctx_mgr   ctx_mgr;
struct amdgpu_userq_mgr userq_mgr;
+
+   /* Eviction fence infra */
+   struct amdgpu_eviction_fence_mgr evf_mgr;
+
/** GPU partition selection */
uint32_t		xcp_id;
 };
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
new file mode 100644
index ..2d474cb11cf9
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
@@ -0,0 +1,148 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright 2024 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+#include 
+#include "amdgpu.h"
+
+static const char *
+amdgpu_eviction_fence_get_driver_name(struct dma_fence *fence)
+{
+   return "amdgpu";
+}
+
+static co

[PATCH v11 23/28] drm/amdgpu: suspend gfx userqueues

2024-09-09 Thread Shashank Sharma
This patch adds suspend support for gfx userqueues. It typically does
the following:
- adds an enable_signaling function for the eviction fence, so that it
  can trigger the userqueue suspend,
- adds a delayed function for suspending the userqueues, to suspend all
  the queues under this userq manager and signals the eviction fence,
- adds reference of userq manager in the eviction fence container so
  that it can be used in the suspend function.

V2: Addressed Christian's review comments:
- schedule suspend work immediately

V4: Addressed Christian's review comments:
- wait for pending uq fences before starting suspend, added
  queue->last_fence for the same
- accommodate ev_fence_mgr into existing code
- some bug fixes and NULL checks

V5: Addressed Christian's review comments (gitlab)
- Wait for eviction fence to get signaled in destroy, don't signal it
- Wait for eviction fence to get signaled in replace fence, don't signal it

Cc: Alex Deucher 
Cc: Christian Koenig 
Signed-off-by: Shashank Sharma 
Signed-off-by: Arvind Yadav 
Change-Id: Ib60a7feda5544e3badc87bd1a991931ee726ee82
---
 .../drm/amd/amdgpu/amdgpu_eviction_fence.c| 149 ++
 .../drm/amd/amdgpu/amdgpu_eviction_fence.h|   2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c   |   2 +
 .../gpu/drm/amd/amdgpu/amdgpu_userq_fence.c   |  10 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 100 
 .../gpu/drm/amd/include/amdgpu_userqueue.h|  10 ++
 6 files changed, 272 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
index 2d474cb11cf9..3d4fc704adb1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
@@ -22,8 +22,12 @@
  *
  */
 #include 
+#include 
 #include "amdgpu.h"
 
+#define work_to_evf_mgr(w, name) container_of(w, struct amdgpu_eviction_fence_mgr, name)
+#define evf_mgr_to_fpriv(e) container_of(e, struct amdgpu_fpriv, evf_mgr)
+
 static const char *
 amdgpu_eviction_fence_get_driver_name(struct dma_fence *fence)
 {
@@ -39,10 +43,150 @@ amdgpu_eviction_fence_get_timeline_name(struct dma_fence 
*f)
return ef->timeline_name;
 }
 
+static void
+amdgpu_eviction_fence_update_fence(struct amdgpu_eviction_fence_mgr *evf_mgr,
+  struct amdgpu_eviction_fence *new_ef)
+{
+   struct dma_fence *old_ef = &evf_mgr->ev_fence->base;
+
+   spin_lock(&evf_mgr->ev_fence_lock);
+   dma_fence_put(old_ef);
+   evf_mgr->ev_fence = new_ef;
+   spin_unlock(&evf_mgr->ev_fence_lock);
+}
+
+int
+amdgpu_eviction_fence_replace_fence(struct amdgpu_fpriv *fpriv)
+{
+   struct amdgpu_eviction_fence_mgr *evf_mgr = &fpriv->evf_mgr;
+   struct amdgpu_vm *vm = &fpriv->vm;
+   struct amdgpu_eviction_fence *old_ef, *new_ef;
+   struct amdgpu_bo_va *bo_va, *tmp;
+   int ret;
+
+   old_ef = evf_mgr->ev_fence;
+   if (old_ef && !dma_fence_is_signaled(&old_ef->base)) {
+   DRM_DEBUG_DRIVER("Old EF not signaled yet\n");
+   dma_fence_wait(&old_ef->base, true);
+   }
+
+   new_ef = amdgpu_eviction_fence_create(evf_mgr);
+   if (!new_ef) {
+   DRM_ERROR("Failed to create new eviction fence\n");
+   return -ENOMEM;
+   }
+
+   /* Replace fences and free old one */
+   amdgpu_eviction_fence_update_fence(evf_mgr, new_ef);
+
+   /* Attach new eviction fence to BOs */
+   list_for_each_entry_safe(bo_va, tmp, &vm->done, base.vm_status) {
+   struct amdgpu_bo *bo = bo_va->base.bo;
+
+   if (!bo)
+   continue;
+
+   /* Skip pinned BOs */
+   if (bo->tbo.pin_count)
+   continue;
+
+   ret = amdgpu_eviction_fence_attach(evf_mgr, bo);
+   if (ret) {
+   DRM_ERROR("Failed to attach new eviction fence\n");
+   goto free_err;
+   }
+   }
+
+   return 0;
+
+free_err:
+   kfree(new_ef);
+   return ret;
+}
+
+static void
+amdgpu_eviction_fence_suspend_worker(struct work_struct *work)
+{
+   struct amdgpu_eviction_fence_mgr *evf_mgr = work_to_evf_mgr(work, suspend_work.work);
+   struct amdgpu_fpriv *fpriv = evf_mgr_to_fpriv(evf_mgr);
+   struct amdgpu_vm *vm = &fpriv->vm;
+   struct amdgpu_bo_va *bo_va, *tmp;
+   struct drm_exec exec;
+   struct amdgpu_bo *bo;
+   int ret;
+
+   /* Signal old eviction fence */
+   ret = amdgpu_eviction_fence_signal(evf_mgr);
+   if (ret) {
+   DRM_ERROR("Failed to signal eviction fence err=%d\n", ret);
+   return;
+   }
+
+   /* Cleanup old eviction fence entry *

[PATCH v11 25/28] drm/amdgpu: Add input fence to sync bo unmap

2024-09-09 Thread Shashank Sharma
From: Arvind Yadav 

This patch adds input fences to VM_IOCTL for unmapping an object.
The kernel will unmap the BO only when the fence is signaled.

V2: Bug fix (Arvind)
V3: Bug fix (Arvind)
V4: Rename UAPI objects as per UAPI review (Marek)

Cc: Alex Deucher 
Cc: Christian Koenig 
Signed-off-by: Arvind Yadav 
Signed-off-by: Shashank Sharma 
Change-Id: Ib1572da97b640d80e39d73c9c166fa1759d720b5
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 41 +
 include/uapi/drm/amdgpu_drm.h   |  4 +++
 2 files changed, 45 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index c9b4a6ce3f14..7823faa3dbaa 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -36,6 +36,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "amdgpu.h"
 #include "amdgpu_display.h"
@@ -45,6 +46,39 @@
 
 static const struct drm_gem_object_funcs amdgpu_gem_object_funcs;
 
+static void amdgpu_userqueue_add_input_fence(struct drm_file *filp,
+uint64_t syncobj_handles_array,
+uint32_t num_syncobj_handles)
+{
+   struct dma_fence *fence;
+   uint32_t *syncobj_handles;
+   int ret, i;
+
+   if (!num_syncobj_handles)
+   return;
+
+   syncobj_handles = memdup_user(u64_to_user_ptr(syncobj_handles_array),
+ sizeof(uint32_t) * num_syncobj_handles);
+   if (IS_ERR(syncobj_handles)) {
+   DRM_ERROR("Failed to get the syncobj handles err = %ld\n",
+ PTR_ERR(syncobj_handles));
+   return;
+   }
+
+   for (i = 0; i < num_syncobj_handles; i++) {
+
+   if (!syncobj_handles[i])
+   continue;
+
+   ret = drm_syncobj_find_fence(filp, syncobj_handles[i], 0, 0, &fence);
+   if (ret)
+   continue;
+
+   dma_fence_wait(fence, false);
+   dma_fence_put(fence);
+   }
+}
+
 static vm_fault_t amdgpu_gem_fault(struct vm_fault *vmf)
 {
struct ttm_buffer_object *bo = vmf->vma->vm_private_data;
@@ -809,6 +843,13 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data,
bo_va = NULL;
}
 
+   if (args->operation == AMDGPU_VA_OP_UNMAP ||
+   args->operation == AMDGPU_VA_OP_CLEAR ||
+   args->operation == AMDGPU_VA_OP_REPLACE)
+   amdgpu_userqueue_add_input_fence(filp,
+   args->input_fence_syncobj_array_in,
+   args->num_syncobj_handles_in);
+
switch (args->operation) {
case AMDGPU_VA_OP_MAP:
va_flags = amdgpu_gem_va_map_flags(adev, args->flags);
diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index 1dc1dba6b024..8dd0d1808e37 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -840,6 +840,10 @@ struct drm_amdgpu_gem_va {
__u32 timeline_syncobj_out;
/** Timeline point */
__u64 timeline_point_in;
+   /** Array of sync object handle to wait for given input fences */
+   __u64 input_fence_syncobj_array_in;
+   /** the number of syncobj handles in @input_fence_syncobj_array_in */
+   __u32 num_syncobj_handles_in;
 };
 
 #define AMDGPU_HW_IP_GFX  0
-- 
2.45.1



[PATCH v11 26/28] drm/amdgpu: fix MES GFX mask

2024-09-09 Thread Shashank Sharma
From: Arvind Yadav 

The current MES GFX mask prevents the FW from enabling oversubscription. This
patch does the following:
- Fixes the mask values and adds a description for the same
- Removes the central mask setup and makes it IP specific, as it would
  be different when the number of pipes and queues are different.

Cc: Christian König 
Cc: Alex Deucher 
Signed-off-by: Shashank Sharma 
Signed-off-by: Arvind Yadav 
Change-Id: I86f5b89c5527c23df94edc707c69c78819f4c8cf
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 3 ---
 drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h | 2 +-
 drivers/gpu/drm/amd/amdgpu/mes_v11_0.c  | 9 +++--
 3 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
index f7d5d4f08a53..dbf19122dfc3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
@@ -151,9 +151,6 @@ int amdgpu_mes_init(struct amdgpu_device *adev)
adev->mes.compute_hqd_mask[i] = 0xc;
}
 
-   for (i = 0; i < AMDGPU_MES_MAX_GFX_PIPES; i++)
-   adev->mes.gfx_hqd_mask[i] = i ? 0 : 0xfffe;
-
for (i = 0; i < AMDGPU_MES_MAX_SDMA_PIPES; i++) {
if (amdgpu_ip_version(adev, SDMA0_HWIP, 0) <
IP_VERSION(6, 0, 0))
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
index 96788c0f42f1..45e3508f0f8e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
@@ -109,8 +109,8 @@ struct amdgpu_mes {
 
	uint32_t			vmid_mask_gfxhub;
	uint32_t			vmid_mask_mmhub;
-	uint32_t			compute_hqd_mask[AMDGPU_MES_MAX_COMPUTE_PIPES];
	uint32_t			gfx_hqd_mask[AMDGPU_MES_MAX_GFX_PIPES];
+	uint32_t			compute_hqd_mask[AMDGPU_MES_MAX_COMPUTE_PIPES];
	uint32_t			sdma_hqd_mask[AMDGPU_MES_MAX_SDMA_PIPES];
	uint32_t			aggregated_doorbells[AMDGPU_MES_PRIORITY_NUM_LEVELS];
	uint32_t			sch_ctx_offs[AMDGPU_MAX_MES_PIPES];
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
index 2911c45cfbe0..d2610a664b2a 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
@@ -653,8 +653,13 @@ static int mes_v11_0_set_hw_resources(struct amdgpu_mes 
*mes)
mes_set_hw_res_pkt.compute_hqd_mask[i] =
mes->compute_hqd_mask[i];
 
-   for (i = 0; i < MAX_GFX_PIPES; i++)
-   mes_set_hw_res_pkt.gfx_hqd_mask[i] = mes->gfx_hqd_mask[i];
+   /*
+* GFX pipe 0 queue 0 is being used by kernel
+* Set GFX pipe 0 queue 1 for MES scheduling
+* GFX pipe 1 can't be used for MES due to HW limitation.
+*/
+   mes_set_hw_res_pkt.gfx_hqd_mask[0] = 0x2;
+   mes_set_hw_res_pkt.gfx_hqd_mask[1] = 0;
 
for (i = 0; i < MAX_SDMA_PIPES; i++)
mes_set_hw_res_pkt.sdma_hqd_mask[i] = mes->sdma_hqd_mask[i];
-- 
2.45.1



[PATCH v11 27/28] Revert "drm/amdgpu/gfx11: only enable CP GFX shadowing on SR-IOV"

2024-09-09 Thread Shashank Sharma
From: Shashank Sharma 

This reverts commit 81af32520e7aaa337fe132f16c12ce54170187ea.

This commit prevents a usermode queue client from getting the
shadow-related information.

Signed-off-by: Shashank Sharma 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
index dbf3bcadee32..1f0f7ec0facc 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
@@ -661,12 +661,8 @@ static void gfx_v11_0_check_fw_cp_gfx_shadow(struct 
amdgpu_device *adev)
case IP_VERSION(11, 0, 3):
if ((adev->gfx.me_fw_version >= 1505) &&
(adev->gfx.pfp_fw_version >= 1600) &&
-   (adev->gfx.mec_fw_version >= 512)) {
-   if (amdgpu_sriov_vf(adev))
-   adev->gfx.cp_gfx_shadow = true;
-   else
-   adev->gfx.cp_gfx_shadow = false;
-   }
+   (adev->gfx.mec_fw_version >= 512))
+   adev->gfx.cp_gfx_shadow = true;
break;
default:
adev->gfx.cp_gfx_shadow = false;
-- 
2.45.1



[PATCH v11 24/28] drm/amdgpu: resume gfx userqueues

2024-09-09 Thread Shashank Sharma
This patch adds support for userqueue resume. What it typically does is
this:
- adds a new delayed work for resuming all the queues.
- schedules this delayed work from the suspend work.
- validates the BOs and replaces the eviction fence before resuming all
  the queues running under this instance of userq manager.

V2: Addressed Christian's review comments:
- declare local variables like ret at the bottom.
- lock all the object first, then start attaching the new fence.
- don't replace old eviction fence, just attach new eviction fence.
- no error logs for drm_exec_lock failures
- no need to reserve bos after drm_exec_locked
- schedule the resume worker immediately (not after 100 ms)
- check for NULL BO (Arvind)

Cc: Alex Deucher 
Cc: Christian Koenig 
Signed-off-by: Shashank Sharma 
Signed-off-by: Arvind Yadav 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 120 ++
 .../gpu/drm/amd/include/amdgpu_userqueue.h|   1 +
 2 files changed, 121 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
index 979174f80993..e7f7354e0c0e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -405,6 +405,122 @@ int amdgpu_userq_ioctl(struct drm_device *dev, void *data,
return r;
 }
 
+static int
+amdgpu_userqueue_resume_all(struct amdgpu_userq_mgr *uq_mgr)
+{
+   struct amdgpu_device *adev = uq_mgr->adev;
+   const struct amdgpu_userq_funcs *userq_funcs;
+   struct amdgpu_usermode_queue *queue;
+   int queue_id, ret = 0;
+
+   userq_funcs = adev->userq_funcs[AMDGPU_HW_IP_GFX];
+
+   /* Resume all the queues for this process */
+   idr_for_each_entry(&uq_mgr->userq_idr, queue, queue_id) {
+   ret = userq_funcs->resume(uq_mgr, queue);
+   if (ret)
+   DRM_ERROR("Failed to resume queue %d\n", queue_id);
+   }
+
+   return ret;
+}
+
+static int
+amdgpu_userqueue_validate_bos(struct amdgpu_userq_mgr *uq_mgr)
+{
+   struct amdgpu_fpriv *fpriv = uq_mgr_to_fpriv(uq_mgr);
+   struct amdgpu_vm *vm = &fpriv->vm;
+   struct amdgpu_bo_va *bo_va, *tmp;
+   struct drm_exec exec;
+   struct amdgpu_bo *bo;
+   int ret;
+
+   drm_exec_init(&exec, DRM_EXEC_IGNORE_DUPLICATES, 0);
+   drm_exec_until_all_locked(&exec) {
+   ret = amdgpu_vm_lock_pd(vm, &exec, 2);
+   drm_exec_retry_on_contention(&exec);
+   if (unlikely(ret)) {
+   DRM_ERROR("Failed to lock PD\n");
+   goto unlock_all;
+   }
+
+   /* Lock the done list */
+   list_for_each_entry_safe(bo_va, tmp, &vm->done, base.vm_status) {
+   bo = bo_va->base.bo;
+   if (!bo)
+   continue;
+
+   ret = drm_exec_lock_obj(&exec, &bo->tbo.base);
+   drm_exec_retry_on_contention(&exec);
+   if (unlikely(ret))
+   goto unlock_all;
+   }
+
+   /* Lock the invalidated list */
+   list_for_each_entry_safe(bo_va, tmp, &vm->invalidated, base.vm_status) {
+   bo = bo_va->base.bo;
+   if (!bo)
+   continue;
+
+   ret = drm_exec_lock_obj(&exec, &bo->tbo.base);
+   drm_exec_retry_on_contention(&exec);
+   if (unlikely(ret))
+   goto unlock_all;
+   }
+   }
+
+   /* Now validate BOs */
+   list_for_each_entry_safe(bo_va, tmp, &vm->invalidated, base.vm_status) {
+   bo = bo_va->base.bo;
+   if (!bo)
+   continue;
+
+   ret = amdgpu_userqueue_validate_vm_bo(NULL, bo);
+   if (ret) {
+   DRM_ERROR("Failed to validate BO\n");
+   goto unlock_all;
+   }
+   }
+
+   /* Handle the moved BOs */
+   ret = amdgpu_vm_handle_moved(uq_mgr->adev, vm, &exec.ticket);
+   if (ret) {
+   DRM_ERROR("Failed to handle moved BOs\n");
+   goto unlock_all;
+   }
+
+   ret = amdgpu_eviction_fence_replace_fence(fpriv);
+   if (ret)
+   DRM_ERROR("Failed to replace eviction fence\n");
+
+unlock_all:
+   drm_exec_fini(&exec);
+   return ret;
+}
+
+static void amdgpu_userqueue_resume_worker(struct work_struct *work)
+{
+   struct amdgpu_userq_mgr *uq_mgr = work_to_uq_mgr(work, resume_work.work);
+   int ret;
+
+   mutex_lock(&uq_mgr->userq_mutex);
+
+   ret = amdgpu_userqueue_validate_bos(uq_m

[PATCH v11 28/28] Revert "drm/amdgpu: don't allow userspace to create a doorbell BO"

2024-09-09 Thread Shashank Sharma
From: Arvind Yadav 

This reverts commit 6be2ad4f0073c541146caa66c5ae936c955a8224.
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index 7823faa3dbaa..2e3c974a3340 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -365,10 +365,6 @@ int amdgpu_gem_create_ioctl(struct drm_device *dev, void 
*data,
uint32_t handle, initial_domain;
int r;
 
-   /* reject DOORBELLs until userspace code to use it is available */
-   if (args->in.domains & AMDGPU_GEM_DOMAIN_DOORBELL)
-   return -EINVAL;
-
/* reject invalid gem flags */
if (flags & ~(AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED |
  AMDGPU_GEM_CREATE_NO_CPU_ACCESS |
-- 
2.45.1



[PATCH v11 22/28] drm/amdgpu: add userqueue suspend/resume functions

2024-09-09 Thread Shashank Sharma
This patch adds userqueue suspend/resume functions at
core MES V11 IP level.

V2: use true/false for queue_active status (Christian)
added Christian's R-B

V3: reset/set queue status in mqd.create and mqd.destroy

Cc: Alex Deucher 
Cc: Christian Koenig 
Reviewed-by: Christian Koenig 
Signed-off-by: Shashank Sharma 
Signed-off-by: Arvind Yadav 
---
 .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  | 33 +++
 .../gpu/drm/amd/include/amdgpu_userqueue.h|  5 +++
 2 files changed, 38 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
index b3aa49ff1a87..51c9a215ae77 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
@@ -331,6 +331,7 @@ static int mes_v11_0_userq_mqd_create(struct 
amdgpu_userq_mgr *uq_mgr,
goto free_ctx;
}
 
+   queue->queue_active = true;
return 0;
 
 free_ctx:
@@ -354,9 +355,41 @@ mes_v11_0_userq_mqd_destroy(struct amdgpu_userq_mgr 
*uq_mgr,
amdgpu_userqueue_destroy_object(uq_mgr, &queue->fw_obj);
kfree(queue->userq_prop);
amdgpu_userqueue_destroy_object(uq_mgr, &queue->mqd);
+   queue->queue_active = false;
+}
+
+static int mes_v11_0_userq_suspend(struct amdgpu_userq_mgr *uq_mgr,
+  struct amdgpu_usermode_queue *queue)
+{
+   if (queue->queue_active) {
+   mes_v11_0_userq_unmap(uq_mgr, queue);
+   queue->queue_active = false;
+   }
+
+   return 0;
+}
+
+static int mes_v11_0_userq_resume(struct amdgpu_userq_mgr *uq_mgr,
+ struct amdgpu_usermode_queue *queue)
+{
+   int ret;
+
+   if (queue->queue_active)
+   return 0;
+
+   ret = mes_v11_0_userq_map(uq_mgr, queue, queue->userq_prop);
+   if (ret) {
+   DRM_ERROR("Failed to resume queue\n");
+   return ret;
+   }
+
+   queue->queue_active = true;
+   return 0;
 }
 
 const struct amdgpu_userq_funcs userq_mes_v11_0_funcs = {
.mqd_create = mes_v11_0_userq_mqd_create,
.mqd_destroy = mes_v11_0_userq_mqd_destroy,
+   .suspend = mes_v11_0_userq_suspend,
+   .resume = mes_v11_0_userq_resume,
 };
diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h 
b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
index 77a33f9e37f8..37be29048f42 100644
--- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
+++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
@@ -37,6 +37,7 @@ struct amdgpu_userq_obj {
 
 struct amdgpu_usermode_queue {
int queue_type;
+   uint8_t queue_active;
	uint64_t		doorbell_handle;
	uint64_t		doorbell_index;
	uint64_t		flags;
@@ -57,6 +58,10 @@ struct amdgpu_userq_funcs {
  struct amdgpu_usermode_queue *queue);
void (*mqd_destroy)(struct amdgpu_userq_mgr *uq_mgr,
struct amdgpu_usermode_queue *uq);
+   int (*suspend)(struct amdgpu_userq_mgr *uq_mgr,
+  struct amdgpu_usermode_queue *queue);
+   int (*resume)(struct amdgpu_userq_mgr *uq_mgr,
+ struct amdgpu_usermode_queue *queue);
 };
 
 /* Usermode queues for gfx */
-- 
2.45.1



[PATCH v2 1/3] drm/amdgpu: replace TLB seq callback with HW seq

2024-01-31 Thread Shashank Sharma
From: Christian König 

The callbacks we installed for the SDMA updates were actually pretty
horrible. Since we now have seq64, use that one and HW seq writes
instead.

V2:(Shashank)
 - rebased on amd-drm-staging-next
 - changed amdgpu_seq64_gpu_addr

Cc: Christian König 
Cc: Alex Deucher 
Cc: Felix Kuehling 
Cc: Rajneesh Bhardwaj 
Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c   | 14 
 drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.h   |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c  | 79 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h  | 27 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c  |  3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c   |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c |  5 ++
 7 files changed, 42 insertions(+), 89 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
index 3d0d56087d41..300dc79fa2b9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
@@ -199,6 +199,20 @@ void amdgpu_seq64_free(struct amdgpu_device *adev, u64 va)
__clear_bit(bit_pos, adev->seq64.used);
 }
 
+/**
+ * amdgpu_seq64_gpu_addr - Calculate GPU addr from va
+ *
+ * @adev: amdgpu_device pointer
+ * @va: virtual address in client address space
+ *
+ * Calculate the GART address for a VA.
+ */
+u64 amdgpu_seq64_gpu_addr(struct amdgpu_device *adev, u64 va)
+{
+   return va - amdgpu_seq64_get_va_base(adev) +
+   amdgpu_bo_gpu_offset(adev->seq64.sbo);
+}
+
 /**
  * amdgpu_seq64_fini - Cleanup seq64 driver
  *
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.h
index 4203b2ab318d..63e8ac0a2057 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.h
@@ -43,6 +43,7 @@ void amdgpu_seq64_free(struct amdgpu_device *adev, u64 
gpu_addr);
 int amdgpu_seq64_map(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 struct amdgpu_bo_va **bo_va);
 void amdgpu_seq64_unmap(struct amdgpu_device *adev, struct amdgpu_fpriv 
*fpriv);
+u64 amdgpu_seq64_gpu_addr(struct amdgpu_device *adev, u64 va);
 
 #endif
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index ed4a8c5d26d7..0960e0a665d3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -111,21 +111,6 @@ struct amdgpu_prt_cb {
struct dma_fence_cb cb;
 };
 
-/**
- * struct amdgpu_vm_tlb_seq_struct - Helper to increment the TLB flush sequence
- */
-struct amdgpu_vm_tlb_seq_struct {
-   /**
-* @vm: pointer to the amdgpu_vm structure to set the fence sequence on
-*/
-   struct amdgpu_vm *vm;
-
-   /**
-* @cb: callback
-*/
-   struct dma_fence_cb cb;
-};
-
 /**
  * amdgpu_vm_set_pasid - manage pasid and vm ptr mapping
  *
@@ -862,23 +847,6 @@ int amdgpu_vm_update_pdes(struct amdgpu_device *adev,
return r;
 }
 
-/**
- * amdgpu_vm_tlb_seq_cb - make sure to increment tlb sequence
- * @fence: unused
- * @cb: the callback structure
- *
- * Increments the tlb sequence to make sure that future CS execute a VM flush.
- */
-static void amdgpu_vm_tlb_seq_cb(struct dma_fence *fence,
-struct dma_fence_cb *cb)
-{
-   struct amdgpu_vm_tlb_seq_struct *tlb_cb;
-
-   tlb_cb = container_of(cb, typeof(*tlb_cb), cb);
-   atomic64_inc(&tlb_cb->vm->tlb_seq);
-   kfree(tlb_cb);
-}
-
 /**
  * amdgpu_vm_update_range - update a range in the vm page table
  *
@@ -911,7 +879,6 @@ int amdgpu_vm_update_range(struct amdgpu_device *adev, 
struct amdgpu_vm *vm,
   struct dma_fence **fence)
 {
struct amdgpu_vm_update_params params;
-   struct amdgpu_vm_tlb_seq_struct *tlb_cb;
struct amdgpu_res_cursor cursor;
enum amdgpu_sync_mode sync_mode;
int r, idx;
@@ -919,12 +886,6 @@ int amdgpu_vm_update_range(struct amdgpu_device *adev, 
struct amdgpu_vm *vm,
if (!drm_dev_enter(adev_to_drm(adev), &idx))
return -ENODEV;
 
-   tlb_cb = kmalloc(sizeof(*tlb_cb), GFP_KERNEL);
-   if (!tlb_cb) {
-   r = -ENOMEM;
-   goto error_unlock;
-   }
-
/* Vega20+XGMI where PTEs get inadvertently cached in L2 texture cache,
 * heavy-weight flush TLB unconditionally.
 */
@@ -942,6 +903,7 @@ int amdgpu_vm_update_range(struct amdgpu_device *adev, 
struct amdgpu_vm *vm,
params.immediate = immediate;
params.pages_addr = pages_addr;
params.unlocked = unlocked;
+   params.needs_flush = flush_tlb;
params.allow_override = allow_override;
 
/* Implicitly sync to command submissions in the same VM before
@@ -955,7 +917,7 @@ int amdgpu_vm_update_range(struct amdgpu_device *adev, 
struct amdgpu_vm *vm,
amdgpu_vm_eviction_lock(vm);
if (vm->evicting) {
 

[PATCH v2 3/3] drm/amdgpu: sync page table freeing with tlb flush

2024-01-31 Thread Shashank Sharma
This patch:
- Attaches the TLB flush fence to the PT objects being freed
- Adds a new ptr in VM to save this last TLB flush fence
- Adds a new lock in VM to prevent out-of-context update of saved
  TLB flush fence
- Adds a new ptr in tlb_flush structure to save vm

The idea is to delay freeing of page table objects until we have the
respective TLB entries flushed.

V2: rebase

Cc: Christian König 
Cc: Alex Deucher 
Cc: Felix Kuehling 
Cc: Rajneesh Bhardwaj 
Signed-off-by: Shashank Sharma 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c|  3 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h|  4 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c | 27 +++
 .../gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c  | 13 +++--
 4 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 67c690044b97..b0e81c249e3a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -2245,6 +2245,7 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct 
amdgpu_vm *vm,
vm->generation = 0;
 
mutex_init(&vm->eviction_lock);
+   mutex_init(&vm->tlb_flush_lock);
vm->evicting = false;
vm->tlb_fence_context = dma_fence_context_alloc(1);
 
@@ -2360,7 +2361,9 @@ int amdgpu_vm_make_compute(struct amdgpu_device *adev, 
struct amdgpu_vm *vm)
}
 
dma_fence_put(vm->last_update);
+   dma_fence_put(vm->tlb_fence_last);
vm->last_update = dma_fence_get_stub();
+   vm->tlb_fence_last = dma_fence_get_stub();
vm->is_compute_context = true;
 
/* Free the shadow bo for compute VM */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index 8e6fd25d07b7..b05bc586237f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -334,6 +334,10 @@ struct amdgpu_vm {
uint64_t*tlb_seq_cpu_addr;
uint64_ttlb_fence_context;
 
+   /* Ptr and lock to maintain tlb flush sync */
+   struct mutextlb_flush_lock;
+   struct dma_fence*tlb_fence_last;
+
atomic64_t  kfd_last_flushed_seq;
 
/* How many times we had to re-generate the page tables */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
index 95dc0afdaffb..f1c4418c4d63 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
@@ -631,6 +631,18 @@ static int amdgpu_vm_pt_alloc(struct amdgpu_device *adev,
return r;
 }
 
+static inline
+void amdgpu_vm_attach_tlb_fence(struct amdgpu_bo *bo, struct dma_fence *fence)
+{
+   if (!bo || !fence)
+   return;
+
+   if (!dma_resv_reserve_fences(bo->tbo.base.resv, 1)) {
+   dma_resv_add_fence(bo->tbo.base.resv, fence,
+  DMA_RESV_USAGE_BOOKKEEP);
+   }
+}
+
 /**
  * amdgpu_vm_pt_free - free one PD/PT
  *
@@ -638,6 +650,7 @@ static int amdgpu_vm_pt_alloc(struct amdgpu_device *adev,
  */
 static void amdgpu_vm_pt_free(struct amdgpu_vm_bo_base *entry)
 {
+   struct amdgpu_vm *vm;
struct amdgpu_bo *shadow;
 
if (!entry->bo)
@@ -646,9 +659,23 @@ static void amdgpu_vm_pt_free(struct amdgpu_vm_bo_base 
*entry)
entry->bo->vm_bo = NULL;
shadow = amdgpu_bo_shadowed(entry->bo);
if (shadow) {
+   vm = shadow->vm_bo->vm;
+
+   mutex_lock(&vm->tlb_flush_lock);
+   if (vm->tlb_fence_last)
+   amdgpu_vm_attach_tlb_fence(shadow, vm->tlb_fence_last);
+   mutex_unlock(&vm->tlb_flush_lock);
+
ttm_bo_set_bulk_move(&shadow->tbo, NULL);
amdgpu_bo_unref(&shadow);
}
+
+   vm = entry->vm;
+   mutex_lock(&vm->tlb_flush_lock);
+   if (vm->tlb_fence_last)
+   amdgpu_vm_attach_tlb_fence(entry->bo, vm->tlb_fence_last);
+   mutex_unlock(&vm->tlb_flush_lock);
+
ttm_bo_set_bulk_move(&entry->bo->tbo, NULL);
 
spin_lock(&entry->vm->status_lock);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c
index 569681badd7c..54ec81d30034 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c
@@ -31,6 +31,7 @@
 struct amdgpu_tlb_fence {
struct dma_fencebase;
struct amdgpu_device*adev;
+   struct amdgpu_vm*vm;
struct dma_fence*dependency;
struct work_struct  work;
spinlock_t  lock;
@@ -51,6 +52,7 @@ static const char *amdgpu_tlb_fence_get_timeline_name(struct 
dma_fence *f)
 static void a

[PATCH v2 2/3] drm/amdgpu: implement TLB flush fence

2024-01-31 Thread Shashank Sharma
From: Christian König 

The problem is that when (for example) 4k pages are replaced
with a single 2M page, we need to wait for the change to be flushed
out by invalidating the TLB before the PT can be freed.

Solve this by moving the TLB flush into a DMA-fence object which
can be used to delay the freeing of the PT BOs until it is signaled.

V2: (Shashank)
- rebase
- set dma_fence_error only in case of error
- add tlb_flush fence only when PT/PD BO is locked (Felix)
- use vm->pasid when f is NULL (Mukul)

Cc: Christian Koenig 
Cc: Felix Kuehling 
Cc: Rajneesh Bhardwaj 
Cc: Alex Deucher 
Signed-off-by: Christian König 
Signed-off-by: Shashank Sharma 
---
 drivers/gpu/drm/amd/amdgpu/Makefile   |   3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c|  10 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h|   4 +
 .../gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c  | 106 ++
 4 files changed, 122 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
b/drivers/gpu/drm/amd/amdgpu/Makefile
index 4c989da4d2f3..fdbb3d770c7b 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -70,7 +70,8 @@ amdgpu-y += amdgpu_device.o amdgpu_doorbell_mgr.o 
amdgpu_kms.o \
amdgpu_cs.o amdgpu_bios.o amdgpu_benchmark.o \
atombios_dp.o amdgpu_afmt.o amdgpu_trace_points.o \
atombios_encoders.o amdgpu_sa.o atombios_i2c.o \
-   amdgpu_dma_buf.o amdgpu_vm.o amdgpu_vm_pt.o amdgpu_ib.o amdgpu_pll.o \
+   amdgpu_dma_buf.o amdgpu_vm.o amdgpu_vm_pt.o amdgpu_vm_tlb_fence.o \
+   amdgpu_ib.o amdgpu_pll.o \
amdgpu_ucode.o amdgpu_bo_list.o amdgpu_ctx.o amdgpu_sync.o \
amdgpu_gtt_mgr.o amdgpu_preempt_mgr.o amdgpu_vram_mgr.o amdgpu_virt.o \
amdgpu_atomfirmware.o amdgpu_vf_error.o amdgpu_sched.o \
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 0960e0a665d3..67c690044b97 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -932,6 +932,15 @@ int amdgpu_vm_update_range(struct amdgpu_device *adev, 
struct amdgpu_vm *vm,
if (r)
goto error_unlock;
 
+   /* Prepare a TLB flush fence to be attached to PTs */
+   if (!unlocked && params.needs_flush && vm->is_compute_context) {
+   amdgpu_vm_tlb_fence_create(adev, vm, fence);
+
+   /* Makes sure no PD/PT is freed before the flush */
+   dma_resv_add_fence(vm->root.bo->tbo.base.resv, *fence,
+  DMA_RESV_USAGE_BOOKKEEP);
+   }
+
amdgpu_res_first(pages_addr ? NULL : res, offset,
 (last - start + 1) * AMDGPU_GPU_PAGE_SIZE, &cursor);
while (cursor.remaining) {
@@ -2237,6 +2246,7 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct 
amdgpu_vm *vm,
 
mutex_init(&vm->eviction_lock);
vm->evicting = false;
+   vm->tlb_fence_context = dma_fence_context_alloc(1);
 
r = amdgpu_vm_pt_create(adev, vm, adev->vm_manager.root_level,
false, &root, xcp_id);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index ac9380afcb69..8e6fd25d07b7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -332,6 +332,7 @@ struct amdgpu_vm {
atomic64_t  tlb_seq;
uint64_ttlb_seq_va;
uint64_t*tlb_seq_cpu_addr;
+   uint64_ttlb_fence_context;
 
atomic64_t  kfd_last_flushed_seq;
 
@@ -585,5 +586,8 @@ void amdgpu_vm_update_fault_cache(struct amdgpu_device 
*adev,
  uint64_t addr,
  uint32_t status,
  unsigned int vmhub);
+void amdgpu_vm_tlb_fence_create(struct amdgpu_device *adev,
+struct amdgpu_vm *vm,
+struct dma_fence **fence);
 
 #endif
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c
new file mode 100644
index ..569681badd7c
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c
@@ -0,0 +1,106 @@
+// SPDX-License-Identifier: GPL-2.0 OR MIT
+/*
+ * Copyright 2023 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to th

[PATCH v3] drm/amdgpu: change vm->task_info handling

2024-02-05 Thread Shashank Sharma
This patch changes the handling and lifecycle of vm->task_info object.
The major changes are:
- vm->task_info is a dynamically allocated ptr now, and its usage is
  reference counted.
- introducing two new helper funcs for task_info lifecycle management
- amdgpu_vm_get_task_info: reference counts up task_info before
  returning this info
- amdgpu_vm_put_task_info: reference counts down task_info
- last put to task_info() frees task_info from the vm.

This patch also does logistical changes required for existing usage
of vm->task_info.

V2: Do not block all the prints when task_info not found (Felix)
V3: (Felix)
   - Fix wrong indentation
   - No debug message for -ENOMEM
   - Add NULL check for task_info
   - Do not duplicate the debug messages (ti vs no ti)
   - Get first reference of task_info in vm_init(), put last
 in vm_fini()

Cc: Christian Koenig 
Cc: Alex Deucher 
Cc: Felix Kuehling 
Signed-off-by: Shashank Sharma 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c |   9 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c |  18 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c   |  12 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c  | 158 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h  |  21 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c   |   2 +-
 drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c  |  24 +--
 drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c  |  23 +--
 drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c   |  20 ++-
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c   |  23 +--
 drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c  |  23 +--
 drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c|  22 +--
 drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c |  20 +--
 13 files changed, 251 insertions(+), 124 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index 0e61ebdb3f3e..f9eb12697b95 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -1775,9 +1775,14 @@ static int amdgpu_debugfs_vm_info_show(struct seq_file 
*m, void *unused)
list_for_each_entry(file, &dev->filelist, lhead) {
struct amdgpu_fpriv *fpriv = file->driver_priv;
struct amdgpu_vm *vm = &fpriv->vm;
+   struct amdgpu_task_info *ti;
+
+   ti = amdgpu_vm_get_task_info_vm(vm);
+   if (ti) {
+   seq_printf(m, "pid:%d\tProcess:%s --\n", 
ti->pid, ti->process_name);
+   amdgpu_vm_put_task_info(ti);
+   }
 
-   seq_printf(m, "pid:%d\tProcess:%s --\n",
-   vm->task_info.pid, vm->task_info.process_name);
r = amdgpu_bo_reserve(vm->root.bo, true);
if (r)
break;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index 1f357198533f..e6e6d56398f2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -35,7 +35,7 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct 
drm_sched_job *s_job)
 {
struct amdgpu_ring *ring = to_amdgpu_ring(s_job->sched);
struct amdgpu_job *job = to_amdgpu_job(s_job);
-   struct amdgpu_task_info ti;
+   struct amdgpu_task_info *ti;
struct amdgpu_device *adev = ring->adev;
int idx;
int r;
@@ -48,7 +48,7 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct 
drm_sched_job *s_job)
return DRM_GPU_SCHED_STAT_ENODEV;
}
 
-   memset(&ti, 0, sizeof(struct amdgpu_task_info));
+
adev->job_hang = true;
 
if (amdgpu_gpu_recovery &&
@@ -58,12 +58,16 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct 
drm_sched_job *s_job)
goto exit;
}
 
-   amdgpu_vm_get_task_info(ring->adev, job->pasid, &ti);
DRM_ERROR("ring %s timeout, signaled seq=%u, emitted seq=%u\n",
- job->base.sched->name, atomic_read(&ring->fence_drv.last_seq),
- ring->fence_drv.sync_seq);
-   DRM_ERROR("Process information: process %s pid %d thread %s pid %d\n",
- ti.process_name, ti.tgid, ti.task_name, ti.pid);
+  job->base.sched->name, 
atomic_read(&ring->fence_drv.last_seq),
+  ring->fence_drv.sync_seq);
+
+   ti = amdgpu_vm_get_task_info_pasid(ring->adev, job->pasid);
+   if (ti) {
+   DRM_ERROR("Process information: process %s pid %d thread %s pid 
%d\n",
+ ti->process_name, ti->tgid, ti->task_name, ti->pid);
+   amdgpu_vm_put_task_info(ti);
+   }
 
dma_fence_set_error(&s_job->s_fence->finished, -ETIME);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.

[PATCH v3 2/3] drm/amdgpu: implement TLB flush fence

2024-02-23 Thread Shashank Sharma
From: Christian König 

The problem is that when (for example) 4k pages are replaced
with a single 2M page, we need to wait for the change to be flushed
out by invalidating the TLB before the PT can be freed.

Solve this by moving the TLB flush into a DMA-fence object which
can be used to delay the freeing of the PT BOs until it is signaled.

V2: (Shashank)
- rebase
- set dma_fence_error only in case of error
- add tlb_flush fence only when PT/PD BO is locked (Felix)
- use vm->pasid when f is NULL (Mukul)

Cc: Christian Koenig 
Cc: Felix Kuehling 
Cc: Rajneesh Bhardwaj 
Cc: Alex Deucher 
Signed-off-by: Christian König 
Signed-off-by: Shashank Sharma 
---
 drivers/gpu/drm/amd/amdgpu/Makefile   |   3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c|  10 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h|   4 +
 .../gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c  | 106 ++
 4 files changed, 122 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
b/drivers/gpu/drm/amd/amdgpu/Makefile
index 4c989da4d2f3..fdbb3d770c7b 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -70,7 +70,8 @@ amdgpu-y += amdgpu_device.o amdgpu_doorbell_mgr.o 
amdgpu_kms.o \
amdgpu_cs.o amdgpu_bios.o amdgpu_benchmark.o \
atombios_dp.o amdgpu_afmt.o amdgpu_trace_points.o \
atombios_encoders.o amdgpu_sa.o atombios_i2c.o \
-   amdgpu_dma_buf.o amdgpu_vm.o amdgpu_vm_pt.o amdgpu_ib.o amdgpu_pll.o \
+   amdgpu_dma_buf.o amdgpu_vm.o amdgpu_vm_pt.o amdgpu_vm_tlb_fence.o \
+   amdgpu_ib.o amdgpu_pll.o \
amdgpu_ucode.o amdgpu_bo_list.o amdgpu_ctx.o amdgpu_sync.o \
amdgpu_gtt_mgr.o amdgpu_preempt_mgr.o amdgpu_vram_mgr.o amdgpu_virt.o \
amdgpu_atomfirmware.o amdgpu_vf_error.o amdgpu_sched.o \
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 0960e0a665d3..67c690044b97 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -932,6 +932,15 @@ int amdgpu_vm_update_range(struct amdgpu_device *adev, 
struct amdgpu_vm *vm,
if (r)
goto error_unlock;
 
+   /* Prepare a TLB flush fence to be attached to PTs */
+   if (!unlocked && params.needs_flush && vm->is_compute_context) {
+   amdgpu_vm_tlb_fence_create(adev, vm, fence);
+
+   /* Makes sure no PD/PT is freed before the flush */
+   dma_resv_add_fence(vm->root.bo->tbo.base.resv, *fence,
+  DMA_RESV_USAGE_BOOKKEEP);
+   }
+
amdgpu_res_first(pages_addr ? NULL : res, offset,
 (last - start + 1) * AMDGPU_GPU_PAGE_SIZE, &cursor);
while (cursor.remaining) {
@@ -2237,6 +2246,7 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct 
amdgpu_vm *vm,
 
mutex_init(&vm->eviction_lock);
vm->evicting = false;
+   vm->tlb_fence_context = dma_fence_context_alloc(1);
 
r = amdgpu_vm_pt_create(adev, vm, adev->vm_manager.root_level,
false, &root, xcp_id);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index ac9380afcb69..8e6fd25d07b7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -332,6 +332,7 @@ struct amdgpu_vm {
atomic64_t  tlb_seq;
uint64_ttlb_seq_va;
uint64_t*tlb_seq_cpu_addr;
+   uint64_ttlb_fence_context;
 
atomic64_t  kfd_last_flushed_seq;
 
@@ -585,5 +586,8 @@ void amdgpu_vm_update_fault_cache(struct amdgpu_device 
*adev,
  uint64_t addr,
  uint32_t status,
  unsigned int vmhub);
+void amdgpu_vm_tlb_fence_create(struct amdgpu_device *adev,
+struct amdgpu_vm *vm,
+struct dma_fence **fence);
 
 #endif
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c
new file mode 100644
index ..569681badd7c
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c
@@ -0,0 +1,106 @@
+// SPDX-License-Identifier: GPL-2.0 OR MIT
+/*
+ * Copyright 2023 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to th

[PATCH v3 1/3] drm/amdgpu: replace TLB seq callback with HW seq

2024-02-23 Thread Shashank Sharma
From: Christian König 

The callbacks we installed for the SDMA update were actually pretty
horrible. Since we now have seq64, use that one and HW seq writes
instead.

V2:(Shashank)
 - rebased on amd-drm-staging-next
 - changed amdgpu_seq64_gpu_addr

Cc: Christian König 
Cc: Alex Deucher 
Cc: Felix Kuehling 
Cc: Rajneesh Bhardwaj 
Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c   | 14 
 drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.h   |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c  | 79 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h  | 27 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c  |  3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c   |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c |  5 ++
 7 files changed, 42 insertions(+), 89 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
index 3d0d56087d41..300dc79fa2b9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
@@ -199,6 +199,20 @@ void amdgpu_seq64_free(struct amdgpu_device *adev, u64 va)
__clear_bit(bit_pos, adev->seq64.used);
 }
 
+/**
+ * amdgpu_seq64_gpu_addr - Calculate GPU addr from va
+ *
+ * @adev: amdgpu_device pointer
+ * @va: virtual address in client address space
+ *
+ * Calculate the GART address for a VA.
+ */
+u64 amdgpu_seq64_gpu_addr(struct amdgpu_device *adev, u64 va)
+{
+   return va - amdgpu_seq64_get_va_base(adev) +
+   amdgpu_bo_gpu_offset(adev->seq64.sbo);
+}
+
 /**
  * amdgpu_seq64_fini - Cleanup seq64 driver
  *
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.h
index 4203b2ab318d..63e8ac0a2057 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.h
@@ -43,6 +43,7 @@ void amdgpu_seq64_free(struct amdgpu_device *adev, u64 
gpu_addr);
 int amdgpu_seq64_map(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 struct amdgpu_bo_va **bo_va);
 void amdgpu_seq64_unmap(struct amdgpu_device *adev, struct amdgpu_fpriv 
*fpriv);
+u64 amdgpu_seq64_gpu_addr(struct amdgpu_device *adev, u64 va);
 
 #endif
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index ed4a8c5d26d7..0960e0a665d3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -111,21 +111,6 @@ struct amdgpu_prt_cb {
struct dma_fence_cb cb;
 };
 
-/**
- * struct amdgpu_vm_tlb_seq_struct - Helper to increment the TLB flush sequence
- */
-struct amdgpu_vm_tlb_seq_struct {
-   /**
-* @vm: pointer to the amdgpu_vm structure to set the fence sequence on
-*/
-   struct amdgpu_vm *vm;
-
-   /**
-* @cb: callback
-*/
-   struct dma_fence_cb cb;
-};
-
 /**
  * amdgpu_vm_set_pasid - manage pasid and vm ptr mapping
  *
@@ -862,23 +847,6 @@ int amdgpu_vm_update_pdes(struct amdgpu_device *adev,
return r;
 }
 
-/**
- * amdgpu_vm_tlb_seq_cb - make sure to increment tlb sequence
- * @fence: unused
- * @cb: the callback structure
- *
- * Increments the tlb sequence to make sure that future CS execute a VM flush.
- */
-static void amdgpu_vm_tlb_seq_cb(struct dma_fence *fence,
-struct dma_fence_cb *cb)
-{
-   struct amdgpu_vm_tlb_seq_struct *tlb_cb;
-
-   tlb_cb = container_of(cb, typeof(*tlb_cb), cb);
-   atomic64_inc(&tlb_cb->vm->tlb_seq);
-   kfree(tlb_cb);
-}
-
 /**
  * amdgpu_vm_update_range - update a range in the vm page table
  *
@@ -911,7 +879,6 @@ int amdgpu_vm_update_range(struct amdgpu_device *adev, 
struct amdgpu_vm *vm,
   struct dma_fence **fence)
 {
struct amdgpu_vm_update_params params;
-   struct amdgpu_vm_tlb_seq_struct *tlb_cb;
struct amdgpu_res_cursor cursor;
enum amdgpu_sync_mode sync_mode;
int r, idx;
@@ -919,12 +886,6 @@ int amdgpu_vm_update_range(struct amdgpu_device *adev, 
struct amdgpu_vm *vm,
if (!drm_dev_enter(adev_to_drm(adev), &idx))
return -ENODEV;
 
-   tlb_cb = kmalloc(sizeof(*tlb_cb), GFP_KERNEL);
-   if (!tlb_cb) {
-   r = -ENOMEM;
-   goto error_unlock;
-   }
-
/* Vega20+XGMI where PTEs get inadvertently cached in L2 texture cache,
 * heavy-weight flush TLB unconditionally.
 */
@@ -942,6 +903,7 @@ int amdgpu_vm_update_range(struct amdgpu_device *adev, 
struct amdgpu_vm *vm,
params.immediate = immediate;
params.pages_addr = pages_addr;
params.unlocked = unlocked;
+   params.needs_flush = flush_tlb;
params.allow_override = allow_override;
 
/* Implicitly sync to command submissions in the same VM before
@@ -955,7 +917,7 @@ int amdgpu_vm_update_range(struct amdgpu_device *adev, 
struct amdgpu_vm *vm,
amdgpu_vm_eviction_lock(vm);
if (vm->evicting) {
 

[PATCH v3 3/3] drm/amdgpu: sync page table freeing with tlb flush

2024-02-23 Thread Shashank Sharma
This patch:
- adds a new list in amdgou_vm to hold the VM PT entries being freed
- waits for the TLB flush using the vm->tlb_flush_fence
- actually frees the PT BOs

V2: rebase
V3: Do not attach the tlb_fence to the entries, rather add the entries
to a list and delay their freeing (Christian)

Cc: Christian König 
Cc: Alex Deucher 
Cc: Felix Kuehling 
Cc: Rajneesh Bhardwaj 
Signed-off-by: Shashank Sharma 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c|  6 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h|  6 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c | 51 ---
 3 files changed, 58 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 67c690044b97..eebb73f2c2ef 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -939,6 +939,10 @@ int amdgpu_vm_update_range(struct amdgpu_device *adev, 
struct amdgpu_vm *vm,
/* Makes sure no PD/PT is freed before the flush */
dma_resv_add_fence(vm->root.bo->tbo.base.resv, *fence,
   DMA_RESV_USAGE_BOOKKEEP);
+
+   mutex_lock(&vm->tlb_fence_lock);
+   vm->tlb_fence_last = *fence;
+   mutex_unlock(&vm->tlb_fence_lock);
}
 
amdgpu_res_first(pages_addr ? NULL : res, offset,
@@ -2212,6 +2216,7 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct 
amdgpu_vm *vm,
INIT_LIST_HEAD(&vm->freed);
INIT_LIST_HEAD(&vm->done);
INIT_LIST_HEAD(&vm->pt_freed);
+   INIT_LIST_HEAD(&vm->tlb_flush_waitlist);
INIT_WORK(&vm->pt_free_work, amdgpu_vm_pt_free_work);
INIT_KFIFO(vm->faults);
 
@@ -2244,6 +2249,7 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct 
amdgpu_vm *vm,
vm->last_unlocked = dma_fence_get_stub();
vm->generation = 0;
 
+   mutex_init(&vm->tlb_fence_lock);
mutex_init(&vm->eviction_lock);
vm->evicting = false;
vm->tlb_fence_context = dma_fence_context_alloc(1);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index 8e6fd25d07b7..77f10ed80973 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -334,6 +334,10 @@ struct amdgpu_vm {
uint64_t*tlb_seq_cpu_addr;
uint64_ttlb_fence_context;
 
+   struct mutextlb_fence_lock;
+   struct dma_fence*tlb_fence_last;
+   struct list_headtlb_flush_waitlist;
+
atomic64_t  kfd_last_flushed_seq;
 
/* How many times we had to re-generate the page tables */
@@ -379,6 +383,8 @@ struct amdgpu_vm {
 
/* cached fault info */
struct amdgpu_vm_fault_info fault_info;
+
+   int count_bos;
 };
 
 struct amdgpu_vm_manager {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
index 95dc0afdaffb..57ea95c5c085 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
@@ -643,13 +643,13 @@ static void amdgpu_vm_pt_free(struct amdgpu_vm_bo_base 
*entry)
if (!entry->bo)
return;
 
-   entry->bo->vm_bo = NULL;
shadow = amdgpu_bo_shadowed(entry->bo);
if (shadow) {
ttm_bo_set_bulk_move(&shadow->tbo, NULL);
amdgpu_bo_unref(&shadow);
}
ttm_bo_set_bulk_move(&entry->bo->tbo, NULL);
+   entry->bo->vm_bo = NULL;
 
spin_lock(&entry->vm->status_lock);
list_del(&entry->vm_status);
@@ -657,6 +657,38 @@ static void amdgpu_vm_pt_free(struct amdgpu_vm_bo_base 
*entry)
amdgpu_bo_unref(&entry->bo);
 }
 
+static void amdgpu_vm_pt_flush_waitlist(struct amdgpu_vm *vm)
+{
+   struct amdgpu_vm_bo_base *entry, *next;
+   LIST_HEAD(tlb_flush_waitlist);
+
+   if (!vm || list_empty(&vm->tlb_flush_waitlist))
+   return;
+
+   /* Wait for pending TLB flush before freeing PT BOs */
+   mutex_lock(&vm->tlb_fence_lock);
+   if (vm->tlb_fence_last && !dma_fence_is_signaled(vm->tlb_fence_last)) {
+   if (dma_fence_wait_timeout(vm->tlb_fence_last, false,
+  MAX_SCHEDULE_TIMEOUT) <= 0) {
+   DRM_ERROR("Timedout waiting for TLB flush, not freeing 
PT BOs\n");
+   mutex_unlock(&vm->tlb_fence_lock);
+   return;
+   }
+
+   vm->tlb_fence_last = NULL;
+   }
+
+   /* Save the waitlist locally and reset the flushlist */
+   list_splice_init(&vm->tlb_flush_waitlist, &tlb_flush_waitlist);
+   mutex_unlock(&vm->tlb_fe

[PATCH v4 2/2] drm/amdgpu: sync page table freeing with tlb flush

2024-03-01 Thread Shashank Sharma
The idea behind this patch is to delay the freeing of PT entry objects
until the TLB flush is done.

This patch:
- Adds a tlb_flush_waitlist which will keep the objects that need to be
  freed after tlb_flush
- Adds PT entries in this list in amdgpu_vm_pt_free_dfs, instead of freeing
  them immediately.
- Exports function amdgpu_vm_pt_free to be called directly.
- Adds a 'force' input bool to amdgpu_vm_pt_free_dfs to differentiate
  between immediate freeing of the BOs (like from
  amdgpu_vm_pt_free_root) vs delayed freeing.

V2: rebase
V4: (Christian)
- add only locked PTE entries in the TLB flush waitlist.
- do not create a separate function for list flush.
- do not create a new lock for TLB flush.
- there is no need to wait on tlb_flush_fence exclusively.

Cc: Christian König 
Cc: Alex Deucher 
Cc: Felix Kuehling 
Cc: Rajneesh Bhardwaj 
Signed-off-by: Shashank Sharma 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c| 10 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h|  4 
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c | 21 ++---
 3 files changed, 28 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 310aae6fb49b..94581a1fe34f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -990,11 +990,20 @@ int amdgpu_vm_update_range(struct amdgpu_device *adev, 
struct amdgpu_vm *vm,
 
/* Prepare a TLB flush fence to be attached to PTs */
if (!unlocked && params.needs_flush && vm->is_compute_context) {
+   struct amdgpu_vm_bo_base *entry, *next;
+
amdgpu_vm_tlb_fence_create(adev, vm, fence);
 
/* Makes sure no PD/PT is freed before the flush */
dma_resv_add_fence(vm->root.bo->tbo.base.resv, *fence,
   DMA_RESV_USAGE_BOOKKEEP);
+
+   if (list_empty(&vm->tlb_flush_waitlist))
+   goto error_unlock;
+
+   /* Now actually free the waitlist */
+   list_for_each_entry_safe(entry, next, &vm->tlb_flush_waitlist, 
vm_status)
+   amdgpu_vm_pt_free(entry);
}
 
 error_unlock:
@@ -2214,6 +2223,7 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct 
amdgpu_vm *vm,
INIT_LIST_HEAD(&vm->pt_freed);
INIT_WORK(&vm->pt_free_work, amdgpu_vm_pt_free_work);
INIT_KFIFO(vm->faults);
+   INIT_LIST_HEAD(&vm->tlb_flush_waitlist);
 
r = amdgpu_seq64_alloc(adev, &vm->tlb_seq_va, &vm->tlb_seq_cpu_addr);
if (r)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index 298f604b8e5f..ba374c2c61bd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -343,6 +343,9 @@ struct amdgpu_vm {
uint64_t*tlb_seq_cpu_addr;
uint64_ttlb_fence_context;
 
+   /* temporary storage of PT BOs until the TLB flush */
+   struct list_headtlb_flush_waitlist;
+
atomic64_t  kfd_last_flushed_seq;
 
/* How many times we had to re-generate the page tables */
@@ -545,6 +548,7 @@ int amdgpu_vm_ptes_update(struct amdgpu_vm_update_params 
*params,
  uint64_t start, uint64_t end,
  uint64_t dst, uint64_t flags);
 void amdgpu_vm_pt_free_work(struct work_struct *work);
+void amdgpu_vm_pt_free(struct amdgpu_vm_bo_base *entry);
 
 #if defined(CONFIG_DEBUG_FS)
 void amdgpu_debugfs_vm_bo_info(struct amdgpu_vm *vm, struct seq_file *m);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
index 95dc0afdaffb..cb14e5686c0f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
@@ -636,7 +636,7 @@ static int amdgpu_vm_pt_alloc(struct amdgpu_device *adev,
  *
  * @entry: PDE to free
  */
-static void amdgpu_vm_pt_free(struct amdgpu_vm_bo_base *entry)
+void amdgpu_vm_pt_free(struct amdgpu_vm_bo_base *entry)
 {
struct amdgpu_bo *shadow;
 
@@ -685,13 +685,15 @@ void amdgpu_vm_pt_free_work(struct work_struct *work)
  * @vm: amdgpu vm structure
  * @start: optional cursor where to start freeing PDs/PTs
  * @unlocked: vm resv unlock status
+ * @force: force free all PDs/PTs without waiting for TLB flush
  *
  * Free the page directory or page table level and all sub levels.
  */
 static void amdgpu_vm_pt_free_dfs(struct amdgpu_device *adev,
  struct amdgpu_vm *vm,
  struct amdgpu_vm_pt_cursor *start,
- bool unlocked)
+ bool unlocked,
+ bool force)
 {
struct amdgpu_vm_pt_cursor cursor;
struct amdgpu_vm_bo_base *ent

[PATCH v4 1/2] drm/amdgpu: implement TLB flush fence

2024-03-01 Thread Shashank Sharma
From: Christian König 

The problem is that when (for example) 4k pages are replaced
with a single 2M page, we need to wait for the change to be flushed
out by invalidating the TLB before the PT can be freed.

Solve this by moving the TLB flush into a DMA-fence object which
can be used to delay the freeing of the PT BOs until it is signaled.

V2: (Shashank)
- rebase
- set dma_fence_error only in case of error
- add tlb_flush fence only when PT/PD BO is locked (Felix)
- use vm->pasid when f is NULL (Mukul)

V4: - add a wait for (f->dependency) in tlb_fence_work (Christian)
- move the misplaced fence_create call to the end (Philip)

Cc: Christian Koenig 
Cc: Felix Kuehling 
Cc: Rajneesh Bhardwaj 
Cc: Alex Deucher 
Signed-off-by: Christian König 
Signed-off-by: Shashank Sharma 
---
 drivers/gpu/drm/amd/amdgpu/Makefile   |   3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c|  10 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h|   4 +
 .../gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c  | 111 ++
 4 files changed, 127 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
index fa26a4e3a99d..91ab4cf29b5b 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -70,7 +70,8 @@ amdgpu-y += amdgpu_device.o amdgpu_doorbell_mgr.o amdgpu_kms.o \
amdgpu_cs.o amdgpu_bios.o amdgpu_benchmark.o \
atombios_dp.o amdgpu_afmt.o amdgpu_trace_points.o \
atombios_encoders.o amdgpu_sa.o atombios_i2c.o \
-   amdgpu_dma_buf.o amdgpu_vm.o amdgpu_vm_pt.o amdgpu_ib.o amdgpu_pll.o \
+   amdgpu_dma_buf.o amdgpu_vm.o amdgpu_vm_pt.o amdgpu_vm_tlb_fence.o \
+   amdgpu_ib.o amdgpu_pll.o \
amdgpu_ucode.o amdgpu_bo_list.o amdgpu_ctx.o amdgpu_sync.o \
amdgpu_gtt_mgr.o amdgpu_preempt_mgr.o amdgpu_vram_mgr.o amdgpu_virt.o \
amdgpu_atomfirmware.o amdgpu_vf_error.o amdgpu_sched.o \
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 0960e0a665d3..310aae6fb49b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -988,6 +988,15 @@ int amdgpu_vm_update_range(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 
	r = vm->update_funcs->commit(&params, fence);
 
+   /* Prepare a TLB flush fence to be attached to PTs */
+   if (!unlocked && params.needs_flush && vm->is_compute_context) {
+   amdgpu_vm_tlb_fence_create(adev, vm, fence);
+
+   /* Makes sure no PD/PT is freed before the flush */
+   dma_resv_add_fence(vm->root.bo->tbo.base.resv, *fence,
+  DMA_RESV_USAGE_BOOKKEEP);
+   }
+
 error_unlock:
amdgpu_vm_eviction_unlock(vm);
drm_dev_exit(idx);
@@ -2237,6 +2246,7 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 
mutex_init(&vm->eviction_lock);
vm->evicting = false;
+   vm->tlb_fence_context = dma_fence_context_alloc(1);
 
r = amdgpu_vm_pt_create(adev, vm, adev->vm_manager.root_level,
false, &root, xcp_id);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index 64b3f69efa57..298f604b8e5f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -341,6 +341,7 @@ struct amdgpu_vm {
atomic64_t  tlb_seq;
uint64_ttlb_seq_va;
uint64_t*tlb_seq_cpu_addr;
+   uint64_ttlb_fence_context;
 
atomic64_t  kfd_last_flushed_seq;
 
@@ -594,5 +595,8 @@ void amdgpu_vm_update_fault_cache(struct amdgpu_device *adev,
  uint64_t addr,
  uint32_t status,
  unsigned int vmhub);
+void amdgpu_vm_tlb_fence_create(struct amdgpu_device *adev,
+struct amdgpu_vm *vm,
+struct dma_fence **fence);
 
 #endif
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c
new file mode 100644
index ..54c33c24fa46
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c
@@ -0,0 +1,111 @@
+// SPDX-License-Identifier: GPL-2.0 OR MIT
+/*
+ * Copyright 2023 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to wh

[PATCH v5 1/2] drm/amdgpu: implement TLB flush fence

2024-03-06 Thread Shashank Sharma
From: Christian König 

The problem is that when (for example) 4K pages are replaced
with a single 2M page, we need to wait for the change to be flushed
out by invalidating the TLB before the PT can be freed.

Solve this by moving the TLB flush into a DMA-fence object which
can be used to delay the freeing of the PT BOs until it is signaled.

V2: (Shashank)
- rebase
- set dma_fence_error only in case of error
- add tlb_flush fence only when PT/PD BO is locked (Felix)
- use vm->pasid when f is NULL (Mukul)

V4: - add a wait for (f->dependency) in tlb_fence_work (Christian)
- move the misplaced fence_create call to the end (Philip)

V5: - free the f->dependency properly (Christian)

Cc: Christian Koenig 
Cc: Felix Kuehling 
Cc: Rajneesh Bhardwaj 
Cc: Alex Deucher 
Reviewed-by: Shashank Sharma 
Signed-off-by: Christian König 
Signed-off-by: Shashank Sharma 
---
 drivers/gpu/drm/amd/amdgpu/Makefile   |   3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c|  10 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h|   4 +
 .../gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c  | 112 ++
 4 files changed, 128 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
index fa26a4e3a99d..91ab4cf29b5b 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -70,7 +70,8 @@ amdgpu-y += amdgpu_device.o amdgpu_doorbell_mgr.o amdgpu_kms.o \
amdgpu_cs.o amdgpu_bios.o amdgpu_benchmark.o \
atombios_dp.o amdgpu_afmt.o amdgpu_trace_points.o \
atombios_encoders.o amdgpu_sa.o atombios_i2c.o \
-   amdgpu_dma_buf.o amdgpu_vm.o amdgpu_vm_pt.o amdgpu_ib.o amdgpu_pll.o \
+   amdgpu_dma_buf.o amdgpu_vm.o amdgpu_vm_pt.o amdgpu_vm_tlb_fence.o \
+   amdgpu_ib.o amdgpu_pll.o \
amdgpu_ucode.o amdgpu_bo_list.o amdgpu_ctx.o amdgpu_sync.o \
amdgpu_gtt_mgr.o amdgpu_preempt_mgr.o amdgpu_vram_mgr.o amdgpu_virt.o \
amdgpu_atomfirmware.o amdgpu_vf_error.o amdgpu_sched.o \
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 0960e0a665d3..310aae6fb49b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -988,6 +988,15 @@ int amdgpu_vm_update_range(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 
	r = vm->update_funcs->commit(&params, fence);
 
+   /* Prepare a TLB flush fence to be attached to PTs */
+   if (!unlocked && params.needs_flush && vm->is_compute_context) {
+   amdgpu_vm_tlb_fence_create(adev, vm, fence);
+
+   /* Makes sure no PD/PT is freed before the flush */
+   dma_resv_add_fence(vm->root.bo->tbo.base.resv, *fence,
+  DMA_RESV_USAGE_BOOKKEEP);
+   }
+
 error_unlock:
amdgpu_vm_eviction_unlock(vm);
drm_dev_exit(idx);
@@ -2237,6 +2246,7 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 
mutex_init(&vm->eviction_lock);
vm->evicting = false;
+   vm->tlb_fence_context = dma_fence_context_alloc(1);
 
r = amdgpu_vm_pt_create(adev, vm, adev->vm_manager.root_level,
false, &root, xcp_id);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index 64b3f69efa57..298f604b8e5f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -341,6 +341,7 @@ struct amdgpu_vm {
atomic64_t  tlb_seq;
uint64_ttlb_seq_va;
uint64_t*tlb_seq_cpu_addr;
+   uint64_ttlb_fence_context;
 
atomic64_t  kfd_last_flushed_seq;
 
@@ -594,5 +595,8 @@ void amdgpu_vm_update_fault_cache(struct amdgpu_device *adev,
  uint64_t addr,
  uint32_t status,
  unsigned int vmhub);
+void amdgpu_vm_tlb_fence_create(struct amdgpu_device *adev,
+struct amdgpu_vm *vm,
+struct dma_fence **fence);
 
 #endif
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c
new file mode 100644
index ..51cddfa3f1e8
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c
@@ -0,0 +1,112 @@
+// SPDX-License-Identifier: GPL-2.0 OR MIT
+/*
+ * Copyright 2023 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sub

[PATCH v5 2/2] drm/amdgpu: sync page table freeing with tlb flush

2024-03-06 Thread Shashank Sharma
The idea behind this patch is to delay the freeing of PT entry objects
until the TLB flush is done.

This patch:
- Adds a tlb_flush_waitlist to amdgpu_vm_update_params, which keeps the
  objects that need to be freed after the tlb_flush.
- Adds PT entries to this list in amdgpu_vm_ptes_update after finding
  the PT entry.
- Changes the functionality of amdgpu_vm_pt_free_dfs from (DFS search + free)
  to simply freeing the BOs, and renames it to
  amdgpu_vm_pt_free_list to reflect this.
- Exports function amdgpu_vm_pt_free_list to be called directly.
- Calls amdgpu_vm_pt_free_list directly from amdgpu_vm_update_range.

V2: rebase
V4: Addressed review comments from Christian
- add only locked PTE entries in the TLB flush waitlist.
- do not create a separate function for list flush.
- do not create a new lock for TLB flush.
- there is no need to wait on tlb_flush_fence exclusively.

V5: Addressed review comments from Christian
- change the amdgpu_vm_pt_free_dfs's functionality to simple freeing
  of the objects and rename it.
- add all the PTE objects in params->tlb_flush_waitlist
- let amdgpu_vm_pt_free_root handle the freeing of BOs independently
- call amdgpu_vm_pt_free_list directly

Cc: Christian König 
Cc: Alex Deucher 
Cc: Felix Kuehling 
Cc: Rajneesh Bhardwaj 
Signed-off-by: Shashank Sharma 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c|  4 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h|  7 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c | 53 +--
 3 files changed, 40 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 310aae6fb49b..b8a6e534cd81 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -905,6 +905,7 @@ int amdgpu_vm_update_range(struct amdgpu_device *adev, struct amdgpu_vm *vm,
params.unlocked = unlocked;
params.needs_flush = flush_tlb;
params.allow_override = allow_override;
+   INIT_LIST_HEAD(&params.tlb_flush_waitlist);
 
/* Implicitly sync to command submissions in the same VM before
 * unmapping. Sync to moving fences before mapping.
@@ -997,6 +998,9 @@ int amdgpu_vm_update_range(struct amdgpu_device *adev, struct amdgpu_vm *vm,
   DMA_RESV_USAGE_BOOKKEEP);
}
 
+   if (params.needs_flush)
+   amdgpu_vm_pt_free_list(adev, &params);
+
 error_unlock:
amdgpu_vm_eviction_unlock(vm);
drm_dev_exit(idx);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index 298f604b8e5f..b81ca460b210 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -265,6 +265,11 @@ struct amdgpu_vm_update_params {
 * to be overridden for NUMA local memory.
 */
bool allow_override;
+
+   /**
+* @tlb_flush_waitlist: temporary storage for BOs until tlb_flush
+*/
+   struct list_head tlb_flush_waitlist;
 };
 
 struct amdgpu_vm_update_funcs {
@@ -545,6 +550,8 @@ int amdgpu_vm_ptes_update(struct amdgpu_vm_update_params *params,
  uint64_t start, uint64_t end,
  uint64_t dst, uint64_t flags);
 void amdgpu_vm_pt_free_work(struct work_struct *work);
+void amdgpu_vm_pt_free_list(struct amdgpu_device *adev,
+   struct amdgpu_vm_update_params *params);
 
 #if defined(CONFIG_DEBUG_FS)
 void amdgpu_debugfs_vm_bo_info(struct amdgpu_vm *vm, struct seq_file *m);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
index 95dc0afdaffb..d3afc9d34b71 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
@@ -679,40 +679,30 @@ void amdgpu_vm_pt_free_work(struct work_struct *work)
 }
 
 /**
- * amdgpu_vm_pt_free_dfs - free PD/PT levels
+ * amdgpu_vm_pt_free_list - free PD/PT levels
  *
  * @adev: amdgpu device structure
- * @vm: amdgpu vm structure
- * @start: optional cursor where to start freeing PDs/PTs
- * @unlocked: vm resv unlock status
+ * @params: see amdgpu_vm_update_params definition
  *
- * Free the page directory or page table level and all sub levels.
+ * Free the page directory objects saved in the flush list
  */
-static void amdgpu_vm_pt_free_dfs(struct amdgpu_device *adev,
- struct amdgpu_vm *vm,
- struct amdgpu_vm_pt_cursor *start,
- bool unlocked)
+void amdgpu_vm_pt_free_list(struct amdgpu_device *adev,
+   struct amdgpu_vm_update_params *params)
 {
-   struct amdgpu_vm_pt_cursor cursor;
-   struct amdgpu_vm_bo_base *entry;
+   struct amdgpu_vm_bo_base *entry, *next;
+   struct amdgpu_vm *vm = params->vm;
+   bool unlocked = params->unlocked;
 
if (unlocked) {
   

[PATCH] drm/amdgpu: cleanup unused variable

2024-03-12 Thread Shashank Sharma
This patch removes an unused input parameter from the MES
kernel doorbell get/free functions.

Cc: Christian König 
Cc: Alex Deucher 
Signed-off-by: Shashank Sharma 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 10 +++---
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
index 89ac50405e25..7615daf89ba5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
@@ -40,7 +40,6 @@ int amdgpu_mes_doorbell_process_slice(struct amdgpu_device *adev)
 }
 
 static int amdgpu_mes_kernel_doorbell_get(struct amdgpu_device *adev,
-struct amdgpu_mes_process *process,
 int ip_type, uint64_t *doorbell_index)
 {
unsigned int offset, found;
@@ -65,7 +64,6 @@ static int amdgpu_mes_kernel_doorbell_get(struct amdgpu_device *adev,
 }
 
 static void amdgpu_mes_kernel_doorbell_free(struct amdgpu_device *adev,
-  struct amdgpu_mes_process *process,
   uint32_t doorbell_index)
 {
unsigned int old, rel_index;
@@ -623,7 +621,7 @@ int amdgpu_mes_add_hw_queue(struct amdgpu_device *adev, int gang_id,
*queue_id = queue->queue_id = r;
 
/* allocate a doorbell index for the queue */
-   r = amdgpu_mes_kernel_doorbell_get(adev, gang->process,
+   r = amdgpu_mes_kernel_doorbell_get(adev,
  qprops->queue_type,
  &qprops->doorbell_off);
if (r)
@@ -681,8 +679,7 @@ int amdgpu_mes_add_hw_queue(struct amdgpu_device *adev, int gang_id,
return 0;
 
 clean_up_doorbell:
-   amdgpu_mes_kernel_doorbell_free(adev, gang->process,
-  qprops->doorbell_off);
+   amdgpu_mes_kernel_doorbell_free(adev, qprops->doorbell_off);
 clean_up_queue_id:
spin_lock_irqsave(&adev->mes.queue_id_lock, flags);
idr_remove(&adev->mes.queue_id_idr, queue->queue_id);
@@ -736,8 +733,7 @@ int amdgpu_mes_remove_hw_queue(struct amdgpu_device *adev, int queue_id)
  queue_id);
 
list_del(&queue->list);
-   amdgpu_mes_kernel_doorbell_free(adev, gang->process,
-  queue->doorbell_off);
+   amdgpu_mes_kernel_doorbell_free(adev, queue->doorbell_off);
amdgpu_mes_unlock(&adev->mes);
 
amdgpu_mes_queue_free_mqd(queue);
-- 
2.43.2



[PATCH v6 1/2] drm/amdgpu: implement TLB flush fence

2024-03-15 Thread Shashank Sharma
From: Christian Koenig 

The problem is that when (for example) 4K pages are replaced
with a single 2M page, we need to wait for the change to be flushed
out by invalidating the TLB before the PT can be freed.

Solve this by moving the TLB flush into a DMA-fence object which
can be used to delay the freeing of the PT BOs until it is signaled.
Also remove the existing TLB flush callback and the vm->last_tlb_flush
fence, and replace them with the new method of flushing the TLB.

V2: (Shashank)
- rebase
- set dma_fence_error only in case of error
- add tlb_flush fence only when PT/PD BO is locked (Felix)
- use vm->pasid when f is NULL (Mukul)

V4: - add a wait for (f->dependency) in tlb_fence_work (Christian)
- move the misplaced fence_create call to the end (Philip)

V5: - free the f->dependency properly (Christian)

V6: (Shashank)
- added some cleanup from the HW seq patch in this patch.
- introduce params.needs_flush and its usage in this patch.
- remove vm->last_tlb_flush and tlb_cb.
- rebase without TLB HW seq patch.

Cc: Christian Koenig 
Cc: Felix Kuehling 
Cc: Rajneesh Bhardwaj 
Cc: Alex Deucher 
Acked-by: Felix Kuehling 
Acked-by: Rajneesh Bhardwaj 
Tested-by: Rajneesh Bhardwaj 
Reviewed-by: Shashank Sharma 
Signed-off-by: Christian König 
Signed-off-by: Shashank Sharma 
---
 drivers/gpu/drm/amd/amdgpu/Makefile   |   3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c|  77 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h|  26 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c|   4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c   |   4 +
 .../gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c  | 112 ++
 7 files changed, 141 insertions(+), 87 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
index 4536c8ad0e11..f24f11ac3e92 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -70,7 +70,8 @@ amdgpu-y += amdgpu_device.o amdgpu_doorbell_mgr.o amdgpu_kms.o \
amdgpu_cs.o amdgpu_bios.o amdgpu_benchmark.o \
atombios_dp.o amdgpu_afmt.o amdgpu_trace_points.o \
atombios_encoders.o amdgpu_sa.o atombios_i2c.o \
-   amdgpu_dma_buf.o amdgpu_vm.o amdgpu_vm_pt.o amdgpu_ib.o amdgpu_pll.o \
+   amdgpu_dma_buf.o amdgpu_vm.o amdgpu_vm_pt.o amdgpu_vm_tlb_fence.o \
+   amdgpu_ib.o amdgpu_pll.o \
amdgpu_ucode.o amdgpu_bo_list.o amdgpu_ctx.o amdgpu_sync.o \
amdgpu_gtt_mgr.o amdgpu_preempt_mgr.o amdgpu_vram_mgr.o amdgpu_virt.o \
amdgpu_atomfirmware.o amdgpu_vf_error.o amdgpu_sched.o \
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 81fb3465e197..3b64623f32ea 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -111,21 +111,6 @@ struct amdgpu_prt_cb {
struct dma_fence_cb cb;
 };
 
-/**
- * struct amdgpu_vm_tlb_seq_struct - Helper to increment the TLB flush sequence
- */
-struct amdgpu_vm_tlb_seq_struct {
-   /**
-* @vm: pointer to the amdgpu_vm structure to set the fence sequence on
-*/
-   struct amdgpu_vm *vm;
-
-   /**
-* @cb: callback
-*/
-   struct dma_fence_cb cb;
-};
-
 /**
  * amdgpu_vm_set_pasid - manage pasid and vm ptr mapping
  *
@@ -868,23 +853,6 @@ int amdgpu_vm_update_pdes(struct amdgpu_device *adev,
return r;
 }
 
-/**
- * amdgpu_vm_tlb_seq_cb - make sure to increment tlb sequence
- * @fence: unused
- * @cb: the callback structure
- *
- * Increments the tlb sequence to make sure that future CS execute a VM flush.
- */
-static void amdgpu_vm_tlb_seq_cb(struct dma_fence *fence,
-struct dma_fence_cb *cb)
-{
-   struct amdgpu_vm_tlb_seq_struct *tlb_cb;
-
-   tlb_cb = container_of(cb, typeof(*tlb_cb), cb);
-   atomic64_inc(&tlb_cb->vm->tlb_seq);
-   kfree(tlb_cb);
-}
-
 /**
  * amdgpu_vm_update_range - update a range in the vm page table
  *
@@ -917,7 +885,6 @@ int amdgpu_vm_update_range(struct amdgpu_device *adev, struct amdgpu_vm *vm,
   struct dma_fence **fence)
 {
struct amdgpu_vm_update_params params;
-   struct amdgpu_vm_tlb_seq_struct *tlb_cb;
struct amdgpu_res_cursor cursor;
enum amdgpu_sync_mode sync_mode;
int r, idx;
@@ -925,12 +892,6 @@ int amdgpu_vm_update_range(struct amdgpu_device *adev, struct amdgpu_vm *vm,
if (!drm_dev_enter(adev_to_drm(adev), &idx))
return -ENODEV;
 
-   tlb_cb = kmalloc(sizeof(*tlb_cb), GFP_KERNEL);
-   if (!tlb_cb) {
-   r = -ENOMEM;
-   goto error_unlock;
-   }
-
/* Vega20+XGMI where PTEs get inadvertently cached in L2 texture cache,
 * heavy-weight flush TLB unconditionally.
  

[PATCH v6 2/2] drm/amdgpu: sync page table freeing with tlb flush

2024-03-15 Thread Shashank Sharma
The idea behind this patch is to delay the freeing of PT entry objects
until the TLB flush is done.

This patch:
- Adds a tlb_flush_waitlist to amdgpu_vm_update_params, which keeps the
  objects that need to be freed after the tlb_flush.
- Adds PT entries to this list in amdgpu_vm_ptes_update after finding
  the PT entry.
- Changes the functionality of amdgpu_vm_pt_free_dfs from (DFS search + free)
  to simply freeing the BOs, and renames it to
  amdgpu_vm_pt_free_list to reflect this.
- Exports function amdgpu_vm_pt_free_list to be called directly.
- Calls amdgpu_vm_pt_free_list directly from amdgpu_vm_update_range.

V2: rebase
V4: Addressed review comments from Christian
- add only locked PTE entries in the TLB flush waitlist.
- do not create a separate function for list flush.
- do not create a new lock for TLB flush.
- there is no need to wait on tlb_flush_fence exclusively.

V5: Addressed review comments from Christian
- change the amdgpu_vm_pt_free_dfs's functionality to simple freeing
  of the objects and rename it.
- add all the PTE objects in params->tlb_flush_waitlist
- let amdgpu_vm_pt_free_root handle the freeing of BOs independently
- call amdgpu_vm_pt_free_list directly

V6: Rebase

Cc: Christian König 
Cc: Alex Deucher 
Cc: Felix Kuehling 
Cc: Rajneesh Bhardwaj 
Acked-by: Felix Kuehling 
Acked-by: Rajneesh Bhardwaj 
Tested-by: Rajneesh Bhardwaj 
Signed-off-by: Shashank Sharma 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c|  4 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h|  7 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c | 53 +--
 3 files changed, 40 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 3b64623f32ea..9845d5077750 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -911,6 +911,7 @@ int amdgpu_vm_update_range(struct amdgpu_device *adev, struct amdgpu_vm *vm,
params.unlocked = unlocked;
params.needs_flush = flush_tlb;
params.allow_override = allow_override;
+   INIT_LIST_HEAD(&params.tlb_flush_waitlist);
 
/* Implicitly sync to command submissions in the same VM before
 * unmapping. Sync to moving fences before mapping.
@@ -1003,6 +1004,9 @@ int amdgpu_vm_update_range(struct amdgpu_device *adev, struct amdgpu_vm *vm,
   DMA_RESV_USAGE_BOOKKEEP);
}
 
+   if (params.needs_flush)
+   amdgpu_vm_pt_free_list(adev, &params);
+
 error_unlock:
amdgpu_vm_eviction_unlock(vm);
drm_dev_exit(idx);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index ba92f431f4e0..cc6a74a79f52 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -266,6 +266,11 @@ struct amdgpu_vm_update_params {
 * to be overridden for NUMA local memory.
 */
bool allow_override;
+
+   /**
+* @tlb_flush_waitlist: temporary storage for BOs until tlb_flush
+*/
+   struct list_head tlb_flush_waitlist;
 };
 
 struct amdgpu_vm_update_funcs {
@@ -546,6 +551,8 @@ int amdgpu_vm_ptes_update(struct amdgpu_vm_update_params *params,
  uint64_t start, uint64_t end,
  uint64_t dst, uint64_t flags);
 void amdgpu_vm_pt_free_work(struct work_struct *work);
+void amdgpu_vm_pt_free_list(struct amdgpu_device *adev,
+   struct amdgpu_vm_update_params *params);
 
 #if defined(CONFIG_DEBUG_FS)
 void amdgpu_debugfs_vm_bo_info(struct amdgpu_vm *vm, struct seq_file *m);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
index 601df0ce8290..440dc8c581fc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
@@ -622,40 +622,30 @@ void amdgpu_vm_pt_free_work(struct work_struct *work)
 }
 
 /**
- * amdgpu_vm_pt_free_dfs - free PD/PT levels
+ * amdgpu_vm_pt_free_list - free PD/PT levels
  *
  * @adev: amdgpu device structure
- * @vm: amdgpu vm structure
- * @start: optional cursor where to start freeing PDs/PTs
- * @unlocked: vm resv unlock status
+ * @params: see amdgpu_vm_update_params definition
  *
- * Free the page directory or page table level and all sub levels.
+ * Free the page directory objects saved in the flush list
  */
-static void amdgpu_vm_pt_free_dfs(struct amdgpu_device *adev,
- struct amdgpu_vm *vm,
- struct amdgpu_vm_pt_cursor *start,
- bool unlocked)
+void amdgpu_vm_pt_free_list(struct amdgpu_device *adev,
+   struct amdgpu_vm_update_params *params)
 {
-   struct amdgpu_vm_pt_cursor cursor;
-   struct amdgpu_vm_bo_base *entry;
+   struct amdgpu_vm_bo_base *entry, *next;
+   struct amdgpu_vm *vm 

[PATCH v7 2/2] drm/amdgpu: sync page table freeing with tlb flush

2024-03-18 Thread Shashank Sharma
The idea behind this patch is to delay the freeing of PT entry objects
until the TLB flush is done.

This patch:
- Adds a tlb_flush_waitlist to amdgpu_vm_update_params, which keeps the
  objects that need to be freed after the tlb_flush.
- Adds PT entries to this list in amdgpu_vm_ptes_update after finding
  the PT entry.
- Changes the functionality of amdgpu_vm_pt_free_dfs from (DFS search + free)
  to simply freeing the BOs, and renames it to
  amdgpu_vm_pt_free_list to reflect this.
- Exports function amdgpu_vm_pt_free_list to be called directly.
- Calls amdgpu_vm_pt_free_list directly from amdgpu_vm_update_range.

V2: rebase
V4: Addressed review comments from Christian
- add only locked PTE entries in the TLB flush waitlist.
- do not create a separate function for list flush.
- do not create a new lock for TLB flush.
- there is no need to wait on tlb_flush_fence exclusively.

V5: Addressed review comments from Christian
- change the amdgpu_vm_pt_free_dfs's functionality to simple freeing
  of the objects and rename it.
- add all the PTE objects in params->tlb_flush_waitlist
- let amdgpu_vm_pt_free_root handle the freeing of BOs independently
- call amdgpu_vm_pt_free_list directly

V6: Rebase
V7: Rebase

Cc: Christian König 
Cc: Alex Deucher 
Cc: Felix Kuehling 
Cc: Rajneesh Bhardwaj 
Acked-by: Felix Kuehling 
Acked-by: Rajneesh Bhardwaj 
Tested-by: Rajneesh Bhardwaj 
Signed-off-by: Shashank Sharma 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c|  5 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h|  7 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c | 53 +--
 3 files changed, 40 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 26f1c3359642..eaa402f99fe0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -977,6 +977,7 @@ int amdgpu_vm_update_range(struct amdgpu_device *adev, struct amdgpu_vm *vm,
params.unlocked = unlocked;
params.needs_flush = flush_tlb;
params.allow_override = allow_override;
+   INIT_LIST_HEAD(&params.tlb_flush_waitlist);
 
/* Implicitly sync to command submissions in the same VM before
 * unmapping. Sync to moving fences before mapping.
@@ -1062,8 +1063,10 @@ int amdgpu_vm_update_range(struct amdgpu_device *adev, struct amdgpu_vm *vm,
if (r)
goto error_unlock;
 
-   if (params.needs_flush)
+   if (params.needs_flush) {
	r = amdgpu_vm_tlb_flush(&params, fence);
+   amdgpu_vm_pt_free_list(adev, &params);
+   }
 
 error_unlock:
amdgpu_vm_eviction_unlock(vm);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index b0a4fe683352..54d7da396de0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -266,6 +266,11 @@ struct amdgpu_vm_update_params {
 * to be overridden for NUMA local memory.
 */
bool allow_override;
+
+   /**
+* @tlb_flush_waitlist: temporary storage for BOs until tlb_flush
+*/
+   struct list_head tlb_flush_waitlist;
 };
 
 struct amdgpu_vm_update_funcs {
@@ -547,6 +552,8 @@ int amdgpu_vm_ptes_update(struct amdgpu_vm_update_params *params,
  uint64_t start, uint64_t end,
  uint64_t dst, uint64_t flags);
 void amdgpu_vm_pt_free_work(struct work_struct *work);
+void amdgpu_vm_pt_free_list(struct amdgpu_device *adev,
+   struct amdgpu_vm_update_params *params);
 
 #if defined(CONFIG_DEBUG_FS)
 void amdgpu_debugfs_vm_bo_info(struct amdgpu_vm *vm, struct seq_file *m);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
index 601df0ce8290..440dc8c581fc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
@@ -622,40 +622,30 @@ void amdgpu_vm_pt_free_work(struct work_struct *work)
 }
 
 /**
- * amdgpu_vm_pt_free_dfs - free PD/PT levels
+ * amdgpu_vm_pt_free_list - free PD/PT levels
  *
  * @adev: amdgpu device structure
- * @vm: amdgpu vm structure
- * @start: optional cursor where to start freeing PDs/PTs
- * @unlocked: vm resv unlock status
+ * @params: see amdgpu_vm_update_params definition
  *
- * Free the page directory or page table level and all sub levels.
+ * Free the page directory objects saved in the flush list
  */
-static void amdgpu_vm_pt_free_dfs(struct amdgpu_device *adev,
- struct amdgpu_vm *vm,
- struct amdgpu_vm_pt_cursor *start,
- bool unlocked)
+void amdgpu_vm_pt_free_list(struct amdgpu_device *adev,
+   struct amdgpu_vm_update_params *params)
 {
-   struct amdgpu_vm_pt_cursor cursor;
-   struct amdgpu_vm_bo_base *entry;
+ 

[PATCH v7 1/2] drm/amdgpu: implement TLB flush fence

2024-03-18 Thread Shashank Sharma
From: Christian Koenig 

The problem is that when (for example) 4K pages are replaced
with a single 2M page, we need to wait for the change to be flushed
out by invalidating the TLB before the PT can be freed.

Solve this by moving the TLB flush into a DMA-fence object which
can be used to delay the freeing of the PT BOs until it is signaled.

V2: (Shashank)
- rebase
- set dma_fence_error only in case of error
- add tlb_flush fence only when PT/PD BO is locked (Felix)
- use vm->pasid when f is NULL (Mukul)

V4: - add a wait for (f->dependency) in tlb_fence_work (Christian)
- move the misplaced fence_create call to the end (Philip)

V5: - free the f->dependency properly

V6: (Shashank)
- light code movement, moved all the clean-up in previous patch
- introduce params.needs_flush and its usage in this patch
- rebase without TLB HW sequence patch

V7:
   - Keep the vm->last_update_fence and tlb_cb code until
 we can fix the HW sequencing (Christian)
   - Move all the tlb_fence related code in a separate function so that
 its easier to read and review

Cc: Christian Koenig 
Cc: Felix Kuehling 
Cc: Rajneesh Bhardwaj 
Cc: Alex Deucher 
Acked-by: Felix Kuehling 
Acked-by: Rajneesh Bhardwaj 
Tested-by: Rajneesh Bhardwaj 
Reviewed-by: Shashank Sharma 
Signed-off-by: Christian Koenig 
Signed-off-by: Shashank Sharma 
---
 drivers/gpu/drm/amd/amdgpu/Makefile   |   3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c|  68 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h|   8 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c|   4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c   |   4 +
 .../gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c  | 112 ++
 7 files changed, 171 insertions(+), 30 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
index 4536c8ad0e11..f24f11ac3e92 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -70,7 +70,8 @@ amdgpu-y += amdgpu_device.o amdgpu_doorbell_mgr.o amdgpu_kms.o \
amdgpu_cs.o amdgpu_bios.o amdgpu_benchmark.o \
atombios_dp.o amdgpu_afmt.o amdgpu_trace_points.o \
atombios_encoders.o amdgpu_sa.o atombios_i2c.o \
-   amdgpu_dma_buf.o amdgpu_vm.o amdgpu_vm_pt.o amdgpu_ib.o amdgpu_pll.o \
+   amdgpu_dma_buf.o amdgpu_vm.o amdgpu_vm_pt.o amdgpu_vm_tlb_fence.o \
+   amdgpu_ib.o amdgpu_pll.o \
amdgpu_ucode.o amdgpu_bo_list.o amdgpu_ctx.o amdgpu_sync.o \
amdgpu_gtt_mgr.o amdgpu_preempt_mgr.o amdgpu_vram_mgr.o amdgpu_virt.o \
amdgpu_atomfirmware.o amdgpu_vf_error.o amdgpu_sched.o \
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 81fb3465e197..26f1c3359642 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -885,6 +885,40 @@ static void amdgpu_vm_tlb_seq_cb(struct dma_fence *fence,
kfree(tlb_cb);
 }
 
+static int
+amdgpu_vm_tlb_flush(struct amdgpu_vm_update_params *params, struct dma_fence **fence)
+{
+   struct amdgpu_vm_tlb_seq_struct *tlb_cb;
+   struct amdgpu_vm *vm = params->vm;
+
+   if (!fence || !*fence)
+   return 0;
+
+   tlb_cb = kmalloc(sizeof(*tlb_cb), GFP_KERNEL);
+   if (!tlb_cb)
+   return -ENOMEM;
+
+   tlb_cb->vm = vm;
+   if (!dma_fence_add_callback(*fence, &tlb_cb->cb,
+   amdgpu_vm_tlb_seq_cb)) {
+   dma_fence_put(vm->last_tlb_flush);
+   vm->last_tlb_flush = dma_fence_get(*fence);
+   } else {
+   amdgpu_vm_tlb_seq_cb(NULL, &tlb_cb->cb);
+   }
+
+   /* Prepare a TLB flush fence to be attached to PTs */
+   if (!params->unlocked && vm->is_compute_context) {
+   amdgpu_vm_tlb_fence_create(params->adev, vm, fence);
+
+   /* Makes sure no PD/PT is freed before the flush */
+   dma_resv_add_fence(vm->root.bo->tbo.base.resv, *fence,
+  DMA_RESV_USAGE_BOOKKEEP);
+   }
+
+   return 0;
+}
+
 /**
  * amdgpu_vm_update_range - update a range in the vm page table
  *
@@ -917,7 +951,6 @@ int amdgpu_vm_update_range(struct amdgpu_device *adev, 
struct amdgpu_vm *vm,
   struct dma_fence **fence)
 {
struct amdgpu_vm_update_params params;
-   struct amdgpu_vm_tlb_seq_struct *tlb_cb;
struct amdgpu_res_cursor cursor;
enum amdgpu_sync_mode sync_mode;
int r, idx;
@@ -925,12 +958,6 @@ int amdgpu_vm_update_range(struct amdgpu_device *adev, 
struct amdgpu_vm *vm,
if (!drm_dev_enter(adev_to_drm(adev), &idx))
return -ENODEV;
 
-   tlb_cb = kmalloc(sizeof(*tlb_cb), GFP_KERNEL);
-   if (!tlb_cb)

[PATCH v8] drm/amdgpu: sync page table freeing with tlb flush

2024-03-18 Thread Shashank Sharma
The idea behind this patch is to delay the freeing of PT entry objects
until the TLB flush is done.

This patch:
- Adds a tlb_flush_waitlist in amdgpu_vm_update_params which will keep the
  objects that need to be freed after tlb_flush.
- Adds PT entries in this list in amdgpu_vm_ptes_update after finding
  the PT entry.
- Changes the functionality of amdgpu_vm_pt_free_dfs from (DFS search + free)
  to simply freeing the BOs, and renames it to
  amdgpu_vm_pt_free_list to reflect the same.
- Exports function amdgpu_vm_pt_free_list to be called directly.
- Calls amdgpu_vm_pt_free_list directly from amdgpu_vm_update_range.
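
The deferral scheme described above — park PT entries on a waitlist during the
update and only release them after the TLB flush step — can be sketched in
plain userspace C. This is a minimal illustration, not the kernel code; the
names pt_defer_free/pt_free_list are hypothetical stand-ins for the list
handling around params->tlb_flush_waitlist:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a PT BO entry; "freed" marks release. */
struct pt_entry {
    struct pt_entry *next;
    int freed;
};

/* Stand-in for amdgpu_vm_update_params with its waitlist
 * (singly linked here for brevity). */
struct update_params {
    struct pt_entry *tlb_flush_waitlist;
};

/* Instead of freeing immediately, queue the entry on the waitlist. */
static void pt_defer_free(struct update_params *p, struct pt_entry *e)
{
    e->next = p->tlb_flush_waitlist;
    p->tlb_flush_waitlist = e;
}

/* Called only once the TLB flush has been issued; releases everything
 * on the waitlist and returns how many entries were freed. */
static int pt_free_list(struct update_params *p)
{
    int n = 0;
    for (struct pt_entry *e = p->tlb_flush_waitlist; e; e = e->next) {
        e->freed = 1;
        n++;
    }
    p->tlb_flush_waitlist = NULL;
    return n;
}
```

The point of the pattern is the ordering guarantee: no entry is released
between the page-table walk and the flush.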

V2: rebase
V4: Addressed review comments from Christian
- add only locked PTEs entries in TLB flush waitlist.
- do not create a separate function for list flush.
- do not create a new lock for TLB flush.
- there is no need to wait on tlb_flush_fence exclusively.

V5: Addressed review comments from Christian
- change amdgpu_vm_pt_free_dfs's functionality to simply freeing
  the objects, and rename it.
- add all the PTE objects in params->tlb_flush_waitlist
- let amdgpu_vm_pt_free_root handle the freeing of BOs independently
- call amdgpu_vm_pt_free_list directly

V6: Rebase
V7: Rebase
V8: Added a NULL check to fix this backtrace issue:
[  415.351447] BUG: kernel NULL pointer dereference, address: 0008
[  415.359245] #PF: supervisor write access in kernel mode
[  415.365081] #PF: error_code(0x0002) - not-present page
[  415.370817] PGD 101259067 P4D 101259067 PUD 10125a067 PMD 0
[  415.377140] Oops: 0002 [#1] PREEMPT SMP NOPTI
[  415.382004] CPU: 0 PID: 25481 Comm: test_with_MPI.e Tainted: G   OE  
   5.18.2-mi300-build-140423-ubuntu-22.04+ #24
[  415.394437] Hardware name: AMD Corporation Sh51p/Sh51p, BIOS RMO1001AS 
02/21/2024
[  415.402797] RIP: 0010:amdgpu_vm_ptes_update+0x6fd/0xa10 [amdgpu]
[  415.409648] Code: 4c 89 ff 4d 8d 66 30 e8 f1 ed ff ff 48 85 db 74 42 48 39 5d a0 74 40 48 8b 53 20 48 8b 4b 18 48 8d 43 18 48 8d 75 b0 4c 89 ff <48> 89 51 08 48 89 0a 49 8b 56 30 48 89 42 08 48 89 53 18 4c 89 63
[  415.430621] RSP: 0018:c9000401f990 EFLAGS: 00010287
[  415.436456] RAX: 888147bb82f0 RBX: 888147bb82d8 RCX: 
[  415.26] RDX:  RSI: c9000401fa30 RDI: 888161f8
[  415.452397] RBP: c9000401fa80 R08:  R09: c9000401fa00
[  415.460368] R10: 0007f0cc R11: 0007f0c85000 R12: c9000401fb20
[  415.468340] R13: 0007f0d0 R14: c9000401faf0 R15: 888161f8
[  415.476312] FS:  7f132ff89840() GS:889f87c0() 
knlGS:
[  415.485350] CS:  0010 DS:  ES:  CR0: 80050033
[  415.491767] CR2: 0008 CR3: 000161d46003 CR4: 00770ef0
[  415.499738] PKRU: 5554
[  415.502750] Call Trace:
[  415.505482]  
[  415.507825]  amdgpu_vm_update_range+0x32a/0x880 [amdgpu]
[  415.513869]  amdgpu_vm_clear_freed+0x117/0x250 [amdgpu]
[  415.519814]  amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu+0x18c/0x250 [amdgpu]
[  415.527729]  kfd_ioctl_unmap_memory_from_gpu+0xed/0x340 [amdgpu]
[  415.534551]  kfd_ioctl+0x3b6/0x510 [amdgpu]

Cc: Christian König 
Cc: Alex Deucher 
Cc: Felix Kuehling 
Cc: Rajneesh Bhardwaj 
Acked-by: Felix Kuehling 
Acked-by: Rajneesh Bhardwaj 
Tested-by: Rajneesh Bhardwaj 
Signed-off-by: Shashank Sharma 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c|  5 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h|  7 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c | 58 +--
 3 files changed, 45 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 26f1c3359642..eaa402f99fe0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -977,6 +977,7 @@ int amdgpu_vm_update_range(struct amdgpu_device *adev, 
struct amdgpu_vm *vm,
params.unlocked = unlocked;
params.needs_flush = flush_tlb;
params.allow_override = allow_override;
+   INIT_LIST_HEAD(&params.tlb_flush_waitlist);
 
/* Implicitly sync to command submissions in the same VM before
 * unmapping. Sync to moving fences before mapping.
@@ -1062,8 +1063,10 @@ int amdgpu_vm_update_range(struct amdgpu_device *adev, 
struct amdgpu_vm *vm,
if (r)
goto error_unlock;
 
-   if (params.needs_flush)
+   if (params.needs_flush) {
r = amdgpu_vm_tlb_flush(&params, fence);
+   amdgpu_vm_pt_free_list(adev, &params);
+   }
 
 error_unlock:
amdgpu_vm_eviction_unlock(vm);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index b0a4fe683352..54d7da396de0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -266,6 +266,11 @@ struct amdgpu_vm_update_params {
 * to be overridden for NUM

[PATCH v9 1/2] drm/amdgpu: implement TLB flush fence

2024-03-18 Thread Shashank Sharma
From: Christian Koenig 

The problem is that when (for example) 4k pages are replaced
with a single 2M page, we need to wait for the change to be flushed
out by invalidating the TLB before the PT can be freed.

Solve this by moving the TLB flush into a DMA-fence object which
can be used to delay the freeing of the PT BOs until it is signaled.
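
The fence-based deferral can be illustrated with a toy model in userspace C.
All names here (flush_fence, fence_add_callback, free_pt_cb) are hypothetical
and deliberately simplified — this is not the real DMA-fence API — but the
shape is the same: the PT release is registered as a callback on the flush
fence and runs only once the fence signals:

```c
#include <assert.h>
#include <stddef.h>

/* Toy single-callback fence; real dma_fence supports many callbacks
 * and proper locking. */
struct flush_fence {
    int signaled;
    void (*cb)(void *);
    void *cb_data;
};

/* Register a callback; if the fence already signaled, run it now
 * (mirroring dma_fence_add_callback's "already signaled" return path). */
static void fence_add_callback(struct flush_fence *f,
                               void (*cb)(void *), void *data)
{
    if (f->signaled) {
        cb(data);
        return;
    }
    f->cb = cb;
    f->cb_data = data;
}

/* Signaling the fence (the TLB flush completing) fires the callback. */
static void fence_signal(struct flush_fence *f)
{
    f->signaled = 1;
    if (f->cb)
        f->cb(f->cb_data);
}

/* Stands in for freeing the PT BO. */
static void free_pt_cb(void *data)
{
    *(int *)data = 1;
}
```

The invariant this buys is exactly the one the patch needs: the "free"
side effect can never be observed before the "flush" side effect.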

V2: (Shashank)
- rebase
- set dma_fence_error only in case of error
- add tlb_flush fence only when PT/PD BO is locked (Felix)
- use vm->pasid when f is NULL (Mukul)

V4: - add a wait for (f->dependency) in tlb_fence_work (Christian)
- move the misplaced fence_create call to the end (Philip)

V5: - free the f->dependency properly

V6: (Shashank)
- minor code movement; moved all the clean-up into the previous patch
- introduce params.needs_flush and its usage in this patch
- rebase without TLB HW sequence patch

V7:
   - Keep the vm->last_update_fence and tlb_cb code until
 we can fix the HW sequencing (Christian)
   - Move all the tlb_fence related code in a separate function so that
 its easier to read and review

V9: Addressed review comments from Christian
- start PT update only when we have callback memory allocated

Cc: Christian Koenig 
Cc: Felix Kuehling 
Cc: Rajneesh Bhardwaj 
Cc: Alex Deucher 
Acked-by: Felix Kuehling 
Acked-by: Rajneesh Bhardwaj 
Tested-by: Rajneesh Bhardwaj 
Reviewed-by: Shashank Sharma 
Signed-off-by: Christian Koenig 
Signed-off-by: Shashank Sharma 
---
 drivers/gpu/drm/amd/amdgpu/Makefile   |   3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c|  64 +++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h|   8 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c|   4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c   |   4 +
 .../gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c  | 112 ++
 7 files changed, 175 insertions(+), 22 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
b/drivers/gpu/drm/amd/amdgpu/Makefile
index 4536c8ad0e11..f24f11ac3e92 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -70,7 +70,8 @@ amdgpu-y += amdgpu_device.o amdgpu_doorbell_mgr.o 
amdgpu_kms.o \
amdgpu_cs.o amdgpu_bios.o amdgpu_benchmark.o \
atombios_dp.o amdgpu_afmt.o amdgpu_trace_points.o \
atombios_encoders.o amdgpu_sa.o atombios_i2c.o \
-   amdgpu_dma_buf.o amdgpu_vm.o amdgpu_vm_pt.o amdgpu_ib.o amdgpu_pll.o \
+   amdgpu_dma_buf.o amdgpu_vm.o amdgpu_vm_pt.o amdgpu_vm_tlb_fence.o \
+   amdgpu_ib.o amdgpu_pll.o \
amdgpu_ucode.o amdgpu_bo_list.o amdgpu_ctx.o amdgpu_sync.o \
amdgpu_gtt_mgr.o amdgpu_preempt_mgr.o amdgpu_vram_mgr.o amdgpu_virt.o \
amdgpu_atomfirmware.o amdgpu_vf_error.o amdgpu_sched.o \
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 81fb3465e197..104bf600c85f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -885,6 +885,44 @@ static void amdgpu_vm_tlb_seq_cb(struct dma_fence *fence,
kfree(tlb_cb);
 }
 
+/**
+ * amdgpu_vm_tlb_flush - prepare TLB flush
+ *
+ * @params: parameters for update
+ * @fence: input fence to sync TLB flush with
+ * @tlb_cb: the callback structure
+ *
+ * Increments the tlb sequence to make sure that future CS execute a VM flush.
+ */
+static void
+amdgpu_vm_tlb_flush(struct amdgpu_vm_update_params *params,
+   struct dma_fence **fence,
+   struct amdgpu_vm_tlb_seq_struct *tlb_cb)
+{
+   struct amdgpu_vm *vm = params->vm;
+
+   if (!fence || !*fence)
+   return;
+
+   tlb_cb->vm = vm;
+   if (!dma_fence_add_callback(*fence, &tlb_cb->cb,
+   amdgpu_vm_tlb_seq_cb)) {
+   dma_fence_put(vm->last_tlb_flush);
+   vm->last_tlb_flush = dma_fence_get(*fence);
+   } else {
+   amdgpu_vm_tlb_seq_cb(NULL, &tlb_cb->cb);
+   }
+
+   /* Prepare a TLB flush fence to be attached to PTs */
+   if (!params->unlocked && vm->is_compute_context) {
+   amdgpu_vm_tlb_fence_create(params->adev, vm, fence);
+
+   /* Makes sure no PD/PT is freed before the flush */
+   dma_resv_add_fence(vm->root.bo->tbo.base.resv, *fence,
+  DMA_RESV_USAGE_BOOKKEEP);
+   }
+}
+
 /**
  * amdgpu_vm_update_range - update a range in the vm page table
  *
@@ -916,8 +954,8 @@ int amdgpu_vm_update_range(struct amdgpu_device *adev, 
struct amdgpu_vm *vm,
   struct ttm_resource *res, dma_addr_t *pages_addr,
   struct dma_fence **fence)
 {
-   struct amdgpu_vm_update_params params;
struct amdgpu_vm_tlb_seq_struct *tlb_cb;
+   struct amd

[PATCH v9 2/2] drm/amdgpu: sync page table freeing with tlb flush

2024-03-18 Thread Shashank Sharma
The idea behind this patch is to delay the freeing of PT entry objects
until the TLB flush is done.

This patch:
- Adds a tlb_flush_waitlist in amdgpu_vm_update_params which will keep the
  objects that need to be freed after tlb_flush.
- Adds PT entries in this list in amdgpu_vm_ptes_update after finding
  the PT entry.
- Changes the functionality of amdgpu_vm_pt_free_dfs from (DFS search + free)
  to simply freeing the BOs, and renames it to
  amdgpu_vm_pt_free_list to reflect the same.
- Exports function amdgpu_vm_pt_free_list to be called directly.
- Calls amdgpu_vm_pt_free_list directly from amdgpu_vm_update_range.

V2: rebase
V4: Addressed review comments from Christian
- add only locked PTEs entries in TLB flush waitlist.
- do not create a separate function for list flush.
- do not create a new lock for TLB flush.
- there is no need to wait on tlb_flush_fence exclusively.

V5: Addressed review comments from Christian
- change amdgpu_vm_pt_free_dfs's functionality to simply freeing
  the objects, and rename it.
- add all the PTE objects in params->tlb_flush_waitlist
- let amdgpu_vm_pt_free_root handle the freeing of BOs independently
- call amdgpu_vm_pt_free_list directly

V6: Rebase
V7: Rebase
V8: Added a NULL check to fix this backtrace issue:
[  415.351447] BUG: kernel NULL pointer dereference, address: 0008
[  415.359245] #PF: supervisor write access in kernel mode
[  415.365081] #PF: error_code(0x0002) - not-present page
[  415.370817] PGD 101259067 P4D 101259067 PUD 10125a067 PMD 0
[  415.377140] Oops: 0002 [#1] PREEMPT SMP NOPTI
[  415.382004] CPU: 0 PID: 25481 Comm: test_with_MPI.e Tainted: G   OE  
   5.18.2-mi300-build-140423-ubuntu-22.04+ #24
[  415.394437] Hardware name: AMD Corporation Sh51p/Sh51p, BIOS RMO1001AS 
02/21/2024
[  415.402797] RIP: 0010:amdgpu_vm_ptes_update+0x6fd/0xa10 [amdgpu]
[  415.409648] Code: 4c 89 ff 4d 8d 66 30 e8 f1 ed ff ff 48 85 db 74 42 48 39 5d a0 74 40 48 8b 53 20 48 8b 4b 18 48 8d 43 18 48 8d 75 b0 4c 89 ff <48> 89 51 08 48 89 0a 49 8b 56 30 48 89 42 08 48 89 53 18 4c 89 63
[  415.430621] RSP: 0018:c9000401f990 EFLAGS: 00010287
[  415.436456] RAX: 888147bb82f0 RBX: 888147bb82d8 RCX: 
[  415.26] RDX:  RSI: c9000401fa30 RDI: 888161f8
[  415.452397] RBP: c9000401fa80 R08:  R09: c9000401fa00
[  415.460368] R10: 0007f0cc R11: 0007f0c85000 R12: c9000401fb20
[  415.468340] R13: 0007f0d0 R14: c9000401faf0 R15: 888161f8
[  415.476312] FS:  7f132ff89840() GS:889f87c0() 
knlGS:
[  415.485350] CS:  0010 DS:  ES:  CR0: 80050033
[  415.491767] CR2: 0008 CR3: 000161d46003 CR4: 00770ef0
[  415.499738] PKRU: 5554
[  415.502750] Call Trace:
[  415.505482]  
[  415.507825]  amdgpu_vm_update_range+0x32a/0x880 [amdgpu]
[  415.513869]  amdgpu_vm_clear_freed+0x117/0x250 [amdgpu]
[  415.519814]  amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu+0x18c/0x250 [amdgpu]
[  415.527729]  kfd_ioctl_unmap_memory_from_gpu+0xed/0x340 [amdgpu]
[  415.534551]  kfd_ioctl+0x3b6/0x510 [amdgpu]

V9: Addressed review comments from Christian
- No NULL check reqd for root PT freeing
- Free PT list regardless of needs_flush
- Move adding BOs in list in a separate function

Cc: Christian König 
Cc: Alex Deucher 
Cc: Felix Kuehling 
Cc: Rajneesh Bhardwaj 
Acked-by: Felix Kuehling 
Acked-by: Rajneesh Bhardwaj 
Tested-by: Rajneesh Bhardwaj 
Signed-off-by: Shashank Sharma 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c|  3 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h|  7 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c | 66 +++
 3 files changed, 53 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 104bf600c85f..8fada1152664 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -986,6 +986,7 @@ int amdgpu_vm_update_range(struct amdgpu_device *adev, 
struct amdgpu_vm *vm,
params.unlocked = unlocked;
params.needs_flush = flush_tlb;
params.allow_override = allow_override;
+   INIT_LIST_HEAD(&params.tlb_flush_waitlist);
 
/* Implicitly sync to command submissions in the same VM before
 * unmapping. Sync to moving fences before mapping.
@@ -1076,6 +1077,8 @@ int amdgpu_vm_update_range(struct amdgpu_device *adev, 
struct amdgpu_vm *vm,
tlb_cb = NULL;
}
 
+   amdgpu_vm_pt_free_list(adev, &params);
+
 error_free:
kfree(tlb_cb);
amdgpu_vm_eviction_unlock(vm);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index b0a4fe683352..54d7da396de0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -266,6 +266,11 @@ struct amdgpu_vm

[PATCH] drm/amdgpu: Add a NULL check for freeing root PT

2024-03-21 Thread Shashank Sharma
This patch adds a NULL check to fix this crash reported during the
freeing of root PT entry:

[  06:55] BUG: unable to handle page fault for address: c9002d637aa0
[  +0.007689] #PF: supervisor write access in kernel mode
[  +0.005833] #PF: error_code(0x0002) - not-present page
[  +0.005732] PGD 10067 P4D 10067 PUD 1001ec067 PMD 4882af067 PTE 0
[  +0.007579] Oops: 0002 [#1] PREEMPT SMP NOPTI
[  +0.004861] CPU: 52 PID: 8146 Comm: kworker/52:2 Tainted: G   OE 
5.18.2-mi300-build-140423-ubuntu-22.04+ #24
[  +0.012135] Hardware name: AMD Corporation Sh54p/Sh54p, BIOS WPP4311S 
03/11/2024
[  +0.008254] Workqueue: events delayed_fput
[  +0.004573] RIP: 0010:amdgpu_vm_pt_free+0x66/0xe0 [amdgpu]
[  +0.006270] Code: 01 74 6e 48 c7 45 e8 00 00 00 00 31 f6 48 83 c7 58 e8 0e ea 
3b ff 48 8b 03 48 8d 78 38 e8 f2 9b 90 c0 48 8b 43 20 48 8b 53 18 <48> 89 42 08 
48 89 10 48 b8 00 01 00 00 00 00 ad de 48 89 43 18 48
[  +0.020954] RSP: 0018:c9002e117c08 EFLAGS: 00010246
[  +0.005830] RAX: 8884867bda20 RBX: 8884867bd9a8 RCX: 
[  +0.007961] RDX: c9002d637a98 RSI: 888482845458 RDI: c155916e
[  +0.007958] RBP: c9002e117c20 R08:  R09: 0001
[  +0.007961] R10: 888482843000 R11: 000141eed000 R12: 8884867bd9a8
[  +0.007959] R13: 888471d68098 R14: 888471d68098 R15: c1dab300
[  +0.007960] FS:  () GS:88e1cf70() 
knlGS:
[  +0.009027] CS:  0010 DS:  ES:  CR0: 80050033
[  +0.006409] CR2: c9002d637aa0 CR3: 06410006 CR4: 00770ee0
[  +0.007961] PKRU: 5554
[  +0.003016] Call Trace:
[  +0.002726]  
[  +0.002340]  amdgpu_vm_pt_free_root+0x60/0xa0 [amdgpu]
[  +0.005843]  amdgpu_vm_fini+0x2cb/0x5d0 [amdgpu]
[  +0.005248]  ? amdgpu_ctx_mgr_entity_fini+0x53/0x1c0 [amdgpu]
[  +0.006520]  amdgpu_driver_postclose_kms+0x191/0x2d0 [amdgpu]
[  +0.006520]  drm_file_free.part.0+0x1e5/0x260 [drm]

Cc: Christian König 
Cc: Alex Deucher 
Cc: Felix Kuehling 
Cc: Rajneesh Bhardwaj 
Signed-off-by: Shashank Sharma 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
index d904fc96ba0f..a0a5b955a4b4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
@@ -688,8 +688,10 @@ void amdgpu_vm_pt_free_root(struct amdgpu_device *adev, 
struct amdgpu_vm *vm)
struct amdgpu_vm_pt_cursor cursor;
struct amdgpu_vm_bo_base *entry;
 
-   for_each_amdgpu_vm_pt_dfs_safe(adev, vm, NULL, cursor, entry)
-   amdgpu_vm_pt_free(entry);
+   for_each_amdgpu_vm_pt_dfs_safe(adev, vm, NULL, cursor, entry) {
+   if (entry)
+   amdgpu_vm_pt_free(entry);
+   }
 }
 
 /**
-- 
2.43.2



[PATCH] drm/amdgpu: fix the list movement

2024-03-22 Thread Shashank Sharma
This patch fixes the list object movement introduced in the TLB flush
series: the source and destination lists passed to list_splice_init()
were swapped.
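
For illustration, the list_splice_init(src, dst) semantics the one-line fix
relies on can be sketched with a minimal userspace reimplementation (this is
not the kernel's list.h, just the same contract): the FIRST argument is the
source list, which is drained onto the destination and reinitialized to
empty. Swapping the arguments moves entries the wrong way, which is exactly
the bug being corrected:

```c
#include <assert.h>

/* Minimal circular doubly linked list, same contract as kernel list.h. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }
static int list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *e, struct list_head *h)
{
    e->prev = h->prev; e->next = h;
    h->prev->next = e; h->prev = e;
}

/* Move every entry of src to the front of dst, then reset src. */
static void list_splice_init(struct list_head *src, struct list_head *dst)
{
    if (list_empty(src))
        return;
    src->next->prev = dst;
    src->prev->next = dst->next;
    dst->next->prev = src->prev;
    dst->next = src->next;
    list_init(src);
}
```

After the fix, entries flow from params->tlb_flush_waitlist into
vm->pt_freed (where the deferred-free worker picks them up), not the
other way around.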

Fixes: 0a29a49f3ed4 ("drm/amdgpu: sync page table freeing with tlb flush")
Cc: Christian König 
Suggested-by: Christian König 
Signed-off-by: Shashank Sharma 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
index a0a5b955a4b4..7fdd306a48a0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
@@ -641,7 +641,7 @@ void amdgpu_vm_pt_free_list(struct amdgpu_device *adev,
 
if (unlocked) {
spin_lock(&vm->status_lock);
-   list_splice_init(&vm->pt_freed, &params->tlb_flush_waitlist);
+   list_splice_init(&params->tlb_flush_waitlist, &vm->pt_freed);
spin_unlock(&vm->status_lock);
schedule_work(&vm->pt_free_work);
return;
-- 
2.43.2



[PATCH] drm/amdgpu: fix MES HQD masks

2024-04-05 Thread Shashank Sharma
This patch fixes the existing HQD masks prepared during MES
initialization. The existing mask values were causing problems
when we tried to enable GFX oversubscription.

Cc: Christian König 
Cc: Alex Deucher 
Signed-off-by: Shashank Sharma 
Signed-off-by: Arvind Yadav 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c |  3 ---
 drivers/gpu/drm/amd/amdgpu/mes_v10_1.c  | 15 ++-
 drivers/gpu/drm/amd/amdgpu/mes_v11_0.c  | 15 ++-
 3 files changed, 28 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
index da48b6da0107..7db80ffda33f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
@@ -148,9 +148,6 @@ int amdgpu_mes_init(struct amdgpu_device *adev)
adev->mes.compute_hqd_mask[i] = 0xc;
}
 
-   for (i = 0; i < AMDGPU_MES_MAX_GFX_PIPES; i++)
-   adev->mes.gfx_hqd_mask[i] = i ? 0 : 0xfffe;
-
for (i = 0; i < AMDGPU_MES_MAX_SDMA_PIPES; i++) {
if (amdgpu_ip_version(adev, SDMA0_HWIP, 0) <
IP_VERSION(6, 0, 0))
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v10_1.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v10_1.c
index 1e5ad1e08d2a..9217914f824d 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v10_1.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v10_1.c
@@ -266,6 +266,19 @@ static int mes_v10_1_query_sched_status(struct amdgpu_mes 
*mes)
offsetof(union MESAPI__QUERY_MES_STATUS, api_status));
 }
 
+static inline uint32_t mes_v10_get_gfx_hqd_mask(int pipe_index)
+{
+   /* Pipe 1 can't be used for MES due to HW limitation */
+   if (pipe_index == 1)
+   return 0;
+
+   /*
+* GFX V10 supports 2 queues, but we want to keep queue 0
+* reserved for kernel, so enable only queue 1 (1<<1) for MES.
+*/
+   return 0x2;
+}
+
 static int mes_v10_1_set_hw_resources(struct amdgpu_mes *mes)
 {
int i;
@@ -291,7 +304,7 @@ static int mes_v10_1_set_hw_resources(struct amdgpu_mes 
*mes)
mes->compute_hqd_mask[i];
 
for (i = 0; i < MAX_GFX_PIPES; i++)
-   mes_set_hw_res_pkt.gfx_hqd_mask[i] = mes->gfx_hqd_mask[i];
mes_set_hw_res_pkt.gfx_hqd_mask[i] = mes_v10_get_gfx_hqd_mask(i);
 
for (i = 0; i < MAX_SDMA_PIPES; i++)
mes_set_hw_res_pkt.sdma_hqd_mask[i] = mes->sdma_hqd_mask[i];
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
index 26d71a22395d..b7dcd936afc8 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
@@ -360,6 +360,19 @@ static int mes_v11_0_misc_op(struct amdgpu_mes *mes,
offsetof(union MESAPI__MISC, api_status));
 }
 
+static inline uint32_t mes_v11_get_gfx_hqd_mask(int pipe_index)
+{
+   /* Pipe 1 can't be used for MES due to HW limitation */
+   if (pipe_index == 1)
+   return 0;
+
+   /*
+* GFX V11 supports 2 queues, but we want to keep queue 0
+* reserved for kernel, so enable only queue 1 (1<<1) for MES.
+*/
+   return 0x2;
+}
+
 static int mes_v11_0_set_hw_resources(struct amdgpu_mes *mes)
 {
int i;
@@ -385,7 +398,7 @@ static int mes_v11_0_set_hw_resources(struct amdgpu_mes 
*mes)
mes->compute_hqd_mask[i];
 
for (i = 0; i < MAX_GFX_PIPES; i++)
-   mes_set_hw_res_pkt.gfx_hqd_mask[i] = mes->gfx_hqd_mask[i];
mes_set_hw_res_pkt.gfx_hqd_mask[i] = mes_v11_get_gfx_hqd_mask(i);
 
for (i = 0; i < MAX_SDMA_PIPES; i++)
mes_set_hw_res_pkt.sdma_hqd_mask[i] = mes->sdma_hqd_mask[i];
-- 
2.43.2



[PATCH] drm/amdgpu: fix MES GFX mask

2024-04-23 Thread Shashank Sharma
The current MES GFX mask prevents the FW from enabling oversubscription.
This patch does the following:
- Fixes the mask values and adds a description for them
- Removes the central mask setup and makes it IP specific, as it will
  differ when the number of pipes and queues differs.
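
The per-pipe mask logic described above can be sketched as a tiny helper
(the name gfx_hqd_mask is hypothetical; it mirrors the values hard-coded in
the diff below): queue 0 of pipe 0 stays reserved for the kernel, so only
bit 1 (queue 1) is handed to MES, and pipe 1 is masked out entirely:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper mirroring the fixed mask values:
 * bit N set => queue N of this pipe is available to MES. */
static uint32_t gfx_hqd_mask(int pipe)
{
    /* Pipe 1 can't be used for MES due to a HW limitation. */
    if (pipe == 1)
        return 0;

    /* Queue 0 is reserved for the kernel, so expose only
     * queue 1: 1 << 1 == 0x2. */
    return 1u << 1;
}
```

The earlier (broken) setup handed MES a 0xfffe-style mask, i.e. far more
queue bits than the pipe actually has available, which confused the FW's
oversubscription logic.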

Cc: Christian König 
Cc: Alex Deucher 
Signed-off-by: Shashank Sharma 
Signed-off-by: Arvind Yadav 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 3 ---
 drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h | 1 -
 drivers/gpu/drm/amd/amdgpu/mes_v10_1.c  | 9 +++--
 drivers/gpu/drm/amd/amdgpu/mes_v11_0.c  | 9 +++--
 4 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
index da48b6da0107..7db80ffda33f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
@@ -148,9 +148,6 @@ int amdgpu_mes_init(struct amdgpu_device *adev)
adev->mes.compute_hqd_mask[i] = 0xc;
}
 
-   for (i = 0; i < AMDGPU_MES_MAX_GFX_PIPES; i++)
-   adev->mes.gfx_hqd_mask[i] = i ? 0 : 0xfffe;
-
for (i = 0; i < AMDGPU_MES_MAX_SDMA_PIPES; i++) {
if (amdgpu_ip_version(adev, SDMA0_HWIP, 0) <
IP_VERSION(6, 0, 0))
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
index 7d4f93fea937..e30f5de92c0f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
@@ -109,7 +109,6 @@ struct amdgpu_mes {
	uint32_t			vmid_mask_gfxhub;
	uint32_t			vmid_mask_mmhub;
	uint32_t			compute_hqd_mask[AMDGPU_MES_MAX_COMPUTE_PIPES];
-	uint32_t			gfx_hqd_mask[AMDGPU_MES_MAX_GFX_PIPES];
	uint32_t			sdma_hqd_mask[AMDGPU_MES_MAX_SDMA_PIPES];
	uint32_t			aggregated_doorbells[AMDGPU_MES_PRIORITY_NUM_LEVELS];
	uint32_t			sch_ctx_offs;
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v10_1.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v10_1.c
index 1e5ad1e08d2a..4d1121d1a1e7 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v10_1.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v10_1.c
@@ -290,8 +290,13 @@ static int mes_v10_1_set_hw_resources(struct amdgpu_mes 
*mes)
mes_set_hw_res_pkt.compute_hqd_mask[i] =
mes->compute_hqd_mask[i];
 
-   for (i = 0; i < MAX_GFX_PIPES; i++)
-   mes_set_hw_res_pkt.gfx_hqd_mask[i] = mes->gfx_hqd_mask[i];
+   /*
+* GFX pipe 0 queue 0 is being used by kernel
+* Set GFX pipe 0 queue 1 for MES scheduling
+* GFX pipe 1 can't be used for MES due to HW limitation.
+*/
+   mes_set_hw_res_pkt.gfx_hqd_mask[0] = 0x2;
+   mes_set_hw_res_pkt.gfx_hqd_mask[1] = 0;
 
for (i = 0; i < MAX_SDMA_PIPES; i++)
mes_set_hw_res_pkt.sdma_hqd_mask[i] = mes->sdma_hqd_mask[i];
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
index 26d71a22395d..fae6455aa8d5 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
@@ -384,8 +384,13 @@ static int mes_v11_0_set_hw_resources(struct amdgpu_mes 
*mes)
mes_set_hw_res_pkt.compute_hqd_mask[i] =
mes->compute_hqd_mask[i];
 
-   for (i = 0; i < MAX_GFX_PIPES; i++)
-   mes_set_hw_res_pkt.gfx_hqd_mask[i] = mes->gfx_hqd_mask[i];
+   /*
+* GFX pipe 0 queue 0 is being used by kernel
+* Set GFX pipe 0 queue 1 for MES scheduling
+* GFX pipe 1 can't be used for MES due to HW limitation.
+*/
+   mes_set_hw_res_pkt.gfx_hqd_mask[0] = 0x2;
+   mes_set_hw_res_pkt.gfx_hqd_mask[1] = 0;
 
for (i = 0; i < MAX_SDMA_PIPES; i++)
mes_set_hw_res_pkt.sdma_hqd_mask[i] = mes->sdma_hqd_mask[i];
-- 
2.43.2



[PATCH v9 00/14] AMDGPU usermode queues

2024-04-26 Thread Shashank Sharma
This patch series introduces AMDGPU usermode queues for gfx workloads.
Usermode queues are a method of GPU workload submission into the graphics
hardware without any interaction with kernel/DRM schedulers. In this
method, a userspace graphics application can create its own workqueue and
submit it directly to the GPU HW.

The general idea of how this is supposed to work:
- The application creates the following GPU objects:
  - A queue object to hold the workload packets.
  - A read pointer object.
  - A write pointer object.
  - A doorbell page.
  - Shadow buffer pages.
  - GDS buffer pages (as required).
- The application picks a 32-bit offset in the doorbell page for this
  queue.
- The application uses the usermode_queue_create IOCTL introduced in
  this patch, by passing the GPU addresses of these objects (read ptr,
  write ptr, queue base address, shadow, gds) with doorbell object and
  32-bit doorbell offset in the doorbell page.
- The kernel creates the queue and maps it in the HW.
- The application maps the GPU buffers in process address space.
- The application can start submitting the data in the queue as soon as
  the kernel IOCTL returns.
- After filling the workload data in the queue, the app must write the
  number of dwords added in the queue into the doorbell offset and the
  WPTR buffer, and the GPU will start fetching the data.
- This series adds usermode queue support for all three MES based IPs
  (GFX, SDMA and Compute).
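
The submission flow in the steps above can be modeled as a toy ring buffer
(all names here are hypothetical; a real doorbell is an MMIO write into the
mapped doorbell page, not a struct field): userspace fills the queue, updates
the WPTR buffer, writes the same value to its doorbell slot, and the hardware
side fetches packets up to that point:

```c
#include <assert.h>
#include <stdint.h>

#define QUEUE_DWORDS 64

/* Toy model of the userqueue objects listed above. */
struct user_queue {
    uint32_t ring[QUEUE_DWORDS]; /* queue object */
    uint64_t rptr;               /* read pointer object */
    uint64_t wptr;               /* write pointer object */
    uint64_t doorbell;           /* 32-bit doorbell slot, modeled as u64 */
};

/* Userspace side: copy packets in, update WPTR, then ring the doorbell. */
static void uq_submit(struct user_queue *q, const uint32_t *pkt, int ndw)
{
    for (int i = 0; i < ndw; i++)
        q->ring[(q->wptr + i) % QUEUE_DWORDS] = pkt[i];
    q->wptr += ndw;        /* update the WPTR buffer... */
    q->doorbell = q->wptr; /* ...then write the doorbell offset */
}

/* Hardware side: consume everything up to the doorbell value. */
static int uq_hw_fetch(struct user_queue *q)
{
    int fetched = (int)(q->doorbell - q->rptr);
    q->rptr = q->doorbell;
    return fetched;
}
```

Note the ordering: WPTR is updated before the doorbell write, so by the time
the hardware wakes up, the queue contents it will fetch are already visible.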

libDRM changes for this series and a sample DRM test program can be found
in the MESA merge request here:
https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/287

Alex Deucher (1):
  drm/amdgpu: UAPI for user queue management

Arvind Yadav (1):
  drm/amdgpu: enable compute/gfx usermode queue

Shashank Sharma (12):
  drm/amdgpu: add usermode queue base code
  drm/amdgpu: add new IOCTL for usermode queue
  drm/amdgpu: add helpers to create userqueue object
  drm/amdgpu: create MES-V11 usermode queue for GFX
  drm/amdgpu: create context space for usermode queue
  drm/amdgpu: map usermode queue into MES
  drm/amdgpu: map wptr BO into GART
  drm/amdgpu: generate doorbell index for userqueue
  drm/amdgpu: cleanup leftover queues
  drm/amdgpu: fix MES GFX mask
  drm/amdgpu: enable SDMA usermode queues
  drm/amdgpu: add kernel config for gfx-userqueue

 drivers/gpu/drm/amd/amdgpu/Kconfig|   8 +
 drivers/gpu/drm/amd/amdgpu/Makefile   |   7 +
 drivers/gpu/drm/amd/amdgpu/amdgpu.h   |   3 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c   |   2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c   |   6 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c   |   3 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h   |   1 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 296 
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c|  10 +
 drivers/gpu/drm/amd/amdgpu/mes_v10_1.c|   9 +-
 drivers/gpu/drm/amd/amdgpu/mes_v11_0.c|   9 +-
 .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  | 317 ++
 drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c|   6 +
 .../gpu/drm/amd/include/amdgpu_userqueue.h|  79 +
 include/uapi/drm/amdgpu_drm.h | 111 ++
 15 files changed, 859 insertions(+), 8 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
 create mode 100644 drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
 create mode 100644 drivers/gpu/drm/amd/include/amdgpu_userqueue.h

-- 
2.43.2



[PATCH v9 02/14] drm/amdgpu: add usermode queue base code

2024-04-26 Thread Shashank Sharma
This patch adds skeleton code for the amdgpu usermode queue.
It contains:
- A new file with init functions for usermode queues.
- A queue context manager in driver private data.

V1: Worked on design review comments from RFC patch series:
(https://patchwork.freedesktop.org/series/112214/)
- Alex: Keep a list of queues, instead of single queue per process.
- Christian: Use the queue manager instead of global ptrs,
   Don't keep the queue structure in amdgpu_ctx

V2:
 - Reformatted code, split the big patch into two

V3:
- Integration with doorbell manager

V4:
- Align the structure member names to the largest member's column
  (Luben)
- Added SPDX license (Luben)

V5:
- Do not add amdgpu.h in amdgpu_userqueue.h (Christian).
- Move struct amdgpu_userq_mgr into amdgpu_userqueue.h (Christian).

V6: Rebase
V9: Rebase

Cc: Alex Deucher 
Cc: Christian Koenig 
Reviewed-by: Christian König 
Signed-off-by: Shashank Sharma 
---
 drivers/gpu/drm/amd/amdgpu/Makefile   |  2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu.h   |  3 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c   |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c   |  6 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 40 
 .../gpu/drm/amd/include/amdgpu_userqueue.h| 61 +++
 6 files changed, 113 insertions(+)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
 create mode 100644 drivers/gpu/drm/amd/include/amdgpu_userqueue.h

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
b/drivers/gpu/drm/amd/amdgpu/Makefile
index 4536c8ad0e11..05a2d1714070 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -260,6 +260,8 @@ amdgpu-y += \
 # add amdkfd interfaces
 amdgpu-y += amdgpu_amdkfd.o
 
+# add usermode queue
+amdgpu-y += amdgpu_userqueue.o
 
 ifneq ($(CONFIG_HSA_AMD),)
 AMDKFD_PATH := ../amdkfd
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index b3b84647207e..4ca14b02668b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -112,6 +112,7 @@
 #include "amdgpu_xcp.h"
 #include "amdgpu_seq64.h"
 #include "amdgpu_reg_state.h"
+#include "amdgpu_userqueue.h"
 
 #define MAX_GPU_INSTANCE   64
 
@@ -477,6 +478,7 @@ struct amdgpu_fpriv {
struct mutex		bo_list_lock;
struct idr  bo_list_handles;
struct amdgpu_ctx_mgr   ctx_mgr;
+   struct amdgpu_userq_mgr userq_mgr;
/** GPU partition selection */
uint32_t		xcp_id;
 };
@@ -1039,6 +1041,7 @@ struct amdgpu_device {
bool			enable_mes_kiq;
struct amdgpu_mes   mes;
struct amdgpu_mqd   mqds[AMDGPU_HW_IP_NUM];
+   const struct amdgpu_userq_funcs *userq_funcs[AMDGPU_HW_IP_NUM];
 
/* df */
 	struct amdgpu_df	df;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index e4277298cf1a..374970984a61 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -50,6 +50,7 @@
 #include "amdgpu_reset.h"
 #include "amdgpu_sched.h"
 #include "amdgpu_xgmi.h"
+#include "amdgpu_userqueue.h"
 #include "../amdxcp/amdgpu_xcp_drv.h"
 
 /*
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index a2df3025a754..d78b06af834e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -44,6 +44,7 @@
 #include "amdgpu_display.h"
 #include "amdgpu_ras.h"
 #include "amd_pcie.h"
+#include "amdgpu_userqueue.h"
 
 void amdgpu_unregister_gpu_instance(struct amdgpu_device *adev)
 {
@@ -1388,6 +1389,10 @@ int amdgpu_driver_open_kms(struct drm_device *dev, 
struct drm_file *file_priv)
 
amdgpu_ctx_mgr_init(&fpriv->ctx_mgr, adev);
 
+   r = amdgpu_userq_mgr_init(&fpriv->userq_mgr, adev);
+   if (r)
+   DRM_WARN("Can't setup usermode queues, use legacy workload 
submission only\n");
+
file_priv->driver_priv = fpriv;
goto out_suspend;
 
@@ -1457,6 +1462,7 @@ void amdgpu_driver_postclose_kms(struct drm_device *dev,
 
amdgpu_ctx_mgr_fini(&fpriv->ctx_mgr);
amdgpu_vm_fini(adev, &fpriv->vm);
+   amdgpu_userq_mgr_fini(&fpriv->userq_mgr);
 
if (pasid)
amdgpu_pasid_free_delayed(pd->tbo.base.resv, pasid);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
new file mode 100644
index ..effc0c7c02cf
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -0,0 +1,40 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright 2023 Advanced Micro Devices, Inc.
+ *
+ * Permission is

[PATCH v9 01/14] drm/amdgpu: UAPI for user queue management

2024-04-26 Thread Shashank Sharma
From: Alex Deucher 

This patch introduces a new UAPI/IOCTL for usermode graphics
queues. The userspace app fills this structure and requests
the graphics driver to add a graphics work queue for it. The
output of this UAPI is a queue id.

This UAPI maps the queue into GPU, so the graphics app can start
submitting work to the queue as soon as the call returns.

V2: Addressed review comments from Alex and Christian
- Make the doorbell offset's comment clearer
- Change the output parameter name to queue_id

V3: Integration with doorbell manager

V4:
- Updated the UAPI doc (Pierre-Eric)
- Created a Union for engine specific MQDs (Alex)
- Added Christian's R-B
V5:
- Add variables for GDS and CSA in MQD structure (Alex)
- Make MQD data a ptr-size pair instead of union (Alex)

V9:
   - renamed struct drm_amdgpu_userq_mqd_gfx_v11 to struct
 drm_amdgpu_userq_mqd as it's being used for SDMA and
 compute queues as well

Cc: Alex Deucher 
Cc: Christian Koenig 
Reviewed-by: Christian König 
Signed-off-by: Alex Deucher 
Signed-off-by: Shashank Sharma 
---
 include/uapi/drm/amdgpu_drm.h | 110 ++
 1 file changed, 110 insertions(+)

diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index 96e32dafd4f0..22f56a30f7cb 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -54,6 +54,7 @@ extern "C" {
 #define DRM_AMDGPU_VM  0x13
 #define DRM_AMDGPU_FENCE_TO_HANDLE 0x14
 #define DRM_AMDGPU_SCHED   0x15
+#define DRM_AMDGPU_USERQ   0x16
 
 #define DRM_IOCTL_AMDGPU_GEM_CREATEDRM_IOWR(DRM_COMMAND_BASE + 
DRM_AMDGPU_GEM_CREATE, union drm_amdgpu_gem_create)
 #define DRM_IOCTL_AMDGPU_GEM_MMAP  DRM_IOWR(DRM_COMMAND_BASE + 
DRM_AMDGPU_GEM_MMAP, union drm_amdgpu_gem_mmap)
@@ -71,6 +72,7 @@ extern "C" {
 #define DRM_IOCTL_AMDGPU_VMDRM_IOWR(DRM_COMMAND_BASE + 
DRM_AMDGPU_VM, union drm_amdgpu_vm)
 #define DRM_IOCTL_AMDGPU_FENCE_TO_HANDLE DRM_IOWR(DRM_COMMAND_BASE + 
DRM_AMDGPU_FENCE_TO_HANDLE, union drm_amdgpu_fence_to_handle)
 #define DRM_IOCTL_AMDGPU_SCHED DRM_IOW(DRM_COMMAND_BASE + 
DRM_AMDGPU_SCHED, union drm_amdgpu_sched)
+#define DRM_IOCTL_AMDGPU_USERQ DRM_IOW(DRM_COMMAND_BASE + 
DRM_AMDGPU_USERQ, union drm_amdgpu_userq)
 
 /**
  * DOC: memory domains
@@ -317,6 +319,114 @@ union drm_amdgpu_ctx {
union drm_amdgpu_ctx_out out;
 };
 
+/* user queue IOCTL */
+#define AMDGPU_USERQ_OP_CREATE 1
+#define AMDGPU_USERQ_OP_FREE   2
+
+/* Flag to indicate secure buffer related workload, unused for now */
+#define AMDGPU_USERQ_MQD_FLAGS_SECURE  (1 << 0)
+/* Flag to indicate AQL workload, unused for now */
+#define AMDGPU_USERQ_MQD_FLAGS_AQL (1 << 1)
+
+/*
+ * MQD (memory queue descriptor) is a set of parameters which allow
+ * the GPU to uniquely define and identify a usermode queue. This
+ * structure defines the MQD for GFX-V11 IP ver 0.
+ */
+struct drm_amdgpu_userq_mqd {
+   /**
+* @queue_va: Virtual address of the GPU memory which holds the queue
+* object. The queue holds the workload packets.
+*/
+   __u64   queue_va;
+   /**
+* @queue_size: Size of the queue in bytes, this needs to be 256-byte
+* aligned.
+*/
+   __u64   queue_size;
+   /**
+* @rptr_va : Virtual address of the GPU memory which holds the ring 
RPTR.
+* This object must be at least 8 bytes in size and aligned to an 8-byte 
offset.
+*/
+   __u64   rptr_va;
+   /**
+* @wptr_va : Virtual address of the GPU memory which holds the ring 
WPTR.
+* This object must be at least 8 bytes in size and aligned to an 8-byte 
offset.
+*
+* Queue, RPTR and WPTR can come from the same object, as long as the 
size
+* and alignment related requirements are met.
+*/
+   __u64   wptr_va;
+   /**
+* @shadow_va: Virtual address of the GPU memory to hold the shadow 
buffer.
+* This must be from a separate GPU object, and must be at least 
4-page
+* sized.
+*/
+   __u64   shadow_va;
+   /**
+* @gds_va: Virtual address of the GPU memory to hold the GDS buffer.
+* This must be from a separate GPU object, and must be at least 
1-page
+* sized.
+*/
+   __u64   gds_va;
+   /**
+* @csa_va: Virtual address of the GPU memory to hold the CSA buffer.
+* This must be from a separate GPU object, and must be at least 
1-page
+* sized.
+*/
+   __u64   csa_va;
+};
+
+struct drm_amdgpu_userq_in {
+   /** AMDGPU_USERQ_OP_* */
+   __u32   op;
+   /** Queue handle for USERQ_OP_FREE */
+   __u32   queue_id;
+   /** the target GPU engine to execute workload (AMDGPU_HW_IP_*) */
+   __u32   ip_type;
+   /**
+* @flags: flags to indicate special function for queue like secu

[PATCH v9 06/14] drm/amdgpu: create context space for usermode queue

2024-04-26 Thread Shashank Sharma
The FW expects us to allocate at least one page of context
space each for process, gang, GDS and FW related work.
This patch creates one joint object for these spaces and
calculates the GPU address offsets of each of them.

V1: Addressed review comments on RFC patch:
Alex: Make this function IP specific

V2: Addressed review comments from Christian
- Allocate only one object for total FW space, and calculate
  offsets for each of these objects.

V3: Integration with doorbell manager

V4: Review comments:
- Remove shadow from FW space list from cover letter (Alex)
- Alignment of macro (Luben)

V5: Merged patches 5 and 6 into this single patch
Addressed review comments:
- Use lower_32_bits instead of mask (Christian)
- gfx_v11_0 instead of gfx_v11 in function names (Alex)
- Shadow and GDS objects are now coming from userspace (Christian,
  Alex)

V6:
- Add a comment to replace amdgpu_bo_create_kernel() with
  amdgpu_bo_create() during fw_ctx object creation (Christian).
- Move proc_ctx_gpu_addr, gang_ctx_gpu_addr and fw_ctx_gpu_addr out
  of generic queue structure and make it gen11 specific (Alex).

V7:
   - Using helper function to create/destroy userqueue objects.
   - Removed FW object space allocation.

V8:
   - Updating FW object address from user values.

V9:
   - updated function name from gfx_v11_* to mes_v11_*

Cc: Alex Deucher 
Cc: Christian Koenig 
Signed-off-by: Shashank Sharma 
Signed-off-by: Arvind Yadav 
---
 .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  | 43 +++
 .../gpu/drm/amd/include/amdgpu_userqueue.h|  1 +
 2 files changed, 44 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
index 9e7dee77d344..9f9fdcb9c294 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
@@ -27,6 +27,41 @@
 #include "mes_v11_0.h"
 #include "amdgpu_userqueue.h"
 
+#define AMDGPU_USERQ_PROC_CTX_SZ PAGE_SIZE
+#define AMDGPU_USERQ_GANG_CTX_SZ PAGE_SIZE
+
+static int mes_v11_0_userq_create_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
+   struct amdgpu_usermode_queue *queue,
+   struct drm_amdgpu_userq_mqd 
*mqd_user)
+{
+   struct amdgpu_userq_obj *ctx = &queue->fw_obj;
+   struct v11_gfx_mqd *mqd = queue->mqd.cpu_ptr;
+   int r, size;
+
+   /*
+* The FW expects at least one page space allocated for
+* process ctx and gang ctx each. Create an object
+* for the same.
+*/
+   size = AMDGPU_USERQ_PROC_CTX_SZ + AMDGPU_USERQ_GANG_CTX_SZ;
+   r = amdgpu_userqueue_create_object(uq_mgr, ctx, size);
+   if (r) {
+   DRM_ERROR("Failed to allocate ctx space bo for userqueue, 
err:%d\n", r);
+   return r;
+   }
+
+   /* Shadow and GDS objects come directly from userspace */
+   mqd->shadow_base_lo = mqd_user->shadow_va & 0xFFFFFFFC;
+   mqd->shadow_base_hi = upper_32_bits(mqd_user->shadow_va);
+
+   mqd->gds_bkup_base_lo = mqd_user->gds_va & 0xFFFFFFFC;
+   mqd->gds_bkup_base_hi = upper_32_bits(mqd_user->gds_va);
+
+   mqd->fw_work_area_base_lo = mqd_user->csa_va & 0xFFFFFFFC;
+   mqd->fw_work_area_base_hi = upper_32_bits(mqd_user->csa_va);
+   return 0;
+}
+
 static int mes_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
  struct drm_amdgpu_userq_in *args_in,
  struct amdgpu_usermode_queue *queue)
@@ -82,6 +117,13 @@ static int mes_v11_0_userq_mqd_create(struct 
amdgpu_userq_mgr *uq_mgr,
goto free_mqd;
}
 
+   /* Create BO for FW operations */
+   r = mes_v11_0_userq_create_ctx_space(uq_mgr, queue, mqd_user);
+   if (r) {
+   DRM_ERROR("Failed to allocate BO for userqueue (%d)", r);
+   goto free_mqd;
+   }
+
return 0;
 
 free_mqd:
@@ -100,6 +142,7 @@ static void
 mes_v11_0_userq_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr,
struct amdgpu_usermode_queue *queue)
 {
+   amdgpu_userqueue_destroy_object(uq_mgr, &queue->fw_obj);
kfree(queue->userq_prop);
amdgpu_userqueue_destroy_object(uq_mgr, &queue->mqd);
 }
diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h 
b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
index bbd29f68b8d4..643f31474bd8 100644
--- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
+++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
@@ -44,6 +44,7 @@ struct amdgpu_usermode_queue {
struct amdgpu_userq_mgr *userq_mgr;
 	struct amdgpu_vm	*vm;
struct amdgpu_userq_obj mqd;
+   struct amdgpu_userq_obj fw_obj;
 };
 
 struct amdgpu_userq_funcs {
-- 
2.43.2



[PATCH v9 04/14] drm/amdgpu: add helpers to create userqueue object

2024-04-26 Thread Shashank Sharma
This patch introduces amdgpu_userqueue_object and helper
functions to create and destroy this object. The helpers
create/destroy a base amdgpu_bo, kmap/unmap it and save the
respective GPU and CPU addresses in the encapsulating
userqueue object.

These helpers will be used to create/destroy userqueue MQD, WPTR
and FW areas.

V7:
- Forked out this new patch from V11-gfx-userqueue patch to prevent
  that patch from growing very big.
- Using amdgpu_bo_create instead of amdgpu_bo_create_kernel in prep
  for eviction fences (Christian)

V9:
 - Rebase

Cc: Alex Deucher 
Cc: Christian Koenig 
Signed-off-by: Shashank Sharma 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 62 +++
 .../gpu/drm/amd/include/amdgpu_userqueue.h| 13 
 2 files changed, 75 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
index df97b856f891..65cab0ad97a1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -32,6 +32,68 @@ amdgpu_userqueue_find(struct amdgpu_userq_mgr *uq_mgr, int 
qid)
return idr_find(&uq_mgr->userq_idr, qid);
 }
 
+int amdgpu_userqueue_create_object(struct amdgpu_userq_mgr *uq_mgr,
+  struct amdgpu_userq_obj *userq_obj,
+  int size)
+{
+   struct amdgpu_device *adev = uq_mgr->adev;
+   struct amdgpu_bo_param bp;
+   int r;
+
+   memset(&bp, 0, sizeof(bp));
+   bp.byte_align = PAGE_SIZE;
+   bp.domain = AMDGPU_GEM_DOMAIN_GTT;
+   bp.flags = AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS |
+  AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED;
+   bp.type = ttm_bo_type_kernel;
+   bp.size = size;
+   bp.resv = NULL;
+   bp.bo_ptr_size = sizeof(struct amdgpu_bo);
+
+   r = amdgpu_bo_create(adev, &bp, &userq_obj->obj);
+   if (r) {
+   DRM_ERROR("Failed to allocate BO for userqueue (%d)", r);
+   return r;
+   }
+
+   r = amdgpu_bo_reserve(userq_obj->obj, true);
+   if (r) {
+   DRM_ERROR("Failed to reserve BO to map (%d)", r);
+   goto free_obj;
+   }
+
+   r = amdgpu_ttm_alloc_gart(&(userq_obj->obj)->tbo);
+   if (r) {
+   DRM_ERROR("Failed to alloc GART for userqueue object (%d)", r);
+   goto unresv;
+   }
+
+   r = amdgpu_bo_kmap(userq_obj->obj, &userq_obj->cpu_ptr);
+   if (r) {
+   DRM_ERROR("Failed to map BO for userqueue (%d)", r);
+   goto unresv;
+   }
+
+   userq_obj->gpu_addr = amdgpu_bo_gpu_offset(userq_obj->obj);
+   amdgpu_bo_unreserve(userq_obj->obj);
+   memset(userq_obj->cpu_ptr, 0, size);
+   return 0;
+
+unresv:
+   amdgpu_bo_unreserve(userq_obj->obj);
+
+free_obj:
+   amdgpu_bo_unref(&userq_obj->obj);
+   return r;
+}
+
+void amdgpu_userqueue_destroy_object(struct amdgpu_userq_mgr *uq_mgr,
+  struct amdgpu_userq_obj *userq_obj)
+{
+   amdgpu_bo_kunmap(userq_obj->obj);
+   amdgpu_bo_unref(&userq_obj->obj);
+}
+
 static int
 amdgpu_userqueue_destroy(struct drm_file *filp, int queue_id)
 {
diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h 
b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
index b739274c72e1..bbd29f68b8d4 100644
--- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
+++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
@@ -29,6 +29,12 @@
 
 struct amdgpu_mqd_prop;
 
+struct amdgpu_userq_obj {
+   void *cpu_ptr;
+   uint64_t gpu_addr;
+   struct amdgpu_bo *obj;
+};
+
 struct amdgpu_usermode_queue {
int queue_type;
 	uint64_t		doorbell_handle;
@@ -37,6 +43,7 @@ struct amdgpu_usermode_queue {
struct amdgpu_mqd_prop  *userq_prop;
struct amdgpu_userq_mgr *userq_mgr;
 	struct amdgpu_vm	*vm;
+   struct amdgpu_userq_obj mqd;
 };
 
 struct amdgpu_userq_funcs {
@@ -60,4 +67,10 @@ int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr 
*userq_mgr, struct amdgpu_devi
 
 void amdgpu_userq_mgr_fini(struct amdgpu_userq_mgr *userq_mgr);
 
+int amdgpu_userqueue_create_object(struct amdgpu_userq_mgr *uq_mgr,
+  struct amdgpu_userq_obj *userq_obj,
+  int size);
+
+void amdgpu_userqueue_destroy_object(struct amdgpu_userq_mgr *uq_mgr,
+struct amdgpu_userq_obj *userq_obj);
 #endif
-- 
2.43.2



[PATCH v9 05/14] drm/amdgpu: create MES-V11 usermode queue for GFX

2024-04-26 Thread Shashank Sharma
A memory queue descriptor (MQD) defines a userqueue in the
HW's context. As the MQD format can vary between graphics
IPs, we need GFX generation specific handlers to create MQDs.

This patch:
- Adds a new file which will be used for MES based userqueue
  functions targeting GFX and SDMA IP.
- Introduces MQD handler functions for the usermode queues.
- Adds new functions to create and destroy userqueue MQD for
  MES-V11 for GFX IP.

V1: Worked on review comments from Alex:
- Make MQD functions GEN and IP specific

V2: Worked on review comments from Alex:
- Reuse the existing adev->mqd[ip] for MQD creation
- Formatting and arrangement of code

V3:
- Integration with doorbell manager

V4: Review comments addressed:
- Do not create a new file for userq, reuse gfx_v11_0.c (Alex)
- Align name of structure members (Luben)
- Don't break up the Cc tag list and the Sob tag list in commit
  message (Luben)
V5:
   - No need to reserve the bo for MQD (Christian).
   - Some more changes to support IP specific MQD creation.

V6:
   - Add a comment reminding us to replace the amdgpu_bo_create_kernel()
 calls while creating MQD object to amdgpu_bo_create() once eviction
 fences are ready (Christian).

V7:
   - Re-arrange userqueue functions in adev instead of uq_mgr (Alex)
   - Use memdup_user instead of copy_from_user (Christian)

V9:
   - Moved userqueue code from gfx_v11_0.c to new file mes_v11_0.c so
 that it can be reused for SDMA userqueues as well (Shashank, Alex)

Cc: Alex Deucher 
Cc: Christian Koenig 
Signed-off-by: Shashank Sharma 
Signed-off-by: Arvind Yadav 
---
 drivers/gpu/drm/amd/amdgpu/Makefile   |   3 +-
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c|   4 +
 .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  | 110 ++
 3 files changed, 116 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
b/drivers/gpu/drm/amd/amdgpu/Makefile
index 05a2d1714070..a640bfa468ad 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -184,7 +184,8 @@ amdgpu-y += \
 amdgpu-y += \
amdgpu_mes.o \
mes_v10_1.o \
-   mes_v11_0.o
+   mes_v11_0.o \
+   mes_v11_0_userqueue.o
 
 # add UVD block
 amdgpu-y += \
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
index f7325b02a191..525bd0f4d3f7 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
@@ -1331,6 +1331,8 @@ static int gfx_v11_0_rlc_backdoor_autoload_enable(struct 
amdgpu_device *adev)
return 0;
 }
 
+extern const struct amdgpu_userq_funcs userq_mes_v11_0_funcs;
+
 static int gfx_v11_0_sw_init(void *handle)
 {
int i, j, k, r, ring_id = 0;
@@ -1347,6 +1349,7 @@ static int gfx_v11_0_sw_init(void *handle)
adev->gfx.mec.num_mec = 2;
adev->gfx.mec.num_pipe_per_mec = 4;
adev->gfx.mec.num_queue_per_pipe = 4;
+   adev->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_mes_v11_0_funcs;
break;
case IP_VERSION(11, 0, 1):
case IP_VERSION(11, 0, 4):
@@ -1358,6 +1361,7 @@ static int gfx_v11_0_sw_init(void *handle)
adev->gfx.mec.num_mec = 1;
adev->gfx.mec.num_pipe_per_mec = 4;
adev->gfx.mec.num_queue_per_pipe = 4;
+   adev->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_mes_v11_0_funcs;
break;
default:
adev->gfx.me.num_me = 1;
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
new file mode 100644
index ..9e7dee77d344
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
@@ -0,0 +1,110 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright 2024 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHER

[PATCH v9 03/14] drm/amdgpu: add new IOCTL for usermode queue

2024-04-26 Thread Shashank Sharma
This patch adds:
- A new IOCTL function to create and destroy usermode queues.
- A new structure to keep all the user queue data in one place.
- A function to generate a unique index for the queue.

V1: Worked on review comments from RFC patch series:
  - Alex: Keep a list of queues, instead of single queue per process.
  - Christian: Use the queue manager instead of global ptrs,
   Don't keep the queue structure in amdgpu_ctx

V2: Worked on review comments:
 - Christian:
   - Formatting of text
   - There is no need for queuing of userqueues, with idr in place
 - Alex:
   - Remove use_doorbell, its unnecessary
   - Reuse amdgpu_mqd_props for saving mqd fields

 - Code formatting and re-arrangement

V3:
 - Integration with doorbell manager

V4:
 - Accommodate MQD union related changes in UAPI (Alex)
 - Do not set the queue size twice (Bas)

V5:
 - Remove wrapper functions for queue indexing (Christian)
 - Do not save the queue id/idr in queue itself (Christian)
 - Move the idr allocation in the IP independent generic space
  (Christian)

V6:
 - Check the validity of input IP type (Christian)

V7:
 - Move uq_func from uq_mgr to adev (Alex)
 - Add missing free(queue) for error cases (Yifan)

V9:
 - Rebase

Cc: Alex Deucher 
Cc: Christian Koenig 
Signed-off-by: Shashank Sharma 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c   |   1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 121 ++
 .../gpu/drm/amd/include/amdgpu_userqueue.h|   2 +
 3 files changed, 124 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 374970984a61..acee1c279abb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -2916,6 +2916,7 @@ const struct drm_ioctl_desc amdgpu_ioctls_kms[] = {
DRM_IOCTL_DEF_DRV(AMDGPU_GEM_VA, amdgpu_gem_va_ioctl, 
DRM_AUTH|DRM_RENDER_ALLOW),
DRM_IOCTL_DEF_DRV(AMDGPU_GEM_OP, amdgpu_gem_op_ioctl, 
DRM_AUTH|DRM_RENDER_ALLOW),
DRM_IOCTL_DEF_DRV(AMDGPU_GEM_USERPTR, amdgpu_gem_userptr_ioctl, 
DRM_AUTH|DRM_RENDER_ALLOW),
+   DRM_IOCTL_DEF_DRV(AMDGPU_USERQ, amdgpu_userq_ioctl, 
DRM_AUTH|DRM_RENDER_ALLOW),
 };
 
 static const struct drm_driver amdgpu_kms_driver = {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
index effc0c7c02cf..df97b856f891 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -23,6 +23,127 @@
  */
 
 #include "amdgpu.h"
+#include "amdgpu_vm.h"
+#include "amdgpu_userqueue.h"
+
+static struct amdgpu_usermode_queue *
+amdgpu_userqueue_find(struct amdgpu_userq_mgr *uq_mgr, int qid)
+{
+   return idr_find(&uq_mgr->userq_idr, qid);
+}
+
+static int
+amdgpu_userqueue_destroy(struct drm_file *filp, int queue_id)
+{
+   struct amdgpu_fpriv *fpriv = filp->driver_priv;
+   struct amdgpu_userq_mgr *uq_mgr = &fpriv->userq_mgr;
+   struct amdgpu_device *adev = uq_mgr->adev;
+   const struct amdgpu_userq_funcs *uq_funcs;
+   struct amdgpu_usermode_queue *queue;
+
+   mutex_lock(&uq_mgr->userq_mutex);
+
+   queue = amdgpu_userqueue_find(uq_mgr, queue_id);
+   if (!queue) {
+   DRM_DEBUG_DRIVER("Invalid queue id to destroy\n");
+   mutex_unlock(&uq_mgr->userq_mutex);
+   return -EINVAL;
+   }
+
+   uq_funcs = adev->userq_funcs[queue->queue_type];
+   uq_funcs->mqd_destroy(uq_mgr, queue);
+   idr_remove(&uq_mgr->userq_idr, queue_id);
+   kfree(queue);
+
+   mutex_unlock(&uq_mgr->userq_mutex);
+   return 0;
+}
+
+static int
+amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq *args)
+{
+   struct amdgpu_fpriv *fpriv = filp->driver_priv;
+   struct amdgpu_userq_mgr *uq_mgr = &fpriv->userq_mgr;
+   struct amdgpu_device *adev = uq_mgr->adev;
+   const struct amdgpu_userq_funcs *uq_funcs;
+   struct amdgpu_usermode_queue *queue;
+   int qid, r = 0;
+
+   /* Usermode queues are only supported for GFX/SDMA engines as of now */
+   if (args->in.ip_type != AMDGPU_HW_IP_GFX) {
+   DRM_ERROR("Usermode queue doesn't support IP type %u\n", 
args->in.ip_type);
+   return -EINVAL;
+   }
+
+   mutex_lock(&uq_mgr->userq_mutex);
+
+   uq_funcs = adev->userq_funcs[args->in.ip_type];
+   if (!uq_funcs) {
+   DRM_ERROR("Usermode queue is not supported for this IP (%u)\n", 
args->in.ip_type);
+   r = -EINVAL;
+   goto unlock;
+   }
+
+   queue = kzalloc(sizeof(struct amdgpu_usermode_queue), GFP_KERNEL);
+   if (!queue) {
+   DRM_ERROR("Failed to allocate memory for queue\n");
+   r = -ENOMEM;
+   goto unlock;
+   }
+   que

[PATCH v9 08/14] drm/amdgpu: map wptr BO into GART

2024-04-26 Thread Shashank Sharma
To support oversubscription, the MES FW expects WPTR BOs to
be mapped into GART before the queues are mapped in the HW.
This patch adds a function for the same.

V4: fix the wptr value before mapping lookup (Bas, Christian).

V5: Addressed review comments from Christian:
- Either pin object or allocate from GART, but not both.
- All the handling must be done with the VM locks held.

V7: Addressed review comments from Christian:
- Do not take vm->eviction_lock
- Use amdgpu_bo_gpu_offset to get the wptr_bo GPU offset

V8: Rebase
V9: Changed the function names from gfx_v11* to mes_v11*

Cc: Alex Deucher 
Cc: Christian Koenig 
Signed-off-by: Shashank Sharma 
Signed-off-by: Arvind Yadav 
---
 .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  | 77 +++
 .../gpu/drm/amd/include/amdgpu_userqueue.h|  1 +
 2 files changed, 78 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
index 8d2cd61af26b..37b80626e792 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
@@ -30,6 +30,74 @@
 #define AMDGPU_USERQ_PROC_CTX_SZ PAGE_SIZE
 #define AMDGPU_USERQ_GANG_CTX_SZ PAGE_SIZE
 
+static int
+mes_v11_0_map_gtt_bo_to_gart(struct amdgpu_device *adev, struct amdgpu_bo *bo)
+{
+   int ret;
+
+   ret = amdgpu_bo_reserve(bo, true);
+   if (ret) {
+   DRM_ERROR("Failed to reserve bo. ret %d\n", ret);
+   goto err_reserve_bo_failed;
+   }
+
+   ret = amdgpu_ttm_alloc_gart(&bo->tbo);
+   if (ret) {
+   DRM_ERROR("Failed to bind bo to GART. ret %d\n", ret);
+   goto err_map_bo_gart_failed;
+   }
+
+   amdgpu_bo_unreserve(bo);
+   bo = amdgpu_bo_ref(bo);
+
+   return 0;
+
+err_map_bo_gart_failed:
+   amdgpu_bo_unreserve(bo);
+err_reserve_bo_failed:
+   return ret;
+}
+
+static int
+mes_v11_0_create_wptr_mapping(struct amdgpu_userq_mgr *uq_mgr,
+ struct amdgpu_usermode_queue *queue,
+ uint64_t wptr)
+{
+   struct amdgpu_device *adev = uq_mgr->adev;
+   struct amdgpu_bo_va_mapping *wptr_mapping;
+   struct amdgpu_vm *wptr_vm;
+   struct amdgpu_userq_obj *wptr_obj = &queue->wptr_obj;
+   int ret;
+
+   wptr_vm = queue->vm;
+   ret = amdgpu_bo_reserve(wptr_vm->root.bo, false);
+   if (ret)
+   return ret;
+
+   wptr &= AMDGPU_GMC_HOLE_MASK;
+   wptr_mapping = amdgpu_vm_bo_lookup_mapping(wptr_vm, wptr >> PAGE_SHIFT);
+   amdgpu_bo_unreserve(wptr_vm->root.bo);
+   if (!wptr_mapping) {
+   DRM_ERROR("Failed to lookup wptr bo\n");
+   return -EINVAL;
+   }
+
+   wptr_obj->obj = wptr_mapping->bo_va->base.bo;
+   if (wptr_obj->obj->tbo.base.size > PAGE_SIZE) {
+   DRM_ERROR("Requested GART mapping for wptr bo larger than one 
page\n");
+   return -EINVAL;
+   }
+
+   ret = mes_v11_0_map_gtt_bo_to_gart(adev, wptr_obj->obj);
+   if (ret) {
+   DRM_ERROR("Failed to map wptr bo to GART\n");
+   return ret;
+   }
+
+   queue->wptr_obj.gpu_addr = amdgpu_bo_gpu_offset_no_check(wptr_obj->obj);
+   return 0;
+}
+
 static int mes_v11_0_userq_map(struct amdgpu_userq_mgr *uq_mgr,
   struct amdgpu_usermode_queue *queue,
   struct amdgpu_mqd_prop *userq_props)
@@ -61,6 +129,7 @@ static int mes_v11_0_userq_map(struct amdgpu_userq_mgr 
*uq_mgr,
queue_input.queue_size = userq_props->queue_size >> 2;
queue_input.doorbell_offset = userq_props->doorbell_index;
queue_input.page_table_base_addr = 
amdgpu_gmc_pd_addr(queue->vm->root.bo);
+   queue_input.wptr_mc_addr = queue->wptr_obj.gpu_addr;
 
amdgpu_mes_lock(&adev->mes);
r = adev->mes.funcs->add_hw_queue(&adev->mes, &queue_input);
@@ -187,6 +256,13 @@ static int mes_v11_0_userq_mqd_create(struct 
amdgpu_userq_mgr *uq_mgr,
goto free_mqd;
}
 
+   /* FW expects WPTR BOs to be mapped into GART */
+   r = mes_v11_0_create_wptr_mapping(uq_mgr, queue, 
userq_props->wptr_gpu_addr);
+   if (r) {
+   DRM_ERROR("Failed to create WPTR mapping\n");
+   goto free_ctx;
+   }
+
/* Map userqueue into FW using MES */
r = mes_v11_0_userq_map(uq_mgr, queue, userq_props);
if (r) {
@@ -216,6 +292,7 @@ mes_v11_0_userq_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr,
struct amdgpu_usermode_queue *queue)
 {
mes_v11_0_userq_unmap(uq_mgr, queue);
+   amdgpu_bo_unref(&queue->wptr_obj.obj);
amdgpu_userqueue_destroy_object(uq_mgr, &queue->fw_obj)

[PATCH v9 09/14] drm/amdgpu: generate doorbell index for userqueue

2024-04-26 Thread Shashank Sharma
The userspace sends us the doorbell object and the relative doorbell
index within that object to be used for the usermode queue, but the FW
expects the absolute doorbell index on the PCI BAR in the MQD. This
patch adds a function to convert the relative doorbell index to an
absolute one.

V5: Fix the db object reference leak (Christian)
V6: Pin the doorbell bo in userqueue_create() function, and unpin it
in userqueue destroy (Christian)
V7: Added missing kfree for queue in error cases
Added Alex's R-B
V8: Rebase
V9: Changed the function names from gfx_v11* to mes_v11*

Cc: Alex Deucher 
Cc: Christian Koenig 
Reviewed-by: Alex Deucher 
Signed-off-by: Shashank Sharma 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 59 +++
 .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  |  1 +
 .../gpu/drm/amd/include/amdgpu_userqueue.h|  1 +
 3 files changed, 61 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
index 65cab0ad97a1..fbc7313710f6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -94,6 +94,53 @@ void amdgpu_userqueue_destroy_object(struct amdgpu_userq_mgr 
*uq_mgr,
amdgpu_bo_unref(&userq_obj->obj);
 }
 
+static uint64_t
+amdgpu_userqueue_get_doorbell_index(struct amdgpu_userq_mgr *uq_mgr,
+struct amdgpu_usermode_queue *queue,
+struct drm_file *filp,
+uint32_t doorbell_offset)
+{
+   uint64_t index;
+   struct drm_gem_object *gobj;
+   struct amdgpu_userq_obj *db_obj = &queue->db_obj;
+   int r;
+
+   gobj = drm_gem_object_lookup(filp, queue->doorbell_handle);
+   if (gobj == NULL) {
+   DRM_ERROR("Can't find GEM object for doorbell\n");
+   return -EINVAL;
+   }
+
+   db_obj->obj = amdgpu_bo_ref(gem_to_amdgpu_bo(gobj));
+   drm_gem_object_put(gobj);
+
+   /* Pin the BO before generating the index, unpin in queue destroy */
+   r = amdgpu_bo_pin(db_obj->obj, AMDGPU_GEM_DOMAIN_DOORBELL);
+   if (r) {
+   DRM_ERROR("[Usermode queues] Failed to pin doorbell object\n");
+   goto unref_bo;
+   }
+
+   r = amdgpu_bo_reserve(db_obj->obj, true);
+   if (r) {
+   DRM_ERROR("[Usermode queues] Failed to reserve doorbell object\n");
+   goto unpin_bo;
+   }
+
+   index = amdgpu_doorbell_index_on_bar(uq_mgr->adev, db_obj->obj,
+doorbell_offset, sizeof(u64));
+   DRM_DEBUG_DRIVER("[Usermode queues] doorbell index=%lld\n", index);
+   amdgpu_bo_unreserve(db_obj->obj);
+   return index;
+
+unpin_bo:
+   amdgpu_bo_unpin(db_obj->obj);
+
+unref_bo:
+   amdgpu_bo_unref(&db_obj->obj);
+   return r;
+}
+
 static int
 amdgpu_userqueue_destroy(struct drm_file *filp, int queue_id)
 {
@@ -114,6 +161,8 @@ amdgpu_userqueue_destroy(struct drm_file *filp, int 
queue_id)
 
uq_funcs = adev->userq_funcs[queue->queue_type];
uq_funcs->mqd_destroy(uq_mgr, queue);
+   amdgpu_bo_unpin(queue->db_obj.obj);
+   amdgpu_bo_unref(&queue->db_obj.obj);
idr_remove(&uq_mgr->userq_idr, queue_id);
kfree(queue);
 
@@ -129,6 +178,7 @@ amdgpu_userqueue_create(struct drm_file *filp, union 
drm_amdgpu_userq *args)
struct amdgpu_device *adev = uq_mgr->adev;
const struct amdgpu_userq_funcs *uq_funcs;
struct amdgpu_usermode_queue *queue;
+   uint64_t index;
int qid, r = 0;
 
/* Usermode queues are only supported for GFX/SDMA engines as of now */
@@ -158,6 +208,16 @@ amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq *args)
	queue->flags = args->in.flags;
	queue->vm = &fpriv->vm;
 
+	/* Convert relative doorbell offset into absolute doorbell index */
+	index = amdgpu_userqueue_get_doorbell_index(uq_mgr, queue, filp, args->in.doorbell_offset);
+	if (index == (uint64_t)-EINVAL) {
+		DRM_ERROR("Failed to get doorbell for queue\n");
+		kfree(queue);
+		r = -EINVAL;
+		goto unlock;
+	}
+   queue->doorbell_index = index;
+
r = uq_funcs->mqd_create(uq_mgr, &args->in, queue);
if (r) {
DRM_ERROR("Failed to create Queue\n");
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
index 37b80626e792..a6c3037d2d1f 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
@@ -240,6 +240,7 @@ static int mes_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
userq_props->hqd_base_gpu_addr =

[PATCH v9 07/14] drm/amdgpu: map usermode queue into MES

2024-04-26 Thread Shashank Sharma
This patch adds new functions to map/unmap a usermode queue into
the FW, using the MES ring. As soon as this mapping is done, the
queue is considered ready to accept workloads.

V1: Addressed review comments from Alex on the RFC patch series
- Map/Unmap should be IP specific.
V2:
Addressed review comments from Christian:
- Fix the wptr_mc_addr calculation (moved into another patch)
Addressed review comments from Alex:
- Do not add fptrs for map/unmap

V3: Integration with doorbell manager
V4: Rebase
V5: Use gfx_v11_0 for function names (Alex)
V6: Removed queue->proc/gang/fw_ctx_address variables and doing the
address calculations locally to keep the queue structure GEN
independent (Alex)
V7: Added R-B from Alex
V8: Rebase
V9: Rebase

Cc: Alex Deucher 
Cc: Christian Koenig 
Reviewed-by: Alex Deucher 
Signed-off-by: Shashank Sharma 
---
 .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  | 74 +++
 1 file changed, 74 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
index 9f9fdcb9c294..8d2cd61af26b 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
@@ -30,6 +30,69 @@
 #define AMDGPU_USERQ_PROC_CTX_SZ PAGE_SIZE
 #define AMDGPU_USERQ_GANG_CTX_SZ PAGE_SIZE
 
+static int mes_v11_0_userq_map(struct amdgpu_userq_mgr *uq_mgr,
+  struct amdgpu_usermode_queue *queue,
+  struct amdgpu_mqd_prop *userq_props)
+{
+   struct amdgpu_device *adev = uq_mgr->adev;
+   struct amdgpu_userq_obj *ctx = &queue->fw_obj;
+   struct mes_add_queue_input queue_input;
+   int r;
+
+   memset(&queue_input, 0x0, sizeof(struct mes_add_queue_input));
+
+   queue_input.process_va_start = 0;
+   queue_input.process_va_end = (adev->vm_manager.max_pfn - 1) << AMDGPU_GPU_PAGE_SHIFT;
+
+   /* set process quantum to 10 ms and gang quantum to 1 ms as default */
+   queue_input.process_quantum = 10;
+   queue_input.gang_quantum = 1;
+   queue_input.paging = false;
+
+   queue_input.process_context_addr = ctx->gpu_addr;
+   queue_input.gang_context_addr = ctx->gpu_addr + AMDGPU_USERQ_PROC_CTX_SZ;
+   queue_input.inprocess_gang_priority = AMDGPU_MES_PRIORITY_LEVEL_NORMAL;
+   queue_input.gang_global_priority_level = AMDGPU_MES_PRIORITY_LEVEL_NORMAL;
+
+   queue_input.process_id = queue->vm->pasid;
+   queue_input.queue_type = queue->queue_type;
+   queue_input.mqd_addr = queue->mqd.gpu_addr;
+   queue_input.wptr_addr = userq_props->wptr_gpu_addr;
+   queue_input.queue_size = userq_props->queue_size >> 2;
+   queue_input.doorbell_offset = userq_props->doorbell_index;
+   queue_input.page_table_base_addr = amdgpu_gmc_pd_addr(queue->vm->root.bo);
+
+   amdgpu_mes_lock(&adev->mes);
+   r = adev->mes.funcs->add_hw_queue(&adev->mes, &queue_input);
+   amdgpu_mes_unlock(&adev->mes);
+   if (r) {
+   DRM_ERROR("Failed to map queue in HW, err (%d)\n", r);
+   return r;
+   }
+
+   DRM_DEBUG_DRIVER("Queue (doorbell:%d) mapped successfully\n", 
userq_props->doorbell_index);
+   return 0;
+}
+
+static void mes_v11_0_userq_unmap(struct amdgpu_userq_mgr *uq_mgr,
+ struct amdgpu_usermode_queue *queue)
+{
+   struct amdgpu_device *adev = uq_mgr->adev;
+   struct mes_remove_queue_input queue_input;
+   struct amdgpu_userq_obj *ctx = &queue->fw_obj;
+   int r;
+
+   memset(&queue_input, 0x0, sizeof(struct mes_remove_queue_input));
+   queue_input.doorbell_offset = queue->doorbell_index;
+   queue_input.gang_context_addr = ctx->gpu_addr + AMDGPU_USERQ_PROC_CTX_SZ;
+
+   amdgpu_mes_lock(&adev->mes);
+   r = adev->mes.funcs->remove_hw_queue(&adev->mes, &queue_input);
+   amdgpu_mes_unlock(&adev->mes);
+   if (r)
+   DRM_ERROR("Failed to unmap queue in HW, err (%d)\n", r);
+}
+
 static int mes_v11_0_userq_create_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
					    struct amdgpu_usermode_queue *queue,
					    struct drm_amdgpu_userq_mqd *mqd_user)
@@ -124,8 +187,18 @@ static int mes_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
goto free_mqd;
}
 
+   /* Map userqueue into FW using MES */
+   r = mes_v11_0_userq_map(uq_mgr, queue, userq_props);
+   if (r) {
+   DRM_ERROR("Failed to init MQD\n");
+   goto free_ctx;
+   }
+
return 0;
 
+free_ctx:
+   amdgpu_userqueue_destroy_object(uq_mgr, &queue->fw_obj);
+
 free_mqd:
amdgpu_userqueue

[PATCH v9 11/14] drm/amdgpu: fix MES GFX mask

2024-04-26 Thread Shashank Sharma
The current MES GFX mask prevents the FW from enabling oversubscription.
This patch does the following:
- Fixes the mask values and adds a description for them.
- Removes the central mask setup and makes it IP specific, as it would
  differ when the number of pipes and queues differ.
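The mask encoding the patch below hard-codes can be sketched as follows. This is an illustration, not kernel code: bit N of a pipe's HQD mask marks hardware queue N of that pipe as available to the MES scheduler, and the helper name is hypothetical.

```c
#include <stdint.h>

/* Sketch of the per-pipe HQD mask handed to the MES firmware:
 * bit N set => hardware queue N of that pipe is usable by MES. */
static uint32_t gfx_hqd_mask_for_pipe(unsigned int pipe)
{
    /* GFX pipe 0 queue 0 is used by the kernel GFX ring, so only
     * queue 1 (bit 1 => 0x2) is exposed to MES; GFX pipe 1 cannot be
     * used by MES due to a HW limitation, so its mask is 0. */
    return pipe == 0 ? 0x2u : 0x0u;
}
```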

V9: introduce this patch in the series

Cc: Christian König 
Cc: Alex Deucher 
Signed-off-by: Shashank Sharma 
Signed-off-by: Arvind Yadav 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 3 ---
 drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h | 1 -
 drivers/gpu/drm/amd/amdgpu/mes_v10_1.c  | 9 +++--
 drivers/gpu/drm/amd/amdgpu/mes_v11_0.c  | 9 +++--
 4 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
index a00cf4756ad0..b405fafc0b71 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
@@ -151,9 +151,6 @@ int amdgpu_mes_init(struct amdgpu_device *adev)
adev->mes.compute_hqd_mask[i] = 0xc;
}
 
-   for (i = 0; i < AMDGPU_MES_MAX_GFX_PIPES; i++)
-   adev->mes.gfx_hqd_mask[i] = i ? 0 : 0xfffe;
-
for (i = 0; i < AMDGPU_MES_MAX_SDMA_PIPES; i++) {
if (amdgpu_ip_version(adev, SDMA0_HWIP, 0) <
IP_VERSION(6, 0, 0))
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
index 4c8fc3117ef8..598556619337 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
@@ -110,7 +110,6 @@ struct amdgpu_mes {
	uint32_t			vmid_mask_gfxhub;
	uint32_t			vmid_mask_mmhub;
	uint32_t			compute_hqd_mask[AMDGPU_MES_MAX_COMPUTE_PIPES];
-	uint32_t			gfx_hqd_mask[AMDGPU_MES_MAX_GFX_PIPES];
	uint32_t			sdma_hqd_mask[AMDGPU_MES_MAX_SDMA_PIPES];
	uint32_t			aggregated_doorbells[AMDGPU_MES_PRIORITY_NUM_LEVELS];
	uint32_t			sch_ctx_offs;
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v10_1.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v10_1.c
index 1e5ad1e08d2a..4d1121d1a1e7 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v10_1.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v10_1.c
@@ -290,8 +290,13 @@ static int mes_v10_1_set_hw_resources(struct amdgpu_mes *mes)
mes_set_hw_res_pkt.compute_hqd_mask[i] =
mes->compute_hqd_mask[i];
 
-   for (i = 0; i < MAX_GFX_PIPES; i++)
-   mes_set_hw_res_pkt.gfx_hqd_mask[i] = mes->gfx_hqd_mask[i];
+   /*
+* GFX pipe 0 queue 0 is being used by kernel
+* Set GFX pipe 0 queue 1 for MES scheduling
+* GFX pipe 1 can't be used for MES due to HW limitation.
+*/
+   mes_set_hw_res_pkt.gfx_hqd_mask[0] = 0x2;
+   mes_set_hw_res_pkt.gfx_hqd_mask[1] = 0;
 
for (i = 0; i < MAX_SDMA_PIPES; i++)
mes_set_hw_res_pkt.sdma_hqd_mask[i] = mes->sdma_hqd_mask[i];
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
index 63f281a9984d..feb7fa2c304c 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
@@ -387,8 +387,13 @@ static int mes_v11_0_set_hw_resources(struct amdgpu_mes *mes)
mes_set_hw_res_pkt.compute_hqd_mask[i] =
mes->compute_hqd_mask[i];
 
-   for (i = 0; i < MAX_GFX_PIPES; i++)
-   mes_set_hw_res_pkt.gfx_hqd_mask[i] = mes->gfx_hqd_mask[i];
+   /*
+* GFX pipe 0 queue 0 is being used by kernel
+* Set GFX pipe 0 queue 1 for MES scheduling
+* GFX pipe 1 can't be used for MES due to HW limitation.
+*/
+   mes_set_hw_res_pkt.gfx_hqd_mask[0] = 0x2;
+   mes_set_hw_res_pkt.gfx_hqd_mask[1] = 0;
 
for (i = 0; i < MAX_SDMA_PIPES; i++)
mes_set_hw_res_pkt.sdma_hqd_mask[i] = mes->sdma_hqd_mask[i];
-- 
2.43.2



[PATCH v9 14/14] drm/amdgpu: add kernel config for gfx-userqueue

2024-04-26 Thread Shashank Sharma
This patch:
- adds a kernel config option "CONFIG_DRM_AMDGPU_USERQ_GFX"
- moves the userqueue initialization code for all IPs under
  this flag

so that the userqueue works only when the config is enabled.
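For reference, a kernel `.config` fragment enabling the option (using the symbol as defined in the Kconfig hunk below) would look like:

```
CONFIG_DRM_AMDGPU_USERQ_GFX=y
```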

Cc: Alex Deucher 
Cc: Christian Koenig 
Signed-off-by: Shashank Sharma 
---
 drivers/gpu/drm/amd/amdgpu/Kconfig | 8 
 drivers/gpu/drm/amd/amdgpu/Makefile| 8 ++--
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 4 
 drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c | 3 +++
 4 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/Kconfig 
b/drivers/gpu/drm/amd/amdgpu/Kconfig
index 22d88f8ef527..bba963527d22 100644
--- a/drivers/gpu/drm/amd/amdgpu/Kconfig
+++ b/drivers/gpu/drm/amd/amdgpu/Kconfig
@@ -80,6 +80,14 @@ config DRM_AMDGPU_WERROR
  Add -Werror to the build flags for amdgpu.ko.
  Only enable this if you are warning code for amdgpu.ko.
 
+config DRM_AMDGPU_USERQ_GFX
+   bool "Enable Navi 3x gfx usermode queues"
+   depends on DRM_AMDGPU
+   default n
+   help
+	  Choose this option to enable usermode queue support for GFX
+	  workload submission. This feature is supported on Navi 3X only.
+
 source "drivers/gpu/drm/amd/acp/Kconfig"
 source "drivers/gpu/drm/amd/display/Kconfig"
 source "drivers/gpu/drm/amd/amdkfd/Kconfig"
diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
b/drivers/gpu/drm/amd/amdgpu/Makefile
index a640bfa468ad..0b17fc1740a0 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -184,8 +184,12 @@ amdgpu-y += \
 amdgpu-y += \
amdgpu_mes.o \
mes_v10_1.o \
-   mes_v11_0.o \
-   mes_v11_0_userqueue.o
+   mes_v11_0.o
+
+# add GFX userqueue support
+ifneq ($(CONFIG_DRM_AMDGPU_USERQ_GFX),)
+amdgpu-y += mes_v11_0_userqueue.o
+endif
 
 # add UVD block
 amdgpu-y += \
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
index 27b86f7fe949..8591aed9f9ab 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
@@ -1349,8 +1349,10 @@ static int gfx_v11_0_sw_init(void *handle)
adev->gfx.mec.num_mec = 2;
adev->gfx.mec.num_pipe_per_mec = 4;
adev->gfx.mec.num_queue_per_pipe = 4;
+#ifdef CONFIG_DRM_AMDGPU_USERQ_GFX
adev->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_mes_v11_0_funcs;
	adev->userq_funcs[AMDGPU_HW_IP_COMPUTE] = &userq_mes_v11_0_funcs;
+#endif
break;
case IP_VERSION(11, 0, 1):
case IP_VERSION(11, 0, 4):
@@ -1362,8 +1364,10 @@ static int gfx_v11_0_sw_init(void *handle)
adev->gfx.mec.num_mec = 1;
adev->gfx.mec.num_pipe_per_mec = 4;
adev->gfx.mec.num_queue_per_pipe = 4;
+#ifdef CONFIG_DRM_AMDGPU_USERQ_GFX
adev->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_mes_v11_0_funcs;
	adev->userq_funcs[AMDGPU_HW_IP_COMPUTE] = &userq_mes_v11_0_funcs;
+#endif
break;
default:
adev->gfx.me.num_me = 1;
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c 
b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
index 90354a70c807..084059c95db6 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
@@ -1267,7 +1267,10 @@ static int sdma_v6_0_sw_init(void *handle)
return -EINVAL;
}
 
+#ifdef CONFIG_DRM_AMDGPU_USERQ_GFX
adev->userq_funcs[AMDGPU_HW_IP_DMA] = &userq_mes_v11_0_funcs;
+#endif
+
return r;
 }
 
-- 
2.43.2



[PATCH v9 10/14] drm/amdgpu: cleanup leftover queues

2024-04-26 Thread Shashank Sharma
This patch adds code to clean up any leftover userqueues which
a user might have failed to destroy due to a crash or any other
programming error.
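The fini-time sweep this patch implements can be modelled in a few lines. This is an illustration only (a userspace stand-in for the IDR walk, with hypothetical names), not the kernel code: every queue still registered with the manager when it is torn down gets destroyed.

```c
#include <stddef.h>

#define MAX_QUEUES 8

/* Minimal model of the manager's id -> queue map. */
struct queue { int id; };
struct userq_mgr { struct queue *slots[MAX_QUEUES]; };

/* Stands in for the idr_for_each_entry() sweep at mgr fini: any
 * leftover queue is destroyed (here: slot cleared) and counted. */
static int mgr_sweep(struct userq_mgr *mgr)
{
    int freed = 0;

    for (int i = 0; i < MAX_QUEUES; i++) {
        if (mgr->slots[i]) {      /* leftover queue the user never freed */
            mgr->slots[i] = NULL; /* stands in for mqd_destroy + kfree  */
            freed++;
        }
    }
    return freed;
}
```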

V7: Added Alex's R-B
V8: Rebase
V9: Rebase

Cc: Alex Deucher 
Cc: Christian Koenig 
Reviewed-by: Alex Deucher 
Suggested-by: Bas Nieuwenhuizen 
Signed-off-by: Bas Nieuwenhuizen 
Signed-off-by: Shashank Sharma 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 27 ++-
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
index fbc7313710f6..781283753804 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -26,6 +26,19 @@
 #include "amdgpu_vm.h"
 #include "amdgpu_userqueue.h"
 
+static void
+amdgpu_userqueue_cleanup(struct amdgpu_userq_mgr *uq_mgr,
+			 struct amdgpu_usermode_queue *queue,
+			 int queue_id)
+{
+   struct amdgpu_device *adev = uq_mgr->adev;
+   const struct amdgpu_userq_funcs *uq_funcs = adev->userq_funcs[queue->queue_type];
+
+   uq_funcs->mqd_destroy(uq_mgr, queue);
+   idr_remove(&uq_mgr->userq_idr, queue_id);
+   kfree(queue);
+}
+
 static struct amdgpu_usermode_queue *
 amdgpu_userqueue_find(struct amdgpu_userq_mgr *uq_mgr, int qid)
 {
@@ -146,8 +159,6 @@ amdgpu_userqueue_destroy(struct drm_file *filp, int queue_id)
 {
struct amdgpu_fpriv *fpriv = filp->driver_priv;
struct amdgpu_userq_mgr *uq_mgr = &fpriv->userq_mgr;
-   struct amdgpu_device *adev = uq_mgr->adev;
-   const struct amdgpu_userq_funcs *uq_funcs;
struct amdgpu_usermode_queue *queue;
 
mutex_lock(&uq_mgr->userq_mutex);
@@ -159,13 +170,9 @@ amdgpu_userqueue_destroy(struct drm_file *filp, int queue_id)
return -EINVAL;
}
 
-   uq_funcs = adev->userq_funcs[queue->queue_type];
-   uq_funcs->mqd_destroy(uq_mgr, queue);
amdgpu_bo_unpin(queue->db_obj.obj);
amdgpu_bo_unref(&queue->db_obj.obj);
-   idr_remove(&uq_mgr->userq_idr, queue_id);
-   kfree(queue);
-
+   amdgpu_userqueue_cleanup(uq_mgr, queue, queue_id);
mutex_unlock(&uq_mgr->userq_mutex);
return 0;
 }
@@ -277,6 +284,12 @@ int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_devi
 
 void amdgpu_userq_mgr_fini(struct amdgpu_userq_mgr *userq_mgr)
 {
+   uint32_t queue_id;
+   struct amdgpu_usermode_queue *queue;
+
+   idr_for_each_entry(&userq_mgr->userq_idr, queue, queue_id)
+   amdgpu_userqueue_cleanup(userq_mgr, queue, queue_id);
+
idr_destroy(&userq_mgr->userq_idr);
mutex_destroy(&userq_mgr->userq_mutex);
 }
-- 
2.43.2



[PATCH v9 12/14] drm/amdgpu: enable SDMA usermode queues

2024-04-26 Thread Shashank Sharma
This patch makes the necessary modifications to enable SDMA
usermode queues using the existing userqueue infrastructure.

V9: introduced this patch in the series

Cc: Christian König 
Cc: Alex Deucher 
Signed-off-by: Shashank Sharma 
Signed-off-by: Arvind Yadav 
Signed-off-by: Srinivasan Shanmugam 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c| 2 +-
 drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c | 4 
 drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c   | 3 +++
 3 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
index 781283753804..e516487e8db9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -189,7 +189,7 @@ amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq *args)
int qid, r = 0;
 
/* Usermode queues are only supported for GFX/SDMA engines as of now */
-   if (args->in.ip_type != AMDGPU_HW_IP_GFX) {
+   if (args->in.ip_type != AMDGPU_HW_IP_GFX && args->in.ip_type != AMDGPU_HW_IP_DMA) {
DRM_ERROR("Usermode queue doesn't support IP type %u\n", 
args->in.ip_type);
return -EINVAL;
}
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
index a6c3037d2d1f..a5e270eda37b 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
@@ -182,6 +182,10 @@ static int mes_v11_0_userq_create_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
return r;
}
 
+   /* We don't need to set other FW objects for SDMA queues */
+   if (queue->queue_type == AMDGPU_HW_IP_DMA)
+   return 0;
+
/* Shadow and GDS objects come directly from userspace */
+   mqd->shadow_base_lo = mqd_user->shadow_va & 0xFFFFFFFC;
mqd->shadow_base_hi = upper_32_bits(mqd_user->shadow_va);
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c 
b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
index 361835a61f2e..90354a70c807 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
@@ -1225,6 +1225,8 @@ static int sdma_v6_0_early_init(void *handle)
return 0;
 }
 
+extern const struct amdgpu_userq_funcs userq_mes_v11_0_funcs;
+
 static int sdma_v6_0_sw_init(void *handle)
 {
struct amdgpu_ring *ring;
@@ -1265,6 +1267,7 @@ static int sdma_v6_0_sw_init(void *handle)
return -EINVAL;
}
 
+   adev->userq_funcs[AMDGPU_HW_IP_DMA] = &userq_mes_v11_0_funcs;
return r;
 }
 
-- 
2.43.2



[PATCH v9 13/14] drm/amdgpu: enable compute/gfx usermode queue

2024-04-26 Thread Shashank Sharma
From: Arvind Yadav 

This patch makes the necessary changes to enable compute
workload support using the existing usermode queue
infrastructure.

Cc: Alex Deucher 
Cc: Christian Koenig 
Signed-off-by: Arvind Yadav 
Signed-off-by: Shashank Sharma 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c|  3 ++-
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c   |  2 ++
 drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c | 10 +-
 include/uapi/drm/amdgpu_drm.h|  1 +
 4 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
index e516487e8db9..78d34fa7a0b9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -189,7 +189,8 @@ amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq *args)
int qid, r = 0;
 
/* Usermode queues are only supported for GFX/SDMA engines as of now */
-   if (args->in.ip_type != AMDGPU_HW_IP_GFX && args->in.ip_type != AMDGPU_HW_IP_DMA) {
+   if (args->in.ip_type != AMDGPU_HW_IP_GFX && args->in.ip_type != AMDGPU_HW_IP_DMA
+   && args->in.ip_type != AMDGPU_HW_IP_COMPUTE) {
DRM_ERROR("Usermode queue doesn't support IP type %u\n", 
args->in.ip_type);
return -EINVAL;
}
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
index 525bd0f4d3f7..27b86f7fe949 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
@@ -1350,6 +1350,7 @@ static int gfx_v11_0_sw_init(void *handle)
adev->gfx.mec.num_pipe_per_mec = 4;
adev->gfx.mec.num_queue_per_pipe = 4;
adev->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_mes_v11_0_funcs;
+	adev->userq_funcs[AMDGPU_HW_IP_COMPUTE] = &userq_mes_v11_0_funcs;
break;
case IP_VERSION(11, 0, 1):
case IP_VERSION(11, 0, 4):
@@ -1362,6 +1363,7 @@ static int gfx_v11_0_sw_init(void *handle)
adev->gfx.mec.num_pipe_per_mec = 4;
adev->gfx.mec.num_queue_per_pipe = 4;
adev->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_mes_v11_0_funcs;
+	adev->userq_funcs[AMDGPU_HW_IP_COMPUTE] = &userq_mes_v11_0_funcs;
break;
default:
adev->gfx.me.num_me = 1;
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
index a5e270eda37b..d61d80f86003 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
@@ -183,7 +183,8 @@ static int mes_v11_0_userq_create_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
}
 
/* We don't need to set other FW objects for SDMA queues */
-   if (queue->queue_type == AMDGPU_HW_IP_DMA)
+   if ((queue->queue_type == AMDGPU_HW_IP_DMA) ||
+   (queue->queue_type == AMDGPU_HW_IP_COMPUTE))
return 0;
 
/* Shadow and GDS objects come directly from userspace */
@@ -246,6 +247,13 @@ static int mes_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
userq_props->use_doorbell = true;
userq_props->doorbell_index = queue->doorbell_index;
 
+   if (queue->queue_type == AMDGPU_HW_IP_COMPUTE) {
+   userq_props->eop_gpu_addr = mqd_user->eop_va;
+   userq_props->hqd_pipe_priority = AMDGPU_GFX_PIPE_PRIO_NORMAL;
+   userq_props->hqd_queue_priority = AMDGPU_GFX_QUEUE_PRIORITY_MINIMUM;
+   userq_props->hqd_active = false;
+   }
+
queue->userq_prop = userq_props;
 
	r = mqd_hw_default->init_mqd(adev, (void *)queue->mqd.cpu_ptr, userq_props);
diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index 22f56a30f7cb..676792ad3618 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -375,6 +375,7 @@ struct drm_amdgpu_userq_mqd {
 * sized.
 */
__u64   csa_va;
+   __u64   eop_va;
 };
 
 struct drm_amdgpu_userq_in {
-- 
2.43.2



[PATCH] drm/amdgpu: fix doorbell regression

2024-04-29 Thread Shashank Sharma
This patch adds missing handling of the doorbell PL domain while
handling VRAM faults.
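The predicate the one-line diff below repairs can be sketched as follows. This is an illustration with simplified enum names, not the driver code: system, GTT, preemptible and doorbell placements are always CPU-visible, while VRAM needs a further visible-range check (elided here).

```c
/* Simplified stand-ins for the TTM/amdgpu placement types. */
enum mem_type { PL_SYSTEM, PL_TT, PL_VRAM, PL_PREEMPT, PL_DOORBELL };

/* Sketch of the fixed check: doorbell placements are CPU-visible,
 * just like system/GTT/preemptible memory. VRAM (not listed) falls
 * through to a range check against the visible aperture. */
static int always_cpu_visible(enum mem_type t)
{
    return t == PL_SYSTEM || t == PL_TT ||
           t == PL_PREEMPT || t == PL_DOORBELL;
}
```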

Fixes: a6ff969fe9cb ("drm/amdgpu: fix visible VRAM handling during faults")
Cc: Christian Koenig 
Cc: Alex Deucher 
Signed-off-by: Shashank Sharma 
Signed-off-by: Arvind Yadav 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 1d71729e3f6b..c71eeb6a04e6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -419,7 +419,7 @@ bool amdgpu_res_cpu_visible(struct amdgpu_device *adev,
return false;
 
if (res->mem_type == TTM_PL_SYSTEM || res->mem_type == TTM_PL_TT ||
-   res->mem_type == AMDGPU_PL_PREEMPT)
+   res->mem_type == AMDGPU_PL_PREEMPT || res->mem_type == AMDGPU_PL_DOORBELL)
return true;
 
if (res->mem_type != TTM_PL_VRAM)
-- 
2.43.2



[PATCH] drm/amdgpu: add gfx eviction fence helpers

2024-04-30 Thread Shashank Sharma
This patch adds a basic eviction fence framework for the gfx buffers.
The idea is:
- One eviction fence is created per gfx process, at kms_open.
- This same fence is attached to all the gem buffers created
  by this process.

This framework will be further used for usermode queues.
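The one-fence-per-process design can be modelled outside the kernel in a few lines. This is a userspace sketch with hypothetical names, not the dma_fence implementation: each BO stores a pointer to the single shared fence, so signalling that one fence is observable through every attached buffer.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace model of the design: one eviction fence per process,
 * shared by every GEM buffer the process creates. */
struct ev_fence { bool signalled; };
struct gem_bo   { struct ev_fence *ev_fence; };

static void attach_ev_fence(struct gem_bo *bo, struct ev_fence *f)
{
    bo->ev_fence = f;        /* every BO shares the same fence object */
}

static void signal_ev_fence(struct ev_fence *f)
{
    f->signalled = true;     /* visible through all attached BOs */
}
```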

Cc: Christian Koenig 
Cc: Alex Deucher 
Signed-off-by: Shashank Sharma 
---
 drivers/gpu/drm/amd/amdgpu/Makefile   |  3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu.h   | 15 +++
 .../drm/amd/amdgpu/amdgpu_eviction_fence.c| 96 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c   | 10 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c   |  4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.h|  2 +
 6 files changed, 127 insertions(+), 3 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
b/drivers/gpu/drm/amd/amdgpu/Makefile
index 4536c8ad0e11..ba00789eb4ef 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -80,7 +80,8 @@ amdgpu-y += amdgpu_device.o amdgpu_doorbell_mgr.o amdgpu_kms.o \
amdgpu_umc.o smu_v11_0_i2c.o amdgpu_fru_eeprom.o amdgpu_rap.o \
amdgpu_fw_attestation.o amdgpu_securedisplay.o \
amdgpu_eeprom.o amdgpu_mca.o amdgpu_psp_ta.o amdgpu_lsdma.o \
-   amdgpu_ring_mux.o amdgpu_xcp.o amdgpu_seq64.o amdgpu_aca.o
+   amdgpu_ring_mux.o amdgpu_xcp.o amdgpu_seq64.o amdgpu_aca.o \
+   amdgpu_eviction_fence.o
 
 amdgpu-$(CONFIG_PROC_FS) += amdgpu_fdinfo.o
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 9c62552bec34..4a4b2680eb9b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -463,6 +463,13 @@ struct amdgpu_flip_work {
	bool				async;
 };
 
+struct amdgpu_eviction_fence {
+   u64  fence_ctx;
+   atomic_t seq;
+   spinlock_t   lock;
+   struct dma_fence base;
+   char timeline_name[TASK_COMM_LEN];
+};
 
 /*
  * file private structure
@@ -476,6 +483,7 @@ struct amdgpu_fpriv {
	struct mutex		bo_list_lock;
struct idr  bo_list_handles;
struct amdgpu_ctx_mgr   ctx_mgr;
+   struct amdgpu_eviction_fence eviction_fence;
/** GPU partition selection */
	uint32_t		xcp_id;
 };
@@ -1474,6 +1482,13 @@ void amdgpu_disable_vblank_kms(struct drm_crtc *crtc);
 int amdgpu_info_ioctl(struct drm_device *dev, void *data,
  struct drm_file *filp);
 
+/* Eviction fence */
+void amdgpu_eviction_fence_create(struct amdgpu_fpriv *fpriv);
+int amdgpu_eviction_fence_attach(struct amdgpu_fpriv *fpriv,
+				 struct amdgpu_bo *bo);
+void amdgpu_eviction_fence_detach(struct amdgpu_fpriv *fpriv);
+void amdgpu_eviction_fence_signal(struct amdgpu_fpriv *fpriv);
+
 /*
  * functions used by amdgpu_encoder.c
  */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
new file mode 100644
index ..36009d89be03
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
@@ -0,0 +1,96 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright 2024 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+#include 
+#include "amdgpu.h"
+
+static const char *
+amdgpu_ev_fence_get_driver_name(struct dma_fence *fence)
+{
+   return "amdgpu";
+}
+
+static const char *
+amdgpu_ev_fence_get_timeline_name(struct dma_fence *f)
+{
+   struct amdgpu_eviction_fence *ef;
+
+   ef = container_of(f, struct amdgpu_eviction_fence, base);
+   return ef->timeline_name;
+}
+
+static const struct dma_fence_ops amdgpu_eviction_fence_ops = {
+   .use_64bit_seqno = true,
+   .get_dr

[PATCH v10 00/14] AMDGPU usermode queues

2024-05-02 Thread Shashank Sharma
This patch series introduces AMDGPU usermode queues for gfx workloads.
Usermode queues are a method of GPU workload submission into the graphics
hardware without any interaction with the kernel/DRM schedulers. In this
method, a userspace graphics application can create its own workqueue and
submit it directly to the GPU HW.

The general idea of how this is supposed to work:
- The application creates the following GPU objects:
  - A queue object to hold the workload packets.
  - A read pointer object.
  - A write pointer object.
  - A doorbell page.
  - Shadow buffer pages.
  - GDS buffer pages (as required).
- The application picks a 32-bit offset in the doorbell page for this
  queue.
- The application uses the usermode_queue_create IOCTL introduced in
  this patch, by passing the GPU addresses of these objects (read ptr,
  write ptr, queue base address, shadow, gds) with doorbell object and
  32-bit doorbell offset in the doorbell page.
- The kernel creates the queue and maps it in the HW.
- The application maps the GPU buffers in process address space.
- The application can start submitting the data in the queue as soon as
  the kernel IOCTL returns.
- After filling the workload data in the queue, the app must write the
  number of dwords added in the queue into the doorbell offset and the
  WPTR buffer, and the GPU will start fetching the data.
- This series adds usermode queue support for all three MES based IPs
  (GFX, SDMA and Compute).
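The create step described above can be sketched from the userspace side. The struct below is a simplified stand-in for the drm_amdgpu_userq UAPI (field names follow the series, but the layout here is illustration only; the real test utility lives in the libdrm MR linked below):

```c
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for the drm_amdgpu_userq_in UAPI struct. */
struct userq_in_sketch {
    uint32_t op;              /* AMDGPU_USERQ_OP_* */
    uint32_t ip_type;         /* AMDGPU_HW_IP_* */
    uint32_t doorbell_handle; /* GEM handle of the doorbell BO */
    uint32_t doorbell_offset; /* 32-bit offset in the doorbell page */
    uint64_t queue_va;        /* GPU VA of the queue BO */
    uint64_t rptr_va;         /* GPU VA of the read-pointer BO */
    uint64_t wptr_va;         /* GPU VA of the write-pointer BO */
};

enum { USERQ_OP_CREATE = 1 };  /* mirrors AMDGPU_USERQ_OP_CREATE */
enum { HW_IP_GFX = 0 };        /* mirrors AMDGPU_HW_IP_GFX */

/* Fill the request as the workflow above describes: GPU VAs of the
 * queue/rptr/wptr objects plus the doorbell handle and offset. The
 * filled struct would then be passed to DRM_IOCTL_AMDGPU_USERQ. */
static void fill_userq_create(struct userq_in_sketch *in,
                              uint32_t db_handle, uint32_t db_off,
                              uint64_t q_va, uint64_t r_va, uint64_t w_va)
{
    memset(in, 0, sizeof(*in));
    in->op = USERQ_OP_CREATE;
    in->ip_type = HW_IP_GFX;
    in->doorbell_handle = db_handle;
    in->doorbell_offset = db_off;
    in->queue_va = q_va;
    in->rptr_va = r_va;
    in->wptr_va = w_va;
}
```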

libDRM changes for this series and a sample DRM test program can be
found here:
https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/287

MESA changes consuming this series can be seen in the MR here:
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29010

Alex Deucher (1):
  drm/amdgpu: UAPI for user queue management

Shashank Sharma (13):
  drm/amdgpu: add usermode queue base code
  drm/amdgpu: add new IOCTL for usermode queue
  drm/amdgpu: add helpers to create userqueue object
  drm/amdgpu: create MES-V11 usermode queue for GFX
  drm/amdgpu: create context space for usermode queue
  drm/amdgpu: map usermode queue into MES
  drm/amdgpu: map wptr BO into GART
  drm/amdgpu: generate doorbell index for userqueue
  drm/amdgpu: cleanup leftover queues
  drm/amdgpu: enable GFX-V11 userqueue support
  drm/amdgpu: enable SDMA-V6 usermode queues
  drm/amdgpu: enable compute/gfx usermode queue
  drm/amdgpu: add kernel config for gfx-userqueue

 drivers/gpu/drm/amd/amdgpu/Kconfig|   8 +
 drivers/gpu/drm/amd/amdgpu/Makefile   |   5 +
 drivers/gpu/drm/amd/amdgpu/amdgpu.h   |   3 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c   |   2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c   |   6 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 296 +++
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c|   9 +
 .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  | 338 ++
 .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.h  |  30 ++
 drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c|   5 +
 .../gpu/drm/amd/include/amdgpu_userqueue.h|  79 
 include/uapi/drm/amdgpu_drm.h | 122 +++
 12 files changed, 903 insertions(+)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
 create mode 100644 drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
 create mode 100644 drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.h
 create mode 100644 drivers/gpu/drm/amd/include/amdgpu_userqueue.h

-- 
2.43.2



[PATCH v10 01/14] drm/amdgpu: UAPI for user queue management

2024-05-02 Thread Shashank Sharma
From: Alex Deucher 

This patch introduces a new UAPI/IOCTL for usermode graphics
queues. The userspace app will fill this structure and request
the graphics driver to add a graphics work queue for it. The
output of this UAPI is a queue id.

This UAPI maps the queue into GPU, so the graphics app can start
submitting work to the queue as soon as the call returns.

V2: Addressed review comments from Alex and Christian
- Make the doorbell offset's comment clearer
- Change the output parameter name to queue_id

V3: Integration with doorbell manager

V4:
- Updated the UAPI doc (Pierre-Eric)
- Created a Union for engine specific MQDs (Alex)
- Added Christian's R-B
V5:
- Add variables for GDS and CSA in MQD structure (Alex)
- Make MQD data a ptr-size pair instead of union (Alex)

V9:
   - renamed struct drm_amdgpu_userq_mqd_gfx_v11 to struct
 drm_amdgpu_userq_mqd as its being used for SDMA and
 compute queues as well

V10:
- keeping the drm_amdgpu_userq_mqd IP independent, moving the
  _gfx_v11 objects in a separate structure in other patch.
  (Alex)

Cc: Alex Deucher 
Cc: Christian Koenig 
Reviewed-by: Christian König 
Signed-off-by: Alex Deucher 
Signed-off-by: Shashank Sharma 
---
 include/uapi/drm/amdgpu_drm.h | 90 +++
 1 file changed, 90 insertions(+)

diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index 5b6c0055cfcf..f7313e576f06 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -54,6 +54,7 @@ extern "C" {
 #define DRM_AMDGPU_VM  0x13
 #define DRM_AMDGPU_FENCE_TO_HANDLE 0x14
 #define DRM_AMDGPU_SCHED   0x15
+#define DRM_AMDGPU_USERQ   0x16
 
 #define DRM_IOCTL_AMDGPU_GEM_CREATE	DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_CREATE, union drm_amdgpu_gem_create)
 #define DRM_IOCTL_AMDGPU_GEM_MMAP	DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_MMAP, union drm_amdgpu_gem_mmap)
@@ -71,6 +72,7 @@ extern "C" {
 #define DRM_IOCTL_AMDGPU_VM		DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_VM, union drm_amdgpu_vm)
 #define DRM_IOCTL_AMDGPU_FENCE_TO_HANDLE DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_FENCE_TO_HANDLE, union drm_amdgpu_fence_to_handle)
 #define DRM_IOCTL_AMDGPU_SCHED	DRM_IOW(DRM_COMMAND_BASE + DRM_AMDGPU_SCHED, union drm_amdgpu_sched)
+#define DRM_IOCTL_AMDGPU_USERQ	DRM_IOW(DRM_COMMAND_BASE + DRM_AMDGPU_USERQ, union drm_amdgpu_userq)
 
 /**
  * DOC: memory domains
@@ -317,6 +319,94 @@ union drm_amdgpu_ctx {
union drm_amdgpu_ctx_out out;
 };
 
+/* user queue IOCTL */
+#define AMDGPU_USERQ_OP_CREATE 1
+#define AMDGPU_USERQ_OP_FREE   2
+
+/* Flag to indicate secure buffer related workload, unused for now */
+#define AMDGPU_USERQ_MQD_FLAGS_SECURE  (1 << 0)
+/* Flag to indicate AQL workload, unused for now */
+#define AMDGPU_USERQ_MQD_FLAGS_AQL (1 << 1)
+
+/*
+ * MQD (memory queue descriptor) is a set of parameters which allow
+ * the GPU to uniquely define and identify a usermode queue. This
+ * structure carries the IP independent input parameters for it.
+ */
+struct drm_amdgpu_userq_in {
+   /** AMDGPU_USERQ_OP_* */
+   __u32   op;
+   /** Queue handle for USERQ_OP_FREE */
+   __u32   queue_id;
+   /** the target GPU engine to execute workload (AMDGPU_HW_IP_*) */
+   __u32   ip_type;
+   /**
+* @flags: flags to indicate special function for queue like secure
+* buffer (TMZ). Unused for now.
+*/
+   __u32   flags;
+   /**
+* @doorbell_handle: the handle of doorbell GEM object
+* associated to this client.
+*/
+   __u32   doorbell_handle;
+   /**
+* @doorbell_offset: 32-bit offset of the doorbell in the doorbell bo.
+* Kernel will generate absolute doorbell offset using doorbell_handle
+* and doorbell_offset in the doorbell bo.
+*/
+   __u32   doorbell_offset;
+
+   /**
+* @queue_va: Virtual address of the GPU memory which holds the queue
+* object. The queue holds the workload packets.
+*/
+   __u64   queue_va;
+   /**
+* @queue_size: Size of the queue in bytes, this needs to be 256-byte
+* aligned.
+*/
+   __u64   queue_size;
+   /**
+* @rptr_va : Virtual address of the GPU memory which holds the ring RPTR.
+* This object must be at least 8 byte in size and aligned to 8-byte offset.
+*/
+   __u64   rptr_va;
+   /**
+* @wptr_va : Virtual address of the GPU memory which holds the ring WPTR.
+* This object must be at least 8 byte in size and aligned to 8-byte offset.
+*
+* Queue, RPTR and WPTR can come from the same object, as long as the size
+* and alignment related requirements are met.
+*/
+   __u64   wptr_va;
+   /**
+* @mqd: Queue descriptor for USERQ_OP_CREATE
+ 

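The alignment and size rules documented in the UAPI comments above can be collected into a small userspace-side sanity check. This is an illustrative sketch, not part of the UAPI: the helper name, the constants, and the 4096-byte doorbell page size are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical pre-IOCTL validation mirroring the documented rules:
 * queue_size must be 256-byte aligned, the RPTR/WPTR objects 8-byte
 * aligned, and the 32-bit doorbell offset must fall inside the
 * doorbell page (page size assumed to be 4096 here).
 */
#define USERQ_QUEUE_ALIGN	256u
#define USERQ_PTR_ALIGN		8u
#define DOORBELL_PAGE_SIZE	4096u	/* assumed for this sketch */

static int userq_args_valid(uint64_t queue_va, uint64_t queue_size,
			    uint64_t rptr_va, uint64_t wptr_va,
			    uint32_t doorbell_offset)
{
	if (!queue_va || !queue_size || (queue_size & (USERQ_QUEUE_ALIGN - 1)))
		return 0;	/* queue missing or not 256-byte aligned */
	if ((rptr_va & (USERQ_PTR_ALIGN - 1)) || (wptr_va & (USERQ_PTR_ALIGN - 1)))
		return 0;	/* ring pointers not 8-byte aligned */
	if (doorbell_offset >= DOORBELL_PAGE_SIZE)
		return 0;	/* doorbell offset outside the page */
	return 1;
}
```

Checking these constraints before issuing the IOCTL avoids a round trip into the kernel only to get -EINVAL back.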
[PATCH v10 02/14] drm/amdgpu: add usermode queue base code

2024-05-02 Thread Shashank Sharma
This patch adds IP independent skeleton code for amdgpu
usermode queue. It contains:
- A new file with init functions of usermode queues.
- A queue context manager in driver private data.

V1: Worked on design review comments from RFC patch series:
(https://patchwork.freedesktop.org/series/112214/)
- Alex: Keep a list of queues, instead of single queue per process.
- Christian: Use the queue manager instead of global ptrs,
   Don't keep the queue structure in amdgpu_ctx

V2:
 - Reformatted code, split the big patch into two

V3:
- Integration with doorbell manager

V4:
- Align the structure member names to the largest member's column
  (Luben)
- Added SPDX license (Luben)

V5:
- Do not add amdgpu.h in amdgpu_userqueue.h (Christian).
- Move struct amdgpu_userq_mgr into amdgpu_userqueue.h (Christian).

V6: Rebase
V9: Rebase
V10: Rebase + Alex's R-B

Cc: Alex Deucher 
Cc: Christian Koenig 
Reviewed-by: Christian König 
Reviewed-by: Alex Deucher 
Signed-off-by: Shashank Sharma 
---
 drivers/gpu/drm/amd/amdgpu/Makefile   |  2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu.h   |  3 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c   |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c   |  6 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 40 
 .../gpu/drm/amd/include/amdgpu_userqueue.h| 61 +++
 6 files changed, 113 insertions(+)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
 create mode 100644 drivers/gpu/drm/amd/include/amdgpu_userqueue.h

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
b/drivers/gpu/drm/amd/amdgpu/Makefile
index de7b76327f5b..2d421f17626d 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -266,6 +266,8 @@ amdgpu-y += \
 # add amdkfd interfaces
 amdgpu-y += amdgpu_amdkfd.o
 
+# add usermode queue
+amdgpu-y += amdgpu_userqueue.o
 
 ifneq ($(CONFIG_HSA_AMD),)
 AMDKFD_PATH := ../amdkfd
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 8bb8b414d511..c24e9f9d37e5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -112,6 +112,7 @@
 #include "amdgpu_xcp.h"
 #include "amdgpu_seq64.h"
 #include "amdgpu_reg_state.h"
+#include "amdgpu_userqueue.h"
 
 #define MAX_GPU_INSTANCE   64
 
@@ -486,6 +487,7 @@ struct amdgpu_fpriv {
struct mutexbo_list_lock;
struct idr  bo_list_handles;
struct amdgpu_ctx_mgr   ctx_mgr;
+   struct amdgpu_userq_mgr userq_mgr;
/** GPU partition selection */
uint32_txcp_id;
 };
@@ -1050,6 +1052,7 @@ struct amdgpu_device {
boolenable_uni_mes;
struct amdgpu_mes   mes;
struct amdgpu_mqd   mqds[AMDGPU_HW_IP_NUM];
+   const struct amdgpu_userq_funcs *userq_funcs[AMDGPU_HW_IP_NUM];
 
/* df */
struct amdgpu_dfdf;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 447fa858c654..b52442e2d04a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -50,6 +50,7 @@
 #include "amdgpu_reset.h"
 #include "amdgpu_sched.h"
 #include "amdgpu_xgmi.h"
+#include "amdgpu_userqueue.h"
 #include "../amdxcp/amdgpu_xcp_drv.h"
 
 /*
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index a0ea6fe8d060..76d02dc330a4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -44,6 +44,7 @@
 #include "amdgpu_display.h"
 #include "amdgpu_ras.h"
 #include "amd_pcie.h"
+#include "amdgpu_userqueue.h"
 
 void amdgpu_unregister_gpu_instance(struct amdgpu_device *adev)
 {
@@ -1357,6 +1358,10 @@ int amdgpu_driver_open_kms(struct drm_device *dev, 
struct drm_file *file_priv)
 
amdgpu_ctx_mgr_init(&fpriv->ctx_mgr, adev);
 
+   r = amdgpu_userq_mgr_init(&fpriv->userq_mgr, adev);
+   if (r)
+   DRM_WARN("Can't setup usermode queues, use legacy workload submission only\n");
+
file_priv->driver_priv = fpriv;
goto out_suspend;
 
@@ -1426,6 +1431,7 @@ void amdgpu_driver_postclose_kms(struct drm_device *dev,
 
amdgpu_ctx_mgr_fini(&fpriv->ctx_mgr);
amdgpu_vm_fini(adev, &fpriv->vm);
+   amdgpu_userq_mgr_fini(&fpriv->userq_mgr);
 
if (pasid)
amdgpu_pasid_free_delayed(pd->tbo.base.resv, pasid);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
new file mode 100644
index ..effc0c7c02cf
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -0,0 +1,40 @@
+// SPDX-License-Identifier: 

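The skeleton above wires a per-IP function table into the device structure (adev->userq_funcs[AMDGPU_HW_IP_NUM]); a NULL slot means that engine has no usermode queue support. The following is a toy model of that dispatch pattern, with simplified stand-in types and enum values that are illustrative rather than the driver's own:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the HW IP types and the funcs table. */
enum { HW_IP_GFX, HW_IP_COMPUTE, HW_IP_DMA, HW_IP_NUM };

struct userq_funcs {
	int (*mqd_create)(void);
	void (*mqd_destroy)(void);
};

static int gfx_mqd_create(void) { return 0; }
static void gfx_mqd_destroy(void) { }

static const struct userq_funcs gfx_funcs = {
	.mqd_create  = gfx_mqd_create,
	.mqd_destroy = gfx_mqd_destroy,
};

/* device-level table; each IP's init code fills in its own slot */
static const struct userq_funcs *userq_funcs[HW_IP_NUM] = {
	[HW_IP_GFX] = &gfx_funcs,
};

/* generic code checks the slot before dispatching, as the IOCTL does */
static int userq_supported(int ip_type)
{
	return ip_type >= 0 && ip_type < HW_IP_NUM && userq_funcs[ip_type] != NULL;
}
```

The design keeps the generic IOCTL path IP agnostic: adding SDMA or Compute support later only means registering another function table.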
[PATCH v10 03/14] drm/amdgpu: add new IOCTL for usermode queue

2024-05-02 Thread Shashank Sharma
This patch adds:
- A new IOCTL function to create and destroy usermode queues.
- A new structure to keep all the user queue data in one place.
- A function to generate unique index for the queue.

V1: Worked on review comments from RFC patch series:
  - Alex: Keep a list of queues, instead of single queue per process.
  - Christian: Use the queue manager instead of global ptrs,
   Don't keep the queue structure in amdgpu_ctx

V2: Worked on review comments:
 - Christian:
   - Formatting of text
   - There is no need for queuing of userqueues, with idr in place
 - Alex:
   - Remove use_doorbell, its unnecessary
   - Reuse amdgpu_mqd_props for saving mqd fields

 - Code formatting and re-arrangement

V3:
 - Integration with doorbell manager

V4:
 - Accommodate MQD union related changes in UAPI (Alex)
 - Do not set the queue size twice (Bas)

V5:
 - Remove wrapper functions for queue indexing (Christian)
 - Do not save the queue id/idr in queue itself (Christian)
 - Move the idr allocation in the IP independent generic space
  (Christian)

V6:
 - Check the validity of input IP type (Christian)

V7:
 - Move uq_func from uq_mgr to adev (Alex)
 - Add missing free(queue) for error cases (Yifan)

V9:
 - Rebase

V10: Addressed review comments from Christian, and added R-B:
 - Do not initialize the local variable
 - Convert DRM_ERROR to DEBUG.

Cc: Alex Deucher 
Cc: Christian Koenig 
Reviewed-by: Christian Koenig 
Signed-off-by: Shashank Sharma 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c   |   1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 121 ++
 .../gpu/drm/amd/include/amdgpu_userqueue.h|   2 +
 3 files changed, 124 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index b52442e2d04a..551e13693100 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -2929,6 +2929,7 @@ const struct drm_ioctl_desc amdgpu_ioctls_kms[] = {
DRM_IOCTL_DEF_DRV(AMDGPU_GEM_VA, amdgpu_gem_va_ioctl, 
DRM_AUTH|DRM_RENDER_ALLOW),
DRM_IOCTL_DEF_DRV(AMDGPU_GEM_OP, amdgpu_gem_op_ioctl, 
DRM_AUTH|DRM_RENDER_ALLOW),
DRM_IOCTL_DEF_DRV(AMDGPU_GEM_USERPTR, amdgpu_gem_userptr_ioctl, 
DRM_AUTH|DRM_RENDER_ALLOW),
+   DRM_IOCTL_DEF_DRV(AMDGPU_USERQ, amdgpu_userq_ioctl, 
DRM_AUTH|DRM_RENDER_ALLOW),
 };
 
 static const struct drm_driver amdgpu_kms_driver = {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
index effc0c7c02cf..ce9b25b82e94 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -23,6 +23,127 @@
  */
 
 #include "amdgpu.h"
+#include "amdgpu_vm.h"
+#include "amdgpu_userqueue.h"
+
+static struct amdgpu_usermode_queue *
+amdgpu_userqueue_find(struct amdgpu_userq_mgr *uq_mgr, int qid)
+{
+   return idr_find(&uq_mgr->userq_idr, qid);
+}
+
+static int
+amdgpu_userqueue_destroy(struct drm_file *filp, int queue_id)
+{
+   struct amdgpu_fpriv *fpriv = filp->driver_priv;
+   struct amdgpu_userq_mgr *uq_mgr = &fpriv->userq_mgr;
+   struct amdgpu_device *adev = uq_mgr->adev;
+   const struct amdgpu_userq_funcs *uq_funcs;
+   struct amdgpu_usermode_queue *queue;
+
+   mutex_lock(&uq_mgr->userq_mutex);
+
+   queue = amdgpu_userqueue_find(uq_mgr, queue_id);
+   if (!queue) {
+   DRM_DEBUG_DRIVER("Invalid queue id to destroy\n");
+   mutex_unlock(&uq_mgr->userq_mutex);
+   return -EINVAL;
+   }
+
+   uq_funcs = adev->userq_funcs[queue->queue_type];
+   uq_funcs->mqd_destroy(uq_mgr, queue);
+   idr_remove(&uq_mgr->userq_idr, queue_id);
+   kfree(queue);
+
+   mutex_unlock(&uq_mgr->userq_mutex);
+   return 0;
+}
+
+static int
+amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq *args)
+{
+   struct amdgpu_fpriv *fpriv = filp->driver_priv;
+   struct amdgpu_userq_mgr *uq_mgr = &fpriv->userq_mgr;
+   struct amdgpu_device *adev = uq_mgr->adev;
+   const struct amdgpu_userq_funcs *uq_funcs;
+   struct amdgpu_usermode_queue *queue;
+   int qid, r = 0;
+
+   /* Usermode queues are only supported for the GFX IP as of now */
+   if (args->in.ip_type != AMDGPU_HW_IP_GFX) {
+   DRM_ERROR("Usermode queue doesn't support IP type %u\n", args->in.ip_type);
+   return -EINVAL;
+   }
+
+   mutex_lock(&uq_mgr->userq_mutex);
+
+   uq_funcs = adev->userq_funcs[args->in.ip_type];
+   if (!uq_funcs) {
+   DRM_ERROR("Usermode queue is not supported for this IP (%u)\n", args->in.ip_type);
+   r = -EINVAL;
+   goto unlock;
+   }
+
+   queue = kzalloc(sizeof(struct amdgpu_usermode_queue), GFP_KERNEL);
+   i

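The IOCTL above dispatches on an op code and tracks live queues through an idr. A toy model of that create/free flow is sketched below; a fixed-size array stands in for the kernel's idr, and locking and the per-IP mqd hooks are deliberately omitted. The names and table size are assumptions for the example.

```c
#include <assert.h>

/* Op codes mirroring AMDGPU_USERQ_OP_CREATE / AMDGPU_USERQ_OP_FREE */
#define USERQ_OP_CREATE 1
#define USERQ_OP_FREE   2
#define MAX_USERQ       16	/* stand-in for the idr's id space */

static int userq_live[MAX_USERQ];

static int userq_ioctl(int op, int queue_id)
{
	int qid;

	switch (op) {
	case USERQ_OP_CREATE:
		/* allocate the lowest free id, as idr_alloc() would */
		for (qid = 1; qid < MAX_USERQ; qid++) {
			if (!userq_live[qid]) {
				userq_live[qid] = 1;
				return qid;	/* new queue id */
			}
		}
		return -1;			/* id space exhausted */
	case USERQ_OP_FREE:
		/* look the id up first; an unknown id is rejected */
		if (queue_id <= 0 || queue_id >= MAX_USERQ || !userq_live[queue_id])
			return -1;		/* invalid queue id to destroy */
		userq_live[queue_id] = 0;
		return 0;
	default:
		return -1;
	}
}
```

Returning the allocated id to userspace and validating it again on free is what lets the kernel reject stale or forged queue handles.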
[PATCH v10 06/14] drm/amdgpu: create context space for usermode queue

2024-05-02 Thread Shashank Sharma
The MES FW expects us to allocate at least one page as context
space to process gang and process related context data. This
patch creates a joint object for the same, and calculates GPU
space offsets of these spaces.

V1: Addressed review comments on RFC patch:
Alex: Make this function IP specific

V2: Addressed review comments from Christian
- Allocate only one object for total FW space, and calculate
  offsets for each of these objects.

V3: Integration with doorbell manager

V4: Review comments:
- Remove shadow from FW space list from cover letter (Alex)
- Alignment of macro (Luben)

V5: Merged patches 5 and 6 into this single patch
Addressed review comments:
- Use lower_32_bits instead of mask (Christian)
- gfx_v11_0 instead of gfx_v11 in function names (Alex)
- Shadow and GDS objects are now coming from userspace (Christian,
  Alex)

V6:
- Add a comment to replace amdgpu_bo_create_kernel() with
  amdgpu_bo_create() during fw_ctx object creation (Christian).
- Move proc_ctx_gpu_addr, gang_ctx_gpu_addr and fw_ctx_gpu_addr out
  of generic queue structure and make it gen11 specific (Alex).

V7:
   - Using helper function to create/destroy userqueue objects.
   - Removed FW object space allocation.

V8:
   - Updating FW object address from user values.

V9:
   - updated function names from gfx_v11_* to mes_v11_*

V10:
   - making this patch independent of IP based changes, moving any
 GFX object related changes in GFX specific patch (Alex)

Cc: Alex Deucher 
Cc: Christian Koenig 
Acked-by: Christian Koenig 
Signed-off-by: Shashank Sharma 
Signed-off-by: Arvind Yadav 
---
 .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  | 33 +++
 .../gpu/drm/amd/include/amdgpu_userqueue.h|  1 +
 2 files changed, 34 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
index 75d7c58418c8..58cfc956cddd 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
@@ -27,6 +27,31 @@
 #include "mes_v11_0.h"
 #include "mes_v11_0_userqueue.h"
 
+#define AMDGPU_USERQ_PROC_CTX_SZ PAGE_SIZE
+#define AMDGPU_USERQ_GANG_CTX_SZ PAGE_SIZE
+
+static int mes_v11_0_userq_create_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
+   struct amdgpu_usermode_queue *queue,
+   struct drm_amdgpu_userq_in *mqd_user)
+{
+   struct amdgpu_userq_obj *ctx = &queue->fw_obj;
+   int r, size;
+
+   /*
+* The FW expects at least one page space allocated for
+* process ctx and gang ctx each. Create an object
+* for the same.
+*/
+   size = AMDGPU_USERQ_PROC_CTX_SZ + AMDGPU_USERQ_GANG_CTX_SZ;
+   r = amdgpu_userqueue_create_object(uq_mgr, ctx, size);
+   if (r) {
+   DRM_ERROR("Failed to allocate ctx space bo for userqueue, err:%d\n", r);
+   return r;
+   }
+
+   return 0;
+}
+
 static int mes_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
  struct drm_amdgpu_userq_in *args_in,
  struct amdgpu_usermode_queue *queue)
@@ -89,6 +114,13 @@ static int mes_v11_0_userq_mqd_create(struct 
amdgpu_userq_mgr *uq_mgr,
goto free_mqd;
}
 
+   /* Create BO for FW operations */
+   r = mes_v11_0_userq_create_ctx_space(uq_mgr, queue, mqd_user);
+   if (r) {
+   DRM_ERROR("Failed to allocate BO for userqueue (%d)", r);
+   goto free_mqd;
+   }
+
return 0;
 
 free_mqd:
@@ -107,6 +139,7 @@ static void
 mes_v11_0_userq_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr,
struct amdgpu_usermode_queue *queue)
 {
+   amdgpu_userqueue_destroy_object(uq_mgr, &queue->fw_obj);
kfree(queue->userq_prop);
amdgpu_userqueue_destroy_object(uq_mgr, &queue->mqd);
 }
diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h 
b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
index bbd29f68b8d4..643f31474bd8 100644
--- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
+++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
@@ -44,6 +44,7 @@ struct amdgpu_usermode_queue {
struct amdgpu_userq_mgr *userq_mgr;
struct amdgpu_vm*vm;
struct amdgpu_userq_obj mqd;
+   struct amdgpu_userq_obj fw_obj;
 };
 
 struct amdgpu_userq_funcs {
-- 
2.43.2


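The layout created by mes_v11_0_userq_create_ctx_space() above is one BO holding both FW context areas: the process context at offset 0 and the gang context immediately after it. The arithmetic can be sketched as below; 4096 stands in for PAGE_SIZE and the helper names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* One page each for process and gang context, as the MES FW expects. */
#define PROC_CTX_SZ 4096u
#define GANG_CTX_SZ 4096u

/* total size of the single fw_obj BO allocated for both areas */
static uint32_t fw_ctx_obj_size(void)
{
	return PROC_CTX_SZ + GANG_CTX_SZ;
}

/* process ctx sits at the BO base... */
static uint64_t proc_ctx_addr(uint64_t fw_obj_gpu_addr)
{
	return fw_obj_gpu_addr;
}

/* ...and the gang ctx follows it, one page later */
static uint64_t gang_ctx_addr(uint64_t fw_obj_gpu_addr)
{
	return fw_obj_gpu_addr + PROC_CTX_SZ;
}
```

These are the same offsets the MES mapping code later feeds into process_context_addr and gang_context_addr.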

[PATCH v10 04/14] drm/amdgpu: add helpers to create userqueue object

2024-05-02 Thread Shashank Sharma
This patch introduces amdgpu_userqueue_object and its helper
functions to create and destroy this object. The helper
functions create/destroy a base amdgpu_bo, kmap/unmap it and
save the respective GPU and CPU addresses in the encapsulating
userqueue object.

These helpers will be used to create/destroy userqueue MQD, WPTR
and FW areas.

V7:
- Forked out this new patch from V11-gfx-userqueue patch to prevent
  that patch from growing very big.
- Using amdgpu_bo_create instead of amdgpu_bo_create_kernel in prep
  for eviction fences (Christian)

V9:
 - Rebase
V10:
 - Added Alex's R-B

Cc: Alex Deucher 
Cc: Christian Koenig 
Reviewed-by: Alex Deucher 
Signed-off-by: Shashank Sharma 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 62 +++
 .../gpu/drm/amd/include/amdgpu_userqueue.h| 13 
 2 files changed, 75 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
index ce9b25b82e94..edbcb0f4c898 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -32,6 +32,68 @@ amdgpu_userqueue_find(struct amdgpu_userq_mgr *uq_mgr, int 
qid)
return idr_find(&uq_mgr->userq_idr, qid);
 }
 
+int amdgpu_userqueue_create_object(struct amdgpu_userq_mgr *uq_mgr,
+  struct amdgpu_userq_obj *userq_obj,
+  int size)
+{
+   struct amdgpu_device *adev = uq_mgr->adev;
+   struct amdgpu_bo_param bp;
+   int r;
+
+   memset(&bp, 0, sizeof(bp));
+   bp.byte_align = PAGE_SIZE;
+   bp.domain = AMDGPU_GEM_DOMAIN_GTT;
+   bp.flags = AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS |
+  AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED;
+   bp.type = ttm_bo_type_kernel;
+   bp.size = size;
+   bp.resv = NULL;
+   bp.bo_ptr_size = sizeof(struct amdgpu_bo);
+
+   r = amdgpu_bo_create(adev, &bp, &userq_obj->obj);
+   if (r) {
+   DRM_ERROR("Failed to allocate BO for userqueue (%d)", r);
+   return r;
+   }
+
+   r = amdgpu_bo_reserve(userq_obj->obj, true);
+   if (r) {
+   DRM_ERROR("Failed to reserve BO to map (%d)", r);
+   goto free_obj;
+   }
+
+   r = amdgpu_ttm_alloc_gart(&(userq_obj->obj)->tbo);
+   if (r) {
+   DRM_ERROR("Failed to alloc GART for userqueue object (%d)", r);
+   goto unresv;
+   }
+
+   r = amdgpu_bo_kmap(userq_obj->obj, &userq_obj->cpu_ptr);
+   if (r) {
+   DRM_ERROR("Failed to map BO for userqueue (%d)", r);
+   goto unresv;
+   }
+
+   userq_obj->gpu_addr = amdgpu_bo_gpu_offset(userq_obj->obj);
+   amdgpu_bo_unreserve(userq_obj->obj);
+   memset(userq_obj->cpu_ptr, 0, size);
+   return 0;
+
+unresv:
+   amdgpu_bo_unreserve(userq_obj->obj);
+
+free_obj:
+   amdgpu_bo_unref(&userq_obj->obj);
+   return r;
+}
+
+void amdgpu_userqueue_destroy_object(struct amdgpu_userq_mgr *uq_mgr,
+  struct amdgpu_userq_obj *userq_obj)
+{
+   amdgpu_bo_kunmap(userq_obj->obj);
+   amdgpu_bo_unref(&userq_obj->obj);
+}
+
 static int
 amdgpu_userqueue_destroy(struct drm_file *filp, int queue_id)
 {
diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h 
b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
index b739274c72e1..bbd29f68b8d4 100644
--- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
+++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
@@ -29,6 +29,12 @@
 
 struct amdgpu_mqd_prop;
 
+struct amdgpu_userq_obj {
+   void *cpu_ptr;
+   uint64_t gpu_addr;
+   struct amdgpu_bo *obj;
+};
+
 struct amdgpu_usermode_queue {
int queue_type;
uint64_tdoorbell_handle;
@@ -37,6 +43,7 @@ struct amdgpu_usermode_queue {
struct amdgpu_mqd_prop  *userq_prop;
struct amdgpu_userq_mgr *userq_mgr;
struct amdgpu_vm*vm;
+   struct amdgpu_userq_obj mqd;
 };
 
 struct amdgpu_userq_funcs {
@@ -60,4 +67,10 @@ int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr 
*userq_mgr, struct amdgpu_devi
 
 void amdgpu_userq_mgr_fini(struct amdgpu_userq_mgr *userq_mgr);
 
+int amdgpu_userqueue_create_object(struct amdgpu_userq_mgr *uq_mgr,
+  struct amdgpu_userq_obj *userq_obj,
+  int size);
+
+void amdgpu_userqueue_destroy_object(struct amdgpu_userq_mgr *uq_mgr,
+struct amdgpu_userq_obj *userq_obj);
 #endif
-- 
2.43.2



[PATCH v10 05/14] drm/amdgpu: create MES-V11 usermode queue for GFX

2024-05-02 Thread Shashank Sharma
A Memory queue descriptor (MQD) of a userqueue defines it in
the hw's context. As the MQD format can vary between different
graphics IPs, we need GFX generation specific handlers to create MQDs.

This patch:
- Adds a new file which will be used for MES based userqueue
  functions targeting GFX and SDMA IP.
- Introduces MQD handler functions for the usermode queues.

V1: Worked on review comments from Alex:
- Make MQD functions GEN and IP specific

V2: Worked on review comments from Alex:
- Reuse the existing adev->mqd[ip] for MQD creation
- Formatting and arrangement of code

V3:
- Integration with doorbell manager

V4: Review comments addressed:
- Do not create a new file for userq, reuse gfx_v11_0.c (Alex)
- Align name of structure members (Luben)
- Don't break up the Cc tag list and the Sob tag list in commit
  message (Luben)
V5:
   - No need to reserve the bo for MQD (Christian).
   - Some more changes to support IP specific MQD creation.

V6:
   - Add a comment reminding us to replace the amdgpu_bo_create_kernel()
 calls while creating MQD object to amdgpu_bo_create() once eviction
 fences are ready (Christian).

V7:
   - Re-arrange userqueue functions in adev instead of uq_mgr (Alex)
   - Use memdup_user instead of copy_from_user (Christian)

V9:
   - Moved userqueue code from gfx_v11_0.c to the new file
 mes_v11_0_userqueue.c so that it can be reused for SDMA userqueues
 as well (Shashank, Alex)

V10: Addressed review comments from Alex
   - Making this patch independent of the IP engine (GFX/SDMA/Compute) and
 specific to MES V11 only, using the generic MQD structure.
   - Split out a separate patch to enable GFX support from here.
   - Verify the MQD VA address to be non-NULL.
   - Add a separate header file.

Cc: Alex Deucher 
Cc: Christian Koenig 
Signed-off-by: Shashank Sharma 
Signed-off-by: Arvind Yadav 
---
 drivers/gpu/drm/amd/amdgpu/Makefile   |   1 +
 .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  | 117 ++
 .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.h  |  30 +
 3 files changed, 148 insertions(+)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
 create mode 100644 drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.h

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
b/drivers/gpu/drm/amd/amdgpu/Makefile
index 2d421f17626d..987fabb2b2c6 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -189,6 +189,7 @@ amdgpu-y += \
amdgpu_mes.o \
mes_v10_1.o \
mes_v11_0.o \
+   mes_v11_0_userqueue.o \
mes_v12_0.o
 
 # add UVD block
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
new file mode 100644
index ..75d7c58418c8
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
@@ -0,0 +1,117 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright 2024 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+#include "amdgpu.h"
+#include "amdgpu_gfx.h"
+#include "v11_structs.h"
+#include "mes_v11_0.h"
+#include "mes_v11_0_userqueue.h"
+
+static int mes_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
+ struct drm_amdgpu_userq_in *args_in,
+ struct amdgpu_usermode_queue *queue)
+{
+   struct amdgpu_device *adev = uq_mgr->adev;
+   struct amdgpu_mqd *mqd_hw_default = &adev->mqds[queue->queue_type];
+   struct drm_amdgpu_userq_in *mqd_user;
+   struct amdgpu_mqd_prop *userq_props;
+   int r;
+
+   /* Incoming MQD parameters from userspace to be saved here */
+   memset(&mqd_user, 0, sizeof(mqd_user));
+
+   /* Structure to initialize MQD for userqueue using generic MQD init function */
+   userq_props = kzallo

[PATCH v10 07/14] drm/amdgpu: map usermode queue into MES

2024-05-02 Thread Shashank Sharma
This patch adds new functions to map/unmap a usermode queue into
the FW, using the MES ring. As soon as this mapping is done, the
queue is considered ready to accept workloads.

V1: Addressed review comments from Alex on the RFC patch series
- Map/Unmap should be IP specific.
V2:
Addressed review comments from Christian:
- Fix the wptr_mc_addr calculation (moved into another patch)
Addressed review comments from Alex:
- Do not add fptrs for map/unmap

V3:  Integration with doorbell manager
V4:  Rebase
V5:  Use gfx_v11_0 for function names (Alex)
V6:  Removed queue->proc/gang/fw_ctx_address variables and doing the
 address calculations locally to keep the queue structure GEN
 independent (Alex)
V7:  Added R-B from Alex
V8:  Rebase
V9:  Rebase
V10: Rebase

Cc: Alex Deucher 
Cc: Christian Koenig 
Reviewed-by: Alex Deucher 
Signed-off-by: Shashank Sharma 
---
 .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  | 74 +++
 1 file changed, 74 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
index 58cfc956cddd..874ea3901319 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
@@ -30,6 +30,69 @@
 #define AMDGPU_USERQ_PROC_CTX_SZ PAGE_SIZE
 #define AMDGPU_USERQ_GANG_CTX_SZ PAGE_SIZE
 
+static int mes_v11_0_userq_map(struct amdgpu_userq_mgr *uq_mgr,
+  struct amdgpu_usermode_queue *queue,
+  struct amdgpu_mqd_prop *userq_props)
+{
+   struct amdgpu_device *adev = uq_mgr->adev;
+   struct amdgpu_userq_obj *ctx = &queue->fw_obj;
+   struct mes_add_queue_input queue_input;
+   int r;
+
+   memset(&queue_input, 0x0, sizeof(struct mes_add_queue_input));
+
+   queue_input.process_va_start = 0;
+   queue_input.process_va_end = (adev->vm_manager.max_pfn - 1) << AMDGPU_GPU_PAGE_SHIFT;
+
+   /* set process quantum to 10 ms and gang quantum to 1 ms as default */
+   queue_input.process_quantum = 10;
+   queue_input.gang_quantum = 1;
+   queue_input.paging = false;
+
+   queue_input.process_context_addr = ctx->gpu_addr;
+   queue_input.gang_context_addr = ctx->gpu_addr + 
AMDGPU_USERQ_PROC_CTX_SZ;
+   queue_input.inprocess_gang_priority = AMDGPU_MES_PRIORITY_LEVEL_NORMAL;
+   queue_input.gang_global_priority_level = 
AMDGPU_MES_PRIORITY_LEVEL_NORMAL;
+
+   queue_input.process_id = queue->vm->pasid;
+   queue_input.queue_type = queue->queue_type;
+   queue_input.mqd_addr = queue->mqd.gpu_addr;
+   queue_input.wptr_addr = userq_props->wptr_gpu_addr;
+   queue_input.queue_size = userq_props->queue_size >> 2;
+   queue_input.doorbell_offset = userq_props->doorbell_index;
+   queue_input.page_table_base_addr = amdgpu_gmc_pd_addr(queue->vm->root.bo);
+
+   amdgpu_mes_lock(&adev->mes);
+   r = adev->mes.funcs->add_hw_queue(&adev->mes, &queue_input);
+   amdgpu_mes_unlock(&adev->mes);
+   if (r) {
+   DRM_ERROR("Failed to map queue in HW, err (%d)\n", r);
+   return r;
+   }
+
+   DRM_DEBUG_DRIVER("Queue (doorbell:%d) mapped successfully\n", userq_props->doorbell_index);
+   return 0;
+}
+
+static void mes_v11_0_userq_unmap(struct amdgpu_userq_mgr *uq_mgr,
+ struct amdgpu_usermode_queue *queue)
+{
+   struct amdgpu_device *adev = uq_mgr->adev;
+   struct mes_remove_queue_input queue_input;
+   struct amdgpu_userq_obj *ctx = &queue->fw_obj;
+   int r;
+
+   memset(&queue_input, 0x0, sizeof(struct mes_remove_queue_input));
+   queue_input.doorbell_offset = queue->doorbell_index;
+   queue_input.gang_context_addr = ctx->gpu_addr + AMDGPU_USERQ_PROC_CTX_SZ;
+
+   amdgpu_mes_lock(&adev->mes);
+   r = adev->mes.funcs->remove_hw_queue(&adev->mes, &queue_input);
+   amdgpu_mes_unlock(&adev->mes);
+   if (r)
+   DRM_ERROR("Failed to unmap queue in HW, err (%d)\n", r);
+}
+
 static int mes_v11_0_userq_create_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
struct amdgpu_usermode_queue *queue,
struct drm_amdgpu_userq_in *mqd_user)
@@ -121,8 +184,18 @@ static int mes_v11_0_userq_mqd_create(struct 
amdgpu_userq_mgr *uq_mgr,
goto free_mqd;
}
 
+   /* Map userqueue into FW using MES */
+   r = mes_v11_0_userq_map(uq_mgr, queue, userq_props);
+   if (r) {
+   DRM_ERROR("Failed to init MQD\n");
+   goto free_ctx;
+   }
+
return 0;
 
+free_ctx:
+   amdgpu_userqueue_destroy_object(uq_mgr, &queue->fw_obj);
+
 free_

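Two of the derived fields that mes_v11_0_userq_map() fills into mes_add_queue_input can be sketched as below: MES takes the ring size in dwords rather than bytes, and the process VA range ends at the last page addressable through the VM's max_pfn. The page shift value of 12 is an assumption standing in for AMDGPU_GPU_PAGE_SHIFT.

```c
#include <assert.h>
#include <stdint.h>

/* MES wants the ring size in dwords; the driver shifts bytes right by 2. */
static uint64_t mes_queue_size_dw(uint64_t queue_size_bytes)
{
	return queue_size_bytes >> 2;
}

/* End of the process VA range, mirroring (max_pfn - 1) << page_shift. */
static uint64_t mes_process_va_end(uint64_t max_pfn, unsigned int page_shift)
{
	return (max_pfn - 1) << page_shift;
}
```

Getting the dword conversion wrong is an easy off-by-4x bug, which is why it is worth calling out explicitly.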
[PATCH v10 08/14] drm/amdgpu: map wptr BO into GART

2024-05-02 Thread Shashank Sharma
To support oversubscription, MES FW expects WPTR BOs to
be mapped into GART, before they are submitted to usermode
queues. This patch adds a function for the same.

V4: fix the wptr value before mapping lookup (Bas, Christian).

V5: Addressed review comments from Christian:
- Either pin object or allocate from GART, but not both.
- All the handling must be done with the VM locks held.

V7: Addressed review comments from Christian:
- Do not take vm->eviction_lock
- Use amdgpu_bo_gpu_offset to get the wptr_bo GPU offset

V8:  Rebase
V9:  Changed the function names from gfx_v11* to mes_v11*
V10: Remove unused adev (Harish)

Cc: Alex Deucher 
Cc: Christian Koenig 
Signed-off-by: Shashank Sharma 
Signed-off-by: Arvind Yadav 
---
 .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  | 76 +++
 .../gpu/drm/amd/include/amdgpu_userqueue.h|  1 +
 2 files changed, 77 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
index 874ea3901319..6ff04647b62e 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
@@ -30,6 +30,73 @@
 #define AMDGPU_USERQ_PROC_CTX_SZ PAGE_SIZE
 #define AMDGPU_USERQ_GANG_CTX_SZ PAGE_SIZE
 
+static int
+mes_v11_0_map_gtt_bo_to_gart(struct amdgpu_bo *bo)
+{
+   int ret;
+
+   ret = amdgpu_bo_reserve(bo, true);
+   if (ret) {
+   DRM_ERROR("Failed to reserve bo. ret %d\n", ret);
+   goto err_reserve_bo_failed;
+   }
+
+   ret = amdgpu_ttm_alloc_gart(&bo->tbo);
+   if (ret) {
+   DRM_ERROR("Failed to bind bo to GART. ret %d\n", ret);
+   goto err_map_bo_gart_failed;
+   }
+
+   amdgpu_bo_unreserve(bo);
+   bo = amdgpu_bo_ref(bo);
+
+   return 0;
+
+err_map_bo_gart_failed:
+   amdgpu_bo_unreserve(bo);
+err_reserve_bo_failed:
+   return ret;
+}
+
+static int
+mes_v11_0_create_wptr_mapping(struct amdgpu_userq_mgr *uq_mgr,
+ struct amdgpu_usermode_queue *queue,
+ uint64_t wptr)
+{
+   struct amdgpu_bo_va_mapping *wptr_mapping;
+   struct amdgpu_vm *wptr_vm;
+   struct amdgpu_userq_obj *wptr_obj = &queue->wptr_obj;
+   int ret;
+
+   wptr_vm = queue->vm;
+   ret = amdgpu_bo_reserve(wptr_vm->root.bo, false);
+   if (ret)
+   return ret;
+
+   wptr &= AMDGPU_GMC_HOLE_MASK;
+   wptr_mapping = amdgpu_vm_bo_lookup_mapping(wptr_vm, wptr >> PAGE_SHIFT);
+   amdgpu_bo_unreserve(wptr_vm->root.bo);
+   if (!wptr_mapping) {
+   DRM_ERROR("Failed to lookup wptr bo\n");
+   return -EINVAL;
+   }
+
+   wptr_obj->obj = wptr_mapping->bo_va->base.bo;
+   if (wptr_obj->obj->tbo.base.size > PAGE_SIZE) {
+   DRM_ERROR("Requested GART mapping for wptr bo larger than one page\n");
+   return -EINVAL;
+   }
+
+   ret = mes_v11_0_map_gtt_bo_to_gart(wptr_obj->obj);
+   if (ret) {
+   DRM_ERROR("Failed to map wptr bo to GART\n");
+   return ret;
+   }
+
+   queue->wptr_obj.gpu_addr = amdgpu_bo_gpu_offset_no_check(wptr_obj->obj);
+   return 0;
+}
+
 static int mes_v11_0_userq_map(struct amdgpu_userq_mgr *uq_mgr,
   struct amdgpu_usermode_queue *queue,
   struct amdgpu_mqd_prop *userq_props)
@@ -61,6 +128,7 @@ static int mes_v11_0_userq_map(struct amdgpu_userq_mgr 
*uq_mgr,
queue_input.queue_size = userq_props->queue_size >> 2;
queue_input.doorbell_offset = userq_props->doorbell_index;
queue_input.page_table_base_addr = amdgpu_gmc_pd_addr(queue->vm->root.bo);
+   queue_input.wptr_mc_addr = queue->wptr_obj.gpu_addr;
 
amdgpu_mes_lock(&adev->mes);
r = adev->mes.funcs->add_hw_queue(&adev->mes, &queue_input);
@@ -184,6 +252,13 @@ static int mes_v11_0_userq_mqd_create(struct 
amdgpu_userq_mgr *uq_mgr,
goto free_mqd;
}
 
+   /* FW expects WPTR BOs to be mapped into GART */
+   r = mes_v11_0_create_wptr_mapping(uq_mgr, queue, userq_props->wptr_gpu_addr);
+   if (r) {
+   DRM_ERROR("Failed to create WPTR mapping\n");
+   goto free_ctx;
+   }
+
/* Map userqueue into FW using MES */
r = mes_v11_0_userq_map(uq_mgr, queue, userq_props);
if (r) {
@@ -213,6 +288,7 @@ mes_v11_0_userq_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr,
struct amdgpu_usermode_queue *queue)
 {
mes_v11_0_userq_unmap(uq_mgr, queue);
+   amdgpu_bo_unref(&queue->wptr_obj.obj);
amdgpu_userqueue_destroy_object(uq_mgr, &queue->fw_obj);
kfree(queue->userq_prop);
amdgpu_u

[PATCH v10 10/14] drm/amdgpu: cleanup leftover queues

2024-05-02 Thread Shashank Sharma
This patch adds code to clean up any leftover userqueues which
a user might have failed to destroy due to a crash or any other
programming error.

V7:  Added Alex's R-B
V8:  Rebase
V9:  Rebase
V10: Rebase

Cc: Alex Deucher 
Cc: Christian Koenig 
Reviewed-by: Alex Deucher 
Suggested-by: Bas Nieuwenhuizen 
Signed-off-by: Bas Nieuwenhuizen 
Signed-off-by: Shashank Sharma 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 27 ++-
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
index fbf6235cfea0..df0e74a3ec8c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -26,6 +26,19 @@
 #include "amdgpu_vm.h"
 #include "amdgpu_userqueue.h"
 
+static void
+amdgpu_userqueue_cleanup(struct amdgpu_userq_mgr *uq_mgr,
+struct amdgpu_usermode_queue *queue,
+int queue_id)
+{
+   struct amdgpu_device *adev = uq_mgr->adev;
+   const struct amdgpu_userq_funcs *uq_funcs = adev->userq_funcs[queue->queue_type];
+
+   uq_funcs->mqd_destroy(uq_mgr, queue);
+   idr_remove(&uq_mgr->userq_idr, queue_id);
+   kfree(queue);
+}
+
 static struct amdgpu_usermode_queue *
 amdgpu_userqueue_find(struct amdgpu_userq_mgr *uq_mgr, int qid)
 {
@@ -146,8 +159,6 @@ amdgpu_userqueue_destroy(struct drm_file *filp, int 
queue_id)
 {
struct amdgpu_fpriv *fpriv = filp->driver_priv;
struct amdgpu_userq_mgr *uq_mgr = &fpriv->userq_mgr;
-   struct amdgpu_device *adev = uq_mgr->adev;
-   const struct amdgpu_userq_funcs *uq_funcs;
struct amdgpu_usermode_queue *queue;
 
mutex_lock(&uq_mgr->userq_mutex);
@@ -159,13 +170,9 @@ amdgpu_userqueue_destroy(struct drm_file *filp, int 
queue_id)
return -EINVAL;
}
 
-   uq_funcs = adev->userq_funcs[queue->queue_type];
-   uq_funcs->mqd_destroy(uq_mgr, queue);
amdgpu_bo_unpin(queue->db_obj.obj);
amdgpu_bo_unref(&queue->db_obj.obj);
-   idr_remove(&uq_mgr->userq_idr, queue_id);
-   kfree(queue);
-
+   amdgpu_userqueue_cleanup(uq_mgr, queue, queue_id);
mutex_unlock(&uq_mgr->userq_mutex);
return 0;
 }
@@ -277,6 +284,12 @@ int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr 
*userq_mgr, struct amdgpu_devi
 
 void amdgpu_userq_mgr_fini(struct amdgpu_userq_mgr *userq_mgr)
 {
+   uint32_t queue_id;
+   struct amdgpu_usermode_queue *queue;
+
+   idr_for_each_entry(&userq_mgr->userq_idr, queue, queue_id)
+   amdgpu_userqueue_cleanup(userq_mgr, queue, queue_id);
+
idr_destroy(&userq_mgr->userq_idr);
mutex_destroy(&userq_mgr->userq_mutex);
 }
-- 
2.43.2



[PATCH v10 09/14] drm/amdgpu: generate doorbell index for userqueue

2024-05-02 Thread Shashank Sharma
The userspace sends us the doorbell object and the relative doorbell
index within that object to be used for the usermode queue, but the FW
expects the absolute doorbell index on the PCI BAR in the MQD. This
patch adds a function to convert the relative doorbell index into an
absolute doorbell index.

V5:  Fix the db object reference leak (Christian)
V6:  Pin the doorbell bo in userqueue_create() function, and unpin it
 in userqueue destroy (Christian)
V7:  Added missing kfree for queue in error cases
 Added Alex's R-B
V8:  Rebase
V9:  Changed the function names from gfx_v11* to mes_v11*
V10: Rebase

Cc: Alex Deucher 
Cc: Christian Koenig 
Reviewed-by: Alex Deucher 
Signed-off-by: Shashank Sharma 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 59 +++
 .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  |  1 +
 .../gpu/drm/amd/include/amdgpu_userqueue.h|  1 +
 3 files changed, 61 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
index edbcb0f4c898..fbf6235cfea0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -94,6 +94,53 @@ void amdgpu_userqueue_destroy_object(struct amdgpu_userq_mgr 
*uq_mgr,
amdgpu_bo_unref(&userq_obj->obj);
 }
 
+static uint64_t
+amdgpu_userqueue_get_doorbell_index(struct amdgpu_userq_mgr *uq_mgr,
+struct amdgpu_usermode_queue *queue,
+struct drm_file *filp,
+uint32_t doorbell_offset)
+{
+   uint64_t index;
+   struct drm_gem_object *gobj;
+   struct amdgpu_userq_obj *db_obj = &queue->db_obj;
+   int r;
+
+   gobj = drm_gem_object_lookup(filp, queue->doorbell_handle);
+   if (gobj == NULL) {
+   DRM_ERROR("Can't find GEM object for doorbell\n");
+   return -EINVAL;
+   }
+
+   db_obj->obj = amdgpu_bo_ref(gem_to_amdgpu_bo(gobj));
+   drm_gem_object_put(gobj);
+
+   /* Pin the BO before generating the index, unpin in queue destroy */
+   r = amdgpu_bo_pin(db_obj->obj, AMDGPU_GEM_DOMAIN_DOORBELL);
+   if (r) {
+   DRM_ERROR("[Usermode queues] Failed to pin doorbell object\n");
+   goto unref_bo;
+   }
+
+   r = amdgpu_bo_reserve(db_obj->obj, true);
+   if (r) {
+   DRM_ERROR("[Usermode queues] Failed to reserve doorbell object\n");
+   goto unpin_bo;
+   }
+
+   index = amdgpu_doorbell_index_on_bar(uq_mgr->adev, db_obj->obj,
+doorbell_offset, sizeof(u64));
+   DRM_DEBUG_DRIVER("[Usermode queues] doorbell index=%lld\n", index);
+   amdgpu_bo_unreserve(db_obj->obj);
+   return index;
+
+unpin_bo:
+   amdgpu_bo_unpin(db_obj->obj);
+
+unref_bo:
+   amdgpu_bo_unref(&db_obj->obj);
+   return r;
+}
+
 static int
 amdgpu_userqueue_destroy(struct drm_file *filp, int queue_id)
 {
@@ -114,6 +161,8 @@ amdgpu_userqueue_destroy(struct drm_file *filp, int 
queue_id)
 
uq_funcs = adev->userq_funcs[queue->queue_type];
uq_funcs->mqd_destroy(uq_mgr, queue);
+   amdgpu_bo_unpin(queue->db_obj.obj);
+   amdgpu_bo_unref(&queue->db_obj.obj);
idr_remove(&uq_mgr->userq_idr, queue_id);
kfree(queue);
 
@@ -129,6 +178,7 @@ amdgpu_userqueue_create(struct drm_file *filp, union 
drm_amdgpu_userq *args)
struct amdgpu_device *adev = uq_mgr->adev;
const struct amdgpu_userq_funcs *uq_funcs;
struct amdgpu_usermode_queue *queue;
+   uint64_t index;
int qid, r = 0;
 
/* Usermode queues are only supported for GFX/SDMA engines as of now */
@@ -158,6 +208,15 @@ amdgpu_userqueue_create(struct drm_file *filp, union 
drm_amdgpu_userq *args)
queue->flags = args->in.flags;
queue->vm = &fpriv->vm;
 
+   /* Convert relative doorbell offset into absolute doorbell index */
+   index = amdgpu_userqueue_get_doorbell_index(uq_mgr, queue, filp, args->in.doorbell_offset);
+   if (index == (uint64_t)-EINVAL) {
+   DRM_ERROR("Failed to get doorbell for queue\n");
+   kfree(queue);
+   goto unlock;
+   }
+   queue->doorbell_index = index;
+
r = uq_funcs->mqd_create(uq_mgr, &args->in, queue);
if (r) {
DRM_ERROR("Failed to create Queue\n");
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
index 6ff04647b62e..d084c5754273 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
@@ -236,6 +236,7 @@ static int mes_v11_0_userq_mqd_create(struct 
amdgpu_userq_mgr *uq_mgr,
userq_props->hqd

[PATCH v10 11/14] drm/amdgpu: enable GFX-V11 userqueue support

2024-05-02 Thread Shashank Sharma
This patch enables GFX-v11 IP support in the usermode queue base
code. It typically:
- adds a GFX-v11 specific MQD structure
- sets IP functions to create and destroy MQDs
- handles the MQD objects coming from userspace

V10: introduced this separate patch for GFX V11 enabling (Alex).

Cc: Alex Deucher 
Cc: Christian Koenig 
Signed-off-by: Shashank Sharma 
Signed-off-by: Arvind Yadav 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c|  3 +++
 .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  | 22 +++
 include/uapi/drm/amdgpu_drm.h | 22 +++
 3 files changed, 47 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
index ad6431013c73..888edc2b4769 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
@@ -49,6 +49,7 @@
 #include "gfx_v11_0_3.h"
 #include "nbio_v4_3.h"
 #include "mes_v11_0.h"
+#include "mes_v11_0_userqueue.h"
 
 #define GFX11_NUM_GFX_RINGS1
 #define GFX11_MEC_HPD_SIZE 2048
@@ -1347,6 +1348,7 @@ static int gfx_v11_0_sw_init(void *handle)
adev->gfx.mec.num_mec = 2;
adev->gfx.mec.num_pipe_per_mec = 4;
adev->gfx.mec.num_queue_per_pipe = 4;
+   adev->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_mes_v11_0_funcs;
break;
case IP_VERSION(11, 0, 1):
case IP_VERSION(11, 0, 4):
@@ -1358,6 +1360,7 @@ static int gfx_v11_0_sw_init(void *handle)
adev->gfx.mec.num_mec = 1;
adev->gfx.mec.num_pipe_per_mec = 4;
adev->gfx.mec.num_queue_per_pipe = 4;
+   adev->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_mes_v11_0_funcs;
break;
default:
adev->gfx.me.num_me = 1;
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
index d084c5754273..80375894c4f3 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
@@ -180,6 +180,28 @@ static int mes_v11_0_userq_create_ctx_space(struct 
amdgpu_userq_mgr *uq_mgr,
return r;
}
 
+   /* Shadow, GDS and CSA objects come directly from userspace */
+   if (mqd_user->ip_type == AMDGPU_HW_IP_GFX) {
+   struct v11_gfx_mqd *mqd = queue->mqd.cpu_ptr;
+   struct drm_amdgpu_userq_mqd_gfx_v11 *mqd_gfx_v11;
+
+   if (mqd_user->mqd_size != sizeof(*mqd_gfx_v11) || !mqd_user->mqd) {
+   DRM_ERROR("Invalid GFX MQD\n");
+   return -EINVAL;
+   }
+
+   mqd_gfx_v11 = (struct drm_amdgpu_userq_mqd_gfx_v11 *)mqd_user->mqd;
+
+   mqd->shadow_base_lo = mqd_gfx_v11->shadow_va & 0xFFFFFFFC;
+   mqd->shadow_base_hi = upper_32_bits(mqd_gfx_v11->shadow_va);
+
+   mqd->gds_bkup_base_lo = mqd_gfx_v11->gds_va & 0xFFFFFFFC;
+   mqd->gds_bkup_base_hi = upper_32_bits(mqd_gfx_v11->gds_va);
+
+   mqd->fw_work_area_base_lo = mqd_gfx_v11->csa_va & 0xFFFFFFFC;
+   mqd->fw_work_area_base_hi = upper_32_bits(mqd_gfx_v11->csa_va);
+   }
+
return 0;
 }
 
diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index f7313e576f06..6798139036a1 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -407,6 +407,28 @@ union drm_amdgpu_userq {
struct drm_amdgpu_userq_out out;
 };
 
+/* GFX V11 IP specific MQD parameters */
+struct drm_amdgpu_userq_mqd_gfx_v11 {
+   /**
+* @shadow_va: Virtual address of the GPU memory to hold the shadow buffer.
+* This must be from a separate GPU object, and must be at least 4 pages
+* in size.
+*/
+   __u64   shadow_va;
+   /**
+* @gds_va: Virtual address of the GPU memory to hold the GDS buffer.
+* This must be from a separate GPU object, and must be at least 1 page
+* in size.
+*/
+   __u64   gds_va;
+   /**
+* @csa_va: Virtual address of the GPU memory to hold the CSA buffer.
+* This must be from a separate GPU object, and must be at least 1 page
+* in size.
+*/
+   __u64   csa_va;
+};
+
 /* vm ioctl */
 #define AMDGPU_VM_OP_RESERVE_VMID  1
 #define AMDGPU_VM_OP_UNRESERVE_VMID2
-- 
2.43.2



[PATCH v10 12/14] drm/amdgpu: enable SDMA-V6 usermode queues

2024-05-02 Thread Shashank Sharma
This patch does necessary modifications to enable the SDMA-v6
usermode queues using the existing userqueue infrastructure.

V9:  introduced this patch in the series
V10: use header file instead of extern (Alex)

Cc: Christian König 
Cc: Alex Deucher 
Signed-off-by: Shashank Sharma 
Signed-off-by: Arvind Yadav 
Signed-off-by: Srinivasan Shanmugam 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 2 +-
 drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c| 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
index df0e74a3ec8c..f7ece0b31ff9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -189,7 +189,7 @@ amdgpu_userqueue_create(struct drm_file *filp, union 
drm_amdgpu_userq *args)
int qid, r = 0;
 
/* Usermode queues are only supported for GFX/SDMA engines as of now */
-   if (args->in.ip_type != AMDGPU_HW_IP_GFX) {
+   if (args->in.ip_type != AMDGPU_HW_IP_GFX && args->in.ip_type != AMDGPU_HW_IP_DMA) {
DRM_ERROR("Usermode queue doesn't support IP type %u\n", args->in.ip_type);
return -EINVAL;
}
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c 
b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
index c833b6b8373b..0989400d0afe 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
@@ -43,6 +43,7 @@
 #include "sdma_common.h"
 #include "sdma_v6_0.h"
 #include "v11_structs.h"
+#include "mes_v11_0_userqueue.h"
 
 MODULE_FIRMWARE("amdgpu/sdma_6_0_0.bin");
 MODULE_FIRMWARE("amdgpu/sdma_6_0_1.bin");
@@ -1273,6 +1274,7 @@ static int sdma_v6_0_sw_init(void *handle)
return -EINVAL;
}
 
+   adev->userq_funcs[AMDGPU_HW_IP_DMA] = &userq_mes_v11_0_funcs;
return r;
 }
 
-- 
2.43.2



[PATCH v10 13/14] drm/amdgpu: enable compute/gfx usermode queue

2024-05-02 Thread Shashank Sharma
This patch does the necessary changes required to
enable compute workload support using the existing
usermode queues infrastructure.

V9:  Patch introduced
V10: Add custom IP specific mqd structure for compute (Alex)

Cc: Alex Deucher 
Cc: Christian Koenig 
Signed-off-by: Arvind Yadav 
Signed-off-by: Shashank Sharma 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c|  3 ++-
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c   |  2 ++
 drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c | 15 +++
 include/uapi/drm/amdgpu_drm.h| 10 ++
 4 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
index f7ece0b31ff9..84bce9434102 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -189,7 +189,8 @@ amdgpu_userqueue_create(struct drm_file *filp, union 
drm_amdgpu_userq *args)
int qid, r = 0;
 
/* Usermode queues are only supported for GFX/SDMA engines as of now */
-   if (args->in.ip_type != AMDGPU_HW_IP_GFX && args->in.ip_type != AMDGPU_HW_IP_DMA) {
+   if (args->in.ip_type != AMDGPU_HW_IP_GFX && args->in.ip_type != AMDGPU_HW_IP_DMA
+   && args->in.ip_type != AMDGPU_HW_IP_COMPUTE) {
DRM_ERROR("Usermode queue doesn't support IP type %u\n", args->in.ip_type);
return -EINVAL;
}
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
index 888edc2b4769..46304d09c4bd 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
@@ -1349,6 +1349,7 @@ static int gfx_v11_0_sw_init(void *handle)
adev->gfx.mec.num_pipe_per_mec = 4;
adev->gfx.mec.num_queue_per_pipe = 4;
adev->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_mes_v11_0_funcs;
+   adev->userq_funcs[AMDGPU_HW_IP_COMPUTE] = &userq_mes_v11_0_funcs;
break;
case IP_VERSION(11, 0, 1):
case IP_VERSION(11, 0, 4):
@@ -1361,6 +1362,7 @@ static int gfx_v11_0_sw_init(void *handle)
adev->gfx.mec.num_pipe_per_mec = 4;
adev->gfx.mec.num_queue_per_pipe = 4;
adev->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_mes_v11_0_funcs;
+   adev->userq_funcs[AMDGPU_HW_IP_COMPUTE] = &userq_mes_v11_0_funcs;
break;
default:
adev->gfx.me.num_me = 1;
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
index 80375894c4f3..2ae6f720dc66 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
@@ -260,6 +260,21 @@ static int mes_v11_0_userq_mqd_create(struct 
amdgpu_userq_mgr *uq_mgr,
userq_props->use_doorbell = true;
userq_props->doorbell_index = queue->doorbell_index;
 
+   if (queue->queue_type == AMDGPU_HW_IP_COMPUTE) {
+   struct drm_amdgpu_userq_mqd_compute_gfx_v11 *compute_mqd;
+
+   if (mqd_user->mqd_size != sizeof(*compute_mqd)) {
+   DRM_ERROR("Invalid compute IP MQD size\n");
+   goto free_mqd_user;
+   }
+   compute_mqd = (struct drm_amdgpu_userq_mqd_compute_gfx_v11 *)mqd_user->mqd;
+
+   userq_props->eop_gpu_addr = compute_mqd->eop_va;
+   userq_props->hqd_pipe_priority = AMDGPU_GFX_PIPE_PRIO_NORMAL;
+   userq_props->hqd_queue_priority = AMDGPU_GFX_QUEUE_PRIORITY_MINIMUM;
+   userq_props->hqd_active = false;
+   }
+
queue->userq_prop = userq_props;
 
r = mqd_hw_default->init_mqd(adev, (void *)queue->mqd.cpu_ptr, userq_props);
diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index 6798139036a1..7ffa9ee885e6 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -429,6 +429,16 @@ struct drm_amdgpu_userq_mqd_gfx_v11 {
__u64   csa_va;
 };
 
+/* GFX V11 Compute IP specific MQD parameters */
+struct drm_amdgpu_userq_mqd_compute_gfx_v11 {
+   /**
+* @eop_va: Virtual address of the GPU memory to hold the EOP buffer.
+* This must be from a separate GPU object, and must be at least 1 page
+* in size.
+*/
+   __u64   eop_va;
+};
+
 /* vm ioctl */
 #define AMDGPU_VM_OP_RESERVE_VMID  1
 #define AMDGPU_VM_OP_UNRESERVE_VMID2
-- 
2.43.2



[PATCH v10 14/14] drm/amdgpu: add kernel config for gfx-userqueue

2024-05-02 Thread Shashank Sharma
This patch:
- adds a kernel config option "CONFIG_DRM_AMDGPU_NAVI3X_USERQ"
- moves the userqueue initialization code for all IPs under
  this flag

so that the userqueue works only when the config is enabled.
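For reference, the standard kernel convention is that Kconfig declares the
bare symbol while the build system exports it with a `CONFIG_` prefix, and
both the Makefile and any `#ifdef` in C code must use the prefixed form. A
minimal sketch of that convention (file names taken from this series):

```make
# Kconfig declares DRM_AMDGPU_NAVI3X_USERQ; the build system exports
# CONFIG_DRM_AMDGPU_NAVI3X_USERQ, which gates the object file here
# and any #ifdef CONFIG_DRM_AMDGPU_NAVI3X_USERQ blocks in C code.
amdgpu-$(CONFIG_DRM_AMDGPU_NAVI3X_USERQ) += mes_v11_0_userqueue.o
```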

V9:  Introduce this patch
V10: Call it CONFIG_DRM_AMDGPU_NAVI3X_USERQ instead of
 CONFIG_DRM_AMDGPU_USERQ_GFX (Christian)

Cc: Alex Deucher 
Cc: Christian Koenig 
Signed-off-by: Shashank Sharma 
---
 drivers/gpu/drm/amd/amdgpu/Kconfig | 8 
 drivers/gpu/drm/amd/amdgpu/Makefile| 4 +++-
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 4 
 drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c | 3 +++
 4 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/Kconfig 
b/drivers/gpu/drm/amd/amdgpu/Kconfig
index 22d88f8ef527..a7c85eeec756 100644
--- a/drivers/gpu/drm/amd/amdgpu/Kconfig
+++ b/drivers/gpu/drm/amd/amdgpu/Kconfig
@@ -80,6 +80,14 @@ config DRM_AMDGPU_WERROR
  Add -Werror to the build flags for amdgpu.ko.
  Only enable this if you are warning code for amdgpu.ko.
 
+config DRM_AMDGPU_NAVI3X_USERQ
+   bool "Enable Navi 3x gfx usermode queues"
+   depends on DRM_AMDGPU
+   default n
+   help
+ Choose this option to enable usermode queue support for GFX/SDMA/Compute
+ workload submission. This feature is supported on Navi 3X only.
+
 source "drivers/gpu/drm/amd/acp/Kconfig"
 source "drivers/gpu/drm/amd/display/Kconfig"
 source "drivers/gpu/drm/amd/amdkfd/Kconfig"
diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
b/drivers/gpu/drm/amd/amdgpu/Makefile
index 987fabb2b2c6..0a64f2c57def 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -189,9 +189,11 @@ amdgpu-y += \
amdgpu_mes.o \
mes_v10_1.o \
mes_v11_0.o \
-   mes_v11_0_userqueue.o \
mes_v12_0.o
 
+# add GFX userqueue support
+amdgpu-$(CONFIG_DRM_AMDGPU_NAVI3X_USERQ) += mes_v11_0_userqueue.o
+
 # add UVD block
 amdgpu-y += \
amdgpu_uvd.o \
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
index 46304d09c4bd..5c4bf243ed04 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
@@ -1348,8 +1348,10 @@ static int gfx_v11_0_sw_init(void *handle)
adev->gfx.mec.num_mec = 2;
adev->gfx.mec.num_pipe_per_mec = 4;
adev->gfx.mec.num_queue_per_pipe = 4;
+#ifdef CONFIG_DRM_AMDGPU_NAVI3X_USERQ
adev->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_mes_v11_0_funcs;
adev->userq_funcs[AMDGPU_HW_IP_COMPUTE] = &userq_mes_v11_0_funcs;
+#endif
break;
case IP_VERSION(11, 0, 1):
case IP_VERSION(11, 0, 4):
@@ -1361,8 +1363,10 @@ static int gfx_v11_0_sw_init(void *handle)
adev->gfx.mec.num_mec = 1;
adev->gfx.mec.num_pipe_per_mec = 4;
adev->gfx.mec.num_queue_per_pipe = 4;
+#ifdef CONFIG_DRM_AMDGPU_NAVI3X_USERQ
adev->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_mes_v11_0_funcs;
adev->userq_funcs[AMDGPU_HW_IP_COMPUTE] = &userq_mes_v11_0_funcs;
+#endif
break;
default:
adev->gfx.me.num_me = 1;
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c 
b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
index 0989400d0afe..f6a2c2daa00f 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
@@ -1274,7 +1274,10 @@ static int sdma_v6_0_sw_init(void *handle)
return -EINVAL;
}
 
+#ifdef CONFIG_DRM_AMDGPU_NAVI3X_USERQ
adev->userq_funcs[AMDGPU_HW_IP_DMA] = &userq_mes_v11_0_funcs;
+#endif
+
return r;
 }
 
-- 
2.43.2



[PATCH 0/4] AMDGPU userqueue suspend/resume

2024-05-08 Thread Shashank Sharma
This patch series adds support for suspending and resuming the gfx
usermode queues. It also adds eviction fences which are primarily used
by usermode queues.

This patch series is dependent on basic AMDGPU usermode queue series
which is being reviewed here:
https://patchwork.freedesktop.org/series/113675/

Shashank Sharma (4):
  drm/amdgpu: add gfx eviction fence helpers
  drm/amdgpu: add core userqueue suspend/resume functions
  drm/amdgpu: suspend gfx userqueues
  drm/amdgpu: add userqueue resume

 drivers/gpu/drm/amd/amdgpu/Makefile   |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu.h   |  21 ++
 .../drm/amd/amdgpu/amdgpu_eviction_fence.c| 112 
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c   |  10 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c   |   8 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.h|   1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 243 ++
 .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  |  31 +++
 .../gpu/drm/amd/include/amdgpu_userqueue.h|  12 +
 9 files changed, 437 insertions(+), 3 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c

-- 
2.43.2



[PATCH 2/4] drm/amdgpu: add core userqueue suspend/resume functions

2024-05-08 Thread Shashank Sharma
This patch adds userqueue suspend/resume functions at the
core MES V11 IP level.

Cc: Alex Deucher 
Cc: Christian Koenig 
Signed-off-by: Shashank Sharma 
---
 .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  | 31 +++
 .../gpu/drm/amd/include/amdgpu_userqueue.h|  5 +++
 2 files changed, 36 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
index 412970376b49..4e05da3c8f53 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
@@ -321,7 +321,38 @@ mes_v11_0_userq_mqd_destroy(struct amdgpu_userq_mgr 
*uq_mgr,
amdgpu_userqueue_destroy_object(uq_mgr, &queue->mqd);
 }
 
+static int mes_v11_0_userq_suspend(struct amdgpu_userq_mgr *uq_mgr,
+  struct amdgpu_usermode_queue *queue)
+{
+   if (queue->queue_active) {
+   mes_v11_0_userq_unmap(uq_mgr, queue);
+   queue->queue_active = 0;
+   }
+
+   return 0;
+}
+
+static int mes_v11_0_userq_resume(struct amdgpu_userq_mgr *uq_mgr,
+ struct amdgpu_usermode_queue *queue)
+{
+   int ret;
+
+   if (queue->queue_active)
+   return 0;
+
+   ret = mes_v11_0_userq_map(uq_mgr, queue, queue->userq_prop);
+   if (ret) {
+   DRM_ERROR("Failed to resume queue\n");
+   return ret;
+   }
+
+   queue->queue_active = 1;
+   return 0;
+}
+
 const struct amdgpu_userq_funcs userq_mes_v11_0_funcs = {
.mqd_create = mes_v11_0_userq_mqd_create,
.mqd_destroy = mes_v11_0_userq_mqd_destroy,
+   .suspend = mes_v11_0_userq_suspend,
+   .resume = mes_v11_0_userq_resume,
 };
diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h 
b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
index 5416de0bdf25..afaf93faa824 100644
--- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
+++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
@@ -37,6 +37,7 @@ struct amdgpu_userq_obj {
 
 struct amdgpu_usermode_queue {
int queue_type;
+   uint8_t queue_active;
uint64_tdoorbell_handle;
uint64_tdoorbell_index;
uint64_tflags;
@@ -57,6 +58,10 @@ struct amdgpu_userq_funcs {
  struct amdgpu_usermode_queue *queue);
void (*mqd_destroy)(struct amdgpu_userq_mgr *uq_mgr,
struct amdgpu_usermode_queue *uq);
+   int (*suspend)(struct amdgpu_userq_mgr *uq_mgr,
+  struct amdgpu_usermode_queue *queue);
+   int (*resume)(struct amdgpu_userq_mgr *uq_mgr,
+ struct amdgpu_usermode_queue *queue);
 };
 
 /* Usermode queues for gfx */
-- 
2.43.2



[PATCH 3/4] drm/amdgpu: suspend gfx userqueues

2024-05-08 Thread Shashank Sharma
This patch adds suspend support for gfx userqueues. It typically does
the following:
- adds an enable_signaling function for the eviction fence, so that it
  can trigger the userqueue suspend,
- adds a delayed function for suspending the userqueues, to suspend all
  the queues under this userq manager and signals the eviction fence,
- adds reference of userq manager in the eviction fence container so
  that it can be used in the suspend function.

Cc: Alex Deucher 
Cc: Christian Koenig 
Signed-off-by: Shashank Sharma 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h   |  1 +
 .../drm/amd/amdgpu/amdgpu_eviction_fence.c|  6 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.h|  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 77 +++
 .../gpu/drm/amd/include/amdgpu_userqueue.h|  6 ++
 5 files changed, 91 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index a37193fc9ddc..1856fe11dd05 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -469,6 +469,7 @@ struct amdgpu_eviction_fence {
struct dma_fence base;
spinlock_t   lock;
char timeline_name[TASK_COMM_LEN];
+   struct amdgpu_userq_mgr *uq_mgr;
 };
 
 /*
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
index 1a03f040ccc8..3f806e44f614 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
@@ -39,10 +39,16 @@ amdgpu_ev_fence_get_timeline_name(struct dma_fence *f)
return ef->timeline_name;
 }
 
+static bool amdgpu_ev_fence_enable_signaling(struct dma_fence *f)
+{
+   return !amdgpu_userqueue_enable_signaling(f);
+}
+
 static const struct dma_fence_ops amdgpu_eviction_fence_ops = {
.use_64bit_seqno = true,
.get_driver_name = amdgpu_ev_fence_get_driver_name,
.get_timeline_name = amdgpu_ev_fence_get_timeline_name,
+   .enable_signaling = amdgpu_ev_fence_enable_signaling,
 };
 
 struct amdgpu_eviction_fence *
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
index fa03d9e4874c..8c13de7f2a19 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
@@ -111,6 +111,7 @@ struct amdgpu_bo {
 #endif
struct kgd_mem  *kfd_bo;
 
+
/*
 * For GPUs with spatial partitioning, xcp partition number, -1 means
 * any partition. For other ASICs without spatial partition, always 0
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
index 7a89f378c97f..fdbd542e7f53 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -23,11 +23,16 @@
  */
 #include 
 #include 
+#include 
 #include "amdgpu.h"
 #include "amdgpu_vm.h"
 #include "amdgpu_userqueue.h"
 #include "amdgpu_userq_fence.h"
 
+#define work_to_uq_mgr(w, name) container_of(w, struct amdgpu_userq_mgr, name)
+#define uq_mgr_to_fpriv(u) container_of(u, struct amdgpu_fpriv, userq_mgr)
+#define to_ev_fence(f) container_of(f, struct amdgpu_eviction_fence, base)
+
 static void amdgpu_userq_walk_and_drop_fence_drv(struct xarray *xa)
 {
struct amdgpu_userq_fence_driver *fence_drv;
@@ -226,6 +231,7 @@ int amdgpu_userqueue_update_bo_mapping(struct drm_file 
*filp, struct amdgpu_bo *
}
 
drm_syncobj_add_point(syncobj, chain, bo_va->last_pt_update, (uint64_t)point);
+
return 0;
 }
 
@@ -392,12 +398,83 @@ int amdgpu_userq_ioctl(struct drm_device *dev, void *data,
return r;
 }
 
+static int
+amdgpu_userqueue_suspend_all(struct amdgpu_userq_mgr *uq_mgr)
+{
+   struct amdgpu_device *adev = uq_mgr->adev;
+   const struct amdgpu_userq_funcs *userq_funcs;
+   struct amdgpu_usermode_queue *queue;
+   int queue_id, ret = 0;
+
+   userq_funcs = adev->userq_funcs[AMDGPU_HW_IP_GFX];
+
+   /* Suspend all the queues for this process */
+   idr_for_each_entry(&uq_mgr->userq_idr, queue, queue_id) {
+   ret = userq_funcs->suspend(uq_mgr, queue);
+   if (ret)
+   DRM_ERROR("Failed to suspend queue\n");
+   }
+
+   return ret;
+}
+
+static void
+amdgpu_userqueue_suspend_worker(struct work_struct *work)
+{
+   int ret;
+   struct dma_fence *fence;
+   struct amdgpu_userq_mgr *uq_mgr = work_to_uq_mgr(work, suspend_work.work);
+   struct amdgpu_fpriv *fpriv = uq_mgr_to_fpriv(uq_mgr);
+
+   mutex_lock(&uq_mgr->userq_mutex);
+   ret = amdgpu_userqueue_suspend_all(uq_mgr);
+   if (ret) {
+   DRM_ERROR("Failed to evict userqueue\n");
+   goto unlock;
+   }
+
+   /* Signal current eviction fence */
+ 

[PATCH 4/4] drm/amdgpu: add userqueue resume

2024-05-08 Thread Shashank Sharma
This patch adds support for userqueue resume. It typically:
- adds a new delayed work for resuming all the queues.
- schedules this delayed work from the suspend work.
- validates the BOs and replaces the eviction fence before resuming all
  the queues running under this instance of userq manager.

Cc: Alex Deucher 
Cc: Christian Koenig 
Signed-off-by: Shashank Sharma 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 166 ++
 .../gpu/drm/amd/include/amdgpu_userqueue.h|   1 +
 2 files changed, 167 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
index fdbd542e7f53..02ddd713d068 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -398,6 +398,167 @@ int amdgpu_userq_ioctl(struct drm_device *dev, void *data,
return r;
 }
 
+static int
+amdgpu_userqueue_resume_all(struct amdgpu_userq_mgr *uq_mgr)
+{
+   struct amdgpu_device *adev = uq_mgr->adev;
+   const struct amdgpu_userq_funcs *userq_funcs;
+   struct amdgpu_usermode_queue *queue;
+   int queue_id, ret = 0;
+
+   userq_funcs = adev->userq_funcs[AMDGPU_HW_IP_GFX];
+
+   /* Resume all the queues for this process */
+   idr_for_each_entry(&uq_mgr->userq_idr, queue, queue_id) {
+   ret = userq_funcs->resume(uq_mgr, queue);
+   if (ret)
+   DRM_ERROR("Failed to resume queue %d\n", queue_id);
+   }
+
+   return ret;
+}
+
+static int
+amdgpu_userqueue_replace_ev_fence(struct amdgpu_userq_mgr *uq_mgr,
+ struct drm_exec *exec)
+{
+   int ret;
+   struct amdgpu_fpriv *fpriv = uq_mgr_to_fpriv(uq_mgr);
+   struct amdgpu_vm *vm = &fpriv->vm;
+   struct amdgpu_eviction_fence *old_ef, *new_ef;
+   struct amdgpu_bo_va *bo_va, *tmp;
+
+   old_ef = fpriv->ev_fence;
+   new_ef = amdgpu_eviction_fence_create(fpriv);
+   if (!new_ef) {
+   DRM_ERROR("Failed to create new eviction fence\n");
+   return -ENOMEM;
+   }
+
+   list_for_each_entry_safe(bo_va, tmp, &vm->done, base.vm_status) {
+   struct amdgpu_bo *bo = bo_va->base.bo;
+
+   /* Skip pinned BOs */
+   if (bo->tbo.pin_count)
+   continue;
+
+   ret = drm_exec_lock_obj(exec, &bo->tbo.base);
+   if (unlikely(ret)) {
+   DRM_ERROR("Failed to lock BO for eviction fence 
replacement\n");
+   goto free_err;
+   }
+
+   /* replace the old eviction fence with new one */
+   amdgpu_eviction_fence_detach(fpriv, old_ef, bo);
+   ret = amdgpu_eviction_fence_attach(new_ef, bo);
+   if (ret) {
+   DRM_ERROR("Failed to attach new eviction fence\n");
+   goto free_err;
+   }
+   }
+
+   /* Update the new eviction fence */
+   fpriv->ev_fence = new_ef;
+   kfree(old_ef);
+   return 0;
+
+free_err:
+   kfree(new_ef);
+   return ret;
+}
+
+/* Expects drm_exec_until_all_locked called on this exec */
+static int
+amdgpu_userqueue_validate_bos(struct amdgpu_userq_mgr *uq_mgr,
+ struct drm_exec *exec)
+{
+   int ret;
+   struct amdgpu_bo *bo;
+   struct amdgpu_bo_va *bo_va, *tmp;
+   struct amdgpu_fpriv *fpriv = uq_mgr_to_fpriv(uq_mgr);
+   struct amdgpu_vm *vm = &fpriv->vm;
+
+   list_for_each_entry_safe(bo_va, tmp, &vm->done, base.vm_status) {
+   bo = bo_va->base.bo;
+   ret = drm_exec_lock_obj(exec, &bo->tbo.base);
+   if (unlikely(ret)) {
+   DRM_ERROR("Failed to exec lock for validation\n");
+   goto unlock_all;
+   }
+   }
+
+   list_for_each_entry_safe(bo_va, tmp, &vm->invalidated, base.vm_status) {
+   bo = bo_va->base.bo;
+   ret = drm_exec_lock_obj(exec, &bo->tbo.base);
+   if (unlikely(ret)) {
+   DRM_ERROR("Failed to lock BO for validation\n");
+   goto unlock_all;
+   }
+
+   ret = amdgpu_bo_reserve(bo, false);
+   if (unlikely(ret)) {
+   DRM_ERROR("Failed to reserve BO for validation\n");
+   goto unlock_all;
+   }
+
+   ret = amdgpu_userqueue_validate_bo(bo);
+   amdgpu_bo_unreserve(bo);
+   if (ret) {
+   DRM_ERROR("Failed to validate BO\n");
+   goto unlock_all;
+   }
+   }
+
+   ret = amdgpu_vm_handle_moved(uq_mgr->adev, vm, NULL);
+   if 

[PATCH 1/4] drm/amdgpu: add gfx eviction fence helpers

2024-05-08 Thread Shashank Sharma
This patch adds basic eviction fence framework for the gfx buffers.
The idea is to:
- One eviction fence is created per gfx process, at kms_open.
- This same fence is attached to all the gem buffers created
  by this process.

This framework will be further used for usermode queues.

V2: Addressed review comments from Christian
- keep fence_ctx and fence_seq directly in fpriv
- eviction_fence should be dynamically allocated
- do not save eviction fence instance in BO, there could be many
  such fences attached to one BO
- use dma_resv_replace_fence() in detach

Cc: Christian Koenig 
Cc: Alex Deucher 
Signed-off-by: Shashank Sharma 
---
 drivers/gpu/drm/amd/amdgpu/Makefile   |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu.h   |  20 
 .../drm/amd/amdgpu/amdgpu_eviction_fence.c| 106 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c   |  10 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c   |   8 +-
 5 files changed, 143 insertions(+), 3 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
b/drivers/gpu/drm/amd/amdgpu/Makefile
index b0103f404957..9743bf06d6aa 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -82,7 +82,7 @@ amdgpu-y += amdgpu_device.o amdgpu_doorbell_mgr.o 
amdgpu_kms.o \
amdgpu_fw_attestation.o amdgpu_securedisplay.o \
amdgpu_eeprom.o amdgpu_mca.o amdgpu_psp_ta.o amdgpu_lsdma.o \
amdgpu_ring_mux.o amdgpu_xcp.o amdgpu_seq64.o amdgpu_aca.o \
-   amdgpu_userq_fence.o
+   amdgpu_userq_fence.o amdgpu_eviction_fence.o
 
 amdgpu-$(CONFIG_PROC_FS) += amdgpu_fdinfo.o
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 2d5ef2e74c71..a37193fc9ddc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -465,6 +465,11 @@ struct amdgpu_flip_work {
boolasync;
 };
 
+struct amdgpu_eviction_fence {
+   struct dma_fence base;
+   spinlock_t   lock;
+   char timeline_name[TASK_COMM_LEN];
+};
 
 /*
  * file private structure
@@ -479,6 +484,12 @@ struct amdgpu_fpriv {
struct idr  bo_list_handles;
struct amdgpu_ctx_mgr   ctx_mgr;
struct amdgpu_userq_mgr userq_mgr;
+
+   /* Eviction fence infra */
+   u64 ev_fence_ctx;
+   atomic_tev_fence_seq;
+   struct amdgpu_eviction_fence *ev_fence;
+
/** GPU partition selection */
uint32_txcp_id;
 };
@@ -1480,6 +1491,15 @@ void amdgpu_disable_vblank_kms(struct drm_crtc *crtc);
 int amdgpu_info_ioctl(struct drm_device *dev, void *data,
  struct drm_file *filp);
 
+/* Eviction fence */
+struct amdgpu_eviction_fence *amdgpu_eviction_fence_create(struct amdgpu_fpriv 
*fpriv);
+void amdgpu_eviction_fence_destroy(struct amdgpu_fpriv *fpriv);
+int amdgpu_eviction_fence_attach(struct amdgpu_eviction_fence *ev_fence,
+struct amdgpu_bo *bo);
+void amdgpu_eviction_fence_detach(struct amdgpu_fpriv *fpriv,
+ struct amdgpu_eviction_fence *ev_fence,
+ struct amdgpu_bo *bo);
+
 /*
  * functions used by amdgpu_encoder.c
  */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
new file mode 100644
index ..1a03f040ccc8
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
@@ -0,0 +1,106 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright 2024 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+#include 
+#include "amdgpu.h"
+
+static const char *
+amdgpu_ev_fence_get_driver_name(struct dma_fence *fence)
+{
+   return &qu

[PATCH] drm/amdgpu: change vm->task_info handling

2024-01-02 Thread Shashank Sharma
drm/amdgpu: change vm->task_info handling

This patch changes the handling and lifecycle of vm->task_info object.
The major changes are:
- vm->task_info is a dynamically allocated ptr now, and its usage is
  reference counted.
- introducing two new helper funcs for task_info lifecycle management
- amdgpu_vm_get_task_info: reference counts up task_info before
  returning this info
- amdgpu_vm_put_task_info: reference counts down task_info
- the last put on task_info frees it from the vm.

This patch also does logistical changes required for existing usage
of vm->task_info.

Cc: Christian Koenig 
Cc: Alex Deucher 
Cc: Felix Kuehling 
Signed-off-by: Shashank Sharma 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c |   7 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  |  15 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c |  17 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c  | 142 +---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h  |  24 +++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c   |   2 +-
 drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c  |  27 ++--
 drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c  |  28 ++--
 drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c   |  26 ++--
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c   |  28 ++--
 drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c  |  20 +--
 drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c|  19 +--
 drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c |  17 +--
 13 files changed, 259 insertions(+), 113 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index a4faea4fa0b5..111f8afb03a1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -1763,9 +1763,12 @@ static int amdgpu_debugfs_vm_info_show(struct seq_file 
*m, void *unused)
list_for_each_entry(file, &dev->filelist, lhead) {
struct amdgpu_fpriv *fpriv = file->driver_priv;
struct amdgpu_vm *vm = &fpriv->vm;
+   struct amdgpu_task_info *ti;
+
+   ti = amdgpu_vm_get_task_info_vm(vm);
+   seq_printf(m, "pid:%d\tProcess:%s --\n", ti->pid, 
ti->process_name);
+   amdgpu_vm_put_task_info_vm(ti, vm);
 
-   seq_printf(m, "pid:%d\tProcess:%s --\n",
-   vm->task_info.pid, vm->task_info.process_name);
r = amdgpu_bo_reserve(vm->root.bo, true);
if (r)
break;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 2b8356699f23..00516fa178b5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4952,10 +4952,17 @@ int amdgpu_do_asic_reset(struct list_head 
*device_list_handle,
tmp_adev->reset_vram_lost = vram_lost;
memset(&tmp_adev->reset_task_info, 0,

sizeof(tmp_adev->reset_task_info));
-   if (reset_context->job && 
reset_context->job->vm)
-   tmp_adev->reset_task_info =
-   
reset_context->job->vm->task_info;
-   amdgpu_reset_capture_coredumpm(tmp_adev);
+   if (reset_context->job && 
reset_context->job->vm) {
+   struct amdgpu_task_info *ti;
+   struct amdgpu_vm *vm = 
reset_context->job->vm;
+
+   ti = amdgpu_vm_get_task_info_vm(vm);
+   if (ti) {
+   tmp_adev->reset_task_info = *ti;
+   
amdgpu_reset_capture_coredumpm(tmp_adev);
+   amdgpu_vm_put_task_info_vm(ti, 
vm);
+   }
+   }
 #endif
if (vram_lost) {
DRM_INFO("VRAM is lost due to GPU 
reset!\n");
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index 78476bc75b4e..b89ee6ab7db9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -35,7 +35,7 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct 
drm_sched_job *s_job)
 {
struct amdgpu_ring *ring = to_amdgpu_ring(s_job->sched);
struct amdgpu_job *job = to_amdgpu_job(s_job);
-   struct amdgpu_task_info ti;
+   struct amdgpu_task_info *ti;
struct amdgpu_device *adev = ring->adev;
int idx;
int r;
@@ -58,12 +58,15 @@ static e

Re: [PATCH] drm/amdgpu: change vm->task_info handling

2024-01-03 Thread Shashank Sharma

Hey Felix,

On 02/01/2024 19:02, Felix Kuehling wrote:


On 2024-01-02 06:12, Shashank Sharma wrote:

drm/amdgpu: change vm->task_info handling

This patch changes the handling and lifecycle of vm->task_info object.
The major changes are:
- vm->task_info is a dynamically allocated ptr now, and its usage is
   reference counted.
- introducing two new helper funcs for task_info lifecycle management
 - amdgpu_vm_get_task_info: reference counts up task_info before
   returning this info
 - amdgpu_vm_put_task_info: reference counts down task_info
- last put to task_info() frees task_info from the vm.

This patch also does logistical changes required for existing usage
of vm->task_info.

Cc: Christian Koenig 
Cc: Alex Deucher 
Cc: Felix Kuehling 
Signed-off-by: Shashank Sharma 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c |   7 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  |  15 ++-
  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c |  17 ++-
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c  | 142 +---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h  |  24 +++-
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c   |   2 +-
  drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c  |  27 ++--
  drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c  |  28 ++--
  drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c   |  26 ++--
  drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c   |  28 ++--
  drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c  |  20 +--
  drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c    |  19 +--
  drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c |  17 +--
  13 files changed, 259 insertions(+), 113 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c

index a4faea4fa0b5..111f8afb03a1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -1763,9 +1763,12 @@ static int amdgpu_debugfs_vm_info_show(struct 
seq_file *m, void *unused)

  list_for_each_entry(file, &dev->filelist, lhead) {
  struct amdgpu_fpriv *fpriv = file->driver_priv;
  struct amdgpu_vm *vm = &fpriv->vm;
+    struct amdgpu_task_info *ti;
+
+    ti = amdgpu_vm_get_task_info_vm(vm);
+    seq_printf(m, "pid:%d\tProcess:%s --\n", ti->pid, 
ti->process_name);

+    amdgpu_vm_put_task_info_vm(ti, vm);
  -    seq_printf(m, "pid:%d\tProcess:%s --\n",
-    vm->task_info.pid, vm->task_info.process_name);
  r = amdgpu_bo_reserve(vm->root.bo, true);
  if (r)
  break;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c

index 2b8356699f23..00516fa178b5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4952,10 +4952,17 @@ int amdgpu_do_asic_reset(struct list_head 
*device_list_handle,

  tmp_adev->reset_vram_lost = vram_lost;
  memset(&tmp_adev->reset_task_info, 0,
  sizeof(tmp_adev->reset_task_info));
-    if (reset_context->job && reset_context->job->vm)
-    tmp_adev->reset_task_info =
- reset_context->job->vm->task_info;
-    amdgpu_reset_capture_coredumpm(tmp_adev);
+    if (reset_context->job && reset_context->job->vm) {
+    struct amdgpu_task_info *ti;
+    struct amdgpu_vm *vm = reset_context->job->vm;
+
+    ti = amdgpu_vm_get_task_info_vm(vm);
+    if (ti) {
+    tmp_adev->reset_task_info = *ti;
+ amdgpu_reset_capture_coredumpm(tmp_adev);
+    amdgpu_vm_put_task_info_vm(ti, vm);
+    }
+    }
  #endif
  if (vram_lost) {
  DRM_INFO("VRAM is lost due to GPU reset!\n");
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c

index 78476bc75b4e..b89ee6ab7db9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -35,7 +35,7 @@ static enum drm_gpu_sched_stat 
amdgpu_job_timedout(struct drm_sched_job *s_job)

  {
  struct amdgpu_ring *ring = to_amdgpu_ring(s_job->sched);
  struct amdgpu_job *job = to_amdgpu_job(s_job);
-    struct amdgpu_task_info ti;
+    struct amdgpu_task_info *ti;
  struct amdgpu_device *adev = ring->adev;
  int idx;
  int r;
@@ -58,12 +58,15 @@ static enum drm_gpu_sched_stat 
amdgpu_job_timedout(struct drm_sched_job *s_job)

  goto exit;
  }
  -    amdgpu_vm_get_task_info(ring->adev, job->pasid, &ti);
-    DRM_ERROR("ring %s timeout, signaled seq=%u, emitted seq=%u\n",
-  job->base.sched->name, 
atomic_read(&ring->fence_drv.last_seq),

-  ring->fence_drv.sync

[PATCH v2] drm/amdgpu: change vm->task_info handling

2024-01-18 Thread Shashank Sharma
This patch changes the handling and lifecycle of vm->task_info object.
The major changes are:
- vm->task_info is a dynamically allocated ptr now, and its usage is
  reference counted.
- introducing two new helper funcs for task_info lifecycle management
- amdgpu_vm_get_task_info: reference counts up task_info before
  returning this info
- amdgpu_vm_put_task_info: reference counts down task_info
- last put to task_info() frees task_info from the vm.

This patch also does logistical changes required for existing usage
of vm->task_info.

V2: Do not block all the prints when task_info not found (Felix)

Cc: Christian Koenig 
Cc: Alex Deucher 
Cc: Felix Kuehling 
Signed-off-by: Shashank Sharma 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c |   7 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c |  18 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c   |  12 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c  | 142 +---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h  |  26 +++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c   |   2 +-
 drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c  |  30 +++--
 drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c  |  31 +++--
 drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c   |  22 +--
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c   |  26 ++--
 drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c  |  26 ++--
 drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c|  26 ++--
 drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c |  20 +--
 13 files changed, 287 insertions(+), 101 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index 0e61ebdb3f3e..99c736b6e32c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -1775,9 +1775,12 @@ static int amdgpu_debugfs_vm_info_show(struct seq_file 
*m, void *unused)
list_for_each_entry(file, &dev->filelist, lhead) {
struct amdgpu_fpriv *fpriv = file->driver_priv;
struct amdgpu_vm *vm = &fpriv->vm;
+   struct amdgpu_task_info *ti;
+
+   ti = amdgpu_vm_get_task_info_vm(vm);
+   seq_printf(m, "pid:%d\tProcess:%s --\n", ti->pid, 
ti->process_name);
+   amdgpu_vm_put_task_info_vm(ti, vm);
 
-   seq_printf(m, "pid:%d\tProcess:%s --\n",
-   vm->task_info.pid, vm->task_info.process_name);
r = amdgpu_bo_reserve(vm->root.bo, true);
if (r)
break;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index 1f357198533f..af23746821b7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -35,7 +35,7 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct 
drm_sched_job *s_job)
 {
struct amdgpu_ring *ring = to_amdgpu_ring(s_job->sched);
struct amdgpu_job *job = to_amdgpu_job(s_job);
-   struct amdgpu_task_info ti;
+   struct amdgpu_task_info *ti;
struct amdgpu_device *adev = ring->adev;
int idx;
int r;
@@ -48,7 +48,7 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct 
drm_sched_job *s_job)
return DRM_GPU_SCHED_STAT_ENODEV;
}
 
-   memset(&ti, 0, sizeof(struct amdgpu_task_info));
+
adev->job_hang = true;
 
if (amdgpu_gpu_recovery &&
@@ -58,12 +58,16 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct 
drm_sched_job *s_job)
goto exit;
}
 
-   amdgpu_vm_get_task_info(ring->adev, job->pasid, &ti);
DRM_ERROR("ring %s timeout, signaled seq=%u, emitted seq=%u\n",
- job->base.sched->name, atomic_read(&ring->fence_drv.last_seq),
- ring->fence_drv.sync_seq);
-   DRM_ERROR("Process information: process %s pid %d thread %s pid %d\n",
- ti.process_name, ti.tgid, ti.task_name, ti.pid);
+ job->base.sched->name, 
atomic_read(&ring->fence_drv.last_seq),
+ ring->fence_drv.sync_seq);
+
+   ti = amdgpu_vm_get_task_info_pasid(ring->adev, job->pasid);
+   if (ti) {
+   DRM_ERROR("Process information: process %s pid %d thread %s pid 
%d\n",
+ ti->process_name, ti->tgid, ti->task_name, ti->pid);
+   amdgpu_vm_put_task_info_pasid(ring->adev, ti, job->pasid);
+   }
 
dma_fence_set_error(&s_job->s_fence->finished, -ETIME);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
index 4baa300121d8..bfd7a6067edd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
@@ -230,8 +230,16 @@ void amdgpu_coredump(struct 

Re: [PATCH v2] drm/amdgpu: change vm->task_info handling

2024-01-24 Thread Shashank Sharma



On 19/01/2024 21:23, Felix Kuehling wrote:


On 2024-01-18 14:21, Shashank Sharma wrote:

This patch changes the handling and lifecycle of vm->task_info object.
The major changes are:
- vm->task_info is a dynamically allocated ptr now, and its usage is
   reference counted.
- introducing two new helper funcs for task_info lifecycle management
 - amdgpu_vm_get_task_info: reference counts up task_info before
   returning this info
 - amdgpu_vm_put_task_info: reference counts down task_info
- last put to task_info() frees task_info from the vm.

This patch also does logistical changes required for existing usage
of vm->task_info.

V2: Do not block all the prints when task_info not found (Felix)

Cc: Christian Koenig 
Cc: Alex Deucher 
Cc: Felix Kuehling 
Signed-off-by: Shashank Sharma 


Nit-picks inline.



---
  drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c |   7 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c |  18 ++-
  drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c   |  12 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c  | 142 +---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h  |  26 +++-
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c   |   2 +-
  drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c  |  30 +++--
  drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c  |  31 +++--
  drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c   |  22 +--
  drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c   |  26 ++--
  drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c  |  26 ++--
  drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c    |  26 ++--
  drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c |  20 +--
  13 files changed, 287 insertions(+), 101 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c

index 0e61ebdb3f3e..99c736b6e32c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -1775,9 +1775,12 @@ static int amdgpu_debugfs_vm_info_show(struct 
seq_file *m, void *unused)

  list_for_each_entry(file, &dev->filelist, lhead) {
  struct amdgpu_fpriv *fpriv = file->driver_priv;
  struct amdgpu_vm *vm = &fpriv->vm;
+    struct amdgpu_task_info *ti;
+
+    ti = amdgpu_vm_get_task_info_vm(vm);


Can ti be NULL here? I think it can, so you'd need a NULL check to 
avoid a possible kernel oops.

Agree



+    seq_printf(m, "pid:%d\tProcess:%s --\n", ti->pid, 
ti->process_name);

+    amdgpu_vm_put_task_info_vm(ti, vm);
  -    seq_printf(m, "pid:%d\tProcess:%s --\n",
-    vm->task_info.pid, vm->task_info.process_name);
  r = amdgpu_bo_reserve(vm->root.bo, true);
  if (r)
  break;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c

index 1f357198533f..af23746821b7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -35,7 +35,7 @@ static enum drm_gpu_sched_stat 
amdgpu_job_timedout(struct drm_sched_job *s_job)

  {
  struct amdgpu_ring *ring = to_amdgpu_ring(s_job->sched);
  struct amdgpu_job *job = to_amdgpu_job(s_job);
-    struct amdgpu_task_info ti;
+    struct amdgpu_task_info *ti;
  struct amdgpu_device *adev = ring->adev;
  int idx;
  int r;
@@ -48,7 +48,7 @@ static enum drm_gpu_sched_stat 
amdgpu_job_timedout(struct drm_sched_job *s_job)

  return DRM_GPU_SCHED_STAT_ENODEV;
  }
  -    memset(&ti, 0, sizeof(struct amdgpu_task_info));
+
  adev->job_hang = true;
    if (amdgpu_gpu_recovery &&
@@ -58,12 +58,16 @@ static enum drm_gpu_sched_stat 
amdgpu_job_timedout(struct drm_sched_job *s_job)

  goto exit;
  }
  -    amdgpu_vm_get_task_info(ring->adev, job->pasid, &ti);
  DRM_ERROR("ring %s timeout, signaled seq=%u, emitted seq=%u\n",
-  job->base.sched->name, 
atomic_read(&ring->fence_drv.last_seq),

-  ring->fence_drv.sync_seq);
-    DRM_ERROR("Process information: process %s pid %d thread %s pid 
%d\n",

-  ti.process_name, ti.tgid, ti.task_name, ti.pid);
+  job->base.sched->name, 
atomic_read(&ring->fence_drv.last_seq),

+  ring->fence_drv.sync_seq);


Unnecessary (and incorrect) indentation change.


Ah, my bad, looks like copy-paste screwed-up my editor config for 
alignment.

+
+    ti = amdgpu_vm_get_task_info_pasid(ring->adev, job->pasid);
+    if (ti) {
+    DRM_ERROR("Process information: process %s pid %d thread %s 
pid %d\n",

+  ti->process_name, ti->tgid, ti->task_name, ti->pid);
+    amdgpu_vm_put_task_info_pasid(ring->adev, ti, job->pasid);
+    }
    dma_fence_set_error(&s_job->s_fence->finished, -ETIME);
  diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c

index

[PATCH v3] drm/amdgpu/OLAND: clip the ref divider max value

2021-08-20 Thread Shashank Sharma
This patch limits the ref_div_max value to 100 during the
calculation of the PLL feedback reference divider. With the
current value (128), the produced fb_ref_div value generates
unstable output at particular frequencies. The Radeon driver
limits this value to 100.

On Oland, when we try to set up mode 2048x1280@60 (a bit weird,
I know), it demands a clock of 221270 kHz. It's been observed
that the PLL calculations using values 128 and 100 are vastly
different, and look like this:

+------------------------------+----------+----------+
| Parameter                    |  AMDGPU  |  Radeon  |
+------------------------------+----------+----------+
| Clock feedback divider       |   128    |   100    |
| max cap value                |          |          |
+------------------------------+----------+----------+
| ref_div_max                  |    42    |    20    |
+------------------------------+----------+----------+
| ref_div                      |    42    |    20    |
+------------------------------+----------+----------+
| fb_div                       |  10326   |   8195   |
+------------------------------+----------+----------+
| frac fb_div                  |   1024   |   163    |
+------------------------------+----------+----------+
| fb_dev_p / frac fb_dev_p     |     4    |     9    |
+------------------------------+----------+----------+

With the ref_div_max value clipped at 100, the AMDGPU driver can also
drive video mode 2048x1280@60 (221 MHz) and produce proper output
without any blanking or distortion on the screen.

PS: This value was changed from 128 to 100 in Radeon driver also, here:
https://github.com/freedesktop/drm-tip/commit/4b21ce1b4b5d262e7d4656b8ececc891fc3cb806

V1:
Got acks from:
Acked-by: Alex Deucher 
Acked-by: Christian König 

V2:
- Restricting the changes only for OLAND, just to avoid any regression
  for other cards.
- Changed unsigned -> unsigned int to make checkpatch quiet.

V3: Apply the change on SI family (not only oland) (Christian)

Cc: Alex Deucher 
Cc: Christian König 
Cc: Eddy Qin 
Signed-off-by: Shashank Sharma 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_pll.c| 20 +---
 drivers/gpu/drm/amd/amdgpu/amdgpu_pll.h|  3 ++-
 drivers/gpu/drm/amd/amdgpu/atombios_crtc.c |  2 +-
 3 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_pll.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_pll.c
index f2e20666c9c1..4eaec446b49d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_pll.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_pll.c
@@ -80,12 +80,17 @@ static void amdgpu_pll_reduce_ratio(unsigned *nom, unsigned 
*den,
  * Calculate feedback and reference divider for a given post divider. Makes
  * sure we stay within the limits.
  */
-static void amdgpu_pll_get_fb_ref_div(unsigned nom, unsigned den, unsigned 
post_div,
- unsigned fb_div_max, unsigned ref_div_max,
- unsigned *fb_div, unsigned *ref_div)
+static void amdgpu_pll_get_fb_ref_div(struct amdgpu_device *adev, unsigned int 
nom,
+ unsigned int den, unsigned int post_div,
+ unsigned int fb_div_max, unsigned int 
ref_div_max,
+ unsigned int *fb_div, unsigned int 
*ref_div)
 {
+
/* limit reference * post divider to a maximum */
-   ref_div_max = min(128 / post_div, ref_div_max);
+   if (adev->family == AMDGPU_FAMILY_SI)
+   ref_div_max = min(100 / post_div, ref_div_max);
+   else
+   ref_div_max = min(128 / post_div, ref_div_max);
 
/* get matching reference and feedback divider */
*ref_div = min(max(DIV_ROUND_CLOSEST(den, post_div), 1u), ref_div_max);
@@ -112,7 +117,8 @@ static void amdgpu_pll_get_fb_ref_div(unsigned nom, 
unsigned den, unsigned post_
  * Try to calculate the PLL parameters to generate the given frequency:
  * dot_clock = (ref_freq * feedback_div) / (ref_div * post_div)
  */
-void amdgpu_pll_compute(struct amdgpu_pll *pll,
+void amdgpu_pll_compute(struct amdgpu_device *adev,
+   struct amdgpu_pll *pll,
u32 freq,
u32 *dot_clock_p,
u32 *fb_div_p,
@@ -199,7 +205,7 @@ void amdgpu_pll_compute(struct amdgpu_pll *pll,
 
for (post_div = post_div_min; post_div <= post_div_max; ++post_div) {
unsigned diff;
-   amdgpu_pll_get_fb_ref_div(nom, den, post_div, fb_div_max,
+   amdgpu_pll_get_fb_ref_div(adev, nom, den, post_div, fb_div_max,
  ref_div_max, &fb_div, &ref_div);
diff = abs(target_clock - (pll-&g

[PATCH] drm/amdgpu/dm: Fix NULL pointer crash during DP MST hotplug

2021-04-15 Thread Shashank Sharma
This patch checks the return value of the function
dc_link_add_remote_sink before using it. This was causing
a crash during consecutive hotplugs of DP MST displays.

Cc: Harry Wentland 
Signed-off-by: Shashank Sharma 
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
index eee19ed5..8dc5005bec0a 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
@@ -226,6 +226,11 @@ static int dm_dp_mst_get_modes(struct drm_connector 
*connector)
(aconnector->edid->extensions + 1) * EDID_LENGTH,
&init_params);
 
+   if (!dc_sink) {
+   DRM_ERROR("Unable to add a remote sink\n");
+   return 0;
+   }
+
dc_sink->priv = aconnector;
/* dc_link_add_remote_sink returns a new reference */
aconnector->dc_sink = dc_sink;
-- 
2.25.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [RFC PATCH 0/3] A drm_plane API to support HDR planes

2021-04-28 Thread Shashank Sharma
Hello Harry,

Many of us in the mail chain have discussed this before: what is the right
way to blend and tone map an SDR and an HDR buffer from the same or different
color spaces, and what kind of DRM plane properties will be needed.

As you can see from the previous comments, the majority of the decision
making will happen in the compositor, as it is the only SW unit that has the
overall picture clear.

Reference: 
(https://lists.freedesktop.org/archives/wayland-devel/2019-January/039808.html )

Looked at systematically, building such a blending policy works like this:


- The compositor needs to understand the following properties of each buffer:

    - Color space or Gamut: BT2020/SRGB/DCI-P3/BT709/BT601 etc

    - Color format (RGB/YCBCR) and subsampling (444/422/420)

    - Tone (SDR/HDR_A/HDR_B)


- Then the compositor needs to understand the capabilities of the output 
display, as these act as clamping values

    - Output Gamut support (BT2020/SRGB/DCIP3)

    - Output max luminance of the monitor in nits (even in the case of HDR 
content on an HDR display)

  

Based on all this information, the compositor needs to set a blending 
target, which contains the following:

    - Output Colorspace of the blended output: say BT2020

    - Output Luminance of the blended output: Match content, if monitor can 
support it

    - Output Color format of the blended output: Say YCBCR4:2:0


Let's assume compositor prepares a blending policy with output as:

    - Output Luminance: HDR 500 Nits

    - Output color space: BT2020

    - Output color format: RGB888

    - Output curve: ST2084

  

Given these details, a compositor will look for DRM color properties like 
these:

1. Degamma plane property : To make buffers linear for Gamut mapping

2. Gamut mapping plane property:  To gamut map SRGB buffer to BT2020 colorspace

3. Color space conversion plane property: To convert from YCBCR->RGB

4. Tone mapping plane property: To tone map SDR buffer S2H and HDR buffer H2H

5. Gamma plane/CRTC property: to re-apply the output ST2084 curve


We will also need connector/CRTC properties to set AVI info-frames accordingly.

A high level block diagram for blending on a generic HW should look like this:

/*
 * SDR plane:
 *   SDR 200 nits, BT709, RGB888, non-linear (2.2)
 *     -> Degamma (2.2)                   -> SDR 200 nits, BT709,  RGB888, linear
 *     -> Gamut mapping (709 -> 2020)     -> SDR 200 nits, BT2020, RGB888, linear
 *     -> Tone mapping (S2H, 200 -> 500)  -> HDR 500 nits, BT2020, RGB888, linear
 *     -> Gamma (ST2084)                  -> HDR 500 nits, BT2020, RGB888, ST2084
 *
 * HDR plane:
 *   HDR 600 nits, BT2020, YCbCr420, non-linear (ST2084)
 *     -> Degamma (ST2084 EOTF)           -> HDR 600 nits, BT2020, YCbCr420, linear
 *     -> CSC (YCbCr -> RGB)              -> HDR 600 nits, BT2020, RGB888, linear
 *     -> Tone mapping (H2H, 600 -> 500)  -> HDR 500 nits, BT2020, RGB888, linear
 *     -> Gamma (ST2084)                  -> HDR 500 nits, BT2020, RGB888, ST2084
 */


Hope this helps to refine the series.


Regards

Shashank

On 27/04/21 20:20, Pekka Paalanen wrote:
> On Mon, 26 Apr 2021 13:38:49 -0400
> Harry Wentland  wrote:
>
>> ## Introduction
>>
>> We are looking to enable HDR support for a couple of single-plane and
>> multi-plane scenarios. To do this effectively we recommend new
>> interfaces to drm_plane. Below I'll give a bit of background on HDR
>> and why we propose these interfaces.
>>
>>
>> ## Defining a pixel's luminance
>>
>> Currently the luminance space of pixels in a framebuffer/plane
>> presented to the display is not well defined. It's usually assumed to
>> be in a 2.2 or 2.4 gamma space and has no mapping to an absolute
>> luminance value but is interpreted in relative terms.
>>
>> Luminance can be measured and described in absolute terms as candela
>> per meter squared, or cd/m2, or nits. Even though a pixel value can
>> be mapped to luminance in a linear fashion to do so without losing a
>> lot of detail requires 16-bpc color depth. The reason for this is
>> that human perception can distinguish roughly between a 0.5-1%
>> luminance delta. A linear representation is suboptimal, wasting
>> precision in the h

Re: [RFC PATCH 0/3] A drm_plane API to support HDR planes

2021-04-30 Thread Shashank Sharma
Hello Pekka,

On 30/04/21 15:13, Pekka Paalanen wrote:
> On Wed, 28 Apr 2021 13:24:27 +0530
> Shashank Sharma  wrote:
>
>> Assuming these details, A compositor will look for DRM color properties like 
>> these:
>>
>> 1. Degamma plane property : To make buffers linear for Gamut mapping
>>
>> 2. Gamut mapping plane property:  To gamut map SRGB buffer to BT2020 
>> colorspace
>>
>> 3. Color space conversion plane property: To convert from YCBCR->RGB
>>
>> 4. Tone mapping plane property: To tone map SDR buffer S2H and HDR buffer H2H
>>
>> 5. Gamma plane/CRTC property: to re-apply the output ST2084 curve
>>
>>
> ...
>
>>  *
>>  *
>>  *
>>  * ┌─┐ ┌─┐   
>> ┌─┐   ┌┐
>>  * HDR 600 Nits│ │HDR 600 Nits │ │HDR600 
>> │ │HDR500 │    │ HDR500
>>  *   ► │  Degamma    ├►│  Color space    
>> ├──►│  Tone mapping   ├──►│  Gamma │
>>  * BT2020  │  OETF ST2084    │ BT2020  │  conversion │BT2020 
>> │   H2H   │BT2020 │  ST2084    │ BT2020
>>  * YCBCR420    │ │ YCBCR420    │ YCBCR->RGB  │RGB88  
>> │   600->500  │RGB888 │    │ RGB888
>>  * Non Linear  └─┘ Linear  └─┘Linear 
>> └─┘Linear └┘ ST2084
>>  */
> Hi Shashank,
>
> I think you might have degamma and color model conversion reversed, or
> is that a new thing in the HDR specs?
>
> Usually the YCbCr/RGB conversion matrix applies to non-linear values
> AFAIU.
Ah, that was due to the Gamut mapping block. You are right, color format 
conversion can happen on non-linear data (which doesn't mean it can't happen on 
linear), but the sequential block above includes gamut mapping (color space 
conversion), which needs to be done in linear space, and I was a bit too lazy 
to create separate blocks, so I just replaced the block titles :D.
> There is also confusion with OETF vs. EOTF. I got that initially wrong
> too. OETF is not just a name for inverse-EOTF but it is used in a
> different context. Though here it seems to be just a typo.
> OETF is inherent to a camera when it converts light into
> electrical signals. EOTF is inherent to a monitor when it converts
> electrical signals to light. Depending on what the electrical signals
> have been defined to be in each step of a broadcasting chain, you might
> need OETF or EOTF or their inverse or a different OETF or EOTF or their
> inverse.

Yes, that was a typo. The intention was to call it inverse curve for HDR 
encoded buffers. It's almost 4 years (and 2 companies) since I last did HDR, so 
I am a bit rusty on the topic ;) .

- Shashank

>
> As we are talking about displays and likely assuming display-referred
> content (not scene-referred content), we probably have no use for OETF,
> but we could have several different EOTFs.
>
>
> Thanks,
> pq
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH] drm/amdgpu/display: DP MST hot-unplug cleanup

2021-04-30 Thread Shashank Sharma
The current DP MST hotplug handling sequence adds new remote
sinks during the MST plug-in, but it doesn't remove them during
the unplug, which results in saturation of the sink count after
2 consecutive hotplugs (dual monitor scenario).

This patch adds a clean-up sequence during the hot-unplug situation.

Cc: Harry Wentland 
Cc: Alex Deucher 

Signed-off-by: Shashank Sharma 
---
 .../display/amdgpu_dm/amdgpu_dm_mst_types.c   | 30 ++-
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
index 8dc5005bec0a..8b87dd0a3d50 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
@@ -220,6 +220,7 @@ static int dm_dp_mst_get_modes(struct drm_connector 
*connector)
struct dc_sink_init_data init_params = {
.link = aconnector->dc_link,
.sink_signal = SIGNAL_TYPE_DISPLAY_PORT_MST };
+
dc_sink = dc_link_add_remote_sink(
aconnector->dc_link,
(uint8_t *)aconnector->edid,
@@ -266,15 +267,42 @@ dm_mst_atomic_best_encoder(struct drm_connector 
*connector,
return &adev->dm.mst_encoders[acrtc->crtc_id].base;
 }
 
+static void
+dm_dp_mst_sink_cleanup(struct drm_connector *connector)
+{
+   struct amdgpu_dm_connector *aconnector = 
to_amdgpu_dm_connector(connector);
+
+   if (aconnector->dc_sink)
+   dc_link_remove_remote_sink(aconnector->dc_link,
+  aconnector->dc_sink);
+
+   if (aconnector->edid) {
+   kfree(aconnector->edid);
+   aconnector->edid = NULL;
+   }
+}
+
 static int
 dm_dp_mst_detect(struct drm_connector *connector,
 struct drm_modeset_acquire_ctx *ctx, bool force)
 {
struct amdgpu_dm_connector *aconnector = 
to_amdgpu_dm_connector(connector);
struct amdgpu_dm_connector *master = aconnector->mst_port;
+   enum drm_connector_status status;
 
-   return drm_dp_mst_detect_port(connector, ctx, &master->mst_mgr,
+   status = drm_dp_mst_detect_port(connector, ctx, &master->mst_mgr,
  aconnector->port);
+
+   if ((status == connector_status_disconnected) &&
+   (connector->status == connector_status_connected)) {
+
+   /* Fresh hot-unplug scenario, sink cleanup required */
+   DRM_DEBUG_DRIVER("[CONNECTOR:%d:%s] MST hot-unplug, doing sink 
cleanup\n",
+   connector->base.id, connector->name);
+   dm_dp_mst_sink_cleanup(connector);
+   }
+
+   return status;
 }
 
 static int dm_dp_mst_atomic_check(struct drm_connector *connector,
-- 
2.25.1



[PATCH v2] drm/amdgpu/display: DP MST hot-unplug cleanup

2021-04-30 Thread Shashank Sharma
The current DP MST hotplug handling sequence adds new remote
sinks during the MST plug-in, but it doesn't remove them during
the unplug, which results in saturation of the sink count after
2 consecutive hotplugs (dual monitor scenario).

This patch adds a clean-up sequence during the hot-unplug situation.

V2: Removed one extra line added in V1

Cc: Harry Wentland 
Cc: Alex Deucher 
Signed-off-by: Shashank Sharma 
---
 .../display/amdgpu_dm/amdgpu_dm_mst_types.c   | 30 ++-
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
index 8dc5005bec0a..8b87dd0a3d50 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
@@ -220,6 +220,7 @@ static int dm_dp_mst_get_modes(struct drm_connector 
*connector)
struct dc_sink_init_data init_params = {
.link = aconnector->dc_link,
.sink_signal = SIGNAL_TYPE_DISPLAY_PORT_MST };
+
dc_sink = dc_link_add_remote_sink(
aconnector->dc_link,
(uint8_t *)aconnector->edid,
@@ -266,15 +267,42 @@ dm_mst_atomic_best_encoder(struct drm_connector 
*connector,
return &adev->dm.mst_encoders[acrtc->crtc_id].base;
 }
 
+static void
+dm_dp_mst_sink_cleanup(struct drm_connector *connector)
+{
+struct amdgpu_dm_connector *aconnector = 
to_amdgpu_dm_connector(connector);
+
+if (aconnector->dc_sink)
+dc_link_remove_remote_sink(aconnector->dc_link,
+   aconnector->dc_sink);
+
+if (aconnector->edid) {
+kfree(aconnector->edid);
+aconnector->edid = NULL;
+}
+}
+
 static int
 dm_dp_mst_detect(struct drm_connector *connector,
 struct drm_modeset_acquire_ctx *ctx, bool force)
 {
struct amdgpu_dm_connector *aconnector = 
to_amdgpu_dm_connector(connector);
struct amdgpu_dm_connector *master = aconnector->mst_port;
+enum drm_connector_status status;
 
-   return drm_dp_mst_detect_port(connector, ctx, &master->mst_mgr,
+   status = drm_dp_mst_detect_port(connector, ctx, &master->mst_mgr,
  aconnector->port);
+
+if ((status == connector_status_disconnected) &&
+(connector->status == connector_status_connected)) {
+
+/* Fresh hot-unplug scenario, sink cleanup required */
+DRM_DEBUG_DRIVER("[CONNECTOR:%d:%s] MST hot-unplug, doing sink 
cleanup\n",
+connector->base.id, connector->name);
+dm_dp_mst_sink_cleanup(connector);
+}
+
+return status;
 }
 
 static int dm_dp_mst_atomic_check(struct drm_connector *connector,
-- 
2.25.1



[PATCH v3] drm/amdgpu: add new trace event for page table update v3

2020-08-11 Thread Shashank Sharma
This patch adds a new trace event to track the PTE update
events. This specific event will provide information like:
- start and end of virtual memory mapping
- HW engine flags for the map
- physical address for mapping

This will be particularly useful for memory profiling tools
(like RMV) which are monitoring the page table update events.

V2: Added physical address lookup logic in trace point
V3: switch to use __dynamic_array
added nptes int the TPprint arguments list
added page size in the arg list

Cc: Christian König 
Cc: Alex Deucher 
Signed-off-by: Christian König 
Signed-off-by: Shashank Sharma 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h | 38 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c|  9 --
 2 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
index 63e734a125fb..b9aae7983b4b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
@@ -321,6 +321,44 @@ DEFINE_EVENT(amdgpu_vm_mapping, amdgpu_vm_bo_cs,
TP_ARGS(mapping)
 );
 
+TRACE_EVENT(amdgpu_vm_update_ptes,
+   TP_PROTO(struct amdgpu_vm_update_params *p,
+uint64_t start, uint64_t end,
+unsigned int nptes, uint64_t dst,
+uint64_t incr, uint64_t flags),
+   TP_ARGS(p, start, end, nptes, dst, incr, flags),
+   TP_STRUCT__entry(
+__field(u64, start)
+__field(u64, end)
+__field(u64, flags)
+__field(u64, incr)
+__field(unsigned int, nptes)
+__dynamic_array(u64, dst, nptes)
+   ),
+
+   TP_fast_assign(
+   unsigned int i;
+
+   __entry->start = start;
+   __entry->end = end;
+   __entry->flags = flags;
+   __entry->incr = incr;
+   __entry->nptes = nptes;
+
+   for (i = 0; i < nptes; ++i) {
+   u64 addr = p->pages_addr ? amdgpu_vm_map_gart(
+   p->pages_addr, dst) : dst;
+
+   ((u64 *)__get_dynamic_array(dst))[i] = addr;
+   dst += incr;
+   }
+   ),
+   TP_printk("seg:0x%010llx-0x%010llx, flags:0x%llx, nptr=%u, pgsz:%llu,"
+ " dst:\n%s", __entry->start, __entry->end, __entry->flags,
+ __entry->nptes, __entry->incr,
+ __print_array(__get_dynamic_array(dst), __entry->nptes, 8))
+);
+
 TRACE_EVENT(amdgpu_vm_set_ptes,
TP_PROTO(uint64_t pe, uint64_t addr, unsigned count,
 uint32_t incr, uint64_t flags, bool direct),
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 71e005cf2952..b5dbb5e8bc61 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -1513,17 +1513,22 @@ static int amdgpu_vm_update_ptes(struct 
amdgpu_vm_update_params *params,
do {
uint64_t upd_end = min(entry_end, frag_end);
unsigned nptes = (upd_end - frag_start) >> shift;
+   uint64_t upd_flags = flags | AMDGPU_PTE_FRAG(frag);
 
/* This can happen when we set higher level PDs to
 * silent to stop fault floods.
 */
nptes = max(nptes, 1u);
+
+   trace_amdgpu_vm_update_ptes(params, frag_start, upd_end,
+   nptes, dst, incr,
+   upd_flags);
amdgpu_vm_update_flags(params, pt, cursor.level,
   pe_start, dst, nptes, incr,
-  flags | AMDGPU_PTE_FRAG(frag));
+  upd_flags);
 
pe_start += nptes * 8;
-   dst += (uint64_t)nptes * AMDGPU_GPU_PAGE_SIZE << shift;
+   dst += nptes * incr;
 
frag_start = upd_end;
if (frag_start >= frag_end) {
-- 
2.25.1



Re: [PATCH v3] drm/amdgpu: add new trace event for page table update v3

2020-08-12 Thread Shashank Sharma
Hello Christian,

On 12/08/20 12:15 pm, Christian König wrote:
> Am 12.08.20 um 06:33 schrieb Shashank Sharma:
>> This patch adds a new trace event to track the PTE update
>> events. This specific event will provide information like:
>> - start and end of virtual memory mapping
>> - HW engine flags for the map
>> - physical address for mapping
>>
>> This will be particularly useful for memory profiling tools
>> (like RMV) which are monitoring the page table update events.
>>
>> V2: Added physical address lookup logic in trace point
>> V3: switch to use __dynamic_array
>>  added nptes int the TPprint arguments list
>>  added page size in the arg list
>>
>> Cc: Christian König 
>> Cc: Alex Deucher 
>> Signed-off-by: Christian König 
>> Signed-off-by: Shashank Sharma 
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h | 38 +++
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c|  9 --
>>   2 files changed, 45 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
>> index 63e734a125fb..b9aae7983b4b 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
>> @@ -321,6 +321,44 @@ DEFINE_EVENT(amdgpu_vm_mapping, amdgpu_vm_bo_cs,
>>  TP_ARGS(mapping)
>>   );
>>   
>> +TRACE_EVENT(amdgpu_vm_update_ptes,
>> +TP_PROTO(struct amdgpu_vm_update_params *p,
>> + uint64_t start, uint64_t end,
>> + unsigned int nptes, uint64_t dst,
>> + uint64_t incr, uint64_t flags),
>> +TP_ARGS(p, start, end, nptes, dst, incr, flags),
>> +TP_STRUCT__entry(
>> + __field(u64, start)
>> + __field(u64, end)
>> + __field(u64, flags)
>> + __field(u64, incr)
>> + __field(unsigned int, nptes)
>> + __dynamic_array(u64, dst, nptes)
>> +),
>> +
>> +TP_fast_assign(
>> +unsigned int i;
>> +
>> +__entry->start = start;
>> +__entry->end = end;
>> +__entry->flags = flags;
>> +__entry->incr = incr;
>> +__entry->nptes = nptes;
>> +
>> +for (i = 0; i < nptes; ++i) {
>> +u64 addr = p->pages_addr ? amdgpu_vm_map_gart(
>> +p->pages_addr, dst) : dst;
>> +
>> +((u64 *)__get_dynamic_array(dst))[i] = addr;
>> +dst += incr;
>> +}
>> +),
>> +TP_printk("seg:0x%010llx-0x%010llx, flags:0x%llx, nptr=%u, pgsz:%llu,"
>> +  " dst:\n%s", __entry->start, __entry->end, __entry->flags,
>> +  __entry->nptes, __entry->incr,
> This is not correct. The increment is NOT the page size, but rather the 
> page size rounded down to a power of 512+4K.
>
> In other words page size can be 4K, 8K, 16K, 32K, 64K ... 1M, 2M, 
> 4M ... 512M, 1G, 2G
>
> But the increment can only be 4K, 2M, 1G
Understood. But I think the requirement here is the increment. My understanding 
is that the tool needs to save the page entries, and for that it will need the 
start of virtual memory, the start of physical memory, the mapping size, and the 
step by which to increment the entries. If that's so, we can re-label this entry 
as "step" instead of "page size". Please let me know if you think that's the 
right thing to do. 
> And do we need the nptes here? We just need it to print the correct 
> number of destination addresses.

Agree, we don't really need nptes here, I will remove that and send V4.

- Shashank

>
> Regards,
> Christian.
>
>> +  __print_array(__get_dynamic_array(dst), __entry->nptes, 8))
>> +);
>> +
>>   TRACE_EVENT(amdgpu_vm_set_ptes,
>>  TP_PROTO(uint64_t pe, uint64_t addr, unsigned count,
>>   uint32_t incr, uint64_t flags, bool direct),
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> index 71e005cf2952..b5dbb5e8bc61 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> @@ -1513,17 +1513,22 @@ static int amdgpu_vm_update_ptes(struct 
>> amdgpu_vm_update_para

Re: [PATCH v3] drm/amdgpu: add new trace event for page table update v3

2020-08-12 Thread Shashank Sharma

On 12/08/20 2:02 pm, Christian König wrote:
> Am 12.08.20 um 10:15 schrieb Shashank Sharma:
>> Hello Christian,
>>
>> On 12/08/20 12:15 pm, Christian König wrote:
>>> Am 12.08.20 um 06:33 schrieb Shashank Sharma:
>>>> This patch adds a new trace event to track the PTE update
>>>> events. This specific event will provide information like:
>>>> - start and end of virtual memory mapping
>>>> - HW engine flags for the map
>>>> - physical address for mapping
>>>>
>>>> This will be particularly useful for memory profiling tools
>>>> (like RMV) which are monitoring the page table update events.
>>>>
>>>> V2: Added physical address lookup logic in trace point
>>>> V3: switch to use __dynamic_array
>>>>   added nptes int the TPprint arguments list
>>>>   added page size in the arg list
>>>>
>>>> Cc: Christian König 
>>>> Cc: Alex Deucher 
>>>> Signed-off-by: Christian König 
>>>> Signed-off-by: Shashank Sharma 
>>>> ---
>>>>drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h | 38 +++
>>>>drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c|  9 --
>>>>2 files changed, 45 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h 
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
>>>> index 63e734a125fb..b9aae7983b4b 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
>>>> @@ -321,6 +321,44 @@ DEFINE_EVENT(amdgpu_vm_mapping, amdgpu_vm_bo_cs,
>>>>TP_ARGS(mapping)
>>>>);
>>>>
>>>> +TRACE_EVENT(amdgpu_vm_update_ptes,
>>>> +  TP_PROTO(struct amdgpu_vm_update_params *p,
>>>> +   uint64_t start, uint64_t end,
>>>> +   unsigned int nptes, uint64_t dst,
>>>> +   uint64_t incr, uint64_t flags),
>>>> +  TP_ARGS(p, start, end, nptes, dst, incr, flags),
>>>> +  TP_STRUCT__entry(
>>>> +   __field(u64, start)
>>>> +   __field(u64, end)
>>>> +   __field(u64, flags)
>>>> +   __field(u64, incr)
>>>> +   __field(unsigned int, nptes)
>>>> +   __dynamic_array(u64, dst, nptes)
>>>> +  ),
>>>> +
>>>> +  TP_fast_assign(
>>>> +  unsigned int i;
>>>> +
>>>> +  __entry->start = start;
>>>> +  __entry->end = end;
>>>> +  __entry->flags = flags;
>>>> +  __entry->incr = incr;
>>>> +  __entry->nptes = nptes;
>>>> +
>>>> +  for (i = 0; i < nptes; ++i) {
>>>> +  u64 addr = p->pages_addr ? amdgpu_vm_map_gart(
>>>> +  p->pages_addr, dst) : dst;
>>>> +
>>>> +  ((u64 *)__get_dynamic_array(dst))[i] = addr;
>>>> +  dst += incr;
>>>> +  }
>>>> +  ),
>>>> +  TP_printk("seg:0x%010llx-0x%010llx, flags:0x%llx, nptr=%u, pgsz:%llu,"
>>>> +" dst:\n%s", __entry->start, __entry->end, __entry->flags,
>>>> +__entry->nptes, __entry->incr,
>>> This is not correct. The increment is NOT the page size, but rather the
>>> page size rounded down to a power of 512+4K.
>>>
>>> In other words page size can be 4K, 8K, 16K, 32K, 64K ... 1M, 2M,
>>> 4M ... 512M, 1G, 2G
>>>
>>> But the increment can only be 4K, 2M, 1G
>> Understood. But I think the requirement here is for increment. My 
>> understanding is that the tool needs to save the page entries, and for that, 
>> it will need start of virtual mem, start of physical mem, mapping size and 
>> step to increment the entries. If that's so, we can re-label this entry as 
>> "step" instead of "page size". Please let me know if you think it's the 
>> right thing to do.
> We could stick with the naming increment if that helps, but this can 
> also be derived from the number of destination addresses we have.
sure, i will make it increment.
>
> On the other hand explicitly mentioni

[PATCH v4] drm/amdgpu: add new trace event for page table update v3

2020-08-12 Thread Shashank Sharma
This patch adds a new trace event to track the PTE update
events. This specific event will provide information like:
- start and end of virtual memory mapping
- HW engine flags for the map
- physical address for mapping

This will be particularly useful for memory profiling tools
(like RMV) which are monitoring the page table update events.

V2: Added physical address lookup logic in trace point
V3: switch to use __dynamic_array
added nptes int the TPprint arguments list
added page size in the arg list
V4: Addressed Christian's review comments
add start/end instead of seg
use incr instead of page_sz to be accurate

Cc: Christian König 
Cc: Alex Deucher 
Signed-off-by: Christian König 
Signed-off-by: Shashank Sharma 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h | 37 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c|  9 --
 2 files changed, 44 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
index 63e734a125fb..df12cf8466c2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
@@ -321,6 +321,43 @@ DEFINE_EVENT(amdgpu_vm_mapping, amdgpu_vm_bo_cs,
TP_ARGS(mapping)
 );
 
+TRACE_EVENT(amdgpu_vm_update_ptes,
+   TP_PROTO(struct amdgpu_vm_update_params *p,
+uint64_t start, uint64_t end,
+unsigned int nptes, uint64_t dst,
+uint64_t incr, uint64_t flags),
+   TP_ARGS(p, start, end, nptes, dst, incr, flags),
+   TP_STRUCT__entry(
+__field(u64, start)
+__field(u64, end)
+__field(u64, flags)
+__field(unsigned int, nptes)
+__field(u64, incr)
+__dynamic_array(u64, dst, nptes)
+   ),
+
+   TP_fast_assign(
+   unsigned int i;
+
+   __entry->start = start;
+   __entry->end = end;
+   __entry->flags = flags;
+   __entry->incr = incr;
+   __entry->nptes = nptes;
+   for (i = 0; i < nptes; ++i) {
+   u64 addr = p->pages_addr ? amdgpu_vm_map_gart(
+   p->pages_addr, dst) : dst;
+
+   ((u64 *)__get_dynamic_array(dst))[i] = addr;
+   dst += incr;
+   }
+   ),
+   TP_printk("start:0x%010llx end:0x%010llx, flags:0x%llx, incr:%llu,"
+ " dst:\n%s", __entry->start, __entry->end, __entry->flags,
+ __entry->incr, __print_array(
+ __get_dynamic_array(dst), __entry->nptes, 8))
+);
+
 TRACE_EVENT(amdgpu_vm_set_ptes,
TP_PROTO(uint64_t pe, uint64_t addr, unsigned count,
 uint32_t incr, uint64_t flags, bool direct),
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 71e005cf2952..b5dbb5e8bc61 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -1513,17 +1513,22 @@ static int amdgpu_vm_update_ptes(struct 
amdgpu_vm_update_params *params,
do {
uint64_t upd_end = min(entry_end, frag_end);
unsigned nptes = (upd_end - frag_start) >> shift;
+   uint64_t upd_flags = flags | AMDGPU_PTE_FRAG(frag);
 
/* This can happen when we set higher level PDs to
 * silent to stop fault floods.
 */
nptes = max(nptes, 1u);
+
+   trace_amdgpu_vm_update_ptes(params, frag_start, upd_end,
+   nptes, dst, incr,
+   upd_flags);
amdgpu_vm_update_flags(params, pt, cursor.level,
   pe_start, dst, nptes, incr,
-  flags | AMDGPU_PTE_FRAG(frag));
+  upd_flags);
 
pe_start += nptes * 8;
-   dst += (uint64_t)nptes * AMDGPU_GPU_PAGE_SIZE << shift;
+   dst += nptes * incr;
 
frag_start = upd_end;
if (frag_start >= frag_end) {
-- 
2.25.1



Re: [PATCH v4] drm/amdgpu: add new trace event for page table update v3

2020-08-19 Thread Shashank Sharma

On 13/08/20 1:28 pm, Christian König wrote:
> Am 13.08.20 um 05:04 schrieb Shashank Sharma:
>> This patch adds a new trace event to track the PTE update
>> events. This specific event will provide information like:
>> - start and end of virtual memory mapping
>> - HW engine flags for the map
>> - physical address for mapping
>>
>> This will be particularly useful for memory profiling tools
>> (like RMV) which are monitoring the page table update events.
>>
>> V2: Added physical address lookup logic in trace point
>> V3: switch to use __dynamic_array
>>  added nptes int the TPprint arguments list
>>  added page size in the arg list
>> V4: Addressed Christian's review comments
>>  add start/end instead of seg
>>  use incr instead of page_sz to be accurate
>>
>> Cc: Christian König 
>> Cc: Alex Deucher 
>> Signed-off-by: Christian König 
>> Signed-off-by: Shashank Sharma 
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h | 37 +++
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c|  9 --
>>   2 files changed, 44 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
>> index 63e734a125fb..df12cf8466c2 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
>> @@ -321,6 +321,43 @@ DEFINE_EVENT(amdgpu_vm_mapping, amdgpu_vm_bo_cs,
>>  TP_ARGS(mapping)
>>   );
>>   
>> +TRACE_EVENT(amdgpu_vm_update_ptes,
>> +TP_PROTO(struct amdgpu_vm_update_params *p,
>> + uint64_t start, uint64_t end,
>> + unsigned int nptes, uint64_t dst,
>> + uint64_t incr, uint64_t flags),
>> +TP_ARGS(p, start, end, nptes, dst, incr, flags),
>> +TP_STRUCT__entry(
>> + __field(u64, start)
>> + __field(u64, end)
>> + __field(u64, flags)
>> + __field(unsigned int, nptes)
>> + __field(u64, incr)
>> + __dynamic_array(u64, dst, nptes)
> As discussed with the trace subsystem maintainer we need to add the pid 
> and probably the VM context ID we use here to identify the updated VM.
>
> Christian.

I printed both vm->task_info.pid and current->pid for testing, and I can see 
different values coming out:

gnome-shell-2114  [011]    41.812894: amdgpu_vm_update_ptes: start:0x0800102e80 
end:0x0800102e82, flags:0x80, incr:4096, pid=2128 vmid=0 cpid=2114

pid is vm->task_info.pid=2128 whereas cpid=2114 is current.pid.

Which is the one we want to send with the event ?

Trace event by default seems to be adding the process name and id at the header 
of the event (gnome-shell-2114), which is same as current.pid


Also, is it ok to extract vmid from job->vmid ?


Regards

Shashank

>
>> +),
>> +
>> +TP_fast_assign(
>> +unsigned int i;
>> +
>> +__entry->start = start;
>> +__entry->end = end;
>> +__entry->flags = flags;
>> +__entry->incr = incr;
>> +__entry->nptes = nptes;
>> +for (i = 0; i < nptes; ++i) {
>> +u64 addr = p->pages_addr ? amdgpu_vm_map_gart(
>> +p->pages_addr, dst) : dst;
>> +
>> +((u64 *)__get_dynamic_array(dst))[i] = addr;
>> +dst += incr;
>> +}
>> +),
>> +TP_printk("start:0x%010llx end:0x%010llx, flags:0x%llx, incr:%llu,"
>> +  " dst:\n%s", __entry->start, __entry->end, __entry->flags,
>> +  __entry->incr, __print_array(
>> +  __get_dynamic_array(dst), __entry->nptes, 8))
>> +);
>> +
>>   TRACE_EVENT(amdgpu_vm_set_ptes,
>>  TP_PROTO(uint64_t pe, uint64_t addr, unsigned count,
>>   uint32_t incr, uint64_t flags, bool direct),
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> index 71e005cf2952..b5dbb5e8bc61 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> @@ -1513,17 +1513,22 @@ static int amdgpu_vm_update_ptes(struct 
>> amdgpu_vm_update_params *params,
>>  do {
>>  uint64_t upd_end = min(entr

Re: [PATCH v4] drm/amdgpu: add new trace event for page table update v3

2020-08-19 Thread Shashank Sharma

On 19/08/20 5:38 pm, Christian König wrote:
> Am 19.08.20 um 13:52 schrieb Shashank Sharma:
>> On 13/08/20 1:28 pm, Christian König wrote:
>>> Am 13.08.20 um 05:04 schrieb Shashank Sharma:
>>>> This patch adds a new trace event to track the PTE update
>>>> events. This specific event will provide information like:
>>>> - start and end of virtual memory mapping
>>>> - HW engine flags for the map
>>>> - physical address for mapping
>>>>
>>>> This will be particularly useful for memory profiling tools
>>>> (like RMV) which are monitoring the page table update events.
>>>>
>>>> V2: Added physical address lookup logic in trace point
>>>> V3: switch to use __dynamic_array
>>>>   added nptes int the TPprint arguments list
>>>>   added page size in the arg list
>>>> V4: Addressed Christian's review comments
>>>>   add start/end instead of seg
>>>>   use incr instead of page_sz to be accurate
>>>>
>>>> Cc: Christian König 
>>>> Cc: Alex Deucher 
>>>> Signed-off-by: Christian König 
>>>> Signed-off-by: Shashank Sharma 
>>>> ---
>>>>drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h | 37 +++
>>>>drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c|  9 --
>>>>2 files changed, 44 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h 
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
>>>> index 63e734a125fb..df12cf8466c2 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
>>>> @@ -321,6 +321,43 @@ DEFINE_EVENT(amdgpu_vm_mapping, amdgpu_vm_bo_cs,
>>>>TP_ARGS(mapping)
>>>>);
>>>>
>>>> +TRACE_EVENT(amdgpu_vm_update_ptes,
>>>> +  TP_PROTO(struct amdgpu_vm_update_params *p,
>>>> +   uint64_t start, uint64_t end,
>>>> +   unsigned int nptes, uint64_t dst,
>>>> +   uint64_t incr, uint64_t flags),
>>>> +  TP_ARGS(p, start, end, nptes, dst, incr, flags),
>>>> +  TP_STRUCT__entry(
>>>> +   __field(u64, start)
>>>> +   __field(u64, end)
>>>> +   __field(u64, flags)
>>>> +   __field(unsigned int, nptes)
>>>> +   __field(u64, incr)
>>>> +   __dynamic_array(u64, dst, nptes)
>>> As discussed with the trace subsystem maintainer we need to add the pid
>>> and probably the VM context ID we use here to identify the updated VM.
>>>
>>> Christian.
>> I printed both vm->task_info.pid Vs current->pid for testing, and I can see 
>> different values coming out of .
>>
>> gnome-shell-2114  [011]    41.812894: amdgpu_vm_update_ptes: 
>> start:0x0800102e80 end:0x0800102e82, flags:0x80, incr:4096, pid=2128 vmid=0 
>> cpid=2114
>>
>> pid is vm->task_info.pid=2128 whereas cpid=2114 is current.pid.
>>
>> Which is the one we want to send with the event ?
> That is vm->task_info.pid, since this is the PID which is using the VM 
> for command submission.
got it.
>> Trace event by default seems to be adding the process name and id at the 
>> header of the event (gnome-shell-2114), which is same as current.pid
>>
>>
>> Also, is it ok to extract vmid from job->vmid ?
> Only in trace_amdgpu_vm_grab_id(), in all other cases it's probably not 
> assigned yet.

Ok, let me check how we can get the vmid from the context we are sending the 
event from. Or maybe we should keep V5 with pid only, and later send a separate 
patch to add vmid?

- Shashank

> Christian.
>
>>
>> Regards
>>
>> Shashank
>>
>>>> +  ),
>>>> +
>>>> +  TP_fast_assign(
>>>> +  unsigned int i;
>>>> +
>>>> +  __entry->start = start;
>>>> +  __entry->end = end;
>>>> +  __entry->flags = flags;
>>>> +  __entry->incr = incr;
>>>> +  __entry->nptes = nptes;
>>>> +  for (i = 0; i < nptes; ++i) {
>>>> +  u64 addr = p->pages_addr ? amdgpu_vm_map_gart(
>>>> +  p->pages_addr, dst) : dst;
>>
