Responses inline, will post updated patch shortly. Look at my response for the 
Sysfs permission

Regards,
Ramesh
 
-----Original Message-----
From: Kuehling, Felix <felix.kuehl...@amd.com> 
Sent: Wednesday, August 28, 2024 3:59 PM
To: amd-gfx@lists.freedesktop.org; Errabolu, Ramesh <ramesh.errab...@amd.com>
Subject: Re: [PATCH v2] drm/amdgpu: Surface svm_attr_gobm, a RW module parameter


On 2024-08-26 15:34, Ramesh Errabolu wrote:
> Enables users to update the default size of buffer used in migration 
> either from Sysmem to VRAM or vice versa.
> The param GOBM refers to granularity of buffer migration, and is 
> specified in terms of log(numPages(buffer)). It facilitates users of 
> unregistered memory to control GOBM, albeit at a coarse level

Can we change the name of this to something more human-readable? I suggest 
something like svm_default_granularity.
Ramesh: Done per suggestion

>
> Signed-off-by: Ramesh Errabolu <ramesh.errab...@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu.h     |  4 ++++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 18 +++++++++++++++++
>   drivers/gpu/drm/amd/amdkfd/kfd_priv.h   | 12 ++++++++++++
>   drivers/gpu/drm/amd/amdkfd/kfd_svm.c    | 26 ++++++++++++++++---------
>   4 files changed, 51 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index e8c284aea1f2..73dd816b01f2 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -237,6 +237,7 @@ extern int sched_policy;
>   extern bool debug_evictions;
>   extern bool no_system_mem_limit;
>   extern int halt_if_hws_hang;
> +extern uint amdgpu_svm_attr_gobm;
>   #else
>   static const int __maybe_unused sched_policy = KFD_SCHED_POLICY_HWS;
>   static const bool __maybe_unused debug_evictions; /* = false */
> @@ -313,6 +314,9 @@ extern int amdgpu_wbrf;
>   /* Extra time delay(in ms) to eliminate the influence of temperature 
> momentary fluctuation */
>   #define AMDGPU_SWCTF_EXTRA_DELAY            50
>   
> +/* Default size of buffer to use in migrating buffer */
> +#define AMDGPU_SVM_ATTR_GOBM     9

I change this name, too. AMDGPU_SVM_DEFAULT_GRANULARITY.
Ramesh: Removing this per input from other review. Also it is not used only in 
one location

> +
>   struct amdgpu_xcp_mgr;
>   struct amdgpu_device;
>   struct amdgpu_irq_src;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index b9529948f2b2..09c501753a3b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -169,6 +169,17 @@ uint amdgpu_sdma_phase_quantum = 32;
>   char *amdgpu_disable_cu;
>   char *amdgpu_virtual_display;
>   bool enforce_isolation;
> +
> +/* Specifies the default size of buffer to use in
> + * migrating buffer from Sysmem to VRAM and vice
> + * versa
> + *
> + * GOBM - Granularity of Buffer Migration

This is really the granularity for page faults as well as migration. 
Hence the suggested name change. Also, GOBM is a new acronym. If you 
only define that in this comment, users won't know what it means. I'd 
remove this comment and instead include enough information in the 
PARM_DESC to let users know what this means. This information is visible 
to them in the output of the "modinfo" command.


> + *
> + * Defined as log2(sizeof(buffer)/PAGE_SIZE)
> + */
> +uint amdgpu_svm_attr_gobm = AMDGPU_SVM_ATTR_GOBM;
> +
>   /*
>    * OverDrive(bit 14) disabled by default
>    * GFX DCS(bit 19) disabled by default
> @@ -320,6 +331,13 @@ module_param_named(pcie_gen2, amdgpu_pcie_gen2, int, 
> 0444);
>   MODULE_PARM_DESC(msi, "MSI support (1 = enable, 0 = disable, -1 = auto)");
>   module_param_named(msi, amdgpu_msi, int, 0444);
>   
> +/**
> + * DOC: svm_attr_gobm (uint)
> + * Size of buffer to use in migrating buffer from Sysmem to VRAM and vice 
> versa
> + */
> +MODULE_PARM_DESC(svm_attr_gobm, "Defined as log2(sizeof(buffer)/PAGE_SIZE), 
> e.g. 9 for 2 MiB");

Suggested description: "Default SVM page fault granularity in 2^x pages, 
default 9 = 2MiB".
Ramesh: Elaborated the comment in the driver to include both aspects. Updated 
PARAM_DESC() to hint at its role on both

> +module_param_named(svm_attr_gobm, amdgpu_svm_attr_gobm, uint, 0644);

All the other writable options use permissions 0600. I'd stay consistent 
with that.
Ramesh: Keeping it as 644 to allow non-root users to be able to query this 
parameter.
This is similar to how params such as " queue_preemption_timeout_ms " and 
"timeout_period" are defined

> +
>   /**
>    * DOC: lockup_timeout (string)
>    * Set GPU scheduler timeout value in ms.
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
> b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> index 9ae9abc6eb43..c2e54b18c167 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> @@ -868,6 +868,18 @@ struct svm_range_list {
>       struct task_struct              *faulting_task;
>       /* check point ts decides if page fault recovery need be dropped */
>       uint64_t                        checkpoint_ts[MAX_GPU_INSTANCE];
> +
> +     /* Indicates the default size to use in migrating
> +      * buffers of a process from Sysmem to VRAM and vice
> +      * versa. The max legal value cannot be greater than
> +      * 0x3F
> +      *
> +      * @note: A side effect of this symbol being part of
> +      * struct svm_range_list is that it forces all buffers
> +      * of the process of unregistered kind to use the same
> +      * size in buffer migration
> +      */
> +     uint8_t attr_gobm;

Attr is not a good name for this. Attr are the per-virtual address range 
attributes. This is the default setting for just one of these 
attributes. Instead of a long comment, just use a more descriptive name:

        uint8_t default_granularity;
Ramesh: Updated name to suggested, including an updated comment to emphasize 
its role in buffer migration and page fault handling.

>   };
>   
>   /* Process data */
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> index b44dec90969f..78c78baddb1f 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> @@ -309,12 +309,11 @@ static void svm_range_free(struct svm_range *prange, 
> bool do_unmap)
>   }
>   
>   static void
> -svm_range_set_default_attributes(int32_t *location, int32_t *prefetch_loc,
> -                              uint8_t *granularity, uint32_t *flags)
> +svm_range_set_default_attributes(int32_t *location,
> +                     int32_t *prefetch_loc, uint32_t *flags)

You broke the indentation here.
Ramesh: Do not see it. Also checkpatch script didn't complain

>   {
>       *location = KFD_IOCTL_SVM_LOCATION_UNDEFINED;
>       *prefetch_loc = KFD_IOCTL_SVM_LOCATION_UNDEFINED;
> -     *granularity = 9;

I think I'd prefer to keep this here. Instead of removing this, add a 
struct svm_range_list *svms parameter so you can initialize this here.
Ramesh: Done, per your suggestion and others

>       *flags =
>               KFD_IOCTL_SVM_FLAG_HOST_ACCESS | KFD_IOCTL_SVM_FLAG_COHERENT;
>   }
> @@ -358,9 +357,9 @@ svm_range *svm_range_new(struct svm_range_list *svms, 
> uint64_t start,
>               bitmap_copy(prange->bitmap_access, svms->bitmap_supported,
>                           MAX_GPU_INSTANCE);
>   
> +     prange->granularity = svms->attr_gobm;
>       svm_range_set_default_attributes(&prange->preferred_loc,
> -                                      &prange->prefetch_loc,
> -                                      &prange->granularity, &prange->flags);
> +                             &prange->prefetch_loc, &prange->flags);

Broken indentation.
Ramesh: Do not see it. Also checkpatch script didn't complain

>   
>       pr_debug("svms 0x%p [0x%llx 0x%llx]\n", svms, start, last);
>   
> @@ -2693,10 +2692,12 @@ svm_range_get_range_boundaries(struct kfd_process *p, 
> int64_t addr,
>   
>       *is_heap_stack = vma_is_initial_heap(vma) || vma_is_initial_stack(vma);
>   
> +     /* Determine the starting and ending page of prange */
>       start_limit = max(vma->vm_start >> PAGE_SHIFT,
> -                   (unsigned long)ALIGN_DOWN(addr, 2UL << 8));
> +                   (unsigned long)ALIGN_DOWN(addr, 1 << p->svms.attr_gobm));
>       end_limit = min(vma->vm_end >> PAGE_SHIFT,
> -                 (unsigned long)ALIGN(addr + 1, 2UL << 8));
> +                 (unsigned long)ALIGN(addr + 1, 1 << p->svms.attr_gobm));
> +
>       /* First range that starts after the fault address */
>       node = interval_tree_iter_first(&p->svms.objects, addr + 1, ULONG_MAX);
>       if (node) {
> @@ -3240,6 +3241,12 @@ int svm_range_list_init(struct kfd_process *p)
>               if (KFD_IS_SVM_API_SUPPORTED(p->pdds[i]->dev->adev))
>                       bitmap_set(svms->bitmap_supported, i, 1);
>   
> +     /* Bind granularity of buffer migration, either
> +      * the default size or one specified by the user
> +      */
> +     svms->attr_gobm = min_t(u8, amdgpu_svm_attr_gobm, 0x3F);

0x3F is a little high as a limit. That's 63 decimal. 2^63 pages is more 
than the supported virtual address space in the HW. With 47 virtual 
address bits, 12 bits for the address within the page, the highest 
possible limit would be 35 decimal. A more useful limit would be maybe 
27 (512GB), which is the largest page size supported with 4-level paging.
Ramesh: I was thinking the same but went with existing code. Isn't the VA limit 
48 instead of 47. Nonetheless, will apply 512 GiB

Regards,
   Felix


> +     pr_debug("Granularity Of Buffer Migration: %d\n", svms->attr_gobm);
> +
>       return 0;
>   }
>   
> @@ -3767,8 +3774,9 @@ svm_range_get_attr(struct kfd_process *p, struct 
> mm_struct *mm,
>       node = interval_tree_iter_first(&svms->objects, start, last);
>       if (!node) {
>               pr_debug("range attrs not found return default values\n");
> -             svm_range_set_default_attributes(&location, &prefetch_loc,
> -                                              &granularity, &flags_and);
> +             granularity = svms->attr_gobm;
> +             svm_range_set_default_attributes(&location,
> +                                     &prefetch_loc, &flags_and);
>               flags_or = flags_and;
>               if (p->xnack_enabled)
>                       bitmap_copy(bitmap_access, svms->bitmap_supported,

Reply via email to