On Thu, Aug 20, 2020 at 12:05:56PM +0800, Kuehling, Felix wrote:
> 
> On 2020-08-19 at 11:09 p.m., Huang Rui wrote:
> > On Thu, Aug 20, 2020 at 08:18:57AM +0800, Kuehling, Felix wrote:
> >> On 2020-08-19 7:56 p.m., Huang Rui wrote:
> >>> On Wed, Aug 19, 2020 at 11:38:34PM +0800, Kuehling, Felix wrote:
> >>>> On 2020-08-19 at 7:06 a.m., Huang Rui wrote:
> >>>>> We still have a few IOMMU issues that need to be addressed, so force raven
> >>>>> onto the "dgpu" path for the moment.
> >>>>>
> >>>>> This adds a fallback path to bypass the IOMMU if IOMMU v2 is disabled
> >>>>> or the ACPI CRAT table is not correct.
> >>>>>
> >>>>> v2: Use the ignore_crat parameter to decide whether to go with IOMMUv2.
> >>>>> v3: Align with the existing thunk: don't change the behavior for raven;
> >>>>>      only renoir will use the "dgpu" path by default.
> >>>>>
> >>>>> Signed-off-by: Huang Rui <ray.hu...@amd.com>
> >>>>> ---
> >>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c   |  5 +++-
> >>>>>   drivers/gpu/drm/amd/amdkfd/kfd_crat.c     | 28 ++++++++++++++++++++++-
> >>>>>   drivers/gpu/drm/amd/amdkfd/kfd_device.c   |  2 +-
> >>>>>   drivers/gpu/drm/amd/amdkfd/kfd_priv.h     |  2 +-
> >>>>>   drivers/gpu/drm/amd/amdkfd/kfd_topology.c |  1 +
> >>>>>   5 files changed, 34 insertions(+), 4 deletions(-)
> >>>>>
> >>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
> >>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> >>>>> index a9a4319c24ae..189f9d7e190d 100644
> >>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> >>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> >>>>> @@ -684,11 +684,14 @@ MODULE_PARM_DESC(debug_largebar,
> >>>>>    * Ignore CRAT table during KFD initialization. By default, KFD uses the ACPI CRAT
> >>>>>    * table to get information about AMD APUs. This option can serve as a workaround on
> >>>>>    * systems with a broken CRAT table.
> >>>>> + *
> >>>>> + * Default is auto (according to asic type, iommu_v2, and crat table,
> >>>>> + * to decide whether to use CRAT)
> >>>>>    */
> >>>>>   int ignore_crat;
> >>>>>   module_param(ignore_crat, int, 0444);
> >>>>>   MODULE_PARM_DESC(ignore_crat,
> >>>>> -       "Ignore CRAT table during KFD initialization (0 = use CRAT (default), 1 = ignore CRAT)");
> >>>>> +       "Ignore CRAT table during KFD initialization (0 = auto (default), 1 = ignore CRAT)");
> >>>>>   
> >>>>>   /**
> >>>>>    * DOC: halt_if_hws_hang (int)
> >>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c 
> >>>>> b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> >>>>> index 59557e3e206a..f8346d4402e2 100644
> >>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> >>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> >>>>> @@ -22,6 +22,7 @@
> >>>>>   
> >>>>>   #include <linux/pci.h>
> >>>>>   #include <linux/acpi.h>
> >>>>> +#include <asm/processor.h>
> >>>>>   #include "kfd_crat.h"
> >>>>>   #include "kfd_priv.h"
> >>>>>   #include "kfd_topology.h"
> >>>>> @@ -740,6 +741,30 @@ static int kfd_fill_gpu_cache_info(struct kfd_dev 
> >>>>> *kdev,
> >>>>>         return 0;
> >>>>>   }
> >>>>>   
> >>>>> +
> >>>>> +#ifdef CONFIG_ACPI
> >>>>> +static void kfd_setup_ignore_crat_option(void)
> >>>>> +{
> >>>>> +
> >>>>> +       if (ignore_crat)
> >>>>> +               return;
> >>>>> +
> >>>>> +#ifndef KFD_SUPPORT_IOMMU_V2
> >>>>> +       ignore_crat = 1;
> >>>>> +#else
> >>>>> +       ignore_crat = 0;
> >>>>> +#endif
> >>>>> +
> >>>>> +       /* Renoir uses the fallback path to align with the existing thunk */
> >>>> Are you sure you need special code for Renoir here? For Renoir the
> >>>> dev->device_info already treats it as a dGPU and always has.
> >>> Renoir is also an APU; in other words, we might get a correct CRAT
> >>> table from SBIOS (the CRAT table in SBIOS for renoir is broken so far). If
> >>> we got the CRAT table, the kfd would create an APU node, which is not
> >>> expected.
> >> kfd_assign_gpu will not assign a Renoir GPU as the APU from the CRAT 
> >> table because gpu->device_info->needs_iommu_device is False for Renoir. 
> >> So Renoir will always show up in the topology as its own discrete GPU node.
> >>
> >> How does this work today? Renoir is already treated as a dGPU. But the 
> >> CPU node info (/sys/class/kfd/kfd/topology/nodes/0/properties) from the 
> >> CRAT table still shows GPU cores?
> >>
> >> Regards,
> >>    Felix
> >>
> >>
> >>>> I don't like the whole idea of changing the value of a module parameter,
> >>>> because it is global and visible to the user through sysfs. Instead, if
> >>>> you need to override the value of ignore_crat to consider other
> >>>> conditions, I think kfd_device_use_iommu_v2 and
> >>>> kfd_create_crat_image_acpi would be the right place to do it.
> >>>>
> >>>> To avoid duplicating the conditions, you could add a helper function
> >>>> bool kfd_ignore_crat(void) that can be called instead of using the
> >>>> ignore_crat parameter directly. This function, changing the global
> >>>> module parameter, should be removed.
> >>> That makes sense. Will update it in next version.
> >>>
> >>>>> +       if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD &&
> >>>>> +           boot_cpu_data.x86 == 0x17 &&
> >>>>> +           boot_cpu_data.x86_model >= 0x60 && boot_cpu_data.x86_model < 0x70) {
> >>>>> +               ignore_crat = 1;
> >>>>> +       }
> >>>>> +
> >>>>> +       return;
> >>>>> +}
> >>>>> +
> >>>>>   /*
> >>>>>    * kfd_create_crat_image_acpi - Allocates memory for CRAT image and
> >>>>>    * copies CRAT from ACPI (if available).
> >>>>> @@ -751,7 +776,6 @@ static int kfd_fill_gpu_cache_info(struct kfd_dev 
> >>>>> *kdev,
> >>>>>    *
> >>>>>    *    Return 0 if successful else return error code
> >>>>>    */
> >>>>> -#ifdef CONFIG_ACPI
> >>>>>   int kfd_create_crat_image_acpi(void **crat_image, size_t *size)
> >>>>>   {
> >>>>>         struct acpi_table_header *crat_table;
> >>>>> @@ -775,6 +799,8 @@ int kfd_create_crat_image_acpi(void **crat_image, 
> >>>>> size_t *size)
> >>>>>                 return -EINVAL;
> >>>>>         }
> >>>>>   
> >>>>> +       kfd_setup_ignore_crat_option();
> >>>>> +
> >>>>>         if (ignore_crat) {
> >>>>>                 pr_info("CRAT table disabled by module option\n");
> >>>>>                 return -ENODATA;
> >>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c 
> >>>>> b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> >>>>> index 2c030c2b5b8d..dab44951c4d8 100644
> >>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> >>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> >>>>> @@ -112,6 +112,7 @@ static const struct kfd_device_info 
> >>>>> carrizo_device_info = {
> >>>>>         .num_xgmi_sdma_engines = 0,
> >>>>>         .num_sdma_queues_per_engine = 2,
> >>>>>   };
> >>>>> +#endif
> >>>>>   
> >>>>>   static const struct kfd_device_info raven_device_info = {
> >>>>>         .asic_family = CHIP_RAVEN,
> >>>>> @@ -130,7 +131,6 @@ static const struct kfd_device_info 
> >>>>> raven_device_info = {
> >>>>>         .num_xgmi_sdma_engines = 0,
> >>>>>         .num_sdma_queues_per_engine = 2,
> >>>>>   };
> >>>>> -#endif
> >>>>>   
> >>>>>   static const struct kfd_device_info hawaii_device_info = {
> >>>>>         .asic_family = CHIP_HAWAII,
> >>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
> >>>>> b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> >>>>> index 82f955750e75..4b6e7ef7a71c 100644
> >>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> >>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> >>>>> @@ -1234,7 +1234,7 @@ static inline int 
> >>>>> kfd_devcgroup_check_permission(struct kfd_dev *kfd)
> >>>>>   
> >>>>>   static inline bool kfd_device_use_iommu_v2(const struct kfd_dev *dev)
> >>>>>   {
> >>>>> -       return dev && dev->device_info->needs_iommu_device;
> >>>>> +       return !ignore_crat && dev && dev->device_info->needs_iommu_device;
> >>>>>   }
> >>>>>   
> >>>>>   /* Debugfs */
> >>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c 
> >>>>> b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> >>>>> index 4b29815e9205..b92ce75a4c53 100644
> >>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> >>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> >>>>> @@ -1090,6 +1090,7 @@ int kfd_topology_init(void)
> >>>>>                                                     COMPUTE_UNIT_CPU, NULL,
> >>>>>                                                     proximity_domain);
> >>>>>                 cpu_only_node = 1;
> >>>>> +               ignore_crat = 1;
> >>>> Don't change the global variable. I think you're doing this here in case
> >>>> the CRAT table is broken and contains no GPU info. Maybe we need to add
> >>>> a new flag "use_iommu_v2" into the kfd_dev structure to handle this.
> >>>>
> > I just found that kfd_dev is not initialized here, so we may be unable to
> > use a flag in kfd_dev.
> 
> I see. This is very early during module init. When you get here, you
> already failed to read the ACPI CRAT table and created a VCRAT for the
> CPU with no GPU cores.
> 
> If you wanted to add a per device "use_iommu_v2" flag, you could
> probably set that in kfd_assign_gpu when it assigns a KFD device to a
> node with CPU cores.
> 

Yes, exactly!

Thanks,
Ray

> Regards,
>   Felix
> 
> 
> >
> > Thanks,
> > Ray
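
For reference, the direction agreed on in this thread (a kfd_ignore_crat()
helper instead of writing to the module parameter, plus a per-device
"use_iommu_v2" flag set when a device is assigned to a node with CPU cores)
could look roughly like the userspace sketch below. This is only an
illustration under stated assumptions, not the actual follow-up patch: the
sketch_* structs and functions and the SKETCH_SUPPORT_IOMMU_V2 macro are
hypothetical stand-ins for kfd_dev, kfd_device_info, kfd_assign_gpu,
kfd_device_use_iommu_v2 and KFD_SUPPORT_IOMMU_V2, and the Renoir CPU model
check is left out because the thread questions whether it is needed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the module parameter. The helpers below only read it. */
static int ignore_crat;

/* Hypothetical compile-time switch standing in for KFD_SUPPORT_IOMMU_V2. */
#ifndef SKETCH_SUPPORT_IOMMU_V2
#define SKETCH_SUPPORT_IOMMU_V2 1
#endif

/* Simplified stand-ins for kfd_device_info and kfd_dev (assumptions). */
struct sketch_device_info {
	bool needs_iommu_device;
};

struct sketch_dev {
	const struct sketch_device_info *device_info;
	bool use_iommu_v2;	/* per-device flag proposed in the thread */
};

/*
 * Effective "ignore CRAT" decision. The user's choice wins; otherwise
 * the result depends on whether IOMMU v2 support is compiled in.
 * The global parameter itself is never modified.
 */
static bool kfd_ignore_crat(void)
{
	if (ignore_crat)
		return true;

	return !SKETCH_SUPPORT_IOMMU_V2;
}

/*
 * Stand-in for the point in kfd_assign_gpu where a device is attached
 * to a topology node: the flag is set only if that node has CPU cores,
 * the CRAT is not ignored, and the ASIC wants an IOMMU device.
 */
static void sketch_assign_gpu(struct sketch_dev *dev, bool node_has_cpu_cores)
{
	assert(dev != NULL && dev->device_info != NULL);

	dev->use_iommu_v2 = node_has_cpu_cores &&
			    !kfd_ignore_crat() &&
			    dev->device_info->needs_iommu_device;
}

/* Stand-in for kfd_device_use_iommu_v2: reads only the per-device flag. */
static bool sketch_device_use_iommu_v2(const struct sketch_dev *dev)
{
	return dev && dev->use_iommu_v2;
}
```

With this shape, the value the user sees for ignore_crat in sysfs stays
whatever was passed at module load, and a broken or GPU-less CRAT only
affects the devices that end up on a node without CPU cores.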