[AMD Official Use Only - AMD Internal Distribution Only]

> -----Original Message-----
> From: Zhou1, Tao <tao.zh...@amd.com>
> Sent: Wednesday, April 2, 2025 11:02 PM
> To: Skvortsov, Victor <victor.skvort...@amd.com>; 
> amd-gfx@lists.freedesktop.org
> Cc: Zhang, Hawking <hawking.zh...@amd.com>; Zhao, Victor
> <victor.z...@amd.com>
> Subject: RE: [PATCH] drm/amdgpu: Disable ACA on VFs
>
> > -----Original Message-----
> > From: Skvortsov, Victor <victor.skvort...@amd.com>
> > Sent: Thursday, April 3, 2025 6:16 AM
> > To: amd-gfx@lists.freedesktop.org
> > Cc: Zhang, Hawking <hawking.zh...@amd.com>; Zhao, Victor
> > <victor.z...@amd.com>; Zhou1, Tao <tao.zh...@amd.com>; Skvortsov,
> > Victor <victor.skvort...@amd.com>
> > Subject: [PATCH] drm/amdgpu: Disable ACA on VFs
> >
> > VFs query RAS error counts directly from host with
> > AMDGPU_RAS_VIRT_ERROR_COUNT_QUERY. When ACA is enabled, an unusable
> > aca_sysfs is created rather than amdgpu_ras_sysfs_create()
> >
> > Likewise, VFs depend on host support to query CPERs, rather than ACA
> > component.
> >
> > Signed-off-by: Victor Skvortsov <victor.skvort...@amd.com>
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c |  4 ++--
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c  | 10 ++++++----
> >  2 files changed, 8 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c
> > index 360e07a5c7c1..5a234eadae8b 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c
> > @@ -549,7 +549,7 @@ int amdgpu_cper_init(struct amdgpu_device *adev)
> >  {
> >       int r;
> >
> > -     if (!amdgpu_aca_is_enabled(adev))
> > +     if (!amdgpu_aca_is_enabled(adev) && !amdgpu_sriov_ras_cper_en(adev))
>
> [Tao] Can we fold the amdgpu_sriov_ras_cper_en() check into
> amdgpu_aca_is_enabled()?

[Victor] That would cause problems in amdgpu_ras_sysfs_create(): VFs use
the legacy sysfs to report IP block error counts through
AMDGPU_RAS_VIRT_ERROR_COUNT_QUERY, so amdgpu_aca_is_enabled() has to stay
false on VFs.
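
To make the difference concrete, here is a rough standalone model of the two
gating options. It is only a sketch, not driver code: struct mock_adev and all
mock_* helpers are hypothetical stand-ins for the amdgpu_device fields and the
amdgpu_sriov_vf(), amdgpu_aca_is_enabled() and amdgpu_sriov_ras_cper_en()
checks, and the sysfs rule is simplified from the commit message (ACA enabled
means aca_sysfs is created rather than the legacy sysfs).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the few amdgpu_device bits that matter here. */
struct mock_adev {
	bool is_vf;        /* models amdgpu_sriov_vf(adev) */
	bool aca_ip_match; /* models the MP0 13.0.6/13.0.12/13.0.14 check */
	bool host_cper_en; /* models amdgpu_sriov_ras_cper_en(adev) */
};

/* With the patch: ACA is only ever enabled on non-VF devices. */
static bool mock_aca_is_enabled(const struct mock_adev *adev)
{
	return !adev->is_vf && adev->aca_ip_match;
}

/* The suggested alternative: fold the host CPER flag into the ACA check. */
static bool mock_aca_is_enabled_folded(const struct mock_adev *adev)
{
	return mock_aca_is_enabled(adev) || adev->host_cper_en;
}

/* CPER gate as written in the patch: skip only if neither path is available. */
static bool mock_cper_should_init(const struct mock_adev *adev)
{
	return mock_aca_is_enabled(adev) || adev->host_cper_en;
}

/* Simplified sysfs rule (assumption): legacy sysfs is used only without ACA. */
static bool mock_uses_legacy_sysfs(bool aca_enabled)
{
	return !aca_enabled;
}
```

For a VF with host CPER support, mock_cper_should_init() is true either way,
but only the separate check keeps mock_uses_legacy_sysfs() true; with the
folded check the VF would lose the legacy sysfs it needs for
AMDGPU_RAS_VIRT_ERROR_COUNT_QUERY.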

>
> >               return 0;
> >
> >       r = amdgpu_cper_ring_init(adev);
> > @@ -568,7 +568,7 @@ int amdgpu_cper_init(struct amdgpu_device *adev)
> >
> >  int amdgpu_cper_fini(struct amdgpu_device *adev)
> >  {
> > -     if (!amdgpu_aca_is_enabled(adev))
> > +     if (!amdgpu_aca_is_enabled(adev) && !amdgpu_sriov_ras_cper_en(adev))
> >               return 0;
> >
> >       adev->cper.enabled = false;
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> > index ebf1f63d0442..5bb7673fd28e 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> > @@ -3794,10 +3794,12 @@ static void amdgpu_ras_check_supported(struct amdgpu_device *adev)
> >               adev->ras_hw_enabled & amdgpu_ras_mask;
> >
> >       /* aca is disabled by default except for psp v13_0_6/v13_0_12/v13_0_14 */
> > -     adev->aca.is_enabled =
> > -             (amdgpu_ip_version(adev, MP0_HWIP, 0) == IP_VERSION(13, 0, 6) ||
> > -              amdgpu_ip_version(adev, MP0_HWIP, 0) == IP_VERSION(13, 0, 12) ||
> > -              amdgpu_ip_version(adev, MP0_HWIP, 0) == IP_VERSION(13, 0, 14));
> > +     if (!amdgpu_sriov_vf(adev)) {
> > +             adev->aca.is_enabled =
> > +                     (amdgpu_ip_version(adev, MP0_HWIP, 0) == IP_VERSION(13, 0, 6) ||
> > +                     amdgpu_ip_version(adev, MP0_HWIP, 0) == IP_VERSION(13, 0, 12) ||
> > +                     amdgpu_ip_version(adev, MP0_HWIP, 0) == IP_VERSION(13, 0, 14));
> > +     }
> >
> >       /* bad page feature is not applicable to specific app platform */
> >       if (adev->gmc.is_app_apu &&
> > --
> > 2.34.1
>
