> -----Original Message-----
> From: Zhang, Hawking <hawking.zh...@amd.com>
> Sent: Monday, March 6, 2023 10:32 AM
> To: amd-gfx@lists.freedesktop.org; Zhou1, Tao <tao.zh...@amd.com>;
> Yang, Stanley <stanley.y...@amd.com>; Li, Candice <candice...@amd.com>;
> Chai, Thomas <yipeng.c...@amd.com>
> Cc: Zhang, Hawking <hawking.zh...@amd.com>
> Subject: [PATCH 10/11] drm/amdgpu: Rework pcie_bif ras sw_init
> 
> pcie_bif ras block needs to be initialized as early as possible to handle
> fatal errors detected in the hw_init phase. Also, align the pcie_bif ras
> sw_init with other ras blocks.
> 
> Signed-off-by: Hawking Zhang <hawking.zh...@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_nbio.c | 23 +++++++++++++++++++++++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_nbio.h |  1 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c  | 16 ++++++++++++----
>  3 files changed, 36 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_nbio.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_nbio.c
> index 37d779b8e4a6..a3bc00577a7c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_nbio.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_nbio.c
> @@ -22,6 +22,29 @@
>  #include "amdgpu.h"
>  #include "amdgpu_ras.h"
> 
> +int amdgpu_nbio_ras_sw_init(struct amdgpu_device *adev)
> +{
> +     int err;
> +     struct amdgpu_nbio_ras *ras;
> +
> +     if (!adev->nbio.ras)
> +             return 0;
> +
> +     ras = adev->nbio.ras;
> +     err = amdgpu_ras_register_ras_block(adev, &ras->ras_block);
> +     if (err) {
> +             dev_err(adev->dev, "Failed to register pcie_bif ras block!\n");
> +             return err;
> +     }
> +
> +     strcpy(ras->ras_block.ras_comm.name, "pcie_bif");
> +     ras->ras_block.ras_comm.block = AMDGPU_RAS_BLOCK__PCIE_BIF;
> +     ras->ras_block.ras_comm.type = AMDGPU_RAS_ERROR__MULTI_UNCORRECTABLE;
> +     adev->nbio.ras_if = &ras->ras_block.ras_comm;
> +
> +     return 0;
> +}
> +
>  int amdgpu_nbio_ras_late_init(struct amdgpu_device *adev, struct ras_common_if *ras_block)
>  {
>       int r;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_nbio.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_nbio.h
> index a240336bbc6b..c686ff4bcc39 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_nbio.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_nbio.h
> @@ -106,5 +106,6 @@ struct amdgpu_nbio {
>       struct amdgpu_nbio_ras  *ras;
>  };
> 
> +int amdgpu_nbio_ras_sw_init(struct amdgpu_device *adev);
>  int amdgpu_nbio_ras_late_init(struct amdgpu_device *adev, struct ras_common_if *ras_block);
>  #endif
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> index 63dfcc98152d..f42480b8a8d3 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> @@ -2558,17 +2558,25 @@ int amdgpu_ras_init(struct amdgpu_device *adev)
>       case CHIP_VEGA20:
>       case CHIP_ARCTURUS:
>       case CHIP_ALDEBARAN:
> -             if (!adev->gmc.xgmi.connected_to_cpu) {
> +             if (!adev->gmc.xgmi.connected_to_cpu)

[Stanley]: Same comment as on patch #8 and patch #9.

Regards,
Stanley
>                       adev->nbio.ras = &nbio_v7_4_ras;
> -                     amdgpu_ras_register_ras_block(adev, &adev->nbio.ras->ras_block);
> -                     adev->nbio.ras_if = &adev->nbio.ras->ras_block.ras_comm;
> -             }
>               break;
>       default:
>               /* nbio ras is not available */
>               break;
>       }
> 
> +     /* nbio ras block needs to be enabled ahead of other ras blocks
> +      * to handle fatal error */
> +     if (!adev->gmc.xgmi.connected_to_cpu &&
> +         amdgpu_ras_is_supported(adev, AMDGPU_RAS_BLOCK__PCIE_BIF)) {

[Stanley]: Do we need to check gmc.xgmi.connected_to_cpu here? According to
amdgpu_ras_check_supported, the AMDGPU_RAS_BLOCK__PCIE_BIF bit is not set when
xgmi.connected_to_cpu is set.
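To make the point concrete, here is a minimal standalone model. All names in it
(model_dev, model_check_supported, cond_as_posted, count_mismatches, etc.) are
hypothetical and are not amdgpu code. It assumes only the property stated above,
that the PCIE_BIF bit is never set when connected_to_cpu is set, and under that
assumption compares the condition as posted with the simplified one for every
combination of inputs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins; not the real amdgpu definitions. */
enum model_ras_block {
	MODEL_RAS_BLOCK_UMC,
	MODEL_RAS_BLOCK_GFX,
	MODEL_RAS_BLOCK_PCIE_BIF,
	MODEL_RAS_BLOCK_COUNT,
};

struct model_dev {
	bool connected_to_cpu;    /* stands in for adev->gmc.xgmi.connected_to_cpu */
	unsigned int ras_enabled; /* stands in for the supported-block bitmask */
};

/* Models only the property claimed in the review: the PCIE_BIF bit is
 * cleared whenever connected_to_cpu is set. */
static void model_check_supported(struct model_dev *dev, unsigned int hw_mask)
{
	dev->ras_enabled = hw_mask;
	if (dev->connected_to_cpu)
		dev->ras_enabled &= ~(1u << MODEL_RAS_BLOCK_PCIE_BIF);
}

/* Returns true if the given block's bit is set in the supported mask. */
static bool model_is_supported(const struct model_dev *dev,
			       enum model_ras_block block)
{
	assert(block < MODEL_RAS_BLOCK_COUNT);
	return (dev->ras_enabled >> block) & 1u;
}

/* Condition as written in the patch. */
static bool cond_as_posted(const struct model_dev *dev)
{
	return !dev->connected_to_cpu &&
	       model_is_supported(dev, MODEL_RAS_BLOCK_PCIE_BIF);
}

/* Simplified condition suggested in the review. */
static bool cond_simplified(const struct model_dev *dev)
{
	return model_is_supported(dev, MODEL_RAS_BLOCK_PCIE_BIF);
}

/* Exhaustively compares both conditions over every hardware mask and
 * both connected_to_cpu values; returns the number of disagreements. */
int count_mismatches(void)
{
	int mismatches = 0;

	for (int cpu = 0; cpu <= 1; cpu++) {
		for (unsigned int mask = 0; mask < (1u << MODEL_RAS_BLOCK_COUNT); mask++) {
			struct model_dev dev = { .connected_to_cpu = cpu };

			model_check_supported(&dev, mask);
			if (cond_as_posted(&dev) != cond_simplified(&dev))
				mismatches++;
		}
	}
	return mismatches;
}
```

Called from a small test harness, count_mismatches() should return 0 under this
model, i.e. the connected_to_cpu check would be redundant as long as
amdgpu_ras_check_supported keeps that guarantee. If PCIE_BIF were ever enabled
for connected_to_cpu parts, the two conditions would diverge.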

Regards,
Stanley
> +             r = amdgpu_nbio_ras_sw_init(adev);
> +             if (r) {
> +                     dev_err(adev->dev, "Failed to initialize pcie_bif ras block!\n");
> +                     return r;
> +             }
> +     }
> +
>       if (adev->nbio.ras &&
>           adev->nbio.ras->init_ras_controller_interrupt) {
>               r = adev->nbio.ras->init_ras_controller_interrupt(adev);
> --
> 2.17.1
