Hi Andrey,

I don't have a bare metal environment, so I can only test the SRIOV cases.

Best Regards,
JingWen Chen



> On Mar 3, 2022, at 01:55, Grodzovsky, Andrey <andrey.grodzov...@amd.com> 
> wrote:
> 
> The patch is acked-by: Andrey Grodzovsky <andrey.grodzov...@amd.com>
> 
> If you also smoke tested bare metal, feel free to apply all the patches; if 
> not, let me know.
> 
> Andrey
> 
> On 2022-03-02 04:51, JingWen Chen wrote:
>> Hi Andrey,
>> 
>> Most of the patches are OK, but the code introduces an IB test failure on 
>> the disabled VCN of sienna_cichlid.
>> 
>> In the SRIOV use case we disable one VCN on sienna_cichlid. I have attached 
>> a patch that fixes this issue; please check the attachment.
>> 
>> Best Regards,
>> 
>> Jingwen Chen
>> 
>> 
>> On 2/26/22 5:22 AM, Andrey Grodzovsky wrote:
>>> Hey, patches attached. I applied the patches and resolved the merge 
>>> conflicts, but I wasn't able to test because my board's onboard network 
>>> card doesn't work with the 5.16 kernel (it does with 5.17; maybe it's a 
>>> Kconfig issue and I need to check more).
>>> The patches are on top of commit cababde192b2 ("drm/amd/pm: fix mode2 
>>> reset fail for smu 13.0.5") by Yifan Zhang.
>>> 
>>> Please test and let me know. Maybe by Monday I will be able to resolve the 
>>> connectivity issue on 5.16.
>>> 
>>> Andrey
>>> 
>>> On 2022-02-24 22:13, JingWen Chen wrote:
>>>> Hi Andrey,
>>>> 
>>>> Sorry for being misleading; I meant the whole patch series. We are 
>>>> depending on this patch series to fix the concurrency issue in the SRIOV 
>>>> TDR sequence.
>>>> 
>>>> 
>>>> 
>>>> On 2/25/22 1:26 AM, Andrey Grodzovsky wrote:
>>>>> No problem if so, but before I do:
>>>>> 
>>>>> 
>>>>> JingWen, why do you think this patch is needed as a standalone now? It 
>>>>> has no use without the entire feature that goes with it. Are there some 
>>>>> changes you want to make on top of that code?
>>>>> 
>>>>> 
>>>>> Andrey
>>>>> 
>>>>> 
>>>>> On 2022-02-24 12:12, Deucher, Alexander wrote:
>>>>>> [Public]
>>>>>> 
>>>>>> 
>>>>>> If it applies cleanly, feel free to drop it in.  I'll drop those patches 
>>>>>> for drm-next since they are already in drm-misc.
>>>>>> 
>>>>>> Alex
>>>>>> 
>>>>>> ------------------------------------------------------------------------
>>>>>> *From:* amd-gfx <amd-gfx-boun...@lists.freedesktop.org> on behalf of 
>>>>>> Andrey Grodzovsky <andrey.grodzov...@amd.com>
>>>>>> *Sent:* Thursday, February 24, 2022 11:24 AM
>>>>>> *To:* Chen, JingWen <jingwen.ch...@amd.com>; Christian König 
>>>>>> <ckoenig.leichtzumer...@gmail.com>; dri-de...@lists.freedesktop.org 
>>>>>> <dri-de...@lists.freedesktop.org>; amd-gfx@lists.freedesktop.org 
>>>>>> <amd-gfx@lists.freedesktop.org>
>>>>>> *Cc:* Liu, Monk <monk....@amd.com>; Chen, Horace <horace.c...@amd.com>; 
>>>>>> Lazar, Lijo <lijo.la...@amd.com>; Koenig, Christian 
>>>>>> <christian.koe...@amd.com>; dan...@ffwll.ch <dan...@ffwll.ch>
>>>>>> *Subject:* Re: [RFC v4 02/11] drm/amdgpu: Move scheduler init to after 
>>>>>> XGMI is ready
>>>>>> No, because the whole patch set, including this patch, has landed in
>>>>>> drm-misc-next and will reach amd-staging-drm-next on the next upstream
>>>>>> rebase, I guess.
>>>>>> 
>>>>>> Andrey
>>>>>> 
>>>>>> On 2022-02-24 01:47, JingWen Chen wrote:
>>>>>>> Hi Andrey,
>>>>>>> 
>>>>>>> Will you port this patch into amd-staging-drm-next?
>>>>>>> 
>>>>>>> on 2/10/22 2:06 AM, Andrey Grodzovsky wrote:
>>>>>>>> All comments are addressed and the code is pushed. Thanks to everyone
>>>>>>>> who helped with the review.
>>>>>>>> 
>>>>>>>> Andrey
>>>>>>>> 
>>>>>>>> On 2022-02-09 02:53, Christian König wrote:
>>>>>>>>> On 09.02.22 at 01:23, Andrey Grodzovsky wrote:
>>>>>>>>>> Before we initialize the schedulers we must know which reset
>>>>>>>>>> domain we are in. For a single device there is a single
>>>>>>>>>> domain per device, and so a single wq per device. For XGMI
>>>>>>>>>> the reset domain spans the entire XGMI hive, and so the
>>>>>>>>>> reset wq is per hive.
>>>>>>>>>> 
>>>>>>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzov...@amd.com>
>>>>>>>>> One more comment below; with that fixed, Reviewed-by: Christian König 
>>>>>>>>> <christian.koe...@amd.com>.
>>>>>>>>> 
>>>>>>>>>> ---
>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 45 ++++++++++++++++++++++
>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 34 ++--------------
>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |  2 +
>>>>>>>>>> 3 files changed, 51 insertions(+), 30 deletions(-)
>>>>>>>>>> 
>>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>>>>>> index 9704b0e1fd82..00123b0013d3 100644
>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>>>>>> @@ -2287,6 +2287,47 @@ static int amdgpu_device_fw_loading(struct amdgpu_device *adev)
>>>>>>>>>>          return r;
>>>>>>>>>>      }
>>>>>>>>>> +static int amdgpu_device_init_schedulers(struct amdgpu_device *adev)
>>>>>>>>>> +{
>>>>>>>>>> +    long timeout;
>>>>>>>>>> +    int r, i;
>>>>>>>>>> +
>>>>>>>>>> +    for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
>>>>>>>>>> +        struct amdgpu_ring *ring = adev->rings[i];
>>>>>>>>>> +
>>>>>>>>>> +        /* No need to setup the GPU scheduler for rings that don't need it */
>>>>>>>>>> +        if (!ring || ring->no_scheduler)
>>>>>>>>>> +            continue;
>>>>>>>>>> +
>>>>>>>>>> +        switch (ring->funcs->type) {
>>>>>>>>>> +        case AMDGPU_RING_TYPE_GFX:
>>>>>>>>>> +            timeout = adev->gfx_timeout;
>>>>>>>>>> +            break;
>>>>>>>>>> +        case AMDGPU_RING_TYPE_COMPUTE:
>>>>>>>>>> +            timeout = adev->compute_timeout;
>>>>>>>>>> +            break;
>>>>>>>>>> +        case AMDGPU_RING_TYPE_SDMA:
>>>>>>>>>> +            timeout = adev->sdma_timeout;
>>>>>>>>>> +            break;
>>>>>>>>>> +        default:
>>>>>>>>>> +            timeout = adev->video_timeout;
>>>>>>>>>> +            break;
>>>>>>>>>> +        }
>>>>>>>>>> +
>>>>>>>>>> +        r = drm_sched_init(&ring->sched, &amdgpu_sched_ops,
>>>>>>>>>> +                   ring->num_hw_submission, amdgpu_job_hang_limit,
>>>>>>>>>> +                   timeout, adev->reset_domain.wq, ring->sched_score, ring->name);
>>>>>>>>>> +        if (r) {
>>>>>>>>>> +            DRM_ERROR("Failed to create scheduler on ring %s.\n",
>>>>>>>>>> +                  ring->name);
>>>>>>>>>> +            return r;
>>>>>>>>>> +        }
>>>>>>>>>> +    }
>>>>>>>>>> +
>>>>>>>>>> +    return 0;
>>>>>>>>>> +}
>>>>>>>>>> +
>>>>>>>>>> +
>>>>>>>>>>      /**
>>>>>>>>>>       * amdgpu_device_ip_init - run init for hardware IPs
>>>>>>>>>>       *
>>>>>>>>>> @@ -2419,6 +2460,10 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
>>>>>>>>>>              }
>>>>>>>>>>          }
>>>>>>>>>> +    r = amdgpu_device_init_schedulers(adev);
>>>>>>>>>> +    if (r)
>>>>>>>>>> +        goto init_failed;
>>>>>>>>>> +
>>>>>>>>>>          /* Don't init kfd if whole hive need to be reset during init */
>>>>>>>>>>          if (!adev->gmc.xgmi.pending_reset)
>>>>>>>>>>              amdgpu_amdkfd_device_init(adev);
>>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>>>>>>> index 45977a72b5dd..fa302540c69a 100644
>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>>>>>>> @@ -457,8 +457,6 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring,
>>>>>>>>>>                        atomic_t *sched_score)
>>>>>>>>>>      {
>>>>>>>>>>          struct amdgpu_device *adev = ring->adev;
>>>>>>>>>> -    long timeout;
>>>>>>>>>> -    int r;
>>>>>>>>>>            if (!adev)
>>>>>>>>>>              return -EINVAL;
>>>>>>>>>> @@ -478,36 +476,12 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring,
>>>>>>>>>>          spin_lock_init(&ring->fence_drv.lock);
>>>>>>>>>>          ring->fence_drv.fences = kcalloc(num_hw_submission * 2, sizeof(void *),
>>>>>>>>>>                           GFP_KERNEL);
>>>>>>>>>> -    if (!ring->fence_drv.fences)
>>>>>>>>>> -        return -ENOMEM;
>>>>>>>>>> -    /* No need to setup the GPU scheduler for rings that don't need it */
>>>>>>>>>> -    if (ring->no_scheduler)
>>>>>>>>>> -        return 0;
>>>>>>>>>> +    ring->num_hw_submission = num_hw_submission;
>>>>>>>>>> +    ring->sched_score = sched_score;
>>>>>>>>> Let's move this into the caller and then use ring->num_hw_submission 
>>>>>>>>> in the fence code as well.
>>>>>>>>> 
>>>>>>>>> The maximum number of jobs on the ring is not really fence specific.
>>>>>>>>> 
>>>>>>>>> Regards,
>>>>>>>>> Christian.
>>>>>>>>> 
>>>>>>>>>> -    switch (ring->funcs->type) {
>>>>>>>>>> -    case AMDGPU_RING_TYPE_GFX:
>>>>>>>>>> -        timeout = adev->gfx_timeout;
>>>>>>>>>> -        break;
>>>>>>>>>> -    case AMDGPU_RING_TYPE_COMPUTE:
>>>>>>>>>> -        timeout = adev->compute_timeout;
>>>>>>>>>> -        break;
>>>>>>>>>> -    case AMDGPU_RING_TYPE_SDMA:
>>>>>>>>>> -        timeout = adev->sdma_timeout;
>>>>>>>>>> -        break;
>>>>>>>>>> -    default:
>>>>>>>>>> -        timeout = adev->video_timeout;
>>>>>>>>>> -        break;
>>>>>>>>>> -    }
>>>>>>>>>> -
>>>>>>>>>> -    r = drm_sched_init(&ring->sched, &amdgpu_sched_ops,
>>>>>>>>>> -               num_hw_submission, amdgpu_job_hang_limit,
>>>>>>>>>> -               timeout, NULL, sched_score, ring->name);
>>>>>>>>>> -    if (r) {
>>>>>>>>>> -        DRM_ERROR("Failed to create scheduler on ring %s.\n",
>>>>>>>>>> -              ring->name);
>>>>>>>>>> -        return r;
>>>>>>>>>> -    }
>>>>>>>>>> +    if (!ring->fence_drv.fences)
>>>>>>>>>> +        return -ENOMEM;
>>>>>>>>>>            return 0;
>>>>>>>>>>      }
>>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>>>>>>>>>> index fae7d185ad0d..7f20ce73a243 100644
>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>>>>>>>>>> @@ -251,6 +251,8 @@ struct amdgpu_ring {
>>>>>>>>>>          bool has_compute_vm_bug;
>>>>>>>>>>          bool            no_scheduler;
>>>>>>>>>>          int            hw_prio;
>>>>>>>>>> +    unsigned num_hw_submission;
>>>>>>>>>> +    atomic_t        *sched_score;
>>>>>>>>>>      };
>>>>>>>>>> #define amdgpu_ring_parse_cs(r, p, ib) ((r)->funcs->parse_cs((p), (ib)))
