Thanks a lot

Best Regards,
JingWen Chen



> On Mar 4, 2022, at 00:36, Grodzovsky, Andrey <andrey.grodzov...@amd.com> 
> wrote:
> 
> I pushed all the changes including your patch.
> 
> Andrey
> 
> On 2022-03-02 22:16, Andrey Grodzovsky wrote:
>> OK, I will do a quick smoke test tomorrow and push all of it then.
>> 
>> Andrey
>> 
>> On 2022-03-02 21:59, Chen, JingWen wrote:
>>> Hi Andrey,
>>> 
>>> I don't have a bare metal environment, I can only test the SRIOV cases.
>>> 
>>> Best Regards,
>>> JingWen Chen
>>> 
>>> 
>>> 
>>>> On Mar 3, 2022, at 01:55, Grodzovsky, Andrey <andrey.grodzov...@amd.com> 
>>>> wrote:
>>>> 
>>>> The patch is acked-by: Andrey Grodzovsky <andrey.grodzov...@amd.com>
>>>> 
>>>> If you also smoke tested bare metal, feel free to apply all the patches; 
>>>> if not, let me know.
>>>> 
>>>> Andrey
>>>> 
>>>> On 2022-03-02 04:51, JingWen Chen wrote:
>>>>> Hi Andrey,
>>>>> 
>>>>> Most of the patches are OK, but the code will introduce an IB test 
>>>>> failure on the disabled VCN of sienna_cichlid.
>>>>> 
>>>>> In the SRIOV use case we disable one VCN on sienna_cichlid. I have 
>>>>> attached a patch to fix this issue; please check the attachment.
>>>>> 
>>>>> Best Regards,
>>>>> 
>>>>> Jingwen Chen
>>>>> 
>>>>> 
>>>>> On 2/26/22 5:22 AM, Andrey Grodzovsky wrote:
>>>>>> Hey, patches attached - I applied the patches and resolved the merge 
>>>>>> conflicts, but I wasn't able to test, as my onboard network card doesn't 
>>>>>> work with the 5.16 kernel (it does with 5.17; maybe it's a Kconfig issue 
>>>>>> and I need to check more).
>>>>>> The patches are on top of commit 'cababde192b2 drm/amd/pm: fix mode2 
>>>>>> reset fail for smu 13.0.5' (Yifan Zhang, 2 days ago).
>>>>>> 
>>>>>> Please test and let me know. Maybe by Monday I will be able to resolve 
>>>>>> the connectivity issue on 5.16.
>>>>>> 
>>>>>> Andrey
>>>>>> 
>>>>>> On 2022-02-24 22:13, JingWen Chen wrote:
>>>>>>> Hi Andrey,
>>>>>>> 
>>>>>>> Sorry for the confusion, I meant the whole patch series. We are 
>>>>>>> depending on this patch series to fix the concurrency issue within 
>>>>>>> the SRIOV TDR sequence.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On 2/25/22 1:26 AM, Andrey Grodzovsky wrote:
>>>>>>>> No problem if so but before I do,
>>>>>>>> 
>>>>>>>> 
>>>>>>>> JingWen - why do you think this patch is needed as a standalone now? It 
>>>>>>>> has no use without the
>>>>>>>> entire feature together with it. Are there some changes you want to do on 
>>>>>>>> top of that code?
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Andrey
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On 2022-02-24 12:12, Deucher, Alexander wrote:
>>>>>>>>> [Public]
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> If it applies cleanly, feel free to drop it in. I'll drop those 
>>>>>>>>> patches for drm-next since they are already in drm-misc.
>>>>>>>>> 
>>>>>>>>> Alex
>>>>>>>>> 
>>>>>>>>> ------------------------------------------------------------------------
>>>>>>>>>  
>>>>>>>>> *From:* amd-gfx <amd-gfx-boun...@lists.freedesktop.org> on behalf of 
>>>>>>>>> Andrey Grodzovsky <andrey.grodzov...@amd.com>
>>>>>>>>> *Sent:* Thursday, February 24, 2022 11:24 AM
>>>>>>>>> *To:* Chen, JingWen <jingwen.ch...@amd.com>; Christian König 
>>>>>>>>> <ckoenig.leichtzumer...@gmail.com>; dri-de...@lists.freedesktop.org 
>>>>>>>>> <dri-de...@lists.freedesktop.org>; amd-gfx@lists.freedesktop.org 
>>>>>>>>> <amd-gfx@lists.freedesktop.org>
>>>>>>>>> *Cc:* Liu, Monk <monk....@amd.com>; Chen, Horace 
>>>>>>>>> <horace.c...@amd.com>; Lazar, Lijo <lijo.la...@amd.com>; Koenig, 
>>>>>>>>> Christian <christian.koe...@amd.com>; dan...@ffwll.ch 
>>>>>>>>> <dan...@ffwll.ch>
>>>>>>>>> *Subject:* Re: [RFC v4 02/11] drm/amdgpu: Move scheduler init to 
>>>>>>>>> after XGMI is ready
>>>>>>>>> No, because the whole patch set, including this patch, landed in
>>>>>>>>> drm-misc-next and will reach amd-staging-drm-next on the next upstream
>>>>>>>>> rebase, I guess.
>>>>>>>>> 
>>>>>>>>> Andrey
>>>>>>>>> 
>>>>>>>>> On 2022-02-24 01:47, JingWen Chen wrote:
>>>>>>>>>> Hi Andrey,
>>>>>>>>>> 
>>>>>>>>>> Will you port this patch into amd-staging-drm-next?
>>>>>>>>>> 
>>>>>>>>>> On 2/10/22 2:06 AM, Andrey Grodzovsky wrote:
>>>>>>>>>>> All comments are fixed and the code is pushed. Thanks to everyone
>>>>>>>>>>> who helped review.
>>>>>>>>>>> 
>>>>>>>>>>> Andrey
>>>>>>>>>>> 
>>>>>>>>>>> On 2022-02-09 02:53, Christian König wrote:
>>>>>>>>>>>> Am 09.02.22 um 01:23 schrieb Andrey Grodzovsky:
>>>>>>>>>>>>> Before we initialize the schedulers we must know which reset
>>>>>>>>>>>>> domain we are in - for a single device there is a single
>>>>>>>>>>>>> domain per device and so a single wq per device. For XGMI
>>>>>>>>>>>>> the reset domain spans the entire XGMI hive and so the
>>>>>>>>>>>>> reset wq is per hive.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzov...@amd.com>
>>>>>>>>>>>> One more comment below; with that fixed, Reviewed-by: Christian 
>>>>>>>>>>>> König <christian.koe...@amd.com>.
>>>>>>>>>>>> 
>>>>>>>>>>>>> ---
>>>>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 45 ++++++++++++++++++++++
>>>>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 34 ++--------------
>>>>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |  2 +
>>>>>>>>>>>>>       3 files changed, 51 insertions(+), 30 deletions(-)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>>>>>>>>> index 9704b0e1fd82..00123b0013d3 100644
>>>>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>>>>>>>>> @@ -2287,6 +2287,47 @@ static int amdgpu_device_fw_loading(struct amdgpu_device *adev)
>>>>>>>>>>>>>           return r;
>>>>>>>>>>>>>       }
>>>>>>>>>>>>> +static int amdgpu_device_init_schedulers(struct amdgpu_device *adev)
>>>>>>>>>>>>> +{
>>>>>>>>>>>>> +    long timeout;
>>>>>>>>>>>>> +    int r, i;
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +    for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
>>>>>>>>>>>>> +        struct amdgpu_ring *ring = adev->rings[i];
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +        /* No need to setup the GPU scheduler for rings that don't need it */
>>>>>>>>>>>>> +        if (!ring || ring->no_scheduler)
>>>>>>>>>>>>> +            continue;
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +        switch (ring->funcs->type) {
>>>>>>>>>>>>> +        case AMDGPU_RING_TYPE_GFX:
>>>>>>>>>>>>> +            timeout = adev->gfx_timeout;
>>>>>>>>>>>>> +            break;
>>>>>>>>>>>>> +        case AMDGPU_RING_TYPE_COMPUTE:
>>>>>>>>>>>>> +            timeout = adev->compute_timeout;
>>>>>>>>>>>>> +            break;
>>>>>>>>>>>>> +        case AMDGPU_RING_TYPE_SDMA:
>>>>>>>>>>>>> +            timeout = adev->sdma_timeout;
>>>>>>>>>>>>> +            break;
>>>>>>>>>>>>> +        default:
>>>>>>>>>>>>> +            timeout = adev->video_timeout;
>>>>>>>>>>>>> +            break;
>>>>>>>>>>>>> +        }
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +        r = drm_sched_init(&ring->sched, &amdgpu_sched_ops,
>>>>>>>>>>>>> +                   ring->num_hw_submission, amdgpu_job_hang_limit,
>>>>>>>>>>>>> +                   timeout, adev->reset_domain.wq,
>>>>>>>>>>>>> +                   ring->sched_score, ring->name);
>>>>>>>>>>>>> +        if (r) {
>>>>>>>>>>>>> +            DRM_ERROR("Failed to create scheduler on ring %s.\n",
>>>>>>>>>>>>> +                  ring->name);
>>>>>>>>>>>>> +            return r;
>>>>>>>>>>>>> +        }
>>>>>>>>>>>>> +    }
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +    return 0;
>>>>>>>>>>>>> +}
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +
>>>>>>>>>>>>>       /**
>>>>>>>>>>>>>        * amdgpu_device_ip_init - run init for hardware IPs
>>>>>>>>>>>>>        *
>>>>>>>>>>>>> @@ -2419,6 +2460,10 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
>>>>>>>>>>>>>               }
>>>>>>>>>>>>>           }
>>>>>>>>>>>>>       +    r = amdgpu_device_init_schedulers(adev);
>>>>>>>>>>>>> +    if (r)
>>>>>>>>>>>>> +        goto init_failed;
>>>>>>>>>>>>> +
>>>>>>>>>>>>>           /* Don't init kfd if whole hive need to be reset during init */
>>>>>>>>>>>>>           if (!adev->gmc.xgmi.pending_reset)
>>>>>>>>>>>>>               amdgpu_amdkfd_device_init(adev);
>>>>>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>>>>>>>>>> index 45977a72b5dd..fa302540c69a 100644
>>>>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>>>>>>>>>> @@ -457,8 +457,6 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring,
>>>>>>>>>>>>>                         atomic_t *sched_score)
>>>>>>>>>>>>>       {
>>>>>>>>>>>>>           struct amdgpu_device *adev = ring->adev;
>>>>>>>>>>>>> -    long timeout;
>>>>>>>>>>>>> -    int r;
>>>>>>>>>>>>>             if (!adev)
>>>>>>>>>>>>>               return -EINVAL;
>>>>>>>>>>>>> @@ -478,36 +476,12 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring,
>>>>>>>>>>>>>           spin_lock_init(&ring->fence_drv.lock);
>>>>>>>>>>>>>           ring->fence_drv.fences = kcalloc(num_hw_submission * 2, sizeof(void *),
>>>>>>>>>>>>>                            GFP_KERNEL);
>>>>>>>>>>>>> -    if (!ring->fence_drv.fences)
>>>>>>>>>>>>> -        return -ENOMEM;
>>>>>>>>>>>>> -    /* No need to setup the GPU scheduler for rings that don't need it */
>>>>>>>>>>>>> -    if (ring->no_scheduler)
>>>>>>>>>>>>> -        return 0;
>>>>>>>>>>>>> +    ring->num_hw_submission = num_hw_submission;
>>>>>>>>>>>>> +    ring->sched_score = sched_score;
>>>>>>>>>>>> Let's move this into the caller and then use 
>>>>>>>>>>>> ring->num_hw_submission in the fence code as well.
>>>>>>>>>>>> 
>>>>>>>>>>>> The maximum number of jobs on the ring is not really fence 
>>>>>>>>>>>> specific.
>>>>>>>>>>>> 
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Christian.
>>>>>>>>>>>> 
>>>>>>>>>>>>>       -    switch (ring->funcs->type) {
>>>>>>>>>>>>> -    case AMDGPU_RING_TYPE_GFX:
>>>>>>>>>>>>> -        timeout = adev->gfx_timeout;
>>>>>>>>>>>>> -        break;
>>>>>>>>>>>>> -    case AMDGPU_RING_TYPE_COMPUTE:
>>>>>>>>>>>>> -        timeout = adev->compute_timeout;
>>>>>>>>>>>>> -        break;
>>>>>>>>>>>>> -    case AMDGPU_RING_TYPE_SDMA:
>>>>>>>>>>>>> -        timeout = adev->sdma_timeout;
>>>>>>>>>>>>> -        break;
>>>>>>>>>>>>> -    default:
>>>>>>>>>>>>> -        timeout = adev->video_timeout;
>>>>>>>>>>>>> -        break;
>>>>>>>>>>>>> -    }
>>>>>>>>>>>>> -
>>>>>>>>>>>>> -    r = drm_sched_init(&ring->sched, &amdgpu_sched_ops,
>>>>>>>>>>>>> -               num_hw_submission, amdgpu_job_hang_limit,
>>>>>>>>>>>>> -               timeout, NULL, sched_score, ring->name);
>>>>>>>>>>>>> -    if (r) {
>>>>>>>>>>>>> -        DRM_ERROR("Failed to create scheduler on ring %s.\n",
>>>>>>>>>>>>> -              ring->name);
>>>>>>>>>>>>> -        return r;
>>>>>>>>>>>>> -    }
>>>>>>>>>>>>> +    if (!ring->fence_drv.fences)
>>>>>>>>>>>>> +        return -ENOMEM;
>>>>>>>>>>>>>             return 0;
>>>>>>>>>>>>>       }
>>>>>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>>>>>>>>>>>>> index fae7d185ad0d..7f20ce73a243 100644
>>>>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>>>>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>>>>>>>>>>>>> @@ -251,6 +251,8 @@ struct amdgpu_ring {
>>>>>>>>>>>>>           bool has_compute_vm_bug;
>>>>>>>>>>>>>           bool            no_scheduler;
>>>>>>>>>>>>>           int            hw_prio;
>>>>>>>>>>>>> +    unsigned num_hw_submission;
>>>>>>>>>>>>> +    atomic_t        *sched_score;
>>>>>>>>>>>>>       };
>>>>>>>>>>>>>         #define amdgpu_ring_parse_cs(r, p, ib) ((r)->funcs->parse_cs((p), (ib)))
