Re: [PATCH] drm/amdgpu: fix ring test failure issue during s3 in vce 3.0

Christian König Tue, 28 May 2019 00:24:43 -0700

Wow, really good catch!

The underlying problem is most likely that VCE block is either power orclock gated and because of this the readptr read always returns zero.

Now amdgpu_ring_alloc() informs the power management code that the blockis about to be used and so the gating is turned off.

Mhm, that is probably wrong at a hole bunch of other places, at leastthe UVD and VCN code comes to mind.

I agree with Leo that you should remove the original read (so to notread twice) and it would be realy nice if you could double check theother code (UVD/VCN) for similar problems as well.


Regards,
Christian.

Am 27.05.19 um 19:20 schrieb Li, Ching-shih (Louis):

I don’t mean to read it twice. The solution is to make first readlater. I didn’t modify the original code to make code difference lessand simple. I guess it should work to remove the original read there.


*From:*Liu, Leo <leo....@amd.com>
*Sent:* Tuesday, May 28, 2019 12:40 AM

*To:* Li, Ching-shih (Louis) <ching-shih...@amd.com>; S, Shirish<shiris...@amd.com>; Grodzovsky, Andrey <andrey.grodzov...@amd.com>;Zhang, Jerry <jerry.zh...@amd.com>; Deng, Emily <emily.d...@amd.com>;Deucher, Alexander <alexander.deuc...@amd.com>

*Cc:* amd-gfx@lists.freedesktop.org

*Subject:* Re: [PATCH] drm/amdgpu: fix ring test failure issue durings3 in vce 3.0


int amdgpu_vce_ring_test_ring(struct amdgpu_ring *ring)
{
    struct amdgpu_device *adev = ring->adev;

    uint32_t rptr = amdgpu_ring_get_rptr(ring);

    unsigned i;
    int r, timeout = adev->usec_timeout;

    /* skip ring test for sriov*/
    if (amdgpu_sriov_vf(adev))
        return 0;

    r = amdgpu_ring_alloc(ring, 16);
    if (r)
        return r;

    amdgpu_ring_write(ring, VCE_CMD_END);
    amdgpu_ring_commit(ring);

Above is original code, rptr is updated when called, and below is yourpatch, my question is why do you need to get rptr twice?


@@ -1084,6 +1084,8 @@ int amdgpu_vce_ring_test_ring(struct amdgpu_ring *ring)
         if (r)
                return r;

+ rptr = amdgpu_ring_get_rptr(ring);

+
         amdgpu_ring_write(ring, VCE_CMD_END);
         amdgpu_ring_commit(ring);

On 5/27/19 12:22 PM, Li, Ching-shih (Louis) wrote:

    Hi Leo,

    Yes, I confirm it is the root cause *for the Chrome S3 issue*.
    Whenever system is resumed, the original instruction always gets
    zero. However, I have no idea why it fails, and didn’t verify this
    problem on CRB or any other Linux platform yet.

    Although I think the ideal solution is an indicator, e.g. a
    register, for driver to check if related firmware and hardware are
    ready to work. So driver can make sure it is ok to read rptr.
    Without any reference document, I can only try to solve the
    problem by modifying driver. Debug traces reveal that only first
    rptr read fails, but the read in check loop is ok. Therefore, a
    solution comes to mind: to update rptr later for initial rptr
    value. Tests prove it working in Chrome platforms. Fyi~

    BR,

    Louis

    *From:*Liu, Leo <leo....@amd.com> <mailto:leo....@amd.com>
    *Sent:* Monday, May 27, 2019 9:01 PM
    *To:* S, Shirish <shiris...@amd.com> <mailto:shiris...@amd.com>;
    Grodzovsky, Andrey <andrey.grodzov...@amd.com>
    <mailto:andrey.grodzov...@amd.com>; Zhang, Jerry
    <jerry.zh...@amd.com> <mailto:jerry.zh...@amd.com>; Deng, Emily
    <emily.d...@amd.com> <mailto:emily.d...@amd.com>; Deucher,
    Alexander <alexander.deuc...@amd.com>
    <mailto:alexander.deuc...@amd.com>
    *Cc:* amd-gfx@lists.freedesktop.org
    <mailto:amd-gfx@lists.freedesktop.org>; Li, Ching-shih (Louis)
    <ching-shih...@amd.com> <mailto:ching-shih...@amd.com>
    *Subject:* Re: [PATCH] drm/amdgpu: fix ring test failure issue
    during s3 in vce 3.0

    On 5/27/19 3:42 AM, S, Shirish wrote:

        From: Louis Li<ching-shih...@amd.com>  <mailto:ching-shih...@amd.com>

        [What]

        vce ring test fails consistently during resume in s3 cycle, due to

        mismatch read & write pointers.

        On debug/analysis its found that rptr to be compared is not being

        correctly updated/read, which leads to this failure.

        Below is the failure signature:

           [drm:amdgpu_vce_ring_test_ring] *ERROR* amdgpu: ring 12 test failed

           [drm:amdgpu_device_ip_resume_phase2] *ERROR* resume of IP block 
<vce_v3_0> failed -110

           [drm:amdgpu_device_resume] *ERROR* amdgpu_device_ip_resume failed 
(-110).

        [How]

        fetch rptr appropriately, meaning move its read location further down

        in the code flow.

        With this patch applied the s3 failure is no more seen for >5k s3 
cycles,

        which otherwise is pretty consistent.

        Signed-off-by: Louis Li<ching-shih...@amd.com>  
<mailto:ching-shih...@amd.com>

        ---

          drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 2 ++

          1 file changed, 2 insertions(+)

        diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c

        index c021b11..92f9d46 100644

        --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c

        +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c

        @@ -1084,6 +1084,8 @@ int amdgpu_vce_ring_test_ring(struct amdgpu_ring 
*ring)

           if (r)

                  return r;

        + rptr = amdgpu_ring_get_rptr(ring);

        +

    The rptr update is there:

    |        uint32_t rptr = amdgpu_ring_get_rptr(ring);|

    ||

    |Are you sure this is the root cause?|

    ||

    |Regards,|

    |Leo|

    ||

           amdgpu_ring_write(ring, VCE_CMD_END);

           amdgpu_ring_commit(ring);


_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amdgpu: fix ring test failure issue during s3 in vce 3.0

Reply via email to