On 2021-08-12 1:33 p.m., Lazar, Lijo wrote:
> On 8/12/2021 1:41 PM, Michel Dänzer wrote:
>> On 2021-08-12 7:55 a.m., Koenig, Christian wrote:
>>> Hi James,
>>>
>>> Evan seems to have understood how this all works together.
>>>
>>> See while any begin/end use critical section is active the work should not 
>>> be active.
>>>
>>> When you handle only one ring you can just call cancel in begin use and 
>>> schedule in end use. But when you have more than one ring you need a lock 
>>> or counter to prevent concurrent work items to be started.
>>>
>>> Michelle's idea to use mod_delayed_work is a bad one because it assumes 
>>> that the delayed work is still running.
>>
>> It merely assumes that the work may already have been scheduled before.
>>
>> Admittedly, I missed the cancel_delayed_work_sync calls for patch 2. While I 
>> think it can still have some effect when there's a single work item for 
>> multiple rings, as described by James, it's probably negligible, since 
>> presumably the time intervals between ring_begin_use and ring_end_use are 
>> normally much shorter than a second.
>>
>> So, while patch 2 is at worst a no-op (since mod_delayed_work is the same as 
>> schedule_delayed_work if the work hasn't been scheduled yet), I'm fine with 
>> dropping it.
>>
>>
>>> Something similar applies to the first patch I think,
>>
>> There are no cancel work calls in that case, so the commit log is accurate 
>> TTBOMK.
> 
> Curious -
> 
> For patch 1, does it make a difference if any delayed work scheduled is 
> cancelled in the else part before proceeding?
> 
> } else if (!enable && adev->gfx.gfx_off_state) {
> cancel_delayed_work();

I tried the patch below.

While this does seem to fix the problem as well, I see a potential issue:

1. amdgpu_gfx_off_ctrl locks adev->gfx.gfx_off_mutex
2. amdgpu_device_delay_enable_gfx_off runs, blocks in mutex_lock
3. amdgpu_gfx_off_ctrl calls cancel_delayed_work_sync

I'm afraid this would deadlock? (CONFIG_PROVE_LOCKING doesn't complain though)


Maybe it's possible to fix it with cancel_delayed_work_sync somehow, but I'm 
not sure how offhand. (With cancel_delayed_work instead, I'm worried 
amdgpu_device_delay_enable_gfx_off might still enable GFXOFF in the HW 
immediately after amdgpu_gfx_off_ctrl unlocks the mutex. Then again, that might 
happen with mod_delayed_work as well...)


diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c

index a0be0772c8b3..3e4585ffb9af 100644

--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c

+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c

@@ -570,8 +570,11 @@ void amdgpu_gfx_off_ctrl(struct amdgpu_device *adev, bool 
enable)



        if (enable && !adev->gfx.gfx_off_state && !adev->gfx.gfx_off_req_count) 
{

                schedule_delayed_work(&adev->gfx.gfx_off_delay_work, 
GFX_OFF_DELAY_ENABLE);

-       } else if (!enable && adev->gfx.gfx_off_state) {

-               if (!amdgpu_dpm_set_powergating_by_smu(adev, 
AMD_IP_BLOCK_TYPE_GFX, false)) {

+       } else if (!enable) {

+               cancel_delayed_work_sync(&adev->gfx.gfx_off_delay_work);

+

+               if (adev->gfx.gfx_off_state &&

+                   !amdgpu_dpm_set_powergating_by_smu(adev, 
AMD_IP_BLOCK_TYPE_GFX, false)) {

                        adev->gfx.gfx_off_state = false;



                        if (adev->gfx.funcs->init_spm_golden) {



-- 
Earthling Michel Dänzer               |               https://redhat.com
Libre software enthusiast             |             Mesa and X developer

Reply via email to