On 2021-08-16 12:20 p.m., Quan, Evan wrote:
> [AMD Official Use Only]
> 
> Hi Michel,
> 
> The patch seems reasonable to me(especially the cancel_delayed_work_sync() 
> part).
> However, can you explain more about the code below?
> What's the race issue here exactly?
> 
> +     /* mutex_lock could deadlock with cancel_delayed_work_sync in 
> amdgpu_gfx_off_ctrl. */
> +     if (!mutex_trylock(&adev->gfx.gfx_off_mutex)) {
> +             /* If there's a bug which causes amdgpu_gfx_off_ctrl to be 
> called with enable=true
> +              * when adev->gfx.gfx_off_req_count is already 0, we might race 
> with that.
> +              * Re-schedule to make sure gfx off will be re-enabled in the 
> HW eventually.
> +              */
> +             schedule_delayed_work(&adev->gfx.gfx_off_delay_work, 
> AMDGPU_GFX_OFF_DELAY_ENABLE);
> +             return;
> +     }

If amdgpu_gfx_off_ctrl was called with enable=true when 
adev->gfx.gfx_off_req_count == 0 already, it could have prevented 
amdgpu_device_delay_enable_gfx_off from locking the mutex.

v3 solves this by only scheduling the work when adev->gfx.gfx_off_req_count 
transitions from 1 to 0, which means it no longer needs to lock the mutex.


-- 
Earthling Michel Dänzer               |               https://redhat.com
Libre software enthusiast             |             Mesa and X developer

Reply via email to