On Mon Sep 2, 2024 at 2:40 AM PDT, tjakobi wrote:
> From: Tobias Jakobi <tjak...@math.uni-bielefeld.de>
>
> dc_state_destruct() nulls the resource context of the DC state. The pipe
> context passed to dcn10_set_drr() is a member of this resource context.
>
> If dc_state_destruct() is called in parallel with the IRQ processing
> (which calls dcn10_set_drr() at some point), we can end up using
> already-nulled function callback fields of struct stream_resource.
>
> The logic in dcn10_set_drr() already tries to avoid this by checking tg
> against NULL. But if the nulling happens exactly after the NULL check and
> before the next access, we still get a race.
>
> Avoid this by first copying tg to a local variable and then using that
> variable for all operations. This should work as long as nobody frees
> the resource pool where the timing generators live.
>
> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3142
> Fixes: 06ad7e164256 ("drm/amd/display: Destroy DC context while keeping DML and DML2")
> Signed-off-by: Tobias Jakobi <tjak...@math.uni-bielefeld.de>
> ---
>  .../amd/display/dc/hwss/dcn10/dcn10_hwseq.c   | 20 +++++++++++--------
>  1 file changed, 12 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c b/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c
> index 3306684e805a..da8f2cb3c5db 100644
> --- a/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c
> +++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c
> @@ -3223,15 +3223,19 @@ void dcn10_set_drr(struct pipe_ctx **pipe_ctx,
>        * as well.
>        */
>       for (i = 0; i < num_pipes; i++) {
> -             if ((pipe_ctx[i]->stream_res.tg != NULL) && pipe_ctx[i]->stream_res.tg->funcs) {
> -                     if (pipe_ctx[i]->stream_res.tg->funcs->set_drr)
> -                             pipe_ctx[i]->stream_res.tg->funcs->set_drr(
> -                                     pipe_ctx[i]->stream_res.tg, &params);
> +             /* dc_state_destruct() might null the stream resources, so fetch tg
> +              * here first to avoid a race condition. The lifetime of the pointee
> +              * itself (the timing_generator object) is not a problem here.
> +              */
> +             struct timing_generator *tg = pipe_ctx[i]->stream_res.tg;
> +
> +             if ((tg != NULL) && tg->funcs) {
> +                     if (tg->funcs->set_drr)
> +                             tg->funcs->set_drr(tg, &params);
>                       if (adjust.v_total_max != 0 && adjust.v_total_min != 0)
> -                             if (pipe_ctx[i]->stream_res.tg->funcs->set_static_screen_control)
> -                                     pipe_ctx[i]->stream_res.tg->funcs->set_static_screen_control(
> -                                             pipe_ctx[i]->stream_res.tg,
> -                                             event_triggers, num_frames);
> +                             if (tg->funcs->set_static_screen_control)
> +                                     tg->funcs->set_static_screen_control(
> +                                             tg, event_triggers, num_frames);
>               }
>       }
>  }

This fixes hard-to-trace panics with labwc VRR and Wayfire on an RX 6700 XT. I
had to use netconsole to arrive at the original bug report.

Tested-by: Christopher Snowhill <ch...@kode54.net>
