On 2018-10-12 03:34 AM, Daniel Vetter wrote:
On Thu, Oct 11, 2018 at 08:27:43PM -0400, sunpeng...@amd.com wrote:
From: Leo Li <sunpeng...@amd.com>

This fixes a general protection fault, caused by accessing the contents
of a flip_done completion object that has already been freed. It occurs
due to the preemption of a non-blocking commit worker thread W by
another commit thread X. X continues to clear its atomic state at the
end, destroying the CRTC commit object that W still needs. Switching
back to W and accessing the commit objects then leads to bad results.

Worker W becomes preemptable when waiting for flip_done to complete. At
this point, a frequently occurring commit thread X can take over. Here's
an example where W is a worker thread that flips on both CRTCs, and X
does a legacy cursor update on both CRTCs:

         ...
      1. W does flip work
      2. W runs commit_hw_done()
      3. W waits for flip_done on CRTC 1
      4. > flip_done for CRTC 1 completes
      5. W finishes waiting for CRTC 1
      6. W waits for flip_done on CRTC 2

      7. > Preempted by X
      8. > flip_done for CRTC 2 completes
      9. X atomic_check: hw_done and flip_done are complete on all CRTCs
     10. X updates cursor on both CRTCs
     11. X destroys atomic state
     12. X done

     13. > Switch back to W
     14. W waits for flip_done on CRTC 2
     15. W raises general protection fault

The error looks like so:

     general protection fault: 0000 [#1] PREEMPT SMP PTI
     **snip**
     Call Trace:
      lock_acquire+0xa2/0x1b0
      _raw_spin_lock_irq+0x39/0x70
      wait_for_completion_timeout+0x31/0x130
      drm_atomic_helper_wait_for_flip_done+0x64/0x90 [drm_kms_helper]
      amdgpu_dm_atomic_commit_tail+0xcae/0xdd0 [amdgpu]
      commit_tail+0x3d/0x70 [drm_kms_helper]
      process_one_work+0x212/0x650
      worker_thread+0x49/0x420
      kthread+0xfb/0x130
      ret_from_fork+0x3a/0x50
     Modules linked in: x86_pkg_temp_thermal amdgpu(O) chash(O)
     gpu_sched(O) drm_kms_helper(O) syscopyarea sysfillrect sysimgblt
     fb_sys_fops ttm(O) drm(O)

Note that i915 has this issue masked, since hw_done is signaled after
waiting for flip_done. Doing so will block the cursor update from
happening until hw_done is signaled, preventing the cursor commit from
destroying the state.

Signed-off-by: Leo Li <sunpeng...@amd.com>

Great analysis!

Bugfix itself needs a bit more work though, see below.

---
  drivers/gpu/drm/drm_atomic_helper.c | 30 ++++++++++++++++++++++++++----
  1 file changed, 26 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/drm_atomic_helper.c 
b/drivers/gpu/drm/drm_atomic_helper.c
index 80be74d..efdf043 100644
--- a/drivers/gpu/drm/drm_atomic_helper.c
+++ b/drivers/gpu/drm/drm_atomic_helper.c
@@ -1410,20 +1410,42 @@ void drm_atomic_helper_wait_for_flip_done(struct 
drm_device *dev,
  {
        struct drm_crtc_state *new_crtc_state;
        struct drm_crtc *crtc;
+       struct drm_crtc_commit **commits;
        int i;
+       int num_crtc = dev->mode_config.num_crtc;
+ commits = kcalloc(num_crtc, sizeof(*commits), GFP_KERNEL);
+
+       /*
+        * Because we open ourselves to preemption by calling
+        * wait_for_completion_timeout(), we need to get our own references to
+        * the commit objects. This is so that a preempting commit won't free
+        * them.
+        */

The scheduler can preempt you at any point before here too, if you enable
CONFIG_PREEMPT. It definitely can preempt/block in the kcalloc above, even
if you have CONFIG_PREEMPT disabled. The scheduling point you identified
below is simply the most likely.


Ah, didn't think of this.

The underlying bug is that after drm_atomic_helper_commit_hw_done() we
unblock the next commit worker (thread X in your example), and cannot
assume anymore that the new state will stay around (since new state is
released by the subsequent commit work run in thread X).

Therefore your drm_crtc_commit_get here already can access freed memory.
The correct fix here is to store the temporary reference before we call
drm_atomic_helper_commit_hw_done(), and then use that in this function
here. The problem is that this is somewhat tricky to pull off, since some
drivers call wait_for_flip_done() before hw_done() (like i915).

Here's what I'd do:
- Add a new drm_crtc_commit pointer to struct __drm_crtcs_state.

- Fill that out in drm_atomic_helper_setup_commit, and make sure you
   release the reference for it in drm_atomic_state_default_clear().

- Use that pointer instead in drm_atomic_helper_wait_for_flip_done() here.

Thanks for the pointers, patch incoming :)

Leo


Cheers, Daniel
        for_each_new_crtc_in_state(old_state, crtc, new_crtc_state, i) {
-               struct drm_crtc_commit *commit = new_crtc_state->commit;
+               commits[i] = new_crtc_state->commit;
+               if (commits[i])
+                       drm_crtc_commit_get(commits[i]);
+       }
+
+       for (i = 0; i < num_crtc; i++) {
                int ret;
- if (!commit)
+               if (!commits[i])
                        continue;
- ret = wait_for_completion_timeout(&commit->flip_done, 10 * HZ);
+               ret = wait_for_completion_timeout(&commits[i]->flip_done,
+                                                 10 * HZ);
                if (ret == 0)
                        DRM_ERROR("[CRTC:%d:%s] flip_done timed out\n",
-                                 crtc->base.id, crtc->name);
+                                 commits[i]->crtc->base.id,
+                                 commits[i]->crtc->name);
        }
+
+       for (i = 0; i < num_crtc; i++)
+               if (commits[i])
+                       drm_crtc_commit_put(commits[i]);
+       kfree(commits);
  }
  EXPORT_SYMBOL(drm_atomic_helper_wait_for_flip_done);
--
2.7.4

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Reply via email to