On 03/05/2025 21:59, Maíra Canal wrote:
Currently, if we add the assertions presented in this commit to the mock
scheduler, we will see the following output:

[15:47:08] ============== [PASSED] drm_sched_basic_tests ==============
[15:47:08] ======== drm_sched_basic_timeout_tests (1 subtest) =========
[15:47:08] # drm_sched_basic_timeout: ASSERTION FAILED at 
drivers/gpu/drm/scheduler/tests/tests_basic.c:246
[15:47:08] Expected list_empty(&sched->job_list) to be true, but is false
[15:47:08] [FAILED] drm_sched_basic_timeout
[15:47:08] # module: drm_sched_tests

This occurs because `mock_sched_timedout_job()` doesn't properly handle
the hang. From the DRM sched documentation, `drm_sched_stop()` and
`drm_sched_start()` are typically used for reset recovery. If these
functions are not used, the offending job won't be freed and should be
freed by the caller.

Currently, the mock scheduler doesn't use the functions provided by the
API, nor does it handle the freeing of the job. As a result, the job isn't
removed from the job list.

For the record the job does gets freed via the kunit managed allocation.

It was a design choice for this test to be a *strict* unit test which tests only a _single_ thing. And that is that the timedout_job() hook gets called. As such the hook was implemented to satisfy that single requirement only.

But I also do not oppose making it test multiple things in one test per se.

This commit mocks a GPU reset by stopping the scheduler affected by the
reset, waiting a couple of microseconds to mimic a hardware reset, and
then restart the affected scheduler.

Signed-off-by: Maíra Canal <mca...@igalia.com>
---
  drivers/gpu/drm/scheduler/tests/mock_scheduler.c | 10 ++++++++++
  drivers/gpu/drm/scheduler/tests/tests_basic.c    |  3 +++
  2 files changed, 13 insertions(+)

diff --git a/drivers/gpu/drm/scheduler/tests/mock_scheduler.c 
b/drivers/gpu/drm/scheduler/tests/mock_scheduler.c
index 
f999c8859cf7adb8f06fc8a37969656dd3249fa7..e9af202d84bd55ea5cc048215e39f5407bc84458
 100644
--- a/drivers/gpu/drm/scheduler/tests/mock_scheduler.c
+++ b/drivers/gpu/drm/scheduler/tests/mock_scheduler.c
@@ -1,6 +1,8 @@
  // SPDX-License-Identifier: GPL-2.0
  /* Copyright (c) 2025 Valve Corporation */
+#include <linux/delay.h>
+
  #include "sched_tests.h"
/*
@@ -203,10 +205,18 @@ static struct dma_fence *mock_sched_run_job(struct 
drm_sched_job *sched_job)
  static enum drm_gpu_sched_stat
  mock_sched_timedout_job(struct drm_sched_job *sched_job)
  {
+       struct drm_mock_scheduler *sched =
+               drm_sched_to_mock_sched(sched_job->sched);
        struct drm_mock_sched_job *job = drm_sched_job_to_mock_job(sched_job);
job->flags |= DRM_MOCK_SCHED_JOB_TIMEDOUT; + drm_sched_stop(&sched->base, &job->base);
+
+       usleep_range(200, 500);

msleep(10) or something to make it seem less like the actual numbers are relevant?

> +> +    drm_sched_start(&sched->base, 0);
+
        return DRM_GPU_SCHED_STAT_NOMINAL;
  }
diff --git a/drivers/gpu/drm/scheduler/tests/tests_basic.c b/drivers/gpu/drm/scheduler/tests/tests_basic.c
index 
7230057e0594c6246f02608f07fcb1f8d738ac75..8f960f0fd31d0af7873f410ceba2d636f58a5474
 100644
--- a/drivers/gpu/drm/scheduler/tests/tests_basic.c
+++ b/drivers/gpu/drm/scheduler/tests/tests_basic.c
@@ -241,6 +241,9 @@ static void drm_sched_basic_timeout(struct kunit *test)
                        job->flags & DRM_MOCK_SCHED_JOB_TIMEDOUT,
                        DRM_MOCK_SCHED_JOB_TIMEDOUT);
+ KUNIT_ASSERT_TRUE(test, list_empty(&sched->job_list));

Hmm I think this assert could be racy because it appears to rely on the free worker to run and cleanup the "finished" job in the window between drm_mock_sched_job_wait_finished() (or drm_sched_start(), depends how you look at it) and here. Am I missing something?

Regards,

Tvrtko

+       KUNIT_ASSERT_TRUE(test, list_empty(&sched->done_list));
> +>      drm_mock_sched_entity_free(entity);
  }

Reply via email to