On 24/04/2018 09:16, Chris Wilson wrote:
If we have more than a few, possibly several thousand request in the
queue, don't show the central portion, just the first few and the last
being executed and/or queued. The first few should be enough to help
identify a problem in execution, and most often comparing the first/last
in the queue is enough to identify problems in the scheduling.

We may need some fine tuning to set MAX_REQUESTS_TO_SHOW for common
debug scenarios, but for the moment if we can avoiding spending more
than a few seconds dumping the GPU state that will avoid a nasty
livelock (where hangcheck spends so long dumping the state, it fires
again and starts to dump the state again in parallel, ad infinitum).

v2: Remember to print last not the stale rq iter after the loop.

Signed-off-by: Chris Wilson <ch...@chris-wilson.co.uk>
---
  drivers/gpu/drm/i915/intel_engine_cs.c | 43 +++++++++++++++++++++++---
  1 file changed, 38 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c 
b/drivers/gpu/drm/i915/intel_engine_cs.c
index 66cddd059666..2398ea71e747 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1307,11 +1307,13 @@ void intel_engine_dump(struct intel_engine_cs *engine,
                       struct drm_printer *m,
                       const char *header, ...)
  {
+       const int MAX_REQUESTS_TO_SHOW = 8;
        struct intel_breadcrumbs * const b = &engine->breadcrumbs;
        const struct intel_engine_execlists * const execlists = 
&engine->execlists;
        struct i915_gpu_error * const error = &engine->i915->gpu_error;
-       struct i915_request *rq;
+       struct i915_request *rq, *last;
        struct rb_node *rb;
+       int count;
if (header) {
                va_list ap;
@@ -1378,16 +1380,47 @@ void intel_engine_dump(struct intel_engine_cs *engine,
        }
spin_lock_irq(&engine->timeline->lock);
-       list_for_each_entry(rq, &engine->timeline->requests, link)
-               print_request(m, rq, "\t\tE ");
+
+       last = NULL;
+       count = 0;
+       list_for_each_entry(rq, &engine->timeline->requests, link) {
+               if (count++ < MAX_REQUESTS_TO_SHOW - 1)
+                       print_request(m, rq, "\t\tE ");
+               else
+                       last = rq;

else {
        last = list_last_entry(...) ?
        break;
}

+       }
+       if (last) {
+               if (count > MAX_REQUESTS_TO_SHOW) {
+                       drm_printf(m,
+                                  "\t\t...skipping %d executing requests...\n",
+                                  count - MAX_REQUESTS_TO_SHOW);
+               }
+               print_request(m, last, "\t\tE ");
+       }

Or even stuff this printf in the first loop above, under the else branch. Maybe shorter would be like this, module off by ones if I made some:

list_for_each_entry(rq, &engine->timeline->requests, link) {
        struct i915_request *pr = rq;

        if (++count > MAX_REQUESTS_TO_SHOW) {
                pr = list_last_entry(...);
                drm_printf(m,
                           "\t\t...skipping %d executing requests...\n",
                           count - MAX_REQUESTS_TO_SHOW);
        }
        
        print_request(m, pr, "\t\tE ");
        
        if (count > MAX_REQUESTS_TO_SHOW)
                break;
}

+
+       last = NULL;
+       count = 0;
        drm_printf(m, "\t\tQueue priority: %d\n", execlists->queue_priority);
        for (rb = execlists->first; rb; rb = rb_next(rb)) {
                struct i915_priolist *p =
                        rb_entry(rb, typeof(*p), node);
- list_for_each_entry(rq, &p->requests, sched.link)
-                       print_request(m, rq, "\t\tQ ");
+               list_for_each_entry(rq, &p->requests, sched.link) {
+                       if (count++ < MAX_REQUESTS_TO_SHOW - 1)
+                               print_request(m, rq, "\t\tQ ");
+                       else
+                               last = rq;
+               }
        }
+       if (last) {
+               if (count > MAX_REQUESTS_TO_SHOW) {
+                       drm_printf(m,
+                                  "\t\t...skipping %d queued requests...\n",
+                                  count - MAX_REQUESTS_TO_SHOW);
+               }
+               print_request(m, last, "\t\tQ ");
+       }

Then I am thinking how to avoid the duplication and extract the smart printer. Macro would be easy at least, if a bit ugly.

Another idea comes to mind, but probably for the future, to merge same prio, context and strictly consecutive seqnos to a single line of output like:

 [prefix]seqno-seqno [%llx:seqno-seqno] (%u consecutive requests)

Give or take, but it will be more involved to do that.

+
        spin_unlock_irq(&engine->timeline->lock);
spin_lock_irq(&b->rb_lock);


Looks OK in general, just please see if you like my idea to contain the logic within a single loop and maybe even move it to a macro.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

Reply via email to