On Mon, Apr 14, 2014 at 10:23:20AM +0000, Mateo Lozano, Oscar wrote:
> > > > > +                             &requests);
> > > > > +             if (req_matched == 1) {
> > > > > +                     igt_assert(strstr(ring_name, 
> > > > > expected_ring_name));
> > > > > +                     igt_assert(requests == 1);
> > > >
> > > > Bad assumption. You could still have the request from
> > > > gem_quiescent_gpu() which may not have been retired before the error
> > > > triggers.
> > >
> > > Hmmmm... but I make a first valid gem_execbuf()+gem_sync() before the
> > one that actually hangs. Shouldn´t that make sure that all previous requests
> > have been retired?
> > 
> > No, I would not make that assumption about kernel behaviour. The only thing
> > that it will guarantee is that the last request with that handle is 
> > retired. (In
> > other words, forthcoming bug fixes already break this
> > assumption.)
> 
> Ok, then I´ll check all the reported requests (can I at least assume that the 
> one I am interested in will be the last one? request_list seems to be 
> FIFO...).

In general, even that may be dangerous. (But less so if can be sure that
is no batchbuffer injection from the testsuite, and that there are no
other clients, or that the kernel doesn't start having to do its own
w/a.)

> > > > > +
> > > > > +                     igt_assert(getline(&line, &line_size, file) > 
> > > > > 0);
> > > > > +                     items = sscanf(line, "  seqno 0x%08x, emitted 
> > > > > %ld, tail
> > > > 0x%08x\n",
> > > > > +                                     &seqno, &jiffies, &tail);
> > > > > +                     igt_assert(items == 3);
> > > >
> > > > Bad. I may change the format. s/may/will/
> > >
> > > I still need to make sure I got a valid tail, so I need to know the 
> > > format and
> > fail if don´t understand it. Or do you want me to use a different tail, 
> > maybe the
> > ringbuffer one?
> > 
> > I didn't spot where you used tail. Care to elaborate? I may be able to 
> > suggest an
> > alternative, or we may either code this more defensively or make the kernel
> > emission easier to use and hopefully more informative for debugging.
> 
> I´m using the request tail to traverse the ring buffer, make sure there is a 
> MI_BB_START and then our object´s address shortly before that tail. I could 
> use ring->tail, or I could skip this check altogether (after all, I´m already 
> checking that the GPU was executing a Batch Buffer, and that the BB had my 
> magic number on it).

I would add a little more smarts to both the kernel and error-decode.
In the kernel, we can print the guilty request, which you can then use
to confirm that it is yours. That seems to me to be a stronger
validation of gem_error_capture, and a useful bit of information from
hangstats that we do not expose currently.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

Reply via email to