On Wed, May 8, 2013 at 4:06 PM, Chris Wilson <ch...@chris-wilson.co.uk> wrote:
> On Wed, May 08, 2013 at 04:02:00PM +0200, Daniel Vetter wrote:
>> On Wed, May 08, 2013 at 02:29:30PM +0100, Chris Wilson wrote:
>> > There is an unlikely corner case whereby a lockless wait may not notice
>> > a GPU hang and reset, and so continue to wait for the device to advance
>> > beyond the chosen seqno. This of course may never happen as the waiter
>> > may be the only user. Instead, we can explicitly advance the device
>> > seqno to match the requests that are forcibly retired following the
>> > hang.
>> >
>> > Signed-off-by: Chris Wilson <ch...@chris-wilson.co.uk>
>>
>> This race is why the reset counter must always increase and can't just
>> flip-flop between the reset-in-progress and everything-works states.
>>
>> Now if we want to unwedge on resume we need to reconsider this, but imo it
>> would be easier to simply remember the reset counter before we wedge the
>> gpu and restore that one (incremented as if the gpu reset worked). We
>> already assume that wedged will never collide with a real reset counter,
>> so this should work.
>
> Agree that this is an unwedge-upon-resume issue, but my argument here is
> that this leaves the hardware state consistent with what we forcibly
> reset it to. From that perspective your suggestion is papering over the
> bug here, and this is the neat solution.
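[Editor's note: a minimal sketch of the race Daniel alludes to, with entirely
hypothetical names (sample_reset, reset_seen, etc.) rather than the real i915
code. Because the counter only ever increases, a reset that begins *and*
completes while a lockless waiter sleeps is still visible on wake-up; a
two-state in-progress/ok flag would read "ok" both before and after, and the
waiter would miss the reset entirely.]

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical monotonic reset counter, in the spirit of the one
 * discussed above: even = device stable, odd = reset in progress.
 * Names and layout are invented for illustration. */
static atomic_uint reset_counter;

/* Waiter samples the counter before starting a lockless wait. */
static unsigned sample_reset(void)
{
    return atomic_load(&reset_counter);
}

/* The reset path bumps the counter on entry and on completion, so the
 * counter can never return to a value a waiter has already sampled. */
static void begin_reset(void) { atomic_fetch_add(&reset_counter, 1); }
static void end_reset(void)   { atomic_fetch_add(&reset_counter, 1); }

/* After (or during) the wait: has a reset happened, or is one in
 * flight, since we sampled `pre`? Either condition means the seqno we
 * are waiting on may never arrive and we must bail out and retry. */
static bool reset_seen(unsigned pre)
{
    return atomic_load(&reset_counter) != pre || (pre & 1);
}
```

With a flip-flop flag, the `!= pre` comparison would return false after a
complete begin/end cycle, which is exactly the lost-wakeup corner case the
patch and the monotonic counter are guarding against.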
Yeah, for the reset case I agree that just continuing in the sequence
would be more resilient. I'm still a bit unsure though what to do across
suspend/resume (where we currently force-reset the sequence numbers,
too). Maybe we need the poke-y stick there, too (in the form of kicking
waiters and incrementing the reset counter).
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx