Am 12.06.2013 12:26, schrieb Michel D?nzer: > On Die, 2013-06-11 at 16:23 -0700, Andy Lutomirski wrote: >> If the device is idle for over ten seconds, then the next attempt to do >> anything can race with the lockup detector and cause a bogus lockup >> to be detected. >> >> Oddly, the situation is well-described in the lockup detector's comments >> and a fix is even described. This patch implements that fix (and corrects >> some typos in the description). >> >> My system has been stable for about a week running this code. Without this, >> my screen would go blank every now and then and, when it came back, >> everything >> would be remarkably slow (the latter is a separate bug). >> >> Signed-off-by: Andy Lutomirski <luto at amacapital.net> > [...] > >> diff --git a/drivers/gpu/drm/radeon/radeon_ring.c >> b/drivers/gpu/drm/radeon/radeon_ring.c >> index 1ef5eaa..fb7b3ea 100644 >> --- a/drivers/gpu/drm/radeon/radeon_ring.c >> +++ b/drivers/gpu/drm/radeon/radeon_ring.c >> @@ -547,12 +547,12 @@ void radeon_ring_lockup_update(struct radeon_ring >> *ring) >> * have CP rptr to a different value of jiffies wrap around which will >> force >> * initialization of the lockup tracking informations. >> * >> - * A possible false positivie is if we get call after while and >> last_cp_rptr == >> - * the current CP rptr, even if it's unlikely it might happen. To avoid this >> - * if the elapsed time since last call is bigger than 2 second than we >> return >> - * false and update the tracking information. Due to this the caller must >> call >> - * radeon_ring_test_lockup several time in less than 2sec for lockup to be >> reported >> - * the fencing code should be cautious about that. >> + * A possible false positive is if we get called after a while and >> + * last_cp_rptr == the current CP rptr, even if it's unlikely it might >> + * happen. To avoid this if the elapsed time since the last call is bigger >> + * than 2 second then we return false and update the tracking >> + * information. Due to this the caller must call radeon_ring_test_lockup >> + * more frequently than once every 2s when waiting. > Is it guaranteed that radeon_ring_test_lockup will be called more often > than every 2s when waiting? If not, this change might prevent a real > lockup from being detected? > > Either way, I wonder if there might not be a simpler solution to the > problem, e.g. by updating last_activity when submitting commands to a > previously empty ring.
If I remember correctly that's exactly what I used to do when creating radeon_ring_test_lockup, and now I'm really wondering why that stuff isn't there any more. Anyway the problem you describe should only happen very very rarely in case of a wraparound, so I'm pretty sure that we have a different problem that's just masked by that patch. Christian.