[PATCH] radeon: Fix a false positive lockup after 10s of inactivity

Christian König Wed, 12 Jun 2013 13:20:34 +0200

Am 12.06.2013 12:26, schrieb Michel D?nzer:
> On Die, 2013-06-11 at 16:23 -0700, Andy Lutomirski wrote:
>> If the device is idle for over ten seconds, then the next attempt to do
>> anything can race with the lockup detector and cause a bogus lockup
>> to be detected.
>>
>> Oddly, the situation is well-described in the lockup detector's comments
>> and a fix is even described.  This patch implements that fix (and corrects
>> some typos in the description).
>>
>> My system has been stable for about a week running this code.  Without this,
>> my screen would go blank every now and then and, when it came back, 
>> everything
>> would be remarkably slow (the latter is a separate bug).
>>
>> Signed-off-by: Andy Lutomirski <luto at amacapital.net>
> [...]
>
>> diff --git a/drivers/gpu/drm/radeon/radeon_ring.c 
>> b/drivers/gpu/drm/radeon/radeon_ring.c
>> index 1ef5eaa..fb7b3ea 100644
>> --- a/drivers/gpu/drm/radeon/radeon_ring.c
>> +++ b/drivers/gpu/drm/radeon/radeon_ring.c
>> @@ -547,12 +547,12 @@ void radeon_ring_lockup_update(struct radeon_ring 
>> *ring)
>>    * have CP rptr to a different value of jiffies wrap around which will 
>> force
>>    * initialization of the lockup tracking informations.
>>    *
>> - * A possible false positivie is if we get call after while and 
>> last_cp_rptr ==
>> - * the current CP rptr, even if it's unlikely it might happen. To avoid this
>> - * if the elapsed time since last call is bigger than 2 second than we 
>> return
>> - * false and update the tracking information. Due to this the caller must 
>> call
>> - * radeon_ring_test_lockup several time in less than 2sec for lockup to be 
>> reported
>> - * the fencing code should be cautious about that.
>> + * A possible false positive is if we get called after a while and
>> + * last_cp_rptr == the current CP rptr, even if it's unlikely it might
>> + * happen. To avoid this if the elapsed time since the last call is bigger
>> + * than 2 second then we return false and update the tracking
>> + * information. Due to this the caller must call radeon_ring_test_lockup
>> + * more frequently than once every 2s when waiting.
> Is it guaranteed that radeon_ring_test_lockup will be called more often
> than every 2s when waiting? If not, this change might prevent a real
> lockup from being detected?
>
> Either way, I wonder if there might not be a simpler solution to the
> problem, e.g. by updating last_activity when submitting commands to a
> previously empty ring.


If I remember correctly that's exactly what I used to do when creating 
radeon_ring_test_lockup, and now I'm really wondering why that stuff 
isn't there any more.

Anyway the problem you describe should only happen very very rarely in 
case of a wraparound, so I'm pretty sure that we have a different 
problem that's just masked by that patch.

Christian.

[PATCH] radeon: Fix a false positive lockup after 10s of inactivity

Reply via email to