Re: [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE

2013-09-13 Thread Maarten Lankhorst
On 13-09-13 08:44, Thomas Hellstrom wrote:
> On 09/12/2013 11:50 PM, Maarten Lankhorst wrote:
>> On 12-09-13 18:44, Thomas Hellstrom wrote:
>>> On 09/12/2013 05:45 PM, Maarten Lankhorst wrote:
 On 12-09-13 17:36, Daniel Vetter wrote:
> On Thu, Sep 12, 2013 at 5:06 PM, Peter Zijlstra wrote:
>> So I'm poking around the preemption code and stumbled upon:
>>
>> drivers/gpu/drm/i915/i915_gem.c:set_need_resched();
>> drivers/gpu/drm/ttm/ttm_bo_vm.c:
>> set_need_resched();
>> drivers/gpu/drm/ttm/ttm_bo_vm.c:
>> set_need_resched();
>> drivers/gpu/drm/udl/udl_gem.c:  set_need_resched();
>>
>> All these sites basically do:
>>
>> while (!trylock())
>>   yield();
>>
>> which is a horrible and broken locking pattern.
>>
>> Firstly it's deadlock prone: suppose the faulting process is a FIFOn+1
>> task that preempted the lock holder at FIFOn.
>>
>> Secondly the implementation is worse than usual by abusing
>> VM_FAULT_NOPAGE, which is supposed to install a PTE so that the fault
>> doesn't retry, but you're using it as a get out of fault path. And
>> you're using set_need_resched() which is not something a driver should
>> _ever_ touch.
>>
>> Now I'm going to take away set_need_resched() -- and while you can
>> 'reimplement' it using set_thread_flag() you're not going to do that
>> because it will be broken due to changes to the preempt code.
>>
>> So please as to fix ASAP and don't allow anybody to trick you into
>> merging silly things like that again ;-)
> The set_need_resched in i915_gem.c:i915_gem_fault can actually be
> removed. It was there to give the error handler a chance to sneak in
> and reset the hw/sw tracking when the gpu is dead. That hack goes back
> to the days when the locking around our error handler was somewhere
> between nonexistent and totally broken, nowadays we keep things from
> live-locking by a bit of magic in i915_mutex_lock_interruptible. I'll
> whip up a patch to rip this out. I'll also check that our testsuite
> properly exercises this path (needs a bit of work on a quick look for
> better coverage).
>
> The one in ttm is just bonghits to shut up lockdep: ttm can recurse
> into its own pagefault handler and then deadlock, the trylock just
> keeps lockdep quiet. We've had that bug arise in drm/i915 due to some
> fun userspace did and now have testcases for them. The right solution
> to fix this is to use copy_to|from_user_atomic in ttm everywhere it
> holds locks and have slowpaths which drops locks, copies stuff into a
> temp allocation and then continues. At least that's how we've fixed
> all those inversions in i915-gem. I'm not volunteering to fix this ;-)
 Ah the case where a mmap'd address is passed to the execbuf ioctl? :P

 Fine I'll look into it a bit, hopefully before tuesday. Else it might take 
 a bit longer since I'll be on my way to plumbers..
>>> I think a possible fix would be if fault() were allowed to return an error 
>>> and drop the mmap_sem() before returning.
>>>
>>> Otherwise we need to track down all copy_to_user / copy_from_user which 
>>> happen with bo::reserve held.
>
> Actually, from looking at the mm code, it seems OK to do the following:
>
> if (!bo_tryreserve()) {
>     up_read mmap_sem();     // Release the mmap_sem to avoid deadlocks.
>     bo_reserve();           // Wait for the BO to become available (interruptible)
>     bo_unreserve();         // Where is bo_wait_unreserved() when we need it, Maarten :P
>     return VM_FAULT_RETRY;  // Go ahead and retry the VMA walk, after regrabbing
> }
Is this meant as a jab at me? You're doing locking wrong here! Again!

> Somebody conveniently added a VM_FAULT_RETRY, but for a different purpose.
>
> If possible, I suggest to take this route for now to avoid the mess of 
> changing locking order in all TTM drivers, with
> all give-up-locking slowpaths that comes with it. IIRC it took some time for 
> i915 to get that right, and completely get rid of all lockdep warnings.
Sorry, but it's still the right thing to do. I can convert nouveau and take a
look at radeon. Locking slowpaths are easy to test too with
CONFIG_DEBUG_WW_MUTEX_SLOWPATH.
Just because it's harder doesn't mean we have to avoid doing it.

The might_fault function will verify the usage of mmap_sem with lockdep 
automatically when PROVE_LOCKING=y.
This means that any copy_from_user / copy_to_user will always check mmap_sem.
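
The value of a might_fault()-style check is that it fires at every copy site, whether or not that particular copy actually faults. A minimal userspace sketch of the idea follows. It is not kernel code: the real check is about mmap_sem lock ordering under lockdep, whereas here a plain counter models "a copy was attempted while a reservation was held", and every name ending in _sim is a hypothetical stand-in.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
static int reservations_held;  /* reservations currently held by this thread */
static int violations;         /* copies attempted while a reservation was held */

static void bo_reserve_sim(void)
{
	reservations_held++;
}

static void bo_unreserve_sim(void)
{
	assert(reservations_held > 0);
	reservations_held--;
}

/*
 * Plays the role of a might_fault()-style annotation: it runs before
 * every copy that could fault, even if this copy would not fault.
 */
static void might_fault_sim(void)
{
	if (reservations_held > 0)
		violations++;  /* a lock validator would print a warning here */
}

/* Stand-in for copy_from_user(): annotate first, then copy. */
static void copy_from_user_sim(void *dst, const void *src, size_t n)
{
	might_fault_sim();
	memcpy(dst, src, n);
}
```

A copy made between bo_reserve_sim() and bo_unreserve_sim() bumps the violations counter even though the memcpy itself succeeds, which is the property that makes such an annotation useful for finding inversions without a crafted userspace test.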

> This will keep the official locking order
> bo::reserve
> mmap_sem
Disagree, fix the order and the trylock and 'wait for unreserved' half assed 
locking will disappear.

~Maarten
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.or

Re: [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE

2013-09-13 Thread Thomas Hellstrom

On 09/13/2013 09:16 AM, Maarten Lankhorst wrote:

On 13-09-13 08:44, Thomas Hellstrom wrote:

On 09/12/2013 11:50 PM, Maarten Lankhorst wrote:

On 12-09-13 18:44, Thomas Hellstrom wrote:

On 09/12/2013 05:45 PM, Maarten Lankhorst wrote:

On 12-09-13 17:36, Daniel Vetter wrote:

On Thu, Sep 12, 2013 at 5:06 PM, Peter Zijlstra wrote:

So I'm poking around the preemption code and stumbled upon:

drivers/gpu/drm/i915/i915_gem.c:set_need_resched();
drivers/gpu/drm/ttm/ttm_bo_vm.c:set_need_resched();
drivers/gpu/drm/ttm/ttm_bo_vm.c:set_need_resched();
drivers/gpu/drm/udl/udl_gem.c:  set_need_resched();

All these sites basically do:

 while (!trylock())
   yield();

which is a horrible and broken locking pattern.

Firstly it's deadlock prone: suppose the faulting process is a FIFOn+1
task that preempted the lock holder at FIFOn.

Secondly the implementation is worse than usual by abusing
VM_FAULT_NOPAGE, which is supposed to install a PTE so that the fault
doesn't retry, but you're using it as a get out of fault path. And
you're using set_need_resched() which is not something a driver should
_ever_ touch.

Now I'm going to take away set_need_resched() -- and while you can
'reimplement' it using set_thread_flag() you're not going to do that
because it will be broken due to changes to the preempt code.

So please as to fix ASAP and don't allow anybody to trick you into
merging silly things like that again ;-)

The set_need_resched in i915_gem.c:i915_gem_fault can actually be
removed. It was there to give the error handler a chance to sneak in
and reset the hw/sw tracking when the gpu is dead. That hack goes back
to the days when the locking around our error handler was somewhere
between nonexistent and totally broken, nowadays we keep things from
live-locking by a bit of magic in i915_mutex_lock_interruptible. I'll
whip up a patch to rip this out. I'll also check that our testsuite
properly exercises this path (needs a bit of work on a quick look for
better coverage).

The one in ttm is just bonghits to shut up lockdep: ttm can recurse
into its own pagefault handler and then deadlock, the trylock just
keeps lockdep quiet. We've had that bug arise in drm/i915 due to some
fun userspace did and now have testcases for them. The right solution
to fix this is to use copy_to|from_user_atomic in ttm everywhere it
holds locks and have slowpaths which drops locks, copies stuff into a
temp allocation and then continues. At least that's how we've fixed
all those inversions in i915-gem. I'm not volunteering to fix this ;-)

Ah the case where a mmap'd address is passed to the execbuf ioctl? :P

Fine I'll look into it a bit, hopefully before tuesday. Else it might take a 
bit longer since I'll be on my way to plumbers..

I think a possible fix would be if fault() were allowed to return an error and 
drop the mmap_sem() before returning.

Otherwise we need to track down all copy_to_user / copy_from_user which happen 
with bo::reserve held.

Actually, from looking at the mm code, it seems OK to do the following:

if (!bo_tryreserve()) {
    up_read mmap_sem();     // Release the mmap_sem to avoid deadlocks.
    bo_reserve();           // Wait for the BO to become available (interruptible)
    bo_unreserve();         // Where is bo_wait_unreserved() when we need it, Maarten :P
    return VM_FAULT_RETRY;  // Go ahead and retry the VMA walk, after regrabbing
}

Is this meant as a jab at me? You're doing locking wrong here! Again!


It's not meant as a jab at you.  I'm sorry if it came out that way. It 
was meant as a joke. I wasn't aware the topic was sensitive.


Anyway, could you describe what is wrong with the above solution,
because it seems perfectly legal to me.
There is no substantial overhead, and there is no risk of deadlocks. Or
do you mean it's bad because it confuses lockdep?


/Thomas


Re: [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE

2013-09-13 Thread Maarten Lankhorst
On 13-09-13 09:46, Thomas Hellstrom wrote:
> On 09/13/2013 09:16 AM, Maarten Lankhorst wrote:
>> On 13-09-13 08:44, Thomas Hellstrom wrote:
>>> On 09/12/2013 11:50 PM, Maarten Lankhorst wrote:
 On 12-09-13 18:44, Thomas Hellstrom wrote:
> On 09/12/2013 05:45 PM, Maarten Lankhorst wrote:
>> On 12-09-13 17:36, Daniel Vetter wrote:
>>> On Thu, Sep 12, 2013 at 5:06 PM, Peter Zijlstra wrote:
 So I'm poking around the preemption code and stumbled upon:

 drivers/gpu/drm/i915/i915_gem.c:set_need_resched();
 drivers/gpu/drm/ttm/ttm_bo_vm.c:
 set_need_resched();
 drivers/gpu/drm/ttm/ttm_bo_vm.c:
 set_need_resched();
 drivers/gpu/drm/udl/udl_gem.c:  set_need_resched();

 All these sites basically do:

  while (!trylock())
yield();

 which is a horrible and broken locking pattern.

 Firstly it's deadlock prone: suppose the faulting process is a FIFOn+1
 task that preempted the lock holder at FIFOn.

 Secondly the implementation is worse than usual by abusing
 VM_FAULT_NOPAGE, which is supposed to install a PTE so that the fault
 doesn't retry, but you're using it as a get out of fault path. And
 you're using set_need_resched() which is not something a driver should
 _ever_ touch.

 Now I'm going to take away set_need_resched() -- and while you can
 'reimplement' it using set_thread_flag() you're not going to do that
 because it will be broken due to changes to the preempt code.

 So please as to fix ASAP and don't allow anybody to trick you into
 merging silly things like that again ;-)
>>> The set_need_resched in i915_gem.c:i915_gem_fault can actually be
>>> removed. It was there to give the error handler a chance to sneak in
>>> and reset the hw/sw tracking when the gpu is dead. That hack goes back
>>> to the days when the locking around our error handler was somewhere
>>> between nonexistent and totally broken, nowadays we keep things from
>>> live-locking by a bit of magic in i915_mutex_lock_interruptible. I'll
>>> whip up a patch to rip this out. I'll also check that our testsuite
>>> properly exercises this path (needs a bit of work on a quick look for
>>> better coverage).
>>>
>>> The one in ttm is just bonghits to shut up lockdep: ttm can recurse
>>> into its own pagefault handler and then deadlock, the trylock just
>>> keeps lockdep quiet. We've had that bug arise in drm/i915 due to some
>>> fun userspace did and now have testcases for them. The right solution
>>> to fix this is to use copy_to|from_user_atomic in ttm everywhere it
>>> holds locks and have slowpaths which drops locks, copies stuff into a
>>> temp allocation and then continues. At least that's how we've fixed
>>> all those inversions in i915-gem. I'm not volunteering to fix this ;-)
>> Ah the case where a mmap'd address is passed to the execbuf ioctl? :P
>>
>> Fine I'll look into it a bit, hopefully before tuesday. Else it might 
>> take a bit longer since I'll be on my way to plumbers..
> I think a possible fix would be if fault() were allowed to return an 
> error and drop the mmap_sem() before returning.
>
> Otherwise we need to track down all copy_to_user / copy_from_user which 
> happen with bo::reserve held.
>>> Actually, from looking at the mm code, it seems OK to do the following:
>>>
>>> if (!bo_tryreserve()) {
>>>     up_read mmap_sem();     // Release the mmap_sem to avoid deadlocks.
>>>     bo_reserve();           // Wait for the BO to become available (interruptible)
>>>     bo_unreserve();         // Where is bo_wait_unreserved() when we need it, Maarten :P
>>>     return VM_FAULT_RETRY;  // Go ahead and retry the VMA walk, after regrabbing
>>> }
>> Is this meant as a jab at me? You're doing locking wrong here! Again!
>
> It's not meant as a jab at you.  I'm sorry if it came out that way. It was 
> meant as a joke. I wasn't aware the topic was sensitive.
>
> Anyway, could you describe what is wrong, with the above solution, because it 
> seems perfectly legal to me.
> There is no substantial overhead, and there is no risk of deadlocks. Or do
> you mean it's bad because it confuses lockdep?
Evil userspace can pass a bo as the pointer to use for relocation lists; lockdep
will warn when that locks up, but still..
This is already a problem now, and your fix will only cause lockdep to
explicitly warn about it.

You can make a complicated user program to test this, or simply use this 
function for debugging:
void ttm_might_fault(void) { struct reservation_object obj; 
reservation_object_init(&obj); ww_mutex_lock(&obj.lock, NULL); 
ww_mutex_unlock(&

Re: [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE

2013-09-13 Thread Thomas Hellstrom

On 09/13/2013 09:51 AM, Maarten Lankhorst wrote:

On 13-09-13 09:46, Thomas Hellstrom wrote:

On 09/13/2013 09:16 AM, Maarten Lankhorst wrote:

On 13-09-13 08:44, Thomas Hellstrom wrote:

On 09/12/2013 11:50 PM, Maarten Lankhorst wrote:

On 12-09-13 18:44, Thomas Hellstrom wrote:

On 09/12/2013 05:45 PM, Maarten Lankhorst wrote:

On 12-09-13 17:36, Daniel Vetter wrote:

On Thu, Sep 12, 2013 at 5:06 PM, Peter Zijlstra wrote:

So I'm poking around the preemption code and stumbled upon:

drivers/gpu/drm/i915/i915_gem.c:set_need_resched();
drivers/gpu/drm/ttm/ttm_bo_vm.c:set_need_resched();
drivers/gpu/drm/ttm/ttm_bo_vm.c:set_need_resched();
drivers/gpu/drm/udl/udl_gem.c:  set_need_resched();

All these sites basically do:

  while (!trylock())
yield();

which is a horrible and broken locking pattern.

Firstly it's deadlock prone: suppose the faulting process is a FIFOn+1
task that preempted the lock holder at FIFOn.

Secondly the implementation is worse than usual by abusing
VM_FAULT_NOPAGE, which is supposed to install a PTE so that the fault
doesn't retry, but you're using it as a get out of fault path. And
you're using set_need_resched() which is not something a driver should
_ever_ touch.

Now I'm going to take away set_need_resched() -- and while you can
'reimplement' it using set_thread_flag() you're not going to do that
because it will be broken due to changes to the preempt code.

So please as to fix ASAP and don't allow anybody to trick you into
merging silly things like that again ;-)

The set_need_resched in i915_gem.c:i915_gem_fault can actually be
removed. It was there to give the error handler a chance to sneak in
and reset the hw/sw tracking when the gpu is dead. That hack goes back
to the days when the locking around our error handler was somewhere
between nonexistent and totally broken, nowadays we keep things from
live-locking by a bit of magic in i915_mutex_lock_interruptible. I'll
whip up a patch to rip this out. I'll also check that our testsuite
properly exercises this path (needs a bit of work on a quick look for
better coverage).

The one in ttm is just bonghits to shut up lockdep: ttm can recurse
into its own pagefault handler and then deadlock, the trylock just
keeps lockdep quiet. We've had that bug arise in drm/i915 due to some
fun userspace did and now have testcases for them. The right solution
to fix this is to use copy_to|from_user_atomic in ttm everywhere it
holds locks and have slowpaths which drops locks, copies stuff into a
temp allocation and then continues. At least that's how we've fixed
all those inversions in i915-gem. I'm not volunteering to fix this ;-)

Ah the case where a mmap'd address is passed to the execbuf ioctl? :P

Fine I'll look into it a bit, hopefully before tuesday. Else it might take a 
bit longer since I'll be on my way to plumbers..

I think a possible fix would be if fault() were allowed to return an error and 
drop the mmap_sem() before returning.

Otherwise we need to track down all copy_to_user / copy_from_user which happen 
with bo::reserve held.

Actually, from looking at the mm code, it seems OK to do the following:

if (!bo_tryreserve()) {
    up_read mmap_sem();     // Release the mmap_sem to avoid deadlocks.
    bo_reserve();           // Wait for the BO to become available (interruptible)
    bo_unreserve();         // Where is bo_wait_unreserved() when we need it, Maarten :P
    return VM_FAULT_RETRY;  // Go ahead and retry the VMA walk, after regrabbing
}

Is this meant as a jab at me? You're doing locking wrong here! Again!

It's not meant as a jab at you.  I'm sorry if it came out that way. It was 
meant as a joke. I wasn't aware the topic was sensitive.

Anyway, could you describe what is wrong, with the above solution, because it 
seems perfectly legal to me.
There is no substantial overhead, and there is no risk of deadlocks. Or do you
mean it's bad because it confuses lockdep?

Evil userspace can pass a bo as pointer to use for relocation lists, lockdep 
will warn when that locks up, but still..
This is already a problem now, and your fixing will only cause lockdep to 
explicitly warn on it.


As previously mentioned, copy_from_user should return -EFAULT, since the 
VMAs are marked with VM_IO. It should not recurse into fault(), so evil 
user-space loses.




You can make a complicated user program to test this, or simply use this 
function for debugging:
void ttm_might_fault(void)
{
	struct reservation_object obj;

	reservation_object_init(&obj);
	ww_mutex_lock(&obj.lock, NULL);
	ww_mutex_unlock(&obj.lock);
	reservation_object_fini(&obj);
}

Put it near every instance of copy_to_user/copy_from_user and you'll find the 
bugs. :)


I'm still not convinced that there are any problems with this solution. 
Did you take what's said above into account?



Now, could we try to approach this based on pros and cons? Let's say we 
would

Re: [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE

2013-09-13 Thread Daniel Vetter
On Fri, Sep 13, 2013 at 7:33 AM, Thomas Hellstrom  wrote:
> Given that all copy_to_user / copy_from_user paths are actually hit during
> testing, right?

Ime it requires a bit of ingenuity to properly test this from
userspace. We're using a few tricks in drm/i915 kernel testing:
- When we hand a gtt mmap pointer to execbuf or other ioctls we upload
the data in there through pwrite (or if you don't have that use the
gpu to blt it there). This way you can carefully control when the
pagefault will happen. Also since we supply correct data we can make
sure that the kernel actually does the right thing and not just
whether it'll blow up.
- We have a module parameter which can be changed at runtime to
disable all the prefaulting we're doing.
- We have a debugfs interface to drop caches/evict lrus. If you have a
parallel thread that regularly forces the inactive list to be evicted
we can force a refault even after the first fault already happened.
That's useful to test the slowpath after a slowpath already happened,
e.g. when trying to copy reloc offset out to userspace after execbuf
completed.

With these tricks we have imo great test coverage for i915.ko and more
important good assurance that any regressions in this tricky code will
get caught.

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


Re: [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE

2013-09-13 Thread Peter Zijlstra
On Fri, Sep 13, 2013 at 09:46:03AM +0200, Thomas Hellstrom wrote:
> >>if (!bo_tryreserve()) {
> >>    up_read mmap_sem();     // Release the mmap_sem to avoid deadlocks.
> >>    bo_reserve();           // Wait for the BO to become available (interruptible)
> >>    bo_unreserve();         // Where is bo_wait_unreserved() when we need it, Maarten :P
> >>    return VM_FAULT_RETRY;  // Go ahead and retry the VMA walk, after regrabbing
> >>}
> 
> Anyway, could you describe what is wrong, with the above solution, because
> it seems perfectly legal to me.

Luckily the rule of law doesn't have anything to do with this stuff --
at least I sincerely hope so.

The thing that's wrong with that pattern is that it's still not
deterministic - although it's a lot better than the pure trylock. Because
you have to release and re-acquire with the trylock, another user might
have gotten in again. It's utterly prone to starvation.
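
The starvation window can be made concrete with a small single-threaded model. This is only a sketch under stated assumptions: struct bo_sim, the _sim helpers and the steals_left counter are hypothetical, and the "rival" that re-takes the buffer right after it is released stands in for any other task that wins the race between the release and the retried trylock.

```c
#include <assert.h>
#include <stdbool.h>

enum fault_result { FAULT_DONE, FAULT_RETRY };

/* Hypothetical single-threaded model of a buffer reservation. */
struct bo_sim {
	bool reserved;     /* true while somebody holds the reservation */
	int  steals_left;  /* how often a rival wins the race after a release */
};

static bool bo_tryreserve_sim(struct bo_sim *bo)
{
	if (bo->reserved)
		return false;
	bo->reserved = true;
	return true;
}

/* Stand-in for a blocking reserve: returns once the old holder is gone. */
static void bo_reserve_blocking_sim(struct bo_sim *bo)
{
	bo->reserved = true;
}

static void bo_unreserve_sim(struct bo_sim *bo)
{
	assert(bo->reserved);
	bo->reserved = false;
	/* Race window: a rival may grab the buffer before we retry. */
	if (bo->steals_left > 0) {
		bo->steals_left--;
		bo->reserved = true;
	}
}

/* One pass through the proposed fault handler. */
static enum fault_result fault_once(struct bo_sim *bo)
{
	if (!bo_tryreserve_sim(bo)) {
		/* (mmap_sem would be dropped here) */
		bo_reserve_blocking_sim(bo);  /* sleep until the buffer is free */
		bo_unreserve_sim(bo);         /* ...and let go of it again */
		return FAULT_RETRY;           /* caller re-walks and calls us again */
	}
	/* Reserved: a real handler would install the PTE here. */
	bo->reserved = false;
	return FAULT_DONE;
}

/* Number of passes needed before the fault finally completes. */
static int fault_until_done(struct bo_sim *bo)
{
	int attempts = 1;

	while (fault_once(bo) == FAULT_RETRY)
		attempts++;
	return attempts;
}
```

With the buffer initially reserved and steals_left set to 3, fault_until_done() needs 5 passes; with no rival it needs 2. The pass count grows with the number of rivals that get in, so nothing bounds it, which is the non-determinism described above.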

The acquire+release does remove the dead/live-lock scenario from the
FIFO case, since blocking on the acquire will allow the other task to
run (or even get boosted on -rt).
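
Why blocking helps in the FIFO case can be shown with a toy strict-priority scheduler on one CPU. The sketch below is an illustration, not scheduler code: the tick loop, enum strategy and high_prio_gets_lock() are made up for this example. It relies on one known property of SCHED_FIFO: yielding only gives way to tasks of the same priority, so a yielding high-priority task is picked again immediately.

```c
#include <assert.h>
#include <stdbool.h>

enum strategy { TRYLOCK_YIELD, BLOCKING_LOCK };

/*
 * One CPU, strict priorities: the high-priority task runs whenever it is
 * runnable. The low-priority task holds the lock and needs holder_ticks
 * of CPU time before it releases it. Returns true if the high-priority
 * task obtains the lock within budget ticks.
 */
static bool high_prio_gets_lock(enum strategy s, int holder_ticks, int budget)
{
	bool lock_held = true;      /* held by the low-priority task */
	bool high_runnable = true;  /* a sleeping task is not runnable */

	assert(holder_ticks > 0);

	for (int tick = 0; tick < budget; tick++) {
		if (high_runnable) {
			if (!lock_held)
				return true;  /* lock (or trylock) succeeds */
			if (s == BLOCKING_LOCK)
				high_runnable = false;  /* sleep on the lock */
			/*
			 * TRYLOCK_YIELD: yielding does not let a lower
			 * priority task run, so we are picked again.
			 */
			continue;
		}
		/* The holder only runs while the high-priority task sleeps. */
		if (--holder_ticks == 0) {
			lock_held = false;
			high_runnable = true;  /* unlock wakes the waiter */
		}
	}
	return false;
}
```

In this model high_prio_gets_lock(BLOCKING_LOCK, 3, 10) returns true, while high_prio_gets_lock(TRYLOCK_YIELD, 3, 1000) returns false for any budget, because the holder never gets a tick.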

Aside from that there's nothing particularly wrong with it and lockdep
should be happy afaict (but I haven't had my morning juice yet).


Re: [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE

2013-09-13 Thread Daniel Vetter
On Fri, Sep 13, 2013 at 10:23 AM, Thomas Hellstrom
 wrote:
> As previously mentioned, copy_from_user should return -EFAULT, since the
> VMAs are marked with VM_IO. It should not recurse into fault(), so evil
> user-space loses.

I haven't put a printk in the code to prove this, but gem mmap also
sets VM_IO in drm_gem_mmap_obj. And we can very much hit our own fault
handler and deadlock 
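
The recursion can be sketched in userspace as follows. Everything here is a hypothetical model (struct bo_sim, the _sim functions, the task token); the reserve helper reports the self-deadlock with -EDEADLK where a plain blocking lock would simply hang.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of a buffer object; owner is NULL when unreserved. */
struct bo_sim {
	const void *owner;
};

static int bo_reserve_sim(struct bo_sim *bo, const void *task)
{
	if (bo->owner == task)
		return -EDEADLK;  /* a blocking lock would hang right here */
	if (bo->owner != NULL)
		return -EBUSY;    /* held by someone else; real code would sleep */
	bo->owner = task;
	return 0;
}

static void bo_unreserve_sim(struct bo_sim *bo)
{
	assert(bo->owner != NULL);
	bo->owner = NULL;
}

/* Fault handler for a CPU mapping of bo: it needs the reservation. */
static int fault_sim(struct bo_sim *bo, const void *task)
{
	int ret = bo_reserve_sim(bo, task);

	if (ret)
		return ret;
	bo_unreserve_sim(bo);
	return 0;
}

/*
 * An ioctl that reads its arguments through a user pointer while holding
 * the reservation on target. arg_backing is the buffer that backs the
 * user pointer, or NULL if the pointer refers to ordinary memory.
 */
static int execbuf_sim(struct bo_sim *target, struct bo_sim *arg_backing,
		       const void *task)
{
	int ret = bo_reserve_sim(target, task);

	if (ret)
		return ret;
	if (arg_backing)
		ret = fault_sim(arg_backing, task);  /* the user copy faults */
	bo_unreserve_sim(target);
	return ret;
}
```

Passing the same buffer as both target and arg_backing returns -EDEADLK in this model; that is the case where userspace hands the ioctl a pointer into a mapping of the very buffer being operated on.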

On a _very_ quick reading (and definitely not enough coffee yet for
reading mm/* stuff) it looks like it's get_user_pages that will return
an -EFAULT when hitting upon a VM_IO mapping (which makes sense since
there's really no page backing it). Actually using get_user_pages was
the original slowpath we've had in a few places until we've noticed
that for pwrite that breaks legit userspace (the glBufferData(glMap)
use-case), so we've switched to lock dropping and proper slowpaths
using copy_*_user everywhere instead of trying to pin the userspace
storage with get_user_pages.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


Re: [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE

2013-09-13 Thread Thomas Hellstrom

On 09/13/2013 10:32 AM, Daniel Vetter wrote:

On Fri, Sep 13, 2013 at 10:23 AM, Thomas Hellstrom
 wrote:

As previously mentioned, copy_from_user should return -EFAULT, since the
VMAs are marked with VM_IO. It should not recurse into fault(), so evil
user-space loses.

I haven't put a printk in the code to prove this, but gem mmap also
sets VM_IO in drm_gem_mmap_obj. And we can very much hit our own fault
handler and deadlock 


If this is indeed true, I guess I need to accept the fact that my 
solution is bad.

(and worse I can't blame not having my morning coffee).

I'll take a deeper look.
/Thomas


Re: [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE

2013-09-13 Thread Daniel Vetter
On Fri, Sep 13, 2013 at 10:29 AM, Peter Zijlstra  wrote:
> On Fri, Sep 13, 2013 at 09:46:03AM +0200, Thomas Hellstrom wrote:
>> >>if (!bo_tryreserve()) {
>> >>    up_read mmap_sem();     // Release the mmap_sem to avoid deadlocks.
>> >>    bo_reserve();           // Wait for the BO to become available (interruptible)
>> >>    bo_unreserve();         // Where is bo_wait_unreserved() when we need it, Maarten :P
>> >>    return VM_FAULT_RETRY;  // Go ahead and retry the VMA walk, after regrabbing
>> >>}
>>
>> Anyway, could you describe what is wrong, with the above solution, because
>> it seems perfectly legal to me.
>
> Luckily the rule of law doesn't have anything to do with this stuff --
> at least I sincerely hope so.
>
> The thing that's wrong with that pattern is that its still not
> deterministic - although its a lot better than the pure trylock. Because
> you have to release and re-acquire with the trylock another user might
> have gotten in again. Its utterly prone to starvation.
>
> The acquire+release does remove the dead/life-lock scenario from the
> FIFO case, since blocking on the acquire will allow the other task to
> run (or even get boosted on -rt).
>
> Aside from that there's nothing particularly wrong with it and lockdep
> should be happy afaict (but I haven't had my morning juice yet).

bo_reserve internally maps to a ww-mutex, and a task can already hold a
ww-mutex (potentially even the same one, for especially nasty userspace).
So lockdep will complain, and I think the only way to properly solve
this is to have lock-dropping slowpaths around all copy_*_user
callsites that already hold a bo_reserve ww_mutex. At least that's
been my conclusion after much head-banging against this issue for
drm/i915, and we've tried a lot of approaches ;-)
-Daniel
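
A lock-dropping slowpath of the kind described above might look roughly like this in a userspace model. It is a sketch, not the i915 implementation: struct user_buf_sim, struct bo_sim and the _sim functions are hypothetical, and the resident flag stands in for "this user page can be read without taking a fault". The copy helpers follow the kernel convention of returning the number of bytes not copied.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for a user buffer and a buffer object. */
struct user_buf_sim {
	const char *data;
	bool resident;  /* false: touching it would take a page fault */
};

struct bo_sim {
	bool reserved;
	char payload[32];
};

/* Non-faulting copy: gives up instead of faulting. */
static size_t copy_atomic_sim(char *dst, const struct user_buf_sim *src,
			      size_t n)
{
	if (!src->resident)
		return n;  /* nothing copied */
	memcpy(dst, src->data, n);
	return 0;
}

/* Faulting copy: only legal while the reservation is NOT held. */
static size_t copy_may_fault_sim(char *dst, struct user_buf_sim *src,
				 size_t n, const struct bo_sim *bo)
{
	assert(!bo->reserved);  /* faulting under the lock is the inversion */
	src->resident = true;   /* the fault brings the page in */
	memcpy(dst, src->data, n);
	return 0;
}

static int bo_upload_sim(struct bo_sim *bo, struct user_buf_sim *src,
			 size_t n, bool *used_slowpath)
{
	char tmp[sizeof(bo->payload)];

	if (n > sizeof(tmp))
		return -1;
	*used_slowpath = false;

	bo->reserved = true;  /* fast path: copy with the lock held */
	if (copy_atomic_sim(bo->payload, src, n) != 0) {
		bo->reserved = false;  /* slow path: drop the lock first */
		*used_slowpath = true;
		if (copy_may_fault_sim(tmp, src, n, bo) != 0)
			return -1;
		bo->reserved = true;   /* retake it, then use the temp copy */
		memcpy(bo->payload, tmp, n);
	}
	bo->reserved = false;
	return 0;
}
```

The first upload from a non-resident buffer takes the slow path through the temporary allocation; a second upload from the same buffer stays on the fast path. Real code would also have to revalidate any state that could have changed while the lock was dropped, which this model leaves out.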

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


Re: [Intel-gfx] [PATCH 1/2] drm/i915: kill set_need_resched

2013-09-13 Thread Daniel Vetter
On Fri, Sep 13, 2013 at 2:59 AM, Rob Clark  wrote:
> I guess in i915 (and ttm) case, the issue arises due to need for CPU
> access to buffer via GTT?  In which case I should be safe to drop the
> set_need_resched() as well? (Since CPU always has direct access to the
> pages.)  Or am I missing something about the original issue that
> necessitated set_need_resched()?

For drm/i915 the _only_ reason we've had it was to avoid live-locking
with our gpu reset work when the gpu hung. We've fixed that properly
now by using a wait-queue to stall when a gpu reset is pending and
proper locking in the gpu reset handler (plus tons of evil tests to
make sure it doesn't break, there's rather fragile lock-dropping and
tricky ordering involved). So if you don't have i915's broken gpu
reset handling from yonder you don't need our cargo-cult.

ttm's usage with a trylock+yield is a different form of duct-tape to
paper over locking inversions between copy_*_user callsites and the
pagefault handler.

In any case there's no way it actually works properly ;-)

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


Re: [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE

2013-09-13 Thread Maarten Lankhorst
On 13-09-13 10:23, Thomas Hellstrom wrote:
> On 09/13/2013 09:51 AM, Maarten Lankhorst wrote:
>> On 13-09-13 09:46, Thomas Hellstrom wrote:
>>> On 09/13/2013 09:16 AM, Maarten Lankhorst wrote:
 On 13-09-13 08:44, Thomas Hellstrom wrote:
> On 09/12/2013 11:50 PM, Maarten Lankhorst wrote:
>> On 12-09-13 18:44, Thomas Hellstrom wrote:
>>> On 09/12/2013 05:45 PM, Maarten Lankhorst wrote:
 On 12-09-13 17:36, Daniel Vetter wrote:
> On Thu, Sep 12, 2013 at 5:06 PM, Peter Zijlstra wrote:
>> So I'm poking around the preemption code and stumbled upon:
>>
>> drivers/gpu/drm/i915/i915_gem.c:set_need_resched();
>> drivers/gpu/drm/ttm/ttm_bo_vm.c:
>> set_need_resched();
>> drivers/gpu/drm/ttm/ttm_bo_vm.c:
>> set_need_resched();
>> drivers/gpu/drm/udl/udl_gem.c:  set_need_resched();
>>
>> All these sites basically do:
>>
>>   while (!trylock())
>> yield();
>>
>> which is a horrible and broken locking pattern.
>>
>> Firstly it's deadlock prone: suppose the faulting process is a FIFOn+1
>> task that preempted the lock holder at FIFOn.
>>
>> Secondly the implementation is worse than usual by abusing
>> VM_FAULT_NOPAGE, which is supposed to install a PTE so that the fault
>> doesn't retry, but you're using it as a get out of fault path. And
>> you're using set_need_resched() which is not something a driver 
>> should
>> _ever_ touch.
>>
>> Now I'm going to take away set_need_resched() -- and while you can
>> 'reimplement' it using set_thread_flag() you're not going to do that
>> because it will be broken due to changes to the preempt code.
>>
>> So please as to fix ASAP and don't allow anybody to trick you into
>> merging silly things like that again ;-)
> The set_need_resched in i915_gem.c:i915_gem_fault can actually be
> removed. It was there to give the error handler a chance to sneak in
> and reset the hw/sw tracking when the gpu is dead. That hack goes back
> to the days when the locking around our error handler was somewhere
> between nonexistent and totally broken, nowadays we keep things from
> live-locking by a bit of magic in i915_mutex_lock_interruptible. I'll
> whip up a patch to rip this out. I'll also check that our testsuite
> properly exercises this path (needs a bit of work on a quick look for
> better coverage).
>
> The one in ttm is just bonghits to shut up lockdep: ttm can recurse
> into its own pagefault handler and then deadlock, the trylock just
> keeps lockdep quiet. We've had that bug arise in drm/i915 due to some
> fun userspace did and now have testcases for them. The right solution
> to fix this is to use copy_to|from_user_atomic in ttm everywhere it
> holds locks and have slowpaths which drops locks, copies stuff into a
> temp allocation and then continues. At least that's how we've fixed
> all those inversions in i915-gem. I'm not volunteering to fix this ;-)
 Ah the case where a mmap'd address is passed to the execbuf ioctl? :P

 Fine I'll look into it a bit, hopefully before tuesday. Else it might 
 take a bit longer since I'll be on my way to plumbers..
>>> I think a possible fix would be if fault() were allowed to return an 
>>> error and drop the mmap_sem() before returning.
>>>
>>> Otherwise we need to track down all copy_to_user / copy_from_user which 
>>> happen with bo::reserve held.
> Actually, from looking at the mm code, it seems OK to do the following:
>
> if (!bo_tryreserve()) {
>     up_read mmap_sem();     // Release the mmap_sem to avoid deadlocks.
>     bo_reserve();           // Wait for the BO to become available (interruptible)
>     bo_unreserve();         // Where is bo_wait_unreserved() when we need it, Maarten :P
>     return VM_FAULT_RETRY;  // Go ahead and retry the VMA walk, after regrabbing
> }
 Is this meant as a jab at me? You're doing locking wrong here! Again!
>>> It's not meant as a jab at you.  I'm sorry if it came out that way. It was 
>>> meant as a joke. I wasn't aware the topic was sensitive.
>>>
>>> Anyway, could you describe what is wrong, with the above solution, because 
>>> it seems perfectly legal to me.
>>> There is no substantial overhead, and there is no risk of deadlocks. Or do
>>> you mean it's bad because it confuses lockdep?
>> Evil userspace can pass a bo as pointer to use for relocation lists, lockdep 
>> will warn when that locks up, but still..
>> This is already a problem now, and your fixin

Re: [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE

2013-09-13 Thread Peter Zijlstra
On Fri, Sep 13, 2013 at 10:41:54AM +0200, Daniel Vetter wrote:
> On Fri, Sep 13, 2013 at 10:29 AM, Peter Zijlstra  wrote:
> > On Fri, Sep 13, 2013 at 09:46:03AM +0200, Thomas Hellstrom wrote:
> >> >>if (!bo_tryreserve()) {
> >> >>    up_read mmap_sem();     // Release the mmap_sem to avoid deadlocks.
> >> >>    bo_reserve();           // Wait for the BO to become available (interruptible)
> >> >>    bo_unreserve();         // Where is bo_wait_unreserved() when we need it, Maarten :P
> >> >>    return VM_FAULT_RETRY;  // Go ahead and retry the VMA walk, after regrabbing
> >> >>}
> >>
> >> Anyway, could you describe what is wrong, with the above solution, because
> >> it seems perfectly legal to me.
> >
> > Luckily the rule of law doesn't have anything to do with this stuff --
> > at least I sincerely hope so.
> >
> > The thing that's wrong with that pattern is that it's still not
> > deterministic - although it's a lot better than the pure trylock. Because
> > you have to release and re-acquire with the trylock, another user might
> > have gotten in again. It's utterly prone to starvation.
> >
> > The acquire+release does remove the dead/live-lock scenario from the
> > FIFO case, since blocking on the acquire will allow the other task to
> > run (or even get boosted on -rt).
> >
> > Aside from that there's nothing particularly wrong with it and lockdep
> > should be happy afaict (but I haven't had my morning juice yet).
> 
> bo_reserve internally maps to a ww-mutex, and a task can already hold a
> ww-mutex (potentially even the same one, for especially nasty userspace).

OK, yes I wasn't aware of that. Yes in that case you're quite right.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


[Bug 60857] Unstable display with Radeon 760G (ASUS M4A78L-M LE)

2013-09-13 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=60857

--- Comment #8 from Stuart Foster  ---
Created attachment 108311
  --> https://bugzilla.kernel.org/attachment.cgi?id=108311&action=edit
Dmesg output

-- 
You are receiving this mail because:
You are watching the assignee of the bug.


[Bug 60857] Unstable display with Radeon 760G (ASUS M4A78L-M LE)

2013-09-13 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=60857

--- Comment #9 from Stuart Foster  ---
The dmesg is from a vanilla 3.11 kernel with dpm turned on and patch
kbug60857.diff applied.



[PATCH 3/4] drm: omap: Enable DT support for DMM

2013-09-13 Thread Archit Taneja
Enable use of DT for DMM/Tiler.

Originally worked on by Andy Gross.

Cc: Andy Gross 
Cc: DRI Development 
Signed-off-by: Archit Taneja 
---
 drivers/gpu/drm/omapdrm/omap_dmm_tiler.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/drivers/gpu/drm/omapdrm/omap_dmm_tiler.c b/drivers/gpu/drm/omapdrm/omap_dmm_tiler.c
index acf6678..59f17de 100644
--- a/drivers/gpu/drm/omapdrm/omap_dmm_tiler.c
+++ b/drivers/gpu/drm/omapdrm/omap_dmm_tiler.c
@@ -968,12 +968,23 @@ static const struct dev_pm_ops omap_dmm_pm_ops = {
 };
 #endif
 
+#if defined(CONFIG_OF)
+static const struct of_device_id dmm_of_match[] = {
+	{ .compatible = "ti,omap4-dmm", },
+	{ .compatible = "ti,omap5-dmm", },
+	{},
+};
+#else
+#define dmm_of_match NULL
+#endif
+
 struct platform_driver omap_dmm_driver = {
 	.probe = omap_dmm_probe,
 	.remove = omap_dmm_remove,
 	.driver = {
 		.owner = THIS_MODULE,
 		.name = DMM_DRIVER_NAME,
+		.of_match_table = dmm_of_match,
 #ifdef CONFIG_PM
 		.pm = &omap_dmm_pm_ops,
 #endif
-- 
1.8.1.2
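
As a rough illustration of what a sentinel-terminated match table like dmm_of_match is for, here is a minimal userspace sketch. The struct fake_of_device_id type and the fake_match() helper are hypothetical stand-ins; the kernel's real matching logic considers more than an exact compatible-string comparison.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, cut-down stand-in for the kernel's struct of_device_id. */
struct fake_of_device_id {
	const char *compatible;
};

static const struct fake_of_device_id dmm_match[] = {
	{ .compatible = "ti,omap4-dmm" },
	{ .compatible = "ti,omap5-dmm" },
	{ NULL },	/* sentinel: an empty entry terminates the table */
};

/* Walk the table up to the sentinel; return the matching entry or NULL. */
static const struct fake_of_device_id *
fake_match(const struct fake_of_device_id *table, const char *compatible)
{
	if (!table)	/* mirrors the "#define dmm_of_match NULL" fallback */
		return NULL;

	for (; table->compatible; table++)
		if (strcmp(table->compatible, compatible) == 0)
			return table;

	return NULL;
}
```

The NULL check corresponds to the non-CONFIG_OF build in the patch, where the table pointer is simply NULL and nothing can match.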



Re: [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE

2013-09-13 Thread Thomas Hellstrom

On 09/13/2013 10:58 AM, Maarten Lankhorst wrote:

Op 13-09-13 10:23, Thomas Hellstrom schreef:

On 09/13/2013 09:51 AM, Maarten Lankhorst wrote:

Op 13-09-13 09:46, Thomas Hellstrom schreef:

On 09/13/2013 09:16 AM, Maarten Lankhorst wrote:

Op 13-09-13 08:44, Thomas Hellstrom schreef:

On 09/12/2013 11:50 PM, Maarten Lankhorst wrote:

Op 12-09-13 18:44, Thomas Hellstrom schreef:

On 09/12/2013 05:45 PM, Maarten Lankhorst wrote:

Op 12-09-13 17:36, Daniel Vetter schreef:

On Thu, Sep 12, 2013 at 5:06 PM, Peter Zijlstra  wrote:

So I'm poking around the preemption code and stumbled upon:

drivers/gpu/drm/i915/i915_gem.c:set_need_resched();
drivers/gpu/drm/ttm/ttm_bo_vm.c:set_need_resched();
drivers/gpu/drm/ttm/ttm_bo_vm.c:set_need_resched();
drivers/gpu/drm/udl/udl_gem.c:  set_need_resched();

All these sites basically do:

   while (!trylock())
       yield();

which is a horrible and broken locking pattern.

Firstly it's deadlock prone: suppose the faulting process is a FIFOn+1
task that preempted the lock holder at FIFOn.

Secondly the implementation is worse than usual by abusing
VM_FAULT_NOPAGE, which is supposed to install a PTE so that the fault
doesn't retry, but you're using it as a get out of fault path. And
you're using set_need_resched() which is not something a driver should
_ever_ touch.

Now I'm going to take away set_need_resched() -- and while you can
'reimplement' it using set_thread_flag() you're not going to do that
because it will be broken due to changes to the preempt code.

So please as to fix ASAP and don't allow anybody to trick you into
merging silly things like that again ;-)

The set_need_resched in i915_gem.c:i915_gem_fault can actually be
removed. It was there to give the error handler a chance to sneak in
and reset the hw/sw tracking when the gpu is dead. That hack goes back
to the days when the locking around our error handler was somewhere
between nonexistent and totally broken, nowadays we keep things from
live-locking by a bit of magic in i915_mutex_lock_interruptible. I'll
whip up a patch to rip this out. I'll also check that our testsuite
properly exercises this path (needs a bit of work on a quick look for
better coverage).

The one in ttm is just bonghits to shut up lockdep: ttm can recurse
into its own pagefault handler and then deadlock, the trylock just
keeps lockdep quiet. We've had that bug arise in drm/i915 due to some
fun userspace did and now have testcases for them. The right solution
to fix this is to use copy_to|from_user_atomic in ttm everywhere it
holds locks and have slowpaths which drops locks, copies stuff into a
temp allocation and then continues. At least that's how we've fixed
all those inversions in i915-gem. I'm not volunteering to fix this ;-)
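
The fast-path/slow-path idea described in the paragraph above can be sketched in userspace as follows. This is a hedged illustration only: copy_nofault() and locked_copy() are hypothetical names, the src_resident flag simulates whether a non-sleeping copy would succeed, and a pthread mutex stands in for the driver lock.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical non-sleeping copy: it refuses to copy instead of faulting,
 * in the spirit of an atomic user copy. src_resident simulates page residency.
 */
static bool copy_nofault(void *dst, const void *src, size_t len, bool src_resident)
{
	if (!src_resident)
		return false;
	memcpy(dst, src, len);
	return true;
}

/* Returns 0 on success, -1 if the temporary allocation fails. */
static int locked_copy(pthread_mutex_t *lock, void *dst, const void *src,
		       size_t len, bool src_resident)
{
	void *tmp;

	pthread_mutex_lock(lock);
	if (copy_nofault(dst, src, len, src_resident)) {
		pthread_mutex_unlock(lock);	/* fast path: copied under the lock */
		return 0;
	}
	pthread_mutex_unlock(lock);		/* slow path: drop the lock first */

	tmp = malloc(len);
	if (!tmp)
		return -1;
	memcpy(tmp, src, len);			/* free to "fault": no lock is held */

	pthread_mutex_lock(lock);
	memcpy(dst, tmp, len);			/* tmp is always resident */
	pthread_mutex_unlock(lock);

	free(tmp);
	return 0;
}
```

The point of the temporary buffer is that the only copy performed while the lock is held can never recurse into a fault handler that wants the same lock.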

Ah the case where a mmap'd address is passed to the execbuf ioctl? :P

Fine, I'll look into it a bit, hopefully before Tuesday. Otherwise it might take a 
bit longer, since I'll be on my way to Plumbers.

I think a possible fix would be if fault() were allowed to return an error and 
drop the mmap_sem() before returning.

Otherwise we need to track down all copy_to_user / copy_from_user which happen 
with bo::reserve held.

Actually, from looking at the mm code, it seems OK to do the following:

if (!bo_tryreserve()) {
   up_read mmap_sem(); // Release the mmap_sem to avoid deadlocks.
   bo_reserve();   // Wait for the BO to become available (interruptible)
   bo_unreserve();   // Where is bo_wait_unreserved() when we need it, Maarten :P
   return VM_FAULT_RETRY; // Go ahead and retry the VMA walk, after regrabbing
}

Is this meant as a jab at me? You're doing locking wrong here! Again!

It's not meant as a jab at you.  I'm sorry if it came out that way. It was 
meant as a joke. I wasn't aware the topic was sensitive.

Anyway, could you describe what is wrong with the above solution? It 
seems perfectly legal to me.
There is no substantial overhead, and there is no risk of deadlocks. Or do you 
mean it's bad because it confuses lockdep?

Evil userspace can pass a bo as pointer to use for relocation lists, lockdep 
will warn when that locks up, but still..
This is already a problem now, and your fixing will only cause lockdep to 
explicitly warn on it.

As previously mentioned, copy_from_user should return -EFAULT, since the VMAs 
are marked with VM_IO. It should not recurse into fault(), so evil user-space 
loses.


You can make a complicated user program to test this, or simply use this 
function for debugging:
void ttm_might_fault(void)
{
	struct reservation_object obj;

	reservation_object_init(&obj);
	ww_mutex_lock(&obj.lock, NULL);
	ww_mutex_unlock(&obj.lock);
	reservation_object_fini(&obj);
}

Put it near every instance of copy_to_user/copy_from_user and you'll find the 
bugs. :)

I'm still not convinced that there are any problems with this solution. Did you 
take what's said abo

[PATCH] drm, ttm Fix uninitialized warning

2013-09-13 Thread Prarit Bhargava
Fix uninitialized warning.

drivers/gpu/drm/ttm/ttm_object.c: In function ‘ttm_base_object_lookup’:
drivers/gpu/drm/ttm/ttm_object.c:213:10: error: ‘base’ may be used 
uninitialized in this function [-Werror=maybe-uninitialized]
  kref_put(&base->refcount, ttm_release_base);
  ^
drivers/gpu/drm/ttm/ttm_object.c:221:26: note: ‘base’ was declared here
  struct ttm_base_object *base;

Signed-off-by: Prarit Bhargava 
Cc: David Airlie 
Cc: rcl...@redhat.com
---
 drivers/gpu/drm/ttm/ttm_object.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/ttm_object.c b/drivers/gpu/drm/ttm/ttm_object.c
index 58a5f32..a868176 100644
--- a/drivers/gpu/drm/ttm/ttm_object.c
+++ b/drivers/gpu/drm/ttm/ttm_object.c
@@ -218,7 +218,7 @@ struct ttm_base_object *ttm_base_object_lookup(struct ttm_object_file *tfile,
 					       uint32_t key)
 {
 	struct ttm_object_device *tdev = tfile->tdev;
-	struct ttm_base_object *base;
+	struct ttm_base_object *uninitialized_var(base);
 	struct drm_hash_item *hash;
 	int ret;
 
-- 
1.7.9.3



[Bug 60857] Unstable display with Radeon 760G (ASUS M4A78L-M LE)

2013-09-13 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=60857

--- Comment #10 from Alex Deucher  ---
Created attachment 108321
  --> https://bugzilla.kernel.org/attachment.cgi?id=108321&action=edit
skip_set_power_state

Here are two patches to test.  Test them independently and let me know if
either one helps.



[Bug 60857] Unstable display with Radeon 760G (ASUS M4A78L-M LE)

2013-09-13 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=60857

--- Comment #11 from Alex Deucher  ---
Created attachment 108331
  --> https://bugzilla.kernel.org/attachment.cgi?id=108331&action=edit
skip_clock_scaling



Re: [PATCH] drm, ttm Fix uninitialized warning

2013-09-13 Thread Rob Clark
On Fri, Sep 13, 2013 at 8:33 AM, Prarit Bhargava  wrote:
> Fix uninitialized warning.
>
> drivers/gpu/drm/ttm/ttm_object.c: In function ‘ttm_base_object_lookup’:
> drivers/gpu/drm/ttm/ttm_object.c:213:10: error: ‘base’ may be used 
> uninitialized in this function [-Werror=maybe-uninitialized]
>   kref_put(&base->refcount, ttm_release_base);
>   ^
> drivers/gpu/drm/ttm/ttm_object.c:221:26: note: ‘base’ was declared here
>   struct ttm_base_object *base;
>
> Signed-off-by: Prarit Bhargava 
> Cc: David Airlie 
> Cc: rcl...@redhat.com

Reviewed-by: Rob Clark 

> ---
>  drivers/gpu/drm/ttm/ttm_object.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_object.c 
> b/drivers/gpu/drm/ttm/ttm_object.c
> index 58a5f32..a868176 100644
> --- a/drivers/gpu/drm/ttm/ttm_object.c
> +++ b/drivers/gpu/drm/ttm/ttm_object.c
> @@ -218,7 +218,7 @@ struct ttm_base_object *ttm_base_object_lookup(struct 
> ttm_object_file *tfile,
>uint32_t key)
>  {
> struct ttm_object_device *tdev = tfile->tdev;
> -   struct ttm_base_object *base;
> +   struct ttm_base_object *uninitialized_var(base);
> struct drm_hash_item *hash;
> int ret;
>
> --
> 1.7.9.3
>


Re: [PATCH 3/4] drm: omap: Enable DT support for DMM

2013-09-13 Thread Rob Clark
On Fri, Sep 13, 2013 at 5:14 AM, Archit Taneja  wrote:
> Enable use of DT for DMM/Tiler.
>
> Originally worked on by Andy Gross.

looks good.. but do we want to get information about # of LUT's, etc,
from DT?  Or did we decide that we can reliably get this from the hw?
I lost track of that discussion (I guess Andy would remember)..

BR,
-R

> Cc: Andy Gross 
> Cc: DRI Development 
> Signed-off-by: Archit Taneja 
> ---
>  drivers/gpu/drm/omapdrm/omap_dmm_tiler.c | 11 +++
>  1 file changed, 11 insertions(+)
>
> diff --git a/drivers/gpu/drm/omapdrm/omap_dmm_tiler.c 
> b/drivers/gpu/drm/omapdrm/omap_dmm_tiler.c
> index acf6678..59f17de 100644
> --- a/drivers/gpu/drm/omapdrm/omap_dmm_tiler.c
> +++ b/drivers/gpu/drm/omapdrm/omap_dmm_tiler.c
> @@ -968,12 +968,23 @@ static const struct dev_pm_ops omap_dmm_pm_ops = {
>  };
>  #endif
>
> +#if defined(CONFIG_OF)
> +static const struct of_device_id dmm_of_match[] = {
> +   { .compatible = "ti,omap4-dmm", },
> +   { .compatible = "ti,omap5-dmm", },
> +   {},
> +};
> +#else
> +#define dmm_of_match NULL
> +#endif
> +
>  struct platform_driver omap_dmm_driver = {
> .probe = omap_dmm_probe,
> .remove = omap_dmm_remove,
> .driver = {
> .owner = THIS_MODULE,
> .name = DMM_DRIVER_NAME,
> +   .of_match_table = dmm_of_match,
>  #ifdef CONFIG_PM
> .pm = &omap_dmm_pm_ops,
>  #endif
> --
> 1.8.1.2
>


[Bug 60857] Unstable display with Radeon 760G (ASUS M4A78L-M LE)

2013-09-13 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=60857

--- Comment #12 from Stuart Foster  ---
(In reply to Alex Deucher from comment #11)
> Created attachment 108331 [details]
> skip_clock_scaling

Both patches appear to work ok.



[Bug 59649] [r600][RV635] GPU lockup CP stall / GPU resets over and over - Kernel 3.7 to 3.11 inclusive

2013-09-13 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=59649

Shawn Starr  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #9 from Shawn Starr  ---
Closing, I have not had any resets anymore with the respective code changes.
Much thanks to Alex for finding this issue!

-- 
You are receiving this mail because:
You are the assignee for the bug.


[Bug 69321] starting openCL crashes/boots system

2013-09-13 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=69321

Alex Deucher  changed:

   What|Removed |Added

   Assignee|mesa-dev@lists.freedesktop. |dri-devel@lists.freedesktop
   |org |.org
  Component|Other   |Drivers/Gallium/r600

--- Comment #2 from Alex Deucher  ---
Can you bisect?



[PATCH 1/4] drm/radeon/dpm/rs780: use drm_mode_vrefresh()

2013-09-13 Thread Alex Deucher
Rather than open coding it.

Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/radeon/rs780_dpm.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/radeon/rs780_dpm.c b/drivers/gpu/drm/radeon/rs780_dpm.c
index 828a776..afb7584 100644
--- a/drivers/gpu/drm/radeon/rs780_dpm.c
+++ b/drivers/gpu/drm/radeon/rs780_dpm.c
@@ -62,9 +62,7 @@ static void rs780_get_pm_mode_parameters(struct radeon_device *rdev)
 			radeon_crtc = to_radeon_crtc(crtc);
 			pi->crtc_id = radeon_crtc->crtc_id;
 			if (crtc->mode.htotal && crtc->mode.vtotal)
-				pi->refresh_rate =
-					(crtc->mode.clock * 1000) /
-					(crtc->mode.htotal * crtc->mode.vtotal);
+				pi->refresh_rate = drm_mode_vrefresh(&crtc->mode);
 			break;
 		}
 	}
-- 
1.8.3.1
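
To make the difference concrete, here is a small userspace sketch comparing the open-coded calculation removed by this patch with a rounding calculation of the kind the helper performs, as far as I recall. Treat it as an approximation: struct fake_mode and both function names are hypothetical, the clock is assumed to be in kHz, and the sketch ignores the helper's handling of a preset vrefresh, interlace, doublescan and vscan.

```c
/* Hypothetical subset of a display mode; clock is in kHz. */
struct fake_mode {
	int clock;
	int htotal;
	int vtotal;
};

/* The open-coded form from the removed lines: truncates toward zero. */
static int vrefresh_truncating(const struct fake_mode *m)
{
	return (m->clock * 1000) / (m->htotal * m->vtotal);
}

/* Lines per second first, then round to the nearest whole frame rate. */
static int vrefresh_rounding(const struct fake_mode *m)
{
	unsigned int lines_per_sec =
		(unsigned int)m->clock * 1000u / (unsigned int)m->htotal;

	return (int)((lines_per_sec + (unsigned int)m->vtotal / 2) /
		     (unsigned int)m->vtotal);
}
```

Both functions assume the caller has already checked that htotal and vtotal are non-zero, as the surrounding driver code does. For a 1920x1080 mode with a 148352 kHz clock (2200 x 1125 total), the truncating form yields 59 while the rounding form yields 60.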



[PATCH 2/4] drm/radeon/dpm/rs780: add some sanity checking to sclk scaling

2013-09-13 Thread Alex Deucher
Since the clock scaling is based on fb divider adjustments,
make sure the other pll parameters are the same.

Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/radeon/rs780_dpm.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/radeon/rs780_dpm.c b/drivers/gpu/drm/radeon/rs780_dpm.c
index afb7584..31487ce 100644
--- a/drivers/gpu/drm/radeon/rs780_dpm.c
+++ b/drivers/gpu/drm/radeon/rs780_dpm.c
@@ -449,6 +449,12 @@ static int rs780_set_engine_clock_scaling(struct radeon_device *rdev,
 	if (ret)
 		return ret;
 
+	if ((min_dividers.ref_div != max_dividers.ref_div) ||
+	    (min_dividers.post_div != max_dividers.post_div) ||
+	    (max_dividers.ref_div != current_max_dividers.ref_div) ||
+	    (max_dividers.post_div != current_max_dividers.post_div))
+		return -EINVAL;
+
 	rs780_force_fbdiv(rdev, max_dividers.fb_div);
 
 	if (max_dividers.fb_div > min_dividers.fb_div) {
-- 
1.8.3.1
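
The sanity check added here can be read as "only the feedback divider may differ". A minimal standalone sketch of that predicate, using a hypothetical struct fake_dividers and function name rather than the driver's own types:

```c
#include <errno.h>

/* Hypothetical subset of the PLL divider set used for illustration. */
struct fake_dividers {
	unsigned int ref_div;
	unsigned int post_div;
	unsigned int fb_div;
};

/*
 * Scaling by fb_div alone is only meaningful when the reference and post
 * dividers are identical across the min, max and current-max settings.
 * Returns 0 when that holds, -EINVAL otherwise.
 */
static int check_fb_only_scaling(const struct fake_dividers *min,
				 const struct fake_dividers *max,
				 const struct fake_dividers *cur_max)
{
	if (min->ref_div != max->ref_div ||
	    min->post_div != max->post_div ||
	    max->ref_div != cur_max->ref_div ||
	    max->post_div != cur_max->post_div)
		return -EINVAL;

	return 0;
}
```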



[PATCH 3/4] drm/radeon/dpm/rs780: don't enable sclk scaling if not required

2013-09-13 Thread Alex Deucher
If the low and high sclks are the same, there is no need to
enable sclk scaling.  This causes display stability issues on
certain boards.

Fixes:
https://bugzilla.kernel.org/show_bug.cgi?id=60857

Signed-off-by: Alex Deucher 
Cc: sta...@vger.kernel.org
---
 drivers/gpu/drm/radeon/rs780_dpm.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/radeon/rs780_dpm.c b/drivers/gpu/drm/radeon/rs780_dpm.c
index 31487ce..eb336bf 100644
--- a/drivers/gpu/drm/radeon/rs780_dpm.c
+++ b/drivers/gpu/drm/radeon/rs780_dpm.c
@@ -499,6 +499,9 @@ static void rs780_activate_engine_clk_scaling(struct radeon_device *rdev,
 	    (new_state->sclk_low == old_state->sclk_low))
 		return;
 
+	if (new_state->sclk_high == new_state->sclk_low)
+		return;
+
 	rs780_clk_scaling_enable(rdev, true);
 }
 
-- 
1.8.3.1



[PATCH 4/4] drm/radeon/dpm/rs780: fix force_performance state for same sclks

2013-09-13 Thread Alex Deucher
If the low and high sclks within a power state are the same,
there is no need to enable sclk scaling.  Enabling sclk scaling
can cause display stability issues on some boards.

Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/radeon/rs780_dpm.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/radeon/rs780_dpm.c b/drivers/gpu/drm/radeon/rs780_dpm.c
index eb336bf..6af8505 100644
--- a/drivers/gpu/drm/radeon/rs780_dpm.c
+++ b/drivers/gpu/drm/radeon/rs780_dpm.c
@@ -1043,8 +1043,10 @@ int rs780_dpm_force_performance_level(struct radeon_device *rdev,
 		if (pi->voltage_control)
 			rs780_force_voltage(rdev, pi->max_voltage);
 
-		WREG32_P(FVTHROT_FBDIV_REG1, 0, ~FORCE_FEEDBACK_DIV);
-		rs780_clk_scaling_enable(rdev, true);
+		if (ps->sclk_high != ps->sclk_low) {
+			WREG32_P(FVTHROT_FBDIV_REG1, 0, ~FORCE_FEEDBACK_DIV);
+			rs780_clk_scaling_enable(rdev, true);
+		}
 
 		if (pi->voltage_control) {
 			rs780_voltage_scaling_enable(rdev, true);
-- 
1.8.3.1



Re: [PATCH 1/4] drm/radeon/dpm/rs780: use drm_mode_vrefresh()

2013-09-13 Thread Christian König

Am 13.09.2013 17:08, schrieb Alex Deucher:

Rather than open coding it.

Signed-off-by: Alex Deucher 


For this series: Reviewed-by: Christian König 


---
  drivers/gpu/drm/radeon/rs780_dpm.c | 4 +---
  1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/radeon/rs780_dpm.c 
b/drivers/gpu/drm/radeon/rs780_dpm.c
index 828a776..afb7584 100644
--- a/drivers/gpu/drm/radeon/rs780_dpm.c
+++ b/drivers/gpu/drm/radeon/rs780_dpm.c
@@ -62,9 +62,7 @@ static void rs780_get_pm_mode_parameters(struct radeon_device 
*rdev)
radeon_crtc = to_radeon_crtc(crtc);
pi->crtc_id = radeon_crtc->crtc_id;
if (crtc->mode.htotal && crtc->mode.vtotal)
-   pi->refresh_rate =
-   (crtc->mode.clock * 1000) /
-   (crtc->mode.htotal * crtc->mode.vtotal);
+   pi->refresh_rate = 
drm_mode_vrefresh(&crtc->mode);
break;
}
}




[Bug 60857] Unstable display with Radeon 760G (ASUS M4A78L-M LE)

2013-09-13 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=60857

--- Comment #14 from Stuart Foster  ---
(In reply to Alex Deucher from comment #13)
> Great.  I'll push the skip_clock_scaling patch upstream.

Ok thank you.

Stuart



[Bug 60857] Unstable display with Radeon 760G (ASUS M4A78L-M LE)

2013-09-13 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=60857

--- Comment #13 from Alex Deucher  ---
Great.  I'll push the skip_clock_scaling patch upstream.



[PATCH] drm/radeon: Fix hmdi typo

2013-09-13 Thread Damien Lespiau
I keep making that one, so checked if I was the only one. Apparently
not.

Cc: Alex Deucher 
Signed-off-by: Damien Lespiau 
---
 drivers/gpu/drm/radeon/r600d.h | 2 +-
 drivers/gpu/drm/radeon/radeon_connectors.c | 2 +-
 drivers/gpu/drm/radeon/rv770d.h| 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/radeon/r600d.h b/drivers/gpu/drm/radeon/r600d.h
index 7c78083..1312136 100644
--- a/drivers/gpu/drm/radeon/r600d.h
+++ b/drivers/gpu/drm/radeon/r600d.h
@@ -1004,7 +1004,7 @@
 #   define HDMI0_AVI_INFO_CONT   (1 << 1)
 #   define HDMI0_AUDIO_INFO_SEND (1 << 4)
 #   define HDMI0_AUDIO_INFO_CONT (1 << 5)
-#   define HDMI0_AUDIO_INFO_SOURCE (1 << 6) /* 0 - sound block; 1 - hmdi regs */
+#   define HDMI0_AUDIO_INFO_SOURCE (1 << 6) /* 0 - sound block; 1 - hdmi regs */
 #   define HDMI0_AUDIO_INFO_UPDATE (1 << 7)
 #   define HDMI0_MPEG_INFO_SEND  (1 << 8)
 #   define HDMI0_MPEG_INFO_CONT  (1 << 9)
diff --git a/drivers/gpu/drm/radeon/radeon_connectors.c b/drivers/gpu/drm/radeon/radeon_connectors.c
index 2399f25..fc0a217 100644
--- a/drivers/gpu/drm/radeon/radeon_connectors.c
+++ b/drivers/gpu/drm/radeon/radeon_connectors.c
@@ -1420,7 +1420,7 @@ radeon_dp_detect(struct drm_connector *connector, bool force)
 				if (radeon_dp_getdpcd(radeon_connector))
 					ret = connector_status_connected;
 			} else {
-				/* try non-aux ddc (DP to DVI/HMDI/etc. adapter) */
+				/* try non-aux ddc (DP to DVI/HDMI/etc. adapter) */
 				if (radeon_ddc_probe(radeon_connector, false))
 					ret = connector_status_connected;
 			}
diff --git a/drivers/gpu/drm/radeon/rv770d.h b/drivers/gpu/drm/radeon/rv770d.h
index 6bef2b7..d291625 100644
--- a/drivers/gpu/drm/radeon/rv770d.h
+++ b/drivers/gpu/drm/radeon/rv770d.h
@@ -852,7 +852,7 @@
 #define AFMT_VBI_PACKET_CONTROL  0x7608
 #   define AFMT_GENERIC0_UPDATE  (1 << 2)
 #define AFMT_INFOFRAME_CONTROL0  0x760c
-#   define AFMT_AUDIO_INFO_SOURCE (1 << 6) /* 0 - sound block; 1 - hmdi regs */
+#   define AFMT_AUDIO_INFO_SOURCE (1 << 6) /* 0 - sound block; 1 - hdmi regs */
 #   define AFMT_AUDIO_INFO_UPDATE(1 << 7)
 #   define AFMT_MPEG_INFO_UPDATE (1 << 10)
 #define AFMT_GENERIC0_7  0x7610
-- 
1.8.3.1



[Bug 58033] [r300g][r600g] Black gap artifacts when playing WoW

2013-09-13 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=58033

--- Comment #19 from Tomasz P.  ---
(In reply to comment #15)
> I've just upgraded one of my 32 bit boxes to Fedora 18, which has allowed me
> to compile Mesa git here. And I can now report that the corruption happens
> with an RV730 chip too.

Does this corruption appear with all shader backends (sb, llvm, sb+llvm)?



Re: [PATCH] drm/radeon: Fix hmdi typo

2013-09-13 Thread Alex Deucher
On Fri, Sep 13, 2013 at 11:37 AM, Damien Lespiau
 wrote:
> I keep making that one, so checked if I was the only one. Apparently
> not.

Thanks!  applied.

Alex

>
> Cc: Alex Deucher 
> Signed-off-by: Damien Lespiau 
> ---
>  drivers/gpu/drm/radeon/r600d.h | 2 +-
>  drivers/gpu/drm/radeon/radeon_connectors.c | 2 +-
>  drivers/gpu/drm/radeon/rv770d.h| 2 +-
>  3 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/radeon/r600d.h b/drivers/gpu/drm/radeon/r600d.h
> index 7c78083..1312136 100644
> --- a/drivers/gpu/drm/radeon/r600d.h
> +++ b/drivers/gpu/drm/radeon/r600d.h
> @@ -1004,7 +1004,7 @@
>  #   define HDMI0_AVI_INFO_CONT   (1 << 1)
>  #   define HDMI0_AUDIO_INFO_SEND (1 << 4)
>  #   define HDMI0_AUDIO_INFO_CONT (1 << 5)
> -#   define HDMI0_AUDIO_INFO_SOURCE (1 << 6) /* 0 - sound block; 1 - hmdi 
> regs */
> +#   define HDMI0_AUDIO_INFO_SOURCE (1 << 6) /* 0 - sound block; 1 - hdmi 
> regs */
>  #   define HDMI0_AUDIO_INFO_UPDATE (1 << 7)
>  #   define HDMI0_MPEG_INFO_SEND  (1 << 8)
>  #   define HDMI0_MPEG_INFO_CONT  (1 << 9)
> diff --git a/drivers/gpu/drm/radeon/radeon_connectors.c 
> b/drivers/gpu/drm/radeon/radeon_connectors.c
> index 2399f25..fc0a217 100644
> --- a/drivers/gpu/drm/radeon/radeon_connectors.c
> +++ b/drivers/gpu/drm/radeon/radeon_connectors.c
> @@ -1420,7 +1420,7 @@ radeon_dp_detect(struct drm_connector *connector, bool 
> force)
> if (radeon_dp_getdpcd(radeon_connector))
> ret = connector_status_connected;
> } else {
> -   /* try non-aux ddc (DP to DVI/HMDI/etc. 
> adapter) */
> +   /* try non-aux ddc (DP to DVI/HDMI/etc. 
> adapter) */
> if (radeon_ddc_probe(radeon_connector, false))
> ret = connector_status_connected;
> }
> diff --git a/drivers/gpu/drm/radeon/rv770d.h b/drivers/gpu/drm/radeon/rv770d.h
> index 6bef2b7..d291625 100644
> --- a/drivers/gpu/drm/radeon/rv770d.h
> +++ b/drivers/gpu/drm/radeon/rv770d.h
> @@ -852,7 +852,7 @@
>  #define AFMT_VBI_PACKET_CONTROL  0x7608
>  #   define AFMT_GENERIC0_UPDATE  (1 << 2)
>  #define AFMT_INFOFRAME_CONTROL0  0x760c
> -#   define AFMT_AUDIO_INFO_SOURCE(1 << 6) /* 0 - sound block; 1 
> - hmdi regs */
> +#   define AFMT_AUDIO_INFO_SOURCE(1 << 6) /* 0 - sound block; 1 
> - hdmi regs */
>  #   define AFMT_AUDIO_INFO_UPDATE(1 << 7)
>  #   define AFMT_MPEG_INFO_UPDATE (1 << 10)
>  #define AFMT_GENERIC0_7  0x7610
> --
> 1.8.3.1
>


Re: [PATCH 6/9] drm: Add a DRM_CAP_STEREO_3D capability for SET_CAP ioctl

2013-09-13 Thread Joakim Plate
David Herrmann  gmail.com> writes:

> 
> So just to be clear: Whenever a mode is present with 3D flags, it is
> also a valid non-3D mode? Is this guaranteed? 
> 

Well, some HDTVs, when they receive a frame-packed mode (1080*2+45=2205 
pixels high), will display just the top part. The bottom part is not shown on 
screen.

So while it will not display it as 3D, it will discard half of the image.

/Joakim



Re: [PATCH] drm, ttm Fix uninitialized warning

2013-09-13 Thread David Herrmann
Hi

On Fri, Sep 13, 2013 at 2:33 PM, Prarit Bhargava  wrote:
> Fix uninitialized warning.
>
> drivers/gpu/drm/ttm/ttm_object.c: In function ‘ttm_base_object_lookup’:
> drivers/gpu/drm/ttm/ttm_object.c:213:10: error: ‘base’ may be used 
> uninitialized in this function [-Werror=maybe-uninitialized]
>   kref_put(&base->refcount, ttm_release_base);
>   ^
> drivers/gpu/drm/ttm/ttm_object.c:221:26: note: ‘base’ was declared here
>   struct ttm_base_object *base;
>
> Signed-off-by: Prarit Bhargava 
> Cc: David Airlie 
> Cc: rcl...@redhat.com

Did some research on that, another fix is:

diff --git a/drivers/gpu/drm/ttm/ttm_object.c b/drivers/gpu/drm/ttm/ttm_object.c
index 58a5f32..6b7f7b7 100644
--- a/drivers/gpu/drm/ttm/ttm_object.c
+++ b/drivers/gpu/drm/ttm/ttm_object.c
@@ -228,7 +228,10 @@ struct ttm_base_object *ttm_base_object_lookup(struct ttm_object_file *tfile,
 	if (likely(ret == 0)) {
 		base = drm_hash_entry(hash, struct ttm_base_object, hash);
 		ret = kref_get_unless_zero(&base->refcount) ? 0 : -EINVAL;
+	} else {
+		ret = -EINVAL;
 	}
+
 	rcu_read_unlock();
 
 	if (unlikely(ret != 0))

Looks totally stupid but also silences the warning. In fact, the
warning is triggered by rcu_read_unlock(); and only if PROVE_LOCKING
and DEBUG_LOCK_ALLOC are enabled. And it's related to the IP-size of
the rcu_read_unlock() path.

I'd actually prefer "base = NULL;" and an extended bail-out condition:
  if (unlikely(ret != 0 || !base))
but it's not my decision, so:
Reviewed-by: David Herrmann 

Cheers
David
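
For illustration, the explicit-initialization alternative mentioned above might look like the following userspace sketch. Everything here is hypothetical (struct fake_object, fake_hash_find(), fake_lookup()); a plain array scan replaces the hash table, and an integer refcount mimics the "get unless zero" step.

```c
#include <stddef.h>

/* Hypothetical object with a key and a reference count. */
struct fake_object {
	unsigned int key;
	int refcount;
};

/* Returns 0 and sets *out on a hit; returns -1 and leaves *out alone on a miss. */
static int fake_hash_find(struct fake_object *table, size_t n,
			  unsigned int key, struct fake_object **out)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (table[i].key == key) {
			*out = &table[i];
			return 0;
		}
	}
	return -1;
}

static struct fake_object *fake_lookup(struct fake_object *table, size_t n,
				       unsigned int key)
{
	struct fake_object *base = NULL;	/* explicit init: no maybe-uninitialized path */
	int ret;

	ret = fake_hash_find(table, n, key, &base);
	if (ret == 0) {
		if (base->refcount > 0)
			base->refcount++;	/* take a reference only if still live */
		else
			ret = -1;
	}

	if (ret != 0 || !base)			/* extended bail-out condition */
		return NULL;

	return base;
}
```

Compared with uninitialized_var(), this costs one store but keeps the compiler's warning useful if the function is later changed.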

> ---
>  drivers/gpu/drm/ttm/ttm_object.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_object.c 
> b/drivers/gpu/drm/ttm/ttm_object.c
> index 58a5f32..a868176 100644
> --- a/drivers/gpu/drm/ttm/ttm_object.c
> +++ b/drivers/gpu/drm/ttm/ttm_object.c
> @@ -218,7 +218,7 @@ struct ttm_base_object *ttm_base_object_lookup(struct 
> ttm_object_file *tfile,
>uint32_t key)
>  {
> struct ttm_object_device *tdev = tfile->tdev;
> -   struct ttm_base_object *base;
> +   struct ttm_base_object *uninitialized_var(base);
> struct drm_hash_item *hash;
> int ret;
>
> --
> 1.7.9.3
>


Re: [PATCH 7/9] drm/edid: Expose mandatory stereo modes for HDMI sinks

2013-09-13 Thread Daniel Vetter
On Fri, Sep 13, 2013 at 6:10 PM, Joakim Plate  wrote:
>
> Also, some logic ought to indicate pixel aspect ratio for the modes since
> they are non square for the half res modes.

Atm we completely ignore pixel aspect ratio, also for flatworld CEA
modes. So I don't think we need to concern ourselves here about this,
imo pixel aspect ratio support is orthogonal.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


Re: [PATCH 7/9] drm/edid: Expose mandatory stereo modes for HDMI sinks

2013-09-13 Thread Joakim Plate
Damien Lespiau  intel.com> writes:

> +static const struct s3d_mandatory_mode s3d_mandatory_modes[] = {
> +	{ 1920, 1080, 24, 0,
> +	  DRM_MODE_FLAG_3D_TOP_AND_BOTTOM | DRM_MODE_FLAG_3D_FRAME_PACKING },
> +	{ 1920, 1080, 50, DRM_MODE_FLAG_INTERLACE,
> +	  DRM_MODE_FLAG_3D_SIDE_BY_SIDE_HALF },
> +	{ 1920, 1080, 60, DRM_MODE_FLAG_INTERLACE,
> +	  DRM_MODE_FLAG_3D_SIDE_BY_SIDE_HALF },
> +	{ 1280, 720,  50, 0,
> +	  DRM_MODE_FLAG_3D_TOP_AND_BOTTOM | DRM_MODE_FLAG_3D_FRAME_PACKING },
> +	{ 1280, 720,  60, 0,
> +	  DRM_MODE_FLAG_3D_TOP_AND_BOTTOM | DRM_MODE_FLAG_3D_FRAME_PACKING }
> +};


I may be missing something here, but...

The frame-packed modes are much taller in pixels than this, because they
include the blanking between the two views:
1080*2+45=2205
720*2+30=1470

Unless you intend to hide the left/right split in Mesa or somewhere else, we
need to get the ability to render to both fields somehow.

Either as the full 2205 pixels high, or at 1080*2 with the driver adding the
blanking.
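
As a rough sketch of the arithmetic above: the helper name below is made up for illustration and is not part of the DRM API, and the 45- and 30-line blanking values are simply the ones used in this mail. Under those assumptions, the frame-packed active height could be computed like this:

```c
#include <assert.h>

/*
 * Illustrative helper (hypothetical name, not a DRM function).
 * A frame-packed frame stacks two full-height views and keeps the 2D
 * mode's vertical blanking between them as active space.
 */
static int frame_packed_vactive(int vactive_2d, int vblank_2d)
{
	assert(vactive_2d > 0 && vblank_2d >= 0);
	return 2 * vactive_2d + vblank_2d;
}
```

With these inputs, frame_packed_vactive(1080, 45) gives 2205 and frame_packed_vactive(720, 30) gives 1470.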

Also, some logic ought to indicate pixel aspect ratio for the modes, since
they are non-square for the half-res modes.

/Joakim





[Bug 69328] New: Recoverable and unrecoverable lockups with opencl-example on trinity APU

2013-09-13 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=69328

          Priority: medium
            Bug ID: 69328
          Assignee: dri-devel@lists.freedesktop.org
           Summary: Recoverable and unrecoverable lockups with
                    opencl-example on trinity APU
          Severity: normal
    Classification: Unclassified
                OS: Linux (All)
          Reporter: slick...@gmx.com
          Hardware: x86-64 (AMD64)
            Status: NEW
           Version: git
         Component: Drivers/DRI/R600
           Product: Mesa

Software in use:

Up to date mesa, llvm, clang, libclc, firmware, as of 20130910.  Gentoo's Linux
3.11. opencl-example-12905ac620b83713b07ece763ff3c36fb3c2e7e5.

Hardware in use:  AMD A8 5600K APU (Radeon HD 7560D, Aruba), 32GB system RAM,
Biostar Hi-Fi A85W motherboard.

Steps to reproduce: 

Run hello_world program from opencl-example.  First run works correctly. 
Second run either completely locks up the machine or locks up the GPU (which
recovers after a short time).  Same behavior for other tests - the first test
completes and the second causes problems.


This is what the second opencl-example run looks like:

localhost opencl-example-12905ac620b83713b07ece763ff3c36fb3c2e7e5 # ./hello_world
There are 1 platforms.
There are 1 GPU devices.
clCreateContext() succeeded.
clCreateCommandQueue() succeeded.
clCreateProgramWithSource() suceeded.
clBuildProgram() suceeded.
clCreateKernel() suceeded.
clCreateBuffer() succeeded.
clSetKernelArg() succeeded.
clEnqueueNDRangeKernel() suceeded.
((( 10 second hang here, or forever if the machine is toast )))
clEnqueueReadBuffer() suceeded.
pi = 3.141590


And, here is the dmesg output from the recoverable lockups:

[ 1365.806285] radeon :00:01.0: GPU lockup CP stall for more than 1msec
[ 1365.806292] radeon :00:01.0: GPU lockup (waiting for 0x7ec3
last fence id 0x7ec2)
[ 1365.821261] radeon :00:01.0: Saved 559 dwords of commands on ring 0.
[ 1365.821293] radeon :00:01.0: GPU softreset: 0x0008
[ 1365.821297] radeon :00:01.0:   GRBM_STATUS   = 0xB0003828
[ 1365.821300] radeon :00:01.0:   GRBM_STATUS_SE0   = 0x0007
[ 1365.821304] radeon :00:01.0:   GRBM_STATUS_SE1   = 0x0007
[ 1365.821307] radeon :00:01.0:   SRBM_STATUS   = 0x2040
[ 1365.821332] radeon :00:01.0:   SRBM_STATUS2  = 0x
[ 1365.821335] radeon :00:01.0:   R_008674_CP_STALLED_STAT1 = 0x
[ 1365.821338] radeon :00:01.0:   R_008678_CP_STALLED_STAT2 = 0x4000
[ 1365.821341] radeon :00:01.0:   R_00867C_CP_BUSY_STAT = 0x00010002
[ 1365.821344] radeon :00:01.0:   R_008680_CP_STAT  = 0x80220243
[ 1365.821347] radeon :00:01.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[ 1365.821350] radeon :00:01.0:   R_00D834_DMA_STATUS_REG   = 0x44C83D57
[ 1365.821354] radeon :00:01.0:   VM_CONTEXT0_PROTECTION_FAULT_ADDR  
0x
[ 1365.821357] radeon :00:01.0:   VM_CONTEXT0_PROTECTION_FAULT_STATUS
0x
[ 1365.821360] radeon :00:01.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR  
0x
[ 1365.821363] radeon :00:01.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS
0x
[ 1365.827029] radeon :00:01.0: GRBM_SOFT_RESET=0x4001
[ 1365.827083] radeon :00:01.0: SRBM_SOFT_RESET=0x0100
[ 1365.828237] radeon :00:01.0:   GRBM_STATUS   = 0x3828
[ 1365.828240] radeon :00:01.0:   GRBM_STATUS_SE0   = 0x0007
[ 1365.828243] radeon :00:01.0:   GRBM_STATUS_SE1   = 0x0007
[ 1365.828246] radeon :00:01.0:   SRBM_STATUS   = 0x2040
[ 1365.828271] radeon :00:01.0:   SRBM_STATUS2  = 0x
[ 1365.828274] radeon :00:01.0:   R_008674_CP_STALLED_STAT1 = 0x
[ 1365.828277] radeon :00:01.0:   R_008678_CP_STALLED_STAT2 = 0x
[ 1365.828280] radeon :00:01.0:   R_00867C_CP_BUSY_STAT = 0x
[ 1365.828283] radeon :00:01.0:   R_008680_CP_STAT  = 0x
[ 1365.828286] radeon :00:01.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[ 1365.828289] radeon :00:01.0:   R_00D834_DMA_STATUS_REG   = 0x44C83D57
[ 1365.828317] radeon :00:01.0: GPU reset succeeded, trying to resume
[ 1365.843638] [drm] PCIE GART of 512M enabled (table at 0x00276000).
[ 1365.843775] radeon :00:01.0: WB enabled
[ 1365.843781] radeon :00:01.0: fence driver on ring 0 use gpu addr
0x2c00 and cpu addr 0x8807dea6bc00
[ 1365.844520] radeon :00:01.0: fence driver on ring 5 use gpu addr
0x00075a18 and cpu addr 0xc900057b5a18
[ 1365.844524] radeon :00:01.0: fence driver on ring 1 use gpu addr
0x2c04 and cpu addr 0x8807dea6bc04
[ 1365.844528] radeon :00:01.0: fence driver on ring 2 use gpu addr
0x2c08 and cpu addr 0x8807dea6bc08
[ 1365.844532] radeon :00:01.0: fence driver on ring 3 use gpu addr
0x2c0c and cpu

[Bug 69328] Recoverable and unrecoverable lockups with opencl-example on trinity APU

2013-09-13 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=69328

Alex Deucher changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|Drivers/DRI/R600            |Drivers/Gallium/r600

--- Comment #1 from Alex Deucher  ---
Is this a regression?  If so, can you bisect?

-- 
You are receiving this mail because:
You are the assignee for the bug.


[Bug 69321] starting openCL crashes/boots system

2013-09-13 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=69321

--- Comment #3 from udo  ---
Trying to find a working commit first.



[PATCH] drm/radeon/dpm: rework auto performance level enable

2013-09-13 Thread Alex Deucher
Calling force_performance_level() from set_power_state()
doesn't work on some asics because the current power
state pointer has not been properly updated at that point.
Move the calls to force_performance_level() out of the
asic specific set_power_state() functions and into
the main power state sequence.

Fixes dpm resume on SI.
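
To illustrate the ordering problem described above, here is a toy model. It is not the radeon code: every `toy_` name is hypothetical, and the only assumption is that the force-level routine acts on whatever state the bookkeeping currently marks as active.

```c
#include <assert.h>

/* Hypothetical, simplified device model; not the radeon structures. */
struct toy_dev {
	int current_ps;    /* state the bookkeeping believes is active */
	int requested_ps;  /* state being switched to */
	int forced_on_ps;  /* state toy_force_level() last operated on */
};

/* Acts on whatever the bookkeeping currently calls the active state. */
static void toy_force_level(struct toy_dev *d)
{
	assert(d);
	d->forced_on_ps = d->current_ps;
}

/* Old ordering: the force call runs before the bookkeeping update. */
static void toy_change_state_old(struct toy_dev *d)
{
	toy_force_level(d);               /* sees the stale state */
	d->current_ps = d->requested_ps;  /* bookkeeping update */
}

/* New ordering: update the bookkeeping first, then force the level. */
static void toy_change_state_new(struct toy_dev *d)
{
	d->current_ps = d->requested_ps;
	toy_force_level(d);               /* sees the new state */
}
```

In this model the old ordering applies the forced level to the state being left, while the new ordering applies it to the state just entered.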

Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/radeon/btc_dpm.c |  6 --
 drivers/gpu/drm/radeon/ci_dpm.c  |  6 --
 drivers/gpu/drm/radeon/cypress_dpm.c |  6 --
 drivers/gpu/drm/radeon/kv_dpm.c  |  1 -
 drivers/gpu/drm/radeon/ni_dpm.c  |  6 --
 drivers/gpu/drm/radeon/radeon_pm.c   | 14 +-
 drivers/gpu/drm/radeon/rv6xx_dpm.c   |  2 --
 drivers/gpu/drm/radeon/rv770_dpm.c   |  6 --
 drivers/gpu/drm/radeon/si_dpm.c  |  6 --
 drivers/gpu/drm/radeon/sumo_dpm.c|  2 --
 drivers/gpu/drm/radeon/trinity_dpm.c |  1 -
 11 files changed, 9 insertions(+), 47 deletions(-)

diff --git a/drivers/gpu/drm/radeon/btc_dpm.c b/drivers/gpu/drm/radeon/btc_dpm.c
index 084e694..05ff315 100644
--- a/drivers/gpu/drm/radeon/btc_dpm.c
+++ b/drivers/gpu/drm/radeon/btc_dpm.c
@@ -2340,12 +2340,6 @@ int btc_dpm_set_power_state(struct radeon_device *rdev)
return ret;
}
 
-   ret = rv770_dpm_force_performance_level(rdev, RADEON_DPM_FORCED_LEVEL_AUTO);
-   if (ret) {
-   DRM_ERROR("rv770_dpm_force_performance_level failed\n");
-   return ret;
-   }
-
return 0;
 }
 
diff --git a/drivers/gpu/drm/radeon/ci_dpm.c b/drivers/gpu/drm/radeon/ci_dpm.c
index 3cce533..8996274 100644
--- a/drivers/gpu/drm/radeon/ci_dpm.c
+++ b/drivers/gpu/drm/radeon/ci_dpm.c
@@ -4748,12 +4748,6 @@ int ci_dpm_set_power_state(struct radeon_device *rdev)
if (pi->pcie_performance_request)
ci_notify_link_speed_change_after_state_change(rdev, new_ps, old_ps);
 
-   ret = ci_dpm_force_performance_level(rdev, RADEON_DPM_FORCED_LEVEL_AUTO);
-   if (ret) {
-   DRM_ERROR("ci_dpm_force_performance_level failed\n");
-   return ret;
-   }
-
cik_update_cg(rdev, (RADEON_CG_BLOCK_GFX |
 RADEON_CG_BLOCK_MC |
 RADEON_CG_BLOCK_SDMA |
diff --git a/drivers/gpu/drm/radeon/cypress_dpm.c b/drivers/gpu/drm/radeon/cypress_dpm.c
index 95a66db..91bb470 100644
--- a/drivers/gpu/drm/radeon/cypress_dpm.c
+++ b/drivers/gpu/drm/radeon/cypress_dpm.c
@@ -2014,12 +2014,6 @@ int cypress_dpm_set_power_state(struct radeon_device *rdev)
if (eg_pi->pcie_performance_request)
cypress_notify_link_speed_change_after_state_change(rdev, new_ps, old_ps);
 
-   ret = rv770_dpm_force_performance_level(rdev, RADEON_DPM_FORCED_LEVEL_AUTO);
-   if (ret) {
-   DRM_ERROR("rv770_dpm_force_performance_level failed\n");
-   return ret;
-   }
-
return 0;
 }
 
diff --git a/drivers/gpu/drm/radeon/kv_dpm.c b/drivers/gpu/drm/radeon/kv_dpm.c
index b98b9c9..7139906 100644
--- a/drivers/gpu/drm/radeon/kv_dpm.c
+++ b/drivers/gpu/drm/radeon/kv_dpm.c
@@ -1854,7 +1854,6 @@ int kv_dpm_set_power_state(struct radeon_device *rdev)
 RADEON_CG_BLOCK_BIF |
 RADEON_CG_BLOCK_HDP), true);
 
-   rdev->pm.dpm.forced_level = RADEON_DPM_FORCED_LEVEL_AUTO;
return 0;
 }
 
diff --git a/drivers/gpu/drm/radeon/ni_dpm.c b/drivers/gpu/drm/radeon/ni_dpm.c
index f7b625c..6c398a4 100644
--- a/drivers/gpu/drm/radeon/ni_dpm.c
+++ b/drivers/gpu/drm/radeon/ni_dpm.c
@@ -3865,12 +3865,6 @@ int ni_dpm_set_power_state(struct radeon_device *rdev)
return ret;
}
 
-   ret = ni_dpm_force_performance_level(rdev, RADEON_DPM_FORCED_LEVEL_AUTO);
-   if (ret) {
-   DRM_ERROR("ni_dpm_force_performance_level failed\n");
-   return ret;
-   }
-
return 0;
 }
 
diff --git a/drivers/gpu/drm/radeon/radeon_pm.c b/drivers/gpu/drm/radeon/radeon_pm.c
index d41ac8a..87e1d69 100644
--- a/drivers/gpu/drm/radeon/radeon_pm.c
+++ b/drivers/gpu/drm/radeon/radeon_pm.c
@@ -917,10 +917,13 @@ static void radeon_dpm_change_power_state_locked(struct radeon_device *rdev)
 
radeon_dpm_post_set_power_state(rdev);
 
-   /* force low perf level for thermal */
-   if (rdev->pm.dpm.thermal_active &&
-   rdev->asic->dpm.force_performance_level) {
-   radeon_dpm_force_performance_level(rdev, RADEON_DPM_FORCED_LEVEL_LOW);
+   if (rdev->asic->dpm.force_performance_level) {
+   if (rdev->pm.dpm.thermal_active)
+   /* force low perf level for thermal */
+   radeon_dpm_force_performance_level(rdev, RADEON_DPM_FORCED_LEVEL_LOW);
+   else
+   /* otherwise, enable auto */
+   radeon_dpm_force_performance_level(rdev, RADEON_DPM_FORCED_LEVEL_AUTO);
}
 
 done:
@@ -1149,9 +1152,10 @@ stati

[Bug 68235] Display freezes after login with kernel 3.11.0-rc5 on Cayman with dpm=1

2013-09-13 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=68235

--- Comment #26 from Alex Deucher  ---
Can you attach your dmesg with dpm enabled?



[Bug 69245] Opencl random lockups whilst running tstellar's opencl-examples

2013-09-13 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=69245

--- Comment #1 from Tom Stellard  ---
Can you test this with the latest version of Mesa from git?   I was able to
reproduce this on the 9.2 branch by running three separate instances of the
run_tests.sh script at the same time as the Lightsmark demo.  However, I could
not reproduce this on master, so maybe it has been fixed.



[Bug 68235] Display freezes after login with kernel 3.11.0-rc5 on Cayman with dpm=1

2013-09-13 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=68235

--- Comment #27 from Alexandre Demers  ---
(In reply to comment #26)
> Can you attach your dmesg with dpm enabled?

Do you mean with the patch applied (total and/or problematic part left alone)?



[Bug 68235] Display freezes after login with kernel 3.11.0-rc5 on Cayman with dpm=1

2013-09-13 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=68235

--- Comment #28 from Alex Deucher  ---
(In reply to comment #27)
> (In reply to comment #26)
> > Can you attach your dmesg with dpm enabled?
> 
> Do you mean with the patch applied (total and/or problematic part left
> alone)?

Doesn't matter.  I just want to see the basic driver output and power state
list.



[PATCH] drm/radeon: fix panel scaling with eDP and LVDS bridges

2013-09-13 Thread Alex Deucher
We were using the wrong set_property callback, so we always
ended up with Full scaling even if something else (Center or
Full aspect) was selected.
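
As a hedged illustration of this failure mode, the sketch below uses a per-connector function table whose set_property callback does not handle the scaling property. All `toy_` names are hypothetical; this is a stand-alone model of callback dispatch, not the radeon or DRM code.

```c
#include <assert.h>

enum toy_scale { TOY_SCALE_FULL, TOY_SCALE_CENTER, TOY_SCALE_ASPECT };

struct toy_connector;

/* Per-connector-type callback table (hypothetical). */
struct toy_connector_funcs {
	void (*set_property)(struct toy_connector *c, enum toy_scale mode);
};

struct toy_connector {
	const struct toy_connector_funcs *funcs;
	enum toy_scale scale;
};

/* Handler that knows nothing about scaling: the request is dropped. */
static void toy_dp_set_property(struct toy_connector *c, enum toy_scale mode)
{
	(void)c;
	(void)mode;
}

/* Handler that understands the scaling property and stores it. */
static void toy_lvds_set_property(struct toy_connector *c, enum toy_scale mode)
{
	c->scale = mode;
}

static const struct toy_connector_funcs toy_dp_funcs = {
	.set_property = toy_dp_set_property,
};

static const struct toy_connector_funcs toy_lvds_funcs = {
	.set_property = toy_lvds_set_property,
};

/* Dispatch through the table and report the scaling mode now in effect. */
static enum toy_scale toy_request_scale(struct toy_connector *c,
					enum toy_scale mode)
{
	assert(c && c->funcs && c->funcs->set_property);
	c->funcs->set_property(c, mode);
	return c->scale;
}
```

A connector wired to toy_dp_funcs stays at its default of TOY_SCALE_FULL whatever is requested, while the same connector wired to toy_lvds_funcs picks up the requested mode.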

Signed-off-by: Alex Deucher 
Cc: sta...@vger.kernel.org
---
 drivers/gpu/drm/radeon/radeon_connectors.c | 34 +++---
 1 file changed, 31 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_connectors.c b/drivers/gpu/drm/radeon/radeon_connectors.c
index f7c8c6e..79159b5 100644
--- a/drivers/gpu/drm/radeon/radeon_connectors.c
+++ b/drivers/gpu/drm/radeon/radeon_connectors.c
@@ -1504,6 +1504,24 @@ static const struct drm_connector_funcs radeon_dp_connector_funcs = {
.force = radeon_dvi_force,
 };
 
+static const struct drm_connector_funcs radeon_edp_connector_funcs = {
+   .dpms = drm_helper_connector_dpms,
+   .detect = radeon_dp_detect,
+   .fill_modes = drm_helper_probe_single_connector_modes,
+   .set_property = radeon_lvds_set_property,
+   .destroy = radeon_dp_connector_destroy,
+   .force = radeon_dvi_force,
+};
+
+static const struct drm_connector_funcs radeon_lvds_bridge_connector_funcs = {
+   .dpms = drm_helper_connector_dpms,
+   .detect = radeon_dp_detect,
+   .fill_modes = drm_helper_probe_single_connector_modes,
+   .set_property = radeon_lvds_set_property,
+   .destroy = radeon_dp_connector_destroy,
+   .force = radeon_dvi_force,
+};
+
 void
 radeon_add_atom_connector(struct drm_device *dev,
  uint32_t connector_id,
@@ -1595,8 +1613,6 @@ radeon_add_atom_connector(struct drm_device *dev,
goto failed;
radeon_dig_connector->igp_lane_info = igp_lane_info;
radeon_connector->con_priv = radeon_dig_connector;
-   drm_connector_init(dev, &radeon_connector->base, &radeon_dp_connector_funcs, connector_type);
-   drm_connector_helper_add(&radeon_connector->base, &radeon_dp_connector_helper_funcs);
if (i2c_bus->valid) {
/* add DP i2c bus */
if (connector_type == DRM_MODE_CONNECTOR_eDP)
@@ -1613,6 +1629,10 @@ radeon_add_atom_connector(struct drm_device *dev,
case DRM_MODE_CONNECTOR_VGA:
case DRM_MODE_CONNECTOR_DVIA:
default:
+   drm_connector_init(dev, &radeon_connector->base,
+  &radeon_dp_connector_funcs, connector_type);
+   drm_connector_helper_add(&radeon_connector->base,
+  &radeon_dp_connector_helper_funcs);
connector->interlace_allowed = true;
connector->doublescan_allowed = true;
radeon_connector->dac_load_detect = true;
@@ -1625,6 +1645,10 @@ radeon_add_atom_connector(struct drm_device *dev,
case DRM_MODE_CONNECTOR_HDMIA:
case DRM_MODE_CONNECTOR_HDMIB:
case DRM_MODE_CONNECTOR_DisplayPort:
+   drm_connector_init(dev, &radeon_connector->base,
+  &radeon_dp_connector_funcs, connector_type);
+   drm_connector_helper_add(&radeon_connector->base,
+  &radeon_dp_connector_helper_funcs);
drm_object_attach_property(&radeon_connector->base.base,
  rdev->mode_info.underscan_property,
  UNDERSCAN_OFF);
@@ -1652,6 +1676,10 @@ radeon_add_atom_connector(struct drm_device *dev,
break;
case DRM_MODE_CONNECTOR_LVDS:
case DRM_MODE_CONNECTOR_eDP:
+   drm_connector_init(dev, &radeon_connector->base,
+  &radeon_lvds_bridge_connector_funcs, connector_type);
+   drm_connector_helper_add(&radeon_connector->base,
+  &radeon_dp_connector_helper_funcs);
drm_object_attach_property(&radeon_connector->base.base,
  dev->mode_config.scaling_mode_property,
  DRM_MODE_SCALE_FULLSCREEN);
@@ -1830,7 +1858,7 @@ radeon_add_atom_connector(struct drm_device *dev,
goto failed;
radeon_dig_connector->igp_lane_info = igp_lane_info;
radeon_connector->con_priv = radeon_dig_connector;
-   drm_connector_init(dev, &radeon_connector->base, &radeon_dp_connector_funcs, connector_type);
+   drm_connector_init(dev, &radeon_connector->base, &radeon_edp_connector_funcs, connector_type);
drm_connector_helper_add(&radeon_connector->base, 
&radeon_dp_

[Bug 68224] [radeonsi] Serious Sam3 is segfaulting (LLVM assert)

2013-09-13 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=68224

--- Comment #14 from Tom Stellard  ---
(In reply to comment #13)
> (In reply to comment #12)
> > Created attachment 85372
> > SGPR register spilling patch v2
> > 
> > Can you try this v2 patch?  It fixes the bug Michel found plus another one.
> 
> Same result as patch v1: GPU lockup instead of LLVM assert. The Sanctuary
> demo also gives a GPU lockup.

Sanctuary doesn't lock up for me with this patch, but the only thing visible is
the torch.  Everything else is black.  What settings are you using with
Sanctuary?



[Bug 69340] New: Recent mesa git revisions cause frequent gpu hangs on radeonsi

2013-09-13 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=69340

          Priority: medium
            Bug ID: 69340
          Assignee: dri-devel@lists.freedesktop.org
           Summary: Recent mesa git revisions cause frequent gpu hangs on
                    radeonsi
          Severity: normal
    Classification: Unclassified
                OS: Linux (All)
          Reporter: j.suarez.agap...@gmail.com
          Hardware: x86-64 (AMD64)
            Status: NEW
           Version: git
         Component: Drivers/Gallium/radeonsi
           Product: Mesa

After installing mesa git 395b9410 (from oibaf's ppa on Kubuntu raring) I am
experiencing gpu hangs and kernel panics when launching "somewhat complex" 3D
games. For example, glxgears and supertuxkart do not produce the gpu hang, but
speed-dreams2 (it hangs when the game should show your car in order to drive),
L4D2 (just after the Valve logo video, when the game intro movie should
start playing) and Crusader Kings II (at the very beginning, when the
loading screen should come up) all do.

The last mesa git version I had installed was 505fad04, which works correctly.

Moreover, the crashes happen both with radeon.dpm=1 and radeon.dpm=0.

I have managed to get some dmesg outputs of the crashes:

Crash #1

[  334.162270] radeon :01:00.0: GPU lockup CP stall for more than 1msec
[  334.162280] radeon :01:00.0: GPU lockup (waiting for 0x000160ea)
[  334.162289] radeon :01:00.0: failed to get a new IB (-35)
[  334.162291] [TTM] Failed to expire sync object before buffer eviction
[  334.162299] [drm:radeon_cs_ib_vm_chunk] *ERROR* Failed to get ib !
[  334.162378] [TTM] Failed to expire sync object before buffer eviction
[  334.172123] radeon :01:00.0: sa_manager is not empty, clearing anyway
[  334.381742] radeon :01:00.0: Saved 97917 dwords of commands on ring 0.
[  334.381879] radeon :01:00.0: GPU softreset: 0x0049
[  334.381882] radeon :01:00.0:   GRBM_STATUS   = 0xE5D04028
[  334.381884] radeon :01:00.0:   GRBM_STATUS_SE0   = 0xEE40
[  334.381886] radeon :01:00.0:   GRBM_STATUS_SE1   = 0xEE40
[  334.381889] radeon :01:00.0:   SRBM_STATUS   = 0x20C0
[  334.382000] radeon :01:00.0:   SRBM_STATUS2  = 0x
[  334.382002] radeon :01:00.0:   R_008674_CP_STALLED_STAT1 = 0x
[  334.382004] radeon :01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00018000
[  334.382006] radeon :01:00.0:   R_00867C_CP_BUSY_STAT = 0x00408002
[  334.382009] radeon :01:00.0:   R_008680_CP_STAT  = 0x84038643
[  334.382011] radeon :01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[  334.382013] radeon :01:00.0:   R_00D834_DMA_STATUS_REG   = 0x44C83D57
[  334.382016] radeon :01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR  
0x
[  334.382018] radeon :01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS
0x
[  334.386528] radeon :01:00.0: GRBM_SOFT_RESET=0xDDFF
[  334.386582] radeon :01:00.0: SRBM_SOFT_RESET=0x0100
[  334.387728] radeon :01:00.0:   GRBM_STATUS   = 0x3028
[  334.387730] radeon :01:00.0:   GRBM_STATUS_SE0   = 0x0006
[  334.387731] radeon :01:00.0:   GRBM_STATUS_SE1   = 0x0006
[  334.387733] radeon :01:00.0:   SRBM_STATUS   = 0x20C0
[  334.387844] radeon :01:00.0:   SRBM_STATUS2  = 0x
[  334.387846] radeon :01:00.0:   R_008674_CP_STALLED_STAT1 = 0x
[  334.387848] radeon :01:00.0:   R_008678_CP_STALLED_STAT2 = 0x
[  334.387850] radeon :01:00.0:   R_00867C_CP_BUSY_STAT = 0x
[  334.387852] radeon :01:00.0:   R_008680_CP_STAT  = 0x
[  334.387854] radeon :01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[  334.387856] radeon :01:00.0:   R_00D834_DMA_STATUS_REG   = 0x44C83D57
[  334.387981] radeon :01:00.0: GPU reset succeeded, trying to resume
[  334.415260] [drm] probing gen 2 caps for device 1002:5a16 = 31cd02/0
[  334.415264] [drm] PCIE gen 2 link speeds already enabled
[  334.417417] [drm] PCIE GART of 512M enabled (table at 0x00276000).
[  334.417520] radeon :01:00.0: WB enabled
[  334.417522] radeon :01:00.0: fence driver on ring 0 use gpu addr
0x8c00 and cpu addr 0x880412af4c00
[  334.417524] radeon :01:00.0: fence driver on ring 1 use gpu addr
0x8c04 and cpu addr 0x880412af4c04
[  334.417526] radeon :01:00.0: fence driver on ring 2 use gpu addr
0x8c08 and cpu addr 0x880412af4c08
[  334.417528] radeon :01:00.0: fence driver on ring 3 use gpu addr
0x8c0c and cpu addr 0x880412af4c0c
[  334.417530] radeon :01:00.0: fence driver on ring 4 use gpu addr
0x8c10 and cpu addr 0x880412af4c10
[  334.418521] radeon :01:00.0: fence driver on ring 5 use gpu addr
0x00075a18 and cpu addr 0xc90011db5a18
[  334.43691

[Bug 69340] Recent mesa git revisions cause frequent gpu hangs on radeonsi

2013-09-13 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=69340

--- Comment #1 from José Suárez  ---
Created attachment 85793
  --> https://bugs.freedesktop.org/attachment.cgi?id=85793&action=edit
Full dmesg

Full dmesg of the system



[Bug 69328] Recoverable and unrecoverable lockups with opencl-example on trinity APU

2013-09-13 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=69328

--- Comment #2 from Tom Stellard  ---
Created attachment 85794
  --> https://bugs.freedesktop.org/attachment.cgi?id=85794&action=edit
Don't set DB_DEST or CB_DEST* bit on cp_coher_cntl

This patch fixes the hangs for me and all the run_test.sh tests pass.  However,
this is just a hack and not a proper solution.  Can you test this patch?



[Bug 69340] Recent mesa git revisions cause frequent gpu hangs on radeonsi

2013-09-13 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=69340

--- Comment #2 from José Suárez  ---
Just an update: I have tried building the .deb packages with the lines
committed in 395b9410 removed from the source and the hangs are still there, so
I am not sure if the problem lies in that commit...



[Bug 69340] Recent mesa git revisions cause frequent gpu hangs on radeonsi

2013-09-13 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=69340

--- Comment #3 from Hohahiu  ---
Created attachment 85797
  --> https://bugs.freedesktop.org/attachment.cgi?id=85797&action=edit
dmesg



[Bug 69340] Recent mesa git revisions cause frequent gpu hangs on radeonsi

2013-09-13 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=69340

--- Comment #4 from Hohahiu  ---
I'm experiencing similar problems with Unigine Tropics. The file attached above
is my dmesg.

My specs:
intel hd 4000 + AMD Radeon 7750M

Software is:
OpenSUSE 12.3 x86_64
kernel-3.11
Mesa, libdrm are from git
xserver 1.14



[Bug 68235] Display freezes after login with kernel 3.11.0-rc5 on Cayman with dpm=1

2013-09-13 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=68235

--- Comment #29 from Alexandre Demers  ---
Created attachment 85798
  --> https://bugs.freedesktop.org/attachment.cgi?id=85798&action=edit
dpm=1 with partial patch applied on 3.11.0

dmesg output when dpm=1 with partial patch applied (deactivation of pretty much
everything but one to pass ni_upload_sw_state)



[Bug 64867] Hangs on Cayman (HD6950) when watching flash/using vdpau

2013-09-13 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=64867

--- Comment #10 from Harald Judt  ---
With current up-to-date git versions of libdrm, mesa, xorg-server and
xf86-video-ati, the R600_DEBUG=nodma hack no longer seems necessary
(linux-3.11.0-rc6 with UVD disabled); the GPU faults have vanished and the
system is stable.



[Patch] drm/nouveau: nouveau/nouveau_abi16.c does not return offset of allocated notifier

2013-09-13 Thread Bob Gleitsmann
Hi,

The following patch fixes problems retrieving query results. I have
tested it on nv40 - 6800 Ultra.

Best Wishes,

Bob

--- a/drivers/gpu/drm/nouveau/nouveau_abi16.c
+++ b/drivers/gpu/drm/nouveau/nouveau_abi16.c
@@ -445,6 +445,7 @@ nouveau_abi16_ioctl_notifierobj_alloc(ABI16_IOCTL_ARGS)
 sizeof(args), &object);
if (ret)
goto done;
+   info->offset = ntfy->node->offset;
 
 done:
if (ret)




[Bug 59649] [r600][RV635] GPU lockup CP stall / GPU resets over and over - Kernel 3.7 to 3.11 inclusive

2013-09-13 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=59649

Shawn Starr changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |---

--- Comment #10 from Shawn Starr  ---
Reopen :/

At least now I can trigger the crash repeatedly.

1) Log into Second Life first

2) You need to patch some of the GLSL programs, as they will fail with the
Mesa 9.2 GLSL compiler

- #extension GL_ARB_texture_rectangle : enable
+/* #extension GL_ARB_texture_rectangle : enable */

3) Go to the Graphics options and under Shaders, enable:

- Basic Shaders
- Atmospheric Shaders
- Advanced Lighting Model
- Ambient Occlusion
- Depth of field

GPU will reset:

[566574.634495] switching from power state:
[566574.634497] ui class: performance
[566574.634498] internal class: none
[566574.634500] caps: single_disp video 
[566574.634501] uvdvclk: 0 dclk: 0
[566574.634502] power level 0sclk: 11000 mclk: 40500 vddc:
900
[566574.634503] power level 1sclk: 3 mclk: 7 vddc:
1100
[566574.634504] power level 2sclk: 6 mclk: 7 vddc:
1100
[566574.634505] status: c 
[566574.634506] switching to power state:
[566574.634507] ui class: performance
[566574.634508] internal class: none
[566574.634509] caps: video 
[566574.634509] uvdvclk: 0 dclk: 0
[566574.634510] power level 0sclk: 3 mclk: 7 vddc:
1100
[566574.634511] power level 1sclk: 3 mclk: 7 vddc:
1100
[566574.634512] power level 2sclk: 6 mclk: 7 vddc:
1100
[566574.634513] status: r 
[566584.067826] switching from power state:
[566584.067830] ui class: performance
[566584.067831] internal class: none
[566584.067833] caps: video 
[566584.067835] uvdvclk: 0 dclk: 0
[566584.067836] power level 0sclk: 3 mclk: 7 vddc:
1100
[566584.067837] power level 1sclk: 3 mclk: 7 vddc:
1100
[566584.067839] power level 2sclk: 6 mclk: 7 vddc:
1100
[566584.067840] status: c 
[566584.067841] switching to power state:
[566584.067842] ui class: performance
[566584.067843] internal class: none
[566584.067844] caps: single_disp video 
[566584.067846] uvdvclk: 0 dclk: 0
[566584.067847] power level 0sclk: 11000 mclk: 40500 vddc:
900
[566584.067848] power level 1sclk: 3 mclk: 7 vddc:
1100
[566584.067849] power level 2sclk: 6 mclk: 7 vddc:
1100
[566584.067850] status: r 
[568371.037065] radeon :01:00.0: GPU lockup CP stall for more than
1msec
[568371.044281] radeon :01:00.0: GPU lockup (waiting for 0x017b4541
last fence id 0x017b4531)
[568371.111399] switching from power state:
[568371.111401] ui class: performance
[568371.111402] internal class: none
[568371.111403] caps: single_disp video 
[568371.111403] uvdvclk: 0 dclk: 0
[568371.111405] power level 0sclk: 11000 mclk: 40500 vddc:
900
[568371.111405] power level 1sclk: 3 mclk: 7 vddc:
1100
[568371.111406] power level 2sclk: 6 mclk: 7 vddc:
1100
[568371.111406] status: c 
[568371.111407] switching to power state:
[568371.111407] ui class: performance
[568371.111408] internal class: none
[568371.111408] caps: video 
[568371.111409] uvdvclk: 0 dclk: 0
[568371.111409] power level 0sclk: 3 mclk: 7 vddc:
1100
[568371.111410] power level 1sclk: 3 mclk: 7 vddc:
1100
[568371.111410] power level 2sclk: 6 mclk: 7 vddc:
1100
[568371.111411] status: r 
[568371.544089] radeon :01:00.0: GPU lockup CP stall for more than
10507msec
[568371.550588] radeon :01:00.0: GPU lockup (waiting for
0x017b4532)
[568371.550591] radeon :01:00.0: failed to get a new IB (-35)
[568371.555183] [drm:radeon_cs_ib_chunk] *ERROR* Failed to get ib !
[568371.561868] radeon :01:00.0: Saved 505 dwords of commands on ring 0.
[568371.561878] radeon :01:00.0: GPU softreset: 0x0008
[568371.561880] radeon :01:00.0:   R_008010_GRBM_STATUS  = 0xA0002030
[568371.561882] radeon :01:00.0:   R_008014_GRBM_STATUS2 = 0x0003
[568371.561884] radeon :01:00.0:   R_000E50_SRBM_STATUS  = 0x20C0
[568371.561886] radeon :01:00.0:   R_008674_CP_STALLED_STAT1 = 0x
[568371.561888] radeon :01:00.0:   R_008678_CP_STALLED_STAT2 = 0x
[568371.561890] radeon :01:00.0:   R_00867C_CP_BUSY_STAT = 0x00020186
[568371.561892] r