Re: [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE
Op 13-09-13 08:44, Thomas Hellstrom schreef: > On 09/12/2013 11:50 PM, Maarten Lankhorst wrote: >> Op 12-09-13 18:44, Thomas Hellstrom schreef: >>> On 09/12/2013 05:45 PM, Maarten Lankhorst wrote: Op 12-09-13 17:36, Daniel Vetter schreef: > On Thu, Sep 12, 2013 at 5:06 PM, Peter Zijlstra > wrote: >> So I'm poking around the preemption code and stumbled upon: >> >> drivers/gpu/drm/i915/i915_gem.c:set_need_resched(); >> drivers/gpu/drm/ttm/ttm_bo_vm.c: >> set_need_resched(); >> drivers/gpu/drm/ttm/ttm_bo_vm.c: >> set_need_resched(); >> drivers/gpu/drm/udl/udl_gem.c: set_need_resched(); >> >> All these sites basically do: >> >> while (!trylock()) >> yield(); >> >> which is a horrible and broken locking pattern. >> >> Firstly its deadlock prone, suppose the faulting process is a FIFOn+1 >> task that preempted the lock holder at FIFOn. >> >> Secondly the implementation is worse than usual by abusing >> VM_FAULT_NOPAGE, which is supposed to install a PTE so that the fault >> doesn't retry, but you're using it as a get out of fault path. And >> you're using set_need_resched() which is not something a driver should >> _ever_ touch. >> >> Now I'm going to take away set_need_resched() -- and while you can >> 'reimplement' it using set_thread_flag() you're not going to do that >> because it will be broken due to changes to the preempt code. >> >> So please as to fix ASAP and don't allow anybody to trick you into >> merging silly things like that again ;-) > The set_need_resched in i915_gem.c:i915_gem_fault can actually be > removed. It was there to give the error handler a chance to sneak in > and reset the hw/sw tracking when the gpu is dead. That hack goes back > to the days when the locking around our error handler was somewhere > between nonexistent and totally broken, nowadays we keep things from > live-locking by a bit of magic in i915_mutex_lock_interruptible. I'll > whip up a patch to rip this out. I'll also check that our testsuite > properly exercises this path (needs a bit of work on a quick look for > better coverage). > > The one in ttm is just bonghits to shut up lockdep: ttm can recurse > into it's own pagefault handler and then deadlock, the trylock just > keeps lockdep quiet. We've had that bug arise in drm/i915 due to some > fun userspace did and now have testcases for them. The right solution > to fix this is to use copy_to|from_user_atomic in ttm everywhere it > holds locks and have slowpaths which drops locks, copies stuff into a > temp allocation and then continues. At least that's how we've fixed > all those inversions in i915-gem. I'm not volunteering to fix this ;-) Ah the case where a mmap'd address is passed to the execbuf ioctl? :P Fine I'll look into it a bit, hopefully before tuesday. Else it might take a bit longer since I'll be on my way to plumbers.. >>> I think a possible fix would be if fault() were allowed to return an error >>> and drop the mmap_sem() before returning. >>> >>> Otherwise we need to track down all copy_to_user / copy_from_user which >>> happen with bo::reserve held. > > Actually, from looking at the mm code, it seems OK to do the following: > > if (!bo_tryreserve()) { > up_read mmap_sem(); // Release the mmap_sem to avoid deadlocks. > bo_reserve(); // Wait for the BO to become available > (interruptible) > bo_unreserve(); // Where is bo_wait_unreserved() when we need > it, Maarten :P > return VM_FAULT_RETRY; // Go ahead and retry the VMA walk, after > regrabbing > } Is this meant as a jab at me? You're doing locking wrong here! Again! > Somebody conveniently added a VM_FAULT_RETRY, but for a different purpose. > > If possible, I suggest to take this route for now to avoid the mess of > changing locking order in all TTM drivers, with > all give-up-locking slowpaths that comes with it. IIRC it took some time for > i915 to get that right, and completely get rid of all lockdep warnings. Sorry, but it's still the right thing to do. I can convert nouveau and take a look at radeon. Locking slowpaths are easy to test too with CONFIG_DEBUG_WW_MUTEX_SLOWPATH. Just because it's harder, doesn't mean we have to avoid doing it. The might_fault function will verify the usage of mmap_sem with lockdep automatically when PROVE_LOCKING=y. This means that any copy_from_user / copy_to_user will always check mmap_sem. > This will keep the official locking order > bo::reserve > mmap_sem Disagree, fix the order and the trylock and 'wait for unreserved' half assed locking will disappear. ~Maarten ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.or
Re: [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE
On 09/13/2013 09:16 AM, Maarten Lankhorst wrote: Op 13-09-13 08:44, Thomas Hellstrom schreef: On 09/12/2013 11:50 PM, Maarten Lankhorst wrote: Op 12-09-13 18:44, Thomas Hellstrom schreef: On 09/12/2013 05:45 PM, Maarten Lankhorst wrote: Op 12-09-13 17:36, Daniel Vetter schreef: On Thu, Sep 12, 2013 at 5:06 PM, Peter Zijlstra wrote: So I'm poking around the preemption code and stumbled upon: drivers/gpu/drm/i915/i915_gem.c:set_need_resched(); drivers/gpu/drm/ttm/ttm_bo_vm.c:set_need_resched(); drivers/gpu/drm/ttm/ttm_bo_vm.c:set_need_resched(); drivers/gpu/drm/udl/udl_gem.c: set_need_resched(); All these sites basically do: while (!trylock()) yield(); which is a horrible and broken locking pattern. Firstly its deadlock prone, suppose the faulting process is a FIFOn+1 task that preempted the lock holder at FIFOn. Secondly the implementation is worse than usual by abusing VM_FAULT_NOPAGE, which is supposed to install a PTE so that the fault doesn't retry, but you're using it as a get out of fault path. And you're using set_need_resched() which is not something a driver should _ever_ touch. Now I'm going to take away set_need_resched() -- and while you can 'reimplement' it using set_thread_flag() you're not going to do that because it will be broken due to changes to the preempt code. So please as to fix ASAP and don't allow anybody to trick you into merging silly things like that again ;-) The set_need_resched in i915_gem.c:i915_gem_fault can actually be removed. It was there to give the error handler a chance to sneak in and reset the hw/sw tracking when the gpu is dead. That hack goes back to the days when the locking around our error handler was somewhere between nonexistent and totally broken, nowadays we keep things from live-locking by a bit of magic in i915_mutex_lock_interruptible. I'll whip up a patch to rip this out. I'll also check that our testsuite properly exercises this path (needs a bit of work on a quick look for better coverage). The one in ttm is just bonghits to shut up lockdep: ttm can recurse into it's own pagefault handler and then deadlock, the trylock just keeps lockdep quiet. We've had that bug arise in drm/i915 due to some fun userspace did and now have testcases for them. The right solution to fix this is to use copy_to|from_user_atomic in ttm everywhere it holds locks and have slowpaths which drops locks, copies stuff into a temp allocation and then continues. At least that's how we've fixed all those inversions in i915-gem. I'm not volunteering to fix this ;-) Ah the case where a mmap'd address is passed to the execbuf ioctl? :P Fine I'll look into it a bit, hopefully before tuesday. Else it might take a bit longer since I'll be on my way to plumbers.. I think a possible fix would be if fault() were allowed to return an error and drop the mmap_sem() before returning. Otherwise we need to track down all copy_to_user / copy_from_user which happen with bo::reserve held. Actually, from looking at the mm code, it seems OK to do the following: if (!bo_tryreserve()) { up_read mmap_sem(); // Release the mmap_sem to avoid deadlocks. bo_reserve(); // Wait for the BO to become available (interruptible) bo_unreserve(); // Where is bo_wait_unreserved() when we need it, Maarten :P return VM_FAULT_RETRY; // Go ahead and retry the VMA walk, after regrabbing } Is this meant as a jab at me? You're doing locking wrong here! Again! It's not meant as a jab at you. I'm sorry if it came out that way. It was meant as a joke. I wasn't aware the topic was sensitive. Anyway, could you describe what is wrong, with the above solution, because it seems perfectly legal to me. There is no substantial overhead, and there is no risc of deadlocks. Or do you mean it's bad because it confuses lockdep? /Thomas ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE
Op 13-09-13 09:46, Thomas Hellstrom schreef: > On 09/13/2013 09:16 AM, Maarten Lankhorst wrote: >> Op 13-09-13 08:44, Thomas Hellstrom schreef: >>> On 09/12/2013 11:50 PM, Maarten Lankhorst wrote: Op 12-09-13 18:44, Thomas Hellstrom schreef: > On 09/12/2013 05:45 PM, Maarten Lankhorst wrote: >> Op 12-09-13 17:36, Daniel Vetter schreef: >>> On Thu, Sep 12, 2013 at 5:06 PM, Peter Zijlstra >>> wrote: So I'm poking around the preemption code and stumbled upon: drivers/gpu/drm/i915/i915_gem.c:set_need_resched(); drivers/gpu/drm/ttm/ttm_bo_vm.c: set_need_resched(); drivers/gpu/drm/ttm/ttm_bo_vm.c: set_need_resched(); drivers/gpu/drm/udl/udl_gem.c: set_need_resched(); All these sites basically do: while (!trylock()) yield(); which is a horrible and broken locking pattern. Firstly its deadlock prone, suppose the faulting process is a FIFOn+1 task that preempted the lock holder at FIFOn. Secondly the implementation is worse than usual by abusing VM_FAULT_NOPAGE, which is supposed to install a PTE so that the fault doesn't retry, but you're using it as a get out of fault path. And you're using set_need_resched() which is not something a driver should _ever_ touch. Now I'm going to take away set_need_resched() -- and while you can 'reimplement' it using set_thread_flag() you're not going to do that because it will be broken due to changes to the preempt code. So please as to fix ASAP and don't allow anybody to trick you into merging silly things like that again ;-) >>> The set_need_resched in i915_gem.c:i915_gem_fault can actually be >>> removed. It was there to give the error handler a chance to sneak in >>> and reset the hw/sw tracking when the gpu is dead. That hack goes back >>> to the days when the locking around our error handler was somewhere >>> between nonexistent and totally broken, nowadays we keep things from >>> live-locking by a bit of magic in i915_mutex_lock_interruptible. I'll >>> whip up a patch to rip this out. I'll also check that our testsuite >>> properly exercises this path (needs a bit of work on a quick look for >>> better coverage). >>> >>> The one in ttm is just bonghits to shut up lockdep: ttm can recurse >>> into it's own pagefault handler and then deadlock, the trylock just >>> keeps lockdep quiet. We've had that bug arise in drm/i915 due to some >>> fun userspace did and now have testcases for them. The right solution >>> to fix this is to use copy_to|from_user_atomic in ttm everywhere it >>> holds locks and have slowpaths which drops locks, copies stuff into a >>> temp allocation and then continues. At least that's how we've fixed >>> all those inversions in i915-gem. I'm not volunteering to fix this ;-) >> Ah the case where a mmap'd address is passed to the execbuf ioctl? :P >> >> Fine I'll look into it a bit, hopefully before tuesday. Else it might >> take a bit longer since I'll be on my way to plumbers.. > I think a possible fix would be if fault() were allowed to return an > error and drop the mmap_sem() before returning. > > Otherwise we need to track down all copy_to_user / copy_from_user which > happen with bo::reserve held. >>> Actually, from looking at the mm code, it seems OK to do the following: >>> >>> if (!bo_tryreserve()) { >>> up_read mmap_sem(); // Release the mmap_sem to avoid deadlocks. >>> bo_reserve(); // Wait for the BO to become available >>> (interruptible) >>> bo_unreserve(); // Where is bo_wait_unreserved() when we >>> need it, Maarten :P >>> return VM_FAULT_RETRY; // Go ahead and retry the VMA walk, after >>> regrabbing >>> } >> Is this meant as a jab at me? You're doing locking wrong here! Again! > > It's not meant as a jab at you. I'm sorry if it came out that way. It was > meant as a joke. I wasn't aware the topic was sensitive. > > Anyway, could you describe what is wrong, with the above solution, because it > seems perfectly legal to me. > There is no substantial overhead, and there is no risc of deadlocks. Or do > you mean it's bad because it confuses lockdep? Evil userspace can pass a bo as pointer to use for relocation lists, lockdep will warn when that locks up, but still.. This is already a problem now, and your fixing will only cause lockdep to explicitly warn on it. You can make a complicated user program to test this, or simply use this function for debugging: void ttm_might_fault(void) { struct reservation_object obj; reservation_object_init(&obj); ww_mutex_lock(&obj.lock, NULL); ww_mutex_unlock(&
Re: [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE
On 09/13/2013 09:51 AM, Maarten Lankhorst wrote: Op 13-09-13 09:46, Thomas Hellstrom schreef: On 09/13/2013 09:16 AM, Maarten Lankhorst wrote: Op 13-09-13 08:44, Thomas Hellstrom schreef: On 09/12/2013 11:50 PM, Maarten Lankhorst wrote: Op 12-09-13 18:44, Thomas Hellstrom schreef: On 09/12/2013 05:45 PM, Maarten Lankhorst wrote: Op 12-09-13 17:36, Daniel Vetter schreef: On Thu, Sep 12, 2013 at 5:06 PM, Peter Zijlstra wrote: So I'm poking around the preemption code and stumbled upon: drivers/gpu/drm/i915/i915_gem.c:set_need_resched(); drivers/gpu/drm/ttm/ttm_bo_vm.c:set_need_resched(); drivers/gpu/drm/ttm/ttm_bo_vm.c:set_need_resched(); drivers/gpu/drm/udl/udl_gem.c: set_need_resched(); All these sites basically do: while (!trylock()) yield(); which is a horrible and broken locking pattern. Firstly its deadlock prone, suppose the faulting process is a FIFOn+1 task that preempted the lock holder at FIFOn. Secondly the implementation is worse than usual by abusing VM_FAULT_NOPAGE, which is supposed to install a PTE so that the fault doesn't retry, but you're using it as a get out of fault path. And you're using set_need_resched() which is not something a driver should _ever_ touch. Now I'm going to take away set_need_resched() -- and while you can 'reimplement' it using set_thread_flag() you're not going to do that because it will be broken due to changes to the preempt code. So please as to fix ASAP and don't allow anybody to trick you into merging silly things like that again ;-) The set_need_resched in i915_gem.c:i915_gem_fault can actually be removed. It was there to give the error handler a chance to sneak in and reset the hw/sw tracking when the gpu is dead. That hack goes back to the days when the locking around our error handler was somewhere between nonexistent and totally broken, nowadays we keep things from live-locking by a bit of magic in i915_mutex_lock_interruptible. I'll whip up a patch to rip this out. I'll also check that our testsuite properly exercises this path (needs a bit of work on a quick look for better coverage). The one in ttm is just bonghits to shut up lockdep: ttm can recurse into it's own pagefault handler and then deadlock, the trylock just keeps lockdep quiet. We've had that bug arise in drm/i915 due to some fun userspace did and now have testcases for them. The right solution to fix this is to use copy_to|from_user_atomic in ttm everywhere it holds locks and have slowpaths which drops locks, copies stuff into a temp allocation and then continues. At least that's how we've fixed all those inversions in i915-gem. I'm not volunteering to fix this ;-) Ah the case where a mmap'd address is passed to the execbuf ioctl? :P Fine I'll look into it a bit, hopefully before tuesday. Else it might take a bit longer since I'll be on my way to plumbers.. I think a possible fix would be if fault() were allowed to return an error and drop the mmap_sem() before returning. Otherwise we need to track down all copy_to_user / copy_from_user which happen with bo::reserve held. Actually, from looking at the mm code, it seems OK to do the following: if (!bo_tryreserve()) { up_read mmap_sem(); // Release the mmap_sem to avoid deadlocks. bo_reserve(); // Wait for the BO to become available (interruptible) bo_unreserve(); // Where is bo_wait_unreserved() when we need it, Maarten :P return VM_FAULT_RETRY; // Go ahead and retry the VMA walk, after regrabbing } Is this meant as a jab at me? You're doing locking wrong here! Again! It's not meant as a jab at you. I'm sorry if it came out that way. It was meant as a joke. I wasn't aware the topic was sensitive. Anyway, could you describe what is wrong, with the above solution, because it seems perfectly legal to me. There is no substantial overhead, and there is no risc of deadlocks. Or do you mean it's bad because it confuses lockdep? Evil userspace can pass a bo as pointer to use for relocation lists, lockdep will warn when that locks up, but still.. This is already a problem now, and your fixing will only cause lockdep to explicitly warn on it. As previously mentioned, copy_from_user should return -EFAULT, since the VMAs are marked with VM_IO. It should not recurse into fault(), so evil user-space looses. You can make a complicated user program to test this, or simply use this function for debugging: void ttm_might_fault(void) { struct reservation_object obj; reservation_object_init(&obj); ww_mutex_lock(&obj.lock, NULL); ww_mutex_unlock(&obj.lock); reservation_object_fini(&obj); } Put it near every instance of copy_to_user/copy_from_user and you'll find the bugs. :) I'm still not convinced that there are any problems with this solution. Did you take what's said above into account? Now, could we try to approach this based on pros and cons? Let's say we would
Re: [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE
On Fri, Sep 13, 2013 at 7:33 AM, Thomas Hellstrom wrote: > Given that all copy_to_user / copy_from_user paths are actually hit during > testing, right? Ime it requires a bit of ingenuity to properly test this from userspace. We're using a few tricks in drm/i915 kernel testing: - When we hand a gtt mmap pointer to execbuf or other ioctls we upload the data in there through pwrite (or if you don't have that use the gpu to blt it there). This way you can careful control when the pagefault will happen. Also since we supply correct data we can make sure that the kernel actually does the right thing and not just whether it'll blow up. - We have a module parameter which can be changed at runtime to disable all the prefaulting we're doing. - We have a debugfs interface to drop caches/evict lrus. If you have a parallel thread that regularly forces the inactive list to be evicted we can force a refault even after the first fault already happend. That's useful to test the slowpath after a slowpath already happened, e.g. when trying to copy reloc offset out to userspace after execbuf completed. With these tricks we have imo great test coverage for i915.ko and more important good assurance that any regressions in this tricky code will get caught. Cheers, Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE
On Fri, Sep 13, 2013 at 09:46:03AM +0200, Thomas Hellstrom wrote: > >>if (!bo_tryreserve()) { > >> up_read mmap_sem(); // Release the mmap_sem to avoid deadlocks. > >> bo_reserve(); // Wait for the BO to become available > >> (interruptible) > >> bo_unreserve(); // Where is bo_wait_unreserved() when we > >> need it, Maarten :P > >> return VM_FAULT_RETRY; // Go ahead and retry the VMA walk, after > >> regrabbing > >>} > > Anyway, could you describe what is wrong, with the above solution, because > it seems perfectly legal to me. Luckily the rule of law doesn't have anything to do with this stuff -- at least I sincerely hope so. The thing that's wrong with that pattern is that its still not deterministic - although its a lot better than the pure trylock. Because you have to release and re-acquire with the trylock another user might have gotten in again. Its utterly prone to starvation. The acquire+release does remove the dead/life-lock scenario from the FIFO case, since blocking on the acquire will allow the other task to run (or even get boosted on -rt). Aside from that there's nothing particularly wrong with it and lockdep should be happy afaict (but I haven't had my morning juice yet). ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE
On Fri, Sep 13, 2013 at 10:23 AM, Thomas Hellstrom wrote: > As previously mentioned, copy_from_user should return -EFAULT, since the > VMAs are marked with VM_IO. It should not recurse into fault(), so evil > user-space looses. I haven't put a printk in the code to prove this, but gem mmap also sets VM_IO in drm_gem_mmap_obj. And we can very much hit our own fault handler and deadlock On a _very_ quick reading (and definitely not enough coffee yet for reading mm/* stuff) it looks like it's get_user_pages that will return an -EFAULT when hitting upon a VM_IO mapping (which makes sense since there's really no page backing it). Actually using get_user_pages was the original slowpath we've had in a few places until we've noticed that for pwrite that breaks legit userspace (the glBufferData(glMap)) use-case), so we've switched to lock dropping and proper slowpaths using copy_*_user everywhere instead of trying to pin the userspace storage with get_user_pages. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE
On 09/13/2013 10:32 AM, Daniel Vetter wrote: On Fri, Sep 13, 2013 at 10:23 AM, Thomas Hellstrom wrote: As previously mentioned, copy_from_user should return -EFAULT, since the VMAs are marked with VM_IO. It should not recurse into fault(), so evil user-space looses. I haven't put a printk in the code to prove this, but gem mmap also sets VM_IO in drm_gem_mmap_obj. And we can very much hit our own fault handler and deadlock If this is indeed true, I guess I need to accept the fact that my solution is bad. (and worse I can't blame not having my morning coffee). I'll take a deeper look. /Thomas ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE
On Fri, Sep 13, 2013 at 10:29 AM, Peter Zijlstra wrote: > On Fri, Sep 13, 2013 at 09:46:03AM +0200, Thomas Hellstrom wrote: >> >>if (!bo_tryreserve()) { >> >> up_read mmap_sem(); // Release the mmap_sem to avoid deadlocks. >> >> bo_reserve(); // Wait for the BO to become available >> >> (interruptible) >> >> bo_unreserve(); // Where is bo_wait_unreserved() when we >> >> need it, Maarten :P >> >> return VM_FAULT_RETRY; // Go ahead and retry the VMA walk, after >> >> regrabbing >> >>} >> >> Anyway, could you describe what is wrong, with the above solution, because >> it seems perfectly legal to me. > > Luckily the rule of law doesn't have anything to do with this stuff -- > at least I sincerely hope so. > > The thing that's wrong with that pattern is that its still not > deterministic - although its a lot better than the pure trylock. Because > you have to release and re-acquire with the trylock another user might > have gotten in again. Its utterly prone to starvation. > > The acquire+release does remove the dead/life-lock scenario from the > FIFO case, since blocking on the acquire will allow the other task to > run (or even get boosted on -rt). > > Aside from that there's nothing particularly wrong with it and lockdep > should be happy afaict (but I haven't had my morning juice yet). bo_reserve internally maps to a ww-mutex and task can already hold ww-mutex (potentially even the same for especially nasty userspace). So lockdep will complain and I think the only way to properly solve this is to have lock-dropping slowpaths around all copy_*_user callsites that already hold a bo_reserve ww_mutex. At least that's been my conclusion after much head-banging against this issue for drm/i915, and we've tried a lot approaches ;-) -Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [Intel-gfx] [PATCH 1/2] drm/i915: kill set_need_resched
On Fri, Sep 13, 2013 at 2:59 AM, Rob Clark wrote: > I guess in i915 (and ttm) case, the issue arises due to need for CPU > access to buffer via GTT? In which case I should be safe to drop the > set_need_resched() as well? (Since CPU always has direct access to the > pages.) Or am I missing something about the original issue that > necessitated set_need_resched()? For drm/i915 the _only_ reason we've had it was to avoid life-locking with our gpu reset work when the gpu hung. We've fixed that properly now by using a wait-queue to stall when a gpu reset is pending and proper locking in the gpu reset handler (plus tons of evil tests to make sure it doesn't break, there's rather fragile lock-dropping and tricky ordering involved). So if you don't have i915's broken gpu reset handling from yonder you don't need our cargo-cult. ttm's usage with a trylock+yield is a different form of duct-tape to paper over locking inversions between copy_*_user callsites and the pagefault handler. In any case there's no way it actually works properly ;-) Cheers, Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE
Op 13-09-13 10:23, Thomas Hellstrom schreef: > On 09/13/2013 09:51 AM, Maarten Lankhorst wrote: >> Op 13-09-13 09:46, Thomas Hellstrom schreef: >>> On 09/13/2013 09:16 AM, Maarten Lankhorst wrote: Op 13-09-13 08:44, Thomas Hellstrom schreef: > On 09/12/2013 11:50 PM, Maarten Lankhorst wrote: >> Op 12-09-13 18:44, Thomas Hellstrom schreef: >>> On 09/12/2013 05:45 PM, Maarten Lankhorst wrote: Op 12-09-13 17:36, Daniel Vetter schreef: > On Thu, Sep 12, 2013 at 5:06 PM, Peter Zijlstra > wrote: >> So I'm poking around the preemption code and stumbled upon: >> >> drivers/gpu/drm/i915/i915_gem.c:set_need_resched(); >> drivers/gpu/drm/ttm/ttm_bo_vm.c: >> set_need_resched(); >> drivers/gpu/drm/ttm/ttm_bo_vm.c: >> set_need_resched(); >> drivers/gpu/drm/udl/udl_gem.c: set_need_resched(); >> >> All these sites basically do: >> >> while (!trylock()) >> yield(); >> >> which is a horrible and broken locking pattern. >> >> Firstly its deadlock prone, suppose the faulting process is a FIFOn+1 >> task that preempted the lock holder at FIFOn. >> >> Secondly the implementation is worse than usual by abusing >> VM_FAULT_NOPAGE, which is supposed to install a PTE so that the fault >> doesn't retry, but you're using it as a get out of fault path. And >> you're using set_need_resched() which is not something a driver >> should >> _ever_ touch. >> >> Now I'm going to take away set_need_resched() -- and while you can >> 'reimplement' it using set_thread_flag() you're not going to do that >> because it will be broken due to changes to the preempt code. >> >> So please as to fix ASAP and don't allow anybody to trick you into >> merging silly things like that again ;-) > The set_need_resched in i915_gem.c:i915_gem_fault can actually be > removed. It was there to give the error handler a chance to sneak in > and reset the hw/sw tracking when the gpu is dead. That hack goes back > to the days when the locking around our error handler was somewhere > between nonexistent and totally broken, nowadays we keep things from > live-locking by a bit of magic in i915_mutex_lock_interruptible. I'll > whip up a patch to rip this out. I'll also check that our testsuite > properly exercises this path (needs a bit of work on a quick look for > better coverage). > > The one in ttm is just bonghits to shut up lockdep: ttm can recurse > into it's own pagefault handler and then deadlock, the trylock just > keeps lockdep quiet. We've had that bug arise in drm/i915 due to some > fun userspace did and now have testcases for them. The right solution > to fix this is to use copy_to|from_user_atomic in ttm everywhere it > holds locks and have slowpaths which drops locks, copies stuff into a > temp allocation and then continues. At least that's how we've fixed > all those inversions in i915-gem. I'm not volunteering to fix this ;-) Ah the case where a mmap'd address is passed to the execbuf ioctl? :P Fine I'll look into it a bit, hopefully before tuesday. Else it might take a bit longer since I'll be on my way to plumbers.. >>> I think a possible fix would be if fault() were allowed to return an >>> error and drop the mmap_sem() before returning. >>> >>> Otherwise we need to track down all copy_to_user / copy_from_user which >>> happen with bo::reserve held. > Actually, from looking at the mm code, it seems OK to do the following: > > if (!bo_tryreserve()) { > up_read mmap_sem(); // Release the mmap_sem to avoid deadlocks. > bo_reserve(); // Wait for the BO to become available > (interruptible) > bo_unreserve(); // Where is bo_wait_unreserved() when we > need it, Maarten :P > return VM_FAULT_RETRY; // Go ahead and retry the VMA walk, after > regrabbing > } Is this meant as a jab at me? You're doing locking wrong here! Again! >>> It's not meant as a jab at you. I'm sorry if it came out that way. It was >>> meant as a joke. I wasn't aware the topic was sensitive. >>> >>> Anyway, could you describe what is wrong, with the above solution, because >>> it seems perfectly legal to me. >>> There is no substantial overhead, and there is no risc of deadlocks. Or do >>> you mean it's bad because it confuses lockdep? >> Evil userspace can pass a bo as pointer to use for relocation lists, lockdep >> will warn when that locks up, but still.. >> This is already a problem now, and your fixin
Re: [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE
On Fri, Sep 13, 2013 at 10:41:54AM +0200, Daniel Vetter wrote: > On Fri, Sep 13, 2013 at 10:29 AM, Peter Zijlstra wrote: > > On Fri, Sep 13, 2013 at 09:46:03AM +0200, Thomas Hellstrom wrote: > >> >>if (!bo_tryreserve()) { > >> >> up_read mmap_sem(); // Release the mmap_sem to avoid deadlocks. > >> >> bo_reserve(); // Wait for the BO to become available > >> >> (interruptible) > >> >> bo_unreserve(); // Where is bo_wait_unreserved() when we > >> >> need it, Maarten :P > >> >> return VM_FAULT_RETRY; // Go ahead and retry the VMA walk, after > >> >> regrabbing > >> >>} > >> > >> Anyway, could you describe what is wrong, with the above solution, because > >> it seems perfectly legal to me. > > > > Luckily the rule of law doesn't have anything to do with this stuff -- > > at least I sincerely hope so. > > > > The thing that's wrong with that pattern is that its still not > > deterministic - although its a lot better than the pure trylock. Because > > you have to release and re-acquire with the trylock another user might > > have gotten in again. Its utterly prone to starvation. > > > > The acquire+release does remove the dead/life-lock scenario from the > > FIFO case, since blocking on the acquire will allow the other task to > > run (or even get boosted on -rt). > > > > Aside from that there's nothing particularly wrong with it and lockdep > > should be happy afaict (but I haven't had my morning juice yet). > > bo_reserve internally maps to a ww-mutex and task can already hold > ww-mutex (potentially even the same for especially nasty userspace). OK, yes I wasn't aware of that. Yes in that case you're quite right. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[Bug 60857] Unstable display with Radeon 760G (ASUS M4A78L-M LE)
https://bugzilla.kernel.org/show_bug.cgi?id=60857 --- Comment #8 from Stuart Foster --- Created attachment 108311 --> https://bugzilla.kernel.org/attachment.cgi?id=108311&action=edit Dmesg output -- You are receiving this mail because: You are watching the assignee of the bug. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[Bug 60857] Unstable display with Radeon 760G (ASUS M4A78L-M LE)
https://bugzilla.kernel.org/show_bug.cgi?id=60857 --- Comment #9 from Stuart Foster --- The dmesg is from a vanilla 3.11 kernel with dpm turned on and patch kbug60857.diff applied. -- You are receiving this mail because: You are watching the assignee of the bug. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH 3/4] drm: omap: Enable DT support for DMM
Enable use of DT for DMM/Tiler. Originally worked on by Andy Gross. Cc: Andy Gross Cc: DRI Development Signed-off-by: Archit Taneja --- drivers/gpu/drm/omapdrm/omap_dmm_tiler.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/drivers/gpu/drm/omapdrm/omap_dmm_tiler.c b/drivers/gpu/drm/omapdrm/omap_dmm_tiler.c index acf6678..59f17de 100644 --- a/drivers/gpu/drm/omapdrm/omap_dmm_tiler.c +++ b/drivers/gpu/drm/omapdrm/omap_dmm_tiler.c @@ -968,12 +968,23 @@ static const struct dev_pm_ops omap_dmm_pm_ops = { }; #endif +#if defined(CONFIG_OF) +static const struct of_device_id dmm_of_match[] = { + { .compatible = "ti,omap4-dmm", }, + { .compatible = "ti,omap5-dmm", }, + {}, +}; +#else +#define dmm_of_match NULL +#endif + struct platform_driver omap_dmm_driver = { .probe = omap_dmm_probe, .remove = omap_dmm_remove, .driver = { .owner = THIS_MODULE, .name = DMM_DRIVER_NAME, + .of_match_table = dmm_of_match, #ifdef CONFIG_PM .pm = &omap_dmm_pm_ops, #endif -- 1.8.1.2 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE
On 09/13/2013 10:58 AM, Maarten Lankhorst wrote: Op 13-09-13 10:23, Thomas Hellstrom schreef: On 09/13/2013 09:51 AM, Maarten Lankhorst wrote: Op 13-09-13 09:46, Thomas Hellstrom schreef: On 09/13/2013 09:16 AM, Maarten Lankhorst wrote: Op 13-09-13 08:44, Thomas Hellstrom schreef: On 09/12/2013 11:50 PM, Maarten Lankhorst wrote: Op 12-09-13 18:44, Thomas Hellstrom schreef: On 09/12/2013 05:45 PM, Maarten Lankhorst wrote: Op 12-09-13 17:36, Daniel Vetter schreef: On Thu, Sep 12, 2013 at 5:06 PM, Peter Zijlstra wrote: So I'm poking around the preemption code and stumbled upon: drivers/gpu/drm/i915/i915_gem.c:set_need_resched(); drivers/gpu/drm/ttm/ttm_bo_vm.c:set_need_resched(); drivers/gpu/drm/ttm/ttm_bo_vm.c:set_need_resched(); drivers/gpu/drm/udl/udl_gem.c: set_need_resched(); All these sites basically do: while (!trylock()) yield(); which is a horrible and broken locking pattern. Firstly its deadlock prone, suppose the faulting process is a FIFOn+1 task that preempted the lock holder at FIFOn. Secondly the implementation is worse than usual by abusing VM_FAULT_NOPAGE, which is supposed to install a PTE so that the fault doesn't retry, but you're using it as a get out of fault path. And you're using set_need_resched() which is not something a driver should _ever_ touch. Now I'm going to take away set_need_resched() -- and while you can 'reimplement' it using set_thread_flag() you're not going to do that because it will be broken due to changes to the preempt code. So please as to fix ASAP and don't allow anybody to trick you into merging silly things like that again ;-) The set_need_resched in i915_gem.c:i915_gem_fault can actually be removed. It was there to give the error handler a chance to sneak in and reset the hw/sw tracking when the gpu is dead. That hack goes back to the days when the locking around our error handler was somewhere between nonexistent and totally broken, nowadays we keep things from live-locking by a bit of magic in i915_mutex_lock_interruptible. I'll whip up a patch to rip this out. I'll also check that our testsuite properly exercises this path (needs a bit of work on a quick look for better coverage). The one in ttm is just bonghits to shut up lockdep: ttm can recurse into it's own pagefault handler and then deadlock, the trylock just keeps lockdep quiet. We've had that bug arise in drm/i915 due to some fun userspace did and now have testcases for them. The right solution to fix this is to use copy_to|from_user_atomic in ttm everywhere it holds locks and have slowpaths which drops locks, copies stuff into a temp allocation and then continues. At least that's how we've fixed all those inversions in i915-gem. I'm not volunteering to fix this ;-) Ah the case where a mmap'd address is passed to the execbuf ioctl? :P Fine I'll look into it a bit, hopefully before tuesday. Else it might take a bit longer since I'll be on my way to plumbers.. I think a possible fix would be if fault() were allowed to return an error and drop the mmap_sem() before returning. Otherwise we need to track down all copy_to_user / copy_from_user which happen with bo::reserve held. Actually, from looking at the mm code, it seems OK to do the following: if (!bo_tryreserve()) { up_read mmap_sem(); // Release the mmap_sem to avoid deadlocks. bo_reserve(); // Wait for the BO to become available (interruptible) bo_unreserve(); // Where is bo_wait_unreserved() when we need it, Maarten :P return VM_FAULT_RETRY; // Go ahead and retry the VMA walk, after regrabbing } Is this meant as a jab at me? You're doing locking wrong here! Again! It's not meant as a jab at you. I'm sorry if it came out that way. It was meant as a joke. I wasn't aware the topic was sensitive. Anyway, could you describe what is wrong, with the above solution, because it seems perfectly legal to me. There is no substantial overhead, and there is no risc of deadlocks. Or do you mean it's bad because it confuses lockdep? Evil userspace can pass a bo as pointer to use for relocation lists, lockdep will warn when that locks up, but still.. This is already a problem now, and your fixing will only cause lockdep to explicitly warn on it. As previously mentioned, copy_from_user should return -EFAULT, since the VMAs are marked with VM_IO. It should not recurse into fault(), so evil user-space looses. You can make a complicated user program to test this, or simply use this function for debugging: void ttm_might_fault(void) { struct reservation_object obj; reservation_object_init(&obj); ww_mutex_lock(&obj.lock, NULL); ww_mutex_unlock(&obj.lock); reservation_object_fini(&obj); } Put it near every instance of copy_to_user/copy_from_user and you'll find the bugs. :) I'm still not convinced that there are any problems with this solution. Did you take what's said abo
[PATCH] drm, ttm Fix uninitialized warning
Fix uninitialized warning. drivers/gpu/drm/ttm/ttm_object.c: In function ‘ttm_base_object_lookup’: drivers/gpu/drm/ttm/ttm_object.c:213:10: error: ‘base’ may be used uninitialized in this function [-Werror=maybe-uninitialized] kref_put(&base->refcount, ttm_release_base); ^ drivers/gpu/drm/ttm/ttm_object.c:221:26: note: ‘base’ was declared here struct ttm_base_object *base; Signed-off-by: Prarit Bhargava Cc: David Airlie Cc: rcl...@redhat.com --- drivers/gpu/drm/ttm/ttm_object.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/ttm/ttm_object.c b/drivers/gpu/drm/ttm/ttm_object.c index 58a5f32..a868176 100644 --- a/drivers/gpu/drm/ttm/ttm_object.c +++ b/drivers/gpu/drm/ttm/ttm_object.c @@ -218,7 +218,7 @@ struct ttm_base_object *ttm_base_object_lookup(struct ttm_object_file *tfile, uint32_t key) { struct ttm_object_device *tdev = tfile->tdev; - struct ttm_base_object *base; + struct ttm_base_object *uninitialized_var(base); struct drm_hash_item *hash; int ret; -- 1.7.9.3 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[Bug 60857] Unstable display with Radeon 760G (ASUS M4A78L-M LE)
https://bugzilla.kernel.org/show_bug.cgi?id=60857 --- Comment #10 from Alex Deucher --- Created attachment 108321 --> https://bugzilla.kernel.org/attachment.cgi?id=108321&action=edit skip_set_power_state Here are two patches to test. Test them independently and let me know if either one helps. -- You are receiving this mail because: You are watching the assignee of the bug. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[Bug 60857] Unstable display with Radeon 760G (ASUS M4A78L-M LE)
https://bugzilla.kernel.org/show_bug.cgi?id=60857 --- Comment #11 from Alex Deucher --- Created attachment 108331 --> https://bugzilla.kernel.org/attachment.cgi?id=108331&action=edit skip_clock_scaling -- You are receiving this mail because: You are watching the assignee of the bug. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH] drm, ttm Fix uninitialized warning
On Fri, Sep 13, 2013 at 8:33 AM, Prarit Bhargava wrote: > Fix uninitialized warning. > > drivers/gpu/drm/ttm/ttm_object.c: In function ‘ttm_base_object_lookup’: > drivers/gpu/drm/ttm/ttm_object.c:213:10: error: ‘base’ may be used > uninitialized in this function [-Werror=maybe-uninitialized] > kref_put(&base->refcount, ttm_release_base); > ^ > drivers/gpu/drm/ttm/ttm_object.c:221:26: note: ‘base’ was declared here > struct ttm_base_object *base; > > Signed-off-by: Prarit Bhargava > Cc: David Airlie > Cc: rcl...@redhat.com Reviewed-by: Rob Clark > --- > drivers/gpu/drm/ttm/ttm_object.c |2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/ttm/ttm_object.c > b/drivers/gpu/drm/ttm/ttm_object.c > index 58a5f32..a868176 100644 > --- a/drivers/gpu/drm/ttm/ttm_object.c > +++ b/drivers/gpu/drm/ttm/ttm_object.c > @@ -218,7 +218,7 @@ struct ttm_base_object *ttm_base_object_lookup(struct > ttm_object_file *tfile, >uint32_t key) > { > struct ttm_object_device *tdev = tfile->tdev; > - struct ttm_base_object *base; > + struct ttm_base_object *uninitialized_var(base); > struct drm_hash_item *hash; > int ret; > > -- > 1.7.9.3 > > ___ > dri-devel mailing list > dri-devel@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/dri-devel ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH 3/4] drm: omap: Enable DT support for DMM
On Fri, Sep 13, 2013 at 5:14 AM, Archit Taneja wrote: > Enable use of DT for DMM/Tiler. > > Originally worked on by Andy Gross. looks good.. but do we want to get information about # of LUT's, etc, from DT? Or did we decide that we can reliably get this from the hw? I lost track of that discussion (I guess Andy would remember).. BR, -R > Cc: Andy Gross > Cc: DRI Development > Signed-off-by: Archit Taneja > --- > drivers/gpu/drm/omapdrm/omap_dmm_tiler.c | 11 +++ > 1 file changed, 11 insertions(+) > > diff --git a/drivers/gpu/drm/omapdrm/omap_dmm_tiler.c > b/drivers/gpu/drm/omapdrm/omap_dmm_tiler.c > index acf6678..59f17de 100644 > --- a/drivers/gpu/drm/omapdrm/omap_dmm_tiler.c > +++ b/drivers/gpu/drm/omapdrm/omap_dmm_tiler.c > @@ -968,12 +968,23 @@ static const struct dev_pm_ops omap_dmm_pm_ops = { > }; > #endif > > +#if defined(CONFIG_OF) > +static const struct of_device_id dmm_of_match[] = { > + { .compatible = "ti,omap4-dmm", }, > + { .compatible = "ti,omap5-dmm", }, > + {}, > +}; > +#else > +#define dmm_of_match NULL > +#endif > + > struct platform_driver omap_dmm_driver = { > .probe = omap_dmm_probe, > .remove = omap_dmm_remove, > .driver = { > .owner = THIS_MODULE, > .name = DMM_DRIVER_NAME, > + .of_match_table = dmm_of_match, > #ifdef CONFIG_PM > .pm = &omap_dmm_pm_ops, > #endif > -- > 1.8.1.2 > ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[Bug 60857] Unstable display with Radeon 760G (ASUS M4A78L-M LE)
https://bugzilla.kernel.org/show_bug.cgi?id=60857 --- Comment #12 from Stuart Foster --- (In reply to Alex Deucher from comment #11) > Created attachment 108331 [details] > skip_clock_scaling Both patches appear to work ok. -- You are receiving this mail because: You are watching the assignee of the bug. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[Bug 59649] [r600][RV635] GPU lockup CP stall / GPU resets over and over - Kernel 3.7 to 3.11 inclusive
https://bugs.freedesktop.org/show_bug.cgi?id=59649 Shawn Starr changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #9 from Shawn Starr --- Closing, I have not had any resets anymore with the respective code changes. Much thanks to Alex for finding this issue! -- You are receiving this mail because: You are the assignee for the bug. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[Bug 69321] starting openCL crashes/boots system
https://bugs.freedesktop.org/show_bug.cgi?id=69321 Alex Deucher changed: What|Removed |Added Assignee|mesa-dev@lists.freedesktop. |dri-devel@lists.freedesktop |org |.org Component|Other |Drivers/Gallium/r600 --- Comment #2 from Alex Deucher --- Can you bisect? -- You are receiving this mail because: You are the assignee for the bug. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH 1/4] drm/radeon/dpm/rs780: use drm_mode_vrefresh()
Rather than open coding it. Signed-off-by: Alex Deucher --- drivers/gpu/drm/radeon/rs780_dpm.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/gpu/drm/radeon/rs780_dpm.c b/drivers/gpu/drm/radeon/rs780_dpm.c index 828a776..afb7584 100644 --- a/drivers/gpu/drm/radeon/rs780_dpm.c +++ b/drivers/gpu/drm/radeon/rs780_dpm.c @@ -62,9 +62,7 @@ static void rs780_get_pm_mode_parameters(struct radeon_device *rdev) radeon_crtc = to_radeon_crtc(crtc); pi->crtc_id = radeon_crtc->crtc_id; if (crtc->mode.htotal && crtc->mode.vtotal) - pi->refresh_rate = - (crtc->mode.clock * 1000) / - (crtc->mode.htotal * crtc->mode.vtotal); + pi->refresh_rate = drm_mode_vrefresh(&crtc->mode); break; } } -- 1.8.3.1 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH 2/4] drm/radeon/dpm/rs780: add some sanity checking to sclk scaling
Since the clock scaling is based on fb divider adjustments, make sure the other pll parameters are the same. Signed-off-by: Alex Deucher --- drivers/gpu/drm/radeon/rs780_dpm.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/gpu/drm/radeon/rs780_dpm.c b/drivers/gpu/drm/radeon/rs780_dpm.c index afb7584..31487ce 100644 --- a/drivers/gpu/drm/radeon/rs780_dpm.c +++ b/drivers/gpu/drm/radeon/rs780_dpm.c @@ -449,6 +449,12 @@ static int rs780_set_engine_clock_scaling(struct radeon_device *rdev, if (ret) return ret; + if ((min_dividers.ref_div != max_dividers.ref_div) || + (min_dividers.post_div != max_dividers.post_div) || + (max_dividers.ref_div != current_max_dividers.ref_div) || + (max_dividers.post_div != current_max_dividers.post_div)) + return -EINVAL; + rs780_force_fbdiv(rdev, max_dividers.fb_div); if (max_dividers.fb_div > min_dividers.fb_div) { -- 1.8.3.1 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH 3/4] drm/radeon/dpm/rs780: don't enable sclk scaling if not required
If the low and high sclks are the same, there is no need to enable sclk scaling. This causes display stability issues on certain boards. Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=60857 Signed-off-by: Alex Deucher Cc: sta...@vger.kernel.org --- drivers/gpu/drm/radeon/rs780_dpm.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/radeon/rs780_dpm.c b/drivers/gpu/drm/radeon/rs780_dpm.c index 31487ce..eb336bf 100644 --- a/drivers/gpu/drm/radeon/rs780_dpm.c +++ b/drivers/gpu/drm/radeon/rs780_dpm.c @@ -499,6 +499,9 @@ static void rs780_activate_engine_clk_scaling(struct radeon_device *rdev, (new_state->sclk_low == old_state->sclk_low)) return; + if (new_state->sclk_high == new_state->sclk_low) + return; + rs780_clk_scaling_enable(rdev, true); } -- 1.8.3.1 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH 4/4] drm/radeon/dpm/rs780: fix force_performance state for same sclks
If the low and high sclks within a power state are the same, there no need to enable sclk scaling. Enabling sclk scaling can cause display stability issues on some boards. Signed-off-by: Alex Deucher --- drivers/gpu/drm/radeon/rs780_dpm.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/radeon/rs780_dpm.c b/drivers/gpu/drm/radeon/rs780_dpm.c index eb336bf..6af8505 100644 --- a/drivers/gpu/drm/radeon/rs780_dpm.c +++ b/drivers/gpu/drm/radeon/rs780_dpm.c @@ -1043,8 +1043,10 @@ int rs780_dpm_force_performance_level(struct radeon_device *rdev, if (pi->voltage_control) rs780_force_voltage(rdev, pi->max_voltage); - WREG32_P(FVTHROT_FBDIV_REG1, 0, ~FORCE_FEEDBACK_DIV); - rs780_clk_scaling_enable(rdev, true); + if (ps->sclk_high != ps->sclk_low) { + WREG32_P(FVTHROT_FBDIV_REG1, 0, ~FORCE_FEEDBACK_DIV); + rs780_clk_scaling_enable(rdev, true); + } if (pi->voltage_control) { rs780_voltage_scaling_enable(rdev, true); -- 1.8.3.1 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH 1/4] drm/radeon/dpm/rs780: use drm_mode_vrefresh()
Am 13.09.2013 17:08, schrieb Alex Deucher: Rather than open coding it. Signed-off-by: Alex Deucher For this series: Reviewed-by: Christian König --- drivers/gpu/drm/radeon/rs780_dpm.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/gpu/drm/radeon/rs780_dpm.c b/drivers/gpu/drm/radeon/rs780_dpm.c index 828a776..afb7584 100644 --- a/drivers/gpu/drm/radeon/rs780_dpm.c +++ b/drivers/gpu/drm/radeon/rs780_dpm.c @@ -62,9 +62,7 @@ static void rs780_get_pm_mode_parameters(struct radeon_device *rdev) radeon_crtc = to_radeon_crtc(crtc); pi->crtc_id = radeon_crtc->crtc_id; if (crtc->mode.htotal && crtc->mode.vtotal) - pi->refresh_rate = - (crtc->mode.clock * 1000) / - (crtc->mode.htotal * crtc->mode.vtotal); + pi->refresh_rate = drm_mode_vrefresh(&crtc->mode); break; } } ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[Bug 60857] Unstable display with Radeon 760G (ASUS M4A78L-M LE)
https://bugzilla.kernel.org/show_bug.cgi?id=60857 --- Comment #14 from Stuart Foster --- (In reply to Alex Deucher from comment #13) > Great. I'll push the skip_clock_scaling patch upstream. Ok thank you. Stuart -- You are receiving this mail because: You are watching the assignee of the bug. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[Bug 60857] Unstable display with Radeon 760G (ASUS M4A78L-M LE)
https://bugzilla.kernel.org/show_bug.cgi?id=60857 --- Comment #13 from Alex Deucher --- Great. I'll push the skip_clock_scaling patch upstream. -- You are receiving this mail because: You are watching the assignee of the bug. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH] drm/radeon: Fix hmdi typo
I keep making that one, so checked if I was the only one. Apparently not. Cc: Alex Deucher Signed-off-by: Damien Lespiau --- drivers/gpu/drm/radeon/r600d.h | 2 +- drivers/gpu/drm/radeon/radeon_connectors.c | 2 +- drivers/gpu/drm/radeon/rv770d.h| 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/radeon/r600d.h b/drivers/gpu/drm/radeon/r600d.h index 7c78083..1312136 100644 --- a/drivers/gpu/drm/radeon/r600d.h +++ b/drivers/gpu/drm/radeon/r600d.h @@ -1004,7 +1004,7 @@ # define HDMI0_AVI_INFO_CONT (1 << 1) # define HDMI0_AUDIO_INFO_SEND (1 << 4) # define HDMI0_AUDIO_INFO_CONT (1 << 5) -# define HDMI0_AUDIO_INFO_SOURCE (1 << 6) /* 0 - sound block; 1 - hmdi regs */ +# define HDMI0_AUDIO_INFO_SOURCE (1 << 6) /* 0 - sound block; 1 - hdmi regs */ # define HDMI0_AUDIO_INFO_UPDATE (1 << 7) # define HDMI0_MPEG_INFO_SEND (1 << 8) # define HDMI0_MPEG_INFO_CONT (1 << 9) diff --git a/drivers/gpu/drm/radeon/radeon_connectors.c b/drivers/gpu/drm/radeon/radeon_connectors.c index 2399f25..fc0a217 100644 --- a/drivers/gpu/drm/radeon/radeon_connectors.c +++ b/drivers/gpu/drm/radeon/radeon_connectors.c @@ -1420,7 +1420,7 @@ radeon_dp_detect(struct drm_connector *connector, bool force) if (radeon_dp_getdpcd(radeon_connector)) ret = connector_status_connected; } else { - /* try non-aux ddc (DP to DVI/HMDI/etc. adapter) */ + /* try non-aux ddc (DP to DVI/HDMI/etc. adapter) */ if (radeon_ddc_probe(radeon_connector, false)) ret = connector_status_connected; } diff --git a/drivers/gpu/drm/radeon/rv770d.h b/drivers/gpu/drm/radeon/rv770d.h index 6bef2b7..d291625 100644 --- a/drivers/gpu/drm/radeon/rv770d.h +++ b/drivers/gpu/drm/radeon/rv770d.h @@ -852,7 +852,7 @@ #define AFMT_VBI_PACKET_CONTROL 0x7608 # define AFMT_GENERIC0_UPDATE (1 << 2) #define AFMT_INFOFRAME_CONTROL0 0x760c -# define AFMT_AUDIO_INFO_SOURCE(1 << 6) /* 0 - sound block; 1 - hmdi regs */ +# define AFMT_AUDIO_INFO_SOURCE(1 << 6) /* 0 - sound block; 1 - hdmi regs */ # define AFMT_AUDIO_INFO_UPDATE(1 << 7) # define AFMT_MPEG_INFO_UPDATE (1 << 10) #define AFMT_GENERIC0_7 0x7610 -- 1.8.3.1 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[Bug 58033] [r300g][r600g] Black gap artifacts when playing WoW
https://bugs.freedesktop.org/show_bug.cgi?id=58033 --- Comment #19 from Tomasz P. --- (In reply to comment #15) > I've just upgraded one of my 32 bit boxes to Fedora 18, which has allowed me > to compile Mesa git here. And I can now report that the corruption happens > with an RV730 chip too. This corruption appears with all shader backends (sb,llvm,sb+llvm) ? -- You are receiving this mail because: You are the assignee for the bug. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH] drm/radeon: Fix hmdi typo
On Fri, Sep 13, 2013 at 11:37 AM, Damien Lespiau wrote: > I keep making that one, so checked if I was the only one. Apparently > not. Thanks! applied. Alex > > Cc: Alex Deucher > Signed-off-by: Damien Lespiau > --- > drivers/gpu/drm/radeon/r600d.h | 2 +- > drivers/gpu/drm/radeon/radeon_connectors.c | 2 +- > drivers/gpu/drm/radeon/rv770d.h| 2 +- > 3 files changed, 3 insertions(+), 3 deletions(-) > > diff --git a/drivers/gpu/drm/radeon/r600d.h b/drivers/gpu/drm/radeon/r600d.h > index 7c78083..1312136 100644 > --- a/drivers/gpu/drm/radeon/r600d.h > +++ b/drivers/gpu/drm/radeon/r600d.h > @@ -1004,7 +1004,7 @@ > # define HDMI0_AVI_INFO_CONT (1 << 1) > # define HDMI0_AUDIO_INFO_SEND (1 << 4) > # define HDMI0_AUDIO_INFO_CONT (1 << 5) > -# define HDMI0_AUDIO_INFO_SOURCE (1 << 6) /* 0 - sound block; 1 - hmdi > regs */ > +# define HDMI0_AUDIO_INFO_SOURCE (1 << 6) /* 0 - sound block; 1 - hdmi > regs */ > # define HDMI0_AUDIO_INFO_UPDATE (1 << 7) > # define HDMI0_MPEG_INFO_SEND (1 << 8) > # define HDMI0_MPEG_INFO_CONT (1 << 9) > diff --git a/drivers/gpu/drm/radeon/radeon_connectors.c > b/drivers/gpu/drm/radeon/radeon_connectors.c > index 2399f25..fc0a217 100644 > --- a/drivers/gpu/drm/radeon/radeon_connectors.c > +++ b/drivers/gpu/drm/radeon/radeon_connectors.c > @@ -1420,7 +1420,7 @@ radeon_dp_detect(struct drm_connector *connector, bool > force) > if (radeon_dp_getdpcd(radeon_connector)) > ret = connector_status_connected; > } else { > - /* try non-aux ddc (DP to DVI/HMDI/etc. > adapter) */ > + /* try non-aux ddc (DP to DVI/HDMI/etc. > adapter) */ > if (radeon_ddc_probe(radeon_connector, false)) > ret = connector_status_connected; > } > diff --git a/drivers/gpu/drm/radeon/rv770d.h b/drivers/gpu/drm/radeon/rv770d.h > index 6bef2b7..d291625 100644 > --- a/drivers/gpu/drm/radeon/rv770d.h > +++ b/drivers/gpu/drm/radeon/rv770d.h > @@ -852,7 +852,7 @@ > #define AFMT_VBI_PACKET_CONTROL 0x7608 > # define AFMT_GENERIC0_UPDATE (1 << 2) > #define AFMT_INFOFRAME_CONTROL0 0x760c > -# define AFMT_AUDIO_INFO_SOURCE(1 << 6) /* 0 - sound block; 1 > - hmdi regs */ > +# define AFMT_AUDIO_INFO_SOURCE(1 << 6) /* 0 - sound block; 1 > - hdmi regs */ > # define AFMT_AUDIO_INFO_UPDATE(1 << 7) > # define AFMT_MPEG_INFO_UPDATE (1 << 10) > #define AFMT_GENERIC0_7 0x7610 > -- > 1.8.3.1 > > ___ > dri-devel mailing list > dri-devel@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/dri-devel ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH 6/9] drm: Add a DRM_CAP_STEREO_3D capability for SET_CAP ioctl
David Herrmann gmail.com> writes: > > So just to be clear: Whenever a mode is present with 3D flags, it is > also a valid non-3D mode? Is this guaranteed? > Well.. Some HDTV's will when they receive a frame packed mode (1080*2+45=2205 pixels high) . Display just the top part. The bottom part of that is not on screen. So while it will not display it as 3d, it will discard half of the image. /Joakim ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH] drm, ttm Fix uninitialized warning
Hi On Fri, Sep 13, 2013 at 2:33 PM, Prarit Bhargava wrote: > Fix uninitialized warning. > > drivers/gpu/drm/ttm/ttm_object.c: In function ‘ttm_base_object_lookup’: > drivers/gpu/drm/ttm/ttm_object.c:213:10: error: ‘base’ may be used > uninitialized in this function [-Werror=maybe-uninitialized] > kref_put(&base->refcount, ttm_release_base); > ^ > drivers/gpu/drm/ttm/ttm_object.c:221:26: note: ‘base’ was declared here > struct ttm_base_object *base; > > Signed-off-by: Prarit Bhargava > Cc: David Airlie > Cc: rcl...@redhat.com Did some research on that, another fix is: diff --git a/drivers/gpu/drm/ttm/ttm_object.c b/drivers/gpu/drm/ttm/ttm_object.c index 58a5f32..6b7f7b7 100644 --- a/drivers/gpu/drm/ttm/ttm_object.c +++ b/drivers/gpu/drm/ttm/ttm_object.c @@ -228,7 +228,10 @@ struct ttm_base_object *ttm_base_object_lookup(struct ttm_object_file *tfile, if (likely(ret == 0)) { base = drm_hash_entry(hash, struct ttm_base_object, hash); ret = kref_get_unless_zero(&base->refcount) ? 0 : -EINVAL; + } else { + ret = -EINVAL; } + rcu_read_unlock(); if (unlikely(ret != 0)) Looks totally stupid but also silences the warning. In fact, the warning is triggered by rcu_read_unlock(); and only if PROVE_LOCKING and DEBUG_LOCK_ALLOC are enabled. And it's related to the IP-size of the rcu_read_unlock() path. I'd actually prefer "buf = NULL;" and an extended bail-out condition: if (unlikely(ret != 0 || !buf)) but it's not my decision, so: Reviewed-by: David Herrmann Cheers David > --- > drivers/gpu/drm/ttm/ttm_object.c |2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/ttm/ttm_object.c > b/drivers/gpu/drm/ttm/ttm_object.c > index 58a5f32..a868176 100644 > --- a/drivers/gpu/drm/ttm/ttm_object.c > +++ b/drivers/gpu/drm/ttm/ttm_object.c > @@ -218,7 +218,7 @@ struct ttm_base_object *ttm_base_object_lookup(struct > ttm_object_file *tfile, >uint32_t key) > { > struct ttm_object_device *tdev = tfile->tdev; > - struct ttm_base_object *base; > + struct ttm_base_object *uninitialized_var(base); > struct drm_hash_item *hash; > int ret; > > -- > 1.7.9.3 > > ___ > dri-devel mailing list > dri-devel@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/dri-devel ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH 7/9] drm/edid: Expose mandatory stereo modes for HDMI sinks
On Fri, Sep 13, 2013 at 6:10 PM, Joakim Plate wrote: > > Also, some logic aught to indicate pixel aspect ratio for the modes since > they are non square for the half res modes. Atm we completely ignore pixel aspect ratio, also for flatworld CEA modes. So I don't think we need to concer ourselves here about this, imo pixel aspect ratio support is orthogonal. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH 7/9] drm/edid: Expose mandatory stereo modes for HDMI sinks
Damien Lespiau intel.com> writes: > +static const struct s3d_mandatory_mode s3d_mandatory_modes[] = { > + { 1920, 1080, 24, 0, > + DRM_MODE_FLAG_3D_TOP_AND_BOTTOM | DRM_MODE_FLAG_3D_FRAME_PACKING }, > + { 1920, 1080, 50, DRM_MODE_FLAG_INTERLACE, > + DRM_MODE_FLAG_3D_SIDE_BY_SIDE_HALF }, > + { 1920, 1080, 60, DRM_MODE_FLAG_INTERLACE, > + DRM_MODE_FLAG_3D_SIDE_BY_SIDE_HALF }, > + { 1280, 720, 50, 0, > + DRM_MODE_FLAG_3D_TOP_AND_BOTTOM | DRM_MODE_FLAG_3D_FRAME_PACKING }, > + { 1280, 720, 60, 0, > + DRM_MODE_FLAG_3D_TOP_AND_BOTTOM | DRM_MODE_FLAG_3D_FRAME_PACKING } > +}; I may be missing something here... But.. The frame packed modes are much higher in pixels than this and include frame packing. 1080*2+45=2050 720*2+30=1470 Unless you intend to hide the left/right split in mesa or other place, we need to get the ability to render to both fields somehow. Either as the full 2050 pixels high or at 1080*2 and the driver adds the blanking. Also, some logic aught to indicate pixel aspect ratio for the modes since they are non square for the half res modes. /Joakim ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[Bug 69328] New: Recoverable and unrecoverable lockups with opencl-example on trinity APU
https://bugs.freedesktop.org/show_bug.cgi?id=69328 Priority: medium Bug ID: 69328 Assignee: dri-devel@lists.freedesktop.org Summary: Recoverable and unrecoverable lockups with opencl-example on trinity APU Severity: normal Classification: Unclassified OS: Linux (All) Reporter: slick...@gmx.com Hardware: x86-64 (AMD64) Status: NEW Version: git Component: Drivers/DRI/R600 Product: Mesa Software in use: Up to date mesa, llvm, clang, libclc, firmware, as of 20130910. Gentoo's Linux 3.11. opencl-example-12905ac620b83713b07ece763ff3c36fb3c2e7e5. Hardware in use: AMD A8 5600K APU (Radeon HD 7560D, Aruba), 32GB system RAM, Biostar Hi-Fi A85W motherboard. Steps to reproduce: Run hello_world program from opencl-example. First run works correctly. Second run either completely locks up the machine or locks up the GPU (which recovers after a short time). Same behavior for other tests - the first test completes and the second causes problems. This is what the second opencl-example run looks like: localhost opencl-example-12905ac620b83713b07ece763ff3c36fb3c2e7e5 # ./hello_world There are 1 platforms. There are 1 GPU devices. clCreateContext() succeeded. clCreateCommandQueue() succeeded. clCreateProgramWithSource() suceeded. clBuildProgram() suceeded. clCreateKernel() suceeded. clCreateBuffer() succeeded. clSetKernelArg() succeeded. clEnqueueNDRangeKernel() suceeded. ((( 10 second hang here, or forever if the machine is toast ))) clEnqueueReadBuffer() suceeded. pi = 3.141590 And, here is the dmesg output from the recoverable lockups: [ 1365.806285] radeon :00:01.0: GPU lockup CP stall for more than 1msec [ 1365.806292] radeon :00:01.0: GPU lockup (waiting for 0x7ec3 last fence id 0x7ec2) [ 1365.821261] radeon :00:01.0: Saved 559 dwords of commands on ring 0. [ 1365.821293] radeon :00:01.0: GPU softreset: 0x0008 [ 1365.821297] radeon :00:01.0: GRBM_STATUS = 0xB0003828 [ 1365.821300] radeon :00:01.0: GRBM_STATUS_SE0 = 0x0007 [ 1365.821304] radeon :00:01.0: GRBM_STATUS_SE1 = 0x0007 [ 1365.821307] radeon :00:01.0: SRBM_STATUS = 0x2040 [ 1365.821332] radeon :00:01.0: SRBM_STATUS2 = 0x [ 1365.821335] radeon :00:01.0: R_008674_CP_STALLED_STAT1 = 0x [ 1365.821338] radeon :00:01.0: R_008678_CP_STALLED_STAT2 = 0x4000 [ 1365.821341] radeon :00:01.0: R_00867C_CP_BUSY_STAT = 0x00010002 [ 1365.821344] radeon :00:01.0: R_008680_CP_STAT = 0x80220243 [ 1365.821347] radeon :00:01.0: R_00D034_DMA_STATUS_REG = 0x44C83D57 [ 1365.821350] radeon :00:01.0: R_00D834_DMA_STATUS_REG = 0x44C83D57 [ 1365.821354] radeon :00:01.0: VM_CONTEXT0_PROTECTION_FAULT_ADDR 0x [ 1365.821357] radeon :00:01.0: VM_CONTEXT0_PROTECTION_FAULT_STATUS 0x [ 1365.821360] radeon :00:01.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x [ 1365.821363] radeon :00:01.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x [ 1365.827029] radeon :00:01.0: GRBM_SOFT_RESET=0x4001 [ 1365.827083] radeon :00:01.0: SRBM_SOFT_RESET=0x0100 [ 1365.828237] radeon :00:01.0: GRBM_STATUS = 0x3828 [ 1365.828240] radeon :00:01.0: GRBM_STATUS_SE0 = 0x0007 [ 1365.828243] radeon :00:01.0: GRBM_STATUS_SE1 = 0x0007 [ 1365.828246] radeon :00:01.0: SRBM_STATUS = 0x2040 [ 1365.828271] radeon :00:01.0: SRBM_STATUS2 = 0x [ 1365.828274] radeon :00:01.0: R_008674_CP_STALLED_STAT1 = 0x [ 1365.828277] radeon :00:01.0: R_008678_CP_STALLED_STAT2 = 0x [ 1365.828280] radeon :00:01.0: R_00867C_CP_BUSY_STAT = 0x [ 1365.828283] radeon :00:01.0: R_008680_CP_STAT = 0x [ 1365.828286] radeon :00:01.0: R_00D034_DMA_STATUS_REG = 0x44C83D57 [ 1365.828289] radeon :00:01.0: R_00D834_DMA_STATUS_REG = 0x44C83D57 [ 1365.828317] radeon :00:01.0: GPU reset succeeded, trying to resume [ 1365.843638] [drm] PCIE GART of 512M enabled (table at 0x00276000). [ 1365.843775] radeon :00:01.0: WB enabled [ 1365.843781] radeon :00:01.0: fence driver on ring 0 use gpu addr 0x2c00 and cpu addr 0x8807dea6bc00 [ 1365.844520] radeon :00:01.0: fence driver on ring 5 use gpu addr 0x00075a18 and cpu addr 0xc900057b5a18 [ 1365.844524] radeon :00:01.0: fence driver on ring 1 use gpu addr 0x2c04 and cpu addr 0x8807dea6bc04 [ 1365.844528] radeon :00:01.0: fence driver on ring 2 use gpu addr 0x2c08 and cpu addr 0x8807dea6bc08 [ 1365.844532] radeon :00:01.0: fence driver on ring 3 use gpu addr 0x2c0c and cpu
[Bug 69328] Recoverable and unrecoverable lockups with opencl-example on trinity APU
https://bugs.freedesktop.org/show_bug.cgi?id=69328 Alex Deucher changed: What|Removed |Added Component|Drivers/DRI/R600|Drivers/Gallium/r600 --- Comment #1 from Alex Deucher --- Is this a regression? If so, can you bisect? -- You are receiving this mail because: You are the assignee for the bug. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[Bug 69321] starting openCL crashes/boots system
https://bugs.freedesktop.org/show_bug.cgi?id=69321 --- Comment #3 from udo --- Trying to find a working commit first. -- You are receiving this mail because: You are the assignee for the bug. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH] drm/radeon/dpm: rework auto performance level enable
Calling force_performance_level() from set_power_state() doesn't work on some asics because the current power state pointer has not been properly updated at that point. Move the calls to force_performance_level() out of the asic specific set_power_state() functions and into the main power state sequence. Fixes dpm resume on SI. Signed-off-by: Alex Deucher --- drivers/gpu/drm/radeon/btc_dpm.c | 6 -- drivers/gpu/drm/radeon/ci_dpm.c | 6 -- drivers/gpu/drm/radeon/cypress_dpm.c | 6 -- drivers/gpu/drm/radeon/kv_dpm.c | 1 - drivers/gpu/drm/radeon/ni_dpm.c | 6 -- drivers/gpu/drm/radeon/radeon_pm.c | 14 +- drivers/gpu/drm/radeon/rv6xx_dpm.c | 2 -- drivers/gpu/drm/radeon/rv770_dpm.c | 6 -- drivers/gpu/drm/radeon/si_dpm.c | 6 -- drivers/gpu/drm/radeon/sumo_dpm.c| 2 -- drivers/gpu/drm/radeon/trinity_dpm.c | 1 - 11 files changed, 9 insertions(+), 47 deletions(-) diff --git a/drivers/gpu/drm/radeon/btc_dpm.c b/drivers/gpu/drm/radeon/btc_dpm.c index 084e694..05ff315 100644 --- a/drivers/gpu/drm/radeon/btc_dpm.c +++ b/drivers/gpu/drm/radeon/btc_dpm.c @@ -2340,12 +2340,6 @@ int btc_dpm_set_power_state(struct radeon_device *rdev) return ret; } - ret = rv770_dpm_force_performance_level(rdev, RADEON_DPM_FORCED_LEVEL_AUTO); - if (ret) { - DRM_ERROR("rv770_dpm_force_performance_level failed\n"); - return ret; - } - return 0; } diff --git a/drivers/gpu/drm/radeon/ci_dpm.c b/drivers/gpu/drm/radeon/ci_dpm.c index 3cce533..8996274 100644 --- a/drivers/gpu/drm/radeon/ci_dpm.c +++ b/drivers/gpu/drm/radeon/ci_dpm.c @@ -4748,12 +4748,6 @@ int ci_dpm_set_power_state(struct radeon_device *rdev) if (pi->pcie_performance_request) ci_notify_link_speed_change_after_state_change(rdev, new_ps, old_ps); - ret = ci_dpm_force_performance_level(rdev, RADEON_DPM_FORCED_LEVEL_AUTO); - if (ret) { - DRM_ERROR("ci_dpm_force_performance_level failed\n"); - return ret; - } - cik_update_cg(rdev, (RADEON_CG_BLOCK_GFX | RADEON_CG_BLOCK_MC | RADEON_CG_BLOCK_SDMA | diff --git a/drivers/gpu/drm/radeon/cypress_dpm.c b/drivers/gpu/drm/radeon/cypress_dpm.c index 95a66db..91bb470 100644 --- a/drivers/gpu/drm/radeon/cypress_dpm.c +++ b/drivers/gpu/drm/radeon/cypress_dpm.c @@ -2014,12 +2014,6 @@ int cypress_dpm_set_power_state(struct radeon_device *rdev) if (eg_pi->pcie_performance_request) cypress_notify_link_speed_change_after_state_change(rdev, new_ps, old_ps); - ret = rv770_dpm_force_performance_level(rdev, RADEON_DPM_FORCED_LEVEL_AUTO); - if (ret) { - DRM_ERROR("rv770_dpm_force_performance_level failed\n"); - return ret; - } - return 0; } diff --git a/drivers/gpu/drm/radeon/kv_dpm.c b/drivers/gpu/drm/radeon/kv_dpm.c index b98b9c9..7139906 100644 --- a/drivers/gpu/drm/radeon/kv_dpm.c +++ b/drivers/gpu/drm/radeon/kv_dpm.c @@ -1854,7 +1854,6 @@ int kv_dpm_set_power_state(struct radeon_device *rdev) RADEON_CG_BLOCK_BIF | RADEON_CG_BLOCK_HDP), true); - rdev->pm.dpm.forced_level = RADEON_DPM_FORCED_LEVEL_AUTO; return 0; } diff --git a/drivers/gpu/drm/radeon/ni_dpm.c b/drivers/gpu/drm/radeon/ni_dpm.c index f7b625c..6c398a4 100644 --- a/drivers/gpu/drm/radeon/ni_dpm.c +++ b/drivers/gpu/drm/radeon/ni_dpm.c @@ -3865,12 +3865,6 @@ int ni_dpm_set_power_state(struct radeon_device *rdev) return ret; } - ret = ni_dpm_force_performance_level(rdev, RADEON_DPM_FORCED_LEVEL_AUTO); - if (ret) { - DRM_ERROR("ni_dpm_force_performance_level failed\n"); - return ret; - } - return 0; } diff --git a/drivers/gpu/drm/radeon/radeon_pm.c b/drivers/gpu/drm/radeon/radeon_pm.c index d41ac8a..87e1d69 100644 --- a/drivers/gpu/drm/radeon/radeon_pm.c +++ b/drivers/gpu/drm/radeon/radeon_pm.c @@ -917,10 +917,13 @@ static void radeon_dpm_change_power_state_locked(struct radeon_device *rdev) radeon_dpm_post_set_power_state(rdev); - /* force low perf level for thermal */ - if (rdev->pm.dpm.thermal_active && - rdev->asic->dpm.force_performance_level) { - radeon_dpm_force_performance_level(rdev, RADEON_DPM_FORCED_LEVEL_LOW); + if (rdev->asic->dpm.force_performance_level) { + if (rdev->pm.dpm.thermal_active) + /* force low perf level for thermal */ + radeon_dpm_force_performance_level(rdev, RADEON_DPM_FORCED_LEVEL_LOW); + else + /* otherwise, enable auto */ + radeon_dpm_force_performance_level(rdev, RADEON_DPM_FORCED_LEVEL_AUTO); } done: @@ -1149,9 +1152,10 @@ stati
[Bug 68235] Display freezes after login with kernel 3.11.0-rc5 on Cayman with dpm=1
https://bugs.freedesktop.org/show_bug.cgi?id=68235 --- Comment #26 from Alex Deucher --- Can you attach your dmesg with dpm enabled? -- You are receiving this mail because: You are the assignee for the bug. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[Bug 69245] Opencl random lockups whilst running tstellar's opencl-examples
https://bugs.freedesktop.org/show_bug.cgi?id=69245 --- Comment #1 from Tom Stellard --- Can you test this with the latest version of Mesa from git? I was able to reproduce this on the 9.2 branch by running three separate instances of the run_tests.sh script at the same time as the Lightsmark demo. However, I could not reproduce this on master, so maybe it has been fixed. -- You are receiving this mail because: You are the assignee for the bug. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[Bug 68235] Display freezes after login with kernel 3.11.0-rc5 on Cayman with dpm=1
https://bugs.freedesktop.org/show_bug.cgi?id=68235 --- Comment #27 from Alexandre Demers --- (In reply to comment #26) > Can you attach your dmesg with dpm enabled? Do you mean with the patch applied (total and/or problematic part left alone)? -- You are receiving this mail because: You are the assignee for the bug. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[Bug 68235] Display freezes after login with kernel 3.11.0-rc5 on Cayman with dpm=1
https://bugs.freedesktop.org/show_bug.cgi?id=68235 --- Comment #28 from Alex Deucher --- (In reply to comment #27) > (In reply to comment #26) > > Can you attach your dmesg with dpm enabled? > > Do you mean with the patch applied (total and/or problematic part left > alone)? Doesn't matter. I just want to see the basic driver output and power state list. -- You are receiving this mail because: You are the assignee for the bug. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH] drm/radeon: fix panel scaling with eDP and LVDS bridges
We were using the wrong set_properly callback so we always ended up with Full scaling even if something else (Center or Full aspect) was selected. Signed-off-by: Alex Deucher Cc: sta...@vger.kernel.org --- drivers/gpu/drm/radeon/radeon_connectors.c | 34 +++--- 1 file changed, 31 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon_connectors.c b/drivers/gpu/drm/radeon/radeon_connectors.c index f7c8c6e..79159b5 100644 --- a/drivers/gpu/drm/radeon/radeon_connectors.c +++ b/drivers/gpu/drm/radeon/radeon_connectors.c @@ -1504,6 +1504,24 @@ static const struct drm_connector_funcs radeon_dp_connector_funcs = { .force = radeon_dvi_force, }; +static const struct drm_connector_funcs radeon_edp_connector_funcs = { + .dpms = drm_helper_connector_dpms, + .detect = radeon_dp_detect, + .fill_modes = drm_helper_probe_single_connector_modes, + .set_property = radeon_lvds_set_property, + .destroy = radeon_dp_connector_destroy, + .force = radeon_dvi_force, +}; + +static const struct drm_connector_funcs radeon_lvds_bridge_connector_funcs = { + .dpms = drm_helper_connector_dpms, + .detect = radeon_dp_detect, + .fill_modes = drm_helper_probe_single_connector_modes, + .set_property = radeon_lvds_set_property, + .destroy = radeon_dp_connector_destroy, + .force = radeon_dvi_force, +}; + void radeon_add_atom_connector(struct drm_device *dev, uint32_t connector_id, @@ -1595,8 +1613,6 @@ radeon_add_atom_connector(struct drm_device *dev, goto failed; radeon_dig_connector->igp_lane_info = igp_lane_info; radeon_connector->con_priv = radeon_dig_connector; - drm_connector_init(dev, &radeon_connector->base, &radeon_dp_connector_funcs, connector_type); - drm_connector_helper_add(&radeon_connector->base, &radeon_dp_connector_helper_funcs); if (i2c_bus->valid) { /* add DP i2c bus */ if (connector_type == DRM_MODE_CONNECTOR_eDP) @@ -1613,6 +1629,10 @@ radeon_add_atom_connector(struct drm_device *dev, case DRM_MODE_CONNECTOR_VGA: case DRM_MODE_CONNECTOR_DVIA: default: + drm_connector_init(dev, &radeon_connector->base, + &radeon_dp_connector_funcs, connector_type); + drm_connector_helper_add(&radeon_connector->base, + &radeon_dp_connector_helper_funcs); connector->interlace_allowed = true; connector->doublescan_allowed = true; radeon_connector->dac_load_detect = true; @@ -1625,6 +1645,10 @@ radeon_add_atom_connector(struct drm_device *dev, case DRM_MODE_CONNECTOR_HDMIA: case DRM_MODE_CONNECTOR_HDMIB: case DRM_MODE_CONNECTOR_DisplayPort: + drm_connector_init(dev, &radeon_connector->base, + &radeon_dp_connector_funcs, connector_type); + drm_connector_helper_add(&radeon_connector->base, + &radeon_dp_connector_helper_funcs); drm_object_attach_property(&radeon_connector->base.base, rdev->mode_info.underscan_property, UNDERSCAN_OFF); @@ -1652,6 +1676,10 @@ radeon_add_atom_connector(struct drm_device *dev, break; case DRM_MODE_CONNECTOR_LVDS: case DRM_MODE_CONNECTOR_eDP: + drm_connector_init(dev, &radeon_connector->base, + &radeon_lvds_bridge_connector_funcs, connector_type); + drm_connector_helper_add(&radeon_connector->base, + &radeon_dp_connector_helper_funcs); drm_object_attach_property(&radeon_connector->base.base, dev->mode_config.scaling_mode_property, DRM_MODE_SCALE_FULLSCREEN); @@ -1830,7 +1858,7 @@ radeon_add_atom_connector(struct drm_device *dev, goto failed; radeon_dig_connector->igp_lane_info = igp_lane_info; radeon_connector->con_priv = radeon_dig_connector; - drm_connector_init(dev, &radeon_connector->base, &radeon_dp_connector_funcs, connector_type); + drm_connector_init(dev, &radeon_connector->base, &radeon_edp_connector_funcs, connector_type); drm_connector_helper_add(&radeon_connector->base, &radeon_dp_
[Bug 68224] [radeonsi] Serious Sam3 is segfaulting (LLVM assert)
https://bugs.freedesktop.org/show_bug.cgi?id=68224 --- Comment #14 from Tom Stellard --- (In reply to comment #13) > (In reply to comment #12) > > Created attachment 85372 [details] [review] [review] > > SGPR register spilling patch v2 > > > > Can you try this v2 patch? It fixes the bug Michel found plus another one. > > Same result as patch v1, GPU lockup instead of llvm assert; Sanctuary demo > gives also a GPU lockup Sanctuary doesn't lockup for me with this patch, but the only thing visible is the torch. Everything else is black. What settings are you using with Sanctuary? -- You are receiving this mail because: You are the assignee for the bug. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[Bug 69340] New: Recent mesa git revisions cause frequent gpu hangs on radeonsi
https://bugs.freedesktop.org/show_bug.cgi?id=69340 Priority: medium Bug ID: 69340 Assignee: dri-devel@lists.freedesktop.org Summary: Recent mesa git revisions cause frequent gpu hangs on radeonsi Severity: normal Classification: Unclassified OS: Linux (All) Reporter: j.suarez.agap...@gmail.com Hardware: x86-64 (AMD64) Status: NEW Version: git Component: Drivers/Gallium/radeonsi Product: Mesa After installing mesa git 395b9410 (from oibaf's ppa on Kubuntu raring) I am experiencing gpu hangs and kernel panics when launching "somewhat complex" 3D games. For example, glxgears and supertuxkart do not produce the gpu hang, but speed-dreams2 (it hangs when the game should show your car in order to drive), L4D2 (just after the Valve logo-video, just when the game intro movie should start playing) and Crusader Kings II (just at the very beginning, when the loading screen should come up). The last mesa git version I had installed was 505fad04, which works correctly. Moreover, the crashes happen both with radeon.dpm=1 and radeon.dpm=0. I have managed to get some dmesg outputs of the crashes: Crash #1 [ 334.162270] radeon :01:00.0: GPU lockup CP stall for more than 1msec [ 334.162280] radeon :01:00.0: GPU lockup (waiting for 0x000160ea) [ 334.162289] radeon :01:00.0: failed to get a new IB (-35) [ 334.162291] [TTM] Failed to expire sync object before buffer eviction [ 334.162299] [drm:radeon_cs_ib_vm_chunk] *ERROR* Failed to get ib ! [ 334.162378] [TTM] Failed to expire sync object before buffer eviction [ 334.172123] radeon :01:00.0: sa_manager is not empty, clearing anyway [ 334.381742] radeon :01:00.0: Saved 97917 dwords of commands on ring 0. [ 334.381879] radeon :01:00.0: GPU softreset: 0x0049 [ 334.381882] radeon :01:00.0: GRBM_STATUS = 0xE5D04028 [ 334.381884] radeon :01:00.0: GRBM_STATUS_SE0 = 0xEE40 [ 334.381886] radeon :01:00.0: GRBM_STATUS_SE1 = 0xEE40 [ 334.381889] radeon :01:00.0: SRBM_STATUS = 0x20C0 [ 334.382000] radeon :01:00.0: SRBM_STATUS2 = 0x [ 334.382002] radeon :01:00.0: R_008674_CP_STALLED_STAT1 = 0x [ 334.382004] radeon :01:00.0: R_008678_CP_STALLED_STAT2 = 0x00018000 [ 334.382006] radeon :01:00.0: R_00867C_CP_BUSY_STAT = 0x00408002 [ 334.382009] radeon :01:00.0: R_008680_CP_STAT = 0x84038643 [ 334.382011] radeon :01:00.0: R_00D034_DMA_STATUS_REG = 0x44C83D57 [ 334.382013] radeon :01:00.0: R_00D834_DMA_STATUS_REG = 0x44C83D57 [ 334.382016] radeon :01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x [ 334.382018] radeon :01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x [ 334.386528] radeon :01:00.0: GRBM_SOFT_RESET=0xDDFF [ 334.386582] radeon :01:00.0: SRBM_SOFT_RESET=0x0100 [ 334.387728] radeon :01:00.0: GRBM_STATUS = 0x3028 [ 334.387730] radeon :01:00.0: GRBM_STATUS_SE0 = 0x0006 [ 334.387731] radeon :01:00.0: GRBM_STATUS_SE1 = 0x0006 [ 334.387733] radeon :01:00.0: SRBM_STATUS = 0x20C0 [ 334.387844] radeon :01:00.0: SRBM_STATUS2 = 0x [ 334.387846] radeon :01:00.0: R_008674_CP_STALLED_STAT1 = 0x [ 334.387848] radeon :01:00.0: R_008678_CP_STALLED_STAT2 = 0x [ 334.387850] radeon :01:00.0: R_00867C_CP_BUSY_STAT = 0x [ 334.387852] radeon :01:00.0: R_008680_CP_STAT = 0x [ 334.387854] radeon :01:00.0: R_00D034_DMA_STATUS_REG = 0x44C83D57 [ 334.387856] radeon :01:00.0: R_00D834_DMA_STATUS_REG = 0x44C83D57 [ 334.387981] radeon :01:00.0: GPU reset succeeded, trying to resume [ 334.415260] [drm] probing gen 2 caps for device 1002:5a16 = 31cd02/0 [ 334.415264] [drm] PCIE gen 2 link speeds already enabled [ 334.417417] [drm] PCIE GART of 512M enabled (table at 0x00276000). [ 334.417520] radeon :01:00.0: WB enabled [ 334.417522] radeon :01:00.0: fence driver on ring 0 use gpu addr 0x8c00 and cpu addr 0x880412af4c00 [ 334.417524] radeon :01:00.0: fence driver on ring 1 use gpu addr 0x8c04 and cpu addr 0x880412af4c04 [ 334.417526] radeon :01:00.0: fence driver on ring 2 use gpu addr 0x8c08 and cpu addr 0x880412af4c08 [ 334.417528] radeon :01:00.0: fence driver on ring 3 use gpu addr 0x8c0c and cpu addr 0x880412af4c0c [ 334.417530] radeon :01:00.0: fence driver on ring 4 use gpu addr 0x8c10 and cpu addr 0x880412af4c10 [ 334.418521] radeon :01:00.0: fence driver on ring 5 use gpu addr 0x00075a18 and cpu addr 0xc90011db5a18 [ 334.43691
[Bug 69340] Recent mesa git revisions cause frequent gpu hangs on radeonsi
https://bugs.freedesktop.org/show_bug.cgi?id=69340 --- Comment #1 from José Suárez --- Created attachment 85793 --> https://bugs.freedesktop.org/attachment.cgi?id=85793&action=edit Full dmesg Full dmesg of the system -- You are receiving this mail because: You are the assignee for the bug. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[Bug 69328] Recoverable and unrecoverable lockups with opencl-example on trinity APU
https://bugs.freedesktop.org/show_bug.cgi?id=69328 --- Comment #2 from Tom Stellard --- Created attachment 85794 --> https://bugs.freedesktop.org/attachment.cgi?id=85794&action=edit Don't set DB_DEST or CB_DEST* bit on cp_coher_cntl This patch fixes the hangs for me and all the run_test.sh tests pass. However, this is just a hack and not a proper solution. Can you test this patch? -- You are receiving this mail because: You are the assignee for the bug. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[Bug 69340] Recent mesa git revisions cause frequent gpu hangs on radeonsi
https://bugs.freedesktop.org/show_bug.cgi?id=69340 --- Comment #2 from José Suárez --- Just an update: I have tried building the .deb packages with the lines committed in 395b9410 removed from the source and the hangs are still there, so I am not sure if the problem lies in that commit... -- You are receiving this mail because: You are the assignee for the bug. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[Bug 69340] Recent mesa git revisions cause frequent gpu hangs on radeonsi
https://bugs.freedesktop.org/show_bug.cgi?id=69340 --- Comment #3 from Hohahiu --- Created attachment 85797 --> https://bugs.freedesktop.org/attachment.cgi?id=85797&action=edit dmesg -- You are receiving this mail because: You are the assignee for the bug. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[Bug 69340] Recent mesa git revisions cause frequent gpu hangs on radeonsi
https://bugs.freedesktop.org/show_bug.cgi?id=69340 --- Comment #4 from Hohahiu --- I'm experiencing similar problems with unigene tropics. File attached above is my dmesg. My specs: intel hd 4000 + AMD Radeon 7750M Software is: OpenSUSE 12.3 x86_64 kernel-3.11 Mesa, libdrm are from git xserver 1.14 -- You are receiving this mail because: You are the assignee for the bug. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[Bug 68235] Display freezes after login with kernel 3.11.0-rc5 on Cayman with dpm=1
https://bugs.freedesktop.org/show_bug.cgi?id=68235 --- Comment #29 from Alexandre Demers --- Created attachment 85798 --> https://bugs.freedesktop.org/attachment.cgi?id=85798&action=edit dpm=1 with partial patch applied on 3.11.0 dmesg output when dpm=1 with partial patch applied (deactivation of pretty much everything but one to pass ni_upload_sw_state) -- You are receiving this mail because: You are the assignee for the bug. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[Bug 64867] Hangs on Cayman (HD6950) when watching flash/using vdpau
https://bugs.freedesktop.org/show_bug.cgi?id=64867 --- Comment #10 from Harald Judt --- With current up-to-date git versions of libdrm, mesa, xorg-server and xf86-video-ati, the R600_DEBUG=nodma hack no longer seems necessary (linux-3.11.0-rc6 with UVD disabled); the GPU faults have vanished and the system is stable. -- You are receiving this mail because: You are the assignee for the bug. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[Patch] drm/nouveau: nouveau/nouveau_abi16.c does not return offset of allocated notifier
Hi, The following patch fixes problems retrieving query results. I have tested it on nv40 - 6800 Ultra. Best Wishes, Bob --- a/drivers/gpu/drm/nouveau/nouveau_abi16.c +++ b/drivers/gpu/drm/nouveau/nouveau_abi16.c @@ -445,6 +445,7 @@ nouveau_abi16_ioctl_notifierobj_alloc(ABI16_IOCTL_ARGS) sizeof(args), &object); if (ret) goto done; + info->offset = ntfy->node->offset; done: if (ret) ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[Bug 59649] [r600][RV635] GPU lockup CP stall / GPU resets over and over - Kernel 3.7 to 3.11 inclusive
https://bugs.freedesktop.org/show_bug.cgi?id=59649 Shawn Starr changed: What|Removed |Added Status|RESOLVED|REOPENED Resolution|FIXED |--- --- Comment #10 from Shawn Starr --- Reopen :/ At least now i can trigger the crash repeatedly. 1) Log into Second Life first 2) You need to patch some of the GLSL programs as they will fail with Mesa 9.2 GLSL compiler - #extension GL_ARB_texture_rectangle : enable +/* #extension GL_ARB_texture_rectangle : enable */ 3) Go to the Graphics options and under Shaders, enable: - Basic Shaders - Atmospheric Shaders - Advanced Lighting Model - Ambient Occlusion - Depth of field GPU will reset: [566574.634495] switching from power state: [566574.634497] ui class: performance [566574.634498] internal class: none [566574.634500] caps: single_disp video [566574.634501] uvdvclk: 0 dclk: 0 [566574.634502] power level 0sclk: 11000 mclk: 40500 vddc: 900 [566574.634503] power level 1sclk: 3 mclk: 7 vddc: 1100 [566574.634504] power level 2sclk: 6 mclk: 7 vddc: 1100 [566574.634505] status: c [566574.634506] switching to power state: [566574.634507] ui class: performance [566574.634508] internal class: none [566574.634509] caps: video [566574.634509] uvdvclk: 0 dclk: 0 [566574.634510] power level 0sclk: 3 mclk: 7 vddc: 1100 [566574.634511] power level 1sclk: 3 mclk: 7 vddc: 1100 [566574.634512] power level 2sclk: 6 mclk: 7 vddc: 1100 [566574.634513] status: r [566584.067826] switching from power state: [566584.067830] ui class: performance [566584.067831] internal class: none [566584.067833] caps: video [566584.067835] uvdvclk: 0 dclk: 0 [566584.067836] power level 0sclk: 3 mclk: 7 vddc: 1100 [566584.067837] power level 1sclk: 3 mclk: 7 vddc: 1100 [566584.067839] power level 2sclk: 6 mclk: 7 vddc: 1100 [566584.067840] status: c [566584.067841] switching to power state: [566584.067842] ui class: performance [566584.067843] internal class: none [566584.067844] caps: single_disp video [566584.067846] uvdvclk: 0 dclk: 0 [566584.067847] power level 0sclk: 11000 mclk: 40500 vddc: 900 [566584.067848] power level 1sclk: 3 mclk: 7 vddc: 1100 [566584.067849] power level 2sclk: 6 mclk: 7 vddc: 1100 [566584.067850] status: r [568371.037065] radeon :01:00.0: GPU lockup CP stall for more than 1msec [568371.044281] radeon :01:00.0: GPU lockup (waiting for 0x017b4541 last fence id 0x017b4531) [568371.111399] switching from power state: [568371.111401] ui class: performance [568371.111402] internal class: none [568371.111403] caps: single_disp video [568371.111403] uvdvclk: 0 dclk: 0 [568371.111405] power level 0sclk: 11000 mclk: 40500 vddc: 900 [568371.111405] power level 1sclk: 3 mclk: 7 vddc: 1100 [568371.111406] power level 2sclk: 6 mclk: 7 vddc: 1100 [568371.111406] status: c [568371.111407] switching to power state: [568371.111407] ui class: performance [568371.111408] internal class: none [568371.111408] caps: video [568371.111409] uvdvclk: 0 dclk: 0 [568371.111409] power level 0sclk: 3 mclk: 7 vddc: 1100 [568371.111410] power level 1sclk: 3 mclk: 7 vddc: 1100 [568371.111410] power level 2sclk: 6 mclk: 7 vddc: 1100 [568371.111411] status: r [568371.544089] radeon :01:00.0: GPU lockup CP stall for more than 10507msec [568371.550588] radeon :01:00.0: GPU lockup (waiting for 0x017b4532) [568371.550591] radeon :01:00.0: failed to get a new IB (-35) [568371.555183] [drm:radeon_cs_ib_chunk] *ERROR* Failed to get ib ! [568371.561868] radeon :01:00.0: Saved 505 dwords of commands on ring 0. [568371.561878] radeon :01:00.0: GPU softreset: 0x0008 [568371.561880] radeon :01:00.0: R_008010_GRBM_STATUS = 0xA0002030 [568371.561882] radeon :01:00.0: R_008014_GRBM_STATUS2 = 0x0003 [568371.561884] radeon :01:00.0: R_000E50_SRBM_STATUS = 0x20C0 [568371.561886] radeon :01:00.0: R_008674_CP_STALLED_STAT1 = 0x [568371.561888] radeon :01:00.0: R_008678_CP_STALLED_STAT2 = 0x [568371.561890] radeon :01:00.0: R_00867C_CP_BUSY_STAT = 0x00020186 [568371.561892] r