ast: cursor flashing softlockups
[ +to Scot Doyle ]

Scot, please take a look at this soft lockup.

Regards,
Peter Hurley

Hi Ming,

On 05/17/2016 02:12 AM, Ming Lei wrote:
> Hi,
>
> On Tue, May 17, 2016 at 4:07 AM, Dann Frazier wrote:
>> Hi,
>> I'm observing a soft lockup issue w/ the ASPEED controller on an
>> arm64 server platform. This was originally seen on Ubuntu's 4.4
>> kernel, but it is reproducible w/ vanilla 4.6-rc7 as well.
>>
>> [ 32.792656] NMI watchdog: BUG: soft lockup - CPU#38 stuck for 22s! [swapper/38:0]
>>
>> I observe this just once each time I boot into debian-installer (I'm
>> using a serial console, but the ast module gets loaded during
>> startup).
>
> I have figured out that it is caused by 'mod_timer(timer, jiffies)' and
> 'ops->cur_blink_jiffies' is observed as zero in cursor_timer_handler()
> when the issue happened.

Thanks for tracking this down.

This softlockup looks to be caused by:

commit 27a4c827c34ac4256a190cc9d24607f953c1c459
Author: Scot Doyle
Date:   Thu Mar 26 13:56:38 2015 +

    fbcon: use the cursor blink interval provided by vt

    vt now provides a cursor blink interval via vc_data. Use this
    interval instead of the currently hardcoded 200 msecs. Store it in
    fbcon_ops to avoid locking the console in cursor_timer_handler().

    Signed-off-by: Scot Doyle
    Acked-by: Pavel Machek
    Signed-off-by: Greg Kroah-Hartman

and

commit bd63364caa8df38bad2b25b11b2a1b849475cce5
Author: Scot Doyle
Date:   Thu Mar 26 13:54:39 2015 +

    vt: add cursor blink interval escape sequence

    Add an escape sequence to specify the current console's cursor blink
    interval. The interval is specified as a number of milliseconds until
    the next cursor display state toggle, from 50 to 65535. /proc/loadavg
    did not show a difference with a one msec interval, but the lower
    bound is set to 50 msecs since slower hardware wasn't tested.

    Store the interval in the vc_data structure for later access by
    fbcon, initializing the value to fbcon's current hardcoded value of
    200 msecs.

    Signed-off-by: Scot Doyle
    Acked-by: Pavel Machek
    Signed-off-by: Greg Kroah-Hartman

> Looks it is a real fbcon/vt issue, see following:
>
> fbcon_init()
>     <-.con_init
>         <-visual_init()
>
> reset_terminal()
>     <-vc_init()
>
> vc->vc_cur_blink_ms is just set in reset_terminal() from vc_init() path,
> and ops->cur_blink_jiffies is figured out from vc->vc_cur_blink_ms
> in fbcon_init().
>
> And visual_init() is always run before vc_init(), so ops->cur_blink_jiffies
> is initialized as zero and cause the soft lockup issue finally.
>
> Thanks,
> Ming
>
>> perf shows that the CPU caught by the NMI is typically in code
>> updating the cursor timer:
>>
>> -  16.92%  swapper  [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore
>>    - _raw_spin_unlock_irqrestore
>>       + 16.87% mod_timer
>>       + 0.05% cursor_timer_handler
>> -  12.15%  swapper  [kernel.kallsyms]  [k] queue_work_on
>>    - queue_work_on
>>       + 12.00% cursor_timer_handler
>>       + 0.15% call_timer_fn
>> +  10.98%  swapper  [kernel.kallsyms]  [k] run_timer_softirq
>> -   2.23%  swapper  [kernel.kallsyms]  [k] mod_timer
>>    - mod_timer
>>       + 1.97% cursor_timer_handler
>>       + 0.26% call_timer_fn
>>
>> During the same period, I can see that another CPU is actively
>> executing the timer function:
>>
>> -  42.18%  kworker/u96:2  [kernel.kallsyms]  [k] ww_mutex_unlock
>>    - ww_mutex_unlock
>>       - 40.70% ast_dirty_update
>>            ast_imageblit
>>            soft_cursor
>>            bit_cursor
>>            fb_flashcursor
>>            process_one_work
>>            worker_thread
>>            kthread
>>            ret_from_fork
>>       + 1.48% ast_imageblit
>> -  40.15%  kworker/u96:2  [kernel.kallsyms]  [k] __memcpy_toio
>>    - __memcpy_toio
>>       + 31.54% ast_dirty_update
>>       + 8.61% ast_imageblit
>>
>> Using the graph function tracer on fb_flashcursor(), I see that
>> ast_dirty_update usually takes around 60 us, in which it makes 16
>> calls to __memcpy_toio(). However, there is always one instance on
>> every boot of the installer where ast_dirty_update() takes ~98 *ms* to
>> complete, during which it makes 743 calls to __memcpy_toio(). While
>> that doesn't directly account for the full 22s, I do wonder if that
>> maybe a smoking gun.
>>
>> fyi, this is being tracked at: https://bugs.launchpad.net/bugs/1574814
>>
>>  -dann
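A minimal sketch of the livelock Ming describes, plus one defensive way out. The function and field names follow fbcon of that era (drivers/video/console/fbcon.c), but the 200 ms fallback is an assumption for illustration, not necessarily the fix that eventually went upstream.

#include <linux/fb.h>
#include <linux/jiffies.h>
#include <linux/timer.h>
#include <linux/workqueue.h>

/* struct fbcon_ops is fbcon-internal; only the two fields used here are shown */
struct fbcon_ops {
	struct timer_list cursor_timer;
	u32 cur_blink_jiffies;
};

static void cursor_timer_handler(unsigned long dev_addr)
{
	struct fb_info *info = (struct fb_info *)dev_addr;
	struct fbcon_ops *ops = info->fbcon_par;

	schedule_work(&info->queue);	/* blink the cursor from process context */

	/*
	 * If fbcon_init() sampled vc->vc_cur_blink_ms before vc_init() ever
	 * set it, cur_blink_jiffies is 0 and the mod_timer() below re-arms
	 * the timer for "now"; the timer softirq then never stops running
	 * this callback, which is the reported soft lockup.  Falling back
	 * to a sane default breaks that loop.
	 */
	if (!ops->cur_blink_jiffies)
		ops->cur_blink_jiffies = msecs_to_jiffies(200);

	mod_timer(&ops->cursor_timer, jiffies + ops->cur_blink_jiffies);
}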
Re: [Nouveau] [PATCH] drm/nouveau: fix vblank deadlock
On 08/12/2013 07:50 AM, Maarten Lankhorst wrote: This fixes a deadlock inversion when vblank is enabled/disabled by drm. &dev->vblank_time_lock is always taken when the vblank state is toggled, which caused a deadlock when &event->lock was also taken during event_get/put. Solve the race by requiring that lock to change enable/disable state, and always keeping vblank on the event list. Core drm ignores unwanted vblanks, so extra calls to drm_handle_vblank are harmless. I don't feel this is the appropriate solution to the lock inversion between vblank_time_lock and event->lock. Preferably drm core should correct the interface layer bug; ie., calling into a sub-driver holding a lock _and_ requiring the sub-driver to call a drm helper function which claims the same lock is bad design. The console lock suffers from the same design flaw and is a constant problem. Alternatively, the event trigger could be lockless; ie., the event list could be an RCU list instead. In this way, the event->lock does not need to be claimed, and thus no lock inversion is possible. The main drawback here is that currently the event->lock enforces non-overlapping lifetimes between the event handler and the event. Untangling object lifetimes in nouveau is a non-trivial exercise. Regards, Peter Hurley Signed-off-by: Maarten Lankhorst --- diff --git a/drivers/gpu/drm/nouveau/core/core/event.c b/drivers/gpu/drm/nouveau/core/core/event.c index 7eb81c1..78bff7c 100644 --- a/drivers/gpu/drm/nouveau/core/core/event.c +++ b/drivers/gpu/drm/nouveau/core/core/event.c @@ -23,43 +23,64 @@ #include #include -static void -nouveau_event_put_locked(struct nouveau_event *event, int index, -struct nouveau_eventh *handler) -{ - if (!--event->index[index].refs) { - if (event->disable) - event->disable(event, index); - } - list_del(&handler->head); -} - void nouveau_event_put(struct nouveau_event *event, int index, struct nouveau_eventh *handler) { unsigned long flags; + if (index >= event->index_nr) + return; + spin_lock_irqsave(&event->lock, flags); - if (index < event->index_nr) - nouveau_event_put_locked(event, index, handler); + list_del(&handler->head); + + if (event->toggle_lock) + spin_lock(event->toggle_lock); + nouveau_event_disable_locked(event, index, 1); + if (event->toggle_lock) + spin_unlock(event->toggle_lock); + spin_unlock_irqrestore(&event->lock, flags); } void +nouveau_event_enable_locked(struct nouveau_event *event, int index) +{ + if (index >= event->index_nr) + return; + + if (!event->index[index].refs++ && event->enable) + event->enable(event, index); +} + +void +nouveau_event_disable_locked(struct nouveau_event *event, int index, int refs) +{ + if (index >= event->index_nr) + return; + + event->index[index].refs -= refs; + if (!event->index[index].refs && event->disable) + event->disable(event, index); +} + +void nouveau_event_get(struct nouveau_event *event, int index, struct nouveau_eventh *handler) { unsigned long flags; + if (index >= event->index_nr) + return; + spin_lock_irqsave(&event->lock, flags); - if (index < event->index_nr) { - list_add(&handler->head, &event->index[index].list); - if (!event->index[index].refs++) { - if (event->enable) - event->enable(event, index); - } - } + list_add(&handler->head, &event->index[index].list); + if (event->toggle_lock) + spin_lock(event->toggle_lock); + nouveau_event_enable_locked(event, index); + if (event->toggle_lock) + spin_unlock(event->toggle_lock); spin_unlock_irqrestore(&event->lock, flags); } @@ -68,6 +89,7 @@ nouveau_event_trigger(struct nouveau_event 
*event, int index) { struct nouveau_eventh *handler, *temp; unsigned long flags; + int refs = 0; if (index >= event->index_nr) return; @@ -75,9 +97,17 @@ nouveau_event_trigger(struct nouveau_event *event, int index) spin_lock_irqsave(&event->lock, flags); list_for_each_entry_safe(handler, temp, &event->index[index].list, head) { if (handler->func(handler, index) == NVKM_EVENT_DROP) { - nouveau_event_put_locked(event, index, handler); + list_del(&handler->head); + refs++;
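A minimal sketch of the RCU alternative suggested in the review above: list updates stay under event->lock, but the trigger path becomes lockless, so it can no longer invert with drm's vblank_time_lock. Simplified, hypothetical types, not the nouveau structures verbatim.

#include <linux/list.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

struct eventh {
	struct list_head head;
	int (*func)(struct eventh *);
	struct rcu_head rcu;
};

struct event_index {
	struct list_head list;		/* handler list, RCU-protected */
};

/* install/remove stay serialized by the event lock */
static void event_handler_install(spinlock_t *lock, struct event_index *idx,
				  struct eventh *h)
{
	unsigned long flags;

	spin_lock_irqsave(lock, flags);
	list_add_rcu(&h->head, &idx->list);
	spin_unlock_irqrestore(lock, flags);
}

static void event_handler_remove(spinlock_t *lock, struct eventh *h)
{
	unsigned long flags;

	spin_lock_irqsave(lock, flags);
	list_del_rcu(&h->head);
	spin_unlock_irqrestore(lock, flags);
	kfree_rcu(h, rcu);		/* free only after lockless readers drain */
}

/* the trigger takes no lock at all, so no inversion is possible */
static void event_trigger(struct event_index *idx)
{
	struct eventh *h;

	rcu_read_lock();
	list_for_each_entry_rcu(h, &idx->list, head)
		h->func(h);		/* may call drm_handle_vblank() */
	rcu_read_unlock();
}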
[PATCH] drm: Use correct spinlock flavor in drm_vblank_get()
The irq flags state is already established by the outer spin_lock_irqsave(); re-disabling irqs is redundant. Signed-off-by: Peter Hurley --- drivers/gpu/drm/drm_irq.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c index f92da0a..c7d4af5 100644 --- a/drivers/gpu/drm/drm_irq.c +++ b/drivers/gpu/drm/drm_irq.c @@ -952,13 +952,13 @@ static void drm_update_vblank_count(struct drm_device *dev, int crtc) */ int drm_vblank_get(struct drm_device *dev, int crtc) { - unsigned long irqflags, irqflags2; + unsigned long irqflags; int ret = 0; spin_lock_irqsave(&dev->vbl_lock, irqflags); /* Going from 0->1 means we have to enable interrupts again */ if (atomic_add_return(1, &dev->vblank_refcount[crtc]) == 1) { - spin_lock_irqsave(&dev->vblank_time_lock, irqflags2); + spin_lock(&dev->vblank_time_lock); if (!dev->vblank_enabled[crtc]) { /* Enable vblank irqs under vblank_time_lock protection. * All vblank count & timestamp updates are held off @@ -976,7 +976,7 @@ int drm_vblank_get(struct drm_device *dev, int crtc) drm_update_vblank_count(dev, crtc); } } - spin_unlock_irqrestore(&dev->vblank_time_lock, irqflags2); + spin_unlock(&dev->vblank_time_lock); } else { if (!dev->vblank_enabled[crtc]) { atomic_dec(&dev->vblank_refcount[crtc]); -- 1.8.1.2 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
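The pattern the patch switches to, shown in isolation: once the outer lock is taken with spin_lock_irqsave(), local interrupts are already disabled, so the inner lock only needs the plain spin_lock()/spin_unlock() flavor. Illustrative sketch only, with generic lock names.

#include <linux/spinlock.h>

static DEFINE_SPINLOCK(outer);		/* e.g. dev->vbl_lock */
static DEFINE_SPINLOCK(inner);		/* e.g. dev->vblank_time_lock */

static void nested_locking(void)
{
	unsigned long flags;

	spin_lock_irqsave(&outer, flags);	/* disables local irqs */
	spin_lock(&inner);			/* no need to touch irq state again */
	/* ... critical section ... */
	spin_unlock(&inner);
	spin_unlock_irqrestore(&outer, flags);	/* restores the original irq state */
}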
[PATCH 1/9] drm/nouveau: Add priv field for event handlers
Provide private field for event handlers exclusive use. Convert nouveau_fence_wait_uevent() and nouveau_fence_wait_uevent_handler(); drop struct nouveau_fence_uevent. Signed-off-by: Peter Hurley --- drivers/gpu/drm/nouveau/core/include/core/event.h | 1 + drivers/gpu/drm/nouveau/nouveau_fence.c | 20 +++- 2 files changed, 8 insertions(+), 13 deletions(-) diff --git a/drivers/gpu/drm/nouveau/core/include/core/event.h b/drivers/gpu/drm/nouveau/core/include/core/event.h index 9e09440..ad4d8c2 100644 --- a/drivers/gpu/drm/nouveau/core/include/core/event.h +++ b/drivers/gpu/drm/nouveau/core/include/core/event.h @@ -7,6 +7,7 @@ struct nouveau_eventh { struct list_head head; + void *priv; int (*func)(struct nouveau_eventh *, int index); }; diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c index be31499..c2e3167 100644 --- a/drivers/gpu/drm/nouveau/nouveau_fence.c +++ b/drivers/gpu/drm/nouveau/nouveau_fence.c @@ -165,17 +165,11 @@ nouveau_fence_done(struct nouveau_fence *fence) return !fence->channel; } -struct nouveau_fence_uevent { - struct nouveau_eventh handler; - struct nouveau_fence_priv *priv; -}; - static int -nouveau_fence_wait_uevent_handler(struct nouveau_eventh *event, int index) +nouveau_fence_wait_uevent_handler(struct nouveau_eventh *handler, int index) { - struct nouveau_fence_uevent *uevent = - container_of(event, struct nouveau_fence_uevent, handler); - wake_up_all(&uevent->priv->waiting); + struct nouveau_fence_priv *priv = handler->priv; + wake_up_all(&priv->waiting); return NVKM_EVENT_KEEP; } @@ -186,13 +180,13 @@ nouveau_fence_wait_uevent(struct nouveau_fence *fence, bool intr) struct nouveau_channel *chan = fence->channel; struct nouveau_fifo *pfifo = nouveau_fifo(chan->drm->device); struct nouveau_fence_priv *priv = chan->drm->fence; - struct nouveau_fence_uevent uevent = { - .handler.func = nouveau_fence_wait_uevent_handler, + struct nouveau_eventh handler = { + .func = nouveau_fence_wait_uevent_handler, .priv = priv, }; int ret = 0; - nouveau_event_get(pfifo->uevent, 0, &uevent.handler); + nouveau_event_get(pfifo->uevent, 0, &handler); if (fence->timeout) { unsigned long timeout = fence->timeout - jiffies; @@ -224,7 +218,7 @@ nouveau_fence_wait_uevent(struct nouveau_fence *fence, bool intr) } } - nouveau_event_put(pfifo->uevent, 0, &uevent.handler); + nouveau_event_put(pfifo->uevent, 0, &handler); if (unlikely(ret < 0)) return ret; -- 1.8.1.2 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH 2/9] drm/nouveau: Move event index check from critical section
The index_nr field is constant for the lifetime of the event, so serialized access is unnecessary. Signed-off-by: Peter Hurley --- drivers/gpu/drm/nouveau/core/core/event.c | 19 +++ 1 file changed, 11 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/nouveau/core/core/event.c b/drivers/gpu/drm/nouveau/core/core/event.c index 7eb81c1..e69c463 100644 --- a/drivers/gpu/drm/nouveau/core/core/event.c +++ b/drivers/gpu/drm/nouveau/core/core/event.c @@ -40,9 +40,11 @@ nouveau_event_put(struct nouveau_event *event, int index, { unsigned long flags; + if (index >= event->index_nr) + return; + spin_lock_irqsave(&event->lock, flags); - if (index < event->index_nr) - nouveau_event_put_locked(event, index, handler); + nouveau_event_put_locked(event, index, handler); spin_unlock_irqrestore(&event->lock, flags); } @@ -52,13 +54,14 @@ nouveau_event_get(struct nouveau_event *event, int index, { unsigned long flags; + if (index >= event->index_nr) + return; + spin_lock_irqsave(&event->lock, flags); - if (index < event->index_nr) { - list_add(&handler->head, &event->index[index].list); - if (!event->index[index].refs++) { - if (event->enable) - event->enable(event, index); - } + list_add(&handler->head, &event->index[index].list); + if (!event->index[index].refs++) { + if (event->enable) + event->enable(event, index); } spin_unlock_irqrestore(&event->lock, flags); } -- 1.8.1.2 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH 0/9] drm/nouveau: Cleanup event/handler design
This series was originally motivated by a deadlock, introduced in
commit 1d7c71a3e2f77336df536855b0efd2dc5bdeb41b 'drm/nouveau/disp: port vblank
handling to event interface', due to inverted lock order between
nouveau_drm_vblank_enable() and nouveau_drm_vblank_handler() (the complete
lockdep report is included in the patch 4/5 changelog).

Because this series fixes the vblank event deadlock, it is a competing
solution to Maarten Lankhorst's 'drm/nouveau: fix vblank deadlock'.

This series fixes the vblank event deadlock by converting the event handler
list to RCU. This solution allows the event trigger to be lockless, thus
avoiding the lock inversion.

Typical hurdles to RCU conversion include:
1) ensuring the object lifetime exceeds the lockless access;
2) preventing premature object reuse; and
3) verifying that stale object use does not present logical errors.

Because object reuse is not safe in RCU-based operations, the
nouveau_event_get/_put interface is migrated from add/remove semantics to
enable/disable semantics with a separate install/remove step (which also
serves to document the handler lifetime). This also corrects an unsafe
interface design where handlers can mistakenly be reused while in use.

The nouveau driver currently supports 4 events -- gpio, uevent, cevent, and
vblank. Every event is created by and stored within its respective
subdev/engine object -- gpio in the GPIO subdev, uevent and cevent in the
FIFO engine, and vblank in the DISP engine. Thus event lifetime extends until
the subdev is destructed during devobj teardown.

Event handler lifetime varies and is detailed in the table below (apologies
for the wide format):

Event   Handler function                   Container                     Until
-----   ----------------                   ---------                     -----
gpio    nouveau_connector_hotplug          struct nouveau_connector      nouveau_connector_destroy
uevent  nouveau_fence_wait_uevent_handler  local stack object            nouveau_fence_wait_uevent returns
cevent  none                               n/a                           n/a
vblank  nouveau_drm_vblank_handler         struct nouveau_drm            nouveau_drm_remove
vblank  nv50_software_vblsem_release       struct nouveau_software_chan  _nouveau_engctx_dtor
                                                                         (call stack originates with
                                                                          nouveau_abi16_chan_free ioctl)
vblank  nvc0_software_vblsem_release       struct nouveau_software_chan  same as above

RCU lifetime considerations for handlers:

Event   Handler                            Lifetime
-----   -------                            --------
gpio    nouveau_connector_hotplug          kfree_rcu(nv_connector)
uevent  nouveau_fence_wait_uevent_handler  explicit use of nouveau_event_handler_create/_destroy
cevent  none                               n/a
vblank  nouveau_drm_vblank_handler         synchronize_rcu() in nouveau_drm_unload
vblank  nv50_software_vblsem_release       synchronize_rcu() in container dtor
vblank  nvc0_software_vblsem_release       synchronize_rcu() in container dtor

synchronize_rcu() is used for nouveau_object-based containers for which
kfree_rcu() is not suitable/possible.

Stale event handler execution is not a concern for the existing handlers
because either:
1) the handler is responsible for disabling itself (via returning
   NVKM_EVENT_DROP), or
2) the existing handler can already be stale, as is the case with
   nouveau_connector_hotplug, which only schedules a work item, and
   nouveau_drm_vblank_handler, which the drm core expects may be stale.
Peter Hurley (9):
  drm/nouveau: Add priv field for event handlers
  drm/nouveau: Move event index check from critical section
  drm/nouveau: Allocate local event handlers
  drm/nouveau: Allow asymmetric nouveau_event_get/_put
  drm/nouveau: Add install/remove semantics for event handlers
  drm/nouveau: Convert event handler list to RCU
  drm/nouveau: Fold nouveau_event_put_locked into caller
  drm/nouveau: Simplify event interface
  drm/nouveau: Simplify event handler interface

 drivers/gpu/drm/nouveau/core/core/event.c           | 121 +
 .../gpu/drm/nouveau/core/engine/software/nv50.c     |  32 --
 .../gpu/drm/nouveau/core/engine/software/nvc0.c     |  32 --
 drivers/gpu/drm/nouveau/core/include/core/event.h   |  27 -
 .../gpu/drm/nouveau/core/include/engine/software.h  |   2 +-
 drivers/gpu/drm/nouveau/nouveau_connector.c         |  16 ++-
 drivers/gpu/drm/nouveau/nouveau_display.c           |  16 +--
 drivers/gpu/drm/nouveau/nouveau_drm.c               |  35 +++---
 drivers/gpu/drm/nouveau/nouveau_fence.c             |  27 ++---
 9 files changed, 220 insertions(+), 88 deletions(-)

--
1.8.1.2

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://list
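The two RCU teardown patterns referred to in the cover letter's lifetime table, in miniature; the types below are hypothetical stand-ins for the nouveau containers.

#include <linux/list.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

struct handler_a {		/* handler embedded in a plain kmalloc'd object */
	struct list_head head;
	struct rcu_head rcu;
};

struct handler_b {		/* handler embedded in an object with its own dtor */
	struct list_head head;
};

/* Pattern 1: the container is freed with kfree(), so kfree_rcu() suffices. */
static void teardown_a(spinlock_t *lock, struct handler_a *h)
{
	spin_lock_irq(lock);
	list_del_rcu(&h->head);
	spin_unlock_irq(lock);
	kfree_rcu(h, rcu);	/* deferred free after a grace period */
}

/*
 * Pattern 2: the container's destructor does more than kfree() (e.g. a
 * nouveau_object dtor), so block for a grace period instead and then let
 * the normal destructor run.
 */
static void teardown_b(spinlock_t *lock, struct handler_b *h)
{
	spin_lock_irq(lock);
	list_del_rcu(&h->head);
	spin_unlock_irq(lock);
	synchronize_rcu();	/* wait for lockless readers to drain */
	/* the container dtor may now free or reuse h safely */
}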
[PATCH 3/9] drm/nouveau: Allocate local event handlers
Prepare for transition to RCU-based event handler list; since RCU list traversal may have stale pointers, local storage may go out of scope before handler completes. Introduce nouveau_event_handler_create/_destroy which provides suitable semantics for multiple, temporary event handlers. Signed-off-by: Peter Hurley --- drivers/gpu/drm/nouveau/core/core/event.c | 24 +++ drivers/gpu/drm/nouveau/core/include/core/event.h | 6 ++ drivers/gpu/drm/nouveau/nouveau_fence.c | 15 +++--- 3 files changed, 38 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/nouveau/core/core/event.c b/drivers/gpu/drm/nouveau/core/core/event.c index e69c463..1a8d685 100644 --- a/drivers/gpu/drm/nouveau/core/core/event.c +++ b/drivers/gpu/drm/nouveau/core/core/event.c @@ -23,6 +23,30 @@ #include #include +int +nouveau_event_handler_create(struct nouveau_event *event, int index, +int (*func)(struct nouveau_eventh*, int), +void *priv, struct nouveau_eventh **phandler) +{ + struct nouveau_eventh *handler; + + handler = *phandler = kzalloc(sizeof(*handler), GFP_KERNEL); + if (!handler) + return -ENOMEM; + handler->func = func; + handler->priv = priv; + nouveau_event_get(event, index, handler); + return 0; +} + +void +nouveau_event_handler_destroy(struct nouveau_event *event, int index, + struct nouveau_eventh *handler) +{ + nouveau_event_put(event, index, handler); + kfree(handler); +} + static void nouveau_event_put_locked(struct nouveau_event *event, int index, struct nouveau_eventh *handler) diff --git a/drivers/gpu/drm/nouveau/core/include/core/event.h b/drivers/gpu/drm/nouveau/core/include/core/event.h index ad4d8c2..bdf1a0a 100644 --- a/drivers/gpu/drm/nouveau/core/include/core/event.h +++ b/drivers/gpu/drm/nouveau/core/include/core/event.h @@ -34,4 +34,10 @@ void nouveau_event_get(struct nouveau_event *, int index, void nouveau_event_put(struct nouveau_event *, int index, struct nouveau_eventh *); +int nouveau_event_handler_create(struct nouveau_event *, int index, +int (*func)(struct nouveau_eventh*, int), +void *priv, struct nouveau_eventh **); +void nouveau_event_handler_destroy(struct nouveau_event *, int index, + struct nouveau_eventh *); + #endif diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c index c2e3167..6dde483 100644 --- a/drivers/gpu/drm/nouveau/nouveau_fence.c +++ b/drivers/gpu/drm/nouveau/nouveau_fence.c @@ -180,13 +180,14 @@ nouveau_fence_wait_uevent(struct nouveau_fence *fence, bool intr) struct nouveau_channel *chan = fence->channel; struct nouveau_fifo *pfifo = nouveau_fifo(chan->drm->device); struct nouveau_fence_priv *priv = chan->drm->fence; - struct nouveau_eventh handler = { - .func = nouveau_fence_wait_uevent_handler, - .priv = priv, - }; - int ret = 0; + struct nouveau_eventh *handler; + int ret; - nouveau_event_get(pfifo->uevent, 0, &handler); + ret = nouveau_event_handler_create(pfifo->uevent, 0, + nouveau_fence_wait_uevent_handler, + priv, &handler); + if (ret) + return ret; if (fence->timeout) { unsigned long timeout = fence->timeout - jiffies; @@ -218,7 +219,7 @@ nouveau_fence_wait_uevent(struct nouveau_fence *fence, bool intr) } } - nouveau_event_put(pfifo->uevent, 0, &handler); + nouveau_event_handler_destroy(pfifo->uevent, 0, handler); if (unlikely(ret < 0)) return ret; -- 1.8.1.2 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH 4/9] drm/nouveau: Allow asymmetric nouveau_event_get/_put
Most nouveau event handlers have storage in 'static' containers (structures with lifetimes nearly equivalent to the drm_device), but are dangerously reused via nouveau_event_get/_put. For example, if nouveau_event_get is called more than once for a given handler, the event handler list will be corrupted. Migrate nouveau_event_get/_put from add/remove semantics to enable/disable semantics. Signed-off-by: Peter Hurley --- drivers/gpu/drm/nouveau/core/core/event.c | 20 drivers/gpu/drm/nouveau/core/include/core/event.h | 4 drivers/gpu/drm/nouveau/nouveau_drm.c | 8 ++-- 3 files changed, 18 insertions(+), 14 deletions(-) diff --git a/drivers/gpu/drm/nouveau/core/core/event.c b/drivers/gpu/drm/nouveau/core/core/event.c index 1a8d685..0a65ede 100644 --- a/drivers/gpu/drm/nouveau/core/core/event.c +++ b/drivers/gpu/drm/nouveau/core/core/event.c @@ -51,11 +51,13 @@ static void nouveau_event_put_locked(struct nouveau_event *event, int index, struct nouveau_eventh *handler) { - if (!--event->index[index].refs) { - if (event->disable) - event->disable(event, index); + if (__test_and_clear_bit(NVKM_EVENT_ENABLE, &handler->flags)) { + if (!--event->index[index].refs) { + if (event->disable) + event->disable(event, index); + } + list_del(&handler->head); } - list_del(&handler->head); } void @@ -82,10 +84,12 @@ nouveau_event_get(struct nouveau_event *event, int index, return; spin_lock_irqsave(&event->lock, flags); - list_add(&handler->head, &event->index[index].list); - if (!event->index[index].refs++) { - if (event->enable) - event->enable(event, index); + if (!__test_and_set_bit(NVKM_EVENT_ENABLE, &handler->flags)) { + list_add(&handler->head, &event->index[index].list); + if (!event->index[index].refs++) { + if (event->enable) + event->enable(event, index); + } } spin_unlock_irqrestore(&event->lock, flags); } diff --git a/drivers/gpu/drm/nouveau/core/include/core/event.h b/drivers/gpu/drm/nouveau/core/include/core/event.h index bdf1a0a..3e704d5 100644 --- a/drivers/gpu/drm/nouveau/core/include/core/event.h +++ b/drivers/gpu/drm/nouveau/core/include/core/event.h @@ -5,8 +5,12 @@ #define NVKM_EVENT_DROP 0 #define NVKM_EVENT_KEEP 1 +/* nouveau_eventh.flags bit #s */ +#define NVKM_EVENT_ENABLE 0 + struct nouveau_eventh { struct list_head head; + unsigned long flags; void *priv; int (*func)(struct nouveau_eventh *, int index); }; diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c index b29d04b..ccea2c4 100644 --- a/drivers/gpu/drm/nouveau/nouveau_drm.c +++ b/drivers/gpu/drm/nouveau/nouveau_drm.c @@ -88,7 +88,6 @@ nouveau_drm_vblank_enable(struct drm_device *dev, int head) if (WARN_ON_ONCE(head > ARRAY_SIZE(drm->vblank))) return -EIO; - WARN_ON_ONCE(drm->vblank[head].func); drm->vblank[head].func = nouveau_drm_vblank_handler; nouveau_event_get(pdisp->vblank, head, &drm->vblank[head]); return 0; @@ -99,11 +98,8 @@ nouveau_drm_vblank_disable(struct drm_device *dev, int head) { struct nouveau_drm *drm = nouveau_drm(dev); struct nouveau_disp *pdisp = nouveau_disp(drm->device); - if (drm->vblank[head].func) - nouveau_event_put(pdisp->vblank, head, &drm->vblank[head]); - else - WARN_ON_ONCE(1); - drm->vblank[head].func = NULL; + + nouveau_event_put(pdisp->vblank, head, &drm->vblank[head]); } static u64 -- 1.8.1.2 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
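A self-contained illustration of the failure mode described above and of the guard the patch introduces: list_add() on a node that is already linked corrupts the list, while a flag bit makes enable/disable idempotent. Simplified, hypothetical types; in the patch the caller holds event->lock, which is why the non-atomic __test_and_set_bit()/__test_and_clear_bit() variants are sufficient.

#include <linux/bitops.h>
#include <linux/list.h>

struct handler {
	struct list_head head;
	unsigned long flags;
};

#define HANDLER_ENABLED 0	/* flags bit # */

/* callers are assumed to hold the list's lock, as in the patch */
static void handler_enable(struct list_head *list, struct handler *h)
{
	/* a second enable becomes a no-op instead of re-linking the node */
	if (!__test_and_set_bit(HANDLER_ENABLED, &h->flags))
		list_add(&h->head, list);
}

static void handler_disable(struct handler *h)
{
	/* disabling an already-disabled handler is likewise harmless */
	if (__test_and_clear_bit(HANDLER_ENABLED, &h->flags))
		list_del(&h->head);
}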
[PATCH 5/9] drm/nouveau: Add install/remove semantics for event handlers
Complete migration of nouveau_event_get/_put from add/remove semantics to enable/disable semantics. Introduce nouveau_event_handler_install/_remove interface to add/remove non-local-scope event handlers (ie., those stored in parent containers). This change in semantics makes explicit the handler lifetime, and distinguishes "one-of" event handlers (such as gpio) from "many temporary" event handlers (such as uevent). Signed-off-by: Peter Hurley --- drivers/gpu/drm/nouveau/core/core/event.c | 63 +++--- .../gpu/drm/nouveau/core/engine/software/nv50.c| 31 +-- .../gpu/drm/nouveau/core/engine/software/nvc0.c| 31 +-- drivers/gpu/drm/nouveau/core/include/core/event.h | 6 +++ .../gpu/drm/nouveau/core/include/engine/software.h | 2 +- drivers/gpu/drm/nouveau/nouveau_connector.c| 10 +++- drivers/gpu/drm/nouveau/nouveau_drm.c | 17 +- 7 files changed, 140 insertions(+), 20 deletions(-) diff --git a/drivers/gpu/drm/nouveau/core/core/event.c b/drivers/gpu/drm/nouveau/core/core/event.c index 0a65ede..4cd1ebe 100644 --- a/drivers/gpu/drm/nouveau/core/core/event.c +++ b/drivers/gpu/drm/nouveau/core/core/event.c @@ -23,19 +23,60 @@ #include #include +void +nouveau_event_handler_install(struct nouveau_event *event, int index, + int (*func)(struct nouveau_eventh*, int), + void *priv, struct nouveau_eventh *handler) +{ + unsigned long flags; + + if (index >= event->index_nr) + return; + + handler->func = func; + handler->priv = priv; + + spin_lock_irqsave(&event->lock, flags); + list_add(&handler->head, &event->index[index].list); + spin_unlock_irqrestore(&event->lock, flags); +} + +void +nouveau_event_handler_remove(struct nouveau_event *event, int index, +struct nouveau_eventh *handler) +{ + unsigned long flags; + + if (index >= event->index_nr) + return; + + spin_lock_irqsave(&event->lock, flags); + list_del(&handler->head); + spin_unlock_irqrestore(&event->lock, flags); +} + int nouveau_event_handler_create(struct nouveau_event *event, int index, int (*func)(struct nouveau_eventh*, int), void *priv, struct nouveau_eventh **phandler) { struct nouveau_eventh *handler; + unsigned long flags; handler = *phandler = kzalloc(sizeof(*handler), GFP_KERNEL); if (!handler) return -ENOMEM; handler->func = func; handler->priv = priv; - nouveau_event_get(event, index, handler); + __set_bit(NVKM_EVENT_ENABLE, &handler->flags); + + spin_lock_irqsave(&event->lock, flags); + list_add(&handler->head, &event->index[index].list); + if (!event->index[index].refs++) { + if (event->enable) + event->enable(event, index); + } + spin_unlock_irqrestore(&event->lock, flags); return 0; } @@ -43,7 +84,18 @@ void nouveau_event_handler_destroy(struct nouveau_event *event, int index, struct nouveau_eventh *handler) { - nouveau_event_put(event, index, handler); + unsigned long flags; + + if (index >= event->index_nr) + return; + + spin_lock_irqsave(&event->lock, flags); + if (!--event->index[index].refs) { + if (event->disable) + event->disable(event, index); + } + list_del(&handler->head); + spin_unlock_irqrestore(&event->lock, flags); kfree(handler); } @@ -56,7 +108,6 @@ nouveau_event_put_locked(struct nouveau_event *event, int index, if (event->disable) event->disable(event, index); } - list_del(&handler->head); } } @@ -85,7 +136,6 @@ nouveau_event_get(struct nouveau_event *event, int index, spin_lock_irqsave(&event->lock, flags); if (!__test_and_set_bit(NVKM_EVENT_ENABLE, &handler->flags)) { - list_add(&handler->head, &event->index[index].list); if (!event->index[index].refs++) { if (event->enable) event->enable(event, index); @@ -105,8 
+155,9 @@ nouveau_event_trigger(struct nouveau_event *event, int index) spin_lock_irqsave(&event->lock, flags); list_for_each_entry_safe(handler, temp, &event->index[index].list, head) { - if (handler->func(handler, index) == NVKM_EVENT_DROP) { - nouveau_event_put_locked(event, i
[PATCH 6/9] drm/nouveau: Convert event handler list to RCU
Lockdep report [1] correctly identifies a potential deadlock between nouveau_drm_vblank_enable() and nouveau_drm_vblank_handler() due to inverted lock order of event->lock with drm core's dev->vblank_time_lock. Fix vblank event deadlock by converting event handler list to RCU. List updates remain serialized by event->lock. List traversal & handler execution is lockless which prevents inverted lock order problems with the drm core. Safe deletion of the event handler is accomplished with synchronize_rcu() for those handlers stored in nouveau_object-based containers (nouveau_drm & nv50_/nvc0_software_context); otherwise, with kfree_rcu (for nouveau_connector & uevent temporary handlers). [1] == [ INFO: possible circular locking dependency detected ] 3.10.0-0+tip-xeon+lockdep #0+tip Not tainted --- swapper/7/0 is trying to acquire lock: (&(&dev->vblank_time_lock)->rlock){-.}, at: [] drm_handle_vblank+0x69/0x400 [drm] but task is already holding lock: (&(&event->lock)->rlock){-.}, at: [] nouveau_event_trigger+0x4d/0xd0 [nouveau] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&(&event->lock)->rlock){-.}: [] lock_acquire+0x92/0x1f0 [] _raw_spin_lock_irqsave+0x46/0x60 [] nouveau_event_get+0x27/0xa0 [nouveau] [] nouveau_drm_vblank_enable+0x8d/0xf0 [nouveau] [] drm_vblank_get+0xf8/0x2c0 [drm] [] drm_wait_vblank+0x84/0x6f0 [drm] [] drm_ioctl+0x559/0x690 [drm] [] do_vfs_ioctl+0x97/0x590 [] SyS_ioctl+0x91/0xb0 [] system_call_fastpath+0x16/0x1b -> #0 (&(&dev->vblank_time_lock)->rlock){-.}: [] __lock_acquire+0x1c43/0x1d30 [] lock_acquire+0x92/0x1f0 [] _raw_spin_lock_irqsave+0x46/0x60 [] drm_handle_vblank+0x69/0x400 [drm] [] nouveau_drm_vblank_handler+0x27/0x30 [nouveau] [] nouveau_event_trigger+0x90/0xd0 [nouveau] [] nv50_disp_intr+0xdd/0x230 [nouveau] [] nouveau_mc_intr+0xa1/0x100 [nouveau] [] handle_irq_event_percpu+0x6c/0x3d0 [] handle_irq_event+0x48/0x70 [] handle_fasteoi_irq+0x5a/0x100 [] handle_irq+0x22/0x40 [] do_IRQ+0x5a/0xd0 [] ret_from_intr+0x0/0x13 [] arch_cpu_idle+0x26/0x30 [] cpu_startup_entry+0xce/0x410 [] start_secondary+0x255/0x25c other info that might help us debug this: Possible unsafe locking scenario: CPU0CPU1 lock(&(&event->lock)->rlock); lock(&(&dev->vblank_time_lock)->rlock); lock(&(&event->lock)->rlock); lock(&(&dev->vblank_time_lock)->rlock); *** DEADLOCK *** 1 lock held by swapper/7/0: #0: (&(&event->lock)->rlock){-.}, at: [] nouveau_event_trigger+0x4d/0xd0 [nouveau] stack backtrace: CPU: 7 PID: 0 Comm: swapper/7 Not tainted 3.10.0-0+tip-xeon+lockdep #0+tip Hardware name: Dell Inc. Precision WorkStation T5400 /0RW203, BIOS A11 04/30/2012 821ccf60 8802bfdc3b08 81755f39 8802bfdc3b58 8174f526 0002 8802bfdc3be8 8802b330a7f8 8802b330a820 8802b330a7f8 0001 Call Trace: [] dump_stack+0x19/0x1b [] print_circular_bug+0x1fb/0x20c [] __lock_acquire+0x1c43/0x1d30 [] ? trace_hardirqs_off+0xd/0x10 [] ? flat_send_IPI_mask+0x88/0xa0 [] lock_acquire+0x92/0x1f0 [] ? drm_handle_vblank+0x69/0x400 [drm] [] _raw_spin_lock_irqsave+0x46/0x60 [] ? drm_handle_vblank+0x69/0x400 [drm] [] drm_handle_vblank+0x69/0x400 [drm] [] nouveau_drm_vblank_handler+0x27/0x30 [nouveau] [] nouveau_event_trigger+0x90/0xd0 [nouveau] [] nv50_disp_intr+0xdd/0x230 [nouveau] [] ? _raw_spin_unlock_irqrestore+0x42/0x80 [] ? __hrtimer_start_range_ns+0x16c/0x530 [] nouveau_mc_intr+0xa1/0x100 [nouveau] [] handle_irq_event_percpu+0x6c/0x3d0 [] handle_irq_event+0x48/0x70 [] ? 
handle_fasteoi_irq+0x1e/0x100 [] handle_fasteoi_irq+0x5a/0x100 [] handle_irq+0x22/0x40 [] do_IRQ+0x5a/0xd0 [] common_interrupt+0x6f/0x6f [] ? default_idle+0x1d/0x290 [] ? default_idle+0x1b/0x290 [] arch_cpu_idle+0x26/0x30 [] cpu_startup_entry+0xce/0x410 [] ? clockevents_register_device+0xdc/0x140 [] start_secondary+0x255/0x25c Reported-by: Dave Jones Signed-off-by: Peter Hurley --- drivers/gpu/drm/nouveau/core/core/event.c | 22 +++--- .../gpu/drm/nouveau/core/engine/software/nv50.c| 1 + .../gpu/drm/nouveau/core/engine/software/nvc0.c| 1 + drivers/gpu/drm/nouveau/core/include/core/event.h | 1 + drivers/gpu/drm/nouveau/nouveau_connector.c| 2 +- drivers/gpu/drm/nouveau/nouveau_drm.c | 1 + 6 files changed, 16 insertion
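For readers skimming the lockdep output above, the inversion reduces to two code paths taking the same pair of locks in opposite order. A compact sketch, with generic lock names standing in for drm's vblank_time_lock and nouveau's event->lock:

#include <linux/spinlock.h>

static DEFINE_SPINLOCK(vblank_time_lock);	/* lock A (drm core) */
static DEFINE_SPINLOCK(event_lock);		/* lock B (nouveau event) */

/* ioctl path: drm_vblank_get() takes vblank_time_lock, then the driver's
 * vblank-enable hook calls nouveau_event_get(), which takes event->lock */
static void enable_path(void)
{
	unsigned long flags;

	spin_lock_irqsave(&vblank_time_lock, flags);	/* A */
	spin_lock(&event_lock);				/* then B */
	spin_unlock(&event_lock);
	spin_unlock_irqrestore(&vblank_time_lock, flags);
}

/* irq path: nouveau_event_trigger() takes event->lock, then the handler
 * calls drm_handle_vblank(), which takes vblank_time_lock */
static void trigger_path(void)
{
	unsigned long flags;

	spin_lock_irqsave(&event_lock, flags);		/* B */
	spin_lock(&vblank_time_lock);			/* then A: AB vs BA deadlock */
	spin_unlock(&vblank_time_lock);
	spin_unlock_irqrestore(&event_lock, flags);
}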
[PATCH 7/9] drm/nouveau: Fold nouveau_event_put_locked into caller
nouveau_event_put_locked() only has 1 call site; fold into caller. Signed-off-by: Peter Hurley --- drivers/gpu/drm/nouveau/core/core/event.c | 19 ++- 1 file changed, 6 insertions(+), 13 deletions(-) diff --git a/drivers/gpu/drm/nouveau/core/core/event.c b/drivers/gpu/drm/nouveau/core/core/event.c index ce0a0ef..45bcb37 100644 --- a/drivers/gpu/drm/nouveau/core/core/event.c +++ b/drivers/gpu/drm/nouveau/core/core/event.c @@ -100,18 +100,6 @@ nouveau_event_handler_destroy(struct nouveau_event *event, int index, kfree_rcu(handler, rcu); } -static void -nouveau_event_put_locked(struct nouveau_event *event, int index, -struct nouveau_eventh *handler) -{ - if (__test_and_clear_bit(NVKM_EVENT_ENABLE, &handler->flags)) { - if (!--event->index[index].refs) { - if (event->disable) - event->disable(event, index); - } - } -} - void nouveau_event_put(struct nouveau_event *event, int index, struct nouveau_eventh *handler) @@ -122,7 +110,12 @@ nouveau_event_put(struct nouveau_event *event, int index, return; spin_lock_irqsave(&event->lock, flags); - nouveau_event_put_locked(event, index, handler); + if (__test_and_clear_bit(NVKM_EVENT_ENABLE, &handler->flags)) { + if (!--event->index[index].refs) { + if (event->disable) + event->disable(event, index); + } + } spin_unlock_irqrestore(&event->lock, flags); } -- 1.8.1.2 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH 8/9] drm/nouveau: Simplify event interface
Store event back-pointer and index within struct event_handler; remove superfluous parameters when event_handler is supplied. Signed-off-by: Peter Hurley --- drivers/gpu/drm/nouveau/core/core/event.c | 36 +- .../gpu/drm/nouveau/core/engine/software/nv50.c| 11 ++- .../gpu/drm/nouveau/core/engine/software/nvc0.c| 11 ++- drivers/gpu/drm/nouveau/core/include/core/event.h | 15 + drivers/gpu/drm/nouveau/nouveau_connector.c| 5 +-- drivers/gpu/drm/nouveau/nouveau_display.c | 16 +++--- drivers/gpu/drm/nouveau/nouveau_drm.c | 10 ++ drivers/gpu/drm/nouveau/nouveau_fence.c| 2 +- 8 files changed, 44 insertions(+), 62 deletions(-) diff --git a/drivers/gpu/drm/nouveau/core/core/event.c b/drivers/gpu/drm/nouveau/core/core/event.c index 45bcb37..b7d8ae1 100644 --- a/drivers/gpu/drm/nouveau/core/core/event.c +++ b/drivers/gpu/drm/nouveau/core/core/event.c @@ -34,6 +34,9 @@ nouveau_event_handler_install(struct nouveau_event *event, int index, if (index >= event->index_nr) return; + handler->event = event; + handler->index = index; + handler->func = func; handler->priv = priv; @@ -43,12 +46,12 @@ nouveau_event_handler_install(struct nouveau_event *event, int index, } void -nouveau_event_handler_remove(struct nouveau_event *event, int index, -struct nouveau_eventh *handler) +nouveau_event_handler_remove(struct nouveau_eventh *handler) { + struct nouveau_event *event = handler->event; unsigned long flags; - if (index >= event->index_nr) + if (!event) return; spin_lock_irqsave(&event->lock, flags); @@ -67,6 +70,10 @@ nouveau_event_handler_create(struct nouveau_event *event, int index, handler = *phandler = kzalloc(sizeof(*handler), GFP_KERNEL); if (!handler) return -ENOMEM; + + handler->event = event; + handler->index = index; + handler->func = func; handler->priv = priv; __set_bit(NVKM_EVENT_ENABLE, &handler->flags); @@ -82,13 +89,12 @@ nouveau_event_handler_create(struct nouveau_event *event, int index, } void -nouveau_event_handler_destroy(struct nouveau_event *event, int index, - struct nouveau_eventh *handler) +nouveau_event_handler_destroy(struct nouveau_eventh *handler) { + struct nouveau_event *event = handler->event; + int index = handler->index; unsigned long flags; - if (index >= event->index_nr) - return; spin_lock_irqsave(&event->lock, flags); if (!--event->index[index].refs) { @@ -101,12 +107,13 @@ nouveau_event_handler_destroy(struct nouveau_event *event, int index, } void -nouveau_event_put(struct nouveau_event *event, int index, - struct nouveau_eventh *handler) +nouveau_event_put(struct nouveau_eventh *handler) { + struct nouveau_event *event = handler->event; + int index = handler->index; unsigned long flags; - if (index >= event->index_nr) + if (!event) return; spin_lock_irqsave(&event->lock, flags); @@ -120,12 +127,13 @@ nouveau_event_put(struct nouveau_event *event, int index, } void -nouveau_event_get(struct nouveau_event *event, int index, - struct nouveau_eventh *handler) +nouveau_event_get(struct nouveau_eventh *handler) { + struct nouveau_event *event = handler->event; + int index = handler->index; unsigned long flags; - if (index >= event->index_nr) + if (!event) return; spin_lock_irqsave(&event->lock, flags); @@ -150,7 +158,7 @@ nouveau_event_trigger(struct nouveau_event *event, int index) list_for_each_entry_rcu(handler, &event->index[index].list, head) { if (test_bit(NVKM_EVENT_ENABLE, &handler->flags)) { if (handler->func(handler, index) == NVKM_EVENT_DROP) - nouveau_event_put(event, index, handler); + nouveau_event_put(handler); } } rcu_read_unlock(); diff --git 
a/drivers/gpu/drm/nouveau/core/engine/software/nv50.c b/drivers/gpu/drm/nouveau/core/engine/software/nv50.c index 87aeee1..e969f0c 100644 --- a/drivers/gpu/drm/nouveau/core/engine/software/nv50.c +++ b/drivers/gpu/drm/nouveau/core/engine/software/nv50.c @@ -92,12 +92,11 @@ nv50_software_mthd_vblsem_release(struct nouveau_object *object, u32 mthd, void *args, u32 size) { struct nv50_software_chan *chan = (void *)nv_engctx(object->parent); - struct nouveau_disp *disp = nouveau_disp(object); u32 crtc = *(u32 *)args
[PATCH 9/9] drm/nouveau: Simplify event handler interface
Remove index parameter; access index via handler->index instead. Dissociate handler from related container; use handler->priv to access container. Signed-off-by: Peter Hurley --- drivers/gpu/drm/nouveau/core/core/event.c | 6 +++--- drivers/gpu/drm/nouveau/core/engine/software/nv50.c | 7 +++ drivers/gpu/drm/nouveau/core/engine/software/nvc0.c | 7 +++ drivers/gpu/drm/nouveau/core/include/core/event.h | 6 +++--- drivers/gpu/drm/nouveau/nouveau_connector.c | 9 + drivers/gpu/drm/nouveau/nouveau_drm.c | 9 + drivers/gpu/drm/nouveau/nouveau_fence.c | 2 +- 7 files changed, 23 insertions(+), 23 deletions(-) diff --git a/drivers/gpu/drm/nouveau/core/core/event.c b/drivers/gpu/drm/nouveau/core/core/event.c index b7d8ae1..1240fef 100644 --- a/drivers/gpu/drm/nouveau/core/core/event.c +++ b/drivers/gpu/drm/nouveau/core/core/event.c @@ -26,7 +26,7 @@ void nouveau_event_handler_install(struct nouveau_event *event, int index, - int (*func)(struct nouveau_eventh*, int), + int (*func)(struct nouveau_eventh*), void *priv, struct nouveau_eventh *handler) { unsigned long flags; @@ -61,7 +61,7 @@ nouveau_event_handler_remove(struct nouveau_eventh *handler) int nouveau_event_handler_create(struct nouveau_event *event, int index, -int (*func)(struct nouveau_eventh*, int), +int (*func)(struct nouveau_eventh*), void *priv, struct nouveau_eventh **phandler) { struct nouveau_eventh *handler; @@ -157,7 +157,7 @@ nouveau_event_trigger(struct nouveau_event *event, int index) rcu_read_lock(); list_for_each_entry_rcu(handler, &event->index[index].list, head) { if (test_bit(NVKM_EVENT_ENABLE, &handler->flags)) { - if (handler->func(handler, index) == NVKM_EVENT_DROP) + if (handler->func(handler) == NVKM_EVENT_DROP) nouveau_event_put(handler); } } diff --git a/drivers/gpu/drm/nouveau/core/engine/software/nv50.c b/drivers/gpu/drm/nouveau/core/engine/software/nv50.c index e969f0c..bf6f23b 100644 --- a/drivers/gpu/drm/nouveau/core/engine/software/nv50.c +++ b/drivers/gpu/drm/nouveau/core/engine/software/nv50.c @@ -131,10 +131,9 @@ nv50_software_sclass[] = { **/ static int -nv50_software_vblsem_release(struct nouveau_eventh *event, int head) +nv50_software_vblsem_release(struct nouveau_eventh *handler) { - struct nouveau_software_chan *chan = - container_of(event, struct nouveau_software_chan, vblank.event[head]); + struct nouveau_software_chan *chan = handler->priv; struct nv50_software_priv *priv = (void *)nv_object(chan)->engine; struct nouveau_bar *bar = nouveau_bar(priv); @@ -172,7 +171,7 @@ nv50_software_context_ctor(struct nouveau_object *parent, for (i = 0; i < ARRAY_SIZE(chan->base.vblank.event); i++) { nouveau_event_handler_install(disp->vblank, i, nv50_software_vblsem_release, - NULL, + chan, &chan->base.vblank.event[i]); } return 0; diff --git a/drivers/gpu/drm/nouveau/core/engine/software/nvc0.c b/drivers/gpu/drm/nouveau/core/engine/software/nvc0.c index d06658a..1a2a7a8 100644 --- a/drivers/gpu/drm/nouveau/core/engine/software/nvc0.c +++ b/drivers/gpu/drm/nouveau/core/engine/software/nvc0.c @@ -143,10 +143,9 @@ nvc0_software_sclass[] = { **/ static int -nvc0_software_vblsem_release(struct nouveau_eventh *event, int head) +nvc0_software_vblsem_release(struct nouveau_eventh *handler) { - struct nouveau_software_chan *chan = - container_of(event, struct nouveau_software_chan, vblank.event[head]); + struct nouveau_software_chan *chan = handler->priv; struct nvc0_software_priv *priv = (void *)nv_object(chan)->engine; struct nouveau_bar *bar = nouveau_bar(priv); @@ -178,7 +177,7 @@ nvc0_software_context_ctor(struct 
nouveau_object *parent, for (i = 0; i < ARRAY_SIZE(chan->base.vblank.event); i++) { nouveau_event_handler_install(disp->vblank, i, nvc0_software_vblsem_release, - NULL, + chan, &chan->base.vblank.event[i]); } return 0; diff --git a/drivers/gpu/drm/nouveau/core/inc
Re: [PATCH 0/9] drm/nouveau: Cleanup event/handler design
On 08/28/2013 03:15 AM, Ben Skeggs wrote:
> On Wed, Aug 28, 2013 at 6:12 AM, Peter Hurley wrote:
>> This series was originally motivated by a deadlock, introduced in
>> commit 1d7c71a3e2f77336df536855b0efd2dc5bdeb41b 'drm/nouveau/disp: port
>> vblank handling to event interface', due to inverted lock order between
>> nouveau_drm_vblank_enable() and nouveau_drm_vblank_handler() (the complete
>> lockdep report is included in the patch 4/5 changelog).
>
> Hey Peter,
>
> Thanks for the patch series. I've only taken a cursory glance through the
> patches thus far, as they're definitely too intrusive for -fixes at this
> point.
>
> I will take a proper look through the series within the next week or so, I
> have some work pending which may possibly make some of these changes
> unnecessary, though from what I can tell, there's other good bits here
> we'll want.
>
> In a previous mail on the locking issue, you mentioned the drm core doing
> the "wrong thing" here too, did you have any further thoughts on correcting
> that issue too?

I'm working on the premise that drm_handle_vblank() can be lockless; that
would minimize the api disruption.

I still have a fair bit of analysis to do, but some early observations:

1) The vbl_time_lock spinlock is unnecessary -- drm_handle_vblank() could be
   protected with vbl_lock, since everywhere else vbl_time_lock is held,
   vbl_lock is already held.

2) The _vblank_count[crtc] doesn't need to be atomic. All the code paths that
   update _vblank_count[] are already spinlocked. Even if the code weren't
   spinlocked, the atomic_inc/_adds aren't necessary; only the memory
   barriers and read/write order of the vblank count/timestamp tuple is
   relevant.

3) Careful enabling of drm_handle_vblank() is sufficient to solve the
   concurrency between drm_handle_vblank() and drm_vblank_get() without
   needing a spinlock for exclusion. drm_handle_vblank() would need to
   account for the possibility of having missed a vblank interrupt between
   the reading of the hw vblank counter and the enabling of
   drm_handle_vblank().

4) I'm still thinking about how/whether to exclude drm_handle_vblank() and
   vblank_disable_and_save(). One thought is to replace the timeout timer
   with delayed work instead so that vblank_disable_and_save() could wait
   for drm_handle_vblank() to finish after disabling it. [I'd also need to
   verify that the drm drivers other than intel which use drm_vblank_off()
   do so from non-atomic context.]

I realize that I'm mostly referring to the hw counter/timestamp flavor of
vblank handling; that's primarily because it requires more rigorous handling
and has race conditions that don't apply to the software-only counter.

Regards,
Peter Hurley

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel
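One way to realize point 2 above -- publishing the (count, timestamp) tuple with ordering alone rather than a lock -- is a seqcount. This is only a sketch of the ordering argument under assumed names, not existing drm code.

#include <linux/ktime.h>
#include <linux/seqlock.h>
#include <linux/types.h>

struct vblank_sample {
	seqcount_t seq;		/* seqcount_init(&s->seq) once at setup */
	u32 count;
	ktime_t stamp;
};

/* writer: runs from the vblank interrupt, already serialized per crtc */
static void vblank_publish(struct vblank_sample *s, u32 count, ktime_t stamp)
{
	write_seqcount_begin(&s->seq);
	s->count = count;
	s->stamp = stamp;
	write_seqcount_end(&s->seq);
}

/* reader: lockless; retries if it raced with the writer, so it always
 * observes a consistent count/timestamp pair */
static void vblank_read(struct vblank_sample *s, u32 *count, ktime_t *stamp)
{
	unsigned int start;

	do {
		start = read_seqcount_begin(&s->seq);
		*count = s->count;
		*stamp = s->stamp;
	} while (read_seqcount_retry(&s->seq, start));
}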
Re: BUG: sleeping function called from invalid context on 3.10.10-rt7
[+cc dri-devel]

On 09/11/2013 11:38 AM, Steven Rostedt wrote:

On Wed, 11 Sep 2013 11:16:43 -0400 Peter Hurley wrote:

The funny part is, there's a comment there that shows that this was done even
for "PREEMPT_RT". Unfortunately, the call to "get_scanout_position()" can call
functions that use the rt-mutex "sleeping spin locks" and it breaks there.

I guess we need to ask the authors of the mainline patch exactly why that
preempt_disable() is needed?

The drm core associates a timestamp with each vertical blank frame #. Drm
drivers can optionally support a 'high resolution' hw timestamp. The vblank
frame #/timestamp tuple is user-space visible.

The i915 drm driver supports a hw timestamp via this drm helper function
which computes the timestamp from the crtc scan position (based on the pixel
clock).

For mainline, the preempt_disable/_enable() isn't actually necessary because
every call tree that leads here already has preemption disabled.

For -RT, the maybe i915 register spinlock (uncore.lock) should be raw?

No, it should not. Note, any other lock that can be held when it is held
would also need to be raw.

By that, you mean "any other lock" that might be claimed "would also need to
be raw"? Hopefully not "any other lock" already held?

And by taking a quick audit of the code, I see this:

	spin_lock_irqsave(&dev_priv->uncore.lock, irqflags);

	/* Reset the chip */

	/* GEN6_GDRST is not in the gt power well, no need to check
	 * for fifo space for the write or forcewake the chip for
	 * the read
	 */
	__raw_i915_write32(dev_priv, GEN6_GDRST, GEN6_GRDOM_FULL);

	/* Spin waiting for the device to ack the reset request */
	ret = wait_for((__raw_i915_read32(dev_priv, GEN6_GDRST) & GEN6_GRDOM_FULL) == 0, 500);

That spin is unacceptable in RT with preemption and interrupts disabled.

Yep. That would be bad.

AFAICT the registers read in i915_get_crtc_scanoutpos() aren't included in
the force-wake set, so raw reads of the registers would probably be
acceptable (thus obviating the need for claiming the uncore.lock).

Except that _ALL_ register access is disabled with the uncore.lock during a
gpu reset. Not sure if that's meant to include crtc registers or not, or what
other synchronization/serialization issues are being handled/hidden by
forcing all register accesses to wait during a gpu reset.

Hopefully an i915 expert can weigh in here?

What's the real issue here?

That the vblank timestamp needs to be an accurate measurement of a realtime
event. Sleeping/servicing interrupts while reading the registers necessary to
compute the timestamp would be bad too. (edit: which hopefully Mario Kleiner
clarified in his reply)

My point earlier was three-fold:

1. Don't need the preempt_disable() for mainline: all callers are already
   holding interrupt-disabling spinlocks.
2. -RT still needs to prevent scheduling there.
3. the problem is i915-specific.
   [update: the radeon driver should also BUG like the i915 driver but
   probably should have mmio_idx_lock spinlock as raw]

Does it have something to do with this dump?

Not sure [but I didn't realize the query was regarding 3.0].
[3.932060] [ cut here ] [3.936809] WARNING: at /home/rostedt/work/git/linux-rt.git/drivers/gpu/drm/i915/i915_drv.c:322 gen6_gt_force_wake_get+0x4d/0x50 [i915]() [3.949229] Hardware name: HP Compaq Pro 6300 SFF [3.953988] Modules linked in: i915(+) video i2c_algo_bit drm_kms_helper drm i2c_core [3.961943] Pid: 220, comm: udevd Not tainted 3.0.89-test-rt117 #117 [3.968343] Call Trace: [3.970851] [] warn_slowpath_common+0x7f/0xc0 [3.970852] [] warn_slowpath_null+0x1a/0x20 [3.970857] [] gen6_gt_force_wake_get+0x4d/0x50 [i915] [3.970865] [] ivybridge_init_clock_gating+0xcb/0x2f0 [i915] [3.970870] [] ? intel_crtc_disable+0x29/0x60 [i915] [3.970877] [] intel_modeset_init+0x751/0x10e0 [i915] [3.970882] [] i915_driver_load+0xfc8/0x17f0 [i915] [3.970885] [] ? device_register+0x1e/0x30 [3.970892] [] ? drm_sysfs_device_add+0x86/0xb0 [drm] [3.970896] [] ? drm_get_minor+0x233/0x300 [drm] [3.970900] [] drm_get_pci_dev+0x189/0x2a0 [drm] [3.970902] [] ? migrate_enable+0x8b/0x1c0 [3.970910] [] i915_pci_probe+0x1b/0x1d [i915] [3.970913] [] local_pci_probe+0x5c/0xd0 [3.970915] [] pci_device_probe+0x109/0x130 [3.970917] [] driver_probe_device+0x9c/0x2b0 [3.970918] [] __driver_attach+0xab/0xb0 [3.970919] [] ? driver_probe_device+0x2b0/0x2b0 [3.970920] [] ? driver_probe_device+0x2b0/0x2b0 [3.970921] [] bus_for_each_dev+0x64/0xa0 [3.970923] [] driver_attach+0x1e/0x20 [3.970924] [] bus_add_driver+0x1b0/0x290 [3.970925] [] driver_register+0x76/0x140 [3.970927] [] __pci_register_driver+0x82/0x10
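The timing constraint being argued about, reduced to a sketch: the scanout-position registers and the clock must be sampled back-to-back so the (stime, etime) pair bounds the timestamp error. Mainline gets the non-preemptible window for free from its irq-disabled callers; on -rt the explicit preempt_disable() provides it, which is why everything inside must be non-sleeping. The helper name below is hypothetical.

#include <linux/ktime.h>
#include <linux/preempt.h>

/* hypothetical, non-sleeping register read; stands in for the raw register
 * accesses done by i915_get_crtc_scanoutpos() */
extern int read_scanout_registers(void *crtc, int *vpos, int *hpos);

static int sample_scanout_position(void *crtc, int *vpos, int *hpos,
				   ktime_t *stime, ktime_t *etime)
{
	int ret;

	preempt_disable();			/* bound the sampling window */
	*stime = ktime_get();			/* clock read just before... */
	ret = read_scanout_registers(crtc, vpos, hpos);
	*etime = ktime_get();			/* ...and just after the registers */
	preempt_enable();

	return ret;				/* etime - stime bounds the timestamp error */
}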
Re: BUG: sleeping function called from invalid context on 3.10.10-rt7
On 09/11/2013 03:31 PM, Peter Hurley wrote: [+cc dri-devel] On 09/11/2013 11:38 AM, Steven Rostedt wrote: On Wed, 11 Sep 2013 11:16:43 -0400 Peter Hurley wrote: The funny part is, there's a comment there that shows that this was done even for "PREEMPT_RT". Unfortunately, the call to "get_scanout_position()" can call functions that use the rt-mutex "sleeping spin locks" and it breaks there. I guess we need to ask the authors of the mainline patch exactly why that preempt_disable() is needed? The drm core associates a timestamp with each vertical blank frame #. Drm drivers can optionally support a 'high resolution' hw timestamp. The vblank frame #/timestamp tuple is user-space visible. The i915 drm driver supports a hw timestamp via this drm helper function which computes the timestamp from the crtc scan position (based on the pixel clock). For mainline, the preempt_disable/_enable() isn't actually necessary because every call tree that leads here already has preemption disabled. For -RT, the maybe i915 register spinlock (uncore.lock) should be raw? No, it should not. Note, any other lock that can be held when it is held would also need to be raw. By that, you mean "any other lock" that might be claimed "would also need to be raw"? Hopefully not "any other lock" already held? And by taking a quick audit of the code, I see this: spin_lock_irqsave(&dev_priv->uncore.lock, irqflags); /* Reset the chip */ /* GEN6_GDRST is not in the gt power well, no need to check * for fifo space for the write or forcewake the chip for * the read */ __raw_i915_write32(dev_priv, GEN6_GDRST, GEN6_GRDOM_FULL); /* Spin waiting for the device to ack the reset request */ ret = wait_for((__raw_i915_read32(dev_priv, GEN6_GDRST) & GEN6_GRDOM_FULL) == 0, 500); That spin is unacceptable in RT with preemption and interrupts disabled. Yep. That would be bad. AFAICT the registers read in i915_get_crtc_scanoutpos() aren't included in the force-wake set, so raw reads of the registers would probably be acceptable (thus obviating the need for claiming the uncore.lock). Except that _ALL_ register access is disabled with the uncore.lock during a gpu reset. Not sure if that's meant to include crtc registers or not, or what other synchronization/serialization issues are being handled/hidden by forcing all register accesses to wait during a gpu reset. Hopefully an i915 expert can weigh in here? Daniel, Can you shed some light on whether the i915+ crtc registers (specifically those in i915_get_crtc_scanoutpos() and i915_/gm45_get_vblank_counter()) read as part of the vblank counter/timestamp handling need to be prevented during gpu reset? The implied wait with preemption and interrupts disabled is causing grief in -RT, but also a 4ms wait inside an irq handler seems like a bad idea. Regards, Peter Hurley What's the real issue here? That the vblank timestamp needs to be an accurate measurement of a realtime event. Sleeping/servicing interrupts while reading the registers necessary to compute the timestamp would be bad too. (edit: which hopefully Mario Kleiner clarified in his reply) My point earlier was three-fold: 1. Don't need the preempt_disable() for mainline: all callers are already holding interrupt-disabling spinlocks. 2. -RT still needs to prevent scheduling there. 3. the problem is i915-specific. [update: the radeon driver should also BUG like the i915 driver but probably should have mmio_idx_lock spinlock as raw] ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: BUG: sleeping function called from invalid context on 3.10.10-rt7
On 09/17/2013 04:55 PM, Daniel Vetter wrote: On Tue, Sep 17, 2013 at 9:50 PM, Peter Hurley wrote: On 09/11/2013 03:31 PM, Peter Hurley wrote: [+cc dri-devel] On 09/11/2013 11:38 AM, Steven Rostedt wrote: On Wed, 11 Sep 2013 11:16:43 -0400 Peter Hurley wrote: The funny part is, there's a comment there that shows that this was done even for "PREEMPT_RT". Unfortunately, the call to "get_scanout_position()" can call functions that use the rt-mutex "sleeping spin locks" and it breaks there. I guess we need to ask the authors of the mainline patch exactly why that preempt_disable() is needed? The drm core associates a timestamp with each vertical blank frame #. Drm drivers can optionally support a 'high resolution' hw timestamp. The vblank frame #/timestamp tuple is user-space visible. The i915 drm driver supports a hw timestamp via this drm helper function which computes the timestamp from the crtc scan position (based on the pixel clock). For mainline, the preempt_disable/_enable() isn't actually necessary because every call tree that leads here already has preemption disabled. For -RT, the maybe i915 register spinlock (uncore.lock) should be raw? No, it should not. Note, any other lock that can be held when it is held would also need to be raw. By that, you mean "any other lock" that might be claimed "would also need to be raw"? Hopefully not "any other lock" already held? And by taking a quick audit of the code, I see this: spin_lock_irqsave(&dev_priv->uncore.lock, irqflags); /* Reset the chip */ /* GEN6_GDRST is not in the gt power well, no need to check * for fifo space for the write or forcewake the chip for * the read */ __raw_i915_write32(dev_priv, GEN6_GDRST, GEN6_GRDOM_FULL); /* Spin waiting for the device to ack the reset request */ ret = wait_for((__raw_i915_read32(dev_priv, GEN6_GDRST) & GEN6_GRDOM_FULL) == 0, 500); That spin is unacceptable in RT with preemption and interrupts disabled. Yep. That would be bad. AFAICT the registers read in i915_get_crtc_scanoutpos() aren't included in the force-wake set, so raw reads of the registers would probably be acceptable (thus obviating the need for claiming the uncore.lock). Except that _ALL_ register access is disabled with the uncore.lock during a gpu reset. Not sure if that's meant to include crtc registers or not, or what other synchronization/serialization issues are being handled/hidden by forcing all register accesses to wait during a gpu reset. Hopefully an i915 expert can weigh in here? Daniel, Can you shed some light on whether the i915+ crtc registers (specifically those in i915_get_crtc_scanoutpos() and i915_/gm45_get_vblank_counter()) read as part of the vblank counter/timestamp handling need to be prevented during gpu reset? The depency here in the locking is a recent addition: commit a7cd1b8fea2f341b626b255d9898a5ca5fabbf0a Author: Chris Wilson Date: Fri Jul 19 20:36:51 2013 +0100 drm/i915: Serialize almost all register access It's a (slightly) oversized hammer to work around a hardware issue - we could break it down to register blocks, which can be accessed concurrently, but that tends to be more fragile. But the chip really dies if you access (even just reads) the same block concurrently :( Ouch. But thanks for clarifying that. Ok, so register access needs to be serialized. And a separate but related concern is that gen6+ resets also need to hold-off register access where forcewake is required. 
While I was reviewing the registers that require forcewake handling, I saw this: from i915_reg.h: #define _DPLL_A (dev_priv->info->display_mmio_offset + 0x6014) #define _DPLL_B (dev_priv->info->display_mmio_offset + 0x6018) from i915_drv.c: static const struct intel_device_info intel_valleyview_m_info = { GEN7_FEATURES, .is_mobile = 1, .num_pipes = 2, .is_valleyview = 1, .display_mmio_offset = VLV_DISPLAY_BASE, <<<--- .has_llc = 0, /* legal, last one wins */ }; from intel_uncore.c: #define NEEDS_FORCE_WAKE(dev_priv, reg) \ ((HAS_FORCE_WAKE((dev_priv)->dev)) && \ ((reg) < 0x4) &&\ ((reg) != FORCEWAKE)) Is this is a mistake or do the valleyview PLLs not require the same forcewake handling as the other intel gpus? Regards, Peter Hurley We could try break the spinlock protected section a bit in the reset handler - register access on a hung gpu tends to be ill-defined anyway. The implied wait with preemption and interrupts disabled is causing grief in -RT, but also a 4ms wait inside an irq handler seems like a bad idea. Oops, the magic code in wait_for which is just there to make the imo totally misguided kgdb support work papered over the aweful long wait in atomic context ever since w
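To make the suggestion above concrete -- breaking up the spinlock-protected section of the gen6 reset handler -- here is a hedged sketch, not a proposed patch. The register and accessor names (__raw_i915_write32, GEN6_GDRST, GEN6_GRDOM_FULL, wait_for, uncore.lock) are taken from the code quoted earlier in this thread; whether dropping the lock around the poll is actually safe is exactly what the "register access on a hung gpu is ill-defined" remark is about.

static int gen6_reset_split_lock_sketch(struct drm_i915_private *dev_priv)
{
        unsigned long irqflags;

        /* Issue the reset request with the register lock held... */
        spin_lock_irqsave(&dev_priv->uncore.lock, irqflags);
        __raw_i915_write32(dev_priv, GEN6_GDRST, GEN6_GRDOM_FULL);
        spin_unlock_irqrestore(&dev_priv->uncore.lock, irqflags);

        /*
         * ...but poll for the ack with the lock dropped, so the CPU never
         * spins for up to 500 ms with interrupts off.
         */
        return wait_for((__raw_i915_read32(dev_priv, GEN6_GDRST) &
                         GEN6_GRDOM_FULL) == 0, 500);
}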
Re: [Nouveau] [PATCH 6/6] drm/nouveau: use MSI interrupts
iously the right thing to do. Well, we can't just go around breaking stuff deliberately for the people still using them! I've blacklisted them myself and merged the patch. Ben, This patch causes my dual-head Quadro FX570 (G84) to fail to idle when dragging a window around. It loops for the full timeout (15 sec.) in nouveau_gem_ioctl_cpu_prep() -- ie., never reaches required fence sequence #. lspci -vvv -nn 02:00.0 VGA compatible controller [0300]: NVIDIA Corporation G84 [Quadro FX 570] [10de:040e] (rev a1) (prog-if 00 [VGA controller]) Subsystem: NVIDIA Corporation Device [10de:0474] Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- Capabilities: [600 v1] Vendor Specific Information: ID=0001 Rev=1 Len=024 Kernel driver in use: nouveau Kernel modules: nouveau, nvidiafb Regards, Peter Hurley ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [Nouveau] [PATCH 6/6] drm/nouveau: use MSI interrupts
On 09/30/2013 01:27 PM, Peter Hurley wrote: On 09/03/2013 09:45 PM, Ben Skeggs wrote: Well, we can't just go around breaking stuff deliberately for the people still using them! I've blacklisted them myself and merged the patch. Ben, This patch causes my dual-head Quadro FX570 (G84) to fail to idle when dragging a window around. It loops for the full timeout (15 sec.) in nouveau_gem_ioctl_cpu_prep() -- ie., never reaches required fence sequence #. FWIW, I can get the binary 319.32 driver to run this hardware with MSIs on [1] and not crash in similar circumstances. But repeating the testing on 3.12.0-rc2/3 has proved challenging. Although I have the binary driver running on 3.12.0-rc2, the userspace won't fully come up and I haven't figured out why (it's an old userspace that I don't use regularly, so it could be some other problem). Regards, Peter Hurley [1] lspci -vvv -nn on 3.2.x 02:00.0 VGA compatible controller [0300]: NVIDIA Corporation G84GL [Quadro FX 570] [10de:040e] (rev a1) (prog-if 00 [VGA controller]) Subsystem: NVIDIA Corporation Device [10de:0474] Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- Capabilities: [600 v1] Vendor Specific Information: ID=0001 Rev=1 Len=024 Kernel driver in use: nvidia Kernel modules: nvidia_319_updates, nouveau, nvidiafb ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH] drm: Fix DRM_FORCE_ON_DIGITAL use
A connector may be forced on from the command line via video= command line setting. The digital output of dual-mode connectors can also be specifically selected and forced on; eg., 'video=DVI-I-2:D'. However, in this case, the connector->status will be mistakenly set to connector_status_disconnected, and the connector will not be mode set. Fix the connector->status when connector->force is DRM_FORCE_ON_DIGITAL. Signed-off-by: Peter Hurley --- drivers/gpu/drm/drm_probe_helper.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/drm_probe_helper.c b/drivers/gpu/drm/drm_probe_helper.c index 6857e9a..7483a47 100644 --- a/drivers/gpu/drm/drm_probe_helper.c +++ b/drivers/gpu/drm/drm_probe_helper.c @@ -118,7 +118,8 @@ static int drm_helper_probe_single_connector_modes_merge_bits(struct drm_connect mode->status = MODE_UNVERIFIED; if (connector->force) { - if (connector->force == DRM_FORCE_ON) + if (connector->force == DRM_FORCE_ON || + connector->force == DRM_FORCE_ON_DIGITAL) connector->status = connector_status_connected; else connector->status = connector_status_disconnected; -- 2.1.1
[PATCH] drm: Remove compiler BUG_ON() test
This code is only reached when modeset->num_connectors is already 0, so the BUG_ON() testing for a non-zero modeset->num_connectors can never fire; remove it. Signed-off-by: Peter Hurley --- drivers/gpu/drm/drm_fb_helper.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c index d934346..7198257 100644 --- a/drivers/gpu/drm/drm_fb_helper.c +++ b/drivers/gpu/drm/drm_fb_helper.c @@ -1573,7 +1573,6 @@ static void drm_setup_crtcs(struct drm_fb_helper *fb_helper) modeset = &fb_helper->crtc_info[i].mode_set; if (modeset->num_connectors == 0) { BUG_ON(modeset->fb); - BUG_ON(modeset->num_connectors); if (modeset->mode) drm_mode_destroy(dev, modeset->mode); modeset->mode = NULL; -- 2.1.1
BUG caused by "Use new drm_fb_helper functions" series
Hi Archit, Just booting 4.4-rc5+, I got this splat [1] At first glance, this appears to be a simple fix. However, I'm concerned that fbcon functions, which may be called with interrupts disabled, are now hooked up to fbdev functions which may assume interrupts are not disabled (as is the case with cfb_imageblit()). For example, in the splat below, it's a simple fix to make the splat go away with GFP_ATOMIC allocation. However, the following fence wait is _never_ going to trigger with interrupts disabled on UP. FWIW, I've been running almost exclusively debug kernel builds so I'm not sure why this hasn't triggered many times before, but it hasn't. Regards, Peter Hurley [1] BUG splat [ 37.438494] BUG: sleeping function called from invalid context at /home/peter/src/kernels/mainline/mm/slub.c:1287 [ 37.438495] in_atomic(): 1, irqs_disabled(): 1, pid: 2276, name: auditd [ 37.438497] 1 lock held by auditd/2276: [ 37.438507] #0: (audit_cmd_mutex){+.+.+.}, at: [] audit_receive+0x1f/0xa0 [ 37.438507] irq event stamp: 1689 [ 37.438511] hardirqs last enabled at (1689): [] vprintk_emit+0x236/0x620 [ 37.438513] hardirqs last disabled at (1688): [] vprintk_emit+0xd4/0x620 [ 37.438518] softirqs last enabled at (1652): [] netlink_poll+0x138/0x1c0 [ 37.438520] softirqs last disabled at (1650): [] netlink_poll+0xf7/0x1c0 [ 37.438522] CPU: 7 PID: 2276 Comm: auditd Not tainted 4.4.0-rc5+wip-xeon+debug #rc5+wip [ 37.438523] Hardware name: Dell Inc. Precision WorkStation T5400 /0RW203, BIOS A11 04/30/2012 [ 37.438526] 81ce5cc8 8802a87c3590 813fb6c5 8802ac768000 [ 37.438528] 8802a87c35b8 810a6fb9 81ce5cc8 0507 [ 37.438530] 8802a87c35e0 810a70b9 024080c0 [ 37.438531] Call Trace: [ 37.438535] [] dump_stack+0x4e/0x79 [ 37.438538] [] ___might_sleep+0x149/0x200 [ 37.438540] [] __might_sleep+0x49/0x80 [ 37.438544] [] kmem_cache_alloc_trace+0x20d/0x2e0 [ 37.438600] [] ? nouveau_fence_new+0x3b/0x90 [nouveau] [ 37.438624] [] nouveau_fence_new+0x3b/0x90 [nouveau] [ 37.438649] [] nouveau_channel_idle+0x42/0xb0 [nouveau] [ 37.438673] [] nouveau_fbcon_sync+0x7f/0xb0 [nouveau] [ 37.438677] [] cfb_imageblit+0x9a/0x4d0 [ 37.438681] [] ? trace_hardirqs_off_caller+0x1f/0xc0 [ 37.438693] [] drm_fb_helper_cfb_imageblit+0xe/0x10 [drm_kms_helper] [ 37.438717] [] nouveau_fbcon_imageblit+0x51/0xd0 [nouveau] [ 37.438719] [] bit_putcs+0x2dc/0x530 [ 37.438721] [] ? trace_hardirqs_off+0xd/0x10 [ 37.438725] [] ? get_color.isra.15+0x34/0x130 [ 37.438727] [] fbcon_putcs+0x128/0x160 [ 37.438728] [] ? bit_cursor+0x5e0/0x5e0 [ 37.438730] [] fbcon_redraw.isra.25+0x16b/0x1d0 [ 37.438731] [] fbcon_scroll+0x1ea/0xce0 [ 37.438734] [] scrup+0x14a/0x160 [ 37.438736] [] lf+0x80/0x90 [ 37.438737] [] vt_console_print+0x2a7/0x3e0 [ 37.438739] [] call_console_drivers.constprop.24+0x144/0x1d0 [ 37.438741] [] console_unlock+0x463/0x540 [ 37.438742] [] vprintk_emit+0x35a/0x620 [ 37.438744] [] vprintk_default+0x29/0x40 [ 37.438748] [] printk+0x4d/0x4f [ 37.438750] [] audit_printk_skb+0x62/0x70 [ 37.438751] [] audit_log_end+0x1d4/0x2d0 [ 37.438752] [] ? audit_log_end+0x30/0x2d0 [ 37.438754] [] audit_log_config_change+0x89/0xa0 [ 37.438756] [] audit_receive_msg+0xa5a/0xbb0 [ 37.438759] [] ? mutex_lock_nested+0x2ed/0x450 [ 37.438761] [] ? audit_receive+0x1f/0xa0 [ 37.438762] [] ? audit_receive+0x1f/0xa0 [ 37.438764] [] audit_receive+0x52/0xa0 [ 37.438766] [] netlink_unicast+0xf2/0x1c0 [ 37.438767] [] netlink_sendmsg+0x3e7/0x620 [ 37.438771] [] sock_sendmsg+0x38/0x50 [ 37.438772] [] SYSC_sendto+0xf6/0x170 [ 37.438775] [] ? 
context_tracking_exit+0x1d/0x30 [ 37.438778] [] ? enter_from_user_mode+0x1f/0x50 [ 37.438780] [] ? syscall_trace_enter_phase1+0xcb/0x130 [ 37.438781] [] ? trace_hardirqs_on_thunk+0x17/0x19 [ 37.438784] [] SyS_sendto+0xe/0x10 [ 37.438786] [] entry_SYSCALL_64_fastpath+0x16/0x7a
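For reference, a minimal sketch of the "simple fix" mentioned above for the allocation half of this splat; the function shape is only an assumption about nouveau_fence_new(), and the gfp selection is the point. It deliberately does nothing about the fence wait further down the same path, which, as noted, still cannot complete with interrupts disabled.

/* Assumed shape of the allocation site in nouveau_fence_new(). */
static struct nouveau_fence *fence_alloc_sketch(void)
{
        /*
         * Pick the flags from the calling context so kzalloc() stops
         * complaining when fbcon calls in with IRQs off.  Caveats:
         * in_atomic() is not a reliable test on !CONFIG_PREEMPT kernels,
         * and this only silences the might_sleep() splat -- the
         * subsequent fence wait is still broken in this context.
         */
        gfp_t gfp = (in_atomic() || irqs_disabled()) ? GFP_ATOMIC : GFP_KERNEL;

        return kzalloc(sizeof(struct nouveau_fence), gfp);
}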
BUG caused by "Use new drm_fb_helper functions" series
On 02/01/2016 09:20 PM, Archit Taneja wrote: > Hi Peter, > > On 02/02/2016 02:07 AM, Peter Hurley wrote: >> Hi Archit, >> >> Just booting 4.4-rc5+, I got this splat [1] >> At first glance, this appears to be a simple fix. > > Thanks for sharing this. > >> >> However, I'm concerned that fbcon functions, which may be called with >> interrupts disabled, are now hooked up to fbdev functions which may assume >> interrupts are not disabled (as is the case with cfb_imageblit()). > > In the case when CONFIG_FB is enabled, drm_fb_helper_cfb_imageblit > helper simply wraps around cfg_imageblit, so I don't see how we'd have > any difference in behaviour. Sorry; terrible attribution on my part. This bug clearly has nothing to do with this series. But a better look has me wondering how all these gpus are syncing the framebuffer for direct access via cfb_imageblit and friends. I only see nouveau and intel gma even trying. >> For example, in the splat below, it's a simple fix to make the splat go >> away with GFP_ATOMIC allocation. However, the following fence wait is _never_ >> going to trigger with interrupts disabled on UP. >> >> FWIW, I've been running almost exclusively debug kernel builds so I'm not >> sure why this hasn't triggered many times before, but it hasn't. > > We could revert the patch > "drm/nouveau: Use new drm_fb_helper functions" and see if we still hit this > issue. > > Thanks, > Archit > >> Regards, >> Peter Hurley >> >> >> [1] BUG splat >> >> [ 37.438494] BUG: sleeping function called from invalid context at >> /home/peter/src/kernels/mainline/mm/slub.c:1287 >> [ 37.438495] in_atomic(): 1, irqs_disabled(): 1, pid: 2276, name: auditd >> [ 37.438497] 1 lock held by auditd/2276: >> [ 37.438507] #0: (audit_cmd_mutex){+.+.+.}, at: [] >> audit_receive+0x1f/0xa0 >> [ 37.438507] irq event stamp: 1689 >> [ 37.438511] hardirqs last enabled at (1689): [] >> vprintk_emit+0x236/0x620 >> [ 37.438513] hardirqs last disabled at (1688): [] >> vprintk_emit+0xd4/0x620 >> [ 37.438518] softirqs last enabled at (1652): [] >> netlink_poll+0x138/0x1c0 >> [ 37.438520] softirqs last disabled at (1650): [] >> netlink_poll+0xf7/0x1c0 >> [ 37.438522] CPU: 7 PID: 2276 Comm: auditd Not tainted >> 4.4.0-rc5+wip-xeon+debug #rc5+wip >> [ 37.438523] Hardware name: Dell Inc. Precision WorkStation T5400 >> /0RW203, BIOS A11 04/30/2012 >> [ 37.438526] 81ce5cc8 8802a87c3590 813fb6c5 >> 8802ac768000 >> [ 37.438528] 8802a87c35b8 810a6fb9 81ce5cc8 >> 0507 >> [ 37.438530] 8802a87c35e0 810a70b9 >> 024080c0 >> [ 37.438531] Call Trace: >> [ 37.438535] [] dump_stack+0x4e/0x79 >> [ 37.438538] [] ___might_sleep+0x149/0x200 >> [ 37.438540] [] __might_sleep+0x49/0x80 >> [ 37.438544] [] kmem_cache_alloc_trace+0x20d/0x2e0 >> [ 37.438600] [] ? nouveau_fence_new+0x3b/0x90 [nouveau] >> [ 37.438624] [] nouveau_fence_new+0x3b/0x90 [nouveau] >> [ 37.438649] [] nouveau_channel_idle+0x42/0xb0 [nouveau] >> [ 37.438673] [] nouveau_fbcon_sync+0x7f/0xb0 [nouveau] >> [ 37.438677] [] cfb_imageblit+0x9a/0x4d0 >> [ 37.438681] [] ? trace_hardirqs_off_caller+0x1f/0xc0 >> [ 37.438693] [] drm_fb_helper_cfb_imageblit+0xe/0x10 >> [drm_kms_helper] >> [ 37.438717] [] nouveau_fbcon_imageblit+0x51/0xd0 >> [nouveau] >> [ 37.438719] [] bit_putcs+0x2dc/0x530 >> [ 37.438721] [] ? trace_hardirqs_off+0xd/0x10 >> [ 37.438725] [] ? get_color.isra.15+0x34/0x130 >> [ 37.438727] [] fbcon_putcs+0x128/0x160 >> [ 37.438728] [] ? 
bit_cursor+0x5e0/0x5e0 >> [ 37.438730] [] fbcon_redraw.isra.25+0x16b/0x1d0 >> [ 37.438731] [] fbcon_scroll+0x1ea/0xce0 >> [ 37.438734] [] scrup+0x14a/0x160 >> [ 37.438736] [] lf+0x80/0x90 >> [ 37.438737] [] vt_console_print+0x2a7/0x3e0 >> [ 37.438739] [] >> call_console_drivers.constprop.24+0x144/0x1d0 >> [ 37.438741] [] console_unlock+0x463/0x540 >> [ 37.438742] [] vprintk_emit+0x35a/0x620 >> [ 37.438744] [] vprintk_default+0x29/0x40 >> [ 37.438748] [] printk+0x4d/0x4f >> [ 37.438750] [] audit_printk_skb+0x62/0x70 >> [ 37.438751] [] audit_log_end+0x1d4/0x2d0 >> [ 37.438752] [] ? audit_log_end+0x30/0x2d0 >> [ 37.438754] [] audit_log_config_change+0x89/0xa0 >> [ 37.438756] [] audit_receive_msg+0xa5a/0xbb0 >> [ 37.438759]
[Intel-gfx] [PATCH] drm: Assert correct locking for drm_send_vblank_event
On 09/12/2014 01:25 PM, Daniel Vetter wrote: > On Fri, Sep 12, 2014 at 01:03:51PM -0400, Peter Hurley wrote: >> On 09/12/2014 12:04 PM, Chris Wilson wrote: >>> On Fri, Sep 12, 2014 at 05:34:56PM +0200, Daniel Vetter wrote: >>>> On Fri, Sep 12, 2014 at 04:23:29PM +0100, Chris Wilson wrote: >>>>> On Fri, Sep 12, 2014 at 03:40:56PM +0200, Daniel Vetter wrote: >>>>>> The comment says that the caller must hold the dev->event_lock >>>>>> spinlock, so let's enforce this. >>>>>> >>>>>> A quick audit over all driver shows that except for the one place in >>>>>> i915 which motivated this all callers fullfill this requirement >>>>>> already. >>>>> >>>>> Replace the rogue WARN_ON_SMP(!spin_is_locked(&dev->event_lock)) in >>>>> send_vblank_event() as well then. >>>> >>>> Meh, I've missed that one, that's actually better I think. I'll drop my >>>> patch here. >>> >>> I thought assert_spin_lock was the preferred form? >> >> Actually, lockdep_assert_held() is the preferred form. >> >> See https://lkml.org/lkml/2014/9/3/171 > > Which unfortunately doesn't warn for all the normal users which are not > insane enough to enable lockdep and so is totally useless to validate a > driver that runs on metric piles of different chips (with a resulting > combinatorial explosion of code-paths because hw designers are creative). > And we rely a lot on random drive-by testers to report such issues. I know. When I wrote [in that thread linked above]: On Wed, Sep 03, 2014 at 10:50:01AM -0400, Peter Hurley wrote: > So a lockdep-only assert is unlikely to draw attention to existing bugs, > especially in established drivers. here's the replies I got: Peter Zijlstra wrote: > By the same logic lockdep will not find locking errors in established > drivers. and On 09/04/2014 01:14 AM, Ingo Molnar wrote: > Indeed, this patch is ill-advised in several ways: > > - it extends an API variant that we want to phase > > - emits a warning even if say lockdep has already emitted a > warning and locking state is not guaranteed to be consistent. > > - makes the kernel more expensive once fully debugged, in that > non-fatal checks are unconditional. :/ Regards, Peter Hurley
[Intel-gfx] [PATCH] drm: Assert correct locking for drm_send_vblank_event
On 09/12/2014 12:04 PM, Chris Wilson wrote: > On Fri, Sep 12, 2014 at 05:34:56PM +0200, Daniel Vetter wrote: >> On Fri, Sep 12, 2014 at 04:23:29PM +0100, Chris Wilson wrote: >>> On Fri, Sep 12, 2014 at 03:40:56PM +0200, Daniel Vetter wrote: >>>> The comment says that the caller must hold the dev->event_lock >>>> spinlock, so let's enforce this. >>>> >>>> A quick audit over all driver shows that except for the one place in >>>> i915 which motivated this all callers fullfill this requirement >>>> already. >>> >>> Replace the rogue WARN_ON_SMP(!spin_is_locked(&dev->event_lock)) in >>> send_vblank_event() as well then. >> >> Meh, I've missed that one, that's actually better I think. I'll drop my >> patch here. > > I thought assert_spin_lock was the preferred form? Actually, lockdep_assert_held() is the preferred form. See https://lkml.org/lkml/2014/9/3/171 Regards, Peter Hurley
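For anyone following along, the three assertion styles being compared in this thread look like this (dev->event_lock is the lock from the patch under discussion; in practice one would pick exactly one of them):

static void send_vblank_event_assert_sketch(struct drm_device *dev)
{
        /* lockdep-only: free when lockdep is off, but then checks nothing */
        lockdep_assert_held(&dev->event_lock);

        /* the form being replaced: runtime WARN, compiled out on UP where
         * spin_is_locked() is meaningless */
        WARN_ON_SMP(!spin_is_locked(&dev->event_lock));

        /* Chris's suggestion: unconditional check that BUG()s if the
         * spinlock is not locked */
        assert_spin_locked(&dev->event_lock);
}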
page allocator bug in 3.16?
After several days uptime with a 3.16 kernel (generally running Thunderbird, emacs, kernel builds, several Chrome tabs on multiple desktop workspaces) I've been seeing some really extreme slowdowns. Mostly the slowdowns are associated with gpu-related tasks, like opening new emacs windows, switching workspaces, laughing at internet gifs, etc. Because this x86_64 desktop is nouveau-based, I didn't pursue it right away -- 3.15 is the first time suspend has worked reliably. This week I started looking into what the slowdown was and discovered it's happening during dma allocation through swiotlb (the cpus can do intel iommu but I don't use it because it's not the default for most users). I'm still working on a bisection but each step takes 8+ hours to validate and even then I'm no longer sure I still have the 'bad' commit in the bisection. [edit: yup, I started over] I just discovered a smattering of these in my logs and only on 3.16-rc+ kernels: Sep 25 07:57:59 thor kernel: [28786.001300] alloc_contig_range test_pages_isolated(2bf560, 2bf562) failed This dual-Xeon box has 10GB and sysrq Show Memory isn't showing heavy fragmentation [1]. Besides Mel's page allocator changes in 3.16, another suspect commit is: commit b13b1d2d8692b437203de7a404c6b809d2cc4d99 Author: Shaohua Li Date: Tue Apr 8 15:58:09 2014 +0800 x86/mm: In the PTE swapout page reclaim case clear the accessed bit instead of flushing the TLB Specifically, this statement: It could cause incorrect page aging and the (mistaken) reclaim of hot pages, but the chance of that should be relatively low. I'm wondering if this could cause worse-case behavior with TTM? I'm testing a revert of this on mainline 3.16-final now, with no results yet. Thoughts? Regards, Peter Hurley [1] SysRq : Show Memory Mem-Info: Node 0 DMA per-cpu: CPU0: hi:0, btch: 1 usd: 0 CPU1: hi:0, btch: 1 usd: 0 CPU2: hi:0, btch: 1 usd: 0 CPU3: hi:0, btch: 1 usd: 0 CPU4: hi:0, btch: 1 usd: 0 CPU5: hi:0, btch: 1 usd: 0 CPU6: hi:0, btch: 1 usd: 0 CPU7: hi:0, btch: 1 usd: 0 Node 0 DMA32 per-cpu: CPU0: hi: 186, btch: 31 usd: 18 CPU1: hi: 186, btch: 31 usd: 82 CPU2: hi: 186, btch: 31 usd: 46 CPU3: hi: 186, btch: 31 usd: 30 CPU4: hi: 186, btch: 31 usd: 18 CPU5: hi: 186, btch: 31 usd: 43 CPU6: hi: 186, btch: 31 usd: 157 CPU7: hi: 186, btch: 31 usd: 26 Node 0 Normal per-cpu: CPU0: hi: 186, btch: 31 usd: 25 CPU1: hi: 186, btch: 31 usd: 33 CPU2: hi: 186, btch: 31 usd: 28 CPU3: hi: 186, btch: 31 usd: 46 CPU4: hi: 186, btch: 31 usd: 23 CPU5: hi: 186, btch: 31 usd: 8 CPU6: hi: 186, btch: 31 usd: 112 CPU7: hi: 186, btch: 31 usd: 18 active_anon:382833 inactive_anon:12103 isolated_anon:0 active_file:1156997 inactive_file:733988 isolated_file:0 unevictable:15 dirty:35833 writeback:0 unstable:0 free:129383 slab_reclaimable:95038 slab_unreclaimable:11095 mapped:81924 shmem:12509 pagetables:9039 bounce:0 free_cma:0 Node 0 DMA free:15860kB min:104kB low:128kB high:156kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15960kB managed:15876kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:16kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? 
yes lowmem_reserve[]: 0 2974 9980 9980 Node 0 DMA32 free:166712kB min:20108kB low:25132kB high:30160kB active_anon:475548kB inactive_anon:15204kB active_file:1368716kB inactive_file:865832kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3127336kB managed:3048188kB mlocked:0kB dirty:38228kB writeback:0kB mapped:94340kB shmem:15436kB slab_reclaimable:116424kB slab_unreclaimable:12756kB kernel_stack:2512kB pagetables:11532kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no lowmem_reserve[]: 0 0 7006 7006 Node 0 Normal free:334960kB min:47368kB low:59208kB high:71052kB active_anon:1055784kB inactive_anon:33208kB active_file:3259272kB inactive_file:2070120kB unevictable:60kB isolated(anon):0kB isolated(file):0kB present:7340032kB managed:7174484kB mlocked:60kB dirty:105104kB writeback:0kB mapped:233356kB shmem:34600kB slab_reclaimable:263728kB slab_unreclaimable:31608kB kernel_stack:7344kB pagetables:24624kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no lowmem_reserve[]: 0 0 0 0 Node 0 DMA: 1*4kB (U) 0*8kB 1*16kB (U) 1*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (R) 3*4096kB (M) = 15860kB Node 0 DMA32: 209*4kB (UEM) 394*8kB (UEM) 303*16kB (UEM) 60*32kB (UEM) 314*64kB (UEM) 117*128kB (UEM) 9*256kB (EM) 3*512kB (UEM) 2*1
page allocator bug in 3.16?
On 09/25/2014 02:55 PM, Peter Hurley wrote: > After several days uptime with a 3.16 kernel (generally running > Thunderbird, emacs, kernel builds, several Chrome tabs on multiple > desktop workspaces) I've been seeing some really extreme slowdowns. > > Mostly the slowdowns are associated with gpu-related tasks, like > opening new emacs windows, switching workspaces, laughing at internet > gifs, etc. Because this x86_64 desktop is nouveau-based, I didn't pursue > it right away -- 3.15 is the first time suspend has worked reliably. > > This week I started looking into what the slowdown was and discovered > it's happening during dma allocation through swiotlb (the cpus can do > intel iommu but I don't use it because it's not the default for most users). > > I'm still working on a bisection but each step takes 8+ hours to > validate and even then I'm no longer sure I still have the 'bad' > commit in the bisection. [edit: yup, I started over] > > I just discovered a smattering of these in my logs and only on 3.16-rc+ > kernels: > Sep 25 07:57:59 thor kernel: [28786.001300] alloc_contig_range > test_pages_isolated(2bf560, 2bf562) failed > > This dual-Xeon box has 10GB and sysrq Show Memory isn't showing heavy > fragmentation [1]. It's swapping, which is crazy because there's 7+GB of file cache [1] which should be dropped before swapping. The alloc_contig_range() failure precedes the swapping but not immediately (44 mins. earlier). How I reproduce this is to simply do a full distro kernel build. Skipping the TLB flush is not the problem; the results below are from 3.16-final with that commit reverted. The slowdown is really obvious because workspace switching redraw takes multiple seconds to complete (all-cpu perf record of that below [2]) Regards, Peter Hurley [1] SysRq : Show Memory Mem-Info: Node 0 DMA per-cpu: CPU0: hi:0, btch: 1 usd: 0 CPU1: hi:0, btch: 1 usd: 0 CPU2: hi:0, btch: 1 usd: 0 CPU3: hi:0, btch: 1 usd: 0 CPU4: hi:0, btch: 1 usd: 0 CPU5: hi:0, btch: 1 usd: 0 CPU6: hi:0, btch: 1 usd: 0 CPU7: hi:0, btch: 1 usd: 0 Node 0 DMA32 per-cpu: CPU0: hi: 186, btch: 31 usd: 71 CPU1: hi: 186, btch: 31 usd: 166 CPU2: hi: 186, btch: 31 usd: 183 CPU3: hi: 186, btch: 31 usd: 109 CPU4: hi: 186, btch: 31 usd: 106 CPU5: hi: 186, btch: 31 usd: 161 CPU6: hi: 186, btch: 31 usd: 120 CPU7: hi: 186, btch: 31 usd: 54 Node 0 Normal per-cpu: CPU0: hi: 186, btch: 31 usd: 159 CPU1: hi: 186, btch: 31 usd: 66 CPU2: hi: 186, btch: 31 usd: 178 CPU3: hi: 186, btch: 31 usd: 173 CPU4: hi: 186, btch: 31 usd: 91 CPU5: hi: 186, btch: 31 usd: 57 CPU6: hi: 186, btch: 31 usd: 58 CPU7: hi: 186, btch: 31 usd: 158 active_anon:170368 inactive_anon:173964 isolated_anon:0 active_file:982209 inactive_file:973911 isolated_file:0 unevictable:15 dirty:15 writeback:1 unstable:0 free:96067 slab_reclaimable:107401 slab_unreclaimable:12572 mapped:58271 shmem:10857 pagetables:9898 bounce:0 free_cma:18 Node 0 DMA free:15860kB min:104kB low:128kB high:156kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15960kB managed:15876kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:16kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? 
yes lowmem_reserve[]: 0 2974 9980 9980 Node 0 DMA32 free:117740kB min:20108kB low:25132kB high:30160kB active_anon:205232kB inactive_anon:196308kB active_file:1186764kB inactive_file:1173760kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3127336kB managed:3048212kB mlocked:0kB dirty:24kB writeback:4kB mapped:71600kB shmem:8776kB slab_reclaimable:129132kB slab_unreclaimable:13468kB kernel_stack:2864kB pagetables:11536kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no lowmem_reserve[]: 0 0 7006 7006 Node 0 Normal free:250668kB min:47368kB low:59208kB high:71052kB active_anon:476240kB inactive_anon:499548kB active_file:2742072kB inactive_file:2721884kB unevictable:60kB isolated(anon):0kB isolated(file):0kB present:7340032kB managed:7174484kB mlocked:60kB dirty:36kB writeback:0kB mapped:161484kB shmem:34652kB slab_reclaimable:300472kB slab_unreclaimable:36804kB kernel_stack:7232kB pagetables:28056kB unstable:0kB bounce:0kB free_cma:72kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no lowmem_reserve[]: 0 0 0 0 Node 0 DMA: 1*4kB (U) 0*8kB 1*16kB (U) 1*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (R) 3*4096kB (M) = 15860kB Node 0 DMA32: 4099*4kB (UEM) 4372*8kB (UEM) 668*16kB (UEM)
page allocator bug in 3.16?
On 09/25/2014 04:33 PM, Alex Deucher wrote: > On Thu, Sep 25, 2014 at 2:55 PM, Peter Hurley > wrote: >> After several days uptime with a 3.16 kernel (generally running >> Thunderbird, emacs, kernel builds, several Chrome tabs on multiple >> desktop workspaces) I've been seeing some really extreme slowdowns. >> >> Mostly the slowdowns are associated with gpu-related tasks, like >> opening new emacs windows, switching workspaces, laughing at internet >> gifs, etc. Because this x86_64 desktop is nouveau-based, I didn't pursue >> it right away -- 3.15 is the first time suspend has worked reliably. >> >> This week I started looking into what the slowdown was and discovered >> it's happening during dma allocation through swiotlb (the cpus can do >> intel iommu but I don't use it because it's not the default for most users). >> >> I'm still working on a bisection but each step takes 8+ hours to >> validate and even then I'm no longer sure I still have the 'bad' >> commit in the bisection. [edit: yup, I started over] >> >> I just discovered a smattering of these in my logs and only on 3.16-rc+ >> kernels: >> Sep 25 07:57:59 thor kernel: [28786.001300] alloc_contig_range >> test_pages_isolated(2bf560, 2bf562) failed >> >> This dual-Xeon box has 10GB and sysrq Show Memory isn't showing heavy >> fragmentation [1]. >> >> Besides Mel's page allocator changes in 3.16, another suspect commit is: >> >> commit b13b1d2d8692b437203de7a404c6b809d2cc4d99 >> Author: Shaohua Li >> Date: Tue Apr 8 15:58:09 2014 +0800 >> >> x86/mm: In the PTE swapout page reclaim case clear the accessed bit >> instead of flushing the TLB >> >> Specifically, this statement: >> >> It could cause incorrect page aging and the (mistaken) reclaim of >> hot pages, but the chance of that should be relatively low. >> >> I'm wondering if this could cause worse-case behavior with TTM? I'm >> testing a revert of this on mainline 3.16-final now, with no results yet. >> >> Thoughts? > > You may also be seeing this: > https://lkml.org/lkml/2014/8/8/445 Thanks Alex. That is indeed the problem. Still reading the email thread to find out where the patches are that fix this. Although it doesn't make much sense to me that nouveau sets up a 1GB GART and then uses TTM which is trying to shove all the DMA through a 16MB CMA window (which turns out to be the base Ubuntu config). Regards, Peter Hurley
page allocator bug in 3.16?
On 09/25/2014 03:35 PM, Chuck Ebbert wrote: > There are six ttm patches queued for 3.16.4: > > drm-ttm-choose-a-pool-to-shrink-correctly-in-ttm_dma_pool_shrink_scan.patch > drm-ttm-fix-handling-of-ttm_pl_flag_topdown-v2.patch > drm-ttm-fix-possible-division-by-0-in-ttm_dma_pool_shrink_scan.patch > drm-ttm-fix-possible-stack-overflow-by-recursive-shrinker-calls.patch > drm-ttm-pass-gfp-flags-in-order-to-avoid-deadlock.patch > drm-ttm-use-mutex_trylock-to-avoid-deadlock-inside-shrinker-functions.patch Thanks for info, Chuck. Unfortunately, none of these fix TTM dma allocation doing CMA dma allocation, which is the root problem. Regards, Peter Hurley
page allocator bug in 3.16?
[ +cc Leann Ogasawara, Marek Szyprowski, Kyungmin Park, Arnd Bergmann ] On 09/26/2014 08:40 AM, Rik van Riel wrote: > On 09/26/2014 08:28 AM, Rob Clark wrote: >> On Fri, Sep 26, 2014 at 6:45 AM, Thomas Hellstrom >> wrote: >>> On 09/26/2014 12:40 PM, Chuck Ebbert wrote: >>>> On Fri, 26 Sep 2014 09:15:57 +0200 Thomas Hellstrom >>>> wrote: >>>> >>>>> On 09/26/2014 01:52 AM, Peter Hurley wrote: >>>>>> On 09/25/2014 03:35 PM, Chuck Ebbert wrote: >>>>>>> There are six ttm patches queued for 3.16.4: >>>>>>> >>>>>>> drm-ttm-choose-a-pool-to-shrink-correctly-in-ttm_dma_pool_shrink_scan.patch >>>>>>> >>>>>>> > drm-ttm-fix-handling-of-ttm_pl_flag_topdown-v2.patch >>>>>>> drm-ttm-fix-possible-division-by-0-in-ttm_dma_pool_shrink_scan.patch >>>>>>> >>>>>>> > drm-ttm-fix-possible-stack-overflow-by-recursive-shrinker-calls.patch >>>>>>> drm-ttm-pass-gfp-flags-in-order-to-avoid-deadlock.patch >>>>>>> drm-ttm-use-mutex_trylock-to-avoid-deadlock-inside-shrinker-functions.patch >>>>>> >>>>>>> > Thanks for info, Chuck. >>>>>> >>>>>> Unfortunately, none of these fix TTM dma allocation doing >>>>>> CMA dma allocation, which is the root problem. >>>>>> >>>>>> Regards, Peter Hurley >>>>> The problem is not really in TTM but in CMA, There was a guy >>>>> offering to fix this in the CMA code but I guess he didn't >>>>> probably because he didn't receive any feedback. >>>>> >>>> Yeah, the "solution" to this problem seems to be "don't enable >>>> CMA on x86". Maybe it should even be disabled in the config >>>> system. >>> Or, as previously suggested, don't use CMA for order 0 (single >>> page) allocations >> >> On devices that actually need CMA pools to arrange for memory to be >> in certain ranges, I think you probably do want to have order 0 >> pages come from the CMA pool. >> >> Seems like disabling CMA on x86 (where it should be unneeded) is >> the better way, IMO > > CMA has its uses on x86. For example, CMA is used to allocate 1GB huge > pages. > > There may also be people with devices that do not scatter-gather, and > need a large physically contiguous buffer, though there should be > relatively few of those on x86. > > I suspect it makes most sense to do DMA allocations up to PAGE_ORDER > through the normal allocator on x86, and only invoking CMA for larger > allocations. The code that uses CMA to satisfy DMA allocations on x86 is specific to the x86 arch and was added in 2011 as a means of _testing_ CMA in KVM: commit 0a2b9a6ea93650b8a00f9fd5ee8fdd25671e2df6 Author: Marek Szyprowski Date: Thu Dec 29 13:09:51 2011 +0100 X86: integrate CMA with DMA-mapping subsystem This patch adds support for CMA to dma-mapping subsystem for x86 architecture that uses common pci-dma/pci-nommu implementation. This allows to test CMA on KVM/QEMU and a lot of common x86 boxes. Signed-off-by: Marek Szyprowski Signed-off-by: Kyungmin Park CC: Michal Nazarewicz Acked-by: Arnd Bergmann (no x86 maintainer acks?). Unfortunately, this code is enabled whenever CMA is enabled, rather than as a separate test configuration. So, while enabling CMA may have other purposes on x86, using it for x86 swiotlb and nommu dma allocations is not one of the them. And Ubuntu should not be enabling CONFIG_DMA_CMA for their i386 and amd64 configurations, as this is trying to drive _all_ dma mapping allocations through a _very_ small window (which is killing GPU performance). Regards, Peter Hurley
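To illustrate why enabling CONFIG_DMA_CMA has this effect, here is a simplified sketch of the allocation order the commit above wires into the x86 coherent-DMA path. The helper name is hypothetical and the details are from memory of the 3.16-era code (dma_alloc_from_contiguous() took dev/count/order then); the point is only that the small CMA area is tried first, so it sits in front of every order-0 TTM DMA page.

#include <linux/device.h>
#include <linux/dma-contiguous.h>
#include <linux/gfp.h>

static struct page *coherent_alloc_sketch(struct device *dev, size_t size,
                                          gfp_t gfp)
{
        unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
        struct page *page = NULL;

        if (!(gfp & GFP_ATOMIC))        /* CMA allocation can sleep (migration) */
                page = dma_alloc_from_contiguous(dev, count, get_order(size));
        if (!page)                      /* fall back to the page allocator */
                page = alloc_pages_node(dev_to_node(dev), gfp, get_order(size));

        return page;
}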
page allocator bug in 3.16?
On 09/26/2014 11:12 AM, Leann Ogasawara wrote: > On Fri, Sep 26, 2014 at 7:10 AM, Peter Hurley > wrote: >> [ +cc Leann Ogasawara, Marek Szyprowski, Kyungmin Park, Arnd Bergmann ] >> >> On 09/26/2014 08:40 AM, Rik van Riel wrote: >>> On 09/26/2014 08:28 AM, Rob Clark wrote: >>>> On Fri, Sep 26, 2014 at 6:45 AM, Thomas Hellstrom >>>> wrote: >>>>> On 09/26/2014 12:40 PM, Chuck Ebbert wrote: >>>>>> On Fri, 26 Sep 2014 09:15:57 +0200 Thomas Hellstrom >>>>>> wrote: >>>>>> >>>>>>> On 09/26/2014 01:52 AM, Peter Hurley wrote: >>>>>>>> On 09/25/2014 03:35 PM, Chuck Ebbert wrote: >>>>>>>>> There are six ttm patches queued for 3.16.4: >>>>>>>>> >>>>>>>>> drm-ttm-choose-a-pool-to-shrink-correctly-in-ttm_dma_pool_shrink_scan.patch >>>>>>>>> >>>>>>>>> >>> drm-ttm-fix-handling-of-ttm_pl_flag_topdown-v2.patch >>>>>>>>> drm-ttm-fix-possible-division-by-0-in-ttm_dma_pool_shrink_scan.patch >>>>>>>>> >>>>>>>>> >>> drm-ttm-fix-possible-stack-overflow-by-recursive-shrinker-calls.patch >>>>>>>>> drm-ttm-pass-gfp-flags-in-order-to-avoid-deadlock.patch >>>>>>>>> drm-ttm-use-mutex_trylock-to-avoid-deadlock-inside-shrinker-functions.patch >>>>>>>> >>>>>>>>> >>> Thanks for info, Chuck. >>>>>>>> >>>>>>>> Unfortunately, none of these fix TTM dma allocation doing >>>>>>>> CMA dma allocation, which is the root problem. >>>>>>>> >>>>>>>> Regards, Peter Hurley >>>>>>> The problem is not really in TTM but in CMA, There was a guy >>>>>>> offering to fix this in the CMA code but I guess he didn't >>>>>>> probably because he didn't receive any feedback. >>>>>>> >>>>>> Yeah, the "solution" to this problem seems to be "don't enable >>>>>> CMA on x86". Maybe it should even be disabled in the config >>>>>> system. >>>>> Or, as previously suggested, don't use CMA for order 0 (single >>>>> page) allocations >>>> >>>> On devices that actually need CMA pools to arrange for memory to be >>>> in certain ranges, I think you probably do want to have order 0 >>>> pages come from the CMA pool. >>>> >>>> Seems like disabling CMA on x86 (where it should be unneeded) is >>>> the better way, IMO >>> >>> CMA has its uses on x86. For example, CMA is used to allocate 1GB huge >>> pages. >>> >>> There may also be people with devices that do not scatter-gather, and >>> need a large physically contiguous buffer, though there should be >>> relatively few of those on x86. >>> >>> I suspect it makes most sense to do DMA allocations up to PAGE_ORDER >>> through the normal allocator on x86, and only invoking CMA for larger >>> allocations. >> >> The code that uses CMA to satisfy DMA allocations on x86 is >> specific to the x86 arch and was added in 2011 as a means of _testing_ >> CMA in KVM: >> >> commit 0a2b9a6ea93650b8a00f9fd5ee8fdd25671e2df6 >> Author: Marek Szyprowski >> Date: Thu Dec 29 13:09:51 2011 +0100 >> >> X86: integrate CMA with DMA-mapping subsystem >> >> This patch adds support for CMA to dma-mapping subsystem for x86 >> architecture that uses common pci-dma/pci-nommu implementation. This >> allows to test CMA on KVM/QEMU and a lot of common x86 boxes. >> >> Signed-off-by: Marek Szyprowski >> Signed-off-by: Kyungmin Park >> CC: Michal Nazarewicz >> Acked-by: Arnd Bergmann >> >> (no x86 maintainer acks?). >> >> Unfortunately, this code is enabled whenever CMA is enabled, rather >> than as a separate test configuration. >> >> So, while enabling CMA may have other purposes on x86, using it for >> x86 swiotlb and nommu dma allocations is not one of the them. 
>> >> And Ubuntu should not be enabling CONFIG_DMA_CMA for their i386 >> and amd64 configurations, as this is trying to drive _all_ dma mapping >> allocations through a _very_ small window (which is killing GPU >> performance). > > Thanks for the note Peter. We do have this disabled for our upcoming > Ubuntu 14.10 release. It is however still enabled in the previous 14.04 > release. We have been tracking this in > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1362261 but users > able to reproduce performance impacts in 14.10 were unable to reproduce > in 14.04 which is why we hadn't yet disabled it there. Leann, Thanks for that important clue. The missing piece specific to 3.16+ is these patches which impact every iommu config: Akinobu Mita (5): x86: make dma_alloc_coherent() return zeroed memory if CMA is enabled x86: enable DMA CMA with swiotlb intel-iommu: integrate DMA CMA memblock: introduce memblock_alloc_range() cma: add placement specifier for "cma=" kernel parameter These patches take the pre-existing nommu CMA test configuration and hook it up to all the x86 iommus, effectively reducing 10GB of DMA-able memory to 64MB, and hooks it all up to an allocator that's not nearly as effective as the page allocator. All to enable DMA allocation below 4GB which is already supported with the GFP_DMA32 flag to dma_alloc_coherent(). Regards, Peter Hurley
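As a minimal sketch of the GFP_DMA32 point above: a driver that merely needs its coherent buffers below 4GB can already express that through the DMA mask machinery, with no CMA region involved. 'my_dev' stands in for whatever struct device the driver owns.

#include <linux/dma-mapping.h>

static void *alloc_below_4g_sketch(struct device *my_dev, size_t size,
                                   dma_addr_t *handle)
{
        /* constrain coherent allocations to 32-bit bus addresses */
        if (dma_set_coherent_mask(my_dev, DMA_BIT_MASK(32)))
                return NULL;

        /* the DMA core honours the mask (e.g. via GFP_DMA32) as needed */
        return dma_alloc_coherent(my_dev, size, handle, GFP_KERNEL);
}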
[Nouveau] [PATCH] drm/nouveau: fix vblank deadlock
On 08/12/2013 07:50 AM, Maarten Lankhorst wrote: > This fixes a deadlock inversion when vblank is enabled/disabled by drm. > &dev->vblank_time_lock is always taken when the vblank state is toggled, > which caused a deadlock when &event->lock was also taken during > event_get/put. > > Solve the race by requiring that lock to change enable/disable state, > and always keeping vblank on the event list. Core drm ignores unwanted > vblanks, so extra calls to drm_handle_vblank are harmless. I don't feel this is the appropriate solution to the lock inversion between vblank_time_lock and event->lock. Preferably drm core should correct the interface layer bug; ie., calling into a sub-driver holding a lock _and_ requiring the sub-driver to call a drm helper function which claims the same lock is bad design. The console lock suffers from the same design flaw and is a constant problem. Alternatively, the event trigger could be lockless; ie., the event list could be an RCU list instead. In this way, the event->lock does not need to be claimed, and thus no lock inversion is possible. The main drawback here is that currently the event->lock enforces non-overlapping lifetimes between the event handler and the event. Untangling object lifetimes in nouveau is a non-trivial exercise. Regards, Peter Hurley > Signed-off-by: Maarten Lankhorst > --- > diff --git a/drivers/gpu/drm/nouveau/core/core/event.c > b/drivers/gpu/drm/nouveau/core/core/event.c > index 7eb81c1..78bff7c 100644 > --- a/drivers/gpu/drm/nouveau/core/core/event.c > +++ b/drivers/gpu/drm/nouveau/core/core/event.c > @@ -23,43 +23,64 @@ > #include > #include > > -static void > -nouveau_event_put_locked(struct nouveau_event *event, int index, > - struct nouveau_eventh *handler) > -{ > - if (!--event->index[index].refs) { > - if (event->disable) > - event->disable(event, index); > - } > - list_del(&handler->head); > -} > - > void > nouveau_event_put(struct nouveau_event *event, int index, > struct nouveau_eventh *handler) > { > unsigned long flags; > > + if (index >= event->index_nr) > + return; > + > spin_lock_irqsave(&event->lock, flags); > - if (index < event->index_nr) > - nouveau_event_put_locked(event, index, handler); > + list_del(&handler->head); > + > + if (event->toggle_lock) > + spin_lock(event->toggle_lock); > + nouveau_event_disable_locked(event, index, 1); > + if (event->toggle_lock) > + spin_unlock(event->toggle_lock); > + > spin_unlock_irqrestore(&event->lock, flags); > } > > void > +nouveau_event_enable_locked(struct nouveau_event *event, int index) > +{ > + if (index >= event->index_nr) > + return; > + > + if (!event->index[index].refs++ && event->enable) > + event->enable(event, index); > +} > + > +void > +nouveau_event_disable_locked(struct nouveau_event *event, int index, int > refs) > +{ > + if (index >= event->index_nr) > + return; > + > + event->index[index].refs -= refs; > + if (!event->index[index].refs && event->disable) > + event->disable(event, index); > +} > + > +void > nouveau_event_get(struct nouveau_event *event, int index, > struct nouveau_eventh *handler) > { > unsigned long flags; > > + if (index >= event->index_nr) > + return; > + > spin_lock_irqsave(&event->lock, flags); > - if (index < event->index_nr) { > - list_add(&handler->head, &event->index[index].list); > - if (!event->index[index].refs++) { > - if (event->enable) > - event->enable(event, index); > - } > - } > + list_add(&handler->head, &event->index[index].list); > + if (event->toggle_lock) > + spin_lock(event->toggle_lock); > + 
nouveau_event_enable_locked(event, index); > + if (event->toggle_lock) > + spin_unlock(event->toggle_lock); > spin_unlock_irqrestore(&event->lock, flags); > } > > @@ -68,6 +89,7 @@ nouveau_event_trigger(struct nouveau_event *event, int > index) > { > struct nouveau_eventh *handler, *temp; > unsigned long flags; > + int refs = 0; > > if (index >= event->index_nr) > return; > @@ -75,9 +97,17 @@ nouveau_event_
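Sketch of the lockless trigger suggested above (and what the "Convert event handler list to RCU" series further down implements): if the handler list is RCU-protected, the trigger path can walk it without taking event->lock, removing the inversion against vblank_time_lock. Structure and field names follow the nouveau code quoted in this thread; it assumes handlers are added/removed with list_add_rcu()/list_del_rcu() under event->lock and freed only after a grace period, and it deliberately omits handling of the NVKM_EVENT_DROP return.

static void nouveau_event_trigger_rcu_sketch(struct nouveau_event *event,
                                             int index)
{
        struct nouveau_eventh *handler;

        if (index >= event->index_nr)
                return;

        rcu_read_lock();
        list_for_each_entry_rcu(handler, &event->index[index].list, head)
                handler->func(handler, index);
        rcu_read_unlock();
}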
[PATCH] drm: Use correct spinlock flavor in drm_vblank_get()
The irq flags state is already established by the outer spin_lock_irqsave(); re-disabling irqs is redundant. Signed-off-by: Peter Hurley --- drivers/gpu/drm/drm_irq.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c index f92da0a..c7d4af5 100644 --- a/drivers/gpu/drm/drm_irq.c +++ b/drivers/gpu/drm/drm_irq.c @@ -952,13 +952,13 @@ static void drm_update_vblank_count(struct drm_device *dev, int crtc) */ int drm_vblank_get(struct drm_device *dev, int crtc) { - unsigned long irqflags, irqflags2; + unsigned long irqflags; int ret = 0; spin_lock_irqsave(&dev->vbl_lock, irqflags); /* Going from 0->1 means we have to enable interrupts again */ if (atomic_add_return(1, &dev->vblank_refcount[crtc]) == 1) { - spin_lock_irqsave(&dev->vblank_time_lock, irqflags2); + spin_lock(&dev->vblank_time_lock); if (!dev->vblank_enabled[crtc]) { /* Enable vblank irqs under vblank_time_lock protection. * All vblank count & timestamp updates are held off @@ -976,7 +976,7 @@ int drm_vblank_get(struct drm_device *dev, int crtc) drm_update_vblank_count(dev, crtc); } } - spin_unlock_irqrestore(&dev->vblank_time_lock, irqflags2); + spin_unlock(&dev->vblank_time_lock); } else { if (!dev->vblank_enabled[crtc]) { atomic_dec(&dev->vblank_refcount[crtc]); -- 1.8.1.2
[PATCH 0/9] drm/nouveau: Cleanup event/handler design
This series was originally motivated by a deadlock, introduced in commit 1d7c71a3e2f77336df536855b0efd2dc5bdeb41b 'drm/nouveau/disp: port vblank handling to event interface', due to inverted lock order between nouveau_drm_vblank_enable() and nouveau_drm_vblank_handler() (the complete lockdep report is included in the patch 4/5 changelog). Because this series fixes the vblank event deadlock, it is a competing solution to Maarten Lankhorst's 'drm/nouveau: fix vblank deadlock'. This series fixes the vblank event deadlock by converting the event handler list to RCU. This solution allows the event trigger to be lockless, and thus avoiding the lock inversion. Typical hurdles to RCU conversion include: 1) ensuring the object lifetime exceeds the lockless access; 2) preventing premature object reuse; and 3) verifying that stale object use not present logical errors. Because object reuse is not safe in RCU-based operations, the nouveau_event_get/_put interface is migrated from add/remove semantics to enable/disable semantics with a separate install/remove step (which also serves to document the handler lifetime). This also corrects an unsafe interface design where handlers can mistakenly be reused while in use. The nouveau driver currently supports 4 events -- gpio, uevent, cevent, and vblank. Every event is created by and stored within its respective subdev/engine object -- gpio in the GPIO subdev, uevent and cevent in the FIFO engine, and vblank in the DISP engine. Thus event lifetime extends until the subdev is destructed during devobj teardown. Event handler lifetime varies and is detailed in the table below (apologies for the wide-format): Event Handler function Container Until - --- -- gpio nouveau_connector_hotplug struct nouveau_connector nouveau_connector_destroy uevent nouveau_fence_wait_uevent_handler local stack object nouveau_fence_wait_uevent returns cevent none n/a n/a vblank nouveau_drm_vblank_handler struct nouveau_drm nouveau_drm_remove vblank nv50_software_vblsem_release struct nouveau_software_chan _nouveau_engctx_dtor (call stack originates with nouveau_abi16_chan_free ioctl) vblank nvc0_software_vblsem_release struct nouveau_software_chan same as above RCU lifetime considerations for handlers: Event HandlerLifetime - - gpio nouveau_connector_hotplug kfree_rcu(nv_connector) uevent nouveau_fence_wait_uevent_handler explicit use of nouveau_event_hander_create/_destroy cevent none n/a vblank nouveau_drm_vblank_handler synchronize_rcu() in nouveau_drm_unload vblank nv50_software_vblsem_release synchronize_rcu() in container dtor vblank nvc0_software_vblsem_release synchronize_rcu() in container dtor synchronize_rcu() is used for nouveau_object-based containers for which kfree_rcu() is not suitable/possible. Stale event handler execution is not a concern for the existing handlers because either: 1) the handler is responsible for disabling itself (via returning NVKM_EVENT_DROP), or 2) the existing handler can already be stale, as is the case with nouveau_connector_hotplug, which only schedules a work item, and nouveau_drm_vblank_handler, which the drm core expects may be stale. 
Peter Hurley (9): drm/nouveau: Add priv field for event handlers drm/nouveau: Move event index check from critical section drm/nouveau: Allocate local event handlers drm/nouveau: Allow asymmetric nouveau_event_get/_put drm/nouveau: Add install/remove semantics for event handlers drm/nouveau: Convert event handler list to RCU drm/nouveau: Fold nouveau_event_put_locked into caller drm/nouveau: Simplify event interface drm/nouveau: Simplify event handler interface drivers/gpu/drm/nouveau/core/core/event.c | 121 + .../gpu/drm/nouveau/core/engine/software/nv50.c| 32 -- .../gpu/drm/nouveau/core/engine/software/nvc0.c| 32 -- drivers/gpu/drm/nouveau/core/include/core/event.h | 27 - .../gpu/drm/nouveau/core/include/engine/software.h | 2 +- drivers/gpu/drm/nouveau/nouveau_connector.c| 16 ++- drivers/gpu/drm/nouveau/nouveau_display.c | 16 +-- drivers/gpu/drm/nouveau/nouveau_drm.c | 35 +++--- drivers/gpu/drm/nouveau/nouveau_fence.c| 27 ++--- 9 files changed, 220 insertions(+), 88 deletions(-) -- 1.8.1.2
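As a hedged usage sketch of the handler lifecycle this series establishes -- install once, enable/disable freely, remove at teardown -- using the interface names from the patches below; 'container', 'my_priv' and 'my_handler_fn' are illustrative only:

/* illustrative handler; returning NVKM_EVENT_KEEP leaves it active */
static int my_handler_fn(struct nouveau_eventh *handler, int index)
{
        /* handler->priv carries the caller's context (patch 1/9) */
        return NVKM_EVENT_KEEP;
}

/* init: install the handler embedded in a long-lived container */
nouveau_event_handler_install(pdisp->vblank, head, my_handler_fn,
                              my_priv, &container->handler);

/* enable/disable as often as needed; asymmetric calls are now safe */
nouveau_event_get(pdisp->vblank, head, &container->handler);
nouveau_event_put(pdisp->vblank, head, &container->handler);

/* teardown: remove, then free the container only after an RCU grace period */
nouveau_event_handler_remove(pdisp->vblank, head, &container->handler);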
[PATCH 1/9] drm/nouveau: Add priv field for event handlers
Provide private field for event handlers exclusive use. Convert nouveau_fence_wait_uevent() and nouveau_fence_wait_uevent_handler(); drop struct nouveau_fence_uevent. Signed-off-by: Peter Hurley --- drivers/gpu/drm/nouveau/core/include/core/event.h | 1 + drivers/gpu/drm/nouveau/nouveau_fence.c | 20 +++- 2 files changed, 8 insertions(+), 13 deletions(-) diff --git a/drivers/gpu/drm/nouveau/core/include/core/event.h b/drivers/gpu/drm/nouveau/core/include/core/event.h index 9e09440..ad4d8c2 100644 --- a/drivers/gpu/drm/nouveau/core/include/core/event.h +++ b/drivers/gpu/drm/nouveau/core/include/core/event.h @@ -7,6 +7,7 @@ struct nouveau_eventh { struct list_head head; + void *priv; int (*func)(struct nouveau_eventh *, int index); }; diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c index be31499..c2e3167 100644 --- a/drivers/gpu/drm/nouveau/nouveau_fence.c +++ b/drivers/gpu/drm/nouveau/nouveau_fence.c @@ -165,17 +165,11 @@ nouveau_fence_done(struct nouveau_fence *fence) return !fence->channel; } -struct nouveau_fence_uevent { - struct nouveau_eventh handler; - struct nouveau_fence_priv *priv; -}; - static int -nouveau_fence_wait_uevent_handler(struct nouveau_eventh *event, int index) +nouveau_fence_wait_uevent_handler(struct nouveau_eventh *handler, int index) { - struct nouveau_fence_uevent *uevent = - container_of(event, struct nouveau_fence_uevent, handler); - wake_up_all(&uevent->priv->waiting); + struct nouveau_fence_priv *priv = handler->priv; + wake_up_all(&priv->waiting); return NVKM_EVENT_KEEP; } @@ -186,13 +180,13 @@ nouveau_fence_wait_uevent(struct nouveau_fence *fence, bool intr) struct nouveau_channel *chan = fence->channel; struct nouveau_fifo *pfifo = nouveau_fifo(chan->drm->device); struct nouveau_fence_priv *priv = chan->drm->fence; - struct nouveau_fence_uevent uevent = { - .handler.func = nouveau_fence_wait_uevent_handler, + struct nouveau_eventh handler = { + .func = nouveau_fence_wait_uevent_handler, .priv = priv, }; int ret = 0; - nouveau_event_get(pfifo->uevent, 0, &uevent.handler); + nouveau_event_get(pfifo->uevent, 0, &handler); if (fence->timeout) { unsigned long timeout = fence->timeout - jiffies; @@ -224,7 +218,7 @@ nouveau_fence_wait_uevent(struct nouveau_fence *fence, bool intr) } } - nouveau_event_put(pfifo->uevent, 0, &uevent.handler); + nouveau_event_put(pfifo->uevent, 0, &handler); if (unlikely(ret < 0)) return ret; -- 1.8.1.2
[PATCH 2/9] drm/nouveau: Move event index check from critical section
The index_nr field is constant for the lifetime of the event, so serialized access is unnecessary. Signed-off-by: Peter Hurley --- drivers/gpu/drm/nouveau/core/core/event.c | 19 +++ 1 file changed, 11 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/nouveau/core/core/event.c b/drivers/gpu/drm/nouveau/core/core/event.c index 7eb81c1..e69c463 100644 --- a/drivers/gpu/drm/nouveau/core/core/event.c +++ b/drivers/gpu/drm/nouveau/core/core/event.c @@ -40,9 +40,11 @@ nouveau_event_put(struct nouveau_event *event, int index, { unsigned long flags; + if (index >= event->index_nr) + return; + spin_lock_irqsave(&event->lock, flags); - if (index < event->index_nr) - nouveau_event_put_locked(event, index, handler); + nouveau_event_put_locked(event, index, handler); spin_unlock_irqrestore(&event->lock, flags); } @@ -52,13 +54,14 @@ nouveau_event_get(struct nouveau_event *event, int index, { unsigned long flags; + if (index >= event->index_nr) + return; + spin_lock_irqsave(&event->lock, flags); - if (index < event->index_nr) { - list_add(&handler->head, &event->index[index].list); - if (!event->index[index].refs++) { - if (event->enable) - event->enable(event, index); - } + list_add(&handler->head, &event->index[index].list); + if (!event->index[index].refs++) { + if (event->enable) + event->enable(event, index); } spin_unlock_irqrestore(&event->lock, flags); } -- 1.8.1.2
[PATCH 3/9] drm/nouveau: Allocate local event handlers
Prepare for transition to RCU-based event handler list; since RCU list traversal may have stale pointers, local storage may go out of scope before handler completes. Introduce nouveau_event_handler_create/_destroy which provides suitable semantics for multiple, temporary event handlers. Signed-off-by: Peter Hurley --- drivers/gpu/drm/nouveau/core/core/event.c | 24 +++ drivers/gpu/drm/nouveau/core/include/core/event.h | 6 ++ drivers/gpu/drm/nouveau/nouveau_fence.c | 15 +++--- 3 files changed, 38 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/nouveau/core/core/event.c b/drivers/gpu/drm/nouveau/core/core/event.c index e69c463..1a8d685 100644 --- a/drivers/gpu/drm/nouveau/core/core/event.c +++ b/drivers/gpu/drm/nouveau/core/core/event.c @@ -23,6 +23,30 @@ #include #include +int +nouveau_event_handler_create(struct nouveau_event *event, int index, +int (*func)(struct nouveau_eventh*, int), +void *priv, struct nouveau_eventh **phandler) +{ + struct nouveau_eventh *handler; + + handler = *phandler = kzalloc(sizeof(*handler), GFP_KERNEL); + if (!handler) + return -ENOMEM; + handler->func = func; + handler->priv = priv; + nouveau_event_get(event, index, handler); + return 0; +} + +void +nouveau_event_handler_destroy(struct nouveau_event *event, int index, + struct nouveau_eventh *handler) +{ + nouveau_event_put(event, index, handler); + kfree(handler); +} + static void nouveau_event_put_locked(struct nouveau_event *event, int index, struct nouveau_eventh *handler) diff --git a/drivers/gpu/drm/nouveau/core/include/core/event.h b/drivers/gpu/drm/nouveau/core/include/core/event.h index ad4d8c2..bdf1a0a 100644 --- a/drivers/gpu/drm/nouveau/core/include/core/event.h +++ b/drivers/gpu/drm/nouveau/core/include/core/event.h @@ -34,4 +34,10 @@ void nouveau_event_get(struct nouveau_event *, int index, void nouveau_event_put(struct nouveau_event *, int index, struct nouveau_eventh *); +int nouveau_event_handler_create(struct nouveau_event *, int index, +int (*func)(struct nouveau_eventh*, int), +void *priv, struct nouveau_eventh **); +void nouveau_event_handler_destroy(struct nouveau_event *, int index, + struct nouveau_eventh *); + #endif diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c index c2e3167..6dde483 100644 --- a/drivers/gpu/drm/nouveau/nouveau_fence.c +++ b/drivers/gpu/drm/nouveau/nouveau_fence.c @@ -180,13 +180,14 @@ nouveau_fence_wait_uevent(struct nouveau_fence *fence, bool intr) struct nouveau_channel *chan = fence->channel; struct nouveau_fifo *pfifo = nouveau_fifo(chan->drm->device); struct nouveau_fence_priv *priv = chan->drm->fence; - struct nouveau_eventh handler = { - .func = nouveau_fence_wait_uevent_handler, - .priv = priv, - }; - int ret = 0; + struct nouveau_eventh *handler; + int ret; - nouveau_event_get(pfifo->uevent, 0, &handler); + ret = nouveau_event_handler_create(pfifo->uevent, 0, + nouveau_fence_wait_uevent_handler, + priv, &handler); + if (ret) + return ret; if (fence->timeout) { unsigned long timeout = fence->timeout - jiffies; @@ -218,7 +219,7 @@ nouveau_fence_wait_uevent(struct nouveau_fence *fence, bool intr) } } - nouveau_event_put(pfifo->uevent, 0, &handler); + nouveau_event_handler_destroy(pfifo->uevent, 0, handler); if (unlikely(ret < 0)) return ret; -- 1.8.1.2
[PATCH 4/9] drm/nouveau: Allow asymmetric nouveau_event_get/_put
Most nouveau event handlers have storage in 'static' containers (structures with lifetimes nearly equivalent to the drm_device), but are dangerously reused via nouveau_event_get/_put. For example, if nouveau_event_get is called more than once for a given handler, the event handler list will be corrupted. Migrate nouveau_event_get/_put from add/remove semantics to enable/disable semantics. Signed-off-by: Peter Hurley --- drivers/gpu/drm/nouveau/core/core/event.c | 20 drivers/gpu/drm/nouveau/core/include/core/event.h | 4 drivers/gpu/drm/nouveau/nouveau_drm.c | 8 ++-- 3 files changed, 18 insertions(+), 14 deletions(-) diff --git a/drivers/gpu/drm/nouveau/core/core/event.c b/drivers/gpu/drm/nouveau/core/core/event.c index 1a8d685..0a65ede 100644 --- a/drivers/gpu/drm/nouveau/core/core/event.c +++ b/drivers/gpu/drm/nouveau/core/core/event.c @@ -51,11 +51,13 @@ static void nouveau_event_put_locked(struct nouveau_event *event, int index, struct nouveau_eventh *handler) { - if (!--event->index[index].refs) { - if (event->disable) - event->disable(event, index); + if (__test_and_clear_bit(NVKM_EVENT_ENABLE, &handler->flags)) { + if (!--event->index[index].refs) { + if (event->disable) + event->disable(event, index); + } + list_del(&handler->head); } - list_del(&handler->head); } void @@ -82,10 +84,12 @@ nouveau_event_get(struct nouveau_event *event, int index, return; spin_lock_irqsave(&event->lock, flags); - list_add(&handler->head, &event->index[index].list); - if (!event->index[index].refs++) { - if (event->enable) - event->enable(event, index); + if (!__test_and_set_bit(NVKM_EVENT_ENABLE, &handler->flags)) { + list_add(&handler->head, &event->index[index].list); + if (!event->index[index].refs++) { + if (event->enable) + event->enable(event, index); + } } spin_unlock_irqrestore(&event->lock, flags); } diff --git a/drivers/gpu/drm/nouveau/core/include/core/event.h b/drivers/gpu/drm/nouveau/core/include/core/event.h index bdf1a0a..3e704d5 100644 --- a/drivers/gpu/drm/nouveau/core/include/core/event.h +++ b/drivers/gpu/drm/nouveau/core/include/core/event.h @@ -5,8 +5,12 @@ #define NVKM_EVENT_DROP 0 #define NVKM_EVENT_KEEP 1 +/* nouveau_eventh.flags bit #s */ +#define NVKM_EVENT_ENABLE 0 + struct nouveau_eventh { struct list_head head; + unsigned long flags; void *priv; int (*func)(struct nouveau_eventh *, int index); }; diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c index b29d04b..ccea2c4 100644 --- a/drivers/gpu/drm/nouveau/nouveau_drm.c +++ b/drivers/gpu/drm/nouveau/nouveau_drm.c @@ -88,7 +88,6 @@ nouveau_drm_vblank_enable(struct drm_device *dev, int head) if (WARN_ON_ONCE(head > ARRAY_SIZE(drm->vblank))) return -EIO; - WARN_ON_ONCE(drm->vblank[head].func); drm->vblank[head].func = nouveau_drm_vblank_handler; nouveau_event_get(pdisp->vblank, head, &drm->vblank[head]); return 0; @@ -99,11 +98,8 @@ nouveau_drm_vblank_disable(struct drm_device *dev, int head) { struct nouveau_drm *drm = nouveau_drm(dev); struct nouveau_disp *pdisp = nouveau_disp(drm->device); - if (drm->vblank[head].func) - nouveau_event_put(pdisp->vblank, head, &drm->vblank[head]); - else - WARN_ON_ONCE(1); - drm->vblank[head].func = NULL; + + nouveau_event_put(pdisp->vblank, head, &drm->vblank[head]); } static u64 -- 1.8.1.2
[PATCH 5/9] drm/nouveau: Add install/remove semantics for event handlers
Complete migration of nouveau_event_get/_put from add/remove semantics to enable/disable semantics. Introduce nouveau_event_handler_install/_remove interface to add/remove non-local-scope event handlers (ie., those stored in parent containers). This change in semantics makes explicit the handler lifetime, and distinguishes "one-of" event handlers (such as gpio) from "many temporary" event handlers (such as uevent). Signed-off-by: Peter Hurley --- drivers/gpu/drm/nouveau/core/core/event.c | 63 +++--- .../gpu/drm/nouveau/core/engine/software/nv50.c| 31 +-- .../gpu/drm/nouveau/core/engine/software/nvc0.c| 31 +-- drivers/gpu/drm/nouveau/core/include/core/event.h | 6 +++ .../gpu/drm/nouveau/core/include/engine/software.h | 2 +- drivers/gpu/drm/nouveau/nouveau_connector.c| 10 +++- drivers/gpu/drm/nouveau/nouveau_drm.c | 17 +- 7 files changed, 140 insertions(+), 20 deletions(-) diff --git a/drivers/gpu/drm/nouveau/core/core/event.c b/drivers/gpu/drm/nouveau/core/core/event.c index 0a65ede..4cd1ebe 100644 --- a/drivers/gpu/drm/nouveau/core/core/event.c +++ b/drivers/gpu/drm/nouveau/core/core/event.c @@ -23,19 +23,60 @@ #include #include +void +nouveau_event_handler_install(struct nouveau_event *event, int index, + int (*func)(struct nouveau_eventh*, int), + void *priv, struct nouveau_eventh *handler) +{ + unsigned long flags; + + if (index >= event->index_nr) + return; + + handler->func = func; + handler->priv = priv; + + spin_lock_irqsave(&event->lock, flags); + list_add(&handler->head, &event->index[index].list); + spin_unlock_irqrestore(&event->lock, flags); +} + +void +nouveau_event_handler_remove(struct nouveau_event *event, int index, +struct nouveau_eventh *handler) +{ + unsigned long flags; + + if (index >= event->index_nr) + return; + + spin_lock_irqsave(&event->lock, flags); + list_del(&handler->head); + spin_unlock_irqrestore(&event->lock, flags); +} + int nouveau_event_handler_create(struct nouveau_event *event, int index, int (*func)(struct nouveau_eventh*, int), void *priv, struct nouveau_eventh **phandler) { struct nouveau_eventh *handler; + unsigned long flags; handler = *phandler = kzalloc(sizeof(*handler), GFP_KERNEL); if (!handler) return -ENOMEM; handler->func = func; handler->priv = priv; - nouveau_event_get(event, index, handler); + __set_bit(NVKM_EVENT_ENABLE, &handler->flags); + + spin_lock_irqsave(&event->lock, flags); + list_add(&handler->head, &event->index[index].list); + if (!event->index[index].refs++) { + if (event->enable) + event->enable(event, index); + } + spin_unlock_irqrestore(&event->lock, flags); return 0; } @@ -43,7 +84,18 @@ void nouveau_event_handler_destroy(struct nouveau_event *event, int index, struct nouveau_eventh *handler) { - nouveau_event_put(event, index, handler); + unsigned long flags; + + if (index >= event->index_nr) + return; + + spin_lock_irqsave(&event->lock, flags); + if (!--event->index[index].refs) { + if (event->disable) + event->disable(event, index); + } + list_del(&handler->head); + spin_unlock_irqrestore(&event->lock, flags); kfree(handler); } @@ -56,7 +108,6 @@ nouveau_event_put_locked(struct nouveau_event *event, int index, if (event->disable) event->disable(event, index); } - list_del(&handler->head); } } @@ -85,7 +136,6 @@ nouveau_event_get(struct nouveau_event *event, int index, spin_lock_irqsave(&event->lock, flags); if (!__test_and_set_bit(NVKM_EVENT_ENABLE, &handler->flags)) { - list_add(&handler->head, &event->index[index].list); if (!event->index[index].refs++) { if (event->enable) event->enable(event, index); @@ -105,8 
+155,9 @@ nouveau_event_trigger(struct nouveau_event *event, int index) spin_lock_irqsave(&event->lock, flags); list_for_each_entry_safe(handler, temp, &event->index[index].list, head) { - if (handler->func(handler, index) == NVKM_EVENT_DROP) { - nouveau_event_put_locked(event, i
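Side by side, the two handler lifetimes now supported look roughly as follows. The pairings are illustrative (the remove/teardown call sites are not all visible in the posted hunks); the function names and arguments are taken from the diffs in this series:

    /* 1) "One-off" handler embedded in a long-lived container (software
     *    context, connector, drm device): installed once, removed at
     *    container teardown. */
    nouveau_event_handler_install(disp->vblank, i,
                                  nv50_software_vblsem_release, NULL,
                                  &chan->base.vblank.event[i]);
    /* ... container lives on; the handler is enabled/disabled as needed ... */
    nouveau_event_handler_remove(disp->vblank, i, &chan->base.vblank.event[i]);

    /* 2) Temporary handler (uevent fence wait): allocated around a single
     *    wait and freed afterwards. */
    nouveau_event_handler_create(pfifo->uevent, 0,
                                 nouveau_fence_wait_uevent_handler,
                                 priv, &handler);
    /* ... wait ... */
    nouveau_event_handler_destroy(pfifo->uevent, 0, handler);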
[PATCH 6/9] drm/nouveau: Convert event handler list to RCU
Lockdep report [1] correctly identifies a potential deadlock between nouveau_drm_vblank_enable() and nouveau_drm_vblank_handler() due to inverted lock order of event->lock with drm core's dev->vblank_time_lock. Fix vblank event deadlock by converting event handler list to RCU. List updates remain serialized by event->lock. List traversal & handler execution is lockless which prevents inverted lock order problems with the drm core. Safe deletion of the event handler is accomplished with synchronize_rcu() for those handlers stored in nouveau_object-based containers (nouveau_drm & nv50_/nvc0_software_context); otherwise, with kfree_rcu (for nouveau_connector & uevent temporary handlers). [1] == [ INFO: possible circular locking dependency detected ] 3.10.0-0+tip-xeon+lockdep #0+tip Not tainted --- swapper/7/0 is trying to acquire lock: (&(&dev->vblank_time_lock)->rlock){-.}, at: [] drm_handle_vblank+0x69/0x400 [drm] but task is already holding lock: (&(&event->lock)->rlock){-.}, at: [] nouveau_event_trigger+0x4d/0xd0 [nouveau] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&(&event->lock)->rlock){-.}: [] lock_acquire+0x92/0x1f0 [] _raw_spin_lock_irqsave+0x46/0x60 [] nouveau_event_get+0x27/0xa0 [nouveau] [] nouveau_drm_vblank_enable+0x8d/0xf0 [nouveau] [] drm_vblank_get+0xf8/0x2c0 [drm] [] drm_wait_vblank+0x84/0x6f0 [drm] [] drm_ioctl+0x559/0x690 [drm] [] do_vfs_ioctl+0x97/0x590 [] SyS_ioctl+0x91/0xb0 [] system_call_fastpath+0x16/0x1b -> #0 (&(&dev->vblank_time_lock)->rlock){-.}: [] __lock_acquire+0x1c43/0x1d30 [] lock_acquire+0x92/0x1f0 [] _raw_spin_lock_irqsave+0x46/0x60 [] drm_handle_vblank+0x69/0x400 [drm] [] nouveau_drm_vblank_handler+0x27/0x30 [nouveau] [] nouveau_event_trigger+0x90/0xd0 [nouveau] [] nv50_disp_intr+0xdd/0x230 [nouveau] [] nouveau_mc_intr+0xa1/0x100 [nouveau] [] handle_irq_event_percpu+0x6c/0x3d0 [] handle_irq_event+0x48/0x70 [] handle_fasteoi_irq+0x5a/0x100 [] handle_irq+0x22/0x40 [] do_IRQ+0x5a/0xd0 [] ret_from_intr+0x0/0x13 [] arch_cpu_idle+0x26/0x30 [] cpu_startup_entry+0xce/0x410 [] start_secondary+0x255/0x25c other info that might help us debug this: Possible unsafe locking scenario: CPU0CPU1 lock(&(&event->lock)->rlock); lock(&(&dev->vblank_time_lock)->rlock); lock(&(&event->lock)->rlock); lock(&(&dev->vblank_time_lock)->rlock); *** DEADLOCK *** 1 lock held by swapper/7/0: #0: (&(&event->lock)->rlock){-.}, at: [] nouveau_event_trigger+0x4d/0xd0 [nouveau] stack backtrace: CPU: 7 PID: 0 Comm: swapper/7 Not tainted 3.10.0-0+tip-xeon+lockdep #0+tip Hardware name: Dell Inc. Precision WorkStation T5400 /0RW203, BIOS A11 04/30/2012 821ccf60 8802bfdc3b08 81755f39 8802bfdc3b58 8174f526 0002 8802bfdc3be8 8802b330a7f8 8802b330a820 8802b330a7f8 0001 Call Trace: [] dump_stack+0x19/0x1b [] print_circular_bug+0x1fb/0x20c [] __lock_acquire+0x1c43/0x1d30 [] ? trace_hardirqs_off+0xd/0x10 [] ? flat_send_IPI_mask+0x88/0xa0 [] lock_acquire+0x92/0x1f0 [] ? drm_handle_vblank+0x69/0x400 [drm] [] _raw_spin_lock_irqsave+0x46/0x60 [] ? drm_handle_vblank+0x69/0x400 [drm] [] drm_handle_vblank+0x69/0x400 [drm] [] nouveau_drm_vblank_handler+0x27/0x30 [nouveau] [] nouveau_event_trigger+0x90/0xd0 [nouveau] [] nv50_disp_intr+0xdd/0x230 [nouveau] [] ? _raw_spin_unlock_irqrestore+0x42/0x80 [] ? __hrtimer_start_range_ns+0x16c/0x530 [] nouveau_mc_intr+0xa1/0x100 [nouveau] [] handle_irq_event_percpu+0x6c/0x3d0 [] handle_irq_event+0x48/0x70 [] ? 
handle_fasteoi_irq+0x1e/0x100 [] handle_fasteoi_irq+0x5a/0x100 [] handle_irq+0x22/0x40 [] do_IRQ+0x5a/0xd0 [] common_interrupt+0x6f/0x6f [] ? default_idle+0x1d/0x290 [] ? default_idle+0x1b/0x290 [] arch_cpu_idle+0x26/0x30 [] cpu_startup_entry+0xce/0x410 [] ? clockevents_register_device+0xdc/0x140 [] start_secondary+0x255/0x25c Reported-by: Dave Jones Signed-off-by: Peter Hurley --- drivers/gpu/drm/nouveau/core/core/event.c | 22 +++--- .../gpu/drm/nouveau/core/engine/software/nv50.c| 1 + .../gpu/drm/nouveau/core/engine/software/nvc0.c| 1 + drivers/gpu/drm/nouveau/core/include/core/event.h | 1 + drivers/gpu/drm/nouveau/nouveau_connector.c| 2 +- drivers/gpu/drm/nouveau/nouveau_drm.c | 1 + 6 files changed, 16 insertion
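With the conversion, the trigger path becomes a plain RCU traversal; this is the shape it has by the end of the series (see the patch 8/9 hunk below), shown here schematically:

    /* Reader side -- nouveau_event_trigger(): no event->lock is taken
     * while the handlers run, which breaks the inversion with the drm
     * core's vblank_time_lock. */
    rcu_read_lock();
    list_for_each_entry_rcu(handler, &event->index[index].list, head) {
            if (test_bit(NVKM_EVENT_ENABLE, &handler->flags)) {
                    if (handler->func(handler, index) == NVKM_EVENT_DROP)
                            nouveau_event_put(event, index, handler);
            }
    }
    rcu_read_unlock();

    /* Writer side, still serialized by event->lock: unlink the handler
     * (list_del_rcu() or equivalent), then wait out readers before the
     * memory is reused -- synchronize_rcu() for handlers embedded in
     * nouveau_object containers, kfree_rcu() for the temporary
     * uevent/connector handlers, as described in the changelog above. */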
[PATCH 7/9] drm/nouveau: Fold nouveau_event_put_locked into caller
nouveau_event_put_locked() only has 1 call site; fold into caller. Signed-off-by: Peter Hurley --- drivers/gpu/drm/nouveau/core/core/event.c | 19 ++- 1 file changed, 6 insertions(+), 13 deletions(-) diff --git a/drivers/gpu/drm/nouveau/core/core/event.c b/drivers/gpu/drm/nouveau/core/core/event.c index ce0a0ef..45bcb37 100644 --- a/drivers/gpu/drm/nouveau/core/core/event.c +++ b/drivers/gpu/drm/nouveau/core/core/event.c @@ -100,18 +100,6 @@ nouveau_event_handler_destroy(struct nouveau_event *event, int index, kfree_rcu(handler, rcu); } -static void -nouveau_event_put_locked(struct nouveau_event *event, int index, -struct nouveau_eventh *handler) -{ - if (__test_and_clear_bit(NVKM_EVENT_ENABLE, &handler->flags)) { - if (!--event->index[index].refs) { - if (event->disable) - event->disable(event, index); - } - } -} - void nouveau_event_put(struct nouveau_event *event, int index, struct nouveau_eventh *handler) @@ -122,7 +110,12 @@ nouveau_event_put(struct nouveau_event *event, int index, return; spin_lock_irqsave(&event->lock, flags); - nouveau_event_put_locked(event, index, handler); + if (__test_and_clear_bit(NVKM_EVENT_ENABLE, &handler->flags)) { + if (!--event->index[index].refs) { + if (event->disable) + event->disable(event, index); + } + } spin_unlock_irqrestore(&event->lock, flags); } -- 1.8.1.2
[PATCH 8/9] drm/nouveau: Simplify event interface
Store event back-pointer and index within struct event_handler; remove superfluous parameters when event_handler is supplied. Signed-off-by: Peter Hurley --- drivers/gpu/drm/nouveau/core/core/event.c | 36 +- .../gpu/drm/nouveau/core/engine/software/nv50.c| 11 ++- .../gpu/drm/nouveau/core/engine/software/nvc0.c| 11 ++- drivers/gpu/drm/nouveau/core/include/core/event.h | 15 + drivers/gpu/drm/nouveau/nouveau_connector.c| 5 +-- drivers/gpu/drm/nouveau/nouveau_display.c | 16 +++--- drivers/gpu/drm/nouveau/nouveau_drm.c | 10 ++ drivers/gpu/drm/nouveau/nouveau_fence.c| 2 +- 8 files changed, 44 insertions(+), 62 deletions(-) diff --git a/drivers/gpu/drm/nouveau/core/core/event.c b/drivers/gpu/drm/nouveau/core/core/event.c index 45bcb37..b7d8ae1 100644 --- a/drivers/gpu/drm/nouveau/core/core/event.c +++ b/drivers/gpu/drm/nouveau/core/core/event.c @@ -34,6 +34,9 @@ nouveau_event_handler_install(struct nouveau_event *event, int index, if (index >= event->index_nr) return; + handler->event = event; + handler->index = index; + handler->func = func; handler->priv = priv; @@ -43,12 +46,12 @@ nouveau_event_handler_install(struct nouveau_event *event, int index, } void -nouveau_event_handler_remove(struct nouveau_event *event, int index, -struct nouveau_eventh *handler) +nouveau_event_handler_remove(struct nouveau_eventh *handler) { + struct nouveau_event *event = handler->event; unsigned long flags; - if (index >= event->index_nr) + if (!event) return; spin_lock_irqsave(&event->lock, flags); @@ -67,6 +70,10 @@ nouveau_event_handler_create(struct nouveau_event *event, int index, handler = *phandler = kzalloc(sizeof(*handler), GFP_KERNEL); if (!handler) return -ENOMEM; + + handler->event = event; + handler->index = index; + handler->func = func; handler->priv = priv; __set_bit(NVKM_EVENT_ENABLE, &handler->flags); @@ -82,13 +89,12 @@ nouveau_event_handler_create(struct nouveau_event *event, int index, } void -nouveau_event_handler_destroy(struct nouveau_event *event, int index, - struct nouveau_eventh *handler) +nouveau_event_handler_destroy(struct nouveau_eventh *handler) { + struct nouveau_event *event = handler->event; + int index = handler->index; unsigned long flags; - if (index >= event->index_nr) - return; spin_lock_irqsave(&event->lock, flags); if (!--event->index[index].refs) { @@ -101,12 +107,13 @@ nouveau_event_handler_destroy(struct nouveau_event *event, int index, } void -nouveau_event_put(struct nouveau_event *event, int index, - struct nouveau_eventh *handler) +nouveau_event_put(struct nouveau_eventh *handler) { + struct nouveau_event *event = handler->event; + int index = handler->index; unsigned long flags; - if (index >= event->index_nr) + if (!event) return; spin_lock_irqsave(&event->lock, flags); @@ -120,12 +127,13 @@ nouveau_event_put(struct nouveau_event *event, int index, } void -nouveau_event_get(struct nouveau_event *event, int index, - struct nouveau_eventh *handler) +nouveau_event_get(struct nouveau_eventh *handler) { + struct nouveau_event *event = handler->event; + int index = handler->index; unsigned long flags; - if (index >= event->index_nr) + if (!event) return; spin_lock_irqsave(&event->lock, flags); @@ -150,7 +158,7 @@ nouveau_event_trigger(struct nouveau_event *event, int index) list_for_each_entry_rcu(handler, &event->index[index].list, head) { if (test_bit(NVKM_EVENT_ENABLE, &handler->flags)) { if (handler->func(handler, index) == NVKM_EVENT_DROP) - nouveau_event_put(event, index, handler); + nouveau_event_put(handler); } } rcu_read_unlock(); diff --git 
a/drivers/gpu/drm/nouveau/core/engine/software/nv50.c b/drivers/gpu/drm/nouveau/core/engine/software/nv50.c index 87aeee1..e969f0c 100644 --- a/drivers/gpu/drm/nouveau/core/engine/software/nv50.c +++ b/drivers/gpu/drm/nouveau/core/engine/software/nv50.c @@ -92,12 +92,11 @@ nv50_software_mthd_vblsem_release(struct nouveau_object *object, u32 mthd, void *args, u32 size) { struct nv50_software_chan *chan = (void *)nv_engctx(object->parent); - struct nouveau_disp *disp = nouveau_disp(object); u32 crtc = *(u32 *)args; if (crtc > 1)
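The payoff at the call sites is that only the handler needs to be passed around once it has been created or installed. Condensed, illustrative before/after (the vblank lines are representative of the nouveau_drm.c changes in this patch, not literal hunks):

    /* Before this patch: the event and index travel with every call. */
    nouveau_event_get(pdisp->vblank, head, &drm->vblank[head]);
    nouveau_event_put(pdisp->vblank, head, &drm->vblank[head]);

    /* After: the handler carries its own event back-pointer and index. */
    nouveau_event_get(&drm->vblank[head]);
    nouveau_event_put(&drm->vblank[head]);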
[PATCH 9/9] drm/nouveau: Simplify event handler interface
Remove index parameter; access index via handler->index instead. Dissociate handler from related container; use handler->priv to access container. Signed-off-by: Peter Hurley --- drivers/gpu/drm/nouveau/core/core/event.c | 6 +++--- drivers/gpu/drm/nouveau/core/engine/software/nv50.c | 7 +++ drivers/gpu/drm/nouveau/core/engine/software/nvc0.c | 7 +++ drivers/gpu/drm/nouveau/core/include/core/event.h | 6 +++--- drivers/gpu/drm/nouveau/nouveau_connector.c | 9 + drivers/gpu/drm/nouveau/nouveau_drm.c | 9 + drivers/gpu/drm/nouveau/nouveau_fence.c | 2 +- 7 files changed, 23 insertions(+), 23 deletions(-) diff --git a/drivers/gpu/drm/nouveau/core/core/event.c b/drivers/gpu/drm/nouveau/core/core/event.c index b7d8ae1..1240fef 100644 --- a/drivers/gpu/drm/nouveau/core/core/event.c +++ b/drivers/gpu/drm/nouveau/core/core/event.c @@ -26,7 +26,7 @@ void nouveau_event_handler_install(struct nouveau_event *event, int index, - int (*func)(struct nouveau_eventh*, int), + int (*func)(struct nouveau_eventh*), void *priv, struct nouveau_eventh *handler) { unsigned long flags; @@ -61,7 +61,7 @@ nouveau_event_handler_remove(struct nouveau_eventh *handler) int nouveau_event_handler_create(struct nouveau_event *event, int index, -int (*func)(struct nouveau_eventh*, int), +int (*func)(struct nouveau_eventh*), void *priv, struct nouveau_eventh **phandler) { struct nouveau_eventh *handler; @@ -157,7 +157,7 @@ nouveau_event_trigger(struct nouveau_event *event, int index) rcu_read_lock(); list_for_each_entry_rcu(handler, &event->index[index].list, head) { if (test_bit(NVKM_EVENT_ENABLE, &handler->flags)) { - if (handler->func(handler, index) == NVKM_EVENT_DROP) + if (handler->func(handler) == NVKM_EVENT_DROP) nouveau_event_put(handler); } } diff --git a/drivers/gpu/drm/nouveau/core/engine/software/nv50.c b/drivers/gpu/drm/nouveau/core/engine/software/nv50.c index e969f0c..bf6f23b 100644 --- a/drivers/gpu/drm/nouveau/core/engine/software/nv50.c +++ b/drivers/gpu/drm/nouveau/core/engine/software/nv50.c @@ -131,10 +131,9 @@ nv50_software_sclass[] = { **/ static int -nv50_software_vblsem_release(struct nouveau_eventh *event, int head) +nv50_software_vblsem_release(struct nouveau_eventh *handler) { - struct nouveau_software_chan *chan = - container_of(event, struct nouveau_software_chan, vblank.event[head]); + struct nouveau_software_chan *chan = handler->priv; struct nv50_software_priv *priv = (void *)nv_object(chan)->engine; struct nouveau_bar *bar = nouveau_bar(priv); @@ -172,7 +171,7 @@ nv50_software_context_ctor(struct nouveau_object *parent, for (i = 0; i < ARRAY_SIZE(chan->base.vblank.event); i++) { nouveau_event_handler_install(disp->vblank, i, nv50_software_vblsem_release, - NULL, + chan, &chan->base.vblank.event[i]); } return 0; diff --git a/drivers/gpu/drm/nouveau/core/engine/software/nvc0.c b/drivers/gpu/drm/nouveau/core/engine/software/nvc0.c index d06658a..1a2a7a8 100644 --- a/drivers/gpu/drm/nouveau/core/engine/software/nvc0.c +++ b/drivers/gpu/drm/nouveau/core/engine/software/nvc0.c @@ -143,10 +143,9 @@ nvc0_software_sclass[] = { **/ static int -nvc0_software_vblsem_release(struct nouveau_eventh *event, int head) +nvc0_software_vblsem_release(struct nouveau_eventh *handler) { - struct nouveau_software_chan *chan = - container_of(event, struct nouveau_software_chan, vblank.event[head]); + struct nouveau_software_chan *chan = handler->priv; struct nvc0_software_priv *priv = (void *)nv_object(chan)->engine; struct nouveau_bar *bar = nouveau_bar(priv); @@ -178,7 +177,7 @@ nvc0_software_context_ctor(struct 
nouveau_object *parent, for (i = 0; i < ARRAY_SIZE(chan->base.vblank.event); i++) { nouveau_event_handler_install(disp->vblank, i, nvc0_software_vblsem_release, - NULL, + chan, &chan->base.vblank.event[i]); } return 0; diff --git a/drivers/gpu/drm/nouveau/core/include/c
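For the vblank semaphore handlers this removes the container_of() arithmetic that depended on the head index being passed in. Condensed from the nv50 hunk above, bodies elided:

    /* Before: recover the channel from the handler's position in an
     * array, which only works if the head index is also passed in. */
    static int nv50_software_vblsem_release(struct nouveau_eventh *event, int head)
    {
            struct nouveau_software_chan *chan =
                    container_of(event, struct nouveau_software_chan,
                                 vblank.event[head]);
            /* ... body as in the hunk above ... */
    }

    /* After: the channel is handed over explicitly via ->priv at install
     * time; the head index, if needed, is available as handler->index. */
    static int nv50_software_vblsem_release(struct nouveau_eventh *handler)
    {
            struct nouveau_software_chan *chan = handler->priv;
            /* ... body as in the hunk above ... */
    }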
[PATCH 0/9] drm/nouveau: Cleanup event/handler design
On 08/28/2013 03:15 AM, Ben Skeggs wrote:
> On Wed, Aug 28, 2013 at 6:12 AM, Peter Hurley wrote:
>> This series was originally motivated by a deadlock, introduced in
>> commit 1d7c71a3e2f77336df536855b0efd2dc5bdeb41b
>> 'drm/nouveau/disp: port vblank handling to event interface',
>> due to inverted lock order between nouveau_drm_vblank_enable()
>> and nouveau_drm_vblank_handler() (the complete
>> lockdep report is included in the patch 4/5 changelog).
>
> Hey Peter,
>
> Thanks for the patch series. I've only taken a cursory glance through
> the patches thus far, as they're definitely too intrusive for -fixes
> at this point. I will take a proper look through the series within
> the next week or so, I have some work pending which may possibly make
> some of these changes unnecessary, though from what I can tell,
> there's other good bits here we'll want.
>
> In a previous mail on the locking issue, you mentioned the drm core
> doing the "wrong thing" here too, did you have any further thoughts on
> correcting that issue too?

I'm working on the premise that drm_handle_vblank() can be lockless;
that would minimize the API disruption.

I still have a fair bit of analysis to do, but some early observations:

1) The vbl_time_lock spinlock is unnecessary -- drm_handle_vblank()
could be protected with vbl_lock, since everywhere else vbl_time_lock
is held, vbl_lock is already held.

2) The _vblank_count[crtc] doesn't need to be atomic. All the code paths
that update _vblank_count[] are already spinlocked. Even if the code
weren't spinlocked, the atomic_inc/_adds aren't necessary; only the
memory barriers and the read/write order of the vblank count/timestamp
tuple are relevant.

3) Careful enabling of drm_handle_vblank() is sufficient to solve the
concurrency between drm_handle_vblank() and drm_vblank_get() without
needing a spinlock for exclusion. drm_handle_vblank() would need to
account for the possibility of having missed a vblank interrupt between
the reading of the hw vblank counter and the enabling of
drm_handle_vblank().

4) I'm still thinking about how/whether to exclude drm_handle_vblank()
and vblank_disable_and_save(). One thought is to replace the timeout
timer with delayed work instead, so that vblank_disable_and_save() could
wait for drm_handle_vblank() to finish after disabling it. [I'd also
need to verify that the drm drivers other than intel which use
drm_vblank_off() do so from non-atomic context.]

I realize that I'm mostly referring to the hw counter/timestamp flavor
of vblank handling; that's primarily because it requires more rigorous
handling and has race conditions that don't apply to the software-only
counter.

Regards,
Peter Hurley
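Point 2 above is the familiar ordered-publish pattern. A minimal sketch of the count/timestamp ordering being described (illustrative only, not the drm core code; MAX_CRTC and the variable names are placeholders):

    #define MAX_CRTC 8                      /* placeholder bound */

    static u32 vbl_count[MAX_CRTC];
    static s64 vbl_time[MAX_CRTC];          /* timestamp, ns */

    /* Writer (vblank interrupt): make the timestamp visible before the
     * count that refers to it. */
    static void publish_vblank(int crtc, s64 now)
    {
            vbl_time[crtc] = now;
            smp_wmb();
            vbl_count[crtc]++;
    }

    /* Reader: pair the barriers and retry if an update raced with us,
     * so the returned count/timestamp pair is never torn. */
    static u32 read_vblank(int crtc, s64 *ts)
    {
            u32 c;

            do {
                    c = vbl_count[crtc];
                    smp_rmb();
                    *ts = vbl_time[crtc];
                    smp_rmb();
            } while (c != vbl_count[crtc]);

            return c;
    }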
Re: 3.8-rc2: lockdep warning in nouveau driver
On Wed, 2013-01-09 at 12:45 +0100, Arend van Spriel wrote: > Maybe this one is already known, but I did not find a post about it. So > here it is. > > Regards, > Arend [snip] > [9.589986] = > [9.595365] [ INFO: possible recursive locking detected ] > [9.600745] 3.8.0-rc2-wl-testing-lockdep-2-ga524cf0 #1 Not tainted > [9.607248] - > [9.612626] modprobe/163 is trying to acquire lock: > [9.617486] (&subdev->mutex){+.+.+.}, at: [] > nv50_fb_vram_new+0x92/0x230 [nouveau] > [9.626052] > [9.626052] but task is already holding lock: > [9.631865] (&subdev->mutex){+.+.+.}, at: [] > nv50_disp_data_ctor+0x55/0xc0 [nouveau] > [9.640593] > [9.640593] other info that might help us debug this: > [9.647096] Possible unsafe locking scenario: > [9.647096] > [9.652995]CPU0 > [9.655430] > [9.657867] lock(&subdev->mutex); > [9.661365] lock(&subdev->mutex); > [9.664863] > [9.664863] *** DEADLOCK *** > [9.664863] > [9.670762] May be due to missing lock nesting notation Same. [5.995881] = [5.995886] [ INFO: possible recursive locking detected ] [5.995892] 3.8.0-next-20130125+ttypatch-xeon+lockdep #20130125+ttypatch Not tainted [5.995898] - [5.995904] modprobe/275 is trying to acquire lock: [5.995909] (&subdev->mutex){+.+.+.}, at: [] nouveau_instobj_create_+0x48/0x90 [nouveau] [5.995955] [5.995955] but task is already holding lock: [5.995961] (&subdev->mutex){+.+.+.}, at: [] nv50_disp_data_ctor+0x65/0xd0 [nouveau] [5.995989] [5.995989] other info that might help us debug this: [5.995995] Possible unsafe locking scenario: [5.995995] [5.996001]CPU0 [5.996004] [5.996005] lock(&subdev->mutex); [5.996005] lock(&subdev->mutex); [5.996005] [5.996005] *** DEADLOCK *** Regards, Peter Hurley ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH] drm/nouveau: add lockdep annotations
Hi Maarten On Mon, 2013-02-04 at 22:59 +0100, Maarten Lankhorst wrote: > Op 04-02-13 22:30, Marcin Slusarz schreef: > > 1) Lockdep thinks all nouveau subdevs belong to the same class and can be > > locked in arbitrary order, which is not true (at least in general case). > > Tell it to distinguish subdevs by (o)class type. > Apart from this specific case, is there any other reason why we require being > able to nest 2 subdev locks? > > Add a NVOBJ_FLAG_CREATE_EXCLUSIVE flag to nouveau_engctx_create and return > -EBUSY if there is any existing object. Problem solved, and lockdep is still > generic. > > > 2) DRM client can be locked under user client lock - tell lockdep to put DRM > > client lock in a separate class. > Can you copy paste a typical lockdep splat for this? Also this should be a > separate patch. :-) PS - Deep call chain. Has anyone looked into max stack depth yet? [5.995881] = [5.995886] [ INFO: possible recursive locking detected ] [5.995892] 3.8.0-next-20130125+ttypatch-xeon+lockdep #20130125+ttypatch Not tainted [5.995898] - [5.995904] modprobe/275 is trying to acquire lock: [5.995909] (&subdev->mutex){+.+.+.}, at: [] nouveau_instobj_create_+0x48/0x90 [nouveau] [5.995955] [5.995955] but task is already holding lock: [5.995961] (&subdev->mutex){+.+.+.}, at: [] nv50_disp_data_ctor+0x65/0xd0 [nouveau] [5.995989] [5.995989] other info that might help us debug this: [5.995995] Possible unsafe locking scenario: [5.995995] [5.996001]CPU0 [5.996004] [5.996005] lock(&subdev->mutex); [5.996005] lock(&subdev->mutex); [5.996005] [5.996005] *** DEADLOCK *** [5.996005] [5.996005] May be due to missing lock nesting notation [5.996005] [5.996005] 4 locks held by modprobe/275: [5.996005] #0: (&__lockdep_no_validate__){..}, at: [] __driver_attach+0x5b/0xb0 [5.996005] #1: (&__lockdep_no_validate__){..}, at: [] __driver_attach+0x69/0xb0 [5.996005] #2: (drm_global_mutex){+.+.+.}, at: [] drm_get_pci_dev+0xc6/0x2d0 [drm] [5.996005] #3: (&subdev->mutex){+.+.+.}, at: [] nv50_disp_data_ctor+0x65/0xd0 [nouveau] [5.996005] [5.996005] stack backtrace: [5.996005] Pid: 275, comm: modprobe Not tainted 3.8.0-next-20130125+ttypatch-xeon+lockdep #20130125+ttypatch [5.996005] Call Trace: [5.996005] [] __lock_acquire+0x687/0x1a70 [5.996005] [] ? mark_held_locks+0x9b/0x100 [5.996005] [] ? trace_hardirqs_on_caller+0x10d/0x1a0 [5.996005] [] ? trace_hardirqs_on+0xd/0x10 [5.996005] [] ? _raw_write_unlock_irqrestore+0x4a/0x90 [5.996005] [] lock_acquire+0x98/0x150 [5.996005] [] ? nouveau_instobj_create_+0x48/0x90 [nouveau] [5.996005] [] ? nouveau_instobj_create_+0x48/0x90 [nouveau] [5.996005] [] mutex_lock_nested+0x50/0x390 [5.996005] [] ? nouveau_instobj_create_+0x48/0x90 [nouveau] [5.996005] [] ? nouveau_object_ref+0x46/0xd0 [nouveau] [5.996005] [] ? nouveau_object_create_+0x65/0xa0 [nouveau] [5.996005] [] nouveau_instobj_create_+0x48/0x90 [nouveau] [5.996005] [] nv50_instobj_ctor+0x51/0xf0 [nouveau] [5.996005] [] nouveau_object_ctor+0x38/0xc0 [nouveau] [5.996005] [] nv50_instmem_alloc+0x26/0x30 [nouveau] [5.996005] [] nouveau_gpuobj_create_+0x247/0x2f0 [nouveau] [5.996005] [] ? _raw_spin_unlock_irqrestore+0x6d/0x90 [5.996005] [] ? trace_hardirqs_on_caller+0x10d/0x1a0 [5.996005] [] nouveau_engctx_create_+0x26c/0x2b0 [nouveau] [5.996005] [] nv50_disp_data_ctor+0xc1/0xd0 [nouveau] [5.996005] [] nouveau_object_ctor+0x38/0xc0 [nouveau] [5.996005] [] nouveau_object_new+0x112/0x240 [nouveau] [5.996005] [] nv50_display_create+0x1a5/0x890 [nouveau] [5.996005] [] ? 
__cancel_work_timer+0x8b/0xe0 [5.996005] [] nouveau_display_create+0x3cb/0x670 [nouveau] [5.996005] [] nouveau_drm_load+0x26f/0x590 [nouveau] [5.996005] [] ? device_register+0x1e/0x30 [5.996005] [] ? drm_sysfs_device_add+0x86/0xb0 [drm] [5.996005] [] drm_get_pci_dev+0x186/0x2d0 [drm] [5.996005] [] nouveau_drm_probe+0x26a/0x2c0 [nouveau] [5.996005] [] ? _raw_spin_unlock_irqrestore+0x4a/0x90 [5.996005] [] local_pci_probe+0x4b/0x80 [5.996005] [] pci_device_probe+0x111/0x120 [5.996005] [] driver_probe_device+0x8b/0x3a0 [5.996005] [] __driver_attach+0xab/0xb0 [5.996005] [] ? driver_probe_device+0x3a0/0x3a0 [5.996005] [] bus_for_each_dev+0x55/0x90 [5.996005] [] driver_attach+0x1e/0x20 [5.996005] [] bus_add_driver+0x121/0x2b0 [5.996005] [] ? 0xa01acfff [5.996005] [] driver_register+0x77/0x170 [5.996005] [] ? 0xa01acfff [5.996005] [] __pci_register_driver+0x64/0x70 [5.996005] [] drm_pci_init+0x11a/0x130 [drm] [5.996005] [] ? 0xa01acfff [5.996005] [] ?
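For reference, the two standard ways to quiet this class of report when the nesting really is safe are a nesting annotation at the inner acquisition or distinct lockdep classes per subdev type. Schematically (this is the generic lockdep idiom, not the patch under discussion; the subdev pointers and key name are placeholders):

    /* Option 1: annotate the inner acquisition as an intended nesting. */
    mutex_lock(&parent_subdev->mutex);
    mutex_lock_nested(&child_subdev->mutex, SINGLE_DEPTH_NESTING);

    /* Option 2: give different subdev types distinct lockdep classes so
     * that nesting across types no longer looks like self-recursion. */
    static struct lock_class_key instmem_mutex_key;     /* hypothetical */

    mutex_init(&subdev->mutex);
    lockdep_set_class(&subdev->mutex, &instmem_mutex_key);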
[next-20130204] nouveau: lockdep warning (not the same as the subdev lockdep warning)
Got this lockdep warning straightaway during boot: [7.435890] = [7.435891] [ INFO: possible recursive locking detected ] [7.435893] 3.8.0-next-20130204+pcipatch-xeon+lockdep #20130204+pcipatch Not tainted [7.435893] - [7.435895] modprobe/255 is trying to acquire lock: [7.435942] (&dmac->lock){+.+...}, at: [] evo_wait+0x43/0xf0 [nouveau] [7.435942] [7.435942] but task is already holding lock: [7.435964] (&dmac->lock){+.+...}, at: [] evo_wait+0x43/0xf0 [nouveau] [7.435964] [7.435964] other info that might help us debug this: [7.435965] Possible unsafe locking scenario: [7.435965] [7.435966]CPU0 [7.435967] [7.435968] lock(&dmac->lock); [7.435970] lock(&dmac->lock); [7.435970] [7.435970] *** DEADLOCK *** [7.435970] [7.435971] May be due to missing lock nesting notation [7.435971] [7.435973] 9 locks held by modprobe/255: [7.435979] #0: (&__lockdep_no_validate__){..}, at: [] __driver_attach+0x5b/0xb0 [7.435982] #1: (&__lockdep_no_validate__){..}, at: [] __driver_attach+0x69/0xb0 [7.436001] #2: (drm_global_mutex){+.+.+.}, at: [] drm_get_pci_dev+0xbd/0x2a0 [drm] [7.436012] #3: (registration_lock){+.+.+.}, at: [] register_framebuffer+0x25/0x2f0 [7.436016] #4: (&fb_info->lock){+.+.+.}, at: [] lock_fb_info+0x26/0x60 [7.436023] #5: ((fb_notifier_list).rwsem){.+.+.+}, at: [] __blocking_notifier_call_chain+0x56/0xc0 [7.436035] #6: (&dev->mode_config.mutex){+.+.+.}, at: [] drm_modeset_lock_all+0x2a/0x70 [drm] [7.436046] #7: (&crtc->mutex){+.+.+.}, at: [] drm_modeset_lock_all+0x54/0x70 [drm] [7.436068] #8: (&dmac->lock){+.+...}, at: [] evo_wait+0x43/0xf0 [nouveau] [7.436068] [7.436068] stack backtrace: [7.436070] Pid: 255, comm: modprobe Not tainted 3.8.0-next-20130204+pcipatch-xeon+lockdep #20130204+pcipatch [7.436072] Call Trace: [7.436077] [] __lock_acquire+0x697/0x1a80 [7.436080] [] lock_acquire+0xa1/0x200 [7.436100] [] ? evo_wait+0x43/0xf0 [nouveau] [7.436121] [] ? evo_wait+0x43/0xf0 [nouveau] [7.436125] [] mutex_lock_nested+0x63/0x3a0 [7.436145] [] ? evo_wait+0x43/0xf0 [nouveau] [7.436148] [] ? mutex_lock_nested+0x29e/0x3a0 [7.436168] [] ? evo_wait+0x43/0xf0 [nouveau] [7.436170] [] ? mark_held_locks+0xaf/0x110 [7.436190] [] evo_wait+0x43/0xf0 [nouveau] [7.436212] [] evo_sync+0x5c/0xf0 [nouveau] [7.436233] [] nv50_display_flip_next+0x5ad/0x5c0 [nouveau] Regards, Peter Hurley ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: kernel BUG at mm/slub.c:3409, 3.8.0-rc7
[+cc Marcin Slusarz, dri-devel] On Tue, 2013-02-19 at 00:21 +, Christoph Lameter wrote: > The problem is that the subsystem attempted to call kfree with a pointer > that was not obtained via a slab allocation. > > On Sat, 16 Feb 2013, Denys Fedoryshchenko wrote: > > > Hi > > > > Worked for a while on 3.8.0-rc7, generally it is fine, then suddenly laptop > > stopped responding to keyboard and mouse. > > Sure it can be memory corruption by some other module, but maybe not. Worth > > to > > report i guess. > > After reboot checked logs and found this: > > > > Feb 16 00:40:17 localhost kernel: [23260.079253] [ cut here > > ] > > Feb 16 00:40:17 localhost kernel: [23260.079257] kernel BUG at > > mm/slub.c:3409! > > Feb 16 00:40:17 localhost kernel: [23260.079259] invalid opcode: [#1] > > SMP > > Feb 16 00:40:17 localhost kernel: [23260.079262] Modules linked in: > > ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 > > nf_defrag_ipv4 > > xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter > > ip_tables tun bridge stp llc nouveau snd_hda_codec_hdmi coretemp kvm_intel Was there an allocation failure earlier in the log? Might be this nouveau bug (linux-next at the time was 3.8 now): https://bugs.freedesktop.org/show_bug.cgi?id=58087 I think this was fixed but neither bug report has a cross reference :( The original report is here: https://bugzilla.kernel.org/show_bug.cgi?id=51291 Pekka, Can you please re-assign the bugzilla #51291 above to DRI? Thanks. Regards, Peter Hurley ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
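The BUG Christoph describes fires when kfree() is handed a pointer that kmalloc() never returned -- typically an embedded or static object, or a pointer into the middle of an allocation. A contrived example of the bug class (not nouveau code):

    #include <linux/slab.h>

    struct foo {
            int data;
    };

    static struct foo static_foo;           /* not from kmalloc() */

    static void broken_free(struct foo *f)
    {
            /* kfree(&static_foo), or kfree() on an offset into another
             * allocation, trips the slub sanity check and hits the
             * BUG at mm/slub.c reported above. */
            kfree(f);
    }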
Page allocation failures (was Re: WARNING: at drivers/gpu/drm/nouveau/core/core/mm.c:242)
On Tue, 2013-02-19 at 00:43 +0100, Jiri Slaby wrote: > On 02/19/2013 12:23 AM, Marcin Slusarz wrote: > > On Mon, Feb 18, 2013 at 11:27:43AM +0100, Jiri Slaby wrote: > >> Hi, > >> > >> we have a report of WARNING from 3.7.6 in nouveau at > >> drivers/gpu/drm/nouveau/core/core/mm.c:242 here: > >> https://bugzilla.novell.com/show_bug.cgi?id=802347#c11 > >> > >> There is an order 4 allocation failure in nouveau_drm_open -> > >> nouveau_vm_create, i.e. this one failed: > >> vm->pgt = kcalloc(vm->lpde - vm->fpde + 1, sizeof(*vm->pgt), GFP_KERNEL); Hi Jiri, I had the order 4 allocation failure and nouveau crash back in November on next-20121129. Bugzillas here: Allocation failure: https://bugzilla.kernel.org/show_bug.cgi?id=51301 Nouveau bug: https://bugzilla.kernel.org/show_bug.cgi?id=51291 https://bugs.freedesktop.org/show_bug.cgi?id=58087 IMO, the 32k allocation failure is the more serious bug. Check out the slab info from your report: Feb 06 13:16:15 desdemona kernel: Node 0 DMA32: 13378*4kB 5026*8kB 1823*16kB 135*32kB 5*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 129576kB Feb 06 13:16:15 desdemona kernel: Node 0 Normal: 1946*4kB 831*8kB 3*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 16496kB The pages are there: why did the allocation fail? I think this is related to all that kswapd mess. In my case, the machine really was OOM -- which made no sense. Completely out of page blocks larger than 32k on a 10gb machine with a bunch of emacs and terminal windows open for 3 days, just doing code, build, code, build, code, build? Regards, Peter Hurley ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
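As a reminder of what the order means here: an order-n request asks the page allocator for 2^n physically contiguous pages, so with 4 KiB pages a 32 KiB request is order 3 and a 64 KiB request is order 4. Sketch, reusing the kcalloc arguments quoted above (get_order() is the kernel helper that does this mapping):

    /* order 3 covers up to 32 KiB, order 4 up to 64 KiB (4 KiB pages) */
    size_t bytes = (vm->lpde - vm->fpde + 1) * sizeof(*vm->pgt);
    int order = get_order(bytes);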
[bisected][3.9.0-rc3] NULL ptr dereference from nv50_disp_intr()
On vanilla 3.9.0-rc3, I get this 100% repeatable oops after login when the user X session is coming up: BUG: unable to handle kernel NULL pointer dereference at 0001 IP: [<0001>] 0x0 PGD 0 Oops: 0010 [#1] PREEMPT SMP Modules linked in: ip6table_filter ip6_tables ebtable_nat ebtables .. CPU 3 Pid: 0, comm: swapper/3 Not tainted 3.9.0-rc3-xeon #rc3 Dell Inc. Precision WorkStation T5400 /0RW203 RIP: 0010:[<0001>] [<0001>] 0x0 RSP: 0018:8802afcc3d80 EFLAGS: 00010087 RAX: 88029f6e5808 RBX: 0001 RCX: RDX: 0096 RSI: 0001 RDI: 88029f6e5808 RBP: 8802afcc3dc8 R08: R09: 0004 R10: 002c R11: 88029e559a98 R12: 8802a376cb78 R13: 88029f6e57e0 R14: 88029f6e57f8 R15: 88029f6e5808 FS: () GS:8802afcc() knlGS: CS: 0010 DS: ES: CR0: 8005003b CR2: 0001 CR3: 00029fa67000 CR4: 07e0 DR0: DR1: DR2: DR3: DR6: 0ff0 DR7: 0400 Process swapper/3 (pid: 0, threadinfo 8802a355e000, task 8802a3535c40) Stack: a0159d8a 0082 88029f6e5820 0001 88029f71aa00 0400 0400 8802afcc3e38 a01843b5 8802afcc3df8 Call Trace: [] ? nouveau_event_trigger+0xaa/0xe0 [nouveau] [] nv50_disp_intr+0xc5/0x200 [nouveau] [] ? _raw_spin_unlock_irqrestore+0x2c/0x50 [] ? notifier_call_chain+0x4d/0x70 [] nouveau_mc_intr+0xb5/0x110 [nouveau] [] nouveau_irq_handler+0x6f/0x80 [nouveau] [] handle_irq_event_percpu+0x75/0x260 [] handle_irq_event+0x48/0x70 [] handle_fasteoi_irq+0x5a/0x100 [] handle_irq+0x22/0x40 [] do_IRQ+0x5a/0xd0 [] common_interrupt+0x6d/0x6d [] ? native_safe_halt+0x6/0x10 [] default_idle+0x3d/0x170 [] cpu_idle+0x116/0x130 [] start_secondary+0x251/0x258 Code: Bad RIP value. RIP [<0001>] 0x0 RSP CR2: 0001 ---[ end trace 907323cb8ce6f301 ]--- git bisect from 3.8.0 (good) to 3.9.0-rc3 (bad) blames (bisect log attached): 1d7c71a3e2f77336df536855b0efd2dc5bdeb41b is the first bad commit commit 1d7c71a3e2f77336df536855b0efd2dc5bdeb41b Author: Ben Skeggs Date: Thu Jan 31 09:23:34 2013 +1000 drm/nouveau/disp: port vblank handling to event interface This removes the nastiness with the interactions between display and software engines when handling vblank semaphore release interrupts. Now, all the semantics are handled in one place (sw) \o/. Signed-off-by: Ben Skeggs :04 04 fbd44f8566271415fd2775ab4b6346efef7e82fe a0730be0f35feaa1476b1447b1d65c4b3b3c0686 M drivers On this hardware: nouveau [ DEVICE][:02:00.0] BOOT0 : 0x084e00a2 nouveau [ DEVICE][:02:00.0] Chipset: G84 (NV84) nouveau [ DEVICE][:02:00.0] Family : NV50 nouveau [ VBIOS][:02:00.0] checking PRAMIN for image... nouveau [ VBIOS][:02:00.0] ... 
appears to be valid nouveau [ VBIOS][:02:00.0] using image from PRAMIN nouveau [ VBIOS][:02:00.0] BIT signature found nouveau [ VBIOS][:02:00.0] version 60.84.63.00.11 nouveau [ PFB][:02:00.0] RAM type: DDR2 nouveau [ PFB][:02:00.0] RAM size: 256 MiB nouveau [ PFB][:02:00.0]ZCOMP: 1892 tags nouveau [ DRM] VRAM: 256 MiB nouveau [ DRM] GART: 512 MiB nouveau [ DRM] BIT BIOS found nouveau [ DRM] Bios version 60.84.63.00 nouveau [ DRM] TMDS table version 2.0 nouveau [ DRM] DCB version 4.0 nouveau [ DRM] DCB outp 00: 02000300 0028 nouveau [ DRM] DCB outp 01: 01000302 0030 nouveau [ DRM] DCB outp 02: 04011310 0028 nouveau [ DRM] DCB outp 03: 02011312 0030 nouveau [ DRM] DCB conn 00: 1030 nouveau [ DRM] DCB conn 01: 2130 nouveau [ DRM] 2 available performance level(s) nouveau [ DRM] 0: core 208MHz shader 416MHz memory 100MHz voltage 1200mV fanspeed 100% nouveau [ DRM] 1: core 460MHz shader 920MHz memory 400MHz voltage 1200mV fanspeed 100% nouveau [ DRM] c: core 459MHz shader 918MHz memory 399MHz voltage 1200mV nouveau [ DRM] MM: using CRYPT for buffer copies nouveau [ DRM] allocated 1680x1050 fb: 0x6, bo 88029ef50400 fbcon: nouveaufb (fb0) is primary device nouveau :02:00.0: fb0: nouveaufb frame buffer device nouveau :02:00.0: registered panic notifier [drm] Initialized nouveau 1.1.0 20120801 for :02:00.0 on minor 0 02:00.0 VGA compatible controller: NVIDIA Corporation G84 [Quadro FX 570] (rev a1) (prog-if 00 [VGA controller]) Subsystem: NVIDIA Corporation Device 0474 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR-
Re: [Nouveau] nouveau lockdep splat
[ adding Ben Skeggs and Dave Airlie ] On Tue, 2013-03-19 at 21:24 +0100, Borislav Petkov wrote: > On Tue, Mar 05, 2013 at 05:30:52PM +0100, Lucas Stach wrote: > > Dropping Tegra ML, it's not the place where Nouveau mails should go. > > Adding Nouveau ML and Maarten, who probably knows Lockdep+Nouveau best. > > Ok, > > with the hope of having the right people on CC now (finally, thanks > Lucas :-)), here's the same splat on -rc3. Someone better take a look > soonish, please: Also happens in next (on nv50 hardware). > [0.541078] [drm] No driver support for vblank timestamp query. > [0.541272] nouveau [ DRM] 3 available performance level(s) > [0.541276] nouveau [ DRM] 0: core 135MHz shader 270MHz memory 135MHz > voltage 900mV > [0.541280] nouveau [ DRM] 1: core 405MHz shader 810MHz memory 405MHz > voltage 900mV > [0.541284] nouveau [ DRM] 3: core 520MHz shader 1230MHz memory > 790MHz voltage 900mV > [0.541287] nouveau [ DRM] c: core 405MHz shader 810MHz memory 405MHz > voltage 900mV > [0.559846] nouveau [ DRM] MM: using COPY for buffer copies > [0.625371] nouveau [ DRM] allocated 1920x1080 fb: 0x7, bo > 88043b54f000 > [0.625441] fbcon: nouveaufb (fb0) is primary device > [0.62] > [0.625556] = > [0.625556] [ INFO: possible recursive locking detected ] > [0.625557] 3.9.0-rc3+ #25 Not tainted > [0.625557] - > [0.625558] swapper/0/1 is trying to acquire lock: > [0.625562] (&dmac->lock){+.+...}, at: [] > evo_wait+0x43/0xf0 > [0.625562] > [0.625562] but task is already holding lock: > [0.625564] (&dmac->lock){+.+...}, at: [] > evo_wait+0x43/0xf0 > [0.625565] > [0.625565] other info that might help us debug this: > [0.625565] Possible unsafe locking scenario: > [0.625565] > [0.625565]CPU0 > [0.625565] > [0.625566] lock(&dmac->lock); > [0.625567] lock(&dmac->lock); > [0.625567] > [0.625567] *** DEADLOCK *** > [0.625567] > [0.625567] May be due to missing lock nesting notation > [0.625567] > [0.625568] 10 locks held by swapper/0/1: > [0.625570] #0: (&__lockdep_no_validate__){..}, at: > [] __driver_attach+0x5b/0xb0 > [0.625572] #1: (&__lockdep_no_validate__){..}, at: > [] __driver_attach+0x69/0xb0 > [0.625575] #2: (drm_global_mutex){+.+.+.}, at: [] > drm_get_pci_dev+0xc6/0x2d0 > [0.625578] #3: (registration_lock){+.+.+.}, at: [] > register_framebuffer+0x25/0x310 > [0.625581] #4: (&fb_info->lock){+.+.+.}, at: [] > lock_fb_info+0x26/0x60 > [0.625583] #5: (console_lock){+.+.+.}, at: [] > register_framebuffer+0x1ba/0x310 > [0.625585] #6: ((fb_notifier_list).rwsem){.+.+.+}, at: > [] __blocking_notifier_call_chain+0x42/0x80 > [0.625587] #7: (&dev->mode_config.mutex){+.+.+.}, at: > [] drm_modeset_lock_all+0x2a/0x70 > [0.625589] #8: (&crtc->mutex){+.+.+.}, at: [] > drm_modeset_lock_all+0x54/0x70 > [0.625591] #9: (&dmac->lock){+.+...}, at: [] > evo_wait+0x43/0xf0 > [0.625591] > [0.625591] stack backtrace: > [0.625592] Pid: 1, comm: swapper/0 Not tainted 3.9.0-rc3+ #25 > [0.625593] Call Trace: > [0.625595] [] __lock_acquire+0x76b/0x1c20 > [0.625597] [] ? dcb_table+0x1ac/0x2a0 > [0.625599] [] lock_acquire+0x8a/0x120 > [0.625600] [] ? evo_wait+0x43/0xf0 > [0.625602] [] ? mutex_lock_nested+0x292/0x330 > [0.625603] [] mutex_lock_nested+0x6e/0x330 > [0.625605] [] ? evo_wait+0x43/0xf0 > [0.625606] [] ? mark_held_locks+0x9b/0x100 > [0.625607] [] evo_wait+0x43/0xf0 > [0.625609] [] nv50_display_flip_next+0x713/0x7a0 > [0.625611] [] ? mutex_unlock+0xe/0x10 > [0.625612] [] ? 
evo_kick+0x37/0x40 > [0.625613] [] nv50_crtc_commit+0x10e/0x230 > [0.625615] [] drm_crtc_helper_set_mode+0x365/0x510 > [0.625617] [] drm_crtc_helper_set_config+0xa4e/0xb70 > [0.625618] [] drm_mode_set_config_internal+0x31/0x70 > [0.625619] [] drm_fb_helper_set_par+0x71/0xf0 > [0.625621] [] fbcon_init+0x514/0x5a0 > [0.625623] [] visual_init+0xbc/0x120 > [0.625624] [] do_bind_con_driver+0x163/0x320 > [0.625625] [] do_take_over_console+0x61/0x70 > [0.625627] [] do_fbcon_takeover+0x63/0xc0 > [0.625628] [] fbcon_event_notify+0x5fd/0x700 > [0.625629] [] notifier_call_chain+0x4d/0x70 > [0.625630] [] __blocking_notifier_call_chain+0x58/0x80 > [0.625631] [] blocking_notifier_call_chain+0x16/0x20 > [0.625633] [] fb_notifier_call_chain+0x1b/0x20 > [0.625634] [] register_framebuffer+0x1c8/0x310 > [0.625635] [] drm_fb_helper_initial_config+0x371/0x520 > [0.625637] [] ? > drm_fb_helper_single_add_all_connectors+0x47/0xf0 > [0.625639] [] ? kmem_cache_alloc_trace+0xee/0x150 > [0.625641] [] nouveau_fbcon_init+0x10e/0x160 > [
Re: [bisected][3.9.0-rc3] NULL ptr dereference from nv50_disp_intr()
On Tue, 2013-03-19 at 11:13 -0400, Peter Hurley wrote: > On vanilla 3.9.0-rc3, I get this 100% repeatable oops after login when > the user X session is coming up: Perhaps I wasn't clear that this happens on every boot and is a regression from 3.8 I'd be happy to help resolve this but time is of the essence; it would be a shame to have to revert all of this for 3.9 Regards, Peter Hurley > BUG: unable to handle kernel NULL pointer dereference at 0001 > IP: [<0001>] 0x0 > PGD 0 > Oops: 0010 [#1] PREEMPT SMP > Modules linked in: ip6table_filter ip6_tables ebtable_nat ebtables > .. > CPU 3 > Pid: 0, comm: swapper/3 Not tainted 3.9.0-rc3-xeon #rc3 Dell Inc. Precision > WorkStation T5400 /0RW203 > RIP: 0010:[<0001>] [<0001>] 0x0 > RSP: 0018:8802afcc3d80 EFLAGS: 00010087 > RAX: 88029f6e5808 RBX: 0001 RCX: > RDX: 0096 RSI: 0001 RDI: 88029f6e5808 > RBP: 8802afcc3dc8 R08: R09: 0004 > R10: 002c R11: 88029e559a98 R12: 8802a376cb78 > R13: 88029f6e57e0 R14: 88029f6e57f8 R15: 88029f6e5808 > FS: () GS:8802afcc() knlGS: > CS: 0010 DS: ES: CR0: 8005003b > CR2: 0001 CR3: 00029fa67000 CR4: 07e0 > DR0: DR1: DR2: > DR3: DR6: 0ff0 DR7: 0400 > Process swapper/3 (pid: 0, threadinfo 8802a355e000, task 8802a3535c40) > Stack: > a0159d8a 0082 88029f6e5820 0001 > 88029f71aa00 0400 > 0400 8802afcc3e38 a01843b5 8802afcc3df8 > Call Trace: > > [] ? nouveau_event_trigger+0xaa/0xe0 [nouveau] > [] nv50_disp_intr+0xc5/0x200 [nouveau] > [] ? _raw_spin_unlock_irqrestore+0x2c/0x50 > [] ? notifier_call_chain+0x4d/0x70 > [] nouveau_mc_intr+0xb5/0x110 [nouveau] > [] nouveau_irq_handler+0x6f/0x80 [nouveau] > [] handle_irq_event_percpu+0x75/0x260 > [] handle_irq_event+0x48/0x70 > [] handle_fasteoi_irq+0x5a/0x100 > [] handle_irq+0x22/0x40 > [] do_IRQ+0x5a/0xd0 > [] common_interrupt+0x6d/0x6d > > [] ? native_safe_halt+0x6/0x10 > [] default_idle+0x3d/0x170 > [] cpu_idle+0x116/0x130 > [] start_secondary+0x251/0x258 > Code: Bad RIP value. > RIP [<0001>] 0x0 > RSP > CR2: 0001 > ---[ end trace 907323cb8ce6f301 ]--- > > > > git bisect from 3.8.0 (good) to 3.9.0-rc3 (bad) blames (bisect log > attached): > > 1d7c71a3e2f77336df536855b0efd2dc5bdeb41b is the first bad commit > commit 1d7c71a3e2f77336df536855b0efd2dc5bdeb41b > Author: Ben Skeggs > Date: Thu Jan 31 09:23:34 2013 +1000 > > drm/nouveau/disp: port vblank handling to event interface > > This removes the nastiness with the interactions between display and > software engines when handling vblank semaphore release interrupts. > > Now, all the semantics are handled in one place (sw) \o/. > > Signed-off-by: Ben Skeggs > > :04 04 fbd44f8566271415fd2775ab4b6346efef7e82fe > a0730be0f35feaa1476b1447b1d65c4b3b3c0686 Mdrivers > > > On this hardware: > nouveau [ DEVICE][:02:00.0] BOOT0 : 0x084e00a2 > nouveau [ DEVICE][:02:00.0] Chipset: G84 (NV84) > nouveau [ DEVICE][:02:00.0] Family : NV50 > nouveau [ VBIOS][:02:00.0] checking PRAMIN for image... > nouveau [ VBIOS][:02:00.0] ... 
appears to be valid > nouveau [ VBIOS][:02:00.0] using image from PRAMIN > nouveau [ VBIOS][:02:00.0] BIT signature found > nouveau [ VBIOS][:02:00.0] version 60.84.63.00.11 > nouveau [ PFB][:02:00.0] RAM type: DDR2 > nouveau [ PFB][:02:00.0] RAM size: 256 MiB > nouveau [ PFB][:02:00.0]ZCOMP: 1892 tags > nouveau [ DRM] VRAM: 256 MiB > nouveau [ DRM] GART: 512 MiB > nouveau [ DRM] BIT BIOS found > nouveau [ DRM] Bios version 60.84.63.00 > nouveau [ DRM] TMDS table version 2.0 > nouveau [ DRM] DCB version 4.0 > nouveau [ DRM] DCB outp 00: 02000300 0028 > nouveau [ DRM] DCB outp 01: 01000302 0030 > nouveau [ DRM] DCB outp 02: 04011310 0028 > nouveau [ DRM] DCB outp 03: 02011312 0030 > nouveau [ DRM] DCB conn 00: 1030 > nouveau [ DRM] DCB conn 01: 2130 > nouveau [ DRM] 2 available performance level(s) > nouveau [ DRM] 0: core 208MHz shader 416MHz memory 100MHz voltage 1200mV > fanspeed 100% > nouveau [ DRM] 1: core 46
Re: [PATCH] drm/nouveau: fix NULL ptr dereference from nv50_disp_intr()
On Sun, 2013-03-24 at 12:56 +0100, Maarten Lankhorst wrote:
> Op 23-03-13 12:47, Peter Hurley schreef:
> > On Tue, 2013-03-19 at 11:13 -0400, Peter Hurley wrote:
> >> On vanilla 3.9.0-rc3, I get this 100% repeatable oops after login when
> >> the user X session is coming up:
> > Perhaps I wasn't clear that this happens on every boot and is a
> > regression from 3.8
> >
> > I'd be happy to help resolve this but time is of the essence; it would
> > be a shame to have to revert all of this for 3.9
>
> Well it broke on my system too, so it was easy to fix.
>
> I didn't even need gdm to trigger it!
>
> >8
> This fixes regression caused by 1d7c71a3e2f7 (drm/nouveau/disp: port
> vblank handling to event interface),

Thanks Maarten!

But am I the only one running multi-head nouveau on linux-next and
early RCs? That's a scary thought.

Is there a test bench for validating nouveau?

Regards,
Peter Hurley
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [Nouveau] [PATCH v2] drm/nouveau: wait for vblank on page flipping
On Thu, 2013-03-28 at 16:16 +0100, Maarten Lankhorst wrote: > Signed-off-by: Maarten Lankhorst > --- > Oops, fixed to apply this time.. > > diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c > b/drivers/gpu/drm/nouveau/nouveau_display.c > index 4610c3a..020542e 100644 > --- a/drivers/gpu/drm/nouveau/nouveau_display.c > +++ b/drivers/gpu/drm/nouveau/nouveau_display.c > @@ -593,7 +597,7 @@ nouveau_crtc_page_flip(struct drm_crtc *crtc, struct > drm_framebuffer *fb, > > /* Emit a page flip */ > if (nv_device(drm->device)->card_type >= NV_50) { > - ret = nv50_display_flip_next(crtc, fb, chan, 0); > + ret = nv50_display_flip_next(crtc, fb, chan, 1); Why would this work? > if (ret) { > mutex_unlock(&chan->cli->mutex); > goto fail_unreserve; ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
appears to be valid > nouveau [ VBIOS][:02:00.0] using image from PRAMIN > nouveau [ VBIOS][:02:00.0] BIT signature found > nouveau [ VBIOS][:02:00.0] version 60.84.63.00.11 > nouveau [ PFB][:02:00.0] RAM type: DDR2 > nouveau [ PFB][:02:00.0] RAM size: 256 MiB > nouveau [ PFB][:02:00.0]ZCOMP: 1892 tags > nouveau [ DRM] VRAM: 256 MiB > nouveau [ DRM] GART: 512 MiB > nouveau [ DRM] BIT BIOS found > nouveau [ DRM] Bios version 60.84.63.00 > nouveau [ DRM] TMDS table version 2.0 > nouveau [ DRM] DCB version 4.0 > nouveau [ DRM] DCB outp 00: 02000300 0028 > nouveau [ DRM] DCB outp 01: 01000302 0030 > nouveau [ DRM] DCB outp 02: 04011310 0028 > nouveau [ DRM] DCB outp 03: 02011312 0030 > nouveau [ DRM] DCB conn 00: 1030 > nouveau [ DRM] DCB conn 01: 2130 > nouveau [ DRM] 2 available performance level(s) > nouveau [ DRM] 0: core 208MHz shader 416MHz memory 100MHz voltage 1200mV > fanspeed 100% > nouveau [ DRM] 1: core 46
[PATCH] drm/nouveau: fix NULL ptr dereference from nv50_disp_intr()
On Sun, 2013-03-24 at 12:56 +0100, Maarten Lankhorst wrote: > Op 23-03-13 12:47, Peter Hurley schreef: > > On Tue, 2013-03-19 at 11:13 -0400, Peter Hurley wrote: > >> On vanilla 3.9.0-rc3, I get this 100% repeatable oops after login when > >> the user X session is coming up: > > Perhaps I wasn't clear that this happens on every boot and is a > > regression from 3.8 > > > > I'd be happy to help resolve this but time is of the essence; it would > > be a shame to have to revert all of this for 3.9 > > Well it broke on my system too, so it was easy to fix. > > I didn't even need gdm to trigger it! > > >8 > This fixes regression caused by 1d7c71a3e2f7 (drm/nouveau/disp: port vblank > handling to event interface), Thanks Maarten! But am I the only one running multi-head nouveau on linux-next and early RCs? That's a scary thought. Is there a test bench for validating nouveau? Regards, Peter Hurley
[Nouveau] [PATCH v2] drm/nouveau: wait for vblank on page flipping
On Thu, 2013-03-28 at 16:16 +0100, Maarten Lankhorst wrote: > Signed-off-by: Maarten Lankhorst > --- > Oops, fixed to apply this time.. > > diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c > b/drivers/gpu/drm/nouveau/nouveau_display.c > index 4610c3a..020542e 100644 > --- a/drivers/gpu/drm/nouveau/nouveau_display.c > +++ b/drivers/gpu/drm/nouveau/nouveau_display.c > @@ -593,7 +597,7 @@ nouveau_crtc_page_flip(struct drm_crtc *crtc, struct > drm_framebuffer *fb, > > /* Emit a page flip */ > if (nv_device(drm->device)->card_type >= NV_50) { > - ret = nv50_display_flip_next(crtc, fb, chan, 0); > + ret = nv50_display_flip_next(crtc, fb, chan, 1); Why would this work? > if (ret) { > mutex_unlock(&chan->cli->mutex); > goto fail_unreserve;
[Intel-gfx] [PATCH] drm/vblank: Fixup and document timestamp update/read barriers
On 05/04/2015 12:52 AM, Mario Kleiner wrote: > On 04/16/2015 03:03 PM, Daniel Vetter wrote: >> On Thu, Apr 16, 2015 at 08:30:55AM -0400, Peter Hurley wrote: >>> On 04/15/2015 01:31 PM, Daniel Vetter wrote: >>>> On Wed, Apr 15, 2015 at 09:00:04AM -0400, Peter Hurley wrote: >>>>> Hi Daniel, >>>>> >>>>> On 04/15/2015 03:17 AM, Daniel Vetter wrote: >>>>>> This was a bit too much cargo-culted, so lets make it solid: >>>>>> - vblank->count doesn't need to be an atomic, writes are always done >>>>>>under the protection of dev->vblank_time_lock. Switch to an unsigned >>>>>>long instead and update comments. Note that atomic_read is just a >>>>>>normal read of a volatile variable, so no need to audit all the >>>>>>read-side access specifically. >>>>>> >>>>>> - The barriers for the vblank counter seqlock weren't complete: The >>>>>>read-side was missing the first barrier between the counter read and >>>>>>the timestamp read, it only had a barrier between the ts and the >>>>>>counter read. We need both. >>>>>> >>>>>> - Barriers weren't properly documented. Since barriers only work if >>>>>>you have them on boths sides of the transaction it's prudent to >>>>>>reference where the other side is. To avoid duplicating the >>>>>>write-side comment 3 times extract a little store_vblank() helper. >>>>>>In that helper also assert that we do indeed hold >>>>>>dev->vblank_time_lock, since in some cases the lock is acquired a >>>>>>few functions up in the callchain. >>>>>> >>>>>> Spotted while reviewing a patch from Chris Wilson to add a fastpath to >>>>>> the vblank_wait ioctl. >>>>>> >>>>>> Cc: Chris Wilson >>>>>> Cc: Mario Kleiner >>>>>> Cc: Ville Syrjälä >>>>>> Cc: Michel Dänzer >>>>>> Signed-off-by: Daniel Vetter >>>>>> --- >>>>>> drivers/gpu/drm/drm_irq.c | 92 >>>>>> --- >>>>>> include/drm/drmP.h| 8 +++-- >>>>>> 2 files changed, 54 insertions(+), 46 deletions(-) >>>>>> >>>>>> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c >>>>>> index c8a34476570a..23bfbc61a494 100644 >>>>>> --- a/drivers/gpu/drm/drm_irq.c >>>>>> +++ b/drivers/gpu/drm/drm_irq.c >>>>>> @@ -74,6 +74,33 @@ module_param_named(vblankoffdelay, >>>>>> drm_vblank_offdelay, int, 0600); >>>>>> module_param_named(timestamp_precision_usec, drm_timestamp_precision, >>>>>> int, 0600); >>>>>> module_param_named(timestamp_monotonic, drm_timestamp_monotonic, int, >>>>>> 0600); >>>>>> >>>>>> +static void store_vblank(struct drm_device *dev, int crtc, >>>>>> + unsigned vblank_count_inc, >>>>>> + struct timeval *t_vblank) >>>>>> +{ >>>>>> +struct drm_vblank_crtc *vblank = &dev->vblank[crtc]; >>>>>> +u32 tslot; >>>>>> + >>>>>> +assert_spin_locked(&dev->vblank_time_lock); >>>>>> + >>>>>> +if (t_vblank) { >>>>>> +tslot = vblank->count + vblank_count_inc; >>>>>> +vblanktimestamp(dev, crtc, tslot) = *t_vblank; >>>>>> +} >>>>>> + >>>>>> +/* >>>>>> + * vblank timestamp updates are protected on the write side with >>>>>> + * vblank_time_lock, but on the read side done locklessly using a >>>>>> + * sequence-lock on the vblank counter. Ensure correct ordering >>>>>> using >>>>>> + * memory barrriers. We need the barrier both before and also after >>>>>> the >>>>>> + * counter update to synchronize with the next timestamp write. >>>>>> + * The read-side barriers for this are in drm_vblank_count_and_time. >>>>>> + */ >>>>>> +smp_wmb(); >>>>>> +vblank->count += vblank_c
[Intel-gfx] [PATCH] drm/vblank: Fixup and document timestamp update/read barriers
On 05/05/2015 11:42 AM, Daniel Vetter wrote: > On Tue, May 05, 2015 at 10:36:24AM -0400, Peter Hurley wrote: >> On 05/04/2015 12:52 AM, Mario Kleiner wrote: >>> On 04/16/2015 03:03 PM, Daniel Vetter wrote: >>>> On Thu, Apr 16, 2015 at 08:30:55AM -0400, Peter Hurley wrote: >>>>> On 04/15/2015 01:31 PM, Daniel Vetter wrote: >>>>>> On Wed, Apr 15, 2015 at 09:00:04AM -0400, Peter Hurley wrote: >>>>>>> Hi Daniel, >>>>>>> >>>>>>> On 04/15/2015 03:17 AM, Daniel Vetter wrote: >>>>>>>> This was a bit too much cargo-culted, so lets make it solid: >>>>>>>> - vblank->count doesn't need to be an atomic, writes are always done >>>>>>>>under the protection of dev->vblank_time_lock. Switch to an unsigned >>>>>>>>long instead and update comments. Note that atomic_read is just a >>>>>>>>normal read of a volatile variable, so no need to audit all the >>>>>>>>read-side access specifically. >>>>>>>> >>>>>>>> - The barriers for the vblank counter seqlock weren't complete: The >>>>>>>>read-side was missing the first barrier between the counter read and >>>>>>>>the timestamp read, it only had a barrier between the ts and the >>>>>>>>counter read. We need both. >>>>>>>> >>>>>>>> - Barriers weren't properly documented. Since barriers only work if >>>>>>>>you have them on boths sides of the transaction it's prudent to >>>>>>>>reference where the other side is. To avoid duplicating the >>>>>>>>write-side comment 3 times extract a little store_vblank() helper. >>>>>>>>In that helper also assert that we do indeed hold >>>>>>>>dev->vblank_time_lock, since in some cases the lock is acquired a >>>>>>>>few functions up in the callchain. >>>>>>>> >>>>>>>> Spotted while reviewing a patch from Chris Wilson to add a fastpath to >>>>>>>> the vblank_wait ioctl. >>>>>>>> >>>>>>>> Cc: Chris Wilson >>>>>>>> Cc: Mario Kleiner >>>>>>>> Cc: Ville Syrjälä >>>>>>>> Cc: Michel Dänzer >>>>>>>> Signed-off-by: Daniel Vetter >>>>>>>> --- >>>>>>>> drivers/gpu/drm/drm_irq.c | 92 >>>>>>>> --- >>>>>>>> include/drm/drmP.h| 8 +++-- >>>>>>>> 2 files changed, 54 insertions(+), 46 deletions(-) >>>>>>>> >>>>>>>> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c >>>>>>>> index c8a34476570a..23bfbc61a494 100644 >>>>>>>> --- a/drivers/gpu/drm/drm_irq.c >>>>>>>> +++ b/drivers/gpu/drm/drm_irq.c >>>>>>>> @@ -74,6 +74,33 @@ module_param_named(vblankoffdelay, >>>>>>>> drm_vblank_offdelay, int, 0600); >>>>>>>> module_param_named(timestamp_precision_usec, >>>>>>>> drm_timestamp_precision, int, 0600); >>>>>>>> module_param_named(timestamp_monotonic, drm_timestamp_monotonic, >>>>>>>> int, 0600); >>>>>>>> >>>>>>>> +static void store_vblank(struct drm_device *dev, int crtc, >>>>>>>> + unsigned vblank_count_inc, >>>>>>>> + struct timeval *t_vblank) >>>>>>>> +{ >>>>>>>> +struct drm_vblank_crtc *vblank = &dev->vblank[crtc]; >>>>>>>> +u32 tslot; >>>>>>>> + >>>>>>>> +assert_spin_locked(&dev->vblank_time_lock); >>>>>>>> + >>>>>>>> +if (t_vblank) { >>>>>>>> +tslot = vblank->count + vblank_count_inc; >>>>>>>> +vblanktimestamp(dev, crtc, tslot) = *t_vblank; >>>>>>>> +} >>>>>>>> + >>>>>>>> +/* >>>>>>>> + * vblank timestamp updates are protected on the write side with >>
[Intel-gfx] [PATCH] drm/vblank: Fixup and document timestamp update/read barriers
On 05/05/2015 11:57 AM, Peter Hurley wrote: > On 05/05/2015 11:42 AM, Daniel Vetter wrote: >> I'm also somewhat confused about how you to a line across both cpus for >> barriers because barriers only have cpu-local effects (which is why we >> always need a barrier on both ends of a transaction). I'm sorry if my barrier notation confuses you; I find that it clearly identifies matching pairs. Also, there is a distinction between "can be visible" and "must be visible"; the loads and stores themselves are not cpu-local. Regards, Peter Hurley
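The pairing requirement can be illustrated with the usual two-CPU flag-and-payload example: each barrier only orders its own CPU's accesses, but the paired combination is what turns "the payload can be visible" into "the payload must be visible". Variable names below are illustrative only, not drm code:

#include <linux/kernel.h>
#include <linux/compiler.h>
#include <asm/barrier.h>

static int payload;
static int ready;

/* CPU0: publish the payload, then the flag. */
static void writer(void)
{
        payload = 42;
        smp_wmb();              /* pairs with the smp_rmb() in reader() */
        WRITE_ONCE(ready, 1);
}

/* CPU1: only if it sees the flag is the payload guaranteed visible. */
static void reader(void)
{
        if (READ_ONCE(ready)) {
                smp_rmb();      /* paired barrier on the read side */
                pr_info("payload=%d\n", payload);
        }
}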
[Intel-gfx] [PATCH] drm/vblank: Fixup and document timestamp update/read barriers
On 05/06/2015 04:56 AM, Daniel Vetter wrote: > On Tue, May 05, 2015 at 11:57:42AM -0400, Peter Hurley wrote: >> On 05/05/2015 11:42 AM, Daniel Vetter wrote: >>> On Tue, May 05, 2015 at 10:36:24AM -0400, Peter Hurley wrote: >>>> On 05/04/2015 12:52 AM, Mario Kleiner wrote: >>>>> On 04/16/2015 03:03 PM, Daniel Vetter wrote: >>>>>> On Thu, Apr 16, 2015 at 08:30:55AM -0400, Peter Hurley wrote: >>>>>>> On 04/15/2015 01:31 PM, Daniel Vetter wrote: >>>>>>>> On Wed, Apr 15, 2015 at 09:00:04AM -0400, Peter Hurley wrote: >>>>>>>>> Hi Daniel, >>>>>>>>> >>>>>>>>> On 04/15/2015 03:17 AM, Daniel Vetter wrote: >>>>>>>>>> This was a bit too much cargo-culted, so lets make it solid: >>>>>>>>>> - vblank->count doesn't need to be an atomic, writes are always done >>>>>>>>>>under the protection of dev->vblank_time_lock. Switch to an >>>>>>>>>> unsigned >>>>>>>>>>long instead and update comments. Note that atomic_read is just a >>>>>>>>>>normal read of a volatile variable, so no need to audit all the >>>>>>>>>>read-side access specifically. >>>>>>>>>> >>>>>>>>>> - The barriers for the vblank counter seqlock weren't complete: The >>>>>>>>>>read-side was missing the first barrier between the counter read >>>>>>>>>> and >>>>>>>>>>the timestamp read, it only had a barrier between the ts and the >>>>>>>>>>counter read. We need both. >>>>>>>>>> >>>>>>>>>> - Barriers weren't properly documented. Since barriers only work if >>>>>>>>>>you have them on boths sides of the transaction it's prudent to >>>>>>>>>>reference where the other side is. To avoid duplicating the >>>>>>>>>>write-side comment 3 times extract a little store_vblank() helper. >>>>>>>>>>In that helper also assert that we do indeed hold >>>>>>>>>>dev->vblank_time_lock, since in some cases the lock is acquired a >>>>>>>>>>few functions up in the callchain. >>>>>>>>>> >>>>>>>>>> Spotted while reviewing a patch from Chris Wilson to add a fastpath >>>>>>>>>> to >>>>>>>>>> the vblank_wait ioctl. >>>>>>>>>> >>>>>>>>>> Cc: Chris Wilson >>>>>>>>>> Cc: Mario Kleiner >>>>>>>>>> Cc: Ville Syrjälä >>>>>>>>>> Cc: Michel Dänzer >>>>>>>>>> Signed-off-by: Daniel Vetter >>>>>>>>>> --- >>>>>>>>>> drivers/gpu/drm/drm_irq.c | 92 >>>>>>>>>> --- >>>>>>>>>> include/drm/drmP.h| 8 +++-- >>>>>>>>>> 2 files changed, 54 insertions(+), 46 deletions(-) >>>>>>>>>> >>>>>>>>>> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c >>>>>>>>>> index c8a34476570a..23bfbc61a494 100644 >>>>>>>>>> --- a/drivers/gpu/drm/drm_irq.c >>>>>>>>>> +++ b/drivers/gpu/drm/drm_irq.c >>>>>>>>>> @@ -74,6 +74,33 @@ module_param_named(vblankoffdelay, >>>>>>>>>> drm_vblank_offdelay, int, 0600); >>>>>>>>>> module_param_named(timestamp_precision_usec, >>>>>>>>>> drm_timestamp_precision, int, 0600); >>>>>>>>>> module_param_named(timestamp_monotonic, drm_timestamp_monotonic, >>>>>>>>>> int, 0600); >>>>>>>>>> >>>>>>>>>> +static void store_vblank(struct drm_device *dev, int crtc, >>>>>>>>>> + unsigned vblank_count_inc, >>>>>>>>>> + struct timeval *t_vblank) >>>>>>>>>> +{ >>>>>>>>>> +stru
[PATCH] drm/nouveau/core: deinline nv_mask()
On 05/07/2015 04:49 AM, Denys Vlasenko wrote: > Function compiles to 89 bytes of machine code. > 466 callsites with this .config: > http://busybox.net/~vda/kernel_config > Size reduction: Much of the cruft is related to calling iowriteX. Ben, Isn't subdev io always mmio? (iow, never to the 64k i/o space) >text data bss dec hex filename > 82432426 22255384 20627456 125315266 77828c2 vmlinux.before > 82426986 22255416 20627456 125309858 77813a2 vmlinux > > Signed-off-by: Denys Vlasenko > CC: Stefan Huehner > CC: Ben Skeggs > CC: David Airlie > CC: dri-devel at lists.freedesktop.org > CC: linux-kernel at vger.kernel.org > --- > drivers/gpu/drm/nouveau/include/nvkm/core/subdev.h | 9 ++--- > drivers/gpu/drm/nouveau/nvkm/core/subdev.c | 8 > 2 files changed, 10 insertions(+), 7 deletions(-) > > diff --git a/drivers/gpu/drm/nouveau/include/nvkm/core/subdev.h > b/drivers/gpu/drm/nouveau/include/nvkm/core/subdev.h > index 6fdc391..261b7ff 100644 > --- a/drivers/gpu/drm/nouveau/include/nvkm/core/subdev.h > +++ b/drivers/gpu/drm/nouveau/include/nvkm/core/subdev.h > @@ -109,11 +109,6 @@ nv_wr32(void *obj, u32 addr, u32 data) > iowrite32_native(data, subdev->mmio + addr); > } > > -static inline u32 > -nv_mask(void *obj, u32 addr, u32 mask, u32 data) > -{ > - u32 temp = nv_rd32(obj, addr); > - nv_wr32(obj, addr, (temp & ~mask) | data); > - return temp; > -} > +u32 > +nv_mask(void *obj, u32 addr, u32 mask, u32 data); > #endif > diff --git a/drivers/gpu/drm/nouveau/nvkm/core/subdev.c > b/drivers/gpu/drm/nouveau/nvkm/core/subdev.c > index c5fb3a79..88331ea 100644 > --- a/drivers/gpu/drm/nouveau/nvkm/core/subdev.c > +++ b/drivers/gpu/drm/nouveau/nvkm/core/subdev.c > @@ -25,6 +25,14 @@ > #include > #include > > +u32 > +nv_mask(void *obj, u32 addr, u32 mask, u32 data) > +{ > + u32 temp = nv_rd32(obj, addr); > + nv_wr32(obj, addr, (temp & ~mask) | data); > + return temp; > +} > + > struct nvkm_subdev * > nvkm_subdev(void *obj, int idx) > { >
[PATCH] drm/nouveau/gem: tolerate a buffer specified multiple times
On 07/30/2015 10:12 AM, Ilia Mirkin wrote: > Is this happening with libdrm 2.4.60? If so, that's a known > (user-side) issue and should be fixed by using any version but that > one. What's the freedesktop bugzilla # for reference? Regards, Peter Hurley > On Thu, Jul 30, 2015 at 6:28 AM, Bryan O'Donoghue > wrote: >> Ubuntu is shipping Chrome Version 44.0.2403.125 (64-bit). With this version >> of the browser and current tip-of-tree 86ea07ca846a I get the following >> error message followed by a lock-up of X. >> >> nouveau E[chrome[2737]] multiple instances of buffer 33 on validation list >> nouveau E[chrome[2737]] validate_init >> nouveau E[chrome[2737]] validate: -22 >> nouveau E[chrome[2737]] multiple instances of buffer 18 on validation list >> nouveau E[chrome[2737]] validate_init >> nouveau E[chrome[2737]] validate: -22 >> nouveau E[ PFIFO][:01:00.0] PFIFO: read fault at >> 0x0003e21000 [PAGE_NOT_PRESENT] from (unknown enum >> 0x)/GPC0/(unknown enum 0x000f) on channel 0x007f80c000 >> [unknown] >> >> This patch suggests a fix for this with the kernel simply tolerating an >> application such as chrome requesting the same buffer more than once. >> >> With the version of chrome given above, you can elicit this behaviour by >> clicking on the bookmarks drop down. This will open another window on-top >> of the current window. Minus the fix included here, this will lead to hard >> lockup of all windows on the desktop. >> >> Chrome Version 44.0.2403.125 (64-bit) >> Linux 4.2.0-rc4+ 86ea07ca846a >> >> People are suggesting running chrome with -disable-gpu however it is >> possible to run Chrome in it's default mode, so long as we tolerate the >> above behaviour. >> >> http://tinyurl.com/orvbzf3 >> >> Signed-off-by: Bryan O'Donoghue >> --- >> drivers/gpu/drm/nouveau/nouveau_gem.c | 4 +--- >> 1 file changed, 1 insertion(+), 3 deletions(-) >> >> diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c >> b/drivers/gpu/drm/nouveau/nouveau_gem.c >> index af1ee51..a9694faad 100644 >> --- a/drivers/gpu/drm/nouveau/nouveau_gem.c >> +++ b/drivers/gpu/drm/nouveau/nouveau_gem.c >> @@ -401,9 +401,7 @@ retry: >> if (nvbo->reserved_by && nvbo->reserved_by == file_priv) { >> NV_PRINTK(error, cli, "multiple instances of buffer >> %d on " >> "validation list\n", b->handle); >> - drm_gem_object_unreference_unlocked(gem); >> - ret = -EINVAL; >> - break; >> + continue; >> } >> >> ret = ttm_bo_reserve(&nvbo->bo, true, false, true, >> &op->ticket); >> -- >> 1.9.1
[PATCH] drm/nouveau/gem: tolerate a buffer specified multiple times
[ +cc Debian maintainer ] On 07/30/2015 11:26 AM, Emil Velikov wrote: > On 30 July 2015 at 16:02, Ilia Mirkin wrote: >> On Thu, Jul 30, 2015 at 10:56 AM, Bryan O'Donoghue >> wrote: >>> On 30/07/15 15:52, Bryan O'Donoghue wrote: >>>> >>>> On 30/07/15 15:49, Peter Hurley wrote: >>>>> >>>>> On 07/30/2015 10:12 AM, Ilia Mirkin wrote: >>>>>> >>>>>> Is this happening with libdrm 2.4.60? If so, that's a known >>>>>> (user-side) issue and should be fixed by using any version but that >>>>>> one. >>>>> >>>>> >>>>> What's the freedesktop bugzilla # for reference? >>>>> >>>>> Regards, >>>>> Peter Hurley >>>> >>>> >>>> I believe it's this one >>>> >>>> https://bugs.freedesktop.org/show_bug.cgi?id=89842#c19 >>>> >>> >>> Not really a world of choice on ubuntu to fix it though... >>> >>> deckard at aineko:~/Development/projectara$ apt-show-versions libdrm2 >>> libdrm2:amd64/trusty-updates 2.4.60-2~ubuntu14.04.1 uptodate >>> libdrm2:i386/trusty-updates 2.4.60-2~ubuntu14.04.1 uptodate >>> >>> :( >> >> That's unfortunate. I know next to nothing about debian/ubuntu or how >> they do versions or how to even build packages for them. But they're >> big distros, presumably they have support teams of some sort, perhaps >> they can help you. >> >> Assuming that switching away does resolve the issue for you, perhaps >> you can also recommend that they avoid shipping that version, or >> include this nouveau fix in it: >> >> http://cgit.freedesktop.org/mesa/drm/commit/?id=812e8fe6ce46d733c30207ee26c788c61f546294 >> > Fwiw debian has been tracking this as #789759, and they are shipping > 2.4.62 which includes the fix. Unfortunately the LTS version of Ubuntu (trusty) was updated to 2.4.60 several days ago without this fix. I repackaged libdrm 2.4.60 with only the bug fix above and confirm the patch above fixes the observed behavior in freedesktop bug# 89842/ debian bug# 789759. I pushed the repackage to Launchpad PPA @ ppa:phurley/libdrm Hopefully the Debian maintainer grabs this fix and updates the official distribution version soon. Regards, Peter Hurley
[PATCH] vt_buffer: drop console buffer copying optimisations
On 01/28/2015 11:11 PM, Dave Airlie wrote: > These two copy to/from VGA memory, however on the Silicon > Motion SMI750 VGA card on a 64-bit system cause console corruption. > > This is due to the hw being buggy and not handling a 64-bit transaction > correctly. > > We could try and create a 32-bit version of these routines, > but I'm not sure the optimisation is worth much today. > > Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1132826 Restricted link. > Tested-by: Huawei engineering. > Signed-off-by: Dave Airlie > --- > > Linus, this came up a while back I finally got some confirmation > that it fixes those servers. > > include/linux/vt_buffer.h | 4 > 1 file changed, 4 deletions(-) > > diff --git a/include/linux/vt_buffer.h b/include/linux/vt_buffer.h > index 057db7d..f38c10b 100644 > --- a/include/linux/vt_buffer.h > +++ b/include/linux/vt_buffer.h > @@ -21,10 +21,6 @@ > #ifndef VT_BUF_HAVE_RW > #define scr_writew(val, addr) (*(addr) = (val)) > #define scr_readw(addr) (*(addr)) > -#define scr_memcpyw(d, s, c) memcpy(d, s, c) > -#define scr_memmovew(d, s, c) memmove(d, s, c) > -#define VT_BUF_HAVE_MEMCPYW > -#define VT_BUF_HAVE_MEMMOVEW > #endif > > #ifndef VT_BUF_HAVE_MEMSETW > --
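With VT_BUF_HAVE_MEMCPYW/MEMMOVEW no longer defined for the generic case, console copies go through vt_buffer.h's fallback helpers, whose relevant property here is that every access is a 16-bit scr_writew()/scr_readw() the SMI750 can cope with. Roughly this shape (a sketch of the fallback, not a verbatim copy of the header, which varies by kernel version):

/* Copy the screen buffer one u16 cell at a time; 'count' is in bytes,
 * as with memcpy(), so halve it to get the number of cells. */
static inline void scr_memcpyw_sketch(u16 *d, const u16 *s, unsigned int count)
{
        count /= 2;
        while (count--)
                scr_writew(scr_readw(s++), d++);
}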
[PATCH] drm/vblank: Fixup and document timestamp update/read barriers
Hi Daniel, On 04/15/2015 03:17 AM, Daniel Vetter wrote: > This was a bit too much cargo-culted, so lets make it solid: > - vblank->count doesn't need to be an atomic, writes are always done > under the protection of dev->vblank_time_lock. Switch to an unsigned > long instead and update comments. Note that atomic_read is just a > normal read of a volatile variable, so no need to audit all the > read-side access specifically. > > - The barriers for the vblank counter seqlock weren't complete: The > read-side was missing the first barrier between the counter read and > the timestamp read, it only had a barrier between the ts and the > counter read. We need both. > > - Barriers weren't properly documented. Since barriers only work if > you have them on boths sides of the transaction it's prudent to > reference where the other side is. To avoid duplicating the > write-side comment 3 times extract a little store_vblank() helper. > In that helper also assert that we do indeed hold > dev->vblank_time_lock, since in some cases the lock is acquired a > few functions up in the callchain. > > Spotted while reviewing a patch from Chris Wilson to add a fastpath to > the vblank_wait ioctl. > > Cc: Chris Wilson > Cc: Mario Kleiner > Cc: Ville Syrjälä > Cc: Michel Dänzer > Signed-off-by: Daniel Vetter > --- > drivers/gpu/drm/drm_irq.c | 92 > --- > include/drm/drmP.h| 8 +++-- > 2 files changed, 54 insertions(+), 46 deletions(-) > > diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c > index c8a34476570a..23bfbc61a494 100644 > --- a/drivers/gpu/drm/drm_irq.c > +++ b/drivers/gpu/drm/drm_irq.c > @@ -74,6 +74,33 @@ module_param_named(vblankoffdelay, drm_vblank_offdelay, > int, 0600); > module_param_named(timestamp_precision_usec, drm_timestamp_precision, int, > 0600); > module_param_named(timestamp_monotonic, drm_timestamp_monotonic, int, 0600); > > +static void store_vblank(struct drm_device *dev, int crtc, > + unsigned vblank_count_inc, > + struct timeval *t_vblank) > +{ > + struct drm_vblank_crtc *vblank = &dev->vblank[crtc]; > + u32 tslot; > + > + assert_spin_locked(&dev->vblank_time_lock); > + > + if (t_vblank) { > + tslot = vblank->count + vblank_count_inc; > + vblanktimestamp(dev, crtc, tslot) = *t_vblank; > + } > + > + /* > + * vblank timestamp updates are protected on the write side with > + * vblank_time_lock, but on the read side done locklessly using a > + * sequence-lock on the vblank counter. Ensure correct ordering using > + * memory barrriers. We need the barrier both before and also after the > + * counter update to synchronize with the next timestamp write. > + * The read-side barriers for this are in drm_vblank_count_and_time. > + */ > + smp_wmb(); > + vblank->count += vblank_count_inc; > + smp_wmb(); The comment and the code are each self-contradictory. If vblank->count writes are always protected by vblank_time_lock (something I did not verify but that the comment above asserts), then the trailing write barrier is not required (and the assertion that it is in the comment is incorrect). A spin unlock operation is always a write barrier. 
Regards, Peter Hurley > +} > + > /** > * drm_update_vblank_count - update the master vblank counter > * @dev: DRM device > @@ -93,7 +120,7 @@ module_param_named(timestamp_monotonic, > drm_timestamp_monotonic, int, 0600); > static void drm_update_vblank_count(struct drm_device *dev, int crtc) > { > struct drm_vblank_crtc *vblank = &dev->vblank[crtc]; > - u32 cur_vblank, diff, tslot; > + u32 cur_vblank, diff; > bool rc; > struct timeval t_vblank; > > @@ -129,18 +156,12 @@ static void drm_update_vblank_count(struct drm_device > *dev, int crtc) > if (diff == 0) > return; > > - /* Reinitialize corresponding vblank timestamp if high-precision query > - * available. Skip this step if query unsupported or failed. Will > - * reinitialize delayed at next vblank interrupt in that case. > + /* > + * Only reinitialize corresponding vblank timestamp if high-precision > query > + * available and didn't fail. Will reinitialize delayed at next vblank > + * interrupt in that case. >*/ > - if (rc) { > - tslot = atomic_read(&vblank->count) + diff; > - vblanktimestamp(dev, crtc, tslot) = t_vblank; > - } > - > - s
[PATCH] drm/vblank: Fixup and document timestamp update/read barriers
On 04/15/2015 05:26 PM, Mario Kleiner wrote: > A couple of questions to educate me and one review comment. > > On 04/15/2015 07:34 PM, Daniel Vetter wrote: >> This was a bit too much cargo-culted, so lets make it solid: >> - vblank->count doesn't need to be an atomic, writes are always done >>under the protection of dev->vblank_time_lock. Switch to an unsigned >>long instead and update comments. Note that atomic_read is just a >>normal read of a volatile variable, so no need to audit all the >>read-side access specifically. >> >> - The barriers for the vblank counter seqlock weren't complete: The >>read-side was missing the first barrier between the counter read and >>the timestamp read, it only had a barrier between the ts and the >>counter read. We need both. >> >> - Barriers weren't properly documented. Since barriers only work if >>you have them on boths sides of the transaction it's prudent to >>reference where the other side is. To avoid duplicating the >>write-side comment 3 times extract a little store_vblank() helper. >>In that helper also assert that we do indeed hold >>dev->vblank_time_lock, since in some cases the lock is acquired a >>few functions up in the callchain. >> >> Spotted while reviewing a patch from Chris Wilson to add a fastpath to >> the vblank_wait ioctl. >> >> v2: Add comment to better explain how store_vblank works, suggested by >> Chris. >> >> v3: Peter noticed that as-is the 2nd smp_wmb is redundant with the >> implicit barrier in the spin_unlock. But that can only be proven by >> auditing all callers and my point in extracting this little helper was >> to localize all the locking into just one place. Hence I think that >> additional optimization is too risky. >> >> Cc: Chris Wilson >> Cc: Mario Kleiner >> Cc: Ville Syrjälä >> Cc: Michel Dänzer >> Cc: Peter Hurley >> Signed-off-by: Daniel Vetter >> --- >> drivers/gpu/drm/drm_irq.c | 95 >> +-- >> include/drm/drmP.h| 8 +++- >> 2 files changed, 57 insertions(+), 46 deletions(-) >> >> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c >> index c8a34476570a..8694b77d0002 100644 >> --- a/drivers/gpu/drm/drm_irq.c >> +++ b/drivers/gpu/drm/drm_irq.c >> @@ -74,6 +74,36 @@ module_param_named(vblankoffdelay, drm_vblank_offdelay, >> int, 0600); >> module_param_named(timestamp_precision_usec, drm_timestamp_precision, int, >> 0600); >> module_param_named(timestamp_monotonic, drm_timestamp_monotonic, int, >> 0600); >> >> +static void store_vblank(struct drm_device *dev, int crtc, >> + unsigned vblank_count_inc, >> + struct timeval *t_vblank) >> +{ >> +struct drm_vblank_crtc *vblank = &dev->vblank[crtc]; >> +u32 tslot; >> + >> +assert_spin_locked(&dev->vblank_time_lock); >> + >> +if (t_vblank) { >> +/* All writers hold the spinlock, but readers are serialized by >> + * the latching of vblank->count below. >> + */ >> +tslot = vblank->count + vblank_count_inc; >> +vblanktimestamp(dev, crtc, tslot) = *t_vblank; >> +} >> + >> +/* >> + * vblank timestamp updates are protected on the write side with >> + * vblank_time_lock, but on the read side done locklessly using a >> + * sequence-lock on the vblank counter. Ensure correct ordering using >> + * memory barrriers. We need the barrier both before and also after the >> + * counter update to synchronize with the next timestamp write. >> + * The read-side barriers for this are in drm_vblank_count_and_time. 
>> + */ >> +smp_wmb(); >> +vblank->count += vblank_count_inc; >> +smp_wmb(); >> +} >> + >> /** >>* drm_update_vblank_count - update the master vblank counter >>* @dev: DRM device >> @@ -93,7 +123,7 @@ module_param_named(timestamp_monotonic, >> drm_timestamp_monotonic, int, 0600); >> static void drm_update_vblank_count(struct drm_device *dev, int crtc) >> { >> struct drm_vblank_crtc *vblank = &dev->vblank[crtc]; >> -u32 cur_vblank, diff, tslot; >> +u32 cur_vblank, diff; >> bool rc; >> struct timeval t_vblank; >> >> @@ -129,18 +159,12 @@ static void drm_update_vblank_count(struct drm_device >> *dev, int crtc) >&g
[PATCH] drm/vblank: Fixup and document timestamp update/read barriers
On 04/16/2015 02:39 AM, Mario Kleiner wrote: > On 04/16/2015 03:29 AM, Peter Hurley wrote: >> On 04/15/2015 05:26 PM, Mario Kleiner wrote: >>> A couple of questions to educate me and one review comment. >>> >>> On 04/15/2015 07:34 PM, Daniel Vetter wrote: >>>> This was a bit too much cargo-culted, so lets make it solid: >>>> - vblank->count doesn't need to be an atomic, writes are always done >>>> under the protection of dev->vblank_time_lock. Switch to an unsigned >>>> long instead and update comments. Note that atomic_read is just a >>>> normal read of a volatile variable, so no need to audit all the >>>> read-side access specifically. >>>> >>>> - The barriers for the vblank counter seqlock weren't complete: The >>>> read-side was missing the first barrier between the counter read and >>>> the timestamp read, it only had a barrier between the ts and the >>>> counter read. We need both. >>>> >>>> - Barriers weren't properly documented. Since barriers only work if >>>> you have them on boths sides of the transaction it's prudent to >>>> reference where the other side is. To avoid duplicating the >>>> write-side comment 3 times extract a little store_vblank() helper. >>>> In that helper also assert that we do indeed hold >>>> dev->vblank_time_lock, since in some cases the lock is acquired a >>>> few functions up in the callchain. >>>> >>>> Spotted while reviewing a patch from Chris Wilson to add a fastpath to >>>> the vblank_wait ioctl. >>>> >>>> v2: Add comment to better explain how store_vblank works, suggested by >>>> Chris. >>>> >>>> v3: Peter noticed that as-is the 2nd smp_wmb is redundant with the >>>> implicit barrier in the spin_unlock. But that can only be proven by >>>> auditing all callers and my point in extracting this little helper was >>>> to localize all the locking into just one place. Hence I think that >>>> additional optimization is too risky. >>>> >>>> Cc: Chris Wilson >>>> Cc: Mario Kleiner >>>> Cc: Ville Syrjälä >>>> Cc: Michel Dänzer >>>> Cc: Peter Hurley >>>> Signed-off-by: Daniel Vetter >>>> --- >>>>drivers/gpu/drm/drm_irq.c | 95 >>>> +-- >>>>include/drm/drmP.h| 8 +++- >>>>2 files changed, 57 insertions(+), 46 deletions(-) >>>> >>>> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c >>>> index c8a34476570a..8694b77d0002 100644 >>>> --- a/drivers/gpu/drm/drm_irq.c >>>> +++ b/drivers/gpu/drm/drm_irq.c >>>> @@ -74,6 +74,36 @@ module_param_named(vblankoffdelay, drm_vblank_offdelay, >>>> int, 0600); >>>>module_param_named(timestamp_precision_usec, drm_timestamp_precision, >>>> int, 0600); >>>>module_param_named(timestamp_monotonic, drm_timestamp_monotonic, int, >>>> 0600); >>>> >>>> +static void store_vblank(struct drm_device *dev, int crtc, >>>> + unsigned vblank_count_inc, >>>> + struct timeval *t_vblank) >>>> +{ >>>> +struct drm_vblank_crtc *vblank = &dev->vblank[crtc]; >>>> +u32 tslot; >>>> + >>>> +assert_spin_locked(&dev->vblank_time_lock); >>>> + >>>> +if (t_vblank) { >>>> +/* All writers hold the spinlock, but readers are serialized by >>>> + * the latching of vblank->count below. >>>> + */ >>>> +tslot = vblank->count + vblank_count_inc; >>>> +vblanktimestamp(dev, crtc, tslot) = *t_vblank; >>>> +} >>>> + >>>> +/* >>>> + * vblank timestamp updates are protected on the write side with >>>> + * vblank_time_lock, but on the read side done locklessly using a >>>> + * sequence-lock on the vblank counter. Ensure correct ordering using >>>> + * memory barrriers. 
We need the barrier both before and also after >>>> the >>>> + * counter update to synchronize with the next timestamp write. >>>> + * The read-side barriers for this are in drm_vblank_count_and_time. >>>> + */ >>>>
[Intel-gfx] [PATCH] drm/vblank: Fixup and document timestamp update/read barriers
On 04/15/2015 01:31 PM, Daniel Vetter wrote: > On Wed, Apr 15, 2015 at 09:00:04AM -0400, Peter Hurley wrote: >> Hi Daniel, >> >> On 04/15/2015 03:17 AM, Daniel Vetter wrote: >>> This was a bit too much cargo-culted, so lets make it solid: >>> - vblank->count doesn't need to be an atomic, writes are always done >>> under the protection of dev->vblank_time_lock. Switch to an unsigned >>> long instead and update comments. Note that atomic_read is just a >>> normal read of a volatile variable, so no need to audit all the >>> read-side access specifically. >>> >>> - The barriers for the vblank counter seqlock weren't complete: The >>> read-side was missing the first barrier between the counter read and >>> the timestamp read, it only had a barrier between the ts and the >>> counter read. We need both. >>> >>> - Barriers weren't properly documented. Since barriers only work if >>> you have them on boths sides of the transaction it's prudent to >>> reference where the other side is. To avoid duplicating the >>> write-side comment 3 times extract a little store_vblank() helper. >>> In that helper also assert that we do indeed hold >>> dev->vblank_time_lock, since in some cases the lock is acquired a >>> few functions up in the callchain. >>> >>> Spotted while reviewing a patch from Chris Wilson to add a fastpath to >>> the vblank_wait ioctl. >>> >>> Cc: Chris Wilson >>> Cc: Mario Kleiner >>> Cc: Ville Syrjälä >>> Cc: Michel Dänzer >>> Signed-off-by: Daniel Vetter >>> --- >>> drivers/gpu/drm/drm_irq.c | 92 >>> --- >>> include/drm/drmP.h| 8 +++-- >>> 2 files changed, 54 insertions(+), 46 deletions(-) >>> >>> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c >>> index c8a34476570a..23bfbc61a494 100644 >>> --- a/drivers/gpu/drm/drm_irq.c >>> +++ b/drivers/gpu/drm/drm_irq.c >>> @@ -74,6 +74,33 @@ module_param_named(vblankoffdelay, drm_vblank_offdelay, >>> int, 0600); >>> module_param_named(timestamp_precision_usec, drm_timestamp_precision, int, >>> 0600); >>> module_param_named(timestamp_monotonic, drm_timestamp_monotonic, int, >>> 0600); >>> >>> +static void store_vblank(struct drm_device *dev, int crtc, >>> +unsigned vblank_count_inc, >>> +struct timeval *t_vblank) >>> +{ >>> + struct drm_vblank_crtc *vblank = &dev->vblank[crtc]; >>> + u32 tslot; >>> + >>> + assert_spin_locked(&dev->vblank_time_lock); >>> + >>> + if (t_vblank) { >>> + tslot = vblank->count + vblank_count_inc; >>> + vblanktimestamp(dev, crtc, tslot) = *t_vblank; >>> + } >>> + >>> + /* >>> +* vblank timestamp updates are protected on the write side with >>> +* vblank_time_lock, but on the read side done locklessly using a >>> +* sequence-lock on the vblank counter. Ensure correct ordering using >>> +* memory barrriers. We need the barrier both before and also after the >>> +* counter update to synchronize with the next timestamp write. >>> +* The read-side barriers for this are in drm_vblank_count_and_time. >>> +*/ >>> + smp_wmb(); >>> + vblank->count += vblank_count_inc; >>> + smp_wmb(); >> >> The comment and the code are each self-contradictory. >> >> If vblank->count writes are always protected by vblank_time_lock (something I >> did not verify but that the comment above asserts), then the trailing write >> barrier is not required (and the assertion that it is in the comment is >> incorrect). >> >> A spin unlock operation is always a write barrier. > > Hm yeah. Otoh to me that's bordering on "code too clever for my own good". > That the spinlock is held I can assure. 
That no one goes around and does > multiple vblank updates (because somehow that code raced with the hw > itself) I can't easily assure with a simple assert or something similar. > It's not the case right now, but that can changes. The algorithm would be broken if multiple updates for the same vblank count were allowed; that's why it checks to see if the vblank count has not advanced before storing a new timestamp. Otherwise, the read side would not be able to determine that the timestamp is valid by double-checking that the vblank count has not changed. And besides, even if the code looped without dropping the spinlock, the correct write order would still be observed because it would still be executing on the same cpu. My objection to the write memory barrier is not about optimization; it's about correct code. Regards, Peter Hurley
[PATCH] drm/vblank: Fixup and document timestamp update/read barriers
On 04/16/2015 02:39 AM, Mario Kleiner wrote: > On 04/16/2015 03:29 AM, Peter Hurley wrote: >> On 04/15/2015 05:26 PM, Mario Kleiner wrote: >> Because the time scales for these events don't require that level of >> resolution; consider how much code has to get executed between a >> hardware vblank irq triggering and the vblank counter being updated. >> >> Realistically, the only relevant requirement is that the timestamp >> match the counter. >> > > Yes that is the really important part. A msec delay would possibly matter for > some timing sensitive apps like mine - some more exotic displays run at 200 > Hz, and some apps need to synchronize to the vblank not strictly for > graphics. But i assume potential delays here are more on the order of a few > microseconds if some pending loads from the cache would get reordered for > overall efficiency? I'd be surprised if the delay were as much as 1 us. The latency to return to userspace significantly dwarfs any observable effects having missed the vblank count update by 1 instruction. Regards, Peter Hurley
bitfield structures
On 10/16/2014 10:14 AM, Alex Deucher wrote: > Are there any strong objections to these sorts of structures? You may want to blacklist certain compiler version/arch combinations, or get the affected arches to do it. gcc up to 4.7.1 on ia64 and ppc64 generates 64-bit wide RMW cycles on bitfields, regardless of the specified type or actual field width. The enlarged write overwrites adjacent fields. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52080 Regards, Peter Hurley
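A minimal illustration of the failure mode (the layout and field names are invented for the example): on the affected compilers, a store to the 8-bit field below can be emitted as a 64-bit load/modify/store covering the whole 8-byte struct, so a concurrent update to the adjacent non-bitfield member can be silently overwritten.

#include <linux/types.h>

struct hw_state {
        u32 mode  : 8;          /* updated by one CPU ... */
        u32 level : 8;
        u32 pad   : 16;
        u32 seqno;              /* ... while another CPU updates this */
};

static void cpu_a(struct hw_state *s)
{
        /* With gcc up to 4.7.1 on ia64/ppc64 this may compile to a 64-bit
         * read-modify-write that spans seqno as well. */
        s->mode = 3;
}

static void cpu_b(struct hw_state *s)
{
        s->seqno++;             /* this increment can be lost */
}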
[bisected][3.9.0-rc3] NULL ptr dereference from nv50_disp_intr()
On vanilla 3.9.0-rc3, I get this 100% repeatable oops after login when the user X session is coming up: BUG: unable to handle kernel NULL pointer dereference at 0001 IP: [<0001>] 0x0 PGD 0 Oops: 0010 [#1] PREEMPT SMP Modules linked in: ip6table_filter ip6_tables ebtable_nat ebtables .. CPU 3 Pid: 0, comm: swapper/3 Not tainted 3.9.0-rc3-xeon #rc3 Dell Inc. Precision WorkStation T5400 /0RW203 RIP: 0010:[<0001>] [<0001>] 0x0 RSP: 0018:8802afcc3d80 EFLAGS: 00010087 RAX: 88029f6e5808 RBX: 0001 RCX: RDX: 0096 RSI: 0001 RDI: 88029f6e5808 RBP: 8802afcc3dc8 R08: R09: 0004 R10: 002c R11: 88029e559a98 R12: 8802a376cb78 R13: 88029f6e57e0 R14: 88029f6e57f8 R15: 88029f6e5808 FS: () GS:8802afcc() knlGS: CS: 0010 DS: ES: CR0: 8005003b CR2: 0001 CR3: 00029fa67000 CR4: 07e0 DR0: DR1: DR2: DR3: DR6: 0ff0 DR7: 0400 Process swapper/3 (pid: 0, threadinfo 8802a355e000, task 8802a3535c40) Stack: a0159d8a 0082 88029f6e5820 0001 88029f71aa00 0400 0400 8802afcc3e38 a01843b5 8802afcc3df8 Call Trace: [] ? nouveau_event_trigger+0xaa/0xe0 [nouveau] [] nv50_disp_intr+0xc5/0x200 [nouveau] [] ? _raw_spin_unlock_irqrestore+0x2c/0x50 [] ? notifier_call_chain+0x4d/0x70 [] nouveau_mc_intr+0xb5/0x110 [nouveau] [] nouveau_irq_handler+0x6f/0x80 [nouveau] [] handle_irq_event_percpu+0x75/0x260 [] handle_irq_event+0x48/0x70 [] handle_fasteoi_irq+0x5a/0x100 [] handle_irq+0x22/0x40 [] do_IRQ+0x5a/0xd0 [] common_interrupt+0x6d/0x6d [] ? native_safe_halt+0x6/0x10 [] default_idle+0x3d/0x170 [] cpu_idle+0x116/0x130 [] start_secondary+0x251/0x258 Code: Bad RIP value. RIP [<0001>] 0x0 RSP CR2: 0001 ---[ end trace 907323cb8ce6f301 ]--- git bisect from 3.8.0 (good) to 3.9.0-rc3 (bad) blames (bisect log attached): 1d7c71a3e2f77336df536855b0efd2dc5bdeb41b is the first bad commit commit 1d7c71a3e2f77336df536855b0efd2dc5bdeb41b Author: Ben Skeggs Date: Thu Jan 31 09:23:34 2013 +1000 drm/nouveau/disp: port vblank handling to event interface This removes the nastiness with the interactions between display and software engines when handling vblank semaphore release interrupts. Now, all the semantics are handled in one place (sw) \o/. Signed-off-by: Ben Skeggs :04 04 fbd44f8566271415fd2775ab4b6346efef7e82fe a0730be0f35feaa1476b1447b1d65c4b3b3c0686 M drivers On this hardware: nouveau [ DEVICE][:02:00.0] BOOT0 : 0x084e00a2 nouveau [ DEVICE][:02:00.0] Chipset: G84 (NV84) nouveau [ DEVICE][:02:00.0] Family : NV50 nouveau [ VBIOS][:02:00.0] checking PRAMIN for image... nouveau [ VBIOS][:02:00.0] ... 
appears to be valid nouveau [ VBIOS][:02:00.0] using image from PRAMIN nouveau [ VBIOS][:02:00.0] BIT signature found nouveau [ VBIOS][:02:00.0] version 60.84.63.00.11 nouveau [ PFB][:02:00.0] RAM type: DDR2 nouveau [ PFB][:02:00.0] RAM size: 256 MiB nouveau [ PFB][:02:00.0]ZCOMP: 1892 tags nouveau [ DRM] VRAM: 256 MiB nouveau [ DRM] GART: 512 MiB nouveau [ DRM] BIT BIOS found nouveau [ DRM] Bios version 60.84.63.00 nouveau [ DRM] TMDS table version 2.0 nouveau [ DRM] DCB version 4.0 nouveau [ DRM] DCB outp 00: 02000300 0028 nouveau [ DRM] DCB outp 01: 01000302 0030 nouveau [ DRM] DCB outp 02: 04011310 0028 nouveau [ DRM] DCB outp 03: 02011312 0030 nouveau [ DRM] DCB conn 00: 1030 nouveau [ DRM] DCB conn 01: 2130 nouveau [ DRM] 2 available performance level(s) nouveau [ DRM] 0: core 208MHz shader 416MHz memory 100MHz voltage 1200mV fanspeed 100% nouveau [ DRM] 1: core 460MHz shader 920MHz memory 400MHz voltage 1200mV fanspeed 100% nouveau [ DRM] c: core 459MHz shader 918MHz memory 399MHz voltage 1200mV nouveau [ DRM] MM: using CRYPT for buffer copies nouveau [ DRM] allocated 1680x1050 fb: 0x6, bo 88029ef50400 fbcon: nouveaufb (fb0) is primary device nouveau :02:00.0: fb0: nouveaufb frame buffer device nouveau :02:00.0: registered panic notifier [drm] Initialized nouveau 1.1.0 20120801 for :02:00.0 on minor 0 02:00.0 VGA compatible controller: NVIDIA Corporation G84 [Quadro FX 570] (rev a1) (prog-if 00 [VGA controller]) Subsystem: NVIDIA Corporation Device 0474 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- Capabilitie
[Nouveau] nouveau lockdep splat
[ adding Ben Skeggs and Dave Airlie ] On Tue, 2013-03-19 at 21:24 +0100, Borislav Petkov wrote: > On Tue, Mar 05, 2013 at 05:30:52PM +0100, Lucas Stach wrote: > > Dropping Tegra ML, it's not the place where Nouveau mails should go. > > Adding Nouveau ML and Maarten, who probably knows Lockdep+Nouveau best. > > Ok, > > with the hope of having the right people on CC now (finally, thanks > Lucas :-)), here's the same splat on -rc3. Someone better take a look > soonish, please: Also happens in next (on nv50 hardware). > [0.541078] [drm] No driver support for vblank timestamp query. > [0.541272] nouveau [ DRM] 3 available performance level(s) > [0.541276] nouveau [ DRM] 0: core 135MHz shader 270MHz memory 135MHz > voltage 900mV > [0.541280] nouveau [ DRM] 1: core 405MHz shader 810MHz memory 405MHz > voltage 900mV > [0.541284] nouveau [ DRM] 3: core 520MHz shader 1230MHz memory > 790MHz voltage 900mV > [0.541287] nouveau [ DRM] c: core 405MHz shader 810MHz memory 405MHz > voltage 900mV > [0.559846] nouveau [ DRM] MM: using COPY for buffer copies > [0.625371] nouveau [ DRM] allocated 1920x1080 fb: 0x7, bo > 88043b54f000 > [0.625441] fbcon: nouveaufb (fb0) is primary device > [0.62] > [0.625556] = > [0.625556] [ INFO: possible recursive locking detected ] > [0.625557] 3.9.0-rc3+ #25 Not tainted > [0.625557] - > [0.625558] swapper/0/1 is trying to acquire lock: > [0.625562] (&dmac->lock){+.+...}, at: [] > evo_wait+0x43/0xf0 > [0.625562] > [0.625562] but task is already holding lock: > [0.625564] (&dmac->lock){+.+...}, at: [] > evo_wait+0x43/0xf0 > [0.625565] > [0.625565] other info that might help us debug this: > [0.625565] Possible unsafe locking scenario: > [0.625565] > [0.625565]CPU0 > [0.625565] > [0.625566] lock(&dmac->lock); > [0.625567] lock(&dmac->lock); > [0.625567] > [0.625567] *** DEADLOCK *** > [0.625567] > [0.625567] May be due to missing lock nesting notation > [0.625567] > [0.625568] 10 locks held by swapper/0/1: > [0.625570] #0: (&__lockdep_no_validate__){..}, at: > [] __driver_attach+0x5b/0xb0 > [0.625572] #1: (&__lockdep_no_validate__){..}, at: > [] __driver_attach+0x69/0xb0 > [0.625575] #2: (drm_global_mutex){+.+.+.}, at: [] > drm_get_pci_dev+0xc6/0x2d0 > [0.625578] #3: (registration_lock){+.+.+.}, at: [] > register_framebuffer+0x25/0x310 > [0.625581] #4: (&fb_info->lock){+.+.+.}, at: [] > lock_fb_info+0x26/0x60 > [0.625583] #5: (console_lock){+.+.+.}, at: [] > register_framebuffer+0x1ba/0x310 > [0.625585] #6: ((fb_notifier_list).rwsem){.+.+.+}, at: > [] __blocking_notifier_call_chain+0x42/0x80 > [0.625587] #7: (&dev->mode_config.mutex){+.+.+.}, at: > [] drm_modeset_lock_all+0x2a/0x70 > [0.625589] #8: (&crtc->mutex){+.+.+.}, at: [] > drm_modeset_lock_all+0x54/0x70 > [0.625591] #9: (&dmac->lock){+.+...}, at: [] > evo_wait+0x43/0xf0 > [0.625591] > [0.625591] stack backtrace: > [0.625592] Pid: 1, comm: swapper/0 Not tainted 3.9.0-rc3+ #25 > [0.625593] Call Trace: > [0.625595] [] __lock_acquire+0x76b/0x1c20 > [0.625597] [] ? dcb_table+0x1ac/0x2a0 > [0.625599] [] lock_acquire+0x8a/0x120 > [0.625600] [] ? evo_wait+0x43/0xf0 > [0.625602] [] ? mutex_lock_nested+0x292/0x330 > [0.625603] [] mutex_lock_nested+0x6e/0x330 > [0.625605] [] ? evo_wait+0x43/0xf0 > [0.625606] [] ? mark_held_locks+0x9b/0x100 > [0.625607] [] evo_wait+0x43/0xf0 > [0.625609] [] nv50_display_flip_next+0x713/0x7a0 > [0.625611] [] ? mutex_unlock+0xe/0x10 > [0.625612] [] ? 
evo_kick+0x37/0x40 > [0.625613] [] nv50_crtc_commit+0x10e/0x230 > [0.625615] [] drm_crtc_helper_set_mode+0x365/0x510 > [0.625617] [] drm_crtc_helper_set_config+0xa4e/0xb70 > [0.625618] [] drm_mode_set_config_internal+0x31/0x70 > [0.625619] [] drm_fb_helper_set_par+0x71/0xf0 > [0.625621] [] fbcon_init+0x514/0x5a0 > [0.625623] [] visual_init+0xbc/0x120 > [0.625624] [] do_bind_con_driver+0x163/0x320 > [0.625625] [] do_take_over_console+0x61/0x70 > [0.625627] [] do_fbcon_takeover+0x63/0xc0 > [0.625628] [] fbcon_event_notify+0x5fd/0x700 > [0.625629] [] notifier_call_chain+0x4d/0x70 > [0.625630] [] __blocking_notifier_call_chain+0x58/0x80 > [0.625631] [] blocking_notifier_call_chain+0x16/0x20 > [0.625633] [] fb_notifier_call_chain+0x1b/0x20 > [0.625634] [] register_framebuffer+0x1c8/0x310 > [0.625635] [] drm_fb_helper_initial_config+0x371/0x520 > [0.625637] [] ? > drm_fb_helper_single_add_all_connectors+0x47/0xf0 > [0.625639] [] ? kmem_cache_alloc_trace+0xee/0x150 > [0.625641] [] nouveau_fbcon_init+0x10e/0x160 > [