from:"Peter Zijlstra"

Re: [PATCH 1/3] drm: use the lookup lock in drm_is_current_master

2021-07-27 Thread Peter Zijlstra

On Thu, Jul 22, 2021 at 12:38:10PM +0200, Daniel Vetter wrote:
> On Thu, Jul 22, 2021 at 05:29:27PM +0800, Desmond Cheong Zhi Xi wrote:
> > Inside drm_is_current_master, using the outer drm_device.master_mutex
> > to protect reads of drm_file.master makes the function prone to creating
> > lock hierarchy inversions. Instead, we can use the
> > drm_file.master_lookup_lock that sits at the bottom of the lock
> > hierarchy.
> > 
> > Reported-by: Daniel Vetter 
> > Signed-off-by: Desmond Cheong Zhi Xi 
> > ---
> >  drivers/gpu/drm/drm_auth.c | 9 +
> >  1 file changed, 5 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/drm_auth.c b/drivers/gpu/drm/drm_auth.c
> > index f00354bec3fb..9c24b8cc8e36 100644
> > --- a/drivers/gpu/drm/drm_auth.c
> > +++ b/drivers/gpu/drm/drm_auth.c
> > @@ -63,8 +63,9 @@
> >  
> >  static bool drm_is_current_master_locked(struct drm_file *fpriv)
> >  {
> > -   lockdep_assert_held_once(&fpriv->minor->dev->master_mutex);
> > -
> > +   /* Either drm_device.master_mutex or drm_file.master_lookup_lock
> > +* should be held here.
> > +*/
> 
> Disappointing that lockdep can't check or conditions for us, a
> lockdep_assert_held_either would be really neat in some cases.
> 
> Adding lockdep folks, maybe they have ideas.

#ifdef CONFIG_LOCKDEP
WARN_ON_ONCE(debug_locks && !(lockdep_is_held(&drm_device.master_mutex) 
||
  
lockdep_is_held(&drm_file.master_lookup_lock)));
#endif

doesn't exactly roll off the tongue, but should do as you want I
suppose.

Would something like:

#define lockdep_assert(cond)WARN_ON_ONCE(debug_locks && !(cond))

Such that we can write:

lockdep_assert(lockdep_is_held(&drm_device.master_mutex) ||
   lockdep_is_held(&drm_file.master_lookup_lock));

make it better ?

---
Subject: locking/lockdep: Provide lockdep_assert{,_once}() helpers

Extract lockdep_assert{,_once}() helpers to more easily write composite
assertions like, for example:

lockdep_assert(lockdep_is_held(&drm_device.master_mutex) ||
   lockdep_is_held(&drm_file.master_lookup_lock));

Signed-off-by: Peter Zijlstra (Intel) 
---
diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 5cf387813754..0da67341c1fb 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -306,31 +306,29 @@ extern void lock_unpin_lock(struct lockdep_map *lock, 
struct pin_cookie);
 
 #define lockdep_depth(tsk) (debug_locks ? (tsk)->lockdep_depth : 0)
 
-#define lockdep_assert_held(l) do {\
-   WARN_ON(debug_locks &&  \
-   lockdep_is_held(l) == LOCK_STATE_NOT_HELD); \
-   } while (0)
+#define lockdep_assert(cond)   \
+   do { WARN_ON(debug_locks && !(cond)); } while (0)
 
-#define lockdep_assert_not_held(l) do {\
-   WARN_ON(debug_locks &&  \
-   lockdep_is_held(l) == LOCK_STATE_HELD); \
-   } while (0)
+#define lockdep_assert_once(cond)  \
+   do { WARN_ON_ONCE(debug_locks && !(cond)); } while (0)
 
-#define lockdep_assert_held_write(l)   do {\
-   WARN_ON(debug_locks && !lockdep_is_held_type(l, 0));\
-   } while (0)
+#define lockdep_assert_held(l) \
+   lockdep_assert(lockdep_is_held(l) != LOCK_STAT_NOT_HELD)
 
-#define lockdep_assert_held_read(l)do {\
-   WARN_ON(debug_locks && !lockdep_is_held_type(l, 1));\
-   } while (0)
+#define lockdep_assert_not_held(l) \
+   lockdep_assert(lockdep_is_held(l) != LOCK_STATE_HELD)
 
-#define lockdep_assert_held_once(l)do {\
-   WARN_ON_ONCE(debug_locks && !lockdep_is_held(l));   \
-   } while (0)
+#define lockdep_assert_held_write(l)   \
+   lockdep_assert(lockdep_is_held_type(l, 0))
 
-#define lockdep_assert_none_held_once()do {
\
-   WARN_ON_ONCE(debug_locks && current->lockdep_depth);\
-   } while (0)
+#define lockdep_assert_held_read(l)\
+   lockdep_assert(lockdep_is_held_type(l, 1))
+
+#define lockdep_assert_held_once(l)\
+   lockdep_assert_once(lockdep_is_held(l) != LOCK_STAT_NOT_HELD)
+
+#define lockdep_assert_none_held_once()\
+   lockdep_assert_once(!current->lockdep_depth)
 
 #define lockdep_recursing(tsk) ((tsk)->lockdep_recursion)
 
@@ -407,6 +405,9 @@ extern int lock_is_held(const void *);
 extern int lockdep_is_held(co

Re: [PATCH 1/3] drm: use the lookup lock in drm_is_current_master

2021-07-29 Thread Peter Zijlstra

On Thu, Jul 29, 2021 at 10:32:13PM +0800, Desmond Cheong Zhi Xi wrote:
> Sounds good, will do. Thanks for the patch, Peter.
> 
> Just going to make a small edit:
> s/LOCK_STAT_NOT_HELD/LOCK_STATE_NOT_HELD/

Bah, I knew I should've compile tested it :-), Thanks!

Re: [PATCH v2 0/2] locking/lockdep, drm: apply new lockdep assert in drm_auth.c

2021-08-02 Thread Peter Zijlstra

On Mon, Aug 02, 2021 at 10:26:16AM +0200, Daniel Vetter wrote:
> On Sat, Jul 31, 2021 at 04:24:56PM +0800, Desmond Cheong Zhi Xi wrote:
> > Hi,
> > 
> > Following a discussion on the patch ("drm: use the lookup lock in
> > drm_is_current_master") [1], Peter Zijlstra proposed new lockdep_assert
> > helpers to make it convenient to compose lockdep checks together.
> > 
> > This series includes the patch that introduces the new lockdep helpers,
> > then utilizes these helpers in drm_is_current_master_locked in the
> > following patch.
> > 
> > v1 -> v2:
> > Patch 2:
> > - Updated the kerneldoc on the lock design of drm_file.master to explain
> > the use of lockdep_assert(). As suggested by Boqun Feng.
> > 
> > Link: 
> > https://lore.kernel.org/lkml/20210722092929.244629-2-desmondcheon...@gmail.com/
> >  [1]
> 
> Can you pls also cc: this to intel-gfx so the local CI there can pick it
> up and verify? Just to check we got it all.

Acked-by: Peter Zijlstra (Intel) 

Feel free to take it through the drm tree.

Re: [PATCH] drm/syncobj: make lockdep complain on WAIT_FOR_SUBMIT v2

2021-01-19 Thread Peter Zijlstra

On Mon, Jan 18, 2021 at 07:03:34PM +0100, Christian König wrote:

> diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
> index 6e74e6745eca..f51458615158 100644
> --- a/drivers/gpu/drm/drm_syncobj.c
> +++ b/drivers/gpu/drm/drm_syncobj.c
> @@ -387,6 +387,13 @@ int drm_syncobj_find_fence(struct drm_file *file_private,
>   if (!syncobj)
>   return -ENOENT;
>  
> + /* Waiting for userspace with locks help is illegal cause that can
> +  * trivial deadlock with page faults for example. Make lockdep complain
> +  * about it early on.
> +  */

Egads, the cursed comment style is spreading :/

> + if (flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT)
> + lockdep_assert_none_held_once();
> +

Should this not be part of drm_syncobj_fence_add_wait() instead? Also,
do you want to sprinkle might_sleep() around ?

>   *fence = drm_syncobj_fence_get(syncobj);
>   drm_syncobj_put(syncobj);
>  
> diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH] drm/syncobj: make lockdep complain on WAIT_FOR_SUBMIT v2

2021-01-19 Thread Peter Zijlstra

On Tue, Jan 19, 2021 at 10:46:53AM +0100, Christian König wrote:
> But I'm going to double check if drm_syncobj_fence_add_wait() isn't used
> elsewhere as well.

drm_syncobj_array_wait_timeout()

Which is why I asked.. :-)
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH] drm/syncobj: make lockdep complain on WAIT_FOR_SUBMIT v3

2021-01-19 Thread Peter Zijlstra

On Tue, Jan 19, 2021 at 02:05:09PM +0100, Daniel Vetter wrote:

> > diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
> > index b9e9adec73e8..6eb117c0d0f3 100644
> > --- a/include/linux/lockdep.h
> > +++ b/include/linux/lockdep.h
> > @@ -310,6 +310,10 @@ extern void lock_unpin_lock(struct lockdep_map *lock, 
> > struct pin_cookie);
> > WARN_ON_ONCE(debug_locks && !lockdep_is_held(l));   \
> > } while (0)
> >
> > +#define lockdep_assert_none_held_once()do {
> > \
> > +   WARN_ON_ONCE(debug_locks && current->lockdep_depth);\
> > +   } while (0)
> > +
> >  #define lockdep_recursing(tsk) ((tsk)->lockdep_recursion)
> >
> >  #define lockdep_pin_lock(l)lock_pin_lock(&(l)->dep_map)
> > @@ -387,6 +391,7 @@ extern int lockdep_is_held(const void *);
> >  #define lockdep_assert_held_write(l)   do { (void)(l); } while (0)
> >  #define lockdep_assert_held_read(l)do { (void)(l); } while (0)
> >  #define lockdep_assert_held_once(l)do { (void)(l); } while (0)
> > +#define lockdep_assert_none_held_once()    do { } while (0)
> >
> >  #define lockdep_recursing(tsk) (0)
> 
> ofc needs ack from Peter, but drm parts look all good to me.

Acked-by: Peter Zijlstra (Intel) 
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH] mutex: nuke mutex_trylock_recursive

2021-02-16 Thread Peter Zijlstra

On Tue, Feb 16, 2021 at 10:29:00AM +0100, Daniel Vetter wrote:
> On Tue, Feb 16, 2021 at 09:21:46AM +0100, Christian König wrote:
> > The last user went away in the 5.11 cycle.
> > 
> > Signed-off-by: Christian König 
> 
> Nice.
> 
> Reviewed-by: Daniel Vetter 
> 
> I think would be good to still stuff this into 5.12 before someone
> resurrects this zombie.

Already done:

  
https://lkml.kernel.org/r/161296556531.23325.10473355259841906876.tip-bot2@tip-bot2
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH] mutex: nuke mutex_trylock_recursive

2021-02-16 Thread Peter Zijlstra

On Tue, Feb 16, 2021 at 01:38:49PM +0100, Christian König wrote:
> 
> 
> Am 16.02.21 um 11:13 schrieb Peter Zijlstra:
> > On Tue, Feb 16, 2021 at 10:29:00AM +0100, Daniel Vetter wrote:
> > > On Tue, Feb 16, 2021 at 09:21:46AM +0100, Christian König wrote:
> > > > The last user went away in the 5.11 cycle.
> > > > 
> > > > Signed-off-by: Christian König 
> > > Nice.
> > > 
> > > Reviewed-by: Daniel Vetter 
> > > 
> > > I think would be good to still stuff this into 5.12 before someone
> > > resurrects this zombie.
> > Already done:
> > 
> >
> > https://lkml.kernel.org/r/161296556531.23325.10473355259841906876.tip-bot2@tip-bot2
> 
> One less bad concept to worry about.
> 
> But your patch is missing to remove the checkpatch.pl check for this :)

The next patch does that, look at the "Thread overview:" at the bottom
of the things.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 1/2] locking/rwsem: Add down_write_interruptible

2021-03-09 Thread Peter Zijlstra

On Mon, Mar 08, 2021 at 03:54:55PM -0500, Zack Rusin wrote:
> Add an interruptible version of down_write. It's the other
> side of the already implemented down_read_interruptible.
> It allows drivers which used custom locking code to
> support interruptible rw semaphores to switch over
> to rwsem.
> 
> Cc: Peter Zijlstra 
> Cc: Ingo Molnar 
> Cc: Will Deacon 
> Cc: linux-ker...@vger.kernel.org
> Cc: dri-devel@lists.freedesktop.org

No SoB!

Assuning you actually wrote and and simply forgot to add it, the patch
does look ok, so:

Acked-by: Peter Zijlstra (Intel) 
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 3/3] locking/selftests: Add testcases for fs_reclaim

2020-11-20 Thread Peter Zijlstra

On Fri, Nov 20, 2020 at 10:54:44AM +0100, Daniel Vetter wrote:
> Since I butchered this I figured better to make sure we have testcases
> for this now. Since we only have a locking context for __GFP_FS that's
> the only thing we're testing right now.
> 
> Cc: linux-fsde...@vger.kernel.org
> Cc: Dave Chinner 
> Cc: Qian Cai 
> Cc: linux-...@vger.kernel.org
> Cc: Thomas Hellström (Intel) 
> Cc: Andrew Morton 
> Cc: Jason Gunthorpe 
> Cc: linux...@kvack.org
> Cc: linux-r...@vger.kernel.org
> Cc: Maarten Lankhorst 
> Cc: Christian König 
> Cc: "Matthew Wilcox (Oracle)" 
> Signed-off-by: Daniel Vetter 
> Cc: Peter Zijlstra 
> Cc: Ingo Molnar 
> Cc: Will Deacon 
> Cc: linux-ker...@vger.kernel.org
> ---
>  lib/locking-selftest.c | 47 ++
>  1 file changed, 47 insertions(+)

I have a few changes pending for this file, I don't think the conflicts
will be bad, but..

In any case:

Acked-by: Peter Zijlstra (Intel) 
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH] kernel/locking: Add context to ww_mutex_trylock.

2021-09-08 Thread Peter Zijlstra

On Tue, Sep 07, 2021 at 03:20:44PM +0200, Maarten Lankhorst wrote:
> i915 will soon gain an eviction path that trylock a whole lot of locks
> for eviction, getting dmesg failures like below:
> 
> BUG: MAX_LOCK_DEPTH too low!
> turning off the locking correctness validator.
> depth: 48  max: 48!
> 48 locks held by i915_selftest/5776:
>  #0: 888101a79240 (&dev->mutex){}-{3:3}, at: 
> __driver_attach+0x88/0x160
>  #1: c99778c0 (reservation_ww_class_acquire){+.+.}-{0:0}, at: 
> i915_vma_pin.constprop.63+0x39/0x1b0 [i915]
>  #2: 88800cf74de8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: 
> i915_vma_pin.constprop.63+0x5f/0x1b0 [i915]
>  #3: 88810c7f9e38 (&vm->mutex/1){+.+.}-{3:3}, at: 
> i915_vma_pin_ww+0x1c4/0x9d0 [i915]
>  #4: 88810bad5768 (reservation_ww_class_mutex){+.+.}-{3:3}, at: 
> i915_gem_evict_something+0x110/0x860 [i915]
>  #5: 88810bad60e8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: 
> i915_gem_evict_something+0x110/0x860 [i915]
> ...
>  #46: 88811964d768 (reservation_ww_class_mutex){+.+.}-{3:3}, at: 
> i915_gem_evict_something+0x110/0x860 [i915]
>  #47: 88811964e0e8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: 
> i915_gem_evict_something+0x110/0x860 [i915]
> INFO: lockdep is turned off.

> As an intermediate solution, add an acquire context to ww_mutex_trylock,
> which allows us to do proper nesting annotations on the trylocks, making
> the above lockdep splat disappear.

Fair enough I suppose.

> +/**
> + * ww_mutex_trylock - tries to acquire the w/w mutex with optional acquire 
> context
> + * @lock: mutex to lock
> + * @ctx: optional w/w acquire context
> + *
> + * Trylocks a mutex with the optional acquire context; no deadlock detection 
> is
> + * possible. Returns 1 if the mutex has been acquired successfully, 0 
> otherwise.
> + *
> + * Unlike ww_mutex_lock, no deadlock handling is performed. However, if a 
> @ctx is
> + * specified, -EALREADY and -EDEADLK handling may happen in calls to 
> ww_mutex_lock.
> + *
> + * A mutex acquired with this function must be released with ww_mutex_unlock.
> + */
> +int __sched
> +ww_mutex_trylock(struct ww_mutex *ww, struct ww_acquire_ctx *ctx)
> +{
> + bool locked;
> +
> + if (!ctx)
> + return mutex_trylock(&ww->base);
> +
> +#ifdef CONFIG_DEBUG_MUTEXES
> + DEBUG_LOCKS_WARN_ON(ww->base.magic != &ww->base);
> +#endif
> +
> + preempt_disable();
> + locked = __mutex_trylock(&ww->base);
> +
> + if (locked) {
> + ww_mutex_set_context_fastpath(ww, ctx);
> + mutex_acquire_nest(&ww->base.dep_map, 0, 1, &ctx->dep_map, 
> _RET_IP_);
> + }
> + preempt_enable();
> +
> + return locked;
> +}
> +EXPORT_SYMBOL(ww_mutex_trylock);

You'll need a similar hunk in ww_rt_mutex.c

Re: [PATCH] kernel/locking: Add context to ww_mutex_trylock.

2021-09-09 Thread Peter Zijlstra

On Thu, Sep 09, 2021 at 07:38:06AM +0200, Maarten Lankhorst wrote:

> > You'll need a similar hunk in ww_rt_mutex.c
> 
> What tree has that file?

Linus' tree should have it. Per commit:

  f8635d509d80 ("locking/ww_mutex: Implement rtmutex based ww_mutex API 
functions")

Re: [PATCH v2] kernel/locking: Add context to ww_mutex_trylock.

2021-09-10 Thread Peter Zijlstra

On Thu, Sep 09, 2021 at 11:32:18AM +0200, Maarten Lankhorst wrote:
> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
> index d456579d0952..791c28005eef 100644
> --- a/kernel/locking/mutex.c
> +++ b/kernel/locking/mutex.c
> @@ -736,6 +736,44 @@ __ww_mutex_lock(struct mutex *lock, unsigned int state, 
> unsigned int subclass,
>   return __mutex_lock_common(lock, state, subclass, NULL, ip, ww_ctx, 
> true);
>  }
>  
> +/**
> + * ww_mutex_trylock - tries to acquire the w/w mutex with optional acquire 
> context
> + * @lock: mutex to lock
> + * @ctx: optional w/w acquire context
> + *
> + * Trylocks a mutex with the optional acquire context; no deadlock detection 
> is
> + * possible. Returns 1 if the mutex has been acquired successfully, 0 
> otherwise.
> + *
> + * Unlike ww_mutex_lock, no deadlock handling is performed. However, if a 
> @ctx is
> + * specified, -EALREADY and -EDEADLK handling may happen in calls to 
> ww_mutex_lock.
> + *
> + * A mutex acquired with this function must be released with ww_mutex_unlock.
> + */
> +int __sched
> +ww_mutex_trylock(struct ww_mutex *ww, struct ww_acquire_ctx *ctx)
> +{
> + bool locked;
> +
> + if (!ctx)
> + return mutex_trylock(&ww->base);
> +
> +#ifdef CONFIG_DEBUG_MUTEXES
> + DEBUG_LOCKS_WARN_ON(ww->base.magic != &ww->base);
> +#endif
> +
> + preempt_disable();
> + locked = __mutex_trylock(&ww->base);
> +
> + if (locked) {
> + ww_mutex_set_context_fastpath(ww, ctx);
> + mutex_acquire_nest(&ww->base.dep_map, 0, 1, &ctx->dep_map, 
> _RET_IP_);
> + }
> + preempt_enable();
> +
> + return locked;
> +}
> +EXPORT_SYMBOL(ww_mutex_trylock);
> +
>  #ifdef CONFIG_DEBUG_LOCK_ALLOC
>  void __sched
>  mutex_lock_nested(struct mutex *lock, unsigned int subclass)

> diff --git a/kernel/locking/ww_rt_mutex.c b/kernel/locking/ww_rt_mutex.c
> index 3f1fff7d2780..c4cb863edb4c 100644
> --- a/kernel/locking/ww_rt_mutex.c
> +++ b/kernel/locking/ww_rt_mutex.c
> @@ -50,6 +50,18 @@ __ww_rt_mutex_lock(struct ww_mutex *lock, struct 
> ww_acquire_ctx *ww_ctx,
>   return ret;
>  }
>  
> +int __sched
> +ww_mutex_trylock(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
> +{
> + int locked = rt_mutex_trylock(&lock->base);
> +
> + if (locked && ctx)
> + ww_mutex_set_context_fastpath(lock, ctx);
> +
> + return locked;
> +}
> +EXPORT_SYMBOL(ww_mutex_trylock);
> +
>  int __sched
>  ww_mutex_lock(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
>  {

That doesn't look right, how's this for you?

---
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -94,6 +94,9 @@ static inline unsigned long __owner_flag
return owner & MUTEX_FLAGS;
 }
 
+/*
+ * Returns: __mutex_owner(lock) on failure or NULL on success.
+ */
 static inline struct task_struct *__mutex_trylock_common(struct mutex *lock, 
bool handoff)
 {
unsigned long owner, curr = (unsigned long)current;
@@ -736,6 +739,47 @@ __ww_mutex_lock(struct mutex *lock, unsi
return __mutex_lock_common(lock, state, subclass, NULL, ip, ww_ctx, 
true);
 }
 
+/**
+ * ww_mutex_trylock - tries to acquire the w/w mutex with optional acquire 
context
+ * @ww: mutex to lock
+ * @ww_ctx: optional w/w acquire context
+ *
+ * Trylocks a mutex with the optional acquire context; no deadlock detection is
+ * possible. Returns 1 if the mutex has been acquired successfully, 0 
otherwise.
+ *
+ * Unlike ww_mutex_lock, no deadlock handling is performed. However, if a @ctx 
is
+ * specified, -EALREADY handling may happen in calls to ww_mutex_trylock.
+ *
+ * A mutex acquired with this function must be released with ww_mutex_unlock.
+ */
+int ww_mutex_trylock(struct ww_mutex *ww, struct ww_acquire_ctx *ww_ctx)
+{
+   if (!ww_ctx)
+   return mutex_trylock(&ww->base);
+
+   MUTEX_WARN_ON(ww->base.magic != &ww->base);
+
+   if (unlikely(ww_ctx == READ_ONCE(ww->ctx)))
+   return -EALREADY;
+
+   /*
+* Reset the wounded flag after a kill. No other process can
+* race and wound us here, since they can't have a valid owner
+* pointer if we don't have any locks held.
+*/
+   if (ww_ctx->acquired == 0)
+   ww_ctx->wounded = 0;
+
+   if (__mutex_trylock(&ww->base)) {
+   ww_mutex_set_context_fastpath(ww, ww_ctx);
+   mutex_acquire_nest(&ww->base.dep_map, 0, 1, &ww_ctx->dep_map, 
_RET_IP_);
+   return 1;
+   }
+
+   return 0;
+}
+EXPORT_SYMBOL(ww_mutex_trylock);
+
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 void __sched
 mutex_lock_nested(struct mutex *lock, unsigned int subclass)
--- a/kernel/locking/ww_rt_mutex.c
+++ b/kernel/locking/ww_rt_mutex.c
@@ -9,6 +9,34 @@
 #define WW_RT
 #include "rtmutex.c"
 
+int ww_mutex_trylock(struct ww_mutex *lock, struct ww_acquire_ctx *ww_ctx)
+{
+   struct rt_mutex *rtm = &lock->base;
+
+   if (!ww_ctx)
+   return rt_mutex_trylock(rtm);
+
+   if (unlikely(ww_

Re: [PATCH v2] kernel/locking: Add context to ww_mutex_trylock.

2021-09-10 Thread Peter Zijlstra

On Fri, Sep 10, 2021 at 05:02:54PM +0200, Peter Zijlstra wrote:

> That doesn't look right, how's this for you?

Full patch for the robots here:

https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/commit/?h=locking/core&id=826e7b8826f0af185bb93249600533c33fd69a95

Re: [PATCH v2] kernel/locking: Add context to ww_mutex_trylock.

2021-09-13 Thread Peter Zijlstra

On Mon, Sep 13, 2021 at 10:42:36AM +0200, Maarten Lankhorst wrote:

> > +/**
> > + * ww_mutex_trylock - tries to acquire the w/w mutex with optional acquire 
> > context
> > + * @ww: mutex to lock
> > + * @ww_ctx: optional w/w acquire context
> > + *
> > + * Trylocks a mutex with the optional acquire context; no deadlock 
> > detection is
> > + * possible. Returns 1 if the mutex has been acquired successfully, 0 
> > otherwise.
> > + *
> > + * Unlike ww_mutex_lock, no deadlock handling is performed. However, if a 
> > @ctx is
> > + * specified, -EALREADY handling may happen in calls to ww_mutex_trylock.
> > + *
> > + * A mutex acquired with this function must be released with 
> > ww_mutex_unlock.
> > + */
> > +int ww_mutex_trylock(struct ww_mutex *ww, struct ww_acquire_ctx *ww_ctx)
> > +{
> > +   if (!ww_ctx)
> > +   return mutex_trylock(&ww->base);
> > +
> > +   MUTEX_WARN_ON(ww->base.magic != &ww->base);
> > +
> > +   if (unlikely(ww_ctx == READ_ONCE(ww->ctx)))
> > +   return -EALREADY;
> 
> I'm not 100% sure this is a good idea, because it would make the
> trylock weird.  For i915 I checked manually, because I didn't want to
> change the function signature. This is probably the other extreme.
> 
> "if (ww_mutex_trylock())" would look correct, but actually be wrong
> and lead to double unlock without adjustments.  Maybe we could make a
> ww_mutex_trylock_ctx_err, which would return -EALREADY or -EBUSY on
> failure, and 0 on success?  We could keep ww_mutex_trylock without
> ctx, probably just #define as (!ww_mutex_trylock_ctx_err(lock, NULL))

Urgh, yeah. Also, I suppose that if we already own it, we'll just fail
the trylock anyway. Let me take this out.

> > +   /*
> > +* Reset the wounded flag after a kill. No other process can
> > +* race and wound us here, since they can't have a valid owner
> > +* pointer if we don't have any locks held.
> > +*/
> > +   if (ww_ctx->acquired == 0)
> > +   ww_ctx->wounded = 0;
> 
> Yeah I guess this needs fixing too. Not completely sure since trylock
> wouldn't do the whole ww dance, but since it's our first lock,
> probably best to do so regardless so other users don't trip over it.

This is actually critical, because if this trylock is the first lock
acquisition for the context, there won't be any other opportunity to
reset this value.

> > +
> > +   if (__mutex_trylock(&ww->base)) {
> > +   ww_mutex_set_context_fastpath(ww, ww_ctx);
> > +   mutex_acquire_nest(&ww->base.dep_map, 0, 1, &ww_ctx->dep_map, 
> > _RET_IP_);
> > +   return 1;
> > +   }
> > +
> > +   return 0;
> > +}
> > +EXPORT_SYMBOL(ww_mutex_trylock);

Updated version below...

---
Subject: kernel/locking: Add context to ww_mutex_trylock()
From: Maarten Lankhorst 
Date: Thu, 9 Sep 2021 11:32:18 +0200

From: Maarten Lankhorst 

i915 will soon gain an eviction path that trylock a whole lot of locks
for eviction, getting dmesg failures like below:

  BUG: MAX_LOCK_DEPTH too low!
  turning off the locking correctness validator.
  depth: 48  max: 48!
  48 locks held by i915_selftest/5776:
   #0: 888101a79240 (&dev->mutex){}-{3:3}, at: 
__driver_attach+0x88/0x160
   #1: c99778c0 (reservation_ww_class_acquire){+.+.}-{0:0}, at: 
i915_vma_pin.constprop.63+0x39/0x1b0 [i915]
   #2: 88800cf74de8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: 
i915_vma_pin.constprop.63+0x5f/0x1b0 [i915]
   #3: 88810c7f9e38 (&vm->mutex/1){+.+.}-{3:3}, at: 
i915_vma_pin_ww+0x1c4/0x9d0 [i915]
   #4: 88810bad5768 (reservation_ww_class_mutex){+.+.}-{3:3}, at: 
i915_gem_evict_something+0x110/0x860 [i915]
   #5: 88810bad60e8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: 
i915_gem_evict_something+0x110/0x860 [i915]
  ...
   #46: 88811964d768 (reservation_ww_class_mutex){+.+.}-{3:3}, at: 
i915_gem_evict_something+0x110/0x860 [i915]
   #47: 88811964e0e8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: 
i915_gem_evict_something+0x110/0x860 [i915]
  INFO: lockdep is turned off.

Fixing eviction to nest into ww_class_acquire is a high priority, but
it requires a rework of the entire driver, which can only be done one
step at a time.

As an intermediate solution, add an acquire context to
ww_mutex_trylock, which allows us to do proper nesting annotations on
the trylocks, making the above lockdep splat disappear.

This is also useful in regulator_lock_nested, which may avoid dropping
regulator_nesting_mutex in the uncontended path, so use it there.

TTM may be another user for this, where we could lock a buffer in a

Re: [PATCH v2] kernel/locking: Add context to ww_mutex_trylock.

2021-09-16 Thread Peter Zijlstra

On Thu, Sep 16, 2021 at 03:00:39PM +0200, Maarten Lankhorst wrote:

> > For merge logistics, can we pls have a stable branch? I expect that the
> > i915 patches will be ready for 5.16.
> >
> > Or send it in for -rc2 so that the interface change doesn't cause needless
> > conflicts, whatever you think is best.

> Yeah, some central branch drm could pull from, would make upstreaming patches 
> that depends on it easier. :)

I think I'll make tip/locking/wwmutex and include that in
tip/locking/core, let me have a poke.

Re: [PATCH v2] kernel/locking: Add context to ww_mutex_trylock.

2021-09-17 Thread Peter Zijlstra

On Thu, Sep 16, 2021 at 03:28:11PM +0200, Peter Zijlstra wrote:
> On Thu, Sep 16, 2021 at 03:00:39PM +0200, Maarten Lankhorst wrote:
> 
> > > For merge logistics, can we pls have a stable branch? I expect that the
> > > i915 patches will be ready for 5.16.
> > >
> > > Or send it in for -rc2 so that the interface change doesn't cause needless
> > > conflicts, whatever you think is best.
> 
> > Yeah, some central branch drm could pull from, would make upstreaming 
> > patches that depends on it easier. :)
> 
> I think I'll make tip/locking/wwmutex and include that in
> tip/locking/core, let me have a poke.

This is now so. Enjoy!

Re: [Intel-gfx] [PATCH 2/4] mm: add a io_mapping_map_user helper

2021-10-20 Thread Peter Zijlstra

On Wed, Oct 20, 2021 at 08:40:05AM -0700, Lucas De Marchi wrote:
> On Fri, Mar 26, 2021 at 06:55:03AM +0100, Christoph Hellwig wrote:
> > Add a helper that calls remap_pfn_range for an struct io_mapping, relying
> > on the pgprot pre-validation done when creating the mapping instead of
> > doing it at runtime.
> > 
> > Signed-off-by: Christoph Hellwig 
> > ---
> > include/linux/io-mapping.h |  3 +++
> > mm/Kconfig |  3 +++
> > mm/Makefile|  1 +
> > mm/io-mapping.c| 29 +
> > 4 files changed, 36 insertions(+)
> > create mode 100644 mm/io-mapping.c
> > 
> > diff --git a/include/linux/io-mapping.h b/include/linux/io-mapping.h
> > index c093e81310a9b3..e9743cfd858527 100644
> > --- a/include/linux/io-mapping.h
> > +++ b/include/linux/io-mapping.h
> > @@ -220,3 +220,6 @@ io_mapping_free(struct io_mapping *iomap)
> > }
> > 
> > #endif /* _LINUX_IO_MAPPING_H */
> > +
> > +int io_mapping_map_user(struct io_mapping *iomap, struct vm_area_struct 
> > *vma,
> > +   unsigned long addr, unsigned long pfn, unsigned long size);
> 
> I'm not sure what exactly brought me to check this, but while debugging
> I noticed this outside the header guard. But then after some more checks I
> saw nothing actually selects CONFIG_IO_MAPPING because commit using
> it was reverted in commit 0e4fe0c9f2f9 ("Revert "i915: use 
> io_mapping_map_user"")
> 
> Is this something we want to re-attempt moving to mm/ ?

Yes, it would be very good to unexport apply_to_page_range(), it's a
terrible interface to expose.

Re: [RFC 1/6] sched: Add nice value change notifier

2021-09-30 Thread Peter Zijlstra

On Thu, Sep 30, 2021 at 06:15:47PM +0100, Tvrtko Ursulin wrote:
>  void set_user_nice(struct task_struct *p, long nice)
>  {
>   bool queued, running;
> - int old_prio;
> + int old_prio, ret;
>   struct rq_flags rf;
>   struct rq *rq;
>  
> @@ -6913,6 +6945,9 @@ void set_user_nice(struct task_struct *p, long nice)
>*/
>   p->sched_class->prio_changed(rq, p, old_prio);
>  
> + ret = atomic_notifier_call_chain(&user_nice_notifier_list, nice, p);
> + WARN_ON_ONCE(ret != NOTIFY_DONE);
> +
>  out_unlock:
>   task_rq_unlock(rq, p, &rf);
>  }

No, we're not going to call out to exported, and potentially unbounded,
functions under scheduler locks.

Re: [RFC 0/6] CPU + GPU synchronised priority scheduling

2021-09-30 Thread Peter Zijlstra

On Thu, Sep 30, 2021 at 06:15:46PM +0100, Tvrtko Ursulin wrote:

> (Note I did not copy
> everyone on all patches but just the cover letter for context and the rest
> should be available from the mailing list.)

In general, if you can't be arsed to send it to me, I can't be arsed to
dig it out. I've got plenty other email I can read without having to go
looking for more.

Re: [RFC 1/6] sched: Add nice value change notifier

2021-10-01 Thread Peter Zijlstra

On Fri, Oct 01, 2021 at 11:32:16AM +0100, Tvrtko Ursulin wrote:
> 
> On 01/10/2021 10:04, Tvrtko Ursulin wrote:
> > 
> > Hi Peter,
> > 
> > On 30/09/2021 19:33, Peter Zijlstra wrote:
> > > On Thu, Sep 30, 2021 at 06:15:47PM +0100, Tvrtko Ursulin wrote:
> > > >   void set_user_nice(struct task_struct *p, long nice)
> > > >   {
> > > >   bool queued, running;
> > > > -    int old_prio;
> > > > +    int old_prio, ret;
> > > >   struct rq_flags rf;
> > > >   struct rq *rq;
> > > > @@ -6913,6 +6945,9 @@ void set_user_nice(struct task_struct *p,
> > > > long nice)
> > > >    */
> > > >   p->sched_class->prio_changed(rq, p, old_prio);
> > > > +    ret = atomic_notifier_call_chain(&user_nice_notifier_list,
> > > > nice, p);
> > > > +    WARN_ON_ONCE(ret != NOTIFY_DONE);
> > > > +
> > > >   out_unlock:
> > > >   task_rq_unlock(rq, p, &rf);
> > > >   }
> > > 
> > > No, we're not going to call out to exported, and potentially unbounded,
> > > functions under scheduler locks.
> > 
> > Agreed, that's another good point why it is even more hairy, as I have
> > generally alluded in the cover letter.
> > 
> > Do you have any immediate thoughts on possible alternatives?
> > 
> > Like for instance if I did a queue_work from set_user_nice and then ran
> > a notifier chain async from a worker? I haven't looked at yet what
> > repercussion would that have in terms of having to cancel the pending
> > workers when tasks exit. I can try and prototype that and see how it
> > would look.
> 
> Hm or I simply move calling the notifier chain to after task_rq_unlock? That
> would leave it run under the tasklist lock so probably still quite bad.

Hmm? That's for normalize_rt_tasks() only, right? Just don't have it
call the notifier in that special case (that's a magic sysrq thing
anyway).

Re: [RFC 1/6] sched: Add nice value change notifier

2021-10-04 Thread Peter Zijlstra

On Mon, Oct 04, 2021 at 09:12:37AM +0100, Tvrtko Ursulin wrote:
> On 01/10/2021 16:48, Peter Zijlstra wrote:

> > Hmm? That's for normalize_rt_tasks() only, right? Just don't have it
> > call the notifier in that special case (that's a magic sysrq thing
> > anyway).
> 
> You mean my talk about tasklist_lock? No, it is also on the syscall part I
> am interested in as well. Call chain looks like this:

Urgh, I alwys miss that because it lives outside of sched.. :/

> sys_setpriority()
> {
>   ...
>   rcu_read_lock();
>   read_lock(&tasklist_lock);
>   ...
>   set_one_prio()
> set_user_nice()
> {
>   ...
>   task_rq_lock();
> -> my notifier from this RFC [1]
>   task_rq_unlock();
> -> I can move the notifier here for _some_ improvement [2]
> }
>   ...
>   read_unlock(&tasklist_lock);
>   rcu_read_unlock();
> }
> 
> So this RFC had the notifier call chain at [1], which I understood was the
> thing you initially pointed was horrible, being under a scheduler lock.
> 
> I can trivially move it to [2] but that still leaves it under the tasklist
> lock. I don't have a good feel how much better that would be. If not good
> enough then I will look for a smarter solution with less opportunity for
> global impact.

So task_list lock is pretty terrible and effectively unbound already
(just create more tasks etc..) so adding a notifier call there shouldn't
really make it much worse.

Re: [PATCH 8/8] drm/i915: Don't disable interrupts and pretend a lock as been acquired in __timeline_mark_lock().

2021-10-05 Thread Peter Zijlstra

On Tue, Oct 05, 2021 at 05:00:46PM +0200, Sebastian Andrzej Siewior wrote:
> This is a revert of commits
>d67739268cf0e ("drm/i915/gt: Mark up the nested engine-pm timeline lock as 
> irqsafe")
>6c69a45445af9 ("drm/i915/gt: Mark context->active_count as protected by 
> timeline->mutex")
> 
> The existing code leads to a different behaviour depending on wheather
> lockdep is enabled or not. Any following lock that is acquired without
> disabling interrupts (but needs to) will not be noticed by lockdep.
> 
> This it not just a lockdep annotation but is used but an actual mutex_t
> that is properly used as a lock but in case of __timeline_mark_lock()
> lockdep is only told that it is acquired but no lock has been acquired.
> 
> It appears that its purporse is just satisfy the lockdep_assert_held()
> check in intel_context_mark_active(). The other problem with disabling
> interrupts is that on PREEMPT_RT interrupts are also disabled which
> leads to problems for instance later during memory allocation.
> 
> Add an argument to intel_context_mark_active() which is true if the lock
> must have been acquired, false if other magic is involved and the lock
> is not needed. Use the `false' argument only from within
> switch_to_kernel_context() and remove __timeline_mark_lock().
> 
> Cc: Peter Zijlstra 
> Signed-off-by: Sebastian Andrzej Siewior 

Eeew, nice find.


> -static inline void intel_context_mark_active(struct intel_context *ce)
> +static inline void intel_context_mark_active(struct intel_context *ce,
> +  bool timeline_mutex_needed)
>  {
> - lockdep_assert_held(&ce->timeline->mutex);
> + if (timeline_mutex_needed)
> + lockdep_assert_held(&ce->timeline->mutex);
>   ++ce->active_count;
>  }

Chris, might it be possible to write that something like:

lockdep_assert(lockdep_is_held(&ce->timeline->mutex) ||
   engine_is_parked(ce));

instead?

Otherwise,

Acked-by: Peter Zijlstra (Intel)

Re: printk deadlock due to double lock attempt on current CPU's runqueue

2021-11-09 Thread Peter Zijlstra

On Tue, Nov 09, 2021 at 12:06:48PM -0800, Sultan Alsawaf wrote:
> Hi,
> 
> I encountered a printk deadlock on 5.13 which appears to still affect the 
> latest
> kernel. The deadlock occurs due to printk being used while having the current
> CPU's runqueue locked, and the underlying framebuffer console attempting to 
> lock
> the same runqueue when printk tries to flush the log buffer.

Yes, that's a known 'feature' of some consoles. printk() is in the
process of being reworked to not call con->write() from the printk()
calling context, which would go a long way towards fixing this.

>   #27 [c95b8e28] enqueue_task_fair at 8129774a  <-- 
> SCHED_WARN_ON(rq->tmp_alone_branch != &rq->leaf_cfs_rq_list);
>   #28 [c95b8ec0] activate_task at 8125625d
>   #29 [c95b8ef0] ttwu_do_activate at 81257943
>   #30 [c95b8f28] sched_ttwu_pending at 8125c71f <-- locks 
> this CPU's runqueue
>   #31 [c95b8fa0] flush_smp_call_function_queue at 813c6833
>   #32 [c95b8fd8] generic_smp_call_function_single_interrupt at 
> 813c7f58
>   #33 [c95b8fe0] __sysvec_call_function_single at 810f1456
>   #34 [c95b8ff0] sysvec_call_function_single at 831ec1bc
>   ---  ---
>   #35 [c919fda8] sysvec_call_function_single at 831ec1bc
>   RIP: 831ed06e  RSP: ed10438a6a49  RFLAGS: 0001
>   RAX: 888100d832c0  RBX:   RCX: 19233fd7
>   RDX:   RSI: 888100d832c0  RDI: ed10438a6a49
>   RBP: 831ec166   R8: dc00   R9: 
>   R10: 83400e22  R11:   R12: 831ed83e
>   R13:   R14: c919fde8  R15: 814d4d9d
>   ORIG_RAX: 88821c53524b  CS: 0001  SS: ef073a2
>   WARNING: possibly bogus exception frame
> --->8---
> 
> The catalyst is that CONFIG_SCHED_DEBUG is enabled and the tmp_alone_branch
> assertion fails (Peter, is this bad?).

Yes, that's not good. IIRC Vincent and Michal were looking at that code
recently.

> I'm not sure what the *correct* solution is here (don't use printk while 
> having
> a runqueue locked? don't use schedule_work() from the fbcon path? tell printk
> to use one of its lock-less backends?), so I've cc'd all the relevant folks.

I'm a firm believer in early_printk serial consoles.

Re: printk deadlock due to double lock attempt on current CPU's runqueue

2021-11-10 Thread Peter Zijlstra

On Wed, Nov 10, 2021 at 11:50:38AM +0100, Petr Mladek wrote:
> On Tue 2021-11-09 12:06:48, Sultan Alsawaf wrote:
> > Hi,
> > 
> > I encountered a printk deadlock on 5.13 which appears to still affect the 
> > latest
> > kernel. The deadlock occurs due to printk being used while having the 
> > current
> > CPU's runqueue locked, and the underlying framebuffer console attempting to 
> > lock
> > the same runqueue when printk tries to flush the log buffer.
> > 
> > I'm not sure what the *correct* solution is here (don't use printk while 
> > having
> > a runqueue locked? don't use schedule_work() from the fbcon path? tell 
> > printk
> > to use one of its lock-less backends?), so I've cc'd all the relevant folks.
> 
> At the moment, printk_deferred() could be used here. It defers the
> console handling via irq_work().

I think I've rejected that patch at least twice now :-) John's printk
stuff will really land real soon now, right.

Re: [PATCH 0/2] Nuke PAGE_KERNEL_IO

2021-11-12 Thread Peter Zijlstra

On Thu, Oct 21, 2021 at 11:15:09AM -0700, Lucas De Marchi wrote:
> Last user of PAGE_KERNEL_IO is the i915 driver. While removing it from
> there as we seek to bring the driver to other architectures, Daniel
> suggested that we could finish the cleanup and remove it altogether,
> through the tip tree. So here I'm sending both commits needed for that.
> 
> Lucas De Marchi (2):
>   drm/i915/gem: stop using PAGE_KERNEL_IO
>   x86/mm: nuke PAGE_KERNEL_IO
> 
>  arch/x86/include/asm/fixmap.h | 2 +-
>  arch/x86/include/asm/pgtable_types.h  | 7 ---
>  arch/x86/mm/ioremap.c | 2 +-
>  arch/x86/xen/setup.c  | 2 +-
>  drivers/gpu/drm/i915/gem/i915_gem_pages.c | 4 ++--
>  include/asm-generic/fixmap.h  | 2 +-
>  6 files changed, 6 insertions(+), 13 deletions(-)

Acked-by: Peter Zijlstra (Intel)

Re: [PATCH] locking/rwsem: drop redundant semicolon of down_write_nest_lock

2022-01-14 Thread Peter Zijlstra

On Fri, Jan 14, 2022 at 09:40:54AM +0100, Christian König wrote:
> Am 14.01.22 um 09:37 schrieb Guchun Chen:
> > Otherwise, braces are needed when using it.
> > 
> > Signed-off-by: Guchun Chen 
> 
> Acked-by: Christian König 
> 
> Peter any objections that we push this upstream through the drm subsystem?

Nah, take it.

Acked-by: Peter Zijlstra (Intel)

drm/fb-helper: Reduce READ_ONCE(master) to lockless_dereference

2016-08-11 Thread Peter Zijlstra

On Wed, Jun 22, 2016 at 08:46:12AM +0100, Chris Wilson wrote:
> We are only documenting that the read is outside of the lock, and do not
> require strict ordering on the operation. In this case the more relaxed
> lockless_dereference() will suffice.

No, no, no... This is 'broken'. lockless_dereference() is _stronger_
than READ_ONCE(), not weaker.

lockless_dereference() is a wrapper around smp_read_barrier_depends()
and is used to form read dependencies. There is no read dependency here,
therefore using lockless_dereference() is entirely pointless.

Look at the definition of lockless_dereference(), it does a READ_ONCE()
and then smp_read_barrier_depends().

Also, clue is in the name: 'dereference', you don't actually dereference
the pointer here, only load it.

> +++ b/drivers/gpu/drm/drm_fb_helper.c
> @@ -464,7 +464,7 @@ static bool drm_fb_helper_is_bound(struct drm_fb_helper 
> *fb_helper)
>  
>   /* Sometimes user space wants everything disabled, so don't steal the
>* display if there's a master. */
> - if (READ_ONCE(dev->master))
> + if (lockless_dereference(dev->master))
>   return false;
>  
>   drm_for_each_crtc(crtc, dev) {

[PATCH 1/2] Revert "drm/fb-helper: Reduce READ_ONCE(master) to lockless_dereference"

2016-08-12 Thread Peter Zijlstra

On Thu, Aug 11, 2016 at 11:26:47AM -0700, Paul E. McKenney wrote:
> If my upcoming testing of the two changes together pans out, I will
> give you a Tested-by -- I am guessing that you don't want to wait
> until the next merge window for these changes.

I was planning to stuff them in tip/locking/urgent, so they'd end up in
this release.

Re: [PATCH] RFC: console: hack up console_lock more v3

2019-05-09 Thread Peter Zijlstra

On Thu, May 09, 2019 at 02:09:03PM +0200, Daniel Vetter wrote:
> Fix this by creating a prinkt_safe_up() which calls wake_up_process
> outside of the spinlock. This isn't correct in full generality, but
> good enough for console_lock:
> 
> - console_lock doesn't use interruptible or killable or timeout down()
>   calls, hence an up() is the only thing that can wake up a process.

Wrong :/ Any task can be woken at any random time. We must, at all
times, assume spurious wakeups will happen.

> +void printk_safe_up(struct semaphore *sem)
> +{
> + unsigned long flags;
> + struct semaphore_waiter *waiter = NULL;
> +
> + raw_spin_lock_irqsave(&sem->lock, flags);
> + if (likely(list_empty(&sem->wait_list))) {
> + sem->count++;
> + } else {
> + waiter = list_first_entry(&sem->wait_list,
> +   struct semaphore_waiter, list);
> + list_del(&waiter->list);
> + waiter->up = true;
> + }
> + raw_spin_unlock_irqrestore(&sem->lock, flags);
> +
> + if (waiter) /* protected by being sole wake source */
> + wake_up_process(waiter->task);
> +}
> +EXPORT_SYMBOL(printk_safe_up);

Since its only used from printk, that EXPORT really isn't needed.

Something like the below might work.

---
 kernel/locking/semaphore.c | 26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/kernel/locking/semaphore.c b/kernel/locking/semaphore.c
index 561acdd39960..ac0a67e95aac 100644
--- a/kernel/locking/semaphore.c
+++ b/kernel/locking/semaphore.c
@@ -38,7 +38,6 @@ static noinline void __down(struct semaphore *sem);
 static noinline int __down_interruptible(struct semaphore *sem);
 static noinline int __down_killable(struct semaphore *sem);
 static noinline int __down_timeout(struct semaphore *sem, long timeout);
-static noinline void __up(struct semaphore *sem);
 
 /**
  * down - acquire the semaphore
@@ -178,14 +177,24 @@ EXPORT_SYMBOL(down_timeout);
  */
 void up(struct semaphore *sem)
 {
+   struct semaphore_waiter *waiter;
+   DEFINE_WAKE_Q(wake_q);
unsigned long flags;
 
raw_spin_lock_irqsave(&sem->lock, flags);
-   if (likely(list_empty(&sem->wait_list)))
+   if (likely(list_empty(&sem->wait_list))) {
sem->count++;
-   else
-   __up(sem);
+   goto unlock;
+   }
+
+   waiter = list_first_entry(&sem->wait_list, struct semaphore_waiter, 
list);
+   list_del(&waiter->list);
+   waiter->up = true;
+   wake_q_add(&wake_q, waiter->task);
+unlock:
raw_spin_unlock_irqrestore(&sem->lock, flags);
+
+   wake_up_q(&wake_q);
 }
 EXPORT_SYMBOL(up);
 
@@ -252,12 +261,3 @@ static noinline int __sched __down_timeout(struct 
semaphore *sem, long timeout)
 {
return __down_common(sem, TASK_UNINTERRUPTIBLE, timeout);
 }
-
-static noinline void __sched __up(struct semaphore *sem)
-{
-   struct semaphore_waiter *waiter = list_first_entry(&sem->wait_list,
-   struct semaphore_waiter, list);
-   list_del(&waiter->list);
-   waiter->up = true;
-   wake_up_process(waiter->task);
-}
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH] RFC: console: hack up console_lock more v3

2019-05-09 Thread Peter Zijlstra

On Thu, May 09, 2019 at 03:06:09PM +0200, Daniel Vetter wrote:
> On Thu, May 9, 2019 at 2:31 PM Peter Zijlstra  wrote:
> > On Thu, May 09, 2019 at 02:09:03PM +0200, Daniel Vetter wrote:
> > > Fix this by creating a prinkt_safe_up() which calls wake_up_process
> > > outside of the spinlock. This isn't correct in full generality, but
> > > good enough for console_lock:
> > >
> > > - console_lock doesn't use interruptible or killable or timeout down()
> > >   calls, hence an up() is the only thing that can wake up a process.
> >
> > Wrong :/ Any task can be woken at any random time. We must, at all
> > times, assume spurious wakeups will happen.
> 
> Out of curiosity, where do these come from? I know about the races
> where you need to recheck on the waiter side to avoid getting stuck,
> but didn't know about this. Are these earlier (possibly spurious)
> wakeups that got held up and delayed for a while, then hit the task
> much later when it's already continued doing something else?

Yes, this. So they all more or less have the form:

CPU0CPU1

enqueue_waiter()
done = true;
if (waiters)
for (;;) {
  if (done)
break;

  ...
}

dequeue_waiter()

do something else again

  wake_up_task



The wake_q thing made the above much more common, but we've had it
forever.

> Or even
> more random, and even if I never put a task on a wait list or anything
> else, ever, it can get woken spuriously?

I had patches that did that on purpose, but no.

> > Something like the below might work.
> 
> Yeah that looks like the proper fix. I guess semaphores are uncritical
> enough that we can roll this out for everyone. Thanks for the hint.

It's actually an optimization that we never did because semaphores are
so uncritical :-)

The thing is, by delaying the wakup until after we've released the
spinlock, the waiter will not contend on the spinlock the moment it
wakes.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v3 08/18] objtool: add kunit_try_catch_throw to the noreturn list

2019-05-13 Thread Peter Zijlstra

On Mon, May 13, 2019 at 10:42:42PM -0700, Brendan Higgins wrote:
> This fixes the following warning seen on GCC 7.3:
>   kunit/test-test.o: warning: objtool: kunit_test_unsuccessful_try() falls 
> through to next function kunit_test_catch()
> 

What is that file and function; no kernel tree near me seems to have
that.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v3 08/18] objtool: add kunit_try_catch_throw to the noreturn list

2019-05-14 Thread Peter Zijlstra

On Tue, May 14, 2019 at 01:12:23AM -0700, Brendan Higgins wrote:
> On Tue, May 14, 2019 at 08:56:43AM +0200, Peter Zijlstra wrote:
> > On Mon, May 13, 2019 at 10:42:42PM -0700, Brendan Higgins wrote:
> > > This fixes the following warning seen on GCC 7.3:
> > >   kunit/test-test.o: warning: objtool: kunit_test_unsuccessful_try() 
> > > falls through to next function kunit_test_catch()
> > > 
> > 
> > What is that file and function; no kernel tree near me seems to have
> > that.
> 
> Oh, sorry about that. The function is added in the following patch,
> "[PATCH v3 09/18] kunit: test: add support for test abort"[1].
> 
> My apologies if this patch is supposed to come after it in sequence, but
> I assumed it should come before otherwise objtool would complain about
> the symbol when it is introduced.

Or send me all patches such that I have context, or have a sane
Changelog that gives me context. Just don't give me one patch with a
crappy changelog.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [patch V2 28/29] stacktrace: Provide common infrastructure

2019-04-19 Thread Peter Zijlstra

On Thu, Apr 18, 2019 at 05:42:55PM +0200, Thomas Gleixner wrote:
> On Thu, 18 Apr 2019, Josh Poimboeuf wrote:

> > Another idea I had (but never got a chance to work on) was to extend the
> > x86 unwind interface to all arches.  So instead of the callbacks, each
> > arch would implement something like this API:

> I surely thought about that, but after staring at all incarnations of
> arch/*/stacktrace.c I just gave up.
> 
> Aside of that quite some archs already have callback based unwinders
> because they use them for more than stacktracing and just have a single
> implementation of that loop.
> 
> I'm fine either way. We can start with x86 and then let archs convert over
> their stuff, but I wouldn't hold my breath that this will be completed in
> the forseeable future.

I suggested the same to Thomas early on, and I even spend the time to
convert some $random arch to the iterator interface, and while it is
indeed entirely feasible, it is _far_ more work.

The callback thing OTOH is flexible enough to do what we want to do now,
and allows converting most archs to it without too much pain (as Thomas
said, many archs are already in this form and only need minor API
adjustments), which gets us in a far better place than we are now.

And we can always go to iterators later on. But I think getting the
generic unwinder improved across all archs is a really important first
step here.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [patch V2 28/29] stacktrace: Provide common infrastructure

2019-04-19 Thread Peter Zijlstra

On Thu, Apr 18, 2019 at 10:41:47AM +0200, Thomas Gleixner wrote:

> +typedef bool (*stack_trace_consume_fn)(void *cookie, unsigned long addr,
> +  bool reliable);

> +void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
> +  struct task_struct *task, struct pt_regs *regs);
> +int arch_stack_walk_reliable(stack_trace_consume_fn consume_entry, void 
> *cookie,
> +  struct task_struct *task);

This bugs me a little; ideally the _reliable() thing would not exists.

Thomas said that the existing __save_stack_trace_reliable() is different
enough for the unification to be non-trivial, but maybe Josh can help
out?

From what I can see the biggest significant differences are:

 - it looks at the regs sets on the stack and for FP bails early
 - bails for khreads and idle (after it does all the hard work!?!)

The first (FP checking for exceptions) should probably be reflected in
consume_fn(.reliable) anyway -- although that would mean a lot of extra
'?' entries where there are none today.

And the second (KTHREAD/IDLE) is something that the generic code can
easily do before calling into the arch unwinder.

Hmm?

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [patch V2 28/29] stacktrace: Provide common infrastructure

2019-04-19 Thread Peter Zijlstra

On Fri, Apr 19, 2019 at 10:32:30AM +0200, Thomas Gleixner wrote:
> On Fri, 19 Apr 2019, Peter Zijlstra wrote:
> > On Thu, Apr 18, 2019 at 10:41:47AM +0200, Thomas Gleixner wrote:
> > 
> > > +typedef bool (*stack_trace_consume_fn)(void *cookie, unsigned long addr,
> > > +  bool reliable);
> > 
> > > +void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
> > > +  struct task_struct *task, struct pt_regs *regs);
> > > +int arch_stack_walk_reliable(stack_trace_consume_fn consume_entry, void 
> > > *cookie,
> > > +  struct task_struct *task);
> > 
> > This bugs me a little; ideally the _reliable() thing would not exists.
> > 
> > Thomas said that the existing __save_stack_trace_reliable() is different
> > enough for the unification to be non-trivial, but maybe Josh can help
> > out?
> > 
> > >From what I can see the biggest significant differences are:
> > 
> >  - it looks at the regs sets on the stack and for FP bails early
> >  - bails for khreads and idle (after it does all the hard work!?!)
> > 
> > The first (FP checking for exceptions) should probably be reflected in
> > consume_fn(.reliable) anyway -- although that would mean a lot of extra
> > '?' entries where there are none today.
> > 
> > And the second (KTHREAD/IDLE) is something that the generic code can
> > easily do before calling into the arch unwinder.
> 
> And looking at the powerpc version of it, that has even more interesting
> extra checks in that function.

Right, but not fundamentally different from determining @reliable I
think.

Anyway, it would be good if someone knowledgable could have a look at
this.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v2 56/79] docs: Documentation/.txt: rename all ReST files to .rst

2019-04-24 Thread Peter Zijlstra

On Mon, Apr 22, 2019 at 10:27:45AM -0300, Mauro Carvalho Chehab wrote:

>  .../{atomic_bitops.txt => atomic_bitops.rst}  |  2 +

What's happend to atomic_t.txt, also NAK, I still occationally touch
these files.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v2 56/79] docs: Documentation/.txt: rename all ReST files to .rst

2019-04-24 Thread Peter Zijlstra

On Tue, Apr 23, 2019 at 10:30:53AM -0600, Jonathan Corbet wrote:
> On Tue, 23 Apr 2019 15:01:32 +0200
> Peter Zijlstra  wrote:
> 
> > But yes, I have 0 motivation to learn or abide by rst. It simply doesn't
> > give me anything in return. There is no upside, only worse text files :/
> 
> So I believe it gives even you one thing in return: documentation that is
> more accessible for both readers and authors.

I know I'm an odd duck; but no. They're _less_ accessible for me, as
both a reader and author. They look 'funny' when read as a text file
(the only way it makes sense to read them; I spend 99% of my time on a
computer looking at monospace text interfaces; mutt, vim and console, in
that approximate order).

When writing, I now have to be bothered about this format crap over just
trying to write a coherent document.

Look at crap like this:

"The memory allocations via :c:func:`kmalloc`, :c:func:`vmalloc`,
:c:func:`kmem_cache_alloc` and"

That should've been written like:

"The memory allocations via kmalloc(), vmalloc(), kmem_cache_alloc()
and"

Heck, that paragraph isn't even properly flowed.

Then there's the endless stuck ':' key, and the mysterious "''" because
\" isn't a character, oh wait.

Bah..
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v2 56/79] docs: Documentation/.txt: rename all ReST files to .rst

2019-04-24 Thread Peter Zijlstra

On Tue, Apr 23, 2019 at 08:55:19AM -0400, Mike Snitzer wrote:
> On Tue, Apr 23 2019 at  4:31am -0400,
> Peter Zijlstra  wrote:
> 
> > On Mon, Apr 22, 2019 at 10:27:45AM -0300, Mauro Carvalho Chehab wrote:
> > 
> > >  .../{atomic_bitops.txt => atomic_bitops.rst}  |  2 +
> > 
> > What's happend to atomic_t.txt, also NAK, I still occationally touch
> > these files.
> 
> Seems Mauro's point is in the future we need to touch these .rst files
> in terms of ReST compatible changes.
> 
> I'm dreading DM documentation changes in the future.. despite Mauro and
> Jon Corbet informing me that ReST is simple, etc.

Well, it _can_ be simple, I've seen examples of rst that were not far
from generated HTML contents. And I must give Jon credit for not
accepting that atrocious crap.

But yes, I have 0 motivation to learn or abide by rst. It simply doesn't
give me anything in return. There is no upside, only worse text files :/
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v2 56/79] docs: Documentation/.txt: rename all ReST files to .rst

2019-04-24 Thread Peter Zijlstra


A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing in e-mail?
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v2 56/79] docs: Documentation/.txt: rename all ReST files to .rst

2019-04-24 Thread Peter Zijlstra

On Tue, Apr 23, 2019 at 11:38:16PM +0200, Borislav Petkov wrote:
> If that is all the changes it would need, then I guess that's ok. Btw,
> those rst-conversion patches don't really show what got changed. Dunno
> if git can even show that properly. I diffed the two files by hand to
> see what got changed, see end of mail.

That is not a happy diff; that table has gotten waay worse to read due
to all that extra table crap.

> ---
> --- mm.old2019-04-23 23:18:55.954335784 +0200
> +++ mm.new2019-04-23 23:18:48.122335821 +0200
> @@ -18,51 +18,68 @@ Notes:
> notation than "16 EB", which few will recognize at first sight as 16 
> exabytes.
> It also shows it nicely how incredibly large 64-bit address space is.
>  
> -
> -Start addr|   Offset   | End addr |  Size   | VM area 
> description
> -
> -  ||  | |
> -  |0   | 7fff |  128 TB | user-space 
> virtual memory, different per mm
> -__||__|_|___
> -  ||  | |
> - 8000 | +128TB | 7fff | ~16M TB | ... huge, 
> almost 64 bits wide hole of non-canonical
> -  ||  | | virtual 
> memory addresses up to the -128 TB
> -  ||  | | starting 
> offset of kernel mappings.
> -__||__|_|___
> -|
> -| Kernel-space 
> virtual memory, shared between all processes:
> -|___
> -  ||  | |
> - 8000 | -128TB | 87ff |8 TB | ... guard 
> hole, also reserved for hypervisor
> - 8800 | -120TB | 887f |  0.5 TB | LDT remap for 
> PTI
> - 8880 | -119.5  TB | c87f |   64 TB | direct mapping 
> of all physical memory (page_offset_base)
> - c880 |  -55.5  TB | c8ff |  0.5 TB | ... unused hole
> - c900 |  -55TB | e8ff |   32 TB | 
> vmalloc/ioremap space (vmalloc_base)
> - e900 |  -23TB | e9ff |1 TB | ... unused hole
> - ea00 |  -22TB | eaff |1 TB | virtual memory 
> map (vmemmap_base)
> - eb00 |  -21TB | ebff |1 TB | ... unused hole
> - ec00 |  -20TB | fbff |   16 TB | KASAN shadow 
> memory
> -__||__|_|
> -|
> -| Identical 
> layout to the 56-bit one from here on:
> -|
> -  ||  | |
> - fc00 |   -4TB | fdff |2 TB | ... unused hole
> -  ||  | | vaddr_end for 
> KASLR
> - fe00 |   -2TB | fe7f |  0.5 TB | cpu_entry_area 
> mapping
> - fe80 |   -1.5  TB | feff |  0.5 TB | ... unused hole
> - ff00 |   -1TB | ff7f |  0.5 TB | %esp fixup 
> stacks
> - ff80 | -512GB | ffee |  444 GB | ... unused hole
> - ffef |  -68GB | fffe |   64 GB | EFI region 
> mapping space
> -  |   -4GB | 7fff |2 GB | ... unused hole
> - 8000 |   -2GB | 9fff |  512 MB | kernel text 
> mapping, mapped to physical address 0
> - 8000 |-2048MB |  | |
> - a000 |-1536MB | feff | 1520 MB | module mapping 
> space
> - ff00 |  -16MB |  | |
> -FIXADDR_START | ~-11MB | ff5f | ~0.5 MB | 
> kernel-internal fixmap range, variable size and offset
> - ff60 |  -10MB | ff600fff |4 kB | legacy 
> vsyscall ABI
> - ffe0 |   -2MB |  |2 MB | ... unused hole
> -__||__

Re: [PATCH v2 56/79] docs: Documentation/.txt: rename all ReST files to .rst

2019-04-24 Thread Peter Zijlstra

On Tue, Apr 23, 2019 at 11:53:49AM -0600, Jonathan Corbet wrote:
> > Look at crap like this:
> > 
> > "The memory allocations via :c:func:`kmalloc`, :c:func:`vmalloc`,
> > :c:func:`kmem_cache_alloc` and"
> > 
> > That should've been written like:
> > 
> > "The memory allocations via kmalloc(), vmalloc(), kmem_cache_alloc()
> > and"
> 
> Yeah, I get it.  That markup generates cross-references, which can be
> seriously useful for readers - we want that.

The funny thing is; that sentence continues (on a new line) like:

"friends are traced and the pointers, together with additional"

So while it then has cross-references to a few functions, all 'friends'
are left dangling. So what's the point of the cross-references?

Also, 'make ctags' and follow tag (ctrl-] for fellow vim users) will get
you to the function, no magic markup required.

> But I do wonder if we
> couldn't do it automatically with just a little bit of scripting work.
> It's not to hard to recognize this_is_a_function(), after all.  I'll look
> into that, it would definitely help to remove some gunk from the source
> docs.

That would be good; less markup is more.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [patch V2 18/29] lockdep: Move stack trace logic into check_prev_add()

2019-04-24 Thread Peter Zijlstra

On Thu, Apr 18, 2019 at 10:41:37AM +0200, Thomas Gleixner wrote:
> There is only one caller of check_prev_add() which hands in a zeroed struct
> stack trace and a function pointer to save_stack(). Inside check_prev_add()
> the stack_trace struct is checked for being empty, which is always
> true. Based on that one code path stores a stack trace which is unused. The
> comment there does not make sense either. It's all leftovers from
> historical lockdep code (cross release).

I was more or less expecting a revert of:

ce07a9415f26 ("locking/lockdep: Make check_prev_add() able to handle external 
stack_trace")

And then I read the comment that went with the "static struct
stack_trace trace" that got removed (in the above commit) and realized
that your patch will consume more stack entries.

The problem is when the held lock stack in check_prevs_add() has multple
trylock entries on top, in that case we call check_prev_add() multiple
times, and this patch will then save the exact same stack-trace multiple
times, consuming static resources.

Possibly we should copy what stackdepot does (but we cannot use it
directly because stackdepot uses locks; but possible we can share bits),
but that is a patch for another day I think.

So while convoluted, perhaps we should retain this code for now.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [patch V2 19/29] lockdep: Simplify stack trace handling

2019-04-24 Thread Peter Zijlstra

On Thu, Apr 18, 2019 at 10:41:38AM +0200, Thomas Gleixner wrote:
> Replace the indirection through struct stack_trace by using the storage
> array based interfaces and storing the information is a small lockdep
> specific data structure.
> 

Acked-by: Peter Zijlstra (Intel) 
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [patch V3 18/29] lockdep: Remove save argument from check_prev_add()

2019-04-25 Thread Peter Zijlstra

On Thu, Apr 25, 2019 at 11:45:11AM +0200, Thomas Gleixner wrote:
> There is only one caller which hands in save_trace as function pointer.
> 
> Signed-off-by: Thomas Gleixner 

Acked-by: Peter Zijlstra (Intel) 
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 1/6] locking: add ww_mutex_(un)lock_for_each helpers

2019-06-14 Thread Peter Zijlstra

On Fri, Jun 14, 2019 at 02:41:20PM +0200, Christian König wrote:
> The ww_mutex implementation allows for detection deadlocks when multiple
> threads try to acquire the same set of locks in different order.
> 
> The problem is that handling those deadlocks was the burden of the user of
> the ww_mutex implementation and at least some users didn't got that right
> on the first try.
> 
> I'm not a big fan of macros, but it still better then implementing the same
> logic at least halve a dozen times. So this patch adds two macros to lock
> and unlock multiple ww_mutex instances with the necessary deadlock handling.
> 
> Signed-off-by: Christian König 
> ---
>  include/linux/ww_mutex.h | 75 
>  1 file changed, 75 insertions(+)
> 
> diff --git a/include/linux/ww_mutex.h b/include/linux/ww_mutex.h
> index 3af7c0e03be5..a0d893b5b114 100644
> --- a/include/linux/ww_mutex.h
> +++ b/include/linux/ww_mutex.h
> @@ -369,4 +369,79 @@ static inline bool ww_mutex_is_locked(struct ww_mutex 
> *lock)
>   return mutex_is_locked(&lock->base);
>  }
>  
> +/**
> + * ww_mutex_unlock_for_each - cleanup after error or contention
> + * @loop: for loop code fragment iterating over all the locks
> + * @pos: code fragment returning the currently handled lock
> + * @contended: the last contended ww_mutex or NULL or ERR_PTR
> + *
> + * Helper to make cleanup after error or lock contention easier.
> + * First unlock the last contended lock and then all other locked ones.
> + */
> +#define ww_mutex_unlock_for_each(loop, pos, contended)   \
> + if (!IS_ERR(contended)) {   \
> + if (contended)  \
> + ww_mutex_unlock(contended); \
> + contended = (pos);  \
> + loop {  \
> + if (contended == (pos)) \
> + break;  \
> + ww_mutex_unlock(pos);   \
> + }   \
> + }
> +
> +/**
> + * ww_mutex_lock_for_each - implement ww_mutex deadlock handling
> + * @loop: for loop code fragment iterating over all the locks
> + * @pos: code fragment returning the currently handled lock
> + * @contended: ww_mutex pointer variable for state handling
> + * @ret: int variable to store the return value of ww_mutex_lock()
> + * @intr: If true ww_mutex_lock_interruptible() is used
> + * @ctx: ww_acquire_ctx pointer for the locking
> + *
> + * This macro implements the necessary logic to lock multiple ww_mutex
> + * instances. Lock contention with backoff and re-locking is handled by the
> + * macro so that the loop body only need to handle other errors and
> + * successfully acquired locks.
> + *
> + * With the @loop and @pos code fragment it is possible to apply this logic
> + * to all kind of containers (array, list, tree, etc...) holding multiple
> + * ww_mutex instances.
> + *
> + * @contended is used to hold the current state we are in. -ENOENT is used to
> + * signal that we are just starting the handling. -EDEADLK means that the
> + * current position is contended and we need to restart the loop after 
> locking
> + * it. NULL means that there is no contention to be handled. Any other value 
> is
> + * the last contended ww_mutex.
> + */
> +#define ww_mutex_lock_for_each(loop, pos, contended, ret, intr, ctx) \
> + for (contended = ERR_PTR(-ENOENT); ({   \
> + __label__ relock, next; \
> + ret = -ENOENT;  \
> + if (contended == ERR_PTR(-ENOENT))  \
> + contended = NULL;   \
> + else if (contended == ERR_PTR(-EDEADLK))\
> + contended = (pos);  \
> + else\
> + goto next;  \
> + loop {  \
> + if (contended == (pos)) {   \
> + contended = NULL;   \
> + continue;   \
> + }   \
> +relock:  
> \
> + ret = !(intr) ? ww_mutex_lock(pos, ctx) :   \
> + ww_mutex_lock_interruptible(pos, ctx);  \
> + if (ret == -EDEADLK) {  \
> + ww_mutex_unlock_for_each(loop, pos, \
> +  contended);\
> + contended

Re: [PATCH 3/6] drm/gem: use new ww_mutex_(un)lock_for_each macros

2019-06-14 Thread Peter Zijlstra

On Fri, Jun 14, 2019 at 02:41:22PM +0200, Christian König wrote:
> Use the provided macros instead of implementing deadlock handling on our own.
> 
> Signed-off-by: Christian König 
> ---
>  drivers/gpu/drm/drm_gem.c | 49 ++-
>  1 file changed, 12 insertions(+), 37 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
> index 50de138c89e0..6e4623d3bee2 100644
> --- a/drivers/gpu/drm/drm_gem.c
> +++ b/drivers/gpu/drm/drm_gem.c
> @@ -1307,51 +1307,26 @@ int
>  drm_gem_lock_reservations(struct drm_gem_object **objs, int count,
> struct ww_acquire_ctx *acquire_ctx)
>  {
> - int contended = -1;
> + struct ww_mutex *contended;
>   int i, ret;
>  
>   ww_acquire_init(acquire_ctx, &reservation_ww_class);
>  
> -retry:
> - if (contended != -1) {
> - struct drm_gem_object *obj = objs[contended];
> -
> - ret = ww_mutex_lock_slow_interruptible(&obj->resv->lock,
> -acquire_ctx);
> - if (ret) {
> - ww_acquire_done(acquire_ctx);
> - return ret;
> - }
> - }
> -
> - for (i = 0; i < count; i++) {
> - if (i == contended)
> - continue;
> -
> - ret = ww_mutex_lock_interruptible(&objs[i]->resv->lock,
> -   acquire_ctx);
> - if (ret) {
> - int j;
> -
> - for (j = 0; j < i; j++)
> - ww_mutex_unlock(&objs[j]->resv->lock);
> -
> - if (contended != -1 && contended >= i)
> - ww_mutex_unlock(&objs[contended]->resv->lock);
> -
> - if (ret == -EDEADLK) {
> - contended = i;
> - goto retry;

retry here, starts the whole locking loop.

> - }
> -
> - ww_acquire_done(acquire_ctx);
> - return ret;
> - }
> - }

+#define ww_mutex_unlock_for_each(loop, pos, contended) \
+   if (!IS_ERR(contended)) {   \
+   if (contended)  \
+   ww_mutex_unlock(contended); \
+   contended = (pos);  \
+   loop {  \
+   if (contended == (pos)) \
+   break;  \
+   ww_mutex_unlock(pos);   \
+   }   \
+   }
+

+#define ww_mutex_lock_for_each(loop, pos, contended, ret, intr, ctx)   \
+   for (contended = ERR_PTR(-ENOENT); ({   \
+   __label__ relock, next; \
+   ret = -ENOENT;  \
+   if (contended == ERR_PTR(-ENOENT))  \
+   contended = NULL;   \
+   else if (contended == ERR_PTR(-EDEADLK))\
+   contended = (pos);  \
+   else\
+   goto next;  \
+   loop {  \
+   if (contended == (pos)) {   \
+   contended = NULL;   \
+   continue;   \
+   }   \
+relock:
\
+   ret = !(intr) ? ww_mutex_lock(pos, ctx) :   \
+   ww_mutex_lock_interruptible(pos, ctx);  \
+   if (ret == -EDEADLK) {  \
+   ww_mutex_unlock_for_each(loop, pos, \
+contended);\
+   contended = ERR_PTR(-EDEADLK);  \
+   goto relock;\

while relock here continues where it left of and doesn't restart @loop.
Or am I reading this monstrosity the wrong way?

+   }   \
+   break;  \
+next:  \
+   continue;   \
+   }   \
+   }), ret != -ENOENT;)
+

> + ww_mutex_lock_for_each(for (i = 0; i

Re: [PATCH 3/6] drm/gem: use new ww_mutex_(un)lock_for_each macros

2019-06-14 Thread Peter Zijlstra

On Fri, Jun 14, 2019 at 02:41:22PM +0200, Christian König wrote:
> Use the provided macros instead of implementing deadlock handling on our own.
> 
> Signed-off-by: Christian König 
> ---
>  drivers/gpu/drm/drm_gem.c | 49 ++-
>  1 file changed, 12 insertions(+), 37 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
> index 50de138c89e0..6e4623d3bee2 100644
> --- a/drivers/gpu/drm/drm_gem.c
> +++ b/drivers/gpu/drm/drm_gem.c
> @@ -1307,51 +1307,26 @@ int
>  drm_gem_lock_reservations(struct drm_gem_object **objs, int count,
> struct ww_acquire_ctx *acquire_ctx)
>  {
> - int contended = -1;
> + struct ww_mutex *contended;
>   int i, ret;
>  
>   ww_acquire_init(acquire_ctx, &reservation_ww_class);
>  
> -retry:
> - if (contended != -1) {
> - struct drm_gem_object *obj = objs[contended];
> -
> - ret = ww_mutex_lock_slow_interruptible(&obj->resv->lock,
> -acquire_ctx);
> - if (ret) {
> - ww_acquire_done(acquire_ctx);
> - return ret;
> - }
> - }
> -
> - for (i = 0; i < count; i++) {
> - if (i == contended)
> - continue;
> -
> - ret = ww_mutex_lock_interruptible(&objs[i]->resv->lock,
> -   acquire_ctx);
> - if (ret) {
> - int j;
> -
> - for (j = 0; j < i; j++)
> - ww_mutex_unlock(&objs[j]->resv->lock);
> -
> - if (contended != -1 && contended >= i)
> - ww_mutex_unlock(&objs[contended]->resv->lock);
> -
> - if (ret == -EDEADLK) {
> - contended = i;
> - goto retry;
> - }
> -
> - ww_acquire_done(acquire_ctx);
> - return ret;
> - }
> - }

I note all the sites you use this on are simple idx iterators; so how
about something like so:

int ww_mutex_unlock_all(int count, void *data, struct ww_mutex *(*func)(int, 
void *))
{
int i;

for (i = 0; i < count; i++) {
lock = func(i, data);
ww_mutex_unlock(lock);
}
}

int ww_mutex_lock_all(int count, struct ww_acquire_context *acquire_ctx, bool 
intr,
  void *data, struct ww_mutex *(*func)(int, void *))
{
int i, ret, contended = -1;
struct ww_mutex *lock;

retry:
if (contended != -1) {
lock = func(contended, data);
if (intr)
ret = ww_mutex_lock_slow_interruptible(lock, 
acquire_ctx);
else
ret = ww_mutex_lock_slow(lock, acquire_ctx), 0;

if (ret) {
ww_acquire_done(acquire_ctx);
return ret;
}
}

for (i = 0; i < count; i++) {
if (i == contended)
continue;

lock = func(i, data);
if (intr)
ret = ww_mutex_lock_interruptible(lock, acquire_ctx);
else
ret = ww_mutex_lock(lock, acquire_ctx), 0;

if (ret) {
ww_mutex_unlock_all(i, data, func);
if (contended > i) {
lock = func(contended, data);
ww_mutex_unlock(lock);
}

if (ret == -EDEADLK) {
contended = i;
goto retry;
}

ww_acquire_done(acquire_ctx);
return ret;
}
}

ww_acquire_done(acquire_ctx);
return 0;
}

> + ww_mutex_lock_for_each(for (i = 0; i < count; i++),
> +&objs[i]->resv->lock, contended, ret, true,
> +acquire_ctx)
> + if (ret)
> + goto error;

which then becomes:

struct ww_mutex *gem_ww_mutex_func(int i, void *data)
{
struct drm_gem_object **objs = data;
return &objs[i]->resv->lock;
}

ret = ww_mutex_lock_all(count, acquire_ctx, true, objs, 
gem_ww_mutex_func);

>   ww_acquire_done(acquire_ctx);
>  
>   return 0;
> +
> +error:
> + ww_mutex_unlock_for_each(for (i = 0; i < count; i++),
> +  &objs[i]->resv->lock, contended);
> + ww_acquire_done(acquire_ctx);
> + return ret;
>  }
>  EXPORT_SYMBOL(drm_gem_lock_reservations);
>  
> -- 
> 2.17.1
> 
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listi

Re: [PATCH 3/6] drm/gem: use new ww_mutex_(un)lock_for_each macros

2019-06-14 Thread Peter Zijlstra

On Fri, Jun 14, 2019 at 03:06:10PM +0200, Christian König wrote:
> Am 14.06.19 um 14:59 schrieb Peter Zijlstra:
> > +#define ww_mutex_lock_for_each(loop, pos, contended, ret, intr, ctx)   \
> > +   for (contended = ERR_PTR(-ENOENT); ({   \
> > +   __label__ relock, next; \
> > +   ret = -ENOENT;  \
> > +   if (contended == ERR_PTR(-ENOENT))  \
> > +   contended = NULL;   \
> > +   else if (contended == ERR_PTR(-EDEADLK))\
> > +   contended = (pos);  \
> > +   else\
> > +   goto next;  \
> > +   loop {  \
> > +   if (contended == (pos)) {   \
> > +   contended = NULL;   \
> > +   continue;   \
> > +   }   \
> > +relock:
> > \
> > +   ret = !(intr) ? ww_mutex_lock(pos, ctx) :   \
> > +   ww_mutex_lock_interruptible(pos, ctx);  \
> > +   if (ret == -EDEADLK) {  \
> > +   ww_mutex_unlock_for_each(loop, pos, \
> > +contended);\
> > +   contended = ERR_PTR(-EDEADLK);  \
> > +   goto relock;\
> > 
> > while relock here continues where it left of and doesn't restart @loop.
> > Or am I reading this monstrosity the wrong way?
> 
> contended = ERR_PTR(-EDEADLK) makes sure that the whole loop is restarted
> after retrying to acquire the lock.
> 
> See the "if" above "loop".

ARGH, the loop inside the loop... Yeah, maybe, brain hurts.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v1 12/22] docs: driver-api: add .rst files from the main dir

2019-06-19 Thread Peter Zijlstra

On Wed, Jun 19, 2019 at 07:22:18AM -0300, Mauro Carvalho Chehab wrote:
> Hi Daniel,
> 
> Em Wed, 19 Jun 2019 11:05:57 +0200
> Daniel Vetter  escreveu:
> 
> > On Tue, Jun 18, 2019 at 10:55 PM Mauro Carvalho Chehab
> >  wrote:
> > > diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
> > > index fa30dfcfc3c8..b0f948d8733b 100644
> > > --- a/Documentation/gpu/drm-mm.rst
> > > +++ b/Documentation/gpu/drm-mm.rst
> > > @@ -320,7 +320,7 @@ struct :c:type:`struct file_operations 
> > > ` get_unmapped_area
> > >  field with a pointer on :c:func:`drm_gem_cma_get_unmapped_area`.
> > >
> > >  More detailed information about get_unmapped_area can be found in
> > > -Documentation/nommu-mmap.rst
> > > +Documentation/driver-api/nommu-mmap.rst  
> > 
> > Random drive-by comment: Could we convert these into hyperlinks within
> > sphinx somehow, without making them less useful as raw file references
> > (with vim I can just type 'gf' and it works, emacs probably the same).
> > -Daniel
> 
> Short answer: I don't know how vim/emacs would recognize Sphinx tags.

No, the other way around, Sphinx can recognize local files and treat
them special. That way we keep the text readable.

Same with that :c:func:'foo' crap, that needs to die, and Sphinx needs
to be taught about foo().
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v1 12/22] docs: driver-api: add .rst files from the main dir

2019-06-19 Thread Peter Zijlstra

On Tue, Jun 18, 2019 at 05:53:17PM -0300, Mauro Carvalho Chehab wrote:

>  .../{ => driver-api}/atomic_bitops.rst|  2 -

That's a .txt file, big fat NAK for making it an rst.

>  .../{ => driver-api}/futex-requeue-pi.rst |  2 -

>  .../{ => driver-api}/gcc-plugins.rst  |  2 -

>  Documentation/{ => driver-api}/kprobes.rst|  2 -
>  .../{ => driver-api}/percpu-rw-semaphore.rst  |  2 -

More NAK for rst conversion

>  Documentation/{ => driver-api}/pi-futex.rst   |  2 -
>  .../{ => driver-api}/preempt-locking.rst  |  2 -

>  Documentation/{ => driver-api}/rbtree.rst |  2 -

>  .../{ => driver-api}/robust-futex-ABI.rst |  2 -
>  .../{ => driver-api}/robust-futexes.rst   |  2 -

>  .../{ => driver-api}/speculation.rst  |  8 +--
>  .../{ => driver-api}/static-keys.rst  |  2 -

>  .../{ => driver-api}/this_cpu_ops.rst |  2 -

>  Documentation/locking/rt-mutex.rst|  2 +-

NAK. None of the above have anything to do with driver-api.

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v1 12/22] docs: driver-api: add .rst files from the main dir

2019-06-19 Thread Peter Zijlstra

On Wed, Jun 19, 2019 at 01:43:56PM +0200, Peter Zijlstra wrote:
> On Tue, Jun 18, 2019 at 05:53:17PM -0300, Mauro Carvalho Chehab wrote:
> 
> >  .../{ => driver-api}/atomic_bitops.rst|  2 -
> 
> That's a .txt file, big fat NAK for making it an rst.

Also, how many bloody times do I have to keep telling this? It is
starting to get _REALLY_ annoying.


___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v1 12/22] docs: driver-api: add .rst files from the main dir

2019-06-19 Thread Peter Zijlstra

On Wed, Jun 19, 2019 at 01:45:51PM +0200, Peter Zijlstra wrote:
> On Wed, Jun 19, 2019 at 01:43:56PM +0200, Peter Zijlstra wrote:
> > On Tue, Jun 18, 2019 at 05:53:17PM -0300, Mauro Carvalho Chehab wrote:
> > 
> > >  .../{ => driver-api}/atomic_bitops.rst|  2 -
> > 
> > That's a .txt file, big fat NAK for making it an rst.
> 
> Also, how many bloody times do I have to keep telling this? It is
> starting to get _REALLY_ annoying.

Also, cross-posting to moderated lists is rude.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

[RFC][PATCH] wake_up_var() memory ordering

2019-06-24 Thread Peter Zijlstra

Hi all,

I tried using wake_up_var() today and accidentally noticed that it
didn't imply an smp_mb() and specifically requires it through
wake_up_bit() / waitqueue_active().

Now, wake_up_bit() doesn't imply the barrier because it is assumed to be
used with the atomic bitops API which either implies (test_and_clear) or
only needs smp_mb__after_atomic(), which is 'much' cheaper than an
unconditional smp_mb().

Still, while auditing all that, I found a whole bunch of things that
could be improved. There were missing barriers, superfluous barriers and
a whole bunch of sites that could use clear_and_wake_up_bit().

So this fixes all wake_up_bit() usage without actually changing
semantics of it (which are unfortunate but understandable). This does
however change the semantics of wake_up_var(); even though wake_up_var()
is most often used with atomics and then the additional smp_mb() is most
often superfluous :/

There isn't really a good option here, comments (other than I need to
split this up)?


---
 drivers/bluetooth/btmtksdio.c   |  5 +
 drivers/bluetooth/btmtkuart.c   |  5 +
 drivers/bluetooth/hci_mrvl.c|  8 ++--
 drivers/gpu/drm/i915/i915_reset.c   |  6 ++
 drivers/md/dm-bufio.c   | 10 ++
 drivers/media/usb/dvb-usb-v2/dvb_usb_core.c | 15 ---
 fs/afs/fs_probe.c   |  1 +
 fs/afs/server.c |  1 +
 fs/afs/vl_probe.c   |  1 +
 fs/afs/volume.c |  1 +
 fs/aio.c|  4 +---
 fs/block_dev.c  |  1 +
 fs/btrfs/extent_io.c|  4 +---
 fs/cachefiles/namei.c   |  1 +
 fs/cifs/connect.c   |  3 +--
 fs/cifs/misc.c  | 15 +--
 fs/fscache/cookie.c |  2 ++
 fs/fscache/object.c |  2 ++
 fs/fscache/page.c   |  3 +++
 fs/gfs2/glock.c |  8 ++--
 fs/gfs2/glops.c |  1 +
 fs/gfs2/lock_dlm.c  |  8 ++--
 fs/gfs2/recovery.c  |  4 +---
 fs/gfs2/super.c |  1 +
 fs/gfs2/sys.c   |  4 +---
 fs/nfs/nfs4state.c  |  4 +---
 fs/nfs/pnfs_nfs.c   |  4 +---
 fs/nfsd/nfs4recover.c   |  4 +---
 fs/orangefs/file.c  |  2 +-
 kernel/sched/wait_bit.c |  1 +
 net/bluetooth/hci_event.c   |  5 +
 net/rds/ib_recv.c   |  1 +
 security/keys/gc.c  |  5 ++---
 33 files changed, 50 insertions(+), 90 deletions(-)

diff --git a/drivers/bluetooth/btmtksdio.c b/drivers/bluetooth/btmtksdio.c
index 813338288453..27523cfeac9a 100644
--- a/drivers/bluetooth/btmtksdio.c
+++ b/drivers/bluetooth/btmtksdio.c
@@ -356,11 +356,8 @@ static int btmtksdio_recv_event(struct hci_dev *hdev, 
struct sk_buff *skb)
 
if (hdr->evt == HCI_EV_VENDOR) {
if (test_and_clear_bit(BTMTKSDIO_TX_WAIT_VND_EVT,
-  &bdev->tx_state)) {
-   /* Barrier to sync with other CPUs */
-   smp_mb__after_atomic();
+  &bdev->tx_state))
wake_up_bit(&bdev->tx_state, BTMTKSDIO_TX_WAIT_VND_EVT);
-   }
}
 
return 0;
diff --git a/drivers/bluetooth/btmtkuart.c b/drivers/bluetooth/btmtkuart.c
index f5dbeec8e274..7fe324df3799 100644
--- a/drivers/bluetooth/btmtkuart.c
+++ b/drivers/bluetooth/btmtkuart.c
@@ -340,11 +340,8 @@ static int btmtkuart_recv_event(struct hci_dev *hdev, 
struct sk_buff *skb)
 
if (hdr->evt == HCI_EV_VENDOR) {
if (test_and_clear_bit(BTMTKUART_TX_WAIT_VND_EVT,
-  &bdev->tx_state)) {
-   /* Barrier to sync with other CPUs */
-   smp_mb__after_atomic();
+  &bdev->tx_state))
wake_up_bit(&bdev->tx_state, BTMTKUART_TX_WAIT_VND_EVT);
-   }
}
 
return 0;
diff --git a/drivers/bluetooth/hci_mrvl.c b/drivers/bluetooth/hci_mrvl.c
index 50212ac629e3..f03294d39d08 100644
--- a/drivers/bluetooth/hci_mrvl.c
+++ b/drivers/bluetooth/hci_mrvl.c
@@ -157,9 +157,7 @@ static int mrvl_recv_fw_req(struct hci_dev *hdev, struct 
sk_buff *skb)
 
mrvl->tx_len = le16_to_cpu(pkt->lhs);
 
-   clear_bit(STATE_FW_REQ_PENDING, &mrvl->flags);
-   smp_mb__after_atomic();
-   wake_up_bit(&mrvl->flags, STATE_FW_REQ_PENDING);
+   clear_and_wake_up_bit(STATE_FW_REQ_PENDING, &mrvl->flags);
 
 done:
kfree_skb(skb);
@@ -192,9 +190,7 @@ static int mrvl_recv_chi

Re: [RFC][PATCH] wake_up_var() memory ordering

2019-06-25 Thread Peter Zijlstra

(sorry for cross-posting to moderated lists btw, I've since
 acquired a patch to get_maintainers.pl that wil exclude them
 in the future)

On Tue, Jun 25, 2019 at 08:51:01AM +0100, David Howells wrote:
> Peter Zijlstra  wrote:
> 
> > I tried using wake_up_var() today and accidentally noticed that it
> > didn't imply an smp_mb() and specifically requires it through
> > wake_up_bit() / waitqueue_active().
> 
> Thinking about it again, I'm not sure why you need to add the barrier when
> wake_up() (which this is a wrapper around) is required to impose a barrier at
> the front if there's anything to wake up (ie. the wait queue isn't empty).
> 
> If this is insufficient, does it make sense just to have wake_up*() functions
> do an unconditional release or full barrier right at the front, rather than it
> being conditional on something being woken up?

The curprit is __wake_up_bit()'s usage of waitqueue_active(); it is this
latter (see its comment) that requires the smp_mb().

wake_up_bit() and wake_up_var() are wrappers around __wake_up_bit().

Without this barrier it is possible for the waitqueue_active() load to
be hoisted over the cond=true store and the remote end can miss the
store and we can miss its enqueue and we'll all miss a wakeup and get
stuck.

Adding an smp_mb() (or use wq_has_sleeper()) in __wake_up_bit() would be
nice, but I fear some people will complain about overhead, esp. since
about half the sites don't need the barrier due to being behind
test_and_clear_bit() and the other half using smp_mb__after_atomic()
after some clear_bit*() variant.

There's a few sites that seem to open-code
wait_var_event()/wake_up_var() and those actually need the full
smp_mb(), but then maybe they should be converted to var instread of bit
anyway.

> > @@ -619,9 +614,7 @@ static int dvb_usb_fe_sleep(struct dvb_frontend *fe)
> >  err:
> > if (!adap->suspend_resume_active) {
> > adap->active_fe = -1;
> 
> I'm wondering if there's a missing barrier here.  Should the clear_bit() on
> the next line be clear_bit_unlock() or clear_bit_release()?

That looks reasonable, but I'd like to hear from the DVB folks on that.

> > -   clear_bit(ADAP_SLEEP, &adap->state_bits);
> > -   smp_mb__after_atomic();
> > -   wake_up_bit(&adap->state_bits, ADAP_SLEEP);
> > +   clear_and_wake_up_bit(ADAP_SLEEP, &adap->state_bits);
> > }
> >  
> > dev_dbg(&d->udev->dev, "%s: ret=%d\n", __func__, ret);
> > diff --git a/fs/afs/fs_probe.c b/fs/afs/fs_probe.c
> > index cfe62b154f68..377ee07d5f76 100644
> > --- a/fs/afs/fs_probe.c
> > +++ b/fs/afs/fs_probe.c
> > @@ -18,6 +18,7 @@ static bool afs_fs_probe_done(struct afs_server *server)
> >  
> > wake_up_var(&server->probe_outstanding);
> > clear_bit_unlock(AFS_SERVER_FL_PROBING, &server->flags);
> > +   smp_mb__after_atomic();
> > wake_up_bit(&server->flags, AFS_SERVER_FL_PROBING);
> > return true;
> >  }
> 
> Looking at this and the dvb one, does it make sense to stick the release
> semantics of clear_bit_unlock() into clear_and_wake_up_bit()?

I was thinking of adding another helper, maybe unlock_and_wake_up_bit()
that included that extra barrier, but maybe making it unconditional
isn't the worst idea.

> Also, should clear_bit_unlock() be renamed to clear_bit_release() (and
> similarly test_and_set_bit_lock() -> test_and_set_bit_acquire()) if we seem to
> be trying to standardise on that terminology.

That definitely makes sense to me, there's only 157 clear_bit_unlock()
and 76 test_and_set_bit_lock() users (note the asymetry of that).

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [RFC][PATCH] wake_up_var() memory ordering

2019-06-25 Thread Peter Zijlstra

On Tue, Jun 25, 2019 at 11:19:35AM +0200, Andreas Gruenbacher wrote:
> > diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c
> > index cf4c767005b1..29ea5da7 100644
> > --- a/fs/gfs2/glops.c
> > +++ b/fs/gfs2/glops.c
> > @@ -227,6 +227,7 @@ static void gfs2_clear_glop_pending(struct gfs2_inode 
> > *ip)
> > return;
> >
> > clear_bit_unlock(GIF_GLOP_PENDING, &ip->i_flags);
> > +   smp_mb__after_atomic();
> > wake_up_bit(&ip->i_flags, GIF_GLOP_PENDING);
> 
> This should become clear_and_wake_up_bit as well, right? There are
> several more instances of the same pattern.

Only if we do as David suggested and make clean_and_wake_up_bit()
provide the RELEASE barrier.

That is, currently clear_and_wake_up_bit() is

clear_bit()
smp_mb__after_atomic();
wake_up_bit();

But the above is:

clear_bit_unlock();
smp_mb__after_atomic();
wake_up_bit()

the difference is that _unlock() uses RELEASE semantics, where
clear_bit() does not.

The difference is illustrated with something like:

cond = true;
clear_bit()
smp_mb__after_atomic();
wake_up_bit();

In this case, a remote CPU can first observe the clear_bit() and then
the 'cond = true' store. When we use clear_bit_unlock() this is not
possible, because the RELEASE barrier ensures that everything before,
stays before.

> >  }
> >
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [RFC][PATCH] wake_up_var() memory ordering

2019-06-25 Thread Peter Zijlstra

On Tue, Jun 25, 2019 at 02:12:22PM +0200, Andreas Gruenbacher wrote:

> > Only if we do as David suggested and make clean_and_wake_up_bit()
> > provide the RELEASE barrier.
> 
> (It's clear_and_wake_up_bit, not clean_and_wake_up_bit.)

Yes, typing hard.

> > That is, currently clear_and_wake_up_bit() is
> >
> > clear_bit()
> > smp_mb__after_atomic();
> > wake_up_bit();
> >

> Now I'm confused because clear_and_wake_up_bit() in mainline does use
> clear_bit_unlock(), so it's the exact opposite of what you just said.

Argh; clearly I couldn't read. And then yes, you're right.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 1/2] dma-buf: add reservation_context for deadlock handling

2019-06-25 Thread Peter Zijlstra

On Tue, Jun 25, 2019 at 03:55:06PM +0200, Christian König wrote:

> diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
> index 4d32e2c67862..9e53e42b053a 100644
> --- a/drivers/dma-buf/reservation.c
> +++ b/drivers/dma-buf/reservation.c
> @@ -55,6 +55,88 @@ EXPORT_SYMBOL(reservation_seqcount_class);
>  const char reservation_seqcount_string[] = "reservation_seqcount";
>  EXPORT_SYMBOL(reservation_seqcount_string);
>  
> +/**
> + * reservation_context_init - initialize a reservation context
> + * @ctx: the context to initialize
> + *
> + * Start using this reservation context to lock reservation objects for 
> update.
> + */
> +void reservation_context_init(struct reservation_context *ctx)
> +{
> + ww_acquire_init(&ctx->ctx, &reservation_ww_class);
> + init_llist_head(&ctx->locked);

ctx->locked = NULL;

> + ctx->contended = NULL;
> +}
> +EXPORT_SYMBOL(reservation_context_init);
> +
> +/**
> + * reservation_context_unlock_all - unlock all reservation objects
> + * @ctx: the context which holds the reservation objects
> + *
> + * Unlocks all reservation objects locked with this context.
> + */
> +void reservation_context_unlock_all(struct reservation_context *ctx)
> +{
> + struct reservation_object *obj, *next;
> +
> + if (ctx->contended)
> + ww_mutex_unlock(&ctx->contended->lock);
> + ctx->contended = NULL;
> +
> + llist_for_each_entry_safe(obj, next, ctx->locked.first, locked)
> + ww_mutex_unlock(&obj->lock);

for (obj = ctx->locked; obj && (next = obj->locked, true); obj = next)

> + init_llist_head(&ctx->locked);

ctx->locked = NULL;
> +}
> +EXPORT_SYMBOL(reservation_context_unlock_all);
> +
> +/**
> + * reservation_context_lock - lock a reservation object with deadlock 
> handling
> + * @ctx: the context which should be used to lock the object
> + * @obj: the object which needs to be locked
> + * @interruptible: if we should wait interruptible or not
> + *
> + * Use @ctx to lock the reservation object. If a deadlock is detected we 
> backoff
> + * by releasing all locked objects and use the slow path to lock the 
> reservation
> + * object. After successfully locking in the slow path -EDEADLK is returned 
> to
> + * signal that all other locks must be re-taken as well.
> + */
> +int reservation_context_lock(struct reservation_context *ctx,
> +  struct reservation_object *obj,
> +  bool interruptible)
> +{
> + int ret = 0;
> +
> + if (unlikely(ctx->contended == obj))
> + ctx->contended = NULL;
> + else if (interruptible)
> + ret = ww_mutex_lock_interruptible(&obj->lock, &ctx->ctx);
> + else
> + ret = ww_mutex_lock(&obj->lock, &ctx->ctx);
> +
> + if (likely(!ret)) {
> + /* don't use llist_add here, we have separate locking */
> + obj->locked.next = ctx->locked.first;
> + ctx->locked.first = &obj->locked;

Since you're not actually using llist, could you then maybe open-code
the whole thing and not rely on its implementation?

That is, it hardly seems worth the trouble, as shown the bits you do use
are hardly longer than just writing it out. Also, I was confused by the
use of llist, as I wondered where the concurrency came from.

> + return 0;
> + }
> + if (unlikely(ret != -EDEADLK))
> + return ret;
> +
> + reservation_context_unlock_all(ctx);
> +
> + if (interruptible) {
> + ret = ww_mutex_lock_slow_interruptible(&obj->lock, &ctx->ctx);
> + if (unlikely(ret))
> + return ret;
> + } else {
> + ww_mutex_lock_slow(&obj->lock, &ctx->ctx);
> + }
> +
> + ctx->contended = obj;
> + return -EDEADLK;
> +}
> +EXPORT_SYMBOL(reservation_context_lock);

> diff --git a/include/linux/reservation.h b/include/linux/reservation.h
> index ee750765cc94..a8a52e5d3e80 100644
> --- a/include/linux/reservation.h
> +++ b/include/linux/reservation.h

> +struct reservation_context {
> + struct ww_acquire_ctx ctx;
> + struct llist_head locked;

struct reservation_object *locked;

> + struct reservation_object *contended;
> +};

> @@ -71,6 +108,7 @@ struct reservation_object_list {
>   */
>  struct reservation_object {
>   struct ww_mutex lock;
> + struct llist_node locked;

struct reservation_object *locked;

>   seqcount_t seq;
>  
>   struct dma_fence __rcu *fence_excl;
> -- 
> 2.17.1
> 
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 00/34] put_user_pages(): miscellaneous call sites

2019-08-02 Thread Peter Zijlstra

On Thu, Aug 01, 2019 at 07:16:19PM -0700, john.hubb...@gmail.com wrote:

> This is part a tree-wide conversion, as described in commit fc1d8e7cca2d
> ("mm: introduce put_user_page*(), placeholder versions"). That commit
> has an extensive description of the problem and the planned steps to
> solve it, but the highlites are:

That is one horridly mangled Changelog there :-/ It looks like it's
partially duplicated.

Anyway; no objections to any of that, but I just wanted to mention that
there are other problems with long term pinning that haven't been
mentioned, notably they inhibit compaction.

A long time ago I proposed an interface to mark pages as pinned, such
that we could run compaction before we actually did the pinning.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 15/15] seqlock: drop seqcount_ww_mutex_t

2022-04-07 Thread Peter Zijlstra

On Thu, Apr 07, 2022 at 10:59:46AM +0200, Christian König wrote:
> Daniel pointed out that this series removes the last user of
> seqcount_ww_mutex_t, so let's drop this.
> 
> Signed-off-by: Christian König 

Acked-by: Peter Zijlstra (Intel)

Re: [PATCH -next] treewide: remove unused argument in lock_release()

2019-10-08 Thread Peter Zijlstra

On Tue, Oct 08, 2019 at 06:33:51PM +0200, Daniel Vetter wrote:
> On Thu, Sep 19, 2019 at 12:09:40PM -0400, Qian Cai wrote:
> > Since the commit b4adfe8e05f1 ("locking/lockdep: Remove unused argument
> > in __lock_release"), @nested is no longer used in lock_release(), so
> > remove it from all lock_release() calls and friends.
> > 
> > Signed-off-by: Qian Cai 
> 
> Ack on the concept and for the drm parts (and feel free to keep the ack if
> you inevitably have to respin this later on). Might result in some
> conflicts, but welp we need to keep Linus busy :-)
> 
> Acked-by: Daniel Vetter 

Thanks Daniel!

Re: [PATCH -next] treewide: remove unused argument in lock_release()

2019-09-30 Thread Peter Zijlstra

On Thu, Sep 19, 2019 at 12:09:40PM -0400, Qian Cai wrote:
> Since the commit b4adfe8e05f1 ("locking/lockdep: Remove unused argument
> in __lock_release"), @nested is no longer used in lock_release(), so
> remove it from all lock_release() calls and friends.

Right; I never did this cleanup for not wanting the churn, but as long
as it applies I'll take it.

> Signed-off-by: Qian Cai 
> ---
>  drivers/gpu/drm/drm_connector.c   |  2 +-
>  drivers/gpu/drm/i915/gem/i915_gem_shrinker.c  |  6 +++---
>  drivers/gpu/drm/i915/gt/intel_engine_pm.c |  2 +-
>  drivers/gpu/drm/i915/i915_request.c   |  2 +-
>  drivers/tty/tty_ldsem.c   |  8 
>  fs/dcache.c   |  2 +-
>  fs/jbd2/transaction.c |  4 ++--
>  fs/kernfs/dir.c   |  4 ++--
>  fs/ocfs2/dlmglue.c|  2 +-
>  include/linux/jbd2.h  |  2 +-
>  include/linux/lockdep.h   | 21 ++---
>  include/linux/percpu-rwsem.h  |  4 ++--
>  include/linux/rcupdate.h  |  2 +-
>  include/linux/rwlock_api_smp.h| 16 
>  include/linux/seqlock.h   |  4 ++--
>  include/linux/spinlock_api_smp.h  |  8 
>  include/linux/ww_mutex.h  |  2 +-
>  include/net/sock.h|  2 +-
>  kernel/bpf/stackmap.c |  2 +-
>  kernel/cpu.c  |  2 +-
>  kernel/locking/lockdep.c  |  3 +--
>  kernel/locking/mutex.c|  4 ++--
>  kernel/locking/rtmutex.c  |  6 +++---
>  kernel/locking/rwsem.c| 10 +-
>  kernel/printk/printk.c| 10 +-
>  kernel/sched/core.c   |  2 +-
>  lib/locking-selftest.c| 24 
>  mm/memcontrol.c   |  2 +-
>  net/core/sock.c   |  2 +-
>  tools/lib/lockdep/include/liblockdep/common.h |  3 +--
>  tools/lib/lockdep/include/liblockdep/mutex.h  |  2 +-
>  tools/lib/lockdep/include/liblockdep/rwlock.h |  2 +-
>  tools/lib/lockdep/preload.c   | 16 
>  33 files changed, 90 insertions(+), 93 deletions(-)

Thanks!

Re: linux-next: build failure after merge of the tip tree

2019-11-21 Thread Peter Zijlstra

On Thu, Nov 21, 2019 at 02:54:03PM +1100, Stephen Rothwell wrote:
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
> index f940526c5889..63e734a125fb 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
> @@ -473,7 +473,7 @@ TRACE_EVENT(amdgpu_ib_pipe_sync,
>   TP_PROTO(struct amdgpu_job *sched_job, struct dma_fence *fence),
>   TP_ARGS(sched_job, fence),
>   TP_STRUCT__entry(
> -  __string(ring, sched_job->base.sched->name);
> +  __string(ring, sched_job->base.sched->name)
>__field(uint64_t, id)
>__field(struct dma_fence *, fence)
>__field(uint64_t, ctx)

Correct, ';' there is invalid and now results in very verbose compile
errors :-)


signature.asc
Description: PGP signature
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 3/4] kernel.h: Add non_block_start/end()

2019-08-23 Thread Peter Zijlstra

On Tue, Aug 20, 2019 at 10:24:40PM +0200, Daniel Vetter wrote:
> On Tue, Aug 20, 2019 at 10:19:01AM +0200, Daniel Vetter wrote:
> > In some special cases we must not block, but there's not a
> > spinlock, preempt-off, irqs-off or similar critical section already
> > that arms the might_sleep() debug checks. Add a non_block_start/end()
> > pair to annotate these.
> > 
> > This will be used in the oom paths of mmu-notifiers, where blocking is
> > not allowed to make sure there's forward progress. Quoting Michal:
> > 
> > "The notifier is called from quite a restricted context - oom_reaper -
> > which shouldn't depend on any locks or sleepable conditionals. The code
> > should be swift as well but we mostly do care about it to make a forward
> > progress. Checking for sleepable context is the best thing we could come
> > up with that would describe these demands at least partially."
> > 
> > Peter also asked whether we want to catch spinlocks on top, but Michal
> > said those are less of a problem because spinlocks can't have an
> > indirect dependency upon the page allocator and hence close the loop
> > with the oom reaper.
> > 
> > Suggested by Michal Hocko.
> > 
> > v2:
> > - Improve commit message (Michal)
> > - Also check in schedule, not just might_sleep (Peter)
> > 
> > v3: It works better when I actually squash in the fixup I had lying
> > around :-/
> > 
> > v4: Pick the suggestion from Andrew Morton to give non_block_start/end
> > some good kerneldoc comments. I added that other blocking calls like
> > wait_event pose similar issues, since that's the other example we
> > discussed.
> > 
> > Cc: Jason Gunthorpe 
> > Cc: Peter Zijlstra 
> > Cc: Ingo Molnar 
> > Cc: Andrew Morton 
> > Cc: Michal Hocko 
> > Cc: David Rientjes 
> > Cc: "Christian König" 
> > Cc: Daniel Vetter 
> > Cc: "Jérôme Glisse" 
> > Cc: linux...@kvack.org
> > Cc: Masahiro Yamada 
> > Cc: Wei Wang 
> > Cc: Andy Shevchenko 
> > Cc: Thomas Gleixner 
> > Cc: Jann Horn 
> > Cc: Feng Tang 
> > Cc: Kees Cook 
> > Cc: Randy Dunlap 
> > Cc: linux-ker...@vger.kernel.org
> > Acked-by: Christian König  (v1)
> > Signed-off-by: Daniel Vetter 
> 
> Hi Peter,
> 
> Iirc you've been involved at least somewhat in discussing this. -mm folks
> are a bit undecided whether these new non_block semantics are a good idea.
> Michal Hocko still is in support, but Andrew Morton and Jason Gunthorpe
> are less enthusiastic. Jason said he's ok with merging the hmm side of
> this if scheduler folks ack. If not, then I'll respin with the
> preempt_disable/enable instead like in v1.
> 
> So ack/nack for this from the scheduler side?

Right, I had memories of seeing this before, and I just found a fairly
long discussion on this elsewhere in the vacation inbox (*groan*).

Yeah, this is something I can live with,

Acked-by: Peter Zijlstra (Intel) 
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 3/4] kernel.h: Add non_block_start/end()

2019-08-23 Thread Peter Zijlstra

On Fri, Aug 23, 2019 at 09:12:34AM -0300, Jason Gunthorpe wrote:

> I still haven't heard a satisfactory answer why a whole new scheme is
> needed and a simple:
> 
>if (IS_ENABLED(CONFIG_DEBUG_ATOMIC_SLEEP))
> preempt_disable()
> 
> isn't sufficient to catch the problematic cases during debugging??
> IMHO the fact preempt is changed by the above when debugging is not
> material here. I think that information should be included in the
> commit message at least.

That has a much larger impact and actually changes behaviour, while the
relatively simple patch Daniel proposed only adds a warning but doesn't
affect behaviour.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 3/4] kernel.h: Add non_block_start/end()

2019-08-23 Thread Peter Zijlstra

On Fri, Aug 23, 2019 at 03:42:47PM +0200, Daniel Vetter wrote:
> I'm assuming the lockdep one will land, so not going to resend that.

I was assuming you'd wake the might_lock_nested() along with the i915
user through the i915/drm tree. If want me to take some or all of that,
lemme know.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v2 1/6] mm: introduce vma->vm_flags modifier functions

2023-01-25 Thread Peter Zijlstra

On Wed, Jan 25, 2023 at 12:38:46AM -0800, Suren Baghdasaryan wrote:

> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 2d6d790d9bed..6c7c70bf50dd 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -491,7 +491,13 @@ struct vm_area_struct {
>* See vmf_insert_mixed_prot() for discussion.
>*/
>   pgprot_t vm_page_prot;
> - unsigned long vm_flags; /* Flags, see mm.h. */
> +
> + /*
> +  * Flags, see mm.h.
> +  * WARNING! Do not modify directly.
> +  * Use {init|reset|set|clear|mod}_vm_flags() functions instead.
> +  */
> + unsigned long vm_flags;

We have __private and ACCESS_PRIVATE() to help with enforcing this.

Re: [PATCH v5 0/7] Introduce __xchg, non-atomic xchg

2023-02-22 Thread Peter Zijlstra

On Wed, Jan 18, 2023 at 04:35:22PM +0100, Andrzej Hajda wrote:

> Andrzej Hajda (7):
>   arch: rename all internal names __xchg to __arch_xchg
>   linux/include: add non-atomic version of xchg
>   arch/*/uprobes: simplify arch_uretprobe_hijack_return_addr
>   llist: simplify __llist_del_all
>   io_uring: use __xchg if possible
>   qed: use __xchg if possible
>   drm/i915/gt: use __xchg instead of internal helper

Nothing crazy in here I suppose, I somewhat wonder why you went through
the trouble, but meh.

You want me to take this through te locking tree (for the next cycle,
not this one) where I normally take atomic things or does someone else
want this?

Re: [Intel-gfx] [PATCH v5 0/7] Introduce __xchg, non-atomic xchg

2023-02-27 Thread Peter Zijlstra

On Thu, Feb 23, 2023 at 10:24:19PM +0100, Andrzej Hajda wrote:
> On 22.02.2023 18:04, Peter Zijlstra wrote:
> > On Wed, Jan 18, 2023 at 04:35:22PM +0100, Andrzej Hajda wrote:
> > 
> > > Andrzej Hajda (7):
> > >arch: rename all internal names __xchg to __arch_xchg
> > >linux/include: add non-atomic version of xchg
> > >arch/*/uprobes: simplify arch_uretprobe_hijack_return_addr
> > >llist: simplify __llist_del_all
> > >io_uring: use __xchg if possible
> > >qed: use __xchg if possible
> > >drm/i915/gt: use __xchg instead of internal helper
> > 
> > Nothing crazy in here I suppose, I somewhat wonder why you went through
> > the trouble, but meh.
> 
> If you are asking why I have proposed this patchset, then the answer is
> simple, 1st I've tried to find a way to move internal i915 helper to core
> (see patch 7).
> Then I was looking for possible other users of this helper. And apparently
> there are many of them, patches 3-7 shows some.
> 
> 
> > 
> > You want me to take this through te locking tree (for the next cycle,
> > not this one) where I normally take atomic things or does someone else
> > want this?
> 
> If you could take it I will be happy.

OK, I'll go queue it in tip/locking/core after -rc1. Thanks!

Re: [PATCH v2 20/20] jump_label: RFC - tolerate toggled state

2023-01-17 Thread Peter Zijlstra

On Fri, Jan 13, 2023 at 12:30:16PM -0700, Jim Cromie wrote:
> __jump_label_patch currently will "crash the box" if it finds a
> jump_entry not as expected.  ISTM this overly harsh; it doesn't
> distinguish between "alternate/opposite" state, and truly
> "insane/corrupted".
> 
> The "opposite" (but well-formed) state is a milder mis-initialization
> problem, and some less severe mitigation seems practical.  ATM this
> just warns about it; a range/enum of outcomes: warn, crash, silence,
> ok, fixup-continue, etc, are possible on a case-by-case basis.
> 
> Ive managed to create this mis-initialization condition with
> test_dynamic_debug.ko & _submod.ko.  These replicate DRM's regression
> on DRM_USE_DYNAMIC_DEBUG=y; drm.debug callsites in drivers/helpers
> (dependent modules) are not enabled along with those in drm.ko itself.
> 

> Ive hit this case a few times, but havent been able to isolate the
> when and why.
> 
> warn-only is something of a punt, and I'm still left with remaining
> bugs which are likely related; I'm able to toggle the p-flag on
> callsites in the submod, but their enablement still doesn't yield
> logging activity.

Right; having been in this is state is bad since it will generate
inconsistent code-flow. Full on panic *might* not be warranted (as it
does for corrupted text) but it is still a fairly bad situation -- so
I'm not convinced we want to warn and carry on.

It would be really good to figure out why the site was skipped over and
got out of skew.

Given it's all module stuff, the 'obvious' case would be something like
a race between adding the new sites and flipping it, but I'm not seeing
how -- things are rather crudely serialized by jump_label_mutex.

The only other option I can come up with is that somehow the update
condition in jump_label_add_module() is somehow wrong.

Re: [PATCH mm-unstable v1 20/20] mm: rename FOLL_FORCE to FOLL_PTRACE

2022-11-18 Thread Peter Zijlstra

On Wed, Nov 16, 2022 at 10:16:34AM -0800, Linus Torvalds wrote:
> Following the history of it is a big of a mess, because there's a
> number of renamings and re-organizations, but it seems to go back to
> 2007 and commit b6a2fea39318 ("mm: variable length argument support").

I went back and read parts of the discussions with Ollie, and the
.force=1 thing just magically appeared one day when we were sending
work-in-progress patches back and forth without mention of where it came
from :-/

And I certainly can't remember now..

Looking at it now, I have the same reaction as both you and Kees had, it
seems entirely superflous. So I'm all for trying to remove it.

Re: [PATCH v2 03/47] mm: shrinker: add infrastructure for dynamically allocating shrinker

2023-07-24 Thread Peter Zijlstra

On Mon, Jul 24, 2023 at 05:43:10PM +0800, Qi Zheng wrote:

> +void shrinker_unregister(struct shrinker *shrinker)
> +{
> + struct dentry *debugfs_entry;
> + int debugfs_id;
> +
> + if (!shrinker || !(shrinker->flags & SHRINKER_REGISTERED))
> + return;
> +
> + down_write(&shrinker_rwsem);
> + list_del(&shrinker->list);
> + shrinker->flags &= ~SHRINKER_REGISTERED;
> + if (shrinker->flags & SHRINKER_MEMCG_AWARE)
> + unregister_memcg_shrinker(shrinker);
> + debugfs_entry = shrinker_debugfs_detach(shrinker, &debugfs_id);
> + up_write(&shrinker_rwsem);
> +
> + shrinker_debugfs_remove(debugfs_entry, debugfs_id);

Should there not be an rcu_barrier() right about here?

> +
> + kfree(shrinker->nr_deferred);
> + shrinker->nr_deferred = NULL;
> +
> + kfree(shrinker);
> +}
> +EXPORT_SYMBOL(shrinker_unregister);

Re: [PATCH v3 0/4] mm: convert to vma_is_initial_heap/stack()

2023-07-31 Thread Peter Zijlstra

On Fri, Jul 28, 2023 at 01:00:39PM +0800, Kefeng Wang wrote:

> Kefeng Wang (4):
>   mm: factor out VMA stack and heap checks
>   drm/amdkfd: use vma_is_initial_stack() and vma_is_initial_heap()
>   selinux: use vma_is_initial_stack() and vma_is_initial_heap()
>   perf/core: use vma_is_initial_stack() and vma_is_initial_heap()
> 
>  drivers/gpu/drm/amd/amdkfd/kfd_svm.c |  5 +
>  fs/proc/task_mmu.c   | 24 
>  fs/proc/task_nommu.c | 15 +
>  include/linux/mm.h   | 25 +
>  kernel/events/core.c | 33 ++--
>  security/selinux/hooks.c |  7 ++
>  6 files changed, 44 insertions(+), 65 deletions(-)

Acked-by: Peter Zijlstra (Intel)

Re: Consider switching to WQ_UNBOUND messages (was: Re: [PATCH v2 6/7] workqueue: Report work funcs that trigger automatic CPU_INTENSIVE mechanism)

2023-07-12 Thread Peter Zijlstra

On Tue, Jul 11, 2023 at 11:39:17AM -1000, Tejun Heo wrote:

> I wonder whether the right thing to do here is somehow scaling the threshold
> according to the relative processing power. It's difficult to come up with a
> threshold which works well across the latest & fastest and really tiny CPUs.
> I'll think about it some more but if you have some ideas, please feel free
> to suggest.

We could scale by BogoMIPS I suppose, it's a bogus measurement, as per
the name, but it does have some relation to how fast the machine is.

Re: Consider switching to WQ_UNBOUND messages (was: Re: [PATCH v2 6/7] workqueue: Report work funcs that trigger automatic CPU_INTENSIVE mechanism)

2023-07-12 Thread Peter Zijlstra

On Wed, Jul 12, 2023 at 11:04:16AM +0200, Geert Uytterhoeven wrote:
> Hoi Peter,
> 
> On Wed, Jul 12, 2023 at 10:05 AM Peter Zijlstra  wrote:
> > On Tue, Jul 11, 2023 at 11:39:17AM -1000, Tejun Heo wrote:
> > > I wonder whether the right thing to do here is somehow scaling the 
> > > threshold
> > > according to the relative processing power. It's difficult to come up 
> > > with a
> > > threshold which works well across the latest & fastest and really tiny 
> > > CPUs.
> > > I'll think about it some more but if you have some ideas, please feel free
> > > to suggest.
> >
> > We could scale by BogoMIPS I suppose, it's a bogus measurement, as per
> > the name, but it does have some relation to how fast the machine is.
> 
> That's gonna fail miserably on e.g. ARM and RISC-V, where BogoMIPS
> depends on some timer frequency.
> 
> R-Car M2-W with 1.5 GHz Cortex-A15: 40.00 BogoMIPS
> R-Car V4H with 1.8 GHz Cortex-A76: 33.33 BogoMIPS
> 
> while the real slow 48 MHz VexRiscV gets 128 BogoMIPS.

Hehe, OK, really bogus then. Lets file this idea in the bit-bucket then.

Re: [PATCH v2 4/4] perf/core: use vma_is_initial_stack() and vma_is_initial_heap()

2023-07-19 Thread Peter Zijlstra

On Wed, Jul 19, 2023 at 03:51:14PM +0800, Kefeng Wang wrote:
> Use the helpers to simplify code, also kill unneeded goto cpy_name.

Grrr.. why am I only getting 4/4 ?

I'm going to write a bot that auto NAKs all partial series :/

linux-next: manual merge of the tip tree with the drm-intel tree

2016-11-08 Thread Peter Zijlstra

On Tue, Nov 08, 2016 at 11:44:03AM +0100, Daniel Vetter wrote:
> On Tue, Nov 08, 2016 at 03:25:41PM +1100, Stephen Rothwell wrote:
> > Hi all,
> > 
> > FIXME: Add owner of second tree to To:
> >Add author(s)/SOB of conflicting commits.
> > 
> > Today's linux-next merge of the tip tree got a conflict in:
> > 
> >   drivers/gpu/drm/i915/i915_gem_shrinker.c
> > 
> > between commits:
> > 
> >   1233e2db199d ("drm/i915: Move object backing storage manipulation to its 
> > own locking")
> > 
> > from the drm-intel tree and commit:
> > 
> >   3ab7c086d5ec ("locking/drm: Kill mutex trickery")
> >   c7faee2109f9 ("locking/drm: Fix i915_gem_shrinker_lock() locking")
> 
> Hm, this seems to be the older versions that nuke the recursive locking
> trickery entirely, I thought we had version in-flight that kept that? I
> know that the i915 (and msm locking fwiw) is horrible since essentially
> it's a recursive BKL, and we're working (slowly, after all getting rid of
> the BKL wasn't simple either) to fix this. But meanwhile I'm assuming that
> we'll still need this to be able to get out of low memory situations in
> i915. Has that part simply not yet landed?

You're talking about:

  lkml.kernel.org/r/20161007154351.GL3117 at twins.programming.kicks-ass.net

? I got no feedback from you DRM guys on that so I kinda forgot about
that in the hope we'd not have to do this at all.

I can try and resurrect, that I suppose.

Now, I know you're working on getting rid of this entirely for i915, but
what about that MSM driver? Will we continue to need it there, is
anybody actually maintaining that thing?

[Intel-gfx] linux-next: manual merge of the tip tree with the drm-intel tree

2016-11-08 Thread Peter Zijlstra

On Tue, Nov 08, 2016 at 05:09:16PM +0100, Daniel Vetter wrote:
> > You're talking about:
> > 
> >   lkml.kernel.org/r/20161007154351.GL3117 at twins.programming.kicks-ass.net
> > 
> > ? I got no feedback from you DRM guys on that so I kinda forgot about
> > that in the hope we'd not have to do this at all.
> 
> Yes. Chris/Joonas, pls give this is a spin and review.

OK, I'll respin that thing with Linus' feedback etc. Might not be until
tomorrow though, so any additional feedback would be good.

Thanks!

[Intel-gfx] linux-next: manual merge of the tip tree with the drm-intel tree

2016-11-08 Thread Peter Zijlstra

On Tue, Nov 08, 2016 at 05:09:16PM +0100, Daniel Vetter wrote:
> > Now, I know you're working on getting rid of this entirely for i915, but
> > what about that MSM driver? Will we continue to need it there, is
> > anybody actually maintaining that thing?
> 
> Rob Clark is, and since he's a one-man gpu driver team with other
> responsibilities it might take even longer than for i915 :(

Fair enough. For my information, how much a of copy/paste job from i915
was that? Could he, in principle, copy/paste your changes to get rid of
this back into MSM without too much effort, or have things diverged
greatly since the initial copy/paste?

[PATCH 1/4] locking/ww_mutex: Fix a deadlock affecting ww_mutexes

2016-11-23 Thread Peter Zijlstra

On Wed, Nov 23, 2016 at 12:25:22PM +0100, Nicolai HÃ¤hnle wrote:
> From: Nicolai HÃ¤hnle 
> 
> Fix a race condition involving 4 threads and 2 ww_mutexes as indicated in
> the following example. Acquire context stamps are ordered like the thread
> numbers, i.e. thread #1 should back off when it encounters a mutex locked
> by thread #0 etc.
> 
> Thread #0Thread #1Thread #2Thread #3
> ----
>lock(ww)
>success
>  lock(ww')
>  success
>   lock(ww)
>  lock(ww).
> .. unlock(ww) part 1
> lock(ww)...
> success ...
> .. unlock(ww) part 2
> . back off
> lock(ww')   .
>..
> (stuck)  (stuck)
> 
> Here, unlock(ww) part 1 is the part that sets lock->base.count to 1
> (without being protected by lock->base.wait_lock), meaning that thread #0
> can acquire ww in the fast path or, much more likely, the medium path
> in mutex_optimistic_spin. Since lock->base.count == 0, thread #0 then
> won't wake up any of the waiters in ww_mutex_set_context_fastpath.
> 
> Then, unlock(ww) part 2 wakes up _only_the_first_ waiter of ww. This is
> thread #2, since waiters are added at the tail. Thread #2 wakes up and
> backs off since it sees ww owned by a context with a lower stamp.
> 
> Meanwhile, thread #1 is never woken up, and so it won't back off its lock
> on ww'. So thread #0 gets stuck waiting for ww' to be released.
> 
> This patch fixes the deadlock by waking up all waiters in the slow path
> of ww_mutex_unlock.
> 
> We have an internal test case for amdgpu which continuously submits
> command streams from tens of threads, where all command streams reference
> hundreds of GPU buffer objects with a lot of overlap in the buffer lists
> between command streams. This test reliably caused a deadlock, and while I
> haven't completely confirmed that it is exactly the scenario outlined
> above, this patch does fix the test case.
> 
> v2:
> - use wake_q_add
> - add additional explanations
> 
> Cc: Peter Zijlstra 
> Cc: Ingo Molnar 
> Cc: Chris Wilson 
> Cc: Maarten Lankhorst 
> Cc: dri-devel at lists.freedesktop.org
> Cc: stable at vger.kernel.org
> Reviewed-by: Christian KÃ¶nig  (v1)
> Signed-off-by: Nicolai HÃ¤hnle 

Completely and utterly fails to apply; I think this patch is based on
code prior to the mutex rewrite.

Please rebase on tip/locking/core.

Also, is this a regression, or has this been a 'feature' of the ww_mutex
code from early on?

[PATCH 1/4] locking/ww_mutex: Fix a deadlock affecting ww_mutexes

2016-11-23 Thread Peter Zijlstra

On Wed, Nov 23, 2016 at 12:25:22PM +0100, Nicolai HÃ¤hnle wrote:
> @@ -473,7 +476,14 @@ void __sched ww_mutex_unlock(struct ww_mutex *lock)
>*/
>   mutex_clear_owner(&lock->base);
>  #endif
> - __mutex_fastpath_unlock(&lock->base.count, __mutex_unlock_slowpath);
> + /*
> +  * A previously _not_ waiting task may acquire the lock via the fast
> +  * path during our unlock. In that case, already waiting tasks may have
> +  * to back off to avoid a deadlock. Wake up all waiters so that they
> +  * can check their acquire context stamp against the new owner.
> +  */
> + __mutex_fastpath_unlock(&lock->base.count,
> + __mutex_unlock_slowpath_wakeall);
>  }

So doing a wake-all has obvious issues with thundering herd etc.. Also,
with the new mutex, you'd not be able to do hand-off, which would
introduce starvation cases.

Ideally we'd iterate the blocked list and pick the waiter with the
earliest stamp, or we'd maintain the list in stamp order instead of
FIFO, for ww_mutex.

[PATCH 1/4] locking/ww_mutex: Fix a deadlock affecting ww_mutexes

2016-11-23 Thread Peter Zijlstra

On Wed, Nov 23, 2016 at 03:25:25PM +0100, Daniel Vetter wrote:

> Without thinking it through in detail this is a PI issue, except that we
> replace boosting with wakeup&back-off. Could we perhaps steal something
> from rt mutexes to make it fair&efficient?

rt_mutexes order the waiters by 'priority' and wake the top most.

[PATCH 1/4] locking/ww_mutex: Fix a deadlock affecting ww_mutexes

2016-11-24 Thread Peter Zijlstra

On Thu, Nov 24, 2016 at 12:26:57PM +0100, Nicolai HÃ¤hnle wrote:

> I do believe we can win a bit by keeping the wait list sorted, if we also
> make sure that waiters don't add themselves in the first place if they see
> that a deadlock situation cannot be avoided.
> 
> I will probably want to extend struct mutex_waiter with ww_mutex-specific
> fields to facilitate this (i.e. ctx pointer, perhaps stamp as well to reduce
> pointer-chasing). That should be fine since it lives on the stack.

Right, shouldn't be a problem I think.

The only 'problem' I can see with using that is that its possible to mix
ww and !ww waiters through ww_mutex_lock(.ctx = NULL). This makes the
list order somewhat tricky.

Ideally we'd remove that feature, although I see its actually used quite
a bit :/

> In the meantime, I'd appreciate it if patch #1 could be accepted as-is for
> stable updates to <= 4.8. It fixes a real (if rare) bug, and the stampede
> inefficiency isn't a problem in practice at least for GPU applications.

Sorry can't do. We don't do stable patches that don't have anything
upstream.

[PATCH 1/4] locking/ww_mutex: Fix a deadlock affecting ww_mutexes

2016-11-24 Thread Peter Zijlstra

On Thu, Nov 24, 2016 at 12:52:25PM +0100, Daniel Vetter wrote:
> On Thu, Nov 24, 2016 at 12:40 PM, Peter Zijlstra  
> wrote:
> >
> >> I do believe we can win a bit by keeping the wait list sorted, if we also
> >> make sure that waiters don't add themselves in the first place if they see
> >> that a deadlock situation cannot be avoided.
> >>
> >> I will probably want to extend struct mutex_waiter with ww_mutex-specific
> >> fields to facilitate this (i.e. ctx pointer, perhaps stamp as well to 
> >> reduce
> >> pointer-chasing). That should be fine since it lives on the stack.
> >
> > Right, shouldn't be a problem I think.
> >
> > The only 'problem' I can see with using that is that its possible to mix
> > ww and !ww waiters through ww_mutex_lock(.ctx = NULL). This makes the
> > list order somewhat tricky.
> >
> > Ideally we'd remove that feature, although I see its actually used quite
> > a bit :/
> 
> I guess we could create a small fake acquire_ctx for single-lock
> paths. That way callers still don't need to deal with having an
> explicit ctx, but we can assume the timestamp (for ensuring fairness)
> is available for all cases. Otherwise there's indeed a problem with
> correctly (well fairly) interleaving ctx and non-ctx lockers I think.

Actually tried that, but we need a ww_class to get a stamp from, and
ww_mutex_lock() doesn't have one of those..

[PATCH 01/11] drm/vgem: Use ww_mutex_(un)lock even with a NULL context

2016-11-28 Thread Peter Zijlstra

On Mon, Nov 28, 2016 at 01:42:26PM +0100, Maarten Lankhorst wrote:

> > +   ww_mutex_lock(&resv->lock.base, NULL);
> Yuck, can we rename base to __NEVER_TOUCH_DIRECTLY_OUTSIDE_LOCKING_CORE?
> It's harder to get them confused like that, even with a null context it's 
> illegal to call mutex_lock/unlock directly.

I think there's a __private sparse annotation that could be used.

[PATCH v3 10/12] locking/ww_mutex: Yield to other waiters from optimistic spin

2017-01-05 Thread Peter Zijlstra



OK, so I got this far, although I stuffed two patches in the middle
(which I should probably pull to the beginning and did this one patch
differently.

I've not really tested the result yet, will attempt to do so tomorrow.

Please have a look at the current state of things here:

  git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git locking/core

Lockdep warning when using REGCACHE_RBTREE

2016-01-15 Thread Peter Zijlstra

On Thu, Jan 14, 2016 at 02:30:50PM -0800, Stefan Agner wrote:
> Hi Mark,
> 
> I currently work on the DCU DRM driver (drivers/gpu/drm/fsl-dcu/) on a
> Linux 4.4 kernel. With CONFIG_LOCKDEP enabled I get the following
> warning on startup:
> 
> [1.327284] [ cut here ]
> [1.332010] WARNING: CPU: 0 PID: 1 at kernel/locking/lockdep.c:2755 
> lockdep_trace_alloc+0x120/0x124()
> [1.341358] DEBUG_LOCKS_WARN_ON(irqs_disabled_flags(flags))

> [1.521744] [<80375308>] (_regmap_write) from [<80376760>] 
> (regmap_write+0x48/0x68)
> [1.535348] [<80376718>] (regmap_write) from [<803538e8>] 
> (fsl_dcu_drm_crtc_create+0x98/0xe8)
> [1.549824] [<80353850>] (fsl_dcu_drm_crtc_create) from [<80352df0>] 
> (fsl_dcu_drm_modeset_init+0x5c/0xfc)


> The comment in __lockdep_trace_alloc says:
> "Oi! Can't be having __GFP_FS allocations with IRQs disabled.".

So _regmap_write() does: map->lock(map->lock_arg)

Which isn't very enlightening, since that can be a mutex or a spinlock,
the above strongly suggests spinlock though.

Looking at __regmap_init(), this seems to depend on:

  (bus && bus->fast_io) || config->fast_io

now, afaict, regmap_mmio is used in this case, which has .fast_io =
true.

So yeah, fail.

Re: rcu_barrier() no longer allowed within mmap_sem?

2020-03-30 Thread Peter Zijlstra

On Mon, Mar 30, 2020 at 03:00:35PM +0200, Daniel Vetter wrote:
> Hi all, for all = rcu, cpuhotplug and perf maintainers
> 
> We've hit an interesting new lockdep splat in our drm/i915 CI:
> 
> https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17096/shard-tglb7/igt@kms_frontbuffer_track...@fbcpsr-rgb101010-draw-mmap-gtt.html#dmesg-warnings861
> 
> Summarizing away the driver parts we have
> 
> < gpu locks which are held within mm->mmap_sem in various gpu fault handlers >
> 
> -> #4 (&mm->mmap_sem#2){}:
> <4> [604.892615] __might_fault+0x63/0x90
> <4> [604.892617] _copy_to_user+0x1e/0x80
> <4> [604.892619] perf_read+0x200/0x2b0
> <4> [604.892621] vfs_read+0x96/0x160
> <4> [604.892622] ksys_read+0x9f/0xe0
> <4> [604.892623] do_syscall_64+0x4f/0x220
> <4> [604.892624] entry_SYSCALL_64_after_hwframe+0x49/0xbe
> <4> [604.892625]
> -> #3 (&cpuctx_mutex){+.+.}:
> <4> [604.892626] __mutex_lock+0x9a/0x9c0
> <4> [604.892627] perf_event_init_cpu+0xa4/0x140
> <4> [604.892629] perf_event_init+0x19d/0x1cd
> <4> [604.892630] start_kernel+0x362/0x4e4
> <4> [604.892631] secondary_startup_64+0xa4/0xb0
> <4> [604.892631]
> -> #2 (pmus_lock){+.+.}:
> <4> [604.892633] __mutex_lock+0x9a/0x9c0
> <4> [604.892633] perf_event_init_cpu+0x6b/0x140
> <4> [604.892635] cpuhp_invoke_callback+0x9b/0x9d0
> <4> [604.892636] _cpu_up+0xa2/0x140
> <4> [604.892637] do_cpu_up+0x61/0xa0
> <4> [604.892639] smp_init+0x57/0x96
> <4> [604.892639] kernel_init_freeable+0x87/0x1dc
> <4> [604.892640] kernel_init+0x5/0x100
> <4> [604.892642] ret_from_fork+0x24/0x50
> <4> [604.892642]
> -> #1 (cpu_hotplug_lock.rw_sem){}:
> <4> [604.892643] cpus_read_lock+0x34/0xd0
> <4> [604.892644] rcu_barrier+0xaa/0x190
> <4> [604.892645] kernel_init+0x21/0x100
> <4> [604.892647] ret_from_fork+0x24/0x50
> <4> [604.892647]
> -> #0 (rcu_state.barrier_mutex){+.+.}:


> The last backtrace boils down to i915 driver code which holds the same
> locks we are holding within mm->mmap_sem, and then ends up calling
> rcu_barrier(). From what I can see i915 is just the messenger here,
> any driver with this pattern of a lock held within mmap_sem which also
> has a path of calling rcu_barrier while holding that lock should be
> hitting this splat.
> 
> Two questions:
> - This suggests that calling rcu_barrier() isn't ok anymore while
> holding mmap_sem, or anything that has a dependency upon mmap_sem. I
> guess that's not the idea, please confirm.
> - Assuming this depedency is indeed not intended, where should the
> loop be broken? It goes through perf, cpuhotplug and rcu subsystems,
> and I don't have a clue about any of those.

I wonder what is new here; the 1-4 chain there has been true for a long
time, see also the comment at perf_event_ctx_lock_nested().

That said; it _might_ be possible to break 3->4, that is, all the
copy_{to,from}_user() usage in perf can be lifted out from under the
various locks by re-arranging code, but I have a nagging feeling there
was more to it than that. Of course, while I did document the locking
rules, I seem to have forgotten to comment on exactly why these rules
are as they are.. oh well.


___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 17/28] mm: remove the prot argument from vm_map_ram

2020-04-08 Thread Peter Zijlstra

On Wed, Apr 08, 2020 at 01:59:15PM +0200, Christoph Hellwig wrote:
> This is always GFP_KERNEL - for long term mappings with other properties
> vmap should be used.

 PAGE_KERNEL != GFP_KERNEL :-)

> - return vm_map_ram(mock->pages, mock->npages, 0, PAGE_KERNEL);
> + return vm_map_ram(mock->pages, mock->npages, 0);
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: decruft the vmalloc API

2020-04-08 Thread Peter Zijlstra

On Wed, Apr 08, 2020 at 01:58:58PM +0200, Christoph Hellwig wrote:
> Hi all,
> 
> Peter noticed that with some dumb luck you can toast the kernel address
> space with exported vmalloc symbols.
> 
> I used this as an opportunity to decruft the vmalloc.c API and make it
> much more systematic.  This also removes any chance to create vmalloc
> mappings outside the designated areas or using executable permissions
> from modules.  Besides that it removes more than 300 lines of code.
> 

Looks great, thanks for doing this!

Acked-by: Peter Zijlstra (Intel) 
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 10/28] mm: only allow page table mappings for built-in zsmalloc

2020-04-08 Thread Peter Zijlstra

On Wed, Apr 08, 2020 at 08:01:00AM -0700, Randy Dunlap wrote:
> Hi,
> 
> On 4/8/20 4:59 AM, Christoph Hellwig wrote:
> > diff --git a/mm/Kconfig b/mm/Kconfig
> > index 36949a9425b8..614cc786b519 100644
> > --- a/mm/Kconfig
> > +++ b/mm/Kconfig
> > @@ -702,7 +702,7 @@ config ZSMALLOC
> >  
> >  config ZSMALLOC_PGTABLE_MAPPING
> > bool "Use page table mapping to access object in zsmalloc"
> > -   depends on ZSMALLOC
> > +   depends on ZSMALLOC=y
> 
> It's a bool so this shouldn't matter... not needed.

My mm/Kconfig has:

config ZSMALLOC
tristate "Memory allocator for compressed pages"
depends on MMU

which I think means it can be modular, no?
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 10/28] mm: only allow page table mappings for built-in zsmalloc

2020-04-09 Thread Peter Zijlstra

On Thu, Apr 09, 2020 at 09:08:26AM -0700, Minchan Kim wrote:
> On Wed, Apr 08, 2020 at 01:59:08PM +0200, Christoph Hellwig wrote:
> > This allows to unexport map_vm_area and unmap_kernel_range, which are
> > rather deep internal and should not be available to modules.
> 
> Even though I don't know how many usecase we have using zsmalloc as
> module(I heard only once by dumb reason), it could affect existing
> users. Thus, please include concrete explanation in the patch to
> justify when the complain occurs.

The justification is 'we can unexport functions that have no sane reason
of being exported in the first place'.

The Changelog pretty much says that.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends

2020-09-24 Thread Peter Zijlstra

On Thu, Sep 24, 2020 at 08:32:41AM -0400, Steven Rostedt wrote:
> Anyway, instead of blocking. What about having a counter of number of
> migrate disabled tasks per cpu, and when taking a migrate_disable(), and 
> there's
> already another task with migrate_disabled() set, and the current task has
> an affinity greater than 1, it tries to migrate to another CPU?

That doesn't solve the problem. On wakeup we should already prefer an
idle CPU over one running a (RT) task, but you can always wake more
tasks than there's CPUs around and you'll _have_ to stack at some point.

The trick is how to unstack them correctly. We need to detect when a
migrate_disable() task _should_ start running again, and migrate away
whoever is in the way at that point.

It turns out, that getting selected for pull-balance is exactly that
condition, and clearly a migrate_disable() task cannot be pulled, but we
can use that signal to try and pull away the running task that's in the
way.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends

2020-09-24 Thread Peter Zijlstra

On Thu, Sep 24, 2020 at 09:51:38AM -0400, Steven Rostedt wrote:

> > It turns out, that getting selected for pull-balance is exactly that
> > condition, and clearly a migrate_disable() task cannot be pulled, but we
> > can use that signal to try and pull away the running task that's in the
> > way.
> 
> Unless of course that running task is in a migrate disable section
> itself ;-)

See my ramblings here:

  
https://lkml.kernel.org/r/20200924082717.ga1362...@hirez.programming.kicks-ass.net

My plan was to have the migrate_enable() of the running task trigger the
migration in that case.

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [External] Re: [PATCH 2/2] sched: mark PRINTK_DEFERRED_CONTEXT_MASK in __schedule()

2020-09-28 Thread Peter Zijlstra

On Mon, Sep 28, 2020 at 06:04:23PM +0800, Chengming Zhou wrote:

> Well, you are lucky. So it's a problem in our printk implementation.

Not lucky; I just kicked it in the groin really hard:

  git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git 
debug/experimental

> The deadlock path is:
> 
> printk
>   vprintk_emit
>     console_unlock
>       vt_console_print
>         hide_cursor
>           bit_cursor
>             soft_cursor
>               queue_work_on
>                 __queue_work
>                   try_to_wake_up
>                     _raw_spin_lock
>                       native_queued_spin_lock_slowpath
> 
> Looks like it's introduced by this commit:
> 
> eaa434defaca1781fb2932c685289b610aeb8b4b
> 
> "drm/fb-helper: Add fb_deferred_io support"

Oh gawd, yeah, all the !serial consoles are utter batshit.

Please look at John's last printk rewrite, IIRC it farms all that off to
a kernel thread instead of doing it from the printk() caller's context.

I'm not sure where he hides his latests patches, but I'm sure he'll be
more than happy to tell you.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [External] Re: [PATCH 2/2] sched: mark PRINTK_DEFERRED_CONTEXT_MASK in __schedule()

2020-09-29 Thread Peter Zijlstra

On Tue, Sep 29, 2020 at 04:27:51PM +0200, Petr Mladek wrote:

> Upstreaming the console handling will be the next big step. I am sure
> that there will be long discussion about it. But there might be
> few things that would help removing printk_deferred().
> 
> 1. Messages will be printed on consoles by dedicated kthreads. It will
>be safe context. No deadlocks.

This, that's what I remember. With sane consoles having a
->write_atomic() callback which is called in-line.

Current -RT has a single kthread that's flushing the consoles.

> 2. The registration and unregistration of consoles should not longer
>be handled by console_lock (semaphore). It should be possible to
>call most consoles without a sleeping lock. It should remove all
>these deadlocks between printk() and scheduler().

I'm confused, who cares about registation? That only happens once at
boot, right?

>There might be problems with some consoles. For example, tty would
>most likely still need a sleeping lock because it is using the console
>semaphore also internally.

But per 1) above, that's done from a kthread, so nobody cares.

> 3. We will try harder to get the messages out immediately during
>panic().

As long as you guarantee to first write everything to consoles with
->write_atomic() before attempting to flush others that should be fine.

> It would take some time until the above reaches upstream. But it seems
> to be the right way to go.

Yep.

> Finally, the deadlock happens "only" when someone is waiting on
> console_lock() in parallel. Otherwise, the waitqueue for the semaphore
> is empty and scheduler is not called.

There's more deadlocks. In fact the one described earlier in this
thread isn't the one you mention.

> It means that there is quite a big change to see the WARN(). It might
> be even bigger than with printk_deferred() because WARN() in scheduler
> means that the scheduler is big troubles. Nobody guarantees that
> the deferred messages will get handled later.

I only care about ->write_atomic() :-) Anything else is a
best-effort-loss anyway.

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v2 0/3] drm: commit_work scheduling

2020-10-06 Thread Peter Zijlstra

On Tue, Oct 06, 2020 at 11:08:59AM +0200, Daniel Vetter wrote:
> vblank work needs to preempt commit work.
> 
> Right now we don't have any driver requiring this, but if we e.g. roll out
> the gamma table update for i915, then this _has_ to happen in the vblank
> period.
> 
> Whereas the commit work can happen in there, but it can also be delayed a
> bit (until the vblank worker has finished) we will not miss any additional
> deadline due to that.
> 
> So that's why we have 2 levels. I'm not even sure you can model that with
> SCHED_DEADLINE, since essentially we need a few usec of cpu time very
> vblank (16ms normally), but thos few usec _must_ be scheduled within a
> very specific time slot or we're toast. And that vblank period is only
> 1-2ms usually.

Depends a bit on what the hardware gets us. If for example we're
provided an interrupt on vblank start, then that could wake a DEADLINE
job with (given your numbers above):

 .sched_period = 16ms,
 .sched_deadline = 1-2ms,
 .sched_runtime = 1-2ms,

The effective utilization of that task would be: 1-2/16.

> deadline has the upshot that it compose much better than SCHED_FIFO:
> Everyone just drops their deadline requirements onto the scheduler,
> scheduler makes sure it's all obeyed (or rejects your request).
> 
> The trouble is we'd need to know how long a commit takes, worst case, on a
> given platform. And for that you need to measure stuff, and we kinda can't
> spend a few minutes at boot-up going through the combinatorial maze of
> atomic commits to make sure we have it all.
> 
> So I think in practice letting userspace set the right rt priority/mode is
> the only way to go here :-/

Or you can have it adjust it's expected runtime as the system runs
(always keeping a little extra room over what you measure to make sure).

Given you have period > deadline, you can simply start with runtime =
deadline and adjust downwards during use (carefully).


___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

[RFC PATCH 00/11] drm/i915: Expose OA metrics via perf PMU

2015-05-19 Thread Peter Zijlstra

On Fri, May 15, 2015 at 02:07:29AM +0100, Robert Bragg wrote:
> On Fri, May 8, 2015 at 5:24 PM, Peter Zijlstra  
> wrote:
> > On Thu, May 07, 2015 at 03:15:43PM +0100, Robert Bragg wrote:
> >
> >> I've changed the uapi for configuring the i915_oa specific attributes
> >> when calling perf_event_open(2) whereby instead of cramming lots of
> >> bitfields into the perf_event_attr config members, I'm now
> >> daisy-chaining a drm_i915_oa_event_attr_t structure off of a single
> >> config member that's extensible and validated in the same way as the
> >> perf_event_attr struct. I've found this much nicer to work with while
> >> being neatly extensible too.
> >
> > This worries me a bit.. is there more background for this?
> 
> Would it maybe be helpful to see the before and after? I had kept this
> uapi change in a separate patch for a while locally but in the end
> decided to squash it before sending out my updated series.
> 
> Although I did find it a bit awkward with the bitfields, I was mainly
> concerned about the extensibility of packing logically separate
> attributes into the config members and had heard similar concerns from
> a few others who had been experimenting with my patches too.
> 
> A few simple attributes I can think of a.t.m that we might want to add
> in the future are:
> - control of the OABUFFER size
> - a way to ask the kernel to collect reports at the beginning and end
> of batch buffers, in addition to periodic reports
> - alternative ways to uniquely identify a context to support tools
> profiling a single context not necessarily owned by the current
> process
> 
> It could also be interesting to expose some counter configuration
> through these attributes too. E.g. on Broadwell+ we have 14 'Flexible
> EU' counters included in the OA unit reports, each with a 16bit
> configuration.
> 
> In a more extreme case it might also be useful to allow userspace to
> specify a complete counter config, which (depending on the
> configuration) could be over 100 32bit values to select the counter
> signals + configure the corresponding combining logic.
> 
> Since this pmu is in a device driver it also seemed reasonably
> appropriate to de-couple it slightly from the core perf_event_attr
> structure by allowing driver extensible attributes.
> 
> I wonder if it might be less worrisome if the i915_oa_copy_attr() code
> were instead a re-usable utility perhaps maintained in events/core.c,
> so if other pmu drivers were to follow suite there would be less risk
> of a mistake being made here?

So I had a peek at:

https://01.org/sites/default/files/documentation/observability_performance_counters_haswell.pdf

In an attempt to inform myself of how the hardware worked. But the
document is rather sparse (and does not include broadwell).

So from what I can gather there's two ways to observe the counters,
through MMIO or trough the ring-buffer, which in turn seems triggered by
a MI_REPORT_PERF_COUNT command.

[ Now the document talks about shortcomings of this scheme, where the
MI_REPORT_PERF_COUNT command can only be placed every other command, but
a single command can contain so much computation that this is not fine
grained enough -- leading to the suggestion of a timer/cycle based
reporting, but that is not actually mentioned afaict ]

Now the MI_REPORT_PERF_COUNT can select a vector (Counter Select) of
which events it will write out.

This covers the regular 'A' counters. Is this correct?

Then there are the 'B' counters, which appear to be programmable through
the CEC MMIO registers.

These B events can also be part of the MI_REPORT_PERF_COUNT vector.

Right?

So for me the 'natural' way to represent this in perf would be through
event groups. Create a perf event for every single event -- yes this is
53 events.

Use the MMIO reads for the regular read() interface, and use a hrtimer
placing MI_REPORT_PERF_COUNT commands, with a counter select mask
covering the all events in the current group, for sampling.

Then obtain the vector values using PERF_SAMPLE_READ and
PERF_FORMAT_READ -- and use the read txn support to consume the
ring-buffer vectors instead of the MMIO registers.

You can use the perf_event_attr::config to select the counter (A0-A44,
B0-B7) and use perf_event_attr::config1 (low and high dword) for the
corresponding CEC registers.

This does not require random per driver ABI extentions for
perf_event_attr, nor your custom output format.

Am I missing something obvious here?

[RFC PATCH 00/11] drm/i915: Expose OA metrics via perf PMU

2015-05-27 Thread Peter Zijlstra

On Thu, May 21, 2015 at 12:17:48AM +0100, Robert Bragg wrote:
> >
> > So for me the 'natural' way to represent this in perf would be through
> > event groups. Create a perf event for every single event -- yes this is
> > 53 events.
> 
> So when I was first looking at this work I had considered the
> possibility of separate events, and these are some of the things that
> in the end made me forward the hardware's raw report data via a single
> event instead...
> 
> There are 100s of possible B counters depending on the MUX
> configuration + fixed-function logic in addition to the A counters.

There's only 8 B counters, what there is 100s of is configurations, and
that's no different from any other PMU. There's 100s of events to select
on the CPU side as well - yet only 4 (general purpose) counters (8 if
you disable HT on recent machines) per CPU.

Of course, you could create more than 8 events at any one time, and then
perf will round-robin the configuration for you.

> A
> choice would need to be made about whether to expose events for the
> configurable counters that aren't inherently associated with any
> semantics, or instead defining events for counters with specific
> semantics (with 100s of possible counters to define). The later would
> seem more complex for userspace and the kernel if they both now have
> to understand the constraints on what counters can be used together.

Again, no different from existing PMUs. The most 'interesting' part of
any PMU driver is event scheduling.

The explicit design of perf was to put event scheduling _in_ the kernel
and not allow userspace direct access to the hardware (as previous PMU
models did).

> I
> guess with either approach we would also need to have some form of
> dedicated group leader event accepting attributes for configuring the
> state that affects the group as a whole, such as the counter
> configuration (3D vs GPGPU vs media etc). 

That depends a bit on how flexible you want to switch between these
modes; with the cost being in the 100s of MMIO register writes, it might
just not make sense to make this too dynamic.

An option might be to select the mode through the PMU driver's sysfs
files. Another option might be to expose the B counters 3 times and have
each set be mutually exclusive. That is, as soon as you've created a 3D
event, you can no longer create GPGPU/Media events.

> I'm not sure where we would
> handle the context-id + drm file descriptor attributes for initiating
> single context profiling but guess we'd need to authenticate each
> individual event open.

Event open is a slow path, so that should not be a problem, right?

> It's not clear if we'd configure the report
> layout via the group leader, or try to automatically choose the most
> compact format based on the group members. I'm not sure how pmus
> currently handle the opening of enabled events on an enabled group but

With the PERF_FORMAT_GROUP layout changing in-flight. I would recommend
not doing that -- decoding the output will be 'interesting' but not
impossible.

> I think there would need to be limitations in our case that new
> members can't result in a reconfigure of the counters if that might
> loose the current counter values known to userspace.

I'm not entirely sure what you mean here.

> From a user's pov, there's no real freedom to mix and match which
> counters are configured together, and there's only some limited
> ability to ignore some of the currently selected counters by not
> including them in reports.

It is 'impossible' to create a group that is not programmable. That is,
the pmu::event_init() call _should_ verify that the addition of the
event to the group (as given by event->group_leader) is valid.

> Something to understand here is that we have to work with sets of
> pre-defined MUX + fixed-function logic configurations that have been
> validated to give useful metrics for specific use cases, such as
> benchmarking 3D rendering, GPGPU or media workloads.

This is fine; as stated above. Since these are limited pieces of
'firmware' which are expensive to load, you don't have to do a fully
dynamic solution here.

> As it is currently the kernel doesn't need to know anything about the
> semantics of individual counters being selected, so it's currently
> convenient that we can aim to maintain all the counter meta data we
> have in userspace according to the changing needs of tools or drivers
> (e.g. names, descriptions, units, max values, normalization
> equations), de-coupled from the kernel, instead of splitting it
> between the kernel and userspace.

And that fully violates the design premise of perf. The kernel should be
in control of resources, not userspace.

If we'd have put userspace in charge, we could now not profile the same
task by two (or more) different observers. But with kernel side counter
management that is no problem at all.

> A benefit of being able to change the report size is to reduce memory
> bandwidth usage that can skew measure

linux-next: build failure after merge of the tip tree (from the drm-intel tree)

2016-07-05 Thread Peter Zijlstra

On Tue, Jul 05, 2016 at 01:53:03PM +1000, Stephen Rothwell wrote:
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index d3502c0603e5..1f91f187b2a8 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3290,7 +3290,7 @@ i915_gem_retire_work_handler(struct work_struct *work)
>* We do not need to do this test under locking as in the worst-case
>* we queue the retire worker once too often.
>*/
> - if (lockless_dereference(dev_priv->gt.awake))
> + if (/*lockless_dereference*/(dev_priv->gt.awake))
>   queue_delayed_work(dev_priv->wq,
>  &dev_priv->gt.retire_work,
>  round_jiffies_up_relative(HZ));
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index f6de8dd567a2..2c1926418691 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -3095,7 +3095,7 @@ static void i915_hangcheck_elapsed(struct work_struct 
> *work)
>   if (!i915.enable_hangcheck)
>   return;
>  
> - if (!lockless_dereference(dev_priv->gt.awake))
> + if (!/*lockless_dereference*/(dev_priv->gt.awake))
>   return;
>  
>   /* As enabling the GPU requires fairly extensive mmio access,

Right, neither case appears to include a data dependency and thus
lockless_dereference() seems misguided.

1 2 3 >

1 - 100 of 259 matches

Mail list logo