Re: [Intel-gfx] [PATCH 2/4] mm: add a io_mapping_map_user helper

2021-10-20 Thread Peter Zijlstra
On Wed, Oct 20, 2021 at 08:40:05AM -0700, Lucas De Marchi wrote:
> On Fri, Mar 26, 2021 at 06:55:03AM +0100, Christoph Hellwig wrote:
> > Add a helper that calls remap_pfn_range for a struct io_mapping, relying
> > on the pgprot pre-validation done when creating the mapping instead of
> > doing it at runtime.
> > 
> > Signed-off-by: Christoph Hellwig 
> > ---
> > include/linux/io-mapping.h |  3 +++
> > mm/Kconfig                 |  3 +++
> > mm/Makefile                |  1 +
> > mm/io-mapping.c            | 29 +++++++++++++++++++++++++
> > 4 files changed, 36 insertions(+)
> > create mode 100644 mm/io-mapping.c
> > 
> > diff --git a/include/linux/io-mapping.h b/include/linux/io-mapping.h
> > index c093e81310a9b3..e9743cfd858527 100644
> > --- a/include/linux/io-mapping.h
> > +++ b/include/linux/io-mapping.h
> > @@ -220,3 +220,6 @@ io_mapping_free(struct io_mapping *iomap)
> > }
> > 
> > #endif /* _LINUX_IO_MAPPING_H */
> > +
> > +int io_mapping_map_user(struct io_mapping *iomap, struct vm_area_struct *vma,
> > +		unsigned long addr, unsigned long pfn, unsigned long size);
> 
> I'm not sure what exactly brought me to check this, but while debugging
> I noticed this outside the header guard. But then after some more checks I
> saw that nothing actually selects CONFIG_IO_MAPPING, because the commit
> using it was reverted in commit 0e4fe0c9f2f9 ("Revert "i915: use
> io_mapping_map_user"")
> 
> Is this something we want to re-attempt moving to mm/ ?

Yes, it would be very good to unexport apply_to_page_range(); it's a
terrible interface to expose.
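(For reference, the body of the helper, elided from the quote above, is a
thin wrapper around remap_pfn_range_notrack(). Roughly the following,
reconstructed from the original patch, so treat it as a sketch rather than
the authoritative mm/io-mapping.c:)

	/* Remap an io_mapping into a userspace VMA, reusing the pgprot that
	 * was validated when the io_mapping was created instead of doing
	 * the PAT tracking again at mmap time. */
	int io_mapping_map_user(struct io_mapping *iomap, struct vm_area_struct *vma,
			unsigned long addr, unsigned long pfn, unsigned long size)
	{
		vm_flags_t expected_flags = VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP;

		if (WARN_ON_ONCE((vma->vm_flags & expected_flags) != expected_flags))
			return -EINVAL;

		/* We rely on prevalidation of the io-mapping to skip track_pfn(). */
		return remap_pfn_range_notrack(vma, addr, pfn, size,
			__pgprot((pgprot_val(iomap->prot) & _PAGE_CACHE_MASK) |
				 (pgprot_val(vma->vm_page_prot) & ~_PAGE_CACHE_MASK)));
	}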


Re: [Intel-gfx] [PATCH v2] kernel/locking: Add context to ww_mutex_trylock.

2021-09-13 Thread Peter Zijlstra
On Mon, Sep 13, 2021 at 10:42:36AM +0200, Maarten Lankhorst wrote:

> > +/**
> > + * ww_mutex_trylock - tries to acquire the w/w mutex with optional acquire context
> > + * @ww: mutex to lock
> > + * @ww_ctx: optional w/w acquire context
> > + *
> > + * Trylocks a mutex with the optional acquire context; no deadlock detection is
> > + * possible. Returns 1 if the mutex has been acquired successfully, 0 otherwise.
> > + *
> > + * Unlike ww_mutex_lock, no deadlock handling is performed. However, if a @ctx is
> > + * specified, -EALREADY handling may happen in calls to ww_mutex_trylock.
> > + *
> > + * A mutex acquired with this function must be released with ww_mutex_unlock.
> > + */
> > +int ww_mutex_trylock(struct ww_mutex *ww, struct ww_acquire_ctx *ww_ctx)
> > +{
> > +   if (!ww_ctx)
> > +   return mutex_trylock(&ww->base);
> > +
> > +   MUTEX_WARN_ON(ww->base.magic != &ww->base);
> > +
> > +   if (unlikely(ww_ctx == READ_ONCE(ww->ctx)))
> > +   return -EALREADY;
> 
> I'm not 100% sure this is a good idea, because it would make the
> trylock weird.  For i915 I checked manually, because I didn't want to
> change the function signature. This is probably the other extreme.
> 
> "if (ww_mutex_trylock())" would look correct, but actually be wrong
> and lead to double unlock without adjustments.  Maybe we could make a
> ww_mutex_trylock_ctx_err, which would return -EALREADY or -EBUSY on
> failure, and 0 on success?  We could keep ww_mutex_trylock without
> ctx, probably just #define as (!ww_mutex_trylock_ctx_err(lock, NULL))

Urgh, yeah. Also, I suppose that if we already own it, we'll just fail
the trylock anyway. Let me take this out.
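(To make the hazard concrete, an illustrative snippet, not code from the
patch: because -EALREADY is non-zero, the natural boolean test treats "we
already hold it" as a successful trylock:)

	if (ww_mutex_trylock(&obj->lock, ctx)) {	/* -EALREADY is truthy! */
		do_something(obj);			/* hypothetical user code */
		ww_mutex_unlock(&obj->lock);		/* double unlock if already held */
	}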

> > +   /*
> > +* Reset the wounded flag after a kill. No other process can
> > +* race and wound us here, since they can't have a valid owner
> > +* pointer if we don't have any locks held.
> > +*/
> > +   if (ww_ctx->acquired == 0)
> > +   ww_ctx->wounded = 0;
> 
> Yeah I guess this needs fixing too. Not completely sure since trylock
> wouldn't do the whole ww dance, but since it's our first lock,
> probably best to do so regardless so other users don't trip over it.

This is actually critical, because if this trylock is the first lock
acquisition for the context, there won't be any other opportunity to
reset this value.

> > +
> > +   if (__mutex_trylock(&ww->base)) {
> > +   ww_mutex_set_context_fastpath(ww, ww_ctx);
> > +   mutex_acquire_nest(&ww->base.dep_map, 0, 1, &ww_ctx->dep_map, _RET_IP_);
> > +   return 1;
> > +   }
> > +
> > +   return 0;
> > +}
> > +EXPORT_SYMBOL(ww_mutex_trylock);

Updated version below...

---
Subject: kernel/locking: Add context to ww_mutex_trylock()
From: Maarten Lankhorst 
Date: Thu, 9 Sep 2021 11:32:18 +0200

From: Maarten Lankhorst 

i915 will soon gain an eviction path that trylocks a whole lot of locks
for eviction, getting dmesg failures like the one below:

  BUG: MAX_LOCK_DEPTH too low!
  turning off the locking correctness validator.
  depth: 48  max: 48!
  48 locks held by i915_selftest/5776:
   #0: 888101a79240 (&dev->mutex){}-{3:3}, at: __driver_attach+0x88/0x160
   #1: c99778c0 (reservation_ww_class_acquire){+.+.}-{0:0}, at: i915_vma_pin.constprop.63+0x39/0x1b0 [i915]
   #2: 88800cf74de8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: i915_vma_pin.constprop.63+0x5f/0x1b0 [i915]
   #3: 88810c7f9e38 (&vm->mutex/1){+.+.}-{3:3}, at: i915_vma_pin_ww+0x1c4/0x9d0 [i915]
   #4: 88810bad5768 (reservation_ww_class_mutex){+.+.}-{3:3}, at: i915_gem_evict_something+0x110/0x860 [i915]
   #5: 88810bad60e8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: i915_gem_evict_something+0x110/0x860 [i915]
  ...
   #46: 88811964d768 (reservation_ww_class_mutex){+.+.}-{3:3}, at: i915_gem_evict_something+0x110/0x860 [i915]
   #47: 88811964e0e8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: i915_gem_evict_something+0x110/0x860 [i915]
  INFO: lockdep is turned off.

Fixing eviction to nest into ww_class_acquire is a high priority, but
it requires a rework of the entire driver, which can only be done one
step at a time.

As an intermediate solution, add an acquire context to
ww_mutex_trylock, which allows us to do proper nesting annotations on
the trylocks, making the above lockdep splat disappear.

This is also useful in regulator_lock_nested, which may avoid dropping
regulator_nesting_mutex in the uncontended path, so use it there.

TTM may be another user for this, where we could lock a buffer in a

Re: [Intel-gfx] [PATCH v2] kernel/locking: Add context to ww_mutex_trylock.

2021-09-16 Thread Peter Zijlstra
On Thu, Sep 16, 2021 at 03:00:39PM +0200, Maarten Lankhorst wrote:

> > For merge logistics, can we pls have a stable branch? I expect that the
> > i915 patches will be ready for 5.16.
> >
> > Or send it in for -rc2 so that the interface change doesn't cause needless
> > conflicts, whatever you think is best.

> Yeah, some central branch drm could pull from would make upstreaming
> patches that depend on it easier. :)

I think I'll make tip/locking/wwmutex and include that in
tip/locking/core, let me have a poke.


Re: [Intel-gfx] [PATCH v2] kernel/locking: Add context to ww_mutex_trylock.

2021-09-17 Thread Peter Zijlstra
On Thu, Sep 16, 2021 at 03:28:11PM +0200, Peter Zijlstra wrote:
> On Thu, Sep 16, 2021 at 03:00:39PM +0200, Maarten Lankhorst wrote:
> 
> > > For merge logistics, can we pls have a stable branch? I expect that the
> > > i915 patches will be ready for 5.16.
> > >
> > > Or send it in for -rc2 so that the interface change doesn't cause needless
> > > conflicts, whatever you think is best.
> 
> > Yeah, some central branch drm could pull from would make upstreaming
> > patches that depend on it easier. :)
> 
> I think I'll make tip/locking/wwmutex and include that in
> tip/locking/core, let me have a poke.

This is now so. Enjoy!


Re: [Intel-gfx] [RFC 1/6] sched: Add nice value change notifier

2021-09-30 Thread Peter Zijlstra
On Thu, Sep 30, 2021 at 06:15:47PM +0100, Tvrtko Ursulin wrote:
>  void set_user_nice(struct task_struct *p, long nice)
>  {
>   bool queued, running;
> - int old_prio;
> + int old_prio, ret;
>   struct rq_flags rf;
>   struct rq *rq;
>  
> @@ -6913,6 +6945,9 @@ void set_user_nice(struct task_struct *p, long nice)
>*/
>   p->sched_class->prio_changed(rq, p, old_prio);
>  
> + ret = atomic_notifier_call_chain(&user_nice_notifier_list, nice, p);
> + WARN_ON_ONCE(ret != NOTIFY_DONE);
> +
>  out_unlock:
>   task_rq_unlock(rq, p, &rf);
>  }

No, we're not going to call out to exported, and potentially unbounded,
functions under scheduler locks.


Re: [Intel-gfx] [RFC 0/6] CPU + GPU synchronised priority scheduling

2021-09-30 Thread Peter Zijlstra
On Thu, Sep 30, 2021 at 06:15:46PM +0100, Tvrtko Ursulin wrote:

> (Note I did not copy
> everyone on all patches but just the cover letter for context and the rest
> should be available from the mailing list.)

In general, if you can't be arsed to send it to me, I can't be arsed to
dig it out. I've got plenty other email I can read without having to go
looking for more.


Re: [Intel-gfx] [RFC 1/6] sched: Add nice value change notifier

2021-10-01 Thread Peter Zijlstra
On Fri, Oct 01, 2021 at 11:32:16AM +0100, Tvrtko Ursulin wrote:
> 
> On 01/10/2021 10:04, Tvrtko Ursulin wrote:
> > 
> > Hi Peter,
> > 
> > On 30/09/2021 19:33, Peter Zijlstra wrote:
> > > On Thu, Sep 30, 2021 at 06:15:47PM +0100, Tvrtko Ursulin wrote:
> > > >   void set_user_nice(struct task_struct *p, long nice)
> > > >   {
> > > >   bool queued, running;
> > > > -    int old_prio;
> > > > +    int old_prio, ret;
> > > >   struct rq_flags rf;
> > > >   struct rq *rq;
> > > > @@ -6913,6 +6945,9 @@ void set_user_nice(struct task_struct *p,
> > > > long nice)
> > > >    */
> > > >   p->sched_class->prio_changed(rq, p, old_prio);
> > > > +    ret = atomic_notifier_call_chain(&user_nice_notifier_list,
> > > > nice, p);
> > > > +    WARN_ON_ONCE(ret != NOTIFY_DONE);
> > > > +
> > > >   out_unlock:
> > > >   task_rq_unlock(rq, p, &rf);
> > > >   }
> > > 
> > > No, we're not going to call out to exported, and potentially unbounded,
> > > functions under scheduler locks.
> > 
> > Agreed, that's another good point about why it is even more hairy, as I
> > have generally alluded to in the cover letter.
> > 
> > Do you have any immediate thoughts on possible alternatives?
> > 
> > Like, for instance, if I did a queue_work from set_user_nice and then ran
> > the notifier chain async from a worker? I haven't yet looked at what
> > repercussions that would have in terms of having to cancel the pending
> > workers when tasks exit. I can try to prototype that and see how it
> > would look.
> 
> Hm, or I simply move calling the notifier chain to after task_rq_unlock? That
> would leave it running under the tasklist lock, so probably still quite bad.

Hmm? That's for normalize_rt_tasks() only, right? Just don't have it
call the notifier in that special case (that's a magic sysrq thing
anyway).


Re: [Intel-gfx] [RFC 1/6] sched: Add nice value change notifier

2021-10-04 Thread Peter Zijlstra
On Mon, Oct 04, 2021 at 09:12:37AM +0100, Tvrtko Ursulin wrote:
> On 01/10/2021 16:48, Peter Zijlstra wrote:

> > Hmm? That's for normalize_rt_tasks() only, right? Just don't have it
> > call the notifier in that special case (that's a magic sysrq thing
> > anyway).
> 
> You mean my talk about tasklist_lock? No, it is the syscall part I
> am interested in as well. The call chain looks like this:

Urgh, I always miss that because it lives outside of sched.. :/

> sys_setpriority()
> {
>   ...
>   rcu_read_lock();
>   read_lock(&tasklist_lock);
>   ...
>   set_one_prio()
> set_user_nice()
> {
>   ...
>   task_rq_lock();
> -> my notifier from this RFC [1]
>   task_rq_unlock();
> -> I can move the notifier here for _some_ improvement [2]
> }
>   ...
>   read_unlock(&tasklist_lock);
>   rcu_read_unlock();
> }
> 
> So this RFC had the notifier call chain at [1], which I understood was the
> thing you initially pointed out as horrible, being under a scheduler lock.
> 
> I can trivially move it to [2], but that still leaves it under the tasklist
> lock. I don't have a good feel for how much better that would be. If not good
> enough, then I will look for a smarter solution with less opportunity for
> global impact.

So tasklist_lock is pretty terrible and effectively unbounded already
(just create more tasks, etc.), so adding a notifier call there shouldn't
really make it much worse.
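(For illustration, option [2] from the call chain above would look roughly
like the sketch below. This is not merged code, just the shape of the
change; user_nice_notifier_list is the chain from the RFC.)

	void set_user_nice(struct task_struct *p, long nice)
	{
		struct rq_flags rf;
		struct rq *rq;

		rq = task_rq_lock(p, &rf);
		/* ... reprioritisation exactly as before; declarations and
		 * early-exit gotos elided ... */
		p->sched_class->prio_changed(rq, p, old_prio);
	out_unlock:
		task_rq_unlock(rq, p, &rf);

		/* No scheduler locks held any more; only tasklist_lock remains
		 * held when we are called via sys_setpriority(). */
		atomic_notifier_call_chain(&user_nice_notifier_list, nice, p);
	}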




Re: [Intel-gfx] [PATCH 8/8] drm/i915: Don't disable interrupts and pretend a lock has been acquired in __timeline_mark_lock().

2021-10-05 Thread Peter Zijlstra
On Tue, Oct 05, 2021 at 05:00:46PM +0200, Sebastian Andrzej Siewior wrote:
> This is a revert of commits
>   d67739268cf0e ("drm/i915/gt: Mark up the nested engine-pm timeline lock as irqsafe")
>   6c69a45445af9 ("drm/i915/gt: Mark context->active_count as protected by timeline->mutex")
> 
> The existing code leads to different behaviour depending on whether
> lockdep is enabled or not. Any following lock that is acquired without
> disabling interrupts (but needs to) will not be noticed by lockdep.
> 
> This is not just a lockdep annotation: there is an actual mutex_t that
> is properly used as a lock, but in the case of __timeline_mark_lock()
> lockdep is only told that it is acquired while no lock has actually been
> acquired.
> 
> It appears that its purpose is just to satisfy the lockdep_assert_held()
> check in intel_context_mark_active(). The other problem with disabling
> interrupts is that on PREEMPT_RT interrupts are also disabled, which
> leads to problems, for instance later during memory allocation.
> 
> Add an argument to intel_context_mark_active() which is true if the lock
> must have been acquired, false if other magic is involved and the lock
> is not needed. Use the `false' argument only from within
> switch_to_kernel_context() and remove __timeline_mark_lock().
> 
> Cc: Peter Zijlstra 
> Signed-off-by: Sebastian Andrzej Siewior 

Eeew, nice find.


> -static inline void intel_context_mark_active(struct intel_context *ce)
> +static inline void intel_context_mark_active(struct intel_context *ce,
> +					     bool timeline_mutex_needed)
>  {
> - lockdep_assert_held(&ce->timeline->mutex);
> + if (timeline_mutex_needed)
> + lockdep_assert_held(&ce->timeline->mutex);
>   ++ce->active_count;
>  }

Chris, might it be possible to write that something like:

lockdep_assert(lockdep_is_held(&ce->timeline->mutex) ||
   engine_is_parked(ce));

instead?

Otherwise,

Acked-by: Peter Zijlstra (Intel) 
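(A sketch of what the suggested composite assertion could look like in
intel_context_mark_active(); intel_context_is_parking() is a hypothetical
helper name standing in for the "engine is parked" test, not final i915
code:)

	static inline void intel_context_mark_active(struct intel_context *ce)
	{
		/* Either the timeline mutex is really held, or we are on the
		 * engine-parking path, where nothing else can race with us. */
		lockdep_assert(lockdep_is_held(&ce->timeline->mutex) ||
			       intel_context_is_parking(ce));
		++ce->active_count;
	}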


Re: [Intel-gfx] [PATCH 1/3] drm: use the lookup lock in drm_is_current_master

2021-07-27 Thread Peter Zijlstra
On Thu, Jul 22, 2021 at 12:38:10PM +0200, Daniel Vetter wrote:
> On Thu, Jul 22, 2021 at 05:29:27PM +0800, Desmond Cheong Zhi Xi wrote:
> > Inside drm_is_current_master, using the outer drm_device.master_mutex
> > to protect reads of drm_file.master makes the function prone to creating
> > lock hierarchy inversions. Instead, we can use the
> > drm_file.master_lookup_lock that sits at the bottom of the lock
> > hierarchy.
> > 
> > Reported-by: Daniel Vetter 
> > Signed-off-by: Desmond Cheong Zhi Xi 
> > ---
> >  drivers/gpu/drm/drm_auth.c | 9 +
> >  1 file changed, 5 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/drm_auth.c b/drivers/gpu/drm/drm_auth.c
> > index f00354bec3fb..9c24b8cc8e36 100644
> > --- a/drivers/gpu/drm/drm_auth.c
> > +++ b/drivers/gpu/drm/drm_auth.c
> > @@ -63,8 +63,9 @@
> >  
> >  static bool drm_is_current_master_locked(struct drm_file *fpriv)
> >  {
> > -   lockdep_assert_held_once(&fpriv->minor->dev->master_mutex);
> > -
> > +   /* Either drm_device.master_mutex or drm_file.master_lookup_lock
> > +* should be held here.
> > +*/
> 
> Disappointing that lockdep can't check OR'ed conditions for us; a
> lockdep_assert_held_either would be really neat in some cases.
> 
> Adding lockdep folks, maybe they have ideas.

#ifdef CONFIG_LOCKDEP
	WARN_ON_ONCE(debug_locks &&
		     !(lockdep_is_held(&drm_device.master_mutex) ||
		       lockdep_is_held(&drm_file.master_lookup_lock)));
#endif

doesn't exactly roll off the tongue, but should do as you want I
suppose.

Would something like:

#define lockdep_assert(cond)	WARN_ON_ONCE(debug_locks && !(cond))

Such that we can write:

lockdep_assert(lockdep_is_held(&drm_device.master_mutex) ||
   lockdep_is_held(&drm_file.master_lookup_lock));

make it better ?

---
Subject: locking/lockdep: Provide lockdep_assert{,_once}() helpers

Extract lockdep_assert{,_once}() helpers to more easily write composite
assertions like, for example:

lockdep_assert(lockdep_is_held(&drm_device.master_mutex) ||
   lockdep_is_held(&drm_file.master_lookup_lock));

Signed-off-by: Peter Zijlstra (Intel) 
---
diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 5cf387813754..0da67341c1fb 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -306,31 +306,29 @@ extern void lock_unpin_lock(struct lockdep_map *lock, struct pin_cookie);
 
 #define lockdep_depth(tsk) (debug_locks ? (tsk)->lockdep_depth : 0)
 
-#define lockdep_assert_held(l)	do {				\
-		WARN_ON(debug_locks &&				\
-			lockdep_is_held(l) == LOCK_STATE_NOT_HELD);	\
-	} while (0)
+#define lockdep_assert(cond)		\
+	do { WARN_ON(debug_locks && !(cond)); } while (0)
 
-#define lockdep_assert_not_held(l)	do {			\
-		WARN_ON(debug_locks &&				\
-			lockdep_is_held(l) == LOCK_STATE_HELD);	\
-	} while (0)
+#define lockdep_assert_once(cond)	\
+	do { WARN_ON_ONCE(debug_locks && !(cond)); } while (0)
 
-#define lockdep_assert_held_write(l)	do {			\
-		WARN_ON(debug_locks && !lockdep_is_held_type(l, 0));	\
-	} while (0)
+#define lockdep_assert_held(l)	\
+	lockdep_assert(lockdep_is_held(l) != LOCK_STAT_NOT_HELD)
 
-#define lockdep_assert_held_read(l)	do {			\
-		WARN_ON(debug_locks && !lockdep_is_held_type(l, 1));	\
-	} while (0)
+#define lockdep_assert_not_held(l)	\
+	lockdep_assert(lockdep_is_held(l) != LOCK_STATE_HELD)
 
-#define lockdep_assert_held_once(l)	do {			\
-		WARN_ON_ONCE(debug_locks && !lockdep_is_held(l));	\
-	} while (0)
+#define lockdep_assert_held_write(l)	\
+	lockdep_assert(lockdep_is_held_type(l, 0))
 
-#define lockdep_assert_none_held_once()	do {			\
-		WARN_ON_ONCE(debug_locks && current->lockdep_depth);	\
-	} while (0)
+#define lockdep_assert_held_read(l)	\
+	lockdep_assert(lockdep_is_held_type(l, 1))
+
+#define lockdep_assert_held_once(l)	\
+	lockdep_assert_once(lockdep_is_held(l) != LOCK_STAT_NOT_HELD)
+
+#define lockdep_assert_none_held_once()	\
+	lockdep_assert_once(!current->lockdep_depth)
 
 #define lockdep_recursing(tsk) ((tsk)->lockdep_recursion)
 
@@ -407,6 +405,9 @@ extern int lock_is_held(const void *);
 extern int lockdep_is_held(co

Re: [Intel-gfx] [PATCH 1/3] drm: use the lookup lock in drm_is_current_master

2021-07-29 Thread Peter Zijlstra
On Thu, Jul 29, 2021 at 10:32:13PM +0800, Desmond Cheong Zhi Xi wrote:
> Sounds good, will do. Thanks for the patch, Peter.
> 
> Just going to make a small edit:
> s/LOCK_STAT_NOT_HELD/LOCK_STATE_NOT_HELD/

Bah, I knew I should've compile tested it :-), Thanks!


Re: [Intel-gfx] 5.13 i915/PAT regression on Brasswell, adding nopat to the kernel commandline works around this

2021-05-12 Thread Peter Zijlstra
On Wed, May 12, 2021 at 11:57:02AM +0200, Hans de Goede wrote:
> Hi All,
> 
> I'm not sure if this is an i915 bug, or caused by changes elsewhere in the
> kernel, so I thought it would be best to just send out an email and then
> see from there.
> 
> With 5.13-rc1 gdm fails to show and dmesg contains:
> 
> [   38.504613] x86/PAT: Xwayland:683 map pfn RAM range req write-combining for [mem 0x23883000-0x23883fff], got write-back
> 
> [   39.484766] x86/PAT: gnome-shell:632 map pfn RAM range req write-combining for [mem 0x1c6a3000-0x1c6a3fff], got write-back
> 
> [   54.314858] Asynchronous wait on fence :00:02.0:gnome-shell[632]:a timed out (hint:intel_cursor_plane_create [i915])
> [   58.339769] i915 :00:02.0: [drm] GPU HANG: ecode 8:1:86dfdffb, in gnome-shell [632]
> [   58.341161] i915 :00:02.0: [drm] Resetting rcs0 for stopped heartbeat on rcs0
> [   58.341267] i915 :00:02.0: [drm] gnome-shell[632] context reset due to GPU hang
> 
> Because of the PAT errors I tried adding "nopat" to the kernel commandline
> and I'm happy to report that that works around this.
> 
> Any hints on how to debug this further (without doing a full git bisect)
> would be appreciated.

IIRC it's because of 74ffa5a3e685 ("mm: add remap_pfn_range_notrack"),
which added a sanity check to make sure expectations were met. It turns
out they were not.

The bug is not new, the warning is. AFAIK the i915 team is aware, but
other than that I've not followed.


Re: [Intel-gfx] [PATCH] kernel/locking: Add context to ww_mutex_trylock.

2021-09-08 Thread Peter Zijlstra
On Tue, Sep 07, 2021 at 03:20:44PM +0200, Maarten Lankhorst wrote:
> i915 will soon gain an eviction path that trylocks a whole lot of locks
> for eviction, getting dmesg failures like the one below:
> 
> BUG: MAX_LOCK_DEPTH too low!
> turning off the locking correctness validator.
> depth: 48  max: 48!
> 48 locks held by i915_selftest/5776:
>  #0: 888101a79240 (&dev->mutex){}-{3:3}, at: __driver_attach+0x88/0x160
>  #1: c99778c0 (reservation_ww_class_acquire){+.+.}-{0:0}, at: i915_vma_pin.constprop.63+0x39/0x1b0 [i915]
>  #2: 88800cf74de8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: i915_vma_pin.constprop.63+0x5f/0x1b0 [i915]
>  #3: 88810c7f9e38 (&vm->mutex/1){+.+.}-{3:3}, at: i915_vma_pin_ww+0x1c4/0x9d0 [i915]
>  #4: 88810bad5768 (reservation_ww_class_mutex){+.+.}-{3:3}, at: i915_gem_evict_something+0x110/0x860 [i915]
>  #5: 88810bad60e8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: i915_gem_evict_something+0x110/0x860 [i915]
> ...
>  #46: 88811964d768 (reservation_ww_class_mutex){+.+.}-{3:3}, at: i915_gem_evict_something+0x110/0x860 [i915]
>  #47: 88811964e0e8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: i915_gem_evict_something+0x110/0x860 [i915]
> INFO: lockdep is turned off.

> As an intermediate solution, add an acquire context to ww_mutex_trylock,
> which allows us to do proper nesting annotations on the trylocks, making
> the above lockdep splat disappear.

Fair enough I suppose.

> +/**
> + * ww_mutex_trylock - tries to acquire the w/w mutex with optional acquire context
> + * @lock: mutex to lock
> + * @ctx: optional w/w acquire context
> + *
> + * Trylocks a mutex with the optional acquire context; no deadlock detection is
> + * possible. Returns 1 if the mutex has been acquired successfully, 0 otherwise.
> + *
> + * Unlike ww_mutex_lock, no deadlock handling is performed. However, if a @ctx is
> + * specified, -EALREADY and -EDEADLK handling may happen in calls to ww_mutex_lock.
> + *
> + * A mutex acquired with this function must be released with ww_mutex_unlock.
> + */
> +int __sched
> +ww_mutex_trylock(struct ww_mutex *ww, struct ww_acquire_ctx *ctx)
> +{
> + bool locked;
> +
> + if (!ctx)
> + return mutex_trylock(&ww->base);
> +
> +#ifdef CONFIG_DEBUG_MUTEXES
> + DEBUG_LOCKS_WARN_ON(ww->base.magic != &ww->base);
> +#endif
> +
> + preempt_disable();
> + locked = __mutex_trylock(&ww->base);
> +
> + if (locked) {
> + ww_mutex_set_context_fastpath(ww, ctx);
> + mutex_acquire_nest(&ww->base.dep_map, 0, 1, &ctx->dep_map, _RET_IP_);
> + }
> + preempt_enable();
> +
> + return locked;
> +}
> +EXPORT_SYMBOL(ww_mutex_trylock);

You'll need a similar hunk in ww_rt_mutex.c


Re: [Intel-gfx] [PATCH] kernel/locking: Add context to ww_mutex_trylock.

2021-09-09 Thread Peter Zijlstra
On Thu, Sep 09, 2021 at 07:38:06AM +0200, Maarten Lankhorst wrote:

> > You'll need a similar hunk in ww_rt_mutex.c
> 
> What tree has that file?

Linus' tree should have it. Per commit:

  f8635d509d80 ("locking/ww_mutex: Implement rtmutex based ww_mutex API functions")


Re: [Intel-gfx] [PATCH v2] kernel/locking: Add context to ww_mutex_trylock.

2021-09-10 Thread Peter Zijlstra
On Thu, Sep 09, 2021 at 11:32:18AM +0200, Maarten Lankhorst wrote:
> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
> index d456579d0952..791c28005eef 100644
> --- a/kernel/locking/mutex.c
> +++ b/kernel/locking/mutex.c
> @@ -736,6 +736,44 @@ __ww_mutex_lock(struct mutex *lock, unsigned int state, unsigned int subclass,
>   return __mutex_lock_common(lock, state, subclass, NULL, ip, ww_ctx, true);
>  }
>  
> +/**
> + * ww_mutex_trylock - tries to acquire the w/w mutex with optional acquire context
> + * @lock: mutex to lock
> + * @ctx: optional w/w acquire context
> + *
> + * Trylocks a mutex with the optional acquire context; no deadlock detection is
> + * possible. Returns 1 if the mutex has been acquired successfully, 0 otherwise.
> + *
> + * Unlike ww_mutex_lock, no deadlock handling is performed. However, if a @ctx is
> + * specified, -EALREADY and -EDEADLK handling may happen in calls to ww_mutex_lock.
> + *
> + * A mutex acquired with this function must be released with ww_mutex_unlock.
> + */
> +int __sched
> +ww_mutex_trylock(struct ww_mutex *ww, struct ww_acquire_ctx *ctx)
> +{
> + bool locked;
> +
> + if (!ctx)
> + return mutex_trylock(&ww->base);
> +
> +#ifdef CONFIG_DEBUG_MUTEXES
> + DEBUG_LOCKS_WARN_ON(ww->base.magic != &ww->base);
> +#endif
> +
> + preempt_disable();
> + locked = __mutex_trylock(&ww->base);
> +
> + if (locked) {
> + ww_mutex_set_context_fastpath(ww, ctx);
> + mutex_acquire_nest(&ww->base.dep_map, 0, 1, &ctx->dep_map, _RET_IP_);
> + }
> + preempt_enable();
> +
> + return locked;
> +}
> +EXPORT_SYMBOL(ww_mutex_trylock);
> +
>  #ifdef CONFIG_DEBUG_LOCK_ALLOC
>  void __sched
>  mutex_lock_nested(struct mutex *lock, unsigned int subclass)

> diff --git a/kernel/locking/ww_rt_mutex.c b/kernel/locking/ww_rt_mutex.c
> index 3f1fff7d2780..c4cb863edb4c 100644
> --- a/kernel/locking/ww_rt_mutex.c
> +++ b/kernel/locking/ww_rt_mutex.c
> @@ -50,6 +50,18 @@ __ww_rt_mutex_lock(struct ww_mutex *lock, struct ww_acquire_ctx *ww_ctx,
>   return ret;
>  }
>  
> +int __sched
> +ww_mutex_trylock(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
> +{
> + int locked = rt_mutex_trylock(&lock->base);
> +
> + if (locked && ctx)
> + ww_mutex_set_context_fastpath(lock, ctx);
> +
> + return locked;
> +}
> +EXPORT_SYMBOL(ww_mutex_trylock);
> +
>  int __sched
>  ww_mutex_lock(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
>  {

That doesn't look right, how's this for you?

---
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -94,6 +94,9 @@ static inline unsigned long __owner_flags(unsigned long owner)
return owner & MUTEX_FLAGS;
 }
 
+/*
+ * Returns: __mutex_owner(lock) on failure or NULL on success.
+ */
 static inline struct task_struct *__mutex_trylock_common(struct mutex *lock, bool handoff)
 {
unsigned long owner, curr = (unsigned long)current;
@@ -736,6 +739,47 @@ __ww_mutex_lock(struct mutex *lock, unsigned int state, unsigned int subclass,
	return __mutex_lock_common(lock, state, subclass, NULL, ip, ww_ctx, true);
 }
 
+/**
+ * ww_mutex_trylock - tries to acquire the w/w mutex with optional acquire context
+ * @ww: mutex to lock
+ * @ww_ctx: optional w/w acquire context
+ *
+ * Trylocks a mutex with the optional acquire context; no deadlock detection is
+ * possible. Returns 1 if the mutex has been acquired successfully, 0 otherwise.
+ *
+ * Unlike ww_mutex_lock, no deadlock handling is performed. However, if a @ctx is
+ * specified, -EALREADY handling may happen in calls to ww_mutex_trylock.
+ *
+ * A mutex acquired with this function must be released with ww_mutex_unlock.
+ */
+int ww_mutex_trylock(struct ww_mutex *ww, struct ww_acquire_ctx *ww_ctx)
+{
+   if (!ww_ctx)
+   return mutex_trylock(&ww->base);
+
+   MUTEX_WARN_ON(ww->base.magic != &ww->base);
+
+   if (unlikely(ww_ctx == READ_ONCE(ww->ctx)))
+   return -EALREADY;
+
+   /*
+* Reset the wounded flag after a kill. No other process can
+* race and wound us here, since they can't have a valid owner
+* pointer if we don't have any locks held.
+*/
+   if (ww_ctx->acquired == 0)
+   ww_ctx->wounded = 0;
+
+   if (__mutex_trylock(&ww->base)) {
+   ww_mutex_set_context_fastpath(ww, ww_ctx);
+   mutex_acquire_nest(&ww->base.dep_map, 0, 1, &ww_ctx->dep_map, _RET_IP_);
+   return 1;
+   }
+
+   return 0;
+}
+EXPORT_SYMBOL(ww_mutex_trylock);
+
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 void __sched
 mutex_lock_nested(struct mutex *lock, unsigned int subclass)
--- a/kernel/locking/ww_rt_mutex.c
+++ b/kernel/locking/ww_rt_mutex.c
@@ -9,6 +9,34 @@
 #define WW_RT
 #include "rtmutex.c"
 
+int ww_mutex_trylock(struct ww_mutex *lock, struct ww_acquire_ctx *ww_ctx)
+{
+   struct rt_mutex *rtm = &lock->base;
+
+   if (!ww_ctx)
+   return rt_mutex_trylock(rtm);
+
+   if (unlikely(ww_

Re: [Intel-gfx] [PATCH v2] kernel/locking: Add context to ww_mutex_trylock.

2021-09-10 Thread Peter Zijlstra
On Fri, Sep 10, 2021 at 05:02:54PM +0200, Peter Zijlstra wrote:

> That doesn't look right, how's this for you?

Full patch for the robots here:

https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/commit/?h=locking/core&id=826e7b8826f0af185bb93249600533c33fd69a95


Re: [Intel-gfx] [drm:gen8_de_irq_handler [i915]] *ERROR* Fault errors on pipe B: 0x00000080

2021-04-07 Thread Peter Zijlstra
On Wed, Apr 07, 2021 at 09:18:51AM -0400, Rodrigo Vivi wrote:
> On Thu, Jun 04, 2020 at 12:11:07PM +0200, Peter Zijlstra wrote:
> > On Tue, Jun 02, 2020 at 08:27:10PM +0200, Peter Zijlstra wrote:
> > > On Tue, Jun 02, 2020 at 06:08:03PM +, Souza, Jose wrote:
> > > > Hi Peter
> > > > Please file a bug by following these instructions:
> > > > https://gitlab.freedesktop.org/drm/intel/-/wikis/How-to-file-i915-bugs
> > > 
> > > *sigh*, top posting and webforms :-(
> > > 
> > > Steps to reproduce: Boot into X
> > > 
> > > How often: Always
> > > 
> > > uname -r: 5.6.0-2-amd64
> > > 
> > > Machine: Supermicro X11SSZ-F
> > > 
> > > Display connector:
> > > 
> > > [14.907] (II) intel(0): switch to mode 3840x2160@60.0 on DP2 using pipe 0, position (0, 0), rotation normal, reflection none
> > > [14.918] (II) intel(0): switch to mode 3840x2160@60.0 on DP3 using pipe 1, position (0, 0), rotation normal, reflection none
> > > 
> > > 
> > > I'll add the kernel parameters next time I reboot this thing, I'll also
> > > add the latest drm next time I build a kernel for this machine.
> > 
> > dmesg attached, I've yet to build a new kernel for this box..
> 
> Hi Peter,
> do you still face this issue?
> 
> Could you please add some information to 
> https://gitlab.freedesktop.org/drm/intel/-/issues/2004

It went away when I disabled the mitigation crud. Consider it solved.


[Intel-gfx] [PATCH 5/7] xen/privcmd: Use verify_page_range()

2021-04-12 Thread Peter Zijlstra
Avoid using apply_to_page_range() from modules; use the safer
verify_page_range() instead.

Signed-off-by: Peter Zijlstra (Intel) 
---
 drivers/xen/privcmd.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

--- a/drivers/xen/privcmd.c
+++ b/drivers/xen/privcmd.c
@@ -946,9 +946,9 @@ static int privcmd_mmap(struct file *fil
  * on a per pfn/pte basis. Mapping calls that fail with ENOENT
  * can be then retried until success.
  */
-static int is_mapped_fn(pte_t *pte, unsigned long addr, void *data)
+static int is_mapped_fn(pte_t pte, unsigned long addr, void *data)
 {
-   return pte_none(*pte) ? 0 : -EBUSY;
+   return pte_none(pte) ? 0 : -EBUSY;
 }
 
 static int privcmd_vma_range_is_mapped(
@@ -956,8 +956,8 @@ static int privcmd_vma_range_is_mapped(
   unsigned long addr,
   unsigned long nr_pages)
 {
-   return apply_to_page_range(vma->vm_mm, addr, nr_pages << PAGE_SHIFT,
-  is_mapped_fn, NULL) != 0;
+   return verify_page_range(vma->vm_mm, addr, nr_pages << PAGE_SHIFT,
+			    is_mapped_fn, NULL) != 0;
 }
 
 const struct file_operations xen_privcmd_fops = {




[Intel-gfx] [PATCH 7/7] mm: Unexport apply_to_page_range()

2021-04-12 Thread Peter Zijlstra
Now that all module users of apply_to_page_range() have been removed,
unexport this function.

This is an unsafe function in that it gives direct access to the
page-tables.

Signed-off-by: Peter Zijlstra (Intel) 
---
 mm/memory.c | 5 +++-
 1 file changed, 4 insertions(+), 1 deletion(-)

--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2544,13 +2544,14 @@ static int __apply_to_page_range(struct
 /*
  * Scan a region of virtual memory, filling in page tables as necessary
  * and calling a provided function on each leaf page table.
+ *
+ * DO NOT EXPORT; this hands out our page-tables on a platter.
  */
 int apply_to_page_range(struct mm_struct *mm, unsigned long addr,
unsigned long size, pte_fn_t fn, void *data)
 {
return __apply_to_page_range(mm, addr, size, fn, data, true);
 }
-EXPORT_SYMBOL_GPL(apply_to_page_range);
 
 /*
  * Scan a region of virtual memory, calling a provided function on
@@ -2558,6 +2559,8 @@ EXPORT_SYMBOL_GPL(apply_to_page_range);
  *
  * Unlike apply_to_page_range, this does _not_ fill in page tables
  * where they are absent.
+ *
+ * DO NOT EXPORT; this hands out our page-tables on a platter.
  */
 int apply_to_existing_page_range(struct mm_struct *mm, unsigned long addr,
 unsigned long size, pte_fn_t fn, void *data)




[Intel-gfx] [PATCH 1/7] mm: Unexport apply_to_existing_page_range()

2021-04-12 Thread Peter Zijlstra
There are no modular in-tree users; remove the EXPORT.

This is an unsafe function in that it gives direct access to the
page-tables.

Signed-off-by: Peter Zijlstra (Intel) 
---
 mm/memory.c | 1 -
 1 file changed, 1 deletion(-)

--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2558,7 +2558,6 @@ int apply_to_existing_page_range(struct
 {
return __apply_to_page_range(mm, addr, size, fn, data, false);
 }
-EXPORT_SYMBOL_GPL(apply_to_existing_page_range);
 
 /*
  * handle_pte_fault chooses page fault handler according to an entry which was




[Intel-gfx] [PATCH 0/7] mm: Unexport apply_to_page_range()

2021-04-12 Thread Peter Zijlstra
Hi,

These patches remove the last few modular apply_to_page_range() users and
unexport these interfaces. Since these functions provide direct access to our
page-tables, they're a prime target for nefarious activities.

These patches rely on the remap_pfn_range_notrack() patches from Christoph that
are currently already in Andrew's tree.

Please consider.
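(The crux of the series in two callback signatures, shown for contrast;
pte_fn_t is the real typedef from include/linux/mm.h, while the vpr_fn_t
name is just illustrative:)

	/* apply_to_page_range(): the callback gets a live pointer into the
	 * page tables and can rewrite the PTE. */
	typedef int (*pte_fn_t)(pte_t *pte, unsigned long addr, void *data);

	/* verify_page_range(): the callback gets only a value copy of the
	 * PTE; it can inspect but never modify the page tables. */
	typedef int (*vpr_fn_t)(pte_t pte, unsigned long addr, void *data);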



[Intel-gfx] [PATCH 6/7] i915: Convert to verify_page_range()

2021-04-12 Thread Peter Zijlstra
check_{present,absent}() only need R/O access; use verify_page_range()
instead to remove the modular use of apply_to_page_range().

Signed-off-by: Peter Zijlstra (Intel) 
---
 drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
@@ -1225,9 +1225,9 @@ static int igt_mmap_gpu(void *arg)
return 0;
 }
 
-static int check_present_pte(pte_t *pte, unsigned long addr, void *data)
+static int check_present_pte(pte_t pte, unsigned long addr, void *data)
 {
-   if (!pte_present(*pte) || pte_none(*pte)) {
+   if (!pte_present(pte) || pte_none(pte)) {
pr_err("missing PTE:%lx\n",
   (addr - (unsigned long)data) >> PAGE_SHIFT);
return -EINVAL;
@@ -1236,9 +1236,9 @@ static int check_present_pte(pte_t *pte,
return 0;
 }
 
-static int check_absent_pte(pte_t *pte, unsigned long addr, void *data)
+static int check_absent_pte(pte_t pte, unsigned long addr, void *data)
 {
-   if (pte_present(*pte) && !pte_none(*pte)) {
+   if (pte_present(pte) && !pte_none(pte)) {
pr_err("present PTE:%lx; expected to be revoked\n",
   (addr - (unsigned long)data) >> PAGE_SHIFT);
return -EINVAL;
@@ -1249,14 +1249,14 @@ static int check_absent_pte(pte_t *pte,
 
 static int check_present(unsigned long addr, unsigned long len)
 {
-   return apply_to_page_range(current->mm, addr, len,
-  check_present_pte, (void *)addr);
+   return verify_page_range(current->mm, addr, len,
+check_present_pte, (void *)addr);
 }
 
 static int check_absent(unsigned long addr, unsigned long len)
 {
-   return apply_to_page_range(current->mm, addr, len,
-  check_absent_pte, (void *)addr);
+   return verify_page_range(current->mm, addr, len,
+check_absent_pte, (void *)addr);
 }
 
 static int prefault_range(u64 start, u64 len)




[Intel-gfx] [PATCH 3/7] xen/gntdev: Remove apply_to_page_range() use from module

2021-04-12 Thread Peter Zijlstra
Instead of relying on apply_to_page_range() being available to
modules, move its use into core kernel code and export its
application.

Signed-off-by: Peter Zijlstra (Intel) 
---
 drivers/xen/gntdev-common.h |2 ++
 drivers/xen/gntdev.c|   30 +-
 drivers/xen/grant-table.c   |   37 +
 3 files changed, 40 insertions(+), 29 deletions(-)

--- a/drivers/xen/gntdev-common.h
+++ b/drivers/xen/gntdev-common.h
@@ -86,4 +86,6 @@ bool gntdev_test_page_count(unsigned int
 
 int gntdev_map_grant_pages(struct gntdev_grant_map *map);
 
+int gnttab_use_ptemod(struct vm_area_struct *vma, struct gntdev_grant_map *map);
+
 #endif
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -262,32 +262,6 @@ void gntdev_put_map(struct gntdev_priv *
 
 /* -- */
 
-static int find_grant_ptes(pte_t *pte, unsigned long addr, void *data)
-{
-   struct gntdev_grant_map *map = data;
-   unsigned int pgnr = (addr - map->vma->vm_start) >> PAGE_SHIFT;
-   int flags = map->flags | GNTMAP_application_map | GNTMAP_contains_pte;
-   u64 pte_maddr;
-
-   BUG_ON(pgnr >= map->count);
-   pte_maddr = arbitrary_virt_to_machine(pte).maddr;
-
-   /*
-* Set the PTE as special to force get_user_pages_fast() fall
-* back to the slow path.  If this is not supported as part of
-* the grant map, it will be done afterwards.
-*/
-   if (xen_feature(XENFEAT_gnttab_map_avail_bits))
-   flags |= (1 << _GNTMAP_guest_avail0);
-
-   gnttab_set_map_op(&map->map_ops[pgnr], pte_maddr, flags,
- map->grants[pgnr].ref,
- map->grants[pgnr].domid);
-   gnttab_set_unmap_op(&map->unmap_ops[pgnr], pte_maddr, flags,
-   INVALID_GRANT_HANDLE);
-   return 0;
-}
-
 int gntdev_map_grant_pages(struct gntdev_grant_map *map)
 {
int i, err = 0;
@@ -1028,9 +1002,7 @@ static int gntdev_mmap(struct file *flip
mmu_interval_read_begin(&map->notifier);
 
map->pages_vm_start = vma->vm_start;
-   err = apply_to_page_range(vma->vm_mm, vma->vm_start,
- vma->vm_end - vma->vm_start,
- find_grant_ptes, map);
+   err = gnttab_use_ptemod(vma, map);
if (err) {
pr_warn("find_grant_ptes() failure.\n");
goto out_put_map;
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -1591,6 +1591,43 @@ int gnttab_init(void)
 }
 EXPORT_SYMBOL_GPL(gnttab_init);
 
+#include 
+#include "gntdev-common.h"
+
+static int find_grant_ptes(pte_t *pte, unsigned long addr, void *data)
+{
+   struct gntdev_grant_map *map = data;
+   unsigned int pgnr = (addr - map->vma->vm_start) >> PAGE_SHIFT;
+   int flags = map->flags | GNTMAP_application_map | GNTMAP_contains_pte;
+   u64 pte_maddr;
+
+   BUG_ON(pgnr >= map->count);
+   pte_maddr = arbitrary_virt_to_machine(pte).maddr;
+
+   /*
+* Set the PTE as special to force get_user_pages_fast() fall
+* back to the slow path.  If this is not supported as part of
+* the grant map, it will be done afterwards.
+*/
+   if (xen_feature(XENFEAT_gnttab_map_avail_bits))
+   flags |= (1 << _GNTMAP_guest_avail0);
+
+   gnttab_set_map_op(&map->map_ops[pgnr], pte_maddr, flags,
+ map->grants[pgnr].ref,
+ map->grants[pgnr].domid);
+   gnttab_set_unmap_op(&map->unmap_ops[pgnr], pte_maddr, flags,
+   INVALID_GRANT_HANDLE);
+   return 0;
+}
+
+int gnttab_use_ptemod(struct vm_area_struct *vma, struct gntdev_grant_map *map)
+{
+   return apply_to_page_range(vma->vm_mm, vma->vm_start,
+  vma->vm_end - vma->vm_start,
+  find_grant_ptes, map);
+}
+EXPORT_SYMBOL_GPL(gnttab_use_ptemod);
+
 static int __gnttab_init(void)
 {
if (!xen_domain())




[Intel-gfx] [PATCH 4/7] mm: Introduce verify_page_range()

2021-04-12 Thread Peter Zijlstra
Introduce and EXPORT a read-only counterpart to apply_to_page_range().

It only exposes the PTE value, not a pointer to the page tables themselves,
and is thus quite a bit safer to export. A number of
apply_to_page_range() users can be converted to this primitive.

Signed-off-by: Peter Zijlstra (Intel) 
---
 include/linux/mm.h |  4 ++++
 mm/memory.c        | 24 ++++++++++++++++++++++++
 2 files changed, 28 insertions(+)

--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2876,6 +2876,10 @@ extern int apply_to_page_range(struct mm_struct *mm, unsigned long address,
 extern int apply_to_existing_page_range(struct mm_struct *mm,
 				  unsigned long address, unsigned long size,
 				  pte_fn_t fn, void *data);
+extern int verify_page_range(struct mm_struct *mm,
+			     unsigned long addr, unsigned long size,
+			     int (*fn)(pte_t pte, unsigned long addr, void *data),
+			     void *data);
 
 extern void init_mem_debugging_and_hardening(void);
 #ifdef CONFIG_PAGE_POISONING
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2559,6 +2559,30 @@ int apply_to_existing_page_range(struct
return __apply_to_page_range(mm, addr, size, fn, data, false);
 }
 
+struct vpr_data {
+   int (*fn)(pte_t pte, unsigned long addr, void *data);
+   void *data;
+};
+
+static int vpr_fn(pte_t *pte, unsigned long addr, void *data)
+{
+   struct vpr_data *vpr = data;
+   return vpr->fn(*pte, addr, vpr->data);
+}
+
+int verify_page_range(struct mm_struct *mm,
+ unsigned long addr, unsigned long size,
+ int (*fn)(pte_t pte, unsigned long addr, void *data),
+ void *data)
+{
+   struct vpr_data vpr = {
+   .fn = fn,
+   .data = data,
+   };
+   return apply_to_page_range(mm, addr, size, vpr_fn, &vpr);
+}
+EXPORT_SYMBOL_GPL(verify_page_range);
+
 /*
  * handle_pte_fault chooses page fault handler according to an entry which was
  * read non-atomically.  Before making any commitment, on those architectures




[Intel-gfx] [PATCH 2/7] xen/gntdev, x86: Remove apply_to_page_range() use from module

2021-04-12 Thread Peter Zijlstra
Instead of relying on apply_to_page_range() being available to
modules, move its use into core kernel code and export its
application.

NOTE: ideally we do: use_ptemod = !auto_translate_physmap &&
gnttab_map_avail_bits and remove this hack.

Signed-off-by: Peter Zijlstra (Intel) 
---
 arch/x86/include/asm/xen/page.h |  2 ++
 arch/x86/xen/mmu.c              | 26 ++++++++++++++++++++++++++
 drivers/xen/gntdev.c            | 23 +----------------------
 3 files changed, 29 insertions(+), 22 deletions(-)

--- a/arch/x86/include/asm/xen/page.h
+++ b/arch/x86/include/asm/xen/page.h
@@ -370,4 +370,6 @@ static inline unsigned long xen_get_swio
return __get_free_pages(__GFP_NOWARN, order);
 }
 
+extern void xen_set_grant_as_special(struct vm_area_struct *vma);
+
 #endif /* _ASM_X86_XEN_PAGE_H */
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -51,3 +51,29 @@ int xen_unmap_domain_gfn_range(struct vm
return -EINVAL;
 }
 EXPORT_SYMBOL_GPL(xen_unmap_domain_gfn_range);
+
+static int set_grant_ptes_as_special(pte_t *pte, unsigned long addr, void *data)
+{
+   set_pte_at(current->mm, addr, pte, pte_mkspecial(*pte));
+   return 0;
+}
+
+void xen_set_grant_as_special(struct vm_area_struct *vma)
+{
+   if (xen_feature(XENFEAT_gnttab_map_avail_bits))
+   return;
+
+   /*
+* If the PTEs were not made special by the grant map
+* hypercall, do so here.
+*
+* This is racy since the mapping is already visible
+* to userspace but userspace should be well-behaved
+* enough to not touch it until the mmap() call
+* returns.
+*/
+   apply_to_page_range(vma->vm_mm, vma->vm_start,
+   vma->vm_end - vma->vm_start,
+   set_grant_ptes_as_special, NULL);
+}
+EXPORT_SYMBOL_GPL(xen_set_grant_as_special);
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -278,14 +278,6 @@ static int find_grant_ptes(pte_t *pte, u
return 0;
 }
 
-#ifdef CONFIG_X86
-static int set_grant_ptes_as_special(pte_t *pte, unsigned long addr, void *data)
-{
-   set_pte_at(current->mm, addr, pte, pte_mkspecial(*pte));
-   return 0;
-}
-#endif
-
 int gntdev_map_grant_pages(struct gntdev_grant_map *map)
 {
int i, err = 0;
@@ -1040,20 +1032,7 @@ static int gntdev_mmap(struct file *flip
goto out_put_map;
} else {
 #ifdef CONFIG_X86
-   /*
-* If the PTEs were not made special by the grant map
-* hypercall, do so here.
-*
-* This is racy since the mapping is already visible
-* to userspace but userspace should be well-behaved
-* enough to not touch it until the mmap() call
-* returns.
-*/
-   if (!xen_feature(XENFEAT_gnttab_map_avail_bits)) {
-   apply_to_page_range(vma->vm_mm, vma->vm_start,
-   vma->vm_end - vma->vm_start,
-   set_grant_ptes_as_special, NULL);
-   }
+   xen_set_grant_as_special(vma);
 #endif
}
 




Re: [Intel-gfx] [PATCH 3/7] xen/gntdev: Remove apply_to_page_range() use from module

2021-04-12 Thread Peter Zijlstra
On Mon, Apr 12, 2021 at 10:27:21AM +0200, Christoph Hellwig wrote:
> On Mon, Apr 12, 2021 at 10:00:15AM +0200, Peter Zijlstra wrote:
> > --- a/drivers/xen/grant-table.c
> > +++ b/drivers/xen/grant-table.c
> > @@ -1591,6 +1591,43 @@ int gnttab_init(void)
> >  }
> >  EXPORT_SYMBOL_GPL(gnttab_init);
> >  
> > +#include 
> > +#include "gntdev-common.h"
> 
> Can't we keep the includes at the top of the file?

Probably could, lemme move them.


Re: [Intel-gfx] [PATCH 4/7] mm: Introduce verify_page_range()

2021-04-12 Thread Peter Zijlstra
On Mon, Apr 12, 2021 at 10:28:05AM +0200, Christoph Hellwig wrote:
> On Mon, Apr 12, 2021 at 10:00:16AM +0200, Peter Zijlstra wrote:
> > +extern int verify_page_range(struct mm_struct *mm,
> 
> No need for the extern here.

It's consistent with the rest of the functions there. Also, I personally
like that extern.

> > +int verify_page_range(struct mm_struct *mm,
> > + unsigned long addr, unsigned long size,
> > + int (*fn)(pte_t pte, unsigned long addr, void *data),
> > + void *data)
> 
> A kerneldoc comment would be nice for this function.
> 
> Otherwise this looks fine.

Something like so?

/**
 * verify_page_range() - Scan (and fill) a range of virtual memory and validate PTEs
 * @mm: mm identifying the virtual memory map
 * @addr: starting virtual address of the range
 * @size: size of the range
 * @fn: function that verifies the PTEs
 * @data: opaque data passed to @fn
 *
 * Scan a region of virtual memory, filling in page tables as necessary and
 * calling a provided function on each leaf, providing a copy of the
 * page-table-entry.
 *
 * Similar to apply_to_page_range(), but does not provide direct access to the
 * page-tables.
 *
 * NOTE! this function does not work correctly vs large pages.
 *
 * Return: the first !0 return of the provided function, or 0 on completion.
 */
int verify_page_range(struct mm_struct *mm,
  unsigned long addr, unsigned long size,
  int (*fn)(pte_t pte, unsigned long addr, void *data),
  void *data)


Re: [Intel-gfx] [PATCH 2/7] xen/gntdev, x86: Remove apply_to_page_range() use from module

2021-04-12 Thread Peter Zijlstra
On Mon, Apr 12, 2021 at 10:26:40AM +0200, Christoph Hellwig wrote:
> On Mon, Apr 12, 2021 at 10:00:14AM +0200, Peter Zijlstra wrote:
> > Instead of relying on apply_to_page_range() being available to
> > modules, move its use into core kernel code and export it's
> > application.
> 
> This doesn't exactly look great, but at least it contains the damage..

That was just about as far as I got...

> > NOTE: ideally we do: use_ptemod = !auto_translate_physmap &&
> > gnttab_map_avail_bits and remove this hack.
> 
> Given how much pain the !auto_translate_physmap case causes all over
> the kernel, I wonder what a realistic timeline for dropping support
> for this case might be..

I've no experience with any of that; it looks absolutely disgusting,
all of it. I figured that's part of the Xen 'charm' :-)


Re: [Intel-gfx] ✗ Fi.CI.BUILD: failure for mm: Unexport apply_to_page_range()

2021-04-12 Thread Peter Zijlstra
On Mon, Apr 12, 2021 at 09:00:16AM -, Patchwork wrote:
> == Series Details ==
> 
> Series: mm: Unexport apply_to_page_range()
> URL   : https://patchwork.freedesktop.org/series/88943/
> State : failure
> 
> == Summary ==
> 
>   CALL    scripts/checksyscalls.sh
>   CALL    scripts/atomic/check-atomics.sh
>   DESCEND objtool
>   CHK     include/generated/compile.h
> Kernel: arch/x86/boot/bzImage is ready  (#1)
>   MODPOST Module.symvers
> ERROR: modpost: "apply_to_page_range" [drivers/gpu/drm/i915/i915.ko] undefined!

This would be your robot not including the remap_pfn_range_notrack()
patches from Christoph I suppose..


Re: [Intel-gfx] [PATCH 4/7] mm: Introduce verify_page_range()

2021-04-12 Thread Peter Zijlstra
On Mon, Apr 12, 2021 at 01:05:09PM -0700, Kees Cook wrote:
> On Mon, Apr 12, 2021 at 10:00:16AM +0200, Peter Zijlstra wrote:
> > +struct vpr_data {
> > +   int (*fn)(pte_t pte, unsigned long addr, void *data);
> > +   void *data;
> > +};
> 
> Eeerg. This is likely to become an attack target itself. Stored function
> pointer with stored (3rd) argument.

You got some further reading on that? How exactly are those exploited?


Re: [Intel-gfx] [PATCH 6/7] i915: Convert to verify_page_range()

2021-04-12 Thread Peter Zijlstra
On Mon, Apr 12, 2021 at 01:08:38PM -0700, Kees Cook wrote:
> On Mon, Apr 12, 2021 at 10:00:18AM +0200, Peter Zijlstra wrote:
> > @@ -1249,14 +1249,14 @@ static int check_absent_pte(pte_t *pte,
> >  
> >  static int check_present(unsigned long addr, unsigned long len)
> >  {
> > -   return apply_to_page_range(current->mm, addr, len,
> > -  check_present_pte, (void *)addr);
> > +   return verify_page_range(current->mm, addr, len,
> > +check_present_pte, (void *)addr);
> 
> For example, switch to returning the bad addr through verify_page_range(),
> or have a by-reference value, etc.:
> 
>   unsigned long failed;
> 
>   failed = verify_page_range(current->mm, addr, len, check_present_pte);
>   if (failed) {
>   pr_err("missing PTE:%lx\n",
>  (failed - addr) >> PAGE_SHIFT);

OK, lemme try that.


Re: [Intel-gfx] [PATCH 4/7] mm: Introduce verify_page_range()

2021-04-13 Thread Peter Zijlstra
On Mon, Apr 12, 2021 at 01:05:09PM -0700, Kees Cook wrote:
> On Mon, Apr 12, 2021 at 10:00:16AM +0200, Peter Zijlstra wrote:
> > +struct vpr_data {
> > +   int (*fn)(pte_t pte, unsigned long addr, void *data);
> > +   void *data;
> > +};
> 
> Eeerg. This is likely to become an attack target itself. Stored function
> pointer with stored (3rd) argument.
> 
> This doesn't seem needed: only DRM uses it, and that's for error
> reporting. I'd rather plumb back errors in a way to not have to add
> another place in the kernel where we do func+arg stored calling.

Is this any better? It does have the stored pointer, but not a stored
argument, assuming you don't count returns as arguments I suppose.

The alternative is refactoring apply_to_page_range() :-/

---

struct vpr_data {
bool (*fn)(pte_t pte, unsigned long addr);
unsigned long addr;
};

static int vpr_fn(pte_t *pte, unsigned long addr, void *data)
{
struct vpr_data *vpr = data;
if (!vpr->fn(*pte, addr)) {
vpr->addr = addr;
return -EINVAL;
}
return 0;
}

/**
 * verify_page_range() - Scan (and fill) a range of virtual memory and validate PTEs
 * @mm: mm identifying the virtual memory map
 * @addr: starting virtual address of the range
 * @size: size of the range
 * @fn: function that verifies the PTEs
 *
 * Scan a region of virtual memory, filling in page tables as necessary and
 * calling a provided function on each leaf, providing a copy of the
 * page-table-entry.
 *
 * Similar to apply_to_page_range(), but does not provide direct access to the
 * page-tables.
 *
 * NOTE! this function does not work correctly vs large pages.
 *
 * Return: the address that failed verification or 0 on success.
 */
unsigned long verify_page_range(struct mm_struct *mm,
unsigned long addr, unsigned long size,
bool (*fn)(pte_t pte, unsigned long addr))
{
struct vpr_data vpr = {
.fn = fn,
.addr = 0,
};
apply_to_page_range(mm, addr, size, vpr_fn, &vpr);
return vpr.addr;
}
EXPORT_SYMBOL_GPL(verify_page_range);
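(For comparison, the i915 selftest conversion from patch 6/7 would then
shrink to something like the following, a sketch against the v2 interface
above, not posted code:)

	static bool check_present_pte(pte_t pte, unsigned long addr)
	{
		return pte_present(pte) && !pte_none(pte);
	}

	static int check_present(unsigned long addr, unsigned long len)
	{
		unsigned long failed;

		/* returns the first address whose PTE fails the check, 0 if OK */
		failed = verify_page_range(current->mm, addr, len, check_present_pte);
		if (failed) {
			pr_err("missing PTE:%lx\n", (failed - addr) >> PAGE_SHIFT);
			return -EINVAL;
		}
		return 0;
	}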


Re: [Intel-gfx] [PATCH 4/7] mm: Introduce verify_page_range()

2021-04-14 Thread Peter Zijlstra
On Tue, Apr 13, 2021 at 08:01:08PM -0700, Kees Cook wrote:
> So the addr can just be encoded in "int", and no structure is needed at:
> 
> typedef bool (*vpr_fn_t)(pte_t pte);
> 
> static int vpr_fn(pte_t *pte, unsigned long addr, void *data)
> {
>   vpr_fn_t callback = data;
> 
>   if (!callback(*pte))
>   return addr >> PAGE_SHIFT;
>   return 0;
> }
> 
> unsigned long verify_page_range(struct mm_struct *mm,
>   unsigned long addr, unsigned long size,
>   vpr_fn_t callback)
> {
>   return apply_to_page_range(mm, addr, size, vpr_fn, callback) << PAGE_SHIFT;
> }
> 
> But maybe I'm missing something?

That covers only (32+12) bits of address space (an int carries 32 bits of
page index, and with a 12-bit PAGE_SHIFT that is 2^44 bytes, i.e. 16 TiB)
and will mostly work, but we definitely support architectures (very much
including x86_64, with its 47-bit or even 56-bit user address space) with
larger address spaces than that.


Re: [Intel-gfx] [PATCH] RFC: console: hack up console_lock more v3

2019-05-09 Thread Peter Zijlstra
On Thu, May 09, 2019 at 02:09:03PM +0200, Daniel Vetter wrote:
> Fix this by creating a printk_safe_up() which calls wake_up_process
> outside of the spinlock. This isn't correct in full generality, but
> good enough for console_lock:
> 
> - console_lock doesn't use interruptible or killable or timeout down()
>   calls, hence an up() is the only thing that can wake up a process.

Wrong :/ Any task can be woken at any random time. We must, at all
times, assume spurious wakeups will happen.

> +void printk_safe_up(struct semaphore *sem)
> +{
> + unsigned long flags;
> + struct semaphore_waiter *waiter = NULL;
> +
> + raw_spin_lock_irqsave(&sem->lock, flags);
> + if (likely(list_empty(&sem->wait_list))) {
> + sem->count++;
> + } else {
> + waiter = list_first_entry(&sem->wait_list,
> +   struct semaphore_waiter, list);
> + list_del(&waiter->list);
> + waiter->up = true;
> + }
> + raw_spin_unlock_irqrestore(&sem->lock, flags);
> +
> + if (waiter) /* protected by being sole wake source */
> + wake_up_process(waiter->task);
> +}
> +EXPORT_SYMBOL(printk_safe_up);

Since it's only used from printk, that EXPORT really isn't needed.

Something like the below might work.

---
 kernel/locking/semaphore.c | 26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/kernel/locking/semaphore.c b/kernel/locking/semaphore.c
index 561acdd39960..ac0a67e95aac 100644
--- a/kernel/locking/semaphore.c
+++ b/kernel/locking/semaphore.c
@@ -38,7 +38,6 @@ static noinline void __down(struct semaphore *sem);
 static noinline int __down_interruptible(struct semaphore *sem);
 static noinline int __down_killable(struct semaphore *sem);
 static noinline int __down_timeout(struct semaphore *sem, long timeout);
-static noinline void __up(struct semaphore *sem);
 
 /**
  * down - acquire the semaphore
@@ -178,14 +177,24 @@ EXPORT_SYMBOL(down_timeout);
  */
 void up(struct semaphore *sem)
 {
+   struct semaphore_waiter *waiter;
+   DEFINE_WAKE_Q(wake_q);
unsigned long flags;
 
raw_spin_lock_irqsave(&sem->lock, flags);
-   if (likely(list_empty(&sem->wait_list)))
+   if (likely(list_empty(&sem->wait_list))) {
sem->count++;
-   else
-   __up(sem);
+   goto unlock;
+   }
+
+   waiter = list_first_entry(&sem->wait_list, struct semaphore_waiter, list);
+   list_del(&waiter->list);
+   waiter->up = true;
+   wake_q_add(&wake_q, waiter->task);
+unlock:
raw_spin_unlock_irqrestore(&sem->lock, flags);
+
+   wake_up_q(&wake_q);
 }
 EXPORT_SYMBOL(up);
 
@@ -252,12 +261,3 @@ static noinline int __sched __down_timeout(struct semaphore *sem, long timeout)
 {
return __down_common(sem, TASK_UNINTERRUPTIBLE, timeout);
 }
-
-static noinline void __sched __up(struct semaphore *sem)
-{
-   struct semaphore_waiter *waiter = list_first_entry(&sem->wait_list,
-   struct semaphore_waiter, list);
-   list_del(&waiter->list);
-   waiter->up = true;
-   wake_up_process(waiter->task);
-}

Re: [Intel-gfx] [PATCH] RFC: console: hack up console_lock more v2

2019-05-09 Thread Peter Zijlstra
On Thu, May 09, 2019 at 11:32:57AM +0100, Chris Wilson wrote:
> Quoting Daniel Vetter (2019-05-06 08:45:53)
> > +/**
> > + * printk_safe_up - release the semaphore in console_unlock
> > + * @sem: the semaphore to release
> > + *
> > + * Release the semaphore.  Unlike mutexes, up() may be called from any
> > + * context and even by tasks which have never called down().
> > + *
> > + * NOTE: This is a special version of up() for console_unlock only. It is 
> > only
> > + * safe if there are no killable, interruptible or timing out down() calls.
> > + */
> > +void printk_safe_up(struct semaphore *sem)
> > +{
> > +   unsigned long flags;
> > +   struct semaphore_waiter *waiter = NULL;
> > +
> > +   raw_spin_lock_irqsave(&sem->lock, flags);
> > +   if (likely(list_empty(&sem->wait_list))) {
> > +   sem->count++;
> > +   } else {
> > +   waiter = list_first_entry(&sem->wait_list,
> > + struct semaphore_waiter, list);
> > +   list_del(&waiter->list);
> > +   waiter->up = true;
> > +   }
> > +   raw_spin_unlock_irqrestore(&sem->lock, flags);
> > +
> > +   if (waiter)
> > +   wake_up_process(waiter->task);
> 
> From comparing against __down_common() there's a risk here that as soon
> as waiter->up == true, the waiter may complete and make the onstack
> struct semaphore_waiter invalid. If you store waiter->task locally under
> the spinlock that problem is resolved.
> 
> Then there is the issue of an unprotected dereference of the task in
> wake_up_process() -- I think you can wrap this function with
> rcu_read_lock() to keep that safe, and wake_up_process() should be a
> no-op if it races against process termination.

task_struct is not RCU protected, see task_rcu_dereference() for magic.
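
A sketch of Chris's first point above (snapshot the task pointer under
the spinlock, before waiter->up = true lets the on-stack waiter
disappear; the task-lifetime question is a separate issue):

	void printk_safe_up(struct semaphore *sem)
	{
		struct task_struct *task = NULL;
		struct semaphore_waiter *waiter;
		unsigned long flags;

		raw_spin_lock_irqsave(&sem->lock, flags);
		if (likely(list_empty(&sem->wait_list))) {
			sem->count++;
		} else {
			waiter = list_first_entry(&sem->wait_list,
						  struct semaphore_waiter, list);
			list_del(&waiter->list);
			task = waiter->task;	/* read before ->up = true */
			waiter->up = true;
		}
		raw_spin_unlock_irqrestore(&sem->lock, flags);

		if (task)
			wake_up_process(task);
	}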

Re: [Intel-gfx] [PATCH] RFC: console: hack up console_lock more v3

2019-05-09 Thread Peter Zijlstra
On Thu, May 09, 2019 at 03:06:09PM +0200, Daniel Vetter wrote:
> On Thu, May 9, 2019 at 2:31 PM Peter Zijlstra  wrote:
> > On Thu, May 09, 2019 at 02:09:03PM +0200, Daniel Vetter wrote:
> > > Fix this by creating a printk_safe_up() which calls wake_up_process
> > > outside of the spinlock. This isn't correct in full generality, but
> > > good enough for console_lock:
> > >
> > > - console_lock doesn't use interruptible or killable or timeout down()
> > >   calls, hence an up() is the only thing that can wake up a process.
> >
> > Wrong :/ Any task can be woken at any random time. We must, at all
> > times, assume spurious wakeups will happen.
> 
> Out of curiosity, where do these come from? I know about the races
> where you need to recheck on the waiter side to avoid getting stuck,
> but didn't know about this. Are these earlier (possibly spurious)
> wakeups that got held up and delayed for a while, then hit the task
> much later when it's already continued doing something else?

Yes, this. So they all more or less have the form:

CPU0                            CPU1

enqueue_waiter()
                                done = true;
                                if (waiters)
for (;;) {
  if (done)
    break;

  ...
}

dequeue_waiter()

do something else again

                                wake_up_task



The wake_q thing made the above much more common, but we've had it
forever.
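
The defensive pattern this implies on the sleep side (sketch; 'done' is
the wait condition from the diagram above, set by the waker):

	for (;;) {
		set_current_state(TASK_UNINTERRUPTIBLE);
		if (done)
			break;
		schedule();	/* may return on a stale/spurious wakeup */
	}
	__set_current_state(TASK_RUNNING);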

> Or even
> more random, and even if I never put a task on a wait list or anything
> else, ever, it can get woken spuriously?

I had patches that did that on purpose, but no.

> > Something like the below might work.
> 
> Yeah that looks like the proper fix. I guess semaphores are uncritical
> enough that we can roll this out for everyone. Thanks for the hint.

It's actually an optimization that we never did because semaphores are
so uncritical :-)

The thing is, by delaying the wakup until after we've released the
spinlock, the waiter will not contend on the spinlock the moment it
wakes.

Re: [Intel-gfx] [patch V2 28/29] stacktrace: Provide common infrastructure

2019-04-19 Thread Peter Zijlstra
On Thu, Apr 18, 2019 at 05:42:55PM +0200, Thomas Gleixner wrote:
> On Thu, 18 Apr 2019, Josh Poimboeuf wrote:

> > Another idea I had (but never got a chance to work on) was to extend the
> > x86 unwind interface to all arches.  So instead of the callbacks, each
> > arch would implement something like this API:

> I surely thought about that, but after staring at all incarnations of
> arch/*/stacktrace.c I just gave up.
> 
> Aside of that quite some archs already have callback based unwinders
> because they use them for more than stacktracing and just have a single
> implementation of that loop.
> 
> I'm fine either way. We can start with x86 and then let archs convert over
> their stuff, but I wouldn't hold my breath that this will be completed in
> the forseeable future.

I suggested the same to Thomas early on, and I even spend the time to
convert some $random arch to the iterator interface, and while it is
indeed entirely feasible, it is _far_ more work.

The callback thing OTOH is flexible enough to do what we want to do now,
and allows converting most archs to it without too much pain (as Thomas
said, many archs are already in this form and only need minor API
adjustments), which gets us in a far better place than we are now.

And we can always go to iterators later on. But I think getting the
generic unwinder improved across all archs is a really important first
step here.

Re: [Intel-gfx] [patch V2 28/29] stacktrace: Provide common infrastructure

2019-04-19 Thread Peter Zijlstra
On Thu, Apr 18, 2019 at 10:41:47AM +0200, Thomas Gleixner wrote:

> +typedef bool (*stack_trace_consume_fn)(void *cookie, unsigned long addr,
> +  bool reliable);

> +void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
> +  struct task_struct *task, struct pt_regs *regs);
> +int arch_stack_walk_reliable(stack_trace_consume_fn consume_entry, void 
> *cookie,
> +  struct task_struct *task);

This bugs me a little; ideally the _reliable() thing would not exists.

Thomas said that the existing __save_stack_trace_reliable() is different
enough for the unification to be non-trivial, but maybe Josh can help
out?

From what I can see the biggest significant differences are:

 - it looks at the regs sets on the stack and for FP bails early
 - bails for kthreads and idle (after it does all the hard work!?!)

The first (FP checking for exceptions) should probably be reflected in
consume_fn(.reliable) anyway -- although that would mean a lot of extra
'?' entries where there are none today.

And the second (KTHREAD/IDLE) is something that the generic code can
easily do before calling into the arch unwinder.

Hmm?


Re: [Intel-gfx] [patch V2 28/29] stacktrace: Provide common infrastructure

2019-04-19 Thread Peter Zijlstra
On Fri, Apr 19, 2019 at 10:32:30AM +0200, Thomas Gleixner wrote:
> On Fri, 19 Apr 2019, Peter Zijlstra wrote:
> > On Thu, Apr 18, 2019 at 10:41:47AM +0200, Thomas Gleixner wrote:
> > 
> > > +typedef bool (*stack_trace_consume_fn)(void *cookie, unsigned long addr,
> > > +  bool reliable);
> > 
> > > +void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
> > > +  struct task_struct *task, struct pt_regs *regs);
> > > +int arch_stack_walk_reliable(stack_trace_consume_fn consume_entry, void 
> > > *cookie,
> > > +  struct task_struct *task);
> > 
> > This bugs me a little; ideally the _reliable() thing would not exists.
> > 
> > Thomas said that the existing __save_stack_trace_reliable() is different
> > enough for the unification to be non-trivial, but maybe Josh can help
> > out?
> > 
> > >From what I can see the biggest significant differences are:
> > 
> >  - it looks at the regs sets on the stack and for FP bails early
> >  - bails for kthreads and idle (after it does all the hard work!?!)
> > 
> > The first (FP checking for exceptions) should probably be reflected in
> > consume_fn(.reliable) anyway -- although that would mean a lot of extra
> > '?' entries where there are none today.
> > 
> > And the second (KTHREAD/IDLE) is something that the generic code can
> > easily do before calling into the arch unwinder.
> 
> And looking at the powerpc version of it, that has even more interesting
> extra checks in that function.

Right, but not fundamentally different from determining @reliable I
think.

Anyway, it would be good if someone knowledgable could have a look at
this.

Re: [Intel-gfx] [patch V2 18/29] lockdep: Move stack trace logic into check_prev_add()

2019-04-24 Thread Peter Zijlstra
On Thu, Apr 18, 2019 at 10:41:37AM +0200, Thomas Gleixner wrote:
> There is only one caller of check_prev_add() which hands in a zeroed struct
> stack trace and a function pointer to save_stack(). Inside check_prev_add()
> the stack_trace struct is checked for being empty, which is always
> true. Based on that one code path stores a stack trace which is unused. The
> comment there does not make sense either. It's all leftovers from
> historical lockdep code (cross release).

I was more or less expecting a revert of:

ce07a9415f26 ("locking/lockdep: Make check_prev_add() able to handle external stack_trace")

And then I read the comment that went with the "static struct
stack_trace trace" that got removed (in the above commit) and realized
that your patch will consume more stack entries.

The problem is when the held lock stack in check_prevs_add() has multiple
trylock entries on top, in that case we call check_prev_add() multiple
times, and this patch will then save the exact same stack-trace multiple
times, consuming static resources.

Possibly we should copy what stackdepot does (but we cannot use it
directly because stackdepot uses locks; but possible we can share bits),
but that is a patch for another day I think.

So while convoluted, perhaps we should retain this code for now.

Re: [Intel-gfx] [patch V2 19/29] lockdep: Simplify stack trace handling

2019-04-24 Thread Peter Zijlstra
On Thu, Apr 18, 2019 at 10:41:38AM +0200, Thomas Gleixner wrote:
> Replace the indirection through struct stack_trace by using the storage
> array based interfaces and storing the information in a small lockdep
> specific data structure.
> 

Acked-by: Peter Zijlstra (Intel) 

Re: [Intel-gfx] [patch V3 18/29] lockdep: Remove save argument from check_prev_add()

2019-04-25 Thread Peter Zijlstra
On Thu, Apr 25, 2019 at 11:45:11AM +0200, Thomas Gleixner wrote:
> There is only one caller which hands in save_trace as function pointer.
> 
> Signed-off-by: Thomas Gleixner 

Acked-by: Peter Zijlstra (Intel) 

Re: [Intel-gfx] linux-next: build failure after merge of the tip tree (from the drm-intel tree)

2016-07-05 Thread Peter Zijlstra
On Tue, Jul 05, 2016 at 01:53:03PM +1000, Stephen Rothwell wrote:
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index d3502c0603e5..1f91f187b2a8 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3290,7 +3290,7 @@ i915_gem_retire_work_handler(struct work_struct *work)
>* We do not need to do this test under locking as in the worst-case
>* we queue the retire worker once too often.
>*/
> - if (lockless_dereference(dev_priv->gt.awake))
> + if (/*lockless_dereference*/(dev_priv->gt.awake))
>   queue_delayed_work(dev_priv->wq,
>  &dev_priv->gt.retire_work,
>  round_jiffies_up_relative(HZ));
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index f6de8dd567a2..2c1926418691 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -3095,7 +3095,7 @@ static void i915_hangcheck_elapsed(struct work_struct 
> *work)
>   if (!i915.enable_hangcheck)
>   return;
>  
> - if (!lockless_dereference(dev_priv->gt.awake))
> + if (!/*lockless_dereference*/(dev_priv->gt.awake))
>   return;
>  
>   /* As enabling the GPU requires fairly extensive mmio access,

Right, neither case appears to include a data dependency and thus
lockless_dereference() seems misguided.
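
For context, lockless_dereference() at the time was roughly the below;
the barrier only buys ordering for later loads that depend on the
returned pointer, which a plain boolean test like gt.awake does not
have:

	#define lockless_dereference(p) \
	({ \
		typeof(p) _________p1 = READ_ONCE(p); \
		smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
		(_________p1); \
	})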



Re: [Intel-gfx] [PATCH 2/4] kernel.h: Add non_block_start/end()

2018-12-10 Thread Peter Zijlstra
On Mon, Dec 10, 2018 at 03:13:37PM +0100, Michal Hocko wrote:
> I do not see any scheduler guys Cced and it would be really great to get
> their opinion here.
> 
> On Mon 10-12-18 11:36:39, Daniel Vetter wrote:
> > In some special cases we must not block, but there's not a
> > spinlock, preempt-off, irqs-off or similar critical section already
> > that arms the might_sleep() debug checks. Add a non_block_start/end()
> > pair to annotate these.
> > 
> > This will be used in the oom paths of mmu-notifiers, where blocking is
> > not allowed to make sure there's forward progress.
> 
> Considering the only alternative would be to abuse
> preempt_{disable,enable}, and that really has a different semantic, I
> think this makes some sense. The context is preemptible but we do not
> want notifier to sleep on any locks, WQ etc.

I'm confused... what is this supposed to do?

And what does 'block' mean here? Without preempt_disable/IRQ-off we're
subject to regular preemption and execution can stall for arbitrary
amounts of time.

The Changelog doesn't yield any clues.


Re: [Intel-gfx] [PATCH 2/4] kernel.h: Add non_block_start/end()

2018-12-10 Thread Peter Zijlstra
On Mon, Dec 10, 2018 at 05:20:10PM +0100, Michal Hocko wrote:
> > OK, no real objections to the thing.  Just so long we're all on the same
> > page as to what it does and doesn't do ;-)
> 
> I am not really sure whether there are other potential users besides
> this one and whether the check as such is justified.

It's a debug option...

> > I suppose you could extend the check to include schedule_debug() as
> > well, maybe something like:
> 
> Do you mean to make the check cheaper?

Nah, so the patch only touched might_sleep(), the below touches
schedule().

If there were a patch that hits schedule() without going through a
might_sleep() (rare in practise I think, but entirely possible) then you
won't get a splat without something like the below on top.

> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index f66920173370..b1aaa278f1af 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -3278,13 +3278,18 @@ static noinline void __schedule_bug(struct 
> > task_struct *prev)
> >  /*
> >   * Various schedule()-time debugging checks and statistics:
> >   */
> > -static inline void schedule_debug(struct task_struct *prev)
> > +static inline void schedule_debug(struct task_struct *prev, bool preempt)
> >  {
> >  #ifdef CONFIG_SCHED_STACK_END_CHECK
> > if (task_stack_end_corrupted(prev))
> > panic("corrupted stack end detected inside scheduler\n");
> >  #endif
> >  
> > +#ifdef CONFIG_DEBUG_ATOMIC_SLEEP
> > +   if (!preempt && prev->state && prev->non_block_count)
> > +   // splat
> > +#endif
> > +
> > if (unlikely(in_atomic_preempt_off())) {
> > __schedule_bug(prev);
> > preempt_count_set(PREEMPT_DISABLED);
> > @@ -3391,7 +3396,7 @@ static void __sched notrace __schedule(bool preempt)
> > rq = cpu_rq(cpu);
> > prev = rq->curr;
> >  
> > -   schedule_debug(prev);
> > +   schedule_debug(prev, preempt);
> >  
> > if (sched_feat(HRTICK))
> > hrtick_clear(rq);
> 
> -- 
> Michal Hocko
> SUSE Labs


Re: [Intel-gfx] [PATCH 2/4] kernel.h: Add non_block_start/end()

2018-12-10 Thread Peter Zijlstra
On Mon, Dec 10, 2018 at 04:01:59PM +0100, Michal Hocko wrote:
> On Mon 10-12-18 15:47:11, Peter Zijlstra wrote:
> > On Mon, Dec 10, 2018 at 03:13:37PM +0100, Michal Hocko wrote:
> > > I do not see any scheduler guys Cced and it would be really great to get
> > > their opinion here.
> > > 
> > > On Mon 10-12-18 11:36:39, Daniel Vetter wrote:
> > > > In some special cases we must not block, but there's not a
> > > > spinlock, preempt-off, irqs-off or similar critical section already
> > > > that arms the might_sleep() debug checks. Add a non_block_start/end()
> > > > pair to annotate these.
> > > > 
> > > > This will be used in the oom paths of mmu-notifiers, where blocking is
> > > > not allowed to make sure there's forward progress.
> > > 
> > > Considering the only alternative would be to abuse
> > > preempt_{disable,enable}, and that really has a different semantic, I
> > > think this makes some sense. The context is preemptible but we do not
> > > want notifier to sleep on any locks, WQ etc.
> > 
> > I'm confused... what is this supposed to do?
> > 
> > And what does 'block' mean here? Without preempt_disable/IRQ-off we're
> > subject to regular preemption and execution can stall for arbitrary
> > amounts of time.
> 
> The notifier is called from quite a restricted context - oom_reaper - 
> which shouldn't depend on any locks or sleepable conditionals. 

You want to exclude spinlocks too? We could maybe frob something with
lockdep if you need that?

> The code
> should be swift as well but we mostly do care about it to make a forward
> progress. Checking for sleepable context is the best thing we could come
> up with that would describe these demands at least partially.

OK, no real objections to the thing.  Just so long we're all on the same
page as to what it does and doesn't do ;-)

I suppose you could extend the check to include schedule_debug() as
well, maybe something like:

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f66920173370..b1aaa278f1af 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3278,13 +3278,18 @@ static noinline void __schedule_bug(struct task_struct *prev)
 /*
  * Various schedule()-time debugging checks and statistics:
  */
-static inline void schedule_debug(struct task_struct *prev)
+static inline void schedule_debug(struct task_struct *prev, bool preempt)
 {
 #ifdef CONFIG_SCHED_STACK_END_CHECK
if (task_stack_end_corrupted(prev))
panic("corrupted stack end detected inside scheduler\n");
 #endif
 
+#ifdef CONFIG_DEBUG_ATOMIC_SLEEP
+   if (!preempt && prev->state && prev->non_block_count)
+   // splat
+#endif
+
if (unlikely(in_atomic_preempt_off())) {
__schedule_bug(prev);
preempt_count_set(PREEMPT_DISABLED);
@@ -3391,7 +3396,7 @@ static void __sched notrace __schedule(bool preempt)
rq = cpu_rq(cpu);
prev = rq->curr;
 
-   schedule_debug(prev);
+   schedule_debug(prev, preempt);
 
if (sched_feat(HRTICK))
hrtick_clear(rq);



[Intel-gfx] [RFC][PATCH] wake_up_var() memory ordering

2019-06-24 Thread Peter Zijlstra
Hi all,

I tried using wake_up_var() today and accidentally noticed that it
didn't imply an smp_mb() and specifically requires it through
wake_up_bit() / waitqueue_active().

Now, wake_up_bit() doesn't imply the barrier because it is assumed to be
used with the atomic bitops API which either implies (test_and_clear) or
only needs smp_mb__after_atomic(), which is 'much' cheaper than an
unconditional smp_mb().
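
That is, the two idioms in question look like this (sketch; MY_BIT and
flags are placeholders):

	/* 1: test_and_clear_bit() already implies a full barrier */
	if (test_and_clear_bit(MY_BIT, &flags))
		wake_up_bit(&flags, MY_BIT);

	/* 2: a plain clear_bit() needs the explicit barrier before waking */
	clear_bit(MY_BIT, &flags);
	smp_mb__after_atomic();
	wake_up_bit(&flags, MY_BIT);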

Still, while auditing all that, I found a whole bunch of things that
could be improved. There were missing barriers, superfluous barriers and
a whole bunch of sites that could use clear_and_wake_up_bit().

So this fixes all wake_up_bit() usage without actually changing
semantics of it (which are unfortunate but understandable). This does
however change the semantics of wake_up_var(); even though wake_up_var()
is most often used with atomics and then the additional smp_mb() is most
often superfluous :/

There isn't really a good option here, comments (other than I need to
split this up)?


---
 drivers/bluetooth/btmtksdio.c   |  5 +
 drivers/bluetooth/btmtkuart.c   |  5 +
 drivers/bluetooth/hci_mrvl.c|  8 ++--
 drivers/gpu/drm/i915/i915_reset.c   |  6 ++
 drivers/md/dm-bufio.c   | 10 ++
 drivers/media/usb/dvb-usb-v2/dvb_usb_core.c | 15 ---
 fs/afs/fs_probe.c   |  1 +
 fs/afs/server.c |  1 +
 fs/afs/vl_probe.c   |  1 +
 fs/afs/volume.c |  1 +
 fs/aio.c|  4 +---
 fs/block_dev.c  |  1 +
 fs/btrfs/extent_io.c|  4 +---
 fs/cachefiles/namei.c   |  1 +
 fs/cifs/connect.c   |  3 +--
 fs/cifs/misc.c  | 15 +--
 fs/fscache/cookie.c |  2 ++
 fs/fscache/object.c |  2 ++
 fs/fscache/page.c   |  3 +++
 fs/gfs2/glock.c |  8 ++--
 fs/gfs2/glops.c |  1 +
 fs/gfs2/lock_dlm.c  |  8 ++--
 fs/gfs2/recovery.c  |  4 +---
 fs/gfs2/super.c |  1 +
 fs/gfs2/sys.c   |  4 +---
 fs/nfs/nfs4state.c  |  4 +---
 fs/nfs/pnfs_nfs.c   |  4 +---
 fs/nfsd/nfs4recover.c   |  4 +---
 fs/orangefs/file.c  |  2 +-
 kernel/sched/wait_bit.c |  1 +
 net/bluetooth/hci_event.c   |  5 +
 net/rds/ib_recv.c   |  1 +
 security/keys/gc.c  |  5 ++---
 33 files changed, 50 insertions(+), 90 deletions(-)

diff --git a/drivers/bluetooth/btmtksdio.c b/drivers/bluetooth/btmtksdio.c
index 813338288453..27523cfeac9a 100644
--- a/drivers/bluetooth/btmtksdio.c
+++ b/drivers/bluetooth/btmtksdio.c
@@ -356,11 +356,8 @@ static int btmtksdio_recv_event(struct hci_dev *hdev, struct sk_buff *skb)
 
if (hdr->evt == HCI_EV_VENDOR) {
if (test_and_clear_bit(BTMTKSDIO_TX_WAIT_VND_EVT,
-  &bdev->tx_state)) {
-   /* Barrier to sync with other CPUs */
-   smp_mb__after_atomic();
+  &bdev->tx_state))
wake_up_bit(&bdev->tx_state, BTMTKSDIO_TX_WAIT_VND_EVT);
-   }
}
 
return 0;
diff --git a/drivers/bluetooth/btmtkuart.c b/drivers/bluetooth/btmtkuart.c
index f5dbeec8e274..7fe324df3799 100644
--- a/drivers/bluetooth/btmtkuart.c
+++ b/drivers/bluetooth/btmtkuart.c
@@ -340,11 +340,8 @@ static int btmtkuart_recv_event(struct hci_dev *hdev, struct sk_buff *skb)
 
if (hdr->evt == HCI_EV_VENDOR) {
if (test_and_clear_bit(BTMTKUART_TX_WAIT_VND_EVT,
-  &bdev->tx_state)) {
-   /* Barrier to sync with other CPUs */
-   smp_mb__after_atomic();
+  &bdev->tx_state))
wake_up_bit(&bdev->tx_state, BTMTKUART_TX_WAIT_VND_EVT);
-   }
}
 
return 0;
diff --git a/drivers/bluetooth/hci_mrvl.c b/drivers/bluetooth/hci_mrvl.c
index 50212ac629e3..f03294d39d08 100644
--- a/drivers/bluetooth/hci_mrvl.c
+++ b/drivers/bluetooth/hci_mrvl.c
@@ -157,9 +157,7 @@ static int mrvl_recv_fw_req(struct hci_dev *hdev, struct sk_buff *skb)
 
mrvl->tx_len = le16_to_cpu(pkt->lhs);
 
-   clear_bit(STATE_FW_REQ_PENDING, &mrvl->flags);
-   smp_mb__after_atomic();
-   wake_up_bit(&mrvl->flags, STATE_FW_REQ_PENDING);
+   clear_and_wake_up_bit(STATE_FW_REQ_PENDING, &mrvl->flags);
 
 done:
kfree_skb(skb);
@@ -192,9 +190,7 @@ static int mrvl_recv_chi

Re: [Intel-gfx] [RFC][PATCH] wake_up_var() memory ordering

2019-06-25 Thread Peter Zijlstra

(sorry for cross-posting to moderated lists btw, I've since
 acquired a patch to get_maintainers.pl that will exclude them
 in the future)

On Tue, Jun 25, 2019 at 08:51:01AM +0100, David Howells wrote:
> Peter Zijlstra  wrote:
> 
> > I tried using wake_up_var() today and accidentally noticed that it
> > didn't imply an smp_mb() and specifically requires it through
> > wake_up_bit() / waitqueue_active().
> 
> Thinking about it again, I'm not sure why you need to add the barrier when
> wake_up() (which this is a wrapper around) is required to impose a barrier at
> the front if there's anything to wake up (ie. the wait queue isn't empty).
> 
> If this is insufficient, does it make sense just to have wake_up*() functions
> do an unconditional release or full barrier right at the front, rather than it
> being conditional on something being woken up?

The culprit is __wake_up_bit()'s usage of waitqueue_active(); it is this
latter (see its comment) that requires the smp_mb().

wake_up_bit() and wake_up_var() are wrappers around __wake_up_bit().

Without this barrier it is possible for the waitqueue_active() load to
be hoisted over the cond=true store and the remote end can miss the
store and we can miss its enqueue and we'll all miss a wakeup and get
stuck.
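
The window, spelled out (sketch mirroring the pattern documented above
waitqueue_active() in wait.h; wq and cond are declared elsewhere):

	/* waker */
	cond = true;
	smp_mb();	/* without this, the waitqueue_active() load can
			 * be hoisted above the cond store */
	if (waitqueue_active(&wq))
		wake_up(&wq);

	/* waiter */
	DEFINE_WAIT(wait);
	for (;;) {
		prepare_to_wait(&wq, &wait, TASK_UNINTERRUPTIBLE);
		if (cond)
			break;
		schedule();
	}
	finish_wait(&wq, &wait);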

Adding an smp_mb() (or use wq_has_sleeper()) in __wake_up_bit() would be
nice, but I fear some people will complain about overhead, esp. since
about half the sites don't need the barrier due to being behind
test_and_clear_bit() and the other half using smp_mb__after_atomic()
after some clear_bit*() variant.

There's a few sites that seem to open-code
wait_var_event()/wake_up_var() and those actually need the full
smp_mb(), but then maybe they should be converted to var instead of bit
anyway.

> > @@ -619,9 +614,7 @@ static int dvb_usb_fe_sleep(struct dvb_frontend *fe)
> >  err:
> > if (!adap->suspend_resume_active) {
> > adap->active_fe = -1;
> 
> I'm wondering if there's a missing barrier here.  Should the clear_bit() on
> the next line be clear_bit_unlock() or clear_bit_release()?

That looks reasonable, but I'd like to hear from the DVB folks on that.

> > -   clear_bit(ADAP_SLEEP, &adap->state_bits);
> > -   smp_mb__after_atomic();
> > -   wake_up_bit(&adap->state_bits, ADAP_SLEEP);
> > +   clear_and_wake_up_bit(ADAP_SLEEP, &adap->state_bits);
> > }
> >  
> > dev_dbg(&d->udev->dev, "%s: ret=%d\n", __func__, ret);
> > diff --git a/fs/afs/fs_probe.c b/fs/afs/fs_probe.c
> > index cfe62b154f68..377ee07d5f76 100644
> > --- a/fs/afs/fs_probe.c
> > +++ b/fs/afs/fs_probe.c
> > @@ -18,6 +18,7 @@ static bool afs_fs_probe_done(struct afs_server *server)
> >  
> > wake_up_var(&server->probe_outstanding);
> > clear_bit_unlock(AFS_SERVER_FL_PROBING, &server->flags);
> > +   smp_mb__after_atomic();
> > wake_up_bit(&server->flags, AFS_SERVER_FL_PROBING);
> > return true;
> >  }
> 
> Looking at this and the dvb one, does it make sense to stick the release
> semantics of clear_bit_unlock() into clear_and_wake_up_bit()?

I was thinking of adding another helper, maybe unlock_and_wake_up_bit()
that included that extra barrier, but maybe making it unconditional
isn't the worst idea.

> Also, should clear_bit_unlock() be renamed to clear_bit_release() (and
> similarly test_and_set_bit_lock() -> test_and_set_bit_acquire()) if we seem to
> be trying to standardise on that terminology.

That definitely makes sense to me, there's only 157 clear_bit_unlock()
and 76 test_and_set_bit_lock() users (note the asymmetry of that).


Re: [Intel-gfx] [RFC][PATCH] wake_up_var() memory ordering

2019-06-25 Thread Peter Zijlstra
On Tue, Jun 25, 2019 at 11:19:35AM +0200, Andreas Gruenbacher wrote:
> > diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c
> > index cf4c767005b1..29ea5da7 100644
> > --- a/fs/gfs2/glops.c
> > +++ b/fs/gfs2/glops.c
> > @@ -227,6 +227,7 @@ static void gfs2_clear_glop_pending(struct gfs2_inode 
> > *ip)
> > return;
> >
> > clear_bit_unlock(GIF_GLOP_PENDING, &ip->i_flags);
> > +   smp_mb__after_atomic();
> > wake_up_bit(&ip->i_flags, GIF_GLOP_PENDING);
> 
> This should become clear_and_wake_up_bit as well, right? There are
> several more instances of the same pattern.

Only if we do as David suggested and make clean_and_wake_up_bit()
provide the RELEASE barrier.

That is, currently clear_and_wake_up_bit() is

clear_bit()
smp_mb__after_atomic();
wake_up_bit();

But the above is:

clear_bit_unlock();
smp_mb__after_atomic();
wake_up_bit()

the difference is that _unlock() uses RELEASE semantics, where
clear_bit() does not.

The difference is illustrated with something like:

cond = true;
clear_bit()
smp_mb__after_atomic();
wake_up_bit();

In this case, a remote CPU can first observe the clear_bit() and then
the 'cond = true' store. When we use clear_bit_unlock() this is not
possible, because the RELEASE barrier ensures that everything before,
stays before.

> >  }
> >

Re: [Intel-gfx] [RFC][PATCH] wake_up_var() memory ordering

2019-06-25 Thread Peter Zijlstra
On Tue, Jun 25, 2019 at 02:12:22PM +0200, Andreas Gruenbacher wrote:

> > Only if we do as David suggested and make clean_and_wake_up_bit()
> > provide the RELEASE barrier.
> 
> (It's clear_and_wake_up_bit, not clean_and_wake_up_bit.)

Yes, typing hard.

> > That is, currently clear_and_wake_up_bit() is
> >
> > clear_bit()
> > smp_mb__after_atomic();
> > wake_up_bit();
> >

> Now I'm confused because clear_and_wake_up_bit() in mainline does use
> clear_bit_unlock(), so it's the exact opposite of what you just said.

Argh; clearly I couldn't read. And then yes, you're right.

Re: [Intel-gfx] [PATCH 15/15] seqlock: drop seqcount_ww_mutex_t

2022-04-07 Thread Peter Zijlstra
On Thu, Apr 07, 2022 at 10:59:46AM +0200, Christian König wrote:
> Daniel pointed out that this series removes the last user of
> seqcount_ww_mutex_t, so let's drop this.
> 
> Signed-off-by: Christian König 

Acked-by: Peter Zijlstra (Intel) 


Re: [Intel-gfx] [PATCH 0/2] Nuke PAGE_KERNEL_IO

2021-11-12 Thread Peter Zijlstra
On Thu, Oct 21, 2021 at 11:15:09AM -0700, Lucas De Marchi wrote:
> Last user of PAGE_KERNEL_IO is the i915 driver. While removing it from
> there as we seek to bring the driver to other architectures, Daniel
> suggested that we could finish the cleanup and remove it altogether,
> through the tip tree. So here I'm sending both commits needed for that.
> 
> Lucas De Marchi (2):
>   drm/i915/gem: stop using PAGE_KERNEL_IO
>   x86/mm: nuke PAGE_KERNEL_IO
> 
>  arch/x86/include/asm/fixmap.h | 2 +-
>  arch/x86/include/asm/pgtable_types.h  | 7 ---
>  arch/x86/mm/ioremap.c | 2 +-
>  arch/x86/xen/setup.c  | 2 +-
>  drivers/gpu/drm/i915/gem/i915_gem_pages.c | 4 ++--
>  include/asm-generic/fixmap.h  | 2 +-
>  6 files changed, 6 insertions(+), 13 deletions(-)

Acked-by: Peter Zijlstra (Intel) 


Re: [Intel-gfx] [RFC 00/12] locking: Separate lock tracepoints from lockdep/lock_stat (v1)

2022-02-09 Thread Peter Zijlstra
On Tue, Feb 08, 2022 at 10:41:56AM -0800, Namhyung Kim wrote:

> Eventually I'm mostly interested in the contended locks only and I
> want to reduce the overhead in the fast path.  By moving that, it'd be
> easy to track contended locks with timing by using two tracepoints.

So why not put in two new tracepoints and call it a day?

Why muck about with all that lockdep stuff just to preserve the name
(and in the process continue to blow up data structures etc..). This
leaves distros in a bind, will they enable this config and provide
tracepoints while bloating the data structures and destroying things
like lockref (which relies on sizeof(spinlock_t)), or not provide this
at all.

Yes, the name is convenient, but it's just not worth it IMO. It makes
the whole proposition too much of a trade-off.

Would it not be possible to reconstruct enough useful information from
the lock callsite?
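
Something like this, perhaps (sketch only; trace_contention_begin/end
are made-up names here, not existing tracepoints):

	static noinline void __sched __mutex_lock_slowpath(struct mutex *lock)
	{
		trace_contention_begin(lock, _RET_IP_);	/* record caller IP */
		__mutex_lock(lock, TASK_UNINTERRUPTIBLE, 0, NULL, _RET_IP_);
		trace_contention_end(lock);
	}

The caller IP recorded at the first tracepoint is then what the
analysis side would key off.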


Re: [Intel-gfx] [RFC 00/12] locking: Separate lock tracepoints from lockdep/lock_stat (v1)

2022-02-10 Thread Peter Zijlstra
On Wed, Feb 09, 2022 at 03:17:38PM -0500, Waiman Long wrote:
> 
> On 2/9/22 14:45, Namhyung Kim wrote:
> > On Wed, Feb 9, 2022 at 11:28 AM Mathieu Desnoyers
> >  wrote:
> > > - On Feb 9, 2022, at 2:22 PM, Namhyung Kim namhy...@kernel.org wrote:
> > > > I'm also concerning dynamic allocated locks in a data structure.
> > > > If we keep the info in a hash table, we should delete it when the
> > > > lock is gone.  I'm not sure we have a good place to hook it up all.
> > > I was wondering about this use case as well. Can we make it mandatory to
> > > declare the lock "class" (including the name) statically, even though the
> > > lock per-se is allocated dynamically ? Then the initialization of the lock
> > > embedded within the data structure would simply refer to the lock class
> > > definition.
> > Isn't it still the same if we have static lock classes that the entry needs
> > to be deleted from the hash table when it frees the data structure?
> > I'm more concerned about free than alloc as there seems to be no
> > API to track that in a place.
> 
> We may have to invent some new APIs to do that. For example,
> spin_lock_exit() can be the counterpart of spin_lock_init() and so on. Of
> course, existing kernel code have to be modified to designate the point
> after which a lock is no longer being used or is freed.

The canonical name is _destroy(). We even have mutex_destroy() except
its usage isn't mandatory.

The easy way out is doing as lockdep does and hook into the memory
allocators and check every free'd hunk of memory for a lock. It does
hoever mean your data structure of choice needs to be able to answer: do
I have an entry in @range. Which mostly disqualifies a hash-table.

Still, I really don't think you need any of this, it's just bloat. A
very limited stack unwind for one of the two tracepoints should allow
you to find the offending lock just fine.


Re: [Intel-gfx] [RFC 00/12] locking: Separate lock tracepoints from lockdep/lock_stat (v1)

2022-02-10 Thread Peter Zijlstra
On Wed, Feb 09, 2022 at 04:32:58PM -0800, Namhyung Kim wrote:
> On Wed, Feb 9, 2022 at 1:09 AM Peter Zijlstra  wrote:
> >
> > On Tue, Feb 08, 2022 at 10:41:56AM -0800, Namhyung Kim wrote:
> >
> > > Eventually I'm mostly interested in the contended locks only and I
> > > want to reduce the overhead in the fast path.  By moving that, it'd be
> > > easy to track contended locks with timing by using two tracepoints.
> >
> > So why not put in two new tracepoints and call it a day?
> >
> > Why muck about with all that lockdep stuff just to preserve the name
> > (and in the process continue to blow up data structures etc..). This
> > leaves distros in a bind, will they enable this config and provide
> > tracepoints while bloating the data structures and destroying things
> > like lockref (which relies on sizeof(spinlock_t)), or not provide this
> > at all.
> 
> If it's only lockref, is it possible to change it to use arch_spinlock_t
> so that it can remain in 4 bytes?  It'd be really nice if we can keep
> spin lock size, but it'd be easier to carry the name with it for
> analysis IMHO.

It's just vile and disgusting to blow up the lock size for convenience
like this.

And no, there's more of that around. A lot of effort has been spend to
make sure spinlocks are 32bit and we're not going to give that up for
something as daft as this.

Just think harder on the analysis side. Like said; I'm thinking the
caller IP should be good enough most of the time.


Re: [Intel-gfx] [RFC 00/12] locking: Separate lock tracepoints from lockdep/lock_stat (v1)

2022-02-11 Thread Peter Zijlstra
On Thu, Feb 10, 2022 at 09:55:27PM -0800, Namhyung Kim wrote:

> So you are ok with adding two new tracepoints, even if they are
> similar to what we already have in lockdep/lock_stat, right?

Yeah, I don't think adding tracepoints to the slowpaths of the various
locks should be a problem.


Re: [Intel-gfx] [PATCH -next] treewide: remove unused argument in lock_release()

2019-09-30 Thread Peter Zijlstra
On Thu, Sep 19, 2019 at 12:09:40PM -0400, Qian Cai wrote:
> Since the commit b4adfe8e05f1 ("locking/lockdep: Remove unused argument
> in __lock_release"), @nested is no longer used in lock_release(), so
> remove it from all lock_release() calls and friends.

Right; I never did this cleanup for not wanting the churn, but as long
as it applies I'll take it.

> Signed-off-by: Qian Cai 
> ---
>  drivers/gpu/drm/drm_connector.c   |  2 +-
>  drivers/gpu/drm/i915/gem/i915_gem_shrinker.c  |  6 +++---
>  drivers/gpu/drm/i915/gt/intel_engine_pm.c |  2 +-
>  drivers/gpu/drm/i915/i915_request.c   |  2 +-
>  drivers/tty/tty_ldsem.c   |  8 
>  fs/dcache.c   |  2 +-
>  fs/jbd2/transaction.c |  4 ++--
>  fs/kernfs/dir.c   |  4 ++--
>  fs/ocfs2/dlmglue.c|  2 +-
>  include/linux/jbd2.h  |  2 +-
>  include/linux/lockdep.h   | 21 ++---
>  include/linux/percpu-rwsem.h  |  4 ++--
>  include/linux/rcupdate.h  |  2 +-
>  include/linux/rwlock_api_smp.h| 16 
>  include/linux/seqlock.h   |  4 ++--
>  include/linux/spinlock_api_smp.h  |  8 
>  include/linux/ww_mutex.h  |  2 +-
>  include/net/sock.h|  2 +-
>  kernel/bpf/stackmap.c |  2 +-
>  kernel/cpu.c  |  2 +-
>  kernel/locking/lockdep.c  |  3 +--
>  kernel/locking/mutex.c|  4 ++--
>  kernel/locking/rtmutex.c  |  6 +++---
>  kernel/locking/rwsem.c| 10 +-
>  kernel/printk/printk.c| 10 +-
>  kernel/sched/core.c   |  2 +-
>  lib/locking-selftest.c| 24 
>  mm/memcontrol.c   |  2 +-
>  net/core/sock.c   |  2 +-
>  tools/lib/lockdep/include/liblockdep/common.h |  3 +--
>  tools/lib/lockdep/include/liblockdep/mutex.h  |  2 +-
>  tools/lib/lockdep/include/liblockdep/rwlock.h |  2 +-
>  tools/lib/lockdep/preload.c   | 16 
>  33 files changed, 90 insertions(+), 93 deletions(-)

Thanks!

Re: [Intel-gfx] [PATCH -next] treewide: remove unused argument in lock_release()

2019-10-08 Thread Peter Zijlstra
On Tue, Oct 08, 2019 at 06:33:51PM +0200, Daniel Vetter wrote:
> On Thu, Sep 19, 2019 at 12:09:40PM -0400, Qian Cai wrote:
> > Since the commit b4adfe8e05f1 ("locking/lockdep: Remove unused argument
> > in __lock_release"), @nested is no longer used in lock_release(), so
> > remove it from all lock_release() calls and friends.
> > 
> > Signed-off-by: Qian Cai 
> 
> Ack on the concept and for the drm parts (and feel free to keep the ack if
> you inevitably have to respin this later on). Might result in some
> conflicts, but welp we need to keep Linus busy :-)
> 
> Acked-by: Daniel Vetter 

Thanks Daniel!

Re: [Intel-gfx] [PATCH v4 2/9] perf/core: open access for CAP_SYS_PERFMON privileged process

2020-01-08 Thread Peter Zijlstra
On Wed, Dec 18, 2019 at 12:25:35PM +0300, Alexey Budankov wrote:
> 
> Open access to perf_events monitoring for CAP_SYS_PERFMON privileged
> processes. For backward compatibility reasons access to perf_events
> subsystem remains open for CAP_SYS_ADMIN privileged processes but
> CAP_SYS_ADMIN usage for secure perf_events monitoring is discouraged
> with respect to CAP_SYS_PERFMON capability.
> 
> Signed-off-by: Alexey Budankov 
> ---
>  include/linux/perf_event.h | 6 +++---
>  kernel/events/core.c   | 6 +++---
>  2 files changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index 34c7c6910026..f46acd69425f 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -1285,7 +1285,7 @@ static inline int perf_is_paranoid(void)
>  
>  static inline int perf_allow_kernel(struct perf_event_attr *attr)
>  {
> - if (sysctl_perf_event_paranoid > 1 && !capable(CAP_SYS_ADMIN))
> + if (sysctl_perf_event_paranoid > 1 && !perfmon_capable())
>   return -EACCES;
>  
>   return security_perf_event_open(attr, PERF_SECURITY_KERNEL);
> @@ -1293,7 +1293,7 @@ static inline int perf_allow_kernel(struct 
> perf_event_attr *attr)
>  
>  static inline int perf_allow_cpu(struct perf_event_attr *attr)
>  {
> - if (sysctl_perf_event_paranoid > 0 && !capable(CAP_SYS_ADMIN))
> + if (sysctl_perf_event_paranoid > 0 && !perfmon_capable())
>   return -EACCES;
>  
>   return security_perf_event_open(attr, PERF_SECURITY_CPU);
> @@ -1301,7 +1301,7 @@ static inline int perf_allow_cpu(struct perf_event_attr 
> *attr)
>  
>  static inline int perf_allow_tracepoint(struct perf_event_attr *attr)
>  {
> - if (sysctl_perf_event_paranoid > -1 && !capable(CAP_SYS_ADMIN))
> + if (sysctl_perf_event_paranoid > -1 && !perfmon_capable())
>   return -EPERM;
>  
>   return security_perf_event_open(attr, PERF_SECURITY_TRACEPOINT);

These are OK I suppose.
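
For reference, the helper those hunks rely on boils down to this
(sketch, per the series description: a new capability with a
CAP_SYS_ADMIN fallback for backward compatibility):

	static inline bool perfmon_capable(void)
	{
		return capable(CAP_SYS_PERFMON) || capable(CAP_SYS_ADMIN);
	}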

> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 059ee7116008..d9db414f2197 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -9056,7 +9056,7 @@ static int perf_kprobe_event_init(struct perf_event 
> *event)
>   if (event->attr.type != perf_kprobe.type)
>   return -ENOENT;
>  
> - if (!capable(CAP_SYS_ADMIN))
> + if (!perfmon_capable())
>   return -EACCES;
>  
>   /*

This one only allows attaching to already extant kprobes, right? It does
not allow creation of kprobes.

> @@ -9116,7 +9116,7 @@ static int perf_uprobe_event_init(struct perf_event 
> *event)
>   if (event->attr.type != perf_uprobe.type)
>   return -ENOENT;
>  
> - if (!capable(CAP_SYS_ADMIN))
> + if (!perfmon_capable())
>   return -EACCES;
>  
>   /*

Idem, I presume.

> @@ -11157,7 +11157,7 @@ SYSCALL_DEFINE5(perf_event_open,
>   }
>  
>   if (attr.namespaces) {
> - if (!capable(CAP_SYS_ADMIN))
> + if (!perfmon_capable())
>   return -EACCES;
>   }

And given we basically make the entire kernel observable with this CAP,
busting namespaces shoulnd't be a problem either.

So yeah, I suppose that works.


Re: [Intel-gfx] [PATCH v4 2/9] perf/core: open access for CAP_SYS_PERFMON privileged process

2020-01-10 Thread Peter Zijlstra
On Thu, Jan 09, 2020 at 02:36:50PM +0300, Alexey Budankov wrote:
> On 08.01.2020 19:07, Peter Zijlstra wrote:
> > On Wed, Dec 18, 2019 at 12:25:35PM +0300, Alexey Budankov wrote:

> >> diff --git a/kernel/events/core.c b/kernel/events/core.c
> >> index 059ee7116008..d9db414f2197 100644
> >> --- a/kernel/events/core.c
> >> +++ b/kernel/events/core.c
> >> @@ -9056,7 +9056,7 @@ static int perf_kprobe_event_init(struct perf_event 
> >> *event)
> >>if (event->attr.type != perf_kprobe.type)
> >>return -ENOENT;
> >>  
> >> -  if (!capable(CAP_SYS_ADMIN))
> >> +  if (!perfmon_capable())
> >>return -EACCES;
> >>  
> >>/*
> > 
> > This one only allows attaching to already extant kprobes, right? It does
> > not allow creation of kprobes.
> 
> This unblocks creation of local trace kprobes and uprobes by CAP_SYS_PERFMON 
> privileged process, exactly the same as for CAP_SYS_ADMIN privileged process.

I've no idea what you just said; it's just words.

Again, this only allows attaching to previously created kprobes, it does
not allow creating kprobes, right?

That is; I don't think CAP_SYS_PERFMON should be allowed to create
kprobes.

As might be clear; I don't actually know what the user-ABI is for
creating kprobes.


Re: [Intel-gfx] [PATCH 01/10] PM: QoS: Add CPU_RESPONSE_FREQUENCY global PM QoS limit.

2020-03-11 Thread Peter Zijlstra
On Tue, Mar 10, 2020 at 02:41:54PM -0700, Francisco Jerez wrote:
> +static void cpu_response_frequency_qos_apply(struct pm_qos_request *req,
> +  enum pm_qos_req_action action,
> +  s32 value)
> +{
> + int ret = pm_qos_update_target(req->qos, &req->node, action, value);
> +
> + if (ret > 0)
> + wake_up_all_idle_cpus();
> +}

That's a pretty horrific thing to do; how often do we expect to call
this?


Re: [Intel-gfx] rcu_barrier() no longer allowed within mmap_sem?

2020-03-30 Thread Peter Zijlstra
On Mon, Mar 30, 2020 at 03:00:35PM +0200, Daniel Vetter wrote:
> Hi all, for all = rcu, cpuhotplug and perf maintainers
> 
> We've hit an interesting new lockdep splat in our drm/i915 CI:
> 
> https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17096/shard-tglb7/igt@kms_frontbuffer_track...@fbcpsr-rgb101010-draw-mmap-gtt.html#dmesg-warnings861
> 
> Summarizing away the driver parts we have
> 
> < gpu locks which are held within mm->mmap_sem in various gpu fault handlers >
> 
> -> #4 (&mm->mmap_sem#2){}:
> <4> [604.892615] __might_fault+0x63/0x90
> <4> [604.892617] _copy_to_user+0x1e/0x80
> <4> [604.892619] perf_read+0x200/0x2b0
> <4> [604.892621] vfs_read+0x96/0x160
> <4> [604.892622] ksys_read+0x9f/0xe0
> <4> [604.892623] do_syscall_64+0x4f/0x220
> <4> [604.892624] entry_SYSCALL_64_after_hwframe+0x49/0xbe
> <4> [604.892625]
> -> #3 (&cpuctx_mutex){+.+.}:
> <4> [604.892626] __mutex_lock+0x9a/0x9c0
> <4> [604.892627] perf_event_init_cpu+0xa4/0x140
> <4> [604.892629] perf_event_init+0x19d/0x1cd
> <4> [604.892630] start_kernel+0x362/0x4e4
> <4> [604.892631] secondary_startup_64+0xa4/0xb0
> <4> [604.892631]
> -> #2 (pmus_lock){+.+.}:
> <4> [604.892633] __mutex_lock+0x9a/0x9c0
> <4> [604.892633] perf_event_init_cpu+0x6b/0x140
> <4> [604.892635] cpuhp_invoke_callback+0x9b/0x9d0
> <4> [604.892636] _cpu_up+0xa2/0x140
> <4> [604.892637] do_cpu_up+0x61/0xa0
> <4> [604.892639] smp_init+0x57/0x96
> <4> [604.892639] kernel_init_freeable+0x87/0x1dc
> <4> [604.892640] kernel_init+0x5/0x100
> <4> [604.892642] ret_from_fork+0x24/0x50
> <4> [604.892642]
> -> #1 (cpu_hotplug_lock.rw_sem){}:
> <4> [604.892643] cpus_read_lock+0x34/0xd0
> <4> [604.892644] rcu_barrier+0xaa/0x190
> <4> [604.892645] kernel_init+0x21/0x100
> <4> [604.892647] ret_from_fork+0x24/0x50
> <4> [604.892647]
> -> #0 (rcu_state.barrier_mutex){+.+.}:


> The last backtrace boils down to i915 driver code which holds the same
> locks we are holding within mm->mmap_sem, and then ends up calling
> rcu_barrier(). From what I can see i915 is just the messenger here,
> any driver with this pattern of a lock held within mmap_sem which also
> has a path of calling rcu_barrier while holding that lock should be
> hitting this splat.
> 
> Two questions:
> - This suggests that calling rcu_barrier() isn't ok anymore while
> holding mmap_sem, or anything that has a dependency upon mmap_sem. I
> guess that's not the idea, please confirm.
> - Assuming this depedency is indeed not intended, where should the
> loop be broken? It goes through perf, cpuhotplug and rcu subsystems,
> and I don't have a clue about any of those.

I wonder what is new here; the 1-4 chain there has been true for a long
time, see also the comment at perf_event_ctx_lock_nested().

That said; it _might_ be possible to break 3->4, that is, all the
copy_{to,from}_user() usage in perf can be lifted out from under the
various locks by re-arranging code, but I have a nagging feeling there
was more to it than that. Of course, while I did document the locking
rules, I seem to have forgotten to comment on exactly why these rules
are as they are.. oh well.




Re: [Intel-gfx] [PATCH 00/34] put_user_pages(): miscellaneous call sites

2019-08-02 Thread Peter Zijlstra
On Thu, Aug 01, 2019 at 07:16:19PM -0700, john.hubb...@gmail.com wrote:

> This is part a tree-wide conversion, as described in commit fc1d8e7cca2d
> ("mm: introduce put_user_page*(), placeholder versions"). That commit
> has an extensive description of the problem and the planned steps to
> solve it, but the highlights are:

That is one horridly mangled Changelog there :-/ It looks like it's
partially duplicated.

Anyway; no objections to any of that, but I just wanted to mention that
there are other problems with long term pinning that haven't been
mentioned, notably they inhibit compaction.

A long time ago I proposed an interface to mark pages as pinned, such
that we could run compaction before we actually did the pinning.

Re: [Intel-gfx] [PATCH 3/4] kernel.h: Add non_block_start/end()

2019-08-23 Thread Peter Zijlstra
On Tue, Aug 20, 2019 at 10:24:40PM +0200, Daniel Vetter wrote:
> On Tue, Aug 20, 2019 at 10:19:01AM +0200, Daniel Vetter wrote:
> > In some special cases we must not block, but there's not a
> > spinlock, preempt-off, irqs-off or similar critical section already
> > that arms the might_sleep() debug checks. Add a non_block_start/end()
> > pair to annotate these.
> > 
> > This will be used in the oom paths of mmu-notifiers, where blocking is
> > not allowed to make sure there's forward progress. Quoting Michal:
> > 
> > "The notifier is called from quite a restricted context - oom_reaper -
> > which shouldn't depend on any locks or sleepable conditionals. The code
> > should be swift as well but we mostly do care about it to make a forward
> > progress. Checking for sleepable context is the best thing we could come
> > up with that would describe these demands at least partially."
> > 
> > Peter also asked whether we want to catch spinlocks on top, but Michal
> > said those are less of a problem because spinlocks can't have an
> > indirect dependency upon the page allocator and hence close the loop
> > with the oom reaper.
> > 
> > Suggested by Michal Hocko.
> > 
> > v2:
> > - Improve commit message (Michal)
> > - Also check in schedule, not just might_sleep (Peter)
> > 
> > v3: It works better when I actually squash in the fixup I had lying
> > around :-/
> > 
> > v4: Pick the suggestion from Andrew Morton to give non_block_start/end
> > some good kerneldoc comments. I added that other blocking calls like
> > wait_event pose similar issues, since that's the other example we
> > discussed.
> > 
> > Cc: Jason Gunthorpe 
> > Cc: Peter Zijlstra 
> > Cc: Ingo Molnar 
> > Cc: Andrew Morton 
> > Cc: Michal Hocko 
> > Cc: David Rientjes 
> > Cc: "Christian König" 
> > Cc: Daniel Vetter 
> > Cc: "Jérôme Glisse" 
> > Cc: linux...@kvack.org
> > Cc: Masahiro Yamada 
> > Cc: Wei Wang 
> > Cc: Andy Shevchenko 
> > Cc: Thomas Gleixner 
> > Cc: Jann Horn 
> > Cc: Feng Tang 
> > Cc: Kees Cook 
> > Cc: Randy Dunlap 
> > Cc: linux-ker...@vger.kernel.org
> > Acked-by: Christian König  (v1)
> > Signed-off-by: Daniel Vetter 
> 
> Hi Peter,
> 
> Iirc you've been involved at least somewhat in discussing this. -mm folks
> are a bit undecided whether these new non_block semantics are a good idea.
> Michal Hocko still is in support, but Andrew Morton and Jason Gunthorpe
> are less enthusiastic. Jason said he's ok with merging the hmm side of
> this if scheduler folks ack. If not, then I'll respin with the
> preempt_disable/enable instead like in v1.
> 
> So ack/nack for this from the scheduler side?

Right, I had memories of seeing this before, and I just found a fairly
long discussion on this elsewhere in the vacation inbox (*groan*).

Yeah, this is something I can live with,

Acked-by: Peter Zijlstra (Intel) 

Re: [Intel-gfx] [PATCH 2/3] lockdep: add might_lock_nested()

2019-08-23 Thread Peter Zijlstra
On Tue, Aug 20, 2019 at 10:19:50AM +0200, Daniel Vetter wrote:
> Necessary to annotate functions where we might acquire a
> mutex_lock_nested() or similar. Needed by i915.
> 
> Signed-off-by: Daniel Vetter 
> Cc: Peter Zijlstra 
> Cc: Ingo Molnar 
> Cc: Will Deacon 
> Cc: linux-ker...@vger.kernel.org

Acked-by: Peter Zijlstra (Intel) 

> ---
>  include/linux/lockdep.h | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
> index 38ea6178df7d..30f6172d6889 100644
> --- a/include/linux/lockdep.h
> +++ b/include/linux/lockdep.h
> @@ -631,6 +631,13 @@ do {							\
>   lock_acquire(&(lock)->dep_map, 0, 0, 1, 1, NULL, _THIS_IP_);\
>   lock_release(&(lock)->dep_map, 0, _THIS_IP_);   \
>  } while (0)
> +# define might_lock_nested(lock, subclass)   \
> +do { \
> + typecheck(struct lockdep_map *, &(lock)->dep_map);  \
> + lock_acquire(&(lock)->dep_map, subclass, 0, 1, 1, NULL, \
> +  _THIS_IP_);\
> + lock_release(&(lock)->dep_map, 0, _THIS_IP_);   \
> +} while (0)
>  
>  #define lockdep_assert_irqs_enabled()	do {			\
>   WARN_ONCE(debug_locks && !current->lockdep_recursion && \
> @@ -653,6 +660,7 @@ do {							\
>  #else
>  # define might_lock(lock) do { } while (0)
>  # define might_lock_read(lock) do { } while (0)
> +# define might_lock_nested(lock, subclass) do { } while (0)
>  # define lockdep_assert_irqs_enabled() do { } while (0)
>  # define lockdep_assert_irqs_disabled() do { } while (0)
>  # define lockdep_assert_in_irq() do { } while (0)
> -- 
> 2.23.0.rc1
> 

Re: [Intel-gfx] [PATCH 3/3] drm/i915: use might_lock_nested in get_pages annotation

2019-08-23 Thread Peter Zijlstra
On Tue, Aug 20, 2019 at 10:19:51AM +0200, Daniel Vetter wrote:
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h 
> b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> index a0b1fa8a3224..b3fd6aac93bc 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> @@ -233,10 +233,26 @@ void __i915_gem_object_set_pages(struct 
> drm_i915_gem_object *obj,
>  int i915_gem_object_get_pages(struct drm_i915_gem_object *obj);
>  int __i915_gem_object_get_pages(struct drm_i915_gem_object *obj);
>  
> +enum i915_mm_subclass { /* lockdep subclass for obj->mm.lock/struct_mutex */
> + I915_MM_NORMAL = 0,
> + /*
> +  * Only used by struct_mutex, when called "recursively" from
> +  * direct-reclaim-esque. Safe because there is only ever one
> +  * struct_mutex in the entire system. */
> + I915_MM_SHRINKER = 1,
> + /*
> +  * Used for obj->mm.lock when allocating pages. Safe because the object
> +  * isn't yet on any LRU, and therefore the shrinker can't deadlock on
> +  * it. As soon as the object has pages, obj->mm.lock nests within
> +  * fs_reclaim.
> +  */
> + I915_MM_GET_PAGES = 1,

Those comments are inconsistently styled; if you move them, might as
well fix that too :-)

> +};
> +

Re: [Intel-gfx] [PATCH 3/4] kernel.h: Add non_block_start/end()

2019-08-23 Thread Peter Zijlstra
On Fri, Aug 23, 2019 at 09:12:34AM -0300, Jason Gunthorpe wrote:

> I still haven't heard a satisfactory answer why a whole new scheme is
> needed and a simple:
> 
>if (IS_ENABLED(CONFIG_DEBUG_ATOMIC_SLEEP))
> preempt_disable()
> 
> isn't sufficient to catch the problematic cases during debugging??
> IMHO the fact preempt is changed by the above when debugging is not
> material here. I think that information should be included in the
> commit message at least.

That has a much larger impact and actually changes behaviour, while the
relatively simple patch Daniel proposed only adds a warning but doesn't
affect behaviour.
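
For comparison, the annotation itself is just a debug counter (sketch,
assuming the non_block_count field on task_struct from Daniel's patch);
it only feeds the might_sleep()/schedule_debug() checks, so behaviour
is unchanged when no warning fires:

	static inline void non_block_start(void)
	{
		current->non_block_count++;
	}

	static inline void non_block_end(void)
	{
		WARN_ON(current->non_block_count-- == 0);
	}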

Re: [Intel-gfx] [PATCH 3/4] kernel.h: Add non_block_start/end()

2019-08-23 Thread Peter Zijlstra
On Fri, Aug 23, 2019 at 03:42:47PM +0200, Daniel Vetter wrote:
> I'm assuming the lockdep one will land, so not going to resend that.

I was assuming you'd take the might_lock_nested() along with the i915
user through the i915/drm tree. If want me to take some or all of that,
lemme know.

Re: [Intel-gfx] [PATCH 3/3] locking/selftests: Add testcases for fs_reclaim

2020-11-20 Thread Peter Zijlstra
On Fri, Nov 20, 2020 at 10:54:44AM +0100, Daniel Vetter wrote:
> Since I butchered this I figured better to make sure we have testcases
> for this now. Since we only have a locking context for __GFP_FS that's
> the only thing we're testing right now.
> 
> Cc: linux-fsde...@vger.kernel.org
> Cc: Dave Chinner 
> Cc: Qian Cai 
> Cc: linux-...@vger.kernel.org
> Cc: Thomas Hellström (Intel) 
> Cc: Andrew Morton 
> Cc: Jason Gunthorpe 
> Cc: linux...@kvack.org
> Cc: linux-r...@vger.kernel.org
> Cc: Maarten Lankhorst 
> Cc: Christian König 
> Cc: "Matthew Wilcox (Oracle)" 
> Signed-off-by: Daniel Vetter 
> Cc: Peter Zijlstra 
> Cc: Ingo Molnar 
> Cc: Will Deacon 
> Cc: linux-ker...@vger.kernel.org
> ---
>  lib/locking-selftest.c | 47 ++
>  1 file changed, 47 insertions(+)

I have a few changes pending for this file; I don't think the conflicts
will be bad, but...

In any case:

Acked-by: Peter Zijlstra (Intel) 


Re: [Intel-gfx] [RFC] GPU-bound energy efficiency improvements for the intel_pstate driver (v2.99)

2020-05-11 Thread Peter Zijlstra
On Mon, Apr 27, 2020 at 08:22:47PM -0700, Francisco Jerez wrote:
> This addresses the technical concerns people brought up about my
> previous v2 revision of this series.  Other than a few bug fixes, the
> only major change relative to v2 is that the controller is now exposed
> as a new CPUFREQ generic governor as requested by Rafael (named
> "adaptive" in this RFC though other naming suggestions are welcome).
> Main reason for calling this v2.99 rather than v3 is that I haven't
> yet addressed all the documentation requests from the v2 thread --
> Will spend some time doing that as soon as I have an ACK (ideally from
> Rafael) that things are moving in the right direction.
> 
> You can also find this series along with the WIP code for non-HWP
> platforms in this branch:
> 
> https://github.com/curro/linux/tree/intel_pstate-vlp-v2.99
> 
> Thanks!
> 
> [PATCHv2.99 01/11] PM: QoS: Add CPU_SCALING_RESPONSE global PM QoS limit.
> [PATCHv2.99 02/11] drm/i915: Adjust PM QoS scaling response frequency based on GPU load.
> [PATCHv2.99 03/11] OPTIONAL: drm/i915: Expose PM QoS control parameters via debugfs.
> [PATCHv2.99 04/11] cpufreq: Define ADAPTIVE frequency governor policy.
> [PATCHv2.99 05/11] cpufreq: intel_pstate: Reorder intel_pstate_clear_update_util_hook() and intel_pstate_set_update_util_hook().
> [PATCHv2.99 06/11] cpufreq: intel_pstate: Call intel_pstate_set_update_util_hook() once from the setpolicy hook.
> [PATCHv2.99 07/11] cpufreq: intel_pstate: Implement VLP controller statistics and target range calculation.
> [PATCHv2.99 08/11] cpufreq: intel_pstate: Implement VLP controller for HWP parts.
> [PATCHv2.99 09/11] cpufreq: intel_pstate: Enable VLP controller based on ACPI FADT profile and CPUID.
> [PATCHv2.99 10/11] OPTIONAL: cpufreq: intel_pstate: Add tracing of VLP controller status.
> [PATCHv2.99 11/11] OPTIONAL: cpufreq: intel_pstate: Expose VLP controller parameters via debugfs.

What I'm missing is an explanation for why this isn't using the
infrastructure that was built for these kinds of things? The thermal
framework was, AFAIU, supposed to help with these things, and the IPA
thing in particular is used by ARM to do exactly this GPU/CPU power
budget thing.

If thermal/IPA is found wanting, why aren't we improving that?

How much of that ADAPTIVE crud is actually intel_pstate specific? On a
(really) quick read it appears to me that much of the controller bits
there can be applied more generically, and thus should not be part of
any one governor.

Specifically, I want to use schedutil as the cpufreq governor and use
intel_pstate as a passive driver.
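
For contrast, a governor that isn't tied to one driver registers against the generic cpufreq core; the boilerplate is just the following sketch (controller logic elided into the callbacks, "adaptive" naming taken from the RFC):

        static struct cpufreq_governor cpufreq_gov_adaptive = {
                .name   = "adaptive",
                .owner  = THIS_MODULE,
                /* .init / .exit / .start / .stop / .limits callbacks go here */
        };

        static int __init adaptive_gov_init(void)
        {
                return cpufreq_register_governor(&cpufreq_gov_adaptive);
        }
        module_init(adaptive_gov_init);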




[Intel-gfx] [drm:gen8_de_irq_handler [i915]] *ERROR* Fault errors on pipe B: 0x00000080

2020-06-02 Thread Peter Zijlstra
Hi All,

My desktop (Intel(R) Xeon(R) CPU E3-1245 v5 @ 3.50GHz) is spewing tons
and tons of:

[  778.461227] [drm:gen8_de_irq_handler [i915]] *ERROR* Fault errors on pipe B: 0x00000080
[  778.477763] [drm:gen8_de_irq_handler [i915]] *ERROR* Fault errors on pipe B: 0x00000080
[  778.577718] [drm:gen8_de_irq_handler [i915]] *ERROR* Fault errors on pipe B: 0x00000080
[  778.577824] [drm:gen8_de_irq_handler [i915]] *ERROR* Fault errors on pipe B: 0x00000080

at a rate of ~3 per second, and X isn't as stable as one would like (it
crashes every few days, sometimes taking the whole kernel along). Sadly,
this being my desktop, I don't actually have a serial line connected to
it, although I could hook one up if required.

It is currently running 5.6.14-1 (as per debian testing), but it seems
to have done this for a while; I only now got around to reporting it :/

What else do you need to know, or want me to try, etc.?

~ Peter




Re: [Intel-gfx] [drm:gen8_de_irq_handler [i915]] *ERROR* Fault errors on pipe B: 0x00000080

2020-06-02 Thread Peter Zijlstra
On Tue, Jun 02, 2020 at 06:08:03PM +, Souza, Jose wrote:
> Hi Peter
> Please file a bug by following these instructions:
> https://gitlab.freedesktop.org/drm/intel/-/wikis/How-to-file-i915-bugs

*sigh*, top posting and webforms :-(

Steps to reproduce: Boot into X

How often: Always

uname -r: 5.6.0-2-amd64

Machine: Supermicro X11SSZ-F

Display connector:

[14.907] (II) intel(0): switch to mode 3840x2160@60.0 on DP2 using pipe 0, position (0, 0), rotation normal, reflection none
[14.918] (II) intel(0): switch to mode 3840x2160@60.0 on DP3 using pipe 1, position (0, 0), rotation normal, reflection none


I'll add the kernel parameters next time I reboot this thing; I'll also
add the latest drm next time I build a kernel for this machine.


Re: [Intel-gfx] [PATCH v8 00/12] Introduce CAP_PERFMON to secure system performance monitoring and observability

2020-07-14 Thread Peter Zijlstra
On Mon, Jul 13, 2020 at 03:51:52PM -0300, Arnaldo Carvalho de Melo wrote:

> > > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > > index 856d98c36f56..a2397f724c10 100644
> > > --- a/kernel/events/core.c
> > > +++ b/kernel/events/core.c
> > > @@ -11595,7 +11595,7 @@ SYSCALL_DEFINE5(perf_event_open,
> > >* perf_event_exit_task() that could imply).
> > >*/
> > >   err = -EACCES;
> > > - if (!ptrace_may_access(task, PTRACE_MODE_READ_REALCREDS))
> > > + if (!perfmon_capable() && !ptrace_may_access(task, PTRACE_MODE_READ_REALCREDS))
> > >   goto err_cred;
> > >   }
> > > 
> > >> makes monitoring simpler and even more secure to use, since the perf
> > >> tool does not need to start/stop/single-step and read/write registers
> > >> and memory and so on like a debugger or strace-like tool. What do you
> > >> think?
> > > 
> > > I tend to agree, Peter?

So this basically says that if CAP_PERFMON, we don't care about the
ptrace() permissions? Just like how CAP_SYS_PTRACE would always allow
the ptrace checks?

I suppose that makes sense.
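
(For context, perfmon_capable() keeps CAP_SYS_ADMIN as a fallback, so existing privileged setups keep working:)

        static inline bool perfmon_capable(void)
        {
                return capable(CAP_PERFMON) || capable(CAP_SYS_ADMIN);
        }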


Re: [Intel-gfx] [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends

2020-09-24 Thread Peter Zijlstra
On Thu, Sep 24, 2020 at 08:32:41AM -0400, Steven Rostedt wrote:
> Anyway, instead of blocking: what about having a counter of the number of
> migrate-disabled tasks per CPU, and when taking a migrate_disable() while
> there's already another task with migrate_disable() set and the current
> task has an affinity greater than 1, it tries to migrate to another CPU?

That doesn't solve the problem. On wakeup we should already prefer an
idle CPU over one running a (RT) task, but you can always wake more
tasks than there are CPUs around and you'll _have_ to stack at some point.

The trick is how to unstack them correctly. We need to detect when a
migrate_disable() task _should_ start running again, and migrate away
whoever is in the way at that point.

It turns out, that getting selected for pull-balance is exactly that
condition, and clearly a migrate_disable() task cannot be pulled, but we
can use that signal to try and pull away the running task that's in the
way.


Re: [Intel-gfx] [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends

2020-09-24 Thread Peter Zijlstra
On Thu, Sep 24, 2020 at 09:51:38AM -0400, Steven Rostedt wrote:

> > It turns out, that getting selected for pull-balance is exactly that
> > condition, and clearly a migrate_disable() task cannot be pulled, but we
> > can use that signal to try and pull away the running task that's in the
> > way.
> 
> Unless of course that running task is in a migrate disable section
> itself ;-)

See my ramblings here:

  
https://lkml.kernel.org/r/20200924082717.ga1362...@hirez.programming.kicks-ass.net

My plan was to have the migrate_enable() of the running task trigger the
migration in that case.
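
As a rough sketch of that plan (field and helper names are invented here for illustration, this is not kernel code):

        void migrate_enable(void)
        {
                struct task_struct *p = current;

                if (--p->migration_disabled)
                        return;         /* still nested in migrate_disable() */

                /*
                 * If the balancer wanted this task out of the way while it
                 * was unmigratable, it left a note; act on it now.
                 */
                if (unlikely(p->migration_pending))
                        push_away_current(p);   /* invented helper */
        }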



Re: [Intel-gfx] [PATCH] drm/i915: Minimize uaccess exposure in i915_gem_execbuffer2_ioctl()

2020-02-28 Thread Peter Zijlstra
On Thu, Feb 27, 2020 at 07:03:42PM -0600, Josh Poimboeuf wrote:
> > And why not mark gen8_canonical_addr() __always_inline?
> 
> Right, marking those two functions as __always_inline is the other
> option.  The problem is, if you keep doing it, eventually you end up
> with __always_inline-itis spreading all over the place.  And it affects
> all the other callers, at least in the CONFIG_CC_OPTIMIZE_FOR_SIZE case.
> At least this fix is localized.

I'm all for __always_inline in this case; the compiler not inlining sign
extension is just retarded.


Re: [Intel-gfx] [PATCH 2/2] lockdep: Up MAX_LOCKDEP_CHAINS

2017-11-29 Thread Peter Zijlstra
On Wed, Nov 29, 2017 at 04:41:45PM +0100, Daniel Vetter wrote:
> cross-release ftl
> 
> From Chris:
> 
> "Fwiw, this isn't cross-release but us reloading the module many times,
> creating a whole host of new lockclasses. Even more fun is when the
> module gets a slightly different address and the new lock address hashes
> into an old lock...

Yeah, this is a known issue; just reboot.

> "I did think about a module-hook to revoke the stale lockclasses, but
> that still leaves all the hashed chains.

It's an absolute royal pain to remove all the resources consumed by a
module, and if you manage, you then have to deal with fragmented storage
-- that is, we'd need to keep track of which entries are used.

It's a giant heap of complexity that's just not worth it.


Given all that, I don't see why we should up this. Just don't reload
modules (or better, don't use modules at all).


Re: [Intel-gfx] [PATCH] kthread: finer-grained lockdep/cross-release completion

2017-12-07 Thread Peter Zijlstra
On Thu, Dec 07, 2017 at 11:08:49AM +0100, Daniel Vetter wrote:
> Since -rc1 we're hitting a bunch of lockdep splats using the new
> cross-release stuff around the 2 kthread completions. In all cases
> they are because totally independent uses of kthread are mixed up by
> lockdep into the same locking class, creating artificial deadlocks.
> 
> Fix this by converting kthread code in the same way as e.g.
> alloc_workqueue already works: Use macros for the public api so we can
> have a callsite specific lockdep key, then pass that through the
> entire callchain. Due to the many entry points this is slightly
> tedious.

Do you have a splat somewhere? I'm having trouble seeing how all this
comes together. That is, I've no real idea what your actual problem is
and if this is the right solution.
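
For reference, the alloc_workqueue-style conversion the patch description refers to boils down to hiding a static lock_class_key at every call site behind a macro, roughly like this (a sketch; __kthread_create stands in for whichever internal helper ends up taking the key):

        #define kthread_create(threadfn, data, namefmt, arg...)         \
        ({                                                              \
                static struct lock_class_key __key;                     \
                __kthread_create((threadfn), (data), &__key,            \
                                 namefmt, ##arg);                       \
        })

Each expansion gets its own static key, so lockdep sees one class per call site instead of one class shared by all kthreads.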



Re: [Intel-gfx] [PATCH] kthread: finer-grained lockdep/cross-release completion

2017-12-07 Thread Peter Zijlstra
On Thu, Dec 07, 2017 at 03:58:28PM +0100, Daniel Vetter wrote:

> [   85.069417] gem_exec_captur/2810 is trying to acquire lock:
> [   85.069419]  ((completion)&self->parked){+.+.}, at: [] kthread_park+0x3d/0x50
> [   85.069426]
>    but task is already holding lock:
> [   85.069428]  (&dev->struct_mutex){+.+.}, at: [] i915_reset_device+0x1bd/0x230 [i915]
> [   85.069448] 
>which lock already depends on the new lock.
> 
> [   85.069451] 
>the existing dependency chain (in reverse order) is:
> [   85.069454] 
>-> #3 (&dev->struct_mutex){+.+.}:
> [   85.069460]__mutex_lock+0x81/0x9b0
> [   85.069481]i915_mutex_lock_interruptible+0x47/0x130 [i915]
> [   85.069502]i915_gem_fault+0x201/0x760 [i915]
> [   85.069507]__do_fault+0x15/0x70
> [   85.069509]__handle_mm_fault+0x7bf/0xda0
> [   85.069512]handle_mm_fault+0x14f/0x2f0
> [   85.069515]__do_page_fault+0x2d1/0x560
> [   85.069518]page_fault+0x22/0x30
> [   85.069520] 
>-> #2 (&mm->mmap_sem){}:
> [   85.069525]__might_fault+0x63/0x90
> [   85.069529]_copy_to_user+0x1e/0x70
> [   85.069532]perf_read+0x21d/0x290
> [   85.069534]__vfs_read+0x1e/0x120
> [   85.069536]vfs_read+0xa1/0x150
> [   85.069539]SyS_read+0x40/0xa0
> [   85.069541]entry_SYSCALL_64_fastpath+0x1c/0x89

>-> #0 ((completion)&self->parked){+.+.}:

> [   85.069692] Chain exists of:
>  (completion)&self->parked --> &mm->mmap_sem --> &dev->struct_mutex

> [   85.069718] 3 locks held by gem_exec_captur/2810:

> [   85.069732]  #2:  (&dev->struct_mutex){+.+.}, at: [] i915_reset_device+0x1bd/0x230 [i915]

>stack backtrace:

> [   85.069788]  lock_acquire+0xaf/0x200
> [   85.069793]  wait_for_common+0x54/0x210
> [   85.069807]  kthread_park+0x3d/0x50
> [   85.069827]  i915_gem_reset_prepare_engine+0x1d/0x90 [i915]
> [   85.069849]  i915_gem_reset_prepare+0x2c/0x60 [i915]
> [   85.069865]  i915_reset+0x66/0x230 [i915]
> [   85.069881]  i915_reset_device+0x1cb/0x230 [i915]
> [   85.069919]  i915_handle_error+0x2d3/0x430 [i915]
> [   85.069951]  i915_wedged_set+0x79/0xc0 [i915]
> [   85.069955]  simple_attr_write+0xab/0xc0
> [   85.069959]  full_proxy_write+0x4b/0x70
> [   85.069961]  __vfs_write+0x1e/0x130
> [   85.069976]  vfs_write+0xc0/0x1b0
> [   85.069979]  SyS_write+0x40/0xa0
> [   85.069982]  entry_SYSCALL_64_fastpath+0x1c/0x89


        Thread-A                                k-Thread

        i915_reset_device
#3        mutex_lock(&dev->struct_mutex)
          i915_reset()
            i915_gem_reset_prepare()
              i915_gem_reset_prepare_engine()
                kthread_park()

                                                __do_page_fault()
#2                                                down_read(&mm->mmap_sem);
                                                  handle_mm_fault()
                                                    __handle_mm_fault()
                                                      __do_fault()
                                                        i915_gem_fault()
                                                          i915_mutex_lock_interruptible()
#3                                                          mutex_lock(&dev->struct_mutex)

                                                /* twiddles thumbs forever more */

#0        wait_for_common()

#0                                              complete()


Is what it says I suppose. Now I don't know enough about that i915 code
to say if that breadcrumbs_signal thread can ever trigger a fault or
not. I got properly lost in that dma_fence callback maze.

You're saying not?


(also, that comment near need_resched() doesn't make sense to me)


Re: [Intel-gfx] [PATCH] kthread: finer-grained lockdep/cross-release completion

2017-12-08 Thread Peter Zijlstra
On Thu, Dec 07, 2017 at 09:56:57PM +0100, Daniel Vetter wrote:
> On Thu, Dec 07, 2017 at 08:57:09PM +0100, Peter Zijlstra wrote:

> > Is what it says I suppose. Now I don't know enough about that i915 code
> > to say if that breadcrumbs_signal thread can ever trigger a fault or
> > not. I got properly lost in that dma_fence callback maze.
> > 
> > You're saying not?
> 
> Our own kthread, no. At least tons of runs on our CI with the kthread
> patch applied shut up the lockdep splats for good. And since we have all the
> i915 kthreads still with the same lockdep_map even with the patch applied,
> since they are all created in the same function, I think that's pretty
> solid evidence.
> 
> [There's also really no reasonable reason for it to fault, but I trust
> automated tools more to check this stuff than my own brain. The test suite
> we're running is fairly nasty and does all kinds of corner case
> thrashing. Note that the dma_fence callbacks can be provided by any
> other driver (think multi-gpu desktops and stuff), but the contract is
> that they must be able to handle hardirq context. Faulting's definitely
> not on the table.]

OK, good.

> The problem lockdep seems to complain about is that some random other
> kthread could fault, end up in the i915 fault handler, and get stuck until
> i915_reset_device is done doing what it needs to do. But as long as that
> kthread is in turn not providing a service that i915_reset_device needs, I
> don't see how that can deadlock. And if we have that case (there was
> definitely plenty of that stuff that cross-release uncovered in our code,
> we had to shuffle a bunch of allocations and things out from under
> dev->struct_mutex), then there should be another lock or completion that
> closes the loop again.

Indeed so.

> > (also, that comment near need_resched() doesn't make sense to me)
> 
> I assume you mean the one in intel_breadcrumbs_signaler(). The hw design
> is somewhat screwed up and depends upon ridiculously low interrupt
> servicing time. We get there by essentially implementing something like
> netdev polled mode, from irq context. Like net polling if the scheduler
> gets pissed at us we stop and dump it all into a kthread. From a latency
> and ops/sec pov a gpu is pretty close to networking sometimes.
> 
> [Note: I just have a rough idea what the code is supposed to do, I didn't
> write/review/debug that one.]

The thing is, though, that calling schedule() from an RT thread doesn't
help anything if it goes running instantly again.

And looking more: that uses the waitqueue code 'creatively'; it doesn't
actually have a condition to wait on, so wtf is it doing with a
waitqueue?



Re: [Intel-gfx] [PATCH] kthread: finer-grained lockdep/cross-release completion

2017-12-08 Thread Peter Zijlstra
On Thu, Dec 07, 2017 at 11:08:49AM +0100, Daniel Vetter wrote:
> Since -rc1 we're hitting a bunch of lockdep splats using the new
> cross-release stuff around the 2 kthread completions. In all cases
> they are because totally independent uses of kthread are mixed up by
> lockdep into the same locking class, creating artificial deadlocks.
> 
> Fix this by converting kthread code in the same way as e.g.
> alloc_workqueue already works: Use macros for the public api so we can
> have a callsite specific lockdep key, then pass that through the
> entire callchain. Due to the many entry points this is slightly
> tedious.
> 
> Cc: Tvrtko Ursulin 
> Cc: Marta Lofstedt 
> Cc: Byungchul Park 
> Cc: Ingo Molnar 
> Cc: Peter Zijlstra 
> Cc: Tejun Heo 
> Cc: Kees Cook 
> Cc: Thomas Gleixner 
> Cc: Shaohua Li 
> Cc: Andrew Morton 
> Cc: Jens Axboe 
> Cc: Daniel Vetter 
> Cc: Greg Kroah-Hartman 
> Cc: Jonathan Corbet 
> Cc: Oleg Nesterov 
> References: https://bugs.freedesktop.org/show_bug.cgi?id=103950
> Signed-off-by: Daniel Vetter 

Acked-by: Peter Zijlstra (Intel) 




Re: [Intel-gfx] [PATCH] kthread: finer-grained lockdep/cross-release completion

2017-12-08 Thread Peter Zijlstra
On Fri, Dec 08, 2017 at 05:36:28PM +0100, Daniel Vetter wrote:

> Aside: Could/should we take some fake lockdep locks around these
> callbacks, since not all drivers call them from a hardirq context? Just to
> validate that everyone follows the contract.

What I typically do is use local_irq_save/restore over the actual
callback. Then if someone doesn't honour the contract, it shows real
quick :-)
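
I.e., wrap the invocation like so (a sketch, with fence/cb standing for the dma_fence and its callback):

        unsigned long flags;

        local_irq_save(flags);
        cb->func(fence, cb);            /* contract: must be hardirq-safe */
        local_irq_restore(flags);

Anything in the callback that sleeps or re-enables interrupts then blows up immediately, instead of only on the rare hardirq path.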


Re: [Intel-gfx] [PATCH] drm/i915: properly init lockdep class

2017-12-13 Thread Peter Zijlstra
On Wed, Dec 13, 2017 at 06:36:33PM +0100, Sebastian Andrzej Siewior wrote:
> +peterz
> context: http://www.spinics.net/lists/intel-gfx/msg149011.html
> 
> On 2017-12-13 17:37:21 [+0200], Joonas Lahtinen wrote:
> > On Wed, 2017-12-13 at 16:06 +0100, Sebastian Andrzej Siewior wrote:
> > > On 2017-12-13 16:00:49 [+0200], Joonas Lahtinen wrote:
> > > > On Thu, 2017-11-30 at 16:19 +0100, Sebastian Andrzej Siewior wrote:
> > > > > The code has an ifdef and uses two functions to either init the bare
> > > > > spinlock or init it and set a lock-class. It is possible to do the same
> > > > > thing without an ifdef.
> > > > > With this patch (in debug case) we first use the "default" lock class
> > > > > which is later overwritten to the supplied one. Without lockdep the set
> > > > > name/class function vanishes.
> …
> > > > At least according to the source there doesn't seem to be clarity about
> > > > what is the right thing to do, this being just one option.

The proposed patch is definitely the right thing to do. The fact that it
doesn't require #ifdef is a very big clue.
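
The ifdef-free pattern being proposed, as a sketch (lock and key names assumed for illustration):

        static struct lock_class_key lock_class;        /* assumed name */

        spin_lock_init(&b->irq_lock);                   /* default, static key */
        lockdep_set_class(&b->irq_lock, &lock_class);   /* override the class */

With lockdep disabled, lockdep_set_class() compiles to nothing, so no #ifdef is needed at the call site.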


Re: [Intel-gfx] [PATCH 0/9] GPU-bound energy efficiency improvements for the intel_pstate driver.

2018-04-12 Thread Peter Zijlstra
On Thu, Apr 12, 2018 at 11:34:54AM -0700, Francisco Jerez wrote:
> The reason for the energy efficiency problem of iowait boosting is
> precisely the greater oscillation between turbo and idle.  Say that
> iowait boost increases the frequency by a factor alpha relative to the
> optimal frequency f0 (in terms of energy efficiency) required to execute
> some IO-bound workload.  This will cause the CPU to be busy for a
> fraction of the time it was busy originally, approximately t1 = t0 /
> alpha, which indeed divides the overall energy usage by a factor alpha,
> but at the same time multiplies the instantaneous power consumption
> while busy by a factor potentially much greater than alpha, since the
> CPU's power curve is largely non-linear, and in fact approximately
> convex within the frequency range allowed by the policy, so you get an
> average energy usage possibly much greater than the optimal.

Ah, but we don't (yet) have the (normalized) power curves, so we cannot
make that call.

Once we have the various energy/OPP numbers required for EAS we can
compute the optimal. I think as much was even mentioned in the thread I
referred to earlier.

Until such time, we boost to max for lack of a better option.
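
To make the convexity argument above concrete, assume for illustration a cubic power curve, $P(f) \propto f^3$ (an assumed model, not measured data). Boosting an IO-bound workload from its optimal frequency $f_0$ to $\alpha f_0$ then gives

        E(\alpha) = P(\alpha f_0) \cdot \frac{t_0}{\alpha}
                  \propto (\alpha f_0)^3 \cdot \frac{t_0}{\alpha}
                  = \alpha^2 f_0^3 t_0
                  = \alpha^2 E(1),

i.e. a boost factor $\alpha$ costs roughly $\alpha^2$ in energy for the same amount of work, which is the penalty being discussed.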


Re: [Intel-gfx] [PATCH 0/9] GPU-bound energy efficiency improvements for the intel_pstate driver.

2018-04-13 Thread Peter Zijlstra
On Thu, Apr 12, 2018 at 12:55:39PM -0700, Francisco Jerez wrote:
> Actually assuming that a single geometric feature of the power curve is
> known -- it being convex in the frequency range allowed by the policy
> (which is almost always the case, not only for Intel CPUs), the optimal
> frequency for an IO-bound workload is fully independent of the exact
> power curve -- It's just the minimum CPU frequency that's able to keep
> the bottlenecking IO device at 100% utilization. 

I think that is difficult to determine with the information at hand. We
have lost all device information by the time we reach the scheduler.


Re: [Intel-gfx] [PATCH 0/9] GPU-bound energy efficiency improvements for the intel_pstate driver.

2018-04-14 Thread Peter Zijlstra
On Fri, Apr 13, 2018 at 06:57:39PM -0700, Francisco Jerez wrote:
> Peter Zijlstra  writes:
> 
> > On Thu, Apr 12, 2018 at 12:55:39PM -0700, Francisco Jerez wrote:
> >> Actually assuming that a single geometric feature of the power curve is
> >> known -- it being convex in the frequency range allowed by the policy
> >> (which is almost always the case, not only for Intel CPUs), the optimal
> >> frequency for an IO-bound workload is fully independent of the exact
> >> power curve -- It's just the minimum CPU frequency that's able to keep
> >> the bottlenecking IO device at 100% utilization. 
> >
> > I think that is difficult to determine with the information at hand. We
> > have lost all device information by the time we reach the scheduler.
> 
> I assume you mean it's difficult to tell whether the workload is
> CPU-bound or IO-bound?  Yeah, it's non-trivial to determine whether the
> system is bottlenecking on IO, it requires additional infrastructure to
> keep track of IO utilization (that's the purpose of PATCH 1), and even

Note that I've not actually seen any of your patches; I only got Cc'ed
later.



Re: [Intel-gfx] drm/fb-helper: Reduce READ_ONCE(master) to lockless_dereference

2016-08-11 Thread Peter Zijlstra
On Wed, Jun 22, 2016 at 08:46:12AM +0100, Chris Wilson wrote:
> We are only documenting that the read is outside of the lock, and do not
> require strict ordering on the operation. In this case the more relaxed
> lockless_dereference() will suffice.

No, no, no... This is 'broken'. lockless_dereference() is _stronger_
than READ_ONCE(), not weaker.

lockless_dereference() is a wrapper around smp_read_barrier_depends()
and is used to form read dependencies. There is no read dependency here,
therefore using lockless_dereference() is entirely pointless.

Look at the definition of lockless_dereference(), it does a READ_ONCE()
and then smp_read_barrier_depends().

Also, clue is in the name: 'dereference', you don't actually dereference
the pointer here, only load it.
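
For reference, the definition at the time (from include/linux/compiler.h):

        #define lockless_dereference(p) \
        ({ \
                typeof(p) _________p1 = READ_ONCE(p); \
                smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
                (_________p1); \
        })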

> +++ b/drivers/gpu/drm/drm_fb_helper.c
> @@ -464,7 +464,7 @@ static bool drm_fb_helper_is_bound(struct drm_fb_helper *fb_helper)
>  
>   /* Sometimes user space wants everything disabled, so don't steal the
>* display if there's a master. */
> - if (READ_ONCE(dev->master))
> + if (lockless_dereference(dev->master))
>   return false;
>  
>   drm_for_each_crtc(crtc, dev) {


Re: [Intel-gfx] [PATCH 1/2] Revert "drm/fb-helper: Reduce READ_ONCE(master) to lockless_dereference"

2016-08-12 Thread Peter Zijlstra
On Thu, Aug 11, 2016 at 11:26:47AM -0700, Paul E. McKenney wrote:
> If my upcoming testing of the two changes together pans out, I will
> give you a Tested-by -- I am guessing that you don't want to wait
> until the next merge window for these changes.

I was planning to stuff them in tip/locking/urgent, so they'd end up in
this release.


Re: [Intel-gfx] freedesktop bug id: 100548, bisected to sched/clock commit

2017-04-12 Thread Peter Zijlstra
On Wed, Apr 12, 2017 at 12:04:00PM +, Lofstedt, Marta wrote:
> Hi,
> 
> We have this "old" Lenovo Cantiga laptop (Intel Core 2 Duo L9400), hooked up
> to our i915 pre-merge CI system, that has started to give unstable results
> after commit:
> 
> commit 7b09cc5a9debc86c903c2eff8f8a1fdef773c649
> Author: Pavel Tatashin 
> Date:   Wed Mar 22 16:24:17 2017 -0400
> 
> sched/clock: Fix broken stable to unstable transfer
> 
> The issue is describe more in detail here:
> https://bugs.freedesktop.org/show_bug.cgi?id=100548

I don't click links.

> We have reverted the above patch and the issue is no longer reproducible.
> Also, note that this issue has not been reproduced on any of our other 
> machines,
> https://intel-gfx-ci.01.org/CI/
> 
> So, why is this only affecting the Core 2 Duo?

Core2 doesn't have a usable TSC and would revert to the slow path. I'll
have another look at that patch.



Re: [Intel-gfx] freedesktop bug id: 100548, bisected to sched/clock commit

2017-04-12 Thread Peter Zijlstra
On Wed, Apr 12, 2017 at 04:30:32PM +0300, Jani Nikula wrote:
> On Wed, 12 Apr 2017, Peter Zijlstra  wrote:
> > On Wed, Apr 12, 2017 at 12:04:00PM +, Lofstedt, Marta wrote:

> >> The issue is describe more in detail here:
> >> https://bugs.freedesktop.org/show_bug.cgi?id=100548
> >
> > I don't click links.
> 
> Frankly, I find that more than a little offensive as a response to a
> regression report.

*shrug*, it's more than you would've gotten if you'd filed a kernel
bugzilla. Those go straight to /dev/null.


Re: [Intel-gfx] freedesktop bug id: 100548, bisected to sched/clock commit

2017-04-12 Thread Peter Zijlstra
On Wed, Apr 12, 2017 at 04:40:11PM +0300, Martin Peres wrote:

> > > So, why is this only affecting the Core 2 Duo?
> > 
> > Core2 doesn't have a usable TSC and would revert to the slow path. I'll
> > have another look at that patch.
> > 
> 
> So, by default, it is using the hpet clock source. FYI, I tried the only
> other available clock source (acpi_pm) and got the same result.

So because HPET is unbearably slow we've cobbled together something that
takes the HPET (or rather get-time-of-day CLOCK_MONOTONIC) value at tick
time and uses TSC to add per-cpu increments to that. Using windowing to
keep the TSC madness at bay.

The patch in question affects the windowing; clearly something went
amiss.
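
The increment-plus-window idea, as a simplified sketch of kernel/sched/clock.c (the real code also enforces monotonicity against the previous reading and publishes the result with cmpxchg):

        static u64 sched_clock_local(struct sched_clock_data *scd)
        {
                u64 delta, clock, min_clock, max_clock;

                /* TSC-based ns since the last tick on this CPU */
                delta = sched_clock() - scd->tick_raw;

                /* GTOD value sampled at the tick, plus the TSC increment */
                clock = scd->tick_gtod + delta;

                /* window: never before the tick, never past tick + TICK_NSEC */
                min_clock = scd->tick_gtod;
                max_clock = scd->tick_gtod + TICK_NSEC;

                clock = max(clock, min_clock);
                clock = min(clock, max_clock);

                return clock;
        }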


Re: [Intel-gfx] freedesktop bug id: 100548, bisected to sched/clock commit

2017-04-12 Thread Peter Zijlstra
On Wed, Apr 12, 2017 at 05:49:53PM +0300, Martin Peres wrote:
> 
> 
> On 12/04/17 17:32, Peter Zijlstra wrote:
> > On Wed, Apr 12, 2017 at 04:40:11PM +0300, Martin Peres wrote:
> > 
> > > > > So, why is this only affecting the Core 2 Duo?
> > > > 
> > > > Core2 doesn't have a usable TSC and would revert to the slow path. I'll
> > > > have another look at that patch.
> > > > 
> > > 
> > > So, by default, it is using the hpet clock source. FYI, I tried the only
> > > other available clock source (acpi_pm) and got the same result.
> > 
> > So because HPET is unbearably slow we've cobbled together something that
> > takes the HPET (or rather get-time-of-day CLOCK_MONOTONIC) value at tick
> > time and uses TSC to add per-cpu increments to that. Using windowing to
> > keep the TSC madness at bay.
> > 
> > The patch in question affects the windowing.. clearly something went
> > amiss.
> > 
> 
> Good to know. Is there a way to disable this behaviour, as a workaround for
> our CI system until a proper fix lands? We already pushed locally the revert
> for this patch, but that may affect other platforms which do not exhibit the
> problem.

No, the only thing you can do is revert for now.


Re: [Intel-gfx] freedesktop bug id: 100548, bisected to sched/clock commit

2017-04-13 Thread Peter Zijlstra
On Wed, Apr 12, 2017 at 05:49:53PM +0300, Martin Peres wrote:

> Good to know. Is there a way to disable this behaviour, as a workaround for
> our CI system until a proper fix lands? We already pushed locally the revert
> for this patch, but that may affect other platforms which do not exhibit the
> problem.

Blergh, so the patch is correct, but the __gtod_offset calculation is
fed with absolute crap numbers due to 'circumstances' and then using it
ends up being worse than not using it.

Something like the below could be a work-around, but let me see if I
can't fix things better.

---
 arch/x86/kernel/tsc.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 714dfba..8ab883a 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -374,6 +374,8 @@ static int __init tsc_setup(char *str)
tsc_clocksource_reliable = 1;
if (!strncmp(str, "noirqtime", 9))
no_sched_irq_time = 1;
+   if (!strcmp(str, "unstable"))
+   mark_tsc_unstable("boot parameter");
return 1;
 }
 


Re: [Intel-gfx] freedesktop bug id: 100548, bisected to sched/clock commit

2017-04-13 Thread Peter Zijlstra
On Thu, Apr 13, 2017 at 03:30:25PM +0300, Martin Peres wrote:
> On 13/04/17 14:48, Peter Zijlstra wrote:
> > On Wed, Apr 12, 2017 at 05:49:53PM +0300, Martin Peres wrote:
> > 
> > > Good to know. Is there a way to disable this behaviour, as a workaround 
> > > for
> > > our CI system until a proper fix lands? We already pushed locally the 
> > > revert
> > > for this patch, but that may affect other platforms which do not exhibit 
> > > the
> > > problem.
> > 
> > Blergh, so the patch is correct, but the __gtod_offset calculation is
> > fed with absolute crap numbers due to 'circumstances' and then using it
> > ends up being worse than not using it.
> 
> Thanks for taking this bug seriously!

So I've not actually dug out a Core2 machine, so have only tested this
by poking random values into the TSC MSR on an otherwise 'good' machine.

Could you give it a go to see if it works for you?

Thomas, how much hate?

---
Subject: sched/clock,x86/tsc: Improve clock continuity for stable->unstable 
transition
From: Peter Zijlstra 
Date: Thu Apr 13 14:56:44 CEST 2017

Marta reported that commit:

  7b09cc5a9deb ("sched/clock: Fix broken stable to unstable transfer")

Appeared to have broken things on a Core2Duo machine. While that patch
is in fact correct, it exposes a problem with commit:

  5680d8094ffa ("sched/clock: Provide better clock continuity")

Where we hoped that TSC would not make big jumps after SMP bringup. Of
course, TSC needs to prove us wrong. Because Core2 comes up with a
semi-stable TSC and only goes funny once we probe the idle drivers,
because Core2 stops TSC on idle.

Now we could of course delay the final switch to stable longer, but it
would be better to entirely remove the assumption that TSC doesn't
make big jumps and improve things all-round.

So instead we have the clocksource watchdog call a special function
when it finds the TSC is still good (there's a race, it could've
gotten bad between us determining it's still good and calling our
function, do we care?).

This function then updates the __gtod_offset using sane values, which
is the value needed for clock continuity when being marked unstable.

Cc: Pavel Tatashin 
Cc: Martin Peres 
Reported-by: "Lofstedt, Marta" 
Fixes: 5680d8094ffa ("sched/clock: Provide better clock continuity")
Signed-off-by: Peter Zijlstra (Intel) 
---
 arch/x86/kernel/tsc.c   |   12 ++
 include/linux/clocksource.h |1 
 include/linux/sched/clock.h |2 -
 kernel/sched/clock.c|   50 
 kernel/time/clocksource.c   |3 ++
 5 files changed, 45 insertions(+), 23 deletions(-)

--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -374,6 +374,8 @@ static int __init tsc_setup(char *str)
tsc_clocksource_reliable = 1;
if (!strncmp(str, "noirqtime", 9))
no_sched_irq_time = 1;
+   if (!strcmp(str, "unstable"))
+   mark_tsc_unstable("boot parameter");
return 1;
 }
 
@@ -1127,6 +1129,15 @@ static void tsc_cs_mark_unstable(struct
pr_info("Marking TSC unstable due to clocksource watchdog\n");
 }
 
+static void tsc_cs_tick_stable(struct clocksource *cs)
+{
+   if (tsc_unstable)
+   return;
+
+   if (using_native_sched_clock())
+   sched_clock_tick_stable();
+}
+
 /*
  * .mask MUST be CLOCKSOURCE_MASK(64). See comment above read_tsc()
  */
@@ -1140,6 +1151,7 @@ static struct clocksource clocksource_ts
.archdata   = { .vclock_mode = VCLOCK_TSC },
.resume = tsc_resume,
.mark_unstable  = tsc_cs_mark_unstable,
+   .tick_stable= tsc_cs_tick_stable,
 };
 
 void mark_tsc_unstable(char *reason)
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -96,6 +96,7 @@ struct clocksource {
void (*suspend)(struct clocksource *cs);
void (*resume)(struct clocksource *cs);
void (*mark_unstable)(struct clocksource *cs);
+   void (*tick_stable)(struct clocksource *cs);
 
/* private: */
 #ifdef CONFIG_CLOCKSOURCE_WATCHDOG
--- a/include/linux/sched/clock.h
+++ b/include/linux/sched/clock.h
@@ -63,8 +63,8 @@ extern void clear_sched_clock_stable(void)
  */
 extern u64 __sched_clock_offset;
 
-
 extern void sched_clock_tick(void);
+extern void sched_clock_tick_stable(void);
 extern void sched_clock_idle_sleep_event(void);
 extern void sched_clock_idle_wakeup_event(u64 delta_ns);
 
--- a/kernel/sched/clock.c
+++ b/kernel/sched/clock.c
@@ -152,25 +152,15 @@ static void __clear_sched_clock_stable(v
 {
struct sched_clock_data *scd = this_scd();
 
-   /*
-* Attempt to make the stable->unstable transition continuous.
-*
-* Trouble is, t

Re: [Intel-gfx] freedesktop bug id: 100548, bisected to sched/clock commit

2017-04-18 Thread Peter Zijlstra
On Tue, Apr 18, 2017 at 02:10:07PM +, Lofstedt, Marta wrote:
> Sorry Peter, I still see regression on the Core2 machine, with your patch.
> 

Blergh, ok. I'll see if I can dig out an actual Core2 machine somewhere.
I should have enough parts about.


Re: [Intel-gfx] freedesktop bug id: 100548, bisected to sched/clock commit

2017-04-20 Thread Peter Zijlstra
On Tue, Apr 18, 2017 at 05:53:56PM +0200, Peter Zijlstra wrote:
> On Tue, Apr 18, 2017 at 02:10:07PM +, Lofstedt, Marta wrote:
> > Sorry Peter, I still see regression on the Core2 machine, with your patch.
> > 
> 
> Blergh, ok. I'll see if I can dig out an actual Core2 machine somewhere.
> I should have enough parts about.

Argh!

My Core2 seems to work as expected _without_ this patch (time is
continuous at the stable->unstable switch), but is broken (time jumps
backwards by almost .2s) with this patch -- because by the time ACPI
Processor marks TSC as busted, we haven't run the clocksource watchdog
yet.


Just for my sanity, could you confirm "tsc=unstable" (which requires the
patch) actually works for you?

I'll go prod at things now that I have an actual Core2 running; although
sadly I don't see an obvious problem without this patch.


Re: [Intel-gfx] freedesktop bug id: 100548, bisected to sched/clock commit

2017-04-20 Thread Peter Zijlstra
On Thu, Apr 20, 2017 at 07:19:50PM +0200, Peter Zijlstra wrote:

> Just for my sanity, could you confirm "tsc=unstable" (which requires the
> patch) actually works for you?


Also, could you get me the dmesg of a 'broken' boot?


Re: [Intel-gfx] [PATCH 11/12] sched: use for_each_if in topology.h

2018-07-09 Thread Peter Zijlstra
On Mon, Jul 09, 2018 at 10:36:49AM +0200, Daniel Vetter wrote:
> Avoids complaints from gcc about ambiguous else clauses.

Is that a new thing? I'm fairly sure I've never seen it do that.

> Signed-off-by: Daniel Vetter 
> Cc: Andrew Morton 
> Cc: Peter Zijlstra 
> ---
>  include/linux/topology.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/linux/topology.h b/include/linux/topology.h
> index cb0775e1ee4b..4fba5a5b148d 100644
> --- a/include/linux/topology.h
> +++ b/include/linux/topology.h
> @@ -40,7 +40,7 @@
>  
>  #define for_each_node_with_cpus(node)\
>   for_each_online_node(node)  \
> - if (nr_cpus_node(node))
> + for_each_if (nr_cpus_node(node))

Not having gotten any of the other patches, I'm not really sure what
this does and such, but improve readability it does not :/


Re: [Intel-gfx] [PATCH 11/12] sched: use for_each_if in topology.h

2018-07-09 Thread Peter Zijlstra
On Mon, Jul 09, 2018 at 05:00:07PM +0200, Daniel Vetter wrote:
> On Mon, Jul 9, 2018 at 12:36 PM, Peter Zijlstra  wrote:
> > On Mon, Jul 09, 2018 at 10:36:49AM +0200, Daniel Vetter wrote:

> >>  #define for_each_node_with_cpus(node)\
> >>   for_each_online_node(node)  \
> >> - if (nr_cpus_node(node))
> >> + for_each_if (nr_cpus_node(node))
> >
> > Not having gotten any of the other patches, I'm not really sure what
> > this does and such, but improve readability it does not :/
> 
> Patch 1 in this series, which I dumped onto lkml as a whole:
> 
> https://lkml.org/lkml/2018/7/9/179

Right, so while I don't object to being Cc'ed on the whole series, I do
mind not being Cc'ed on at least the generic bits required to understand
the patch I do have to look at.

> Imo it does improve readability for the if (!cond) {} else pattern.
> And (assuming my grep fu isn't too badly wrong) most places in the
> kernel do use this pattern in for_each macros, so I guess it's a real
> thing. We've definitely hit it plenty in drm iterators (but we seem to
> like if() checks in iterator macros maybe a bit too much).
> 
> I'm happy to drop this patch, though, if you deem it offensive.

I'd just like to understand it better; what compiler complains about
this and is the warning otherwise useful? These things don't seem
mentioned in that initial patch either.

IOW I suppose I'm asking for the justification of this churn. If it's
really needed and useful so be it, but so far I'm not seeing any.

At a wild guess I'd say this is something new in gcc-8 (and while I
have that installed on some machines, it doesn't seem to be the default,
and so I've not actually seen its output). But is the warning actually
useful? Should we not just kill it, like we tend to do with some really
silly ones?


Re: [Intel-gfx] [PATCH 11/12] sched: use for_each_if in topology.h

2018-07-09 Thread Peter Zijlstra
On Mon, Jul 09, 2018 at 05:52:04PM +0200, Daniel Vetter wrote:
> for_each_something(foo)
>         if (foo->bla)
>                 call_bla(foo);
>         else
>                 call_default(foo);
> 
> Totally contrived, but this complains. Liberally sprinkling {} also shuts
> up the compiler, but it's a bit confusing given that a plain for (;;) {} is
> totally fine. And it's confusing since at first glance the compiler
> complaining about nested if and ambiguous else doesn't make sense since
> clearly there's only one if there.

Ah, so the pattern the compiler tries to warn about is:

        if (foo)
                if (bar)
                        /* stmts1 */
                else
                        /* stmts2 */

Because it might not be immediately obvious with which if the else goes.
Which is fair enough I suppose.

OK, ACK.
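
For reference, the drm helper sidesteps exactly that ambiguity by spelling the else out:

        #define for_each_if(condition) if (!(condition)) {} else

so an else written by the user can only ever bind to their own if.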


