The following commit has been merged into the sched/core branch of tip:
Commit-ID:  e98fa02c4f2ea4991dae422ac7e34d102d2f0599
Gitweb:     https://git.kernel.org/tip/e98fa02c4f2ea4991dae422ac7e34d102d2f0599
Author:     Paul Turner
AuthorDate: Fri, 10 Apr 2020 15:52:07 -07:00
Committer
From: Nadav Amit
Date: Fri, May 10, 2019 at 7:45 PM
To:
Cc: Borislav Petkov, Nadav Amit, Andy Lutomirski, Ingo Molnar, Peter
Zijlstra, Thomas Gleixner, Jann Horn
> It may be useful to check at runtime whether certain assertions are
> violated even during speculative execution. This can allow t
On Mon, Jan 8, 2018 at 4:48 PM, David Woodhouse wrote:
> On Tue, 2018-01-09 at 00:44 +, Woodhouse, David wrote:
>> On IRC, Arjan assures me that 'pause' here really is sufficient as a
>> speculation trap. If we do end up returning back here as a
>> misprediction, that 'pause' will stop the spe
On Mon, Jan 8, 2018 at 5:21 PM, Andi Kleen wrote:
> On Mon, Jan 08, 2018 at 05:16:02PM -0800, Andi Kleen wrote:
>> > If we clear the registers, what the hell are you going to put in the
>> > RSB that helps you?
>>
>> RSB allows you to control chains of gadgets.
>
> I admit the gadget thing is a bi
On Mon, Jan 8, 2018 at 2:25 PM, Andi Kleen wrote:
>> So pjt did alignment, a single unroll and per discussion earlier today
>> (CET) or late last night (PST), he only does 16.
>
> I used the Intel recommended sequence, which recommends 32.
>
> Not sure if alignment makes a difference. I can check.
On Mon, Jan 8, 2018 at 2:11 PM, Peter Zijlstra wrote:
> On Mon, Jan 08, 2018 at 12:15:31PM -0800, Andi Kleen wrote:
>> diff --git a/arch/x86/include/asm/nospec-branch.h
>> b/arch/x86/include/asm/nospec-branch.h
>> index b8c8eeacb4be..e84e231248c2 100644
>> --- a/arch/x86/include/asm/nospec-branch
ing a binary, or a new AMD processor, 32 calls are
required. I would suggest tuning this based on the current CPU (which
also covers the future case while saving cycles now) to save overhead.
On Mon, Jan 8, 2018 at 3:16 AM, Andrew Cooper wrote:
> On 08/01/18 10:42, Paul Turner wrote:
>>
On Mon, Jan 8, 2018 at 2:45 AM, David Woodhouse wrote:
> On Mon, 2018-01-08 at 02:34 -0800, Paul Turner wrote:
>> One detail that is missing is that we still need RSB refill in some
>> cases.
>> This is not because the retpoline sequence itself will underflow (it
>> is
On Mon, Jan 8, 2018 at 2:38 AM, Jiri Kosina wrote:
> On Mon, 8 Jan 2018, Paul Turner wrote:
>
>> user->kernel in the absence of SMEP:
>> In the absence of SMEP, we must worry about user-generated RSB entries
>> being consumable by kernel execution.
>> Generally sp
tem it took ~43 cycles on average. Note that non-zero
displacement calls should be used as these may be optimized to not
interact with the RSB due to their use in fetching RIP for 32-bit
relocations.
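A minimal sketch of the RSB-stuffing idea discussed above; the loop shape,
labels, and count are illustrative, and the sequence eventually merged
differs in detail. Each non-zero-displacement 'call' pushes one
return-predictor entry; the pause loop is a speculation trap for any stale
predicted 'ret':

	mov	$16, %ecx		/* 16 vs. 32 iterations, per the thread */
1:	call	2f			/* non-zero displacement; pushes an RSB entry */
3:	pause				/* speculation trap */
	jmp	3b
2:	dec	%ecx
	jnz	1b
	add	$(16 * 8), %rsp		/* discard the 16 pushed return addresses */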
On Mon, Jan 8, 2018 at 2:34 AM, Paul Turner wrote:
> One detail that is missing is that we sti
On Fri, Jan 5, 2018 at 3:26 AM, Paolo Bonzini wrote:
> On 05/01/2018 11:28, Paul Turner wrote:
>>
>> The "pause; jmp" sequence proved minutely faster than "lfence;jmp" which is
>> why
>> it was chosen.
>>
>> "pause; jmp" 3
On Fri, Jan 5, 2018 at 3:32 AM, Paolo Bonzini wrote:
> On 04/01/2018 22:22, Van De Ven, Arjan wrote:
>> this is about a level of paranoia you are comfortable with.
>>
>> Retpoline on Skylake raises the bar for the issue enormously, but
>> there are a set of corner cases that exist and that are not
On Thu, Jan 4, 2018 at 11:33 AM, Linus Torvalds
wrote:
> On Thu, Jan 4, 2018 at 11:19 AM, David Woodhouse wrote:
>>
>> On Skylake the target for a 'ret' instruction may also come from the
>> BTB. So if you ever let the RSB (which remembers where the 'call's came
>> from) get empty, you end up vuln
On Fri, Jan 05, 2018 at 10:55:38AM +, David Woodhouse wrote:
> On Fri, 2018-01-05 at 02:28 -0800, Paul Turner wrote:
> > On Thu, Jan 04, 2018 at 07:27:58PM +, David Woodhouse wrote:
> > > On Thu, 2018-01-04 at 10:36 -0800, Alexei Starovoitov wrote:
> >
On Thu, Jan 04, 2018 at 08:18:57AM -0800, Andy Lutomirski wrote:
> On Thu, Jan 4, 2018 at 1:30 AM, Woodhouse, David wrote:
> > On Thu, 2018-01-04 at 01:10 -0800, Paul Turner wrote:
> >> Apologies for the discombobulation around today's disclosure. Obviously
> >
On Thu, Jan 04, 2018 at 10:25:35AM -0800, Linus Torvalds wrote:
> On Thu, Jan 4, 2018 at 10:17 AM, Alexei Starovoitov
> wrote:
> >
> > Clearly Paul's approach to retpoline without lfence is faster.
Using pause rather than lfence does not represent a fundamental difference here.
A protected indir
On Thu, Jan 04, 2018 at 10:40:23AM -0800, Andi Kleen wrote:
> > Clearly Paul's approach to retpoline without lfence is faster.
> > I'm guessing it wasn't shared with amazon/intel until now and
> > this set of patches going to adopt it, right?
> >
> > Paul, could you share a link to a set of altern
On Thu, Jan 04, 2018 at 07:27:58PM +, David Woodhouse wrote:
> On Thu, 2018-01-04 at 10:36 -0800, Alexei Starovoitov wrote:
> >
> > Pretty much.
> > Paul's writeup: https://support.google.com/faqs/answer/7625886
> > tldr: jmp *%r11 gets converted to:
> > call set_up_target;
> > capture_spec:
>
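For reference, the sequence in that writeup continues (reconstructed from
the public document linked above):

	jmp *%r11

becomes:

	call set_up_target
capture_spec:			/* speculation trap */
	pause
	jmp capture_spec
set_up_target:
	mov %r11, (%rsp)	/* replace the return address with the real target */
	ret			/* architecturally jumps to *%r11; any
				   misprediction lands in capture_spec */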
On Thu, Jan 4, 2018 at 1:10 AM, Paul Turner wrote:
> Apologies for the discombobulation around today's disclosure. Obviously the
> original goal was to communicate this a little more coherently, but the
> unscheduled advances in the disclosure disrupted the efforts to pull this
Apologies for the discombobulation around today's disclosure. Obviously the
original goal was to communicate this a little more coherently, but the
unscheduled advances in the disclosure disrupted the efforts to pull this
together more cleanly.
I wanted to open discussion of the "retpoline" approach
On Wed, Jan 3, 2018 at 3:51 PM, Linus Torvalds
wrote:
> On Wed, Jan 3, 2018 at 3:09 PM, Andi Kleen wrote:
>> This is a fix for Variant 2 in
>> https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html
>>
>> Any speculative indirect calls in the kernel can be tricked
I have some comments that apply to many of the threads.
I've been fully occupied with a wedding and a security issue; but I'm
about to be free to spend the majority of my time on RSEQ things.
I was sorely hoping that day would be today. But it's looking like
I'm still a day or two from being free
On Thu, Mar 30, 2017 at 7:14 AM, Peter Zijlstra wrote:
> On Thu, Mar 30, 2017 at 02:16:58PM +0200, Peter Zijlstra wrote:
>> On Thu, Mar 30, 2017 at 04:21:08AM -0700, Paul Turner wrote:
>
>> > > +
>> > > + if (unlikely(periods >= LOAD_AVG_MAX_N))
>
On Fri, Mar 31, 2017 at 12:01 AM, Peter Zijlstra wrote:
> On Thu, Mar 30, 2017 at 03:02:47PM -0700, Paul Turner wrote:
>> On Thu, Mar 30, 2017 at 7:14 AM, Peter Zijlstra wrote:
>> > On Thu, Mar 30, 2017 at 02:16:58PM +0200, Peter Zijlstra wrote:
>> >> On Thu, Ma
On Mon, Mar 20, 2017 at 11:08 AM, Patrick Bellasi
wrote:
> On 20-Mar 13:15, Tejun Heo wrote:
>> Hello,
>>
>> On Tue, Feb 28, 2017 at 02:38:38PM +, Patrick Bellasi wrote:
>> > This patch extends the CPU controller by adding a couple of new
>> > attributes, capacity_min and capacity_max, which c
There is one important, fundamental difference here:
{cfs,rt}_{period,runtime}_us is a property that applies to a group of
threads; it can be sub-divided.
We can consume 100ms of quota either by having one thread run for
100ms, or 2 threads running for 50ms.
This is not true for capacity. It's a
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2767,7 +2767,7 @@ static const u32 __accumulated_sum_N32[]
> * Approximate:
> * val * y^n,where y^32 ~= 0.5 (~1 scheduling period)
> */
> -static __always_inline u64 decay_load(u64 val, u64 n)
> +static u64 decay_load(u64 val
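A hedged illustration of the formula in the quoted comment; the kernel uses
precomputed fixed-point tables (and truncates once n exceeds
LOAD_AVG_MAX_N), so this floating-point version only demonstrates the math:

	#include <math.h>

	/* decay_load() computes val * y^n with y^32 == 0.5, i.e. a load
	 * contribution halves roughly every 32 periods. */
	static double decay_load_ref(double val, unsigned int n)
	{
		return val * pow(0.5, n / 32.0);	/* y = 0.5^(1/32) */
	}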
On Thu, Feb 2, 2017 at 12:06 PM, Tejun Heo wrote:
> Hello,
>
> This patchset implements cgroup v2 thread mode. It is largely based
> on the discussions that we had at the plumbers last year. Here's the
> rough outline.
>
> * Thread mode is explicitly enabled on a cgroup by writing "enable"
> i
On Mon, Dec 19, 2016 at 3:29 PM, Samuel Thibault
wrote:
> Paul Turner, on Mon 19 Dec 2016 15:26:19 -0800, wrote:
>> >> > - if (shares < MIN_SHARES)
>> >> > - shares = MIN_SHARES;
>> > ...
>> >> > return
On Mon, Dec 19, 2016 at 3:07 PM, Samuel Thibault
wrote:
> Paul Turner, on Mon 19 Dec 2016 14:44:38 -0800, wrote:
>> On Mon, Dec 19, 2016 at 2:40 PM, Samuel Thibault
>> wrote:
>> > 2159197d6677 ("sched/core: Enable increased load resolution on 64-bit
>> > k
On Mon, Dec 19, 2016 at 2:40 PM, Samuel Thibault
wrote:
> 2159197d6677 ("sched/core: Enable increased load resolution on 64-bit
> kernels")
>
> exposed yet another miscalculation in calc_cfs_shares: MIN_SHARES is unscaled,
> and must thus be scaled before being manipulated against "shares" amount
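A hedged sketch of the fix being discussed (not necessarily the patch as
merged): scale MIN_SHARES into the same fixed-point unit before clamping.

	if (shares < scale_load(MIN_SHARES))
		shares = scale_load(MIN_SHARES);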
>> The restartable critical sections (percpu atomics) work has been started
>> by Paul Turner and Andrew Hunter. It lets the kernel handle restart of
>> critical sections. [1] [2] The re-implementation proposed here brings a
>> few simplifications to the ABI which facilita
On Mon, Aug 22, 2016 at 7:00 AM, Dietmar Eggemann
wrote:
>
> Since commit 2159197d6677 ("sched/core: Enable increased load resolution
> on 64-bit kernels") we now have two different fixed point units for
> load.
> shares in calc_cfs_shares() has 20 bit fixed point unit on 64-bit
> kernels. Therefo
eans that _wait_on_bit_lock can return -EINTR up to
__lock_page; which does not validate the return code and blindly
returns. This looks to have been a previously existing bug, but it
was at least masked by the fact that it required a fatal signal
previously (and that the page we return unlo
On Mon, Nov 2, 2015 at 12:34 PM, Peter Zijlstra wrote:
> On Mon, Nov 02, 2015 at 12:27:05PM -0800, Paul Turner wrote:
>> I suspect this part might be more explicitly expressed by specifying
>> the requirements that migration satisfies; then providing an example.
>> This makes
On Wed, Oct 28, 2015 at 6:58 PM, Peter Zijlstra wrote:
> On Wed, Oct 28, 2015 at 05:57:10PM -0700, Olav Haugan wrote:
>> On 15-10-25 11:09:24, Peter Zijlstra wrote:
>> > On Sat, Oct 24, 2015 at 11:01:02AM -0700, Olav Haugan wrote:
>> > > Task->on_rq has three states:
>> > > 0 - Task is not on ru
On Mon, Nov 2, 2015 at 5:29 AM, Peter Zijlstra wrote:
> These are some notes on the scheduler locking and how it provides
> program order guarantees on SMP systems.
>
> Cc: Linus Torvalds
> Cc: Will Deacon
> Cc: Oleg Nesterov
> Cc: Boqun Feng
> Cc: "Paul E. McKenney"
> Cc: Jonathan Corbet
>
On Tue, Oct 27, 2015 at 10:03 PM, Peter Zijlstra wrote:
>
> On Tue, Oct 27, 2015 at 04:57:05PM -0700, Paul Turner wrote:
> > +static void rseq_sched_out(struct preempt_notifier *pn,
> > +struct task_struct *next)
> > +{
> > + set
From: Paul Turner
Recall that the kernel ABI generally consists of:
a) A shared TLS word which exports the current cpu and event-count
b) A shared TLS word which, when non-zero, stores the first post-commit
instruction if a sequence is active. (The kernel
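A hedged sketch of the two TLS words described above; names and packing are
illustrative, not the posted ABI:

	#include <stdint.h>

	struct rseq_tls_sketch {
		/* (a) current cpu plus an event count bumped on preemption,
		 * migration, and signal delivery */
		uint32_t cpu;
		uint32_t event_count;
		/* (b) while non-zero, address of the first post-commit
		 * instruction of the active sequence */
		uint64_t post_commit_ip;
	};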
rrupted by signals.
"basic_percpu_ops_test" is a slightly more "realistic" variant, implementing a
few simple per-cpu operations and testing their correctness. It also includes
a trivial example of how user-space may multiplex the critical section via the
restart handle
This is an update to the previously posted series at:
https://lkml.org/lkml/2015/6/24/665
Dave Watson has posted a similar follow-up which allows additional critical
regions to be registered as well as single-step support at:
https://lkml.org/lkml/2015/10/22/588
This series is a new approach
From: Paul Turner
Introduce the notion of a restartable sequence. This is a piece of user code
that can be described in 3 components:
1) Establish where [e.g. which cpu] the thread is running
2) Preparatory work that is dependent on the state in [1].
3) A committing instruction that
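Hedged pseudocode of the three components, using a per-cpu counter
increment; current_cpu() is a stand-in, and the rewind to 'restart' is
performed by the kernel, not by user-space control flow:

	void percpu_inc(long *counter)	/* one slot per cpu */
	{
	restart:
		{
			int cpu = current_cpu();  /* (1) where are we running? */
			long old = counter[cpu];  /* (2) prep dependent on (1) */
			counter[cpu] = old + 1;   /* (3) single committing store */
		}
		/* If preempted or migrated between (1) and (3), the kernel
		 * restarts the thread at 'restart', so stale preparatory
		 * work never commits. */
	}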
On Thu, Oct 1, 2015 at 11:46 AM, Tejun Heo wrote:
> Hello, Paul.
>
> Sorry about the delay. Things were kinda hectic in the past couple
> weeks.
Likewise :-(
>
> On Fri, Sep 18, 2015 at 04:27:07AM -0700, Paul Turner wrote:
>> On Sat, Sep 12, 2015 at 7:40 AM, Tejun Heo
On Sat, Sep 12, 2015 at 7:40 AM, Tejun Heo wrote:
> Hello,
>
> On Wed, Sep 09, 2015 at 05:49:31AM -0700, Paul Turner wrote:
>> I do not think this is a layering problem. This is more like C++:
>> there is no sane way to concurrently use all the features available,
>
24, 2015 at 04:06:39PM -0700, Paul Turner wrote:
>> > This is an erratic behavior on cpuset's part tho. Nothing else
>> > behaves this way and it's borderline buggy.
>>
>> It's actually the only sane possible interaction here.
>>
>> If you do
On Mon, Aug 24, 2015 at 3:49 PM, Tejun Heo wrote:
> Hello,
>
> On Mon, Aug 24, 2015 at 03:03:05PM -0700, Paul Turner wrote:
>> > Hmm... I was hoping for an actual configurations and usage scenarios.
>> > Preferably something people can set up and play with.
>>
>
On Mon, Aug 24, 2015 at 3:19 PM, Tejun Heo wrote:
> Hey,
>
> On Mon, Aug 24, 2015 at 02:58:23PM -0700, Paul Turner wrote:
>> > Why isn't it? Because the programs themselves might try to override
>> > it?
>>
>> The major reasons are:
>>
>>
On Mon, Aug 24, 2015 at 2:40 PM, Tejun Heo wrote:
> On Mon, Aug 24, 2015 at 02:19:29PM -0700, Paul Turner wrote:
>> > Would it be possible for you to give realistic and concrete examples?
>> > I'm not trying to play down the use cases but concrete examples are
>&
On Mon, Aug 24, 2015 at 2:36 PM, Tejun Heo wrote:
> Hello, Paul.
>
> On Mon, Aug 24, 2015 at 01:52:01PM -0700, Paul Turner wrote:
>> We typically share our machines between many jobs, these jobs can have
>> cores that are "private" (and not shared with other jobs
On Mon, Aug 24, 2015 at 2:17 PM, Tejun Heo wrote:
> Hello,
>
> On Mon, Aug 24, 2015 at 02:10:17PM -0700, Paul Turner wrote:
>> Suppose that we have 10 vcpu threads and 100 support threads.
>> Suppose that we want the support threads to receive up to 10% of the
>> ti
On Mon, Aug 24, 2015 at 2:12 PM, Tejun Heo wrote:
> Hello, Paul.
>
> On Mon, Aug 24, 2015 at 02:00:54PM -0700, Paul Turner wrote:
>> > Hmmm... I'm trying to understand the usecases where having hierarchy
>> > inside a process are actually required so that we d
On Mon, Aug 24, 2015 at 2:02 PM, Tejun Heo wrote:
> Hello,
>
> On Mon, Aug 24, 2015 at 01:54:08PM -0700, Paul Turner wrote:
>> > That alone doesn't require hierarchical resource distribution tho.
>> > Setting nice levels reasonably is likely to alleviate most of
On Mon, Aug 24, 2015 at 1:25 PM, Tejun Heo wrote:
> Hello, Austin.
>
> On Mon, Aug 24, 2015 at 04:00:49PM -0400, Austin S Hemmelgarn wrote:
>> >That alone doesn't require hierarchical resource distribution tho.
>> >Setting nice levels reasonably is likely to alleviate most of the
>> >problem.
>>
>
On Mon, Aug 24, 2015 at 10:04 AM, Tejun Heo wrote:
> Hello, Austin.
>
> On Mon, Aug 24, 2015 at 11:47:02AM -0400, Austin S Hemmelgarn wrote:
>> >Just to learn more, what sort of hypervisor support threads are we
>> >talking about? They would have to consume considerable amount of cpu
>> >cycles f
On Sat, Aug 22, 2015 at 11:29 AM, Tejun Heo wrote:
> Hello, Paul.
>
> On Fri, Aug 21, 2015 at 12:26:30PM -0700, Paul Turner wrote:
> ...
>> A very concrete example of the above is a virtual machine in which you
>> want to guarantee scheduling for the vCPU threads which
On Tue, Aug 18, 2015 at 1:31 PM, Tejun Heo wrote:
> Hello, Paul.
>
> On Mon, Aug 17, 2015 at 09:03:30PM -0700, Paul Turner wrote:
>> > 2) Control within an address-space. For subsystems with fungible
>> > resources,
>> > e.g. CPU, it can be useful for a
Apologies for the repeat. Gmail ate its plain text setting for some
reason. Shame bells.
On Mon, Aug 17, 2015 at 9:02 PM, Paul Turner wrote:
>
>
> On Wed, Aug 5, 2015 at 7:31 AM, Tejun Heo wrote:
>>
>> Hello,
>>
>> On Wed, Aug 05, 2015 at 11:10:36AM +0200,
On Fri, Jun 26, 2015 at 12:31 PM, Andy Lutomirski wrote:
> On Fri, Jun 26, 2015 at 11:09 AM, Mathieu Desnoyers
> wrote:
>> - On Jun 24, 2015, at 6:26 PM, Paul Turner p...@google.com wrote:
>>
>>> Implements the x86 (i386 & x86-64) ABIs for interrupting and
On Thu, Jun 25, 2015 at 6:15 PM, Mathieu Desnoyers
wrote:
> - On Jun 24, 2015, at 10:54 PM, Paul Turner p...@google.com wrote:
>
>> On Wed, Jun 24, 2015 at 5:07 PM, Andy Lutomirski wrote:
>>> On Wed, Jun 24, 2015 at 3:26 PM, Paul Turner wrote:
>>>
On Wed, Jun 24, 2015 at 5:07 PM, Andy Lutomirski wrote:
> On Wed, Jun 24, 2015 at 3:26 PM, Paul Turner wrote:
>> This is a fairly small series demonstrating a feature we've found to be quite
>> powerful in practice, "restartable sequences".
>>
>
> On
tested in isolation.
Signed-off-by: Paul Turner
---
 arch/Kconfig                     |  7 +
 arch/x86/Kconfig                 |  1
 arch/x86/syscalls/syscall_64.tbl |  1
 fs/exec.c                        |  1
 include/linux/sched.h            | 28 ++
 include/uapi/asm-generi
t we always want
the arguments to be available for sequence restart, it's much more natural to
ultimately differentiate the ABI in these two cases.
Signed-off-by: Paul Turner
---
arch/x86/include/asm/restartable_sequences.h | 50 +++
arch/x86/kernel
"basic_percpu_ops_test" is a slightly more "realistic" variant, implementing a
few simple per-cpu operations and testing their correctness. It also includes
a trivial example of how user-space may multiplex the critical section via the
restart handler.
Signed-off-by: Paul Turner
---
tools/t
This is a fairly small series demonstrating a feature we've found to be quite
powerful in practice, "restartable sequences".
Most simply: these sequences comprise small snippets of user-code that are
guaranteed to be (effectively) executed serially, with support for restart (or
other handling) in
actually something we've
strongly considered dropping. The complexity of correct TLS
addressing is non-trivial.
>
> This approach is inspired by Paul Turner and Andrew Hunter's work
> on percpu atomics, which lets the kernel handle restart of critical
> sections, ref.
On Thu, May 21, 2015 at 12:08 PM, Mathieu Desnoyers
wrote:
> - Original Message -
>> On Thu, May 21, 2015 at 10:44:47AM -0400, Mathieu Desnoyers wrote:
>>
>> > +struct thread_percpu_user {
>> > + int32_t nesting;
>> > + int32_t signal_sent;
>> > + int32_t signo;
>> > + int32_t curr
On Thu, Sep 4, 2014 at 2:30 PM, John Stultz wrote:
> On Thu, Sep 4, 2014 at 2:17 PM, Andrew Hunter wrote:
>> On Wed, Sep 3, 2014 at 5:06 PM, John Stultz wrote:
>>> Maybe with the next version of the patch, before you get into
>>> unwinding the math, you might practically describe what is bro
> for (size_t i = 0; i < 10; ++i) {
> struct itimerval prev;
> setitimer(ITIMER_PROF, &zero, &prev);
> /* on old kernels, this goes up by TICK_USEC every iteration */
> printf("previous value: %ld %ld %ld %ld\n",
>prev.it_interval.tv_sec, prev.it_inter
On Tue, Aug 26, 2014 at 4:11 PM, Jason Low wrote:
> Based on perf profiles, the update_cfs_rq_blocked_load function constantly
> shows up as taking up a noticeable % of system run time. This is especially
> apparent on larger numa systems.
>
> Much of the contention is in __update_cfs_rq_tg_load_c
On Wed, Jan 22, 2014 at 9:53 AM, wrote:
> Vincent Guittot writes:
>
>> This reverts commit 282cf499f03ec1754b6c8c945c9674b02631fb0f.
>>
>> With the current implementation, the load average statistics of a sched
>> entity
>> change according to other activity on the CPU even if this activity is
On Tue, Jan 21, 2014 at 12:00 PM, Vincent Guittot
wrote:
>
> On Jan 21, 2014 at 7:39 PM, wrote:
>
>
>>
>> Vincent Guittot writes:
>>
>> > With the current implementation, the load average statistics of a sched
>> > entity
>> > change according to other activity on the CPU even if this activity is
On Mon, Dec 9, 2013 at 5:04 PM, Alex Shi wrote:
> On 12/03/2013 06:26 PM, Peter Zijlstra wrote:
>>
>> Paul, can you guys have a look at this, last time around you have a
>> regression with this stuff, so it would be good to hear from you.
>>
>
> Ping Paul.
>
Ben was looking at this right before t
> void (*task_dead) (struct task_struct *p);
>
> void (*switched_from) (struct rq *this_rq, struct task_struct *task);
> void (*switched_to) (struct rq *this_rq, struct task_struct *task);
Reviewed-by: Paul Turner
> --
> 1.7.9.5
>
On Tue, Nov 12, 2013 at 8:29 AM, Wang, Xiaoming wrote:
> cfs_rq gets its group run queue, but the value of
> cfs_rq->nr_running may be zero, which will cause
> a panic in pick_next_task_fair.
> So the evaluation of cfs_rq->nr_running is needed.
>
> [15729.985797] BUG: unable to handle kernel NULL po
Commit-ID: 0ac9b1c21874d2490331233b3242085f8151e166
Gitweb: http://git.kernel.org/tip/0ac9b1c21874d2490331233b3242085f8151e166
Author: Paul Turner
AuthorDate: Wed, 16 Oct 2013 11:16:27 -0700
Committer: Ingo Molnar
CommitDate: Tue, 29 Oct 2013 12:02:23 +0100
sched: Guarantee new group
On Wed, Oct 16, 2013 at 3:01 PM, Peter Zijlstra wrote:
> On Wed, Oct 16, 2013 at 11:16:27AM -0700, Ben Segall wrote:
>> From: Paul Turner
>>
>> Currently, group entity load-weights are initialized to zero. This
>> admits some races with respect to the first time they a
On Mon, Sep 30, 2013 at 7:22 PM, Yuanhan Liu
wrote:
> On Mon, Sep 30, 2013 at 12:14:03PM +0400, Vladimir Davydov wrote:
>> On 09/29/2013 01:47 PM, Yuanhan Liu wrote:
>> >On Fri, Sep 20, 2013 at 06:46:59AM -0700, tip-bot for Vladimir Davydov
>> >wrote:
>> >>Commit-ID: 7e3115ef5149fc502e3a2e80719d
On Thu, Sep 26, 2013 at 4:16 AM, Peter Zijlstra wrote:
> On Thu, Sep 26, 2013 at 03:55:55AM -0700, Paul Turner wrote:
>> > + /*
>> > +* Don't bother with select_idle_sibling() in the case of
>> > a sync wakeup
>> > +
On Thu, Sep 26, 2013 at 2:58 AM, Peter Zijlstra wrote:
> On Wed, Sep 25, 2013 at 10:56:17AM +0200, Mike Galbraith wrote:
>> That will make pipe-test go fugly -> pretty, and help very fast/light
>> localhost network, but eat heavier localhost overlap recovery. We need
>> a working (and cheap) over
Once we've made it that
>
> if (!se) {
> - cfs_rq->h_load = rq->avg.load_avg_contrib;
> + cfs_rq->h_load = cfs_rq->runnable_load_avg;
Looks good.
Reviewed-by: Paul Turner
> cfs_rq->last_h_load_update = now;
>
On Mon, Aug 26, 2013 at 5:07 AM, Peter Zijlstra wrote:
> On Sat, Aug 24, 2013 at 03:33:59AM -0700, Paul Turner wrote:
>> On Mon, Aug 19, 2013 at 9:01 AM, Peter Zijlstra wrote:
>> > +++ b/kernel/sched/fair.c
>> > @@ -4977,7 +4977,7 @@ static struct rq *find_busiest_queu
On Mon, Aug 26, 2013 at 2:49 PM, Rik van Riel wrote:
> On 08/26/2013 08:09 AM, Peter Zijlstra wrote:
>> On Sat, Aug 24, 2013 at 03:45:57AM -0700, Paul Turner wrote:
>>>> @@ -5157,6 +5158,13 @@ cpu_attach_domain(struct sched_domain *s
>>>>
On Sun, Aug 25, 2013 at 7:56 PM, Lei Wen wrote:
> On Tue, Aug 20, 2013 at 12:01 AM, Peter Zijlstra wrote:
>> From: Joonsoo Kim
>>
>> There is no reason to maintain separate variables for this_group
>> and busiest_group in sd_lb_stat, except saving some space.
>> But this structure is always allo
BLING down in case of a
> +* degenerate parent; the spans match for this
> +* so the property transfers.
> +*/
> + if (parent->flags & SD_PREFER_SIBLING)
> +
load = wl;
max_load_power = power;
...
This would actually end up being a little more accurate even.
[ Alternatively without caching max_load_power we could compare wl *
power vs max_load * SCHED_POWER_SCALE. ]
Reviewed-by: Paul Turner
On Mon, Aug 19, 2013 at 9:01 AM, Peter Zijlstra wrote:
> We can shrink sg_lb_stats because rq::nr_running is an 'unsigned int'
> and cpu numbers are 'int'
>
> Before:
> sgs:/* size: 72, cachelines: 2, members: 10 */
> sds:/* size: 184, cachelines: 3, members: 7 */
>
> After:
>
> - if (env->idle == CPU_NEWLY_IDLE && sds.this_has_capacity &&
> -     !sds.busiest_has_capacity)
> + if (env->idle == CPU_NEWLY_IDLE && this->group_has_capacity &&
> +     !busiest->group_has_capacity)
>
On Thu, Aug 22, 2013 at 3:42 AM, Peter Zijlstra wrote:
> On Thu, Aug 22, 2013 at 02:58:27AM -0700, Paul Turner wrote:
>> On Mon, Aug 19, 2013 at 9:01 AM, Peter Zijlstra wrote:
>
>> > + if (local_group)
>> > load = target_loa
> - int balance = 1;
> + int should_balance = 1;
> struct rq *rq = cpu_rq(cpu);
> unsigned long interval;
> struct sched_domain *sd;
> @@ -5618,7 +5615,7 @@ static void rebalance_domains(int cpu, e
> }
>
> if (time_after_eq(jiffies, sd->last_balance + interval)) {
> - if (load_balance(cpu, rq, sd, idle, &balance)) {
> + if (load_balance(cpu, rq, sd, idle, &should_balance))
> {
> /*
> * The LBF_SOME_PINNED logic could have
> changed
> * env->dst_cpu, so we can't know our idle
> @@ -5641,7 +5638,7 @@ static void rebalance_domains(int cpu, e
> * CPU in our sched group which is doing load balancing more
> * actively.
> */
> - if (!balance)
> + if (!should_balance)
> break;
> }
> rcu_read_unlock();
>
>
Reviewed-by: Paul Turner
On Mon, Aug 19, 2013 at 9:00 AM, Peter Zijlstra wrote:
> From: Joonsoo Kim
>
> Remove one division operation in find_busiest_queue() by using
> crosswise multiplication:
>
> wl_i / power_i > wl_j / power_j :=
> wl_i * power_j > wl_j * power_i
>
> Signed-off-by: Joonsoo Kim
> [pet
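A minimal illustration of the identity above (names illustrative): for
positive power values the comparison is exact, and the multiplication also
avoids integer-division truncation.

	#include <stdbool.h>

	static inline bool busier(unsigned long long wl_i,
				  unsigned long long power_i,
				  unsigned long long wl_j,
				  unsigned long long power_j)
	{
		/* wl_i / power_i > wl_j / power_j
		 *	<=>  wl_i * power_j > wl_j * power_i */
		return wl_i * power_j > wl_j * power_i;
	}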
On Thu, Aug 15, 2013 at 10:39 AM, Peter Zijlstra wrote:
> On Tue, Aug 13, 2013 at 01:08:17AM -0700, Paul Turner wrote:
>> On Tue, Aug 13, 2013 at 12:38 AM, Peter Zijlstra
>> wrote:
>> > On Tue, Aug 13, 2013 at 12:45:12PM +0800, Lei Wen wrote:
>> >> > No
On Tue, Aug 13, 2013 at 1:18 AM, Lei Wen wrote:
> Hi Paul,
>
> On Tue, Aug 13, 2013 at 4:08 PM, Paul Turner wrote:
>> On Tue, Aug 13, 2013 at 12:38 AM, Peter Zijlstra
>> wrote:
>>> On Tue, Aug 13, 2013 at 12:45:12PM +0800, Lei Wen wrote:
>>>>
On Tue, Aug 13, 2013 at 12:38 AM, Peter Zijlstra wrote:
> On Tue, Aug 13, 2013 at 12:45:12PM +0800, Lei Wen wrote:
>> > Not quite right; I think you need busiest->cfs.h_nr_running.
>> > cfs.nr_running is the number of entries running in this 'group'. If
>> > you've got nested groups like:
>> >
>>
We attached the following explanatory comment to our version of the patch:
/*
* In the common case (two user threads sharing mm
* switching) the bit will be set; avoid doing a write
* (via atomic test & set) unless we have to. This is
* safe, because no other CPU ever writes to our bit
* in the m
cpu, mm_cpumask(next));
> load_cr3(next->pgd);
> load_LDT_nolock(&next->context);
> }
We're carrying the *exact* same patch for *exact* same reason. I've
been meaning to send it out but wasn't sure of a goo
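A hedged sketch of the pattern both patches apparently carry (kernel
context assumed; fragment only): test the bit first and fall back to the
atomic set only when it is clear, so the common case does no cross-CPU
write.

	if (!cpumask_test_cpu(cpu, mm_cpumask(next)))
		cpumask_set_cpu(cpu, mm_cpumask(next));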
On Fri, Jul 26, 2013 at 2:50 PM, Peter Zijlstra wrote:
> On Fri, Jul 26, 2013 at 02:24:50PM -0700, Paul Turner wrote:
>> On Fri, Jul 26, 2013 at 2:03 PM, Peter Zijlstra wrote:
>> >
>> >
>> > OK, so I have the below; however on a second look, Paul, shouldn
augmenting
> periodic update that was supposed to account for this; resulting in a
> potential loss of fairness.
>
> To fix this, re-introduce the explicit update in
> update_cfs_rq_blocked_load() [called via entity_tick()].
>
> Cc: sta...@kernel.org
> Reviewed-by: Paul Tur