Re: [PATCH] perf: fix ring_buffer perf_output_space() boundary calculation

2013-03-18 Thread Peter Zijlstra
On Mon, 2013-03-18 at 13:48 +0100, Stephane Eranian wrote: > if (!rb->writable) > - return true; > + return false; writable means user writable (VM_WRITE); the difference is that a !VM_WRITE buffer will simply over-write its own tail whereas a VM_WRITE buffer w

Re: [PATCH] perf: fix ring_buffer perf_output_space() boundary calculation

2013-03-18 Thread Peter Zijlstra
On Mon, 2013-03-18 at 14:03 +0100, Stephane Eranian wrote: > On Mon, Mar 18, 2013 at 1:59 PM, Peter Zijlstra wrote: > > On Mon, 2013-03-18 at 13:48 +0100, Stephane Eranian wrote: > >> if (!rb->writable) > >> - return true; >

Re: [PATCH] scheduler: convert BUG_ON()s in try_to_wake_up_local() to WARN_ON_ONCE()s

2013-03-19 Thread Peter Zijlstra
On Mon, 2013-03-18 at 12:22 -0700, Tejun Heo wrote: > try_to_wake_up_local() should only be invoked to wake up another task > in the same runqueue and BUG_ON()s are used to enforce the rule. > Missing try_to_wake_up_local() can stall workqueue execution but such > stalls are likely to be finite eit

Re: [PATCHv3] perf: Fix vmalloc ring buffer free function

2013-03-19 Thread Peter Zijlstra
> are you going to include that, or should I repost it? Ah, please repost (and prettify) it, I'm still very limited in the amount of work that I can do :/ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordo

Re: [PATCH V3 1/7] sched: Create sched_select_cpu() to give preferred CPU for power saving

2013-03-19 Thread Peter Zijlstra
On Mon, 2013-03-18 at 20:53 +0530, Viresh Kumar wrote: > +/* > + * This routine returns the nearest non-idle cpu. It accepts a > bitwise OR of > + * SD_* flags present in linux/sched.h. If the local CPU isn't idle, > it is > + * returned back. If it is idle, then we must look for another CPU > whic

Re: [PATCH v2] perf: fix ring_buffer perf_output_space() boundary calculation

2013-03-19 Thread Peter Zijlstra
ecord can be saved and it will be gracefully > handled > by upper code layers. > > In v2, we also make the logic for the writable more explicit by > renaming it to rb->overwrite because it tells whether or not the > buffer can overwrite its tail (suggested by PeterZ). > > S

Re: [PATCH] perf,x86: fix uninitialized pt_regs in intel_pmu_drain_bts_buffer()

2013-03-19 Thread Peter Zijlstra
On Mon, 2013-03-18 at 14:46 +0100, Stephane Eranian wrote: > > This patch fixes an uninitialized pt_regs struct in drain BTS > function. The pt_regs struct is propagated all the way to the > code_get_segment() function from perf_instruction_pointer() > and may get garbage. > > We cannot simply in

Re: [PATCH] perf,x86: fix uninitialized pt_regs in intel_pmu_drain_bts_buffer()

2013-03-19 Thread Peter Zijlstra
On Tue, 2013-03-19 at 13:50 +0100, Stephane Eranian wrote: > > Should we not replace: > > > > regs.ip = 0; > > > > with that memset? It avoids the memset work in a few cases and > removes > > the then superfluous clearing of the IP field. > > > We could drop it because it's covered by t

Re: [PATCH 1/8] sched: change position of resched_cpu() in load_balance()

2013-03-19 Thread Peter Zijlstra
oonsoo Kim Acked-by: Peter Zijlstra

Re: [PATCH 2/8] sched: explicitly cpu_idle_type checking in rebalance_domains()

2013-03-19 Thread Peter Zijlstra
On Thu, 2013-02-14 at 14:48 +0900, Joonsoo Kim wrote: > After commit 88b8dac0, dst-cpu can be changed in load_balance(), > then we can't know cpu_idle_type of dst-cpu when load_balance() > return positive. So, add explicit cpu_idle_type checking. No real objection I suppose, but did you actually s

Re: [PATCH 3/8] sched: don't consider other cpus in our group in case of NEWLY_IDLE

2013-03-19 Thread Peter Zijlstra
't > consider other cpus. Assigning to 'this_rq->idle_stamp' is now valid. > > Cc: Srivatsa Vaddagiri > Signed-off-by: Joonsoo Kim Fair enough, good catch. Acked-by: Peter Zijlstra > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index 0c6aaf

Re: [PATCH 4/8] sched: clean up move_task() and move_one_task()

2013-03-19 Thread Peter Zijlstra
On Thu, 2013-02-14 at 14:48 +0900, Joonsoo Kim wrote: > Some validation for task moving is performed in move_tasks() and > move_one_task(). We can move these code to can_migrate_task() > which is already exist for this purpose. > @@ -4011,18 +4027,7 @@ static int move_tasks(struct lb_env *env) >

Re: [PATCH 3/9] perf util: Get rid of write_or_die() from trace-event-info.c

2013-03-19 Thread Peter Zijlstra
On Tue, 2013-03-19 at 10:35 -0400, Steven Rostedt wrote: > What about: > int err = 0; > > err += tracing_data_header(); > err += read_header_files(); > [...] > > if (err < 0) { > free(tdata); > tdata = NULL; > } > >

Re: [PATCH 6/8] sched: rename load_balance_tmpmask to load_balance_cpu_active

2013-03-19 Thread Peter Zijlstra
On Thu, 2013-02-14 at 14:48 +0900, Joonsoo Kim wrote: > This name doesn't represent specific meaning. > So rename it to imply it's purpose. > > Signed-off-by: Joonsoo Kim > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > index 26058d0..e6f8783 100644 > --- a/kernel/sched/core.c > +++

Re: [PATCH 3/9] perf util: Get rid of write_or_die() from trace-event-info.c

2013-03-19 Thread Peter Zijlstra
On Tue, 2013-03-19 at 10:59 -0400, Steven Rostedt wrote: > On Tue, 2013-03-19 at 15:49 +0100, Peter Zijlstra wrote: > > On Tue, 2013-03-19 at 10:35 -0400, Steven Rostedt wrote: > > > What about: > > > int err = 0; > > > > > > er

Re: [PATCH 7/8] sched: prevent to re-select dst-cpu in load_balance()

2013-03-19 Thread Peter Zijlstra
On Thu, 2013-02-14 at 14:48 +0900, Joonsoo Kim wrote: > Commit 88b8dac0 makes load_balance() consider other cpus in its group. > But, in that, there is no code for preventing to re-select dst-cpu. > So, same dst-cpu can be selected over and over. > > This patch add functionality to load_balance()

Re: [PATCH 8/8] sched: reset lb_env when redo in load_balance()

2013-03-19 Thread Peter Zijlstra
On Thu, 2013-02-14 at 14:48 +0900, Joonsoo Kim wrote: > Commit 88b8dac0 makes load_balance() consider other cpus in its group. > So, now, When we redo in load_balance(), we should reset some fields of > lb_env to ensure that load_balance() works for initial cpu, not for other > cpus in its group. S

Re: [PATCH 4/8] sched: clean up move_task() and move_one_task()

2013-03-20 Thread Peter Zijlstra
On Wed, 2013-03-20 at 16:33 +0900, Joonsoo Kim wrote: > > Right, so I'm not so taken with this one. The whole load stuff really > > is a balance heuristic that's part of move_tasks(), move_one_task() > > really doesn't care about that. > > > > So why did you include it? Purely so you didn't have

Re: [PATCH 4/8] sched: clean up move_task() and move_one_task()

2013-03-20 Thread Peter Zijlstra
On Wed, 2013-03-20 at 16:33 +0900, Joonsoo Kim wrote: > > Right, so I'm not so taken with this one. The whole load stuff really > > is a balance heuristic that's part of move_tasks(), move_one_task() > > really doesn't care about that. > > > > So why did you include it? Purely so you didn't have

Re: [PATCH 4/8] sched: clean up move_task() and move_one_task()

2013-03-20 Thread Peter Zijlstra
On Wed, 2013-03-20 at 12:16 +0100, Peter Zijlstra wrote: > > If your recommandation is to move up can_mirate_task() above > > load evaluation code, yes, I can, and will do that. :) > > I would actually propose ... to move the throttled test into can_migrate_task(). (damn

Re: [PATCH 7/8] sched: prevent to re-select dst-cpu in load_balance()

2013-03-20 Thread Peter Zijlstra
On Wed, 2013-03-20 at 16:43 +0900, Joonsoo Kim wrote: > On Tue, Mar 19, 2013 at 04:05:46PM +0100, Peter Zijlstra wrote: > > On Thu, 2013-02-14 at 14:48 +0900, Joonsoo Kim wrote: > > > Commit 88b8dac0 makes load_balance() consider other cpus in its group. > > > But, in

Re: [patch v4 01/18] sched: set SD_PREFER_SIBLING on MC domain to reduce a domain level

2013-02-12 Thread Peter Zijlstra
On Thu, 2013-01-24 at 11:06 +0800, Alex Shi wrote: > The domain flag SD_PREFER_SIBLING was set both on MC and CPU domain at > frist commit b5d978e0c7e79a, and was removed uncarefully when clear up > obsolete power scheduler. Then commit 6956dc568 recover the flag on CPU > domain only. It works, but

Re: [patch v4 02/18] sched: select_task_rq_fair clean up

2013-02-12 Thread Peter Zijlstra
On Thu, 2013-01-24 at 11:06 +0800, Alex Shi wrote: > It is impossible to miss a task allowed cpu in a eligible group. I suppose your reasoning goes like: tsk->cpus_allowed is protected by ->pi_lock, we hold this, therefore it cannot change and find_idlest_group() dtrt? We can then state that this

Re: [patch v4 03/18] sched: fix find_idlest_group mess logical

2013-02-12 Thread Peter Zijlstra
On Thu, 2013-01-24 at 11:06 +0800, Alex Shi wrote: > There is 4 situations in the function: > 1, no task allowed group; > so min_load = ULONG_MAX, this_load = 0, idlest = NULL > 2, only local group task allowed; > so min_load = ULONG_MAX, this_load assigned, idlest = NULL > 3, only non-

Re: [patch v4 05/18] sched: quicker balancing on fork/exec/wake

2013-02-12 Thread Peter Zijlstra
On Thu, 2013-01-24 at 11:06 +0800, Alex Shi wrote: > Guess the search cpu from bottom to up in domain tree come from > commit 3dbd5342074a1e sched: multilevel sbe sbf, the purpose is > balancing over tasks on all level domains. > > This balancing cost too much if there has many domain/groups in a

Re: [patch v4 06/18] sched: give initial value for runnable avg of sched entities.

2013-02-12 Thread Peter Zijlstra
On Thu, 2013-01-24 at 11:06 +0800, Alex Shi wrote: > We need initialize the se.avg.{decay_count, load_avg_contrib} to zero > after a new task forked. > Otherwise random values of above variables cause mess when do new task > enqueue: > enqueue_task_fair > enqueue_entity > en

Re: [patch v4 07/18] sched: set initial load avg of new forked task

2013-02-12 Thread Peter Zijlstra
On Thu, 2013-01-24 at 11:06 +0800, Alex Shi wrote: > + /* > +* set the initial load avg of new task same as its load > +* in order to avoid brust fork make few cpu too heavier > +*/ > + if (flags & ENQUEUE_NEWTASK) > + se->avg.load_avg_contrib = se-

Re: [patch v4 08/18] Revert "sched: Introduce temporary FAIR_GROUP_SCHED dependency for load-tracking"

2013-02-12 Thread Peter Zijlstra
On Thu, 2013-01-24 at 11:06 +0800, Alex Shi wrote: > Remove CONFIG_FAIR_GROUP_SCHED that covers the runnable info, then > we can use runnable load variables. > It would be nice if we could quantify the performance hit of doing so. Haven't yet looked at later patches to see if we remove anything to

Re: [patch v4 09/18] sched: add sched_policies in kernel

2013-02-12 Thread Peter Zijlstra
On Thu, 2013-01-24 at 11:06 +0800, Alex Shi wrote: > Current scheduler behavior is just consider the for larger performance > of system. So it try to spread tasks on more cpu sockets and cpu cores > > To adding the consideration of power awareness, the patchset adds > 2 kinds of scheduler policy:

Re: [patch v4 11/18] sched: log the cpu utilization at rq

2013-02-12 Thread Peter Zijlstra
On Thu, 2013-01-24 at 11:06 +0800, Alex Shi wrote: > > The cpu's utilization is to measure how busy is the cpu. > util = cpu_rq(cpu)->avg.runnable_avg_sum > / cpu_rq(cpu)->avg.runnable_avg_period; > > Since the util is no more than 1, we use its percentage value in later >

Re: [RFC] perf: need to expose sched_clock to correlate user samples with kernel samples

2013-04-02 Thread Peter Zijlstra
On Mon, 2013-04-01 at 11:29 -0700, John Stultz wrote: > I'm still not sold on the CLOCK_PERF posix clock. The semantics are > still too hand-wavy and implementation specific. How about we define the semantics as: match whatever comes out of perf (and preferably ftrace by default) stuff? Since th

Re: [PATCH 2/5] sched: factor out code to should_we_balance()

2013-04-02 Thread Peter Zijlstra
On Thu, 2013-03-28 at 16:58 +0900, Joonsoo Kim wrote: > Now checking that this cpu is appropriate to balance is embedded into > update_sg_lb_stats() and this checking has no direct relationship to > this > function. > > There is not enough reason to place this checking at > update_sg_lb_stats(), >

Re: [PATCH 2/5] sched: factor out code to should_we_balance()

2013-04-02 Thread Peter Zijlstra
On Tue, 2013-04-02 at 18:50 +0900, Joonsoo Kim wrote: > > It seems that there is some misunderstanding about this patch. > In this patch, we don't iterate all groups. Instead, we iterate on > cpus of local sched_group only. So there is no penalty you mentioned. OK, I'll go stare at it again.. --

Re: [PATCH 2/5] sched: factor out code to should_we_balance()

2013-04-02 Thread Peter Zijlstra
On Tue, 2013-04-02 at 12:00 +0200, Peter Zijlstra wrote: > On Tue, 2013-04-02 at 18:50 +0900, Joonsoo Kim wrote: > > > > It seems that there is some misunderstanding about this patch. > > In this patch, we don't iterate all groups. Instead, we iterate on > > cp

Re: [PATCH v2 2/3] mutex: add support for reservation style locks, v2

2013-04-02 Thread Peter Zijlstra
On Thu, 2013-02-28 at 11:25 +0100, Maarten Lankhorst wrote: > +mutex_reserve_lock_slow and mutex_reserve_lock_intr_slow: > + Similar to mutex_reserve_lock, except it won't backoff with > -EAGAIN. > + This is useful when mutex_reserve_lock failed with -EAGAIN, and you > + unreserved all reservati

Re: [PATCH v2 2/3] mutex: add support for reservation style locks, v2

2013-04-02 Thread Peter Zijlstra
On Thu, 2013-02-28 at 11:25 +0100, Maarten Lankhorst wrote: > +struct ticket_mutex { > + struct mutex base; > + atomic_long_t reservation_id; > +}; I'm not sure this is a good name, esp. due to the potential confusion vs the ticket spinlocks we have.

Re: [PATCH v2 2/3] mutex: add support for reservation style locks, v2

2013-04-02 Thread Peter Zijlstra
On Thu, 2013-02-28 at 11:25 +0100, Maarten Lankhorst wrote: > +Reservation type mutexes > +struct ticket_mutex { > +extern int __must_check _mutex_reserve_lock(struct ticket_mutex *lock, That's two different names and two different forms of one (for a total of 3 variants) for the same scheme. F

Re: [PATCH v2 2/3] mutex: add support for reservation style locks, v2

2013-04-02 Thread Peter Zijlstra
On Thu, 2013-02-28 at 11:25 +0100, Maarten Lankhorst wrote: > +The algorithm that TTM came up with for dealing with this problem is > +quite simple. For each group of buffers (execbuf) that need to be > +locked, the caller would be assigned a unique reservation_id, from a > +global counter. In ca

Re: [PATCH v2 2/3] mutex: add support for reservation style locks, v2

2013-04-02 Thread Peter Zijlstra
On Tue, 2013-04-02 at 16:57 +0200, Maarten Lankhorst wrote: > Hey, > > Thanks for reviewing. Only partway through so far :-) > Op 02-04-13 13:00, Peter Zijlstra schreef: > > On Thu, 2013-02-28 at 11:25 +0100, Maarten Lankhorst wrote: > >> +Reservation type mutexe

Re: [PATCH v2 2/3] mutex: add support for reservation style locks, v2

2013-04-04 Thread Peter Zijlstra
I'm sorry, this email ended up quite a bit longer than I had hoped for; please bear with me. On Tue, 2013-04-02 at 18:59 +0200, Peter Zijlstra wrote: > struct ww_mutex; /* wound/wait */ > > int mutex_wound_lock(struct ww_mutex *); /* returns -EDEADLK */ > int mutex

Re: [PATCH 4/6] perf: Fix hw breakpoints overflow period sampling

2013-04-04 Thread Peter Zijlstra
On Sun, 2013-03-10 at 19:41 +0100, Jiri Olsa wrote: > The hw breakpoint pmu 'add' function is missing the > period_left update needed for SW events. > > The perf HW breakpoint events use SW events framework > for to process the overflow, so it needs to be properly > initialized during PMU 'add' me

Re: [RFC] Add implicit barriers to irqsave/restore class of functions

2013-04-04 Thread Peter Zijlstra
On Wed, 2013-04-03 at 15:10 +0200, Christian Ruppert wrote: > This patch adds implicit memory barriers to irqsave/restore functions > of > the ARC architecture port in line with what is done in other > architectures. > diff --git a/arch/arc/include/asm/irqflags.h > b/arch/arc/include/asm/irqflags.

Re: [PATCH v2 2/3] mutex: add support for reservation style locks, v2

2013-04-04 Thread Peter Zijlstra
On Thu, 2013-04-04 at 15:31 +0200, Daniel Vetter wrote: > We do add some form of owner tracking by storing the reservation > ticket > of the current holder into every ww_mutex. So when task-Y in your > above > example tries to acquire lock A it notices that it's behind in the > global > queue and i

Re: [PATCH v2 2/3] mutex: add support for reservation style locks, v2

2013-04-04 Thread Peter Zijlstra
On Thu, 2013-04-04 at 15:31 +0200, Daniel Vetter wrote: > Hm, I guess your aim with the TASK_DEADLOCK wakeup is to bound the > wait > times of older task. No, imagine the following: struct ww_mutex A, B; struct mutex C; task-O task-Y task-X A B

Re: [PATCH v2 2/3] mutex: add support for reservation style locks, v2

2013-04-04 Thread Peter Zijlstra
On Thu, 2013-04-04 at 15:31 +0200, Daniel Vetter wrote: > The trick with the current code is that the oldest task > will never see an -EAGAIN ever and hence is guaranteed to make forward > progress. If the task is really unlucky though it might be forced to > wait > for a younger task for every ww_

Re: [PATCH v2 2/3] mutex: add support for reservation style locks, v2

2013-04-04 Thread Peter Zijlstra
On Thu, 2013-04-04 at 15:31 +0200, Daniel Vetter wrote: > The thing is now that you're not expected to hold these locks for a > long > time - if you need to synchronously stall while holding a lock > performance > will go down the gutters anyway. And since most current > gpus/co-processors > still

Re: [PATCH v2 2/3] mutex: add support for reservation style locks, v2

2013-04-04 Thread Peter Zijlstra
On Thu, 2013-04-04 at 15:31 +0200, Daniel Vetter wrote: > Another big reason for having a start/end marker like you've describe > is > lockdep support. Yeah, I saw how you did that.. but there's other ways of making it work too, you could for instance create a new validation state for this type of

Re: [PATCH v2 2/3] mutex: add support for reservation style locks, v2

2013-04-04 Thread Peter Zijlstra
On Thu, 2013-04-04 at 15:31 +0200, Daniel Vetter wrote: > I'm a bit confused about the different classes you're talking about. > Since > the ticket queue is currently a global counter there's only one class > of > ww_mutexes. Right, so that's not something that's going to fly.. we need to support

Re: [PATCH v2 2/3] mutex: add support for reservation style locks, v2

2013-04-04 Thread Peter Zijlstra
On Thu, 2013-04-04 at 15:31 +0200, Daniel Vetter wrote: > We've discussed this approach of using (rt-prio, age) instead of just > age > to determine the the "oldness" of a task for deadlock-breaking with > -EAGAIN. The problem is that through PI boosting or normal rt-prio > changes > while tasks ar

Re: [PATCH v2 2/3] mutex: add support for reservation style locks, v2

2013-04-04 Thread Peter Zijlstra
On Thu, 2013-04-04 at 15:31 +0200, Daniel Vetter wrote: > Well, it was a good read and I'm rather happy that we agree on the > ww_ctx > thing (whatever it's called in the end), even though we have slightly > different reasons for it. Yeah, I tried various weirdness to get out from under it, but th

[PATCH] sched: Fix 32bit race in sched_clock_remote()

2013-04-05 Thread Peter Zijlstra
value would trigger a retry. Except we don't validate the new value 'val' in the same way! Thus we can propagate non-atomic read errors into the clock value. Cc: Ingo Molnar Cc: Steven Rostedt Debugged-by: Thomas Gleixner Signed-off-by: Peter Zijlstra --- kernel/sch

Re: [RFC] perf: need to expose sched_clock to correlate user samples with kernel samples

2013-02-25 Thread Peter Zijlstra
On Fri, 2013-02-22 at 22:04 -0800, John Stultz wrote: > On 02/20/2013 02:29 AM, Peter Zijlstra wrote: > > On Tue, 2013-02-19 at 10:25 -0800, John Stultz wrote: > >> So describe how the perf time domain is different then > >> CLOCK_MONOTONIC_RAW. > > The primary d

Re: [PATCH] Fix rq->lock vs logbuf_lock unlock race

2013-02-20 Thread Peter Zijlstra
On Mon, 2013-02-18 at 12:53 +, Bu, Yitian wrote: > This patch is for kernel V3.7.9 > > From 8796f4a2175a323aaa49ea8dd0fe68678dd5dccd Mon Sep 17 00:00:00 2001 > From: ybu > Date: Mon, 18 Feb 2013 19:52:01 +0800 > Subject: [PATCH] Fix rq->lock vs logbuf_lock unlock race > > fix up the fallout

Re: [PATCH] sched: Skip looking at skip if next or last is set

2013-02-20 Thread Peter Zijlstra
On Mon, 2013-02-18 at 18:31 +0530, Srikar Dronamraju wrote: > pick_next_entity() prefers next, then last. However code checks if the > left entity can be skipped even if next / last is set. > > Check if left entity should be skipped only if next/last is not set. You fail to explain why its a prob

Re: [patch v5 06/15] sched: log the cpu utilization at rq

2013-02-20 Thread Peter Zijlstra
On Mon, 2013-02-18 at 13:07 +0800, Alex Shi wrote: > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index fcdb21f..b9a34ab 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -1495,8 +1495,12 @@ static void update_cfs_rq_blocked_load(struct cfs_rq > *cfs_rq, int force_up

Re: [patch v5 07/15] sched: add new sg/sd_lb_stats fields for incoming fork/exec/wake balancing

2013-02-20 Thread Peter Zijlstra
On Mon, 2013-02-18 at 13:07 +0800, Alex Shi wrote: > @@ -4214,6 +4214,11 @@ struct sd_lb_stats { > unsigned int busiest_group_weight; > > int group_imb; /* Is there imbalance in this sd */ > + > + /* Varibles of power awaring scheduling */ > + unsigned int sd_utils;

Re: [patch v5 09/15] sched: add power aware scheduling in fork/exec/wake

2013-02-20 Thread Peter Zijlstra
On Mon, 2013-02-18 at 13:07 +0800, Alex Shi wrote: > +/* > + * Try to collect the task running number and capacity of the group. > + */ > +static void get_sg_power_stats(struct sched_group *group, > + struct sched_domain *sd, struct sg_lb_stats *sgs) > +{ > + int i; > + > + for_ea

Re: [patch v5 11/15] sched: add power/performance balance allow flag

2013-02-20 Thread Peter Zijlstra
On Mon, 2013-02-18 at 13:07 +0800, Alex Shi wrote: > @@ -4053,6 +4053,8 @@ struct lb_env { > unsigned intloop; > unsigned intloop_break; > unsigned intloop_max; > + int power_lb; /* if power balance needed > */ >

Re: [PATCH] Fix rq->lock vs logbuf_lock unlock race

2013-02-20 Thread Peter Zijlstra
On Wed, 2013-02-20 at 09:38 +, Bu, Yitian wrote: > > 2. from printk comment: "This is printk(). It can be called from any > context. > We want it to work. ". I suppose to use printk in any context. Unfortunately that's not quite possible, rq->lock is really out of bounds. At one point I trie

Re: [RFC] perf: need to expose sched_clock to correlate user samples with kernel samples

2013-02-20 Thread Peter Zijlstra
On Tue, 2013-02-19 at 10:25 -0800, John Stultz wrote: > So describe how the perf time domain is different then > CLOCK_MONOTONIC_RAW. The primary difference is that the trace/sched/perf time domain is not strictly monotonic, it is only locally monotonic -- that is two time stamps taken on the same

Re: [RFC PATCH v3 1/3] sched: schedule balance map foundation

2013-02-20 Thread Peter Zijlstra
On Tue, 2013-01-29 at 17:09 +0800, Michael Wang wrote: > + for_each_possible_cpu(cpu) { > + sbm = &per_cpu(sbm_array, cpu); > + node = cpu_to_node(cpu); > + size = sizeof(struct sched_domain *) * sbm_max_level; > + > + for (type = 0; typ

Re: [RFC PATCH v3 1/3] sched: schedule balance map foundation

2013-02-20 Thread Peter Zijlstra
On Tue, 2013-01-29 at 17:09 +0800, Michael Wang wrote: > +struct sched_balance_map { > + struct sched_domain **sd[SBM_MAX_TYPE]; > + int top_level[SBM_MAX_TYPE]; > + struct sched_domain *affine_map[NR_CPUS]; > +}; Argh.. affine_map is O(n^2) in nr_cpus, that's not cool.

Re: [RFC PATCH v3 0/3] sched: simplify the select_task_rq_fair()

2013-02-20 Thread Peter Zijlstra
On Wed, 2013-02-20 at 11:49 +0100, Ingo Molnar wrote: > The changes look clean and reasoable, I don't necessarily agree, note that O(n^2) storage requirement that Michael failed to highlight ;-) > any ideas exactly *why* it speeds up? That is indeed the most interesting part.. There's two part

Re: [patch v5 06/15] sched: log the cpu utilization at rq

2013-02-20 Thread Peter Zijlstra
On Wed, 2013-02-20 at 17:39 +0530, Preeti U Murthy wrote: > Hi, > > >> /* > >> * This is the main, per-CPU runqueue data structure. > >> * > >> @@ -481,6 +484,7 @@ struct rq { > >> #endif > >> > >>struct sched_avg avg; > >> + unsigned int util; > >> }; > >> > >> static inline int

Re: [patch v5 09/15] sched: add power aware scheduling in fork/exec/wake

2013-02-20 Thread Peter Zijlstra
On Wed, 2013-02-20 at 20:09 +0800, Alex Shi wrote: > On 02/20/2013 05:42 PM, Peter Zijlstra wrote: > > On Mon, 2013-02-18 at 13:07 +0800, Alex Shi wrote: > >> +/* > >> + * Try to collect the task running number and capacity of the group. > >> + */ > &g

Re: [patch v5 11/15] sched: add power/performance balance allow flag

2013-02-20 Thread Peter Zijlstra
On Wed, 2013-02-20 at 20:04 +0800, Alex Shi wrote: > >> @@ -5195,6 +5197,8 @@ static int load_balance(int this_cpu, struct rq > >> *this_rq, > >> .idle = idle, > >> .loop_break = sched_nr_migrate_break, > >> .cpus = cpus, > >>

Re: [patch v5 11/15] sched: add power/performance balance allow flag

2013-02-20 Thread Peter Zijlstra
On Wed, 2013-02-20 at 14:37 +0100, Peter Zijlstra wrote: > On Wed, 2013-02-20 at 20:04 +0800, Alex Shi wrote: > > > >> @@ -5195,6 +5197,8 @@ static int load_balance(int this_cpu, struct rq > > >> *this_rq, > > >> .idle

Re: [patch v5 06/15] sched: log the cpu utilization at rq

2013-02-20 Thread Peter Zijlstra
On Wed, 2013-02-20 at 22:33 +0800, Alex Shi wrote: > > There's generally a better value than 100 when using computers.. > seeing > > how 100 is 64+32+4. > > I didn't find a good example for this. and no idea of your suggestion, > would you like to explain a bit more? Basically what you're doing e

Re: [patch v5 06/15] sched: log the cpu utilization at rq

2013-02-20 Thread Peter Zijlstra
On Wed, 2013-02-20 at 22:33 +0800, Alex Shi wrote: > > You don't actually compute the rq utilization, you only compute the > > utilization as per the fair class, so if there's significant RT > activity > > it'll think the cpu is under-utilized, whihc I think will result in > the > > wrong thing. >

Re: [RFC PATCH v3 0/3] sched: simplify the select_task_rq_fair()

2013-02-21 Thread Peter Zijlstra
On Thu, 2013-02-21 at 12:51 +0800, Michael Wang wrote: > The old logical when locate affine_sd is: > > if prev_cpu != curr_cpu > if wake_affine() > prev_cpu = curr_cpu > new_cpu = select_idle_sibling(prev_cpu) > return new_cpu > > Th

Re: [PATCH v2 -tip] sched/rt: Fix locality of threaded interrupt handlers

2013-02-21 Thread Peter Zijlstra
On Wed, 2013-02-20 at 10:19 +0100, Alexander Gordeev wrote: > When a interrupt affinity mask targets multiple CPUs, the > RT scheduler selects a runqueue for RT task corresponding > to a threaded interrupt handler without consideration of > where the interrupt is actually gets delivered. It leads >

Re: [RFC PATCH v3 1/3] sched: schedule balance map foundation

2013-02-21 Thread Peter Zijlstra
On Thu, 2013-02-21 at 12:58 +0800, Michael Wang wrote: > > You are right, it cost space in order to accelerate the system, I've > calculated the cost once before (I'm really not good at this, please > let > me know if I make any silly calculation...), The exact size isn't that important, but its

Re: [patch v5 09/15] sched: add power aware scheduling in fork/exec/wake

2013-02-21 Thread Peter Zijlstra
On Wed, 2013-02-20 at 22:23 +0800, Alex Shi wrote: > > But but but,... nr_running is completely unrelated to utilization. > > > > Actually, I also hesitated on the name, how about using nr_running to > replace group_util directly? The name is a secondary issue, first you need to explain why you

Re: [RFC PATCH v3 0/3] sched: simplify the select_task_rq_fair()

2013-02-22 Thread Peter Zijlstra
On Fri, 2013-02-22 at 10:36 +0800, Michael Wang wrote: > According to my understanding, in the old world, wake_affine() will > only > be used if curr_cpu and prev_cpu share cache, which means they are in > one package, whatever search in llc sd of curr_cpu or prev_cpu, we > won't > have the chance

Re: [RFC PATCH v3 0/3] sched: simplify the select_task_rq_fair()

2013-02-22 Thread Peter Zijlstra
On Fri, 2013-02-22 at 10:37 +0800, Michael Wang wrote: > But that's really some benefit hardly to be estimate, especially when > the workload is heavy, the cost of wake_affine() is very high to > calculated se one by one, is that worth for some benefit we could not > promise? Look at something lik

Re: [PATCH 0/2] cpustat: use atomic operations to read/update stats

2013-02-22 Thread Peter Zijlstra
On Thu, 2013-02-21 at 21:56 -0800, Kevin Hilman wrote: > On 64-bit platforms, reads/writes of the various cpustat fields are > atomic due to native 64-bit loads/stores. However, on non 64-bit > platforms, reads/writes of the cpustat fields are not atomic and could > lead to inconsistent statistics

Re: [patch v5 09/15] sched: add power aware scheduling in fork/exec/wake

2013-02-22 Thread Peter Zijlstra
On Thu, 2013-02-21 at 22:40 +0800, Alex Shi wrote: > > The name is a secondary issue, first you need to explain why you > think > > nr_running is a useful metric at all. > > > > You can have a high nr_running and a low utilization (a burst of > > wakeups, each waking a process that'll instantly go

Re: [RFC PATCH v3 0/3] sched: simplify the select_task_rq_fair()

2013-02-22 Thread Peter Zijlstra
On Fri, 2013-02-22 at 17:10 +0800, Michael Wang wrote: > On 02/22/2013 04:21 PM, Peter Zijlstra wrote: > > On Fri, 2013-02-22 at 10:36 +0800, Michael Wang wrote: > >> According to my understanding, in the old world, wake_affine() will > >> only > >> be used

Re: [RFC PATCH v3 0/3] sched: simplify the select_task_rq_fair()

2013-02-22 Thread Peter Zijlstra
On Fri, 2013-02-22 at 17:11 +0800, Michael Wang wrote: > Ok, it do looks like wake_affine() lost it's value... I'm not sure we can say that on this one benchmark, there's a preemption advantage to running on a single cpu for pipe-test as well. We'd need to create a better benchmark to test this,

Re: [PATCH 0/2] cpustat: use atomic operations to read/update stats

2013-02-22 Thread Peter Zijlstra
On Fri, 2013-02-22 at 13:50 +0100, Frederic Weisbecker wrote: > atomic64_read() and atomic64_set() are supposed to take care of that, > without > even the need for _inc() or _add() parts that use LOCK. Are you sure? Generally atomic*_set() is not actually an atomic operation.

Re: [PATCH 1/2] cpustat: use accessor functions for get/set/add

2013-02-22 Thread Peter Zijlstra
On Fri, 2013-02-22 at 14:38 +0100, Frederic Weisbecker wrote: > Looks good, just a minor neat: > > On Thu, Feb 21, 2013 at 09:56:43PM -0800, Kevin Hilman wrote: > > diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h > > index 66b7078..df8ad75 100644 > > --- a/include/linux/kern

Re: [PATCH 0/2] cpustat: use atomic operations to read/update stats

2013-02-22 Thread Peter Zijlstra
On Fri, 2013-02-22 at 14:54 +0100, Ingo Molnar wrote: > * Peter Zijlstra wrote: > > > On Fri, 2013-02-22 at 13:50 +0100, Frederic Weisbecker wrote: > > > > atomic64_read() and atomic64_set() are supposed to take care > > > of that, without even the need for _

Re: [PATCH 0/2] cpustat: use atomic operations to read/update stats

2013-02-22 Thread Peter Zijlstra
On Fri, 2013-02-22 at 13:50 +0100, Frederic Weisbecker wrote: > > Which is a problem how? > > So here is a possible scenario, CPU 0 reads a kcpustat value, and CPU > 1 writes > it at the same time: > > //Initial value of "cpustat" is 0x > == CPU 0 == == CPU 1 == >

Re: [PATCH 0/2] cpustat: use atomic operations to read/update stats

2013-02-22 Thread Peter Zijlstra
On Fri, 2013-02-22 at 13:50 +0100, Frederic Weisbecker wrote: > > Argh!! at what cost? 64bit atomics are like expensive. Wouldn't > adding > > a seqlock be saner? > > Not sure. This requires a spinlock in the write side which is called > from > fast path like the timer interrupt. A single spinloc
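The seqlock alternative being discussed can be sketched as a minimal userspace seqcount. This is a single-writer sketch with illustrative names; the real kernel seqlock additionally inserts write/read memory barriers and pairs the count with a spinlock for multiple writers:

```c
#include <stdint.h>
#include <assert.h>

/* Sequence-counted 64-bit value: the writer bumps seq to odd before
 * updating and back to even after, so a reader that observes an odd or
 * changed seq retries instead of returning a torn value. */
struct seq_u64 {
    unsigned int seq;   /* odd while a write is in progress */
    uint64_t val;
};

void seq_write(struct seq_u64 *s, uint64_t v)
{
    s->seq++;           /* odd: readers entering now will retry */
    s->val = v;
    s->seq++;           /* even again: value is stable */
}

uint64_t seq_read(const struct seq_u64 *s)
{
    unsigned int start;
    uint64_t v;

    do {
        start = s->seq;
        v = s->val;
    } while (start != s->seq || (start & 1));
    return v;
}
```

The write side is two plain increments and a store, which is the cost argument in the thread: cheaper than a cmpxchg8b-based atomic64_set() on the 32-bit fast path.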

Re: [PATCH 0/2] cpustat: use atomic operations to read/update stats

2013-02-22 Thread Peter Zijlstra
On Fri, 2013-02-22 at 15:16 +0100, Ingo Molnar wrote: > > I checked arch/x86/include/asm/atomic64_32.h and we use > > cmpxchg8b for everything from _set() to _read(), which > > translates into 'horridly stupendifyingly slow' for a number > > of machines, but coherent. > > That's a valid concern

Re: [PATCHv3] perf: Fix vmalloc ring buffer free function

2013-03-12 Thread Peter Zijlstra
On Tue, 2013-03-12 at 11:53 +0100, Jiri Olsa wrote: > > @@ -316,7 +316,7 @@ void rb_free(struct ring_buffer *rb) > > struct page * > > perf_mmap_to_page(struct ring_buffer *rb, unsigned long pgoff) > > { > > - if (pgoff > (1UL << page_order(rb))) > > + if (pgoff > rb->nr_pages) > >
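The boundary in that hunk can be illustrated with a small sketch. The perf mmap layout (per the rb_alloc() quoted later in the thread) puts the user/control page at page offset 0 and the nr_pages data pages after it, so the valid offsets run from 0 to nr_pages inclusive; `pgoff_valid` is an illustrative name, not a kernel function:

```c
#include <assert.h>

/* Valid page offsets into a perf ring buffer mapping:
 *   pgoff 0            -> user/control page
 *   pgoff 1..nr_pages  -> data pages
 * Anything beyond nr_pages is out of range, which is what the
 * "if (pgoff > rb->nr_pages) return NULL;" check in the patch encodes. */
int pgoff_valid(unsigned long pgoff, unsigned long nr_pages)
{
    return pgoff <= nr_pages;
}
```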

Re: [PATCHv3] perf: Fix vmalloc ring buffer free function

2013-03-12 Thread Peter Zijlstra
On Tue, 2013-03-12 at 14:52 +0100, Jiri Olsa wrote: > > @@ -373,7 +373,7 @@ struct ring_buffer *rb_alloc(int nr_pages, long > > watermark, int cpu, int flags) > > rb->user_page = all_buf; > > rb->data_pages[0] = all_buf + PAGE_SIZE; > > rb->page_order = ilog2(nr_pages); > > -

Re: [PATCH] device: separate all subsys mutexes (was: Re: [BUG] potential deadlock led by cpu_hotplug lock (memcg involved))

2013-03-12 Thread Peter Zijlstra
On Tue, 2013-03-12 at 14:05 +0100, Michal Hocko wrote: > @@ -111,17 +111,17 @@ struct bus_type { > struct iommu_ops *iommu_ops; > > struct subsys_private *p; > + struct lock_class_key __key; > }; Is struct bus_type constrained to static storage or can people go an allocate

Re: [PATCH] device: separate all subsys mutexes (was: Re: [BUG] potential deadlock led by cpu_hotplug lock (memcg involved))

2013-03-12 Thread Peter Zijlstra
On Tue, 2013-03-12 at 08:43 -0700, Greg Kroah-Hartman wrote: > On Tue, Mar 12, 2013 at 04:28:25PM +0100, Peter Zijlstra wrote: > > On Tue, 2013-03-12 at 14:05 +0100, Michal Hocko wrote: > > > @@ -111,17 +111,17 @@ struct bus_type { > > >

Re: [PATCHv3] perf: Fix vmalloc ring buffer free function

2013-03-12 Thread Peter Zijlstra
Right you are.. How about something like the below; using that 0 << -1 is still 0. --- kernel/events/ring_buffer.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c index 23cb34f..e72ca70 100644 --- a/kernel/even

[PATCH] sched,trace: Allow tracing the preemption decision on wakeup

2013-03-14 Thread Peter Zijlstra
Signed-off-by: Peter Zijlstra --- kernel/sched/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index b36635e..849deb9 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -1288,8 +1288,8 @@ static void ttwu_activate

Re: [PATCH] sched: wakeup buddy

2013-03-14 Thread Peter Zijlstra
On Wed, 2013-03-13 at 11:07 +0800, Michael Wang wrote: > However, we already figure out the logical that wakeup related task > could benefit from closely running, this could promise us somewhat > reliable benefit. I'm not convinced that the 2 task wakeup scenario is the only sane scenario. Imagin

Re: lockdep trace from posix timers

2012-08-28 Thread Peter Zijlstra
On Fri, 2012-08-24 at 20:56 +0200, Oleg Nesterov wrote: > > Peter, if you think it can work for you and if you agree with > the implementation I will be happy to send the patch. Yeah I think it would work, but I'm not sure why you're introducing the cmp_xchg helper just for this.. Anyway, how a

Re: perf backtraces off-by-1

2012-08-28 Thread Peter Zijlstra
On Sun, 2012-08-26 at 10:52 -0700, Arun Sharma wrote: > On 8/26/12 9:10 AM, Peter Zijlstra wrote: > > On Fri, 2012-08-24 at 15:13 -0700, Arun Sharma wrote: > > > >> One option is to support > >> this for user mode only, with code to detect signal frames. Any o

Re: lockdep trace from posix timers

2012-08-28 Thread Peter Zijlstra
On Tue, 2012-08-28 at 19:01 +0200, Oleg Nesterov wrote: > > struct callback_head * > > task_work_cancel(struct task_struct *task, task_work_func_t func) > > { > > + struct callback_head **workp, *work; > > + > > +again: > > + workp = &task->task_works; > > + work = *workp; > > +
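The pointer-to-pointer walk in the quoted task_work_cancel() can be shown in a single-threaded userspace sketch. This strips out the cmpxchg and the "again:" retry the kernel needs for concurrent list splicing, and simplifies the func signature, to isolate the core idiom: unlinking a matching node without tracking a separate "previous" pointer:

```c
#include <stddef.h>
#include <assert.h>

typedef void (*work_func_t)(void *);

struct callback_head {
    struct callback_head *next;
    work_func_t func;
};

/* Walk via a pointer to the link itself (workp), so that when a match
 * is found, *workp = work->next unlinks it regardless of whether it is
 * the head node or deep in the list. */
struct callback_head *
work_cancel(struct callback_head **headp, work_func_t func)
{
    struct callback_head **workp = headp;
    struct callback_head *work;

    while ((work = *workp) != NULL) {
        if (work->func == func) {
            *workp = work->next;    /* unlink the matching node */
            return work;
        }
        workp = &work->next;
    }
    return NULL;
}

/* Dummy callbacks used only to give nodes distinct func pointers. */
void dummy_a(void *arg) { (void)arg; }
void dummy_b(void *arg) { (void)arg; }
```

The real kernel version must additionally handle another CPU swapping the list head underneath it, which is where the cmpxchg and the retry label come in.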

Re: [PATCH] perf/x86: Disable uncore on virtualized CPU.

2012-09-05 Thread Peter Zijlstra
On Wed, 2012-09-05 at 08:35 +0200, Ingo Molnar wrote: > * Yan, Zheng wrote: > > > From: "Yan, Zheng" > > > > Initializing uncore PMU on virtualized CPU may hang the kernel. > > This is because kvm does not emulate the entire hardware. Thers > > are lots of uncore related MSRs, making kvm enumer

Re: [RFC 0/5] forced comounts for cgroups.

2012-09-05 Thread Peter Zijlstra
On Wed, 2012-09-05 at 01:47 -0700, Tejun Heo wrote: > I think this is where we disagree. I didn't mean that all controllers > should be using exactly the same hierarchy when I was talking about > unified hierarchy. I do think it's useful and maybe even essential to > allow differing levels of gra

Re: [RFC 0/5] forced comounts for cgroups.

2012-09-05 Thread Peter Zijlstra
On Wed, 2012-09-05 at 11:06 +0200, Peter Zijlstra wrote: > > So either we go and try to contain this mess as proposed by Glauber or > we go delete controllers.. I've had it with this crap. > > Glauber, the other approach is sending a patch that doesn't touch cgroup.c

Re: [RFC 0/5] forced comounts for cgroups.

2012-09-05 Thread Peter Zijlstra
On Wed, 2012-09-05 at 13:12 +0400, Glauber Costa wrote: > On 09/05/2012 01:11 PM, Tejun Heo wrote: > > Hello, Peter. > > > > On Wed, Sep 05, 2012 at 11:06:33AM +0200, Peter Zijlstra wrote: > >> *confused* I always thought that was exactly what you meant with unified

Re: [RFC 0/5] forced comounts for cgroups.

2012-09-05 Thread Peter Zijlstra
On Wed, 2012-09-05 at 02:32 -0700, Tejun Heo wrote: > Hey, again. > > On Wed, Sep 05, 2012 at 11:06:33AM +0200, Peter Zijlstra wrote: > > Doing all this runtime is just going to make the mess even bigger, > > because now we have to deal with even more stupid cases. > &g
