Re: [PATCH 0/4] (Was: lockdep trace from posix timers)

2012-09-06 Thread Peter Zijlstra
On Thu, 2012-09-06 at 20:01 +0200, Oleg Nesterov wrote: > Ping... Right, email backlog :-) > Peter, do you think you can do your make-it-lockless patch (hehe, I > think this is not possible ;) on top? Sure, I was trying to see if I could play games with the _cancel semantics that would satisfy t

Re: [RFC 00/12] perf diff: Factor diff command

2012-09-06 Thread Peter Zijlstra
On Thu, 2012-09-06 at 17:46 +0200, Jiri Olsa wrote: > The 'perf diff' and 'std/hist' code is now changed to allow computations > mentioned in the paper. Two of them are implemented within this patchset: > 1) ratio differential profiling > 2) weighted differential profiling Seems like a useful

Re: [PATCH tip/core/rcu 06/23] rcu: Break up rcu_gp_kthread() into subfunctions

2012-09-06 Thread Peter Zijlstra
On Thu, 2012-09-06 at 11:49 -0700, Josh Triplett wrote: > > Huh, I thought GCC knew to not emit that warning unless it actually > found control flow reaching the end of the function; since the > infinite > loop has no break in it, you shouldn't need the return. Annoying. tag the function with _

Re: [PATCH tip/core/rcu 01/15] rcu: Add PROVE_RCU_DELAY to provoke difficult races

2012-09-06 Thread Peter Zijlstra
On Thu, 2012-09-06 at 13:51 -0700, Paul E. McKenney wrote: > On Thu, Sep 06, 2012 at 04:38:32PM +0200, Peter Zijlstra wrote: > > On Thu, 2012-08-30 at 11:56 -0700, Paul E. McKenney wrote: > > > +#ifdef CONFIG_PROVE_RCU_DELAY > > > + udelay(10); /* Ma

Re: [PATCH tip/core/rcu 11/15] rcu: Avoid spurious RCU CPU stall warnings

2012-09-07 Thread Peter Zijlstra
On Thu, 2012-09-06 at 15:22 -0700, Paul E. McKenney wrote: > Ah! > > It is perfectly legal to avoid -starting- an RCU grace period for a > minute, or even longer. If RCU has nothing to do, in other words, if no > one registers any RCU callbacks, then RCU need not start a grace period. > > Of cou

Re: [RFC 00/12] perf diff: Factor diff command

2012-09-07 Thread Peter Zijlstra
On Thu, 2012-09-06 at 14:25 -0700, Paul E. McKenney wrote: > On Thu, Sep 06, 2012 at 08:41:09PM +0200, Peter Zijlstra wrote: > > On Thu, 2012-09-06 at 17:46 +0200, Jiri Olsa wrote: > > > The 'perf diff' and 'std/hist' code is now changed to allow computation

Re: WARNING: cpu_is_offline() at native_smp_send_reschedule()

2012-09-07 Thread Peter Zijlstra
> bootup. If no new processes are spawned or no idle cycles happen, the > load on the cpus will remain unbalanced for that duration. > > Signed-off-by: Diwakar Tundlam > Signed-off-by: Peter Zijlstra > Link: > http://lkml.kernel.org/r/1dd7bfedd3147247b1355b

Re: [PATCH 09/12] perf diff: Add weighted diff computation way to compare hist entries

2012-09-07 Thread Peter Zijlstra
On Fri, 2012-09-07 at 22:33 +0900, Namhyung Kim wrote: > 2012-09-07 (금), 11:28 +0200, Jiri Olsa: > > On Fri, Sep 07, 2012 at 02:58:19PM +0900, Namhyung Kim wrote: > > > I don't see why this do { } while(0) loop is necessary. > > > How about this? > > > > > > w1 = strtol(opt, &tmp, 10); > > > i

Re: [PATCH 1/3] perf: use hrtimer for event multiplexing

2012-09-07 Thread Peter Zijlstra
On Fri, 2012-09-07 at 16:29 +0200, Stephane Eranian wrote: > @@ -148,6 +148,15 @@ static LIST_HEAD(pmus); > static DEFINE_MUTEX(pmus_lock); > static struct srcu_struct pmus_srcu; > > +struct perf_cpu_hrtimer { > + struct hrtimer hrtimer; > + int active; > +}; > + > +static DEFINE_PE

Re: [PATCH 3/3] perf: remove jiffies_interval

2012-09-07 Thread Peter Zijlstra
On Fri, 2012-09-07 at 16:29 +0200, Stephane Eranian wrote: > Obsolete because superseded by hrtimer based > multiplexing. Not entirely, the jiffies_interval allows different PMUs to have different rotation speeds. Your code doesn't allow this. -- To unsubscribe from this list: send the line "unsu

Re: [PATCH 1/3] perf: use hrtimer for event multiplexing

2012-09-07 Thread Peter Zijlstra
On Fri, 2012-09-07 at 16:29 +0200, Stephane Eranian wrote: Style nit: > + if (h->active) > + list_for_each_entry_safe(cpuctx, tmp, head, rotation_list) > + rotations += perf_rotate_context(cpuctx); > + if (!hrtimer_callback_running(hr)) > + __

Re: [PATCH 09/12] perf diff: Add weighted diff computation way to compare hist entries

2012-09-07 Thread Peter Zijlstra
On Fri, 2012-09-07 at 08:31 -0700, Arnaldo Carvalho de Melo wrote: > People don't like goto's, but that is overstated, for error handling > it > is perfectly fine :-) http://marc.info/?l=linux-arch&m=120852974023791&w=2

Re: [PATCH] perf, ibs: Check syscall attribute flags

2012-09-07 Thread Peter Zijlstra
On Fri, 2012-09-07 at 18:41 +0200, Robert Richter wrote: > From 1d037614edef576da441936bd8c917d31f57b179 Mon Sep 17 00:00:00 2001 > From: Robert Richter > Date: Wed, 25 Jul 2012 19:12:45 +0200 > Subject: [PATCH] perf, ibs: Check syscall attribute flags > > Current implementation simply ignores at

Re: [PATCH 1/3] perf: use hrtimer for event multiplexing

2012-09-07 Thread Peter Zijlstra
On Fri, 2012-09-07 at 19:03 +0200, Stephane Eranian wrote: > I think having different intervals would be a good thing, especially for > uncore. > But now, I am wondering how this could work without too much overhead. > Looks like you're suggesting arming multiple hrtimers if multiple PMU are > ove

Re: [PATCH] perf, ibs: Check syscall attribute flags

2012-09-07 Thread Peter Zijlstra
On Fri, 2012-09-07 at 19:18 +0200, Robert Richter wrote: > I was thinking of this too. But this breaks existing code to compile > since static initialization of struct perf_event_attr fails, e.g.: > > builtin-test.c:469:3: error: unknown field ‘watermark’ specified in > initializer > > Oh bugg

Re: [PATCH 1/3] perf: use hrtimer for event multiplexing

2012-09-07 Thread Peter Zijlstra
On Fri, 2012-09-07 at 21:10 +0200, Stephane Eranian wrote: > > That's true. I started modifying my code to implement your suggestion. > We'll see how it goes. Then we would have to export that mux interval > via sysfs for each PMU. Indeed. Thanks!

Re: Linux 3.6-rc4

2012-09-10 Thread Peter Zijlstra
On Fri, 2012-09-07 at 11:39 -0700, Linus Torvalds wrote: > Al? Please look into this. I'm not entirely sure what's going on, but > lockdep complains about this: > > Possible interrupt unsafe locking scenario: > > CPU0 CPU1 > > lock(

Re: [PATCH v2 03/12] perf tools: include __WORDSIZE definition

2012-09-10 Thread Peter Zijlstra
On Sun, 2012-09-09 at 01:19 +0300, Irina Tirdea wrote: > >> +#ifndef __WORDSIZE > >> +#if defined(__x86_64__) > >> +# define __WORDSIZE 64 > >> +#endif > >> +#if defined(__i386__) || defined(__arm__) > >> +# define __WORDSIZE 32 > >> +#endif > >> +#endif > > > > Why not use "sizeof(unsigned long) *
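The suggestion quoted above, deriving the word size from `sizeof(unsigned long)`, might look roughly like the sketch below. The helper name `word_size_bits` is hypothetical and not part of the patch. One caveat worth keeping in mind: `sizeof` cannot be evaluated in a preprocessor `#if`, so this form only helps where a constant expression is enough and no `#if __WORDSIZE == 64` style test is needed.

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>   /* CHAR_BIT */
#include <stddef.h>   /* size_t */

/*
 * Hypothetical helper, not from the patch: word size in bits,
 * assuming "word" means unsigned long as in the quoted suggestion.
 */
static inline size_t word_size_bits(void)
{
	return sizeof(unsigned long) * CHAR_BIT;
}

/* The same expression is a compile-time constant, so it can be checked statically. */
static_assert(sizeof(unsigned long) * CHAR_BIT >= 32,
	      "the C standard requires unsigned long to hold at least 32 bits");
```

Calling `word_size_bits()` returns the value at run time; the expression inside it can also be used directly for array sizes or static assertions.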

Re: [PATCH 1/2] nohz: clean up select_nohz_load_balancer()

2012-09-10 Thread Peter Zijlstra
On Mon, 2012-09-10 at 15:10 +0800, Alex Shi wrote: > There is no load_balancer to be selected now. It just set state of > nohz tick stopping. > > So rename the function, pass the 'cpu' from parameter and then > remove the useless calling from tick_nohz_restart_sched_tick(). Please check who wrote

Re: [RFC][PATCH] Improving directed yield scalability for PLE handler

2012-09-10 Thread Peter Zijlstra
On Mon, 2012-09-10 at 08:16 -0500, Andrew Theurer wrote: > > > @@ -4856,8 +4859,6 @@ again: > > > if (curr->sched_class != p->sched_class) > > > goto out; > > > > > > - if (task_running(p_rq, p) || p->state) > > > - goto out; > > > > Is it possible that by this time th

Re: [RFC PATCH 1/3] perf: Add cpumask for uncore pmu

2012-09-10 Thread Peter Zijlstra
On Mon, 2012-09-10 at 15:53 +0800, Yan, Zheng wrote: > Hi, > > This patchset adds a cpumask file to the uncore pmu sysfs directory. > If user doesn't explicitly specify CPU list, perf-stat only collects > uncore events on CPUs listed in the cpumask file. > > As Stephane suggested, make perf-stat r

Re: [PATCH 4/4] perf tools: Back [vdso] DSO with real data

2012-09-10 Thread Peter Zijlstra
On Mon, 2012-09-10 at 18:50 +0200, Jiri Olsa wrote: > + maps = fopen("/proc/self/maps", "r"); > + if (!maps) { > + pr_err("vdso: cannot open maps\n"); > + return -1; > + } > + > + while (!found && fgets(line, sizeof(line), maps)) { > +

Re: [PATCH 0/3] perf: precise mode and exclude_guest

2012-09-10 Thread Peter Zijlstra
On Mon, 2012-09-10 at 10:40 -0600, David Ahern wrote: > Hopefully this wraps up the precise mode-exclude_guest dependency. > I'm sure someone will let me know if I screwed up the attribution > in the second patch. I'll wait with applying until we have the IBS stuff sorted, other than that, thanks

Re: [RFC][PATCH] Improving directed yield scalability for PLE handler

2012-09-10 Thread Peter Zijlstra
On Mon, 2012-09-10 at 22:26 +0530, Srikar Dronamraju wrote: > > +static bool __yield_to_candidate(struct task_struct *curr, struct > > task_struct *p) > > +{ > > + if (!curr->sched_class->yield_to_task) > > + return false; > > + > > + if (curr->sched_class != p->sched_class) >

Re: [PATCH 0/3] perf: precise mode and exclude_guest

2012-09-10 Thread Peter Zijlstra
On Mon, 2012-09-10 at 11:01 -0600, David Ahern wrote: > On 9/10/12 10:57 AM, Peter Zijlstra wrote: > > On Mon, 2012-09-10 at 10:40 -0600, David Ahern wrote: > >> Hopefully this wraps up the precise mode-exclude_guest dependency. > >> I'm sure someone will let me know

Re: [PATCH 1/3] perf tool: precise mode requires exclude_guest

2012-09-10 Thread Peter Zijlstra
> Signed-off-by: David Ahern > Cc: Ingo Molnar > Cc: Peter Zijlstra > Cc: Robert Richter > Cc: Gleb Natapov > Cc: Avi Kivity > --- > tools/perf/util/parse-events.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/tools/perf/util/parse-events.c b/

Re: [PATCH 2/3] perf: require exclude_guest to use PEBS - kernel side enforcement

2012-09-10 Thread Peter Zijlstra
On Mon, 2012-09-10 at 10:40 -0600, David Ahern wrote: > From: Peter Zijlstra > > See https://lkml.org/lkml/2012/7/9/298 Expanding that a little would be so much better.. take some of the reply to 1/3 on why we have to enforce a strict exclude_guest. -- To unsubscribe from this list:

Re: [PATCH 4/7] ptrace: Partly fix set_task_blockstep()->update_debugctlmsr() logic

2012-09-10 Thread Peter Zijlstra
On Mon, 2012-09-10 at 18:57 +0200, Sebastian Andrzej Siewior wrote: > The only user that is touching this bits in irq context is perf. perf > uses raw_local_irqsave() (raw_* most likely due to -RT). # git grep raw_local_irq arch/x86/kernel/cpu/perf_* kernel/events/ | wc -l 0 I think you're confus

Re: [RFC][PATCH] Improving directed yield scalability for PLE handler

2012-09-10 Thread Peter Zijlstra
On Mon, 2012-09-10 at 15:12 -0500, Andrew Theurer wrote: > + /* > +* if the target task is not running, then only yield if the > +* current task is in guest mode > +*/ > + if (!(p_rq->curr->flags & PF_VCPU)) > + goto out_irq; This would make yield

Re: sched: per-entity load-tracking

2012-10-08 Thread Peter Zijlstra
On Sat, 2012-10-06 at 09:39 +0200, Ingo Molnar wrote: Thanks Ingo! Paul, > tip/kernel/sched/fair.c | 28 ++-- > 1 file changed, 18 insertions(+), 10 deletions(-) > > Index: tip/kernel/sched/fair.c > === >

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-09 Thread Peter Zijlstra
On Mon, 2012-10-08 at 10:59 +0800, Tang Chen wrote: > If a cpu is offline, its nid will be set to -1, and cpu_to_node(cpu) will > return -1. As a result, cpumask_of_node(nid) will return NULL. In this case, > find_next_bit() in for_each_cpu will get a NULL pointer and cause panic. Hurm,. this is n

Re: [PATCH] task_work: avoid unneeded cmpxchg() in task_work_run()

2012-10-09 Thread Peter Zijlstra
On Mon, 2012-10-08 at 14:38 +0200, Oleg Nesterov wrote: > But the code looks more complex, and the only advantage is that > non-exiting task does xchg() instead of cmpxchg(). Not sure this > is worth the trouble, in this case task_work_run() will likely run > the callbacks (the caller checks ->task_wor

Re: [REPOST] RFC: sched: Prevent wakeup to enter critical section needlessly

2012-10-09 Thread Peter Zijlstra
On Tue, 2012-10-09 at 06:37 -0700, Andi Kleen wrote: > Ivo Sieben writes: > > > Check the waitqueue task list to be non empty before entering the critical > > section. This prevents locking the spin lock needlessly in case the queue > > was empty, and therefore also prevents scheduling overhead on

Re: [PATCH] x86/perf: Fix virtualization sanity check

2012-10-09 Thread Peter Zijlstra
On Tue, 2012-10-09 at 17:38 +0200, Andre Przywara wrote: > First you need an AMD family 10h/12h CPU. These do not reset the > PERF_CTR registers on a reboot. > Now you boot bare metal Linux, which goes successfully through this > check, but leaves the magic value of 0xabcd in the register. You > do

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-09 Thread Peter Zijlstra
On Tue, 2012-10-09 at 13:36 -0700, David Rientjes wrote: > On Tue, 9 Oct 2012, Peter Zijlstra wrote: > > > On Mon, 2012-10-08 at 10:59 +0800, Tang Chen wrote: > > > If a cpu is offline, its nid will be set to -1, and cpu_to_node(cpu) will > > > return -1. As a res

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-10 Thread Peter Zijlstra
On Tue, 2012-10-09 at 16:27 -0700, David Rientjes wrote: > On Tue, 9 Oct 2012, Peter Zijlstra wrote: > > > Well the code they were patching is in the wakeup path. As I think Tang > > said, we leave !runnable tasks on whatever cpu they ran on last, even if > > that cpu is

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-10 Thread Peter Zijlstra
On Wed, 2012-10-10 at 17:33 +0800, Wen Congyang wrote: > > Hmm, if per-cpu memory is preserved, and we can't offline and remove > this memory. So we can't offline the node. > > But, if the node is hot added, and per-cpu memory doesn't use the > memory on this node. We can hotremove cpu/memory on

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-10 Thread Peter Zijlstra
On Wed, 2012-10-10 at 18:10 +0800, Wen Congyang wrote: > I use ./scripts/get_maintainer.pl, and it doesn't tell me that I should cc > you when I post that patch. That script doesn't look at all usage sites of the code you modify does it? You need to audit the entire tree for usage of the interfa

Re: Netperf UDP_STREAM regression due to not sending IPIs in ttwu_queue()

2012-10-10 Thread Peter Zijlstra
On Wed, 2012-10-10 at 13:29 +0100, Mel Gorman wrote: > Do we really switch more though? > > Look at the difference in interrupts vs context switch. IPIs are an interrupt > so if TTWU_QUEUE wakes process B using an IPI, does that count as a context > switch? Nope. Nor would it for NO_TTWU_QUEUE. A

Re: [PATCH 4/8] perf x86: Adding hardware events translations for amd cpus

2012-10-10 Thread Peter Zijlstra
On Wed, 2012-10-10 at 14:53 +0200, Jiri Olsa wrote: > +static ssize_t amd_event_sysfs_show(char *page, u64 config) > +{ > + u64 event = (config & ARCH_PERFMON_EVENTSEL_EVENT) | > + (config & AMD64_EVENTSEL_EVENT) >> 24; > + > + return x86_event_sysfs_show(page, config,
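The expression in the quoted hunk relies on `>>` binding tighter than `|` in C, so the shift is applied to the second operand only. A minimal sketch of the same computation with explicit parentheses is shown below. The `SKETCH_*` mask values and the function name are assumptions made for illustration (low 8 event bits plus an extension in bits 35:32); the real definitions should be checked in the kernel headers.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

/* Assumed mask values, for illustration only. */
#define SKETCH_EVENTSEL_EVENT        0x000000FFULL
#define SKETCH_AMD64_EVENTSEL_EVENT  (SKETCH_EVENTSEL_EVENT | (0x0FULL << 32))

static_assert((SKETCH_AMD64_EVENTSEL_EVENT >> 32) == 0x0F,
	      "extended event bits are assumed to sit in bits 35:32");

/*
 * Hypothetical helper: fold the low 8 event bits and the extended
 * bits 35:32 into one 12-bit event code. The parentheses make the
 * "shift first, then OR" order explicit.
 */
static inline uint64_t sketch_amd_event(uint64_t config)
{
	return (config & SKETCH_EVENTSEL_EVENT) |
	       ((config & SKETCH_AMD64_EVENTSEL_EVENT) >> 24);
}
```

Under these assumed masks, the low 8 bits of the second operand are shifted out entirely, and bits 35:32 land in bits 11:8 of the result.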

Re: [PATCH 4/8] perf x86: Adding hardware events translations for amd cpus

2012-10-10 Thread Peter Zijlstra
On Wed, 2012-10-10 at 16:25 +0200, Jiri Olsa wrote: > On Wed, Oct 10, 2012 at 04:11:42PM +0200, Peter Zijlstra wrote: > > On Wed, 2012-10-10 at 14:53 +0200, Jiri Olsa wrote: > > > +static ssize_t amd_event_sysfs_show(char *page, u64 config) > > > +{ >

Re: Meaningless load?

2012-10-10 Thread Peter Zijlstra
On Wed, 2012-10-10 at 17:44 +0200, Simon Klinkert wrote: > I'm just wondering if the 'load' is really meaningful in this > scenario. The machine is the whole time fully responsive and looks > fine to me but maybe I didn't understand correctly what the load > should mean. Is there any sensible inter

Re: [PATCH V2] task_work: avoid unneeded cmpxchg() in task_work_run()

2012-10-10 Thread Peter Zijlstra
On Wed, 2012-10-10 at 19:50 +0200, Oleg Nesterov wrote: > > But you did not answer, and I am curious. What was your original > motivation? Is xchg really faster than cmpxchg? And is this true over multiple architectures? Or are we optimizing for x86_64 (again)?

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-19 Thread Peter Zijlstra
On Wed, 2012-10-17 at 20:29 -0700, David Rientjes wrote: > > Ok, thanks for the update. I agree that we should be clearing the mapping > at node hot-remove since any cpu that would subsequently get onlined and > assume one of the previous cpu's ids is not guaranteed to have the same > affinity

Re: [PATCH 2/2] rename NUMA fault handling functions

2012-10-19 Thread Peter Zijlstra
On Thu, 2012-10-18 at 17:20 -0400, Rik van Riel wrote: > Having the function name indicate what the function is used > for makes the code a little easier to read. Furthermore, > the fault handling code largely consists of do__page > functions. I don't much care either way, but I was thinking

Re: [PATCH 1/2] add credits for NUMA placement

2012-10-19 Thread Peter Zijlstra
er/numa-problem.txt file should > probably be rewritten once we figure out the final details of > what the NUMA code needs to do, and why. > > Signed-off-by: Rik van Riel Acked-by: Peter Zijlstra Thanks Rik!

Re: [PATCH 1/2] brw_mutex: big read-write mutex

2012-10-19 Thread Peter Zijlstra
On Thu, 2012-10-18 at 15:28 -0400, Mikulas Patocka wrote: > > On Thu, 18 Oct 2012, Oleg Nesterov wrote: > > > Ooooh. And I just noticed include/linux/percpu-rwsem.h which does > > something similar. Certainly it was not in my tree when I started > > this patch... percpu_down_write() doesn't allow

Re: MAX_LOCKDEP_ENTRIES too low (called from ioc_release_fn)

2012-10-19 Thread Peter Zijlstra
On Fri, 2012-10-19 at 01:21 -0400, Dave Jones wrote: > > Not sure why you are CC'ing a call site, rather than the maintainers of > > the code. Just looks like lockdep is using too small a static value. > > Though it is pretty darn large... > > You're right, it's a huge chunk of memory. > It loo

Re: [PATCH] sched, autogroup: fix kernel crashes caused by runtime disable autogroup

2012-10-19 Thread Peter Zijlstra
to put the sysctl enabled check in autogroup_move_group(), kernel should check > it before autogroup_create in sched_autogroup_create_attach(). > > Reported-by: cwillu > Reported-by: Luis Henriques > Signed-off-by: Xiaotian Feng > Cc: Ingo Molnar > Cc: Peter Zijlst

Re: perf: p6 PMU working by accident, should we fix it and KNC?

2012-10-19 Thread Peter Zijlstra
On Wed, 2012-10-17 at 11:35 -0400, Vince Weaver wrote: > > This is by accident; it looks like the code does >val |= ARCH_PERFMON_EVENTSEL_ENABLE; > in p6_pmu_disable_event() so that events are never truly disabled > (is this a bug? should it be &=~ instead?). I think that's on purpose.. f
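The question in the quoted text comes down to the difference between setting and clearing one bit in the event-select value. The sketch below only illustrates that difference; the bit position in `SKETCH_EVENTSEL_ENABLE` and both helper names are assumptions for illustration, and the sketch takes no position on which behaviour the P6 code should have.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

/* Assumed bit position of the enable flag, for illustration only. */
#define SKETCH_EVENTSEL_ENABLE (1ULL << 22)

static_assert((SKETCH_EVENTSEL_ENABLE & (SKETCH_EVENTSEL_ENABLE - 1)) == 0,
	      "the enable mask is expected to be a single bit");

/* "val |= ENABLE": the enable bit ends up set, whatever it was before. */
static inline uint64_t sketch_set_enable(uint64_t val)
{
	return val | SKETCH_EVENTSEL_ENABLE;
}

/* "val &= ~ENABLE": the enable bit ends up clear, other bits are kept. */
static inline uint64_t sketch_clear_enable(uint64_t val)
{
	return val & ~SKETCH_EVENTSEL_ENABLE;
}
```

With `|=` in a disable path, the written value still carries the enable bit, which is the behaviour the quoted message describes.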

Re: [PATCH RFC] sched: boost throttled entities on wakeups

2012-10-19 Thread Peter Zijlstra
On Thu, 2012-10-18 at 11:32 +0400, Vladimir Davydov wrote: > > 1) Do you agree that the problem exists and should be sorted out? This is two questions.. yes it exists, I'm absolutely sure I pointed it out as soon as people even started talking about this nonsense (bw cruft). Should it be sorted,

Re: [PATCH RT] slab: Fix up stable merge of slab init_lock_keys()

2012-10-19 Thread Peter Zijlstra
On Thu, 2012-10-18 at 09:40 -0400, Steven Rostedt wrote: > Peter, > > There was a little conflict with my merge of 3.4.14 due to the backport > of this patch: > > commit 947ca1856a7e60aa6d20536785e6a42dff25aa6e > Author: Michael Wang > Date: Wed Sep 5 10:33:18 2012 +0800 > > slab: fix the

Re: [tip:numa/core] sched/numa/mm: Improve migration

2012-10-19 Thread Peter Zijlstra
On Fri, 2012-10-19 at 09:51 -0400, Johannes Weiner wrote: > Of course I'm banging my head into a wall for not seeing earlier > through the existing migration path how easy this could be. There's a reason I keep promoting the idea of 'someone' rewriting all that page-migration code :-) I forever

Re: [tip:numa/core] sched/numa/mm: Improve migration

2012-10-19 Thread Peter Zijlstra
On Fri, 2012-10-19 at 09:51 -0400, Johannes Weiner wrote: > Right now, unlike the traditional migration path, this breaks COW for > every migration, but maybe you don't care about shared pages in the > first place. And fixing that should be nothing more than grabbing the > anon_vma lock and using

Re: [tip:numa/core] sched/numa/mm: Improve migration

2012-10-19 Thread Peter Zijlstra
On Fri, 2012-10-19 at 09:51 -0400, Johannes Weiner wrote: > It's slightly ugly that migrate_page_copy() actually modifies the > existing page (deactivation, munlock) when you end up having to revert > back to it. The worst is actually calling copy_huge_page() on a THP.. it seems to work though ;-

Re: [PATCH 1/2] perf tools: add event modifier to request exclusive PMU access

2012-10-19 Thread Peter Zijlstra
On Fri, 2012-10-19 at 16:52 +0200, Stephane Eranian wrote: > -modifier_event [ukhpGH]{1,8} > +modifier_event [ukhpGHx]{1,8} wouldn't the max modifier string length grow by adding another possible modifier?
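To make the length question concrete, here is a small sketch of what a pattern like `[ukhpGHx]{1,8}` checks: every character must come from the allowed set, and the total length must stay within the bound. The function name is hypothetical, and the example string assumes that a letter such as `p` may be repeated; neither is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical checker mirroring a "[allowed]{1,max_len}" pattern:
 * accepts s only if it is 1..max_len characters long and consists
 * solely of characters from allowed.
 */
static inline bool sketch_modifier_ok(const char *s, const char *allowed,
				      size_t max_len)
{
	size_t len;

	assert(s != NULL && allowed != NULL);
	len = strlen(s);
	if (len < 1 || len > max_len)
		return false;
	/* strspn() counts the leading characters of s found in allowed. */
	return strspn(s, allowed) == len;
}
```

Under that assumption, a string such as `"ukhpppGHx"` is nine characters long, so it passes with a bound of 9 but is rejected with a bound of 8.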

Re: [PATCH 2/2] perf: SNB exclusive PMU access for INST_RETIRED:PREC_DIST

2012-10-19 Thread Peter Zijlstra
On Fri, 2012-10-19 at 16:52 +0200, Stephane Eranian wrote: > +static int intel_pebs_aliases_snb(struct perf_event *event) > +{ > + u64 cfg = event->hw.config; > + /* > +* for INST_RETIRED.PREC_DIST to work correctly with PEBS, it must > +* be measured alone on SNB (exclu

Re: question on NUMA page migration

2012-10-19 Thread Peter Zijlstra
On Fri, 2012-10-19 at 11:53 -0400, Rik van Riel wrote: > > If we do need the extra refcount, why is normal > page migration safe? :) It's mostly a matter of how convoluted you make the code, regular page migration is about as bad as you can get. Normal does: follow_page(FOLL_GET) +1 isolate

Re: [PATCH 2/2] perf: SNB exclusive PMU access for INST_RETIRED:PREC_DIST

2012-10-19 Thread Peter Zijlstra
On Fri, 2012-10-19 at 18:31 +0200, Stephane Eranian wrote: > On Fri, Oct 19, 2012 at 6:27 PM, Peter Zijlstra wrote: > > On Fri, 2012-10-19 at 16:52 +0200, Stephane Eranian wrote: > >> +static int intel_pebs_aliases_snb(struct perf_event *event) > >> +{ > >>

Re: [PATCH 1/2] brw_mutex: big read-write mutex

2012-10-19 Thread Peter Zijlstra
On Fri, 2012-10-19 at 11:32 -0400, Mikulas Patocka wrote: > So if you can do an alternative implementation without RCU, show it. Uhm,,. no that's not how it works. You just don't push through crap like this and then demand someone else does it better. But using preempt_{disable,enable} and using

Re: question on NUMA page migration

2012-10-19 Thread Peter Zijlstra
On Fri, 2012-10-19 at 13:13 -0400, Rik van Riel wrote: > Would it make sense to have the normal page migration code always > work with the extra refcount, so we do not have to introduce a new > MIGRATE_FAULT migration mode? > > On the other hand, compaction does not take the extra reference... R

Re: linux-next: build failure after merge of the final tree (tip/s390 trees related)

2012-10-19 Thread Peter Zijlstra
On Thu, 2012-10-18 at 17:02 +0200, Ralf Baechle wrote: > CC mm/huge_memory.o > mm/huge_memory.c: In function ‘do_huge_pmd_prot_none’: > mm/huge_memory.c:789:3: error: incompatible type for argument 3 of > ‘update_mmu_cache’ That appears to have become update_mmu_cache_pmd(), which makes se

Re: [PATCH 1/6] perf, x86: Basic Haswell LBR call stack support

2012-10-22 Thread Peter Zijlstra
On Mon, 2012-10-22 at 14:11 +0800, Yan, Zheng wrote: > + /* LBR callstack does not work well with FREEZE_LBRS_ON_PMI */ > + if (!cpuc->lbr_sel || !(cpuc->lbr_sel->config & LBR_CALL_STACK)) > + debugctl |= DEBUGCTLMSR_FREEZE_LBRS_ON_PMI; How useful is it without this? How

Re: [PATCH 1/6] perf, x86: Basic Haswell LBR call stack support

2012-10-22 Thread Peter Zijlstra
On Mon, 2012-10-22 at 14:11 +0800, Yan, Zheng wrote: > --- a/include/uapi/linux/perf_event.h > +++ b/include/uapi/linux/perf_event.h > @@ -160,8 +160,9 @@ enum perf_branch_sample_type { > PERF_SAMPLE_BRANCH_ABORT= 1U << 7, /* transaction aborts */ > PERF_SAMPLE_BRANCH_INTX

Re: [RFC PATCH 5/8] irq_work: Make self-IPIs optable

2012-10-22 Thread Peter Zijlstra
On Sat, 2012-10-20 at 12:22 -0400, Frederic Weisbecker wrote: > + if (empty) { > + /* > +* If an IPI is requested, raise it right away. Otherwise wait > +* for the next tick unless it's stopped. Now if the arch uses > +* some other

Re: [tip:numa/core] x86, mm: Prevent gcc to re-read the pagetables

2012-10-22 Thread Peter Zijlstra
On Sun, 2012-10-21 at 05:56 -0700, tip-bot for Andrea Arcangeli wrote: > In get_user_pages_fast() the TLB shootdown code can clear the pagetables > before firing any TLB flush (the page can't be freed until the TLB > flushing IPI has been delivered but the pagetables will be cleared well > before

Re: [PATCH v2 1/3] sched: introduce distinct per-cpu load average

2012-10-22 Thread Peter Zijlstra
On Sat, 2012-10-20 at 21:06 +0200, Andrea Righi wrote: > @@ -383,13 +383,7 @@ struct rq { > struct list_head leaf_rt_rq_list; > #endif > > + unsigned long __percpu *nr_uninterruptible; This is O(nr_cpus^2) memory.. > +unsigned long nr_uninterruptible_cpu(int cpu) > +{ > +

re: sched, numa, mm: Implement constant, per task Working Set Sampling (WSS) rate

2012-10-22 Thread Peter Zijlstra
On Mon, 2012-10-22 at 14:55 +0300, Dan Carpenter wrote: > Hello Peter Zijlstra, > > The patch 3d049f8a5398: "sched, numa, mm: Implement constant, per > task Working Set Sampling (WSS) rate" from Oct 14, 2012, leads to the > following warning: > kernel/sch

Re: [PATCH v2 1/2] perf tools: add event modifier to request exclusive PMU access

2012-10-22 Thread Peter Zijlstra
On Mon, 2012-10-22 at 17:44 +0200, Stephane Eranian wrote: > > I know the answer, because I know what's going on under the > hood. But what about the average user? I'm still wondering if the avg user really thinks 'instructions' is a useful metric for other than obtaining ipc measurements. The

Re: [PATCH v2 1/2] perf tools: add event modifier to request exclusive PMU access

2012-10-22 Thread Peter Zijlstra
On Mon, 2012-10-22 at 18:08 +0200, Stephane Eranian wrote: > > I'm still wondering if the avg user really thinks 'instructions' is > a > > useful metric for other than obtaining ipc measurements. > > > Yeah, for many users CPI (or IPC) is a useful metric. Right but you don't get that using instru

[PATCH 8/8] sched, numa, mm: Implement slow start for working set sampling

2012-11-12 Thread Peter Zijlstra
straightforward, most of the patch deals with adding the /proc/sys/kernel/sched_numa_scan_delay_ms tunable knob. Signed-off-by: Peter Zijlstra Cc: Linus Torvalds Cc: Andrew Morton Cc: Peter Zijlstra Cc: Andrea Arcangeli Cc: Rik van Riel Cc: Mel Gorman [ Wrote the changelog, ran measurements

[PATCH 1/8] sched, numa, mm: Introduce sched_feat_numa()

2012-11-12 Thread Peter Zijlstra
Avoid a few #ifdef's later on. Signed-off-by: Peter Zijlstra Cc: Paul Turner Cc: Lee Schermerhorn Cc: Christoph Lameter Cc: Rik van Riel Cc: Mel Gorman Cc: Andrew Morton Cc: Linus Torvalds Signed-off-by: Ingo Molnar --- kernel/sched/sched.h |6 ++ 1 file changed, 6 inser

[PATCH 6/8] sched, numa, mm: Implement constant, per task Working Set Sampling (WSS) rate

2012-11-12 Thread Peter Zijlstra
out a possible NULL pointer dereference in the first version of this patch. ] Based-on-idea-by: Andrea Arcangeli Bug-Found-By: Dan Carpenter Signed-off-by: Peter Zijlstra Cc: Linus Torvalds Cc: Andrew Morton Cc: Peter Zijlstra Cc: Andrea Arcangeli Cc: Rik van Riel Cc: Mel Gorman [ Wrote

[PATCH 7/8] sched, numa, mm: Count WS scanning against present PTEs, not virtual memory ranges

2012-11-12 Thread Peter Zijlstra
By accounting against the present PTEs, scanning speed reflects the actual present (mapped) memory. Suggested-by: Ingo Molnar Signed-off-by: Peter Zijlstra Cc: Linus Torvalds Cc: Andrew Morton Cc: Peter Zijlstra Cc: Andrea Arcangeli Cc: Rik van Riel Cc: Mel Gorman Signed-off-by: Ingo

[PATCH 3/8] sched, numa, mm: Add credits for NUMA placement

2012-11-12 Thread Peter Zijlstra
the final details of what the NUMA code needs to do, and why. ] Signed-off-by: Rik van Riel Acked-by: Peter Zijlstra Cc: Peter Zijlstra Cc: Andrea Arcangeli Cc: Rik van Riel Cc: Mel Gorman Cc: Linus Torvalds Cc: Andrew Morton Signed-off-by: Ingo Molnar This is against tip.git numa/

[PATCH 4/8] sched, numa, mm: Add last_cpu to page flags

2012-11-12 Thread Peter Zijlstra
grow enough 64bit only page-flags to push the last-cpu out. ] Suggested-by: Rik van Riel Signed-off-by: Peter Zijlstra Cc: Linus Torvalds Cc: Andrew Morton Cc: Peter Zijlstra Cc: Andrea Arcangeli Cc: Rik van Riel Cc: Mel Gorman Signed-off-by: Ingo Molnar --- include/linux/mm.h

[PATCH 0/8] Announcement: Enhanced NUMA scheduling with adaptive affinity

2012-11-12 Thread Peter Zijlstra
Hi, This series implements an improved version of NUMA scheduling, based on the review and testing feedback we got. Like the previous version, this code is driven by working set probing faults (so much of the VM machinery remains) - but the subsequent utilization of those faults and the scheduler

[PATCH 2/8] sched, numa, mm: Implement THP migration

2012-11-12 Thread Peter Zijlstra
Add THP migration for the NUMA working set scanning fault case. It uses the page lock to serialize. No migration pte dance is necessary because the pte is already unmapped when we decide to migrate. Signed-off-by: Peter Zijlstra Cc: Johannes Weiner Cc: Mel Gorman Cc: Andrea Arcangeli Cc

Re: [RFC PATCH] perf: Update event buffer tail when overwriting old events

2013-04-18 Thread Peter Zijlstra
On Mon, 2013-04-15 at 14:02 +0800, Yan, Zheng wrote: > From: "Yan, Zheng" > > If perf event buffer is in overwrite mode, the kernel only updates > the data head when it overwrites old samples. The program that owns > the buffer needs to periodically check the buffer and update a variable > that track

Re: [RFC PATCH 0/2] sched: move content out of core files for load average

2013-04-18 Thread Peter Zijlstra
On Mon, 2013-04-15 at 11:33 +0200, Ingo Molnar wrote: > * Paul Gortmaker wrote: > > > Recent activity has had a focus on moving functionally related blocks of > > stuff > > out of sched/core.c into stand-alone files. The code relating to load > > average > > calculations has grown significan

Re: [tip:perf/urgent] perf: Treat attr.config as u64 in perf_swevent_init()

2013-04-18 Thread Peter Zijlstra
On Mon, 2013-04-15 at 03:42 -0700, tip-bot for Tommi Rantala wrote: > Commit-ID: 8176cced706b5e5d15887584150764894e94e02f > Gitweb: http://git.kernel.org/tip/8176cced706b5e5d15887584150764894e94e02f > Author: Tommi Rantala > AuthorDate: Sat, 13 Apr 2013 22:49:14 +0300 > Committer: Ingo M

Re: [PATCH 2/2] perf, amd: support for AMD NB and L2I "uncore" counters.

2013-04-18 Thread Peter Zijlstra
On Mon, 2013-04-15 at 12:21 -0500, Jacob Shin wrote: > Add support for AMD Family 15h [and above] northbridge performance > counters. MSRs 0xc0010240 ~ 0xc0010247 are shared across all cores > that share a common northbridge. > > Add support for AMD Family 16h L2 performance counters. MSRs > 0xc00

Re: [PATCH] NMI: fix NMI period is not correct when cpu frequency changes issue.

2013-04-18 Thread Peter Zijlstra
On Mon, 2013-04-15 at 16:30 -0700, Andrew Morton wrote: > I think this will break the build if CONFIG_PERF_EVENTS=n and > CONFIG_LOCKUP_DETECTOR=y. I was able to create such a config for > powerpc. If I'm reading it correctly, CONFIG_PERF_EVENTS cannot be > disabled on x86_64? If so, what the he

Re: [PATCH v2] NMI: fix NMI period is not correct when cpu frequency changes issue.

2013-04-18 Thread Peter Zijlstra
On Tue, 2013-04-16 at 06:57 +, Pan, Zhenjie wrote: > Watchdog uses performance monitor of cpu clock cycle to generate NMI to detect > hard lockup. > But when cpu's frequency changes, the event period will also change. > It's not as expected as the configuration. > For example, set the NMI event

Re: [RFC PATCH 0/2] sched: move content out of core files for load average

2013-04-19 Thread Peter Zijlstra
On Fri, 2013-04-19 at 10:25 +0200, Ingo Molnar wrote: > It might eventually make sense to integrate the 'average load' > calculation as well > with all this - as they really have a similar purpose, the avenload[] > vector of > averages is conceptually similar to the rq->cpu_load[] vector of > ave

Re: [PATCH Resend v6] sched: fix wrong rq's runnable_avg update with rt tasks

2013-04-19 Thread Peter Zijlstra
nly runs a periodic RT task, > is close to LOAD_AVG_MAX whatever the running duration of the RT task is. > > A new idle_exit function is called when the prev task is the idle function > so the elapsed time will be accounted as idle time in the rq's load. Acked-by: Peter Zijlstra

Re: [PATCH 0/3] uncore patches

2013-04-19 Thread Peter Zijlstra
On Tue, 2013-04-16 at 19:51 +0800, Yan, Zheng wrote: > From: "Yan, Zheng" > > I sent these 3 patches to the mailing list some time ago, but got no response. > Patch 1 and patch 2 are bug fixes, patch 3 adds Ivy Bridge-EP support. Acked-by: Peter Zijlstra

Re: [PATCH 2/2] perf, amd: support for AMD NB and L2I "uncore" counters.

2013-04-19 Thread Peter Zijlstra
r overflow interrupts. Sampling mode and > per-thread events are not supported. > > Signed-off-by: Jacob Shin Acked-by: Peter Zijlstra -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More maj

Re: [PATCH 2/2] perf, amd: support for AMD NB and L2I "uncore" counters.

2013-04-19 Thread Peter Zijlstra
Zheng Date: Mon Sep 10 15:53:49 2012 +0800 perf/x86: Add cpumask for uncore pmu This patch adds a cpumask file to the uncore pmu sysfs directory. The cpumask file contains one active cpu for every socket. Signed-off-by: "Yan, Zheng" Acked-by: Peter Zijlst

Re: [PATCH v3] perf: Check all MSRs before passing hw check

2013-04-21 Thread Peter Zijlstra
On Sun, 2013-04-21 at 10:52 +0200, Ingo Molnar wrote: > * George Dunlap wrote: > > > Any comments? it's been 2 weeks now. > > Looks good to me - Peter, any objections? Nope, looks good.

Re: [PATCH] events: Protect access via task_subsys_state_check()

2013-04-22 Thread Peter Zijlstra
McKenney > Tested-by: Gustavo Luiz Duarte Acked-by: Peter Zijlstra > diff --git a/kernel/events/core.c b/kernel/events/core.c > index b0cd865..8db9551 100644 > --- a/kernel/events/core.c > +++ b/kernel/events/core.c > @@ -4593,6 +4593,7 @@ void perf_event_comm(stru

Re: [PATCH v6] sched: fix init NOHZ_IDLE flag

2013-04-22 Thread Peter Zijlstra
On Fri, 2013-04-19 at 15:10 +0200, Vincent Guittot wrote: > As suggested by Frederic Weisbecker, another solution is to have the > same > rcu lifecycle for both NOHZ_IDLE and sched_domain struct. I have > introduce > a new sched_domain_rq struct that is the entry point for both > sched_domains > an

Re: [PATCH] sched: wake-affine throttle

2013-04-22 Thread Peter Zijlstra
OK,.. Ingo said that pipe-test was the original motivation for wake_affine() and since that's currently broken to pieces due to select_idle_sibling() is there still a benefit to having it at all? Can anybody find any significant regression when simply killing wake_affine()?

Re: [PATCH v6] sched: fix init NOHZ_IDLE flag

2013-04-22 Thread Peter Zijlstra
On Mon, 2013-04-22 at 13:01 +0200, Vincent Guittot wrote: > > I'm not quite getting things.. what's wrong with adding this flags > > thing to sched_domain itself? That's already RCU destroyed so why > add a > > second RCU layer? > > We need one flags for all sched_domain so if we add it into > sch

Re: Preemptable Ticket Spinlock

2013-04-22 Thread Peter Zijlstra
On Sun, 2013-04-21 at 17:12 -0400, Rik van Riel wrote: > > If we always incremented the ticket number by 2 (instead of 1), then > we could use the lower bit of the ticket number as the spinlock. ISTR that paravirt ticket locks already do that and use the lsb to indicate the unlock needs to perfor

Re: [PATCH v2 2/6] sched: explicitly cpu_idle_type checking in rebalance_domains()

2013-04-22 Thread Peter Zijlstra
On Tue, 2013-03-26 at 15:01 +0900, Joonsoo Kim wrote: > @@ -5506,10 +5506,10 @@ static void rebalance_domains(int cpu, enum > cpu_idle_type idle) > if (time_after_eq(jiffies, sd->last_balance + > interval)) { > if (load_balance(cpu, rq, sd, idle, &balance)) >

Re: [PATCH v2 3/6] sched: don't consider other cpus in our group in case of NEWLY_IDLE

2013-04-22 Thread Peter Zijlstra
't > consider other cpus. Assigning to 'this_rq->idle_stamp' is now valid. > > Cc: Srivatsa Vaddagiri > Acked-by: Peter Zijlstra > Signed-off-by: Joonsoo Kim > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index 9d693d0..3f8c4f2 10064

Re: [PATCH v2 6/6] sched: prevent to re-select dst-cpu in load_balance()

2013-04-22 Thread Peter Zijlstra
On Tue, 2013-03-26 at 15:01 +0900, Joonsoo Kim wrote: > Commit 88b8dac0 makes load_balance() consider other cpus in its group. > But, in that, there is no code for preventing to re-select dst-cpu. > So, same dst-cpu can be selected over and over. > > This patch add functionality to load_balance()

Re: [PATCH v2 0/6] correct load_balance()

2013-04-22 Thread Peter Zijlstra
ith them taken care of. I'll leave that up to you and Ingo. Otherwise: Acked-by: Peter Zijlstra Thanks!

Re: Preemptable Ticket Spinlock

2013-04-22 Thread Peter Zijlstra
On Mon, 2013-04-22 at 08:52 -0400, Rik van Riel wrote: > On 04/22/2013 07:51 AM, Peter Zijlstra wrote: > > On Sun, 2013-04-21 at 17:12 -0400, Rik van Riel wrote: > >> > >> If we always incremented the ticket number by 2 (instead of 1), then > >> we could use t
