[PATCH 07/16] mm: fix cache coloring on x86_64 architecture

2012-11-05 Thread Michel Lespinasse
A mmaps the file with pgoff 0, and program B mmaps the file with pgoff 1. The old code would align the mmaps, resulting in misaligned pages:
    A: 0123
    B: 123
After this patch, they are aligned so the pages line up:
    A: 0123
    B:  123
Signed-off-by: Michel Lespinasse Proposed-by: Rik van Riel
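
Editor's note: the alignment rule this snippet describes can be sketched in user-space C. This is only an illustration, not the code from the patch: color_align(), its parameters, and the 4 KiB page size are assumptions made for the example. The idea is to pick an address whose low bits match those of the file offset, so that the same file page gets the same cache color in every mapping.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT_SKETCH 12	/* assumed 4 KiB pages, for illustration only */

/*
 * Hypothetical helper: return the lowest address >= addr such that
 * (address & align_mask) == ((pgoff << PAGE_SHIFT_SKETCH) & align_mask).
 * align_mask must be a power of two minus one (for example 0x3fff).
 */
static uint64_t color_align(uint64_t addr, uint64_t pgoff, uint64_t align_mask)
{
	uint64_t align_offset = (pgoff << PAGE_SHIFT_SKETCH) & align_mask;

	assert((align_mask & (align_mask + 1)) == 0);
	/* Unsigned wrap-around is intended: this is a modular round-up. */
	return addr + ((align_offset - addr) & align_mask);
}
```

With a mask of 0x3fff, a mapping at pgoff 0 starting from 0x10000 stays at 0x10000, while a mapping at pgoff 1 lands at 0x11000, so page 1 of the file sits at the same offset within the aligned window in both programs.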

[PATCH 05/16] mm: vm_unmapped_area() lookup function

2012-11-05 Thread Michel Lespinasse
gap length - low/high address limits that the gap must fit into - alignment mask and offset Also update the generic arch_get_unmapped_area[_topdown] functions to make use of vm_unmapped_area() instead of implementing a brute force search. Signed-off-by: Michel Lespinasse Reviewed-by: Rik van
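
Editor's note: to make the query parameters listed in this snippet concrete (gap length, low/high limits, alignment mask and offset), here is a minimal user-space sketch. It is deliberately a plain linear scan over a sorted array, that is, the kind of brute force search the series replaces, and not the augmented-rbtree lookup itself. struct range, struct gap_query and find_gap() are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* An existing mapping [start, end); the array is sorted and non-overlapping. */
struct range {
	uint64_t start, end;
};

/* Hypothetical query descriptor mirroring the parameters in the snippet. */
struct gap_query {
	uint64_t length;	/* required gap length */
	uint64_t low_limit;	/* gap must start at or above this */
	uint64_t high_limit;	/* gap must end at or below this */
	uint64_t align_mask;	/* power of two minus one, or 0 */
	uint64_t align_offset;	/* desired value of (addr & align_mask) */
};

/* Return the lowest suitable address, or UINT64_MAX if no gap fits. */
static uint64_t find_gap(const struct range *maps, size_t n,
			 const struct gap_query *q)
{
	uint64_t gap_start = q->low_limit;
	size_t i;

	assert(q->low_limit <= q->high_limit);
	for (i = 0; i <= n; i++) {
		/* The gap ends at the next mapping, or at the high limit. */
		uint64_t gap_end = (i < n) ? maps[i].start : q->high_limit;
		uint64_t addr;

		if (gap_end > q->high_limit)
			gap_end = q->high_limit;
		/* Round the candidate up to the requested alignment. */
		addr = gap_start + ((q->align_offset - gap_start) & q->align_mask);
		if (addr >= gap_start && addr <= gap_end &&
		    gap_end - addr >= q->length)
			return addr;
		if (i < n && maps[i].end > gap_start)
			gap_start = maps[i].end;
	}
	return UINT64_MAX;
}
```

The scan costs O(n) per query; the point of the series is that an rbtree augmented with per-subtree gap information can answer the same query without visiting every mapping.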

[PATCH 04/16] mm: rearrange vm_area_struct for fewer cache misses

2012-11-05 Thread Michel Lespinasse
tree walk is in the first cache line. Signed-off-by: Michel Lespinasse Signed-off-by: Rik van Riel --- include/linux/mm_types.h | 12 1 files changed, 8 insertions(+), 4 deletions(-) diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 94fa52b28ee8..528da4abf8ee

[PATCH 00/16] mm: use augmented rbtrees for finding unmapped areas

2012-11-05 Thread Michel Lespinasse
h the desired arguments is quite shorter than duplicating the brute force algorithm all over the place. There is still a bit of repetition between various implementations of arch_get_unmapped_area[_topdown] functions that could probably be simplified somehow, but I feel we can keep that for a lat

[PATCH 01/16] mm: add anon_vma_lock to validate_mm()

2012-11-05 Thread Michel Lespinasse
: Bob Liu Signed-off-by: Michel Lespinasse --- mm/mmap.c |2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) diff --git a/mm/mmap.c b/mm/mmap.c index 2d942353d681..9a796c41e7d9 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -334,8 +334,10 @@ void validate_mm(struct mm_struct *mm)

Re: [PATCH 4/6] rbtree: faster augmented insert

2012-07-27 Thread Michel Lespinasse
On Fri, Jul 27, 2012 at 12:26 PM, Peter Zijlstra wrote: > On Fri, 2012-07-20 at 05:31 -0700, Michel Lespinasse wrote: >> --- a/lib/rbtree.c >> +++ b/lib/rbtree.c >> @@ -88,7 +88,8 @@ __rb_rotate_set_parents(struct rb_node *old, struct >> rb_node *new, >>

Re: [PATCH 4/6] rbtree: faster augmented insert

2012-07-27 Thread Michel Lespinasse
On Fri, Jul 27, 2012 at 1:04 PM, Peter Zijlstra wrote: > On Fri, 2012-07-20 at 05:31 -0700, Michel Lespinasse wrote: >> +static void augment_rotate(struct rb_node *rb_old, struct rb_node *rb_new) >> +{ >> + struct test_node *old = rb_entry(rb_old, struct test_node, r

Re: [PATCH 5/6] rbtree: faster augmented erase

2012-07-27 Thread Michel Lespinasse
On Fri, Jul 27, 2012 at 1:02 PM, Peter Zijlstra wrote: >On Fri, 2012-07-20 at 05:31 -0700, Michel Lespinasse wrote: >> --- a/lib/rbtree_test.c >> +++ b/lib/rbtree_test.c >> @@ -1,5 +1,6 @@ >> #include >> #include >> +#include >This confus

Re: [PATCH 5/6] rbtree: faster augmented erase

2012-07-27 Thread Michel Lespinasse
On Fri, Jul 27, 2012 at 5:44 PM, Michel Lespinasse wrote: > On Fri, Jul 27, 2012 at 1:02 PM, Peter Zijlstra wrote: >> As it stands rb_erase() isn't inlined and its rather big, >> why would you want to inline it for augmented callers? > > Just as the non-augmented rb_

Re: [PATCH 1/6] rbtree: rb_erase updates and comments

2012-07-29 Thread Michel Lespinasse
On Sat, Jul 28, 2012 at 9:04 PM, George Spelvin wrote: > I was just looking at the beginning of the 2-children case and wondering: > > + /* > +* Old is the node we want to erase. It's got left and right > +* children, which makes things difficult. Let'

Re: linux-next: build failure after merge of the akpm tree

2012-07-30 Thread Michel Lespinasse
On Mon, Jul 30, 2012 at 9:40 PM, Stephen Rothwell wrote: > Hi Andrew, > > After merging the akpm tree, today's linux-next build (x86_64 > allmodconfig) failed like this: > > net/ceph/osd_client.c: In function 'ceph_osdc_alloc_request': > net/ceph/osd_client.c:216:2: error: implicit declaration of

Re: linux-next: build failure after merge of the akpm tree

2012-07-31 Thread Michel Lespinasse
> rb, list nodes in ceph_osd_request") from the ceph tree. > > I added the following merge fix patch for today: > > From: Stephen Rothwell > Date: Tue, 31 Jul 2012 14:37:35 +1000 > Subject: [PATCH] libceph: remove rb_node initialisation > > Signed-off-by: Stephen Rothwell

Re: [PATCH] rbtree: Add some necessary condition checks

2013-08-26 Thread Michel Lespinasse
On Fri, Aug 23, 2013 at 7:45 AM, wrote: > From: Zhi Yong Wu > > Signed-off-by: Zhi Yong Wu > --- > include/linux/rbtree_augmented.h | 3 ++- > lib/rbtree.c | 5 +++-- > 2 files changed, 5 insertions(+), 3 deletions(-) So, you are saying that the checks are necessary, but y

Re: [PATCH] rwsem: add rwsem_is_contended

2013-09-01 Thread Michel Lespinasse
Hi Josef, On Fri, Aug 30, 2013 at 7:14 AM, Josef Bacik wrote: > Btrfs uses an rwsem to control access to its extent tree. Threads will hold a > read lock on this rwsem while they scan the extent tree, and if need_resched() > they will drop the lock and schedule. The transaction commit needs to

Re: [PATCH] rbtree: Add some necessary condition checks

2013-09-02 Thread Michel Lespinasse
On Sun, Sep 1, 2013 at 11:30 PM, Zhi Yong Wu wrote: > In Tue, Aug 27, 2013 at 6:01 AM, Michel Lespinasse wrote: >> On Fri, Aug 23, 2013 at 7:45 AM, wrote: >>> From: Zhi Yong Wu >>> >>> Signed-off-by: Zhi Yong Wu >>> --- >>> incl

Re: [PATCH] rbtree: Add some necessary condition checks

2013-09-02 Thread Michel Lespinasse
On Mon, Sep 2, 2013 at 9:45 PM, Zhi Yong Wu wrote: > On Mon, Sep 2, 2013 at 4:57 PM, Michel Lespinasse wrote: >> Thanks for the link - I now better understand where you are coming >> from with these fixes. >> >> Going back to the original message: >>

Re: [PATCH] rwsem: reduce spinlock contention in wakeup code path

2013-09-29 Thread Michel Lespinasse
could be worth trying to go back to a non-sleeping lock. That said, I am very scared of using rwlock_t here, and I would much prefer we choose a fair lock (either spinlock or a new rwlock implementation which guarantees not to starve any locker thread) -- Michel Lespinasse A program is never full

Re: [PATCH 1/7] mm: interval tree updates

2012-09-07 Thread Michel Lespinasse
On Fri, Sep 7, 2012 at 3:13 PM, Andrew Morton wrote: > On Tue, 4 Sep 2012 02:20:51 -0700 > Michel Lespinasse wrote: > >> This commit updates the generic interval tree code that was >> introduced in "mm: replace vma prio_tree with an interval tree". >> >

Re: [PATCH 1/7] mm: interval tree updates

2012-09-07 Thread Michel Lespinasse
On Fri, Sep 07, 2012 at 03:55:14PM -0700, Andrew Morton wrote: > On Fri, 7 Sep 2012 15:29:36 -0700 > Michel Lespinasse wrote: > > > > Ho hum. I don't think I can be bothered untangling all this. > > > > I don't think you should have to do it yourself e

[PATCH] perf: fix duplicate header inclusion

2012-10-09 Thread Michel Lespinasse
#include somehow got duplicated on its way to linus's tree (probably as a conflict resolution as things got sent through multiple trees) Signed-off-by: Michel Lespinasse --- tools/perf/util/include/linux/rbtree.h |1 - 1 files changed, 0 insertions(+), 1 deletions(-) diff --git a/

[PATCH] mm: protect against concurrent vma expansion

2012-11-30 Thread Michel Lespinasse
n a given mm would share the same anon_vma, which we already lock here. However this turned out to be difficult - all of the schemes I tried for refcounting the growable anon_vma and clearing turned out ugly. So, I'm now proposing only the minimal fix. Signed-off-by: Michel Lespinasse --- m

Re: [PATCH] mm: protect against concurrent vma expansion

2012-12-03 Thread Michel Lespinasse
On Mon, Dec 3, 2012 at 3:01 PM, Andrew Morton wrote: > On Fri, 30 Nov 2012 22:56:27 -0800 > Michel Lespinasse wrote: > >> expand_stack() runs with a shared mmap_sem lock. Because of this, there >> could be multiple concurrent stack expansions in the same mm, which may >

Re: [PATCH 2/2] mm/migration: Make rmap_walk_anon() and try_to_unmap_anon() more scalable

2012-12-04 Thread Michel Lespinasse
On Mon, Dec 3, 2012 at 6:17 AM, Mel Gorman wrote: > On Sat, Dec 01, 2012 at 09:15:38PM +0100, Ingo Molnar wrote: >> @@ -732,7 +732,7 @@ static int page_referenced_anon(struct p >> struct anon_vma_chain *avc; >> int referenced = 0; >> >> - anon_vma = page_lock_anon_vma(page); >> +

Re: [PATCH 2/2, v2] mm/migration: Make rmap_walk_anon() and try_to_unmap_anon() more scalable

2012-12-04 Thread Michel Lespinasse
ou forgot to rename anon_vma_unlock() too. But really, this is nitpicking. I like the idea behind the patch, and after giving it a close look, I couldn't find anything wrong with it. Reviewed-by: Michel Lespinasse -- Michel "Walken" Lespinasse A program is never fully debugged

Re: [PATCH 49/52] mm/rmap: Convert the struct anon_vma::mutex to an rwsem

2012-12-04 Thread Michel Lespinasse
we mutex_lock()ed we'll now down_write(). Looks good. Reviewed-by: Michel Lespinasse -- Michel "Walken" Lespinasse A program is never fully debugged until the last user dies.

Re: [PATCH] mm: protect against concurrent vma expansion

2012-12-04 Thread Michel Lespinasse
k during stack expansion is expected to be small: glibc doesn't use expandable stacks for the threads it creates, so having multiple growable stacks is actually uncommon and we don't expect the page table lock to get bounced between threads. Signed-off-by: Michel Lespinasse ---

Re: [PATCH 2/2, v2] mm/migration: Make rmap_walk_anon() and try_to_unmap_anon() more scalable

2012-12-04 Thread Michel Lespinasse
On Sun, Dec 2, 2012 at 7:12 AM, Ingo Molnar wrote: > Subject: [PATCH] mm/rmap, migration: Make rmap_walk_anon() and > try_to_unmap_anon() more scalable > > rmap_walk_anon() and try_to_unmap_anon() appears to be too > careful about locking the anon vma: while it needs protection > against anon vma

Re: [PATCH 2/2] rename NUMA fault handling functions

2012-10-20 Thread Michel Lespinasse
On Fri, Oct 19, 2012 at 4:41 AM, Peter Zijlstra wrote: > On Thu, 2012-10-18 at 17:20 -0400, Rik van Riel wrote: >> Having the function name indicate what the function is used >> for makes the code a little easier to read. Furthermore, >> the fault handling code largely consists of do__page >>

Re: [PATCH] rbtree: include linux/compiler.h for definition of __always_inline

2012-10-22 Thread Michel Lespinasse
error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or > ‘__attribute__’ before ‘void’ > > This patch includes linux/compiler.h in rbtree_augmented.h so that the > __always_inline macro is resolved correctly. > > Cc: Pekka Enberg > Cc: Michel Lespinasse > Cc: Ingo Molnar > Signed-off-by: Will

Re: [PATCH 0/3] remove kvm's use of augmented rbtree

2012-12-10 Thread Michel Lespinasse
> On Sat, Nov 24, 2012 at 9:40 PM, Michel Lespinasse wrote: >>> On Thu, Nov 22, 2012 at 9:49 PM, Michel Lespinasse >>> wrote: >>>> On Thu, Nov 22, 2012 at 9:14 AM, Sasha Levin >>>> wrote: >>>>> The following patch fixed the problem

[PATCH 0/3] fix missing rb_subtree_gap updates on vma insert/erase

2012-11-12 Thread Michel Lespinasse
e exception that the node being erased doesn't need to have an up to date rb_subtree_gap. These 3 patches apply on top of the stack I previously sent (or equally, on top of the last published mmotm). Michel Lespinasse (3): mm: ensure safe rb_subtree_gap update when inserting new VMA mm: e

[PATCH 1/3] mm: ensure safe rb_subtree_gap update when inserting new VMA

2012-11-12 Thread Michel Lespinasse
Levin for uncovering the problem and to Hugh Dickins for coming up with a simpler test case) Reported-by: Sasha Levin Signed-off-by: Michel Lespinasse --- mm/mmap.c | 27 +++ 1 files changed, 15 insertions(+), 12 deletions(-) diff --git a/mm/mmap.c b/mm/mmap.c index 6

[PATCH 3/3] mm: debug code to verify rb_subtree_gap updates are safe

2012-11-12 Thread Michel Lespinasse
ap_update() would fail to propagate the rb_subtree_gap updates as high up as necessary. Signed-off-by: Michel Lespinasse --- mm/mmap.c | 88 ++--- 1 files changed, 55 insertions(+), 33 deletions(-) diff --git a/mm/mmap.c b/mm/mmap.c index c

[PATCH 2/3] mm: ensure safe rb_subtree_gap update when removing VMA

2012-11-12 Thread Michel Lespinasse
updated, and we want to make sure vma_rb_erase() runs before there are any such stale rb_subtree_gap values in the rbtree. (I don't know of a reproducible test case for this particular issue) Signed-off-by: Michel Lespinasse --- mm/mmap.c |6 +++--- 1 files changed, 3 insertions(+), 3 del

Re: [PATCH 03/16] mm: check rb_subtree_gap correctness

2012-11-12 Thread Michel Lespinasse
On Fri, Nov 9, 2012 at 6:13 AM, Sasha Levin wrote: > While fuzzing with trinity inside a KVM tools (lkvm) guest, using today's > -next > kernel, I'm getting these: > > [ 117.007714] free gap 7fba0dd1c000, correct 7fba0dcfb000 > [ 117.019773] map_count 750 rb -1 > [ 117.028362] [ cu

Re: [PATCH 15/16] mm: use vm_unmapped_area() on sparc32 architecture

2012-11-05 Thread Michel Lespinasse
On Mon, Nov 5, 2012 at 5:25 PM, David Miller wrote: > From: Michel Lespinasse > Date: Mon, 5 Nov 2012 14:47:12 -0800 > >> Update the sparc32 arch_get_unmapped_area function to make use of >> vm_unmapped_area() instead of implementing a brute force search. >> >>

Re: [PATCH 01/16] mm: add anon_vma_lock to validate_mm()

2012-11-06 Thread Michel Lespinasse
Adding Sasha and Bob, which I forgot to CC in the original message. On Mon, Nov 5, 2012 at 3:06 PM, Rik van Riel wrote: > On 11/05/2012 05:46 PM, Michel Lespinasse wrote: >> >> Iterate vma->anon_vma_chain without anon_vma_lock may cause NULL ptr deref >> in >>

Re: mm: NULL ptr deref in anon_vma_interval_tree_verify

2012-11-06 Thread Michel Lespinasse
On Mon, Nov 5, 2012 at 5:41 AM, Michel Lespinasse wrote: > On Sun, Nov 4, 2012 at 8:44 PM, Michel Lespinasse wrote: >> On Sun, Nov 4, 2012 at 8:14 PM, Bob Liu wrote: >>> Hmm, I attached a simple fix patch. >> >> Reviewed-by: Michel Lespinasse >> (also ran s

Re: mm: NULL ptr deref in anon_vma_interval_tree_verify

2012-11-06 Thread Michel Lespinasse
On Tue, Nov 6, 2012 at 12:24 AM, Michel Lespinasse wrote: > On Mon, Nov 5, 2012 at 5:41 AM, Michel Lespinasse wrote: >> On Sun, Nov 4, 2012 at 8:44 PM, Michel Lespinasse wrote: >>> On Sun, Nov 4, 2012 at 8:14 PM, Bob Liu wrote: >>>> Hmm, I attached a simple

Re: linux-next: build warning after merge of the final tree (akpm tree related)

2012-11-08 Thread Michel Lespinasse
ap.c:60:16: warning: unused variable 'start_addr' > [-Wunused-variable] > > Introduced by commit "mm: use vm_unmapped_area() on arm architecture". Sorry for the mistakes. The following changes should fix what's been reported so far. commit 1c98949798ce7a1d4a910775

Re: [next:akpm 136/313] mm/mmap.c:1878:6: error: 'mm' undeclared

2012-11-08 Thread Michel Lespinasse
reported only once for > each function it appears in commit 34550b95185c1ecfa8882664744c14edda385868 Author: Michel Lespinasse Date: Thu Nov 8 22:14:34 2012 -0800 fix mm: augment vma rbtree with rb_subtree_gap diff --git a/mm/mmap.c b/mm/mmap.c index d12c69eaf23f..0b8f9d83e2e2 100644 -

Re: [next:akpm 157/313] arch/tile/mm/hugetlbpage.c:256:20: error: 'mm' undeclared

2012-11-08 Thread Michel Lespinasse
this > function) > arch/tile/mm/hugetlbpage.c:256:20: note: each undeclared identifier is > reported only once for each function it appears in commit 86234092170b43771c3f6257cb320ff6e2c10c52 Author: Michel Lespinasse Date: Thu Nov 8 22:13:58 2012 -0800 fix mm: use vm_unmapped_

Issues with "x86, um: switch to generic fork/vfork/clone" commit

2012-11-09 Thread Michel Lespinasse
Hi, I'm having an issue booting current linux-next kernels on my test machines. Userspace crashes when it's supposed to pivot to the rootfs. With the loglevel=8 kernel parameter, the last messages I see are: Checking root filesystem in pivot_root init. [6.252717] usb 2-1: link qh256-0001/

Re: Issues with "x86, um: switch to generic fork/vfork/clone" commit

2012-11-09 Thread Michel Lespinasse
On Fri, Nov 9, 2012 at 8:51 PM, Al Viro wrote: > On Fri, Nov 09, 2012 at 08:36:53PM -0800, Michel Lespinasse wrote: >> Hi, >> >> I'm having an issue booting current linux-next kernels on my test >> machines. Userspace crashes when it's supposed to pivot

Re: Issues with "x86, um: switch to generic fork/vfork/clone" commit

2012-11-09 Thread Michel Lespinasse
On Fri, Nov 9, 2012 at 9:33 PM, Al Viro wrote: > On Fri, Nov 09, 2012 at 08:57:58PM -0800, Michel Lespinasse wrote: >> On Fri, Nov 9, 2012 at 8:51 PM, Al Viro wrote: >> > On Fri, Nov 09, 2012 at 08:36:53PM -0800, Michel Lespinasse wrote: >> >> Hi, >> >>

Re: Issues with "x86, um: switch to generic fork/vfork/clone" commit

2012-11-10 Thread Michel Lespinasse
On Fri, Nov 9, 2012 at 11:33 PM, Al Viro wrote: > Could you verify that this on top of for-next gets the things working again? > It's a very lazy way to deal with that (we don't want to bother with > restoring extras, at the very least), but the rest can go separately (and > is shared with mainlin

Re: [RFC PATCH 3/3 -v2] x86,smp: auto tune spinlock backoff delay factor

2012-12-26 Thread Michel Lespinasse
On Wed, Dec 26, 2012 at 11:51 AM, Rik van Riel wrote: > On 12/26/2012 02:10 PM, Eric Dumazet wrote: >> We might try to use a hash on lock address, and an array of 16 different >> delays so that different spinlocks have a chance of not sharing the same >> delay. >> >> With following patch, I get 98

Re: [RFC PATCH 3/3 -v2] x86,smp: auto tune spinlock backoff delay factor

2012-12-29 Thread Michel Lespinasse
On Wed, Dec 26, 2012 at 11:10 AM, Eric Dumazet wrote: > I did some tests with your patches with following configuration : > > tc qdisc add dev eth0 root htb r2q 1000 default 3 > (to force a contention on qdisc lock, even with a multi queue net > device) > > and 24 concurrent "netperf -t UDP_STREAM

[PATCH 0/2] extend synchro-test module to test spinlocks too

2012-12-30 Thread Michel Lespinasse
' the back story with synchro-test though - they seem to have been stuck in andrew's tree for a very long time now. Is there any reason delaying their inclusion or is it just that nobody's been pushing for them ? Michel Lespinasse (2): add spinlock test to synchro-test module Docu

[PATCH 2/2] Document default load and interval values in synchro-test module

2012-12-30 Thread Michel Lespinasse
The synchro-test module default parameters are to keep the lock for 2uS and wait 2uS between release and the next attempted acquisition. Having the documentation wrong on this point was quite confusing ! Signed-off-by: Michel Lespinasse --- Documentation/synchro-test.txt |4 ++-- 1 files

[PATCH 1/2] add spinlock test to synchro-test module

2012-12-30 Thread Michel Lespinasse
Signed-off-by: Michel Lespinasse --- kernel/synchro-test.c | 76 + 1 files changed, 70 insertions(+), 6 deletions(-) diff --git a/kernel/synchro-test.c b/kernel/synchro-test.c index 76e3ad39505f..9abfa955d69e 100644 --- a/kernel/synchro-test.c

ticket spinlock proportional backoff experiments

2013-01-01 Thread Michel Lespinasse
On Fri, Dec 21, 2012 at 3:49 PM, Rik van Riel wrote: > Many spinlocks are embedded in data structures; having many CPUs > pounce on the cache line the lock is in will slow down the lock > holder, and can cause system performance to fall off a cliff. > > The paper "Non-scalable locks are dangerous"

[PATCH 1/2] x86,smp: simplify __ticket_spin_lock

2013-01-01 Thread Michel Lespinasse
Just cosmetic - avoid an unnecessary goto construct Signed-off-by: Michel Lespinasse --- arch/x86/include/asm/spinlock.h |7 ++- 1 files changed, 2 insertions(+), 5 deletions(-) diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h index 2a45eb0cdb2c

[PATCH 2/2] x86,smp: proportional backoff for ticket spinlocks

2013-01-01 Thread Michel Lespinasse
e head position among waiters. Signed-off-by: Michel Lespinasse --- arch/x86/include/asm/spinlock.h |2 ++ arch/x86/kernel/smp.c | 33 +++-- 2 files changed, 29 insertions(+), 6 deletions(-) diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include

Re: mmap() scalability in the presence of the MAP_POPULATE flag

2013-01-02 Thread Michel Lespinasse
On Wed, Jan 2, 2013 at 8:50 AM, Roman Dubtsov wrote: > Concurrent mmap() calls from the same process are serialized via downing > mm->mmap_sem for write. This means that operations like populating the > pages which do not alter vmas are also performed serially. Anecdotal > data from two machines I

Re: [RFC PATCH 0/5] x86,smp: make ticket spinlock proportional backoff w/ auto tuning

2013-01-03 Thread Michel Lespinasse
On Wed, Jan 2, 2013 at 9:15 PM, Rik van Riel wrote: > The v2 series integrates several ideas from Michel Lespinasse > and Eric Dumazet, which should result in better throughput and > nicer behaviour in situations with contention on multiple locks. > > Please let me know if you

Re: [RFC PATCH 1/5] x86,smp: move waiting on contended ticket lock out of line

2013-01-03 Thread Michel Lespinasse
On Wed, Jan 2, 2013 at 9:18 PM, Rik van Riel wrote: > Moving the wait loop for congested loops to its own function allows > us to add things to that wait loop, without growing the size of the > kernel text appreciably. Looks good :) Still-reviewed-by: Michel Lespinasse -- Miche

Re: [RFC PATCH 2/5] x86,smp: proportional backoff for ticket spinlocks

2013-01-03 Thread Michel Lespinasse
released. >> >> The number 50 is likely to be wrong for many setups, and >> this patch is mostly to illustrate the concept of proportional >> backup. The next patch automatically tunes the delay value. >> >> Signed-off-by: Rik van Riel >> Signed-off-by: Mich
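
Editor's note: the concept discussed in this thread, a ticket lock whose waiters delay in proportion to their distance from the head of the queue, can be sketched in user space with C11 atomics. This is an illustration under assumptions, not the x86 patch: struct ticket_lock, the function names, and the busy-wait loop are invented for the example, and the constant 50 is simply the arbitrary value the snippet mentions.

```c
#include <assert.h>
#include <stdatomic.h>

#define DELAY_PER_WAITER 50	/* the arbitrary constant discussed above */

/* head: ticket currently being served; tail: next ticket to hand out. */
struct ticket_lock {
	atomic_uint head;
	atomic_uint tail;
};

static void ticket_lock_acquire(struct ticket_lock *l)
{
	unsigned int me = atomic_fetch_add(&l->tail, 1);

	for (;;) {
		unsigned int head = atomic_load(&l->head);
		unsigned int loops;

		if (head == me)
			return;
		/* Wait longer the further we are from the head of the queue. */
		loops = DELAY_PER_WAITER * (me - head);
		for (volatile unsigned int i = 0; i < loops; i++)
			;	/* stand-in for a cpu_relax()-style pause */
	}
}

static void ticket_lock_release(struct ticket_lock *l)
{
	/* Releasing a lock nobody holds would corrupt the queue. */
	assert(atomic_load(&l->head) != atomic_load(&l->tail));
	atomic_fetch_add(&l->head, 1);
}
```

Waiters far back in the queue poll the shared cache line less often, which leaves it mostly to the lock holder and the next waiter in line.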

Re: [RFC PATCH 3/5] x86,smp: auto tune spinlock backoff delay factor

2013-01-03 Thread Michel Lespinasse
On Wed, Jan 2, 2013 at 9:23 PM, Rik van Riel wrote: > Proportional spinlock delay with a high delay factor works well > when there is lots contention on a lock. Likewise, a smaller > delay factor works well when a lock is lightly contended. > > Making the code auto-tune the delay factor results in

Re: [RFC PATCH 4/5] x86,smp: keep spinlock delay values per hashed spinlock address

2013-01-03 Thread Michel Lespinasse
On Wed, Jan 2, 2013 at 9:24 PM, Rik van Riel wrote: > From: Eric Dumazet > > Eric Dumazet found a regression with the spinlock backoff code, > in workloads where multiple spinlocks were contended, each having > a different wait time. I think you should really clarify that the regression was obse

Re: [PATCH] mm: protect against concurrent vma expansion

2013-01-03 Thread Michel Lespinasse
On Thu, Jan 3, 2013 at 4:40 PM, Simon Jeons wrote: > On Wed, 2012-12-19 at 19:01 -0800, Michel Lespinasse wrote: >> Hi Simon, >> >> On Wed, Dec 19, 2012 at 5:56 PM, Simon Jeons wrote: >> > One question. >> > >> > I found that mainly callsite of e

Re: mmap() scalability in the presence of the MAP_POPULATE flag

2013-01-04 Thread Michel Lespinasse
On Fri, Jan 04, 2013 at 12:09:37AM +0700, Roman Dubtsov wrote: > On Wed, 2013-01-02 at 16:09 -0800, Michel Lespinasse wrote: > > > Is there an interest in fixing this or concurrent mmaps() from the same > > > process are too much of a corner case to worry about it? > >

Re: [PATCH 0/9] Avoid populating unbounded num of ptes with mmap_sem held

2013-01-04 Thread Michel Lespinasse
On Fri, Jan 4, 2013 at 10:16 AM, Andy Lutomirski wrote: > I still have quite a few instances of 2-6 ms of latency due to > "call_rwsem_down_read_failed __do_page_fault do_page_fault > page_fault". Any idea why? I don't know any great way to figure out > who is holding mmap_sem at the time. Give

Re: [PATCH] mm: thp: Acquire the anon_vma rwsem for lock during split

2013-01-04 Thread Michel Lespinasse
On Fri, Jan 4, 2013 at 6:08 AM, Mel Gorman wrote: > Despite the reason for these commits, NUMA balancing is not the direct > source of the problem. split_huge_page() expected the anon_vma lock to be > exclusive to serialise the whole split operation. Ordinarily it is expected > that the anon_vma l

Re: mmap() scalability in the presence of the MAP_POPULATE flag

2013-01-04 Thread Michel Lespinasse
On Fri, Jan 4, 2013 at 10:40 PM, Roman Dubtsov wrote: > On Fri, 2013-01-04 at 03:57 -0800, Michel Lespinasse wrote: >> If this doesn't help, could you please send me your test case ? I >> think you described enough of it that I would be able to reproduce it >> give

Re: [PATCH] mm: protect against concurrent vma expansion

2012-12-19 Thread Michel Lespinasse
Hi Simon, On Wed, Dec 19, 2012 at 5:56 PM, Simon Jeons wrote: > One question. > > I found that mainly callsite of expand_stack() is #PF, but it holds > mmap_sem each time before call expand_stack(), how can hold a *shared* > mmap_sem happen? the #PF handler calls down_read(&mm->mmap_sem) before

[PATCH 0/9] Avoid populating unbounded num of ptes with mmap_sem held

2012-12-20 Thread Michel Lespinasse
while we're trying to populate them. It adds a new VM_POPULATE flag on the mappings we do want to populate, so that if userspace replaces them with mappings it doesn't want populated, mm_populate() won't populate those replacement mappings. Michel Lespinasse (9): mm: make ml

[PATCH 1/9] mm: make mlockall preserve flags other than VM_LOCKED in def_flags

2012-12-20 Thread Michel Lespinasse
On most architectures, def_flags is either 0 or VM_LOCKED depending on whether mlockall(MCL_FUTURE) was called. However, this is not an absolute rule as kvm support on s390 may set the VM_NOHUGEPAGE flag in def_flags. We don't want mlockall to clear that. Signed-off-by: Michel Lespi
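
Editor's note: the behaviour this snippet asks for, changing only the VM_LOCKED bit of def_flags and leaving every other bit alone, can be sketched as below. The *_SKETCH constants and update_def_flags() are stand-ins defined for the example; their numeric values are illustrative and should not be read as the kernel's definitions.

```c
#include <assert.h>

/* Illustrative stand-ins; the values are assumptions, not kernel ABI. */
#define VM_LOCKED_SKETCH	0x00002000UL
#define VM_NOHUGEPAGE_SKETCH	0x40000000UL
#define MCL_CURRENT_SKETCH	1
#define MCL_FUTURE_SKETCH	2

/* Hypothetical helper: compute the new def_flags for an mlockall() call. */
static unsigned long update_def_flags(unsigned long def_flags, int mcl_flags)
{
	assert((mcl_flags & ~(MCL_CURRENT_SKETCH | MCL_FUTURE_SKETCH)) == 0);

	/* Clear only the lock bit, so flags such as NOHUGEPAGE survive. */
	def_flags &= ~VM_LOCKED_SKETCH;
	if (mcl_flags & MCL_FUTURE_SKETCH)
		def_flags |= VM_LOCKED_SKETCH;
	return def_flags;
}
```

Assigning def_flags outright to either 0 or the lock bit would be simpler, but it would silently drop any other flag stored there, which is the problem the snippet describes for kvm on s390.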

[PATCH 4/9] mm: use mm_populate() for blocking remap_file_pages()

2012-12-20 Thread Michel Lespinasse
Signed-off-by: Michel Lespinasse --- mm/fremap.c | 22 ++ 1 files changed, 6 insertions(+), 16 deletions(-) diff --git a/mm/fremap.c b/mm/fremap.c index 2db886e31044..b42e32171530 100644 --- a/mm/fremap.c +++ b/mm/fremap.c @@ -129,6 +129,7 @@ SYSCALL_DEFINE5

[PATCH 3/9] mm: introduce mm_populate() for populating new vmas

2012-12-20 Thread Michel Lespinasse
-by: Andy Lutomirski Signed-off-by: Michel Lespinasse --- fs/aio.c |6 +- include/linux/mm.h | 18 +++--- ipc/shm.c | 12 +++- mm/mlock.c | 17 +++-- mm/mmap.c | 20 +++- mm/nommu.c |5

[PATCH 5/9] mm: use mm_populate() when adjusting brk with MCL_FUTURE in effect.

2012-12-20 Thread Michel Lespinasse
Signed-off-by: Michel Lespinasse --- mm/mmap.c | 18 ++ 1 files changed, 14 insertions(+), 4 deletions(-) diff --git a/mm/mmap.c b/mm/mmap.c index a16fc499dbd1..4c8d39e64e80 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -240,6 +240,7 @@ SYSCALL_DEFINE1(brk, unsigned long, brk

[PATCH 7/9] mm: remove flags argument to mmap_region

2012-12-20 Thread Michel Lespinasse
After the MAP_POPULATE handling has been moved to mmap_region() call sites, the only remaining use of the flags argument is to pass the MAP_NORESERVE flag. This can be just as easily handled by do_mmap_pgoff(), so do that and remove the mmap_region() flags parameter. Signed-off-by: Michel

[PATCH 8/9] mm: directly use __mlock_vma_pages_range() in find_extend_vma()

2012-12-20 Thread Michel Lespinasse
n't release the mmap_sem while allocating new stack pages. This is deemed acceptable, because the stack vmas grow by a bounded number of pages at a time, and these are anon pages so we don't have to read from disk to populate them. Signed-off-by: Michel Lespinasse --- include/linux/mm.

[PATCH 6/9] mm: use mm_populate() for mremap() of VM_LOCKED vmas

2012-12-20 Thread Michel Lespinasse
Signed-off-by: Michel Lespinasse --- mm/mremap.c | 25 + 1 files changed, 13 insertions(+), 12 deletions(-) diff --git a/mm/mremap.c b/mm/mremap.c index 1b61c2d3307a..c5a8bf344b1f 100644 --- a/mm/mremap.c +++ b/mm/mremap.c @@ -208,7 +208,7 @@ unsigned long

[PATCH 2/9] mm: remap_file_pages() fixes

2012-12-20 Thread Michel Lespinasse
ause MAP_POPULATE wasn't being passed to mmap_region(). The desired behavior is that we want the pages to be populated and locked if the vma is marked as VM_LOCKED, or to be populated if the MAP_NONBLOCK flag is not passed to remap_file_pages(). Signed-off-by: Michel Lespinasse --- mm/fr

[PATCH 9/9] mm: introduce VM_POPULATE flag to better deal with racy userspace programs

2012-12-20 Thread Michel Lespinasse
mapping is also one that the user has requested (using MAP_SHARED, MAP_LOCKED or mlock) to be populated. Signed-off-by: Michel Lespinasse --- include/linux/mm.h |1 + include/linux/mman.h |4 +++- mm/fremap.c | 12 ++-- mm/mlock.c | 19 ++- mm

Re: [PATCH 0/9] Avoid populating unbounded num of ptes with mmap_sem held

2012-12-21 Thread Michel Lespinasse
20, 2012 at 4:49 PM, Michel Lespinasse wrote: > We have many vma manipulation functions that are fast in the typical case, > but can optionally be instructed to populate an unbounded number of ptes > within the region they work on: > - mmap with MAP_POPULATE or MAP_

Re: [PATCH 0/9] Avoid populating unbounded num of ptes with mmap_sem held

2012-12-21 Thread Michel Lespinasse
On Fri, Dec 21, 2012 at 4:36 PM, Andy Lutomirski wrote: > On Thu, Dec 20, 2012 at 4:49 PM, Michel Lespinasse wrote: >> We have many vma manipulation functions that are fast in the typical case, >> but can optionally be instructed to populate an unbounded number of ptes >> w

Re: [PATCH 0/9] Avoid populating unbounded num of ptes with mmap_sem held

2012-12-21 Thread Michel Lespinasse
On Fri, Dec 21, 2012 at 5:09 PM, Andy Lutomirski wrote: > On Fri, Dec 21, 2012 at 4:59 PM, Michel Lespinasse wrote: >> On Fri, Dec 21, 2012 at 4:36 PM, Andy Lutomirski wrote: >>> Something's buggy here. My evil test case is stuck with lots of >>> thre

Re: [RFC PATCH 1/3] x86,smp: move waiting on contended lock out of line

2012-12-21 Thread Michel Lespinasse
On Fri, Dec 21, 2012 at 3:50 PM, Rik van Riel wrote: > diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h > index 33692ea..2a45eb0 100644 > --- a/arch/x86/include/asm/spinlock.h > +++ b/arch/x86/include/asm/spinlock.h > @@ -34,6 +34,8 @@ > # define UNLOCK_LOCK_PREFIX >

Re: [RFC PATCH 2/3] x86,smp: proportional backoff for ticket spinlocks

2012-12-21 Thread Michel Lespinasse
data structure with > embedded spinlock, the lock holder has a better chance of > making progress. > > Signed-off-by: Rik van Riel Looks fine to me other than the arbitrary-ness of 50 Reviewed-by: Michel Lespinasse -- Michel "Walken" Lespinasse A program is never fu

Re: [RFC PATCH 3/3] x86,smp: auto tune spinlock backoff delay factor

2012-12-21 Thread Michel Lespinasse
On Fri, Dec 21, 2012 at 3:51 PM, Rik van Riel wrote: > Subject: x86,smp: auto tune spinlock backoff delay factor > > Many spinlocks are embedded in data structures; having many CPUs > pounce on the cache line the lock is in will slow down the lock > holder, and can cause system performance to fall

Re: [PATCH 0/9] Avoid populating unbounded num of ptes with mmap_sem held

2012-12-22 Thread Michel Lespinasse
On Fri, Dec 21, 2012 at 6:16 PM, Andy Lutomirski wrote: > On Fri, Dec 21, 2012 at 5:59 PM, Michel Lespinasse wrote: >> Could you share your test case so I can try reproducing the issue >> you're seeing ? > > Not so easy. My test case is a large chunk of a high-frequenc

[PATCH 10/9] mm: make do_mmap_pgoff return populate as a size in bytes, not as a bool

2012-12-22 Thread Michel Lespinasse
as a size rather than as a boolean, so we don't have to duplicate the size rounding logic in mm_populate(). Signed-off-by: Michel Lespinasse --- fs/aio.c |5 ++--- include/linux/mm.h |2 +- ipc/shm.c |4 ++-- mm/mmap.c |6 +++--- mm/nommu.c

Re: [PATCH 2/3] x86,mm: drop TLB flush from ptep_set_access_flags

2012-11-18 Thread Michel Lespinasse
On Sat, Nov 17, 2012 at 1:53 PM, Shentino wrote: > I'm actually curious if the architecture docs/software developer > manuals for IA-32 mandate any TLB invalidations on a #PF > > Is there any official vendor documentation on the subject? Yes. Quoting a prior email: Actually, it is architected on

Re: [PATCH 1/6] mm: use vma_pages() to replace (vm_end - vm_start) >> PAGE_SHIFT

2013-04-18 Thread Michel Lespinasse
On Mon, Apr 15, 2013 at 5:48 AM, Libin wrote: > (*->vm_end - *->vm_start) >> PAGE_SHIFT operation is implemented > as an inline function vma_pages() in linux/mm.h, so using it. > > Signed-off-by: Libin Looks good to me. Reviewed-by: Michel Lespinasse -- Michel "Wa
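
The replacement being reviewed can be illustrated with a small userspace sketch. `struct vma_sketch`, `SKETCH_PAGE_SHIFT` and `vma_pages_sketch()` are stand-ins invented for this example, and the 4 KiB page size is an assumption. The kernel's own definitions live in its headers.

```c
#include <assert.h>

/* Assumed page size of 4 KiB (2^12 bytes); the real value is per-arch. */
#define SKETCH_PAGE_SHIFT 12

/* Stand-in for the two vm_area_struct fields the helper reads. */
struct vma_sketch {
    unsigned long vm_start; /* first byte of the mapping */
    unsigned long vm_end;   /* first byte after the mapping */
};

/* Number of pages spanned by the mapping, as in the open-coded form. */
static unsigned long vma_pages_sketch(const struct vma_sketch *vma)
{
    assert(vma->vm_end >= vma->vm_start);
    return (vma->vm_end - vma->vm_start) >> SKETCH_PAGE_SHIFT;
}
```

Calling the helper instead of repeating the shift at each call site keeps the page-count computation in one place.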

Re: Device driver memory 'mmap()' function helper cleanup

2013-04-19 Thread Michel Lespinasse
On Tue, Apr 16, 2013 at 8:12 PM, Linus Torvalds wrote: > Guys, I just pushed out a new helper function intended for cleaning up > various device driver mmap functions, because they are rather messy, > and at least part of the problem was the bad impedance between what a > driver author would want

Re: Performance regression from switching lock to rw-sem for anon-vma tree

2013-06-14 Thread Michel Lespinasse
On Fri, Jun 14, 2013 at 3:31 PM, Davidlohr Bueso wrote: > A few ideas that come to mind are avoiding taking the ->wait_lock and > avoid dealing with waiters when doing the optimistic spinning (just like > mutexes do). > > I agree that we should first deal with the optimistic spinning before > addi

trigger_all_cpu_backtrace() is ignored on x86

2013-06-05 Thread Michel Lespinasse
Hi, I am having a funny issue with code that tries to use trigger_all_cpu_backtrace(). I would expect this function to dump backtraces on architectures that support it, including x86. However as it turns out, include/linux/nmi.h includes asm/irq.h but not asm/nmi.h, so it misses the arch/x86/inclu

[PATCH] x86: fix trigger_all_cpu_backtrace() implementation

2013-06-06 Thread Michel Lespinasse
hows NMI backtraces on all CPUs Signed-off-by: Michel Lespinasse --- arch/x86/include/asm/irq.h| 5 + arch/x86/include/asm/nmi.h| 4 +--- arch/x86/kernel/apic/hw_nmi.c | 1 + 3 files changed, 7 insertions(+), 3 deletions(-) diff --git a/arch/x86/include/asm/irq.h b/arch/x86/include/as

Re: [PATCH 0/2] rwsem: performance enhancements for systems with many cores

2013-06-21 Thread Michel Lespinasse
On Fri, Jun 21, 2013 at 5:00 PM, Davidlohr Bueso wrote: > On Fri, 2013-06-21 at 16:51 -0700, Tim Chen wrote: >> In this patchset, we introduce two optimizations to read write semaphore. >> The first one reduces cache bouncing of the sem->count field >> by doing a pre-read of the sem->count and avo

Re: [PATCH 1/3] rbtree_test: use pr_info for module prefix in messages

2013-03-21 Thread Michel Lespinasse
On Thu, Mar 21, 2013 at 7:51 PM, Davidlohr Bueso wrote: > On Tue, 2013-03-19 at 11:54 -0600, Shuah Khan wrote: >> On Tue, Mar 19, 2013 at 11:14 AM, Davidlohr Bueso >> wrote: >> > On Tue, 2013-03-19 at 10:29 -0600, Shuah Khan wrote: >> >> On Mon, Mar 18, 2013 at 5:20 PM, Davidlohr Bueso >> >> wr

Re: [PATCH 3/3] rbtree_test: add more rbtree integrity checks

2013-03-21 Thread Michel Lespinasse
On Mon, Mar 18, 2013 at 4:21 PM, Davidlohr Bueso wrote: > When checking the rbtree, account for more properties: > >- Both children of a red node are black. >- The tree has at least 2**bh(v)-1 internal nodes. > - WARN_ON_ONCE(is_red(rb) && > -(!rb
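
The two properties listed above can be sketched in standalone C as follows. This is not the rbtree_test code: `struct rb_sketch` and both functions are hypothetical, and the sketch also assumes the usual rule that every path to a leaf carries the same number of black nodes, since black height is otherwise undefined.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node type; NULL children play the role of black leaves. */
struct rb_sketch {
    bool red;
    struct rb_sketch *left, *right;
};

/*
 * Returns the number of black nodes from n down to a leaf, counting n
 * and the NULL leaf, or -1 on a violation. *nodes accumulates the
 * number of internal nodes visited.
 */
static int black_count_sketch(const struct rb_sketch *n, unsigned long *nodes)
{
    int l, r;

    if (!n)
        return 1;
    /* Both children of a red node must be black. */
    if (n->red && ((n->left && n->left->red) || (n->right && n->right->red)))
        return -1;
    l = black_count_sketch(n->left, nodes);
    r = black_count_sketch(n->right, nodes);
    if (l < 0 || r < 0 || l != r)
        return -1;
    (*nodes)++;
    return l + (n->red ? 0 : 1);
}

/* True if the tree passes the colour check and has >= 2^bh - 1 nodes. */
static bool rb_sketch_valid(const struct rb_sketch *root)
{
    unsigned long nodes = 0;
    int total, bh;

    if (!root)
        return true;
    total = black_count_sketch(root, &nodes);
    if (total < 0)
        return false;
    bh = total - (root->red ? 0 : 1); /* black height excludes the root */
    return nodes >= (1UL << bh) - 1;
}
```

In a tree that already satisfies the colour rules the node-count bound always holds, so in a test module it mainly serves as a cross-check on the bookkeeping.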

Re: [PATCH 7/7] ipc,sem: fine grained locking for semtimedop

2013-03-22 Thread Michel Lespinasse
lock(&sem->lock); > + } > + locknum = -1; > + } > + return locknum; > +} That's all I have. Very nice test results BTW! Reviewed-by: Michel Lespinasse -- Michel "Walken" Lespinasse A program is never fully debugged until

Re: [PATCH -mm -next] ipc,sem: fix lockdep false positive

2013-03-25 Thread Michel Lespinasse
On Mon, Mar 25, 2013 at 1:38 PM, Rik van Riel wrote: > On Mon, 25 Mar 2013 16:21:22 -0400 > Sasha Levin wrote: > >> On 03/20/2013 03:55 PM, Rik van Riel wrote: >> > Include lkml in the CC: this time... *sigh* >> > ---8<--- >> > >> > This series makes the sysv semaphore code more scalable, >> > by

Re: [PATCH -mm -next] ipc,sem: fix lockdep false positive

2013-03-25 Thread Michel Lespinasse
On Mon, Mar 25, 2013 at 2:42 PM, Michel Lespinasse wrote: > I'll be surprised if it does, because we don't actually have single > depth nesting here... > Adding Peter & Ingo for advice about how to proceed > (the one solution I know would involve using arch_spin_lock(

Re: [PATCH -mm -next] ipc,sem: fix lockdep false positive

2013-03-26 Thread Michel Lespinasse
On Tue, Mar 26, 2013 at 6:19 AM, Peter Zijlstra wrote: > On Mon, 2013-03-25 at 14:42 -0700, Michel Lespinasse wrote: >> depth nesting here... >> Adding Peter & Ingo for advice about how to proceed > >> > +++ b/ipc/sem.c >> > @@ -357,7 +357,7 @@ static inli

infiniband build warning

2013-07-21 Thread Michel Lespinasse
Hi, I am seeing build warnings in drivers/infiniband/core/cma.c starting with v3.11-rc1. These can be reproduced with gcc 4.6.3. Would you consider applying the following fix ? (The compiler warning seems benign as I could easily convince myself that the variable won't be used uninitialized, bu

Re: [RFC PATCH 1/2] ipc: introduce obtaining a lockless ipc object

2013-03-01 Thread Michel Lespinasse
> - out = idr_find(&ids->ipcs_idr, lid); > - if (out == NULL) { > - rcu_read_unlock(); > + if (!out) I think this should be if (IS_ERR(out)) ? Looks great otherwise. Acked-by: Michel Lespinasse -- Michel "Walken" Lespinasse A pr
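
The difference between `!out` and `IS_ERR(out)` matters only if the lookup reports failure through an error-encoded pointer rather than NULL. The truncated snippet does not show this, so it is an assumption here. Below is a userspace re-creation of that convention, with hypothetical `_sketch` names standing in for the kernel's `ERR_PTR()` and `IS_ERR()`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Largest errno value that is encoded in a pointer (4095 in the kernel). */
#define MAX_ERRNO_SKETCH 4095

/* Encode a negative errno value as a pointer. */
static inline void *err_ptr_sketch(long error)
{
    return (void *)(intptr_t)error;
}

/* True if ptr falls in the top MAX_ERRNO_SKETCH addresses. */
static inline bool is_err_sketch(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO_SKETCH;
}
```

An error-encoded pointer is non-NULL, so a plain `if (!out)` test would let it through as if the lookup had succeeded.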
