Re: [PATCH 0/3] fix missing rb_subtree_gap updates on vma insert/erase

2012-11-26 Thread Michel Lespinasse
On Mon, Nov 26, 2012 at 5:16 PM, Sasha Levin wrote: > I've built today's -next, and got the following BUG pretty quickly (2-3 > hours): > > [ 1556.479284] BUG: unable to handle kernel paging request at 00412000 > [ 1556.480036] IP: [] validate_mm+0x34/0x130 > [ 1556.480036] PGD 31739067 P

Re: [PATCH 11/12] rwsem: wake all readers when first waiter is a reader

2013-03-19 Thread Michel Lespinasse
On Mon, Mar 18, 2013 at 6:17 PM, Dave Chinner wrote: > On Wed, Mar 13, 2013 at 10:00:51PM -0400, Peter Hurley wrote: >> On Wed, 2013-03-13 at 14:23 +1100, Dave Chinner wrote: >> > We don't care about the ordering between multiple concurrent >> > metadata modifications - what matters is whether the

Re: [PATCH 0/2] extend synchro-test module to test spinlocks too

2013-02-02 Thread Michel Lespinasse
On Wed, Jan 2, 2013 at 2:39 PM, Andrew Morton wrote: > On Sun, 30 Dec 2012 18:47:12 -0800 > Michel Lespinasse wrote: > >> I'm not sure what's the back story with synchro-test though - they seem >> to have been stuck in Andrew's tree for a very long time now. I

[PATCH v2 2/3] mm: accelerate mm_populate() treatment of THP pages

2013-02-03 Thread Michel Lespinasse
populating THP ranges - that is, when both the pages and vmas arrays are NULL, we don't need to iterate HPAGE_PMD_NR times to cover a single THP page (and we also avoid taking mm->page_table_lock that many times). Signed-off-by: Michel Lespinasse --- include/linux/mm.h | 13 +++-- mm/

[PATCH v2 3/3] mm: accelerate munlock() treatment of THP pages

2013-02-03 Thread Michel Lespinasse
), which would mess up our NR_MLOCK statistics. Is this a latent bug or is there a subtle point I missed here ? Signed-off-by: Michel Lespinasse --- mm/internal.h | 2 +- mm/mlock.c| 32 +--- 2 files changed, 22 insertions(+), 12 deletions(-) diff --git a/mm/in

[PATCH v2 0/3] fixes for large mm_populate() and munlock() operations

2013-02-03 Thread Michel Lespinasse
s changed to unsigned int. - In patch 3, I similarly changed the page_mask values to unsigned int. Michel Lespinasse (3): fix mm: use long type for page counts in mm_populate() and get_user_pages() mm: accelerate mm_populate() treatment of THP pages mm: accelerate munlock() treatment

[PATCH v2 1/3] fix mm: use long type for page counts in mm_populate() and get_user_pages()

2013-02-03 Thread Michel Lespinasse
original in -mm. Signed-off-by: Michel Lespinasse --- include/linux/hugetlb.h | 2 +- include/linux/mm.h | 11 ++- mm/hugetlb.c| 8 mm/memory.c | 12 ++-- mm/mlock.c | 2 +- 5 files changed, 18 insertions(+), 17 deletions(-

Re: [PATCH 1/2] mm: hotplug: implement non-movable version of get_user_pages() called get_user_pages_non_movable()

2013-02-05 Thread Michel Lespinasse
Just nitpicking, but: On Tue, Feb 5, 2013 at 3:57 AM, Mel Gorman wrote: > +static inline bool zone_is_idx(struct zone *zone, enum zone_type idx) > +{ > + /* This mess avoids a potentially expensive pointer subtraction. */ > + int zone_off = (char *)zone - (char *)zone->zone_pgdat->nod

Re: [PATCH 1/3] mm: use long type for page counts in mm_populate() and get_user_pages()

2013-02-07 Thread Michel Lespinasse
On Wed, Feb 6, 2013 at 5:10 PM, Andrew Morton wrote: > On Wed, 06 Feb 2013 19:39:11 -0500 > Sasha Levin wrote: > >> We're now hitting the VM_BUG_ON() which was added in the last hunk of the >> patch: > > hm, why was that added. > > Michel, I seem to have confused myself over this series. I saw a

Re: [RFC PATCH 1/6] kernel: implement queue spinlock API

2013-02-07 Thread Michel Lespinasse
On Thu, Feb 7, 2013 at 2:56 PM, Eric Dumazet wrote: > On Thu, 2013-02-07 at 14:34 -0800, Paul E. McKenney wrote: >> On Tue, Jan 22, 2013 at 03:13:30PM -0800, Michel Lespinasse wrote: >> > Introduce queue spinlocks, to be used in situations where it is desired >> > t

Re: [RFC PATCH 1/6] kernel: implement queue spinlock API

2013-02-07 Thread Michel Lespinasse
On Thu, Feb 7, 2013 at 2:34 PM, Paul E. McKenney wrote: > On Tue, Jan 22, 2013 at 03:13:30PM -0800, Michel Lespinasse wrote: >> Introduce queue spinlocks, to be used in situations where it is desired >> to have good throughput even under the occasional high-contention situat

Re: [RFC PATCH 1/6] kernel: implement queue spinlock API

2013-02-07 Thread Michel Lespinasse
On Thu, Feb 7, 2013 at 4:40 PM, Paul E. McKenney wrote: > On Thu, Feb 07, 2013 at 04:03:54PM -0800, Eric Dumazet wrote: >> It adds yet another memory write to store the node pointer in the >> lock... >> >> I suspect it's going to increase false sharing. > > On the other hand, compared to straight

Re: [RFC PATCH 1/6] kernel: implement queue spinlock API

2013-02-07 Thread Michel Lespinasse
On Thu, Feb 7, 2013 at 9:03 PM, Paul E. McKenney wrote: > Right... For spinlocks that -don't- disable irqs, you need to deal with > the possibility that a CPU gets interrupted while spinning, and the > interrupt handler also tries to acquire a queued lock. One way to deal > with this is to have

Re: [PATCH v2 3/3] mm: accelerate munlock() treatment of THP pages

2013-02-08 Thread Michel Lespinasse
On Fri, Feb 8, 2013 at 12:25 PM, Andrea Arcangeli wrote: > Hi Michel, > > On Sun, Feb 03, 2013 at 11:17:12PM -0800, Michel Lespinasse wrote: >> munlock_vma_pages_range() was always incrementing addresses by PAGE_SIZE >> at a time. When munlocking THP pages (or the huge zero

[PATCH v3 0/3] fixes for large mm_populate() and munlock() operations

2013-02-08 Thread Michel Lespinasse
ch 3, fixed munlock_vma_page() to return a page mask as expected by munlock_vma_pages_range() instead of a number of pages. Michel Lespinasse (3): mm: use long type for page counts in mm_populate() and get_user_pages() mm: accelerate mm_populate() treatment of THP pages mm: accelerate

[PATCH v3 2/3] mm: accelerate mm_populate() treatment of THP pages

2013-02-08 Thread Michel Lespinasse
populating THP ranges - that is, when both the pages and vmas arrays are NULL, we don't need to iterate HPAGE_PMD_NR times to cover a single THP page (and we also avoid taking mm->page_table_lock that many times). Signed-off-by: Michel Lespinasse --- include/linux/mm.h | 13 +++-- mm/

[PATCH v3 3/3] mm: accelerate munlock() treatment of THP pages

2013-02-08 Thread Michel Lespinasse
), which would mess up our NR_MLOCK statistics. Is this a latent bug or is there a subtle point I missed here ? Signed-off-by: Michel Lespinasse --- mm/internal.h | 2 +- mm/mlock.c| 34 +++--- 2 files changed, 24 insertions(+), 12 deletions(-) diff --

[PATCH v3 1/3] mm: use long type for page counts in mm_populate() and get_user_pages()

2013-02-08 Thread Michel Lespinasse
intf("done\n"); return 0; } Signed-off-by: Michel Lespinasse --- include/linux/hugetlb.h | 6 +++--- include/linux/mm.h | 15 --- mm/hugetlb.c| 12 ++-- mm/memory.c | 18 +- mm/mlock.c | 4 ++-- mm/nommu

[PATCH 0/4] rwsem: Implement writer lock-stealing

2013-02-08 Thread Michel Lespinasse
David Howell's synchro-test module (as found in Andrew's -mm tree). Michel Lespinasse (4): rwsem: make the waiter type an enumeration rather than a bitmask rwsem: shorter spinlocked section in rwsem_down_failed_common() rwsem: implement write lock stealing x86 rwsem: avoid taking

[PATCH 4/4] x86 rwsem: avoid taking slow path when stealing write lock

2013-02-08 Thread Michel Lespinasse
, they could have raced with us and obtained the lock before we steal it. Signed-off-by: Michel Lespinasse --- arch/x86/include/asm/rwsem.h | 28 +--- 1 file changed, 21 insertions(+), 7 deletions(-) diff --git a/arch/x86/include/asm/rwsem.h b/arch/x86/include/asm/rwsem.h

[PATCH 1/4] rwsem: make the waiter type an enumeration rather than a bitmask

2013-02-08 Thread Michel Lespinasse
: Michel Lespinasse --- lib/rwsem.c | 19 +++ 1 file changed, 11 insertions(+), 8 deletions(-) diff --git a/lib/rwsem.c b/lib/rwsem.c index 8337e1b9bb8d..4a6ff093a433 100644 --- a/lib/rwsem.c +++ b/lib/rwsem.c @@ -28,12 +28,15 @@ void __init_rwsem(struct rw_semaphore *sem, const

[PATCH 3/4] rwsem: implement write lock stealing

2013-02-08 Thread Michel Lespinasse
of their position in the queue). Signed-off-by: Michel Lespinasse --- include/linux/rwsem.h | 2 + lib/rwsem.c | 235 +++--- 2 files changed, 109 insertions(+), 128 deletions(-) diff --git a/include/linux/rwsem.h b/include/linux/r

[PATCH 2/4] rwsem: shorter spinlocked section in rwsem_down_failed_common()

2013-02-08 Thread Michel Lespinasse
to TASK_UNINTERRUPTIBLE immediately before testing waiter.task to see if someone woke us; it doesn't need to protect the entire function. Signed-off-by: Michel Lespinasse --- lib/rwsem.c | 8 +++- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/lib/rwsem.c b/lib/rwsem

Re: [PATCH 3/4] rwsem: implement write lock stealing

2013-02-08 Thread Michel Lespinasse
On Fri, Feb 8, 2013 at 11:30 PM, Hillf Danton wrote: > On Sat, Feb 9, 2013 at 10:45 AM, Michel Lespinasse wrote: >> + if (waiter->type != RWSEM_WAITING_FOR_WRITE) { >> + list_del(&waiter->list); >> + >> +

Re: [PATCH -v4 4/5] x86,smp: keep spinlock delay values per hashed spinlock address

2013-02-09 Thread Michel Lespinasse
On Wed, Feb 6, 2013 at 12:10 PM, Rik van Riel wrote: > On 01/27/2013 08:04 AM, Michel Lespinasse wrote: >> >> On Fri, Jan 25, 2013 at 11:18 AM, Rik van Riel wrote: >>> >>> + u32 delay = (ent->hash == hash) ? ent->delay : >>> MIN_SPINLOCK

Re: [PATCH 1/6] lib: Implement range locks

2013-02-10 Thread Michel Lespinasse
_unblock((struct range_lock *)node); > + node = interval_tree_iter_next(node, lock->node.start, > + lock->node.last); > + } Maybe just a personal preference, but I prefer a for loop. Also, I would prefer container_of
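The container_of preference mentioned in this review can be illustrated with a small userspace sketch. A plain cast such as `(struct range_lock *)node` is only valid while the embedded node is the first member of the structure; subtracting the member offset works wherever the node sits. All names below (`sketch_container_of`, `sketch_tree_node`, `sketch_range_lock`, `node_to_lock`) are stand-ins assumed for illustration, not the definitions from the patch under review.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of() macro. */
#define sketch_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for an interval tree node. */
struct sketch_tree_node {
	unsigned long start;
	unsigned long last;
};

/* Hypothetical lock structure; the node is deliberately NOT first. */
struct sketch_range_lock {
	int blocked;
	struct sketch_tree_node node;
};

/* Recover the enclosing lock from a pointer to its embedded node. */
struct sketch_range_lock *node_to_lock(struct sketch_tree_node *node)
{
	assert(node != NULL);
	return sketch_container_of(node, struct sketch_range_lock, node);
}
```

With this layout a direct cast of the node pointer would point into the middle of the structure, while `node_to_lock()` still returns the enclosing object.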

Re: [PATCH 1/6] lib: Implement range locks

2013-02-11 Thread Michel Lespinasse
On Mon, Feb 11, 2013 at 2:27 AM, Jan Kara wrote: > On Sun 10-02-13 21:42:32, Michel Lespinasse wrote: >> On Thu, Jan 31, 2013 at 1:49 PM, Jan Kara wrote: >> > +void range_lock_init(struct range_lock *lock, unsigned long start, >> > +unsigned long

[PATCH 0/3] fixes for large mm_populate() and munlock() operations

2013-01-30 Thread Michel Lespinasse
cussion of patch 3 takes too long I would ask Andrew to consider merging patches 1-2 first. Michel Lespinasse (3): mm: use long type for page counts in mm_populate() and get_user_pages() mm: accelerate mm_populate() treatment of THP pages mm: accelerate munlock() treatment of THP pages arc

[PATCH 2/3] mm: accelerate mm_populate() treatment of THP pages

2013-01-30 Thread Michel Lespinasse
, we don't need to iterate HPAGE_PMD_NR times to cover a single THP page (and we also avoid taking mm->page_table_lock that many times). Other follow_page() call sites can safely ignore the value returned in *page_mask. Signed-off-by: Michel Lespinasse --- arch/ia64/xen/xencomm.c
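The saving described here comes from stepping over a whole huge page at once. A minimal userspace sketch of that arithmetic follows, assuming `page_mask` is 0 for a normal page and HPAGE_PMD_NR - 1 for a THP page, as the follow-up discussion of this patch states. The constants (4 KiB base pages, 512 base pages per huge page) and the function name are assumptions for illustration, not code taken from the patch.

```c
#include <assert.h>

#define SKETCH_PAGE_SHIFT   12   /* assumed: 4 KiB base pages */
#define SKETCH_HPAGE_PMD_NR 512  /* assumed: 512 base pages per huge page */

/*
 * Number of base pages to advance from addr so that the next lookup
 * starts on the first page after the current (possibly huge) page.
 * page_mask must be of the form 2^n - 1.
 */
unsigned long pages_to_advance(unsigned long addr, unsigned int page_mask)
{
	unsigned long index = addr >> SKETCH_PAGE_SHIFT;

	assert((page_mask & (page_mask + 1)) == 0);
	/* Pages left in the current huge page, counting the current one. */
	return 1 + (~index & page_mask);
}
```

For a normal page the step is always 1; for an address at the start of a huge page the step is SKETCH_HPAGE_PMD_NR, so one iteration covers what previously took 512.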

[RFC PATCH 3/3] mm: accelerate munlock() treatment of THP pages

2013-01-30 Thread Michel Lespinasse
), which would mess up our NR_MLOCK statistics. Is this a latent bug or is there a subtle point I missed here ? Signed-off-by: Michel Lespinasse --- mm/internal.h | 2 +- mm/mlock.c| 32 2 files changed, 21 insertions(+), 13 deletions(-) diff --git a/mm/interna

[PATCH 1/3] mm: use long type for page counts in mm_populate() and get_user_pages()

2013-01-30 Thread Michel Lespinasse
intf("done\n"); return 0; } Signed-off-by: Michel Lespinasse --- include/linux/hugetlb.h | 6 +++--- include/linux/mm.h | 14 +++--- mm/hugetlb.c| 10 +- mm/memory.c | 14 +++--- mm/mlock.c | 5 +++-- 5 files ch

Re: mmap() scalability in the presence of the MAP_POPULATE flag

2013-01-30 Thread Michel Lespinasse
Hi Roman, On Fri, Jan 4, 2013 at 11:43 PM, Michel Lespinasse wrote: > On Fri, Jan 4, 2013 at 10:40 PM, Roman Dubtsov wrote: >> - POPULATE_VIA_LOOP -- makes the test populate memory using a loop >> - POPULATE_VIA_MMAP -- makes the test populate memory via MAP_POPULATE >> &g

Re: [PATCH 2/3] mm: accelerate mm_populate() treatment of THP pages

2013-01-30 Thread Michel Lespinasse
On Wed, Jan 30, 2013 at 7:05 PM, Hugh Dickins wrote: > On Wed, 30 Jan 2013, Michel Lespinasse wrote: > >> This change adds a page_mask argument to follow_page. >> >> follow_page sets *page_mask to HPAGE_PMD_NR - 1 when it encounters a THP >> pa

Re: [PATCH] rwsem-spinlock: let rwsem write lock stealable

2013-01-31 Thread Michel Lespinasse
On Wed, Jan 30, 2013 at 1:14 AM, Yuanhan Liu wrote: > We(Linux Kernel Performance project) found a regression introduced by > commit 5a50508, which just convert all mutex lock to rwsem write lock. > The semantics is same, but the results is quite huge in some cases. > After investigation, we found

Re: [RFC] [DONOTAPPLY] [PATCH] enhanceio: STEC EnhanceIO SSD caching software for Linux kernel

2013-02-01 Thread Michel Lespinasse
On Fri, Feb 1, 2013 at 4:44 PM, Darrick J. Wong wrote: > This is a patch to migrate STEC's enhanceio driver out of their github > repository and into the staging tree. From their README: > > "EnhanceIO driver is based on EnhanceIO SSD caching software product developed > by STEC Inc. EnhanceIO wa

Re: rwlock_t unfairness and tasklist_lock

2013-01-24 Thread Michel Lespinasse
On Sat, Jan 12, 2013 at 9:31 AM, Oleg Nesterov wrote: > On 01/09, Michel Lespinasse wrote: >> >> - Would there be any fundamental objection to implementing a fair >> >> rwlock_t and dealing with the reentrancy issues in tasklist_lock ? My >> >> pro

Re: [PATCH -v4 4/5] x86,smp: keep spinlock delay values per hashed spinlock address

2013-01-27 Thread Michel Lespinasse
On Fri, Jan 25, 2013 at 11:18 AM, Rik van Riel wrote: > + u32 delay = (ent->hash == hash) ? ent->delay : MIN_SPINLOCK_DELAY; I still don't like the resetting of delay to MIN_SPINLOCK_DELAY when there is a hash collision. -- Michel "Walken" Lespinasse A program is never fully debugged until

Re: [PATCH v6 04/46] percpu_rwlock: Implement the core design of Per-CPU Reader-Writer Locks

2013-02-27 Thread Michel Lespinasse
Hi Srivatsa, I think there is some elegance in Lai's proposal of using a local trylock for the reader uncontended case and global rwlock to deal with the contended case without deadlocks. He apparently didn't realize initially that nested read locks are common, and he seems to have confused you be

Re: [PATCH v6 04/46] percpu_rwlock: Implement the core design of Per-CPU Reader-Writer Locks

2013-02-28 Thread Michel Lespinasse
On Thu, Feb 28, 2013 at 3:25 AM, Oleg Nesterov wrote: > On 02/27, Michel Lespinasse wrote: >> >> +void lg_rwlock_local_read_lock(struct lgrwlock *lgrw) >> +{ >> + preempt_disable(); >> + >> + if (__this_cpu_read(*lgrw->local_refcnt) || >

Re: [PATCH v6 08/46] CPU hotplug: Provide APIs to prevent CPU offline from atomic context

2013-02-19 Thread Michel Lespinasse
On Tue, Feb 19, 2013 at 2:50 AM, Srivatsa S. Bhat wrote: > But, the whole intention behind removing the parts depending on the > recursive property of rwlocks would be to make it easier to make rwlocks > fair (going forward) right? Then, that won't work for CPU hotplug, because, > just like we hav

Re: PAGE_CACHE_SIZE vs. PAGE_SIZE

2013-02-19 Thread Michel Lespinasse
On Fri, Feb 1, 2013 at 6:40 AM, Andrew Morton wrote: > On Fri, 18 Jan 2013 17:57:25 +0200 > "Kirill A. Shutemov" wrote: > >> Hi, >> >> PAGE_CACHE_* macros were introduced long time ago in hope to implement >> page cache with larger chunks than one page in future. >> >> In fact it was never done.

Re: [patch] mm: mlock: document scary-looking stack expansion mlock chain

2013-02-20 Thread Michel Lespinasse
On Wed, Feb 20, 2013 at 8:51 PM, Ric Mason wrote: > On 02/01/2013 02:10 PM, Johannes Weiner wrote: >> >> The fact that mlock calls get_user_pages, and get_user_pages might >> call mlock when expanding a stack looks like a potential recursion. > > Why expand stack need call mlock? I can't find it i

Re: [patch] mm: mlock: document scary-looking stack expansion mlock chain

2013-02-20 Thread Michel Lespinasse
ma, so no stack expansion will actually happen from mlock. > > Should this ever change: the stack expansion mlocks only the newly > expanded range and so will not result in recursive expansion. > > Reported-by: Al Viro > Signed-off-by: Johannes Weiner Acked-by: Michel Lespinasse

Re: [PATCH 0/4] rwsem: Implement writer lock-stealing

2013-02-20 Thread Michel Lespinasse
On Wed, Feb 20, 2013 at 4:50 PM, Alex Shi wrote: > I did a quick review on the patchset and tested the patches 1~3, and 1~3 > plus 4th, my patch plus 4th. > > The patch looks much complicated, and also goes writing slow path to > steal locking. My patch looks quite straight and simple. > > This 1~

Re: [PATCH 11/12] rwsem: wake all readers when first waiter is a reader

2013-03-13 Thread Michel Lespinasse
On Tue, Mar 12, 2013 at 8:23 PM, Dave Chinner wrote: > On Mon, Mar 11, 2013 at 11:43:34PM -0700, Michel Lespinasse wrote: >> I find the name 'barrier' actually confusing when used to describe >> synchronous operations. To me a barrier is usually between >> asynchro

Re: [PATCH v5 00/44] ldisc patchset

2013-03-13 Thread Michel Lespinasse
On Tue, Mar 12, 2013 at 9:47 AM, Peter Hurley wrote: > On Mon, 2013-03-11 at 19:28 -0700, Michel Lespinasse wrote: >> Also why the write-priority requirement rather than reader-writer >> fairness ? Is it to make it less likely to hit the writer timeouts ? > > Since

Re: [PATCH 11/12] rwsem: wake all readers when first waiter is a reader

2013-03-14 Thread Michel Lespinasse
On Mon, Mar 11, 2013 at 04:36:47PM -0400, Peter Hurley wrote: > > On Wed, 2013-03-06 at 15:21 -0800, Michel Lespinasse wrote: > > + retry_reader_grants: > > + oldcount = rwsem_atomic_update(adjustment, sem) - adjustment; > > + if (unlikely(oldco

Re: [PATCH v5 00/44] ldisc patchset

2013-03-14 Thread Michel Lespinasse
On Wed, Mar 13, 2013 at 6:12 PM, Peter Hurley wrote: > On Wed, 2013-03-13 at 04:36 -0700, Michel Lespinasse wrote: >> Have you considered building your ldlock based on lib/rwsem-spinlock.c >> instead ? i.e. having an internal spinlock to protect the ldisc >> reference cou

Re: [PATCH v5 00/44] ldisc patchset

2013-03-14 Thread Michel Lespinasse
On Thu, Mar 14, 2013 at 4:42 AM, Peter Hurley wrote: > On Thu, 2013-03-14 at 00:25 -0700, Michel Lespinasse wrote: >> It's not too late to run away from it and preserve your sanity (as well >> as that of the next person working on the tty layer :) > > The long-term plan is to

Re: [PATCH 11/12] rwsem: wake all readers when first waiter is a reader

2013-03-14 Thread Michel Lespinasse
On Thu, Mar 14, 2013 at 4:39 AM, Peter Hurley wrote: > On Thu, 2013-03-14 at 00:03 -0700, Michel Lespinasse wrote: >> > CPU 0 | CPU 1 >> > | >> > | down_write

[PATCH v2 05/13] rwsem: simplify rwsem_down_write_failed

2013-03-15 Thread Michel Lespinasse
ing - wait_lock protects against that. Signed-off-by: Michel Lespinasse --- lib/rwsem.c | 33 + 1 file changed, 9 insertions(+), 24 deletions(-) diff --git a/lib/rwsem.c b/lib/rwsem.c index 66f307e90761..c73bd96dc30c 100644 --- a/lib/rwsem.c +++ b/lib/rwsem.c @@

[PATCH v2 09/13] rwsem: skip initial trylock in rwsem_down_write_failed

2013-03-15 Thread Michel Lespinasse
We can skip the initial trylock in rwsem_down_write_failed() if there are known active lockers already, thus saving one likely-to-fail cmpxchg. Signed-off-by: Michel Lespinasse --- lib/rwsem.c | 17 + 1 file changed, 9 insertions(+), 8 deletions(-) diff --git a/lib/rwsem.c b

[PATCH v2 13/13] x86 rwsem: avoid taking slow path when stealing write lock

2013-03-15 Thread Michel Lespinasse
, they could have raced with us and obtained the lock before we steal it. Signed-off-by: Michel Lespinasse --- arch/x86/include/asm/rwsem.h | 28 +--- 1 file changed, 21 insertions(+), 7 deletions(-) diff --git a/arch/x86/include/asm/rwsem.h b/arch/x86/include/asm/rwsem.h

[PATCH v2 12/13] rwsem: do not block readers at head of queue if other readers are active

2013-03-15 Thread Michel Lespinasse
the active readers complete. Thanks to Peter Hurley for noticing this possible race. Signed-off-by: Michel Lespinasse --- lib/rwsem.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/lib/rwsem.c b/lib/rwsem.c index 09bf03e7808c..4e4c8893dc00 100644 --- a/lib/rwsem.c

[PATCH v2 11/13] rwsem: implement support for write lock stealing on the fastpath

2013-03-15 Thread Michel Lespinasse
fore we wake up additional readers. So, we have to use a new RWSEM_WAKE_READERS value to indicate we only want to wake readers, but we don't currently hold any read lock. Signed-off-by: Michel Lespinasse --- lib/rwsem.c | 63 ++--- 1 file ch

[PATCH v2 10/13] rwsem: simplify __rwsem_do_wake

2013-03-15 Thread Michel Lespinasse
check that. We can use do..while loops to iterate over the readers to wake (generates slightly better code). Signed-off-by: Michel Lespinasse --- lib/rwsem-spinlock.c | 25 - lib/rwsem.c | 26 -- 2 files changed, 20 insertions(+), 31

[PATCH v2 06/13] rwsem: more agressive lock stealing in rwsem_down_write_failed

2013-03-15 Thread Michel Lespinasse
hange, they are expected to be minimal: readers are still granted the lock (rather than having to acquire it themselves) when they reach the front of the wait queue, so we have essentially the same behavior as in rwsem-spinlock. Signed-off-by: Michel Lespinasse --- lib/rwsem.c

[PATCH v2 01/13] rwsem: make the waiter type an enumeration rather than a bitmask

2013-03-15 Thread Michel Lespinasse
: Michel Lespinasse --- lib/rwsem-spinlock.c | 19 +++ lib/rwsem.c | 23 +-- 2 files changed, 24 insertions(+), 18 deletions(-) diff --git a/lib/rwsem-spinlock.c b/lib/rwsem-spinlock.c index 7542afbb22b3..5f117f37ac0a 100644 --- a/lib/rwsem-spinlock.c

[PATCH v2 04/13] rwsem: simplify rwsem_down_read_failed

2013-03-15 Thread Michel Lespinasse
so we don't have to grab the wait_lock either. Signed-off-by: Michel Lespinasse --- lib/rwsem.c | 22 ++ 1 file changed, 2 insertions(+), 20 deletions(-) diff --git a/lib/rwsem.c b/lib/rwsem.c index fb658af1c12c..66f307e90761 100644 --- a/lib/rwsem.c +++ b/lib/rwsem.c

[PATCH v2 03/13] rwsem: move rwsem_down_failed_common code into rwsem_down_{read,write}_failed

2013-03-15 Thread Michel Lespinasse
make it easier to check the following steps. Signed-off-by: Michel Lespinasse --- lib/rwsem.c | 72 - 1 file changed, 57 insertions(+), 15 deletions(-) diff --git a/lib/rwsem.c b/lib/rwsem.c index 40636454cf3c..fb658af1c12c 100644

[PATCH v2 00/13] rwsem fast-path write lock stealing

2013-03-15 Thread Michel Lespinasse
waiters. Patch 12 fixes a race condition Peter Hurley noticed in reviewing v1 of this patch series, which resulted in readers sometimes blocking instead of executing in parallel with other existing readers. Patch 13 finally implements rwsem fast path lock stealing for x86 arch. Michel Lespin

[PATCH v2 02/13] rwsem: shorter spinlocked section in rwsem_down_failed_common()

2013-03-15 Thread Michel Lespinasse
to TASK_UNINTERRUPTIBLE immediately before checking if we actually need to sleep; it doesn't need to protect the entire function. Signed-off-by: Michel Lespinasse --- lib/rwsem.c | 8 +++- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/lib/rwsem.c b/lib/rwsem.c index 672e

[PATCH v2 08/13] rwsem: avoid taking wait_lock in rwsem_down_write_failed

2013-03-15 Thread Michel Lespinasse
In rwsem_down_write_failed(), if there are active locks after we wake up (i.e. the lock got stolen from us), skip taking the wait_lock and go back to sleep immediately. Signed-off-by: Michel Lespinasse --- lib/rwsem.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git

[PATCH v2 07/13] rwsem: use cmpxchg for trying to steal write lock

2013-03-15 Thread Michel Lespinasse
Using rwsem_atomic_update to try stealing the write lock forced us to undo the adjustment in the failure path. We can have simpler and faster code by using cmpxchg instead. Signed-off-by: Michel Lespinasse --- lib/rwsem.c | 26 ++ 1 file changed, 6 insertions(+), 20
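The difference between the two approaches can be sketched in userspace with C11 atomics. This is only an illustration, not the code in lib/rwsem.c: the counter encoding (0 means unlocked, `SKETCH_WRITE_BIAS` means write-locked) and both function names are assumptions made for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SKETCH_WRITE_BIAS 0x10000L  /* hypothetical "write-locked" value */

/*
 * Add-then-check: the adjustment is applied unconditionally, so the
 * failure path must undo it with a second atomic operation.
 */
bool steal_with_update(atomic_long *count)
{
	long old = atomic_fetch_add(count, SKETCH_WRITE_BIAS);

	if (old == 0)
		return true;                      /* lock was free, now ours */
	atomic_fetch_sub(count, SKETCH_WRITE_BIAS); /* undo on failure */
	return false;
}

/*
 * Compare-and-exchange: the counter changes only if it still holds the
 * expected "unlocked" value, so a failed attempt leaves nothing to undo.
 */
bool steal_with_cmpxchg(atomic_long *count)
{
	long expected = 0;

	return atomic_compare_exchange_strong(count, &expected,
					      SKETCH_WRITE_BIAS);
}
```

In the first variant other CPUs can briefly observe the counter with the failed adjustment applied; the second variant never publishes an intermediate value.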

Re: [PATCH] mm/fremap.c: fix another oops on error path

2013-03-16 Thread Michel Lespinasse
On Sat, Mar 16, 2013 at 8:23 AM, Ming Lei wrote: > Since find_vma() may return NULL, so don't dereference the > returned 'vma' until it is valid. Agree this was an issue. This is fixed with commit a2362d24764a. -- Michel "Walken" Lespinasse A program is never fully debugged until the last user

Re: [PATCH 0/5] rbtree based interval tree as a prio_tree replacement

2012-08-30 Thread Michel Lespinasse
On Thu, Aug 30, 2012 at 2:34 PM, Andrew Morton wrote: > On Tue, 7 Aug 2012 00:25:38 -0700 > Michel Lespinasse wrote: > >> This patchset goes over the rbtree changes that have been already integrated >> into Andrew's -mm tree, as well as the augmented rbtree prop

Re: [PATCH v2 07/12] rbtree: adjust root color in rb_insert_color() only when necessary

2012-08-31 Thread Michel Lespinasse
On Fri, Aug 31, 2012 at 1:01 AM, Adrian Hunter wrote: > This breaks tools/perf build in linux-next: > > ../../lib/rbtree.c: In function 'rb_insert_color': > ../../lib/rbtree.c:95:9: error: 'true' undeclared (first use in this function) > ../../lib/rbtree.c:95:9: note: each undeclared identifier is

Re: [PATCH v2 07/12] rbtree: adjust root color in rb_insert_color() only when necessary

2012-08-31 Thread Michel Lespinasse
On Fri, Aug 31, 2012 at 1:35 AM, Adrian Hunter wrote: > On 31/08/12 11:15, Andrew Morton wrote: >> On Fri, 31 Aug 2012 01:07:24 -0700 Michel Lespinasse >> wrote: >>> I thought Andrew had a patch >>> rbtree-adjust-root-color-in-rb_insert_color-only-when-necess

[PATCH 0/7] use interval trees for anon rmap

2012-09-04 Thread Michel Lespinasse
original location. I don't expect this to be very frequent, though, so move_ptes() should be as efficient as it was before patch 3 for all likely cases. Michel Lespinasse (7): mm: interval tree updates mm: fix potential anon_vma locking issue in mprotect() mm anon rmap: remove anon_vma_m

[PATCH 2/7] mm: fix potential anon_vma locking issue in mprotect()

2012-09-04 Thread Michel Lespinasse
xt is being expanded. This change also removes an optimization which avoided taking anon_vma lock during brk adjustments. We could probably make that optimization work again, but the following anon rmap change would break it, so I kept things as simple as possible here. Signed-off-by: Michel Lespi

[PATCH 5/7] mm rmap: remove vma_address check for address inside vma

2012-09-04 Thread Michel Lespinasse
Signed-off-by: Michel Lespinasse --- mm/huge_memory.c |4 mm/rmap.c| 48 +--- 2 files changed, 21 insertions(+), 31 deletions(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index fe119cb71b41..91b65f962320 100644 --- a/mm/huge_memo

[PATCH 7/7] mm: avoid taking rmap locks in move_ptes()

2012-09-04 Thread Michel Lespinasse
ies resolved in tree insertion order. Signed-off-by: Michel Lespinasse --- fs/exec.c |2 +- include/linux/mm.h |6 +++- mm/mmap.c |7 - mm/mremap.c| 57 +++ 4 files changed, 49 insertions(+), 23 deletions(-

[PATCH 6/7] mm: add CONFIG_DEBUG_VM_RB build option

2012-09-04 Thread Michel Lespinasse
goffs have not changed since the nodes were inserted on the anon vma interval tree (as it is important that the nodes be reindexed after each such update). Signed-off-by: Michel Lespinasse --- include/linux/mm.h |3 +++ include/linux/rmap.h |3 +++ lib/Kconfig.debug|9

[PATCH 4/7] mm anon rmap: replace same_anon_vma linked list with an interval tree.

2012-09-04 Thread Michel Lespinasse
s locked during the update, so there is no chance that rmap would miss the vmas that are being updated. Signed-off-by: Michel Lespinasse --- include/linux/mm.h | 14 include/linux/rmap.h | 11 + mm/huge_memory.c |5 ++- mm/interval_tree.c | 14 +++

[PATCH 3/7] mm anon rmap: remove anon_vma_moveto_tail

2012-09-04 Thread Michel Lespinasse
_vma_moveto_tail() ordering function with proper anon_vma locking in move_ptes(). Once we have the anon interval tree in place, we will re-introduce an optimization to avoid taking these locks in the most common cases. Signed-off-by: Michel Lespinasse --- include/linux/rmap.h |1 - mm/mmap.c

[PATCH 1/7] mm: interval tree updates

2012-09-04 Thread Michel Lespinasse
the nonlinear and interval tree cases, with vma_interval_tree_insert_after() which handles only the interval tree case and has an API that is more consistent with the other interval tree handling functions. The nonlinear case is now handled explicitly in kernel/fork.c dup_mmap(). Signed-off-b

Re: [PATCH 2/7] mm: fix potential anon_vma locking issue in mprotect()

2012-09-04 Thread Michel Lespinasse
On Tue, Sep 04, 2012 at 04:27:45PM +0200, Andrea Arcangeli wrote: > Hi Michel, > > On Tue, Sep 04, 2012 at 02:20:52AM -0700, Michel Lespinasse wrote: > > This change fixes an anon_vma locking issue in the following situation: > > - vma has no anon_vma > > - next has an

Re: [PATCH 2/7] mm: fix potential anon_vma locking issue in mprotect()

2012-09-04 Thread Michel Lespinasse
On Tue, Sep 4, 2012 at 3:16 PM, Andrea Arcangeli wrote: > I would suggest to do the strict fix as above in as patch 1/8 and push > it in -mm, and to do only the optimization removal in 3/8. I think > we want it in -stable too later, so it'll make life easier to > cherry-pick the commit if it's mer

Re: [PATCH 4/7] mm anon rmap: replace same_anon_vma linked list with an interval tree.

2012-09-04 Thread Michel Lespinasse
pdate, so there is no chance that rmap would miss the vmas that are being updated. Change-Id: I6a6127d3c1fc1ab4af2acfc7ed2d269b963f6792 Signed-off-by: Michel Lespinasse --- include/linux/mm.h | 14 + include/linux/rmap.h | 11 --- mm/huge_memory.c |5 ++- mm/interv

Re: BUG at mm/huge_memory.c:1428!

2012-09-13 Thread Michel Lespinasse
On Thu, Sep 13, 2012 at 11:14 AM, Jiri Slaby wrote: > Hi, > > I've just get the following BUG with today's -next. It happens every > time I try to update packages. > > kernel BUG at mm/huge_memory.c:1428! That is very likely my bug. Do you have the message that should be printed right above the

Re: [PATCH 8/10] bug.h: Make BUILD_BUG_ON generate compile-time error

2012-10-01 Thread Michel Lespinasse
On Fri, Sep 28, 2012 at 4:20 PM, Daniel Santos wrote: > Negative sized arrays won't create a compile-time error in some cases > starting with gcc 4.4 (e.g., inlined functions), but gcc 4.3 introduced > the error function attribute that will. This patch modifies > BUILD_BUG_ON to behave like BUILD_
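For readers unfamiliar with the negative-sized array technique the quoted text refers to, here is a minimal standalone sketch. The macro name `SKETCH_BUILD_BUG_ON`, the structure, and the function are hypothetical and chosen for illustration; this is not the kernel's definition, and it does not show the error-attribute variant the patch proposes.

```c
#include <assert.h>

/*
 * Hypothetical compile-time assertion: when cond is a true constant
 * expression the array type is char[-1], which the compiler rejects.
 * When cond is false the expression is sizeof(char[1]) and has no effect.
 */
#define SKETCH_BUILD_BUG_ON(cond) ((void)sizeof(char[1 - 2 * !!(cond)]))

/* Hypothetical structure whose layout the code depends on. */
struct sketch_header {
	unsigned char tag;
	unsigned char len;
};

int sketch_header_ok(void)
{
	/* Fails to compile if the structure ever grows beyond two bytes. */
	SKETCH_BUILD_BUG_ON(sizeof(struct sketch_header) != 2);
	return 1;
}
```

The check costs nothing at run time; its only effect is to stop the build when the condition is a true compile-time constant.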

Re: [PATCH 10/10] bug.h: Add gcc 4.2+ versions of BUILD_BUG_ON_* macros

2012-10-01 Thread Michel Lespinasse
On Fri, Sep 28, 2012 at 4:20 PM, Daniel Santos wrote: > BUILD_BUG_ON42(arg) > BUILD_BUG_ON_CONST42(arg) > > Prior to gcc 4.2, the optimizer was unable to determine that many > constant values stored in structs were indeed compile-time constants and > optimize them out. Sometimes, it will find an

Re: [PATCH 7/10] compiler{,-gcc4}.h: Introduce __flatten function attribute

2012-10-03 Thread Michel Lespinasse
On Wed, Oct 3, 2012 at 7:46 AM, Daniel Santos wrote: > On 10/03/2012 09:01 AM, Steven Rostedt wrote: >> You don't need to use get_maintainers. It's more of a help tool to find >> maintainers and not something that is mandatory. Not everyone that has >> ever touched one of these files needs to be C

Re: [PATCH 05/31] x86/mm: Reduce tlb flushes from ptep_set_access_flags()

2012-10-26 Thread Michel Lespinasse
On Thu, Oct 25, 2012 at 9:23 PM, Linus Torvalds wrote: > On Thu, Oct 25, 2012 at 8:57 PM, Rik van Riel wrote: >> >> That may not even be needed. Apparently Intel chips >> automatically flush an entry from the TLB when it >> causes a page fault. I assume AMD chips do the same, >> because flush_t

Re: [PATCH 05/31] x86/mm: Reduce tlb flushes from ptep_set_access_flags()

2012-10-26 Thread Michel Lespinasse
On Fri, Oct 26, 2012 at 5:48 AM, Andi Kleen wrote: > Michel Lespinasse writes: > >> On Thu, Oct 25, 2012 at 9:23 PM, Linus Torvalds >> wrote: >>> On Thu, Oct 25, 2012 at 8:57 PM, Rik van Riel wrote: >>>> >>>> That may not even be needed.

Re: mm: NULL ptr deref in anon_vma_interval_tree_verify

2012-11-04 Thread Michel Lespinasse
On Sun, Nov 4, 2012 at 6:20 PM, Bob Liu wrote: > The loop for each entry of vma->anon_vma_chain in validate_mm() is not > protected by anon_vma lock. > I think that may be the cause. > > Michel, What's your opinion? Good catch, I think that's it. Somehow it had not occurred to me to verify the che

Re: mm: NULL ptr deref in anon_vma_interval_tree_verify

2012-11-04 Thread Michel Lespinasse
On Sun, Nov 4, 2012 at 8:14 PM, Bob Liu wrote: > Hmm, I attached a simple fix patch. Reviewed-by: Michel Lespinasse (also ran some tests with it, but I could never reproduce the original issue anyway). Bob, it would be easier if you had sent the original patch inline rather than as

Re: mm: NULL ptr deref in anon_vma_interval_tree_verify

2012-11-05 Thread Michel Lespinasse
On Sun, Nov 4, 2012 at 8:44 PM, Michel Lespinasse wrote: > On Sun, Nov 4, 2012 at 8:14 PM, Bob Liu wrote: >> Hmm, I attached a simple fix patch. > > Reviewed-by: Michel Lespinasse > (also ran some tests with it, but I could never reproduce the original > issue anyway). W

[PATCH 15/16] mm: use vm_unmapped_area() on sparc32 architecture

2012-11-05 Thread Michel Lespinasse
Update the sparc32 arch_get_unmapped_area function to make use of vm_unmapped_area() instead of implementing a brute force search. Signed-off-by: Michel Lespinasse --- arch/sparc/kernel/sys_sparc_32.c | 24 +--- 1 files changed, 9 insertions(+), 15 deletions(-) diff

[PATCH 12/16] mm: use vm_unmapped_area() on sh architecture

2012-11-05 Thread Michel Lespinasse
Update the sh arch_get_unmapped_area[_topdown] functions to make use of vm_unmapped_area() instead of implementing a brute force search. Signed-off-by: Michel Lespinasse --- arch/sh/mm/mmap.c | 126 ++--- 1 files changed, 24 insertions(+), 102

[PATCH 16/16] mm: use vm_unmapped_area() in hugetlbfs on tile architecture

2012-11-05 Thread Michel Lespinasse
Update the tile hugetlb_get_unmapped_area function to make use of vm_unmapped_area() instead of implementing a brute force search. Signed-off-by: Michel Lespinasse --- arch/tile/mm/hugetlbpage.c | 139 1 files changed, 25 insertions(+), 114

[PATCH 14/16] mm: use vm_unmapped_area() in hugetlbfs on sparc64 architecture

2012-11-05 Thread Michel Lespinasse
Update the sparc64 hugetlb_get_unmapped_area function to make use of vm_unmapped_area() instead of implementing a brute force search. Signed-off-by: Michel Lespinasse --- arch/sparc/mm/hugetlbpage.c | 123 ++ 1 files changed, 30 insertions(+), 93

[PATCH 13/16] mm: use vm_unmapped_area() on sparc64 architecture

2012-11-05 Thread Michel Lespinasse
Update the sparc64 arch_get_unmapped_area[_topdown] functions to make use of vm_unmapped_area() instead of implementing a brute force search. Signed-off-by: Michel Lespinasse --- arch/sparc/kernel/sys_sparc_64.c | 132 +- 1 files changed, 30 insertions

[PATCH 10/16] mm: use vm_unmapped_area() on mips architecture

2012-11-05 Thread Michel Lespinasse
Update the mips arch_get_unmapped_area[_topdown] functions to make use of vm_unmapped_area() instead of implementing a brute force search. Signed-off-by: Michel Lespinasse --- arch/mips/mm/mmap.c | 99 +-- 1 files changed, 17 insertions(+), 82

[PATCH 02/16] mm: augment vma rbtree with rb_subtree_gap

2012-11-05 Thread Michel Lespinasse
eck if the following gap is suitable. This does have the potential to make unmapping VMAs more expensive, especially for processes with very large numbers of VMAs, where the VMA rbtree can grow quite deep. Signed-off-by: Michel Lespinasse Reviewed-by: Rik van Riel --- include/linux/mm_types.
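The truncated description above appears to rely on each tree node caching the largest free gap found anywhere in its subtree. A minimal sketch of that idea follows, assuming a plain binary search tree rather than the kernel's rbtree. struct region, region_update(), find_gap() and demo_find() are hypothetical names for illustration and are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a vma; not the kernel's struct. */
struct region {
	unsigned long start, end;   /* mapped range [start, end) */
	unsigned long gap;          /* free space between predecessor and start */
	unsigned long subtree_gap;  /* largest gap in this node's subtree */
	struct region *left, *right;
};

/* Recompute subtree_gap from the children; call bottom-up after a change.
 * This extra propagation is the cost paid on insert and erase. */
static void region_update(struct region *n)
{
	unsigned long m;

	assert(n != NULL);
	m = n->gap;
	if (n->left && n->left->subtree_gap > m)
		m = n->left->subtree_gap;
	if (n->right && n->right->subtree_gap > m)
		m = n->right->subtree_gap;
	n->subtree_gap = m;
}

/* Return the lowest-addressed region preceded by a gap of at least
 * length bytes, or NULL. Subtrees whose cached maximum is too small are
 * skipped, so the walk is bounded by the tree height. */
static struct region *find_gap(struct region *n, unsigned long length)
{
	while (n && n->subtree_gap >= length) {
		if (n->left && n->left->subtree_gap >= length)
			n = n->left;        /* prefer lower addresses */
		else if (n->gap >= length)
			return n;
		else
			n = n->right;       /* the gap must be on the right */
	}
	return NULL;
}

/* Build a three-node tree and return the start of the region that
 * follows the first suitable gap, or 0 if there is none. */
static unsigned long demo_find(unsigned long length)
{
	struct region a = { 0x1000, 0x2000, 0x1000, 0, NULL, NULL };
	struct region c = { 0x9000, 0xa000, 0x5000, 0, NULL, NULL };
	struct region b = { 0x3000, 0x4000, 0x1000, 0, &a, &c };
	struct region *hit;

	region_update(&a);
	region_update(&c);
	region_update(&b);
	hit = find_gap(&b, length);
	return hit ? hit->start : 0;
}
```

In this sketch a request for 0x3000 bytes skips the left subtree outright because its cached maximum is only 0x1000, then lands on the region starting at 0x9000. Keeping the cached values correct means walking back up the tree after every insert or erase, which matches the concern about deeper trees making unmapping more expensive.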

[PATCH 03/16] mm: check rb_subtree_gap correctness

2012-11-05 Thread Michel Lespinasse
When CONFIG_DEBUG_VM_RB is enabled, check that rb_subtree_gap is correctly set for every vma and that mm->highest_vm_end is also correct. Also add an explicit 'bug' variable to track if browse_rb() detected any invalid condition. Signed-off-by: Michel Lespinasse Reviewed-by:

[PATCH 06/16] mm: use vm_unmapped_area() on x86_64 architecture

2012-11-05 Thread Michel Lespinasse
Update the x86_64 arch_get_unmapped_area[_topdown] functions to make use of vm_unmapped_area() instead of implementing a brute force search. Signed-off-by: Michel Lespinasse Reviewed-by: Rik van Riel --- arch/x86/include/asm/elf.h |6 +- arch/x86/kernel/sys_x86_64.c | 151

[PATCH 08/16] mm: use vm_unmapped_area() in hugetlbfs

2012-11-05 Thread Michel Lespinasse
Update the hugetlb_get_unmapped_area function to make use of vm_unmapped_area() instead of implementing a brute force search. Signed-off-by: Michel Lespinasse --- fs/hugetlbfs/inode.c | 42 -- 1 files changed, 8 insertions(+), 34 deletions(-) diff

[PATCH 09/16] mm: use vm_unmapped_area() in hugetlbfs on i386 architecture

2012-11-05 Thread Michel Lespinasse
Update the i386 hugetlb_get_unmapped_area function to make use of vm_unmapped_area() instead of implementing a brute force search. Signed-off-by: Michel Lespinasse --- arch/x86/mm/hugetlbpage.c | 130 + 1 files changed, 25 insertions(+), 105

[PATCH 11/16] mm: use vm_unmapped_area() on arm architecture

2012-11-05 Thread Michel Lespinasse
Update the arm arch_get_unmapped_area[_topdown] functions to make use of vm_unmapped_area() instead of implementing a brute force search. Signed-off-by: Michel Lespinasse --- arch/arm/mm/mmap.c | 119 ++-- 1 files changed, 23 insertions(+), 96
