[PATCH 8/9] mm, hugetlb: remove decrement_hugepage_resv_vma()

2013-07-15 Thread Joonsoo Kim
patch implement it. Signed-off-by: Joonsoo Kim diff --git a/mm/hugetlb.c b/mm/hugetlb.c index f6a7a4e..ed2d0af 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -434,25 +434,6 @@ static int is_vma_resv_set(struct vm_area_struct *vma, unsigned long flag) return (get_vma_private_data(vma

[PATCH 7/9] mm, hugetlb: add VM_NORESERVE check in vma_has_reserves()

2013-07-15 Thread Joonsoo Kim
ge page if free count is under the reserve count. Signed-off-by: Joonsoo Kim diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 6c1eb9b..f6a7a4e 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -464,6 +464,8 @@ void reset_vma_resv_huge_pages(struct vm_area_struct *vma) /* Returns true if the VMA has associat

[PATCH 9/9] mm, hugetlb: decrement reserve count if VM_NORESERVE alloc page cache

2013-07-15 Thread Joonsoo Kim
if (q == MAP_FAILED) { fprintf(stderr, "mmap() failed: %s\n", strerror(errno)); } q[0] = 'c'; This patch solve this problem. Signed-off-by: Joonsoo Kim diff --git a/mm/hugetlb.c b/mm/hugetlb.c index ed2d0af..defb180 100644 --- a/m

[PATCH 5/9] mm, hugetlb: remove redundant list_empty check in gather_surplus_pages()

2013-07-15 Thread Joonsoo Kim
If list is empty, list_for_each_entry_safe() doesn't do anything. So, this check is redundant. Remove it. Signed-off-by: Joonsoo Kim diff --git a/mm/hugetlb.c b/mm/hugetlb.c index a838e6b..d4a1695 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -1019,10 +1019,8 @@ free: spin_u

[PATCH 1/9] mm, hugetlb: move up the code which check availability of free huge page

2013-07-15 Thread Joonsoo Kim
We don't need to proceed with the processing if we don't have any usable free huge page. So move this code up. Signed-off-by: Joonsoo Kim diff --git a/mm/hugetlb.c b/mm/hugetlb.c index e2bfbf7..d87f70b 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -539,10 +539,6 @@ static s

Re: [RFC PATCH 1/5] mm, page_alloc: support multiple pages allocation

2013-07-15 Thread Joonsoo Kim
On Thu, Jul 11, 2013 at 08:51:22AM -0700, Dave Hansen wrote: > On 07/10/2013 11:12 PM, Joonsoo Kim wrote: > >> > I'd also like to see some scalability numbers on this. How do your > >> > tests look when all the CPUs on the system are hammering away? > > What

Re: [RFC PATCH 1/5] mm, page_alloc: support multiple pages allocation

2013-07-15 Thread Joonsoo Kim
On Fri, Jul 12, 2013 at 09:31:42AM -0700, Dave Hansen wrote: > On 07/10/2013 11:12 PM, Joonsoo Kim wrote: > > On Wed, Jul 10, 2013 at 10:38:20PM -0700, Dave Hansen wrote: > >> You're probably right for small numbers of pages. But, if we're talking > >> abou

[PATCH] Revert "tools lib lk: Fix for cross build"

2013-07-15 Thread Joonsoo Kim
: Respect CROSS_COMPILE Make lk use CROSS_COMPILE, in order to be able to cross compile perf again. Signed-off-by: Joonsoo Kim diff --git a/tools/lib/lk/Makefile b/tools/lib/lk/Makefile index 280dd82..3dba0a4 100644 --- a/tools/lib/lk/Makefile +++ b/tools/lib/lk/Makefile @@ -3,21 +3,6

Re: [PATCH 0/9] mm, hugetlb: clean-up and possible bug fix

2013-07-15 Thread Joonsoo Kim
On Mon, Jul 15, 2013 at 07:40:16PM +0530, Aneesh Kumar K.V wrote: > Joonsoo Kim writes: > > > First 5 patches are almost trivial clean-up patches. > > > > The others are for fixing three bugs. > > Perhaps, these problems are minor, because this codes are used >

Re: [PATCH 1/9] mm, hugetlb: move up the code which check availability of free huge page

2013-07-15 Thread Joonsoo Kim
On Mon, Jul 15, 2013 at 07:31:33PM +0530, Aneesh Kumar K.V wrote: > Joonsoo Kim writes: > > > We don't need to proceede the processing if we don't have any usable > > free huge page. So move this code up. > > I guess you can also mention that since we are ho

Re: [PATCH 4/9] mm, hugetlb: fix and clean-up node iteration code to alloc or free

2013-07-15 Thread Joonsoo Kim
On Mon, Jul 15, 2013 at 07:57:37PM +0530, Aneesh Kumar K.V wrote: > Joonsoo Kim writes: > > > Current node iteration code have a minor problem which do one more > > node rotation if we can't succeed to allocate. For example, > > if we start to allocate at node 0,

Re: [PATCH 5/9] mm, hugetlb: remove redundant list_empty check in gather_surplus_pages()

2013-07-15 Thread Joonsoo Kim
On Mon, Jul 15, 2013 at 08:01:24PM +0530, Aneesh Kumar K.V wrote: > Joonsoo Kim writes: > > > If list is empty, list_for_each_entry_safe() doesn't do anything. > > So, this check is redundant. Remove it. > > > > Signed-off-by: Joonsoo Kim > > Reviewed

Re: [PATCH 0/9] mm, hugetlb: clean-up and possible bug fix

2013-07-15 Thread Joonsoo Kim
On Tue, Jul 16, 2013 at 09:27:29AM +0800, Sam Ben wrote: > On 07/16/2013 09:10 AM, Joonsoo Kim wrote: > >On Mon, Jul 15, 2013 at 07:40:16PM +0530, Aneesh Kumar K.V wrote: > >>Joonsoo Kim writes: > >> > >>>First 5 patches are almost trivial clean-up patche

Re: [PATCH 6/9] mm, hugetlb: do not use a page in page cache for cow optimization

2013-07-15 Thread Joonsoo Kim
On Mon, Jul 15, 2013 at 07:25:40PM +0530, Aneesh Kumar K.V wrote: > Joonsoo Kim writes: > > > Currently, we use a page with mapped count 1 in page cache for cow > > optimization. If we find this condition, we don't allocate a new > > page and copy contents. Inste

Re: [PATCH 7/9] mm, hugetlb: add VM_NORESERVE check in vma_has_reserves()

2013-07-15 Thread Joonsoo Kim
On Mon, Jul 15, 2013 at 08:41:12PM +0530, Aneesh Kumar K.V wrote: > Joonsoo Kim writes: > > > If we map the region with MAP_NORESERVE and MAP_SHARED, > > we can skip to check reserve counting and eventually we cannot be ensured > > to allocate a huge page in fault

Re: [PATCH 0/9] mm, hugetlb: clean-up and possible bug fix

2013-07-15 Thread Joonsoo Kim
On Tue, Jul 16, 2013 at 09:55:48AM +0800, Sam Ben wrote: > On 07/16/2013 09:45 AM, Joonsoo Kim wrote: > >On Tue, Jul 16, 2013 at 09:27:29AM +0800, Sam Ben wrote: > >>On 07/16/2013 09:10 AM, Joonsoo Kim wrote: > >>>On Mon, Jul 15, 2013 at 07:40:16PM +0530, Aneesh Kuma

Re: [PATCH 1/9] mm, hugetlb: move up the code which check availability of free huge page

2013-07-15 Thread Joonsoo Kim
On Tue, Jul 16, 2013 at 09:06:04AM +0530, Aneesh Kumar K.V wrote: > Joonsoo Kim writes: > > > On Mon, Jul 15, 2013 at 07:31:33PM +0530, Aneesh Kumar K.V wrote: > >> Joonsoo Kim writes: > >> > >> > We don't need to proceede the processing if we

Re: [PATCH] mm/hugetlb: per-vma instantiation mutexes

2013-07-15 Thread Joonsoo Kim
On Mon, Jul 15, 2013 at 09:51:21PM -0400, Rik van Riel wrote: > On 07/15/2013 03:24 AM, David Gibson wrote: > >On Sun, Jul 14, 2013 at 08:16:44PM -0700, Davidlohr Bueso wrote: > > >>>Reading the existing comment, this change looks very suspicious to me. > >>>A per-vma mutex is just not going to pr

Re: [PATCH 7/9] mm, hugetlb: add VM_NORESERVE check in vma_has_reserves()

2013-07-16 Thread Joonsoo Kim
On Tue, Jul 16, 2013 at 11:17:23AM +0530, Aneesh Kumar K.V wrote: > Joonsoo Kim writes: > > > On Mon, Jul 15, 2013 at 08:41:12PM +0530, Aneesh Kumar K.V wrote: > >> Joonsoo Kim writes: > >> > >> > If we map the region with MAP_NORESERVE and MAP_S

Re: [RFC 0/4] Transparent on-demand struct page initialization embedded in the buddy allocator

2013-07-16 Thread Joonsoo Kim
On Fri, Jul 12, 2013 at 10:27:56AM +0200, Ingo Molnar wrote: > > * Robin Holt wrote: > > > [...] > > > > With this patch, we did boot a 16TiB machine. Without the patches, the > > v3.10 kernel with the same configuration took 407 seconds for > > free_all_bootmem. With the patches and operat

Re: [PATCH 4/9] mm, hugetlb: fix and clean-up node iteration code to alloc or free

2013-07-17 Thread Joonsoo Kim
On Wed, Jul 17, 2013 at 10:00:48AM +0800, Jianguo Wu wrote: > On 2013/7/15 17:52, Joonsoo Kim wrote: > > > Current node iteration code have a minor problem which do one more > > node rotation if we can't succeed to allocate. For example, > > if we start to allocate

Re: [PATCH] mm/hugetlb: per-vma instantiation mutexes

2013-07-17 Thread Joonsoo Kim
On Tue, Jul 16, 2013 at 08:01:46PM +1000, David Gibson wrote: > On Tue, Jul 16, 2013 at 02:34:24PM +0900, Joonsoo Kim wrote: > > On Mon, Jul 15, 2013 at 09:51:21PM -0400, Rik van Riel wrote: > > > On 07/15/2013 03:24 AM, David Gibson wrote: > > > >On Sun, Ju

Re: [PATCH] hugepage: allow parallelization of the hugepage fault path

2013-07-18 Thread Joonsoo Kim
On Wed, Jul 17, 2013 at 12:50:25PM -0700, Davidlohr Bueso wrote: > From: David Gibson > > At present, the page fault path for hugepages is serialized by a > single mutex. This is used to avoid spurious out-of-memory conditions > when the hugepage pool is fully utilized (two processes or threads c

Re: [PATCH] hugepage: allow parallelization of the hugepage fault path

2013-07-18 Thread Joonsoo Kim
On Wed, Jul 17, 2013 at 12:50:25PM -0700, Davidlohr Bueso wrote: > From: Davidlohr Bueso > > - Cleaned up and forward ported to Linus' latest. > - Cache aligned mutexes. > - Keep non SMP systems using a single mutex. > > It was found that this mutex can become quite contended > during the early

[PATCH v3 8/9] mm, hugetlb: remove decrement_hugepage_resv_vma()

2013-07-28 Thread Joonsoo Kim
patch implement it. Reviewed-by: Wanpeng Li Reviewed-by: Aneesh Kumar K.V Signed-off-by: Joonsoo Kim diff --git a/mm/hugetlb.c b/mm/hugetlb.c index ca15854..4b1b043 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -434,25 +434,6 @@ static int is_vma_resv_set(struct vm_area_struct *vma, unsigned

[PATCH v3 1/9] mm, hugetlb: move up the code which check availability of free huge page

2013-07-28 Thread Joonsoo Kim
Signed-off-by: Joonsoo Kim diff --git a/mm/hugetlb.c b/mm/hugetlb.c index e2bfbf7..fc4988c 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -539,10 +539,6 @@ static struct page *dequeue_huge_page_vma(struct hstate *h, struct zoneref *z; unsigned int cpuset_mems_cookie;

[PATCH v3 9/9] mm, hugetlb: decrement reserve count if VM_NORESERVE alloc page cache

2013-07-28 Thread Joonsoo Kim
ase reserve count. As implementing above, this patch solve the problem. Reviewed-by: Wanpeng Li Reviewed-by: Aneesh Kumar K.V Signed-off-by: Joonsoo Kim diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 4b1b043..b3b8252 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -443,10 +443,23 @@ void reset_v

[PATCH v3 6/9] mm, hugetlb: do not use a page in page cache for cow optimization

2013-07-28 Thread Joonsoo Kim
ion to a page in page cache, the problem disappears. So, I change the trigger condition of the optimization. If this page is not an AnonPage, we don't do the optimization. This turns the optimization off for page cache pages. Acked-by: Michal Hocko Reviewed-by: Wanpeng Li Reviewed-by: Naoya Hori

[PATCH v3 7/9] mm, hugetlb: add VM_NORESERVE check in vma_has_reserves()

2013-07-28 Thread Joonsoo Kim
(). This prevents stealing a reserved page. With this change, the above test generates a SIGBUS, which is correct, because all free pages are reserved and a non-reserved shared mapping can't get a free page. Reviewed-by: Wanpeng Li Reviewed-by: Aneesh Kumar K.V Signed-off-by: Joonsoo Kim diff

[PATCH v3 0/9] mm, hugetlb: clean-up and possible bug fix

2013-07-28 Thread Joonsoo Kim
ar its purpose. Remove useless indentation changes in 'clean-up alloc_huge_page()' Fix new iteration code bug. Add reviewed-by or acked-by. Joonsoo Kim (9): mm, hugetlb: move up the code which check availability of free huge page mm, hugetlb: trivial commenting fix mm, hug

[PATCH v3 4/9] mm, hugetlb: fix and clean-up node iteration code to alloc or free

2013-07-28 Thread Joonsoo Kim
e_mask_to_[alloc|free]" and fix and clean-up node iteration code to alloc or free. This makes code more understandable. Reviewed-by: Aneesh Kumar K.V Signed-off-by: Joonsoo Kim diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 31d78c5..87d7637 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@

[PATCH 05/18] mm, hugetlb: protect region tracking via newly introduced resv_map lock

2013-07-28 Thread Joonsoo Kim
ure, so it can be modified by two processes concurrently. To solve this, I introduce a lock to resv_map and make the region manipulation functions grab the lock before they do actual work. This makes region tracking safe. Signed-off-by: Joonsoo Kim diff --git a/include/linux/hugetlb.h b/include/linux

[PATCH 02/18] mm, hugetlb: change variable name reservations to resv

2013-07-28 Thread Joonsoo Kim
'reservations' is too long a name for a variable, and we use 'resv_map' to represent 'struct resv_map' in other places. To reduce confusion and improve readability, change it. Signed-off-by: Joonsoo Kim diff --git a/mm/hugetlb.c b/mm/hugetlb.c index d971233..12b6581 1006

[PATCH v3 3/9] mm, hugetlb: clean-up alloc_huge_page()

2013-07-28 Thread Joonsoo Kim
This patch unifies successful allocation paths to make the code more readable. There are no functional changes. Acked-by: Michal Hocko Reviewed-by: Wanpeng Li Reviewed-by: Aneesh Kumar K.V Signed-off-by: Joonsoo Kim diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 51564a8..31d78c5 100644 --- a

[PATCH v3 5/9] mm, hugetlb: remove redundant list_empty check in gather_surplus_pages()

2013-07-28 Thread Joonsoo Kim
If list is empty, list_for_each_entry_safe() doesn't do anything. So, this check is redundant. Remove it. Acked-by: Michal Hocko Reviewed-by: Wanpeng Li Reviewed-by: Aneesh Kumar K.V Signed-off-by: Joonsoo Kim diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 87d7637..2e52afea 100644 ---

[PATCH 00/18] mm, hugetlb: remove a hugetlb_instantiation_mutex

2013-07-28 Thread Joonsoo Kim
"[PATCH] mm/hugetlb: per-vma instantiation mutexes" [2] https://lkml.org/lkml/2013/7/22/96 "[PATCH v2 00/10] mm, hugetlb: clean-up and possible bug fix" Joonsoo Kim (18): mm, hugetlb: protect reserved pages when softofflining requests the pages mm, huge

[PATCH 04/18] mm, hugetlb: region manipulation functions take resv_map rather list_head

2013-07-28 Thread Joonsoo Kim
To change the protection method for region tracking to a fine-grained one, we pass the resv_map, instead of the list_head, to the region manipulation functions. This doesn't introduce any functional change; it just prepares for the next step. Signed-off-by: Joonsoo Kim diff --git a/mm/hugetlb.c

[PATCH 16/18] mm, hugetlb: return a reserved page to a reserved pool if failed

2013-07-28 Thread Joonsoo Kim
han how we need, because the reserve count is already decreased in dequeue_huge_page_vma(). This patch fixes this situation. Signed-off-by: Joonsoo Kim diff --git a/mm/hugetlb.c b/mm/hugetlb.c index bb8a45f..6a9ec69 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -649,6 +649,34 @@ struct hstate *size

[PATCH 18/18] mm, hugetlb: remove a hugetlb_instantiation_mutex

2013-07-28 Thread Joonsoo Kim
Now we have the infrastructure in place to remove this awkward mutex, which serializes all faulting tasks, so remove it. Signed-off-by: Joonsoo Kim diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 909075b..4fab047 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -2533,9 +2533,7

[PATCH 15/18] mm, hugetlb: move up anon_vma_prepare()

2013-07-28 Thread Joonsoo Kim
sible. So move up anon_vma_prepare(), which can fail in an OOM situation. Signed-off-by: Joonsoo Kim diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 683fd38..bb8a45f 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -2536,6 +2536,15 @@ retry_avoidcopy: /* Drop page_table_lock as buddy allo

[PATCH 17/18] mm, hugetlb: retry if we fail to allocate a hugepage with use_reserve

2013-07-28 Thread Joonsoo Kim
. use_reserve represents that this user is a legitimate one who is ensured to have enough reserved pages. This prevents these threads from getting a SIGBUS signal and makes them retry fault handling. Signed-off-by: Joonsoo Kim diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 6a9ec69..909075b 100644 --- a/mm

[PATCH 14/18] mm, hugetlb: clean-up error handling in hugetlb_cow()

2013-07-28 Thread Joonsoo Kim
The current code includes 'Caller expects lock to be held' in every error path. We can clean it up by doing error handling in one place. Signed-off-by: Joonsoo Kim diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 255bd9e..683fd38 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -2516

[PATCH 01/18] mm, hugetlb: protect reserved pages when softofflining requests the pages

2013-07-28 Thread Joonsoo Kim
alloc_huge_page_node() uses dequeue_huge_page_node() without any validation check, so it can steal a reserved page unconditionally. To fix it, check the number of free_huge_page in alloc_huge_page_node(). Signed-off-by: Joonsoo Kim diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 6782b41..d971233

[PATCH 12/18] mm, hugetlb: remove a check for return value of alloc_huge_page()

2013-07-28 Thread Joonsoo Kim
Now, alloc_huge_page() only returns -ENOSPC on failure. So, we don't need to worry about other return values. Signed-off-by: Joonsoo Kim diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 94173e0..35ccdad 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -2562,7 +2562,6 @@ retry_avoidcopy: new

[PATCH 13/18] mm, hugetlb: grab a page_table_lock after page_cache_release

2013-07-28 Thread Joonsoo Kim
We don't need to hold the page_table_lock when we release a page. So, defer grabbing the page_table_lock. Signed-off-by: Joonsoo Kim diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 35ccdad..255bd9e 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -2630,10 +2630,11 @@ retry_avoi

[PATCH 11/18] mm, hugetlb: move down outside_reserve check

2013-07-28 Thread Joonsoo Kim
Just move down the outside_reserve check. This makes the code more readable. There is no functional change. Signed-off-by: Joonsoo Kim diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 5f31ca5..94173e0 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -2530,20 +2530,6 @@ retry_avoidcopy

[PATCH 10/18] mm, hugetlb: call vma_has_reserve() before entering alloc_huge_page()

2013-07-28 Thread Joonsoo Kim
handling and remove a hugetlb_instantiation_mutex. Signed-off-by: Joonsoo Kim diff --git a/mm/hugetlb.c b/mm/hugetlb.c index a66226e..5f31ca5 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -1123,12 +1123,12 @@ static void vma_commit_reservation(struct hstate *h, } static struct page

[PATCH 06/18] mm, hugetlb: remove vma_need_reservation()

2013-07-28 Thread Joonsoo Kim
vma_need_reservation() can be replaced by vma_has_reserves() with a minor change. These functions do almost the same thing, so unifying them makes maintenance easier. Signed-off-by: Joonsoo Kim diff --git a/mm/hugetlb.c b/mm/hugetlb.c index bf2ee11..ff46a2c 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c

[PATCH 09/18] mm, hugetlb: unify has_reserve and avoid_reserve to use_reserve

2013-07-28 Thread Joonsoo Kim
Currently, we have two variables that represent whether we can use a reserved page or not: has_reserve and avoid_reserve. These carry the same information, so we can unify them into use_reserve. This makes no functional difference; it is just a clean-up. Signed-off-by: Joonsoo Kim diff --git a/mm

[PATCH 07/18] mm, hugetlb: pass has_reserve to dequeue_huge_page_vma()

2013-07-28 Thread Joonsoo Kim
We don't have to call vma_has_reserve() each time we need this information. Passing has_reserve removes that burden. Signed-off-by: Joonsoo Kim diff --git a/mm/hugetlb.c b/mm/hugetlb.c index ff46a2c..1426c03 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -572,7 +572,8 @@ static struct

[PATCH 03/18] mm, hugetlb: unify region structure handling

2013-07-28 Thread Joonsoo Kim
re to fine grained lock, and this difference hinder it. So, before changing it, unify region structure handling. Signed-off-by: Joonsoo Kim diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index a3f868a..a1ae3ada 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -366,7 +3

[PATCH v3 2/9] mm, hugetlb: trivial commenting fix

2013-07-28 Thread Joonsoo Kim
The name of the mutex written in comment is wrong. Fix it. Acked-by: Michal Hocko Acked-by: Hillf Danton Reviewed-by: Aneesh Kumar K.V Signed-off-by: Joonsoo Kim diff --git a/mm/hugetlb.c b/mm/hugetlb.c index fc4988c..51564a8 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -135,9 +135,9

[PATCH 08/18] mm, hugetlb: do hugepage_subpool_get_pages() when avoid_reserve

2013-07-28 Thread Joonsoo Kim
When we try to get a huge page with avoid_reserve, we don't consume a reserved page. So it is treated like the non-reserve case. Signed-off-by: Joonsoo Kim diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 1426c03..749629e 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -1149,12 +1149,13 @@ s

Re: [PATCH 0/2] hugepage: optimize page fault path locking

2013-07-28 Thread Joonsoo Kim
On Fri, Jul 26, 2013 at 07:27:23AM -0700, Davidlohr Bueso wrote: > This patchset attempts to reduce the amount of contention we impose > on the hugetlb_instantiation_mutex by replacing the global mutex with > a table of mutexes, selected based on a hash. The original discussion can > be found here

Re: [PATCH 01/18] mm, hugetlb: protect reserved pages when softofflining requests the pages

2013-07-30 Thread Joonsoo Kim
On Mon, Jul 29, 2013 at 03:24:46PM +0800, Hillf Danton wrote: > On Mon, Jul 29, 2013 at 1:31 PM, Joonsoo Kim wrote: > > alloc_huge_page_node() use dequeue_huge_page_node() without > > any validation check, so it can steal reserved page unconditionally. > > Well, why is it il

Re: [PATCH 03/18] mm, hugetlb: unify region structure handling

2013-07-30 Thread Joonsoo Kim
On Tue, Jul 30, 2013 at 10:57:37PM +0530, Aneesh Kumar K.V wrote: > Joonsoo Kim writes: > > > Currently, to track a reserved and allocated region, we use two different > > ways for MAP_SHARED and MAP_PRIVATE. For MAP_SHARED, we use > > address_mapping's private_list

Re: [PATCH 05/18] mm, hugetlb: protect region tracking via newly introduced resv_map lock

2013-07-30 Thread Joonsoo Kim
On Mon, Jul 29, 2013 at 04:58:57PM +0800, Hillf Danton wrote: > On Mon, Jul 29, 2013 at 1:31 PM, Joonsoo Kim wrote: > > There is a race condition if we map a same file on different processes. > > Region tracking is protected by mmap_sem and hugetlb_instantiation_mutex. > >

Re: [PATCH 05/18] mm, hugetlb: protect region tracking via newly introduced resv_map lock

2013-07-30 Thread Joonsoo Kim
On Mon, Jul 29, 2013 at 11:53:05AM -0700, Davidlohr Bueso wrote: > On Mon, 2013-07-29 at 14:31 +0900, Joonsoo Kim wrote: > > There is a race condition if we map a same file on different processes. > > Region tracking is protected by mmap_sem and hugetlb_instantiation_mutex. > >

Re: [PATCH 01/18] mm, hugetlb: protect reserved pages when softofflining requests the pages

2013-07-30 Thread Joonsoo Kim
On Wed, Jul 31, 2013 at 10:49:24AM +0800, Hillf Danton wrote: > On Wed, Jul 31, 2013 at 10:27 AM, Joonsoo Kim wrote: > > On Mon, Jul 29, 2013 at 03:24:46PM +0800, Hillf Danton wrote: > >> On Mon, Jul 29, 2013 at 1:31 PM, Joonsoo Kim > >> wrote: >

Re: [PATCH 06/18] mm, hugetlb: remove vma_need_reservation()

2013-07-30 Thread Joonsoo Kim
Hello, Naoya. On Mon, Jul 29, 2013 at 01:52:52PM -0400, Naoya Horiguchi wrote: > Hi, > > On Mon, Jul 29, 2013 at 02:31:57PM +0900, Joonsoo Kim wrote: > > vma_need_reservation() can be substituted by vma_has_reserves() > > with minor change. These function do almost same t

Re: [PATCH 06/18] mm, hugetlb: remove vma_need_reservation()

2013-07-30 Thread Joonsoo Kim
On Tue, Jul 30, 2013 at 11:19:58PM +0530, Aneesh Kumar K.V wrote: > Joonsoo Kim writes: > > > vma_need_reservation() can be substituted by vma_has_reserves() > > with minor change. These function do almost same thing, > > so unifying them is better to maintain. >

Re: [PATCH 08/18] mm, hugetlb: do hugepage_subpool_get_pages() when avoid_reserve

2013-07-30 Thread Joonsoo Kim
On Mon, Jul 29, 2013 at 02:05:51PM -0400, Naoya Horiguchi wrote: > On Mon, Jul 29, 2013 at 02:31:59PM +0900, Joonsoo Kim wrote: > > When we try to get a huge page with avoid_reserve, we don't consume > > a reserved page. So it is treated like as non-reserve case. > > Thi

Re: [PATCH 10/18] mm, hugetlb: call vma_has_reserve() before entering alloc_huge_page()

2013-07-30 Thread Joonsoo Kim
On Mon, Jul 29, 2013 at 02:27:54PM -0400, Naoya Horiguchi wrote: > On Mon, Jul 29, 2013 at 02:32:01PM +0900, Joonsoo Kim wrote: > > To implement a graceful failure handling, we need to know whether > > allocation request is for reserved pool or not, on higher level. > > In thi

Re: [PATCH 11/18] mm, hugetlb: move down outside_reserve check

2013-07-30 Thread Joonsoo Kim
On Mon, Jul 29, 2013 at 02:39:30PM -0400, Naoya Horiguchi wrote: > On Mon, Jul 29, 2013 at 02:32:02PM +0900, Joonsoo Kim wrote: > > Just move down outsider_reserve check. > > This makes code more readable. > > > > There is no functional change. > > Why don'

Re: [PATCH 15/18] mm, hugetlb: move up anon_vma_prepare()

2013-07-30 Thread Joonsoo Kim
On Mon, Jul 29, 2013 at 03:19:15PM -0400, Naoya Horiguchi wrote: > On Mon, Jul 29, 2013 at 03:05:37PM -0400, Naoya Horiguchi wrote: > > On Mon, Jul 29, 2013 at 02:32:06PM +0900, Joonsoo Kim wrote: > > > If we fail with a allocated hugepage, it is hard to recover properly. >

Re: [PATCH 16/18] mm, hugetlb: return a reserved page to a reserved pool if failed

2013-07-30 Thread Joonsoo Kim
On Mon, Jul 29, 2013 at 04:19:10PM -0400, Naoya Horiguchi wrote: > On Mon, Jul 29, 2013 at 02:32:07PM +0900, Joonsoo Kim wrote: > > If we fail with a reserved page, just calling put_page() is not sufficient, > > because put_page() invoke free_huge_page() at last step and it d

Re: [PATCH 17/18] mm, hugetlb: retry if we fail to allocate a hugepage with use_reserve

2013-07-30 Thread Joonsoo Kim
Hello, David. On Mon, Jul 29, 2013 at 05:28:23PM +1000, David Gibson wrote: > On Mon, Jul 29, 2013 at 02:32:08PM +0900, Joonsoo Kim wrote: > > If parallel fault occur, we can fail to allocate a hugepage, > > because many threads dequeue a hugepage to handle a fault of same address.

Re: [PATCH v3 6/9] mm, hugetlb: do not use a page in page cache for cow optimization

2013-07-30 Thread Joonsoo Kim
On Tue, Jul 30, 2013 at 08:37:08AM +1000, David Gibson wrote: > On Mon, Jul 29, 2013 at 02:28:18PM +0900, Joonsoo Kim wrote: > > Currently, we use a page with mapped count 1 in page cache for cow > > optimization. If we find this condition, we don't allocate a new >

Re: [PATCH 01/18] mm, hugetlb: protect reserved pages when softofflining requests the pages

2013-07-30 Thread Joonsoo Kim
On Wed, Jul 31, 2013 at 02:21:38PM +0800, Hillf Danton wrote: > On Wed, Jul 31, 2013 at 12:41 PM, Joonsoo Kim wrote: > > On Wed, Jul 31, 2013 at 10:49:24AM +0800, Hillf Danton wrote: > >> On Wed, Jul 31, 2013 at 10:27 AM, Joonsoo Kim > >> wrote: > >> > O

Re: [PATCH 01/18] mm, hugetlb: protect reserved pages when softofflining requests the pages

2013-07-31 Thread Joonsoo Kim
On Wed, Jul 31, 2013 at 11:25:04PM +0800, Hillf Danton wrote: > On Wed, Jul 31, 2013 at 2:37 PM, Joonsoo Kim wrote: > > On Wed, Jul 31, 2013 at 02:21:38PM +0800, Hillf Danton wrote: > >> On Wed, Jul 31, 2013 at 12:41 PM, Joonsoo Kim > >> wrote: > >> > O

[PATCH v2 2/3] sched: factor out code to should_we_balance()

2013-08-01 Thread Joonsoo Kim
354958aa7 kernel/sched/fair.o In addition, rename @balance to @should_balance in order to represent its purpose more clearly. Signed-off-by: Joonsoo Kim diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index eaae77e..7f51b8c 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c

[PATCH v2 3/3] sched: clean-up struct sd_lb_stat

2013-08-01 Thread Joonsoo Kim
d-off-by: Joonsoo Kim diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 7f51b8c..f72ee7d 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -4195,36 +4195,6 @@ static unsigned long task_h_load(struct task_struct *p) /** Helpers for find_busiest_

[PATCH v2 1/3] sched: remove one division operation in find_buiest_queue()

2013-08-01 Thread Joonsoo Kim
Remove one division operation in find_busiest_queue(). Signed-off-by: Joonsoo Kim diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index c61a614..eaae77e 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -4931,7 +4931,7 @@ static struct rq *find_busiest_queue(struct lb_env *env

[PATCH v2 0/3] optimization, clean-up about fair.c

2013-08-01 Thread Joonsoo Kim
an be compiled properly. * Change from v1 Remove 2 patches, because I cannot sure they are right. Joonsoo Kim (3): sched: remove one division operation in find_buiest_queue() sched: factor out code to should_we_balance() sched: clean-up struct sd_lb_stat kernel/sched/fair.c |

[PATCH 1/2] mm, vmalloc: remove useless variable in vmap_block

2013-08-01 Thread Joonsoo Kim
vbq in vmap_block isn't used. So remove it. Signed-off-by: Joonsoo Kim diff --git a/mm/vmalloc.c b/mm/vmalloc.c index 13a5495..d23c432 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -752,7 +752,6 @@ struct vmap_block_queue { struct vmap_block { spinlock_t lock; s

[PATCH 2/2] mm, vmalloc: use well-defined find_last_bit() func

2013-08-01 Thread Joonsoo Kim
Our intention here is to find the last bit within the region to flush. There is a well-defined function, find_last_bit(), for this purpose, and its performance may be slightly better than the current implementation. So change it. Signed-off-by: Joonsoo Kim diff --git a/mm/vmalloc.c b/mm/vmalloc.c

[PATCH] mm, slab_common: add 'unlikely' to size check of kmalloc_slab()

2013-08-01 Thread Joonsoo Kim
Size is usually below KMALLOC_MAX_SIZE. If we add an 'unlikely' macro, the compiler can generate better code. Signed-off-by: Joonsoo Kim diff --git a/mm/slab_common.c b/mm/slab_common.c index 538bade..f0410eb 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -373,7 +373,7 @@ struct

[PATCH 4/4] swap: clean-up #ifdef in page_mapping()

2013-08-01 Thread Joonsoo Kim
PageSwapCache() is always false when !CONFIG_SWAP, so the compiler properly discards the related code. Therefore, we don't need an explicit #ifdef. Signed-off-by: Joonsoo Kim diff --git a/include/linux/swap.h b/include/linux/swap.h index d95cde5..c638a71 100644 --- a/include/linux/swap.h +++ b/in

[PATCH 3/4] mm: move pgtable related functions to right place

2013-08-01 Thread Joonsoo Kim
pgtable related functions are mostly in pgtable-generic.c. So move remaining functions from memory.c to pgtable-generic.c. Signed-off-by: Joonsoo Kim diff --git a/mm/memory.c b/mm/memory.c index 1ce2e2a..26bce51 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -374,30 +374,6 @@ void

[PATCH 1/4] mm, page_alloc: add likely macro to help compiler optimization

2013-08-01 Thread Joonsoo Kim
We rarely allocate a page with ALLOC_NO_WATERMARKS, and it is used in the slow path. To make the fast path faster, add a likely macro to help compiler optimization. Signed-off-by: Joonsoo Kim diff --git a/mm/page_alloc.c b/mm/page_alloc.c index b100255..86ad44b 100644 --- a/mm/page_alloc.c +++ b

[PATCH 2/4] mm, migrate: allocation new page lazyily in unmap_and_move()

2013-08-01 Thread Joonsoo Kim
If some condition is met, we bail out immediately and never need the new page. Allocation is costly compared with a condition check, so allocating lazily is the preferable solution. Signed-off-by: Joonsoo Kim diff --git a/mm/migrate.c b/mm/migrate.c index 6f0c244..86db87e 100644

Re: [PATCH v2 2/3] sched: factor out code to should_we_balance()

2013-08-02 Thread Joonsoo Kim
On Fri, Aug 02, 2013 at 09:51:45AM +0200, Vincent Guittot wrote: > On 2 August 2013 03:50, Joonsoo Kim wrote: > > Now checking whether this cpu is appropriate to balance or not > > is embedded into update_sg_lb_stats() and this checking has no direct > > relationship to th

Re: [PATCH v2 2/3] sched: factor out code to should_we_balance()

2013-08-05 Thread Joonsoo Kim
On Fri, Aug 02, 2013 at 12:20:40PM +0200, Peter Zijlstra wrote: > On Fri, Aug 02, 2013 at 06:05:51PM +0900, 김준수 wrote: > > What is with you people; have you never learned to trim emails? > > Seriously, I'm going to write a script which tests to too many quoted > lines, too many nested quotes an

Re: [PATCH v2 2/3] sched: factor out code to should_we_balance()

2013-08-05 Thread Joonsoo Kim
On Mon, Aug 05, 2013 at 09:52:28AM +0530, Preeti U Murthy wrote: > On 08/02/2013 04:02 PM, Peter Zijlstra wrote: > > On Fri, Aug 02, 2013 at 02:56:14PM +0530, Preeti U Murthy wrote: > You need to iterate over all the groups of the sched domain env->sd and > not just the first group of env

Re: [PATCH v2 3/3] sched: clean-up struct sd_lb_stat

2013-08-05 Thread Joonsoo Kim
> > + if (busiest->group_imb) { > > + busiest->sum_weighted_load = > > + min(busiest->sum_weighted_load, sds->sd_avg_load); > > Right here we get confused as to why the total load is being compared > against load per task (although you are changing it to load per task

Re: [PATCH 17/18] mm, hugetlb: retry if we fail to allocate a hugepage with use_reserve

2013-08-05 Thread Joonsoo Kim
> Any mapping that doesn't use the reserved pool, not just > MAP_NORESERVE. For example, if a process makes a MAP_PRIVATE mapping, > then fork()s then the mapping is instantiated in the child, that will > not draw from the reserved pool. > > > Should we ensure them to allocate the last hugepage?

Re: [PATCH 2/4] mm, migrate: allocation new page lazyily in unmap_and_move()

2013-08-05 Thread Joonsoo Kim
> get_new_page() sets up result to communicate error codes from the > following checks. While the existing ones (page freed and thp split > failed) don't change rc, somebody else might add a condition whose > error code should be propagated back into *result but miss it. > > Please leave get_new_

Re: [PATCH 1/4] mm, page_alloc: add likely macro to help compiler optimization

2013-08-05 Thread Joonsoo Kim
Hello, Michal. On Fri, Aug 02, 2013 at 11:36:07PM +0200, Michal Hocko wrote: > On Fri 02-08-13 16:47:10, Johannes Weiner wrote: > > On Fri, Aug 02, 2013 at 06:27:22PM +0200, Michal Hocko wrote: > > > On Fri 02-08-13 11:07:56, Joonsoo Kim wrote: > > > >

Re: [PATCH 1/4] mm, page_alloc: add likely macro to help compiler optimization

2013-08-05 Thread Joonsoo Kim
On Mon, Aug 05, 2013 at 05:10:08PM +0900, Joonsoo Kim wrote: > Hello, Michal. > > On Fri, Aug 02, 2013 at 11:36:07PM +0200, Michal Hocko wrote: > > On Fri 02-08-13 16:47:10, Johannes Weiner wrote: > > > On Fri, Aug 02, 2013 at 06:27:22PM +0200, Michal Hocko wrote: > >

Re: [PATCH 04/10] sched, fair: Shrink sg_lb_stats and play memset games

2013-08-20 Thread Joonsoo Kim
On Mon, Aug 19, 2013 at 06:01:02PM +0200, Peter Zijlstra wrote: > +static inline void init_sd_lb_stats(struct sd_lb_stats *sds) > +{ > + /* > + * struct sd_lb_stats { > + * struct sched_group * busiest; // 0 8 > + * struct sched_group *

Re: [PATCH 00/10] Various load-balance cleanups/optimizations -v2

2013-08-20 Thread Joonsoo Kim
On Mon, Aug 19, 2013 at 06:00:58PM +0200, Peter Zijlstra wrote: > > After poking at them a little more I feel somewhat more confident. > > I found one more bug, but this one was my own fault, we should also clear > sds->busiest_stat.avg_load because update_sd_pick_busiest() reads that before > we

Re: [PATCH 04/10] sched, fair: Shrink sg_lb_stats and play memset games

2013-08-20 Thread Joonsoo Kim
On Wed, Aug 21, 2013 at 11:08:29AM +0900, Joonsoo Kim wrote: > On Mon, Aug 19, 2013 at 06:01:02PM +0200, Peter Zijlstra wrote: > > +static inline void init_sd_lb_stats(struct sd_lb_stats *sds) > > +{ > > + /* > > +* struct sd_lb_stats { > > +* s

Re: [PATCH v2 03/20] mm, hugetlb: fix subpool accounting handling

2013-08-21 Thread Joonsoo Kim
Hello, Aneesh. First of all, thank you for the review! On Wed, Aug 21, 2013 at 02:58:20PM +0530, Aneesh Kumar K.V wrote: > Joonsoo Kim writes: > > > If we allocate a hugepage with avoid_reserve, we don't dequeue a reserved one. > > So, we should check subpool counter when avoid_

Re: [PATCH v2 06/20] mm, hugetlb: return a reserved page to a reserved pool if failed

2013-08-21 Thread Joonsoo Kim
On Wed, Aug 21, 2013 at 03:24:13PM +0530, Aneesh Kumar K.V wrote: > Joonsoo Kim writes: > > > If we fail with a reserved page, just calling put_page() is not sufficient, > > because put_page() invokes free_huge_page() at the last step and it doesn't > > know whether a pa

Re: [PATCH v2 07/20] mm, hugetlb: unify region structure handling

2013-08-21 Thread Joonsoo Kim
On Wed, Aug 21, 2013 at 03:52:57PM +0530, Aneesh Kumar K.V wrote: > Joonsoo Kim writes: > > > Currently, to track a reserved and allocated region, we use two different > > ways for MAP_SHARED and MAP_PRIVATE. For MAP_SHARED, we use > > address_mapping's private_list

Re: [PATCH v2 07/20] mm, hugetlb: unify region structure handling

2013-08-21 Thread Joonsoo Kim
On Wed, Aug 21, 2013 at 03:27:38PM +0530, Aneesh Kumar K.V wrote: > Joonsoo Kim writes: > > > Currently, to track a reserved and allocated region, we use two different > > ways for MAP_SHARED and MAP_PRIVATE. For MAP_SHARED, we use > > address_mapping's private_list

Re: [PATCH v2 09/20] mm, hugetlb: protect region tracking via newly introduced resv_map lock

2013-08-22 Thread Joonsoo Kim
On Wed, Aug 21, 2013 at 03:43:27PM +0530, Aneesh Kumar K.V wrote: > > static long region_chg(struct resv_map *resv, long f, long t) > > { > > struct list_head *head = &resv->regions; > > - struct file_region *rg, *nrg; > > + struct file_region *rg, *nrg = NULL; > > long chg = 0; > >

Re: [PATCH v2 10/20] mm, hugetlb: remove resv_map_put()

2013-08-22 Thread Joonsoo Kim
On Wed, Aug 21, 2013 at 04:19:20PM +0530, Aneesh Kumar K.V wrote: > Joonsoo Kim writes: > > > In the following patch, I change vma_resv_map() to return resv_map > > for all cases. This patch prepares for it by removing resv_map_put(), which > > doesn't work properly with

Re: [PATCH v2 11/20] mm, hugetlb: make vma_resv_map() works for all mapping type

2013-08-22 Thread Joonsoo Kim
On Wed, Aug 21, 2013 at 04:07:36PM +0530, Aneesh Kumar K.V wrote: > Joonsoo Kim writes: > > > Until now, we get a resv_map in two ways according to each mapping type. > > This makes the code dirty and unreadable. So unify it. > > > > Signed-off-by: Joonsoo Kim >

Re: [PATCH v2 03/20] mm, hugetlb: fix subpool accounting handling

2013-08-22 Thread Joonsoo Kim
On Thu, Aug 22, 2013 at 12:38:12PM +0530, Aneesh Kumar K.V wrote: > Joonsoo Kim writes: > > > Hello, Aneesh. > > > > First of all, thank you for the review! > > > > On Wed, Aug 21, 2013 at 02:58:20PM +0530, Aneesh Kumar K.V wrote: > >> Joonsoo Kim w
