Re: [PATCH 2/3 v2] Optimize CRC32C calculation with PCLMULQDQ instruction

2013-02-26 Thread Tim Chen
On Tue, 2013-02-26 at 17:54 +0800, Herbert Xu wrote:
> On Thu, Sep 27, 2012 at 03:44:22PM -0700, Tim Chen wrote:
> > This patch adds the crc_pcl function that calculates CRC32C checksum using the
> > PCLMULQDQ instruction on processors that support this feature. This wil
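For reference, here is a minimal bit-at-a-time CRC32C in C. It is not the crc_pcl algorithm, which folds data with carry-less multiplication; it is a slow, portable reference of the kind an accelerated implementation can be checked against. The function name `crc32c_ref` is hypothetical, and the parameterization assumed here (Castagnoli polynomial 0x1EDC6F41 in bit-reflected form 0x82F63B78, initial value 0xFFFFFFFF, final inversion) is the common CRC32C convention.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Castagnoli polynomial 0x1EDC6F41, bit-reflected. */
#define CRC32C_POLY_REFLECTED 0x82F63B78u

/* Hypothetical reference helper: one bit per inner-loop step. */
static uint32_t crc32c_ref(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t crc = 0xFFFFFFFFu;            /* standard CRC32C initial value */

    assert(p != NULL || len == 0);
    while (len--) {
        crc ^= *p++;
        for (int bit = 0; bit < 8; bit++)
            /* XOR in the polynomial only when the low bit is set. */
            crc = (crc >> 1) ^ (CRC32C_POLY_REFLECTED & (0u - (crc & 1u)));
    }
    return ~crc;                           /* final inversion */
}
```

The string "123456789" is the usual check input; a correct CRC32C yields 0xE3069283 for it, so an accelerated routine can be compared against this function on that vector and on random buffers.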

[PATCH] Update the links to the white papers on CRC32C calculations with PCLMULQDQ instructions.

2013-02-21 Thread Tim Chen
Herbert,
The following patch updates the stale link to the referenced CRC32C white paper.
Tim
Signed-off-by: Tim Chen
---
 arch/x86/crypto/crc32c-pcl-intel-asm_64.S | 5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/arch/x86/crypto/crc32c-pcl-intel-asm_64.S

Re: [RFC PATCH 2/2] mm: Batch page_check_references in shrink_page_list sharing the same i_mmap_mutex

2012-09-04 Thread Tim Chen
On Tue, 2012-08-21 at 17:48 -0700, Tim Chen wrote:
>
> Thanks to Matthew's suggestions on improving the patch. Here's the
> updated version. It seems to be sane when I booted my machine up. I
> will put it through more testing when I get a chance.
>
> Tim
> M

Re: [RFC PATCH 2/2] mm: Batch page_check_references in shrink_page_list sharing the same i_mmap_mutex

2012-09-04 Thread Tim Chen
On Tue, 2012-09-04 at 08:21 -0700, Tim Chen wrote:
> On Tue, 2012-08-21 at 17:48 -0700, Tim Chen wrote:
> >
> > Thanks to Matthew's suggestions on improving the patch. Here's the
> > updated version. It seems to be sane when I booted my machine up. I

Re: [PATCH 0/3 v2] mm: Batch page reclamation under shrink_page_list

2012-09-12 Thread Tim Chen
On Wed, 2012-09-12 at 12:27 -0700, Andrew Morton wrote:
>
> That sounds good, although more details on the performance changes
> would be appreciated - after all, that's the entire point of the
> patchset.
>
> And we shouldn't only test for improvements - we should also test for
> degradation.

Re: [PATCH 1/3 v2] mm: Batch unmapping of file mapped pages in shrink_page_list

2012-09-13 Thread Tim Chen
On Tue, 2012-09-11 at 12:05 +0100, Mel Gorman wrote:
>
> One *massive* change here that is not called out in the changelog is that
> the reclaim path now holds the page lock on multiple pages at the same
> time waiting for them to be batch unlocked in __remove_mapping_batch.
> This is suspicious

Re: [PATCH 0/3 v2] mm: Batch page reclamation under shrink_page_list

2012-09-13 Thread Tim Chen
On Tue, 2012-09-11 at 14:36 +0900, Minchan Kim wrote:
>
> If you send next versions, please use git-format-patch --thread style.
> Quote from man
> " If given --thread, git-format-patch will generate In-Reply-To and
> References
> headers to make the second and subsequent patch mail

Re: [PATCH 0/3 v2] Optimize CRC32C calculation using PCLMULQDQ in crc32c-intel module

2012-10-05 Thread Tim Chen
On Fri, 2012-09-28 at 10:57 +0800, Herbert Xu wrote:
> On 2012-9-28 at 10:54 AM, "H. Peter Anvin" wrote:
> >
> > On 09/27/2012 03:44 PM, Tim Chen wrote:
> >>
> >> Version 2
> >> This version of the patch series fixes compilation errors for
> >> 3

[PATCH 1/3] Rename crc32c-intel.c to crc32c-intel_glue.c

2012-09-25 Thread Tim Chen
This patch renames crc32c-intel.c to crc32c-intel_glue.c in preparation for linking with the new crc32c-pcl-intel-asm.S file, which contains an optimized CRC32C calculation based on the PCLMULQDQ instruction.
Tim
Signed-off-by: Tim Chen
---
 arch/x86/crypto/Makefile

[PATCH 2/3] Optimize CRC32C calculation with PCLMULQDQ instruction

2012-09-25 Thread Tim Chen
://download.intel.com/design/intarch/papers/323405.pdf
Tim
Signed-off-by: Tim Chen
---
 arch/x86/crypto/Makefile               |   2 +-
 arch/x86/crypto/crc32c-intel_glue.c    |  75 +
 arch/x86/crypto/crc32c-pcl-intel-asm.S | 460
 3 files changed, 536 insertions(+), 1

[PATCH 3/3] Added speed test in tcrypt for crc32c

2012-09-25 Thread Tim Chen
This patch adds a test case to tcrypt that runs a speed test of the CRC32C checksum calculation.
Tim
Signed-off-by: Tim Chen
---
 crypto/tcrypt.c | 4
 1 files changed, 4 insertions(+), 0 deletions(-)
diff --git a/crypto/tcrypt.c b/crypto/tcrypt.c
index 581081d..6deb77f 100644
--- a/crypto

[PATCH 0/3] Optimize CRC32C calculation using PCLMULQDQ in crc32c-intel module

2012-09-25 Thread Tim Chen
Chen
---
Tim Chen (3):
  Rename crc32c-intel.c to crc32c-intel_glue.c
  Optimize CRC32C calculation with PCLMULQDQ instruction
  Added speed test in tcrypt for crc32c
 arch/x86/crypto/Makefile | 1 +
 .../crypto/{crc32c-intel.c => crc32c-intel_glue.c} | 75 a

[PATCH] Avoid useless inodes and dentries reclamation

2013-08-28 Thread Tim Chen
| alloc_pages_current
Signed-off-by: Tim Chen
---
 fs/super.c | 8
 1 file changed, 8 insertions(+)
diff --git a/fs/super.c b/fs/super.c
index 68307c0..70fa26c 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -53,6 +53,7 @@ static char *sb_writers_name

Re: [PATCH] Avoid useless inodes and dentries reclamation

2013-08-28 Thread Tim Chen
an miss a memory hog easily this
> way.
Is it safe to compute sb->s_op->nr_cached_objects(sb), assuming a non-null s_op, without holding sb_lock to increment the ref count on sb? I think it is safe, as we hold the shrinker_rwsem so we cannot unregister the shrinker and the s_op an

Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file

2013-10-01 Thread Tim Chen
On Mon, 2013-09-30 at 12:36 -0400, Waiman Long wrote:
> On 09/30/2013 12:10 PM, Jason Low wrote:
> > On Mon, 2013-09-30 at 11:51 -0400, Waiman Long wrote:
> >> On 09/28/2013 12:34 AM, Jason Low wrote:
> Also, below is what the mcs_spin_lock() and mcs_spin_unlock()
> functions would look l

Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file

2013-10-01 Thread Tim Chen
On Tue, 2013-10-01 at 16:01 -0400, Waiman Long wrote:
> On 10/01/2013 12:48 PM, Tim Chen wrote:
> > On Mon, 2013-09-30 at 12:36 -0400, Waiman Long wrote:
> >> On 09/30/2013 12:10 PM, Jason Low wrote:
> >>> On Mon, 2013-09-30 at 11:51 -0400, Waiman Long wrote:

Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file

2013-10-02 Thread Tim Chen
On Tue, 2013-10-01 at 21:25 -0400, Waiman Long wrote:
> On 10/01/2013 05:16 PM, Tim Chen wrote:
> > On Tue, 2013-10-01 at 16:01 -0400, Waiman Long wrote:
> >>>
> >>> The cpu could still be executing out of order load instruction from the

[PATCH v8 7/9] MCS Lock: Barrier corrections

2013-10-02 Thread Tim Chen
This patch corrects the way memory barriers are used in the MCS lock and removes the ones that are not needed. It also adds comments on all barriers.
Signed-off-by: Jason Low
---
 include/linux/mcs_spinlock.h | 13 +++--
 1 files changed, 11 insertions(+), 2 deletions(-)
diff --git a/include/li

[PATCH v8 1/9] rwsem: check the lock before cmpxchg in down_write_trylock

2013-10-02 Thread Tim Chen
Doing the value check with cmpxchg causes cacheline bouncing, which creates scalability issues on large machines (such as an 80-core box). Pre-reading the lock can relieve this contention.
Signed-off-by: Alex Shi
---
 include/asm-generic/rwsem.h | 8
 1 files changed, 4 insertions(+), 4
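The pre-read idea is the classic test-and-test-and-set pattern. Below is a minimal sketch with C11 atomics, assuming a hypothetical lock word where 0 means free and 1 means held; it is not the asm-generic rwsem code, which works on the rwsem count, but it shows why a plain load before cmpxchg helps: a load can be served from a shared copy of the cacheline, while even a failing cmpxchg must pull the line into exclusive state.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock word: 0 = free, 1 = held. */
struct ttas_lock {
    atomic_int val;
};

static bool ttas_trylock(struct ttas_lock *lock)
{
    int expected = 0;

    /* Cheap shared read first: if the lock is visibly held, skip the cmpxchg. */
    if (atomic_load_explicit(&lock->val, memory_order_relaxed) != 0)
        return false;
    /* Only now pay for the exclusive-ownership atomic operation. */
    return atomic_compare_exchange_strong_explicit(&lock->val, &expected, 1,
                                                   memory_order_acquire,
                                                   memory_order_relaxed);
}

static void ttas_unlock(struct ttas_lock *lock)
{
    assert(atomic_load_explicit(&lock->val, memory_order_relaxed) == 1);
    atomic_store_explicit(&lock->val, 0, memory_order_release);
}
```

The pre-read can return a stale "free" value, in which case the cmpxchg simply fails; correctness still rests on the cmpxchg, and the load only filters out attempts that are certain to fail.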

[PATCH v8 5/9] MCS Lock: Restructure the MCS lock defines and locking code into its own file

2013-10-02 Thread Tim Chen
We will need the MCS lock code for doing optimistic spinning for rwsem. Extracting the MCS code from mutex.c and putting it into its own file allows us to reuse this code easily for rwsem.
Reviewed-by: Ingo Molnar
Reviewed-by: Peter Zijlstra
Signed-off-by: Tim Chen
Signed-off-by: Davidlohr Bueso

[PATCH v8 4/9] rwsem/wake: check lock before do atomic update

2013-10-02 Thread Tim Chen
Atomically updating the lock and rolling it back causes cache bouncing on large machines. Pre-reading the lock status can relieve this problem.
Suggested-by: Davidlohr Bueso
Suggested-by: Tim Chen
Signed-off-by: Alex Shi
---
 lib/rwsem.c | 8 +++-
 1 files changed, 7 insertions(+), 1 deletions(-)
diff

[PATCH v8 3/9] rwsem: remove try_reader_grant label do_wake

2013-10-02 Thread Tim Chen
This makes the code simpler and more readable.
Signed-off-by: Alex Shi
---
 lib/rwsem.c | 12 +++-
 1 files changed, 7 insertions(+), 5 deletions(-)
diff --git a/lib/rwsem.c b/lib/rwsem.c
index 42f1b1a..a8055cf 100644
--- a/lib/rwsem.c
+++ b/lib/rwsem.c
@@ -85,15 +85,17 @@ __rwsem_do_wake(stru

[PATCH v8 0/9] rwsem performance optimizations

2013-10-02 Thread Tim Chen
mments update
Alex Shi (4):
  rwsem: check the lock before cmpxchg in down_write_trylock
  rwsem: remove 'out' label in do_wake
  rwsem: remove try_reader_grant label do_wake
  rwsem/wake: check lock before do atomic update
Jason Low (2):
  MCS Lock: optimizations and extra comments

[PATCH v8 6/9] MCS Lock: optimizations and extra comments

2013-10-02 Thread Tim Chen
Remove an unnecessary operation, and mark the cmpxchg(lock, node, NULL) == node check in mcs_spin_unlock() as likely(), since most of the time no race occurs. Also add more comments describing how the local node is used in MCS locks.
Signed-off-by: Jason Low
---
 include/linux/mc
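A minimal user-space sketch of the MCS queue lock may help here. It uses C11 atomics rather than the kernel's primitives, and the names mirror those in the thread only for readability; the signatures are assumptions, not the kernel's mcs_spinlock.h. It shows the cmpxchg(lock, node, NULL) fast path in unlock wrapped in likely(), and the barrier placement discussed in the thread: a release store when handing the lock to next->locked, paired with an acquire load by the spinning waiter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* GCC/Clang branch hint, defined the same way as the kernel's likely(). */
#ifndef likely
#define likely(x) __builtin_expect(!!(x), 1)
#endif

struct mcs_node {
    _Atomic(struct mcs_node *) next;  /* successor in the queue, if any */
    atomic_int locked;                /* set to 1 by the predecessor on hand-off */
};

/* The lock is only a pointer to the queue tail; NULL means unlocked. */
typedef _Atomic(struct mcs_node *) mcs_lock_t;

static void mcs_spin_lock(mcs_lock_t *lock, struct mcs_node *node)
{
    struct mcs_node *prev;

    assert(node != NULL);
    atomic_store_explicit(&node->next, NULL, memory_order_relaxed);
    atomic_store_explicit(&node->locked, 0, memory_order_relaxed);

    /* Swap ourselves in as the new tail. */
    prev = atomic_exchange_explicit(lock, node, memory_order_acq_rel);
    if (prev == NULL)
        return;  /* queue was empty: lock acquired */

    /* Link behind the predecessor, then spin only on our own node. */
    atomic_store_explicit(&prev->next, node, memory_order_release);
    while (!atomic_load_explicit(&node->locked, memory_order_acquire))
        ;  /* acquire pairs with the predecessor's release hand-off */
}

static void mcs_spin_unlock(mcs_lock_t *lock, struct mcs_node *node)
{
    struct mcs_node *next = atomic_load_explicit(&node->next, memory_order_acquire);

    if (next == NULL) {
        struct mcs_node *expected = node;

        /* Common case: no waiter raced in, so just clear the tail. */
        if (likely(atomic_compare_exchange_strong_explicit(
                lock, &expected, NULL,
                memory_order_release, memory_order_relaxed)))
            return;
        /* A waiter took the tail but has not linked in yet: wait for it. */
        while ((next = atomic_load_explicit(&node->next, memory_order_acquire)) == NULL)
            ;
    }
    /* Release: our critical section is visible before the successor runs. */
    atomic_store_explicit(&next->locked, 1, memory_order_release);
}
```

Each waiter spins on a field of its own node, so a hand-off touches only the successor's cacheline instead of having every waiter hammer one shared lock word.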

[PATCH v8 8/9] rwsem: do optimistic spinning for writer lock acquisition

2013-10-02 Thread Tim Chen
queue reduces wait queue contention and provides a greater chance for the rwsem to be acquired. With these changes, rwsem is on par with mutex.
Reviewed-by: Ingo Molnar
Reviewed-by: Peter Zijlstra
Reviewed-by: Peter Hurley
Signed-off-by: Tim Chen
Signed-off-by: Davidlohr Bueso
---
 include/
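The idea behind writer optimistic spinning is to keep retrying the lock while its owner is running, instead of immediately queueing and sleeping. A minimal sketch follows, under loudly stated assumptions: the struct, the `owner_running` flag (a stand-in for checking whether the owner is on a CPU), and the `max_spins` bound are all hypothetical simplifications, not the rwsem patch's logic, which spins through an MCS queue.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical writer lock with an "owner is running" hint. */
struct spin_wsem {
    atomic_int writer_held;      /* 0 = free, 1 = held by a writer */
    atomic_bool owner_running;   /* stand-in for "owner is on a CPU" */
};

static bool wsem_trylock(struct spin_wsem *sem)
{
    int expected = 0;

    if (atomic_load_explicit(&sem->writer_held, memory_order_relaxed) != 0)
        return false;
    return atomic_compare_exchange_strong_explicit(&sem->writer_held, &expected, 1,
                                                   memory_order_acquire,
                                                   memory_order_relaxed);
}

/*
 * Retry at most max_spins times while the owner keeps running, since a
 * running owner is likely to release soon. Returns true if the lock was
 * taken by spinning; false means the caller should queue and sleep.
 * A real spin loop would also pause between attempts (cpu_relax() in
 * the kernel).
 */
static bool wsem_optimistic_spin(struct spin_wsem *sem, int max_spins)
{
    assert(max_spins >= 0);
    for (int i = 0; i < max_spins; i++) {
        if (wsem_trylock(sem))
            return true;
        if (!atomic_load_explicit(&sem->owner_running, memory_order_relaxed))
            break;  /* owner is asleep: spinning would only waste CPU */
    }
    return false;
}
```

Giving up as soon as the owner stops running is the key design choice: spinning pays off only when the hold time is short, and a sleeping owner signals that it will not be.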

[PATCH v8 2/9] rwsem: remove 'out' label in do_wake

2013-10-02 Thread Tim Chen
This makes the code simpler and more readable.
Signed-off-by: Alex Shi
---
 lib/rwsem.c | 5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)
diff --git a/lib/rwsem.c b/lib/rwsem.c
index 19c5fa9..42f1b1a 100644
--- a/lib/rwsem.c
+++ b/lib/rwsem.c
@@ -75,7 +75,7 @@ __rwsem_do_wake(struct rw_se

[PATCH v8 9/9] rwsem: reduce spinlock contention in wakeup code path

2013-10-02 Thread Tim Chen
With the 3.12-rc2 kernel, there is sizable spinlock contention on the rwsem wakeup code path when running AIM7's high_systime workload on an 8-socket, 80-core DL980 (HT off), as reported by perf:
  7.64%  reaim  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
         |--41.77%-- rwsem_wake
  1.6

[PATCH 1/2] fs/superblock: Unregister sb shrinker before ->kill_sb()

2013-09-06 Thread Tim Chen
the shrinker before ->kill_sb().
Signed-off-by: Tim Chen
---
 fs/super.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/super.c b/fs/super.c
index 73d0952..b724f35 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -324,10 +324,10 @@ void deactivate_locked_super(stru

Re: [PATCH] Avoid useless inodes and dentries reclamation

2013-09-06 Thread Tim Chen
On Fri, 2013-09-06 at 10:55 +1000, Dave Chinner wrote:
> On Tue, Sep 03, 2013 at 11:38:27AM -0700, Tim Chen wrote:
> > On Sat, 2013-08-31 at 19:00 +1000, Dave Chinner wrote:
> > > On Fri, Aug 30, 2013 at 09:21:34AM -0700, Tim Chen wrote:

[PATCH 2/2] fs/superblock: Avoid locking counting inodes and dentries before reclaiming them

2013-09-06 Thread Tim Chen
ion is safe when we are doing unmount.
Signed-off-by: Tim Chen
---
 fs/super.c | 12
 1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/fs/super.c b/fs/super.c
index b724f35..b5c9fdf 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -112,9 +112,14 @@ static unsigne

Re: [PATCH v8 0/9] rwsem performance optimizations

2013-10-07 Thread Tim Chen
On Thu, 2013-10-03 at 09:32 +0200, Ingo Molnar wrote:
> * Tim Chen wrote:
>
> > For version 8 of the patchset, we included the patch from Waiman to
> > streamline wakeup operations and also optimize the MCS lock used in
> > rwsem and mutex.
>
> I'd be fe

Re: [PATCH v8 5/9] MCS Lock: Restructure the MCS lock defines and locking code into its own file

2013-10-08 Thread Tim Chen
On Tue, 2013-10-08 at 16:51 -0300, Rafael Aquini wrote:
> On Wed, Oct 02, 2013 at 03:38:32PM -0700, Tim Chen wrote:
> > We will need the MCS lock code for doing optimistic spinning for rwsem.
> > Extracting the MCS code from mutex.c and put into its own file allow us
> > to r

Re: [PATCH v8 0/9] rwsem performance optimizations

2013-10-09 Thread Tim Chen
On Wed, 2013-10-09 at 08:15 +0200, Ingo Molnar wrote:
> * Tim Chen wrote:
>
> > Ingo,
> >
> > I ran the vanilla kernel, the kernel with all rwsem patches and the
> > kernel with all patches except the optimistic spin one. I am listing
> > two presenta

Re: [PATCH] Avoid useless inodes and dentries reclamation

2013-08-29 Thread Tim Chen
> > Signed-off-by: Tim Chen
> > ---
> > fs/super.c | 8
> > 1 file changed, 8 insertions(+)
> >
> > diff --git a/fs/super.c b/fs/super.c
> > index 68307c0..70fa26c 100644
> > --- a/fs/super.c
> > +++ b/fs/super.c
> > @@

Re: [PATCH] Avoid useless inodes and dentries reclamation

2013-08-30 Thread Tim Chen
e grab_super_passive from the super_cache_count code. That should remove the bottleneck in reclamation.
Thanks for your detailed explanation.
Tim
Signed-off-by: Tim Chen
---
diff --git a/fs/super.c b/fs/super.c
index 73d0952..4df1fab 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -112,9 +112,6 @@ s

Re: [PATCH] Avoid useless inodes and dentries reclamation

2013-09-03 Thread Tim Chen
On Sat, 2013-08-31 at 19:00 +1000, Dave Chinner wrote:
> On Fri, Aug 30, 2013 at 09:21:34AM -0700, Tim Chen wrote:
> >
> > Signed-off-by: Tim Chen
> > ---
> > diff --git a/fs/super.c b/fs/super.c
> > index 73d0952..4df1fab 100644
> > --- a/fs/supe

Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file

2013-09-26 Thread Tim Chen
On Thu, 2013-09-26 at 10:40 +0200, Peter Zijlstra wrote:
> On Thu, Sep 26, 2013 at 08:46:29AM +0200, Ingo Molnar wrote:
> > > +/*
> > > + * MCS lock defines
> > > + *
> > > + * This file contains the main data structure and API definitions of MCS
> > > lock.
> >
> > A (very) short blurb about wha

Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file

2013-09-26 Thread Tim Chen
t 12:27 -0700, Jason Low wrote:
> > > > > On Wed, Sep 25, 2013 at 3:10 PM, Tim Chen
> > > > > wrote:
> > > > > > We will need the MCS lock code for doing optimistic spinning for
> > > > > > rwsem.
> > > > > > Extracting t

[PATCH v7 6/6] rwsem: do optimistic spinning for writer lock acquisition

2013-09-26 Thread Tim Chen
queue reduces wait queue contention and provides a greater chance for the rwsem to be acquired. With these changes, rwsem is on par with mutex.
Reviewed-by: Ingo Molnar
Reviewed-by: Peter Zijlstra
Reviewed-by: Peter Hurley
Signed-off-by: Tim Chen
Signed-off-by: Davidlohr Bueso
---
 include/

[PATCH v7 0/6] rwsem: performance optimizations

2013-09-26 Thread Tim Chen
ared and high_systime workloads when he switched i_mmap_mutex to rwsem. Tests were run on an 8-socket, 80-core system. With the patchset, he got significant improvements on the aim7 suite instead of regressions: alltests (+16.3%), custom (+20%), disk (+19.5%), high_systime (+7%), shared (+18.4%) and short (+

[PATCH v7 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file

2013-09-26 Thread Tim Chen
We will need the MCS lock code for doing optimistic spinning for rwsem. Extracting the MCS code from mutex.c and putting it into its own file allows us to reuse this code easily for rwsem.
Reviewed-by: Ingo Molnar
Reviewed-by: Peter Zijlstra
Signed-off-by: Tim Chen
Signed-off-by: Davidlohr Bueso

[PATCH v7 2/6] rwsem: remove 'out' label in do_wake

2013-09-26 Thread Tim Chen
This makes the code simpler and more readable.
Signed-off-by: Alex Shi
---
 lib/rwsem.c | 5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)
diff --git a/lib/rwsem.c b/lib/rwsem.c
index 19c5fa9..42f1b1a 100644
--- a/lib/rwsem.c
+++ b/lib/rwsem.c
@@ -75,7 +75,7 @@ __rwsem_do_wake(struct rw_se

[PATCH v7 1/6] rwsem: check the lock before cmpxchg in down_write_trylock

2013-09-26 Thread Tim Chen
Doing the value check with cmpxchg causes cacheline bouncing, which creates scalability issues on large machines (such as an 80-core box). Pre-reading the lock can relieve this contention.
Signed-off-by: Alex Shi
---
 include/asm-generic/rwsem.h | 8
 1 files changed, 4 insertions(+), 4

[PATCH v7 4/6] rwsem/wake: check lock before do atomic update

2013-09-26 Thread Tim Chen
Atomically updating the lock and rolling it back causes cache bouncing on large machines. Pre-reading the lock status can relieve this problem.
Suggested-by: Davidlohr Bueso
Suggested-by: Tim Chen
Signed-off-by: Alex Shi
---
 lib/rwsem.c | 8 +++-
 1 files changed, 7 insertions(+), 1 deletions(-)
diff

[PATCH v7 3/6] rwsem: remove try_reader_grant label do_wake

2013-09-26 Thread Tim Chen
This makes the code simpler and more readable.
Signed-off-by: Alex Shi
---
 lib/rwsem.c | 12 +++-
 1 files changed, 7 insertions(+), 5 deletions(-)
diff --git a/lib/rwsem.c b/lib/rwsem.c
index 42f1b1a..a8055cf 100644
--- a/lib/rwsem.c
+++ b/lib/rwsem.c
@@ -85,15 +85,17 @@ __rwsem_do_wake(stru

Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file

2013-09-26 Thread Tim Chen
On Thu, 2013-09-26 at 15:42 -0700, Jason Low wrote:
> On Thu, 2013-09-26 at 14:41 -0700, Tim Chen wrote:
> > On Thu, 2013-09-26 at 14:09 -0700, Jason Low wrote:
> > > On Thu, 2013-09-26 at 13:40 -0700, Davidlohr Bueso wrote:
> > > > On Thu, 2013-09-26 at 13:23 -0700,

Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file

2013-09-27 Thread Tim Chen
On Fri, 2013-09-27 at 09:12 -0700, Jason Low wrote:
> On Fri, 2013-09-27 at 08:02 +0200, Ingo Molnar wrote:
> > Would be nice to have this as a separate, add-on patch. Every single
> > instruction removal that has no downside is an upside!
>
> Okay, so here is a patch. Tim, would you like to add

Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file

2013-09-27 Thread Tim Chen
On Fri, 2013-09-27 at 08:29 -0700, Paul E. McKenney wrote:
> On Wed, Sep 25, 2013 at 03:10:49PM -0700, Tim Chen wrote:
> > We will need the MCS lock code for doing optimistic spinning for rwsem.
> > Extracting the MCS code from mutex.c and put into its own file allow us

Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file

2013-09-27 Thread Tim Chen
On Fri, 2013-09-27 at 08:29 -0700, Paul E. McKenney wrote:
> On Wed, Sep 25, 2013 at 03:10:49PM -0700, Tim Chen wrote:
> > We will need the MCS lock code for doing optimistic spinning for rwsem.
> > Extracting the MCS code from mutex.c and put into its own file allow us

Re: [PATCH] rwsem: reduce spinlock contention in wakeup code path

2013-09-27 Thread Tim Chen
On Fri, 2013-09-27 at 12:39 -0700, Davidlohr Bueso wrote:
> On Fri, 2013-09-27 at 12:28 -0700, Linus Torvalds wrote:
> > On Fri, Sep 27, 2013 at 12:00 PM, Waiman Long wrote:
> > >
> > > On a large NUMA machine, it is entirely possible that a fairly large
> > > number of threads are queuing up in t

Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file

2013-09-27 Thread Tim Chen
On Fri, 2013-09-27 at 13:38 -0700, Paul E. McKenney wrote:
> On Fri, Sep 27, 2013 at 12:38:53PM -0700, Tim Chen wrote:
> > On Fri, 2013-09-27 at 08:29 -0700, Paul E. McKenney wrote:
> > > On Wed, Sep 25, 2013 at 03:10:49PM -0700, Tim Chen wrote:
> > > > We will ne

Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file

2013-09-30 Thread Tim Chen
see the effects of the
> > previous lock holder's critical section." And in the mcs_spin_unlock(),
> > move the
> > memory barrier so that it is before the "ACCESS_ONCE(next->locked) = 1;".
> >
> > Signed-off-by: Jason Low
> > Signed-off-by: Paul E. McKe

Re: [PATCH, v2] anon_vmas: Convert the rwsem to an rwlock_t

2013-09-30 Thread Tim Chen
On Sat, 2013-09-28 at 21:52 +0200, Ingo Molnar wrote:
> * Linus Torvalds wrote:
>
> > On Sat, Sep 28, 2013 at 12:37 PM, Ingo Molnar wrote:
> > >
> > > - down_write_nest_lock(&anon_vma->root->rwsem,
> > > &mm->mmap_sem);
> > > + down_write_nest_lock(&anon_vma->root->r

Re: [PATCH, v2] anon_vmas: Convert the rwsem to an rwlock_t

2013-09-30 Thread Tim Chen
On Mon, 2013-09-30 at 20:14 +0200, Peter Zijlstra wrote:
> On Mon, Sep 30, 2013 at 10:10:27AM -0700, Tim Chen wrote:
> > Here's the exim workload data:
> >
> > rwsem improvement:
> > Waiman's patch: +2.0%
> > Alex+Tim's patchset: +4.8%

Re: [PATCH, v2] anon_vmas: Convert the rwsem to an rwlock_t

2013-09-30 Thread Tim Chen
On Mon, 2013-09-30 at 15:35 -0400, Waiman Long wrote:
> On 09/30/2013 03:23 PM, Tim Chen wrote:
> > On Mon, 2013-09-30 at 20:14 +0200, Peter Zijlstra wrote:
> >> On Mon, Sep 30, 2013 at 10:10:27AM -0700, Tim Chen wrote:
> >>> Here's the exim workload

Re: [PATCH, v2] anon_vmas: Convert the rwsem to an rwlock_t

2013-09-30 Thread Tim Chen
On Mon, 2013-09-30 at 12:47 -0700, Tim Chen wrote:
> On Mon, 2013-09-30 at 15:35 -0400, Waiman Long wrote:
> > On 09/30/2013 03:23 PM, Tim Chen wrote:
> > > On Mon, 2013-09-30 at 20:14 +0200, Peter Zijlstra wrote:
> > >> On Mon, Sep 30, 2013 at 10:10:27AM -0700, Tim

[PATCH v5 0/6] rwsem: performance optimizations

2013-09-24 Thread Tim Chen
) and short (+6.3%). Tim Chen also got a +5% improvement on the exim mail server workload on a 40-core system.
Thanks to Ingo Molnar, Peter Hurley and Peter Zijlstra for reviewing this patchset.
Regards,
Tim Chen
Changelog:
v5:
1. Try optimistic spinning before we put the writer on the wait que

[PATCH v5 3/6] rwsem: remove try_reader_grant label do_wake

2013-09-24 Thread Tim Chen
This makes the code simpler and more readable.
Signed-off-by: Alex Shi
---
 lib/rwsem.c | 12 +++-
 1 files changed, 7 insertions(+), 5 deletions(-)
diff --git a/lib/rwsem.c b/lib/rwsem.c
index 42f1b1a..a8055cf 100644
--- a/lib/rwsem.c
+++ b/lib/rwsem.c
@@ -85,15 +85,17 @@ __rwsem_do_wake(stru

[PATCH v5 4/6] rwsem/wake: check lock before do atomic update

2013-09-24 Thread Tim Chen
Atomically updating the lock and rolling it back causes cache bouncing on large machines. Pre-reading the lock status can relieve this problem.
Suggested-by: Davidlohr Bueso
Suggested-by: Tim Chen
Signed-off-by: Alex Shi
---
 lib/rwsem.c | 8 +++-
 1 files changed, 7 insertions(+), 1 deletions(-)
diff

[PATCH v5 6/6] rwsem: do optimistic spinning for writer lock acquisition

2013-09-24 Thread Tim Chen
queue reduces wait queue contention and provides a greater chance for the rwsem to be acquired. With these changes, rwsem is on par with mutex.
Reviewed-by: Peter Zijlstra
Reviewed-by: Peter Hurley
Reviewed-by: Ingo Molnar
Signed-off-by: Tim Chen
Signed-off-by: Davidlohr Bueso
---
 include/

[PATCH v5 2/6] rwsem: remove 'out' label in do_wake

2013-09-24 Thread Tim Chen
This makes the code simpler and more readable.
Signed-off-by: Alex Shi
---
 lib/rwsem.c | 5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)
diff --git a/lib/rwsem.c b/lib/rwsem.c
index 19c5fa9..42f1b1a 100644
--- a/lib/rwsem.c
+++ b/lib/rwsem.c
@@ -75,7 +75,7 @@ __rwsem_do_wake(struct rw_se

[PATCH v5 1/6] rwsem: check the lock before cmpxchg in down_write_trylock

2013-09-24 Thread Tim Chen
Doing the value check with cmpxchg causes cacheline bouncing, which creates scalability issues on large machines (such as an 80-core box). Pre-reading the lock can relieve this contention.
Signed-off-by: Alex Shi
---
 include/asm-generic/rwsem.h | 8
 1 files changed, 4 insertions(+), 4

[PATCH v5 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file

2013-09-24 Thread Tim Chen
We will need the MCS lock code for doing optimistic spinning for rwsem. Extracting the MCS code from mutex.c and putting it into its own file allows us to reuse this code easily for rwsem.
Signed-off-by: Tim Chen
Signed-off-by: Davidlohr Bueso
---
 kernel/mutex.c | 58

Re: [PATCH v5 1/6] rwsem: check the lock before cmpxchg in down_write_trylock

2013-09-24 Thread Tim Chen
On Tue, 2013-09-24 at 16:22 -0700, Jason Low wrote:
> Should we do something similar with __down_read_trylock, such as
> the following?
>
>
> Signed-off-by: Jason Low
> ---
> include/asm-generic/rwsem.h | 3 +++
> 1 files changed, 3 insertions(+), 0 deletions(-)
>
> diff --git a/include/asm

Re: [PATCH v5 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file

2013-09-25 Thread Tim Chen
On Wed, 2013-09-25 at 07:55 +0200, Peter Zijlstra wrote:
> On Tue, Sep 24, 2013 at 03:22:46PM -0700, Tim Chen wrote:
> > We will need the MCS lock code for doing optimistic spinning for rwsem.
> > Extracting the MCS code from mutex.c and put into its own file allow us
> > to r

[PATCH v6 0/6] rwsem: performance optimizations

2013-09-25 Thread Tim Chen
rwsem. Tests were run on an 8-socket, 80-core system. With the patchset, he got significant improvements on the aim7 suite instead of regressions: alltests (+16.3%), custom (+20%), disk (+19.5%), high_systime (+7%), shared (+18.4%) and short (+6.3%). Tim Chen also got a +5% improvement on the exim mail se

[PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file

2013-09-25 Thread Tim Chen
We will need the MCS lock code for doing optimistic spinning for rwsem. Extracting the MCS code from mutex.c and putting it into its own file allows us to reuse this code easily for rwsem.
Signed-off-by: Tim Chen
Signed-off-by: Davidlohr Bueso
---
 include/linux/mcslock.h | 58

[PATCH v6 3/6] rwsem: remove try_reader_grant label do_wake

2013-09-25 Thread Tim Chen
This makes the code simpler and more readable.
Signed-off-by: Alex Shi
---
 lib/rwsem.c | 12 +++-
 1 files changed, 7 insertions(+), 5 deletions(-)
diff --git a/lib/rwsem.c b/lib/rwsem.c
index 42f1b1a..a8055cf 100644
--- a/lib/rwsem.c
+++ b/lib/rwsem.c
@@ -85,15 +85,17 @@ __rwsem_do_wake(stru

[PATCH v6 6/6] rwsem: do optimistic spinning for writer lock acquisition

2013-09-25 Thread Tim Chen
queue reduces wait queue contention and provides a greater chance for the rwsem to be acquired. With these changes, rwsem is on par with mutex.
Reviewed-by: Ingo Molnar
Reviewed-by: Peter Zijlstra
Reviewed-by: Peter Hurley
Signed-off-by: Tim Chen
Signed-off-by: Davidlohr Bueso
---
 include/

[PATCH v6 2/6] rwsem: remove 'out' label in do_wake

2013-09-25 Thread Tim Chen
This makes the code simpler and more readable.
Signed-off-by: Alex Shi
---
 lib/rwsem.c | 5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)
diff --git a/lib/rwsem.c b/lib/rwsem.c
index 19c5fa9..42f1b1a 100644
--- a/lib/rwsem.c
+++ b/lib/rwsem.c
@@ -75,7 +75,7 @@ __rwsem_do_wake(struct rw_se

[PATCH v6 4/6] rwsem/wake: check lock before do atomic update

2013-09-25 Thread Tim Chen
Atomically updating the lock and rolling it back causes cache bouncing on large machines. Pre-reading the lock status can relieve this problem.
Suggested-by: Davidlohr Bueso
Suggested-by: Tim Chen
Signed-off-by: Alex Shi
---
 lib/rwsem.c | 8 +++-
 1 files changed, 7 insertions(+), 1 deletions(-)
diff

[PATCH v6 1/6] rwsem: check the lock before cmpxchg in down_write_trylock

2013-09-25 Thread Tim Chen
Doing the value check with cmpxchg causes cacheline bouncing, which creates scalability issues on large machines (such as an 80-core box). Pre-reading the lock can relieve this contention.
Signed-off-by: Alex Shi
---
 include/asm-generic/rwsem.h | 8
 1 files changed, 4 insertions(+), 4

[PATCH 2/3 v2] mm: Reorg code to allow i_mmap_mutex acquisition to be done by caller of page_referenced & try_to_unmap

2012-09-10 Thread Tim Chen
n't have to be acquired multiple times.
Tim
---
Signed-off-by: Tim Chen
---
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index fd07c45..f1320b1 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -156,8 +156,11 @@ static inline void page_dup_rmap(struct page *page)
/*

[PATCH 3/3 v2] mm: Batch page_check_references in shrink_page_list sharing the same i_mmap_mutex

2012-09-10 Thread Tim Chen
tex acquisition by holding the mutex in shrink_page_list before calling __page_referenced and __try_to_unmap. This improves performance when the system does a lot of page reclamation of file-mapped pages, as happens when workloads use a lot of memory for the page cache.
Tim
---
Signed-off-by: Tim Chen
Signed-off
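The batching idea can be shown with a toy model: take a mapping's lock once for each run of consecutive pages that share it, rather than once per page. The sketch below is hypothetical throughout (`struct mapping`, `struct page_ref`, and the helpers are stand-ins, not kernel types), and its lock only counts acquisitions so the saving is visible; the patch's real batching criteria may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a file mapping guarded by one mutex. */
struct mapping {
    int lock_count;               /* how many times the lock was taken */
};

/* Hypothetical stand-in for a page on the reclaim list. */
struct page_ref {
    struct mapping *mapping;
};

static void mapping_lock(struct mapping *m)   { m->lock_count++; }
static void mapping_unlock(struct mapping *m) { (void)m; }

/* Placeholder for per-page work done under the mapping lock. */
static void process_page_locked(struct page_ref *p) { (void)p; }

/* Take each mapping's lock once per run of consecutive pages sharing it. */
static void process_batched(struct page_ref *pages, size_t n)
{
    size_t i = 0;

    while (i < n) {
        struct mapping *m = pages[i].mapping;

        mapping_lock(m);
        for (; i < n && pages[i].mapping == m; i++)
            process_page_locked(&pages[i]);
        mapping_unlock(m);
    }
}
```

For a list whose mappings are a, a, a, b, b, a, this takes a's lock twice and b's once, instead of six acquisitions in total; the longer the runs, the larger the saving in lock traffic.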

[PATCH 0/3 v2] mm: Batch page reclamation under shrink_page_list

2012-09-10 Thread Tim Chen
the i_mmap_mutex. I measured a 14% throughput improvement with a workload that puts heavy pressure on the page cache by reading many large mmapped files simultaneously on an 8-socket Westmere server.
Tim
Signed-off-by: Tim Chen
---
Diffstat
 include/linux/rmap.h | 8 +++-
 mm/rma

[PATCH 1/3 v2] mm: Batch unmapping of file mapped pages in shrink_page_list

2012-09-10 Thread Tim Chen
voids excessive cache bouncing of the tree lock when page reclamations are occurring simultaneously.
Tim
---
Signed-off-by: Tim Chen
---
diff --git a/mm/vmscan.c b/mm/vmscan.c
index aac5672..d4ab646 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -600,6 +600,85 @@ cannot_free: ret

Re: [PATCH 00/33] AutoNUMA27

2012-10-05 Thread Tim Chen
On Fri, 2012-10-05 at 16:14 -0700, Andi Kleen wrote:
> Andrew Morton writes:
>
> > On Thu, 4 Oct 2012 01:50:42 +0200
> > Andrea Arcangeli wrote:
> >
> >> This is a new AutoNUMA27 release for Linux v3.6.
> >
> > Peter's numa/sched patches have been in -next for a week.
>
> Did they pass review

[PATCH 0/4] Patchset to use PCLMULQDQ to accelerate CRC-T10DIF checksum computation

2013-04-16 Thread Tim Chen
. I would appreciate it if you could consider merging this for the 3.10 kernel.
Tim
Tim Chen (4):
  Wrap crc_t10dif function call to use crypto transform framework
  Accelerated CRC T10 DIF computation with PCLMULQDQ instruction
  Glue code to cast accelerated CRCT10DIF assembly as a crypto transform

[PATCH 4/4] Simple correctness and speed test for CRCT10DIF hash

2013-04-16 Thread Tim Chen
turbo off when running the speed test so that the frequency governor does not tweak the frequency and affect the measurements.
Signed-off-by: Tim Chen
Tested-by: Keith Busch
---
 crypto/tcrypt.c  |  8
 crypto/testmgr.c | 10 ++
 crypto/testmgr.h | 24
 3 files

[PATCH 3/4] Glue code to cast accelerated CRCT10DIF assembly as a crypto transform

2013-04-16 Thread Tim Chen
: Tim Chen
Tested-by: Keith Busch
---
 arch/x86/crypto/Makefile                |   2 +
 arch/x86/crypto/crct10dif-pclmul_glue.c | 153
 crypto/Kconfig                          |  21 +
 3 files changed, 176 insertions(+)
 create mode 100644 arch/x86/crypto/crct10dif

[PATCH 1/4] Wrap crc_t10dif function call to use crypto transform framework

2013-04-16 Thread Tim Chen
This patch wraps the crc_t10dif function call so that CRC T10 DIF is calculated through the crypto transform framework. This allows us to take advantage of any accelerated CRC T10 DIF transform that is plugged into the crypto framework.
Signed-off-by: Tim Chen
Tested-by: Keith Busch
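The wrapping idea can be sketched in user space as a single entry point that forwards to whichever implementation is currently registered, falling back to a generic bit-at-a-time CRC. This is a hedged illustration only: the kernel patch goes through the crypto API, and the `crc16_fn` type, the registration hook, and the exact signatures below are hypothetical. The generic routine assumes the standard CRC-16/T10-DIF parameters: polynomial 0x8BB7, initial value 0, MSB-first, no final XOR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Generic CRC-16/T10-DIF: polynomial 0x8BB7, MSB-first, no final XOR. */
static uint16_t crc_t10dif_generic(uint16_t crc, const unsigned char *buf, size_t len)
{
    while (len--) {
        crc ^= (uint16_t)(*buf++ << 8);
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x8BB7)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}

/* Hypothetical hook: a faster implementation may be plugged in at init. */
typedef uint16_t (*crc16_fn)(uint16_t crc, const unsigned char *buf, size_t len);
static crc16_fn crc_t10dif_impl = crc_t10dif_generic;

static void crc_t10dif_register(crc16_fn fast)
{
    if (fast != NULL)
        crc_t10dif_impl = fast;   /* keep the generic fallback otherwise */
}

/* Callers keep one entry point; the backend is chosen behind it. */
static uint16_t crc_t10dif(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    return crc_t10dif_impl(0, buf, len);
}
```

Callers never change: whether the generic loop or an accelerated backend runs is decided once, at registration time. With these parameters the check string "123456789" gives 0xD0DB.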

[PATCH 2/4] Accelerated CRC T10 DIF computation with PCLMULQDQ instruction

2013-04-16 Thread Tim Chen
323102.pdf
Signed-off-by: Tim Chen
Tested-by: Keith Busch
---
 arch/x86/crypto/crct10dif-pcl-asm_64.S | 659 +
 1 file changed, 659 insertions(+)
 create mode 100644 arch/x86/crypto/crct10dif-pcl-asm_64.S
diff --git a/arch/x86/crypto/crct10dif-pcl-asm_64.S b/arch/
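The PCLMULQDQ folding technique from the referenced paper is too long for a short sketch. As a simpler illustration of the same general trade (precomputed work in exchange for fewer per-byte steps), here is a byte-at-a-time table-driven CRC-16/T10-DIF. This is a swapped-in technique, not the patch's algorithm. The table and function names are hypothetical, and the parameters assumed are polynomial 0x8BB7, MSB-first, no final XOR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint16_t t10dif_table[256];

/* Precompute the CRC contribution of every possible leading byte. */
static void t10dif_init_table(void)
{
    for (int i = 0; i < 256; i++) {
        uint16_t crc = (uint16_t)(i << 8);
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x8BB7)
                                 : (uint16_t)(crc << 1);
        t10dif_table[i] = crc;
    }
}

/* One table lookup per input byte instead of eight shift/XOR steps. */
static uint16_t crc_t10dif_table(uint16_t crc, const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    while (len--)
        crc = (uint16_t)((crc << 8) ^ t10dif_table[((crc >> 8) ^ *buf++) & 0xFF]);
    return crc;
}
```

Because there is no final XOR, the CRC of a buffer can be computed incrementally by passing the previous result as `crc`, which is also the property a folding implementation relies on when it processes a buffer in large chunks.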

Re: [PATCH 4/4] Simple correctness and speed test for CRCT10DIF hash

2013-04-17 Thread Tim Chen
On Wed, 2013-04-17 at 20:58 +0300, Jussi Kivilinna wrote:
> On 16.04.2013 19:20, Tim Chen wrote:
> > These are simple tests to do sanity check of CRC T10 DIF hash. The
> > correctness of the transform can be checked with the command
> > modprobe tcrypt mode=47
> >

Re: [PATCH 2/4] Accelerated CRC T10 DIF computation with PCLMULQDQ instruction

2013-04-17 Thread Tim Chen
On Wed, 2013-04-17 at 20:58 +0300, Jussi Kivilinna wrote:
> On 16.04.2013 19:20, Tim Chen wrote:
> > This is the x86_64 CRC T10 DIF transform accelerated with the PCLMULQDQ
> > instructions. Details discussing the implementation can be found in the
> > paper:

[PATCH v2 1/4] Wrap crc_t10dif function call to use crypto transform framework

2013-04-17 Thread Tim Chen
This patch wraps the crc_t10dif function call so that CRC T10 DIF is calculated through the crypto transform framework. This allows us to take advantage of any accelerated CRC T10 DIF transform that is plugged into the crypto framework.
Signed-off-by: Tim Chen
---
 include/linux/crc-t10dif.h

[PATCH v2 0/4] Patchset to use PCLMULQDQ to accelerate CRC-T10DIF checksum computation

2013-04-17 Thread Tim Chen
ths through crc t10dif computation. 4. Fix config dependencies of CRYPTO_CRCT10DIF. Thanks to Matthew and Jussi who reviewed the patches and Keith for testing version 1 of the patch set. Tim Chen (4): Wrap crc_t10dif function all to use crypto transform framework Accelerated CRC T10

[PATCH v2 3/4] Glue code to cast accelerated CRCT10DIF assembly as a crypto transform

2013-04-17 Thread Tim Chen
: Tim Chen --- arch/x86/crypto/Makefile| 2 + arch/x86/crypto/crct10dif-pclmul_glue.c | 153 crypto/Kconfig | 21 + 3 files changed, 176 insertions(+) create mode 100644 arch/x86/crypto/crct10dif-pclmul_glue.c diff

[PATCH v2 2/4] Accelerated CRC T10 DIF computation with PCLMULQDQ instruction

2013-04-17 Thread Tim Chen
ents/white-papers/fast-crc-computation-generic-polynomials-pclmulqdq-paper.pdf Signed-off-by: Tim Chen --- arch/x86/crypto/crct10dif-pcl-asm_64.S | 643 + 1 file changed, 643 insertions(+) create mode 100644 arch/x86/crypto/crct10dif-pcl-asm_64.S diff --git a/arch/
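The referenced paper's folding approach rests on constants of the form x^T mod P(x): for a zero-initialised, non-reflected 16-bit CRC, the CRC of a message m(x) is m(x) * x^16 mod P(x), and moving a partial remainder forward by T bits amounts to multiplying it by x^T mod P. The sketch below, with the hypothetical name `xpow_mod_p`, computes such powers for the T10 DIF polynomial. The exact exponents and bit alignment used by crct10dif-pcl-asm_64.S are not reproduced here.

```c
#include <stdint.h>

#define T10DIF_POLY 0x8BB7u   /* P(x) with its implicit x^16 term dropped */

/* Return x^k mod P(x) over GF(2) for the 16-bit CRC-T10DIF polynomial.
 * Start from x^0 = 1 and multiply by x k times, reducing whenever the
 * degree reaches 16 (a bit shifts out of the 16-bit register). */
static uint16_t xpow_mod_p(unsigned k)
{
    uint32_t r = 1;
    for (unsigned i = 0; i < k; i++) {
        r <<= 1;
        if (r & 0x10000u)   /* x^16 is congruent to 0x8BB7's terms mod P */
            r = (r ^ T10DIF_POLY) & 0xFFFFu;
    }
    return (uint16_t)r;
}
```

Powers below x^16 are returned unchanged (x^15 is 0x8000), x^16 reduces to the polynomial's low terms 0x8BB7, and x^17 reduces to 0x9CD9.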

[PATCH v2 4/4] Simple correctness and speed test for CRCT10DIF hash

2013-04-17 Thread Tim Chen
turbo off when running the speed test so the frequency governor will not tweak the frequency and affect the measurements. Signed-off-by: Tim Chen --- crypto/tcrypt.c | 8 crypto/testmgr.c | 10 ++ crypto/testmgr.h | 33 + 3 files changed, 51

[PATCH] Fix prototype definitions of sha256_transform_asm, sha512_transform_asm

2013-04-19 Thread Tim Chen
on to static. It also fixes a typo in sha512_ssse3_final function that affects the computation of upper 64 bits of the buffer size. Thanks. Tim Signed-off-by: Tim Chen --- arch/x86/crypto/sha256_ssse3_glue.c | 2 +- arch/x86/crypto/sha512_ssse3_glue.c | 4 ++-- 2 files changed, 3 insertions(
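For context on the "upper 64 bits" fix: SHA-512 padding ends with the message length in bits as a 128-bit big-endian integer (FIPS 180-4). If the byte count is kept in two 64-bit words, the upper half of the bit count must pick up the three bits shifted out of the low word. The snippet does not show the exact typo, so the sketch below only illustrates what the correct value is; `sha512_bit_length` and `struct bitlen128` are hypothetical names, and the final big-endian byte-swap is omitted.

```c
#include <stdint.h>

/* The two 64-bit halves of SHA-512's 128-bit message-length field (in bits). */
struct bitlen128 {
    uint64_t hi;
    uint64_t lo;
};

/* Convert a 128-bit byte count (bytes_hi:bytes_lo) to a 128-bit bit count. */
static struct bitlen128 sha512_bit_length(uint64_t bytes_hi, uint64_t bytes_lo)
{
    struct bitlen128 r;
    r.lo = bytes_lo << 3;                        /* low 64 bits of bytes * 8 */
    r.hi = (bytes_hi << 3) | (bytes_lo >> 61);   /* carry the top 3 bits of the low word */
    return r;
}
```

Dropping the `bytes_lo >> 61` term would only go wrong for inputs of 2^61 bytes or more, which is why such a bug can survive ordinary test vectors.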

Performance regression from switching lock to rw-sem for anon-vma tree

2013-06-13 Thread Tim Chen
Ingo, At the time of switching the anon-vma tree's lock from mutex to rw-sem (commit 5a505085), we encountered regressions for fork-heavy workloads. A lot of optimizations to rw-sem (e.g. lock stealing) helped to mitigate the problem. I tried an experiment on the 3.10-rc4 kernel to compare the

Re: Performance regression from switching lock to rw-sem for anon-vma tree

2013-06-14 Thread Tim Chen
Added copy to mailing list which I forgot in my previous reply: On Thu, 2013-06-13 at 16:43 -0700, Davidlohr Bueso wrote: > On Thu, 2013-06-13 at 16:15 -0700, Tim Chen wrote: > > Ingo, > > > > At the time of switching the anon-vma tree's lock from mutex to

Re: Performance regression from switching lock to rw-sem for anon-vma tree

2013-06-14 Thread Tim Chen
> > Unfortunately this patch didn't make any difference, in fact it hurt > several of the workloads even more. I also tried disabling preemption > when spinning on owner to actually resemble spinlocks, which was my > original plan, yet not much difference. > That's also similar to the performa

Re: Performance regression from switching lock to rw-sem for anon-vma tree

2013-06-17 Thread Tim Chen
On Mon, 2013-06-17 at 09:22 -0700, Davidlohr Bueso wrote: > On Sun, 2013-06-16 at 17:50 +0800, Alex Shi wrote: > > On 06/14/2013 07:43 AM, Davidlohr Bueso wrote: > > > I was hoping that the lack of spin on owner was the main difference with > > > rwsems and am/was in the middle of implementing it.

Re: Performance regression from switching lock to rw-sem for anon-vma tree

2013-06-17 Thread Tim Chen
On Fri, 2013-06-14 at 15:47 -0700, Michel Lespinasse wrote: > On Fri, Jun 14, 2013 at 3:31 PM, Davidlohr Bueso > wrote: > > A few ideas that come to mind are avoiding taking the ->wait_lock and > > avoid dealing with waiters when doing the optimistic spinning (just like > > mutexes do). > > > > I
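The suggestion above is to let a would-be writer spin optimistically on the lock word itself, without taking ->wait_lock or touching the waiter list, and to fall back to the queued slow path only when spinning fails. The toy C11 lock below sketches that shape under loud assumptions: it is not the rwsem or mutex code, the kernel spins only while the owner is running on another CPU (which userspace cannot observe, so a fixed hypothetical `SPIN_LIMIT` budget stands in), and `sched_yield()` stands in for queueing and sleeping.

```c
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SPIN_LIMIT 1000   /* hypothetical spin budget before the slow path */

/* Toy lock: 0 = free, 1 = held. There is no wait queue and no wait_lock. */
struct spin_then_yield_lock {
    atomic_int locked;
};

/* Single acquisition attempt with acquire ordering on success. */
static bool stl_trylock(struct spin_then_yield_lock *l)
{
    int expected = 0;
    return atomic_compare_exchange_strong_explicit(&l->locked, &expected, 1,
                                                   memory_order_acquire,
                                                   memory_order_relaxed);
}

static void stl_lock(struct spin_then_yield_lock *l)
{
    for (;;) {
        /* Optimistic phase: read-only polling while held, CAS only when it looks free. */
        for (int i = 0; i < SPIN_LIMIT; i++) {
            if (atomic_load_explicit(&l->locked, memory_order_relaxed) == 0 &&
                stl_trylock(l))
                return;
        }
        /* Slow-path stand-in: give up the CPU instead of sleeping on a queue. */
        sched_yield();
    }
}

static void stl_unlock(struct spin_then_yield_lock *l)
{
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```

Polling with plain loads before the compare-and-swap keeps spinning CPUs from repeatedly pulling the cache line in exclusive mode, which is part of why optimistic spinning is cheap when the hold time is short.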

Re: Performance regression from switching lock to rw-sem for anon-vma tree

2013-06-17 Thread Tim Chen
On Mon, 2013-06-17 at 12:05 -0700, Davidlohr Bueso wrote: > > > > Thanks. Those are encouraging numbers. On my exim workload I didn't > > get a boost when I added in the preempt disable in optimistic spin and > > put Alex's changes in. Can you send me your combined patch to see if > > there may

Re: Performance regression from switching lock to rw-sem for anon-vma tree

2013-06-17 Thread Tim Chen
re in addition to the other two patches. Right now the patch is an ugly hack. I'll merge rwsem_down_write_failed_s and rwsem_down_write_failed into one function if this approach actually helps things. I'll clean up these three patches after we have some idea of their effectiveness. Thanks

Re: Performance regression from switching lock to rw-sem for anon-vma tree

2013-06-19 Thread Tim Chen
On Wed, 2013-06-19 at 15:16 +0200, Ingo Molnar wrote: > > vmstat for mutex implementation: > > procs ---memory-- ---swap-- -io --system-- > > -cpu- > > r b swpd free buff cache si so bi bo in cs us sy id > > wa st > > 38 0 0 130957920

Re: Performance regression from switching lock to rw-sem for anon-vma tree

2013-06-19 Thread Tim Chen
On Wed, 2013-06-19 at 16:11 -0700, Davidlohr Bueso wrote: > On Mon, 2013-06-17 at 17:08 -0700, Tim Chen wrote: > > On Mon, 2013-06-17 at 16:35 -0700, Davidlohr Bueso wrote: > > > On Tue, 2013-06-18 at 07:20 +0800, Alex Shi wrote: > > > > On 06/18/2013 1

Re: [PATCH v2 1/2] Make the batch size of the percpu_counter configurable

2013-05-29 Thread Tim Chen
On Wed, 2013-05-29 at 12:26 -0700, Andrew Morton wrote: > On Wed, 22 May 2013 16:37:18 -0700 Tim Chen > wrote: > > > Currently the per cpu counter's batch size for memory accounting is > > configured as twice the number of cpus in the system. However, > > for s
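The batch size trades accuracy for contention: each CPU accumulates a local delta and folds it into the shared count only when its magnitude reaches the batch, so a larger batch means fewer updates of the shared count but a cheap read that can be off by just under (number of CPUs) x batch. The single-threaded simulation below sketches that behaviour; `NR_CPUS_SIM`, `struct pcpu_counter` and the `pcpu_*` functions are hypothetical simplifications, not the kernel's percpu_counter API.

```c
#include <stdint.h>

#define NR_CPUS_SIM 4   /* hypothetical CPU count for the simulation */

struct pcpu_counter {
    int64_t count;               /* shared, approximate total */
    int32_t local[NR_CPUS_SIM];  /* per-CPU deltas not yet folded in */
    int32_t batch;               /* fold threshold, the value being made configurable */
};

/* Add `amount` on `cpu`; touch the shared count only when the batch is reached. */
static void pcpu_add(struct pcpu_counter *c, int cpu, int32_t amount)
{
    int64_t v = (int64_t)c->local[cpu] + amount;
    if (v >= c->batch || v <= -c->batch) {
        c->count += v;           /* in the kernel this step is done under fbc->lock */
        c->local[cpu] = 0;
    } else {
        c->local[cpu] = (int32_t)v;
    }
}

/* Cheap read: ignores deltas still sitting in the per-CPU slots. */
static int64_t pcpu_read(const struct pcpu_counter *c)
{
    return c->count;
}

/* Exact read: sums every per-CPU slot (the expensive path). */
static int64_t pcpu_sum(const struct pcpu_counter *c)
{
    int64_t s = c->count;
    for (int i = 0; i < NR_CPUS_SIM; i++)
        s += c->local[i];
    return s;
}
```

With a batch of 4, three increments on one CPU leave the shared count at 0 while the exact sum is 3; the fourth increment folds the whole delta in at once.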
