On Tue, 2013-02-26 at 17:54 +0800, Herbert Xu wrote:
> On Thu, Sep 27, 2012 at 03:44:22PM -0700, Tim Chen wrote:
> > This patch adds the crc_pcl function that calculates the CRC32C checksum
> > using the PCLMULQDQ instruction on processors that support this feature.
> > This wil
Herbert,
The following patch updates the stale link to the CRC32C white paper
that was referenced.
Tim
Signed-off-by: Tim Chen
---
arch/x86/crypto/crc32c-pcl-intel-asm_64.S |5 +++--
1 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/arch/x86/crypto/crc32c-pcl-intel-asm_64.S
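For reference, here is a bitwise C sketch of the CRC32C (Castagnoli)
function that the assembly accelerates; the name crc32c_sw is illustrative.
The PCLMULQDQ routine computes the same checksum, but folds large blocks in
parallel with carry-less multiplication instead of looping bit by bit.

#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC32C over the reflected Castagnoli polynomial 0x82F63B78. */
static uint32_t crc32c_sw(uint32_t crc, const unsigned char *data, size_t len)
{
	int k;

	crc = ~crc;
	while (len--) {
		crc ^= *data++;
		for (k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78 & -(crc & 1));
	}
	return ~crc;
}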
On Tue, 2012-08-21 at 17:48 -0700, Tim Chen wrote:
>
> Thanks to Matthew's suggestions on improving the patch. Here's the
> updated version. It seems to be sane when I booted my machine up. I
> will put it through more testing when I get a chance.
>
> Tim
>
On Tue, 2012-09-04 at 08:21 -0700, Tim Chen wrote:
> On Tue, 2012-08-21 at 17:48 -0700, Tim Chen wrote:
>
> >
> > Thanks to Matthew's suggestions on improving the patch. Here's the
> > updated version. It seems to be sane when I booted my machine up. I
> >
On Wed, 2012-09-12 at 12:27 -0700, Andrew Morton wrote:
>
> That sounds good, although more details on the performance changes
> would be appreciated - after all, that's the entire point of the
> patchset.
>
> And we shouldn't only test for improvements - we should also test for
> degradation.
On Tue, 2012-09-11 at 12:05 +0100, Mel Gorman wrote:
>
> One *massive* change here that is not called out in the changelog is that
> the reclaim path now holds the page lock on multiple pages at the same
> time waiting for them to be batch unlocked in __remove_mapping_batch.
> This is suspicious
On Tue, 2012-09-11 at 14:36 +0900, Minchan Kim wrote:
>
> If you send next versions, please use git-format-patch --thread style.
> Quote from man
> " If given --thread, git-format-patch will generate In-Reply-To and
> References
>headers to make the second and subsequent patch mail
On Fri, 2012-09-28 at 10:57 +0800, Herbert Xu wrote:
> On 2012-9-28 at 10:54 AM, "H. Peter Anvin" wrote:
> >
> > On 09/27/2012 03:44 PM, Tim Chen wrote:
> >>
> >> Version 2
> >> This version of the patch series fixes compilation errors for
> >> 3
This patch renames the crc32c-intel.c file to crc32c-intel_glue.c in
preparation for linking with the new crc32c-pcl-intel-asm.S file, which
contains an optimized crc32c calculation based on the PCLMULQDQ
instruction.
Tim
Signed-off-by: Tim Chen
---
arch/x86/crypto/Makefile
://download.intel.com/design/intarch/papers/323405.pdf
Tim
Signed-off-by: Tim Chen
---
arch/x86/crypto/Makefile |2 +-
arch/x86/crypto/crc32c-intel_glue.c| 75 +
arch/x86/crypto/crc32c-pcl-intel-asm.S | 460
3 files changed, 536 insertions(+), 1
This patch adds a test case in tcrypt to perform a speed test for
crc32c checksum calculation.
Tim
Signed-off-by: Tim Chen
---
crypto/tcrypt.c |4
1 files changed, 4 insertions(+), 0 deletions(-)
diff --git a/crypto/tcrypt.c b/crypto/tcrypt.c
index 581081d..6deb77f 100644
--- a/crypto
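The hook is of this general shape; the mode number below is illustrative,
as is the exact placement in do_test(), since the snippet is truncated
before the body of the diff:

	case 317:
		/* run crc32c through the generic hash speed harness, so
		 * whichever crc32c driver has the highest priority (e.g.
		 * the accelerated one) is what gets measured */
		test_hash_speed("crc32c", sec, generic_hash_speed_template);
		if (mode > 300 && mode < 400)
			break;
		break;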
Chen
---
Tim Chen (3):
Rename crc32c-intel.c to crc32c-intel_glue.c
Optimize CRC32C calculation with PCLMULQDQ instruction
Added speed test in tcrypt for crc32c
arch/x86/crypto/Makefile |1 +
.../crypto/{crc32c-intel.c => crc32c-intel_glue.c} | 75
a
| alloc_pages_current
Signed-off-by: Tim Chen
---
fs/super.c | 8
1 file changed, 8 insertions(+)
diff --git a/fs/super.c b/fs/super.c
index 68307c0..70fa26c 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -53,6 +53,7 @@ static char *sb_writers_name
can miss a memory hog easily this
> way.
Is it safe to compute sb->s_op->nr_cached_objects(sb), assuming a non-null
s_op, without holding sb_lock to increment the ref count on sb?
I think it is safe, as we hold the shrinker_rwsem so we cannot
unregister the shrinker, and the s_op an
On Mon, 2013-09-30 at 12:36 -0400, Waiman Long wrote:
> On 09/30/2013 12:10 PM, Jason Low wrote:
> > On Mon, 2013-09-30 at 11:51 -0400, Waiman Long wrote:
> >> On 09/28/2013 12:34 AM, Jason Low wrote:
> Also, below is what the mcs_spin_lock() and mcs_spin_unlock()
> functions would look l
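For readers following along, here is a simplified sketch of the MCS lock
and unlock operations being discussed, using the primitives of that era
(ACCESS_ONCE, xchg, cmpxchg, arch_mutex_cpu_relax) and the barrier
placement the later barrier-fix patch settles on; the real code differs in
detail.

struct mcs_spinlock {
	struct mcs_spinlock *next;
	int locked;		/* 1 once the lock is granted to us */
};

static inline void mcs_spin_lock(struct mcs_spinlock **lock,
				 struct mcs_spinlock *node)
{
	struct mcs_spinlock *prev;

	node->locked = 0;
	node->next = NULL;

	prev = xchg(lock, node);	/* atomically queue at the tail */
	if (prev == NULL)
		return;			/* queue was empty: lock acquired */
	ACCESS_ONCE(prev->next) = node;

	/* each CPU spins on its own node, not on a globally shared word */
	while (!ACCESS_ONCE(node->locked))
		arch_mutex_cpu_relax();
	smp_mb();	/* see the previous holder's critical section */
}

static inline void mcs_spin_unlock(struct mcs_spinlock **lock,
				   struct mcs_spinlock *node)
{
	struct mcs_spinlock *next = ACCESS_ONCE(node->next);

	if (!next) {
		/* no successor visible: try to release the lock outright */
		if (likely(cmpxchg(lock, node, NULL) == node))
			return;
		/* a new waiter raced in; wait for it to link itself */
		while (!(next = ACCESS_ONCE(node->next)))
			arch_mutex_cpu_relax();
	}
	smp_mb();	/* publish our critical section before handing off */
	ACCESS_ONCE(next->locked) = 1;
}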
On Tue, 2013-10-01 at 16:01 -0400, Waiman Long wrote:
> On 10/01/2013 12:48 PM, Tim Chen wrote:
> > On Mon, 2013-09-30 at 12:36 -0400, Waiman Long wrote:
> >> On 09/30/2013 12:10 PM, Jason Low wrote:
> >>> On Mon, 2013-09-30 at 11:51 -0400, Waiman Long wrote:
> >
On Tue, 2013-10-01 at 21:25 -0400, Waiman Long wrote:
> On 10/01/2013 05:16 PM, Tim Chen wrote:
> > On Tue, 2013-10-01 at 16:01 -0400, Waiman Long wrote:
> >>>
> >>> The cpu could still be executing out-of-order load instructions from the
> >>>
This patch corrects the way memory barriers are used in the MCS lock
and removes ones that are not needed. It also adds comments on all barriers.
Signed-off-by: Jason Low
---
include/linux/mcs_spinlock.h | 13 +++--
1 files changed, 11 insertions(+), 2 deletions(-)
diff --git a/include/li
Cmpxchg will cause cacheline bouncing when doing the value checking,
which causes a scalability issue on a large machine (like an 80-core box).
So a lock pre-read can relieve this contention.
Signed-off-by: Alex Shi
---
include/asm-generic/rwsem.h |8
1 files changed, 4 insertions(+), 4
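The pre-read amounts to a plain load in front of the atomic operation,
along these lines (a sketch of the pattern, not the exact diff):

static inline int __down_write_trylock(struct rw_semaphore *sem)
{
	/*
	 * Pre-read: a plain load can be satisfied from a shared cacheline.
	 * Only issue the cmpxchg, which demands exclusive ownership of the
	 * line, when the lock actually looks free.
	 */
	if (sem->count != RWSEM_UNLOCKED_VALUE)
		return 0;

	return cmpxchg(&sem->count, RWSEM_UNLOCKED_VALUE,
		       RWSEM_ACTIVE_WRITE_BIAS) == RWSEM_UNLOCKED_VALUE;
}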
We will need the MCS lock code for doing optimistic spinning for rwsem.
Extracting the MCS code from mutex.c and putting it into its own file
allows us to reuse this code easily for rwsem.
Reviewed-by: Ingo Molnar
Reviewed-by: Peter Zijlstra
Signed-off-by: Tim Chen
Signed-off-by: Davidlohr Bueso
Atomically updating the lock and rolling back will cause cache bouncing
on a large machine. A lock status pre-read can relieve this problem.
Suggested-by: Davidlohr Bueso
Suggested-by: Tim Chen
Signed-off-by: Alex Shi
---
lib/rwsem.c |8 +++-
1 files changed, 7 insertions(+), 1 deletions(-)
diff
That makes the code simpler and more readable.
Signed-off-by: Alex Shi
---
lib/rwsem.c | 12 +++-
1 files changed, 7 insertions(+), 5 deletions(-)
diff --git a/lib/rwsem.c b/lib/rwsem.c
index 42f1b1a..a8055cf 100644
--- a/lib/rwsem.c
+++ b/lib/rwsem.c
@@ -85,15 +85,17 @@ __rwsem_do_wake(stru
mments update
Alex Shi (4):
rwsem: check the lock before cmpxchg in down_write_trylock
rwsem: remove 'out' label in do_wake
rwsem: remove try_reader_grant label in do_wake
rwsem/wake: check lock before do atomic update
Jason Low (2):
MCS Lock: optimizations and extra comments
Remove an unnecessary operation and mark the cmpxchg(lock, node, NULL) == node
check in mcs_spin_unlock() as likely(), since a race does not occur most
of the time.
Also add more comments describing how the local node is used in MCS locks.
Signed-off-by: Jason Low
---
include/linux/mc
queue
reduces wait queue contention and provides a greater chance for the rwsem
to get acquired. With these changes, rwsem is on par with mutex.
Reviewed-by: Ingo Molnar
Reviewed-by: Peter Zijlstra
Reviewed-by: Peter Hurley
Signed-off-by: Tim Chen
Signed-off-by: Davidlohr Bueso
---
include/
That makes the code simpler and more readable.
Signed-off-by: Alex Shi
---
lib/rwsem.c |5 ++---
1 files changed, 2 insertions(+), 3 deletions(-)
diff --git a/lib/rwsem.c b/lib/rwsem.c
index 19c5fa9..42f1b1a 100644
--- a/lib/rwsem.c
+++ b/lib/rwsem.c
@@ -75,7 +75,7 @@ __rwsem_do_wake(struct rw_se
With the 3.12-rc2 kernel, there is sizable spinlock contention on
the rwsem wakeup code path when running AIM7's high_systime workload
on an 8-socket, 80-core DL980 (HT off), as reported by perf:
7.64% reaim [kernel.kallsyms] [k] _raw_spin_lock_irqsave
|--41.77%-- rwsem_wake
1.6
the shrinker before ->kill_sb().
Signed-off-by: Tim Chen
---
fs/super.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/super.c b/fs/super.c
index 73d0952..b724f35 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -324,10 +324,10 @@ void deactivate_locked_super(stru
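The reordering gives roughly the following shape (a sketch; the cleancache
teardown and other details of the real function are omitted):

void deactivate_locked_super(struct super_block *s)
{
	struct file_system_type *fs = s->s_type;

	if (atomic_dec_and_test(&s->s_active)) {
		/*
		 * Unregister the shrinker before ->kill_sb(), so a
		 * concurrent prune can no longer walk a superblock the
		 * filesystem has started to tear down.
		 */
		unregister_shrinker(&s->s_shrink);
		fs->kill_sb(s);
		put_filesystem(fs);
		put_super(s);
	} else {
		up_write(&s->s_umount);
	}
}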
On Fri, 2013-09-06 at 10:55 +1000, Dave Chinner wrote:
> On Tue, Sep 03, 2013 at 11:38:27AM -0700, Tim Chen wrote:
> > On Sat, 2013-08-31 at 19:00 +1000, Dave Chinner wrote:
> > > On Fri, Aug 30, 2013 at 09:21:34AM -0700, Tim Chen wrote:
> > > >
> >
ion is
safe when we are doing unmount.
Signed-off-by: Tim Chen
---
fs/super.c | 12
1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/fs/super.c b/fs/super.c
index b724f35..b5c9fdf 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -112,9 +112,14 @@ static unsigne
On Thu, 2013-10-03 at 09:32 +0200, Ingo Molnar wrote:
> * Tim Chen wrote:
>
> > For version 8 of the patchset, we included the patch from Waiman to
> > streamline wakeup operations and also optimize the MCS lock used in
> > rwsem and mutex.
>
> I'd be fe
On Tue, 2013-10-08 at 16:51 -0300, Rafael Aquini wrote:
> On Wed, Oct 02, 2013 at 03:38:32PM -0700, Tim Chen wrote:
> > We will need the MCS lock code for doing optimistic spinning for rwsem.
> > Extracting the MCS code from mutex.c and put into its own file allow us
> > to r
On Wed, 2013-10-09 at 08:15 +0200, Ingo Molnar wrote:
> * Tim Chen wrote:
>
> > Ingo,
> >
> > I ran the vanilla kernel, the kernel with all rwsem patches and the
> > kernel with all patches except the optimistic spin one. I am listing
> > two presenta
> > Signed-off-by: Tim Chen
> > ---
> > fs/super.c | 8
> > 1 file changed, 8 insertions(+)
> >
> > diff --git a/fs/super.c b/fs/super.c
> > index 68307c0..70fa26c 100644
> > --- a/fs/super.c
> > +++ b/fs/super.c
> > @@
e grab_super_passive
from the super_cache_count code. That should remove the bottleneck
in reclamation.
Thanks for your detailed explanation.
Tim
Signed-off-by: Tim Chen
---
diff --git a/fs/super.c b/fs/super.c
index 73d0952..4df1fab 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -112,9 +112,6 @@ s
On Sat, 2013-08-31 at 19:00 +1000, Dave Chinner wrote:
> On Fri, Aug 30, 2013 at 09:21:34AM -0700, Tim Chen wrote:
> >
> >
> > Signed-off-by: Tim Chen
> > ---
> > diff --git a/fs/super.c b/fs/super.c
> > index 73d0952..4df1fab 100644
> > --- a/fs/supe
On Thu, 2013-09-26 at 10:40 +0200, Peter Zijlstra wrote:
> On Thu, Sep 26, 2013 at 08:46:29AM +0200, Ingo Molnar wrote:
> > > +/*
> > > + * MCS lock defines
> > > + *
> > > + * This file contains the main data structure and API definitions of MCS
> > > lock.
> >
> > A (very) short blurb about wha
t 12:27 -0700, Jason Low wrote:
> > > > > On Wed, Sep 25, 2013 at 3:10 PM, Tim Chen
> > > > > wrote:
> > > > > > We will need the MCS lock code for doing optimistic spinning for
> > > > > > rwsem.
> > > > > > Extracting t
ared and high_systime workloads when he switched i_mmap_mutex
to rwsem. Tests were on an 8-socket, 80-core system. With the patchset,
he got significant improvements to the aim7 suite instead of regressions:
alltests (+16.3%), custom (+20%), disk (+19.5%), high_systime (+7%),
shared (+18.4%) and short (+
On Thu, 2013-09-26 at 15:42 -0700, Jason Low wrote:
> On Thu, 2013-09-26 at 14:41 -0700, Tim Chen wrote:
> > On Thu, 2013-09-26 at 14:09 -0700, Jason Low wrote:
> > > On Thu, 2013-09-26 at 13:40 -0700, Davidlohr Bueso wrote:
> > > > On Thu, 2013-09-26 at 13:23 -0700,
On Fri, 2013-09-27 at 09:12 -0700, Jason Low wrote:
> On Fri, 2013-09-27 at 08:02 +0200, Ingo Molnar wrote:
> > Would be nice to have this as a separate, add-on patch. Every single
> > instruction removal that has no downside is an upside!
>
> Okay, so here is a patch. Tim, would you like to add
On Fri, 2013-09-27 at 08:29 -0700, Paul E. McKenney wrote:
> On Wed, Sep 25, 2013 at 03:10:49PM -0700, Tim Chen wrote:
> > We will need the MCS lock code for doing optimistic spinning for rwsem.
> > Extracting the MCS code from mutex.c and put into its own file allow us
> >
On Fri, 2013-09-27 at 12:39 -0700, Davidlohr Bueso wrote:
> On Fri, 2013-09-27 at 12:28 -0700, Linus Torvalds wrote:
> > On Fri, Sep 27, 2013 at 12:00 PM, Waiman Long wrote:
> > >
> > > On a large NUMA machine, it is entirely possible that a fairly large
> > > number of threads are queuing up in t
On Fri, 2013-09-27 at 13:38 -0700, Paul E. McKenney wrote:
> On Fri, Sep 27, 2013 at 12:38:53PM -0700, Tim Chen wrote:
> > On Fri, 2013-09-27 at 08:29 -0700, Paul E. McKenney wrote:
> > > On Wed, Sep 25, 2013 at 03:10:49PM -0700, Tim Chen wrote:
> > > > We will ne
see the effects of the
> > previous lock holder's critical section." And in the mcs_spin_unlock(),
> > move the
> > memory barrier so that it is before the "ACCESS_ONCE(next->locked) = 1;".
> >
> > Signed-off-by: Jason Low
> > Signed-off-by: Paul E. McKe
On Sat, 2013-09-28 at 21:52 +0200, Ingo Molnar wrote:
> * Linus Torvalds wrote:
>
> > On Sat, Sep 28, 2013 at 12:37 PM, Ingo Molnar wrote:
> > >
> > > - down_write_nest_lock(&anon_vma->root->rwsem,
> > > &mm->mmap_sem);
> > > + down_write_nest_lock(&anon_vma->root->r
On Mon, 2013-09-30 at 20:14 +0200, Peter Zijlstra wrote:
> On Mon, Sep 30, 2013 at 10:10:27AM -0700, Tim Chen wrote:
> > Here's the exim workload data:
> >
> > rwsem improvement:
> > Waiman's patch: +2.0%
> > Alex+Tim's patchset: +4.8%
> >
On Mon, 2013-09-30 at 15:35 -0400, Waiman Long wrote:
> On 09/30/2013 03:23 PM, Tim Chen wrote:
> > On Mon, 2013-09-30 at 20:14 +0200, Peter Zijlstra wrote:
> >> On Mon, Sep 30, 2013 at 10:10:27AM -0700, Tim Chen wrote:
> >>> Here's the exim workload
On Mon, 2013-09-30 at 12:47 -0700, Tim Chen wrote:
> On Mon, 2013-09-30 at 15:35 -0400, Waiman Long wrote:
> > On 09/30/2013 03:23 PM, Tim Chen wrote:
> > > On Mon, 2013-09-30 at 20:14 +0200, Peter Zijlstra wrote:
> > >> On Mon, Sep 30, 2013 at 10:10:27AM -0700, Tim
) and short (+6.3%).
Tim Chen also got a +5% improvement to the exim mail server workload on a
40-core system.
Thanks to Ingo Molnar, Peter Hurley and Peter Zijlstra for reviewing this
patchset.
Regards,
Tim Chen
Changelog:
v5:
1. Try optimistic spinning before we put the writer on the wait que
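In outline, that optimistic-spin step looks like the sketch below. The
sem->owner field and the rwsem_try_write_lock() helper are stand-ins for
what the series adds; they are not part of the stock rwsem of this era.

static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
{
	struct task_struct *owner;
	bool taken = false;

	preempt_disable();
	for (;;) {
		owner = ACCESS_ONCE(sem->owner);
		if (owner && !owner->on_cpu)
			break;	/* owner is off-CPU: spinning is pointless */

		/* lock looks free: try to take it without sleeping */
		if (!owner && rwsem_try_write_lock(sem)) {
			taken = true;
			break;
		}

		if (need_resched())
			break;	/* yield rather than hog the CPU */

		arch_mutex_cpu_relax();
	}
	preempt_enable();
	return taken;	/* false: fall back to the wait queue */
}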
On Tue, 2013-09-24 at 16:22 -0700, Jason Low wrote:
> Should we do something similar with __down_read_trylock, such as
> the following?
>
>
> Signed-off-by: Jason Low
> ---
> include/asm-generic/rwsem.h |3 +++
> 1 files changed, 3 insertions(+), 0 deletions(-)
>
> diff --git a/include/asm
On Wed, 2013-09-25 at 07:55 +0200, Peter Zijlstra wrote:
> On Tue, Sep 24, 2013 at 03:22:46PM -0700, Tim Chen wrote:
> > We will need the MCS lock code for doing optimistic spinning for rwsem.
> > Extracting the MCS code from mutex.c and put into its own file allow us
> > to r
n't have to be acquired multiple times.
Tim
---
Signed-off-by: Tim Chen
---
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index fd07c45..f1320b1 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -156,8 +156,11 @@ static inline void page_dup_rmap(struct page *page)
/*
tex
acquisition by holding the mutex in shrink_page_list before calling
__page_referenced and __try_to_unmap. This improves the
performance when the system has a lot of page reclamation for file-mapped
pages, as happens when workloads use a lot of memory for page cache.
Tim
---
Signed-off-by: Tim Chen
Signed-off
the i_mmap_mutex.
I managed to get a 14% throughput improvement with a workload putting
heavy pressure on the page cache by reading many large mmapped files
simultaneously on an 8-socket Westmere server.
Tim
Signed-off-by: Tim Chen
---
Diffstat
include/linux/rmap.h |8 +++-
mm/rma
avoids excessive cache bouncing of
the tree lock when page reclamations are occurring simultaneously.
Tim
---
Signed-off-by: Tim Chen
---
diff --git a/mm/vmscan.c b/mm/vmscan.c
index aac5672..d4ab646 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -600,6 +600,85 @@ cannot_free:
ret
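The batching has roughly the shape below. This is a simplified sketch: the
real removal path must also freeze page refcounts and handle memcg and
swap-backed pages, and, as Mel points out above, every page in the batch
remains page-locked until the batch is flushed.

static int __remove_mapping_batch(struct list_head *remove_list,
				  struct list_head *free_list)
{
	struct address_space *mapping;
	struct page *page;
	int nr_removed = 0;

	if (list_empty(remove_list))
		return 0;

	/* all pages on remove_list belong to the same mapping */
	mapping = page_mapping(list_first_entry(remove_list,
						struct page, lru));

	/* take the tree lock once for the whole batch... */
	spin_lock_irq(&mapping->tree_lock);
	while (!list_empty(remove_list)) {
		page = list_first_entry(remove_list, struct page, lru);
		list_del(&page->lru);
		__delete_from_page_cache(page);
		list_add(&page->lru, free_list);
		nr_removed++;
	}
	spin_unlock_irq(&mapping->tree_lock);

	return nr_removed;
}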
On Fri, 2012-10-05 at 16:14 -0700, Andi Kleen wrote:
> Andrew Morton writes:
>
> > On Thu, 4 Oct 2012 01:50:42 +0200
> > Andrea Arcangeli wrote:
> >
> >> This is a new AutoNUMA27 release for Linux v3.6.
> >
> > Peter's numa/sched patches have been in -next for a week.
>
> Did they pass review
I would appreciate it if you can consider merging this for the 3.10 kernel.
Tim
Tim Chen (4):
Wrap crc_t10dif function call to use crypto transform framework
Accelerated CRC T10 DIF computation with PCLMULQDQ instruction
Glue code to cast accelerated CRCT10DIF assembly as a crypto
transform
turbo off when running the
speed test so the frequency governor will not tweak the frequency and
affect the measurements.
Signed-off-by: Tim Chen
Tested-by: Keith Busch
---
crypto/tcrypt.c | 8
crypto/testmgr.c | 10 ++
crypto/testmgr.h | 24
3 files
: Tim Chen
Tested-by: Keith Busch
---
arch/x86/crypto/Makefile| 2 +
arch/x86/crypto/crct10dif-pclmul_glue.c | 153
crypto/Kconfig | 21 +
3 files changed, 176 insertions(+)
create mode 100644 arch/x86/crypto/crct10dif
When CRC T10 DIF is calculated using the crypto transform framework, we
wrap the crc_t10dif function call to utilize it. This allows us to
take advantage of any accelerated CRC T10 DIF transform that is
plugged into the crypto framework.
Signed-off-by: Tim Chen
Tested-by: Keith Busch
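The wrapper routes the library call through a crypto shash, roughly as
follows (a sketch; the tfm allocation at module init and the error handling
in the real patch are fuller):

#include <linux/crc-t10dif.h>
#include <crypto/hash.h>

static struct crypto_shash *crct10dif_tfm;	/* allocated at module init */

__u16 crc_t10dif(const unsigned char *buffer, size_t len)
{
	struct {
		struct shash_desc shash;
		char ctx[2];	/* room for the 16-bit CRC state */
	} desc;
	int err;

	desc.shash.tfm = crct10dif_tfm;
	desc.shash.flags = 0;
	*(__u16 *)desc.ctx = 0;

	/* any accelerated crct10dif driver registered with the crypto
	 * framework (such as the PCLMULQDQ one) now serves this call */
	err = crypto_shash_update(&desc.shash, buffer, len);
	BUG_ON(err);

	return *(__u16 *)desc.ctx;
}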
323102.pdf
Signed-off-by: Tim Chen
Tested-by: Keith Busch
---
arch/x86/crypto/crct10dif-pcl-asm_64.S | 659 +
1 file changed, 659 insertions(+)
create mode 100644 arch/x86/crypto/crct10dif-pcl-asm_64.S
diff --git a/arch/x86/crypto/crct10dif-pcl-asm_64.S
b/arch/
On Wed, 2013-04-17 at 20:58 +0300, Jussi Kivilinna wrote:
> On 16.04.2013 19:20, Tim Chen wrote:
> > These are simple tests to do a sanity check of the CRC T10 DIF hash. The
> > correctness of the transform can be checked with the command
> > modprobe tcrypt mode=47
> >
On Wed, 2013-04-17 at 20:58 +0300, Jussi Kivilinna wrote:
> On 16.04.2013 19:20, Tim Chen wrote:
> > This is the x86_64 CRC T10 DIF transform accelerated with the PCLMULQDQ
> > instructions. Details discussing the implementation can be found in the
> > paper:
> >
&g
ths through
crc t10dif computation.
4. Fix config dependencies of CRYPTO_CRCT10DIF.
Thanks to Matthew and Jussi who reviewed the patches and Keith
for testing version 1 of the patch set.
Tim Chen (4):
Wrap crc_t10dif function call to use crypto transform framework
Accelerated CRC T10
ents/white-papers/fast-crc-computation-generic-polynomials-pclmulqdq-paper.pdf
Signed-off-by: Tim Chen
---
arch/x86/crypto/crct10dif-pcl-asm_64.S | 643 +
1 file changed, 643 insertions(+)
create mode 100644 arch/x86/crypto/crct10dif-pcl-asm_64.S
diff --git a/arch/
on to static. It also
fixes a typo in the sha512_ssse3_final function that affects the computation
of the upper 64 bits of the buffer size.
Thanks.
Tim
Signed-off-by: Tim Chen
---
arch/x86/crypto/sha256_ssse3_glue.c | 2 +-
arch/x86/crypto/sha512_ssse3_glue.c | 4 ++--
2 files changed, 3 insertions(
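The delicate part is the upper/lower split of the 128-bit bit count stored
in the final block; the corrected computation reads along these lines
(illustrative of the fix described, not a quote of the diff):

	/* count[1]:count[0] is the 128-bit byte count; shifting the whole
	 * quantity left by 3 converts it to bits, carrying the top 3 bits
	 * of count[0] into the upper word */
	bits[1] = cpu_to_be64(sctx->count[0] << 3);
	bits[0] = cpu_to_be64((sctx->count[1] << 3) | (sctx->count[0] >> 61));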
Ingo,
At the time of switching the anon-vma tree's lock from mutex to
rw-sem (commit 5a505085), we encountered regressions for a fork-heavy workload.
A lot of optimizations to rw-sem (e.g. lock stealing) helped to
mitigate the problem. I tried an experiment on the 3.10-rc4 kernel
to compare the
Adding a copy to the mailing list, which I forgot in my previous reply:
On Thu, 2013-06-13 at 16:43 -0700, Davidlohr Bueso wrote:
> On Thu, 2013-06-13 at 16:15 -0700, Tim Chen wrote:
> > Ingo,
> >
> > At the time of switching the anon-vma tree's lock from mutex to
> &
>
> Unfortunately this patch didn't make any difference; in fact, it hurt
> several of the workloads even more. I also tried disabling preemption
> when spinning on owner to actually resemble spinlocks, which was my
> original plan, yet not much difference.
>
That's also similar to the performa
On Mon, 2013-06-17 at 09:22 -0700, Davidlohr Bueso wrote:
> On Sun, 2013-06-16 at 17:50 +0800, Alex Shi wrote:
> > On 06/14/2013 07:43 AM, Davidlohr Bueso wrote:
> > > I was hoping that the lack of spin on owner was the main difference with
> > > rwsems and am/was in the middle of implementing it.
On Fri, 2013-06-14 at 15:47 -0700, Michel Lespinasse wrote:
> On Fri, Jun 14, 2013 at 3:31 PM, Davidlohr Bueso
> wrote:
> > A few ideas that come to mind are avoiding taking the ->wait_lock and
> > avoid dealing with waiters when doing the optimistic spinning (just like
> > mutexes do).
> >
> > I
On Mon, 2013-06-17 at 12:05 -0700, Davidlohr Bueso wrote:
> >
> > Thanks. Those are encouraging numbers. On my exim workload I didn't
> > get a boost when I added in the preempt disable in optimistic spin and
> > put Alex's changes in. Can you send me your combined patch to see if
> > there may
re in addition to the
other two patches. Right now the patch is an ugly hack. I'll merge
rwsem_down_write_failed_s and rwsem_down_write_failed into one
function if this approach actually helps things.
I'll clean these three patches after we have some idea of their
effectiveness.
Thanks
On Wed, 2013-06-19 at 15:16 +0200, Ingo Molnar wrote:
> > vmstat for mutex implementation:
> > procs ---memory--- ---swap-- ---io--- --system-- ----cpu----
> >  r  b  swpd    free   buff  cache  si  so  bi  bo  in  cs us sy id wa st
> > 38  0     0 130957920
On Wed, 2013-06-19 at 16:11 -0700, Davidlohr Bueso wrote:
> On Mon, 2013-06-17 at 17:08 -0700, Tim Chen wrote:
> > On Mon, 2013-06-17 at 16:35 -0700, Davidlohr Bueso wrote:
> > > On Tue, 2013-06-18 at 07:20 +0800, Alex Shi wrote:
> > > > On 06/18/2013 1
On Wed, 2013-05-29 at 12:26 -0700, Andrew Morton wrote:
> On Wed, 22 May 2013 16:37:18 -0700 Tim Chen
> wrote:
>
> > Currently the per cpu counter's batch size for memory accounting is
> > configured as twice the number of cpus in the system. However,
> > for s
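A sketch of the batch sizing being discussed; the function and variable
names and the floor value here are illustrative:

static s32 vm_committed_as_batch;

static void mm_compute_batch(void)
{
	int nr = num_present_cpus();

	/*
	 * Default: twice the number of CPUs, with a small floor. A larger
	 * batch keeps counter updates CPU-local for longer and avoids
	 * bouncing the percpu_counter spinlock on big machines, at the
	 * cost of a fuzzier global value.
	 */
	vm_committed_as_batch = max_t(s32, nr * 2, 32);
}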