Re: [PATCH v2 3/3] X86: Add a thread cpu time implementation to vDSO

2014-12-18 Thread Shaohua Li
On Thu, Dec 18, 2014 at 04:22:59PM -0800, Andy Lutomirski wrote: > On Thu, Dec 18, 2014 at 3:30 PM, Andy Lutomirski wrote: > > On Wed, Dec 17, 2014 at 3:12 PM, Shaohua Li wrote: > >> This primarily speeds up clock_gettime(CLOCK_THREAD_CPUTIME_ID, ..). We > >> use the

Re: [PATCH v2 3/3] X86: Add a thread cpu time implementation to vDSO

2014-12-19 Thread Shaohua Li
On Fri, Dec 19, 2014 at 09:53:24AM -0800, Andy Lutomirski wrote: > On Fri, Dec 19, 2014 at 9:42 AM, Chris Mason wrote: > > > > > > On Fri, Dec 19, 2014 at 11:48 AM, Andy Lutomirski > > wrote: > >> > >> On Fri, Dec 19, 2014 at 3:23 AM, Peter Zijlstra > >> wrote: > >>> > >>> On Thu, Dec 18, 2014

Re: [PATCH RFC 1/4] mm: throttle MADV_FREE

2015-02-24 Thread Shaohua Li
) { > > memset(512M); > > madvise(MADV_FREE or MADV_DONTNEED); > > } > > > > 1) dontneed: 6.78user 234.09system 0:48.89elapsed > > 2) madvfree: 6.03user 401.17system 1:30.67elapsed > > 3) madvfree + this patch: 5.68user 113.42system 0:36.52elapse

Re: [PATCH RFC 1/4] mm: throttle MADV_FREE

2015-02-25 Thread Shaohua Li
On Wed, Feb 25, 2015 at 04:11:18PM +0900, Minchan Kim wrote: > On Wed, Feb 25, 2015 at 09:08:09AM +0900, Minchan Kim wrote: > > Hi Michal, > > > > On Tue, Feb 24, 2015 at 04:43:18PM +0100, Michal Hocko wrote: > > > On Tue 24-02-15 17:18:14, Minchan Kim wrote: > > > > Recently, Shaohua reported tha

Re: [PATCH] libata: revert "libata: use blk taging" et al.

2015-03-11 Thread Shaohua Li
On Wed, Mar 11, 2015 at 06:19:27PM -0400, Tony Battersby wrote: > On 03/11/2015 05:45 PM, Jens Axboe wrote: > > On 03/11/2015 02:15 PM, Tony Battersby wrote: > >> This reverts commits 12cb5ce101abfaf74421f8cc9f196e708209eb79 and > >> 98bd4be1ba95f2fe7f543910792b7163a5de06eb. > >> > >> Commit 12cb5c

Re: [PATCH] libata: revert "libata: use blk taging" et al.

2015-03-12 Thread Shaohua Li
't directly match to ata tag. We use the new flag for sas ata tag allocation. Reported-by: Tony Battersby Signed-off-by: Shaohua Li diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c index 4c35f08..ef150eb 100644 --- a/drivers/ata/libata-core.c +++ b/drivers/ata/libata-core.

Re: [PATCH] libata: revert "libata: use blk taging" et al.

2015-03-12 Thread Shaohua Li
On Thu, Mar 12, 2015 at 09:59:26AM -0400, Tejun Heo wrote: > On Thu, Mar 12, 2015 at 05:46:01AM -0700, Shaohua Li wrote: > > ata: Add a new flag for sas controller > > > > Add a new flag to distinguish sas controller. sas controller has its own tag > > allocation, whic

[PATCH] blk-mq: rationalize plug

2015-03-13 Thread Shaohua Li
kload here is fsync write a block device. Without plug merge, sequential write (fsync makes it sync IO) will dispatch 4k IO. Cc: Jens Axboe Cc: Christoph Hellwig Signed-off-by: Shaohua Li --- block/blk-mq.c | 98 ++ 1 file changed, 71 inser

Re: [PATCH v17 1/7] mm: support madvise(MADV_FREE)

2015-02-06 Thread Shaohua Li
On Fri, Feb 06, 2015 at 02:51:03PM +0900, Minchan Kim wrote: > Hi Shaohua, > > On Thu, Feb 05, 2015 at 04:33:11PM -0800, Shaohua Li wrote: > > > > Hi Minchan, > > > > Sorry to jump into this thread so late, and apologies if some of these issues were discussed > > before. >

Re: [PATCH v17 1/7] mm: support madvise(MADV_FREE)

2015-02-06 Thread Shaohua Li
On Fri, Feb 06, 2015 at 01:58:25PM +0100, Michal Hocko wrote: > On Thu 05-02-15 16:33:11, Shaohua Li wrote: > [...] > > Did you think about moving the MADV_FREE pages to the head of the inactive LRU, so > > they can be reclaimed easily? > > Yes this makes sense for pages livin

[PATCH 2/2 --resend] perf: update userspace page info for software event

2015-02-05 Thread Shaohua Li
c: Peter Zijlstra Cc: Andy Lutomirski Cc: Ingo Molnar Signed-off-by: Shaohua Li --- kernel/events/core.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/kernel/events/core.c b/kernel/events/core.c index 04d8b48..98105cf 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -5950

[PATCH 1/2 --resend] perf: update shadow timestamp before add event

2015-02-05 Thread Shaohua Li
: Shaohua Li --- kernel/events/core.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/kernel/events/core.c b/kernel/events/core.c index 19efcf1..04d8b48 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -1769,6 +1769,10 @@ event_sched_in(struct perf_event

Re: [PATCH v17 1/7] mm: support madvise(MADV_FREE)

2015-02-05 Thread Shaohua Li
Hi Minchan, Sorry to jump into this thread so late, and apologies if some of these issues were discussed before. I'm interested in this patch, so I tried it here. I used a simple test with jemalloc. Obviously this can improve performance when there is no memory pressure. Did you try a setup with memory pressure? In my t

Re: [PATCH v17 1/7] mm: support madvise(MADV_FREE)

2015-02-10 Thread Shaohua Li
On Mon, Feb 09, 2015 at 04:15:53PM +0900, Minchan Kim wrote: > On Fri, Feb 06, 2015 at 10:29:18AM -0800, Shaohua Li wrote: > > On Fri, Feb 06, 2015 at 02:51:03PM +0900, Minchan Kim wrote: > > > Hi Shaohua, > > > > > > On Thu, Feb 05, 2015 at 04:33:11PM -0800, S

Re: [PATCH v17 1/7] mm: support madvise(MADV_FREE)

2015-02-11 Thread Shaohua Li
On Wed, Feb 11, 2015 at 09:56:20AM +0900, Minchan Kim wrote: > Hi Shaohua, > > On Tue, Feb 10, 2015 at 02:38:26PM -0800, Shaohua Li wrote: > > On Mon, Feb 09, 2015 at 04:15:53PM +0900, Minchan Kim wrote: > > > On Fri, Feb 06, 2015 at 10:29:18AM -0800, Shaohua Li wrote

Re: [RFC 2/2] perf: update userspace page info for software event

2015-01-28 Thread Shaohua Li
Ping! On Fri, Jan 23, 2015 at 07:57:24AM -0800, Shaohua Li wrote: > On Fri, Jan 23, 2015 at 09:44:51AM +0100, Peter Zijlstra wrote: > > On Thu, Jan 22, 2015 at 01:09:02PM -0800, Shaohua Li wrote: > > > --- > > > kernel/events/core.c | 3 +++ > >

[PATCH v2 1/3] X86: make VDSO data support multiple pages

2014-12-17 Thread Shaohua Li
Currently vdso data is one page. The next patches will add per-cpu data to the vdso, which requires several pages if the number of CPUs is large. This patch makes VDSO data support multiple pages. Cc: Andy Lutomirski Cc: H. Peter Anvin Cc: Ingo Molnar Signed-off-by: Shaohua Li --- arch/x86/include/asm/vdso.h

[PATCH v2 2/3] X86: add a generic API to let vdso code detect context switch

2014-12-17 Thread Shaohua Li
an be used to detect if context switch occurs. Andy suggested we can use a timestamp, so in next patch we can save some instructions. But the principle isn't changed here. This patch uses the timestamp approach. Cc: Andy Lutomirski Cc: H. Peter Anvin Cc: Ingo Molnar Signed-off-by: Shaohua Li

[PATCH v2 3/3] X86: Add a thread cpu time implementation to vDSO

2014-12-17 Thread Shaohua Li
etected on x86_64 context switch code. Most archs that don't support vsyscalls will have this code disabled via jump labels. Cc: Andy Lutomirski Cc: H. Peter Anvin Cc: Ingo Molnar Signed-off-by: Kumar Sundararajan Signed-off-by: Arun Sharma Signed-off-by: Chris Mason Signed-off-by: Shaohua Li

Re: [PATCH RFC 1/4] mm: throttle MADV_FREE

2015-02-26 Thread Shaohua Li
On Thu, Feb 26, 2015 at 09:42:06AM +0900, Minchan Kim wrote: > Hello, > > On Wed, Feb 25, 2015 at 10:37:48AM -0800, Shaohua Li wrote: > > On Wed, Feb 25, 2015 at 04:11:18PM +0900, Minchan Kim wrote: > > > On Wed, Feb 25, 2015 at 09:08:09AM +0900, Minchan Kim

Re: [PATCH] md/raid5: preserve STRIPE_PREREAD_ACTIVE in break_stripe_batch_list

2016-03-09 Thread Shaohua Li
On Wed, Mar 09, 2016 at 12:58:25PM +1100, Neil Brown wrote: > > break_stripe_batch_list breaks up a batch and copies some flags from > the batch head to the members, preserving others. > > It doesn't preserve or copy STRIPE_PREREAD_ACTIVE. This is not > normally a problem as STRIPE_PREREAD_ACTIV

Re: [PATCH] md/raid5: preserve STRIPE_PREREAD_ACTIVE in break_stripe_batch_list

2016-03-09 Thread Shaohua Li
On Thu, Mar 10, 2016 at 06:19:42AM +1100, Neil Brown wrote: > On Thu, Mar 10 2016, Shaohua Li wrote: > > > On Wed, Mar 09, 2016 at 12:58:25PM +1100, Neil Brown wrote: > >> > >> break_stripe_batch_list breaks up a batch and copies some flags from > >> th

Re: [PATCH] md/raid5: Cleanup cpu hotplug notifier

2016-03-19 Thread Shaohua Li
otplug notifier transitions > to free the scratch buffer. > > CC: Shaohua Li > CC: linux-r...@vger.kernel.org > Signed-off-by: Anna-Maria Gleixner Applied, thanks!

[GIT PULL] MD fix for 4.6-rc2

2016-04-08 Thread Shaohua Li
nges up to f9a67b1182e5abfcfcec24762ea95a77332f035e: md/bitmap: clear bitmap if bitmap_create failed (2016-04-01 13:05:50 -0700) Guoqing Jiang (1): md/bitmap: clear bitmap if bitmap_create failed Shaohua Li (1): MD: add rdev reference

Re: [PATCH] block: make sure big bio is splitted into at most 256 bvecs

2016-04-05 Thread Shaohua Li
ake_request handle arbitrarily sized bios) this bug is introduced by d2be537c3ba > Reported-by: Sebastian Roesner > Reported-by: Eric Wheeler > Cc: sta...@vger.kernel.org (4.2+) > Cc: Shaohua Li > Signed-off-by: Ming Lei > --- > I can reproduce the issue and verify the fix

Re: [PATCH] block: make sure big bio is splitted into at most 256 bvecs

2016-04-05 Thread Shaohua Li
On Tue, Apr 05, 2016 at 04:27:33PM -0800, Kent Overstreet wrote: > On Tue, Apr 05, 2016 at 11:27:21AM -0700, Shaohua Li wrote: > > On Wed, Apr 06, 2016 at 01:44:06AM +0800, Ming Lei wrote: > > > After arbitrary bio size is supported, the incoming bio may > > > be very bi

Re: [PATCH] block: make sure big bio is splitted into at most 256 bvecs

2016-04-05 Thread Shaohua Li
On Tue, Apr 05, 2016 at 04:36:04PM -0800, Kent Overstreet wrote: > On Tue, Apr 05, 2016 at 05:30:07PM -0700, Shaohua Li wrote: > > this one: > > http://marc.info/?l=linux-kernel&m=145926976808760&w=2 > > Ah. that patch won't actually fix the bug, since md isn

Re: [PATCH] [RFC] fix potential access after free: return value of blk_check_plugged() must be used schedule() safe

2016-04-05 Thread Shaohua Li
On Tue, Apr 05, 2016 at 03:36:57PM +0200, Lars Ellenberg wrote: > blk_check_plugged() will return a pointer > to an object linked on current->plug->cb_list. > > That list may "at any time" be implicitly cleared by > blk_flush_plug_list() > flush_plug_callbacks() > either as a result of blk_finish

Re: [PATCH] block: make sure big bio is splitted into at most 256 bvecs

2016-04-05 Thread Shaohua Li
On Tue, Apr 05, 2016 at 04:45:55PM -0800, Kent Overstreet wrote: > On Tue, Apr 05, 2016 at 05:41:47PM -0700, Shaohua Li wrote: > > On Tue, Apr 05, 2016 at 04:36:04PM -0800, Kent Overstreet wrote: > > > On Tue, Apr 05, 2016 at 05:30:07PM -0700, Shaohua Li wrote: > > >

Re: [PATCH] block: make sure big bio is splitted into at most 256 bvecs

2016-04-05 Thread Shaohua Li
On Wed, Apr 06, 2016 at 08:47:56AM +0800, Ming Lei wrote: > On Wed, Apr 6, 2016 at 2:27 AM, Shaohua Li wrote: > > On Wed, Apr 06, 2016 at 01:44:06AM +0800, Ming Lei wrote: > >> After arbitrary bio size is supported, the incoming bio may > >> be very big. We have to sp

Re: [RFC 1/2] time: workaround crappy hpet

2016-04-18 Thread Shaohua Li
On Mon, Apr 18, 2016 at 10:05:22AM -0700, John Stultz wrote: > On Mon, Apr 11, 2016 at 5:57 PM, Shaohua Li wrote: > > Calvin found 'perf record -a --call-graph dwarf -- sleep 5' making > > clocksource > > switching to hpet. We found similar symptom in another m

Re: [RFC 1/2] time: workaround crappy hpet

2016-04-18 Thread Shaohua Li
On Mon, Apr 18, 2016 at 10:42:38AM -0700, John Stultz wrote: > On Mon, Apr 18, 2016 at 10:32 AM, Shaohua Li wrote: > > On Mon, Apr 18, 2016 at 10:05:22AM -0700, John Stultz wrote: > >> On Mon, Apr 11, 2016 at 5:57 PM, Shaohua Li wrote: > >> > Calvin found 

[PATCH] block: copy NOMERGE flag from bio to request

2016-04-25 Thread Shaohua Li
bio might have NOMERGE flag set, for example blk_queue_split sets it. When we initiate request, copy this flag too. Signed-off-by: Shaohua Li --- include/linux/blk_types.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h

[PATCH] MD: make bio mergeable

2016-04-25 Thread Shaohua Li
https://bugzilla.kernel.org/show_bug.cgi?id=117051 Reported-by: Park Ju Hyung Fixes: 6ac45aeb6bca(block: avoid to merge splitted bio) Cc: sta...@vger.kernel.org (v4.3+) Cc: Ming Lei Cc: Jens Axboe Cc: Neil Brown Signed-off-by: Shaohua Li --- drivers/md/md.c | 2 ++ 1 file changed, 2 insertions(+) diff

Re: [PATCH] block: don't make BLK_DEF_MAX_SECTORS too big

2016-03-29 Thread Shaohua Li
On Tue, Mar 29, 2016 at 02:18:33PM -0700, Christoph Hellwig wrote: > On Tue, Mar 29, 2016 at 09:42:33AM -0700, Shaohua Li wrote: > > bio_alloc_bioset() allocates bvecs from bvec_slabs which can only > > allocate maximum 256 bvec (eg, 1M for 4k pages). We can't bump > >

Re: [PATCH] block: don't make BLK_DEF_MAX_SECTORS too big

2016-03-29 Thread Shaohua Li
On Wed, Mar 30, 2016 at 09:39:35AM +0800, Ming Lei wrote: > On Wed, Mar 30, 2016 at 12:42 AM, Shaohua Li wrote: > > bio_alloc_bioset() allocates bvecs from bvec_slabs which can only > > allocate maximum 256 bvec (eg, 1M for 4k pages). We can't bump > > BLK_DEF_MAX_SE

Re: [PATCH] block: don't make BLK_DEF_MAX_SECTORS too big

2016-03-30 Thread Shaohua Li
On Tue, Mar 29, 2016 at 11:51:51PM -0700, Christoph Hellwig wrote: > On Tue, Mar 29, 2016 at 03:01:10PM -0700, Shaohua Li wrote: > > The problem is bcache allocates a big bio (with bio_alloc). The bio is > > split with blk_queue_split, but it isn't split to small size because

Re: [PATCH] block: don't make BLK_DEF_MAX_SECTORS too big

2016-03-30 Thread Shaohua Li
On Wed, Mar 30, 2016 at 08:13:07PM +0800, Ming Lei wrote: > Hi Shaohua, > > On Wed, Mar 30, 2016 at 10:27 AM, Shaohua Li wrote: > > On Wed, Mar 30, 2016 at 09:39:35AM +0800, Ming Lei wrote: > >> On Wed, Mar 30, 2016 at 12:42 AM, Shaohua Li wrote: > >> > bio

[RFC 1/2] time: workaround crappy hpet

2016-04-11 Thread Shaohua Li
value. In the relevant machine, the hpet counter doesn't read to 0x later. The chance hpet has 0x counter is very small, this patch should have no impact for good hpet. I'm open if there is better solution. Reported-by: Calvin Owens Signed-off-by: Shaohua Li --- arch/x86/ker

[RFC 2/2] time: double check if watchdog clocksource is correct

2016-04-11 Thread Shaohua Li
ff-by: Shaohua Li --- kernel/time/clocksource.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c index 56ece14..36aff4e 100644 --- a/kernel/time/clocksource.c +++ b/kernel/time/clocksource.c @@ -122,9 +122,10 @@ stati

Re: [PATCH v2 01/13] mm: support madvise(MADV_FREE)

2015-11-05 Thread Shaohua Li
On Wed, Nov 04, 2015 at 05:05:47PM -0500, Daniel Micay wrote: > > With enough pages at once, though, munmap would be fine, too. > > That implies lots of page faults and zeroing though. The zeroing alone > is a major performance issue. > > There are separate issues with munmap since it ends up res

Re: [PATCH] md/raid5: fix locking in handle_stripe_clean_event()

2015-10-30 Thread Shaohua Li
On Fri, Oct 30, 2015 at 05:02:47PM +0300, Roman Gushchin wrote: > > Isn't the 4.1 fix just: > > > > diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c > > index e5befa356dbe..6e4350a78257 100644 > > --- a/drivers/md/raid5.c > > +++ b/drivers/md/raid5.c > > @@ -3522,16 +3522,16 @@ returnbi: > >   

Re: [PATCH 1/8] mm: support madvise(MADV_FREE)

2015-10-30 Thread Shaohua Li
On Fri, Oct 30, 2015 at 04:01:37PM +0900, Minchan Kim wrote: > +static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr, > + unsigned long end, struct mm_walk *walk) > + > +{ > + struct mmu_gather *tlb = walk->private; > + struct mm_struct *mm = tlb->mm;

Re: [PATCH 5/8] mm: move lazily freed pages to inactive list

2015-10-30 Thread Shaohua Li
On Fri, Oct 30, 2015 at 04:01:41PM +0900, Minchan Kim wrote: > MADV_FREE is a hint that it's okay to discard pages if there is memory > pressure and we use reclaimers(ie, kswapd and direct reclaim) to free them > so there is no value keeping them in the active anonymous LRU so this > patch moves th

[PATCH] workqueue: make sure delayed work run in local cpu

2015-09-30 Thread Shaohua Li
hread+0xf8/0x110 [ 28.020071] [] ?kthread_create_on_node+0x200/0x200 [ 28.020071] [] ret_from_fork+0x3f/0x70 [ 28.020071] [] ?kthread_create_on_node+0x200/0x200 Signed-off-by: Shaohua Li --- kernel/workqueue.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/k

Re: [PATCH 5/8] mm: move lazily freed pages to inactive list

2015-11-04 Thread Shaohua Li
On Tue, Nov 03, 2015 at 09:52:23AM +0900, Minchan Kim wrote: > On Fri, Oct 30, 2015 at 10:22:12AM -0700, Shaohua Li wrote: > > On Fri, Oct 30, 2015 at 04:01:41PM +0900, Minchan Kim wrote: > > > MADV_FREE is a hint that it's okay to discard pages if there is memory >

Re: [PATCH 5/8] mm: move lazily freed pages to inactive list

2015-11-04 Thread Shaohua Li
On Wed, Nov 04, 2015 at 09:53:42AM -0800, Shaohua Li wrote: > On Tue, Nov 03, 2015 at 09:52:23AM +0900, Minchan Kim wrote: > > On Fri, Oct 30, 2015 at 10:22:12AM -0700, Shaohua Li wrote: > > > On Fri, Oct 30, 2015 at 04:01:41PM +0900, Minchan Kim wrote: > > > > MADV

Re: [PATCH v2 01/13] mm: support madvise(MADV_FREE)

2015-11-04 Thread Shaohua Li
On Wed, Nov 04, 2015 at 10:25:55AM +0900, Minchan Kim wrote: > Linux doesn't have the ability to free pages lazily, while other OSes already > support it under the name madvise(MADV_FREE). > > The gain is clear: the kernel can discard freed pages rather than swapping > them out or going OOM if memory press

[GIT PULL] MD update for 4.19-rc2

2018-09-07 Thread Shaohua Li
bad: md-cluster: release RESYNC lock after the last resync message (2018-08-31 17:38:10 -0700) Guoqing Jiang (1): md-cluster: release RESYNC lock after the last resync message Shaohua Li (1): md/raid5-cache: disable reshape

[GIT PULL] MD update for 4.18-rc

2018-06-09 Thread Shaohua Li
Hi, A few fixes of MD for this merge window. Mostly bug fixes: - raid5 stripe batch fix from Amy - Read error handling for raid1 FailFast device from Gioh - raid10 recovery NULL pointer dereference fix from Guoqing - Support write hint for raid5 stripe cache from Mariusz - Fixes for device hot add/

[GIT PULL] MD update for 4.17-rc1

2018-04-19 Thread Shaohua Li
Hi, 3 small fixes for MD: - md-cluster fix for faulty device from Guoqing - writehint fix for writebehind IO for raid1 from Mariusz - a live lock fix for interrupted recovery from Yufen Please pull! The following changes since commit f8cf2f16a7c95acce497bfafa90e7c6d8397d653: Merge branch 'next

Re: [PATCH/RFC] add "failfast" support for raid1/raid10.

2016-11-21 Thread Shaohua Li
On Fri, Nov 18, 2016 at 04:16:11PM +1100, Neil Brown wrote: > Hi, > > I've been sitting on these patches for a while because although they > solve a real problem, it is a fairly limited use-case, and I don't > really like some of the details. > > So I'm posting them as RFC in the hope that a

Re: [PATCH V4 02/15] blk-throttle: add .high interface

2016-11-22 Thread Shaohua Li
On Tue, Nov 22, 2016 at 03:02:53PM -0500, Tejun Heo wrote: > Hello, Shaohua. > > Sorry about the delay. > > On Mon, Nov 14, 2016 at 02:22:09PM -0800, Shaohua Li wrote: > > @@ -1376,11 +1414,37 @@ static ssize_t tg_set_max(struct kernfs_open_file > > *of, > >

Re: [PATCH V4 03/15] blk-throttle: configure bps/iops limit for cgroup in high limit

2016-11-22 Thread Shaohua Li
On Tue, Nov 22, 2016 at 03:16:43PM -0500, Tejun Heo wrote: > On Mon, Nov 14, 2016 at 02:22:10PM -0800, Shaohua Li wrote: > > each queue will have a state machine. Initially queue is in LIMIT_HIGH > > state, which means all cgroups will be throttled according to their high >

Re: [PATCH V4 07/15] blk-throttle: make throtl_slice tunable

2016-11-22 Thread Shaohua Li
On Tue, Nov 22, 2016 at 04:27:15PM -0500, Tejun Heo wrote: > Hello, > > On Mon, Nov 14, 2016 at 02:22:14PM -0800, Shaohua Li wrote: > > throtl_slice is important for blk-throttling. A lot of things depend on > > it, for example, throughput measurement. It has 100ms defaul

Re: [PATCH V4 05/15] blk-throttle: add downgrade logic

2016-11-22 Thread Shaohua Li
On Tue, Nov 22, 2016 at 04:42:00PM -0500, Tejun Heo wrote: > Hello, > > On Tue, Nov 22, 2016 at 04:21:21PM -0500, Tejun Heo wrote: > > 1. A cgroup and its high and max limits don't have much to do with > >other cgroups and their limits. I don't get how the choice between > >high and max l

Re: [PATCH V4 09/15] blk-throttle: make bandwidth change smooth

2016-11-23 Thread Shaohua Li
On Wed, Nov 23, 2016 at 04:23:35PM -0500, Tejun Heo wrote: > Hello, > > On Mon, Nov 14, 2016 at 02:22:16PM -0800, Shaohua Li wrote: > > cg1/cg2 bps: 10/80 -> 15/105 -> 20/100 -> 25/95 -> 30/90 -> 35/85 -> 40/80 > > -> 45/75 -> 10/80 > > I wonde

Re: [PATCH V4 11/15] blk-throttle: add interface to configure think time threshold

2016-11-23 Thread Shaohua Li
On Wed, Nov 23, 2016 at 04:32:43PM -0500, Tejun Heo wrote: > On Mon, Nov 14, 2016 at 02:22:18PM -0800, Shaohua Li wrote: > > Add interface to configure the threshold > > > > Signed-off-by: Shaohua Li > > --- > > block/blk-sysfs.c| 7 +

Re: [PATCH V4 10/15] blk-throttle: add a simple idle detection

2016-11-23 Thread Shaohua Li
On Wed, Nov 23, 2016 at 04:46:19PM -0500, Tejun Heo wrote: > Hello, Shaohua. > > On Mon, Nov 14, 2016 at 02:22:17PM -0800, Shaohua Li wrote: > > Unfortunately it's very hard to determine if a cgroup is real idle. This > > patch uses the 'think time check' i

Re: [PATCH] md/r5cache: fix spelling mistake on "recoverying"

2017-01-04 Thread Shaohua Li
On Fri, Dec 23, 2016 at 12:52:30AM +, Colin King wrote: > From: Colin Ian King > > Trivial fix to spelling mistake "recoverying" to "recovering" in > pr_dbg message. applied, thanks > Signed-off-by: Colin Ian King > --- > drivers/md/raid5-cache.c | 2 +- > 1 file changed, 1 insertion(+),

Re: PROBLEM: Kernel BUG with raid5 soft + Xen + DRBD - invalid opcode

2017-01-04 Thread Shaohua Li
On Fri, Dec 23, 2016 at 07:25:56PM +0100, MasterPrenium wrote: > Hello Guys, > > I'm having some trouble on a new system I'm setting up. I'm getting a kernel > BUG message, which seems to be related to the use of Xen (when I boot the system > _without_ Xen, I don't get any crash). > Here is configu

Re: PROBLEM: Kernel BUG with raid5 soft + Xen + DRBD - invalid opcode

2017-01-05 Thread Shaohua Li
Khlebnikov ? Right. > Do you want the "ext4.dat" fio file ? It will be really difficult for me to > provide it to you as I've only a poor ADSL network connection. Not necessary. Thanks, Shaohua > Thanks for your help, > > MasterPrenium > > Le 04/01/2017 à

[PATCH V6 00/18] blk-throttle: add .low limit

2017-01-14 Thread Shaohua Li
arc.info/?l=linux-block&m=147916216512915&w=2 V2->V3: - Rebase - Fix several bugs - Make harddisk think time threshold bigger http://marc.info/?l=linux-kernel&m=147552964708965&w=2 V1->V2: - Drop io.low interface for simplicity and the interface isn't a must-have to

[PATCH V6 04/18] blk-throttle: configure bps/iops limit for cgroup in low limit

2017-01-14 Thread Shaohua Li
ff-by: Shaohua Li --- block/blk-throttle.c | 20 ++-- 1 file changed, 18 insertions(+), 2 deletions(-) diff --git a/block/blk-throttle.c b/block/blk-throttle.c index d3ad43c..3bc6deb 100644 --- a/block/blk-throttle.c +++ b/block/blk-throttle.c @@ -212,12 +212,28 @@ static s

[PATCH V6 03/18] blk-throttle: add .low interface

2017-01-14 Thread Shaohua Li
configuration. Old bps/iops fields in throtl_grp will be the actual limit we use for throttling. Signed-off-by: Shaohua Li --- block/blk-throttle.c | 142 +-- 1 file changed, 114 insertions(+), 28 deletions(-) diff --git a/block/blk-throttle.c b/block/blk

[PATCH V6 14/18] blk-throttle: ignore idle cgroup limit

2017-01-14 Thread Shaohua Li
Last patch introduces a way to detect idle cgroup. We use it to make upgrade/downgrade decision. And the new algorithm can detect completely idle cgroup too, so we can delete the corresponding code. Signed-off-by: Shaohua Li --- block/blk-throttle.c | 40

[PATCH V6 18/18] blk-throttle: add latency target support

2017-01-14 Thread Shaohua Li
be treated idle and other cgroups can dispatch more IO. Currently this latency target check is only for SSD as we can't calculate the latency target for hard disk. And this is only for cgroup leaf node so far. Signed-off-by: Shaohua Li --- block/blk-

[PATCH V6 09/18] blk-throttle: choose a small throtl_slice for SSD

2017-01-14 Thread Shaohua Li
The throtl_slice is 100ms by default. This is a long time for SSD, a lot of IO can run. To make cgroups have smoother throughput, we choose a small value (20ms) for SSD. Signed-off-by: Shaohua Li --- block/blk-sysfs.c| 2 ++ block/blk-throttle.c | 18 +++--- block/blk.h

[PATCH V6 12/18] blk-throttle: add a simple idle detection

2017-01-14 Thread Shaohua Li
'us' with 'ns >> 10'. This is fast but loses precision, which should not be a big deal. Signed-off-by: Shaohua Li --- block/bio.c | 2 ++ block/blk-throttle.c | 79 ++- block/blk.h | 2 ++ include
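
To see how much precision the `ns >> 10` shortcut gives up, here is a small sketch. The function names are hypothetical and are not taken from the patch; it only illustrates that shifting right by 10 divides by 1024 rather than 1000.

```c
#include <stdint.h>

/* Hypothetical names, for illustration only. */

/* Approximate nanoseconds -> microseconds with a shift (divide by 1024). */
static inline uint64_t ns_to_us_approx(uint64_t ns)
{
    return ns >> 10;
}

/* Exact nanoseconds -> microseconds (divide by 1000). */
static inline uint64_t ns_to_us_exact(uint64_t ns)
{
    return ns / 1000;
}
```

For 1,000,000 ns the exact conversion gives 1000 us and the shift gives 976 us, so the approximation reads about 2.3% low (1000/1024), in exchange for avoiding a 64-bit division.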

[PATCH V6 02/18] blk-throttle: prepare support multiple limits

2017-01-14 Thread Shaohua Li
We are going to support low/max limit, each cgroup will have 2 limits after that. This patch prepares for the multiple limits change. Signed-off-by: Shaohua Li --- block/blk-throttle.c | 110 --- 1 file changed, 70 insertions(+), 40 deletions

[PATCH V6 05/18] blk-throttle: add upgrade logic for LIMIT_LOW state

2017-01-14 Thread Shaohua Li
ningless. As long as parent's bps/iops (which is a sum of children's bps/iops) cross low limit, we can upgrade queue state. Signed-off-by: Shaohua Li --- block/blk-throttle.c | 100 --- 1 file changed, 96 insertions(+), 4 deletions(-) diff --git

[PATCH V6 15/18] blk-throttle: add interface for per-cgroup target latency

2017-01-14 Thread Shaohua Li
other cgroups. Users configure the interface this way: echo "8:16 rbps=2097152 wbps=max latency=100 idle=200" > io.low (latency is in microseconds). By default, the latency target is 0, which means to guarantee IO latency. Signed-off-by: Shaohua Li --- block/blk-throttle.c |

[PATCH V6 13/18] blk-throttle: add interface to configure idle time threshold

2017-01-14 Thread Shaohua Li
Add an interface to configure the threshold. The io.low interface will look like: echo "8:16 rbps=2097152 wbps=max idle=2000" > io.low (idle is in microseconds). Signed-off-by: Shaohua Li --- block/blk-throttle.c | 41 - 1 file changed, 28 inse
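
The line written to io.low in the snippet above can be built programmatically. The sketch below is only an illustration: `format_io_low` is a hypothetical helper, and the key layout is copied from the example in the snippet (an interface proposed in this patch series), not from any released kernel documentation.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: format "MAJ:MIN rbps=<bytes/s> wbps=max idle=<us>"
 * as in the snippet's example. Returns the string length, or -1 if the
 * buffer is too small or formatting fails.
 */
int format_io_low(char *buf, size_t len, unsigned int major,
                  unsigned int minor, unsigned long long rbps,
                  unsigned int idle_us)
{
    int n = snprintf(buf, len, "%u:%u rbps=%llu wbps=max idle=%u",
                     major, minor, rbps, idle_us);

    if (n < 0 || (size_t)n >= len)
        return -1;
    return n;
}
```

With device 8:16, `rbps` of 2097152 (2 MiB/s) and an idle threshold of 2000 us, this reproduces the string from the snippet, which would then be written to the cgroup's io.low file.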

[PATCH V6 06/18] blk-throttle: add downgrade logic

2017-01-14 Thread Shaohua Li
When queue state machine is in LIMIT_MAX state, but a cgroup is below its low limit for some time, the queue should be downgraded to lower state as one cgroup's low limit isn't met. Signed-off-by: Shaohua Li --- block/blk-throttle.c | 156

[PATCH V6 08/18] blk-throttle: make throtl_slice tunable

2017-01-14 Thread Shaohua Li
the sysfs name 'throttle_sample_time' reflects its character better. Signed-off-by: Shaohua Li --- Documentation/block/queue-sysfs.txt | 6 +++ block/blk-sysfs.c | 10 + block/blk-throttle.c| 77 ++--- block/blk.h

[PATCH V6 01/18] blk-throttle: use U64_MAX/UINT_MAX to replace -1

2017-01-14 Thread Shaohua Li
clean up the code to avoid using -1 Signed-off-by: Shaohua Li --- block/blk-throttle.c | 32 1 file changed, 16 insertions(+), 16 deletions(-) diff --git a/block/blk-throttle.c b/block/blk-throttle.c index a6bb4fe..e45bf50 100644 --- a/block/blk-throttle.c

[PATCH V6 16/18] block: track request size in blk_issue_stat

2017-01-14 Thread Shaohua Li
, which still is very long time. Signed-off-by: Shaohua Li --- block/blk-core.c | 2 +- block/blk-mq.c| 2 +- block/blk-stat.c | 7 --- block/blk-stat.h | 29 +++-- block/blk-wbt.h | 10 +- include/linux

[PATCH V6 11/18] blk-throttle: make bandwidth change smooth

2017-01-14 Thread Shaohua Li
If the scale becomes 0, we then fully downgrade the queue to LIMIT_LOW state. Note this doesn't completely avoid cgroup running under its low limit. The best way to guarantee cgroup doesn't run under its limit is to set max limit. For example, if we set cg1 max limit to 40, cg2 will nev

[PATCH V6 17/18] blk-throttle: add a mechanism to estimate IO latency

2017-01-14 Thread Shaohua Li
ead of request size. Currently this feature is SSD only, we probably can use a fixed threshold like 4ms for hard disk though. Signed-off-by: Shaohua Li --- block/blk-stat.c | 4 ++ block/blk-throttle.c | 162 -- block/blk.h

[PATCH V6 07/18] blk-throttle: make sure expire time isn't too big

2017-01-14 Thread Shaohua Li
roup sleep time not too big wouldn't change cgroup bps/iops, but could make it wakeup more frequently, which isn't a big issue because throtl_slice * 8 is already quite big. Signed-off-by: Shaohua Li --- block/blk-throttle.c | 11 +++ 1 file changed, 11 insertions(+) diff --git

[PATCH V6 10/18] blk-throttle: detect completed idle cgroup

2017-01-14 Thread Shaohua Li
roups, so I leave it here. Signed-off-by: Shaohua Li --- block/blk-throttle.c | 19 ++- 1 file changed, 18 insertions(+), 1 deletion(-) diff --git a/block/blk-throttle.c b/block/blk-throttle.c index 2d05c91..b3ce176 100644 --- a/block/blk-throttle.c +++ b/block/blk-throttle.c @

Re: [RFC 0/6]mm: add new LRU list for MADV_FREE pages

2017-01-31 Thread Shaohua Li
On Tue, Jan 31, 2017 at 01:59:49PM -0500, Johannes Weiner wrote: > Hi Shaohua, > > On Sun, Jan 29, 2017 at 09:51:17PM -0800, Shaohua Li wrote: > > We are trying to use MADV_FREE in jemalloc. Several issues are found. > > Without > > solving the issues, jemalloc can

[PATCH 1/2] blk-mq: allocate blk_mq_tags and requests in correct node

2017-01-31 Thread Shaohua Li
blk_mq_tags/requests of specific hardware queue are mostly used in specific cpus, which might not be in the same numa node as disk. For example, a nvme card is in node 0. half hardware queue will be used by node 0, the other node 1. Signed-off-by: Shaohua Li --- block/blk-mq.c | 14

[PATCH 2/2] nvme: allocate nvme_queue in correct node

2017-01-31 Thread Shaohua Li
nvme_queue is a per-cpu queue (mostly). Allocate it in the node where blk-mq will use it. Signed-off-by: Shaohua Li --- drivers/nvme/host/pci.c | 19 +++ 1 file changed, 15 insertions(+), 4 deletions(-) diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index 3faefab

Re: [RFC 0/6]mm: add new LRU list for MADV_FREE pages

2017-02-01 Thread Shaohua Li
On Tue, Jan 31, 2017 at 04:38:10PM -0500, Johannes Weiner wrote: > On Tue, Jan 31, 2017 at 11:45:47AM -0800, Shaohua Li wrote: > > On Tue, Jan 31, 2017 at 01:59:49PM -0500, Johannes Weiner wrote: > > > Hi Shaohua, > > > > > > On Sun, Jan 29, 2017 at 09:51:17PM

[PATCH V2 3/3] nvme: allocate nvme_queue in correct node

2017-02-01 Thread Shaohua Li
nvme_queue is a per-cpu queue (mostly). Allocate it in the node where blk-mq will use it. Signed-off-by: Shaohua Li --- drivers/nvme/host/pci.c | 12 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index 032237c..9733008

[PATCH V2 2/3] PCI: add an API to get node from vector

2017-02-01 Thread Shaohua Li
Next patch will use the API to get the node from vector for nvme device Signed-off-by: Shaohua Li --- drivers/pci/msi.c | 16 include/linux/pci.h | 6 ++ 2 files changed, 22 insertions(+) diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c index 50c5003..ab7aee7 100644

[PATCH V2 1/3] blk-mq: allocate blk_mq_tags and requests in correct node

2017-02-01 Thread Shaohua Li
blk_mq_tags/requests of specific hardware queue are mostly used in specific cpus, which might not be in the same numa node as disk. For example, a nvme card is in node 0. half hardware queue will be used by node 0, the other node 1. Signed-off-by: Shaohua Li --- block/blk-mq.c | 21

[PATCH] x86/intel_rdt: reinitialize cbm for new group allocation

2017-01-06 Thread Shaohua Li
is more natural. Cc: Fenghua Yu Cc: Thomas Gleixner Signed-off-by: Shaohua Li --- arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 9 + 1 file changed, 9 insertions(+) diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c index 8af04af..7e81527 100644 -

Re: [PATCH] x86/intel_rdt: reinitialize cbm for new group allocation

2017-01-09 Thread Shaohua Li
On Mon, Jan 09, 2017 at 11:03:59PM +0100, Thomas Gleixner wrote: > On Mon, 9 Jan 2017, Fenghua Yu wrote: > > On Fri, Jan 06, 2017 at 04:05:19PM -0800, Shaohua Li wrote: > > But since you come here now, I would think resetting the CBM in > > closid_free() is better. > >

Re: [PATCH V5 00/17] blk-throttle: add .low limit

2017-01-09 Thread Shaohua Li
On Mon, Jan 09, 2017 at 04:46:35PM -0500, Tejun Heo wrote: > Hello, > > Sorry about the long delay. Generally looks good to me. Overall, > there are only a few things that I think should be addressed. Thanks for your time! > * Low limit should default to zero. I forgot to change it after cha

Re: PROBLEM: Kernel BUG with raid5 soft + Xen + DRBD - invalid opcode

2017-01-09 Thread Shaohua Li
On Sun, Jan 08, 2017 at 02:31:15PM +0100, MasterPrenium wrote: > Hello, > > Replies below + : > - I don't know if this can help but after the crash, when the system > reboots, the Raid 5 stack is re-synchronizing > [ 37.028239] md10: Warning: Device sdc1 is misaligned > [ 37.028541] created bi

Re: [PATCH] md/bitmap: use i_blocksize()

2017-01-21 Thread Shaohua Li
On Fri, Jan 20, 2017 at 10:29:52PM +0800, Geliang Tang wrote: > Since i_blocksize() helper has been defined in fs.h, use it instead > of open-coding. which tree is this patch applied to? I can't find it in Linus's tree > Signed-off-by: Geliang Tang > --- > drivers/md/bitmap.c | 6 +++--- > 1 fi

[GIT PULL] MD update for 4.10-rc6

2017-01-27 Thread Shaohua Li
4 11:26:06 -0800) -------- Shaohua Li (1): md/raid5-cache: delete meaningless code Song Liu (5): md/r5cache: read data into orig_page for prexor of cached data md/raid5: move comment of fetch_block to right location

[RFC 0/6]mm: add new LRU list for MADV_FREE pages

2017-01-29 Thread Shaohua Li
E page can be promoted to active page there. But there isn't mm_struct context at that place. Iterating vma there sounds too silly. The patchset doesn't fix this issue yet. Hopefully somebody can share a hint how to fix this issue. Thanks, Shaohua Minchan's previous patches: http://marc

[RFC 5/6] mm: reclaim lazyfree pages

2017-01-29 Thread Shaohua Li
n normal way. Cc: Michal Hocko Cc: Minchan Kim Cc: Hugh Dickins Cc: Johannes Weiner Cc: Rik van Riel Cc: Mel Gorman Signed-off-by: Shaohua Li --- mm/rmap.c | 7 ++- mm/vmscan.c | 56 2 files changed, 54 insertions(+), 9 deleti

[RFC 3/6] mm: add LRU_LAZYFREE lru list

2017-01-29 Thread Shaohua Li
ickins Cc: Johannes Weiner Cc: Rik van Riel Cc: Mel Gorman Signed-off-by: Shaohua Li --- drivers/base/node.c | 2 ++ drivers/staging/android/lowmemorykiller.c | 3 ++- fs/proc/meminfo.c | 1 + include/linux/mm_inline.h

[RFC 2/6] mm: add lazyfree page flag

2017-01-29 Thread Shaohua Li
m Cc: Hugh Dickins Cc: Johannes Weiner Cc: Rik van Riel Cc: Mel Gorman Signed-off-by: Shaohua Li --- fs/proc/task_mmu.c | 8 +++- include/linux/mm_inline.h | 5 + include/linux/page-flags.h | 6 ++ mm/huge_memory.c | 1 + mm/migrate.c | 2 ++ 5

[RFC 4/6] mm: move MADV_FREE pages into LRU_LAZYFREE list

2017-01-29 Thread Shaohua Li
: Mel Gorman Signed-off-by: Shaohua Li --- include/linux/swap.h | 2 +- mm/huge_memory.c | 5 ++--- mm/madvise.c | 3 +-- mm/swap.c| 51 +-- 4 files changed, 33 insertions(+), 28 deletions(-) diff --git a/include/linux/swa
