On Thu, Oct 05, 2017 at 11:28:47AM -0700, Matthias Kaehlcke wrote:
> The raid10 driver can't be built with clang since it uses a variable
> length array in a structure (VLAIS):
>
> drivers/md/raid10.c:4583:17: error: fields must have a constant size:
> 'variable length array in structure' extens
On Fri, Oct 06, 2017 at 01:22:12PM +1100, Neil Brown wrote:
> On Thu, Oct 05 2017, Matthias Kaehlcke wrote:
>
> > Hi Neil,
> >
> > On Fri, Oct 06, 2017 at 10:58:59AM +1100, NeilBrown wrote:
> >
> >> On Thu, Oct 05 2017, Matthias Kaehlcke wrote:
> >>
> >> > The raid10 driver can't be built with cl
On Thu, Sep 14, 2017 at 02:02:03PM -0700, Shaohua Li wrote:
> From: Shaohua Li
>
> Hi,
>
> The IO dispatched to under layer disk by loop block device isn't cloned from
> original bio, so the IO loses cgroup information of original bio. These IO
> escapes from cgroup c
t (2017-09-27 20:08:44 -0700)
--------
Shaohua Li (4):
md: separate request handling
md: fix a race condition for flush request handling
dm-raid: fix a race condition in request handling
md/raid5: cap worker count
dri
On Wed, Sep 20, 2017 at 11:01:47AM +0200, Artem Savkov wrote:
> Hi All,
>
> We recently started noticing madvise09[1] test from ltp failing strangely. The
> test does the following: maps 32 pages, sets MADV_FREE for the range it got,
> dirties 2 of the pages, creates memory pressure and check that
On Fri, Jun 09, 2017 at 07:24:29AM +1000, Neil Brown wrote:
> On Thu, Jun 08 2017, Mikulas Patocka wrote:
>
> > On Thu, 8 Jun 2017, Shaohua Li wrote:
> >
> >> On Thu, Jun 08, 2017 at 04:59:03PM +1000, Neil Brown wrote:
> >> > On Wed, Jun 07 2017, Miku
Hi,
Several patches for MD. One notable is making flush bios sync, others fix small
issues. Please pull!
Thanks,
Shaohua
The following changes since commit 08332893e37af6ae779367e78e444f8f9571511d:
Linux 4.12-rc2 (2017-05-21 19:30:23 -0700)
are available in the git repository at:
git://git
From: Shaohua Li
When working on adding exportfs operations in kernfs, I found it's hard
to initialize dentry->d_fsdata in the exportfs operations. Looks there
is no way to do it without race condition. Look at the kernfs code
closely, there is no point to set dentry->d_fsdata. inode
From: Shaohua Li
kernfs uses ida to manage inode number. The problem is we can't get
kernfs_node from inode number with ida. Switching to use idr, next patch
will add an API to get kernfs_node from inode number.
Signed-off-by: Shaohua Li
---
fs/kernfs/dir.c
From: Shaohua Li
Hi,
Currently blktrace isn't cgroup aware. blktrace prints out task name of current
context, but the task of current context isn't always in the cgroup where the
BIO comes from. We can't use task name to find out IO cgroup. For example,
Writeback BIOs always com
From: Shaohua Li
Add an API to export cgroup fhandle info. We don't export a full 'struct
file_handle', there are unrequired info. Specifically, cgroup is always
a directory, so we don't need a 'FILEID_INO32_GEN_PARENT' type fhandle,
we only need export the inod
From: Shaohua Li
Now we have the facilities to implement exportfs operations. The idea is
cgroup can export the fhandle info to userspace, then userspace uses
fhandle to find the cgroup name. Another example is userspace can get
fhandle for a cgroup and BPF uses the fhandle to filter info for
From: Shaohua Li
Currently blktrace isn't cgroup aware. blktrace prints out task name of
current context, but the task of current context isn't always in the
cgroup where the BIO comes from. We can't use task name to find out IO
cgroup. For example, Writeback BIOs always com
From: Shaohua Li
Add an API to get kernfs node from inode number. We will need this to
implement exportfs operations.
To make the API lock free, kernfs node is freed in RCU context. And we
depend on kernfs_node count/ino number to filter stale kernfs nodes.
Signed-off-by: Shaohua Li
---
fs
From: Shaohua Li
blkcg_bio_issue_check() already gets blkcg for a BIO.
bio_associate_blkcg() uses a percpu refcounter, so it's a very cheap
operation. There is no point we don't attach the cgroup info into bio at
blkcg_bio_issue_check. This also makes blktrace outputs correct c
From: Shaohua Li
Currently cfq/bfq/blk-throttle output cgroup info in trace in their own
way. Now we have standard blktrace API for this, so convert them to use
it.
Note, this changes the behavior a little bit. cgroup info isn't output
by default, we only do this with 'blk_cgro
From: Shaohua Li
bio_free isn't a good place to free cgroup/integrity info. There are a
lot of cases bio is allocated in special way (for example, in stack) and
never gets called by bio_put hence bio_free, we are leaking memory. This
patch moves the free to bio endio, which should be c
From: Shaohua Li
By default we output cgroup id in blktrace. This adds an option to
display cgroup path. Since getting the cgroup path is a relatively heavy
operation, we don't enable it by default.
With the option enabled, blktrace will output something like this:
dd-1353 [007] d..2 293.0
From: Shaohua Li
Set i_generation for kernfs inode. This is required to implement exportfs
operations.
Note, the generation is 32-bit, so it's possible the generation wraps up
and we find stale files. The possibility is low, since fhandle matches
both inode number and generation. In most fs
On Fri, Jun 02, 2017 at 03:03:45PM -0700, Eduardo Valentin wrote:
> On Fri, Jun 02, 2017 at 02:53:56PM -0700, Shaohua Li wrote:
> > From: Shaohua Li
> >
> > Add an API to get kernfs node from inode number. We will need this to
> > implement exportfs operations.
> > >
md: raid1/raid10: initialize bvec table via bio_add_page()
md: raid1-10: move raid1/raid10 common code into raid1-10.c
Ofer Heifetz (1):
md/raid5: add thread_group worker async_tx_issue_pending_all
Shaohua Li (2):
md/raid1: fix writebehind bio clone
MD: fix warnning
On Sat, Jul 29, 2017 at 07:52:45PM +0300, Cihangir Akturk wrote:
> Since commit f15146380d28 ("fs: seq_file - add event counter to simplify
> poll() support"), the md.c code has no longer used the private field of
> the struct seq_file, but seq_release_private() has continued to be
> used to
On Mon, Aug 07, 2017 at 01:20:25PM +0200, Dominik Brodowski wrote:
> Neil, Shaohua,
>
> following up on David R's bug message: I have observed something similar
> on v4.12.[345] and v4.13-rc4, but not on v4.11. This is a RAID1 (on bare
> metal partitions, /dev/sdaX and /dev/sdbY linked together).
On Sat, Aug 12, 2017 at 07:43:46PM +0200, Denys Vlasenko wrote:
> Signed-off-by: Denys Vlasenko
> Cc: H. Peter Anvin
> Cc: mi...@redhat.com
> Cc: Jim Kukunas
> Cc: Fenghua Yu
> Cc: Megha Dey
> Cc: Gayatri Kammela
> Cc: Shaohua Li
> Cc: x...@kernel.org
> Cc
700)
NeilBrown (2):
md: always clear ->safemode when md_check_recovery gets the mddev lock.
md: fix test in md_write_start()
Shaohua Li (1):
MD: not clear ->safemode for external metadata array
Song Liu (2):
m
From: Shaohua Li
Hi,
Currently blktrace isn't cgroup aware. blktrace prints out task name of current
context, but the task of current context isn't always in the cgroup where the
BIO comes from. We can't use task name to find out IO cgroup. For example,
Writeback BIOs always com
From: Shaohua Li
Add an API to export cgroup fhandle info. We don't export a full 'struct
file_handle', there are unrequired info. Specifically, cgroup is always
a directory, so we don't need a 'FILEID_INO32_GEN_PARENT' type fhandle,
we only need export the inod
From: Shaohua Li
Currently cfq/bfq/blk-throttle output cgroup info in trace in their own
way. Now we have standard blktrace API for this, so convert them to use
it.
Note, this changes the behavior a little bit. cgroup info isn't output
by default, we only do this with 'blk_cgro
From: Shaohua Li
Add an API to get kernfs node from inode number. We will need this to
implement exportfs operations.
To make the API lock free, kernfs node is freed in RCU context. And we
depend on kernfs_node count/ino number to filter stale kernfs nodes.
Signed-off-by: Shaohua Li
---
fs
From: Shaohua Li
blkcg_bio_issue_check() already gets blkcg for a BIO.
bio_associate_blkcg() uses a percpu refcounter, so it's a very cheap
operation. There is no point we don't attach the cgroup info into bio at
blkcg_bio_issue_check. This also makes blktrace outputs correct c
From: Shaohua Li
When working on adding exportfs operations in kernfs, I found it's hard
to initialize dentry->d_fsdata in the exportfs operations. Looks there
is no way to do it without race condition. Look at the kernfs code
closely, there is no point to set dentry->d_fsdata. inode
From: Shaohua Li
Set i_generation for kernfs inode. This is required to implement
exportfs operations. The generation is 32-bit, so it's possible the
generation wraps up and we find stale files. To reduce the possibility,
we don't reuse inode number immediately. When the inode number
From: Shaohua Li
inode number and generation can identify a kernfs node. We are going to
export the identification by exportfs operations, so put ino and
generation into a separate structure. It's convenient when later patches
use the identification.
Signed-off-by: Shaohua Li
---
fs/k
From: Shaohua Li
Now we have the facilities to implement exportfs operations. The idea is
cgroup can export the fhandle info to userspace, then userspace uses
fhandle to find the cgroup name. Another example is userspace can get
fhandle for a cgroup and BPF uses the fhandle to filter info for
From: Shaohua Li
By default we output cgroup id in blktrace. This adds an option to
display cgroup path. Since getting the cgroup path is a relatively heavy
operation, we don't enable it by default.
With the option enabled, blktrace will output something like this:
dd-1353 [007] d..2 293.0
From: Shaohua Li
bio_free isn't a good place to free cgroup/integrity info. There are a
lot of cases bio is allocated in special way (for example, in stack) and
never gets called by bio_put hence bio_free, we are leaking memory. This
patch moves the free to bio endio, which should be c
From: Shaohua Li
Currently blktrace isn't cgroup aware. blktrace prints out task name of
current context, but the task of current context isn't always in the
cgroup where the BIO comes from. We can't use task name to find out IO
cgroup. For example, Writeback BIOs always com
From: Shaohua Li
kernfs uses ida to manage inode number. The problem is we can't get
kernfs_node from inode number with ida. Switching to use idr, next patch
will add an API to get kernfs_node from inode number.
Signed-off-by: Shaohua Li
---
fs/kernfs/dir.c
9:48 -0800)
Nate Dailey (1):
md: limit mdstat resync progress to max_sectors
Shaohua Li (1):
md/raid1/10: add missed blk plug
Song Liu (1):
md/r5cache: move mddev_lock() out of r5c_journal_mode_set()
bingjingc (1)
tency could be much smaller than 1M IO latency. If we don't add
baseline latency, we can't specify a latency target which works for both 4k and
1M IO.
Thanks,
Shaohua
> Signed-off-by: Tejun Heo
> Cc: Shaohua Li
> ---
> block/blk-throttle.c |3 +--
> 1 file changed,
On Thu, Nov 09, 2017 at 03:42:58PM -0800, Tejun Heo wrote:
> Hello, Shaohua.
>
> On Thu, Nov 09, 2017 at 03:12:12PM -0800, Shaohua Li wrote:
> > The percentage latency makes sense, but the absolute latency doesn't to me.
> > A
> > 4k IO latency could be much s
On Thu, Nov 16, 2017 at 08:25:58PM -0800, Khazhismel Kumykov wrote:
> On Thu, Nov 16, 2017 at 8:50 AM, Shaohua Li wrote:
> > On Tue, Nov 14, 2017 at 03:10:22PM -0800, Khazhismel Kumykov wrote:
> >> Allows configuration additional bytes or ios before a throttle is
> >>
: Tejun Heo
Signed-off-by: Shaohua Li
---
kernel/kthread.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/kernel/kthread.c b/kernel/kthread.c
index f87cd8b4..cf5c113 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -205,6 +205,10 @@ static int kthread(void
ch doesn't sound much overhead.
Reported-by: syzbot
Fixes: 05e3db95ebfc ("kthread: add a mechanism to store cgroup info")
Cc: Andrew Morton
Cc: Ingo Molnar
Cc: Tejun Heo
Cc: Dmitry Vyukov
Signed-off-by: Shaohua Li
---
kernel/kthread.c | 6 +-----
1 file changed, 1 insertion(+), 5
On Fri, Sep 09, 2016 at 12:22:46AM -0700, Fenghua Yu wrote:
> On Thu, Sep 08, 2016 at 03:45:14PM -0700, Shaohua Li wrote:
> > On Thu, Sep 08, 2016 at 06:17:47PM -0700, Fenghua Yu wrote:
> > > On Thu, Sep 08, 2016 at 03:01:20PM -0700, Shaohua Li wrote:
> > > > On T
On Fri, Sep 09, 2016 at 08:03:42PM +0200, Stefan Priebe - Profihost AG wrote:
> On 08.09.2016 at 19:33, Shaohua Li wrote:
> > On Thu, Sep 08, 2016 at 10:16:59AM -0600, Jens Axboe wrote:
> >> On 09/08/2016 02:23 AM, Stefan Priebe - Profihost AG wrote:
> >>> Hi,
>
On Fri, Sep 09, 2016 at 06:03:12PM +, Luck, Tony wrote:
> > I don't think this is convenient, but it's ok. Now if we create a new thread
> > between 1 and 2, the new thread is in group1. The new thread pid isn't in
> > the
> > pid list we found in 1, so after 2, the new thread still is in grou
On Sat, Sep 10, 2016 at 12:36:57AM +, Yu, Fenghua wrote:
> > > Hmm, I don't know how applications are going to use the interface.
> > > Nobody knows it right now. But we do have some candidate workloads
> > > which want to configure the cache partition at runtime, so it's not
> > > just a boot
its think time is above a threshold (by
default 50us for SSD and 1ms for HD). The idea is think time above the
threshold will start to harm performance. HD is much slower so a longer
think time is ok. There is a knob to let user configure the threshold
too.
Signed-off-by: Shaohua Li
---
block/bi
'trial' logic, which creates too much fluctuation
- Add a new idle cgroup detection
- Other bug fixes and improvements
http://marc.info/?l=linux-block&m=147395674732335&w=2
V1:
http://marc.info/?l=linux-block&m=146292596425689&w=2
Shaohua Li (11):
block-throttle: prepare s
Last patch introduces a way to detect idle cgroup. We use it to make
upgrade/downgrade decision.
Signed-off-by: Shaohua Li
---
block/blk-throttle.c | 31 +++
1 file changed, 19 insertions(+), 12 deletions(-)
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
.
Signed-off-by: Shaohua Li
---
block/blk-sysfs.c| 11
block/blk-throttle.c | 72
block/blk.h | 3 +++
3 files changed, 64 insertions(+), 22 deletions(-)
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index f87a7e7
for their high limit.
Signed-off-by: Shaohua Li
---
block/blk-throttle.c | 21 +++--
1 file changed, 19 insertions(+), 2 deletions(-)
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 59d4b4c..e2b3704 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
We are going to support high/max limit, each cgroup will have 2 limits
after that. This patch prepares for the multiple limits change.
Signed-off-by: Shaohua Li
---
block/blk-throttle.c | 109 ---
1 file changed, 68 insertions(+), 41 deletions
When queue state machine is in LIMIT_MAX state, but a cgroup is below
its high limit for some time, the queue should be downgraded to lower
state as one cgroup's high limit isn't met.
Signed-off-by: Shaohua Li
---
block/blk-throttle.c | 187
iops cross high limit, we can upgrade queue state. The other case is
children has higher high limit than parent. Children's high limit is
meaningless. As long as parent's bps/iops cross high limit, we can
upgrade queue state.
Signed-off-by: Shaohua Li
---
b
roup sleep time not too big wouldn't change cgroup
bps/iops, but could make it wakeup more frequently, which isn't a big
issue because throtl_slice * 8 is already quite big.
Signed-off-by: Shaohua Li
---
block/blk-throttle.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/block/b
bandwidth, but that's
something we pay for sharing.
Note this doesn't completely avoid cgroup running under its high limit.
The best way to guarantee cgroup doesn't run under its limit is to set
max limit. For example, if we set cg1 max limit to 40, cg2 will never
run under its high
idle
cgroup is hard. This patch handles a simple case, a cgroup doesn't
dispatch any IO. We ignore such cgroup's limit, so other cgroups can use
the bandwidth.
Signed-off-by: Shaohua Li
---
block/blk-throttle.c | 17 -
1 file changed, 16 insertions(+), 1 deletion(-
Add high limit for cgroup and corresponding cgroup interface.
Signed-off-by: Shaohua Li
---
block/blk-throttle.c | 139 +++
1 file changed, 107 insertions(+), 32 deletions(-)
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 964b713
Hi,
On Tue, Oct 04, 2016 at 09:28:05AM -0400, Vivek Goyal wrote:
> On Mon, Oct 03, 2016 at 02:20:19PM -0700, Shaohua Li wrote:
> > Hi,
> >
> > The background is we don't have an ioscheduler for blk-mq yet, so we can't
> > prioritize processes/cgroups.
>
On Tue, Oct 04, 2016 at 07:01:39PM +0200, Paolo Valente wrote:
>
> > On 04 Oct 2016, at 18:27, Tejun Heo wrote:
> >
> > Hello,
> >
> > On Tue, Oct 04, 2016 at 06:22:28PM +0200, Paolo Valente wrote:
> >> Could you please elaborate more on this point? BFQ uses sectors
> >>
On Tue, Oct 04, 2016 at 07:43:48PM +0200, Paolo Valente wrote:
>
> > On 04 Oct 2016, at 19:28, Shaohua Li wrote:
> >
> > On Tue, Oct 04, 2016 at 07:01:39PM +0200, Paolo Valente wrote:
> >>
> >>> On 04 Oct 2016, at
On Wed, Oct 05, 2016 at 10:49:46AM -0400, Tejun Heo wrote:
> Hello, Paolo.
>
> On Wed, Oct 05, 2016 at 02:37:00PM +0200, Paolo Valente wrote:
> > In this respect, for your generic, unpredictable scenario to make
> > sense, there must exist at least one real system that meets the
> > requirements o
On Wed, Oct 05, 2016 at 11:30:53AM -0700, Shaohua Li wrote:
> On Wed, Oct 05, 2016 at 10:49:46AM -0400, Tejun Heo wrote:
> > Hello, Paolo.
> >
> > On Wed, Oct 05, 2016 at 02:37:00PM +0200, Paolo Valente wrote:
> > > In this respect, for your generic, unpredictabl
On Wed, Oct 05, 2016 at 09:57:22PM +0200, Paolo Valente wrote:
>
> > On 05 Oct 2016, at 21:08, Shaohua Li wrote:
> >
> > On Wed, Oct 05, 2016 at 11:30:53AM -0700, Shaohua Li wrote:
> >> On Wed, Oct 05, 2016 at 10:49:46AM -0400, Tejun
On Wed, Oct 05, 2016 at 09:47:19PM +0200, Paolo Valente wrote:
>
> > On 05 Oct 2016, at 20:30, Shaohua Li wrote:
> >
> > On Wed, Oct 05, 2016 at 10:49:46AM -0400, Tejun Heo wrote:
> >> Hello, Paolo.
> >>
> >> On Wed, Oct 0
On Fri, Aug 12, 2016 at 06:03:18PM -0700, Gayatri Kammela wrote:
> This is the version 2 patch series for adding AVX512 optimized gen_syndrome,
> xor_syndrome and recovery functions.
>
> Optimization of RAID6 using AVX512 instructions should improve the
> RAID6 performance. These patches are tested
Commit-ID: ef28faf837aba5b80d08a3d957e365be972f222b
Gitweb: http://git.kernel.org/tip/ef28faf837aba5b80d08a3d957e365be972f222b
Author: Shaohua Li
AuthorDate: Tue, 8 Apr 2014 15:58:09 +0800
Committer: Ingo Molnar
CommitDate: Mon, 14 Apr 2014 13:34:50 +0200
x86/mm: In the PTE swapout
Commit-ID: b13b1d2d8692b437203de7a404c6b809d2cc4d99
Gitweb: http://git.kernel.org/tip/b13b1d2d8692b437203de7a404c6b809d2cc4d99
Author: Shaohua Li
AuthorDate: Tue, 8 Apr 2014 15:58:09 +0800
Committer: Ingo Molnar
CommitDate: Wed, 16 Apr 2014 08:57:08 +0200
x86/mm: In the PTE swapout
Commit-ID: 72f669c0086febc92ce7390125722c4c0ec5
Gitweb: http://git.kernel.org/tip/72f669c0086febc92ce7390125722c4c0ec5
Author: Shaohua Li
AuthorDate: Thu, 5 Feb 2015 15:55:31 -0800
Committer: Ingo Molnar
CommitDate: Wed, 18 Feb 2015 17:01:44 +0100
perf: Update shadow timestamp
Commit-ID: 6a694a607a97d58c042fb7fbd60ef1caea26950c
Gitweb: http://git.kernel.org/tip/6a694a607a97d58c042fb7fbd60ef1caea26950c
Author: Shaohua Li
AuthorDate: Thu, 5 Feb 2015 15:55:32 -0800
Committer: Ingo Molnar
CommitDate: Wed, 18 Feb 2015 17:01:45 +0100
perf: Update userspace page
Commit-ID: 5d7c631d926b59aa16f3c56eaeb83f1036c81dc7
Gitweb: http://git.kernel.org/tip/5d7c631d926b59aa16f3c56eaeb83f1036c81dc7
Author: Shaohua Li
AuthorDate: Thu, 30 Jul 2015 16:24:43 -0700
Committer: Thomas Gleixner
CommitDate: Mon, 14 Sep 2015 18:29:59 +0200
x86/apic: Serialize LVTT
Commit-ID: 76ae054c69a745ded388fc4ae70422d74c5bc77d
Gitweb: http://git.kernel.org/tip/76ae054c69a745ded388fc4ae70422d74c5bc77d
Author: Shaohua Li
AuthorDate: Fri, 2 Dec 2016 14:21:06 -0800
Committer: Thomas Gleixner
CommitDate: Fri, 9 Dec 2016 14:12:18 +0100
x86/intel_rdt: Implement
Commit-ID: 7bff0af51012500718971f9cc3485f67252353eb
Gitweb: http://git.kernel.org/tip/7bff0af51012500718971f9cc3485f67252353eb
Author: Shaohua Li
AuthorDate: Thu, 3 Nov 2016 14:09:05 -0700
Committer: Thomas Gleixner
CommitDate: Mon, 7 Nov 2016 12:20:52 +0100
x86/intel_rdt: Propagate
Commit-ID: 53a114a69095eeb0e15d04c2a82358b3192f88df
Gitweb: http://git.kernel.org/tip/53a114a69095eeb0e15d04c2a82358b3192f88df
Author: Shaohua Li
AuthorDate: Thu, 3 Nov 2016 14:09:06 -0700
Committer: Thomas Gleixner
CommitDate: Mon, 7 Nov 2016 12:20:52 +0100
x86/intel_rdt: Export the