[PATCH] x86/tboot: add an option to disable iommu force on

2017-04-25 Thread Shaohua Li
t we need to disable TBOOT PMR registers. For system without the boot option, nothing is changed. Signed-off-by: Shaohua Li --- Documentation/admin-guide/kernel-parameters.txt | 5 + arch/x86/kernel/tboot.c | 3 +++ drivers/iommu/intel-iommu.c

[PATCH V2] x86/tboot: add an option to disable iommu force on

2017-04-26 Thread Shaohua Li
t we need to disable TBOOT PMR registers. For system without the boot option, nothing is changed. Signed-off-by: Shaohua Li --- Documentation/admin-guide/kernel-parameters.txt | 9 + arch/x86/kernel/tboot.c | 3 +++ drivers/iommu/intel-iommu.c

Re: [PATCH V2] x86/tboot: add an option to disable iommu force on

2017-04-27 Thread Shaohua Li
On Thu, Apr 27, 2017 at 10:42:07AM +0200, Joerg Roedel wrote: > On Thu, Apr 27, 2017 at 08:51:42AM +0200, Ingo Molnar wrote: > > > + tboot_noforce [Default Off] > > > + Do not force the Intel IOMMU enabled under tboot. > > > + By default, tboot will force Int

Re: [PATCH V2] x86/tboot: add an option to disable iommu force on

2017-04-27 Thread Shaohua Li
On Thu, Apr 27, 2017 at 05:18:55PM +0200, Joerg Roedel wrote: > On Thu, Apr 27, 2017 at 07:49:02AM -0700, Shaohua Li wrote: > > This is exactly the usage for us. And please note, not everybody should > > sacrifice the DMA security. It is only required when the pcie device

Re: [RFC] x86/tboot: add an option to disable iommu force on

2017-04-03 Thread Shaohua Li
On Wed, Mar 22, 2017 at 07:50:55AM -0400, Shaohua Li wrote: > On Wed, Mar 22, 2017 at 11:49:00AM +0100, Joerg Roedel wrote: > > Hi Shaohua, > > > > On Tue, Mar 21, 2017 at 11:37:51AM -0700, Shaohua Li wrote: > > > IOMMU harms performance significantly wh

Re: [RFC] x86/tboot: add an option to disable iommu force on

2017-04-09 Thread Shaohua Li
ar_disable=1. The tboot code (tboot_force_iommu) runs later and force dmar_disabled = 0. Thanks, Shaohua > Thanks, > -ning > > -Original Message- > From: Joerg Roedel [mailto:jroe...@suse.de] > Sent: Friday, April 07, 2017 3:09 AM > To: Shaohua Li > Cc: linux-kern

Re: [RFC] x86/tboot: add an option to disable iommu force on

2017-04-24 Thread Shaohua Li
ce hungry tboot users, so > long as the users are aware of the security implication behind of this option. > > Thanks, > -ning > > -----Original Message- > From: Shaohua Li [mailto:s...@fb.com] > Sent: Sunday, April 09, 2017 9:31 PM > To: Sun, Ning > Cc: Joe

[PATCH V4 03/12] kernfs: add an API to get kernfs node from inode number

2017-06-28 Thread Shaohua Li
From: Shaohua Li Add an API to get kernfs node from inode number. We will need this to implement exportfs operations. This API will be used in blktrace too later, so it should be as fast as possible. To make the API lock free, kernfs node is freed in RCU context. And we depend on kernfs_node

[PATCH V4 09/12] block: always attach cgroup info into bio

2017-06-28 Thread Shaohua Li
From: Shaohua Li blkcg_bio_issue_check() already gets blkcg for a BIO. bio_associate_blkcg() uses a percpu refcounter, so it's a very cheap operation. There is no reason not to attach the cgroup info to the bio at blkcg_bio_issue_check. This also makes blktrace output the correct cgroup in

[PATCH V4 02/12] kernfs: implement i_generation

2017-06-28 Thread Shaohua Li
From: Shaohua Li Set i_generation for kernfs inode. This is required to implement exportfs operations. The generation is 32-bit, so it's possible the generation wraps up and we find stale files. To reduce the possibility, we don't reuse inode number immediately. When the inode number

[PATCH V4 00/12] blktrace: output cgroup info

2017-06-28 Thread Shaohua Li
From: Shaohua Li Hi, Currently blktrace isn't cgroup aware. blktrace prints out task name of current context, but the task of current context isn't always in the cgroup where the BIO comes from. We can't use task name to find out IO cgroup. For example, Writeback BIOs always com

[PATCH V4 04/12] kernfs: don't set dentry->d_fsdata

2017-06-28 Thread Shaohua Li
From: Shaohua Li When working on adding exportfs operations in kernfs, I found it's hard to initialize dentry->d_fsdata in the exportfs operations. It looks like there is no way to do it without a race condition. Looking at the kernfs code closely, there is no point in setting dentry->d_fsdata. inode

[PATCH V4 12/12] block: use standard blktrace API to output cgroup info for debug notes

2017-06-28 Thread Shaohua Li
From: Shaohua Li Currently cfq/bfq/blk-throttle output cgroup info in trace in their own way. Now we have standard blktrace API for this, so convert them to use it. Note, this changes the behavior a little bit. cgroup info isn't output by default, we only do this with 'blk_cgro

[PATCH V4 11/12] blktrace: add an option to allow displying cgroup path

2017-06-28 Thread Shaohua Li
From: Shaohua Li By default we output cgroup id in blktrace. This adds an option to display cgroup path. Since getting the cgroup path is a relatively heavy operation, we don't enable it by default. With the option enabled, blktrace will output something like this: dd-1353 [007] d..2 293.0

[PATCH V4 08/12] blktrace: export cgroup info in trace

2017-06-28 Thread Shaohua Li
From: Shaohua Li Currently blktrace isn't cgroup aware. blktrace prints out task name of current context, but the task of current context isn't always in the cgroup where the BIO comes from. We can't use task name to find out IO cgroup. For example, Writeback BIOs always com

[PATCH V4 10/12] block: call __bio_free in bio_endio

2017-06-28 Thread Shaohua Li
From: Shaohua Li bio_free isn't a good place to free cgroup/integrity info. There are a lot of cases where a bio is allocated in a special way (for example, on the stack) and bio_put, hence bio_free, is never called, so we are leaking memory. This patch moves the free to bio endio, which should be c

[PATCH V4 06/12] kernfs: add exportfs operations

2017-06-28 Thread Shaohua Li
From: Shaohua Li Now we have the facilities to implement exportfs operations. The idea is cgroup can export the fhandle info to userspace, then userspace uses fhandle to find the cgroup name. Another example is userspace can get fhandle for a cgroup and BPF uses the fhandle to filter info for

[PATCH V4 07/12] cgroup: export fhandle info for a cgroup

2017-06-28 Thread Shaohua Li
From: Shaohua Li Add an API to export cgroup fhandle info. We don't export a full 'struct file_handle', since it contains info we don't need. Specifically, a cgroup is always a directory, so we don't need a 'FILEID_INO32_GEN_PARENT' type fhandle, we only need export the inod

[PATCH V4 05/12] kernfs: introduce kernfs_node_id

2017-06-28 Thread Shaohua Li
From: Shaohua Li inode number and generation can identify a kernfs node. We are going to export the identification by exportfs operations, so put ino and generation into a separate structure. It's convenient when later patches use the identification. Signed-off-by: Shaohua Li --- fs/k

[PATCH V4 01/12] kernfs: use idr instead of ida to manage inode number

2017-06-28 Thread Shaohua Li
From: Shaohua Li kernfs uses ida to manage inode number. The problem is we can't get kernfs_node from inode number with ida. Switching to use idr, next patch will add an API to get kernfs_node from inode number. Acked-by: Tejun Heo Signed-off-by: Shaohua Li --- fs/kernfs/dir.c

Re: [PATCH V4 00/12] blktrace: output cgroup info

2017-06-28 Thread Shaohua Li
On Wed, Jun 28, 2017 at 10:43:48AM -0600, Jens Axboe wrote: > On 06/28/2017 10:29 AM, Shaohua Li wrote: > > From: Shaohua Li > > > > Hi, > > > > Currently blktrace isn't cgroup aware. blktrace prints out task name of > > current > > context, b

[GIT PULL] MD update for 4.12-rc2

2017-05-18 Thread Shaohua Li
) Artur Paszkiewicz (1): md: don't return -EAGAIN in md_allow_write for external metadata arrays Julia Cartwright (1): md/raid5: make use of spin_lock_irq over local_irq_disable + spin_lock Shaohua Li (2): md/md0: optimize raid0 discard han

Re: [PATCH 13/23] md: namespace private helper names

2017-05-18 Thread Shaohua Li
; > Signed-off-by: Amir Goldstein > Signed-off-by: Christoph Hellwig Reviewed-by: Shaohua Li

Re: [PATCH 5/5] kernfs: add exportfs operations

2017-05-24 Thread Shaohua Li
On Wed, May 24, 2017 at 01:46:04PM -0400, Tejun Heo wrote: > Hello, Christoph. > > On Wed, May 24, 2017 at 10:41:38AM -0700, Christoph Hellwig wrote: > > > But how do you map that back to the cgroup without scanning the cgroup > > > hierarchy? > > > > I'm totally lost on why you would do that. S

[PATCH V5 11/11] block: use standard blktrace API to output cgroup info for debug notes

2017-07-12 Thread Shaohua Li
From: Shaohua Li Currently cfq/bfq/blk-throttle output cgroup info in trace in their own way. Now we have standard blktrace API for this, so convert them to use it. Note, this changes the behavior a little bit. cgroup info isn't output by default, we only do this with 'blk_cgro

[PATCH V5 07/11] cgroup: export fhandle info for a cgroup

2017-07-12 Thread Shaohua Li
From: Shaohua Li Add an API to export cgroup fhandle info. We don't export a full 'struct file_handle', since it contains info we don't need. Specifically, a cgroup is always a directory, so we don't need a 'FILEID_INO32_GEN_PARENT' type fhandle, we only need export the inod

[PATCH V5 08/11] blktrace: export cgroup info in trace

2017-07-12 Thread Shaohua Li
From: Shaohua Li Currently blktrace isn't cgroup aware. blktrace prints out task name of current context, but the task of current context isn't always in the cgroup where the BIO comes from. We can't use task name to find out IO cgroup. For example, Writeback BIOs always com

[PATCH V5 09/11] block: always attach cgroup info into bio

2017-07-12 Thread Shaohua Li
From: Shaohua Li blkcg_bio_issue_check() already gets blkcg for a BIO. bio_associate_blkcg() uses a percpu refcounter, so it's a very cheap operation. There is no reason not to attach the cgroup info to the bio at blkcg_bio_issue_check. This also makes blktrace output the correct cgroup in

[PATCH V5 06/11] kernfs: add exportfs operations

2017-07-12 Thread Shaohua Li
From: Shaohua Li Now we have the facilities to implement exportfs operations. The idea is cgroup can export the fhandle info to userspace, then userspace uses fhandle to find the cgroup name. Another example is userspace can get fhandle for a cgroup and BPF uses the fhandle to filter info for

[PATCH V5 10/11] blktrace: add an option to allow displaying cgroup path

2017-07-12 Thread Shaohua Li
From: Shaohua Li By default we output cgroup id in blktrace. This adds an option to display cgroup path. Since getting the cgroup path is a relatively heavy operation, we don't enable it by default. With the option enabled, blktrace will output something like this: dd-1353 [007] d..2 293.0

[PATCH V5 05/11] kernfs: introduce kernfs_node_id

2017-07-12 Thread Shaohua Li
From: Shaohua Li inode number and generation can identify a kernfs node. We are going to export the identification by exportfs operations, so put ino and generation into a separate structure. It's convenient when later patches use the identification. Acked-by: Greg Kroah-Hartman Signed-o

[PATCH V5 02/11] kernfs: implement i_generation

2017-07-12 Thread Shaohua Li
From: Shaohua Li Set i_generation for kernfs inode. This is required to implement exportfs operations. The generation is 32-bit, so it's possible the generation wraps up and we find stale files. To reduce the possibility, we don't reuse inode number immediately. When the inode number

[PATCH V5 00/11] blktrace: output cgroup info

2017-07-12 Thread Shaohua Li
From: Shaohua Li Hi, Currently blktrace isn't cgroup aware. blktrace prints out task name of current context, but the task of current context isn't always in the cgroup where the BIO comes from. We can't use task name to find out IO cgroup. For example, Writeback BIOs always com

[PATCH V5 03/11] kernfs: add an API to get kernfs node from inode number

2017-07-12 Thread Shaohua Li
From: Shaohua Li Add an API to get kernfs node from inode number. We will need this to implement exportfs operations. This API will be used in blktrace too later, so it should be as fast as possible. To make the API lock free, kernfs node is freed in RCU context. And we depend on kernfs_node

[PATCH V5 04/11] kernfs: don't set dentry->d_fsdata

2017-07-12 Thread Shaohua Li
From: Shaohua Li When working on adding exportfs operations in kernfs, I found it's hard to initialize dentry->d_fsdata in the exportfs operations. It looks like there is no way to do it without a race condition. Looking at the kernfs code closely, there is no point in setting dentry->d_fsdata. inode

[PATCH V5 01/11] kernfs: use idr instead of ida to manage inode number

2017-07-12 Thread Shaohua Li
From: Shaohua Li kernfs uses ida to manage inode number. The problem is we can't get kernfs_node from inode number with ida. Switching to use idr, next patch will add an API to get kernfs_node from inode number. Acked-by: Tejun Heo Acked-by: Greg Kroah-Hartman Signed-off-by: Shaoh

Re: [PATCH] fs: System memory leak when running HTX with T10 DIF enabled

2017-06-28 Thread Shaohua Li
On Wed, Jun 28, 2017 at 12:57:50PM -0600, Jens Axboe wrote: > On 06/28/2017 12:52 PM, Christoph Hellwig wrote: > > On Wed, Jun 28, 2017 at 12:44:00PM -0600, Jens Axboe wrote: > >> On 06/28/2017 12:38 PM, Christoph Hellwig wrote: > >>> On Wed, Jun 28, 2017 at 12:34:15PM -0600, Jens Axboe wrote: > >>

Re: [PATCH V4 10/12] block: call __bio_free in bio_endio

2017-06-28 Thread Shaohua Li
On Wed, Jun 28, 2017 at 11:29:08PM +0200, Christoph Hellwig wrote: > On Wed, Jun 28, 2017 at 09:30:00AM -0700, Shaohua Li wrote: > > From: Shaohua Li > > > > bio_free isn't a good place to free cgroup/integrity info. There are a > > lot of cases bio is allocated

Re: [PATCH V4 10/12] block: call __bio_free in bio_endio

2017-06-29 Thread Shaohua Li
On Thu, Jun 29, 2017 at 07:15:52PM +0200, Christoph Hellwig wrote: > On Wed, Jun 28, 2017 at 02:42:49PM -0700, Shaohua Li wrote: > > > bio_integrity_endio -> bio_integrity_verify_fn -> bio_integrity_process > > > access the integrity data, so I don't think this work

Re: [PATCH V4 00/12] blktrace: output cgroup info

2017-06-29 Thread Shaohua Li
On Wed, Jun 28, 2017 at 02:57:38PM -0600, Jens Axboe wrote: > On 06/28/2017 12:11 PM, Tejun Heo wrote: > > Hello, > > > > On Wed, Jun 28, 2017 at 10:54:28AM -0600, Jens Axboe wrote: > Series looks fine to me. I don't know how you want to split or funnel it, > since it touches multiple di

Re: [PATCH v2 11/51] md: raid1: initialize bvec table via bio_add_page()

2017-06-29 Thread Shaohua Li
he standard way instead of writing the > > > table directly. Otherwise it won't work any more once > > > multipage bvec is enabled. > > > > > > Cc: Shaohua Li > > > Cc: linux-r...@vger.kernel.org > > > Signed-off-by: Ming Lei > >

[GIT PULL] MD update for 4.13

2017-07-07 Thread Shaohua Li
between mddev_suspend() and md_write_start() md: use a separate bio_set for synchronous IO. Shaohua Li (2): MD: fix a null dereference MD: fix sleep in atomic drivers/md/faulty.c| 5 +++-- drivers/md/linear.c| 7 --- drivers/md/md.c| 47 +

[GIT PULL] MD update for 4.13-rc2

2017-07-18 Thread Shaohua Li
Hi, Please pull 3 MD fixes: - raid5-ppl fix by Artur. This one is introduced in this release cycle. - raid5 reshape fix by Xiao. This is an old bug and will be added to stable. - Bitmap fix by Guoqing. Thanks, Shaohua The following changes since commit af3c8d98508d37541d4bf57f13a984a7f73a328c:

Re: [PATCH] md: raid5: avoid string overflow warning

2018-02-21 Thread Shaohua Li
On Tue, Feb 20, 2018 at 02:09:11PM +0100, Arnd Bergmann wrote: > gcc warns about a possible overflow of the kmem_cache string, when adding > four characters to a string of the same length: > > drivers/md/raid5.c: In function 'setup_conf': > drivers/md/raid5.c:2207:34: error: '-alt' directive writi

Re: [PATCH] md/raid1: Fix trailing semicolon

2018-02-17 Thread Shaohua Li
On Wed, Jan 17, 2018 at 01:38:02PM +, Luis de Bethencourt wrote: > The trailing semicolon is an empty statement that does no operation. > Removing it since it doesn't do anything. > > Signed-off-by: Luis de Bethencourt > --- > > Hi, > > After fixing the same thing in drivers/staging/rtl8723

Re: [PATCH v2] md-multipath: Use seq_putc() in multipath_status()

2018-02-17 Thread Shaohua Li
On Sat, Jan 13, 2018 at 09:55:08AM +0100, SF Markus Elfring wrote: > From: Markus Elfring > Date: Sat, 13 Jan 2018 09:49:03 +0100 > > A single character (closing square bracket) should be put into a sequence. > Thus use the corresponding function "seq_putc". > > This issue was detected by using

Re: [PATCH] md: fix md_write_start() deadlock w/o metadata devices

2018-02-18 Thread Shaohua Li
On Fri, Feb 02, 2018 at 11:13:19PM +0100, Heinz Mauelshagen wrote: > If no metadata devices are configured on raid1/4/5/6/10 > (e.g. via dm-raid), md_write_start() unconditionally waits > for superblocks to be written thus deadlocking. > > Fix introduces mddev->has_superblocks bool, defines it in

Re: [RFC PATCH] blk-throttle: add burst allowance.

2017-11-16 Thread Shaohua Li
On Tue, Nov 14, 2017 at 03:10:22PM -0800, Khazhismel Kumykov wrote: > Allows configuration additional bytes or ios before a throttle is > triggered. > > This allows implementation of a bucket style rate-limit/throttle on a > block device. Previously, bursting to a device was limited to allowance >

Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")

2017-12-19 Thread Shaohua Li
On Tue, Dec 19, 2017 at 10:17:43AM -0600, Bruno Wolff III wrote: > On Sun, Dec 17, 2017 at 21:43:50 +0800, > weiping zhang wrote: > > Hi, thanks for testing, I think you first reproduce this issue(got WARNING > > at device_add_disk) by your own build, then add my debug patch. > > The problem is

[PATCH V2 4/4] block/loop: make loop cgroup aware

2017-09-13 Thread Shaohua Li
From: Shaohua Li loop block device handles IO in a separate thread. The actual IO dispatched isn't cloned from the IO loop device received, so the dispatched IO loses the cgroup context. I'm ignoring buffer IO case now, which is quite complicated. Making the loop thread aware cgro

[PATCH V2 0/4] block: make loop block device cgroup aware

2017-09-13 Thread Shaohua Li
From: Shaohua Li Hi, The IO dispatched to the underlying disk by the loop block device isn't cloned from the original bio, so the IO loses the cgroup information of the original bio. These IOs escape cgroup control. The patches try to address this issue. The idea is quite generic, but we currently only

[PATCH V2 3/4] block: make blkcg aware of kthread stored original cgroup info

2017-09-13 Thread Shaohua Li
From: Shaohua Li bio_blkcg is the only API to get cgroup info for a bio right now. If bio_blkcg finds current task is a kthread and has original blkcg associated, it will use the css instead of associating the bio to current task. This makes it possible that kthread dispatches bios on behalf of

[PATCH V2 1/4] kthread: add a mechanism to store cgroup info

2017-09-13 Thread Shaohua Li
From: Shaohua Li kthread usually runs jobs on behalf of other threads. The jobs should be charged to cgroup of original threads. But the jobs run in a kthread, where we lose the cgroup context of original threads. The patch adds a mechanism to record cgroup info of original threads in kthread

[PATCH V2 2/4] blkcg: delete unused APIs

2017-09-13 Thread Shaohua Li
From: Shaohua Li Nobody uses the APIs right now. Signed-off-by: Shaohua Li --- block/bio.c| 31 --- include/linux/bio.h| 2 -- include/linux/blk-cgroup.h | 12 3 files changed, 45 deletions(-) diff --git a/block/bio.c b/block

Re: [PATCH V2 1/4] kthread: add a mechanism to store cgroup info

2017-09-13 Thread Shaohua Li
On Wed, Sep 13, 2017 at 02:38:20PM -0700, Tejun Heo wrote: > Hello, > > On Wed, Sep 13, 2017 at 02:01:26PM -0700, Shaohua Li wrote: > > diff --git a/kernel/kthread.c b/kernel/kthread.c > > index 26db528..3107eee 100644 > > --- a/kernel/kthread.c > > +++ b/ke

[PATCH V3 3/4] block: make blkcg aware of kthread stored original cgroup info

2017-09-14 Thread Shaohua Li
From: Shaohua Li bio_blkcg is the only API to get cgroup info for a bio right now. If bio_blkcg finds current task is a kthread and has original blkcg associated, it will use the css instead of associating the bio to current task. This makes it possible that kthread dispatches bios on behalf of

[PATCH V3 4/4] block/loop: make loop cgroup aware

2017-09-14 Thread Shaohua Li
From: Shaohua Li loop block device handles IO in a separate thread. The actual IO dispatched isn't cloned from the IO loop device received, so the dispatched IO loses the cgroup context. I'm ignoring buffer IO case now, which is quite complicated. Making the loop thread aware cgro

[PATCH V3 2/4] blkcg: delete unused APIs

2017-09-14 Thread Shaohua Li
From: Shaohua Li Nobody uses the APIs right now. Acked-by: Tejun Heo Signed-off-by: Shaohua Li --- block/bio.c| 31 --- include/linux/bio.h| 2 -- include/linux/blk-cgroup.h | 12 3 files changed, 45 deletions(-) diff --git a

[PATCH V3 0/4] block: make loop block device cgroup aware

2017-09-14 Thread Shaohua Li
From: Shaohua Li Hi, The IO dispatched to the underlying disk by the loop block device isn't cloned from the original bio, so the IO loses the cgroup information of the original bio. These IOs escape cgroup control. The patches try to address this issue. The idea is quite generic, but we currently only

[PATCH V3 1/4] kthread: add a mechanism to store cgroup info

2017-09-14 Thread Shaohua Li
From: Shaohua Li kthread usually runs jobs on behalf of other threads. The jobs should be charged to cgroup of original threads. But the jobs run in a kthread, where we lose the cgroup context of original threads. The patch adds a mechanism to record cgroup info of original threads in kthread

[PATCH] kthread_worker: don't hog the cpu

2017-08-25 Thread Shaohua Li
If the worker thread continues getting work, it will hog the cpu and rcu stall complains. Make it a good citizen. This is triggered in a loop block device test. Signed-off-by: Shaohua Li --- kernel/kthread.c | 1 + 1 file changed, 1 insertion(+) diff --git a/kernel/kthread.c b/kernel/kthread.c

[GIT PULL] MD update for 4.14-rc2

2017-09-19 Thread Shaohua Li
) Dennis Yang (1): md/raid5: preserve STRIPE_ON_UNPLUG_LIST in break_stripe_batch_list Shaohua Li (1): md/raid5: fix a race condition in stripe batch drivers/md/raid5.c | 13 ++--- 1 file changed, 10 insertions(+), 3 deletions(-)

Re: [PATCH 1/3] kthread: add a mechanism to store cgroup info

2017-09-08 Thread Shaohua Li
On Fri, Sep 08, 2017 at 07:35:37AM -0700, Tejun Heo wrote: > Hello, > > On Wed, Sep 06, 2017 at 07:00:51PM -0700, Shaohua Li wrote: > > +#ifdef CONFIG_CGROUPS > > +void kthread_set_orig_css(struct cgroup_subsys_state *css); > > +struct cgroup_subsys_state *kthread_ge

Re: [PATCH 3/3] block/loop: make loop cgroup aware

2017-09-08 Thread Shaohua Li
On Fri, Sep 08, 2017 at 07:48:09AM -0700, Tejun Heo wrote: > Hello, > > On Wed, Sep 06, 2017 at 07:00:53PM -0700, Shaohua Li wrote: > > diff --git a/drivers/block/loop.c b/drivers/block/loop.c > > index 9d4545f..9850b27 100644 > > --- a/drivers/block/loop.c >

Re: [PATCH] md/raid5: preserve STRIPE_ON_UNPLUG_LIST in break_stripe_batch_list

2017-08-31 Thread Shaohua Li
On Mon, Aug 28, 2017 at 08:01:59PM +0800, Dennis Yang wrote: > break_stripe_batch_list() did not preserve STRIPE_ON_UNPLUG_LIST which is > set when a stripe_head gets queued to the stripe_head list maintained by > raid5_plug_cb and waiting for releasing after blk_unplug(). > > In release_stripe_pl

Re: [PATCH] md/raid5: preserve STRIPE_ON_UNPLUG_LIST in break_stripe_batch_list

2017-09-05 Thread Shaohua Li
On Fri, Sep 01, 2017 at 05:26:48PM +0800, Dennis Yang wrote: > >On Mon, Aug 28, 2017 at 08:01:59PM +0800, Dennis Yang wrote: > >> break_stripe_batch_list() did not preserve STRIPE_ON_UNPLUG_LIST which is > >> set when a stripe_head gets queued to the stripe_head list maintained by > >> raid5_plug_c

Re: [PATCH V6 00/18] blk-throttle: add .low limit

2017-09-05 Thread Shaohua Li
On Thu, Aug 31, 2017 at 09:24:23AM +0200, Paolo VALENTE wrote: > > > Il giorno 15 gen 2017, alle ore 04:42, Shaohua Li ha scritto: > > > > Hi, > > > > cgroup still lacks a good iocontroller. CFQ works well for hard disk, but > > not > > much for

Re: [PATCH v2] md/raid5: preserve STRIPE_ON_UNPLUG_LIST in break_stripe_batch_list

2017-09-05 Thread Shaohua Li
On Wed, Sep 06, 2017 at 11:02:35AM +0800, Dennis Yang wrote: > In release_stripe_plug(), if a stripe_head has its STRIPE_ON_UNPLUG_LIST > set, it indicates that this stripe_head is already in the raid5_plug_cb > list and release_stripe() would be called instead to drop a reference > count. Otherwis

Re: [PATCH V6 00/18] blk-throttle: add .low limit

2017-09-06 Thread Shaohua Li
On Wed, Sep 06, 2017 at 09:12:20AM +0800, Joseph Qi wrote: > Hi Shaohua, > > On 17/9/6 05:02, Shaohua Li wrote: > > On Thu, Aug 31, 2017 at 09:24:23AM +0200, Paolo VALENTE wrote: > >> > >>> Il giorno 15 gen 2017, alle ore 04:42, Shaohua Li ha > >>

[GIT PULL] MD update for 4.14-rc1

2017-09-06 Thread Shaohua Li
o 512 bits, not bytes Guoqing Jiang (1): raid5: remove raid5_build_block NeilBrown (1): md/bitmap: disable bitmap_resize for file-backed bitmaps. Pawel Baldysiak (2): md: Runtime support for multiple ppls raid5-ppl: Recovery support for multiple partial parity logs Shaohua

[PATCH 2/3] block: make blkcg aware of kthread stored original cgroup info

2017-09-06 Thread Shaohua Li
From: Shaohua Li Several blkcg APIs are deprecated. After removing them, bio_blkcg is the only API to get cgroup info for a bio. If bio_blkcg finds current task is a kthread and has original css recorded, it will use the css instead of associating the bio to current task. Signed-off-by: Shaohua

[PATCH 3/3] block/loop: make loop cgroup aware

2017-09-06 Thread Shaohua Li
From: Shaohua Li loop block device handles IO in a separate thread. The actual IO dispatched isn't cloned from the IO loop device received, so the dispatched IO loses the cgroup context. I'm ignoring buffer IO case now, which is quite complicated. Making the loop thread aware cgro

[PATCH 0/3] block: make loop block device cgroup aware

2017-09-06 Thread Shaohua Li
From: Shaohua Li Hi, The IO dispatched to the underlying disk by the loop block device isn't cloned from the original bio, so the IO loses the cgroup information of the original bio. These IOs escape cgroup control. The patches try to address this issue. The idea is quite generic, but we currently only

[PATCH 1/3] kthread: add a mechanism to store cgroup info

2017-09-06 Thread Shaohua Li
From: Shaohua Li kthread usually runs jobs on behalf of other threads. The jobs should be charged to cgroup of original threads. But the jobs run in a kthread, where we lose the cgroup context of original threads. The patch adds a mechanism to record cgroup info of original threads in kthread

[PATCH] FS: fix stack-out-of-bounds wanning

2017-05-08 Thread Shaohua Li
Ix: 5ecda13(generic_file_read_iter(): make use of iov_iter_revert()) Cc: Al Viro Signed-off-by: Shaohua Li --- mm/filemap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/filemap.c b/mm/filemap.c index 681da61..df227638 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -205

[PATCH 0/5] kernfs: add exportfs operations

2017-05-22 Thread Shaohua Li
best to identify a cgroup. So this is what this series try to do. Thanks, Shaohua Shaohua Li (5): kernfs: implement i_generation kernfs: use idr instead of ida to manage inode number kernfs: add an API to get kernfs node from inode number kernfs: don't set dentry->d_fsdata

[PATCH 4/5] kernfs: don't set dentry->d_fsdata

2017-05-22 Thread Shaohua Li
eady points to kernfs_node, and we can get inode from a dentry. So this patch just delete the d_fsdata usage. Signed-off-by: Shaohua Li --- fs/kernfs/dir.c | 22 +++--- fs/kernfs/file.c| 6 +++--- fs/kernfs/inode.c | 6 +++--- fs/kernfs/kernfs-i

[PATCH 3/5] kernfs: add an API to get kernfs node from inode number

2017-05-22 Thread Shaohua Li
Add an API to get kernfs node from inode number. We will need this to implement exportfs operations. To make the API lock free, kernfs node is freed in RCU context. And we depend on kernfs_node count/ino number to filter stale kernfs nodes. Signed-off-by: Shaohua Li --- fs/kernfs/dir.c

[PATCH 2/5] kernfs: use idr instead of ida to manage inode number

2017-05-22 Thread Shaohua Li
kernfs uses ida to manage inode number. The problem is we can't get kernfs_node from inode number with ida. Switching to use idr, next patch will add an API to get kernfs_node from inode number. Signed-off-by: Shaohua Li --- fs/kernfs/dir.c| 17 - include/linux/ker

[PATCH 1/5] kernfs: implement i_generation

2017-05-22 Thread Shaohua Li
Set i_generation for kernfs inode. This is required to implement exportfs operations. Signed-off-by: Shaohua Li --- fs/kernfs/dir.c| 2 ++ fs/kernfs/inode.c | 1 + include/linux/kernfs.h | 2 ++ 3 files changed, 5 insertions(+) diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c index

[PATCH 5/5] kernfs: add exportfs operations

2017-05-22 Thread Shaohua Li
Now we have the facilities to implement exportfs operations. Signed-off-by: Shaohua Li --- fs/kernfs/mount.c | 55 +++ 1 file changed, 55 insertions(+) diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c index 462a40c..5af88cc 100644 --- a/fs

Re: [PATCH 1/5] kernfs: implement i_generation

2017-05-23 Thread Shaohua Li
On Tue, May 23, 2017 at 12:41:12AM -0700, Christoph Hellwig wrote: > On Mon, May 22, 2017 at 03:53:05PM -0700, Shaohua Li wrote: > > Set i_generation for kernfs inode. This is required to implement exportfs > > operations. > > > > Signed-off-by: Shaohua Li > > --

Re: [PATCH 0/5] kernfs: add exportfs operations

2017-05-23 Thread Shaohua Li
On Tue, May 23, 2017 at 12:39:41AM -0700, Christoph Hellwig wrote: > On Mon, May 22, 2017 at 03:53:04PM -0700, Shaohua Li wrote: > > Hi, > > > > The goal isn't to export kernfs to NFS. The intention is to make tracing > > cgroup > > aware. To do this, tracin

[GIT PULL] MD update for 4.12-rc5

2017-06-09 Thread Shaohua Li
Hi, One bug fix from Neil Brown for MD. The bug is introduced in this cycle. Please pull! Thanks, Shaohua The following changes since commit 3c2993b8c6143d8a5793746a54eba8f86f95240f: Linux 4.12-rc4 (2017-06-04 16:47:43 -0700) are available in the git repository at: git://git.kernel.org/pub

Re: [GIT PULL] MD update for 4.12

2017-05-03 Thread Shaohua Li
On Wed, May 03, 2017 at 10:27:47AM -0700, Linus Torvalds wrote: > On Mon, May 1, 2017 at 3:50 PM, Shaohua Li wrote: > > > > Please pull MD update for 4.12. There are conflicts between MD tree and > > block > > tree, so I did a merge before the pull request. > >

[PATCH V2 06/12] kernfs: add exportfs operations

2017-06-14 Thread Shaohua Li
From: Shaohua Li Now we have the facilities to implement exportfs operations. The idea is cgroup can export the fhandle info to userspace, then userspace uses fhandle to find the cgroup name. Another example is userspace can get fhandle for a cgroup and BPF uses the fhandle to filter info for

[PATCH V2 11/12] blktrace: add an option to allow displying cgroup path

2017-06-14 Thread Shaohua Li
From: Shaohua Li By default we output cgroup id in blktrace. This adds an option to display cgroup path. Since getting the cgroup path is a relatively heavy operation, we don't enable it by default. With the option enabled, blktrace will output something like this: dd-1353 [007] d..2 293.0

[PATCH V2 00/12]blktrace: output cgroup info

2017-06-14 Thread Shaohua Li
From: Shaohua Li Hi, Currently blktrace isn't cgroup aware. blktrace prints out task name of current context, but the task of current context isn't always in the cgroup where the BIO comes from. We can't use task name to find out IO cgroup. For example, Writeback BIOs always com

[PATCH V2 10/12] block: call __bio_free in bio_endio

2017-06-14 Thread Shaohua Li
From: Shaohua Li bio_free isn't a good place to free cgroup/integrity info. There are a lot of cases where a bio is allocated in a special way (for example, on the stack) and bio_put, hence bio_free, is never called, so we are leaking memory. This patch moves the free to bio endio, which should be c

[PATCH V2 05/12] kernfs: introduce kernfs_node_id

2017-06-14 Thread Shaohua Li
From: Shaohua Li The inode number and generation together identify a kernfs node. We are going to export this identification through exportfs operations, so put ino and generation into a separate structure. This is convenient when later patches use the identification. Please note, I extend inode number

[PATCH V2 08/12] blktrace: export cgroup info in trace

2017-06-14 Thread Shaohua Li
From: Shaohua Li Currently blktrace isn't cgroup aware. blktrace prints out the task name of the current context, but the task of the current context isn't always in the cgroup where the BIO comes from. We can't use the task name to find out the IO cgroup. For example, Writeback BIOs always com

[PATCH V2 12/12] block: use standard blktrace API to output cgroup info for debug notes

2017-06-14 Thread Shaohua Li
From: Shaohua Li Currently cfq/bfq/blk-throttle output cgroup info in trace in their own way. Now we have a standard blktrace API for this, so convert them to use it. Note, this changes the behavior a little: cgroup info isn't output by default; we only do this with 'blk_cgro

[PATCH V2 01/12] kernfs: implement i_generation

2017-06-14 Thread Shaohua Li
From: Shaohua Li Set i_generation for the kernfs inode. This is required to implement exportfs operations. Note, the generation is 32-bit, so it's possible that the generation wraps around and we find stale files. The possibility is low, since fhandle matches both inode number and generation. In most fs
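
The stale-handle check described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the kernfs implementation: the names `node_id`, `slot`, `node_alloc`, `node_free` and `handle_is_live`, and the fixed-size table, are hypothetical and chosen for illustration. It shows why a handle that carries both an inode number and a generation stops matching once the inode number is reused.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space model; none of these names come from kernfs. */
struct node_id {
    uint32_t ino;        /* slot index standing in for the inode number */
    uint32_t generation; /* value of the slot's counter at allocation time */
};

struct slot {
    bool in_use;
    uint32_t generation; /* bumped on every reuse; 32-bit, so it can wrap */
};

#define NR_SLOTS 8
struct slot slots[NR_SLOTS];

/* Hand out the lowest free inode number together with a fresh generation. */
bool node_alloc(struct node_id *id)
{
    for (uint32_t ino = 0; ino < NR_SLOTS; ino++) {
        if (!slots[ino].in_use) {
            slots[ino].in_use = true;
            slots[ino].generation++;
            id->ino = ino;
            id->generation = slots[ino].generation;
            return true;
        }
    }
    return false; /* table full */
}

/* Release an inode number so that a later allocation may reuse it. */
void node_free(uint32_t ino)
{
    assert(ino < NR_SLOTS);
    slots[ino].in_use = false;
}

/* A handle is live only if both the inode number and the generation match. */
bool handle_is_live(const struct node_id *id)
{
    return id->ino < NR_SLOTS &&
           slots[id->ino].in_use &&
           slots[id->ino].generation == id->generation;
}
```

In this model, a handle saved before `node_free()` is rejected even after the same inode number has been handed out again, because the generation differs. A false match would need the 32-bit counter to wrap all the way around to the saved value.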

[PATCH V2 02/12] kernfs: use idr instead of ida to manage inode number

2017-06-14 Thread Shaohua Li
From: Shaohua Li kernfs uses ida to manage inode numbers. The problem is that we can't get a kernfs_node from an inode number with ida. Switch to idr; the next patch will add an API to get a kernfs_node from an inode number. Signed-off-by: Shaohua Li --- fs/kernfs/dir.c
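
The difference the patch relies on can be sketched in user space. This is a rough illustration under assumptions, not the kernel's ida/idr API: `id_bitmap`, `id_map` and their helper functions are hypothetical names. The first allocator only remembers which IDs are taken, so an ID cannot be turned back into an object. The second stores a pointer per ID, which is what makes a lookup by number possible.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_IDS 16

/* ida-like model: tracks only which IDs are in use (hypothetical names). */
struct id_bitmap {
    unsigned int used; /* bit n set means ID n is allocated */
};

/* Return the lowest free ID, or -1 if all IDs are taken. */
int bitmap_alloc(struct id_bitmap *b)
{
    for (int id = 0; id < MAX_IDS; id++) {
        if (!(b->used & (1u << id))) {
            b->used |= 1u << id;
            return id;
        }
    }
    return -1;
}

/* idr-like model: each allocated ID also remembers its object. */
struct id_map {
    void *ptr[MAX_IDS]; /* NULL marks a free ID */
};

/* Bind obj to the lowest free ID and return it, or -1 if the map is full. */
int map_alloc(struct id_map *m, void *obj)
{
    assert(obj != NULL); /* NULL is reserved for "free" in this sketch */
    for (int id = 0; id < MAX_IDS; id++) {
        if (m->ptr[id] == NULL) {
            m->ptr[id] = obj;
            return id;
        }
    }
    return -1;
}

/* Look up the object bound to id; NULL if out of range or not allocated. */
void *map_find(const struct id_map *m, int id)
{
    if (id < 0 || id >= MAX_IDS)
        return NULL;
    return m->ptr[id];
}

/* Release id so that it can be handed out again. */
void map_remove(struct id_map *m, int id)
{
    if (id >= 0 && id < MAX_IDS)
        m->ptr[id] = NULL;
}
```

With the bitmap there is nothing to return for a given ID beyond "taken" or "free", whereas `map_find()` hands back the stored pointer, which is the kind of lookup the follow-up patch needs.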

[PATCH V2 09/12] block: always attach cgroup info into bio

2017-06-14 Thread Shaohua Li
From: Shaohua Li blkcg_bio_issue_check() already gets blkcg for a BIO. bio_associate_blkcg() uses a percpu refcounter, so it's a very cheap operation. There is no reason not to attach the cgroup info to the bio in blkcg_bio_issue_check(). This also makes blktrace output correct c

[PATCH V2 04/12] kernfs: don't set dentry->d_fsdata

2017-06-14 Thread Shaohua Li
From: Shaohua Li When working on adding exportfs operations in kernfs, I found it's hard to initialize dentry->d_fsdata in the exportfs operations. It looks like there is no way to do it without a race condition. Looking at the kernfs code closely, there is no point in setting dentry->d_fsdata. inode

[PATCH V2 03/12] kernfs: add an API to get kernfs node from inode number

2017-06-14 Thread Shaohua Li
From: Shaohua Li Add an API to get a kernfs node from an inode number. We will need this to implement exportfs operations. To make the API lock-free, the kernfs node is freed in RCU context, and we depend on the kernfs_node count/ino number to filter out stale kernfs nodes. Signed-off-by: Shaohua Li --- fs

[PATCH V2 07/12] cgroup: export fhandle info for a cgroup

2017-06-14 Thread Shaohua Li
From: Shaohua Li Add an API to export cgroup fhandle info. We don't export a full 'struct file_handle', since it contains info that isn't required. Specifically, cgroup is always a directory, so we don't need a 'FILEID_KERNFS_WITH_PARENT' type fhandle, we only need to export the inod

[GIT PULL] MD update for 4.12

2017-05-01 Thread Shaohua Li
md: allow creation of mdNNN arrays via md_mod/parameters/new_array md: support disabling of create-on-open semantics. md: handle read-only member devices better. Shaohua Li (7): md/raid5: prioritize stripes for writeback md/raid5-cache: bump flush stripe batch size

Re: [PATCH] md/raid5: make use of spin_lock_irq over local_irq_disable + spin_lock

2017-05-01 Thread Shaohua Li
On Fri, Apr 28, 2017 at 12:41:02PM -0500, Julia Cartwright wrote: > On mainline, there is no functional difference, just less code, and > symmetric lock/unlock paths. > > On PREEMPT_RT builds, this fixes the following warning, seen by > Alexander GQ Gerasiov, due to the sleeping nature of spinlock
