On Tue, Apr 20, 2021 at 12:06:24AM +0800, John Garry wrote:
> Function sdev_store_queue_depth() enforces that the sdev queue depth cannot
> exceed shost.can_queue.
>
> However, the LLDD may still set cmd_per_lun > can_queue, which leads to an
> initial sdev queue depth greater than can_queue.
>
>
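The constraint described above can be sketched in a few lines of userspace C. This is an illustration of the clamping rule, not the kernel's actual code; the parameter names simply mirror the fields named in the report.

```c
#include <assert.h>

/* Illustrative only: clamp the LLDD-provided cmd_per_lun to the host's
 * can_queue so the initial sdev queue depth never exceeds it. */
static int initial_queue_depth(int cmd_per_lun, int can_queue)
{
	int depth = cmd_per_lun > 0 ? cmd_per_lun : 1;

	if (depth > can_queue)
		depth = can_queue;
	return depth;
}
```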
> + if (at_head || blk_rq_is_passthrough(rq)) {
> - if (at_head)
> - list_add(&rq->queuelist, &dd->dispatch);
> - else
> - list_add_tail(&rq->queuelist, &dd->dispatch);
> + if (at_head) {
> + list_add(&rq->queuelist, &dd->dispatch);
> } else {
> deadline_add_rq_rb(dd, rq);
>
> --
> 2.30.2
>
Looks fine:
Reviewed-by: Ming Lei
Thanks,
Ming
/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -272,6 +272,12 @@ static inline bool bio_is_passthrough(struct bio *bio)
> return blk_op_is_scsi(op) || blk_op_is_private(op);
> }
>
> +static inline bool blk_op_is_passthrough(unsigned int op)
> +{
> + return (blk_op_is_
On Sun, Apr 11, 2021 at 10:13:01PM +, Damien Le Moal wrote:
> On 2021/04/09 23:47, Bart Van Assche wrote:
> > On 4/7/21 3:27 AM, Damien Le Moal wrote:
> >> On 2021/04/07 18:46, Changheun Lee wrote:
> >>> I'll prepare a new patch as you recommend. It will add the setting of
> >>> limit_bio_size a
mpfs, memory is charged appropriately.
>
> This patch also exports cgroup_get_e_css and int_active_memcg so they
> can be used by the loop module.
>
> Signed-off-by: Dan Schatzberg
> Acked-by: Johannes Weiner
Reviewed-by: Ming Lei
--
Ming Lei
gt;lo_work_lock);
> }
>
> static const struct blk_mq_ops loop_mq_ops = {
> .queue_rq = loop_queue_rq,
> - .init_request = loop_init_request,
> .complete = lo_complete_rq,
> };
>
> @@ -2164,6 +2300,7 @@ static int loop_add(struct loop_device **l, int i)
> mutex_init(&lo->lo_mutex);
> lo->lo_number = i;
> spin_lock_init(&lo->lo_lock);
> + spin_lock_init(&lo->lo_work_lock);
> disk->major = LOOP_MAJOR;
> disk->first_minor = i << part_shift;
> disk->fops = &lo_fops;
> diff --git a/drivers/block/loop.h b/drivers/block/loop.h
> index a3c04f310672..9289c1cd6374 100644
> --- a/drivers/block/loop.h
> +++ b/drivers/block/loop.h
> @@ -14,7 +14,6 @@
> #include
> #include
> #include
> -#include
> #include
>
> /* Possible states of device */
> @@ -54,8 +53,13 @@ struct loop_device {
>
> spinlock_t lo_lock;
> int lo_state;
> - struct kthread_worker worker;
> - struct task_struct *worker_task;
> + spinlock_t lo_work_lock;
> + struct workqueue_struct *workqueue;
> + struct work_struct rootcg_work;
> + struct list_head rootcg_cmd_list;
> + struct list_head idle_worker_list;
> + struct rb_root worker_tree;
> + struct timer_list timer;
> bool use_dio;
> bool sysfs_inited;
>
> @@ -66,7 +70,7 @@ struct loop_device {
> };
>
> struct loop_cmd {
> - struct kthread_work work;
> + struct list_head list_entry;
> bool use_aio; /* use AIO interface to handle I/O */
> atomic_t ref; /* only for aio */
> long ret;
> --
> 2.30.2
>
Reviewed-by: Ming Lei
--
Ming Lei
On Sat, Apr 03, 2021 at 04:10:16PM +0800, Ming Lei wrote:
> On Fri, Apr 02, 2021 at 07:27:30PM +0200, Christoph Hellwig wrote:
> > On Wed, Mar 31, 2021 at 08:16:50AM +0800, Ming Lei wrote:
> > > On Tue, Mar 30, 2021 at 06:53:30PM +0200, Christoph Hellwig wrote:
> > > &
On Fri, Apr 02, 2021 at 07:27:30PM +0200, Christoph Hellwig wrote:
> On Wed, Mar 31, 2021 at 08:16:50AM +0800, Ming Lei wrote:
> > On Tue, Mar 30, 2021 at 06:53:30PM +0200, Christoph Hellwig wrote:
> > > On Tue, Mar 23, 2021 at 04:14:39PM +0800, Ming Lei wrote:
> > > &
On Thu, Apr 01, 2021 at 04:27:37PM +, Gulam Mohamed wrote:
> Hi Ming,
>
> Thanks for taking a look into this. Can you please see my inline
> comments in below mail?
>
> Regards,
> Gulam Mohamed.
>
> -Original Message-
> From: Ming Lei
> Se
On Tue, Mar 30, 2021 at 06:53:30PM +0200, Christoph Hellwig wrote:
> On Tue, Mar 23, 2021 at 04:14:39PM +0800, Ming Lei wrote:
> > blktrace may allocate lots of memory, if the process is terminated
> > by user or OOM, we need to provide one chance to remove the trace
> > buf
On Tue, Mar 30, 2021 at 10:57:04AM +0800, Su Yue wrote:
>
> On Tue 23 Mar 2021 at 16:14, Ming Lei wrote:
>
> > On some ARCHs, such as aarch64, page size may be 64K, meantime there may
> > be lots of CPU cores. relay_open() needs to allocate pages on each CPU
> > b
On Tue, Mar 23, 2021 at 04:14:38PM +0800, Ming Lei wrote:
> blktrace may pass big trace buffer size via '-b', meantime the system
> may have lots of CPU cores, so too much memory can be allocated for
> blktrace.
>
> The 1st patch shutdown bltrace in blkdev_close() in ca
0 RCX: 00465d67
> RDX: 7ffda32c37f3 RSI: 004bfab2 RDI: 7ffda32c37e0
> RBP: R08: R09: 7ffda32c35a0
> R10: 7ffda32c3457 R11: 0202 R12: 0001
> R13: 0000 R14: 0001 R15: 7ffda32c37e0
This is another warning, unrelated to the original report, so I think
the patch in the above tree fixes the issue.
--
Ming Lei
On Sun, Mar 14, 2021 at 7:10 PM syzbot
wrote:
>
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit:280d542f Merge tag 'drm-fixes-2021-03-05' of git://anongit..
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=15ade5aed0
> kernel config:
On Wed, Mar 24, 2021 at 12:37:03PM +, Gulam Mohamed wrote:
> Hi All,
>
> We are facing a stale link (of the device) issue during the iscsi-logout
> process if we use parted command just before the iscsi logout. Here are the
> details:
>
> As part of iscsi logout, the partitio
in
> kernel between the systemd-udevd and iscsi-logout processing as described
> above. We are able to reproduce this even with latest upstream kernel.
>
> We have come across a patch from Ming Lei which was created for "avoid to
> drop & re-add partitions if partitions a
case of 'blktrace -b 8192' which is used by device-mapper
test suite[1]. This way could cause OOM easily.
Fix the issue by limiting max allowed pages to be 1/8 of totalram_pages().
[1] https://github.com/jthornber/device-mapper-test-suite.git
Signed-off-by: Ming Lei
---
kernel/trace/
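The 1/8-of-RAM clamp described above can be sketched as follows. The function and parameter names are illustrative stand-ins, not the kernel's: the idea is just to cap the total relay pages (per-CPU pages times CPU count) and scale the per-CPU request down to fit.

```c
#include <assert.h>

/* Sketch of the clamp: cap total trace buffer pages at 1/8 of total
 * RAM pages, shrinking the per-cpu request when it would exceed that. */
static unsigned long clamped_percpu_pages(unsigned long requested_per_cpu,
					  unsigned int nr_cpus,
					  unsigned long totalram_pages)
{
	unsigned long limit = totalram_pages / 8;
	unsigned long total = requested_per_cpu * nr_cpus;

	if (total > limit)
		return limit / nr_cpus;
	return requested_per_cpu;
}
```

With 64 CPUs and a 2048-page per-CPU request on a 1GB (262144-page) machine, the request is scaled down so the total stays within 32768 pages.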
uffer size for avoiding potential OOM.
Ming Lei (2):
block: shutdown blktrace in case of fatal signal pending
blktrace: limit allowed total trace buffer size
fs/block_dev.c | 6 ++
kernel/trace/blktrace.c | 32
2 files changed, 38
blktrace may allocate lots of memory. If the process is terminated by
the user or by the OOM killer, we need to provide one chance to remove
the trace buffer, otherwise a memory leak may be caused.
Fix the issue by shutting down blktrace in blkdev_close() when the task
is exiting.
Signed-off-by: Ming Lei
---
fs
On Sun, Mar 14, 2021 at 7:10 PM syzbot
wrote:
>
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit:280d542f Merge tag 'drm-fixes-2021-03-05' of git://anongit..
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=15ade5aed0
> kernel config:
ux a5f35387ebdc
It should be the same issue which was addressed by commit
aebf5db91705 ("block: fix use-after-free in disk_part_iter_next"),
but converting to xarray introduced the issue again.
--
Ming Lei
On Tue, Mar 16, 2021 at 07:35:44PM +0300, Sergei Shtepa wrote:
> The 03/16/2021 11:09, Ming Lei wrote:
> > On Fri, Mar 12, 2021 at 06:44:54PM +0300, Sergei Shtepa wrote:
> > > bdev_interposer allows to redirect bio requests to another devices.
> > >
> &g
On Fri, Mar 12, 2021 at 06:44:54PM +0300, Sergei Shtepa wrote:
> bdev_interposer allows to redirect bio requests to another devices.
>
> Signed-off-by: Sergei Shtepa
> ---
> block/bio.c | 2 ++
> block/blk-core.c | 57 +++
> block/genhd
On Fri, Mar 05, 2021 at 11:14:53PM +0800, John Garry wrote:
> A use-after-free may occur if blk_mq_queue_tag_busy_iter() is run on a
> queue when another queue associated with the same tagset is switching IO
> scheduler:
>
> BUG: KASAN: use-after-free in bt_iter+0xa0/0x120
> Read of size 8 at addr
q_entity_service_tree(entity);
1181	is_in_service = entity == sd->in_service_entity;
1182
1183	bfq_calc_finish(entity, entity->service);
1184
1185	if (is_in_service)
It seems entity->sched_data points to NULL.
>
> Thanks,
> Paolo
>
> > I
Hello Hillf,
Thanks for the debug patch.
On Fri, Mar 5, 2021 at 5:00 PM Hillf Danton wrote:
>
> On Thu, 4 Mar 2021 16:42:30 +0800 Ming Lei wrote:
> > On Sat, Oct 10, 2020 at 1:40 PM Mikhail Gavrilov
> > wrote:
> > >
> > > Paolo, Jens I am sorry for the noi
t; FS: () GS:8dc90e0c() knlGS:
> CS: 0010 DS: ES: CR0: 80050033
> CR2: 003e8ebe4000 CR3: 0007c2546000 CR4: 00350ee0
> Call Trace:
> bfq_deactivate_entity+0x4f/0xc0
Hello,
The same stack trace was observed in an RH internal test too, on kernel
5.11.0-0.rc6, but there isn't a reproducer yet.
--
Ming Lei
On Wed, Feb 24, 2021 at 09:18:25AM +0100, Christoph Hellwig wrote:
> On Wed, Feb 24, 2021 at 11:58:26AM +0800, Ming Lei wrote:
> > Hi Guys,
> >
> > The two patches changes block ioctl(BLKRRPART) for avoiding drop &
> > re-add partitions if partitions state isn'
On Fri, Feb 19, 2021 at 06:15:13PM +0530, SelvaKumar S wrote:
> This patchset tries to add support for TP4065a ("Simple Copy Command"),
> v2020.05.04 ("Ratified")
>
> The Specification can be found in following link.
> https://nvmexpress.org/wp-content/uploads/NVM-Express-1.4-Ratified-TPs-1.zip
>
On Wed, Feb 17, 2021 at 08:16:29AM +0100, Christoph Hellwig wrote:
> On Wed, Feb 17, 2021 at 11:07:14AM +0800, Ming Lei wrote:
> > Do you think it is correct for ioctl(BLKRRPART) to always drop/re-add
> > partition device node?
>
> Yes, that is what it is designed to do. Th
On Tue, Feb 16, 2021 at 09:44:30AM +0100, Christoph Hellwig wrote:
> On Mon, Feb 15, 2021 at 12:03:41PM +0800, Ming Lei wrote:
> > Hello,
>
> I think this is a fundamentally bad idea. We should not keep the
> parsed partition state around forever just to work around some
i-segment discards. It calculates the correct discard segment
> count by counting the number of bio as each discard bio is considered its
> own segment.
>
> Fixes: 1e739730c5b9 ("block: optionally merge discontiguous discard bios into
> a single request")
> Signed-off-b
On Fri, Feb 05, 2021 at 10:17:06AM +0800, Ming Lei wrote:
> Hi Guys,
>
> The two patches changes block ioctl(BLKRRPART) for avoiding drop &
> re-add partitions if partitions state isn't changed. The current
> behavior confuses userspace because partitions can disapp
On Fri, Feb 05, 2021 at 08:14:29AM +0100, Christoph Hellwig wrote:
> On Fri, Feb 05, 2021 at 10:17:08AM +0800, Ming Lei wrote:
> > block ioctl(BLKRRPART) always drops current partitions and adds
> > partitions again, even though there isn't any change in partitions table.
>
No functional change; make the code more readable and prepare for
supporting safe re-reading of partitions.
Cc: Ewan D. Milne
Signed-off-by: Ming Lei
---
block/partitions/core.c | 51 ++---
1 file changed, 37 insertions(+), 14 deletions(-)
diff --git a/block
Hi Guys,
The two patches change block ioctl(BLKRRPART) to avoid dropping &
re-adding partitions if the partition state hasn't changed. The current
behavior confuses userspace because partitions can disappear at any
time when ioctl(BLKRRPART) is issued.
Ming Lei (2):
block: move partitions check code in
This way may confuse userspace or users; for example, a normal workable
partition device node may disappear at any time.
Fix this issue by checking if there is a real change in the partition
state, and only drop & re-add partitions when the state really changed.
Cc: Ewan D. Milne
Signed-off-by:
return nr_phys_segs;
> + }
> + /* fall through */
> case REQ_OP_SECURE_ERASE:
> case REQ_OP_WRITE_ZEROES:
> return 0;
blk_rq_nr_discard_segments() always returns >=1 segments, so there is
no similar issue in the case of a single-range discard.
Reviewed-by: Ming Lei
And it can be thought of as:
Fixes: 1e739730c5b9 ("block: optionally merge discontiguous discard bios into a
single request")
--
Ming
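The counting rule from the patch under review, where each discard bio chained into a request is its own segment and the count never drops to zero, can be modeled in a few lines. The struct here is an illustrative stand-in, not the kernel's request/bio types.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative model: each bio chained into a discard request counts
 * as one segment, and the count is never reported as zero. */
struct dbio {
	struct dbio *next;
};

static unsigned short nr_discard_segments(const struct dbio *first)
{
	unsigned short n = 0;
	const struct dbio *b;

	for (b = first; b; b = b->next)
		n++;
	return n ? n : 1;
}
```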
On Wed, Feb 03, 2021 at 11:23:37AM -0500, David Jeffery wrote:
> On Wed, Feb 03, 2021 at 10:35:17AM +0800, Ming Lei wrote:
> >
> > On Tue, Feb 02, 2021 at 03:43:55PM -0500, David Jeffery wrote:
> > > The return 0 does seem to be an old relic that does not make sense
>
On Tue, Feb 02, 2021 at 01:12:04PM +0900, Changheun Lee wrote:
> > On Mon, Feb 01, 2021 at 11:52:48AM +0900, Changheun Lee wrote:
> > > > On Fri, Jan 29, 2021 at 12:49:08PM +0900, Changheun Lee wrote:
> > > > > bio size can grow up to 4GB when multi-page bvec is enabled.
> > > > > but sometimes it w
On Tue, Feb 02, 2021 at 03:43:55PM -0500, David Jeffery wrote:
> On Tue, Feb 02, 2021 at 11:33:43AM +0800, Ming Lei wrote:
> >
> > On Mon, Feb 01, 2021 at 11:48:50AM -0500, David Jeffery wrote:
> > > When a stacked block device inserts a request into another
On Mon, Feb 01, 2021 at 11:48:50AM -0500, David Jeffery wrote:
> When a stacked block device inserts a request into another block device
> using blk_insert_cloned_request, the request's nr_phys_segments field gets
> recalculated by a call to blk_recalc_rq_segments in
> blk_cloned_rq_check_limits. B
On Mon, Feb 01, 2021 at 11:52:48AM +0900, Changheun Lee wrote:
> > On Fri, Jan 29, 2021 at 12:49:08PM +0900, Changheun Lee wrote:
> > > bio size can grow up to 4GB when multi-page bvec is enabled.
> > > but sometimes it would lead to inefficient behaviors.
> > > in case of large chunk direct I/O, -
On Fri, Jan 29, 2021 at 12:49:08PM +0900, Changheun Lee wrote:
> bio size can grow up to 4GB when multi-page bvec is enabled.
> but sometimes it would lead to inefficient behaviors.
> in case of large chunk direct I/O, - 32MB chunk read in user space -
> all pages for 32MB would be merged to a bio s
On Wed, Jan 27, 2021 at 09:44:50AM +0200, Maxim Mikityanskiy wrote:
> On Wed, Jan 27, 2021 at 6:23 AM Bart Van Assche wrote:
> >
> > On 1/26/21 11:59 AM, Maxim Mikityanskiy wrote:
> > > The cited commit introduced a serious regression with SATA write speed,
> > > as found by bisecting. This patch
On Tue, Jan 26, 2021 at 06:26:02AM +, Damien Le Moal wrote:
> On 2021/01/26 15:07, Ming Lei wrote:
> > On Tue, Jan 26, 2021 at 04:06:06AM +, Damien Le Moal wrote:
> >> On 2021/01/26 12:58, Ming Lei wrote:
> >>> On Tue, Jan 26, 2021 at 10:32:34AM +0900, Changh
On Tue, Jan 26, 2021 at 04:06:06AM +, Damien Le Moal wrote:
> On 2021/01/26 12:58, Ming Lei wrote:
> > On Tue, Jan 26, 2021 at 10:32:34AM +0900, Changheun Lee wrote:
> >> bio size can grow up to 4GB when multi-page bvec is enabled.
> >> but sometimes it would le
On Tue, Jan 26, 2021 at 10:32:34AM +0900, Changheun Lee wrote:
> bio size can grow up to 4GB when multi-page bvec is enabled.
> but sometimes it would lead to inefficient behaviors.
> in case of large chunk direct I/O, - 32MB chunk read in user space -
> all pages for 32MB would be merged to a bio s
On Thu, Jan 21, 2021 at 09:58:03AM +0900, Changheun Lee wrote:
> bio size can grow up to 4GB when multi-page bvec is enabled.
> but sometimes it would lead to inefficient behaviors.
> in case of large chunk direct I/O, - 32MB chunk read in user space -
> all pages for 32MB would be merged to a bio s
On Thu, Jan 14, 2021 at 11:24:35AM -0600, Brian King wrote:
> On 1/13/21 7:27 PM, Ming Lei wrote:
> > On Wed, Jan 13, 2021 at 11:13:07AM -0600, Brian King wrote:
> >> On 1/12/21 6:33 PM, Tyrel Datwyler wrote:
> >>> On 1/12/21 2:54 PM, Brian King wrote:
> >&g
On Wed, Jan 13, 2021 at 12:02:44PM +, Damien Le Moal wrote:
> On 2021/01/13 20:48, Ming Lei wrote:
> > On Wed, Jan 13, 2021 at 11:16:11AM +, Damien Le Moal wrote:
> >> On 2021/01/13 19:25, Ming Lei wrote:
> >>> On Wed, Jan 13, 2021 at 09:28:02AM +, Damien
On Wed, Jan 13, 2021 at 11:13:07AM -0600, Brian King wrote:
> On 1/12/21 6:33 PM, Tyrel Datwyler wrote:
> > On 1/12/21 2:54 PM, Brian King wrote:
> >> On 1/11/21 5:12 PM, Tyrel Datwyler wrote:
> >>> Introduce several new vhost fields for managing MQ state of the adapter
> >>> as well as initial def
On Wed, Jan 13, 2021 at 11:16:11AM +, Damien Le Moal wrote:
> On 2021/01/13 19:25, Ming Lei wrote:
> > On Wed, Jan 13, 2021 at 09:28:02AM +, Damien Le Moal wrote:
> >> On 2021/01/13 18:19, Ming Lei wrote:
> >>> On Wed, Jan 13, 2021 at 12:09 PM Changheun Lee
On Wed, Jan 13, 2021 at 09:28:02AM +, Damien Le Moal wrote:
> On 2021/01/13 18:19, Ming Lei wrote:
> > On Wed, Jan 13, 2021 at 12:09 PM Changheun Lee
> > wrote:
> >>
> >>> On 2021/01/12 21:14, Changheun Lee wrote:
> >>>>> On 2021/01/12
> > So what is the actual total latency difference for the entire 32MB
> > user IO? That is, I think, what needs to be compared here.
> >
> >Also, what is your device max_sectors_kb and max queue depth ?
> >
>
> 32MB total latency is about 19ms including merge time without this patch.
> But with this patch, total latency is about 17ms including merge time too.
19ms looks too big just for preparing one 32MB sized bio, which isn't
supposed to take so long. Can you investigate where the 19ms is taken
just for preparing one 32MB sized bio?
It might be iov_iter_get_pages() for handling page fault. If yes, one suggestion
is to enable THP(Transparent HugePage Support) in your application.
--
Ming Lei
On Sun, Jan 10, 2021 at 10:32:47PM +0800, kernel test robot wrote:
>
> Greeting,
>
> FYI, we noticed a -18.4% regression of reaim.jobs_per_min due to commit:
>
>
> commit: 2b0d3d3e4fcfb19d10f9a82910b8f0f05c56ee3e ("percpu_ref: reduce memory
> footprint of percpu_ref in fast path")
> https://gi
direct I/O as memory stall */
> bio_clear_flag(bio, BIO_WORKINGSET);
> diff --git a/include/linux/bio.h b/include/linux/bio.h
> index d8f9077c43ef..1d30572a8c53 100644
> --- a/include/linux/bio.h
> +++ b/include/linux/bio.h
> @@ -444,10 +444,13 @@ static inline void bio_wouldblock_error(struct bio *bio)
>
> /*
> * Calculate number of bvec segments that should be allocated to fit data
> - * pointed by @iter.
> + * pointed by @iter. If @iter is backed by bvec it's going to be reused
> + * instead of allocating a new one.
> */
> static inline int bio_iov_vecs_to_alloc(struct iov_iter *iter, int max_segs)
> {
> + if (iov_iter_is_bvec(iter))
> + return 0;
> return iov_iter_npages(iter, max_segs);
> }
>
> --
> 2.24.0
>
Reviewed-by: Ming Lei
--
Ming
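The caller-side effect of the patch above can be sketched in plain C. This is illustrative only: `is_bvec` stands in for `iov_iter_is_bvec()` and `npages` for the result of `iov_iter_npages()`; a bvec-backed iterator needs no new vectors because its existing array is reused.

```c
#include <assert.h>

/* Illustrative view of bio_iov_vecs_to_alloc(): a bvec-backed iterator
 * allocates nothing (the existing bvec array is reused); otherwise the
 * page count is capped at max_segs. */
static int vecs_to_alloc(int is_bvec, int npages, int max_segs)
{
	if (is_bvec)
		return 0;
	return npages < max_segs ? npages : max_segs;
}
```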
in blk_types.h */
> #include
> +#include
>
> #define BIO_DEBUG
>
> @@ -441,6 +442,15 @@ static inline void bio_wouldblock_error(struct bio *bio)
> bio_endio(bio);
> }
>
> +/*
> + * Calculate number of bvec segments that should be allocated to fit data
> + * pointed by @iter.
> + */
> +static inline int bio_iov_vecs_to_alloc(struct iov_iter *iter, int max_segs)
> +{
> + return iov_iter_npages(iter, max_segs);
> +}
> +
> struct request_queue;
>
> extern int submit_bio_wait(struct bio *bio);
> --
> 2.24.0
>
Reviewed-by: Ming Lei
--
Ming
+ if (iov_iter_is_bvec(i)) {
> + iov_iter_bvec_advance(i, size);
> + return;
> + }
> iterate_and_advance(i, size, v, 0, 0, 0)
> }
> EXPORT_SYMBOL(iov_iter_advance);
> --
> 2.24.0
>
Reviewed-by: Ming Lei
--
Ming
iov_iter_bvec(&iter, is_write, aio_cmd->bvecs, sgl_nents, len);
>
> aio_cmd->cmd = cmd;
> aio_cmd->len = len;
> @@ -307,8 +301,6 @@ fd_execute_rw_aio(struct se_cmd *cmd, struct scatterlist
> *sgl, u32 sgl_nents,
> else
> ret = call_read_iter(file, &aio_cmd->iocb, &iter);
>
> - kfree(bvec);
> -
> if (ret != -EIOCBQUEUED)
> cmd_rw_aio_complete(&aio_cmd->iocb, ret, 0);
>
> --
> 2.24.0
>
Reviewed-by: Ming Lei
--
Ming
> +++ b/fs/direct-io.c
> @@ -426,6 +426,8 @@ static inline void dio_bio_submit(struct dio *dio, struct
> dio_submit *sdio)
> unsigned long flags;
>
> bio->bi_private = dio;
> + /* don't account direct I/O as memory stall */
> + bio_clear_flag(bio, BIO_WORKINGSET);
>
> spin_lock_irqsave(&dio->bio_lock, flags);
> dio->refcount++;
> --
> 2.24.0
>
Reviewed-by: Ming Lei
--
Ming
continue; \
> (void)(STEP); \
> } \
> }
> --
> 2.24.0
>
Reviewed-by: Ming Lei
--
Ming
left -= this_len;
> + n++;
> }
>
> iov_iter_bvec(&from, WRITE, array, n, sd.total_len - left);
> --
> 2.24.0
>
Reviewed-by: Ming Lei
--
Ming
On Thu, Jan 07, 2021 at 09:21:11AM +1100, Dave Chinner wrote:
> On Wed, Jan 06, 2021 at 04:45:48PM +0800, Ming Lei wrote:
> > On Tue, Jan 05, 2021 at 07:39:38PM +0100, Christoph Hellwig wrote:
> > > At least for iomap I think this is the wrong approach. Between t
On Tue, Jan 05, 2021 at 07:39:38PM +0100, Christoph Hellwig wrote:
> At least for iomap I think this is the wrong approach. Between the
> iomap and writeback_control we know the maximum size of the writeback
> request and can just use that.
I think writeback_control can tell us nothing about max
b_vcnt.bt
Cc: Alexander Viro
Cc: Darrick J. Wong
Cc: linux-...@vger.kernel.org
Cc: linux-fsde...@vger.kernel.org
Signed-off-by: Ming Lei
---
fs/block_dev.c| 1 +
fs/iomap/buffered-io.c| 13 +
include/linux/bio.h | 2 --
include/linux/blk
On Mon, Jan 04, 2021 at 09:44:15AM +0100, Christoph Hellwig wrote:
> On Wed, Dec 30, 2020 at 08:08:15AM +0800, Ming Lei wrote:
> > It is observed that __block_write_full_page() always submit bio with size
> > of block size,
> > which is often 512 bytes.
> >
> >
Manage the bio slab cache via an xarray, using the slab cache size as
the xarray index and storing the 'struct bio_slab' instance in the
xarray.
This simplifies the code a lot, and it is more readable than before.
Signed-off-by: Ming Lei
---
block/b
bvec_alloc(), bvec_free() and bvec_nr_vecs() are only used inside block
layer core functions, so there is no need to declare them in a public header.
Signed-off-by: Ming Lei
---
block/blk.h | 4
include/linux/bio.h | 3 ---
2 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/block
This bioset is only used for allocating bios from bio_next_split, and
it doesn't need bvecs, so remove the flag.
Cc: linux-bca...@vger.kernel.org
Cc: Coly Li
Signed-off-by: Ming Lei
---
drivers/md/bcache/super.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/md/b
bvec_alloc() may allocate more bio vectors than requested, so set
.bi_max_vecs to the actual allocated vector count instead of the
requested number. This helps filesystems build bigger bios, because a
new bio often won't be allocated until the current one becomes full.
Signed-off-by: Min
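The "allocate more than requested" behaviour comes from allocating bvec arrays out of a small set of fixed-size buckets, so recording the bucket's capacity (rather than the request) is what lets the bio keep growing. A minimal sketch, with illustrative bucket sizes rather than the kernel's actual pool sizes:

```c
#include <assert.h>

/* Sketch: bvec arrays come from a few fixed-size buckets, so report
 * the bucket's capacity rather than the requested count.
 * Bucket sizes here are illustrative. */
static unsigned actual_bvec_capacity(unsigned requested)
{
	static const unsigned buckets[] = { 1, 4, 16, 64, 128, 256 };
	unsigned i;

	for (i = 0; i < sizeof(buckets) / sizeof(buckets[0]); i++)
		if (requested <= buckets[i])
			return buckets[i];
	return 0; /* caller must fall back for larger requests */
}
```

A request for 5 vectors, for instance, lands in the 16-entry bucket, so the bio can absorb 11 more segments before a new bio is needed.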
q->bio_split is only used by bio_split() for fast bio cloning, and
there is no need to allocate bvecs, so remove this flag.
Signed-off-by: Ming Lei
---
block/blk-core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/block/blk-core.c b/block/blk-core.c
index 96e5fcd7f
Hello,
All are bioset / bvec improvements, and most of them are quite
straightforward.
Ming Lei (6):
block: manage bio slab cache by xarray
block: don't pass BIOSET_NEED_BVECS for q->bio_split
block: don't allocate inline bvecs if this bioset needn't bvecs
block: s
The inline bvecs won't be used if the user doesn't need bvecs (doesn't
pass BIOSET_NEED_BVECS), so don't allocate bvecs in this situation.
Signed-off-by: Ming Lei
---
block/bio.c | 11 +++
include/linux/bio.h | 1 +
2 files changed, 8 insertions(+), 4 deletions(-)
dif
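The allocation-size consequence of the change above can be sketched with stand-in struct sizes (these are assumptions for illustration, not the kernel's bio/bio_vec layouts): room for the inline bvec tail behind each bio is only reserved when the bioset actually needs bvecs.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in types; sizes are illustrative, not the kernel's. */
struct bio_stub { unsigned flags; unsigned short vcnt; };
struct bvec_stub { void *page; unsigned len, offset; };

/* Only reserve the inline-bvec tail when the bioset needs bvecs. */
static size_t per_bio_alloc_size(int need_bvecs, unsigned inline_vecs)
{
	size_t sz = sizeof(struct bio_stub);

	if (need_bvecs)
		sz += inline_vecs * sizeof(struct bvec_stub);
	return sz;
}
```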
l.org
Cc: Christoph Hellwig
Cc: Jens Axboe
Signed-off-by: Ming Lei
---
fs/buffer.c | 112 +---
1 file changed, 90 insertions(+), 22 deletions(-)
diff --git a/fs/buffer.c b/fs/buffer.c
index 32647d2011df..6bcf9ce5d7f8 100644
--- a/fs/buffer.c
+++ b/fs/
On Mon, Dec 28, 2020 at 05:02:50PM +0800, yukuai (C) wrote:
> Hi
>
> On 2020/12/28 16:28, Ming Lei wrote:
> > Another candidate solution may be to always return true from
> > hctx_may_queue()
> > for this kind of queue because queue_depth has provided fair allocation
On Mon, Dec 28, 2020 at 09:56:15AM +0800, yukuai (C) wrote:
> Hi,
>
> On 2020/12/27 19:58, Ming Lei wrote:
> > Hi Yu Kuai,
> >
> > On Sat, Dec 26, 2020 at 06:28:06PM +0800, Yu Kuai wrote:
> > > When sharing a tag set, if most disks are issuing small amount of
Hi Yu Kuai,
On Sat, Dec 26, 2020 at 06:28:06PM +0800, Yu Kuai wrote:
> When sharing a tag set, if most disks are issuing a small amount of IO
> and only a few are issuing a large amount of IO, the current approach is
> to limit the max number of tags a disk can get to the average of the total
> tags
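The fair-share policy being discussed can be sketched in userspace C. The names are illustrative stand-ins, not the kernel's hctx_may_queue(): an active queue is allowed to use at most its equal share of the shared tag space.

```c
#include <assert.h>

/* Sketch of the fair-share limit: with a shared tag set, an active
 * queue may use at most total_tags / active_queues tags (minimum 1). */
static int may_queue(unsigned active_queues, unsigned total_tags,
		     unsigned tags_in_use)
{
	unsigned share;

	if (!active_queues)
		return 1;
	share = total_tags / active_queues;
	if (!share)
		share = 1;
	return tags_in_use < share;
}
```

This is exactly the behaviour the thread calls out: with 4 active queues on a 64-tag set, each queue is capped at 16 tags even if the others are nearly idle.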
On Tue, Dec 22, 2020 at 11:22:19AM +, John Garry wrote:
> Resend without p...@codeaurora.org, which bounces for me
>
> On 22/12/2020 02:13, Bart Van Assche wrote:
> > On 12/21/20 10:47 AM, John Garry wrote:
> >> Yes, I agree, and I'm not sure what I wrote to give that impression.
> >>
> >> Abo
On Thu, Dec 17, 2020 at 07:07:53PM +0800, John Garry wrote:
> References to old IO sched requests are currently cleared from the
> tagset when freeing those requests; switching elevator or changing
> request queue depth is such a scenario in which this occurs.
>
> However, this does not stop the p
On Mon, Dec 14, 2020 at 10:24:22AM -0500, Steven Rostedt wrote:
> On Mon, 14 Dec 2020 10:22:17 +0800
> Ming Lei wrote:
>
> > trace_note_tsk() is called by __blk_add_trace(), which is covered by RCU
> > read lock.
> > So in case of PREEMPT_RT, warning of 'BU
On Tue, Dec 15, 2020 at 11:14:20AM +, Pavel Begunkov wrote:
> On 15/12/2020 01:41, Ming Lei wrote:
> > On Tue, Dec 15, 2020 at 12:20:19AM +, Pavel Begunkov wrote:
> >> Instead of creating a full copy of iter->bvec into bio in direct I/O,
> >> the patchset mak
On Tue, Dec 15, 2020 at 12:20:19AM +, Pavel Begunkov wrote:
> Instead of creating a full copy of iter->bvec into bio in direct I/O,
> the patchset makes use of the one provided. It changes semantics and
> obliges users of asynchronous kiocb to track bvec lifetime, and [1/6]
> converts the only
_lock into raw_spin_lock().
Cc: Christoph Hellwig
Cc: Steven Rostedt
Cc: Ingo Molnar
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ming Lei
---
kernel/trace/blktrace.c | 14 +++---
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktra
On Fri, Dec 11, 2020 at 01:03:11PM -0800, syzbot wrote:
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit:15ac8fdb Add linux-next specific files for 20201207
> git tree: linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=15d8ad3750
> kernel config
archmask))
> return 0;
> +
> + if (node != NUMA_NO_NODE) {
> + /* Try the node mask */
> + if (!assign_vector_locked(irqd, cpumask_of_node(node)))
> + return 0;
> + }
> +
> /* Try the full online mask */
> return assign_vector_locked(irqd, cpu_online_mask);
> }
>
Reviewed-by: Ming Lei
Thanks,
Ming
On Thu, Dec 10, 2020 at 10:44:54AM +, John Garry wrote:
> Hi Ming,
>
> On 10/12/2020 02:07, Ming Lei wrote:
> > > Apart from this, my concern is that we come with for a solution, but it's
> > > a
> > > complicated solution and may not be accep
ED))
> - return false;
> + return;
>
> - return __blk_mq_tag_busy(hctx);
> + __blk_mq_tag_busy(hctx);
The above can be simplified as:
if (hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED)
__blk_mq_tag_busy(hctx);
Otherwise, looks fine:
Reviewed-by: Ming Lei
Thanks,
Ming
On Wed, Dec 09, 2020 at 09:55:30AM +, John Garry wrote:
> On 09/12/2020 01:01, Ming Lei wrote:
> > blk_mq_queue_tag_busy_iter() can be run on another request queue just
> > between one driver tag is allocated and updating the request map, so one
> > extra request reference
On Tue, Dec 08, 2020 at 11:36:58AM +, John Garry wrote:
> On 03/12/2020 09:26, John Garry wrote:
> > On 03/12/2020 00:55, Ming Lei wrote:
> >
> > Hi Ming,
> >
> > > > Yeah, so I said that was another problem which you mentioned
> > > > t
On Thu, Dec 03, 2020 at 09:26:35AM +0800, Ming Lei wrote:
> Hi,
>
> Qian reported there is hang during booting when shared host tagset is
> introduced on megaraid sas. Sumit reported the whole SCSI probe takes
> about ~45min in his test.
>
> Turns out it is caused by n
On Wed, Dec 02, 2020 at 11:18:31AM +, John Garry wrote:
> On 02/12/2020 03:31, Ming Lei wrote:
> > On Tue, Dec 01, 2020 at 09:02:18PM +0800, John Garry wrote:
> > > It has been reported many times that a use-after-free can be
> > > intermittently
> > >
On Tue, Dec 01, 2020 at 09:02:18PM +0800, John Garry wrote:
> It has been reported many times that a use-after-free can be intermittently
> found when iterating busy requests:
>
> -
> https://lore.kernel.org/linux-block/8376443a-ec1b-0cef-8244-ed584b96f...@huawei.com/
> -
> https://lore.kernel.o
porary and the request is still processed
> correctly, better remove the warning as this is the fast path.
>
> Suggested-by: Ming Lei
> Signed-off-by: Daniel Wagner
> ---
>
> v2:
> - remove the warning as suggested by Ming
> v1:
> - initial version
>
&g
On Thu, Nov 26, 2020 at 10:51:52AM +0100, Daniel Wagner wrote:
> The current warning looks awfully like a proper crash. This is
> confusing. There is not much information to be gained from the stack
> trace anyway, let's drop it.
>
> While at it, print the cpumask as there might be additional helpful
> i
On Thu, Nov 26, 2020 at 01:44:36PM +, Pavel Begunkov wrote:
> On 26/11/2020 02:46, Ming Lei wrote:
> > On Sun, Nov 22, 2020 at 03:35:46PM +, Pavel Begunkov wrote:
> >> map->swap_lock protects map->cleared from concurrent modification,
> >> however sb
On Sun, Nov 22, 2020 at 03:35:46PM +, Pavel Begunkov wrote:
> map->swap_lock protects map->cleared from concurrent modification;
> however, sbitmap_deferred_clear() already drains it atomically, so
> it's guaranteed not to lose bits on concurrent
> sbitmap_deferred_clear().
>
> A one thread
c), &(iter), \
> - (bvl).bv_len) : bvec_iter_skip_zero_bvec(&(iter)))
> + bvec_iter_advance_single((bio_vec), &(iter), (bvl).bv_len))
>
> /* for iterating one bio from start to end */
> #define BVEC_ITER_ALL_INIT (struct bvec_iter)
> \
> --
> 2.24.0
>
Looks fine,
Reviewed-by: Ming Lei
Thanks,
Ming
On Fri, Nov 20, 2020 at 02:06:10AM +, Matthew Wilcox wrote:
> On Fri, Nov 20, 2020 at 01:56:22AM +, Pavel Begunkov wrote:
> > On 20/11/2020 01:49, Matthew Wilcox wrote:
> > > On Fri, Nov 20, 2020 at 01:39:05AM +, Pavel Begunkov wrote:
> > >> On 20/11/2020 01:20, Matthew Wilcox wrote:
>
On Fri, Nov 20, 2020 at 01:39:05AM +, Pavel Begunkov wrote:
> On 20/11/2020 01:20, Matthew Wilcox wrote:
> > On Thu, Nov 19, 2020 at 11:24:38PM +, Pavel Begunkov wrote:
> >> The block layer spends quite a while in iov_iter_npages(), but for the
> >> bvec case the number of pages is already