[linux-next][qla2xxx][85caa95]kernel BUG at lib/list_debug.c:31!

2018-01-09 Thread Abdul Haleem
Greetings, the linux-next kernel panics on powerpc when the qla2xxx module is loaded and unloaded. Machine Type: Power 8 PowerVM LPAR Kernel : 4.15.0-rc2-next-20171211 gcc : version 4.8.5 Test type: module load/unload a few times Trace messages: --- qla2xxx [:00:00.0]-0005: : QLogic Fibre Channe

[PATCH BUGFIX 0/2] block, bfq: fix two memory leaks related to cgroups

2018-01-09 Thread Paolo Valente
Hi Jens, these two patches fix two related memory leaks, the first reported in [1], and the second found by ourselves while fixing the first. Thanks, Paolo [1] https://www.mail-archive.com/linux-block@vger.kernel.org/msg16258.html Paolo Valente (2): block, bfq: put async queues for root bfq gr

[PATCH BUGFIX 2/2] bfq-sq, bfq-mq: release oom-queue ref to root group on exit

2018-01-09 Thread Paolo Valente
On scheduler init, a reference to the root group, and a reference to its corresponding blkg are taken for the oom queue. Yet these references are not released on scheduler exit, which prevents these objects from being freed. This commit adds the missing reference releases. Reported-by: Davide Ferrari

[PATCH BUGFIX 1/2] block, bfq: put async queues for root bfq groups too

2018-01-09 Thread Paolo Valente
For each pair [device for which bfq is selected as I/O scheduler, group in blkio/io], bfq maintains a corresponding bfq group. Each such bfq group contains a set of async queues, with each async queue created on demand, i.e., when some I/O request arrives for it. On creation, an async queue gets a

[PATCH BUGFIX V2 2/2] block, bfq: release oom-queue ref to root group on exit

2018-01-09 Thread Paolo Valente
On scheduler init, a reference to the root group, and a reference to its corresponding blkg are taken for the oom queue. Yet these references are not released on scheduler exit, which prevents these objects from being freed. This commit adds the missing reference releases. Reported-by: Davide Ferrari

[PATCH BUGFIX V2 0/2] block, bfq: fix two memory leaks related to cgroups

2018-01-09 Thread Paolo Valente
[There was a mistake in the subject of the second patch, sorry] Hi Jens, these two patches fix two related memory leaks, the first reported in [1], and the second found by ourselves while fixing the first. Thanks, Paolo [1] https://www.mail-archive.com/linux-block@vger.kernel.org/msg16258.html

[PATCH BUGFIX V2 1/2] block, bfq: put async queues for root bfq groups too

2018-01-09 Thread Paolo Valente
For each pair [device for which bfq is selected as I/O scheduler, group in blkio/io], bfq maintains a corresponding bfq group. Each such bfq group contains a set of async queues, with each async queue created on demand, i.e., when some I/O request arrives for it. On creation, an async queue gets a

Re: bfq: BUG bfq_queue: Objects remaining in bfq_queue on __kmem_cache_shutdown() after rmmod

2018-01-09 Thread Paolo Valente
> Il giorno 21 dic 2017, alle ore 10:14, Guoqing Jiang ha > scritto: > > > > On 12/21/2017 03:53 PM, Paolo Valente wrote: >> >>> Il giorno 21 dic 2017, alle ore 08:08, Guoqing Jiang ha >>> scritto: >>> >>> Hi, >>> >>> >>> On 12/08/2017 08:34 AM, Holger Hoffstätte wrote: So pluggin

Re: [PATCH V4 13/45] block: blk-merge: try to make front segments in full size

2018-01-09 Thread Dmitry Osipenko
On 09.01.2018 05:34, Ming Lei wrote: > On Tue, Jan 09, 2018 at 12:09:27AM +0300, Dmitry Osipenko wrote: >> On 18.12.2017 15:22, Ming Lei wrote: >>> When merging one bvec into segment, if the bvec is too big >>> to merge, current policy is to move the whole bvec into another >>> new segment. >>> >>>

[PATCH] blk-mq: fix kernel oops in blk_mq_tag_idle()

2018-01-09 Thread Ming Lei
HW queues may be unmapped in some cases, such as blk_mq_update_nr_hw_queues(), then we need to check it before calling blk_mq_tag_idle(), otherwise the following kernel oops can be triggered, so fix it by checking if the hw queue is unmapped since it doesn't make sense to idle the tags any more aft

Re: [PATCH V4 13/45] block: blk-merge: try to make front segments in full size

2018-01-09 Thread Ming Lei
On Tue, Jan 09, 2018 at 04:18:39PM +0300, Dmitry Osipenko wrote: > On 09.01.2018 05:34, Ming Lei wrote: > > On Tue, Jan 09, 2018 at 12:09:27AM +0300, Dmitry Osipenko wrote: > >> On 18.12.2017 15:22, Ming Lei wrote: > >>> When merging one bvec into segment, if the bvec is too big > >>> to merge, cur

Re: [RFC PATCH] blk-throttle: dispatch more sync writes in block throttle layer

2018-01-09 Thread Tejun Heo
Hello, On Tue, Jan 09, 2018 at 12:45:13PM +0800, xuejiufei wrote: > 1. A bio is charged according to the direction, if we put the reads > and sync writes together, we need to search the queue to pick a > certain number of read and write IOs when the limit is not reached. Ah, you're right. > 2. I

Re: [PATCH 2/8] blk-mq: protect completion path with RCU

2018-01-09 Thread Jens Axboe
On 1/9/18 12:08 AM, Hannes Reinecke wrote: > On 01/08/2018 08:15 PM, Tejun Heo wrote: >> Currently, blk-mq protects only the issue path with RCU. This patch >> puts the completion path under the same RCU protection. This will be >> used to synchronize issue/completion against timeout by later pat

Re: [PATCH] blk-mq: fix kernel oops in blk_mq_tag_idle()

2018-01-09 Thread Jens Axboe
On Tue, Jan 09 2018, Ming Lei wrote: > HW queues may be unmapped in some cases, such as blk_mq_update_nr_hw_queues(), > then we need to check it before calling blk_mq_tag_idle(), otherwise > the following kernel oops can be triggered, so fix it by checking if > the hw queue is unmapped since it doe

Re: [PATCH BUGFIX V2 0/2] block, bfq: fix two memory leaks related to cgroups

2018-01-09 Thread Holger Hoffstätte
On 01/09/18 10:27, Paolo Valente wrote: > [There was a mistake in the subject of the second patch, sorry] > > Hi Jens, > these two patches fix two related memory leaks, the first reported in > [1], and the second found by ourselves while fixing the first. > > Thanks, > Paolo > > [1] https://www.

Re: [PATCH] blk-mq: fix kernel oops in blk_mq_tag_idle()

2018-01-09 Thread Jens Axboe
On 1/9/18 8:29 AM, Jens Axboe wrote: > On Tue, Jan 09 2018, Ming Lei wrote: >> HW queues may be unmapped in some cases, such as >> blk_mq_update_nr_hw_queues(), >> then we need to check it before calling blk_mq_tag_idle(), otherwise >> the following kernel oops can be triggered, so fix it by check

Re: [PATCH BUGFIX 0/2] block, bfq: fix two memory leaks related to cgroups

2018-01-09 Thread Jens Axboe
On Tue, Jan 09 2018, Paolo Valente wrote: > Hi Jens, > these two patches fix two related memory leaks, the first reported in > [1], and the second found by ourselves while fixing the first. Thanks, applied for 4.16, thanks. -- Jens Axboe

Re: [PATCH 3/8] blk-mq: replace timeout synchronization with a RCU and generation based scheme

2018-01-09 Thread t...@kernel.org
On Mon, Jan 08, 2018 at 09:06:55PM +, Bart Van Assche wrote: > On Mon, 2018-01-08 at 11:15 -0800, Tejun Heo wrote: > > +static void blk_mq_rq_update_aborted_gstate(struct request *rq, u64 gstate) > > +{ > > + unsigned long flags; > > + > > + local_irq_save(flags); > > + u64_stats_update_b

Re: [PATCH 3/8] blk-mq: replace timeout synchronization with a RCU and generation based scheme

2018-01-09 Thread t...@kernel.org
On Mon, Jan 08, 2018 at 11:29:11PM +, Bart Van Assche wrote: > Does "gstate" perhaps stand for "generation number and state"? If so, please > mention this in one of the above comments. Yeah, will do. Thanks. -- tejun

Re: [PATCH] blk-mq: fix kernel oops in blk_mq_tag_idle()

2018-01-09 Thread Ming Lei
On Tue, Jan 09, 2018 at 08:38:55AM -0700, Jens Axboe wrote: > On 1/9/18 8:29 AM, Jens Axboe wrote: > > On Tue, Jan 09 2018, Ming Lei wrote: > >> HW queues may be unmapped in some cases, such as > >> blk_mq_update_nr_hw_queues(), > >> then we need to check it before calling blk_mq_tag_idle(), other

Re: [linux-next][qla2xxx][85caa95]kernel BUG at lib/list_debug.c:31!

2018-01-09 Thread Bart Van Assche
On Tue, 2018-01-09 at 14:44 +0530, Abdul Haleem wrote: > Greeting's, > > Linux next kernel panics on powerpc when module qla2xxx is load/unload. > > Machine Type: Power 8 PowerVM LPAR > Kernel : 4.15.0-rc2-next-20171211 > gcc : version 4.8.5 > Test type: module load/unload few times > > Trace m

Re: [PATCH 5/8] blk-mq: make blk_abort_request() trigger timeout path

2018-01-09 Thread t...@kernel.org
On Mon, Jan 08, 2018 at 10:10:01PM +, Bart Van Assche wrote: > Other req->deadline writes are protected by preempt_disable(), > write_seqcount_begin(&rq->gstate_seq), write_seqcount_end(&rq->gstate_seq) > and preempt_enable(). I think it's fine that the above req->deadline store > does not have

scsi: memory leak in sg_start_req

2018-01-09 Thread Dmitry Vyukov
Hello, syzkaller has found the following memory leak: unreferenced object 0x88004c19 (size 8328): comm "syz-executor", pid 4627, jiffies 4294749150 (age 45.507s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 20 00 00 00 22 01 00 0

[GIT PULL] Block fixes for 4.15 final

2018-01-09 Thread Jens Axboe
Hi Linus, A set of fixes that should go into this release. This pull request contains: - An NVMe pull request from Christoph, with a few critical fixes for NVMe. - A block drain queue fix from Ming. - The concurrent lo_open/release fix for loop. Please pull! git://git.kernel.dk/linux-blo

Re: [PATCH 2/8] blk-mq: protect completion path with RCU

2018-01-09 Thread Bart Van Assche
On Mon, 2018-01-08 at 11:15 -0800, Tejun Heo wrote: > Currently, blk-mq protects only the issue path with RCU. This patch > puts the completion path under the same RCU protection. This will be > used to synchronize issue/completion against timeout by later patches, > which will also add the comme

Re: [PATCH 2/8] blk-mq: protect completion path with RCU

2018-01-09 Thread Jens Axboe
On 1/9/18 9:12 AM, Bart Van Assche wrote: > On Mon, 2018-01-08 at 11:15 -0800, Tejun Heo wrote: >> Currently, blk-mq protects only the issue path with RCU. This patch >> puts the completion path under the same RCU protection. This will be >> used to synchronize issue/completion against timeout by

Re: [PATCH 2/8] blk-mq: protect completion path with RCU

2018-01-09 Thread t...@kernel.org
Hello, Bart. On Tue, Jan 09, 2018 at 04:12:40PM +, Bart Van Assche wrote: > I'm concerned about the additional CPU cycles needed for the new > blk_mq_map_queue() > call, although I know this call is cheap. Would the timeout code really get > that So, if that is really a concern, let's cache

Re: [PATCH 2/8] blk-mq: protect completion path with RCU

2018-01-09 Thread Jens Axboe
On 1/9/18 9:19 AM, t...@kernel.org wrote: > Hello, Bart. > > On Tue, Jan 09, 2018 at 04:12:40PM +, Bart Van Assche wrote: >> I'm concerned about the additional CPU cycles needed for the new >> blk_mq_map_queue() >> call, although I know this call is cheap. Would the timeout code really get >

[PATCH 6/8] blk-mq: remove REQ_ATOM_COMPLETE usages from blk-mq

2018-01-09 Thread Tejun Heo
After the recent updates to use generation number and state based synchronization, blk-mq no longer depends on REQ_ATOM_COMPLETE except to avoid firing the same timeout multiple times. Remove all REQ_ATOM_COMPLETE usages and use a new rq_flags flag RQF_MQ_TIMEOUT_EXPIRED to avoid firing the same t

[PATCH 8/8] blk-mq: rename blk_mq_hw_ctx->queue_rq_srcu to ->srcu

2018-01-09 Thread Tejun Heo
The RCU protection has been expanded to cover both queueing and completion paths making ->queue_rq_srcu a misnomer. Rename it to ->srcu as suggested by Bart. Signed-off-by: Tejun Heo Cc: Bart Van Assche --- block/blk-mq.c | 14 +++--- include/linux/blk-mq.h | 2 +- 2 files cha

[PATCH 4/8] blk-mq: use blk_mq_rq_state() instead of testing REQ_ATOM_COMPLETE

2018-01-09 Thread Tejun Heo
blk_mq_check_inflight() and blk_mq_poll_hybrid_sleep() test REQ_ATOM_COMPLETE to determine the request state. Both uses are speculative and we can test REQ_ATOM_STARTED and blk_mq_rq_state() for equivalent results. Replace the tests. This will allow removing REQ_ATOM_COMPLETE usages from blk-mq.

[PATCH 7/8] blk-mq: remove REQ_ATOM_STARTED

2018-01-09 Thread Tejun Heo
After the recent updates to use generation number and state based synchronization, we can easily replace REQ_ATOM_STARTED usages by adding an extra state to distinguish completed but not yet freed state. Add MQ_RQ_COMPLETE and replace REQ_ATOM_STARTED usages with blk_mq_rq_state() tests. REQ_ATOM
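
The "generation number and state" synchronization mentioned in this series can be illustrated with a small userspace sketch. This is not the kernel code: the names `sketch_rq_state`, `sketch_update()` and the choice of two state bits are assumptions made for illustration only. The idea shown is that a single 64-bit word carries the request state in its low bits and a generation counter in the remaining bits, so a stale snapshot taken before a request was reissued no longer compares equal to the current value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical request states; the names are illustrative only. */
enum sketch_rq_state {
    SKETCH_IDLE      = 0,
    SKETCH_IN_FLIGHT = 1,
    SKETCH_COMPLETE  = 2,
};

/* Assumption: the low two bits hold the state, the rest the generation. */
#define SKETCH_STATE_BITS 2
#define SKETCH_STATE_MASK ((UINT64_C(1) << SKETCH_STATE_BITS) - 1)
#define SKETCH_GEN_INC    (UINT64_C(1) << SKETCH_STATE_BITS)

/* Extract the state from a packed generation+state word. */
static inline enum sketch_rq_state sketch_state(uint64_t gstate)
{
    return (enum sketch_rq_state)(gstate & SKETCH_STATE_MASK);
}

/* Extract the generation counter from a packed word. */
static inline uint64_t sketch_generation(uint64_t gstate)
{
    return gstate >> SKETCH_STATE_BITS;
}

/*
 * Return the packed word after moving to new_state. The generation is
 * bumped each time the request goes in flight, so a snapshot taken
 * during an earlier use of the same request no longer matches.
 */
static inline uint64_t sketch_update(uint64_t gstate,
                                     enum sketch_rq_state new_state)
{
    uint64_t next = (gstate & ~SKETCH_STATE_MASK) | (uint64_t)new_state;

    if (new_state == SKETCH_IN_FLIGHT)
        next += SKETCH_GEN_INC;
    return next;
}
```

A timeout handler built on such a word could record the packed value it observed and act only if the current value still matches, which is one way to avoid firing against a recycled request. The sketch is single-threaded and omits the locking and memory ordering a real implementation needs.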

[PATCH 5/8] blk-mq: make blk_abort_request() trigger timeout path

2018-01-09 Thread Tejun Heo
With issue/complete and timeout paths now using the generation number and state based synchronization, blk_abort_request() is the only one which depends on REQ_ATOM_COMPLETE for arbitrating completion. There's no reason for blk_abort_request() to be a completely separate path. This patch makes bl

[PATCH 3/8] blk-mq: replace timeout synchronization with a RCU and generation based scheme

2018-01-09 Thread Tejun Heo
Currently, blk-mq timeout path synchronizes against the usual issue/completion path using a complex scheme involving atomic bitflags, REQ_ATOM_*, memory barriers and subtle memory coherence rules. Unfortunately, it contains quite a few holes. There's a complex dancing around REQ_ATOM_STARTED and

[PATCH 2/8] blk-mq: protect completion path with RCU

2018-01-09 Thread Tejun Heo
Currently, blk-mq protects only the issue path with RCU. This patch puts the completion path under the same RCU protection. This will be used to synchronize issue/completion against timeout by later patches, which will also add the comments. Signed-off-by: Tejun Heo --- block/blk-mq.c | 5

[PATCH 1/8] blk-mq: move hctx lock/unlock into a helper

2018-01-09 Thread Tejun Heo
From: Jens Axboe Move the RCU vs SRCU logic into lock/unlock helpers, which makes the actual functional bits within the locked region much easier to read. tj: Reordered in front of timeout revamp patches and added the missing blk_mq_run_hw_queue() conversion. Signed-off-by: Jens Axboe Sign

[PATCHSET v5] blk-mq: reimplement timeout handling

2018-01-09 Thread Tejun Heo
Hello, Changes from [v4] - Comments added. Patch description updated. Changes from [v3] - Rebased on top of for-4.16/block. - Integrated Jens's hctx_[un]lock() factoring patch and refreshed the patches accordingly. - Added comment explaining the use of hctx_lock() instead of rcu_read_loc

Re: [PATCH 06/12] IB/core: Add optional PCI P2P flag to rdma_rw_ctx_[init|destroy]()

2018-01-09 Thread Christoph Hellwig
On Mon, Jan 08, 2018 at 12:49:50PM -0700, Jason Gunthorpe wrote: > Pretty sure P2P capable IOMMU hardware exists. > > With SOC's we also have the scenario that an DMA originated from an > on-die device wishes to target an off-die PCI BAR (through the IOMMU), > that definitely exists today, and peo

Re: [PATCH 06/12] IB/core: Add optional PCI P2P flag to rdma_rw_ctx_[init|destroy]()

2018-01-09 Thread Christoph Hellwig
On Mon, Jan 08, 2018 at 12:05:57PM -0700, Logan Gunthorpe wrote: > Ok, so if we shouldn't touch the dma_map infrastructure how should the > workaround to opt-out HFI and QIB look? I'm not that familiar with the RDMA > code. We can add a no_p2p quirk, I'm just not sure what the right place for it

Re: [PATCHSET v5] blk-mq: reimplement timeout handling

2018-01-09 Thread Jens Axboe
On 1/9/18 9:29 AM, Tejun Heo wrote: > Hello, > > Changes from [v4] > > - Comments added. Patch description updated. > > Changes from [v3] > > - Rebased on top of for-4.16/block. > > - Integrated Jens's hctx_[un]lock() factoring patch and refreshed the > patches accordingly. > > - Added com

Re: [PATCH 06/12] IB/core: Add optional PCI P2P flag to rdma_rw_ctx_[init|destroy]()

2018-01-09 Thread Christoph Hellwig
On Mon, Jan 08, 2018 at 12:01:16PM -0700, Jason Gunthorpe wrote: > > So I very much disagree about where to place that workaround - the > > RDMA code is exactly the right place. > > But why? RDMA is using core code to do this. It uses dma_ops in struct > device and it uses normal dma_map SG. How i

Re: [PATCH V4 13/45] block: blk-merge: try to make front segments in full size

2018-01-09 Thread Dmitry Osipenko
On 09.01.2018 17:33, Ming Lei wrote: > On Tue, Jan 09, 2018 at 04:18:39PM +0300, Dmitry Osipenko wrote: >> On 09.01.2018 05:34, Ming Lei wrote: >>> On Tue, Jan 09, 2018 at 12:09:27AM +0300, Dmitry Osipenko wrote: On 18.12.2017 15:22, Ming Lei wrote: > When merging one bvec into segment, if

Re: [PATCH 06/12] IB/core: Add optional PCI P2P flag to rdma_rw_ctx_[init|destroy]()

2018-01-09 Thread Jason Gunthorpe
On Tue, Jan 09, 2018 at 05:46:40PM +0100, Christoph Hellwig wrote: > On Mon, Jan 08, 2018 at 12:49:50PM -0700, Jason Gunthorpe wrote: > > Pretty sure P2P capable IOMMU hardware exists. > > > > With SOC's we also have the scenario that an DMA originated from an > > on-die device wishes to target an

Re: [PATCH 2/5] nvme/multipath: Consult blk_status_t for failover

2018-01-09 Thread Keith Busch
On Mon, Jan 08, 2018 at 01:57:07AM -0800, Christoph Hellwig wrote: > > - if (unlikely(nvme_req(req)->status && nvme_req_needs_retry(req))) { > > - if (nvme_req_needs_failover(req)) { > > + blk_status_t status = nvme_error_status(req); > > + > > + if (unlikely(status != BLK_STS_OK &&

Re: [PATCH 2/5] nvme/multipath: Consult blk_status_t for failover

2018-01-09 Thread Christoph Hellwig
On Tue, Jan 09, 2018 at 10:38:58AM -0700, Keith Busch wrote: > On Mon, Jan 08, 2018 at 01:57:07AM -0800, Christoph Hellwig wrote: > > > - if (unlikely(nvme_req(req)->status && nvme_req_needs_retry(req))) { > > > - if (nvme_req_needs_failover(req)) { > > > + blk_status_t status = nvme_error_

Re: [linux-next][qla2xxx][85caa95]kernel BUG at lib/list_debug.c:31!

2018-01-09 Thread Madhani, Himanshu
Hello Abdul, > On Jan 9, 2018, at 7:54 AM, Bart Van Assche wrote: > > On Tue, 2018-01-09 at 14:44 +0530, Abdul Haleem wrote: >> Greeting's, >> >> Linux next kernel panics on powerpc when module qla2xxx is load/unload. >> >> Machine Type: Power 8 PowerVM LPAR >> Kernel : 4.15.0-rc2-next-20171

[PATCH, resend] blk-mq: Fix spelling in a source code comment

2018-01-09 Thread Bart Van Assche
Change "nedeing" into "needing" and "caes" into "cases". Fixes: commit f906a6a0f426 ("blk-mq: improve tag waiting setup for non-shared tags") Signed-off-by: Bart Van Assche Cc: Christoph Hellwig Cc: Omar Sandoval Cc: Hannes Reinecke Cc: Johannes Thumshirn --- block/blk-mq.c | 4 ++-- 1 file

[PATCH] block: Fix kernel-doc warnings reported when building with W=1

2018-01-09 Thread Bart Van Assche
Commit 3a025e1d1c2e ("Add optional check for bad kernel-doc comments") causes W=1 the kernel-doc script to be run and thereby causes several new warnings to appear when building the kernel with W=1. Fix the block layer kernel-doc headers such that the block layer again builds cleanly with W=1. Sig

Re: [PATCH] block: Fix kernel-doc warnings reported when building with W=1

2018-01-09 Thread Jens Axboe
On 1/9/18 11:11 AM, Bart Van Assche wrote: > Commit 3a025e1d1c2e ("Add optional check for bad kernel-doc comments") > causes W=1 the kernel-doc script to be run and thereby causes several > new warnings to appear when building the kernel with W=1. Fix the > block layer kernel-doc headers such that

Re: [PATCH, resend] blk-mq: Fix spelling in a source code comment

2018-01-09 Thread Jens Axboe
On 1/9/18 11:09 AM, Bart Van Assche wrote: > Change "nedeing" into "needing" and "caes" into "cases". Thanks, applied. -- Jens Axboe

[PATCH 3/4] block: convert REQ_ATOM_COMPLETE to stealing rq->__deadline bit

2018-01-09 Thread Jens Axboe
We only have one atomic flag left. Instead of using an entire unsigned long for that, steal the bottom bit of the deadline field that we already reserved. Remove ->atomic_flags, since it's now unused. Signed-off-by: Jens Axboe --- block/blk-core.c | 2 +- block/blk-mq-debugfs.c | 8
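
The bit-stealing idea described here can be sketched in plain userspace C. Everything below is hypothetical: `struct fake_request` and the `fake_*` helpers are stand-ins invented for illustration, and the operations are deliberately non-atomic, unlike the kernel's `test_and_set_bit()`. The sketch assumes the deadline can afford to lose its lowest bit of resolution, so that bit can hold a "complete" flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a request; not the kernel struct. */
struct fake_request {
    unsigned long deadline_and_flag; /* bit 0: complete flag, rest: deadline */
};

#define FAKE_COMPLETE_BIT 0x1UL

/* Store a deadline, dropping its lowest bit and preserving the flag. */
static inline void fake_set_deadline(struct fake_request *rq,
                                     unsigned long time)
{
    rq->deadline_and_flag = (time & ~FAKE_COMPLETE_BIT) |
                            (rq->deadline_and_flag & FAKE_COMPLETE_BIT);
}

/* Read the deadline with the flag bit masked off. */
static inline unsigned long fake_deadline(const struct fake_request *rq)
{
    return rq->deadline_and_flag & ~FAKE_COMPLETE_BIT;
}

/* Report whether the complete flag is currently set. */
static inline bool fake_is_complete(const struct fake_request *rq)
{
    return (rq->deadline_and_flag & FAKE_COMPLETE_BIT) != 0;
}

/*
 * Set the complete flag and return its previous value.
 * Non-atomic: a real implementation would need an atomic
 * read-modify-write here.
 */
static inline bool fake_mark_complete(struct fake_request *rq)
{
    bool was_set = fake_is_complete(rq);

    rq->deadline_and_flag |= FAKE_COMPLETE_BIT;
    return was_set;
}
```

Preserving the flag inside the setter is a choice made for this sketch, so that updating the deadline cannot silently clear the completion state. The cost of the scheme is one bit of deadline resolution in exchange for dropping a separate flags word.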

[PATCH 4/4] block: rearrange a few request fields for better cache layout

2018-01-09 Thread Jens Axboe
Move completion related items (like the call single data) near the end of the struct, instead of mixing them in with the initial queueing related fields. Move queuelist below the bio structures. Then we have all queueing related bits in the first cache line. This yields a 1.5-2% increase in IOPS

[PATCHSET 0/4] struct request optimizations

2018-01-09 Thread Jens Axboe
With the latest patchset from Tejun, we grew the request structure a little bit. It's been quite a while since I've taken a look at the layout of the structure, this patchset is a first attempt at doing that. One advantage of Tejun's patchset is that we no longer rely on the atomic complete flag o

[PATCH 1/4] block: remove REQ_ATOM_POLL_SLEPT

2018-01-09 Thread Jens Axboe
We don't need this to be an atomic flag, it can be a regular flag. We either end up on the same CPU for the polling, in which case the state is sane, or we did the sleep which would imply the needed barrier to ensure we see the right state. Signed-off-by: Jens Axboe --- block/blk-mq-debugfs.c |

[PATCH 2/4] block: add accessors for setting/querying request deadline

2018-01-09 Thread Jens Axboe
We reduce the resolution of request expiry, but since we're already using jiffies for this where resolution depends on the kernel configuration and since the timeout resolution is coarse anyway, that should be fine. Signed-off-by: Jens Axboe --- block/blk-mq.c | 2 +- block/blk-timeout.

Re: [PATCH 2/4] block: add accessors for setting/querying request deadline

2018-01-09 Thread Bart Van Assche
On Tue, 2018-01-09 at 11:27 -0700, Jens Axboe wrote: > +static inline void blk_rq_set_deadline(struct request *rq, unsigned long > time) > +{ > + rq->__deadline = time & ~0x1; > +} > + > +static inline unsigned long blk_rq_deadline(struct request *rq) > +{ > + return rq->__deadline & ~0x1;

Re: [PATCH 2/4] block: add accessors for setting/querying request deadline

2018-01-09 Thread Jens Axboe
On 1/9/18 11:40 AM, Bart Van Assche wrote: > On Tue, 2018-01-09 at 11:27 -0700, Jens Axboe wrote: >> +static inline void blk_rq_set_deadline(struct request *rq, unsigned long >> time) >> +{ >> +rq->__deadline = time & ~0x1; >> +} >> + >> +static inline unsigned long blk_rq_deadline(struct requ

Re: [PATCH 3/4] block: convert REQ_ATOM_COMPLETE to stealing rq->__deadline bit

2018-01-09 Thread Bart Van Assche
On Tue, 2018-01-09 at 11:27 -0700, Jens Axboe wrote: > static inline int blk_mark_rq_complete(struct request *rq) > { > - return test_and_set_bit(REQ_ATOM_COMPLETE, &rq->atomic_flags); > + return test_and_set_bit(0, &rq->__deadline); > } > > static inline void blk_clear_rq_complete(st

Re: [PATCH 3/4] block: convert REQ_ATOM_COMPLETE to stealing rq->__deadline bit

2018-01-09 Thread Jens Axboe
On 1/9/18 11:43 AM, Bart Van Assche wrote: > On Tue, 2018-01-09 at 11:27 -0700, Jens Axboe wrote: >> static inline int blk_mark_rq_complete(struct request *rq) >> { >> -return test_and_set_bit(REQ_ATOM_COMPLETE, &rq->atomic_flags); >> +return test_and_set_bit(0, &rq->__deadline); >> } >>

Re: [PATCH 3/4] block: convert REQ_ATOM_COMPLETE to stealing rq->__deadline bit

2018-01-09 Thread Jens Axboe
On 1/9/18 11:44 AM, Jens Axboe wrote: > On 1/9/18 11:43 AM, Bart Van Assche wrote: >> On Tue, 2018-01-09 at 11:27 -0700, Jens Axboe wrote: >>> static inline int blk_mark_rq_complete(struct request *rq) >>> { >>> - return test_and_set_bit(REQ_ATOM_COMPLETE, &rq->atomic_flags); >>> + return tes

[PATCHv2 0/5] nvme/dm failover unification

2018-01-09 Thread Keith Busch
Native nvme multipath provided a separate NVMe status decoder, complicating maintenance as new statuses need to be accounted for. This was already diverging from the generic nvme status decoder, which has implications for other components that rely on accurate generic block errors. This series uni

[PATCHv2 1/5] nvme: Add more command status translation

2018-01-09 Thread Keith Busch
This adds more NVMe status code translations to blk_status_t values, and captures all the current status codes NVMe multipath uses. Acked-by: Mike Snitzer Reviewed-by: Hannes Reinecke Signed-off-by: Keith Busch --- drivers/nvme/host/core.c | 7 +++ 1 file changed, 7 insertions(+) diff --g

[PATCHv2 4/5] nvme/multipath: Use blk_path_error

2018-01-09 Thread Keith Busch
Uses common code for determining if an error should be retried on alternate path. Acked-by: Mike Snitzer Reviewed-by: Hannes Reinecke Signed-off-by: Keith Busch --- drivers/nvme/host/multipath.c | 14 +- 1 file changed, 1 insertion(+), 13 deletions(-) diff --git a/drivers/nvme/hos

[PATCHv2 3/5] block: Provide blk_status_t decoding for path errors

2018-01-09 Thread Keith Busch
This patch provides a common decoder for block status path related errors that may be retried so various entities wishing to consult this do not have to duplicate this decision. Acked-by: Mike Snitzer Reviewed-by: Hannes Reinecke Signed-off-by: Keith Busch --- include/linux/blk_types.h | 28 ++
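
A common decoder of this kind might look roughly like the following userspace sketch. The enum values and the function `sketch_path_error()` are hypothetical, and which statuses count as target-side (and therefore not worth retrying on another path) is an assumption of this sketch, not a statement about the patch. The point illustrated is only that the retry decision lives in one function that every multipath consumer calls.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical status codes, loosely modelled on generic block errors. */
enum sketch_status {
    SKETCH_OK,
    SKETCH_NOTSUPP,    /* operation not supported by the device */
    SKETCH_NOSPC,      /* device out of space */
    SKETCH_TARGET,     /* target reported a failure */
    SKETCH_NEXUS,      /* reservation conflict or similar */
    SKETCH_MEDIUM,     /* media error */
    SKETCH_PROTECTION, /* integrity check failed */
    SKETCH_TRANSPORT,  /* link or transport failure */
    SKETCH_IOERR,      /* unclassified I/O error */
};

/*
 * Return true if the error may be specific to the path and so could
 * succeed when retried on an alternate path. Errors assumed to come
 * from the target itself would fail on every path, so they return
 * false. Success is treated as "not a path error" in this sketch.
 */
static inline bool sketch_path_error(enum sketch_status status)
{
    switch (status) {
    case SKETCH_OK:
    case SKETCH_NOTSUPP:
    case SKETCH_NOSPC:
    case SKETCH_TARGET:
    case SKETCH_NEXUS:
    case SKETCH_MEDIUM:
    case SKETCH_PROTECTION:
        return false;
    default:
        /* Anything else is assumed to be retryable elsewhere. */
        return true;
    }
}
```

Defaulting unknown codes to "retryable" means a newly added status fails over unless someone classifies it as a target error, which seems the safer default for a multipath setup.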

[PATCHv2 5/5] dm mpath: Use blk_path_error

2018-01-09 Thread Keith Busch
Uses common code for determining if an error should be retried on alternate path. Acked-by: Mike Snitzer Reviewed-by: Hannes Reinecke Signed-off-by: Keith Busch --- drivers/md/dm-mpath.c | 19 ++- 1 file changed, 2 insertions(+), 17 deletions(-) diff --git a/drivers/md/dm-mpat

[PATCHv2 2/5] nvme/multipath: Consult blk_status_t for failover

2018-01-09 Thread Keith Busch
This removes nvme multipath's specific status decoding to see if failover is needed, using the generic blk_status_t that was decoded earlier. This abstraction from the raw NVMe status means all status decoding exists in one place. Acked-by: Mike Snitzer Reviewed-by: Hannes Reinecke Signed-off-by

[PATCH] bcache: closures: move control bits one bit right

2018-01-09 Thread Michael Lyle
Otherwise, architectures that do negated adds of atomics (e.g. s390) to do atomic_sub fail in closure_set_stopped. Signed-off-by: Michael Lyle Cc: Kent Overstreet Reported-by: kbuild test robot --- drivers/md/bcache/closure.h | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff

Re: [PATCH] bcache: closures: move control bits one bit right

2018-01-09 Thread Jens Axboe
On 1/9/18 12:13 PM, Michael Lyle wrote: > Otherwise, architectures that do negated adds of atomics (e.g. s390) > to do atomic_sub fail in closure_set_stopped. Applied, thanks for fixing that up. -- Jens Axboe

[PATCH] null_blk: wire up timeouts

2018-01-09 Thread Jens Axboe
This is needed to ensure that we actually handle timeouts. Without it, the queue_mode=1 path will never call blk_add_timer(), and the queue_mode=2 path will continually just return EH_RESET_TIMER and we never actually complete the offending request. This was used to test the new timeout code, and

Re: unify the interface of the proportional-share policy in blkio/io

2018-01-09 Thread Tejun Heo
Hello, Paolo. On Thu, Jan 04, 2018 at 08:00:02PM +0100, Paolo Valente wrote: > The solution for the second type of parameters may prove useful to > unify also the computation of statistics for the throttling policy. > > Does this proposal sound reasonable? So, the above should work too but I won

Re: unify the interface of the proportional-share policy in blkio/io

2018-01-09 Thread Jens Axboe
On 1/9/18 12:52 PM, Tejun Heo wrote: > Hello, Paolo. > > On Thu, Jan 04, 2018 at 08:00:02PM +0100, Paolo Valente wrote: >> The solution for the second type of parameters may prove useful to >> unify also the computation of statistics for the throttling policy. >> >> Does this proposal sound reason

[PATCH] null_blk: add option for managing IO timeouts

2018-01-09 Thread Jens Axboe
Use the fault injection framework to provide a way for null_blk to configure timeouts. This only works for queue_mode 1 and 2, since the bio mode doesn't have code for tracking timeouts. Let's say you want to have a 10% chance of timing out every 100,000 requests, and for 5 total timeouts, you cou

[for-4.16 PATCH 1/2] block: cope with gendisk's 'queue' being added later

2018-01-09 Thread Mike Snitzer
Since I can remember DM has forced the block layer to allow the allocation and initialization of the request_queue to be distinct operations. Reason for this was block/genhd.c:add_disk() has required that the request_queue (and associated bdi) be tied to the gendisk before add_disk() is called --

[for-4.16 PATCH 0/2] block: cope with gendisk's 'queue' being added later

2018-01-09 Thread Mike Snitzer
Hi Jens, Please consider PATCH 1/2 for 4.16 inclusion. I've included PATCH 2/2 to show the DM changes needed in order to make use of the block changes. I think, all things considered, this moves DM's interface with block core in a better direction for the long-term but I obviously welcome any cr

[for-4.16 PATCH 2/2] dm: fix awkward request_queue initialization

2018-01-09 Thread Mike Snitzer
Fix DM so that it no longer creates a place-holder request_queue that doesn't reflect the actual way the request_queue will ultimately be used, only to have to backfill proper queue and queue_limits initialization. Instead, DM creates a gendisk that initially doesn't have a request_queue at all. Th

Re: unify the interface of the proportional-share policy in blkio/io

2018-01-09 Thread Paolo Valente
> Il giorno 09 gen 2018, alle ore 20:53, Jens Axboe ha > scritto: > > On 1/9/18 12:52 PM, Tejun Heo wrote: >> Hello, Paolo. >> >> On Thu, Jan 04, 2018 at 08:00:02PM +0100, Paolo Valente wrote: >>> The solution for the second type of parameters may prove useful to >>> unify also the computatio

Re: [for-4.16 PATCH 1/2] block: cope with gendisk's 'queue' being added later

2018-01-09 Thread Bart Van Assche
On Tue, 2018-01-09 at 17:10 -0500, Mike Snitzer wrote: > diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c > index 870484eaed1f..0b0dda8e2420 100644 > --- a/block/blk-sysfs.c > +++ b/block/blk-sysfs.c > @@ -919,8 +919,20 @@ int blk_register_queue(struct gendisk *disk) > ret = 0; > unlock: >

Re: [for-4.16 PATCH 1/2] block: cope with gendisk's 'queue' being added later

2018-01-09 Thread Mike Snitzer
On Tue, Jan 09 2018 at 6:04pm -0500, Bart Van Assche wrote: > On Tue, 2018-01-09 at 17:10 -0500, Mike Snitzer wrote: > > diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c > > index 870484eaed1f..0b0dda8e2420 100644 > > --- a/block/blk-sysfs.c > > +++ b/block/blk-sysfs.c > > @@ -919,8 +919,20 @@

Re: [PATCH 2/4] block: Introduce blk_wait_if_quiesced() and blk_finish_wait_if_quiesced()

2018-01-09 Thread Bart Van Assche
On Tue, 2018-01-09 at 07:41 +0100, Hannes Reinecke wrote: > I'm always a bit cautious when having rcu_read_lock() and > rcu_read_unlock() in two separate functions. > Can we make this dependency more explicit by renaming the first function > to blk_start_wait_if_quiesced() and updating the comment

[PATCHSET v2 0/4] struct request optimizations

2018-01-09 Thread Jens Axboe
With the latest patchset from Tejun, we grew the request structure a little bit. It's been quite a while since I've taken a look at the layout of the structure, this patchset is a first attempt at doing that. One advantage of Tejun's patchset is that we no longer rely on the atomic complete flag o

[PATCH 4/4] block: rearrange a few request fields for better cache layout

2018-01-09 Thread Jens Axboe
Move completion related items (like the call single data) near the end of the struct, instead of mixing them in with the initial queueing related fields. Move queuelist below the bio structures. Then we have all queueing related bits in the first cache line. This yields a 1.5-2% increase in IOPS

[PATCH 2/4] block: add accessors for setting/querying request deadline

2018-01-09 Thread Jens Axboe
We reduce the resolution of request expiry, but since we're already using jiffies for this where resolution depends on the kernel configuration and since the timeout resolution is coarse anyway, that should be fine. Signed-off-by: Jens Axboe --- block/blk-mq.c | 2 +- block/blk-timeout.

[PATCH 1/4] block: remove REQ_ATOM_POLL_SLEPT

2018-01-09 Thread Jens Axboe
We don't need this to be an atomic flag, it can be a regular flag. We either end up on the same CPU for the polling, in which case the state is sane, or we did the sleep which would imply the needed barrier to ensure we see the right state. Signed-off-by: Jens Axboe --- block/blk-mq-debugfs.c |

[PATCH 3/4] block: convert REQ_ATOM_COMPLETE to stealing rq->__deadline bit

2018-01-09 Thread Jens Axboe
We only have one atomic flag left. Instead of using an entire unsigned long for that, steal the bottom bit of the deadline field that we already reserved. Remove ->atomic_flags, since it's now unused. Signed-off-by: Jens Axboe --- block/blk-core.c | 2 +- block/blk-mq-debugfs.c | 8
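The bit-stealing idea described here, together with the reduced expiry resolution mentioned in patch 2/4, can be sketched in user-space C. This is a minimal illustration under stated assumptions, not the code from the patch: the struct and every function name are hypothetical, C11 atomics stand in for the kernel's bit operations, and preserving the flag when the deadline is rewritten is a choice made for this sketch.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the request; only the shared word is modeled. */
struct req_sketch {
	atomic_ulong deadline;	/* bit 0: complete flag, other bits: expiry */
};

#define REQ_COMPLETE_BIT 0x1UL

/* Store an expiry time: drop its lowest bit and keep the current flag. */
static void req_set_deadline(struct req_sketch *rq, unsigned long time)
{
	unsigned long old = atomic_load(&rq->deadline);
	unsigned long new;

	do {
		new = (time & ~REQ_COMPLETE_BIT) | (old & REQ_COMPLETE_BIT);
	} while (!atomic_compare_exchange_weak(&rq->deadline, &old, new));
}

/* Read the expiry time with the flag bit masked out. */
static unsigned long req_deadline(struct req_sketch *rq)
{
	return atomic_load(&rq->deadline) & ~REQ_COMPLETE_BIT;
}

/* Set the flag atomically; returns true if it was already set. */
static bool req_mark_complete(struct req_sketch *rq)
{
	return atomic_fetch_or(&rq->deadline, REQ_COMPLETE_BIT) & REQ_COMPLETE_BIT;
}

/* Clear the flag without disturbing the expiry bits. */
static void req_clear_complete(struct req_sketch *rq)
{
	atomic_fetch_and(&rq->deadline, ~REQ_COMPLETE_BIT);
}

/* Test the flag. */
static bool req_is_complete(struct req_sketch *rq)
{
	return atomic_load(&rq->deadline) & REQ_COMPLETE_BIT;
}
```

Because the lowest bit is reserved for the flag, the stored expiry is rounded down to an even value, which is the loss of resolution that the accessor patch describes as acceptable for a jiffies-based timeout.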

Re: [for-4.16 PATCH 1/2] block: cope with gendisk's 'queue' being added later

2018-01-09 Thread Mike Snitzer
On Tue, Jan 09 2018 at 6:41pm -0500, Mike Snitzer wrote: > On Tue, Jan 09 2018 at 6:04pm -0500, > Bart Van Assche wrote: > > > On Tue, 2018-01-09 at 17:10 -0500, Mike Snitzer wrote: > > > diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c > > > index 870484eaed1f..0b0dda8e2420 100644 > > > --

Re: [PATCH BUGFIX V2 1/2] block, bfq: put async queues for root bfq groups too

2018-01-09 Thread Guoqing Jiang
On 01/09/2018 05:27 PM, Paolo Valente wrote: For each pair [device for which bfq is selected as I/O scheduler, group in blkio/io], bfq maintains a corresponding bfq group. Each such bfq group contains a set of async queues, with each async queue created on demand, i.e., when some I/O request ar

Re: [RFC PATCH] blk-throttle: dispatch more sync writes in block throttle layer

2018-01-09 Thread xuejiufei
Hi Tejun, On 2018/1/9 10:56 PM, Tejun Heo wrote: > Hello, > > On Tue, Jan 09, 2018 at 12:45:13PM +0800, xuejiufei wrote: >> 1. A bio is charged according to the direction, if we put the reads >> and sync writes together, we need to search the queue to pick a >> certain number of read and write IOs

[for-4.16 PATCH v2 3/3] dm: fix awkward request_queue initialization

2018-01-09 Thread Mike Snitzer
Fix DM so that it no longer creates a place-holder request_queue that doesn't reflect the actual way the request_queue will ultimately be used, only to have to backfill proper queue and queue_limits initialization. Instead, DM creates a gendisk that initially doesn't have a request_queue at all. Th

Re: [PATCH V4 13/45] block: blk-merge: try to make front segments in full size

2018-01-09 Thread Ming Lei
On Tue, Jan 09, 2018 at 08:02:53PM +0300, Dmitry Osipenko wrote: > On 09.01.2018 17:33, Ming Lei wrote: > > On Tue, Jan 09, 2018 at 04:18:39PM +0300, Dmitry Osipenko wrote: > >> On 09.01.2018 05:34, Ming Lei wrote: > >>> On Tue, Jan 09, 2018 at 12:09:27AM +0300, Dmitry Osipenko wrote: > On 18.

[for-4.16 PATCH v2 1/3] block: only bdi_unregister() in del_gendisk() if !GENHD_FL_HIDDEN

2018-01-09 Thread Mike Snitzer
device_add_disk() will only call bdi_register_owner() if !GENHD_FL_HIDDEN so it follows that bdi_unregister() should be avoided for !GENHD_FL_HIDDEN in del_gendisk(). Found with code inspection. bdi_unregister() won't do much harm if bdi_register_owner() wasn't used but best to avoid it. Fixes:

[for-4.16 PATCH v2 2/3] block: cope with gendisk's 'queue' being added later

2018-01-09 Thread Mike Snitzer
For as long as I can remember, DM has forced the block layer to allow the allocation and initialization of the request_queue to be distinct operations. The reason for this was that block/genhd.c:add_disk() has required that the request_queue (and associated bdi) be tied to the gendisk before add_disk() is called --

[for-4.16 PATCH v2 0/3] block: some genhd changes

2018-01-09 Thread Mike Snitzer
Hi Jens, Please consider PATCH 1 and 2 for 4.16 inclusion. I've included PATCH 3 to show the DM changes needed in order to make use of these block changes. v2: added some del_gendisk() code movement to be symmetric with the device_add_disk() related code movement (suggested by Bart). Also

[PATCH] Revert "block: blk-merge: try to make front segments in full size"

2018-01-09 Thread Ming Lei
This reverts commit a2d37968d784363842f87820a21e106741d28004. If max segment size isn't 512-aligned, this patch won't work well. Also once multipage bvec is enabled, adjacent bvecs won't be physically contiguous if page is added via bio_add_page(), so we don't need this kind of complicated logic.

Re: [dm-devel] [for-4.16 PATCH v2 2/3] block: cope with gendisk's 'queue' being added later

2018-01-09 Thread Ming Lei
Hi Mike, On Wed, Jan 10, 2018 at 10:41 AM, Mike Snitzer wrote: > Since I can remember DM has forced the block layer to allow the > allocation and initialization of the request_queue to be distinct > operations. Reason for this was block/genhd.c:add_disk() has required > that the request_queue (a

Re: [PATCH] Revert "block: blk-merge: try to make front segments in full size"

2018-01-09 Thread Jens Axboe
On 1/9/18 7:51 PM, Ming Lei wrote: > This reverts commit a2d37968d784363842f87820a21e106741d28004. > > If max segment size isn't 512-aligned, this patch won't work well. > > Also once multipage bvec is enabled, adjacent bvecs won't be physically > contiguous if page is added via bio_add_page(), s

Re: [for-4.16 PATCH v2 2/3] block: cope with gendisk's 'queue' being added later

2018-01-09 Thread Mike Snitzer
On Tue, Jan 09 2018 at 10:46pm -0500, Ming Lei wrote: > Hi Mike, > > On Wed, Jan 10, 2018 at 10:41 AM, Mike Snitzer wrote: > > Since I can remember DM has forced the block layer to allow the > > allocation and initialization of the request_queue to be distinct > > operations. Reason for this w

Re: [for-4.16 PATCH v2 1/3] block: only bdi_unregister() in del_gendisk() if !GENHD_FL_HIDDEN

2018-01-09 Thread Mike Snitzer
On Tue, Jan 09 2018 at 9:41pm -0500, Mike Snitzer wrote: > device_add_disk() will only call bdi_register_owner() if > !GENHD_FL_HIDDEN so it follows that bdi_unregister() should be avoided > for !GENHD_FL_HIDDEN in del_gendisk(). This ^ should be "GENHD_FL_HIDDEN". Sorry for the cut-n-paste t

BUG: unable to handle kernel NULL pointer dereference at 0000000000000436

2018-01-09 Thread Ming Lei
Hi Paolo, It looks like this one was introduced in a recent merge; it is triggered by a test of I/O vs. device removal on the latest for-next of the block tree: [ 296.151615] BUG: unable to handle kernel NULL pointer dereference at 0436 [ 296.152302] IP: percpu_counter_add_batch+0x25/0x9d [ 29

[PATCH] blk-throttle: avoid double counted

2018-01-09 Thread xuejiufei
From: Jiufei Xue If a bio is split after being counted in stat_bytes and stat_ios in blkcg_bio_issue_check(), the bio can be resubmitted and enter the block throttle layer again. This causes part of the bio to be counted twice. The flag BIO_THROTTLED cannot be used to fix this problem cons

Re: [PATCH BUGFIX V2 1/2] block, bfq: put async queues for root bfq groups too

2018-01-09 Thread Paolo Valente
> On 10 Jan 2018, at 02:41, Guoqing Jiang wrote: > > > > On 01/09/2018 05:27 PM, Paolo Valente wrote: >> For each pair [device for which bfq is selected as I/O scheduler, >> group in blkio/io], bfq maintains a corresponding bfq group. Each such >> bfq group contains a set
