Greetings,
Linux next kernel panics on powerpc when the qla2xxx module is loaded/unloaded.
Machine Type: Power 8 PowerVM LPAR
Kernel : 4.15.0-rc2-next-20171211
gcc : version 4.8.5
Test type: module load/unload a few times
Trace messages:
---
qla2xxx [0000:00:00.0]-0005: : QLogic Fibre Channe
Hi Jens,
these two patches fix two related memory leaks, the first reported in
[1], and the second found by ourselves while fixing the first.
Thanks,
Paolo
[1] https://www.mail-archive.com/linux-block@vger.kernel.org/msg16258.html
Paolo Valente (2):
block, bfq: put async queues for root bfq gr
On scheduler init, a reference to the root group, and a reference to
its corresponding blkg are taken for the oom queue. Yet these
references are not released on scheduler exit, which prevents these
objects from being freed. This commit adds the missing reference
releases.
Reported-by: Davide Ferrari
For each pair [device for which bfq is selected as I/O scheduler,
group in blkio/io], bfq maintains a corresponding bfq group. Each such
bfq group contains a set of async queues, with each async queue
created on demand, i.e., when some I/O request arrives for it. On
creation, an async queue gets a
On scheduler init, a reference to the root group, and a reference to
its corresponding blkg are taken for the oom queue. Yet these
references are not released on scheduler exit, which prevents these
objects from being freed. This commit adds the missing reference
releases.
Reported-by: Davide Ferrari
[There was a mistake in the subject of the second patch, sorry]
Hi Jens,
these two patches fix two related memory leaks, the first reported in
[1], and the second found by ourselves while fixing the first.
Thanks,
Paolo
[1] https://www.mail-archive.com/linux-block@vger.kernel.org/msg16258.html
For each pair [device for which bfq is selected as I/O scheduler,
group in blkio/io], bfq maintains a corresponding bfq group. Each such
bfq group contains a set of async queues, with each async queue
created on demand, i.e., when some I/O request arrives for it. On
creation, an async queue gets a
> On 21 Dec 2017, at 10:14, Guoqing Jiang wrote:
>
>
>
> On 12/21/2017 03:53 PM, Paolo Valente wrote:
>>
>>> On 21 Dec 2017, at 08:08, Guoqing Jiang wrote:
>>>
>>> Hi,
>>>
>>>
>>> On 12/08/2017 08:34 AM, Holger Hoffstätte wrote:
So pluggin
On 09.01.2018 05:34, Ming Lei wrote:
> On Tue, Jan 09, 2018 at 12:09:27AM +0300, Dmitry Osipenko wrote:
>> On 18.12.2017 15:22, Ming Lei wrote:
>>> When merging one bvec into segment, if the bvec is too big
>>> to merge, current policy is to move the whole bvec into another
>>> new segment.
>>>
>>>
HW queues may be unmapped in some cases, such as blk_mq_update_nr_hw_queues(),
so we need to check for that before calling blk_mq_tag_idle(); otherwise
the following kernel oops can be triggered. Fix it by checking whether
the hw queue is unmapped, since it doesn't make sense to idle the tags
any more aft
On Tue, Jan 09, 2018 at 04:18:39PM +0300, Dmitry Osipenko wrote:
> On 09.01.2018 05:34, Ming Lei wrote:
> > On Tue, Jan 09, 2018 at 12:09:27AM +0300, Dmitry Osipenko wrote:
> >> On 18.12.2017 15:22, Ming Lei wrote:
> >>> When merging one bvec into segment, if the bvec is too big
> >>> to merge, cur
Hello,
On Tue, Jan 09, 2018 at 12:45:13PM +0800, xuejiufei wrote:
> 1. A bio is charged according to the direction, if we put the reads
> and sync writes together, we need to search the queue to pick a
> certain number of read and write IOs when the limit is not reached.
Ah, you're right.
> 2. I
On 1/9/18 12:08 AM, Hannes Reinecke wrote:
> On 01/08/2018 08:15 PM, Tejun Heo wrote:
>> Currently, blk-mq protects only the issue path with RCU. This patch
>> puts the completion path under the same RCU protection. This will be
>> used to synchronize issue/completion against timeout by later pat
On Tue, Jan 09 2018, Ming Lei wrote:
> HW queues may be unmapped in some cases, such as blk_mq_update_nr_hw_queues(),
> then we need to check it before calling blk_mq_tag_idle(), otherwise
> the following kernel oops can be triggered, so fix it by checking if
> the hw queue is unmapped since it doe
On 01/09/18 10:27, Paolo Valente wrote:
> [There was a mistake in the subject of the second patch, sorry]
>
> Hi Jens,
> these two patches fix two related memory leaks, the first reported in
> [1], and the second found by ourselves while fixing the first.
>
> Thanks,
> Paolo
>
> [1] https://www.
On 1/9/18 8:29 AM, Jens Axboe wrote:
> On Tue, Jan 09 2018, Ming Lei wrote:
>> HW queues may be unmapped in some cases, such as
>> blk_mq_update_nr_hw_queues(),
>> then we need to check it before calling blk_mq_tag_idle(), otherwise
>> the following kernel oops can be triggered, so fix it by check
On Tue, Jan 09 2018, Paolo Valente wrote:
> Hi Jens,
> these two patches fix two related memory leaks, the first reported in
> [1], and the second found by ourselves while fixing the first.
Thanks, applied for 4.16.
--
Jens Axboe
On Mon, Jan 08, 2018 at 09:06:55PM +0000, Bart Van Assche wrote:
> On Mon, 2018-01-08 at 11:15 -0800, Tejun Heo wrote:
> > +static void blk_mq_rq_update_aborted_gstate(struct request *rq, u64 gstate)
> > +{
> > + unsigned long flags;
> > +
> > + local_irq_save(flags);
> > + u64_stats_update_b
On Mon, Jan 08, 2018 at 11:29:11PM +0000, Bart Van Assche wrote:
> Does "gstate" perhaps stand for "generation number and state"? If so, please
> mention this in one of the above comments.
Yeah, will do.
Thanks.
--
tejun
On Tue, Jan 09, 2018 at 08:38:55AM -0700, Jens Axboe wrote:
> On 1/9/18 8:29 AM, Jens Axboe wrote:
> > On Tue, Jan 09 2018, Ming Lei wrote:
> >> HW queues may be unmapped in some cases, such as
> >> blk_mq_update_nr_hw_queues(),
> >> then we need to check it before calling blk_mq_tag_idle(), other
On Tue, 2018-01-09 at 14:44 +0530, Abdul Haleem wrote:
> Greeting's,
>
> Linux next kernel panics on powerpc when module qla2xxx is load/unload.
>
> Machine Type: Power 8 PowerVM LPAR
> Kernel : 4.15.0-rc2-next-20171211
> gcc : version 4.8.5
> Test type: module load/unload few times
>
> Trace m
On Mon, Jan 08, 2018 at 10:10:01PM +0000, Bart Van Assche wrote:
> Other req->deadline writes are protected by preempt_disable(),
> write_seqcount_begin(&rq->gstate_seq), write_seqcount_end(&rq->gstate_seq)
> and preempt_enable(). I think it's fine that the above req->deadline store
> does not have
Hello,
syzkaller has found the following memory leak:
unreferenced object 0x88004c19 (size 8328):
comm "syz-executor", pid 4627, jiffies 4294749150 (age 45.507s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20 00 00 00 22 01 00 0
Hi Linus,
A set of fixes that should go into this release. This pull request
contains:
- An NVMe pull request from Christoph, with a few critical fixes for
NVMe.
- A block drain queue fix from Ming.
- The concurrent lo_open/release fix for loop.
Please pull!
git://git.kernel.dk/linux-blo
On Mon, 2018-01-08 at 11:15 -0800, Tejun Heo wrote:
> Currently, blk-mq protects only the issue path with RCU. This patch
> puts the completion path under the same RCU protection. This will be
> used to synchronize issue/completion against timeout by later patches,
> which will also add the comme
On 1/9/18 9:12 AM, Bart Van Assche wrote:
> On Mon, 2018-01-08 at 11:15 -0800, Tejun Heo wrote:
>> Currently, blk-mq protects only the issue path with RCU. This patch
>> puts the completion path under the same RCU protection. This will be
>> used to synchronize issue/completion against timeout by
Hello, Bart.
On Tue, Jan 09, 2018 at 04:12:40PM +0000, Bart Van Assche wrote:
> I'm concerned about the additional CPU cycles needed for the new
> blk_mq_map_queue()
> call, although I know this call is cheap. Would the timeout code really get
> that
So, if that is really a concern, let's cache
On 1/9/18 9:19 AM, t...@kernel.org wrote:
> Hello, Bart.
>
> On Tue, Jan 09, 2018 at 04:12:40PM +0000, Bart Van Assche wrote:
>> I'm concerned about the additional CPU cycles needed for the new
>> blk_mq_map_queue()
>> call, although I know this call is cheap. Would the timeout code really get
>
After the recent updates to use generation number and state based
synchronization, blk-mq no longer depends on REQ_ATOM_COMPLETE except
to avoid firing the same timeout multiple times.
Remove all REQ_ATOM_COMPLETE usages and use a new rq_flags flag
RQF_MQ_TIMEOUT_EXPIRED to avoid firing the same t
The RCU protection has been expanded to cover both queueing and
completion paths making ->queue_rq_srcu a misnomer. Rename it to
->srcu as suggested by Bart.
Signed-off-by: Tejun Heo
Cc: Bart Van Assche
---
block/blk-mq.c | 14 +++---
include/linux/blk-mq.h | 2 +-
2 files cha
blk_mq_check_inflight() and blk_mq_poll_hybrid_sleep() test
REQ_ATOM_COMPLETE to determine the request state. Both uses are
speculative and we can test REQ_ATOM_STARTED and blk_mq_rq_state() for
equivalent results. Replace the tests. This will allow removing
REQ_ATOM_COMPLETE usages from blk-mq.
After the recent updates to use generation number and state based
synchronization, we can easily replace REQ_ATOM_STARTED usages by
adding an extra state to distinguish completed but not yet freed
state.
Add MQ_RQ_COMPLETE and replace REQ_ATOM_STARTED usages with
blk_mq_rq_state() tests. REQ_ATOM
With issue/complete and timeout paths now using the generation number
and state based synchronization, blk_abort_request() is the only one
which depends on REQ_ATOM_COMPLETE for arbitrating completion.
There's no reason for blk_abort_request() to be a completely separate
path. This patch makes bl
Currently, blk-mq timeout path synchronizes against the usual
issue/completion path using a complex scheme involving atomic
bitflags, REQ_ATOM_*, memory barriers and subtle memory coherence
rules. Unfortunately, it contains quite a few holes.
There's a complex dancing around REQ_ATOM_STARTED and
Currently, blk-mq protects only the issue path with RCU. This patch
puts the completion path under the same RCU protection. This will be
used to synchronize issue/completion against timeout by later patches,
which will also add the comments.
Signed-off-by: Tejun Heo
---
block/blk-mq.c | 5
From: Jens Axboe
Move the RCU vs SRCU logic into lock/unlock helpers, which makes
the actual functional bits within the locked region much easier
to read.
tj: Reordered in front of timeout revamp patches and added the missing
blk_mq_run_hw_queue() conversion.
Signed-off-by: Jens Axboe
Sign
Hello,
Changes from [v4]
- Comments added. Patch description updated.
Changes from [v3]
- Rebased on top of for-4.16/block.
- Integrated Jens's hctx_[un]lock() factoring patch and refreshed the
patches accordingly.
- Added comment explaining the use of hctx_lock() instead of
rcu_read_loc
On Mon, Jan 08, 2018 at 12:49:50PM -0700, Jason Gunthorpe wrote:
> Pretty sure P2P capable IOMMU hardware exists.
>
> With SoCs we also have the scenario where a DMA originating from an
> on-die device wishes to target an off-die PCI BAR (through the IOMMU),
> that definitely exists today, and peo
On Mon, Jan 08, 2018 at 12:05:57PM -0700, Logan Gunthorpe wrote:
> Ok, so if we shouldn't touch the dma_map infrastructure how should the
> workaround to opt-out HFI and QIB look? I'm not that familiar with the RDMA
> code.
We can add a no_p2p quirk, I'm just not sure what the right place
for it
On 1/9/18 9:29 AM, Tejun Heo wrote:
> Hello,
>
> Changes from [v4]
>
> - Comments added. Patch description updated.
>
> Changes from [v3]
>
> - Rebased on top of for-4.16/block.
>
> - Integrated Jens's hctx_[un]lock() factoring patch and refreshed the
> patches accordingly.
>
> - Added com
On Mon, Jan 08, 2018 at 12:01:16PM -0700, Jason Gunthorpe wrote:
> > So I very much disagree about where to place that workaround - the
> > RDMA code is exactly the right place.
>
> But why? RDMA is using core code to do this. It uses dma_ops in struct
> device and it uses normal dma_map SG. How i
On 09.01.2018 17:33, Ming Lei wrote:
> On Tue, Jan 09, 2018 at 04:18:39PM +0300, Dmitry Osipenko wrote:
>> On 09.01.2018 05:34, Ming Lei wrote:
>>> On Tue, Jan 09, 2018 at 12:09:27AM +0300, Dmitry Osipenko wrote:
>>>> On 18.12.2017 15:22, Ming Lei wrote:
>>>>> When merging one bvec into segment, if
On Tue, Jan 09, 2018 at 05:46:40PM +0100, Christoph Hellwig wrote:
> On Mon, Jan 08, 2018 at 12:49:50PM -0700, Jason Gunthorpe wrote:
> > Pretty sure P2P capable IOMMU hardware exists.
> >
> > With SOC's we also have the scenario that an DMA originated from an
> > on-die device wishes to target an
On Mon, Jan 08, 2018 at 01:57:07AM -0800, Christoph Hellwig wrote:
> > - if (unlikely(nvme_req(req)->status && nvme_req_needs_retry(req))) {
> > - if (nvme_req_needs_failover(req)) {
> > + blk_status_t status = nvme_error_status(req);
> > +
> > + if (unlikely(status != BLK_STS_OK &&
On Tue, Jan 09, 2018 at 10:38:58AM -0700, Keith Busch wrote:
> On Mon, Jan 08, 2018 at 01:57:07AM -0800, Christoph Hellwig wrote:
> > > - if (unlikely(nvme_req(req)->status && nvme_req_needs_retry(req))) {
> > > - if (nvme_req_needs_failover(req)) {
> > > + blk_status_t status = nvme_error_
Hello Abdul,
> On Jan 9, 2018, at 7:54 AM, Bart Van Assche wrote:
>
> On Tue, 2018-01-09 at 14:44 +0530, Abdul Haleem wrote:
>> Greeting's,
>>
>> Linux next kernel panics on powerpc when module qla2xxx is load/unload.
>>
>> Machine Type: Power 8 PowerVM LPAR
>> Kernel : 4.15.0-rc2-next-20171
Change "nedeing" into "needing" and "caes" into "cases".
Fixes: f906a6a0f426 ("blk-mq: improve tag waiting setup for non-shared tags")
Signed-off-by: Bart Van Assche
Cc: Christoph Hellwig
Cc: Omar Sandoval
Cc: Hannes Reinecke
Cc: Johannes Thumshirn
---
block/blk-mq.c | 4 ++--
1 file
Commit 3a025e1d1c2e ("Add optional check for bad kernel-doc comments")
causes the kernel-doc script to be run for W=1 builds and thereby causes several
new warnings to appear when building the kernel with W=1. Fix the
block layer kernel-doc headers such that the block layer again builds
cleanly with W=1.
Sig
On 1/9/18 11:11 AM, Bart Van Assche wrote:
> Commit 3a025e1d1c2e ("Add optional check for bad kernel-doc comments")
> causes W=1 the kernel-doc script to be run and thereby causes several
> new warnings to appear when building the kernel with W=1. Fix the
> block layer kernel-doc headers such that
On 1/9/18 11:09 AM, Bart Van Assche wrote:
> Change "nedeing" into "needing" and "caes" into "cases".
Thanks, applied.
--
Jens Axboe
We only have one atomic flag left. Instead of using an entire
unsigned long for that, steal the bottom bit of the deadline
field that we already reserved.
Remove ->atomic_flags, since it's now unused.
Signed-off-by: Jens Axboe
---
block/blk-core.c | 2 +-
block/blk-mq-debugfs.c | 8
Move completion related items (like the call single data) near the
end of the struct, instead of mixing them in with the initial
queueing related fields.
Move queuelist below the bio structures. Then we have all
queueing related bits in the first cache line.
This yields a 1.5-2% increase in IOPS
With the latest patchset from Tejun, we grew the request structure
a little bit. It's been quite a while since I've taken a look at
the layout of the structure, this patchset is a first attempt at
doing that.
One advantage of Tejun's patchset is that we no longer rely on
the atomic complete flag o
We don't need this to be an atomic flag, it can be a regular
flag. We either end up on the same CPU for the polling, in which
case the state is sane, or we did the sleep which would imply
the needed barrier to ensure we see the right state.
Signed-off-by: Jens Axboe
---
block/blk-mq-debugfs.c |
We reduce the resolution of request expiry, but since we're already
using jiffies for this where resolution depends on the kernel
configuration and since the timeout resolution is coarse anyway,
that should be fine.
Signed-off-by: Jens Axboe
---
block/blk-mq.c | 2 +-
block/blk-timeout.
On Tue, 2018-01-09 at 11:27 -0700, Jens Axboe wrote:
> +static inline void blk_rq_set_deadline(struct request *rq, unsigned long
> time)
> +{
> + rq->__deadline = time & ~0x1;
> +}
> +
> +static inline unsigned long blk_rq_deadline(struct request *rq)
> +{
> + return rq->__deadline & ~0x1;
On 1/9/18 11:40 AM, Bart Van Assche wrote:
> On Tue, 2018-01-09 at 11:27 -0700, Jens Axboe wrote:
>> +static inline void blk_rq_set_deadline(struct request *rq, unsigned long
>> time)
>> +{
>> +rq->__deadline = time & ~0x1;
>> +}
>> +
>> +static inline unsigned long blk_rq_deadline(struct requ
On Tue, 2018-01-09 at 11:27 -0700, Jens Axboe wrote:
> static inline int blk_mark_rq_complete(struct request *rq)
> {
> - return test_and_set_bit(REQ_ATOM_COMPLETE, &rq->atomic_flags);
> + return test_and_set_bit(0, &rq->__deadline);
> }
>
> static inline void blk_clear_rq_complete(st
On 1/9/18 11:43 AM, Bart Van Assche wrote:
> On Tue, 2018-01-09 at 11:27 -0700, Jens Axboe wrote:
>> static inline int blk_mark_rq_complete(struct request *rq)
>> {
>> -return test_and_set_bit(REQ_ATOM_COMPLETE, &rq->atomic_flags);
>> +return test_and_set_bit(0, &rq->__deadline);
>> }
>>
On 1/9/18 11:44 AM, Jens Axboe wrote:
> On 1/9/18 11:43 AM, Bart Van Assche wrote:
>> On Tue, 2018-01-09 at 11:27 -0700, Jens Axboe wrote:
>>> static inline int blk_mark_rq_complete(struct request *rq)
>>> {
>>> - return test_and_set_bit(REQ_ATOM_COMPLETE, &rq->atomic_flags);
>>> + return tes
Native nvme multipath provided a separate NVMe status decoder,
complicating maintenance as new statuses need to be accounted for. This
was already diverging from the generic nvme status decoder, which has
implications for other components that rely on accurate generic block
errors.
This series uni
This adds more NVMe status code translations to blk_status_t values,
and captures all the current status codes NVMe multipath uses.
Acked-by: Mike Snitzer
Reviewed-by: Hannes Reinecke
Signed-off-by: Keith Busch
---
drivers/nvme/host/core.c | 7 +++
1 file changed, 7 insertions(+)
diff --g
Uses common code for determining if an error should be retried on
alternate path.
Acked-by: Mike Snitzer
Reviewed-by: Hannes Reinecke
Signed-off-by: Keith Busch
---
drivers/nvme/host/multipath.c | 14 +-
1 file changed, 1 insertion(+), 13 deletions(-)
diff --git a/drivers/nvme/hos
This patch provides a common decoder for block status path related errors
that may be retried so various entities wishing to consult this do not
have to duplicate this decision.
Acked-by: Mike Snitzer
Reviewed-by: Hannes Reinecke
Signed-off-by: Keith Busch
---
include/linux/blk_types.h | 28 ++
Uses common code for determining if an error should be retried on
alternate path.
Acked-by: Mike Snitzer
Reviewed-by: Hannes Reinecke
Signed-off-by: Keith Busch
---
drivers/md/dm-mpath.c | 19 ++-
1 file changed, 2 insertions(+), 17 deletions(-)
diff --git a/drivers/md/dm-mpat
This removes nvme multipath's specific status decoding to see if failover
is needed, using the generic blk_status_t that was decoded earlier. This
abstraction from the raw NVMe status means all status decoding exists
in one place.
Acked-by: Mike Snitzer
Reviewed-by: Hannes Reinecke
Signed-off-by
Otherwise, architectures that implement atomic_sub via a negated
atomic add (e.g. s390) fail in closure_set_stopped.
Signed-off-by: Michael Lyle
Cc: Kent Overstreet
Reported-by: kbuild test robot
---
drivers/md/bcache/closure.h | 8
1 file changed, 4 insertions(+), 4 deletions(-)
diff
On 1/9/18 12:13 PM, Michael Lyle wrote:
> Otherwise, architectures that do negated adds of atomics (e.g. s390)
> to do atomic_sub fail in closure_set_stopped.
Applied, thanks for fixing that up.
--
Jens Axboe
This is needed to ensure that we actually handle timeouts.
Without it, the queue_mode=1 path will never call blk_add_timer(),
and the queue_mode=2 path will continually just return
EH_RESET_TIMER and we never actually complete the offending request.
This was used to test the new timeout code, and
Hello, Paolo.
On Thu, Jan 04, 2018 at 08:00:02PM +0100, Paolo Valente wrote:
> The solution for the second type of parameters may prove useful to
> unify also the computation of statistics for the throttling policy.
>
> Does this proposal sound reasonable?
So, the above should work too but I won
On 1/9/18 12:52 PM, Tejun Heo wrote:
> Hello, Paolo.
>
> On Thu, Jan 04, 2018 at 08:00:02PM +0100, Paolo Valente wrote:
>> The solution for the second type of parameters may prove useful to
>> unify also the computation of statistics for the throttling policy.
>>
>> Does this proposal sound reason
Use the fault injection framework to provide a way for null_blk
to configure timeouts. This only works for queue_mode 1 and 2,
since the bio mode doesn't have code for tracking timeouts.
Let's say you want to have a 10% chance of timing out every
100,000 requests, and for 5 total timeouts, you cou
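The truncated example above can be completed along these lines. The parameter string follows the kernel fault-injection convention of interval,probability,space,times; treat the exact syntax and values here as an assumption illustrating the description, not the final interface:

```shell
# Hypothetical illustration: load null_blk in queue_mode 2 with timeout
# injection -- interval=100000, probability=10 (%), space=0, times=5,
# i.e. a 10% chance of timing out every 100,000 requests, 5 timeouts total.
modprobe null_blk queue_mode=2 timeout="100000,10,0,5"
```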
For as long as I can remember, DM has forced the block layer to allow
the allocation and initialization of the request_queue to be distinct
operations. The reason for this is that block/genhd.c:add_disk() has required
that the request_queue (and associated bdi) be tied to the gendisk
before add_disk() is called --
Hi Jens,
Please consider PATCH 1/2 for 4.16 inclusion. I've included PATCH 2/2
to show the DM changes needed in order to make use of the block
changes.
I think, all things considered, this moves DM's interface with block
core in a better direction for the long-term but I obviously welcome
any cr
Fix DM so that it no longer creates a place-holder request_queue that
doesn't reflect the actual way the request_queue will ultimately be used,
only to have to backfill proper queue and queue_limits initialization.
Instead, DM creates a gendisk that initially doesn't have a
request_queue at all. Th
> On 9 Jan 2018, at 20:53, Jens Axboe wrote:
>
> On 1/9/18 12:52 PM, Tejun Heo wrote:
>> Hello, Paolo.
>>
>> On Thu, Jan 04, 2018 at 08:00:02PM +0100, Paolo Valente wrote:
>>> The solution for the second type of parameters may prove useful to
>>> unify also the computatio
On Tue, 2018-01-09 at 17:10 -0500, Mike Snitzer wrote:
> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> index 870484eaed1f..0b0dda8e2420 100644
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -919,8 +919,20 @@ int blk_register_queue(struct gendisk *disk)
> ret = 0;
> unlock:
>
On Tue, Jan 09 2018 at 6:04pm -0500,
Bart Van Assche wrote:
> On Tue, 2018-01-09 at 17:10 -0500, Mike Snitzer wrote:
> > diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> > index 870484eaed1f..0b0dda8e2420 100644
> > --- a/block/blk-sysfs.c
> > +++ b/block/blk-sysfs.c
> > @@ -919,8 +919,20 @@
On Tue, 2018-01-09 at 07:41 +0100, Hannes Reinecke wrote:
> I'm always a bit cautious when having rcu_read_lock() and
> rcu_read_unlock() in two separate functions.
> Can we make this dependency more explicit by renaming the first function
> to blk_start_wait_if_quiesced() and updating the comment
With the latest patchset from Tejun, we grew the request structure
a little bit. It's been quite a while since I've taken a look at
the layout of the structure, this patchset is a first attempt at
doing that.
One advantage of Tejun's patchset is that we no longer rely on
the atomic complete flag o
Move completion related items (like the call single data) near the
end of the struct, instead of mixing them in with the initial
queueing related fields.
Move queuelist below the bio structures. Then we have all
queueing related bits in the first cache line.
This yields a 1.5-2% increase in IOPS
We reduce the resolution of request expiry, but since we're already
using jiffies for this where resolution depends on the kernel
configuration and since the timeout resolution is coarse anyway,
that should be fine.
Signed-off-by: Jens Axboe
---
block/blk-mq.c | 2 +-
block/blk-timeout.
We don't need this to be an atomic flag, it can be a regular
flag. We either end up on the same CPU for the polling, in which
case the state is sane, or we did the sleep which would imply
the needed barrier to ensure we see the right state.
Signed-off-by: Jens Axboe
---
block/blk-mq-debugfs.c |
We only have one atomic flag left. Instead of using an entire
unsigned long for that, steal the bottom bit of the deadline
field that we already reserved.
Remove ->atomic_flags, since it's now unused.
Signed-off-by: Jens Axboe
---
block/blk-core.c | 2 +-
block/blk-mq-debugfs.c | 8
On Tue, Jan 09 2018 at 6:41pm -0500,
Mike Snitzer wrote:
> On Tue, Jan 09 2018 at 6:04pm -0500,
> Bart Van Assche wrote:
>
> > On Tue, 2018-01-09 at 17:10 -0500, Mike Snitzer wrote:
> > > diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> > > index 870484eaed1f..0b0dda8e2420 100644
> > > --
On 01/09/2018 05:27 PM, Paolo Valente wrote:
> For each pair [device for which bfq is selected as I/O scheduler,
> group in blkio/io], bfq maintains a corresponding bfq group. Each such
> bfq group contains a set of async queues, with each async queue
> created on demand, i.e., when some I/O request ar
Hi Tejun,
On 2018/1/9 10:56 PM, Tejun Heo wrote:
> Hello,
>
> On Tue, Jan 09, 2018 at 12:45:13PM +0800, xuejiufei wrote:
>> 1. A bio is charged according to the direction, if we put the reads
>> and sync writes together, we need to search the queue to pick a
>> certain number of read and write IOs
Fix DM so that it no longer creates a place-holder request_queue that
doesn't reflect the actual way the request_queue will ultimately be used,
only to have to backfill proper queue and queue_limits initialization.
Instead, DM creates a gendisk that initially doesn't have a
request_queue at all. Th
On Tue, Jan 09, 2018 at 08:02:53PM +0300, Dmitry Osipenko wrote:
> On 09.01.2018 17:33, Ming Lei wrote:
> > On Tue, Jan 09, 2018 at 04:18:39PM +0300, Dmitry Osipenko wrote:
> >> On 09.01.2018 05:34, Ming Lei wrote:
> >>> On Tue, Jan 09, 2018 at 12:09:27AM +0300, Dmitry Osipenko wrote:
> On 18.
device_add_disk() will only call bdi_register_owner() if
!GENHD_FL_HIDDEN so it follows that bdi_unregister() should be avoided
for !GENHD_FL_HIDDEN in del_gendisk().
Found with code inspection. bdi_unregister() won't do much harm if
bdi_register_owner() wasn't used but best to avoid it.
Fixes:
For as long as I can remember, DM has forced the block layer to allow
the allocation and initialization of the request_queue to be distinct
operations. The reason for this is that block/genhd.c:add_disk() has required
that the request_queue (and associated bdi) be tied to the gendisk
before add_disk() is called --
Hi Jens,
Please consider PATCH 1 and 2 for 4.16 inclusion. I've included PATCH
3 to show the DM changes needed in order to make use of these block
changes.
v2: added some del_gendisk() code movement to be symmetric with the
device_add_disk() related code movement (suggested by Bart). Also
This reverts commit a2d37968d784363842f87820a21e106741d28004.
If max segment size isn't 512-aligned, this patch won't work well.
Also once multipage bvec is enabled, adjacent bvecs won't be physically
contiguous if page is added via bio_add_page(), so we don't need this
kind of complicated logic.
Hi Mike,
On Wed, Jan 10, 2018 at 10:41 AM, Mike Snitzer wrote:
> Since I can remember DM has forced the block layer to allow the
> allocation and initialization of the request_queue to be distinct
> operations. Reason for this was block/genhd.c:add_disk() has required
> that the request_queue (a
On 1/9/18 7:51 PM, Ming Lei wrote:
> This reverts commit a2d37968d784363842f87820a21e106741d28004.
>
> If max segment size isn't 512-aligned, this patch won't work well.
>
> Also once multipage bvec is enabled, adjacent bvecs won't be physically
> contiguous if page is added via bio_add_page(), s
On Tue, Jan 09 2018 at 10:46pm -0500,
Ming Lei wrote:
> Hi Mike,
>
> On Wed, Jan 10, 2018 at 10:41 AM, Mike Snitzer wrote:
> > Since I can remember DM has forced the block layer to allow the
> > allocation and initialization of the request_queue to be distinct
> > operations. Reason for this w
On Tue, Jan 09 2018 at 9:41pm -0500,
Mike Snitzer wrote:
> device_add_disk() will only call bdi_register_owner() if
> !GENHD_FL_HIDDEN so it follows that bdi_unregister() should be avoided
> for !GENHD_FL_HIDDEN in del_gendisk().
This ^ should be "GENHD_FL_HIDDEN".
Sorry for the cut-n-paste t
Hi Paolo,
It looks like this one was introduced in a recent merge; it is triggered
by a test of IO vs. removing the device on the latest for-next of the
block tree:
[ 296.151615] BUG: unable to handle kernel NULL pointer dereference at
0436
[ 296.152302] IP: percpu_counter_add_batch+0x25/0x9d
[ 29
From: Jiufei Xue
If a bio is split after being counted toward stat_bytes and stat_ios in
blkcg_bio_issue_check(), the bio can be resubmitted and enter the
block throttle layer again. This causes part of the bio to be
counted twice.
The flag BIO_THROTTLED can not be used to fix this problem cons
> On 10 Jan 2018, at 02:41, Guoqing Jiang wrote:
>
>
>
> On 01/09/2018 05:27 PM, Paolo Valente wrote:
>> For each pair [device for which bfq is selected as I/O scheduler,
>> group in blkio/io], bfq maintains a corresponding bfq group. Each such
>> bfq group contains a set