Hello, Bart.
On Thu, Apr 12, 2018 at 10:40:52PM +, Bart Van Assche wrote:
> Is something like the patch below perhaps what you had in mind?
Yeah, exactly. It'd be really great to have the lockdep asserts added
to the right places.
Thanks.
--
tejun
Hello,
On Thu, Apr 12, 2018 at 06:56:26PM +, Bart Van Assche wrote:
> On Thu, 2018-04-12 at 11:11 -0700, t...@kernel.org wrote:
> > * Right now, blk_queue_enter/exit() doesn't have any annotations.
> > IOW, there's no way for paths which need enter locked to actual
Hello,
On Thu, Apr 12, 2018 at 04:29:09PM +, Bart Van Assche wrote:
> Any code that submits a bio or request needs blk_queue_enter() /
> blk_queue_exit() anyway. Please have a look at the following commit - you will
> see that that commit reduces the number of blk_queue_enter() /
> blk_queue_
Hello,
On Thu, Apr 12, 2018 at 04:03:52PM +, Bart Van Assche wrote:
> On Thu, 2018-04-12 at 08:37 -0700, Tejun Heo wrote:
> > On Thu, Apr 12, 2018 at 08:09:17AM -0600, Bart Van Assche wrote:
> > > I have retested hotunplugging by rerunning the srp-test software. It
> > > seems like you overloo
On Thu, Apr 12, 2018 at 06:58:21AM -0700, t...@kernel.org wrote:
> Cool, can we just factor out the queue lock from those drivers? It's
> just really messy to be switching locks like we do in the cleanup
> path.
So, looking at a couple drivers, it looks like all we'd nee
Hello,
On Thu, Apr 12, 2018 at 03:56:51PM +0200, h...@lst.de wrote:
> On Thu, Apr 12, 2018 at 06:48:12AM -0700, t...@kernel.org wrote:
> > > Which sounds like a very good reason not to use a driver controller
> > > lock for internals like blkcq.
> > >
> > &
Hello,
On Thu, Apr 12, 2018 at 03:14:40PM +0200, h...@lst.de wrote:
> > At least the SCSI ULP drivers drop the last reference to a disk after
> > the blk_cleanup_queue() call. As explained in the description of commit
> > a063057d7c73, removing a request queue from blkcg must happen before
> > blk
Hello, Israel.
On Thu, Apr 12, 2018 at 11:59:11AM +0300, Israel Rukshin wrote:
> On 4/12/2018 12:31 AM, t...@kernel.org wrote:
> >Hey, again.
> >
> >On Wed, Apr 11, 2018 at 10:07:33AM -0700, t...@kernel.org wrote:
> >>Hello, Israel.
> >>
> >>
Hey, again.
On Wed, Apr 11, 2018 at 10:07:33AM -0700, t...@kernel.org wrote:
> Hello, Israel.
>
> On Wed, Apr 11, 2018 at 07:16:14PM +0300, Israel Rukshin wrote:
> > >Just noticed this one, this looks interesting to me as well. Israel,
> > >can you run your test with
On Wed, Apr 11, 2018 at 08:00:29PM +, Bart Van Assche wrote:
> On Wed, 2018-04-11 at 12:57 -0700, t...@kernel.org wrote:
> > On Wed, Apr 11, 2018 at 01:55:25PM -0600, Bart Van Assche wrote:
> > > On 04/11/18 13:00, Alexandru Moise wrote:
> > > > But the root caus
On Wed, Apr 11, 2018 at 01:55:25PM -0600, Bart Van Assche wrote:
> On 04/11/18 13:00, Alexandru Moise wrote:
> >But the root cause of it is in blkcg_init_queue() when blkg_create() returns
> >an ERR ptr, because it tries to insert into a populated index into
> >blkcg->blkg_tree,
> >the entry that
Hello,
On Wed, Apr 11, 2018 at 05:26:12PM +, Bart Van Assche wrote:
> Please explain what you wrote further. It's not clear to me why you think
> that anything is broken nor what a "sever model" is.
So, the sever (or revoke) model is where you actively disconnect an object
and ensure that there
Hello,
On Wed, Apr 11, 2018 at 05:06:41PM +, Bart Van Assche wrote:
> A simple and effective solution is to dissociate a request queue from the
> block cgroup controller before blk_cleanup_queue() returns. This is why commit
> a063057d7c73 ("block: Fix a race between request queue removal and
Hello, Israel.
On Wed, Apr 11, 2018 at 07:16:14PM +0300, Israel Rukshin wrote:
> >Just noticed this one, this looks interesting to me as well. Israel,
> >can you run your test with this patch?
>
> Yes, I just did and it looks good.
Awesome.
> >>+++ b/include/linux/blkdev.h
> >>@@ -227,6 +227,8
Hello,
On Wed, Apr 11, 2018 at 04:42:55PM +, Bart Van Assche wrote:
> On Wed, 2018-04-11 at 07:56 -0700, Tejun Heo wrote:
> > And looking at the change, it looks like the right thing we should
> > have done is caching @lock on the print_blkg side and when switching
> > locks make sure both loc
Hello,
On Wed, Apr 11, 2018 at 05:24:49PM +0300, Sagi Grimberg wrote:
>
> >Ah, yeah, I was moving it out of add_timer but forgot to actully add
> >it to the issue path. Fixed patch below.
> >
> >BTW, no matter what we do w/ the request handover between normal and
> >timeout paths, we'd need some
Hello, Bart.
On Wed, Apr 11, 2018 at 12:50:51PM +, Bart Van Assche wrote:
> Thank you for having shared this patch. It looks interesting to me. What I
> know about the blk-mq timeout handling is as follows:
> * Nobody who has reviewed the blk-mq timeout handling code with this patch
> applie
On Tue, Apr 10, 2018 at 09:46:23PM +, Bart Van Assche wrote:
> On Tue, 2018-04-10 at 14:33 -0700, t...@kernel.org wrote:
> > + else
> > + rq->missed_completion = true;
>
> In this patch I found code that sets rq->missed_completion but no c
Hello,
On Tue, Apr 10, 2018 at 08:40:43AM -0700, t...@kernel.org wrote:
> On Tue, Apr 10, 2018 at 11:38:18PM +0800, Ming Lei wrote:
> > I agree, the issue should be in driver's irq handler and .timeout in
> > theory.
> >
> > For example, even though one requ
Hello,
On Tue, Apr 10, 2018 at 11:38:18PM +0800, Ming Lei wrote:
> I agree, the issue should be in driver's irq handler and .timeout in
> theory.
>
> For example, even though one request has been done by irq handler, .timeout
> still may return RESET_TIMER.
blk-mq can use a separate flag to trac
Hello, Ming.
On Tue, Apr 10, 2018 at 11:25:54PM +0800, Ming Lei wrote:
> + if (time_after_eq(jiffies, deadline) &&
> + blk_mq_change_rq_state(rq, MQ_RQ_IN_FLIGHT, MQ_RQ_COMPLETE)) {
> + blk_mq_rq_timed_out(rq, reserved);
>
> Normal completion still can happen between blk_mq_change_r
Hello,
On Tue, Apr 10, 2018 at 02:30:26PM +, Bart Van Assche wrote:
> > Switching to another model might be better but let's please do that
> > with the right rationales. A good portion of this seems to be built
> > on misunderstandings.
>
> Which misunderstandings? I'm not aware of any misu
Hello, Bart.
On Mon, Apr 09, 2018 at 09:30:27PM +, Bart Van Assche wrote:
> On Mon, 2018-04-09 at 11:56 -0700, t...@kernel.org wrote:
> > On Mon, Apr 09, 2018 at 05:03:05PM +, Bart Van Assche wrote:
> > > exist today in the blk-mq timeout handling code cannot b
Hello,
On Mon, Apr 09, 2018 at 05:03:05PM +, Bart Van Assche wrote:
> My opinion is not only that the two patches that you posted recently do not
> fix all the races that are fixed by this patch but also that the races that
The race was with the path where the ownership of a timed out request
Hello,
On Mon, Apr 02, 2018 at 10:09:18PM +, Bart Van Assche wrote:
> Please elaborate what your long-term goal is for the blk-mq timeout handler.
Hmm... I don't really have any plans beyond what's been posted.
> The legacy block layer suspends request state changes while a timeout is
> bein
Hello,
On Mon, Apr 02, 2018 at 09:56:41PM +, Bart Van Assche wrote:
> This patch increases the time during which .aborted_gstate == .gstate if the
> timeout is reset. Does that increase the chance that a completion will be
> missed
> if the request timeout is reset?
It sure does, but we're c
Hello,
On Mon, Apr 02, 2018 at 09:31:34PM +, Bart Van Assche wrote:
> > > > +* As nothing prevents from completion happening while
> > > > +* ->aborted_gstate is set, this may lead to ignored completions
> > > > +* and further spurious timeouts.
> > > > +*/
> >
Hello, Bart.
On Wed, Feb 21, 2018 at 06:53:05PM +, Bart Van Assche wrote:
> On Sun, 2018-02-18 at 05:11 -0800, t...@kernel.org wrote:
> > On Wed, Feb 14, 2018 at 04:58:56PM +, Bart Van Assche wrote:
> > > With this patch applied the tests I ran so far pass.
> >
Hello, Bart.
On Wed, Feb 14, 2018 at 04:58:56PM +, Bart Van Assche wrote:
> With this patch applied the tests I ran so far pass.
Ah, great to hear. Thanks a lot for testing. Can you please verify
the following? It's the same approach but with RCU sync batching.
Thanks.
Index: work/block/
Hello, Bart.
Sorry about the delay.
On Thu, Feb 08, 2018 at 04:31:43PM +, Bart Van Assche wrote:
> The crash is reported at address scsi_times_out+0x17 == scsi_times_out+23. The
> instruction at that address tries to dereference scsi_cmnd.device (%rax). The
> register dump shows that that poi
On Thu, Feb 08, 2018 at 05:48:32PM +, Bart Van Assche wrote:
> On Thu, 2018-02-08 at 09:40 -0800, t...@kernel.org wrote:
> > Heh, sorry about not being clear. What I'm trying to say is that
> > scmd->device != NULL && device->host == NULL. Or was this
On Thu, Feb 08, 2018 at 05:37:46PM +, Bart Van Assche wrote:
> On Thu, 2018-02-08 at 09:19 -0800, t...@kernel.org wrote:
> > Hello, Bart.
> >
> > On Thu, Feb 08, 2018 at 05:10:45PM +, Bart Van Assche wrote:
> > > I think "dereferencing a pointer"
Hello, Bart.
On Thu, Feb 08, 2018 at 05:10:45PM +, Bart Van Assche wrote:
> I think "dereferencing a pointer" means reading the memory location that
> pointer points
> at? Anyway, I think we both interpret the crash report in the same way,
> namely that it
> means that scmd->device == NULL.
Hello, Bart.
On Thu, Feb 08, 2018 at 04:31:43PM +, Bart Van Assche wrote:
> > That sounds more like a scsi hotplug bug than an issue in the timeout
> > code unless we messed up @req pointer to begin with.
>
> I don't think that this is related to SCSI hotplugging: this crash does not
> occur
On Thu, Feb 08, 2018 at 07:39:40AM -0800, t...@kernel.org wrote:
> That sounds more like a scsi hotplug but than an issue in the timeout
^bug
> code unless we messed up @req pointer to begin with.
--
tejun
On Thu, Feb 08, 2018 at 01:09:57AM +, Bart Van Assche wrote:
> On Wed, 2018-02-07 at 23:48 +, Bart Van Assche wrote:
> > With this patch applied I see requests for which it seems like the timeout
> > handler
> > did not get invoked: [ ... ]
>
> I just noticed the following in the system l
Hello,
On Wed, Feb 07, 2018 at 09:02:21PM +, Bart Van Assche wrote:
> The patch that I used in my test had an smp_wmb() call (see also below).
> Anyway,
> I will see whether I can extract more state information through debugfs.
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index ef4f6df
Hello,
On Wed, Feb 07, 2018 at 07:03:56PM +, Bart Van Assche wrote:
> I tried the above patch but already during the first iteration of the test I
> noticed that the test hung, probably due to the following request that got
> stuck:
>
> $ (cd /sys/kernel/debug/block && grep -aH . */*/*/rq_li
Hello, Bart.
On Wed, Feb 07, 2018 at 06:14:13PM +, Bart Van Assche wrote:
> When I wrote my comment I was not sure whether or not non-reentrancy is
> guaranteed for work queue items. However, according to what I found in the
> workqueue implementation I think that is guaranteed. So it shouldn'
Hello, Bart.
On Wed, Feb 07, 2018 at 05:27:10PM +, Bart Van Assche wrote:
> Even with the above change I think that there is still a race between the
> code that handles timer resets and the completion handler. Anyway, the test
Can you elaborate the scenario a bit further? If you're referrin
Hello, Bart.
On Mon, Feb 05, 2018 at 09:33:03PM +, Bart Van Assche wrote:
> My goal with this patch is to fix the race between resetting the timer and
> the completion path. Hence change (3). Changes (1) and (2) are needed to
> make the changes in blk_mq_rq_timed_out() work.
Ah, I see. That
Hello, Bart.
On Tue, Jan 09, 2018 at 04:12:40PM +, Bart Van Assche wrote:
> I'm concerned about the additional CPU cycles needed for the new
> blk_mq_map_queue()
> call, although I know this call is cheap. Would the timeout code really get
> that
So, if that is really a concern, let's cache
On Mon, Jan 08, 2018 at 10:10:01PM +, Bart Van Assche wrote:
> Other req->deadline writes are protected by preempt_disable(),
> write_seqcount_begin(&rq->gstate_seq), write_seqcount_end(&rq->gstate_seq)
> and preempt_enable(). I think it's fine that the above req->deadline store
> does not have
On Mon, Jan 08, 2018 at 11:29:11PM +, Bart Van Assche wrote:
> Does "gstate" perhaps stand for "generation number and state"? If so, please
> mention this in one of the above comments.
Yeah, will do.
Thanks.
--
tejun
On Mon, Jan 08, 2018 at 09:06:55PM +, Bart Van Assche wrote:
> On Mon, 2018-01-08 at 11:15 -0800, Tejun Heo wrote:
> > +static void blk_mq_rq_update_aborted_gstate(struct request *rq, u64 gstate)
> > +{
> > + unsigned long flags;
> > +
> > + local_irq_save(flags);
> > + u64_stats_update_b
On Wed, Dec 20, 2017 at 11:41:02PM +, Bart Van Assche wrote:
> On Tue, 2017-12-12 at 11:01 -0800, Tejun Heo wrote:
> > Currently, blk-mq timeout path synchronizes against the usual
> > issue/completion path using a complex scheme involving atomic
> > bitflags, REQ_ATOM_*, memory barriers and su
Hello, Peter.
On Thu, Dec 14, 2017 at 09:20:42PM +0100, Peter Zijlstra wrote:
> But now that I look at this again, TJ, why can't the below happen?
>
> write_seqlock_begin();
> blk_mq_rq_update_state(rq, IN_FLIGHT);
> blk_add_timer(rq);
>
> read_seqcount_begi
Hello, Bart.
On Thu, Dec 14, 2017 at 09:13:32PM +, Bart Van Assche wrote:
...
> however is called before every use of a request. Sorry but I'm not
> enthusiast about the code in blk_rq_init() that reinitializes state
> information that should survive request reuse.
If it wasn't clear, me ne
Hello,
On Thu, Dec 14, 2017 at 06:56:55PM +, Bart Van Assche wrote:
> On Tue, 2017-12-12 at 11:01 -0800, Tejun Heo wrote:
> > void blk_abort_request(struct request *req)
> > {
> > - if (blk_mark_rq_complete(req))
> > - return;
> >
> > if (req->q->mq_ops) {
> > - b
Hello,
On Thu, Dec 14, 2017 at 06:51:11PM +, Bart Van Assche wrote:
> On Tue, 2017-12-12 at 11:01 -0800, Tejun Heo wrote:
> > rules. Unfortunatley, it contains quite a few holes.
> ^
> Unfortunately?
>
> > While this change makes REQ_ATOM_COMPLETE synchornizat
On Thu, Dec 14, 2017 at 05:01:06PM +, Bart Van Assche wrote:
> On Tue, 2017-12-12 at 11:01 -0800, Tejun Heo wrote:
> > + } else {
> > + srcu_idx = srcu_read_lock(hctx->queue_rq_srcu);
> > + if (!blk_mark_rq_complete(rq))
> > + __blk_mq_complete_request(rq
On Tue, Dec 12, 2017 at 10:20:39PM +, Bart Van Assche wrote:
> The above code should show all requests owned by the block driver. Patch
> "blk-mq-debugfs: Also show requests that have not yet been started" (not yet
> in Jens' tree) changes the REQ_ATOM_STARTED test into
> list_empty(&rq->queue
Hello, Bart.
On Tue, Dec 12, 2017 at 09:37:11PM +, Bart Van Assche wrote:
> Have you considered the following instead of introducing MQ_RQ_IDLE and
> MQ_RQ_IN_FLIGHT? I think this could help to limit the number of new atomic
> operations introduced in the hot path by this patch series.
But no
Hello,
On Mon, Sep 11, 2017 at 04:09:12PM +, Bart Van Assche wrote:
> On Mon, 2017-09-11 at 06:13 -0700, Tejun Heo wrote:
> > On Fri, Sep 08, 2017 at 04:52:22PM -0700, Bart Van Assche wrote:
> > > The blk-mq core keeps track of the number of request queue users
> > > through q->q_usage_count.
(cc'ing Jan)
Hello, Bart.
On Fri, Apr 21, 2017 at 09:49:17PM +, Bart Van Assche wrote:
> On Thu, 2017-04-20 at 15:18 -0600, Scott Bauer wrote:
> > [ 642.638860] BUG: KASAN: use-after-free in scsi_exit_rq+0xf3/0x120 at
> > addr 8802b7fedf00
> > [ 642.639362] Read of size 1 by task rcuos
Hello,
On Wed, Sep 28, 2016 at 03:43:32AM +, Adam Manzanares wrote:
> I prefer having the feature conditional so you can use the CFQ
> scheduler with I/O priorities as is. If you decide to enable the
> feature then the priorities will be passed down to the drive in
> addition to the work that