date:20130625

Re: RFC: Allow block drivers to poll for I/O instead of sleeping

2013-06-25 Thread Bart Van Assche

On 06/25/13 05:18, Matthew Wilcox wrote:

> On Mon, Jun 24, 2013 at 10:07:51AM +0200, Ingo Molnar wrote:
>> I'm wondering, how will this scheme work if the IO completion latency is a
>> lot more than the 5 usecs in the testcase? What if it takes 20 usecs or
>> 100 usecs or more?
>
> There's clearly a threshold at which it stops making sense, and our
> current NAND-based SSDs are almost certainly on the wrong side of that
> threshold! I can't wait for one of the "post-NAND" technologies to make
> it to market in some form that makes it economical to use in an SSD.
>
> The problem is that some of the people who are looking at those
> technologies are crazy. They want to "bypass the kernel" and "do user
> space I/O" because "the kernel is too slow". This patch is part of an
> effort to show them how crazy they are. And even if it doesn't convince
> them, at least users who refuse to rewrite their applications to take
> advantage of magical userspace I/O libraries will see real performance
> benefits.

Recently I attended an interesting talk about this subject in which it
was proposed not only to bypass the kernel for access to high-IOPS
devices but also to allow byte-addressability for block devices. The
slides that accompanied that talk can be found here (includes a
performance comparison with the traditional block driver API):

Bernard Metzler, On Suitability of High-Performance Networking API for
Storage, OFA Int'l Developer Workshop, April 24, 2013
(http://www.openfabrics.org/ofa-documents/presentations/doc_download/559-on-suitability-of-high-performance-networking-api-for-storage.html).

This approach leaves the choice of whether to use polling or an
interrupt-based completion notification to the user of the new API,
something the Linux InfiniBand RDMA verbs API already allows today.

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: [PATCH v11 3/9] Avoid calling __scsi_remove_device() twice

2013-06-25 Thread Bart Van Assche

On 06/24/13 19:38, James Bottomley wrote:
> On Wed, 2013-06-12 at 14:52 +0200, Bart Van Assche wrote:
>> SCSI devices are added to the shost->__devices list from inside
>> scsi_alloc_sdev(). If something goes wrong during LUN scanning,
>> e.g. a transport layer failure occurs, then __scsi_remove_device()
>> can get invoked by the LUN scanning code for a SCSI device in
>> state SDEV_CREATED_BLOCK or SDEV_BLOCKED. If this happens then
>> the SCSI device has not yet been added to sysfs (is_visible == 0).
>> Make sure that if this happens these devices are transitioned
>> into state SDEV_DEL. This avoids that __scsi_remove_device()
>> gets invoked a second time by scsi_forget_host().
> 
> The current principle is that scsi_remove_device can fail, so the
> condition you're avoiding is expected.  If you want to make it always
> succeed, we have to worry about any device state racing with an
> asynchronous remove, which looks like a whole nasty can of worms.
> 
> The change log makes it sound like what you actually want to enable is
> the ability to remove devices which fail probing but which are in the
> blocked state, so why not just respin with only that, which is just
> adding the blocked states to the ->SDEV_DEL state transitions?

If what you had in mind is the patch below, I think we agree:

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index e3d6276..eaea242 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2185,6 +2185,8 @@ scsi_device_set_state(struct scsi_device *sdev, enum scsi_device_state state)
 		case SDEV_OFFLINE:
 		case SDEV_TRANSPORT_OFFLINE:
 		case SDEV_CANCEL:
+		case SDEV_BLOCK:
+		case SDEV_CREATED_BLOCK:
 			break;
 		default:
 			goto illegal;



Re: [PATCH v11 4/9] Disallow changing the device state via sysfs into "deleted"

2013-06-25 Thread Bart Van Assche


On 06/24/13 19:59, James Bottomley wrote:

> On Wed, 2013-06-12 at 14:53 +0200, Bart Van Assche wrote:
>> Changing the state of a SCSI device via sysfs into "cancel" or
>> "deleted" prevents removal of these devices by scsi_remove_host().
>> Hence do not allow this. Also, introduce the symbolic name
>> INVALID_SDEV_STATE, representing a value different from any valid
>> SCSI device state. Update scsi_device_set_state() such that gcc
>> does not issue a warning about an enumeration value not being
>> handled inside a switch statement.
>
> zero is the invalid state, that's why the SDEV_ states start at 1.
> Using a bare zero also means that gcc doesn't have to consider it in the
> switch statement, so there's no need to introduce a new one.
>
> If we want to try to babysit user initiated state changes, then it looks
> like OFFLINE<->RUNNING might be the only useful ones?


How about the BLOCKED<>RUNNING and QUIESCE<>RUNNING transitions ? I 
think it may be useful for a user to trigger these as well.


Bart.


Re: [PATCH v11 6/9] Make scsi_remove_host() wait until error handling finished

2013-06-25 Thread Bart Van Assche


On 06/25/13 00:27, James Bottomley wrote:

> On Mon, 2013-06-24 at 15:04 -0500, Mike Christie wrote:
>> On 06/24/2013 02:19 PM, James Bottomley wrote:
>>> On Wed, 2013-06-12 at 14:55 +0200, Bart Van Assche wrote:
>>>> A SCSI LLD may start cleaning up host resources as soon as
>>>> scsi_remove_host() returns. These host resources may be needed by
>>>> the LLD in an implementation of one of the eh_* functions. So if
>>>> one of the eh_* functions is in progress when scsi_remove_host()
>>>> is invoked, wait until the eh_* function has finished. Also, do
>>>> not invoke any of the eh_* functions after scsi_remove_host() has
>>>> started.
>>>
>>> We already have state guards for this, don't we?  That's the
>>> SHOST_*_RECOVERY ones.  When eh functions are active, the host
>>> transitions to a recovery state, so the wait could just wait on that
>>> state rather than implement an open coded counting semaphore.
>>
>> That seems better. For the sg_reset_provider case we just would have to
>> also wait on the tmf_in_progress bit.
>
> The simplest way may just be to move the kthread_stop() from release
> to remove.  That synchronously waits for the outstanding error handling
> to complete and the eh thread to stop.  Perhaps the eh thread should
> also wait for tmf in progress before it dies?


Regarding TMF that are in progress: my preference is to leave it to the 
LLD to wait for any TMF in progress if necessary. At least with SRP over 
RDMA it is possible to prevent receiving further TMF completion 
notifications by closing the connection over which these TMF were sent.


There is a difference though between moving the EH kthread_stop() call 
and the patch at the start of this thread: moving the EH kthread_stop() 
call does not prevent that an ioctl like SG_SCSI_RESET triggers an eh_* 
callback after scsi_remove_host() has finished. However, the 
scsi_begin_eh() / scsi_end_eh() functions do prevent that an ioctl can 
cause an eh_* callback to be invoked after scsi_remove_device() finished.
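
The begin/end guard described above can be modelled in user space roughly
as follows. This is only a sketch of the idea, not the code from the patch:
the eh_guard_* names and the struct layout are hypothetical, and pthread
primitives stand in for the kernel's locking and wait queues. Entering the
guard fails once removal has started, and removal waits until every
callback that is already running has left the guard.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical user-space model; not the kernel data structures. */
struct eh_guard {
	pthread_mutex_t lock;
	pthread_cond_t idle;	/* signalled when active drops to zero */
	int active;		/* eh_* callbacks currently running */
	bool removing;		/* set once host removal has started */
};

static void eh_guard_init(struct eh_guard *g)
{
	pthread_mutex_init(&g->lock, NULL);
	pthread_cond_init(&g->idle, NULL);
	g->active = 0;
	g->removing = false;
}

/* Call before an eh_* callback. Returns false if removal has started,
 * in which case the callback must not be invoked. */
static bool eh_guard_begin(struct eh_guard *g)
{
	bool ok;

	pthread_mutex_lock(&g->lock);
	ok = !g->removing;
	if (ok)
		g->active++;
	pthread_mutex_unlock(&g->lock);
	return ok;
}

/* Call after an eh_* callback that was admitted by eh_guard_begin(). */
static void eh_guard_end(struct eh_guard *g)
{
	pthread_mutex_lock(&g->lock);
	if (--g->active == 0)
		pthread_cond_broadcast(&g->idle);
	pthread_mutex_unlock(&g->lock);
}

/* Called from the removal path: refuse new callbacks, then wait for
 * the ones in flight to finish. */
static void eh_guard_remove(struct eh_guard *g)
{
	pthread_mutex_lock(&g->lock);
	g->removing = true;
	while (g->active > 0)
		pthread_cond_wait(&g->idle, &g->lock);
	pthread_mutex_unlock(&g->lock);
}
```

In this model an ioctl path would wrap its reset call in
eh_guard_begin() / eh_guard_end(), so a reset requested after
eh_guard_remove() has returned is rejected instead of reaching the driver.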


Bart.



Re: [PATCH v11 6/9] Make scsi_remove_host() wait until error handling finished

2013-06-25 Thread Bart Van Assche


On 06/25/13 00:27, James Bottomley wrote:

> For a variety of reasons this patch set is incredibly hard to review:
> Almost every patch touches pieces in the mid layer where you have to be
> sure in minute detail you know what's going on (and what should be going
> on), so usually it's a couple of hours with the source code just making
> sure you do know this.  Plus it's code where the underlying usage model
> has evolved over the years meaning the original guarantees might have
> been violated silently somewhere and the ramifications spread out like
> tentacles everywhere.  Finally, it's not clear from the change logs why
> the changes are actually being made: for instance sentence one of this
> change log says "A SCSI LLD may start cleaning up host resources as soon
> as scsi_remove_host() returns." which causes my sanity checker to go off
> immediately because in a refcounted model, like we use for dev, target
> and host, nothing essential is supposed to be freed until the last put
> which may or may not happen in the remove function.


If the invocations of the eh_* callback functions were visible to the
block layer, then blk_cleanup_queue() would wait until any such eh_*
invocations have finished. Such an approach could simplify device
removal in the SCSI mid-layer significantly. It would also prevent an
eh_* callback from being invoked via an ioctl after scsi_remove_device()
has finished.


Bart.


Re: [PATCH 8/9] scsi/pm8001: use pdev->pm_cap instead of pci_find_capability(..,PCI_CAP_ID_PM)

2013-06-25 Thread Yijing Wang

Hi,
  Any comments?

On 2013/6/18 16:23, Yijing Wang wrote:
> The PCI core already saves the PM capability register offset in pdev->pm_cap
> in pci_pm_init() during the init path. So we can use pdev->pm_cap instead of
> calling pci_find_capability(pdev, PCI_CAP_ID_PM), which is faster and
> simplifies the code.
> 
> Signed-off-by: Yijing Wang 
> Cc: xjtu...@gmail.com
> Cc: lindar_...@usish.com
> Cc: "James E.J. Bottomley" 
> Cc: linux-scsi@vger.kernel.org
> Cc: linux-ker...@vger.kernel.org
> ---
>  drivers/scsi/pm8001/pm8001_init.c |7 +++
>  1 files changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/scsi/pm8001/pm8001_init.c b/drivers/scsi/pm8001/pm8001_init.c
> index e4b9bc7..3861aa1 100644
> --- a/drivers/scsi/pm8001/pm8001_init.c
> +++ b/drivers/scsi/pm8001/pm8001_init.c
> @@ -912,14 +912,13 @@ static int pm8001_pci_suspend(struct pci_dev *pdev, pm_message_t state)
>  {
>   struct sas_ha_struct *sha = pci_get_drvdata(pdev);
>   struct pm8001_hba_info *pm8001_ha;
> - int i , pos;
> + int i;
>   u32 device_state;
>   pm8001_ha = sha->lldd_ha;
>   flush_workqueue(pm8001_wq);
>   scsi_block_requests(pm8001_ha->shost);
> - pos = pci_find_capability(pdev, PCI_CAP_ID_PM);
> - if (pos == 0) {
> - printk(KERN_ERR " PCI PM not supported\n");
> + if (!pdev->pm_cap) {
> + dev_err(&pdev->dev, " PCI PM not supported\n");
>   return -ENODEV;
>   }
>   PM8001_CHIP_DISP->interrupt_disable(pm8001_ha, 0xFF);
> 
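
The saving claimed in the quoted change log comes from not walking the
capability list again. A toy model of that difference, assuming the
standard PCI layout (capabilities pointer at config offset 0x34, each
entry holding an ID byte followed by a next-pointer byte), might look
like this. The toy_* names are hypothetical, the real kernel helper also
checks the status register first, and the config space here is just a
byte array rather than real hardware.

```c
#include <stdint.h>

#define TOY_CAP_LIST_PTR 0x34	/* config offset of the first capability */
#define TOY_CAP_ID_PM    0x01	/* power management capability ID */

/* Walk the capability list; return the offset of the entry whose ID
 * matches, or 0 if there is none. */
static uint8_t toy_find_capability(const uint8_t cfg[256], uint8_t id)
{
	uint8_t pos = cfg[TOY_CAP_LIST_PTR];
	int ttl = 48;	/* bound the walk in case the list loops */

	while (pos >= 0x40 && ttl-- > 0) {
		pos &= (uint8_t)~3u;	/* entries are dword aligned */
		if (cfg[pos] == id)
			return pos;
		pos = cfg[pos + 1];	/* follow the next pointer */
	}
	return 0;
}

/* Hypothetical device model that caches the result once at init time. */
struct toy_pci_dev {
	uint8_t cfg[256];
	uint8_t pm_cap;		/* 0 means "no PM capability" */
};

static void toy_pm_init(struct toy_pci_dev *dev)
{
	dev->pm_cap = toy_find_capability(dev->cfg, TOY_CAP_ID_PM);
}
```

After toy_pm_init() has run, a suspend path only needs to test
dev->pm_cap, which is the pattern the patch switches pm8001 to.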


-- 
Thanks!
Yijing


Re: [PATCH v11 4/9] Disallow changing the device state via sysfs into "deleted"

2013-06-25 Thread James Bottomley

On Tue, 2013-06-25 at 10:41 +0200, Bart Van Assche wrote:
> On 06/24/13 19:59, James Bottomley wrote:
> > On Wed, 2013-06-12 at 14:53 +0200, Bart Van Assche wrote:
> >> Changing the state of a SCSI device via sysfs into "cancel" or
> >> "deleted" prevents removal of these devices by scsi_remove_host().
> >> Hence do not allow this. Also, introduce the symbolic name
> >> INVALID_SDEV_STATE, representing a value different from any valid
> >> SCSI device state. Update scsi_device_set_state() such that gcc
> >> does not issue a warning about an enumeration value not being
> >> handled inside a switch statement.
> >
> > zero is the invalid state, that's why the SDEV_ states start at 1.
> > Using a bare zero also means that gcc doesn't have to consider it in the
> > switch statement, so there's no need to introduce a new one.
> >
> > If we want to try to babysit user initiated state changes, then it looks
> > like OFFLINE<->RUNNING might be the only useful ones?
> 
> How about the BLOCKED<>RUNNING and QUIESCE<>RUNNING transitions ? I 
> think it may be useful for a user to trigger these as well.

They're part of paired state, so the user would tamper with assumptions
the HBA is making ... also, just changing the state doesn't help, the
queue needs to be restarted for these transitions which it currently
isn't.

James


Re: [PATCH v11 3/9] Avoid calling __scsi_remove_device() twice

2013-06-25 Thread James Bottomley

On Tue, 2013-06-25 at 10:37 +0200, Bart Van Assche wrote:
> On 06/24/13 19:38, James Bottomley wrote:
> > On Wed, 2013-06-12 at 14:52 +0200, Bart Van Assche wrote:
> >> SCSI devices are added to the shost->__devices list from inside
> >> scsi_alloc_sdev(). If something goes wrong during LUN scanning,
> >> e.g. a transport layer failure occurs, then __scsi_remove_device()
> >> can get invoked by the LUN scanning code for a SCSI device in
> >> state SDEV_CREATED_BLOCK or SDEV_BLOCKED. If this happens then
> >> the SCSI device has not yet been added to sysfs (is_visible == 0).
> >> Make sure that if this happens these devices are transitioned
> >> into state SDEV_DEL. This avoids that __scsi_remove_device()
> >> gets invoked a second time by scsi_forget_host().
> > 
> > The current principle is that scsi_remove_device can fail, so the
> > condition you're avoiding is expected.  If you want to make it always
> > succeed, we have to worry about any device state racing with an
> > asynchronous remove, which looks like a whole nasty can of worms.
> > 
> > The change log makes it sound like what you actually want to enable is
> > the ability to remove devices which fail probing but which are in the
> > blocked state, so why not just respin with only that, which is just
> > adding the blocked states to the ->SDEV_DEL state transitions?
> 
> If what you had in mind is the patch below, I think we agree:
> 
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index e3d6276..eaea242 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -2185,6 +2185,8 @@ scsi_device_set_state(struct scsi_device *sdev, enum scsi_device_state state)
>   case SDEV_OFFLINE:
>   case SDEV_TRANSPORT_OFFLINE:
>   case SDEV_CANCEL:
> + case SDEV_BLOCK:
> + case SDEV_CREATED_BLOCK:

Something like this, yes.  For the probe lun case, we have to be in
CREATED, so any block action transitions only to CREATED_BLOCK.  The
BLOCK->DEL transition can only be a result of an async remove racing
with bringup, can't it?  Which is something I think we still want to
forbid.

James


Re: [PATCH v11 6/9] Make scsi_remove_host() wait until error handling finished

2013-06-25 Thread James Bottomley

On Tue, 2013-06-25 at 11:01 +0200, Bart Van Assche wrote:
> On 06/25/13 00:27, James Bottomley wrote:
> > On Mon, 2013-06-24 at 15:04 -0500, Mike Christie wrote:
> >> On 06/24/2013 02:19 PM, James Bottomley wrote:
> >>> On Wed, 2013-06-12 at 14:55 +0200, Bart Van Assche wrote:
> >>>> A SCSI LLD may start cleaning up host resources as soon as
> >>>> scsi_remove_host() returns. These host resources may be needed by
> >>>> the LLD in an implementation of one of the eh_* functions. So if
> >>>> one of the eh_* functions is in progress when scsi_remove_host()
> >>>> is invoked, wait until the eh_* function has finished. Also, do
> >>>> not invoke any of the eh_* functions after scsi_remove_host() has
> >>>> started.
> >>>
> >>> We already have state guards for this, don't we?  That's the
> >>> SHOST_*_RECOVERY ones.  When eh functions are active, the host
> >>> transitions to a recovery state, so the wait could just wait on that
> >>> state rather than implement an open coded counting semaphore.
> >>
> >> That seems better. For the sg_reset_provider case we just would have to
> >> also wait on the tmf_in_progress bit.
> >
> > The simplest way may just be to move the kthread_stop() from release
> > to remove.  That synchronously waits for the outstanding error handling
> > to complete and the eh thread to stop.  Perhaps the eh thread should
> > also wait for tmf in progress before it dies?
> 
> Regarding TMF that are in progress: my preference is to leave it to the 
> LLD to wait for any TMF in progress if necessary. At least with SRP over 
> RDMA it is possible to prevent receiving further TMF completion 
> notifications by closing the connection over which these TMF were sent.
> 
> There is a difference though between moving the EH kthread_stop() call 
> and the patch at the start of this thread: moving the EH kthread_stop() 
> call does not prevent that an ioctl like SG_SCSI_RESET triggers an eh_* 
> callback after scsi_remove_host() has finished. However, the 
> scsi_begin_eh() / scsi_end_eh() functions do prevent that an ioctl can 
> cause an eh_* callback to be invoked after scsi_remove_device() finished.

OK, but this doesn't tell me what you're trying to achieve.

An eh function is allowable as long as the host hasn't had the release
callback executed.  That means you must have a reference to the
device/host to execute the eh function, which is currently guaranteed
for all invocations.

James



Re: RFC: Allow block drivers to poll for I/O instead of sleeping

2013-06-25 Thread Steven Rostedt

On Mon, 2013-06-24 at 23:07 -0400, Matthew Wilcox wrote:
> On Mon, Jun 24, 2013 at 08:11:02PM -0400, Steven Rostedt wrote:
> > What about hooking into the idle_balance code? That happens if we are
> > about to go to idle but before the full schedule switch to the idle
> > task.
> > 
> > 
> > In __schedule(void):
> > 
> > if (unlikely(!rq->nr_running))
> > idle_balance(cpu, rq);
> 
> That may be a great place to try it from the PoV of the scheduler, but are
> you OK with me threading a struct backing_dev_info * all the way through
> the scheduler to idle_balance()?  :-)

Well, there are other ways to pass data down besides parameters. You could
attach something to the task itself.

-- Steve



Re: RFC: Allow block drivers to poll for I/O instead of sleeping

2013-06-25 Thread Jens Axboe

On Mon, Jun 24 2013, Matthew Wilcox wrote:
> On Mon, Jun 24, 2013 at 09:15:45AM +0200, Jens Axboe wrote:
> > Willy, I think the general design is fine, hooking in via the bdi is the
> > only way to get back to the right place from where you need to sleep.
> > Some thoughts:
> > 
> > - This should be hooked in via blk-iopoll, both of them should call into
> >   the same driver hook for polling completions.
> 
> I actually started working on this, then I realised that it's actually
> a bad idea.  blk-iopoll's poll function is to poll the single I/O queue
> closest to this CPU.  The iowait poll function is to poll all queues
> that the I/O for this address_space might complete on.

blk_iopoll can be tied to "whatever". It was originally designed to be
tied to the queue, which would make it driver-wide. So there's no intent
for it to poll only a subset of the device, though if you tie it to a
completion queue (which would be most natural), then it'd only find
completions there.

I didn't look at your nvme end of the implementation - if you could
reliably map to the right completion queue, then it would have the same
mapping as iopoll on a per-completion queue basis. If you can't, then
you have to poll all of them. That doesn't seem like it would scale well
for having more than a few applications banging on a device.

> I'm reluctant to ask drivers to define two poll functions, but I'm even
> more reluctant to ask them to define one function with two purposes.
>
> > - It needs to be more intelligent in when you want to poll and when you
> >   want regular irq driven IO.
> 
> Oh yeah, absolutely.  While the example patch didn't show it, I wouldn't
> enable it for all NVMe devices; only ones with sufficiently low latency.
> There's also the ability for the driver to look at the number of
> outstanding I/Os and return an error (eg -EBUSY) to stop spinning.

There might also be read vs write differences. Say some devices complete
writes very quickly, but reads are slower. Or vice versa. And then
there's the inevitable "some IOs are slow, but usually very fast". But
that can't really be handled except giving up on the polling at some
point.
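
The "give up at some point" policy discussed here can be sketched as a
bounded spin that falls back to the normal sleeping path. This is a
user-space illustration only: poll_then_sleep(), the poll_fn hook and the
fake devices are hypothetical names, and the budget is counted in poll
attempts rather than in time. The hook follows the convention mentioned
earlier in the thread, where a driver may return an error such as -EBUSY
to stop the spinning early.

```c
#include <errno.h>
#include <stddef.h>

/* Outcome of the wait: completion seen while spinning, or the caller
 * should fall back to the interrupt-driven (sleeping) path. */
enum wait_result { WAIT_POLLED, WAIT_SLEEP };

/*
 * Hypothetical driver hook: returns > 0 when the I/O has completed,
 * 0 when it is still pending, and a negative errno (for example
 * -EBUSY) when the driver wants the caller to stop spinning.
 */
typedef int (*poll_fn)(void *ctx);

static enum wait_result poll_then_sleep(poll_fn poll, void *ctx,
					unsigned int budget)
{
	unsigned int i;

	for (i = 0; i < budget; i++) {
		int ret = poll(ctx);

		if (ret > 0)
			return WAIT_POLLED;	/* completed while spinning */
		if (ret < 0)
			break;			/* driver asked us to stop */
	}
	return WAIT_SLEEP;			/* budget spent or refused */
}

/* Toy device that reports completion after a fixed number of polls. */
struct fake_dev {
	unsigned int polls_left;
};

static int fake_poll(void *ctx)
{
	struct fake_dev *dev = ctx;

	if (dev->polls_left == 0)
		return 1;
	dev->polls_left--;
	return 0;
}

/* Toy device that always refuses to be polled. */
static int busy_poll(void *ctx)
{
	(void)ctx;
	return -EBUSY;
}
```

A fast device (a few polls) completes inside the budget, while a slow or
busy one makes the caller give up and sleep, which bounds the CPU time
spent on the "usually fast, sometimes slow" case.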

> > - With the former note, the app either needs to opt in (and hence
> >   willingly sacrifice CPU cycles of its scheduling slice) or it needs to
> >   be nicer in when it gives up and goes back to irq driven IO.
> 
> Yup.  I like the way you framed it.  If the task *wants* to spend its
> CPU cycles on polling for I/O instead of giving up the remainder of its
> time slice, then it should be able to do that.  After all, it already can;
> it can submit an I/O request via AIO, and then call io_getevents in a
> tight loop.

Exactly, that was my point. Or it can busy loop just checking the aio
ring, at which point it's really stupid to be IRQ driven at all. It'd be
much better to have the app reap the completion directly.

> So maybe the right way to do this is with a task flag?  If we go that
> route, I'd like to further develop this option to allow I/Os to be
> designated as "low latency" vs "normal".  Taking a page fault would be
> "low latency" for all tasks, not just ones that choose to spin for I/O.

Not sure, I'd have to think about it some more. It's a mix of what the
application decides to do, but also what the underlying device can do.
And there might be fs implications, etc.

-- 
Jens Axboe


Re: RFC: Allow block drivers to poll for I/O instead of sleeping

2013-06-25 Thread Jens Axboe

On Mon, Jun 24 2013, Steven Rostedt wrote:
> On Mon, Jun 24, 2013 at 09:17:18AM +0200, Jens Axboe wrote:
> > On Sun, Jun 23 2013, Linus Torvalds wrote:
> > > 
> > > You could try to do that either *in* the idle thread (which would take
> > > the context switch overhead - maybe negating some of the advantages),
> > > or alternatively hook into the scheduler idle logic before actually
> > > doing the switch.
> > 
> > It can't happen in the idle thread. If you need to take the context
> > switch, then you've negated pretty much all of the gain of the polled
> > approach.
> 
> What about hooking into the idle_balance code? That happens if we are
> about to go to idle but before the full schedule switch to the idle
> task.
> 
> 
> In __schedule(void):
> 
>   if (unlikely(!rq->nr_running))
>   idle_balance(cpu, rq);

If you can avoid the switch (sleep/wakeup), then that's what matters. If
you end up sleeping, you've lost that latency game and polling is mostly
useful in the blk_iopoll designed fashion for high iops scenarios.
Besides, you need the task + page context to be able to find out what to
poll for (and when to stop).

-- 
Jens Axboe


Re: RFC: Allow block drivers to poll for I/O instead of sleeping

2013-06-25 Thread Jens Axboe

On Mon, Jun 24 2013, Matthew Wilcox wrote:
> On Mon, Jun 24, 2013 at 10:07:51AM +0200, Ingo Molnar wrote:
> > I'm wondering, how will this scheme work if the IO completion latency is a 
> > lot more than the 5 usecs in the testcase? What if it takes 20 usecs or 
> > 100 usecs or more?
> 
> There's clearly a threshold at which it stops making sense, and our
> current NAND-based SSDs are almost certainly on the wrong side of that
> threshold!  I can't wait for one of the "post-NAND" technologies to make
> it to market in some form that makes it economical to use in an SSD.
> 
> The problem is that some of the people who are looking at those
> technologies are crazy.  They want to "bypass the kernel" and "do user
> space I/O" because "the kernel is too slow".  This patch is part of an
> effort to show them how crazy they are.  And even if it doesn't convince
> them, at least users who refuse to rewrite their applications to take
> advantage of magical userspace I/O libraries will see real performance
> benefits.

Fully concur with that. At least on the read side, NAND is just getting
crappier and polled completions are usually not going to be great. On the
write side, however, there are definite gains. Completions in the
10-15usec range aren't unusual. And once we hit PCM, well, it'll be fun.

On the write side, there are plenty of super latency customers out there
who would LOVE to poll when/if it's useful. Most often also the same
kind of people who talk the crazy of putting everything in user space.
Which is why I like the polling. If we can get sufficiently close, then
we can shut some of that up.

-- 
Jens Axboe


Re: [PATCH v11 3/9] Avoid calling __scsi_remove_device() twice

2013-06-25 Thread Bart Van Assche


On 06/25/13 15:44, James Bottomley wrote:

> On Tue, 2013-06-25 at 10:37 +0200, Bart Van Assche wrote:
>> On 06/24/13 19:38, James Bottomley wrote:
>>> On Wed, 2013-06-12 at 14:52 +0200, Bart Van Assche wrote:
>>>> SCSI devices are added to the shost->__devices list from inside
>>>> scsi_alloc_sdev(). If something goes wrong during LUN scanning,
>>>> e.g. a transport layer failure occurs, then __scsi_remove_device()
>>>> can get invoked by the LUN scanning code for a SCSI device in
>>>> state SDEV_CREATED_BLOCK or SDEV_BLOCKED. If this happens then
>>>> the SCSI device has not yet been added to sysfs (is_visible == 0).
>>>> Make sure that if this happens these devices are transitioned
>>>> into state SDEV_DEL. This avoids that __scsi_remove_device()
>>>> gets invoked a second time by scsi_forget_host().
>>>
>>> The current principle is that scsi_remove_device can fail, so the
>>> condition you're avoiding is expected.  If you want to make it always
>>> succeed, we have to worry about any device state racing with an
>>> asynchronous remove, which looks like a whole nasty can of worms.
>>>
>>> The change log makes it sound like what you actually want to enable is
>>> the ability to remove devices which fail probing but which are in the
>>> blocked state, so why not just respin with only that, which is just
>>> adding the blocked states to the ->SDEV_DEL state transitions?
>>
>> If what you had in mind is the patch below, I think we agree:
>>
>> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
>> index e3d6276..eaea242 100644
>> --- a/drivers/scsi/scsi_lib.c
>> +++ b/drivers/scsi/scsi_lib.c
>> @@ -2185,6 +2185,8 @@ scsi_device_set_state(struct scsi_device *sdev, enum scsi_device_state state)
>>  		case SDEV_OFFLINE:
>>  		case SDEV_TRANSPORT_OFFLINE:
>>  		case SDEV_CANCEL:
>> +		case SDEV_BLOCK:
>> +		case SDEV_CREATED_BLOCK:
>
> Something like this, yes.  For the probe lun case, we have to be in
> CREATED, so any block action transitions only to CREATED_BLOCK.  The
> BLOCK->DEL transition can only be a result of an async remove racing
> with bringup, can't it?  Which is something I think we still want to
> forbid.


OK, I will leave the BLOCK->DEL transition out.
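
The outcome agreed on here (allow CREATED_BLOCK->DEL, keep BLOCK->DEL
forbidden) can be written down as a small stand-alone check. This is a
sketch, not the kernel function: the enum below is a hypothetical,
simplified copy whose ordering is assumed, and the switch lists only the
old states visible in the diff context above plus the newly allowed one.
The real scsi_device_set_state() may accept further old states that the
hunk does not show.

```c
#include <stdbool.h>

/* Hypothetical, simplified copy of the device states named in the
 * thread. States start at 1 so that 0 can mean "invalid". */
enum toy_sdev_state {
	TOY_SDEV_CREATED = 1,
	TOY_SDEV_RUNNING,
	TOY_SDEV_CANCEL,
	TOY_SDEV_DEL,
	TOY_SDEV_QUIESCE,
	TOY_SDEV_OFFLINE,
	TOY_SDEV_TRANSPORT_OFFLINE,
	TOY_SDEV_BLOCK,
	TOY_SDEV_CREATED_BLOCK,
};

/* Return true if a transition from oldstate to DEL would be accepted
 * under the rule agreed on in this message. */
static bool toy_may_enter_del(enum toy_sdev_state oldstate)
{
	switch (oldstate) {
	case TOY_SDEV_OFFLINE:
	case TOY_SDEV_TRANSPORT_OFFLINE:
	case TOY_SDEV_CANCEL:
	case TOY_SDEV_CREATED_BLOCK:	/* new: probe failed while blocked */
		return true;
	default:
		/* Includes TOY_SDEV_BLOCK: an async remove racing with
		 * bringup stays illegal. */
		return false;
	}
}
```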

Bart.


Re: [PATCH v11 6/9] Make scsi_remove_host() wait until error handling finished

2013-06-25 Thread Bart Van Assche


On 06/25/13 15:45, James Bottomley wrote:

> On Tue, 2013-06-25 at 11:01 +0200, Bart Van Assche wrote:
>> There is a difference though between moving the EH kthread_stop() call
>> and the patch at the start of this thread: moving the EH kthread_stop()
>> call does not prevent that an ioctl like SG_SCSI_RESET triggers an eh_*
>> callback after scsi_remove_host() has finished. However, the
>> scsi_begin_eh() / scsi_end_eh() functions do prevent that an ioctl can
>> cause an eh_* callback to be invoked after scsi_remove_device() finished.
>
> OK, but this doesn't tell me what you're trying to achieve.
>
> An eh function is allowable as long as the host hasn't had the release
> callback executed.  That means you must have a reference to the
> device/host to execute the eh function, which is currently guaranteed
> for all invocations.


That raises a new question: how is an LLD expected to clean up resources
without triggering a race condition? What you wrote means that it's not
safe for an LLD to start cleaning up the resources needed by the eh_*
callbacks immediately after scsi_remove_device() returns, since it is not
guaranteed that at that time all references to the device have already
been dropped.


Bart.

