Re: [PATCH 1/6] libata: Do not retry commands with valid autosense

2015-08-02 Thread Tejun Heo
Hello,

On Fri, Jul 31, 2015 at 03:02:03PM +0200, Hannes Reinecke wrote:
> If a failed command has a valid autosense there is no need to
> retry it on the ATA level; at best we're incurring the same
> error again. So rather not retry it here, but leave it to
> the SCSI layer to decide if a retry is in order.

Hmmm... I don't know.  So, we change how we handle errors completely
depending on how the device reports it?  Doesn't seem like a
particularly good idea to me.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/6] ZAC host-aware device support

2015-08-02 Thread Tejun Heo
On Fri, Jul 31, 2015 at 03:02:02PM +0200, Hannes Reinecke wrote:
> Hi all,
> 
> here is a patchset for adding ZAC host-aware device support to libata.
> Main bits are translations for ZBC IN and ZBC OUT; others are the
> required plumbing like generating the correct VPD pages.
> James, do you require a separate patch for adding ZBC IN and ZBC OUT
> or is it okay to have it queued here?
> 
> As usual, reviews and comments are welcome.

Can you please give an overview of why we would want this?

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/6] libata: Do not retry commands with valid autosense

2015-08-03 Thread Tejun Heo
Hello,

On Mon, Aug 03, 2015 at 09:31:57AM +0200, Hannes Reinecke wrote:
> The whole point of the autosense feature is that you do _not_
> have to fall back to the original trial-and-error libata EH,
> but know exactly what the problem is. Plus any retry will be giving
> us (in most cases) exactly the same sense code.

Can you please give some examples?  As lacking as ATA error reporting
is, it still can tell whether retry is necessary or not in most cases.

> _And_ the SCSI layer is actually able to understand the sense code,
> allowing him to make a better judgment on what to do with that error.
>
> So any retry in the libata layer will only slow things down,
> leading to the same results eventually.

Have you tested actual error handling?  I doubt this would work as you
expect it to.  libata EH takes over the entire error handling and when
it determines that the command has failed and retrying won't do any
good, it tells SCSI EH to not retry either.

Ugh... so this is from NCQ autosense thing.  Now ATA devices reports
sense data too which trumps AC_ERR_DEV so libata EH decides to retry
even when the device indicates unrecoverable error.  Urgh... we
shouldn't be taking completely different error handling paths because
a device chooses to report error conditions slightly differently.
Please map them so that they behave in a consistent manner.  I'm gonna
plug autosense for now.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH libata/for-4.2-fixes] libata: disable NCQ autosense

2015-08-03 Thread Tejun Heo
>From 8c0fa3e7ca99b0d6d96cb2cf194a912f643da7c5 Mon Sep 17 00:00:00 2001
From: Tejun Heo 
Date: Mon, 3 Aug 2015 11:10:45 -0400

fe7173c206de ("libata: Implement support for sense data reporting")
and subsequent patches enabled NCQ autosense reporting; unfortunately,
this breaks libata EH behavior for devices which support the feature
because it assumes that only ATAPI devices report sense data and that
sense data trumps explicit device error indication.

For now, disable NCQ autosense.

Cc: sta...@vger.kernel.org # v4.1+
Fixes: fe7173c206de ("libata: Implement support for sense data reporting")
---
 drivers/ata/libata-core.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index db5d9f7..32451ac 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -2151,6 +2151,13 @@ static void ata_dev_config_sense_reporting(struct 
ata_device *dev)
 {
unsigned int err_mask;
 
+   /*
+* FIXME: This makes ATA devices report sense data on failure which
+* in turn makes libata EH ignore AC_ERR_DEV leading to incorrect
+* error handling behaviors.
+*/
+   return;
+
if (!ata_id_has_sense_reporting(dev->id))
return;
 
-- 
2.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/6] libata: Do not retry commands with valid autosense

2015-08-03 Thread Tejun Heo
Adding a bit.

On Mon, Aug 03, 2015 at 11:04:28AM -0400, Tejun Heo wrote:
> Ugh... so this is from NCQ autosense thing.  Now ATA devices reports
> sense data too which trumps AC_ERR_DEV so libata EH decides to retry
> even when the device indicates unrecoverable error.  Urgh... we
> shouldn't be taking completely different error handling paths because
> a device chooses to report error conditions slightly differently.
> Please map them so that they behave in a consistent manner.  I'm gonna
> plug autosense for now.

Also, is there anything substantial we gain from NCQ autosense?  Why
do we want this in the first place?  ATA error reporting is
rudimentary but it more or less works and I'm not sure whether we'd
want to overhaul its basic behaviors at this stage.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH libata/for-4.2-fixes] libata: disable NCQ autosense

2015-08-03 Thread Tejun Heo
On Mon, Aug 03, 2015 at 11:16:21AM -0400, Tejun Heo wrote:
> From 8c0fa3e7ca99b0d6d96cb2cf194a912f643da7c5 Mon Sep 17 00:00:00 2001
> From: Tejun Heo 
> Date: Mon, 3 Aug 2015 11:10:45 -0400
> 
> fe7173c206de ("libata: Implement support for sense data reporting")
> and subsequent patches enabled NCQ autosense reporting; unfortunately,
> this breaks libata EH behavior for devices which support the feature
> because it assumes that only ATAPI devices report sense data and that
> sense data trumps explicit device error indication.

So, this isn't enough and apparently you're bypassing the entire
device side error analyzing when the device reports sense data.  Sorry
that I missed this on the first submission but I can't see how this
would work.  Even if you fix the immediate issue in this thread, how
is it gonna handle link error reported by the device?  Who's gonna
tell libata EH that the link better be reset and may be sped down if
errors keep occurring?  This is a significant regression in EH
behavior.

I'm gonna revert the whole ACS-4 sense thing.  ATM, I'm pretty
doubtful that this buys antying substantial for us and even if so we
need to figure out a better way to integrate this.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/6] libata: Do not retry commands with valid autosense

2015-08-03 Thread Tejun Heo
Hello, James.

On Mon, Aug 03, 2015 at 08:42:43AM -0700, James Bottomley wrote:
> I'd think it would be the same reason as all modern transports: it's
> faster and allows processing of sense data in-band.  Under the old
> regime, the device is effectively frozen until you collect the data.
> Under autosense, the data is collected as part of the in-band command
> processing, so it doesn't stall the device.
> 
> Modern drives (and protocols) are moving towards being somewhat more
> chatty with sense data.  It doesn't just signal an error, mostly it's
> just reporting about drive characteristics or other advisory stuff.
> This means that if you handle it the old way, you'll get more drive
> stalls and a corresponding reduction in throughput.

The problem is not the "auto" part but the "sense" part, I guess.  ATA
devices (the harddisks) never reported sense data and instead had a
more rudimentary error bits and for newer devices NCQ log pages, so
libata EH decodes those error information and takes appropriate
actions for the indicated error condition.

Hannes's patchset makes ATA devices mostly bypass libata EH when sense
data is present.  For, say, unrecoverable read errors, it'd be
possible to make this scheme work (broken currently tho); however,
libata and SCSI aren't that closely tied and there currently is no way
for SCSI to tell libata that, e.g., link error was detected on the
device side, so libata will fail to take link recovery actions on
those cases.

This *can* be made to work in a couple different ways but what's
implemented now is pretty broken and making it work properly in any
other way than integrating sense decoding into libata EH would require
major restructuring of the whole thing which I'm not sure would be
worthwhile at this point.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/6] libata: Do not retry commands with valid autosense

2015-08-03 Thread Tejun Heo
Hello, James.

On Mon, Aug 03, 2015 at 09:44:06AM -0700, James Bottomley wrote:
> I'm not arguing that *this* patch is the best way to do it.  You asked
> *why* autosense and that's what I answered.  I think there's time to

Heh, that was mostly me being confused.  I was thinking NCQ autosense
was the only way ATA devices would report sense data.

> work out the implementation details to get them to be correct and well
> structured.

Yeah, definitely.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/6] libata: Do not retry commands with valid autosense

2015-08-03 Thread Tejun Heo
Hello, Hannes.

On Mon, Aug 03, 2015 at 06:47:46PM +0200, Hannes Reinecke wrote:
> At the moment NCQ autosense is mostly used to provide the host with more
> details for a failed I/O. The typical case here is (no small surprise)
> ZAC disks, which use autosense to inform the host about
> a malformed I/O.
>
> It is _not_ being used as a replacement for existing error behaviour,
> (ie link errors are not being signalled with that; how could they
> if there is no link?); in fact, during testing I"ve seen both, autosense

Hmmm?  Devices can report link error via TF bits and you're bypassing
TF analysis completley if sense data is present.

> I/O failures and normal I/O failures for which autosense is
> not set, and the normal error handling kicks in.
> 
> It's not that I've disable the original error handler completely,
> it's only bypassed for I/O failure where a sense code is provided.
> And the drive surely knows which error occurs, so we'd be daft not be
> using that.

The patches are altering EH actions in a very subtle way depending on
*how* an error is reported, not *what* is reported, which is a pretty
silly thing to do.  It makes things a lot more confusing to follow and
predict.  I really don't think this is an acceptable behavior.

> So I think disabling autosense completely is a bit extreme...

Please restructure the feature so that it doesn't interfere with the
usual EH behavior.  e.g. leave the EH actions alone unless explicitly
necessary but report detailed error information upwards.  If the extra
error information can be helpful in determining what EH actions to
take, factoring in that information can be helpful too but I'm not too
convinced that'd make a huge difference.

Also, please consider that ATA_QCFLAG_SENSE_VALID handling assumes
that the reporting device is an ATAPI device and the command in
question is not a regular IO one.  That's why EH ignores AC_ERR_DEV or
AC_ERR_OTHER if ATA_QCFLAG_SENSE_VALID is set.  This doesn't work out
for the new ATA usage at all.

For now, I've reverted the changes as this is actively detrimental.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: IO failures with SMR drives at latest kernel versions

2015-08-22 Thread Tejun Heo
Hello,

On Fri, Aug 21, 2015 at 10:37:44PM -0700, Anatol Pomozov wrote:
...
> https://bbs.archlinux.org/viewtopic.php?id=199351 A lot of people
> report the same problem as mine. It was found that the problem appears
> only starting from 3.19, with kernel 3.18 or Windows these drives work
> fine. It is recommended to revert this change
> 9162c6579bf90b3f5ddb7e3a6c6fa946c1b4cbeb "libata: Implement
> ATA_DEV_ZAC" that seems fixes the issue.

Hannes?

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 3/4] ata: Enabling ATA Command Priorities

2016-10-13 Thread Tejun Heo
Hello,

On Thu, Oct 13, 2016 at 04:00:30PM -0700, Adam Manzanares wrote:
> This patch checks to see if an ATA device supports NCQ command priorities.
> If so and the user has specified an iocontext that indicates
> IO_PRIO_CLASS_RT then we build a tf with a high priority command.
> 
> This is done to improve the tail latency of commands that are high
> priority by passing priority to the device.
> 
> Signed-off-by: Adam Manzanares 

Looks good to me.  Once the block changes are applied, I'll pull it
into libata tree and apply the ata part on top.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 4/4] ata: ATA Command Priority Disabled By Default

2016-10-13 Thread Tejun Heo
Hello, Adam.

Sorry about late reply.  Was on vacation.

On Thu, Oct 13, 2016 at 04:00:31PM -0700, Adam Manzanares wrote:
> Add a sysfs entry to turn on priority information being passed
> to a ATA device. By default this feature is turned off.
> 
> This patch depends on ata: Enabling ATA Command Priorities

Looks generally good but can we please use a device attribute name
which is more specific - ie. enable_ncq_prio?

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v6 1/3] block: Add iocontext priority to request

2016-10-19 Thread Tejun Heo
Hello,

On Mon, Oct 17, 2016 at 11:27:28AM -0700, Adam Manzanares wrote:
> Patch adds an association between iocontext ioprio and the ioprio of a
> request. This is done to enable request based drivers the ability to
> act on priority information stored in the request. An example being
> ATA devices that support command priorities. If the ATA driver discovers
> that the device supports command priorities and the request has valid
> priority information indicating the request is high priority, then a high
> priority command can be sent to the device. This should improve tail
> latencies for high priority IO on any device that queues requests
> internally and can make use of the priority information stored in the
> request.
> 
> The ioprio of the request is set in blk_rq_set_prio which takes the
> request and the ioc as arguments. If the ioc is valid in blk_rq_set_prio
> then the iopriority of the request is set as the iopriority of the ioc.
> In init_request_from_bio a check is made to see if the ioprio of the bio
> is valid and if so then the request prio comes from the bio.
> 
> Signed-off-by: Adam Manzananares 

Jens, if you're okay with it, I can route this through
libata/for-4.10, or this can be applied to block and libata tree can
pull from it.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v6 3/3] ata: ATA Command Priority Disabled By Default

2016-10-19 Thread Tejun Heo
On Mon, Oct 17, 2016 at 11:27:30AM -0700, Adam Manzanares wrote:
> Add a sysfs entry to turn on priority information being passed
> to a ATA device. By default this feature is turned off.
> 
> This patch depends on ata: Enabling ATA Command Priorities
> 
> Signed-off-by: Adam Manzanares 
> ---
>  drivers/ata/libahci.c |  1 +
>  drivers/ata/libata-core.c |  2 +-
>  drivers/ata/libata-scsi.c | 68 
> +++
>  include/linux/libata.h|  7 +
>  4 files changed, 77 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/ata/libahci.c b/drivers/ata/libahci.c
> index 0d028ea..0e17285 100644
> --- a/drivers/ata/libahci.c
> +++ b/drivers/ata/libahci.c
> @@ -140,6 +140,7 @@ EXPORT_SYMBOL_GPL(ahci_shost_attrs);
>  struct device_attribute *ahci_sdev_attrs[] = {
>   &dev_attr_sw_activity,
>   &dev_attr_unload_heads,
> + &dev_attr_ncq_prio_on,

I'll rename it to ncq_prio_enable while applying but otherwise looks
good to me.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v6 2/3] ata: Enabling ATA Command Priorities

2016-10-19 Thread Tejun Heo
Hello,

I removed ata_ncq_prio_enabled() as it was colliding with the sysfs
file rename and these trivial test functions tend to obscure more than
help anything.

Thanks.

-- 8< --
>From 8e061784b51ec4a4efed0deaafb5bd9725bf5b06 Mon Sep 17 00:00:00 2001
From: Adam Manzanares 
Date: Mon, 17 Oct 2016 11:27:29 -0700
Subject: [PATCH 1/2] ata: Enabling ATA Command Priorities

This patch checks to see if an ATA device supports NCQ command priorities.
If so and the user has specified an iocontext that indicates
IO_PRIO_CLASS_RT then we build a tf with a high priority command.

This is done to improve the tail latency of commands that are high
priority by passing priority to the device.

tj: Removed trivial ata_ncq_prio_enabled() and open-coded the test.

Signed-off-by: Adam Manzanares 
Signed-off-by: Tejun Heo 
---
 drivers/ata/libata-core.c | 35 ++-
 drivers/ata/libata-scsi.c |  6 +-
 drivers/ata/libata.h  |  2 +-
 include/linux/ata.h   |  6 ++
 include/linux/libata.h|  3 +++
 5 files changed, 49 insertions(+), 3 deletions(-)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 223a770..8346faf 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -739,6 +739,7 @@ u64 ata_tf_read_block(const struct ata_taskfile *tf, struct 
ata_device *dev)
  * @n_block: Number of blocks
  * @tf_flags: RW/FUA etc...
  * @tag: tag
+ * @class: IO priority class
  *
  * LOCKING:
  * None.
@@ -753,7 +754,7 @@ u64 ata_tf_read_block(const struct ata_taskfile *tf, struct 
ata_device *dev)
  */
 int ata_build_rw_tf(struct ata_taskfile *tf, struct ata_device *dev,
u64 block, u32 n_block, unsigned int tf_flags,
-   unsigned int tag)
+   unsigned int tag, int class)
 {
tf->flags |= ATA_TFLAG_ISADDR | ATA_TFLAG_DEVICE;
tf->flags |= tf_flags;
@@ -785,6 +786,12 @@ int ata_build_rw_tf(struct ata_taskfile *tf, struct 
ata_device *dev,
tf->device = ATA_LBA;
if (tf->flags & ATA_TFLAG_FUA)
tf->device |= 1 << 7;
+
+   if (dev->flags & ATA_DFLAG_NCQ_PRIO) {
+   if (class == IOPRIO_CLASS_RT)
+   tf->hob_nsect |= ATA_PRIO_HIGH <<
+ATA_SHIFT_PRIO;
+   }
} else if (dev->flags & ATA_DFLAG_LBA) {
tf->flags |= ATA_TFLAG_LBA;
 
@@ -2156,6 +2163,30 @@ static void ata_dev_config_ncq_non_data(struct 
ata_device *dev)
}
 }
 
+static void ata_dev_config_ncq_prio(struct ata_device *dev)
+{
+   struct ata_port *ap = dev->link->ap;
+   unsigned int err_mask;
+
+   err_mask = ata_read_log_page(dev,
+ATA_LOG_SATA_ID_DEV_DATA,
+ATA_LOG_SATA_SETTINGS,
+ap->sector_buf,
+1);
+   if (err_mask) {
+   ata_dev_dbg(dev,
+   "failed to get Identify Device data, Emask 0x%x\n",
+   err_mask);
+   return;
+   }
+
+   if (ap->sector_buf[ATA_LOG_NCQ_PRIO_OFFSET] & BIT(3))
+   dev->flags |= ATA_DFLAG_NCQ_PRIO;
+   else
+   ata_dev_dbg(dev, "SATA page does not support priority\n");
+
+}
+
 static int ata_dev_config_ncq(struct ata_device *dev,
   char *desc, size_t desc_sz)
 {
@@ -2205,6 +2236,8 @@ static int ata_dev_config_ncq(struct ata_device *dev,
ata_dev_config_ncq_send_recv(dev);
if (ata_id_has_ncq_non_data(dev->id))
ata_dev_config_ncq_non_data(dev);
+   if (ata_id_has_ncq_prio(dev->id))
+   ata_dev_config_ncq_prio(dev);
}
 
return 0;
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 9cceb4a..2bccc3c 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -50,6 +50,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "libata.h"
 #include "libata-transport.h"
@@ -1755,6 +1756,8 @@ static unsigned int ata_scsi_rw_xlat(struct 
ata_queued_cmd *qc)
 {
struct scsi_cmnd *scmd = qc->scsicmd;
const u8 *cdb = scmd->cmnd;
+   struct request *rq = scmd->request;
+   int class = IOPRIO_PRIO_CLASS(req_get_ioprio(rq));
unsigned int tf_flags = 0;
u64 block;
u32 n_block;
@@ -1821,7 +1824,8 @@ static unsigned int ata_scsi_rw_xlat(struct 
ata_queued_cmd *qc)
qc->nbytes = n_block * scmd->device->sector_size;
 
rc = ata_build_rw_tf(&qc->tf, qc->dev, block, n_block, tf_flags,
-qc->tag);
+qc

Re: [PATCH v6 0/3] Enabling ATA Command Priorities

2016-10-19 Thread Tejun Heo
On Mon, Oct 17, 2016 at 11:27:27AM -0700, Adam Manzanares wrote:
> This patch builds ATA commands with high priority if the iocontext of a 
> process
> is set to real time. The goal of the patch is to improve tail latencies of 
> workloads that use higher queue depths. This requires setting the iocontext 
> ioprio on the request when it is initialized
> 
> This patch has been tested with an Ultrastar HE8 HDD and cuts the 
> the p99.99 tail latency of foreground IO from 2s down to 72ms when
> using the deadline scheduler. This patch works independently of the
> scheduler so it can be used with all of the currently available 
> request based schedulers. 
> 
> Foreground IO, for the previously described results, is an async fio job 
> submitting 4K read requests at a QD of 1 to the HDD. The foreground IO is set 
> with the iopriority class of real time. The background workload is another fio
> job submitting read requests at a QD of 32 to the same HDD with default 
> iopriority.
> 
> This feature is enabled for ATA devices by setting the ata ncq_prio_on device 
> attribute to 1. An ATA device is also checked to see if the device supports 
> per
> command priority.

Applied 1-3 to libata/for-4.10 w/ some modifications.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v6 3/3] ata: ATA Command Priority Disabled By Default

2016-10-19 Thread Tejun Heo
Hello,

Removed ata_ncq_prio_on() and renamed _on to _enable.  If I messed up
anything, please let me know.

Also, can you please send a follow-up patch to make the store function
reject prio enabling if the device doesn't support it?

Thanks.

-- 8< --
>From 84f95243b5439a20c33837075b88926bfa00c4ec Mon Sep 17 00:00:00 2001
From: Adam Manzanares 
Date: Mon, 17 Oct 2016 11:27:30 -0700
Subject: [PATCH 2/2] ata: ATA Command Priority Disabled By Default

Add a sysfs entry to turn on priority information being passed
to a ATA device. By default this feature is turned off.

This patch depends on ata: Enabling ATA Command Priorities

tj: Renamed ncq_prio_on to ncq_prio_enable and removed trivial
ata_ncq_prio_on() and open-coded the test.

Signed-off-by: Adam Manzanares 
Signed-off-by: Tejun Heo 
---
 drivers/ata/libahci.c |  1 +
 drivers/ata/libata-core.c |  3 ++-
 drivers/ata/libata-scsi.c | 68 +++
 include/linux/libata.h|  2 ++
 4 files changed, 73 insertions(+), 1 deletion(-)

diff --git a/drivers/ata/libahci.c b/drivers/ata/libahci.c
index 0d028ea..ee7db31 100644
--- a/drivers/ata/libahci.c
+++ b/drivers/ata/libahci.c
@@ -140,6 +140,7 @@ EXPORT_SYMBOL_GPL(ahci_shost_attrs);
 struct device_attribute *ahci_sdev_attrs[] = {
&dev_attr_sw_activity,
&dev_attr_unload_heads,
+   &dev_attr_ncq_prio_enable,
NULL
 };
 EXPORT_SYMBOL_GPL(ahci_sdev_attrs);
diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 8346faf..b294339 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -787,7 +787,8 @@ int ata_build_rw_tf(struct ata_taskfile *tf, struct 
ata_device *dev,
if (tf->flags & ATA_TFLAG_FUA)
tf->device |= 1 << 7;
 
-   if (dev->flags & ATA_DFLAG_NCQ_PRIO) {
+   if ((dev->flags & ATA_DFLAG_NCQ_PRIO) &&
+   (dev->flags & ATA_DFLAG_NCQ_PRIO_ENABLE)) {
if (class == IOPRIO_CLASS_RT)
tf->hob_nsect |= ATA_PRIO_HIGH <<
 ATA_SHIFT_PRIO;
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 2bccc3c..87597a3 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -271,6 +271,73 @@ DEVICE_ATTR(unload_heads, S_IRUGO | S_IWUSR,
ata_scsi_park_show, ata_scsi_park_store);
 EXPORT_SYMBOL_GPL(dev_attr_unload_heads);
 
+static ssize_t ata_ncq_prio_enable_show(struct device *device,
+   struct device_attribute *attr, char 
*buf)
+{
+   struct scsi_device *sdev = to_scsi_device(device);
+   struct ata_port *ap;
+   struct ata_device *dev;
+   bool ncq_prio_enable;
+   int rc = 0;
+
+   ap = ata_shost_to_port(sdev->host);
+
+   spin_lock_irq(ap->lock);
+   dev = ata_scsi_find_dev(ap, sdev);
+   if (!dev) {
+   rc = -ENODEV;
+   goto unlock;
+   }
+
+   ncq_prio_enable = dev->flags & ATA_DFLAG_NCQ_PRIO_ENABLE;
+
+unlock:
+   spin_unlock_irq(ap->lock);
+
+   return rc ? rc : snprintf(buf, 20, "%u\n", ncq_prio_enable);
+}
+
+static ssize_t ata_ncq_prio_enable_store(struct device *device,
+struct device_attribute *attr,
+const char *buf, size_t len)
+{
+   struct scsi_device *sdev = to_scsi_device(device);
+   struct ata_port *ap;
+   struct ata_device *dev;
+   long int input;
+   unsigned long flags;
+   int rc;
+
+   rc = kstrtol(buf, 10, &input);
+   if (rc)
+   return rc;
+   if ((input < 0) || (input > 1))
+   return -EINVAL;
+
+   ap = ata_shost_to_port(sdev->host);
+
+   spin_lock_irqsave(ap->lock, flags);
+   dev = ata_scsi_find_dev(ap, sdev);
+   if (unlikely(!dev)) {
+   rc = -ENODEV;
+   goto unlock;
+   }
+
+   if (input)
+   dev->flags |= ATA_DFLAG_NCQ_PRIO_ENABLE;
+   else
+   dev->flags &= ~ATA_DFLAG_NCQ_PRIO_ENABLE;
+
+unlock:
+   spin_unlock_irqrestore(ap->lock, flags);
+
+   return rc ? rc : len;
+}
+
+DEVICE_ATTR(ncq_prio_enable, S_IRUGO | S_IWUSR,
+   ata_ncq_prio_enable_show, ata_ncq_prio_enable_store);
+EXPORT_SYMBOL_GPL(dev_attr_ncq_prio_enable);
+
 void ata_scsi_set_sense(struct ata_device *dev, struct scsi_cmnd *cmd,
u8 sk, u8 asc, u8 ascq)
 {
@@ -402,6 +469,7 @@ EXPORT_SYMBOL_GPL(dev_attr_sw_activity);
 
 struct device_attribute *ata_common_sdev_attrs[] = {
&dev_attr_unload_heads,
+   &dev_attr_ncq_prio_enable,
NULL
 };
 EXPORT_SYMBOL_GPL(ata_common_sdev_attrs);
diff --git a/include/linux/libata.h b/include/linux/libata.h
index 90b69a6..c1

Re: [PATCH] mvsas: fix error return code in mvs_task_prep()

2016-10-31 Thread Tejun Heo
On Mon, Oct 31, 2016 at 03:04:10PM +, Wei Yongjun wrote:
> From: Wei Yongjun 
> 
> Fix to return error code -ENOMEM from the error handling
> case instead of 0, as done elsewhere in this function.
> 
> Signed-off-by: Wei Yongjun 

Applied to libata/for-4.9-fixes.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] mvsas: fix error return code in mvs_task_prep()

2016-11-01 Thread Tejun Heo
On Mon, Oct 31, 2016 at 10:29:26AM -0600, Tejun Heo wrote:
> On Mon, Oct 31, 2016 at 03:04:10PM +, Wei Yongjun wrote:
> > From: Wei Yongjun 
> > 
> > Fix to return error code -ENOMEM from the error handling
> > case instead of 0, as done elsewhere in this function.
> > 
> > Signed-off-by: Wei Yongjun 
> 
> Applied to libata/for-4.9-fixes.

I messed up.  This should have gone through scsi, not libata.  Chatted
with Martin and as the patch is a small obvious fix decided to leave
it in libata tree.  My apologies for the confusion.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] ata: xgene: Enable NCQ support for APM X-Gene SATA controller hardware v1.1

2016-11-09 Thread Tejun Heo
Hello,

On Wed, Sep 14, 2016 at 04:15:00PM +0530, Rameshwar Sahu wrote:
> > @@ -821,8 +823,6 @@ static int xgene_ahci_probe(struct platform_device
> > *pdev)
> > dev_warn(&pdev->dev, "%s: Error reading
> > device info. Assume version1\n",
> > __func__);
> > version = XGENE_AHCI_V1;
> > -   } else if (info->valid & ACPI_VALID_CID) {
> > -   version = XGENE_AHCI_V2;

Can you please explain this part a bit?  Everything else looks good to
me.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] ata: xgene: Enable NCQ support for APM X-Gene SATA controller hardware v1.1

2016-11-15 Thread Tejun Heo
Hello, Rameshwar.

On Fri, Nov 11, 2016 at 01:36:28PM +0530, Rameshwar Sahu wrote:
> Hi Tejun,
> 
> On Wed, Nov 9, 2016 at 10:15 PM, Tejun Heo  wrote:
> > Hello,
> >
> > On Wed, Sep 14, 2016 at 04:15:00PM +0530, Rameshwar Sahu wrote:
> >> > @@ -821,8 +823,6 @@ static int xgene_ahci_probe(struct platform_device
> >> > *pdev)
> >> > dev_warn(&pdev->dev, "%s: Error reading
> >> > device info. Assume version1\n",
> >> > __func__);
> >> > version = XGENE_AHCI_V1;
> >> > -   } else if (info->valid & ACPI_VALID_CID) {
> >> > -   version = XGENE_AHCI_V2;
> >
> > Can you please explain this part a bit?  Everything else looks good to
> > me.
> 
> Here we should not assume XGENE_AHCI_V2 always in case of having valid
> _CID in ACPI table.
> I need to remove this assumption because V1_1 has also valid _CID for
> backward compatibly with v1.

Can you please repost with the above explanation added to the commit
message?

Thanks!

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] ata: xgene: Enable NCQ support for APM X-Gene SATA controller hardware v1.1

2017-01-18 Thread Tejun Heo
Hello,

On Tue, Jan 17, 2017 at 08:25:21PM +0530, Rameshwar Sahu wrote:
> Hi Tejun,
> 
> On Fri, Nov 18, 2016 at 3:15 PM, Rameshwar Prasad Sahu  wrote:
> > This patch enables NCQ support for APM X-Gene SATA controller hardware v1.1
> > that was broken with hardware v1.0. Second thing, here we should not assume
> > XGENE_AHCI_V2 always in case of having valid _CID in ACPI table. I need to
> > remove this assumption because V1_1 also has a valid _CID for backward
> > compatibly with v1.
> >
> > v2 changes:
> > 1. Changed patch description
> >
> > Signed-off-by: Rameshwar Prasad Sahu 

Hmm... I don't have the patch in my queue.  Can you please resend it?

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 0/4]ata: Fixes related to APM X-Gene SATA host controller driver.

2014-07-29 Thread Tejun Heo
On Tue, Jul 29, 2014 at 12:24:48PM +0530, Suman Tripathi wrote:
> This patch set contains a couple of fixes related to APM X-Gene SATA
> controller driver.
> 
> v2 Change:
>1. Drop the Link down retry patch from this patch set.
> 
> v4 Change:
>1. Drop the patch to fix the csr-mask in dts for PHY clock
>  node of SATA Host Controller 1.
>2. Add the patch to correct the OOB tunning parameters for
>  the COMINIT/COMWAKE parameters.
>3. Add the patch to remove the NCQ support from the APM
>  X-Gene AHCI SATA Host controller driver.
>4. Add the patch to remove the clock and PHY reference nodes
>  from the APM X-Gene Host controller dts node.
> 
> Signed-off-by: Loc Ho 
> Signed-off-by: Suman Tripathi 

Applied 1 and 3 to libata/for-3.17.  4 doesn't apply.  Also, please
prefix the patches with "ahci_xgene: " from now on.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 0/4]ata: Fixes related to APM X-Gene SATA host controller driver.

2014-07-29 Thread Tejun Heo
On Tue, Jul 29, 2014 at 08:05:45PM +0530, Suman Tripathi wrote:
> Hi,
> 
> Applied 1 and 3 to libata/for-3.17.  4 doesn't apply.  Also, please
> prefix the patches with "ahci_xgene: " from now on.
> 
> [suman] : You mean the "Remove NCQ patch" is not applied. Any reason for
> that ?

I meant that the patch doesn't apply to libata/for-3.17.  Please
always generate patches against the current devel branch.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v6 1/2] ahci_xgene: Removing NCQ support from the APM X-Gene SoC AHCI SATA Host Controller driver.

2014-08-16 Thread Tejun Heo
On Fri, Aug 08, 2014 at 09:44:25PM +0530, Suman Tripathi wrote:
> This patch removes the NCQ support from the APM X-Gene SoC AHCI
> Host Controller driver as it doesn't support it.
> 
> Signed-off-by: Loc Ho 
> Signed-off-by: Suman Tripathi 

Applied to libata/for-3.17-fixes w/ stable cc'd.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 2/3] ahci_xgene: Skip the PHY and clock initialization if already configured by the firmware.

2014-08-19 Thread Tejun Heo
On Tue, Aug 19, 2014 at 12:01:50PM +0530, Suman Tripathi wrote:
> This patch implements the feature to skip the PHY and clock
> initialization if it is already configured by the firmware.
> 
> Signed-off-by: Loc Ho 
> Signed-off-by: Suman Tripathi 
...
> +static int xgene_ahci_is_memram_inited(struct xgene_ahci_context *ctx)
> +{
> + void __iomem *diagcsr = ctx->csr_diag;
> +
> + if (readl(diagcsr + CFG_MEM_RAM_SHUTDOWN) == 0 &&
> + readl(diagcsr + BLOCK_MEM_RDY) == 0x)
> + return 1;
> + return 0;
> +}

Please make it return bool.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 3/3] ahci_xgene: Fix the link down in first attempt for the APM X-Gene SoC AHCI SATA host controller driver.

2014-08-19 Thread Tejun Heo
On Tue, Aug 19, 2014 at 12:01:51PM +0530, Suman Tripathi wrote:
> The link down issue in first attempt happens due to 2 H/W errata below:
> 
> 1. Due to HW errata, during speed negotiation, sometimes controller
> is not able to detect ALIGN at GEN3(6Gbps) within 54.6us results in
> a timeout. This issue can be recovered by issuing a COMRESET again.
> 
> 2. Due to HW errata, although ALIGH detection is successfull, due to
> 8b/10b and disparity BERR, sometimes the signature from the drive is
> not received successfully by the Host controller. Due to this the
> communication with the host and drive is not established due to
> locking of CDR(clock and data recovery) circuit. This issue can be
> recovered by issuing a COMRESET again.
> 
> This patch fixes the above issues by retrying the COMRESET with a
> maximum attempts of 3.

It's kinda nasty but if it's necessary.  That said, can you please
update the comment so that it actually matches the code?  Also,
Wouldn't it be better to check how the reset failed before retrying?

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 3/3] ahci_xgene: Fix the link down in first attempt for the APM X-Gene SoC AHCI SATA host controller driver.

2014-08-21 Thread Tejun Heo
On Thu, Aug 21, 2014 at 01:48:00PM +0530, Suman Tripathi wrote:
> [suman] : The problem is COMRESET didn't failed. I meant the hardreset is
> successful (return 0) but the device is not detected even if device is
> present due to speed negotiation failure. For that reason I check for the
> Pxstatus and retried.

Alright, please update the comment to explain what's going on.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 3/3] ahci_xgene: Fix the link down in first attempt for the APM X-Gene SoC AHCI SATA host controller driver.

2014-08-25 Thread Tejun Heo
On Sun, Aug 24, 2014 at 12:07:27AM +0530, Suman Tripathi wrote:
> This patch addresses two HW erratas as described below by retrying the
> COMRESET:
> 
> 1. During speed negotiation, controller is not able to detect ALIGN
> at GEN3(6Gbps) within 54.6us and results in a timeout. This issue can
> be recovered by issuing a COMRESET.
> 
> 2. Although ALIGN detection is successful, 8b/10b and disparity bit
> errors can occur which result in the signature FIS not received
> successfully by the Host controller. Due to this, the PHY communication
> between the host and drive is not established because of CDR(clock and
> data recovery) circuit doesn't lock. This issue can be recoverd by issuing
> a COMRESET.
> 
> The above retries are issued only if the port status register PXSTATUS
> reports device presence detected but PHY communication not established.
> The maximum retry attempts are 3.

Didn't I ask you to update the comment to explain what's going on?  Or
is the existing comment already sufficient?

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 3/3] ahci_xgene: Fix the link down in first attempt for the APM X-Gene SoC AHCI SATA host controller driver.

2014-08-26 Thread Tejun Heo
On Tue, Aug 26, 2014 at 12:17:35PM +0530, Suman Tripathi wrote:
> Didn't I ask you to update the comment to explain what's going on?
> [suman] : can you specifically tell which part of the comment is not clear
> and need more explanation?

The comment on top of the function doesn't seem to match what's being
implemented.  In addition, it's generally not very useful to list the
actual algorithm in text.  Put algorithm in code and explain the
summary and rationales for it in the comments.  Nothing explains why
the retries are being done.

> is the existing comment already sufficient?
> [suman] : The existing comment is sufficient .

No, this isn't.  You don't have to include a novel to explain it but
there's something different going on here and you should provide
information on why this sort of deviation is necessary.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v2 5/6] mptsas: use async probe

2014-09-05 Thread Tejun Heo
On Thu, Sep 04, 2014 at 11:37:26PM -0700, Luis R. Rodriguez wrote:
> From: "Luis R. Rodriguez" 
> 
> Its reported that mptsas can at times take over 30 seconds
> to recognize SCSI storage devices [0], this is done on the
> driver's probe path. Use the the new asynch probe to
> circumvent systemd from killing this driver.

Again, *ANY* SCSI storage controller may.  The fact that a specific
bug report was filed on mptsas doesn't really mean anything.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-05 Thread Tejun Heo
On Thu, Sep 04, 2014 at 11:37:24PM -0700, Luis R. Rodriguez wrote:
...
> + /*
> +  * I got SIGKILL, but wait for 60 more seconds for completion
> +  * unless chosen by the OOM killer. This delay is there as a
> +  * workaround for boot failure caused by SIGKILL upon device
> +  * driver initialization timeout.
> +  *
> +  * N.B. this will actually let the thread complete regularly,
> +  * wait_for_completion() will be used eventually, the 60 second
> +  * try here is just to check for the OOM over that time.
> +  */
> + WARN_ONCE(!test_thread_flag(TIF_MEMDIE),
> +   "Got SIGKILL but not from OOM, if this issue is on 
> probe use .driver.async_probe\n");
> + for (i = 0; i < 60 && !test_thread_flag(TIF_MEMDIE); i++)
> + if (wait_for_completion_timeout(&done, HZ))
> + goto wait_done;
> +

Ugh... Jesus, this is way too hacky, so now we fail on 90s timeout
instead of 30?  Why do we even need this with the proposed async
probing changes?

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-05 Thread Tejun Heo
On Fri, Sep 05, 2014 at 12:47:16AM -0700, Luis R. Rodriguez wrote:
> Ah -- well without it the way we "find" drivers that need this new
> "async feature" is by a bug report and folks saying their system can't
> boot, or they say their device doesn't come up. That's all. Tracing
> this to systemd and a timeout was one of the most ugliest things ever.
> There two insane bug reports you can go check:
> 
> mptsas was the first:
> 
> http://article.gmane.org/gmane.linux.kernel/1669550
> https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1297248
> 
> Then cxgb4:
> 
> https://bugzilla.novell.com/show_bug.cgi?id=877622
> 
> I only had Cc'd you on the newest gem pata_marvell :
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=59581
> 
> We can't seriously expect to be doing all this work for every driver.
> a WARN_ONCE() would enable us to find the drivers that need this new
> async probe "feature".

This whole approach of trying to mark specific drivers as needing
"async probing" is completely broken for the problem at hand.  It
can't address the problem adequately while breaking backward
compatibility.  I don't think this makes much sense.

Nacked-by: Tejun Heo 

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-05 Thread Tejun Heo
Hello,

On Fri, Sep 05, 2014 at 09:44:05AM -0700, Dmitry Torokhov wrote:
> Which problem are we talking about here though? It does solve the slow device
> stalling the rest if the kernel booting (non-module case) for me.

The other one.  The one with timeout.  Neither cxgb4 or pata_marvell
has slow probing stalling boot problem.

> I also reject the notion that anyone should be relying on drivers to be fully
> bound on module loading. It is not nineties anymore. We have hot pluggable
> buses, deferred probing, and even for not hot-pluggable ones the module
> providing the device itself might not be yet loaded. Any scripts that expect 
> to
> find device 100% ready after module loading are simply broken.

We've been treating loading + probing as a single operation when
loading drivers and the assumption has always been that the existing
devices at the time of loading finished probing by the time insmod
finishes.  We now need to split loading and probing and wait for each
of them differently.  The *only* thing we can do is somehow making the
issuer specify that it's gonna wait for probing separately.  I'm not
sure this can even be up for discussion.  We're talking about a major
userland visible behavior change.  We simply can't change it
underneath the existing users.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-05 Thread Tejun Heo
Hello, Dmitry.

On Fri, Sep 05, 2014 at 11:10:03AM -0700, Dmitry Torokhov wrote:
> I do not agree that it is actually user-visible change: generally speaking you
> do not really know if device is there or not. They come and go. Like I said,
> consider all permutations, with hot-pluggable buses, deferred probing, etc,

It is for storage devices which always have guaranteed synchronous
probing on module load and well-defined probing order.  Sure, modern
setups are a lot more dynamic but I'm quite certain that there are
setups in the wild which depend on storage driver loading being
synchronous.  We can't simply declare one day that such behavior is
broken and break, most likely, their boots.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-05 Thread Tejun Heo
On Sat, Sep 06, 2014 at 07:29:56AM +0900, Tejun Heo wrote:
> It is for storage devices which always have guaranteed synchronous
> probing on module load and well-defined probing order.  Sure, modern
> setups are a lot more dynamic but I'm quite certain that there are
> setups in the wild which depend on storage driver loading being
> synchronous.  We can't simply declare one day that such behavior is
> broken and break, most likely, their boots.

To add a bit, if the argument here is that dependency on such behavior
shouldn't exist and module loading and device probing should always be
asynchronous, the right approach is implementing "synchronous_probing"
flag not the other way around.  I actually wouldn't hate to see that
change happening but whoever submits and routes such a change should
be ready for a major shitstorm, I'm afraid.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-05 Thread Tejun Heo
Hello, Luis.

On Fri, Sep 05, 2014 at 11:12:17AM -0700, Luis R. Rodriguez wrote:
> Meanwhile we are allowing a major design consideration such as a 30
> second timeout for both init + probe all of a sudden become a hard
> requirement for device drivers. I see your point but can't also be
> introducing major design changes willy nilly either. We *need* a
> solution for the affected drivers.

Yes, make the behavior specifically specified from userland.  When did
I ever say that there should be no solution for the problem?  I've
been saying that the behavior should be selected from userland from
the get-go, haven't I?

I have no idea how the seleciton should be.  It could be per-insmod or
maybe just a system-wide flag with explicit exceptions marked on
drivers is good enough.  I don't know.

> Also what stops drivers from going ahead and just implementing their
> own async probe? Would that now be frowned upon as it strives away

The drivers can't.  How many times should I explain the same thing
over and over again.  libata can't simply make probing asynchronous
w.r.t. module loading no matter how it does it.  Yeah, sure, there can
be other drivers which can do that without most people noticing it but
a storage driver isn't one of them and the storage drivers are the
problematic ones already, right?

> from the original design? The bool would let those drivers do this
> easily, and we would still need to identify these drivers, although
> this particular change can be NAK'd Oleg's suggestion on
> WARN_ON(fatal_signal_pending() at the end of load_module() seems to me
> at least needed. And if its not async probe... what do those with
> failed drivers do?

I'm getting tired of explaining the same thing over and over again.
The said change was nacked because the whole approach of "let's see
which drivers get reported on the issue which exists basically for all
drivers and just change the behavior of them" is braindead.  It makes
no sense whatsoever.  It doesn't address the root cause of the problem
while making the same class of drivers behave significantly
differently for no good reason.  Please stop chasing your own tail and
try to understand the larger picture.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-05 Thread Tejun Heo
Hello, Dmitry.

On Fri, Sep 05, 2014 at 03:49:17PM -0700, Dmitry Torokhov wrote:
> On Sat, Sep 06, 2014 at 07:31:39AM +0900, Tejun Heo wrote:
> > On Sat, Sep 06, 2014 at 07:29:56AM +0900, Tejun Heo wrote:
> > > It is for storage devices which always have guaranteed synchronous
> > > probing on module load and well-defined probing order.
> 
> Agree about probing order (IIRC that is why we had to revert the
> wholesale asynchronous probing a few years back) but totally disagree
> about synchronous module loading.

I don't get it.  This is a behavior userland already depends on for
boots.  What's there to agree or disagree?  This is just a fact that
we can't do this w/o disturbing some userlands in a major way.

> Anyway, I just posted a patch that I think preserves module loading
> behavior and solves my issue with built-in modules. It does not help
> Luis' issue though (but then I think the main problem is with systemd
> being stupid there).

This sure can be worked around from userland side too by not imposing
any timeout on module loading but that said for the same reasons that
you've been arguing until now, I actually do think that it's kinda
silly to make device probing synchronous to module loading at this
time and age.  What we disagree on is not that we want to separate
those waits.  It is about how to achieve it.

> > To add a bit, if the argument here is that dependency on such behavior
> > shouldn't exist and module loading and device probing should always be
> > asynchronous, the right approach is implementing "synchronous_probing"
> > flag not the other way around.  I actually wouldn't hate to see that
> > change happening but whoever submits and routes such a change should
> > be ready for a major shitstorm, I'm afraid.
> 
> I think we already had this storm and that is why here we have opt-in
> behavior for the drivers.

It's a different shitstorm where we actively break bootings on some
userlands.  Trust me.  That's gonna be a lot worse.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-05 Thread Tejun Heo
Hello,

On Fri, Sep 05, 2014 at 03:52:48PM -0700, Dmitry Torokhov wrote:
> Ahem... and they sure it works reliably with large storage arrays? With
> SCSI doing probing asynchronously already?

I believe this has been mentioned before too but, yes, SCSI device
probing is asynchronous and parallelized but the registration of the
discovered devices are fully serialized according to driver attach
order.  Storage devices are probed in parallel and attached in a fully
deterministic order.  That part has never changed.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-05 Thread Tejun Heo
Hey,

On Fri, Sep 05, 2014 at 04:22:42PM -0700, Dmitry Torokhov wrote:
> > I don't get it.  This is a behavior userland already depends on for
> > boots.  What's there to agree or disagree?  This is just a fact that
> > we can't do this w/o disturbing some userlands in a major way.
> 
> I am just expressing my disbelief that somebody relies on module loading
> being synchronous with probing. Out of curiosity, do you have any
> pointers?

I've seen initrd scripts which depended on the behavior to wait for
storage devices over the years.  AFAIK, none of the modern distros
does it but this has been such a basic feature all along and it seems
highly unlikely to me that there's no userland remaining out there
depending on such behavior.  We do have a lot of different userlands,
many of them quite ad-hoc.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/4] libata: consolidate ata_dev_classify()

2014-09-05 Thread Tejun Heo
Hello, Hannes.

Sorry about the delay.

On Wed, Jul 30, 2014 at 09:55:08AM +0200, Hannes Reinecke wrote:
> ata_dev_classify() just uses the 'lbah' and 'lbam'
> fields from the taskfile, so we can as well use those
> as arguments and rip out the custom code from sas_ata.

I wonder whether it'd easier to just make sas code pass in
ata_taskfile instead?  The interface which takes three consecutive
u8's is kinda error-prone.

> --- a/drivers/scsi/aic94xx/aic94xx_task.c
> +++ b/drivers/scsi/aic94xx/aic94xx_task.c
> @@ -373,10 +373,10 @@ static int asd_build_ata_ascb(struct asd_ascb *ascb, 
> struct sas_task *task,
>  
>   if (unlikely(task->ata_task.device_control_reg_update))
>   scb->header.opcode = CONTROL_ATA_DEV;
> - else if (dev->sata_dev.command_set == ATA_COMMAND_SET)
> - scb->header.opcode = INITIATE_ATA_TASK;
> - else
> + else if (dev->sata_dev.class == ATA_DEV_ATAPI)
>   scb->header.opcode = INITIATE_ATAPI_TASK;
> + else
> + scb->header.opcode = INITIATE_ATA_TASK;

Are these changes covered by the patch description?  Looks like the
patch is mixing two separate logical changes.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] libata-scsi: Update SATL for ZAC drives

2014-09-05 Thread Tejun Heo
On Wed, Jul 30, 2014 at 09:55:10AM +0200, Hannes Reinecke wrote:
> ZAC (zoned-access command) drives translate into ZBC (Zoned block
> command) device type for SCSI. So implement the correct mappings
> into libata-scsi and update the SCSI command set versions.

Patch 2-3 look good to me.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v9 3/3] ahci_xgene: Fix the link down in first attempt for the APM X-Gene SoC AHCI SATA host controller driver.

2014-09-05 Thread Tejun Heo
On Thu, Aug 28, 2014 at 02:51:22PM +0530, Suman Tripathi wrote:
> Due to HW errata the APM X-Gene AHCI SATA host controller reports link
> down even if the device presence is detected. This issue is due to speed
> negotiation failure. This patch implements the algorithm to retry the
> COMRESET if PxSTAT register reports device presence detected but
> PHY communication not established. The maximum retry attempts are 3.
> 
> This patch also fixes the code to match the algorithm for the printing
> a warning message if the disparity error still exists after link up.
> 
> Signed-off-by: Loc Ho 
> Signed-off-by: Suman Tripathi 

Applied to libata/for-3.17-fixes.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v9 2/3] ahci_xgene: Skip the PHY and clock initialization if already configured by the firmware.

2014-09-05 Thread Tejun Heo
On Thu, Aug 28, 2014 at 02:51:21PM +0530, Suman Tripathi wrote:
> This patch implements the feature to skip the PHY and clock
> initialization if it is already configured by the firmware.
> 
> Signed-off-by: Loc Ho 
> Signed-off-by: Suman Tripathi 

Applied to libata/for-3.17-fixes.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/4] libata: consolidate ata_dev_classify()

2014-09-06 Thread Tejun Heo
Hello,

On Sat, Sep 06, 2014 at 10:21:51AM +0200, Hannes Reinecke wrote:
> Well, yes, in principle. I was looking into that, too.
> But then I figured that moving to ata_taskfile would be a major overhaul for
> libsas, which would be quite beyond scope here.
> And all for a puny little patch.

Hmm?  Just allocate a ata_taskfile stuct on stack when invoking the
function.  The change would be a lot smaller actually.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/4] libata: consolidate ata_dev_classify()

2014-09-07 Thread Tejun Heo
On Sun, Sep 07, 2014 at 01:24:47PM +0200, Hannes Reinecke wrote:
> Which was actually my first attempt, but then I figured I'd be
> increasing the stacksize in doing so.
> But sure, if you're okay with it I'll be redoing the patch.

The struct is only 32 bytes.  I don't think it's gonna make any
meaningful difference.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-08 Thread Tejun Heo
Hello, Luis.

On Mon, Sep 08, 2014 at 06:04:23PM -0700, Luis R. Rodriguez wrote:
> > I have no idea how the selection should be.  It could be per-insmod or
> > maybe just a system-wide flag with explicit exceptions marked on
> > drivers is good enough.  I don't know.
> 
> Its perfectly understandable if we don't know what path to take yet
> and its also understandable for it to take time to figure out --
> meanwhile though systemd already has merged a policy of a 30 second
> timeout for *all drivers* though so we therefore need:

I'm not too convinced this is such a difficult problem to figure out.
We already have most of logic in place and the only thing missing is
how to switch it.  Wouldn't something like the following work?

* Add a sysctl knob to enable asynchronous device probing on module
  load and enable asynchronous probing globally if the knob is set.

* Identify cases which can't be asynchronous and make them
  synchronous.  e.g. keep who's doing request_module() and avoid
  asynchronous probing if current is probing one of those.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-08 Thread Tejun Heo
On Tue, Sep 09, 2014 at 10:10:59AM +0900, Tejun Heo wrote:
> * Identify cases which can't be asynchronous and make them
>   synchronous.  e.g. keep who's doing request_module() and avoid
>   asynchronous probing if current is probing one of those.

That wouldn't work as we don't know what's gonna happen in userland
but we can start with just disallowing async probing for char devices
for now.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: WARNING in block layer triggered in 3.17-rc3

2014-09-08 Thread Tejun Heo
Hello,

On Mon, Sep 08, 2014 at 01:42:44PM -0400, Alan Stern wrote:
> > This looks like a nasty hack.  In theory the QUEUE_FLAG_INIT_DONE should
> > be unset on blk_unregister_queue() to match the teardown; it's only
> > accident it isn't.  del_gendisk() in sd_remove() is supposed to tear a
> > lot of queue stuff down.
> 
> It's not clear what the operative assumptions are.  The comment in
> blk_register_queue() implies that bypass is active only because it was
> set up that way when the queue was created.  The fact that
> blk_unregister_queue() doesn't call blk_queue_bypass_start() seems to
> support this view -- although it could also be a simple oversight.
> 
> Hopefully Tejun can clear this iup.

Maintaining the initial bypass till queue registration is an
optimization because shutting down a fully functional queue is a
costly operation and there are drivers which set and destroy queues
repeatedly while probing, so, yeah, it's really a special case for
when the queue is being registered for the first time.

> > Your hack seems to indicate that this doesn't work on the add->del->add
> > transtion of a gendisk.
> 
> Indeed, it does not work.

As such, the change you suggested makes perfect sense to me.  Why
wouldn't it work?

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-08 Thread Tejun Heo
On Tue, Sep 09, 2014 at 10:10:59AM +0900, Tejun Heo wrote:
> I'm not too convinced this is such a difficult problem to figure out.
> We already have most of logic in place and the only thing missing is
> how to switch it.  Wouldn't something like the following work?
> 
> * Add a sysctl knob to enable asynchronous device probing on module
>   load and enable asynchronous probing globally if the knob is set.

Alternatively, add a module-generic param "async_probe" or whatever
and use that to switch the behavior should work too.  I don't know
which way is better but either should work fine.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-08 Thread Tejun Heo
On Mon, Sep 08, 2014 at 06:26:04PM -0700, Luis R. Rodriguez wrote:
> > Alternatively, add a module-generic param "async_probe" or whatever
> > and use that to switch the behavior should work too.  I don't know
> > which way is better but either should work fine.
> 
> I take it by this you meant a generic system-wide sysctl or kernel cmd
> line option to enable this for al drivers?

Well, either global or per-insmod switch should work.  There probably
are details that I haven't mentioned - e.g. probably global switch is
easier to backport and deploy to existing systems - but as long as it
works I don't have fundmental objections either way.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-08 Thread Tejun Heo
Hello,

On Mon, Sep 08, 2014 at 06:38:34PM -0700, Luis R. Rodriguez wrote:
> OK then one only concern I would have with this is that the presence
> of such a flag doesn't necessarily mean that all drivers on a system
> have been tested for asynch probe yet. I'd feel much more comfortable

Given that the behvaior change is from driver core and that device
probing can happen post-loading anyway, I don't think we need to worry
about drivers breaking from probing made asynchronous to loading.  The
problem is the expectation of the entity which initiated loading of
the module.  If it's depending on device being probed synchronously
but insmod returns before that, it can break things.  We probably
should audit request_module() users and see which ones expect such
behavior.

> if this global flag allowed say specific drivers that *did* have such
> a bool enabled, for example. Then that would enable synchronous
> behaviour for the kernel by default, require the flag for enabling the
> new async feature but only for drivers that have been tested.

If we're gonna do the global switch, I personally think the right
approach is blacklisting instead of the other way around because each
specific driver doesn't really have much to do with it and the
exceptions are about specific use cases that we don't have a good way
to identify them from module loading path.

> That also still would not technically solve the issue of the current
> existence of the timeout, unless of course we wish to ask systemd to
> only make the timeout take effect *iff* the global sysctl flag /
> whatever was enabled.

Userland could backport a fix to set the sysctl.  Given that we need
both synchrnous and asynchronous behaviors, it's unlikely that we can
come up with a solution which doesn't need cooperation from userland.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-08 Thread Tejun Heo
Hello,

On Mon, Sep 08, 2014 at 07:28:58PM -0700, Luis R. Rodriguez wrote:
> > Given that the behvaior change is from driver core and that device
> > probing can happen post-loading anyway,
> 
> Ah but lets not forget Dmitry's requirement which is for in-kernel
> drivers. We'd need to deal with both built-in and modules. Dmitry's
> case is completely orthogonal to the systemd issue and is just needed
> to help not stall boot but I see no reason to blend these two issues
> into one requirement together.

Maybe we can piggy back the two on the same mechanism but as you said
the two issues are orthogonal.  Let's keep it that way for now.  We
need them separate anyway for backports.

> In terms of approach we would still need to decide on a path for how
> to do asynch probing for both in-kernel drivers and modules, do we
> want async_schedule(), or queue_work()? If async_schedule() do we want
> to use a new domain or a new one shared for all drivers? Priority on

I don't think async_schedule() is the right mechanism for this use
case as the mechanism is inherently opportunistic.  It also gets
tangled up with async synchronization at the end of module loading.

> the schedular was one of my other concerns which we'd need to make
> right to match existing load on drivers through finit_module() and
> synchronous probe.

Why do we care about the priority of probing tasks?  Does that
actually make any meaningful difference?  If so, how?

> > Userland could backport a fix to set the sysctl.  Given that we need
> > both synchrnous and asynchronous behaviors, it's unlikely that we can
> > come up with a solution which doesn't need cooperation from userland.
> 
> True and then the timeout would also have to be skipped for device
> drivers that have the sync_probe flag set, so I guess we'd need to

I'm not sure about skipping for sync_probe flag.  That seems like an
implementation detail to me.  Sure, we do that now because we don't
have a better way of figuring out whether request_module() is waiting
for it or not but hopefully we'd be able to in the future.  I think we
just should make exceptions sensible so that it works fine in practice
for now (and I don't think that'd be too hard).  So, the only
cooperation necessary from userland would be just saying "I don't
wanna wait for device probing on module load."

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-08 Thread Tejun Heo
On Mon, Sep 08, 2014 at 07:57:28PM -0700, Luis R. Rodriguez wrote:
> > I think we
> > just should make exceptions sensible so that it works fine in practice
> > for now (and I don't think that'd be too hard).  So, the only
> > cooperation necessary from userland would be just saying "I don't
> > wanna wait for device probing on module load."
> 
> But we're talking about drivers that have a flag that says 'you gotta
> wait sucker', what do we want systemd to do then? I'd be happy if it'd
> would not send the sigkill for these drivers, for example.

Hah?  Can you give me an example?  I'm having hard time imagining a
driver with such requirement given our current driver core
implementation.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-08 Thread Tejun Heo
Hello,

On Mon, Sep 08, 2014 at 08:19:12PM -0700, Luis R. Rodriguez wrote:
> On the systemd side of things it should enable this sysctl and for
> older kernels what should it do?

Supposing the change is backported via -stable, it can try to set the
sysctl on all kernels.  If the knob doesn't exist, the fix is not
there and nothing can be done about it.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: WARNING in block layer triggered in 3.17-rc3

2014-09-09 Thread Tejun Heo
Hello, Alan.

On Tue, Sep 09, 2014 at 11:08:22AM -0400, Alan Stern wrote:
> Sorry, the meaning wasn't clear.  I meant that the existing code 
> doesn't work.  The patch seems to work okay (except that I can't use 
> queue_flag_test_and_set because the queue isn't locked at that point).  
> I'll submit the patch shortly.

Given it's fully synchronized, the following should work fine and
probably is less misleading than using atomic test_and_set.

if (!blk_queue_init_done) {
queue_flag_set_unlocked(QUEUE_FLAG_INIT_DONE, q);
blk_queue_bypass_end(q);
}

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Block: fix unbalanced bypass-disable in blk_register_queue

2014-09-09 Thread Tejun Heo
On Tue, Sep 09, 2014 at 11:50:58AM -0400, Alan Stern wrote:
> When a queue is registered, the block layer turns off the bypass
> setting (because bypass is enabled when the queue is created).  This
> doesn't work well for queues that are unregistered and then registered
> again; we get a WARNING because of the unbalanced calls to
> blk_queue_bypass_end().
> 
> This patch fixes the problem by making blk_register_queue() call
> blk_queue_bypass_end() only the first time the queue is registered.
> 
> Signed-off-by: Alan Stern 
> CC: Tejun Heo 
> CC: James Bottomley 
> CC: Jens Axboe 

Acked-by: Tejun Heo 

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-09 Thread Tejun Heo
Hey, James.

On Tue, Sep 09, 2014 at 12:35:46PM -0700, James Bottomley wrote:
> I don't have very strong views on this one.  However, I've got to say
> from a systems point of view that if the desire is to flag when the
> module is having problems, probing and initializing synchronously in a
> thread spawned by init which the init process can watchdog and thus can
> flash up warning messages seems to be more straightforwards than an
> elaborate asynchronous mechanism with completion signalling which
> achieves the same thing in a more complicated (and thus bug prone)
> fashion.

We no longer report back error on probe failure on module load.  It
used to make sense to indicate error for module load on probe failure
when the hardware was a lot simpler and drivers did their own device
enumeration.  With the current bus / device setup, it doesn't make any
sense and driver core silently suppresses all probe failures.  There's
nothing the probing thread can monitor anymore.

In that sense, we already separated out device probing from module
loading simply because the hardware reality mandated it and we have
dynamic mechanisms to listen for device probes exactly for the same
reason, so I think it makes sense to separate out the waiting too, at
least in the long term.  In a modern dynamic setup, the waits are
essentially arbitrary and doesn't buy us anything.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-09 Thread Tejun Heo
Hello,

On Tue, Sep 09, 2014 at 03:26:02PM -0700, James Bottomley wrote:
> > We no longer report back error on probe failure on module load.
> 
> Yes, we do; for every probe failure of a device on a driver we'll print
> a warning (see drivers/base/dd.c).  Now if someone is proposing we
> should report this in a better fashion, that's probably a good idea, but
> I must have missed that patch.

We can do printks all the same from anywhere.  There's nothing special
about printing from the module loading thread.  The only way to
actually take advantage of the synchronisity would be propagating
error return to the waiting issuer, which we used to do but no longer
can.

> >   It
> > used to make sense to indicate error for module load on probe failure
> > when the hardware was a lot simpler and drivers did their own device
> > enumeration.  With the current bus / device setup, it doesn't make any
> > sense and driver core silently suppresses all probe failures.  There's
> > nothing the probing thread can monitor anymore.
> 
> Except the length of time taken to probe.  That seems to be what systemd
> is interested in, hence this whole thread, right?

No, systemd in this case isn't interested in the time taken to probe
at all.  It is expecting module load to just do that - load the
module.  Modern userlands, systemd or not, no longer depend on or make
use of the wait.

> But that's nothing to do with sync or async.  Nowadays we register a
> driver, the driver may bind to multiple devices.  If one of those
> devices encounters an error during probe, we just report the fact in
> dmesg and move on.  The module_init thread currently returns when all
> the probe routines for all enumerated devices have been called, so
> module_init has no indication of any failures (because they might be
> mixed with successes); successes are indicated as the device appears but
> we have nothing other than the kernel log to indicate the failures.  How
> does moving to async probing alter this?  It doesn't as far as I can
> see, except that module_init returns earlier but now we no longer have
> an indication of when the probe completes, so we have to add yet another
> mechanism to tell us if we're interested in that.  I really don't see
> what this buys us.

The thing is that we have to have dynamic mechanism to listen for
device attachments no matter what and such mechanism has been in place
for a long time at this point.  The synchronous wait simply doesn't
serve any purpose anymore and kinda gets in the way in that it makes
it a possibly extremely slow process to tell whether loading of a
module succeeded or not because the wait for the initial round of
probe is piggybacked.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-09 Thread Tejun Heo
Hello, James.

On Tue, Sep 09, 2014 at 03:46:23PM -0700, James Bottomley wrote:
> If you want the return of an individual device probe a log scraper gives
> it to you ... and nothing else does currently.  The advantage of the
> prink in dd.c is that it's standard for everything and can be scanned
> for ... if you take that out, you'll get complaints about the lack of
> standard messages (you'd be surprised at the number of enterprise
> monitoring systems that actually do log scraping).

Why would a log scaper care about which task is printing the messages?
The printk can stay there.  There's nothing wrong with it.  Log
scapers tend to be asynchronous in nature but if a log scraper wants
to operate synchronously for whatever reason, it can simply not turn
on async probing.

> OK, so we just fire and forget in userland ... why bother inventing an
> elaborate new infrastructure in the kernel to do exactly what
> 
> modprobe  &
> 
> would do?

I think the argument there is that the issuer wants to know whether
such operations succeeded or not and wants to report and record the
result and possibly take other actions in response.  We're currently
mixing wait and error reporting for one type of operation with wait
for another.  I'm not saying it's a fatal flaw or anything but it can
get in the way.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-09 Thread Tejun Heo
On Tue, Sep 09, 2014 at 12:25:29PM +0900, Tejun Heo wrote:
> Hello,
> 
> On Mon, Sep 08, 2014 at 08:19:12PM -0700, Luis R. Rodriguez wrote:
> > On the systemd side of things it should enable this sysctl and for
> > older kernels what should it do?
> 
> Supposing the change is backported via -stable, it can try to set the
> sysctl on all kernels.  If the knob doesn't exist, the fix is not
> there and nothing can be done about it.

The more I think about it, the more I think this should be a
per-insmod instance thing rather than a system-wide switch.  Currently
the kernel param code doesn't allow a generic param outside the ones
specified by the module itself but adding support for something like
driver.async_load=1 shouldn't be too difficult, applying that to
existing systems shouldn't be much more difficult than a system-wide
switch, and it'd be siginificantly cleaner than fiddling with driver
blacklist.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] ahci_xgene: Fix the error print invalid resource for APM X-Gene SoC AHCI SATA Host Controller driver.

2014-09-12 Thread Tejun Heo
On Fri, Sep 12, 2014 at 01:24:08PM +0530, Suman Tripathi wrote:
> This patch fixes the error print invalid resource for the APM X-Gene
> SoC AHCI SATA Host Controller driver. This print was due to the fact
> that the controller 3 don't have a mux resource. This didn't result
> in any errors but the print seems like meaningless.
> 
> Signed-off-by: Loc Ho 
> Signed-off-by: Suman Tripathi 
> ---
>  drivers/ata/ahci_xgene.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/ata/ahci_xgene.c b/drivers/ata/ahci_xgene.c
> index 40d0a76..9404db0c 100644
> --- a/drivers/ata/ahci_xgene.c
> +++ b/drivers/ata/ahci_xgene.c
> @@ -434,7 +434,7 @@ static int xgene_ahci_mux_select(struct 
> xgene_ahci_context *ctx)
>   u32 val;
> 
>   /* Check for optional MUX resource */
> - if (IS_ERR(ctx->csr_mux))
> + if (!ctx->csr_mux)
>   return 0;

Hmmm?  So, if devm_ioremap_resource() call actually fails, now the
function tries to operation on ERR_PTR() value as a real pointer?

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] ahci_xgene: Fix the error print invalid resource for APM X-Gene SoC AHCI SATA Host Controller driver.

2014-09-14 Thread Tejun Heo
Hello,

On Sun, Sep 14, 2014 at 11:36:51AM +0530, Suman Tripathi wrote:
> We can  maintain same piece (IS_ERR(ctx->csr_mux)), then we can do the
> below instead of NULL ??
> 
> ctx->csr_mux = res ? devm_ioremap_resource(dev, res) : ERR_PTR(-EINVAL);

Setting it to NULL on failure would probably make more sense.  No need
to carry around ERR_PTR() value around.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] ahci_xgene: Fix the error print invalid resource for APM X-Gene SoC AHCI SATA Host Controller driver.

2014-09-15 Thread Tejun Heo
On Mon, Sep 15, 2014 at 01:10:01PM +0530, Suman Tripathi wrote:
> [suman] : So the posted version is acceptable ? Any others comments on this
> patch ?

I'm suggesting setting ctx->cs_mux to NULL on failure.  IOW,

if (res) {
ctx->csr_mux = devm_ioremap_resources();
if (IS_ERR(ctx->csr_mux)) {
print warning or something;
ctx->csr_mux = NULL;
}
}

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: boot stall regression due to blk-mq: use percpu_ref for mq usage count

2014-09-22 Thread Tejun Heo
On Tue, Sep 23, 2014 at 07:55:54AM +0200, Christoph Hellwig wrote:
> Jens,
> 
> can we simply get these commits reverted from now if there's no better
> fix?  I'd hate to have this boot stall in the first kernel with blk-mq
> support for scsi.

Patches going out right now.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: boot stall regression due to blk-mq: use percpu_ref for mq usage count

2014-09-22 Thread Tejun Heo
On Tue, Sep 23, 2014 at 01:56:48AM -0400, Tejun Heo wrote:
> On Tue, Sep 23, 2014 at 07:55:54AM +0200, Christoph Hellwig wrote:
> > Jens,
> > 
> > can we simply get these commits reverted from now if there's no better
> > fix?  I'd hate to have this boot stall in the first kernel with blk-mq
> > support for scsi.
> 
> Patches going out right now.

And the original implementation was broken, so...

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: boot stall regression due to blk-mq: use percpu_ref for mq usage count

2014-09-22 Thread Tejun Heo
On Tue, Sep 23, 2014 at 07:59:24AM +0200, Christoph Hellwig wrote:
> "[PATCHSET percpu/for-3.18] percpu_ref: implement switch_to_atomic/percpu()"
> 
> looks way to big for 3.17, and the regression was introduced in the 3.17
> merge window.  I'm not sure what was broken before, but it defintively
> survived a lot of testing.

Do we even care about fixing it for 3.17?  scsi-mq isn't enabled by
default even for 3.18.  The open-coded percpu ref thing was subtly
broken there.  It'd be difficult to trigger but I'm fairly sure it'd
crap out in the wild once in a blue moon.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: boot stall regression due to blk-mq: use percpu_ref for mq usage count

2014-09-22 Thread Tejun Heo
On Tue, Sep 23, 2014 at 02:01:41AM -0400, Tejun Heo wrote:
> On Tue, Sep 23, 2014 at 07:59:24AM +0200, Christoph Hellwig wrote:
> > "[PATCHSET percpu/for-3.18] percpu_ref: implement switch_to_atomic/percpu()"
> > 
> > looks way to big for 3.17, and the regression was introduced in the 3.17
> > merge window.  I'm not sure what was broken before, but it defintively
> > survived a lot of testing.
> 
> Do we even care about fixing it for 3.17?  scsi-mq isn't enabled by
> default even for 3.18.  The open-coded percpu ref thing was subtly
> broken there.  It'd be difficult to trigger but I'm fairly sure it'd
> crap out in the wild once in a blue moon.

Hmmm... also, this is actually an issue with scsi that block layer is
working around.  For other drivers (virtio), the latency during
shutdown should be completely fine.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH block/for-3.18/core] blk-mq: start q->mq_usage_counter in atomic mode

2014-09-22 Thread Tejun Heo
>From 83b06f4fc6ca2f7f3d706a168b71c248bdada668 Mon Sep 17 00:00:00 2001
From: Tejun Heo 
Date: Tue, 23 Sep 2014 01:58:34 -0400

blk-mq uses percpu_ref for its usage counter which tracks the number
of in-flight commands and used to synchronously drain the queue on
freeze.  percpu_ref shutdown takes measureable wallclock time as it
involves a sched RCU grace period.  This means that draining a blk-mq
takes measureable wallclock time.  One would think that this shouldn't
matter as queue shutdown should be a rare event which takes place
asynchronously w.r.t. userland.

Unfortunately, SCSI probing involves synchronously setting up and then
tearing down a lot of request_queues back-to-back for non-existent
LUNs.  This means that SCSI probing may take above ten seconds when
scsi-mq is used.

  [0.949892] scsi host0: Virtio SCSI HBA
  [1.007864] scsi 0:0:0:0: Direct-Access QEMU QEMU HARDDISK1.1. 
PQ: 0 ANSI: 5
  [1.021299] scsi 0:0:1:0: Direct-Access QEMU QEMU HARDDISK1.1. 
PQ: 0 ANSI: 5
  [1.520356] tsc: Refined TSC clocksource calibration: 2491.910 MHz

  

  [   16.186549] sd 0:0:0:0: Attached scsi generic sg0 type 0
  [   16.190478] sd 0:0:1:0: Attached scsi generic sg1 type 0
  [   16.194099] osd: LOADED open-osd 0.2.1
  [   16.203202] sd 0:0:0:0: [sda] 31457280 512-byte logical blocks: (16.1 
GB/15.0 GiB)
  [   16.208478] sd 0:0:0:0: [sda] Write Protect is off
  [   16.211439] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, 
doesn't support DPO or FUA
  [   16.218771] sd 0:0:1:0: [sdb] 31457280 512-byte logical blocks: (16.1 
GB/15.0 GiB)
  [   16.223264] sd 0:0:1:0: [sdb] Write Protect is off
  [   16.225682] sd 0:0:1:0: [sdb] Write cache: enabled, read cache: enabled, 
doesn't support DPO or FUA

This is also the reason why request_queues start in bypass mode which
is ended on blk_register_queue() as shutting down a fully functional
queue also involves a RCU grace period and the queues for non-existent
SCSI devices never reach registration.

blk-mq basically needs to do the same thing - start the mq in a
degraded mode which is faster to shut down and then make it fully
functional only after the queue reaches registration.  percpu_ref
recently grew facilities to force atomic operation until explicitly
switched to percpu mode, which can be used for this purpose.  This
patch makes blk-mq initialize q->mq_usage_counter in atomic mode and
switch it to percpu mode only once blk_register_queue() is reached.

Signed-off-by: Tejun Heo 
Reported-by: Christoph Hellwig 
Link: http://lkml.kernel.org/g/20140919113815.ga10...@lst.de
Fixes: add703fda981 ("blk-mq: use percpu_ref for mq usage count")
Cc: Kent Overstreet 
Cc: Jens Axboe 
Cc: Johannes Weiner 
---
Hello,

This patch is on top of

  block/for-3.18/core fe052529e465 ("scsi: move blk_mq_start_request call 
earlier")
+ [PATCHSET percpu/for-3.18] percpu_ref: implement switch_to_atomic/percpu()
  http://lkml.kernel.org/g/1411451718-17807-1-git-send-email...@kernel.org

and available in the following git branch.

  git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu.git 
review-blk_mq-latency-fix

Thanks.

 block/blk-mq-sysfs.c   |  6 ++
 block/blk-mq.c |  6 +-
 block/blk-sysfs.c  | 11 +--
 include/linux/blk-mq.h |  1 +
 4 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/block/blk-mq-sysfs.c b/block/blk-mq-sysfs.c
index ed52178..371d880 100644
--- a/block/blk-mq-sysfs.c
+++ b/block/blk-mq-sysfs.c
@@ -402,6 +402,12 @@ static void blk_mq_sysfs_init(struct request_queue *q)
}
 }
 
+/* see blk_register_queue() */
+void blk_mq_finish_init(struct request_queue *q)
+{
+   percpu_ref_switch_to_percpu(&q->mq_usage_counter);
+}
+
 int blk_mq_register_disk(struct gendisk *disk)
 {
struct device *dev = disk_to_dev(disk);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 91c49b3..4ae58b1 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1783,8 +1783,12 @@ struct request_queue *blk_mq_init_queue(struct 
blk_mq_tag_set *set)
if (!q)
goto err_hctxs;
 
+   /*
+* Init percpu_ref in atomic mode so that it's faster to shutdown.
+* See blk_register_queue() for details.
+*/
if (percpu_ref_init(&q->mq_usage_counter, blk_mq_usage_counter_release,
-   0, GFP_KERNEL))
+   PERCPU_REF_INIT_ATOMIC, GFP_KERNEL))
goto err_map;
 
setup_timer(&q->timeout, blk_mq_rq_timer, (unsigned long) q);
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 17f5c84..521ae90 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -551,12 +551,19 @@ int blk_register_queue(struct gendisk *disk)
return -ENXIO;
 
/*
-* Initialization must be complete by now.  Finish the initial
-* bypass from queue allocation.
+* SCSI probing may synchronously cre

Re: boot stall regression due to blk-mq: use percpu_ref for mq usage count

2014-09-22 Thread Tejun Heo
On Tue, Sep 23, 2014 at 08:09:06AM +0200, Christoph Hellwig wrote:
> On Tue, Sep 23, 2014 at 02:01:41AM -0400, Tejun Heo wrote:
> > On Tue, Sep 23, 2014 at 07:59:24AM +0200, Christoph Hellwig wrote:
> > > "[PATCHSET percpu/for-3.18] percpu_ref: implement 
> > > switch_to_atomic/percpu()"
> > > 
> > > looks way to big for 3.17, and the regression was introduced in the 3.17
> > > merge window.  I'm not sure what was broken before, but it defintively
> > > survived a lot of testing.
> > 
> > Do we even care about fixing it for 3.17?  scsi-mq isn't enabled by
> > default even for 3.18.  The open-coded percpu ref thing was subtly
> > broken there.  It'd be difficult to trigger but I'm fairly sure it'd
> > crap out in the wild once in a blue moon.
> 
> It's compiled in by default, and people are extremly eager to test it.

Ugh, I don't know.  It's not like we have a very good baseline we can
go back to and reverting it for -stable and then redoing it seems
kinda excessive for a yet experimental feature.  Jens?

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] ahci_xgene: Fix the error print invalid resource for APM X-Gene SoC AHCI SATA Host Controller driver.

2014-09-23 Thread Tejun Heo
On Mon, Sep 22, 2014 at 06:31:33PM +0530, Suman Tripathi wrote:
> This patch fixes the error print invalid resource for the APM X-Gene
> SoC AHCI SATA Host Controller driver. This print was due to the fact
> that the controller 3 don't have a mux resource. This didn't result
> in any errors but the print seems like meaningless.
> 
> Signed-off-by: Loc Ho 
> Signed-off-by: Suman Tripathi 

Applied to libata/for-3.18.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH block/for-3.17-fixes/core] blk-mq, percpu_ref: implement a kludge for SCSI blk-mq stall during probe

2014-09-23 Thread Tejun Heo
blk-mq uses percpu_ref for its usage counter which tracks the number
of in-flight commands and used to synchronously drain the queue on
freeze.  percpu_ref shutdown takes measureable wallclock time as it
involves a sched RCU grace period.  This means that draining a blk-mq
takes measureable wallclock time.  One would think that this shouldn't
matter as queue shutdown should be a rare event which takes place
asynchronously w.r.t. userland.

Unfortunately, SCSI probing involves synchronously setting up and then
tearing down a lot of request_queues back-to-back for non-existent
LUNs.  This means that SCSI probing may take more than ten seconds
when scsi-mq is used.

This will be properly fixed by implementing a mechanism to keep
q->mq_usage_counter in atomic mode till genhd registration; however,
that involves rather big updates to percpu_ref which is difficult to
apply late in the devel cycle (v3.17-rc6 at the moment).  As a
stop-gap measure till the proper fix can be implemented in the next
cycle, this patch introduces __percpu_ref_kill_expedited() and makes
blk_mq_freeze_queue() use it.  This is heavy-handed but should work
for testing the experimental SCSI blk-mq implementation.

Signed-off-by: Tejun Heo 
Reported-by: Christoph Hellwig 
Link: http://lkml.kernel.org/g/20140919113815.ga10...@lst.de
Fixes: add703fda981 ("blk-mq: use percpu_ref for mq usage count")
Cc: Kent Overstreet 
Cc: Jens Axboe 
---
Hello, Jens, Christoph.

How about this one?  This is kinda ugly but should work fine in most
cases and easy to apply to v3.17 and take out during v3.18.

Thanks.

 block/blk-mq.c  |   11 ++-
 include/linux/percpu-refcount.h |1 +
 lib/percpu-refcount.c   |   16 
 3 files changed, 27 insertions(+), 1 deletion(-)

--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -120,7 +120,16 @@ void blk_mq_freeze_queue(struct request_
spin_unlock_irq(q->queue_lock);
 
if (freeze) {
-   percpu_ref_kill(&q->mq_usage_counter);
+   /*
+* XXX: Temporary kludge to work around SCSI blk-mq stall.
+* SCSI synchronously creates and destroys many queues
+* back-to-back during probe leading to lengthy stalls.
+* This will be fixed by keeping ->mq_usage_counter in
+* atomic mode until genhd registration, but, for now,
+* let's work around using expedited synchronization.
+*/
+   __percpu_ref_kill_expedited(&q->mq_usage_counter);
+
blk_mq_run_queues(q, false);
}
wait_event(q->mq_freeze_wq, percpu_ref_is_zero(&q->mq_usage_counter));
--- a/include/linux/percpu-refcount.h
+++ b/include/linux/percpu-refcount.h
@@ -72,6 +72,7 @@ void percpu_ref_reinit(struct percpu_ref
 void percpu_ref_exit(struct percpu_ref *ref);
 void percpu_ref_kill_and_confirm(struct percpu_ref *ref,
 percpu_ref_func_t *confirm_kill);
+void __percpu_ref_kill_expedited(struct percpu_ref *ref);
 
 /**
  * percpu_ref_kill - drop the initial ref
--- a/lib/percpu-refcount.c
+++ b/lib/percpu-refcount.c
@@ -189,3 +189,19 @@ void percpu_ref_kill_and_confirm(struct
call_rcu_sched(&ref->rcu, percpu_ref_kill_rcu);
 }
 EXPORT_SYMBOL_GPL(percpu_ref_kill_and_confirm);
+
+/*
+ * XXX: Temporary kludge to work around SCSI blk-mq stall.  Used only by
+ * block/blk-mq.c::blk_mq_freeze_queue().  Will be removed during v3.18
+ * devel cycle.  Do not use anywhere else.
+ */
+void __percpu_ref_kill_expedited(struct percpu_ref *ref)
+{
+   WARN_ONCE(ref->pcpu_count_ptr & PCPU_REF_DEAD,
+ "percpu_ref_kill() called more than once on %pf!",
+ ref->release);
+
+   ref->pcpu_count_ptr |= PCPU_REF_DEAD;
+   synchronize_sched_expedited();
+   percpu_ref_kill_rcu(&ref->rcu);
+}
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH block/for-3.17-fixes/core] blk-mq, percpu_ref: implement a kludge for SCSI blk-mq stall during probe

2014-09-23 Thread Tejun Heo
Oops, I forgot to cc Kent.  Kent, the patch is at

 http://lkml.kernel.org/g/20140923192432.ga24...@mtj.dyndns.org

On Tue, Sep 23, 2014 at 01:29:03PM -0600, Jens Axboe wrote:
> The hack looks fine to me, if it passes Christophs boot stall test. The
> hack isn't THAT nasty, since you already have a series queued up to fix
> it for real for 3.18.

Yeah, then I can pull in block fixes branch into a percpu for-3.18
branch, revert the hack and then apply the patches to implement the
proper fix.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH block/for-3.17-fixes/core] blk-mq, percpu_ref: implement a kludge for SCSI blk-mq stall during probe

2014-09-24 Thread Tejun Heo
On Wed, Sep 24, 2014 at 08:30:33AM -0600, Jens Axboe wrote:
> On 09/24/2014 02:23 AM, Christoph Hellwig wrote:
> > Thanks, this fixes the boot stall for me.
> > 
> > Tested-by: Christoph Hellwig 
> 
> Sweet, I'll shove this off today, so we are done for 3.17 release.

Cool, once it hits the block branch, I'll pull it into percpu/for-3.18
and apply the patches for proper fix on top.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH percpu/for-3.18] Revert "blk-mq, percpu_ref: implement a kludge for SCSI blk-mq stall during probe"

2014-09-24 Thread Tejun Heo
>From 9eca80461a45177e456219a9cd944c27675d6512 Mon Sep 17 00:00:00 2001
From: Tejun Heo 
Date: Wed, 24 Sep 2014 13:07:33 -0400

This reverts commit 0a30288da1aec914e158c2d7a3482a85f632750f, which
was a temporary fix for SCSI blk-mq stall issue.  The following
patches will fix the issue properly by introducing atomic mode to
percpu_ref.

Signed-off-by: Tejun Heo 
Cc: Kent Overstreet 
Cc: Jens Axboe 
Cc: Christoph Hellwig 
---
Hello,

I pulled in block/for-linus into percpu/for-3.18 and reverted the temp
fix.  Will apply the patchset to implement the proper fix on top of
it.

Thanks.

 block/blk-mq.c  | 11 +--
 include/linux/percpu-refcount.h |  1 -
 lib/percpu-refcount.c   | 16 
 3 files changed, 1 insertion(+), 27 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 255d79c..44a78ae 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -119,16 +119,7 @@ void blk_mq_freeze_queue(struct request_queue *q)
spin_unlock_irq(q->queue_lock);
 
if (freeze) {
-   /*
-* XXX: Temporary kludge to work around SCSI blk-mq stall.
-* SCSI synchronously creates and destroys many queues
-* back-to-back during probe leading to lengthy stalls.
-* This will be fixed by keeping ->mq_usage_counter in
-* atomic mode until genhd registration, but, for now,
-* let's work around using expedited synchronization.
-*/
-   __percpu_ref_kill_expedited(&q->mq_usage_counter);
-
+   percpu_ref_kill(&q->mq_usage_counter);
blk_mq_run_queues(q, false);
}
wait_event(q->mq_freeze_wq, percpu_ref_is_zero(&q->mq_usage_counter));
diff --git a/include/linux/percpu-refcount.h b/include/linux/percpu-refcount.h
index 11b38ce..5df6784 100644
--- a/include/linux/percpu-refcount.h
+++ b/include/linux/percpu-refcount.h
@@ -72,7 +72,6 @@ void percpu_ref_reinit(struct percpu_ref *ref);
 void percpu_ref_exit(struct percpu_ref *ref);
 void percpu_ref_kill_and_confirm(struct percpu_ref *ref,
 percpu_ref_func_t *confirm_kill);
-void __percpu_ref_kill_expedited(struct percpu_ref *ref);
 
 /**
  * percpu_ref_kill - drop the initial ref
diff --git a/lib/percpu-refcount.c b/lib/percpu-refcount.c
index c6c31e2..559ee0b 100644
--- a/lib/percpu-refcount.c
+++ b/lib/percpu-refcount.c
@@ -189,19 +189,3 @@ void percpu_ref_kill_and_confirm(struct percpu_ref *ref,
call_rcu_sched(&ref->rcu, percpu_ref_kill_rcu);
 }
 EXPORT_SYMBOL_GPL(percpu_ref_kill_and_confirm);
-
-/*
- * XXX: Temporary kludge to work around SCSI blk-mq stall.  Used only by
- * block/blk-mq.c::blk_mq_freeze_queue().  Will be removed during v3.18
- * devel cycle.  Do not use anywhere else.
- */
-void __percpu_ref_kill_expedited(struct percpu_ref *ref)
-{
-   WARN_ONCE(ref->pcpu_count_ptr & PCPU_REF_DEAD,
- "percpu_ref_kill() called more than once on %pf!",
- ref->release);
-
-   ref->pcpu_count_ptr |= PCPU_REF_DEAD;
-   synchronize_sched_expedited();
-   percpu_ref_kill_rcu(&ref->rcu);
-}
-- 
1.9.3

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH block/for-3.18/core] blk-mq: start q->mq_usage_counter in atomic mode

2014-09-24 Thread Tejun Heo
On Tue, Sep 23, 2014 at 02:08:12AM -0400, Tejun Heo wrote:
> From 83b06f4fc6ca2f7f3d706a168b71c248bdada668 Mon Sep 17 00:00:00 2001
> From: Tejun Heo 
> Date: Tue, 23 Sep 2014 01:58:34 -0400
> 
> blk-mq uses percpu_ref for its usage counter which tracks the number
> of in-flight commands and used to synchronously drain the queue on
> freeze.  percpu_ref shutdown takes measureable wallclock time as it
> involves a sched RCU grace period.  This means that draining a blk-mq
> takes measureable wallclock time.  One would think that this shouldn't
> matter as queue shutdown should be a rare event which takes place
> asynchronously w.r.t. userland.

Applied to percpu/for-3.18 on top of the klude, its revert and the
required percpu_ref changes.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH percpu/for-3.18] blk-mq, percpu_ref: start q->mq_usage_counter in atomic mode

2014-09-24 Thread Tejun Heo
FYI, I added a paragraph explaining what happened to the temp fix at
the end.  The following is the applied version.

Thanks.
- 8< -
>From 17497acbdce9506fd6a75115dee4ab80c3cc5ee5 Mon Sep 17 00:00:00 2001
From: Tejun Heo 
Date: Wed, 24 Sep 2014 13:31:50 -0400

blk-mq uses percpu_ref for its usage counter which tracks the number
of in-flight commands and used to synchronously drain the queue on
freeze.  percpu_ref shutdown takes measureable wallclock time as it
involves a sched RCU grace period.  This means that draining a blk-mq
takes measureable wallclock time.  One would think that this shouldn't
matter as queue shutdown should be a rare event which takes place
asynchronously w.r.t. userland.

Unfortunately, SCSI probing involves synchronously setting up and then
tearing down a lot of request_queues back-to-back for non-existent
LUNs.  This means that SCSI probing may take above ten seconds when
scsi-mq is used.

  [0.949892] scsi host0: Virtio SCSI HBA
  [1.007864] scsi 0:0:0:0: Direct-Access QEMU QEMU HARDDISK1.1. 
PQ: 0 ANSI: 5
  [1.021299] scsi 0:0:1:0: Direct-Access QEMU QEMU HARDDISK1.1. 
PQ: 0 ANSI: 5
  [1.520356] tsc: Refined TSC clocksource calibration: 2491.910 MHz

  

  [   16.186549] sd 0:0:0:0: Attached scsi generic sg0 type 0
  [   16.190478] sd 0:0:1:0: Attached scsi generic sg1 type 0
  [   16.194099] osd: LOADED open-osd 0.2.1
  [   16.203202] sd 0:0:0:0: [sda] 31457280 512-byte logical blocks: (16.1 
GB/15.0 GiB)
  [   16.208478] sd 0:0:0:0: [sda] Write Protect is off
  [   16.211439] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, 
doesn't support DPO or FUA
  [   16.218771] sd 0:0:1:0: [sdb] 31457280 512-byte logical blocks: (16.1 
GB/15.0 GiB)
  [   16.223264] sd 0:0:1:0: [sdb] Write Protect is off
  [   16.225682] sd 0:0:1:0: [sdb] Write cache: enabled, read cache: enabled, 
doesn't support DPO or FUA

This is also the reason why request_queues start in bypass mode which
is ended on blk_register_queue() as shutting down a fully functional
queue also involves a RCU grace period and the queues for non-existent
SCSI devices never reach registration.

blk-mq basically needs to do the same thing - start the mq in a
degraded mode which is faster to shut down and then make it fully
functional only after the queue reaches registration.  percpu_ref
recently grew facilities to force atomic operation until explicitly
switched to percpu mode, which can be used for this purpose.  This
patch makes blk-mq initialize q->mq_usage_counter in atomic mode and
switch it to percpu mode only once blk_register_queue() is reached.

Note that this issue was previously worked around by 0a30288da1ae
("blk-mq, percpu_ref: implement a kludge for SCSI blk-mq stall during
probe") for v3.17.  The temp fix was reverted in preparation of adding
persistent atomic mode to percpu_ref by 9eca80461a45 ("Revert "blk-mq,
percpu_ref: implement a kludge for SCSI blk-mq stall during probe"").
This patch and the prerequisite percpu_ref changes will be merged
during v3.18 devel cycle.

Signed-off-by: Tejun Heo 
Reported-by: Christoph Hellwig 
Link: http://lkml.kernel.org/g/20140919113815.ga10...@lst.de
Fixes: add703fda981 ("blk-mq: use percpu_ref for mq usage count")
Reviewed-by: Kent Overstreet 
Cc: Jens Axboe 
Cc: Johannes Weiner 
---
 block/blk-mq-sysfs.c   |  6 ++
 block/blk-mq.c |  6 +-
 block/blk-sysfs.c  | 11 +--
 include/linux/blk-mq.h |  1 +
 4 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/block/blk-mq-sysfs.c b/block/blk-mq-sysfs.c
index ed52178..371d880 100644
--- a/block/blk-mq-sysfs.c
+++ b/block/blk-mq-sysfs.c
@@ -402,6 +402,12 @@ static void blk_mq_sysfs_init(struct request_queue *q)
}
 }
 
+/* see blk_register_queue() */
+void blk_mq_finish_init(struct request_queue *q)
+{
+   percpu_ref_switch_to_percpu(&q->mq_usage_counter);
+}
+
 int blk_mq_register_disk(struct gendisk *disk)
 {
struct device *dev = disk_to_dev(disk);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index d85fe01..38f4a16 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1795,8 +1795,12 @@ struct request_queue *blk_mq_init_queue(struct 
blk_mq_tag_set *set)
if (!q)
goto err_hctxs;
 
+   /*
+* Init percpu_ref in atomic mode so that it's faster to shutdown.
+* See blk_register_queue() for details.
+*/
if (percpu_ref_init(&q->mq_usage_counter, blk_mq_usage_counter_release,
-   0, GFP_KERNEL))
+   PERCPU_REF_INIT_ATOMIC, GFP_KERNEL))
goto err_map;
 
setup_timer(&q->timeout, blk_mq_rq_timer, (unsigned long) q);
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 17f5c84..521ae90 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -551,12 +551,19 @@ int blk_register_queue(struct gendisk *disk)
   

Re: [PATCH v1 5/5] driver-core: add driver asynchronous probe support

2014-09-28 Thread Tejun Heo
Hello,

On Fri, Sep 26, 2014 at 02:57:17PM -0700, Luis R. Rodriguez wrote:
...
> Systemd should consider enabling async probe on device drivers
> it loads through systemd-udev but probably does not want to
> enable it for modules loaded through systemd-modules-load
> (modules-load.d). At least on my booting enablign async probe
> for all modules fails to boot as such in order to make this

Did you find out why boot failed with those modules?

> a bit more useful we whitelist a few buses where it should be
> at least in theory safe to try to enable async probe. This
> way even if systemd tried to ask to enable async probe for all
> its device drivers the kernel won't blindly do this. We also
> have the sync_probe flag which device drivers can themselves
> enable *iff* its known the device driver should never async
> probe.
> 
> In order to help *test* things folks can use the bus.safe_mod_async_probe=1
> kernel parameter which will work as if userspace would have
> requested all modules to load with async probe. Daring folks can
> also use bus.force_mod_async_probe=1 which will enable asynch probe
> even on buses not tested in any way yet, if you use that though
> you're on your own.

If those two knobs are meant for debugging, let's please make that
fact immediately evident.  e.g. Make them ugly boot params like
"__DEVEL__driver_force_mod_async_probe".  Devel/debug options ending
up becoming stable interface are really nasty.

> +struct driver_attach_work {
> + struct work_struct work;
> + struct device_driver *driver;
> +};
> +
>  struct driver_private {
>   struct kobject kobj;
>   struct klist klist_devices;
>   struct klist_node knode_bus;
>   struct module_kobject *mkobj;
> + struct driver_attach_work *attach_work;
>   struct device_driver *driver;
>  };

How many bytes are we saving by allocating it separately?  Can't we
just embed it in driver_private?

> +static void driver_attach_workfn(struct work_struct *work)
> +{
> + int ret;
> + struct driver_attach_work *attach_work =
> + container_of(work, struct driver_attach_work, work);
> + struct device_driver *drv = attach_work->driver;
> + ktime_t calltime, delta, rettime;
> + unsigned long long duration;

This could just be a personal preference but I think it's easier to
read if local vars w/ initializers come before the ones w/o.

> +
> + calltime = ktime_get();
> +
> + ret = driver_attach(drv);
> + if (ret != 0) {
> + remove_driver_private(drv);
> + bus_put(drv->bus);
> + }
> +
> + rettime = ktime_get();
> + delta = ktime_sub(rettime, calltime);
> + duration = (unsigned long long) ktime_to_ns(delta) >> 10;
> +
> + pr_debug("bus: '%s': add driver %s attach completed after %lld usecs\n",
> +  drv->bus->name, drv->name, duration);

Why do we have the above printout for async path but not sync path?
It's kinda weird for the code path to diverge like this.  Shouldn't
the only difference be the context probes are running from?

...
> +static bool drv_enable_async_probe(struct device_driver *drv,
> +struct bus_type *bus)
> +{
> + struct module *mod;
> +
> + if (!drv->owner || drv->sync_probe)
> + return false;
> +
> + if (force_mod_async)
> + return true;
> +
> + mod = drv->owner;
> + if (!safe_mod_async && !mod->async_probe_requested)
> + return false;
> +
> + /* For now lets avoid stupid bug reports */
> + if (!strcmp(bus->name, "pci") ||
> + !strcmp(bus->name, "pci_express") ||
> + !strcmp(bus->name, "hid") ||
> + !strcmp(bus->name, "sdio") ||
> + !strcmp(bus->name, "gameport") ||
> + !strcmp(bus->name, "mmc") ||
> + !strcmp(bus->name, "i2c") ||
> + !strcmp(bus->name, "platform") ||
> + !strcmp(bus->name, "usb"))
> + return true;

Ugh... things like this tend to become permanent.  Do we really need
this?  And how are we gonna find out what's broken why w/o bug
reports?

> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> index e4ffbcf..7999aba 100644
> --- a/drivers/base/dd.c
> +++ b/drivers/base/dd.c
> @@ -507,6 +507,13 @@ static void __device_release_driver(struct device *dev)
>  
>   drv = dev->driver;
>   if (drv) {
> + if (drv->owner && !drv->sync_probe) {
> + struct module *mod = drv->owner;
> + struct driver_private *priv = drv->p;
> +
> + if (mod->async_probe_requested)
> + flush_work(&priv->attach_work->work);

This can be unconditional flus_work(&priv->attach_work) if attach_work
isn't separately allocated.

>  static int unknown_module_param_cb(char *param, char *val, const char 
> *modname,
>  void *arg)
>  {
> + int ret;
> + struct module *mod = arg;

Ditto with the order of definitions.

> +

Re: [PATCH 33/38] libata: use __scsi_print_command()

2014-09-29 Thread Tejun Heo
On Mon, Sep 29, 2014 at 01:59:02PM +0200, Hannes Reinecke wrote:
> libata already uses an internal buffer, so we should be using
> __scsi_print_command() here.
> 
> Cc: Tejun Heo 
> Cc: linux-...@vger.kernel.org
> Cc: LKML 
> Signed-off-by: Hannes Reinecke 

Applied to libata/for-3.18.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 33/38] libata: use __scsi_print_command()

2014-09-29 Thread Tejun Heo
On Mon, Sep 29, 2014 at 04:10:30PM +0200, Hannes Reinecke wrote:
> On 09/29/2014 04:06 PM, Tejun Heo wrote:
> > On Mon, Sep 29, 2014 at 01:59:02PM +0200, Hannes Reinecke wrote:
> >> libata already uses an internal buffer, so we should be using
> >> __scsi_print_command() here.
> >>
> >> Cc: Tejun Heo 
> >> Cc: linux-...@vger.kernel.org
> >> Cc: LKML 
> >> Signed-off-by: Hannes Reinecke 
> > 
> > Applied to libata/for-3.18.
> > 
> > Thanks.
> > 
> Errm.
> Nice that you did, but it sort of relies for patches 01-32 to be
> applied previously.
> I'd rather apply your Signed-off-by: to the patch and have it
> routed through the SCSI tree; that way we're sure it'll only be
> applied if the previous patches are in.
> 
> Can you please pull it from libata to avoid build issues?

Ah, okay, pulled it.  Please feel free to add

 Acked-by: Tejun Heo 

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 5/5] driver-core: add driver asynchronous probe support

2014-09-29 Thread Tejun Heo
Hello, Luis.

On Mon, Sep 29, 2014 at 11:22:08PM +0200, Luis R. Rodriguez wrote:
> > > + /* For now lets avoid stupid bug reports */
> > > + if (!strcmp(bus->name, "pci") ||
> > > + !strcmp(bus->name, "pci_express") ||
> > > + !strcmp(bus->name, "hid") ||
> > > + !strcmp(bus->name, "sdio") ||
> > > + !strcmp(bus->name, "gameport") ||
> > > + !strcmp(bus->name, "mmc") ||
> > > + !strcmp(bus->name, "i2c") ||
> > > + !strcmp(bus->name, "platform") ||
> > > + !strcmp(bus->name, "usb"))
> > > + return true;
> > 
> > Ugh... things like this tend to become permanent.  Do we really need
> > this?  And how are we gonna find out what's broken why w/o bug
> > reports?
> 
> Yeah... well we have two options, one is have something like this to
> at least make it generally useful or remove this and let folks who
> care start fixing async for all modules. The downside to removing
> this is it makes async probe pretty much useless on most systems
> right now, it would mean systemd would have to probably consider
> the list above if they wanted to start using this without expecting
> systems to not work.

So, I'd much prefer blacklist approach if something like this is a
necessity.  That way, we'd at least know what doesn't work.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 7/7] driver-core: add preferred async probe option for built-in and modules

2014-10-06 Thread Tejun Heo
Hello, Luis.

The patchset generally looks good to me.  Please feel free to add

 Reviewed-by: Tejun Heo 

A question below.

On Fri, Oct 03, 2014 at 02:44:43PM -0700, Luis R. Rodriguez wrote:
> + bus.enable_kern_async=1 [KNL]
> + This tells the kernel userspace has been vetted for
> + asynchronous probe support and can listen to the device
> + driver prefer_async_probe flag for both built-in device
> + drivers and modules.

Do we intend to keep this param permanently?  Isn't this more of a
temp tool to be used during development?  If so, maybe we should make
that clear with __DEVEL__ too?

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 7/7] driver-core: add preferred async probe option for built-in and modules

2014-10-06 Thread Tejun Heo
Hello,

On Mon, Oct 06, 2014 at 10:36:27PM +0200, Luis R. Rodriguez wrote:
> > Do we intend to keep this param permanently?  Isn't this more of a
> > temp tool to be used during development?  If so, maybe we should make
> > that clear with __DEVEL__ too?
> 
> As its designed right now no, its not a temp tool, its there to
> require compatibility with old userspace. For modules we can require
> the module parameter but for built-in we need something else and this
> is what came to mind. It is also what would allow the prefer_async_probe
> flag too as otherwise we won't know if userspace is prepared.

I don't get it.  For in-kernel stuff, we already have a clear
synchronization point where we already synchronize all async calls.
Shouldn't we be flushing these async probes there too?  insmod'ing is
userland visible but there's no reason this has to be for the built-in
drivers.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 7/7] driver-core: add preferred async probe option for built-in and modules

2014-10-07 Thread Tejun Heo
Hello,

On Tue, Oct 07, 2014 at 01:10:46AM +0200, Luis R. Rodriguez wrote:
> On Mon, Oct 06, 2014 at 05:01:18PM -0400, Tejun Heo wrote:
> > For in-kernel stuff, we already have a clear
> > synchronization point where we already synchronize all async calls.
> > Shouldn't we be flushing these async probes there too?
> 
> This seems to be addressing if what I meant by prepared, "ready", so let
> me address this as I do think its important.
> 
> By async calls do you mean users of async_schedule()? I see it

Yes.

> also uses system_unbound_wq as well but I do not see anyone calling
> flush_workqueue(system_unbound_wq) on the kernel. We do use
> async_synchronize_full() on kernel_init() but that just waits.

But you can create a new workqueue and queue all the async probing
work items there and flush the workqueue right after
async_synchronize_full().

...
> bus.enable_kern_async=1 would still also serve as a helper for the driver core
> to figure out if it should use async probe then on modules if 
> prefer_async_probe
> was enabled. Let me know if you figure out a way to avoid it.

Why do we need the choice at all?  It always should, no?

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 7/7] driver-core: add preferred async probe option for built-in and modules

2014-10-07 Thread Tejun Heo
Hello,

On Tue, Oct 07, 2014 at 07:50:10PM +0200, Luis R. Rodriguez wrote:
> On Tue, Oct 07, 2014 at 01:34:04PM -0400, Tejun Heo wrote:
> > But you can create a new workqueue and queue all the async probing
> > work items there and flush the workqueue right after
> > async_synchronize_full().
> 
> On second thought I would prefer to avoid this, I see this being good
> to help with old userspace but other than that I don't see a requirement
> for new userspace. Do you?

Hmmm... we batch up and do everything parallel, so I'm not sure how
much gain we'd be looking at by not waiting for at the end before
jumping into the userland.  Also, it's a bit of an orthogonal issue.
If we wanna skip such synchornization point before passing control to
userland, why are we applying that to this but not
async_synchronize_full() which has a far larger impact?  It's weird to
synchronize one while not the other, so yeah, if there are actual
benefits we can consider it but let's do it separately.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/14] libata: SATL update

2016-04-04 Thread Tejun Heo
Hello, Hannes.

Applied the series to libata/for-4.7-zac with cosmetic updates
((s64)-1 -> UINT64_MAX, kbuild warning fixes and other formatting
updates).

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/4] libata: ZAC support

2016-04-04 Thread Tejun Heo
Hello, Hannes.

Except for the minor nits, looks good to me.  Will wait for refresh
and apply to libata/for-4.7-zac.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/4] libata: implement ZBC IN translation

2016-04-04 Thread Tejun Heo
On Mon, Apr 04, 2016 at 11:47:48AM +0200, Hannes Reinecke wrote:
> +/*

/**

> + * ata_scsi_report_zones_complete
> + *
> + * Convert T-13 little-endian field representation into
> + * T-10 big-endian field representation.
> + */
> +static void ata_scsi_report_zones_complete(struct ata_queued_cmd *qc)
> +{
> + struct scsi_cmnd *scmd = qc->scsicmd;
> + struct sg_mapping_iter miter;
> + unsigned long flags;
> + unsigned int bytes = 0;
> +
> + sg_miter_start(&miter, scsi_sglist(scmd), scsi_sg_count(scmd),
> +SG_MITER_TO_SG | SG_MITER_ATOMIC);
> +
> + local_irq_save(flags);
> + while (sg_miter_next(&miter)) {
> + unsigned int offset = 0;

Not this patch's fault but ugh  :(

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 03/14] libata-scsi: sanitize ata_gen_ata_sense()

2016-04-04 Thread Tejun Heo
On Mon, Apr 04, 2016 at 02:26:51PM +0300, Sergei Shtylyov wrote:
> >+} else {
> >+/* Could not decode error */
> >+ata_dev_warn(dev, "could not decode error status 0x%x err_mask 
> >0x%x\n",
> 
>"%#x" is equivalent and takes up less space.

Oops, gmail for some reason put Sergei's messages into spam folder.
Hannes, can you please generate incremental patches for Sergei's
comments?

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 4/5] scsi: rename SCSI_MAX_{SG, SG_CHAIN}_SEGMENTS

2016-04-04 Thread Tejun Heo
On Mon, Apr 04, 2016 at 01:24:52PM -0700, Ming Lin wrote:
> Could you help to review/ack this ATA part?
> Thanks.

Can you please resend me the patches?

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 4/5] scsi: rename SCSI_MAX_{SG, SG_CHAIN}_SEGMENTS

2016-04-05 Thread Tejun Heo
On Mon, Apr 04, 2016 at 02:48:10PM -0700, Ming Lin wrote:
> From: Ming Lin 
> 
> Rename SCSI_MAX_SG_SEGMENTS to SG_CHUNK_SIZE, which means the amount
> we fit into a single scatterlist chunk.
> 
> Rename SCSI_MAX_SG_CHAIN_SEGMENTS to SG_MAX_SEGMENTS.
> 
> Will move these 2 generic definitions to scatterlist.h later.
> 
> Reviewed-by: Christoph Hellwig 
> Acked-by: Bart Van Assche  (for ib_srp changes)
> Signed-off-by: Ming Lin 

For libata,

Acked-by: Tejun Heo 

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv2 05/14] libata: NCQ Encapsulation for READ LOG DMA EXT

2016-04-13 Thread Tejun Heo
On Tue, Apr 12, 2016 at 08:47:49AM +0200, Hannes Reinecke wrote:
> Recent devices can use NCQ encapsulation for READ LOG DMA EXT,
> so we should be using it if available.

Does this have any actual benefits than being new and shiny?  It's
being called from EH path where we know that no other commands are in
flight.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv2 09/14] libata: fixup ZAC device disabling

2016-04-13 Thread Tejun Heo
On Tue, Apr 12, 2016 at 08:47:53AM +0200, Hannes Reinecke wrote:
> libata device disabling is ... curious. So add the correct
> definitions that we can disable ZAC devices properly.

Fold into an earlier patch?

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv2 06/14] libata: Check log page directory before accessing pages

2016-04-13 Thread Tejun Heo
On Tue, Apr 12, 2016 at 08:47:50AM +0200, Hannes Reinecke wrote:
> + log_pages = get_unaligned_le16(&ap->sector_buf[log_index]);
> + if (!log_pages) {
> + ata_dev_warn(dev,
> +  "NCQ Send/Recv Log not supported\n");

Shouldn't this be an info message?

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv2 05/14] libata: NCQ Encapsulation for READ LOG DMA EXT

2016-04-14 Thread Tejun Heo
Hello, Hannes.

On Thu, Apr 14, 2016 at 07:44:19AM +0200, Hannes Reinecke wrote:
> Hehe. No, it isn't, if you look closely.
> (Or make that: it _shouldn't_, and I've messed it up)
> (That's what the 'fpdma' parameter is for)
> 
> The benefit is not so much for normal operations, but it'll give us
> a performance improvements on SMR drives where we need to issue
> READ LOG DMA EXT rather frequently. There we really want to have
> them as NCQ commands.

I'm still a bit confused.  Isn't it part of EH?  If so, the path is
never traveled with other commands in flight and thus whether a
command is NCQ or not doesn't make any difference.  The only thing
it'd do is making devices which advertise the capability but don't get
it quite right fail.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv2 09/14] libata: fixup ZAC device disabling

2016-04-14 Thread Tejun Heo
On Thu, Apr 14, 2016 at 07:48:03AM +0200, Hannes Reinecke wrote:
> On 04/13/2016 08:09 PM, Tejun Heo wrote:
> > On Tue, Apr 12, 2016 at 08:47:53AM +0200, Hannes Reinecke wrote:
> >> libata device disabling is ... curious. So add the correct
> >> definitions that we can disable ZAC devices properly.
> > 
> > Fold into an earlier patch?
> > 
> That earlier patch would be commit
> 9162c6579bf90b3f5ddb7e3a6c6fa946c1b4cbeb
> ("libata: Implement ATA_DEV_ZAC"), which got merged
> in 2014 ...

Ah, right you are.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv2 05/14] libata: NCQ Encapsulation for READ LOG DMA EXT

2016-04-14 Thread Tejun Heo
On Thu, Apr 14, 2016 at 05:59:48PM +0200, Hannes Reinecke wrote:
> For this patch, yes, you are right.
> However, the ZAC enablement patches later on submit READ LOG EXT commands
> (for REPORT ZONES), and _they_ benefit from NCQ encapsulation.

Umm... so, you can't use ata_exec_internal() outside of EH context.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv3 00/14] libata: ZAC support

2016-04-25 Thread Tejun Heo
Hello,

On Mon, Apr 25, 2016 at 12:45:42PM +0200, Hannes Reinecke wrote:
> here's a patchset implementing ZAC support for libata.
> 
> This is the second part of a larger patchset for ZAC/ZBC support;
> it requires the scsi trace fixes queued for in mkp/4.7/scsi-queue and
> the patchset 'libata: SATL update' queued in tj/for-4.7-zac.

The patches look good from libata side.  If others are okay with it, I
can pull mkp/4.7/scsi-queue into tj/for-4.7-zac and apply this series
on top of it.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv3 00/14] libata: ZAC support

2016-05-09 Thread Tejun Heo
On Fri, May 06, 2016 at 01:05:36PM +0200, Hannes Reinecke wrote:
> On 04/25/2016 10:16 PM, Tejun Heo wrote:
> > Hello,
> > 
> > On Mon, Apr 25, 2016 at 12:45:42PM +0200, Hannes Reinecke wrote:
> >> here's a patchset implementing ZAC support for libata.
> >>
> >> This is the second part of a larger patchset for ZAC/ZBC support;
> >> it requires the scsi trace fixes queued for in mkp/4.7/scsi-queue and
> >> the patchset 'libata: SATL update' queued in tj/for-4.7-zac.
> > 
> > The patches look good from libata side.  If others are okay with it, I
> > can pull mkp/4.7/scsi-queue into tj/for-4.7-zac and apply this series
> > on top of it.
> > 
> Ping?
> 
> I can easily split off the libsas bits into a different patchset if
> this turns out to be an issue.
> But it would be really good if this could make it 4.7, it will help
> further integration a _lot_.

Pulled in mkp/4.7/scsi-queue into libata/for-4.7-zac and applied the
patches on top.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Converting ipr to use ata_port_operations->error_handler

2016-05-25 Thread Tejun Heo
Hello, Brian.

So, of all the ata drivers, ipr seems to be the only driver which
doesn't implement ata_port_opeations->error_handler and thus depends
on the old error handling path. I'm wondering whether it'd be possible
to convert ipr to implement ->error_handler and drop the old path.
Would that be difficult?

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


  1   2   3   4   5   6   7   8   >