Re: [PATCH 04/12] IB/srp: Fix connection state tracking

2015-05-05 Thread Doug Ledford
On Tue, 2015-05-05 at 17:27 +0200, Bart Van Assche wrote:
> On 05/05/15 17:10, Doug Ledford wrote:
> > On Tue, 2015-05-05 at 16:26 +0200, Bart Van Assche wrote:
> >> On 05/05/15 16:10, Doug Ledford wrote:
> >>> However, while looking through the driver to research this, I noticed
> >>> something else that seems more important if you ask me.  With this patch
> >>> we now implement individual channel connection tracking.  However, in
> >>> srp_queuecommand() you pick the channel based on the tag, and the blk
> >>> layer has no idea of these disconnects, so the blk layer is free to
> >>> assign a tag/channel to a channel that's disconnected, and then as best
> >>> I can tell, you will simply try to post a work request to a channel
> >>> that's already disconnected, which I would expect to fail if we have
> >>> already disconnected this particular qp and not brought up a new one
> >>> yet.  So it seems to me there is a race condition between new incoming
> >>> SCSI commands and this disconnect/reconnect window, and that maybe we
> >>> should be sending these commands back to the mid layer for requeueing
> >>> when the channel the blk_mq tag points to is disconnected.  Or am I
> >>> missing something in there?
> >>
> >> Hello Doug,
> >>
> >> Around the time a cable disconnect or other link layer failure is
> >> detected by the SRP initiator or any other SCSI LLD it is unavoidable
> >> that one or more SCSI requests fail. It is up to a higher layer (e.g.
> >> dm-multipath + multipathd) to decide what to do with such requests, e.g.
> >> queue these requests and resend these over another path.
> >
> > Sure, but that wasn't my point.  My point was that if you know the
> > channel is disconnected, then why don't you go immediately to the
> > correct action in queuecommand (where correct action could be requeue
> > waiting on reconnect or return with error, whatever is appropriate)?
> > Instead you attempt to post a command to a known disconnected queue
> > pair.
> >
> >> The SRP initiator driver has been tested thoroughly with the multipath
> >> queue_if_no_path policy, with a fio job with I/O verification enabled
> >> running on top of a dm device while concurrently repeatedly simulating
> >> link layer failures (via ibportstate).
> >
> > Part of my questions here are because I don't know how the blk_mq
> > handles certain conditions.  However, your testing above only handles
> > one case: all channels get dropped.  As unlikely it may be, what if
> > resource constraints caused just one channel to be dropped out of the
> > bunch and the others stayed alive?  Then the blk_mq would see requests
> > on just one queue come back errored, while the others finished
> > successfully.  Does it drop that one queue out of rotation, or does it
> > fail over the entire connection?
> 
> Hello Doug,
> 
> Sorry but I don't think that a SCSI LLD is the appropriate layer to 
> choose between requeuing or failing a request.

Be that as it may, that doesn't change what I said about posting a
command to a known disconnected QP.  You could just fail immediately.
Something like:

if (!ch->connected) {
scmnd->result = DID_NO_CONNECT;
goto err;
}

right after getting the channel in queuecommand would work.  That would
save a couple spinlocks, several DMA mappings, a call into the low level
driver, and a few other things.  (And I only left requeue on the table
because I wasn't sure how the blk_mq dealt with just a single channel
being down versus all of them being down)

>  If multiple paths are 
> available between an initiator system and a SAN and if one path fails

Who says the path failed?  The path may be just fine.

> only the multipath layer knows whether there are other working paths 
> available. If a working path is still available then the request should 
> be resent as soon as possible over another path. The multipath layer can 
> only take such a decision after a SCSI LLD has failed a request.

Sure.  I totally get failing fast and unilaterally for multipath managed
devices.  That's all assuming multipath though.  There are uses without
that.

But my point in all of this is that if you have a single qp between
yourself and the target, then any error including a qp resource error ==
path error since you only have one path.  When you have a multi queue
device, that's no longer true.  A transient resource problem on one qp
does not mean a path event (at least not necessarily, although your
statement below converts a QP event into a p

Re: [PATCH 0/3] IB/srp patches for Linux kernel v4.3

2015-08-01 Thread Doug Ledford
On 07/31/2015 05:12 PM, Bart Van Assche wrote:
> Hello Doug,
> 
> Please apply the following three patches at your earliest convenience:
> 
> 0001-IB-srp-Constify-a-function-argument.patch
> 0002-IB-srp-Handle-partial-connection-success-correctly.patch
> 0003-IB-srp-Bump-driver-version-and-release-date.patch

Got 'em, thanks!


-- 
Doug Ledford 
  GPG KeyID: 0E572FDD




signature.asc
Description: OpenPGP digital signature


Re: iscsi_trx going into D state

2016-12-22 Thread Doug Ledford
>>> [] kthread+0xd8/0xf0
>>>> [] ret_from_fork+0x3f/0x70
>>>> [] 0x
>>>> # cat /proc/23018/stack
>>>> [] target_wait_for_sess_cmds+0x49/0x1a0
>>>> [] isert_wait_conn+0x1ab/0x2f0 [ib_isert]
>>>> [] iscsit_close_connection+0x162/0x870
>>>> [] iscsit_take_action_for_connection_exit+0x7f/0x100
>>>> [] iscsi_target_tx_thread+0x1aa/0x1d0
>>>> [] kthread+0xd8/0xf0
>>>> [] ret_from_fork+0x3f/0x70
>>>> [] 0x
>>>>
>>>> From dmesg:
>>>> [  394.476332] INFO: rcu_sched self-detected stall on CPU
>>>> [  394.476334]  20-...: (23976 ticks this GP)
>>>> idle=edd/141/0 softirq=292/292 fqs=18788
>>>> [  394.476336]   (t=24003 jiffies g=3146 c=3145 q=0)
>>>> [  394.476337] Task dump for CPU 20:
>>>> [  394.476338] kworker/u68:2   R  running task0 12906  2 
>>>> 0x0008
>>>> [  394.476345] Workqueue: isert_comp_wq isert_cq_work [ib_isert]
>>>> [  394.476346]  883f2fe38000 f805705e 883f7fd03da8
>>>> 810ac8ff
>>>> [  394.476347]  0014 81adb680 883f7fd03dc0
>>>> 810af239
>>>> [  394.476348]  0015 883f7fd03df0 810e1cd0
>>>> 883f7fd17b80
>>>> [  394.476348] Call Trace:
>>>> [  394.476354][] sched_show_task+0xaf/0x110
>>>> [  394.476355]  [] dump_cpu_task+0x39/0x40
>>>> [  394.476357]  [] rcu_dump_cpu_stacks+0x80/0xb0
>>>> [  394.476359]  [] rcu_check_callbacks+0x540/0x820
>>>> [  394.476360]  [] ? account_system_time+0x81/0x110
>>>> [  394.476363]  [] ? tick_sched_do_timer+0x50/0x50
>>>> [  394.476364]  [] update_process_times+0x39/0x60
>>>> [  394.476365]  [] tick_sched_handle.isra.17+0x25/0x60
>>>> [  394.476366]  [] tick_sched_timer+0x3d/0x70
>>>> [  394.476368]  [] __hrtimer_run_queues+0x102/0x290
>>>> [  394.476369]  [] hrtimer_interrupt+0xa8/0x1a0
>>>> [  394.476372]  [] local_apic_timer_interrupt+0x35/0x60
>>>> [  394.476374]  [] smp_apic_timer_interrupt+0x3d/0x50
>>>> [  394.476376]  [] apic_timer_interrupt+0x87/0x90
>>>> [  394.476379][] ? console_unlock+0x41e/0x4e0
>>>> [  394.476380]  [] vprintk_emit+0x2fc/0x500
>>>> [  394.476382]  [] vprintk_default+0x1f/0x30
>>>> [  394.476384]  [] printk+0x5d/0x74
>>>> [  394.476388]  [] transport_lookup_cmd_lun+0x1d1/0x200
>>>> [  394.476390]  [] iscsit_setup_scsi_cmd+0x230/0x540
>>>> [  394.476392]  [] isert_rx_do_work+0x3f3/0x7f0 
>>>> [ib_isert]
>>>> [  394.476394]  [] isert_cq_work+0x184/0x770 [ib_isert]
>>>> [  394.476396]  [] process_one_work+0x14f/0x400
>>>> [  394.476397]  [] worker_thread+0x114/0x470
>>>> [  394.476398]  [] ? __schedule+0x34a/0x7f0
>>>> [  394.476399]  [] ? rescuer_thread+0x310/0x310
>>>> [  394.476400]  [] kthread+0xd8/0xf0
>>>> [  394.476402]  [] ? kthread_park+0x60/0x60
>>>> [  394.476403]  [] ret_from_fork+0x3f/0x70
>>>> [  394.476404]  [] ? kthread_park+0x60/0x60
>>>> [  405.716632] Unexpected ret: -104 send data 360
>>>> [  405.721711] tx_data returned -32, expecting 360.
>>>> 
>>>> Robert LeBlanc
>>>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1

When you combine this trace with the newest one, it really makes me
thing there is something of a bad interaction between the new drain cq
API and the iser/isert implementation to use said API.  Sagi, Christoph?

-- 
Doug Ledford 
GPG Key ID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD



signature.asc
Description: OpenPGP digital signature


Re: [PATCH net-next 4/5] treewide: replace dev->trans_start update with helper

2016-05-12 Thread Doug Ledford
On 05/03/2016 10:33 AM, Florian Westphal wrote:
> Replace all trans_start updates with netif_trans_update helper.
> change was done via spatch:
> 
> struct net_device *d;
> @@
> - d->trans_start = jiffies
> + netif_trans_update(d)
> 
> Compile tested only.
> 
> Cc: user-mode-linux-de...@lists.sourceforge.net
> Cc: linux-xte...@linux-xtensa.org
> Cc: linux1394-de...@lists.sourceforge.net
> Cc: linux-r...@vger.kernel.org
> Cc: net...@vger.kernel.org
> Cc: mpt-fusionlinux@broadcom.com
> Cc: linux-scsi@vger.kernel.org
> Cc: linux-...@vger.kernel.org
> Cc: linux-par...@vger.kernel.org
> Cc: linux-o...@vger.kernel.org
> Cc: linux-h...@vger.kernel.org
> Cc: linux-...@vger.kernel.org
> Cc: linux-wirel...@vger.kernel.org
> Cc: linux-s...@vger.kernel.org
> Cc: de...@driverdev.osuosl.org
> Cc: b.a.t.m@lists.open-mesh.org
> Cc: linux-blueto...@vger.kernel.org
> Signed-off-by: Florian Westphal 
> ---
>  Checkpatch complains about whitespace damage, but
>  this extra whitespace already exists before this patch.
> 
>  drivers/infiniband/hw/nes/nes_nic.c| 2 +-
>  drivers/infiniband/ulp/ipoib/ipoib_cm.c| 2 +-
>  drivers/infiniband/ulp/ipoib/ipoib_ib.c| 2 +-

For InfiniBand bits,

Acked-by: Doug Ledford 

-- 
Doug Ledford 
  GPG KeyID: 0E572FDD




signature.asc
Description: OpenPGP digital signature


Re: [PATCH] scsi: delete decade+ obsolete aic7xxx_old driver

2013-09-16 Thread Doug Ledford
Yes, this driver is well past ready to be removed.

Acked-by: Doug Ledford 

Sent from my ASUS Pad

Paul Gortmaker  wrote:

>After getting warnings in an allyesconfig build[1] from this
>driver, I decided to remind myself just how old it was, and
>whether it warranted fixing.  In the Kconfig help text, I found:
>
>  "This driver will eventually be phased out entirely"
>
>Going back to the history archive, I see the line was added[2]
>in Feb 2002, when we moved from v2.4.2.1 ---> v2.4.2.2
>
>So, with over a decade of notification, and multiple major releases
>since then, I think we can justify removing this.  Currently we have
>people wasting time building it during routine testing, and then
>wasting more time re-researching the known reported warnings, only to
>find that nobody really is willing to integrate the fixes[3] for it.
>
>A quick search didn't seem to indicate any active user base for it.
>If someone happens to have a quirky _old_ card that the eleven year
>old "new" driver doesn't work with, then it is entirely reasonable
>that they stick with a kernel version that predates this removal.
>
>[1] drivers/scsi/aic7xxx_old.c: In function ‘aic7xxx_register’:
>drivers/scsi/aic7xxx_old.c:7901:5: warning: case value ‘257’ not in 
> enumerated type ‘ahc_chip’ [-Wswitch]
>drivers/scsi/aic7xxx_old.c:7898:5: warning: case value ‘513’ not in 
> enumerated type ‘ahc_chip’ [-Wswitch]
>drivers/scsi/aic7xxx_old.c: In function ‘aic7xxx_load_seeprom’:
>drivers/scsi/aic7xxx_old.c:8517:5: warning: case value ‘257’ not in 
> enumerated type ‘ahc_chip’ [-Wswitch]
>drivers/scsi/aic7xxx_old.c:8510:5: warning: case value ‘513’ not in 
> enumerated type ‘ahc_chip’ [-Wswitch]
>
>[2] http://git.kernel.org/cgit/linux/kernel/git/tglx/history.git commit 
>44e8778c
>
>[3] https://lkml.org/lkml/2012/10/29/215
>
>Cc: Hannes Reinecke 
>Cc: Doug Ledford 
>Cc: "James E.J. Bottomley" 
>Signed-off-by: Paul Gortmaker 
>---
>
>[This is an "--irreversible-delete" pseudo-patch which doesn't show all
>the file content that was deleted wholesale.  The full commit is at:
>git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux.git aic7xxx-delete ]
>
> Documentation/scsi/00-INDEX | 2 -
> Documentation/scsi/aic7xxx_old.txt  |   511 --
> MAINTAINERS | 1 -
> drivers/scsi/Kconfig|41 -
> drivers/scsi/Makefile   | 1 -
> drivers/scsi/aic7xxx_old.c  | 11149 --
> drivers/scsi/aic7xxx_old/aic7xxx.h  |28 -
> drivers/scsi/aic7xxx_old/aic7xxx.reg|  1401 
> drivers/scsi/aic7xxx_old/aic7xxx.seq|  1539 -
> drivers/scsi/aic7xxx_old/aic7xxx_proc.c |   270 -
> drivers/scsi/aic7xxx_old/aic7xxx_reg.h  |   629 --
> drivers/scsi/aic7xxx_old/aic7xxx_seq.c  |   817 ---
> drivers/scsi/aic7xxx_old/scsi_message.h |49 -
> drivers/scsi/aic7xxx_old/sequencer.h|   135 -
> 14 files changed, 16573 deletions(-)
> delete mode 100644 Documentation/scsi/aic7xxx_old.txt
> delete mode 100644 drivers/scsi/aic7xxx_old.c
> delete mode 100644 drivers/scsi/aic7xxx_old/aic7xxx.h
> delete mode 100644 drivers/scsi/aic7xxx_old/aic7xxx.reg
> delete mode 100644 drivers/scsi/aic7xxx_old/aic7xxx.seq
> delete mode 100644 drivers/scsi/aic7xxx_old/aic7xxx_proc.c
> delete mode 100644 drivers/scsi/aic7xxx_old/aic7xxx_reg.h
> delete mode 100644 drivers/scsi/aic7xxx_old/aic7xxx_seq.c
> delete mode 100644 drivers/scsi/aic7xxx_old/scsi_message.h
> delete mode 100644 drivers/scsi/aic7xxx_old/sequencer.h
>
>diff --git a/Documentation/scsi/00-INDEX b/Documentation/scsi/00-INDEX
>index 9b0787f..2044be5 100644
>--- a/Documentation/scsi/00-INDEX
>+++ b/Documentation/scsi/00-INDEX
>@@ -42,8 +42,6 @@ aic79xx.txt
>   - Adaptec Ultra320 SCSI host adapters
> aic7xxx.txt
>   - info on driver for Adaptec controllers
>-aic7xxx_old.txt
>-  - info on driver for Adaptec controllers, old generation
> arcmsr_spec.txt
>   - ARECA FIRMWARE SPEC (for IOP331 adapter)
> dc395x.txt
>diff --git a/Documentation/scsi/aic7xxx_old.txt 
>b/Documentation/scsi/aic7xxx_old.txt
>deleted file mode 100644
>index ecfc474..000
>diff --git a/MAINTAINERS b/MAINTAINERS
>index e61c2e8..c79be42 100644
>--- a/MAINTAINERS
>+++ b/MAINTAINERS
>@@ -470,7 +470,6 @@ M: Hannes Reinecke 
> L:linux-scsi@vger.kernel.org
> S:Maintained
> F:drivers/scsi/aic7xxx/
>-F:drivers/scsi/aic7xxx_old/
> 
> AIMSLAB FM RADIO RECEIVER DRIVER
> M:Hans Verkuil 
>diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig
>index fe25677..1f02003 100644
>--- a/drivers/scsi/Kconf

Re: [PATCH] scsi: delete decade+ obsolete aic7xxx_old driver

2013-09-17 Thread Doug Ledford
On 09/17/13 16:10, James Bottomley wrote:

> OK, so do we have any real evidence that no-one uses this driver?  Does
> any distro actually compile it, for instance?

Red Hat doesn't use it in any of their products (and it hasn't been the
preferred driver since about the old Red Hat Linux 7.0 days).


-- 
Doug Ledford 
  GPG KeyID: 0E572FDD
  http://people.redhat.com/dledford

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 10/13] IB/srp: use the new CQ API

2015-12-11 Thread Doug Ledford
On 12/11/2015 09:22 AM, Christoph Hellwig wrote:
> Hi Bart,
> 
> thanks for all the reviews.  I've updated the git branch with your
> suggestions and reviewed-by tags.  I'm going to wait a little bit
> longer for other reviews to come in before reposting the series.

Indeed, thanks for all the catches Bart.  This patchset, with Bart's
fixups, looks good to me.


-- 
Doug Ledford 
  GPG KeyID: 0E572FDD




signature.asc
Description: OpenPGP digital signature


Re: [PATCH 10/13] IB/srp: use the new CQ API

2015-12-14 Thread Doug Ledford
On 12/13/2015 05:26 AM, Sagi Grimberg wrote:
> 
>> Allright.  How do you want to proceed?  The current rdma-cq branch
>> has all kinds of dependencies, but I've also prepared a new rdma-cq.2
>> branch that could go straight on top of your current queue:
>>
>> http://git.infradead.org/users/hch/rdma.git/shortlog/refs/heads/rdma-cq.2
>>
>> If you're ready to start the 4.5 tree I can send those out as a patch
>> series.
> 
> Will this get on top of iser-remote-inv? Or should I resend atop of this?

I'm going through my inbox right now.  I expect somewhere in there Or
will make his case for why he doesn't like Christoph's patch to get rid
of the attr struct.  I'll listen, and if I'm not convinced, I'll take
that patchset first and this one second. (I reviewed the patchset
alreadyaside from the fact that I *like* having the attr struct
elements in an organized sub-struct, it's fine and it definitely
improves on all of those query calls).

-- 
Doug Ledford 
  GPG KeyID: 0E572FDD




signature.asc
Description: OpenPGP digital signature


Re: completion queue abstraction V2

2015-12-23 Thread Doug Ledford
On 12/07/2015 03:51 PM, Christoph Hellwig wrote:
> This series adds a new RDMA core abstraction that insulated the
> ULPs from the nitty gritty details of CQ polling.  See the individual
> patches for more details.
> 
> Note that this series should be applied on top of my
> "IB: merge struct ib_device_attr into struct ib_device" patch and the
> MR cleanups.

This has been applied, although I used the clean version from your git
repo that didn't depend on the previous patch this one relied upon.

The same comment I made to Christoph Lameter would be helpful here:
please include the v1/v2 in your --subject-prefix you pass to git.
Patchworks does not preserve cover letters nor propagate flags from
cover letter to patch series :-/


-- 
Doug Ledford 
  GPG KeyID: 0E572FDD




signature.asc
Description: OpenPGP digital signature


Re: [PATCH 06/17] Update the infiniband uverbs driver to use idr helper functions.

2015-09-16 Thread Doug Ledford
On 09/16/2015 01:50 PM, Lee Duncan wrote:
> Signed-off-by: Lee Duncan 

Looks OK to me.  The setting of uobj->id is no longer under the lock,
but we won't succeed at an idr lookup until it is set, which means it
won't be found and can't be used in idr_remove_uobj() until after the
uobj->id is set regardless of the lock.

Acked-by: Doug Ledford 

> ---
>  drivers/infiniband/core/uverbs_cmd.c | 12 ++--
>  1 file changed, 2 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/infiniband/core/uverbs_cmd.c 
> b/drivers/infiniband/core/uverbs_cmd.c
> index bbb02ffe87df..1e5b2a66a501 100644
> --- a/drivers/infiniband/core/uverbs_cmd.c
> +++ b/drivers/infiniband/core/uverbs_cmd.c
> @@ -120,24 +120,16 @@ static int idr_add_uobj(struct idr *idr, struct 
> ib_uobject *uobj)
>  {
>   int ret;
>  
> - idr_preload(GFP_KERNEL);
> - spin_lock(&ib_uverbs_idr_lock);
> -
> - ret = idr_alloc(idr, uobj, 0, 0, GFP_NOWAIT);
> + ret = idr_get_index(idr, &ib_uverbs_idr_lock, uobj);
>   if (ret >= 0)
>   uobj->id = ret;
>  
> - spin_unlock(&ib_uverbs_idr_lock);
> - idr_preload_end();
> -
>   return ret < 0 ? ret : 0;
>  }
>  
>  void idr_remove_uobj(struct idr *idr, struct ib_uobject *uobj)
>  {
> - spin_lock(&ib_uverbs_idr_lock);
> - idr_remove(idr, uobj->id);
> - spin_unlock(&ib_uverbs_idr_lock);
> + idr_put_index(idr, &ib_uverbs_idr_lock, uobj->id);
>  }
>  
>  static struct ib_uobject *__idr_get_uobj(struct idr *idr, int id,
> 


-- 
Doug Ledford 
  GPG KeyID: 0E572FDD




signature.asc
Description: OpenPGP digital signature


Re: [GIT PULL] delete decade+ obsolete aic7xxx_old driver

2013-11-08 Thread Doug Ledford

On 11/08/2013 11:04 AM, Paul Gortmaker wrote:


Paul Gortmaker (1):
   scsi: delete decade+ obsolete aic7xxx_old driver

  Documentation/scsi/00-INDEX | 2 -
  Documentation/scsi/aic7xxx_old.txt  |   511 --
  MAINTAINERS | 1 -
  drivers/scsi/Kconfig|41 -
  drivers/scsi/Makefile   | 1 -
  drivers/scsi/aic7xxx_old.c  | 11149 --
  drivers/scsi/aic7xxx_old/aic7xxx.h  |28 -
  drivers/scsi/aic7xxx_old/aic7xxx.reg|  1401 
  drivers/scsi/aic7xxx_old/aic7xxx.seq|  1539 -
  drivers/scsi/aic7xxx_old/aic7xxx_proc.c |   270 -
  drivers/scsi/aic7xxx_old/aic7xxx_reg.h  |   629 --
  drivers/scsi/aic7xxx_old/aic7xxx_seq.c  |   817 ---
  drivers/scsi/aic7xxx_old/scsi_message.h |49 -
  drivers/scsi/aic7xxx_old/sequencer.h|   135 -
  14 files changed, 16573 deletions(-)
  delete mode 100644 Documentation/scsi/aic7xxx_old.txt
  delete mode 100644 drivers/scsi/aic7xxx_old.c
  delete mode 100644 drivers/scsi/aic7xxx_old/aic7xxx.h
  delete mode 100644 drivers/scsi/aic7xxx_old/aic7xxx.reg
  delete mode 100644 drivers/scsi/aic7xxx_old/aic7xxx.seq
  delete mode 100644 drivers/scsi/aic7xxx_old/aic7xxx_proc.c
  delete mode 100644 drivers/scsi/aic7xxx_old/aic7xxx_reg.h
  delete mode 100644 drivers/scsi/aic7xxx_old/aic7xxx_seq.c
  delete mode 100644 drivers/scsi/aic7xxx_old/scsi_message.h
  delete mode 100644 drivers/scsi/aic7xxx_old/sequencer.h



Farewell old friend...don't let the door hit you in the ass on the way 
out ;-)

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] drivers/scsi/aic7xxx_old: Convert to generic boolean-values

2007-02-16 Thread Doug Ledford
On Fri, 2007-02-16 at 10:50 -0800, Andrew Morton wrote:

> Me no understand.
> 
> If you take the specific example of
> 
> void
> ahd_set_syncrate(struct ahd_softc *ahd, struct ahd_devinfo *devinfo,
>u_int period, u_int offset, u_int ppr_options,
>u_int type, int paused)
> 
> then if is crufty, inappropriate and wrong that `paused' is a scalar type.

Although you picked a code segment out of the modern aic7xxx, there is a
matching similar one in aic7xxx_old.  Now, in all fairness, I was at one
point playing with a much more preemptable model for that driver that
allowed nested pauses, at which point the value of pause would have made
sense to be scalar, but that was a *long* time ago.

>  
> It's just not true or sensible that the code is written so that `paused'
> can take a value of seventy eight.  It _is_ a boolean.  It is a truth
> value.  Declaring it as such in the source is all goodness.  Passing the
> value `true' into calls to this function improve readability over passing
> "1".

Hence the reason for the original upper case TRUE/FALSE.  I have to
admit, I don't really like the lower case true/false, it looks like a
variable that can be assigned, thereby changing the implementation of
the function call when in fact each calling location is hard coding a
constant.  But, that's just me and my crufty old C that differentiates
between hard coded things and variables via case.

> So I don't agree with (or understand) your objections.  But I can certainly
> understand reluctance to merge a large-but-minor, do-nothing-much patch into
> a large and not-very-maintained driver.

Hehehe...and here I was thinking of factoring that thing into files and
actually bringing it into the current century.

-- 
Doug Ledford <[EMAIL PROTECTED]>
  GPG KeyID: CFBFF194
  http://people.redhat.com/dledford

Infiniband specific RPMs available at
  http://people.redhat.com/dledford/Infiniband


signature.asc
Description: This is a digitally signed message part


Re: [PATCH v10 00/15] Replace PCI pool by DMA pool API

2017-07-09 Thread Doug Ledford
On Thu, 2017-07-06 at 10:12 +0200, Romain Perier wrote:
> The current PCI pool API are simple macro functions direct expanded
> to
> the appropriate dma pool functions. The prototypes are almost the
> same
> and semantically, they are very similar. I propose to use the DMA
> pool
> API directly and get rid of the old API.
> 
> This set of patches, replaces the old API by the dma pool API
> and remove the defines.

Is someone planning on merging this series?  If not, I'll send through
the patches I've personally tested (3, 5, and 6).

> Changes in v10:
> - Rebased series onto next-20170706
> - I have fixed and improved patch "scsi: megaraid: Replace PCI pool
> old API"
> 
> Changes in v9:
> - Rebased series onto next-20170522
> - I have fixed and improved the patch for lpfc driver
> 
> Changes in v8:
> - Rebased series onto next-20170428
> 
> Changes in v7:
> - Rebased series onto next-20170416
> - Added Acked-by, Tested-by and Reviwed-by tags
> 
> Changes in v6:
> - Fixed an issue reported by kbuild test robot about changes in
> DAC960
> - Removed patches 15/19,16/19,17/19,18/19. They have been merged by
> Greg
> - Added Acked-by Tags
> 
> Changes in v5:
> - Re-worded the cover letter (remove sentence about checkpatch.pl)
> - Rebased series onto next-20170308
> - Fix typos in commit message
> - Added Acked-by Tags
> 
> Changes in v4:
> - Rebased series onto next-20170301
> - Removed patch 20/20: checks done by checkpath.pl, no longer
> required.
>   Thanks to Peter and Joe for their feedbacks.
> - Added Reviewed-by tags
> 
> Changes in v3:
> - Rebased series onto next-20170224
> - Fix checkpath.pl reports for patch 11/20 and patch 12/20
> - Remove prefix RFC
> Changes in v2:
> - Introduced patch 18/20
> - Fixed cosmetic changes: spaces before brace, live over 80
> characters
> - Removed some of the check for NULL pointers before calling
> dma_pool_destroy
> - Improved the regexp in checkpatch for pci_pool, thanks to Joe
> Perches
> - Added Tested-by and Acked-by tags
> 
> Romain Perier (15):
>   block: DAC960: Replace PCI pool old API
>   dmaengine: pch_dma: Replace PCI pool old API
>   IB/mthca: Replace PCI pool old API
>   net: e100: Replace PCI pool old API
>   mlx4: Replace PCI pool old API
>   mlx5: Replace PCI pool old API
>   wireless: ipw2200: Replace PCI pool old API
>   scsi: be2iscsi: Replace PCI pool old API
>   scsi: csiostor: Replace PCI pool old API
>   scsi: lpfc: Replace PCI pool old API
>   scsi: megaraid: Replace PCI pool old API
>   scsi: mpt3sas: Replace PCI pool old API
>   scsi: mvsas: Replace PCI pool old API
>   scsi: pmcraid: Replace PCI pool old API
>   PCI: Remove PCI pool macro functions
> 
>  drivers/block/DAC960.c| 38 ++-
>  drivers/block/DAC960.h|  4 +-
>  drivers/dma/pch_dma.c | 12 ++--
>  drivers/infiniband/hw/mthca/mthca_av.c| 10 +--
>  drivers/infiniband/hw/mthca/mthca_cmd.c   |  8 +--
>  drivers/infiniband/hw/mthca/mthca_dev.h   |  4 +-
>  drivers/net/ethernet/intel/e100.c | 12 ++--
>  drivers/net/ethernet/mellanox/mlx4/cmd.c  | 10 +--
>  drivers/net/ethernet/mellanox/mlx4/mlx4.h |  2 +-
>  drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 11 ++--
>  drivers/net/wireless/intel/ipw2x00/ipw2200.c  | 13 ++--
>  drivers/scsi/be2iscsi/be_iscsi.c  |  6 +-
>  drivers/scsi/be2iscsi/be_main.c   |  6 +-
>  drivers/scsi/be2iscsi/be_main.h   |  2 +-
>  drivers/scsi/csiostor/csio_hw.h   |  2 +-
>  drivers/scsi/csiostor/csio_init.c | 11 ++--
>  drivers/scsi/csiostor/csio_scsi.c |  6 +-
>  drivers/scsi/lpfc/lpfc.h  | 16 ++---
>  drivers/scsi/lpfc/lpfc_init.c | 16 ++---
>  drivers/scsi/lpfc/lpfc_mem.c  | 90 +--
> 
>  drivers/scsi/lpfc/lpfc_nvme.c |  6 +-
>  drivers/scsi/lpfc/lpfc_nvmet.c|  4 +-
>  drivers/scsi/lpfc/lpfc_scsi.c | 12 ++--
>  drivers/scsi/lpfc/lpfc_sli.c  |  6 +-
>  drivers/scsi/megaraid/megaraid_mbox.c | 30 -
>  drivers/scsi/megaraid/megaraid_mm.c   | 29 -
>  drivers/scsi/megaraid/megaraid_sas_base.c | 27 
>  drivers/scsi/megaraid/megaraid_sas_fusion.c   | 46 +++---
>  drivers/scsi/mpt3sas/mpt3sas_base.c   | 73 ++---
> -
>  drivers/scsi/mvsas/mv_init.c  |  6 +-
>  drivers/scsi/mvsas/mv_sas.c   |  6 +-
>  drivers/scsi/pmcraid.c| 10 +--
>  drivers/scsi/pmcraid.h|  2 +-
>  include/linux/mlx5/driver.h   |  2 +-
>  include/linux/pci.h   |  9 ---
>  35 files changed, 269 insertions(+), 278 deletions(-)
> 
-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD



Re: [PATCHv2] ib_srp: Remove WARN_ON in srp_terminate_io()

2018-10-17 Thread Doug Ledford
On Wed, 2018-10-17 at 07:43 -0700, Bart Van Assche wrote:
> On 10/17/18 12:20 AM, Hannes Reinecke wrote:
> > The WARN_ON() is pointless as the rport is placed in SDEV_TRANSPORT_OFFLINE
> > at that time, so no new commands can be submitted via srp_queuecommand()
> > 
> > Signed-off-by: Hannes Reinecke 
> > Reviewed-by: Jens Axboe 
> > Reviewed-by: Johannes Thumshirn 
> > ---
> >   drivers/infiniband/ulp/srp/ib_srp.c | 7 ---
> >   1 file changed, 7 deletions(-)
> > 
> > diff --git a/drivers/infiniband/ulp/srp/ib_srp.c 
> > b/drivers/infiniband/ulp/srp/ib_srp.c
> > index 0b34e909505f..5a79444c2f3c 100644
> > --- a/drivers/infiniband/ulp/srp/ib_srp.c
> > +++ b/drivers/infiniband/ulp/srp/ib_srp.c
> > @@ -1334,13 +1334,6 @@ static void srp_terminate_io(struct srp_rport *rport)
> > struct scsi_device *sdev;
> > int i, j;
> >   
> > -   /*
> > -* Invoking srp_terminate_io() while srp_queuecommand() is running
> > -* is not safe. Hence the warning statement below.
> > -*/
> > -   shost_for_each_device(sdev, shost)
> > -   WARN_ON_ONCE(sdev->request_queue->request_fn_active);
> > -
> > for (i = 0; i < target->ch_count; i++) {
> > ch = &target->ch[i];
> 
> Although I had explained before why I think that warning is not 
> pointless, I agree with this change because the legacy block layer is 
> going away. Anyway:
> 
> Acked-by: Bart Van Assche 

Thanks, applied to for-next.

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD


signature.asc
Description: This is a digitally signed message part


Re: [PATCHv2] ib_srp: Remove WARN_ON in srp_terminate_io()

2018-10-17 Thread Doug Ledford
On Wed, 2018-10-17 at 11:30 -0400, Doug Ledford wrote:
> On Wed, 2018-10-17 at 07:43 -0700, Bart Van Assche wrote:
> > On 10/17/18 12:20 AM, Hannes Reinecke wrote:
> > > The WARN_ON() is pointless as the rport is placed in 
> > > SDEV_TRANSPORT_OFFLINE
> > > at that time, so no new commands can be submitted via srp_queuecommand()
> > > 
> > > Signed-off-by: Hannes Reinecke 
> > > Reviewed-by: Jens Axboe 
> > > Reviewed-by: Johannes Thumshirn 
> > > ---
> > >   drivers/infiniband/ulp/srp/ib_srp.c | 7 ---
> > >   1 file changed, 7 deletions(-)
> > > 
> > > diff --git a/drivers/infiniband/ulp/srp/ib_srp.c 
> > > b/drivers/infiniband/ulp/srp/ib_srp.c
> > > index 0b34e909505f..5a79444c2f3c 100644
> > > --- a/drivers/infiniband/ulp/srp/ib_srp.c
> > > +++ b/drivers/infiniband/ulp/srp/ib_srp.c
> > > @@ -1334,13 +1334,6 @@ static void srp_terminate_io(struct srp_rport 
> > > *rport)
> > >   struct scsi_device *sdev;
> > >   int i, j;
> > >   
> > > - /*
> > > -  * Invoking srp_terminate_io() while srp_queuecommand() is running
> > > -  * is not safe. Hence the warning statement below.
> > > -  */
> > > - shost_for_each_device(sdev, shost)
> > > - WARN_ON_ONCE(sdev->request_queue->request_fn_active);
> > > -
> > >   for (i = 0; i < target->ch_count; i++) {
> > >   ch = &target->ch[i];
> > 
> > Although I had explained before why I think that warning is not 
> > pointless, I agree with this change because the legacy block layer is 
> > going away. Anyway:
> > 
> > Acked-by: Bart Van Assche 
> 
> Thanks, applied to for-next.
> 

FWIW, this introduced a build warning (shost and sdev are now unused
variables).  I edited the patch to remove the newly unneeded variables.

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD


signature.asc
Description: This is a digitally signed message part


Re: [PATCH v6 06/15] mlx5: Replace PCI pool old API

2017-03-24 Thread Doug Ledford
On Mon, 2017-03-20 at 08:31 +0200, Leon Romanovsky wrote:
> On Sun, Mar 19, 2017 at 06:03:55PM +0100, Romain Perier wrote:
> > 
> > The PCI pool API is deprecated. This commit replaces the PCI pool
> > old
> > API by the appropriate function with the DMA pool API.
> > 
> > Signed-off-by: Romain Perier 
> > Reviewed-by: Peter Senna Tschudin 
> > ---
> >  drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 11 ++-
> >  include/linux/mlx5/driver.h   |  2 +-
> >  2 files changed, 7 insertions(+), 6 deletions(-)
> > 
> 
> Thanks,
> Acked-by: Leon Romanovsky 

Changes look fine to me, and in addition I've compiled and tested them
on mlx5 hardware and verified that RDMA communications still work.

Acked-by: Doug Ledford 
Tested-by: Doug Ledford 

-- 
Doug Ledford 
    GPG KeyID: B826A3330E572FDD
   
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD



Re: [PATCH v6 03/15] IB/mthca: Replace PCI pool old API

2017-03-24 Thread Doug Ledford
On Sun, 2017-03-19 at 18:03 +0100, Romain Perier wrote:
> The PCI pool API is deprecated. This commit replaces the PCI pool old
> API by the appropriate function with the DMA pool API.
> 
> Signed-off-by: Romain Perier 
> Acked-by: Peter Senna Tschudin 
> Tested-by: Peter Senna Tschudin 
> 

Changes look ok, and I've tested them here as well.

Acked-by: Doug Ledford 
Tested-by: Doug Ledford 

-- 
Doug Ledford 
    GPG KeyID: B826A3330E572FDD
   
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD



Re: [PATCH v6 05/15] mlx4: Replace PCI pool old API

2017-03-25 Thread Doug Ledford
On Mon, 2017-03-20 at 08:32 +0200, Leon Romanovsky wrote:
> On Sun, Mar 19, 2017 at 06:03:54PM +0100, Romain Perier wrote:
> > 
> > The PCI pool API is deprecated. This commit replaces the PCI pool
> > old
> > API by the appropriate function with the DMA pool API.
> > 
> > Signed-off-by: Romain Perier 
> > Acked-by: Peter Senna Tschudin 
> > Tested-by: Peter Senna Tschudin 
> > ---
> >  drivers/net/ethernet/mellanox/mlx4/cmd.c  | 10 +-
> >  drivers/net/ethernet/mellanox/mlx4/mlx4.h |  2 +-
> >  2 files changed, 6 insertions(+), 6 deletions(-)
> > 
> 
> Thanks,
> Reviewed-by: Leon Romanovsky 

Changes look fine to me, and it has been tested on this specific
hardware and confirmed RDMA communications still work.

Acked-by: Doug Ledford 
Tested-by: Doug Ledford 

-- 
Doug Ledford 
    GPG KeyID: B826A3330E572FDD
   
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD



Re: [PATCH v2 12/15] RDMA/cma: make config_item_type const

2017-10-18 Thread Doug Ledford
On Mon, 2017-10-16 at 17:18 +0200, Bhumika Goyal wrote:
> Make these structures const as they are either passed to the
> functions
> having the argument as const or stored as a reference in the
> "ci_type"
> const field of a config_item structure.
> 
> Signed-off-by: Bhumika Goyal 

Acked-by: Doug Ledford 

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD



scsi_scan problem.

2001-03-14 Thread Doug Ledford


A bug report I was charged with fixing (qla2x00 driver doesn't see all luns or
sees multiple identical luns in different scenarios) was not a bug in the
qla2x00 driver.  The recent changes to allow max luns in the mid layer to be >
7 seems to have caused this problem.  However, the proper fix is a bit of a
quandry for me.  You see, I don't have any Nakamichi or Yamaha multi-cd
changers, or a DAT or DLT autoloader, or several of the different models of
raid chassis.  The bug is that we were detecting offline devices and linking
them into the device list.  But, some devices (at least the Clariion raid
chassis) report luns that don't currently have any device bound to them as
present but offline.  This meant if we truly scanned all luns then we got
something like 100+ devices on one ID from this chassis when only 1 might be
valid :-(  So, I've attached a patch that solves the problem here perfectly. 
But, I need people that have access to the above listed hardware to test it. 
Most specifically, I'm afraid that the CD changers or autoloaders will report
some of their luns as offline so we will skip them even though we want them
entered into the device list.  If that's not the case, and they list their
luns as all being connected, then this patch needs to go into the mainstream
kernel.

-- 

 Doug Ledford <[EMAIL PROTECTED]>  http://people.redhat.com/dledford
  Please check my web site for aic7xxx updates/answers before
  e-mailing me about problems

--- scsi_scan.c.saveWed Mar 14 20:58:21 2001
+++ scsi_scan.c Wed Mar 14 21:10:28 2001
@@ -557,6 +557,23 @@
}
 
/*
+* If we are offline and we are on a LUN != 0, then skip this entry.
+* If we are on a BLIST_FORCELUN device this will stop the scan at
+* the first offline LUN (typically the correct thing to do).  If
+* we are on a BLIST_SPARSELUN device then this won't stop the scan,
+* but it will keep us from having false entries in our device
+* array. DL
+*
+* NOTE: Need to test this to make sure it doesn't cause problems
+* with tape autoloaders, multidisc CD changers, and external
+* RAID chassis that might use sparse luns or multiluns... DL
+*/
+   if (lun != 0 && (scsi_result[0] >> 5) == 1) {
+   scsi_release_request(SRpnt);
+   return 0;
+   }
+
+   /*
 * Get any flags for this device.  
 */
bflags = get_device_flags (scsi_result);
@@ -776,11 +793,26 @@
 *
 * FIXME(eric) - perhaps this should be a kernel configurable?
 */
+   /*
if (*max_dev_lun < shpnt->max_lun)
*max_dev_lun = shpnt->max_lun;
elseif ((max_scsi_luns >> 1) >= *max_dev_lun)
*max_dev_lun += shpnt->max_lun;
else*max_dev_lun = max_scsi_luns;
+   */
+   /*
+* Blech...the above code is broken.  When you have a device
+* that is present, and it is a SPARSELUN device, then we
+* need to scan *all* the luns on that device.  Besides,
+* skipping the scanning of LUNs is a false optimization.
+* Scanning for a LUN on a present device is a very fast
+* operation, it's scanning for devices that don't exist that
+* is expensive and slow (although if you are truly scanning
+* through MAX_SCSI_LUNS devices that would be bad, I hope
+* all of the controllers out there set a reasonable value
+* in shpnt->max_lun).  DL
+*/
+   *max_dev_lun = shpnt->max_lun;
*sparse_lun = 1;
return 1;
}
@@ -795,11 +827,26 @@
 * I think we need REPORT LUNS in future to avoid scanning
 * of unused LUNs. But, that is another item.
 */
+   /*
if (*max_dev_lun < shpnt->max_lun)
*max_dev_lun = shpnt->max_lun;
elseif ((max_scsi_luns >> 1) >= *max_dev_lun)
*max_dev_lun += shpnt->max_lun;
else*max_dev_lun = max_scsi_luns;
+   */
+   /*
+* Blech...the above code is broken.  When you have a device
+* that is present, and it is a FORCELUN device, then we
+* need to scan *all* the luns on that device.  Besides,
+* skipping the scanning of LUNs is a false optimization.
+* Scanning for a LUN on a present device is a very fast
+* operation, it's scanning for devices that don't exist that
+  

Re: scsi_scan problem.

2001-03-14 Thread Doug Ledford

Pete Zaitcev wrote:
> 
> > Date: Wed, 14 Mar 2001 21:28:14 -0500
> > From: Doug Ledford <[EMAIL PROTECTED]>
> 
> > A bug report I was charged with fixing (qla2x00 driver doesn't see all luns or
> > sees multiple identical luns in different scenarios) was not a bug in the
> > qla2x00 driver.  [...]
> >  The bug is that we were detecting offline devices and linking
> > them into the device list.
> 
> Why is this a bug? What would happen when I telnet into the
> the RAID box and enable my volumes on those LUNs?

Then they would be legitimate devices and you would do:

echo "scsi-add-single-device a b c d" > /proc/scsi/scsi

to add them to the device chain.  Before then, they aren't anything (at least
not in the case of the Clariion array).

> >  But, some devices (at least the Clariion raid
> > chassis) report luns that don't currently have any device bound to them as
> > present but offline.  This meant if we truly scanned all luns then we got
> > something like 100+ devices on one ID from this chassis when only 1 might be
> > valid:-(
> 
> 16384 LUNs for Fibre Channel. As you see, scanning is out of the
> question. You must issue REPORT LUNs and fall back on scanning
> if the device reports a check condition. I did that when I worked
> in Sun Storage with A5000/A3500/T3 arrays couple of years ago.

Patches welcomed.  The one I sent already works on a fiber channel setup (the
qla2x00 in question is fc and so is the Clariion array it's connected to, no
detrimental side effects from scanning the box) and so I'm not inclined to add
a REPORT LUNs section to the code unless absolutely necessary.

-- 

 Doug Ledford <[EMAIL PROTECTED]>  http://people.redhat.com/dledford
  Please check my web site for aic7xxx updates/answers before
  e-mailing me about problems
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]



Re: scsi_scan problem.

2001-03-14 Thread Doug Ledford

Doug Ledford wrote:

> Patches welcomed.  The one I sent already works on a fiber channel setup (the
> qla2x00 in question is fc and so is the Clariion array it's connected to, no
> detrimental side effects from scanning the box) and so I'm not inclined to add
> a REPORT LUNs section to the code unless absolutely necessary.

Clarification, I think REPORT LUNS support is a 2.5 thing, not a stick it in
now thing.

-- 

 Doug Ledford <[EMAIL PROTECTED]>  http://people.redhat.com/dledford
  Please check my web site for aic7xxx updates/answers before
  e-mailing me about problems
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]



Re: scsi_scan problem.

2001-03-14 Thread Doug Ledford

Bob Frey wrote:
> 
> On Wed, Mar 14, 2001 at 09:35:43PM -0500, Pete Zaitcev wrote:
> > 16384 LUNs for Fibre Channel. As you see, scanning is out of the
> > question. You must issue REPORT LUNs and fall back on scanning
> > if the device reports a check condition. I did that when I worked
> Why wait for a check condition? There's an INQUIRY field bit that
> indicates whether REPORT LUNs is supported.

And I'm all for using it in the 2.5 kernel SCSI stack ;-)

-- 

 Doug Ledford <[EMAIL PROTECTED]>  http://people.redhat.com/dledford
  Please check my web site for aic7xxx updates/answers before
  e-mailing me about problems
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]



Re: scsi_scan problem.

2001-03-16 Thread Doug Ledford

"Rafael E. Herrera" wrote:
> 
> I applied the first hunk to version 2.4.3-pre4, as by email with Doug.
> The output for the scsi devices follows and is identical with and
> without the patch.

Thank you Rafael.  This is what I suspected.  I'm not sure when we starting
considering devices with a peripheral qualifier of 1 as being valid, but I
suspect it happened when the scsi_scan.c code was separated out of scsi.c.  In
any case, I'm pretty positive that it is the wrong thing to do.  This report
at least alleviates one of my fears about broken device possibilities and
starts to confirm my position.

> Maybe someone can explain the meaning of the illegal
> requests at the end. Nevertheless, I can use the drive fine.

As to the illegal request messages, I'm not sure what those are about ;-)

-- 

 Doug Ledford <[EMAIL PROTECTED]>  http://people.redhat.com/dledford
  Please check my web site for aic7xxx updates/answers before
  e-mailing me about problems
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]



Re: scsi_scan problem.

2001-03-16 Thread Doug Ledford

Patrick Mansfield wrote:

> If your testing Doug's patch, it might be a good idea to run with/without
> your adapter built as a module, as the kernel is inconsistent in its setting
> of "online" in scsi.c: it sets online TRUE after an attach in
> scsi_register_device_module(), but leaves online as is after an
> attach in scsi_register_host().

Grrr...I hate these kinds of inconsistencies.  They don't belong there. 
Whether a driver is a module or compiled in should not effect whether or not
an attached device is considered valid.

> So, if the scan_scsis set online FALSE, it sometimes is set back to TRUE;
> otherwise, I don't think any other code will set online to TRUE (once it
> is set to FALSE after its scanned, no one can even open the device, not even sg).

This is the case for what I was seeing (but, all of the offline entries used
up all 40 SCSI disk structs that were available for use, so if I brought
another controller online there would be no available disk slots).

> The online = TRUE should probably be removed from scsi_register_device_module(),
> as disks with peripheral qualifier 1 (like Doug's the Clariion storage)
> ususally complain when sent a READ CAPACITY. I got such errors when running with
> a Clariion DASS (DGC RAID/DISK, scsi attached disk array).
> 
> Doug - did you try running with/without your adapter built as a module? I'd
> expect you to get a READ CAPACITY failure for each LUN with PQ 1.

All modular, but none of the disks on mine ever got the error you are
mentioning.  Could possibly be because my root disk is on an aic7xxx and the
array on a qla2x00 that I did *not* let be included in the initrd so I could
bring it up after the rest of the boot was complete.  This, of course, implies
that sd_mod was loaded well before the qla2x00 and if I read your above
comments correctly, that means it scanned the array from someplace other than
scsi_register_device_module() and hence didn't get hit by the problem you are
referring to.  In fact, I would guess the problem you are referring to only
happens when a driver module is loaded prior to sd_mod.o, and that reversing
the order will solve the problem (but, I haven't looked so I could easily be
wrong ;-)


-- 

 Doug Ledford <[EMAIL PROTECTED]>  http://people.redhat.com/dledford
  Please check my web site for aic7xxx updates/answers before
  e-mailing me about problems
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]



Re: scsi_scan problem.

2001-03-16 Thread Doug Ledford

Ishikawa wrote:
> 
> Hi,
> 
> I have an "old" Nakamichi CD changer.
> ("old" might be important consideration here. )
> 
> Should I test the patch submitted and report what I found ?
> (Or maybe I don't have to bother at this stage at all
> and  simply wait for the 2.5 development and debugging cycle?)

It would still be helpful because this problem has to be fixed before 2.5. 
The only question is whether to fix it with a simple patch such as I just
submitted, or a more complex patch that uses REPORT LUNs.  Part of that answer
is how my simple patch works on your device.

-- 

 Doug Ledford <[EMAIL PROTECTED]>  http://people.redhat.com/dledford
  Please check my web site for aic7xxx updates/answers before
  e-mailing me about problems
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]



Re: scsi devices

2001-03-20 Thread Doug Ledford

Jeff Chua wrote:
> 
> Is there somewhere that list which scsi host adaptors work under linux?
> 
> Specifically, I'm interested in the following:
> 
> I-O DATA SC-NBD
> Logitec LHA-600U/A
> 
> These are "low profile" PCI boards. Don't think Adaptec makes low profiles
> SCSI cards. I've a book-pc that has only one low profile slot.

As a matter of fact, the aic7xxx driver (both the old and the new one) support
some Ultra160 low profile SCSI controllers that Adaptec makes.  I don't know
if they are available in non-OEM box sets, but OEMs can certainly get them.

-- 

 Doug Ledford <[EMAIL PROTECTED]>  http://people.redhat.com/dledford
  Please check my web site for aic7xxx updates/answers before
  e-mailing me about problems
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]



Re: 160MB/s drive detected as 80MB/s

2001-03-30 Thread Doug Ledford

Simon Garner wrote:
> 
> Hi,
> 
> I have a RH7 system which I have just compiled kernel-2.4.2 for. The box has
> an Asus CUV4X-D motherboard, and an Adaptec 29160 PCI SCSI controller. I'm
> using the AIC7xxx driver.
> 
> Everything works, except the hard drive is a Quantum Atlas 10K2, 160MB/sec,
> but is only detected as 80MB/sec, as follows:
> 
> # dmesg
> ...
> SCSI subsystem driver Revision: 1.00
> (scsi0)  found at PCI 0/11/0
> (scsi0) Wide Channel, SCSI ID=7, 32/255 SCBs
> (scsi0) Downloading sequencer code... 392 instructions downloaded
> scsi0 : Adaptec AHA274x/284x/294x (EISA/VLB/PCI-Fast SCSI) 5.2.1/5.2.0
>
> (scsi0:0:6:0) ***Synchronous at 80.0 Mbyte/sec***, offset 127.
>   Vendor: QUANTUM   Model: ATLAS10K2-TY184L  Rev: DA40
>   Type:   Direct-Access  ANSI SCSI revision: 03
> Detected scsi disk sda at scsi0, channel 0, id 6, lun 0
> SCSI device sda: 35860910 512-byte hdwr sectors (18361 MB)
> ...
> 
> (My asterisks.)
> 
> Is this wrong, and can it be fixed, or am I just misunderstanding?

The 5.2.4 aic7xxx driver fixes this (patches at my web site, also it is in the
ac series kernels) or you can use Justin Gibbs' new aic7xxx driver (which
never had this issue in the first place) which happens to be the new default
aic7xxx driver in the 2.4.3 kernel.

-- 

 Doug Ledford <[EMAIL PROTECTED]>  http://people.redhat.com/dledford
  Please check my web site for aic7xxx updates/answers before
  e-mailing me about problems
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]



Re: QUEUE_FULL + use_new_eh_code = bad_karma?

2001-04-11 Thread Doug Ledford

Eric Youngdale wrote:
> 
> Well hosts that don't use the new EH code don't handle QUEUE_FULL.  Not
> quite sure what would happen with it, actually.
> 
> If everything goes according to plan, when the device returns
> QUEUE_FULL, the io should get stuck back in the queue to be handled later.
> In addition a flag should be set to prevent further I/O from being sent to
> that device - that flag should be cleared when one of the existing I/O
> requests actually completes.
> 
> I am not completely sure how best to debug this.  I guess the first
> thing to do is to see if in fact the QUEUE_FULL is being handled as I
> described and the flag is getting set.   Then the next thing to look for is
> to see what happens when some other I/O finishes.

Please note that while another command completing is enough to trigger the
next command to be sent, there are cases when we can get a QUEUE_FULL with 0
commands active on the device, resulting in a hung drive if you don't
implement a timeout for the paused queue.

-- 

 Doug Ledford <[EMAIL PROTECTED]>  http://people.redhat.com/dledford
  Please check my web site for aic7xxx updates/answers before
  e-mailing me about problems
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]



Re: LUN skipping in v2.4.x?

2001-04-16 Thread Doug Ledford

"conway, heather" wrote:
> 
> Hi Folks,
> I'm working with a mixture of v2.2.x and v2.4.x hosts in an FC-SW
> environment that requires the host to be able to skip LUNs.  I've applied
> Eric's linux-2.4.1.diff and Doug's scsi241_ey2.diff patches on different
> v2.4.x hosts to address the issue of scanning past LUN 7 on v2.4.x.
> Has anyone addressed allowing a v2.4.x Linux host running to skip LUNs?  or
> to scan past a non-existent target 0 LUN 0?  If so, is there a patch
> available?
> Thanks for your help.

You need to add your device IDs to the scsi blacklist with the flag
BLIST_SPARSELUN to do what you are wanting (although, currently there is no
way around the fact that the linux SCSI subsystem assumes no device exists on
a particular target ID if LUN 0 doesn't exist, so it will skip non-exisitent
LUNs above 0, but if LUN 0 doesn't exist on that target ID, then nothing on
that target ID will get scanned).

-- 

 Doug Ledford <[EMAIL PROTECTED]>  http://people.redhat.com/dledford
  Please check my web site for aic7xxx updates/answers before
  e-mailing me about problems
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]



Re: Linux Cluster using shared scsi

2001-05-02 Thread Doug Ledford

"Eric Z. Ayers" wrote:
> 
> Doug Ledford writes:
> (James Bottomley commented about the need for SCSI reservation kernel patches)
>  >
>  > I agree.  It's something that needs fixed in general, your software needs it
>  > as well, and I've written (about 80% done at this point) some open source
>  > software geared towards getting/holding reservations that also requires the
>  > same kernel patches (plus one more to be fully functional, an ioctl to allow a
>  > SCSI reservation to do a forced reboot of a machine).  I'll be releasing that
>  > package in the short term (once I get back from my vacation anyway).
>  >
> 
> Hello Doug,
> 
> Does this package also tell the kernel to "re-establish" a
> reservation for all devices after a bus reset, or at least inform a
> user level program?  Finding out when there has been a bus reset has
> been a stumbling block for me.

It doesn't have to.  The kernel changes are minimal (basically James' SCSI
reset patch that he's been carrying around, the scsi reservation conflict
patch, and I need to write a third patch that makes the system optionally
reboot immediately on a reservation conflict and which is controlled by an
ioctl, but I haven't done that patch yet).  All of the rest is implemented in
user space via the /dev/sg entries.  As such, it doesn't have any more
information about bus resets than you do.  However, because of the policy
enacted in the code, it doesn't need to.  Furthermore, because there are so
many ways to loose a reservation silently, it's foolhardy to try and keep
reservation consistency any way other than something similar to what I outline
below.

The package is meant to be a sort of "scsi reservation" library.  The
application that uses the library is responsible for setting policy.  I wrote
a small, simple application that actually does a decent job of implementing
policy on the system.  The policy it does implement is simple:

If told to get a reservation, then attempt to get it.  If the attempt is
blocked by an existing reservation and we aren't suppossed to reset the drive,
then exit.  If it's blocked and we are suppossed to reset the drive, then send
a device reset, then wait 5 seconds, then try to get the reservation.  If we
again fail, then the other machine is still alive (as proven by the fact that
it re-established its reservation after the reset) and we exit, else we now
have the reservation.

If told to forcefully get a reservation, then attempt to get it.  If the
attempt fails, then reset the device and try again immediately (no 5 second
wait), if it fails again, then exit.

If told to hold a reservation, then resend your reservation request once every
2 seconds (this actually has very minimal CPU/BUS usage and isn't as big a
deal as requesting a reservation every 2 seconds might sound).  The first time
the reservation is refused, consider the reservation stolen by another machine
and exit (or optionally, reboot).

The package is meant to lock against itself (in other words, a malicious user
with write access to the /dev/sg entries could confuse this locking mechanism,
but it will work cooperatively with other copies of itself running on other
machines), the requirements for the locking to be safe are as follows:

1)  A machine is not allowed to mount or otherwise use a drive in any way
shape or form until it has successfully acquired a reservation.

2)  Once a machine has a reservation, it is not allowed to ever take any
action to break another machines reservation, so that if the reservation is
stolen, this machine is required to "gracefully" step away from the drive
(rebooting is the best way to accomplish this since even the act of unmounting
the drive will attempt to write to it).

3)  The timeouts in the program must be honored (resend your reservation, when
you hold it, every 2 seconds so that a passive attempt to steal the
reservation will see you are still alive within the 5 second timeout and leave
you be, which is a sort of heartbeat in and of itself).

Anyway, as I said in my previous email, it's about 80% complete.  It currently
is up and running on SCSI-2 LUN based reservations.  There is code to do
SCSI-2 and SCSI-3 extent based reservations but it hasn't been tested due to
lack of devices that support extent based reservations (my test bed is a
multipath FC setup, so I'm doing all my testing on FC drives over two FC
controllers in the same machine).  I've still got to add the SCSI-3 Persistent
Reservation code to the library (again, I'm lacking test drives for this
scenario).  The library itself requires that the program treat all
reservations as extent/persistent reservations and it silently falls back to
LUN reservations when neither of those two are available.  My simple program
that goes with the applica

Re: Linux Cluster using shared scsi

2001-05-02 Thread Doug Ledford

Max TenEyck Woodbury wrote:
> 
> Doug Ledford wrote:
> >
> > ...
> >
> > If told to hold a reservation, then resend your reservation request once every
> > 2 seconds (this actually has very minimal CPU/BUS usage and isn't as big a
> > deal as requesting a reservation every 2 seconds might sound).  The first time
> > the reservation is refused, consider the reservation stolen by another machine
> > and exit (or optionally, reboot).
> 
> Umm. Reboot? What do you think this is? Windoze?

It's the *only* way to guarantee that the drive is never touched by more than
one machine at a time (notice, I've not been talking about a shared use drive,
only one machine in the cluster "owns" the drive at a time, and it isn't for
single transactions that it owns the drive, it owns the drive for as long as
it is alive, this is a limitation of the filesystes currently available in
mainstream kernels).  The reservation conflict and subsequent reboot also
*only* happens when a reservation has been forcefully stolen from a machine. 
In that case, you are talking about a machine that has been failed over
against its will, and the absolute safest thing to do in order to make sure
the failed over machine doesn't screw the whole cluster up, is to make it
reboot itself and re-enter the cluster as a new failover slave node instead of
as a master node.  If you want a shared common access device with write
locking semantics, as you seem to be suggesting later on, then you need a
different method of locking than what I put in this, I knew that as I wrote it
and that was intentional.

> Really, You can NOT do clustering well if you don't have a consistent locking
> mechanism. The use of a hardware locking method like 'reservation' may be a
> good way to avoid race conditions, but it should be backed up by the
> appropriate exchange of messages to make sure everybody has the same view of
> the system. For example, you might use it like this:
> 
> 1. Examine the lock list for conflicts. If a conflict is found, the lock
>request fails.
> 
> 2. Reserve the device with the lock on it. If the reservation fails, delay
>a short amount of time and return to 1.
> 
> 3. Update the lock list for the device.
> 
> 4. When the list update is complete, release the reservation.
> 
> In other words, the reservation acts as a spin-lock to make sure updates
> occur atomically.

Apples to oranges, as I described above.  This is for a failover cluster, not
a shared data, load balancing cluster.

-- 

 Doug Ledford <[EMAIL PROTECTED]>  http://people.redhat.com/dledford
  Please check my web site for aic7xxx updates/answers before
  e-mailing me about problems
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]



Re: Linux Cluster using shared scsi

2001-05-02 Thread Doug Ledford

Mike Anderson wrote:
> 
> Doug,
> 
> A question on clarification.
> 
> Is the configuration you are testing have both FC adapters going to the same
> port of the storage device (mutli-path) or to different ports of the storage
> device (mulit-port)?
> 
> The reason I ask is that I thought if you are using SCSI-2 reserves that the
> reserve was on a per initiator basis. How does one know which path has the
> reserve?

Reservations are global in nature in that a reservation with a device will
block access to that device from all other initiators, including across
different ports on multiport devices (or else they are broken and need a
firmware update).

> On a side note. I thought the GFS project had up leveled there locking / fencing
> into a API called a locking harness to support different kinds of fencing
> methods. Any thoughts if this capability could be plugged into this service so
> that users could reduce recoding depending on which fencing support they
> selected.

I wouldn't know about that.

-- 

 Doug Ledford <[EMAIL PROTECTED]>  http://people.redhat.com/dledford
  Please check my web site for aic7xxx updates/answers before
  e-mailing me about problems
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]



Re: Linux Cluster using shared scsi

2001-05-01 Thread Doug Ledford

James Bottomley wrote:
> 
> [EMAIL PROTECTED] said:
> > So, will Linux ever support the scsi reservation mechanism as standard?
> 
> That's not within my gift.  I can merely write the code that corrects the
> behaviour.  I can't force anyone else to accept it.

I think it will be standard before not too much longer (I hope anyway, I'm
tired of carrying the patches forward all the time so I'll lend my support to
getting it into the mainstream kernel ;-)

> [EMAIL PROTECTED] said:
> > Isn't there a standard that says if you scsi reserve a disk, no one
> > else should be able to access this disk, or is this a "steeleye/
> > Compaq" standard.
> 
> Use of reservations is laid out in the SCSI-2 and SCSI-3 standards (which can
> be downloaded from the T10 site www.t10.org) which are international in scope.
>  I think the implementation issues come because the reservations part is
> really only relevant to a multi-initiator clustered environment which isn't an
> every day configuration for most Linux users.  Obviously, as Linux moves into
> the SAN arena this type of configuration will become a lot more common, at
> which time the various problems associated with multiple initiators should
> rise in prominence.

I agree.  It's something that needs fixed in general, your software needs it
as well, and I've written (about 80% done at this point) some open source
software geared towards getting/holding reservations that also requires the
same kernel patches (plus one more to be fully functional, an ioctl to allow a
SCSI reservation to do a forced reboot of a machine).  I'll be releasing that
package in the short term (once I get back from my vacation anyway).

-- 

 Doug Ledford <[EMAIL PROTECTED]>  http://people.redhat.com/dledford
  Please check my web site for aic7xxx updates/answers before
  e-mailing me about problems
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]



Re: Linux Cluster using shared scsi

2001-05-02 Thread Doug Ledford

Max TenEyck Woodbury wrote:
> 
> Doug Ledford wrote:
> >
> > Max TenEyck Woodbury wrote:
> >>
> >> Umm. Reboot? What do you think this is? Windoze?
> >
> > It's the *only* way to guarantee that the drive is never touched by more
> > than one machine at a time (notice, I've not been talking about a shared
> > use drive, only one machine in the cluster "owns" the drive at a time,
> > and it isn't for single transactions that it owns the drive, it owns
> > the drive for as long as it is alive, this is a limitation of the
> > filesystes currently available in mainstream kernels).  The reservation
> > conflict and subsequent reboot also *only* happens when a reservation
> > has been forcefully stolen from a machine. In that case, you are talking
> > about a machine that has been failed over against its will, and the
> > absolute safest thing to do in order to make sure the failed over machine
> > doesn't screw the whole cluster up, is to make it reboot itself and
> > re-enter the cluster as a new failover slave node instead of as a master
> > node.  If you want a shared common access device with write locking
> > semantics, as you seem to be suggesting later on, then you need a
> > different method of locking than what I put in this, I knew that as I
> > wrote it and that was intentional.
> 
> That was partly meant to be a joke, but it was also meant to make you stop
> and think about what you are doing. From what little context I read, you
> seem to be looking for a high availability solution. Rebooting a system,
> even if there is a hot backup, should only be used as a last resort.

This is something that only happens when a machine has been forcefully failed
over against its will.  I guess you would need to see the code to tell what
I'm talking about, but in the description I gave of the code, if it doesn't
get a reservation, it exits.  The way the code is intended to be used is
something like this:

Given machine A as cluster master and machine B as a cluster slave.  Machine A
starts the reservation program with something like this as the command line:

reserve --reserve --hold /dev/sdc

This will result in the program grabbing a reservation on drive sdc (or
exiting with a non-0 status on failure) and then sitting in a loop where it
re-issues the reservation every 2 seconds.

Under normal operation, the reserve program is not started at all on machine
B.  However, machine B does use the normal heartbeat method (be that the
heartbeat package or something similar, but not reservations) to check that
machine A is still alive.  Given a failure in the communications between
machine B and machine A, which would typically mean it is time to fail over
the cluster, machine B can test the status of machine A by throwing a reset to
the drive to break any existing reservations, waiting 4 seconds, then trying
to run it's own reservation.  This can be accomplished with the command:

reserve --reset --reserve --hold /dev/sdc

If the program fails to get the reservation then that means machine A was able
to resend it's reservation.  Obviously then, machine A isn't dead.  Machine B
can then decide that the heartbeat link is dead but machine A is still fine
and not try any further failover actions, or it could decide that machine A
has a working reserve program but key services or network connectivity may be
dead, in which case a forced failover would be in order.  To accomplish that,
machine B can issue this command:

reserve --preempt --hold /dev/sdc

This will break machine A's reservation and take the drive over from machine
A.  It's at this point, and this point only, that machine A will see a
reservation conflict.  It has been forcefully failed over, so
resetting/rebooting the machine is a perfectly fine alternative (and the
reason it is recommended is because at this point in time, machine B may
already be engaged in recovering the filesystem on the shared drive, and
machine A may still have buffers it is trying to flush to the same drive, so
in order to make sure machine A doesn't let some dirty buffer get through a
break in machine B's reservation caused by something as inane as another
machine on the bus starting up and throwing an initial reset, we should reset
machine A *as soon as we know it has been forcefully failed over and is no
longer allowed to write to the drive*).  Arguments with this can be directed
to Stephen Tweedie, who is largely responsible for beating me into doing it
this way ;-)


> Another problem is that reservations do *not* guarantee ownership over
> the long haul. There are too many mechanisms that break reservations to
> build a complete strategy on them.

See above about the reason for needing to reset the machine ;-)  The overall
package is cooperativ

Re: Linux Cluster using shared scsi

2001-05-02 Thread Doug Ledford

Mike Anderson wrote:
> 
> Doug,
> 
> I guess I worded my question poorly. My question was around multi-path
> devices in combination with SCSI-2 reserve vs SCSI-3 persistent reserve which
> has not always been easy, but is more difficult is you use a name space that
> can slip or can have multiple entries for the same physical device you want
> to reserve.

The software is independant on each machine, so it is entirely possible that
the same disk will be in two different name spaces one two different machines
and everything work just fine.  For example, maybe it's /dev/sdc on machine A
and /dev/sdf on machine B.  That's fine, you simply tell the software on
machine A to grab /dev/sdc and tell the software on machine B to grab /dev/sdf
and all will work properly.  Now, as to mixing SCSI-2 and SCSI-3 Persistent
Reservations on the same drive, not a chance.  The software will automatically
use the best alternative available, so it won't fall back to SCSI-2 LUN
reservations with SCSI-3 Persistent Reservations available (and if you force
it to do so, then you have no one to blame but yourself ;-)

> But here is a second try.
> 
> If this is a failover cluster then node A will need to reserve all disks in
> shareable space using sg or only a subset if node A has sync'd his sd name
> space with the other node and they both wish to do work in disjoint pools of
> disks.
> 
> In the scenario of grabbing all the disks. If sda and sdb are the same device
> than I can only reserve one of them and ensure IO only goes down through the
> one I reserver-ed otherwise I could get a reservation conflict.

Correct, if you hold a reservation on a device for which you have multiple
paths, you have to use the correct path.

> This goes
> along with your previous patch on supporting multi-path at "md" and translating this 
>into the proper device to reserve.

The md multipath driver doesn't currently allow the proper ioctls for us to do
reservations at the md level.  We could only do them by going in and doing
reservations on /dev/sg entries behind the back of the md layer, which would
be risky at best.

> I guess it is up to the caller of
> your service to handle this case correct??

For now, yes.  And the best method to do so is to configure your failover
software to know that a device is a multipath device, only attempt to reserve
or mount one path, and fail back on the second path if the first path goes
away by issuing a bus reset on the secondary path, then reserving the
secondary path, then mounting the secondary path.  However, as you will have
lost data due to the failed writes on the primary path, I think this is of
dubious value.  Right now, as I see it, multipath and failover simply don't
mix well.  There is more work needed to make it work well.

> If this not any clearer than my last mail I will just wait to see the code
> :-).
> 
> Thanks,
> 
> -Mike
> 
> Doug Ledford [[EMAIL PROTECTED]] wrote:
> >
> >
> >
> > To:   Mike Anderson <[EMAIL PROTECTED]>
> > cc:   [EMAIL PROTECTED], James Bottomley
> >   <[EMAIL PROTECTED]>, "Roets, Chris"
> >   <[EMAIL PROTECTED]>, [EMAIL PROTECTED],
> >   [EMAIL PROTECTED]
> >
> >
> >
> >
> >
> > Mike Anderson wrote:
> > >
> > > Doug,
> > >
> > > A question on clarification.
> > >
> > > Is the configuration you are testing have both FC adapters going to the
> > same
> > > port of the storage device (mutli-path) or to different ports of the
> > storage
> > > device (mulit-port)?
> > >
> > > The reason I ask is that I thought if you are using SCSI-2 reserves that
> > the
> > > reserve was on a per initiator basis. How does one know which path has
> > the
> > > reserve?
> >
> > Reservations are global in nature in that a reservation with a device will
> > block access to that device from all other initiators, including across
> > different ports on multiport devices (or else they are broken and need a
> > firmware update).
> >
> > > On a side note. I thought the GFS project had up leveled there locking /
> > fencing
> > > into a API called a locking harness to support different kinds of fencing
> > > methods. Any thoughts if this capability could be plugged into this
> > service so
> > > that users could reduce recoding depending on which fencing support they
> > > selected.
> >
> > I wouldn't know about that.
> >
> > --
> >
> >  Doug Ledford <[EMAIL PROTECTED]>  http://people.redhat.com/dledford
> >   Please check my web site for aic7xxx updates/answers before
> >   e-mailing me about problems
> 
> --
> Michael Anderson
> [EMAIL PROTECTED]
> 
> IBM Linux Technology Center - Storage IO
> Phone (503) 578-4466
> Tie Line: 775-4466

-- 

 Doug Ledford <[EMAIL PROTECTED]>  http://people.redhat.com/dledford
  Please check my web site for aic7xxx updates/answers before
  e-mailing me about problems
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]



Re: Stop rewind in MT command?

2001-05-24 Thread Doug Ledford

Alan Dayley wrote:
> 
> I am using mt-st v. 0.5b.  At the end of EVERY MT command, a rewind is
> issued.  What good it a seek to end of data (seod) if it is immeadiately
> rewound before another command can be issued?
> 
> Is there an option or setting somewhere to stop the rewind command at the
> end of MT execution?

man st

FILES
   /dev/st*  : the auto-rewind SCSI tape devices
   /dev/nst* : the non-rewind SCSI tape devices

This is true of all tape devices.  The ones that are preceeded with an 'n' are
the non-rewind version, and the ones that aren't automatically rewind when the
tape device is closed.  So, the problem isn't with mt, it's doing exactly what
it is suppossed to do.  The problem is with the device you are accessing the
tape through.

-- 

 Doug Ledford <[EMAIL PROTECTED]>  http://people.redhat.com/dledford
  Please check my web site for aic7xxx updates/answers before
  e-mailing me about problems
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]



Re: question on io_request_lock

2001-07-19 Thread Doug Ledford

"MEHTA,HIREN (A-SanJose,ex1)" wrote:
> 
> Hi List,
> 
> If I decide that I do not want to depend on io_request_lock
> to maintain the consistency of the data structures in my driver,
> then do I ever need to acquire this lock and release it in the driver ?
> e.g. Do I need to acquire io_request_lock before I call the done() routine
> and release it after done() returns ?

Yes, the done() function needs to be wrapped (this isn't so much because it
actually needs it as it is that you are calling mid layer code and you need to
adhere to what it tells you locking semantics are, which in this case is
"always hold the io_request_lock because that's how I keep myself sane").  You
are free to use your own internal spin locks in your code if you wish.  For
certain of your code entry points, you may wish to release the io_request_lock
and then re-grab it before returning.  For example, at the beginning of your
queue routine you should release the io_request_lock and then regrab it before
returning if you want to use your own internal locking and not use the
io_request_lock.  The queue, abort, and reset routines are called with the
io_request_lock held when using the old error handling methods.  I don't know
which entry points are called with the lock held under the new error handling
code, but I would suspect the answer is "all of them".


-- 

 Doug Ledford <[EMAIL PROTECTED]>  http://people.redhat.com/dledford
  Please check my web site for aic7xxx updates/answers before
  e-mailing me about problems
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]



Re: question on io_request_lock

2001-07-19 Thread Doug Ledford

"MEHTA,HIREN (A-SanJose,ex1)" wrote:
> 
> Well, I guess the problem is that : scsi layer does not know whether the
> hba driver will do its own locking and release io_request_lock or not.
> Before calling any entry point, scsi layer is anyway acquires
> io_request_lock.
> Also, if you are depending on the io_request_lock, then in the interrupt
> handler you are supposed to acquired the io_request_lock and
> release it before returning. So, when the hba driver calls the done routine,
> the io_request_lock is already locked and doing it again in the done() will
> freeze the system.

Exactly.  However, if you use the new error handling code, then you *can* get
away with not grabbing the io_request_lock for the done function (I believe,
Eric will correct me if I'm wrong) because it uses a different spin lock to
control the queue it puts the commands into for the bottom half handler.  As
such, it doesn't really do anything that needs the io_request_lock.  The old
error handling code is a whole different ball of wax.  Call that done function
without holding the io_request_lock and you are bound to blow the SCSI layer
to hell.

> -hiren
> 
> -Original Message-
> From: Martin Peschke3 [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, July 19, 2001 12:23 PM
> To: Doug Ledford
> Cc: MEHTA,HIREN (A-SanJose,ex1); '[EMAIL PROTECTED]'
> Subject: Re: question on io_request_lock
> 
> > Yes, the done() function needs to be wrapped (this isn't so much because
> it
> > actually needs it as it is that you are calling mid layer code and you
> need to
> > adhere to what it tells you locking semantics are, which in this case is
> > "always hold the io_request_lock because that's how I keep myself sane").
> 
> Are you sure about that?
> I think, there are low-level drivers which don't follow this rule.
> Why does not the scsi_done function(s) itself get the io_request_lock
> first before doing any work if it really needs this?
> 
> Mit freundlichen Grüßen / with kind regards
> 
> Martin Peschke
> 
> IBM Deutschland Entwicklung GmbH
> GNU/Linux for S/390 and zSeries Development
> Phone: +49-(0)7031-16-2349
> 
> Doug Ledford <[EMAIL PROTECTED]> on 07/19/2001 06:48:45 PM
> 
> Please respond to Doug Ledford <[EMAIL PROTECTED]>
> 
> To:   "MEHTA,HIREN (A-SanJose,ex1)" <[EMAIL PROTECTED]>
> cc:   "'[EMAIL PROTECTED]'" <[EMAIL PROTECTED]>
> Subject:  Re: question on io_request_lock
> 
> "MEHTA,HIREN (A-SanJose,ex1)" wrote:
> >
> > Hi List,
> >
> > If I decide that I do not want to depend on io_request_lock
> > to maintain the consistency of the data structures in my driver,
> > then do I ever need to acquire this lock and release it in the driver ?
> > e.g. Do I need to acquire io_request_lock before I call the done()
> routine
> > and release it after done() returns ?
> 
> Yes, the done() function needs to be wrapped (this isn't so much because it
> actually needs it as it is that you are calling mid layer code and you need
> to
> adhere to what it tells you locking semantics are, which in this case is
> "always hold the io_request_lock because that's how I keep myself sane").
> You
> are free to use your own internal spin locks in your code if you wish.  For
> certain of your code entry points, you may wish to release the
> io_request_lock
> and then re-grab it before returning.  For example, at the beginning of
> your
> queue routine you should release the io_request_lock and then regrab it
> before
> returning if you want to use your own internal locking and not use the
> io_request_lock.  The queue, abort, and reset routines are called with the
> io_request_lock held when using the old error handling methods.  I don't
> know
> which entry points are called with the lock held under the new error
> handling
> code, but I would suspect the answer is "all of them".
> 
> --
> 
>  Doug Ledford <[EMAIL PROTECTED]>  http://people.redhat.com/dledford
>   Please check my web site for aic7xxx updates/answers before
>   e-mailing me about problems
> -
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to [EMAIL PROTECTED]

-- 

 Doug Ledford <[EMAIL PROTECTED]>  http://people.redhat.com/dledford
  Please check my web site for aic7xxx updates/answers before
  e-mailing me about problems
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]



[Patch] QLogic qla2x00 driver fixes

2005-02-24 Thread Doug Ledford
Don't use cmd->request->nr_hw_segments as it may not be initialized
(SG_IO in particular bypasses anything that initializes this and just
uses scsi_do_req to insert a scsi_request directly on the head of the
queue) and a bogus value here can trip up the checks to make sure that
the number of segments will fit in the queue ring buffer, resulting in
commands that are never completed.

Fix up several issues with PCI DMA mapping and failure to check return
values on the mappings.

Make the check for space in the ring buffer happen after the DMA mapping
is done since any checks done before the mapping has taken place are
bogus.

-- 
Doug Ledford <[EMAIL PROTECTED]>
http://people.redhat.com/dledford
= qla_iocb.c 1.15 vs edited =
--- 1.15/drivers/scsi/qla2xxx/qla_iocb.c	2004-12-09 01:12:03 -05:00
+++ edited/qla_iocb.c	2005-02-24 16:25:55 -05:00
@@ -216,18 +216,7 @@ void qla2x00_build_scsi_iocbs_32(srb_t *
 			cur_seg++;
 		}
 	} else {
-		dma_addr_t	req_dma;
-		struct page	*page;
-		unsigned long	offset;
-
-		page = virt_to_page(cmd->request_buffer);
-		offset = ((unsigned long)cmd->request_buffer & ~PAGE_MASK);
-		req_dma = pci_map_page(ha->pdev, page, offset,
-		cmd->request_bufflen, cmd->sc_data_direction);
-
-		sp->dma_handle = req_dma;
-
-		*cur_dsd++ = cpu_to_le32(req_dma);
+		*cur_dsd++ = cpu_to_le32(sp->dma_handle);
 		*cur_dsd++ = cpu_to_le32(cmd->request_bufflen);
 	}
 }
@@ -299,19 +288,8 @@ void qla2x00_build_scsi_iocbs_64(srb_t *
 			cur_seg++;
 		}
 	} else {
-		dma_addr_t	req_dma;
-		struct page	*page;
-		unsigned long	offset;
-
-		page = virt_to_page(cmd->request_buffer);
-		offset = ((unsigned long)cmd->request_buffer & ~PAGE_MASK);
-		req_dma = pci_map_page(ha->pdev, page, offset,
-		cmd->request_bufflen, cmd->sc_data_direction);
-
-		sp->dma_handle = req_dma;
-
-		*cur_dsd++ = cpu_to_le32(LSD(req_dma));
-		*cur_dsd++ = cpu_to_le32(MSD(req_dma));
+		*cur_dsd++ = cpu_to_le32(LSD(sp->dma_handle));
+		*cur_dsd++ = cpu_to_le32(MSD(sp->dma_handle));
 		*cur_dsd++ = cpu_to_le32(cmd->request_bufflen);
 	}
 }
@@ -344,6 +322,8 @@ qla2x00_start_scsi(srb_t *sp)
 
 	/* Setup device pointers. */
 	ret = 0;
+	/* So we know we haven't pci_map'ed anything yet */
+	tot_dsds = 0;
 	fclun = sp->lun_queue->fclun;
 	ha = fclun->fcport->ha;
 	reg = ha->iobase;
@@ -372,8 +352,32 @@ qla2x00_start_scsi(srb_t *sp)
 	if (index == MAX_OUTSTANDING_COMMANDS)
 		goto queuing_error;
 
+	/* Map the sg table so we have an accurate count of sg entries needed */
+	if (cmd->use_sg) {
+		sg = (struct scatterlist *) cmd->request_buffer;
+		tot_dsds = pci_map_sg(ha->pdev, sg, cmd->use_sg,
+		cmd->sc_data_direction);
+		if (tot_dsds == 0)
+			goto queuing_error;
+	} else if (cmd->request_bufflen) {
+		dma_addr_t	req_dma;
+		struct page	*page;
+		unsigned long	offset;
+
+		page = virt_to_page(cmd->request_buffer);
+		offset = ((unsigned long)cmd->request_buffer & ~PAGE_MASK);
+		req_dma = pci_map_page(ha->pdev, page, offset,
+		cmd->request_bufflen, cmd->sc_data_direction);
+
+		if (dma_mapping_error(req_dma))
+			goto queuing_error;
+
+		sp->dma_handle = req_dma;
+		tot_dsds = 1;
+	}
+
 	/* Calculate the number of request entries needed. */
-	req_cnt = (ha->calc_request_entries)(cmd->request->nr_hw_segments);
+	req_cnt = (ha->calc_request_entries)(tot_dsds);
 	if (ha->req_q_cnt < (req_cnt + 2)) {
 		cnt = RD_REG_WORD_RELAXED(ISP_REQ_Q_OUT(ha, reg));
 		if (ha->req_ring_index < cnt)
@@ -385,19 +389,6 @@ qla2x00_start_scsi(srb_t *sp)
 	if (ha->req_q_cnt < (req_cnt + 2))
 		goto queuing_error;
 
-	/* Finally, we have enough space, now perform mappings. */
-	tot_dsds = 0;
-	if (cmd->use_sg) {
-		sg = (struct scatterlist *) cmd->request_buffer;
-		tot_dsds = pci_map_sg(ha->pdev, sg, cmd->use_sg,
-		cmd->sc_data_direction);
-		if (tot_dsds == 0)
-			goto queuing_error;
-	} else if (cmd->request_bufflen) {
-	tot_dsds++;
-	}
-	req_cnt = (ha->calc_request_entries)(tot_dsds);
-
 	/* Build command packet */
 	ha->current_outstanding_cmd = handle;
 	ha->outstanding_cmds[handle] = sp;
@@ -479,6 +470,14 @@ qla2x00_start_scsi(srb_t *sp)
 	return (QLA_SUCCESS);
 
 queuing_error:
+	if (cmd->use_sg && tot_dsds) {
+		sg = (struct scatterlist *) cmd->request_buffer;
+		pci_unmap_sg(ha->pdev, sg, cmd->use_sg,
+		cmd->sc_data_direction);
+	} else if (tot_dsds) {
+		pci_unmap_page(ha->pdev, sp->dma_handle, cmd->request_bufflen,
+			   cmd->sc_data_direction);
+	}
 	spin_unlock_irqrestore(&ha->hardware_lock, flags);
 
 	return (QLA_FUNCTION_FAILED);


Re: [Patch] QLogic qla2x00 driver fixes

2005-02-25 Thread Doug Ledford
On Fri, 2005-02-25 at 03:38 -0500, Jeff Garzik wrote:
> Arjan van de Ven wrote:
> > On Thu, 2005-02-24 at 23:21 -0500, Doug Ledford wrote:
> > 
> >>Don't use cmd->request->nr_hw_segments as it may not be initialized
> >>(SG_IO in particular bypasses anything that initializes this and just
> >>uses scsi_do_req to insert a scsi_request directly on the head of the
> >>queue) 
> > 
> > 
> > should we fix that in the SG_IO layer ?
> 
> Possibly/probably.

I'm not concerned with it personally.  The only reason that the scsi
layer copies the block layer request struct into the scsi
command/request is so that upon completion it has enough information to
mark blocks as either up to date or not while at the same time allowing
the scsi layer to free the original block request at queue time, not at
completion time.  It was never intended to be used by low level drivers.
And since the scsi layer has several ways of bypassing the block layer
to inject commands directly to devices, trying to catch all those
injection points and make them emulate block layer activity when we
already provide everything a low level driver needs in the scsi command
struct is silly.  Just use what's available, reliable, always
initialized, and intended for you to use (cmd->use_sg) instead of trying
to use a roughly duplicate item from a different abstraction layer that
isn't always involved in commands sent to a low level driver (request-
>nr_hw_segments).

> For SCSI drivers specifically, using nr_hw_segments is pointless unless 
> cmd->use_sg goes away.  At which point tons of SCSI drivers want 
> changing anyway.
> 
>   n_sg = dma_map_sg(..., cmd->use_sg, ...)
> 
> will always do the right thing, when the cmd contains a scatterlist.
> 
> Deviate from the norm and pay the price, really...

Aye, yup.

-- 
Doug Ledford <[EMAIL PROTECTED]>
http://people.redhat.com/dledford


-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Patch] QLogic qla2x00 driver fixes

2005-02-28 Thread Doug Ledford
On Fri, 2005-02-25 at 09:02 -0800, Patrick Mansfield wrote:
> On Fri, Feb 25, 2005 at 06:57:39AM -0500, Doug Ledford wrote:
> > On Fri, 2005-02-25 at 03:38 -0500, Jeff Garzik wrote:
> > > Arjan van de Ven wrote:
> > > > On Thu, 2005-02-24 at 23:21 -0500, Doug Ledford wrote:
> > > > 
> > > >>Don't use cmd->request->nr_hw_segments as it may not be initialized
> > > >>(SG_IO in particular bypasses anything that initializes this and just
> > > >>uses scsi_do_req to insert a scsi_request directly on the head of the
> > > >>queue) 
> > > > 
> > > > 
> > > > should we fix that in the SG_IO layer ?
> > > 
> > > Possibly/probably.
> 
> Doug,
> 
> What kernel did you hit this with? 

Our RHEL4 kernel (2.6.9 plus bunches-o'-patches)

> And same question as Doug G: is it via sg (not the block SG_IO)? sg uses
> scsi_do_req(), block SG_IO doesn't.

At the moment, I'm sorting that out.  It's a bit of a black box.  We
have two different loops of commands running on my test machine.  There
are 10 instances of:

while true; do scsiinfo -i /dev/sdf; done 

which sends an INQUIRY request to the target device.  I'm running into a
problem getting an strace from this command, so I'm tell you more when I
can.

The other 10 loops are:

while true; do scsi_id -g -s /block/sdf -p 0x80; done

> Jens sent changes last August or so that fixed SG_IO (not sg) to always
> set nr_hw_segments, change should be in 2.6.10. It is not obvious that his
> change fixed this, I can't find the changeset or log.

Finding out if this is fixed is only greatly complicated by the fact
that there exists a drivers/block/scsi_ioctl.c and
drivers/scsi/scsi_ioctl.c and they both are compiled into the kernel, so
knowing which one is actually being used, codepath wise, can be
confusing.  Then on top of that there's drivers/scsi/sg.c with it's
SG_IO path.  It's not necessarily an easy issue to sort out without that
strace and for some reason strace is locking on the test box I'm
running.

(insert delay as we get the strace issue resolved)

OK, the program that's causing the problem, from what I can tell, is
opening /dev/sdf and then issuing a SCSI_IOCTL_SEND_COMMAND request.
That means we go to scsi/sd.c:sd_ioctl to
block/scsi_ioctl.c:scsi_cmd_ioctl to block/scsi_ioctl.c:sg_scsi_ioctl
and it's on that path that we aren't always getting an initialized
nr_hw_segments.

/me votes for consolidating some of this ioctl mess if at all possible.

-- 
Doug Ledford <[EMAIL PROTECTED]>
http://people.redhat.com/dledford

-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html