RE: Possible bug in scsi_lib.c:scsi_req_map_sg()

2007-03-04 Thread James Bottomley
On Sun, 2007-03-04 at 00:36 -0700, Dachepalli, Sudhir wrote:
> Our driver gets called in with the following fashion through the
> queuecommand.
> 
> scsi_request_fn() -> scsi_dispatch_cmd() -> rtn =
> host->hostt->queuecommand(cmd, scsi_done);
> 
> We are using the "cmd" (scsi_cmnd) as a pass-through without touching
> the "request_buffer" and "request_bufflen".
> 
> We do not allocate memory similar to sg or st for page allocations.
> 
> The request_buffer should already contain the scatter-gather list that was built.

We're trying to avoid wrappering pass-throughs in SCSI commands (unless
they're defined by a standard, like the ATA ones).  However, I'm not entirely sure
what you're trying to do ... can you post a link to the driver so that
we can see if there's a better way?

Thanks,

James




Re: Possible bug in scsi_lib.c:scsi_req_map_sg()

2007-03-04 Thread Mike Christie
James Bottomley wrote:
> On Sun, 2007-03-04 at 00:36 -0700, Dachepalli, Sudhir wrote:
>> Our driver gets called in with the following fashion through the
>> queuecommand.
>>
>> scsi_request_fn() -> scsi_dispatch_cmd() -> rtn =
>> host->hostt->queuecommand(cmd, scsi_done);
>>
>> We are using the "cmd" (scsi_cmnd) as a pass-through without touching
>> the "request_buffer" and "request_bufflen".
>>
>> We do not allocate memory similar to sg or st for page allocations.
>>
>> The request_buffer should already contain the scatter-gather list that was built.
> 
> We're trying to avoid wrappering pass-throughs in SCSI commands (unless
> they're defined by a standard, like the ATA ones).  However, I'm not entirely sure
> what you're trying to do ... can you post a link to the driver so that
> we can see if there's a better way?
> 

They are trying to do a SCSI-level multipath driver for RDAC support,
something similar to request-based multipath (routing requests instead
of bios, except in their case they are routing scsi_cmnds instead of
requests).


Re: Possible bug in scsi_lib.c:scsi_req_map_sg()

2007-03-04 Thread James Bottomley
On Sun, 2007-03-04 at 09:43 -0600, Mike Christie wrote:
> They are trying to do a SCSI-level multipath driver for RDAC support,
> something similar to request-based multipath (routing requests instead
> of bios, except in their case they are routing scsi_cmnds instead of
> requests).

I guessed it might be something like that ... 

The SCSI driver layer wasn't designed to stack like that ... hence the
problems of setup and teardown of the sg list.  Wouldn't it just be
easier to make request-based multipath work?  Unfortunately, you can't
avoid the request system because what you want to do is resubmit an I/O
without having the mid-layer execute the prepare function (which is
where it's mapped) so you need the REQ_DONTPREP flag.

James




Re: Possible bug in scsi_lib.c:scsi_req_map_sg()

2007-03-04 Thread Mike Christie
James Bottomley wrote:
> On Sun, 2007-03-04 at 09:43 -0600, Mike Christie wrote:
>> They are trying to do a SCSI-level multipath driver for RDAC support,
>> something similar to request-based multipath (routing requests instead
>> of bios, except in their case they are routing scsi_cmnds instead of
>> requests).
> 
> I guessed it might be something like that ... 
> 
> The SCSI driver layer wasn't designed to stack like that ... hence the
> problems of setup and teardown of the sg list.  Wouldn't it just be
> easier to make request-based multipath work?  Unfortunately, you can't
> avoid the request system because what you want to do is resubmit an I/O
> without having the mid-layer execute the prepare function (which is
> where it's mapped) so you need the REQ_DONTPREP flag.
> 

I think they get around this and other request settings that need
resetting by using scsi_execute_async. They will take the command, data
direction and buffer fields from the original scsi_cmnd, then pass those
on to scsi_execute_async, which allocates a new request; then, as you
know, that new request gets sent to the scsi layer and looks like a
brand new request. So I misspoke above. It might be better to say they
are using it for routing what the other multipath layers would call
cloned commands.
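
For illustration, the reuse Mike describes might look roughly like the
sketch below against the 2.6.20-era API. This is not from any real
driver: the wrapper name resend_via_execute_async() and the retry count
are made up, and error handling is omitted.

#include <scsi/scsi_cmnd.h>
#include <scsi/scsi_device.h>

/* Hypothetical helper: reissue a received scsi_cmnd on another device
 * via scsi_execute_async(), reusing the original command's fields. */
static int resend_via_execute_async(struct scsi_cmnd *cmd,
                                    struct scsi_device *new_sdev,
                                    void (*done)(void *, char *, int, int))
{
        return scsi_execute_async(new_sdev, cmd->cmnd, cmd->cmd_len,
                                  cmd->sc_data_direction,
                                  cmd->request_buffer,   /* sg list or buf */
                                  cmd->request_bufflen,
                                  cmd->use_sg,           /* sg entry count */
                                  cmd->timeout_per_command,
                                  5 /* retries */,
                                  cmd, done, GFP_ATOMIC);
}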


Re: Possible bug in scsi_lib.c:scsi_req_map_sg()

2007-03-04 Thread James Bottomley
On Sun, 2007-03-04 at 10:21 -0600, Mike Christie wrote:
> I think they get around this and other request settings that need
> resetting by using scsi_execute_async. They will take the command, data
> direction and buffer fields from the original scsi_cmnd, then pass those
> on to scsi_execute_async, which allocates a new request; then, as you
> know, that new request gets sent to the scsi layer and looks like a
> brand new request. So I misspoke above. It might be better to say they
> are using it for routing what the other multipath layers would call
> cloned commands.

But actually, that's what I don't think they want to do.
scsi_execute_async() will try to map the sg list.  What they probably
get is a fully set up SCSI command which needs to be cloned to a new
device, so the request sg setup has already been done.  So what you need
to do is something like get a request for the new device, clone the
relevant parameters, set the flags to REQ_DONTPREP and inject it into
the correct queue.  You probably also need to clone the scsi_cmnd as
well.  Finally you need to chain the completions of the old and new
commands.  This should avoid having to set up and then tear down the
requests again, while not using some kludgy mechanism like special commands.
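
Purely as a sketch of those steps against the 2.6.20-era block API (the
helpers clone_cmnd_fields() and clone_done() are hypothetical, and none
of this has been tested against a real driver):

#include <linux/blkdev.h>
#include <scsi/scsi_cmnd.h>
#include <scsi/scsi_device.h>

static int route_to_new_device(struct scsi_cmnd *orig,
                               struct scsi_device *new_sdev)
{
        struct request_queue *q = new_sdev->request_queue;
        struct scsi_cmnd *clone;
        struct request *rq;

        /* get a request for the new device */
        rq = blk_get_request(q, rq_data_dir(orig->request), GFP_ATOMIC);
        if (!rq)
                return -ENOMEM;

        /* clone the scsi_cmnd and the relevant parameters */
        clone = scsi_get_command(new_sdev, GFP_ATOMIC);
        if (!clone) {
                blk_put_request(rq);
                return -ENOMEM;
        }
        clone_cmnd_fields(orig, clone); /* hypothetical: CDB, direction,
                                         * the already built sg list */

        rq->cmd_flags |= REQ_DONTPREP;  /* skip the prepare (mapping) step */
        rq->special = clone;            /* hand the cmnd to the mid-layer */
        rq->end_io_data = orig;         /* for chaining the completions */

        /* inject into the correct queue; clone_done() (hypothetical)
         * would chain the clone's completion to the original command */
        blk_execute_rq_nowait(q, NULL, rq, 1, clone_done);
        return 0;
}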

However, obviously, this is speculation about the operation of an unseen
driver, which is not incredibly profitable ...

James




Re: Possible bug in scsi_lib.c:scsi_req_map_sg()

2007-03-04 Thread Mike Christie
James Bottomley wrote:
> On Sun, 2007-03-04 at 10:21 -0600, Mike Christie wrote:
>> I think they get around this and other request settings that need
>> resetting by using scsi_execute_async. They will take the command, data
>> direction and buffer fields from the original scsi_cmnd, then pass those
>> on to scsi_execute_async, which allocates a new request; then, as you
>> know, that new request gets sent to the scsi layer and looks like a
>> brand new request. So I misspoke above. It might be better to say they
>> are using it for routing what the other multipath layers would call
>> cloned commands.
> 
> But actually, that's what I don't think they want to do.

Yeah, you are right. Proper cloning is the better way to go.


Re: Possible bug in scsi_lib.c:scsi_req_map_sg()

2007-03-04 Thread Mike Christie
Mike Christie wrote:
> James Bottomley wrote:
>> On Sun, 2007-03-04 at 10:21 -0600, Mike Christie wrote:
>>> I think they get around this and other request settings that need
>>> resetting by using scsi_execute_async. They will take the command, data
>>> direction and buffer fields from the original scsi_cmnd, then pass those
>>> on to scsi_execute_async, which allocates a new request; then, as you
>>> know, that new request gets sent to the scsi layer and looks like a
>>> brand new request. So I misspoke above. It might be better to say they
>>> are using it for routing what the other multipath layers would call
>>> cloned commands.
>> But actually, that's what I don't think they want to do.
> 
> Yeah, you are right. Proper cloning is the better way to go.

I meant routing of commands and possibly cloning like you described, not
just cloning.


RE: Possible bug in scsi_lib.c:scsi_req_map_sg()

2007-03-04 Thread Dachepalli, Sudhir
Do you think the following could work if I used blk layer functions
instead of "scsi_execute_async":

blk_get_request()
req->flags |= REQ_DONTPREP
blk_rq_map_kern()
blk_execute_rq_nowait()
blk_put_request()

Regards,
Sudhir

-Original Message-
From: Mike Christie [mailto:[EMAIL PROTECTED] 
Sent: Sunday, March 04, 2007 11:07 AM
To: James Bottomley
Cc: Dachepalli, Sudhir; Benny Halevy; Jens Axboe; Boaz Harrosh;
linux-scsi@vger.kernel.org
Subject: Re: Possible bug in scsi_lib.c:scsi_req_map_sg()

Mike Christie wrote:
> James Bottomley wrote:
>> On Sun, 2007-03-04 at 10:21 -0600, Mike Christie wrote:
>>> I think they get around this and other request settings that need 
>>> resetting by using scsi_execute_async. They will take the command, 
>>> data direction and buffer fields from the original scsi_cmnd, then 
>>> pass those on to scsi_execute_async, which allocates a new request; 
>>> then, as you know, that new request gets sent to the scsi 
>>> layer and looks like a brand new request. So I misspoke above. It 
>>> might be better to say they are using it for routing what the other 
>>> multipath layers would call cloned commands.
>> But actually, that's what I don't think they want to do.
> 
> Yeah, you are right. Proper cloning is the better way to go.

I meant routing of commands and possibly cloning like you described, not
just cloning.


RE: Possible bug in scsi_lib.c:scsi_req_map_sg()

2007-03-04 Thread James Bottomley
On Sun, 2007-03-04 at 11:00 -0700, Dachepalli, Sudhir wrote:
> Do you think the following could work if I used blk layer functions
> instead of "scsi_execute_async":
> 
> blk_get_request()
> req->flags |= REQ_DONTPREP

There's additional complexity here:  if the request isn't prepared, no
command is allocated.  You can't use the one you have because it belongs
to a different device (plus it would be freed by this request in the
completion path).

> blk_rq_map_kern()
> blk_execute_rq_nowait()

If the request is already mapped, you don't want to be mapping it again.

> blk_put_request()

Without seeing the code it's hard to say definitively whether all this
is correct.

James




convert sg to block layer helpers - v5

2007-03-04 Thread michaelc
There are no big changes between v4 and v5. I was able to fix
things in scsi tgt, so I could remove the weird arguments
the block helpers were taking for it. I also tried to break
up the patchset for easier viewing. The final patch also
takes care of the access_ok regression.

These patches were made against Linus's tree since Tomo needed
me to break part of it out for his scsi tgt bug fix patches.

0001-rm-bio-hacks-in-scsi-tgt.txt - Drop scsi tgt's bio_map_user
usage and convert it to blk_rq_map_user. Tomo is also sending
this patch in his patchset since he needs it for his bug fixes.

0002-rm-block-device-arg-from-bio-map-user.txt - The block_device
argument is never used in the bio map user functions, so this
patch drops it.

0003-Support-large-sg-io-segments.txt - Modify the bio functions
to allocate multiple pages at once instead of a single page.

0004-Add-reserve-buffer-for-sg-io.txt - Add reserve buffer support
to the block layer for sg and st indirect IO use.

0005-Add-sg-io-mmap-helper.txt - Add some block layer helpers for
sg mmap support.

0006-Convert-sg-to-block-layer-helpers.txt - Convert sg to block
layer helpers.

0007-mv-user-buffer-copy-access_ok-test-to-block-helper.txt - 
Move user data buffer access_ok tests to block layer helpers.

The goal of this patchset is to remove scsi_execute_async and
reduce code duplication.

People want to discuss further merging sg and bsg/scsi_ioctl
functionality, but I did not handle any of that in this
patchset since people still disagree on what should be supported
by future interfaces.

My only TODO is maybe making the bio reserve buffer mempoolable
(making it work as mempool alloc and free functions). Since
sg only supported one reserve buffer per fd, I have not worked
on it; it did not seem worth it if there are no users.
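
For what it's worth, the mempool hookup that TODO describes might look
something like the sketch below. bio_alloc_reserve_buf() and
bio_free_reserve_buf() are stand-ins here and do not match the
patchset's actual signatures, which this posting does not fully show:

#include <linux/blkdev.h>
#include <linux/mempool.h>

static void *rbuf_pool_alloc(gfp_t gfp_mask, void *pool_data)
{
        struct request_queue *q = pool_data;

        return bio_alloc_reserve_buf(q, 128 * 1024);  /* assumed helper */
}

static void rbuf_pool_free(void *element, void *pool_data)
{
        bio_free_reserve_buf(element);                /* assumed helper */
}

static mempool_t *create_rbuf_pool(struct request_queue *q)
{
        /* guarantee e.g. two reserve buffers are always available */
        return mempool_create(2, rbuf_pool_alloc, rbuf_pool_free, q);
}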




[PATCH 1/7] rm bio hacks in scsi tgt

2007-03-04 Thread michaelc
From: Mike Christie <[EMAIL PROTECTED]>

scsi tgt breaks up a command into multiple scatterlists
if we cannot fit all the data in one. This was because
the block rq helpers did not support large requests and
because we can get a command of any old size, so it is
hard to preallocate pages for a scatterlist large enough
(we cannot really preallocate pages with the bio map
user path). In 2.6.20, we added large request support to
the block layer helper, blk_rq_map_user. And at LSF,
we talked about increasing SCSI_MAX_PHYS_SEGMENTS for
scsi tgt if we want to support really really :) large
(greater than 256 * PAGE_SIZE in the worst mapping case)
requests.

The only target currently implemented does not even support
the multiple scatterlists stuff and only supports smaller
requests, so this patch just converts scsi tgt to use
blk_rq_map_user.
Signed-off-by: Mike Christie <[EMAIL PROTECTED]>
---
 drivers/scsi/scsi_tgt_lib.c |  133 +++
 include/scsi/scsi_cmnd.h|3 -
 2 files changed, 34 insertions(+), 102 deletions(-)

diff --git a/drivers/scsi/scsi_tgt_lib.c b/drivers/scsi/scsi_tgt_lib.c
index d402aff..47c29a9 100644
--- a/drivers/scsi/scsi_tgt_lib.c
+++ b/drivers/scsi/scsi_tgt_lib.c
@@ -28,7 +28,6 @@ #include 
 #include 
 #include 
 #include 
-#include <../drivers/md/dm-bio-list.h>
 
 #include "scsi_tgt_priv.h"
 
@@ -42,9 +41,8 @@ static struct kmem_cache *scsi_tgt_cmd_c
 struct scsi_tgt_cmd {
/* TODO replace work with James b's code */
struct work_struct work;
-   /* TODO replace the lists with a large bio */
-   struct bio_list xfer_done_list;
-   struct bio_list xfer_list;
+   /* TODO fix limits of some drivers */
+   struct bio *bio;
 
struct list_head hash_list;
struct request *rq;
@@ -93,7 +91,12 @@ struct scsi_cmnd *scsi_host_get_command(
if (!tcmd)
goto put_dev;
 
-   rq = blk_get_request(shost->uspace_req_q, write, gfp_mask);
+   /*
+* The blk helpers are used to READ/WRITE requests
+* transferring data from an initiator's point of view. Since
+* we are in target mode we want the opposite.
+*/
+   rq = blk_get_request(shost->uspace_req_q, !write, gfp_mask);
if (!rq)
goto free_tcmd;
 
@@ -111,8 +114,6 @@ struct scsi_cmnd *scsi_host_get_command(
rq->cmd_flags |= REQ_TYPE_BLOCK_PC;
rq->end_io_data = tcmd;
 
-   bio_list_init(&tcmd->xfer_list);
-   bio_list_init(&tcmd->xfer_done_list);
tcmd->rq = rq;
 
return cmd;
@@ -157,22 +158,6 @@ void scsi_host_put_command(struct Scsi_H
 }
 EXPORT_SYMBOL_GPL(scsi_host_put_command);
 
-static void scsi_unmap_user_pages(struct scsi_tgt_cmd *tcmd)
-{
-   struct bio *bio;
-
-   /* must call bio_endio in case bio was bounced */
-   while ((bio = bio_list_pop(&tcmd->xfer_done_list))) {
-   bio_endio(bio, bio->bi_size, 0);
-   bio_unmap_user(bio);
-   }
-
-   while ((bio = bio_list_pop(&tcmd->xfer_list))) {
-   bio_endio(bio, bio->bi_size, 0);
-   bio_unmap_user(bio);
-   }
-}
-
 static void cmd_hashlist_del(struct scsi_cmnd *cmd)
 {
struct request_queue *q = cmd->request->q;
@@ -185,6 +170,11 @@ static void cmd_hashlist_del(struct scsi
spin_unlock_irqrestore(&qdata->cmd_hash_lock, flags);
 }
 
+static void scsi_unmap_user_pages(struct scsi_tgt_cmd *tcmd)
+{
+   blk_rq_unmap_user(tcmd->bio);
+}
+
 static void scsi_tgt_cmd_destroy(struct work_struct *work)
 {
struct scsi_tgt_cmd *tcmd =
@@ -193,16 +183,6 @@ static void scsi_tgt_cmd_destroy(struct 
 
dprintk("cmd %p %d %lu\n", cmd, cmd->sc_data_direction,
rq_data_dir(cmd->request));
-   /*
-* We fix rq->cmd_flags here since when we told bio_map_user
-* to write vm for WRITE commands, blk_rq_bio_prep set
-* rq_data_dir the flags to READ.
-*/
-   if (cmd->sc_data_direction == DMA_TO_DEVICE)
-   cmd->request->cmd_flags |= REQ_RW;
-   else
-   cmd->request->cmd_flags &= ~REQ_RW;
-
scsi_unmap_user_pages(tcmd);
scsi_host_put_command(scsi_tgt_cmd_to_host(cmd), cmd);
 }
@@ -215,6 +195,7 @@ static void init_scsi_tgt_cmd(struct req
struct list_head *head;
 
tcmd->tag = tag;
+   tcmd->bio = NULL;
INIT_WORK(&tcmd->work, scsi_tgt_cmd_destroy);
spin_lock_irqsave(&qdata->cmd_hash_lock, flags);
head = &qdata->cmd_hash[cmd_hashfn(tag)];
@@ -419,52 +400,33 @@ static int scsi_map_user_pages(struct sc
struct request *rq = cmd->request;
void *uaddr = tcmd->buffer;
unsigned int len = tcmd->bufflen;
-   struct bio *bio;
int err;
 
-   while (len > 0) {
-   dprintk("%lx %u\n", (unsigned long) uaddr, len);
-   bio = bio_map_user(q, NULL, (unsigned long) uaddr, len, rw);
-   if (IS_ERR(bio)) {
-

[PATCH 2/7] rm block device arg from bio map user

2007-03-04 Thread michaelc
From: Mike Christie <[EMAIL PROTECTED]>

Everyone is passing in NULL, so let's just
make it a little simpler and drop the
block device argument from the bio mapping
functions.
Signed-off-by: Mike Christie <[EMAIL PROTECTED]>
---
 block/ll_rw_blk.c   |4 ++--
 fs/bio.c|   17 ++---
 include/linux/bio.h |5 ++---
 3 files changed, 10 insertions(+), 16 deletions(-)

diff --git a/block/ll_rw_blk.c b/block/ll_rw_blk.c
index 38c293b..7a108d5 100644
--- a/block/ll_rw_blk.c
+++ b/block/ll_rw_blk.c
@@ -2343,7 +2343,7 @@ static int __blk_rq_map_user(request_que
 */
uaddr = (unsigned long) ubuf;
if (!(uaddr & queue_dma_alignment(q)) && !(len & 
queue_dma_alignment(q)))
-   bio = bio_map_user(q, NULL, uaddr, len, reading);
+   bio = bio_map_user(q, uaddr, len, reading);
else
bio = bio_copy_user(q, uaddr, len, reading);
 
@@ -2479,7 +2479,7 @@ int blk_rq_map_user_iov(request_queue_t 
/* we don't allow misaligned data like bio_map_user() does.  If the
 * user is using sg, they're expected to know the alignment constraints
 * and respect them accordingly */
-   bio = bio_map_user_iov(q, NULL, iov, iov_count, rq_data_dir(rq)== READ);
+   bio = bio_map_user_iov(q, iov, iov_count, rq_data_dir(rq)== READ);
if (IS_ERR(bio))
return PTR_ERR(bio);
 
diff --git a/fs/bio.c b/fs/bio.c
index 7618bcb..8ae7223 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -601,7 +601,6 @@ out_bmd:
 }
 
 static struct bio *__bio_map_user_iov(request_queue_t *q,
- struct block_device *bdev,
  struct sg_iovec *iov, int iov_count,
  int write_to_vm)
 {
@@ -694,7 +693,6 @@ static struct bio *__bio_map_user_iov(re
if (!write_to_vm)
bio->bi_rw |= (1 << BIO_RW);
 
-   bio->bi_bdev = bdev;
bio->bi_flags |= (1 << BIO_USER_MAPPED);
return bio;
 
@@ -713,7 +711,6 @@ static struct bio *__bio_map_user_iov(re
 /**
  * bio_map_user-   map user address into bio
  * @q: the request_queue_t for the bio
- * @bdev: destination block device
  * @uaddr: start of user address
  * @len: length in bytes
  * @write_to_vm: bool indicating writing to pages or not
@@ -721,21 +718,20 @@ static struct bio *__bio_map_user_iov(re
  * Map the user space address into a bio suitable for io to a block
  * device. Returns an error pointer in case of error.
  */
-struct bio *bio_map_user(request_queue_t *q, struct block_device *bdev,
-unsigned long uaddr, unsigned int len, int write_to_vm)
+struct bio *bio_map_user(request_queue_t *q, unsigned long uaddr,
+unsigned int len, int write_to_vm)
 {
struct sg_iovec iov;
 
iov.iov_base = (void __user *)uaddr;
iov.iov_len = len;
 
-   return bio_map_user_iov(q, bdev, &iov, 1, write_to_vm);
+   return bio_map_user_iov(q, &iov, 1, write_to_vm);
 }
 
 /**
  * bio_map_user_iov - map user sg_iovec table into bio
  * @q: the request_queue_t for the bio
- * @bdev: destination block device
  * @iov:   the iovec.
  * @iov_count: number of elements in the iovec
  * @write_to_vm: bool indicating writing to pages or not
@@ -743,13 +739,12 @@ struct bio *bio_map_user(request_queue_t
  * Map the user space address into a bio suitable for io to a block
  * device. Returns an error pointer in case of error.
  */
-struct bio *bio_map_user_iov(request_queue_t *q, struct block_device *bdev,
-struct sg_iovec *iov, int iov_count,
-int write_to_vm)
+struct bio *bio_map_user_iov(request_queue_t *q, struct sg_iovec *iov,
+int iov_count, int write_to_vm)
 {
struct bio *bio;
 
-   bio = __bio_map_user_iov(q, bdev, iov, iov_count, write_to_vm);
+   bio = __bio_map_user_iov(q, iov, iov_count, write_to_vm);
 
if (IS_ERR(bio))
return bio;
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 08daf32..cfb6a7d 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -298,11 +298,10 @@ extern int bio_add_page(struct bio *, st
 extern int bio_add_pc_page(struct request_queue *, struct bio *, struct page *,
   unsigned int, unsigned int);
 extern int bio_get_nr_vecs(struct block_device *);
-extern struct bio *bio_map_user(struct request_queue *, struct block_device *,
-   unsigned long, unsigned int, int);
+extern struct bio *bio_map_user(struct request_queue *, unsigned long,
+   unsigned int, int);
 struct sg_iovec;
 extern struct bio *bio_map_user_iov(struct request_queue *,
-   struct block_device *,
struct sg_iovec *, int, int);
 extern void bio_unm

[PATCH 5/7] Add sg io mmap helper

2007-03-04 Thread michaelc
From: Mike Christie <[EMAIL PROTECTED]>

sg.c supports mmap, so this patch just moves the code
to the block layer for others to share and converts it
to use the bio reserve buffer.

The helpers are:
- blk_rq_mmap - does some checks to make sure the reserve buf is
large enough.
- blk_rq_vma_nopage - traverses the reserve buffer and does get_page()

To set up and tear down the request and bio reserve buffer mappings for
the sg mmap operation, you call blk_rq_setup_buffer() and
blk_rq_destroy_buffer().
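
To make the division of labor concrete, an sg-like driver would wire the
helpers up roughly as below (a sketch, not code from the patch;
get_fd_rbuf() is a hypothetical lookup of the per-fd reserve buffer):

#include <linux/blkdev.h>
#include <linux/mm.h>

static int sg_like_mmap(struct file *filp, struct vm_area_struct *vma)
{
        struct bio_reserve_buf *rbuf = get_fd_rbuf(filp); /* hypothetical */
        int err;

        err = blk_rq_mmap(rbuf, vma);   /* size checks, sets VM_RESERVED */
        if (!err)
                vma->vm_private_data = rbuf;
        return err;
}

static struct page *sg_like_vma_nopage(struct vm_area_struct *vma,
                                       unsigned long addr, int *type)
{
        /* walks the reserve buffer's sg list and get_page()s the page */
        return blk_rq_vma_nopage(vma->vm_private_data, vma, addr, type);
}
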
Signed-off-by: Mike Christie <[EMAIL PROTECTED]>
---
 block/ll_rw_blk.c  |   68 
 include/linux/blkdev.h |4 +++
 2 files changed, 72 insertions(+), 0 deletions(-)

diff --git a/block/ll_rw_blk.c b/block/ll_rw_blk.c
index 4d6c2bd..35b66ed 100644
--- a/block/ll_rw_blk.c
+++ b/block/ll_rw_blk.c
@@ -2431,6 +2431,74 @@ unmap_rq:
 EXPORT_SYMBOL(blk_rq_setup_buffer);
 
 /**
+ * blk_rq_mmap - alloc and setup buffers for REQ_BLOCK_PC mmap
+ * @rbuf:  reserve buffer
+ * @vma:   vm struct
+ *
+ * Description:
+ *The caller must also call blk_rq_setup_buffer on the request and
+ *blk_rq_destroy_buffer() must be issued at the end of io.
+ *It's the caller's responsibility to make sure this happens. The
+ *original bio must be passed back in to blk_rq_destroy_buffer() for
+ *proper unmapping.
+ *
+ *The block layer mmap functions implement the old sg.c behavior
+ *where there can be only one sg mmap command outstanding.
+ */
+int blk_rq_mmap(struct bio_reserve_buf *rbuf, struct vm_area_struct *vma)
+{
+   unsigned long len;
+
+   if (vma->vm_pgoff)
+   return -EINVAL; /* want no offset */
+
+   if (!rbuf)
+   return -ENOMEM;
+
+   len = vma->vm_end - vma->vm_start;
+   if (len > rbuf->buf_size)
+   return -ENOMEM;
+
+   vma->vm_flags |= VM_RESERVED;
+   return 0;
+}
+EXPORT_SYMBOL(blk_rq_mmap);
+
+struct page *blk_rq_vma_nopage(struct bio_reserve_buf *rbuf,
+  struct vm_area_struct *vma, unsigned long addr,
+  int *type)
+{
+   struct page *pg = NOPAGE_SIGBUS;
+   unsigned long offset, bytes = 0, sg_offset;
+   struct scatterlist *sg;
+   int i;
+
+   if (!rbuf)
+   return pg;
+
+   offset = addr - vma->vm_start;
+   if (offset >= rbuf->buf_size)
+   return pg;
+
+   for (i = 0; i < rbuf->sg_count; i++) {
+   sg = &rbuf->sg[i];
+
+   bytes += sg->length;
+   if (bytes > offset) {
+   sg_offset = sg->length - (bytes - offset);
+   pg = &sg->page[sg_offset >> PAGE_SHIFT];
+   get_page(pg);
+   break;
+   }
+   }
+
+   if (type)
+   *type = VM_FAULT_MINOR;
+   return pg;
+}
+EXPORT_SYMBOL(blk_rq_vma_nopage);
+
+/**
  * blk_rq_map_user - map user data to a request.
  * @q: request queue where request should be inserted
  * @rq:request structure to fill
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 755f0b4..04c1b09 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -670,6 +670,10 @@ extern void blk_sync_queue(struct reques
 extern void __blk_stop_queue(request_queue_t *q);
 extern void blk_run_queue(request_queue_t *);
 extern void blk_start_queueing(request_queue_t *);
+extern struct page *blk_rq_vma_nopage(struct bio_reserve_buf *,
+ struct vm_area_struct *, unsigned long,
+ int *);
+extern int blk_rq_mmap(struct bio_reserve_buf *, struct vm_area_struct *);
 extern int blk_rq_init_transfer(request_queue_t *, struct request *, void 
__user *, unsigned long);
 extern int blk_rq_map_user(request_queue_t *, struct request *,
   void __user *, unsigned long);
-- 
1.4.1.1



[PATCH 4/7] Add reserve buffer for sg io

2007-03-04 Thread michaelc
From: Mike Christie <[EMAIL PROTECTED]>

sg and st use a reserve buffer so that they can always
guarantee that they can execute IO of a certain size,
which is larger than the worst case guess.

This patch adds a bio_reserve_buf structure, which holds multiple
segments that can be mapped into BIOs. This replaces sg's reserve
buffer code, and can be used for tape (I think we need some reserve buffer
growing code for that, but that should not be too difficult to add).
It can also be used for scsi_tgt, so we guarantee a certain IO size will
always be executable.
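
For orientation while reading the diff below, these are the fields of
the new structure that the postings actually show (reconstructed from
the free path here and the mmap helpers in patch 5; exact types and any
other members are not visible in this mail):

struct bio_reserve_buf {
        struct scatterlist *sg;   /* the preallocated segments */
        int sg_count;             /* number of entries in sg */
        unsigned buf_size;        /* total bytes guaranteed */
        /* ... */
};
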
Signed-off-by: Mike Christie <[EMAIL PROTECTED]>
---
 block/ll_rw_blk.c  |   15 ++-
 fs/bio.c   |  211 ++--
 include/linux/bio.h|   20 -
 include/linux/blkdev.h |5 +
 4 files changed, 234 insertions(+), 17 deletions(-)

diff --git a/block/ll_rw_blk.c b/block/ll_rw_blk.c
index c9d765b..4d6c2bd 100644
--- a/block/ll_rw_blk.c
+++ b/block/ll_rw_blk.c
@@ -2347,12 +2347,13 @@ EXPORT_SYMBOL(blk_rq_destroy_buffer);
  * @rq:request structure to fill
  * @ubuf:  the user buffer (optional)
  * @len:   length of buffer
+ * @rbuf:  reserve buf to use
  *
  * Description:
  *The caller must call blk_rq_destroy_buffer when the IO is completed.
  */
 int blk_rq_setup_buffer(struct request *rq, void __user *ubuf,
-   unsigned long len)
+   unsigned long len, struct bio_reserve_buf *rbuf)
 {
struct request_queue *q = rq->q;
unsigned long bytes_read = 0;
@@ -2383,7 +2384,7 @@ int blk_rq_setup_buffer(struct request *
 
bio = bio_map_user(q, uaddr, map_len, reading);
} else
-   bio = bio_setup_user_buffer(q, map_len, reading);
+   bio = bio_setup_user_buffer(q, map_len, reading, rbuf);
if (IS_ERR(bio)) {
ret = PTR_ERR(bio);
goto unmap_rq;
@@ -2450,7 +2451,7 @@ EXPORT_SYMBOL(blk_rq_setup_buffer);
 int blk_rq_map_user(request_queue_t *q, struct request *rq,
 void __user *ubuf, unsigned long len)
 {
-   return blk_rq_setup_buffer(rq, ubuf, len);
+   return blk_rq_setup_buffer(rq, ubuf, len, NULL);
 }
 EXPORT_SYMBOL(blk_rq_map_user);
 
@@ -2522,6 +2523,7 @@ continue_from_bvec:
  * @iov:   sg iovec
  * @iov_count: number of elements in the iovec
  * @len:   max length of data (length of buffer)
+ * @rbuf:  reserve buffer
  *
  * Description:
  *This function is for REQ_BLOCK_PC usage.
@@ -2534,11 +2536,12 @@ continue_from_bvec:
  *proper unmapping.
  */
 int blk_rq_copy_user_iov(struct request *rq, struct sg_iovec *iov,
-int iov_count, unsigned long len)
+int iov_count, unsigned long len,
+struct bio_reserve_buf *rbuf)
 {
int ret;
 
-   ret = blk_rq_setup_buffer(rq, NULL, len);
+   ret = blk_rq_setup_buffer(rq, NULL, len, rbuf);
if (ret)
return ret;
 
@@ -2607,7 +2610,7 @@ int blk_rq_init_transfer(request_queue_t
iov.iov_base = ubuf;
iov.iov_len = len;
 
-   ret = blk_rq_copy_user_iov(rq, &iov, 1, len);
+   ret = blk_rq_copy_user_iov(rq, &iov, 1, len, NULL);
}
return ret;
 }
diff --git a/fs/bio.c b/fs/bio.c
index 2fff42a..75a3495 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -458,6 +458,7 @@ struct bio_map_vec {
 };
 
 struct bio_map_data {
+   struct bio_reserve_buf *rbuf;
struct bio_map_vec *iovecs;
int nr_vecs;
 };
@@ -485,8 +486,7 @@ static struct bio_map_data *bio_alloc_ma
 
 /*
  * This is only a esitmation. Drivers, like MD/DM RAID could have strange
- * boundaries not expressed in a q limit, so we do not know the real
- * limit until we add the page to the bio.
+ * boundaries not expressed in a q limit.
  *
  * This should only be used by bio helpers, because we cut off the max
  * segment size at BIO_MAX_SIZE. There is hw that can do larger segments,
@@ -505,6 +505,7 @@ static unsigned int bio_estimate_max_seg
return bytes;
 }
 
+/* This should only be used by block layer helpers */
 static struct page *bio_alloc_pages(struct request_queue *q, unsigned int len,
int *ret_order)
 {
@@ -530,10 +531,175 @@ static struct page *bio_alloc_pages(stru
return pages;
 }
 
+static void free_reserve_buf(struct bio_reserve_buf *rbuf)
+{
+   struct scatterlist *sg;
+   int i;
+
+   for (i = 0; i < rbuf->sg_count; i++) {
+   sg = &rbuf->sg[i];
+   if (sg->page)
+   __free_pages(sg->page, get_order(sg->length));
+   }
+
+   kfree(rbuf->sg);
+   kfree(rbuf);
+}
+
+/**
+ * bio_free_reserve_buf - free reserve buffer
+ * @q: the request queue for the device
+ *
+ * It is the responsibility of the caller to make sure it is
+ * no lo

[PATCH 3/7] Support large sg io segments

2007-03-04 Thread michaelc
From: Mike Christie <[EMAIL PROTECTED]>

sg.c and st allocate large chunks of clustered pages
to try and make really large requests. The block
layer sg io code only allocates a page at a time,
so we can end up with lots of unclustered pages
and smaller requests.

This patch modifies the block layer to allocate large
segments like st and sg so they can use those functions.

This patch also renames the blk_rq* helpers to clarify
what they are doing:

Previously, we did blk_rq_map_user() to map or copy data to a buffer, then
called blk_rq_unmap_user to unmap or copy back data. sg and st want finer
control over when to use DIO vs indirect IO, and for sg mmap we want to
use the code that sets up a bio buffer, which is also used by indirect IO.

Now, if the caller does not care how we transfer data, they can call
blk_rq_init_transfer() to set up the buffers (this does what blk_rq_map_user()
did before, where it would try DIO first then fall back to indirect IO)
and then call blk_rq_complete_transfer() when the IO is done (this
does what blk_rq_unmap_user did before). block/scsi_ioctl.c, cdrom,
and bsg use these functions.

If the caller wants to try to do DIO, they can call blk_rq_map_user()
to set up the buffer. When the IO is done, you can then call
blk_rq_destroy_buffer(). You could also call blk_rq_complete_transfer(),
which is just a smart wrapper.

To do indirect IO, we now have blk_rq_copy_user_iov(). When that IO
is done, you then call blk_rq_uncopy_user_iov().
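
As a usage sketch of the renamed entry points (hand-written for this
description, not taken from the patch; blk_rq_complete_transfer()'s
exact argument list is not shown in this posting, so its call is left
as a comment):

#include <linux/blkdev.h>

static int transfer_example(request_queue_t *q, struct request *rq,
                            void __user *ubuf, unsigned long len)
{
        int err;

        /* tries DIO first, falls back to a copy, as blk_rq_map_user did */
        err = blk_rq_init_transfer(q, rq, ubuf, len);
        if (err)
                return err;

        err = blk_execute_rq(q, NULL, rq, 0);   /* issue and wait */

        /* when the IO is done: blk_rq_complete_transfer(...) unmaps or
         * copies back, matching whichever path init chose; a caller that
         * used blk_rq_map_user() directly would instead call
         * blk_rq_destroy_buffer(rq->bio). */
        return err;
}
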
Signed-off-by: Mike Christie <[EMAIL PROTECTED]>
---
 block/ll_rw_blk.c  |  379 +++-
 block/scsi_ioctl.c |4 -
 drivers/cdrom/cdrom.c  |4 -
 fs/bio.c   |  193 ++--
 include/linux/bio.h|5 -
 include/linux/blkdev.h |   11 +
 6 files changed, 400 insertions(+), 196 deletions(-)

diff --git a/block/ll_rw_blk.c b/block/ll_rw_blk.c
index 7a108d5..c9d765b 100644
--- a/block/ll_rw_blk.c
+++ b/block/ll_rw_blk.c
@@ -35,6 +35,10 @@ #include 
  * for max sense size
  */
 #include 
+/*
+ * for struct sg_iovec
+ */
+#include 
 
 static void blk_unplug_work(struct work_struct *work);
 static void blk_unplug_timeout(unsigned long data);
@@ -2314,138 +2318,301 @@ void blk_insert_request(request_queue_t 
 
 EXPORT_SYMBOL(blk_insert_request);
 
-static int __blk_rq_unmap_user(struct bio *bio)
+static void __blk_rq_destroy_buffer(struct bio *bio)
 {
-   int ret = 0;
+   if (bio_flagged(bio, BIO_USER_MAPPED))
+   bio_unmap_user(bio);
+   else
+   bio_destroy_user_buffer(bio);
+}
 
-   if (bio) {
-   if (bio_flagged(bio, BIO_USER_MAPPED))
-   bio_unmap_user(bio);
-   else
-   ret = bio_uncopy_user(bio);
-   }
+void blk_rq_destroy_buffer(struct bio *bio)
+{
+   struct bio *mapped_bio;
 
-   return ret;
+   while (bio) {
+   mapped_bio = bio;
+   if (unlikely(bio_flagged(bio, BIO_BOUNCED)))
+   mapped_bio = bio->bi_private;
+   __blk_rq_destroy_buffer(mapped_bio);
+   mapped_bio = bio;
+   bio = bio->bi_next;
+   bio_put(mapped_bio);
+   }
 }
+EXPORT_SYMBOL(blk_rq_destroy_buffer);
 
-static int __blk_rq_map_user(request_queue_t *q, struct request *rq,
-void __user *ubuf, unsigned int len)
+/**
+ * blk_rq_setup_buffer - setup buffer to bio mappings
+ * @rq:request structure to fill
+ * @ubuf:  the user buffer (optional)
+ * @len:   length of buffer
+ *
+ * Description:
+ *The caller must call blk_rq_destroy_buffer when the IO is completed.
+ */
+int blk_rq_setup_buffer(struct request *rq, void __user *ubuf,
+   unsigned long len)
 {
-   unsigned long uaddr;
+   struct request_queue *q = rq->q;
+   unsigned long bytes_read = 0;
struct bio *bio, *orig_bio;
int reading, ret;
 
-   reading = rq_data_dir(rq) == READ;
-
-   /*
-* if alignment requirement is satisfied, map in user pages for
-* direct dma. else, set up kernel bounce buffers
-*/
-   uaddr = (unsigned long) ubuf;
-   if (!(uaddr & queue_dma_alignment(q)) && !(len & 
queue_dma_alignment(q)))
-   bio = bio_map_user(q, uaddr, len, reading);
-   else
-   bio = bio_copy_user(q, uaddr, len, reading);
+   if (!len || len > (q->max_hw_sectors << 9))
+   return -EINVAL;
 
-   if (IS_ERR(bio))
-   return PTR_ERR(bio);
+   reading = rq_data_dir(rq) == READ;
+   rq->bio = NULL;
+   while (bytes_read != len) {
+   unsigned long map_len, end, start, uaddr = 0;
 
-   orig_bio = bio;
-   blk_queue_bounce(q, &bio);
+   map_len = min_t(unsigned long, len - bytes_read, BIO_MAX_SIZE);
+   if (ubuf) {
+   uaddr = (unsigned long)ubuf;
+   e

[PATCH 7/7] mv user buffer copy access_ok test to block helper

2007-03-04 Thread michaelc
From: Mike Christie <[EMAIL PROTECTED]>

sg.c does an access_ok test on the user buffer when doing
indirect IO. bsg and scsi_ioctl.c did not, but it seems
like it would be OK for this to be common. This patch moves that
test to the block layer helpers.
Signed-off-by: Mike Christie <[EMAIL PROTECTED]>
---
 block/ll_rw_blk.c |8 +++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/block/ll_rw_blk.c b/block/ll_rw_blk.c
index 35b66ed..4327e23 100644
--- a/block/ll_rw_blk.c
+++ b/block/ll_rw_blk.c
@@ -2527,6 +2527,7 @@ static int copy_user_iov(struct bio *hea
 {
unsigned int iov_len = 0;
int ret, i = 0, iov_index = 0;
+   int read = bio_data_dir(head) == READ;
struct bio *bio;
struct bio_vec *bvec;
char __user *p = NULL;
@@ -2560,10 +2561,15 @@ continue_from_bvec:
 */
goto continue_from_bvec;
}
+
+   if (!access_ok(read ?
+  VERIFY_WRITE : VERIFY_READ,
+  p, iov_len))
+   return -EFAULT;
}
 
copy_bytes = min(iov_len, bvec->bv_len - bvec_offset);
-   if (bio_data_dir(head) == READ)
+   if (read)
ret = copy_to_user(p, addr, copy_bytes);
else
ret = copy_from_user(addr, p, copy_bytes);
-- 
1.4.1.1



[PATCH 6/7] Convert sg to block layer helpers

2007-03-04 Thread michaelc
From: Mike Christie <[EMAIL PROTECTED]>

Convert sg to block layer helpers. I have tested with sg3_utils and
sg_utils. I have tested the mmap, iovec, dio and indirect IO paths,
by running those tools and the example programs against software
iscsi which does not support clustering, scsi_debug which has
a large segment size limit, and libata.
Signed-off-by: Mike Christie <[EMAIL PROTECTED]>
---
 drivers/scsi/sg.c | 1004 +
 1 files changed, 245 insertions(+), 759 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 81e3bc7..53cc140 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -67,7 +67,6 @@ static void sg_proc_cleanup(void);
 #endif
 
 #define SG_ALLOW_DIO_DEF 0
-#define SG_ALLOW_DIO_CODE /* compile out by commenting this define */
 
 #define SG_MAX_DEVS 32768
 
@@ -94,9 +93,6 @@ int sg_big_buff = SG_DEF_RESERVED_SIZE;
 static int def_reserved_size = -1; /* picks up init parameter */
 static int sg_allow_dio = SG_ALLOW_DIO_DEF;
 
-static int scatter_elem_sz = SG_SCATTER_SZ;
-static int scatter_elem_sz_prev = SG_SCATTER_SZ;
-
 #define SG_SECTOR_SZ 512
 #define SG_SECTOR_MSK (SG_SECTOR_SZ - 1)
 
@@ -115,12 +111,9 @@ static struct class_interface sg_interfa
 
 typedef struct sg_scatter_hold { /* holding area for scsi scatter gather info 
*/
unsigned short k_use_sg; /* Count of kernel scatter-gather pieces */
-   unsigned short sglist_len; /* size of malloc'd scatter-gather list ++ */
unsigned bufflen;   /* Size of (aggregate) data buffer */
-   unsigned b_malloc_len;  /* actual len malloc'ed in buffer */
-   struct scatterlist *buffer;/* scatter list */
-   char dio_in_use;/* 0->indirect IO (or mmap), 1->dio */
unsigned char cmd_opcode; /* first byte of command */
+   struct bio_reserve_buf *rbuf; /* reserve memory */
 } Sg_scatter_hold;
 
 struct sg_device;  /* forward declarations */
@@ -132,6 +125,8 @@ typedef struct sg_request { /* SG_MAX_QU
Sg_scatter_hold data;   /* hold buffer, perhaps scatter list */
sg_io_hdr_t header; /* scsi command+info, see  */
unsigned char sense_b[SCSI_SENSE_BUFFERSIZE];
+   struct request *request;
+   struct bio *bio;/* ptr to bio for later unmapping */
char res_used;  /* 1 -> using reserve buffer, 0 -> not ... */
char orphan;/* 1 -> drop on sight, 0 -> normal */
char sg_io_owned;   /* 1 -> packet belongs to SG_IO */
@@ -146,7 +141,6 @@ typedef struct sg_fd {  /* holds the sta
int timeout;/* defaults to SG_DEFAULT_TIMEOUT  */
int timeout_user;   /* defaults to SG_DEFAULT_TIMEOUT_USER */
Sg_scatter_hold reserve;/* buffer held for this file descriptor 
*/
-   unsigned save_scat_len; /* original length of trunc. scat. element */
Sg_request *headrp; /* head of request slist, NULL->empty */
struct fasync_struct *async_qp; /* used by asynchronous notification */
Sg_request req_arr[SG_MAX_QUEUE];   /* used as singly-linked list */
@@ -173,38 +167,24 @@ typedef struct sg_device { /* holds the 
 
 static int sg_fasync(int fd, struct file *filp, int mode);
 /* tasklet or soft irq callback */
-static void sg_cmd_done(void *data, char *sense, int result, int resid);
-static int sg_start_req(Sg_request * srp);
+static void sg_cmd_done(struct request *rq, int uptodate);
+static int sg_setup_req(Sg_request * srp);
 static void sg_finish_rem_req(Sg_request * srp);
-static int sg_build_indirect(Sg_scatter_hold * schp, Sg_fd * sfp, int 
buff_size);
-static int sg_build_sgat(Sg_scatter_hold * schp, const Sg_fd * sfp,
-int tablesize);
 static ssize_t sg_new_read(Sg_fd * sfp, char __user *buf, size_t count,
   Sg_request * srp);
 static ssize_t sg_new_write(Sg_fd * sfp, const char __user *buf, size_t count,
int blocking, int read_only, Sg_request ** o_srp);
 static int sg_common_write(Sg_fd * sfp, Sg_request * srp,
   unsigned char *cmnd, int timeout, int blocking);
-static int sg_u_iovec(sg_io_hdr_t * hp, int sg_num, int ind,
- int wr_xf, int *countp, unsigned char __user **up);
-static int sg_write_xfer(Sg_request * srp);
 static int sg_read_xfer(Sg_request * srp);
-static int sg_read_oxfer(Sg_request * srp, char __user *outp, int 
num_read_xfer);
-static void sg_remove_scat(Sg_scatter_hold * schp);
-static void sg_build_reserve(Sg_fd * sfp, int req_size);
-static void sg_link_reserve(Sg_fd * sfp, Sg_request * srp, int size);
-static void sg_unlink_reserve(Sg_fd * sfp, Sg_request * srp);
-static struct page *sg_page_malloc(int rqSz, int lowDma, int *retSzp);
-static void sg_page_free(struct page *page, int size);
+static int sg_build_reserve(Sg_fd * sfp, int req_size);
 static Sg_fd *sg_add_sfp(Sg_device * sdp, int dev);
 static int sg_re

RE: Possible bug in scsi_lib.c:scsi_req_map_sg()

2007-03-04 Thread Dachepalli, Sudhir
James,

How about the following:

1. cmd = scsi_get_command() on the physical device
2. Clone the scsi_cmnd fields of the virtual cmnd (the command received by
the failover driver) to the physical cmnd (the command allocated by
scsi_get_command())
3. blk_get_request()
4. Fill the request fields, req->special = cmd (point the special field
to the allocated scsi_cmnd)
5. Set the request flags to REQ_DONTPREP
6. blk_execute_rq_nowait()
7. Once the command comes back, call blk_put_request()
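
A sketch of the completion side of step 7, chaining the clone back to
the original command (failover_done() and virt_cmd are hypothetical
names, and the error translation is simplified):

#include <linux/blkdev.h>
#include <scsi/scsi.h>
#include <scsi/scsi_cmnd.h>

/* rq_end_io_fn: runs when the cloned request finishes */
static void failover_done(struct request *rq, int error)
{
        struct scsi_cmnd *virt_cmd = rq->end_io_data;

        if (error)
                virt_cmd->result = DID_ERROR << 16;     /* simplified */
        virt_cmd->scsi_done(virt_cmd);  /* complete the virtual command */
        blk_put_request(rq);            /* step 7 */
}

/* passed as the done callback in step 6:
 *      blk_execute_rq_nowait(q, NULL, rq, 1, failover_done);
 */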


Sudhir

-Original Message-
From: James Bottomley [mailto:[EMAIL PROTECTED] 
Sent: Sunday, March 04, 2007 12:14 PM
To: Dachepalli, Sudhir
Cc: Mike Christie; Benny Halevy; Jens Axboe; Boaz Harrosh;
linux-scsi@vger.kernel.org
Subject: RE: Possible bug in scsi_lib.c:scsi_req_map_sg()

On Sun, 2007-03-04 at 11:00 -0700, Dachepalli, Sudhir wrote:
> Do you think the following could work if I used blk layer functions 
> instead of "scsi_execute_async":
> 
> blk_get_request()
> req->flags |= REQ_DONTPREP

There's additional complexity here:  if the request isn't prepared, no
command is allocated.  You can't use the one you have because it belongs
to a different device (plus it would be freed by this request in the
completion path).

> blk_rq_map_kern()
> blk_execute_rq_nowait()

If the request is already mapped, you don't want to be mapping it again.

> blk_put_request()

Without seeing the code it's hard to say definitively whether all this
is correct.

James




Re: convert sg to block layer helpers - v5

2007-03-04 Thread Douglas Gilbert
[EMAIL PROTECTED] wrote:
> There are no big changes between v4 and v5. I was able to fix
> things in scsi tgt, so I could remove the weird arguments
> the block helpers were taking for it. I also tried to break
> up the patchset for easier viewing. The final patch also
> takes care of the access_ok regression.
> 
> These patches were made against Linus's tree since Tomo needed
> me to break part of it out for his scsi tgt bug fix patches.
> 
> 0001-rm-bio-hacks-in-scsi-tgt.txt - Drop scsi tgt's bio_map_user
> usage and convert it to blk_rq_map_user. Tomo is also sending
> this patch in his patchset since he needs it for his bug fixes.
> 
> 0002-rm-block-device-arg-from-bio-map-user.txt - The block_device
> argument is never used in the bio map user functions, so this
> patch drops it.
> 
> 0003-Support-large-sg-io-segments.txt - Modify the bio functions
> to allocate multiple pages at once instead of a single page.
> 
> 0004-Add-reserve-buffer-for-sg-io.txt - Add reserve buffer support
> to the block layer for sg and st indirect IO use.
> 
> 0005-Add-sg-io-mmap-helper.txt - Add some block layer helpers for
> sg mmap support.
> 
> 0006-Convert-sg-to-block-layer-helpers.txt - Convert sg to block
> layer helpers.
> 
> 0007-mv-user-buffer-copy-access_ok-test-to-block-helper.txt - 
> Move user data buffer access_ok tests to block layer helpers.
> 
> The goal of this patchset is to remove scsi_execute_async and
> reduce code duplication.
> 
> People want to discuss further merging sg and bsg/scsi_ioctl
> functionality, but I did not handle any of that in this
> patchset since people still disagree on what should be supported
> by future interfaces.
> 
> My only TODO is maybe making the bio reserve buffer mempoolable
> (making it work as mempool alloc and free functions). Since
> sg only supported one reserve buffer per fd, I have not worked
> on it; it did not seem worth it if there are no users.
***

Mike,
I see you are removing the scatter_elem_sz parameter.
What decides the scatter gather element size? Can it
be greater than PAGE_SIZE?


*** Generalizing the idea of a mmap-ed reserve buffer to
something the user had more control over could be very
powerful.
For example allowing two file descriptors (to different
devices) in the same process to share the same mmap-ed
area. This would allow a device to device copy to DMA into
and out of the same memory, potentially with large per command
transfers and with no per command scatter gather build and
tear down. Basically a zero copy copy with minimal CPU
overhead.

Doug Gilbert





Re: convert sg to block layer helpers - v5

2007-03-04 Thread Mike Christie
Douglas Gilbert wrote:
> Mike,
> I see you are removing the scatter_elem_sz parameter.
> What decides the scatter gather element size? Can it
> be greater than PAGE_SIZE?

Oh yeah, sorry I should have documented that.

I just made the code try to allocate as large an element as possible.
So the code looks at q->max_segment_size and tries to allocate segments
that large initially. If that is too large, then it will drop down by
half, like sg.c used to do when it could not allocate large segments.

I will add the param back if you want. I had thought it was a workaround
due to the segment size of a device not being exported.
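
In other words, something along these lines (a simplified sketch of the
strategy, not the patch's exact code):

#include <linux/blkdev.h>
#include <linux/mm.h>

/* Allocate one sg element as large as the queue allows, halving on
 * failure, so elements can be well over PAGE_SIZE. */
static struct page *alloc_sg_element(struct request_queue *q,
                                     unsigned int want, unsigned int *got)
{
        unsigned int bytes = min(want, q->max_segment_size);
        struct page *pages;

        while (bytes >= PAGE_SIZE) {
                pages = alloc_pages(GFP_KERNEL, get_order(bytes));
                if (pages) {
                        *got = bytes;
                        return pages;
                }
                bytes >>= 1;    /* drop down by half, like sg.c did */
        }
        return NULL;
}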

> 
> 
> *** Generalizing the idea of a mmap-ed reserve buffer to
> something the user had more control over could be very
> powerful.
> For example allowing two file descriptors (to different
> devices) in the same process to share the same mmap-ed
> area. This would allow a device to device copy to DMA into
> and out of the same memory, potentially with large per command
> transfers and with no per command scatter gather build and
> tear down. Basically a zero copy copy with minimal CPU
> overhead.
> 

I was thinking of something similar but not based on mmap. I have been
trying to figure out a way to do sg io splice. I do not care what
interface or method is used, I think it would be useful.

I know we talked about the mmap approach a little, but I do not remember
if we talked about how to tell both fds that they are going to use the
same buffer. Would we need a modification to the sg header or would we
need to add in a new IOCTL which would tell sg.c to share the buffer
between two fds?


Re: convert sg to block layer helpers - v5

2007-03-04 Thread Douglas Gilbert
Mike Christie wrote:
> Douglas Gilbert wrote:
>> Mike,
>> I see you are removing the scatter_elem_sz parameter.
>> What decides the scatter gather element size? Can it
>> be greater than PAGE_SIZE?
> 
> Oh yeah, sorry I should have documented that.
> 
> I just made the code try to allocate as large an element as possible.
> So the code looks at q->max_segment_size and tries to allocate segments
> that large initially. If that is too large, then it will drop down by
> half, like sg.c used to do when it could not allocate large segments.
> 
> I will add the param back if you want. I had thought it was a workaround
> due to the segment size of a device not being exported.
> 
>>
>> *** Generalizing the idea of a mmap-ed reserve buffer to
>> something the user had more control over could be very
>> powerful.
>> For example allowing two file descriptors (to different
>> devices) in the same process to share the same mmap-ed
>> area. This would allow a device to device copy to DMA into
>> and out of the same memory, potentially with large per command
>> transfers and with no per command scatter gather build and
>> tear down. Basically a zero copy copy with minimal CPU
>> overhead.
>>
> 
> I was thinking of something similar but not based on mmap. I have been
> trying to figure out a way to do sg io splice. I do not care what
> interface or method is used, I think it would be useful.
> 
> I know we talked about the mmap approach a little, but I do not remember
> if we talked about how to tell both fds that they are going to use the
> same buffer. Would we need a modification to the sg header or would we
> need to add in a new IOCTL which would tell sg.c to share the buffer
> between two fds?

Mike,
Currently there is a flag in sgv3:
#define SG_FLAG_MMAP_IO 4
and when it is active the dxferp field is ignored
as it is assumed the user previously did a mmap()
call to get the reserved buffer.

We could add a:
#define SG_FLAG_MMAP_IO_SHARED 8
and then the pointer in dxferp could be taken as
the already mmap-ed buffer from another device.
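
In user-space terms, the proposal would look roughly like this sketch
(SG_FLAG_MMAP_IO_SHARED is the flag proposed above and does not exist;
CDB setup and error checking are elided):

#include <string.h>
#include <sys/ioctl.h>
#include <scsi/sg.h>

#define SG_FLAG_MMAP_IO_SHARED 8        /* proposed above, not in sg.h */
#define XFER_SZ (128 * 1024)

/* One chunk of a device-to-device copy through a single mmap-ed
 * reserve buffer: READ into fd_src's buffer, WRITE it out on fd_dst.
 * mbuf came from mmap(NULL, XFER_SZ, PROT_READ | PROT_WRITE,
 * MAP_SHARED, fd_src, 0). */
static int copy_chunk(int fd_src, int fd_dst, void *mbuf,
                      unsigned char *rd_cdb, unsigned char *wr_cdb,
                      unsigned char cdb_len, unsigned char *sense)
{
        sg_io_hdr_t h;

        memset(&h, 0, sizeof(h));
        h.interface_id = 'S';
        h.cmd_len = cdb_len;
        h.mx_sb_len = 32;
        h.sbp = sense;
        h.timeout = 20000;
        h.dxfer_len = XFER_SZ;

        h.cmdp = rd_cdb;
        h.dxfer_direction = SG_DXFER_FROM_DEV;
        h.flags = SG_FLAG_MMAP_IO;      /* existing: dxferp is ignored */
        if (ioctl(fd_src, SG_IO, &h) < 0)
                return -1;

        h.cmdp = wr_cdb;
        h.dxfer_direction = SG_DXFER_TO_DEV;
        h.flags = SG_FLAG_MMAP_IO_SHARED;
        h.dxferp = mbuf;                /* fd_src's mmap-ed buffer */
        return ioctl(fd_dst, SG_IO, &h);
}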


Having more than one mmap-ed IO buffer per file
descriptor would be nice but opening multiple
file descriptors to the same device can give
the same effect (with perhaps a POSIX thread per
file descriptor).


Doug Gilbert



Re: [PATCH 7/7] mv user buffer copy access_ok test to block helper

2007-03-04 Thread Mike Christie
[EMAIL PROTECTED] wrote:
> + if (!access_ok(read ?
> +VERIFY_WRITE : VERIFY_READ,
> +p, iov_len))
> + return -EFAULT;
>   }
>  
>   copy_bytes = min(iov_len, bvec->bv_len - bvec_offset);
> - if (bio_data_dir(head) == READ)
> + if (read)
>   ret = copy_to_user(p, addr, copy_bytes);
>   else
>   ret = copy_from_user(addr, p, copy_bytes);

Tomo notified me that copy_from/to_user does the access_ok test so this
patch is not needed.


Re: [PATCH 3/3] tgt: fix scsi command leak

2007-03-04 Thread FUJITA Tomonori
From: Douglas Gilbert <[EMAIL PROTECTED]>
Subject: Re: [PATCH 3/3] tgt: fix scsi command leak
Date: Sat, 03 Mar 2007 11:58:19 -0500

> FUJITA Tomonori wrote:
> > The failure to map user-space pages leads to a scsi command leak. It can
> > happen mostly because of user-space daemon bugs (or OOM). This patch
> > makes tgt just notify an LLD of the failure with sense when
> > blk_rq_map_user() fails.
> > 
> > Signed-off-by: FUJITA Tomonori <[EMAIL PROTECTED]>
> > Signed-off-by: Mike Christie <[EMAIL PROTECTED]>
> > ---
> >  drivers/scsi/scsi_tgt_lib.c |   23 ---
> >  1 files changed, 20 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/scsi/scsi_tgt_lib.c b/drivers/scsi/scsi_tgt_lib.c
> > index dc8781a..c05dff9 100644
> > --- a/drivers/scsi/scsi_tgt_lib.c
> > +++ b/drivers/scsi/scsi_tgt_lib.c
> > @@ -459,6 +459,16 @@ static struct request *tgt_cmd_hash_look
> > return rq;
> >  }
> >  
> > +static void scsi_tgt_build_sense(unsigned char *sense_buffer, unsigned 
> > char key,
> > +unsigned char asc, unsigned char asq)
> > +{
> > +   sense_buffer[0] = 0x70;
> > +   sense_buffer[2] = key;
> > +   sense_buffer[7] = 0xa;
> > +   sense_buffer[12] = asc;
> > +   sense_buffer[13] = asq;
> > +}
> > +
> 
> Tomo,
> Perhaps you could add a memset(sense_buffer, 0, 18) before
> those assignments and state that this is "fixed" sense
> buffer format.

I don't think that is necessary because when a target mode driver
allocates a scsi_cmnd, scsi_host_get_command() does that.


> What about an option for descriptor sense format? With SAT now
> a standard, we now have one more reason to support
> descriptor format when required. The ATA PASS-THROUGH SCSI
> commands in SAT use descriptor sense format to return
> ATA registers.

tgt's kernel-space code doesn't know anything about the SCSI devices that
initiators talk to, so it's difficult to send a proper sense buffer.
Normally, we don't have this problem because tgt's user-space code builds
the sense buffer.

The bug that we are trying to fix is the scsi command leak due to
user-space bugs, so we can't rely on user-space for this.

Note that, as with open-iscsi, user-space bugs are as critical for
tgt as kernel-space bugs. We don't think target mode drivers can
continue to work after one. However, tgt should tell target mode drivers
that an unrecoverable problem has happened so that we can cleanly unload
the kernel modules.
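
For comparison with the fixed-format helper quoted above, a
descriptor-format variant would be along these lines (a sketch answering
the question; tgt does not implement it):

/* Descriptor-format (SPC-3) counterpart of scsi_tgt_build_sense():
 * response code 0x72; key, asc and ascq sit in bytes 1-3; byte 7 is
 * the additional sense length (0 when no descriptors follow). */
static void scsi_tgt_build_sense_desc(unsigned char *sense_buffer,
                                      unsigned char key,
                                      unsigned char asc, unsigned char asq)
{
        sense_buffer[0] = 0x72;
        sense_buffer[1] = key;
        sense_buffer[2] = asc;
        sense_buffer[3] = asq;
        sense_buffer[7] = 0;
}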


> 
> While on the subject of sense data, I note that the
> ATA folks (t13.org) are proposing an "ATA REQUEST
> SENSE" command to leverage of existing SCSI
> sense_key, asc, ascq tuples.