Re: [PATCHv7 0/3][Resend] Display EVPD pages in sysfs

2014-03-05 Thread Hannes Reinecke
On 03/02/2014 09:53 AM, Bart Van Assche wrote:
> On 02/13/14 11:27, Hannes Reinecke wrote:
>> After discussion with jejb I've dropped the EVPD parsing.
>> So with this version we're just displaying the EVPD page
>> 0x80 and 0x83 as hexdumps; no parsing is attempted.
>> This drastically simplifies the patch, and we don't
>> have to worry about any parsing errors in kernel space.
>> Of course we'll need a parser in userspace, but that
>> doesn't need to do any I/O. So it's still a very nice
>> gain.
> 
> A general comment about this patch series: I think the cached copies of
> these pages should be refreshed at least after an INQUIRY DATA HAS
> CHANGED unit attention code has been received. Some SCSI target
> implementations allow to change this data after a LUN has been created.
> 
Yes, eventually. But this needs to be handled in a general context,
as (potentially) even the inquiry string itself has been invalidated
after receiving such an event.
So we should be doing a rescan of the scsi device upon receiving
such an event. But this is a general problem, not one particular to
this patchset.
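
As a rough illustration of the userspace parser mentioned in the cover
letter quoted above, here is a minimal sketch that extracts the serial
number from a raw VPD page 0x80 buffer without doing any I/O. The
function name vpd80_serial() is hypothetical, and the sketch assumes the
hexdump has already been decoded back into bytes; the layout used (page
code in byte 1, big-endian page length in bytes 2-3, serial number from
byte 4) follows the SPC definition of the Unit Serial Number page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: copy the serial number out of a raw VPD page 0x80
 * buffer. Returns the serial length, or -1 if the buffer is not a valid
 * page 0x80 or the output buffer is too small.
 */
int vpd80_serial(const uint8_t *page, size_t len, char *out, size_t out_len)
{
	size_t page_len;

	assert(page && out);

	/* Need the 4-byte header, and byte 1 must carry page code 0x80 */
	if (len < 4 || page[1] != 0x80)
		return -1;

	/* Bytes 2-3 hold the big-endian page length */
	page_len = ((size_t)page[2] << 8) | page[3];
	if (page_len > len - 4 || page_len + 1 > out_len)
		return -1;

	memcpy(out, page + 4, page_len);
	out[page_len] = '\0';
	return (int)page_len;
}
```

All bounds are checked against the buffer length, so a truncated or
malformed page is rejected instead of being read past its end.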

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries & Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv7 0/3][Resend] Display EVPD pages in sysfs

2014-03-05 Thread Bart Van Assche
On 03/05/14 09:00, Hannes Reinecke wrote:
> On 03/02/2014 09:53 AM, Bart Van Assche wrote:
>> A general comment about this patch series: I think the cached copies of
>> these pages should be refreshed at least after an INQUIRY DATA HAS
>> CHANGED unit attention code has been received. Some SCSI target
>> implementations allow to change this data after a LUN has been created.
>
> Yes, eventually. But this needs to be handled in a general context,
> as (potentially) even the inquiry string itself has been invalidated
> after receiving such an event.
> So we should be doing a rescan of the scsi device upon receiving
> such an event. But this is a general problem, not one particular to
> this patchset.

Sorry but since the ALUA patch series is based on this patch series I'm
afraid that the ALUA patch series introduces a regression that seems
unacceptable to me. SCSI target implementations like LIO allow to remove
and re-add a LUN after initial discovery of a SCSI host. My concern here
is that the caching introduced by this patch series and which is used in
the ALUA patch series will cause INQUIRY data not to be updated after it
has been changed at the target side. Today the scsi_dh_alua handler
processes such INQUIRY data changes fine. Does this make sense to you ?

Thanks,

Bart.



Re: [PATCHv7 0/3][Resend] Display EVPD pages in sysfs

2014-03-05 Thread Hannes Reinecke
On 03/05/2014 09:23 AM, Bart Van Assche wrote:
> On 03/05/14 09:00, Hannes Reinecke wrote:
>> On 03/02/2014 09:53 AM, Bart Van Assche wrote:
>>> A general comment about this patch series: I think the cached copies of
>>> these pages should be refreshed at least after an INQUIRY DATA HAS
>>> CHANGED unit attention code has been received. Some SCSI target
>>> implementations allow to change this data after a LUN has been created.
>>
>> Yes, eventually. But this needs to be handled in a general context,
>> as (potentially) even the inquiry string itself has been invalidated
>> after receiving such an event.
>> So we should be doing a rescan of the scsi device upon receiving
>> such an event. But this is a general problem, not one particular to
>> this patchset.
> 
> Sorry but since the ALUA patch series is based on this patch series I'm
> afraid that the ALUA patch series introduces a regression that seems
> unacceptable to me. SCSI target implementations like LIO allow to remove
> and re-add a LUN after initial discovery of a SCSI host. My concern here
> is that the caching introduced by this patch series and which is used in
> the ALUA patch series will cause INQUIRY data not to be updated after it
> has been changed at the target side. Today the scsi_dh_alua handler
> processes such INQUIRY data changes fine. Does this make sense to you ?
> 
Currently, the SCSI stack itself is not well equipped to handle
this kind of setup anyway.
(The only known working solution is to remove the device _entirely_
and have the HBA rescan the LUN).
So while you are right, I'd still prefer to keep those two issues
separate.

James, any preferences here?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries & Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)


Re: [PATCH 0/6] iser-target: Fix active I/O shutdown related issues

2014-03-05 Thread Sagi Grimberg

On 3/5/2014 2:06 AM, Nicholas A. Bellinger wrote:
> On Tue, 2014-03-04 at 17:17 +0200, Sagi Grimberg wrote:
>> On 3/4/2014 2:00 AM, Nicholas A. Bellinger wrote:
>>> From: Nicholas Bellinger
>>>
>>> Hi Or & Sagi,
>>>
>>> This series addresses a number of active I/O shutdown related issues
>>> in iser-target code that have come up recently during stress testing.
>>>
>>> Note there is still a separate iser-target network portal shutdown
>>> bug being tracked down, but this series addresses all existing issues
>>> related to active I/O session shutdown.
>>>
>>> The patch breakdown looks like:
>>>
>>> Patch #1 fixes a long-standing bug where TPGs in shutdown incorrectly
>>> could be referenced by new login attempts.
>>>
>>> Patch #2 converts list_del -> list_del_init for iscsi_cmd->i_conn_node
>>> so that list_empty works correctly.
>>>
>>> Patch #3 addresses isert_conn->state related bugs resulting in hung
>>> shutdown, and splits isert_free_conn() into separate code that is
>>> called earlier during shutdown to ensure that all outstanding I/O
>>> has completed.
>>>
>>> Patch #4 fixes incorrect accounting of ->post_send_buf_count during
>>> active I/O shutdown with outstanding RDMA WRITE + RDMA READ work
>>> requests.
>>>
>>> Patch #5 addresses a bug related to active I/O shutdown with
>>> outstanding FRMR work requests.  Note this patch is specific to
>>> v3.12+ code.
>>>
>>> Patch #6 addresses bugs related to active I/O shutdown with
>>> outstanding completion interrupt coalescing batches. Note this patch
>>> is specific to v3.13+ code.
>>>
>>> Please review.
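
The reason for the list_del -> list_del_init conversion in Patch #2 can
be sketched in userspace. This is not the kernel's list.h; the names
below are hypothetical stand-ins, and NULL is used here in place of the
kernel's poison values. After a plain delete the node no longer points
at itself, so an "is this node on a list?" test via list_empty() gives
the wrong answer; re-initializing the node on delete makes the test
reliable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct list_head */
struct list_node {
	struct list_node *next, *prev;
};

void node_init(struct list_node *n)
{
	n->next = n;
	n->prev = n;
}

/* Same test list_empty() performs: does the node point at itself? */
bool node_empty(const struct list_node *n)
{
	return n->next == n;
}

void node_add_tail(struct list_node *n, struct list_node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Like list_del(): unlink, leaving the node's own pointers invalid */
void node_del(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->next = NULL;
	n->prev = NULL;
}

/* Like list_del_init(): unlink, then point the node back at itself */
void node_del_init(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	node_init(n);
}
```

With node_del() the removed node reports "not empty" even though it is
on no list; with node_del_init() it reports "empty", which is what a
later list_empty() check on the node expects.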

>> Hey Nic,
>>
>> So besides a minor comment, you have my Ack on this set.
>
> Thanks!
>
>> More on cleanup flow. isert_cma_handler does not handle
>> RDMA_CM_EVENT_TIMEWAIT_EXIT.
>> To be more specific, according to IB spec, when initiating disconnect
>> (rdma_disconnect/ib_send_cm_dreq),
>> one should not destroy a used qp until getting TIMEWAIT_EXIT CM event.
>> We are working on this in iSER initiator.
>> It might lead to "stale connection" CM rejects on future connections
>> (SRP also does not do that).
>
> I noticed that as well during recent debugging.
>
> However, AFAICT the RDMA_CM_EVENT_TIMEWAIT_EXIT doesn't (always) occur
> on the target side after a RDMA_CM_EVENT_DISCONNECTED, and thus far I've
> not been able to ascertain what's different about the shutdown sequence
> that would make this happen, or not happen..
>
> Any ideas..?


That's probably because the cm_id is destroyed before you get the event.
There is a specific timeout computation for getting this event (see the
IB spec). If you attempt to disconnect while the link is down (so the
initiator won't receive the disconnect and send one back), you should be
able to see this event. As I understand it, in order to comply with the
spec, the QP (and the cm_id afterwards) should be destroyed only when
this event arrives, not before.
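
The ordering described above can be sketched as a small state machine.
This is only an illustration under the reading of the spec given in this
mail, not the iser-target code: the event and state names are
hypothetical stand-ins for the rdma_cm events, and the qp_destroyed flag
stands in for the actual QP and cm_id teardown.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the CM events discussed above */
enum cm_event { EV_DISCONNECTED, EV_TIMEWAIT_EXIT };

enum conn_state { CONN_UP, CONN_DISCONNECTING, CONN_DOWN };

struct conn {
	enum conn_state state;
	bool qp_destroyed;	/* stands in for the real QP teardown */
};

void conn_handle_event(struct conn *c, enum cm_event ev)
{
	assert(c);

	switch (ev) {
	case EV_DISCONNECTED:
		/* Stop issuing new work, but keep the QP alive for now */
		if (c->state == CONN_UP)
			c->state = CONN_DISCONNECTING;
		break;
	case EV_TIMEWAIT_EXIT:
		/* Only at this point is the QP torn down */
		c->qp_destroyed = true;
		c->state = CONN_DOWN;
		break;
	}
}
```

The point of the sketch is that the disconnect event alone never marks
the QP as destroyed; only the timewait-exit event does.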

Sagi.


--nab





[Bug 71231] System unresponsable after a lot of LUNs have been added to the system

2014-03-05 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=71231

--- Comment #7 from Alex  ---
ea461abf61753b4b79e625a7c20650105b990f21 is the first bad commit
commit ea461abf61753b4b79e625a7c20650105b990f21
Author: Gavin Shan 
Date:   Wed Jun 5 15:34:02 2013 +0800

powerpc/eeh: Fix fetching bus for single-dev-PE

While running Linux as guest on top of phyp, we possiblly have
PE that includes single PCI device. However, we didn't return
its PCI bus correctly and it leads to failure on recovery from
EEH errors for single-dev-PE. The patch fixes the issue.

Cc:  # v3.7+
Cc: Steve Best 
Signed-off-by: Gavin Shan 
Signed-off-by: Benjamin Herrenschmidt 

:040000 040000 8af694ef3a1cc027bef45c99ec9a5592a501e31d 6b6f5b9268e71c63a46385d273656b2419645502 M	arch



[PATCH] bnx2fc: remove unused variable hash_table_size

2014-03-05 Thread Maurizio Lombardi
hash_table_size is not used by the bnx2fc_free_hash_table() function.

Signed-off-by: Maurizio Lombardi 
---
 drivers/scsi/bnx2fc/bnx2fc_hwi.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/scsi/bnx2fc/bnx2fc_hwi.c b/drivers/scsi/bnx2fc/bnx2fc_hwi.c
index 46a3765..261af2a 100644
--- a/drivers/scsi/bnx2fc/bnx2fc_hwi.c
+++ b/drivers/scsi/bnx2fc/bnx2fc_hwi.c
@@ -1966,12 +1966,9 @@ static void bnx2fc_free_hash_table(struct bnx2fc_hba *hba)
 {
int i;
int segment_count;
-   int hash_table_size;
u32 *pbl;
 
segment_count = hba->hash_tbl_segment_count;
-   hash_table_size = BNX2FC_NUM_MAX_SESS * BNX2FC_MAX_ROWS_IN_HASH_TBL *
-   sizeof(struct fcoe_hash_table_entry);
 
pbl = hba->hash_tbl_pbl;
for (i = 0; i < segment_count; ++i) {
-- 
Maurizio Lombardi



[PATCH v2 02/13] IB/iser: Push the decision what memory key to use into fast_reg_mr routine

2014-03-05 Thread Sagi Grimberg
This is a preparation step for T10-PI offload support. We prefer to push
the decision of which mkey to use (global or fastreg) to iser_fast_reg_mr.
We choose to do this since in T10-PI we may need to register
protection buffers, and in this case we wish to simplify iser_fast_reg_mr
instead of repeating the logic of which key to use.

This patch does not change any functionality.
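
The key decision described above can be sketched in plain C. This is a
simplified userspace illustration, not the driver code: the structures,
the two key constants and the reg_mem() name are all hypothetical. A
single DMA entry is described directly with the global key, while
several entries are treated as one fast-registered region.

```c
#include <assert.h>
#include <stdint.h>

#define GLOBAL_LKEY	0x1000u	/* hypothetical global DMA MR key */
#define FASTREG_LKEY	0x2000u	/* hypothetical fast registration key */

struct sg_entry { uint64_t addr; uint32_t len; };
struct sge { uint64_t addr; uint32_t length; uint32_t lkey; };

/* Fill *out for the given entries; returns 0, or -1 if there are none */
int reg_mem(const struct sg_entry *sg, int nents, struct sge *out)
{
	uint32_t total = 0;
	int i;

	assert(sg && out);
	if (nents < 1)
		return -1;

	if (nents == 1) {
		/* A single DMA entry: the global key suffices */
		out->lkey = GLOBAL_LKEY;
		out->addr = sg[0].addr;
		out->length = sg[0].len;
		return 0;
	}

	/* Several entries: describe them as one fast-registered region */
	for (i = 0; i < nents; i++)
		total += sg[i].len;
	out->lkey = FASTREG_LKEY;
	out->addr = sg[0].addr;
	out->length = total;
	return 0;
}
```

Because the caller only receives the filled-in sge, it does not need to
know which of the two keys was chosen.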

Signed-off-by: Sagi Grimberg 
Signed-off-by: Alex Tabachnik 
---
 drivers/infiniband/ulp/iser/iser_memory.c |  101 +
 1 files changed, 59 insertions(+), 42 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iser_memory.c b/drivers/infiniband/ulp/iser/iser_memory.c
index 6e9b7bc..d25587e 100644
--- a/drivers/infiniband/ulp/iser/iser_memory.c
+++ b/drivers/infiniband/ulp/iser/iser_memory.c
@@ -444,16 +444,40 @@ int iser_reg_rdma_mem_fmr(struct iscsi_iser_task *iser_task,
return 0;
 }
 
-static int iser_fast_reg_mr(struct fast_reg_descriptor *desc,
-   struct iser_conn *ib_conn,
+static int iser_fast_reg_mr(struct iscsi_iser_task *iser_task,
struct iser_regd_buf *regd_buf,
-   u32 offset, unsigned int data_size,
-   unsigned int page_list_len)
+   struct iser_data_buf *mem,
+   struct ib_sge *sge)
 {
+   struct fast_reg_descriptor *desc = regd_buf->reg.mem_h;
+   struct iser_conn *ib_conn = iser_task->iser_conn->ib_conn;
+   struct iser_device *device = ib_conn->device;
+   struct ib_device *ibdev = device->ib_device;
struct ib_send_wr fastreg_wr, inv_wr;
struct ib_send_wr *bad_wr, *wr = NULL;
u8 key;
-   int ret;
+   int ret, offset, size, plen;
+
+   /* if there a single dma entry, dma mr suffices */
+   if (mem->dma_nents == 1) {
+   struct scatterlist *sg = (struct scatterlist *)mem->buf;
+
+   sge->lkey = device->mr->lkey;
+   sge->addr   = ib_sg_dma_address(ibdev, &sg[0]);
+   sge->length  = ib_sg_dma_len(ibdev, &sg[0]);
+
+   iser_dbg("Single DMA entry: lkey=0x%x, addr=0x%llx, length=0x%x\n",
+sge->lkey, sge->addr, sge->length);
+   return 0;
+   }
+
+   plen = iser_sg_to_page_vec(mem, device->ib_device,
+  desc->data_frpl->page_list,
+  &offset, &size);
+   if (plen * SIZE_4K < size) {
+   iser_err("fast reg page_list too short to hold this SG\n");
+   return -EINVAL;
+   }
 
if (!desc->valid) {
memset(&inv_wr, 0, sizeof(inv_wr));
@@ -472,9 +496,9 @@ static int iser_fast_reg_mr(struct fast_reg_descriptor *desc,
fastreg_wr.opcode = IB_WR_FAST_REG_MR;
fastreg_wr.wr.fast_reg.iova_start = desc->data_frpl->page_list[0] + offset;
fastreg_wr.wr.fast_reg.page_list = desc->data_frpl;
-   fastreg_wr.wr.fast_reg.page_list_len = page_list_len;
+   fastreg_wr.wr.fast_reg.page_list_len = plen;
fastreg_wr.wr.fast_reg.page_shift = SHIFT_4K;
-   fastreg_wr.wr.fast_reg.length = data_size;
+   fastreg_wr.wr.fast_reg.length = size;
fastreg_wr.wr.fast_reg.rkey = desc->data_mr->rkey;
fastreg_wr.wr.fast_reg.access_flags = (IB_ACCESS_LOCAL_WRITE  |
   IB_ACCESS_REMOTE_WRITE |
@@ -492,12 +516,9 @@ static int iser_fast_reg_mr(struct fast_reg_descriptor *desc,
}
desc->valid = false;
 
-   regd_buf->reg.mem_h = desc;
-   regd_buf->reg.lkey = desc->data_mr->lkey;
-   regd_buf->reg.rkey = desc->data_mr->rkey;
-   regd_buf->reg.va = desc->data_frpl->page_list[0] + offset;
-   regd_buf->reg.len = data_size;
-   regd_buf->reg.is_mr = 1;
+   sge->lkey = desc->data_mr->lkey;
+   sge->addr = desc->data_frpl->page_list[0] + offset;
+   sge->length = size;
 
return ret;
 }
@@ -516,11 +537,10 @@ int iser_reg_rdma_mem_fastreg(struct iscsi_iser_task *iser_task,
struct ib_device *ibdev = device->ib_device;
struct iser_data_buf *mem = &iser_task->data[cmd_dir];
struct iser_regd_buf *regd_buf = &iser_task->rdma_regd[cmd_dir];
-   struct fast_reg_descriptor *desc;
-   unsigned int data_size, page_list_len;
+   struct fast_reg_descriptor *desc = NULL;
+   struct ib_sge data_sge;
int err, aligned_len;
unsigned long flags;
-   u32 offset;
 
aligned_len = iser_data_buf_aligned_len(mem, ibdev);
if (aligned_len != mem->dma_nents) {
@@ -533,41 +553,38 @@ int iser_reg_rdma_mem_fastreg(struct iscsi_iser_task *iser_task,
mem = &iser_task->data_copy[cmd_dir];
}
 
-   /* if there a single dma entry, dma mr suffices */
-   if (mem->dma_nents == 1) {
-   struct scatterlist *sg = (struct scatterlist *)mem->buf;
-
-   

[PATCH v2 03/13] IB/iser: Move fast_reg_descriptor initialization to a function

2014-03-05 Thread Sagi Grimberg
The fastreg descriptor will include protection information context.
In order to keep the logic in one place we introduce the
iser_create_fastreg_desc function.

This patch does not change any functionality.
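
The allocate-then-unwind shape of such a descriptor constructor can be
sketched in userspace as follows. This is an illustration of the error
handling pattern only: malloc() stands in for the verbs allocations, and
the structure, the function names and the fail_at parameter (used to
force a failure for demonstration) are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the fast registration descriptor */
struct fr_desc {
	void *page_list;
	void *mr;
	int valid;
};

/* fail_at: 0 = no failure, 1 = first allocation, 2 = second allocation */
int create_desc(struct fr_desc *desc, int fail_at)
{
	int ret;

	assert(desc);

	desc->page_list = (fail_at == 1) ? NULL : malloc(64);
	if (!desc->page_list)
		return -ENOMEM;	/* nothing to undo yet */

	desc->mr = (fail_at == 2) ? NULL : malloc(64);
	if (!desc->mr) {
		ret = -ENOMEM;
		goto mr_failure;
	}

	desc->valid = 1;
	return 0;

mr_failure:
	/* Undo only what this function already allocated */
	free(desc->page_list);
	desc->page_list = NULL;
	return ret;
}

void destroy_desc(struct fr_desc *desc)
{
	free(desc->mr);
	free(desc->page_list);
	desc->mr = NULL;
	desc->page_list = NULL;
	desc->valid = 0;
}
```

With the unwinding kept inside the constructor, a caller that builds a
pool of descriptors only has to free the descriptor itself on failure.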

Signed-off-by: Sagi Grimberg 
Signed-off-by: Alex Tabachnik 
---
 drivers/infiniband/ulp/iser/iser_verbs.c |   58 -
 1 files changed, 40 insertions(+), 18 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iser_verbs.c b/drivers/infiniband/ulp/iser/iser_verbs.c
index dc5a0b4..9569e40 100644
--- a/drivers/infiniband/ulp/iser/iser_verbs.c
+++ b/drivers/infiniband/ulp/iser/iser_verbs.c
@@ -279,6 +279,39 @@ void iser_free_fmr_pool(struct iser_conn *ib_conn)
ib_conn->fmr.page_vec = NULL;
 }
 
+static int
+iser_create_fastreg_desc(struct ib_device *ib_device, struct ib_pd *pd,
+struct fast_reg_descriptor *desc)
+{
+   int ret;
+
+   desc->data_frpl = ib_alloc_fast_reg_page_list(ib_device,
+ ISCSI_ISER_SG_TABLESIZE + 1);
+   if (IS_ERR(desc->data_frpl)) {
+   ret = PTR_ERR(desc->data_frpl);
+   iser_err("Failed to allocate ib_fast_reg_page_list err=%d\n",
+ret);
+   return PTR_ERR(desc->data_frpl);
+   }
+
+   desc->data_mr = ib_alloc_fast_reg_mr(pd, ISCSI_ISER_SG_TABLESIZE + 1);
+   if (IS_ERR(desc->data_mr)) {
+   ret = PTR_ERR(desc->data_mr);
+   iser_err("Failed to allocate ib_fast_reg_mr err=%d\n", ret);
+   goto fast_reg_mr_failure;
+   }
+   iser_info("Create fr_desc %p page_list %p\n",
+ desc, desc->data_frpl->page_list);
+   desc->valid = true;
+
+   return 0;
+
+fast_reg_mr_failure:
+   ib_free_fast_reg_page_list(desc->data_frpl);
+
+   return ret;
+}
+
 /**
  * iser_create_fastreg_pool - Creates pool of fast_reg descriptors
  * for fast registration work requests.
@@ -300,32 +333,21 @@ int iser_create_fastreg_pool(struct iser_conn *ib_conn, unsigned cmds_max)
goto err;
}
 
-   desc->data_frpl = ib_alloc_fast_reg_page_list(device->ib_device,
- ISCSI_ISER_SG_TABLESIZE + 1);
-   if (IS_ERR(desc->data_frpl)) {
-   ret = PTR_ERR(desc->data_frpl);
-   iser_err("Failed to allocate ib_fast_reg_page_list err=%d\n", ret);
-   goto fast_reg_page_failure;
+   ret = iser_create_fastreg_desc(device->ib_device,
+  device->pd, desc);
+   if (ret) {
+   iser_err("Failed to create fastreg descriptor err=%d\n",
+ret);
+   kfree(desc);
+   goto err;
}
 
-   desc->data_mr = ib_alloc_fast_reg_mr(device->pd,
-ISCSI_ISER_SG_TABLESIZE + 1);
-   if (IS_ERR(desc->data_mr)) {
-   ret = PTR_ERR(desc->data_mr);
-   iser_err("Failed to allocate ib_fast_reg_mr err=%d\n", ret);
-   goto fast_reg_mr_failure;
-   }
-   desc->valid = true;
list_add_tail(&desc->list, &ib_conn->fastreg.pool);
ib_conn->fastreg.pool_size++;
}
 
return 0;
 
-fast_reg_mr_failure:
-   ib_free_fast_reg_page_list(desc->data_frpl);
-fast_reg_page_failure:
-   kfree(desc);
 err:
iser_free_fastreg_pool(ib_conn);
return ret;
-- 
1.7.1



[PATCH v2 10/13] IB/iser: Support T10-PI operations

2014-03-05 Thread Sagi Grimberg
Add logic to initialize protection information entities.
For each iSCSI task, we keep the scsi_cmnd in order to
query the SCSI protection operations and to reference the
protection buffers.

Modify iser_fast_reg_mr to receive an indication of whether it
is registering the data or the protection buffers.

In addition, introduce iser_reg_sig_mr, which performs a fast
registration work request for a signature-enabled memory region
(IB_WR_REG_SIG_MR). In this routine we set all the
protection-relevant attributes for the device to offload protection
data transfer and verification.

Signed-off-by: Sagi Grimberg 
Signed-off-by: Alex Tabachnik 
---
 drivers/infiniband/ulp/iser/iscsi_iser.c |2 +
 drivers/infiniband/ulp/iser/iscsi_iser.h |9 +
 drivers/infiniband/ulp/iser/iser_initiator.c |   63 ++-
 drivers/infiniband/ulp/iser/iser_memory.c|  257 +++---
 4 files changed, 304 insertions(+), 27 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c b/drivers/infiniband/ulp/iser/iscsi_iser.c
index cfa952e..a64b878 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.c
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.c
@@ -184,6 +184,8 @@ iscsi_iser_task_init(struct iscsi_task *task)
 
iser_task->command_sent = 0;
iser_task_rdma_init(iser_task);
+   iser_task->sc = task->sc;
+
return 0;
 }
 
diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.h b/drivers/infiniband/ulp/iser/iscsi_iser.h
index 99fc8b8..fce5409 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.h
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.h
@@ -46,6 +46,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include 
 #include 
@@ -289,6 +291,10 @@ struct iser_device {
enum iser_data_dir cmd_dir);
 };
 
+#define ISER_CHECK_GUARD   0xc0
+#define ISER_CHECK_REFTAG  0x0f
+#define ISER_CHECK_APPTAG  0x30
+
 enum iser_reg_indicator {
ISER_DATA_KEY_VALID = 1 << 0,
ISER_PROT_KEY_VALID = 1 << 1,
@@ -361,11 +367,14 @@ struct iscsi_iser_task {
struct iser_tx_desc  desc;
struct iscsi_iser_conn   *iser_conn;
enum iser_task_statusstatus;
+   struct scsi_cmnd *sc;
int  command_sent;  /* set if command  sent  */
int  dir[ISER_DIRS_NUM];  /* set if dir use*/
struct iser_regd_buf rdma_regd[ISER_DIRS_NUM];/* regd rdma buf */
struct iser_data_buf data[ISER_DIRS_NUM]; /* orig. data des*/
struct iser_data_buf data_copy[ISER_DIRS_NUM];/* contig. copy  */
+   struct iser_data_buf prot[ISER_DIRS_NUM]; /* prot desc */
+   struct iser_data_buf prot_copy[ISER_DIRS_NUM];/* prot copy */
 };
 
 struct iser_page_vec {
diff --git a/drivers/infiniband/ulp/iser/iser_initiator.c b/drivers/infiniband/ulp/iser/iser_initiator.c
index 58e14c7..7fd95fe 100644
--- a/drivers/infiniband/ulp/iser/iser_initiator.c
+++ b/drivers/infiniband/ulp/iser/iser_initiator.c
@@ -62,6 +62,17 @@ static int iser_prepare_read_cmd(struct iscsi_task *task,
if (err)
return err;
 
+   if (scsi_prot_sg_count(iser_task->sc)) {
+   struct iser_data_buf *pbuf_in = &iser_task->prot[ISER_DIR_IN];
+
+   err = iser_dma_map_task_data(iser_task,
+pbuf_in,
+ISER_DIR_IN,
+DMA_FROM_DEVICE);
+   if (err)
+   return err;
+   }
+
if (edtl > iser_task->data[ISER_DIR_IN].data_len) {
iser_err("Total data length: %ld, less than EDTL: "
 "%d, in READ cmd BHS itt: %d, conn: 0x%p\n",
@@ -113,6 +124,17 @@ iser_prepare_write_cmd(struct iscsi_task *task,
if (err)
return err;
 
+   if (scsi_prot_sg_count(iser_task->sc)) {
+   struct iser_data_buf *pbuf_out = &iser_task->prot[ISER_DIR_OUT];
+
+   err = iser_dma_map_task_data(iser_task,
+pbuf_out,
+ISER_DIR_OUT,
+DMA_TO_DEVICE);
+   if (err)
+   return err;
+   }
+
if (edtl > iser_task->data[ISER_DIR_OUT].data_len) {
iser_err("Total data length: %ld, less than EDTL: %d, "
 "in WRITE cmd BHS itt: %d, conn: 0x%p\n",
@@ -368,7 +390,7 @@ int iser_send_command(struct iscsi_conn *conn,
struct iscsi_iser_task *iser_task = task->dd_data;
unsigned long edtl;
int err;
-   struct iser_data_buf *data_buf;
+   struct iser_data_buf *data_buf, *prot_buf;
struct iscsi_scsi_req *hdr = (struct iscsi_scsi_req *)task->hdr;
struct scsi_cmnd *sc  =  task->sc;
   

[PATCH v2 12/13] IB/iser: Implement check_protection

2014-03-05 Thread Sagi Grimberg
Once the iSCSI transaction is completed we must
implement check_protection in order to report
any DIF errors that may have occurred.

The routine boils down to calling ib_check_mr_status
to get the signature status of the transaction.
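
The failed sector computation done in this patch can be illustrated in
isolation. A minimal sketch, assuming (as the division in the patch
does) that the reported error offset counts interleaved bytes, that is,
each sector of data followed by its 8-byte protection tuple; the
function name pi_failed_sector() is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* T10-PI tuple: guard (2 bytes) + app tag (2 bytes) + ref tag (4 bytes) */
#define PI_TUPLE_SIZE 8u

/*
 * Map a byte offset reported for a signature error to the failing
 * sector, assuming the offset covers data plus protection bytes.
 */
uint64_t pi_failed_sector(uint64_t start_lba, uint64_t err_offset,
			  uint32_t sector_size)
{
	assert(sector_size > 0);
	return start_lba + err_offset / (sector_size + PI_TUPLE_SIZE);
}
```

For 512-byte sectors each block occupies 520 bytes on the wire, so an
offset anywhere inside the first 520 bytes maps to the starting LBA
itself.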

Signed-off-by: Sagi Grimberg 
Signed-off-by: Alex Tabachnik 

Signed-off-by: Sagi Grimberg 
---
 drivers/infiniband/ulp/iser/iscsi_iser.c |   13 
 drivers/infiniband/ulp/iser/iscsi_iser.h |2 +
 drivers/infiniband/ulp/iser/iser_verbs.c |   47 ++
 3 files changed, 62 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c b/drivers/infiniband/ulp/iser/iscsi_iser.c
index a64b878..f13d7e9 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.c
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.c
@@ -306,6 +306,18 @@ static void iscsi_iser_cleanup_task(struct iscsi_task *task)
}
 }
 
+static u8 iscsi_iser_check_protection(struct iscsi_task *task, sector_t *sector)
+{
+   struct iscsi_iser_task *iser_task = task->dd_data;
+
+   if (iser_task->dir[ISER_DIR_IN])
+   return iser_check_task_pi_status(iser_task, ISER_DIR_IN,
+sector);
+   else
+   return iser_check_task_pi_status(iser_task, ISER_DIR_OUT,
+sector);
+}
+
 static struct iscsi_cls_conn *
 iscsi_iser_conn_create(struct iscsi_cls_session *cls_session, uint32_t conn_idx)
 {
@@ -742,6 +754,7 @@ static struct iscsi_transport iscsi_iser_transport = {
.xmit_task  = iscsi_iser_task_xmit,
.cleanup_task   = iscsi_iser_cleanup_task,
.alloc_pdu  = iscsi_iser_pdu_alloc,
+   .check_protection   = iscsi_iser_check_protection,
/* recovery */
.session_recovery_timedout = iscsi_session_recovery_timedout,
 
diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.h b/drivers/infiniband/ulp/iser/iscsi_iser.h
index fce5409..95f291f 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.h
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.h
@@ -483,4 +483,6 @@ int iser_create_fmr_pool(struct iser_conn *ib_conn, unsigned cmds_max);
 void iser_free_fmr_pool(struct iser_conn *ib_conn);
 int iser_create_fastreg_pool(struct iser_conn *ib_conn, unsigned cmds_max);
 void iser_free_fastreg_pool(struct iser_conn *ib_conn);
+u8 iser_check_task_pi_status(struct iscsi_iser_task *iser_task,
+enum iser_data_dir cmd_dir, sector_t *sector);
 #endif
diff --git a/drivers/infiniband/ulp/iser/iser_verbs.c b/drivers/infiniband/ulp/iser/iser_verbs.c
index 0404c71..abbb6ec 100644
--- a/drivers/infiniband/ulp/iser/iser_verbs.c
+++ b/drivers/infiniband/ulp/iser/iser_verbs.c
@@ -1153,3 +1153,50 @@ static void iser_cq_callback(struct ib_cq *cq, void *cq_context)
 
tasklet_schedule(&device->cq_tasklet[cq_index]);
 }
+
+u8 iser_check_task_pi_status(struct iscsi_iser_task *iser_task,
+enum iser_data_dir cmd_dir, sector_t *sector)
+{
+   struct iser_mem_reg *reg = &iser_task->rdma_regd[cmd_dir].reg;
+   struct fast_reg_descriptor *desc = reg->mem_h;
+   unsigned long sector_size = iser_task->sc->device->sector_size;
+   struct ib_mr_status mr_status;
+   int ret;
+
+   if (desc && desc->reg_indicators & ISER_FASTREG_PROTECTED) {
+   desc->reg_indicators &= ~ISER_FASTREG_PROTECTED;
+   ret = ib_check_mr_status(desc->pi_ctx->sig_mr,
+IB_MR_CHECK_SIG_STATUS, &mr_status);
+   if (ret) {
+   pr_err("ib_check_mr_status failed, ret %d\n", ret);
+   goto err;
+   }
+
+   if (mr_status.fail_status & IB_MR_CHECK_SIG_STATUS) {
+   sector_t sector_off = mr_status.sig_err.sig_err_offset;
+
+   do_div(sector_off, sector_size + 8);
+   *sector = scsi_get_lba(iser_task->sc) + sector_off;
+
+   pr_err("PI error found type %d at sector %lx "
+  "expected %x vs actual %x\n",
+  mr_status.sig_err.err_type, *sector,
+  mr_status.sig_err.expected,
+  mr_status.sig_err.actual);
+
+   switch (mr_status.sig_err.err_type) {
+   case IB_SIG_BAD_GUARD:
+   return 0x1;
+   case IB_SIG_BAD_REFTAG:
+   return 0x3;
+   case IB_SIG_BAD_APPTAG:
+   return 0x2;
+   }
+   }
+   }
+
+   return 0;
+err:
+   /* Not alot we can do here, return ambiguous guard error */
+   return 0x1;
+}
-- 
1.7.1


[PATCH v2 00/13] T10-PI support for iSER initiator

2014-03-05 Thread Sagi Grimberg
Hey Roland, Nic, Mike and Co

This patchset adds T10 protection information offload support over
RDMA signature verbs API. This set, along with the iSER target set,
allow end-to-end protection information passthrough and validation.
The patchset was tested against Linux SCSI target with iSER DIF
support applied.

iSER T10-PI support enablement is currently controlled with
module parameters (similar to lpfc for example) which make them
global. In the next phase we can consider passing these parameters
from iscsid nl messages, which would make them per-connection.

The approach I took with respect to escalating protection information
errors was minimal iSCSI intervention in protection information affairs.
I added a hook to libiscsi that asks the transport to check the protection
information status, and libiscsi constructs the proper sense data in case
of errors (the alternative of letting the transport construct the sense
data seemed much less appealing).
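
The division of labour described above can be sketched as an optional
transport callback plus generic sense construction. This is a userspace
illustration with hypothetical structure and function names, not the
libiscsi code. The ASC/ASCQ values are the SPC ones for protection check
failures (ASC 0x10 with ASCQ 0x01 guard, 0x02 application tag, 0x03
reference tag); the choice of the ILLEGAL REQUEST sense key is an
assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SENSE_ILLEGAL_REQUEST	0x05	/* sense key assumed by this sketch */
#define ASC_PI_CHECK_FAILED	0x10	/* ASCQ 1/2/3: guard/app tag/ref tag */

struct task;

/* Hypothetical transport ops: the hook is optional and may be NULL */
struct transport_ops {
	/* Returns 0 if protection checks passed, else the ASCQ to report */
	uint8_t (*check_protection)(struct task *task, uint64_t *sector);
};

struct task {
	const struct transport_ops *ops;
	uint8_t pi_status;	/* demo only: status the transport reports */
	uint64_t bad_sector;	/* demo only: sector the transport reports */
};

struct sense {
	uint8_t key, asc, ascq;
	uint64_t info;		/* failing sector */
};

/* Generic layer: ask the transport, then build the sense data itself */
int complete_task(struct task *task, struct sense *sense)
{
	uint64_t sector = 0;
	uint8_t ascq;

	assert(task && task->ops && sense);

	if (!task->ops->check_protection)
		return 0;

	ascq = task->ops->check_protection(task, &sector);
	if (!ascq)
		return 0;

	sense->key = SENSE_ILLEGAL_REQUEST;
	sense->asc = ASC_PI_CHECK_FAILED;
	sense->ascq = ascq;
	sense->info = sector;
	return 1;
}

/* Demo transport hook: simply reports what the task was set up with */
uint8_t demo_check_protection(struct task *task, uint64_t *sector)
{
	*sector = task->bad_sector;
	return task->pi_status;
}
```

The transport only reports which check failed and where; the sense
format stays in one place in the generic layer.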

Note that this patchset comes on top of a pending patch for iSER to suppress
fastreg completions (http://marc.info/?l=linux-rdma&m=139047309831997&w=2).

v0 patches are available in target-pending git repo (branch rdma-dif) and
passed 0-DAY testing.

Roland, I would like to hear your feedback on this.

The set is ordered by the following:
- Preparation patches (non/minor functionality changes).
- Add protection information execution support.
- Add protection information status check facilities.
- Publish T10-DIF support to SCSI mid-layer according to
  IB device capabilities.

Changes from v1:
- Removed extra space (BUG_ON).
- Dropped DID_ABORT from sc result for data integrity errors.
- Fixed failed sector report.

Changes from v0:
- Fix protection information dma registration for unaligned scatterlists
  which may happen when the block layer merges bios.
- Don't fail connections on devices without DIF support - warn and continue
  without DIF.
- reword FR -> FastReg

Alex Tabachnik (2):
  IB/iser: Introduce pi_enable, pi_guard module parameters
  IB/iser: Initialize T10-PI resources

Sagi Grimberg (11):
  IB/iser: Avoid FRWR notation, use fastreg instead
  IB/iser: Push the decision what memory key to use into fast_reg_mr
routine
  IB/iser: Move fast_reg_descriptor initialization to a function
  IB/iser: Keep IB device attributes under iser_device
  IB/iser: Replace fastreg descriptor valid bool with indicators
container
  IB/iser: Generalize iser_unmap_task_data and
finalize_rdma_unaligned_sg
  IB/iser: Generalize fall_to_bounce_buf routine
  IB/iser: Support T10-PI operations
  SCSI/libiscsi: Add check_protection callback for transports
  IB/iser: Implement check_protection
  IB/iser: Publish T10-PI support to SCSI midlayer

 drivers/infiniband/ulp/iser/iscsi_iser.c |   46 +++-
 drivers/infiniband/ulp/iser/iscsi_iser.h |   71 -
 drivers/infiniband/ulp/iser/iser_initiator.c |   98 +-
 drivers/infiniband/ulp/iser/iser_memory.c|  445 +++---
 drivers/infiniband/ulp/iser/iser_verbs.c |  287 -
 drivers/scsi/libiscsi.c  |   32 ++
 include/scsi/libiscsi.h  |4 +
 include/scsi/scsi_transport_iscsi.h  |1 +
 8 files changed, 771 insertions(+), 213 deletions(-)



[PATCH v2 01/13] IB/iser: Avoid FRWR notation, use fastreg instead

2014-03-05 Thread Sagi Grimberg
FRWR stands for "fast registration work request". We
want to avoid calling the fastreg pool with that name,
instead we name it fastreg which stands for "fast registration".

This pool will include more elements in the future, so
it is a good idea to generalize the name.

Signed-off-by: Sagi Grimberg 
Signed-off-by: Alex Tabachnik 
---
 drivers/infiniband/ulp/iser/iscsi_iser.h  |   20 ---
 drivers/infiniband/ulp/iser/iser_memory.c |   28 +-
 drivers/infiniband/ulp/iser/iser_verbs.c  |   84 ++--
 3 files changed, 67 insertions(+), 65 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.h b/drivers/infiniband/ulp/iser/iscsi_iser.h
index e1a01c6..ca161df 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.h
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.h
@@ -138,7 +138,7 @@
 #define ISER_WSV   0x08
 #define ISER_RSV   0x04
 
-#define ISER_FRWR_LI_WRID  0xULL
+#define ISER_FASTREG_LI_WRID   0xULL
 
 struct iser_hdr {
u8  flags;
@@ -312,6 +312,8 @@ struct iser_conn {
unsigned int rx_desc_head;
struct iser_rx_desc  *rx_descs;
struct ib_recv_wrrx_wr[ISER_MIN_POSTED_RX];
+
+   /* Connection memory registration pool */
union {
struct {
struct ib_fmr_pool  *pool; /* pool of IB FMRs */
@@ -321,8 +323,8 @@ struct iser_conn {
struct {
struct list_headpool;
int pool_size;
-   } frwr;
-   } fastreg;
+   } fastreg;
+   };
 };
 
 struct iscsi_iser_conn {
@@ -408,8 +410,8 @@ void iser_finalize_rdma_unaligned_sg(struct iscsi_iser_task *task,
 
 int  iser_reg_rdma_mem_fmr(struct iscsi_iser_task *task,
   enum iser_data_dir cmd_dir);
-int  iser_reg_rdma_mem_frwr(struct iscsi_iser_task *task,
-   enum iser_data_dir cmd_dir);
+int  iser_reg_rdma_mem_fastreg(struct iscsi_iser_task *task,
+  enum iser_data_dir cmd_dir);
 
 int  iser_connect(struct iser_conn   *ib_conn,
  struct sockaddr_in *src_addr,
@@ -422,8 +424,8 @@ int  iser_reg_page_vec(struct iser_conn *ib_conn,
 
 void iser_unreg_mem_fmr(struct iscsi_iser_task *iser_task,
enum iser_data_dir cmd_dir);
-void iser_unreg_mem_frwr(struct iscsi_iser_task *iser_task,
-enum iser_data_dir cmd_dir);
+void iser_unreg_mem_fastreg(struct iscsi_iser_task *iser_task,
+   enum iser_data_dir cmd_dir);
 
 int  iser_post_recvl(struct iser_conn *ib_conn);
 int  iser_post_recvm(struct iser_conn *ib_conn, int count);
@@ -440,6 +442,6 @@ int  iser_initialize_task_headers(struct iscsi_task *task,
 int iser_alloc_rx_descriptors(struct iser_conn *ib_conn, struct iscsi_session *session);
 int iser_create_fmr_pool(struct iser_conn *ib_conn, unsigned cmds_max);
 void iser_free_fmr_pool(struct iser_conn *ib_conn);
-int iser_create_frwr_pool(struct iser_conn *ib_conn, unsigned cmds_max);
-void iser_free_frwr_pool(struct iser_conn *ib_conn);
+int iser_create_fastreg_pool(struct iser_conn *ib_conn, unsigned cmds_max);
+void iser_free_fastreg_pool(struct iser_conn *ib_conn);
 #endif
diff --git a/drivers/infiniband/ulp/iser/iser_memory.c b/drivers/infiniband/ulp/iser/iser_memory.c
index f770179..6e9b7bc 100644
--- a/drivers/infiniband/ulp/iser/iser_memory.c
+++ b/drivers/infiniband/ulp/iser/iser_memory.c
@@ -422,8 +422,8 @@ int iser_reg_rdma_mem_fmr(struct iscsi_iser_task *iser_task,
 (unsigned long)regd_buf->reg.va,
 (unsigned long)regd_buf->reg.len);
} else { /* use FMR for multiple dma entries */
-   iser_page_vec_build(mem, ib_conn->fastreg.fmr.page_vec, ibdev);
-   err = iser_reg_page_vec(ib_conn, ib_conn->fastreg.fmr.page_vec,
+   iser_page_vec_build(mem, ib_conn->fmr.page_vec, ibdev);
+   err = iser_reg_page_vec(ib_conn, ib_conn->fmr.page_vec,
&regd_buf->reg);
if (err && err != -EAGAIN) {
iser_data_buf_dump(mem, ibdev);
@@ -431,12 +431,12 @@ int iser_reg_rdma_mem_fmr(struct iscsi_iser_task *iser_task,
 mem->dma_nents,
 ntoh24(iser_task->desc.iscsi_header.dlength));
iser_err("page_vec: data_size = 0x%x, length = %d, offset = 0x%x\n",
-ib_conn->fastreg.fmr.page_vec->data_size,
-ib_conn->fastreg.fmr.page_vec->length,
-ib_conn->fastreg.fmr.page_vec->offset);
-   for (i = 0; i < ib_conn->fastreg.fmr.page_vec->length; i++)
+ib_co

[PATCH v2 08/13] IB/iser: Introduce pi_enable, pi_guard module parameters

2014-03-05 Thread Sagi Grimberg
From: Alex Tabachnik 

Use modparams to activate protection information support.

pi_enable bool: Based on this parameter iSER will know
if it should support T10-PI. We don't want to do this by
default as it requires allocating and initializing extra
resources. If pi_enable=N, iSER won't publish any DIF
capabilities to the SCSI midlayer.

pi_guard int: Based on this parameter iSER will publish
DIX guard type support to SCSI midlayer. 0 means CRC is
allowed to be passed in DIX buffers, 1 (or non-zero)
means IP-CSUM is allowed to be passed in DIX buffers.
Note that over the wire, only CRC is allowed.

In the next phase, it is worth considering passing these
parameters from iscsid via nlmsg. This will allow these
parameters to be connection based rather than global.
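As a rough illustration of the semantics described above, the following
standalone sketch maps the two parameters to the guard type that would be
advertised. The enum, struct and function names are hypothetical stand-ins,
not the kernel's definitions; only the "0 means CRC, non-zero means IP-CSUM"
rule and the pi_enable gate are taken from the description.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the guard types published to the SCSI
 * midlayer; the real constants live in the kernel headers. */
enum sketch_guard {
	SKETCH_GUARD_NONE,	/* no DIF/DIX capabilities published */
	SKETCH_GUARD_CRC,	/* CRC allowed in DIX buffers */
	SKETCH_GUARD_IP_CSUM,	/* IP checksum allowed in DIX buffers */
};

/* Mirrors the two module parameters (assumed layout, illustration only). */
struct sketch_pi_params {
	bool pi_enable;
	int pi_guard;
};

/* Pick the DIX guard type to advertise for a given parameter set. */
static enum sketch_guard sketch_pick_guard(const struct sketch_pi_params *p)
{
	if (!p->pi_enable)
		return SKETCH_GUARD_NONE;
	return p->pi_guard ? SKETCH_GUARD_IP_CSUM : SKETCH_GUARD_CRC;
}
```

In the series itself the decision is also gated on the device reporting
signature handover support, which this sketch leaves out.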

Signed-off-by: Alex Tabachnik 
Signed-off-by: Sagi Grimberg 
---
 drivers/infiniband/ulp/iser/iscsi_iser.c |8 
 drivers/infiniband/ulp/iser/iscsi_iser.h |3 +++
 drivers/infiniband/ulp/iser/iser_verbs.c |   13 +
 3 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c b/drivers/infiniband/ulp/iser/iscsi_iser.c
index dd03cfe..cfa952e 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.c
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.c
@@ -82,6 +82,8 @@ static unsigned int iscsi_max_lun = 512;
 module_param_named(max_lun, iscsi_max_lun, uint, S_IRUGO);
 
 int iser_debug_level = 0;
+bool iser_pi_enable = false;
+int iser_pi_guard = 0;
 
 MODULE_DESCRIPTION("iSER (iSCSI Extensions for RDMA) Datamover");
 MODULE_LICENSE("Dual BSD/GPL");
@@ -91,6 +93,12 @@ MODULE_VERSION(DRV_VER);
 module_param_named(debug_level, iser_debug_level, int, 0644);
 MODULE_PARM_DESC(debug_level, "Enable debug tracing if > 0 (default:disabled)");
 
+module_param_named(pi_enable, iser_pi_enable, bool, 0644);
+MODULE_PARM_DESC(pi_enable, "Enable T10-PI offload support (default:disabled)");
+
+module_param_named(pi_guard, iser_pi_guard, int, 0644);
+MODULE_PARM_DESC(pi_guard, "T10-PI guard_type, 0:CRC|1:IP_CSUM (default:CRC)");
+
 struct iser_global ig;
 
 void
diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.h b/drivers/infiniband/ulp/iser/iscsi_iser.h
index 623defa..011003f 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.h
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.h
@@ -317,6 +317,7 @@ struct iser_conn {
unsigned int rx_desc_head;
struct iser_rx_desc  *rx_descs;
struct ib_recv_wrrx_wr[ISER_MIN_POSTED_RX];
+   bool pi_support;
 
/* Connection memory registration pool */
union {
@@ -371,6 +372,8 @@ struct iser_global {
 
 extern struct iser_global ig;
 extern int iser_debug_level;
+extern bool iser_pi_enable;
+extern int iser_pi_guard;
 
 /* allocate connection resources needed for rdma functionality */
 int iser_conn_set_full_featured_mode(struct iscsi_conn *conn);
diff --git a/drivers/infiniband/ulp/iser/iser_verbs.c b/drivers/infiniband/ulp/iser/iser_verbs.c
index 6a5f424..4c27f55 100644
--- a/drivers/infiniband/ulp/iser/iser_verbs.c
+++ b/drivers/infiniband/ulp/iser/iser_verbs.c
@@ -607,6 +607,19 @@ static int iser_addr_handler(struct rdma_cm_id *cma_id)
ib_conn = (struct iser_conn *)cma_id->context;
ib_conn->device = device;
 
+   /* connection T10-PI support */
+   if (iser_pi_enable) {
+   if (!(device->dev_attr.device_cap_flags &
+ IB_DEVICE_SIGNATURE_HANDOVER)) {
+   iser_warn("T10-PI requested but not supported on %s, "
+ "continue without T10-PI\n",
+ ib_conn->device->ib_device->name);
+   ib_conn->pi_support = false;
+   } else {
+   ib_conn->pi_support = true;
+   }
+   }
+
ret = rdma_resolve_route(cma_id, 1000);
if (ret) {
iser_err("resolve route failed: %d\n", ret);
-- 
1.7.1



[PATCH v2 13/13] IB/iser: Publish T10-PI support to SCSI midlayer

2014-03-05 Thread Sagi Grimberg
After allocating a scsi_host we set protection types
and guard type supported.
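A minimal sketch of the capability translation this patch performs: each
protection type the device reports is turned into the matching DIF and DIX
host capability bits. All constant values below are placeholders chosen for
the sketch (the real IB_PROT_* and SHOST_* values are defined in the kernel
headers), and the table-driven form is a design choice of this example, not
the patch's own code.

```c
#include <stddef.h>

/* Placeholder device-side capability bits (sketch only). */
enum {
	SK_IB_T10DIF_TYPE_1 = 1 << 0,
	SK_IB_T10DIF_TYPE_2 = 1 << 1,
	SK_IB_T10DIF_TYPE_3 = 1 << 2,
};

/* Placeholder host-side protection bits (sketch only). */
enum {
	SK_DIF_TYPE1 = 1 << 0,
	SK_DIF_TYPE2 = 1 << 1,
	SK_DIF_TYPE3 = 1 << 2,
	SK_DIX_TYPE1 = 1 << 4,
	SK_DIX_TYPE2 = 1 << 5,
	SK_DIX_TYPE3 = 1 << 6,
};

/* Translate device protection capabilities into host DIF/DIX bits. */
static unsigned int sk_dif_prot_caps(unsigned int ib_caps)
{
	static const struct {
		unsigned int ib_bit;
		unsigned int host_bits;
	} map[] = {
		{ SK_IB_T10DIF_TYPE_1, SK_DIF_TYPE1 | SK_DIX_TYPE1 },
		{ SK_IB_T10DIF_TYPE_2, SK_DIF_TYPE2 | SK_DIX_TYPE2 },
		{ SK_IB_T10DIF_TYPE_3, SK_DIF_TYPE3 | SK_DIX_TYPE3 },
	};
	unsigned int caps = 0;
	size_t i;

	for (i = 0; i < sizeof(map) / sizeof(map[0]); i++)
		if (ib_caps & map[i].ib_bit)
			caps |= map[i].host_bits;
	return caps;
}
```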

Signed-off-by: Sagi Grimberg 
Signed-off-by: Alex Tabachnik 
---
 drivers/infiniband/ulp/iser/iscsi_iser.c |   23 ++-
 1 files changed, 22 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c b/drivers/infiniband/ulp/iser/iscsi_iser.c
index f13d7e9..a0ec2d0 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.c
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.c
@@ -435,6 +435,17 @@ static void iscsi_iser_session_destroy(struct iscsi_cls_session *cls_session)
iscsi_host_free(shost);
 }
 
+static inline unsigned int
+iser_dif_prot_caps(int prot_caps)
+{
+   return ((prot_caps & IB_PROT_T10DIF_TYPE_1) ? SHOST_DIF_TYPE1_PROTECTION |
+                                                 SHOST_DIX_TYPE1_PROTECTION : 0) |
+          ((prot_caps & IB_PROT_T10DIF_TYPE_2) ? SHOST_DIF_TYPE2_PROTECTION |
+                                                 SHOST_DIX_TYPE2_PROTECTION : 0) |
+          ((prot_caps & IB_PROT_T10DIF_TYPE_3) ? SHOST_DIF_TYPE3_PROTECTION |
+                                                 SHOST_DIX_TYPE3_PROTECTION : 0);
+}
+
 static struct iscsi_cls_session *
 iscsi_iser_session_create(struct iscsi_endpoint *ep,
  uint16_t cmds_max, uint16_t qdepth,
@@ -459,8 +470,18 @@ iscsi_iser_session_create(struct iscsi_endpoint *ep,
 * older userspace tools (before 2.0-870) did not pass us
 * the leading conn's ep so this will be NULL;
 */
-   if (ep)
+   if (ep) {
ib_conn = ep->dd_data;
+   if (ib_conn->pi_support) {
+   u32 sig_caps = ib_conn->device->dev_attr.sig_prot_cap;
+
+   scsi_host_set_prot(shost, iser_dif_prot_caps(sig_caps));
+   if (iser_pi_guard)
+   scsi_host_set_guard(shost, SHOST_DIX_GUARD_IP);
+   else
+   scsi_host_set_guard(shost, SHOST_DIX_GUARD_CRC);
+   }
+   }
 
if (iscsi_host_add(shost,
   ep ? ib_conn->device->ib_device->dma_device : NULL))
-- 
1.7.1



[PATCH v2 04/13] IB/iser: Keep IB device attributes under iser_device

2014-03-05 Thread Sagi Grimberg
For T10-PI offload support, we will need to know the
device signature offload capability upon every connection
establishment.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg 
Signed-off-by: Alex Tabachnik 
---
 drivers/infiniband/ulp/iser/iscsi_iser.h |1 +
 drivers/infiniband/ulp/iser/iser_verbs.c |   18 ++
 2 files changed, 7 insertions(+), 12 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.h b/drivers/infiniband/ulp/iser/iscsi_iser.h
index ca161df..b4290f5 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.h
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.h
@@ -260,6 +260,7 @@ struct iscsi_iser_task;
 struct iser_device {
struct ib_device *ib_device;
struct ib_pd *pd;
+   struct ib_device_attrdev_attr;
struct ib_cq *rx_cq[ISER_MAX_CQ];
struct ib_cq *tx_cq[ISER_MAX_CQ];
struct ib_mr *mr;
diff --git a/drivers/infiniband/ulp/iser/iser_verbs.c b/drivers/infiniband/ulp/iser/iser_verbs.c
index 9569e40..95fcfca 100644
--- a/drivers/infiniband/ulp/iser/iser_verbs.c
+++ b/drivers/infiniband/ulp/iser/iser_verbs.c
@@ -71,17 +71,14 @@ static void iser_event_handler(struct ib_event_handler *handler,
  */
 static int iser_create_device_ib_res(struct iser_device *device)
 {
-   int i, j;
struct iser_cq_desc *cq_desc;
-   struct ib_device_attr *dev_attr;
+   struct ib_device_attr *dev_attr = &device->dev_attr;
+   int ret, i, j;
 
-   dev_attr = kmalloc(sizeof(*dev_attr), GFP_KERNEL);
-   if (!dev_attr)
-   return -ENOMEM;
-
-   if (ib_query_device(device->ib_device, dev_attr)) {
+   ret = ib_query_device(device->ib_device, dev_attr);
+   if (ret) {
pr_warn("Query device failed for %s\n", device->ib_device->name);
-   goto dev_attr_err;
+   return ret;
}
 
/* Assign function handles  - based on FMR support */
@@ -101,7 +98,7 @@ static int iser_create_device_ib_res(struct iser_device *device)
device->iser_unreg_rdma_mem = iser_unreg_mem_fastreg;
} else {
iser_err("IB device does not support FMRs nor FastRegs, can't register memory\n");
-   goto dev_attr_err;
+   return -1;
}
 
device->cqs_used = min(ISER_MAX_CQ, device->ib_device->num_comp_vectors);
@@ -158,7 +155,6 @@ static int iser_create_device_ib_res(struct iser_device *device)
if (ib_register_event_handler(&device->event_handler))
goto handler_err;
 
-   kfree(dev_attr);
return 0;
 
 handler_err:
@@ -178,8 +174,6 @@ pd_err:
kfree(device->cq_desc);
 cq_desc_err:
iser_err("failed to allocate an IB resource\n");
-dev_attr_err:
-   kfree(dev_attr);
return -1;
 }
 
-- 
1.7.1



[PATCH v2 09/13] IB/iser: Initialize T10-PI resources

2014-03-05 Thread Sagi Grimberg
From: Alex Tabachnik 

During connection establishment we also initialize
T10-PI resources (QP, PI contexts) in order to support
SCSI's protection operations.

Signed-off-by: Alex Tabachnik 
Signed-off-by: Sagi Grimberg 
---
 drivers/infiniband/ulp/iser/iscsi_iser.h |   21 -
 drivers/infiniband/ulp/iser/iser_verbs.c |   77 +++---
 2 files changed, 90 insertions(+), 8 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.h b/drivers/infiniband/ulp/iser/iscsi_iser.h
index 011003f..99fc8b8 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.h
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.h
@@ -134,6 +134,15 @@
ISER_MAX_TX_MISC_PDUS+ \
ISER_MAX_RX_MISC_PDUS)
 
+/* Max registration work requests per command */
+#define ISER_MAX_REG_WR_PER_CMD5
+
+/* For Signature we don't support DATAOUTs so no need to make room for them */
+#define ISER_QP_SIG_MAX_REQ_DTOS   (ISER_DEF_XMIT_CMDS_MAX *   \
+   (1 + ISER_MAX_REG_WR_PER_CMD) + \
+   ISER_MAX_TX_MISC_PDUS + \
+   ISER_MAX_RX_MISC_PDUS)
+
 #define ISER_VER   0x10
 #define ISER_WSV   0x08
 #define ISER_RSV   0x04
@@ -281,7 +290,16 @@ struct iser_device {
 };
 
 enum iser_reg_indicator {
-   ISER_DATA_KEY_VALID = 1 << 0,
+   ISER_DATA_KEY_VALID = 1 << 0,
+   ISER_PROT_KEY_VALID = 1 << 1,
+   ISER_SIG_KEY_VALID  = 1 << 2,
+   ISER_FASTREG_PROTECTED  = 1 << 3,
+};
+
+struct iser_pi_context {
+   struct ib_mr   *prot_mr;
+   struct ib_fast_reg_page_list   *prot_frpl;
+   struct ib_mr   *sig_mr;
 };
 
 struct fast_reg_descriptor {
@@ -289,6 +307,7 @@ struct fast_reg_descriptor {
/* For fast registration - FRWR */
struct ib_mr *data_mr;
struct ib_fast_reg_page_list *data_frpl;
+   struct iser_pi_context   *pi_ctx;
/* registration indicators container */
u8reg_indicators;
 };
diff --git a/drivers/infiniband/ulp/iser/iser_verbs.c b/drivers/infiniband/ulp/iser/iser_verbs.c
index 4c27f55..0404c71 100644
--- a/drivers/infiniband/ulp/iser/iser_verbs.c
+++ b/drivers/infiniband/ulp/iser/iser_verbs.c
@@ -275,7 +275,7 @@ void iser_free_fmr_pool(struct iser_conn *ib_conn)
 
 static int
 iser_create_fastreg_desc(struct ib_device *ib_device, struct ib_pd *pd,
-struct fast_reg_descriptor *desc)
+bool pi_enable, struct fast_reg_descriptor *desc)
 {
int ret;
 
@@ -294,12 +294,64 @@ iser_create_fastreg_desc(struct ib_device *ib_device, struct ib_pd *pd,
iser_err("Failed to allocate ib_fast_reg_mr err=%d\n", ret);
goto fast_reg_mr_failure;
}
+   desc->reg_indicators |= ISER_DATA_KEY_VALID;
+
+   if (pi_enable) {
+   struct ib_mr_init_attr mr_init_attr = {0};
+   struct iser_pi_context *pi_ctx = NULL;
+
+   desc->pi_ctx = kzalloc(sizeof(*desc->pi_ctx), GFP_KERNEL);
+   if (!desc->pi_ctx) {
+   iser_err("Failed to allocate pi context\n");
+   ret = -ENOMEM;
+   goto pi_ctx_alloc_failure;
+   }
+   pi_ctx = desc->pi_ctx;
+
+   pi_ctx->prot_frpl = ib_alloc_fast_reg_page_list(ib_device,
+   ISCSI_ISER_SG_TABLESIZE);
+   if (IS_ERR(pi_ctx->prot_frpl)) {
+   ret = PTR_ERR(pi_ctx->prot_frpl);
+   iser_err("Failed to allocate prot frpl ret=%d\n",
+ret);
+   goto prot_frpl_failure;
+   }
+
+   pi_ctx->prot_mr = ib_alloc_fast_reg_mr(pd,
+   ISCSI_ISER_SG_TABLESIZE + 1);
+   if (IS_ERR(pi_ctx->prot_mr)) {
+   ret = PTR_ERR(pi_ctx->prot_mr);
+   iser_err("Failed to allocate prot frmr ret=%d\n",
+ret);
+   goto prot_mr_failure;
+   }
+   desc->reg_indicators |= ISER_PROT_KEY_VALID;
+
+   mr_init_attr.max_reg_descriptors = 2;
+   mr_init_attr.flags |= IB_MR_SIGNATURE_EN;
+   pi_ctx->sig_mr = ib_create_mr(pd, &mr_init_attr);
+   if (IS_ERR(pi_ctx->sig_mr)) {
+   ret = PTR_ERR(pi_ctx->sig_mr);
+   iser_err("Failed to allocate signature enabled mr err=%d\n",
+ret);
+   goto sig_mr_failure;
+   }
+   desc->reg_indicators |= ISER_SIG_KEY_VALID;
+   }

[PATCH v2 06/13] IB/iser: Generalize iser_unmap_task_data and finalize_rdma_unaligned_sg

2014-03-05 Thread Sagi Grimberg
These routines operate on data buffers and may also work with
protection information buffers. So we generalize them to handle
an iser_data_buf which can be the command data or command protection
information.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg 
---
 drivers/infiniband/ulp/iser/iscsi_iser.h |9 --
 drivers/infiniband/ulp/iser/iser_initiator.c |   37 +---
 drivers/infiniband/ulp/iser/iser_memory.c|   40 ++---
 3 files changed, 48 insertions(+), 38 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.h b/drivers/infiniband/ulp/iser/iscsi_iser.h
index 5660714..623defa 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.h
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.h
@@ -410,8 +410,10 @@ void iser_task_rdma_finalize(struct iscsi_iser_task *task);
 
 void iser_free_rx_descriptors(struct iser_conn *ib_conn);
 
-void iser_finalize_rdma_unaligned_sg(struct iscsi_iser_task *task,
-enum iser_data_dir cmd_dir);
+void iser_finalize_rdma_unaligned_sg(struct iscsi_iser_task *iser_task,
+struct iser_data_buf *mem,
+struct iser_data_buf *mem_copy,
+enum iser_data_dir cmd_dir);
 
 int  iser_reg_rdma_mem_fmr(struct iscsi_iser_task *task,
   enum iser_data_dir cmd_dir);
@@ -441,7 +443,8 @@ int iser_dma_map_task_data(struct iscsi_iser_task *iser_task,
enum   iser_data_dir   iser_dir,
enum   dma_data_direction  dma_dir);
 
-void iser_dma_unmap_task_data(struct iscsi_iser_task *iser_task);
+void iser_dma_unmap_task_data(struct iscsi_iser_task *iser_task,
+ struct iser_data_buf *data);
 int  iser_initialize_task_headers(struct iscsi_task *task,
struct iser_tx_desc *tx_desc);
 int iser_alloc_rx_descriptors(struct iser_conn *ib_conn, struct iscsi_session *session);
diff --git a/drivers/infiniband/ulp/iser/iser_initiator.c b/drivers/infiniband/ulp/iser/iser_initiator.c
index 334f34b..58e14c7 100644
--- a/drivers/infiniband/ulp/iser/iser_initiator.c
+++ b/drivers/infiniband/ulp/iser/iser_initiator.c
@@ -644,27 +644,42 @@ void iser_task_rdma_init(struct iscsi_iser_task *iser_task)
 void iser_task_rdma_finalize(struct iscsi_iser_task *iser_task)
 {
struct iser_device *device = iser_task->iser_conn->ib_conn->device;
-   int is_rdma_aligned = 1;
+   int is_rdma_data_aligned = 1;
 
/* if we were reading, copy back to unaligned sglist,
 * anyway dma_unmap and free the copy
 */
if (iser_task->data_copy[ISER_DIR_IN].copy_buf != NULL) {
-   is_rdma_aligned = 0;
-   iser_finalize_rdma_unaligned_sg(iser_task, ISER_DIR_IN);
+   is_rdma_data_aligned = 0;
+   iser_finalize_rdma_unaligned_sg(iser_task,
+   &iser_task->data[ISER_DIR_IN],
+   &iser_task->data_copy[ISER_DIR_IN],
+   ISER_DIR_IN);
}
+
if (iser_task->data_copy[ISER_DIR_OUT].copy_buf != NULL) {
-   is_rdma_aligned = 0;
-   iser_finalize_rdma_unaligned_sg(iser_task, ISER_DIR_OUT);
+   is_rdma_data_aligned = 0;
+   iser_finalize_rdma_unaligned_sg(iser_task,
+   &iser_task->data[ISER_DIR_OUT],
+   &iser_task->data_copy[ISER_DIR_OUT],
+   ISER_DIR_OUT);
}
 
-   if (iser_task->dir[ISER_DIR_IN])
+   if (iser_task->dir[ISER_DIR_IN]) {
device->iser_unreg_rdma_mem(iser_task, ISER_DIR_IN);
+   if (is_rdma_data_aligned)
+   iser_dma_unmap_task_data(iser_task,
+&iser_task->data[ISER_DIR_IN]);
 
-   if (iser_task->dir[ISER_DIR_OUT])
-   device->iser_unreg_rdma_mem(iser_task, ISER_DIR_OUT);
+   }
 
-   /* if the data was unaligned, it was already unmapped and then copied */
-   if (is_rdma_aligned)
-   iser_dma_unmap_task_data(iser_task);
+   if (iser_task->dir[ISER_DIR_OUT]) {
+   device->iser_unreg_rdma_mem(iser_task, ISER_DIR_OUT);
+   if (is_rdma_data_aligned)
+   iser_dma_unmap_task_data(iser_task,
+   &iser_task->data[ISER_DIR_OUT]);
+   if (prot_count && is_rdma_prot_aligned)
+   iser_dma_unmap_task_data(iser_task,
+   &iser_task->prot[ISER_DIR_OUT]);
+   }
 }
diff --git a/drivers/infiniband/ulp/iser/iser_memory.c b/drivers/infiniband/ulp/iser/iser_memory.c
in

[PATCH v2 05/13] IB/iser: Replace fastreg descriptor valid bool with indicators container

2014-03-05 Thread Sagi Grimberg
In T10-PI support we will have memory keys for protection
buffers and signature transactions. We prefer to compact
indicators rather than keeping multiple bools.

This commit does not change any functionality.
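To make the "indicators container" idea concrete, here is a small standalone
sketch of packing several validity flags into one byte and setting, clearing
and testing them. The sk_ names are hypothetical; this patch itself only
defines the data-key flag, and the two extra flags are included purely to
show why a bitmask scales better than separate bools.

```c
#include <stdbool.h>
#include <stdint.h>

/* One bit per indicator; only the first exists in this patch. */
enum sk_reg_indicator {
	SK_DATA_KEY_VALID = 1 << 0,
	SK_PROT_KEY_VALID = 1 << 1,	/* illustrative */
	SK_SIG_KEY_VALID  = 1 << 2,	/* illustrative */
};

/* Stand-in for the descriptor that carries the indicators. */
struct sk_desc {
	uint8_t reg_indicators;
};

static void sk_set(struct sk_desc *d, uint8_t flag)
{
	d->reg_indicators |= flag;
}

static void sk_clear(struct sk_desc *d, uint8_t flag)
{
	d->reg_indicators &= (uint8_t)~flag;
}

static bool sk_test(const struct sk_desc *d, uint8_t flag)
{
	return (d->reg_indicators & flag) != 0;
}
```

Clearing one flag leaves the others untouched, which is the property the
later patches in the series rely on when more keys are tracked per
descriptor.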

Signed-off-by: Sagi Grimberg 
Signed-off-by: Alex Tabachnik 
---
 drivers/infiniband/ulp/iser/iscsi_iser.h  |8 ++--
 drivers/infiniband/ulp/iser/iser_memory.c |4 ++--
 drivers/infiniband/ulp/iser/iser_verbs.c  |2 +-
 3 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.h b/drivers/infiniband/ulp/iser/iscsi_iser.h
index b4290f5..5660714 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.h
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.h
@@ -280,13 +280,17 @@ struct iser_device {
enum iser_data_dir cmd_dir);
 };
 
+enum iser_reg_indicator {
+   ISER_DATA_KEY_VALID = 1 << 0,
+};
+
 struct fast_reg_descriptor {
struct list_head  list;
/* For fast registration - FRWR */
struct ib_mr *data_mr;
struct ib_fast_reg_page_list *data_frpl;
-   /* Valid for fast registration flag */
-   bool  valid;
+   /* registration indicators container */
+   u8reg_indicators;
 };
 
 struct iser_conn {
diff --git a/drivers/infiniband/ulp/iser/iser_memory.c b/drivers/infiniband/ulp/iser/iser_memory.c
index d25587e..a7a0d3e 100644
--- a/drivers/infiniband/ulp/iser/iser_memory.c
+++ b/drivers/infiniband/ulp/iser/iser_memory.c
@@ -479,7 +479,7 @@ static int iser_fast_reg_mr(struct iscsi_iser_task *iser_task,
return -EINVAL;
}
 
-   if (!desc->valid) {
+   if (!(desc->reg_indicators & ISER_DATA_KEY_VALID)) {
memset(&inv_wr, 0, sizeof(inv_wr));
inv_wr.wr_id = ISER_FASTREG_LI_WRID;
inv_wr.opcode = IB_WR_LOCAL_INV;
@@ -514,7 +514,7 @@ static int iser_fast_reg_mr(struct iscsi_iser_task *iser_task,
iser_err("fast registration failed, ret:%d\n", ret);
return ret;
}
-   desc->valid = false;
+   desc->reg_indicators &= ~ISER_DATA_KEY_VALID;
 
sge->lkey = desc->data_mr->lkey;
sge->addr = desc->data_frpl->page_list[0] + offset;
diff --git a/drivers/infiniband/ulp/iser/iser_verbs.c b/drivers/infiniband/ulp/iser/iser_verbs.c
index 95fcfca..6a5f424 100644
--- a/drivers/infiniband/ulp/iser/iser_verbs.c
+++ b/drivers/infiniband/ulp/iser/iser_verbs.c
@@ -296,7 +296,7 @@ iser_create_fastreg_desc(struct ib_device *ib_device, struct ib_pd *pd,
}
iser_info("Create fr_desc %p page_list %p\n",
  desc, desc->data_frpl->page_list);
-   desc->valid = true;
+   desc->reg_indicators |= ISER_DATA_KEY_VALID;
 
return 0;
 
-- 
1.7.1



[PATCH v2 11/13] SCSI/libiscsi: Add check_protection callback for transports

2014-03-05 Thread Sagi Grimberg
iSCSI needs to be at least aware that a task involves protection
information. In case it does, after the transaction completed
libiscsi will ask the transport to check the protection status
of the transaction.

Unlike transport errors, DIF errors should not prevent successful
completion of the transaction from the transport point of view,
but should be escalated to the SCSI mid-layer when constructing the
SCSI result and sense data.

The check_protection routine will return the ascq corresponding to the
DIF error that occurred (or 0 if no error happened).

return ascq:
- 0x1: GUARD_CHECK_FAILED
- 0x2: APPTAG_CHECK_FAILED
- 0x3: REFTAG_CHECK_FAILED
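The sense data that results from a non-zero ascq can be sketched outside the
kernel as follows. The byte values for offsets 7 to 10, the ASC of 0x10 and
the big-endian sector at offset 12 follow the patch below; the 0x72 response
code and the 0x05 sense key are what descriptor-format sense with ILLEGAL
REQUEST is assumed to encode. The function name and buffer size are
hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define SK_SENSE_LEN 32	/* assumed buffer size for this sketch */

/* Fill a descriptor-format sense buffer for a protection check failure. */
static void sk_build_dif_sense(uint8_t buf[SK_SENSE_LEN], uint8_t ascq,
			       uint64_t sector)
{
	int i;

	memset(buf, 0, SK_SENSE_LEN);
	buf[0] = 0x72;	/* descriptor-format sense, current error */
	buf[1] = 0x05;	/* sense key: ILLEGAL REQUEST */
	buf[2] = 0x10;	/* ASC shared by the three check failures */
	buf[3] = ascq;	/* 0x1 guard, 0x2 app tag, 0x3 ref tag */
	buf[7] = 0x0c;	/* additional sense length */
	buf[8] = 0x00;	/* information descriptor type */
	buf[9] = 0x0a;	/* additional descriptor length */
	buf[10] = 0x80;	/* validity bit */
	/* failing sector, stored big-endian in bytes 12..19 */
	for (i = 0; i < 8; i++)
		buf[12 + i] = (uint8_t)(sector >> (56 - 8 * i));
}
```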

Signed-off-by: Sagi Grimberg 
Signed-off-by: Alex Tabachnik 
---
 drivers/scsi/libiscsi.c |   32 
 include/scsi/libiscsi.h |4 
 include/scsi/scsi_transport_iscsi.h |1 +
 3 files changed, 37 insertions(+), 0 deletions(-)

diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
index 4046241..3c11acf 100644
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -395,6 +395,10 @@ static int iscsi_prep_scsi_cmd_pdu(struct iscsi_task *task)
if (rc)
return rc;
}
+
+   if (scsi_get_prot_op(sc) != SCSI_PROT_NORMAL)
+   task->protected = true;
+
if (sc->sc_data_direction == DMA_TO_DEVICE) {
unsigned out_len = scsi_out(sc)->length;
struct iscsi_r2t_info *r2t = &task->unsol_r2t;
@@ -823,6 +827,33 @@ static void iscsi_scsi_cmd_rsp(struct iscsi_conn *conn, struct iscsi_hdr *hdr,
 
sc->result = (DID_OK << 16) | rhdr->cmd_status;
 
+   if (task->protected) {
+   sector_t sector;
+   u8 ascq;
+
+   /**
+* Transports that didn't implement check_protection
+* callback but still published T10-PI support to scsi-mid
+* deserve this BUG_ON.
+**/
+   BUG_ON(!session->tt->check_protection);
+
+   ascq = session->tt->check_protection(task, &sector);
+   if (ascq) {
+   sc->result = DRIVER_SENSE << 24 |
+SAM_STAT_CHECK_CONDITION;
+   scsi_build_sense_buffer(1, sc->sense_buffer,
+   ILLEGAL_REQUEST, 0x10, ascq);
+   sc->sense_buffer[7] = 0xc; /* Additional sense length */
+   sc->sense_buffer[8] = 0;   /* Information desc type */
+   sc->sense_buffer[9] = 0xa; /* Additional desc length */
+   sc->sense_buffer[10] = 0x80; /* Validity bit */
+
+   put_unaligned_be64(sector, &sc->sense_buffer[12]);
+   goto out;
+   }
+   }
+
if (rhdr->response != ISCSI_STATUS_CMD_COMPLETED) {
sc->result = DID_ERROR << 16;
goto out;
@@ -1567,6 +1598,7 @@ static inline struct iscsi_task *iscsi_alloc_task(struct iscsi_conn *conn,
task->have_checked_conn = false;
task->last_timeout = jiffies;
task->last_xfer = jiffies;
+   task->protected = false;
INIT_LIST_HEAD(&task->running);
return task;
 }
diff --git a/include/scsi/libiscsi.h b/include/scsi/libiscsi.h
index 309f513..1457c26 100644
--- a/include/scsi/libiscsi.h
+++ b/include/scsi/libiscsi.h
@@ -133,6 +133,10 @@ struct iscsi_task {
unsigned long   last_xfer;
unsigned long   last_timeout;
boolhave_checked_conn;
+
+   /* T10 protection information */
+   boolprotected;
+
/* state set/tested under session->lock */
int state;
atomic_trefcount;
diff --git a/include/scsi/scsi_transport_iscsi.h b/include/scsi/scsi_transport_iscsi.h
index 88640a4..2555ee5 100644
--- a/include/scsi/scsi_transport_iscsi.h
+++ b/include/scsi/scsi_transport_iscsi.h
@@ -167,6 +167,7 @@ struct iscsi_transport {
 struct iscsi_bus_flash_conn *fnode_conn);
int (*logout_flashnode_sid) (struct iscsi_cls_session *cls_sess);
int (*get_host_stats) (struct Scsi_Host *shost, char *buf, int len);
+   u8 (*check_protection)(struct iscsi_task *task, sector_t *sector);
 };
 
 /*
-- 
1.7.1



[PATCH v2 07/13] IB/iser: Generalize fall_to_bounce_buf routine

2014-03-05 Thread Sagi Grimberg
Unaligned SG-lists may also happen for protection
information. Generalize the bounce buffer routine to handle
any iser_data_buf which may be data and/or protection.

This patch does not change any functionality.
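The bounce-buffer idea can be sketched in plain userspace C: sum the lengths
of the scattered fragments, allocate one contiguous buffer of that size, and
copy the fragments into it in order. The sk_frag type and function names are
hypothetical, and malloc/memcpy stand in for the kernel's page allocation
and kmap-based copy; this is an illustration of the technique, not the
driver's code.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* One scattered fragment (stand-in for a scatterlist entry). */
struct sk_frag {
	const void *addr;
	size_t len;
};

/* Total length is derived from the fragments, not from a cached field. */
static size_t sk_total_len(const struct sk_frag *frags, size_t n)
{
	size_t total = 0;
	size_t i;

	for (i = 0; i < n; i++)
		total += frags[i].len;
	return total;
}

/* Copy all fragments into one contiguous buffer; caller frees it. */
static void *sk_linearize(const struct sk_frag *frags, size_t n,
			  size_t *out_len)
{
	size_t total = sk_total_len(frags, n);
	char *buf, *p;
	size_t i;

	if (total == 0)
		return NULL;
	buf = malloc(total);
	if (!buf)
		return NULL;
	p = buf;
	for (i = 0; i < n; i++) {
		memcpy(p, frags[i].addr, frags[i].len);
		p += frags[i].len;
	}
	*out_len = total;
	return buf;
}
```

Because the helper takes the fragment list as an argument instead of reaching
into a fixed task field, the same routine can serve both data and protection
buffers, which is the point of the generalization.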

Signed-off-by: Sagi Grimberg 
---
 drivers/infiniband/ulp/iser/iser_memory.c |   53 -
 1 files changed, 29 insertions(+), 24 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iser_memory.c b/drivers/infiniband/ulp/iser/iser_memory.c
index a933508..2c3f4b1 100644
--- a/drivers/infiniband/ulp/iser/iser_memory.c
+++ b/drivers/infiniband/ulp/iser/iser_memory.c
@@ -45,13 +45,19 @@
  * iser_start_rdma_unaligned_sg
  */
 static int iser_start_rdma_unaligned_sg(struct iscsi_iser_task *iser_task,
+   struct iser_data_buf *data,
+   struct iser_data_buf *data_copy,
enum iser_data_dir cmd_dir)
 {
-   int dma_nents;
-   struct ib_device *dev;
+   struct ib_device *dev = iser_task->iser_conn->ib_conn->device->ib_device;
+   struct scatterlist *sgl = (struct scatterlist *)data->buf;
+   struct scatterlist *sg;
char *mem = NULL;
-   struct iser_data_buf *data = &iser_task->data[cmd_dir];
-   unsigned long  cmd_data_len = data->data_len;
+   unsigned long  cmd_data_len = 0;
+   int dma_nents, i;
+
+   for_each_sg(sgl, sg, data->size, i)
+   cmd_data_len += ib_sg_dma_len(dev, sg);
 
if (cmd_data_len > ISER_KMALLOC_THRESHOLD)
mem = (void *)__get_free_pages(GFP_ATOMIC,
@@ -61,17 +67,16 @@ static int iser_start_rdma_unaligned_sg(struct iscsi_iser_task *iser_task,
 
if (mem == NULL) {
iser_err("Failed to allocate mem size %d %d for copying sglist\n",
-data->size,(int)cmd_data_len);
+data->size, (int)cmd_data_len);
return -ENOMEM;
}
 
if (cmd_dir == ISER_DIR_OUT) {
/* copy the unaligned sg the buffer which is used for RDMA */
-   struct scatterlist *sgl = (struct scatterlist *)data->buf;
-   struct scatterlist *sg;
int i;
char *p, *from;
 
+   sgl = (struct scatterlist *)data->buf;
p = mem;
for_each_sg(sgl, sg, data->size, i) {
from = kmap_atomic(sg_page(sg));
@@ -83,22 +88,19 @@ static int iser_start_rdma_unaligned_sg(struct iscsi_iser_task *iser_task,
}
}
 
-   sg_init_one(&iser_task->data_copy[cmd_dir].sg_single, mem, cmd_data_len);
-   iser_task->data_copy[cmd_dir].buf  =
-   &iser_task->data_copy[cmd_dir].sg_single;
-   iser_task->data_copy[cmd_dir].size = 1;
+   sg_init_one(&data_copy->sg_single, mem, cmd_data_len);
+   data_copy->buf = &data_copy->sg_single;
+   data_copy->size = 1;
+   data_copy->copy_buf = mem;
 
-   iser_task->data_copy[cmd_dir].copy_buf  = mem;
-
-   dev = iser_task->iser_conn->ib_conn->device->ib_device;
-   dma_nents = ib_dma_map_sg(dev,
- &iser_task->data_copy[cmd_dir].sg_single,
- 1,
+   dma_nents = ib_dma_map_sg(dev, &data_copy->sg_single, 1,
  (cmd_dir == ISER_DIR_OUT) ?
  DMA_TO_DEVICE : DMA_FROM_DEVICE);
BUG_ON(dma_nents == 0);
 
-   iser_task->data_copy[cmd_dir].dma_nents = dma_nents;
+   data_copy->dma_nents = dma_nents;
+   data_copy->data_len = cmd_data_len;
+
return 0;
 }
 
@@ -341,11 +343,12 @@ void iser_dma_unmap_task_data(struct iscsi_iser_task *iser_task,
 
 static int fall_to_bounce_buf(struct iscsi_iser_task *iser_task,
  struct ib_device *ibdev,
+ struct iser_data_buf *mem,
+ struct iser_data_buf *mem_copy,
  enum iser_data_dir cmd_dir,
  int aligned_len)
 {
struct iscsi_conn*iscsi_conn = iser_task->iser_conn->iscsi_conn;
-   struct iser_data_buf *mem = &iser_task->data[cmd_dir];
 
iscsi_conn->fmr_unalign_cnt++;
iser_warn("rdma alignment violation (%d/%d aligned) or FMR not supported\n",
@@ -355,12 +358,12 @@ static int fall_to_bounce_buf(struct iscsi_iser_task *iser_task,
iser_data_buf_dump(mem, ibdev);
 
/* unmap the command data before accessing it */
-   iser_dma_unmap_task_data(iser_task, &iser_task->data[cmd_dir]);
+   iser_dma_unmap_task_data(iser_task, mem);
 
/* allocate copy buf, if we are writing, copy the */
/* unaligned scatterlist, dma map the copy*/
-   if (iser_start_rdma_unaligned_sg(iser_task, cmd_dir) != 0)
-   return -ENOMEM;
+   if (iser_start_rdma_unaligned_sg(iser_task, mem, mem_copy, cmd_dir) != 0

[PATCH] SCSI/libiscsi: Allow transport xmit_task to fail with non-transient error codes

2014-03-05 Thread Sagi Grimberg
Allow the transport callback xmit_task to return either transient
errors such as -ENOMEM or -EAGAIN to retry the task, or
non-transient errors such as -EINVAL to abort it.
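The classification rule can be sketched as a small standalone function. The
enum and function names are hypothetical; only the split between
-ENOMEM/-EAGAIN (retry) and everything else (abort) is taken from the
description.

```c
#include <errno.h>

/* What the caller should do with a task after xmit_task returns. */
enum sk_xmit_action {
	SK_XMIT_SENT,	/* rc == 0: task went out */
	SK_XMIT_RETRY,	/* transient error: requeue and try again */
	SK_XMIT_ABORT,	/* non-transient error: fail the command */
};

/* Classify an xmit_task return code (0 or a negative errno). */
static enum sk_xmit_action sk_classify_xmit(int rc)
{
	if (rc == 0)
		return SK_XMIT_SENT;
	if (rc == -ENOMEM || rc == -EAGAIN)
		return SK_XMIT_RETRY;
	return SK_XMIT_ABORT;
}
```

Treating unknown error codes as non-transient is the conservative default in
this sketch: a transport has to opt in to a retry by returning one of the two
transient codes.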

Signed-off-by: Sagi Grimberg 
---
 drivers/scsi/libiscsi.c |   13 ++---
 1 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
index 3c11acf..749e7bf 100644
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -1707,10 +1707,17 @@ int iscsi_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *sc)
goto prepd_fault;
}
}
-   if (session->tt->xmit_task(task)) {
+
+   reason = session->tt->xmit_task(task);
+   if (reason) {
session->cmdsn--;
-   reason = FAILURE_SESSION_NOT_READY;
-   goto prepd_reject;
+   if (reason == -ENOMEM || reason == -EAGAIN) {
+   reason = FAILURE_SESSION_NOT_READY;
+   goto prepd_reject;
+   } else {
+   sc->result = DID_ABORT << 16;
+   goto prepd_fault;
+   }
}
} else {
list_add_tail(&task->running, &conn->cmdqueue);
-- 
1.7.1



Slow I/O performance on SAS1064

2014-03-05 Thread Markus
Hi

I have a problem with the SATA disks on my Sunfire v245 and its LSI
controller: they are very slow.

So I tested with dd, and while dd was running I watched the dstat results
for sda in another terminal:


Copy from ramdisk to sda.
=== Test ext4 ===
DD

root@outpost:/ramdisk# dd if=debian-7.4.0-sparc-netinst.iso of=/root/test.dd
353344+0 records in
353344+0 records out
180912128 bytes (181 MB) copied, 24.6384 s, 7.3 MB/s
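As a sanity check on the numbers above, this small sketch redoes dd's
arithmetic. It assumes dd ran with its default 512-byte block size, which is
what the record count implies (353344 records of 512 bytes gives exactly the
reported byte total); the function names are made up for the example.

```c
#include <stdint.h>

/* dd's default block size when no bs= option is given. */
#define SK_DD_DEFAULT_BS 512

/* Bytes moved by a dd run of whole records. */
static uint64_t sk_dd_bytes(uint64_t records, uint64_t block_size)
{
	return records * block_size;
}

/* Throughput in decimal megabytes per second, as dd reports it. */
static double sk_mb_per_sec(uint64_t bytes, double seconds)
{
	return (double)bytes / seconds / 1e6;
}
```

Since the run used 512-byte blocks, repeating the test with a larger bs=
value might help separate small-write overhead from the raw speed of the
controller and disk.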

Dstat:

total-cpu-usage --dsk/sda-- ---io/sda--
usr sys idl wai hiq siq| read  writ| read  writ
  1   0 100   0   0   0|   0 0 |   0 0 
  0   0 100   0   0   0|   0 0 |   0 0 
  2   1  97   0   0   0|   0 0 |   0 0 
  1   5  94   0   0   0|   0 0 |   0 0 
  7  43  50   0   0   0|   0 0 |   0 0 
  7  43  50   0   0   0|   0 0 |   0 0 
 10  40  50   0   0   0|   0 0 |   0 0 
 12  45  41   3   0   0|   0 0 |   0 0 
  6  42  36  17   0   1|   0   512k|   0  1.00 
  0   0   0  99   0   0|   0  4096k|   0  8.00 
  0   2   0  98   0   0|   0  5120k|   0  10.0 
  0   1   0  99   0   0|   0  7680k|   0  15.0 
  1   0   0 100   0   0|   0  5632k|   0  11.0 
  0   0   0  99   0   0|   0  7168k|   0  14.0 
  1   1   0  99   0   0|   0  6656k|   0  13.0 
  0   0   0 100   0   0|   0  6144k|   0  12.0 
  1   0   0 100   0   0|   0  7680k|   0  15.0 
  0   1  22  78   0   0|   0  4608k|   0  9.00 
  0   0  50  50   0   0|   0  3072k|   0  6.00 
  1   0  50  50   0   1|   0  7168k|   0  14.0 
  1   1  50  49   0   0|4096B 5120k|1.00  10.0 
  0   0  50  50   0   0|   0  8704k|   0  17.0 
  0   0  50  49   0   0|   0  7680k|   0  15.0 
  0   1  50  50   0   0|   0  7168k|   0  14.0 
  1   0  49  49   0   0|   0  7168k|   0  14.0 
  1   0  50  50   0   0|   0  7680k|   0  15.0 
  0   0  50  50   0   0|   0  7168k|   0  14.0 
  0   0  50  50   0   0|   0  2560k|   0  5.00 
  0   1  30  69   0   0|   0  3624k|   0  8.00 
  1   0   0  99   0   0|   0  4100k|   0  9.00 
  0   0   0 100   0   0|   0  5632k|   0  11.0 
  0   0   0  99   0   0|   0  3584k|   0  7.00 
  1   0   0 100   0   0|   0  4608k|   0  9.00 
  0   0   0  99   0   1|   0  5120k|   0  10.0 
  1   0   4  96   0   0|   0  2560k|   0  6.00 
  0   0  50  50   0   0|   0  3072k|   0  6.00 
  0   0  49  50   0   0|   0  5120k|   0  10.0 
  1   0  50  50   0   0|   0  5120k|   0  10.0 
  0   0  50  50   0   0|   0  4096k|   0  8.00 
  1   0  50  50   0   0|   0  5120k|   0  10.0 
  0   0  50  50   0   0|   0  5120k|   0  10.0 
  0   1  57  42   0   0|   0  1125k|   0  17.0 
  0   0 100   0   0   0|   0 0 |   0 0 
  1   0 100   0   0   0|   0 0 |   0 0 
  1   0 100   0   0   0|   0 0 |   0 0 
  0   0 100   0   0   0|   0 0 |   0 0 
  0   0 100   0   0   0|   0 0 |   0 0

---
Information:
Kernel 3.13.5 (stable release)
Debian7 (stable)
hdparm v9.39


Controller 
LSI SAS1064
cat /proc/mpt/ioc0/info 
ioc0:
  ProductID = 0x2701 (LSISAS1064 A3)
  FWVersion = 0x01080400
  MsgVersion = 0x0105
  FirstWhoInit = 0x00
  EventState = 0x00
  CurrentHostMfaHighAddr = 0x
  CurrentSenseBufferHighAddr = 0x
  MaxChainDepth = 0x60 frames
  MinBlockSize = 0x20 bytes
  RequestFrames @ 0xfc137f602800 (Dma @ 0xc000a800)
{CurReqSz=128} x {CurReqDepth=511} = 65408 bytes ^= 0x1
{MaxReqSz=128}   {MaxReqDepth=511}
  Frames   @ 0xfc137f60 (Dma @ 0xc0008000)
{CurRepSz=80} x {CurRepDepth=128} = 10240 bytes ^= 0x2880
{MaxRepSz=0}   {MaxRepDepth=511}
  MaxDevices = 63
  MaxBuses = 1
  PortNumber = 1 (of 1)

cat /proc/mpt/version 
mptlinux-3.04.20
  Fusion MPT base driver
  Fusion MPT SAS host driver
  Fusion MPT ioctl driver

---
Hdparm Information:
hdparm -I /dev/sda

/dev/sda:
SG_IO: bad/missing sense data, sb[]:  70 00 05 00 00 00 00 0a 00 00 00 
00 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

ATA device, with non-removable media
Standards:
Likely used: 1
Configuration:
Logical max current
cylinders   0   0
heads   0   0
sectors/track   0   0
--
Logical/Physical Sector size:   512 bytes
device size with M = 1024*1024:   0 MBytes
device size with M = 1000*1000:   0 MBytes 
cache/buffer size  = unknown
Capabilities:
IORDY not likely
Cannot perform double-word IO
R/W multiple sector transfer: not supported

Re: [PATCH v1 10/13] IB/iser: Support T10-PI operations

2014-03-05 Thread Mike Christie
On 03/04/2014 03:59 AM, Sagi Grimberg wrote:
> On 3/4/2014 11:38 AM, Or Gerlitz wrote:
>> On 03/03/2014 06:44, Mike Christie wrote:
>>> The xmit_task callout does not handle failures like EINVAL. If the above map
>>> calls fail then you would get infinite retries. You would currently want
>>> to do the mapping in the init_task callout instead.
>>>
>>> If it makes it easier on the driver implementation then it is ok to
>>> modify the xmit_task callers so that they handle multiple error codes
>>> for drivers like iser that have the xmit_task callout called from
>>> iscsi_queuecommand.
>>
>> Mike,
>>
>> After looking on the code with Sagi,  it seems to us that the correct
>> way to go here, would be to enhance in iscsi_queuecommand the
>> processing of the result returned by session->tt->xmit_task(task) to
>> behave in a similar manner to how the return value of
>> iscsi_prep_scsi_cmd_pdu() is treated. E.g for errors such as ENOMEM
>> and EAGAIN take the "reject" flow which would cause the SCSI midlayer
>> to retry the command and for other return values go to the "fault"
>> flow which will cause the ML to abort the command.
>>
>> Or.
>>
> 
> Yes, we were thinking about the following:
> --- a/drivers/scsi/libiscsi.c
> +++ b/drivers/scsi/libiscsi.c
> @@ -1707,10 +1707,17 @@ int iscsi_queuecommand(struct Scsi_Host *host,
> struct scsi_cmnd *sc)
> goto prepd_fault;
> }
> }
> -   if (session->tt->xmit_task(task)) {
> -   session->cmdsn--;
> -   reason = FAILURE_SESSION_NOT_READY;
> -   goto prepd_reject;
> +
> +   reason = session->tt->xmit_task(task);
> +   if (reason) {
> +   if (reason == -ENOMEM || reason == -EAGAIN) {
> +   session->cmdsn--;
> +   reason = FAILURE_SESSION_NOT_READY;
> +   goto prepd_reject;
> +   } else {
> +   sc->result = DID_ABORT << 16;
> +   goto prepd_fault;
> +   }

Or is correct.

If iscsi_prep_scsi_cmd_pdu is successful then it will increment cmdsn.

In this code path for xmit_task above, we assume that the driver can
either take the entire command and send it here, or return an error so
that we requeue. In your new error case, where the command cannot be sent
due to a hard failure and we want to fail it, we still need to
decrement the cmdsn, or there would be a hole since it was never put on
the wire.

Also, it is probably safest to check for the error code you are adding
support for:

	reason = session->tt->xmit_task(task);
	if (reason) {
		session->cmdsn--;

		if (reason == -EINVAL) {
			sc->result = DID_ABORT << 16;
			goto prepd_fault;
		} else {
			reason = FAILURE_SESSION_NOT_READY;
			goto prepd_reject;
		}
	}


Re: Slow I/O performance on SAS1064

2014-03-05 Thread Mark Knecht
On Wed, Mar 5, 2014 at 9:50 AM, Markus  wrote:


> The hdparm result looks like something is not right. No features are
> reported as supported, but why?


Does the HDD have S.M.A.R.T. features? Possibly

smartctl -a /dev/sda

would provide some additional visibility?

- Mark


Re: [PATCH 1/3] ipr: Remove extended delay bit on GSCSI reads/writes ops

2014-03-05 Thread wenxiong

Hi James,

I don't see that these patches have been merged yet. Do you want me to re-submit them?

Thanks for your help!
Wendy

Quoting wenxi...@linux.vnet.ibm.com:


This patch removes the extended delay bit on GSCSI read/write ops, so
performance will be significantly better.

Signed-off-by: Wen Xiong 

---
 drivers/scsi/ipr.c |6 +-
 drivers/scsi/ipr.h |1 +
 2 files changed, 6 insertions(+), 1 deletion(-)

Index: b/drivers/scsi/ipr.c
===
--- a/drivers/scsi/ipr.c2014-01-21 11:31:51.747454658 -0600
+++ b/drivers/scsi/ipr.c2014-01-21 11:32:14.949015785 -0600
@@ -1143,6 +1143,7 @@ static void ipr_init_res_entry(struct ip
res->add_to_ml = 0;
res->del_from_ml = 0;
res->resetting_device = 0;
+   res->reset_occurred = 0;
res->sdev = NULL;
res->sata_port = NULL;

@@ -5015,6 +5016,7 @@ static int __ipr_eh_dev_reset(struct scs
} else
rc = ipr_device_reset(ioa_cfg, res);
res->resetting_device = 0;
+   res->reset_occurred = 1;

LEAVE;
return rc ? FAILED : SUCCESS;
@@ -6183,8 +6185,10 @@ static int ipr_queuecommand(struct Scsi_
ioarcb->cmd_pkt.flags_hi |= IPR_FLAGS_HI_NO_ULEN_CHK;

ioarcb->cmd_pkt.flags_hi |= IPR_FLAGS_HI_NO_LINK_DESC;
-   if (ipr_is_gscsi(res))
+   if (ipr_is_gscsi(res) && res->reset_occurred) {
+   res->reset_occurred = 0;
ioarcb->cmd_pkt.flags_lo |= 
IPR_FLAGS_LO_DELAY_AFTER_RST;
+   }
ioarcb->cmd_pkt.flags_lo |= IPR_FLAGS_LO_ALIGNED_BFR;
ioarcb->cmd_pkt.flags_lo |= ipr_get_task_attributes(scsi_cmd);
}
Index: b/drivers/scsi/ipr.h
===
--- a/drivers/scsi/ipr.h2014-01-21 11:31:55.587764485 -0600
+++ b/drivers/scsi/ipr.h2014-01-21 11:32:14.957453279 -0600
@@ -1252,6 +1252,7 @@ struct ipr_resource_entry {
u8 add_to_ml:1;
u8 del_from_ml:1;
u8 resetting_device:1;
+   u8 reset_occurred:1;

u32 bus;/* AKA channel */
u32 target; /* AKA id */

--






Re: [PATCH 6/8] target_core_spc: Include target device descriptor in VPD page 83

2014-03-05 Thread Andy Grover

On 12/18/2013 01:20 AM, Hannes Reinecke wrote:

On 12/17/2013 09:01 PM, Nicholas A. Bellinger wrote:

On Tue, 2013-12-17 at 11:50 -0800, Nicholas A. Bellinger wrote:

On Tue, 2013-12-17 at 09:18 +0100, Hannes Reinecke wrote:

We should be including a descriptor referring to the target device
to allow identification of different TCM instances.

Signed-off-by: Hannes Reinecke 
---
  drivers/target/target_core_spc.c | 43 +++-
  1 file changed, 42 insertions(+), 1 deletion(-)



One issue with this patch.  The local buffer in spc_emulate_inquiry is
currently hardcoded to SE_INQUIRY_BUF=512, so two large scsi name
designators could overflow here..

So for the largest case with EVPD=0x83, this would be:

4 bytes for header +
20 bytes for NAA IEEE Registered Extended Assigned designator +
56 bytes for T10 Vendor Identifier +
8 bytes for Relative target port +
8 bytes for Target port group +
8 bytes for Logical unit group +
256 bytes for SCSI name (target port) +
256 bytes for SCSI name (target device) == 616 bytes.

So for good measure, bumping up SE_INQUIRY_BUF to 1024.
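
As a cross-check of the arithmetic above, the same sum can be written out in a few lines of C. This is only a sketch: the sizes are copied from the list, while vpd83_parts, vpd83_worst_case() and vpd83_fits() are hypothetical names that do not exist in target_core_spc.c.

```c
#include <assert.h>
#include <stddef.h>

/* Sizes in bytes, copied from the list above. */
static const size_t vpd83_parts[] = {
	4,	/* page header */
	20,	/* NAA IEEE Registered Extended Assigned designator */
	56,	/* T10 Vendor Identifier */
	8,	/* Relative target port */
	8,	/* Target port group */
	8,	/* Logical unit group */
	256,	/* SCSI name (target port) */
	256,	/* SCSI name (target device) */
};

/* Sum the worst-case length of an EVPD=0x83 response. */
static size_t vpd83_worst_case(void)
{
	size_t total = 0;
	size_t i;

	for (i = 0; i < sizeof(vpd83_parts) / sizeof(vpd83_parts[0]); i++)
		total += vpd83_parts[i];
	return total;
}

/* Does a buffer of the given size hold the worst case? */
static int vpd83_fits(size_t buf_len)
{
	assert(buf_len > 0);
	return vpd83_worst_case() <= buf_len;
}
```

The sum comes to 616 bytes, so a 512-byte buffer is too small for the worst case while 1024 bytes leaves headroom.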



Mmmm, looking at this again, is reporting back two SCSI names in
EVPD=0x83 with different associations (one for target port, and one for
target device) really necessary..?

Doesn't the existing target port association report back the same
information..?


No.
'Target port' is the identification for the port handling the
request, which is contained within a target device.

The reason why we need this is that we want to identify the scope of
the Target port group number.

Target port group numbers are relative to the encompassing target
device, so when we're having _several_ target devices they might
well provide us with identical target port group numbers.

For explicit ALUA each target port group within a target device can
be thought of as a 'scheduling domain', i.e. if I send STPG to one of the
devices in that domain there is a _high_ likelihood that _every_
device within that scheduling domain will be affected.
So I can be slightly smarter here and just send one STPG and then
wait for the resulting states on all affected devices.

If I don't have this information I am required to send STPG to each
and every device, thereby flooding the target controller with STPGs
for the same target port group.

So yes, we should be furnishing both.
In addition it's the only sane way of identifying the array :-)
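
To make the argument concrete, here is a rough sketch of how an initiator could count the STPG commands it really needs once it knows both identifiers. It assumes a scheduling domain is keyed on the pair (target device name, target port group number); struct toy_dev, same_domain() and stpg_needed() are hypothetical and not part of any existing multipath code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical view of one device as an initiator might track it. */
struct toy_dev {
	const char *target_device;	/* SCSI name of the target device */
	unsigned int tpg;		/* target port group number */
};

/* Two devices share a scheduling domain only if both fields match. */
static int same_domain(const struct toy_dev *a, const struct toy_dev *b)
{
	return a->tpg == b->tpg &&
	       strcmp(a->target_device, b->target_device) == 0;
}

/*
 * Count how many STPG commands are needed when one command per
 * (target device, target port group) pair is enough.
 */
static size_t stpg_needed(const struct toy_dev *devs, size_t n)
{
	size_t i, j, count = 0;

	assert(devs != NULL || n == 0);

	for (i = 0; i < n; i++) {
		int seen = 0;

		for (j = 0; j < i; j++) {
			if (same_domain(&devs[i], &devs[j])) {
				seen = 1;
				break;
			}
		}
		if (!seen)
			count++;
	}
	return count;
}
```

Without the target device name, two arrays that both report target port group 1 would be indistinguishable, and the only safe choice would be one STPG per device.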


Hi, fbfe858 only bumps INQUIRY_BUF to 768 although the comment says 
1024, is this expected and ok?


Regards -- Andy



Re: [PATCH 2/3] Add EVPD page 0x83 to sysfs

2014-03-05 Thread Christoph Hellwig
On Wed, Mar 05, 2014 at 08:38:01AM +0100, Hannes Reinecke wrote:
> > Either way I think the call to query evpd 0 should be a separate
> > function, so even if we don't store the information it's abstracted out.
> > 
> Hmm. That would work if we were just asking for a single page; but
> when we're checking several pages (like 0x83 and 0x80) we'd need
> either to pass in a page array or querying page 0 several times.
> Neither of which is very appealing.
>
> However, specifying additional flags for the individual pages might
> work. I'll see what I can come up with.

Passing in a bitmask or flags seems useful.  Even better would be storing it in the
scsi_device.  Note that I expect the places that need to know the EVPD
pages to grow slowly but steadily over time.
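
A rough userspace sketch of the bitmask idea, so that page 0 is parsed once and the result can be kept around. The flag names and vpd_supported_flags() are hypothetical and are not existing scsi_device fields. The layout is an assumption too: page code in byte 1, big-endian list length in bytes 2-3, supported page codes from byte 4 on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits; not existing scsi_device fields. */
#define TOY_VPD_PG80	(1u << 0)	/* unit serial number page */
#define TOY_VPD_PG83	(1u << 1)	/* device identification page */

/*
 * Turn a "supported VPD pages" (page 0x00) response into a bitmask.
 * Assumed layout: byte 1 is the page code, bytes 2-3 the big-endian
 * list length, and the supported page codes follow from byte 4.
 */
static unsigned int vpd_supported_flags(const uint8_t *buf, size_t len)
{
	unsigned int flags = 0;
	size_t list_len, i;

	assert(buf != NULL || len == 0);

	if (len < 4 || buf[1] != 0x00)
		return 0;

	list_len = ((size_t)buf[2] << 8) | buf[3];
	if (list_len > len - 4)
		list_len = len - 4;	/* never read past the buffer */

	for (i = 0; i < list_len; i++) {
		if (buf[4 + i] == 0x80)
			flags |= TOY_VPD_PG80;
		else if (buf[4 + i] == 0x83)
			flags |= TOY_VPD_PG83;
	}
	return flags;
}
```

Callers would then test a flag instead of re-reading page 0 for every page they care about, and supporting a further page only costs one more bit.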


Re: [PATCH 6/8] target_core_spc: Include target device descriptor in VPD page 83

2014-03-05 Thread Andy Grover

On 03/05/2014 11:41 AM, Andy Grover wrote:

Hi, fbfe858 only bumps INQUIRY_BUF to 768 although the comment says
1024, is this expected and ok?


Doh, read the diff backwards, disregard. -- Andy



[PATCH v5 1/3] libata, libsas: kill pm_result and related cleanup

2014-03-05 Thread Dan Williams
Tejun says:
  "At least for libata, worrying about suspend/resume failures don't make
   whole lot of sense.  If suspend failed, just proceed with suspend.  If
   the device can't be woken up afterwards, that's that.  There isn't
   anything we could have done differently anyway.  The same for resume, if
   spinup fails, the device is dud and the following commands will invoke
   EH actions and will eventually fail.  Again, there really isn't any
   *choice* to make.  Just making sure the errors are handled gracefully
   (ie. don't crash) and the following commands are handled correctly
   should be enough."

The only libata user that actually cares about the result from a suspend
operation is libsas.  However, it only cares about whether queuing a new
operation collides with an in-flight one.  All libsas does with the
error is retry, but we can just let libata wait for the previous
operation before continuing.

While we're at it take an opportunity to replace helper functions that
just do a to_ata_port(dev) conversion with macros whose names readily
distinguish sync vs async (queued) operations.

Reference: http://marc.info/?l=linux-scsi&m=138995409532286&w=2

Cc: Phillip Susi 
Cc: Alan Stern 
Suggested-by: Tejun Heo 
Signed-off-by: Todd Brandt 
Signed-off-by: Dan Williams 
---
 drivers/ata/libata-core.c |   75 +++--
 drivers/ata/libata-eh.c   |   13 +--
 drivers/scsi/libsas/sas_ata.c |   35 +++
 include/linux/libata.h|9 ++---
 include/scsi/libsas.h |1 -
 5 files changed, 39 insertions(+), 94 deletions(-)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 1a3dbd1b196e..0f47436c714c 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -5351,22 +5351,17 @@ bool ata_link_offline(struct ata_link *link)
 }
 
 #ifdef CONFIG_PM
-static int ata_port_request_pm(struct ata_port *ap, pm_message_t mesg,
-  unsigned int action, unsigned int ehi_flags,
-  int *async)
+static void ata_port_request_pm(struct ata_port *ap, pm_message_t mesg,
+   unsigned int action, unsigned int ehi_flags,
+   bool async)
 {
struct ata_link *link;
unsigned long flags;
-   int rc = 0;
 
/* Previous resume operation might still be in
 * progress.  Wait for PM_PENDING to clear.
 */
if (ap->pflags & ATA_PFLAG_PM_PENDING) {
-   if (async) {
-   *async = -EAGAIN;
-   return 0;
-   }
ata_port_wait_eh(ap);
WARN_ON(ap->pflags & ATA_PFLAG_PM_PENDING);
}
@@ -5375,11 +5370,6 @@ static int ata_port_request_pm(struct ata_port *ap, 
pm_message_t mesg,
spin_lock_irqsave(ap->lock, flags);
 
ap->pm_mesg = mesg;
-   if (async)
-   ap->pm_result = async;
-   else
-   ap->pm_result = &rc;
-
ap->pflags |= ATA_PFLAG_PM_PENDING;
ata_for_each_link(link, ap, HOST_FIRST) {
link->eh_info.action |= action;
@@ -5390,16 +5380,13 @@ static int ata_port_request_pm(struct ata_port *ap, 
pm_message_t mesg,
 
spin_unlock_irqrestore(ap->lock, flags);
 
-   /* wait and check result */
if (!async) {
ata_port_wait_eh(ap);
WARN_ON(ap->pflags & ATA_PFLAG_PM_PENDING);
}
-
-   return rc;
 }
 
-static int __ata_port_suspend_common(struct ata_port *ap, pm_message_t mesg, 
int *async)
+static int ata_port_suspend_common(struct ata_port *ap, pm_message_t mesg, 
bool async)
 {
/*
 * On some hardware, device fails to respond after spun down
@@ -5411,59 +5398,53 @@ static int __ata_port_suspend_common(struct ata_port 
*ap, pm_message_t mesg, int
 */
unsigned int ehi_flags = ATA_EHI_QUIET | ATA_EHI_NO_AUTOPSY |
 ATA_EHI_NO_RECOVERY;
-   return ata_port_request_pm(ap, mesg, 0, ehi_flags, async);
+   ata_port_request_pm(ap, mesg, 0, ehi_flags, async);
+   return 0;
 }
 
-static int ata_port_suspend_common(struct device *dev, pm_message_t mesg)
-{
-   struct ata_port *ap = to_ata_port(dev);
-
-   return __ata_port_suspend_common(ap, mesg, NULL);
-}
+#define ata_port_suspend_sync(ap, msg) ata_port_suspend_common((ap), (msg), 
false)
+#define queue_ata_port_suspend(ap, msg) ata_port_suspend_common((ap), (msg), 
true)
 
 static int ata_port_suspend(struct device *dev)
 {
+   struct ata_port *ap = to_ata_port(dev);
+
if (pm_runtime_suspended(dev))
return 0;
 
-   return ata_port_suspend_common(dev, PMSG_SUSPEND);
+   return ata_port_suspend_sync(ap, PMSG_SUSPEND);
 }
 
 static int ata_port_do_freeze(struct device *dev)
 {
+   struct ata_port *ap = to_ata_port(dev);
+
if (pm_runtime_suspended(dev))
return 0;
 
-   

[PATCH v5 3/3] scsi: async sd resume

2014-03-05 Thread Dan Williams
async_schedule() sd resume work to allow disks and other devices to
resume in parallel.

This moves the entirety of scsi_device resume to an async context to
ensure that scsi_device_resume() remains ordered with respect to the
completion of the start/stop command.  For the duration of the resume,
new command submissions (that do not originate from the scsi-core) will
be deferred (BLKPREP_DEFER).

It adds a new ASYNC_DOMAIN_EXCLUSIVE(scsi_sd_pm_domain) as a container
of these operations.  Like scsi_sd_probe_domain it is flushed at
sd_remove() time to ensure async ops do not continue past the
end-of-life of the sdev.  The implementation explicitly refrains from
reusing scsi_sd_probe_domain directly for this purpose as it is flushed
at the end of dpm_resume(), potentially defeating some of the benefit.
Given sdevs are quiesced it is permissible for these resume operations
to bleed past the async_synchronize_full() calls made by the driver
core.

We defer the resolution of which pm callback to call until
scsi_dev_type_{suspend|resume} time and guarantee that the 'int
(*cb)(struct device *)' parameter is never NULL.  With this in place the
type of resume operation is encoded in the async function identifier.

Inspired by Todd's analysis and initial proposal [2]:
https://01.org/suspendresume/blogs/tebrandt/2013/hard-disk-resume-optimization-simpler-approach

Cc: Len Brown 
Cc: Phillip Susi 
Cc: Alan Stern 
Suggested-by: Todd Brandt 
[djbw: kick all resume work to the async queue]
Signed-off-by: Dan Williams 
---
 drivers/scsi/scsi.c  |3 +
 drivers/scsi/scsi_pm.c   |  115 --
 drivers/scsi/scsi_priv.h |1 
 drivers/scsi/sd.c|1 
 4 files changed, 95 insertions(+), 25 deletions(-)

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index d8afec8317cf..990d4a302788 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -91,6 +91,9 @@ EXPORT_SYMBOL(scsi_logging_level);
 ASYNC_DOMAIN(scsi_sd_probe_domain);
 EXPORT_SYMBOL(scsi_sd_probe_domain);
 
+ASYNC_DOMAIN_EXCLUSIVE(scsi_sd_pm_domain);
+EXPORT_SYMBOL(scsi_sd_pm_domain);
+
 /* NB: These are exposed through /proc/scsi/scsi and form part of the ABI.
  * You may not alter any existing entry (although adding new ones is
  * encouraged once assigned by ANSI/INCITS T10
diff --git a/drivers/scsi/scsi_pm.c b/drivers/scsi/scsi_pm.c
index 001e9ceda4c3..1cb211bf0383 100644
--- a/drivers/scsi/scsi_pm.c
+++ b/drivers/scsi/scsi_pm.c
@@ -18,17 +18,52 @@
 
 #ifdef CONFIG_PM_SLEEP
 
+#define do_pm_op(dev, op) \
+   dev->driver ? dev->driver->pm ? \
+   dev->driver->pm->op(dev) : 0 : 0
+
+static int do_scsi_suspend(struct device *dev)
+{
+   return do_pm_op(dev, suspend);
+}
+
+static int do_scsi_freeze(struct device *dev)
+{
+   return do_pm_op(dev, freeze);
+}
+
+static int do_scsi_poweroff(struct device *dev)
+{
+   return do_pm_op(dev, poweroff);
+}
+
+static int do_scsi_resume(struct device *dev)
+{
+   return do_pm_op(dev, resume);
+}
+
+static int do_scsi_thaw(struct device *dev)
+{
+   return do_pm_op(dev, thaw);
+}
+
+static int do_scsi_restore(struct device *dev)
+{
+   return do_pm_op(dev, restore);
+}
+
 static int scsi_dev_type_suspend(struct device *dev, int (*cb)(struct device 
*))
 {
int err;
 
+   /* flush pending in-flight resume operations, suspend is synchronous */
+   async_synchronize_full_domain(&scsi_sd_pm_domain);
+
err = scsi_device_quiesce(to_scsi_device(dev));
if (err == 0) {
-   if (cb) {
-   err = cb(dev);
-   if (err)
-   scsi_device_resume(to_scsi_device(dev));
-   }
+   err = cb(dev);
+   if (err)
+   scsi_device_resume(to_scsi_device(dev));
}
dev_dbg(dev, "scsi suspend: %d\n", err);
return err;
@@ -38,10 +73,16 @@ static int scsi_dev_type_resume(struct device *dev, int 
(*cb)(struct device *))
 {
int err = 0;
 
-   if (cb)
-   err = cb(dev);
+   err = cb(dev);
scsi_device_resume(to_scsi_device(dev));
dev_dbg(dev, "scsi resume: %d\n", err);
+
+   if (err == 0) {
+   pm_runtime_disable(dev);
+   pm_runtime_set_active(dev);
+   pm_runtime_enable(dev);
+   }
+
return err;
 }
 
@@ -66,20 +107,50 @@ scsi_bus_suspend_common(struct device *dev, int 
(*cb)(struct device *))
return err;
 }
 
+static void async_sdev_resume(void *dev, async_cookie_t cookie)
+{
+   scsi_dev_type_resume(dev, do_scsi_resume);
+}
+
+static void async_sdev_thaw(void *dev, async_cookie_t cookie)
+{
+   scsi_dev_type_resume(dev, do_scsi_thaw);
+}
+
+static void async_sdev_restore(void *dev, async_cookie_t cookie)
+{
+   scsi_dev_type_resume(dev, do_scsi_restore);
+}
+
+static async_func_t to_async_sdev_resume_fn(struct device *dev,
+   int (*cb)(struct device *))
+

[PATCH v5 0/3] Accelerate Storage Resume (2x or more)

2014-03-05 Thread Dan Williams
It is not every day that a kernel operation gets 2x faster.  Nice find
by Todd and his AnalyzeSuspend tool [1].

Todd is on vacation so I am taking care of these patches to make sure
they get in the queue for 3.15.

The significant changes from Todd's last submission [2]:

1/ Split out the pure cleanup into its own patch, and expand it to clean
   up libsas as well

2/ Move the entirety of scsi_device resume to an async context.  As
   written, v4 of the patchset overlapped the scsi-start-stop command
   with the scsi_device_resume().  Now in v5 the queue restart is
   properly ordered with the completion of the scsi-start-stop command.

Resume completion time as reported by:

"echo devices > /sys/power/pm_test && echo mem > /sys/power/state"

BEFORE: PM: resume of devices complete after 2271.097 msecs
AFTER:  PM: resume of devices complete after 1057.404 msecs

On a system with two SSDs attached via AHCI.

--
Dan


[1]: Todd's testing showing up to 12x resume latency improvement in some
cases:
https://01.org/suspendresume/blogs/tebrandt/2013/hard-disk-resume-optimization-simpler-approach

[2]: Todd's v4 patchset:
[PATCH v4 0/2] http://marc.info/?l=linux-ide&m=138984698103487&w=2
[PATCH v4 1/2] http://marc.info/?l=linux-ide&m=138984713203515&w=2
[PATCH v4 2/2] http://marc.info/?l=linux-ide&m=138984722003539&w=2

---

Dan Williams (2):
  libata, libsas: kill pm_result and related cleanup
  scsi: async sd resume

Todd Brandt (1):
  libata: async resume


 drivers/ata/libata-core.c |   75 ++-
 drivers/ata/libata-eh.c   |   13 +
 drivers/scsi/libsas/sas_ata.c |   35 ++--
 drivers/scsi/scsi.c   |3 +
 drivers/scsi/scsi_pm.c|  115 -
 drivers/scsi/scsi_priv.h  |1 
 drivers/scsi/sd.c |1 
 include/linux/libata.h|9 +--
 include/scsi/libsas.h |1 
 9 files changed, 134 insertions(+), 119 deletions(-)


[PATCH v5 2/3] libata: async resume

2014-03-05 Thread Dan Williams
From: Todd Brandt 

Improve overall system resume time by making libata link recovery
actions asynchronous relative to other resume events.

Link resume operations are performed using the scsi_eh thread, so
commands, particularly the sd resume start/stop command, will be held
off until the device exits error handling.  Libata already flushes eh
with ata_port_wait_eh() in the port teardown paths, so there are no
concerns with async operation colliding with the end-of-life of the
ata_port object.  Also, libata-core is already careful to flush
in-flight pm operations before another round of pm starts on the given
ata_port.

Reference: 
https://01.org/suspendresume/blogs/tebrandt/2013/hard-disk-resume-optimization-simpler-approach

Cc: Len Brown 
Cc: Phillip Susi 
Cc: Alan Stern 
Signed-off-by: Todd Brandt 
[djbw: rebase on cleanup patch, changelog wordsmithing]
Signed-off-by: Dan Williams 
---
 drivers/ata/libata-core.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 0f47436c714c..7719ec7d9df9 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -5444,7 +5444,7 @@ static int ata_port_resume(struct device *dev)
 {
int rc;
 
-   rc = ata_port_resume_sync(to_ata_port(dev), PMSG_RESUME);
+   rc = queue_ata_port_resume(to_ata_port(dev), PMSG_RESUME);
if (!rc) {
pm_runtime_disable(dev);
pm_runtime_set_active(dev);



Re: [PATCH v14 2/3] ata: Add APM X-Gene SoC AHCI SATA host controller driver

2014-03-05 Thread Loc Ho
Hi,

>> +#define pdata_to_ctx(x) container_of(x, struct xgene_ahci_context, 
>> plat_data)
>> +
>> +struct xgene_ahci_context {
>> + struct ahci_platform_data plat_data;
>
> plat_data is used only to get to struct xgene_ahci_context instance
> so it can be removed (especially since we want to remove struct
> ahci_platform_data altogether in the long-term) and hpriv->plat_data
> should be set to point to context instance itself.  Actually, this
> seems to be the case already as hpriv->plat_data is set to hplat_data
> (not to &hplat_data->plat_data) in xgene_ahci_probe() so I wonder how
> does this driver work currently?

I will remove the plat_data.

>> +/**
>> + * xgene_ahci_read_id - Read ID data from the specified device
>> + * @dev: device
>> + * @tf: proposed taskfile
>> + * @id: data buffer
>> + *
>> + * This custom read ID function is required due to the fact that the HW
>> + * does not support DEVSLP and the controller state machine may get stuck
>> + * after processing the ID query command.
>> + */
>> +static unsigned int xgene_ahci_read_id(struct ata_device *dev,
>> +struct ata_taskfile *tf, u16 *id)
>> +{
>> + u32 err_mask;
>> + void __iomem *port_mmio = ahci_port_base(dev->link->ap);
>> +
>> + err_mask = ata_do_dev_read_id(dev, tf, id);
>> + if (err_mask)
>> + return err_mask;
>> +
>> + /*
>> +  * Mask reserved area. Bit78 spec of Link Power Management
>
> Word78?

Yes...

>
>> +  * bit15-8: reserved
>> +  * bit7: NCQ autosence
>> +  * bit6: Software settings preservation supported
>> +  * bit5: reserved
>> +  * bit4: In-order sata delivery supported
>> +  * bit3: DIPM requests supported
>> +  * bit2: DMA Setup FIS Auto-Activate optimization supported
>> +  * bit1: DMA Setup FIX non-Zero buffer offsets supported
>> +  * bit0: Reserved
>> +  *
>> +  * Clear reserved bit (DEVSLP bit) as we don't support DEVSLP
>> +  */
>> + id[78] &= 0x00FF;

I will also fix this up by masking off the bit.

>> +
>> + /* HW requires toggle of the clock */
>> + ahci_platform_disable_clks(hpriv);
>> + rc = ahci_platform_enable_clks(hpriv);
>
> Why is this needed (extra clocks disable->enable sequence)?

This is a HW erratum with the clock. If I don't give the clock a full
cycle, it doesn't work.

-Loc


Re: [PATCH v12 1/4] PHY: Add function set_speed to generic PHY framework

2014-03-05 Thread Loc Ho
Hi,

 >> >> This patch adds function set_speed to the generic PHY framework 
 >> >> operation
 >> >> structure. This function can be called to instruct the PHY 
 >> >> underlying layer
 >> >> at specified lane to configure for specified speed in hertz.
 >> >
 >> > why ? looks like clk_set_rate() is your friend here. Can you be more
 >> > descriptive of the use case ? When will this be used ?
 >> >
 >>
 >> The phy_set_speed is used to configure the operation speed of the PHY
 >> at run-time. The clock interface in general is used to configure the
 >> clock input to the IP. I don't believe they are the same thing. Maybe
 >> it will be clear in my response to your second email
 >
 > The problem with this is that you end up adding SATA-specific details to
 > something which is supposed to be generic.

 Setting the operation speed is pretty generic from an interface point
 of view. An generic multi-purpose PHY can support multiple speed. If
>>>
>>> no it's not. Specially when you consider that your "speed" argument can
>>> be just about anything and depending on the underlying bus, it *will* be
>>> treated differently. Note that e.g. in OMAP devices the exact *same* PHY
>>> IP is used for PCIe, SATA and USB... now, let's assume for the sake of
>>> argument that we were to implement ->set_speed() in that environment,
>>> how do you plan to do that ? a 6MHz arguments isn't valid from USB stand
>>> point and could mean different things in PCIe or SATA.
>>>
>>
>> Correct me if I am wrong here. If the same PHY is used for PCIe, SATA,
>> and USB, would you not need to indicate what speed the PHY should be
>> operated at - unless the underlying IP magically negotiate and
>> configured automatically? If so, what about PHY isn't so intelligent?
>> How should you suggest that we handle these?
>>
 the upper layer wish to operate at an specified speed (say for testing
 purpose and etc), it can be allowed.
>>>
>>> anything for testing purposes, should be limited to test scenarios.
>>
>> Testing purpose is only one use case. Another use case is to limit the
>> speed so that I can confirm the driver actually works with various
>> speed of the device and handle correctly.
>>
>>>
 > After negotiation, don't you
 > get any interrupt from your PHY indicating that link speed negotiation
 > is done ? Or is that IRQ only on AHCI IP ?

 There is NO interrupt from the PHY. The IRQ is associated with the
 AHCI IP. With SATA host flow, it starts off with an COMRESET to start
 the link negotiation. At that point, it will poll for completion.

 Any other concerns?
>>>
>>> hey, calm down... just trying to prevent us from applying something
>>> which isn't truly generic and I don't think "->set_speed" is generic
>>> enough. The semantics of the "speed" argument won't be valid for all use
>>> cases.
>>>
>>> I can already see people using that to pass
>>> USB_SPEED_{LOW,FULL,HIGH,SUPER}, instead of actual speed numbers. We will
>>> end up with a mess to handle from a PHY driver, specially in cases such
>>> as OMAP where, as mentioned above, the same IP is used for SATA, PCIe
>>> and USB3.
>>>
>>
>> This PHY isn't as "intelligent" as other PHY's. What would you suggest
>> as I need an method to indicate to the underlying PHY that I want to
>> operate at an specified speed?
>
> Sorry if I am pinging you guys too fast here. I am looking for a
> solution and am open to any solution that is acceptable for your
> original intent of the generic PHY framework. I understand that you
> don't believe set_speed is generic enough and may not apply to omap.
> Or, if you prefer, we can try changing the init function to take an
> initial MAX speed?
>

For the initial version, I will remove support for Gen1/Gen2 until we
come up with a solution. Then I will post patches that address the
individual errata.

-Loc
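
For illustration only, the COMRESET-then-poll flow described in the quoted
text above can be sketched as a bounded polling loop. This is a minimal
sketch, not code from the X-Gene driver: LINK_READY_BIT, poll_link_ready(),
and the fake_phy stand-in are hypothetical names, and the status read is
hidden behind a callback so that no real register layout is assumed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical status bit: set once link negotiation has finished. */
#define LINK_READY_BIT (1u << 0)

/* Caller-supplied status reader, so the sketch assumes no real MMIO layout. */
typedef uint32_t (*read_status_fn)(void *ctx);

/*
 * Read the status word until LINK_READY_BIT is set or max_polls reads
 * have been made. Returns true on success, false on timeout.
 */
static bool poll_link_ready(read_status_fn read_status, void *ctx,
                            unsigned int max_polls)
{
	for (unsigned int i = 0; i < max_polls; i++) {
		if (read_status(ctx) & LINK_READY_BIT)
			return true;
		/* A real driver would sleep between reads instead of spinning. */
	}
	return false;
}

/* Fake hardware for demonstration: ready after a fixed number of reads. */
struct fake_phy {
	unsigned int reads_until_ready;
};

static uint32_t fake_read_status(void *ctx)
{
	struct fake_phy *phy = ctx;

	if (phy->reads_until_ready == 0)
		return LINK_READY_BIT;
	phy->reads_until_ready--;
	return 0;
}
```

Bounding the loop means a PHY that never completes negotiation shows up as
a timeout rather than a hang, which is the same concern behind the "add
time out for while loop" items in the changelogs later in this digest.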


Re: Slow I/O performance on SAS1064

2014-03-05 Thread markus
On Wed, Mar 05, 2014 at 10:21:07AM -0800, Mark Knecht wrote:
> On Wed, Mar 5, 2014 at 9:50 AM, Markus  wrote:
> 
> 
> > The hdparm result looks like there is something not right. No features
> > were reported as supported, but why?
> 
> 
> Does the HDD have S.M.A.R.T. features? Possibly
> 
> smartctl -a /dev/sda
root@outpost:~# smartctl -a /dev/sda
smartctl 5.41 2011-06-09 r3365 [sparc64-linux-3.13.5-mar] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net


Probable ATA device behind a SAT layer
Try an additional '-d ata' or '-d sat' argument.

So I tried it with '-d scsi':

root@outpost:~# smartctl -d scsi -a /dev/sda
smartctl 5.41 2011-06-09 r3365 [sparc64-linux-3.13.5-mar] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

User Capacity:        2,000,398,934,016 bytes [2.00 TB]
Logical block size:   512 bytes
Logical Unit id:      0x5394e2380537
Serial number:        7365P4KNT
Device type:          disk
Local Time is:        Wed Mar  5 22:32:09 2014 CET
Device supports SMART and is Disabled
Temperature Warning Disabled or Not Supported
SMART Health Status: OK

Error Counter logging not supported
Device does not support Self Test logging


> 
> would provide some additional visibility?
.
[   98.037209] Uniform Multi-Platform E-IDE driver
[   98.107564] SLAB: Unable to allocate memory on node 0 (gfp=0x40d1)
[   98.188985]   cache: dma-kmalloc-512, object size: 512, order: 0
[   98.268127]   node 0: slabs: 0/0, objs: 0/0, free: 0
[   98.333538] sr: out of memory.
[   98.373628] cdrom: Uniform CD-ROM driver Revision: 3.20
[   98.443390] sr 0:0:0:0: Attached scsi CD-ROM sr0
[   98.448375] sr 0:0:0:0: Attached scsi generic sg0 type 5
[  103.198705] scsi2 : ioc0: LSISAS1064 A3, FwRev=01080400h, Ports=1, MaxQ=511, IRQ=10
[  103.329805] mptsas: ioc0: attaching sata device: fw_channel 0, fw_id 0, phy 0, sas_addr 0x1221
[  103.462762] scsi 2:0:0:0: Direct-Access ATA  TOSHIBA MQ01ABB2 0U   PQ: 0 ANSI: 5
[  103.571610] scsi 2:0:0:0: Attached scsi generic sg1 type 0
[  103.645418] mptsas: ioc0: attaching sata device: fw_channel 0, fw_id 1, phy 1, sas_addr 0x12210100
[  103.777635] scsi 2:0:1:0: Direct-Access ATA  TOSHIBA MQ01ABB2 0U   PQ: 0 ANSI: 5
[  103.886607] scsi 2:0:1:0: Attached scsi generic sg2 type 0
[  103.960517] mptsas: ioc0: attaching sata device: fw_channel 0, fw_id 2, phy 2, sas_addr 0x12210200
[  104.093109] scsi 2:0:2:0: Direct-Access ATA  TOSHIBA MQ01ABB2 0U   PQ: 0 ANSI: 5
[  104.201973] scsi 2:0:2:0: Attached scsi generic sg3 type 0
[  104.275830] mptsas: ioc0: attaching sata device: fw_channel 0, fw_id 3, phy 3, sas_addr 0x12210300
[  104.408412] scsi 2:0:3:0: Direct-Access ATA  TOSHIBA MQ01ABB2 0U   PQ: 0 ANSI: 5
[  104.517379] scsi 2:0:3:0: Attached scsi generic sg4 type 0
[  104.618997] sd 2:0:0:0: [sda] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
[  104.721185] sd 2:0:3:0: [sdd] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
[  104.823771] sd 2:0:1:0: [sdb] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
[  104.824017] sd 2:0:2:0: [sdc] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
[  104.828691] sd 2:0:0:0: [sda] Write Protect is off
[  104.828697] sd 2:0:0:0: [sda] Mode Sense: 67 00 00 08
[  104.829940] sd 2:0:2:0: [sdc] Write Protect is off
[  104.829945] sd 2:0:2:0: [sdc] Mode Sense: 67 00 00 08
[  104.830488] sd 2:0:3:0: [sdd] Write Protect is off
[  104.830493] sd 2:0:3:0: [sdd] Mode Sense: 67 00 00 08
[  104.833225] sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[  104.834471] sd 2:0:2:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[  104.835118] sd 2:0:3:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[  105.583175] sd 2:0:1:0: [sdb] Write Protect is off
[  105.623801]  sda: sda1 sda2 sda3
[  105.651118]  sdc: sdc1 sdc9
[  105.654024]  sdd: sdd1 sdd9
[  105.761966] sd 2:0:1:0: [sdb] Mode Sense: 67 00 00 08
[  105.769029] sd 2:0:1:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[  105.775293] sd 2:0:2:0: [sdc] Attached SCSI disk
[  105.775941] sd 2:0:3:0: [sdd] Attached SCSI disk
[  106.014357] sd 2:0:0:0: [sda] Attached SCSI disk
[  106.158864]  sdb: sdb1 sdb9
[  106.205879] sd 2:0:1:0: [sdb] Attached SCSI disk
[  106.502387] random: nonblocking pool is initialized
[  106.792978] device-mapper: uevent: version 1.0.3
[  106.861434] device-mapper: ioctl: 4.27.0-ioctl (2013-10-30) initialised: dm-de...@redhat.com
[  108.085062] bio: create slab  at 1
.
> 
> - Mark

Markus
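
As a cross-check of the figures above, the capacity reported by smartctl
and the block count in the sd log lines can be reconciled with a few lines
of arithmetic. The sketch below is only a worked example; the helper names
are made up for illustration and are not part of any tool quoted here.

```c
#include <stdint.h>

/* Total capacity in bytes from a logical block count and block size. */
static uint64_t capacity_bytes(uint64_t blocks, uint32_t block_size)
{
	return blocks * (uint64_t)block_size;
}

/* Whole decimal gigabytes (10^9 bytes), rounded down. */
static uint64_t capacity_gb(uint64_t bytes)
{
	return bytes / 1000000000ULL;
}

/* Whole binary gibibytes (2^30 bytes), rounded down. */
static uint64_t capacity_gib(uint64_t bytes)
{
	return bytes >> 30;
}
```

With 3907029168 blocks of 512 bytes this gives 2,000,398,934,016 bytes,
which is 2000 GB (the "2.00 TB" figure) and 1863 GiB (about 1.81 TiB), so
the two tools agree on the drive size.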




Re: [PATCH 0/6] iser-target: Fix active I/O shutdown related issues

2014-03-05 Thread Nicholas A. Bellinger
On Wed, 2014-03-05 at 14:12 +0200, Sagi Grimberg wrote:
> On 3/5/2014 2:06 AM, Nicholas A. Bellinger wrote:
> > On Tue, 2014-03-04 at 17:17 +0200, Sagi Grimberg wrote:
> >> On 3/4/2014 2:00 AM, Nicholas A. Bellinger wrote:
> >>> From: Nicholas Bellinger 
> >>>



> >> More on cleanup flow. isert_cma_handler does not handle
> >> RDMA_CM_EVENT_TIMEWAIT_EXIT.
> >> To be more specific, according to IB spec, when initiating disconnect
> >> (rdma_disconnect/ib_send_cm_dreq),
> >> one should not destroy a used qp until getting TIMEWAIT_EXIT CM event.
> >> We are working on this in iSER initiator.
> >> It might lead to "stale connection" CM rejects on future connections
> >> (SRP also does not do that).
> >>
> > I noticed that as well during recent debugging.
> >
> > However, AFAICT the RDMA_CM_EVENT_TIMEWAIT_EXIT doesn't (always) occur
> > on the target side after a RDMA_CM_EVENT_DISCONNECTED, and thus far I've
> > not been able to ascertain what's different about the shutdown sequence
> > that would make this happen, or not happen..
> >
> > Any ideas..?
> 
> That's probably because the cm_id is destroyed before you get the event.
> There is a specific timeout computation for this event (see the IB spec).
> If you attempt to disconnect while the link is down (the initiator won't
> receive it and won't send a disconnect back), you should be able to see
> this event. As I understand it, in order to comply with the spec, the QP
> (and the cm_id afterwards) should be destroyed only when getting this
> event and not before.
> 

Thanks for the additional background.

So currently rdma_destroy_qp() + rdma_destroy_id() is being done via
isert_connect_release(), which occurs after the final isert_put_conn()
happens from either the RDMA_CM_EVENT_DISCONNECTED handler, or within
isert_free_conn() in one of the per connection kernel thread contexts
via iscsit_close_connection().

If I understand the above correctly, the isert_put_conn() should move
from the RDMA_CM_EVENT_DISCONNECTED handler into the TIMEWAIT_EXIT
handler, yes..?

And it's safe to assume that DISCONNECTED will always occur before
TIMEWAIT_EXIT, right..?

--nab
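
To make the ordering question concrete, here is a small stand-alone model
of the teardown rule being discussed: stop I/O on DISCONNECTED, but release
the QP and cm_id only once TIMEWAIT_EXIT has been seen. It is a sketch
under stated assumptions, not iser-target code: the enum, the struct, and
conn_handle_event() are hypothetical, and the model deliberately does not
rely on DISCONNECTED arriving first, since that ordering is exactly what
is being asked above.

```c
#include <stdbool.h>

/* Hypothetical event names modelled on the two CM events discussed above. */
enum conn_event {
	EV_DISCONNECTED,
	EV_TIMEWAIT_EXIT,
};

struct conn {
	bool disconnected;   /* DISCONNECTED has been seen */
	bool timewait_done;  /* TIMEWAIT_EXIT has been seen */
	bool released;       /* QP and cm_id have been torn down */
};

/*
 * Feed one event into the connection state. Returns true if this call
 * is the one that released the resources.
 */
static bool conn_handle_event(struct conn *c, enum conn_event ev)
{
	switch (ev) {
	case EV_DISCONNECTED:
		c->disconnected = true;   /* stop I/O, but keep the QP alive */
		break;
	case EV_TIMEWAIT_EXIT:
		c->timewait_done = true;
		break;
	}

	/* Release exactly once, and only after TIMEWAIT_EXIT. */
	if (c->timewait_done && !c->released) {
		c->released = true;
		return true;
	}
	return false;
}
```

A real implementation would still need a fallback for the case where
TIMEWAIT_EXIT never arrives, which this model does not attempt to cover.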



[PATCH v13 0/3] PHY: Add APM X-Gene SoC 15Gbps Multi-purpose PHY support

2014-03-05 Thread Loc Ho
This patch adds support for APM X-Gene SoC 15Gbps Multi-purpose PHY. This
is the physical layer interface for the corresponding host controller. This
driver uses the PHY generic framework.

v13:
* Remove PHY patch for initial version as only Gen3 speed is supported
* Remove function xgene_phy_sata_force_gen and xgene_phy_set_speed
* Minor comment update to header and function xgene_phy_hw_init

v12:
* Add driver depend on HAS_IOMEM and OF
* Fix comment typo in header file phy-xgene.
* Change all macro shift value in hex to decimal
* Add time out for while loop to function sds_wr and sds_rd
* Set proper return value if failed to get IO resource 0
* Move devm_of_phy_provider_register to last operation in probe function
* Remove driver registration print statement
* Replace module init/exit with module_platform_driver
* Change license to GPL v2

v11:
* Add comment to function phy_set_speed
* Add commit log for documentation patch file
* Minor comment update to function xgene_phy_force_lat_summer_cal and
  xgene_phy_sata_force_gen

v10
* Update comment for function xgene_phy_force_lat_summer_cal and
  xgene_phy_sata_force_gen with function style fully-winged style

v9
* Update CMU parameter setting for register 13
* Add required delay when configure CMU PLL, Manual Calibration PLL, and VCO
  PLL
* Add comment for CMU PLL calibration loop delay of 10us
* Add required delay for stopping and starting summer calibrations
* Update comment for summer and latch calibration delays
* Update comment for PHY reset Rx delay and decrease max sleep time from 500
  to 150us
* Always program the DFE (equalizer) setting to 0x7e00 as with original version
* Fix Tx speed selection to always use the Gen3 setting when forced to a
  specified generation speed

v8
* Update binding documentation
* Remove XGENE_PHY_DTS and XGENE_PHY_EXT_DTS defines
* Remove support for internal clock
* Remove support for external reference CMU
* Remove the need for external reference resource DTS entry and its related
  code

v7
* Add/Update PHY CMU/lane parameters and its default values
* Rename variable enable_manual_cal to preA3Chip
* Remove function phy_rd, phy_wr, and phy_wr_flush
* Change function cmu_wr, cmu_rd, cmu_toggle1to0, cmu_clrbits, cmu_setbits,
  serdes_wr, serdes_rd, serdes_clrbits, and serdes_setbits to take context
  instead void *
* Remove function serdes_toggle1to0
* Decrease the polling time from 10ms to 1ms on CMU calibration complete
  detection
* Move all SATA specify code in function xgene_phy_hw_initialize into
  function xgene_phy_hw_init_sata
* Add usleep_range after starting summer/latch calibrations
* Add usleep_range between receiver reset (function xgene_phy_reset_rxd)
* Save and restore PHY register 31 instead writing 0 in function
  xgene_phy_gen_avg_val
* Update function xgene_phy_sata_force_gen programming sequences
* Add support to reset the receiver lane in function xgene_phy_set_speed
  if speed is 0
* Update PHY parameters in DTS per controller
* Some minor code clean up

v6
* Move PHY document to Documentation/devicetree/binding/phy
* Remove _ADDR from all register defines
* Update clock-names property for sataphy1clk, sataphy2clk, and sataphy3clk

v5
* Update DTS binding documentation
* Remove direct clock access and use clock interface instead
* Change override parameters to decimal instead hex values
* Change apm,tx-amplitude, apm,tx-pre-cursor1, apm,tx-pre-cursor2,
  apm,tx-post-cursor to be unit of uV

v4
* Update documentation with 'apm,' instead 'apm-'
* Change DTS override parameter to have 'apm,'
* Add select GENERIC_PHY to Kconfig PHY_XGENE
* Make override parameters to be pair of three values instead one
* Some minor comment and indentation changes
* Remove error register addition offset
* Add ULL to constants
* Use module_init instead subsys_initcall
* Make DTS node based on first register address
* Update override setting values

v3
* Major re-write of the code based on various review comments
* Support external clock only at the moment
* Support SATA mode only at the moment
* No UEFI support at the moment

v2
* Remove port knowledge from functions
* Make all functions static
* Remove ID completely
* Make resource requirement based on compatible type
* Rename override PHY parameters with more descriptive name
* Add override PHY parameter for per controller, per port, and per speed
* Patch the generic PHY frame to expose set_speed operation

v1
* Initial version

Signed-off-by: Loc Ho 
Signed-off-by: Tuan Phan 
Signed-off-by: Suman Tripathi 
---
Loc Ho (3):
  Documentation: Add APM X-Gene SoC 15Gbps Multi-purpose PHY driver
binding documentation
  PHY: add APM X-Gene SoC 15Gbps Multi-purpose PHY driver
  arm64: Add APM X-Gene SoC 15Gbps Multi-purpose PHY DTS entries

 .../devicetree/bindings/phy/apm-xgene-phy.txt  |   79 +
 arch/arm64/boot/dts/apm-storm.dtsi |   75 +
 drivers/phy/Kconfig|7 +
 drivers/phy/Makefile 

[PATCH v15 0/3] ata: Add APM X-Gene SoC AHCI SATA host controller support

2014-03-05 Thread Loc Ho
This patch adds support for the APM X-Gene SoC AHCI SATA host controller. In
order for the host controller to work, the corresponding PHY driver
must also be available. Currently, only Gen3 disks are supported with this
initial version.

v15:
 * Rebase to libata next branch
 * Remove field plat_data and PHY from context structure
 * Fix comment on function xgene_ahci_read_id as well as using bit mask to
   clear DEVSLP bit
 * Remove function xgene_ahci_force_phy_rdy and xgene_ahci_phy_restart as not
   required since Gen1/Gen2 support is removed for this initial version
 * Update function xgene_ahci_do_hardreset comment
 * Remove Gen1/Gen2 support from function xgene_ahci_do_hardreset
 * Change int to u32 for variable portcmd_saved in function
   xgene_ahci_hardreset
 * Change variable hplat_data to ctx in function xgene_ahci_probe
 * Remove PHY call and using ahci_platform_enable_resource instead
 * Add ahci_platform_remove_one to driver function .remove
 * Change phy-name to "sata-phy"

v14:
 * Remove the shutdown already check and replace the while loop check with
   msleep in function xgene_ahci_init_memram

v13:
 * Add fully-winged style comment for function xgene_ahci_read_id and
   xgene_ahci_do_hardreset
 * Minor comments update for function xgene_ahci_read_id,
   xgene_ahci_do_hardreset, and xgene_ahci_hardreset
 * NOTE: There is no functional code change.

v12:
 * Remove function xgene_ahci_get_channel and use the ata_port field port_no
 * Update comment for function xgene_ahci_read_id to function comment style
   '/**'
 * Update comment for multiple lines to fully-winged style

v11:
 * Drop the export functions requirement with libahci
 * Change CONFIG_SATA_XGENE to CONFIG_AHCI_XGENE
 * Rename file sata_xgene.c to ahci_xgene.c
 * Convert to use Hans de Goede version 5 ahci_platform code re-factor changes
   to reduce code duplication. For extra context, use plat_data to store our
   context. The probe function follows the ahci_sunxi implementation. A number
   of code fragments update to reflect this change.
 * Update comment for function xgene_ahci_read_id
 * Minor code move around in function xgene_ahci_do_hardreset and use
   ATA_BUSY instead 0x80
 * Fix hardreset to use start_engine function pointer as required due to newer
   kernel rebased
 * Fix the set DMA mask for 32-bit as well

v10:
 * Update binding documentation

v9:
 * Remove ACPI/EFI include files
 * Remove the IO flush support, interrupt routine, and DTS resources
 * Remove function xgene_rd, xgene_wr, and xgene_wr_flush
 * Remove PMP support (function xgene_ahci_qc_issue, xgene_ahci_qc_prep,
   xgene_ahci_qc_fill_rtf, xgene_ahci_softreset, and xgene_ahci_do_softreset)
 * Rename function xgene_ahci_enable_phy to xgene_ahci_force_phy_rdy
 * Clean up hardreset functions
 * Require v7 of the PHY driver

v8:
 * Remove _ADDR from defines
 * Remove define MSTAWAUX_COHERENT_BYPASS_SET and
   STARAUX_COHERENT_BYPASS_SET and use direct coding
 * Remove the un-necessary check for DTS boot with built in ACPI table
 * Switch to use dma_set_mask_and_coherent for setting DMA mask
 * Remove ACPI table matching code
 * Update clock-names for sata01clk, sata23clk, and sata45clk

v7:
 * Update the clock code by toggle the clock
 * Update the DTS clock mask values due to the clock split between host and
   v5 of the PHY drivers

v6:
 * Update binding documentation
 * Change select PHY_XGENE_SATA to PHY_XGENE
 * Add ULL to constants
 * Change indentation and comments
 * Clean up the probe functions a bit more
 * Remove xgene_ahci_remove function
 * Add the flush register to DTS
 * Remove the interrupt-parent from DTS

v5:
 * Sync up to v3 of the PHY driver
 * Remove MSLIM wrapper functions
 * Change the memory shutdown loop to use usleep_range
 * Use devm_ioremap_resource instead devm_ioremap
 * Remove suspend/resume functions as not needed

v4:
 * Remove the ID property in DT
 * Remove the temporary PHY direct function call and use PHY function
 * Change printk to pr_debug
 * Move the IOB flush addresses into the DT
 * Remove the parameters retrieval function as no longer needed
 * Remove the header file as no longer needed
 * Require v2 patch of the SATA PHY driver. Require slightly modification
   in the Kconfig as it is moved to folder driver/phy and use Kconfig
   PHY_XGENE_SATA instead SATA_XGENE_PHY.

v3:
 * Move out the SATA PHY to another driver
 * Remove the clock-cells entry from DTS
 * Remove debug wrapper
 * Remove delay functions wrapper
 * Clean up resource and IRQ query
 * Remove query clock name
 * Switch to use dma_set_mask/dma_coherent_mask
 * Remove un-necessary devm_kfree
 * Update GPL license header to v2
 * Split up function xgene_ahci_hardreset
 * Split up function xgene_ahci_probe
 * Remove all reference of CONFIG_ARCH_MSLIM
 * Clean up chip revision code

v2:
 * Clean up file sata_xgene.c with Lindent and etc
 * Clean up file sata_xgene_serdes.c with Lindent and etc
 * Add description to each patch

v1:
 * Initial version

Signed-o

[PATCH v13 3/3] arm64: Add APM X-Gene SoC 15Gbps Multi-purpose PHY DTS entries

2014-03-05 Thread Loc Ho
This patch adds the DTS entries for the APM X-Gene SoC 15Gbps Multi-purpose
PHY driver. The PHYs for SATA controllers 2 and 3 are enabled by default.

Signed-off-by: Loc Ho 
Signed-off-by: Tuan Phan 
Signed-off-by: Suman Tripathi 
---
 arch/arm64/boot/dts/apm-storm.dtsi |   75 
 1 files changed, 75 insertions(+), 0 deletions(-)

diff --git a/arch/arm64/boot/dts/apm-storm.dtsi 
b/arch/arm64/boot/dts/apm-storm.dtsi
index d37d736..c78ddcf 100644
--- a/arch/arm64/boot/dts/apm-storm.dtsi
+++ b/arch/arm64/boot/dts/apm-storm.dtsi
@@ -176,6 +176,51 @@
reg-names = "csr-reg";
clock-output-names = "eth8clk";
};
+
+   sataphy1clk: sataphy1clk@1f21c000 {
+   compatible = "apm,xgene-device-clock";
+   #clock-cells = <1>;
+   clocks = <&socplldiv2 0>;
+   clock-names = "socplldiv2";
+   reg = <0x0 0x1f21c000 0x0 0x1000>;
+   reg-names = "csr-reg";
+   clock-output-names = "sataphy1clk";
+   status = "disabled";
+   csr-offset = <0x4>;
+   csr-mask = <0x00>;
+   enable-offset = <0x0>;
+   enable-mask = <0x06>;
+   };
+
+   sataphy2clk: sataphy1clk@1f22c000 {
+   compatible = "apm,xgene-device-clock";
+   #clock-cells = <1>;
+   clocks = <&socplldiv2 0>;
+   clock-names = "socplldiv2";
+   reg = <0x0 0x1f22c000 0x0 0x1000>;
+   reg-names = "csr-reg";
+   clock-output-names = "sataphy2clk";
+   status = "ok";
+   csr-offset = <0x4>;
+   csr-mask = <0x3a>;
+   enable-offset = <0x0>;
+   enable-mask = <0x06>;
+   };
+
+   sataphy3clk: sataphy1clk@1f23c000 {
+   compatible = "apm,xgene-device-clock";
+   #clock-cells = <1>;
+   clocks = <&socplldiv2 0>;
+   clock-names = "socplldiv2";
+   reg = <0x0 0x1f23c000 0x0 0x1000>;
+   reg-names = "csr-reg";
+   clock-output-names = "sataphy3clk";
+   status = "ok";
+   csr-offset = <0x4>;
+   csr-mask = <0x3a>;
+   enable-offset = <0x0>;
+   enable-mask = <0x06>;
+   };
};
 
serial0: serial@1c02 {
@@ -187,5 +232,35 @@
interrupt-parent = <&gic>;
interrupts = <0x0 0x4c 0x4>;
};
+
+   phy1: phy@1f21a000 {
+   compatible = "apm,xgene-phy";
+   reg = <0x0 0x1f21a000 0x0 0x100>;
+   #phy-cells = <1>;
+   clocks = <&sataphy1clk 0>;
+   status = "disabled";
+   apm,tx-boost-gain = <30 30 30 30 30 30>;
+   apm,tx-eye-tuning = <2 10 10 2 10 10>;
+   };
+
+   phy2: phy@1f22a000 {
+   compatible = "apm,xgene-phy";
+   reg = <0x0 0x1f22a000 0x0 0x100>;
+   #phy-cells = <1>;
+   clocks = <&sataphy2clk 0>;
+   status = "ok";
+   apm,tx-boost-gain = <30 30 30 30 30 30>;
+   apm,tx-eye-tuning = <1 10 10 2 10 10>;
+   };
+
+   phy3: phy@1f23a000 {
+   compatible = "apm,xgene-phy";
+   reg = <0x0 0x1f23a000 0x0 0x100>;
+   #phy-cells = <1>;
+   clocks = <&sataphy3clk 0>;
+   status = "ok";
+   apm,tx-boost-gain = <31 31 31 31 31 31>;
+   apm,tx-eye-tuning = <2 10 10 2 10 10>;
+   };
};
 };
-- 
1.5.5



[PATCH v13 1/3] Documentation: Add APM X-Gene SoC 15Gbps Multi-purpose PHY driver binding documentation

2014-03-05 Thread Loc Ho
This patch adds the APM X-Gene SoC 15Gbps Multi-purpose PHY driver binding 
documentation.

Signed-off-by: Loc Ho 
Signed-off-by: Tuan Phan 
Signed-off-by: Suman Tripathi 
---
 .../devicetree/bindings/phy/apm-xgene-phy.txt  |   79 
 1 files changed, 79 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/phy/apm-xgene-phy.txt

diff --git a/Documentation/devicetree/bindings/phy/apm-xgene-phy.txt 
b/Documentation/devicetree/bindings/phy/apm-xgene-phy.txt
new file mode 100644
index 000..5f3a65a
--- /dev/null
+++ b/Documentation/devicetree/bindings/phy/apm-xgene-phy.txt
@@ -0,0 +1,79 @@
+* APM X-Gene 15Gbps Multi-purpose PHY nodes
+
+PHY nodes are defined to describe on-chip 15Gbps Multi-purpose PHY. Each
+PHY (pair of lanes) has its own node.
+
+Required properties:
+- compatible   : Shall be "apm,xgene-phy".
+- reg  : PHY memory resource is the SDS PHY access resource.
+- #phy-cells   : Shall be 1 as it expects one argument for setting
+ the mode of the PHY. Possible values are 0 (SATA),
+ 1 (SGMII), 2 (PCIe), 3 (USB), and 4 (XFI).
+
+Optional properties:
+- status   : Shall be "ok" if enabled or "disabled" if disabled.
+ Default is "ok".
+- clocks   : Reference to the clock entry.
+- apm,tx-eye-tuning: Manual control to fine tune the capture of the serial
+ bit lines from the automatic calibrated position.
+ Two set of 3-tuple setting for each (up to 3)
+ supported link speed on the host. Range from 0 to
+ 127 in unit of one bit period. Default is 10.
+- apm,tx-eye-direction : Eye tuning manual control direction. 0 means sample
+ data earlier than the nominal sampling point. 1 means
+ sample data later than the nominal sampling point.
+ Two set of 3-tuple setting for each (up to 3)
+ supported link speed on the host. Default is 0.
+- apm,tx-boost-gain: Frequency boost AC (LSB 3-bit) and DC (2-bit)
+ gain control. Two set of 3-tuple setting for each
+ (up to 3) supported link speed on the host. Range is
+ between 0 to 31 in unit of dB. Default is 3.
+- apm,tx-amplitude : Amplitude control. Two set of 3-tuple setting for
+ each (up to 3) supported link speed on the host.
+ Range is between 0 to 199500 in unit of uV.
+ Default is 199500 uV.
+- apm,tx-pre-cursor1   : 1st pre-cursor emphasis taps control. Two set of
+ 3-tuple setting for each (up to 3) supported link
+ speed on the host. Range is 0 to 273000 in unit of
+ uV. Default is 0.
+- apm,tx-pre-cursor2   : 2nd pre-cursor emphasis taps control. Two set of
+ 3-tuple setting for each (up to 3) supported link
+ speed on the host. Range is 0 to 127400 in unit uV.
+ Default is 0x0.
+- apm,tx-post-cursor   : Post-cursor emphasis taps control. Two set of
+ 3-tuple setting for Gen1, Gen2, and Gen3. Range is
+ between 0 to 0x1f in unit of 18.2mV. Default is 0xf.
+- apm,tx-speed : Tx operating speed. One set of 3-tuple for each
+ supported link speed on the host.
+  0 = 1-2Gbps
+  1 = 2-4Gbps (1st tuple default)
+  2 = 4-8Gbps
+  3 = 8-15Gbps (2nd tuple default)
+  4 = 2.5-4Gbps
+  5 = 4-5Gbps
+  6 = 5-6Gbps
+  7 = 6-16Gbps (3rd tuple default)
+
+NOTE: PHY override parameters are board specific setting.
+
+Example:
+   phy1: phy@1f21a000 {
+   compatible = "apm,xgene-phy";
+   reg = <0x0 0x1f21a000 0x0 0x100>;
+   #phy-cells = <1>;
+   status = "disabled";
+   };
+
+   phy2: phy@1f22a000 {
+   compatible = "apm,xgene-phy";
+   reg = <0x0 0x1f22a000 0x0 0x100>;
+   #phy-cells = <1>;
+   status = "ok";
+   };
+
+   phy3: phy@1f23a000 {
+   compatible = "apm,xgene-phy";
+   reg = <0x0 0x1f23a000 0x0 0x100>;
+   #phy-cells = <1>;
+   status = "ok";
+   };
-- 
1.5.5
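
The binding above defines the single #phy-cells argument as a mode
selector: 0 (SATA), 1 (SGMII), 2 (PCIe), 3 (USB), and 4 (XFI). As a rough
sketch of how a consumer of that cell might validate and name it, the
snippet below maps the value to a string and rejects anything out of range.
The enum and the phy_mode_name() helper are hypothetical illustrations and
are not taken from the driver in this series.

```c
#include <stddef.h>

/* Mode values as listed in the binding text for the #phy-cells argument. */
enum phy_mode_cell {
	PHY_CELL_SATA  = 0,
	PHY_CELL_SGMII = 1,
	PHY_CELL_PCIE  = 2,
	PHY_CELL_USB   = 3,
	PHY_CELL_XFI   = 4,
};

/* Return a printable name for the mode cell, or NULL if it is not defined. */
static const char *phy_mode_name(unsigned int cell)
{
	static const char *const names[] = {
		[PHY_CELL_SATA]  = "SATA",
		[PHY_CELL_SGMII] = "SGMII",
		[PHY_CELL_PCIE]  = "PCIe",
		[PHY_CELL_USB]   = "USB",
		[PHY_CELL_XFI]   = "XFI",
	};

	if (cell >= sizeof(names) / sizeof(names[0]))
		return NULL;
	return names[cell];
}
```

In the DTS examples in this series, a reference such as phys = <&phy2 0>
therefore selects SATA mode.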


[PATCH v15 1/3] Documentation: Add documentation for the APM X-Gene SoC SATA host controller DTS binding

2014-03-05 Thread Loc Ho
This patch adds documentation for the APM X-Gene SoC SATA host controller DTS
binding.

Signed-off-by: Loc Ho 
Signed-off-by: Tuan Phan 
Signed-off-by: Suman Tripathi 
---
 .../devicetree/bindings/ata/apm-xgene.txt  |   70 
 1 files changed, 70 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/ata/apm-xgene.txt

diff --git a/Documentation/devicetree/bindings/ata/apm-xgene.txt 
b/Documentation/devicetree/bindings/ata/apm-xgene.txt
new file mode 100644
index 000..633eb3b
--- /dev/null
+++ b/Documentation/devicetree/bindings/ata/apm-xgene.txt
@@ -0,0 +1,70 @@
+* APM X-Gene 6.0 Gb/s SATA host controller nodes
+
+SATA host controller nodes are defined to describe on-chip Serial ATA
+controllers. Each SATA controller (pair of ports) has its own node.
+
+Required properties:
+- compatible   : Shall contain:
+  * "apm,xgene-ahci-sgmii" if mux'ed with SGMII
+  * "apm,xgene-ahci-pcie" if mux'ed with PCIe
+- reg  : First memory resource shall be the AHCI memory
+ resource.
+ Second memory resource shall be the host controller
+ memory resource.
+- interrupts   : Interrupt-specifier for SATA host controller IRQ.
+- clocks   : Reference to the clock entry.
+- phys : A list of phandles + phy-specifiers, one for each
+ entry in phy-names.
+- phy-names: Should contain:
+  * "sata-6g" for the SATA 6.0Gbps PHY
+
+Optional properties:
+- status   : Shall be "ok" if enabled or "disabled" if disabled.
+ Default is "ok".
+- interrupt-parent : Interrupt controller.
+
+Example:
+   sataclk: sataclk {
+   compatible = "fixed-clock";
+   #clock-cells = <1>;
+   clock-frequency = <1>;
+   clock-output-names = "sataclk";
+   };
+
+   phy2: phy@1f22a000 {
+   compatible = "apm,xgene-phy";
+   reg = <0x0 0x1f22a000 0x0 0x100>,
+ <0x0 0x1f22c000 0x0 0x100>;
+   #phy-cells = <1>;
+   };
+
+   phy3: phy@1f23a000 {
+   compatible = "apm,xgene-phy";
+   reg = <0x0 0x1f23a000 0x0 0x100>,
+ <0x0 0x1f23c000 0x0 0x100>;
+   #phy-cells = <1>;
+   };
+
+   sata2: sata@1a40 {
+   compatible = "apm,xgene-ahci-sgmii";
+   reg = <0x0 0x1a40 0x0 0x1000>,
+ <0x0 0x1f22 0x0 0x1>;
+   interrupt-parent = <&gic>;
+   interrupts = <0x0 0x87 0x4>;
+   status = "ok";
+   clocks = <&sataclk 0>;
+   phys = <&phy2 0>;
+   phy-names = "sata-6g";
+   };
+
+   sata3: sata@1a80 {
+   compatible = "apm,xgene-ahci-pcie";
+   reg = <0x0 0x1a80 0x0 0x1000>,
+ <0x0 0x1f23 0x0 0x1>;
+   interrupt-parent = <&gic>;
+   interrupts = <0x0 0x88 0x4>;
+   status = "ok";
+   clocks = <&sataclk 0>;
+   phys = <&phy3 0>;
+   phy-names = "sata-6g";
+   };
-- 
1.5.5



[PATCH v15 2/3] ata: Add APM X-Gene SoC AHCI SATA host controller driver

2014-03-05 Thread Loc Ho
This patch adds support for the APM X-Gene SoC AHCI SATA host controller
driver. It requires the corresponding APM X-Gene SoC PHY driver. This
initial version only supports Gen3 speed.

Signed-off-by: Loc Ho 
Signed-off-by: Tuan Phan 
Signed-off-by: Suman Tripathi 
---
 drivers/ata/Kconfig  |7 +
 drivers/ata/Makefile |1 +
 drivers/ata/ahci_xgene.c |  490 ++
 3 files changed, 498 insertions(+), 0 deletions(-)
 create mode 100644 drivers/ata/ahci_xgene.c

diff --git a/drivers/ata/Kconfig b/drivers/ata/Kconfig
index b4a9262..7b9a91a 100644
--- a/drivers/ata/Kconfig
+++ b/drivers/ata/Kconfig
@@ -115,6 +115,13 @@ config AHCI_SUNXI
 
  If unsure, say N.
 
+config AHCI_XGENE
+   tristate "APM X-Gene 6.0Gbps AHCI SATA host controller support"
+   depends on SATA_AHCI_PLATFORM && (ARM64 || COMPILE_TEST)
+   select PHY_XGENE
+   help
+This option enables support for APM X-Gene SoC SATA host controller.
+
 config SATA_FSL
tristate "Freescale 3.0Gbps SATA support"
depends on FSL_SOC
diff --git a/drivers/ata/Makefile b/drivers/ata/Makefile
index 246050b..72b423b 100644
--- a/drivers/ata/Makefile
+++ b/drivers/ata/Makefile
@@ -12,6 +12,7 @@ obj-$(CONFIG_SATA_DWC)+= sata_dwc_460ex.o
 obj-$(CONFIG_SATA_HIGHBANK)+= sata_highbank.o libahci.o
 obj-$(CONFIG_AHCI_IMX) += ahci_imx.o
 obj-$(CONFIG_AHCI_SUNXI)   += ahci_sunxi.o
+obj-$(CONFIG_AHCI_XGENE)   += ahci_xgene.o
 
 # SFF w/ custom DMA
 obj-$(CONFIG_PDC_ADMA) += pdc_adma.o
diff --git a/drivers/ata/ahci_xgene.c b/drivers/ata/ahci_xgene.c
new file mode 100644
index 000..df37c78
--- /dev/null
+++ b/drivers/ata/ahci_xgene.c
@@ -0,0 +1,490 @@
+/*
+ * AppliedMicro X-Gene SoC SATA Host Controller Driver
+ *
+ * Copyright (c) 2014, Applied Micro Circuits Corporation
+ * Author: Loc Ho 
+ * Tuan Phan 
+ * Suman Tripathi 
+ *
+ * This program is free software; you can redistribute  it and/or modify it
+ * under  the terms of  the GNU General  Public License as published by the
+ * Free Software Foundation;  either version 2 of the  License, or (at your
+ * option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "ahci.h"
+
+/* Controller who PHY shared with SGMII Ethernet PHY */
+#define XGENE_AHCI_SGMII_DTS   "apm,xgene-ahci-sgmii"
+
+/* Controller who PHY (internal reference clock macro) shared with PCIe */
+#define XGENE_AHCI_PCIE_DTS"apm,xgene-ahci-pcie"
+
+/* Max # of disk per a controller */
+#define MAX_AHCI_CHN_PERCTR2
+
+#define SATA_ENET_MUX_OFFSET   0x7000
+#define SATA_DIAG_OFFSET   0xD000
+#define SATA_GLB_OFFSET0xD850
+#define SATA_SHIM_OFFSET   0xE000
+#define SATA_MASTER_OFFSET 0xF000
+#define SATA_PORT0_OFFSET  0x0100
+#define SATA_PORT1_OFFSET  0x0180
+
+/* MUX CSR */
+#define SATA_ENET_CONFIG_REG   0x
+#define  CFG_SATA_ENET_SELECT_MASK 0x0001
+
+/* SATA host controller CSR */
+#define SLVRDERRATTRIBUTES 0x
+#define SLVWRERRATTRIBUTES 0x0004
+#define MSTRDERRATTRIBUTES 0x0008
+#define MSTWRERRATTRIBUTES 0x000c
+#define BUSCTLREG  0x0014
+#define IOFMSTRWAUX0x0018
+#define INTSTATUSMASK  0x002c
+#define ERRINTSTATUS   0x0030
+#define ERRINTSTATUSMASK   0x0034
+
+/* SATA host AHCI CSR */
+#define PORTCFG0x00a4
+#define  PORTADDR_SET(dst, src) \
+   (((dst) & ~0x003f) | (((u32)(src)) & 0x003f))
+#define PORTPHY1CFG0x00a8
+#define PORTPHY1CFG_FRCPHYRDY_SET(dst, src) \
+   (((dst) & ~0x0010) | (((u32)(src) << 0x14) & 0x0010))
+#define PORTPHY2CFG0x00ac
+#define PORTPHY3CFG0x00b0
+#define PORTPHY4CFG0x00b4
+#define PORTPHY5CFG0x00b8
+#define SCTL0  0x012C
+#define PORTPHY5CFG_RTCHG_SET(dst, src) \
+   (((dst) & ~0xfff0) | (((u32)(src) << 0x14) & 0xfff0))
+#define PORTAXICFG_EN_CONTEXT_SET(dst, src) \
+   (((dst) & ~0x0100) | (((u32)(src) << 0x18) & 0x0100))
+#define PORTAXICFG 0x00bc
+#define PORTAXICFG_OUTTRANS_SET(dst, src) \
+   
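
The *_SET(dst, src) macros in the fragment above all follow the same
read-modify-write shape: clear the field in the old value, then OR in the
new value shifted and masked into place. The mask constants in this
archived copy appear to have lost digits, so rather than guess at them the
sketch below uses a hypothetical single-bit field at bit 20 (matching the
"<< 0x14" shift shown for the FRCPHYRDY macro). FIELD_SET and the DEMO_*
names are illustrative and are not from the driver.

```c
#include <stdint.h>

/*
 * Generic field update: keep every bit of dst outside mask, and place
 * src (shifted left by shift) into the bits covered by mask.
 */
#define FIELD_SET(dst, src, shift, mask) \
	(((dst) & ~(uint32_t)(mask)) | \
	 (((uint32_t)(src) << (shift)) & (uint32_t)(mask)))

/* Hypothetical one-bit field at bit 20, assumed for this example only. */
#define DEMO_FRCPHYRDY_SHIFT 20
#define DEMO_FRCPHYRDY_MASK  (1u << DEMO_FRCPHYRDY_SHIFT)

/* Set or clear the demo bit without disturbing the rest of the register. */
static uint32_t demo_set_frcphyrdy(uint32_t reg, uint32_t on)
{
	return FIELD_SET(reg, on, DEMO_FRCPHYRDY_SHIFT, DEMO_FRCPHYRDY_MASK);
}
```

Masking the shifted source as well as the destination means an
out-of-range src cannot spill into neighbouring fields.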

[PATCH v15 3/3] arm64: Add APM X-Gene SoC AHCI SATA host controller DTS entries

2014-03-05 Thread Loc Ho
This patch adds APM X-Gene SoC AHCI SATA host controller DTS entries.

Signed-off-by: Loc Ho 
Signed-off-by: Tuan Phan 
Signed-off-by: Suman Tripathi 
---
 arch/arm64/boot/dts/apm-storm.dtsi |   75 
 1 files changed, 75 insertions(+), 0 deletions(-)

diff --git a/arch/arm64/boot/dts/apm-storm.dtsi b/arch/arm64/boot/dts/apm-storm.dtsi
index c78ddcf..2a03e96 100644
--- a/arch/arm64/boot/dts/apm-storm.dtsi
+++ b/arch/arm64/boot/dts/apm-storm.dtsi
@@ -221,6 +221,48 @@
enable-offset = <0x0>;
enable-mask = <0x06>;
};
+
+   sata01clk: sata01clk@1f21c000 {
+   compatible = "apm,xgene-device-clock";
+   #clock-cells = <1>;
+   clocks = <&socplldiv2 0>;
+   clock-names = "socplldiv2";
+   reg = <0x0 0x1f21c000 0x0 0x1000>;
+   reg-names = "csr-reg";
+   clock-output-names = "sata01clk";
+   csr-offset = <0x4>;
+   csr-mask = <0x05>;
+   enable-offset = <0x0>;
+   enable-mask = <0x39>;
+   };
+
+   sata23clk: sata23clk@1f22c000 {
+   compatible = "apm,xgene-device-clock";
+   #clock-cells = <1>;
+   clocks = <&socplldiv2 0>;
+   clock-names = "socplldiv2";
+   reg = <0x0 0x1f22c000 0x0 0x1000>;
+   reg-names = "csr-reg";
+   clock-output-names = "sata23clk";
+   csr-offset = <0x4>;
+   csr-mask = <0x05>;
+   enable-offset = <0x0>;
+   enable-mask = <0x39>;
+   };
+
+   sata45clk: sata45clk@1f23c000 {
+   compatible = "apm,xgene-device-clock";
+   #clock-cells = <1>;
+   clocks = <&socplldiv2 0>;
+   clock-names = "socplldiv2";
+   reg = <0x0 0x1f23c000 0x0 0x1000>;
+   reg-names = "csr-reg";
+   clock-output-names = "sata45clk";
+   csr-offset = <0x4>;
+   csr-mask = <0x05>;
+   enable-offset = <0x0>;
+   enable-mask = <0x39>;
+   };
};
 
serial0: serial@1c02 {
@@ -262,5 +304,38 @@
apm,tx-boost-gain = <31 31 31 31 31 31>;
apm,tx-eye-tuning = <2 10 10 2 10 10>;
};
+
+   sata1: sata@1a00 {
+   compatible = "apm,xgene-ahci-sgmii";
+   reg = <0x0 0x1a00 0x0 0x1000>,
+ <0x0 0x1f21 0x0 0x1>;
+   interrupts = <0x0 0x86 0x4>;
+   status = "disabled";
+   clocks = <&sata01clk 0>;
+   phys = <&phy1 0>;
+   phy-names = "sata-phy";
+   };
+
+   sata2: sata@1a40 {
+   compatible = "apm,xgene-ahci-sgmii";
+   reg = <0x0 0x1a40 0x0 0x1000>,
+ <0x0 0x1f22 0x0 0x1>;
+   interrupts = <0x0 0x87 0x4>;
+   status = "ok";
+   clocks = <&sata23clk 0>;
+   phys = <&phy2 0>;
+   phy-names = "sata-phy";
+   };
+
+   sata3: sata@1a80 {
+   compatible = "apm,xgene-ahci-pcie";
+   reg = <0x0 0x1a80 0x0 0x1000>,
+ <0x0 0x1f23 0x0 0x1>;
+   interrupts = <0x0 0x88 0x4>;
+   status = "ok";
+   clocks = <&sata45clk 0>;
+   phys = <&phy3 0>;
+   phy-names = "sata-phy";
+   };
};
 };
-- 
1.5.5



Re: [PATCH RESEND] SCSI: sd: don't fail if the device doesn't recognize SYNCHRONIZE CACHE

2014-03-05 Thread Dan Williams
On Mon, Mar 3, 2014 at 1:25 PM, Daniel Mack wrote:
> From: Alan Stern 
>
> Evidently some wacky USB-ATA bridges don't recognize the SYNCHRONIZE
> CACHE command, as shown in this email thread:
>
> http://marc.info/?t=13897835622&r=1&w=2
>
> The fact that we can't tell them to drain their caches shouldn't
> prevent the system from going into suspend.  Therefore sd_sync_cache()
> shouldn't return an error if the device replies with an Invalid
> Command ASC.
>
> Signed-off-by: Alan Stern 
> Reported-by: Sven Neumann 
> Tested-by: Daniel Mack 
> CC: Oliver Neukum 
> CC: 
> ---
> Hi,
>
> this patch has been around for a while, but hasn't gained much
> traction, and hasn't been merged anywhere yet. Which is sad,
> as it fixes a bug on real hardware when going to suspend :)
>
> Could anyone from the SCSI people have a quick look maybe?

Acked-by: Dan Williams 

But I agree with Tejun [1] that this likely does not go far enough.
We should also be looking at failing future writes to the device or
disabling the cache.

Tejun's comment:
"Ooh, yeah, flush failure is special.  That said, I think the right way
to deal with that is marking the device as failed and fail writes /
flushes afterwards instead of failing suspend.  It's highly unlikely
the device is in any usable state after failing flushes anyway and
failing suspend has potential to lead to pretty dramatic failure
conditions (device overheating in the bag would be a common one) too."

[1]: http://marc.info/?l=linux-scsi&m=138998568010393&w=2
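
To make the two ideas above concrete, here is a rough userspace sketch,
not the actual sd.c change. It assumes a simplified, hypothetical
sense_hdr and disk_state (neither is a kernel structure) and made-up
function names. The first function treats an Invalid Command ASC (0x20)
reply to SYNCHRONIZE CACHE as a non-error so suspend is not blocked; the
flag it sets models the follow-up suggestion of failing later writes.

```c
#include <stdbool.h>
#include <stdint.h>

/* ASC 0x20: INVALID COMMAND OPERATION CODE */
#define ASC_INVALID_COMMAND 0x20

/* Hypothetical, simplified stand-in for decoded SCSI sense data. */
struct sense_hdr {
	bool valid;	/* sense data was returned and decoded */
	uint8_t asc;	/* additional sense code */
};

/* Hypothetical per-disk state used only for this illustration. */
struct disk_state {
	bool flush_unsupported;	/* device rejected SYNCHRONIZE CACHE */
};

/*
 * Classify the outcome of a SYNCHRONIZE CACHE attempt.
 * Returns 0 if suspend may proceed, -1 if the failure should be reported.
 */
int sync_cache_result(struct disk_state *d, bool cmd_failed,
		      const struct sense_hdr *s)
{
	if (!cmd_failed)
		return 0;

	if (s->valid && s->asc == ASC_INVALID_COMMAND) {
		/* The device cannot flush at all: remember it, do not fail. */
		d->flush_unsupported = true;
		return 0;
	}

	return -1;
}

/*
 * Follow-up idea (assumption, not part of the patch): refuse further
 * writes once we know the cache cannot be drained.
 */
bool write_allowed(const struct disk_state *d)
{
	return !d->flush_unsupported;
}
```

Whether to fail writes or to disable the write cache once the flag is
set is exactly the open question in this thread; the sketch picks the
first option only to keep the example short.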