date:20180207

Re: [PATCH v2 01/13] lpfc: Rework lpfc to allow different sli4 cq and eq handlers

2018-02-07 Thread Johannes Thumshirn

I first though of renameing:
lpfc_sli4_eq_clr_intr -> lpfc_sli4_t2_eq_clr_intr 
lpfc_sli4_eq_release -> lpfc_sli4_t2_eq_release 
lpfc_sli4_cq_release -> lpfc_sli4_t2_cq_release 

and then make lpfc_sli4_eq_clr_intr, etc... wrappers over the new function
pointers so and you could keep the callsites, but then you'd have to change
the callsites anyways as the function prototype will need to change.

So looks good,
Reviewed-by: Johannes Thumshirn 

-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

Re: [PATCH v2 02/13] lpfc: Rework sli4 doorbell infrastructure

2018-02-07 Thread Johannes Thumshirn

> diff --git a/drivers/scsi/lpfc/lpfc_sli4.h b/drivers/scsi/lpfc/lpfc_sli4.h
> index 0c0cbe296fed..e2f06c92c4dd 100644
> --- a/drivers/scsi/lpfc/lpfc_sli4.h
> +++ b/drivers/scsi/lpfc/lpfc_sli4.h
> @@ -569,7 +569,8 @@ struct lpfc_sli4_hba {
>   /* IF type 0, BAR 0 and if type 2, BAR 0 doorbell register memory map */
>   void __iomem *RQDBregaddr;  /* RQ_DOORBELL register */
>   void __iomem *WQDBregaddr;  /* WQ_DOORBELL register */
> - void __iomem *EQCQDBregaddr;/* EQCQ_DOORBELL register */
> + void __iomem *CQDBregaddr;  /* CQ_DOORBELL register */
> + void __iomem *EQDBregaddr;  /* EQ_DOORBELL register */
>   void __iomem *MQDBregaddr;  /* MQ_DOORBELL register */
>   void __iomem *BMBXregaddr;  /* BootStrap MBX register */

Reviewed-by: Johannes Thumshirn 

Although CQDBregaddr, EQDBregaddr, BMBXregaddr, etc... really look ugly and
this would've been the perfect chance to change it ;-)

-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

Re: [PATCH v2 03/13] lpfc: Add SLI-4 if_type=6 support to the code base

2018-02-07 Thread Johannes Thumshirn

Looks good,
Reviewed-by: Johannes Thumshirn 
-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

Re: [PATCH v2 04/13] lpfc: Add push-to-adapter support to sli4

2018-02-07 Thread Johannes Thumshirn

> + if ((if_type == LPFC_SLI_INTF_IF_TYPE_6) &&
> + (pci_resource_start(pdev, PCI_64BIT_BAR4))) {

The above contains a lot of unneeded parenthesis.

[...]

> + /* Enable combined writes for DPP aperture */
> + pg_addr = (unsigned long)(wq->dpp_regaddr) & PAGE_MASK;
> +#ifdef CONFIG_X86
> + rc = set_memory_wc(pg_addr, 1);
> + if (rc) {
> + lpfc_printf_log(phba, KERN_ERR, LOG_INIT,
> + "3272 Cannot setup Combined "
> + "Write on WQ[%d] - disable 
> DPP\n",
> + wq->queue_id);
> + phba->cfg_enable_dpp = 0;
> + }
> +#else
> + phba->cfg_enable_dpp = 0;
> +#endif
> + } else
> + wq->db_regaddr = phba->sli4_hba.WQDBregaddr;

I don't really like the set_memory_wc() call here. Neither do I like the ifdef
CONFIG_X86 special casing.

If you really need write combining, can't you at least use ioremap_wc()?

Thanks,
Johannes

-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

Re: [PATCH v2 05/13] lpfc: Add PCI Ids for if_type=6 hardware

2018-02-07 Thread Johannes Thumshirn

Looks good,
Reviewed-by: Johannes Thumshirn 
-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

Re: [PATCH v2 06/13] lpfc: Add 64G link speed support

2018-02-07 Thread Johannes Thumshirn

On Tue, Feb 06, 2018 at 06:28:44PM -0800, James Smart wrote:
>   lpfc_printf_log(phba, KERN_ERR, LOG_INIT,
> - "0469 lpfc_link_speed attribute cannot be set to %d, "
> - "allowed values are ["LPFC_LINK_SPEED_STRING"]\n", val);
> + "0469 lpfc_link_speed attribute cannot be set to %d, "
> + "allowed values are ["LPFC_LINK_SPEED_STRING"]\n", val);

lpfc_printf_log(phba, KERN_ERR, LOG_INIT,
"0469 lpfc_link_speed attribute cannot be set to %d, "
"allowed values are [%s]\n",
val, LPFC_LINK_SPEED_STRING);

And possible have the whole quoted string on one line for easier grepping.
checkplatch.pl should've told you that.

Thanks,
Johannes
-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

Re: [PATCH v2 07/13] lpfc: Add if_type=6 support for cycling valid bits

2018-02-07 Thread Johannes Thumshirn

Looks good,
Reviewed-by: Johannes Thumshirn 
-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

Re: [PATCH v2 08/13] lpfc: Enable fw download on if_type=6 devices

2018-02-07 Thread Johannes Thumshirn

Looks good,
Reviewed-by: Johannes Thumshirn 
-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

Re: [PATCH v2 09/13] lpfc: Add embedded data pointers for enhanced performance

2018-02-07 Thread Johannes Thumshirn

On Tue, Feb 06, 2018 at 06:28:47PM -0800, James Smart wrote:
> + if (do_pbde && (i == 0)) {
Nit: 
if (do_pde && i == 0) {

Else:

Reviewed-by: Johannes Thumshirn 
-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

Re: [PATCH v2 04/13] lpfc: Add push-to-adapter support to sli4

2018-02-07 Thread Johannes Thumshirn

On Wed, Feb 07, 2018 at 10:51:57AM +0100, Johannes Thumshirn wrote:
> > +   /* Enable combined writes for DPP aperture */
> > +   pg_addr = (unsigned long)(wq->dpp_regaddr) & PAGE_MASK;
> > +#ifdef CONFIG_X86
> > +   rc = set_memory_wc(pg_addr, 1);
> > +   if (rc) {
> > +   lpfc_printf_log(phba, KERN_ERR, LOG_INIT,
> > +   "3272 Cannot setup Combined "
> > +   "Write on WQ[%d] - disable 
> > DPP\n",
> > +   wq->queue_id);
> > +   phba->cfg_enable_dpp = 0;
> > +   }
> > +#else
> > +   phba->cfg_enable_dpp = 0;
> > +#endif
> > +   } else
> > +   wq->db_regaddr = phba->sli4_hba.WQDBregaddr;
> 
> I don't really like the set_memory_wc() call here. Neither do I like the ifdef
> CONFIG_X86 special casing.
> 
> If you really need write combining, can't you at least use ioremap_wc()?

Coming back to this again (after talking to our ARM/POWER folks internally).
Is this really x86 specific here? I know there are servers with other 
architectures
using lpfcs out there.

I _think_ write combining should be possible on other architectures (that have
PCIe and aren't dead) as well.

The ioremap_wc() I suggested is probably wrong.

So can you please revisit this? I CCed Mark and Michael, maybe they can help
here.

Byte,
Johannes

-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

Re: [PATCH v2 10/13] lpfc: Fix nvme embedded io length on new hardware

2018-02-07 Thread Johannes Thumshirn

Looks good,
Reviewed-by: Johannes Thumshirn 
-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

Re: [PATCH v2 12/13] lpfc: update driver version to 12.0.0.0

2018-02-07 Thread Johannes Thumshirn

Looks good,
Reviewed-by: Johannes Thumshirn 
-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

Re: [PATCH v2 11/13] lpfc: Work around NVME cmd iu SGL type

2018-02-07 Thread Johannes Thumshirn

Looks good,
Reviewed-by: Johannes Thumshirn 
-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

Re: [PATCH v2 13/13] lpfc: Update 12.0.0.0 modified files for 2018 Copyright

2018-02-07 Thread Johannes Thumshirn

Looks good,
Reviewed-by: Johannes Thumshirn 

Although I'd change the subject to:
"lpfc: Change Copyright of Update 12.0.0.0 modified files to 2018" 

Reads easier for the non-native english speaker (like me).
-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

[V1 2/6] mpt3sas: Configure reply post queue depth, DMA and sgl tablesize.

2018-02-07 Thread Suganath Prabu S

This configures shost max sector to 128, single reply descriptor
post queue, sgl table size to 16 and 32 bit DMA for MPI Endpoint
and it supports 64K as max IO.

Signed-off-by: Suganath Prabu S 
---
 drivers/scsi/mpt3sas/mpt3sas_base.c  | 47 +++-
 drivers/scsi/mpt3sas/mpt3sas_scsih.c | 40 ++
 2 files changed, 54 insertions(+), 33 deletions(-)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c 
b/drivers/scsi/mpt3sas/mpt3sas_base.c
index 13d6e4e..f45da9a 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
@@ -2214,6 +2214,9 @@ _base_config_dma_addressing(struct MPT3SAS_ADAPTER *ioc, 
struct pci_dev *pdev)
struct sysinfo s;
u64 consistent_dma_mask;
 
+   if (ioc->is_mcpu_endpoint)
+   goto try_32bit;
+
if (ioc->dma_mask)
consistent_dma_mask = DMA_BIT_MASK(64);
else
@@ -2232,6 +2235,7 @@ _base_config_dma_addressing(struct MPT3SAS_ADAPTER *ioc, 
struct pci_dev *pdev)
}
}
 
+ try_32bit:
if (!pci_set_dma_mask(pdev, DMA_BIT_MASK(32))
&& !pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(32))) {
ioc->base_add_sg_single = &_base_add_sg_single_32;
@@ -3887,17 +3891,21 @@ _base_allocate_memory_pools(struct MPT3SAS_ADAPTER *ioc)
sg_tablesize = min_t(unsigned short, sg_tablesize,
   MPT_KDUMP_MIN_PHYS_SEGMENTS);
 
-   if (sg_tablesize < MPT_MIN_PHYS_SEGMENTS)
-   sg_tablesize = MPT_MIN_PHYS_SEGMENTS;
-   else if (sg_tablesize > MPT_MAX_PHYS_SEGMENTS) {
-   sg_tablesize = min_t(unsigned short, sg_tablesize,
- SG_MAX_SEGMENTS);
-   pr_warn(MPT3SAS_FMT
-"sg_tablesize(%u) is bigger than kernel"
-" defined SG_CHUNK_SIZE(%u)\n", ioc->name,
-sg_tablesize, MPT_MAX_PHYS_SEGMENTS);
+   if (ioc->is_mcpu_endpoint)
+   ioc->shost->sg_tablesize = MPT_MIN_PHYS_SEGMENTS;
+   else {
+   if (sg_tablesize < MPT_MIN_PHYS_SEGMENTS)
+   sg_tablesize = MPT_MIN_PHYS_SEGMENTS;
+   else if (sg_tablesize > MPT_MAX_PHYS_SEGMENTS) {
+   sg_tablesize = min_t(unsigned short, sg_tablesize,
+   SG_MAX_SEGMENTS);
+   pr_warn(MPT3SAS_FMT
+   "sg_tablesize(%u) is bigger than kernel "
+   "defined SG_CHUNK_SIZE(%u)\n", ioc->name,
+   sg_tablesize, MPT_MAX_PHYS_SEGMENTS);
+   }
+   ioc->shost->sg_tablesize = sg_tablesize;
}
-   ioc->shost->sg_tablesize = sg_tablesize;
 
ioc->internal_depth = min_t(int, (facts->HighPriorityCredit + (5)),
(facts->RequestCredit / 4));
@@ -3982,13 +3990,18 @@ _base_allocate_memory_pools(struct MPT3SAS_ADAPTER *ioc)
/* reply free queue sizing - taking into account for 64 FW events */
ioc->reply_free_queue_depth = ioc->hba_queue_depth + 64;
 
-   /* calculate reply descriptor post queue depth */
-   ioc->reply_post_queue_depth = ioc->hba_queue_depth +
-   ioc->reply_free_queue_depth +  1 ;
-   /* align the reply post queue on the next 16 count boundary */
-   if (ioc->reply_post_queue_depth % 16)
-   ioc->reply_post_queue_depth += 16 -
-   (ioc->reply_post_queue_depth % 16);
+   /* mCPU manage single counters for simplicity */
+   if (ioc->is_mcpu_endpoint)
+   ioc->reply_post_queue_depth = ioc->reply_free_queue_depth;
+   else {
+   /* calculate reply descriptor post queue depth */
+   ioc->reply_post_queue_depth = ioc->hba_queue_depth +
+   ioc->reply_free_queue_depth +  1;
+   /* align the reply post queue on the next 16 count boundary */
+   if (ioc->reply_post_queue_depth % 16)
+   ioc->reply_post_queue_depth += 16 -
+   (ioc->reply_post_queue_depth % 16);
+   }
 
if (ioc->reply_post_queue_depth >
facts->MaxReplyDescriptorPostQueueDepth) {
diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c 
b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index bde3c6f..5e52679 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -10521,26 +10521,34 @@ _scsih_probe(struct pci_dev *pdev, const struct 
pci_device_id *id)
shost->transportt = mpt3sas_transport_template;
shost->unique_id = ioc->id;
 
-   if (max_sectors != 0x) {
-   if (max_sectors < 64) {
-   shost->max_sectors = 64;
-   pr_warn(MPT3SAS_FMT "Invalid value %d passed " \
-   "for max_sectors, range is 64 to 32767. Assigning "
-   "va

[V1 1/6] mpt3sas: Add PCI device ID for Andromeda.

2018-02-07 Thread Suganath Prabu S

Add device ID and flag for Andromeda/MPI Emdpont.

Signed-off-by: Suganath Prabu S 
---
 drivers/scsi/mpt3sas/mpi/mpi2_cnfg.h |  1 +
 drivers/scsi/mpt3sas/mpt3sas_base.h  |  1 +
 drivers/scsi/mpt3sas/mpt3sas_scsih.c | 14 --
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/mpt3sas/mpi/mpi2_cnfg.h 
b/drivers/scsi/mpt3sas/mpi/mpi2_cnfg.h
index ee11710..0ad88de 100644
--- a/drivers/scsi/mpt3sas/mpi/mpi2_cnfg.h
+++ b/drivers/scsi/mpt3sas/mpi/mpi2_cnfg.h
@@ -524,6 +524,7 @@ typedef struct _MPI2_CONFIG_REPLY {
 #define MPI2_MFGPAGE_DEVID_SAS2308_1(0x0086)
 #define MPI2_MFGPAGE_DEVID_SAS2308_2(0x0087)
 #define MPI2_MFGPAGE_DEVID_SAS2308_3(0x006E)
+#define MPI2_MFGPAGE_DEVID_SAS2308_MPI_EP   (0x02B0)
 
 /*MPI v2.5 SAS products */
 #define MPI25_MFGPAGE_DEVID_SAS3004 (0x0096)
diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.h 
b/drivers/scsi/mpt3sas/mpt3sas_base.h
index 789bc42..897394d 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.h
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.h
@@ -1336,6 +1336,7 @@ struct MPT3SAS_ADAPTER {
u32 ring_buffer_offset;
u32 ring_buffer_sz;
u8  is_warpdrive;
+   u8  is_mcpu_endpoint;
u8  hide_ir_msg;
u8  mfg_pg10_hide_flag;
u8  hide_drives;
diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c 
b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index 74fca18..bde3c6f 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -10335,6 +10335,7 @@ _scsih_determine_hba_mpi_version(struct pci_dev *pdev)
case MPI2_MFGPAGE_DEVID_SAS2308_1:
case MPI2_MFGPAGE_DEVID_SAS2308_2:
case MPI2_MFGPAGE_DEVID_SAS2308_3:
+   case MPI2_MFGPAGE_DEVID_SAS2308_MPI_EP:
return MPI2_VERSION;
case MPI25_MFGPAGE_DEVID_SAS3004:
case MPI25_MFGPAGE_DEVID_SAS3008:
@@ -10412,11 +10413,18 @@ _scsih_probe(struct pci_dev *pdev, const struct 
pci_device_id *id)
ioc->hba_mpi_version_belonged = hba_mpi_version;
ioc->id = mpt2_ids++;
sprintf(ioc->driver_name, "%s", MPT2SAS_DRIVER_NAME);
-   if (pdev->device == MPI2_MFGPAGE_DEVID_SSS6200) {
+   switch (pdev->device) {
+   case MPI2_MFGPAGE_DEVID_SSS6200:
ioc->is_warpdrive = 1;
ioc->hide_ir_msg = 1;
-   } else
+   break;
+   case MPI2_MFGPAGE_DEVID_SAS2308_MPI_EP:
+   ioc->is_mcpu_endpoint = 1;
+   break;
+   default:
ioc->mfg_pg10_hide_flag = MFG_PAGE10_EXPOSE_ALL_DISKS;
+   break;
+   }
break;
case MPI25_VERSION:
case MPI26_VERSION:
@@ -10845,6 +10853,8 @@ static const struct pci_device_id mpt3sas_pci_table[] = 
{
PCI_ANY_ID, PCI_ANY_ID },
{ MPI2_MFGPAGE_VENDORID_LSI, MPI2_MFGPAGE_DEVID_SAS2308_3,
PCI_ANY_ID, PCI_ANY_ID },
+   { MPI2_MFGPAGE_VENDORID_LSI, MPI2_MFGPAGE_DEVID_SAS2308_MPI_EP,
+   PCI_ANY_ID, PCI_ANY_ID },
/* SSS6200 */
{ MPI2_MFGPAGE_VENDORID_LSI, MPI2_MFGPAGE_DEVID_SSS6200,
PCI_ANY_ID, PCI_ANY_ID },
-- 
2.5.5

[V1 3/6] mpt3sas: Introduce API's to get BAR0 mapped buffer address.

2018-02-07 Thread Suganath Prabu S

For MPI Endpoint/Mcpu, Driver should double buffer data buffer/sgl's.
 This is normally copied from host to internal memory of IOC by
 DMA engine of PCI Device. Since the interface to DMA from host
 to mCPU is not present for Mcpu/MPI Endpoint device,
 Driver does double copy of those buffer directly to the mCPU
 memory region via BAR-0 region.

 Introduced API's to calculate and return BAR0 mapped
 host buffer's physical and virtual address for the provided smid

Signed-off-by: Suganath Prabu S 
---
 drivers/scsi/mpt3sas/mpt3sas_base.c | 93 +
 drivers/scsi/mpt3sas/mpt3sas_base.h |  2 +
 2 files changed, 95 insertions(+)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c 
b/drivers/scsi/mpt3sas/mpt3sas_base.c
index f45da9a..36f1242 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
@@ -126,6 +126,99 @@ module_param_call(mpt3sas_fwfault_debug, 
_scsih_set_fwfault_debug,
param_get_int, &mpt3sas_fwfault_debug, 0644);
 
 /**
+ * _base_get_chain - Calculates and Returns virtual chain address
+ *  for the provided smid in BAR0 space.
+ *
+ * @ioc: per adapter object
+ * @smid: system request message index
+ * @sge_chain_count: Scatter gather chain count.
+ *
+ * @Return: chain address.
+ */
+static inline void __iomem*
+_base_get_chain(struct MPT3SAS_ADAPTER *ioc, u16 smid,
+   u8 sge_chain_count)
+{
+   void __iomem *base_chain, *chain_virt;
+   u16 cmd_credit = ioc->facts.RequestCredit + 1;
+
+   base_chain  = (void __iomem *)ioc->chip + MPI_FRAME_START_OFFSET +
+   (cmd_credit * ioc->request_sz) +
+   REPLY_FREE_POOL_SIZE;
+   chain_virt = base_chain + (smid * ioc->facts.MaxChainDepth *
+   ioc->request_sz) + (sge_chain_count * ioc->request_sz);
+   return chain_virt;
+}
+
+/**
+ * _base_get_chain_phys - Calculates and Returns physical address
+ * in BAR0 for scatter gather chains, for
+ * the provided smid.
+ *
+ * @ioc: per adapter object
+ * @smid: system request message index
+ * @sge_chain_count: Scatter gather chain count.
+ *
+ * @Return - Physical chain address.
+ */
+static inline void *
+_base_get_chain_phys(struct MPT3SAS_ADAPTER *ioc, u16 smid,
+   u8 sge_chain_count)
+{
+   void *base_chain_phys, *chain_phys;
+   u16 cmd_credit = ioc->facts.RequestCredit + 1;
+
+   base_chain_phys  = (void *)ioc->chip_phys + MPI_FRAME_START_OFFSET +
+   (cmd_credit * ioc->request_sz) +
+   REPLY_FREE_POOL_SIZE;
+   chain_phys = base_chain_phys + (smid * ioc->facts.MaxChainDepth *
+   ioc->request_sz) + (sge_chain_count * ioc->request_sz);
+   return chain_phys;
+}
+
+/**
+ * _base_get_buffer_bar0 - Calculates and Returns BAR0 mapped Host
+ * buffer address for the provided smid.
+ * (Each smid can have 64K starts from 17024)
+ *
+ * @ioc: per adapter object
+ * @smid: system request message index
+ *
+ * @Returns - Pointer to buffer location in BAR0.
+ */
+
+static void __iomem *
+_base_get_buffer_bar0(struct MPT3SAS_ADAPTER *ioc, u16 smid)
+{
+   u16 cmd_credit = ioc->facts.RequestCredit + 1;
+   // Added extra 1 to reach end of chain.
+   void __iomem *chain_end = _base_get_chain(ioc,
+   cmd_credit + 1,
+   ioc->facts.MaxChainDepth);
+   return chain_end + (smid * 64 * 1024);
+}
+
+/**
+ * _base_get_buffer_phys_bar0 - Calculates and Returns BAR0 mapped
+ * Host buffer Physical address for the provided smid.
+ * (Each smid can have 64K starts from 17024)
+ *
+ * @ioc: per adapter object
+ * @smid: system request message index
+ *
+ * @Returns - Pointer to buffer location in BAR0.
+ */
+static void *
+_base_get_buffer_phys_bar0(struct MPT3SAS_ADAPTER *ioc, u16 smid)
+{
+   u16 cmd_credit = ioc->facts.RequestCredit + 1;
+   void *chain_end_phys = _base_get_chain_phys(ioc,
+   cmd_credit + 1,
+   ioc->facts.MaxChainDepth);
+   return chain_end_phys + (smid * 64 * 1024);
+}
+
+/**
  *  mpt3sas_remove_dead_ioc_func - kthread context to remove dead ioc
  * @arg: input argument, used to derive ioc
  *
diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.h 
b/drivers/scsi/mpt3sas/mpt3sas_base.h
index 897394d..2529d25 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.h
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.h
@@ -120,6 +120,8 @@
 #define MPT3SAS_NVME_QUEUE_DEPTH   128
 #define MPT_NAME_LENGTH32  /* generic length of 
strings */
 #define MPT_STRING_LENGTH  64
+#define MPI_FRAME_START_OFFSET 256
+#define REPLY_FREE_POOL_SIZE   512 /*(32 maxcredix *4)*(4 times)*/
 
 #define MPT_MAX_CALLBACKS  32
 
-- 
2.5.5

[V1 6/6] mpt3sas: Introduce function to clone mpi reply.

2018-02-07 Thread Suganath Prabu S

If the posted request has an error of any type, the IOC writes
a Reply message into a host-based system reply message frame.
This functions clone it in the BAR0 mapped region.

Signed-off-by: Suganath Prabu S 
---
 drivers/scsi/mpt3sas/mpt3sas_base.c | 37 -
 1 file changed, 36 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c 
b/drivers/scsi/mpt3sas/mpt3sas_base.c
index 52effd1..1c29286 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
@@ -126,6 +126,33 @@ module_param_call(mpt3sas_fwfault_debug, 
_scsih_set_fwfault_debug,
param_get_int, &mpt3sas_fwfault_debug, 0644);
 
 /**
+ * _base_clone_reply_to_sys_mem - copies reply to reply free iomem
+ *   in BAR0 space.
+ *
+ * @ioc: per adapter object
+ * @reply: reply message frame(lower 32bit addr)
+ * @index: System request message index.
+ *
+ * @Returns - Nothing
+ */
+static void
+_base_clone_reply_to_sys_mem(struct MPT3SAS_ADAPTER *ioc, u32 reply,
+   u32 index)
+{
+   /*
+* 256 is offset within sys register.
+* 256 offset MPI frame starts. Max MPI frame supported is 32.
+* 32 * 128 = 4K. From here, Clone of reply free for mcpu starts
+*/
+   u16 cmd_credit = ioc->facts.RequestCredit + 1;
+   void __iomem *reply_free_iomem = (void __iomem *)ioc->chip +
+   MPI_FRAME_START_OFFSET +
+   (cmd_credit * ioc->request_sz) + (index * sizeof(u32));
+
+   writel(reply, reply_free_iomem);
+}
+
+/**
  * _base_clone_mpi_to_sys_mem - Writes/copies MPI frames
  * to system/BAR0 region.
  *
@@ -1400,6 +1427,10 @@ _base_interrupt(int irq, void *bus_id)
0 : ioc->reply_free_host_index + 1;
ioc->reply_free[ioc->reply_free_host_index] =
cpu_to_le32(reply);
+   if (ioc->is_mcpu_endpoint)
+   _base_clone_reply_to_sys_mem(ioc,
+   cpu_to_le32(reply),
+   ioc->reply_free_host_index);
writel(ioc->reply_free_host_index,
&ioc->chip->ReplyFreeHostIndex);
}
@@ -6242,8 +6273,12 @@ _base_make_ioc_operational(struct MPT3SAS_ADAPTER *ioc)
/* initialize Reply Free Queue */
for (i = 0, reply_address = (u32)ioc->reply_dma ;
i < ioc->reply_free_queue_depth ; i++, reply_address +=
-   ioc->reply_sz)
+   ioc->reply_sz) {
ioc->reply_free[i] = cpu_to_le32(reply_address);
+   if (ioc->is_mcpu_endpoint)
+   _base_clone_reply_to_sys_mem(ioc,
+   (__le32)reply_address, i);
+   }
 
/* initialize reply queues */
if (ioc->is_driver_loading)
-- 
2.5.5

[V1 0/6] mpt3sas: Adding MPI Endpoint device support.

2018-02-07 Thread Suganath Prabu S

V1 Change info:

* Few sparse warning fixes over initial patch set.
* For 32 bit Arch,_base_writeq function is identical
to _base_mpi_ep_writeq, Removed duplicate code as suggested by Martin.

Andromeda is a PCIe switch, and it has a dedicated management
 CPU (mCPU), nonvolatile flash memory, RAM etc... and
 Linux kernel runs on mCPU. MPI Endpoint driver is the
 management driver for Andromeda.

The Plx Manager driver running on mCPU synthesizes a
 virtual/Synthetic MPI End point to host.
Synthetic MPI End point is emulated IT firmware running on
 Linux operating system, which interfaces with PLX management 
 driver.

PLX Management driver integrates IOCFW in same driver binary.
At the end of Plx_Mgr driver load, it initializes IOC FW as well.
Current implementation is single instance
 of IOC FW (as it supports only one host). 

 PLX management driver will provide required resources 
and infrastructure for Synthetic MPI End point.

Existing PLXManagement driver will reserve virtual slot for
 MPI end point. currently, Virtual slot number 29 is reserved
 for MPI end point. 

Synthetic device in management driver will be marked as
 new type “PLX_DEV_TYPE_SYNTH_MPI_EP”. PLXmanagement driver
 will interface with Synthetic MPI Endpoint for any
 communication happening on PLX_DEV_TYPE_SYNTH_MPI_EP device
 type from host.

Link between host and PLX C2 is in below diagram.

 ___
 ___|   |
|   |   |   |
| PLX C2|===|HOST   |
| PCI - |===|   MACHINE |
|  SWITCH   |   |   |
|___|   |   |
||  |___|
||
||
 ___||__
|   |
|  MCPU |
|   |
|___|



 After MPI end point implementation -
(Host will see dedicated Virtual SLOT as MPI End point.)
In Below single line is logical channel for MPI Endpoint
 ___
 ___|   |
|   |   |   |
| PLX C2|===|   HOST|
| PCI - |===|   MACHINE |
|  SWITCH   |   |   |
|   |   |  ---  |
|___|---| | IT DRIVER | |
||  |   |  ---  |
||  |   |___|
||  |
||  |
 ___||__|___
|   ||  |   |
|  MCPU |   |
|___|   |
|   | PLX MGR|  |
|   | DRIVER |  |
|   ||  |
|   |   |
|___|_  |
|   | | |
|   |IOC FW   | |
|   |_| |
|___|

PLXmanagement driver will create MPI end point based on
 device table definition. PLXManagement driver will also
 populate Synthetic device tree based on Device Table 
 for each host.  

>From host it will be seen as IT HBA (Simplified version of SAS2/MPI2)
(PCI Device, in which emulated IT FW running on mCPU behind Synthetic
 endpoint of PCISWITCH). For host it is considered as actual
 Physical Device.

PLX Management driver provide interface to do DMA from mCPU to Host
 using “MCPU Response Data Buffer“ method. DMA from Host to mCPU using
 “MCPU Response Data Buffer” is not possible.

Why DMA from host to mCPU is not possible using Responsebuffer ?
 MCPU Response buffer is not really for reading from host
 (reading will work, but answer TLP will not come back to the CSR FIFO,
 but will go to the MCPU root complex - which could be an
 unexpected read completion!

Existing host driver (mpt2sas) will not work
 for MPI end point. As the interface to DMA from host to mCPU is
 not present for Mcpu/MPI Endpoint device, To overcome this
 Driver should do double copy of those buffer directly to the
 mCPU memory region via BAR-0 region.

The host BAR0 region is divided into different group to serve Host
 assisted DMA.

 0- 255 System register(Doorbell, Host Interrupt etc)
 256  - 4352MPI Frame. (This is based on maxCredit 32)
 4352 - 4864Reply_free pool (512 byte is reserved considering
maxCredit 32. Reply needsextra room, for mCPU case
kept four times of maxCredit)
 4864 -17152SGE chain element.
(32 command * 3 chain of 128 byte size = 12288)
 17152 -x   Host buffer mapped with smid.
(Each smid can have 64K Max IO.)
BAR0+Last 1KMSIX Addr and DataTotalsize in use 2113664 bytes
of 4MB BAR0 MPI end point module of PLX management
driver must be aware of regions above.

SGE and Host buffer details will be available in MPI frame.

Each PCI packets coming from host on MPI end point will end up in
 mCPU PLXmanagement driver. We can consid

[V1 5/6] mpt3sas: Introduce function to clone mpi request.

2018-02-07 Thread Suganath Prabu S

1) Added function _base_clone_mpi_to_sys_mem to clone
MPI request into system BAR0 mapped region.

2) Separate out MPI Endpoint IO submissions to function
_base_put_smid_mpi_ep_scsi_io.

3) MPI EP requests are submitted in two 32 bit MMIO writes.
from _base_mpi_ep_writeq.

 For 32 bit Arch,_base_writeq function is identical
to _base_mpi_ep_writeq, Removed duplicate code as suggested.

Signed-off-by: Suganath Prabu S 
---
 drivers/scsi/mpt3sas/mpt3sas_base.c | 140 
 1 file changed, 125 insertions(+), 15 deletions(-)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c 
b/drivers/scsi/mpt3sas/mpt3sas_base.c
index c41c65b..52effd1 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
@@ -126,6 +126,25 @@ module_param_call(mpt3sas_fwfault_debug, 
_scsih_set_fwfault_debug,
param_get_int, &mpt3sas_fwfault_debug, 0644);
 
 /**
+ * _base_clone_mpi_to_sys_mem - Writes/copies MPI frames
+ * to system/BAR0 region.
+ *
+ * @dst_iomem: Pointer to the destinaltion location in BAR0 space.
+ * @src: Pointer to the Source data.
+ * @size: Size of data to be copied.
+ */
+static void
+_base_clone_mpi_to_sys_mem(void *dst_iomem, void *src, u32 size)
+{
+   int i;
+   u32 *src_virt_mem = (u32 *)src;
+
+   for (i = 0; i < size/4; i++)
+   writel((u32)src_virt_mem[i],
+   (void __iomem *)dst_iomem + (i * 4));
+}
+
+/**
  * _base_clone_to_sys_mem - Writes/copies data to system/BAR0 region
  *
  * @dst_iomem: Pointer to the destinaltion location in BAR0 space.
@@ -3268,6 +3287,29 @@ mpt3sas_base_free_smid(struct MPT3SAS_ADAPTER *ioc, u16 
smid)
 }
 
 /**
+ * _base_mpi_ep_writeq - 32 bit write to MMIO
+ * @b: data payload
+ * @addr: address in MMIO space
+ * @writeq_lock: spin lock
+ *
+ * This special handling for MPI EP to take care of 32 bit
+ * environment where its not quarenteed to send the entire word
+ * in one transfer.
+ */
+static inline void
+_base_mpi_ep_writeq(__u64 b, volatile void __iomem *addr,
+   spinlock_t *writeq_lock)
+{
+   unsigned long flags;
+   __u64 data_out = cpu_to_le64(b);
+
+   spin_lock_irqsave(writeq_lock, flags);
+   writel((u32)(data_out), addr);
+   writel((u32)(data_out >> 32), (addr + 4));
+   spin_unlock_irqrestore(writeq_lock, flags);
+}
+
+/**
  * _base_writeq - 64 bit write to MMIO
  * @ioc: per adapter object
  * @b: data payload
@@ -3288,17 +3330,41 @@ _base_writeq(__u64 b, volatile void __iomem *addr, 
spinlock_t *writeq_lock)
 static inline void
 _base_writeq(__u64 b, volatile void __iomem *addr, spinlock_t *writeq_lock)
 {
-   unsigned long flags;
-   __u64 data_out = cpu_to_le64(b);
-
-   spin_lock_irqsave(writeq_lock, flags);
-   writel((u32)(data_out), addr);
-   writel((u32)(data_out >> 32), (addr + 4));
-   spin_unlock_irqrestore(writeq_lock, flags);
+   _base_mpi_ep_writeq(b, addr, writeq_lock);
 }
 #endif
 
 /**
+ * _base_put_smid_mpi_ep_scsi_io - send SCSI_IO request to firmware
+ * @ioc: per adapter object
+ * @smid: system request message index
+ * @handle: device handle
+ *
+ * Return nothing.
+ */
+static void
+_base_put_smid_mpi_ep_scsi_io(struct MPT3SAS_ADAPTER *ioc, u16 smid, u16 
handle)
+{
+   Mpi2RequestDescriptorUnion_t descriptor;
+   u64 *request = (u64 *)&descriptor;
+   void *mpi_req_iomem;
+   __le32 *mfp = (__le32 *)mpt3sas_base_get_msg_frame(ioc, smid);
+
+   _clone_sg_entries(ioc, (void *) mfp, smid);
+   mpi_req_iomem = (void *)ioc->chip +
+   MPI_FRAME_START_OFFSET + (smid * ioc->request_sz);
+   _base_clone_mpi_to_sys_mem(mpi_req_iomem, (void *)mfp,
+   ioc->request_sz);
+   descriptor.SCSIIO.RequestFlags = MPI2_REQ_DESCRIPT_FLAGS_SCSI_IO;
+   descriptor.SCSIIO.MSIxIndex =  _base_get_msix_index(ioc);
+   descriptor.SCSIIO.SMID = cpu_to_le16(smid);
+   descriptor.SCSIIO.DevHandle = cpu_to_le16(handle);
+   descriptor.SCSIIO.LMID = 0;
+   _base_mpi_ep_writeq(*request, &ioc->chip->RequestDescriptorPostLow,
+   &ioc->scsi_lookup_lock);
+}
+
+/**
  * _base_put_smid_scsi_io - send SCSI_IO request to firmware
  * @ioc: per adapter object
  * @smid: system request message index
@@ -3359,7 +3425,23 @@ _base_put_smid_hi_priority(struct MPT3SAS_ADAPTER *ioc, 
u16 smid,
u16 msix_task)
 {
Mpi2RequestDescriptorUnion_t descriptor;
-   u64 *request = (u64 *)&descriptor;
+   void *mpi_req_iomem;
+   u64 *request;
+
+   if (ioc->is_mcpu_endpoint) {
+   MPI2RequestHeader_t *request_hdr;
+
+   __le32 *mfp = (__le32 *)mpt3sas_base_get_msg_frame(ioc, smid);
+
+   request_hdr = (MPI2RequestHeader_t *)mfp;
+   /* TBD 256 is offset within sys register. */
+   mpi_req_iomem = (void *)ioc->chip + MPI_FRAME_START_OFFSET
+

[V1 4/6] mpt3sas: Introduce Base function for cloning.

2018-02-07 Thread Suganath Prabu S

All scsi IO's and config requests data buffer and
sgl are cloned to system memory in _clone_sg_entries
before submitting it to Firmware.

Signed-off-by: Suganath Prabu S 
---
 drivers/scsi/mpt3sas/mpt3sas_base.c   | 215 +-
 drivers/scsi/mpt3sas/mpt3sas_base.h   |   3 +
 drivers/scsi/mpt3sas/mpt3sas_config.c |   1 +
 3 files changed, 218 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c 
b/drivers/scsi/mpt3sas/mpt3sas_base.c
index 36f1242..c41c65b 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
@@ -126,6 +126,24 @@ module_param_call(mpt3sas_fwfault_debug, 
_scsih_set_fwfault_debug,
param_get_int, &mpt3sas_fwfault_debug, 0644);
 
 /**
+ * _base_clone_to_sys_mem - Writes/copies data to system/BAR0 region
+ *
+ * @dst_iomem: Pointer to the destinaltion location in BAR0 space.
+ * @src: Pointer to the Source data.
+ * @size: Size of data to be copied.
+ */
+static void
+_base_clone_to_sys_mem(void __iomem *dst_iomem, void *src, u32 size)
+{
+   int i;
+   u32 *src_virt_mem = (u32 *)(src);
+
+   for (i = 0; i < size/4; i++)
+   writel((u32)src_virt_mem[i],
+   (void __iomem *)dst_iomem + (i * 4));
+}
+
+/**
  * _base_get_chain - Calculates and Returns virtual chain address
  *  for the provided smid in BAR0 space.
  *
@@ -219,6 +237,201 @@ _base_get_buffer_phys_bar0(struct MPT3SAS_ADAPTER *ioc, 
u16 smid)
 }
 
 /**
+ * _base_get_chain_buffer_dma_to_chain_buffer - Iterates chain
+ * lookup list and Provides chain_buffer
+ * address for the matching dma address.
+ * (Each smid can have 64K starts from 17024)
+ *
+ * @ioc: per adapter object
+ * @chain_buffer_dma: Chain buffer dma address.
+ *
+ * @Returns - Pointer to chain buffer. Or Null on Failure.
+ */
+static void *
+_base_get_chain_buffer_dma_to_chain_buffer(struct MPT3SAS_ADAPTER *ioc,
+   dma_addr_t chain_buffer_dma)
+{
+   u16 index;
+
+   for (index = 0; index < ioc->chain_depth; index++) {
+   if (ioc->chain_lookup[index].chain_buffer_dma ==
+   chain_buffer_dma)
+   return ioc->chain_lookup[index].chain_buffer;
+   }
+   pr_info(MPT3SAS_FMT
+   "Provided chain_buffer_dma address is not in the lookup list\n",
+   ioc->name);
+   return NULL;
+}
+
+/**
+ * _clone_sg_entries - MPI EP's scsiio and config requests
+ * are handled here. Base function for
+ * double buffering, before submitting
+ * the requests.
+ *
+ * @ioc: per adapter object.
+ * @mpi_request: mf request pointer.
+ * @smid: system request message index.
+ *
+ * @Returns: Nothing.
+ */
+static void _clone_sg_entries(struct MPT3SAS_ADAPTER *ioc,
+   void *mpi_request, u16 smid)
+{
+   Mpi2SGESimple32_t *sgel, *sgel_next;
+   u32  sgl_flags, sge_chain_count = 0;
+   bool is_write = 0;
+   u16 i = 0;
+   void __iomem *buffer_iomem;
+   void  *buffer_iomem_phys;
+   void __iomem *buff_ptr;
+   void *buff_ptr_phys;
+   void __iomem *dst_chain_addr[MCPU_MAX_CHAINS_PER_IO];
+   void *src_chain_addr[MCPU_MAX_CHAINS_PER_IO], *dst_addr_phys;
+   MPI2RequestHeader_t *request_hdr;
+   struct scsi_cmnd *scmd;
+   struct scatterlist *sg_scmd = NULL;
+   int is_scsiio_req = 0;
+
+   request_hdr = (MPI2RequestHeader_t *) mpi_request;
+
+   if (request_hdr->Function == MPI2_FUNCTION_SCSI_IO_REQUEST) {
+   Mpi25SCSIIORequest_t *scsiio_request =
+   (Mpi25SCSIIORequest_t *)mpi_request;
+   sgel = (Mpi2SGESimple32_t *) &scsiio_request->SGL;
+   is_scsiio_req = 1;
+   } else if (request_hdr->Function == MPI2_FUNCTION_CONFIG) {
+   Mpi2ConfigRequest_t  *config_req =
+   (Mpi2ConfigRequest_t *)mpi_request;
+   sgel = (Mpi2SGESimple32_t *) &config_req->PageBufferSGE;
+   } else
+   return;
+
+   /* From smid we can get scsi_cmd, once we have sg_scmd,
+* we just need to get sg_virt and sg_next to get virual
+* address associated with sgel->Address.
+*/
+
+   if (is_scsiio_req) {
+   /* Get scsi_cmd using smid */
+   scmd = mpt3sas_scsih_scsi_lookup_get(ioc, smid);
+   if (scmd == NULL) {
+   pr_err(MPT3SAS_FMT "scmd is NULL\n", ioc->name);
+   return;
+   }
+
+   /* Get sg_scmd from scmd provided */
+   sg_scmd = scsi_sglist(scmd);
+   }
+
+   /*
+* 0 - 255  System register
+* 256 - 4352   MPI Frame. (This is based on maxCredit 32)
+* 4352 - 4864  Reply_free pool (512 byte is reserved
+*  considering maxCredit 32. Reply need extra

Re: [LSF/MM TOPIC] Two blk-mq related topics

2018-02-07 Thread John Garry


On 30/01/2018 10:33, John Garry wrote:

On 30/01/2018 01:24, Ming Lei wrote:

On Mon, Jan 29, 2018 at 12:56:30PM -0800, James Bottomley wrote:

On Mon, 2018-01-29 at 23:46 +0800, Ming Lei wrote:
[...]

2. When to enable SCSI_MQ at default again?


I'm not sure there's much to discuss ... I think the basic answer is as
soon as Christoph wants to try it again.


I guess Christoph still need to evaluate if there are existed issues or
blockers before trying it again. And more input may be got from F2F
discussion, IMHO.




SCSI_MQ is enabled on V3.17 firstly, but disabled at default. In
V4.13-rc1, it is enabled at default, but later the patch is reverted
in V4.13-rc7, and becomes disabled at default too.

Now both the original reported PM issue(actually SCSI quiesce) and
the sequential IO performance issue have been addressed.


Is the blocker bug just not closed because no-one thought to do it:

https://bugzilla.kernel.org/show_bug.cgi?id=178381

(we have confirmed that this issue is now fixed with the original
reporter?)



From a developer view, this issue is fixed by the following commit:

3a0a52997(block, scsi: Make SCSI quiesce and resume work reliably),
and it is verified by kernel list reporter.



And did the Huawei guy (Jonathan Cameron) confirm his performance issue
was fixed (I don't think I saw email that he did)?


Last time I talked with John Garry about the issue, and the merged
.get_budget
based patch improves much on the IO performance, but there is still a
bit gap
compared with legacy path. Seems a driver specific issue, remembered
that removing
a driver's lock can improve performance much.

Garry, could you provide further update on this issue?


Hi Ming,

From our testing with experimental changes to our driver to support SCSI
mq we were almost getting on par performance with legacy path. But
without these MQ was hitting performance (and I would not necessarily
say it was a driver issue).

We can retest from today's mainline and see where we are.

BTW, Have you got performance figures for many other single queue HBAs
with and without CONFIG_SCSI_MQ_DEFAULT=Y?


We finally got around to retesting this (on hisi_sas controller). So the 
results are generally ok, in that we are now not seeing such big 
performance drops in our hardware for enabling SCSI MQ - in some 
scenarios the performance is better. Generally fio rw mode is better.


Anyway, for what it's worth, it's a green light from us to set SCSI MQ 
on by default.


John



Thanks,
John



Thanks,
Ming

.




___
Linuxarm mailing list
linux...@huawei.com
http://hulk.huawei.com/mailman/listinfo/linuxarm

.

Re: [PATCH 0/5] blk-mq/scsi-mq: support global tags & introduce force_blk_mq

2018-02-07 Thread Ming Lei

On Wed, Feb 07, 2018 at 07:50:21AM +0100, Hannes Reinecke wrote:
> Hi all,
> 
> [ .. ]
> >>
> >> Could you share us your patch for enabling global_tags/MQ on
> > megaraid_sas
> >> so that I can reproduce your test?
> >>
> >>> See below perf top data. "bt_iter" is consuming 4 times more CPU.
> >>
> >> Could you share us what the IOPS/CPU utilization effect is after
> > applying the
> >> patch V2? And your test script?
> > Regarding CPU utilization, I need to test one more time. Currently system
> > is in used.
> > 
> > I run below fio test on total 24 SSDs expander attached.
> > 
> > numactl -N 1 fio jbod.fio --rw=randread --iodepth=64 --bs=4k
> > --ioengine=libaio --rw=randread
> > 
> > Performance dropped from 1.6 M IOPs to 770K IOPs.
> > 
> This is basically what we've seen with earlier iterations.

Hi Hannes,

As I mentioned in another mail[1], Kashyap's patch has a big issue, which
causes only reply queue 0 used.

[1] https://marc.info/?l=linux-scsi&m=151793204014631&w=2

So could you guys run your performance test again after fixing the patch?


Thanks,
Ming

[scsi:misc 327/330] drivers//scsi/storvsc_drv.c:1313:4: error: implicit declaration of function 'for_each_cpu_wrap'

2018-02-07 Thread kbuild test robot

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git misc
head:   16a628faa63c8149d9a8f433e5c6548f6cff98e4
commit: 2439bec3bf084ab6cbc69a66797a4612042be6ca [327/330] scsi: storvsc: 
Spread interrupts when picking a channel for I/O requests
config: x86_64-randconfig-s1-02071932 (attached as .config)
compiler: gcc-6 (Debian 6.4.0-9) 6.4.0 20171026
reproduce:
git checkout 2439bec3bf084ab6cbc69a66797a4612042be6ca
# save the attached .config to linux build tree
make ARCH=x86_64 

All errors (new ones prefixed by >>):

   drivers//scsi/storvsc_drv.c: In function 'storvsc_do_io':
>> drivers//scsi/storvsc_drv.c:1313:4: error: implicit declaration of function 
>> 'for_each_cpu_wrap' [-Werror=implicit-function-declaration]
   for_each_cpu_wrap(tgt_cpu, &alloced_mask,
   ^
   drivers//scsi/storvsc_drv.c:1314:40: error: expected ';' before '{' token
 outgoing_channel->target_cpu + 1) {
   ^
   cc1: some warnings being treated as errors

vim +/for_each_cpu_wrap +1313 drivers//scsi/storvsc_drv.c

  1280  
  1281  
  1282  static int storvsc_do_io(struct hv_device *device,
  1283   struct storvsc_cmd_request *request, u16 q_num)
  1284  {
  1285  struct storvsc_device *stor_device;
  1286  struct vstor_packet *vstor_packet;
  1287  struct vmbus_channel *outgoing_channel;
  1288  int ret = 0;
  1289  struct cpumask alloced_mask;
  1290  int tgt_cpu;
  1291  
  1292  vstor_packet = &request->vstor_packet;
  1293  stor_device = get_out_stor_device(device);
  1294  
  1295  if (!stor_device)
  1296  return -ENODEV;
  1297  
  1298  
  1299  request->device  = device;
  1300  /*
  1301   * Select an an appropriate channel to send the request out.
  1302   */
  1303  
  1304  if (stor_device->stor_chns[q_num] != NULL) {
  1305  outgoing_channel = stor_device->stor_chns[q_num];
  1306  if (outgoing_channel->target_cpu == smp_processor_id()) 
{
  1307  /*
  1308   * Ideally, we want to pick a different channel 
if
  1309   * available on the same NUMA node.
  1310   */
  1311  cpumask_and(&alloced_mask, 
&stor_device->alloced_cpus,
  1312  
cpumask_of_node(cpu_to_node(q_num)));
> 1313  for_each_cpu_wrap(tgt_cpu, &alloced_mask,
  1314  outgoing_channel->target_cpu + 
1) {
  1315  if (tgt_cpu != 
outgoing_channel->target_cpu) {
  1316  outgoing_channel =
  1317  stor_device->stor_chns[tgt_cpu];
  1318  break;
  1319  }
  1320  }
  1321  }
  1322  } else {
  1323  outgoing_channel = get_og_chn(stor_device, q_num);
  1324  }
  1325  
  1326  
  1327  vstor_packet->flags |= REQUEST_COMPLETION_FLAG;
  1328  
  1329  vstor_packet->vm_srb.length = (sizeof(struct vmscsi_request) -
  1330  vmscsi_size_delta);
  1331  
  1332  
  1333  vstor_packet->vm_srb.sense_info_length = sense_buffer_size;
  1334  
  1335  
  1336  vstor_packet->vm_srb.data_transfer_length =
  1337  request->payload->range.len;
  1338  
  1339  vstor_packet->operation = VSTOR_OPERATION_EXECUTE_SRB;
  1340  
  1341  if (request->payload->range.len) {
  1342  
  1343  ret = vmbus_sendpacket_mpb_desc(outgoing_channel,
  1344  request->payload, request->payload_sz,
  1345  vstor_packet,
  1346  (sizeof(struct vstor_packet) -
  1347  vmscsi_size_delta),
  1348  (unsigned long)request);
  1349  } else {
  1350  ret = vmbus_sendpacket(outgoing_channel, vstor_packet,
  1351 (sizeof(struct vstor_packet) -
  1352  vmscsi_size_delta),
  1353 (unsigned long)request,
  1354 VM_PKT_DATA_INBAND,
  1355 
VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED);
  1356  }
  1357  
  1358  if (ret != 0)
  1359  return ret;
  1360  
  1361  atomic_inc(&stor_device->num_outstanding_req);
  1362  
  1363  return ret;
  1364  }
  1365  

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/

Re: [PATCH V2 1/2] qedi: Fix truncation of CHAP name and secret

2018-02-07 Thread Javali, Nilesh


On 2/6/18, 8:53 PM, "Bart Van Assche"  wrote:

>On Tue, 2018-02-06 at 05:12 -0800, Nilesh Javali wrote:
>> From: Andrew Vasquez 
>> 
>> The data in NVRAM is not guaranteed to be NUL terminated.
>> Copy the data upto the element size or to the first NUL
>> in the byte-stream and then append a NUL.
>> 
>> Signed-off-by: Andrew Vasquez 
>> Signed-off-by: Nilesh Javali 
>> ---
>>  drivers/scsi/qedi/qedi_main.c | 45
>>+++
>>  1 file changed, 33 insertions(+), 12 deletions(-)
>> 
>> diff --git a/drivers/scsi/qedi/qedi_main.c
>>b/drivers/scsi/qedi/qedi_main.c
>> index 8808f0d..f3dd438 100644
>> --- a/drivers/scsi/qedi/qedi_main.c
>> +++ b/drivers/scsi/qedi/qedi_main.c
>> @@ -1705,6 +1705,27 @@ void qedi_reset_host_mtu(struct qedi_ctx *qedi,
>>u16 mtu)
>>  qedi_ops->ll2->start(qedi->cdev, ¶ms);
>>  }
>>  
>> +static ssize_t
>> +qedi_show_copy_data(char *buf, size_t size, u8 *data)
>> +{
>> +size_t i;
>> +
>> +if (!data)
>> +return sprintf(buf, "\n");
>> +
>> +/*
>> + * Data not guaranteed to be NUL terminated. Copy until NUL found or
>> + * complete copy done.
>> + */
>> +for (i = 0; i < size && data[i]; i++)
>> +buf[i] = data[i];
>> +/* Data copy complete, append NEWLINE and NUL terminator. */
>> +buf[i] = '\n';
>> +buf[i + 1] = '\0';
>> +return strlen(buf);
>> +}
>
>Can the body of the above function be changed into the following, which
>is much
>shorter?
>
>sprintf(buf, "%.*s", (int)size, data ? : "")

This looks clean and shorter. I will send the updated patch.



Thanks,
Nilesh

RE: [PATCH 0/5] blk-mq/scsi-mq: support global tags & introduce force_blk_mq

2018-02-07 Thread Kashyap Desai

> -Original Message-
> From: Ming Lei [mailto:ming@redhat.com]
> Sent: Wednesday, February 7, 2018 5:53 PM
> To: Hannes Reinecke
> Cc: Kashyap Desai; Jens Axboe; linux-bl...@vger.kernel.org; Christoph
> Hellwig; Mike Snitzer; linux-scsi@vger.kernel.org; Arun Easi; Omar
Sandoval;
> Martin K . Petersen; James Bottomley; Christoph Hellwig; Don Brace;
Peter
> Rivera; Paolo Bonzini; Laurence Oberman
> Subject: Re: [PATCH 0/5] blk-mq/scsi-mq: support global tags & introduce
> force_blk_mq
>
> On Wed, Feb 07, 2018 at 07:50:21AM +0100, Hannes Reinecke wrote:
> > Hi all,
> >
> > [ .. ]
> > >>
> > >> Could you share us your patch for enabling global_tags/MQ on
> > > megaraid_sas
> > >> so that I can reproduce your test?
> > >>
> > >>> See below perf top data. "bt_iter" is consuming 4 times more CPU.
> > >>
> > >> Could you share us what the IOPS/CPU utilization effect is after
> > > applying the
> > >> patch V2? And your test script?
> > > Regarding CPU utilization, I need to test one more time. Currently
> > > system is in used.
> > >
> > > I run below fio test on total 24 SSDs expander attached.
> > >
> > > numactl -N 1 fio jbod.fio --rw=randread --iodepth=64 --bs=4k
> > > --ioengine=libaio --rw=randread
> > >
> > > Performance dropped from 1.6 M IOPs to 770K IOPs.
> > >
> > This is basically what we've seen with earlier iterations.
>
> Hi Hannes,
>
> As I mentioned in another mail[1], Kashyap's patch has a big issue,
which
> causes only reply queue 0 used.
>
> [1] https://marc.info/?l=linux-scsi&m=151793204014631&w=2
>
> So could you guys run your performance test again after fixing the
patch?

Ming -

I tried after change you requested.  Performance drop is still unresolved.
>From 1.6 M IOPS to 770K IOPS.

See below data. All 24 reply queue is in used correctly.

IRQs / 1 second(s)
IRQ#  TOTAL  NODE0   NODE1  NAME
 360  16422  0   16422  IR-PCI-MSI 70254653-edge megasas
 364  15980  0   15980  IR-PCI-MSI 70254657-edge megasas
 362  15979  0   15979  IR-PCI-MSI 70254655-edge megasas
 345  15696  0   15696  IR-PCI-MSI 70254638-edge megasas
 341  15659  0   15659  IR-PCI-MSI 70254634-edge megasas
 369  15656  0   15656  IR-PCI-MSI 70254662-edge megasas
 359  15650  0   15650  IR-PCI-MSI 70254652-edge megasas
 358  15596  0   15596  IR-PCI-MSI 70254651-edge megasas
 350  15574  0   15574  IR-PCI-MSI 70254643-edge megasas
 342  15532  0   15532  IR-PCI-MSI 70254635-edge megasas
 344  15527  0   15527  IR-PCI-MSI 70254637-edge megasas
 346  15485  0   15485  IR-PCI-MSI 70254639-edge megasas
 361  15482  0   15482  IR-PCI-MSI 70254654-edge megasas
 348  15467  0   15467  IR-PCI-MSI 70254641-edge megasas
 368  15463  0   15463  IR-PCI-MSI 70254661-edge megasas
 354  15420  0   15420  IR-PCI-MSI 70254647-edge megasas
 351  15378  0   15378  IR-PCI-MSI 70254644-edge megasas
 352  15377  0   15377  IR-PCI-MSI 70254645-edge megasas
 356  15348  0   15348  IR-PCI-MSI 70254649-edge megasas
 337  15344  0   15344  IR-PCI-MSI 70254630-edge megasas
 343  15320  0   15320  IR-PCI-MSI 70254636-edge megasas
 355  15266  0   15266  IR-PCI-MSI 70254648-edge megasas
 335  15247  0   15247  IR-PCI-MSI 70254628-edge megasas
 363  15233  0   15233  IR-PCI-MSI 70254656-edge megasas


Average:CPU  %usr %nice  %sys   %iowait%steal
%irq %soft%guest%gnice %idle
Average: 18  3.80  0.00 14.78 10.08  0.00
0.00  4.01  0.00  0.00 67.33
Average: 19  3.26  0.00 15.35 10.62  0.00
0.00  4.03  0.00  0.00 66.74
Average: 20  3.42  0.00 14.57 10.67  0.00
0.00  3.84  0.00  0.00 67.50
Average: 21  3.19  0.00 15.60 10.75  0.00
0.00  4.16  0.00  0.00 66.30
Average: 22  3.58  0.00 15.15 10.66  0.00
0.00  3.51  0.00  0.00 67.11
Average: 23  3.34  0.00 15.36 10.63  0.00
0.00  4.17  0.00  0.00 66.50
Average: 24  3.50  0.00 14.58 10.93  0.00
0.00  3.85  0.00  0.00 67.13
Average: 25  3.20  0.00 14.68 10.86  0.00
0.00  4.31  0.00  0.00 66.95
Average: 26  3.27  0.00 14.80 10.70  0.00
0.00  3.68  0.00  0.00 67.55
Average: 27  3.58  0.00 15.36 10.80  0.00
0.00  3.79  0.00  0.00 66.48
Average: 28  3.46  0.00 15.17 10.46  0.00
0.00  3.32  0.00  0.00 67.59
Average: 29  3.34  0.00 14.42 10.72  0.00
0.00  3.34  0.00  0.00 68.18
Average: 30  3.34  0.00 15.08 10.70  0.00
0.00  3.89  0.00  0.00 66.99
Average: 31  3.26  0.00 15.33 10.47  0.00
0

Re: [PATCH v2 06/13] lpfc: Add 64G link speed support

2018-02-07 Thread James Smart


On 2/7/2018 1:58 AM, Johannes Thumshirn wrote:

On Tue, Feb 06, 2018 at 06:28:44PM -0800, James Smart wrote:

lpfc_printf_log(phba, KERN_ERR, LOG_INIT,
-   "0469 lpfc_link_speed attribute cannot be set to %d, "
-   "allowed values are ["LPFC_LINK_SPEED_STRING"]\n", val);
+   "0469 lpfc_link_speed attribute cannot be set to %d, "
+   "allowed values are ["LPFC_LINK_SPEED_STRING"]\n", val);


lpfc_printf_log(phba, KERN_ERR, LOG_INIT,
"0469 lpfc_link_speed attribute cannot be set to %d, "
"allowed values are [%s]\n",
val, LPFC_LINK_SPEED_STRING);

And possible have the whole quoted string on one line for easier grepping.
checkplatch.pl should've told you that.



Checkpatch didn't say anything interesting. We consistently have line 
spanning as the lines are fairly verbose. Not an easy way to avoid that.


Regardless, no big deal.

-- james

Re: [PATCH 4/6] scsi: qedf: fix LTO-enabled build

2018-02-07 Thread Chad Dupuis


On Fri, 2 Feb 2018, 8:12am, Arnd Bergmann wrote:

> The prototype for qedf_dbg_fops/qedf_debugfs_ops doesn't match the definition,
> which causes the final link to fail with link-time optimizations:
> 
> drivers/scsi/qedf/qedf_main.c:34: error: type of 'qedf_dbg_fops' does not 
> match original declaration [-Werror=lto-type-mismatch]
>  extern struct file_operations qedf_dbg_fops;
> 
> drivers/scsi/qedf/qedf_debugfs.c:443: note: 'qedf_dbg_fops' was previously 
> declared here
>  const struct file_operations qedf_dbg_fops[] = {
> 
> drivers/scsi/qedf/qedf_main.c:33: error: type of 'qedf_debugfs_ops' does not 
> match original declaration [-Werror=lto-type-mismatch]
>  extern struct qedf_debugfs_ops qedf_debugfs_ops;
> 
> drivers/scsi/qedf/qedf_debugfs.c:102: note: 'qedf_debugfs_ops' was previously 
> declared here
>  struct qedf_debugfs_ops qedf_debugfs_ops[] = {
> 
> This corrects the prototype and moves it into a shared header file where it
> belongs. The file operations can also be marked 'const' like the
> qedf_debugfs_ops.
> 
> Signed-off-by: Arnd Bergmann 
> ---
>  drivers/scsi/qedf/qedf_dbg.h | 17 ++---
>  drivers/scsi/qedf/qedf_debugfs.c |  6 +++---
>  drivers/scsi/qedf/qedf_main.c|  8 +++-
>  3 files changed, 16 insertions(+), 15 deletions(-)
> 

Thanks.

Acked-by: Chad Dupuis

Re: [PATCH v2 06/13] lpfc: Add 64G link speed support

2018-02-07 Thread Johannes Thumshirn

On Wed, Feb 07, 2018 at 07:38:39AM -0800, James Smart wrote:
> On 2/7/2018 1:58 AM, Johannes Thumshirn wrote:
> > On Tue, Feb 06, 2018 at 06:28:44PM -0800, James Smart wrote:
> > >   lpfc_printf_log(phba, KERN_ERR, LOG_INIT,
> > > - "0469 lpfc_link_speed attribute cannot be set to %d, "
> > > - "allowed values are ["LPFC_LINK_SPEED_STRING"]\n", val);
> > > + "0469 lpfc_link_speed attribute cannot be set to %d, "
> > > + "allowed values are ["LPFC_LINK_SPEED_STRING"]\n", val);
> > 
> > lpfc_printf_log(phba, KERN_ERR, LOG_INIT,
> > "0469 lpfc_link_speed attribute cannot be set to %d, "
> > "allowed values are [%s]\n",
> > val, LPFC_LINK_SPEED_STRING);
> > 
> > And possible have the whole quoted string on one line for easier grepping.
> > checkplatch.pl should've told you that.
> > 
> 
> Checkpatch didn't say anything interesting. We consistently have line
> spanning as the lines are fairly verbose. Not an easy way to avoid that.
> 
> Regardless, no big deal.

The line-breaks make it non-trivial to grep for an error message. No big deal
and with lpfc having the numeric prefixes for prints this is actually not
needed.

What I mainly didn't like was the implicit pasting of the define instead of
%s. But no big deal as well.

-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

Re: [PATCH v2 04/13] lpfc: Add push-to-adapter support to sli4

2018-02-07 Thread James Smart


On 2/7/2018 2:27 AM, Johannes Thumshirn wrote:

On Wed, Feb 07, 2018 at 10:51:57AM +0100, Johannes Thumshirn wrote:

+   /* Enable combined writes for DPP aperture */
+   pg_addr = (unsigned long)(wq->dpp_regaddr) & PAGE_MASK;
+#ifdef CONFIG_X86
+   rc = set_memory_wc(pg_addr, 1);
+   if (rc) {
+   lpfc_printf_log(phba, KERN_ERR, LOG_INIT,
+   "3272 Cannot setup Combined "
+   "Write on WQ[%d] - disable 
DPP\n",
+   wq->queue_id);
+   phba->cfg_enable_dpp = 0;
+   }
+#else
+   phba->cfg_enable_dpp = 0;
+#endif
+   } else
+   wq->db_regaddr = phba->sli4_hba.WQDBregaddr;


I don't really like the set_memory_wc() call here. Neither do I like the ifdef
CONFIG_X86 special casing.

If you really need write combining, can't you at least use ioremap_wc()?


Coming back to this again (after talking to our ARM/POWER folks internally).
Is this really x86 specific here? I know there are servers with other 
architectures
using lpfcs out there.

I _think_ write combining should be possible on other architectures (that have
PCIe and aren't dead) as well.

The ioremap_wc() I suggested is probably wrong.

So can you please revisit this? I CCed Mark and Michael, maybe they can help
here.


yes - we really are looking for write combining. We were following the 
lead of a couple of other entities in the kernel and hadn't found the 
right combination for something other than X86. We figured we would come 
back and add the non-86 enablements later when we found the right options.


-- james

RE: [PATCH 0/3] scsi: aacraid: Multi controller Kdump IOP reset handling

2018-02-07 Thread Raghava Aditya Renukunta



> -Original Message-
> From: linux-scsi-ow...@vger.kernel.org [mailto:linux-scsi-
> ow...@vger.kernel.org] On Behalf Of Martin K. Petersen
> Sent: Tuesday, February 6, 2018 4:21 PM
> To: Raghava Aditya Renukunta
> 
> Cc: j...@linux.vnet.ibm.com; martin.peter...@oracle.com; linux-
> s...@vger.kernel.org; Scott Benesh ; Tom
> White ; dl-esc-Aacraid Linux Driver
> ; Guilherme G . Piccoli
> ; Bart Van Assche
> 
> Subject: Re: [PATCH 0/3] scsi: aacraid: Multi controller Kdump IOP reset
> handling
> 
> EXTERNAL EMAIL
> 
> 
> Raghava,
> 
> > During Kdump aacraid controller IOP reset is invoked, IOP reset
> > takes approx 40 seconds to bring the controller back up and running.
> > with timeout of 120 seconds and anything more than 2 controllers
> > will cause kdump to timeout.
> >
> > This patchset implements a new reset mechanism called DropIO, that
> > induces the fw to drop any pending IO in the fw and making the reset
> > process quicker.
> 
> This series doesn't apply to my impending 4.17/scsi-queue. Since the
> latter won't exist until Linus releases rc1, please respin against
> current linus/master and resubmit.

Will do Martin.

Thanks,
Raghava Aditya

> Thanks!
> --
> Martin K. Petersen  Oracle Linux Engineering

Re: [PATCH V2 2/8] blk-mq: introduce BLK_MQ_F_GLOBAL_TAGS

2018-02-07 Thread Bart Van Assche


On 02/06/18 15:18, Jens Axboe wrote:

GLOBAL implies that it's, strangely enough, global. That isn't really the
case. Why not call this BLK_MQ_F_HOST_TAGS or something like that? I'd
welcome better names, but global doesn't seem to be a great choice.

BLK_MQ_F_SET_TAGS?


I like the name BLK_MQ_F_HOST_TAGS because it refers how these tags will 
be used by SCSI LLDs, namely as a host-wide tag set.


Thanks,

Bart.

[PATCH v3 2/2] qedi: Cleanup local str variable

2018-02-07 Thread Nilesh Javali

Signed-off-by: Nilesh Javali 
---
 drivers/scsi/qedi/qedi_main.c | 43 ---
 1 file changed, 20 insertions(+), 23 deletions(-)

diff --git a/drivers/scsi/qedi/qedi_main.c b/drivers/scsi/qedi/qedi_main.c
index deaed93..47c45a5 100644
--- a/drivers/scsi/qedi/qedi_main.c
+++ b/drivers/scsi/qedi/qedi_main.c
@@ -1735,7 +1735,6 @@ static ssize_t qedi_show_boot_eth_info(void *data, int 
type, char *buf)
 {
struct qedi_ctx *qedi = data;
struct nvm_iscsi_initiator *initiator;
-   char *str = buf;
int rc = 1;
u32 ipv6_en, dhcp_en, ip_len;
struct nvm_iscsi_block *block;
@@ -1769,32 +1768,32 @@ static ssize_t qedi_show_boot_eth_info(void *data, int 
type, char *buf)
 
switch (type) {
case ISCSI_BOOT_ETH_IP_ADDR:
-   rc = snprintf(str, ip_len, fmt, ip);
+   rc = snprintf(buf, ip_len, fmt, ip);
break;
case ISCSI_BOOT_ETH_SUBNET_MASK:
-   rc = snprintf(str, ip_len, fmt, sub);
+   rc = snprintf(buf, ip_len, fmt, sub);
break;
case ISCSI_BOOT_ETH_GATEWAY:
-   rc = snprintf(str, ip_len, fmt, gw);
+   rc = snprintf(buf, ip_len, fmt, gw);
break;
case ISCSI_BOOT_ETH_FLAGS:
-   rc = snprintf(str, 3, "%hhd\n",
+   rc = snprintf(buf, 3, "%hhd\n",
  SYSFS_FLAG_FW_SEL_BOOT);
break;
case ISCSI_BOOT_ETH_INDEX:
-   rc = snprintf(str, 3, "0\n");
+   rc = snprintf(buf, 3, "0\n");
break;
case ISCSI_BOOT_ETH_MAC:
-   rc = sysfs_format_mac(str, qedi->mac, ETH_ALEN);
+   rc = sysfs_format_mac(buf, qedi->mac, ETH_ALEN);
break;
case ISCSI_BOOT_ETH_VLAN:
-   rc = snprintf(str, 12, "%d\n",
+   rc = snprintf(buf, 12, "%d\n",
  GET_FIELD2(initiator->generic_cont0,
 NVM_ISCSI_CFG_INITIATOR_VLAN));
break;
case ISCSI_BOOT_ETH_ORIGIN:
if (dhcp_en)
-   rc = snprintf(str, 3, "3\n");
+   rc = snprintf(buf, 3, "3\n");
break;
default:
rc = 0;
@@ -1830,7 +1829,6 @@ static ssize_t qedi_show_boot_ini_info(void *data, int 
type, char *buf)
 {
struct qedi_ctx *qedi = data;
struct nvm_iscsi_initiator *initiator;
-   char *str = buf;
int rc;
struct nvm_iscsi_block *block;
 
@@ -1842,7 +1840,7 @@ static ssize_t qedi_show_boot_ini_info(void *data, int 
type, char *buf)
 
switch (type) {
case ISCSI_BOOT_INI_INITIATOR_NAME:
-   rc = sprintf(str, "%.*s\n", NVM_ISCSI_CFG_ISCSI_NAME_MAX_LEN,
+   rc = sprintf(buf, "%.*s\n", NVM_ISCSI_CFG_ISCSI_NAME_MAX_LEN,
 initiator->initiator_name.byte);
break;
default:
@@ -1871,7 +1869,6 @@ static umode_t qedi_ini_get_attr_visibility(void *data, 
int type)
 qedi_show_boot_tgt_info(struct qedi_ctx *qedi, int type,
char *buf, enum qedi_nvm_tgts idx)
 {
-   char *str = buf;
int rc = 1;
u32 ctrl_flags, ipv6_en, chap_en, mchap_en, ip_len;
struct nvm_iscsi_block *block;
@@ -1910,48 +1907,48 @@ static umode_t qedi_ini_get_attr_visibility(void *data, 
int type)
 
switch (type) {
case ISCSI_BOOT_TGT_NAME:
-   rc = sprintf(str, "%.*s\n", NVM_ISCSI_CFG_ISCSI_NAME_MAX_LEN,
+   rc = sprintf(buf, "%.*s\n", NVM_ISCSI_CFG_ISCSI_NAME_MAX_LEN,
 block->target[idx].target_name.byte);
break;
case ISCSI_BOOT_TGT_IP_ADDR:
if (ipv6_en)
-   rc = snprintf(str, ip_len, "%pI6\n",
+   rc = snprintf(buf, ip_len, "%pI6\n",
  block->target[idx].ipv6_addr.byte);
else
-   rc = snprintf(str, ip_len, "%pI4\n",
+   rc = snprintf(buf, ip_len, "%pI4\n",
  block->target[idx].ipv4_addr.byte);
break;
case ISCSI_BOOT_TGT_PORT:
-   rc = snprintf(str, 12, "%d\n",
+   rc = snprintf(buf, 12, "%d\n",
  GET_FIELD2(block->target[idx].generic_cont0,
 NVM_ISCSI_CFG_TARGET_TCP_PORT));
break;
case ISCSI_BOOT_TGT_LUN:
-   rc = snprintf(str, 22, "%.*d\n",
+   rc = snprintf(buf, 22, "%.*d\n",
  block->target[idx].lun.value[1],
  block->target[idx].lun.value[0]);
break;
case ISCSI_BOOT_TGT_CHAP_NAME:
-   rc = sprintf(str, "%.*s\n", NVM_ISCSI_CFG_CHAP_NAME_MAX_LEN,
+

[PATCH v3 1/2] qedi: Fix truncation of CHAP name and secret

2018-02-07 Thread Nilesh Javali

From: Andrew Vasquez 

The data in NVRAM is not guaranteed to be NUL terminated.
Since snprintf expects byte-stream to accommodate null byte,
the CHAP secret is truncated.
Use sprintf instead of snprintf to fix the truncation of
CHAP name and secret.

Signed-off-by: Andrew Vasquez 
Signed-off-by: Nilesh Javali 
---
 drivers/scsi/qedi/qedi_main.c | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/drivers/scsi/qedi/qedi_main.c b/drivers/scsi/qedi/qedi_main.c
index 8808f0d..deaed93 100644
--- a/drivers/scsi/qedi/qedi_main.c
+++ b/drivers/scsi/qedi/qedi_main.c
@@ -1842,8 +1842,8 @@ static ssize_t qedi_show_boot_ini_info(void *data, int 
type, char *buf)
 
switch (type) {
case ISCSI_BOOT_INI_INITIATOR_NAME:
-   rc = snprintf(str, NVM_ISCSI_CFG_ISCSI_NAME_MAX_LEN, "%s\n",
- initiator->initiator_name.byte);
+   rc = sprintf(str, "%.*s\n", NVM_ISCSI_CFG_ISCSI_NAME_MAX_LEN,
+initiator->initiator_name.byte);
break;
default:
rc = 0;
@@ -1910,8 +1910,8 @@ static umode_t qedi_ini_get_attr_visibility(void *data, 
int type)
 
switch (type) {
case ISCSI_BOOT_TGT_NAME:
-   rc = snprintf(str, NVM_ISCSI_CFG_ISCSI_NAME_MAX_LEN, "%s\n",
- block->target[idx].target_name.byte);
+   rc = sprintf(str, "%.*s\n", NVM_ISCSI_CFG_ISCSI_NAME_MAX_LEN,
+block->target[idx].target_name.byte);
break;
case ISCSI_BOOT_TGT_IP_ADDR:
if (ipv6_en)
@@ -1932,20 +1932,20 @@ static umode_t qedi_ini_get_attr_visibility(void *data, 
int type)
  block->target[idx].lun.value[0]);
break;
case ISCSI_BOOT_TGT_CHAP_NAME:
-   rc = snprintf(str, NVM_ISCSI_CFG_CHAP_NAME_MAX_LEN, "%s\n",
- chap_name);
+   rc = sprintf(str, "%.*s\n", NVM_ISCSI_CFG_CHAP_NAME_MAX_LEN,
+chap_name);
break;
case ISCSI_BOOT_TGT_CHAP_SECRET:
-   rc = snprintf(str, NVM_ISCSI_CFG_CHAP_PWD_MAX_LEN, "%s\n",
- chap_secret);
+   rc = sprintf(str, "%.*s\n", NVM_ISCSI_CFG_CHAP_NAME_MAX_LEN,
+chap_secret);
break;
case ISCSI_BOOT_TGT_REV_CHAP_NAME:
-   rc = snprintf(str, NVM_ISCSI_CFG_CHAP_NAME_MAX_LEN, "%s\n",
- mchap_name);
+   rc = sprintf(str, "%.*s\n", NVM_ISCSI_CFG_CHAP_NAME_MAX_LEN,
+mchap_name);
break;
case ISCSI_BOOT_TGT_REV_CHAP_SECRET:
-   rc = snprintf(str, NVM_ISCSI_CFG_CHAP_PWD_MAX_LEN, "%s\n",
- mchap_secret);
+   rc = sprintf(str, "%.*s\n", NVM_ISCSI_CFG_CHAP_NAME_MAX_LEN,
+mchap_secret);
break;
case ISCSI_BOOT_TGT_FLAGS:
rc = snprintf(str, 3, "%hhd\n", SYSFS_FLAG_FW_SEL_BOOT);
-- 
1.8.3.1

[PATCH v3 0/2] Code cleanup and bug fix for truncated CHAP name and secret

2018-02-07 Thread Nilesh Javali

Martin,
Please consider below patch set for next 'scsi-fixes' submission.

Thanks,
Nilesh

Andrew Vasquez (1):
  qedi: Fix truncation of CHAP name and secret

Nilesh Javali (1):
  qedi: Cleanup local str variable

 drivers/scsi/qedi/qedi_main.c | 55 ---
 1 file changed, 26 insertions(+), 29 deletions(-)

-- 
1.8.3.1

Re: [PATCH 6/6] scsi: qedf: use correct strncpy() size

2018-02-07 Thread Chad Dupuis


On Fri, 2 Feb 2018, 8:12am, Arnd Bergmann wrote:

> gcc-8 warns during link-time optimization that the strncpy() call
> passes the size of the source buffer rather than the destination:
> 
> drivers/scsi/qedf/qedf_dbg.c: In function 'qedf_uevent_emit':
> include/linux/string.h:253: error: 'strncpy' specified bound depends on the 
> length of the source argument [-Werror=stringop-overflow=]
> 
> This changes it to strscpy() with the correct length, guaranteeing
> a properly nul-terminated string of the right size.
> 
> Signed-off-by: Arnd Bergmann 
> ---
>  drivers/scsi/qedf/qedf_dbg.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 

Reasonable security precaution.

Acked-by: Chad Dupuis

Re: [PATCH v3 1/2] qedi: Fix truncation of CHAP name and secret

2018-02-07 Thread Bart Van Assche


On 02/07/18 08:12, Nilesh Javali wrote:

The data in NVRAM is not guaranteed to be NUL terminated.
Since snprintf expects byte-stream to accommodate null byte,
the CHAP secret is truncated.
Use sprintf instead of snprintf to fix the truncation of
CHAP name and secret.


Reviewed-by: Bart Van Assche 

Thanks for having addressed the review comments.

Bart.

Re: [PATCH v3 2/2] qedi: Cleanup local str variable

2018-02-07 Thread Bart Van Assche


On 02/07/18 08:12, Nilesh Javali wrote:

Signed-off-by: Nilesh Javali 


Reviewed-by: Bart Van Assche

[PATCH v2 0/3] scsi: aacraid: Multi controller Kdump IOP reset handling

2018-02-07 Thread Raghava Aditya Renukunta

During Kdump aacraid controller IOP reset is invoked, IOP reset
takes approx 40 seconds to bring the controller back up and running.
with timeout of 120 seconds and anything more than 2 controllers
will cause kdump to timeout.

This patchset implements a new reset mechanism called DropIO, that
induces the fw to drop any pending IO in the fw and making the reset
process quicker.

Changes in v2:
Respun patchset against Linus Master
Added Dave Carroll's reviewed-by tags

Raghava Aditya Renukunta (3):
  scsi: aacraid: Implement DropIO sync command
  scsi: aacraid: Preserve MSIX mode in the OMR register
  scsi: aacraid: Auto detect INTx or MSIx mode during sync cmd
processing

 drivers/scsi/aacraid/aacraid.h |   5 +
 drivers/scsi/aacraid/src.c | 205 ++---
 2 files changed, 198 insertions(+), 12 deletions(-)

-- 
2.9.4

[PATCH v2 2/3] scsi: aacraid: Preserve MSIX mode in the OMR register

2018-02-07 Thread Raghava Aditya Renukunta

Preserve the current MSIX mode value in the OMR before
rewriting the OMR to initiate the IOP or Soft Reset.

Signed-off-by: Prasad B Munirathnam 
Signed-off-by: Raghava Aditya Renukunta 
Reviewed-by: Dave Carroll 

---
Changes in V2:
Re based on Linus Master

 drivers/scsi/aacraid/src.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/drivers/scsi/aacraid/src.c b/drivers/scsi/aacraid/src.c
index de48845..09b82d3 100644
--- a/drivers/scsi/aacraid/src.c
+++ b/drivers/scsi/aacraid/src.c
@@ -680,6 +680,25 @@ void aac_set_intx_mode(struct aac_dev *dev)
}
 }
 
+static void aac_clear_omr(struct aac_dev *dev)
+{
+   u32 omr_value = 0;
+
+   omr_value = src_readl(dev, MUnit.OMR);
+
+   /*
+* Check for PCI Errors or Kernel Panic
+*/
+   if ((omr_value == INVALID_OMR) || (omr_value & KERNEL_PANIC))
+   omr_value = 0;
+
+   /*
+* Preserve MSIX Value if any
+*/
+   src_writel(dev, MUnit.OMR, omr_value & AAC_INT_MODE_MSIX);
+   src_readl(dev, MUnit.OMR);
+}
+
 static void aac_dump_fw_fib_iop_reset(struct aac_dev *dev)
 {
__le32 supported_options3;
@@ -740,6 +759,8 @@ static void aac_send_iop_reset(struct aac_dev *dev)
 
aac_set_intx_mode(dev);
 
+   aac_clear_omr(dev);
+
src_writel(dev, MUnit.IDR, IOP_SRC_RESET_MASK);
 
msleep(5000);
@@ -749,6 +770,7 @@ static void aac_send_hardware_soft_reset(struct aac_dev 
*dev)
 {
u_int32_t val;
 
+   aac_clear_omr(dev);
val = readl(((char *)(dev->base) + IBW_SWR_OFFSET));
val |= 0x01;
writel(val, ((char *)(dev->base) + IBW_SWR_OFFSET));
-- 
2.9.4

[PATCH v2 3/3] scsi: aacraid: Auto detect INTx or MSIx mode during sync cmd processing

2018-02-07 Thread Raghava Aditya Renukunta

During sync command processing if legacy INTx status indicates
command is not completed, sample the MSIx register and check if
it indicates command completion, set controller MSIx enabled flag.

Signed-off-by: Prasad B Munirathnam 
Signed-off-by: Raghava Aditya Renukunta 
Reviewed-by: Dave Carroll 

---
Changes in V2:
Re based on Linus Master

 drivers/scsi/aacraid/aacraid.h |  1 +
 drivers/scsi/aacraid/src.c | 22 --
 2 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/aacraid/aacraid.h b/drivers/scsi/aacraid/aacraid.h
index c3fdec9..29bf1e6 100644
--- a/drivers/scsi/aacraid/aacraid.h
+++ b/drivers/scsi/aacraid/aacraid.h
@@ -1231,6 +1231,7 @@ struct src_registers {
 
 #define SRC_ODR_SHIFT  12
 #define SRC_IDR_SHIFT  9
+#define SRC_MSI_READ_MASK  0x1000
 
 typedef void (*fib_callback)(void *ctxt, struct fib *fibctx);
 
diff --git a/drivers/scsi/aacraid/src.c b/drivers/scsi/aacraid/src.c
index 09b82d3..3122389f 100644
--- a/drivers/scsi/aacraid/src.c
+++ b/drivers/scsi/aacraid/src.c
@@ -1405,13 +1405,23 @@ void aac_src_access_devreg(struct aac_dev *dev, int 
mode)
 
 static int aac_src_get_sync_status(struct aac_dev *dev)
 {
+   int msix_val = 0;
+   int legacy_val = 0;
 
-   int val;
+   msix_val = src_readl(dev, MUnit.ODR_MSI) & SRC_MSI_READ_MASK ? 1 : 0;
 
-   if (dev->msi_enabled)
-   val = src_readl(dev, MUnit.ODR_MSI) & 0x1000 ? 1 : 0;
-   else
-   val = src_readl(dev, MUnit.ODR_R) >> SRC_ODR_SHIFT;
+   if (!dev->msi_enabled) {
+   /*
+* if Legacy int status indicates cmd is not complete
+* sample MSIx register to see if it indiactes cmd complete,
+* if yes set the controller in MSIx mode and consider cmd
+* completed
+*/
+   legacy_val = src_readl(dev, MUnit.ODR_R) >> SRC_ODR_SHIFT;
+   if (!(legacy_val & 1) && msix_val)
+   dev->msi_enabled = 1;
+   return legacy_val;
+   }
 
-   return val;
+   return msix_val;
 }
-- 
2.9.4

[PATCH v2 1/3] scsi: aacraid: Implement DropIO sync command

2018-02-07 Thread Raghava Aditya Renukunta

IOP_RESET takes longer time to complete, if controller
is in a state where we can bring it back with init struct,
controller DropIO sync command is implemented.

 - If controller is faulted perform standard IOP_RESET in aac_srcv_init.
 - If controller is not faulted get adapter properties and extended
   properties.
 - Update the sa_firmware variable and determine if DropIO request is
   supported.
 - Issue DropIO request, and get the number of outstanding commands.
 - If all commands are complete with success (CT_OK), consider IOP_RESET
   is complete.
 - If any commands timeout, Perform the IOP_RESET.

Signed-off-by: Prasad B Munirathnam 
Signed-off-by: Raghava Aditya Renukunta 
Reviewed-by: Dave Carroll 

---
Changes in V2:
Re based on Linus Master

 drivers/scsi/aacraid/aacraid.h |   4 +
 drivers/scsi/aacraid/src.c | 161 +++--
 2 files changed, 159 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/aacraid/aacraid.h b/drivers/scsi/aacraid/aacraid.h
index 0095fcb..c3fdec9 100644
--- a/drivers/scsi/aacraid/aacraid.h
+++ b/drivers/scsi/aacraid/aacraid.h
@@ -1528,6 +1528,7 @@ struct aac_bus_info_response {
 #define AAC_COMM_MESSAGE_TYPE3 5
 
 #define AAC_EXTOPT_SA_FIRMWARE cpu_to_le32(1<<1)
+#define AAC_EXTOPT_SOFT_RESET  cpu_to_le32(1<<16)
 
 /* MSIX context */
 struct aac_msix_ctx {
@@ -1662,6 +1663,7 @@ struct aac_dev
u8  raw_io_64;
u8  printf_enabled;
u8  in_reset;
+   u8  in_soft_reset;
u8  msi;
u8  sa_firmware;
int management_fib_count;
@@ -2504,6 +2506,7 @@ struct aac_hba_info {
 #define RCV_TEMP_READINGS  0x0025
 #define GET_COMM_PREFERRED_SETTINGS0x0026
 #define IOP_RESET_FW_FIB_DUMP  0x0034
+#define DROP_IO0x0035
 #define IOP_RESET  0x1000
 #define IOP_RESET_ALWAYS   0x1001
 #define RE_INIT_ADAPTER0x00ee
@@ -2539,6 +2542,7 @@ struct aac_hba_info {
 #defineFLASH_UPD_PENDING   0x2000
 #defineFLASH_UPD_SUCCESS   0x4000
 #defineFLASH_UPD_FAILED0x8000
+#defineINVALID_OMR 0x
 #defineFWUPD_TIMEOUT   (5 * 60)
 
 /*
diff --git a/drivers/scsi/aacraid/src.c b/drivers/scsi/aacraid/src.c
index fde6b6a..de48845 100644
--- a/drivers/scsi/aacraid/src.c
+++ b/drivers/scsi/aacraid/src.c
@@ -255,7 +255,8 @@ static int src_sync_cmd(struct aac_dev *dev, u32 command,
 */
src_writel(dev, MUnit.IDR, INBOUNDDOORBELL_0 << SRC_IDR_SHIFT);
 
-   if (!dev->sync_mode || command != SEND_SYNCHRONOUS_FIB) {
+   if ((!dev->sync_mode || command != SEND_SYNCHRONOUS_FIB) &&
+   !dev->in_soft_reset) {
ok = 0;
start = jiffies;
 
@@ -992,6 +993,148 @@ int aac_src_init(struct aac_dev *dev)
return -1;
 }
 
+static int aac_src_wait_sync(struct aac_dev *dev, int *status)
+{
+   unsigned long start = jiffies;
+   unsigned long usecs = 0;
+   int delay = 5 * HZ;
+   int rc = 1;
+
+   while (time_before(jiffies, start+delay)) {
+   /*
+* Delay 5 microseconds to let Mon960 get info.
+*/
+   udelay(5);
+
+   /*
+* Mon960 will set doorbell0 bit when it has completed the
+* command.
+*/
+   if (aac_src_get_sync_status(dev) & OUTBOUNDDOORBELL_0) {
+   /*
+* Clear: the doorbell.
+*/
+   if (dev->msi_enabled)
+   aac_src_access_devreg(dev, AAC_CLEAR_SYNC_BIT);
+   else
+   src_writel(dev, MUnit.ODR_C,
+   OUTBOUNDDOORBELL_0 << SRC_ODR_SHIFT);
+   rc = 0;
+
+   break;
+   }
+
+   /*
+* Yield the processor in case we are slow
+*/
+   usecs = 1 * USEC_PER_MSEC;
+   usleep_range(usecs, usecs + 50);
+   }
+   /*
+* Pull the synch status from Mailbox 0.
+*/
+   if (status && !rc) {
+   status[0] = readl(&dev->IndexRegs->Mailbox[0]);
+   status[1] = readl(&dev->IndexRegs->Mailbox[1]);
+   status[2] = readl(&dev->IndexRegs->Mailbox[2]);
+   status[3] = readl(&dev->IndexRegs->Mailbox[3]);
+   status[4] = readl(&dev->IndexRegs->Mailbox[4]);
+   }
+
+   return rc;
+}
+
+/**
+ *  aac_src_soft_reset -   perform soft reset to speed up
+ *  access
+ *
+ *  Assumptions: That the controller is in a state where we can
+ *

Re: [PATCH V2 2/8] blk-mq: introduce BLK_MQ_F_GLOBAL_TAGS

2018-02-07 Thread Bart Van Assche

On Mon, 2018-02-05 at 23:20 +0800, Ming Lei wrote:
> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> index 55c0a745b427..385bbec73804 100644
> --- a/block/blk-mq-sched.c
> +++ b/block/blk-mq-sched.c
> @@ -81,6 +81,17 @@ static bool blk_mq_sched_restart_hctx(struct blk_mq_hw_ctx 
> *hctx)
>   } else
>   clear_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);
>  
> + /* need to restart all hw queues for global tags */
> + if (hctx->flags & BLK_MQ_F_GLOBAL_TAGS) {
> + struct blk_mq_hw_ctx *hctx2;
> + int i;
> +
> + queue_for_each_hw_ctx(hctx->queue, hctx2, i)
> + if (blk_mq_run_hw_queue(hctx2, true))
> + return true;
> + return false;
> + }
> +
>   return blk_mq_run_hw_queue(hctx, true);
>  }

This new loop looks misplaced to me. If both the BLK_MQ_F_GLOBAL_TAGS and the
BLK_MQ_F_TAG_SHARED flags are set then the outer loop in blk_mq_sched_restart()
and the inner loop in blk_mq_sched_restart_hctx() will cause more calls of
blk_mq_run_hw_queue() than necessary. Have you considered to merge the above
loop into blk_mq_sched_restart()?

Thanks,

Bart.

Re: [PATCH v3 1/2] qedi: Fix truncation of CHAP name and secret

2018-02-07 Thread Chris Leech

On Wed, Feb 07, 2018 at 08:12:35AM -0800, Nilesh Javali wrote:
> From: Andrew Vasquez 
> 
> The data in NVRAM is not guaranteed to be NUL terminated.
> Since snprintf expects byte-stream to accommodate null byte,
> the CHAP secret is truncated.
> Use sprintf instead of snprintf to fix the truncation of
> CHAP name and secret.
> 
> Signed-off-by: Andrew Vasquez 
> Signed-off-by: Nilesh Javali 

Makes sense, limits the data copied out instead of the entire output
length and ensures that the newline and \0 are always in the output.

Signed-off-by: Chris Leech 

> ---
>  drivers/scsi/qedi/qedi_main.c | 24 
>  1 file changed, 12 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/scsi/qedi/qedi_main.c b/drivers/scsi/qedi/qedi_main.c
> index 8808f0d..deaed93 100644
> --- a/drivers/scsi/qedi/qedi_main.c
> +++ b/drivers/scsi/qedi/qedi_main.c
> @@ -1842,8 +1842,8 @@ static ssize_t qedi_show_boot_ini_info(void *data, int 
> type, char *buf)
>  
>   switch (type) {
>   case ISCSI_BOOT_INI_INITIATOR_NAME:
> - rc = snprintf(str, NVM_ISCSI_CFG_ISCSI_NAME_MAX_LEN, "%s\n",
> -   initiator->initiator_name.byte);
> + rc = sprintf(str, "%.*s\n", NVM_ISCSI_CFG_ISCSI_NAME_MAX_LEN,
> +  initiator->initiator_name.byte);
>   break;
>   default:
>   rc = 0;
> @@ -1910,8 +1910,8 @@ static umode_t qedi_ini_get_attr_visibility(void *data, 
> int type)
>  
>   switch (type) {
>   case ISCSI_BOOT_TGT_NAME:
> - rc = snprintf(str, NVM_ISCSI_CFG_ISCSI_NAME_MAX_LEN, "%s\n",
> -   block->target[idx].target_name.byte);
> + rc = sprintf(str, "%.*s\n", NVM_ISCSI_CFG_ISCSI_NAME_MAX_LEN,
> +  block->target[idx].target_name.byte);
>   break;
>   case ISCSI_BOOT_TGT_IP_ADDR:
>   if (ipv6_en)
> @@ -1932,20 +1932,20 @@ static umode_t qedi_ini_get_attr_visibility(void 
> *data, int type)
> block->target[idx].lun.value[0]);
>   break;
>   case ISCSI_BOOT_TGT_CHAP_NAME:
> - rc = snprintf(str, NVM_ISCSI_CFG_CHAP_NAME_MAX_LEN, "%s\n",
> -   chap_name);
> + rc = sprintf(str, "%.*s\n", NVM_ISCSI_CFG_CHAP_NAME_MAX_LEN,
> +  chap_name);
>   break;
>   case ISCSI_BOOT_TGT_CHAP_SECRET:
> - rc = snprintf(str, NVM_ISCSI_CFG_CHAP_PWD_MAX_LEN, "%s\n",
> -   chap_secret);
> + rc = sprintf(str, "%.*s\n", NVM_ISCSI_CFG_CHAP_NAME_MAX_LEN,
> +  chap_secret);
>   break;
>   case ISCSI_BOOT_TGT_REV_CHAP_NAME:
> - rc = snprintf(str, NVM_ISCSI_CFG_CHAP_NAME_MAX_LEN, "%s\n",
> -   mchap_name);
> + rc = sprintf(str, "%.*s\n", NVM_ISCSI_CFG_CHAP_NAME_MAX_LEN,
> +  mchap_name);
>   break;
>   case ISCSI_BOOT_TGT_REV_CHAP_SECRET:
> - rc = snprintf(str, NVM_ISCSI_CFG_CHAP_PWD_MAX_LEN, "%s\n",
> -   mchap_secret);
> + rc = sprintf(str, "%.*s\n", NVM_ISCSI_CFG_CHAP_NAME_MAX_LEN,
> +  mchap_secret);
>   break;
>   case ISCSI_BOOT_TGT_FLAGS:
>   rc = snprintf(str, 3, "%hhd\n", SYSFS_FLAG_FW_SEL_BOOT);
> -- 
> 1.8.3.1
>

Re: [PATCH v3 2/2] qedi: Cleanup local str variable

2018-02-07 Thread Chris Leech

On Wed, Feb 07, 2018 at 08:12:36AM -0800, Nilesh Javali wrote:
> Signed-off-by: Nilesh Javali 

Signed-off-by: Chris Leech 

> ---
>  drivers/scsi/qedi/qedi_main.c | 43 
> ---
>  1 file changed, 20 insertions(+), 23 deletions(-)
> 
> diff --git a/drivers/scsi/qedi/qedi_main.c b/drivers/scsi/qedi/qedi_main.c
> index deaed93..47c45a5 100644
> --- a/drivers/scsi/qedi/qedi_main.c
> +++ b/drivers/scsi/qedi/qedi_main.c
> @@ -1735,7 +1735,6 @@ static ssize_t qedi_show_boot_eth_info(void *data, int 
> type, char *buf)
>  {
>   struct qedi_ctx *qedi = data;
>   struct nvm_iscsi_initiator *initiator;
> - char *str = buf;
>   int rc = 1;
>   u32 ipv6_en, dhcp_en, ip_len;
>   struct nvm_iscsi_block *block;
> @@ -1769,32 +1768,32 @@ static ssize_t qedi_show_boot_eth_info(void *data, 
> int type, char *buf)
>  
>   switch (type) {
>   case ISCSI_BOOT_ETH_IP_ADDR:
> - rc = snprintf(str, ip_len, fmt, ip);
> + rc = snprintf(buf, ip_len, fmt, ip);
>   break;
>   case ISCSI_BOOT_ETH_SUBNET_MASK:
> - rc = snprintf(str, ip_len, fmt, sub);
> + rc = snprintf(buf, ip_len, fmt, sub);
>   break;
>   case ISCSI_BOOT_ETH_GATEWAY:
> - rc = snprintf(str, ip_len, fmt, gw);
> + rc = snprintf(buf, ip_len, fmt, gw);
>   break;
>   case ISCSI_BOOT_ETH_FLAGS:
> - rc = snprintf(str, 3, "%hhd\n",
> + rc = snprintf(buf, 3, "%hhd\n",
> SYSFS_FLAG_FW_SEL_BOOT);
>   break;
>   case ISCSI_BOOT_ETH_INDEX:
> - rc = snprintf(str, 3, "0\n");
> + rc = snprintf(buf, 3, "0\n");
>   break;
>   case ISCSI_BOOT_ETH_MAC:
> - rc = sysfs_format_mac(str, qedi->mac, ETH_ALEN);
> + rc = sysfs_format_mac(buf, qedi->mac, ETH_ALEN);
>   break;
>   case ISCSI_BOOT_ETH_VLAN:
> - rc = snprintf(str, 12, "%d\n",
> + rc = snprintf(buf, 12, "%d\n",
> GET_FIELD2(initiator->generic_cont0,
>NVM_ISCSI_CFG_INITIATOR_VLAN));
>   break;
>   case ISCSI_BOOT_ETH_ORIGIN:
>   if (dhcp_en)
> - rc = snprintf(str, 3, "3\n");
> + rc = snprintf(buf, 3, "3\n");
>   break;
>   default:
>   rc = 0;
> @@ -1830,7 +1829,6 @@ static ssize_t qedi_show_boot_ini_info(void *data, int 
> type, char *buf)
>  {
>   struct qedi_ctx *qedi = data;
>   struct nvm_iscsi_initiator *initiator;
> - char *str = buf;
>   int rc;
>   struct nvm_iscsi_block *block;
>  
> @@ -1842,7 +1840,7 @@ static ssize_t qedi_show_boot_ini_info(void *data, int 
> type, char *buf)
>  
>   switch (type) {
>   case ISCSI_BOOT_INI_INITIATOR_NAME:
> - rc = sprintf(str, "%.*s\n", NVM_ISCSI_CFG_ISCSI_NAME_MAX_LEN,
> + rc = sprintf(buf, "%.*s\n", NVM_ISCSI_CFG_ISCSI_NAME_MAX_LEN,
>initiator->initiator_name.byte);
>   break;
>   default:
> @@ -1871,7 +1869,6 @@ static umode_t qedi_ini_get_attr_visibility(void *data, 
> int type)
>  qedi_show_boot_tgt_info(struct qedi_ctx *qedi, int type,
>   char *buf, enum qedi_nvm_tgts idx)
>  {
> - char *str = buf;
>   int rc = 1;
>   u32 ctrl_flags, ipv6_en, chap_en, mchap_en, ip_len;
>   struct nvm_iscsi_block *block;
> @@ -1910,48 +1907,48 @@ static umode_t qedi_ini_get_attr_visibility(void 
> *data, int type)
>  
>   switch (type) {
>   case ISCSI_BOOT_TGT_NAME:
> - rc = sprintf(str, "%.*s\n", NVM_ISCSI_CFG_ISCSI_NAME_MAX_LEN,
> + rc = sprintf(buf, "%.*s\n", NVM_ISCSI_CFG_ISCSI_NAME_MAX_LEN,
>block->target[idx].target_name.byte);
>   break;
>   case ISCSI_BOOT_TGT_IP_ADDR:
>   if (ipv6_en)
> - rc = snprintf(str, ip_len, "%pI6\n",
> + rc = snprintf(buf, ip_len, "%pI6\n",
> block->target[idx].ipv6_addr.byte);
>   else
> - rc = snprintf(str, ip_len, "%pI4\n",
> + rc = snprintf(buf, ip_len, "%pI4\n",
> block->target[idx].ipv4_addr.byte);
>   break;
>   case ISCSI_BOOT_TGT_PORT:
> - rc = snprintf(str, 12, "%d\n",
> + rc = snprintf(buf, 12, "%d\n",
> GET_FIELD2(block->target[idx].generic_cont0,
>NVM_ISCSI_CFG_TARGET_TCP_PORT));
>   break;
>   case ISCSI_BOOT_TGT_LUN:
> - rc = snprintf(str, 22, "%.*d\n",
> + rc = snprintf(buf, 22, "%.*d\n",
> block->target[idx].lun.value[1],
> block->target[idx].lun.value[0]);

Re: [PATCH] libiscsi: ensure session spin lock usage consistent

2018-02-07 Thread Mike Christie

On 02/05/2018 01:13 PM, Lee Duncan wrote:
> The libiscsi code was using both spin_lock()/spin_unlock()
> and spin_lock_bh()/spin_unlock_bh() on its session lock.

It does this because the lock was only taken between bottom halves and
process contexts. If we are already in a bh then there is no need to do
the _bh call.

If that's not the case, then you would also want to modify the back_lock
calls too right?

> In addition, lock validation found that libiscsi.c was
> taking a HARDIRQ-unsafe lock while holding an HARDIRQ-
> irq-safe lock:


I'm not sure how this patch helps or if this is a valid bug.

If somehow iscsi_conn_setup was running and it had passed the
spin_lock_bh(&session->frwd_lock) call but not yet released it, and we
got a interrupt, and then blk_timeout_work->iscsi_eh_cmd_timed_out was
run, we would still hit the bug the trace below is warning against
right?  I think wou would want to do the _irq locking call in
iscsi_conn_setup to avoid that.

However, it would be impossible to hit this case, because we would
always have had to complete the create_conn call before ever starting to
even execute commands and eventually time out commands.


> 
>> [ 2528.738454] =
>> [ 2528.744679] WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
>> [ 2528.749891] 4.15.0-6-g4548e6f42022-dirty #418 Not tainted
>> [ 2528.754356] -
>> [ 2528.759643] kworker/0:1H/100 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
>> [ 2528.764608]  (&(&session->frwd_lock)->rlock){+.-.}, at: 
>> [<97d8de0f>] iscsi_eh_cmd_timed_out+0x3d/0x2b0
>> [ 2528.769164]
>>and this task is already holding:
>> [ 2528.771313]  (&(&q->__queue_lock)->rlock){-.-.}, at: [] 
>> blk_timeout_work+0x22/0x120
>> [ 2528.773825] which would create a new lock dependency:
>> [ 2528.775216]  (&(&q->__queue_lock)->rlock){-.-.} -> 
>> (&(&session->frwd_lock)->rlock){+.-.}
>> [ 2528.777071]
>>but this new dependency connects a HARDIRQ-irq-safe lock:
>> [ 2528.778945]  (&(&q->__queue_lock)->rlock){-.-.}
>> [ 2528.778948]
>>... which became HARDIRQ-irq-safe at:
>> [ 2528.781204]   _raw_spin_lock_irqsave+0x3c/0x4b
>> [ 2528.781966]   cfq_idle_slice_timer+0x2f/0x100
>> [ 2528.782705]   __hrtimer_run_queues+0xdc/0x440
>> [ 2528.783448]   hrtimer_interrupt+0xb1/0x210
>> [ 2528.784149]   smp_apic_timer_interrupt+0x6d/0x260
>> [ 2528.784954]   apic_timer_interrupt+0xac/0xc0
>> [ 2528.785679]   native_safe_halt+0x2/0x10
>> [ 2528.786280]   default_idle+0x1f/0x180
>> [ 2528.786806]   do_idle+0x166/0x1e0
>> [ 2528.787288]   cpu_startup_entry+0x6f/0x80
>> [ 2528.787874]   start_secondary+0x186/0x1e0
>> [ 2528.788448]   secondary_startup_64+0xa5/0xb0
>> [ 2528.789070]
>>to a HARDIRQ-irq-unsafe lock:
>> [ 2528.789913]  (&(&session->frwd_lock)->rlock){+.-.}
>> [ 2528.789915]
>>... which became HARDIRQ-irq-unsafe at:
>> [ 2528.791548] ...
>> [ 2528.791553]   _raw_spin_lock_bh+0x34/0x40
>> [ 2528.792366]   iscsi_conn_setup+0x166/0x220
>> [ 2528.792960]   iscsi_tcp_conn_setup+0x10/0x40
>> [ 2528.793526]   iscsi_sw_tcp_conn_create+0x1b/0x160
>> [ 2528.794111]   iscsi_if_rx+0xf9f/0x1580
>> [ 2528.794542]   netlink_unicast+0x1f7/0x2f0
>> [ 2528.795105]   netlink_sendmsg+0x2c9/0x3c0
>> [ 2528.795636]   sock_sendmsg+0x30/0x40
>> [ 2528.796068]   ___sys_sendmsg+0x269/0x2c0
>> [ 2528.796544]   __sys_sendmsg+0x51/0x90
>> [ 2528.797028]   entry_SYSCALL_64_fastpath+0x25/0x9c
>> [ 2528.797595]
>> ...
> 
> This commit avoids possible reverse deadlock.
> 
> Signed-off-by: Lee Duncan 
> ---
>  drivers/scsi/libiscsi.c | 18 +-
>  1 file changed, 9 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
> index 82c3fd4bc938..055357b2fe9e 100644
> --- a/drivers/scsi/libiscsi.c
> +++ b/drivers/scsi/libiscsi.c
> @@ -1248,9 +1248,9 @@ int iscsi_complete_pdu(struct iscsi_conn *conn, struct 
> iscsi_hdr *hdr,
>  {
>   int rc;
>  
> - spin_lock(&conn->session->lock);
> + spin_lock_bh(&conn->session->lock);
>   rc = __iscsi_complete_pdu(conn, hdr, data, datalen);
> - spin_unlock(&conn->session->lock);
> + spin_unlock_bh(&conn->session->lock);
>   return rc;
>  }
>  EXPORT_SYMBOL_GPL(iscsi_complete_pdu);
> @@ -1749,14 +1749,14 @@ static void iscsi_tmf_timedout(unsigned long data)
>   struct iscsi_conn *conn = (struct iscsi_conn *)data;
>   struct iscsi_session *session = conn->session;
>  
> - spin_lock(&session->lock);
> + spin_lock_bh(&session->lock);
>   if (conn->tmf_state == TMF_QUEUED) {
>   conn->tmf_state = TMF_TIMEDOUT;
>   ISCSI_DBG_EH(session, "tmf timedout\n");
>   /* unblock eh_abort() */
>   wake_up(&conn->ehwait);
>   }
> - spin_unlock(&session->lock);
> + spin_unlock_bh(&session->lock);
>  }
>  
>  static int

Re: [PATCH] libiscsi: ensure session spin lock usage consistent

2018-02-07 Thread Mike Christie

On 02/05/2018 01:13 PM, Lee Duncan wrote:
> diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
> index 82c3fd4bc938..055357b2fe9e 100644
> --- a/drivers/scsi/libiscsi.c
> +++ b/drivers/scsi/libiscsi.c
> @@ -1248,9 +1248,9 @@ int iscsi_complete_pdu(struct iscsi_conn *conn, struct 
> iscsi_hdr *hdr,
>  {
>   int rc;
>  
> - spin_lock(&conn->session->lock);
> + spin_lock_bh(&conn->session->lock);
>   rc = __iscsi_complete_pdu(conn, hdr, data, datalen);
> - spin_unlock(&conn->session->lock);
> + spin_unlock_bh(&conn->session->lock);
>   return rc;

This one is actually needed because qla4xxx_task_work calls it from
process context.

Re: [PATCH] libiscsi: ensure session spin lock usage consistent

2018-02-07 Thread Mike Christie

On 02/07/2018 04:41 PM, Mike Christie wrote:
>  
>>  static int iscsi_exec_task_mgmt_fn(struct iscsi_conn *conn,
>> @@ -1908,7 +1908,7 @@ static enum blk_eh_timer_return 
>> iscsi_eh_cmd_timed_out(struct scsi_cmnd *sc)
>>  
>>  ISCSI_DBG_EH(session, "scsi cmd %p timedout\n", sc);
>>  
>> -spin_lock(&session->lock);
>> +spin_lock_bh(&session->lock);
>>  task = (struct iscsi_task *)sc->SCp.ptr;
>>  if (!task) {
>>  /*
>> @@ -2022,7 +2022,7 @@ static enum blk_eh_timer_return 
>> iscsi_eh_cmd_timed_out(struct scsi_cmnd *sc)
>>  done:
>>  if (task)
>>  task->last_timeout = jiffies;
>> -spin_unlock(&session->lock);
>> +spin_unlock_bh(&session->lock);
>>  ISCSI_DBG_EH(session, "return %s\n", rc == BLK_EH_RESET_TIMER ?
>>   "timer reset" : "nh");
>>  return rc;

I guess you want this chunk too. It does not now, but someone could add
a blk_abort_request call that ends up calling into the driver.

Re: [PATCH 0/5] blk-mq/scsi-mq: support global tags & introduce force_blk_mq

2018-02-07 Thread Ming Lei

Hi Kashyap,

On Wed, Feb 07, 2018 at 07:44:04PM +0530, Kashyap Desai wrote:
> > -Original Message-
> > From: Ming Lei [mailto:ming@redhat.com]
> > Sent: Wednesday, February 7, 2018 5:53 PM
> > To: Hannes Reinecke
> > Cc: Kashyap Desai; Jens Axboe; linux-bl...@vger.kernel.org; Christoph
> > Hellwig; Mike Snitzer; linux-scsi@vger.kernel.org; Arun Easi; Omar
> Sandoval;
> > Martin K . Petersen; James Bottomley; Christoph Hellwig; Don Brace;
> Peter
> > Rivera; Paolo Bonzini; Laurence Oberman
> > Subject: Re: [PATCH 0/5] blk-mq/scsi-mq: support global tags & introduce
> > force_blk_mq
> >
> > On Wed, Feb 07, 2018 at 07:50:21AM +0100, Hannes Reinecke wrote:
> > > Hi all,
> > >
> > > [ .. ]
> > > >>
> > > >> Could you share us your patch for enabling global_tags/MQ on
> > > > megaraid_sas
> > > >> so that I can reproduce your test?
> > > >>
> > > >>> See below perf top data. "bt_iter" is consuming 4 times more CPU.
> > > >>
> > > >> Could you share us what the IOPS/CPU utilization effect is after
> > > > applying the
> > > >> patch V2? And your test script?
> > > > Regarding CPU utilization, I need to test one more time. Currently
> > > > system is in used.
> > > >
> > > > I run below fio test on total 24 SSDs expander attached.
> > > >
> > > > numactl -N 1 fio jbod.fio --rw=randread --iodepth=64 --bs=4k
> > > > --ioengine=libaio --rw=randread
> > > >
> > > > Performance dropped from 1.6 M IOPs to 770K IOPs.
> > > >
> > > This is basically what we've seen with earlier iterations.
> >
> > Hi Hannes,
> >
> > As I mentioned in another mail[1], Kashyap's patch has a big issue,
> which
> > causes only reply queue 0 used.
> >
> > [1] https://marc.info/?l=linux-scsi&m=151793204014631&w=2
> >
> > So could you guys run your performance test again after fixing the
> patch?
> 
> Ming -
> 
> I tried after change you requested.  Performance drop is still unresolved.
> From 1.6 M IOPS to 770K IOPS.
> 
> See below data. All 24 reply queue is in used correctly.
> 
> IRQs / 1 second(s)
> IRQ#  TOTAL  NODE0   NODE1  NAME
>  360  16422  0   16422  IR-PCI-MSI 70254653-edge megasas
>  364  15980  0   15980  IR-PCI-MSI 70254657-edge megasas
>  362  15979  0   15979  IR-PCI-MSI 70254655-edge megasas
>  345  15696  0   15696  IR-PCI-MSI 70254638-edge megasas
>  341  15659  0   15659  IR-PCI-MSI 70254634-edge megasas
>  369  15656  0   15656  IR-PCI-MSI 70254662-edge megasas
>  359  15650  0   15650  IR-PCI-MSI 70254652-edge megasas
>  358  15596  0   15596  IR-PCI-MSI 70254651-edge megasas
>  350  15574  0   15574  IR-PCI-MSI 70254643-edge megasas
>  342  15532  0   15532  IR-PCI-MSI 70254635-edge megasas
>  344  15527  0   15527  IR-PCI-MSI 70254637-edge megasas
>  346  15485  0   15485  IR-PCI-MSI 70254639-edge megasas
>  361  15482  0   15482  IR-PCI-MSI 70254654-edge megasas
>  348  15467  0   15467  IR-PCI-MSI 70254641-edge megasas
>  368  15463  0   15463  IR-PCI-MSI 70254661-edge megasas
>  354  15420  0   15420  IR-PCI-MSI 70254647-edge megasas
>  351  15378  0   15378  IR-PCI-MSI 70254644-edge megasas
>  352  15377  0   15377  IR-PCI-MSI 70254645-edge megasas
>  356  15348  0   15348  IR-PCI-MSI 70254649-edge megasas
>  337  15344  0   15344  IR-PCI-MSI 70254630-edge megasas
>  343  15320  0   15320  IR-PCI-MSI 70254636-edge megasas
>  355  15266  0   15266  IR-PCI-MSI 70254648-edge megasas
>  335  15247  0   15247  IR-PCI-MSI 70254628-edge megasas
>  363  15233  0   15233  IR-PCI-MSI 70254656-edge megasas
> 
> 
> Average:CPU  %usr %nice  %sys   %iowait%steal
> %irq %soft%guest%gnice %idle
> Average: 18  3.80  0.00 14.78 10.08  0.00
> 0.00  4.01  0.00  0.00 67.33
> Average: 19  3.26  0.00 15.35 10.62  0.00
> 0.00  4.03  0.00  0.00 66.74
> Average: 20  3.42  0.00 14.57 10.67  0.00
> 0.00  3.84  0.00  0.00 67.50
> Average: 21  3.19  0.00 15.60 10.75  0.00
> 0.00  4.16  0.00  0.00 66.30
> Average: 22  3.58  0.00 15.15 10.66  0.00
> 0.00  3.51  0.00  0.00 67.11
> Average: 23  3.34  0.00 15.36 10.63  0.00
> 0.00  4.17  0.00  0.00 66.50
> Average: 24  3.50  0.00 14.58 10.93  0.00
> 0.00  3.85  0.00  0.00 67.13
> Average: 25  3.20  0.00 14.68 10.86  0.00
> 0.00  4.31  0.00  0.00 66.95
> Average: 26  3.27  0.00 14.80 10.70  0.00
> 0.00  3.68  0.00  0.00 67.55
> Average: 27  3.58  0.00 15.36 10.80  0.00
> 0.00  3.79  0.00  0.00 66.48
> Average: 28  3.46  0.00 15.17 10.46  0.00
> 0.00  3.32  0.00  0.00 67.59
> Averag

Re: [PATCH] libiscsi: ensure session spin lock usage consistent

2018-02-07 Thread Chris Leech

On Mon, Feb 05, 2018 at 11:13:23AM -0800, Lee Duncan wrote:
> The libiscsi code was using both spin_lock()/spin_unlock()
> and spin_lock_bh()/spin_unlock_bh() on its session lock.
> In addition, lock validation found that libiscsi.c was
> taking a HARDIRQ-unsafe lock while holding an HARDIRQ-
> irq-safe lock:

Lee and I have exchanged a few mails off-list, but I don't think these
changes are the right thing to address the lockdep warning here.

I've managed to reproduce this by exporting a scsi_debug lun over iSCSI
and running fio traffic on a lockdep enabled initiator.

The deadlock that lockdep is worried about is:

>   Possible interrupt unsafe locking scenario:
>
>CPU0CPU1
>
>   lock(&(&session->frwd_lock)->rlock);
>local_irq_disable();
>lock(&(&q->__queue_lock)->rlock);
>lock(&(&session->frwd_lock)->rlock);
>   
> lock(&(&q->__queue_lock)->rlock);
> 
>  *** DEADLOCK ***

Which I don't think can actually happen unless the queue_lock is being
taken in hardirq context by any of the iSCSI drivers, but lockdep can't
know that.  But it does make me wonder if we can do less in
iscsi_eh_cmd_timed_out while interrupts are blocked.

Lee, I'm going to test with your patch applied and see what lockdep
kicks out.

- Chris
 
> > [ 2528.738454] =
> > [ 2528.744679] WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
> > [ 2528.749891] 4.15.0-6-g4548e6f42022-dirty #418 Not tainted
> > [ 2528.754356] -
> > [ 2528.759643] kworker/0:1H/100 [HC0[0]:SC0[0]:HE0:SE1] is trying to 
> > acquire:
> > [ 2528.764608]  (&(&session->frwd_lock)->rlock){+.-.}, at: 
> > [<97d8de0f>] iscsi_eh_cmd_timed_out+0x3d/0x2b0
> > [ 2528.769164]
> >and this task is already holding:
> > [ 2528.771313]  (&(&q->__queue_lock)->rlock){-.-.}, at: 
> > [] blk_timeout_work+0x22/0x120
> > [ 2528.773825] which would create a new lock dependency:
> > [ 2528.775216]  (&(&q->__queue_lock)->rlock){-.-.} -> 
> > (&(&session->frwd_lock)->rlock){+.-.}
> > [ 2528.777071]
> >but this new dependency connects a HARDIRQ-irq-safe lock:
> > [ 2528.778945]  (&(&q->__queue_lock)->rlock){-.-.}
> > [ 2528.778948]
> >... which became HARDIRQ-irq-safe at:
> > [ 2528.781204]   _raw_spin_lock_irqsave+0x3c/0x4b
> > [ 2528.781966]   cfq_idle_slice_timer+0x2f/0x100
> > [ 2528.782705]   __hrtimer_run_queues+0xdc/0x440
> > [ 2528.783448]   hrtimer_interrupt+0xb1/0x210
> > [ 2528.784149]   smp_apic_timer_interrupt+0x6d/0x260
> > [ 2528.784954]   apic_timer_interrupt+0xac/0xc0
> > [ 2528.785679]   native_safe_halt+0x2/0x10
> > [ 2528.786280]   default_idle+0x1f/0x180
> > [ 2528.786806]   do_idle+0x166/0x1e0
> > [ 2528.787288]   cpu_startup_entry+0x6f/0x80
> > [ 2528.787874]   start_secondary+0x186/0x1e0
> > [ 2528.788448]   secondary_startup_64+0xa5/0xb0
> > [ 2528.789070]
> >to a HARDIRQ-irq-unsafe lock:
> > [ 2528.789913]  (&(&session->frwd_lock)->rlock){+.-.}
> > [ 2528.789915]
> >... which became HARDIRQ-irq-unsafe at:
> > [ 2528.791548] ...
> > [ 2528.791553]   _raw_spin_lock_bh+0x34/0x40
> > [ 2528.792366]   iscsi_conn_setup+0x166/0x220
> > [ 2528.792960]   iscsi_tcp_conn_setup+0x10/0x40
> > [ 2528.793526]   iscsi_sw_tcp_conn_create+0x1b/0x160
> > [ 2528.794111]   iscsi_if_rx+0xf9f/0x1580
> > [ 2528.794542]   netlink_unicast+0x1f7/0x2f0
> > [ 2528.795105]   netlink_sendmsg+0x2c9/0x3c0
> > [ 2528.795636]   sock_sendmsg+0x30/0x40
> > [ 2528.796068]   ___sys_sendmsg+0x269/0x2c0
> > [ 2528.796544]   __sys_sendmsg+0x51/0x90
> > [ 2528.797028]   entry_SYSCALL_64_fastpath+0x25/0x9c
> > [ 2528.797595]
> > ...
> 
> This commit avoids possible reverse deadlock.
> 
> Signed-off-by: Lee Duncan 
> ---
>  drivers/scsi/libiscsi.c | 18 +-
>  1 file changed, 9 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
> index 82c3fd4bc938..055357b2fe9e 100644
> --- a/drivers/scsi/libiscsi.c
> +++ b/drivers/scsi/libiscsi.c
> @@ -1248,9 +1248,9 @@ int iscsi_complete_pdu(struct iscsi_conn *conn, struct 
> iscsi_hdr *hdr,
>  {
>   int rc;
>  
> - spin_lock(&conn->session->lock);
> + spin_lock_bh(&conn->session->lock);
>   rc = __iscsi_complete_pdu(conn, hdr, data, datalen);
> - spin_unlock(&conn->session->lock);
> + spin_unlock_bh(&conn->session->lock);
>   return rc;
>  }
>  EXPORT_SYMBOL_GPL(iscsi_complete_pdu);
> @@ -1749,14 +1749,14 @@ static void iscsi_tmf_timedout(unsigned long data)
>   struct iscsi_conn *conn = (struct iscsi_conn *)data;
>   struct iscsi_session *session = conn->session;
>  
> - spin_lock(&session->lock);
> + spin_lock_bh(&session->lock);
>   if (conn->tmf_state == TMF_QUEUED) {
>

Re: [PATCH] libiscsi: ensure session spin lock usage consistent

2018-02-07 Thread Chris Leech

I overlooked it by mentally swapping out the session->lock in the patch
for session->frwd_lock from the warning when looking at this, but what
kernel was this patch built against?  It doesn't have the
frwd_lock/back_lock split stuff.

- Chris

On Mon, Feb 05, 2018 at 11:13:23AM -0800, Lee Duncan wrote:
> The libiscsi code was using both spin_lock()/spin_unlock()
> and spin_lock_bh()/spin_unlock_bh() on its session lock.
> In addition, lock validation found that libiscsi.c was
> taking a HARDIRQ-unsafe lock while holding an HARDIRQ-
> irq-safe lock:
> 
> > [ 2528.738454] =
> > [ 2528.744679] WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
> > [ 2528.749891] 4.15.0-6-g4548e6f42022-dirty #418 Not tainted
> > [ 2528.754356] -
> > [ 2528.759643] kworker/0:1H/100 [HC0[0]:SC0[0]:HE0:SE1] is trying to 
> > acquire:
> > [ 2528.764608]  (&(&session->frwd_lock)->rlock){+.-.}, at: 
> > [<97d8de0f>] iscsi_eh_cmd_timed_out+0x3d/0x2b0
> > [ 2528.769164]
> >and this task is already holding:
> > [ 2528.771313]  (&(&q->__queue_lock)->rlock){-.-.}, at: 
> > [] blk_timeout_work+0x22/0x120
> > [ 2528.773825] which would create a new lock dependency:
> > [ 2528.775216]  (&(&q->__queue_lock)->rlock){-.-.} -> 
> > (&(&session->frwd_lock)->rlock){+.-.}
> > [ 2528.777071]
> >but this new dependency connects a HARDIRQ-irq-safe lock:
> > [ 2528.778945]  (&(&q->__queue_lock)->rlock){-.-.}
> > [ 2528.778948]
> >... which became HARDIRQ-irq-safe at:
> > [ 2528.781204]   _raw_spin_lock_irqsave+0x3c/0x4b
> > [ 2528.781966]   cfq_idle_slice_timer+0x2f/0x100
> > [ 2528.782705]   __hrtimer_run_queues+0xdc/0x440
> > [ 2528.783448]   hrtimer_interrupt+0xb1/0x210
> > [ 2528.784149]   smp_apic_timer_interrupt+0x6d/0x260
> > [ 2528.784954]   apic_timer_interrupt+0xac/0xc0
> > [ 2528.785679]   native_safe_halt+0x2/0x10
> > [ 2528.786280]   default_idle+0x1f/0x180
> > [ 2528.786806]   do_idle+0x166/0x1e0
> > [ 2528.787288]   cpu_startup_entry+0x6f/0x80
> > [ 2528.787874]   start_secondary+0x186/0x1e0
> > [ 2528.788448]   secondary_startup_64+0xa5/0xb0
> > [ 2528.789070]
> >to a HARDIRQ-irq-unsafe lock:
> > [ 2528.789913]  (&(&session->frwd_lock)->rlock){+.-.}
> > [ 2528.789915]
> >... which became HARDIRQ-irq-unsafe at:
> > [ 2528.791548] ...
> > [ 2528.791553]   _raw_spin_lock_bh+0x34/0x40
> > [ 2528.792366]   iscsi_conn_setup+0x166/0x220
> > [ 2528.792960]   iscsi_tcp_conn_setup+0x10/0x40
> > [ 2528.793526]   iscsi_sw_tcp_conn_create+0x1b/0x160
> > [ 2528.794111]   iscsi_if_rx+0xf9f/0x1580
> > [ 2528.794542]   netlink_unicast+0x1f7/0x2f0
> > [ 2528.795105]   netlink_sendmsg+0x2c9/0x3c0
> > [ 2528.795636]   sock_sendmsg+0x30/0x40
> > [ 2528.796068]   ___sys_sendmsg+0x269/0x2c0
> > [ 2528.796544]   __sys_sendmsg+0x51/0x90
> > [ 2528.797028]   entry_SYSCALL_64_fastpath+0x25/0x9c
> > [ 2528.797595]
> > ...
> 
> This commit avoids possible reverse deadlock.
> 
> Signed-off-by: Lee Duncan 
> ---
>  drivers/scsi/libiscsi.c | 18 +-
>  1 file changed, 9 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
> index 82c3fd4bc938..055357b2fe9e 100644
> --- a/drivers/scsi/libiscsi.c
> +++ b/drivers/scsi/libiscsi.c
> @@ -1248,9 +1248,9 @@ int iscsi_complete_pdu(struct iscsi_conn *conn, struct 
> iscsi_hdr *hdr,
>  {
>   int rc;
>  
> - spin_lock(&conn->session->lock);
> + spin_lock_bh(&conn->session->lock);
>   rc = __iscsi_complete_pdu(conn, hdr, data, datalen);
> - spin_unlock(&conn->session->lock);
> + spin_unlock_bh(&conn->session->lock);
>   return rc;
>  }
>  EXPORT_SYMBOL_GPL(iscsi_complete_pdu);
> @@ -1749,14 +1749,14 @@ static void iscsi_tmf_timedout(unsigned long data)
>   struct iscsi_conn *conn = (struct iscsi_conn *)data;
>   struct iscsi_session *session = conn->session;
>  
> - spin_lock(&session->lock);
> + spin_lock_bh(&session->lock);
>   if (conn->tmf_state == TMF_QUEUED) {
>   conn->tmf_state = TMF_TIMEDOUT;
>   ISCSI_DBG_EH(session, "tmf timedout\n");
>   /* unblock eh_abort() */
>   wake_up(&conn->ehwait);
>   }
> - spin_unlock(&session->lock);
> + spin_unlock_bh(&session->lock);
>  }
>  
>  static int iscsi_exec_task_mgmt_fn(struct iscsi_conn *conn,
> @@ -1908,7 +1908,7 @@ static enum blk_eh_timer_return 
> iscsi_eh_cmd_timed_out(struct scsi_cmnd *sc)
>  
>   ISCSI_DBG_EH(session, "scsi cmd %p timedout\n", sc);
>  
> - spin_lock(&session->lock);
> + spin_lock_bh(&session->lock);
>   task = (struct iscsi_task *)sc->SCp.ptr;
>   if (!task) {
>   /*
> @@ -2022,7 +2022,7 @@ static enum blk_eh_timer_return 
> iscsi_eh_cmd_timed_out(struct scsi_cmnd *sc)
>  done:
>   if (task)
>   task->last_timeout = jiffies;

Re: [PATCH 0/5] blk-mq/scsi-mq: support global tags & introduce force_blk_mq

2018-02-07 Thread Hannes Reinecke

On 02/07/2018 03:14 PM, Kashyap Desai wrote:
>> -Original Message-
>> From: Ming Lei [mailto:ming@redhat.com]
>> Sent: Wednesday, February 7, 2018 5:53 PM
>> To: Hannes Reinecke
>> Cc: Kashyap Desai; Jens Axboe; linux-bl...@vger.kernel.org; Christoph
>> Hellwig; Mike Snitzer; linux-scsi@vger.kernel.org; Arun Easi; Omar
> Sandoval;
>> Martin K . Petersen; James Bottomley; Christoph Hellwig; Don Brace;
> Peter
>> Rivera; Paolo Bonzini; Laurence Oberman
>> Subject: Re: [PATCH 0/5] blk-mq/scsi-mq: support global tags & introduce
>> force_blk_mq
>>
>> On Wed, Feb 07, 2018 at 07:50:21AM +0100, Hannes Reinecke wrote:
>>> Hi all,
>>>
>>> [ .. ]
>
> Could you share us your patch for enabling global_tags/MQ on
 megaraid_sas
> so that I can reproduce your test?
>
>> See below perf top data. "bt_iter" is consuming 4 times more CPU.
>
> Could you share us what the IOPS/CPU utilization effect is after
 applying the
> patch V2? And your test script?
 Regarding CPU utilization, I need to test one more time. Currently
 system is in used.

 I run below fio test on total 24 SSDs expander attached.

 numactl -N 1 fio jbod.fio --rw=randread --iodepth=64 --bs=4k
 --ioengine=libaio --rw=randread

 Performance dropped from 1.6 M IOPs to 770K IOPs.

>>> This is basically what we've seen with earlier iterations.
>>
>> Hi Hannes,
>>
>> As I mentioned in another mail[1], Kashyap's patch has a big issue,
> which
>> causes only reply queue 0 used.
>>
>> [1] https://marc.info/?l=linux-scsi&m=151793204014631&w=2
>>
>> So could you guys run your performance test again after fixing the
> patch?
> 
> Ming -
> 
> I tried after change you requested.  Performance drop is still unresolved.
> From 1.6 M IOPS to 770K IOPS.
> 
> See below data. All 24 reply queue is in used correctly.
> 
> IRQs / 1 second(s)
> IRQ#  TOTAL  NODE0   NODE1  NAME
>  360  16422  0   16422  IR-PCI-MSI 70254653-edge megasas
>  364  15980  0   15980  IR-PCI-MSI 70254657-edge megasas
>  362  15979  0   15979  IR-PCI-MSI 70254655-edge megasas
>  345  15696  0   15696  IR-PCI-MSI 70254638-edge megasas
>  341  15659  0   15659  IR-PCI-MSI 70254634-edge megasas
>  369  15656  0   15656  IR-PCI-MSI 70254662-edge megasas
>  359  15650  0   15650  IR-PCI-MSI 70254652-edge megasas
>  358  15596  0   15596  IR-PCI-MSI 70254651-edge megasas
>  350  15574  0   15574  IR-PCI-MSI 70254643-edge megasas
>  342  15532  0   15532  IR-PCI-MSI 70254635-edge megasas
>  344  15527  0   15527  IR-PCI-MSI 70254637-edge megasas
>  346  15485  0   15485  IR-PCI-MSI 70254639-edge megasas
>  361  15482  0   15482  IR-PCI-MSI 70254654-edge megasas
>  348  15467  0   15467  IR-PCI-MSI 70254641-edge megasas
>  368  15463  0   15463  IR-PCI-MSI 70254661-edge megasas
>  354  15420  0   15420  IR-PCI-MSI 70254647-edge megasas
>  351  15378  0   15378  IR-PCI-MSI 70254644-edge megasas
>  352  15377  0   15377  IR-PCI-MSI 70254645-edge megasas
>  356  15348  0   15348  IR-PCI-MSI 70254649-edge megasas
>  337  15344  0   15344  IR-PCI-MSI 70254630-edge megasas
>  343  15320  0   15320  IR-PCI-MSI 70254636-edge megasas
>  355  15266  0   15266  IR-PCI-MSI 70254648-edge megasas
>  335  15247  0   15247  IR-PCI-MSI 70254628-edge megasas
>  363  15233  0   15233  IR-PCI-MSI 70254656-edge megasas
> 
> 
> Average:CPU  %usr %nice  %sys   %iowait%steal
> %irq %soft%guest%gnice %idle
> Average: 18  3.80  0.00 14.78 10.08  0.00
> 0.00  4.01  0.00  0.00 67.33
> Average: 19  3.26  0.00 15.35 10.62  0.00
> 0.00  4.03  0.00  0.00 66.74
> Average: 20  3.42  0.00 14.57 10.67  0.00
> 0.00  3.84  0.00  0.00 67.50
> Average: 21  3.19  0.00 15.60 10.75  0.00
> 0.00  4.16  0.00  0.00 66.30
> Average: 22  3.58  0.00 15.15 10.66  0.00
> 0.00  3.51  0.00  0.00 67.11
> Average: 23  3.34  0.00 15.36 10.63  0.00
> 0.00  4.17  0.00  0.00 66.50
> Average: 24  3.50  0.00 14.58 10.93  0.00
> 0.00  3.85  0.00  0.00 67.13
> Average: 25  3.20  0.00 14.68 10.86  0.00
> 0.00  4.31  0.00  0.00 66.95
> Average: 26  3.27  0.00 14.80 10.70  0.00
> 0.00  3.68  0.00  0.00 67.55
> Average: 27  3.58  0.00 15.36 10.80  0.00
> 0.00  3.79  0.00  0.00 66.48
> Average: 28  3.46  0.00 15.17 10.46  0.00
> 0.00  3.32  0.00  0.00 67.59
> Average: 29  3.34  0.00 14.42 10.72  0.00
> 0.00  3.34  0.00  0.00 68.18
> Average

52 matches

Mail list logo