Re: [PATCH v2 01/13] lpfc: Rework lpfc to allow different sli4 cq and eq handlers
I first though of renameing: lpfc_sli4_eq_clr_intr -> lpfc_sli4_t2_eq_clr_intr lpfc_sli4_eq_release -> lpfc_sli4_t2_eq_release lpfc_sli4_cq_release -> lpfc_sli4_t2_cq_release and then make lpfc_sli4_eq_clr_intr, etc... wrappers over the new function pointers so and you could keep the callsites, but then you'd have to change the callsites anyways as the function prototype will need to change. So looks good, Reviewed-by: Johannes Thumshirn -- Johannes Thumshirn Storage jthumsh...@suse.de+49 911 74053 689 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB 21284 (AG Nürnberg) Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
Re: [PATCH v2 02/13] lpfc: Rework sli4 doorbell infrastructure
> diff --git a/drivers/scsi/lpfc/lpfc_sli4.h b/drivers/scsi/lpfc/lpfc_sli4.h > index 0c0cbe296fed..e2f06c92c4dd 100644 > --- a/drivers/scsi/lpfc/lpfc_sli4.h > +++ b/drivers/scsi/lpfc/lpfc_sli4.h > @@ -569,7 +569,8 @@ struct lpfc_sli4_hba { > /* IF type 0, BAR 0 and if type 2, BAR 0 doorbell register memory map */ > void __iomem *RQDBregaddr; /* RQ_DOORBELL register */ > void __iomem *WQDBregaddr; /* WQ_DOORBELL register */ > - void __iomem *EQCQDBregaddr;/* EQCQ_DOORBELL register */ > + void __iomem *CQDBregaddr; /* CQ_DOORBELL register */ > + void __iomem *EQDBregaddr; /* EQ_DOORBELL register */ > void __iomem *MQDBregaddr; /* MQ_DOORBELL register */ > void __iomem *BMBXregaddr; /* BootStrap MBX register */ Reviewed-by: Johannes Thumshirn Although CQDBregaddr, EQDBregaddr, BMBXregaddr, etc... really look ugly and this would've been the perfect chance to change it ;-) -- Johannes Thumshirn Storage jthumsh...@suse.de+49 911 74053 689 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB 21284 (AG Nürnberg) Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
Re: [PATCH v2 03/13] lpfc: Add SLI-4 if_type=6 support to the code base
Looks good, Reviewed-by: Johannes Thumshirn -- Johannes Thumshirn Storage jthumsh...@suse.de+49 911 74053 689 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB 21284 (AG Nürnberg) Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
Re: [PATCH v2 04/13] lpfc: Add push-to-adapter support to sli4
> + if ((if_type == LPFC_SLI_INTF_IF_TYPE_6) && > + (pci_resource_start(pdev, PCI_64BIT_BAR4))) { The above contains a lot of unneeded parenthesis. [...] > + /* Enable combined writes for DPP aperture */ > + pg_addr = (unsigned long)(wq->dpp_regaddr) & PAGE_MASK; > +#ifdef CONFIG_X86 > + rc = set_memory_wc(pg_addr, 1); > + if (rc) { > + lpfc_printf_log(phba, KERN_ERR, LOG_INIT, > + "3272 Cannot setup Combined " > + "Write on WQ[%d] - disable > DPP\n", > + wq->queue_id); > + phba->cfg_enable_dpp = 0; > + } > +#else > + phba->cfg_enable_dpp = 0; > +#endif > + } else > + wq->db_regaddr = phba->sli4_hba.WQDBregaddr; I don't really like the set_memory_wc() call here. Neither do I like the ifdef CONFIG_X86 special casing. If you really need write combining, can't you at least use ioremap_wc()? Thanks, Johannes -- Johannes Thumshirn Storage jthumsh...@suse.de+49 911 74053 689 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB 21284 (AG Nürnberg) Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
Re: [PATCH v2 05/13] lpfc: Add PCI Ids for if_type=6 hardware
Looks good, Reviewed-by: Johannes Thumshirn -- Johannes Thumshirn Storage jthumsh...@suse.de+49 911 74053 689 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB 21284 (AG Nürnberg) Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
Re: [PATCH v2 06/13] lpfc: Add 64G link speed support
On Tue, Feb 06, 2018 at 06:28:44PM -0800, James Smart wrote: > lpfc_printf_log(phba, KERN_ERR, LOG_INIT, > - "0469 lpfc_link_speed attribute cannot be set to %d, " > - "allowed values are ["LPFC_LINK_SPEED_STRING"]\n", val); > + "0469 lpfc_link_speed attribute cannot be set to %d, " > + "allowed values are ["LPFC_LINK_SPEED_STRING"]\n", val); lpfc_printf_log(phba, KERN_ERR, LOG_INIT, "0469 lpfc_link_speed attribute cannot be set to %d, " "allowed values are [%s]\n", val, LPFC_LINK_SPEED_STRING); And possible have the whole quoted string on one line for easier grepping. checkplatch.pl should've told you that. Thanks, Johannes -- Johannes Thumshirn Storage jthumsh...@suse.de+49 911 74053 689 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB 21284 (AG Nürnberg) Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
Re: [PATCH v2 07/13] lpfc: Add if_type=6 support for cycling valid bits
Looks good, Reviewed-by: Johannes Thumshirn -- Johannes Thumshirn Storage jthumsh...@suse.de+49 911 74053 689 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB 21284 (AG Nürnberg) Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
Re: [PATCH v2 08/13] lpfc: Enable fw download on if_type=6 devices
Looks good, Reviewed-by: Johannes Thumshirn -- Johannes Thumshirn Storage jthumsh...@suse.de+49 911 74053 689 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB 21284 (AG Nürnberg) Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
Re: [PATCH v2 09/13] lpfc: Add embedded data pointers for enhanced performance
On Tue, Feb 06, 2018 at 06:28:47PM -0800, James Smart wrote: > + if (do_pbde && (i == 0)) { Nit: if (do_pde && i == 0) { Else: Reviewed-by: Johannes Thumshirn -- Johannes Thumshirn Storage jthumsh...@suse.de+49 911 74053 689 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB 21284 (AG Nürnberg) Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
Re: [PATCH v2 04/13] lpfc: Add push-to-adapter support to sli4
On Wed, Feb 07, 2018 at 10:51:57AM +0100, Johannes Thumshirn wrote: > > + /* Enable combined writes for DPP aperture */ > > + pg_addr = (unsigned long)(wq->dpp_regaddr) & PAGE_MASK; > > +#ifdef CONFIG_X86 > > + rc = set_memory_wc(pg_addr, 1); > > + if (rc) { > > + lpfc_printf_log(phba, KERN_ERR, LOG_INIT, > > + "3272 Cannot setup Combined " > > + "Write on WQ[%d] - disable > > DPP\n", > > + wq->queue_id); > > + phba->cfg_enable_dpp = 0; > > + } > > +#else > > + phba->cfg_enable_dpp = 0; > > +#endif > > + } else > > + wq->db_regaddr = phba->sli4_hba.WQDBregaddr; > > I don't really like the set_memory_wc() call here. Neither do I like the ifdef > CONFIG_X86 special casing. > > If you really need write combining, can't you at least use ioremap_wc()? Coming back to this again (after talking to our ARM/POWER folks internally). Is this really x86 specific here? I know there are servers with other architectures using lpfcs out there. I _think_ write combining should be possible on other architectures (that have PCIe and aren't dead) as well. The ioremap_wc() I suggested is probably wrong. So can you please revisit this? I CCed Mark and Michael, maybe they can help here. Byte, Johannes -- Johannes Thumshirn Storage jthumsh...@suse.de+49 911 74053 689 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB 21284 (AG Nürnberg) Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
Re: [PATCH v2 10/13] lpfc: Fix nvme embedded io length on new hardware
Looks good, Reviewed-by: Johannes Thumshirn -- Johannes Thumshirn Storage jthumsh...@suse.de+49 911 74053 689 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB 21284 (AG Nürnberg) Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
Re: [PATCH v2 12/13] lpfc: update driver version to 12.0.0.0
Looks good, Reviewed-by: Johannes Thumshirn -- Johannes Thumshirn Storage jthumsh...@suse.de+49 911 74053 689 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB 21284 (AG Nürnberg) Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
Re: [PATCH v2 11/13] lpfc: Work around NVME cmd iu SGL type
Looks good, Reviewed-by: Johannes Thumshirn -- Johannes Thumshirn Storage jthumsh...@suse.de+49 911 74053 689 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB 21284 (AG Nürnberg) Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
Re: [PATCH v2 13/13] lpfc: Update 12.0.0.0 modified files for 2018 Copyright
Looks good, Reviewed-by: Johannes Thumshirn Although I'd change the subject to: "lpfc: Change Copyright of Update 12.0.0.0 modified files to 2018" Reads easier for the non-native english speaker (like me). -- Johannes Thumshirn Storage jthumsh...@suse.de+49 911 74053 689 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB 21284 (AG Nürnberg) Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
[V1 2/6] mpt3sas: Configure reply post queue depth, DMA and sgl tablesize.
This configures shost max sector to 128, single reply descriptor post queue, sgl table size to 16 and 32 bit DMA for MPI Endpoint and it supports 64K as max IO. Signed-off-by: Suganath Prabu S --- drivers/scsi/mpt3sas/mpt3sas_base.c | 47 +++- drivers/scsi/mpt3sas/mpt3sas_scsih.c | 40 ++ 2 files changed, 54 insertions(+), 33 deletions(-) diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c index 13d6e4e..f45da9a 100644 --- a/drivers/scsi/mpt3sas/mpt3sas_base.c +++ b/drivers/scsi/mpt3sas/mpt3sas_base.c @@ -2214,6 +2214,9 @@ _base_config_dma_addressing(struct MPT3SAS_ADAPTER *ioc, struct pci_dev *pdev) struct sysinfo s; u64 consistent_dma_mask; + if (ioc->is_mcpu_endpoint) + goto try_32bit; + if (ioc->dma_mask) consistent_dma_mask = DMA_BIT_MASK(64); else @@ -2232,6 +2235,7 @@ _base_config_dma_addressing(struct MPT3SAS_ADAPTER *ioc, struct pci_dev *pdev) } } + try_32bit: if (!pci_set_dma_mask(pdev, DMA_BIT_MASK(32)) && !pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(32))) { ioc->base_add_sg_single = &_base_add_sg_single_32; @@ -3887,17 +3891,21 @@ _base_allocate_memory_pools(struct MPT3SAS_ADAPTER *ioc) sg_tablesize = min_t(unsigned short, sg_tablesize, MPT_KDUMP_MIN_PHYS_SEGMENTS); - if (sg_tablesize < MPT_MIN_PHYS_SEGMENTS) - sg_tablesize = MPT_MIN_PHYS_SEGMENTS; - else if (sg_tablesize > MPT_MAX_PHYS_SEGMENTS) { - sg_tablesize = min_t(unsigned short, sg_tablesize, - SG_MAX_SEGMENTS); - pr_warn(MPT3SAS_FMT -"sg_tablesize(%u) is bigger than kernel" -" defined SG_CHUNK_SIZE(%u)\n", ioc->name, -sg_tablesize, MPT_MAX_PHYS_SEGMENTS); + if (ioc->is_mcpu_endpoint) + ioc->shost->sg_tablesize = MPT_MIN_PHYS_SEGMENTS; + else { + if (sg_tablesize < MPT_MIN_PHYS_SEGMENTS) + sg_tablesize = MPT_MIN_PHYS_SEGMENTS; + else if (sg_tablesize > MPT_MAX_PHYS_SEGMENTS) { + sg_tablesize = min_t(unsigned short, sg_tablesize, + SG_MAX_SEGMENTS); + pr_warn(MPT3SAS_FMT + "sg_tablesize(%u) is bigger than kernel " + "defined SG_CHUNK_SIZE(%u)\n", ioc->name, + sg_tablesize, MPT_MAX_PHYS_SEGMENTS); + } + ioc->shost->sg_tablesize = sg_tablesize; } - ioc->shost->sg_tablesize = sg_tablesize; ioc->internal_depth = min_t(int, (facts->HighPriorityCredit + (5)), (facts->RequestCredit / 4)); @@ -3982,13 +3990,18 @@ _base_allocate_memory_pools(struct MPT3SAS_ADAPTER *ioc) /* reply free queue sizing - taking into account for 64 FW events */ ioc->reply_free_queue_depth = ioc->hba_queue_depth + 64; - /* calculate reply descriptor post queue depth */ - ioc->reply_post_queue_depth = ioc->hba_queue_depth + - ioc->reply_free_queue_depth + 1 ; - /* align the reply post queue on the next 16 count boundary */ - if (ioc->reply_post_queue_depth % 16) - ioc->reply_post_queue_depth += 16 - - (ioc->reply_post_queue_depth % 16); + /* mCPU manage single counters for simplicity */ + if (ioc->is_mcpu_endpoint) + ioc->reply_post_queue_depth = ioc->reply_free_queue_depth; + else { + /* calculate reply descriptor post queue depth */ + ioc->reply_post_queue_depth = ioc->hba_queue_depth + + ioc->reply_free_queue_depth + 1; + /* align the reply post queue on the next 16 count boundary */ + if (ioc->reply_post_queue_depth % 16) + ioc->reply_post_queue_depth += 16 - + (ioc->reply_post_queue_depth % 16); + } if (ioc->reply_post_queue_depth > facts->MaxReplyDescriptorPostQueueDepth) { diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c index bde3c6f..5e52679 100644 --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c @@ -10521,26 +10521,34 @@ _scsih_probe(struct pci_dev *pdev, const struct pci_device_id *id) shost->transportt = mpt3sas_transport_template; shost->unique_id = ioc->id; - if (max_sectors != 0x) { - if (max_sectors < 64) { - shost->max_sectors = 64; - pr_warn(MPT3SAS_FMT "Invalid value %d passed " \ - "for max_sectors, range is 64 to 32767. Assigning " - "va
[V1 1/6] mpt3sas: Add PCI device ID for Andromeda.
Add device ID and flag for Andromeda/MPI Emdpont. Signed-off-by: Suganath Prabu S --- drivers/scsi/mpt3sas/mpi/mpi2_cnfg.h | 1 + drivers/scsi/mpt3sas/mpt3sas_base.h | 1 + drivers/scsi/mpt3sas/mpt3sas_scsih.c | 14 -- 3 files changed, 14 insertions(+), 2 deletions(-) diff --git a/drivers/scsi/mpt3sas/mpi/mpi2_cnfg.h b/drivers/scsi/mpt3sas/mpi/mpi2_cnfg.h index ee11710..0ad88de 100644 --- a/drivers/scsi/mpt3sas/mpi/mpi2_cnfg.h +++ b/drivers/scsi/mpt3sas/mpi/mpi2_cnfg.h @@ -524,6 +524,7 @@ typedef struct _MPI2_CONFIG_REPLY { #define MPI2_MFGPAGE_DEVID_SAS2308_1(0x0086) #define MPI2_MFGPAGE_DEVID_SAS2308_2(0x0087) #define MPI2_MFGPAGE_DEVID_SAS2308_3(0x006E) +#define MPI2_MFGPAGE_DEVID_SAS2308_MPI_EP (0x02B0) /*MPI v2.5 SAS products */ #define MPI25_MFGPAGE_DEVID_SAS3004 (0x0096) diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.h b/drivers/scsi/mpt3sas/mpt3sas_base.h index 789bc42..897394d 100644 --- a/drivers/scsi/mpt3sas/mpt3sas_base.h +++ b/drivers/scsi/mpt3sas/mpt3sas_base.h @@ -1336,6 +1336,7 @@ struct MPT3SAS_ADAPTER { u32 ring_buffer_offset; u32 ring_buffer_sz; u8 is_warpdrive; + u8 is_mcpu_endpoint; u8 hide_ir_msg; u8 mfg_pg10_hide_flag; u8 hide_drives; diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c index 74fca18..bde3c6f 100644 --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c @@ -10335,6 +10335,7 @@ _scsih_determine_hba_mpi_version(struct pci_dev *pdev) case MPI2_MFGPAGE_DEVID_SAS2308_1: case MPI2_MFGPAGE_DEVID_SAS2308_2: case MPI2_MFGPAGE_DEVID_SAS2308_3: + case MPI2_MFGPAGE_DEVID_SAS2308_MPI_EP: return MPI2_VERSION; case MPI25_MFGPAGE_DEVID_SAS3004: case MPI25_MFGPAGE_DEVID_SAS3008: @@ -10412,11 +10413,18 @@ _scsih_probe(struct pci_dev *pdev, const struct pci_device_id *id) ioc->hba_mpi_version_belonged = hba_mpi_version; ioc->id = mpt2_ids++; sprintf(ioc->driver_name, "%s", MPT2SAS_DRIVER_NAME); - if (pdev->device == MPI2_MFGPAGE_DEVID_SSS6200) { + switch (pdev->device) { + case MPI2_MFGPAGE_DEVID_SSS6200: ioc->is_warpdrive = 1; ioc->hide_ir_msg = 1; - } else + break; + case MPI2_MFGPAGE_DEVID_SAS2308_MPI_EP: + ioc->is_mcpu_endpoint = 1; + break; + default: ioc->mfg_pg10_hide_flag = MFG_PAGE10_EXPOSE_ALL_DISKS; + break; + } break; case MPI25_VERSION: case MPI26_VERSION: @@ -10845,6 +10853,8 @@ static const struct pci_device_id mpt3sas_pci_table[] = { PCI_ANY_ID, PCI_ANY_ID }, { MPI2_MFGPAGE_VENDORID_LSI, MPI2_MFGPAGE_DEVID_SAS2308_3, PCI_ANY_ID, PCI_ANY_ID }, + { MPI2_MFGPAGE_VENDORID_LSI, MPI2_MFGPAGE_DEVID_SAS2308_MPI_EP, + PCI_ANY_ID, PCI_ANY_ID }, /* SSS6200 */ { MPI2_MFGPAGE_VENDORID_LSI, MPI2_MFGPAGE_DEVID_SSS6200, PCI_ANY_ID, PCI_ANY_ID }, -- 2.5.5
[V1 3/6] mpt3sas: Introduce API's to get BAR0 mapped buffer address.
For MPI Endpoint/Mcpu, Driver should double buffer data buffer/sgl's. This is normally copied from host to internal memory of IOC by DMA engine of PCI Device. Since the interface to DMA from host to mCPU is not present for Mcpu/MPI Endpoint device, Driver does double copy of those buffer directly to the mCPU memory region via BAR-0 region. Introduced API's to calculate and return BAR0 mapped host buffer's physical and virtual address for the provided smid Signed-off-by: Suganath Prabu S --- drivers/scsi/mpt3sas/mpt3sas_base.c | 93 + drivers/scsi/mpt3sas/mpt3sas_base.h | 2 + 2 files changed, 95 insertions(+) diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c index f45da9a..36f1242 100644 --- a/drivers/scsi/mpt3sas/mpt3sas_base.c +++ b/drivers/scsi/mpt3sas/mpt3sas_base.c @@ -126,6 +126,99 @@ module_param_call(mpt3sas_fwfault_debug, _scsih_set_fwfault_debug, param_get_int, &mpt3sas_fwfault_debug, 0644); /** + * _base_get_chain - Calculates and Returns virtual chain address + * for the provided smid in BAR0 space. + * + * @ioc: per adapter object + * @smid: system request message index + * @sge_chain_count: Scatter gather chain count. + * + * @Return: chain address. + */ +static inline void __iomem* +_base_get_chain(struct MPT3SAS_ADAPTER *ioc, u16 smid, + u8 sge_chain_count) +{ + void __iomem *base_chain, *chain_virt; + u16 cmd_credit = ioc->facts.RequestCredit + 1; + + base_chain = (void __iomem *)ioc->chip + MPI_FRAME_START_OFFSET + + (cmd_credit * ioc->request_sz) + + REPLY_FREE_POOL_SIZE; + chain_virt = base_chain + (smid * ioc->facts.MaxChainDepth * + ioc->request_sz) + (sge_chain_count * ioc->request_sz); + return chain_virt; +} + +/** + * _base_get_chain_phys - Calculates and Returns physical address + * in BAR0 for scatter gather chains, for + * the provided smid. + * + * @ioc: per adapter object + * @smid: system request message index + * @sge_chain_count: Scatter gather chain count. + * + * @Return - Physical chain address. + */ +static inline void * +_base_get_chain_phys(struct MPT3SAS_ADAPTER *ioc, u16 smid, + u8 sge_chain_count) +{ + void *base_chain_phys, *chain_phys; + u16 cmd_credit = ioc->facts.RequestCredit + 1; + + base_chain_phys = (void *)ioc->chip_phys + MPI_FRAME_START_OFFSET + + (cmd_credit * ioc->request_sz) + + REPLY_FREE_POOL_SIZE; + chain_phys = base_chain_phys + (smid * ioc->facts.MaxChainDepth * + ioc->request_sz) + (sge_chain_count * ioc->request_sz); + return chain_phys; +} + +/** + * _base_get_buffer_bar0 - Calculates and Returns BAR0 mapped Host + * buffer address for the provided smid. + * (Each smid can have 64K starts from 17024) + * + * @ioc: per adapter object + * @smid: system request message index + * + * @Returns - Pointer to buffer location in BAR0. + */ + +static void __iomem * +_base_get_buffer_bar0(struct MPT3SAS_ADAPTER *ioc, u16 smid) +{ + u16 cmd_credit = ioc->facts.RequestCredit + 1; + // Added extra 1 to reach end of chain. + void __iomem *chain_end = _base_get_chain(ioc, + cmd_credit + 1, + ioc->facts.MaxChainDepth); + return chain_end + (smid * 64 * 1024); +} + +/** + * _base_get_buffer_phys_bar0 - Calculates and Returns BAR0 mapped + * Host buffer Physical address for the provided smid. + * (Each smid can have 64K starts from 17024) + * + * @ioc: per adapter object + * @smid: system request message index + * + * @Returns - Pointer to buffer location in BAR0. + */ +static void * +_base_get_buffer_phys_bar0(struct MPT3SAS_ADAPTER *ioc, u16 smid) +{ + u16 cmd_credit = ioc->facts.RequestCredit + 1; + void *chain_end_phys = _base_get_chain_phys(ioc, + cmd_credit + 1, + ioc->facts.MaxChainDepth); + return chain_end_phys + (smid * 64 * 1024); +} + +/** * mpt3sas_remove_dead_ioc_func - kthread context to remove dead ioc * @arg: input argument, used to derive ioc * diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.h b/drivers/scsi/mpt3sas/mpt3sas_base.h index 897394d..2529d25 100644 --- a/drivers/scsi/mpt3sas/mpt3sas_base.h +++ b/drivers/scsi/mpt3sas/mpt3sas_base.h @@ -120,6 +120,8 @@ #define MPT3SAS_NVME_QUEUE_DEPTH 128 #define MPT_NAME_LENGTH32 /* generic length of strings */ #define MPT_STRING_LENGTH 64 +#define MPI_FRAME_START_OFFSET 256 +#define REPLY_FREE_POOL_SIZE 512 /*(32 maxcredix *4)*(4 times)*/ #define MPT_MAX_CALLBACKS 32 -- 2.5.5
[V1 6/6] mpt3sas: Introduce function to clone mpi reply.
If the posted request has an error of any type, the IOC writes a Reply message into a host-based system reply message frame. This functions clone it in the BAR0 mapped region. Signed-off-by: Suganath Prabu S --- drivers/scsi/mpt3sas/mpt3sas_base.c | 37 - 1 file changed, 36 insertions(+), 1 deletion(-) diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c index 52effd1..1c29286 100644 --- a/drivers/scsi/mpt3sas/mpt3sas_base.c +++ b/drivers/scsi/mpt3sas/mpt3sas_base.c @@ -126,6 +126,33 @@ module_param_call(mpt3sas_fwfault_debug, _scsih_set_fwfault_debug, param_get_int, &mpt3sas_fwfault_debug, 0644); /** + * _base_clone_reply_to_sys_mem - copies reply to reply free iomem + * in BAR0 space. + * + * @ioc: per adapter object + * @reply: reply message frame(lower 32bit addr) + * @index: System request message index. + * + * @Returns - Nothing + */ +static void +_base_clone_reply_to_sys_mem(struct MPT3SAS_ADAPTER *ioc, u32 reply, + u32 index) +{ + /* +* 256 is offset within sys register. +* 256 offset MPI frame starts. Max MPI frame supported is 32. +* 32 * 128 = 4K. From here, Clone of reply free for mcpu starts +*/ + u16 cmd_credit = ioc->facts.RequestCredit + 1; + void __iomem *reply_free_iomem = (void __iomem *)ioc->chip + + MPI_FRAME_START_OFFSET + + (cmd_credit * ioc->request_sz) + (index * sizeof(u32)); + + writel(reply, reply_free_iomem); +} + +/** * _base_clone_mpi_to_sys_mem - Writes/copies MPI frames * to system/BAR0 region. * @@ -1400,6 +1427,10 @@ _base_interrupt(int irq, void *bus_id) 0 : ioc->reply_free_host_index + 1; ioc->reply_free[ioc->reply_free_host_index] = cpu_to_le32(reply); + if (ioc->is_mcpu_endpoint) + _base_clone_reply_to_sys_mem(ioc, + cpu_to_le32(reply), + ioc->reply_free_host_index); writel(ioc->reply_free_host_index, &ioc->chip->ReplyFreeHostIndex); } @@ -6242,8 +6273,12 @@ _base_make_ioc_operational(struct MPT3SAS_ADAPTER *ioc) /* initialize Reply Free Queue */ for (i = 0, reply_address = (u32)ioc->reply_dma ; i < ioc->reply_free_queue_depth ; i++, reply_address += - ioc->reply_sz) + ioc->reply_sz) { ioc->reply_free[i] = cpu_to_le32(reply_address); + if (ioc->is_mcpu_endpoint) + _base_clone_reply_to_sys_mem(ioc, + (__le32)reply_address, i); + } /* initialize reply queues */ if (ioc->is_driver_loading) -- 2.5.5
[V1 0/6] mpt3sas: Adding MPI Endpoint device support.
V1 Change info: * Few sparse warning fixes over initial patch set. * For 32 bit Arch,_base_writeq function is identical to _base_mpi_ep_writeq, Removed duplicate code as suggested by Martin. Andromeda is a PCIe switch, and it has a dedicated management CPU (mCPU), nonvolatile flash memory, RAM etc... and Linux kernel runs on mCPU. MPI Endpoint driver is the management driver for Andromeda. The Plx Manager driver running on mCPU synthesizes a virtual/Synthetic MPI End point to host. Synthetic MPI End point is emulated IT firmware running on Linux operating system, which interfaces with PLX management driver. PLX Management driver integrates IOCFW in same driver binary. At the end of Plx_Mgr driver load, it initializes IOC FW as well. Current implementation is single instance of IOC FW (as it supports only one host). PLX management driver will provide required resources and infrastructure for Synthetic MPI End point. Existing PLXManagement driver will reserve virtual slot for MPI end point. currently, Virtual slot number 29 is reserved for MPI end point. Synthetic device in management driver will be marked as new type “PLX_DEV_TYPE_SYNTH_MPI_EP”. PLXmanagement driver will interface with Synthetic MPI Endpoint for any communication happening on PLX_DEV_TYPE_SYNTH_MPI_EP device type from host. Link between host and PLX C2 is in below diagram. ___ ___| | | | | | | PLX C2|===|HOST | | PCI - |===| MACHINE | | SWITCH | | | |___| | | || |___| || || ___||__ | | | MCPU | | | |___| After MPI end point implementation - (Host will see dedicated Virtual SLOT as MPI End point.) In Below single line is logical channel for MPI Endpoint ___ ___| | | | | | | PLX C2|===| HOST| | PCI - |===| MACHINE | | SWITCH | | | | | | --- | |___|---| | IT DRIVER | | || | | --- | || | |___| || | || | ___||__|___ | || | | | MCPU | | |___| | | | PLX MGR| | | | DRIVER | | | || | | | | |___|_ | | | | | | |IOC FW | | | |_| | |___| PLXmanagement driver will create MPI end point based on device table definition. PLXManagement driver will also populate Synthetic device tree based on Device Table for each host. >From host it will be seen as IT HBA (Simplified version of SAS2/MPI2) (PCI Device, in which emulated IT FW running on mCPU behind Synthetic endpoint of PCISWITCH). For host it is considered as actual Physical Device. PLX Management driver provide interface to do DMA from mCPU to Host using “MCPU Response Data Buffer“ method. DMA from Host to mCPU using “MCPU Response Data Buffer” is not possible. Why DMA from host to mCPU is not possible using Responsebuffer ? MCPU Response buffer is not really for reading from host (reading will work, but answer TLP will not come back to the CSR FIFO, but will go to the MCPU root complex - which could be an unexpected read completion! Existing host driver (mpt2sas) will not work for MPI end point. As the interface to DMA from host to mCPU is not present for Mcpu/MPI Endpoint device, To overcome this Driver should do double copy of those buffer directly to the mCPU memory region via BAR-0 region. The host BAR0 region is divided into different group to serve Host assisted DMA. 0- 255 System register(Doorbell, Host Interrupt etc) 256 - 4352MPI Frame. (This is based on maxCredit 32) 4352 - 4864Reply_free pool (512 byte is reserved considering maxCredit 32. Reply needsextra room, for mCPU case kept four times of maxCredit) 4864 -17152SGE chain element. (32 command * 3 chain of 128 byte size = 12288) 17152 -x Host buffer mapped with smid. (Each smid can have 64K Max IO.) BAR0+Last 1KMSIX Addr and DataTotalsize in use 2113664 bytes of 4MB BAR0 MPI end point module of PLX management driver must be aware of regions above. SGE and Host buffer details will be available in MPI frame. Each PCI packets coming from host on MPI end point will end up in mCPU PLXmanagement driver. We can consid
[V1 5/6] mpt3sas: Introduce function to clone mpi request.
1) Added function _base_clone_mpi_to_sys_mem to clone MPI request into system BAR0 mapped region. 2) Separate out MPI Endpoint IO submissions to function _base_put_smid_mpi_ep_scsi_io. 3) MPI EP requests are submitted in two 32 bit MMIO writes. from _base_mpi_ep_writeq. For 32 bit Arch,_base_writeq function is identical to _base_mpi_ep_writeq, Removed duplicate code as suggested. Signed-off-by: Suganath Prabu S --- drivers/scsi/mpt3sas/mpt3sas_base.c | 140 1 file changed, 125 insertions(+), 15 deletions(-) diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c index c41c65b..52effd1 100644 --- a/drivers/scsi/mpt3sas/mpt3sas_base.c +++ b/drivers/scsi/mpt3sas/mpt3sas_base.c @@ -126,6 +126,25 @@ module_param_call(mpt3sas_fwfault_debug, _scsih_set_fwfault_debug, param_get_int, &mpt3sas_fwfault_debug, 0644); /** + * _base_clone_mpi_to_sys_mem - Writes/copies MPI frames + * to system/BAR0 region. + * + * @dst_iomem: Pointer to the destinaltion location in BAR0 space. + * @src: Pointer to the Source data. + * @size: Size of data to be copied. + */ +static void +_base_clone_mpi_to_sys_mem(void *dst_iomem, void *src, u32 size) +{ + int i; + u32 *src_virt_mem = (u32 *)src; + + for (i = 0; i < size/4; i++) + writel((u32)src_virt_mem[i], + (void __iomem *)dst_iomem + (i * 4)); +} + +/** * _base_clone_to_sys_mem - Writes/copies data to system/BAR0 region * * @dst_iomem: Pointer to the destinaltion location in BAR0 space. @@ -3268,6 +3287,29 @@ mpt3sas_base_free_smid(struct MPT3SAS_ADAPTER *ioc, u16 smid) } /** + * _base_mpi_ep_writeq - 32 bit write to MMIO + * @b: data payload + * @addr: address in MMIO space + * @writeq_lock: spin lock + * + * This special handling for MPI EP to take care of 32 bit + * environment where its not quarenteed to send the entire word + * in one transfer. + */ +static inline void +_base_mpi_ep_writeq(__u64 b, volatile void __iomem *addr, + spinlock_t *writeq_lock) +{ + unsigned long flags; + __u64 data_out = cpu_to_le64(b); + + spin_lock_irqsave(writeq_lock, flags); + writel((u32)(data_out), addr); + writel((u32)(data_out >> 32), (addr + 4)); + spin_unlock_irqrestore(writeq_lock, flags); +} + +/** * _base_writeq - 64 bit write to MMIO * @ioc: per adapter object * @b: data payload @@ -3288,17 +3330,41 @@ _base_writeq(__u64 b, volatile void __iomem *addr, spinlock_t *writeq_lock) static inline void _base_writeq(__u64 b, volatile void __iomem *addr, spinlock_t *writeq_lock) { - unsigned long flags; - __u64 data_out = cpu_to_le64(b); - - spin_lock_irqsave(writeq_lock, flags); - writel((u32)(data_out), addr); - writel((u32)(data_out >> 32), (addr + 4)); - spin_unlock_irqrestore(writeq_lock, flags); + _base_mpi_ep_writeq(b, addr, writeq_lock); } #endif /** + * _base_put_smid_mpi_ep_scsi_io - send SCSI_IO request to firmware + * @ioc: per adapter object + * @smid: system request message index + * @handle: device handle + * + * Return nothing. + */ +static void +_base_put_smid_mpi_ep_scsi_io(struct MPT3SAS_ADAPTER *ioc, u16 smid, u16 handle) +{ + Mpi2RequestDescriptorUnion_t descriptor; + u64 *request = (u64 *)&descriptor; + void *mpi_req_iomem; + __le32 *mfp = (__le32 *)mpt3sas_base_get_msg_frame(ioc, smid); + + _clone_sg_entries(ioc, (void *) mfp, smid); + mpi_req_iomem = (void *)ioc->chip + + MPI_FRAME_START_OFFSET + (smid * ioc->request_sz); + _base_clone_mpi_to_sys_mem(mpi_req_iomem, (void *)mfp, + ioc->request_sz); + descriptor.SCSIIO.RequestFlags = MPI2_REQ_DESCRIPT_FLAGS_SCSI_IO; + descriptor.SCSIIO.MSIxIndex = _base_get_msix_index(ioc); + descriptor.SCSIIO.SMID = cpu_to_le16(smid); + descriptor.SCSIIO.DevHandle = cpu_to_le16(handle); + descriptor.SCSIIO.LMID = 0; + _base_mpi_ep_writeq(*request, &ioc->chip->RequestDescriptorPostLow, + &ioc->scsi_lookup_lock); +} + +/** * _base_put_smid_scsi_io - send SCSI_IO request to firmware * @ioc: per adapter object * @smid: system request message index @@ -3359,7 +3425,23 @@ _base_put_smid_hi_priority(struct MPT3SAS_ADAPTER *ioc, u16 smid, u16 msix_task) { Mpi2RequestDescriptorUnion_t descriptor; - u64 *request = (u64 *)&descriptor; + void *mpi_req_iomem; + u64 *request; + + if (ioc->is_mcpu_endpoint) { + MPI2RequestHeader_t *request_hdr; + + __le32 *mfp = (__le32 *)mpt3sas_base_get_msg_frame(ioc, smid); + + request_hdr = (MPI2RequestHeader_t *)mfp; + /* TBD 256 is offset within sys register. */ + mpi_req_iomem = (void *)ioc->chip + MPI_FRAME_START_OFFSET +
[V1 4/6] mpt3sas: Introduce Base function for cloning.
All scsi IO's and config requests data buffer and sgl are cloned to system memory in _clone_sg_entries before submitting it to Firmware. Signed-off-by: Suganath Prabu S --- drivers/scsi/mpt3sas/mpt3sas_base.c | 215 +- drivers/scsi/mpt3sas/mpt3sas_base.h | 3 + drivers/scsi/mpt3sas/mpt3sas_config.c | 1 + 3 files changed, 218 insertions(+), 1 deletion(-) diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c index 36f1242..c41c65b 100644 --- a/drivers/scsi/mpt3sas/mpt3sas_base.c +++ b/drivers/scsi/mpt3sas/mpt3sas_base.c @@ -126,6 +126,24 @@ module_param_call(mpt3sas_fwfault_debug, _scsih_set_fwfault_debug, param_get_int, &mpt3sas_fwfault_debug, 0644); /** + * _base_clone_to_sys_mem - Writes/copies data to system/BAR0 region + * + * @dst_iomem: Pointer to the destinaltion location in BAR0 space. + * @src: Pointer to the Source data. + * @size: Size of data to be copied. + */ +static void +_base_clone_to_sys_mem(void __iomem *dst_iomem, void *src, u32 size) +{ + int i; + u32 *src_virt_mem = (u32 *)(src); + + for (i = 0; i < size/4; i++) + writel((u32)src_virt_mem[i], + (void __iomem *)dst_iomem + (i * 4)); +} + +/** * _base_get_chain - Calculates and Returns virtual chain address * for the provided smid in BAR0 space. * @@ -219,6 +237,201 @@ _base_get_buffer_phys_bar0(struct MPT3SAS_ADAPTER *ioc, u16 smid) } /** + * _base_get_chain_buffer_dma_to_chain_buffer - Iterates chain + * lookup list and Provides chain_buffer + * address for the matching dma address. + * (Each smid can have 64K starts from 17024) + * + * @ioc: per adapter object + * @chain_buffer_dma: Chain buffer dma address. + * + * @Returns - Pointer to chain buffer. Or Null on Failure. + */ +static void * +_base_get_chain_buffer_dma_to_chain_buffer(struct MPT3SAS_ADAPTER *ioc, + dma_addr_t chain_buffer_dma) +{ + u16 index; + + for (index = 0; index < ioc->chain_depth; index++) { + if (ioc->chain_lookup[index].chain_buffer_dma == + chain_buffer_dma) + return ioc->chain_lookup[index].chain_buffer; + } + pr_info(MPT3SAS_FMT + "Provided chain_buffer_dma address is not in the lookup list\n", + ioc->name); + return NULL; +} + +/** + * _clone_sg_entries - MPI EP's scsiio and config requests + * are handled here. Base function for + * double buffering, before submitting + * the requests. + * + * @ioc: per adapter object. + * @mpi_request: mf request pointer. + * @smid: system request message index. + * + * @Returns: Nothing. + */ +static void _clone_sg_entries(struct MPT3SAS_ADAPTER *ioc, + void *mpi_request, u16 smid) +{ + Mpi2SGESimple32_t *sgel, *sgel_next; + u32 sgl_flags, sge_chain_count = 0; + bool is_write = 0; + u16 i = 0; + void __iomem *buffer_iomem; + void *buffer_iomem_phys; + void __iomem *buff_ptr; + void *buff_ptr_phys; + void __iomem *dst_chain_addr[MCPU_MAX_CHAINS_PER_IO]; + void *src_chain_addr[MCPU_MAX_CHAINS_PER_IO], *dst_addr_phys; + MPI2RequestHeader_t *request_hdr; + struct scsi_cmnd *scmd; + struct scatterlist *sg_scmd = NULL; + int is_scsiio_req = 0; + + request_hdr = (MPI2RequestHeader_t *) mpi_request; + + if (request_hdr->Function == MPI2_FUNCTION_SCSI_IO_REQUEST) { + Mpi25SCSIIORequest_t *scsiio_request = + (Mpi25SCSIIORequest_t *)mpi_request; + sgel = (Mpi2SGESimple32_t *) &scsiio_request->SGL; + is_scsiio_req = 1; + } else if (request_hdr->Function == MPI2_FUNCTION_CONFIG) { + Mpi2ConfigRequest_t *config_req = + (Mpi2ConfigRequest_t *)mpi_request; + sgel = (Mpi2SGESimple32_t *) &config_req->PageBufferSGE; + } else + return; + + /* From smid we can get scsi_cmd, once we have sg_scmd, +* we just need to get sg_virt and sg_next to get virual +* address associated with sgel->Address. +*/ + + if (is_scsiio_req) { + /* Get scsi_cmd using smid */ + scmd = mpt3sas_scsih_scsi_lookup_get(ioc, smid); + if (scmd == NULL) { + pr_err(MPT3SAS_FMT "scmd is NULL\n", ioc->name); + return; + } + + /* Get sg_scmd from scmd provided */ + sg_scmd = scsi_sglist(scmd); + } + + /* +* 0 - 255 System register +* 256 - 4352 MPI Frame. (This is based on maxCredit 32) +* 4352 - 4864 Reply_free pool (512 byte is reserved +* considering maxCredit 32. Reply need extra
Re: [LSF/MM TOPIC] Two blk-mq related topics
On 30/01/2018 10:33, John Garry wrote: On 30/01/2018 01:24, Ming Lei wrote: On Mon, Jan 29, 2018 at 12:56:30PM -0800, James Bottomley wrote: On Mon, 2018-01-29 at 23:46 +0800, Ming Lei wrote: [...] 2. When to enable SCSI_MQ at default again? I'm not sure there's much to discuss ... I think the basic answer is as soon as Christoph wants to try it again. I guess Christoph still need to evaluate if there are existed issues or blockers before trying it again. And more input may be got from F2F discussion, IMHO. SCSI_MQ is enabled on V3.17 firstly, but disabled at default. In V4.13-rc1, it is enabled at default, but later the patch is reverted in V4.13-rc7, and becomes disabled at default too. Now both the original reported PM issue(actually SCSI quiesce) and the sequential IO performance issue have been addressed. Is the blocker bug just not closed because no-one thought to do it: https://bugzilla.kernel.org/show_bug.cgi?id=178381 (we have confirmed that this issue is now fixed with the original reporter?) From a developer view, this issue is fixed by the following commit: 3a0a52997(block, scsi: Make SCSI quiesce and resume work reliably), and it is verified by kernel list reporter. And did the Huawei guy (Jonathan Cameron) confirm his performance issue was fixed (I don't think I saw email that he did)? Last time I talked with John Garry about the issue, and the merged .get_budget based patch improves much on the IO performance, but there is still a bit gap compared with legacy path. Seems a driver specific issue, remembered that removing a driver's lock can improve performance much. Garry, could you provide further update on this issue? Hi Ming, From our testing with experimental changes to our driver to support SCSI mq we were almost getting on par performance with legacy path. But without these MQ was hitting performance (and I would not necessarily say it was a driver issue). We can retest from today's mainline and see where we are. BTW, Have you got performance figures for many other single queue HBAs with and without CONFIG_SCSI_MQ_DEFAULT=Y? We finally got around to retesting this (on hisi_sas controller). So the results are generally ok, in that we are now not seeing such big performance drops in our hardware for enabling SCSI MQ - in some scenarios the performance is better. Generally fio rw mode is better. Anyway, for what it's worth, it's a green light from us to set SCSI MQ on by default. John Thanks, John Thanks, Ming . ___ Linuxarm mailing list linux...@huawei.com http://hulk.huawei.com/mailman/listinfo/linuxarm .
Re: [PATCH 0/5] blk-mq/scsi-mq: support global tags & introduce force_blk_mq
On Wed, Feb 07, 2018 at 07:50:21AM +0100, Hannes Reinecke wrote: > Hi all, > > [ .. ] > >> > >> Could you share us your patch for enabling global_tags/MQ on > > megaraid_sas > >> so that I can reproduce your test? > >> > >>> See below perf top data. "bt_iter" is consuming 4 times more CPU. > >> > >> Could you share us what the IOPS/CPU utilization effect is after > > applying the > >> patch V2? And your test script? > > Regarding CPU utilization, I need to test one more time. Currently system > > is in used. > > > > I run below fio test on total 24 SSDs expander attached. > > > > numactl -N 1 fio jbod.fio --rw=randread --iodepth=64 --bs=4k > > --ioengine=libaio --rw=randread > > > > Performance dropped from 1.6 M IOPs to 770K IOPs. > > > This is basically what we've seen with earlier iterations. Hi Hannes, As I mentioned in another mail[1], Kashyap's patch has a big issue, which causes only reply queue 0 used. [1] https://marc.info/?l=linux-scsi&m=151793204014631&w=2 So could you guys run your performance test again after fixing the patch? Thanks, Ming
[scsi:misc 327/330] drivers//scsi/storvsc_drv.c:1313:4: error: implicit declaration of function 'for_each_cpu_wrap'
tree: https://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git misc head: 16a628faa63c8149d9a8f433e5c6548f6cff98e4 commit: 2439bec3bf084ab6cbc69a66797a4612042be6ca [327/330] scsi: storvsc: Spread interrupts when picking a channel for I/O requests config: x86_64-randconfig-s1-02071932 (attached as .config) compiler: gcc-6 (Debian 6.4.0-9) 6.4.0 20171026 reproduce: git checkout 2439bec3bf084ab6cbc69a66797a4612042be6ca # save the attached .config to linux build tree make ARCH=x86_64 All errors (new ones prefixed by >>): drivers//scsi/storvsc_drv.c: In function 'storvsc_do_io': >> drivers//scsi/storvsc_drv.c:1313:4: error: implicit declaration of function >> 'for_each_cpu_wrap' [-Werror=implicit-function-declaration] for_each_cpu_wrap(tgt_cpu, &alloced_mask, ^ drivers//scsi/storvsc_drv.c:1314:40: error: expected ';' before '{' token outgoing_channel->target_cpu + 1) { ^ cc1: some warnings being treated as errors vim +/for_each_cpu_wrap +1313 drivers//scsi/storvsc_drv.c 1280 1281 1282 static int storvsc_do_io(struct hv_device *device, 1283 struct storvsc_cmd_request *request, u16 q_num) 1284 { 1285 struct storvsc_device *stor_device; 1286 struct vstor_packet *vstor_packet; 1287 struct vmbus_channel *outgoing_channel; 1288 int ret = 0; 1289 struct cpumask alloced_mask; 1290 int tgt_cpu; 1291 1292 vstor_packet = &request->vstor_packet; 1293 stor_device = get_out_stor_device(device); 1294 1295 if (!stor_device) 1296 return -ENODEV; 1297 1298 1299 request->device = device; 1300 /* 1301 * Select an an appropriate channel to send the request out. 1302 */ 1303 1304 if (stor_device->stor_chns[q_num] != NULL) { 1305 outgoing_channel = stor_device->stor_chns[q_num]; 1306 if (outgoing_channel->target_cpu == smp_processor_id()) { 1307 /* 1308 * Ideally, we want to pick a different channel if 1309 * available on the same NUMA node. 1310 */ 1311 cpumask_and(&alloced_mask, &stor_device->alloced_cpus, 1312 cpumask_of_node(cpu_to_node(q_num))); > 1313 for_each_cpu_wrap(tgt_cpu, &alloced_mask, 1314 outgoing_channel->target_cpu + 1) { 1315 if (tgt_cpu != outgoing_channel->target_cpu) { 1316 outgoing_channel = 1317 stor_device->stor_chns[tgt_cpu]; 1318 break; 1319 } 1320 } 1321 } 1322 } else { 1323 outgoing_channel = get_og_chn(stor_device, q_num); 1324 } 1325 1326 1327 vstor_packet->flags |= REQUEST_COMPLETION_FLAG; 1328 1329 vstor_packet->vm_srb.length = (sizeof(struct vmscsi_request) - 1330 vmscsi_size_delta); 1331 1332 1333 vstor_packet->vm_srb.sense_info_length = sense_buffer_size; 1334 1335 1336 vstor_packet->vm_srb.data_transfer_length = 1337 request->payload->range.len; 1338 1339 vstor_packet->operation = VSTOR_OPERATION_EXECUTE_SRB; 1340 1341 if (request->payload->range.len) { 1342 1343 ret = vmbus_sendpacket_mpb_desc(outgoing_channel, 1344 request->payload, request->payload_sz, 1345 vstor_packet, 1346 (sizeof(struct vstor_packet) - 1347 vmscsi_size_delta), 1348 (unsigned long)request); 1349 } else { 1350 ret = vmbus_sendpacket(outgoing_channel, vstor_packet, 1351 (sizeof(struct vstor_packet) - 1352 vmscsi_size_delta), 1353 (unsigned long)request, 1354 VM_PKT_DATA_INBAND, 1355 VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED); 1356 } 1357 1358 if (ret != 0) 1359 return ret; 1360 1361 atomic_inc(&stor_device->num_outstanding_req); 1362 1363 return ret; 1364 } 1365 --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/
Re: [PATCH V2 1/2] qedi: Fix truncation of CHAP name and secret
On 2/6/18, 8:53 PM, "Bart Van Assche" wrote: >On Tue, 2018-02-06 at 05:12 -0800, Nilesh Javali wrote: >> From: Andrew Vasquez >> >> The data in NVRAM is not guaranteed to be NUL terminated. >> Copy the data upto the element size or to the first NUL >> in the byte-stream and then append a NUL. >> >> Signed-off-by: Andrew Vasquez >> Signed-off-by: Nilesh Javali >> --- >> drivers/scsi/qedi/qedi_main.c | 45 >>+++ >> 1 file changed, 33 insertions(+), 12 deletions(-) >> >> diff --git a/drivers/scsi/qedi/qedi_main.c >>b/drivers/scsi/qedi/qedi_main.c >> index 8808f0d..f3dd438 100644 >> --- a/drivers/scsi/qedi/qedi_main.c >> +++ b/drivers/scsi/qedi/qedi_main.c >> @@ -1705,6 +1705,27 @@ void qedi_reset_host_mtu(struct qedi_ctx *qedi, >>u16 mtu) >> qedi_ops->ll2->start(qedi->cdev, ¶ms); >> } >> >> +static ssize_t >> +qedi_show_copy_data(char *buf, size_t size, u8 *data) >> +{ >> +size_t i; >> + >> +if (!data) >> +return sprintf(buf, "\n"); >> + >> +/* >> + * Data not guaranteed to be NUL terminated. Copy until NUL found or >> + * complete copy done. >> + */ >> +for (i = 0; i < size && data[i]; i++) >> +buf[i] = data[i]; >> +/* Data copy complete, append NEWLINE and NUL terminator. */ >> +buf[i] = '\n'; >> +buf[i + 1] = '\0'; >> +return strlen(buf); >> +} > >Can the body of the above function be changed into the following, which >is much >shorter? > >sprintf(buf, "%.*s", (int)size, data ? : "") This looks clean and shorter. I will send the updated patch. Thanks, Nilesh
RE: [PATCH 0/5] blk-mq/scsi-mq: support global tags & introduce force_blk_mq
> -Original Message- > From: Ming Lei [mailto:ming@redhat.com] > Sent: Wednesday, February 7, 2018 5:53 PM > To: Hannes Reinecke > Cc: Kashyap Desai; Jens Axboe; linux-bl...@vger.kernel.org; Christoph > Hellwig; Mike Snitzer; linux-scsi@vger.kernel.org; Arun Easi; Omar Sandoval; > Martin K . Petersen; James Bottomley; Christoph Hellwig; Don Brace; Peter > Rivera; Paolo Bonzini; Laurence Oberman > Subject: Re: [PATCH 0/5] blk-mq/scsi-mq: support global tags & introduce > force_blk_mq > > On Wed, Feb 07, 2018 at 07:50:21AM +0100, Hannes Reinecke wrote: > > Hi all, > > > > [ .. ] > > >> > > >> Could you share us your patch for enabling global_tags/MQ on > > > megaraid_sas > > >> so that I can reproduce your test? > > >> > > >>> See below perf top data. "bt_iter" is consuming 4 times more CPU. > > >> > > >> Could you share us what the IOPS/CPU utilization effect is after > > > applying the > > >> patch V2? And your test script? > > > Regarding CPU utilization, I need to test one more time. Currently > > > system is in used. > > > > > > I run below fio test on total 24 SSDs expander attached. > > > > > > numactl -N 1 fio jbod.fio --rw=randread --iodepth=64 --bs=4k > > > --ioengine=libaio --rw=randread > > > > > > Performance dropped from 1.6 M IOPs to 770K IOPs. > > > > > This is basically what we've seen with earlier iterations. > > Hi Hannes, > > As I mentioned in another mail[1], Kashyap's patch has a big issue, which > causes only reply queue 0 used. > > [1] https://marc.info/?l=linux-scsi&m=151793204014631&w=2 > > So could you guys run your performance test again after fixing the patch? Ming - I tried after change you requested. Performance drop is still unresolved. >From 1.6 M IOPS to 770K IOPS. See below data. All 24 reply queue is in used correctly. IRQs / 1 second(s) IRQ# TOTAL NODE0 NODE1 NAME 360 16422 0 16422 IR-PCI-MSI 70254653-edge megasas 364 15980 0 15980 IR-PCI-MSI 70254657-edge megasas 362 15979 0 15979 IR-PCI-MSI 70254655-edge megasas 345 15696 0 15696 IR-PCI-MSI 70254638-edge megasas 341 15659 0 15659 IR-PCI-MSI 70254634-edge megasas 369 15656 0 15656 IR-PCI-MSI 70254662-edge megasas 359 15650 0 15650 IR-PCI-MSI 70254652-edge megasas 358 15596 0 15596 IR-PCI-MSI 70254651-edge megasas 350 15574 0 15574 IR-PCI-MSI 70254643-edge megasas 342 15532 0 15532 IR-PCI-MSI 70254635-edge megasas 344 15527 0 15527 IR-PCI-MSI 70254637-edge megasas 346 15485 0 15485 IR-PCI-MSI 70254639-edge megasas 361 15482 0 15482 IR-PCI-MSI 70254654-edge megasas 348 15467 0 15467 IR-PCI-MSI 70254641-edge megasas 368 15463 0 15463 IR-PCI-MSI 70254661-edge megasas 354 15420 0 15420 IR-PCI-MSI 70254647-edge megasas 351 15378 0 15378 IR-PCI-MSI 70254644-edge megasas 352 15377 0 15377 IR-PCI-MSI 70254645-edge megasas 356 15348 0 15348 IR-PCI-MSI 70254649-edge megasas 337 15344 0 15344 IR-PCI-MSI 70254630-edge megasas 343 15320 0 15320 IR-PCI-MSI 70254636-edge megasas 355 15266 0 15266 IR-PCI-MSI 70254648-edge megasas 335 15247 0 15247 IR-PCI-MSI 70254628-edge megasas 363 15233 0 15233 IR-PCI-MSI 70254656-edge megasas Average:CPU %usr %nice %sys %iowait%steal %irq %soft%guest%gnice %idle Average: 18 3.80 0.00 14.78 10.08 0.00 0.00 4.01 0.00 0.00 67.33 Average: 19 3.26 0.00 15.35 10.62 0.00 0.00 4.03 0.00 0.00 66.74 Average: 20 3.42 0.00 14.57 10.67 0.00 0.00 3.84 0.00 0.00 67.50 Average: 21 3.19 0.00 15.60 10.75 0.00 0.00 4.16 0.00 0.00 66.30 Average: 22 3.58 0.00 15.15 10.66 0.00 0.00 3.51 0.00 0.00 67.11 Average: 23 3.34 0.00 15.36 10.63 0.00 0.00 4.17 0.00 0.00 66.50 Average: 24 3.50 0.00 14.58 10.93 0.00 0.00 3.85 0.00 0.00 67.13 Average: 25 3.20 0.00 14.68 10.86 0.00 0.00 4.31 0.00 0.00 66.95 Average: 26 3.27 0.00 14.80 10.70 0.00 0.00 3.68 0.00 0.00 67.55 Average: 27 3.58 0.00 15.36 10.80 0.00 0.00 3.79 0.00 0.00 66.48 Average: 28 3.46 0.00 15.17 10.46 0.00 0.00 3.32 0.00 0.00 67.59 Average: 29 3.34 0.00 14.42 10.72 0.00 0.00 3.34 0.00 0.00 68.18 Average: 30 3.34 0.00 15.08 10.70 0.00 0.00 3.89 0.00 0.00 66.99 Average: 31 3.26 0.00 15.33 10.47 0.00 0
Re: [PATCH v2 06/13] lpfc: Add 64G link speed support
On 2/7/2018 1:58 AM, Johannes Thumshirn wrote: On Tue, Feb 06, 2018 at 06:28:44PM -0800, James Smart wrote: lpfc_printf_log(phba, KERN_ERR, LOG_INIT, - "0469 lpfc_link_speed attribute cannot be set to %d, " - "allowed values are ["LPFC_LINK_SPEED_STRING"]\n", val); + "0469 lpfc_link_speed attribute cannot be set to %d, " + "allowed values are ["LPFC_LINK_SPEED_STRING"]\n", val); lpfc_printf_log(phba, KERN_ERR, LOG_INIT, "0469 lpfc_link_speed attribute cannot be set to %d, " "allowed values are [%s]\n", val, LPFC_LINK_SPEED_STRING); And possible have the whole quoted string on one line for easier grepping. checkplatch.pl should've told you that. Checkpatch didn't say anything interesting. We consistently have line spanning as the lines are fairly verbose. Not an easy way to avoid that. Regardless, no big deal. -- james
Re: [PATCH 4/6] scsi: qedf: fix LTO-enabled build
On Fri, 2 Feb 2018, 8:12am, Arnd Bergmann wrote: > The prototype for qedf_dbg_fops/qedf_debugfs_ops doesn't match the definition, > which causes the final link to fail with link-time optimizations: > > drivers/scsi/qedf/qedf_main.c:34: error: type of 'qedf_dbg_fops' does not > match original declaration [-Werror=lto-type-mismatch] > extern struct file_operations qedf_dbg_fops; > > drivers/scsi/qedf/qedf_debugfs.c:443: note: 'qedf_dbg_fops' was previously > declared here > const struct file_operations qedf_dbg_fops[] = { > > drivers/scsi/qedf/qedf_main.c:33: error: type of 'qedf_debugfs_ops' does not > match original declaration [-Werror=lto-type-mismatch] > extern struct qedf_debugfs_ops qedf_debugfs_ops; > > drivers/scsi/qedf/qedf_debugfs.c:102: note: 'qedf_debugfs_ops' was previously > declared here > struct qedf_debugfs_ops qedf_debugfs_ops[] = { > > This corrects the prototype and moves it into a shared header file where it > belongs. The file operations can also be marked 'const' like the > qedf_debugfs_ops. > > Signed-off-by: Arnd Bergmann > --- > drivers/scsi/qedf/qedf_dbg.h | 17 ++--- > drivers/scsi/qedf/qedf_debugfs.c | 6 +++--- > drivers/scsi/qedf/qedf_main.c| 8 +++- > 3 files changed, 16 insertions(+), 15 deletions(-) > Thanks. Acked-by: Chad Dupuis
Re: [PATCH v2 06/13] lpfc: Add 64G link speed support
On Wed, Feb 07, 2018 at 07:38:39AM -0800, James Smart wrote: > On 2/7/2018 1:58 AM, Johannes Thumshirn wrote: > > On Tue, Feb 06, 2018 at 06:28:44PM -0800, James Smart wrote: > > > lpfc_printf_log(phba, KERN_ERR, LOG_INIT, > > > - "0469 lpfc_link_speed attribute cannot be set to %d, " > > > - "allowed values are ["LPFC_LINK_SPEED_STRING"]\n", val); > > > + "0469 lpfc_link_speed attribute cannot be set to %d, " > > > + "allowed values are ["LPFC_LINK_SPEED_STRING"]\n", val); > > > > lpfc_printf_log(phba, KERN_ERR, LOG_INIT, > > "0469 lpfc_link_speed attribute cannot be set to %d, " > > "allowed values are [%s]\n", > > val, LPFC_LINK_SPEED_STRING); > > > > And possible have the whole quoted string on one line for easier grepping. > > checkplatch.pl should've told you that. > > > > Checkpatch didn't say anything interesting. We consistently have line > spanning as the lines are fairly verbose. Not an easy way to avoid that. > > Regardless, no big deal. The line-breaks make it non-trivial to grep for an error message. No big deal and with lpfc having the numeric prefixes for prints this is actually not needed. What I mainly didn't like was the implicit pasting of the define instead of %s. But no big deal as well. -- Johannes Thumshirn Storage jthumsh...@suse.de+49 911 74053 689 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB 21284 (AG Nürnberg) Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
Re: [PATCH v2 04/13] lpfc: Add push-to-adapter support to sli4
On 2/7/2018 2:27 AM, Johannes Thumshirn wrote: On Wed, Feb 07, 2018 at 10:51:57AM +0100, Johannes Thumshirn wrote: + /* Enable combined writes for DPP aperture */ + pg_addr = (unsigned long)(wq->dpp_regaddr) & PAGE_MASK; +#ifdef CONFIG_X86 + rc = set_memory_wc(pg_addr, 1); + if (rc) { + lpfc_printf_log(phba, KERN_ERR, LOG_INIT, + "3272 Cannot setup Combined " + "Write on WQ[%d] - disable DPP\n", + wq->queue_id); + phba->cfg_enable_dpp = 0; + } +#else + phba->cfg_enable_dpp = 0; +#endif + } else + wq->db_regaddr = phba->sli4_hba.WQDBregaddr; I don't really like the set_memory_wc() call here. Neither do I like the ifdef CONFIG_X86 special casing. If you really need write combining, can't you at least use ioremap_wc()? Coming back to this again (after talking to our ARM/POWER folks internally). Is this really x86 specific here? I know there are servers with other architectures using lpfcs out there. I _think_ write combining should be possible on other architectures (that have PCIe and aren't dead) as well. The ioremap_wc() I suggested is probably wrong. So can you please revisit this? I CCed Mark and Michael, maybe they can help here. yes - we really are looking for write combining. We were following the lead of a couple of other entities in the kernel and hadn't found the right combination for something other than X86. We figured we would come back and add the non-86 enablements later when we found the right options. -- james
RE: [PATCH 0/3] scsi: aacraid: Multi controller Kdump IOP reset handling
> -Original Message- > From: linux-scsi-ow...@vger.kernel.org [mailto:linux-scsi- > ow...@vger.kernel.org] On Behalf Of Martin K. Petersen > Sent: Tuesday, February 6, 2018 4:21 PM > To: Raghava Aditya Renukunta > > Cc: j...@linux.vnet.ibm.com; martin.peter...@oracle.com; linux- > s...@vger.kernel.org; Scott Benesh ; Tom > White ; dl-esc-Aacraid Linux Driver > ; Guilherme G . Piccoli > ; Bart Van Assche > > Subject: Re: [PATCH 0/3] scsi: aacraid: Multi controller Kdump IOP reset > handling > > EXTERNAL EMAIL > > > Raghava, > > > During Kdump aacraid controller IOP reset is invoked, IOP reset > > takes approx 40 seconds to bring the controller back up and running. > > with timeout of 120 seconds and anything more than 2 controllers > > will cause kdump to timeout. > > > > This patchset implements a new reset mechanism called DropIO, that > > induces the fw to drop any pending IO in the fw and making the reset > > process quicker. > > This series doesn't apply to my impending 4.17/scsi-queue. Since the > latter won't exist until Linus releases rc1, please respin against > current linus/master and resubmit. Will do Martin. Thanks, Raghava Aditya > Thanks! > -- > Martin K. Petersen Oracle Linux Engineering
Re: [PATCH V2 2/8] blk-mq: introduce BLK_MQ_F_GLOBAL_TAGS
On 02/06/18 15:18, Jens Axboe wrote: GLOBAL implies that it's, strangely enough, global. That isn't really the case. Why not call this BLK_MQ_F_HOST_TAGS or something like that? I'd welcome better names, but global doesn't seem to be a great choice. BLK_MQ_F_SET_TAGS? I like the name BLK_MQ_F_HOST_TAGS because it refers how these tags will be used by SCSI LLDs, namely as a host-wide tag set. Thanks, Bart.
[PATCH v3 2/2] qedi: Cleanup local str variable
Signed-off-by: Nilesh Javali --- drivers/scsi/qedi/qedi_main.c | 43 --- 1 file changed, 20 insertions(+), 23 deletions(-) diff --git a/drivers/scsi/qedi/qedi_main.c b/drivers/scsi/qedi/qedi_main.c index deaed93..47c45a5 100644 --- a/drivers/scsi/qedi/qedi_main.c +++ b/drivers/scsi/qedi/qedi_main.c @@ -1735,7 +1735,6 @@ static ssize_t qedi_show_boot_eth_info(void *data, int type, char *buf) { struct qedi_ctx *qedi = data; struct nvm_iscsi_initiator *initiator; - char *str = buf; int rc = 1; u32 ipv6_en, dhcp_en, ip_len; struct nvm_iscsi_block *block; @@ -1769,32 +1768,32 @@ static ssize_t qedi_show_boot_eth_info(void *data, int type, char *buf) switch (type) { case ISCSI_BOOT_ETH_IP_ADDR: - rc = snprintf(str, ip_len, fmt, ip); + rc = snprintf(buf, ip_len, fmt, ip); break; case ISCSI_BOOT_ETH_SUBNET_MASK: - rc = snprintf(str, ip_len, fmt, sub); + rc = snprintf(buf, ip_len, fmt, sub); break; case ISCSI_BOOT_ETH_GATEWAY: - rc = snprintf(str, ip_len, fmt, gw); + rc = snprintf(buf, ip_len, fmt, gw); break; case ISCSI_BOOT_ETH_FLAGS: - rc = snprintf(str, 3, "%hhd\n", + rc = snprintf(buf, 3, "%hhd\n", SYSFS_FLAG_FW_SEL_BOOT); break; case ISCSI_BOOT_ETH_INDEX: - rc = snprintf(str, 3, "0\n"); + rc = snprintf(buf, 3, "0\n"); break; case ISCSI_BOOT_ETH_MAC: - rc = sysfs_format_mac(str, qedi->mac, ETH_ALEN); + rc = sysfs_format_mac(buf, qedi->mac, ETH_ALEN); break; case ISCSI_BOOT_ETH_VLAN: - rc = snprintf(str, 12, "%d\n", + rc = snprintf(buf, 12, "%d\n", GET_FIELD2(initiator->generic_cont0, NVM_ISCSI_CFG_INITIATOR_VLAN)); break; case ISCSI_BOOT_ETH_ORIGIN: if (dhcp_en) - rc = snprintf(str, 3, "3\n"); + rc = snprintf(buf, 3, "3\n"); break; default: rc = 0; @@ -1830,7 +1829,6 @@ static ssize_t qedi_show_boot_ini_info(void *data, int type, char *buf) { struct qedi_ctx *qedi = data; struct nvm_iscsi_initiator *initiator; - char *str = buf; int rc; struct nvm_iscsi_block *block; @@ -1842,7 +1840,7 @@ static ssize_t qedi_show_boot_ini_info(void *data, int type, char *buf) switch (type) { case ISCSI_BOOT_INI_INITIATOR_NAME: - rc = sprintf(str, "%.*s\n", NVM_ISCSI_CFG_ISCSI_NAME_MAX_LEN, + rc = sprintf(buf, "%.*s\n", NVM_ISCSI_CFG_ISCSI_NAME_MAX_LEN, initiator->initiator_name.byte); break; default: @@ -1871,7 +1869,6 @@ static umode_t qedi_ini_get_attr_visibility(void *data, int type) qedi_show_boot_tgt_info(struct qedi_ctx *qedi, int type, char *buf, enum qedi_nvm_tgts idx) { - char *str = buf; int rc = 1; u32 ctrl_flags, ipv6_en, chap_en, mchap_en, ip_len; struct nvm_iscsi_block *block; @@ -1910,48 +1907,48 @@ static umode_t qedi_ini_get_attr_visibility(void *data, int type) switch (type) { case ISCSI_BOOT_TGT_NAME: - rc = sprintf(str, "%.*s\n", NVM_ISCSI_CFG_ISCSI_NAME_MAX_LEN, + rc = sprintf(buf, "%.*s\n", NVM_ISCSI_CFG_ISCSI_NAME_MAX_LEN, block->target[idx].target_name.byte); break; case ISCSI_BOOT_TGT_IP_ADDR: if (ipv6_en) - rc = snprintf(str, ip_len, "%pI6\n", + rc = snprintf(buf, ip_len, "%pI6\n", block->target[idx].ipv6_addr.byte); else - rc = snprintf(str, ip_len, "%pI4\n", + rc = snprintf(buf, ip_len, "%pI4\n", block->target[idx].ipv4_addr.byte); break; case ISCSI_BOOT_TGT_PORT: - rc = snprintf(str, 12, "%d\n", + rc = snprintf(buf, 12, "%d\n", GET_FIELD2(block->target[idx].generic_cont0, NVM_ISCSI_CFG_TARGET_TCP_PORT)); break; case ISCSI_BOOT_TGT_LUN: - rc = snprintf(str, 22, "%.*d\n", + rc = snprintf(buf, 22, "%.*d\n", block->target[idx].lun.value[1], block->target[idx].lun.value[0]); break; case ISCSI_BOOT_TGT_CHAP_NAME: - rc = sprintf(str, "%.*s\n", NVM_ISCSI_CFG_CHAP_NAME_MAX_LEN, +
[PATCH v3 1/2] qedi: Fix truncation of CHAP name and secret
From: Andrew Vasquez The data in NVRAM is not guaranteed to be NUL terminated. Since snprintf expects byte-stream to accommodate null byte, the CHAP secret is truncated. Use sprintf instead of snprintf to fix the truncation of CHAP name and secret. Signed-off-by: Andrew Vasquez Signed-off-by: Nilesh Javali --- drivers/scsi/qedi/qedi_main.c | 24 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/drivers/scsi/qedi/qedi_main.c b/drivers/scsi/qedi/qedi_main.c index 8808f0d..deaed93 100644 --- a/drivers/scsi/qedi/qedi_main.c +++ b/drivers/scsi/qedi/qedi_main.c @@ -1842,8 +1842,8 @@ static ssize_t qedi_show_boot_ini_info(void *data, int type, char *buf) switch (type) { case ISCSI_BOOT_INI_INITIATOR_NAME: - rc = snprintf(str, NVM_ISCSI_CFG_ISCSI_NAME_MAX_LEN, "%s\n", - initiator->initiator_name.byte); + rc = sprintf(str, "%.*s\n", NVM_ISCSI_CFG_ISCSI_NAME_MAX_LEN, +initiator->initiator_name.byte); break; default: rc = 0; @@ -1910,8 +1910,8 @@ static umode_t qedi_ini_get_attr_visibility(void *data, int type) switch (type) { case ISCSI_BOOT_TGT_NAME: - rc = snprintf(str, NVM_ISCSI_CFG_ISCSI_NAME_MAX_LEN, "%s\n", - block->target[idx].target_name.byte); + rc = sprintf(str, "%.*s\n", NVM_ISCSI_CFG_ISCSI_NAME_MAX_LEN, +block->target[idx].target_name.byte); break; case ISCSI_BOOT_TGT_IP_ADDR: if (ipv6_en) @@ -1932,20 +1932,20 @@ static umode_t qedi_ini_get_attr_visibility(void *data, int type) block->target[idx].lun.value[0]); break; case ISCSI_BOOT_TGT_CHAP_NAME: - rc = snprintf(str, NVM_ISCSI_CFG_CHAP_NAME_MAX_LEN, "%s\n", - chap_name); + rc = sprintf(str, "%.*s\n", NVM_ISCSI_CFG_CHAP_NAME_MAX_LEN, +chap_name); break; case ISCSI_BOOT_TGT_CHAP_SECRET: - rc = snprintf(str, NVM_ISCSI_CFG_CHAP_PWD_MAX_LEN, "%s\n", - chap_secret); + rc = sprintf(str, "%.*s\n", NVM_ISCSI_CFG_CHAP_NAME_MAX_LEN, +chap_secret); break; case ISCSI_BOOT_TGT_REV_CHAP_NAME: - rc = snprintf(str, NVM_ISCSI_CFG_CHAP_NAME_MAX_LEN, "%s\n", - mchap_name); + rc = sprintf(str, "%.*s\n", NVM_ISCSI_CFG_CHAP_NAME_MAX_LEN, +mchap_name); break; case ISCSI_BOOT_TGT_REV_CHAP_SECRET: - rc = snprintf(str, NVM_ISCSI_CFG_CHAP_PWD_MAX_LEN, "%s\n", - mchap_secret); + rc = sprintf(str, "%.*s\n", NVM_ISCSI_CFG_CHAP_NAME_MAX_LEN, +mchap_secret); break; case ISCSI_BOOT_TGT_FLAGS: rc = snprintf(str, 3, "%hhd\n", SYSFS_FLAG_FW_SEL_BOOT); -- 1.8.3.1
[PATCH v3 0/2] Code cleanup and bug fix for truncated CHAP name and secret
Martin, Please consider below patch set for next 'scsi-fixes' submission. Thanks, Nilesh Andrew Vasquez (1): qedi: Fix truncation of CHAP name and secret Nilesh Javali (1): qedi: Cleanup local str variable drivers/scsi/qedi/qedi_main.c | 55 --- 1 file changed, 26 insertions(+), 29 deletions(-) -- 1.8.3.1
Re: [PATCH 6/6] scsi: qedf: use correct strncpy() size
On Fri, 2 Feb 2018, 8:12am, Arnd Bergmann wrote: > gcc-8 warns during link-time optimization that the strncpy() call > passes the size of the source buffer rather than the destination: > > drivers/scsi/qedf/qedf_dbg.c: In function 'qedf_uevent_emit': > include/linux/string.h:253: error: 'strncpy' specified bound depends on the > length of the source argument [-Werror=stringop-overflow=] > > This changes it to strscpy() with the correct length, guaranteeing > a properly nul-terminated string of the right size. > > Signed-off-by: Arnd Bergmann > --- > drivers/scsi/qedf/qedf_dbg.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > Reasonable security precaution. Acked-by: Chad Dupuis
Re: [PATCH v3 1/2] qedi: Fix truncation of CHAP name and secret
On 02/07/18 08:12, Nilesh Javali wrote: The data in NVRAM is not guaranteed to be NUL terminated. Since snprintf expects byte-stream to accommodate null byte, the CHAP secret is truncated. Use sprintf instead of snprintf to fix the truncation of CHAP name and secret. Reviewed-by: Bart Van Assche Thanks for having addressed the review comments. Bart.
Re: [PATCH v3 2/2] qedi: Cleanup local str variable
On 02/07/18 08:12, Nilesh Javali wrote: Signed-off-by: Nilesh Javali Reviewed-by: Bart Van Assche
[PATCH v2 0/3] scsi: aacraid: Multi controller Kdump IOP reset handling
During Kdump aacraid controller IOP reset is invoked, IOP reset takes approx 40 seconds to bring the controller back up and running. with timeout of 120 seconds and anything more than 2 controllers will cause kdump to timeout. This patchset implements a new reset mechanism called DropIO, that induces the fw to drop any pending IO in the fw and making the reset process quicker. Changes in v2: Respun patchset against Linus Master Added Dave Carroll's reviewed-by tags Raghava Aditya Renukunta (3): scsi: aacraid: Implement DropIO sync command scsi: aacraid: Preserve MSIX mode in the OMR register scsi: aacraid: Auto detect INTx or MSIx mode during sync cmd processing drivers/scsi/aacraid/aacraid.h | 5 + drivers/scsi/aacraid/src.c | 205 ++--- 2 files changed, 198 insertions(+), 12 deletions(-) -- 2.9.4
[PATCH v2 2/3] scsi: aacraid: Preserve MSIX mode in the OMR register
Preserve the current MSIX mode value in the OMR before rewriting the OMR to initiate the IOP or Soft Reset. Signed-off-by: Prasad B Munirathnam Signed-off-by: Raghava Aditya Renukunta Reviewed-by: Dave Carroll --- Changes in V2: Re based on Linus Master drivers/scsi/aacraid/src.c | 22 ++ 1 file changed, 22 insertions(+) diff --git a/drivers/scsi/aacraid/src.c b/drivers/scsi/aacraid/src.c index de48845..09b82d3 100644 --- a/drivers/scsi/aacraid/src.c +++ b/drivers/scsi/aacraid/src.c @@ -680,6 +680,25 @@ void aac_set_intx_mode(struct aac_dev *dev) } } +static void aac_clear_omr(struct aac_dev *dev) +{ + u32 omr_value = 0; + + omr_value = src_readl(dev, MUnit.OMR); + + /* +* Check for PCI Errors or Kernel Panic +*/ + if ((omr_value == INVALID_OMR) || (omr_value & KERNEL_PANIC)) + omr_value = 0; + + /* +* Preserve MSIX Value if any +*/ + src_writel(dev, MUnit.OMR, omr_value & AAC_INT_MODE_MSIX); + src_readl(dev, MUnit.OMR); +} + static void aac_dump_fw_fib_iop_reset(struct aac_dev *dev) { __le32 supported_options3; @@ -740,6 +759,8 @@ static void aac_send_iop_reset(struct aac_dev *dev) aac_set_intx_mode(dev); + aac_clear_omr(dev); + src_writel(dev, MUnit.IDR, IOP_SRC_RESET_MASK); msleep(5000); @@ -749,6 +770,7 @@ static void aac_send_hardware_soft_reset(struct aac_dev *dev) { u_int32_t val; + aac_clear_omr(dev); val = readl(((char *)(dev->base) + IBW_SWR_OFFSET)); val |= 0x01; writel(val, ((char *)(dev->base) + IBW_SWR_OFFSET)); -- 2.9.4
[PATCH v2 3/3] scsi: aacraid: Auto detect INTx or MSIx mode during sync cmd processing
During sync command processing if legacy INTx status indicates command is not completed, sample the MSIx register and check if it indicates command completion, set controller MSIx enabled flag. Signed-off-by: Prasad B Munirathnam Signed-off-by: Raghava Aditya Renukunta Reviewed-by: Dave Carroll --- Changes in V2: Re based on Linus Master drivers/scsi/aacraid/aacraid.h | 1 + drivers/scsi/aacraid/src.c | 22 -- 2 files changed, 17 insertions(+), 6 deletions(-) diff --git a/drivers/scsi/aacraid/aacraid.h b/drivers/scsi/aacraid/aacraid.h index c3fdec9..29bf1e6 100644 --- a/drivers/scsi/aacraid/aacraid.h +++ b/drivers/scsi/aacraid/aacraid.h @@ -1231,6 +1231,7 @@ struct src_registers { #define SRC_ODR_SHIFT 12 #define SRC_IDR_SHIFT 9 +#define SRC_MSI_READ_MASK 0x1000 typedef void (*fib_callback)(void *ctxt, struct fib *fibctx); diff --git a/drivers/scsi/aacraid/src.c b/drivers/scsi/aacraid/src.c index 09b82d3..3122389f 100644 --- a/drivers/scsi/aacraid/src.c +++ b/drivers/scsi/aacraid/src.c @@ -1405,13 +1405,23 @@ void aac_src_access_devreg(struct aac_dev *dev, int mode) static int aac_src_get_sync_status(struct aac_dev *dev) { + int msix_val = 0; + int legacy_val = 0; - int val; + msix_val = src_readl(dev, MUnit.ODR_MSI) & SRC_MSI_READ_MASK ? 1 : 0; - if (dev->msi_enabled) - val = src_readl(dev, MUnit.ODR_MSI) & 0x1000 ? 1 : 0; - else - val = src_readl(dev, MUnit.ODR_R) >> SRC_ODR_SHIFT; + if (!dev->msi_enabled) { + /* +* if Legacy int status indicates cmd is not complete +* sample MSIx register to see if it indiactes cmd complete, +* if yes set the controller in MSIx mode and consider cmd +* completed +*/ + legacy_val = src_readl(dev, MUnit.ODR_R) >> SRC_ODR_SHIFT; + if (!(legacy_val & 1) && msix_val) + dev->msi_enabled = 1; + return legacy_val; + } - return val; + return msix_val; } -- 2.9.4
[PATCH v2 1/3] scsi: aacraid: Implement DropIO sync command
IOP_RESET takes longer time to complete, if controller is in a state where we can bring it back with init struct, controller DropIO sync command is implemented. - If controller is faulted perform standard IOP_RESET in aac_srcv_init. - If controller is not faulted get adapter properties and extended properties. - Update the sa_firmware variable and determine if DropIO request is supported. - Issue DropIO request, and get the number of outstanding commands. - If all commands are complete with success (CT_OK), consider IOP_RESET is complete. - If any commands timeout, Perform the IOP_RESET. Signed-off-by: Prasad B Munirathnam Signed-off-by: Raghava Aditya Renukunta Reviewed-by: Dave Carroll --- Changes in V2: Re based on Linus Master drivers/scsi/aacraid/aacraid.h | 4 + drivers/scsi/aacraid/src.c | 161 +++-- 2 files changed, 159 insertions(+), 6 deletions(-) diff --git a/drivers/scsi/aacraid/aacraid.h b/drivers/scsi/aacraid/aacraid.h index 0095fcb..c3fdec9 100644 --- a/drivers/scsi/aacraid/aacraid.h +++ b/drivers/scsi/aacraid/aacraid.h @@ -1528,6 +1528,7 @@ struct aac_bus_info_response { #define AAC_COMM_MESSAGE_TYPE3 5 #define AAC_EXTOPT_SA_FIRMWARE cpu_to_le32(1<<1) +#define AAC_EXTOPT_SOFT_RESET cpu_to_le32(1<<16) /* MSIX context */ struct aac_msix_ctx { @@ -1662,6 +1663,7 @@ struct aac_dev u8 raw_io_64; u8 printf_enabled; u8 in_reset; + u8 in_soft_reset; u8 msi; u8 sa_firmware; int management_fib_count; @@ -2504,6 +2506,7 @@ struct aac_hba_info { #define RCV_TEMP_READINGS 0x0025 #define GET_COMM_PREFERRED_SETTINGS0x0026 #define IOP_RESET_FW_FIB_DUMP 0x0034 +#define DROP_IO0x0035 #define IOP_RESET 0x1000 #define IOP_RESET_ALWAYS 0x1001 #define RE_INIT_ADAPTER0x00ee @@ -2539,6 +2542,7 @@ struct aac_hba_info { #defineFLASH_UPD_PENDING 0x2000 #defineFLASH_UPD_SUCCESS 0x4000 #defineFLASH_UPD_FAILED0x8000 +#defineINVALID_OMR 0x #defineFWUPD_TIMEOUT (5 * 60) /* diff --git a/drivers/scsi/aacraid/src.c b/drivers/scsi/aacraid/src.c index fde6b6a..de48845 100644 --- a/drivers/scsi/aacraid/src.c +++ b/drivers/scsi/aacraid/src.c @@ -255,7 +255,8 @@ static int src_sync_cmd(struct aac_dev *dev, u32 command, */ src_writel(dev, MUnit.IDR, INBOUNDDOORBELL_0 << SRC_IDR_SHIFT); - if (!dev->sync_mode || command != SEND_SYNCHRONOUS_FIB) { + if ((!dev->sync_mode || command != SEND_SYNCHRONOUS_FIB) && + !dev->in_soft_reset) { ok = 0; start = jiffies; @@ -992,6 +993,148 @@ int aac_src_init(struct aac_dev *dev) return -1; } +static int aac_src_wait_sync(struct aac_dev *dev, int *status) +{ + unsigned long start = jiffies; + unsigned long usecs = 0; + int delay = 5 * HZ; + int rc = 1; + + while (time_before(jiffies, start+delay)) { + /* +* Delay 5 microseconds to let Mon960 get info. +*/ + udelay(5); + + /* +* Mon960 will set doorbell0 bit when it has completed the +* command. +*/ + if (aac_src_get_sync_status(dev) & OUTBOUNDDOORBELL_0) { + /* +* Clear: the doorbell. +*/ + if (dev->msi_enabled) + aac_src_access_devreg(dev, AAC_CLEAR_SYNC_BIT); + else + src_writel(dev, MUnit.ODR_C, + OUTBOUNDDOORBELL_0 << SRC_ODR_SHIFT); + rc = 0; + + break; + } + + /* +* Yield the processor in case we are slow +*/ + usecs = 1 * USEC_PER_MSEC; + usleep_range(usecs, usecs + 50); + } + /* +* Pull the synch status from Mailbox 0. +*/ + if (status && !rc) { + status[0] = readl(&dev->IndexRegs->Mailbox[0]); + status[1] = readl(&dev->IndexRegs->Mailbox[1]); + status[2] = readl(&dev->IndexRegs->Mailbox[2]); + status[3] = readl(&dev->IndexRegs->Mailbox[3]); + status[4] = readl(&dev->IndexRegs->Mailbox[4]); + } + + return rc; +} + +/** + * aac_src_soft_reset - perform soft reset to speed up + * access + * + * Assumptions: That the controller is in a state where we can + *
Re: [PATCH V2 2/8] blk-mq: introduce BLK_MQ_F_GLOBAL_TAGS
On Mon, 2018-02-05 at 23:20 +0800, Ming Lei wrote: > diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c > index 55c0a745b427..385bbec73804 100644 > --- a/block/blk-mq-sched.c > +++ b/block/blk-mq-sched.c > @@ -81,6 +81,17 @@ static bool blk_mq_sched_restart_hctx(struct blk_mq_hw_ctx > *hctx) > } else > clear_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state); > > + /* need to restart all hw queues for global tags */ > + if (hctx->flags & BLK_MQ_F_GLOBAL_TAGS) { > + struct blk_mq_hw_ctx *hctx2; > + int i; > + > + queue_for_each_hw_ctx(hctx->queue, hctx2, i) > + if (blk_mq_run_hw_queue(hctx2, true)) > + return true; > + return false; > + } > + > return blk_mq_run_hw_queue(hctx, true); > } This new loop looks misplaced to me. If both the BLK_MQ_F_GLOBAL_TAGS and the BLK_MQ_F_TAG_SHARED flags are set then the outer loop in blk_mq_sched_restart() and the inner loop in blk_mq_sched_restart_hctx() will cause more calls of blk_mq_run_hw_queue() than necessary. Have you considered to merge the above loop into blk_mq_sched_restart()? Thanks, Bart.
Re: [PATCH v3 1/2] qedi: Fix truncation of CHAP name and secret
On Wed, Feb 07, 2018 at 08:12:35AM -0800, Nilesh Javali wrote: > From: Andrew Vasquez > > The data in NVRAM is not guaranteed to be NUL terminated. > Since snprintf expects byte-stream to accommodate null byte, > the CHAP secret is truncated. > Use sprintf instead of snprintf to fix the truncation of > CHAP name and secret. > > Signed-off-by: Andrew Vasquez > Signed-off-by: Nilesh Javali Makes sense, limits the data copied out instead of the entire output length and ensures that the newline and \0 are always in the output. Signed-off-by: Chris Leech > --- > drivers/scsi/qedi/qedi_main.c | 24 > 1 file changed, 12 insertions(+), 12 deletions(-) > > diff --git a/drivers/scsi/qedi/qedi_main.c b/drivers/scsi/qedi/qedi_main.c > index 8808f0d..deaed93 100644 > --- a/drivers/scsi/qedi/qedi_main.c > +++ b/drivers/scsi/qedi/qedi_main.c > @@ -1842,8 +1842,8 @@ static ssize_t qedi_show_boot_ini_info(void *data, int > type, char *buf) > > switch (type) { > case ISCSI_BOOT_INI_INITIATOR_NAME: > - rc = snprintf(str, NVM_ISCSI_CFG_ISCSI_NAME_MAX_LEN, "%s\n", > - initiator->initiator_name.byte); > + rc = sprintf(str, "%.*s\n", NVM_ISCSI_CFG_ISCSI_NAME_MAX_LEN, > + initiator->initiator_name.byte); > break; > default: > rc = 0; > @@ -1910,8 +1910,8 @@ static umode_t qedi_ini_get_attr_visibility(void *data, > int type) > > switch (type) { > case ISCSI_BOOT_TGT_NAME: > - rc = snprintf(str, NVM_ISCSI_CFG_ISCSI_NAME_MAX_LEN, "%s\n", > - block->target[idx].target_name.byte); > + rc = sprintf(str, "%.*s\n", NVM_ISCSI_CFG_ISCSI_NAME_MAX_LEN, > + block->target[idx].target_name.byte); > break; > case ISCSI_BOOT_TGT_IP_ADDR: > if (ipv6_en) > @@ -1932,20 +1932,20 @@ static umode_t qedi_ini_get_attr_visibility(void > *data, int type) > block->target[idx].lun.value[0]); > break; > case ISCSI_BOOT_TGT_CHAP_NAME: > - rc = snprintf(str, NVM_ISCSI_CFG_CHAP_NAME_MAX_LEN, "%s\n", > - chap_name); > + rc = sprintf(str, "%.*s\n", NVM_ISCSI_CFG_CHAP_NAME_MAX_LEN, > + chap_name); > break; > case ISCSI_BOOT_TGT_CHAP_SECRET: > - rc = snprintf(str, NVM_ISCSI_CFG_CHAP_PWD_MAX_LEN, "%s\n", > - chap_secret); > + rc = sprintf(str, "%.*s\n", NVM_ISCSI_CFG_CHAP_NAME_MAX_LEN, > + chap_secret); > break; > case ISCSI_BOOT_TGT_REV_CHAP_NAME: > - rc = snprintf(str, NVM_ISCSI_CFG_CHAP_NAME_MAX_LEN, "%s\n", > - mchap_name); > + rc = sprintf(str, "%.*s\n", NVM_ISCSI_CFG_CHAP_NAME_MAX_LEN, > + mchap_name); > break; > case ISCSI_BOOT_TGT_REV_CHAP_SECRET: > - rc = snprintf(str, NVM_ISCSI_CFG_CHAP_PWD_MAX_LEN, "%s\n", > - mchap_secret); > + rc = sprintf(str, "%.*s\n", NVM_ISCSI_CFG_CHAP_NAME_MAX_LEN, > + mchap_secret); > break; > case ISCSI_BOOT_TGT_FLAGS: > rc = snprintf(str, 3, "%hhd\n", SYSFS_FLAG_FW_SEL_BOOT); > -- > 1.8.3.1 >
Re: [PATCH v3 2/2] qedi: Cleanup local str variable
On Wed, Feb 07, 2018 at 08:12:36AM -0800, Nilesh Javali wrote: > Signed-off-by: Nilesh Javali Signed-off-by: Chris Leech > --- > drivers/scsi/qedi/qedi_main.c | 43 > --- > 1 file changed, 20 insertions(+), 23 deletions(-) > > diff --git a/drivers/scsi/qedi/qedi_main.c b/drivers/scsi/qedi/qedi_main.c > index deaed93..47c45a5 100644 > --- a/drivers/scsi/qedi/qedi_main.c > +++ b/drivers/scsi/qedi/qedi_main.c > @@ -1735,7 +1735,6 @@ static ssize_t qedi_show_boot_eth_info(void *data, int > type, char *buf) > { > struct qedi_ctx *qedi = data; > struct nvm_iscsi_initiator *initiator; > - char *str = buf; > int rc = 1; > u32 ipv6_en, dhcp_en, ip_len; > struct nvm_iscsi_block *block; > @@ -1769,32 +1768,32 @@ static ssize_t qedi_show_boot_eth_info(void *data, > int type, char *buf) > > switch (type) { > case ISCSI_BOOT_ETH_IP_ADDR: > - rc = snprintf(str, ip_len, fmt, ip); > + rc = snprintf(buf, ip_len, fmt, ip); > break; > case ISCSI_BOOT_ETH_SUBNET_MASK: > - rc = snprintf(str, ip_len, fmt, sub); > + rc = snprintf(buf, ip_len, fmt, sub); > break; > case ISCSI_BOOT_ETH_GATEWAY: > - rc = snprintf(str, ip_len, fmt, gw); > + rc = snprintf(buf, ip_len, fmt, gw); > break; > case ISCSI_BOOT_ETH_FLAGS: > - rc = snprintf(str, 3, "%hhd\n", > + rc = snprintf(buf, 3, "%hhd\n", > SYSFS_FLAG_FW_SEL_BOOT); > break; > case ISCSI_BOOT_ETH_INDEX: > - rc = snprintf(str, 3, "0\n"); > + rc = snprintf(buf, 3, "0\n"); > break; > case ISCSI_BOOT_ETH_MAC: > - rc = sysfs_format_mac(str, qedi->mac, ETH_ALEN); > + rc = sysfs_format_mac(buf, qedi->mac, ETH_ALEN); > break; > case ISCSI_BOOT_ETH_VLAN: > - rc = snprintf(str, 12, "%d\n", > + rc = snprintf(buf, 12, "%d\n", > GET_FIELD2(initiator->generic_cont0, >NVM_ISCSI_CFG_INITIATOR_VLAN)); > break; > case ISCSI_BOOT_ETH_ORIGIN: > if (dhcp_en) > - rc = snprintf(str, 3, "3\n"); > + rc = snprintf(buf, 3, "3\n"); > break; > default: > rc = 0; > @@ -1830,7 +1829,6 @@ static ssize_t qedi_show_boot_ini_info(void *data, int > type, char *buf) > { > struct qedi_ctx *qedi = data; > struct nvm_iscsi_initiator *initiator; > - char *str = buf; > int rc; > struct nvm_iscsi_block *block; > > @@ -1842,7 +1840,7 @@ static ssize_t qedi_show_boot_ini_info(void *data, int > type, char *buf) > > switch (type) { > case ISCSI_BOOT_INI_INITIATOR_NAME: > - rc = sprintf(str, "%.*s\n", NVM_ISCSI_CFG_ISCSI_NAME_MAX_LEN, > + rc = sprintf(buf, "%.*s\n", NVM_ISCSI_CFG_ISCSI_NAME_MAX_LEN, >initiator->initiator_name.byte); > break; > default: > @@ -1871,7 +1869,6 @@ static umode_t qedi_ini_get_attr_visibility(void *data, > int type) > qedi_show_boot_tgt_info(struct qedi_ctx *qedi, int type, > char *buf, enum qedi_nvm_tgts idx) > { > - char *str = buf; > int rc = 1; > u32 ctrl_flags, ipv6_en, chap_en, mchap_en, ip_len; > struct nvm_iscsi_block *block; > @@ -1910,48 +1907,48 @@ static umode_t qedi_ini_get_attr_visibility(void > *data, int type) > > switch (type) { > case ISCSI_BOOT_TGT_NAME: > - rc = sprintf(str, "%.*s\n", NVM_ISCSI_CFG_ISCSI_NAME_MAX_LEN, > + rc = sprintf(buf, "%.*s\n", NVM_ISCSI_CFG_ISCSI_NAME_MAX_LEN, >block->target[idx].target_name.byte); > break; > case ISCSI_BOOT_TGT_IP_ADDR: > if (ipv6_en) > - rc = snprintf(str, ip_len, "%pI6\n", > + rc = snprintf(buf, ip_len, "%pI6\n", > block->target[idx].ipv6_addr.byte); > else > - rc = snprintf(str, ip_len, "%pI4\n", > + rc = snprintf(buf, ip_len, "%pI4\n", > block->target[idx].ipv4_addr.byte); > break; > case ISCSI_BOOT_TGT_PORT: > - rc = snprintf(str, 12, "%d\n", > + rc = snprintf(buf, 12, "%d\n", > GET_FIELD2(block->target[idx].generic_cont0, >NVM_ISCSI_CFG_TARGET_TCP_PORT)); > break; > case ISCSI_BOOT_TGT_LUN: > - rc = snprintf(str, 22, "%.*d\n", > + rc = snprintf(buf, 22, "%.*d\n", > block->target[idx].lun.value[1], > block->target[idx].lun.value[0]);
Re: [PATCH] libiscsi: ensure session spin lock usage consistent
On 02/05/2018 01:13 PM, Lee Duncan wrote: > The libiscsi code was using both spin_lock()/spin_unlock() > and spin_lock_bh()/spin_unlock_bh() on its session lock. It does this because the lock was only taken between bottom halves and process contexts. If we are already in a bh then there is no need to do the _bh call. If that's not the case, then you would also want to modify the back_lock calls too right? > In addition, lock validation found that libiscsi.c was > taking a HARDIRQ-unsafe lock while holding an HARDIRQ- > irq-safe lock: I'm not sure how this patch helps or if this is a valid bug. If somehow iscsi_conn_setup was running and it had passed the spin_lock_bh(&session->frwd_lock) call but not yet released it, and we got a interrupt, and then blk_timeout_work->iscsi_eh_cmd_timed_out was run, we would still hit the bug the trace below is warning against right? I think wou would want to do the _irq locking call in iscsi_conn_setup to avoid that. However, it would be impossible to hit this case, because we would always have had to complete the create_conn call before ever starting to even execute commands and eventually time out commands. > >> [ 2528.738454] = >> [ 2528.744679] WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected >> [ 2528.749891] 4.15.0-6-g4548e6f42022-dirty #418 Not tainted >> [ 2528.754356] - >> [ 2528.759643] kworker/0:1H/100 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire: >> [ 2528.764608] (&(&session->frwd_lock)->rlock){+.-.}, at: >> [<97d8de0f>] iscsi_eh_cmd_timed_out+0x3d/0x2b0 >> [ 2528.769164] >>and this task is already holding: >> [ 2528.771313] (&(&q->__queue_lock)->rlock){-.-.}, at: [] >> blk_timeout_work+0x22/0x120 >> [ 2528.773825] which would create a new lock dependency: >> [ 2528.775216] (&(&q->__queue_lock)->rlock){-.-.} -> >> (&(&session->frwd_lock)->rlock){+.-.} >> [ 2528.777071] >>but this new dependency connects a HARDIRQ-irq-safe lock: >> [ 2528.778945] (&(&q->__queue_lock)->rlock){-.-.} >> [ 2528.778948] >>... which became HARDIRQ-irq-safe at: >> [ 2528.781204] _raw_spin_lock_irqsave+0x3c/0x4b >> [ 2528.781966] cfq_idle_slice_timer+0x2f/0x100 >> [ 2528.782705] __hrtimer_run_queues+0xdc/0x440 >> [ 2528.783448] hrtimer_interrupt+0xb1/0x210 >> [ 2528.784149] smp_apic_timer_interrupt+0x6d/0x260 >> [ 2528.784954] apic_timer_interrupt+0xac/0xc0 >> [ 2528.785679] native_safe_halt+0x2/0x10 >> [ 2528.786280] default_idle+0x1f/0x180 >> [ 2528.786806] do_idle+0x166/0x1e0 >> [ 2528.787288] cpu_startup_entry+0x6f/0x80 >> [ 2528.787874] start_secondary+0x186/0x1e0 >> [ 2528.788448] secondary_startup_64+0xa5/0xb0 >> [ 2528.789070] >>to a HARDIRQ-irq-unsafe lock: >> [ 2528.789913] (&(&session->frwd_lock)->rlock){+.-.} >> [ 2528.789915] >>... which became HARDIRQ-irq-unsafe at: >> [ 2528.791548] ... >> [ 2528.791553] _raw_spin_lock_bh+0x34/0x40 >> [ 2528.792366] iscsi_conn_setup+0x166/0x220 >> [ 2528.792960] iscsi_tcp_conn_setup+0x10/0x40 >> [ 2528.793526] iscsi_sw_tcp_conn_create+0x1b/0x160 >> [ 2528.794111] iscsi_if_rx+0xf9f/0x1580 >> [ 2528.794542] netlink_unicast+0x1f7/0x2f0 >> [ 2528.795105] netlink_sendmsg+0x2c9/0x3c0 >> [ 2528.795636] sock_sendmsg+0x30/0x40 >> [ 2528.796068] ___sys_sendmsg+0x269/0x2c0 >> [ 2528.796544] __sys_sendmsg+0x51/0x90 >> [ 2528.797028] entry_SYSCALL_64_fastpath+0x25/0x9c >> [ 2528.797595] >> ... > > This commit avoids possible reverse deadlock. > > Signed-off-by: Lee Duncan > --- > drivers/scsi/libiscsi.c | 18 +- > 1 file changed, 9 insertions(+), 9 deletions(-) > > diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c > index 82c3fd4bc938..055357b2fe9e 100644 > --- a/drivers/scsi/libiscsi.c > +++ b/drivers/scsi/libiscsi.c > @@ -1248,9 +1248,9 @@ int iscsi_complete_pdu(struct iscsi_conn *conn, struct > iscsi_hdr *hdr, > { > int rc; > > - spin_lock(&conn->session->lock); > + spin_lock_bh(&conn->session->lock); > rc = __iscsi_complete_pdu(conn, hdr, data, datalen); > - spin_unlock(&conn->session->lock); > + spin_unlock_bh(&conn->session->lock); > return rc; > } > EXPORT_SYMBOL_GPL(iscsi_complete_pdu); > @@ -1749,14 +1749,14 @@ static void iscsi_tmf_timedout(unsigned long data) > struct iscsi_conn *conn = (struct iscsi_conn *)data; > struct iscsi_session *session = conn->session; > > - spin_lock(&session->lock); > + spin_lock_bh(&session->lock); > if (conn->tmf_state == TMF_QUEUED) { > conn->tmf_state = TMF_TIMEDOUT; > ISCSI_DBG_EH(session, "tmf timedout\n"); > /* unblock eh_abort() */ > wake_up(&conn->ehwait); > } > - spin_unlock(&session->lock); > + spin_unlock_bh(&session->lock); > } > > static int
Re: [PATCH] libiscsi: ensure session spin lock usage consistent
On 02/05/2018 01:13 PM, Lee Duncan wrote: > diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c > index 82c3fd4bc938..055357b2fe9e 100644 > --- a/drivers/scsi/libiscsi.c > +++ b/drivers/scsi/libiscsi.c > @@ -1248,9 +1248,9 @@ int iscsi_complete_pdu(struct iscsi_conn *conn, struct > iscsi_hdr *hdr, > { > int rc; > > - spin_lock(&conn->session->lock); > + spin_lock_bh(&conn->session->lock); > rc = __iscsi_complete_pdu(conn, hdr, data, datalen); > - spin_unlock(&conn->session->lock); > + spin_unlock_bh(&conn->session->lock); > return rc; This one is actually needed because qla4xxx_task_work calls it from process context.
Re: [PATCH] libiscsi: ensure session spin lock usage consistent
On 02/07/2018 04:41 PM, Mike Christie wrote: > >> static int iscsi_exec_task_mgmt_fn(struct iscsi_conn *conn, >> @@ -1908,7 +1908,7 @@ static enum blk_eh_timer_return >> iscsi_eh_cmd_timed_out(struct scsi_cmnd *sc) >> >> ISCSI_DBG_EH(session, "scsi cmd %p timedout\n", sc); >> >> -spin_lock(&session->lock); >> +spin_lock_bh(&session->lock); >> task = (struct iscsi_task *)sc->SCp.ptr; >> if (!task) { >> /* >> @@ -2022,7 +2022,7 @@ static enum blk_eh_timer_return >> iscsi_eh_cmd_timed_out(struct scsi_cmnd *sc) >> done: >> if (task) >> task->last_timeout = jiffies; >> -spin_unlock(&session->lock); >> +spin_unlock_bh(&session->lock); >> ISCSI_DBG_EH(session, "return %s\n", rc == BLK_EH_RESET_TIMER ? >> "timer reset" : "nh"); >> return rc; I guess you want this chunk too. It does not now, but someone could add a blk_abort_request call that ends up calling into the driver.
Re: [PATCH 0/5] blk-mq/scsi-mq: support global tags & introduce force_blk_mq
Hi Kashyap, On Wed, Feb 07, 2018 at 07:44:04PM +0530, Kashyap Desai wrote: > > -Original Message- > > From: Ming Lei [mailto:ming@redhat.com] > > Sent: Wednesday, February 7, 2018 5:53 PM > > To: Hannes Reinecke > > Cc: Kashyap Desai; Jens Axboe; linux-bl...@vger.kernel.org; Christoph > > Hellwig; Mike Snitzer; linux-scsi@vger.kernel.org; Arun Easi; Omar > Sandoval; > > Martin K . Petersen; James Bottomley; Christoph Hellwig; Don Brace; > Peter > > Rivera; Paolo Bonzini; Laurence Oberman > > Subject: Re: [PATCH 0/5] blk-mq/scsi-mq: support global tags & introduce > > force_blk_mq > > > > On Wed, Feb 07, 2018 at 07:50:21AM +0100, Hannes Reinecke wrote: > > > Hi all, > > > > > > [ .. ] > > > >> > > > >> Could you share us your patch for enabling global_tags/MQ on > > > > megaraid_sas > > > >> so that I can reproduce your test? > > > >> > > > >>> See below perf top data. "bt_iter" is consuming 4 times more CPU. > > > >> > > > >> Could you share us what the IOPS/CPU utilization effect is after > > > > applying the > > > >> patch V2? And your test script? > > > > Regarding CPU utilization, I need to test one more time. Currently > > > > system is in used. > > > > > > > > I run below fio test on total 24 SSDs expander attached. > > > > > > > > numactl -N 1 fio jbod.fio --rw=randread --iodepth=64 --bs=4k > > > > --ioengine=libaio --rw=randread > > > > > > > > Performance dropped from 1.6 M IOPs to 770K IOPs. > > > > > > > This is basically what we've seen with earlier iterations. > > > > Hi Hannes, > > > > As I mentioned in another mail[1], Kashyap's patch has a big issue, > which > > causes only reply queue 0 used. > > > > [1] https://marc.info/?l=linux-scsi&m=151793204014631&w=2 > > > > So could you guys run your performance test again after fixing the > patch? > > Ming - > > I tried after change you requested. Performance drop is still unresolved. > From 1.6 M IOPS to 770K IOPS. > > See below data. All 24 reply queue is in used correctly. > > IRQs / 1 second(s) > IRQ# TOTAL NODE0 NODE1 NAME > 360 16422 0 16422 IR-PCI-MSI 70254653-edge megasas > 364 15980 0 15980 IR-PCI-MSI 70254657-edge megasas > 362 15979 0 15979 IR-PCI-MSI 70254655-edge megasas > 345 15696 0 15696 IR-PCI-MSI 70254638-edge megasas > 341 15659 0 15659 IR-PCI-MSI 70254634-edge megasas > 369 15656 0 15656 IR-PCI-MSI 70254662-edge megasas > 359 15650 0 15650 IR-PCI-MSI 70254652-edge megasas > 358 15596 0 15596 IR-PCI-MSI 70254651-edge megasas > 350 15574 0 15574 IR-PCI-MSI 70254643-edge megasas > 342 15532 0 15532 IR-PCI-MSI 70254635-edge megasas > 344 15527 0 15527 IR-PCI-MSI 70254637-edge megasas > 346 15485 0 15485 IR-PCI-MSI 70254639-edge megasas > 361 15482 0 15482 IR-PCI-MSI 70254654-edge megasas > 348 15467 0 15467 IR-PCI-MSI 70254641-edge megasas > 368 15463 0 15463 IR-PCI-MSI 70254661-edge megasas > 354 15420 0 15420 IR-PCI-MSI 70254647-edge megasas > 351 15378 0 15378 IR-PCI-MSI 70254644-edge megasas > 352 15377 0 15377 IR-PCI-MSI 70254645-edge megasas > 356 15348 0 15348 IR-PCI-MSI 70254649-edge megasas > 337 15344 0 15344 IR-PCI-MSI 70254630-edge megasas > 343 15320 0 15320 IR-PCI-MSI 70254636-edge megasas > 355 15266 0 15266 IR-PCI-MSI 70254648-edge megasas > 335 15247 0 15247 IR-PCI-MSI 70254628-edge megasas > 363 15233 0 15233 IR-PCI-MSI 70254656-edge megasas > > > Average:CPU %usr %nice %sys %iowait%steal > %irq %soft%guest%gnice %idle > Average: 18 3.80 0.00 14.78 10.08 0.00 > 0.00 4.01 0.00 0.00 67.33 > Average: 19 3.26 0.00 15.35 10.62 0.00 > 0.00 4.03 0.00 0.00 66.74 > Average: 20 3.42 0.00 14.57 10.67 0.00 > 0.00 3.84 0.00 0.00 67.50 > Average: 21 3.19 0.00 15.60 10.75 0.00 > 0.00 4.16 0.00 0.00 66.30 > Average: 22 3.58 0.00 15.15 10.66 0.00 > 0.00 3.51 0.00 0.00 67.11 > Average: 23 3.34 0.00 15.36 10.63 0.00 > 0.00 4.17 0.00 0.00 66.50 > Average: 24 3.50 0.00 14.58 10.93 0.00 > 0.00 3.85 0.00 0.00 67.13 > Average: 25 3.20 0.00 14.68 10.86 0.00 > 0.00 4.31 0.00 0.00 66.95 > Average: 26 3.27 0.00 14.80 10.70 0.00 > 0.00 3.68 0.00 0.00 67.55 > Average: 27 3.58 0.00 15.36 10.80 0.00 > 0.00 3.79 0.00 0.00 66.48 > Average: 28 3.46 0.00 15.17 10.46 0.00 > 0.00 3.32 0.00 0.00 67.59 > Averag
Re: [PATCH] libiscsi: ensure session spin lock usage consistent
On Mon, Feb 05, 2018 at 11:13:23AM -0800, Lee Duncan wrote: > The libiscsi code was using both spin_lock()/spin_unlock() > and spin_lock_bh()/spin_unlock_bh() on its session lock. > In addition, lock validation found that libiscsi.c was > taking a HARDIRQ-unsafe lock while holding an HARDIRQ- > irq-safe lock: Lee and I have exchanged a few mails off-list, but I don't think these changes are the right thing to address the lockdep warning here. I've managed to reproduce this by exporting a scsi_debug lun over iSCSI and running fio traffic on a lockdep enabled initiator. The deadlock that lockdep is worried about is: > Possible interrupt unsafe locking scenario: > >CPU0CPU1 > > lock(&(&session->frwd_lock)->rlock); >local_irq_disable(); >lock(&(&q->__queue_lock)->rlock); >lock(&(&session->frwd_lock)->rlock); > > lock(&(&q->__queue_lock)->rlock); > > *** DEADLOCK *** Which I don't think can actually happen unless the queue_lock is being taken in hardirq context by any of the iSCSI drivers, but lockdep can't know that. But it does make me wonder if we can do less in iscsi_eh_cmd_timed_out while interrupts are blocked. Lee, I'm going to test with your patch applied and see what lockdep kicks out. - Chris > > [ 2528.738454] = > > [ 2528.744679] WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected > > [ 2528.749891] 4.15.0-6-g4548e6f42022-dirty #418 Not tainted > > [ 2528.754356] - > > [ 2528.759643] kworker/0:1H/100 [HC0[0]:SC0[0]:HE0:SE1] is trying to > > acquire: > > [ 2528.764608] (&(&session->frwd_lock)->rlock){+.-.}, at: > > [<97d8de0f>] iscsi_eh_cmd_timed_out+0x3d/0x2b0 > > [ 2528.769164] > >and this task is already holding: > > [ 2528.771313] (&(&q->__queue_lock)->rlock){-.-.}, at: > > [] blk_timeout_work+0x22/0x120 > > [ 2528.773825] which would create a new lock dependency: > > [ 2528.775216] (&(&q->__queue_lock)->rlock){-.-.} -> > > (&(&session->frwd_lock)->rlock){+.-.} > > [ 2528.777071] > >but this new dependency connects a HARDIRQ-irq-safe lock: > > [ 2528.778945] (&(&q->__queue_lock)->rlock){-.-.} > > [ 2528.778948] > >... which became HARDIRQ-irq-safe at: > > [ 2528.781204] _raw_spin_lock_irqsave+0x3c/0x4b > > [ 2528.781966] cfq_idle_slice_timer+0x2f/0x100 > > [ 2528.782705] __hrtimer_run_queues+0xdc/0x440 > > [ 2528.783448] hrtimer_interrupt+0xb1/0x210 > > [ 2528.784149] smp_apic_timer_interrupt+0x6d/0x260 > > [ 2528.784954] apic_timer_interrupt+0xac/0xc0 > > [ 2528.785679] native_safe_halt+0x2/0x10 > > [ 2528.786280] default_idle+0x1f/0x180 > > [ 2528.786806] do_idle+0x166/0x1e0 > > [ 2528.787288] cpu_startup_entry+0x6f/0x80 > > [ 2528.787874] start_secondary+0x186/0x1e0 > > [ 2528.788448] secondary_startup_64+0xa5/0xb0 > > [ 2528.789070] > >to a HARDIRQ-irq-unsafe lock: > > [ 2528.789913] (&(&session->frwd_lock)->rlock){+.-.} > > [ 2528.789915] > >... which became HARDIRQ-irq-unsafe at: > > [ 2528.791548] ... > > [ 2528.791553] _raw_spin_lock_bh+0x34/0x40 > > [ 2528.792366] iscsi_conn_setup+0x166/0x220 > > [ 2528.792960] iscsi_tcp_conn_setup+0x10/0x40 > > [ 2528.793526] iscsi_sw_tcp_conn_create+0x1b/0x160 > > [ 2528.794111] iscsi_if_rx+0xf9f/0x1580 > > [ 2528.794542] netlink_unicast+0x1f7/0x2f0 > > [ 2528.795105] netlink_sendmsg+0x2c9/0x3c0 > > [ 2528.795636] sock_sendmsg+0x30/0x40 > > [ 2528.796068] ___sys_sendmsg+0x269/0x2c0 > > [ 2528.796544] __sys_sendmsg+0x51/0x90 > > [ 2528.797028] entry_SYSCALL_64_fastpath+0x25/0x9c > > [ 2528.797595] > > ... > > This commit avoids possible reverse deadlock. > > Signed-off-by: Lee Duncan > --- > drivers/scsi/libiscsi.c | 18 +- > 1 file changed, 9 insertions(+), 9 deletions(-) > > diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c > index 82c3fd4bc938..055357b2fe9e 100644 > --- a/drivers/scsi/libiscsi.c > +++ b/drivers/scsi/libiscsi.c > @@ -1248,9 +1248,9 @@ int iscsi_complete_pdu(struct iscsi_conn *conn, struct > iscsi_hdr *hdr, > { > int rc; > > - spin_lock(&conn->session->lock); > + spin_lock_bh(&conn->session->lock); > rc = __iscsi_complete_pdu(conn, hdr, data, datalen); > - spin_unlock(&conn->session->lock); > + spin_unlock_bh(&conn->session->lock); > return rc; > } > EXPORT_SYMBOL_GPL(iscsi_complete_pdu); > @@ -1749,14 +1749,14 @@ static void iscsi_tmf_timedout(unsigned long data) > struct iscsi_conn *conn = (struct iscsi_conn *)data; > struct iscsi_session *session = conn->session; > > - spin_lock(&session->lock); > + spin_lock_bh(&session->lock); > if (conn->tmf_state == TMF_QUEUED) { >
Re: [PATCH] libiscsi: ensure session spin lock usage consistent
I overlooked it by mentally swapping out the session->lock in the patch for session->frwd_lock from the warning when looking at this, but what kernel was this patch built against? It doesn't have the frwd_lock/back_lock split stuff. - Chris On Mon, Feb 05, 2018 at 11:13:23AM -0800, Lee Duncan wrote: > The libiscsi code was using both spin_lock()/spin_unlock() > and spin_lock_bh()/spin_unlock_bh() on its session lock. > In addition, lock validation found that libiscsi.c was > taking a HARDIRQ-unsafe lock while holding an HARDIRQ- > irq-safe lock: > > > [ 2528.738454] = > > [ 2528.744679] WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected > > [ 2528.749891] 4.15.0-6-g4548e6f42022-dirty #418 Not tainted > > [ 2528.754356] - > > [ 2528.759643] kworker/0:1H/100 [HC0[0]:SC0[0]:HE0:SE1] is trying to > > acquire: > > [ 2528.764608] (&(&session->frwd_lock)->rlock){+.-.}, at: > > [<97d8de0f>] iscsi_eh_cmd_timed_out+0x3d/0x2b0 > > [ 2528.769164] > >and this task is already holding: > > [ 2528.771313] (&(&q->__queue_lock)->rlock){-.-.}, at: > > [] blk_timeout_work+0x22/0x120 > > [ 2528.773825] which would create a new lock dependency: > > [ 2528.775216] (&(&q->__queue_lock)->rlock){-.-.} -> > > (&(&session->frwd_lock)->rlock){+.-.} > > [ 2528.777071] > >but this new dependency connects a HARDIRQ-irq-safe lock: > > [ 2528.778945] (&(&q->__queue_lock)->rlock){-.-.} > > [ 2528.778948] > >... which became HARDIRQ-irq-safe at: > > [ 2528.781204] _raw_spin_lock_irqsave+0x3c/0x4b > > [ 2528.781966] cfq_idle_slice_timer+0x2f/0x100 > > [ 2528.782705] __hrtimer_run_queues+0xdc/0x440 > > [ 2528.783448] hrtimer_interrupt+0xb1/0x210 > > [ 2528.784149] smp_apic_timer_interrupt+0x6d/0x260 > > [ 2528.784954] apic_timer_interrupt+0xac/0xc0 > > [ 2528.785679] native_safe_halt+0x2/0x10 > > [ 2528.786280] default_idle+0x1f/0x180 > > [ 2528.786806] do_idle+0x166/0x1e0 > > [ 2528.787288] cpu_startup_entry+0x6f/0x80 > > [ 2528.787874] start_secondary+0x186/0x1e0 > > [ 2528.788448] secondary_startup_64+0xa5/0xb0 > > [ 2528.789070] > >to a HARDIRQ-irq-unsafe lock: > > [ 2528.789913] (&(&session->frwd_lock)->rlock){+.-.} > > [ 2528.789915] > >... which became HARDIRQ-irq-unsafe at: > > [ 2528.791548] ... > > [ 2528.791553] _raw_spin_lock_bh+0x34/0x40 > > [ 2528.792366] iscsi_conn_setup+0x166/0x220 > > [ 2528.792960] iscsi_tcp_conn_setup+0x10/0x40 > > [ 2528.793526] iscsi_sw_tcp_conn_create+0x1b/0x160 > > [ 2528.794111] iscsi_if_rx+0xf9f/0x1580 > > [ 2528.794542] netlink_unicast+0x1f7/0x2f0 > > [ 2528.795105] netlink_sendmsg+0x2c9/0x3c0 > > [ 2528.795636] sock_sendmsg+0x30/0x40 > > [ 2528.796068] ___sys_sendmsg+0x269/0x2c0 > > [ 2528.796544] __sys_sendmsg+0x51/0x90 > > [ 2528.797028] entry_SYSCALL_64_fastpath+0x25/0x9c > > [ 2528.797595] > > ... > > This commit avoids possible reverse deadlock. > > Signed-off-by: Lee Duncan > --- > drivers/scsi/libiscsi.c | 18 +- > 1 file changed, 9 insertions(+), 9 deletions(-) > > diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c > index 82c3fd4bc938..055357b2fe9e 100644 > --- a/drivers/scsi/libiscsi.c > +++ b/drivers/scsi/libiscsi.c > @@ -1248,9 +1248,9 @@ int iscsi_complete_pdu(struct iscsi_conn *conn, struct > iscsi_hdr *hdr, > { > int rc; > > - spin_lock(&conn->session->lock); > + spin_lock_bh(&conn->session->lock); > rc = __iscsi_complete_pdu(conn, hdr, data, datalen); > - spin_unlock(&conn->session->lock); > + spin_unlock_bh(&conn->session->lock); > return rc; > } > EXPORT_SYMBOL_GPL(iscsi_complete_pdu); > @@ -1749,14 +1749,14 @@ static void iscsi_tmf_timedout(unsigned long data) > struct iscsi_conn *conn = (struct iscsi_conn *)data; > struct iscsi_session *session = conn->session; > > - spin_lock(&session->lock); > + spin_lock_bh(&session->lock); > if (conn->tmf_state == TMF_QUEUED) { > conn->tmf_state = TMF_TIMEDOUT; > ISCSI_DBG_EH(session, "tmf timedout\n"); > /* unblock eh_abort() */ > wake_up(&conn->ehwait); > } > - spin_unlock(&session->lock); > + spin_unlock_bh(&session->lock); > } > > static int iscsi_exec_task_mgmt_fn(struct iscsi_conn *conn, > @@ -1908,7 +1908,7 @@ static enum blk_eh_timer_return > iscsi_eh_cmd_timed_out(struct scsi_cmnd *sc) > > ISCSI_DBG_EH(session, "scsi cmd %p timedout\n", sc); > > - spin_lock(&session->lock); > + spin_lock_bh(&session->lock); > task = (struct iscsi_task *)sc->SCp.ptr; > if (!task) { > /* > @@ -2022,7 +2022,7 @@ static enum blk_eh_timer_return > iscsi_eh_cmd_timed_out(struct scsi_cmnd *sc) > done: > if (task) > task->last_timeout = jiffies;
Re: [PATCH 0/5] blk-mq/scsi-mq: support global tags & introduce force_blk_mq
On 02/07/2018 03:14 PM, Kashyap Desai wrote: >> -Original Message- >> From: Ming Lei [mailto:ming@redhat.com] >> Sent: Wednesday, February 7, 2018 5:53 PM >> To: Hannes Reinecke >> Cc: Kashyap Desai; Jens Axboe; linux-bl...@vger.kernel.org; Christoph >> Hellwig; Mike Snitzer; linux-scsi@vger.kernel.org; Arun Easi; Omar > Sandoval; >> Martin K . Petersen; James Bottomley; Christoph Hellwig; Don Brace; > Peter >> Rivera; Paolo Bonzini; Laurence Oberman >> Subject: Re: [PATCH 0/5] blk-mq/scsi-mq: support global tags & introduce >> force_blk_mq >> >> On Wed, Feb 07, 2018 at 07:50:21AM +0100, Hannes Reinecke wrote: >>> Hi all, >>> >>> [ .. ] > > Could you share us your patch for enabling global_tags/MQ on megaraid_sas > so that I can reproduce your test? > >> See below perf top data. "bt_iter" is consuming 4 times more CPU. > > Could you share us what the IOPS/CPU utilization effect is after applying the > patch V2? And your test script? Regarding CPU utilization, I need to test one more time. Currently system is in used. I run below fio test on total 24 SSDs expander attached. numactl -N 1 fio jbod.fio --rw=randread --iodepth=64 --bs=4k --ioengine=libaio --rw=randread Performance dropped from 1.6 M IOPs to 770K IOPs. >>> This is basically what we've seen with earlier iterations. >> >> Hi Hannes, >> >> As I mentioned in another mail[1], Kashyap's patch has a big issue, > which >> causes only reply queue 0 used. >> >> [1] https://marc.info/?l=linux-scsi&m=151793204014631&w=2 >> >> So could you guys run your performance test again after fixing the > patch? > > Ming - > > I tried after change you requested. Performance drop is still unresolved. > From 1.6 M IOPS to 770K IOPS. > > See below data. All 24 reply queue is in used correctly. > > IRQs / 1 second(s) > IRQ# TOTAL NODE0 NODE1 NAME > 360 16422 0 16422 IR-PCI-MSI 70254653-edge megasas > 364 15980 0 15980 IR-PCI-MSI 70254657-edge megasas > 362 15979 0 15979 IR-PCI-MSI 70254655-edge megasas > 345 15696 0 15696 IR-PCI-MSI 70254638-edge megasas > 341 15659 0 15659 IR-PCI-MSI 70254634-edge megasas > 369 15656 0 15656 IR-PCI-MSI 70254662-edge megasas > 359 15650 0 15650 IR-PCI-MSI 70254652-edge megasas > 358 15596 0 15596 IR-PCI-MSI 70254651-edge megasas > 350 15574 0 15574 IR-PCI-MSI 70254643-edge megasas > 342 15532 0 15532 IR-PCI-MSI 70254635-edge megasas > 344 15527 0 15527 IR-PCI-MSI 70254637-edge megasas > 346 15485 0 15485 IR-PCI-MSI 70254639-edge megasas > 361 15482 0 15482 IR-PCI-MSI 70254654-edge megasas > 348 15467 0 15467 IR-PCI-MSI 70254641-edge megasas > 368 15463 0 15463 IR-PCI-MSI 70254661-edge megasas > 354 15420 0 15420 IR-PCI-MSI 70254647-edge megasas > 351 15378 0 15378 IR-PCI-MSI 70254644-edge megasas > 352 15377 0 15377 IR-PCI-MSI 70254645-edge megasas > 356 15348 0 15348 IR-PCI-MSI 70254649-edge megasas > 337 15344 0 15344 IR-PCI-MSI 70254630-edge megasas > 343 15320 0 15320 IR-PCI-MSI 70254636-edge megasas > 355 15266 0 15266 IR-PCI-MSI 70254648-edge megasas > 335 15247 0 15247 IR-PCI-MSI 70254628-edge megasas > 363 15233 0 15233 IR-PCI-MSI 70254656-edge megasas > > > Average:CPU %usr %nice %sys %iowait%steal > %irq %soft%guest%gnice %idle > Average: 18 3.80 0.00 14.78 10.08 0.00 > 0.00 4.01 0.00 0.00 67.33 > Average: 19 3.26 0.00 15.35 10.62 0.00 > 0.00 4.03 0.00 0.00 66.74 > Average: 20 3.42 0.00 14.57 10.67 0.00 > 0.00 3.84 0.00 0.00 67.50 > Average: 21 3.19 0.00 15.60 10.75 0.00 > 0.00 4.16 0.00 0.00 66.30 > Average: 22 3.58 0.00 15.15 10.66 0.00 > 0.00 3.51 0.00 0.00 67.11 > Average: 23 3.34 0.00 15.36 10.63 0.00 > 0.00 4.17 0.00 0.00 66.50 > Average: 24 3.50 0.00 14.58 10.93 0.00 > 0.00 3.85 0.00 0.00 67.13 > Average: 25 3.20 0.00 14.68 10.86 0.00 > 0.00 4.31 0.00 0.00 66.95 > Average: 26 3.27 0.00 14.80 10.70 0.00 > 0.00 3.68 0.00 0.00 67.55 > Average: 27 3.58 0.00 15.36 10.80 0.00 > 0.00 3.79 0.00 0.00 66.48 > Average: 28 3.46 0.00 15.17 10.46 0.00 > 0.00 3.32 0.00 0.00 67.59 > Average: 29 3.34 0.00 14.42 10.72 0.00 > 0.00 3.34 0.00 0.00 68.18 > Average