RE: [PATCH] megaraid: add scsi_cmnd NULL check before use
> -Original Message- > From: Finn Thain [mailto:fth...@telegraphics.com.au] > Sent: Friday, May 13, 2016 1:14 PM > To: Sumit Saxena > Cc: Dan Carpenter; Petros Koutoupis; kashyap.de...@avagotech.com; > sumit.sax...@avagotech.com; uday.ling...@avagotech.com; > megaraidlinux@avagotech.com; linux-scsi@vger.kernel.org > Subject: RE: [PATCH] megaraid: add scsi_cmnd NULL check before use > > > On Thu, 12 May 2016, Sumit Saxena wrote: > > > > From: Dan Carpenter [mailto:dan.carpen...@oracle.com] > > > > > > Also when I'm doing static analysis people always tell me that "that > > > bug is impossible, trust me." and instead of trusting people I > > > really wish they would just show me the relevant code that prevents > > > it from happening. > > > > Inside megasas_build_io_fusion() function, driver sets "cmd->scmd" > > pointer(SCSI command pointer received from SCSI mid layer). Functions > > called inside megasas_build_io_fusion()(which actually builds frame to > > be sent to firmware) are setting Function type- > > MPI2_FUNCTION_SCSI_IO_REQUEST (or) > MEGASAS_MPI2_FUNCTION_LD_IO_REQUEST. > > So in case Function type set to any one these two, there must be valid > > "cmd->scmd". > > That doesn't show what prevents the bug. It merely shows that the bug does not > always manifest. > > For example, you might check whether anything prevents > megasas_build_io_fusion() from returning before assigning cmd->scmd, like > so: > > 2112 if (sge_count > instance->max_num_sge) { > 2113 dev_err(&instance->pdev->dev, "Error. sge_count (0x%x) > exceeds " > 2114 "max (0x%x) allowed\n", sge_count, > 2115 instance->max_num_sge); > 2116 return 1; > 2117 } > > Another possibility: cmd->io_request->Function is valid yet cmd->scmd is NULL > when seen from the interrupt handler if it intervenes between the two > statements in megasas_return_cmd_fusion(): For IOs returned by above error are not actually fired to firmware so there will be no interrupt handler called for the same. Anyways, if anyone has logs pertaining to the failure, please attach.. > > 180 inline void megasas_return_cmd_fusion(struct megasas_instance *instance, > 181 struct megasas_cmd_fusion *cmd) > 182 { > 183 cmd->scmd = NULL; > 184 memset(cmd->io_request, 0, sizeof(struct > MPI2_RAID_SCSI_IO_REQUEST)); > 185 } > > You might want to confirm that locking always prevents that. > > OTOH, without an actual backtrace, I too might be reluctant to pursue this kind > of speculation. > > -- -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCHv3]scsi: don't fail zero length request too early
Hi James, and all, I guess you're busy on other staff, so I create patch below as you suggested, I think we also need this into stable. >From 99eab170653544fa1e1bc9511ec055ba70e183d2 Mon Sep 17 00:00:00 2001 From: Jack Wang Date: Fri, 13 May 2016 09:53:21 +0200 Subject: [PATCH] scsi: don't fail zero length request too early We hit IO error in our production when SYNC on multipath devices during resize device on target side, the problem turns out scsi driver passes up as IO error when sense data is UNIT_ATTENTION and ASC && ASCQ indicate Capacity data has changed, even storage side sync the data properly. Dig it further turns out we need special case on zero length commands (currently only FLUSH), when it fails, we always need to drop down into retry code. Reported-by: Sebastian Parschauer Suggested-by: James Bottomley Signed-off-by: Jack Wang --- drivers/scsi/scsi_lib.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index 8106515..5a97866 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -911,9 +911,12 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes) } /* - * If we finished all bytes in the request we are done now. + * special case: failed zero length commands always need to + * drop down into the retry code. Otherwise, if we finished + * all bytes in the request we are done now. */ - if (!scsi_end_request(req, error, good_bytes, 0)) + if (!(good_bytes == 0 && blk_rq_bytes(req) == 0 && result != 0) && +!scsi_end_request(req, error, good_bytes, 0)) return; /* -- 1.9.1 -- Mit freundlichen Grüßen, Best Regards, Jack Wang Linux Kernel Developer Storage ProfitBricks GmbH The IaaS-Company. From 99eab170653544fa1e1bc9511ec055ba70e183d2 Mon Sep 17 00:00:00 2001 From: Jack Wang Date: Fri, 13 May 2016 09:53:21 +0200 Subject: [PATCH] scsi: don't fail zero length request too early We hit IO error in our production when SYNC on multipath devices during resize device on target side, the problem turns out scsi driver passes up as IO error when sense data is UNIT_ATTENTION and ASC && ASCQ indicate Capacity data has changed, even storage side sync the data properly. Dig it further turns out we need special case on zero length commands (currently only FLUSH), when it fails, we always need to drop down into retry code. Reported-by: Sebastian Parschauer Suggested-by: James Bottomley Signed-off-by: Jack Wang --- drivers/scsi/scsi_lib.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index 8106515..5a97866 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -911,9 +911,12 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes) } /* - * If we finished all bytes in the request we are done now. + * special case: failed zero length commands always need to + * drop down into the retry code. Otherwise, if we finished + * all bytes in the request we are done now. */ - if (!scsi_end_request(req, error, good_bytes, 0)) + if (!(good_bytes == 0 && blk_rq_bytes(req) == 0 && result != 0) && + !scsi_end_request(req, error, good_bytes, 0)) return; /* -- 1.9.1
RE: [PATCH] megaraid: add scsi_cmnd NULL check before use
On Fri, 13 May 2016, Sumit Saxena wrote: > > For IOs returned by above error are not actually fired to firmware so > there will be no interrupt handler called for the same. You don't need both possibilities to hold. Either one would do it! -- -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [patch 1/2] mpt3sas: add missing curly braces
Hi, Please consider this patch as Ack-by: Chaitra P B Thanks, Chaitra -Original Message- From: Dan Carpenter [mailto:dan.carpen...@oracle.com] Sent: Friday, May 13, 2016 2:08 AM To: Sathya Prakash; Chaitra P B Cc: Suganath Prabu Subramani; James E.J. Bottomley; Martin K. Petersen; mpt-fusionlinux@broadcom.com; linux-scsi@vger.kernel.org; kernel-janit...@vger.kernel.org Subject: [patch 1/2] mpt3sas: add missing curly braces There are some missing curly braces on this if statement, so we end up printing when we shouldn't. Fixes: a470a51cd648 ('mpt3sas: Handle active cable exception event') Signed-off-by: Dan Carpenter diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c index 6a4df5a..6bff13e 100644 --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c @@ -7975,13 +7975,14 @@ mpt3sas_scsih_event_callback(struct MPT3SAS_ADAPTER *ioc, u8 msix_index, ActiveCableEventData = (Mpi26EventDataActiveCableExcept_t *) mpi_reply->EventData; if (ActiveCableEventData->ReasonCode == - MPI26_EVENT_ACTIVE_CABLE_INSUFFICIENT_POWER) + MPI26_EVENT_ACTIVE_CABLE_INSUFFICIENT_POWER) { pr_info(MPT3SAS_FMT "Currently an active cable with ReceptacleID %d", ioc->name, ActiveCableEventData->ReceptacleID); pr_info("cannot be powered and devices connected to this active cable"); pr_info("will not be seen. This active cable"); pr_info("requires %d mW of power", ActiveCableEventData->ActiveCablePowerRequirement); + } break; default: /* ignore the rest */ -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [patch 2/2] mpt3sas: clean up indenting a bit
Hi Dan, Below patch doesn't apply smoothly on latest upstream kernel , we are seeing Hunk failures. Thanks, Chaitra -Original Message- From: Dan Carpenter [mailto:dan.carpen...@oracle.com] Sent: Friday, May 13, 2016 2:09 AM To: Sathya Prakash Cc: Chaitra P B; Suganath Prabu Subramani; James E.J. Bottomley; Martin K. Petersen; mpt-fusionlinux@broadcom.com; linux-scsi@vger.kernel.org; kernel-janit...@vger.kernel.org Subject: [patch 2/2] mpt3sas: clean up indenting a bit The indenting is slightly off in parts of this file so I have tidied it. Signed-off-by: Dan Carpenter diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c index 6bff13e..34d6f996 100644 --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c @@ -1865,7 +1865,7 @@ scsih_slave_configure(struct scsi_device *sdev) ds = "SSP"; } else { qdepth = MPT3SAS_SATA_QUEUE_DEPTH; -if (raid_device->device_info & + if (raid_device->device_info & MPI2_SAS_DEVICE_INFO_SATA_DEVICE) ds = "SATA"; else @@ -3497,21 +3497,21 @@ _scsih_issue_delayed_event_ack(struct MPT3SAS_ADAPTER *ioc, u16 smid, u16 event, void _scsih_issue_delayed_sas_io_unit_ctrl(struct MPT3SAS_ADAPTER *ioc, u16 smid, u16 handle) - { - Mpi2SasIoUnitControlRequest_t *mpi_request; - u32 ioc_state; - int i = smid - ioc->internal_smid; - unsigned long flags; +{ + Mpi2SasIoUnitControlRequest_t *mpi_request; + u32 ioc_state; + int i = smid - ioc->internal_smid; + unsigned long flags; - if (ioc->remove_host) { - dewtprintk(ioc, pr_info(MPT3SAS_FMT - "%s: host has been removed\n", -__func__, ioc->name)); - return; - } else if (ioc->pci_error_recovery) { - dewtprintk(ioc, pr_info(MPT3SAS_FMT - "%s: host in pci error recovery\n", - __func__, ioc->name)); + if (ioc->remove_host) { + dewtprintk(ioc, pr_info(MPT3SAS_FMT + "%s: host has been removed\n", +__func__, ioc->name)); + return; + } else if (ioc->pci_error_recovery) { + dewtprintk(ioc, pr_info(MPT3SAS_FMT + "%s: host in pci error recovery\n", + __func__, ioc->name)); return; } ioc_state = mpt3sas_base_get_iocstate(ioc, 1); @@ -5173,7 +5173,7 @@ _scsih_expander_add(struct MPT3SAS_ADAPTER *ioc, u16 handle) } _scsih_expander_node_add(ioc, sas_expander); -return 0; + return 0; out_fail: @@ -7774,9 +7774,9 @@ _mpt3sas_fw_work(struct MPT3SAS_ADAPTER *ioc, struct fw_event_work *fw_event) break; case MPT3SAS_PORT_ENABLE_COMPLETE: ioc->start_scan = 0; - if (missing_delay[0] != -1 && missing_delay[1] != -1) + if (missing_delay[0] != -1 && missing_delay[1] != -1) mpt3sas_base_update_missing_delay(ioc, missing_delay[0], - missing_delay[1]); + missing_delay[1]); dewtprintk(ioc, pr_info(MPT3SAS_FMT "port enable: complete from worker thread\n", ioc->name)); -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv3]scsi: don't fail zero length request too early
On Fri, 2016-05-13 at 10:07 +0200, Jinpu Wang wrote: > Hi James, and all, > > I guess you're busy on other staff, so I create patch below as you > suggested, I think we also need this into stable. No, I'll do it, but I just wanted to verify that we don't get into an infinite retry loop on any conditions. James > From 99eab170653544fa1e1bc9511ec055ba70e183d2 Mon Sep 17 00:00:00 > 2001 > From: Jack Wang > Date: Fri, 13 May 2016 09:53:21 +0200 > Subject: [PATCH] scsi: don't fail zero length request too early > > We hit IO error in our production when SYNC on multipath devices > during resize > device on target side, the problem turns out scsi driver passes up as > IO > error when sense data is UNIT_ATTENTION and ASC && ASCQ indicate > Capacity data has changed, even storage side sync the data properly. > > Dig it further turns out we need special case on zero length commands > (currently only FLUSH), when it fails, we always need to drop down > into retry code. > > Reported-by: Sebastian Parschauer > Suggested-by: James Bottomley > Signed-off-by: Jack Wang > --- > drivers/scsi/scsi_lib.c | 7 +-- > 1 file changed, 5 insertions(+), 2 deletions(-) > > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c > index 8106515..5a97866 100644 > --- a/drivers/scsi/scsi_lib.c > +++ b/drivers/scsi/scsi_lib.c > @@ -911,9 +911,12 @@ void scsi_io_completion(struct scsi_cmnd *cmd, > unsigned int good_bytes) > } > > /* > - * If we finished all bytes in the request we are done now. > + * special case: failed zero length commands always need to > + * drop down into the retry code. Otherwise, if we finished > + * all bytes in the request we are done now. > */ > - if (!scsi_end_request(req, error, good_bytes, 0)) > + if (!(good_bytes == 0 && blk_rq_bytes(req) == 0 && result != 0) && > +!scsi_end_request(req, error, good_bytes, 0)) > return; > > /* > -- > 1.9.1 > -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[Bug 118081] open-iscsi Ping timeout erro
https://bugzilla.kernel.org/show_bug.cgi?id=118081 --- Comment #1 from Mike Christie --- On 05/11/2016 10:34 PM, bugzilla-dae...@bugzilla.kernel.org wrote: > https://bugzilla.kernel.org/show_bug.cgi?id=118081 > > Bug ID: 118081 >Summary: open-iscsi Ping timeout erro >Product: SCSI Drivers >Version: 2.5 > Kernel Version: 4.4.7 > Hardware: All > OS: Linux > Tree: Mainline > Status: NEW > Severity: normal > Priority: P1 > Component: Other > Assignee: scsi_drivers-ot...@kernel-bugs.osdl.org > Reporter: liuzhengyuang...@gmail.com > Regression: No > > Hi everyone: > I create a target using fileio as the backend storage on ARM64 server. The > initiator reported some errors showed bellow while perform iozone test. > > [178444.145679] connection14:0: ping timeout of 5 secs expired, recv timeout > 5, last rx 4339462894, last ping 4339464146, now 4339465400 > [178444.145706] connection14:0: detected conn error (1011) > [178469.674313] connection14:0: detected conn error (1020) > [178504.420979] connection14:0: ping timeout of 5 secs expired, recv timeout > 5, last rx 4339477953, last ping 4339479204, now 4339480456 > [178504.421001] connection14:0: detected conn error (1011) > [178532.064262] connection14:0: detected conn error (1020) > [178564.584087] connection14:0: ping timeout of 5 secs expired, recv timeout > 5, last rx 4339492980, last ping 4339494232, now 4339495484 > .. > > I try to trace the function call of target iscsi. Then, I found the > receiving > thread of target iscsi blocked at fd_execute_sync_cache -> vfs_fsync_range. > Further, vfs_fsync_range may takes more than 10 seconds to return,while > initiator Ping timeout would happened after 5 seconds. vfs_fsync_range was > call with the form vfs_fsync_range(fd_dev->fd_file, 0, LLONG_MAX, 1) every > times which means sync all device cache. > So, is this a bug? > How does Initiator send sync_cache scsi command? > Does it need to sync all device cache at once? > Any reply would be thankful. > The upper layers like the FS or application determine when to send a sync cache. They send down a request and the iscsi layer just sends it to the target. You are using LIO right? It looks like we end up syncing the entire device sometimes. I think for iscsi pings/Nops that have the immediate bit set, the target would want to reply to them right away. They should not be getting stuck behind these type of commands. Nick, what do you think? -- You are receiving this mail because: You are watching the assignee of the bug. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Bug 118081] New: open-iscsi Ping timeout erro
On 05/11/2016 10:34 PM, bugzilla-dae...@bugzilla.kernel.org wrote: > https://bugzilla.kernel.org/show_bug.cgi?id=118081 > > Bug ID: 118081 >Summary: open-iscsi Ping timeout erro >Product: SCSI Drivers >Version: 2.5 > Kernel Version: 4.4.7 > Hardware: All > OS: Linux > Tree: Mainline > Status: NEW > Severity: normal > Priority: P1 > Component: Other > Assignee: scsi_drivers-ot...@kernel-bugs.osdl.org > Reporter: liuzhengyuang...@gmail.com > Regression: No > > Hi everyone: > I create a target using fileio as the backend storage on ARM64 server. The > initiator reported some errors showed bellow while perform iozone test. > > [178444.145679] connection14:0: ping timeout of 5 secs expired, recv timeout > 5, last rx 4339462894, last ping 4339464146, now 4339465400 > [178444.145706] connection14:0: detected conn error (1011) > [178469.674313] connection14:0: detected conn error (1020) > [178504.420979] connection14:0: ping timeout of 5 secs expired, recv timeout > 5, last rx 4339477953, last ping 4339479204, now 4339480456 > [178504.421001] connection14:0: detected conn error (1011) > [178532.064262] connection14:0: detected conn error (1020) > [178564.584087] connection14:0: ping timeout of 5 secs expired, recv timeout > 5, last rx 4339492980, last ping 4339494232, now 4339495484 > .. > > I try to trace the function call of target iscsi. Then, I found the > receiving > thread of target iscsi blocked at fd_execute_sync_cache -> vfs_fsync_range. > Further, vfs_fsync_range may takes more than 10 seconds to return,while > initiator Ping timeout would happened after 5 seconds. vfs_fsync_range was > call with the form vfs_fsync_range(fd_dev->fd_file, 0, LLONG_MAX, 1) every > times which means sync all device cache. > So, is this a bug? > How does Initiator send sync_cache scsi command? > Does it need to sync all device cache at once? > Any reply would be thankful. > The upper layers like the FS or application determine when to send a sync cache. They send down a request and the iscsi layer just sends it to the target. You are using LIO right? It looks like we end up syncing the entire device sometimes. I think for iscsi pings/Nops that have the immediate bit set, the target would want to reply to them right away. They should not be getting stuck behind these type of commands. Nick, what do you think? -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv3]scsi: don't fail zero length request too early
On Fri, 2016-05-13 at 06:51 -0700, James Bottomley wrote: > On Fri, 2016-05-13 at 10:07 +0200, Jinpu Wang wrote: > > Hi James, and all, > > > > I guess you're busy on other staff, so I create patch below as you > > suggested, I think we also need this into stable. > > No, I'll do it, but I just wanted to verify that we don't get into an > infinite retry loop on any conditions. OK, I checked, we're covered by the wait_for check at the bottom of the switch which will automatically fail the command and not retry if we've exceeded the timeout. James -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] scsi_lib: correctly retry failed zero length REQ_TYPE_FS commands
When SCSI was written, all commands coming from the filesystem (REQ_TYPE_FS commands) had data. This meant that our signal for needing to complete the command was the number of bytes completed being equal to the number of bytes in the request. Unfortunately, with the advent of flush barriers, we can now get zero length REQ_TYPE_FS commands, which confuse this logic because they satisfy the condition every time. This means they never get retried even for retryable conditions, like UNIT ATTENTION because we complete them early assuming they're done. Fix this by special casing the early completion condition to recognise zero length commands with errors and let them drop through to the retry code. Reported-by: Sebastian Parschauer Signed-off-by: James E.J. Bottomley --- diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index 8106515..f704d02 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -911,9 +911,12 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes) } /* -* If we finished all bytes in the request we are done now. +* special case: failed zero length commands always need to +* drop down into the retry code. Otherwise, if we finished +* all bytes in the request we are done now. */ - if (!scsi_end_request(req, error, good_bytes, 0)) + if (!(blk_rq_bytes(req) == 0 && error) && + !scsi_end_request(req, error, good_bytes, 0)) return; /* -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] ses: Fix racy cleanup of /sys in remove_dev()
Currently we free the resources backing the enclosure device before we call device_unregister(). This is racy: during rmmod of low-level SCSI drivers that hook into enclosure, we end up with a small window of time during which writing to /sys can OOPS. Example trace with mpt3sas: general protection fault: [#1] SMP KASAN Modules linked in: mpt3sas(-) <...> RIP: [] ses_get_page2_descriptor.isra.6+0x38/0x220 [ses] Call Trace: [] ses_set_fault+0xf4/0x400 [ses] [] set_component_fault+0xa9/0xf0 [enclosure] [] dev_attr_store+0x3c/0x70 [] sysfs_kf_write+0x115/0x180 [] kernfs_fop_write+0x275/0x3a0 [] __vfs_write+0xe0/0x3e0 [] vfs_write+0x13f/0x4a0 [] SyS_write+0x111/0x230 [] entry_SYSCALL_64_fastpath+0x13/0x94 Fortunately the solution is extremely simple: call device_unregister() before we free the resources, and the race no longer exists. The driver core holds a reference over ->remove_dev(), so AFAICT this is safe. Signed-off-by: Calvin Owens --- drivers/scsi/ses.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/scsi/ses.c b/drivers/scsi/ses.c index 53ef1cb..0e8601a 100644 --- a/drivers/scsi/ses.c +++ b/drivers/scsi/ses.c @@ -778,6 +778,8 @@ static void ses_intf_remove_enclosure(struct scsi_device *sdev) if (!edev) return; + enclosure_unregister(edev); + ses_dev = edev->scratch; edev->scratch = NULL; @@ -789,7 +791,6 @@ static void ses_intf_remove_enclosure(struct scsi_device *sdev) kfree(edev->component[0].scratch); put_device(&edev->edev); - enclosure_unregister(edev); } static void ses_intf_remove(struct device *cdev, -- 2.8.0.rc2 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] mpt3sas: Do scsi_remove_host() before deleting SAS PHY objects
On the hardware I'm testing on, simply removing the mpt3sas module triggers a litany of WARNs culminating in an OOPS: [ cut here ] WARNING: CPU: 5 PID: 13348 at lib/kobject.c:244 kobject_add_internal+0x359/0x8a0 kobject_add_internal failed for ArrayDevice09 (error: -2 parent: 6:0:15:0) CPU: 5 PID: 13348 Comm: rmmod Not tainted 4.6.0-rc2-mpt3sas-debug-1-g7e7e6f4 #2 Hardware name: Wiwynn HoneyBadger/PantherPlus, BIOS HBP6.6 11/20/2015 82b88ec0 8806ce76f820 81deff03 8806ce76f898 8806ce76f868 811191e2 880749819af8 00f4 ed00d9cedf0f 880749819b08 88074c9b8168 Call Trace: [] dump_stack+0x67/0x94 [] __warn+0x172/0x1b0 [] warn_slowpath_fmt+0x97/0xb0 [] kobject_add_internal+0x359/0x8a0 [] kobject_add+0x10e/0x1c0 [] device_add+0x30a/0x1490 [] enclosure_remove_device+0x172/0x1cc [enclosure] [] ses_intf_remove+0x1c4/0x270 [ses] [] device_del+0x2ab/0x680 [] device_unregister+0x12/0x30 [] __scsi_remove_device+0x1d5/0x250 [] scsi_forget_host+0x12c/0x1e0 [] scsi_remove_host+0x10c/0x300 [] scsih_remove+0x321/0x680 [mpt3sas] [] pci_device_remove+0x70/0x110 [] __device_release_driver+0x160/0x3a0 [] driver_detach+0x183/0x200 [] bus_remove_driver+0xdf/0x200 [] driver_unregister+0x67/0xa0 [] pci_unregister_driver+0x1e/0xe0 [] _mpt3sas_exit+0x23/0x219 [mpt3sas] [] SyS_delete_module+0x2ee/0x390 [] entry_SYSCALL_64_fastpath+0x18/0xa8 ---[ end trace fe163024b624f4af ]--- general protection fault: [#1] SMP KASAN CPU: 6 PID: 17388 Comm: rmmod Tainted: GW 4.6.0-rc2-1-g7e7e6f4 #1 Hardware name: Wiwynn HoneyBadger/PantherPlus, BIOS HBP6.6 11/20/2015 task: 880753731740 ti: 8806896f task.ti: 8806896f RIP: 0010:[] [] klist_put+0x1f/0x160 RSP: 0018:8806896f7a10 EFLAGS: 00010202 RAX: dc00 RBX: RCX: 880753731f48 RDX: 000b RSI: 0001 RDI: 0058 RBP: 8806896f7a30 R08: 0006 R09: R10: R11: R12: 880752bec000 R13: 0001 R14: dc00 R15: 880752bec2b0 FS: 7feef8ea6700() GS:88075ef8() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 7fe45044d020 CR3: 0006f7092000 CR4: 001006e0 Stack: 880752bec000 dc00 8806896f7a40 8290987e 8806896f7b00 820aa3bd 8806896f7aa0 1100d12def4f 880752bec390 829183ca Call Trace: [] klist_del+0xe/0x10 [] device_del+0x12d/0x680 [] device_unregister+0x12/0x30 [] enclosure_unregister+0xe0/0x170 [enclosure] [] ses_intf_remove+0x190/0x270 [ses] [] device_del+0x2ab/0x680 [] device_unregister+0x12/0x30 [] __scsi_remove_device+0x1d5/0x250 [] scsi_forget_host+0x12c/0x1e0 [] scsi_remove_host+0x10c/0x300 [] scsih_remove+0x321/0x680 [mpt3sas] [] pci_device_remove+0x70/0x110 [] __device_release_driver+0x160/0x3a0 [] driver_detach+0x183/0x200 [] bus_remove_driver+0xdf/0x200 [] driver_unregister+0x67/0xa0 [] pci_unregister_driver+0x1e/0xe0 [] _mpt3sas_exit+0x23/0xa89 [mpt3sas] [] SyS_delete_module+0x2ee/0x390 [] entry_SYSCALL_64_fastpath+0x18/0xa8 The issue is that enclosure_remove_device() expects to be able to re-add the device that was previously enclosured: so with SES running, the order we unwind things matters in a way it otherwise wouldn't. Since mpt3sas deletes the SAS objects before the SCSI hosts, enclosure ends up trying to re-parent the SCSI device from the enclosure to the SAS PHY which has already been deleted. This obviously ends in sadness. The fix appears to be simple: just call scsi_remove_host() before we call sas_port_delete() and/or sas_remove_host(). Signed-off-by: Calvin Owens --- drivers/scsi/mpt3sas/mpt3sas_scsih.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c index e0e4920..4aa128a 100644 --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c @@ -8149,6 +8149,8 @@ void scsih_remove(struct pci_dev *pdev) _scsih_raid_device_remove(ioc, raid_device); } + scsi_remove_host(shost); + /* free ports attached to the sas_host */ list_for_each_entry_safe(mpt3sas_port, next_port, &ioc->sas_hba.sas_port_list, port_list) { @@ -8172,7 +8174,6 @@ void scsih_remove(struct pci_dev *pdev) } sas_remove_host(shost); - scsi_remove_host(shost); mpt3sas_base_detach(ioc); spin_lock(&gioc_lock); list_del(&ioc->list); -- 2.8.0.rc2 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to ma
RE: [PATCH] mpt3sas: Do scsi_remove_host() before deleting SAS PHY objects
> -Original Message- > From: linux-kernel-ow...@vger.kernel.org [mailto:linux-kernel- > ow...@vger.kernel.org] On Behalf Of Calvin Owens > Sent: Friday, May 13, 2016 3:28 PM ... > Subject: [PATCH] mpt3sas: Do scsi_remove_host() before deleting SAS PHY > objects > ... > The issue is that enclosure_remove_device() expects to be able to re-add > the device that was previously enclosured: so with SES running, the order > we unwind things matters in a way it otherwise wouldn't. > > Since mpt3sas deletes the SAS objects before the SCSI hosts, enclosure > ends up trying to re-parent the SCSI device from the enclosure to the SAS > PHY which has already been deleted. This obviously ends in sadness. > > The fix appears to be simple: just call scsi_remove_host() before we call > sas_port_delete() and/or sas_remove_host(). > ... > diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c > @@ -8149,6 +8149,8 @@ void scsih_remove(struct pci_dev *pdev) > _scsih_raid_device_remove(ioc, raid_device); > } > > + scsi_remove_host(shost); > + > /* free ports attached to the sas_host */ > list_for_each_entry_safe(mpt3sas_port, next_port, > &ioc->sas_hba.sas_port_list, port_list) { > @@ -8172,7 +8174,6 @@ void scsih_remove(struct pci_dev *pdev) > } > > sas_remove_host(shost); > - scsi_remove_host(shost); > mpt3sas_base_detach(ioc); > spin_lock(&gioc_lock); > list_del(&ioc->list); That change matches the pattern of all other drivers calling sas_remove_host, except for one: drivers/message/fusion/mptsas.c That consensus means this comment in drivers/scsi/scsi_transport_sas.c is wrong: /** * sas_remove_host - tear down a Scsi_Host's SAS data structures * @shost: Scsi Host that is torn down * * Removes all SAS PHYs and remote PHYs for a given Scsi_Host. * Must be called just before scsi_remove_host for SAS HBAs. */ -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Fix for hang of Ordered task in TCM
If a command with a Simple task attribute is failed due to a Unit Attention, then a subsequent command with an Ordered task attribute will hang forever. The reason for this is that the Unit Attention status is checked for in target_setup_cmd_from_cdb, before the call to target_execute_cmd, which calls target_handle_task_attr, which in turn increments dev->simple_cmds. However, transport_generic_request_failure still calls transport_complete_task_attr, which will decrement dev->simple_cmds. In this case, simple_cmds is now -1. So when a command with the Ordered task attribute is sent, target_handle_task_attr sees that dev->simple_cmds is not 0, so it decides it can't execute the command until all the (nonexistent) Simple commands have completed. The solution I've implemented is to move target_scsi3_ua_check, as well as target_alua_state_check and target_check_reservation, into target_execute_cmd, after the call to target_handle_task_attr. I believe this is actually the correct way this should be handled. According to SAM-4 r14, under section 5.14: "h) if a command other than INQUIRY, REPORT LUNS, REQUEST SENSE, or NOTIFY DATA TRANSFER DEVICE enters the enabled command state while a unit attention condition exists for the SCSI initiator port associated with the I_T nexus on which the command was received, the device server shall terminate the command with a CHECK CONDITION status. The device server shall provide sense data that reports a unit attention condition for the SCSI initiator port that sent the command on the I_T nexus." But according to section 8.5 and 8.6, a command which is not yet executed because of the presence of other tasks in the task set (i.e., one for which target_handle_task_attr returns true) would not enter the enabled command state; it would be in the dormant command state. target_execute_cmd would get called when a command entered the enabled command state, and thus that is the appropriate place to check for Unit Attenion. Similarly, though not quite as explicit, section 5.3.3 tells us that a Reservation Conflict status has a lower precedence than a Unit Attention, and so this would also seem to be the appropriate place to call target_check_reservation. I'm less sure about target_alua_state_check, since I'm not very familiar with ALUA. Signed-off-by: Michael Cyr --- drivers/target/target_core_transport.c | 41 -- 1 file changed, 24 insertions(+), 17 deletions(-) diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c index 6c089af..2ee5502 100644 --- a/drivers/target/target_core_transport.c +++ b/drivers/target/target_core_transport.c @@ -1303,23 +1303,6 @@ target_setup_cmd_from_cdb(struct se_cmd *cmd, unsigned char *cdb) trace_target_sequencer_start(cmd); - /* -* Check for an existing UNIT ATTENTION condition -*/ - ret = target_scsi3_ua_check(cmd); - if (ret) - return ret; - - ret = target_alua_state_check(cmd); - if (ret) - return ret; - - ret = target_check_reservation(cmd); - if (ret) { - cmd->scsi_status = SAM_STAT_RESERVATION_CONFLICT; - return ret; - } - ret = dev->transport->parse_cdb(cmd); if (ret == TCM_UNSUPPORTED_SCSI_OPCODE) pr_warn_ratelimited("%s/%s: Unsupported SCSI Opcode 0x%02x, sending CHECK_CONDITION.\n", @@ -1865,6 +1848,8 @@ static int __transport_check_aborted_status(struct se_cmd *, int); void target_execute_cmd(struct se_cmd *cmd) { + sense_reason_t ret; + /* * Determine if frontend context caller is requesting the stopping of * this command for frontend exceptions. @@ -1899,6 +1884,28 @@ void target_execute_cmd(struct se_cmd *cmd) return; } + /* +* Check for an existing UNIT ATTENTION condition +*/ + ret = target_scsi3_ua_check(cmd); + if (ret) { + transport_generic_request_failure(cmd, ret); + return; + } + + ret = target_alua_state_check(cmd); + if (ret) { + transport_generic_request_failure(cmd, ret); + return; + } + + ret = target_check_reservation(cmd); + if (ret) { + cmd->scsi_status = SAM_STAT_RESERVATION_CONFLICT; + transport_generic_request_failure(cmd, ret); + return; + } + __target_execute_cmd(cmd); } EXPORT_SYMBOL(target_execute_cmd); -- 2.5.0 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] megaraid: fix error code check of register_chrdev()
There is invalid error code check of register_chrdev() in megaraid_init(). register_chrdev() returns negative code in case of error, as a result current code can try to unregister_chrdev() with error code instead of major that may lead to unregistering somebody else's chardev. Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Alexey Khoroshilov --- drivers/scsi/megaraid.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/scsi/megaraid.c b/drivers/scsi/megaraid.c index 9d05302a3bcd..ded082942ca0 100644 --- a/drivers/scsi/megaraid.c +++ b/drivers/scsi/megaraid.c @@ -4702,7 +4702,7 @@ static int __init megaraid_init(void) * major number allocation. */ major = register_chrdev(0, "megadev_legacy", &megadev_fops); - if (!major) { + if (major < 0) { printk(KERN_WARNING "megaraid: failed to register char device\n"); } @@ -4715,7 +4715,10 @@ static void __exit megaraid_exit(void) /* * Unregister the character device interface to the driver. */ - unregister_chrdev(major, "megadev_legacy"); + if (major > 0) { + unregister_chrdev(major, "megadev_legacy"); + major = 0; + } pci_unregister_driver(&megaraid_pci_driver); -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] megaraid: add scsi_cmnd NULL check before use
On Thu, 2016-05-12 at 09:35 +0300, Dan Carpenter wrote: > On Wed, May 11, 2016 at 08:49:51PM -0500, Petros Koutoupis wrote: > > Sumit, > > > > I will resubmit the patch with all the recommendations. Thank you. In case > > you are interested, I have a crash file showcasing the error. I can always > > provide this outside of this mailing thread. > > > > Please send it to the thread. > > To be honest, I totally can't understand this thread. Sumit says it is > impossible and you are saying that you have seen it happen in real life. > Are you using out of tree code or something? What are you doing that is > unexpected? > > I don't see the point of adding a WARN_ON(). NULL derefs normally > generate a pretty clear stack trace already (unless they are caused by > memory corruption). Why are we not just fixing the bugs instead of > warning and then crashing. > > Also when I'm doing static analysis people always tell me that "that bug > is impossible, trust me." and instead of trusting people I really wish > they would just show me the relevant code that prevents it from > happening. > > regards, > dan carpenter > All, You will find the tarball containing the crash and kernel images here: https://drive.google.com/open?id=0B771xrWtHcpWWFI3Yl93WFBfT28 Note that it is a 7GB file. -- Petros -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html