Martin, James,
On 02/22/2018 01:07 AM, Martin K. Petersen wrote:
The first patch prevents the SCSI error handlers from running once the
shutdown/unload path starts. This avoids an oops at least in the host
reset handler, on kernels with a recent patch, and also in the abort
handler on kernels without
ht help, so still go for the changes.
Also, this might help to prevent similar errors in the future,
in case code changes and possibly tries to access freed stuff.
Note the fix in scsih_host_reset() is still important anyway.
Signed-off-by: Mauricio Faria de Oliveira
---
drivers/scsi/mpt3sas
to
be queued) then it flushes commands that might still be running.
This avoids triggering error handling (e.g., abort command) for
all commands possibly completed by the adapter after interrupts
are disabled.
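A minimal user-space sketch of the flush step described above, under stated assumptions: `struct cmd`, `flush_running_cmds()` and `RESULT_NO_CONNECT` are hypothetical names for illustration, not the driver's own. The idea is that the shutdown path completes every still-outstanding command itself, so none is left to time out and enter error handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CMDS          4
#define RESULT_NO_CONNECT 1   /* illustrative stand-in for an error result */

/* Hypothetical command slot; a real driver tracks more state per command. */
struct cmd {
    bool in_flight;   /* issued to the adapter, not completed yet */
    int  result;      /* 0 = no result recorded yet */
};

/* Complete every outstanding command; return how many were flushed. */
size_t flush_running_cmds(struct cmd cmds[], size_t n)
{
    size_t flushed = 0;

    assert(cmds != NULL);
    for (size_t i = 0; i < n; i++) {
        if (!cmds[i].in_flight)
            continue;
        cmds[i].result = RESULT_NO_CONNECT;  /* fail it here, not in EH */
        cmds[i].in_flight = false;
        flushed++;
    }
    return flushed;
}
```

Calling it a second time on the same array flushes nothing, since no command is in flight any more.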
Suggested-by: Sreekanth Reddy
Tested-by: Mauricio Faria de Oliveira
Signed-off-by: Sr
.
Mauricio Faria de Oliveira (2):
scsi: mpt3sas: fix oops in error handlers after shutdown/unload
scsi: mpt3sas: wait for and flush running commands on shutdown/unload
drivers/scsi/mpt3sas/mpt3sas_base.c | 8
drivers/scsi/mpt3sas/mpt3sas_base.h | 3 +++
drivers/scsi/mpt3sas
Hi Sreekanth,
On 02/15/2018 03:48 AM, Sreekanth Reddy wrote:
During shutdown, I don't want the outstanding IOs to time out because
interrupts are disabled and then go down the TM path. So I wanted to clear out
all the outstanding IOs in the shutdown path itself instead of clearing them
in the TM path.
h), and that check is already used in many other points in the code,
for the same reasons (exit early before the code attempts to use stuff
that might be released).
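To illustrate the early-exit idea, here is a minimal user-space sketch. The names `struct adapter`, `teardown_started` and `eh_host_reset()` are hypothetical and only model the pattern; they are not the mpt3sas fields or functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative return codes; the kernel uses SUCCESS/FAILED from scsi.h. */
enum eh_result { EH_SUCCESS, EH_FAILED };

/* Hypothetical, pared-down adapter state (not the real mpt3sas structure). */
struct adapter {
    bool teardown_started;  /* set first thing in the shutdown/unload path */
    int *reply_queue;       /* stands in for a resource freed during teardown */
};

/* Error handler: exit early before touching anything teardown may have freed. */
enum eh_result eh_host_reset(struct adapter *a)
{
    assert(a != NULL);
    if (a->teardown_started)
        return EH_FAILED;

    a->reply_queue[0] = 0;  /* only reached while resources are still valid */
    return EH_SUCCESS;
}
```

With the flag set, the handler returns before dereferencing `reply_queue`, which by then may already be released.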
Thanks again,
--
Mauricio Faria de Oliveira
IBM Linux Technology Center
The test-case results with PATCH v2.
scsih_abort()
=============
Without patch:
[ 362.669743] setting logging_level(0x1000)
[ 362.705074] mpt3sas_cm0: skip free_smid/scsi_done scmd(c01fd4f2bd40)
[ 363.956579] sd 16:0:1:0: [sdf] Synchronizing SCSI cache
[ 363.956844] s
atch might help, so still go for the changes.
Also, this might help to prevent similar errors in the future,
in case code changes and possibly tries to access freed stuff.
Note the fix in scsih_host_reset() is still important anyway.
Signed-off-by: Mauricio Faria de Oliveira
---
v2:
- rebase o
On 01/31/2018 08:50 PM, Bart Van Assche wrote:
I think it would be useful to have some variant of the above code in the kernel
tree. Are you familiar with the fault injection framework (see also
and Documentation/fault-injection/fault-injection.txt)?
No, not yet. That's very interesting.
Do
On 01/31/2018 08:59 PM, Bart Van Assche wrote:
On Wed, 2018-01-31 at 17:48 -0200, Mauricio Faria de Oliveira wrote:
On 01/31/2018 05:06 PM, Bart Van Assche wrote:
Sorry but I think this patch introduces new race conditions. Have you
Can you detail the race conditions? As far as I can see
Bart,
Thanks for reviewing.
On 01/31/2018 05:06 PM, Bart Van Assche wrote:
Sorry but I think this patch introduces new race conditions. Have you
Can you detail the race conditions? As far as I can see, the only race
condition would be when an error handler is invoked very close in time
to the
This patch can be verified with this simple test-case,
which inserts a wait loop at the bottom of 'scsih_shutdown()'
and forces SCSI commands to timeout (skip 'scmd->scsi_done()').
It abuses the 'ioc->logging_level' parameter to do that, with:
- 0x1000: wait loop on scsih_shutdown() and skip s
tate()').
The device reset and target reset handlers do not cause oopses,
but print a misleading message of host reset in progress, thus
fix those too.
Signed-off-by: Mauricio Faria de Oliveira
---
drivers/scsi/mpt3sas/mpt3sas_scsih.c | 28
1 file changed, 28 inse
On 07/11/2017 12:32 PM, Mauricio Faria de Oliveira wrote:
Also, it seems the Unavailable/Standby states would not be logged
without a recheck from alua_check_sense(), since the only callers
of alua_rtpg_queue() are alua_activate() and alua_check[_sense]()
Well, actually it does get logged if
kers won't go through that function.
(and it occurred to me that the state-change check of patch 3 can
be done there, simpler.)
cheers,
--
Mauricio Faria de Oliveira
IBM Linux Technology Center
On 07/10/2017 07:47 PM, Mauricio Faria de Oliveira wrote:
This patchset addresses that problem, and adds a few improvements
to the logging of PG state changes.
Here are some kernel log snippets with the patchset, if that helps.
The 2 port groups temporarily went into the unavailable state, and
es)
part, which is only printed for the current PG.
Signed-off-by: Mauricio Faria de Oliveira
---
v2:
- use lockdep_assert_held() instead of documenting locking conventions
(Bart Van Assche)
- define two functions (with/without supported states information)
(Bart Van Assche)
- simplify wh
g/active state again,
the PG state is normally updated on path activation (alua_activate(),
as it schedules a recheck), thus I/O requests are no longer failed.
Signed-off-by: Mauricio Faria de Oliveira
Reported-by: Naresh Bannoth
---
v2:
- also add support for standby state to alua_check_s
he recheck scheduled in alua_check_sense() to update PG state.
So, do not print such a message if unavailable/standby state remains
(i.e., the PG did not transition to/from such states). All other cases
continue to be printed.
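The rule above can be sketched as a small predicate. This is an illustration only: the enum and `should_print_pg_state()` are hypothetical names, and the numeric values are assumed to follow the usual ALUA access-state encoding.

```c
#include <assert.h>
#include <stdbool.h>

/* Subset of ALUA access states (values assumed for illustration). */
enum alua_state {
    ALUA_ACTIVE_OPTIMIZED     = 0x0,
    ALUA_ACTIVE_NON_OPTIMIZED = 0x1,
    ALUA_STANDBY              = 0x2,
    ALUA_UNAVAILABLE          = 0x3,
};

static bool is_quiet_state(enum alua_state s)
{
    return s == ALUA_STANDBY || s == ALUA_UNAVAILABLE;
}

/*
 * Suppress the message only when the port group stays in the same
 * standby/unavailable state; every transition is still printed.
 */
bool should_print_pg_state(enum alua_state prev, enum alua_state cur)
{
    assert(prev <= ALUA_UNAVAILABLE && cur <= ALUA_UNAVAILABLE);
    if (is_quiet_state(cur) && prev == cur)
        return false;
    return true;
}
```

In this sketch a repeated unavailable state yields `false`, while a change from unavailable to an active state, or from standby to unavailable, yields `true`.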
Signed-off-by: Mauricio Faria de Oliveira
---
v2:
- changed v1's alu
Insert sdev_dbg() calls in the function path which may queue
alua_rtpg_work() past initialization, for debugging purposes:
- alua_activate()
- alua_check_sense()
- alua_rtpg_queue()
Signed-off-by: Mauricio Faria de Oliveira
---
drivers/scsi/device_handler/scsi_dh_alua.c | 14 --
1
in unavailable/standby
are not logged - only changes are.
Patch 4 adds a few sdev_dbg() calls to track the path to alua_rtpg_work()
Tested on v4.12+ (commit b4b8cbf679c4).
Mauricio Faria de Oliveira (4):
scsi: scsi_dh_alua: allow I/O in target port unavailable and standby
states
scsi
This is the PATCH v2. Sorry for the wrong subject line.
On 04/11/2017 11:46 AM, Mauricio Faria de Oliveira wrote:
Signed-off-by: Mauricio Faria de Oliveira
Acked-by: Brian King
---
v2:
- use the scsi_cmd local variable rather than ipr_cmd->scsi_cmd dereference.
- add Acked-by: Brian K
01F80 dm-6 IBM ,IPR-0 59C2AE00
size=5.2T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=0 status=active
| `- 2:2:7:0 sdaf 65:240 active undef running
`-+- policy='service-time 0' prio=0 status=enab
ointer, so could be:
Thanks for catching that oversight.
--
Mauricio Faria de Oliveira
IBM Linux Technology Center
01F80 dm-6 IBM ,IPR-0 59C2AE00
size=5.2T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=0 status=active
| `- 2:2:7:0 sdaf 65:240 active undef running
`-+- policy='service-time 0' prio=0 status=ena
On 04/10/2017 10:17 PM, Mauricio Faria de Oliveira wrote:
For documentation purposes, I'll reply to this cover letter with the analysis
of such cases of this problem, and the accompanying messages from kernel logs.
Here it goes, for anyone interested.
Scenario: 4 LUNs, 2 target port g
handler structures is used to
find a valid scsi_device that is associated to this port group.
Signed-off-by: Mauricio Faria de Oliveira
---
drivers/scsi/device_handler/scsi_dh_alua.c | 12
1 file changed, 12 insertions(+)
diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c
b
ional), in
which case the 'valid_states' information is not printed. That
is for the following patch too.
Signed-off-by: Mauricio Faria de Oliveira
---
drivers/scsi/device_handler/scsi_dh_alua.c | 43 ++
1 file changed, 32 insertions(+), 11 deletions(-)
diff
d the accompanying messages from kernel logs.
Mauricio Faria de Oliveira (4):
scsi: scsi_dh_alua: allow I/O in the target port unavailable state
scsi: scsi_dh_alua: create alua_rtpg_print() for alua_rtpg()
sdev_printk
scsi: scsi_dh_alua: print changes to RTPG state of other PGs too
scsi: s
whenever
the unavailable state is detected so pg->state can be updated properly
(and further SCSI IO error messages then silenced through alua_prep_fn()).
Once a path checker eventually detects an active state again, the port
group state will be updated by the path activation call, alua_activat
be printed.
Signed-off-by: Mauricio Faria de Oliveira
---
drivers/scsi/device_handler/scsi_dh_alua.c | 26 --
1 file changed, 24 insertions(+), 2 deletions(-)
diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c
b/drivers/scsi/device_handler/scsi_dh_alua.c
index c2
On 04/05/2017 01:23 PM, Song Liu wrote:
Reviewed-by: Song Liu
Thanks for reviewing, Song Liu.
It's good to know this patch doesn't break anything for you.
cheers,
--
Mauricio Faria de Oliveira
IBM Linux Technology Center
le that more properly, set the initial power state
value to '-1' (i.e., uninitialized) instead of '1' (power 'on'),
and check for it in that callback which may do a direct access
to the field value _if_ a callback function is not defined.
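A minimal sketch of the '-1 means uninitialized' approach, assuming hypothetical names (`struct component`, `read_power_status()`, and the two sample callbacks). The specific error codes returned are an assumption made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct component {
    int power_status;   /* -1 = uninitialized, 0 = off, 1 = on */
};

/* Optional callback that refreshes power_status; it may be absent or fail. */
typedef void (*get_power_fn)(struct component *);

/* Hypothetical callbacks used only to exercise the sketch. */
void cb_reports_on(struct component *c) { c->power_status = 1; }
void cb_fails(struct component *c)      { (void)c; /* leaves value untouched */ }

/* Returns 0/1 for off/on, or a negative errno if no value was ever obtained. */
int read_power_status(struct component *c, get_power_fn cb)
{
    assert(c != NULL);
    if (cb)
        cb(c);
    if (c->power_status == -1)          /* still uninitialized */
        return cb ? -EIO : -ENOTTY;     /* callback failed vs. none defined */
    return c->power_status;
}
```

Starting from -1 lets the reader tell "never updated" apart from a real "on" value, which an initial value of 1 cannot do.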
Signed-off-by: Mauricio F
On 04/05/2017 11:41 AM, Dan Williams wrote:
On Wed, Apr 5, 2017 at 6:13 AM, Mauricio Faria de Oliveira
wrote:
1) imagine .get_power_status couldn't update the 'power_status' field
(it's a bit unlikely with the in-tree ses driver, but in the case
that ses_get_page2_
nitialize iocb list %d.\n",
phba->cfg_iocb_cnt*1024);
cheers,
--
Mauricio Faria de Oliveira
IBM Linux Technology Center
ed not to be required in this case,
I'd be fine with dropping the related changes and making this patch
a one-liner. :-)
cheers,
--
Mauricio Faria de Oliveira
IBM Linux Technology Center
Hi Martin and Junichi,
On 04/03/2017 11:10 PM, Junichi Nomura wrote:
On 04/04/17 06:53, Mauricio Faria de Oliveira wrote:
On 03/28/2017 11:29 PM, Junichi Nomura wrote:
Since commit 895427bd012c ("scsi: lpfc: NVME Initiator: Base modifications"),
"rmmod lpfc" starti
t sent
([PATCH] lpfc: fix double free of bound CQ/WQ ring pointer) resolves it?
I don't have a setup to test it handy right now.
cheers,
--
Mauricio Faria de Oliveira
IBM Linux Technology Center
qidx]->pring = pring;
commit 85e8a23936ab ("scsi: lpfc: Add shutdown method for kexec") made
this more likely as lpfc_pci_remove_one() is called on driver shutdown
(e.g., modprobe -r / rmmod).
(this patch is partially based on a different patch suggested by Johannes,
thus adding a Sug
turned on (disabled by default).
Also, to handle that more properly, set the initial power state
value to '-1' (i.e., uninitialized) instead of '1' (power 'on'),
and check for it in that callback which may do a direct access
to the field value _if_ a callback function
On 03/13/2017 11:48 AM, Hannes Reinecke wrote:
This is assuming that we're always running on a scsi_disk, and that
scsi_disk is the only one implementing 'eh_action'.
Neither of which is necessarily true.
Ah, OK. Thanks for explaining.
--
Mauricio Faria de Oliveira
IBM L
e commands but TEST UNIT
diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
index 4dac35e..6a4f75a 100644
--- a/drivers/scsi/sd.h
+++ b/drivers/scsi/sd.h
@@ -106,6 +106,7 @@ struct scsi_disk {
unsigned rc_basis: 2;
unsigned zoned: 2;
unsigned urswrz : 1;
ange if they see fit/required.
[1] http://www.spinics.net/lists/linux-scsi/msg105886.html
cheers,
--
Mauricio Faria de Oliveira
IBM Linux Technology Center
t I'd present/ask for consideration too.
I think I should have included this in the tested-by tag email, for
documentation/evidence: no regression observed in system shutdown path.
Thanks,
--
Mauricio Faria de Oliveira
IBM Linux Technology Center
sorry; I missed checking the right tree. Thanks for the pointers.
--
Mauricio Faria de Oliveira
IBM Linux Technology Center
On 02/12/2017 07:49 PM, Anton Blanchard wrote:
We see lpfc devices regularly fail during kexec. Fix this by adding
a shutdown method which mirrors the remove method.
Reviewed-by: Mauricio Faria de Oliveira
Tested-by: Mauricio Faria de Oliveira
@mkp, @jejb: could you please flag this patch for
Hi Martin and James,
On 02/12/2017 07:52 PM, James Smart wrote:
Correct WQ creation for pagesize
Reviewed-by: Mauricio Faria de Oliveira
Please flag this patch for stable.
This patch resolves a serious problem on IBM Power systems at least.
An (apparently constant) series of invalid
received partially
updated WQE data.
Add the memory barrier after updating the WQE memory.
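The ordering problem can be modeled in user space with C11 atomics. This is only a sketch under assumptions: `struct work_queue` and `post_wqe()` are hypothetical, the "doorbell" is an ordinary atomic rather than a device register, and the release fence stands in for the barrier a driver would use.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define WQE_WORDS 16

/* Hypothetical work queue: one entry plus a doorbell the consumer polls. */
struct work_queue {
    uint32_t    wqe[WQE_WORDS];
    atomic_uint doorbell;   /* producer index; stands in for an MMIO register */
};

/* Write the entry, then order those stores before the doorbell update. */
void post_wqe(struct work_queue *q, const uint32_t *src)
{
    assert(q != NULL && src != NULL);
    memcpy(q->wqe, src, sizeof(q->wqe));

    /* Without this fence the consumer could see the doorbell first. */
    atomic_thread_fence(memory_order_release);

    atomic_fetch_add_explicit(&q->doorbell, 1, memory_order_relaxed);
}
```

A consumer that reads the doorbell with acquire semantics and then reads the entry is guaranteed to see the fully written entry, not a partially updated one.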
Reviewed-by: Mauricio Faria de Oliveira
Martin, could you please flag this patch for stable?
Thank you,
--
Mauricio Faria de Oliveira
IBM Linux Technology Center
, please feel free to change the sign-off line as
appropriate here.
Thanks,
--
Mauricio Faria de Oliveira
IBM Linux Technology Center
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo inf
x280 [qla2xxx]
qla2x00_abort_isp+0xef/0x690 [qla2xxx]
qla2x00_do_dpc+0x36c/0x880 [qla2xxx]
kthread+0x10c/0x140
Note: this patch is a slight change of the original patch
sent by Bart, submitted by request of mkp.
Signed-off-by: Mauricio Faria de Oliveira
Reported-by: Bart Van
re_lock, flags);
- qla2xxx_eh_abort(GET_CMD_SP(sp));
+ qla2xxx_eh_abort(scmd);
spin_lock_irqsave(&ha->hardware_lock, flags);
}
req->outstanding_cmds[cn
or it didn't. Regardless of how the I/O is
being broken up into frames at the transport level and at which offset
the transfer was interrupted.
Christoph, Hannes, Martin,
Thank you all for your comments and pointers to the documentation/spec.
I'll carry it on with the HBA and stora
ag#0 sd_done: completed 4096 of 4096 bytes
[...] sd 0:0:0:0: [sda] tag#0 8 sectors total, 4096 bytes done.
[...] sd 0:0:0:0: tag#0 0 sectors total, 0 bytes done.
Apologies for the ridiculously long commit message with description and
test-cases, but this problem has been relatively difficul
update_request: I/O error, dev sda, sector 17096824
Links:
[1]
http://git.qemu.org/?p=qemu.git;a=commit;h=336a6915bc7089fb20fea4ba99972ad9a97c5f52
[2] https://libvirt.org/formatdomain.html#elementsDisks (see 'disk' -> 'device')
Signed-off-by: Mauricio Faria de Oliveira
Sig
On 11/23/2016 12:12 PM, Johannes Thumshirn wrote:
Looks good and sorry for the bug,
Reviewed-by: Johannes Thumshirn
Thanks for the quick review. Not a problem!
This problem turned out to be a good learning exercise. :)
--
Mauricio Faria de Oliveira
IBM Linux Technology Center
Due credit; an oversight.
On 11/23/2016 10:33 AM, Mauricio Faria de Oliveira wrote:
Reported-by: Harsha Thyagaraja
Cc: sta...@vger.kernel.org # v4.8
Fixes: 22466da5b4b7 ("lpfc: Fix possible NULL pointer dereference")
Signed-off-by: Mauricio Faria de Oliveira
--
Maurici
] [...] kthread+0x108/0x130
[...] [...] ret_from_kernel_thread+0x5c/0xbc
<...>
Cc: sta...@vger.kernel.org # v4.8
Fixes: 22466da5b4b7 ("lpfc: Fix possible NULL pointer dereference")
Signed-off-by: Mauricio Faria de Oliveira
---
drivers/scsi/lpfc/lpfc_sli.c | 14 -
s; sorry for this oversight.)
With it applied, both PCI device remove and EEH recovery work fine.
Fixes: 1535aa75a3d8 ("scsi: qla2xxx: fix invalid DMA access after
command aborts in PCI device remove")
Signed-off-by: Mauricio Faria de Oliveira
---
drivers/scsi/qla2xxx/qla_os.
;
qla2xxx [001d:70:00.0]-801c:1: Abort command issued nexus=1:0:0 -- 1 2003.
<...>
qla2xxx [001d:70:00.1]-801c:2: Abort command issued nexus=2:3:0 -- 1 2003.
<...>
(command does return; adapter can be re-added correctly)
Mauricio Faria de Oliveira (2):
qla2
700a10] qla2xxx_queuecommand+0x50/0x3f0 [qla2xxx]
So, fail commands in qla2xxx_queuecommand() if the UNLOADING bit is set.
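A minimal sketch of that check, with hypothetical names throughout (`struct host`, `FLAG_UNLOADING`, `queue_command()`, `RESULT_NO_CONNECT`); it models the idea only and is not the driver code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FLAG_UNLOADING    (1UL << 0)  /* hypothetical bit, set when unload starts */
#define RESULT_NO_CONNECT 1           /* illustrative error result for the command */

struct host {
    unsigned long flags;
    unsigned int  queued;   /* commands actually handed to the hardware */
};

/* Returns true if the command was queued, false if it was failed up front. */
bool queue_command(struct host *h, int *cmd_result)
{
    assert(h != NULL && cmd_result != NULL);
    if (h->flags & FLAG_UNLOADING) {
        *cmd_result = RESULT_NO_CONNECT;  /* fail it without touching hardware */
        return false;
    }
    h->queued++;
    return true;
}
```

Once the flag is set, new commands are completed with an error immediately and never reach structures that the unload path is tearing down.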
Signed-off-by: Mauricio Faria de Oliveira
---
drivers/scsi/qla2xxx/qla_os.c | 5 +
1 file changed, 5 insertions(+)
diff --git a/drivers/scsi/qla2xxx/qla_os.c b/driver
quests and handle responses.
Reported-by: Naresh Bannoth
Signed-off-by: Mauricio Faria de Oliveira
---
drivers/scsi/qla2xxx/qla_os.c | 9 +
1 file changed, 9 insertions(+)
diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
index fdb135b..c50dd22 100644
--- a/dr
got fixed,
and it happens in normal scenarios (e.g., SCSI EH), it seems appropriate.
Thanks,
--
Mauricio Faria de Oliveira
IBM Linux Technology Center
received other technical
problems, for example.
Thanks for the review/comments (Christoph too),
--
Mauricio Faria de Oliveira
IBM Linux Technology Center
me distros where ppc64/le
usually runs, on which it would be easier to adapt this relatively
small change than moving forward w/ blk-mq/scsi-mq, for example --
even if the latter is clearly a superior approach.
[1] http://lists.infradead.org/pipermail/linux-nvme/2016-June/005012.html
--
Maurici
On 06/01/2016 05:43 PM, Mauricio Faria de Oliveira wrote:
Tested on next-20160601 (with an extra commit for patch 1/2, see commit msg).
FYI, that commit has been accepted into powerpc next [1].
[1] https://git.kernel.org/powerpc/c/f8ab481066e7246e4b272233aa
--
Mauricio Faria de Oliveira
IBM
On 06/16/2016 09:35 AM, Mauricio Faria de Oliveira wrote:
On 06/15/2016 01:34 PM, Brian King wrote:
[...] did you ever try to reproduce this with an
upstream
kernel?
Now that the topic is under discussion, I've asked for some time slots
on that system, so we can test an upstream kerne
analyze it more closely.
--
Mauricio Faria de Oliveira
IBM Linux Technology Center
lpfc 0006:01:00.4: 4:(0):0713 SCSI layer issued Device Reset (0, 0) return x2002
<...>
lpfc 0006:01:00.4: 4:(0):0723 SCSI layer issued Target Reset (1, 0) return x2002
<...>
lpfc 0006:01:00.4: 4:(0):0714 SCSI layer issued Bus Reset Data: x2002
<...>
lp
ent with large steps on some systems).
While in there, include the CPU number in the debug message, which
helps when reading it on systems with many CPUs.
This depends on commit 'powerpc: export cpu_to_core_id()' (submitted
to the linuxppc-dev mailing list). Tested on next-20160601 w/ com
(topology information),
which has server processors with many cores/threads and per-core caches.
Although the series includes bits for PowerPC64, the per-core scheduling patch
is architecture-independent.
Tested on next-20160601 (with an extra commit for patch 1/2, see commit msg).
Mauricio Faria de
Tested on next-20160601.
Signed-off-by: Mauricio Faria de Oliveira
---
drivers/scsi/lpfc/lpfc_attr.c | 8 +--
drivers/scsi/lpfc/lpfc_hw4.h | 1 +
drivers/scsi/lpfc/lpfc_init.c | 54 ++-
drivers/scsi/lpfc/lpfc_scsi.c | 3 ++-
4 files changed, 62 insert
ontext for linux-scsi so they don't need to track
down other mails (btw, thanks for the detailed patch header but it
enabled me to be skeptical of your request to revert):
You're welcome. If it's been useful for rejecting this patch and
getting a better one later, it's worth i
Commit f3ddac1918fe963bcbf8d407a3a3c0881b47248b ("[SCSI] qla2xxx:
Disable adapter when we encounter a PCI disconnect.") introduced
code that disables the board, releasing some resources, when reading
0x.
In case this happens when there is an EEH, this read will trigger EEH
detection