On 02/20/10 04:16, Alexander Motin wrote:
Lawrence Stewart wrote:
A couple of times it has gotten even more upset reporting things like this:

mpt0: mpt_cam_event: 0x16
mpt0: mpt_cam_event: 0x16
mpt0: request 0xffffff80002f1400:54058 timed out for ccb
0xffffff0001c65000 (req->ccb 0xffffff0001c65000)
mpt0: attempting to abort req 0xffffff80002f1400:54058 function 0
mpt0: request 0xffffff80002fd100:54059 timed out for ccb
0xffffff009f3ec800 (req->ccb 0xffffff009f3ec800)
mpt0: request 0xffffff80002efcf0:54060 timed out for ccb
0xffffff0001bd2000 (req->ccb 0xffffff0001bd2000)
mpt0: mpt_recover_commands: IOC Status 0x4a. Resetting controller.
mpt0: mpt_cam_event: 0x0
mpt0: mpt_cam_event: 0x0
mpt0: completing timedout/aborted req 0xffffff80002f1400:54058
mpt0: completing timedout/aborted req 0xffffff80002fd100:54059
mpt0: completing timedout/aborted req 0xffffff80002efcf0:54060
mpt0: mpt_cam_event: 0x16
mpt0: mpt_cam_event: 0x12
mpt0: mpt_cam_event: 0x12
mpt0: mpt_cam_event: 0x16
mpt0: Volume(0:2): Volume Status Changed
mpt0: request 0xffffff80002f8990:0 timed out for ccb 0xffffff009f3cb800
(req->ccb 0)

No ill effects are observed after such an episode and the array remains
in a healthy, normal state. The only observable problem is the stall of
all disk IO while these events occur.

I have no idea how the mpt driver works, nor do I have the hardware to
experiment with, but a quick look shows that event 0x12 is
MPI_EVENT_SAS_PHY_LINK_STATUS and 0x16 is MPI_EVENT_SAS_DISCOVERY. Neither
is handled by the mpt driver, so both are simply logged. I would say
something is going on at the physical level of your SAN. The timeouts
could also be the result of physical issues.
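
For anyone reading a long console log, the two event codes identified above can be picked out mechanically. The following is only a rough sketch: it assumes the log lines have exactly the "mptN: mpt_cam_event: 0x.." shape shown in this thread, and it knows only the two names mentioned here. The helper name decode_events is made up for illustration and is not part of the driver.

```python
import re

# Only the two codes named in this thread are decoded; everything else
# is reported as "unknown" rather than guessed.
KNOWN_EVENTS = {
    0x12: "MPI_EVENT_SAS_PHY_LINK_STATUS",
    0x16: "MPI_EVENT_SAS_DISCOVERY",
}

# Assumed line shape, taken from the console output quoted in the thread.
EVENT_RE = re.compile(r"^(mpt\d+): mpt_cam_event: (0x[0-9a-fA-F]+)$")


def decode_events(lines):
    """Return a (device, code, name) tuple for each mpt_cam_event line."""
    decoded = []
    for line in lines:
        match = EVENT_RE.match(line.strip())
        if match is None:
            continue  # not an mpt_cam_event line (timeouts, volume status, ...)
        code = int(match.group(2), 16)
        decoded.append((match.group(1), code, KNOWN_EVENTS.get(code, "unknown")))
    return decoded


sample = [
    "mpt0: mpt_cam_event: 0x16",
    "mpt0: mpt_cam_event: 0x12",
    "mpt0: mpt_cam_event: 0x0",
    "mpt0: Volume(0:2): Volume Status Changed",
]

for device, code, name in decode_events(sample):
    print(f"{device}: event {code:#04x} -> {name}")
```

Counting how often each event appears around a stall may help show whether the link-status and discovery events always precede the timeouts.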

Ok, I'll try and figure out what's possibly going on.


As best I can tell, the hardware is ok: both disks report as fine
without SMART errors and are only 2 months old, so I wanted to rule out
software issues. On upgrading to recent 8-STABLE, I got a page fault
kernel panic on boot in the mpt driver's mpt_raid0 kproc. After some trial
and error, r203888 is the most recent revision that boots fine, whilst
r203889 exhibits the page fault. I should also note that r203888 still
sees the "mpt0: mpt_cam_event: 0x16" messages and associated disk IO
stalls.

I compiled DDB into my r203889 kernel. Unfortunately my ILO emulates a
USB keyboard so I can't do anything in DDB which is a huge pain, but
here's the info I did get (hand transcribed):

Fatal trap 12: page fault while in kernel mode
current process: mpt_raid0
Stopped at xpt_rescan+0x1d:     movq   0x10(%rsi),%rdx

1. Any thoughts on how to resolve the regression in the mpt driver with
the r203889 commit?

Any thoughts where to find a good telepath? :)

To begin with, please show at least the verbose boot messages up to the
crash. The full panic message could also be useful: it may show the address
of the faulting instruction, which can be resolved to a source line with
the addr2line tool. If you could find a good old PS/2 keyboard, a backtrace
would be interesting to see.
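
Since DDB printed the fault location as symbol+offset (xpt_rescan+0x1d) rather than as a raw address, one way to get something addr2line accepts is to add the offset to the symbol's address from the kernel's symbol table. The sketch below assumes nm-style input lines of the form "<hex address> <type> <name>". The address in the sample line is invented for illustration, and the file name kernel.debug is an assumption about where the debug kernel lives.

```python
def parse_symbol_offset(location):
    """Split a DDB-style location such as 'xpt_rescan+0x1d' into (symbol, offset)."""
    symbol, _, offset = location.partition("+")
    return symbol, int(offset, 16) if offset else 0


def fault_address(nm_lines, location):
    """Find the symbol in nm-style lines and return its address plus the offset."""
    symbol, offset = parse_symbol_offset(location)
    for line in nm_lines:
        parts = line.split()
        if len(parts) == 3 and parts[2] == symbol:
            return int(parts[0], 16) + offset
    raise KeyError(f"symbol not found: {symbol}")


# HYPOTHETICAL symbol table line: this address is made up, not taken from
# the real r203889 kernel. Use the output of nm on the actual kernel instead.
nm_lines = ["ffffffff80200000 T xpt_rescan"]

address = fault_address(nm_lines, "xpt_rescan+0x1d")

# Print the command to run by hand; kernel.debug is an assumed file name.
print(f"addr2line -f -e kernel.debug {address:#x}")
```

The printed command still has to be run against the debug kernel built from the same revision that panicked, otherwise the resolved line will be wrong.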

Two issues:
- The server is in colocated rack space and is not easy to get to
- I'm not even sure that this server has PS/2 ports on it

Perhaps this commit should be backed out of 8-STABLE until we get a chance to diagnose a bit more?

Cheers,
Lawrence