Hi Alexander and all,

On 02/15/10 06:38, Alexander Motin wrote:
> Author: mav
> Date: Sun Feb 14 19:38:27 2010
> New Revision: 203889
> URL: http://svn.freebsd.org/changeset/base/203889
>
> Log:
>   MFC r203108:
>   Large set of CAM improvements:

[snip]

I've been having issues with the mpt-driven LSI SAS adapter in my SunFire X4100 server running FreeBSD 8-STABLE r202132. Under certain disk workloads, such as an svn update of the src tree or a kernel compile, the disk subsystem becomes extremely unresponsive, as if stalled, and /var/log/messages reports a number of these:

mpt0: mpt_cam_event: 0x16

It does eventually recover after a minute or two, even though the svn operation or build is still running. The stall/recover cycle may then repeat a few times, sometimes with minutes between events.

A couple of times it has gotten even more upset reporting things like this:

mpt0: mpt_cam_event: 0x16
mpt0: mpt_cam_event: 0x16
mpt0: request 0xffffff80002f1400:54058 timed out for ccb 0xffffff0001c65000 (req->ccb 0xffffff0001c65000)
mpt0: attempting to abort req 0xffffff80002f1400:54058 function 0
mpt0: request 0xffffff80002fd100:54059 timed out for ccb 0xffffff009f3ec800 (req->ccb 0xffffff009f3ec800)
mpt0: request 0xffffff80002efcf0:54060 timed out for ccb 0xffffff0001bd2000 (req->ccb 0xffffff0001bd2000)
mpt0: mpt_recover_commands: IOC Status 0x4a. Resetting controller.
mpt0: mpt_cam_event: 0x0
mpt0: mpt_cam_event: 0x0
mpt0: completing timedout/aborted req 0xffffff80002f1400:54058
mpt0: completing timedout/aborted req 0xffffff80002fd100:54059
mpt0: completing timedout/aborted req 0xffffff80002efcf0:54060
mpt0: mpt_cam_event: 0x16
mpt0: mpt_cam_event: 0x12
mpt0: mpt_cam_event: 0x12
mpt0: mpt_cam_event: 0x16
mpt0: Volume(0:2): Volume Status Changed
mpt0: request 0xffffff80002f8990:0 timed out for ccb 0xffffff009f3cb800 (req->ccb 0)

No ill effects are observed after such an episode, and the array remains in a healthy, normal state. The only observable problem is that all disk IO stalls while these events occur.
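For anyone wanting to quantify how often these events occur, here is a minimal sketch that tallies the mpt_cam_event codes and request timeouts in a log file. The regular expressions are based only on the message formats quoted above. The function name summarize and the command-line handling are illustrative assumptions, not part of any FreeBSD tool.

```python
import re
import sys
from collections import Counter

# Patterns modelled on the kernel messages quoted above. They use search-style
# matching because syslog prepends a timestamp and hostname to each line.
EVENT_RE = re.compile(r"mpt\d+: mpt_cam_event: (0x[0-9a-fA-F]+)")
TIMEOUT_RE = re.compile(r"mpt\d+: request 0x[0-9a-fA-F]+:\d+ timed out")


def summarize(lines):
    """Return (Counter of event codes, number of request timeouts)."""
    events = Counter()
    timeouts = 0
    for line in lines:
        # A single physical line may hold more than one message, so count
        # every occurrence rather than only the first.
        for match in EVENT_RE.finditer(line):
            events[match.group(1).lower()] += 1
        timeouts += len(TIMEOUT_RE.findall(line))
    return events, timeouts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python mpt_tally.py /var/log/messages
    with open(sys.argv[1], errors="replace") as fh:
        events, timeouts = summarize(fh)
    for code, count in events.most_common():
        print(f"mpt_cam_event {code}: {count}")
    print(f"request timeouts: {timeouts}")
```

Comparing the counts from before and after a kernel upgrade would show whether the event rate actually changes.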

The disk configuration is 2 x 320GB WD3200BEKT 7200RPM SATA HDDs in RAID1. The hardware reports itself as:

mpt0: <LSILogic SAS/SATA Adapter> port 0xa800-0xa8ff mem 0xfc4fc000-0xfc4fffff,0xfc4e0000-0xfc4effff irq 28 at device 3.0 on pci2
mpt0: [ITHREAD]
mpt0: MPI Version=1.5.13.0
mpt0: Capabilities: ( RAID-0 RAID-1E RAID-1 )
mpt0: 1 Active Volume (2 Max)
mpt0: 2 Hidden Drive Members (10 Max)

m...@pci0:2:3:0: class=0x010000 card=0x30601000 chip=0x00501000 rev=0x02 hdr=0x00
    vendor     = 'LSI Logic (Was: Symbios Logic, NCR)'
    device     = 'SAS 3000 series, 4-port with 1064 -StorPort'
    class      = mass storage
    subclass   = SCSI




As best I can tell, the hardware is OK: both disks report as fine, without SMART errors, and are only 2 months old, so I wanted to rule out software issues. On upgrading to a recent 8-STABLE, I got a page fault kernel panic on boot in the mpt driver's mpt_raid0 kproc. After some trial and error, I found that r203888 is the most recent revision that boots fine, whilst r203889 exhibits the page fault. I should also note that r203888 still sees the "mpt0: mpt_cam_event: 0x16" messages and the associated disk IO stalls.
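For anyone wanting to repeat the search, the trial and error above amounts to a bisection over revision numbers. The following is a minimal sketch only: first_bad_revision and the is_good callback are hypothetical names, and in practice is_good would mean building and booting a kernel at that revision by hand. It assumes every revision before the breaking one boots and every revision from it onwards faults.

```python
def first_bad_revision(good, bad, is_good):
    """Return the earliest revision in (good, bad] for which is_good() is False.

    Assumes `good` is known to boot, `bad` is known to fault, and the
    behaviour switches exactly once somewhere in between.
    """
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_good(mid):
            good = mid   # the fault was introduced after mid
        else:
            bad = mid    # the fault was introduced at or before mid
    return bad


if __name__ == "__main__":
    # Illustration only: 204000 is a made-up upper bound, and the lambda
    # stands in for "build and boot this revision".
    print(first_bad_revision(202132, 204000, lambda rev: rev < 203889))
```

Each probe halves the remaining range, so even a span of a couple of thousand revisions needs only about a dozen builds.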

I compiled DDB into my r203889 kernel. Unfortunately, my ILO emulates a USB keyboard, so I can't type anything at the DDB prompt, which is a huge pain, but here's the info I did get (hand transcribed):

Fatal trap 12: page fault while in kernel mode
current process: mpt_raid0
Stopped at xpt_rescan+0x1d:     movq   0x10(%rsi),%rdx



So there are two separate issues here:

1. Any thoughts on how to resolve the regression in the mpt driver with the r203889 commit?

2. Any thoughts on the behaviour I'm seeing with the mpt_cam_event messages? Is it possible it's just a driver issue? Is the hardware likely bad? I'm really hoping they'll go away once the driver issue is resolved as the freezes are fairly unacceptable on a production machine and the hardware appears to pass all checks I've done so far.

Cheers,
Lawrence
_______________________________________________
svn-src-all@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"