LI Xin wrote:
Jeff Royle wrote:
Jeff Royle wrote:
I could use some advice on an issue I have had with my RAID controller.
I am not really running much on the system yet: postfix, pf + pflogd,
rlogind, ssh, bsnmp and ntpd.  While I was just reading a file with
less, the system stopped responding.  I thought it was the network
interfaces, but I was able to ping the interface. Once I plugged a
monitor into the system I saw this (roughly):

AAC0: COMMAND <SOME HEX> TIMEOUT AFTER X number of seconds

Not good :)

Resetting the system resolved the issue and it booted fine.  Since
the controller stopped responding, nothing was recorded to my logs.

Now I have to figure out how to prevent that from happening again.

Basic run down on the system and some history...

P4 3.2GHz
Asus P5MT-S MB
2 x 1GB DDR2 667 memory
Adaptec 2130SLP RAID Controller + battery backup module
2 Seagate Ultra320 73GB 15k RPM (mirrored)

I have run this same system hardware testing 6.2-BETA3, RC-1 and RC-2
without this issue.  I was using the driver released by Adaptec
while testing the pre-release installs
(http://www.adaptec.com/en-US/speed/raid/aac/unix/aacraid_freebsd6_drv_b11518_tgz.htm).
You could say I am fairly confident in the hardware itself. I have
put this system through a lot of testing since BETA3.

The 6.2 release kernel has not been customized all that much; I just
pulled out all the drivers I would never use.  To be safe I kept
just about all SCSI device/card models in as I continued my
testing of 6.2 release. Right now I am going to try taking out aac and
aacp, then try the driver I used in my previous tests.  However,
since I have run a week without this issue it will be hard/impossible
to tell if this did anything to resolve it... I almost want a crash on the
old driver :)

So I need some advice...  How best do I debug this issue?

Thanks in advance for any direction you guys can offer me.

Cheers,

Jeff


It appears the driver I was using in my pre-release testing is newer
than the release driver.

Stock driver in 6.2r dmesg:

aac0: <Adaptec SCSI RAID 2130S> mem
0xfc600000-0xfc7fffff,0xfc5ff000-0xfc5fffff irq 24 at device 1.0 on pci2
aac0: New comm. interface enabled
aac0: Adaptec Raid Controller 2.0.0-1
aacp0: <SCSI Passthrough Bus> on aac0

Currently using:

aacu0: <Adaptec SCSI RAID 2130S> mem
0xfc600000-0xfc7fffff,0xfc5ff000-0xfc5fffff irq 24 at device 1.0 on pci2
aacu0: New comm. interface enabled
aacu0: Adaptec Raid Controller 2.0.7-1
aacpu0: <SCSI Passthrough Bus> on aacu0

Going to continue testing with the newer driver.

I have some preliminary work on merging the Adaptec driver:

http://people.freebsd.org/~delphij/for_review/patch-aac-vendor-b11518

But one of the reviewers has advised me to request broader testing,
especially against old cards and CLI tools, so I have held the commit
for now.

Cheers,

Well the driver patched fine, no issues to report there.

Performance is where I expected it to be, based on my previous testing with bonnie and simple dd tests.

So far the TIMEOUT error I noted above has not shown itself again; time will tell on this one.

However, I have encountered an intermittent bug on boot.

Sometimes, say every 5-10 boots, the system will hang while probing the SCSI bus for the drives. I have seen this happen once before on the aacdu 2.0.7-1 binary driver I was using in my 6.2-RC 1 / 6.2-RC 2 testing. This problem is now happening a fair bit more.

Here is where it hangs...

Hung dmesg output:

--- snip ---
orm0: <ISA Option ROMs> at iomem 0xc0000-0xc7fff,0xc8000-0xcd7ff on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
ppc0: parallel port not found.
Timecounters tick every 1.000 msec
acd0: CDRW <QSI CD-RW/DVD-ROM SBW-243/TX09> at ata0-master UDMA33
aacd0: <RAID 1 (Mirror)> on aac0
aacd0: 69889MB (143132672 sectors)
--- end snip ---

The system does not continue on to probe the drives, as it does in a normal boot dmesg:

--- snip ---
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
ppc0: parallel port not found.
Timecounters tick every 1.000 msec
acd0: CDRW <QSI CD-RW/DVD-ROM SBW-243/TX09> at ata0-master UDMA33
aacd0: <RAID 1 (Mirror)> on aac0
aacd0: 69889MB (143132672 sectors)
pass0 at aacp0 bus 0 target 0 lun 0
pass0: <SEAGATE ST373207LC 0005> Fixed unknown SCSI-3 device
pass0: 3.300MB/s transfers
pass1 at aacp0 bus 0 target 3 lun 0
pass1: <SEAGATE ST373207LC 0005> Fixed unknown SCSI-3 device
pass1: 3.300MB/s transfers
SMP: AP CPU #1 Launched!
Trying to mount root from ufs:/dev/aacd0s1a
--- end snip ---
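
To pin down where a hung boot stops relative to a good one, the two captures can be compared mechanically. The following is only a minimal sketch in Python, not something used in this thread; the function name first_missing_line and the idea of feeding it saved console captures are assumptions for illustration. It takes the last line printed by the hung boot, finds it in the normal boot, and reports the line that should have come next.

```python
def first_missing_line(hung_lines, normal_lines):
    """Return (last line of the hung boot, the line that follows it in a normal boot).

    Both arguments are iterables of dmesg/console lines. The second element
    is None if the hung boot's last line is not found in the normal boot, or
    if nothing follows it there.
    """
    hung = [line.strip() for line in hung_lines if line.strip()]
    normal = [line.strip() for line in normal_lines if line.strip()]
    if not hung:
        return None, (normal[0] if normal else None)

    last = hung[-1]
    try:
        # Use the last occurrence in the normal boot, in case a line repeats.
        idx = len(normal) - 1 - normal[::-1].index(last)
    except ValueError:
        return last, None

    following = normal[idx + 1] if idx + 1 < len(normal) else None
    return last, following


if __name__ == "__main__":
    # Lines taken from the two captures quoted above.
    hung = [
        "acd0: CDRW <QSI CD-RW/DVD-ROM SBW-243/TX09> at ata0-master UDMA33",
        "aacd0: <RAID 1 (Mirror)> on aac0",
        "aacd0: 69889MB (143132672 sectors)",
    ]
    normal = hung + [
        "pass0 at aacp0 bus 0 target 0 lun 0",
        "pass0: <SEAGATE ST373207LC 0005> Fixed unknown SCSI-3 device",
    ]
    last, following = first_missing_line(hung, normal)
    print("hung after:   ", last)
    print("expected next:", following)
```

Applied to the two captures above, this points at the pass0 probe on aacp0 as the first step that never appears in the hung boot.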

In an effort to resolve this I increased the SCSI delay in the kernel from 5 seconds to 10 seconds (the option value is in milliseconds):

options         SCSI_DELAY=10000

It *may* have helped on one of my reboot tests: I thought it was going to hang again, but it proceeded. However, it definitely did not solve the issue.

Once I am back in the office I will see if I can get some debug output for you.

Cheers,

Jeff
_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"
