Re: twa: Passthru request timed out! Resetting controller...

Atanas Wed, 15 Nov 2006 18:12:20 -0800

Mark Dotson said the following on 11/14/06 1:18 PM:

I've had continued problems with the 3ware series SATA cards and the Tyan boards. Specifically, I have a "Tyan S5360-1U" and both a 9500S-4LP and an 8506 series 3ware card.
In my case the first error is different, but the 'resetting' over and over is VERY familiar. This could be triggered by a simple file copy from one part of a container to another, degrading the unit and triggering the resetting crap. Note that the drives are fine; I tested that first thing.
Sep 8 11:59:23 localhost kernel: 3w-9xxx: scsi0: WARNING:(0x06:0x002C): Unit #1: Command (0x2a) timed out, resetting card.
Sep  8 11:59:41 localhost kernel: 3w-9xxx: scsi0: AEN: INFO (0x04:0x005E): Cache synchronized after power fail:unit=0.
Sep  8 11:59:41 localhost kernel: 3w-9xxx: scsi0: AEN: INFO (0x04:0x005E): Cache synchronized after power fail:unit=1.
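To see which unit the Linux driver blamed before each reset, a small log filter can help. This is a minimal Python sketch that assumes only the syslog line layout shown in the excerpt above; the regular expression and the function names `parse_line` and `timed_out_units` are hypothetical and not part of any 3ware or CentOS tool.

```python
import re

# Assumed pattern for the Linux 3w-9xxx syslog lines quoted above:
# timestamp, host, "kernel:", driver tag, SCSI host, optional "AEN:",
# level, "(0xNN:0xNNNN)" code, message. Other driver messages may differ.
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"3w-9xxx:\s+(?P<host>scsi\d+):\s+(?:AEN:\s+)?(?P<level>[A-Z]+):?\s*"
    r"\((?P<code>0x[0-9A-Fa-f]{2}:0x[0-9A-Fa-f]{4})\):\s*(?P<msg>.*)$"
)


def parse_line(line):
    """Return a dict of fields for a 3w-9xxx line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    return m.groupdict() if m else None


def timed_out_units(lines):
    """Collect unit numbers from 'timed out, resetting card' warnings."""
    units = []
    for line in lines:
        rec = parse_line(line)
        if rec is None or "timed out" not in rec["msg"]:
            continue
        unit = re.search(r"Unit #(\d+)", rec["msg"])
        if unit:
            units.append(int(unit.group(1)))
    return units


if __name__ == "__main__":
    sample = [
        "Sep 8 11:59:23 localhost kernel: 3w-9xxx: scsi0: WARNING:(0x06:0x002C): "
        "Unit #1: Command (0x2a) timed out, resetting card.",
        "Sep  8 11:59:41 localhost kernel: 3w-9xxx: scsi0: AEN: INFO (0x04:0x005E): "
        "Cache synchronized after power fail:unit=0.",
    ]
    print(timed_out_units(sample))  # [1]
```

Matching on the message text rather than the numeric code keeps the sketch tied to what the excerpt actually shows; the meaning of other codes is not assumed.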
I also found this problem to exist across platforms, not just FreeBSD. For example, the excerpt above is from a CentOS box.
All tests were done with the newest firmware for both card and mobo, and using the newest drivers provided by 3ware.
Once I removed the card and drives from the Tyan system and stuck them in pretty much ANY other system, they worked fantastically.
I don't have an answer for the "resetting problem" as of yet... 3ware and Tyan (and my system vendor "Appro") are still trying to find my specific problem and solve it. I believe they are currently doing the "replace everything" method of troubleshooting.

Mark, thank you.

It's good to know that the resetting problem exists on other platforms too.

We already found out that replacing the entire box with an identical one doesn't help, so unfortunately we'll have to start replacing components with different brands or models.

I wouldn't like to touch the I/O subsystem (these are already loaded production machines), so, as you said, the safest bet would be to try another motherboard.

However, I don't see many dual-Opteron boards on 3ware's compatibility list. The next one that comes to mind from that list is the Supermicro H8DC8, but it looks more like a gamer's dream (high-end PCI-e graphics, SLI, etc., but no on-board VGA) than a server board.

I'm quite surprised that the top Opteron-based motherboard manufacturer listed in 3ware's motherboard compatibility document (http://3ware.com/products/pdf/Motherboard_compatibility_list_9550SX_2006_06.pdf) makes 2 of the 5 boards marked as compatible, yet they perform so badly with 3ware cards.

I know what happens on this mailing list when somebody asks for good SATA cards (Re: 3ware, 3ware, ...); I replied myself too.

So are there any success stories with the 3ware 9550SX (SATA II) and dual AMD Opteron server boards, or is it time to go back to Intel?


Regards,
Atanas

Atanas wrote:
Has anyone experienced this:
twa0: ERROR: (0x05: 0x2018): Passthru request timed out!: request =0xca839d20
twa0: INFO: (0x16: 0x1108): Resetting controller...:
twa0: INFO: (0x04: 0x005E): Cache synchronization completed: unit=0
...
twa0: INFO: (0x04: 0x005E): Cache synchronization completed: unit=7
twa0: INFO: (0x04: 0x0001): Controller reset occurred: resets=1
twa0: INFO: (0x16: 0x1107): Controller reset done!:
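Because the problem shows up only every few weeks, it may help to scan saved serial-console output for these events after each crash. This is a minimal Python sketch that assumes only the twa message layout shown above ("twaN: LEVEL: (0xNN: 0xNNNN): message"); `summarize` and its pattern are hypothetical helpers, not part of FreeBSD or 3ware's tools.

```python
import re

# Assumed pattern for FreeBSD twa console lines as quoted above:
# "twaN: LEVEL: (0xNN: 0xNNNN): message". The space after the colon inside
# the code is treated as optional because only this excerpt was available.
TWA_RE = re.compile(
    r"^(?P<dev>twa\d+):\s+(?P<level>[A-Z]+):\s+"
    r"\((?P<code>0x[0-9A-Fa-f]{2}:\s*0x[0-9A-Fa-f]{4})\):\s*(?P<msg>.*)$"
)


def summarize(lines):
    """Count passthru timeouts and report the highest 'resets=N' value seen."""
    timeouts = 0
    resets = 0
    for line in lines:
        m = TWA_RE.match(line.strip())
        if m is None:
            continue  # not a twa message (e.g. the "..." placeholder)
        msg = m.group("msg")
        if "Passthru request timed out" in msg:
            timeouts += 1
        r = re.search(r"resets=(\d+)", msg)
        if r:
            resets = max(resets, int(r.group(1)))
    return {"timeouts": timeouts, "resets": resets}


if __name__ == "__main__":
    excerpt = """\
twa0: ERROR: (0x05: 0x2018): Passthru request timed out!: request =0xca839d20
twa0: INFO: (0x16: 0x1108): Resetting controller...:
twa0: INFO: (0x04: 0x005E): Cache synchronization completed: unit=0
...
twa0: INFO: (0x04: 0x005E): Cache synchronization completed: unit=7
twa0: INFO: (0x04: 0x0001): Controller reset occurred: resets=1
twa0: INFO: (0x16: 0x1107): Controller reset done!:
"""
    print(summarize(excerpt.splitlines()))  # {'timeouts': 1, 'resets': 1}
```

Taking the maximum `resets=N` rather than counting reset lines relies on the assumption that the driver's counter is cumulative, as the single "resets=1" line suggests.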
This happens on 6.2-PRERELEASE i386 (and on 6.1 since its release) on a number of machines with the following hardware configuration:
- Tyan K8SE 2892, 2 AMD Opteron 270 CPUs, 4GB RAM
- 3ware 9550SX-8LP, 8 500GB Seagate ST3500641AS SATA drives
  (configured as 8 SINGLE DISK units, aka JBOD)
All hardware components, including the server chassis, are listed in the 3ware hardware compatibility lists. It doesn't seem to be a cabling or power issue. The controller and hard drives are already flashed to the latest firmware revisions. I tried turning off NCQ, but it didn't make any difference. I also tried switching the kernel from PAE to non-PAE (reducing the usable memory to 3GB), but it didn't help either.
I have other machines with similar I/O configurations (3ware), but with Intel motherboards and running FreeBSD-5.5, and these have run fine for about a year already. Now I'm thinking about swapping the drives between a working Intel-based box and an AMD-based box, to see which box the controller timeouts follow.
The problem happens sporadically, once a month or so, and is very hard to reproduce. Sometimes it takes several weeks until the next crash happens; sometimes it crashes again in just a few hours.
When it happens, the kernel sometimes panics (most likely due to the inconsistent filesystem state caused by the controller reset) and sometimes just hangs. It can be interrupted (I have a serial console), but the only usable thing after that seems to be "call cpu_reset()", followed by a full (and sometimes painfully long) filesystem check.
Here are the diffs against the default GENERIC and PAE kernel configurations:
< cpu       I486_CPU
< ident     GENERIC
< options   INET6               # IPv6 communications protocols
< options   SCSI_DELAY=5000     # Delay (in ms) before probing SCSI

> options   QUOTA
> options   SMP                 # Symmetric MultiProcessor Kernel
> options   BREAK_TO_DEBUGGER
> options   DDB
> options   KDB
> options   KDB_UNATTENDED

> options   IPFIREWALL
> options   DUMMYNET

I'm attaching the dmesg.boot following the latest crash.

Regards,
Atanas



_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"
