On Tue, Dec 31, 2002 at 03:57:16PM -0500, Bruce Campbell wrote:
>
> I am seeing a problem with ata disks on 4 new systems, which
> I believe is either a bug in the ata driver, or a problem with
> the onboard IDE controller, or something else. Systems are as follows:
>
> Motherboard: ASUS A7M266-D
> CPUs       : 2 x 2000+ AMD MP
> Memory     : 2 x 512MB Crucial part: CT6472Y265
>
> Disks (all UDMA100):
>
>            Master        Slave
> System 1:  WDC WD400BB   WDC WD1000BB
> System 2:  WDC WD400BB   WDC WD1000BB
> System 3:  WDC WD400BB   WDC WD800BB
> System 4:  WDC WD400BB   Maxtor 98196H8
>
> Kernel : 4.7-RELEASE, custom kernel (compared to GENERIC):
>
>   commented out:
>
>     cpu I386_CPU
>     cpu I486_CPU
>
>   enabled
>
>     options SMP      # Symmetric MultiProcessor Kernel
>     options APIC_IO  # Symmetric (APIC) I/O
>
>
> I am running a test with "dbench" (/usr/ports/benchmarks/dbench)
> with a script which runs:
>
>   dbench 1
>   sleep for 5 minutes
>   dbench 2
>   sleep for 5 minutes
>   dbench 3
>   ...
>
> to simulate 1,2,3... clients.
>
> The following has happened on systems 2,3 and 4, after about 15 hours
> of running the test:
>
> Dec 30 23:26:59 ecserv13 /kernel: ad0: WRITE command timeout tag=0 serv=0 -
> resetting
> Dec 30 23:26:59 ecserv13 /kernel: ata0: resetting devices .. done
> Dec 30 23:26:59 ecserv13 /kernel: ad0: WRITE command timeout tag=0 serv=0
> resetting
> Dec 30 23:27:00 ecserv13 /kernel: ata0: resetting devices .. done
> Dec 30 23:27:00 ecserv13 /kernel: ad0: WRITE command timeout tag=0 serv=0
> resetting
> Dec 30 23:27:00 ecserv13 /kernel: ata0: resetting devices .. done
> Dec 30 23:27:00 ecserv13 /kernel: ad0: WRITE command timeout tag=0 serv=0
> resetting
> Dec 30 23:27:00 ecserv13 /kernel: ad0: timeout waiting for cmd=ef s=d0 e=00
> Dec 30 23:27:00 ecserv13 /kernel: ad0: trying fallback to PIO mode
> Dec 30 23:27:00 ecserv13 /kernel: ata0: resetting devices .. done
>
> The test continues to run with the ata controller in PIO mode, with
> slower performance, and higher load average.
>
> Once the master drops to PIO, attempts to access the slave then cause
> it to drop to PIO.
>
> If I run:
>
>   atacontrol mode 0 UDMA100 UDMA100
>
> attempts to access either drive result in a delay until the controller
> drops to PIO, and then operations resume. A soft reboot and things
> work in UDMA mode again. Also tried UDMA33 and UDMA66 with no change.
> I also tried "atacontrol reinit 0" with no help.
>
> Theories when I search the web for "fallback to PIO mode" include:
>
>   - bad disks
>   - something to do with thermal recalibration
>
> I don't believe the problems are bad disks, as the slave drops to PIO
> after the master does, and I can't get it back to UDMA, other than by
> soft reboot. Plus I see the problem on 6 of 8 disks.
>
> The problem is very repeatable.
>
> Can anyone offer any ideas, or suggest investigative steps? I have a system
> in PIO mode right now.
>
> Thanks,
>
> --
> Bruce Campbell
> Engineering Computing
> CPH-2374B
> University of Waterloo
> (519)888-4567 ext 5889
>
> ----------------------------------------
> This mail sent through www.mywaterloo.ca
>
> To Unsubscribe: send mail to [EMAIL PROTECTED]
> with "unsubscribe freebsd-questions" in the body of the message
>
end of the original message
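
For anyone who wants to reproduce the load, the test loop described in the
quoted message could look roughly like the sh sketch below. This is not
Bruce's actual script: the function name, the variable names and the upper
bound on the number of clients are assumptions made for illustration. Only
the "dbench N, then sleep 5 minutes" pattern comes from the message.

```shell
#!/bin/sh
# Sketch of the ramp test: run dbench with 1, 2, 3, ... simulated
# clients and pause between runs.
# Assumptions (not in the original message): the names below and the
# existence of an upper bound on the client count.

DBENCH_CMD=${DBENCH_CMD:-dbench}   # binary from /usr/ports/benchmarks/dbench
PAUSE=${PAUSE:-300}                # seconds between runs (5 minutes)

# ramp_test MAX: run "$DBENCH_CMD 1" .. "$DBENCH_CMD MAX", sleeping
# $PAUSE seconds after each run.
ramp_test() {
    max_clients=$1
    clients=1
    while [ "$clients" -le "$max_clients" ]; do
        "$DBENCH_CMD" "$clients"
        sleep "$PAUSE"
        clients=$((clients + 1))
    done
}
```

Calling "ramp_test 20" would step from 1 to 20 clients. Setting
DBENCH_CMD=echo and PAUSE=0 gives a dry run that only prints the client
counts.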
Same problem here, but with a slightly different configuration:

# atacontrol list
ATA channel 0:
    Master:  ad0 <IC35L040AVER07-0/ER4OA44A> ATA/ATAPI rev 5
    Slave:       no device present
ATA channel 1:
    Master: acd0 <LG CD-ROM CRD-8521B/1.03> ATA/ATAPI rev 0
    Slave:       no device present
ATA channel 2:
    Master:  ad4 <IC35L040AVER07-0/ER4OA44A> ATA/ATAPI rev 5
    Slave:       no device present
ATA channel 3:
    Master:  ad6 <IC35L040AVER07-0/ER4OA44A> ATA/ATAPI rev 5
    Slave:       no device present

ad4 and ad6 are attached to a Promise FastTrak 100 TX2 ATA RAID controller.

# atacontrol mode 0
Master = UDMA100
Slave  = ???
# atacontrol mode 1
Master = PIO4
Slave  = ???
# atacontrol mode 2
Master = UDMA100
Slave  = ???
# atacontrol mode 3
Master = PIO4
Slave  = ???

ad6 falls back to PIO mode under heavy I/O activity, i.e. when the system
does a level 0 file system dump from the RAID 1 array (ad4, ad6) to the
backup disk ad0. Rebooting and rebuilding the array with the Promise BIOS
utility temporarily solves the problem. The system may be up and running for
1-4 weeks, doing a level 0 dump every morning at 5:30am, and then one day
ad6 falls back to PIO mode again (shortly before the dump completes).

Do the hard drives you are using support ATA tagged queuing? And if so, do
you have TQ enabled?

	Francesco Casadei

--
You can download my public key from http://digilander.libero.it/fcasadei/
or retrieve it from a keyserver (pgpkeys.mit.edu, wwwkeys.pgp.net, ...)

Key fingerprint is: 1671 9A23 ACB4 520A E7EE  00B0 7EC3 375F 164E B17B
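
Since the fallback can take weeks to show up, one way to notice it is to
poll "atacontrol mode <channel>" and report channels whose master is in a
PIO mode. The sketch below is only an illustration: the function name and
the MODE_CMD variable are made up, and the parsing assumes the output looks
like the "Master = PIO4" lines shown above.

```shell
#!/bin/sh
# Print every ATA channel whose master reports a PIO transfer mode.
# Assumptions: pio_channels and MODE_CMD are hypothetical names, and the
# output format of "atacontrol mode N" matches the listing in this mail.

MODE_CMD=${MODE_CMD:-atacontrol mode}   # overridable for dry runs

# pio_channels CH...: query each channel and echo those in PIO mode.
pio_channels() {
    for ch in "$@"; do
        # $MODE_CMD is left unquoted on purpose so that
        # "atacontrol mode" splits into command and argument.
        if $MODE_CMD "$ch" 2>/dev/null | grep -Eq 'Master *= *PIO'; then
            echo "$ch"
        fi
    done
}
```

On the system above, "pio_channels 0 2 3" would skip channel 1, where the
CD-ROM drive reports PIO4 anyway, and would list channel 3 once ad6 has
fallen back.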