On Sat, Dec 07, 2002 at 09:35:13AM +0200, Vladislav V. Zhuk wrote:
[snip]
> 
> I don't think like you.
> I check my hardware and I consider that problem in new ATA driver.
> Under FreeBSD 4.1.1 my hardware work excellent.
> After 4.5 release I get more troubles with IDE devices.
> Some bugs was fixed and now (under 4.7s) I have no problem
> with IDE HDD (even softupdates work).
> 
> After reboot my system work excellent 2-5 days, than I get
> "read timeout" problem with my CDROM and all system hang.
> 
> I wrote about that troubles with ATA, but not get answer...
> 
> Who have problem with ATA driver - write here about this
> and show /var/run/dmesg. Maybe we discover some dependences
> where trouble appeared....
> 
> --
> Vladislav V. Zhuk (06267)3-60-03  [EMAIL PROTECTED]  2:[EMAIL PROTECTED]
> 
> end of the original message

I had a lot of problems with tagged queuing enabled on IBM drives.

I have a server with a Promise FastTrak TX2 ATA RAID controller and two
IBM 40 GB drives attached to it. A third IBM drive (identical to the other
two) is attached to the mainboard's ATA controller.

# atacontrol list
ATA channel 0:
    Master:  ad0 <IC35L040AVER07-0/ER4OA44A> ATA/ATAPI rev 5
    Slave:       no device present
ATA channel 1:
    Master: acd0 <LG CD-ROM CRD-8521B/1.03> ATA/ATAPI rev 0
    Slave:       no device present
ATA channel 2:
    Master:  ad4 <IC35L040AVER07-0/ER4OA44A> ATA/ATAPI rev 5
    Slave:       no device present
ATA channel 3:
    Master:  ad6 <IC35L040AVER07-0/ER4OA44A> ATA/ATAPI rev 5
    Slave:       no device present


The filesystems layout is:

# mount
/dev/ar0s1a on / (ufs, local, soft-updates)
/dev/ar0s1f on /usr (ufs, local, soft-updates)
/dev/ar0s1d on /var (ufs, local, noatime, soft-updates)
/dev/ar0s1e on /var/tmp (ufs, local, soft-updates)
/dev/ar0s1g on /db (ufs, local, soft-updates)
/dev/ar0s1h on /home (ufs, local, noatime, soft-updates)
/dev/ad0s1a on /backup (ufs, local, soft-updates)
procfs on /proc (procfs, local)

The hw.ata sysctl tunables are set as follows:

# sysctl -a | grep 'hw\.ata'
hw.ata.ata_dma: 1
hw.ata.wc: 1
hw.ata.tags: 1
hw.ata.atapi_dma: 0

The server ran without problems from October 2001 until the summer of 2002,
when an MFC broke the tagged queuing support. I had to set hw.ata.tags to 0 to
avoid kernel panics and keep the system up and running. Eventually the TQ
support was (apparently) fixed and I re-enabled it. The system ran fine only
for a short time, though, because the drive on the second channel of the
Promise controller began to fall back to PIO mode.

I don't think it's a hardware problem, because I booted the system from the
live-system CD of the FreeBSD distribution set and ran dd on the faulty drive:
no error was reported.

I rebuilt the array using the Promise utility and rebooted the system, which
then ran in UDMA100 mode for a couple of weeks. Then the problem appeared again:

Dec  4 05:40:02 zeus /kernel: ad6: SERVICE timeout tag=0 s=51 e=04
Dec  4 05:40:02 zeus /kernel: ad6: invalidating queued requests
Dec  4 05:40:02 zeus /kernel: ad6: no request for tag=0
Dec  4 05:40:02 zeus /kernel: ad6: invalidating queued requests
Dec  4 05:40:12 zeus /kernel: ad6: READ command timeout tag=0 serv=1 - resetting
Dec  4 05:40:22 zeus /kernel: ad6: invalidating queued requests
Dec  4 05:40:22 zeus /kernel: ata3: resetting devices .. ad6: invalidating queued requests
Dec  4 05:40:22 zeus /kernel: done
Dec  4 05:40:22 zeus /kernel: ad6: READ command timeout tag=0 serv=1 - resetting
Dec  4 05:40:22 zeus /kernel: ad6: invalidating queued requests
Dec  4 05:40:22 zeus /kernel: ata3: resetting devices .. ad6: invalidating queued requests
Dec  4 05:40:22 zeus /kernel: done
Dec  4 05:40:22 zeus /kernel: ad6: no request for tag=0
Dec  4 05:40:22 zeus /kernel: ad6: invalidating queued requests
Dec  4 05:40:32 zeus /kernel: ad6: READ command timeout tag=0 serv=1 - resetting
Dec  4 05:40:52 zeus /kernel: ad6: invalidating queued requests
Dec  4 05:40:52 zeus /kernel: ata3: resetting devices .. ad6: invalidating queued requests
Dec  4 05:40:52 zeus /kernel: done
Dec  4 05:40:52 zeus /kernel: ad6: timeout waiting for READY
Dec  4 05:40:52 zeus /kernel: ad6: invalidating queued requests
Dec  4 05:40:52 zeus /kernel: ad6: timeout sending command=00 s=d0 e=04
Dec  4 05:40:52 zeus /kernel: ad6: flush queue failed
Dec  4 05:40:52 zeus /kernel: - resetting
Dec  4 05:40:52 zeus /kernel: ata3: resetting devices .. ad6: invalidating queued requests
Dec  4 05:40:52 zeus /kernel: done
Dec  4 05:40:52 zeus /kernel: ad6: READ command timeout tag=0 serv=1 - resetting
Dec  4 05:40:52 zeus /kernel: ad6: invalidating queued requests
Dec  4 05:40:52 zeus /kernel: ad6: trying fallback to PIO mode
Dec  4 05:40:52 zeus /kernel: ata3: resetting devices .. ad6: invalidating queued requests
Dec  4 05:40:52 zeus /kernel: done
Dec  4 05:40:52 zeus /kernel: ad6: WRITE command timeout tag=0 serv=0 - resetting
Dec  4 05:40:52 zeus /kernel: ad6: invalidating queued requests
Dec  4 05:40:52 zeus /kernel: ata3: resetting devices .. ad6: invalidating queued requests
Dec  4 05:40:52 zeus /kernel: done

(The most recent error report is shown)
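
For anyone comparing logs, here is a minimal sketch of how the messages above
could be summarized per drive. It assumes syslog-style lines in the same
layout as my report (month, day, time, host, "/kernel:", device prefix). The
function name ata_log_summary is hypothetical, not part of any FreeBSD tool.

```shell
# Summarize ATA trouble per disk from syslog-style kernel messages on stdin.
# Assumes lines like:
#   Dec  4 05:40:02 zeus /kernel: ad6: SERVICE timeout tag=0 s=51 e=04
ata_log_summary() {
    awk '
        # Field 5 is the "/kernel:" tag, field 6 the device prefix ("ad6:").
        $5 == "/kernel:" && $6 ~ /^ad[0-9]+:$/ {
            dev = substr($6, 1, length($6) - 1)   # strip the trailing colon
            seen[dev] = 1
            if ($0 ~ /timeout/)         timeouts[dev]++
            if ($0 ~ /fallback to PIO/) fallbacks[dev]++
        }
        END {
            for (dev in seen)
                printf "%s timeouts=%d pio_fallbacks=%d\n",
                       dev, timeouts[dev], fallbacks[dev]
        }
    ' | sort
}
```

It would be fed the system log, for example "ata_log_summary < /var/log/messages"
(the path is an assumption; use wherever your kernel messages are logged).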

# atacontrol mode 3
Master = PIO4 
Slave  = ???

# atacontrol mode 3 udma100 xxx
Master = UDMA100 
Slave  = ???

# atacontrol mode 3
Master = UDMA100 
Slave  = ???

If I run an I/O-intensive program, the drive falls back to PIO mode 4:

# find /usr/ports/ -name nonexistent

# atacontrol mode 3
Master = PIO4 
Slave  = ???
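
Since the fallback is silent until I look, a check like the following could be
run periodically. This is only a sketch: it parses the "Master = ..." line in
the format atacontrol printed above, and the function names ata_master_mode and
ata_master_is_pio are hypothetical helpers of mine.

```shell
# Print the master's transfer mode from "atacontrol mode <channel>" output
# supplied on stdin (expects a line such as "Master = UDMA100").
ata_master_mode() {
    awk '$1 == "Master" && $2 == "=" { print $3; exit }'
}

# Exit status 0 when the master reports a PIO mode, 1 otherwise.
ata_master_is_pio() {
    case "$(ata_master_mode)" in
        PIO*) return 0 ;;
        *)    return 1 ;;
    esac
}
```

Usage would be along the lines of
"atacontrol mode 3 | ata_master_is_pio && echo 'channel 3 master is in PIO mode'",
with the channel number taken from the atacontrol list output.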


If I reboot the system, the Promise utility tells me that the array is in
critical status. If I rebuild the array and reboot the system, everything
is fine for another 1-4 weeks before the problem appears again!

Note that the problem always appears before the backup activity completes.
From the daily run output before the drive failure:

Last dump(s) done (Dump '>' file systems):
> /dev/ar0s1a   (     /) Last dump: Level 0, Date Tue Dec  3 05:30
  /dev/ar0s1d   (  /var) Last dump: Level 0, Date Tue Dec  3 05:30
  /dev/ar0s1e   (/var/tmp) Last dump: Level 0, Date Tue Dec  3 05:30
  /dev/ar0s1f   (  /usr) Last dump: Level 0, Date Tue Dec  3 05:30
  /dev/ar0s1g   (   /db) Last dump: Level 0, Date Tue Dec  3 05:40
  /dev/ar0s1h   ( /home) Last dump: Level 0, Date Tue Dec  3 05:39

On December 4th at 05:40:02 the timeout problem appears:
Last dump(s) done (Dump '>' file systems):
> /dev/ar0s1a   (     /) Last dump: Level 0, Date Wed Dec  4 05:30
  /dev/ar0s1d   (  /var) Last dump: Level 0, Date Wed Dec  4 05:30
  /dev/ar0s1e   (/var/tmp) Last dump: Level 0, Date Wed Dec  4 05:30
  /dev/ar0s1f   (  /usr) Last dump: Level 0, Date Wed Dec  4 05:30
  /dev/ar0s1g   (   /db) Last dump: Level 0, Date Wed Dec  4 05:41
  /dev/ar0s1h   ( /home) Last dump: Level 0, Date Wed Dec  4 05:39

Note that the backup of /dev/ar0s1g takes 1 minute longer than usual
(with exactly the same load, not shown).

After the problem appeared:

Last dump(s) done (Dump '>' file systems):
> /dev/ar0s1a   (     /) Last dump: Level 0, Date Thu Dec  5 05:30
  /dev/ar0s1d   (  /var) Last dump: Level 0, Date Thu Dec  5 05:30
  /dev/ar0s1e   (/var/tmp) Last dump: Level 0, Date Thu Dec  5 05:30
  /dev/ar0s1f   (  /usr) Last dump: Level 0, Date Thu Dec  5 05:30
  /dev/ar0s1g   (   /db) Last dump: Level 0, Date Thu Dec  5 05:47
  /dev/ar0s1h   ( /home) Last dump: Level 0, Date Thu Dec  5 05:44

Obviously the system is slower, but it works.

I'm tired of rebooting and rebuilding the array each time. Can anybody help me
solve this problem?

        Francesco Casadei

P.S. Sorry for the long post, but I'm sure the information I gave will help
you diagnose the problem.

-- 
You can download my public key from http://digilander.libero.it/fcasadei/
or retrieve it from a keyserver (pgpkeys.mit.edu, wwwkeys.pgp.net, ...)

Key fingerprint is: 1671 9A23 ACB4 520A E7EE  00B0 7EC3 375F 164E B17B
