Re: ATA troubles

Jerome Herman Sun, 24 Jul 2011 17:46:54 -0700

On 25/07/2011 01:58, Andrea Venturoli wrote:

(Sorry for the previous post; I accidentally hit send while the message was still unfinished.)
Hello everyone.

For those interested, this post is a sequel to:
http://www.mailinglistarchive.com/html/freebsd-questions%40freebsd.org/2011-06/msg00018.html
However, I'll summarize.
At the beginning of June, I installed two WD 1TB Caviar Green SATA drives into an Intel-S5000-based production box of mine and it was hell!
This server runs 7.3/i386 off a SAS RAID and the two new drives should have worked with gstripe to constitute a secondary storage.
I started getting:
ad4: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
ad4: WARNING - SMART taskqueue timeout - completing request directly
ad8: WARNING - SMART taskqueue timeout - completing request directly
ad4: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
ad4: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
ad8: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
ad4: WARNING - SETFEATURES ENABLE RCACHE taskqueue timeout - completing request directly
and the box would reboot within minutes.
This also prevented me from running tests with smartctl.
Note that the box previously had a single SATA drive working perfectly.
It was suggested that I run wdidle.exe from DOS to prevent the drives from spinning down, and it helped: now I was at least able to fsck the stripe and copy something onto it.
Still, I kept getting the above messages; the drives would also occasionally hang and then restart. Uptime rose to some hours, but the box would still reboot.
In the meantime the drives went bad (confirmed by smartd, the BIOS and the WD tools) and I had them replaced.
When they came back, I decided to put up a test box: the hardware is completely different from the production box, but FreeBSD will still run from a SCSI drive and the two WDs will constitute an additional stripe.
First I ran the WD tools to check the drives and they passed every test (including the long one).
So I installed FreeBSD 7.3/i386, smartctl and verified the disks again.
I created the stripe, fscked it, and copied about 420GB of data via rsync over NFS. It seemed to work fine, but, after about 15 hours, the box rebooted after:
ad6: FAILURE - device detached
g_vfs_done():stripe/backup[WRITE(offset=1709926940672,length=131072)]error = 6
/mnt/local: got error 6 while accessing filesystem
panic: softdep_deallocate_dependencies: unrecovered I/O error
Subsequent retries always gave the same results, until I disabled softupdates on the stripe. I was then able to complete the rsync.
Not quite happy, I made a local-to-local copy and started getting a lot of:
Jul 24 18:54:28 mydavid kernel: ad4: WARNING - READ_DMA48 UDMA ICRC error (retrying request) LBA=1620416000
Jul 24 18:54:28 mydavid kernel: ad4: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=1620416000
Jul 24 18:54:28 mydavid kernel: g_vfs_done():stripe/backup[READ(offset=1659305967616,length=131072)]error = 5
Jul 24 18:54:42 mydavid kernel: ad6: WARNING - READ_DMA48 UDMA ICRC error (retrying request) LBA=1621920384
Jul 24 18:54:42 mydavid kernel: ad6: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=1621920384
Jul 24 18:54:42 mydavid kernel: g_vfs_done():stripe/backup[READ(offset=1660846522368,length=131072)]error = 5
I ran smartctl's short test on both drives and they were OK; I tried the offline tests, but they got interrupted (???).
In spite of the messages above, it looked like it was working...
However, I was logged in via ssh and had to turn off the client, so I stopped it, went to the console and started it again.
Now it looks like one drive is not working fine anymore...
Jul 24 23:48:36 mydavid kernel: ad6: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=1671887488
Jul 24 23:48:36 mydavid kernel: g_vfs_done():stripe/backup[READ(offset=1712012836864,length=131072)]error = 5
Jul 24 23:48:39 mydavid kernel: ad6: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=1671897856
Jul 24 23:48:39 mydavid kernel: g_vfs_done():stripe/backup[READ(offset=1712023420928,length=131072)]error = 5
Jul 24 23:48:41 mydavid kernel: ad6: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=1671897888
Jul 24 23:48:41 mydavid kernel: g_vfs_done():stripe/backup[READ(offset=1712023486464,length=131072)]error = 5
Also, smartd is complaining:
Jul 24 23:41:59 mydavid smartd[2630]: Device: /dev/ad6, 38 Currently unreadable (pending) sectors
Jul 24 23:50:56 mydavid smartd[538]: Device: /dev/ad6, 39 Currently unreadable (pending) sectors
After a reboot, I've got back to the NID_NOT_FOUND errors...
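
When comparing runs like the ones above, it can help to count the ata(4) messages per device and error type rather than eyeballing the log. The following is only a rough sketch in Python, not part of the original report: the function names `classify` and `tally_ata_errors` are made up for illustration, and the regular expressions assume the message layout shown in the excerpts above, one message per line.

```python
import re
from collections import Counter

# Assumed layout, taken from the excerpts in this post, e.g.
#   ad6: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=...
ATA_LINE = re.compile(r"\b(?P<dev>ad\d+): (?P<level>WARNING|FAILURE) - (?P<what>.*)")
ERROR_NAME = re.compile(r"error=[0-9a-fA-F]+<(?P<name>[A-Z_]+)>")


def classify(what):
    """Reduce the message text to a short label (hypothetical helper)."""
    named = ERROR_NAME.search(what)
    if named:
        return named.group("name")      # e.g. NID_NOT_FOUND, UNCORRECTABLE
    if "ICRC" in what:
        return "ICRC"                   # interface CRC error, request retried
    if "taskqueue timeout" in what:
        return "TIMEOUT"
    if "device detached" in what:
        return "DETACHED"
    return "OTHER"


def tally_ata_errors(lines):
    """Count (device, level, label) triples; non-ata lines are ignored."""
    counts = Counter()
    for line in lines:
        m = ATA_LINE.search(line)
        if m:
            counts[(m.group("dev"), m.group("level"), classify(m.group("what")))] += 1
    return counts


SAMPLE = [
    "Jul 24 18:54:28 mydavid kernel: ad4: WARNING - READ_DMA48 UDMA ICRC error (retrying request) LBA=1620416000",
    "Jul 24 18:54:28 mydavid kernel: ad4: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=1620416000",
    "Jul 24 18:54:28 mydavid kernel: g_vfs_done():stripe/backup[READ(offset=1659305967616,length=131072)]error = 5",
    "Jul 24 23:48:36 mydavid kernel: ad6: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=1671887488",
    "Jul 24 23:48:39 mydavid kernel: ad6: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=1671897856",
]

for key, count in sorted(tally_ata_errors(SAMPLE).items()):
    print(*key, count)
```

Feeding it the lines of /var/log/messages instead of `SAMPLE` would show at a glance whether ICRC errors (which usually point at cabling or the controller) or UNCORRECTABLE errors (which usually point at the media) dominate on each drive.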




While I'm still conducting other tests, has anyone any hint on this?

Just a shot in the dark: are your drives of the "green" kind, such as Western Digital Caviar Green?
Also, since they are ATA drives, make sure you are using 80-conductor ribbon cables and that DMA is properly enabled in the BIOS.

You can also try reducing the DMA level; it is probably on UDMA5 by default, so try UDMA 4 (aka UDMA/66) or UDMA 3.
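
For reference, here is what stepping down through those modes means in terms of nominal bandwidth. This is just an illustrative lookup table in Python using the standard Ultra DMA burst rates; `describe_udma` is a made-up helper name, and nothing here talks to the driver.

```python
# Nominal maximum burst rates (MB/s) of the Ultra DMA modes.
UDMA_MAX_MB_PER_S = {0: 16.7, 1: 25.0, 2: 33.3, 3: 44.4, 4: 66.7, 5: 100.0, 6: 133.0}

# Only some modes have a widely used marketing name.
UDMA_COMMON_NAME = {2: "UDMA/33", 4: "UDMA/66", 5: "UDMA/100", 6: "UDMA/133"}


def describe_udma(mode):
    """Return a one-line description of a UDMA mode number (hypothetical helper)."""
    rate = UDMA_MAX_MB_PER_S[mode]
    name = UDMA_COMMON_NAME.get(mode)
    label = f"UDMA {mode}" + (f" (aka {name})" if name else "")
    return f"{label}: up to {rate} MB/s"


for mode in (5, 4, 3):
    print(describe_udma(mode))
```

Even UDMA 3 leaves plenty of headroom for a backup stripe, so lowering the mode costs little if it makes the ICRC errors go away.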



 bye & Thanks
    av.
_______________________________________________
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions

To unsubscribe, send any mail to "freebsd-questions-unsubscr...@freebsd.org"

