On Jul 28, 2011, at 4:55 AM, Koopmann, Jan-Peter wrote:

> Hi,
>
> my system is running oi148 on a super micro X8SIL-F board. I have two pools
> (2 disc mirror, 4 disc RAIDZ) with RAID level SATA drives. (Hitachi HUA72205
> and SAMSUNG HE103UJ). The system runs as expected however every few days
> (sometimes weeks) the system comes to a halt due to these errors:
>
> Dec 3 13:51:20 nasjpk gda: [ID 107833 kern.warning] WARNING:
> /pci@0,0/pci-ide@1f,2/ide@1/cmdk@0,0 (Disk1):
> Dec 3 13:51:20 nasjpk Error for commandX 'read sector' Error Level: Fatal
> Dec 3 13:51:20 nasjpk gda: [ID 107833 kern.notice] Requested Block
> 5503936, Error Block: 5503936
> Dec 3 13:51:20 nasjpk gda: [ID 107833 kern.notice] Sense Key:
> uncorrectable data error
> Dec 3 13:51:20 nasjpk gda: [ID 107833 kern.notice] Vendor 'Gen-ATA '
> error code: XX7

Several things:

1. You are using SATA in IDE-compatibility mode. This is usually a BIOS
   setting, and for most BIOSes IDE-compatibility mode is the default.
   Changing to AHCI is an improvement that includes better error monitoring.

2. In this case, the disk is returning an unrecoverable read error. This is
   the most common error for modern HDDs.

3. When #2 happens, consumer-grade disks can get stuck retrying forever.
   Enterprise-class drives limit their retries. For the retry-forever disks,
   the OS is responsible for ultimately timing out the I/O attempt. For many
   Solaris releases, the default retry/timeout cycle lasts 3 to 5 minutes.
   Because of #1, the disk cannot service more than one outstanding I/O, so
   all I/O to the disk is blocked, impacting the rest of the pool.

>
> It is not related to this one disk. It happens on all disks. Sometimes
> several are listed before the system "crashes", sometimes just one. I cannot
> pinpoint it to a single defect disk though (and already have replaced the
> disks). I suspect that this is an error with the SATA controller or the
> driver. Can someone give me a hint on whether or not that assumption sounds
> feasible? I am planning on getting a new "cheap" 6-8 way SATA2 or SATA3
> controller and switch over the drives to that controller. If it is
> driver/controller related the problem should disappear. Is it possible to
> simply reconnect the drives and all is going to be well or will I have to
> reinstall due to different SATA "layouts" on the disks or alike?

The ease of migration depends on your HBA and whether it writes metadata that
is not compatible with other HBAs. For simple HBAs, it is quite common for
disks to be migrated to other machines and the pool imported.

HTH,
 -- richard
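
To put rough numbers on point 3 above, here is a minimal sketch of why one stuck read stalls everything queued behind it when the disk can only service one outstanding I/O. The 10 ms healthy read time and the queue contents are made-up illustrative values. The 180 s figure is just the low end of the 3 to 5 minute cycle mentioned above, not a measured value from this system.

```python
from itertools import accumulate

# Assumed, illustrative values (not measured on the system in this thread).
HEALTHY_READ_MS = 10          # hypothetical service time of a normal read
TIMEOUT_CYCLE_MS = 180_000    # 3 minutes: low end of the 3-5 minute cycle


def completion_times_ms(service_times_ms):
    """Completion time of each I/O when only one I/O can be outstanding.

    Each request must wait for every request ahead of it, so the
    completion time is the running sum of the service times.
    """
    return list(accumulate(service_times_ms))


# One failing read at the head of the queue, three healthy reads behind it.
queue = [TIMEOUT_CYCLE_MS] + [HEALTHY_READ_MS] * 3
times = completion_times_ms(queue)

for i, t in enumerate(times):
    print(f"I/O {i} completes after {t / 1000:.2f} s")
```

Under these assumptions, every healthy read behind the failing one waits at least the full timeout cycle, which is what looks like a hang from the pool's point of view.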

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss