On Jul 28, 2011, at 4:55 AM, Koopmann, Jan-Peter wrote:

> Hi,
> 
> my system is running oi148 on a super micro X8SIL-F board. I have two pools 
> (2 disc mirror, 4 disc RAIDZ) with RAID level SATA drives. (Hitachi HUA72205 
> and SAMSUNG HE103UJ).  The system runs as expected however every few days 
> (sometimes weeks) the system comes to a halt due to these errors:
> 
> Dec  3 13:51:20 nasjpk gda: [ID 107833 kern.warning] WARNING: 
> /pci@0,0/pci-ide@1f,2/ide@1/cmdk@0,0 (Disk1):
> Dec  3 13:51:20 nasjpk  Error for commandX 'read sector' Error Level: Fatal
> Dec  3 13:51:20 nasjpk gda: [ID 107833 kern.notice]     Requested Block 
> 5503936, Error Block: 5503936
> Dec  3 13:51:20 nasjpk gda: [ID 107833 kern.notice]     Sense Key: 
> uncorrectable data error
> Dec  3 13:51:20 nasjpk gda: [ID 107833 kern.notice]     Vendor 'Gen-ATA ' 
> error code: XX7

Several things:

1. You are using SATA in IDE-compatibility mode. This is usually a BIOS setting,
and for most BIOSes IDE-compatibility mode is the default. Changing to AHCI
is an improvement that includes better error monitoring.

2. In this case, the disk is returning an unrecoverable read error. This is the
most common error for modern HDDs.

3. When #2 happens, consumer-grade disks can get stuck retrying forever.
Enterprise-class drives have limited retry. For the retry-forever disks, the OS
is responsible for ultimately timing out the I/O attempt. For many Solaris
releases, the default retry/timeout cycle lasts 3 to 5 minutes. Because of #1,
the disk cannot service more than one outstanding I/O, so all I/O to the disk
is blocked, impacting the rest of the pool.
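
A quick way to confirm #1 from the logs is to look at the device path in the
warning. The following is only a rough sketch: it assumes that a path going
through pci-ide and cmdk (as in the message quoted above) means the disk is
attached in IDE-compatibility mode. The function names and the second,
AHCI-style path are made up for illustration.

```python
import re

# Path components seen in the warning quoted above. Treating them as markers
# of IDE-compatibility mode is an assumption of this sketch.
LEGACY_MARKERS = ("pci-ide@", "cmdk@")

# The device path follows "WARNING:" once the wrapped lines are rejoined.
WARNING_RE = re.compile(r"WARNING:\s*(/\S+)")

SAMPLE_LOG = """\
Dec  3 13:51:20 nasjpk gda: [ID 107833 kern.warning] WARNING:
/pci@0,0/pci-ide@1f,2/ide@1/cmdk@0,0 (Disk1):
"""


def device_paths(log_text):
    """Return the device paths named in kernel WARNING lines."""
    # Long log lines are wrapped, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    return WARNING_RE.findall(flat)


def is_ide_compat(path):
    """True if the path contains one of the legacy IDE markers."""
    return any(marker in path for marker in LEGACY_MARKERS)


if __name__ == "__main__":
    for path in device_paths(SAMPLE_LOG):
        mode = "IDE-compatibility" if is_ide_compat(path) else "other"
        print(path, "->", mode)
```

If every failing disk reports a path like this, the BIOS setting is the first
thing to check.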

> 
> It is not related to this one disk. It happens on all disks. Sometimes 
> several are listed before the system "crashes", sometimes just one. I cannot 
> pinpoint it to a single defect disk though (and already have replaced the 
> disks). I suspect that this is an error with the SATA controller or the 
> driver. Can someone give me a hint on whether or not that assumption sounds 
> feasible? I am planning on getting a new "cheap" 6-8 way SATA2 or SATA3 
> controller and switch over the drives to that controller. If it is 
> driver/controller related the problem should disappear. Is it possible to 
> simply reconnect the drives and all is going to be well or will I have to 
> reinstall due to different SATA "layouts" on the disks or alike? 

The ease of migration depends on your HBA and whether it writes metadata
that is not compatible with other HBAs. For simple HBAs, it is quite common for
disks to be migrated to other machines and the pool imported.
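
For the simple-HBA case, the move usually comes down to an export before
recabling and an import afterwards. A minimal sketch, assuming a data pool
named "tank" (hypothetical; the pool names are not given in this thread). The
helper is made up for illustration and only builds the command lines, it does
not run anything.

```python
def migration_commands(pool):
    """Return (before, after) lists of argv lists for moving a data pool.

    'before' is run on the old controller, 'after' once the disks are
    attached to the new one.
    """
    before = [
        ["zpool", "export", pool],   # cleanly release the pool
    ]
    after = [
        ["zpool", "import"],         # list pools found on attached disks
        ["zpool", "import", pool],   # import the pool by name
        ["zpool", "status", pool],   # check that all devices are present
    ]
    return before, after


if __name__ == "__main__":
    before, after = migration_commands("tank")  # "tank" is a placeholder
    for argv in before + after:
        print(" ".join(argv))
```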

HTH,
 -- richard


_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
