Al,

That makes so much sense that I can't believe I missed it. One bay was the one 
giving me the problems. Switching drives didn't affect that. Switching cabling 
didn't affect that. Changing SATA controllers didn't affect that. However, 
reorienting the case on its side did!

I'll be putting a larger fan into the disk-stack case.

Gary

> On Tue, 14 Aug 2007, Richard Elling wrote:
> 
> > Rick Wager wrote:
> >> We see similar problems on a SuperMicro with 5 500 GB Seagate SATA
> >> drives. This is using the AHCI driver. We do not, however, see problems
> >> with the same hardware/drivers if we use 250GB drives.
> >
> > Duh.  The error is from the disk :-)
> 
> A likely possibility is that the disk drives are simply not getting
> enough (cool) airflow and are overheating during periods of high
> system activity that generates a lot of disk head movement; for
> example, during a zpool scrub.  And the extra platters present in the
> larger disk drives would require even more cooling capacity - which
> would validate your observations.
> 
> Best to actually *measure* the effectiveness of the disk cooling
> design/installation.  Recommendation: investigate the Fluke mini
> infrared thermometers - for example - the Fluke 62 at:
> http://www.testequipmentdepot.com/fluke/thermometers/62.htm
> 
> In some disk drive installations, it's possible for the infrared probe
> to "see" the disk HDA (Head Disk Assembly) without disturbing the
> drive.
> 
> PS: I use a much older Fluke 80T-IR in combination with a digital
> multimeter with millivolt resolution (a Fluke meter of course!).
> 
> >> We sometimes see bad blocks reported (are these automatically remapped
> >> somehow so they are not used again?) and sometimes SATA port resets.
> >
> > Depending on how the errors are reported, the driver may attempt a reset
> > to clear.  The drive may also automatically spare bad blocks.
> >
> >> Here is a sample of the log output. Any help understanding and/or
> >> resolving this issue would be greatly appreciated. I very much don't
> >> want to have freezes in production.
> >>
> >> Aug 14 11:20:28 chazz1  port 2: device reset
> >> Aug 14 11:20:28 chazz1 scsi: [ID 107833 kern.warning] WARNING: /[EMAIL PROTECTED],0/pci15d9,[EMAIL PROTECTED],2/[EMAIL PROTECTED],0 (sd3):
> >> Aug 14 11:20:28 chazz1  Error for Command: write    Error Level: Retryable
> >> chazz1 scsi: [ID 107833 kern.notice]    Requested Block: 530    Error Block: 530
> >> Aug 14 11:20:28 chazz1 scsi: [ID 107833 kern.notice]    Vendor: ATA    Serial Number:
> >> [ID 107833 kern.notice]    Sense Key: No_Additional_Sense
> >> Aug 14 11:20:28 chazz1 scsi: [ID 107833 kern.notice]    ASC: 0x0 (no additional sense info), ASCQ: 0x0, FRU: 0x0
> >
> > This error was transient and retried.  If it was a fatal error (still
> > failed after retries) then you'll have another, different message
> > describing the failed condition.
> >  -- richard
> >
> 
> Regards,
> 
> Al Hopper  Logical Approach Inc, Plano, TX.
>  [EMAIL PROTECTED]
> Voice: 972.379.2133 Fax: 972.379.2134
>   Timezone: US CDT
> OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007
> http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
 
 
This message posted from opensolaris.org