> -----Original Message-----
> From: Fred Liu
> Sent: Thursday, June 16, 2011 17:28
> To: Fred Liu; 'Richard Elling'
> Cc: 'Jim Klimov'; 'zfs-discuss@opensolaris.org'
> Subject: RE: [zfs-discuss] zfs global hot spares?
>
> Fixing a typo in my last message...
>
> > -----Original Message-----
> > From: Fred Liu
> > Sent: Thursday, June 16, 2011 17:22
> > To: 'Richard Elling'
> > Cc: Jim Klimov; zfs-discuss@opensolaris.org
> > Subject: RE: [zfs-discuss] zfs global hot spares?
> >
> > > This message is from the disk saying that it aborted a command.
> > > These are usually preceded by a reset, as shown here. What caused
> > > the reset condition? Was it actually target 11 or did target 11
> > > get caught up in the reset storm?
> > >
> >
> It happened in the middle of the night, and nobody touched the file
> box. I assume this is the transitional state before the disk is
> *thoroughly* damaged:
>
> Jun 10 09:34:11 cn03 fmd: [ID 377184 daemon.error] SUNW-MSG-ID:
> ZFS-8000-FD, TYPE: Fault, VER: 1, SEVERITY: Major
> Jun 10 09:34:11 cn03 EVENT-TIME: Fri Jun 10 09:34:11 CST 2011
> Jun 10 09:34:11 cn03 PLATFORM: X8DTH-i-6-iF-6F, CSN: 1234567890,
> HOSTNAME: cn03
> Jun 10 09:34:11 cn03 SOURCE: zfs-diagnosis, REV: 1.0
> Jun 10 09:34:11 cn03 EVENT-ID: 4f4bfc2c-f653-ed20-ab13-eef72224af5e
> Jun 10 09:34:11 cn03 DESC: The number of I/O errors associated with a
> ZFS device exceeded
> Jun 10 09:34:11 cn03 acceptable levels. Refer to
> http://sun.com/msg/ZFS-8000-FD for more information.
> Jun 10 09:34:11 cn03 AUTO-RESPONSE: The device has been offlined and
> marked as faulted. An attempt
> Jun 10 09:34:11 cn03 will be made to activate a hot spare if
> available.
> Jun 10 09:34:11 cn03 IMPACT: Fault tolerance of the pool may be
> compromised.
> Jun 10 09:34:11 cn03 REC-ACTION: Run 'zpool status -x' and replace the
> bad device.
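>
> For reference, recovery along the lines the message suggests would be
> roughly this (pool name "tank" and device c0t11d0 are placeholders;
> use whatever 'zpool status -x' actually reports on our box):
>
>   # fmdump -v -u 4f4bfc2c-f653-ed20-ab13-eef72224af5e  # details of the fault event
>   # zpool status -x              # identify the faulted device
>   # zpool replace tank c0t11d0   # swap in a replacement disk
>   # zpool status tank            # watch the resilver progress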
>
> After I rebooted it, I got:
> Jun 10 11:38:49 cn03 genunix: [ID 540533 kern.notice] ^MSunOS Release
> 5.11 Version snv_134 64-bit
> Jun 10 11:38:49 cn03 genunix: [ID 683174 kern.notice] Copyright
> 1983-2010 Sun Microsystems, Inc. All rights reserved.
> Jun 10 11:38:49 cn03 Use is subject to license terms.
> Jun 10 11:38:49 cn03 unix: [ID 126719 kern.info] features:
> 7f7fffff<sse4_2,sse4_1,ssse3,cpuid,mwait,tscp,cmp,cx16,sse3,nx,asysc,
> htt,sse2,sse,sep,pat,cx8,pae,mca,mmx,cmov,de,pge,mtrr,msr,tsc,lgpg>
>
> Jun 10 11:39:06 cn03 scsi: [ID 365881 kern.info]
> /pci@0,0/pci8086,3410@9/pci1000,72@0 (mpt_sas0):
> Jun 10 11:39:06 cn03 mptsas0 unrecognized capability 0x3
>
> Jun 10 11:39:42 cn03 scsi: [ID 107833 kern.warning] WARNING:
> /scsi_vhci/disk@g5000c50009723937 (sd3):
> Jun 10 11:39:42 cn03 drive offline
> Jun 10 11:39:47 cn03 scsi: [ID 107833 kern.warning] WARNING:
> /scsi_vhci/disk@g5000c50009723937 (sd3):
> Jun 10 11:39:47 cn03 drive offline
> Jun 10 11:39:52 cn03 scsi: [ID 107833 kern.warning] WARNING:
> /scsi_vhci/disk@g5000c50009723937 (sd3):
> Jun 10 11:39:52 cn03 drive offline
> Jun 10 11:39:57 cn03 scsi: [ID 107833 kern.warning] WARNING:
> /scsi_vhci/disk@g5000c50009723937 (sd3):
> Jun 10 11:39:57 cn03 drive offline
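>
> To check whether sd3 is really dead (rather than just caught up in the
> resets), I suppose something like this would show it (a sketch; these
> are stock Solaris commands, not anything ZFS-specific):
>
>   # iostat -En    # per-device Soft/Hard/Transport error counts
>   # fmadm faulty  # what FMA currently considers faulted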
> >
> >
> > >
> > > Hot spare will not help you here. The problem is not constrained
> > > to one disk. In fact, a hot spare may be the worst thing here
> > > because it can kick in for the disk complaining about a clogged
> > > expander or spurious resets. This causes a resilver that reads
> > > from the actual broken disk, that causes more resets, that kicks
> > > out another disk that causes a resilver, and so on.
> > > -- richard
> > >
> >
> So warm spares could be a "better" choice in this situation?
> BTW, under what conditions does a SCSI reset storm happen?
> And how can we be immune to it, so that file service is NOT
> interrupted?
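>
> By "warm spares" I mean something like the sketch below: the disk sits
> in the chassis but is removed from the pool's spare list, so the retire
> agent cannot activate it automatically, and we only pull it in by hand
> once the bad disk is confirmed dead (c0t11d0/c0t12d0 are placeholder
> device names):
>
>   # zpool remove tank c0t12d0           # no longer a hot spare; stays in the chassis
>   ... later, after confirming c0t11d0 is really dead ...
>   # zpool replace tank c0t11d0 c0t12d0  # manual, operator-driven replacement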
> >
> >
> > Thanks.
> > Fred
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss