Ross Smith wrote:
> Which again is unacceptable for network storage. If hardware RAID
> controllers took over a minute to time out a drive, network admins
> would be in uproar. Why should software be held to a different
> standard?
You need to take a systems approach to analyzing these things. For
example, how long does an array take to cold boot? When I was Chief
Architect for Integrated Systems Engineering, we had a product which
included a storage array and a server racked together. If you used the
defaults and simulated a power-loss failure scenario, the whole thing
fell apart. Why? Because the server cold booted much faster than the
array. When Solaris started, it looked for the disks, found none
because the array was still booting, and declared those disks dead.
The result was that you needed system administrator intervention to
get the services started again. Not acceptable. The solution was to
delay the server boot to more closely match the array's boot time.

The default timeout values can be changed (the relevant sd driver
tunables are sketched at the end of this message), but we rarely
recommend it. You can get into all sorts of false failure modes with
small timeouts. For example, most disks spec a 30-second spin-up time.
So if your disk is spun down, perhaps for power savings, then you need
a timeout which is greater than 30 seconds by some margin. Similarly,
if you have a CD-ROM hanging off the bus, then you need a long timeout
to accommodate the slow data access of a CD-ROM.

I wrote a Sun BluePrint article discussing some of these issues a few
years ago:
http://www.sun.com/blueprints/1101/clstrcomplex.pdf

> I can understand the driver being persistent if your data is on a
> single disk, however when you have any kind of redundant data there
> is no need for these delays. And there should definitely not be
> delays in returning status information. Whoever heard of a hardware
> RAID controller that takes 3 minutes to tell you which disk has gone
> bad?
>
> I can understand how the current configuration came about, but it
> seems to me that the design of ZFS isn't quite consistent. You do all
> this end-to-end checksumming to double-check that data is consistent
> because you don't trust the hardware, cables, or controllers not to
> corrupt data. Yet you trust that same equipment absolutely when it
> comes to making status decisions.
>
> It seems to me that you either trust the infrastructure or you don't,
> and the safest decision (as ZFS's integrity checking has shown) is
> not to trust it. ZFS would be better off assuming that drivers and
> controllers won't always return accurate status information, and
> having its own set of criteria to determine whether a drive (of any
> kind) is working as expected and returning responses in a timely
> manner.

I don't see any benefit for ZFS to add another set of timeouts over
and above the existing timeouts. Indeed, we often want to delay rash
actions which would cause human intervention or prolonged recovery
later. Sometimes patience is a virtue.
 -- richard

> > Date: Mon, 7 Apr 2008 07:48:41 -0700
> > From: [EMAIL PROTECTED]
> > Subject: Re: [zfs-discuss] OpenSolaris ZFS NAS Setup
> > To: [EMAIL PROTECTED]
> > CC: zfs-discuss@opensolaris.org
> >
> > Ross wrote:
> > > To repeat what some others have said, yes, Solaris seems to
> > > handle an iSCSI device going offline in that it doesn't panic and
> > > continues working once everything has timed out.
> > >
> > > However, that doesn't necessarily mean it's ready for production
> > > use. ZFS will hang for 3 minutes (180 seconds) waiting for the
> > > iSCSI client to time out. Now I don't know about you, but HA to
> > > me doesn't mean "Highly Available, but with occasional 3 minute
> > > breaks".
> > > Most of the client applications we would want to run on ZFS
> > > would be broken by a 3 minute delay in returning data, and this
> > > was enough for us to give up on ZFS over iSCSI for now.
> >
> > By default, the sd driver has a 60 second timeout with either 3 or
> > 5 retries before timing out the I/O request. In other words, for
> > the same failure mode in a DAS or SAN you will get the same
> > behaviour.
> >  -- richard
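For reference, the defaults quoted above correspond to tunables in the
sd driver (and the ssd driver for fibre channel) that can be set in
/etc/system. Treat the lines below as a sketch rather than a
recommendation -- the exact names and default values can vary by
Solaris release, so check the Solaris Tunable Parameters Reference
Manual for your release before changing anything. The worst-case
latency for a dead disk is roughly the per-command timeout multiplied
by the number of retries, e.g. 60 seconds x 3 retries = 180 seconds,
which is the 3 minute hang described above.

  * /etc/system -- sketch only, values shown are illustrative
  * sd_io_time is the per-command timeout in seconds (0x3c = 60)
  * sd_retry_count is the number of retries before the I/O is failed
  set sd:sd_io_time=0x3c
  set sd:sd_retry_count=3
  * equivalent tunables for the ssd (fibre channel) driver
  set ssd:ssd_io_time=0x3c
  set ssd:ssd_retry_count=3

A reboot is needed for /etc/system changes to take effect, and, as
noted earlier in the thread, shrinking these values trades shorter
hangs for a higher risk of false drive failures.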