On May 30, 2006, at 2:16 PM, Richard Elling wrote:
> [assuming we're talking about disks and not "hardware RAID arrays"...]
It'd be interesting to know how many customers plan to use raw disks,
and how their performance compares with hardware arrays. (My gut
feeling is that a lot of disks on FC probably isn't too bad, though on
parallel SCSI the negotiation overhead and lack of fairness were awful;
I haven't tested this.)
> On Tue, 2006-05-30 at 11:43 -0500, Anton Rang wrote:
> Sure, the block size may be 128 KB, but ZFS can bundle more than one
> per file/transaction.
But it doesn't right now, as far as I can tell.
> The protocol overhead is still orders of magnitude faster than a rev.
> Sure, there are pathological cases such as FC-AL over 200 km with
> 100+ nodes, but most folks won't hurt themselves like that.
OK. Let's take 4 Gb FC (e.g., array hardware). Sending 128 KB will take
roughly 330 microseconds. If we're going to achieve 95% of the
theoretical rate, then each transaction can have no more than 5% of
that for overhead, or about 16 microseconds. That's pretty darn fast.
For that matter, the Solaris host would have to initiate 3,000 writes
per second to keep the channel busy. For each channel. And a host
might well have 20 channels.
Can our FC stack do that? Not yet, though it's been looked at....
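Back-of-envelope, as a rough Python sketch (the ~400 MB/s of usable
payload on 4 Gb FC after encoding and framing is my assumption, not a
measured number):

  payload_rate = 400e6            # bytes/sec usable on 4 Gb FC (assumed)
  xfer = 128 * 1024               # one 128 KB write

  wire_time = xfer / payload_rate         # ~330 microseconds on the wire
  overhead_budget = 0.05 * wire_time      # ~16 us if we want 95% efficiency
  cmds_per_sec = payload_rate / xfer      # ~3,050 writes/sec to stay busy

  print("wire time        %6.0f us" % (wire_time * 1e6))
  print("overhead budget  %6.1f us" % (overhead_budget * 1e6))
  print("writes/sec       %6.0f" % cmds_per_sec)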
At 16 MB [why 16? because we can't do 32 MB in a WRITE(10) command] we
have some more leeway. Sending 16 MB will take roughly 42 ms. Each
transaction can take 5% of that, or 2 ms, for overhead, and still reach
the 95% mark. And we only need to issue 24 commands per second to keep
the channel saturated. No problem....
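Same sketch for the 16 MB case, plus the WRITE(10) ceiling (its 16-bit
transfer length in 512-byte blocks tops out just short of 32 MB), still
assuming ~400 MB/s of usable payload:

  payload_rate = 400e6
  xfer = 16 * 1024 * 1024

  wire_time = xfer / payload_rate         # ~42 ms on the wire
  overhead_budget = 0.05 * wire_time      # ~2 ms allowed per command
  cmds_per_sec = payload_rate / xfer      # ~24 commands/sec keeps it full

  write10_max = 65535 * 512               # largest WRITE(10) transfer
  print("wire time        %5.1f ms" % (wire_time * 1e3))
  print("overhead budget  %5.1f ms" % (overhead_budget * 1e3))
  print("commands/sec     %5.1f" % cmds_per_sec)
  print("WRITE(10) max    %d bytes (just under 32 MB)" % write10_max)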
Single disks still run FC at 2 Gb, so the rates above are roughly
halved (and the per-command times doubled), and since it takes 2-4
disks to max out a channel, you can also multiply the allowable
overhead time on the disk by a factor of 2-4. That gives the disk
about 16 * 2 * 4 = 128 microseconds to process a command. The disk
might be able to do that. Solaris (and the HBA) still need to push out
1,500 writes per second (per channel), though. A good HBA may be able
to do that....
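The per-disk arithmetic, assuming ~200 MB/s of usable payload on 2 Gb
FC and taking the high end of 4 disks per channel:

  payload_rate = 200e6            # bytes/sec usable on 2 Gb FC (assumed)
  xfer = 128 * 1024
  disks_per_channel = 4           # 2-4 disks to saturate; use 4 here

  wire_time = xfer / payload_rate               # ~655 us per transfer
  channel_budget = 0.05 * wire_time             # ~33 us overhead per command
  per_disk_budget = channel_budget * disks_per_channel   # ~130 us per disk
  cmds_per_sec = payload_rate / xfer            # ~1,500 writes/sec/channel

  print("per-disk budget   %4.0f us" % (per_disk_budget * 1e6))
  print("writes/sec/chan   %4.0f" % cmds_per_sec)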
> For modern disks, multiple 128 KB transfers will spend a long time
> in the disk's buffer cache waiting to be written to media.
They shouldn't spend that long, really. Today's Cheetah has a
200 MB/sec interface and a 59-118 MB/sec transfer rate to media, so at
best we can fill the cache a little over twice as fast as it empties.
(Once we put multiple disks on the channel, it's easy to have the cache
empty faster than we fill it -- this is actually the desirable case,
so that we're not waiting on the media.)
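Roughly, using the interface and media rates above (which zone you land
in decides where you fall in the range):

  interface = 200e6                     # bytes/sec across FC to the drive
  media_slow, media_fast = 59e6, 118e6  # bytes/sec to the platters, by zone

  xfer = 128 * 1024
  drain_slow = xfer / media_slow * 1e3  # ~2.2 ms to flush one block
  drain_fast = xfer / media_fast * 1e3  # ~1.1 ms on the fast zones

  # The interface outruns the media by ~1.7x on the outer zones and
  # ~3.4x on the inner ones -- call it a bit over 2x in the middle.
  print("drain per 128 KB: %.1f-%.1f ms" % (drain_fast, drain_slow))
  print("fill/drain ratio: %.1fx-%.1fx" % (interface / media_fast,
                                           interface / media_slow))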
> Very few disks have 16 MB write buffer caches, so if you want to send
> such a large iop down the wire (DAS please, otherwise you kill the
> SAN), then you'll be waiting on the media anyway. The disk
> interconnect is faster than the media speed. I don't see how you
> could avoid blowing a rev in that case.
Yes, we'll wait on the media. We'll never lose a rev, though. Each
track on a Cheetah holds an average of 400 KB (1.6 MB per cylinder), so
each time we change tracks, we'll likely have the buffer full with all
the data for the track. Even if we don't, FC transfers data out of
order, so the drive can re-order if it deems necessary (in the
desirable cache-empty case).
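For scale, here's what a 16 MB write looks like against those
track/cylinder figures (same sketch style; media rates as above):

  xfer = 16 * 1024 * 1024
  track_bytes = 400e3                   # ~400 KB per track (average)
  cyl_bytes = 1.6e6                     # ~1.6 MB per cylinder
  media_slow, media_fast = 59e6, 118e6

  tracks = xfer / track_bytes           # ~42 tracks touched
  cylinders = xfer / cyl_bytes          # ~10 cylinders
  t_fast = xfer / media_fast * 1e3      # ~140 ms on the fastest zones
  t_slow = xfer / media_slow * 1e3      # ~280 ms on the slowest

  # With the link running faster than the media, the buffer stays ahead
  # of the heads for the whole transfer, so no revolution is lost.
  print("tracks: %.0f  cylinders: %.0f" % (tracks, cylinders))
  print("time on media: %.0f-%.0f ms" % (t_fast, t_slow))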
But until we have a well-configured test system to benchmark, this is
rather academic. :-) I suspect our customers will quickly tell us how
well ZFS works in their environments. Hopefully the answer will be
"very well" for the 95% of customers whose requirements fall in the
middle of the distribution; for those on the "radical fringe" of I/O
requirements, there will likely be more work to do.
I'll wander off to wait for some real data. ;-)
-- Anton
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss