On Thu, 2007-08-30 at 15:28 -0700, Richard Elling wrote:
> Jeffrey W. Baker wrote:
> > # zfs set recordsize=2K tank/bench
> > # randomio bigfile 10 .25 .01 2048 60 1
> > 
> >   total |  read:         latency (ms)       |  write:        latency (ms)
> >    iops |   iops   min    avg    max   sdev |   iops   min    avg    max   sdev
> > --------+-----------------------------------+----------------------------------
> >   463.9 |  346.8   0.0   21.6  761.9   33.7 |  117.1   0.0   21.3  883.9   33.5
> > 
> > Roughly the same as when the RS was 128K.  But, if I set the RS to 2K
> > before creating bigfile:
> > 
> >   total |  read:         latency (ms)       |  write:        latency (ms)
> >    iops |   iops   min    avg    max   sdev |   iops   min    avg    max   sdev
> > --------+-----------------------------------+----------------------------------
> >   614.7 |  460.4   0.0   18.5  249.3   14.2 |  154.4   0.0    9.6  989.0   27.6
> > 
> > Much better!  Yay!  So I assume you would always set RS=8K when using
> > PostgreSQL, etc?
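
(Partially answering my own question: recordsize only applies to files
written after the property is set, so for an 8K-page database it needs to
be in place before the data files exist.  A hypothetical setup, before
running initdb or restoring a dump:

# zfs create tank/pgdata                # stand-in name for the cluster's dataset
# zfs set recordsize=8k tank/pgdata     # set before any data files are written

and only then point PostgreSQL at it.)
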
> 
> I presume these are something like Seagate DB35.3 series SATA 400 GByte 
> drives?
> If so, then the spec'ed average read seek time is < 11 ms and the rotational
> speed is 7,200 rpm, so the theoretical peak random read rate per drive is
> ~66 iops.
> http://www.seagate.com/ww/v/index.jsp?vgnextoid=01117ea70fafd010VgnVCM100000dd04090aRCRD&locale=en-US#

400GB 7200.10, which have slightly better seek specs.
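
His per-drive figure checks out, though: average rotational latency at
7,200 rpm is half a revolution, ~4.17 ms, on top of the ~11 ms average
read seek, so roughly 15.2 ms per random read.  A quick sanity check:

# echo 'scale=1; 1000 / (11 + 4.17)' | bc    # seek + half-rotation, in ms
65.9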

> For an 8-disk mirrored set, the max theoretical random read rate is 527 iops.
> I see you're getting 460, so you're at 87% of theoretical.  Not bad.
> 
> When writing, the max theoretical rate is a little smaller because of the
> longer seek time (see datasheet), so we can get ~62 iops per disk.  Also,
> the total is divided in half because we have to write to both sides of the
> mirror.  Thus the peak is 248 iops.  You see 154, or 62% of peak.
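
Checking his totals the same way, using ~12 ms for the write seek (my
reading of the datasheet, and it's what the ~62 iops/disk figure implies):

# echo 'scale=1; 8 * 1000 / (11 + 4.17)' | bc        # reads: all 8 spindles
527.3
# echo 'scale=1; 8 * 1000 / (12 + 4.17) / 2' | bc    # writes: both mirror halves
247.3

i.e. ~527 iops reading, since either half of each mirror can service a
read, and ~248 iops writing, since every write has to hit both halves.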

That said, I think this line of reasoning is a bit misleading, since the
reads and the writes are happening simultaneously, with a ratio of 3:1 in
favor of reads, and 1% of the writes followed by an fsync.  With all
writes and no fsyncs, it's more like this:

  total |  read:         latency (ms)       |  write:        latency (ms)
   iops |   iops   min    avg    max   sdev |   iops   min    avg    max   sdev
--------+-----------------------------------+----------------------------------
  364.1 |    0.0   Inf   -NaN    0.0   -NaN |  364.1   0.0   27.4 1795.8   69.3

Which is altogether respectable.
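
(If I'm reading randomio's arguments right -- the third being the write
fraction and the fourth the fsync fraction -- that run was just the same
invocation with those two dialed to all writes and no fsyncs, roughly:

# randomio bigfile 10 1 0 2048 60 1       # all writes, no fsyncs

against the same file.)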

> For simultaneous reads and writes, 614 iops is pretty decent, but it makes me
> wonder if the spread is much smaller than the full disk.

Sure it is.  4GiB << 1.2TiB.  If I spread the I/O out over 128GiB it's much
slower, but it seems that would apply to any filesystem:

  190.8 |  143.4   0.0   53.4  254.4   26.6 |   47.4   3.6   49.4  558.8   29.4
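
(That was the same sort of randomio run, just against a 128GiB file
instead of the 4GiB one -- e.g. something created with:

# mkfile 128g bigfile                     # or however the test file gets made

if anyone wants to reproduce it.)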

-jwb
