On Thu, 2007-08-30 at 15:28 -0700, Richard Elling wrote:
> Jeffrey W. Baker wrote:
> > # zfs set recordsize=2K tank/bench
> > # randomio bigfile 10 .25 .01 2048 60 1
> >
> >   total |  read:         latency (ms)       |  write:        latency (ms)
> >    iops |   iops   min    avg    max   sdev |   iops   min    avg    max   sdev
> > --------+-----------------------------------+----------------------------------
> >   463.9 |  346.8   0.0   21.6  761.9   33.7 |  117.1   0.0   21.3  883.9   33.5
> >
> > Roughly the same as when the RS was 128K.  But, if I set the RS to 2K
> > before creating bigfile:
> >
> >   total |  read:         latency (ms)       |  write:        latency (ms)
> >    iops |   iops   min    avg    max   sdev |   iops   min    avg    max   sdev
> > --------+-----------------------------------+----------------------------------
> >   614.7 |  460.4   0.0   18.5  249.3   14.2 |  154.4   0.0    9.6  989.0   27.6
> >
> > Much better!  Yay!  So I assume you would always set RS=8K when using
> > PostgreSQL, etc?
>
> I presume these are something like Seagate DB35.3 series SATA 400 GByte drives?
> If so, then the spec'ed average read seek time is < 11 ms and rotational delay
> is 7,200 rpm.  So the theoretical peak random read rate per drive is ~66 iops.
> http://www.seagate.com/ww/v/index.jsp?vgnextoid=01117ea70fafd010VgnVCM100000dd04090aRCRD&locale=en-US#

400GB 7200.10, which have slightly better seek specs.
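For what it's worth, the quoted figures do follow from simple seek-plus-rotation
arithmetic.  A rough back-of-envelope (the ~11 ms read seek is from the quoted
datasheet; the ~12 ms write seek is my assumption for "a little longer"):

  awk 'BEGIN {
    rot = 60.0/7200/2 * 1000   # half a revolution at 7,200 rpm, ~4.17 ms
    r = 1000/(11 + rot)        # ~11 ms average read seek, per the datasheet
    w = 1000/(12 + rot)        # write seek assumed ~1 ms longer than read
    printf "read:  %.0f iops/disk, %.0f iops across 8 disks\n", r, r*8
    printf "write: %.0f iops/disk, %.0f iops net of mirroring\n", w, w*8/2
  }'

which prints roughly 66/527 for reads and 62/247 for writes, lining up (modulo
rounding) with the per-disk and 8-disk numbers quoted here.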
> For an 8-disk mirrored set, the max theoretical random read rate is 527 iops.
> I see you're getting 460, so you're at 87% of theoretical.  Not bad.
>
> When writing, the max theoretical rate is a little smaller because of the longer
> seek time (see datasheet) so we can get ~62 iops per disk.  Also, the total is
> divided in half because we have to write to both sides of the mirror.  Thus the
> peak is 248 iops.  You see 154 or 62% of peak.

I think this line of reasoning is a bit misleading, since the reads and the
writes are happening simultaneously, at a ratio of 3:1 in favor of reads, and
with 1% of the writes followed by an fsync.  With all writes and no fsyncs,
it's more like this:

  total |  read:         latency (ms)       |  write:        latency (ms)
   iops |   iops   min    avg    max   sdev |   iops   min    avg    max   sdev
--------+-----------------------------------+----------------------------------
  364.1 |    0.0   Inf   -NaN    0.0   -NaN |  364.1   0.0   27.4 1795.8   69.3

Which is altogether respectable.

> For simultaneous reads and writes, 614 iops is pretty decent, but it makes me
> wonder if the spread is much smaller than the full disk.

Sure it is.  4GiB << 1.2TiB.  If I spread it out over 128GiB, it's much
slower, but it seems that would apply to any filesystem:

  total |  read:         latency (ms)       |  write:        latency (ms)
   iops |   iops   min    avg    max   sdev |   iops   min    avg    max   sdev
--------+-----------------------------------+----------------------------------
  190.8 |  143.4   0.0   53.4  254.4   26.6 |   47.4   3.6   49.4  558.8   29.4

-jwb
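P.S.  To make the recordsize point above concrete: the property only applies to
blocks written after it is set, so it has to be in place before the data files
are created.  For PostgreSQL and its 8 KB pages, that would presumably look
something like this (pool and dataset names are made up):

  zfs create tank/pgdata
  zfs set recordsize=8K tank/pgdata    # match PostgreSQL's 8 KB page size
  # then initdb or restore into /tank/pgdata; files written before the
  # property change keep their old block size, so recreate or copy them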