OK, so let's consider your 2MB read. You have the option of
putting it in one contiguous place on the disk or splitting it
into 16 x 128K chunks, spread somewhat all over.

Now you issue a read to that 2MB of data.

As you noted, you either have to wait for the head to find
the 2MB block and stream it, or you dump 16 I/O descriptors
into an intelligent controller; wherever the head is, there
is data to be gotten from the get-go. I can't swear it wins
the game, but it should be real close.

Well, the full specs aren't available, but a little math and
studying some models can get us close.  :-)

Let's presume we're using an enterprise-class disk, say a 37 GB
Seagate Cheetah.  This is best-case for seeks as it uses so
little of the platter and runs at 15K RPM.

Large-block case:

On average, to reach the 2 MB, we'll take 3.5 ms.  Transfer can
then proceed at media rate (average 110 MB/sec) and be sent to
the host over a 200 MB/sec channel.  3.5 ms seek, 18.1 ms data
transfer, total time 21.6 ms, for a rate of 92.6 MB/sec.

Small-block case:

Each seek will be shorter than the average since we are ordering
them optimally.  A single-track seek is 0.2 ms and the average is
3.5 ms; if we assume linear scaling (which isn't quite right) then
we're looking at 1/8 of 3.7 ms = 0.46 ms per seek.  We do 16 seeks,
for 7.36 ms, and our data transfer time is the same (18.1 ms), for
a total of 25.46 ms and a rate of 78.5 MB/sec.
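
(Same back-of-envelope for the split-up case, again just restating
the numbers above:)

/* Small-block case: 16 short seeks plus the same media transfer. */
#include <stdio.h>

int main(void)
{
    double seek_ms  = 16.0 * 0.46;   /* 16 reordered seeks, ~0.46 ms each */
    double xfer_ms  = 18.1;          /* same 2 MB of media transfer */
    double total_ms = seek_ms + xfer_ms;

    printf("total %.2f ms -> %.1f MB/sec\n",
        total_ms, 2.0 / (total_ms / 1000.0));
    return 0;
}

which lands right on the 25.46 ms and ~78.5 MB/sec above.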

Not too bad.  It's pretty clear why these drives are pricey.  :-)

Mmmm, actually it's not that good.  There are 50K tracks on this
35 GB disk, so each track holds 700 KB.  We're only storing 128 KB
on each track, so on average we'll need to wait nearly 1/2 of a
revolution before we see any of our data under the head.  At 15K
RPM that's not so bad, only 2 ms, but we've got 16 of those waits,
adding 32 ms and dropping our rate to roughly half what we'd get
otherwise.  (Older disks should, surprisingly, do better here,
since they have less data packed onto each track!)
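
(And the rotational charge, same style -- the only new input is
the 4 ms revolution time at 15K RPM:)

/* Small-block case, now charging ~1/2 revolution per 128K chunk. */
#include <stdio.h>

int main(void)
{
    double rev_ms   = 60.0 / 15000.0 * 1000.0;  /* 4 ms per revolution */
    double rot_ms   = 16.0 * rev_ms / 2.0;      /* ~2 ms wait, 16 times */
    double total_ms = 25.46 + rot_ms;           /* seeks + transfer, from before */

    printf("total %.1f ms -> %.1f MB/sec\n",
        total_ms, 2.0 / (total_ms / 1000.0));
    return 0;
}

That comes to 57.5 ms and about 35 MB/sec -- a bit under half of
the 78.5 MB/sec we had before the rotational waits.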

Looking at a 250 GB "near-line" SATA disk, and presuming its
controller does the same optimizations, things are different.
Average seek time is 8 ms, with a single-track seek time of
0.8 ms, so 15 additional seeks will cost roughly 30 ms.  A
half-rotation wait is 4 ms (roughly 60 ms in total).  Things
are going pretty slow now.
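
(One more, for the 7200 RPM drive.  No media rate is quoted for
it, so this only totals the positioning overhead, which by itself
bounds the result:)

/* Near-line SATA: positioning overhead for 16 x 128K chunks. */
#include <stdio.h>

int main(void)
{
    double seek_ms = 8.0 + 15.0 * 2.0;         /* 1 average + 15 short seeks */
    double rev_ms  = 60.0 / 7200.0 * 1000.0;   /* ~8.3 ms per revolution */
    double rot_ms  = 16.0 * rev_ms / 2.0;      /* ~4 ms wait per chunk */
    double overhead_ms = seek_ms + rot_ms;

    /* Even with an infinitely fast media rate, 2 MB / overhead caps us. */
    printf("overhead %.0f ms -> at most %.1f MB/sec\n",
        overhead_ms, 2.0 / (overhead_ms / 1000.0));
    return 0;
}

Roughly 105 ms of positioning overhead, so under 20 MB/sec even
before the (slower) media rate gets a vote.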

I just did an experiment and could see > 60 MB/sec out of
a 35G disk using 128K chunks (> 450 IOPS).

On the only disk I have handy, I get 36 MB/sec with concurrent
128 KB chunks, 38 MB/sec with non-concurrent 2 MB chunks,
39 MB/sec with 2 MB chunks.  But I'm issuing all of these I/O
operations sequentially -- no seeks.

Disruptive.

What is?

Multiple I/Os outstanding to a device isn't precisely new.  ;-)
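
(For the record, here's roughly what "multiple I/Os outstanding"
looks like from user land -- a minimal POSIX AIO sketch, nothing
ZFS-specific, with the chunk size and offsets made up purely for
illustration:)

/* Queue 16 x 128K reads at once and let the lower layers order them. */
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define NCHUNKS 16
#define CHUNK   (128 * 1024)

int main(int argc, char **argv)
{
    struct aiocb cb[NCHUNKS];
    const struct aiocb *list[NCHUNKS];
    char *buf;
    int fd, i;

    if (argc < 2) {
        fprintf(stderr, "usage: %s <file-or-device>\n", argv[0]);
        return 1;
    }
    if ((fd = open(argv[1], O_RDONLY)) < 0) {
        perror("open");
        return 1;
    }
    if ((buf = malloc((size_t)NCHUNKS * CHUNK)) == NULL) {
        perror("malloc");
        return 1;
    }

    /* Queue all 16 reads before waiting on any of them. */
    memset(cb, 0, sizeof (cb));
    for (i = 0; i < NCHUNKS; i++) {
        cb[i].aio_fildes = fd;
        cb[i].aio_buf    = buf + (size_t)i * CHUNK;
        cb[i].aio_nbytes = CHUNK;
        cb[i].aio_offset = (off_t)i * CHUNK;  /* made-up layout */
        if (aio_read(&cb[i]) != 0) {
            perror("aio_read");
            return 1;
        }
        list[i] = &cb[i];
    }

    /* Now collect them, in whatever order they complete. */
    for (i = 0; i < NCHUNKS; i++) {
        while (aio_error(&cb[i]) == EINPROGRESS)
            (void) aio_suspend(list, NCHUNKS, NULL);
        if (aio_return(&cb[i]) < 0)
            perror("aio_return");
    }

    free(buf);
    (void) close(fd);
    return 0;
}

Compile with -lrt on most systems.  The point is only that all 16
reads are in flight before we wait on any of them, so the drive or
driver can reorder them however it likes.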

Honestly, adding seeks is -never- going to improve performance.
Giving the drive the opportunity to reorder I/O operations will,
but splitting a single operation up can never speed it up, though
if you get lucky it won't slow down.

Anton
