On 2011-12-11 15:10, Nathan Kroenert wrote:
Hey all,
That reminds me of something I have been wondering about... Why only 12x
faster? If we are effectively reading from memory - as compared to a
disk reading at approximately 100MB/s (which is about average for a PC
HDD reading sequentially) - I'd have thought it should be a lot faster than 12x.
Can we really only pull stuff from cache at a little over one
gigabyte per second if it's dedup data?
I believe there are a couple of things in play.
One is that you'd rarely get 100MB/s from a single HDD
due to fragmentation, especially the fragmentation inherent
to ZFS. But you do mention "sequential reading", so that's covered.
Besides, from Pavel's dd examples we see that he first read
at an average of 98MB/s, and then at 1233MB/s.
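(1233 / 98 is roughly 12.6, which is presumably where the "12x"
figure comes from.)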
Another aspect is RAM bandwidth, and we don't know the
specs of Pavel's test rig. For example, DDR2-400 (a 100MHz
memory clock) would peak out at 3200MB/s. That budget would
have to cover walking the (cached) DDT for each block involved,
determining which (cached) data blocks correspond to it, and
fetching them from RAM or disk.
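Rough math, assuming the usual single-channel, 64-bit (8-byte wide)
memory bus:

    400 MT/s * 8 bytes/transfer = 3200 MB/s peak (PC2-3200)

and the sustainable figure would be lower still, since the metadata
walks share that same bus with the data copies.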
I would not be surprised if there is some disk IO adding
delays in the second case (the read of a deduped file
"clone"), because you still have to resolve the references
to this second file's blocks, and a separate path of on-disk
block pointers might lead to them from a separate inode in a
separate dataset (or I might be wrong). Reading this second
path of pointers to the same cached data blocks might decrease
the speed a little.
It would be interesting to see Pavel's test updated with
second reads of both files (now that data and metadata are
all cached in RAM). It's possible that those reads would be
closer to RAM speeds, with no disk IO involved. And I would
be very surprised if the speeds of the two files turned out
to be noticeably different ;)
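Something along these lines (the paths are made up, since I don't
have Pavel's exact file names at hand; bs=1024k is just a convenient
block size):

    # re-read the original file - should now be served from the ARC
    dd if=/tank/ds1/testfile of=/dev/null bs=1024k
    # re-read the deduped copy in the other dataset
    dd if=/tank/ds2/testfile of=/dev/null bs=1024k

GNU dd prints a throughput figure when it finishes; with the stock
Solaris dd you would wrap each run in time(1) and divide the file
size by the elapsed time.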
//Jim