On Apr 10, 2010, at 11:32 PM, valrh...@gmail.com wrote:
> A theoretical question on how ZFS works, for the experts on this board.
> I am wondering how and where ZFS puts the physical data on a mechanical
> hard drive. In the past, I have spent lots of money on 15K rpm SCSI and then
> SAS drives, which of course have great performance. However, given the
> increase in areal density in modern consumer SATA drives, similar performance
> can be reached by short-stroking the drives; that is, the outermost tracks
> are similar in performance to the average of the 15K drives, and sometimes
> exceed their peak.

HDDs and performance do not mix.  SSDs win.  Game over.
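If a full SSD pool is out of budget, a middle ground is to add an SSD to an
existing HDD pool as a read cache or a separate log device. A rough sketch;
the pool name "tank" and the device names below are hypothetical:

# zpool add tank cache c2t0d0      (SSD as an L2ARC read cache)
# zpool add tank log c2t1d0        (SSD as a separate ZIL log device)

Whether that helps depends on the workload: the cache device mainly helps
random reads, the log device mainly helps synchronous writes.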

> My question is how ZFS lays the data out on the disk, and whether there is a
> way to capture some of this benefit effectively. It seems inefficient to
> physically short-stroke any of the drives; it seems more sensible to have ZFS
> handle this (if in fact it has this capability). If I am using mirrored pairs
> of 2 TB drives but only have a few hundred GB of data, and in effect only the
> outer tracks are used, then in practice the performance should be similar to
> nearly-full 15K drives. Given that ZFS can also thin provision, thereby
> disconnecting the virtual space from the physical space on the drives, how
> does the data layout maximize performance?

In general, the space with the lower-numbered LBAs is allocated first.  For many
HDDs, the lower-numbered LBAs are on the outer cylinders.  An easy way to see
the allocations at a high level is to look at the metaslab statistics:

# zdb -m syspool 
Metaslabs: 
        vdev          0
        metaslabs   148   offset                spacemap          free
        ---------------   -------------------   ---------------   -------------
        metaslab      0   offset            0   spacemap     26   free     476M
        metaslab      1   offset     40000000   spacemap     41   free     481M
        metaslab      2   offset     80000000   spacemap     44   free     974M
        metaslab      3   offset     c0000000   spacemap     45   free     935M
        metaslab      4   offset    100000000   spacemap     46   free    1007M
        metaslab      5   offset    140000000   spacemap    110   free     935M
        metaslab      6   offset    180000000   spacemap    111   free    1019M
        metaslab      7   offset    1c0000000   spacemap      0   free       1G
        metaslab      8   offset    200000000   spacemap      0   free       1G
        metaslab      9   offset    240000000   spacemap      0   free       1G
...
        metaslab     27   offset    6c0000000   spacemap      0   free       1G
        metaslab     28   offset    700000000   spacemap     25   free    1012M
        metaslab     29   offset    740000000   spacemap     40   free    1011M
        metaslab     30   offset    780000000   spacemap      0   free       1G
        metaslab     31   offset    7c0000000   spacemap      0   free       1G
        metaslab     32   offset    800000000   spacemap      0   free       1G
...

Most of the data is allocated in the lower-numbered metaslabs.  A bit later
you can see where the redundant metadata is written.  The rest is mostly
free space.

Remember that ZFS uses copy-on-write (COW), so new writes will go to the free areas.
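If you want to watch how the allocations spread as data is written, something
like the following can be run periodically; "tank" below is a hypothetical
pool name:

# zpool iostat -v tank 5           (per-vdev allocated/free space and I/O, every 5 s)
# zdb -m tank                      (metaslab free-space summary, as above)

Re-running zdb -m after a large write should show more of the lower-numbered
metaslabs filling in before the higher ones are touched.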

> The practical question: I have something like 600 GB of data on a mirrored
> pair of 2 TB Hitachi SATA drives, with compression and deduplication. Before,
> I had a RAID5 of four 147 GB 10K rpm Seagate Savvio 10K.2 2.5" SAS drives on
> a Dell PERC 5/i caching RAID controller. The old RAID was nearly full (20-30
> GB free) and performed substantially slower than the current setup in daily
> use (noticeably slower disk access and transfer rates) because the drives
> were nearly full. I'm curious whether, if I switched from these two disks to
> the new Western Digital VelociRaptors (10K RPM SATA), I could even tell the
> difference. Or, because those drives would be nearly full, would the whole
> setup be slower?

Yes, the 10K rpm drives will be able to push more media under the head per
unit time, but it is not clear that this will always translate into better
performance.

Also, for writes, as the pool fills it becomes more difficult to allocate
free space.  This is not a ZFS-only phenomenon; every file system has to
manage free-space allocation somehow.  However, there have been improvements
in this area for ZFS over the past year or so.
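A quick way to keep an eye on this is the pool's overall utilization; a
commonly cited rule of thumb is to keep pools well below full (roughly 80%)
if allocation performance matters. Again, "tank" is a hypothetical pool name:

# zpool list tank                  (the CAP column is percent of space in use)
# zpool get capacity tank          (the same number as a pool property)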
 -- richard

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
Las Vegas, April 29-30, 2010 http://nexenta-vegas.eventbrite.com 
