[zfs-discuss] rethinking RaidZ and Record size

matthew patton Sun, 03 Jan 2010 23:28:18 -0800

I find it baffling that RaidZ(2,3) was designed to split a record-size block 
into N (N=# of member devices) pieces and send the uselessly tiny requests to 
spinning rust when we know the massive delays entailed in head seeks and 
rotational latency. The ZFS mirror and load-balanced configurations do the 
obviously correct thing: they don't split records, and they gain more by 
utilizing parallel access. I can't imagine the code path for RAIDZ would be so 
hard to fix.
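
To put rough numbers on "uselessly tiny", here is a minimal Python sketch of 
the split described above. It is a simplified model, not the actual RAIDZ 
allocator: it assumes the record is divided evenly across the data disks 
(members minus parity) and ignores sector rounding and padding. The function 
name per_disk_chunk_kib and the example layouts are made up for illustration.

```python
def per_disk_chunk_kib(record_kib, member_disks, parity_disks):
    """Approximate size of the piece each data disk receives for one record.

    Simplified model (assumption): the record is divided evenly across the
    data disks; sector rounding and padding are ignored.
    """
    data_disks = member_disks - parity_disks
    if data_disks < 1:
        raise ValueError("need at least one data disk")
    return record_kib / data_disks


if __name__ == "__main__":
    # 128 KiB is the default ZFS recordsize; the layouts are illustrative.
    for label, members, parity in [("RAIDZ 4+P", 5, 1), ("RAIDZ2 8+2P", 10, 2)]:
        chunk = per_disk_chunk_kib(128, members, parity)
        print(f"{label}: about {chunk:g} KiB per data disk")
```

Under these assumptions every spindle in the vdev has to seek to serve a 
single record, which is why the per-disk request gets smaller as the vdev 
gets wider.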


I've read posts back to '06 and all I see is lamenting about the horrendous 
drop in IOPS, about sizing RAIDZ to ~4+P, and about trying to claw back 
performance by combining multiple such vdevs. I understand RAIDZ will never 
equal mirroring, but it could get damn close if it didn't break requests down 
and, better yet, utilized copies=N and properly placed the copies on disparate 
spindles. This is somewhat analogous to what the likes of 3PAR do and it's not 
rocket science.

An 8 disk mirror and a RAIDZ8+2P w/ copies=2 give me the same amount of storage 
but the latter is a hell of a lot more resilient and max IOPS should be higher 
to boot. A non-broken-up RAIDZ4+P would still be 1/2 the IOPS of the 8 disk 
mirror but I'd at least save a bundle of coin through either a reduced spindle 
count or slower drives.
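
The capacity comparison above can be checked with a few lines of Python. 
This is only a sketch under stated assumptions: "8 disk mirror" is read as 
8 disks arranged in 2-way mirrors, capacity is counted in whole-disk units, 
and metadata and padding overhead are ignored. The helper names 
mirror_usable and raidz_usable are hypothetical.

```python
def mirror_usable(disks, way=2):
    """Usable capacity, in disk units, of `disks` spindles in `way`-way mirrors."""
    return disks / way


def raidz_usable(data_disks, copies=1):
    """Usable capacity, in disk units, of a RAIDZ vdev with `data_disks` data
    disks when every block is stored `copies` times (the copies=N property)."""
    return data_disks / copies


if __name__ == "__main__":
    # (label, usable capacity in disk units, total spindles incl. parity)
    layouts = [
        ("8 disk mirror", mirror_usable(8), 8),
        ("RAIDZ 8+2P, copies=2", raidz_usable(8, copies=2), 8 + 2),
        ("RAIDZ 4+P", raidz_usable(4), 4 + 1),
    ]
    for label, usable, spindles in layouts:
        print(f"{label}: {usable:g} disks usable on {spindles} spindles")
```

In this model all three layouts come out at 4 disks of usable space. The 
8+2P layout spends two extra spindles to get there, and the 4+P layout 
needs only five, which is the "reduced spindle count" saving.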

With all the great things ZFS is capable of, why hasn't this been redesigned 
long ago? What glaringly obvious truth am I missing?


      
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
