I find it baffling that RAIDZ (and RAIDZ2/RAIDZ3) was designed to split every recordsize block into pieces spread across the member devices of the vdev and send those uselessly tiny requests to spinning rust, when we know the massive delays that head seeks and rotational latency entail. ZFS mirrors and load-balanced configurations do the obviously correct thing: they don't split records, and they gain more by using parallel access across disks. I can't imagine the RAIDZ code path would be that hard to fix.
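To put rough numbers on what I mean, here is a back-of-envelope sketch in Python. It is only a toy model of small random reads: the function names and the 100 IOPS per disk figure are made up for illustration, and it ignores caching, writes, and any reads the parity disks could serve. The "unsplit" RAIDZ case is the hypothetical layout I'm arguing for, not anything ZFS does today.

```python
# Back-of-envelope random-read model. All names and numbers here are
# illustrative assumptions, not measurements and not ZFS code.

PER_DISK_IOPS = 100.0  # assumed random-read IOPS of one spinning disk


def mirror_read_iops(disks: int, per_disk: float = PER_DISK_IOPS) -> float:
    """Mirrors: any disk can serve a read on its own, so reads scale
    with the total number of disks."""
    return disks * per_disk


def raidz_split_read_iops(vdevs: int, per_disk: float = PER_DISK_IOPS) -> float:
    """RAIDZ as described above: one block read makes every data disk in
    the vdev seek, so each vdev behaves roughly like a single disk."""
    return vdevs * per_disk


def raidz_unsplit_read_iops(data_disks: int, per_disk: float = PER_DISK_IOPS) -> float:
    """Hypothetical RAIDZ that keeps a whole record on one disk: a read
    seeks only that disk, so reads scale with the number of data disks."""
    return data_disks * per_disk


if __name__ == "__main__":
    print("8-disk mirror:          ", mirror_read_iops(8))
    print("one 4+P RAIDZ, split:   ", raidz_split_read_iops(1))
    print("one 4+P RAIDZ, unsplit: ", raidz_unsplit_read_iops(4))
```

Under these assumptions a single 4+P vdev goes from about one disk's worth of random reads to about four disks' worth, which is half of what the 8-disk mirror manages in the same model.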
I've read posts going back to 2006, and all I see are laments about the horrendous drop in IOPS, advice to size RAIDZ vdevs at around 4+P, and attempts to claw back performance by combining multiple such vdevs. I understand RAIDZ will never equal mirroring, but it could get damn close if it didn't break requests down and, better yet, used copies=N and properly placed the copies on disparate spindles. This is somewhat analogous to what the likes of 3PAR do, and it's not rocket science. An 8-disk mirror and a RAIDZ 8+2P with copies=2 give me the same amount of storage, but the latter is a hell of a lot more resilient, and its maximum IOPS should be higher to boot. A non-split RAIDZ 4+P would still deliver only half the IOPS of the 8-disk mirror, but I'd at least save a bundle of coin, either through a reduced spindle count or by using slower drives. With all the great things ZFS is capable of, why hasn't this been redesigned long ago? What glaringly obvious truth am I missing?