On Mon, Jan 4, 2010 at 2:27 AM, matthew patton <patto...@yahoo.com> wrote:
> I find it baffling that RAIDZ(2,3) was designed to split a record-size block
> into N (N = # of member devices) pieces and send the uselessly tiny requests
> to spinning rust, when we know the massive delays entailed in head seeks and
> rotational latency. ZFS mirrors and load-balanced configurations do the
> obviously correct thing: they don't split records, and they gain more by
> exploiting parallel access. I can't imagine the code path for RAIDZ would be
> so hard to fix.
>
> I've read posts back to '06, and all I see is lamenting about the horrendous
> drop in IOPS, about sizing RAIDZ to ~4+P, and about trying to claw back
> performance by combining multiple such vdevs. I understand RAIDZ will never
> equal mirroring, but it could get damn close if it didn't break requests
> down and, better yet, used copies=N and properly placed the copies on
> disparate spindles. This is roughly analogous to what the likes of 3PAR do,
> and it's not rocket science.
>
> An 8-disk mirror and a RAIDZ 8+2P with copies=2 give me the same amount of
> storage, but the latter is a hell of a lot more resilient, and max IOPS
> should be higher to boot. A non-broken-up RAIDZ 4+P would still have half
> the IOPS of the 8-disk mirror, but I'd at least save a bundle of coin in
> either reduced spindle count or slower drives.
>
> With all the great things ZFS is capable of, why hasn't this been redesigned
> long ago? What glaringly obvious truth am I missing?
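[The random-read IOPS gap the poster is complaining about can be sketched with back-of-envelope arithmetic. This is a simplified model, not a benchmark: the per-disk IOPS figure is an assumed number for a 7200 RPM drive, and real pools deviate from it.]

```python
# Simplified random-read IOPS model for ZFS pool layouts (assumed figures).
DISK_IOPS = 150  # assumed small-random-read IOPS of one 7200 RPM drive

def mirror_iops(n_disks):
    # In a mirrored pool, every disk can serve reads independently,
    # so random-read IOPS scale with the number of disks.
    return n_disks * DISK_IOPS

def raidz_iops(n_vdevs):
    # A RAIDZ vdev splits each block across all of its members, so a
    # random read engages the whole vdev: roughly one disk's worth of
    # IOPS per vdev, regardless of the vdev's width.
    return n_vdevs * DISK_IOPS

print(mirror_iops(8))   # 8-disk mirror pool
print(raidz_iops(1))    # one RAIDZ vdev of any width
print(raidz_iops(2))    # two RAIDZ vdevs striped together
```

This is the arithmetic behind the "~4+P and combine multiple vdevs" advice: IOPS grow with vdev count, not disk count.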
It is the sacrifice that was made to remove the write-hole vulnerability that existed in RAID5/6. Personally, I'm now thinking the write hole isn't so bad, and that with COW writes and a write log the vulnerability really could be marginalized.

If you are running copies=2, you could use hardware RAID5/6 with a battery-backed write cache and present it as a couple of LUNs to ZFS, which should give higher performance with data resiliency still in place. Say, with 14 drives: build 2x 7-drive RAID6s, make a two-vdev zpool out of them, and set copies=2. That should provide more than enough resiliency and performance, at a cost in usable capacity; if the drives are large enough, though, that cost may be acceptable.

-Ross
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
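[Editorial sketch of the 14-drive layout Ross describes. It assumes the hardware controller has already been configured (via its own management tool) to export two 7-drive RAID6 arrays as LUNs; the device names and pool name are placeholders.]

```shell
# Two hardware-RAID6 LUNs (7 drives each) assumed visible as c2t0d0 and
# c2t1d0 -- device and pool names are placeholders, not from the thread.
zpool create tank c2t0d0 c2t1d0   # two-vdev pool, writes striped across LUNs

# Keep a second copy of every block; ZFS places copies on different
# vdevs when possible, so either LUN can be lost without data loss.
zfs set copies=2 tank
```

Note that copies=2 halves usable capacity on top of the RAID6 parity overhead, which is the "at a cost" Ross mentions.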