On Mon, Jan 4, 2010 at 2:27 AM, matthew patton <patto...@yahoo.com> wrote:
> I find it baffling that RAIDZ(2,3) was designed to split a record-size block
> into N (N = # of member devices) pieces and send the uselessly tiny requests
> to spinning rust, when we know the massive delays entailed in head seeks and
> rotational latency. ZFS mirrors and load-balanced configurations do the
> obviously correct thing: they don't split records, and they gain more by
> exploiting parallel access. I can't imagine the code path for RAIDZ would be
> so hard to fix.
>
> I've read posts back to '06, and all I see is lamenting about the horrendous
> drop in IOPS, about sizing RAIDZ to ~4+P, and about trying to claw back
> performance by combining multiple such vdevs. I understand RAIDZ will never
> equal mirroring, but it could get damn close if it didn't break requests
> down and, better yet, used copies=N and properly placed the copies on
> disparate spindles. This is roughly analogous to what the likes of 3PAR do,
> and it's not rocket science.
>
> An 8-disk mirror and a RAIDZ 8+2P with copies=2 give me the same amount of
> storage, but the latter is a hell of a lot more resilient, and max IOPS
> should be higher to boot. A non-broken-up RAIDZ 4+P would still have half
> the IOPS of the 8-disk mirror, but I'd at least save a bundle of coin in
> either reduced spindle count or slower drives.
>
> With all the great things ZFS is capable of, why hasn't this been redesigned
> long ago? What glaringly obvious truth am I missing?
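[The random-read IOPS gap the poster is complaining about can be sketched with back-of-envelope arithmetic. This is a simplified model, not a benchmark: the per-disk IOPS figure is an assumed number for a 7200 RPM drive, and real pools deviate from it.]

```python
# Simplified random-read IOPS model for ZFS pool layouts (assumed figures).
DISK_IOPS = 150  # assumed small-random-read IOPS of one 7200 RPM drive

def mirror_iops(n_disks):
    # In a mirrored pool, every disk can serve reads independently,
    # so random-read IOPS scale with the number of disks.
    return n_disks * DISK_IOPS

def raidz_iops(n_vdevs):
    # A RAIDZ vdev splits each block across all of its members, so a
    # random read engages the whole vdev: roughly one disk's worth of
    # IOPS per vdev, regardless of the vdev's width.
    return n_vdevs * DISK_IOPS

print(mirror_iops(8))   # 8-disk mirror pool
print(raidz_iops(1))    # one RAIDZ vdev of any width
print(raidz_iops(2))    # two RAIDZ vdevs striped together
```

This is the arithmetic behind the "~4+P and combine multiple vdevs" advice: IOPS grow with vdev count, not disk count.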
It is the sacrifice that was made to remove the write-hole vulnerability that existed in RAID5/6. Personally, I'm now thinking the write hole isn't so bad, and that with COW writes and a write log the vulnerability really could be marginalized.

If you are running copies=2, you could use hardware RAID5/6 with a battery-backed write cache and present it as a couple of LUNs to ZFS, which should give higher performance with data resiliency still in place. Say, with 14 drives: build 2x 7-drive RAID6s, make a two-vdev zpool out of them, and set copies=2. That should provide more than enough resiliency and performance, at a cost in usable capacity; if the drives are large enough, though, that cost may be acceptable.

-Ross
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
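[Editorial sketch of the 14-drive layout Ross describes. It assumes the hardware controller has already been configured (via its own management tool) to export two 7-drive RAID6 arrays as LUNs; the device names and pool name are placeholders.]

```shell
# Two hardware-RAID6 LUNs (7 drives each) assumed visible as c2t0d0 and
# c2t1d0 -- device and pool names are placeholders, not from the thread.
zpool create tank c2t0d0 c2t1d0   # two-vdev pool, writes striped across LUNs

# Keep a second copy of every block; ZFS places copies on different
# vdevs when possible, so either LUN can be lost without data loss.
zfs set copies=2 tank
```

Note that copies=2 halves usable capacity on top of the RAID6 parity overhead, which is the "at a cost" Ross mentions.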