On 6/01/2010 3:00 AM, Roch wrote:
Richard Elling writes:
> On Jan 3, 2010, at 11:27 PM, matthew patton wrote:
>
> > I find it baffling that RaidZ(2,3) was designed to split a record-
> > size block into N (N=# of member devices) pieces and send the
> > uselessly tiny requests to spinning rust when we know the massive
> > delays entailed in head seeks and rotational delay. The ZFS-mirror
> > and load-balanced configuration do the obviously correct thing and
> > don't split records and gain more by utilizing parallel access. I
> > can't imagine the code-path for RAIDZ would be so hard to fix.
>
> Knock yourself out :-)
>
> http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/vdev_raidz.c
>
> > I've read posts back to 06 and all I see are lamenting about the
> > horrendous drop in IOPs, about sizing RAIDZ to ~4+P and trying to
> > claw back performance by combining multiple such vDEVs. I understand
> > RAIDZ will never equal Mirroring, but it could get damn close if it
> > didn't break requests down and better yet utilized copies=N and
> > properly placed the copies on disparate spindles. This is somewhat
> > analogous to what the likes of 3PAR do and it's not rocket science.
>
[snipped for space]
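The arithmetic behind the complaint can be sketched quickly. The numbers below (a 128K recordsize, a 6-disk raidz2, 200 IOPS per spindle) are illustrative assumptions, not figures from the thread:

```python
# Sketch of why RAID-Z random-read IOPS track a single disk: every
# logical block is striped across all data disks, so one random read
# occupies the whole vdev. A mirror serves each read from one side,
# so an N-way mirror can service N independent reads in parallel.

def raidz_per_disk_io(record_size, ndisks, nparity):
    """Bytes each data disk must return for one full-record read."""
    data_disks = ndisks - nparity
    return record_size // data_disks

record = 128 * 1024                        # 128K recordsize (ZFS default)
per_disk = raidz_per_disk_io(record, 6, 2) # 4+2 raidz2
print(per_disk)                            # -> 32768: a "uselessly tiny"
                                           #    32K request to each spindle

# Rough random-read throughput (assumed 200 IOPS per spindle):
disk_iops = 200
raidz_vdev_iops  = disk_iops      # all spindles seek together per read
mirror_vdev_iops = disk_iops * 6  # each spindle can serve a different read
print(raidz_vdev_iops, mirror_vdev_iops)
```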
That said, I truly am for an evolution for random-read
workloads. RAID-Z on 4K sectors is quite appealing: it means
that small objects become nearly mirrored, with good random-read
performance, while large objects are stored efficiently.
-r
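Roch's point can be made concrete with a small allocation sketch. The accounting below is a simplified model of RAID-Z1 on 4K sectors (ashift=12), one parity sector per stripe row, ignoring ZFS's padding of allocations; the 5-disk width is an assumed example:

```python
# Sketch (simplified model, not from the post): on 4K sectors, a
# RAID-Z1 allocation is data sectors plus one parity sector per
# stripe row across the data disks.
import math

def raidz1_sectors(block_bytes, ndisks, sector=4096):
    """Approximate sectors written for one block (ignores padding)."""
    data = math.ceil(block_bytes / sector)
    rows = math.ceil(data / (ndisks - 1))  # one parity sector per row
    return data + rows

# A 4K block is 1 data + 1 parity sector -> stored like a mirror pair,
# so a random read needs only one spindle, as with a mirror.
print(raidz1_sectors(4096, 5))         # -> 2

# A 128K block on 5 disks is 32 data + 8 parity sectors -> 25% parity
# overhead, i.e. large objects are still stored efficiently.
print(raidz1_sectors(128 * 1024, 5))   # -> 40
```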
Sold! Let's do that then! :-)
Seriously - are there design or architectural reasons why this isn't
done by default, or at least offered as an option? Or is it just a
"no one's had time to implement it yet" thing?
I understand that 4K sectors might be less space-efficient for lots of
small files, but I suspect many of us would happily make that trade-off!
Thanks,
Tristan
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss