On Fri, 2006-09-29 at 09:41 +0200, Roch wrote:
> Erik Trimble writes:
>  > On Thu, 2006-09-28 at 10:51 -0700, Richard Elling - PAE wrote:
>  > > Keith Clay wrote:
>  > > > We are in the process of purchasing new san/s that our mail server
>  > > > runs on (JES3).  We have moved our mailstores to zfs and continue to
>  > > > have checksum errors -- they are corrected, but this is an improvement
>  > > > over the ufs inode errors that required a system shutdown and fsck.
>  > > > 
>  > > > So, I am recommending that we buy small jbods, do raidz2, and let zfs
>  > > > handle the raiding of these boxes.  As we need more storage, we can
>  > > > add boxes and place them in a pool.  This would allow more controllers
>  > > > and more spindles, which I would think would add reliability and
>  > > > performance.  I am thinking SATA II drives.
>  > > > 
>  > > > Any recommendations and/or advice is welcome.
>  > > 
>  > 
>  > Also, I can't remember how JES3 does its mailstore, but lots of little
>  > writes to a RAIDZ volume aren't good for performance, even though ZFS
>  > is better about waiting for sufficient write data to do a
>  > full-stripe-width write (vs. RAID-5).
>  > 
>  > That is, using RAIDZ on SATA isn't a good performance idea for the small
>  > write usage pattern, so I'd be careful and get a demo unit first to
>  > check out the actual numbers.
>  > 
> 
> IMO, RAIDZn should perform admirably on the write loads.
> The random read side is more limited.  The simple rule of
> thumb is to consider that a RAIDZ group will deliver random
> read IOPS with the performance characteristic of a single
> device.  That rule does not apply to streaming reads or
> writes, only to small random read patterns.
> 
> If that means you need to construct small RAIDZ groups
> then do consider mirroring as an alternative.
> 
> -r
> 
> ____________________________________________________________________________________
>       Performance, Availability & Architecture Engineering  
> 
> Roch Bourbonnais                        Sun Microsystems, Icnc-Grenoble 
> Senior Performance Analyst              180, Avenue De L'Europe, 38330, 
>                                       Montbonnot Saint Martin, France
> http://icncweb.france/~rbourbon               http://blogs.sun.com/roch
> [EMAIL PROTECTED]             (+33).4.76.18.83.20
> 

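To put Roch's rule of thumb above in concrete terms, here is some
back-of-the-envelope arithmetic.  The ~100 random-read IOPS per disk
figure is just an assumed round number for a 7200 RPM SATA drive, not a
measurement:

    12 disks at ~100 random-read IOPS each (assumed figure):

      1 x 12-disk raidz2 group     ->  ~100 IOPS      (one group ~ one disk)
      2 x  6-disk raidz2 groups    ->  ~200 IOPS      (two groups ~ two disks)
      6 x  2-way mirrors, striped  ->  up to ~1200 IOPS (either side of a
                                       mirror can service a read)

Which is why, for loads heavy on small random reads, many narrow groups
(or mirrors) win over one wide group.
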
I'd like to see benchmarking for RAIDZn vs striping or single disk for
random write.
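
I haven't run such a comparison myself, but a quick-and-dirty sketch
would look something like the following (pool, dataset, and file names
are made up, and this is ksh/bash syntax; a real test should use a
proper tool like filebench or vdbench):

    # scratch dataset and a 256 MB test file (32768 x 8k blocks)
    zfs create tank/bench
    mkfile 256m /tank/bench/testfile

    # time 1000 random 8k rewrites into the file
    time ( i=0; while [ $i -lt 1000 ]; do
        dd if=/dev/zero of=/tank/bench/testfile bs=8k count=1 \
           seek=$RANDOM conv=notrunc > /dev/null 2>&1
        i=$((i+1))
    done )

    # repeat on raidz2, striped, mirrored, and single-disk pools and compare

Keep in mind ZFS buffers async writes in memory and flushes them in
transaction groups, so the run needs to be long enough (or synced) for
the disks to actually be exercised.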

Random (re)write is a problem for RAID-5 (and relatives), because a
partial-stripe update means reading the existing data and parity,
recalculating parity, and then writing both back (the classic
read-modify-write penalty) for any data change within a stripe.
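
As a rough illustration of that penalty (generic RAID-5 small-write
case, round numbers rather than measurements):

    one 8 KB random rewrite on RAID-5:
      read old data block   + read old parity block   = 2 reads
      write new data block  + write new parity block  = 2 writes
      ----------------------------------------------------------
      ~4 physical I/Os for 1 logical write

    so a set of disks capable of N random IOPS raw delivers roughly N/4
    small-rewrite IOPS.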

For new data, RAID-5 isn't so bad, since it can skip the initial stripe
read.  And, as pointed out, random read for RAID-5 is good, equal to
striping in most cases.

Now, RAIDZn should beat RAID-5, since it queues up writes until it can
write a full stripe at once (right?), so fewer physical writes are
required, but it still has the same problem for sparse writes (i.e.
small writes spaced far apart in the on-disk layout, where writes to
the same area are infrequent).
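
One easy way to check whether the writes really are being coalesced is
to watch per-vdev activity while the workload runs (pool name assumed):

    # per-vdev read/write ops and bandwidth, sampled every 5 seconds
    zpool iostat -v tank 5

If RAIDZ is doing what it should, you ought to see a modest number of
larger writes per device rather than a flood of tiny ones.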

For the original question of a mail store backend, I think the best
compromise between cost and performance is to use RAIDZn across SATA
JBODs for mail archives, and RAID-10 (striped mirrors) on SCSI or FC
drives for the primary mail spool/user directories.  Assuming, of
course, this is a system handling a minimum of 100,000 messages/day
(i.e. a mid-size business or larger).
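
In zpool terms that layout would look roughly like this (the
controller/target names are obviously invented, and the group widths
are only an example):

    # archive pool: raidz2 groups of SATA disks, one group per JBOD
    zpool create archive \
        raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 \
        raidz2 c3t0d0 c3t1d0 c3t2d0 c3t3d0 c3t4d0 c3t5d0

    # spool pool: striped mirrors (RAID-10 style) on FC/SCSI drives
    zpool create spool \
        mirror c4t0d0 c5t0d0 \
        mirror c4t1d0 c5t1d0 \
        mirror c4t2d0 c5t2d0

Additional JBODs can then be attached later with "zpool add archive
raidz2 ..." as the archive grows.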


-- 
Erik Trimble
Java System Support
Mailstop:  usca14-102
Phone:  x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
