On Fri, 2006-09-29 at 09:41 +0200, Roch wrote:
> Erik Trimble writes:
> > On Thu, 2006-09-28 at 10:51 -0700, Richard Elling - PAE wrote:
> > > Keith Clay wrote:
> > > > We are in the process of purchasing new san/s that our mail server
> > > > runs on (JES3).  We have moved our mailstores to zfs and continue to
> > > > have checksum errors -- they are corrected, but this improves on the
> > > > ufs inode errors that require system shutdown and fsck.
> > > >
> > > > So, I am recommending that we buy small jbods, do raidz2 and let zfs
> > > > handle the raiding of these boxes.  As we need more storage, we can
> > > > add boxes and place them in a pool.  This would allow more controllers
> > > > and more spindles, which I would think would add reliability and
> > > > performance.  I am thinking SATA II drives.
> > > >
> > > > Any recommendations and/or advice is welcome.
> >
> > Also, I can't remember how JES3 does its mailstore, but lots of little
> > writes to a RAIDZ volume aren't good for performance, even though ZFS
> > is better about waiting for sufficient write data to do a
> > full-stripe-width write (vs. RAID-5).
> >
> > That is, using RAIDZ on SATA isn't a good performance idea for the
> > small-write usage pattern, so I'd be careful and get a demo unit first
> > to check out the actual numbers.
>
> IMO, RAIDZn should perform admirably on write loads.  The random-read
> side is more limited.  The simple rule of thumb is to consider that a
> RAIDZ group will deliver random-read IOPS with the performance
> characteristic of a single device.  That rule does not apply to
> streaming reads or writes, only to small random-read patterns.
>
> If that means you need to construct small RAIDZ groups, then do
> consider mirroring as an alternative.
>
> -r
>
> ____________________________________________________________________________________
> Performance, Availability & Architecture Engineering
>
> Roch Bourbonnais               Sun Microsystems, Icnc-Grenoble
> Senior Performance Analyst     180, Avenue De L'Europe, 38330,
>                                Montbonnot Saint Martin, France
> http://icncweb.france/~rbourbon    http://blogs.sun.com/roch
> [EMAIL PROTECTED]   (+33).4.76.18.83.20
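To put Roch's rule of thumb in rough numbers, here's a back-of-the-envelope
sketch for a hypothetical 12-disk pool.  The ~80 random-read IOPS per SATA
disk and the clean scaling across vdevs are assumptions for illustration,
not measurements:

# Rule of thumb: for small random reads, a RAIDZ group behaves roughly
# like a single device, so random-read IOPS scale with the number of
# top-level vdevs, not the number of disks.
SATA_RANDOM_READ_IOPS = 80   # assumed per-disk figure, not measured

layouts = {
    "1 x 12-disk raidz2": 1,   # one big group -> one device's worth of reads
    "2 x  6-disk raidz2": 2,
    "4 x  3-disk raidz":  4,
    "6 x  2-disk mirror": 6,   # mirrors can also serve reads from both halves
}

for name, vdevs in layouts.items():
    iops = vdevs * SATA_RANDOM_READ_IOPS
    if "mirror" in name:
        iops *= 2              # assume both sides of each mirror take reads
    print(f"{name:20s} ~{iops:5d} small random-read IOPS")

Same twelve spindles, very different small-read behavior -- which is why the
suggestion to consider mirroring instead of one big RAIDZ group matters for a
mail store full of small files.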
I'd like to see benchmarking of RAIDZn vs. striping or a single disk for
random writes.  Random (re)writes are a problem for RAID-5 (and its
relatives), since changing any data on a stripe requires reading the stripe
back, recalculating parity, and then writing data and parity out again.  For
new data, RAID-5 isn't so bad, since it can skip the initial stripe read.
And, as pointed out, random reads on RAID-5 are good, equal to striping in
most cases.

Now, RAIDZn should beat RAID-5 here, since it tends to queue up writes until
it can write a full stripe at once (right?), so you need _fewer_ writes for
the same work -- but it still has the same problem for sparse writes (i.e.
small writes spaced far apart in the disk layout, where writes to the same
area are infrequent).

For the original question of a mail-store backend, I think the best
compromise between cost and performance is RAIDZn across SATA JBODs for the
mail archives, and RAID-10 (striped mirrors) on SCSI or FC drives for the
primary mail spool and user directories.  Assuming, of course, this is a
system handling a minimum of 100,000 messages/day (i.e. mid-size business and
larger).

--
Erik Trimble
Java System Support
Mailstop:  usca14-102
Phone:  x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)
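P.S.  Here's the kind of back-of-the-envelope comparison for the small-write
case that I'd want to see confirmed by real benchmarking.  The 4-I/O
read-modify-write cost for RAID-5 is the textbook figure; the per-disk IOPS,
record size, and batch size are assumptions for illustration only:

# Toy model of small random (re)writes on a 6-disk, single-parity group:
# classic RAID-5 read-modify-write vs. RAIDZ, which batches queued writes
# and always writes out new full stripes (copy-on-write).
DISK_IOPS = 80       # assumed small random I/Os per second, per disk
DISKS     = 6        # disks in the group
RECORD_KB = 8        # assumed size of each small application write
STRIPE_KB = 128      # assumed amount of data batched into one stripe

group_iops = DISKS * DISK_IOPS

# RAID-5: each small rewrite costs ~4 I/Os
# (read old data, read old parity, write new data, write new parity).
raid5_rewrites_per_sec = group_iops / 4

# RAIDZ: a batch of records goes out as one new stripe, costing roughly one
# I/O per disk, so the group retires about DISK_IOPS stripes per second.
records_per_stripe = STRIPE_KB // RECORD_KB
raidz_rewrites_per_sec = DISK_IOPS * records_per_stripe

print(f"RAID-5 small rewrites/sec: ~{raid5_rewrites_per_sec:.0f}")
print(f"RAIDZ  small rewrites/sec: ~{raidz_rewrites_per_sec:.0f} "
      f"(batching {records_per_stripe} records per stripe)")

The batching is what should let RAIDZn come out ahead -- but that's exactly
the part I'd want to measure on a demo unit before committing the mail spool
to it.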