On Fri, 30 Nov 2007, Vincent Fox wrote:

... reformatted ...

> We will be using Cyrus to store mail on 2540 arrays.
>
> We have chosen to build 5-disk RAID-5 LUNs in 2 arrays which are
> both connected to same host, and mirror and stripe the LUNs.  So a
> ZFS RAID-10 set composed of 4 LUNs.  Multi-pathing also in use for
> redundancy.
>
> My question is any guidance on best choice in CAM for stripe size
> in the LUNs?
[after reading the entire thread, where details of the storage-related application are presented piecemeal, and piecing together the details]

I can't give you an answer or a recommendation, because the question does not make sense IMHO.

IOW: this is like saying: "I want to get from Dallas to LA as quickly as possible and have already decided that a bicycle would be the best mode of transport to use; can you tell me how I should configure the bicycle."  The problem is that it's very unlikely that the bicycle is the correct solution, so recommending which bicycle config is correct is likely to provide very bad advice... and also to validate the supposition that the solution utilizing the bicycle is, indeed, the correct solution.

> Default is 128K right now, can go up to 512K, should we go higher?
>
> Cyrus stores mail messages as many small files, not big mbox files.
> But there are so many layers in action here it's hard to know what
> is best choice.

[again, based on reading the entire thread and not an answer to the above paragraph]

It appears that the chosen solution is to use a stripe of two hardware RAID5 LUNs presented by a 2540 (please correct me if this is incorrect).  There are several issues with this proposal:

a) You're mixing solutions: hardware RAID5 and ZFS.  Why?  All this does is introduce needless complexity and make it very difficult to troubleshoot issues with the storage subsystem - especially if the issue is performance related.  Also - how do you localize a fault condition that is caused by a 2540 RAID firmware bug?  How do you isolate performance issues caused by the interaction between the hardware RAID5 LUNs and ZFS?

b) You've chosen a stripe - despite Richard Elling's best advice (something like "friends don't let friends use stripes").  See Richard's blogs for a comparison of the reliability rates of different storage configurations.

c) For a mail storage subsystem a stripe seems totally wrong.  Generally speaking, email stores consist of many small files - with occasional medium-sized files (due to attachments) and, less commonly, some large files - usually limited by the max message size defined by the MTA (a typical value is 10MB - what is it in your case?).

d) ZFS, with its built-in volume manager, relies on having direct access to individual disks (JBOD).  Placing a hardware RAID engine between ZFS and the actual disks is a "black box" in terms of the ZFS volume manager - and ZFS can't possibly "understand" how various storage providers' "black boxes" will behave... especially when ZFS tells the "disk" to do something and the hardware RAID LUN lies to ZFS (sync writes, for example).  A sketch of the kind of mirrored JBOD layout ZFS prefers follows this list.

e) You've presented no data in terms of typical iostat -xcnz 5 output - generalized over various times of the day where particular user data access patterns are known.  This information would allow us to give you some basic recommendations.  IOW - we need to know the basic requirements in terms of IOPS and average I/O transfer sizes.  BTW: Brendan Gregg's DTrace scripts will allow you to gather very detailed I/O usage data on the production system with no risk.  (Example commands also follow this list.)

f) You have not provided any details of the 2540 config - except for the fact that it is "fully loaded" IIRC.  SAS disks?  10,000 RPM or 15k RPM drives?  Disk drive size?

g) You've provided no details of how the host is configured.  If you decide to deploy a ZFS based system, the amount of installed RAM on the mailserver will have a *huge* impact on the actual load placed on the I/O subsystem.  In this regard, ZFS is your friend, as it'll cache almost _everything_, given enough RAM.  And DDR2 RAM is (arguably) less than $40 a gigabyte today - with 2GB DIMMs having reached price parity with 2 * 1GB DIMMs.  For example: if an end-user MUA is configured to poll the mailserver every 30 seconds to check whether new mail has arrived, and the mailserver has sufficient (cache) memory, then only the first request will require disk access and a large number of subsequent requests will be handled out of (cache) memory.

h) Another observation: you've commented on the importance of system reliability because there are 10k users on the mailserver.  Whether you have 10 users, 10k users or 100k users is of no importance if you are considering system reliability (aka failure rates).  IOW - a system that is configured to a certain reliability requirement will be the same regardless of the number of end users that rely on that system.  The number of concurrent users is important only in terms of system performance and response time.

i) I don't know what the overall storage requirement is (someone said 1TB IIRC) and how this relates to the number/size of the available disk drives (in the 2540).
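To illustrate what (d) is getting at - purely as a sketch, not a recommendation, since none of the requirements above are known yet - a ZFS mirrored pool built directly on individually presented disks (one disk per LUN, or true JBOD if the array firmware allows it) would look roughly like this.  The pool name and the cXtYdZ device names are placeholders; substitute whatever format(1M) shows on your host, pairing each disk with one from the other array:

  # zpool create mailpool \
        mirror c1t0d0 c2t0d0 \
        mirror c1t1d0 c2t1d0 \
        mirror c1t2d0 c2t2d0 \
        mirror c1t3d0 c2t3d0
  # zpool status mailpool

ZFS dynamically stripes across the mirror pairs on its own, so you get RAID-10 behaviour while ZFS still sees - and can repair - the individual disks.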
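Similarly, as a rough sketch of the data gathering mentioned in (e) - the filename below is only an example - something like the following, run during known busy periods, would give us IOPS and transfer-size numbers to work from:

  # iostat -xcnz 5 > /var/tmp/iostat-busy-hour.out &

  # dtrace -n 'io:::start { @[execname] = quantize(args[0]->b_bcount); }'

The iostat output shows per-device IOPS and average I/O sizes over time; the DTrace one-liner (standard io provider) prints a distribution of I/O sizes per process, which shows just how "many small files" the workload really is.  Brendan Gregg's DTraceToolkit (iosnoop, iotop, etc.) provides the same kind of data with more polish.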
Observations:

1) Any striped config seems inherently wrong - given the available information.

2) Mixing RAID5 LUNs (backend) with ZFS introduces unnecessary system complexity.

3) Designing a system when no requirements have been presented in terms of:

   i)   I/O access patterns
   ii)  IOPS (I/O ops per second)
   iii) required response time
   iv)  number of concurrent requests
   v)   application host config (CPUs/cores, RAM, I/O bus, disk ctrls)
   vi)  backup methodology and frequency
   vii) storage subsystem config

... is very unlikely to result in a correctly configured system that will meet the owner/operator's expectations.

Please don't frame this response as completely negative.  That is not my intention - what I'm trying to do is present you with a list of questions that must be answered before a technically correct storage subsystem can be designed and implemented.  IOW - before a storage subsystem can be correctly *engineered*.

Also - please don't be discouraged by this response.  If you are willing to fill in the blanks, I'm willing to help provide a meaningful recommendation.

Regards,

Al Hopper  Logical Approach Inc, Plano, TX.  [EMAIL PROTECTED]
           Voice: 972.379.2133  Fax: 972.379.2134  Timezone: US CDT
OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007
http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/

Graduate from "sugar-coating school"?  Sorry - I never attended!  :)