Hi, I am writing to ask for your thoughts/advice on the best hardware setup for an OpenBSD server.
This email ultimately reduces to the question: "What hardware and configuration do you suggest for minimizing the possibility of an IO freeze or system crash caused by the BIOS or SATA card in the event of an SSD/HDD malfunction?" However, I will walk through the whole reasoning behind the hardware choice from the ground up, so you can check whether I have got it all right. I hope this email can also serve as general advice for others regarding best practice for OpenBSD server hardware choices.

GOAL

I am setting up an SSD-based storage facility that needs high data integrity guarantees and high performance (random reads/writes). The goal is to be able to safely store and constantly process something about as important as, say, medical records.

Needless to say, at some point such a storage server *will* fail, and the only way to approach a pretty-much-100% uptime guarantee is to set up the facility as multiple servers in a redundant cluster. What the individual server must do, then, is never deliver broken data. And, locally and collectively, there needs to be a well-working mechanism for detecting when a node needs maintenance and for taking it out of use.

What I want to ask you about now is your thoughts on the most suitable hardware configuration for the individual server, so that each one functions for as long as possible without the need for physical administrator intervention. (And when physical intervention is needed, to reduce the competence required for that maintenance, if possible, to only hotswapping or adding a physical disk - that is, to minimize the need for reboots due to SATA controller issues, weird BIOS behavior, or other causes.)

GENERAL PROBLEM SURFACE OF SERVER HARDWARE

It seems to me that the accumulated experience on why servers break is: 1) anything storage-related, 2) the PSU, 3) other. Stability should therefore be given consideration in that order. For 2), the PSU can easily be made redundant, and PSU failures are fairly rare anyhow, so that is about all that is reasonable to do there. For 3), the "other" category is either caused by bad thermal conditions (which need proper consideration) or happens regardless, with no safeguards available, so we just have to accept that. The rest of this post discusses only 1), the storage aspect.

THE STORAGE SOLUTION

Originally I thought RAID 5/6 would deliver both the data integrity guarantees and the performance. Then I saw a benchmark of a high-end RAID card showing 25 MB/sec writes (= 95% overhead) and 80% overhead on reads per disk set (http://www.storagereview.com/lsi_megaraid_sas3_93618i_review), which was enough to make me understand that the upcoming softraid RAID1C with 2-4 drives will be far better at delivering those qualities. Of course I have not seen any benchmarks of RAID1C, but I would guess its average overhead for both reads and writes will be well below 10-15%, at least with its default CRC32C. (Perhaps RAID1C needs to be fortified with a stronger checksumming algorithm, and perhaps also with reads from both mirrors on every read, depending on how the scrubbing works - I have not checked this yet - though that is a separate conversation.) To really know how well RAID1C will perform I would need to benchmark it, but there seems to be a general consensus in the RAID community that checksummed mirroring is preferable to RAID 5/6, so I consider my preliminary understanding that RAID1C will be the winning option to be well founded.
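For concreteness, here is how I understand a plain softraid RAID1 volume is assembled with bioctl(8) today; I am assuming, without having verified it, that RAID1C creation will look analogous, just with a "1C" discipline instead of "1":

    # Give each member disk a partition of fstype "RAID" first, e.g. an
    # "a" partition spanning each of sd1 and sd2 (interactive editor):
    disklabel -E sd1
    disklabel -E sd2
    # Assemble the two chunks into a mirrored softraid volume:
    bioctl -c 1 -l /dev/sd1a,/dev/sd2a softraid0
    # The volume attaches as a new sd(4) device (say sd3), which is then
    # disklabeled and newfs'ed like any ordinary disk.

Please correct me if RAID1C will deviate from this.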
The SSDs would be enterprise grade and hence *should* shut down immediately if they start malfunctioning, so there should be essentially no QoS drops in the softraid from IO operations that take extremely long to complete, e.g. far more than 10 seconds. For the RAID1C to really deliver, then (now that the PSU, CPU, RAM, and SSDs all work), all that remains is for the other factors to deliver well: the SATA connectivity, and a BIOS that operates transparently.

HARDWARE BUDGET

A good Xeon Supermicro server with onboard SATA and Ethernet, and a decent PSU, RAM, and CPU, costs some thousands of USD. Two to three 2 TB enterprise SSDs are around 2700-4000 USD. Any specialized SATA controller, if needed, would be below 2000 USD anyhow.

QUESTION

Someone with 30 years of admin experience warned me that if an individual storage drive dies, the SATA controller could crash, or the BIOS could kill the whole system. He also warned me that if any disk in the boot softraid RAID1 breaks, the BIOS could get so confused that the system would not even boot - for that reason, I guess the boot disks should be separated altogether from the "data disks", since the former will have a much, much lower turnover.

A SATA-controller- or BIOS-induced system crash, freeze, or other need to reboot the system because of their malfunction would be really unfortunate, as it would escalate the maintenance requirement at that moment not only beyond needing no physical intervention at all, but also beyond the level of just needing to replace or add a physical drive (which is easy to ask anyone to do).

Q1: Therefore, just to save myself as much future headache as possible, I would like to hear your thoughts/suggestions on what hardware and software configuration I should choose (in general, and the actual particular controller, motherboard, BIOS settings, etc.) to get maximum operational stability, in particular for the case where the SSDs break down, with the OpenBSD and other setup described above.

Q2: Also, if anything is needed for a SATA controller to support hotswapping well, in general and with OpenBSD, please let me know.

Q3: In case I need separate SATA hardware: I did not get any clarity on which SATA controllers OpenBSD actually supports (e.g. http://www.openbsd.org/cgi-bin/man.cgi/OpenBSD-current/man4/pci.4?query=pci&sec=4 is not so clear?). In the absence of any further suggestion, my best guess is that the best choice would be one of the LSI-based HBAs or RAID cards from Avagotech, i.e. http://www.avagotech.com/products/server-storage/host-bus-adapters/ or www.avagotech.com/products/server-storage/raid-controllers/ . If so, what do you say about the SATA hardware choice?

(Later on I will need to learn about softraid hot spares, scrubbing, rebuild, and hotswapping, though those are software questions - see the P.S. below for my current understanding of the rebuild part.)

Thanks!

Tinker
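P.S. To give some concreteness to Q2: my current, unverified understanding from the bioctl(8) man page is that checking volume health and rebuilding after a hotswap would look roughly like this (the device names sd3 and sd4 are made up for illustration):

    # Check the softraid volume and its chunks; a healthy state is "Online":
    bioctl sd3
    # After hotswapping the dead disk and giving its replacement (say sd4)
    # a partition of fstype RAID, kick off a rebuild onto the new chunk:
    bioctl -R /dev/sd4a sd3

If my reading of bioctl(8) is wrong here, corrections are very welcome.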