Hi,

This is to ask you for your thoughts/advice on the best hardware setup
for an OpenBSD server.

This email ultimately reduces to the question: what hardware and
configuration do you suggest for minimizing the possibility of an IO
freeze or system crash caused by the BIOS or SATA card in the event of
SSD/HDD malfunction? However, I'll walk through the whole reasoning
behind the hardware choice from the ground up with you, just so you can
check that I got it all right.

I hope this email can also serve as general advice for others regarding
best practices for OpenBSD server hardware choices.

GOAL
I am setting up an SSD-based storage facility that needs high data
integrity guarantees and high performance (random reads/writes). The
goal is to be able to safely store and constantly process something
about as important as, say, medical records.

Needless to say, at some point such a storage server *will* fail, and
the only way to approach anything like a 100% uptime guarantee is to set
up the facility as multiple servers in a redundant cluster.

What the individual server can do, then, is never deliver broken data.
And, both locally and collectively, there needs to be a well-working
mechanism for detecting when a node needs maintenance and taking it out
of service.

What I want to ask you about now, then, are your thoughts on the most
suitable hardware configuration for the individual servers, so that
they function for as long as possible without the need for physical
administrator intervention.

(And for when physical admin intervention *is* needed, to reduce the
competence required for that maintenance as far as possible, ideally to
just hotswapping or adding a physical disk - that is, to minimize the
need for reboots due to SATA controller issues, weird BIOS behavior, or
other causes.)

GENERAL PROBLEM SURFACE OF SERVER HARDWARE

It seems to me that the accumulated experience on why servers break
comes down to 1) anything storage-related, 2) the PSU, 3) other.

So then, stability aspects should be given consideration in that order.

For 2), the PSU can easily be made redundant, and PSU failures are
fairly rare anyhow, so that is pretty much all that is reasonable to do
there.

For 3), the "other" category is either due to bad thermal conditions
(which need proper consideration) or happens anyhow, with no safeguards
available, so we just have to accept that risk.

The rest of this post will discuss only 1), the storage aspect.

THE STORAGE SOLUTION
Originally I thought RAID 5/6 would deliver both data integrity
guarantees and performance well. Then I saw the benchmark for a
high-end RAID card showing 25MB/sec write (= 95% overhead) and 80%
overhead on reads
(http://www.storagereview.com/lsi_megaraid_sas3_93618i_review) per disk
set, which was enough to make me conclude that the upcoming softraid
RAID1C with 2-4 drives will be far better at delivering those
qualities.

Of course I haven't seen any benchmarks of RAID1C, but I would guess
its average overhead for both reads and writes will be well below
10-15%, at least with its default CRC32C.
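
As a rough sanity check of that guess, one can compare raw checksum
throughput against SSD throughput; below is a minimal Python sketch
that uses zlib.crc32 as a stand-in for CRC32C (real CRC32C needs a
third-party module, and the block size is my assumption). If the
checksum runs at several GB/s while the SSD delivers a few hundred
MB/s, the checksumming overhead should indeed stay small:

    # Rough micro-benchmark of checksum throughput, using zlib.crc32 as a
    # stand-in for CRC32C (real CRC32C needs a third-party module; the two
    # should be in the same ballpark for a software implementation).
    import os
    import time
    import zlib

    BLOCK = 64 * 1024            # 64 KiB per checksum call (chunk size is an assumption)
    TOTAL = 256 * 1024 * 1024    # checksum 256 MiB in total

    buf = os.urandom(BLOCK)
    crc = 0
    start = time.perf_counter()
    for _ in range(TOTAL // BLOCK):
        crc = zlib.crc32(buf, crc)
    elapsed = time.perf_counter() - start
    print("checksum throughput: %.0f MB/s" % (TOTAL / elapsed / 1e6))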

(Perhaps RAID1C needs to be fortified with a stronger checksumming
algorithm, and perhaps also with reads from both mirror copies on every
read (depending on how the scrubbing works - I haven't checked this
yet), though that is a separate conversation.)
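
To be clear about the read logic I have in mind - verify each block
against its stored checksum and fall back to the other mirror copy on a
mismatch - here is an illustrative Python sketch. This is not
softraid's actual implementation (that lives in the kernel, in C), and
the read interface here is hypothetical:

    # Illustrative sketch only of the verified-mirror-read logic I mean --
    # not softraid's actual implementation (that lives in the kernel, in C).
    import zlib

    def verified_read(mirrors, block_no, stored_crc):
        """Try each mirror copy of a block in turn; return the first copy
        whose CRC32 matches the stored checksum, or None if all are bad."""
        for read_fn in mirrors:          # each mirror is a hypothetical read callable
            data = read_fn(block_no)
            if zlib.crc32(data) == stored_crc:
                return data              # good copy found
            # mismatch: this copy is corrupt, try the next mirror
        return None                      # all copies bad -> surface an IO error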

Of course, to really know how well RAID1C will perform I would need to
benchmark it, but there seems to be a general consensus in the RAID
community that checksummed mirroring is preferable to RAID 5/6, so I
consider my preliminary understanding that RAID1C will be the winning
option to be well founded.
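
When I do get to benchmarking, even something as simple as the
following random-read sketch would give a first number for the
random-IO quality I care about (the device path is a placeholder; run
it read-only, as root, against the softraid volume):

    # Minimal random-read benchmark sketch against a raw device.
    import os
    import random
    import time

    DEV = "/dev/rsd2c"   # placeholder device path -- point it at the volume to test
    BLOCK = 4096         # read size; raw devices want block-aligned offsets
    OPS = 10000

    fd = os.open(DEV, os.O_RDONLY)
    size = os.lseek(fd, 0, os.SEEK_END)
    start = time.perf_counter()
    for _ in range(OPS):
        os.lseek(fd, random.randrange(size // BLOCK) * BLOCK, os.SEEK_SET)
        os.read(fd, BLOCK)
    elapsed = time.perf_counter() - start
    os.close(fd)
    print("%.0f random %d-byte reads/sec" % (OPS / elapsed, BLOCK))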

The SSDs would be enterprise grade and hence *should* shut down
immediately if they start malfunctioning, so there should be
essentially no QoS drops in the softraid from IO operations that take
ultra-long to complete, e.g. >>10 seconds.
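
Since a drive dying slowly (instead of failing fast) is exactly the
failure mode that would hurt here, it may be worth flagging ultra-long
operations in monitoring; a minimal sketch, where the threshold and the
interface are my assumptions:

    # Sketch of flagging "ultra-long" IO operations in monitoring; the
    # one-second threshold is my assumption, not a vendor figure.
    import os
    import time

    SLOW_SECONDS = 1.0   # a healthy enterprise SSD should never take this long

    def timed_read(fd, offset, length):
        """Read, and warn if the operation was suspiciously slow."""
        os.lseek(fd, offset, os.SEEK_SET)
        t0 = time.perf_counter()
        data = os.read(fd, length)
        if time.perf_counter() - t0 > SLOW_SECONDS:
            print("WARNING: read at offset %d took too long - check the drive" % offset)
        return data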

For RAID1C to really deliver then (given that the PSU, CPU, RAM, and
SSDs all work), all that is needed is for the remaining factors to
deliver as well: the SATA connectivity, and the BIOS operating
transparently.

HARDWARE BUDGET
A good Xeon Supermicro server with onboard SATA and Ethernet, and a
decent PSU, RAM, and CPU, runs some 1000s of USD. 2TB x 2-3 enterprise
SSDs is around 2700-4000 USD. Any specialized SATA controller, if
needed, would be below 2000 USD anyhow.

QUESTION
Someone with 30 years of admin experience warned me that if an
individual storage drive dies, the SATA controller could crash, or the
BIOS could kill the whole system.

He also warned me that if any disk in the boot softraid RAID1 breaks,
the BIOS could get so confused that the system wouldn't even boot - and
for that reason I guess the boot disks should be separated altogether
from the "data disks", as the former will have a much, much lower
turnover.

A SATA-controller- or BIOS-induced system crash, freeze, or other need
to reboot the system because one of them malfunctioned would be really
unfortunate, as it would escalate the maintenance requirement at that
moment not only beyond needing no physical intervention at all, but
also beyond the level of just needing to replace or add a physical
drive (which is easy to ask anyone to do).

Q1: Therefore, just to save myself as much future headache as possible,
I would like to hear your thoughts/suggestions on what hardware and
software configuration I should choose (in general, and in terms of the
actual particular controller, motherboard, BIOS settings, etc.) to get
maximum operational stability, in particular for the case that the SSDs
break down, with the OpenBSD and other setup described above.

Q2: Also, if anything in particular is needed for a SATA controller to
support hotswapping well, in general and with OpenBSD, please let me
know.

Q3: In case I need separate SATA hardware: I didn't get any clarity on
which SATA controllers OpenBSD actually supports (e.g.
http://www.openbsd.org/cgi-bin/man.cgi/OpenBSD-current/man4/pci.4?query=pci&sec=4
is not so clear?). In the absence of any further suggestions, my best
guess would be one of the LSI-based HBAs or RAID cards from Avagotech,
that is
http://www.avagotech.com/products/server-storage/host-bus-adapters/ and
www.avagotech.com/products/server-storage/raid-controllers/ . If so,
what do you say about the SATA hardware choice?

(Later on I'll need to learn about softraid hot spares, scrubbing,
rebuild, and hotswapping, though those are software questions.)

Thanks!

Tinker
