Hi,

This is to ask you for your thoughts/advice on the best hardware setup
for an OpenBSD server.

Oh, where to start. You clearly have a lot of enthusiasm, but not a lot
of experience.

OK, I'll bite.

"best" is subjective.  The server(s) will be surrounded
by clients (they are servers after all).  What is the best client for
this best server?  What is the purpose of this collection of servers
and clients?  What is your budget?  Who will evaluate this system and on
what basis will they describe it as successful or not?


This email ultimately reduces to the question, "What HW & config do you
suggest for minimizing the possibility of IO freeze or system crash from
the BIOS or SATA card in the event of SSD/HDD malfunction?". However,
I'll walk through the whole reasoning behind the HW choice from the
ground up with you, just to check that you feel I got it all right.


This post and others seem to show you are very concerned with I/O freeze.
Yet that is a rare occurrence compared to hundreds of other
possibilities for system failure.  AC power failure, for instance.

I hope this email will serve as general advice for others re. best
practice for OpenBSD server hardware choices.

GOAL
I am setting up an SSD-based storage facility that needs high data
integrity guarantees and high performance (random reads/writes). The
goal is to be able to safely store and constantly process something
about as important as, say, medical records.

"high" and "guarantee" are mutually incompatible. You either get a guarantee or you don't. (Any guarantee is unlikely to be credible.) Now, if they said "perfect" and "guarantee" then your statement would be correct, however, still
unbelievable.  There is a disconnect here in the logic.


Needless to say, at some point such a storage server *will* fail, and
the only way to get any sense of a pretty-much-100% uptime guarantee is
to set up the facility as multiple servers in a redundant cluster.

OK, now you have a choice: do you want to spend lots of money on highly
reliable servers, and cluster them, or spend less money on less reliable
servers and rely on the clustering for overall reliability?

OpenBSD does not support clustered filesystems, so here you must be assuming some other non-OpenBSD package, such as from ports, to implement "clusters".

Is this right?
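
If by "cluster" you only mean IP-level failover of a service, OpenBSD does
ship that natively as carp(4); whether that is sufficient depends entirely
on the (still undefined) purpose above.  A minimal sketch of one node, with
made-up addresses and interface names (the second node uses the same vhid
and password but a higher advskew, so it becomes the backup):

    # /etc/hostname.carp0
    inet 10.0.0.10 255.255.255.0 NONE vhid 1 pass examplepass carpdev em0

Anything beyond that, such as a shared or replicated filesystem, is not in
base.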


What the individual server can do, then, is to never ever deliver broken
data. And, locally and collectively, there needs to be a well-working
mechanism for detecting when a node needs maintenance and taking it out
of use at that point.

Another error in logic. "never ever" is incompatible with "*will* fail".

You might want to review how Netflix manages failure. Look up "chaos monkey".
The gist is that, starting from a "will fail" assumption, they constantly
test their handling of failures.


What I want to ask you about now, then, is your thoughts on what would be
the most suitable hardware configuration for the individual server, for
them to function for as long as possible without need for physical
administrator intervention.

Why do you think you need to build such a device?  Why don't you buy it?

(Dell PowerEdge VRTX, HP hyperconverged systems, etc.)


(And for when physical admin intervention is needed, to reduce the
competence required for that maintenance, if possible, to nothing more
than hotswapping or adding a physical disk - that is, to minimize the
need for reboots due to SATA controller issues, weird BIOS behavior, or
other causes.)

GENERAL PROBLEM SURFACE OF SERVER HARDWARE

It seems to me that the accumulated experience with respect to why
servers break is: 1) anything storage-related, 2) PSU, 3) other.

You don't give any source for this claim. Check out various publications
by Google and other at-scale users about their experience.


So then, stability aspects should be given consideration in that order.

For 2), the PSU can be made redundant easily, and PSU failures are
fairly rare anyhow, so that is pretty much what is reasonable to do for
that.

You omit AC power failures, distribution panel faults, uninterruptible power
systems, power cables, fingers accidentally hitting on/off buttons, feet
kicking power cables, and so on.  Why do you leave these risks out?


For 3), the "other" category would either be due to bad thermal
conditions (so that needs to be given proper consideration), or would
happen regardless, with no safeguards available, so we just have to
accept that.

The rest of this post will discuss 1) the storage aspect, only.

THE STORAGE SOLUTION
Originally I thought RAID 5/6 would provide data integrity guarantees
and good performance. Then I saw the benchmark for a high-end RAID card
showing 25MB/sec write (= 95% overhead) and 80% overhead on reads
(http://www.storagereview.com/lsi_megaraid_sas3_93618i_review) per disk

The reference you cite says no such thing.  The word "overhead" does not
appear in the article.

That reference has some flaky methodology (averaging of standard deviations,
no less). The performance they claim seems somewhat unlikely, and is not
controlled for caching etc. by their operating system.  Read and write
throughput in RAID-5 or RAID-0 being so widely different is one example.
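
And if you are going to talk about "overhead", state your baseline.  Against
an assumed ~500 MB/s sequential write for a single SATA SSD (my ballpark, not
a figure from that review), 25 MB/s would indeed work out to

    1 - 25/500 = 0.95, i.e. roughly 95% below the raw device,

but that comparison means nothing unless the same workload was also run
against a bare disk on the same system.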

set, which is enough to make me understand that the upcoming softraid
RAID1C with 2-4 drives will be far better at delivering those qualities.

Of course I didn't see any benchmarks on RAID1C, but I guess its
overhead for both read and write will be <<10-15% on average, at least
with its default CRC32C.

(Perhaps RAID1C needs to be fortified with a better checksumming
algorithm, and perhaps also with reads from both mirrors on every read
(depending on how the scrubbing works - didn't check this yet), though
that is a separate conversation.)

Of course, to really know how well RAID1C will perform, I would need to
benchmark it, but there seems to be a general consensus in the RAID
community that checksummed mirroring is preferable to RAID 5/6, so I
perceive that my preliminary understanding that RAID1C will be the
winning option is well founded.
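
For what it is worth, the softraid interface is the same whatever the
discipline; a plain RAID 1 mirror today is created with something like

    # bioctl -c 1 -l /dev/sd1a,/dev/sd2a softraid0

(device names depend on your hardware), and presumably a checksumming
discipline would be created the same way.  Benchmark it on your actual
hardware before trusting anyone's "general consensus".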

The SSDs would be enterprise grade and hence *should* shut down
immediately if they start malfunctioning, so there should be essentially
no QoS dumps in the softraid from IO operations that take extremely long
to complete, e.g. >>10 seconds.

Why would a disk do this?  You seem to be very focussed on this unlikely
event without considering numerous other possibilities: data corruption in read, in spite of built-in error checking; data corruption in transfer of
data from disk to computer; memory faults in the RAID controller;
drive write faults such as writing to the incorrect block on disk;
etc etc etc.


For the RAID1C to really deliver then (given that the PSU, CPU, RAM, and
SSDs all work), all that would be needed is for the remaining factors to
deliver as well, namely the SATA connectivity and the BIOS operating
transparently.

HARDWARE BUDGET
A good Xeon Supermicro server with onboard SATA and ethernet with decent
PSU, RAM, CPU is some 1000:ds USD. 2TB x 2-3 enterprise SSD:s is around
2700-4000 USD. If any specialized SATA controllers if needed would be
below 2000 USD anyhow.
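
Adding your own numbers up, taking, say, 2000 for the server, 2700-4000 for
the SSDs and up to 2000 for a controller (my reading of your figures), you
land very roughly at 5000-8000 USD per node, before spares, and multiplied
by however many nodes the redundant cluster needs.  That is the figure to
compare against a packaged solution.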

QUESTION
Someone with 30 years of admin experience warned me that if an
individual storage drive dies, the SATA controller could crash,
or the BIOS could kill the whole system.


Ummm, you have access to somebody with all this experience; why don't you ask
them to design the system for you?  Or, at least, review the design you
come up with?

Or ask them for an education in how to approach the design of a system that
includes servers, clients, software for the clients, etc.

Also, he warned me that if any disk in the boot softraid RAID1 breaks,
then the BIOS could get so confused that the system wouldn't even want
to boot - and for that reason I guess the boot disks should be separated
altogether from the "data disks", as the former will have a much, much
lower turnover.

Logic error. If a disk can fail, it will, and if it is a boot disk, it will
also fail.  Whether the softraid is two disks or eight, you run the same risks.

Or, otherwise, avoid disk boot altogether and use a network boot from a
dual-redundant bootp/NFS server.
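
Netbooting is documented in diskless(8) and pxeboot(8); the dhcpd side of it
is a few lines.  A sketch, with placeholder addresses (tftpd serves pxeboot
and the kernel, the root filesystem comes from NFS):

    # /etc/dhcpd.conf on the boot server
    subnet 10.0.0.0 netmask 255.255.255.0 {
            option routers 10.0.0.1;
            filename "pxeboot";
            next-server 10.0.0.5;
            range 10.0.0.100 10.0.0.150;
    }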


A SATA-controller- or BIOS-induced system crash, freeze, or other
malfunction forcing a reboot would be really unfortunate, as it would
escalate the maintenance requirement at that moment not only beyond
needing no physical intervention at all, but also beyond the level of
just needing to change or add a physical drive (which is easy to ask
anyone to do).

Do not overestimate the skills of your "anyone". Unless they are rehearsed
regularly, they won't manage even that.


Q1: Therefore, just to save myself as much future headache as possible,
I would like to hear your thoughts/suggestions on what hardware and
software configuration I should choose (in general, and which particular
controller, motherboard, BIOS settings, etc.), to get maximum
operational stability, in particular for the case that the SSDs break
down, with the OpenBSD & other setup as described above?


Given your mix of enthusiasm and relative inexperience, you need to find someone
who can explain some of the bigger issues you are facing.  Choosing BIOS
settings is not one of them.

Q2: Also, if anything is needed for a SATA controller to support
hotswapping well, in general and with OpenBSD, please let me know.
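
If hotswap really is the scenario you are designing for, also look at
hotplugd(8): it runs /etc/hotplug/attach and /etc/hotplug/detach as devices
come and go, which is where any "a disk was just swapped" handling would
live.  A rough sketch; the logger call is only a placeholder for whatever
you would actually do there:

    #!/bin/sh
    # /etc/hotplug/attach: hotplugd(8) passes the device class and name
    DEVCLASS=$1
    DEVNAME=$2
    case $DEVCLASS in
    2)
            # class 2 = disk devices
            logger "hotplug: disk $DEVNAME attached"
            ;;
    esac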

Q3: In case I need any separate SATA HW, I didn't get any clarity about
which SATA controllers OpenBSD actually supports (e.g.
http://www.openbsd.org/cgi-bin/man.cgi/OpenBSD-current/man4/pci.4?query=pci&sec=4
is not so clear?). In the absence of any further suggestion, my best
guess would be that the best choice would be one of the LSI-based HBAs
or RAID cards from Avagotech, that is
http://www.avagotech.com/products/server-storage/host-bus-adapters/ and
www.avagotech.com/products/server-storage/raid-controllers/ . If so,
what do you say about SATA HW choice?


Clearly you like to shop. Why not shop for complete hardware solutions instead
of just parts?
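
If you want hard answers on controller support, skip the vendor pages and
read the driver manual pages, then check what your kernel actually attaches;
for the hardware you describe that would typically be ahci(4) for onboard
SATA and the mpi(4)/mpii(4) or mfi(4)/mfii(4) families for LSI/Avago HBAs
and RAID cards.  For example:

    # see which storage drivers attached on this box
    dmesg | egrep 'ahci|mpi|mfi'
    man 4 ahci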

(Later on I'll need to learn about softraid hot spares, scrubbing,
rebuild and hotswapping, though those are software questions.)

Thanks!

Tinker

Some important things:

- what is the purpose of this collection of clients, servers, networks
  and software?
- who will judge, and how will they judge, the effectiveness of it?
  (how fast/correctly it performs, not just reliability)
- what is their budget?
- how much time will they give you?
- how will you spend your time?
- how will you prove to yourself that you have finished? How can you
  prove to your users/customers that it works?


--John
