Dear Nick,
On 2016-02-15 05:29, Nick Holland wrote:
On 02/13/16 11:49, Tinker wrote:
Hi,
1)
http://www.openbsd.org/papers/asiabsdcon2010_softraid/softraid.pdf, page 3,
"2.2 RAID 1", says that it reads "on a round-robin basis from all active
chunks", i.e. read operations are spread evenly across disks.
Since then, did anyone implement selective reading based on experienced
read operation time, or a user-specified device read priority order?
That would allow a Softraid RAID1 based on 1 SSD mirror + 1 SSD mirror +
1 HDD mirror, which would give the best combination of IO performance and
data security that OpenBSD could offer today.
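To make the idea concrete, the selection policy I have in mind would look
roughly like the following. This is a purely hypothetical sketch, not taken
from the real softraid code; the structures and names are invented only to
illustrate preferring the fastest (or user-prioritized) active chunk instead
of plain round-robin:

#include <stddef.h>
#include <stdint.h>

struct fake_chunk {
        int       active;        /* chunk is online */
        int       user_prio;     /* lower = preferred, e.g. SSD=0, HDD=1 */
        uint64_t  avg_read_ns;   /* smoothed observed read latency */
};

/* Pick the active chunk with the best (user_prio, avg_read_ns) pair. */
int
pick_read_chunk(const struct fake_chunk *c, size_t nchunks)
{
        int best = -1;
        size_t i;

        for (i = 0; i < nchunks; i++) {
                if (!c[i].active)
                        continue;
                if (best == -1 ||
                    c[i].user_prio < c[best].user_prio ||
                    (c[i].user_prio == c[best].user_prio &&
                     c[i].avg_read_ns < c[best].avg_read_ns))
                        best = (int)i;
        }
        return best;            /* -1 means no readable chunk at all */
}

/* After each completed read, the caller would update the moving average,
 * e.g.: c->avg_read_ns = (7 * c->avg_read_ns + measured_ns) / 8; */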
I keep flip-flopping on the merits of this.
At one point, I was with you, thinking, "great idea! Back an expensive,
fast disk with a cheap disk".
Currently, I'm thinking, "REALLY BAD IDEA". Here's my logic:
There's no such thing as an "expensive disk" anymore. A quick look
[...]
of "fast" storage to make their very few business apps run better. No
question in their mind, it was worth it. Now we do much more with our
computers and it costs much less. The business value of our investment
should be much greater than it was in 1982.
And ignoring hardware, it is. Companies drop thousands of dollars on
consulting and assistance and think nothing of it. And in a major
computer project, a couple $1000 disks barely show as a blip on the
budget. Hey, I'm all about being a cheap bastard whenever possible, but
this just isn't a reasonable place to be cheap, so not somewhere I'd
suggest spending developer resources.
Also ... it's probably a bad idea for functional reasons. You can't
just assume that "slower" is better than "nothing" -- very often, it's
indistinguishable from "nothing". In many cases, computer systems that
perform below a certain speed are basically non-functional, as tasks can
pile up on them faster than they can produce results. Anyone who has
dealt with an overloaded database server, mail server or firewall will
know what I'm saying here -- at a certain load, they go from "running
ok" to "death spiral", and they do it very quickly.
If you /need/ the speed of an SSD, you can justify the cost of a pair of
'em. If you can't justify the cost, you are really working with a
really unimportant environment, and you can either wait for two cheap
slow disks or skip the RAID entirely.
How fast do you need to get to your porn, anyway?
I technically agree with you -
What led me to think about SSD+HDD was the idea of having, on the same
mountpoint, a hybrid SSD-HDD storage where the "important stuff" would
automatically be on the SSD and the "less important" on the HDD.
This arrangement would mean that those two data sets could be stored within
one and the same directory structure, which would be really handy, and
archiving of unused files would happen implicitly.
I understand that ZFS is quite good at delivering this. LSI MegaRAID
cards are good at that as long as the "important stuff" stays under 512GB
forever, which is not the case, duh.
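To illustrate what I mean by "automatically": the placement rule I picture
is as trivial as the sketch below. The thresholds and names are invented,
and this has nothing to do with how ZFS or MegaRAID actually decide:

#include <sys/types.h>
#include <time.h>

enum tier { TIER_SSD, TIER_HDD };

/* Keep a file on the SSD tier while it was touched within the last 30
 * days and still fits the remaining SSD budget; otherwise it is
 * implicitly "archived" to the HDD tier. */
enum tier
place_file(time_t now, time_t last_access, off_t size, off_t ssd_free)
{
        const time_t hot_window = 30 * 24 * 60 * 60;

        if (now - last_access <= hot_window && size <= ssd_free)
                return TIER_SSD;
        return TIER_HDD;
}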
This whole idea has a really exotic, unpredictable, "stinking" edge to it
though. Your analogy that "slower" is generally as bad as "nothing",
combined with the market price situation, makes complete sense -
So, even if kind of unwillingly, I must agree with your reasoning.
(now ... that being said, part of me would love a tmpfs / disk RAID1,
one that would come up degraded, and the disk would populate the RAM
disk; writes would go to both subsystems, reads would come from the RAM
disk once populated. I could see this for some applications like CVS
repositories or source directories where things are "read mostly", and
typically smaller than a practical RAM size these days. As there are
still a few orders of magnitude greater performance in a RAM disk than
in an SSD, and this will likely remain true for a while, there are SOME
applications where this could be nice)
Wait... you mean you would like OpenBSD to implement a read cache that is
"100% caching aggressive", rather than the current buffer cache which has
"dynamic caching aggressiveness"? I don't understand how this could make
sense, can you please clarify?
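For reference, the mechanics I picture from your description are roughly
the following. This is a purely hypothetical sketch; no such volume type
exists in softraid today, and the callbacks are placeholders rather than
real kernel interfaces:

#include <stddef.h>
#include <sys/types.h>

struct ram_disk_mirror {
        int     ram_populated;  /* set once the disk was copied into RAM */
        void    (*ram_io)(int write, void *buf, size_t len, off_t off);
        void    (*disk_io)(int write, void *buf, size_t len, off_t off);
};

/* Writes always go to both sides, so the disk stays authoritative. */
void
mirror_write(struct ram_disk_mirror *m, void *buf, size_t len, off_t off)
{
        m->ram_io(1, buf, len, off);
        m->disk_io(1, buf, len, off);
}

/* Reads are served from RAM once population is done, otherwise from disk. */
void
mirror_read(struct ram_disk_mirror *m, void *buf, size_t len, off_t off)
{
        if (m->ram_populated)
                m->ram_io(0, buf, len, off);
        else
                m->disk_io(0, buf, len, off);
}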
2)
Also if there's a read/write failure (or excessive time consumption for
a single operation, say 15 seconds), will Softraid RAID1 learn to take
the broken disk out of use?
As far as I am aware, Softraid (like most RAID systems, hw or sw) will
deactivate a drive which reports a failure. Drives which go super slow
(i.e., always manage to get the data BEFORE the X'th retry at which they
would toss an error) never report an error back, so the drive never gets
deactivated.
Sound implausible? Nope. It Happens. Frustrating as heck when you
have this happen to you until you figure it out. In fact, one key
feature of "enterprise" and "RAID" grade disks is that they hop
off-line and throw an error fast and early, to prevent this problem
(some "NAS" grade disks may do this. Or they may just see your credit
limit hasn't been reached).
However, having done this for a looong time, and seen the problems from
both rapid-failure and "try and try" disks, I'll take the "try and try"
problem any day. Happens a lot less often, and tends to be less
catastrophic when it happens (hint: you WILL be quickly fixing a disk
system which gets to be 100x slower than normal. You may not notice the
first disk that fails and causes an array to be non-redundant until the
disk failure that takes the array down completely).
What I take from what you say here, unless anyone proves otherwise, is:

1) The softraid subsystem will wait indefinitely for the underlying
drives' IO operations. Therefore, misbehavior in the form of
ultra-long-running IO operations and the like will many times show up
as the same misbehavior in the softraid volume globally.
On the other hand, if a disk signals a complete failure (such as going
offline, or reporting it via SMART), it will be handled gracefully,
without any QoS impact, by softraid simply disconnecting the drive.

2) Enterprise drives happen to be the ones that follow exactly this
failure pattern, i.e. either they work perfectly, or in the case of any
real issue they report themselves gracefully as broken, by either just
disconnecting or sending the proper SMART report.
Therefore, I should always buy only enterprise-certified drives, e.g.
the Samsung PM863 or SM863 as SSD, or the "Seagate Enterprise Capacity
3.5 HDD 8TB 3.5" SATA-600" as HDD.

Finally, if the absolutely unexpected happened and IO throughput
collapsed for some secondary reason, then there must be some trigger to
detect that and take the server out of use completely for maintenance,
and that's all.
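In practice I guess that trigger could be as dumb as a userland latency
probe run from cron or a monitoring system. A minimal sketch, with a
made-up device path and threshold:

#include <sys/time.h>

#include <err.h>
#include <fcntl.h>
#include <unistd.h>

int
main(void)
{
        const char *dev = "/dev/rsd0c";  /* example raw device, adjust */
        const double limit_s = 15.0;     /* made-up "too slow" threshold */
        char buf[65536];
        struct timeval t0, t1;
        double took;
        ssize_t n;
        int fd;

        if ((fd = open(dev, O_RDONLY)) == -1)
                err(2, "open %s", dev);

        gettimeofday(&t0, NULL);
        n = read(fd, buf, sizeof(buf));
        gettimeofday(&t1, NULL);
        close(fd);

        if (n == -1)
                err(2, "read %s", dev);

        took = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;

        /* Non-zero exit so cron/monitoring can alarm and pull the box. */
        return took > limit_s ? 1 : 0;
}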
Cheers.
Tinker