It also doesn't protect against something going completely SNAFU inside
the filesystem because of some software error. Garbage in, garbage out.
But at some point you do have to weigh the risks and take your
chances. Given that it's a distributed system, you are at least somewhat
isolated from a single node going berserk, and the likelihood of the same
bug triggering across all nodes (which are each seeing different I/O)
at exactly the same time is low, though not zero.
Also, since it's a wholly integrated appliance, you don't have the crazy
edge case that I had where trying to upgrade the OS on a standard-OS
service node caused metadata to get stomped because kickstart wanted to
'helpfully' put an ext4 filesystem over the metadata. There are other
upgrade risks though.
On 9/15/2013 11:04 AM, Andrew Hume wrote:
i see these claims all the time, and the main problem for me is that
this reliability is just math. it is very promising and all, but it's
just math.
does anyone really believe the linux kernel can deliver data with 15 9s?
does anyone believe or even care that bits (erasure encoded or not) can get
to an application's memory with 15 9s?
at the last talk like this, i asked what the chance was that the machine
room housing the disks went lights out. no one knows, but it is rather
more than 15 9s would allow.
the fact that this doesn't enter into any of the failure calculations shows
that the analysis is incomplete.
having said that, erasure coding is great and should, and will, proliferate.
it is not without its own costs but is a valuable component of reliable
data.
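
To put rough numbers on the machine-room point, here is a minimal sketch
that combines independent failure modes into one loss probability. The
15 9s figure is the claim under discussion; the machine-room figure is a
purely hypothetical placeholder, since nobody at the talk knew the real
one. The function names are made up for illustration, and treating the
failure modes as independent is itself an assumption.

```python
import math

def combined_loss(*loss_probs: float) -> float:
    """Probability that at least one independent failure mode loses the data."""
    # Sum log-survival terms; log1p/expm1 keep precision for tiny probabilities.
    log_survive = sum(math.log1p(-p) for p in loss_probs)
    return -math.expm1(log_survive)

def nines(loss_prob: float) -> float:
    """Express a loss probability as a count of 9s (1e-15 -> 15)."""
    return -math.log10(loss_prob)

STORAGE_LOSS = 1e-15  # the claimed 15 9s of durability
ROOM_LOSS = 1e-4      # HYPOTHETICAL placeholder for a machine-room outage

total = combined_loss(STORAGE_LOSS, ROOM_LOSS)
print(f"combined: about {nines(total):.1f} nines")
```

With these made-up inputs the result is about 4 nines: the weakest
component sets the overall figure, however many 9s the encoding
contributes on its own.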
On Sep 15, 2013, at 7:42 AM, Adam Levin wrote:
However, with their system, you just tell it how protected you want
your data, and it spreads it out and does the calculations for you.
You give it the number of "safety drives" you want, for example 4,
and it will do it for you. With 4, you get 15 9's of durability. You
can lose 4 drives, 4 nodes, or even 4 racks (you obviously have to
have the infrastructure available to provide this capability, but the
theory is that it's just cheap disk, and the software is doing the
work). By the way, this 15 9's protection is 70% efficient, so 1TB of
raw capacity provides 700GB of usable capacity. If you want DR, you just
geo-spread it around the country or the world. It's not full
replication; it's distributed parity calculated with erasure
coding, so it's more space-efficient, but not as fast.
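
For a feel of the arithmetic behind "safety drives", here is a rough
sketch of a generic k+m erasure-coding model. This is not the vendor's
actual calculation: the fragment counts, the function names, and the
per-fragment loss probability are all assumptions for illustration, and
fragment losses are treated as independent.

```python
from math import comb

def storage_efficiency(data_fragments: int, safety_fragments: int) -> float:
    """Usable fraction of raw capacity for a k+m layout: k / (k + m)."""
    return data_fragments / (data_fragments + safety_fragments)

def loss_probability(data_fragments: int, safety_fragments: int,
                     p_fragment_loss: float) -> float:
    """Chance that more than m of the k+m fragments are lost at once.

    p_fragment_loss is a HYPOTHETICAL probability that any one fragment
    is lost before a repair completes; losses are assumed independent.
    """
    n = data_fragments + safety_fragments
    p = p_fragment_loss
    # Data survives any loss of up to m fragments, so sum the binomial
    # tail from m+1 lost fragments upward.
    return sum(comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(safety_fragments + 1, n + 1))

# Two assumed layouts with 4 safety fragments, bracketing the quoted 70%.
for k in (9, 10):
    print(f"{k}+4: {storage_efficiency(k, 4):.1%} efficient")

# Assumed 1% per-fragment loss chance, only to show how the tail shrinks.
print(loss_probability(10, 4, 0.01))
```

With 4 safety fragments, 9 or 10 data fragments give roughly 69% and 71%
efficiency, which is in line with the 70% quoted. The loss probability
depends entirely on the assumed per-fragment number and on failures
being independent.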
_______________________________________________
Tech mailing list
Tech@lists.lopsa.org
https://lists.lopsa.org/cgi-bin/mailman/listinfo/tech
This list provided by the League of Professional System Administrators
http://lopsa.org/