I cleaned up the To: and Cc: stuff.  I'm not on Tachyon so my message
bounced.

Keep in mind that I have not yet actually *used* Amplidata, just spoken
with them, but it's a very interesting technology.

Regarding protecting against corruption, that's the point of the erasure
coding.  Mathematically, corruption can't silently persist: if a fragment
*is* damaged or lost, it's corrected right away by recomputing it from the
rest of the data.  The data is spread far and wide to protect against disk,
shelf, controller or even site failures (it's a *globally* distributed store).
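
To make the "recompute the missing piece from the rest" idea concrete,
here's a toy sketch in Python.  It's a single XOR parity fragment, so it
only survives one loss; real erasure codes like Amplidata's tolerate several
simultaneous losses, and the fragment contents here are made up.

    # Toy illustration only: one XOR parity fragment, so exactly one loss is
    # survivable.  The principle is the same as a real erasure code: rebuild
    # the missing fragment from the surviving ones.
    data_fragments = [b"frag0", b"frag1", b"frag2"]

    def xor_bytes(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    # Compute the parity fragment across all data fragments.
    parity = data_fragments[0]
    for frag in data_fragments[1:]:
        parity = xor_bytes(parity, frag)

    # Pretend fragment 1 was lost or corrupted on its disk/node/site.
    lost = 1
    survivors = [f for i, f in enumerate(data_fragments) if i != lost]

    # Rebuild it purely from the surviving fragments plus the parity.
    rebuilt = parity
    for frag in survivors:
        rebuilt = xor_bytes(rebuilt, frag)

    assert rebuilt == data_fragments[lost]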

I can't possibly explain the technology adequately in an email.  It's a
very interesting presentation.  Supposedly, this is what services like
Azure, Amazon and Carbonite run on.  They have insane amounts of data
stored in their clouds for their customers, and there's no practical way to
back up that data, so it's protected by math.  :)

According to them (my math is too rusty to check it), a standard 10-disk
RAID6 with hot spares gives around 3 9's of "durability" -- 99.9%.  If you
back up to another similar RAID6, they quoted 8 9's of durability for the
pair, 99.999999%.  (The naive independent-failure math only gets you to 6
9's, so their model must be crediting more than simple multiplication.)
That's cool.
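
Here's that naive version spelled out, assuming the primary and backup
fail independently, which real life rarely grants you:

    # Back-of-the-envelope only: assumes the two RAID6 sets fail
    # independently (same site, same admins, etc., so this is generous).
    primary_loss = 1 - 0.999     # 3 9's of durability -> 0.1% chance of loss
    backup_loss = 1 - 0.999

    both_lost = primary_loss * backup_loss    # independent losses multiply
    durability = 1 - both_lost

    print(f"combined durability: {durability:.6%}")   # 99.999900%, i.e. 6 9's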

However, with their system, you just tell it how protected you want your
data, and it spreads it out and does the calculations for you.  You give it
the number of "safety drives" you want, for example 4, and it will do it
for you.  With 4, you get 15 9's of durability.  You can lose 4 drives, 4
nodes, or even 4 racks (you obviously have to have the infrastructure
available to provide this capability, but the theory is that it's just
cheap disk, and the software is doing the work).  By the way, this 15 9's
protection is 70% efficient, so 1TB of raw capacity gives you about 700GB
usable.  If you want DR, you just geo-spread it around the country or the
world.  It's not full replication, it's calculated distributed parity using
the erasure coding, so it's more space-efficient than a second copy, but not
as fast.
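
For concreteness, here's roughly how the k-data + m-check accounting works.
The 10+4 split below is my guess at what produces the quoted ~70%
efficiency; Amplidata didn't give me the exact layout.

    # k data fragments + m check fragments; the "safety drives" number is m.
    # k=10 is an assumption on my part, chosen to land near 70% efficiency.
    k, m = 10, 4

    efficiency = k / (k + m)     # fraction of raw capacity that is usable
    tolerated = m                # any m fragments can be lost at once:
                                 # drives, nodes, or racks, as long as the
                                 # fragments are spread across them

    raw_tb = 1.0
    usable_tb = raw_tb * efficiency

    print(f"efficiency: {efficiency:.0%}")                  # ~71%
    print(f"usable from {raw_tb:.0f} TB raw: {usable_tb:.2f} TB")
    print(f"concurrent failures tolerated: {tolerated}")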

The performance is supposedly good enough for primary storage reads and for
sequential I/O, but not good for write-heavy random workloads.  Sounds like
a good case for big data to me.

There are policies to manage the data, so you can lock it, encrypt it,
provide automated or manual expiration, and even do DoD-style secure
erasure.  So, yeah,
rm -rf is an issue if you don't use the policies, but that's the case
anywhere.  In big data, you probably don't want to be able to delete
anything anyway, so make it WORM and protect it.

I would love to check it out in more depth, but frankly I just don't have
that much data yet.  :)  This system is an object store, so you still need
something on the front end if you want to use standard protocols like NFS.
 If you have an application that uses their API, though, you don't need
that.  Filetek can front this system and provide NFS/CIFS file services, but
that's a bit redundant, since Filetek duplicates a lot of the policy
management that Amplidata already does.
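
For what it's worth, "an application that uses their API" just means the app
talks to the object store directly over HTTP instead of going through an
NFS/CIFS gateway.  This is NOT Amplidata's actual API (I haven't seen it);
it's a made-up REST-style PUT/GET with a made-up hostname and bucket, purely
to show the shape of it:

    # Generic object-store access over HTTP -- hostnames and paths are
    # invented for illustration, not Amplidata's real interface.
    import urllib.request

    base = "http://objectstore.example.com/mybucket"

    # Store an object.
    req = urllib.request.Request(f"{base}/report.csv",
                                 data=b"col1,col2\n1,2\n",
                                 method="PUT")
    urllib.request.urlopen(req)

    # Read it back.
    with urllib.request.urlopen(f"{base}/report.csv") as resp:
        print(resp.read().decode())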

-Adam


On Sun, Sep 15, 2013 at 8:14 AM, Yves Dorfsman <y...@zioup.com> wrote:

> I don't know about those particular technologies, but I'm assuming through
> snapshots with equivalent retentions as you'd do for your backups.
>
_______________________________________________
Tech mailing list
Tech@lists.lopsa.org
https://lists.lopsa.org/cgi-bin/mailman/listinfo/tech
This list provided by the League of Professional System Administrators
 http://lopsa.org/
