In the message dated: Sat, 14 Sep 2013 07:05:48 -0700,
The pithy ruminations from Andrew Hume on 
<[lopsa-tech] large scale storage - medium bandwidth> were:
=> a recent meditation from a mailing list on the issue of media bandwidth
=> for big data. note especially the claim that the time needed to migrate
=> the data to another medium exceeds the lifetime of the current medium.

Huh? That just doesn't sound right to me.

What does "lifetime of the current media" mean?
        Number of reads or writes? Unlikely, as it sounds like the data
        is on archival media, and the issue is migration.

        Number of years that the manufacturer considers the media to 
        be viable (given proper storage, etc.)?

Let's plug in some numbers here, thanks to Wikipedia[1]:

        LTO1:
                capacity = 200GB (compressed)
                data transfer speed 20MB/s
                15+ years archival lifetime
                Introduced September 2000
                

Assuming a pathological, worst-case scenario where all the old data
is on LTO1 tapes, purchased the week they came onto the market, and
where there's only a single tape drive available to read legacy tapes
in order to migrate data to a newer/bigger/faster storage platform,
and assuming that the "lifetime of the current media" means that in
September, 2015, all the LTO1 tapes disappear in a puff of smoke (think
Mission Impossible), and assuming 1 minute to change tapes, how much
data could we read in 2 years?

My 'back of the envelope calculation' gives:

        200GB capacity @ 20MB/s =~ 2 hrs 50 min to read a tape
                                 +        1 min to change tapes
                                 ==============
                                 2:51 (2.85 hrs)/tape = 8.42 tapes/day

        365 days * 2 years * 8.42 tapes/day =~ 6146 tapes

        6146 tapes =~ 1.17 Petabytes
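
A minimal Python sketch of the same arithmetic, for anyone who wants to
plug in their own numbers. The inputs are the figures quoted above (200GB
per tape, 20MB/s, 1 minute per tape change, 2 years); the function and
variable names are only illustrative. It uses the unrounded read time
(10,000 seconds, about 2 hrs 47 min) rather than 2 hrs 50 min, so it
comes out a little higher than the hand calculation.

```python
# Back-of-the-envelope check of the single-drive LTO1 migration estimate.
# Inputs are the figures quoted in the message; names are illustrative.

CAPACITY_GB = 200   # per-tape capacity (compressed), decimal GB
SPEED_MB_S = 20     # drive transfer speed
SWAP_S = 60         # assumed 1 minute to change tapes
DAYS = 365 * 2      # two-year migration window


def tapes_per_day(capacity_gb=CAPACITY_GB, speed_mb_s=SPEED_MB_S, swap_s=SWAP_S):
    """Tapes one drive can read per day, running around the clock."""
    seconds_per_tape = capacity_gb * 1000 / speed_mb_s + swap_s
    return 86400 / seconds_per_tape


def tapes_in_window(days=DAYS):
    """Whole tapes read by a single drive over the window."""
    return int(days * tapes_per_day())


tapes = tapes_in_window()
total_gb = tapes * CAPACITY_GB

print(f"{tapes_per_day():.2f} tapes/day")
print(f"{tapes} tapes in {DAYS} days")
print(f"{total_gb / 1000**2:.2f} PB (decimal) / {total_gb / 1024**2:.2f} PB (binary)")
```

With the unrounded read time this works out to roughly 8.6 tapes/day and
about 6270 tapes, i.e. the same ballpark of a bit over 1 Petabyte.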

I'm assuming no shoe-shining, no bad media (it hasn't reached its
"lifetime" yet), no read-errors, no data verification, etc. Even if those
factors reduced the effective data transfer rate by 50%, you'd still be
able to move data off of more than 3000 tapes before they reach the low
end of their "lifetime".
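
A rough Python sketch of that derating argument, assuming the same
single-drive scenario (200GB tapes, 20MB/s, 1 minute per tape change,
2 years). The efficiency values and the function name are my own
illustrative assumptions, not measurements from any real site.

```python
# Sensitivity check: tapes readable by one drive in two years when
# real-world effects (shoe-shining, retries, verification) cut throughput.
# Tape figures come from the message; efficiency values are assumptions.

def tapes_read(efficiency, capacity_gb=200, speed_mb_s=20, swap_s=60, days=730):
    """Whole tapes read in `days` at a fraction `efficiency` of rated speed."""
    read_s = capacity_gb * 1000 / (speed_mb_s * efficiency)
    return int(days * 86400 / (read_s + swap_s))


for eff in (1.0, 0.75, 0.5, 0.25):
    print(f"{eff:>4.0%} of rated speed: {tapes_read(eff)} tapes")
```

At 50% of rated speed the sketch still gives a little over 3100 tapes,
which is consistent with the "more than 3000 tapes" figure.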

Sure, you could re-do the scenario with much older media (DAT-2,
anyone?) if you want to make it even more unlikely, but I think this
shows that it's implausible that a site really has more data stored on
tape than could be migrated before the media "expires".


[1] http://en.wikipedia.org/wiki/Linear_Tape-Open
        
=> 
=> > Exactly what I was referring to - bandwidth needed for data integrity
=> > & migration. Oh, and read-never is not a myth at all - at least in
=> > the minds of many datacenters and the folks who run them. They are
=> > under either legal mandate and/or company policy to retain data, read
=> > regardless. They hope to never read it at all. Still, they must prove
=> > in a court of law that they have retained it.
=> >
=> > Which brings us back to data integrity and long-term preservation. If
=> > you think it's a problem today, just wait...this is the 8 bazillion
=> > pound gorilla that faces all institutions who plan on storing exabytes
=> > of data. FB is one of those.
=> >
=> > To your point about large tape farms (disclaimer: I used to work for
=> > StorageTek) I already know several HPC sites who are 'stuck' - i.e.
=> > they cannot (or will not pay for) the necessary infrastructure to
=> > correctly maintain and migrate exascale data collections. It would

Oh, "correctly maintain and migrate" and "pay for" are very different
considerations than the question of whether it is mathematically possible
to migrate data within the "lifetime of the current media".

=> > take them longer to migrate the collection to new tape than the useful

Based on the numbers above, I don't believe that.

=> > lifetime of the media. And they are too cheap to buy and maintain the

I certainly believe that it's too expensive to buy & maintain the capacity
to do the migration "correctly", but not that it is simply impossible
to move the data over the lifetime of the media.

=> > needed infrastructure to perform such a migration in parallel, to
=> > reduce the time needed.
=> >
=> > Just you wait. 5 years from now, the scheist will hit the (exabyte)
=> > fan. Storing data today is one thing, preserving it for decades is
=> > quite another. HIPAA, anyone?
=> 
=> ----------------------- Andrew Hume 949-707-1964 (VO and best)
=> 732-420-2275 (NJ) and...@research.att.com
=> 

-- 
Mark Bergman
_______________________________________________
Tech mailing list
Tech@lists.lopsa.org
https://lists.lopsa.org/cgi-bin/mailman/listinfo/tech
This list provided by the League of Professional System Administrators
 http://lopsa.org/
