In the message dated: Sat, 14 Sep 2013 07:05:48 -0700,
The pithy ruminations from Andrew Hume on
<[lopsa-tech] large scale storage - medium bandwidth> were:
=> a recent meditation from a mailing list on the issue of media bandwidth
=> for big data. note especially the claim that the time needed to migrate
=> the data to another medium exceeds the lifetime of the current medium.

Huh? That just doesn't sound right to me.

What does "lifetime of the current media" mean? Number of reads or
writes? Unlikely, as it sounds like the data is on archival media, and
the issue is migration. Number of years that the manufacturer considers
the media to be viable (given proper storage, etc.)?

Let's plug in some numbers here, thanks to Wikipedia[1]:

	LTO1:
		capacity = 200GB (compressed)
		data transfer speed 20MB/s
		15+ years archival lifetime
		Introduced September 2000

Assuming a pathological, worst-case scenario where all the old data is
on LTO1 tapes, purchased the week they came onto the market, and where
there's only a single tape drive available to read legacy tapes in order
to migrate data to a newer/bigger/faster storage platform, and assuming
that the "lifetime of the current media" means that in September 2015
all the LTO1 tapes disappear in a puff of smoke (think Mission
Impossible), and assuming 1 minute to change tapes, how much data could
we read in 2 years?

My 'back of the envelope calculation' gives:

	200GB capacity @ 20MB/s =~ 2 hrs 50 min to read a tape
	                         +        1 min to change tapes
	                           ==============
	                           2:51 (2.85 hrs)/tape = 8.42 tapes/day

	365 days * 2 years * 8.42 tapes/day =~ 6146 tapes

	6146 tapes =~ 1.17 Petabytes

I'm assuming no shoe-shining, no bad media (it hasn't reached its
"lifetime" yet), no read errors, no data verification, etc. Even if
those factors reduced the effective data transfer rate by 50%, you'd
still be able to move data off of more than 3000 tapes before they reach
the low end of their "lifetime".

Sure, you could re-do the scenario with much older media (DAT-2, anyone?)
if you want to make it even more unlikely, but I think this shows that
it's implausible that a site really has more data stored on tape than
could be migrated before the media "expires".

[1] http://en.wikipedia.org/wiki/Linear_Tape-Open

=>
=> > Exactly what I was referring to - bandwidth needed for data integrity
=> > & migration.
=> > Oh, and read-never is not a myth at all - at least in
=> > the minds of many datacenters and the folks who run them. They are
=> > under either legal mandate and/or company policy to retain data, read
=> > regardless. They hope to never read it at all. Still, they must prove
=> > in a court of law that they have retained it.
=> >
=> > Which brings us back to data integrity and long-term preservation. If
=> > you think it's a problem today, just wait...this is the 8 bazillion
=> > pound gorilla that faces all institutions who plan on storing exabytes
=> > of data. FB is one of those.
=> >
=> > To your point about large tape farms (disclaimer: I used to work for
=> > StorageTek) I already know several HPC sites who are 'stuck' - i.e.
=> > they cannot (or will not pay for) the necessary infrastructure to
=> > correctly maintain and migrate exascale data collections. It would

Oh, "correctly maintain and migrate" and "pay for" are very different
considerations than the question of whether it is mathematically
possible to migrate data within the "lifetime of the current media".

=> > take them longer to migrate the collection to new tape than the useful

Based on the numbers above, I don't believe that.

=> > lifetime of the media. And they are too cheap to buy and maintain the

I certainly believe that it's too expensive to buy & maintain the
capacity to do the migration "correctly", but not that it is simply
impossible to move the data over the lifetime of the media.

=> > needed infrastructure to perform such a migration in parallel, to
=> > reduce the time needed.
=> >
=> > Just you wait. 5 years from now, the scheist will hit the (exabyte)
=> > fan. Storing data today is one thing, preserving it for decades is
=> > quite another. HIPAA, anyone?
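
For anyone who wants to replay the back-of-the-envelope arithmetic with
different assumptions, here is a minimal sketch in Python. The function
name and the `efficiency` and `drives` parameters are my own inventions
for illustration; the defaults are the LTO1 figures and the
single-drive, 1-minute-swap scenario from this message. It uses decimal
units (1 GB = 1000 MB) and does not round the read time up to 2:50, so
with the defaults it lands at roughly 6,270 tapes in two years rather
than 6146. Same ballpark either way.

```python
def tapes_migrated(days, capacity_gb=200, speed_mb_s=20, swap_min=1,
                   efficiency=1.0, drives=1):
    """Estimate how many full tapes can be read in `days` days.

    efficiency: fraction of the nominal transfer rate actually achieved
                (hypothetical knob; 0.5 models the "50% worse" case).
    drives:     number of drives reading in parallel (hypothetical knob;
                the scenario in this message assumes 1).
    """
    # Hours to stream one full tape at the effective transfer rate.
    read_hours = capacity_gb * 1000 / (speed_mb_s * efficiency) / 3600
    # Add the time to swap in the next tape.
    hours_per_tape = read_hours + swap_min / 60
    return drives * days * 24 / hours_per_tape


if __name__ == "__main__":
    two_years = 2 * 365
    for eff in (1.0, 0.5):
        tapes = tapes_migrated(two_years, efficiency=eff)
        petabytes = tapes * 200 / 1_000_000  # decimal PB
        print(f"efficiency {eff:.0%}: {tapes:,.0f} tapes, "
              f"{petabytes:.2f} PB")
```

Halving the effective transfer rate (`efficiency=0.5`) still leaves a
little over 3,100 tapes in two years on a single drive, which is in line
with the "more than 3000 tapes" figure above. Raising `drives` scales
the result linearly, which is the "migration in parallel" case from the
quoted message.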
=>
=> -----------------------
=> Andrew Hume

--
Mark Bergman