On 05/13/2013 06:13 AM, Edward Ned Harvey (lopser) wrote:
From: [email protected] [mailto:[email protected]]
On Behalf Of Skylar Thompson

>> Second, we depend on LTO's data validation while data are being written to
>> tape.
> I don't want to say "all devices," but I'll say all hard drives include data
> integrity, in the form of FEC, built into the hardware.  If you get data off the disk,
> it means it already passed the hardware checksum.  The same is true for TCP.  Yet things
> like ZFS layer additional integrity checking on top of that ... and in a lot of
> circumstances it's wise to validate files transferred over a network too.

> I certainly can't count the number of times I've discovered corrupt
> data by checking the md5 or scrubbing the filesystem.  So the hardware and TCP
> checksumming is extremely useful, but at least by my measure, not good enough.
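As a minimal sketch of the kind of end-to-end check described above (Python, with hypothetical file paths), note that the verification runs entirely above the disk FEC and TCP checksum layers:

    import hashlib

    def md5sum(path, chunk_size=1 << 20):
        """Hash a file in chunks so large files never have to fit in memory."""
        digest = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                digest.update(chunk)
        return digest.hexdigest()

    # Hypothetical paths: compare the source file against the copy that
    # arrived over the network.  The disk FEC and TCP checksums both passed,
    # yet this can still catch corruption they miss (bad RAM, flaky
    # controllers, software bugs).
    if md5sum("/data/run42.bin") != md5sum("/mnt/remote/run42.bin"):
        raise RuntimeError("transfer verification failed: checksums differ")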

One thing that works in our favor is that much of our data comes from sources that are already known to be lossy (gene sequencers, mass spectrometers, etc.). There's already QA going on to correct for that, and in many cases it can correct for storage lossiness as well - sequencing frequently runs at 30x coverage, so any given region of the genome is read 30 times. If one of those copies is bad (whether on the sequencer or in the storage), the QA can catch it.
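A toy sketch of that idea (Python, with made-up reads; real pipelines use quality scores and far more sophisticated statistics): with redundant coverage of the same region, a per-position majority vote simply outvotes a single corrupted copy:

    from collections import Counter

    def consensus(reads):
        """Per-position majority vote across aligned reads of equal length."""
        return "".join(
            Counter(bases).most_common(1)[0][0] for bases in zip(*reads)
        )

    # Three hypothetical copies of one region; the third has a corrupted base.
    reads = ["ACGTACGT", "ACGTACGT", "ACGAACGT"]
    print(consensus(reads))  # -> ACGTACGT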

The final results, of course, need to be better protected, but they're also much smaller than the raw data. The raw data are still worth backing up for a few months, though, since each run (lasting 4-8 days, with perhaps 20 of them going at once) can cost upwards of $20k.

Skylar
