On 2012-02-20 17:05, Richard Elling wrote:
On Feb 20, 2012, at 6:38 AM, Robin Axelsson wrote:
Maybe the iostat "behavior" depends on the controller it monitors. Some
controllers, such as the AMD SB950 in my case, may not be as transparent about
errors as the LSI 1068e operating in IT mode.
Still, I find this to be too much of a coincidence. It is evident that ZFS is
not well suited for use without disk redundancy.
Eh? Other file systems will blissfully deliver corrupted data. Silent data
corruption is a much worse fate!
I'll try to add a mirror to the system pools as soon as possible. It would be
great if there were some kind of software that could be set up to generate
.par2 files (with x% data redundancy) on the fly, to protect files on drives
without disk redundancy (RAID 0).
Not needed. ZFS has a copies parameter that lets you set the number of
redundant copies on a per-dataset basis. For example, you can set copies=2
for important data, and copies=1 (the default) for data already stored on
other media (e.g. .iso files).
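Something like this (a sketch; the pool/dataset names 'tank/documents' and
'tank/isos' are made up):

# zfs set copies=2 tank/documents
# zfs set copies=1 tank/isos

Bear in mind that copies only applies to blocks written after the property is
set; existing data is not rewritten.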
OTOH, par2 is a completely different architecture, designed for transferring
files reliably; it is not well suited for direct access to data.
I couldn't recover the image file with cp, but I learned in the process that
it is possible with dd: 'dd if=infile of=outfile conv=noerror,sync' could do it.
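Spelled out (the paths are placeholders; a small block size keeps the zero
padding from conv=sync to a minimum):

# dd if=/path/to/corrupted/file of=/path/to/recovered/file bs=512 conv=noerror,sync

noerror makes dd carry on past read errors, and sync pads each failing block
with zeros so the offsets in the output stay aligned.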
Correct, cp will exit on a failed read.
That is all fine, but I kind of expected cp to have some sort of
force/recover/salvage option for recovering corrupted files.
Then I discovered ddrescue, which did *exactly* what I expected cp to do. I just
entered:
# ddrescue /path/to/corrupted/file /path/to/recovered/file /path/to/logfile.log
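If I read the GNU ddrescue manual correctly, it can also make extra retry
passes over the bad areas, e.g.:

# ddrescue -r3 /path/to/corrupted/file /path/to/recovered/file /path/to/logfile.log

where -r3 retries bad sectors up to three times and the logfile lets an
interrupted run resume where it left off.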
Good idea.
All the paths were even in the same vdev. In the process, the vdev became
'DEGRADED' even though no additional corruption had occurred. So I ran a scrub
and then 'zpool clear'ed the error. I did an fmadm repair to tell FMA about
it. Perhaps I should fmadm reset zfs-diagnosis and zfs-retire as well.
Once you've recovered the data, why are you so interested in eliminating the
history of the corruption?
I'm not, I just want things to return to normal.
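For the record, the whole back-to-normal sequence looked roughly like this
(the pool name 'tank' is made up; the UUID comes from 'fmadm faulty'):

# zpool scrub tank
# zpool clear tank
# fmadm repair <uuid>
# fmadm reset zfs-diagnosis
# fmadm reset zfs-retire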
Neither par2 nor ddrescue is included with OpenIndiana; I downloaded and
installed them manually from the opencsw.org repository. I would strongly
recommend including such tools with OI.
par2 seems to have little traction. ddrescue can be useful, but is only
applicable in rare cases.
-- richard
The copies=n (n > 1) parameter and the so-called ditto blocks seem like an
interesting idea. I think I may try that until I get a mirror drive.
I think par2 is kind of useful. par2 can generate recovery data with any
user-defined percentage of redundancy between 0 and 100%. If one assumes that
the likelihood of corruption is 0.1% of the data written (which is really
bad), then even 1% redundancy will protect against such corruption (provided
the par2 data is updated on every write). This holds even if the corruption
occurs in the par2 data itself.
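With par2cmdline that would look something like this (the file names are
placeholders):

# par2 create -r1 file.par2 /path/to/file
# par2 verify file.par2
# par2 repair file.par2

-r1 asks for 1% redundancy; verify checks the file against the stored
checksums, and repair rebuilds damaged blocks from the recovery data.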
Of course, if an entire drive goes down it won't be sufficient (nor would
ditto blocks be), but it could offer a slimmer trade-off between redundancy
and storage space than ditto blocks do. I guess the price to be paid is I/O
performance and CPU time.
If I understand it correctly, par2 is built on similar principles to
raidz/2/3; it too uses Reed-Solomon coding, for its recovery blocks.
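Roughly, the shared idea is the Reed-Solomon erasure property: with k data
blocks and m recovery blocks, any m lost blocks can be reconstructed:

    recoverable losses <= m,    redundancy = m/k

so 1% redundancy means one recovery block per hundred data blocks (raidz1
being the case of one parity block per stripe).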
The problem with par2 at the file level is that if an error has occurred in a
pool, ZFS won't hand the damaged data over: reads of the affected blocks
simply fail, even though the error might be fixable with par2.
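To be fair, 'zpool status -v' at least lists the files with permanent errors,
so a par2 wrapper would know where to look ('tank' again being a made-up pool
name):

# zpool status -v tank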
--
DTrace Conference, April 3, 2012,
http://wiki.smartos.org/display/DOC/dtrace.conf
ZFS Performance and Training
richard.ell...@richardelling.com
+1-760-896-4422
_______________________________________________
OpenIndiana-discuss mailing list
OpenIndiana-discuss@openindiana.org
http://openindiana.org/mailman/listinfo/openindiana-discuss