On 2012-02-20 17:05, Richard Elling wrote:
On Feb 20, 2012, at 6:38 AM, Robin Axelsson wrote:
Maybe the iostat "behavior" depends on the controller it monitors. Some
controllers, such as the AMD SB950 in my case, may not be as transparent about
errors as the LSI 1068e operating in IT mode.
Still, I find this to be too much of a coincidence. It is evident that ZFS is
not well suited for use without disk redundancy.
Eh? Other file systems will blissfully deliver corrupted data. Silent data
corruption is a much worse fate!
I'll try to add a mirror to the system pools as soon as possible. It would be
great if there were some kind of software that could be set up to generate
.par2 files (with x% data redundancy) on the fly, to protect files on drives
without disk redundancy (RAID 0).
Not needed. ZFS has a copies parameter that lets you set the number of
redundant copies on a per-dataset basis. For example, you can set copies=2
for important data, and copies=1 (the default) for data already stored on
other media (e.g. .iso files).
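Something like this (a sketch; the pool/dataset names 'tank/documents' and
'tank/isos' are made up):

# zfs set copies=2 tank/documents
# zfs set copies=1 tank/isos

Bear in mind that copies only applies to blocks written after the property is
set; existing data is not rewritten.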
OTOH, par2 is a completely different architecture, designed for transferring
files reliably; it is not well suited for direct access to data.
I couldn't recover the image file with cp, but I learned in the process that
it is possible with dd: 'dd if=infile of=outfile conv=noerror,sync' could do it.
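Spelled out (the paths are placeholders; a small block size keeps the zero
padding from conv=sync to a minimum):

# dd if=/path/to/corrupted/file of=/path/to/recovered/file bs=512 conv=noerror,sync

noerror makes dd carry on past read errors, and sync pads each failing block
with zeros so the offsets in the output stay aligned.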
Correct, cp will exit on a failed read.
That is all fine, but I kind of expected cp to have some sort of
force/recover/salvage option for recovering corrupted files.
Then I discovered ddrescue, which did *exactly* what I expected cp to do. I just
entered:
# ddrescue /path/to/corrupted/file /path/to/recovered/file /path/to/logfile.log
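If I read the GNU ddrescue manual correctly, it can also make extra retry
passes over the bad areas, e.g.:

# ddrescue -r3 /path/to/corrupted/file /path/to/recovered/file /path/to/logfile.log

where -r3 retries bad sectors up to three times and the logfile lets an
interrupted run resume where it left off.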
Good idea.
All the paths were even in the same vdev. In the process, the vdev became
'DEGRADED' even though no additional corruption had occurred. So I ran a scrub
and then 'zpool clear'ed the error. I did an fmadm repair to tell FMA about
it. Perhaps I should fmadm reset zfs-diagnosis and zfs-retire as well.
Once you've recovered the data, why are you so interested in eliminating the
history of the corruption?
I'm not, I just want things to return to normal.
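For the record, the whole back-to-normal sequence looked roughly like this
(the pool name 'tank' is made up; the UUID comes from 'fmadm faulty'):

# zpool scrub tank
# zpool clear tank
# fmadm repair <uuid>
# fmadm reset zfs-diagnosis
# fmadm reset zfs-retire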
Neither par2 nor ddrescue is included with OpenIndiana; I downloaded and
installed them manually from the opencsw.org repository. I would strongly
recommend including such tools with OI.
par2 seems to have little traction. ddrescue can be useful, but is only
applicable in rare cases.
-- richard
The copies=n (n > 1) parameter and the so-called ditto blocks seem like an
interesting idea. I think I may try that until I get a mirror drive.
I think par2 is kind of useful. par2 can generate recovery data with any
user-defined percentage of redundancy between 0 and 100%. If one assumes that
the likelihood of corruption is 0.1% of the data written (which is really
bad), then even 1% redundancy will protect against such corruption (provided
the par2 data is updated on every write). This holds even if the corruption
occurs in the par2 data itself.
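With par2cmdline that would look something like this (the file names are
placeholders):

# par2 create -r1 file.par2 /path/to/file
# par2 verify file.par2
# par2 repair file.par2

-r1 asks for 1% redundancy; verify checks the file against the stored
checksums, and repair rebuilds damaged blocks from the recovery data.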
Of course, if an entire drive goes down it won't be sufficient (nor would
ditto blocks be), but it could offer a slimmer trade-off between redundancy
and storage space than ditto blocks do. I guess the price to be paid is I/O
performance and CPU time.
If I understand it correctly, par2 is built on similar principles to
raidz/2/3; it too uses Reed-Solomon coding, for its recovery blocks.
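Roughly, the shared idea is the Reed-Solomon erasure property: with k data
blocks and m recovery blocks, any m lost blocks can be reconstructed:

    recoverable losses <= m,    redundancy = m/k

so 1% redundancy means one recovery block per hundred data blocks (raidz1
being the case of one parity block per stripe).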
The problem with par2 at the file level is that if an error has occurred in a
pool, ZFS won't hand the damaged data over: reads of the affected blocks
simply fail, even though the error might be fixable with par2.
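To be fair, 'zpool status -v' at least lists the files with permanent errors,
so a par2 wrapper would know where to look ('tank' again being a made-up pool
name):

# zpool status -v tank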
--
DTrace Conference, April 3, 2012,
http://wiki.smartos.org/display/DOC/dtrace.conf
ZFS Performance and Training
richard.ell...@richardelling.com
+1-760-896-4422
_______________________________________________
OpenIndiana-discuss mailing list
OpenIndiana-discuss@openindiana.org
http://openindiana.org/mailman/listinfo/openindiana-discuss