On 31.07.09 22:04, Kurt Olsen wrote:
On Jul 24, 2009, at 22:17, Bob Friesenhahn wrote:
....
Most of the issues that I've read on this list would have been "solved" if there was a mechanism where the user/sysadmin could tell ZFS to simply go back until it found a TXG that worked. The trade-off is that any transactions (and their data) after the working one would be lost. But at least you're not left with an un-importable pool.
I'm curious as to why people think rolling back txgs doesn't come with additional costs beyond losing recent transactions. What are the odds that the data blocks that were replaced by the discarded transactions haven't been overwritten?
The odds depend on many factors: activity in the pool, free space, block-selection policy, metaslab cursor positions, etc. I have seen examples of successful recovery to a point in time around 9 hours before the last synced txg. Sometimes it is enough to roll one txg back; sometimes it requires going back and trying a few older ones.
Without a snapshot to hold the references, aren't those blocks considered free and available for reuse?
As soon as a transaction group is synced, the blocks freed during that transaction group are released back to the pool and can potentially be overwritten during the next txg.
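The block-reuse hazard above can be made concrete with a toy copy-on-write model. Everything below (`ToyPool`, `sync_txg`) is a made-up sketch, not ZFS source code: it only illustrates why an older uberblock can end up pointing at a block that a later txg has already reallocated.

```python
# Toy copy-on-write allocator (illustrative sketch only, NOT real ZFS code;
# all names here are invented). Each txg writes new copies of the blocks it
# modifies and frees the old copies; once the txg syncs, those freed blocks
# go back on the free list and the very next txg may reallocate them.

class ToyPool:
    def __init__(self, nblocks):
        self.free = list(range(nblocks))  # allocatable block numbers
        self.data = {}                    # block number -> contents
        self.uberblocks = {}              # txg -> set of live block numbers

    def sync_txg(self, txg, live_before, rewrites):
        """Rewrite each block in 'rewrites' (None = brand-new block)."""
        live = set(live_before)
        for old in rewrites:
            new = self.free.pop(0)        # allocate a fresh block
            self.data[new] = f"txg{txg}"
            live.add(new)
            if old is not None:
                live.discard(old)
                self.free.append(old)     # old copy freed -> reusable
        self.uberblocks[txg] = live
        return live

pool = ToyPool(4)
live1 = pool.sync_txg(1, [], [None, None])  # txg 1 allocates blocks 0 and 1
live2 = pool.sync_txg(2, live1, [0])        # txg 2 rewrites block 0
live3 = pool.sync_txg(3, live2, [1, 2])     # txg 3 rewrites blocks 1 and 2

# txg 1's uberblock still references block 0, but txg 3 has already
# reallocated it, so rolling back to txg 1 would read txg 3's data there.
print(pool.uberblocks[1])   # {0, 1}
print(pool.data[0])         # 'txg3' -- stale pointer after rollback
```

The further back the rollback target, the more intervening txgs have had a chance to recycle its freed blocks, which is why the odds vary with pool activity and free space as described above.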
Don't get me wrong, I do think that rolling back to previous uberblocks should be an option versus total pool loss, but it doesn't seem like one can reliably say that their data is in some known good state.
In fact, thanks to everything being checksummed, one can say that the pool is in good shape as reliably as the checksum in use allows.
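A sketch of why the rolled-back state is verifiable: in ZFS, each block pointer carries the checksum of the block it references, so a full traversal from the chosen uberblock either validates everything or pinpoints where old data has been overwritten. The code below is a hypothetical Merkle-style model (SHA-256 from `hashlib` stands in for the pool's checksum algorithm), not the actual ZFS on-disk format.

```python
# Toy sketch (NOT real ZFS code) of checksum-verified traversal: a parent
# pointer stores the expected checksum of its child, so verification walks
# the whole tree from the root and flags any block whose contents changed.
import hashlib

def cksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def verify(storage, ptr):
    """ptr = (block_id, expected_checksum, child_ptrs)."""
    block_id, expected, children = ptr
    if cksum(storage[block_id]) != expected:
        return False                    # block overwritten since this txg
    return all(verify(storage, c) for c in children)

storage = {1: b"root", 2: b"leaf-a", 3: b"leaf-b"}
tree = (1, cksum(b"root"), [(2, cksum(b"leaf-a"), []),
                            (3, cksum(b"leaf-b"), [])])
print(verify(storage, tree))            # True: intact tree verifies

storage[3] = b"reused by a later txg"   # a freed block got overwritten
print(verify(storage, tree))            # False: traversal catches it
```

So after a rollback, one can traverse from the selected uberblock and trust the result exactly as far as the checksum's collision resistance allows, which is the claim above.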
victor
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss