Daniel Shahaf wrote:
Julian Foad (Jira) wrote on Mon, 01 Jun 2020 11:20 +0000:
Two clues strongly suggest the corruption was originally caused by a bug rather
than hardware corruption:
* the checksum on the node-revision must account for the corrupted data,
otherwise a checksum error is thrown instead;
_Which_ checksum? Do you mean the checksum of the directory
representation wherein the dangling (off-by-four) pointer was found?
Yes: in my test an error is thrown for the checksum of that
representation if that checksum isn't corrected or nulled.
* although an off-by-4 can sometimes be a 1-bit error, this particular one
(41271 vs. 41275) is not a 1-bit error.
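
To illustrate the bit arithmetic, here is a minimal sketch that counts the
bits in which two offsets differ. It assumes the offsets are compared as
binary integers; a value stored as decimal text would need a byte-wise
comparison of its characters instead, and could give a different count.
The helper name bit_flips is made up for this example.

```python
def bit_flips(a: int, b: int) -> int:
    """Return the number of bit positions in which a and b differ."""
    # XOR leaves a 1 in every position where the two values disagree.
    return bin(a ^ b).count("1")


# The pair from this report: bits 2 and 3 both differ.
print(bit_flips(41271, 41275))  # 2

# An off-by-4 pair that differs only in bit 2.
print(bit_flips(41267, 41271))  # 1
```

Under that integer assumption, a result of 1 is what a single flipped bit
would produce, and anything larger needs more than one flip.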
Is there any chance that this _was_ originally a 1-bit error and then
some offset got added/subtracted to both the id in the noderev header
and the id in the directory rep?
Technically I expect that's possible, but it seems less likely. It would
have to have happened at original commit time, during transaction
construction and commit, whereas I suspect (without research) that a bit
error is much more likely to occur in data that has been at rest on disk
for years.
I do not know which revision contains the bad reference, nor which version
of svn committed it.
Was CONFIG_OPTION_VERIFY_BEFORE_COMMIT enabled at the time the revision
containing the dangling pointer was committed? (You may be able to
answer this even without running grep -aR to find the dangling pointer.)
I very much expect not, because I am not aware of it being used in any
production environments that I have been connected with.
It's curious that the wrong offset points directly to the value of the
first field. However, having glanced at svn_fs_fs__write_noderev(),
I guess that's just a coincidence.
Assuming the node-rev headers _are_ synthesized by svn_fs_fs__write_noderev()
in this user's environment, that is.
I'm confident they were synthesized by standard FSFS code.
- Julian