Filed as http://subversion.apache.org/issue/4843 .
We could start by defining more precisely what it needs to do.
Some aims, highest priority first:
* check if any pristine file's content is corrupted (according to its
filename hash)
- report and rename/delete corrupted pristines
* check if any pristines are missing (according to wc.db)
* fetch missing (or corrupted) pristines from the repository
* verify wc.db 'pristine' table entries against other tables
Checking for content corruption by recalculating the checksums is going
to be slow -- there is no getting away from that -- so this most
important check is probably going to be the last one we run, and we may
choose to make it optional. That's fine.
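
A minimal sketch of that slow check, in Python for brevity. It assumes each
pristine file is named '<sha1-hex>.svn-base' somewhere under the pristine
directory; the function names are made up for illustration and are not
existing Subversion APIs.

```python
import hashlib
from pathlib import Path

SUFFIX = ".svn-base"  # assumed pristine file name suffix


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 of the file's content, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_corrupted_pristines(pristine_dir):
    """Return sorted paths whose content hash differs from the hash in the name."""
    corrupted = []
    for path in Path(pristine_dir).rglob("*" + SUFFIX):
        expected = path.name[: -len(SUFFIX)]
        if path.is_file() and sha1_of_file(path) != expected:
            corrupted.append(path)
    return sorted(corrupted)
```

Every byte of every pristine has to be read, which is why this would be the
last and possibly optional step.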
We could check quickly:
* for each pristine file listed in the DB:
- file is present
- file size matches the DB
- file mod-time matches the DB
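
The quick checks above could be sketched like this. 'PristineRecord' is a
hypothetical stand-in for whatever the DB records about a pristine; I am not
claiming wc.db stores exactly these fields, so the mod-time is optional here.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class PristineRecord:
    """Hypothetical view of what the DB knows about one pristine file."""
    path: str
    size: int
    mtime_ns: Optional[int] = None  # None means no mod-time was recorded


def quick_check(record):
    """Return a list of problems found without reading the file content."""
    try:
        st = os.stat(record.path)
    except FileNotFoundError:
        return ["missing"]
    problems = []
    if st.st_size != record.size:
        problems.append("size mismatch")
    if record.mtime_ns is not None and st.st_mtime_ns != record.mtime_ns:
        problems.append("mod-time mismatch")
    return problems
```

This costs one stat() call per pristine, so it can run on every pristine
before deciding which ones deserve a full checksum.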
The existing 'cleanup' implementation contains a function
'pristine_cleanup_wcroot' which has in its doc string:
[[[
TODO: Ideas for possible extra clean-up operations:
* Check and correct all the refcounts. Identify any rows missing
from the 'pristine' table. [...]
* Check the checksums. (Very expensive to check them all, so find
a way to not check them all.)
* Check for pristine files missing from disk but referenced in the
'pristine' table.
* Repair any pristine files missing from disk and/or rows missing
from the 'pristine' table and/or bad checksums. Generally
requires contacting the server, so requires support at a higher
level than this function.
* Identify any pristine text files on disk that are not referenced
in the DB, and delete them.
]]]
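
Two of those ideas (files missing from disk, and files on disk not referenced
in the DB) come down to a set comparison. A rough sketch, assuming the
checksums listed in the DB have already been read into a collection and that
pristine files are named '<checksum>.svn-base'; 'compare_store' is a
hypothetical helper name:

```python
from pathlib import Path

SUFFIX = ".svn-base"  # assumed pristine file name suffix


def compare_store(db_checksums, pristine_dir):
    """Return (missing_from_disk, unreferenced_on_disk) as sorted lists."""
    on_disk = {
        p.name[: -len(SUFFIX)]
        for p in Path(pristine_dir).rglob("*" + SUFFIX)
        if p.is_file()
    }
    listed = set(db_checksums)
    return sorted(listed - on_disk), sorted(on_disk - listed)
```

The first list would need fetching from the repository; the second is a
candidate for deletion.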
The refcounts count references within the DB from nodes to the 'pristine'
table. They are enforced by SQLite with 'REFERENCES' clauses in the
schema, though I saw one comment somewhere saying this was "in debug
builds", so we might want to double-check.
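
One thing worth keeping in mind when double-checking: SQLite only enforces
'REFERENCES' clauses on connections where 'PRAGMA foreign_keys' is on, and it
is off by default unless the library was compiled otherwise. The sketch below
shows the difference on a toy in-memory schema. The tables are deliberately
simplified stand-ins, not the real wc.db schema, and the helper names are
hypothetical.

```python
import sqlite3

# Simplified stand-in schema (an assumption, not the real wc.db tables).
SCHEMA = """
CREATE TABLE pristine (checksum TEXT PRIMARY KEY);
CREATE TABLE nodes (
    local_relpath TEXT PRIMARY KEY,
    checksum TEXT REFERENCES pristine (checksum)
);
"""


def open_demo_db(enforce):
    """Open an in-memory DB with foreign key enforcement on or off."""
    conn = sqlite3.connect(":memory:")
    # Must be set outside a transaction; it applies to this connection only.
    conn.execute("PRAGMA foreign_keys = " + ("ON" if enforce else "OFF"))
    conn.executescript(SCHEMA)
    return conn


def dangling_insert_accepted(conn):
    """Try to add a node that references a pristine row that does not exist."""
    try:
        conn.execute("INSERT INTO nodes VALUES ('a.txt', 'no-such-checksum')")
    except sqlite3.IntegrityError:
        return False
    return True


def find_violations(conn):
    """List existing rows that break a REFERENCES clause."""
    return conn.execute("PRAGMA foreign_key_check").fetchall()
```

With enforcement off the dangling row is accepted silently, and
'PRAGMA foreign_key_check' is then a cheap way to find such rows after the
fact, whatever the enforcement setting was when they were written.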
I am not aware of problems in the consistency of the DB tables, so I
don't think checking that is a priority. I don't have hard evidence,
but from problems reported over the years I think corrupted and
missing pristine files on disk are the main concern.
- Julian