Filed as http://subversion.apache.org/issue/4843 .

We could start by defining more precisely what it needs to do.

Some aims, highest priority first:
  * check whether any pristine file is corrupted (its content no longer hashes to the checksum that forms its filename)
    - report and rename/delete corrupted pristines
  * check if any pristines are missing (according to wc.db)
  * fetch missing (or corrupted) pristines from the repository
  * verify wc.db 'pristine' table entries against other tables
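
A minimal sketch of the first aim, in Python purely for illustration. It assumes the 1.7+ pristine layout, where each pristine is stored uncompressed at .svn/pristine/<xx>/<sha1>.svn-base and the filename is the SHA-1 of the content. The '.corrupt' quarantine suffix and the function names are hypothetical, not existing svn behaviour.

```python
import hashlib
from pathlib import Path

# Assumption: pristines live at <pristine dir>/<xx>/<sha1 hex>.svn-base
SUFFIX = ".svn-base"


def sha1_of_file(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hex SHA-1 of a file's content, read in chunks to bound memory use."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def check_pristine(path: Path, quarantine: bool = False) -> bool:
    """Return True if the content hashes to the checksum in the filename.

    On a mismatch, report it and, if quarantine is set, rename the file
    by appending '.corrupt' (a hypothetical convention) so it no longer
    looks like a valid pristine.
    """
    expected = path.name.removesuffix(SUFFIX)
    actual = sha1_of_file(path)
    if actual == expected:
        return True
    print(f"corrupt pristine: {path} (content hashes to {actual})")
    if quarantine:
        path.rename(path.with_name(path.name + ".corrupt"))
    return False


def scan(pristine_dir: Path, quarantine: bool = False) -> list[Path]:
    """Hash every pristine file (the slow part) and return the corrupt ones."""
    # sorted() materializes the glob before any file is renamed.
    return [p for p in sorted(pristine_dir.glob("*/*" + SUFFIX))
            if not check_pristine(p, quarantine)]
```

Renaming rather than deleting keeps the bad copy around for diagnosis, and the renamed file no longer matches the '*.svn-base' pattern, so a second scan skips it.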

Checking for content corruption by recalculating the checksums is going to be slow -- there is no getting away from that -- so this most important check is probably going to be the last one we run, and we may choose to make it optional. That's fine.

We could quickly check:
  * for each pristine file listed in the DB:
    - file is present
    - file size matches the DB
    - file mod-time matches the DB
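
The quick checks above could look roughly like this. The sketch assumes a hypothetical, simplified 'pristine' table with just 'checksum' (bare hex SHA-1) and 'size' columns; the real wc.db schema differs. Files are assumed to live at <pristine dir>/<xx>/<sha1>.svn-base. No content is read, so it stays cheap, leaving the slow checksum pass to run last and optionally. The mod-time comparison is left out because the simplified table has no such column; it would follow the same pattern using st_mtime.

```python
import sqlite3
from pathlib import Path


def quick_check(db: sqlite3.Connection, pristine_dir: Path) -> list[str]:
    """Cheap per-row checks: file present and size as recorded. No hashing.

    Assumes a hypothetical simplified table pristine(checksum, size), where
    checksum is a bare hex SHA-1 and the file is at
    <pristine_dir>/<first two hex digits>/<checksum>.svn-base.
    """
    problems = []
    for checksum, size in db.execute("SELECT checksum, size FROM pristine"):
        path = pristine_dir / checksum[:2] / (checksum + ".svn-base")
        if not path.is_file():
            problems.append(f"missing: {checksum}")
            continue
        disk_size = path.stat().st_size
        if disk_size != size:
            problems.append(f"size mismatch: {checksum} (db {size}, disk {disk_size})")
        # A mod-time check would compare path.stat().st_mtime here,
        # if the table recorded one.
    return problems
```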

The existing 'cleanup' implementation contains a function 'pristine_cleanup_wcroot' whose doc string says:

[[[
  TODO: Ideas for possible extra clean-up operations:

  * Check and correct all the refcounts.  Identify any rows missing
    from the 'pristine' table.  [...]

  * Check the checksums.  (Very expensive to check them all, so find
    a way to not check them all.)

  * Check for pristine files missing from disk but referenced in the
    'pristine' table.

  * Repair any pristine files missing from disk and/or rows missing
    from the 'pristine' table and/or bad checksums.  Generally
    requires contacting the server, so requires support at a higher
    level than this function.

  * Identify any pristine text files on disk that are not referenced
    in the DB, and delete them.
]]]
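
The "missing from disk" and "not referenced in the DB" items in that TODO list both reduce to a set difference between the checksums named on disk and those listed in the table. A sketch under similar assumptions: a hypothetical, simplified 'pristine' table keyed by bare hex SHA-1, and files at <pristine dir>/<xx>/<sha1>.svn-base. Function names are illustrative only.

```python
import sqlite3
from pathlib import Path

SUFFIX = ".svn-base"  # assumed pristine filename suffix


def checksums_on_disk(pristine_dir: Path) -> set[str]:
    """Checksums named by files at <pristine_dir>/<xx>/<sha1>.svn-base."""
    return {p.name.removesuffix(SUFFIX) for p in pristine_dir.glob("*/*" + SUFFIX)}


def checksums_in_db(db: sqlite3.Connection) -> set[str]:
    """Checksums listed in the (simplified, hypothetical) pristine table."""
    return {row[0] for row in db.execute("SELECT checksum FROM pristine")}


def compare(db: sqlite3.Connection, pristine_dir: Path) -> tuple[set[str], set[str]]:
    on_disk = checksums_on_disk(pristine_dir)
    in_db = checksums_in_db(db)
    missing = in_db - on_disk    # in the DB but not on disk: must be re-fetched
    orphaned = on_disk - in_db   # on disk but not in the DB: candidates for deletion
    return missing, orphaned
```

Only the directory listing is read here, not file contents, so this belongs with the cheap checks.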

The refcounts count references within the DB from nodes to the 'pristine' table. They are backed by 'REFERENCES' clauses in the schema, but SQLite only enforces such clauses on connections that have switched on foreign-key support (PRAGMA foreign_keys = ON), and I saw one comment somewhere saying the enforcement was only "in debug builds", so we might want to double-check.
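
Both points can be checked directly in SQL. The sketch below uses a hypothetical, much-simplified schema (a 'pristine' table with 'checksum' and 'refcount', and a 'nodes' table whose 'checksum' column REFERENCES it); the real wc.db schema has more columns and possibly more referencing tables. PRAGMA foreign_key_check reports dangling references even when enforcement is off, and the refcount query recounts references and compares them with the stored value.

```python
import sqlite3


def check_references(db: sqlite3.Connection):
    """Return (fk_violations, refcount_mismatches) for a simplified schema.

    fk_violations: rows (table, rowid, parent, fkid) from
    PRAGMA foreign_key_check, i.e. nodes rows whose checksum has no
    matching pristine row.
    refcount_mismatches: (checksum, stored, actual) where the stored
    refcount differs from the number of nodes rows referring to it.
    """
    fk_violations = db.execute("PRAGMA foreign_key_check").fetchall()
    refcount_mismatches = db.execute(
        """
        SELECT p.checksum, p.refcount, COUNT(n.checksum)
        FROM pristine AS p LEFT JOIN nodes AS n ON n.checksum = p.checksum
        GROUP BY p.checksum
        HAVING p.refcount != COUNT(n.checksum)
        """
    ).fetchall()
    return fk_violations, refcount_mismatches
```

Running "PRAGMA foreign_keys;" on a connection shows whether enforcement is currently on for it.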

I am not aware of any problems with the consistency of the DB tables, so I don't think checking that is a priority. Though I don't have hard evidence, from problems reported over the years I think corrupted and missing pristine files on disk are the main concern.

- Julian
