Commenting on myself...

Neels J Hofmeyr wrote:
> Philip Martin wrote:
>> Neels J Hofmeyr <ne...@elego.de> writes:
>>
>>> THE PRISTINE STORE
>>> ==================
>>>
>>> The pristine store is a local cache of complete content of files that are
>>> known to be in the repository. It is hashed by a checksum of that content
>>> (SHA1).
>> I'm not sure whether you are planning one table per pristine store or
>> one table per working copy, but I think it's one per pristine store.
>> Obviously it makes no difference until pristine stores can be
>> shared (and it might be one per directory in the short term depending
>> on when we stop being one database per directory).
>
> Thanks for that. This is the tip of an iceberg called 'a pristine store does
> not equal a working copy [root]'.
>
> The question is how to store the PRISTINE table (also see below) once it
> serves various working copies. Will we have a separate SQLite db store, and
> create a new file system entity called 'pristine store' that the user can
> place anywhere, like a working copy?
>
> We could also keep pristine store and working copy welded together, so that
> one working copy can use the pristine store of another working copy, and
> that a 'pristine store' that isn't used as a working copy is just a
> --depth=empty checkout of any folder URL of that repository. It practically
> has the same effect as completely separating pristine stores from working
> copies (there is another SQLite store somewhere else), but we can just
> re-use the WC API, no need to have a separate pristine *store* API (create
> new store, contact local store database, indicate a store location, checking
> presence given a location, etc.).

If we have a single wc.db per user, we can also easily have a single
pristine store per user. Until then, we'd probably better use a separate
pristine store per WC...? A system-wide pristine store also has to cope
with write permissions, so that may be a different thing entirely (like a
local service daemon instead of a publicly writable file system
location...)

>
>>> SOME IMPLEMENTATION INSIGHTS
>>> ============================
>>>
>>> There is a PRISTINE table in the SQLite database with columns
>>> (checksum, md5_checksum, size, refcount)
>>>
>>> The pristine contents are stored in the local filesystem in a pristine file,
>>> which may or may not be compressed (opaquely hidden behind the pristines
>>> API).
>>> The goal is to be able to have a pristine store per working copy, per user
>>> as well as system-wide, and to configure each working copy as to which
>>> pristine store(s) it should use for reading/writing.
>>>
>>> There is a canonical way of getting a given CHECKSUM's pristine file name
>>> for a given working copy without contacting the WC database (static
>>> function get_pristine_fname()).
>>>
>>> When interacting with the pristine store, we want to, as appropriate, check
>>> for (combos of):
>>>   db-presence    - presence in the PRISTINE table with noted file size > 0
>>>   file-presence  - pristine file presence
>>>   stat-match     - PRISTINE table's size and mtime match file system
>>>   checksum-match - validity of data in the file against the checksum
>>>
>>> file-presence is gotten for free from a successful stat-match (fstat),
>>> checksum-match (fopen) and unchecked read of the file (fopen).
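
To make the four checks above concrete, here is a minimal sketch in Python
rather than in the C of the actual code base. Everything in it is an
assumption for illustration: the function name pristine_checks(), the
result keys, and a hypothetical table pristine(checksum, size, mtime_ns),
which is not the real PRISTINE schema. It only shows how file-presence
falls out of the stat call and how checksum-match needs a full read.

```python
import hashlib
import os
import sqlite3


def pristine_checks(db, path, checksum):
    """Run all four checks for one pristine and report each result.

    Assumes a hypothetical table pristine(checksum, size, mtime_ns).
    """
    result = {"db_presence": False, "file_presence": False,
              "stat_match": False, "checksum_match": False}

    # db-presence: a row exists and its noted size is > 0.
    row = db.execute(
        "SELECT size, mtime_ns FROM pristine WHERE checksum = ?",
        (checksum,)).fetchone()
    result["db_presence"] = row is not None and row[0] > 0

    # file-presence: one stat() call, reused for stat-match below.
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return result
    result["file_presence"] = True

    # stat-match: recorded size and mtime agree with the file system.
    if row is not None:
        result["stat_match"] = (st.st_size, st.st_mtime_ns) == tuple(row)

    # checksum-match: read the whole file and compare its SHA1.
    sha1 = hashlib.sha1()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(65536), b""):
            sha1.update(block)
    result["checksum_match"] = sha1.hexdigest() == checksum
    return result
```

A file whose content was altered while size and mtime were kept intact
passes stat_match in this sketch and fails only checksum_match.
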
>>>
>>> How fast we consider things:
>>>   db-presence    - very fast to moderately fast (in case of "empty db
>>>                    cache")
>>>   file-presence  - slow (fstat or fopen)
>>>   stat-match     - slow (fstat plus SQLite query)
>>>   checksum-match - super slow (reading, checksumming)
>> I'm prepared to believe a database query can be faster than stat when
>> the inode cache is cold, but what about when the inode cache is hot?
>
> Also thanks for this!
>
> I don't know that much about database/file system benchmarks, let alone on
> different platforms. My initial classifications are mostly guessing, mixed
> with provocative prodding to wake up more experienced devs ;)
>
> I'm also not really aware how expensive it is to calculate a checksum while
> reading a stream for other purposes. How much cpu time does it add if the
> file I/O would happen anyway? Is it negligible?
>
> I guess we'll ultimately have to just try out what performs best.
>
>> If the database query requires even one system call then it could well
>> be slower. Multiple processes accessing a working copy, or writing to
>> the pristine store, might bias this further towards stat being faster.
>> If we decide to share the pristine store between several working
>> copies then a shared database could become a bottleneck.
>>
>> [...]
>>
>>> Use case "need": "I want to use this pristine's content, definitely."
>>> ---------------
>>> pseudocode:
>>>   pristine_check(&present, checksum, _usable)          (3)
>>>   if !present:
>>>     get_pristine_from_repos(checksum, ra)               (9)
>>>   pristine_read(&stream, checksum)                     (6)
>>>
>>> (3) check for _usable:
>>>     - db-presence
>>>       - if the checksum is not present in the table, return that it is not
>>>         present (don't check for file existence as well).
>>>     - stat-match (includes file-presence)
>>>       - if the checksum is present in the table but file is bad/not there,
>>>         bail, asking user to 'svn cleanup --pristines' (or sth.)
>>>
>>> (9) See use case "fetch".
>>>     After this, either the pristine file is ready for
>>>     reading, or "fetch" has bailed already.
>>>
>>> (6) fopen()
>>
>> I think this is the most important case from a performance point of
>> view. This is what 'svn status' et al. use, and it's important for
>> GUIs as a lot of the "feel" depends on how fast a process can query
>> the metadata.
>
> Agreed.
>
>> If we were to do away with the PRISTINE table, then we would not have
>> to worry about it becoming a bottleneck. We don't need the existence
>> check if we are just about to open the file, since opening the file
>> proves that it exists.
>
> <rant>Yes, I meant that, semantically, there has to be an existence check.
> You're right that it is gotten for free from opening the file. It's still
> important to note where the antenna sits that detects non-existence.</rant>
>
>> We obviously have the checksum already, from
>> the BASE/WORKING table, so we only need the PRISTINE table for the
>> size/mtime. Perhaps we could store those in the BASE/WORKING table
>> and eliminate the PRISTINE table, or is this too much of a layering
>> violation? The pristine store is then just a sharded directory, into
>> which we move files and from which we read files.
>
> -1
>
> While we could store size&mtime in the BASE/WORKING tables, this causes size
> and mtime to be stored multiple times (wherever a pristine is referenced)
> and involves editing multiple entries when a pristine is removed/added due
> to high-water-mark or repair. That would be nothing less than horrible.
> Taking one step away from that, each working copy should have a dedicated
> table that stores size and mtime only once. Then we still face the situation
> that size and mtime are stored multiple times (once per working copy), and
> where, if a central pristine store is restructured, every working copy has
> to be updated. Bad idea.
>
> Instead, we could not store size and mtime at all!
> :)

A big BUT is that we also need to store and send the MD5 checksum for
backwards compatibility with older servers/clients. So we'll definitely
need a database until 2.0, because of the MD5 compat alone.

We also currently have a 'compressed' flag stored, which allows optionally
compressing pristines. I think it's debatable if that is really useful. The
pristine store should be *fast* and, ideally, random-access-able. Opening a
decompression stream works against that; it's optimising for disk space,
and that's inherently not what the pristine store is for. I'd lose it.

~Neels

>
> They are merely half-checks for validity. During normal operation, size and
> mtime should never change, because we don't open write streams to pristines.
> If anyone messes with the pristine store accidentally, we would pick it up
> with the size, or if that stayed the same, with the mtime. But we can pick
> up all cases of bitswaps/disk failure *only* by verifying *full checksum
> validity*!
>
> So, while checking size and mtime gives a sense of basic sanity, it is
> really just a puny excuse for not checking full checksum validity. If we
> really care about correctness of pristines, *every* read of a pristine
> should verify the checksum along the way. (That would include always reading
> the complete pristine, even if just a few lines along the middle are needed.)
>
> * neels dreams of disks that hardware-checksum on-the-fly
>
> If I further follow my dream of us emulating such hardware, we would store
> checksums for sub-chunks of each pristine, so that we can read small
> sections of pristines, being sure that the given section is correct without
> having to read the whole pristine.
>
> Whoa, look where you got me now! ;)
>
> I think it's a very valid question.
> Chuck the mtime and size, thus get rid
> of the PRISTINE table, thus do away with checking for any inconsistency
> between table and file system, also do away with possible database
> bottlenecks, and reduce the location of the pristine store to a mere local
> abspath. We have the checksum, we have the filename. Checking mtime and
> length protects against accidental editing of the pristine files. But any
> malicious or hw-failure corruption can in fact be *protected* by keeping
> mtime and length intact! ("hey, we checked it, it must be correct.")
>
> Let's play through a corrupted pristine (with unchanged mtime/length). This
> is just theoretical...
>
> Commit modification:
>
> - User makes a checkout / revert / update that uses a locally
>   corrupted pristine. The corrupted pristine thus sits in the WC.
>
> - User makes a text mod
>
> - User commits
>
> - Client/network layer communicate the *delta* between the local pristine
>   and the local mod to the repository, and the checksum of the modified
>   text.
>
> - Repos applies the delta to the intact pristine it has in *its* store.
>
> - Repos finds the resulting checksum to be *different* from the client's
>   checksum, because the underlying pristine was corrupt.
>
> --> Yay! No need to do *ANY* local verification at all!!
>
> Of course, in case the client/network layer decide to send the full text
> instead of a delta, the corruption is no longer detected. :(
>
>
> Merge and commit:
>
> - User makes a merge that uses a locally corrupted pristine.
>
> - The merge *delta* applied to the working copy is incorrect.
>
> - User does not note the corruption (e.g. via --accept=mine-full)
>
> - User commits
>
> - Repos accepts the changes based on the corrupted pristine that was
>   used to get the merge delta, because it can't tell the difference
>   from a normal modification.
>
> --> My goodness, merge needs to check pristine validity on each read,
>     as if it wasn't slow enough.
>     But as discussed above, even if merge
>     checked mtime and length, it would not necessarily detect disk failure
>     and crafted malicious corruption.
>
>
> Thanks, Philip.
>
> I'm now challenging the need to store mtime and length, and arguing for
> more checksumming instead. The checksumming overhead could be smaller than
> the database bottleneck slew.
>
> For future optimisation, I'm also suggesting pristines should have
> additionally stored checksums for small chunks of each pristine, while still
> being indexed by the full checksum.
> (Which may imply a db again :/ , but that db would only be hit if we're
> trying to save time by reading just a small bit of the pristine)
>
> Everyone, please prove me wrong!
>
> Thanks,
> ~Neels
>
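
For the chunk-checksum idea, a rough sketch of what I have in mind, again
in Python only for brevity. The names index_pristine() and read_verified()
and the 4096-byte CHUNK size are made up for illustration; nothing like
this exists in the code base. The pristine stays indexed by its full SHA1,
and a partial read verifies only the chunks that cover the requested range.

```python
import hashlib

CHUNK = 4096  # hypothetical chunk size, picked only for illustration


def index_pristine(data, chunk=CHUNK):
    """Return (full SHA1, list of per-chunk SHA1s) for a pristine's content."""
    full = hashlib.sha1(data).hexdigest()
    chunk_sums = [hashlib.sha1(data[i:i + chunk]).hexdigest()
                  for i in range(0, len(data), chunk)]
    return full, chunk_sums


def read_verified(f, chunk_sums, offset, length, chunk=CHUNK):
    """Read length bytes at offset, verifying only the covering chunks."""
    if length <= 0:
        return b""
    first = offset // chunk
    last = (offset + length - 1) // chunk
    if last >= len(chunk_sums):
        raise ValueError("requested range is past the end of the pristine")
    f.seek(first * chunk)
    buf = bytearray()
    for i in range(first, last + 1):
        block = f.read(chunk)
        # A mismatch means this chunk is corrupt; refuse to hand it out.
        if hashlib.sha1(block).hexdigest() != chunk_sums[i]:
            raise ValueError("chunk %d fails its checksum" % i)
        buf += block
    start = offset - first * chunk
    return bytes(buf[start:start + length])
```

The cost is one stored checksum per chunk, which is where the db might
come back in; the gain is that a small read never has to touch the rest
of the file to be trusted.
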