On Mon, Mar 8, 2010 at 13:22, Julian Foad <[email protected]> wrote:
>...
> I see two Wrong Ways to do this:
>
> (1) Pass the path of the temporary file (through whatever contorted code
> and data flows exist) along to install_committed_file().

This one is somewhat reasonable.

> (2) Ensure that the temporary file has a derived name, so that the same
> name can be derived again within (). The name could be based on the
> working file path/name/version (as it is in WC-1), or could be based on
> the file's SHA-1 checksum (like it will be when properly in the pristine
> store) if that is available.

This is what we do now, and it totally sucks. We should never "guess"
at filenames or other such items. There should be clear dataflow.

> ... and one Right Way:
>
> (3) As soon as all the content is written to the temp file, move it
> fully into the pristine store, named by its checksum. Then later, in
> install_committed_file(), make that pristine text become "this node's
> base" by writing its checksum into the node's entry in the DB.

Yes. This was where I hoped for us to go, which is why I brought up
the bits about checksum in process_committed_leaf. There is still a
problem around the checksum dataflow, but I do believe that is the
best way to do this.

> Method (3) is right because the new pristine store can contain both the
> old and the new text base simultaneously. We can install a new pristine

Yes and no. More below.

> text that nothing yet refers to, which can be done regardless of whether
> and when the (update) operation completes, and then later make something
> refer to it, thus simplifying the loggy requirements.
>
> However, that does beg the question of how and when we will delete the
> old text from the store.

... or keep the in-flight one from being deleted. Imagine a second
process running during the commit, performing a GC, seeing an unref'd
pristine, and wiping it out.

> If we are going delete it by garbage collection, we must either ensure
> that the temporary pristine is known to be "referenced" before it is
> actually recorded as the base of any node, or otherwise ensure that no
> "garbage collection" can happen while the WC is in such an intermediate
> state.

I believe we can do this pretty simply, by recording a work queue item
(to do *whatever*). The wc_db code will not allow the use of a database
if there are outstanding work queue items.

> Alternatively, if we are going to delete the old one at the time when
> the new one "replaces" it, then we have to "unreference" the old one at
> that time, but that's OK as we will have the old SHA1 checksum available
> up until that time.

Sure. But I think we're always going to have some form of garbage
collection. At a minimum, "svn cleanup" will perform pristine GC. A
"referenced" pristine has a checksum sitting in one of the columns of
the schema (there are about five). If we delete the old (assuming it
is unref'd), then that tends to imply the new one is not (yet)
referenced and is subject to a second process' GC logic.

>...

One thing around this commit process: the work queue item is
"backwards". We have a work item which does a bunch of stuff and
*inside* calls svn_wc__db_global_commit() on each item to perform the
database work. This is due mostly to converting an old loggy item.

The Proper way is to have code call db_global_commit() itself (rather
than queueing a work item which calls that function). When calling
commit, one or more work items should be passed as argument in order
to complete the on-disk operations. This would include (at a minimum)
installing a new, translated copy of the pristine into the working
copy.

*Somewhere* in this process is also the installation of the pristine,
which involves both on-disk and in-db operations which need to be
coordinated. global_commit() takes the new checksum. Maybe it can
queue two work items:

1) perform the on-disk installation of the pristine
2) perform the translated-install into the working copy

Now... this does imply that the new pristine is not installed, but
residing at some temp location.
It also means that the new PRISTINE row will exist, a WORK_QUEUE row
will exist, and the pristine file will not be "in place" (but after
completing that work item, it will be).

I'm not sure how to best install a pristine *before* commit
finalization and ensure it won't be tossed. We have some checksums in
ACTUAL which are used to record sources for merge conflicts (with
corresponding instructions in ACTUAL_NODE.conflict). Those checksums
should be null if there is no conflict. I don't see how we can record
a "to-be-committed-checksum" because I don't know when we'd clear
that. What if the commit is interrupted, and the user changes the
text? When do we say "that commit checksum is now outdated"? Wiping
them all at the start of commit would definitely create dangling
pristines, and it wouldn't allow for simultaneous, disjoint commits
from within the same working copy.

Maybe we could harvest committables, store intended checksums into
*just* those commit targets, then run the commit logic. Something like
that.

Thoughts?

Cheers,
-g
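
P.S. To make the "outstanding work item blocks GC" idea concrete, here is
a rough sketch in Python with sqlite3. It is only an illustration under
assumptions: the three-table schema, the single base_checksum column, and
the helper names (begin_commit, finish_commit, gc_pristines) are all
hypothetical and are not the real wc_db schema or API. The point is just
that the new PRISTINE row and the WORK_QUEUE row land in one transaction,
so a second process never sees an unreferenced pristine without also
seeing pending work.

```python
import sqlite3

# Hypothetical, heavily simplified schema; NOT the real wc.db layout.
SCHEMA = """
CREATE TABLE pristine (checksum TEXT PRIMARY KEY);
CREATE TABLE node (path TEXT PRIMARY KEY, base_checksum TEXT);
CREATE TABLE work_queue (id INTEGER PRIMARY KEY, work TEXT NOT NULL);
"""


class WorkQueueNotEmpty(Exception):
    """Raised when GC is attempted while work items are outstanding."""


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def begin_commit(conn, path, new_checksum):
    # One transaction: record the new pristine AND a work item, so the
    # not-yet-referenced pristine is never visible without pending work.
    with conn:
        conn.execute("INSERT OR IGNORE INTO pristine VALUES (?)",
                     (new_checksum,))
        conn.execute("INSERT INTO work_queue (work) VALUES (?)",
                     (f"install-pristine {new_checksum} {path}",))


def finish_commit(conn, path, new_checksum):
    # One transaction: make the pristine the node's base, then clear the
    # queue. (This sketch assumes a single in-flight commit.)
    with conn:
        conn.execute("UPDATE node SET base_checksum = ? WHERE path = ?",
                     (new_checksum, path))
        conn.execute("DELETE FROM work_queue")


def gc_pristines(conn):
    # Refuse to collect while any work item is outstanding.
    pending = conn.execute("SELECT COUNT(*) FROM work_queue").fetchone()[0]
    if pending:
        raise WorkQueueNotEmpty(f"{pending} work item(s) outstanding")
    rows = conn.execute(
        "SELECT checksum FROM pristine WHERE checksum NOT IN "
        "(SELECT base_checksum FROM node WHERE base_checksum IS NOT NULL) "
        "ORDER BY checksum").fetchall()
    unreferenced = [r[0] for r in rows]
    conn.executemany("DELETE FROM pristine WHERE checksum = ?",
                     [(c,) for c in unreferenced])
    conn.commit()
    return unreferenced
```

In this sketch a GC that runs between begin_commit() and finish_commit()
fails instead of wiping the in-flight pristine, and the old text only
becomes collectable once the node points at the new checksum. It does not
address the interrupted-commit question above.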

