[Arising from some discussion on IRC today.] I've been considering the problem of a dump/load upgrade for a repository with a large number of revisions. To minimise downtime the initial dump/load would be carried out while the original repository remains live. When the load finishes the new repository is already out-of-date so an incremental dump/load is carried out. When this second load finishes the original repository is taken offline and we want to bring the new repository online as quickly as possible. A final incremental dump/load is required but that only involves a small number of revisions and so is fast. The remaining problems are locks and revprops.
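The staged procedure above amounts to repeating one pipeline with different revision ranges. A rough sketch, assuming svnadmin is on PATH; the helper names dump_load_commands and run_pipeline are hypothetical and not part of Subversion:

```python
import subprocess


def dump_load_commands(src, dst, lower=None):
    """Build argv lists for 'svnadmin dump src | svnadmin load dst'.

    lower=None gives the initial full dump; otherwise an incremental
    dump of revisions lower..HEAD is requested.
    """
    dump = ["svnadmin", "dump", "--quiet", src]
    if lower is not None:
        dump += ["--incremental", "-r", "%d:HEAD" % lower]
    load = ["svnadmin", "load", "--quiet", dst]
    return dump, load


def run_pipeline(dump, load):
    """Pipe the dump process into the load process and check both exit codes."""
    dumper = subprocess.Popen(dump, stdout=subprocess.PIPE)
    loader = subprocess.Popen(load, stdin=dumper.stdout)
    dumper.stdout.close()  # let the dumper see SIGPIPE if the loader exits early
    if loader.wait() != 0 or dumper.wait() != 0:
        raise RuntimeError("dump/load failed")
```

The first pass would use dump_load_commands(src, dst). Each later pass would set lower to one more than the youngest revision already in the new repository, for example as reported by svnlook youngest.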
We do not have tools to handle locks so the options are: a) drop all the locks, or b) copy/move the whole db/locks subdir. I'm not really interested in locks at present.

Revprops are more of a problem. Most revprops are up-to-date but a small number may be out-of-date. The problem is we do not know which revprops are out-of-date. Is there a reliable and efficient way to bring the revprops up-to-date?

We could attempt to disable and/or track revprop changes during the load but this is not reliable. Post-hooks are not 100% reliable and revprop changes can bypass the hooks. We could attempt to copy/move the whole revprops subdir, but that is not always possible if the repository formats are different.

One general solution is to use svnsync to bulk copy the revprops:

  ln -sf /bin/true dst/hooks/pre-revprop-change
  svnsync initialize --allow-non-empty file:///src file:///dst
  svnsync copy-revprops file:///src file:///dst

This isn't very fast: I get about 2,000 revisions a minute for repositories on an SSD. There are typically three revprops per revision and the FS/RA APIs change one at a time. Each change must run the mandatory pre-revprop-change hook and fsync() the repository. svnsync has a simple algorithm that writes every revprop for each revision. For a repository with a million revisions svnsync would invoke three million processes to run the hooks and make three million fsync() calls. Typically, most of this work is useless because most of the revprops already match.

I wrote a script using the Python FS bindings (see below). This avoids the hooks and also elides the writes when the values already match. Typically this just has to read and so will process several hundred thousand revisions a minute. This will reliably update a million revisions in minutes.

I was thinking that perhaps we ought to provide a more accessible way to do this. First, modify the FS implementations to detect when a change is a noop that doesn't modify a value and skip all the writing.

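The effect of eliding no-op writes can be shown without any Subversion API at all. A minimal sketch, assuming each revision's revprops are available as plain dicts; revprop_changes is a hypothetical helper name, not a function in the bindings:

```python
def revprop_changes(src_props, dst_props):
    """Return the (name, value) writes needed to make dst_props match src_props.

    A value of None means the property should be deleted from the
    destination. Properties that already match produce no entry, so
    an up-to-date revision yields an empty list and needs no write.
    """
    changes = []
    for name, value in src_props.items():
        if name not in dst_props or dst_props[name] != value:
            changes.append((name, value))  # add or modify
    for name in dst_props:
        if name not in src_props:
            changes.append((name, None))  # delete
    return changes
```

For a revision whose revprops already match, the result is empty, so the only cost is the two reads. That is the case for most revisions in the scenario described above.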
Second, provide some new admin commands to dump/load revprops:

  svnadmin dump-revprops repo | svnadmin load-revprops repo

dump-revprops would dump just the revprops and load-revprops would load into existing revisions rather than creating new revisions. There would be options to enable/bypass the hooks. I think this would be close to the efficiency of the script.

#!/usr/bin/python

import sys
from svn import core, fs, repos

# Usage: script SRC_REPO_PATH DST_REPO_PATH
src_path = core.svn_path_canonicalize(sys.argv[1])
dst_path = core.svn_path_canonicalize(sys.argv[2])

src_repo = repos.open(src_path)
dst_repo = repos.open(dst_path)
src_fs = repos.fs(src_repo)
dst_fs = repos.fs(dst_repo)

# Only revisions present in both repositories can be compared.
head = min(fs.youngest_rev(src_fs), fs.youngest_rev(dst_fs))

for r in range(0, head + 1):
  print(r)
  src_props = fs.revision_proplist(src_fs, r)
  dst_props = fs.revision_proplist(dst_fs, r)
  # Write only when the destination is missing a property or differs.
  for src_name, src_value in src_props.items():
    if src_name not in dst_props:
      fs.change_rev_prop(dst_fs, r, src_name, src_value) # add
    elif dst_props[src_name] != src_value:
      fs.change_rev_prop(dst_fs, r, src_name, src_value) # modify
  # Remove properties that exist only in the destination.
  for dst_name in dst_props:
    if dst_name not in src_props:
      fs.change_rev_prop(dst_fs, r, dst_name, None) # delete

-- 
Philip Martin
WANdisco