On 24 July 2015 at 22:58, Philip Martin <philip.mar...@wandisco.com> wrote:
> [Arising from some discussion on IRC today.]
>
> I've been considering the problem of a dump/load upgrade for a
> repository with a large number of revisions.  To minimise downtime the
> initial dump/load would be carried out while the original repository
> remains live.  When the load finishes the new repository is already
> out-of-date so an incremental dump/load is carried out.  When this
> second load finishes the original repository is taken offline and we
> want to bring the new repository online as quickly as possible.  A final
> incremental dump/load is required but that only involves a small number
> of revisions and so is fast.  The remaining problems are locks and
> revprops.
>
> We do not have tools to handle locks so the options are: a) drop all the
> locks, or b) copy/move the whole db/locks subdir.  I'm not really
> interested in locks at present.
>
> Revprops are more of a problem.  Most revprops are up-to-date but a
> small number may be out-of-date.  The problem is we do not know which
> revprops are out-of-date.  Is there a reliable and efficient way to
> bring the revprops up-to-date?  We could attempt to disable and/or track
> revprop changes during the load but this is not reliable.  Post- hooks
> are not 100% reliable and revprop changes can bypass the hooks.  We
> could attempt to copy/move the whole revprops subdir but that is not
> always possible if the repository formats are different.
>
> One general solution is to use svnsync to bulk copy the revprops:
>
>   ln -sf /bin/true dst/hooks/pre-revprop-change
>   svnsync initialize --allow-non-empty file:///src file:///dst
>   svnsync copy-revprops file:///src file:///dst
>
> This isn't very fast, I get about 2,000 revisions a minute for
> repositories on an SSD.  There are typically three revprops per
> revision and the FS/RA API changes one at a time.  Each change must run
> the mandatory pre-revprop-change hook and fsync() the repository.
> svnsync has a simple algorithm that writes every revprop for each
> revision.
>
> For a repository with a million revisions svnsync would invoke three
> million processes to run the hooks and three million fsync().
> Typically, most of this work is useless because most of the revprops
> already match.
>
> I wrote a script using the Python FS bindings (see below).  This avoids
> the hooks and also elides the writes when the values already match.
> Typically this just has to read and so will process several hundred
> thousand revisions a minute.  This will reliably update a million
> revisions in minutes.
>
> I was thinking that perhaps we ought to provide a more accessible way to
> do this.  First, modify the FS implementations to detect when a change
> is a noop that doesn't modify a value and skip all the writing.  Second,
> provide some new admin commands to dump/load revprops:
>
>   svnadmin dump-revprops repo | svnadmin load-revprops repo
>

Maybe use the existing 'load' subcommand with a '--revprops-only' switch
to load revprops, instead of a new subcommand? I.e.:

  svnadmin dump --revprops-only | svnadmin load --revprops-only

-- Ivan Zhakov
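
For readers following the thread, the "skip the write when the value already matches" idea from the quoted message can be sketched as below. This is only an illustration under stated assumptions, not the script Philip refers to: the repository access is passed in as plain callables (read_src, read_dst and write_dst are hypothetical names), and dict_writer is a made-up in-memory stand-in so the sketch runs without Subversion installed. With the real Python bindings the callables would presumably wrap svn.fs.revision_proplist and svn.fs.change_rev_prop, but that wiring is an assumption and is not shown.

```python
from typing import Callable, Dict, List, Optional, Tuple

Props = Dict[str, bytes]
Change = Tuple[str, Optional[bytes]]


def revprop_changes(src: Props, dst: Props) -> List[Change]:
    """Return the writes needed to make dst's revprops equal src's.

    A value of None means "delete this property". An empty list means
    the revision already matches and nothing has to be written.
    """
    changes: List[Change] = []
    for name in sorted(src):
        if dst.get(name) != src[name]:
            changes.append((name, src[name]))   # missing or different
    for name in sorted(dst):
        if name not in src:
            changes.append((name, None))        # stale property
    return changes


def sync_revprops(
    youngest: int,
    read_src: Callable[[int], Props],
    read_dst: Callable[[int], Props],
    write_dst: Callable[[int, str, Optional[bytes]], None],
) -> int:
    """Bring revisions 0..youngest up to date; return the number of writes."""
    writes = 0
    for rev in range(youngest + 1):
        for name, value in revprop_changes(read_src(rev), read_dst(rev)):
            write_dst(rev, name, value)
            writes += 1
    return writes


def dict_writer(store: Dict[int, Props]) -> Callable[[int, str, Optional[bytes]], None]:
    """Hypothetical in-memory stand-in for a repository writer."""
    def write(rev: int, name: str, value: Optional[bytes]) -> None:
        if value is None:
            store[rev].pop(name, None)
        else:
            store[rev][name] = value
    return write


if __name__ == "__main__":
    src = {0: {"svn:date": b"d0"}, 1: {"svn:log": b"fixed typo"}}
    dst = {0: {"svn:date": b"d0"}, 1: {"svn:log": b"fixd typo", "tmp": b"x"}}
    n = sync_revprops(1, src.__getitem__, dst.__getitem__, dict_writer(dst))
    print("writes performed:", n)
```

The point of the split is that revisions whose revprops already match cost two reads and no write, which is where the speed-up over writing every revprop comes from.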