On Wed, 2010-04-14, C. Michael Pilato wrote:
> Paul Burba wrote:
> > After thinking about this some more I see three options:
> > 
> > A) Keep the pre-927243 behavior as the default and thus support the
> > incremental dump use case by default.  Add a new option to svnadmin
> > load, --skip-missing-merge-sources (or maybe
> > --filter-missing-merge-sources) to activate the filtering of mergeinfo
> > sources outside of the dump stream (i.e. make the r927243 changes only
> > take affect with this option).  The obvious drawback is that admins
> > must know to use this option.  They would still be able to partially
> > dump a repos then load it into an empty (or unrelated) repos and have
> > bogus mergeinfo.
> > 
> > B) If the load stream's mergeinfo contains a merge source-rev pair
> > that predates the start of the load stream, but exists in the target
> > repository, then allow it to be loaded, otherwise filter it.  The
> > drawbacks are two:  First, performance; we'd need to check every
> > path/rev pair of incoming mergeinfo which certainly isn't going to
> > speed up a load*.  Second, it may be mere coincidence that the
> > path/rev exists in the target repository at the start of the load.
> > 
> > C) Revert r927243 and move the fix into svnadmin load: Give an error
> > when doing a partial dump that contains mergeinfo with revisions that
> > predate the starting rev of the dump and require the use of a new
> > --missing-merge-source=[skip|allow] option to successfully complete
> > the dump (i.e. something analogous to svndumpfilter's
> > --skip-missing-merge-sources).
> > 
> > Of course there is always:
> > 
> > D) <Insert your brilliant idea here>
> 
> [Thinking aloud here.]
> 
> We have analogous behavior today with the handling of copied paths.  There
> are two aspects of this handling which might serve as models in this new
> scenario:
> 
> 1.  'svnadmin dump' warns when it is in incremental mode and must generate a
> copy action from a source that predates the beginning of the dump window,
> but it's only a warning and the dump continues.

> 2.  'svnadmin load' does nothing smart, trusting that the dump it's being
> fed is a sensible one.  If it gets a request to copy some path/rev that
> doesn't exist, stuff errors out and the repository is left in whatever state
> it was in prior to the revision that failed to load.
> 
> Assuming similar behavior for mergeinfo handling, we have, at a minimum:
> 
> 1.  'svnadmin dump' warns when it is in incremental mode and must generate
> mergeinfo from a merge source that predates the beginning of the dump
> window, but it's only a warning and the dump continues.

+1 on that, both for consistency with copy-source info and because it's
a good policy anyway.

A detail: I haven't tried it, but would want it to print this warning
(and the copy-source one) at the end of the dump, and not just in the
middle of the long list of "Dumping rX" messages.

> Now, there's some question here about how to handle part (2).  In the copied
> path case, 'svnadmin load' can be stupid because the failure case is a noisy
> one -- stuff errors out from deep within the FS layer.  That's not quite the
> same failure case for mergeinfo, where bad mergeinfo would be silently
> recorded without error.  So we *could* try to validate the mergeinfo somehow
> against the history recorded in the target repository.  But there follows
> another complication:  it's really not so hard to get bad (if perhaps
> innocuous) mergeinfo into a repository by normal means, and you certainly
> wouldn't expect that to cost you the ability to dump and load your repository!
> 
> So I'm really left with this:
> 
> 2.  'svnadmin load' does nothing smart, trusting that the dump it's being
> fed is a sensible one.

Better: 'svnadmin load' tries to validate the mergeinfo, and issues a
warning if it refers to a non-existent source.  The admin then gets to
figure out whether it was bad in the first place or because of his/her
partial-dump/partial-load scenario.

However, it depends how efficient the checking is.  If that would make
the 'load' really slow, I can see that not being wanted.

- Julian


> As a net change against our shipping code, that's really just, what, the
> addition of warnings in 'svnadmin dump'?
> 
> The question then becomes, "How do users deal with legitimate partial dumps
> that will be loaded atop something other than loads from previous
> incremental windows?"  I think they do this the same way they handle the
> copy case:  either hand-hacking the dumpstream or using 'svndumpfilter'.  So
> maybe 'svndumpfilter' grows the logic and options required to pare away
> mergeinfo that doesn't meet some criteria (such as "ranges outside the dump
> stream range")?
> 


Reply via email to