Sweeping through my TODO list turned up this thread.

I'll commit this last patch tomorrow if there are no objections.

I pause a day only because I recall Mike saying he was planning to
take a look at this.

Paul

On Tue, May 11, 2010 at 4:00 PM, Paul Burba <ptbu...@gmail.com> wrote:
> On Tue, Apr 20, 2010 at 11:08 PM, C. Michael Pilato <cmpil...@collab.net> 
> wrote:
>> Paul Burba wrote:
>>> Hi Mike and Julian,
>>>
>>> Sorry for the delayed response; been working on/thinking about other
>>> mergey stuff lately...
>>>
>>> A few inline comments and then a proposed course of action at the end.
>>>
>>> On Thu, Apr 15, 2010 at 9:28 AM, C. Michael Pilato <cmpil...@collab.net> 
>>> wrote:
>>>> The question then becomes, "How do users deal with legitimate partial dumps
>>>> that will be loaded atop something other than loads from previous
>>>> incremental windows?"  I think they do this the same way they handle the
>>>> copy case:  either hand-hacking the dumpstream or using 'svndumpfilter'.  
>>>> So
>>>> maybe 'svndumpfilter' grows the logic and options required to pare away
>>>> mergeinfo that doesn't meet some criteria (such as "ranges outside the dump
>>>> stream range")?
>>>
>>> I vote for svndumpfilter growing the logic to filter ranges predating
>>> the start of the dump stream in the same way it strips mergeinfo with
>>> merge sources filtered from the dump stream.  Basically this means
>>> moving the r927243 functionality to svndumpfilter.  No really sure we
>>> need a new option, seems we could overload
>>> --skip-missing-merge-sources to include this behavior.  Thoughts?
>>
>> +1
>>
>>>> Julian Foad wrote:
>>>>> On Wed, 2010-04-14, C. Michael Pilato wrote:
>>>>>> Assuming similar behavior for mergeinfo handling, we have, at a minimum:
>>>>>>
>>>>>> 1.  'svnadmin dump' warns when it is in incremental mode and must 
>>>>>> generate
>>>>>> mergeinfo from a merge source that predates the beginning of the dump
>>>>>> window, but it's only a warning and the dump continues.
>>>
>>> Why only --incremental mode?  We should warn for both --incremental
>>> dumps *and* non-incremental dumps because in the latter case we might
>>> be specifying a dump range -rX:Y where rX>0* and there might be
>>> mergeinfo that refers to revs < X.  And while we might want to be
>>> clever about skipping the warning when no range is specified (i.e.
>>> defaulting to -r0:HEAD), even then we should warn because issue #3020
>>> might already have resulted in a r1 with mergeinfo in the starting
>>> repository.
>>
>> You're right, of course.  I was thinking "partial dump" and saying
>> "incremental".  My bad.
>>
>>>>>> 2.  'svnadmin load' does nothing smart, trusting that the dump it's being
>>>>>> fed is a sensible one.
>>>>> Better: 'svnadmin load' tries to validate the mergeinfo, and issues a
>>>>> warning if it refers to a non-existent source.  The admin then gets to
>>>>> figure out whether it was bad in the first place or because of his/her
>>>>> partial-dump/partial-load scenario.
>>>>>
>>>>> However, it depends how efficient the checking is.  If that would make
>>>>> the 'load' really slow, I can see that not being wanted.
>>
>> [...]
>>
>>> No unsurprisingly, this took significantly (62%) longer, 00:05:39.609.
>>
>> (My vote was for 'svnadmin load' does nothing smart, and I stand by it.)
>>
>>> Taking what you've had to say into consideration, I propose the following:
>>>
>>> 1) Revert http://svn.apache.org/viewvc?view=revision&revision=927243.
>>> This stops svnadmin load from automatically filtering mergeinfo
>>> referring to revisions outside of the load stream.
>>>
>>> 2) Reapply or reimplement the part of r327243 that accounts for the
>>> case where the first loaded revision in a load stream has mergeinfo,
>>> see http://subversion.tigris.org/issues/show_bug.cgi?id=3020#desc10
>>> item #1.
>>
>> (r927243, you mean.)
>>
>>> 3) Keep the inline copy-source warnings as they exist today in
>>> svnadmin dump, but add a new end-of-process summary warning e.g.:
>>>
>>>   * Dumped revision 504.
>>>   * Dumped revision 505.
>>>   * Dumped revision 506.
>>>   * Dumped revision 507.
>>>   WARNING: Referencing data in revision 250, which is older than the oldest
>>>   WARNING: dumped revision (504).  Loading this dump into an empty 
>>> repository
>>>   WARNING: will fail.
>>>   * Dumped revision 58.
>>>   * Dumped revision 59.
>>>   * Dumped revision 510.
>>>   * Dumped revision 511.
>>>   .
>>>   .
>>>   .
>>>   WARNING: The range of revisions dumped contained references to copy
>>> sources outside that range.
>>>
>>> 4) Implement inline and summary functionality analogous to 3) but for
>>> mergeinfo ranges outside of the dump stream, e.g.:
>>>
>>>   * Dumped revision 504.
>>>   * Dumped revision 505.
>>>   * Dumped revision 506.
>>>   * Dumped revision 507.
>>>   WARNING: Mergeinfo referencing revision(s) prior to revision 250,
>>> which is older
>>>   WARNING: than the oldest dumped revision (504).  Loading this dump may 
>>> result
>>>   WARNING: in invalid mergeinfo.
>>>   * Dumped revision 58.
>>>   * Dumped revision 59.
>>>   * Dumped revision 510.
>>>   * Dumped revision 511.
>>>   .
>>>   .
>>>   .
>>>   WARNING: The range of revisions dumped contained references to
>>> mergeinfo outside that range.
>>>
>>> Also, as mentioned earlier, I think this warning should kick in
>>> regardless of whether --incremental is used.
>>>
>>> 5) Overload svndumpfilter's --skip-missing-merge-sources option to
>>> also remove mergeinfo that predates the starting revision of the dump
>>> stream.
>>
>> +1 all over this stuff.
>>
>>> 6) Open: How does svnadmin load deal with mergeinfo that doesn't exist
>>> in the load steam or the target repos:
>>>
>>>    A) It doesn't.
>>>
>>>    B) Only checks that the revs are valid (fast, but
>>>       can allow mergeinfo that describes invalid
>>>       mergesource:rev pairs)
>>>
>>>    C) Find an efficient way to test for valid
>>>       mergesource:rev pairs
>>
>> For now, A (or maybe B).  Then we have our Paul back.  :-)
>>
>
> Been down the rabbit hole on-and-off the last couple weeks with this.
> Did everything we discussed so far in this thread, plus a slew of new
> bugs, see http://subversion.tigris.org/issues/show_bug.cgi?id=3020#desc15
> and onward.
>
> At any rate, after my recent work we are left with one common(?) use
> case still broken:
>
> Loading a sequence of incremental dumps when the fist load in the
> sequence is into a *non-empty* repostitory.
>
> This is tested in svnadmin_tests.py 20 'don't filter mergeinfo revs
> from incremental dump', specifically see '# PART 4: Load a a series of
> incremental dumps to an non-empty repository."
>
> Basically the problem is this:
>
> 1) Given repository ReposA with X revisions in it.
>
>   Note: Previously this use case was also broken if ReposB
>   was empty to start, but this now works (assuming none of
>   the incremental dumps in step 3 were svndumpfiltered in
>   such a way as to remove/renumber revisions).
>
> 2) Given repository ReposB with Y revisions in it.
>
> 3) Incrementally (but fully) dump ReposA
>
>   svnadmin dump ReposA -r0:100                 >  A.0.100.dump
>   svnadmin dump ReposA -r101:200 --incremental >  A.101.200.dump
>   svnadmin dump ReposA -r201:300 --incremental >  A.201.300.dump
>   .
>   .
>   .
>   svnadmin dump ReposA -rW:X     --incremental >  A.W.X.dump
>
> 4) Load the ReposA incremental dumps to ReposB
>
>   svnadmin load ReposB --parent-dir /some/subtree < A.0.100.dump
>   svnadmin load ReposB --parent-dir /some/subtree < A.101.200.dump
>   svnadmin load ReposB --parent-dir /some/subtree < A.201.300.dump
>   .
>   .
>   .
>   svnadmin load ReposB --parent-dir /some/subtree < A.W.X.dump
>
> The problem arises when one of the incremental loads contains
> references to mergeinfo source revisions that predate the start of the
> dump.  For example, say A.201.300.dump contains mergeinfo referencing
> r82.  When this is loaded into ReposB the reference should be changed
> to r(82+Y) to reflect the fact that ReposB has Y revisions in it
> before we ever started loading parts of ReposA.  Currently no change
> is made and the source rev stays at r82, which is obviously incorrect.
>
> The attached patch fixes this problem.  It takes all mergeinfo which
> predates the first revision in the dump stream and adjusts it by the
> difference between the head rev in ReposB and the current dump stream
> revision.  Yes, this is a fairly fragile solution (as is our copyfrom
> sources mapping, which this is based on).  If unrelated commits are
> made to ReposB between loads of the incremental dumps, then the
> mergeinfo source revs will be wrong; but it is *always* wrong right
> now, at least this would fix this use case.  Opinions appreciated.
>
> And yes, there are still other potential problems: Think incremental
> dumps that are dumpfiltered such that some revs are dropped or
> renumbered.  But I'm not sure we really want to solve *every* problem
> in this space, I'm not even sure we can (without bringing the
> performance of svnadmin load to its knees).  I think this patch brings
> us to "good enough" with the caveat that ill-advised use of svnadmin
> dump/load and/or svndumpfilter can result in incorrect mergeinfo.
>
> Paul
>
One last issue #3020 fix: Correctly map mergeinfo revisions when loading
incremental dumps into a repository that was not empty prior to the first
load.

* subversion/include/private/svn_mergeinfo_private.h

  (svn_mergeinfo__adjust_mergeinfo_rangelists): New.

* subversion/libsvn_subr/mergeinfo.c

  (svn_mergeinfo__adjust_mergeinfo_rangelists): New.

* subversion/libsvn_repos/load.c

  (renumber_mergeinfo_revs): Adjust mergeinfo older than the oldest revision
   in the dump stream by the difference between the head rev of the target
   repository and the current dump stream rev.

  (new_revision_record): Update parse_baton.oldest_old_rev here, a bit sooner
   than we did before when we set it...

  (close_revision): ...here.

* subversion/tests/cmdline/svnadmin_tests.py
Index: subversion/include/private/svn_mergeinfo_private.h
===================================================================
--- subversion/include/private/svn_mergeinfo_private.h  (revision 943236)
+++ subversion/include/private/svn_mergeinfo_private.h  (working copy)
@@ -212,6 +212,21 @@
                           svn_boolean_t inheritable,
                           apr_pool_t *result_pool);
 
+/* Adjust in-place MERGEINFO's rangelists by OFFSET.  If OFFSET is negative 
+   and would adjust any part of MERGEINFO's source revisions to 0 or less,
+   then those revisions are dropped.  If all the source revisions for a merge
+   source path are dropped, then the path itself is dropped.  If all merge
+   source paths are dropped, then *ADJUSTED_MERGEINFO is set to an empty
+   hash.  *ADJUSTED_MERGEINFO is allocated in RESULT_POOL.  SCRATCH_POOL is
+   used for any temporary allocations. */
+svn_error_t *
+svn_mergeinfo__adjust_mergeinfo_rangelists(svn_mergeinfo_t *adjusted_mergeinfo,
+                                           svn_mergeinfo_t mergeinfo,
+                                           svn_revnum_t offset,
+                                           apr_pool_t *result_pool,
+                                           apr_pool_t *scratch_pool);
+
+
 #ifdef __cplusplus
 }
 #endif /* __cplusplus */
Index: subversion/libsvn_repos/load.c
===================================================================
--- subversion/libsvn_repos/load.c      (revision 943236)
+++ subversion/libsvn_repos/load.c      (working copy)
@@ -278,11 +278,31 @@
                         apr_pool_t *pool)
 {
   apr_pool_t *subpool = svn_pool_create(pool);
-  apr_hash_t *mergeinfo;
-  apr_hash_t *final_mergeinfo = apr_hash_make(subpool);
+  svn_mergeinfo_t mergeinfo, predates_stream_mergeinfo;
+  svn_mergeinfo_t final_mergeinfo = apr_hash_make(subpool);
   apr_hash_index_t *hi;
 
   SVN_ERR(svn_mergeinfo_parse(&mergeinfo, initial_val->data, subpool));
+
+  if (rb->pb->oldest_old_rev > 1)
+    {
+      SVN_ERR(svn_mergeinfo__filter_mergeinfo_by_ranges(
+        &predates_stream_mergeinfo, mergeinfo,
+        rb->pb->oldest_old_rev - 1, 0,
+        TRUE, subpool, subpool));
+      SVN_ERR(svn_mergeinfo__filter_mergeinfo_by_ranges(
+        &mergeinfo, mergeinfo,
+        rb->pb->oldest_old_rev - 1, 0,
+        FALSE, subpool, subpool));
+      SVN_ERR(svn_mergeinfo__adjust_mergeinfo_rangelists(
+        &predates_stream_mergeinfo, predates_stream_mergeinfo,
+        -rb->rev_offset, subpool, subpool));
+    }
+  else
+    {
+      predates_stream_mergeinfo = NULL;
+    }
+
   for (hi = apr_hash_first(subpool, mergeinfo); hi; hi = apr_hash_next(hi))
     {
       const char *merge_source;
@@ -348,6 +368,11 @@
       apr_hash_set(final_mergeinfo, merge_source,
                    APR_HASH_KEY_STRING, rangelist);
     }
+
+  if (predates_stream_mergeinfo)
+      SVN_ERR(svn_mergeinfo_merge(final_mergeinfo, predates_stream_mergeinfo,
+                                  subpool));
+
   SVN_ERR(svn_mergeinfo_sort(final_mergeinfo, subpool));
 
   /* Mergeinfo revision sources for r0 and r1 are invalid; you can't merge r0
@@ -1053,6 +1078,10 @@
       SVN_ERR(svn_stream_printf(pb->outstream, pool,
                                 _("<<< Started new transaction, based on "
                                   "original revision %ld\n"), rb->rev));
+
+      /* Stash the oldest "old" revision committed from the load stream. */
+      if (!SVN_IS_VALID_REVNUM(pb->oldest_old_rev))
+        pb->oldest_old_rev = rb->rev;
     }
 
   /* If we're parsing revision 0, only the revision are (possibly)
@@ -1411,10 +1440,6 @@
         return svn_error_return(err);
     }
 
-  /* Stash the oldest "old" revision committed from the load stream. */
-  if (!SVN_IS_VALID_REVNUM(pb->oldest_old_rev))
-    pb->oldest_old_rev = *old_rev;
-
   /* Run post-commit hook, if so commanded.  */
   if (pb->use_post_commit_hook)
     {
Index: subversion/libsvn_subr/mergeinfo.c
===================================================================
--- subversion/libsvn_subr/mergeinfo.c  (revision 943236)
+++ subversion/libsvn_subr/mergeinfo.c  (working copy)
@@ -1998,6 +1998,58 @@
   return SVN_NO_ERROR;
 }
 
+svn_error_t *
+svn_mergeinfo__adjust_mergeinfo_rangelists(svn_mergeinfo_t *adjusted_mergeinfo,
+                                           svn_mergeinfo_t mergeinfo,
+                                           svn_revnum_t offset,
+                                           apr_pool_t *result_pool,
+                                           apr_pool_t *scratch_pool)
+{
+  apr_hash_index_t *hi;
+  *adjusted_mergeinfo = apr_hash_make(result_pool);
+
+  if (mergeinfo)
+    {
+      for (hi = apr_hash_first(scratch_pool, mergeinfo);
+           hi;
+           hi = apr_hash_next(hi))
+        {
+          int i;
+          const char *path = svn__apr_hash_index_key(hi);
+          apr_array_header_t *rangelist = svn__apr_hash_index_val(hi);
+          apr_array_header_t *adjusted_rangelist =
+            apr_array_make(result_pool, rangelist->nelts,
+                           sizeof(svn_merge_range_t *));
+
+          for (i = 0; i < rangelist->nelts; i++)
+            {
+              svn_merge_range_t *range =
+                APR_ARRAY_IDX(rangelist, i, svn_merge_range_t *);
+
+              if (range->start + offset > 0 && range->end + offset > 0)
+                {                  
+                  if (range->start + offset < 0)
+                    range->start = 0;
+                  else
+                    range->start = range->start + offset;
+
+                  if (range->end + offset < 0)
+                    range->end = 0;
+                  else
+                    range->end = range->end + offset;
+                  APR_ARRAY_PUSH(adjusted_rangelist, svn_merge_range_t *) =
+                    range;
+                }
+            }
+
+          if (adjusted_rangelist->nelts)
+            apr_hash_set(*adjusted_mergeinfo, apr_pstrdup(result_pool, path),
+                         APR_HASH_KEY_STRING, adjusted_rangelist);
+        }
+    }
+  return SVN_NO_ERROR;
+}
+
 svn_boolean_t
 svn_mergeinfo__is_noninheritable(svn_mergeinfo_t mergeinfo,
                                  apr_pool_t *scratch_pool)
Index: subversion/tests/cmdline/svnadmin_tests.py
===================================================================
--- subversion/tests/cmdline/svnadmin_tests.py  (revision 943236)
+++ subversion/tests/cmdline/svnadmin_tests.py  (working copy)
@@ -1070,6 +1070,7 @@
   load_and_verify_dumpstream(sbox, [], [], None,
                              open(dump_file_r11_13).read(),
                              '--ignore-uuid')
+  #raise svntest.Failure("PTB")
   load_and_verify_dumpstream(sbox, [], [], None,
                              open(dump_file_r14_15).read(),
                              '--ignore-uuid')
@@ -1162,28 +1163,6 @@
 
   # Check the resulting mergeinfo.  We expect the exact same results
   # as Part 3.
-  #
-  # Currently this fails because our current logic mapping mergeinfo revs
-  # in the load stream to their new values based on the offset of the
-  # target repository is quite flawed.  Right now this is the resulting
-  # mergeinfo:
-  #
-  #    Properties on 'svnadmin_tests-21\Projects\Project-X\branches\B1\B
-  #      svn:mergeinfo
-  #        /Projects/Project-X/branches/B2/B/E:11-12
-  #        /Projects/Project-X/trunk/B/E:5-6,8-9
-  #    Properties on 'svnadmin_tests-21\Projects\Project-X\branches\B1':
-  #      svn:mergeinfo
-  #        /Projects/Project-X/branches/B2:11-18
-  #                                           ^^
-  #                                 The *only* correct rev here!
-  #        /Projects/Project-X/trunk:6,9
-  #    Properties on 'svnadmin_tests-21\Projects\Project-X\branches\B2':
-  #      svn:mergeinfo
-  #        /Projects/Project-X/trunk:9
-  #
-  # See http://subversion.tigris.org/issues/show_bug.cgi?id=3020#desc16 for
-  # more info.
   svntest.actions.run_and_verify_svn(None, expected_output, [],
                                      'propget', 'svn:mergeinfo', '-R',
                                      sbox.repo_url)
@@ -1215,7 +1194,7 @@
               create_in_repo_subdir,
               SkipUnless(verify_with_invalid_revprops,
                          svntest.main.is_fs_type_fsfs),
-              XFail(dont_drop_valid_mergeinfo_during_incremental_loads),
+              dont_drop_valid_mergeinfo_during_incremental_loads,
              ]
 
 if __name__ == '__main__':

Reply via email to