Re: Testing equality between svnrdump and svnadmin dump

Julian Foad Tue, 13 Jan 2015 06:23:23 -0800

The next version of my testing patch is attached. It pipes each dumpfile 
through svndumpfilter and checks that a no-op filtering does not change 
anything.


This finds some differences between svnadmin and svndumpfilter:

1. When a revision has no revprops, svnadmin outputs an empty properties 
section, while svndumpfilter omits the properties section (but adds an extra 
newline after where it would have been).

svnadmin dump:
[[[

Revision-number: 1
Prop-content-length: 10
Content-length: 10

PROPS-END

]]]

svndumpfilter:
[[[

Revision-number: 1
Content-length: 0


]]]

The document 'notes/dump-load-format.txt' presently says the properties section 
should always be present. I think we have to change that to say it's optional, 
as released versions of svndumpfilter behave in this way (at least v1.7 and 
v1.8).

The attached patch 'dump-load-format-no-revprops-1.patch' corrects that 
documentation.


2. When there is no content section needed on a Node 'add' record (such as an 
add-with-history with no modifications), svnadmin omits the Content-length 
header while svndumpfilter writes "Content-length: 0" and an empty content 
section (a newline).


3. svndumpfilter outputs node headers in a different order. That's fine.


- Julian

After every test, dump the created repository (if any) with svnadmin and
with svnrdump and compare the two.


* subversion/tests/cmdline/svntest/sandbox.py
  (Sandbox.__init__): Remember the current working directory.
  (Sandbox.verify): New method.

* subversion/tests/cmdline/svntest/testcase.py
  (FunctionTestCase.run): Call the sandbox's verify method afterwards.

* subversion/tests/cmdline/svntest/verify.py
  (DumpParser): Improve the parsing of headers, fixing bugs and making it
    more generic. Don't store the number of blank lines, for now, because at
    the moment these often differ between svnadmin and svnrdump.
  (compare_dump_files): Add an 'ignore uuid' option.

* subversion/tests/cmdline/svnadmin_tests.py
  (fsfs_recover_handle_missing_revs_or_revprops_file): Restore the
    repository to a non-corrupt state, to avoid failing to dump.
  (verify_keep_going,
   verify_invalid_path_changes,
   verify_quickly): Remove the corrupt repository, to avoid failing to dump.
--This line, and those below, will be ignored--

Index: subversion/tests/cmdline/svnadmin_tests.py
===================================================================
--- subversion/tests/cmdline/svnadmin_tests.py	(revision 1651075)
+++ subversion/tests/cmdline/svnadmin_tests.py	(working copy)
@@ -1235,12 +1235,16 @@ def fsfs_recover_handle_missing_revs_or_
 
   if svntest.verify.verify_outputs(
     "Output of 'svnadmin recover' is unexpected.", None, errput, None,
     ".*Revision 3 has a non-file where its revprops file should be.*"):
     raise svntest.Failure
 
+  # Restore the r3 revprops file, thus repairing the repository.
+  os.rmdir(revprop_3)
+  os.rename(revprop_was_3, revprop_3)
+
 
 #----------------------------------------------------------------------
 
 @Skip(svntest.main.tests_use_prepacakaged_repository)
 def create_in_repo_subdir(sbox):
   "'svnadmin create /path/to/repo/subdir'"
@@ -2148,12 +2152,15 @@ def verify_keep_going(sbox):
                                                         sbox.repo_dir)
 
   if svntest.verify.verify_outputs("Output of 'svnadmin verify' is unexpected.",
                                    None, errput, None, "svnadmin: E165011:.*"):
     raise svntest.Failure
 
+  # Don't leave a corrupt repository
+  svntest.main.safe_rmtree(sbox.repo_dir, True)
+
 @SkipUnless(svntest.main.is_fs_type_fsfs)
 def verify_invalid_path_changes(sbox):
   "detect invalid changed path list entries"
 
   sbox.build(create_wc = False)
   repo_url = sbox.repo_url
@@ -2287,12 +2294,15 @@ def verify_invalid_path_changes(sbox):
                                                         sbox.repo_dir)
 
   if svntest.verify.verify_outputs("Output of 'svnadmin verify' is unexpected.",
                                    None, errput, None, "svnadmin: E165011:.*"):
     raise svntest.Failure
 
+  # Don't leave a corrupt repository
+  svntest.main.safe_rmtree(sbox.repo_dir, True)
+
 
 def verify_denormalized_names(sbox):
   "detect denormalized names and name collisions"
 
   sbox.build(create_wc = False)
   svntest.main.safe_rmtree(sbox.repo_dir, True)
@@ -2666,12 +2676,15 @@ def verify_quickly(sbox):
   if (svntest.main.fs_has_rep_sharing()):
     exp_out.insert(0, ".*Verifying.*metadata.*")
   if svntest.verify.verify_outputs("Unexpected error while running 'svnadmin verify'.",
                                    output, errput, exp_out, exp_err):
     raise svntest.Failure
 
+  # Don't leave a corrupt repository
+  svntest.main.safe_rmtree(sbox.repo_dir, True)
+
 
 @SkipUnless(svntest.main.is_fs_type_fsfs)
 @SkipUnless(svntest.main.fs_has_pack)
 def fsfs_hotcopy_progress(sbox):
   "hotcopy progress reporting"
 
Index: subversion/tests/cmdline/svntest/sandbox.py
===================================================================
--- subversion/tests/cmdline/svntest/sandbox.py	(revision 1651075)
+++ subversion/tests/cmdline/svntest/sandbox.py	(working copy)
@@ -23,12 +23,13 @@
 
 import os
 import shutil
 import copy
 import urllib
 import logging
+import re
 
 import svntest
 
 logger = logging.getLogger()
 
 
@@ -43,12 +44,14 @@ class Sandbox:
     self.test_paths = []
 
     self._set_name("%s-%d" % (module, idx))
     # This flag is set to True by build() and returned by is_built()
     self._is_built = False
 
+    self.was_cwd = os.getcwd()
+
   def _set_name(self, name, read_only=False):
     """A convenience method for renaming a sandbox, useful when
     working with multiple repositories in the same unit test."""
     if not name is None:
       self.name = name
     self.read_only = read_only
@@ -378,12 +381,102 @@ class Sandbox:
     _, output, _ = svntest.actions.run_and_verify_svnlook(
                      None, svntest.verify.AnyOutput, [],
                      'youngest', self.repo_dir)
     youngest = int(output[0])
     return youngest
 
+  def verify_repo(self):
+    """
+    """
+    svnrdump_headers_missing = re.compile(
+        "Text-content-sha1: .*|Text-copy-source-md5: .*|"
+        "Text-copy-source-sha1: .*|Text-delta-base-sha1: .*"
+    )
+    svnrdump_headers_always = re.compile(
+        "Prop-delta: .*"
+    )
+
+    dumpfile_a_n = svntest.actions.run_and_verify_dump(self.repo_dir,
+                                                      deltas=False)
+    dumpfile_a_d = svntest.actions.run_and_verify_dump(self.repo_dir,
+                                                      deltas=True)
+    dumpfile_r_d = svntest.actions.run_and_verify_svnrdump(
+                             None, svntest.verify.AnyOutput, [], 0,
+                             'dump', '-q', self.repo_url)
+
+    # Compare the two deltas dumpfiles, ignoring expected differences
+    dumpfile_a_d_cmp = [l for l in dumpfile_a_d
+                       if not svnrdump_headers_missing.match(l)
+                                and not svnrdump_headers_always.match(l)]
+    dumpfile_r_d_cmp = [l for l in dumpfile_r_d
+                       if not svnrdump_headers_always.match(l)]
+    svntest.verify.compare_dump_files(None, None,
+                                      dumpfile_a_d_cmp,
+                                      dumpfile_r_d_cmp)
+
+    # Try loading the dump files.
+    # For extra points, try loading each with the other tool:
+    #   svnadmin dump | svnrdump load
+    #   svnrdump dump | svnadmin load
+    repo_dir_a_n, repo_url_a_n = self.add_repo_path('load_a_n')
+    svntest.main.create_repos(repo_dir_a_n)
+    svntest.actions.enable_revprop_changes(repo_dir_a_n)
+    svntest.actions.run_and_verify_svnrdump(dumpfile_a_n,
+                                            svntest.verify.AnyOutput,
+                                            [], 0, 'load', repo_url_a_n)
+
+    repo_dir_a_d, repo_url_a_d = self.add_repo_path('load_a_d')
+    svntest.main.create_repos(repo_dir_a_d)
+    svntest.actions.enable_revprop_changes(repo_dir_a_d)
+    svntest.actions.run_and_verify_svnrdump(dumpfile_a_d,
+                                            svntest.verify.AnyOutput,
+                                            [], 0, 'load', repo_url_a_d)
+
+    repo_dir_r_d, repo_url_r_d = self.add_repo_path('load_r_d')
+    svntest.main.create_repos(repo_dir_r_d)
+    svntest.actions.run_and_verify_load(repo_dir_r_d, dumpfile_r_d)
+
+    # Dump the loaded repositories in the same way; expect exact equality
+    reloaded_dumpfile_a_n = svntest.actions.run_and_verify_dump(repo_dir_a_n)
+    reloaded_dumpfile_a_d = svntest.actions.run_and_verify_dump(repo_dir_a_d)
+    reloaded_dumpfile_r_d = svntest.actions.run_and_verify_dump(repo_dir_r_d)
+    svntest.verify.compare_dump_files(None, None,
+                                      reloaded_dumpfile_a_n,
+                                      reloaded_dumpfile_a_d,
+                                      ignore_uuid=True)
+    svntest.verify.compare_dump_files(None, None,
+                                      reloaded_dumpfile_a_d,
+                                      reloaded_dumpfile_r_d,
+                                      ignore_uuid=True)
+
+    # Run each dump through svndumpfilter and check for no further change.
+    for dumpfile in [dumpfile_a_n, dumpfile_a_d, dumpfile_r_d]:
+      exit_code2, output2, errput2 = svntest.main.run_command_stdin(
+        svntest.main.svndumpfilter_binary, None, -1, True,
+        dumpfile, '--quiet', 'include', '/')
+      assert not exit_code2 and not errput2
+      if (output2 != dumpfile):
+        from difflib import ndiff
+        print ''.join(ndiff(dumpfile, output2))
+      svntest.verify.compare_dump_files(None, None, dumpfile, output2)
+
+  def verify(self):
+    """Do additional testing that should hold for any sandbox, such as
+       verifying that the repository can be dumped.
+    """
+    if self.is_built() and not self.read_only:
+      # verify that we can in fact dump the repo
+      # (except for the few tests that deliberately corrupt the repo)
+      os.chdir(self.was_cwd)
+      if os.path.exists(self.repo_dir):
+        print("Testing dump...")
+        self.verify_repo()
+    else:
+      print("NOT testing dump: is_built=%d, read_only=%d"
+            % (self.is_built(), self.read_only))
+
 def is_url(target):
   return (target.startswith('^/')
           or target.startswith('file://')
           or target.startswith('http://')
           or target.startswith('https://')
           or target.startswith('svn://')
Index: subversion/tests/cmdline/svntest/testcase.py
===================================================================
--- subversion/tests/cmdline/svntest/testcase.py	(revision 1651075)
+++ subversion/tests/cmdline/svntest/testcase.py	(working copy)
@@ -170,13 +170,15 @@ class FunctionTestCase(TestCase):
     function was defined."""
 
     filename = self.func.func_code.co_filename
     return os.path.splitext(os.path.basename(filename))[0]
 
   def run(self, sandbox):
-    return self.func(sandbox)
+    result = self.func(sandbox)
+    sandbox.verify()
+    return result
 
 
 class _XFail(TestCase):
   """A test that is expected to fail, if its condition is true."""
 
   _result_map = {
Index: subversion/tests/cmdline/svntest/verify.py
===================================================================
--- subversion/tests/cmdline/svntest/verify.py	(revision 1651075)
+++ subversion/tests/cmdline/svntest/verify.py	(working copy)
@@ -488,32 +488,52 @@ class DumpParser:
                                 % (self.current, self.lines[self.current]))
       else:
         return False
     self.current += 1
     return True
 
+  def parse_header(self, header):
+    regex = '([^:]*): (.*)$'
+    m = re.match(regex, self.lines[self.current])
+    if not m:
+      raise SVNDumpParseError("expected a header '%s' at line %d, but found:\n%s"
+                              % (regex, self.current,
+                                 self.lines[self.current]))
+    self.current += 1
+    return m.groups()
+
+  def parse_headers(self):
+    headers = []
+    while self.lines[self.current] != '\n':
+      key, val = self.parse_header(self)
+      headers.append((key, val))
+    return headers
+
+  def parse_boolean(self, header, required):
+    return self.parse_line(header + ': (false|true)$', required)
+
   def parse_format(self):
     return self.parse_line('SVN-fs-dump-format-version: ([0-9]+)$')
 
   def parse_uuid(self):
     return self.parse_line('UUID: ([0-9a-z-]+)$')
 
   def parse_revision(self):
     return self.parse_line('Revision-number: ([0-9]+)$')
 
+  def parse_prop_delta(self):
+    return self.parse_line('Prop-delta: (false|true)$', required=False)
+
   def parse_prop_length(self, required=True):
     return self.parse_line('Prop-content-length: ([0-9]+)$', required)
 
   def parse_content_length(self, required=True):
     return self.parse_line('Content-length: ([0-9]+)$', required)
 
   def parse_path(self):
-    path = self.parse_line('Node-path: (.+)$', required=False)
-    if not path and self.lines[self.current] == 'Node-path: \n':
-      self.current += 1
-      path = ''
+    path = self.parse_line('Node-path: (.*)$', required=False)
     return path
 
   def parse_kind(self):
     return self.parse_line('Node-kind: (.+)$', required=False)
 
   def parse_action(self):
@@ -538,12 +558,21 @@ class DumpParser:
   def parse_text_md5(self):
     return self.parse_line('Text-content-md5: ([0-9a-z]+)$', required=False)
 
   def parse_text_sha1(self):
     return self.parse_line('Text-content-sha1: ([0-9a-z]+)$', required=False)
 
+  def parse_text_delta(self):
+    return self.parse_line('Text-delta: (false|true)$', required=False)
+
+  def parse_text_delta_base_md5(self):
+    return self.parse_line('Text-delta-base-md5: ([0-9a-f]+)$', required=False)
+
+  def parse_text_delta_base_sha1(self):
+    return self.parse_line('Text-delta-base-sha1: ([0-9a-f]+)$', required=False)
+
   def parse_text_length(self):
     return self.parse_line('Text-content-length: ([0-9]+)$', required=False)
 
   def get_props(self):
     props = []
     while not re.match('PROPS-END$', self.lines[self.current]):
@@ -567,14 +596,20 @@ class DumpParser:
           key += props[curprop[0]]
           curprop[0] += 1
         key = key[:-1]
 
         return key
 
-      key = read_key_or_value(curprop)
-      value = read_key_or_value(curprop)
+      if props[curprop[0]].startswith('K'):
+        key = read_key_or_value(curprop)
+        value = read_key_or_value(curprop)
+      elif props[curprop[0]].startswith('D'):
+        key = read_key_or_value(curprop)
+        value = None
+      else:
+        raise
       prophash[key] = value
 
     return prophash
 
   def get_content(self, length):
     content = ''
@@ -587,45 +622,58 @@ class DumpParser:
       raise SVNDumpParseError("content length expected %d actual %d at line %d"
                               % (length, len(content), self.current))
     return content
 
   def parse_one_node(self):
     node = {}
-    node['kind'] = self.parse_kind()
-    action = self.parse_action()
-    node['copyfrom_rev'] = self.parse_copyfrom_rev()
-    node['copyfrom_path'] = self.parse_copyfrom_path()
-    node['copy_md5'] = self.parse_copy_md5()
-    node['copy_sha1'] = self.parse_copy_sha1()
-    node['prop_length'] = self.parse_prop_length(required=False)
-    node['text_length'] = self.parse_text_length()
-    node['text_md5'] = self.parse_text_md5()
-    node['text_sha1'] = self.parse_text_sha1()
-    node['content_length'] = self.parse_content_length(required=False)
+    headers_list = self.parse_headers()
+    headers = { k:v for (k, v) in headers_list }
+    for key, header in [
+        ('kind', 'Node-kind'),
+        ('copyfrom_rev', 'Node-copyfrom-rev'),
+        ('copyfrom_path', 'Node-copyfrom-path'),
+        ('copy_md5', 'Text-copy-source-md5'),
+        ('copy_sha1', 'Text-copy-source-sha1'),
+        ('prop_delta', 'Prop-delta'),
+        ('prop_length', 'Prop-content-length'),
+        ('text_delta', 'Text-delta'),
+        ('text_delta_base_md5', 'Text-delta-base-md5'),
+        ('text_delta_base_sha1', 'Text-delta-base-sha1'),
+        ('text_length', 'Text-content-length'),
+        ('text_md5', 'Text-content-md5'),
+        ('text_sha1', 'Text-content-sha1'),
+        ('content_length', 'Content-length'),
+        ]:
+      node[key] = headers.get(header, None)
+
+    action = headers['Node-action']
+
     self.parse_blank()
     if node['prop_length']:
       node['props'] = self.get_props()
     if node['text_length']:
       node['content'] = self.get_content(int(node['text_length']))
     # Hard to determine how may blanks is 'correct' (a delete that is
     # followed by an add that is a replace and a copy has one fewer
     # than expected but that can't be predicted until seeing the add)
     # so allow arbitrary number
     blanks = 0
     while self.current < len(self.lines) and self.parse_blank(required=False):
       blanks += 1
-    node['blanks'] = blanks
+    ### disable temporarily, as svnrdump behaves differently from svnadmin
+    ### on a replace-with-copy (bug -- should file an issue)
+    #node['blanks'] = blanks
     return action, node
 
   def parse_all_nodes(self):
     nodes = {}
     while True:
       if self.current >= len(self.lines):
         break
       path = self.parse_path()
-      if not path and not path is '':
+      if path is None:
         break
       if not nodes.get(path):
         nodes[path] = {}
       action, node = self.parse_one_node()
       if nodes[path].get(action):
         raise SVNDumpParseError("duplicate action '%s' for node '%s' at line %d"
@@ -657,21 +705,25 @@ class DumpParser:
     self.parse_blank()
     self.parsed['uuid'] = self.parse_uuid()
     self.parse_blank()
     self.parse_all_revisions()
     return self.parsed
 
-def compare_dump_files(message, label, expected, actual):
+def compare_dump_files(message, label, expected, actual, ignore_uuid=False):
   """Parse two dump files EXPECTED and ACTUAL, both of which are lists
   of lines as returned by run_and_verify_dump, and check that the same
   revisions, nodes, properties, etc. are present in both dumps.
   """
 
   parsed_expected = DumpParser(expected).parse()
   parsed_actual = DumpParser(actual).parse()
 
+  if ignore_uuid:
+    parsed_expected['uuid'] = '<ignored>'
+    parsed_actual['uuid'] = '<ignored>'
+
   if parsed_expected != parsed_actual:
     raise svntest.Failure('\n' + '\n'.join(ndiff(
           pprint.pformat(parsed_expected).splitlines(),
           pprint.pformat(parsed_actual).splitlines())))
 
 ##########################################################################################

* notes/dump-load-format.txt
  Correct the assertion that a Revision record must have a properties
  section. While svnadmin always writes a properties section even if there
  are no revprops, svndumpfilter omits the properties section in that case.
--This line, and those below, will be ignored--

Index: notes/dump-load-format.txt
===================================================================
--- notes/dump-load-format.txt	(revision 1651345)
+++ notes/dump-load-format.txt	(working copy)
@@ -74,12 +74,12 @@ An example UUID is "7bf7a5ef-cabf-0310-b
 
 ==== Revision records ====
 
-A Revision record has three headers and is always followed by a
+A Revision record has three headers and is usually followed by a
 property section.  Expect the following form and sequence:
 
 -------------------------------------------------------------------
 Revision-number: <N>
-Prop-content-length: <P>
+[Prop-content-length: <P>]
 Content-length: <L>
 !
 -------------------------------------------------------------------
@@ -158,7 +158,7 @@ RFC822-style headers. A body may follow.
 
 === Property sections ===
 
-A Revision record *must* have a property section, and a Node record *may*
+A Revision record *may* have a property section, and a Node record *may*
 have a property section. Every record with a property section has 
 a Prop-content-length header.
 
@@ -345,9 +345,13 @@ to the same directory.
 
 === Properties and persistence ===
 
-The properties section of a Revision record consists of some subset
-of the three reserved per-commit properties: svn:author, svn:date,
-and svn:log. These properties do not persist to later revisions.
+The properties section of a Revision record consists of some (possibly
+empty) subset of the three reserved revision properties: svn:author,
+svn:date, and svn:log, along with any other revision properties.
+
+The revision properties do not persist to later revisions.  Each revision
+has exactly the revision properties specified in its revision record, or
+no revision properties if there is no property section.
 
 The key thing to know about Node properties is that they are 
 persistent, once set, until modified by a future property 
@@ -520,6 +524,15 @@ properties block.
 This note is included for historical completeness only, at is it highly
 unlikely that any Subversion instances that old remain in production.
 
+== Implementation choices for optional behaviour ==
+
+This section lists some of the ways existing implementations interpret the
+optional aspects of the specification.
+
+A Revision record need not have a Properties section if it has no revision
+properties. In Subversion 1.7 to 1.8, svnadmin and svnrdump emit an empty
+properties section, whereas svndumpfilter omits it.
+
 == Ancient history ==
 
 Old discussion:

Re: Testing equality between svnrdump and svnadmin dump

Reply via email to