Hi Stefan,
Thanks for the constructive review! I accept your nits, but have one comment:
You can use svn_sort_compare_items_lexically because entries are single path
components that do not contain slashes. In which case
svn_sort_compare_items_as_paths produces the same result but is less
efficient.
The paths are like:
trunk/documents/papers/archetypes/paper_plots/103lg.ps
But I still believe you are right that lexical sort will work fine and is
simpler than svn_sort_compare_items_as_paths.
The revised patch is attached. Here's a proposed commit message:
[[[
Issue #4134: svnadmin dump files are not reproducible, due to APR 1.4.6 hash
order changes
This patch sorts deleted path names before dumping them.
* subversion/libsvn_repos/dump.c
(close_directory): sort hash deleted_entries before dumping.
Patch by: Dustin Lang <dstnd...@gmail.com>
Review by: stsp
(Stefan suggested several stylistic changes and using
svn_sort_compare_items_lexically.)
]]]
cheers,
dstn
On Tue, 29 May 2012, Stefan Sperling wrote:
On Tue, May 29, 2012 at 03:09:08PM -0400, Dustin Lang wrote:
Hi,
This is my first time working on the subversion code, so be gentle :)
I believe the attached patch solves one component of #4134
("svnadmin dump files are not reproducible, due to APR 1.4.6 hash
order changes"), namely that deleted nodes are stored in a hash and
dumped out in (now-arbitrary) hash iteration order. This patch
simply uses "svn_sort_hash" to get a sorted view of the keys before
dumping them.
I have verified that it runs valgrind-clean when "svnadmin dump"ing
700 revs of the astrometry.net repo (and now my repo and its
svnsync'd mirror produce exactly equal dumps, hoorah), but I don't
really know if I've used the apr infrastructure appropriately.
I don't know whether this completely solves #4134; it works for me,
but my repo is simple.
cheers,
dustin
This patch looks correct to me overall but I have some nits.
http://subversion.tigris.org/issues/show_bug.cgi?id=4134
Index: subversion/libsvn_repos/dump.c
===================================================================
--- subversion/libsvn_repos/dump.c (revision 1343881)
+++ subversion/libsvn_repos/dump.c (working copy)
@@ -34,6 +34,7 @@
#include "svn_time.h"
#include "svn_checksum.h"
#include "svn_props.h"
+#include "svn_sorts.h"
#include "private/svn_mergeinfo_private.h"
#include "private/svn_fs_private.h"
@@ -736,12 +737,14 @@
struct edit_baton *eb = db->edit_baton;
apr_hash_index_t *hi;
apr_pool_t *subpool = svn_pool_create(pool);
+ unsigned int Npaths = apr_hash_count(db->deleted_entries);
No CamelCase please. You don't need this variable at all (see below).
+ unsigned int i;
+ apr_array_header_t* sorted;
Maybe call the array 'sorted_deleted_entries', or 'sorted_entries', instead?
- for (hi = apr_hash_first(pool, db->deleted_entries);
- hi;
- hi = apr_hash_next(hi))
+ sorted = svn_sort__hash(db->deleted_entries,
svn_sort_compare_items_as_paths, pool);
Please try to wrap lines at 78 columns.
You can use svn_sort_compare_items_lexically because entries are single
path components that do not contain slashes. In which case
svn_sort_compare_items_as_paths produces the same result but is less efficient.
+ for (i = 0; i < Npaths; i++)
Here, you can use the idiom:
for (i = 0; i < sorted->nelts; i++)
which means you don't need 'Npaths'.
{
- const char *path = svn__apr_hash_index_key(hi);
+ const char* path = APR_ARRAY_IDX(sorted, i, svn_sort__item_t).key;
Please put the asterisk next to the variable name for consistency with code
formatting elsewhere. Same applies to the declaration of 'sorted' above.
svn_pool_clear(subpool);
@@ -753,6 +756,8 @@
FALSE, NULL, SVN_INVALID_REVNUM, subpool));
}
+ apr_array_clear(sorted);
No need to do this. This doesn't actually free any memory, it just marks
array slots for reuse. Array memory will be freed when the caller clears
or destroys 'pool'. It's up to the caller to do this, so just delete that line.
+
svn_pool_destroy(subpool);
return SVN_NO_ERROR;
}
Index: subversion/libsvn_repos/dump.c
===================================================================
--- subversion/libsvn_repos/dump.c (revision 1343783)
+++ subversion/libsvn_repos/dump.c (working copy)
@@ -34,6 +34,7 @@
#include "svn_time.h"
#include "svn_checksum.h"
#include "svn_props.h"
+#include "svn_sorts.h"
#include "private/svn_mergeinfo_private.h"
#include "private/svn_fs_private.h"
@@ -736,12 +737,15 @@ close_directory(void *dir_baton,
struct edit_baton *eb = db->edit_baton;
apr_hash_index_t *hi;
apr_pool_t *subpool = svn_pool_create(pool);
+ unsigned int i;
+ apr_array_header_t *sorted_entries;
- for (hi = apr_hash_first(pool, db->deleted_entries);
- hi;
- hi = apr_hash_next(hi))
+ sorted_entries = svn_sort__hash(db->deleted_entries,
+ svn_sort_compare_items_lexically, pool);
+ for (i = 0; i < sorted_entries->nelts; i++)
{
- const char *path = svn__apr_hash_index_key(hi);
+ const char *path = APR_ARRAY_IDX(sorted_entries, i,
+ svn_sort__item_t).key;
svn_pool_clear(subpool);