On Thu, Feb 27, 2014 at 1:53 AM, Evgeny Kotkov <evgeny.kot...@visualsvn.com>wrote:
> Hi Stefan, > > > Author: stefan2 > > Date: Fri Jan 3 14:34:41 2014 > > New Revision: 1555109 > > > > URL: http://svn.apache.org/r1555109 > > Log: > > Refine the FSFS f7 pack heuristics away from the container-centric > > order used by FSX. The changes only affect the order of text reps > > and noderevs; changed paths lists and properties remain as are. > > It looks like there is a problem is the implementation of the reps sorting > algorithm. The sort_reps_range() function (which essentially is the core > of > the new heuristics) can access uninitialized memory under certain > circumstances. > There is a bunch of valgrind warnings for trunk@1572250 ... > [[[ > (valgrind fs-fs-pack-test) > > Conditional jump or move depends on uninitialised value(s) > at 0x42C9BF: svn_fs_fs__id_part_eq (id.c:145) > by 0x43505B: sort_reps_range (pack.c:932) > by 0x4340CD: sort_reps (pack.c:1007) > by 0x432B38: pack_range (pack.c:1391) > by 0x4319BF: pack_log_addressed (pack.c:1570) > by 0x43151B: pack_rev_shard (pack.c:1758) > by 0x431023: pack_shard (pack.c:1817) > by 0x430E33: pack_body (pack.c:1958) > by 0x425DAF: with_some_lock_file (fs_fs.c:185) > by 0x425C21: svn_fs_fs__with_write_lock (fs_fs.c:203) > by 0x430B15: svn_fs_fs__pack (pack.c:1986) > by 0x425491: fs_pack (fs.c:364) > ]]] > > ...and this behavior can also be witnessed in some real-world scenarios. > I patched this function to make it crash upon the uninitialized variable > usage > and received 11/14 coredumps in the fs-fs-pack-test and a 100%-reproducible > coredump when packing the svnrdump'ed > http://googletest.googlecode.com/svn/ > repository with "layout sharded 5": > ... > Well, the good thing is that this would never lead to a broken repo but only affect the item placement order within a pack file. It's still a bug that needs fixing. The algorithm reorders the noderevs in three scans (please correct me if I > misunderstood some details). The first scan (1) bubbles up a single > noderev > from every cluster, which is likely to be referenced from the following > packs. > Cluster is a term that refers to a sequential noderev range, where every > element shares the same path. After that, the second scan (2) bubbles up > reconstructible delta chains per every cluster. The last scan (3) is the > least interesting one, because it simply grabs everything what was left > after > (1) and (2). > Correct. And that catch-all scan (3) makes the whole heuristics robust against a wide range of bugs. > Now, in order to distinguish different clusters, parts (1) and (2) remember > (like, "lock on") the (PATH, REP_ID) for the first noderev from the current > cluster. Iff the PATH changes (this means we have just entered the next > cluster), we update the stored (PATH, REP_ID) values. We need to remember > the > REP_ID in order to reconstruct those delta chains in (2), because they > start > from this REP_ID. > > As far as I can tell, the problem lies in the way we handle the first path. > For any non-empty path set, the first path always marks the beginning of > the > corresponding cluster. However, we only remember the PATH part of the > (PATH, > REP_ID) pair for the first path and omit the REP_ID. In case when the > path set > consists of *a single cluster* (i.e. all path_order_t objects share the > same > PATH), phase (2) will end up in an attempt to reconstruct the delta chain > starting from the uninitialized REP_ID. There are scenarios that lead to > this > behavior in fs-fs-pack-test (see create_packed_filesystem() with multiple > /iota > content tweaks falling into the same cluster) and this can easily happen > with > real-world repositories. > Well, chances are that PATH mismatches for the first non-NULL iteration in (2), which gets REP_ID set. However, if there is only one cluster with more than a single change in the pack file, the path PATH does match and REP_ID will have some random value. > So, what do you think about fixing it this way? > [[[ > Index: subversion/libsvn_fs_fs/pack.c > =================================================================== > --- subversion/libsvn_fs_fs/pack.c (revision 1572250) > +++ subversion/libsvn_fs_fs/pack.c (working copy) > @@ -865,6 +865,7 @@ sort_reps_range(pack_context_t *context, > */ > dest = first; > path = path_order[first]->path; > + rep_id = path_order[first]->rep_id; > best = first; > > /* (1) For each path, pick the "roundest" representation and put it in > ]]] > I committed a fix in r1572512. It removes the unnecessary write to REP_ID during (1) and makes sure that (PATH, REP_ID) get initialized to the first *relevant* entry at the being of (2). > Just to mention, even if this change looks good, I won't be able to commit > it, > because I am still waiting for my Apache account to be created. The same > is > true for those patches from > http://svn.haxx.se/dev/archive-2014-02/0280.shtml > and http://svn.haxx.se/dev/archive-2014-02/0279.shtml > Hopefully, this process won't take too long. > No worries. I'm currently on a working holiday myself enjoying the African sun. -- Stefan^2.