>> Now, in order to distinguish different clusters, parts (1) and (2) remember >> (like, "lock on") the (PATH, REP_ID) for the first noderev from the current >> cluster. Iff the PATH changes (this means we have just entered the next >> cluster), we update the stored (PATH, REP_ID) values. We need to remember >> the REP_ID in order to reconstruct those delta chains in (2), because they >> start from this REP_ID. >> >> As far as I can tell, the problem lies in the way we handle the first path. >> For any non-empty path set, the first path always marks the beginning of the >> corresponding cluster. However, we only remember the PATH part of the (PATH, >> REP_ID) pair for the first path and omit the REP_ID. In case when the path >> set consists of *a single cluster* (i.e. all path_order_t objects share the >> same PATH), phase (2) will end up in an attempt to reconstruct the delta >> chain starting from the uninitialized REP_ID. There are scenarios that lead >> to this behavior in fs-fs-pack-test (see create_packed_filesystem() with >> multiple /iota content tweaks falling into the same cluster) and this can >> easily happen with real-world repositories. > > Well, chances are that PATH mismatches for the first > non-NULL iteration in (2), which gets REP_ID set. > However, if there is only one cluster with more than > a single change in the pack file, the path PATH does > match and REP_ID will have some random value.
Aha, for some reason I assumed that you need to carry the REP_ID from (1) to (2) in order to be able to reconstruct the delta chain for this single cluster (i.e. single path) scenario. That's clearly incorrect and now I understand it. > I committed a fix in r1572512. It removes the unnecessary > write to REP_ID during (1) and makes sure that (PATH, REP_ID) > get initialized to the first *relevant* entry at the being of (2). Thank you for fixing this. Regards, Evgeny Kotkov