When we are building the pack index, we can notice that
there are duplicate objects, pick one "winner" instance, and
mention the object only once in the index (mapped to the
winner's offset).
This has the effect that the duplicate packfile entries are
never found by lookup. The data still exists in the
packfile, though, and can be used as a delta base if delta
base offsets are used. If delta refs are used, then it is
possible that some deltas may be broken.
Unfortunately, this scheme does not quite work, as the pack
reader checks that the index and packfile claim the same
number of objects, and will refuse to open such a packfile.
We have a few options:
1. Loosen the check in the reader. This drops a
potentially useful sanity check on the data, and it
won't work for any other implementations (including
older versions of git).
2. Loosen the check only if a special flag is found in the
index indicating that we removed duplicates. This means
that we only lose the safety check in the rare case
that duplicates are found. But there is actually no
place in the index to put such a flag, and in addition
no other implementation would understand our flag.
3. Instead of reducing the numnber of objects in the
index, include "dummy" entries using the null sha1 or a
similar sentinel value (and pointing to some invalid
offset). This should work, but is awfully hacky (and
will probably create havoc as we will now claim to have
the null sha1, but will throw errors if you actually
look at it).
Signed-off-by: Jeff King <[email protected]>
---
I think this line of "fixing" should probably be scrapped. Truly fixing
it and covering the REF_DELTA case would involve magic in the actual
object lookups (we would have to detect cycles and find the "other"
object that is the real base). And it's probably just not worth the
effort.
builtin/index-pack.c | 2 ++
pack-write.c | 30 ++++++++++++++++++++++++++++++
pack.h | 3 ++-
t/t5308-pack-detect-duplicates.sh | 8 ++++++++
4 files changed, 42 insertions(+), 1 deletion(-)
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 83fd3bb..1dadd56 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1375,6 +1375,8 @@ static int git_index_pack_config(const char *k, const
char *v, void *cb)
opts->duplicates = WRITE_IDX_DUPLICATES_IGNORE;
else if (boolval == 0 || !strcmp(v, "reject"))
opts->duplicates = WRITE_IDX_DUPLICATES_REJECT;
+ else if (!strcmp(v, "skip"))
+ opts->duplicates = WRITE_IDX_DUPLICATES_SKIP;
else
die("unknown value for pack.indexduplicates: %s", v);
return 0;
diff --git a/pack-write.c b/pack-write.c
index 542b081..657da2a 100644
--- a/pack-write.c
+++ b/pack-write.c
@@ -50,6 +50,27 @@ static void *find_duplicate(void *vbase, size_t n, size_t
size,
return NULL;
}
+static size_t remove_duplicates(void *base, size_t n, size_t size,
+ int (*cmp)(const void *, const void *))
+{
+ unsigned char *from = base, *to = base;
+
+ from += size;
+ to += size;
+ n--;
+
+ while (n > 0) {
+ if (cmp(from, from - size)) {
+ if (to != from)
+ memcpy(to, from, size);
+ to += size;
+ }
+ from += size;
+ n--;
+ }
+ return (to - (unsigned char *)base) / size;
+}
+
/*
* On entry *sha1 contains the pack content SHA1 hash, on exit it is
* the SHA1 hash of sorted object names. The objects array passed in
@@ -89,6 +110,15 @@ const char *write_idx_file(const char *index_name, struct
pack_idx_entry **objec
if (dup)
die("pack has duplicate entries for %s",
sha1_to_hex((*dup)->sha1));
+ } else if (opts->duplicates == WRITE_IDX_DUPLICATES_SKIP) {
+ int nr_unique = remove_duplicates(sorted_by_sha,
+ nr_objects,
+ sizeof(*sorted_by_sha),
+ sha1_compare);
+ if (nr_unique != nr_objects) {
+ nr_objects = nr_unique;
+ last = sorted_by_sha + nr_objects;
+ }
}
if (opts->flags & WRITE_IDX_VERIFY) {
diff --git a/pack.h b/pack.h
index cd4b536..3017ea4 100644
--- a/pack.h
+++ b/pack.h
@@ -46,7 +46,8 @@ struct pack_idx_option {
enum {
WRITE_IDX_DUPLICATES_IGNORE = 0,
- WRITE_IDX_DUPLICATES_REJECT
+ WRITE_IDX_DUPLICATES_REJECT,
+ WRITE_IDX_DUPLICATES_SKIP
} duplicates;
/*
diff --git a/t/t5308-pack-detect-duplicates.sh
b/t/t5308-pack-detect-duplicates.sh
index 0f2d928..963d7b9 100755
--- a/t/t5308-pack-detect-duplicates.sh
+++ b/t/t5308-pack-detect-duplicates.sh
@@ -102,4 +102,12 @@ test_expect_success 'index-pack can reject packs with
duplicates' '
test_expect_code 1 git cat-file -e $LO_SHA1
'
+test_expect_success 'index-pack can fix packs with duplicates' '
+ clear_packs &&
+ create_pack 2 >dups.pack &&
+ git -c pack.indexDuplicates=skip index-pack --stdin <dups.pack &&
+ git cat-file --batch-check <input >actual &&
+ test_cmp expect actual
+'
+
test_done
--
1.8.4.rc2.28.g6bb5f3f
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html