Re: [PATCH 2/2] repack: repack promisor objects if -a or -A is set

Junio C Hamano Tue, 07 Aug 2018 13:58:09 -0700

Jonathan Tan <jonathanta...@google.com> writes:

> +static int write_oid(const struct object_id *oid, struct packed_git *pack,
> +                  uint32_t pos, void *data)
> +{
> +     int fd = *(int *)data;
> +
> +     xwrite(fd, oid_to_hex(oid), GIT_SHA1_HEXSZ);
> +     xwrite(fd, "\n", 1);
> +     return 0;
> +}
> +
> +static void repack_promisor_objects(const struct packed_objects_args *args,
> +                                 struct string_list *names)
> +{
> +     struct child_process cmd = CHILD_PROCESS_INIT;
> +     FILE *out;
> +     struct strbuf line = STRBUF_INIT;
> +
> +     prepare_packed_objects(&cmd, args);
> +     cmd.in = -1;
> +
> +     if (start_command(&cmd))
> +             die("Could not start pack-objects to repack promisor objects");
> +
> +     for_each_packed_object(write_oid, &cmd.in,
> +                            FOR_EACH_OBJECT_PROMISOR_ONLY);
> +     close(cmd.in);


for_each_object_in_pack() is a fine way to make sure that you list
everything in a pack, but I suspect it is a horrible way to feed a
list of objects to pack-objects, as it goes by the .idx order, i.e.
sorted by object name, which by definition enumerates objects in an
order that is random and useless for deltification.
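To make the ordering point concrete, here is a minimal sketch (not git's actual revindex code; `struct idx_entry`, `build_revindex()` and the sample values are hypothetical). Each entry pairs an object name with its offset in the pack. The .idx lists entries sorted by name, and a "reverse index" is just the permutation that sorts the same entries by pack offset.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for one .idx entry: object name plus pack offset. */
struct idx_entry {
	const char *hex;   /* object name; .idx entries are sorted by this */
	uint64_t offset;   /* where the object lives in the .pack */
};

/* qsort() has no context argument, so the comparator reads this. */
static const struct idx_entry *cmp_entries;

static int cmp_by_offset(const void *a, const void *b)
{
	uint64_t oa = cmp_entries[*(const uint32_t *)a].offset;
	uint64_t ob = cmp_entries[*(const uint32_t *)b].offset;

	return (oa > ob) - (oa < ob);
}

/*
 * Build a minimal reverse index: order[k] is the .idx position of the
 * object stored k-th in the pack. Returns NULL on allocation failure;
 * the caller frees the result.
 */
static uint32_t *build_revindex(const struct idx_entry *entries, uint32_t nr)
{
	uint32_t *order = malloc((nr ? nr : 1) * sizeof(*order));

	if (!order)
		return NULL;
	for (uint32_t i = 0; i < nr; i++)
		order[i] = i;
	cmp_entries = entries;
	qsort(order, nr, sizeof(*order), cmp_by_offset);
	return order;
}
```

Walking `entries` directly visits objects in hash order; walking `entries[order[k]]` for k = 0..nr-1 visits them in the order the earlier pack-objects run chose to write them. git keeps its own in-core revindex for packs, so this only illustrates the idea.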

Do we already have access to the in-core reverse index for the
pack at this point in the code?  If so, we can enumerate the objects
in the offset order without incurring a lot of cost (building the
in-core revindex is the more expensive part).  When writing a pack,
we try to make sure that related objects are written out close to
each other [*1*], so listing them in the offset order (with made-up
pathname information to _force_ objects that live close together
in the original pack to be grouped together by giving them
similar names) might give us better-than-horrible deltification.
I dunno.
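A sketch of the "made-up pathname" idea, assuming pack-objects treats whatever follows the object name on a stdin line as the path it hashes (the same `<oid> <path>` shape `git rev-list --objects` emits). `format_grouped_line()`, the `promisor-` prefix and `group_size` are hypothetical; `rank` is the object's position when the pack is walked in offset order through a reverse index.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format one line of pack-objects input as
 * "<object name> <made-up path>\n". Every run of group_size consecutive
 * objects (in pack offset order) shares one made-up path, so those
 * objects get the same name hash.
 * Returns what snprintf() returns (untruncated length, or < 0 on error).
 */
static int format_grouped_line(char *buf, size_t len, const char *hex,
			       uint32_t rank, uint32_t group_size)
{
	/* group_size == 0 would divide by zero; treat it as "no grouping". */
	uint32_t group = group_size ? rank / group_size : rank;

	return snprintf(buf, len, "%s promisor-%08" PRIu32 "\n", hex, group);
}
```

In repack_promisor_objects() such a line would replace the bare hex that write_oid() sends, with objects visited in offset order rather than .idx order. A sensible `group_size` is something that would have to be found by measurement.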

        Side note *1*: "related" has two axes, and one is relevant
        for better deltification, while the other is not useful here.
        The one I have in mind here is that we write sets of blobs
        that belong to the same "delta family" together.
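On the deltification axis, pack-objects brings delta candidates together partly by a hash of each object's path, which is why made-up pathnames could steer which objects get tried against each other. The function below is a sketch modeled on git's pack_name_hash() (check the git source for the authoritative version); `name_hash()` is a hypothetical name.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Sketch modeled on git's pack_name_hash(): fold in each non-whitespace
 * character, shifting the earlier ones right by two bits each time, so
 * the last characters of the name dominate the high bits. Identical
 * names always hash identically, and names that end alike sort close.
 */
static uint32_t name_hash(const char *name)
{
	uint32_t c, hash = 0;

	if (!name)
		return 0;
	while ((c = (unsigned char)*name++) != 0) {
		if (isspace((int)c))
			continue;
		hash = (hash >> 2) + (c << 24);
	}
	return hash;
}
```

Giving neighbouring objects the exact same made-up name guarantees them the same hash, which is the strongest nudge this mechanism can give.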

I do not think such a "make it a bit better than horrible" step is
necessary in an initial version, but it deserves an in-code NEEDSWORK
comment to remind us that we need to measure and experiment.

Thanks.
