On Tue, 1 May 2018 14:33:51 -0700
Stefan Beller <[email protected]> wrote:
> Git's object access code can be thought of as containing two layers:
> the raw object store provides access to raw object content, while the
> higher level obj_hash code parses raw objects and keeps track of
> parenthood and other object relationships using 'struct object'.
> Keeping these layers separate should make it easier to find relevant
> functions and to change the implementation of one without having to
> touch the other.
I only understood this after reading the code below. I think this
paragraph can be removed; I don't see its relevance - of course we need
to store metadata about how to load objects somewhere, and caching
objects we have already loaded is a good idea: and the metadata and
cache are two separate things before and after this patch anyway.
> Add an object_parser field to 'struct repository' to prepare obj_hash
> to be handled per repository. Callers still only use the_repository
> for now --- later patches will adapt them to handle arbitrary
> repositories.
I think this is better reworded as:
Convert the existing global cache for parsed objects (obj_hash) into
repository-specific parsed object caches. Existing code that uses
obj_hash are modified to use the parsed object cache of
the_repository; future patches will use the parsed object caches of
other repositories.
> +struct object_parser *object_parser_new(void)
> +{
> + struct object_parser *o = xmalloc(sizeof(*o));
> + memset(o, 0, sizeof(*o));
> + return o;
> +}
I'm not sure that I like this style of code (I prefer the strbuf style
with _INIT and release(), which I think is more flexible) but I don't
feel too strongly about it.
> +struct object_parser {
> + struct object **obj_hash;
> + int nr_objs, obj_hash_size;
> +};
object_parser is probably a bad name. I think Duy had some good
suggestions in [1].
[1]
https://public-inbox.org/git/CACsJy8CgX6BME=ehidbfmrzboydbnhe6ouxv4fzc-gw7rsi...@mail.gmail.com/
> /*
> - * Holds any information related to accessing the raw object content.
> + * Holds any information needed to retrieve the raw content
> + * of objects. The object_parser uses this to get object
> + * content which it then parses.
> */
> struct raw_object_store *objects;
I think the additional sentence ("The object_parser uses this...") is
unnecessary and confusing, especially if its identity is going to be one
of storage (like "parsed_objects" implies).
> + /*
> + * State for the object parser. This owns all parsed objects
> + * (struct object) so callers do not have to manage their
> + * lifetime.
> + */
> + struct object_parser *parsed_objects;
Even after all the commits in this patch set, this does not store any
state for parsing. Maybe just document as:
All objects in this repository that have been parsed. This structure
owns all objects it references, so users of "struct object *"
generally do not need to free them; instead, when a repository is no
longer used, call object_parser_clear() on this structure.
(And maybe say that the freeing method on struct repository will
automatically call object_parser_clear().)