On 9/26/2017 6:20 PM, Jonathan Tan wrote:
> On Fri, 22 Sep 2017 20:26:21 +0000
> Jeff Hostetler <g...@jeffhostetler.com> wrote:
>
>> From: Jeff Hostetler <jeffh...@microsoft.com>
>>
>> Create subclass of oidset where each entry has a
>> field to store the length of the object's content
>> and an optional pathname.
>>
>> This will be used in a future commit to build a
>> manifest of omitted objects in a partial/narrow
>> clone/fetch.
>
> As Brandon mentioned, I think "oidmap" should be the new data structure
> of choice (with "oidset" modified to use it).

I'll take a look at that. I'm not exactly happy with
my oidset2, but it works and it minimized touching other
things.  But yes, it may clear up a few things.
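To make the oidmap suggestion concrete, here is a minimal, self-contained sketch of that layering: a map keyed by object ID whose base entry callers embed as the first member of their own struct, with a plain set as the payload-free case and an oidset2-style entry as the richer one. All names (toy_oidmap, toy_oidset, toy_omitted_entry) are hypothetical; this is not Git's hashmap or oidmap API, and the fixed-size chained table is only the simplest thing that works.

```c
#include <stdlib.h>
#include <string.h>

#define TOY_RAWSZ 20    /* SHA-1 object ID length in bytes */
#define TOY_NBUCKETS 64 /* fixed bucket count; a real map would grow */

/* Hypothetical base entry: callers embed it as the FIRST member. */
struct toy_oidmap_entry {
	struct toy_oidmap_entry *next; /* bucket chain */
	unsigned char oid[TOY_RAWSZ];
};

struct toy_oidmap {
	struct toy_oidmap_entry *buckets[TOY_NBUCKETS];
};

static unsigned toy_bucket(const unsigned char *oid)
{
	/* Object IDs are already hash output, so one byte spreads well. */
	return oid[0] % TOY_NBUCKETS;
}

/* Returns the stored entry for oid (usable as the caller's type), or NULL. */
static void *toy_oidmap_get(const struct toy_oidmap *map,
			    const unsigned char *oid)
{
	struct toy_oidmap_entry *e;

	for (e = map->buckets[toy_bucket(oid)]; e; e = e->next)
		if (!memcmp(e->oid, oid, TOY_RAWSZ))
			return e;
	return NULL;
}

/*
 * Links in `entry` (its oid must already be filled in) unless an entry
 * with the same oid exists; returns whichever entry is now in the map.
 */
static void *toy_oidmap_put(struct toy_oidmap *map, void *entry)
{
	struct toy_oidmap_entry *e = entry;
	struct toy_oidmap_entry *old = toy_oidmap_get(map, e->oid);
	unsigned b;

	if (old)
		return old;
	b = toy_bucket(e->oid);
	e->next = map->buckets[b];
	map->buckets[b] = e;
	return e;
}

/* A plain set is just a map whose entries carry no payload. */
struct toy_oidset {
	struct toy_oidmap map;
};

/* Returns 1 if oid was already present, 0 if it was newly added. */
static int toy_oidset_insert(struct toy_oidset *set, const unsigned char *oid)
{
	struct toy_oidmap_entry *e;

	if (toy_oidmap_get(&set->map, oid))
		return 1;
	e = calloc(1, sizeof(*e));
	if (!e)
		abort();
	memcpy(e->oid, oid, TOY_RAWSZ);
	toy_oidmap_put(&set->map, e);
	return 0;
}

/* Frees every entry the set allocated. */
static void toy_oidset_clear(struct toy_oidset *set)
{
	int i;

	for (i = 0; i < TOY_NBUCKETS; i++) {
		struct toy_oidmap_entry *e = set->map.buckets[i];

		while (e) {
			struct toy_oidmap_entry *next = e->next;
			free(e);
			e = next;
		}
		set->map.buckets[i] = NULL;
	}
}

/* A richer entry, in the spirit of oidset2_entry, embeds the base first. */
struct toy_omitted_entry {
	struct toy_oidmap_entry base; /* must be first */
	char *pathname;               /* optional; may be NULL */
};
```

The point of the layering is that only the richer entry type pays for its extra fields, while a plain set keeps entries no larger than the base entry.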


>> +struct oidset2_entry {
>> +       struct hashmap_entry hash;
>> +       struct object_id oid;
>> +
>> +       enum object_type type;
>> +       int64_t object_length;  /* This is SIGNED. Use -1 when unknown. */
>> +       char *pathname;
>> +};
>
> object_length is defined to be "unsigned long" in Git code, I think.
> When is object_length not known, and in those cases, would it be better
> to use a separate data structure to store what we need?

Yeah, I struggled with that one. Git currently stores object sizes as
"unsigned long" throughout the code, which is only 32 bits on some
platforms (notably Windows). I assume there will eventually be a round
of changes to support 64-bit values, so this anticipates that.

I could change it to use a separate "unknown" flag rather than assuming -1,
but in an earlier draft I was printing -1 in the rev-list output. I can
change this.
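A minimal sketch of the flag-based alternative, assuming the length is kept as an unsigned 64-bit value (in line with the anticipated 64-bit change) next to an explicit "known" bit instead of a -1 sentinel. The struct and function names are hypothetical, and uint64_t is an assumption here, not what Git uses for object sizes today.

```c
#include <stdint.h>

/* Hypothetical replacement for "int64_t object_length" with -1 as a sentinel. */
struct omitted_size {
	uint64_t length;           /* meaningful only when length_known is set */
	unsigned length_known : 1;
};

static void omitted_size_set(struct omitted_size *s, uint64_t len)
{
	s->length = len;
	s->length_known = 1;
}

static void omitted_size_unset(struct omitted_size *s)
{
	s->length = 0;
	s->length_known = 0;
}

/* Returns 1 and stores the length in *out if known; returns 0 otherwise. */
static int omitted_size_get(const struct omitted_size *s, uint64_t *out)
{
	if (!s->length_known)
		return 0;
	*out = s->length;
	return 1;
}
```

With the flag, the full unsigned range stays available and a caller cannot accidentally print or add -1 as if it were a size; the cost is one extra bit per entry, which alignment typically pads out to a full word.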

WRT a separate structure: the set I create will contain entries for items
whose size we may or may not know, depending on the context. When building
a list of already-missing blobs (with --filter-print-missing), we never
know the size. But when building a list of to-be-omitted blobs (from the
current set of filter options), we may or may not know it. I'm not sure
we need two _entry definitions right now.
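To illustrate the single-entry-type argument, here is a minimal sketch in which one hypothetical entry struct serves both lists and the caller states whether the size is known: already-missing blobs (the --filter-print-missing case) are always added without a size, while to-be-omitted blobs carry one when the filter happened to learn it. A flat array with linear lookup stands in for the real hash-based set, and all names are illustrative assumptions, not code from the patch.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define OMIT_RAWSZ 20 /* SHA-1 object ID length in bytes */

/* Hypothetical single entry type shared by both lists. */
struct omit_entry {
	unsigned char oid[OMIT_RAWSZ];
	uint64_t size;  /* meaningful only when size_known is nonzero */
	int size_known;
	char *pathname; /* optional; NULL when not recorded */
};

/* A flat array stands in for the hash-based set in this sketch. */
struct omit_list {
	struct omit_entry *items;
	size_t nr, alloc;
};

static struct omit_entry *omit_list_find(struct omit_list *list,
					 const unsigned char *oid)
{
	size_t i;

	for (i = 0; i < list->nr; i++)
		if (!memcmp(list->items[i].oid, oid, OMIT_RAWSZ))
			return &list->items[i];
	return NULL;
}

/*
 * Adds oid unless already present (an existing entry is returned as-is).
 * Pass size_known = 0 when the size is unavailable, as for blobs that
 * are already missing locally. Returns NULL on allocation failure.
 */
static struct omit_entry *omit_list_add(struct omit_list *list,
					const unsigned char *oid,
					int size_known, uint64_t size,
					const char *pathname)
{
	struct omit_entry *e = omit_list_find(list, oid);

	if (e)
		return e;
	if (list->nr == list->alloc) {
		size_t alloc = list->alloc ? 2 * list->alloc : 16;
		struct omit_entry *items =
			realloc(list->items, alloc * sizeof(*items));

		if (!items)
			return NULL;
		list->items = items;
		list->alloc = alloc;
	}
	e = &list->items[list->nr];
	memcpy(e->oid, oid, OMIT_RAWSZ);
	e->size_known = size_known;
	e->size = size_known ? size : 0;
	e->pathname = NULL;
	if (pathname) {
		size_t n = strlen(pathname) + 1;

		e->pathname = malloc(n);
		if (!e->pathname)
			return NULL;
		memcpy(e->pathname, pathname, n);
	}
	list->nr++; /* count the entry only once it is fully initialized */
	return e;
}

static void omit_list_clear(struct omit_list *list)
{
	size_t i;

	for (i = 0; i < list->nr; i++)
		free(list->items[i].pathname);
	free(list->items);
	list->items = NULL;
	list->nr = list->alloc = 0;
}
```

In this sketch a blob found missing would go in via omit_list_add(&missing, oid, 0, 0, NULL), and a blob skipped by a filter via omit_list_add(&omitted, oid, 1, size, path); both lists share one entry layout.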
