On Sun, Dec 16, 2018 at 04:14:46PM -0800, Jonathan Nieder wrote:
> Hi,
>
> Farhan Khan wrote:
> >> Farhan Khan wrote:
>
> >>> I am having trouble figuring out the boundary between two objects in
> >>> the pack file.
> [...]
> > I think the issue is, the compressed object has a fixed
> > size and git inflates it, then moves on to the next object. I am
> > trying to figure out where it identifies the size of the object.
>
> Do you mean the compressed size or uncompressed size?
>
> It sounds to me like pack-format.txt needs to do a better job of
> distinguishing the two.
How about something like this?
I wrote this mostly from memory (and a very quick look at
index-pack), but I don't think we have ever stored compressed
sizes. The "length" field, even in the loose object format, is
always the uncompressed size.
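
To illustrate how a reader can find the boundary between two entries
when only the uncompressed size is stored, here is a minimal Python
sketch. It is not taken from index-pack. The function names
(parse_entry_header, read_undeltified_entry, encode_entry) are made up
for this example, and it only handles non-delta entries. OFS_DELTA and
REF_DELTA entries carry extra bytes before the zlib stream that are not
handled here, and neither is the 12-byte pack header. The idea is to
inflate until zlib reports end of stream; whatever zlib did not consume
belongs to the next entry.

```python
import zlib


def parse_entry_header(buf, pos):
    """Decode the type and uncompressed size of a pack entry.

    First byte: MSB is a continuation flag, next 3 bits are the type,
    low 4 bits are the lowest size bits. Each following byte adds
    7 more size bits, least significant group first.
    """
    byte = buf[pos]
    pos += 1
    obj_type = (byte >> 4) & 0x07
    size = byte & 0x0F
    shift = 4
    while byte & 0x80:
        byte = buf[pos]
        pos += 1
        size |= (byte & 0x7F) << shift
        shift += 7
    return obj_type, size, pos


def read_undeltified_entry(buf, pos):
    """Return (type, data, offset of the next entry) for a non-delta entry."""
    obj_type, size, pos = parse_entry_header(buf, pos)
    inflater = zlib.decompressobj()
    data = inflater.decompress(buf[pos:])
    # The stream must end exactly when `size` bytes have been inflated.
    if not inflater.eof or len(data) != size:
        raise ValueError("corrupt pack entry")
    # The compressed length is never stored; it is whatever zlib consumed.
    consumed = len(buf) - pos - len(inflater.unused_data)
    return obj_type, data, pos + consumed


def encode_entry(obj_type, payload):
    """Hypothetical helper, for the demonstration only: build one entry."""
    size = len(payload)
    current = (obj_type << 4) | (size & 0x0F)
    size >>= 4
    header = bytearray()
    while size:
        header.append(current | 0x80)
        current = size & 0x7F
        size >>= 7
    header.append(current)
    return bytes(header) + zlib.compress(payload)


if __name__ == "__main__":
    first = encode_entry(3, b"a" * 100)   # type 3 is OBJ_BLOB
    second = encode_entry(3, b"hello")
    buf = first + second
    _, data, next_pos = read_undeltified_entry(buf, 0)
    print(len(data), next_pos == len(first))
```

Passing the whole remainder of the buffer to zlib is wasteful for a real
pack; a streaming reader would feed it in chunks. It keeps the sketch
short, though.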
-- 8< --
diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt
index cab5bdd2ff..4fd49f61d6 100644
--- a/Documentation/technical/pack-format.txt
+++ b/Documentation/technical/pack-format.txt
@@ -31,6 +31,11 @@ Git pack format
is an OBJ_OFS_DELTA object
compressed delta data
+ Note: The length (in bytes) is that of the uncompressed object or
+ of its deltified representation. We are supposed to reach the end
+ of the zlib stream once we have inflated the given length;
+ otherwise the pack file is corrupted.
+
Observation: length of each object is encoded in a variable
length format and is not constrained to 32-bit or anything.
@@ -199,7 +204,8 @@ Pack file entry: <+
is the size before compression).
If it is REF_DELTA, then
20-byte base object name SHA-1 (the size above is the
- size of the delta data that follows).
+ size of the delta data that follows, before
+ compression).
delta data, deflated.
If it is OFS_DELTA, then
n-byte offset (see below) interpreted as a negative
-- 8< --