On 10/02/2019 19:05, Ramsay Jones wrote:
>
>
> On 10/02/2019 16:02, Florian Steenbuck wrote:
>> Hello to all,
>>
>> I try to understand the git protocol only on the server site. So I
>> start without reading any docs and which turns to be fine until I got
>> to the PACK format (pretty early failure I know).
>>
>> I have read this documentation:
>> https://raw.githubusercontent.com/git/git/c4df23f7927d8d00e666a3c8d1b3375f1dc8a3c1/Documentation/technical/pack-format.txt
>>
>> But their are some confusion about this text.
>>
>> The basic header is no problem, but somehow I got stuck while try to
>> read the length and type of the objects, which are ints that can be
>> resolved with 3-bits and 4-bits. The question is where and how ?
>>
>
> Hmm, the 'type and length' encoding could be described more clearly!
> Hopefully, just on this issue, the following could help:
>
> In my git.git repo, which is fully packed, I have a single pack file, with
>
> $ git count-objects -v
> count: 0
> size: 0
> in-pack: 270277
> packs: 1
> size-pack: 101929
> prune-packable: 0
> garbage: 0
> size-garbage: 0
> $
>
> ... 270277 objects in it. The beginning of the file looks like:
>
> $ xxd .git/objects/pack/pack-d554e6d8335601c2525b40487faf36493094ab50.pack
> | head
> 00000000: 5041 434b 0000 0002 0004 1fc5 9d13 789c PACK..........x.
> 00000010: 9d8f cd6a c330 1084 ef7a 8a3d 171a b4ab ...j.0...z.=....
> 00000020: 9525 8750 0abd 945c f304 ab95 5cfb 602b .%.P...\....\.`+
> 00000030: b84a 7fde 3e2a 943e 406f c3f0 cd30 d3f6 .J..>*.>@o...0..
> 00000040: 5260 741a 5025 92e2 1458 917c c294 a3c3 R`t.P%...X.|....
> 00000050: 4803 e521 395f c2d8 4d73 95bd 6c0d 82f5 H..!9_..Ms..l...
> 00000060: 6172 310f 0529 7a2f d6a7 40c5 d9a0 d185 ar1..)z/..@.....
> 00000070: 622d 8789 9cb8 3f1e 5132 6366 4de4 8531 b-....?.Q2cfM..1
> 00000080: 114a 70ec 9447 2f5a 526f e29c 3847 23b7 .Jp..G/ZRo..8G#.
> 00000090: 36d7 1dce b76d a9f0 02af b2ca 56e1 f4b6 6....m......V...
> $
>
> You can see the header, which consists of 3 32-bit values, where the
> packfile signature is the '5041 434b', then the version number which
> is '0000 0002', followed by the number of objects '0004 1fc5' which
> is 270277. Next comes the first 'object entry', which starts '9d13'.
>
> Now, the 'n-byte type and length' is a variable length encoding of
> the object type and length. The number of bytes used to encode this
> data is content dependant. If the top bit of a byte is set, then we
> need to process the next byte, otherwise we are done. So, looking
> at the first 'object entry' byte (at offset 12) '9d', we take the
> top nibble, remove the top bit, and shift right 4 bits to get the
> object type. ie. (0x9d >> 4) & 7 which gives an object type of 1
> (which is a commit object). The lower nibble of the first byte
> contains the first (or only) 4 bits of the size, here (0x9d & 15)
> which is 0xd. Given that the top bit of this byte is set, we now
> process the next byte. After the first byte, each byte contains 7
> bits of the size field which is combined with the value from the
> previous byte by shifting and adding (first by 4 bits, then 11, 18,
> 25 etc.). So, in this case we have (0x13 << 4) + 0xd = 317.
Sorry, to be clear, I should have said, "mask off the top bit,
shift and add", so:
((0x13 & 0x7f) << 4) + 0xd = 317
ATB,
Ramsay Jones
>
> The compressed data follows, '789c' ...
>
> We can use git-verify-pack to confirm the details here:
>
> $ git verify-pack -v
> .git/objects/pack/pack-d554e6d8335601c2525b40487faf36493094ab50.idx | head -n
> 1
> 878e2cd30e1656909c5073043d32fe9d02204daa commit 317 216 12
> $
>
> So the object 878e2cd30e, at offset 12 in the file, is a commit object
> with size 317 (which has an in-pack size of 216).
>
> Hope this helps.
>
> ATB,
> Ramsay Jones
>
>