Signed-off-by: Nguyễn Thái Ngọc Duy <[email protected]>
---
Should be up to date with Nico's latest implementation and also cover
additions to the format that everybody seems to agree on:
- new types for canonical trees and commits
- sha-1 table covering missing objects in thin packs
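
A minimal sketch of how a reader might interpret the tree entry stream
and the committer timestamp described in the patch below. It assumes the
INT() values have already been decoded, since the variable length
encoding itself is not defined in this text. It also assumes the
committer difference keeps its magnitude in the bits above the sign LSB.
The function names are hypothetical and do not come from any existing
implementation.

```python
def parse_tree_ops(values):
    """Turn already-decoded INT() values into tree operations.

    Returns a list of tuples:
      ("entry", path_index, sha1_index)
      ("copy", entry_start, entry_count, base_sha1_index)
    """
    it = iter(values)
    ops = []
    last_base = None  # base tree named by the most recent copy instruction
    for first in it:
        second = next(it, None)
        if second is None:
            raise ValueError("truncated tree entry")
        if first & 1 == 0:
            # LSB of the first number clear: plain tree entry.
            ops.append(("entry", first >> 1, second))
            continue
        # LSB of the first number set: copy entries from another tree.
        entry_start, entry_count = first >> 1, second >> 1
        if second & 1:
            # LSB of the second number set: a base tree reference follows.
            last_base = next(it, None)
            if last_base is None:
                raise ValueError("missing base tree reference")
        elif last_base is None:
            raise ValueError("copy instruction without a previous base tree")
        ops.append(("copy", entry_start, entry_count, last_base))
    return ops


def encode_committer_delta(author_ts, committer_ts):
    """Assumed layout: (|difference| << 1) | 1 if the difference is negative."""
    diff = committer_ts - author_ts
    return (abs(diff) << 1) | (1 if diff < 0 else 0)


def decode_committer_ts(author_ts, encoded):
    """Inverse of encode_committer_delta under the same assumption."""
    magnitude = encoded >> 1
    return author_ts - magnitude if encoded & 1 else author_ts + magnitude
```

For example, the values 6, 42, 5, 7, 7, 1, 10 would read as one plain
entry (path 3, SHA-1 index 42), a copy of 3 entries starting at entry 2
of base tree 7, and a copy of 5 entries starting at entry 0 that reuses
base tree 7.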
Documentation/technical/pack-format.txt | 133 +++++++++++++++++++++++++++++++-
1 file changed, 132 insertions(+), 1 deletion(-)
diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt
index 8e5bf60..c5327ff 100644
--- a/Documentation/technical/pack-format.txt
+++ b/Documentation/technical/pack-format.txt
@@ -1,7 +1,7 @@
Git pack format
===============
-== pack-*.pack files have the following format:
+== pack-*.pack files version 2 and 3 have the following format:
- A header appears at the beginning and consists of the following:
@@ -36,6 +36,132 @@ Git pack format
- The trailer records 20-byte SHA-1 checksum of all of the above.
+== pack-*.pack files version 4 have the following format:
+
+ - A header appears at the beginning and consists of the following:
+
+ 4-byte signature:
+ The signature is: {'P', 'A', 'C', 'K'}
+
+ 4-byte version number (network byte order): must be 4
+
+ 4-byte number of objects contained in the pack (network byte order)
+
+ - A series of tables, described separately.
+
+ - The tables are followed by a number of object entries, each of
+ which looks like this:
+
+ (undeltified representation)
+ n-byte type and length (4-bit type, (n-1)*7+4-bit length)
+ data
+
+ (deltified representation)
+ n-byte type and length (4-bit type, (n-1)*7+4-bit length)
+ base object name in SHA-1 reference encoding
+ compressed delta data
+
+ "type" is used to determine object type. Commit has type 1, tree
+ 2, blob 3, tag 4, ref-delta 7, canonical-commit 9 (commit type
+ with bit 3 set), canonical-tree 10 (tree type with bit 3 set).
+ Compared to v2, ofs-delta type is not used, and canonical-commit
+ and canonical-tree are new types.
+
+ In undeltified format, blobs and tags are compressed. Trees are
+ not compressed at all. Some headers in commits are stored
+ uncompressed, the rest is compressed. Tree and commit
+ representations are described in detail separately.
+
+ Blobs and tags are deltified and compressed the same way as in
+ v3. Commits are not deltified. Trees are deltified using
+ undeltified representation.
+
+ Trees and commits in canonical types are in the same format as
+ v2: in canonical format and deflated. They can be used for
+ completing thin packs or preserving somewhat ill-formatted
+ objects.
+
+ - The trailer records 20-byte SHA-1 checksum of all of the above.
+
+=== Pack v4 tables
+
+ - A table of sorted SHA-1 object names for all objects contained in
+ the on-disk pack.
+
+ Thin packs are used for transferring on the wire and may omit base
+ objects, expecting the receiver to add them before writing to
+ disk. The SHA-1 table in thin packs must include the omitted objects
+ as well.
+
+ This table can be referred to using "SHA-1 reference encoding": the
+ index, in variable length encoding, to this table.
+
+ - Ident table: the uncompressed length in variable length encoding,
+ followed by zlib-compressed dictionary. Each entry consists of
+ two prefix bytes storing timezone followed by a NUL-terminated
+ string.
+
+ Entries should be sorted by frequency so that the most frequent
+ entry has the smallest index, and thus the most efficient variable
+ length encoding.
+
+ The table can be referred to using "ident reference encoding": the
+ index number, in variable length encoding, to this table.
+
+ - Tree path table: the same format as the ident table. Each entry
+ consists of two prefix bytes storing tree entry mode, then a
+ NUL-terminated path name. Same sort order recommendation applies.
+
+=== Commit representation
+
+ - n-byte type and length (4-bit type, (n-1)*7+4-bit length)
+
+ - Tree SHA-1 in SHA-1 reference encoding
+
+ - Parent count in variable length encoding
+
+ - Parent SHA-1s in SHA-1 reference encoding
+
+ - Author reference in ident reference encoding
+
+ - Author timestamp in variable length encoding
+
+ - Committer reference in ident reference encoding
+
+ - Committer timestamp, encoded as a difference against author
+ timestamp with the LSB used to indicate negative difference.
+
+ - Compressed data of remaining header and the body
+
+=== Tree representation
+
+ - n-byte type and length (4-bit type, (n-1)*7+4-bit length)
+
+ - Number of tree entries in variable length encoding
+
+ - A number of entries, each in one of the following forms:
+
+ - INT(path_index << 1) INT(sha1_index)
+
+ - INT((entry_start << 1) | 1) INT(entry_count << 1)
+
+ - INT((entry_start << 1) | 1) INT((entry_count << 1) | 1) INT(base_sha1_index)
+
+ INT() denotes a number in variable length encoding. path_index is
+ the index to the tree path table. sha1_index is the index to the
+ SHA-1 table. entry_start is the first tree entry to copy
+ from. entry_count is the number of tree entries to
+ copy. base_sha1_index is the index to SHA-1 table of the base tree
+ to copy from.
+
+ The LSB of the first number indicates whether it's a plain tree
+ entry (LSB not set), or an instruction to copy tree entries from
+ another tree (LSB set).
+
+ For copying from another tree, if the LSB of the second number is
+ set, it will be followed by a base tree SHA-1. If it's not set,
+ the last base tree will be used.
+
== Original (version 1) pack-*.idx files have the following format:
- The header consists of 256 4-byte network byte order
@@ -160,3 +286,8 @@ Pack file entry: <+
corresponding packfile.
20-byte SHA-1-checksum of all of the above.
+
+== Version 3 pack-*.idx files support only *.pack files version 4. The
+ format is the same as version 2 except that the table of sorted
+ 20-byte SHA-1 object names is missing in the .idx files. The same
+ table exists in .pack files and will be used instead.
--
1.8.2.83.gc99314b