Derrick Stolee <[email protected]> writes:
> Documentation/technical/commit-graph-format.txt | 90
> +++++++++++++++++++++++++
> 1 file changed, 90 insertions(+)
> create mode 100644 Documentation/technical/commit-graph-format.txt
Hopefully just a few remaining nits. Overall I find this written
really clearly.
> +== graph-*.graph files have the following format:
> +
> +In order to allow extensions that add extra data to the graph, we organize
> +the body into "chunks" and provide a binary lookup table at the beginning
> +of the body. The header includes certain values, such as number of chunks,
> +hash lengths and types.
We no longer have lengths stored.
> + ...
> + The remaining data in the body is described one chunk at a time, and
> + these chunks may be given in any order. Chunks are required unless
> + otherwise specified.
It is good that this explicitly says chunks can come in any order,
and which ones are required. It should also say which chunk can
appear multiple times. I think all four chunk types we currently
define can have at most one instance in a file.
> +CHUNK DATA:
> +
> + OID Fanout (ID: {'O', 'I', 'D', 'F'}) (256 * 4 bytes)
> + The ith entry, F[i], stores the number of OIDs with first
> + byte at most i. Thus F[255] stores the total
> + number of commits (N).
> +
> + OID Lookup (ID: {'O', 'I', 'D', 'L'}) (N * H bytes)
> + The OIDs for all commits in the graph, sorted in ascending order.
Somewhere in this document, we probably would want to say that this
format allows at most (1<<31)-1 commits recorded in the file (as
CGET and EDGE uses 31-bit uint to index into this table, using MSB
for other purposes, and the all-1-bit pattern is also special), and
when we refer to "int-ids" of a commit, it is this 31-bit number
that is an index into this table.
> + Commit Data (ID: {'C', 'G', 'E', 'T' }) (N * (H + 16) bytes)
> + * The first H bytes are for the OID of the root tree.
> + * The next 8 bytes are for the int-ids of the first two parents
> + of the ith commit. Stores value 0xffffffff if no parent in that
> + position. If there are more than two parents, the second value
> + has its most-significant bit on and the other bits store an array
> + position into the Large Edge List chunk.
> + * The next 8 bytes store the generation number of the commit and
> + the commit time in seconds since EPOCH. The generation number
> + uses the higher 30 bits of the first 4 bytes, while the commit
> + time uses the 32 bits of the second 4 bytes, along with the lowest
> + 2 bits of the lowest byte, storing the 33rd and 34th bit of the
> + commit time.
> +
> + Large Edge List (ID: {'E', 'D', 'G', 'E'}) [Optional]
> + This list of 4-byte values store the second through nth parents for
> + all octopus merges. The second parent value in the commit data stores
> + an array position within this list along with the most-significant bit
> + on. Starting at that array position, iterate through this list of
> int-ids
> + for the parents until reaching a value with the most-significant bit
> on.
> + The other bits correspond to the int-id of the last parent.
> +
> +TRAILER:
> +
> + H-byte HASH-checksum of all of the above.
> +