Hi all,
Per prior discussions, here is the specification we are proposing for
version 4 of the AutoFDO GCOV profile format. This is a complete re-design
focused around being a backwards-compatible and extensible format which
can be partially read by the compiler.
=================
Table of Contents
1. Introduction
2. Motivation
3. Open Questions
4. Binary Format
4.1. File Header
4.2. Section Layout
4.3. String Table Section (GCOV_STRING_TABLE)
4.4. Summary Section (GCOV_SUMMARY_INFO)
4.5. File Names Section (GCOV_FILE_NAMES)
4.6. Symbol Names Section (GCOV_SYMBOL_NAMES)
4.7. Symbol Info Section (GCOV_SYMBOL_INFO)
4.8. Location Encoding
4.9. Compact Encoding Mode
5. Textual Format
5.1. Overview
5.2. Grammar
5.3. Example
6. Extensibility
6.1. Binary Format
6.2. Textual Format
6.3. Forward Compatibility
7. Experimental Results
1. Introduction
The existing GCOV AutoFDO profile formats (versions 2 and 3) use a
fixed, non-extensible tag-length sequential layout. Version 4 redesigns
the format to be a section-based one with the following goals:
- The ability to partially read the profile, i.e. unrequired sections are
simply skipped by a reader.
- Forward compatibility through typed location records with a
trailing size field for unknown types. This is useful (for example) for
future work regarding branch profiling for better if-conversion
heuristics.
- Reduced file size through trie-compressed string tables,
optional discriminators, zero-count elision, and variable-width
sample counts, as detailed in later sections.
- Optional compact encoding using Protocol Buffers-style varints
for further size reduction.
- A companion human-readable textual format for inspection.
Measured against a GCC bootstrap profile (58 MB in v3), the v4
binary format achieves -43% (33 MB) in normal mode and -72%
(16 MB) in compact mode. Against SPEC CPU 2017 profiles, it
also achieves -43% in normal mode and -72% in compact mode on average.
All multi-byte integers in the binary format are unsigned and
stored in big-endian byte order.
"Varint" refers to the variable-length integer encoding defined by Protocol
Buffers: each byte uses 7 data bits with the MSB as a continuation flag,
least significant group first
(see https://protobuf.dev/programming-guides/encoding/#varints).
2. Motivation
The primary issue with the existing GCOV format is the lack of extensibility
and backwards-compatibility. Adding features to the format necessitates
bumping the version number, which also requires updating all of the tooling
working with the format.
Secondly, the profile is required to be streamed in its entirety from disk
whenever it is read, which leads to duplicate work when multiple TUs are
compiled as the compiler has to read the entire profile per TU. Having the
ability to read the profile partially allows getting rid of most of this
duplicate work.
3. Open Questions
- Can the GCOV profile format be renamed to something better? The
current name is confusing and collides with the existing GCC PGO
format. Perhaps "AFDO" could work.
4. Binary Format
4.1. File Header
The file begins with a fixed header followed by a section table.
GCOV_MAGIC 4 bytes ASCII 'gcov' (0x67636F76)
GCOV_VERSION 4 bytes 0x00000004
GCOV_HEADER_BITMASK 1 byte
Bit 7 : compact flag
Bits 0-6 : reserved
GCOV_NUM_SECTIONS 7 bytes Number of entries in the
section table (excludes the
two fixed sections below)
Immediately following are offset/size pairs for two fixed
sections, then the section table:
GCOV_SUMMARY_OFFSET 8 bytes
GCOV_SUMMARY_SIZE 8 bytes
GCOV_FILE_NAMES_OFFSET 8 bytes
GCOV_FILE_NAMES_SIZE 8 bytes
GCOV_SECTION (repeated GCOV_NUM_SECTIONS times):
GCOV_SECTION_OFFSET 8 bytes
GCOV_SECTION_SIZE 8 bytes
All offsets are byte offsets from the start of the file.
The offsets for the summary and file names sections are provided
for fast access. There is no specified ordering of any of the
sections. The indexing is done based on the order they are placed
within the file, and is inclusive of the two sections mentioned
before.
4.2. Section Layout
Each section begins with a one-byte bitmask:
GCOV_SECTION_BITMASK 1 byte
Bit 7 : compact flag for this section
Bits 0-6 : section type
Defined section types:
0x01 GCOV_STRING_TABLE
0x02 GCOV_SUMMARY_INFO
0x03 GCOV_FILE_NAMES
0x04 GCOV_SYMBOL_NAMES
0x05 GCOV_SYMBOL_INFO
The section bitmask is followed immediately by the section data as
defined in the subsections below.
4.3. String Table Section (GCOV_STRING_TABLE)
Each translation unit has its own string table section, storing
its symbol name strings in a path-compressed trie. This exploits
the shared prefixes common in C++ mangled names (e.g.
"_ZNSt6vector...").
GCOV_NUM_STRINGS 4 bytes Total number of strings
Followed by a serialized trie starting at the root node:
GCOV_TRIE_NODE:
TRIE_NODE_BITMASK 1 byte
Bit 7: terminal flag (a complete string ends here)
Bits 0-6: number of children (0-127)
TRIE_TERMINAL_ID 4 bytes (present only if terminal)
String index for this terminal
TRIE_CHILD (repeated for each child):
TRIE_EDGE_LABEL_LENGTH 2 bytes
TRIE_EDGE_LABEL variable (TRIE_EDGE_LABEL_LENGTH
bytes)
GCOV_TRIE_NODE (recursively)
The trie is path-compressed: chains of nodes where each node has
exactly one child and is not a terminal are collapsed into a single
edge with a multi-character label. To reconstruct a string,
edge labels are concatenated from the root to the terminal node, like a
typical trie.
4.4. Summary Section (GCOV_SUMMARY_INFO)
The summary section contains aggregate statistics for the entire
profile.
SUMMARY_TOTAL_COUNT 8 bytes
SUMMARY_MAX_COUNT 8 bytes
SUMMARY_MAX_FN_COUNT 8 bytes
SUMMARY_NUM_COUNTS 8 bytes
SUMMARY_NUM_FUNCTIONS 8 bytes
SUMMARY_NUM_DETAILED_ENTRIES 8 bytes
Followed by detailed histogram entries:
SUMMARY_DETAILED_ENTRY (repeated SUMMARY_NUM_DETAILED_ENTRIES
times):
ENTRY_CUTOFF 4 bytes Cutoff value
ENTRY_MIN_COUNT 8 bytes Minimum count at this percentile
ENTRY_NUM_COUNTS 8 bytes Number of counters at or above this minimum
4.5. File Names Section (GCOV_FILE_NAMES)
There is exactly one file names section per profile. It maps source
files to their corresponding symbol names sections.
GCOV_NUM_FILE_NAMES 4 bytes
GCOV_FILE_NAME (repeated GCOV_NUM_FILE_NAMES times):
FILE_NAME_STRING_LENGTH 4 bytes
FILE_NAME_STRING_CONTENT variable (FILE_NAME_STRING_LENGTH
bytes, including '\0')
STRING_TABLE_SECTION_INDEX 4 bytes Section index of the
per-TU string table
SYMBOL_NAMES_SECTION_INDEX 4 bytes Section index of the
corresponding
GCOV_SYMBOL_NAMES section
SYMBOL_ID_START 4 bytes First symbol ID in
this TU (inclusive)
SYMBOL_ID_END 4 bytes Last symbol ID in
this TU (exclusive)
An entry with an empty filename string represents symbols with
unknown source file. This entry must be present.
Symbol IDs are contiguous within a TU. This way, it is easier to
lookup which TU a symbol belongs to.
4.6. Symbol Names Section (GCOV_SYMBOL_NAMES)
There is one symbol names section per file name entry, providing
a per-TU symbol table.
GCOV_NUM_SYMBOL_NAMES 4 bytes
GCOV_SYMBOL_NAME (repeated GCOV_NUM_SYMBOL_NAMES times):
SYMBOL_NAME_STRING_INDEX 4 bytes Index into this TU's
string table
SYMBOL_ID 4 bytes Unique identifier
SYMBOL_INFO_SECTION_INDEX 4 bytes Section index of the
GCOV_SYMBOL_INFO, or
0xFFFFFFFF if none
SYMBOL_ID values are globally unique within the file and are
used to reference symbols from location records.
Note: The value 0xFFFFFFFF for SYMBOL_INFO_SECTION_INDEX is reserved, and
occurs when a symbol is referred to but has no top-level instance. This
occurs in inline instances.
4.7. Symbol Info Section (GCOV_SYMBOL_INFO)
There is one symbol info section per top-level function that has
profile data.
SYMBOL_HEAD_COUNT 8 bytes Entry count for the function
SYMBOL_TIMESTAMP 8 bytes Timestamp (0 if unavailable)
SYMBOL_NUM_LOCATIONS 4 bytes
Followed by SYMBOL_NUM_LOCATIONS location records as defined in
Section 4.8.
4.8. Location Encoding
Each location record represents a source position within a
function, annotated with sample counts, call targets, or inlined
function data.
4.8.1. Common Header
All location records begin with:
LOCATION_BITMASK 1 byte
Bit 7 : has discriminator
Bits 0-6 : location type
LOCATION_LINE_OFFSET 3 bytes Line offset from function start
LOCATION_DISCRIMINATOR 2 bytes (present only if bit 7 set)
4.8.2. Defined Location Types
Total sizes are represented without and with the discriminator, respectively.
The types listed below specify the additional data present after the common
header.
LOCATION_ZERO (0x01)
No additional data. Represents a source location with zero
samples.
Total size: 4 or 6 bytes.
LOCATION_NORMAL (0x02)
LOCATION_COUNT 4 bytes Sample count
Total size: 8 or 10 bytes.
LOCATION_WIDE (0x03)
LOCATION_COUNT 8 bytes Sample count (for values > 2^32-1)
Total size: 12 or 14 bytes.
LOCATION_CALLED_FN (0x04)
LOCATION_CALLED_FN_INDEX 4 bytes SYMBOL_ID of callee
LOCATION_CALLED_FN_COUNT 8 bytes Call count
Represents a call site with a single known call target.
Total size: 16 or 18 bytes.
LOCATION_CALLED_FNS (0x05)
LOCATION_CALLED_TARGETS 4 bytes Number of targets
Followed by LOCATION_CALLED_TARGETS repetitions of:
LOCATION_CALLED_FN_INDEX 4 bytes
LOCATION_CALLED_FN_COUNT 8 bytes
Represents a call site with multiple (indirect) call targets.
LOCATION_CALLED_FN is provided as a separate type because
indirect calls are infrequent; using the single-target type
saves 4 bytes per direct call site.
Total size: (8 or 10) + 12 * LOCATION_CALLED_TARGETS bytes.
LOCATION_INLINED_FN (0x06)
INLINED_SYMBOL_ID 4 bytes SYMBOL_ID of inlined
function
Followed by a nested function body:
SYMBOL_NUM_LOCATIONS 4 bytes
GCOV_LOCATION (repeated SYMBOL_NUM_LOCATIONS times)
The nested locations are encoded identically and may themselves
contain further LOCATION_INLINED_FN records, forming an
arbitrarily deep inlining tree.
Total size: (12 or 14) bytes + nested structure.
4.8.3. Forward Compatibility
For location type values not defined above, readers must read a
LOCATION_TRAILING_SIZE field immediately after the common
header (line offset and optional discriminator), then skip that
many bytes to reach the next location record.
LOCATION_TRAILING_SIZE 4 bytes Size of trailing data
This mechanism allows older readers to process files containing
location types added in future format revisions.
4.9. Compact Encoding Mode
When the compact flag is set in the header bitmask (bit 7), the
following changes apply:
- GCOV_NUM_SECTIONS, all section offsets, and all section sizes
in the header are encoded as varints instead of fixed-width
big-endian integers. The starting 8 bytes of the file (magic / version)
always remain fixed-width.
- Within each section whose section bitmask has the compact flag
set, all integer fields (counts, indices, lengths) are encoded
as varints instead of their fixed-width equivalents. The
exception is one-byte fields (bitmasks, location types) which
remain fixed-width.
- String data within the trie (edge labels) remains as raw
bytes; only the length prefix uses varint encoding.
The compact encoding typically reduces file size by an additional
40-50% beyond the normal encoding.
5. Textual Format
5.1. Overview
The textual format provides a human-readable representation of the
same profile data stored in the binary format. It is intended for
inspection, debugging, and as an interchange format for tools that
do not need binary parsing.
The format consists of three top-level blocks: filenames, summary,
and symbols. Symbol names are double-quoted to handle C++ mangled
names that may contain special characters.
Section keywords (locations, callsites, inlined, and any future
additions) must match the pattern [a-z][a-z0-9_]*.
Braces are always balanced in the format. To skip an unknown
section for forward compatibility, input can be consumed counting balanced
{ / } pairs, treating content between double-quote pairs as
opaque (braces within quoted strings do not affect the count).
5.2. Grammar
The grammar is specified in BNF notation. Terminals are unquoted;
non-terminals are enclosed in angle brackets.
<profile> = <filenames> <summary> <symbols>
<filenames> = filenames = { <filename-list> }
<filename-list> = <quoted-string> (, <quoted-string>)*
<summary> = summary = { <summary-fields> }
<summary-fields>
= total_count = <number> ,
max_count = <number> ,
max_fn_count = <number> ,
num_counts = <number> ,
num_functions = <number> ,
num_detailed_entries = <number> ,
detailed_entries = { <detailed-entry-list>? }
<detailed-entry-list>
= <detailed-entry> (, <detailed-entry>)*
<detailed-entry>
= { cutoff = <number> ,
min_count = <number> ,
num_counts = <number> }
<symbols> = <symbol>*
<symbol> = <symbol-header> = { <section-list>? }
<symbol-header> = <quoted-string> : <file-id>
( <symbol-id> : <number> : <number> )
<file-id> = <number> | -1
<symbol-id> = <number>
<section-list> = <section> (, <section>)*
<section> = <locations-section>
| <callsites-section>
| <inlined-section>
<locations-section>
= locations = { <location-entry-list> }
<location-entry-list>
= <location-entry> (, <location-entry>)*
<location-entry>
= <location> = <number>
<callsites-section>
= callsites = { <callsite-entry-list> }
<callsite-entry-list>
= <callsite-entry> (, <callsite-entry>)*
<callsite-entry>
= <location> -> { <target-list> }
<target-list> = <target> (, <target>)*
<target> = <symbol-id> = <number>
<inlined-section>
= inlined = { <inlined-entry-list> }
<inlined-entry-list>
= <inlined-entry> (, <inlined-entry>)*
<inlined-entry> = <location> = <inlined-fn>
<inlined-fn> = <inlined-header> = { <section-list>? }
<inlined-header>
= <quoted-string> : <file-id> ( <symbol-id> )
<location> = <number> (. <number>)?
<quoted-string> = " <string-content> "
<string-content>
= (* zero or more characters except " *)
<number> = [0-9]+
In the symbol header, the three numbers in parentheses are, in
order: symbol ID, head count, and timestamp. In the inlined
function header, the single number in parentheses is the symbol
ID.
The file ID is the zero-based index into the filenames list, or
-1 for symbols with unknown source file.
Empty sections (locations, callsites, inlined) may be omitted
entirely from a symbol's section list.
5.3. Example
The following example shows a profile with two functions, one of
which has an inlined call to printf:
filenames = {
"/home/user/test.c",
"/usr/include/bits/stdio2.h"
}
summary = {
total_count = 2194467,
max_count = 659399,
max_fn_count = 0,
num_counts = 23,
num_functions = 2,
num_detailed_entries = 16,
detailed_entries = {
{cutoff = 10000, min_count = 659399, num_counts = 2},
{cutoff = 100000, min_count = 659399, num_counts = 2},
{cutoff = 200000, min_count = 659399, num_counts = 2},
{cutoff = 300000, min_count = 659399, num_counts = 2},
{cutoff = 400000, min_count = 659399, num_counts = 2},
{cutoff = 500000, min_count = 659399, num_counts = 2},
{cutoff = 600000, min_count = 659399, num_counts = 2},
{cutoff = 700000, min_count = 218844, num_counts = 6},
{cutoff = 800000, min_count = 218844, num_counts = 6},
{cutoff = 900000, min_count = 218844, num_counts = 6},
{cutoff = 950000, min_count = 218844, num_counts = 6},
{cutoff = 990000, min_count = 218844, num_counts = 6},
{cutoff = 999000, min_count = 218844, num_counts = 6},
{cutoff = 999900, min_count = 31, num_counts = 9},
{cutoff = 999990, min_count = 23, num_counts = 16},
{cutoff = 999999, min_count = 23, num_counts = 16}
}
}
"bubble_sort":0(1:0:0) = {
locations = {
0 = 0,
1 = 0,
2 = 34,
3 = 31,
4.1 = 659399,
4.2 = 659399,
5 = 23,
6 = 218844,
7 = 218844,
8 = 218844,
9 = 218844,
13 = 31
}
}
"sort_array":0(3:0:0) = {
locations = {
0 = 0,
2 = 0,
3 = 0,
3.1 = 29,
3.3 = 29,
4 = 29,
4.1 = 29,
6 = 29,
7 = 29
},
inlined = {
1 = "printf":1(2) = {
locations = {
0 = 0,
2 = 0
}
}
}
}
6. Extensibility
The format is designed to be extensible while remaining backwards
compatible.
6.1. Binary Format
There are two main points of extensibility:
- Section types: up to 127 unique section types are supported,
of which 5 are currently reserved (see Section 4.2).
- Location types: up to 127 unique location types are supported,
of which 6 are currently reserved (see Section 4.8.2).
There are no restrictions on the data that new types of either
of the above may contain.
Backwards compatibility is preserved as follows:
- Sections: the section table at the start of the file records
the offset and size of every section (see Section 4.1), so a
reader can skip sections whose type it does not recognise.
- Locations: every location whose type is not one of the six
reserved values must include a 4-byte LOCATION_TRAILING_SIZE
field immediately after the common header (see Section 4.8.3).
This field gives the number of trailing bytes for the record,
allowing a reader to skip unknown location types.
Existing sections and locations cannot be modified. New fields
or data are added by defining new section or location types.
New section or location types are not automatically linked to
existing sections, locations, or symbol records. Any such linkage
must be established by a separate extension to the format.
6.2. Textual Format
The textual format is likewise straightforward to extend. New
sections within a symbol can be introduced using the
<section> = { <data> } syntax. Because braces are required to be
balanced, a parser can skip over an unknown section by scanning
until the matching closing brace. New top-level sections follow
the same rule.
Extensions must define section keywords matching the pattern
[a-z][a-z0-9_]*. The format of the data contained within a new
section is unspecified; the only requirement is that any braces
within it are balanced.
6.3. Forward Compatibility
The format supports forward compatibility only in that unknown
section and location types can be skipped (see Section 6.1). It
makes no guarantee that a profile using extensions will produce
meaningful results in a consumer that does not understand them;
consumers must also handle absent extensions gracefully.
7. Experimental Results
The following tables compare file sizes between the existing v3
format and v4 in both normal ("v4") and compact encoding modes ("v4c").
7.1. SPEC CPU 2017
Profile v3 v4 v4c v4 v4c
------- -- -- --- ----- -----
cpugcc_r 7.1M 4.1M 1.9M -43% -74%
cpuxalan_r 1.3M 790K 516K -40% -61%
deepsjeng_r 103K 59K 30K -44% -71%
exchange2_r 27K 16K 7.7K -43% -71%
leela_r 299K 195K 113K -35% -62%
mcf_r 27K 16K 7.8K -43% -71%
omnetpp_r 1.2M 764K 425K -37% -65%
perlbench_r 412K 229K 105K -44% -75%
specrand_ir 3.3K 1.6K 705 -53% -79%
x264_r 434K 247K 113K -43% -74%
xz_r 75K 46K 23K -39% -69%
TOTAL 11M 6.4M 3.2M -42% -71%
7.2. GCC Bootstrap
Profile v3 v4 v4c v4 v4c
------- -- -- --- ----- -----
all.fda 56M 32M 16M -43% -71%
--
Regards,
Dhruv