Hi all,

Per prior discussions, here is the specification we are proposing for
version 4 of the AutoFDO GCOV profile format. This is a complete re-design
focused around being a backwards-compatible and extensible format which
can be partially read by the compiler.

=================

Table of Contents

   1.  Introduction
   2.  Motivation
   3.  Open Questions
   4.  Binary Format
       4.1.  File Header
       4.2.  Section Layout
       4.3.  String Table Section (GCOV_STRING_TABLE)
       4.4.  Summary Section (GCOV_SUMMARY_INFO)
       4.5.  File Names Section (GCOV_FILE_NAMES)
       4.6.  Symbol Names Section (GCOV_SYMBOL_NAMES)
       4.7.  Symbol Info Section (GCOV_SYMBOL_INFO)
       4.8.  Location Encoding
       4.9.  Compact Encoding Mode
   5.  Textual Format
       5.1.  Overview
       5.2.  Grammar
       5.3.  Example
   6.  Extensibility
       6.1.  Binary Format
       6.2.  Textual Format
       6.3.  Forward Compatibility
   7.  Experimental Results


1.  Introduction

   The existing GCOV AutoFDO profile formats (versions 2 and 3) use a
   fixed, non-extensible tag-length sequential layout. Version 4 redesigns
   the format to be a section-based one with the following goals:

     - The ability to partially read the profile, i.e. unrequired sections are
       simply skipped by a reader.

     - Forward compatibility through typed location records with a
       trailing size field for unknown types. This is useful (for example) for
       future work regarding branch profiling for better if-conversion
       heuristics.

     - Reduced file size through trie-compressed string tables,
       optional discriminators, zero-count elision, and variable-width
       sample counts, as detailed in later sections.

     - Optional compact encoding using Protocol Buffers-style varints
       for further size reduction.

     - A companion human-readable textual format for inspection.

   Measured against a GCC bootstrap profile (58 MB in v3), the v4
   binary format achieves -43% (33 MB) in normal mode and -72%
   (16 MB) in compact mode. Against SPEC CPU 2017 profiles, it
   also achieves -43% in normal mode and -72% in compact mode on average.

   All multi-byte integers in the binary format are unsigned and
   stored in big-endian byte order.

   "Varint" refers to the variable-length integer encoding defined by Protocol
   Buffers: each byte uses 7 data bits with the MSB as a continuation flag,
   least significant group first
   (see https://protobuf.dev/programming-guides/encoding/#varints).

2.  Motivation

   The primary issue with the existing GCOV format is the lack of extensibility
   and backwards-compatibility. Adding features to the format necessitates
   bumping the version number, which also requires updating all of the tooling
   working with the format.

   Secondly, the profile is required to be streamed in its entirety from disk
   whenever it is read, which leads to duplicate work when multiple TUs are
   compiled as the compiler has to read the entire profile per TU. Having the
   ability to read the profile partially allows getting rid of most of this
   duplicate work.

3.  Open Questions

   - Can the GCOV profile format be renamed to something better? The
     current name is confusing and collides with the existing GCC PGO
     format. Perhaps "AFDO" could work.

4.  Binary Format

4.1.  File Header

   The file begins with a fixed header followed by a section table.

     GCOV_MAGIC            4 bytes   ASCII 'gcov' (0x67636F76)
     GCOV_VERSION          4 bytes   0x00000004
     GCOV_HEADER_BITMASK   1 byte
       Bit 7    : compact flag
       Bits 0-6 : reserved
     GCOV_NUM_SECTIONS     7 bytes   Number of entries in the
                                     section table (excludes the
                                     two fixed sections below)

   Immediately following are offset/size pairs for two fixed
   sections, then the section table:

     GCOV_SUMMARY_OFFSET        8 bytes
     GCOV_SUMMARY_SIZE          8 bytes
     GCOV_FILE_NAMES_OFFSET     8 bytes
     GCOV_FILE_NAMES_SIZE       8 bytes

     GCOV_SECTION (repeated GCOV_NUM_SECTIONS times):
       GCOV_SECTION_OFFSET      8 bytes
       GCOV_SECTION_SIZE        8 bytes

   All offsets are byte offsets from the start of the file.

   The offsets for the summary and file names sections are provided
   for fast access. There is no specified ordering of any of the
   sections. The indexing is done based on the order they are placed
   within the file, and is inclusive of the two sections mentioned
   before.

4.2.  Section Layout

   Each section begins with a one-byte bitmask:

     GCOV_SECTION_BITMASK   1 byte
       Bit 7    : compact flag for this section
       Bits 0-6 : section type

   Defined section types:

     0x01   GCOV_STRING_TABLE
     0x02   GCOV_SUMMARY_INFO
     0x03   GCOV_FILE_NAMES
     0x04   GCOV_SYMBOL_NAMES
     0x05   GCOV_SYMBOL_INFO

   The section bitmask is followed immediately by the section data as
   defined in the subsections below.

4.3.  String Table Section (GCOV_STRING_TABLE)

   Each translation unit has its own string table section, storing
   its symbol name strings in a path-compressed trie. This exploits
   the shared prefixes common in C++ mangled names (e.g.
   "_ZNSt6vector...").

     GCOV_NUM_STRINGS   4 bytes   Total number of strings

   Followed by a serialized trie starting at the root node:

     GCOV_TRIE_NODE:
       TRIE_NODE_BITMASK        1 byte
         Bit 7:  terminal flag (a complete string ends here)
         Bits 0-6: number of children (0-127)
       TRIE_TERMINAL_ID         4 bytes (present only if terminal)
                                String index for this terminal
       TRIE_CHILD (repeated for each child):
         TRIE_EDGE_LABEL_LENGTH   2 bytes
         TRIE_EDGE_LABEL          variable (TRIE_EDGE_LABEL_LENGTH
                                  bytes)
         GCOV_TRIE_NODE           (recursively)

   The trie is path-compressed: chains of nodes where each node has
   exactly one child and is not a terminal are collapsed into a single
   edge with a multi-character label. To reconstruct a string,
   edge labels are concatenated from the root to the terminal node, like a
   typical trie.

4.4.  Summary Section (GCOV_SUMMARY_INFO)

   The summary section contains aggregate statistics for the entire
   profile.

     SUMMARY_TOTAL_COUNT            8 bytes
     SUMMARY_MAX_COUNT              8 bytes
     SUMMARY_MAX_FN_COUNT           8 bytes
     SUMMARY_NUM_COUNTS             8 bytes
     SUMMARY_NUM_FUNCTIONS          8 bytes
     SUMMARY_NUM_DETAILED_ENTRIES   8 bytes

   Followed by detailed histogram entries:

     SUMMARY_DETAILED_ENTRY (repeated SUMMARY_NUM_DETAILED_ENTRIES
                             times):
       ENTRY_CUTOFF       4 bytes   Cutoff value
       ENTRY_MIN_COUNT    8 bytes   Minimum count at this percentile
       ENTRY_NUM_COUNTS   8 bytes   Number of counters at or above this minimum

4.5.  File Names Section (GCOV_FILE_NAMES)

   There is exactly one file names section per profile. It maps source
   files to their corresponding symbol names sections.

     GCOV_NUM_FILE_NAMES   4 bytes

     GCOV_FILE_NAME (repeated GCOV_NUM_FILE_NAMES times):
       FILE_NAME_STRING_LENGTH      4 bytes
       FILE_NAME_STRING_CONTENT     variable (FILE_NAME_STRING_LENGTH
                                    bytes, including '\0')
       STRING_TABLE_SECTION_INDEX   4 bytes   Section index of the
                                              per-TU string table
       SYMBOL_NAMES_SECTION_INDEX   4 bytes   Section index of the
                                              corresponding
                                              GCOV_SYMBOL_NAMES section
       SYMBOL_ID_START              4 bytes   First symbol ID in
                                              this TU (inclusive)
       SYMBOL_ID_END                4 bytes   Last symbol ID in
                                              this TU (exclusive)

   An entry with an empty filename string represents symbols with
   unknown source file. This entry must be present.

   Symbol IDs are contiguous within a TU. This way, it is easier to
   lookup which TU a symbol belongs to.

4.6.  Symbol Names Section (GCOV_SYMBOL_NAMES)

   There is one symbol names section per file name entry, providing
   a per-TU symbol table.

     GCOV_NUM_SYMBOL_NAMES   4 bytes

     GCOV_SYMBOL_NAME (repeated GCOV_NUM_SYMBOL_NAMES times):
       SYMBOL_NAME_STRING_INDEX    4 bytes   Index into this TU's
                                             string table
       SYMBOL_ID                   4 bytes   Unique identifier
       SYMBOL_INFO_SECTION_INDEX   4 bytes   Section index of the
                                             GCOV_SYMBOL_INFO, or
                                             0xFFFFFFFF if none

   SYMBOL_ID values are globally unique within the file and are
   used to reference symbols from location records.

   Note: The value 0xFFFFFFFF for SYMBOL_INFO_SECTION_INDEX is reserved, and
         occurs when a symbol is referred to but has no top-level instance. This
         occurs in inline instances.

4.7.  Symbol Info Section (GCOV_SYMBOL_INFO)

   There is one symbol info section per top-level function that has
   profile data.

     SYMBOL_HEAD_COUNT      8 bytes   Entry count for the function
     SYMBOL_TIMESTAMP       8 bytes   Timestamp (0 if unavailable)
     SYMBOL_NUM_LOCATIONS   4 bytes

   Followed by SYMBOL_NUM_LOCATIONS location records as defined in
   Section 4.8.

4.8.  Location Encoding

   Each location record represents a source position within a
   function, annotated with sample counts, call targets, or inlined
   function data.

4.8.1.  Common Header

   All location records begin with:

     LOCATION_BITMASK        1 byte
       Bit 7    :  has discriminator
       Bits 0-6 : location type
     LOCATION_LINE_OFFSET    3 bytes   Line offset from function start
     LOCATION_DISCRIMINATOR  2 bytes   (present only if bit 7 set)

4.8.2.  Defined Location Types

   Total sizes are represented without and with the discriminator, respectively.
   The types listed below specify the additional data present after the common
   header.

   LOCATION_ZERO (0x01)

     No additional data. Represents a source location with zero
     samples.

     Total size: 4 or 6 bytes.

   LOCATION_NORMAL (0x02)

     LOCATION_COUNT   4 bytes   Sample count

     Total size: 8 or 10 bytes.

   LOCATION_WIDE (0x03)

     LOCATION_COUNT   8 bytes   Sample count (for values > 2^32-1)

     Total size: 12 or 14 bytes.

   LOCATION_CALLED_FN (0x04)

     LOCATION_CALLED_FN_INDEX   4 bytes   SYMBOL_ID of callee
     LOCATION_CALLED_FN_COUNT   8 bytes   Call count

     Represents a call site with a single known call target.

     Total size: 16 or 18 bytes.

   LOCATION_CALLED_FNS (0x05)

     LOCATION_CALLED_TARGETS   4 bytes   Number of targets

     Followed by LOCATION_CALLED_TARGETS repetitions of:

       LOCATION_CALLED_FN_INDEX   4 bytes
       LOCATION_CALLED_FN_COUNT   8 bytes

     Represents a call site with multiple (indirect) call targets.
     LOCATION_CALLED_FN is provided as a separate type because
     indirect calls are infrequent; using the single-target type
     saves 4 bytes per direct call site.

     Total size: (8 or 10) + 12 * LOCATION_CALLED_TARGETS bytes.

   LOCATION_INLINED_FN (0x06)

     INLINED_SYMBOL_ID      4 bytes   SYMBOL_ID of inlined
                                      function

     Followed by a nested function body:

       SYMBOL_NUM_LOCATIONS   4 bytes
       GCOV_LOCATION          (repeated SYMBOL_NUM_LOCATIONS times)

     The nested locations are encoded identically and may themselves
     contain further LOCATION_INLINED_FN records, forming an
     arbitrarily deep inlining tree.

     Total size: (12 or 14) bytes + nested structure.

4.8.3.  Forward Compatibility

   For location type values not defined above, readers must read a
   LOCATION_TRAILING_SIZE field immediately after the common
   header (line offset and optional discriminator), then skip that
   many bytes to reach the next location record.

     LOCATION_TRAILING_SIZE   4 bytes   Size of trailing data

   This mechanism allows older readers to process files containing
   location types added in future format revisions.

4.9.  Compact Encoding Mode

   When the compact flag is set in the header bitmask (bit 7), the
   following changes apply:

     - GCOV_NUM_SECTIONS, all section offsets, and all section sizes
       in the header are encoded as varints instead of fixed-width
       big-endian integers. The starting 8 bytes of the file (magic / version)
       always remain fixed-width.

     - Within each section whose section bitmask has the compact flag
       set, all integer fields (counts, indices, lengths) are encoded
       as varints instead of their fixed-width equivalents. The
       exception is one-byte fields (bitmasks, location types) which
       remain fixed-width.

     - String data within the trie (edge labels) remains as raw
       bytes; only the length prefix uses varint encoding.

   The compact encoding typically reduces file size by an additional
   40-50% beyond the normal encoding.

5.  Textual Format

5.1.  Overview

   The textual format provides a human-readable representation of the
   same profile data stored in the binary format. It is intended for
   inspection, debugging, and as an interchange format for tools that
   do not need binary parsing.

   The format consists of three top-level blocks: filenames, summary,
   and symbols. Symbol names are double-quoted to handle C++ mangled
   names that may contain special characters.

   Section keywords (locations, callsites, inlined, and any future
   additions) must match the pattern [a-z][a-z0-9_]*.

   Braces are always balanced in the format. To skip an unknown
   section for forward compatibility, input can be consumed counting balanced
   { / } pairs, treating content between double-quote pairs as
   opaque (braces within quoted strings do not affect the count).

5.2.  Grammar

   The grammar is specified in BNF notation. Terminals are unquoted;
   non-terminals are enclosed in angle brackets.

   <profile>       = <filenames> <summary> <symbols>

   <filenames>     = filenames = { <filename-list> }
   <filename-list> = <quoted-string> (, <quoted-string>)*

   <summary>       = summary = { <summary-fields> }
   <summary-fields>
                   = total_count = <number> ,
                     max_count = <number> ,
                     max_fn_count = <number> ,
                     num_counts = <number> ,
                     num_functions = <number> ,
                     num_detailed_entries = <number> ,
                     detailed_entries = { <detailed-entry-list>? }

   <detailed-entry-list>
                   = <detailed-entry> (, <detailed-entry>)*
   <detailed-entry>
                   = { cutoff = <number> ,
                       min_count = <number> ,
                       num_counts = <number> }

   <symbols>       = <symbol>*

   <symbol>        = <symbol-header> = { <section-list>? }
   <symbol-header> = <quoted-string> : <file-id>
                     ( <symbol-id> : <number> : <number> )

   <file-id>       = <number> | -1
   <symbol-id>     = <number>

   <section-list>  = <section> (, <section>)*
   <section>       = <locations-section>
                   | <callsites-section>
                   | <inlined-section>

   <locations-section>
                   = locations = { <location-entry-list> }
   <location-entry-list>
                   = <location-entry> (, <location-entry>)*
   <location-entry>
                   = <location> = <number>

   <callsites-section>
                   = callsites = { <callsite-entry-list> }
   <callsite-entry-list>
                   = <callsite-entry> (, <callsite-entry>)*
   <callsite-entry>
                   = <location> -> { <target-list> }
   <target-list>   = <target> (, <target>)*
   <target>        = <symbol-id> = <number>

   <inlined-section>
                   = inlined = { <inlined-entry-list> }
   <inlined-entry-list>
                   = <inlined-entry> (, <inlined-entry>)*
   <inlined-entry> = <location> = <inlined-fn>

   <inlined-fn>    = <inlined-header> = { <section-list>? }
   <inlined-header>
                   = <quoted-string> : <file-id> ( <symbol-id> )

   <location>      = <number> (. <number>)?

   <quoted-string> = " <string-content> "
   <string-content>
                   = (* zero or more characters except " *)
   <number>        = [0-9]+

   In the symbol header, the three numbers in parentheses are, in
   order: symbol ID, head count, and timestamp. In the inlined
   function header, the single number in parentheses is the symbol
   ID.

   The file ID is the zero-based index into the filenames list, or
   -1 for symbols with unknown source file.

   Empty sections (locations, callsites, inlined) may be omitted
   entirely from a symbol's section list.

5.3.  Example

   The following example shows a profile with two functions, one of
   which has an inlined call to printf:

   filenames = {
     "/home/user/test.c",
     "/usr/include/bits/stdio2.h"
   }

   summary = {
     total_count = 2194467,
     max_count = 659399,
     max_fn_count = 0,
     num_counts = 23,
     num_functions = 2,
     num_detailed_entries = 16,
     detailed_entries = {
       {cutoff = 10000, min_count = 659399, num_counts = 2},
       {cutoff = 100000, min_count = 659399, num_counts = 2},
       {cutoff = 200000, min_count = 659399, num_counts = 2},
       {cutoff = 300000, min_count = 659399, num_counts = 2},
       {cutoff = 400000, min_count = 659399, num_counts = 2},
       {cutoff = 500000, min_count = 659399, num_counts = 2},
       {cutoff = 600000, min_count = 659399, num_counts = 2},
       {cutoff = 700000, min_count = 218844, num_counts = 6},
       {cutoff = 800000, min_count = 218844, num_counts = 6},
       {cutoff = 900000, min_count = 218844, num_counts = 6},
       {cutoff = 950000, min_count = 218844, num_counts = 6},
       {cutoff = 990000, min_count = 218844, num_counts = 6},
       {cutoff = 999000, min_count = 218844, num_counts = 6},
       {cutoff = 999900, min_count = 31, num_counts = 9},
       {cutoff = 999990, min_count = 23, num_counts = 16},
       {cutoff = 999999, min_count = 23, num_counts = 16}
     }
   }

   "bubble_sort":0(1:0:0) = {
     locations = {
       0 = 0,
       1 = 0,
       2 = 34,
       3 = 31,
       4.1 = 659399,
       4.2 = 659399,
       5 = 23,
       6 = 218844,
       7 = 218844,
       8 = 218844,
       9 = 218844,
       13 = 31
     }
   }

   "sort_array":0(3:0:0) = {
     locations = {
       0 = 0,
       2 = 0,
       3 = 0,
       3.1 = 29,
       3.3 = 29,
       4 = 29,
       4.1 = 29,
       6 = 29,
       7 = 29
     },
     inlined = {
       1 = "printf":1(2) = {
         locations = {
           0 = 0,
           2 = 0
         }
       }
     }
   }

6.  Extensibility

   The format is designed to be extensible while remaining backwards
   compatible.

6.1.  Binary Format

   There are two main points of extensibility:

     - Section types: up to 127 unique section types are supported,
       of which 5 are currently reserved (see Section 4.2).

     - Location types: up to 127 unique location types are supported,
       of which 6 are currently reserved (see Section 4.8.2).

   There are no restrictions on the data that new types of either
   of the above may contain.

   Backwards compatibility is preserved as follows:

     - Sections: the section table at the start of the file records
       the offset and size of every section (see Section 4.1), so a
       reader can skip sections whose type it does not recognise.

     - Locations: every location whose type is not one of the six
       reserved values must include a 4-byte LOCATION_TRAILING_SIZE
       field immediately after the common header (see Section 4.8.3).
       This field gives the number of trailing bytes for the record,
       allowing a reader to skip unknown location types.

   Existing sections and locations cannot be modified. New fields
   or data are added by defining new section or location types.

   New section or location types are not automatically linked to
   existing sections, locations, or symbol records. Any such linkage
   must be established by a separate extension to the format.

6.2.  Textual Format

   The textual format is likewise straightforward to extend. New
   sections within a symbol can be introduced using the
   <section> = { <data> } syntax. Because braces are required to be
   balanced, a parser can skip over an unknown section by scanning
   until the matching closing brace. New top-level sections follow
   the same rule.

   Extensions must define section keywords matching the pattern
   [a-z][a-z0-9_]*. The format of the data contained within a new
   section is unspecified; the only requirement is that any braces
   within it are balanced.

6.3.  Forward Compatibility

   The format supports forward compatibility only in that unknown
   section and location types can be skipped (see Section 6.1). It
   makes no guarantee that a profile using extensions will produce
   meaningful results in a consumer that does not understand them;
   consumers must also handle absent extensions gracefully.

7.  Experimental Results

   The following tables compare file sizes between the existing v3
   format and v4 in both normal ("v4") and compact encoding modes ("v4c").

7.1.  SPEC CPU 2017

     Profile            v3        v4       v4c     v4    v4c
     -------            --        --       ---   -----  -----
     cpugcc_r          7.1M      4.1M     1.9M    -43%   -74%
     cpuxalan_r        1.3M      790K     516K    -40%   -61%
     deepsjeng_r       103K       59K      30K    -44%   -71%
     exchange2_r        27K       16K     7.7K    -43%   -71%
     leela_r           299K      195K     113K    -35%   -62%
     mcf_r              27K       16K     7.8K    -43%   -71%
     omnetpp_r         1.2M      764K     425K    -37%   -65%
     perlbench_r       412K      229K     105K    -44%   -75%
     specrand_ir       3.3K      1.6K      705    -53%   -79%
     x264_r            434K      247K     113K    -43%   -74%
     xz_r               75K       46K      23K    -39%   -69%

     TOTAL              11M      6.4M     3.2M    -42%   -71%

7.2.  GCC Bootstrap

     Profile            v3        v4       v4c     v4    v4c
     -------            --        --       ---   -----  -----
     all.fda            56M       32M      16M    -43%   -71%

--
Regards,
Dhruv

Reply via email to