Type representation in CTF and DWARF

Indu Bhagat Fri, 04 Oct 2019 12:12:22 -0700

Hello,

At GNU Tools Cauldron this year, some folks were curious to know more about how
the "type representation" in CTF compares with DWARF.


I use the small testcase below to gather some numbers to help drive this
discussion.

[ibhagat@ibhagatpc ctf-size]$ cat ctf_sizeme.c
#define MAX_NUM_MSGS 5

enum node_type
{
  INIT_TYPE = 0,
  COMM_TYPE = 1,
  COMP_TYPE = 2,
  MSG_TYPE = 3,
  RELEASE_TYPE = 4,
  MAX_NODE_TYPE
};

typedef struct node_payload
{
  unsigned short npay_offset;
  const char * npay_msg;
  unsigned int npay_nelems;
  struct node_payload * npay_next;
} node_payload;

typedef struct node_property
{
  int timestamp;
  char category;
  long initvalue;
} node_property_t;

typedef struct node
{
  enum node_type ntype;
  int nmask:5;
  union
    {
      struct node_payload * npayload;
      void * nbase;
    } nu;
  unsigned int msgs[MAX_NUM_MSGS];
  node_property_t node_prop;
} Node;

Node s;

int main (void)
{
  return 0;
}

Note that in this case, there is nothing that the de-duplicator has to do
(neither for the TYPE comdat sections nor CTF types). I chose such an example
because de-duplication of types is orthogonal to the concept of representation
of types.

So, for the small C testcase with a union, enum, array, struct, typedef etc., I
see the following sizes:

Compile with -fdebug-types-section -gdwarf-4 (size -A <binary> excerpt):
    .debug_aranges     48         0
    .debug_info       150         0
    .debug_abbrev     314         0
    .debug_line        73         0
    .debug_str        455         0
    .debug_ranges      32         0
    .debug_types      578         0

Compile with -fdebug-types-section -gdwarf-5 (size -A <binary> excerpt):
    .debug_aranges      48         0
    .debug_info        732         0
    .debug_abbrev      309         0
    .debug_line         73         0
    .debug_str         455         0
    .debug_rnglists     23         0

Compile with -gt (size -A <binary> excerpt):
    .ctf      966     0
    CTF strings sub-section size (ctf_strlen in disassembly) = 374
    ==> CTF section just for representing types = 966 - 374 = 592 bytes
    (The 592 bytes include the CTF header and other indexes etc.)
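
The subtraction above, and the corresponding DWARF totals, can be reproduced
with a few lines of C. This is only a sketch: the helper names are made up for
illustration, the figures are copied from the size -A excerpts, and grouping
.debug_info, .debug_abbrev and .debug_types together as "type information" is
an approximation, since the first two also hold non-type data.

```c
#include <assert.h>
#include <stddef.h>

/* One row of a `size -A` excerpt: section name and size in bytes.  */
struct section_size {
  const char *name;
  unsigned long bytes;
};

/* Sum the sizes of all rows (illustrative helper, not part of any tool).  */
static unsigned long
total_bytes (const struct section_size *rows, size_t nrows)
{
  unsigned long sum = 0;

  assert (rows != NULL || nrows == 0);
  for (size_t i = 0; i < nrows; i++)
    sum += rows[i].bytes;
  return sum;
}

/* CTF bytes left once the string table is excluded.  */
static unsigned long
ctf_type_bytes (unsigned long ctf_section, unsigned long ctf_strlen)
{
  assert (ctf_strlen <= ctf_section);
  return ctf_section - ctf_strlen;
}

/* Figures copied from the listings above.  */
static const struct section_size dwarf4_types[] = {
  { ".debug_info", 150 }, { ".debug_abbrev", 314 }, { ".debug_types", 578 }
};
static const struct section_size dwarf5_types[] = {
  { ".debug_info", 732 }, { ".debug_abbrev", 309 }
};
```

Calling ctf_type_bytes (966, 374) gives the 592 bytes quoted above, and
total_bytes over the two tables gives the DWARF4 and DWARF5 sums for the
sections that carry type information.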

So, the following points are what I would highlight. Hopefully this helps you
see that CTF has promise for the task of representing type debug info.

1. Type Information layout in sections:
   A .ctf section is self-sufficient to represent types in a program. All
   references within the CTF section are via either indexes or offsets into the
   CTF section. No relocations are necessary in CTF at this time. In contrast,
   DWARF type information is organized in multiple sections - .debug_info,
   .debug_abbrev and .debug_str sections in DWARF5; plus .debug_types in DWARF4.
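
   To illustrate why references by index or offset need no relocations, here
   is a toy model of a self-contained section. The layout and all toy_* names
   are hypothetical and much simpler than the real CTF format; the point is
   only that every lookup is computed from the start of the section, so the
   bytes can be copied anywhere and still be read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified layout; NOT the real CTF on-disk format.  */
struct toy_header {
  uint32_t type_off;   /* byte offset of the type records from section start */
  uint32_t ntypes;     /* number of type records */
  uint32_t str_off;    /* byte offset of the string table from section start */
  uint32_t str_len;    /* size of the string table in bytes */
};

struct toy_type {
  uint32_t name_off;   /* offset of the name within the string table */
  uint32_t ref_index;  /* 1-based index of a referenced type, 0 = none */
};

/* Fetch record INDEX (1-based) using only the section's own bytes.  */
static int
toy_get_type (const unsigned char *sec, uint32_t index, struct toy_type *out)
{
  struct toy_header hdr;

  memcpy (&hdr, sec, sizeof hdr);
  if (index == 0 || index > hdr.ntypes)
    return -1;
  memcpy (out, sec + hdr.type_off + (index - 1) * sizeof *out, sizeof *out);
  return 0;
}

/* Return the name of record INDEX, or NULL if it cannot be resolved.  */
static const char *
toy_type_name (const unsigned char *sec, uint32_t index)
{
  struct toy_header hdr;
  struct toy_type t;

  memcpy (&hdr, sec, sizeof hdr);
  if (toy_get_type (sec, index, &t) != 0 || t.name_off >= hdr.str_len)
    return NULL;
  return (const char *) sec + hdr.str_off + t.name_off;
}

/* A tiny section image: header, two records, then the string table.  */
struct toy_image {
  struct toy_header hdr;
  struct toy_type types[2];
  char strtab[30];
};

static const struct toy_image toy_example = {
  { offsetof (struct toy_image, types), 2,
    offsetof (struct toy_image, strtab), 30 },
  { { 0, 0 },       /* 1: struct node_property */
    { 14, 1 } },    /* 2: typedef node_property_t, refers to type 1 */
  "node_property\0node_property_t"
};
```

   Copying toy_example into another buffer and calling toy_type_name on the
   copy returns the same names, which is the property that lets such a
   section do without relocations.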

2. Type Information encoding / compactness matters:
   Because the type information is organized across sections in DWARF (and
   those sections contain other debug information, such as locations), it is
   not feasible to put a single number on the size in bytes needed to
   represent type information in DWARF. But the section sizes shown above
   should help show that CTF has promise for compactly representing types.

   Let's see some size data. The CTF string table (= 374 bytes) is left out of
   the discussion at hand because it would not be fair to compare it with the
   .debug_str section, which contains more than just the names of types.

   The 592 bytes of the .ctf section are needed to represent types in CTF
   format. Now, when using DWARF5, the type information needs 732 bytes in
   .debug_info and 309 bytes in .debug_abbrev.

   In DWARF (when using -fdebug-types-section), the base types are duplicated
   across type units. So for the above example, the DWARF DIE representing
   'unsigned int' will appear in both of the DWARF trees for types - node and
   node_payload. In CTF, there is a single 'unsigned int' type.
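
   The effect of that duplication can be counted with a rough model. The
   sketch below is an assumption-laden illustration: the base types listed
   per unit are read off the direct members in the C source, not from a DWARF
   dump, and the struct and function names are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_BASE 4   /* illustrative upper bound on base types per unit */
#define MAX_SEEN 16  /* illustrative upper bound on distinct base types */

/* A type unit and the base types its tree needs (illustrative model).  */
struct type_unit {
  const char *name;
  const char *base_types[MAX_BASE + 1];  /* NULL-terminated list */
};

/* Direct members only, taken from ctf_sizeme.c.  */
static const struct type_unit units[] = {
  { "node_payload", { "unsigned short", "char", "unsigned int", NULL } },
  { "node",         { "int", "unsigned int", NULL } },
};

/* Per-unit copies: every unit carries its own entry for each base type.  */
static size_t
count_per_unit_copies (const struct type_unit *u, size_t nunits)
{
  size_t copies = 0;

  for (size_t i = 0; i < nunits; i++)
    for (size_t b = 0; u[i].base_types[b] != NULL; b++)
      copies++;
  return copies;
}

/* Shared table: each distinct base type is stored exactly once.  */
static size_t
count_shared_entries (const struct type_unit *u, size_t nunits)
{
  const char *seen[MAX_SEEN];
  size_t nseen = 0;

  for (size_t i = 0; i < nunits; i++)
    for (size_t b = 0; u[i].base_types[b] != NULL; b++)
      {
        size_t s = 0;

        /* Linear search is fine for a handful of names.  */
        while (s < nseen && strcmp (seen[s], u[i].base_types[b]) != 0)
          s++;
        if (s == nseen)
          {
            assert (nseen < MAX_SEEN);
            seen[nseen++] = u[i].base_types[b];
          }
      }
  return nseen;
}
```

   In this model the per-unit scheme stores one more base-type entry than the
   shared scheme, the extra one being the second copy of 'unsigned int'.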

3. Type Information retrieval and handling:
   CTF type information is organized as a linear array of CTF types. CTF types
   have references to other CTF types. libctf facilitates name lookups, i.e.
   given the name of the type, get the type information.
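
   A toy version of "linear array of types with references plus name lookup"
   might look like the following. None of this is the libctf API; the toy_*
   names, the record layout and the choice of entries are assumptions made
   purely to show the shape of the data.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum toy_kind { TOY_INTEGER, TOY_POINTER, TOY_STRUCT, TOY_TYPEDEF };

struct toy_ctf_type {
  const char *name;     /* "" for unnamed types such as pointers */
  enum toy_kind kind;
  size_t ref;           /* index of the referenced type, 0 = none */
};

/* Index 0 is reserved so that 0 can mean "no type".  */
static const struct toy_ctf_type toy_types[] = {
  { "",                TOY_INTEGER, 0 },  /* 0: unused placeholder */
  { "unsigned int",    TOY_INTEGER, 0 },  /* 1 */
  { "node_payload",    TOY_STRUCT,  0 },  /* 2: struct node_payload */
  { "",                TOY_POINTER, 2 },  /* 3: struct node_payload * */
  { "node_property",   TOY_STRUCT,  0 },  /* 4: struct node_property */
  { "node_property_t", TOY_TYPEDEF, 4 },  /* 5: typedef of type 4 */
};

#define TOY_NTYPES (sizeof toy_types / sizeof toy_types[0])

/* Return the index of the type called NAME, or 0 if there is none.  */
static size_t
toy_lookup_by_name (const char *name)
{
  if (name == NULL || name[0] == '\0')
    return 0;
  for (size_t i = 1; i < TOY_NTYPES; i++)
    if (strcmp (toy_types[i].name, name) == 0)
      return i;
  return 0;
}

/* Follow typedef references until a non-typedef type is reached.  */
static size_t
toy_resolve (size_t id)
{
  assert (id < TOY_NTYPES);
  while (id != 0 && toy_types[id].kind == TOY_TYPEDEF)
    id = toy_types[id].ref;
  return id;
}
```

   Looking up "node_property_t" and resolving it lands on the struct entry,
   and the pointer entry refers to the same index that a lookup of
   "node_payload" returns, so every reference is just an array index.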

   DWARF type information is organized in a tree of DIEs. The information at
   the leaf DIEs (base types) across DWARF type units is often duplicated.
   DWARF type units do have references to other type units for larger types
   though. In the example, the DWARF type unit for node has a reference to the
   DWARF type unit for node_payload.

   I state the above only as an observation. I don't know for certain whether
   one format is better or worse for consumers of type debug information at
   this time WRT runtime access patterns.

   On a related note though, it's not clear to me how .debug_types integration
   with split-dwarf works out. If the linker does not see the
   non-relocation-necessary part of the DWARF, I am not sure how .debug_types
   type units are de-duplicated when using split-dwarf.

Thanks
Indu
