Re: Type representation in CTF and DWARF

Indu Bhagat Mon, 07 Oct 2019 13:47:53 -0700



On 10/07/2019 12:35 AM, Richard Biener wrote:

On Fri, Oct 4, 2019 at 9:12 PM Indu Bhagat <indu.bha...@oracle.com> wrote:

Hello,

At GNU Tools Cauldron this year, some folks were curious to know more on how
the "type representation" in CTF compares vis-a-vis DWARF.

[...]

So, for the small C testcase with a union, enum, array, struct, typedef etc, I
see the following sizes:

Compile with -fdebug-types-section -gdwarf-4 (size -A <binary> excerpt):
      .debug_aranges     48         0
      .debug_info       150         0
      .debug_abbrev     314         0
      .debug_line        73         0
      .debug_str        455         0
      .debug_ranges      32         0
      .debug_types      578         0

Compile with -fdebug-types-section -gdwarf-5 (size -A <binary> excerpt):
      .debug_aranges      48         0
      .debug_info        732         0
      .debug_abbrev      309         0
      .debug_line         73         0
      .debug_str         455         0
      .debug_rnglists     23         0

Compile with -gt (size -A <binary> excerpt):
      .ctf      966     0
      CTF strings sub-section size (ctf_strlen in disassembly) = 374
      == > CTF section just for representing types = 966 - 374 = 592 bytes
      (The 592 bytes include the CTF header and other indexes etc.)

So, the following points are what I would highlight. Hopefully this helps you see
that CTF has promise for the task of representing type debug info.

1. Type Information layout in sections:
     A .ctf section is self-sufficient to represent types in a program. All
     references within the CTF section are via either indexes or offsets into
     the CTF section. No relocations are necessary in CTF at this time. In
     contrast, DWARF type information is organized in multiple sections:
     .debug_info, .debug_abbrev and .debug_str sections in DWARF5; plus
     .debug_types in DWARF4.

2. Type Information encoding / compactness matters:
     Because the type information is organized across sections in DWARF (and
     contains some debug information like location etc.), it is not feasible
     to put a distinct number to the size in bytes for representing type
     information in DWARF. But the size info of sections shown above should
     be helpful to show that CTF does show promise in compactly representing
     types.

     Let's see some size data. CTF string table (= 374 bytes) is left out of the
     discussion at hand because it will not be fair to compare with .debug_str
     section which contains other information than just names of types.

     The 592 bytes of the .ctf section are needed to represent types in CTF
     format. Now, when using DWARF5, the type information needs 732 bytes in
     .debug_info and 309 bytes in .debug_abbrev.

     In DWARF (when using -fdebug-types-section), the base types are duplicated
     across type units. So for the above example, the DWARF DIE representing
     'unsigned int' will appear in both the DWARF trees for types - node and
     node_payload. In CTF, there is a single lone type 'unsigned int'.
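
     The sizes quoted in point 2 can be tallied with a short script. This is
     only a sketch: the byte counts are copied from the size -A excerpts above,
     and which DWARF sections to count as "type information" is an assumption,
     since those sections also hold locations and other non-type data. The
     variable names are illustrative.

```python
# Byte counts copied from the `size -A` excerpts earlier in this mail.
ctf_section = 966          # whole .ctf section
ctf_strings = 374          # CTF string table (ctf_strlen)

# DWARF sections that carry type information, string table left out.
# Assumption: these are the sections to count; they also hold non-type data.
dwarf4_type_sections = {".debug_info": 150, ".debug_abbrev": 314, ".debug_types": 578}
dwarf5_type_sections = {".debug_info": 732, ".debug_abbrev": 309}

ctf_without_strings = ctf_section - ctf_strings
dwarf4_without_strings = sum(dwarf4_type_sections.values())
dwarf5_without_strings = sum(dwarf5_type_sections.values())

print("CTF    (no strings):", ctf_without_strings)
print("DWARF4 (no strings):", dwarf4_without_strings)
print("DWARF5 (no strings):", dwarf5_without_strings)
```

     The DWARF figures are upper bounds for the type information alone, so the
     script illustrates the comparison rather than settling it.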

It's not clear to me why you are using -fdebug-types-section for this
comparison?
With just -gdwarf-4 I get

.debug_info      292
.debug_abbrev 189
.debug_str       299

this contains all the info CTF provides (and more).  This sums to 780 bytes,
smaller than the CTF variant.  I skimmed over the info and there's not much
to strip to get to CTF levels, mainly locations.  The strings section also
has a quite large portion for GCC version and arguments, which is 93 bytes.
So overall the DWARF representation should clock in at less than 700 bytes,
more close to 650.
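
The arithmetic behind these figures, as a minimal sketch. The numbers are the
ones quoted in this reply; the 966-byte CTF total comes from the earlier mail,
and the variable names are illustrative. The further drop towards 650 depends
on how many location bytes would be stripped, which is not quantified here, so
the script stops at the producer-string adjustment.

```python
# Plain -gdwarf-4 section sizes quoted above (bytes).
plain_dwarf4 = {".debug_info": 292, ".debug_abbrev": 189, ".debug_str": 299}

dwarf_total = sum(plain_dwarf4.values())

# Portion of .debug_str taken by the GCC version and arguments string.
producer_string = 93
dwarf_without_producer = dwarf_total - producer_string

ctf_total = 966  # whole .ctf section, string table included

print("DWARF total:              ", dwarf_total)
print("DWARF minus producer str: ", dwarf_without_producer)
print("CTF total:                ", ctf_total)
```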

Richard.


Going with just -gdwarf-4 does not favor DWARF, because the types in the
.debug_info section will not be de-duplicated. For more complicated code
bases with many compilation units, this will skew the results in favor of CTF
(once the CTF de-duplicator is ready :) ).

Now, one might argue that in this example, there is no role for a de-duplicator.
Yes to that. But for all users of DWARF type debug information for _real
codebases_, the -fdebug-types-section option is the best option. Isn't it?

Keeping "the size of type debug information in the shipped artifact small" as
our target is meaningful for both CTF and DWARF.

De-duplication is a key contributor to reducing the size of the type debug
information; and both CTF and DWARF types can be de-duplicated. At this time, I
stuck to a simple example with one CU because it eases interpreting the CTF and
DWARF debug info in the binaries and because the CTF link-time de-duplication
is not fully ready.

(NickA suggested a few days ago to compare how DWARF and CTF section sizes
 increase when a new member, or a new enum, or a new union etc are added. I can
 share some more data if there is interest in such a comparison. A few examples
 below:

1. Add a new member 'struct node_payload * a' to struct node_payload
   DWARF = 589 - 578 (.debug_types); 331 - 314 (.debug_abbrev);
           total = 11 + 17 = 28
   CTF = 980 - 966 (.ctf); string bytes increase = 2 ("a\0");
         total = 14 - 2 = 12
2. Add a new enumeration value 'A = 5,' to enum node_type
   DWARF = 582 - 578 (.debug_types); 323 - 314 (.debug_abbrev);
           total = 4 + 9 = 13
   CTF = 976 - 966 (.ctf); string bytes increase = 2 ("a\0");
         total = 10 - 2 = 8
3. Add new member 'unsigned int a' to struct node_payload
   DWARF = 589 - 578 (.debug_types); 331 - 314 (.debug_abbrev);
           total = 11 + 17 = 28
   CTF = 980 - 966 (.ctf); string bytes increase = 2;
         total = 14 - 2 = 12
4. Add new union nu2 to struct node (n2 mirrors nu; all new strings = "a",
   "b", "n2")
   DWARF = 666 - 578 (.debug_types); 329 - 314 (.debug_abbrev);
           total = 88 + 15 = 103
   CTF = 1021 - 966 (.ctf); string bytes increase = 7;
         total = 55 - 7 = 48

The larger "issue" is that both CTF and DWARF have some paraphernalia in the
form of header, indexes, section/sub-section references etc., which are a
somewhat necessary evil and complicate such a comparison. So comparing section
sizes with user-level compilation options and the size utility has its merit.
My opinion is still to stick with using -fdebug-types-section even for this
alternative way of comparison.)
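
The per-change growth figures in the four examples can be recomputed as
follows. This is a sketch using only the section sizes quoted in this mail;
the helper names dwarf_growth and ctf_growth are made up for illustration.

```python
# Baseline section sizes (bytes) for the unmodified testcase.
BASE_DEBUG_TYPES = 578
BASE_DEBUG_ABBREV = 314
BASE_CTF = 966

def dwarf_growth(debug_types, debug_abbrev):
    """Growth of .debug_types plus .debug_abbrev over the baseline."""
    return (debug_types - BASE_DEBUG_TYPES) + (debug_abbrev - BASE_DEBUG_ABBREV)

def ctf_growth(ctf, new_string_bytes):
    """Growth of .ctf over the baseline, excluding newly added string bytes."""
    return (ctf - BASE_CTF) - new_string_bytes

# (description, .debug_types, .debug_abbrev, .ctf, new string bytes)
cases = [
    ("new pointer member", 589, 331, 980, 2),
    ("new enum value",     582, 323, 976, 2),
    ("new uint member",    589, 331, 980, 2),
    ("new union",          666, 329, 1021, 7),
]

results = [(name, dwarf_growth(t, a), ctf_growth(c, s)) for name, t, a, c, s in cases]
for name, dwarf, ctf in results:
    print(f"{name:20s} DWARF +{dwarf:3d}  CTF +{ctf:3d}")
```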

Indu.

3. Type Information retrieval and handling:
     CTF type information is organized as a linear array of CTF types. CTF types
     have references to other CTF types. libctf facilitates name lookups, i.e.
     given the name of the type, get the type information.

     DWARF type information is organized in a tree of DIEs. The information at
     the leaf DIEs (base types) across DWARF type units is often duplicated.
     DWARF type units do have references to other type units for larger types
     though. In the example, the DWARF type unit for node has a reference to the
     DWARF type unit for node_payload.

     I only state the above for sake of observation, I don't know for certain if
     one format is necessarily better or worse for consumers of type debug
     information at this time WRT runtime access patterns.

     On a related note though, it's not clear to me how .debug_types integration
     with split-dwarf works out. If the linker does not see the
     non-relocation-necessary part of the DWARF, I am not sure how .debug_types
     type units are de-duplicated when using split-dwarf.
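
     The contrast in point 3 between a linear array of types and per-unit trees
     can be illustrated with a toy model. This is not the libctf API nor any
     DWARF reader's API; FlatType, lookup_by_name and the dictionary layout are
     hypothetical, and the model only captures what is stated above: 'unsigned
     int' appears in both the node and node_payload type units, and node refers
     to node_payload.

```python
from dataclasses import dataclass, field

@dataclass
class FlatType:
    """One entry in a CTF-like linear type array (toy model, not libctf)."""
    name: str
    kind: str
    refs: list = field(default_factory=list)  # indexes into the same array

# CTF-like: a single array; 'unsigned int' is stored once and referenced by index.
flat_types = [
    FlatType("unsigned int", "base"),         # index 0
    FlatType("node_payload", "struct", [0]),  # index 1
    FlatType("node", "struct", [0, 1]),       # index 2
]

def lookup_by_name(types, name):
    """Return (index, type) for the first type with this name, or None."""
    for index, t in enumerate(types):
        if t.name == name:
            return index, t
    return None

# Type-unit-like: each unit is its own tree with its own base-type leaf;
# larger types are reached through references to other units.
type_units = {
    "node_payload": {"base_leaves": ["unsigned int"], "unit_refs": []},
    "node":         {"base_leaves": ["unsigned int"], "unit_refs": ["node_payload"]},
}

flat_copies = sum(1 for t in flat_types if t.name == "unsigned int")
unit_copies = sum(u["base_leaves"].count("unsigned int") for u in type_units.values())

print("copies of 'unsigned int' in flat array:", flat_copies)
print("copies of 'unsigned int' across units: ", unit_copies)
print("lookup 'node':", lookup_by_name(flat_types, "node"))
```

     The model says nothing about runtime access cost; it only shows where the
     duplicated leaf comes from.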

Thanks
Indu
