Re: Type representation in CTF and DWARF

Jason Merrill Mon, 07 Oct 2019 13:57:17 -0700

On Mon, Oct 7, 2019 at 4:47 PM Indu Bhagat <indu.bha...@oracle.com> wrote:


> On 10/07/2019 12:35 AM, Richard Biener wrote:
> > On Fri, Oct 4, 2019 at 9:12 PM Indu Bhagat <indu.bha...@oracle.com> wrote:
> >> Hello,
> >>
> >> At GNU Tools Cauldron this year, some folks were curious to know more on how
> >> the "type representation" in CTF compares vis-a-vis DWARF.
> >>
> >> [...]
> >>
> >> So, for the small C testcase with a union, enum, array, struct, typedef etc., I
> >> see the following sizes:
> >>
> >> Compile with -fdebug-types-section -gdwarf-4 (size -A <binary> excerpt):
> >>       .debug_aranges     48         0
> >>       .debug_info       150         0
> >>       .debug_abbrev     314         0
> >>       .debug_line        73         0
> >>       .debug_str        455         0
> >>       .debug_ranges      32         0
> >>       .debug_types      578         0
> >>
> >> Compile with -fdebug-types-section -gdwarf-5 (size -A <binary> excerpt):
> >>       .debug_aranges      48         0
> >>       .debug_info        732         0
> >>       .debug_abbrev      309         0
> >>       .debug_line         73         0
> >>       .debug_str         455         0
> >>       .debug_rnglists     23         0
> >>
> >> Compile with -gt (size -A <binary> excerpt):
> >>       .ctf      966     0
> >>       CTF strings sub-section size (ctf_strlen in disassembly) = 374
> >>       ==> CTF section just for representing types = 966 - 374 = 592 bytes
> >>       (The 592 bytes include the CTF header and other indexes etc.)
> >>
> >> So, the following points are what I would highlight. Hopefully this helps you see
> >> that CTF has promise for the task of representing type debug info.
> >>
> >> 1. Type Information layout in sections:
> >>      A .ctf section is self-sufficient to represent types in a program. All
> >>      references within the CTF section are via either indexes or offsets into the
> >>      CTF section. No relocations are necessary in CTF at this time. In contrast,
> >>      DWARF type information is organized in multiple sections - .debug_info,
> >>      .debug_abbrev and .debug_str sections in DWARF5; plus .debug_types in DWARF4.
> >>
> >> 2. Type Information encoding / compactness matters:
> >>      Because the type information is organized across sections in DWARF (and
> >>      contains some debug information like location etc.), it is not feasible
> >>      to put a distinct number to the size in bytes for representing type
> >>      information in DWARF. But the size info of sections shown above should
> >>      be helpful to show that CTF does show promise in compactly representing
> >>      types.
> >>
> >>      Let's see some size data. The CTF string table (= 374 bytes) is left out of the
> >>      discussion at hand because it would not be fair to compare it with the .debug_str
> >>      section, which contains other information than just names of types.
> >>
> >>      The 592 bytes of the .ctf section are needed to represent types in CTF
> >>      format. Now, when using DWARF5, the type information needs 732 bytes in
> >>      .debug_info and 309 bytes in .debug_abbrev.
> >>
> >>      In DWARF (when using -fdebug-types-section), the base types are duplicated
> >>      across type units. So for the above example, the DWARF DIE representing
> >>      'unsigned int' will appear in both the DWARF trees for types - node and
> >>      node_payload. In CTF, there is a single lone type 'unsigned int'.
> > It's not clear to me why you are using -fdebug-types-section for this
> > comparison?
> > With just -gdwarf-4 I get
> >
> > .debug_info      292
> > .debug_abbrev    189
> > .debug_str       299
> >
> > this contains all the info CTF provides (and more).  This sums to 780 bytes,
> > smaller than the CTF variant.  I skimmed over the info and there's not much
> > to strip to get to CTF levels, mainly locations.  The strings section also
> > has a quite large portion for GCC version and arguments, which is 93 bytes.
> > So overall the DWARF representation should clock in at less than 700 bytes,
> > closer to 650.
> >
> > Richard.
>
> It's not in favor of DWARF to go with just -gdwarf-4, because the types
> in the .debug_info section will not be de-duplicated. For more complicated code
> bases with many compilation units, this will skew the results in favor of CTF
> (once the CTF de-duplicator is ready :) ).
>
> Now, one might argue that in this example, there is no role for a de-duplicator.
> Yes to that. But to all users of DWARF type debug information for _real
> codebases_, the -fdebug-types-section option is the best option. Isn't it?
>
> Keeping "the size of type debug information in the shipped artifact small" as
> our target is meaningful for both CTF and DWARF.
>
> De-duplication is a key contributor to reducing the size of the type debug
> information; and both CTF and DWARF types can be de-duplicated. At this time, I
> stuck to a simple example with one CU because it eases interpreting the CTF and
> DWARF debug info in the binaries and because the CTF link-time de-duplication
> is not fully ready.
>
> (NickA suggested a few days ago to compare how DWARF and CTF section sizes
>   increase when a new member, or a new enum, or a new union etc. are added. I can
>   share some more data if there is interest in such a comparison. A few examples
>   below:
>
> 1. Add a new member 'struct node_payload * a' to struct node_payload
>     DWARF = 589 - 578 (.debug_types); 331 - 314 (.debug_abbrev); total = 11 + 17 = 28
>     CTF = 980 - 966 (.ctf); string bytes increase = 2 ("a\0"); total = 14 - 2 = 12
> 2. Add a new enumeration value 'A = 5,' to enum node_type
>     DWARF = 582 - 578 (.debug_types); 323 - 314 (.debug_abbrev); total = 4 + 9 = 13
>     CTF = 976 - 966 (.ctf); string bytes increase = 2 ("A\0"); total = 10 - 2 = 8
> 3. Add new member 'unsigned int a' to struct node_payload
>     DWARF = 589 - 578 (.debug_types); 331 - 314 (.debug_abbrev); total = 11 + 17 = 28
>     CTF = 980 - 966 (.ctf); string bytes increase = 2; total = 14 - 2 = 12
> 4. Add new union nu2 to struct node (n2 mirrors nu; all new strings = "a", "b", "n2")
>     DWARF = 666 - 578 (.debug_types); 329 - 314 (.debug_abbrev); total = 88 + 15 = 103
>     CTF = 1021 - 966 (.ctf); string bytes increase = 7; total = 55 - 7 = 48
>
> The larger "issue" is that both CTF and DWARF have some paraphernalia in the
> form of header, indexes, section/sub-section references etc., which are something
> of a necessary evil and complicate such a comparison. So comparing section sizes
> with user-level compilation options and the size utility has its merit. My opinion
> is still to stick with using -fdebug-types-section even for this alternative way
> of comparison.)
>

If you're planning on using a separate tool for de-duplication, it would
also be good to consider the size of the DWARF without -fdebug-types-section,
but using dwz for de-duplication.

https://sourceware.org/dwz/


> >
> >> 3. Type Information retrieval and handling:
> >>      CTF type information is organized as a linear array of CTF types. CTF types
> >>      have references to other CTF types. libctf facilitates name lookups, i.e.
> >>      given the name of the type, get the type information.
> >>
> >>      DWARF type information is organized in a tree of DIEs. The information at
> >>      the leaf DIEs (base types) across DWARF type units is often duplicated.
> >>      DWARF type units do have references to other type units for larger types
> >>      though. In the example, the DWARF type unit for node has a reference to the
> >>      DWARF type unit for node_payload.
> >>
> >>      I only state the above for sake of observation; I don't know for certain if
> >>      one format is necessarily better or worse for consumers of type debug
> >>      information at this time WRT runtime access patterns.
> >>
> >>      On a related note though, it's not clear to me how .debug_types integration
> >>      with split-dwarf works out. If the linker does not see the
> >>      non-relocation-necessary part of the DWARF, I am not sure how .debug_types type
> >>      units are de-duplicated when using split-dwarf.
> >>
> >> Thanks
> >> Indu
> >>
>
>
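
For readers who want to re-check the arithmetic quoted in this thread, here is a minimal sketch that only re-adds the section sizes reported above. The names (DWARF4_PLAIN, total, growth and so on) are made up for illustration and do not come from GCC, binutils or libctf. String tables are left out of the type-unit totals, as in the original comparison.

```python
# Section sizes in bytes, copied from the figures quoted in this thread.
# All names below are illustrative assumptions, not part of any tool.
DWARF4_TYPE_UNITS = {".debug_info": 150, ".debug_abbrev": 314, ".debug_types": 578}
DWARF5_TYPE_UNITS = {".debug_info": 732, ".debug_abbrev": 309}
DWARF4_PLAIN = {".debug_info": 292, ".debug_abbrev": 189, ".debug_str": 299}

CTF_SECTION = 966      # whole .ctf section
CTF_STRINGS = 374      # CTF string table (ctf_strlen)
PRODUCER_STRING = 93   # GCC version and arguments stored in .debug_str


def total(sections):
    """Sum the sizes of the given sections."""
    return sum(sections.values())


def ctf_type_bytes(section=CTF_SECTION, strings=CTF_STRINGS):
    """CTF bytes that remain once the string table is subtracted."""
    return section - strings


def growth(before, after, string_bytes=0):
    """Size increase of a section, excluding any newly added string bytes."""
    return after - before - string_bytes


if __name__ == "__main__":
    print("CTF without strings:", ctf_type_bytes())
    print("DWARF4 with type units, no strings:", total(DWARF4_TYPE_UNITS))
    print("DWARF5 with type units, no strings:", total(DWARF5_TYPE_UNITS))
    print("Plain -gdwarf-4, with strings:", total(DWARF4_PLAIN))
    print("Plain -gdwarf-4, minus producer string:",
          total(DWARF4_PLAIN) - PRODUCER_STRING)
    # Example 4 from the thread: adding a new union to struct node.
    print("DWARF growth:", growth(578, 666) + growth(314, 329))
    print("CTF growth:", growth(966, 1021, string_bytes=7))
```

The sketch says nothing about sizes after de-duplication with dwz or the CTF linker, since no such measurements appear in the thread.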
