On 10/07/2019 12:35 AM, Richard Biener wrote:
On Fri, Oct 4, 2019 at 9:12 PM Indu Bhagat <indu.bha...@oracle.com> wrote:
Hello,
At GNU Tools Cauldron this year, some folks were curious to know more about
how the "type representation" in CTF compares with DWARF.
[...]
So, for the small C testcase with a union, enum, array, struct, typedef, etc.,
I see the following sizes:
Compile with -fdebug-types-section -gdwarf-4 (size -A <binary> excerpt):
.debug_aranges 48 0
.debug_info 150 0
.debug_abbrev 314 0
.debug_line 73 0
.debug_str 455 0
.debug_ranges 32 0
.debug_types 578 0
Compile with -fdebug-types-section -gdwarf-5 (size -A <binary> excerpt):
.debug_aranges 48 0
.debug_info 732 0
.debug_abbrev 309 0
.debug_line 73 0
.debug_str 455 0
.debug_rnglists 23 0
Compile with -gt (size -A <binary> excerpt):
.ctf 966 0
CTF strings sub-section size (ctf_strlen in disassembly) = 374
==> CTF section just for representing types = 966 - 374 = 592 bytes
(The 592 bytes include the CTF header and other indexes, etc.)
So, these are the points I would highlight. Hopefully they help you see that
CTF has promise for the task of representing type debug info.
1. Type Information layout in sections:
A .ctf section is self-sufficient for representing the types in a program.
All references within the CTF section are either indexes or offsets into the
CTF section itself. No relocations are necessary in CTF at this time. In
contrast, DWARF type information is spread across multiple sections:
.debug_info, .debug_abbrev and .debug_str in DWARF5, plus .debug_types in
DWARF4.
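As a rough illustration of what "references via indexes or offsets" buys,
here is a toy Python model. It is not the real CTF binary encoding: the
`ToyType` fields (`kind`, `name_off`, `ref`) and the sample types are
hypothetical. Each type names itself by an offset into a string table held in
the same blob and points at other types by array index, so nothing in the data
is an address that a linker would have to relocate.

```python
from dataclasses import dataclass

# Toy model of a self-contained type section; NOT the real CTF encoding.
# Names live once in a NUL-separated string table; a type refers to its
# name by byte offset and to another type by array index, so no address
# (and hence nothing needing a relocation) appears anywhere in the data.
STRTAB = b"\0unsigned int\0uint_t\0"


@dataclass(frozen=True)
class ToyType:
    kind: str       # hypothetical kind tag: "int", "typedef", "pointer", ...
    name_off: int   # byte offset of the name in STRTAB; 0 means anonymous
    ref: int = -1   # index of the referenced type in TYPES; -1 means none


TYPES = [
    ToyType("int", 1),              # 0: unsigned int
    ToyType("typedef", 14, ref=0),  # 1: uint_t -> unsigned int
    ToyType("pointer", 0, ref=1),   # 2: anonymous pointer -> uint_t
]


def name_of(t: ToyType) -> str:
    """Decode the NUL-terminated name that starts at t.name_off."""
    end = STRTAB.index(b"\0", t.name_off)
    return STRTAB[t.name_off:end].decode()


def resolve(type_id: int) -> list:
    """Follow ref indexes from type_id until a type with no reference."""
    chain = []
    while type_id != -1:
        t = TYPES[type_id]
        chain.append(f"{t.kind} {name_of(t)}".rstrip())
        type_id = t.ref
    return chain


print(resolve(2))  # ['pointer', 'typedef uint_t', 'int unsigned int']
```

Because both kinds of reference are relative to the blob's own contents, it
could be copied or mapped anywhere and still be read without fixups.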
2. Type Information encoding / compactness matters:
Because DWARF type information is spread across sections (which also carry
other debug information, such as locations), it is not feasible to attribute
an exact byte count to type information alone in DWARF. But the section sizes
shown above should help show that CTF is promising for representing types
compactly.
Let's look at some size data. The CTF string table (374 bytes) is left out of
the discussion at hand because it would not be fair to compare it with the
.debug_str section, which contains more than just the names of types.
The remaining 592 bytes of the .ctf section are what is needed to represent
the types in CTF format. By comparison, when using DWARF5, the type
information needs 732 bytes in .debug_info plus 309 bytes in .debug_abbrev,
1041 bytes in total.
In DWARF (when using -fdebug-types-section), the base types are duplicated
across type units. So in the above example, the DWARF DIE representing
'unsigned int' appears in both DWARF type trees, node and node_payload. In
CTF, there is a single 'unsigned int' type.
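The duplication point can be sketched with a toy count. This is a simplified
model rather than either real format, and the base types listed for `node`
and `node_payload` are hypothetical stand-ins for the testcase, which is not
shown here. Each DWARF-style type unit keeps its own copy of every base type
it uses, while a single CTF-style table interns each distinct base type once.

```python
# Toy illustration of base-type duplication; the base-type lists are
# hypothetical stand-ins, since the testcase source is not shown in the thread.
TYPE_UNITS = {
    "node": ["unsigned int", "int"],
    "node_payload": ["unsigned int", "char"],
}


def per_unit_copies(units):
    """Base-type entries needed when every type unit carries private copies."""
    return sum(len(set(bases)) for bases in units.values())


def interned_table(units):
    """One shared table: each distinct base type gets exactly one id."""
    table = {}
    for bases in units.values():
        for name in bases:
            table.setdefault(name, len(table))
    return table


print(per_unit_copies(TYPE_UNITS))      # 4: 'unsigned int' is stored twice
print(len(interned_table(TYPE_UNITS)))  # 3: 'unsigned int' is stored once
```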
It's not clear to me why you are using -fdebug-types-section for this
comparison.
With just -gdwarf-4 I get
.debug_info 292
.debug_abbrev 189
.debug_str 299
This contains all the info CTF provides (and more). It sums to 780 bytes,
smaller than the CTF variant (966 bytes). I skimmed over the info and there's
not much to strip to get to CTF levels, mainly locations. The strings section
also has a fairly large portion for the GCC version and arguments, 93 bytes.
So overall the DWARF representation should come in at less than 700 bytes,
closer to 650.
Richard.
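Richard's arithmetic for the plain -gdwarf-4 build can be checked directly.
The only inputs are the figures quoted above; the further reduction towards
650 bytes (dropping locations) is not quantified in the thread, so this sketch
stops at subtracting the 93 bytes of GCC version and arguments.

```python
# Byte counts quoted above for plain -gdwarf-4 and for the CTF (-gt) build.
DWARF4_SECTIONS = {".debug_info": 292, ".debug_abbrev": 189, ".debug_str": 299}
CTF_SECTION = 966        # whole .ctf section, string table included
GCC_PRODUCER_BYTES = 93  # GCC version/arguments share of .debug_str

dwarf_total = sum(DWARF4_SECTIONS.values())
dwarf_without_producer = dwarf_total - GCC_PRODUCER_BYTES

print(dwarf_total, dwarf_without_producer, CTF_SECTION)  # 780 687 966
```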
Going with just -gdwarf-4 does not work in DWARF's favor, because the types in
the .debug_info section will not be de-duplicated. For more complicated code
bases with many compilation units, this would skew the results in favor of
CTF (once the CTF de-duplicator is ready :) ).
Now, one might argue that in this example there is no role for a
de-duplicator. Yes to that. But for all users of DWARF type debug information
on _real codebases_, the -fdebug-types-section option is the best choice,
isn't it?
Keeping "the size of type debug information in the shipped artifact small" as
our target is meaningful for both CTF and DWARF.
De-duplication is a key contributor to reducing the size of type debug
information, and both CTF and DWARF types can be de-duplicated. For now, I have
stuck to a simple example with one CU because it makes the CTF and DWARF debug
info in the binaries easier to interpret, and because CTF link-time
de-duplication is not fully ready yet.
(NickA suggested a few days ago comparing how the DWARF and CTF section sizes
grow when a new member, enum, union, etc. is added. I can share more data if
there is interest in such a comparison. A few examples below:
1. Add a new member 'struct node_payload * a' to struct node_payload
   DWARF = 589 - 578 (.debug_types); 331 - 314 (.debug_abbrev); total = 11 + 17 = 28
   CTF = 980 - 966 (.ctf); string bytes increase = 2 ("a\0"); total = 14 - 2 = 12
2. Add a new enumeration value 'A = 5,' to enum node_type
   DWARF = 582 - 578 (.debug_types); 323 - 314 (.debug_abbrev); total = 4 + 9 = 13
   CTF = 976 - 966 (.ctf); string bytes increase = 2 ("A\0"); total = 10 - 2 = 8
3. Add a new member 'unsigned int a' to struct node_payload
   DWARF = 589 - 578 (.debug_types); 331 - 314 (.debug_abbrev); total = 11 + 17 = 28
   CTF = 980 - 966 (.ctf); string bytes increase = 2 ("a\0"); total = 14 - 2 = 12
4. Add a new union nu2 to struct node (n2 mirrors nu; all new strings = "a", "b", "n2")
   DWARF = 666 - 578 (.debug_types); 329 - 314 (.debug_abbrev); total = 88 + 15 = 103
   CTF = 1021 - 966 (.ctf); string bytes increase = 7; total = 55 - 7 = 48
The larger "issue" is that both CTF and DWARF carry some paraphernalia in the
form of headers, indexes, section/sub-section references, etc., which are a
somewhat necessary evil and complicate such a comparison. So comparing section
sizes using user-level compilation options and the size utility has its merit.
My opinion is still to stick with -fdebug-types-section even for this
alternative way of comparison.)
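The per-change growth in the list above can be recomputed from the raw section
sizes. Following the method used there, only .debug_types and .debug_abbrev
are counted on the DWARF side, and the new string-table bytes are subtracted
on the CTF side; the dictionary keys are just descriptive labels.

```python
# Before/after section sizes (bytes) from the four experiments listed above.
BASE = {"debug_types": 578, "debug_abbrev": 314, "ctf": 966}
EXPERIMENTS = {
    # label: (sizes after the change, new CTF string-table bytes)
    "member 'struct node_payload * a'": (
        {"debug_types": 589, "debug_abbrev": 331, "ctf": 980}, 2),
    "enumerator 'A = 5'": (
        {"debug_types": 582, "debug_abbrev": 323, "ctf": 976}, 2),
    "member 'unsigned int a'": (
        {"debug_types": 589, "debug_abbrev": 331, "ctf": 980}, 2),
    "union nu2 in struct node": (
        {"debug_types": 666, "debug_abbrev": 329, "ctf": 1021}, 7),
}


def growth(after, new_string_bytes):
    """Return (DWARF growth, CTF growth) in bytes, excluding string data."""
    dwarf = ((after["debug_types"] - BASE["debug_types"])
             + (after["debug_abbrev"] - BASE["debug_abbrev"]))
    ctf = (after["ctf"] - BASE["ctf"]) - new_string_bytes
    return dwarf, ctf


for change, (after, strings) in EXPERIMENTS.items():
    dwarf, ctf = growth(after, strings)
    print(f"{change}: DWARF +{dwarf}, CTF +{ctf}")
```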
Indu.
3. Type Information retrieval and handling:
CTF type information is organized as a linear array of CTF types, which
reference other CTF types. libctf facilitates name lookups, i.e., given the
name of a type, it returns the type information.
DWARF type information is organized in a tree of DIEs. The information at the
leaf DIEs (base types) is often duplicated across DWARF type units. DWARF type
units do, though, reference other type units for larger types. In the
example, the DWARF type unit for node has a reference to the DWARF type unit
for node_payload.
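libctf's own lookup functions are what a real consumer would call; the sketch
below is not libctf code and does not mirror its API. Under a simplified,
hypothetical entry layout of (kind, name offset, referenced index), it shows
that when all types sit in one flat array beside a string table, a
name-to-type index can be built in a single pass over the array.

```python
# Toy name -> type-id index over a linear type array; not libctf's code or API.
STR_TABLE = b"\0unsigned int\0node_payload\0node\0"
# Each entry: (kind, name offset into STR_TABLE, referenced type index or -1).
TYPE_ARRAY = [
    ("int", 1, -1),      # 0: unsigned int
    ("struct", 14, -1),  # 1: struct node_payload
    ("struct", 27, -1),  # 2: struct node
    ("pointer", 0, 1),   # 3: anonymous pointer to struct node_payload
]


def build_name_index(types, strtab):
    """One pass over the array maps each named type's C spelling to its id."""
    index = {}
    for type_id, (kind, name_off, _ref) in enumerate(types):
        if name_off == 0:
            continue  # anonymous types (e.g. pointers) cannot be looked up by name
        end = strtab.index(b"\0", name_off)
        name = strtab[name_off:end].decode()
        key = f"struct {name}" if kind == "struct" else name
        index[key] = type_id
    return index


NAME_INDEX = build_name_index(TYPE_ARRAY, STR_TABLE)
print(NAME_INDEX["struct node"])  # 2
```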
I only state the above for the sake of observation; I don't know for certain
whether either format is better or worse for consumers of type debug
information with respect to runtime access patterns at this time.
On a related note, though, it's not clear to me how .debug_types integration
with split-dwarf works out. If the linker does not see the
non-relocation-necessary part of the DWARF, I am not sure how .debug_types
type units are de-duplicated when using split-dwarf.
Thanks
Indu