On Mon, Oct 7, 2019 at 4:47 PM Indu Bhagat <indu.bha...@oracle.com> wrote:
> On 10/07/2019 12:35 AM, Richard Biener wrote:
> > On Fri, Oct 4, 2019 at 9:12 PM Indu Bhagat <indu.bha...@oracle.com> wrote:
> >> Hello,
> >>
> >> At GNU Tools Cauldron this year, some folks were curious to know more on
> >> how the "type representation" in CTF compares vis-a-vis DWARF.
> >>
> >> [...]
> >>
> >> So, for the small C testcase with a union, enum, array, struct, typedef
> >> etc, I see following sizes :
> >>
> >> Compile with -fdebug-types-section -gdwarf-4 (size -A <binary> excerpt):
> >>  .debug_aranges      48    0
> >>  .debug_info        150    0
> >>  .debug_abbrev      314    0
> >>  .debug_line         73    0
> >>  .debug_str         455    0
> >>  .debug_ranges       32    0
> >>  .debug_types       578    0
> >>
> >> Compile with -fdebug-types-section -gdwarf-5 (size -A <binary> excerpt):
> >>  .debug_aranges      48    0
> >>  .debug_info        732    0
> >>  .debug_abbrev      309    0
> >>  .debug_line         73    0
> >>  .debug_str         455    0
> >>  .debug_rnglists     23    0
> >>
> >> Compile with -gt (size -A <binary> excerpt):
> >>  .ctf               966    0
> >>  CTF strings sub-section size (ctf_strlen in disassembly) = 374
> >>  ==> CTF section just for representing types = 966 - 374 = 592 bytes
> >>  (The 592 bytes include the CTF header and other indexes etc.)
> >>
> >> So, following points are what I would highlight. Hopefully this helps you
> >> see that CTF has promise for the task of representing type debug info.
> >>
> >> 1. Type Information layout in sections:
> >>    A .ctf section is self-sufficient to represent types in a program. All
> >>    references within the CTF section are via either indexes or offsets
> >>    into the CTF section. No relocations are necessary in CTF at this time.
> >>    In contrast, DWARF type information is organized in multiple sections -
> >>    .debug_info, .debug_abbrev and .debug_str sections in DWARF5; plus
> >>    .debug_types in DWARF4.
> >>
> >> 2. Type Information encoding / compactness matters:
> >>    Because the type information is organized across sections in DWARF (and
> >>    contains some debug information like location etc.), it is not feasible
> >>    to put a distinct number to the size in bytes for representing type
> >>    information in DWARF. But the size info of sections shown above should
> >>    be helpful to show that CTF does show promise in compactly representing
> >>    types.
> >>
> >>    Let's see some size data. CTF string table (= 374 bytes) is left out of
> >>    the discussion at hand because it will not be fair to compare with
> >>    .debug_str section which contains other information than just names of
> >>    types.
> >>
> >>    The 592 bytes of the .ctf section are needed to represent types in CTF
> >>    format. Now, when using DWARF5, the type information needs 732 bytes in
> >>    .debug_info and 309 bytes in .debug_abbrev.
> >>
> >>    In DWARF (when using -fdebug-types-section), the base types are
> >>    duplicated across type units. So for the above example, the DWARF DIE
> >>    representing 'unsigned int' will appear in both the DWARF trees for
> >>    types - node and node_payload. In CTF, there is a single lone type
> >>    'unsigned int'.
> >
> > It's not clear to me why you are using -fdebug-types-section for this
> > comparison?
> > With just -gdwarf-4 I get
> >
> >  .debug_info      292
> >  .debug_abbrev    189
> >  .debug_str       299
> >
> > this contains all the info CTF provides (and more). This sums to 780 bytes,
> > smaller than the CTF variant. I skimmed over the info and there's not much
> > to strip to get to CTF levels, mainly locations. The strings section also
> > has a quite large portion for GCC version and arguments, which is 93 bytes.
> > So overall the DWARF representation should clock in at less than 700 bytes,
> > more close to 650.
> >
> > Richard.
>
> It's not in favor of DWARF to go with just -gdwarf-4. Because the types
> in the .debug_info section will not be de-duplicated. For more complicated
> code bases with many compilation units, this will skew the results in favor
> of CTF (once the CTF de-duplicator is ready :) ).
>
> Now, one might argue that in this example, there is no role for
> de-duplicator. Yes to that. But to all users of DWARF type debug information
> for _real codebases_, -fdebug-types-section option is the best option.
> Isn't it ?
>
> Keeping "the size of type debug information in the shipped artifact small"
> as our target is meaningful for both CTF and DWARF.
>
> De-duplication is a key contributor to reducing the size of the type debug
> information; and both CTF and DWARF types can be de-duplicated. At this
> time, I stuck to a simple example with one CU because it eases interpreting
> the CTF and DWARF debug info in the binaries and because the CTF link-time
> de-duplication is not fully ready.
>
> (NickA suggested few days ago to compare how DWARF and CTF section sizes
> increase when a new member, or a new enum, or a new union etc are added.
> I can share some more data if there is interest in such a comparison. Few
> examples below :
>
> 1. Add a new member 'struct node_payload * a' to struct node_payload
>    DWARF = 589 - 578 (.debug_types); 331 - 314 (.debug_abbrev);
>            total = 11 + 17 = 28
>    CTF   = 980 - 966 (.ctf); string bytes increase = 2 ("a\0");
>            total = 14 - 2 = 12
> 2. Add a new enumeration value 'A = 5,' to enum node_type
>    DWARF = 582 - 578 (.debug_types); 323 - 314 (.debug_abbrev);
>            total = 4 + 9 = 13
>    CTF   = 976 - 966 (.ctf); string bytes increase = 2 ("a\0");
>            total = 8
> 3. Add new member 'unsigned int a' to struct node_payload
>    DWARF = 589 - 578 (.debug_types); 331 - 314 (.debug_abbrev);
>            total = 11 + 17 = 28
>    CTF   = 980 - 966 (.ctf); string bytes increase = 2;
>            total = 14 - 2 = 12
> 4. Add new union nu2 to struct node (n2 mirrors nu; all new strings = "a",
>    "b", "n2")
>    DWARF = 666 - 578 (.debug_types); 329 - 314 (.debug_abbrev);
>            total = 88 + 15 = 103
>    CTF   = 1021 - 966 (.ctf); string bytes increase = 7;
>            total = 55 - 7 = 48
>
> The larger "issue" is that both CTF and DWARF have some paraphernalia in
> the form of header, indexes, section/sub-section references etc. which are
> somewhat necessary evil; and complicate such a comparison. So comparing
> section sizes with user-level compilation options and size utility has its
> merit. My opinion is still to stick with using -fdebug-types-section even
> for this alternative way of comparison.)
>

If you're planning on using a separate tool for de-duplication, it would
also be good to consider the size of the DWARF without
-fdebug-types-section, but using dwz for de-duplication.

https://sourceware.org/dwz/

> >
> >> 3. Type Information retrieval and handling:
> >>    CTF type information is organized as a linear array of CTF types. CTF
> >>    types have references to other CTF types. libctf facilitates name
> >>    lookups, i.e. given the name of the type, get the type information.
> >>
> >>    DWARF type information is organized in a tree of DIEs. The information
> >>    at the leaf DIEs (base types) across DWARF type units is often
> >>    duplicated. DWARF type units do have references to other type units for
> >>    larger types though. In the example, the DWARF type unit for node has a
> >>    reference to the DWARF type unit for node_payload.
> >>
> >>    I only state the above for sake of observation, I don't know for
> >>    certain if one format is necessarily better or worse for consumers of
> >>    type debug information at this time WRT runtime access patterns.
> >>
> >>    On a related note though, it's not clear to me how .debug_types
> >>    integration with split-dwarf works out. If the linker does not see the
> >>    non-relocation-necessary part of the DWARF, I am not sure how
> >>    .debug_types type units are de-duplicated when using split-dwarf.
> >>
> >> Thanks
> >> Indu
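
For anyone who wants to re-check the arithmetic quoted in this thread, here
is a minimal Python sketch. The byte counts are copied from the messages
above. The names (BASE, EXPERIMENTS, growth, ctf_type_bytes) are hypothetical
and exist only for this illustration. Subtracting the new string bytes from
the CTF growth follows the convention Indu used, which is an assumption of
the sketch and not an agreed methodology.

```python
# Baseline section sizes in bytes, taken from the quoted `size -A` excerpts
# for the -fdebug-types-section -gdwarf-4 build and the -gt (CTF) build.
BASE = {".debug_types": 578, ".debug_abbrev": 314, ".ctf": 966}

# (label, .debug_types after, .debug_abbrev after, .ctf after, new string bytes)
# One row per experiment listed in the quoted message.
EXPERIMENTS = [
    ("add 'struct node_payload * a' member", 589, 331, 980, 2),
    ("add enumeration value 'A = 5'", 582, 323, 976, 2),
    ("add 'unsigned int a' member", 589, 331, 980, 2),
    ("add new union to struct node", 666, 329, 1021, 7),
]


def ctf_type_bytes(ctf_total, ctf_strings):
    """CTF bytes left for types once the string table is excluded."""
    return ctf_total - ctf_strings


def growth(types_after, abbrev_after, ctf_after, new_string_bytes):
    """Return (DWARF growth, CTF growth) in bytes relative to BASE.

    DWARF growth sums the .debug_types and .debug_abbrev increases.
    CTF growth is the .ctf increase minus the newly added string bytes.
    """
    dwarf = (types_after - BASE[".debug_types"]) + (
        abbrev_after - BASE[".debug_abbrev"]
    )
    ctf = (ctf_after - BASE[".ctf"]) - new_string_bytes
    return dwarf, ctf


if __name__ == "__main__":
    print("CTF bytes for types:", ctf_type_bytes(966, 374))
    for label, types_after, abbrev_after, ctf_after, strings in EXPERIMENTS:
        dwarf, ctf = growth(types_after, abbrev_after, ctf_after, strings)
        print(f"{label}: DWARF +{dwarf}, CTF +{ctf}")
```

The same helper could be fed sizes from a dwz-processed binary to extend the
comparison, provided the baseline numbers are re-measured for that build.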