Hello,
At GNU Tools Cauldron this year, some folks were curious to know more about how
the "type representation" in CTF compares with that in DWARF.
I used the small testcase below to gather some numbers to help drive this
discussion.
[ibhagat@ibhagatpc ctf-size]$ cat ctf_sizeme.c
#define MAX_NUM_MSGS 5
enum node_type
{
  INIT_TYPE = 0,
  COMM_TYPE = 1,
  COMP_TYPE = 2,
  MSG_TYPE = 3,
  RELEASE_TYPE = 4,
  MAX_NODE_TYPE
};
typedef struct node_payload
{
  unsigned short npay_offset;
  const char * npay_msg;
  unsigned int npay_nelems;
  struct node_payload * npay_next;
} node_payload;
typedef struct node_property
{
  int timestamp;
  char category;
  long initvalue;
} node_property_t;
typedef struct node
{
  enum node_type ntype;
  int nmask:5;
  union
  {
    struct node_payload * npayload;
    void * nbase;
  } nu;
  unsigned int msgs[MAX_NUM_MSGS];
  node_property_t node_prop;
} Node;
Node s;
int main (void)
{
  return 0;
}
Note that in this case, there is nothing for the de-duplicator to do (neither
for the type unit COMDAT sections nor for the CTF types). I chose such an
example because de-duplication of types is orthogonal to how the types
themselves are represented.
So, for this small C testcase with a union, an enum, an array, structs,
typedefs etc., I see the following sizes:
Compile with -fdebug-types-section -gdwarf-4 (size -A <binary> excerpt):
.debug_aranges 48 0
.debug_info 150 0
.debug_abbrev 314 0
.debug_line 73 0
.debug_str 455 0
.debug_ranges 32 0
.debug_types 578 0
Compile with -fdebug-types-section -gdwarf-5 (size -A <binary> excerpt):
.debug_aranges 48 0
.debug_info 732 0
.debug_abbrev 309 0
.debug_line 73 0
.debug_str 455 0
.debug_rnglists 23 0
Compile with -gt (size -A <binary> excerpt):
.ctf 966 0
CTF strings sub-section size (ctf_strlen in the disassembly) = 374
==> CTF section just for representing types = 966 - 374 = 592 bytes
(The 592 bytes include the CTF header and other indexes etc.)
So, the following are the points I would highlight. Hopefully they help show
that CTF has promise for the task of representing type debug info.
1. Type Information layout in sections:
A .ctf section is self-sufficient for representing the types in a program. All
references within the CTF section are either indexes or offsets into the CTF
section itself, so no relocations are necessary in CTF at this time. In
contrast, DWARF type information is spread across multiple sections:
.debug_info, .debug_abbrev and .debug_str in DWARF5, plus .debug_types in
DWARF4.
2. Type Information encoding / compactness matters:
Because DWARF spreads type information across sections (and those sections
also contain other debug information, such as locations), it is not feasible
to put an exact number on the bytes DWARF spends on type information. But the
section sizes shown above should help show that CTF has promise for
representing types compactly.
Let's look at some size data. The CTF string table (374 bytes) is left out of
this discussion because it would not be fair to compare it with the
.debug_str section, which contains more than just the names of types.
The remaining 592 bytes of the .ctf section are what it takes to represent the
types in CTF format. In comparison, when using DWARF5, the type information
takes up part of the 732 bytes in .debug_info (which also holds the
compilation unit itself) and the 309 bytes in .debug_abbrev.
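The arithmetic above, together with the comparable DWARF totals, can be
written out as a small C sketch. The constants are simply the byte counts from
the size -A excerpts; the helper names are made up for illustration, and the
DWARF totals leave out .debug_str just as the CTF figure leaves out the CTF
string table.

```c
#include <assert.h>

/* Section sizes in bytes, copied from the size -A excerpts above. */
enum
{
  DWARF4_INFO = 150, DWARF4_TYPES = 578, DWARF4_ABBREV = 314,
  DWARF5_INFO = 732, DWARF5_ABBREV = 309,
  CTF_SECTION = 966, CTF_STRTAB = 374
};

/* CTF: whole .ctf section minus its string table (the CTF header and
   indexes are still included). */
int ctf_type_bytes (void) { return CTF_SECTION - CTF_STRTAB; }

/* DWARF5: type units share .debug_info with the compilation unit, so this
   total also counts some non-type DIEs. */
int dwarf5_info_abbrev_bytes (void) { return DWARF5_INFO + DWARF5_ABBREV; }

/* DWARF4: type units live in .debug_types instead of .debug_info. */
int
dwarf4_info_types_abbrev_bytes (void)
{
  return DWARF4_INFO + DWARF4_TYPES + DWARF4_ABBREV;
}
```

These work out to 592 bytes for CTF, 1041 bytes for DWARF5 (.debug_info +
.debug_abbrev) and 1042 bytes for DWARF4 (.debug_info + .debug_types +
.debug_abbrev); both DWARF totals also include the non-type compilation unit
DIEs.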
In DWARF (when using -fdebug-types-section), the base types are duplicated
across type units. So in the above example, the DWARF DIE representing
'unsigned int' appears in the type units for both node and node_payload. In
CTF, there is a single 'unsigned int' type.
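The difference in base-type handling can be sketched as one shared table that
stores each type once (CTF-like) versus one table per type unit (DWARF-like).
struct type_table and intern_type() are hypothetical, and types are reduced to
bare names for brevity.

```c
#include <assert.h>
#include <string.h>

#define MAX_TYPES 16

/* Hypothetical type table reduced to type names; the ID of a type is its
   index in names[]. */
struct type_table
{
  const char *names[MAX_TYPES];
  unsigned int count;
};

/* Return the ID of NAME in T, appending it only if it is not present yet,
   so each distinct type is stored exactly once per table. */
unsigned int
intern_type (struct type_table *t, const char *name)
{
  for (unsigned int i = 0; i < t->count; i++)
    if (strcmp (t->names[i], name) == 0)
      return i;
  assert (t->count < MAX_TYPES);
  t->names[t->count] = name;
  return t->count++;
}
```

Interning "unsigned int" for both node and node_payload into one shared table
yields a single entry with the same ID; doing the same with a separate table
per type unit yields two entries, one per unit.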
3. Type Information retrieval and handling:
CTF type information is organized as a linear array of CTF types, and CTF
types refer to other CTF types by type ID. libctf facilitates name lookups,
i.e., given the name of a type, it returns the type information.
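A rough sketch of a linear type array with name lookup and typedef resolution
follows. It does not use libctf (whose name-lookup entry point, as far as I
know, is ctf_lookup_by_name()); the sketch_type layout,
sketch_lookup_by_name() and sketch_resolve() are hypothetical stand-ins, and
the linear scan is only there to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical kinds and record layout; not libctf's data structures. */
enum sketch_kind { SK_INTEGER, SK_STRUCT, SK_TYPEDEF, SK_POINTER };

struct sketch_type
{
  const char *name;      /* NULL for anonymous types such as pointers */
  enum sketch_kind kind;
  long ref;              /* ID of the referenced type, or -1 for none */
};

/* A few types from ctf_sizeme.c as one linear array; ID == index. */
static const struct sketch_type sketch_types[] = {
  { "int", SK_INTEGER, -1 },                  /* ID 0 */
  { "struct node_property", SK_STRUCT, -1 },  /* ID 1 */
  { "node_property_t", SK_TYPEDEF, 1 },       /* ID 2 */
  { "struct node_payload", SK_STRUCT, -1 },   /* ID 3 */
  { NULL, SK_POINTER, 3 },                    /* ID 4: struct node_payload * */
};

#define SKETCH_NTYPES (sizeof sketch_types / sizeof sketch_types[0])

/* Return the ID of the type called NAME, or -1 if there is none.  A linear
   scan keeps the sketch short; a real library would rather use a hash. */
long
sketch_lookup_by_name (const char *name)
{
  for (size_t i = 0; i < SKETCH_NTYPES; i++)
    if (sketch_types[i].name != NULL
        && strcmp (sketch_types[i].name, name) == 0)
      return (long) i;
  return -1;
}

/* Follow typedef references until reaching a type that is not a typedef. */
long
sketch_resolve (long id)
{
  while (id >= 0 && sketch_types[id].kind == SK_TYPEDEF)
    id = sketch_types[id].ref;
  return id;
}
```

Looking up "node_property_t" gives ID 2, and resolving it follows the typedef
to ID 1, struct node_property.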
DWARF type information is organized as trees of DIEs. The information in the
leaf DIEs (base types) is often duplicated across DWARF type units. DWARF type
units do, however, refer to other type units for larger types: in the example,
the type unit for node has a reference to the type unit for node_payload.
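The cross-unit reference can be modelled roughly as below. In DWARF, a type
unit is identified by an 8-byte type signature, and DIEs elsewhere refer to it
with the DW_FORM_ref_sig8 form. The struct sketch_tu layout,
sketch_find_unit() and the signature values are hypothetical; real signatures
are derived from a hash of the type's description.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of two DWARF type units, reduced to what matters here. */
struct sketch_tu
{
  uint64_t signature;    /* 8-byte type signature identifying the unit */
  const char *type_name; /* the type the unit describes */
  uint64_t ref_sig8;     /* signature of a unit referred to with a
                            DW_FORM_ref_sig8-style reference; 0 = none
                            (a convention of this sketch only) */
};

/* Made-up signature values. */
static const struct sketch_tu sketch_units[] = {
  /* npay_next points back to node_payload itself, inside this unit. */
  { UINT64_C (0x1111111111111111), "node_payload", 0 },
  /* node's nu.npayload member points to struct node_payload, which is
     described in the other type unit, so it is referenced by signature. */
  { UINT64_C (0x2222222222222222), "node", UINT64_C (0x1111111111111111) },
};

#define SKETCH_NUNITS (sizeof sketch_units / sizeof sketch_units[0])

/* Resolve a signature reference by searching the loaded units; returns
   NULL if no unit carries that signature. */
const struct sketch_tu *
sketch_find_unit (uint64_t sig)
{
  for (size_t i = 0; i < SKETCH_NUNITS; i++)
    if (sketch_units[i].signature == sig)
      return &sketch_units[i];
  return NULL;
}
```

Following node's reference lands on the node_payload unit; a signature that
no loaded unit carries resolves to NULL.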
I state the above only as an observation; at this time, I don't know for
certain whether either format is better or worse for consumers of type debug
information with respect to runtime access patterns.
On a related note, it's not clear to me how .debug_types integrates with
split-dwarf. If the linker does not see the part of the DWARF that needs no
relocations, I am not sure how .debug_types type units are de-duplicated when
using split-dwarf.
Thanks
Indu