Re: Type representation in CTF and DWARF

Indu Bhagat Tue, 08 Oct 2019 22:27:13 -0700



On 10/08/2019 08:37 AM, Pedro Alves wrote:

On 10/4/19 8:23 PM, Indu Bhagat wrote:

Hello,

At GNU Tools Cauldron this year, some folks were curious to know more on how
the "type representation" in CTF compares vis-a-vis DWARF.

I was one of those, and I brought this up to Jose, after your
presentation.  Glad to see the follow up!  Thanks much for this.

In your Cauldron presentation we saw CTF compared to full blown DWARF
as justification for CTF,


Hmm. I thought I had made the effort required to clarify my position: comparing
full-blown DWARF sizes to type-only CTF section sizes is not appropriate, let
alone usable as a justification for CTF. My intention in showing those numbers
was only to give some perspective to users curious about the sizes of CTF debug
info (as generated by dwarf2ctf), because these sections will ideally not be
stripped out of shipped binaries.

The justification for CTF is, and will remain, that it is a compact, faster
debug format for type information, and that it can support some online
debugging use-cases (like backtraces) in the future.

but I was more interested in a comparison between
CTF and a DWARF subset containing exactly only what you have available in
CTF.  Because if DWARF with everything-you-don't-need stripped out
is in the same ballpark, then I am puzzled on why add/maintain a new
Debug format, with all the duplication of effort that entails going
forward.


I shared some numbers on this in the previous emails in this thread. I thought
comparing DWARF's de-duplication-amenable offering (using
-fdebug-types-section) will be useful in this context.

For binaries compiled with -fdebug-types-section -gdwarf-4, here is some data.
The CTF sections are generated with dwarf2ctf because CTF link-time de-dup is
still being worked on. The end result of link-time CTF de-dup is expected to be
on par with these .ctf section sizes.

The .ctf section sizes below include the CTF string table (.debug_str is
excluded from the calculations however):

(coreutils-0.22)
        .debug_info(D1) | .debug_abbrev(D2) | .debug_str | .debug_types(D3) | .ctf (uncompressed) | ratio (.ctf/(D1+D2+D3))
ls      109806          | 18876             | 22042      | 12413            | 26240               | 0.18
pwd     27902           | 7914              | 10851      | 5753             | 13929               | 0.33
groups  26920           | 8173              | 10674      | 5070             | 13378               | 0.33

(emacs-26.3)
        .debug_info(D1) | .debug_abbrev(D2) | .debug_str | .debug_types(D3) | .ctf (uncompressed) | ratio (.ctf/(D1+D2+D3))
emacs   3755083         | 202926            | 431926     | 143462           | 273910              | 0.06
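
For anyone who wants to re-derive the ratio column, here is a minimal Python
sketch. The byte counts are copied from the table above; the dictionary layout
and the helper name ctf_ratio are only illustrative and are not part of any
tool. As in the table, .debug_str is left out of the denominator.

```python
# Section sizes in bytes, copied from the table above.
# The key names and this layout are illustrative only.
SIZES = {
    "ls":     {"debug_info": 109806,  "debug_abbrev": 18876,  "debug_types": 12413,  "ctf": 26240},
    "pwd":    {"debug_info": 27902,   "debug_abbrev": 7914,   "debug_types": 5753,   "ctf": 13929},
    "groups": {"debug_info": 26920,   "debug_abbrev": 8173,   "debug_types": 5070,   "ctf": 13378},
    "emacs":  {"debug_info": 3755083, "debug_abbrev": 202926, "debug_types": 143462, "ctf": 273910},
}


def ctf_ratio(sections):
    """Return .ctf / (D1 + D2 + D3); .debug_str is deliberately excluded."""
    dwarf_total = (
        sections["debug_info"]
        + sections["debug_abbrev"]
        + sections["debug_types"]
    )
    return sections["ctf"] / dwarf_total


if __name__ == "__main__":
    for name, sections in SIZES.items():
        print(f"{name:8s} {ctf_ratio(sections):.3f}")
```

Printed to three decimals, the values suggest that the two-decimal figures in
the table were truncated rather than rounded.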


It is not easy to get an estimate of 'DWARF with everything-you-don't-need
stripped out'. At this time, I don't know of an easy way to make this comparison
more meaningful. Any suggestions?

Also, it's my understanding that the current CTF format doesn't yet
support C++, Vector registers, etc., maybe other things, so if DWARF
was sufficient for your needs, then in the long run it sounds like
a better option to me, as then you wouldn't have to extend CTF _and_
DWARF whenever some feature is needed.


Yes, CTF does not support C++ at this time. To cover all of C (including
GNU C extensions), we need to add representation for things like vector types,
non-IEEE floats, etc. (somewhat infrequently occurring constructs).

The issue is not that DWARF cannot represent the required type information.
First, DWARF is voluminous. Second, the current workflow to get to CTF from
source programs without direct toolchain support is tiresome and lengthy.

For current and future users of CTF, having the support for the format in the
toolchain is the best way to promote adoption and enhance community experience.

Maybe it would make sense to work on integrating CTF into the DWARF
standard itself, not sure?

I was also curious on your plans for adding unwinding support to CTF,
while the kernel (the main CTF user, IIUC), already has plans to
use its own unwinding format (ORC)?


The kernel's unwinding format (ORC) helps generate backtraces with function
identifiers. For some internal (ORCL) customers, the requirement is to go beyond
that and support input arg values. The requirement there is to generate
backtraces quickly, without relying on DWARF.

So with all those questions, I came out of the presentation
thinking that I could not really justify CTF if I were asked to.


Thanks for discussing this openly. I believe there are other GCC
maintainers who are undecided as well :)

I hope I have answered some of your concerns.

(Side note: the Cauldron page is missing slides for your
presentation, so I couldn't go and recheck some things
mentioned above.)

Thanks,
Pedro Alves

I mailed the organizers my slides. They should be online soon.

Thanks
