On Fri, Oct 25, 2019 at 1:52 AM Indu Bhagat <indu.bha...@oracle.com> wrote:
>
> On 10/11/2019 04:41 AM, Jakub Jelinek wrote:
> > On Fri, Oct 11, 2019 at 01:23:12PM +0200, Richard Biener wrote:
> >>> (coreutils-0.22)
> >>>              .debug_info(D1) | .debug_abbrev(D2) | .debug_str(D4) | .ctf (uncompressed) | ratio (.ctf/(D1+D2+0.5*D4))
> >>> ls                     30616 |              1136 |          21098 |               26240 | 0.62
> >>> pwd                    10734 |               788 |          10433 |               13929 | 0.83
> >>> groups                 10706 |               811 |          10249 |               13378 | 0.80
> >>>
> >>> (emacs-26.3)
> >>>              .debug_info(D1) | .debug_abbrev(D2) | .debug_str(D4) | .ctf (uncompressed) | ratio (.ctf/(D1+D2+0.5*D4))
> >>> emacs-26.3.1          674657 |              6402 |         273963 |              273910 | 0.33
> >>>
> >>> I chose to account for 50% of .debug_str because at this point, it will be
> >>> unfair to not account for them. Actually, one could even argue that up to
> >>> 70% of the .debug_str are names of entities. CTF section sizes do include
> >>> the CTF string tables.
> >>>
> >>> Across coreutils, I see a geomean of 0.73 (ratio of
> >>> .ctf/(.debug_info + .debug_abbrev + 50% of .debug_str)). So, with the
> >>> "-gdwarf-like-ctf code stubs" and dwz, DWARF continues to have a larger
> >>> footprint than CTF (with 50% of .debug_str accounted for).
> >> I'm not convinced this "improvement" in size is worth maintaining another
> >> debug-info format much less since it lacks desirable features right now
> >> and thus evaluation is tricky.
> >>
> >> At least you can improve dwarf size considerably with a low amount of work.
> >>
> >> I suspect another factor where dwarf is bigger compared to CTF is that
> >> dwarf is recording typedef names as well as qualified type variants. But
> >> maybe CTF just has a more compact representation for the bits it actually
> >> implements.
> > Does CTF record automatic variables in functions, or just global variables?
> > If only the latter, it would be fair to also disable addition of local
> > variable DIEs, lexical blocks. Does CTF record inline functions? Again, if
> > not, it would be fair to not emit that either in .debug_info.
> > -gno-record-gcc-switches so that the compiler command line is not encoded in
> > the debug info (unless it is in CTF).
>
> CTF includes file-scope and global-scope entities. So, CTF for a function
> defined/declared at these scopes is available in .ctf section, even if it is
> inlined.
>
> To not generate DWARF for function-local entities, I made a tweak in the
> gen_decl_die API to have an early exit when TREE_CODE (DECL_CONTEXT (decl))
> is FUNCTION_DECL.
>
> @@ -26374,6 +26374,12 @@ gen_decl_die (tree decl, tree origin, struct vlr_context *ctx,
>     if (DECL_P (decl_or_origin) && DECL_IGNORED_P (decl_or_origin))
>       return NULL;
>
> +  /* Do not generate info for function local decl when -gdwarf-like-ctf is
> +     enabled.  */
> +  if (debug_dwarf_like_ctf && DECL_CONTEXT (decl)
> +      && (TREE_CODE (DECL_CONTEXT (decl)) == FUNCTION_DECL))
> +    return NULL;
> +
>     switch (TREE_CODE (decl_or_origin))
>       {
>       case ERROR_MARK:

A better place is probably in gen_subprogram_die, returning early before

  /* Output Dwarf info for all of the stuff within the body of the function
     (if it has one - it may be just a declaration).

Note we also emit DIEs for [optionally also unused, if requested] function
declarations without actual definitions. I would guess CTF doesn't, since
there's no symbol table entry for those. Plus we by default prune types
that are not used. So

struct S { int i; };
extern void foo (struct S *);
void bar()
{
  struct S s;
  foo (&s);
}

would have DIEs for S and foo in addition to that for bar. To me it seems
those are not relevant for function entry point inspection (eventually
both S and foo have CTF info in the defining unit). Correct?

Richard.

>
> For the numbers in the email today:
> 1. CFLAGS="-g -gdwarf-like-ctf -gno-record-gcc-switches -O2". dwz is used on
>    generated binaries.
> 2. At this time, I wanted to account for .debug_str entities appropriately
>    (not 50% as done previously). Using a small script to count chars for
>    accounting the "path-like" strings, specifically those strings that start
>    with a ".", I gathered the data in the column named D5.
>
> (coreutils-0.22)
>              .debug_info(D1) | .debug_abbrev(D2) | .debug_str(D4) | path strings (D5) | .ctf (uncompressed) | ratio (.ctf/(D1+D2+D4-D5))
> ls                     14100 |               994 |          16945 |              1328 |               26240 | 0.85
> pwd                     6341 |               632 |           9311 |               596 |               13929 | 0.88
> groups                  6410 |               714 |           9218 |               667 |               13378 | 0.85
> Average geomean across coreutils = 0.84
>
> (emacs-26.3)
>              .debug_info(D1) | .debug_abbrev(D2) | .debug_str(D4) | path strings (D5) | .ctf (uncompressed) | ratio (.ctf/(D1+D2+D4-D5))
> emacs-26.3.1          373678 |              3794 |         219048 |              3842 |              273910 | 0.46
>
> > DWARF is a highly extensible format, what exactly is and is not emitted is
> > something that consumers can choose.
> > Yes, DWARF can be large, but mainly because it provides a lot of
> > information, the actual representation has been designed with size concerns
> > in mind and newer versions of the standard keep improving that too.
> >
> > Jakub
>
> Yes.
>
> I started out to provide some numbers around the size impact of CTF vs DWARF
> as it was a legitimate curiosity many of us have had. Comparing compactness or
> feature matrices is only one dimension of evaluating the utility of supporting
> CTF in the toolchain (including GCC; Binutils and GDB have already accepted
> initial CTF support). The other dimension is a user-friendly workflow which
> supports current users and eases further adoption and growth.
>
> Indu
>
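
For reference, the ratio column of the quoted tables (.ctf/(D1+D2+D4-D5)) can be rechecked with a few lines of Python. This is only a sketch: the sizes are copied from the tables above, but the names SIZES, ctf_ratio and path_string_bytes are made up for illustration, and path_string_bytes is a guess at what the "small script" mentioned in the thread might do, not the script itself. In particular, it assumes the input is the raw .debug_str contents (NUL-terminated strings back to back) and that terminators are not counted.

```python
"""Recompute the .ctf / DWARF size ratios from the quoted tables (sketch)."""

# Section sizes in bytes, copied from the second set of tables in the thread.
# d1 = .debug_info, d2 = .debug_abbrev, d4 = .debug_str,
# d5 = path-like strings inside .debug_str, ctf = uncompressed .ctf.
SIZES = {
    "ls":           dict(d1=14100,  d2=994,  d4=16945,  d5=1328, ctf=26240),
    "pwd":          dict(d1=6341,   d2=632,  d4=9311,   d5=596,  ctf=13929),
    "groups":       dict(d1=6410,   d2=714,  d4=9218,   d5=667,  ctf=13378),
    "emacs-26.3.1": dict(d1=373678, d2=3794, d4=219048, d5=3842, ctf=273910),
}


def ctf_ratio(d1, d2, d4, d5, ctf):
    """Return .ctf / (D1 + D2 + D4 - D5), the ratio column of the tables."""
    return ctf / (d1 + d2 + d4 - d5)


def path_string_bytes(debug_str):
    """Count the characters of strings starting with "." in raw .debug_str bytes.

    Assumption (not from the thread): strings are NUL-terminated and stored
    back to back, and the terminators themselves are not counted.
    """
    return sum(len(s) for s in debug_str.split(b"\0") if s.startswith(b"."))


if __name__ == "__main__":
    for name, sizes in SIZES.items():
        print(f"{name:<13} {ctf_ratio(**sizes):.3f}")
```

The computed values agree with the quoted ratio column to two digits; pwd comes out as 0.888, which the table lists as 0.88.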