On Thu, Oct 17, 2019 at 7:36 PM Nick Alcock <nick.alc...@oracle.com> wrote:
>
> On 11 Oct 2019, Indu Bhagat stated:
> > Compile with -g -gdwarf-like-ctf and use dwz -o <binary_dwz> <binary> (using
> > dwz compiled from the master branch) on the generated binaries:
> >
> > (coreutils-0.22)
> >               .debug_info(D1) | .debug_abbrev(D2) | .debug_str(D4) | .ctf (uncompressed) | ratio (.ctf/(D1+D2+0.5*D4))
> > ls                      30616 |              1136 |          21098 |               26240 | 0.62
> > pwd                     10734 |               788 |          10433 |               13929 | 0.83
> > groups                  10706 |               811 |          10249 |               13378 | 0.80
> >
> > (emacs-26.3)
> >               .debug_info(D1) | .debug_abbrev(D2) | .debug_str(D4) | .ctf (uncompressed) | ratio (.ctf/(D1+D2+0.5*D4))
> > emacs-26.3.1           674657 |              6402 |         273963 |              273910 | 0.33
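
For anyone who wants to re-check the ratio column quoted above, here is a
minimal sketch that recomputes .ctf/(D1+D2+0.5*D4) from the quoted byte
counts. The names SIZES, ctf_ratio and RATIOS are made up for this
illustration; only the numbers and the formula come from the table.

```python
# Section sizes in bytes, copied from the quoted table:
# (.debug_info D1, .debug_abbrev D2, .debug_str D4, uncompressed .ctf)
SIZES = {
    "ls": (30616, 1136, 21098, 26240),
    "pwd": (10734, 788, 10433, 13929),
    "groups": (10706, 811, 10249, 13378),
    "emacs-26.3.1": (674657, 6402, 273963, 273910),
}


def ctf_ratio(d1, d2, d4, ctf):
    """Ratio used in the table header: .ctf / (D1 + D2 + 0.5 * D4)."""
    return ctf / (d1 + d2 + 0.5 * d4)


# Hypothetical helper mapping: binary name -> ratio rounded to two decimals.
RATIOS = {name: round(ctf_ratio(*sizes), 2) for name, sizes in SIZES.items()}

for name, ratio in RATIOS.items():
    print(f"{name:14s} {ratio:.2f}")
```

The 0.5 weight on .debug_str is taken as given from the table header; the
sketch does not try to justify it.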

Btw, for a fair comparison you have to remove all DW_TAG_subprogram
children as well, since CTF doesn't represent scopes or local variables
at all (nor types only used by locals).  It seems CTF only represents
function entry points.

> A side note here: the sizes given above are uncompressed sizes, but in
> the real world CTF is almost always compressed: the threshold for
> compression is in theory customizable but at the moment is hardwired at
> 4KiB-uncompressed in the linker. I usually see compression ratios of
> roughly 3 or 4 to 1: e.g. I just tried it with a randomly chosen binary,
> /usr/lib/libgtk-3.so.0.2404.3, and got these sizes:
>
> .text: 3317489
> DWARF: 8589254
> Uncompressed CTF (*no* ELF strtab sharing, so a bit bigger than usual): 713264
> .ctf section size: 213839
>
> Note that this is not only in the absence of CTF strtab sharing with the
> ELF dynstrtab, but also using a less effective compressor: currently we
> use gzip, but I expect to transition to lzma iff available at binutils
> build time (which it usually is), perhaps as an option (on by default)
> to allow interoperability with binutils that don't have lzma available.
> Obviously better compressors will save even more space.
>
> It may help that CTF is designed for good compressibility: we try to
> minimize the number of unique symbols if we can do so without impairing
> other properties, e.g. by avoiding encoding IDs of objects when we can
> instead rely on the consumer to compute them at read time by walking
> through the relevant data structures and counting.
>
> A few benchmarks indicate that compression by default also saves time
> both at compression and decompression time.
>
> (Within a week I should be able to repeat this with an ld capable of CTF
> deduplication rather than kludging it with a deduplicator meant for a
> quite different job. I expect the sizes above to improve. In fact if
> they *don't* improve I will take this as strong evidence that my
> deduplicator is buggy.)
>
> FWIW, here's my Emacs (26.1.50) sizes, again with no strtab sharing, but
> with deduplication: it's bigger than I'd like at around 10% of .text
> size, but still much less than 1% of binary size (my goal is 1--2% of
> .text, but Emacs is a nice tricky case, like Gtk, with lots of big types
> and structures with long member names):
>
> section                  size      addr
> .interp                    28   4194872
> .note.ABI-tag              32   4194900
> .note.gnu.build-id         36   4194932
> .gnu.hash                 628   4194968
> .dynsym                 24432   4195600
> .dynstr                 16934   4220032
> .gnu.version             2036   4236966
> .gnu.version_r            704   4239008
> .rela.data.rel.ro          72   4239712
> .rela.data                168   4239784
> .rela.got                  48   4239952
> .rela,bss                 336   4240000
> .rela.plt               23448   4240336
> .init                      23   4263784
> .plt                    15648   4263808
> .text                 1912622   4279456
> .fini                       9   6192080
> .rodata                165416   6192096
> .eh_frame_hdr           36196   6357512
> .eh_frame              210976   6393712
> .init_array                 8   6609328
> .fini_array                 8   6609336
> .data.rel.ro             4569   6609344
> .dynamic                 1104   6613920
> .got                       16   6615024
> .got.plt                 7840   6615040
> .data                 3276077   6622880
> ,bss                 34153472   9899008
> .comment                   26         0
> .gnu_debuglink             24         0
> .comment                   26         0
> .debug_aranges           1536         0
> .debug_info           3912261         0
> .debug_abbrev           38821         0
> .debug_line            408063         0
> .debug_str             117631         0
> .debug_loc             954538         0
> .debug_ranges          149590         0
> .ctf                   213839         0
> .ctf (uncompressed)    713264         0
>
> (obviously, manually edited a bit, size -A doesn't produce the last line
> on its own!)
>
> (I'm not sure what the hell is going on with the weirdly-named ,bss
> section. Probably something to do with unexec().)
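
As a quick sanity check on the figures quoted above, the following sketch
derives the compression ratio and the compressed .ctf size as a fraction
of .text from the listing. The variable names are made up for this
illustration; the three byte counts are copied from the quoted size -A
output and nothing else is assumed.

```python
# Byte counts copied from the quoted section listing.
TEXT_SIZE = 1912622          # .text
CTF_COMPRESSED = 213839      # .ctf as stored in the binary
CTF_UNCOMPRESSED = 713264    # .ctf (uncompressed), the manually added row

# How many uncompressed bytes each stored .ctf byte represents.
compression_ratio = CTF_UNCOMPRESSED / CTF_COMPRESSED

# Stored .ctf size relative to .text, as a fraction.
ctf_share_of_text = CTF_COMPRESSED / TEXT_SIZE

print(f"compression ratio:  {compression_ratio:.2f} to 1")
print(f".ctf / .text:       {ctf_share_of_text:.1%}")
```

Both results are in line with the quoted text: a ratio between 3 and 4 to
1, and a .ctf section at roughly a tenth of .text.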