On Thu, Oct 17, 2019 at 7:36 PM Nick Alcock <nick.alc...@oracle.com> wrote:
>
> On 11 Oct 2019, Indu Bhagat stated:
> > Compile with -g -gdwarf-like-ctf and use dwz -o <binary_dwz> <binary> (using
> > dwz compiled from the master branch) on the generated binaries:
> >
> > (coreutils-0.22)
> >        | .debug_info (D1) | .debug_abbrev (D2) | .debug_str (D4) | .ctf (uncompressed) | ratio (.ctf/(D1+D2+0.5*D4))
> > ls     | 30616            | 1136               | 21098           | 26240               | 0.62
> > pwd    | 10734            | 788                | 10433           | 13929               | 0.83
> > groups | 10706            | 811                | 10249           | 13378               | 0.80
> >
> > (emacs-26.3)
> >              | .debug_info (D1) | .debug_abbrev (D2) | .debug_str (D4) | .ctf (uncompressed) | ratio (.ctf/(D1+D2+0.5*D4))
> > emacs-26.3.1 | 674657           | 6402               | 273963          | 273910              | 0.33

Btw, for a fair comparison you have to remove all DW_TAG_subprogram
children as well, since CTF doesn't represent scopes or local variables
at all (nor types used only by locals).  It seems CTF only represents
function entry points.
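
For reference, the ratio column in the quoted tables can be reproduced
with a few lines of Python. This is only a sketch: the function name is
made up here, and the 0.5 weight on .debug_str is taken as-is from the
table header. The sizes are the ones quoted above, so the result does
not yet account for the DWARF that would have to be stripped for a fair
comparison.

```python
def ctf_to_dwarf_ratio(debug_info, debug_abbrev, debug_str, ctf):
    """Return .ctf / (D1 + D2 + 0.5 * D4), as in the quoted table header."""
    return ctf / (debug_info + debug_abbrev + 0.5 * debug_str)


# (D1, D2, D4, uncompressed .ctf) as quoted in the tables above.
ROWS = {
    "ls": (30616, 1136, 21098, 26240),
    "pwd": (10734, 788, 10433, 13929),
    "groups": (10706, 811, 10249, 13378),
    "emacs-26.3.1": (674657, 6402, 273963, 273910),
}

for name, row in ROWS.items():
    print(f"{name:14s} {ctf_to_dwarf_ratio(*row):.2f}")
```

Shrinking D1 (for example by dropping the children of each subprogram
DIE) makes the denominator smaller, so the ratio can only go up.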

> A side note here: the sizes given above are uncompressed sizes, but in
> the real world CTF is almost always compressed: the threshold for
> compression is in theory customizable but at the moment is hardwired at
> 4KiB-uncompressed in the linker. I usually see compression ratios of
> roughly 3 or 4 to 1: e.g. I just tried it with a randomly chosen binary,
> /usr/lib/libgtk-3.so.0.2404.3, and got these sizes:
>
> .text: 3317489
> DWARF: 8589254
> Uncompressed CTF (*no* ELF strtab sharing, so a bit bigger than usual): 713264
> .ctf section size: 213839
>
> Note that this is not only in the absence of CTF strtab sharing with the
> ELF dynstrtab, but also using a less effective compressor: currently we
> use gzip, but I expect to transition to lzma iff available at binutils
> build time (which it usually is), perhaps as an option (on by default)
> to allow interoperability with binutils that don't have lzma available.
> Obviously better compressors will save even more space.
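
To make the threshold-plus-compressor idea above concrete, here is a
rough Python sketch. It is not the linker's code: maybe_compress and
THRESHOLD are hypothetical names, the exact comparison against 4KiB is
an assumption, and the sample payload is synthetic type-name-like text
rather than real CTF, so the printed sizes say nothing about real
sections.

```python
import lzma
import zlib

THRESHOLD = 4096  # 4 KiB uncompressed, per the quoted text (">=" is an assumption)


def maybe_compress(payload, use_lzma=False):
    """Return (data, was_compressed); payloads under THRESHOLD are left alone."""
    if len(payload) < THRESHOLD:
        return payload, False
    if use_lzma:
        return lzma.compress(payload), True
    # zlib implements the DEFLATE algorithm that gzip uses.
    return zlib.compress(payload, 9), True


# Synthetic stand-in for a string-heavy section (NOT real CTF data).
sample = b"".join(f"struct widget_{i % 64}\0".encode() for i in range(2000))

deflated, _ = maybe_compress(sample)
xz, _ = maybe_compress(sample, use_lzma=True)
print("uncompressed:", len(sample))
print("zlib:        ", len(deflated))
print("lzma:        ", len(xz))
```

Which compressor wins, and by how much, depends entirely on the input,
so this is only useful for experimenting with one's own data.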
>
> It may help that CTF is designed for good compressibility: we try to
> minimize the number of unique symbols if we can do so without impairing
> other properties, e.g. by avoiding encoding IDs of objects when we can
> instead rely on the consumer to compute them at read time by walking
> through the relevant data structures and counting.
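
The "let the consumer count" idea can be illustrated with a toy record
format. This is a minimal sketch and deliberately not the CTF on-disk
layout: the length-prefixed records, the function names, and the choice
of 1 as the first ID are all assumptions made for the example. The
point is only that no ID is ever written out; the reader derives each
one from the record's position.

```python
import struct


def pack_records(names):
    """Serialise names as length-prefixed records; no IDs are stored."""
    out = bytearray()
    for name in names:
        raw = name.encode("utf-8")
        out += struct.pack("<H", len(raw)) + raw
    return bytes(out)


def read_records(blob, first_id=1):
    """Walk the records in order and assign IDs by counting."""
    table = {}
    offset = 0
    next_id = first_id
    while offset < len(blob):
        (length,) = struct.unpack_from("<H", blob, offset)
        offset += 2
        table[next_id] = blob[offset:offset + length].decode("utf-8")
        offset += length
        next_id += 1
    return table


blob = pack_records(["int", "char", "struct foo"])
print(read_records(blob))
```

Leaving the IDs out removes a run of distinct, ever-increasing values
from the byte stream, which is the kind of data a general-purpose
compressor handles poorly.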
>
> A few benchmarks indicate that compression by default also saves time
> both at compression and decompression time.
>
> (Within a week I should be able to repeat this with an ld capable of CTF
> deduplication rather than kludging it with a deduplicator meant for a
> quite different job. I expect the sizes above to improve. In fact if
> they *don't* improve I will take this as strong evidence that my
> deduplicator is buggy.)
>
>
> FWIW, here's my Emacs (26.1.50) sizes, again with no strtab sharing, but
> with deduplication: it's bigger than I'd like at around 10% of .text
> size, but still much less than 1% of binary size (my goal is 1--2% of
> .text, but Emacs is a nice tricky case, like Gtk, with lots of big types
> and structures with long member names):
>
> section                  size      addr
> .interp                    28   4194872
> .note.ABI-tag              32   4194900
> .note.gnu.build-id         36   4194932
> .gnu.hash                 628   4194968
> .dynsym                 24432   4195600
> .dynstr                 16934   4220032
> .gnu.version             2036   4236966
> .gnu.version_r            704   4239008
> .rela.data.rel.ro          72   4239712
> .rela.data                168   4239784
> .rela.got                  48   4239952
> .rela,bss                 336   4240000
> .rela.plt               23448   4240336
> .init                      23   4263784
> .plt                    15648   4263808
> .text                 1912622   4279456
> .fini                       9   6192080
> .rodata                165416   6192096
> .eh_frame_hdr           36196   6357512
> .eh_frame              210976   6393712
> .init_array                 8   6609328
> .fini_array                 8   6609336
> .data.rel.ro             4569   6609344
> .dynamic                 1104   6613920
> .got                       16   6615024
> .got.plt                 7840   6615040
> .data                 3276077   6622880
> ,bss                 34153472   9899008
> .comment                   26         0
> .gnu_debuglink             24         0
> .comment                   26         0
> .debug_aranges           1536         0
> .debug_info           3912261         0
> .debug_abbrev           38821         0
> .debug_line            408063         0
> .debug_str             117631         0
> .debug_loc             954538         0
> .debug_ranges          149590         0
> .ctf                   213839         0
> .ctf (uncompressed)    713264         0
>
> (obviously, manually edited a bit, size -A doesn't produce the last line
> on its own!)
>
> (I'm not sure what the hell is going on with the weirdly-named ,bss
> section. Probably something to do with unexec().)
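
The two headline figures (CTF as a fraction of .text, and the
compression ratio) can be pulled out of a listing like the one above
with a small script. This is a sketch under assumptions: the parser is
hypothetical, expects "name size addr" lines as in the quoted output,
and is fed only three of the quoted rows here.

```python
def parse_size_listing(text):
    """Map section name -> size for lines of the form 'name size addr'."""
    sizes = {}
    for line in text.splitlines():
        fields = line.split()
        # Keep only rows whose last two fields are integers (skips the header).
        if len(fields) >= 3 and fields[-2].isdigit() and fields[-1].isdigit():
            name = " ".join(fields[:-2])
            sizes[name] = int(fields[-2])
    return sizes


LISTING = """\
section                  size      addr
.text                 1912622   4279456
.ctf                   213839         0
.ctf (uncompressed)    713264         0
"""

sizes = parse_size_listing(LISTING)
ctf_vs_text = sizes[".ctf"] / sizes[".text"]
compression = sizes[".ctf (uncompressed)"] / sizes[".ctf"]
print(f".ctf as a share of .text: {ctf_vs_text:.1%}")
print(f"compression ratio:        {compression:.2f} to 1")
```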
