jankratochvil added a comment.

I am asking the LLDB community whether to continue upstreaming this patchset.
Its advantage is assured compatibility with DWZ, which is used by
{RHEL,CentOS}-{7,8}. The next version of {RHEL,CentOS} will use it as well. By
my quick check, Debian 12=Bookworm=testing is not using DWZ but Debian
Sid=unstable is. DWZ is applied to debug info packages supplied by system
vendors. It is not automatically applied to binaries compiled by the user.
The disadvantage is that it complicates the LLDB codebase IMO a lot (I haven't
found a less intrusive way to implement the DWZ compatibility), and the debug
info size reduction from DWZ is IMO too small to be worth all those
complications. For compatibility there could instead be, for example, a much
simpler DWZ decompressor (none exists now). For current and future use, the
DWZ optimization has IMO been superseded by `clang -flto` (I have no numbers
for that claim).
My employer wants this DWZ support patchset to be upstream in LLDB.
Personally I am against the idea for the reasons in the paragraph above:
complicating the LLDB codebase is not worth it for backward compatibility
alone. IMO DWZ has nowadays been superseded by LTO, by the size advantages of
downloading separate `*.debug` files, and by small file size differences
mattering less than software engineering simplicity.
I spent 2 months measuring the effects of DWZ on Fedora/CentOS distribution
size (the benchmarking code:
<https://git.jankratochvil.net/?p=massrebuild.git;a=tree>). For
`*-debuginfo.rpm` storage size compared to `-fdebug-types-section`, I measured
an average 5% size reduction in favor of DWZ, but with a stddev of +/-11% of
the size. That means the size reduction strongly depends on which packages are
chosen. For example, for the subset of Fedora packages that is also in CentOS,
the size reduction is only 0.28%. Another example is the subset of Fedora
packages I have installed, where the DWZ `*-debuginfo.rpm` is even 0.72%
bigger than with `-fdebug-types-section`.
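
To make the "5% average with +/-11% stddev" statement concrete, here is a minimal sketch of how such figures are computed from per-package reductions. The three input values are hypothetical and chosen only so that the arithmetic is easy to follow. They are not taken from the measurements above, and the function names are assumptions of this sketch.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Arithmetic mean of per-package size reductions (in percent).
double mean(const std::vector<double> &values) {
  assert(!values.empty());
  double sum = 0.0;
  for (double v : values)
    sum += v;
  return sum / static_cast<double>(values.size());
}

// Sample standard deviation (divides by n - 1).
double sampleStddev(const std::vector<double> &values) {
  assert(values.size() >= 2);
  const double m = mean(values);
  double squares = 0.0;
  for (double v : values)
    squares += (v - m) * (v - m);
  return std::sqrt(squares / static_cast<double>(values.size() - 1));
}

// HYPOTHETICAL reductions, not measured data. Positive: DWZ output is
// smaller. Negative: DWZ output is bigger than -fdebug-types-section.
const std::vector<double> kHypotheticalReductions = {16.0, 5.0, -6.0};
```

With these illustrative inputs the mean is 5 and the sample stddev is 11, so a package set can easily land on the negative side, which is consistent with one subset above coming out bigger with DWZ.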
The 5% DWZ saving is about 5GB of the 82GB debug info size per Fedora Linux
distribution. Personally I find it all pointless, as that is the size of one
4K movie and nobody downloads all the distribution debug info anyway.
Nowadays with `-flto` I believe the size is even smaller than with
`-fdebug-types-section` (and therefore the DWZ size advantage is smaller) but
I have no numbers for this claim.
The average 5% size reduction on `*-debuginfo.rpm` is primarily thanks to DWZ
using DWARF-5 "7.3.6 DWARF Supplementary Object Files". While that helps
`*-debuginfo.rpm` size, with the move from downloading a whole
`*-debuginfo.rpm` to downloading separate `*.debug` files (for example via
debuginfod <https://sourceware.org/elfutils/Debuginfod.html>) one then has to
download more data with DWZ rather than less (depending on how many files from
the same package one downloads, etc.).
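
A simplified model of that trade-off, assuming every `*.debug` file in a package has the same size and that the DWZ supplementary file has to be fetched once alongside them. The function names and the numbers in the note below are hypothetical, not measurements.

```cpp
#include <cstdint>

// Bytes fetched when downloading `files` separate *.debug files of
// `plainSize` bytes each, built without DWZ.
constexpr std::uint64_t plainDownload(std::uint64_t files,
                                      std::uint64_t plainSize) {
  return files * plainSize;
}

// With DWZ each file shrinks to `dwzSize`, but the shared supplementary
// file of `supSize` bytes must be fetched once as well (model assumption).
constexpr std::uint64_t dwzDownload(std::uint64_t files,
                                    std::uint64_t dwzSize,
                                    std::uint64_t supSize) {
  return files == 0 ? 0 : files * dwzSize + supSize;
}
```

With illustrative sizes of 100 (plain), 80 (DWZ) and 60 (supplementary) units, one file costs 100 without DWZ and 140 with it. DWZ only starts to win in this model from the fourth file of the same package (380 versus 400).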
The problem with the current DWZ file format is that one cannot parse a
`DW_UT_partial` unit without having its `DW_UT_compile` unit (the one
containing the `DW_TAG_imported_unit` for that `DW_UT_partial`), as
`DW_UT_partial` is missing `DW_AT_language`. Another problem is that "DWARF
Supplementary Object Files" can contain not just the typical types (like
`DW_UT_type`, although for DWZ they are in `DW_UT_partial`) but also
`static const int variable=42;`, which does not need `DW_AT_location`.
According to @clayborg, LLVM IR for types could be imported across files but
not the variables (for variables we also need to know the parent
`DW_UT_compile` when parsing them). All of this makes it necessary to carry
`DWARFUnit *main_unit` (usually as a part of `DWARFUnitPair`) everywhere
across the LLDB codebase.
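
The dependency can be illustrated with a small sketch. The types below are hypothetical stand-ins written for this example and are not the real LLDB `DWARFUnit` or `DWARFUnitPair` classes. They only show why a partial unit alone is not enough to answer "which language is this DIE in?".

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-ins, not the real LLDB classes.
enum class UnitType { Compile, Partial };

constexpr std::uint16_t kDwLangCPlusPlus = 0x0004; // DW_LANG_C_plus_plus

struct Unit {
  UnitType type;
  std::optional<std::uint16_t> language; // DW_AT_language, if present
};

// The unit that physically contains a DIE plus the compile unit that
// imported it via DW_TAG_imported_unit.
struct UnitPair {
  const Unit *unit;      // never null
  const Unit *main_unit; // importing DW_UT_compile, or null if unknown

  // Use the unit's own language when present, else fall back to the
  // importing compile unit.
  std::optional<std::uint16_t> language() const {
    assert(unit != nullptr);
    if (unit->language)
      return unit->language;
    if (main_unit)
      return main_unit->language;
    return std::nullopt;
  }
};
```

In this sketch a partial unit without `DW_AT_language` yields no language unless the pair also carries the importing compile unit, which is the reason the main unit has to travel together with every DIE reference.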
There could be a new file format that does not use `DW_UT_partial` but has
"DWARF Supplementary Object Files" containing only `DW_UT_type` units. With
LTO it would IMO have the same `*-debuginfo.rpm` size benefits as DWZ without
the difficulty of carrying `DWARFUnit *main_unit` everywhere. But such a
simplified LLDB reader would no longer be compatible with the debug info
formats of existing Red Hat OSes, so Red Hat is not interested in it.
There are many more effective debug info size reductions already supported by
upstream LLDB. For example `SHF_COMPRESSED` (`zlib`) saves 52% (compared to 5%
for DWZ). One could also use `zstd` for faster decompression. That figure is
for installed on-disk size (other measurements here are for `*-debuginfo.rpm`
size).
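
For reference, a compressed ELF section is marked by the `SHF_COMPRESSED` bit in its section header flags, and its data starts with a compression header whose type field says which algorithm was used. A minimal sketch of the flag check follows. The constant values come from the ELF gABI (Linux `<elf.h>` provides the same macros), while the `k...` names and the helper functions are assumptions of this sketch.

```cpp
#include <cstdint>

// SHF_COMPRESSED from the ELF gABI, redefined here so that the sketch
// does not depend on a platform-specific <elf.h>.
constexpr std::uint64_t kShfCompressed = 0x800;

// ELFCOMPRESS_ZLIB: ch_type value in the compression header for zlib.
constexpr std::uint32_t kElfCompressZlib = 1;

// True if the section header flags mark the section data as compressed.
constexpr bool isCompressedSection(std::uint64_t sh_flags) {
  return (sh_flags & kShfCompressed) != 0;
}

// True if a compressed section uses zlib according to its ch_type.
constexpr bool isZlibCompressed(std::uint64_t sh_flags,
                                std::uint32_t ch_type) {
  return isCompressedSection(sh_flags) && ch_type == kElfCompressZlib;
}
```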
If the debug info size matters, LLVM could use a more compact `DW_FORM_ref*`
where it fits, rather than always its current `DW_FORM_ref4`. This is one of
the optimizations done by DWZ.
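
A minimal sketch of the idea: pick the narrowest fixed-size unit-relative reference form that can hold a given offset. The form codes are the ones assigned by the DWARF standard. The function names are assumptions of this sketch, which does not reflect how DWZ or LLVM actually implement form selection.

```cpp
#include <cstdint>

// Fixed-size unit-relative reference forms with their DWARF form codes.
enum class RefForm : std::uint8_t {
  Ref1 = 0x11, // DW_FORM_ref1
  Ref2 = 0x12, // DW_FORM_ref2
  Ref4 = 0x13, // DW_FORM_ref4
  Ref8 = 0x14, // DW_FORM_ref8
};

// Narrowest form able to encode `unitOffset` (offset from the unit header).
constexpr RefForm smallestRefForm(std::uint64_t unitOffset) {
  if (unitOffset <= 0xFFu)
    return RefForm::Ref1;
  if (unitOffset <= 0xFFFFu)
    return RefForm::Ref2;
  if (unitOffset <= 0xFFFFFFFFull)
    return RefForm::Ref4;
  return RefForm::Ref8;
}

// Encoded size in bytes of a reference using `form`.
constexpr unsigned refFormSize(RefForm form) {
  switch (form) {
  case RefForm::Ref1: return 1;
  case RefForm::Ref2: return 2;
  case RefForm::Ref4: return 4;
  case RefForm::Ref8: return 8;
  }
  return 0; // not reached for valid enumerators
}
```

A real producer would also have to account for the fact that shrinking references moves the DIEs they point to, so the offsets and forms may need to be recomputed until they settle. `DW_FORM_ref_udata` is another option that this sketch leaves out.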
I have removed all in-memory storage of `DWARFDIE` so that non-DWZ systems do
not have their performance affected by LLDB's DWZ compatibility through the
increased size of `DWARFDIE` (16 bytes -> 24 bytes, due to the new
`DWARFDIE::m_cu::m_main_cu`). There still remains the question of how such an
`llvm::DWARFDie` size increase would be accepted by LLVM if the DWARF merge
LLDB->LLVM ever happens.
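
The size change follows directly from adding one pointer. The structs below are hypothetical stand-ins that only mirror the member layout described above. They are not the real `DWARFDIE`.

```cpp
#include <cstddef>

// Opaque hypothetical stand-ins for the unit and DIE entry classes.
struct Unit;
struct Entry;

// Before: a DIE handle is a unit pointer plus an entry pointer.
struct DieBefore {
  Unit *cu;
  Entry *die;
};

// After: the unit pointer becomes a pair that also carries the main unit.
struct UnitPairSketch {
  Unit *cu;
  Unit *main_cu;
};
struct DieAfter {
  UnitPairSketch cu;
  Entry *die;
};

// Two pointers versus three: 16 and 24 bytes on a typical 64-bit target.
static_assert(sizeof(DieBefore) == 2 * sizeof(void *), "two pointers");
static_assert(sizeof(DieAfter) == 3 * sizeof(void *), "three pointers");
```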
Personally I believe it would be more convenient to solve the compatibility 
with DWZ debug info by an external "DWZ decompressor" tool which would 
transparently decompress the DWZ files to some cache directory. There could be 
a hook for such an external decompressing tool upstreamed to LLDB.
The current DWZ optimization tool <https://sourceware.org/dwz/> has these
disadvantages:

- DWZ does not support `-fdebug-types-section`: for DWARF-5 it errors on
`DW_UT_type`. That means one needs to build big (approx. twice as big)
intermediate files (before one can run DWZ), which run out of memory and disk
space on build farms when building large packages (such as LLVM).
- DWZ will give up when it runs out of memory (`--dwz-low-mem-die-limit`,
`--dwz-max-die-limit`), which happens for larger packages on build farms. In
such a case the debug info is extra large, as one could not even use
`-fdebug-types-section` in the first place, for compatibility with DWZ. This
is IMO why the DWZ stddev of +/-11% on the package sizes is so big.

This patchset does not yet implement DWZ optimization of `.debug_macro`. The
DWARF-5 standard currently has no solution for the file format of
`.debug_names` in DWZ-optimized files (`.debug_names` becomes
misleading/invalid after DWZ).
As I find the DWZ technology superseded by LTO together with separate
`*.debug` downloads, and I failed to negotiate solving the DWZ compatibility
of Red Hat OSes downstream (by keeping this LLDB DWZ patchset only downstream
or by writing a separate transparent DWZ decompressor instead), I have decided
to protect the LLDB codebase by quitting Red Hat. Unfortunately Red Hat still
wants me to upstream this patchset during my notice period, which lasts until
October 29th 2021. Consequently I will not support this patchset from October
30th 2021 on. Red Hat currently does not have any other LLDB engineer to
replace me. The patchset is copyright Red Hat. I would prefer not to have my
name connected with this patchset if it gets upstreamed, so one should use
`git commit --author` to attribute it to Red Hat.
The whole patchset is on GitHub:
<https://github.com/jankratochvil/llvm-project/tree/dwz>


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D96236/new/

https://reviews.llvm.org/D96236

_______________________________________________
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits
