Re: Range lists, zero-length functions, linker gc
On Sun, May 31, 2020 at 1:41 PM Mark Wielaard wrote: > > Hi, > > On Sun, May 31, 2020 at 11:55:06AM -0700, Fangrui Song via Elfutils-devel > wrote: > > what linkers should do regarding relocations referencing dropped > > functions (due to section group rules, --gc-sections, /DISCARD/, > > etc) in .debug_* > > > > As an example: > > > > __attribute__((section(".text.x"))) void f1() { } > > __attribute__((section(".text.x"))) void f2() { } > > int main() { } > > > > Some .debug_* sections are relocated by R_X86_64_64 referencing > > undefined symbols (the STT_SECTION symbols are collected): > > > > 0x0043: DW_TAG_subprogram [2] > > ## relocated by .text.x + 10 > > DW_AT_low_pc [DW_FORM_addr] (0x0010 > > ".text.x") > > DW_AT_high_pc [DW_FORM_data4] (0x0006) > > DW_AT_frame_base [DW_FORM_exprloc] (DW_OP_reg6 RBP) > > DW_AT_linkage_name [DW_FORM_strp] ( > > .debug_str[0x002c] = "_Z2f2v") > > DW_AT_name [DW_FORM_strp] ( .debug_str[0x0033] > > = "f2") > > > > > > With ld --gc-sections: > > > > * DW_AT_low_pc [DW_FORM_addr] in .debug_info are resolved to 0 + > > addend This can cause overlapping address ranges with normal text > > sections. {{overlap}} * [beginning address offset, ending address > > offset) in .debug_ranges are resolved to 1 (ignoring addend). See > > bfd/reloc.c (behavior introduced in > > > > https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=e4067dbb2a3368dbf908b39c5435c84d51abc9f3 > > ) > > > > [0, 0) cannot be used because it terminates the list entry. > > [-1, -1) cannot be used because -1 represents a base address > > selection entry which will affect subsequent address offset > > pairs. > > * .debug_loc address offset pairs have similar problem to .debug_ranges > > * In DWARF v5, the abnormal values can be in a separate section .debug_addr > > > > --- > > > > I am eager to know what you think > > of the ideas from binutils/gdb/elfutils's perspective. > > I think this is a producer problem. 
> If a (code) section can be totally dropped then the associated (.debug) sections should have been generated together with that (code) section in a COMDAT group. That way when the linker drops that section, all the associated sections in that COMDAT group will get dropped with it. If you don't do that, then the DWARF is malformed and there is not much a consumer can do about it.
>
> Said otherwise, I don't think it is correct for the linker (with --gc-sections) to drop any sections that have references to it (through relocation symbols) from other (.debug) sections.

That's probably not practical for at least some users - the easiest/most thorough counter-example is Split DWARF - the DWARF is in another file the linker can't see. All the linker sees is a list of addresses (debug_addr).

All 3 linkers have (modulo bugs) supported this situation, to varying degrees, for decades (ld.bfd: resolve to 1 in debug_ranges and to zero everywhere else; lld/gold: resolve to 0+addend) & this is an attempt to fix the bugs & maybe make the solution a bit more robust/work for more cases/be more intentional.

(Even if not for Split DWARF - creating DWARF that can be dropped by a non-DWARF-aware linker (ie: one that doesn't have to parse/rebuild all the DWARF at link time - which would be super expensive (though someone's prototyping that in lld for those willing to pay that tradeoff)) involves larger DWARF which isn't always a great tradeoff - some users care a lot more about object size than executable size (and maybe increased link time - due to more sections, etc))
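The three behaviours listed in the reply above can be sketched as a small model. This is only an illustration of the values described in this thread, not code from any linker; the `Linker` enum and `resolve_discarded` function are hypothetical names.

```cpp
#include <cstdint>
#include <string>

// Hypothetical model of the linker behaviours described in this thread.
enum class Linker { Bfd, LldGold };

// Value written for a relocation whose target section was discarded.
std::uint64_t resolve_discarded(Linker linker, const std::string& section,
                                std::uint64_t addend) {
    switch (linker) {
    case Linker::Bfd:
        // 1 in .debug_ranges (addend ignored), 0 everywhere else.
        return section == ".debug_ranges" ? 1 : 0;
    case Linker::LldGold:
        // 0 + addend, in every section.
        return addend;
    }
    return 0;
}
```

With an assumed addend of 0x10 (the f2 case from the original example), the bfd model yields 0 in .debug_info and 1 in .debug_ranges, while the lld/gold model yields 0x10 in both.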
Re: Range lists, zero-length functions, linker gc
Thanks for getting this conversation started here, Fangrui, I might summarize things slightly differently (some corrections - some just different phrasing):

Current situation:
When a linker discards code (either chooses a comdat copy from another object file that's not identical (two inline functions might be optimized differently, so DWARF can't point both descriptions to the same code - one has to be pointed to some "null" data essentially) or because of --gc-sections, etc) the DWARF that had relocations to them must be given some value. But what value?

Current situation:
* bfd: 1 in debug_ranges, 0 elsewhere (debug_ or otherwise)
* lld and gold: 0+addend everywhere (debug_ or otherwise)

Problems:
* bfd uses 1 in debug_ranges to avoid creating a 0,0 range entry (<= DWARFv4, debug_ranges contains address pairs terminated by 0,0) that would terminate the list prematurely
* bfd misses the same problem in debug_loc - though that's less impactful (debug_loc are usually just within the scope of one function, so it's usually all or nothing - if it terminates the list early it's not good for dumpers, but not likely a problem for debuggers - though in theory you could have a debug_loc across multiple functions/sections (if you optimize a global variable up into a local register through different functions) - and then terminating the list early would be a problem)
* lld/gold approach ends up mostly creating ranges like [0, length) - for sufficiently large functions, or code mapped into sufficiently low address ranges this range could overlap with real code and create ambiguities unless the consumer special cased starting at zero...
- except for the ".text.x" example below, where 0+addend could still result in a [positive, positive) address range that would be impossible to reliably identify in the consumer
* lld/gold has a more severe problem in the event of empty functions (GCC and Clang can both produce empty functions - simplest example being "int f1() { }" - yeah, you can't call this validly, but still code that can appear and is valid so long as it isn't called - also (where we found this recently) "void f1() { llvm_unreachable(); }" creates zero-length functions too) 0+addend produces a [0, 0) entry in the range list which terminates it prematurely and breaks debug info for other code that appears after the empty function.

So, it'd be nice to improve the situation for low-range code that could overlap with the [0+addend, 0+addend) situation in lld/gold, fix the 0,0 debug_range problem, and maybe overall make this more explicit/intentional/consistent between producers (compilers and linkers), consumers, and the DWARF spec itself.

-1 isn't workable in general, because it has special meaning in debug_ranges and debug_loc - but otherwise it's probably a pretty good "special" constant (though I guess in theory someone could map their code to the very top of their address range? I assume that's less likely than using zero or other "low-ish" address spaces that could overlap with the [0+addend, 0+addend) situation of lld/gold).
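The premature-termination problem described above can be sketched with a minimal walker for a DWARF v4 style range list. This is a simplified illustration that assumes 64-bit addresses and entries already decoded into pairs; `Entry` and `read_range_list` are hypothetical names, not any consumer's API.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

using Entry = std::pair<std::uint64_t, std::uint64_t>;

// Walks a DWARF v4 style .debug_ranges list (64-bit addresses assumed).
// (0, 0) ends the list; (-1, X) selects X as the new base address.
std::vector<Entry> read_range_list(const std::vector<Entry>& raw,
                                   std::uint64_t base) {
    const std::uint64_t kBaseSelect = ~std::uint64_t{0};  // all ones, i.e. -1
    std::vector<Entry> out;
    for (const Entry& e : raw) {
        if (e.first == 0 && e.second == 0) break;  // end-of-list marker
        if (e.first == kBaseSelect) {              // base address selection
            base = e.second;
            continue;
        }
        out.push_back({base + e.first, base + e.second});
    }
    return out;
}
```

If a discarded zero-length function is resolved to (0, 0) in the middle of the list, every range after it is lost. With a (-2, -2) entry instead, the walk continues and the consumer can filter that entry out afterwards.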
Hence Fangrui's suggestion of -2 for debug_ranges and debug_loc, -1 everywhere else (at least all debug_* sections - but "all other sections" if that turns out to be a problematic value for non-debug sections) On Sun, May 31, 2020 at 12:19 PM Fangrui Song via Gdb wrote: > > It is being discussed on llvm-dev > (https://lists.llvm.org/pipermail/llvm-dev/2020-May/141885.html > https://groups.google.com/forum/#!topic/llvm-dev/i0DFx6YSqDA) > what linkers should do regarding relocations referencing dropped functions > (due > to section group rules, --gc-sections, /DISCARD/, etc) in .debug_* > > As an example: > >__attribute__((section(".text.x"))) void f1() { } >__attribute__((section(".text.x"))) void f2() { } >int main() { } > > Some .debug_* sections are relocated by R_X86_64_64 referencing undefined > symbols (the STT_SECTION > symbols are collected): > >0x0043: DW_TAG_subprogram [2] >## relocated by .text.x + 10 >DW_AT_low_pc [DW_FORM_addr] (0x0010 > ".text.x") >DW_AT_high_pc [DW_FORM_data4] (0x0006) >DW_AT_frame_base [DW_FORM_exprloc] (DW_OP_reg6 RBP) >DW_AT_linkage_name [DW_FORM_strp] ( > .debug_str[0x002c] = "_Z2f2v") >DW_AT_name [DW_FORM_strp] ( .debug_str[0x0033] = > "f2") > > > With ld --gc-sections: > > * DW_AT_low_pc [DW_FORM_addr] in .debug_info are resolved to 0 + addend >This can cause overlapping address ranges with normal text sections. > {{overlap}} > * [beginning address offset, ending address offset) in .debug_ranges are > resolved to 1 (ignoring addend). >See bfd/reloc.c (behavior introduced in > > https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=e4067dbb2a3368dbf908b39c5435c84d51abc9f3 > ) > >[0, 0) cannot be used because it terminates the list entry. >
Re: Range lists, zero-length functions, linker gc
On Sun, May 31, 2020 at 3:30 PM Mark Wielaard wrote:
>
> Hi,
>
> On Sun, May 31, 2020 at 01:49:12PM -0700, David Blaikie wrote:
> > On Sun, May 31, 2020 at 1:41 PM Mark Wielaard wrote:
> > > On Sun, May 31, 2020 at 11:55:06AM -0700, Fangrui Song via Elfutils-devel wrote:
> > > > I am eager to know what you think of the ideas from binutils/gdb/elfutils's perspective.
> > >
> > > I think this is a producer problem. If a (code) section can be totally dropped then the associated (.debug) sections should have been generated together with that (code) section in a COMDAT group. That way when the linker drops that section, all the associated sections in that COMDAT group will get dropped with it. If you don't do that, then the DWARF is malformed and there is not much a consumer can do about it.
> > >
> > > Said otherwise, I don't think it is correct for the linker (with --gc-sections) to drop any sections that have references to it (through relocation symbols) from other (.debug) sections.
> >
> > That's probably not practical for at least some users - the easiest/most thorough counter-example is Split DWARF - the DWARF is in another file the linker can't see. All the linker sees is a list of addresses (debug_addr).
>
> I might be missing something, but I think this works fine with Split DWARF. As long as you make sure that the .dwo files/sections are separated along the same lines as the ELF section groups are. That means each section group either gets its own .dwo file, or you generate the .dwo sections in the same section group in the same object file using the SHF_EXCLUDED trick. That way each .debug.dwo uses their own index into the separate .debug_addr tables. If that group, with the .debug_addr table, gets discarded, then the reference to the .dwo also disappears and it simply won't be used.

Oh, a whole separate .dwo file per function?
That would be pretty extreme/difficult to implement (now the compiler's producing a variable number of output files? using some naming scheme so the build system could find them again for building a .dwp if needed, etc). Certainly Bazel (& the internal Google version used to build most Google software) can't handle an unbounded/unknown number of output files from a build action.

Multiple CUs in a single .dwo file is not really supported, which would be another challenge (we had to compromise debug info quality a little because of this limitation when doing ThinLTO - unable to emit multiple CUs into each thin-linked .o file) - at which point maybe the compiler'd need to produce an intermediate .dwp file of sorts... but there wouldn't be any great way for the debugger to find those intermediate .dwp files (since it's basically "either find the .dwo file that's written in the DWARF, or find the .dwp file relative to the executable name")? Not sure.

& again the overhead of all those separate contributions, headers, etc, turns out to be not very desirable in any case.

- Dave
Re: Range lists, zero-length functions, linker gc
On Sun, May 31, 2020 at 3:42 PM Mark Wielaard wrote:
>
> Hi,
>
> On Sun, May 31, 2020 at 01:47:38PM -0700, Fangrui Song via Elfutils-devel wrote:
> > On 2020-05-31, Mark Wielaard wrote:
> > > I think this is a producer problem. If a (code) section can be totally dropped then the associated (.debug) sections should have been generated together with that (code) section in a COMDAT group. That way when the linker drops that section, all the associated sections in that COMDAT group will get dropped with it. If you don't do that, then the DWARF is malformed and there is not much a consumer can do about it.
> > >
> > > Said otherwise, I don't think it is correct for the linker (with --gc-sections) to drop any sections that have references to it (through relocation symbols) from other (.debug) sections.
> >
> > I would love if we could solve the problem using ELF features, but putting DW_TAG_subprogram in the same section group is not an unqualified win
>
> Sorry for pushing back a little,

No worries - so long as other people engage with the rest of the thread, hopefully - happy to/worthwhile discussing all the edges.

> but as a DWARF consumer this feels a little like the DWARF producer hasn't tried hard enough to produce valid DWARF and now tries to pass the problems off onto the DWARF consumer.

I think the fact that it's been this way across multiple compilers, linkers, and debuggers for decades is pretty strong evidence that it's at least a strategy producers do use/probably want to/will continue using.

> Or when looking at it from the perspective of the linker, the compiler gave it an impossible problem to solve because it didn't really get all the pieces of the puzzle (the compiler already fused some independent parts together).
>
> I certainly appreciate the issue on 32-bit systems.
> It seems we already have reached the limits for some programs to be linked (or produce all the debuginfo) when all you got is 32-bits.
>
> But maybe that means that the problem is actually that the compiler already produced too much code/data. And the issue really is that it passes some problems, like unused code elimination, off to the linker. While the compiler really should have a better view of that, and should do that job itself.

Something like LLVM's ThinLTO does help here - avoiding duplication in object files, but doesn't entirely eliminate code removal in the final link step. Anything that attempts to improve this (including ThinLTO) comes at the cost of "pinch points" in the compilation - places where global knowledge is required to decide how to remove the redundancy - which complicates and potentially slows down the build (if you want cross-file optimizations, you're willing to pay some slowdown there - but if you're looking for a quick interactive build, this sort of extra pinch point is going to be unfortunate (ThinLTO helps mitigate some of the huge cost of LTO, but it's still extra steps)).

> If it did, then it would never even produce the debuginfo in the first place.
>
> GCC used to produce horrible DWARF years ago with early LTO implementations, because they just handed it all off to the linker to figure out. But they solved it by generating DWARF in phases, only when it was known the DWARF was valid/consistent did it get produced. So that if some code was actually eliminated then the linker never even sees any "code ranges" for code that disappeared. See Early Debug: https://gcc.gnu.org/wiki/early-debug

Ah, interesting read - thanks for the link!
Yeah, LLVM took a different path there - the serializable IR (GIMPLE equivalent, I guess) includes a semantic representation of DWARF, essentially - so while we've dealt with various issues around IR+IR linking for (Thin & full) LTO, it wasn't such a hard break/architectural issue as GCC dealt with there. Though we have discussed/entertained the idea of doing something more like GCC does - generating static DWARF earlier in the front-end and serializing a blob of relatively opaque DWARF in the IR except for the bits of IR (variable locations, etc) that the compiler needs visibility into. That particular way GCC used of separating the CUs is an interesting one to know about/keep in mind if we go down that route (though might hit the Split DWARF/multiple CUs issue if we did).

> Might some similar technique, where the compiler does a bit more work, so that it actually produces less DWARF to be processed by the linker, be used here?

Not while we're looking at the "classic" compilation model (compile source files to object files, link object files), that I'm aware of.

> Sorry for pushing the problem back to the producer side, but as a consumer I think that is the more correct place to solve this.

No worries - and I think there might be some interesting approaches to consider, but I think the history of this issue is long enough that some producers, in some use-case
Re: Range lists, zero-length functions, linker gc
On Mon, Jun 1, 2020 at 2:31 AM Mark Wielaard wrote:
>
> Hi,
>
> On Sun, May 31, 2020 at 03:36:02PM -0700, David Blaikie wrote:
> > On Sun, May 31, 2020 at 3:30 PM Mark Wielaard wrote:
> > > On Sun, May 31, 2020 at 01:49:12PM -0700, David Blaikie wrote:
> > > > That's probably not practical for at least some users - the easiest/most thorough counter-example is Split DWARF - the DWARF is in another file the linker can't see. All the linker sees is a list of addresses (debug_addr).
> > >
> > > I might be missing something, but I think this works fine with Split DWARF. As long as you make sure that the .dwo files/sections are separated along the same lines as the ELF section groups are. That means each section group either gets its own .dwo file, or you generate the .dwo sections in the same section group in the same object file using the SHF_EXCLUDED trick. That way each .debug.dwo uses their own index into the separate .debug_addr tables. If that group, with the .debug_addr table, gets discarded, then the reference to the .dwo also disappears and it simply won't be used.
> >
> > Oh, a whole separate .dwo file per function? That would be pretty extreme/difficult to implement (now the compiler's producing a variable number of output files? using some naming scheme so the build system could find them again for building a .dwp if needed, etc).
>
> Each skeleton compilation unit has a DW_AT_dwo_name attribute which indicates the .dwo file where the split unit sections can be found. It actually seems easier to generate a different one for each skeleton compilation unit than trying to combine them for all the different skeleton compilation units you produce.
>
> > Certainly Bazel (& the internal Google version used to build most Google software) can't handle an unbounded/unknown number of output files from a build action.
>
> Yes, in principle .dwo files seem troublesome for build systems in general.

They're pretty practical when they're generated right next to the .o file & that's guaranteed by the compiler. "if you generate x.o, there will be x.dwo next to it" - that's certainly how Bazel deals with this. It doesn't parse the DWARF at all - knowing where the .dwo files are along with the .o files.

> Especially since to do things properly you would need to read the actual dwo_name attribute to make the connection from object/skeleton file to split dwarf object file. And there is no easy way to map back from .dwo to main ELF file.

I don't think those issues have come up as problems for Google's deployment of Split DWARF which we've been using since the early prototypes.

> Because of that I am actually a fan of the SHF_EXCLUDED hack that simply places the split .dwo sections in the same object file. For the above that would mean, just place them in the same section group.

This was a newer feature added during standardization of Split DWARF, which is handy for some users - but doesn't address the needs of the original design of Split DWARF (for Google) - a distributed build system that is trying to avoid moving more bytes than it must to one machine to run the link step. So not having to ship all the DWARF bytes to one machine for interactive debugging (pulling down from a distributed file system only the needed .dwo files during debugging - not all of them) - or at least being able to ship all the .dwo files to one machine to make a .dwp, and ship all the .o files to another machine for the link.

>
> > Multiple CUs in a single .dwo file is not really supported, which would be another challenge (we had to compromise debug info quality a little because of this limitation when doing ThinLTO - unable to emit multiple CUs into each thin-linked .o file) - at which point maybe the compiler'd need to produce an intermediate .dwp file of sorts...
>
> Are you sure?

Fairly sure - I worked in depth on the implementation of ThinLTO & considered a variety of options trying to support Split DWARF in that situation.

> Each CU would have a separate dwo_id field to distinguish them. At least that is how elfutils figures out which CU in a dwo file matches a given skeleton DIE. This should work the same as for type units, you can have multiple type units in the same file and distinguish which one you need by matching the signature.

One of the complications is that it increased the complexity of making a .dwp file - Split DWARF is spec'd to ensure that the linking process is as lightweight as possible. Not having the size overhead of relocations (though trading off more indirection through the cu_index, debug_str_offsets, etc).

Oh right... that was the critical issue: There was no way I could think of to do cross-CU references in Split DWARF (cross-CU references being critical to LTO - inlining from one CU into another, etc). Because there was no relocation processing in dwp generat
Re: Range lists, zero-length functions, linker gc
On Tue, Jun 2, 2020 at 9:50 AM Mark Wielaard wrote:
>
> Hi,
>
> On Mon, 2020-06-01 at 13:18 -0700, David Blaikie wrote:
> > On Mon, Jun 1, 2020 at 2:31 AM Mark Wielaard wrote:
> > > Each skeleton compilation unit has a DW_AT_dwo_name attribute which indicates the .dwo file where the split unit sections can be found. It actually seems easier to generate a different one for each skeleton compilation unit than trying to combine them for all the different skeleton compilation units you produce.
> > >
> > > > Certainly Bazel (& the internal Google version used to build most Google software) can't handle an unbounded/unknown number of output files from a build action.
> > >
> > > Yes, in principle .dwo files seem troublesome for build systems in general.
> >
> > They're pretty practical when they're generated right next to the .o file & that's guaranteed by the compiler. "if you generate x.o, there will be x.dwo next to it" - that's certainly how Bazel deals with this. It doesn't parse the DWARF at all - knowing where the .dwo files are along with the .o files.
>
> The DWARF spec makes it clear that a DWO is per CU, not per object file. So when an object file contains multiple CUs, it might also be associated with multiple .dwo files (as is also the case with a linked executable or shared library). The spec says the DW_AT_dwo_name can contain both a (relative) file or a path to the associated DWO file. Which means that relying on a one-to-one mapping from .o to .dwo is fragile and is likely to break when tools start using multiple CUs or different naming heuristics.

Yep, agreed - in the most general form there's no guarantee that one compilation would produce one .dwo and you'd have to parse the .o to find all the associated .dwos.
Practically speaking that's not the reality right now (build systems rely on stronger/narrower guarantees by the compiler about how many/where the .dwo files are).

> > > Because of that I am actually a fan of the SHF_EXCLUDED hack that simply places the split .dwo sections in the same object file. For the above that would mean, just place them in the same section group.
> >
> > This was a newer feature added during standardization of Split DWARF, which is handy for some users
>
> Although it is used in practice by some producers, it is not standardized (yet). Also because SHF_EXCLUDED isn't standardized (although it is used consistently for those arches that support it).

Ah, sorry, I didn't mean the specific implementation strategy of using SHF_EXCLUDED, I meant the general concept of having a .o file be its own .dwo file is standardized: "The sections that do not require relocation, however, can be written to the relocatable object (.o) file but ignored by the linker, or they can be written to a separate DWARF object (.dwo) file that need not be accessed by the linker."

> > - but doesn't address the needs of the original design of Split DWARF (for Google) - a distributed build system that is trying to avoid moving more bytes than it must to one machine to run the link step. So not having to ship all the DWARF bytes to one machine for interactive debugging (pulling down from a distributed file system only the needed .dwo files during debugging - not all of them) - or at least being able to ship all the .dwo files to one machine to make a .dwp, and ship all the .o files to another machine for the link.
>
> I think that is not what most people would use split-dwarf for.

Probably not - but it's the use case I care about/need to support.

> The Google setup seems somewhat unique. Most people probably do compiling, linking and debugging on the same machine.
> The main use case (for me) is to speed up the edit-compile-debug cycle. Making sure that the linker doesn't have to deal with (most of) the .debug sections and can just leave them behind (either in the .o file, or a separate .dwo file) is the main attraction of split-dwarf IMHO. When actually producing production builds with debug you still pay the price anyway, because instead of the linker, you now need to build your dwp packages which does most of the same work the linker would have done anyway (combining the data, merging the string indexes, deduplicating debug types, etc.)

It's still a price you can parallelize, rather than having to serialize (somewhat - lld is multithreaded for instance). And the dwp support for linking other dwp files together means you can do it iteratively (rather than taking all the .dwo files and doing one big link step - you can take a few dwos, link them into an intermediate dwp (removing duplicate type information and strings) then link again with other intermediate dwps, etc - with some distribution/parallelism benefits).

>
> > > > Multiple CUs in a single .dwo file is not really supported, which would be another challenge (w
Re: Range lists, zero-length functions, linker gc
On Tue, Jun 2, 2020 at 8:10 PM Alan Modra wrote:
>
> On Tue, Jun 02, 2020 at 11:06:10AM -0700, David Blaikie via Binutils wrote:
> > On Tue, Jun 2, 2020 at 9:50 AM Mark Wielaard wrote:
> > > where I would argue the compiler simply needs to make sure that if it generates code in separate sections it also should create the DWARF separate section (groups).
> >
> > I don't think that's practical - the overhead, I believe, is too high. Headers for each section contribution (ELF headers but DWARF headers moreso - having a separate .debug_addr, .debug_line, etc section for each function would be very expensive) would make for very large object files.
>
> With a little linker magic I don't see the necessity of duplicating the DWARF headers. Taking .debug_line as an example, a compiler could emit the header, opcode, directory and file tables to a .debug_line section with line statements for function foo emitted to .debug_line.foo and for bar to .debug_line.bar, trusting that the linker will combine these sections in order to create an output .debug_line section. If foo code is excluded then .debug_line.foo info will also be dropped if section groups are used.

I don't think this would apply to debug_addr - where the entries are referenced from elsewhere via index, or debug_rnglists where the rnglists header (or the debug_info directly) contains offsets into this section, so taking chunks out would break those offsets. (Or to the file/directory name part of debug_line - where you might want to remove file/line entries that were eliminated as dead code - but that'd throw off the indexes.)
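The index problem raised for debug_addr in the reply above can be illustrated with a toy address table. This is only a sketch under the assumption that each slot could be dropped independently, as the per-function sub-section scheme would allow; `drop_dead_slots` is a hypothetical helper, not part of any linker.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical: remove the address slots whose code was discarded,
// the way a section-combining linker would if each slot were its own
// sub-section. Nothing rewrites the indexes stored in .debug_info.
std::vector<std::uint64_t> drop_dead_slots(
        const std::vector<std::uint64_t>& addr_table,
        const std::vector<bool>& dead) {
    std::vector<std::uint64_t> out;
    for (std::size_t i = 0; i < addr_table.size(); ++i) {
        const bool is_dead = i < dead.size() && dead[i];
        if (!is_dead) out.push_back(addr_table[i]);
    }
    return out;
}
```

For a table {f1, f2, f3} with f2 discarded, a DIE that still refers to f3 by index 2 now points past the end of the table, and index 1 silently names f3's address instead of f2's.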
Re: Range lists, zero-length functions, linker gc
On Fri, Jun 19, 2020 at 5:00 AM Mark Wielaard wrote:
>
> Hi,
>
> On Tue, 2020-06-02 at 11:06 -0700, David Blaikie via Elfutils-devel wrote:
> > > I do think combining Split DWARF and LTO might not be the best solution. When doing LTO you probably want something like GCC Early Debug, which is like Split DWARF, but different, because the Early Debug simply doesn't contain any address (ranges) yet (not even through indirection like .debug_addr).
> >
> > I don't think Early Debug fits here - it seems like it was specifically for DWARF that doesn't refer to any code (eg: function declarations and type definitions). I don't see how it could be used for the actual address-referencing DWARF needed to describe function definitions.
>
> I think that is kind of the point of Early Debug. Only use DWARF (at first) for address/range-less data like types and program scope entries, but don't emit anything (in DWARF format) for things that might need adjustments during link/LTO phase. The problem with using DWARF with address (ranges) during early object creation is that the linker isn't capable to rewrite the DWARF. You'll need a linker plugin that calls back into the compiler to do the actual LTO and emit the actual DWARF containing address/ranges (which can then link back to the already emitted DWARF types/program scope/etc during the Early Debug phase). I think the issue you are describing is actually that you do use DWARF to describe function definitions (not just the declarations) too early. If you aren't sure yet which addresses will be used DWARF isn't really the appropriate (temporary) debug format.

Sorry, I think we keep talking around each other. Not sure if we can reach a good consensus or shared understanding on this topic.
DWARF in unlinked object files has been a fairly well used temporary debug format for a long time - and the DWARF spec has done a lot to ensure it is compatible with ELF in both object files and linkers forever, basically? So I don't think it'd be suitable to say "DWARF isn't an appropriate intermediate debug format to use between compilers and linkers". In the sense that I don't think either the DWARF committee members, producers, or consumers would agree with this sentiment.

> > > > > > & again the overhead of all those separate contributions, headers, etc, turns out to be not very desirable in any case.
> > > > >
> > > > > Yes, I agree with that. But as said earlier, maybe the compiler shouldn't have generated the code/data in the first place?
> > > >
> > > > In the (especially) C++ compilation model, I don't believe that's possible - inline functions, templates, etc, require duplication - unless you have a more complicated build process that can gather the potential duplication, then fan back out again to compile, etc. ThinLTO does some of this - at a cost of a more complicated build system, etc.
> > >
> > > It might be useful for the original discussion to have a few more concrete examples to show when you might have unused code that the linker might want to discard, but where the compiler could only produce DWARF in one big blob. Apart from the -ffunction-sections case,
> >
> > Function sections, inline functions, function templates are core examples.
>
> I understand the function sections case, but can you give actual examples of an inline function or function template source code and how a DWARF producer generates DWARF for that? Maybe some simple source code we can put through gcc or clang to see how they (mis)handle it. Not being a compiler architect I am not sure I understand why those cannot be expressed correctly.

Oh, sure! Sorry.
A simple case of inline functions being deduplicated looks like this:

a.cpp:
  inline void f1() { }
  void f2() { f1(); }

b.cpp:
  inline void f1() { }
  void f2();
  int main() { f1(); f2(); }

This actually demonstrates a slightly different behavior of bfd and gold: when the comdats are the same size (I'm told that's the heuristic) and the local symbol names the DWARF uses to refer to the functions (f1 in this case) match, both DWARF descriptions are resolved to point to the same deduplicated copy of 'f1'. E.g., BFD and Gold both produce this DWARF (uninteresting attributes have been omitted):

DW_TAG_compile_unit [1] *
  DW_AT_name [DW_FORM_strp] ( .debug_str[0x0065] = "a.cpp")
Re: Tombstone values in debug sections (was: Range lists, zero-length functions, linker gc)
On Fri, Jun 19, 2020 at 1:04 PM Mark Wielaard wrote:
>
> Hi,
>
> On Tue, 2020-06-09 at 13:24 -0700, Fangrui Song via Elfutils-devel wrote:
> > I want to revive the thread, but focus on whether a tombstone value
> > (-1/-2) in .debug_* can cause trouble to various DWARF consumers (gdb,
> > debug related tools in elfutils and other utilities I don't know about).
> >
> > Paul Robinson has proposed that DWARF v6 should reserve a tombstone
> > value (the value a relocation referencing a discarded symbol in a
> > .debug_* section should be resolved to)
> > http://www.dwarfstd.org/ShowIssue.php?issue=200609.1
>
> I would appreciate having a clear "not valid" marker instead of getting
> a possibly bogus (but valid) address. -1 seems a reasonable value.
> Although I have seen (and written) code that simply assumes zero is
> that value.

Yep - and zero seemed like a good one, except in debug_ranges and debug_loc, where it would produce a premature list termination (ld.bfd gets around this by using 1 in debug_ranges), or on architectures for which 0 is a valid address.

If you use the zero+addend approach that gold uses (and lld did use/maybe still does, but is going to move away from), then you /almost/ avoid the need to special-case debug_ranges and debug_loc, until you hit a zero-length function (you can create zero-length functions from code like "int f1() { }" or "void f2() { __builtin_unreachable(); }"). Then you get the early list termination again.

Also, zero+addend might trip up in a case like "void f1() { } __attribute__((nodebug)) void f2() { } void f3() { }": now f3's starting address has a non-zero addend, so it's indistinguishable from valid code at a very low address.

> Would such an invalid address marker in an DW_AT_low_pc make the whole
> program scope under a DIE invalid? What about (addr, loc, rng) base
> addresses? Can they contain an invalid marker, does that make the whole
> table/range invalid?
That would be my intent, yes - any pointer derived from an invalid address would be invalid. Take the f1/f2/f3 nodebug example above: f3's starting address could be described by "invalid address + offset", so it'd be important for that to be special-cased in pointer arithmetic so the tombstone value propagates through it.

(Currently DWARF has no way of describing this. Well, it sort of does: you could use an exprloc with a DW_OP_addrx and the arithmetic necessary to add to that, though I doubt many consumers could handle an exprloc there. I would like to champion that, to enable reuse of address pool entries and so reduce the size of .o debug info contributions when using Split DWARF, or just to reduce the number of relocations/.o file size when using non-split DWARF.)

> I must admit that as a DWARF consumer I am slightly worried that having
> a sanctioned "invalid marker" will cause DWARF producers to just not
> coordinate and simply assume they can always invalidate anything they
> emit.

At least in my experience (8 years or so working on LLVM's DWARF emission) we've got a pretty strong incentive to reduce DWARF size already. I don't think any producers are being particularly cavalier about producing excess DWARF on the basis that it can be marked invalid.

> Even if there could be a real solution by coordinating between
> compiler/linker who is responsible for producing the valid DWARF
> entries (especially when LTO is involved).

A lot of engineering work went into restructuring LLVM's debug info IR representation for LTO to ensure LLVM doesn't produce DWARF for functions deduplicated or dropped by LTO.

- Dave

>
> > Some comments about the proposal:
> >
> > > - deduplicating different functions with identical content; GNU
> > > refers
> > > to this as ICF (Identical Code Folding);
> >
> > ICF (gold --icf={safe,all}) can cause DW_TAG_subprogram with
> > different DW_AT_name to have the same range.
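
Coming back to the "invalid address + offset" point earlier in this message, here is a rough sketch of the difference between zero+addend and a tombstone that propagates through arithmetic. It is purely illustrative: the function names, the 64-bit address size, the offset of f3 and the choice of all-ones as the tombstone are assumptions made for the example, not something any linker or the DWARF spec defines. The small list walker only models the (0, 0) end-of-list rule of a DWARF v4 style .debug_ranges list and leaves base-address selection entries out.

```python
ADDR_MAX = (1 << 64) - 1   # all-ones 64-bit address
TOMBSTONE = ADDR_MAX       # assumed "invalid address" marker for this sketch


def add_offset(addr, offset):
    """Address arithmetic in which an invalid address stays invalid."""
    if addr == TOMBSTONE:
        return TOMBSTONE
    return (addr + offset) & ADDR_MAX


def walk_ranges(pairs):
    """Collect (begin, end) pairs up to the first (0, 0) end-of-list entry.

    Base-address selection entries are deliberately not modelled.
    """
    out = []
    for begin, end in pairs:
        if begin == 0 and end == 0:
            break
        out.append((begin, end))
    return out


# f1, a nodebug f2 and f3 live in one discarded section.
# Hypothetical layout: f3 sits 0x20 bytes into that section.
F3_OFFSET = 0x20

# zero+addend: the discarded section "starts" at 0, so f3 gets a small
# address that cannot be told apart from real code at a very low address.
f3_zero_addend = add_offset(0, F3_OFFSET)

# tombstone: the section start is invalid, and so is anything derived from it.
f3_tombstone = add_offset(TOMBSTONE, F3_OFFSET)

# A discarded zero-length function at offset 0 resolves to [0, 0) under
# zero+addend, which reads as end-of-list and hides the live range after it.
ranges = walk_ranges([(0, 0), (0x401000, 0x401010), (0, 0)])

print(hex(f3_zero_addend), f3_tombstone == TOMBSTONE, ranges)
```

The only point of the sketch is that the check has to live inside the arithmetic itself; a consumer that compares against the tombstone only at the place where it reads DW_AT_low_pc would still end up with a plausible-looking address for f3.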
>
> Cary Coutant wrote up a general Two-Level Line Number Table proposal to
> address the issue of having a single machine instruction corresponds to
> more than one source statement:
> http://wiki.dwarfstd.org/index.php?title=TwoLevelLineTables
>
> Which seems useful in these kind of situations. But I don't know the
> current status of the proposal.

This was motivated by a desire to be able to do symbolized stack traces including inline stack frames with a smaller representation than is currently possible in DWARF. It allows the line table itself to describe inlining, to some degree, rather than relying on the DIE tree (in part this was motivated by a desire to be able to symbolize backtraces with inlining in-process when Split DWARF is used and the .dwo/.dwp files are not available).

I don't think it extends to dealing with the case of deduplication like this, nor does it address the possibility of two CUs having overlapping instruction ranges. (It's semantically roughly equivalent to the inlined_subroutines of a subprogram - not so much related to two copies of a function being deduplic
Re: Range lists, zero-length functions, linker gc
On Wed, Jun 24, 2020 at 3:22 PM Mark Wielaard wrote:
>
> Hi David,
>
> On Fri, 2020-06-19 at 17:46 -0700, David Blaikie via Elfutils-devel wrote:
> > On Fri, Jun 19, 2020 at 5:00 AM Mark Wielaard wrote:
> > > I think that is kind of the point of Early Debug. Only use DWARF (at
> > > first) for address/range-less data like types and program scope
> > > entries, but don't emit anything (in DWARF format) for things that
> > > might need adjustments during link/LTO phase. The problem with using
> > > DWARF with address (ranges) during early object creation is that the
> > > linker isn't capable to rewrite the DWARF. You'll need a linker plugin
> > > that calls back into the compiler to do the actual LTO and emit the
> > > actual DWARF containing address/ranges (which can then link back to the
> > > already emitted DWARF types/program scope/etc during the Early Debug
> > > phase). I think the issue you are describing is actually that you do
> > > use DWARF to describe function definitions (not just the declarations)
> > > too early. If you aren't sure yet which addresses will be used DWARF
> > > isn't really the appropriate (temporary) debug format.
> >
> > Sorry, I think we keep talking around each other. Not sure if we can
> > reach a good consensus or shared understanding on this topic.
>
> I think the confusion comes from the fact that we seem to cycle through
> a couple of different topics which are related, but not really
> connected directly.
>
> There is the topic of using "tombstones" in place of some pc or range
> attributes/tables in the case of traditional linking separate compile
> units/objects. Where we seem to agree that those are better than
> silently producing bad data, but were we disagree whether there are
> other ways to solve the issue (using comdat section for example, where
> we might see the overhead/gains differently).
>
> There is the topic of LTO where part of the linker optimization is done
> through a (compiler) plugin.
> Where it isn't clear (to me at least) if
> some of the traditional way of handling DWARF in object files makes
> sense.

Oh - perhaps to clarify: I don't know of any implementation that creates DWARF in intermediate object files in LTO.

> I would argue that GCC shows that for LTO you need something
> like Early Debug, where you only produce parts of the DWARF early that
> don't contain any addresses or ranges, since you don't know yet where
> code/data will end up till after the actual LTO phase, only after which
> it can be produced.

Yeah - I guess that's the point of the name "Early Debug": it's earlier than usual, rather than making the rest later than usual. In LLVM's implementation the faux .o files in LTO contain no DWARF whatsoever, just a semantic representation, something like DWARF, intended to be manipulated by compiler optimizations and designed to drop unreferenced portions as optimizations make changes. (If you inline and optimize away a function call, that function may get dropped - then no DWARF is emitted for it, same as if it were never called.)

Yeah, it'd be theoretically possible to create all the DWARF up-front, use loclists and rnglists for /everything/ (because you wouldn't know if a variable would have a single location or multiple until after optimizations) and then fill in those loclists and rnglists post-optimization. I don't know of any implementation that does that, though. It'd make for very verbose DWARF, and I agree with you that that wouldn't be great. I think the only point of conflict there is: I don't think that's a concern that's actually manifesting in DWARF producers today. Certainly not in LLVM & doesn't sound like it is in GCC.
I think there's enough incentive, for compiler performance, not to produce loads of duplicate DWARF and to have a fairly compact/optimizable intermediate representation. A lot of work went into changing LLVM's representation to be more amenable to LTO, to ensure things got dropped and deduplicated as soon as possible.

> Then there is the topic of Split Dwarf, where I am not sure it is
> directly relevant to the above two topics. It is just a different
> representation of the DWARF data, with an extra layer of indirections
> used for addresses. Which in the case of the traditional model means
> that you still hit the tombstones, just through an indirection table.
> And for LTO it just makes some things more complicated because you have
> this extra address indirection table, but since you cannot know where
> the addresses end up till after the LTO phase you now have an extra
> layer of indirection to