Chris Lattner <[EMAIL PROTECTED]> writes:

>> * The return value of lto_module_get_symbol_attributes is not
>> defined.
>
> Ah, sorry about that. Most of the details are actually in the public
> header. The result of this function is a 'lto_symbol_attributes'
> bitmask. This should be more useful and revealing:
> http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm-c/lto.h?revision=HEAD&view=markup
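(For concreteness, the per-symbol model that header describes looks
roughly like this from the linker's side. This is only a sketch based
on my reading of llvm-c/lto.h; the exact names and signatures should be
checked against the header, and I have not tried to compile it.)

#include <stdio.h>
#include <llvm-c/lto.h>

/* Walk one LTO module's symbol table, one call per symbol, and let the
   linker record each name/attribute pair in its own hash table.  */
static void
scan_module (const char *path)
{
  lto_module_t mod = lto_module_create (path);
  unsigned int i, n = lto_module_get_num_symbols (mod);
  for (i = 0; i < n; i++)
    {
      const char *name = lto_module_get_symbol_name (mod, i);
      /* The return value is the 'lto_symbol_attributes' bitmask
         (definition kind, scope, etc.) mentioned above.  */
      lto_symbol_attributes attr = lto_module_get_symbol_attributes (mod, i);
      printf ("%s attrs=0x%x\n", name, (unsigned int) attr);
    }
  lto_module_dispose (mod);
}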
From an ELF perspective, this doesn't seem to have a way to indicate a
common symbol, and it doesn't provide the symbol's type. It also
doesn't have a way to indicate section groups. (How do section groups
work in Mach-O? An example is a C++ template function with a static
constant array which winds up in the .rodata section. Section groups
permit discarding the array when we discard the function code.)

>> * Interfaces like lto_module_get_symbol_name and
>> lto_codegen_add_must_preserve_symbol are inefficient when dealing
>> with large symbol tables.
>
> The intended model is for the linker to query the LTO plugin for its
> symbol list and build up its own linker-specific hash table. This way
> you don't need to force the linker to use the plugin's data structure
> or the plugin to use the linker data structure. We converged on this
> approach after trying it the other way.
>
> Does this make sense, do you have a better idea?

In gcc's LTO approach, I think the linker will already have access to
the symbol table anyhow. But my actual point here is that requiring a
function call for every symbol is inefficient. These functions should
take an array and a count. There can be hundreds of thousands of
entries in a symbol table, and the interface should scale accordingly.

>> The LLVM interface does not do that.
>
> Yes it does, the linker fully handles symbol resolution in our model.
>
>> Suppose the linker is invoked on a sequence of object files, some
>> with LTO information, some without, all interspersed. Suppose some
>> symbols are defined in multiple .o files, through the use of common
>> symbols, weak symbols, and/or section groups. The LLVM interface
>> simply passes each object file to the plugin.
>
> No, the native linker handles all the native .o files.
>
>> The result is that the plugin is required to do symbol resolution
>> itself. This 1) loses one of the benefits of having the linker
>> around; 2) will yield incorrect results when some non-LTO object is
>> linked in between LTO objects but redefines some earlier weak symbol.
>
> In the LLVM LTO model, the plugin only needs to know about its .o
> files, and the linker uses this information to reason about symbol
> merging etc. The Mac OS X linker can even do dead code stripping
> across Macho .o files and LLVM .bc files.

To be clear, when I said object file here, I meant any input file. You
may have understood that.

In ELF you have to think about symbol overriding. Let's say you link
a.o b.o c.o. a.o has a reference to the symbol S. b.o has a strong
definition of S. c.o has a weak definition of S. a.o and c.o have LTO
information, b.o does not. ELF requires that a.o's reference resolve to
the strong definition in b.o, not the weak definition in c.o. I don't
see how to make that work with the LLVM interface. (The case is
sketched in source form at the end of this message.)

This is not a particularly likely example, of course. People rely on
this sort of symbol overriding quite a bit, but it's unlikely that a.o
and c.o would have LTO information while b.o would not. However, given
that we are designing an interface, I think we should design it so that
correctness is possible.

> Further, other pieces of the toolchain (nm, ar, etc) also use the same
> interface so that they can return useful information about LLVM LTO
> files.

Useful, but as I understand it gcc's LTO files will have that
information anyhow.

> This is our second major revision of the LTO interfaces, and the
> interface continues to slowly evolve. I think it would be great to
> work with you guys to extend the design to support GCC's needs.

Agreed.

Ian
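P.S. Here is the overriding case in source form, as a minimal sketch.
The file names, and the use of 'f' to stand in for the symbol S, are
only illustrative; the point is the required resolution order.

/* a.c -- has LTO information; references S (here 'f').  */
extern int f (void);
int call_f (void) { return f (); }

/* b.c -- plain native object, no LTO information; strong definition.  */
int f (void) { return 1; }

/* c.c -- has LTO information; weak definition.  */
int __attribute__ ((weak)) f (void) { return 2; }

/* When a.o b.o c.o are linked in that order, ELF requires the call in
   a.o to bind to the strong definition in b.o, even though c.o's weak
   definition appears later and even though b.o is outside the LTO
   plugin's view.  */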