Chris Lattner <[EMAIL PROTECTED]> writes:

>> * The return value of lto_module_get_symbol_attributes is not
>>  defined.
>
> Ah, sorry about that.  Most of the details are actually in the public
> header.  The result of this function is a 'lto_symbol_attributes'
> bitmask.  This should be more useful and revealing:
> http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm-c/lto.h?revision=HEAD&view=markup

From an ELF perspective, this doesn't seem to have a way to indicate a
common symbol, and it doesn't provide the symbol's type.  It also
doesn't have a way to indicate section groups.
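
For concreteness, here is a rough sketch of the kind of extra
attribute bits I mean.  The names and encodings below are invented
for illustration; nothing like this is in the current lto.h:

    /* Hypothetical sketch only; these names and encodings do not exist
       in lto.h today.  The idea is to let the plugin report a common
       symbol and an ELF-style symbol type alongside the existing bits.
       Section group membership would probably need a separate query
       returning a group signature, not just a flag. */
    typedef enum {
      LTO_SYMBOL_DEFINITION_COMMON = 0x00010000,  /* common symbol, merged by size/alignment */
      LTO_SYMBOL_TYPE_MASK         = 0x000E0000,
      LTO_SYMBOL_TYPE_FUNCTION     = 0x00020000,  /* like ELF STT_FUNC */
      LTO_SYMBOL_TYPE_OBJECT       = 0x00040000,  /* like ELF STT_OBJECT */
      LTO_SYMBOL_TYPE_TLS          = 0x00060000   /* like ELF STT_TLS */
    } lto_symbol_attributes_ext;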

(How do section groups work in Mach-O?  An example is a C++ template
function with a static constant array that winds up in the .rodata
section.  Section groups permit discarding the array when we discard
the function code.)
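
To make the section group case concrete, here is roughly the kind of
C++ I have in mind.  This is illustrative only; with g++ on ELF the
instantiated function and its table normally end up under COMDAT
section groups, which is what lets the linker throw the table away
along with unused function code:

    // Illustrative only.
    template <typename T>
    int lookup(int i) {
      // Static constant data for the instantiation; ends up in a
      // read-only data section associated with the template code.
      static const int table[4] = { 1, 2, 3, 5 };
      return table[i & 3];
    }

    // Force one instantiation so the object file actually contains
    // the grouped sections.
    template int lookup<char>(int);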


>> * Interfaces like lto_module_get_symbol_name and
>>  lto_codegen_add_must_preserve_symbol are inefficient when dealing
>>  with large symbol tables.
>
> The intended model is for the linker to query the LTO plugin for its
> symbol list and build up its own linker-specific hash table.  This way
> you don't need to force the linker to use the plugin's data structure
> or the plugin to use the linker data structure.  We converged on this
> approach after trying it the other way.
>
> Does this make sense, do you have a better idea?

In gcc's LTO approach, I think the linker will already have access to
the symbol table anyhow.  But my actual point here is that requiring a
function call for every symbol is inefficient.  These functions should
take an array and a count.  There can be hundreds of thousands of
entries in a symbol table, and the interface should scale accordingly.
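
Something along these lines would keep the call count per module
constant.  The names and signatures are invented for illustration,
not proposals for the exact lto.h spelling:

    #include <llvm-c/lto.h>

    /* Hypothetical batch variants: fill caller-provided parallel
       arrays with every symbol's name and attribute bitmask in one
       call, instead of one call per symbol. */
    void lto_module_get_all_symbols(lto_module_t mod,
                                    const char **names,
                                    lto_symbol_attributes *attrs,
                                    unsigned capacity);

    /* Mark a whole batch of symbols as must-preserve in one call. */
    void lto_codegen_add_must_preserve_symbols(lto_code_gen_t cg,
                                               const char *const *symbols,
                                               unsigned count);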


>> The LLVM
>> interface does not do that.
>
> Yes it does, the linker fully handles symbol resolution in our model.
>
>> Suppose the linker is invoked on a
>> sequence of object files, some with LTO information, some
>> without, all interspersed.  Suppose some symbols are defined in
>> multiple .o files, through the use of common symbols, weak symbols,
>> and/or section groups.  The LLVM interface simply passes each object
>> file to the plugin.
>
> No, the native linker handles all the native .o files.
>
>>  The result is that the plugin is required to do
>> symbol resolution itself.  This 1) loses one of the benefits of having
>> the linker around; 2) will yield incorrect results when some non-LTO
>> object is linked in between LTO objects but redefines some earlier
>> weak symbol.
>
> In the LLVM LTO model, the plugin only needs to know about its .o
> files, and the linker uses this information to reason about symbol
> merging etc.  The Mac OS X linker can even do dead code stripping
> across Macho .o files and LLVM .bc files.

To be clear, when I said object file here, I meant any input file.
You may have understood that.

In ELF you have to think about symbol overriding.  Let's say you link
a.o b.o c.o.  a.o has a reference to symbol S.  b.o has a strong
definition.  c.o has a weak definition.  a.o and c.o have LTO
information, b.o does not.  ELF requires that a.o call the symbol from
b.o, not the symbol from c.o.  I don't see how to make that work with
the LLVM interface.
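
Spelled out in source form (using GCC's attribute syntax for the weak
definition), the situation is roughly:

    /* Source for a.o -- has LTO information; references S. */
    extern int S(void);
    int caller(void) { return S(); }

    /* Source for b.o -- plain native object; strong definition of S. */
    int S(void) { return 1; }

    /* Source for c.o -- has LTO information; weak definition of S. */
    __attribute__((weak)) int S(void) { return 2; }

    /* Linking a.o b.o c.o: ELF resolution requires caller() to bind to
       the strong S from b.o, but the plugin only ever sees a.o and
       c.o, so left to itself it would pick the weak S from c.o. */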

This is not a particularly likely example, of course.  People rely on
this sort of symbol overriding quite a bit, but it's unlikely that a.o
and c.o would have LTO information while b.o would not.  However,
given that we are designing an interface, I think we should design it
so that correctness is possible.


> Further, other pieces of the toolchain (nm, ar, etc.) also use the same
> interface so that they can return useful information about LLVM LTO
> files.

Useful, but as I understand it, gcc's LTO files will have that
information anyhow.


> This is our second major revision of the LTO interfaces, and the
> interface continues to slowly evolve.  I think it would be great to
> work with you guys to extend the design to support GCC's needs.

Agreed.

Ian
