On 9/24/20 9:03 AM, Richard Biener wrote:

Hmm, but offload_vars and offload_funcs do not need to be exported
since they get stored into tables with addresses pointing to them
(and that table is exported).

Granted, but the x86-64 linker does not seem to be able to resolve
the symbol if the table is in a.ltrans0.ltrans.o and the variable
or function is in a.ltrans1.ltrans.o.

That's both host/x86-64 code; the linker might not see that the
table is used by a dynamic library – but still it should resolve
the links, shouldn't it?
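
To make the table argument concrete, here is a minimal single-file sketch in plain C (not GCC-internal code). The names twice, counter, demo_func_table, demo_var_table and call_via_table are made up for illustration. It shows file-local symbols being reached only through tables that hold their addresses, which is the situation described above as long as table and symbols end up in the same object file. Once they are split across LTRANS partitions, the referenced symbol must at least be visible across object files, which is what this thread is about.

```c
#include <assert.h>
#include <stddef.h>

/* File-local (non-exported) function and variable, standing in for
   entries of offload_funcs / offload_vars.  Hypothetical names.  */
static int twice (int x) { return 2 * x; }
static int counter = 21;

/* Hypothetical tables; only these have external linkage.  */
typedef int (*func_ptr) (int);
func_ptr const demo_func_table[] = { twice };
int *const demo_var_table[] = { &counter };

/* Reach the file-local symbols purely through the exported tables.  */
int
call_via_table (size_t idx)
{
  assert (idx < sizeof demo_func_table / sizeof demo_func_table[0]);
  return demo_func_table[idx] (*demo_var_table[idx]);
}
```

Within one translation unit this needs no exported symbol besides the tables; the cross-partition case is where TREE_PUBLIC or symbol promotion comes in.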

Possibly, the 'externally_visible = 1' in my code is also a
red herring; it also works by using:
   TREE_PUBLIC (decl) = 1;
   gcc_assert (!node->offloadable);
   node->offloadable = 1;
and below
  if (node->offloadable)
    {
      node->offloadable = 0;
      validize_symbol_for_target (node);
      continue;
    }
Namely: PUBLIC + avoid calling promote_symbol.

Note that ultimately the desired visibility is determined by
the linker and communicated via the resolution file to the WPA
stage.  I'm not sure whether both host and offload code participate
in the same link and thus if the offload tables are properly
seen as being referenced

This could be the problem. The device part is linked by the
host/x86-64 linker – but the device's ".o" files are just linked
and not processed by 'ld'. (In case of nvptx, they are host
compiled .o files which contain everything as strings with the
nvptx as text – to be passed to the JIT at startup.)

Note that *no* WPA/LTO is done on the device side – there, all
generated files are merely collected, without any inter-file
optimizations. (Sufficient for the code generated by the program,
which is all in one file – but it still would be useful to
inline, e.g., libm functions.)

(for a non-DSO, symbols are usually _not_
force-exported) - so, how is the offload table constructed?

First, the offload tables exist both on the host and on the
device(s). They have to be identical, as otherwise the
association between host and device variables and functions is lost.
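
As a rough illustration of why the tables have to match, here is a small C sketch that pairs host and device entries purely by table index. This is a simplified model under my own assumptions, not the actual runtime code: the addresses are invented, and host_table, device_table and host_to_device are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins: addresses as the host sees them and as the
   device sees them, listed in the same order.  */
#define N_ENTRIES 3
static const uintptr_t host_table[N_ENTRIES]   = { 0x1000, 0x1040, 0x1080 };
static const uintptr_t device_table[N_ENTRIES] = { 0x9000, 0x9100, 0x9200 };

/* Map a host address to the device address at the same table index.
   Returns 0 if the host address is not in the table.  */
uintptr_t
host_to_device (uintptr_t host_addr)
{
  for (size_t i = 0; i < N_ENTRIES; i++)
    if (host_table[i] == host_addr)
      return device_table[i];
  return 0;
}
```

If one side dropped or reordered an entry, every later index would silently pair the wrong host and device symbols.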

The symbols are added to offload_vars + offload_funcs.

In lto-cgraph.c's output_offload_tables there is the last chance
to remove now-unused nodes, because once the tables are streamed
for device usage, they cannot be changed. Hence, that function
sets
   node->force_output = 1;
[Unrelated: this prevents later optimizations, which still
could be done; cf. PR95622]


The table itself is written in omp-offload.c's omp_finish_file.

For the host, the constructor is constructed in
add_decls_addresses_to_decl_constructor, which does:
      CONSTRUCTOR_APPEND_ELT (v_ctor, NULL_TREE, addr);
      if (is_var)
        CONSTRUCTOR_APPEND_ELT (v_ctor, NULL_TREE, size);
and then in omp_finish_file:
      tree funcs_decl = build_decl (UNKNOWN_LOCATION, VAR_DECL,
                                    get_identifier (".offload_func_table"),
                                    funcs_decl_type);
      DECL_USER_ALIGN (funcs_decl) = DECL_USER_ALIGN (vars_decl) = 1;
      SET_DECL_ALIGN (funcs_decl, TYPE_ALIGN (funcs_decl_type));
      DECL_INITIAL (funcs_decl) = ctor_f;
      set_decl_section_name (funcs_decl, OFFLOAD_FUNC_TABLE_SECTION_NAME);
      varpool_node::finalize_decl (vars_decl);
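
For illustration only, the resulting host-side variable table can be pictured like this in plain C: one (address, size) pair per variable, whereas the function table would carry addresses only. This is a sketch under assumptions; the real table is emitted as a constructor of pointer-sized values rather than a struct array, and struct var_entry, demo_var_table and the two helper functions are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical entry type: one (address, size) pair per variable,
   mirroring the two CONSTRUCTOR_APPEND_ELT calls made when is_var
   is true.  */
struct var_entry
{
  void *addr;
  size_t size;
};

/* Two file-local variables standing in for offloaded variables.  */
static int a[4];
static double b;

static const struct var_entry demo_var_table[] = {
  { a, sizeof a },
  { &b, sizeof b },
};

/* Number of variables described by the table.  */
size_t
demo_var_count (void)
{
  return sizeof demo_var_table / sizeof demo_var_table[0];
}

/* Total number of bytes covered by all table entries.  */
size_t
demo_var_bytes (void)
{
  size_t total = 0;
  for (size_t i = 0; i < demo_var_count (); i++)
    total += demo_var_table[i].size;
  return total;
}
```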

Tobias

-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter
