On 9/24/20 9:03 AM, Richard Biener wrote:
Hmm, but offload_vars and offload_funcs do not need to be exported since they get stored into tables with addresses pointing to them (and that table is exported).
Granted, but the x86-64 linker does not seem to be able to resolve the symbol if the table is in a.ltrans0.ltrans.o and the variable or function is in a.ltrans1.ltrans.o. That's both host/x86-64 code; the linker might not see that the table is used by a dynamic library, but it should still resolve the links, shouldn't it?

Possibly, the 'externally_visible = 1' in my code is also a red herring; it also works by using:

  TREE_PUBLIC (decl) = 1;
  gcc_assert (!node->offloadable);
  node->offloadable = 1;

and below

  if (node->offloadable)
    {
      node->offloadable = 0;
      validize_symbol_for_target (node);
      continue;
    }

Namely: PUBLIC + avoid calling promote_symbol.
Note that ultimately the desired visibility is determined by the linker and communicated via the resolution file to the WPA stage. I'm not sure whether both host and offload code participate in the same link and thus if the offload tables are properly seen as being referenced
This could be the problem. The device part is linked by the host/x86-64 linker, but the device's ".o" files are just linked in and not processed by 'ld'. (In case of nvptx, they are host-compiled .o files which contain everything as strings, with the nvptx code as text, to be passed to the JIT at startup.)

Note that *no* WPA/LTO is done on the device side; there, all generated files are only collected, without any inter-file optimizations. (Sufficient for the code generated by the program, which is all in one file, but it still would be useful to inline, e.g., libm functions.)
(for a non-DSO symbols are usually _not_ force-exported) - so, how is the offload table constructed?
First, the offload tables exist both on the host and on the device(s). They have to be identical, as otherwise the association between variables and functions is lost.

The symbols are added to offload_vars + offload_funcs.

In lto-cgraph.c's output_offload_tables there is the last chance to remove now unused nodes, as once the tables are streamed for device usage, they cannot be changed. Hence, there one has

  node->force_output = 1;

[Unrelated: this prevents later optimizations, which still could be done; cf. PR95622]

The table itself is written in omp-offload.c's omp_finish_file. For the host, the constructor is constructed in add_decls_addresses_to_decl_constructor, which does:

  CONSTRUCTOR_APPEND_ELT (v_ctor, NULL_TREE, addr);
  if (is_var)
    CONSTRUCTOR_APPEND_ELT (v_ctor, NULL_TREE, size);

and then in omp_finish_file:

  tree funcs_decl = build_decl (UNKNOWN_LOCATION, VAR_DECL,
                                get_identifier (".offload_func_table"),
                                funcs_decl_type);
  DECL_USER_ALIGN (funcs_decl) = DECL_USER_ALIGN (vars_decl) = 1;
  SET_DECL_ALIGN (funcs_decl, TYPE_ALIGN (funcs_decl_type));
  DECL_INITIAL (funcs_decl) = ctor_f;
  set_decl_section_name (funcs_decl, OFFLOAD_FUNC_TABLE_SECTION_NAME);
  varpool_node::finalize_decl (vars_decl);

Tobias

-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter