Hello,
Let me summarize my current understanding of how the host binary is
linked and how the target binary is built and linked.

We put the code that is supposed to be offloaded into dedicated sections
whose names start with gnu.target_lto_

At link time (I mean, link time of the host app):
  1. Generate a dedicated data section in each binary (executable or DSO),
     which will be a placeholder for the offloading data.

  2. Generate a __OPENMP_TARGET__ (weak, hidden) symbol,
     which will point to the start of the section mentioned in the previous item.

This section should contain at least:
  1. Number of targets
  2. Size of the offloading symbols table

  [ Repeat `number of targets' times ]
  3. Name of target
  4. Offset to the beginning of the image to offload to that target
  5. Size of image

  6. Offloading symbols table

The offloading symbols table will contain the addresses of offloadable
symbols, so that a host<->target address mapping can be created
at runtime.
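
To make the layout above more concrete, here is a rough C sketch of how
such a section could be described and walked. All struct names, field
names and field widths are my assumptions for illustration (including the
[ address, size ] symbol row and the fixed-size target name); nothing here
is an agreed format.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout of the offloading section; every name and
   field width below is an assumption made for illustration.  */

struct offload_header {
    uint32_t num_targets;       /* number of targets */
    uint32_t symtab_size;       /* number of rows in the symbols table */
};

struct offload_target {         /* repeated num_targets times */
    char     name[16];          /* name of target, NUL-terminated */
    uint64_t image_offset;      /* offset from section start to the image */
    uint64_t image_size;        /* size of the image */
};

struct offload_sym {            /* one row of the symbols table */
    uint64_t host_addr;
    uint64_t size;              /* 1 for routines */
};

/* Return target entry I; the entries follow the header directly.  */
static const struct offload_target *
offload_target_at(const void *section, uint32_t i)
{
    const struct offload_header *h = section;
    assert(i < h->num_targets);
    return (const struct offload_target *)
           ((const char *)section + sizeof *h) + i;
}

/* Return the entry for the target called NAME, or NULL if absent.  */
static const struct offload_target *
offload_find_target(const void *section, const char *name)
{
    const struct offload_header *h = section;
    for (uint32_t i = 0; i < h->num_targets; i++) {
        const struct offload_target *t = offload_target_at(section, i);
        if (strcmp(t->name, name) == 0)
            return t;
    }
    return NULL;
}

/* Return the symbols table, which follows the last target entry.  */
static const struct offload_sym *
offload_symtab(const void *section)
{
    const struct offload_header *h = section;
    const char *p = (const char *)section + sizeof *h
                    + (size_t)h->num_targets * sizeof(struct offload_target);
    return (const struct offload_sym *)p;
}
```

In this sketch libgomp would pass the address of __OPENMP_TARGET__ as
`section` and then use image_offset and image_size to locate the image
for the chosen target.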

To get the list of target addresses we need a dedicated interface call
to the libgomp plugin, something like getTargetAddresses (), which will
query the target for the list of addresses (accompanied by symbol names).
To provide this information, the target DSO should contain a similar table
mapping symbols to addresses.

The application is going to have a single instance of libgomp, which
in turn means that we'll have a single splay tree holding the
(host -> target) mapping information for all DSOs and the executable.

When GOMP_target* is called, a pointer to the table of the current execution
module is passed to libgomp along with a pointer to the routine (or global).
libgomp in turn:
  1. Checks in the splay tree whether the address of the given pointer (to
     the table) exists. If not, the given table is not yet initialized;
     libgomp initializes it (see below) and inserts the address of the table
     into the splay tree.
  2. Looks up the (host) address in the table provided
     and extracts the target address.
  3. Once the target address is found, performs an API call (passing that
     address) to the given device.
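
A small C sketch of steps 1 and 2, just to illustrate the lazy
initialization. To keep it short, a fixed-size array stands in for the
splay tree, and init_table () fakes the target addresses instead of asking
the plugin. All names (module_table, resolve_target_addr, ...) are
hypothetical and not existing libgomp entry points; step 3, the device
call itself, is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct map_row { uintptr_t host; uintptr_t target; };

struct module_table {
    struct map_row *rows;
    size_t n;
    int init_calls;             /* only here to observe the behaviour */
};

/* Stand-in for the splay tree: the set of tables already initialized.  */
#define MAX_MODULES 16
static const struct module_table *known[MAX_MODULES];
static size_t n_known;

static int
table_is_known(const struct module_table *t)
{
    for (size_t i = 0; i < n_known; i++)
        if (known[i] == t)
            return 1;
    return 0;
}

/* Stand-in for table initialization.  A real implementation would query
   the device; here target addresses are faked as host + 0x1000.  */
static void
init_table(struct module_table *t)
{
    for (size_t i = 0; i < t->n; i++)
        t->rows[i].target = t->rows[i].host + 0x1000;
    t->init_calls++;
}

/* Steps 1 and 2: initialize the table on first use, then look up HOST.
   Returns 0 if HOST is not present in the table.  */
static uintptr_t
resolve_target_addr(struct module_table *t, uintptr_t host)
{
    if (!table_is_known(t)) {
        assert(n_known < MAX_MODULES);
        init_table(t);
        known[n_known++] = t;
    }
    for (size_t i = 0; i < t->n; i++)
        if (t->rows[i].host == host)
            return t->rows[i].target;
    return 0;
}
```

Each module's table is initialized only once, on the first GOMP_target*
call coming from that module; later calls go straight to the lookup.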

We have at least 2 approaches to solving the host->target mapping.

I. Preserve the order of symbol appearance.
   Table row: [ address, size ]
   For routines, size is 1

   In order to initialize the table we need to get two arrays:
   of host and of target addresses. The order of appearance of objects in
   these arrays must be the same. Having this makes mapping easy:
   we just need to find the index of the given address in the array of host
   addresses and then index the array of target addresses with it.

   The problem is that this is unlikely to work when LTO of the host is ON.
   I am also not sure that the order of handling objects on the target is
   the same as on the host.
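
   The index-based lookup of approach I, as a minimal C sketch. The
   function name is hypothetical, and a linear search is used only for
   brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Approach I: HOST and TARGET hold the same objects in the same order,
   so the index found in HOST is valid for TARGET as well.
   Returns 0 if ADDR is not a known host address.  */
static uintptr_t
map_by_index(const uintptr_t *host, const uintptr_t *target,
             size_t n, uintptr_t addr)
{
    assert(host != NULL && target != NULL);
    for (size_t i = 0; i < n; i++)
        if (host[i] == addr)
            return target[i];
    return 0;
}
```

   If the two arrays are ordered differently, this silently returns the
   address of the wrong object, which is exactly the concern above.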

II. Store a symbol identifier along with the address.
  Table row: [ symbol_name, address, size ]
  For routines, size is 1

  To construct the table of host addresses, at link
  time we put the addresses of all symbols (marked at compile time with a
  dedicated attribute) into the table, accompanied by symbol names (they'll
  serve as keys).

  During initialization of the table we create the host->target address
  mapping using symbol names as keys.
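
  The name-keyed initialization of approach II, as a minimal C sketch.
  Struct and function names are hypothetical; the quadratic loop is only
  for brevity, a real implementation would probably sort or hash the names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct sym_row { const char *name; uintptr_t addr; size_t size; };
struct addr_pair { uintptr_t host; uintptr_t target; };

/* Approach II: join the host and target tables on the symbol name.
   OUT must have room for N_HOST pairs.  Host symbols with no match on
   the target are skipped.  Returns the number of pairs written.  */
static size_t
build_mapping(const struct sym_row *host, size_t n_host,
              const struct sym_row *target, size_t n_target,
              struct addr_pair *out)
{
    size_t n = 0;
    assert(out != NULL);
    for (size_t i = 0; i < n_host; i++)
        for (size_t j = 0; j < n_target; j++)
            if (strcmp(host[i].name, target[j].name) == 0) {
                out[n].host = host[i].addr;
                out[n].target = target[j].addr;
                n++;
                break;
            }
    return n;
}
```

  Unlike approach I, the result does not depend on the order in which the
  two sides emit their symbols, at the cost of storing the names.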

The last thing I wanted to summarize: compiling target code.

We have 2 approaches here:

   1. Perform WPA and extract the sections marked as target into a separate
      object file. Then call the target compiler on that object file to
      produce the binary.

      As mentioned by Jakub, this approach will complicate debugging.

   2. Pass fat object files directly to the target compiler (one CU at a time).
      So, for every object file we are going to call GCC twice:
          - Host GCC, which will compile all host code for every CU
          - Target GCC, which will compile all target code for every CU

I vote for option #2, since the WPA-based approach complicates debugging.
What do you guys think?

--
Thanks, K
