Hi!

I have tried a few things, and got things somewhat working, but I'm not
satisfied with my results so far, so I'd like to ask for help.  OpenACC
specifics are not relevant to my question, which I'm thus formulating in
a very generic way.  (But find an illustrative example at the end of the
email.)

If attached to a function declaration X (using a function attribute,
basically), the OpenACC bind clause specifies that when compiling for an
offloading target, all calls to function X should be diverted to function
Y, and the body of function X be discarded.  X remains the call target
when compiling for the host.  Y may be different per offloading target.
In the generic case, Y will be identified with an assembler name.

The requirements mandate an implementation in the LTO front end (which is
the entry point for every offloading compiler), or later.  Is the LTO
front end the right place to do this?  After read_cgraph_and_symbols or
somewhere else?

As we're not going to use X's body in the offloaded code (it's unreachable), my
first thought was: for all decls (X) that have a bind (Y) clause
attached, set the decl X's assembler name to Y's (using
symtab->change_decl_assembler_name -- or
gcc/varasm.c:set_user_assembler_name?).  That somewhat works, but Y will
then be compiled to X's name, and I saw problems if not only X's
declaration but also its definition were available, because we'd then get
two function definitions with X's (assembler) name, and I didn't manage
to discard only the original (unreachable) X definition while keeping its
decl alive (with assembler name Y), which is still used at all call
sites.  Maybe the wrong approach after all...

I'm able to look up cgraph_node::get_for_asmname([Y]), and I tried
experimenting with cgraph_node::create_alias and resolve_alias (in the
LTO front end) but that also hasn't been completely successful: this
worked if compiling with optimizations (Y even got inlined at the call
site of X, good!), but it didn't work with -O0.

I found the redirect_callee and redirect_call_stmt_to_callee functions of
cgraph_edge -- is that something I should be using?  (Still in the LTO
front end?)

Or should I do this redirection after the LTO front end, in an early
pass (execute_oacc_device_lower, perhaps)?  That is, for every
current_function_decl, locate all calls to all functions tagged with a
bind clause, and then rewrite the call sites to Y instead of X?


An illustrative example:

    #pragma acc routine
    int Y()
    {
      return 2;
    }
    
    #pragma acc routine bind(Y)
    int X()
    {
      return 1;
    }
    
    int main()
    {
      int ret;
    #pragma acc parallel copyout(ret)
      ret = X();
    
      return ret;
    }

If running with ACC_DEVICE_TYPE=host, this should return 1, and if
running with ACC_DEVICE_TYPE=not_host, it should return 2.


Regards
 Thomas
