https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120308

            Bug ID: 120308
           Summary: 'TYPE_EMPTY_P' vs. code offloading
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Keywords: ABI, openacc, openmp, wrong-code
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tschwinge at gcc dot gnu.org
                CC: ams at gcc dot gnu.org, jakub at gcc dot gnu.org,
                    rguenth at gcc dot gnu.org, vries at gcc dot gnu.org
  Target Milestone: ---
            Target: nvptx

..., and here's another host vs. offload target compatibility issue.

We've got 'gcc/stor-layout.cc:finalize_type_size':

    /* Handle empty records as per the x86-64 psABI.  */
    TYPE_EMPTY_P (type) = targetm.calls.empty_record_p (type);

(Indeed x86_64 is still the only target to define 'TARGET_EMPTY_RECORD_P',
calling 'gcc/tree.cc:default_is_empty_record'.)

And so it happens that for an empty struct used in code offloaded from x86_64
host (but not powerpc64le host, for example), we get to see 'TYPE_EMPTY_P' in
offloading compilation (where the offload targets (currently?) don't use it
themselves, and therefore aren't prepared to handle it).

For nvptx offloading compilation, this causes wrong code generation: 'ptxas
[...] error   : Call has wrong number of parameters', as nvptx code generation
for function definition doesn't pay attention to this flag (say, in
'gcc/config/nvptx/nvptx.cc:pass_in_memory', or wherever else would be
appropriate to handle that), but the generic code 'gcc/calls.cc:expand_call'
via 'gcc/function.cc:aggregate_value_p' does pay attention to it, and we thus
get mismatching function definition vs. function call.
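
For illustration, here is a minimal sketch of the kind of code that might
exercise this path: an empty class passed by value to a function called from
an OpenMP 'target' region.  The names 'Empty', 'add_one', and 'run_offloaded'
are hypothetical and not taken from an actual test case, and I have not
verified that this exact snippet triggers the 'ptxas' error.  Without
'-fopenmp', the pragmas are ignored and the code simply runs on the host.

```cpp
// Hypothetical reproducer sketch; all names are illustrative only.
#include <type_traits>

// A class with no non-static data members, i.e. a candidate for
// 'TYPE_EMPTY_P' when compiled for an x86_64 host.
struct Empty {};

static_assert(std::is_empty<Empty>::value, "Empty should be an empty class");

#pragma omp declare target
// Kept out of line on purpose, so that a real call taking an 'Empty'
// argument has a chance to survive into offloading compilation.
__attribute__((noinline)) int add_one(Empty, int x)
{
  return x + 1;
}
#pragma omp end declare target

// Calls 'add_one' from inside a target region and returns its result.
int run_offloaded(int x)
{
  int result = -1;
#pragma omp target map(tofrom: result)
  {
    result = add_one(Empty{}, x);
  }
  return result;
}
```

Calling 'run_offloaded (41)' from 'main' should return 42 wherever the call
and the definition of 'add_one' agree on the number of parameters.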

I'd appreciate your insights into how to best address this?

Should we stream 'TYPE_EMPTY_P' only 'if (!lto_stream_offload_p)', and instead
in offload stream-in set 'TYPE_EMPTY_P' to 'false', and/or manually
re-initialize it for the respective offload target by calling
'targetm.calls.empty_record_p (type)' (once we have the complete 'type'
reconstructed)?  Is that feasible, or problematic, as the host may already have
made any decisions that rely on the 'TYPE_EMPTY_P' flag?
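
As a toy model of the re-initialization idea (this is not GCC code: 'ToyType',
'EmptyRecordHook', and all functions below are hypothetical stand-ins, with
'empty_p' playing the role of 'TYPE_EMPTY_P'), the flag computed by the host's
hook would be discarded at offload stream-in and recomputed with the offload
target's own hook:

```cpp
// Toy model only; none of these types or functions exist in GCC.

struct ToyType {
  bool is_record;    // is this a struct/class type?
  int field_count;   // number of data members
  bool empty_p;      // stands in for 'TYPE_EMPTY_P'
};

// Stands in for 'targetm.calls.empty_record_p'.
using EmptyRecordHook = bool (*)(const ToyType &);

// A host that treats records without data members as empty.
bool x86_64_like_hook(const ToyType &t)
{
  return t.is_record && t.field_count == 0;
}

// An offload target that does not define the hook: nothing is empty.
bool no_empty_records_hook(const ToyType &)
{
  return false;
}

// Host side: the flag is set from the host's hook when the type is laid out.
ToyType finalize_on_host(ToyType t, EmptyRecordHook host_hook)
{
  t.empty_p = host_hook(t);
  return t;
}

// Offload side: ignore the streamed flag and recompute it with the offload
// target's hook once the complete type has been reconstructed.
ToyType stream_in_for_offload(ToyType streamed, EmptyRecordHook offload_hook)
{
  streamed.empty_p = offload_hook(streamed);
  return streamed;
}
```

In this model, a type finalized with 'x86_64_like_hook' has 'empty_p' set, and
after 'stream_in_for_offload' with 'no_empty_records_hook' the flag is clear
again.  The model does not capture the open question above, namely whether
the host has already made decisions based on the original flag.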

(I've not checked, but assume that 'gcc/stor-layout.cc:finalize_type_size'
isn't getting called during offload stream-in, as otherwise that'd reset
'TYPE_EMPTY_P' as not supported for the offload targets, which evidently isn't
happening.)

Otherwise, should we implement 'TYPE_EMPTY_P' handling in the nvptx back end? 
(If yes, would I directly check 'TYPE_EMPTY_P', or use any "accessor functions"
like 'gcc/function.cc:aggregate_value_p', or
'gcc/calls.cc:must_pass_in_stack_var_size_or_pad' (as used by default for
'TARGET_MUST_PASS_IN_STACK'), etc.?  I'm confused on the exact semantics of all
these...)

This code path would then be used only for x86_64 host offloading compilation,
and therefore get different nvptx code generation for offloading from x86_64
host ('TYPE_EMPTY_P') vs. offloading from powerpc64le host (not 'TYPE_EMPTY_P')
or nvptx target (not 'TYPE_EMPTY_P').  (That may be slightly confusing, but not
an actual problem, I suppose.)

Or, implement full 'TARGET_EMPTY_RECORD_P' in the nvptx back end -- and that
way then get different nvptx code generation for nvptx target ('TYPE_EMPTY_P')
or offloading from x86_64 host ('TYPE_EMPTY_P') vs. offloading from powerpc64le
host (not 'TYPE_EMPTY_P').  (Again, that may be slightly confusing, but not an
actual problem, I suppose.)

..., or go all-in, and implement both 'TARGET_EMPTY_RECORD_P' in the nvptx back
end *and* manually re-initialize 'TYPE_EMPTY_P' during offload stream-in (if
that's feasible; see question above), and that way get the same 'TYPE_EMPTY_P'
code generation for all of nvptx target and x86_64 as well as powerpc64le host?


This issue apparently isn't a problem for GCN offloading, but I don't know if
that's by design or by accident.

I've not checked if GCN target (not 'TYPE_EMPTY_P') has diverging code
generation from code offloading from x86_64 host ('TYPE_EMPTY_P').
