https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120308
Bug ID: 120308
Summary: 'TYPE_EMPTY_P' vs. code offloading
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Keywords: ABI, openacc, openmp, wrong-code
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: tschwinge at gcc dot gnu.org
CC: ams at gcc dot gnu.org, jakub at gcc dot gnu.org, rguenth at gcc dot gnu.org, vries at gcc dot gnu.org
Target Milestone: ---
Target: nvptx

..., and here's another host vs. offload target compatibility issue.

We've got 'gcc/stor-layout.cc:finalize_type_size':

    /* Handle empty records as per the x86-64 psABI.  */
    TYPE_EMPTY_P (type) = targetm.calls.empty_record_p (type);

(Indeed x86_64 is still the only target to define 'TARGET_EMPTY_RECORD_P', calling 'gcc/tree.cc:default_is_empty_record'.)

And so it happens that for an empty struct used in code offloaded from x86_64 host (but not powerpc64le host, for example), we get to see 'TYPE_EMPTY_P' in offloading compilation (where the offload targets (currently?) don't use it themselves, and therefore aren't prepared to handle it).

For nvptx offloading compilation, this causes wrong code generation: 'ptxas [...] error : Call has wrong number of parameters', as nvptx code generation for function definition doesn't pay attention to this flag (say, in 'gcc/config/nvptx/nvptx.cc:pass_in_memory', or wherever else would be appropriate to handle that), but the generic code 'gcc/calls.cc:expand_call' via 'gcc/function.cc:aggregate_value_p' does pay attention to it, and we thus get mismatching function definition vs. function call.

I'd appreciate your insights into how best to address this.

Should we stream 'TYPE_EMPTY_P' only 'if (!lto_stream_offload_p)', and instead in offload stream-in set 'TYPE_EMPTY_P' to 'false', and/or manually re-initialize it for the respective offload target by calling 'targetm.calls.empty_record_p (type)' (once we have the complete 'type' reconstructed)?
Is that feasible, or problematic, as the host may already have made decisions that rely on the 'TYPE_EMPTY_P' flag? (I've not checked, but assume that 'gcc/stor-layout.cc:finalize_type_size' isn't getting called during offload stream-in, as otherwise that'd reset 'TYPE_EMPTY_P' as not supported for the offload targets, which evidently isn't happening.)

Otherwise, should we implement 'TYPE_EMPTY_P' handling in the nvptx back end? (If yes, would I directly check 'TYPE_EMPTY_P', or use any "accessor functions" like 'gcc/function.cc:aggregate_value_p', or 'gcc/calls.cc:must_pass_in_stack_var_size_or_pad' (as used by default for 'TARGET_MUST_PASS_IN_STACK'), etc.? I'm confused about the exact semantics of all these...) This code path would then be used only for x86_64 host offloading compilation, and we would therefore get different nvptx code generation for offloading from x86_64 host ('TYPE_EMPTY_P') vs. offloading from powerpc64le host (not 'TYPE_EMPTY_P') or nvptx target (not 'TYPE_EMPTY_P'). (That may be slightly confusing, but not an actual problem, I suppose.)

Or, implement full 'TARGET_EMPTY_RECORD_P' in the nvptx back end -- and that way then get different nvptx code generation for nvptx target ('TYPE_EMPTY_P') or offloading from x86_64 host ('TYPE_EMPTY_P') vs. offloading from powerpc64le host (not 'TYPE_EMPTY_P'). (Again, that may be slightly confusing, but not an actual problem, I suppose.)

..., or go all-in, and implement both 'TARGET_EMPTY_RECORD_P' in the nvptx back end *and* manually re-initialize 'TYPE_EMPTY_P' during offload stream-in (if that's feasible; see question above), and that way get the same 'TYPE_EMPTY_P' code generation for all of nvptx target and x86_64 as well as powerpc64le host?

This issue apparently isn't a problem for GCN offloading, but I don't know if that's by design or by accident. I've not checked if GCN target (not 'TYPE_EMPTY_P') has diverging code generation from code offloading from x86_64 host ('TYPE_EMPTY_P').
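
For illustration only, here is a minimal sketch of the kind of code the description above is about: an empty struct passed to and returned from a function that is called inside an offloaded region. This is a hypothetical test case, not the one from the report. The names 'Empty', 'pass_through' and 'run_offloaded' are made up, and I have not verified that it triggers the 'ptxas' error. Without '-fopenmp' the pragmas are ignored and everything runs on the host.

```cpp
// Hypothetical test case (assumption: not taken from the bug report).
// An empty record is passed and returned by value across a real call
// inside an OpenMP 'target' region.

struct Empty {};  // no members; such a type is a candidate for 'TYPE_EMPTY_P'

#pragma omp declare target
// 'noinline' keeps an actual call in the offloaded code, so that the
// function definition and the call site are both emitted.
__attribute__((noinline))
Empty pass_through(Empty e, int *counter)
{
  ++*counter;
  return e;
}
#pragma omp end declare target

// Returns 42 if the call worked, on the host or on the offload device.
int run_offloaded()
{
  int counter = 41;
#pragma omp target map(tofrom: counter)
  {
    Empty e{};
    e = pass_through(e, &counter);
    (void) e;
  }
  return counter;
}
```

To try this with offloading, one would call 'run_offloaded ()' from 'main' and build with something like 'g++ -fopenmp -foffload=nvptx-none' on an x86_64 host; whether the mismatch actually shows up may depend on optimization level and GCC version.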