On Tue, Jul 30, 2024 at 10:51:56PM +0200, Tobias Burnus wrote:
>  static tree
> -start_partial_init_fini_fn (bool initp, unsigned priority, unsigned count)
> +start_partial_init_fini_fn (bool initp, unsigned priority, unsigned count,
> +                         bool omp_target)
>  {
> -  char id[sizeof (SSDF_IDENTIFIER) + 1 /* '\0' */ + 32];
> +  tree name;
>  
> -  /* Create the identifier for this function.  It will be of the form
> -     SSDF_IDENTIFIER_<number>.  */
> -  sprintf (id, "%s_%u", SSDF_IDENTIFIER, count);
> +  if (omp_target)
> +    {
> +      char id[sizeof (OMP_SSDF_IDENTIFIER) + 1 /* \0 */ + 32];
> +
> +      /* Create the identifier for this function.  It will be of the form
> +      OMP_SSDF_IDENTIFIER_<number>.  */
> +      sprintf (id, "%s_%u", OMP_SSDF_IDENTIFIER, count);
> +      name = get_identifier (id);
> +    }
> +  else
> +    {
> +      char id[sizeof (SSDF_IDENTIFIER) + 1 /* '\0' */ + 32];
> +      /* Create the identifier for this function.  It will be of the form
> +      SSDF_IDENTIFIER_<number>.  */
> +      sprintf (id, "%s_%u", SSDF_IDENTIFIER, count);
> +      name = get_identifier (id);

I'd just use a single buffer here,
  char id[MAX (sizeof (SSDF_IDENTIFIER), sizeof (OMP_SSDF_IDENTIFIER))
          + 1 /* \0 */ + 32];
and then
  sprintf (id, "%s_%u", omp_target ? OMP_SSDF_IDENTIFIER : SSDF_IDENTIFIER,
           count);
and do get_identifier (id) as before, just tweak the comment.

> +  tree nonhost_if_stmt = NULL_TREE;
> +  if (omp_target)
> +    {
> +      nonhost_if_stmt = begin_if_stmt ();
> +      /* We add an "omp declare target nohost" attribute, but (for
> +      now) we still get a copy of the constructor/destructor on
> +      the host.  Make sure it does nothing unless we're on the
> +      target device.  */
> +      tree fn = build_call_expr_internal_loc (input_location,
> +                                           IFN_GOMP_IS_INITIAL_DEVICE,
> +                                           integer_type_node, 0);

Given that the Xeon PHI offloading is gone and fork offloading doesn't seem
to be worked on, my preference would be
__builtin_omp_is_initial_device () and fold that to 0/1 after IPA, because
that will actually help user code too.  This
wasn't done because of Xeon PHI, because the same compiler could be used
both for host and offloading compilation (say libraries linked in weren't
compiled twice) but now that isn't the case.
The fuzzy thing is whether we can fold the builtin at compile time in the
NVPTX/GCN targets if it isn't ACCEL_COMPILER, guess it would be safer not
to.

And of course, it would be much better to figure out real nohost fix,
because if we need to register a constructor which will just do nothing, it
still wastes runtime.

> +      enum internal_fn ifn = CALL_EXPR_IFN (*expr_p);
> +      if (ifn == IFN_GOMP_IS_INITIAL_DEVICE)
> +     {
> +       /* Required to expand it in the pass_omp_device_lower pass.  */
> +       cgraph_node::get (cfun->decl)->calls_declare_variant_alt = 1;
> +       return GS_ALL_DONE;
> +     }

See above.  For the builtin, I'd actually just fold it in builtins.cc,
guaded with symtab->global_info_ready or something like that.
But if you want to fold it in omp_device_lower pass, that is fine too.

        Jakub

Reply via email to