On Wed, Sep 18, 2024 at 8:41 PM Andrew Carlotti <andrew.carlo...@arm.com> wrote:
>
> On Thu, Sep 19, 2024 at 01:01:39AM +0800, Yangyu Chen wrote:
> >
> >
> > > On Sep 18, 2024, at 23:36, Andrew Carlotti <andrew.carlo...@arm.com> 
> > > wrote:
> > >
> > > On Wed, Sep 18, 2024 at 09:46:15AM +0100, Richard Sandiford wrote:
> > >> Yangyu Chen <chenyan...@isrc.iscas.ac.cn> writes:
> > >>> I recently found that target_clones functions cannot inline even when
> > >>> the caller has exactly the same target. However, if we only use target
> > >>> attributes in C++ and let the compiler generate IFUNC for us, the
> > >>> functions with the same target will be inlined.
> > >>>
> > >>> For example, the following code compiled on x86-64 target with -O3 will
> > >>> generate IFUNC for foo and bar and inline foo into the bar:
> > >>>
> > >>> ```cpp
> > >>> __attribute__((target("default")))
> > >>> int foo(int *arr) {
> > >>>    int sum = 0;
> > >>>    for (int i=0;i<16;i++) sum += arr[i];
> > >>>    return sum;
> > >>> }
> > >>>
> > >>> __attribute__((target("avx2")))
> > >>> int foo(int *arr) {
> > >>>    int sum = 0;
> > >>>    for (int i=0;i<16;i++) sum += arr[i];
> > >>>    return sum;
> > >>> }
> > >>>
> > >>> __attribute__((target("default")))
> > >>> int bar(int *arr) {
> > >>>    return foo(arr);
> > >>> }
> > >>>
> > >>> __attribute__((target("avx2")))
> > >>> int bar(int *arr) {
> > >>>    return foo(arr);
> > >>> }
> > >>> ```
> > >>>
> > >>> However, if we use target_clones attribute, the target_clones functions
> > >>> will not be inlined:
> > >>>
> > >>> ```cpp
> > >>> __attribute__((target_clones("default","avx2")))
> > >>> int foo(int *arr) {
> > >>>    int sum = 0;
> > >>>    for (int i=0;i<16;i++) sum += arr[i];
> > >>>    return sum;
> > >>> }
> > >>>
> > >>> __attribute__((target_clones("default","avx2")))
> > >>> int bar(int *arr) {
> > >>>    return foo(arr);
> > >>> }
> > >>> ```
> > >>>
> > >>> This behavior may negatively impact performance since the target_clones
> > >>> functions are not inlined. And since we didn't jump to the target_clones
> > >>> functions based on PLT but used the same target as the caller's target.
> > >>> I think it's better to allow the target_clones functions to be inlined.
> > >>>
> > >>> gcc/ada/ChangeLog:
> > >>>
> > >>>        * gcc-interface/utils.cc (handle_target_clones_attribute):
> > >>>        Allow functions with target_clones attribute to be inlined.
> > >>>
> > >>> gcc/c-family/ChangeLog:
> > >>>
> > >>>        * c-attribs.cc (handle_target_clones_attribute):
> > >>>        Allow functions with target_clones attribute to be inlined.
> > >>>
> > >>> gcc/d/ChangeLog:
> > >>>
> > >>>        * d-attribs.cc (d_handle_target_clones_attribute):
> > >>>        Allow functions with target_clones attribute to be inlined.
> > >>
> > >> What I'm about to say applies to both sequences above, but:
> > >>
> > >> Before inlining avx2 foo into avx2 bar, don't we need to be careful about
> > >> making sure that foo would still pick the avx2 version if called 
> > >> normally?
> > >> E.g. if foo had an avx512 version, direct calls to foo would presumably
> > >> pick that on avx512 targets, but still pick the avx2 version of bar.
> > >> It would then seem strange for the avx2 version of bar to inline the
> > >> avx2 version of foo, both for performance and ODR reasons.
> > >>
> > >> Thanks,
> > >> Richard
> > >
> > > This is actually an existing bug in the indirection elimination code for 
> > > the
> > > 'target' attribute.  AArch64's 'target_version' attribute is not affected 
> > > by
> > > this bug because the code in redirect_to_specific_clone only checks the 
> > > target
> > > attribute (however, AArch64 is affected by a more convoluted bug in cases 
> > > where
> > > we use both target and target_version attributes at the same time).
> > >
> > > I think it's correct to mark functions with the target_clones attribute
> > > as non-inlinable by the usual inlining mechanisms, since we don't know 
> > > which
> > > version will be called at runtime without doing further work to check 
> > > this.
> >
> > I think the best case might be allowing inline when the top priority
> > callee function has a subset of the caller's ISA. When the callee
> > has some target ISA extension that does not exist in the caller's,
> > we call the function through PLT. However, this solution also needs
> > to allow the target_clones callee to be able to inline.
>
> If we redirect to a specific version in the target_clone pass, then presumably
> that means that it's safe to inline that version?

I would say so.  I think marking the resolver decl as not inlineable
is correct but
the individual target clones should not be marked that way (are they?)

>  I'm not sure whether the
> existing code in redirect_to_specific_clone checks whether this is safe before
> removing the indirection, which might be a bug in itself.

I think the code should make the call inlineable already, unless the functions
are marked as not inlineable - you'd need to debug what happens.

RIchard.

>
> If we know it's safe to inline, then we could mark the function as inlineable
> in redirect_to_specific_clone, which would allow the later ipa_inline pass
> toinline the function.
>
> > > If
> > > we can remove the indirection, then the individual function versions 
> > > might then
> > > become inlinable, but we don't have these individual versions available 
> > > until
> > > after the target clones pass.
> > >
> > >>>
> > >>> Signed-off-by: Yangyu Chen <chenyan...@isrc.iscas.ac.cn>
> > >>> ---
> > >>> gcc/ada/gcc-interface/utils.cc | 5 +----
> > >>> gcc/c-family/c-attribs.cc      | 3 ---
> > >>> gcc/d/d-attribs.cc             | 5 -----
> > >>> 3 files changed, 1 insertion(+), 12 deletions(-)
> > >>>
> > >>> diff --git a/gcc/ada/gcc-interface/utils.cc 
> > >>> b/gcc/ada/gcc-interface/utils.cc
> > >>> index 60f36b1e50d..d010b684177 100644
> > >>> --- a/gcc/ada/gcc-interface/utils.cc
> > >>> +++ b/gcc/ada/gcc-interface/utils.cc
> > >>> @@ -7299,10 +7299,7 @@ handle_target_clones_attribute (tree *node, tree 
> > >>> name, tree ARG_UNUSED (args),
> > >>>   int ARG_UNUSED (flags), bool *no_add_attrs)
> > >>> {
> > >>>   /* Ensure we have a function type.  */
> > >>> -  if (TREE_CODE (*node) == FUNCTION_DECL)
> > >>> -    /* Do not inline functions with multiple clone targets.  */
> > >>> -    DECL_UNINLINABLE (*node) = 1;
> > >>> -  else
> > >>> +  if (TREE_CODE (*node) != FUNCTION_DECL)
> > >>>     {
> > >>>       warning (OPT_Wattributes, "%qE attribute ignored", name);
> > >>>       *no_add_attrs = true;
> > >>> diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
> > >>> index 4dd2eecbea5..f8759bb1908 100644
> > >>> --- a/gcc/c-family/c-attribs.cc
> > >>> +++ b/gcc/c-family/c-attribs.cc
> > >>> @@ -6105,9 +6105,6 @@ handle_target_clones_attribute (tree *node, tree 
> > >>> name, tree ARG_UNUSED (args),
> > >>>    "single %<target_clones%> attribute is ignored");
> > >>>   *no_add_attrs = true;
> > >>> }
> > >>> -      else
> > >>> -      /* Do not inline functions with multiple clone targets.  */
> > >>> - DECL_UNINLINABLE (*node) = 1;
> > >>>     }
> > >>>   else
> > >>>     {
> > >>> diff --git a/gcc/d/d-attribs.cc b/gcc/d/d-attribs.cc
> > >>> index 0f7ca10e017..9f67415adb1 100644
> > >>> --- a/gcc/d/d-attribs.cc
> > >>> +++ b/gcc/d/d-attribs.cc
> > >>> @@ -788,11 +788,6 @@ d_handle_target_clones_attribute (tree *node, tree 
> > >>> name, tree, int,
> > >>>       warning (OPT_Wattributes, "%qE attribute ignored", name);
> > >>>       *no_add_attrs = true;
> > >>>     }
> > >>> -  else
> > >>> -    {
> > >>> -      /* Do not inline functions with multiple clone targets.  */
> > >>> -      DECL_UNINLINABLE (*node) = 1;
> > >>> -    }
> > >>>
> > >>>   return NULL_TREE;
> > >>> }
> >
> >

Reply via email to