> On Sep 18, 2024, at 23:36, Andrew Carlotti <andrew.carlo...@arm.com> wrote:
> 
> On Wed, Sep 18, 2024 at 09:46:15AM +0100, Richard Sandiford wrote:
>> Yangyu Chen <chenyan...@isrc.iscas.ac.cn> writes:
>>> I recently found that target_clones functions cannot inline even when
>>> the caller has exactly the same target. However, if we only use target
>>> attributes in C++ and let the compiler generate IFUNC for us, the
>>> functions with the same target will be inlined.
>>> 
>>> For example, the following code compiled on x86-64 target with -O3 will
>>> generate IFUNC for foo and bar and inline foo into the bar:
>>> 
>>> ```cpp
>>> __attribute__((target("default")))
>>> int foo(int *arr) {
>>>    int sum = 0;
>>>    for (int i=0;i<16;i++) sum += arr[i];
>>>    return sum;
>>> }
>>> 
>>> __attribute__((target("avx2")))
>>> int foo(int *arr) {
>>>    int sum = 0;
>>>    for (int i=0;i<16;i++) sum += arr[i];
>>>    return sum;
>>> }
>>> 
>>> __attribute__((target("default")))
>>> int bar(int *arr) {
>>>    return foo(arr);
>>> }
>>> 
>>> __attribute__((target("avx2")))
>>> int bar(int *arr) {
>>>    return foo(arr);
>>> }
>>> ```
>>> 
>>> However, if we use target_clones attribute, the target_clones functions
>>> will not be inlined:
>>> 
>>> ```cpp
>>> __attribute__((target_clones("default","avx2")))
>>> int foo(int *arr) {
>>>    int sum = 0;
>>>    for (int i=0;i<16;i++) sum += arr[i];
>>>    return sum;
>>> }
>>> 
>>> __attribute__((target_clones("default","avx2")))
>>> int bar(int *arr) {
>>>    return foo(arr);
>>> }
>>> ```
>>> 
>>> This behavior may negatively impact performance since the target_clones
>>> functions are not inlined. And since we didn't jump to the target_clones
>>> functions based on PLT but used the same target as the caller's target.
>>> I think it's better to allow the target_clones functions to be inlined.
>>> 
>>> gcc/ada/ChangeLog:
>>> 
>>>        * gcc-interface/utils.cc (handle_target_clones_attribute):
>>>        Allow functions with target_clones attribute to be inlined.
>>> 
>>> gcc/c-family/ChangeLog:
>>> 
>>>        * c-attribs.cc (handle_target_clones_attribute):
>>>        Allow functions with target_clones attribute to be inlined.
>>> 
>>> gcc/d/ChangeLog:
>>> 
>>>        * d-attribs.cc (d_handle_target_clones_attribute):
>>>        Allow functions with target_clones attribute to be inlined.
>> 
>> What I'm about to say applies to both sequences above, but:
>> 
>> Before inlining avx2 foo into avx2 bar, don't we need to be careful about
>> making sure that foo would still pick the avx2 version if called normally?
>> E.g. if foo had an avx512 version, direct calls to foo would presumably
>> pick that on avx512 targets, but still pick the avx2 version of bar.
>> It would then seem strange for the avx2 version of bar to inline the
>> avx2 version of foo, both for performance and ODR reasons.
>> 
>> Thanks,
>> Richard
> 
> This is actually an existing bug in the indirection elimination code for the
> 'target' attribute.  AArch64's 'target_version' attribute is not affected by
> this bug because the code in redirect_to_specific_clone only checks the target
> attribute (however, AArch64 is affected by a more convoluted bug in cases 
> where
> we use both target and target_version attributes at the same time).
> 
> I think it's correct to mark functions with the target_clones attribute
> as non-inlinable by the usual inlining mechanisms, since we don't know which
> version will be called at runtime without doing further work to check this.  

I think the best case might be allowing inline when the top priority
callee function has a subset of the caller's ISA. When the callee
has some target ISA extension that does not exist in the caller's,
we call the function through PLT. However, this solution also needs
to allow the target_clones callee to be able to inline.

> If
> we can remove the indirection, then the individual function versions might 
> then
> become inlinable, but we don't have these individual versions available until
> after the target clones pass.
> 
>>> 
>>> Signed-off-by: Yangyu Chen <chenyan...@isrc.iscas.ac.cn>
>>> ---
>>> gcc/ada/gcc-interface/utils.cc | 5 +----
>>> gcc/c-family/c-attribs.cc      | 3 ---
>>> gcc/d/d-attribs.cc             | 5 -----
>>> 3 files changed, 1 insertion(+), 12 deletions(-)
>>> 
>>> diff --git a/gcc/ada/gcc-interface/utils.cc b/gcc/ada/gcc-interface/utils.cc
>>> index 60f36b1e50d..d010b684177 100644
>>> --- a/gcc/ada/gcc-interface/utils.cc
>>> +++ b/gcc/ada/gcc-interface/utils.cc
>>> @@ -7299,10 +7299,7 @@ handle_target_clones_attribute (tree *node, tree 
>>> name, tree ARG_UNUSED (args),
>>>   int ARG_UNUSED (flags), bool *no_add_attrs)
>>> {
>>>   /* Ensure we have a function type.  */
>>> -  if (TREE_CODE (*node) == FUNCTION_DECL)
>>> -    /* Do not inline functions with multiple clone targets.  */
>>> -    DECL_UNINLINABLE (*node) = 1;
>>> -  else
>>> +  if (TREE_CODE (*node) != FUNCTION_DECL)
>>>     {
>>>       warning (OPT_Wattributes, "%qE attribute ignored", name);
>>>       *no_add_attrs = true;
>>> diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
>>> index 4dd2eecbea5..f8759bb1908 100644
>>> --- a/gcc/c-family/c-attribs.cc
>>> +++ b/gcc/c-family/c-attribs.cc
>>> @@ -6105,9 +6105,6 @@ handle_target_clones_attribute (tree *node, tree 
>>> name, tree ARG_UNUSED (args),
>>>    "single %<target_clones%> attribute is ignored");
>>>   *no_add_attrs = true;
>>> }
>>> -      else
>>> -      /* Do not inline functions with multiple clone targets.  */
>>> - DECL_UNINLINABLE (*node) = 1;
>>>     }
>>>   else
>>>     {
>>> diff --git a/gcc/d/d-attribs.cc b/gcc/d/d-attribs.cc
>>> index 0f7ca10e017..9f67415adb1 100644
>>> --- a/gcc/d/d-attribs.cc
>>> +++ b/gcc/d/d-attribs.cc
>>> @@ -788,11 +788,6 @@ d_handle_target_clones_attribute (tree *node, tree 
>>> name, tree, int,
>>>       warning (OPT_Wattributes, "%qE attribute ignored", name);
>>>       *no_add_attrs = true;
>>>     }
>>> -  else
>>> -    {
>>> -      /* Do not inline functions with multiple clone targets.  */
>>> -      DECL_UNINLINABLE (*node) = 1;
>>> -    }
>>> 
>>>   return NULL_TREE;
>>> }


Reply via email to