> On Sep 18, 2024, at 16:46, Richard Sandiford <richard.sandif...@arm.com> 
> wrote:
> 
> Yangyu Chen <chenyan...@isrc.iscas.ac.cn> writes:
>> I recently found that target_clones functions cannot inline even when
>> the caller has exactly the same target. However, if we only use target
>> attributes in C++ and let the compiler generate IFUNC for us, the
>> functions with the same target will be inlined.
>> 
>> For example, the following code compiled on x86-64 target with -O3 will
>> generate IFUNC for foo and bar and inline foo into the bar:
>> 
>> ```cpp
>> __attribute__((target("default")))
>> int foo(int *arr) {
>>    int sum = 0;
>>    for (int i=0;i<16;i++) sum += arr[i];
>>    return sum;
>> }
>> 
>> __attribute__((target("avx2")))
>> int foo(int *arr) {
>>    int sum = 0;
>>    for (int i=0;i<16;i++) sum += arr[i];
>>    return sum;
>> }
>> 
>> __attribute__((target("default")))
>> int bar(int *arr) {
>>    return foo(arr);
>> }
>> 
>> __attribute__((target("avx2")))
>> int bar(int *arr) {
>>    return foo(arr);
>> }
>> ```
>> 
>> However, if we use target_clones attribute, the target_clones functions
>> will not be inlined:
>> 
>> ```cpp
>> __attribute__((target_clones("default","avx2")))
>> int foo(int *arr) {
>>    int sum = 0;
>>    for (int i=0;i<16;i++) sum += arr[i];
>>    return sum;
>> }
>> 
>> __attribute__((target_clones("default","avx2")))
>> int bar(int *arr) {
>>    return foo(arr);
>> }
>> ```
>> 
>> This behavior may negatively impact performance since the target_clones
>> functions are not inlined. And since we didn't jump to the target_clones
>> functions based on PLT but used the same target as the caller's target.
>> I think it's better to allow the target_clones functions to be inlined.
>> 
>> gcc/ada/ChangeLog:
>> 
>>        * gcc-interface/utils.cc (handle_target_clones_attribute):
>>        Allow functions with target_clones attribute to be inlined.
>> 
>> gcc/c-family/ChangeLog:
>> 
>>        * c-attribs.cc (handle_target_clones_attribute):
>>        Allow functions with target_clones attribute to be inlined.
>> 
>> gcc/d/ChangeLog:
>> 
>>        * d-attribs.cc (d_handle_target_clones_attribute):
>>        Allow functions with target_clones attribute to be inlined.
> 
> What I'm about to say applies to both sequences above, but:
> 
> Before inlining avx2 foo into avx2 bar, don't we need to be careful about
> making sure that foo would still pick the avx2 version if called normally?
> E.g. if foo had an avx512 version, direct calls to foo would presumably
> pick that on avx512 targets, but still pick the avx2 version of bar.
> It would then seem strange for the avx2 version of bar to inline the
> avx2 version of foo, both for performance and ODR reasons.
> 

Yes. For now, the GCC will emit code to call the avx2 callee directly,
which will be worse than inlining. This should be fixed in the
future.

> Thanks,
> Richard
> 
>> 
>> Signed-off-by: Yangyu Chen <chenyan...@isrc.iscas.ac.cn>
>> ---
>> gcc/ada/gcc-interface/utils.cc | 5 +----
>> gcc/c-family/c-attribs.cc      | 3 ---
>> gcc/d/d-attribs.cc             | 5 -----
>> 3 files changed, 1 insertion(+), 12 deletions(-)
>> 
>> diff --git a/gcc/ada/gcc-interface/utils.cc b/gcc/ada/gcc-interface/utils.cc
>> index 60f36b1e50d..d010b684177 100644
>> --- a/gcc/ada/gcc-interface/utils.cc
>> +++ b/gcc/ada/gcc-interface/utils.cc
>> @@ -7299,10 +7299,7 @@ handle_target_clones_attribute (tree *node, tree 
>> name, tree ARG_UNUSED (args),
>>   int ARG_UNUSED (flags), bool *no_add_attrs)
>> {
>>   /* Ensure we have a function type.  */
>> -  if (TREE_CODE (*node) == FUNCTION_DECL)
>> -    /* Do not inline functions with multiple clone targets.  */
>> -    DECL_UNINLINABLE (*node) = 1;
>> -  else
>> +  if (TREE_CODE (*node) != FUNCTION_DECL)
>>     {
>>       warning (OPT_Wattributes, "%qE attribute ignored", name);
>>       *no_add_attrs = true;
>> diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
>> index 4dd2eecbea5..f8759bb1908 100644
>> --- a/gcc/c-family/c-attribs.cc
>> +++ b/gcc/c-family/c-attribs.cc
>> @@ -6105,9 +6105,6 @@ handle_target_clones_attribute (tree *node, tree name, 
>> tree ARG_UNUSED (args),
>>    "single %<target_clones%> attribute is ignored");
>>   *no_add_attrs = true;
>> }
>> -      else
>> -      /* Do not inline functions with multiple clone targets.  */
>> - DECL_UNINLINABLE (*node) = 1;
>>     }
>>   else
>>     {
>> diff --git a/gcc/d/d-attribs.cc b/gcc/d/d-attribs.cc
>> index 0f7ca10e017..9f67415adb1 100644
>> --- a/gcc/d/d-attribs.cc
>> +++ b/gcc/d/d-attribs.cc
>> @@ -788,11 +788,6 @@ d_handle_target_clones_attribute (tree *node, tree 
>> name, tree, int,
>>       warning (OPT_Wattributes, "%qE attribute ignored", name);
>>       *no_add_attrs = true;
>>     }
>> -  else
>> -    {
>> -      /* Do not inline functions with multiple clone targets.  */
>> -      DECL_UNINLINABLE (*node) = 1;
>> -    }
>> 
>>   return NULL_TREE;
>> }


Reply via email to