> On Sep 18, 2024, at 16:46, Richard Sandiford <richard.sandif...@arm.com>
> wrote:
>
> Yangyu Chen <chenyan...@isrc.iscas.ac.cn> writes:
>> I recently found that target_clones functions cannot inline even when
>> the caller has exactly the same target. However, if we only use target
>> attributes in C++ and let the compiler generate IFUNC for us, the
>> functions with the same target will be inlined.
>>
>> For example, the following code compiled on x86-64 target with -O3 will
>> generate IFUNC for foo and bar and inline foo into the bar:
>>
>> ```cpp
>> __attribute__((target("default")))
>> int foo(int *arr) {
>> int sum = 0;
>> for (int i=0;i<16;i++) sum += arr[i];
>> return sum;
>> }
>>
>> __attribute__((target("avx2")))
>> int foo(int *arr) {
>> int sum = 0;
>> for (int i=0;i<16;i++) sum += arr[i];
>> return sum;
>> }
>>
>> __attribute__((target("default")))
>> int bar(int *arr) {
>> return foo(arr);
>> }
>>
>> __attribute__((target("avx2")))
>> int bar(int *arr) {
>> return foo(arr);
>> }
>> ```
>>
>> However, if we use target_clones attribute, the target_clones functions
>> will not be inlined:
>>
>> ```cpp
>> __attribute__((target_clones("default","avx2")))
>> int foo(int *arr) {
>> int sum = 0;
>> for (int i=0;i<16;i++) sum += arr[i];
>> return sum;
>> }
>>
>> __attribute__((target_clones("default","avx2")))
>> int bar(int *arr) {
>> return foo(arr);
>> }
>> ```
>>
>> This behavior may negatively impact performance since the target_clones
>> functions are not inlined. And since we didn't jump to the target_clones
>> functions based on PLT but used the same target as the caller's target.
>> I think it's better to allow the target_clones functions to be inlined.
>>
>> gcc/ada/ChangeLog:
>>
>> * gcc-interface/utils.cc (handle_target_clones_attribute):
>> Allow functions with target_clones attribute to be inlined.
>>
>> gcc/c-family/ChangeLog:
>>
>> * c-attribs.cc (handle_target_clones_attribute):
>> Allow functions with target_clones attribute to be inlined.
>>
>> gcc/d/ChangeLog:
>>
>> * d-attribs.cc (d_handle_target_clones_attribute):
>> Allow functions with target_clones attribute to be inlined.
>
> What I'm about to say applies to both sequences above, but:
>
> Before inlining avx2 foo into avx2 bar, don't we need to be careful about
> making sure that foo would still pick the avx2 version if called normally?
> E.g. if foo had an avx512 version, direct calls to foo would presumably
> pick that on avx512 targets, but still pick the avx2 version of bar.
> It would then seem strange for the avx2 version of bar to inline the
> avx2 version of foo, both for performance and ODR reasons.
>
Yes. For now, the GCC will emit code to call the avx2 callee directly,
which will be worse than inlining. This should be fixed in the
future.
> Thanks,
> Richard
>
>>
>> Signed-off-by: Yangyu Chen <chenyan...@isrc.iscas.ac.cn>
>> ---
>> gcc/ada/gcc-interface/utils.cc | 5 +----
>> gcc/c-family/c-attribs.cc | 3 ---
>> gcc/d/d-attribs.cc | 5 -----
>> 3 files changed, 1 insertion(+), 12 deletions(-)
>>
>> diff --git a/gcc/ada/gcc-interface/utils.cc b/gcc/ada/gcc-interface/utils.cc
>> index 60f36b1e50d..d010b684177 100644
>> --- a/gcc/ada/gcc-interface/utils.cc
>> +++ b/gcc/ada/gcc-interface/utils.cc
>> @@ -7299,10 +7299,7 @@ handle_target_clones_attribute (tree *node, tree
>> name, tree ARG_UNUSED (args),
>> int ARG_UNUSED (flags), bool *no_add_attrs)
>> {
>> /* Ensure we have a function type. */
>> - if (TREE_CODE (*node) == FUNCTION_DECL)
>> - /* Do not inline functions with multiple clone targets. */
>> - DECL_UNINLINABLE (*node) = 1;
>> - else
>> + if (TREE_CODE (*node) != FUNCTION_DECL)
>> {
>> warning (OPT_Wattributes, "%qE attribute ignored", name);
>> *no_add_attrs = true;
>> diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
>> index 4dd2eecbea5..f8759bb1908 100644
>> --- a/gcc/c-family/c-attribs.cc
>> +++ b/gcc/c-family/c-attribs.cc
>> @@ -6105,9 +6105,6 @@ handle_target_clones_attribute (tree *node, tree name,
>> tree ARG_UNUSED (args),
>> "single %<target_clones%> attribute is ignored");
>> *no_add_attrs = true;
>> }
>> - else
>> - /* Do not inline functions with multiple clone targets. */
>> - DECL_UNINLINABLE (*node) = 1;
>> }
>> else
>> {
>> diff --git a/gcc/d/d-attribs.cc b/gcc/d/d-attribs.cc
>> index 0f7ca10e017..9f67415adb1 100644
>> --- a/gcc/d/d-attribs.cc
>> +++ b/gcc/d/d-attribs.cc
>> @@ -788,11 +788,6 @@ d_handle_target_clones_attribute (tree *node, tree
>> name, tree, int,
>> warning (OPT_Wattributes, "%qE attribute ignored", name);
>> *no_add_attrs = true;
>> }
>> - else
>> - {
>> - /* Do not inline functions with multiple clone targets. */
>> - DECL_UNINLINABLE (*node) = 1;
>> - }
>>
>> return NULL_TREE;
>> }