Yangyu Chen <chenyan...@isrc.iscas.ac.cn> writes: > I recently found that target_clones functions cannot inline even when > the caller has exactly the same target. However, if we only use target > attributes in C++ and let the compiler generate IFUNC for us, the > functions with the same target will be inlined. > > For example, the following code compiled on x86-64 target with -O3 will > generate IFUNC for foo and bar and inline foo into the bar: > > ```cpp > __attribute__((target("default"))) > int foo(int *arr) { > int sum = 0; > for (int i=0;i<16;i++) sum += arr[i]; > return sum; > } > > __attribute__((target("avx2"))) > int foo(int *arr) { > int sum = 0; > for (int i=0;i<16;i++) sum += arr[i]; > return sum; > } > > __attribute__((target("default"))) > int bar(int *arr) { > return foo(arr); > } > > __attribute__((target("avx2"))) > int bar(int *arr) { > return foo(arr); > } > ``` > > However, if we use target_clones attribute, the target_clones functions > will not be inlined: > > ```cpp > __attribute__((target_clones("default","avx2"))) > int foo(int *arr) { > int sum = 0; > for (int i=0;i<16;i++) sum += arr[i]; > return sum; > } > > __attribute__((target_clones("default","avx2"))) > int bar(int *arr) { > return foo(arr); > } > ``` > > This behavior may negatively impact performance since the target_clones > functions are not inlined. And since we didn't jump to the target_clones > functions based on PLT but used the same target as the caller's target. > I think it's better to allow the target_clones functions to be inlined. > > gcc/ada/ChangeLog: > > * gcc-interface/utils.cc (handle_target_clones_attribute): > Allow functions with target_clones attribute to be inlined. > > gcc/c-family/ChangeLog: > > * c-attribs.cc (handle_target_clones_attribute): > Allow functions with target_clones attribute to be inlined. > > gcc/d/ChangeLog: > > * d-attribs.cc (d_handle_target_clones_attribute): > Allow functions with target_clones attribute to be inlined.
What I'm about to say applies to both sequences above, but: Before inlining avx2 foo into avx2 bar, don't we need to be careful about making sure that foo would still pick the avx2 version if called normally? E.g. if foo had an avx512 version, direct calls to foo would presumably pick that on avx512 targets, but still pick the avx2 version of bar. It would then seem strange for the avx2 version of bar to inline the avx2 version of foo, both for performance and ODR reasons. Thanks, Richard > > Signed-off-by: Yangyu Chen <chenyan...@isrc.iscas.ac.cn> > --- > gcc/ada/gcc-interface/utils.cc | 5 +---- > gcc/c-family/c-attribs.cc | 3 --- > gcc/d/d-attribs.cc | 5 ----- > 3 files changed, 1 insertion(+), 12 deletions(-) > > diff --git a/gcc/ada/gcc-interface/utils.cc b/gcc/ada/gcc-interface/utils.cc > index 60f36b1e50d..d010b684177 100644 > --- a/gcc/ada/gcc-interface/utils.cc > +++ b/gcc/ada/gcc-interface/utils.cc > @@ -7299,10 +7299,7 @@ handle_target_clones_attribute (tree *node, tree name, > tree ARG_UNUSED (args), > int ARG_UNUSED (flags), bool *no_add_attrs) > { > /* Ensure we have a function type. */ > - if (TREE_CODE (*node) == FUNCTION_DECL) > - /* Do not inline functions with multiple clone targets. */ > - DECL_UNINLINABLE (*node) = 1; > - else > + if (TREE_CODE (*node) != FUNCTION_DECL) > { > warning (OPT_Wattributes, "%qE attribute ignored", name); > *no_add_attrs = true; > diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc > index 4dd2eecbea5..f8759bb1908 100644 > --- a/gcc/c-family/c-attribs.cc > +++ b/gcc/c-family/c-attribs.cc > @@ -6105,9 +6105,6 @@ handle_target_clones_attribute (tree *node, tree name, > tree ARG_UNUSED (args), > "single %<target_clones%> attribute is ignored"); > *no_add_attrs = true; > } > - else > - /* Do not inline functions with multiple clone targets. */ > - DECL_UNINLINABLE (*node) = 1; > } > else > { > diff --git a/gcc/d/d-attribs.cc b/gcc/d/d-attribs.cc > index 0f7ca10e017..9f67415adb1 100644 > --- a/gcc/d/d-attribs.cc > +++ b/gcc/d/d-attribs.cc > @@ -788,11 +788,6 @@ d_handle_target_clones_attribute (tree *node, tree name, > tree, int, > warning (OPT_Wattributes, "%qE attribute ignored", name); > *no_add_attrs = true; > } > - else > - { > - /* Do not inline functions with multiple clone targets. */ > - DECL_UNINLINABLE (*node) = 1; > - } > > return NULL_TREE; > }