On Wed, Sep 18, 2024 at 8:41 PM Andrew Carlotti <andrew.carlo...@arm.com> wrote: > > On Thu, Sep 19, 2024 at 01:01:39AM +0800, Yangyu Chen wrote: > > > > > > > On Sep 18, 2024, at 23:36, Andrew Carlotti <andrew.carlo...@arm.com> > > > wrote: > > > > > > On Wed, Sep 18, 2024 at 09:46:15AM +0100, Richard Sandiford wrote: > > >> Yangyu Chen <chenyan...@isrc.iscas.ac.cn> writes: > > >>> I recently found that target_clones functions cannot inline even when > > >>> the caller has exactly the same target. However, if we only use target > > >>> attributes in C++ and let the compiler generate IFUNC for us, the > > >>> functions with the same target will be inlined. > > >>> > > >>> For example, the following code compiled on x86-64 target with -O3 will > > >>> generate IFUNC for foo and bar and inline foo into the bar: > > >>> > > >>> ```cpp > > >>> __attribute__((target("default"))) > > >>> int foo(int *arr) { > > >>> int sum = 0; > > >>> for (int i=0;i<16;i++) sum += arr[i]; > > >>> return sum; > > >>> } > > >>> > > >>> __attribute__((target("avx2"))) > > >>> int foo(int *arr) { > > >>> int sum = 0; > > >>> for (int i=0;i<16;i++) sum += arr[i]; > > >>> return sum; > > >>> } > > >>> > > >>> __attribute__((target("default"))) > > >>> int bar(int *arr) { > > >>> return foo(arr); > > >>> } > > >>> > > >>> __attribute__((target("avx2"))) > > >>> int bar(int *arr) { > > >>> return foo(arr); > > >>> } > > >>> ``` > > >>> > > >>> However, if we use target_clones attribute, the target_clones functions > > >>> will not be inlined: > > >>> > > >>> ```cpp > > >>> __attribute__((target_clones("default","avx2"))) > > >>> int foo(int *arr) { > > >>> int sum = 0; > > >>> for (int i=0;i<16;i++) sum += arr[i]; > > >>> return sum; > > >>> } > > >>> > > >>> __attribute__((target_clones("default","avx2"))) > > >>> int bar(int *arr) { > > >>> return foo(arr); > > >>> } > > >>> ``` > > >>> > > >>> This behavior may negatively impact performance since the target_clones > > >>> functions are not inlined. And since we didn't jump to the target_clones > > >>> functions based on PLT but used the same target as the caller's target. > > >>> I think it's better to allow the target_clones functions to be inlined. > > >>> > > >>> gcc/ada/ChangeLog: > > >>> > > >>> * gcc-interface/utils.cc (handle_target_clones_attribute): > > >>> Allow functions with target_clones attribute to be inlined. > > >>> > > >>> gcc/c-family/ChangeLog: > > >>> > > >>> * c-attribs.cc (handle_target_clones_attribute): > > >>> Allow functions with target_clones attribute to be inlined. > > >>> > > >>> gcc/d/ChangeLog: > > >>> > > >>> * d-attribs.cc (d_handle_target_clones_attribute): > > >>> Allow functions with target_clones attribute to be inlined. > > >> > > >> What I'm about to say applies to both sequences above, but: > > >> > > >> Before inlining avx2 foo into avx2 bar, don't we need to be careful about > > >> making sure that foo would still pick the avx2 version if called > > >> normally? > > >> E.g. if foo had an avx512 version, direct calls to foo would presumably > > >> pick that on avx512 targets, but still pick the avx2 version of bar. > > >> It would then seem strange for the avx2 version of bar to inline the > > >> avx2 version of foo, both for performance and ODR reasons. > > >> > > >> Thanks, > > >> Richard > > > > > > This is actually an existing bug in the indirection elimination code for > > > the > > > 'target' attribute. AArch64's 'target_version' attribute is not affected > > > by > > > this bug because the code in redirect_to_specific_clone only checks the > > > target > > > attribute (however, AArch64 is affected by a more convoluted bug in cases > > > where > > > we use both target and target_version attributes at the same time). > > > > > > I think it's correct to mark functions with the target_clones attribute > > > as non-inlinable by the usual inlining mechanisms, since we don't know > > > which > > > version will be called at runtime without doing further work to check > > > this. > > > > I think the best case might be allowing inline when the top priority > > callee function has a subset of the caller's ISA. When the callee > > has some target ISA extension that does not exist in the caller's, > > we call the function through PLT. However, this solution also needs > > to allow the target_clones callee to be able to inline. > > If we redirect to a specific version in the target_clone pass, then presumably > that means that it's safe to inline that version?
I would say so. I think marking the resolver decl as not inlineable is correct but the individual target clones should not be marked that way (are they?) > I'm not sure whether the > existing code in redirect_to_specific_clone checks whether this is safe before > removing the indirection, which might be a bug in itself. I think the code should make the call inlineable already, unless the functions are marked as not inlineable - you'd need to debug what happens. RIchard. > > If we know it's safe to inline, then we could mark the function as inlineable > in redirect_to_specific_clone, which would allow the later ipa_inline pass > toinline the function. > > > > If > > > we can remove the indirection, then the individual function versions > > > might then > > > become inlinable, but we don't have these individual versions available > > > until > > > after the target clones pass. > > > > > >>> > > >>> Signed-off-by: Yangyu Chen <chenyan...@isrc.iscas.ac.cn> > > >>> --- > > >>> gcc/ada/gcc-interface/utils.cc | 5 +---- > > >>> gcc/c-family/c-attribs.cc | 3 --- > > >>> gcc/d/d-attribs.cc | 5 ----- > > >>> 3 files changed, 1 insertion(+), 12 deletions(-) > > >>> > > >>> diff --git a/gcc/ada/gcc-interface/utils.cc > > >>> b/gcc/ada/gcc-interface/utils.cc > > >>> index 60f36b1e50d..d010b684177 100644 > > >>> --- a/gcc/ada/gcc-interface/utils.cc > > >>> +++ b/gcc/ada/gcc-interface/utils.cc > > >>> @@ -7299,10 +7299,7 @@ handle_target_clones_attribute (tree *node, tree > > >>> name, tree ARG_UNUSED (args), > > >>> int ARG_UNUSED (flags), bool *no_add_attrs) > > >>> { > > >>> /* Ensure we have a function type. */ > > >>> - if (TREE_CODE (*node) == FUNCTION_DECL) > > >>> - /* Do not inline functions with multiple clone targets. */ > > >>> - DECL_UNINLINABLE (*node) = 1; > > >>> - else > > >>> + if (TREE_CODE (*node) != FUNCTION_DECL) > > >>> { > > >>> warning (OPT_Wattributes, "%qE attribute ignored", name); > > >>> *no_add_attrs = true; > > >>> diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc > > >>> index 4dd2eecbea5..f8759bb1908 100644 > > >>> --- a/gcc/c-family/c-attribs.cc > > >>> +++ b/gcc/c-family/c-attribs.cc > > >>> @@ -6105,9 +6105,6 @@ handle_target_clones_attribute (tree *node, tree > > >>> name, tree ARG_UNUSED (args), > > >>> "single %<target_clones%> attribute is ignored"); > > >>> *no_add_attrs = true; > > >>> } > > >>> - else > > >>> - /* Do not inline functions with multiple clone targets. */ > > >>> - DECL_UNINLINABLE (*node) = 1; > > >>> } > > >>> else > > >>> { > > >>> diff --git a/gcc/d/d-attribs.cc b/gcc/d/d-attribs.cc > > >>> index 0f7ca10e017..9f67415adb1 100644 > > >>> --- a/gcc/d/d-attribs.cc > > >>> +++ b/gcc/d/d-attribs.cc > > >>> @@ -788,11 +788,6 @@ d_handle_target_clones_attribute (tree *node, tree > > >>> name, tree, int, > > >>> warning (OPT_Wattributes, "%qE attribute ignored", name); > > >>> *no_add_attrs = true; > > >>> } > > >>> - else > > >>> - { > > >>> - /* Do not inline functions with multiple clone targets. */ > > >>> - DECL_UNINLINABLE (*node) = 1; > > >>> - } > > >>> > > >>> return NULL_TREE; > > >>> } > > > >