Ok. There may be more correctness related comments -- but those can be
addressed when available. For trunk, you need to address issues such
as multi-way dispatch.

Thanks,

David

On Mon, May 2, 2011 at 12:11 PM, Sriraman Tallam <tmsri...@google.com> wrote:
> Hi,
>
>  I want to submit this patch to google/main to make sure it is
> available for our internal use at Google in order to materialize some
> optimization opportunities. Let us continue this dicussion as I make
> changes and submit this for review for trunk.
>
> Thanks,
> -Sri.
>
>
> On Mon, May 2, 2011 at 9:41 AM, Xinliang David Li <davi...@google.com> wrote:
>> On Mon, May 2, 2011 at 2:11 AM, Richard Guenther
>> <richard.guent...@gmail.com> wrote:
>>> On Fri, Apr 29, 2011 at 6:23 PM, Xinliang David Li <davi...@google.com> 
>>> wrote:
>>>> Here is the background for this feature:
>>>>
>>>> 1) People relies on function multi-version to explore hw features and
>>>> squeeze performance, but there is no standard ways of doing so, either
>>>> a) using indirect function calls with function pointers set at program
>>>> initialization; b) using manual dispatch at each callsite; b) using
>>>> features like IFUNC.  The dispatch mechanism needs to be promoted to
>>>> the language level and becomes the first class citizen;
>>>
>>> You are not doing that, you are inventing a new (crude) GCC extension.
>>
>> To capture the high level semantics and prevent user from lowering the
>> dispatch calls into forms compiler can not recognize, language
>> extension is the way to go.
>>
>>>
>>
>>>> See above. Multi-way dispatch can be added similarly.
>>>
>>> Not with the specified syntax.  So you'd end up with _two_
>>> language extensions.  That's bad (and unacceptable, IMHO).
>>
>> This part can be improved.
>>
>>>
>>>>
>>>>>
>>>>> That's a nice idea, but why not simply process all functions with
>>>>> a const attribute and no arguments this way?  IMHO
>>>>>
>>>>> int testsse (void) __attribute__((const));
>>>>>
>>>>> is as good and avoids the new attribute completely.  The pass would
>>>>> also benefit other uses of this idiom (giving a way to have global
>>>>> dynamic initializers in C).  The functions may not be strictly 'const'
>>>>> in a way the compiler autodetects this attribute but it presents
>>>>> exactly the promises to the compiler that allow this optimization.
>>>>>
>>>>> Thus, please split this piece out into a separate patch and use
>>>>> the const attribute.
>>>>
>>>>
>>>> Sounds good.
>>>>
>>
>>>>
>>>> 1) the function selection may happen in a different function;
>>>
>>> Well, sure.  I propose to have the above as lowered form.  If people
>>> deliberately obfuscate code it's their fault.  Your scheme simply
>>> makes that obfuscation impossible (or less likely possible).  So
>>> 1) is not a valid argument.
>>
>> Lowering too early may defeat the purpose because
>>
>> 1) the desired optimization may not happen subject to many compiler
>> heuristic changes;
>> 2) it has other side effects such as wrong estimation of function size
>> which may impact inlining
>> 3) it limits the lowering into one form which may not be ideal  --
>> with builtin_dispatch, after hoisting optimization, the lowering can
>> use more efficient IFUNC scheme, for instance.
>>
>>>
>>>
>>> My point is that such optimization is completely independent of
>>> that dispatch thing.  The above could as well be a selection to
>>> use different input arrays, one causing alias analysis issues
>>> and one not.
>>>
>>> Thus, a __builtin_dispatch centric optimization pass is the wrong
>>> way to go.
>>
>> I agree that many things can implemented in different ways, but a high
>> level standard builtin_dispatch mechanism doing hoisting
>> interprocedcurally is cleaner and simpler and targeted for function
>> multi-versioning -- it does not depend on/rely on later phase's
>> heuristic tunings to make the right things to happen. Function MV
>> deserves this level of treatment as it will become more and more
>> important for some users (e.g., Google).
>>
>>>
>>> Now, with FDO I'd expect the foo is inlined into bar (because foo
>>> is deemed hot),
>>
>> That is a myth -- the truth is that there are other heuristics which
>> can prevent this from happening.
>>
>>> then you only need to deal with loop unswitching,
>>> which should be easy to drive from FDO.
>>
>> Same here -- the loop body may not be well formed/recognized. The loop
>> nests may not be perfectly nested, or other unswitching heuristics may
>> block it from happening.  This is the common problem form many other
>> things that get lowered too early. It is cleaner to make the high
>> level transformation first in IPA, and let unswitching dealing with
>> intra-procedural optimization.
>>
>> Thanks,
>>
>> David
>>
>>>
>>> Richard.
>>>
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> David
>>>>
>>>>
>>>>
>>>>>
>>>>> Note that we already have special FDO support for indirect to direct
>>>>> call promotion, so that might work already.
>>>>>
>>>>> Thus, with all this, __builtin_dispatch would be more like syntactic
>>>>> sugar to avoid writing above switch statement yourself.  If you restrict
>>>>> that sugar to a constant number of candidates it can be replaced by
>>>>> a macro (or several ones, specialized for the number of candidates).
>>>>>
>>>>> Richard.
>>>>>
>>>>>>  For the call graph that looks like this before cloning :
>>>>>>
>>>>>> func_cold ----> func_hot ----> findOnes ----> IsPopCntSupported ----> 
>>>>>> popcnt
>>>>>>                                                |
>>>>>>                                                -----> no_popcnt
>>>>>>
>>>>>>  where popcnt and no_popcnt are the multi-versioned functions, then 
>>>>>> after cloning :
>>>>>>
>>>>>> func_cold ---->IsPopCntSupported ----> func_hot_clone0 ----> 
>>>>>> findOnes_clone0 ----> popcnt
>>>>>>                              |
>>>>>>                               ----> func_hot_clone1 ----> 
>>>>>> findOnes_clone1 ----> no_popcnt
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>  Flags : -fclone-hot-version-paths does function unswitching via cloning.
>>>>>>          --param=num-mversn-clones=<num> allows to specify the number of
>>>>>>          functions that should be cloned.
>>>>>>          --param=mversn-clone-depth=<num> allows to specify the length of
>>>>>>          the call graph path that should be cloned.  num = 0 implies only
>>>>>>          leaf node that contains the __builtin_dispatch statement must be
>>>>>>          cloned.
>>>>>> ============================================================================================
>>>>>>
>>>>>> Google ref 45947, 45986
>>>>>>
>>>>>> 2011-04-28  Sriraman Tallam  <tmsri...@google.com>
>>>>>>
>>>>>>        * c-family/c-common.c   (revision 173122) 
>>>>>> (handle_version_selector_attribute): New function.
>>>>>>        (c_common_attribute_table): New attribute "version_selector".
>>>>>>        * tree-pass.h   (revision 173122) 
>>>>>> (pass_tree_convert_builtin_dispatch): New pass.
>>>>>>        (pass_ipa_multiversion_dispatch): New pass.
>>>>>>        * testsuite/gcc.dg/mversn7.c    (revision 0): New test case.
>>>>>>        * testsuite/gcc.dg/mversn4.c    (revision 0): New test case.
>>>>>>        * testsuite/gcc.dg/mversn4.h    (revision 0): New test case.
>>>>>>        * testsuite/gcc.dg/mversn4a.c   (revision 0): New test case.
>>>>>>        * testsuite/gcc.dg/torture/mversn1.c    (revision 0): New test 
>>>>>> case.
>>>>>>        * testsuite/gcc.dg/mversn2.c    (revision 0): New test case.
>>>>>>        * testsuite/gcc.dg/mversn6.c    (revision 0): New test case.
>>>>>>        * testsuite/gcc.dg/mversn3.c    (revision 0): New test case.
>>>>>>        * testsuite/g++.dg/mversn8.C    (revision 0): New test case.
>>>>>>        * testsuite/g++.dg/mversn10a.C  (revision 0): New test case.
>>>>>>        * testsuite/g++.dg/mversn14a.C  (revision 0): New test case.
>>>>>>        * testsuite/g++.dg/tree-prof/mversn13.C (revision 0): New test 
>>>>>> case.
>>>>>>        * testsuite/g++.dg/tree-prof/mversn15.C (revision 0): New test 
>>>>>> case.
>>>>>>        * testsuite/g++.dg/tree-prof/mversn15a.C        (revision 0): New 
>>>>>> test case.
>>>>>>        * testsuite/g++.dg/mversn9.C    (revision 0): New test case.
>>>>>>        * testsuite/g++.dg/mversn10.C   (revision 0): New test case.
>>>>>>        * testsuite/g++.dg/mversn12.C   (revision 0): New test case.
>>>>>>        * testsuite/g++.dg/mversn14.C   (revision 0): New test case.
>>>>>>        * testsuite/g++.dg/mversn16.C   (revision 0): New test case.
>>>>>>        * testsuite/g++.dg/torture/mversn11.C   (revision 0): New test 
>>>>>> case.
>>>>>>        * testsuite/g++.dg/torture/mversn5.C    (revision 0): New test 
>>>>>> case.
>>>>>>        * testsuite/g++.dg/torture/mversn5.h    (revision 0): New test 
>>>>>> case.
>>>>>>        * testsuite/g++.dg/torture/mversn5a.C   (revision 0): New test 
>>>>>> case.
>>>>>>        * builtin-types.def     (revision 173122) (BT_PTR_FN_INT): New 
>>>>>> pointer type.
>>>>>>        (BT_FN_INT_PTR_FN_INT_PTR_PTR_VAR): New function type for 
>>>>>> __builtin_dispatch.
>>>>>>        * builtins.def  (revision 173122) (BUILT_IN_DISPATCH): New 
>>>>>> builtin to
>>>>>>        support multi-version calls.
>>>>>>        * mversn-dispatch.c     (revision 0): New file.
>>>>>>        * timevar.def   (revision 173122) (TV_MVERSN_DISPATCH): New time 
>>>>>> var.
>>>>>>        * common.opt    (revision 173122) (fclone-hot-version-paths): New 
>>>>>> flag.
>>>>>>        * Makefile.in   (revision 173122) (mversn-dispatch.o): New rule.
>>>>>>        * passes.c      (revision 173122) (init_optimization_passes): Add 
>>>>>> the new
>>>>>>        multi-version and dispatch passes to the pass list.
>>>>>>        * params.def    (revision 173122) 
>>>>>> (PARAM_NUMBER_OF_MVERSN_CLONES): Define.
>>>>>>        (PARAM_MVERSN_CLONE_CGRAPH_DEPTH): Define.
>>>>>>        * doc/invoke.texi       (revision 173122) (mversn-clone-depth): 
>>>>>> Document.
>>>>>>        (num-mversn-clones): Document.
>>>>>>        (fclone-hot-version-paths): Document.
>>>>>
>>>>
>>>
>>
>

Reply via email to