Hi, I want to submit this patch to google/main to make sure it is available for our internal use at Google in order to materialize some optimization opportunities. Let us continue this dicussion as I make changes and submit this for review for trunk.
Thanks, -Sri. On Mon, May 2, 2011 at 9:41 AM, Xinliang David Li <davi...@google.com> wrote: > On Mon, May 2, 2011 at 2:11 AM, Richard Guenther > <richard.guent...@gmail.com> wrote: >> On Fri, Apr 29, 2011 at 6:23 PM, Xinliang David Li <davi...@google.com> >> wrote: >>> Here is the background for this feature: >>> >>> 1) People relies on function multi-version to explore hw features and >>> squeeze performance, but there is no standard ways of doing so, either >>> a) using indirect function calls with function pointers set at program >>> initialization; b) using manual dispatch at each callsite; b) using >>> features like IFUNC. The dispatch mechanism needs to be promoted to >>> the language level and becomes the first class citizen; >> >> You are not doing that, you are inventing a new (crude) GCC extension. > > To capture the high level semantics and prevent user from lowering the > dispatch calls into forms compiler can not recognize, language > extension is the way to go. > >> > >>> See above. Multi-way dispatch can be added similarly. >> >> Not with the specified syntax. So you'd end up with _two_ >> language extensions. That's bad (and unacceptable, IMHO). > > This part can be improved. > >> >>> >>>> >>>> That's a nice idea, but why not simply process all functions with >>>> a const attribute and no arguments this way? IMHO >>>> >>>> int testsse (void) __attribute__((const)); >>>> >>>> is as good and avoids the new attribute completely. The pass would >>>> also benefit other uses of this idiom (giving a way to have global >>>> dynamic initializers in C). The functions may not be strictly 'const' >>>> in a way the compiler autodetects this attribute but it presents >>>> exactly the promises to the compiler that allow this optimization. >>>> >>>> Thus, please split this piece out into a separate patch and use >>>> the const attribute. >>> >>> >>> Sounds good. >>> > >>> >>> 1) the function selection may happen in a different function; >> >> Well, sure. I propose to have the above as lowered form. If people >> deliberately obfuscate code it's their fault. Your scheme simply >> makes that obfuscation impossible (or less likely possible). So >> 1) is not a valid argument. > > Lowering too early may defeat the purpose because > > 1) the desired optimization may not happen subject to many compiler > heuristic changes; > 2) it has other side effects such as wrong estimation of function size > which may impact inlining > 3) it limits the lowering into one form which may not be ideal -- > with builtin_dispatch, after hoisting optimization, the lowering can > use more efficient IFUNC scheme, for instance. > >> >> >> My point is that such optimization is completely independent of >> that dispatch thing. The above could as well be a selection to >> use different input arrays, one causing alias analysis issues >> and one not. >> >> Thus, a __builtin_dispatch centric optimization pass is the wrong >> way to go. > > I agree that many things can implemented in different ways, but a high > level standard builtin_dispatch mechanism doing hoisting > interprocedcurally is cleaner and simpler and targeted for function > multi-versioning -- it does not depend on/rely on later phase's > heuristic tunings to make the right things to happen. Function MV > deserves this level of treatment as it will become more and more > important for some users (e.g., Google). > >> >> Now, with FDO I'd expect the foo is inlined into bar (because foo >> is deemed hot), > > That is a myth -- the truth is that there are other heuristics which > can prevent this from happening. > >> then you only need to deal with loop unswitching, >> which should be easy to drive from FDO. > > Same here -- the loop body may not be well formed/recognized. The loop > nests may not be perfectly nested, or other unswitching heuristics may > block it from happening. This is the common problem form many other > things that get lowered too early. It is cleaner to make the high > level transformation first in IPA, and let unswitching dealing with > intra-procedural optimization. > > Thanks, > > David > >> >> Richard. >> >>> >>> >>> Thanks, >>> >>> David >>> >>> >>> >>>> >>>> Note that we already have special FDO support for indirect to direct >>>> call promotion, so that might work already. >>>> >>>> Thus, with all this, __builtin_dispatch would be more like syntactic >>>> sugar to avoid writing above switch statement yourself. If you restrict >>>> that sugar to a constant number of candidates it can be replaced by >>>> a macro (or several ones, specialized for the number of candidates). >>>> >>>> Richard. >>>> >>>>> For the call graph that looks like this before cloning : >>>>> >>>>> func_cold ----> func_hot ----> findOnes ----> IsPopCntSupported ----> >>>>> popcnt >>>>> | >>>>> -----> no_popcnt >>>>> >>>>> where popcnt and no_popcnt are the multi-versioned functions, then after >>>>> cloning : >>>>> >>>>> func_cold ---->IsPopCntSupported ----> func_hot_clone0 ----> >>>>> findOnes_clone0 ----> popcnt >>>>> | >>>>> ----> func_hot_clone1 ----> findOnes_clone1 >>>>> ----> no_popcnt >>>>> >>>>> >>>>> >>>>> >>>>> Flags : -fclone-hot-version-paths does function unswitching via cloning. >>>>> --param=num-mversn-clones=<num> allows to specify the number of >>>>> functions that should be cloned. >>>>> --param=mversn-clone-depth=<num> allows to specify the length of >>>>> the call graph path that should be cloned. num = 0 implies only >>>>> leaf node that contains the __builtin_dispatch statement must be >>>>> cloned. >>>>> ============================================================================================ >>>>> >>>>> Google ref 45947, 45986 >>>>> >>>>> 2011-04-28 Sriraman Tallam <tmsri...@google.com> >>>>> >>>>> * c-family/c-common.c (revision 173122) >>>>> (handle_version_selector_attribute): New function. >>>>> (c_common_attribute_table): New attribute "version_selector". >>>>> * tree-pass.h (revision 173122) >>>>> (pass_tree_convert_builtin_dispatch): New pass. >>>>> (pass_ipa_multiversion_dispatch): New pass. >>>>> * testsuite/gcc.dg/mversn7.c (revision 0): New test case. >>>>> * testsuite/gcc.dg/mversn4.c (revision 0): New test case. >>>>> * testsuite/gcc.dg/mversn4.h (revision 0): New test case. >>>>> * testsuite/gcc.dg/mversn4a.c (revision 0): New test case. >>>>> * testsuite/gcc.dg/torture/mversn1.c (revision 0): New test >>>>> case. >>>>> * testsuite/gcc.dg/mversn2.c (revision 0): New test case. >>>>> * testsuite/gcc.dg/mversn6.c (revision 0): New test case. >>>>> * testsuite/gcc.dg/mversn3.c (revision 0): New test case. >>>>> * testsuite/g++.dg/mversn8.C (revision 0): New test case. >>>>> * testsuite/g++.dg/mversn10a.C (revision 0): New test case. >>>>> * testsuite/g++.dg/mversn14a.C (revision 0): New test case. >>>>> * testsuite/g++.dg/tree-prof/mversn13.C (revision 0): New test >>>>> case. >>>>> * testsuite/g++.dg/tree-prof/mversn15.C (revision 0): New test >>>>> case. >>>>> * testsuite/g++.dg/tree-prof/mversn15a.C (revision 0): New >>>>> test case. >>>>> * testsuite/g++.dg/mversn9.C (revision 0): New test case. >>>>> * testsuite/g++.dg/mversn10.C (revision 0): New test case. >>>>> * testsuite/g++.dg/mversn12.C (revision 0): New test case. >>>>> * testsuite/g++.dg/mversn14.C (revision 0): New test case. >>>>> * testsuite/g++.dg/mversn16.C (revision 0): New test case. >>>>> * testsuite/g++.dg/torture/mversn11.C (revision 0): New test >>>>> case. >>>>> * testsuite/g++.dg/torture/mversn5.C (revision 0): New test >>>>> case. >>>>> * testsuite/g++.dg/torture/mversn5.h (revision 0): New test >>>>> case. >>>>> * testsuite/g++.dg/torture/mversn5a.C (revision 0): New test >>>>> case. >>>>> * builtin-types.def (revision 173122) (BT_PTR_FN_INT): New >>>>> pointer type. >>>>> (BT_FN_INT_PTR_FN_INT_PTR_PTR_VAR): New function type for >>>>> __builtin_dispatch. >>>>> * builtins.def (revision 173122) (BUILT_IN_DISPATCH): New builtin >>>>> to >>>>> support multi-version calls. >>>>> * mversn-dispatch.c (revision 0): New file. >>>>> * timevar.def (revision 173122) (TV_MVERSN_DISPATCH): New time >>>>> var. >>>>> * common.opt (revision 173122) (fclone-hot-version-paths): New >>>>> flag. >>>>> * Makefile.in (revision 173122) (mversn-dispatch.o): New rule. >>>>> * passes.c (revision 173122) (init_optimization_passes): Add >>>>> the new >>>>> multi-version and dispatch passes to the pass list. >>>>> * params.def (revision 173122) (PARAM_NUMBER_OF_MVERSN_CLONES): >>>>> Define. >>>>> (PARAM_MVERSN_CLONE_CGRAPH_DEPTH): Define. >>>>> * doc/invoke.texi (revision 173122) (mversn-clone-depth): >>>>> Document. >>>>> (num-mversn-clones): Document. >>>>> (fclone-hot-version-paths): Document. >>>> >>> >> >