On Fri, Apr 29, 2011 at 4:52 AM, Sriraman Tallam <tmsri...@google.com> wrote: > I want this patch to be considered for google/main now. This is intended to > be submitted to trunk for review soon. > This patch has been tested with crosstool bootstrap using buildit and by > running all tests. > > > Patch Description : > ================== > > Frequently executed functions in programs are some times compiled > into different versions to take advantage of some specific > advanced features present in some of the types of platforms on > which the program is executed. For instance, SSE4 versus no-SSE4. > Existing techniques to do multi-versioning, for example using > indirect calls, block compiler optimizations, mainly inlining.
The general idea of supporting versioning is good, thanks for working on it. Comments below. > This patch adds a new GCC pass to support multi-versioned calls > through a new builtin, __builtin_dispatch. The __builtin_dispatch > call is converted into a simple if-then construct to call the > appropriate version at run-time. There are no indirect calls so > inlining is not affected. I am not sure that combining the function choice and its invocation to a single builtin is good. First, you use variadic arguments for the actual function arguments which will cause vararg promotions to apply and thus it will be impossible to dispatch to functions taking single precision float values (and dependent on ABI details many more similar issues). Second, you restrict yourself to only two versions of a function - that looks quite limited and this restriction is not easy to lift with your proposed interface. Thus, I would have kept regular (but indirect) calls in the source program and only exposed the dispatch via a builtin, like ... > This patch also supports a function unswitching optimization via > cloning of functions, only along hot paths, that call multi-versioned > functions so that the hot multi-versioned paths can be independently > optimized for performance. The cloning optimization is switched on > via a flag, -fclone-hot-version-paths. > > Only two versions are supported in this patch. Example use : > > int popcnt_sse4(unsigned int x) __attribute__((__target__("popcnt"))); > int popcnt_sse4(unsigned int x) > { > int count = __builtin_popcount(x); > return count; > } > > int popcnt(unsigned int x) __attribute__((__target__("no-popcnt"))); > int popcnt(unsigned int x) > { > int count = __builtin_popcount(x); > return count; > } > > int testsse() __attribute__((version_selector)); > int main () > { > ... > /* The multi-versioned call. */ > ret = __builtin_dispatch (testsse, (void*)popcnt_sse4, (void*)popcnt, 25); > ... > } int main() { int (*ppcnt)(unsigned int) = __builtin_dispatch (testsse, popcnt_sse4, popcnt); ret = (*ppcnt) (25); } which would allow the testsse function to return the argument number of the function to select. [snip] > When to use the "version_selector" attribute ? > ----------------------------------------------- > > Functions are marked with attribute "version_selector" only if > they are run-time constants. Example of such functions would > be those that test if a specific feature is available on a > particular architecture. Such functions must return a positive > integer. For two-way functions, those that test if a feature > is present or not must return 1 or 0 respectively. > > This patch will make constructors that call these function once > and assign the outcome to a global variable. From then on, only > the value of the global will be used to decide which version to > execute. That's a nice idea, but why not simply process all functions with a const attribute and no arguments this way? IMHO int testsse (void) __attribute__((const)); is as good and avoids the new attribute completely. The pass would also benefit other uses of this idiom (giving a way to have global dynamic initializers in C). The functions may not be strictly 'const' in a way the compiler autodetects this attribute but it presents exactly the promises to the compiler that allow this optimization. Thus, please split this piece out into a separate patch and use the const attribute. > What happens with cloning, -fclone-hot-version-paths ? > ----------------------------------------------------- Now, here you lost me somewhat, because I didn't look into the patch details and I am missing an example on how the lowered IL looks before that cloning. So for now I suppose this -fclone-hot-version-paths is to expose direct calls to the IL. If you would lower __builtin_dispatch to a style like int sel = selector (); switch (sel) { case 0: fn = popcnt; break; case 1: fn = popcnt_sse4; break; } res = (*fn) (25); then rather than a new pass specialized for __builtin_dispatch existing optimization passes that are able to do tracing like VRP and DOM via jump-threading or the tracer pass should be able to do this optimization for you. If they do not use FDO in a good way it is better to improve them for this case instead of writing a new pass. Note that we already have special FDO support for indirect to direct call promotion, so that might work already. Thus, with all this, __builtin_dispatch would be more like syntactic sugar to avoid writing above switch statement yourself. If you restrict that sugar to a constant number of candidates it can be replaced by a macro (or several ones, specialized for the number of candidates). Richard. > For the call graph that looks like this before cloning : > > func_cold ----> func_hot ----> findOnes ----> IsPopCntSupported ----> popcnt > | > -----> no_popcnt > > where popcnt and no_popcnt are the multi-versioned functions, then after > cloning : > > func_cold ---->IsPopCntSupported ----> func_hot_clone0 ----> findOnes_clone0 > ----> popcnt > | > ----> func_hot_clone1 ----> findOnes_clone1 > ----> no_popcnt > > > > > Flags : -fclone-hot-version-paths does function unswitching via cloning. > --param=num-mversn-clones=<num> allows to specify the number of > functions that should be cloned. > --param=mversn-clone-depth=<num> allows to specify the length of > the call graph path that should be cloned. num = 0 implies only > leaf node that contains the __builtin_dispatch statement must be > cloned. > ============================================================================================ > > Google ref 45947, 45986 > > 2011-04-28 Sriraman Tallam <tmsri...@google.com> > > * c-family/c-common.c (revision 173122) > (handle_version_selector_attribute): New function. > (c_common_attribute_table): New attribute "version_selector". > * tree-pass.h (revision 173122) > (pass_tree_convert_builtin_dispatch): New pass. > (pass_ipa_multiversion_dispatch): New pass. > * testsuite/gcc.dg/mversn7.c (revision 0): New test case. > * testsuite/gcc.dg/mversn4.c (revision 0): New test case. > * testsuite/gcc.dg/mversn4.h (revision 0): New test case. > * testsuite/gcc.dg/mversn4a.c (revision 0): New test case. > * testsuite/gcc.dg/torture/mversn1.c (revision 0): New test case. > * testsuite/gcc.dg/mversn2.c (revision 0): New test case. > * testsuite/gcc.dg/mversn6.c (revision 0): New test case. > * testsuite/gcc.dg/mversn3.c (revision 0): New test case. > * testsuite/g++.dg/mversn8.C (revision 0): New test case. > * testsuite/g++.dg/mversn10a.C (revision 0): New test case. > * testsuite/g++.dg/mversn14a.C (revision 0): New test case. > * testsuite/g++.dg/tree-prof/mversn13.C (revision 0): New test case. > * testsuite/g++.dg/tree-prof/mversn15.C (revision 0): New test case. > * testsuite/g++.dg/tree-prof/mversn15a.C (revision 0): New test > case. > * testsuite/g++.dg/mversn9.C (revision 0): New test case. > * testsuite/g++.dg/mversn10.C (revision 0): New test case. > * testsuite/g++.dg/mversn12.C (revision 0): New test case. > * testsuite/g++.dg/mversn14.C (revision 0): New test case. > * testsuite/g++.dg/mversn16.C (revision 0): New test case. > * testsuite/g++.dg/torture/mversn11.C (revision 0): New test case. > * testsuite/g++.dg/torture/mversn5.C (revision 0): New test case. > * testsuite/g++.dg/torture/mversn5.h (revision 0): New test case. > * testsuite/g++.dg/torture/mversn5a.C (revision 0): New test case. > * builtin-types.def (revision 173122) (BT_PTR_FN_INT): New pointer > type. > (BT_FN_INT_PTR_FN_INT_PTR_PTR_VAR): New function type for > __builtin_dispatch. > * builtins.def (revision 173122) (BUILT_IN_DISPATCH): New builtin to > support multi-version calls. > * mversn-dispatch.c (revision 0): New file. > * timevar.def (revision 173122) (TV_MVERSN_DISPATCH): New time var. > * common.opt (revision 173122) (fclone-hot-version-paths): New flag. > * Makefile.in (revision 173122) (mversn-dispatch.o): New rule. > * passes.c (revision 173122) (init_optimization_passes): Add the > new > multi-version and dispatch passes to the pass list. > * params.def (revision 173122) (PARAM_NUMBER_OF_MVERSN_CLONES): > Define. > (PARAM_MVERSN_CLONE_CGRAPH_DEPTH): Define. > * doc/invoke.texi (revision 173122) (mversn-clone-depth): > Document. > (num-mversn-clones): Document. > (fclone-hot-version-paths): Document.