Thanks. Updated patch attached.

David
On Mon, Aug 5, 2013 at 3:57 AM, Michael V. Zolotukhin
<michael.v.zolotuk...@gmail.com> wrote:
> Hi,
> This is a really convenient option, thanks for working on it.
> I can't approve it as I'm not a maintainer, but it looks OK to me,
> except for a small nitpick: AFAIR, comments should end with
> dot-space-space.
>
> Michael
>
> On 04 Aug 20:01, Xinliang David Li wrote:
>> The attached patch implements the stringop inline strategy
>> control using two new -m options:
>>
>> -mmemcpy-strategy=
>> -mmemset-strategy=
>>
>> See the changes in doc/invoke.texi for a description of the new options. Example:
>>
>> -mmemcpy-strategy=rep_8byte:64:noalign,unrolled_loop:2048:noalign,libcall:-1:noalign
>>
>> tells the compiler to inline memcpy using rep_8byte when the size is no
>> larger than 64 bytes, using unrolled_loop when the size is no larger than
>> 2048 bytes, and using a library call when the size is > 2048. In all cases,
>> destination alignment adjustment is not done (noalign).
>>
>> Tested on x86-64/linux. Ok for trunk?
>>
>> thanks,
>>
>> David
>>
>> 2013-08-02  Xinliang David Li  <davi...@google.com>
>>
>>         * config/i386/stringop.def: New file.
>>         * config/i386/stringop.opt: New file.
>>         * config/i386/i386-opts.h: Include stringop.def.
>>         * config/i386/i386.opt: Include stringop.opt.
>>         * config/i386/i386.c (ix86_option_override_internal):
>>         Override default size based stringop inline strategies
>>         with options.
>>         * config/i386/i386.c (ix86_parse_stringop_strategy_string):
>>         New function.
>>
>> 2013-08-04  Xinliang David Li  <davi...@google.com>
>>
>>         * testsuite/gcc.target/i386/memcpy-strategy-1.c: New test.
>>         * testsuite/gcc.target/i386/memcpy-strategy-2.c: Ditto.
>>         * testsuite/gcc.target/i386/memset-strategy-1.c: Ditto.
>>         * testsuite/gcc.target/i386/memcpy-strategy-3.c: Ditto.
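To make the range semantics of the option string concrete, here is a small standalone C sketch. It is not the parser from the patch: `parse_strategy`, `select_range`, `struct range` and `MAX_RANGES` are hypothetical names, algorithm names are copied but not validated, and errors are reported by returning -1 rather than by issuing a warning. It follows the documented rules (the first range starts at 0, each later range starts at the previous max_size + 1, the last max_size must be -1, and the third field is `align` or `noalign`). As choices of this sketch, it checks the range-count limit before writing into its fixed-size array and rejects any range that follows an unbounded one.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical capacity; the patch uses MAX_STRINGOP_ALGS.  */
#define MAX_RANGES 4

/* One alg:max_size:dest_align triplet; max == -1 means no upper bound.  */
struct range
{
  char alg[32];
  int max;
  int noalign;
};

/* Parse STR into OUT, which has room for MAX_RANGES entries.  Return
   the number of ranges, or -1 if STR is malformed.  */
static int
parse_strategy (const char *str, struct range *out)
{
  char buf[256];
  char *cur = buf;
  int n = 0;

  if (strlen (str) >= sizeof buf)
    return -1;
  strcpy (buf, str);

  while (cur != NULL)
    {
      char align[16];
      char *next = strchr (cur, ',');
      if (next)
        *next++ = '\0';

      /* Check capacity before writing out[n].  */
      if (n >= MAX_RANGES)
        return -1;
      /* Nothing may follow an unbounded range.  */
      if (n > 0 && out[n - 1].max == -1)
        return -1;
      /* Field widths keep sscanf inside alg[] and align[].  */
      if (sscanf (cur, "%31[^:]:%d:%15s", out[n].alg, &out[n].max, align) != 3
          || out[n].max < -1)
        return -1;
      /* A bounded range must end after the previous one ends.  */
      if (n > 0 && out[n].max != -1 && out[n].max <= out[n - 1].max)
        return -1;

      if (strcmp (align, "align") == 0)
        out[n].noalign = 0;
      else if (strcmp (align, "noalign") == 0)
        out[n].noalign = 1;
      else
        return -1;

      n++;
      cur = next;
    }

  /* The loop ran at least once, so n >= 1; the last range must be
     unbounded.  */
  return out[n - 1].max == -1 ? n : -1;
}

/* Return the index of the range covering SIZE.  Range i covers
   [previous max + 1, max_i], and the first range starts at 0.  RANGES
   must come from a successful parse_strategy call.  */
static int
select_range (const struct range *ranges, int n, int size)
{
  assert (n > 0 && ranges[n - 1].max == -1);
  for (int i = 0; i < n; i++)
    if (ranges[i].max == -1 || size <= ranges[i].max)
      return i;
  return n - 1; /* Not reached: the last range is unbounded.  */
}
```

With the example string above, sizes 0 to 64 select rep_8byte, 65 to 2048 select unrolled_loop, and anything larger selects libcall.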
>>
>> On Fri, Aug 2, 2013 at 9:21 PM, Xinliang David Li <davi...@google.com> wrote:
>> > On x86_64, when the expected size of memcpy/memset is known (e.g., with
>> > FDO), the libcall strategy is used when the size is > 8192. This value is
>> > hard-coded, which makes performance tuning difficult. This patch
>> > adds two new parameters to control it. Potential usage includes
>> > per-application tuning of the libcall strategy's minimum size based on
>> > FDO summary data (e.g., instruction working set size).
>> >
>> > Bootstrapped and tested on x86_64/linux. Ok for trunk?
>> >
>> > thanks,
>> >
>> > David
>> >
>> >
>> > 2013-08-02  Xinliang David Li  <davi...@google.com>
>> >
>> >         * params.def: New parameters.
>> >         * config/i386/i386.c (ix86_option_override_internal):
>> >         Override default libcall size limit with parameters.
>
>> Index: config/i386/stringop.def
>> ===================================================================
>> --- config/i386/stringop.def (revision 0)
>> +++ config/i386/stringop.def (revision 0)
>> @@ -0,0 +1,42 @@
>> +/* Definitions for option handling for IA-32.
>> +   Copyright (C) 2013 Free Software Foundation, Inc.
>> +
>> +This file is part of GCC.
>> +
>> +GCC is free software; you can redistribute it and/or modify
>> +it under the terms of the GNU General Public License as published by
>> +the Free Software Foundation; either version 3, or (at your option)
>> +any later version.
>> +
>> +GCC is distributed in the hope that it will be useful,
>> +but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>> +GNU General Public License for more details.
>> +
>> +Under Section 7 of GPL version 3, you are granted additional
>> +permissions described in the GCC Runtime Library Exception, version
>> +3.1, as published by the Free Software Foundation.
>> +
>> +You should have received a copy of the GNU General Public License and
>> +a copy of the GCC Runtime Library Exception along with this program;
>> +see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
>> +<http://www.gnu.org/licenses/>. */
>> +
>> +DEF_ENUM
>> +DEF_ALG (no_stringop, no_stringop)
>> +DEF_ENUM
>> +DEF_ALG (libcall, libcall)
>> +DEF_ENUM
>> +DEF_ALG (rep_prefix_1_byte, rep_byte)
>> +DEF_ENUM
>> +DEF_ALG (rep_prefix_4_byte, rep_4byte)
>> +DEF_ENUM
>> +DEF_ALG (rep_prefix_8_byte, rep_8byte)
>> +DEF_ENUM
>> +DEF_ALG (loop_1_byte, byte_loop)
>> +DEF_ENUM
>> +DEF_ALG (loop, loop)
>> +DEF_ENUM
>> +DEF_ALG (unrolled_loop, unrolled_loop)
>> +DEF_ENUM
>> +DEF_ALG (vector_loop, vector_loop)
>> Index: config/i386/i386.opt
>> ===================================================================
>> --- config/i386/i386.opt (revision 201458)
>> +++ config/i386/i386.opt (working copy)
>> @@ -316,6 +316,14 @@ mstack-arg-probe
>>  Target Report Mask(STACK_PROBE) Save
>>  Enable stack probing
>>
>> +mmemcpy-strategy=
>> +Target RejectNegative Joined Var(ix86_tune_memcpy_strategy)
>> +Specify memcpy expansion strategy when expected size is known
>> +
>> +mmemset-strategy=
>> +Target RejectNegative Joined Var(ix86_tune_memset_strategy)
>> +Specify memset expansion strategy when expected size is known
>> +
>>  mstringop-strategy=
>>  Target RejectNegative Joined Enum(stringop_alg) Var(ix86_stringop_alg) Init(no_stringop)
>>  Chose strategy to generate stringop using
>> Index: config/i386/stringop.opt
>> ===================================================================
>> --- config/i386/stringop.opt (revision 0)
>> +++ config/i386/stringop.opt (revision 0)
>> @@ -0,0 +1,36 @@
>> +/* Definitions for option handling for IA-32.
>> +   Copyright (C) 2013 Free Software Foundation, Inc.
>> +
>> +This file is part of GCC.
>> +
>> +GCC is free software; you can redistribute it and/or modify
>> +it under the terms of the GNU General Public License as published by
>> +the Free Software Foundation; either version 3, or (at your option)
>> +any later version.
>> +
>> +GCC is distributed in the hope that it will be useful,
>> +but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>> +GNU General Public License for more details.
>> +
>> +Under Section 7 of GPL version 3, you are granted additional
>> +permissions described in the GCC Runtime Library Exception, version
>> +3.1, as published by the Free Software Foundation.
>> +
>> +You should have received a copy of the GNU General Public License and
>> +a copy of the GCC Runtime Library Exception along with this program;
>> +see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
>> +<http://www.gnu.org/licenses/>. */
>> +
>> +Enum(stringop_alg) String(rep_byte) Value(rep_prefix_1_byte)
>> +
>> +#undef DEF_ENUM
>> +#define DEF_ENUM EnumValue
>> +
>> +#undef DEF_ALG
>> +#define DEF_ALG(alg, name) Enum(stringop_alg) String(name) Value(alg)
>> +
>> +#include "stringop.def"
>> +
>> +#undef DEF_ENUM
>> +#undef DEF_ALG
>> Index: config/i386/i386.c
>> ===================================================================
>> --- config/i386/i386.c (revision 201458)
>> +++ config/i386/i386.c (working copy)
>> @@ -156,7 +156,7 @@ struct processor_costs ix86_size_cost =
>>  };
>>
>>  /* Processor costs (relative to an add) */
>> -static const
>> +static
>>  struct processor_costs i386_cost = { /* 386 specific costs */
>>    COSTS_N_INSNS (1), /* cost of an add instruction */
>>    COSTS_N_INSNS (1), /* cost of a lea instruction */
>> @@ -226,7 +226,7 @@ struct processor_costs i386_cost = { /*
>>    1, /* cond_not_taken_branch_cost. */
>>  };
>>
>> -static const
>> +static
>>  struct processor_costs i486_cost = { /* 486 specific costs */
>>    COSTS_N_INSNS (1), /* cost of an add instruction */
>>    COSTS_N_INSNS (1), /* cost of a lea instruction */
>> @@ -298,7 +298,7 @@ struct processor_costs i486_cost = { /*
>>    1, /* cond_not_taken_branch_cost. */
>>  };
>>
>> -static const
>> +static
>>  struct processor_costs pentium_cost = {
>>    COSTS_N_INSNS (1), /* cost of an add instruction */
>>    COSTS_N_INSNS (1), /* cost of a lea instruction */
>> @@ -368,7 +368,7 @@ struct processor_costs pentium_cost = {
>>    1, /* cond_not_taken_branch_cost. */
>>  };
>>
>> -static const
>> +static
>>  struct processor_costs pentiumpro_cost = {
>>    COSTS_N_INSNS (1), /* cost of an add instruction */
>>    COSTS_N_INSNS (1), /* cost of a lea instruction */
>> @@ -447,7 +447,7 @@ struct processor_costs pentiumpro_cost =
>>    1, /* cond_not_taken_branch_cost. */
>>  };
>>
>> -static const
>> +static
>>  struct processor_costs geode_cost = {
>>    COSTS_N_INSNS (1), /* cost of an add instruction */
>>    COSTS_N_INSNS (1), /* cost of a lea instruction */
>> @@ -518,7 +518,7 @@ struct processor_costs geode_cost = {
>>    1, /* cond_not_taken_branch_cost. */
>>  };
>>
>> -static const
>> +static
>>  struct processor_costs k6_cost = {
>>    COSTS_N_INSNS (1), /* cost of an add instruction */
>>    COSTS_N_INSNS (2), /* cost of a lea instruction */
>> @@ -591,7 +591,7 @@ struct processor_costs k6_cost = {
>>    1, /* cond_not_taken_branch_cost. */
>>  };
>>
>> -static const
>> +static
>>  struct processor_costs athlon_cost = {
>>    COSTS_N_INSNS (1), /* cost of an add instruction */
>>    COSTS_N_INSNS (2), /* cost of a lea instruction */
>> @@ -664,7 +664,7 @@ struct processor_costs athlon_cost = {
>>    1, /* cond_not_taken_branch_cost. */
>>  };
>>
>> -static const
>> +static
>>  struct processor_costs k8_cost = {
>>    COSTS_N_INSNS (1), /* cost of an add instruction */
>>    COSTS_N_INSNS (2), /* cost of a lea instruction */
>> @@ -1265,7 +1265,7 @@ struct processor_costs btver2_cost = {
>>    1, /* cond_not_taken_branch_cost. */
>>  };
>>
>> -static const
>> +static
>>  struct processor_costs pentium4_cost = {
>>    COSTS_N_INSNS (1), /* cost of an add instruction */
>>    COSTS_N_INSNS (3), /* cost of a lea instruction */
>> @@ -1336,7 +1336,7 @@ struct processor_costs pentium4_cost = {
>>    1, /* cond_not_taken_branch_cost. */
>>  };
>>
>> -static const
>> +static
>>  struct processor_costs nocona_cost = {
>>    COSTS_N_INSNS (1), /* cost of an add instruction */
>>    COSTS_N_INSNS (1), /* cost of a lea instruction */
>> @@ -1409,7 +1409,7 @@ struct processor_costs nocona_cost = {
>>    1, /* cond_not_taken_branch_cost. */
>>  };
>>
>> -static const
>> +static
>>  struct processor_costs atom_cost = {
>>    COSTS_N_INSNS (1), /* cost of an add instruction */
>>    COSTS_N_INSNS (1) + 1, /* cost of a lea instruction */
>> @@ -1556,7 +1556,7 @@ struct processor_costs slm_cost = {
>>  };
>>
>>  /* Generic64 should produce code tuned for Nocona and K8. */
>> -static const
>> +static
>>  struct processor_costs generic64_cost = {
>>    COSTS_N_INSNS (1), /* cost of an add instruction */
>>    /* On all chips taken into consideration lea is 2 cycles and more. With
>> @@ -1635,7 +1635,7 @@ struct processor_costs generic64_cost =
>>  };
>>
>>  /* core_cost should produce code tuned for Core familly of CPUs. */
>> -static const
>> +static
>>  struct processor_costs core_cost = {
>>    COSTS_N_INSNS (1), /* cost of an add instruction */
>>    /* On all chips taken into consideration lea is 2 cycles and more. With
>> @@ -1717,7 +1717,7 @@ struct processor_costs core_cost = {
>>
>>  /* Generic32 should produce code tuned for PPro, Pentium4, Nocona,
>>     Athlon and K8. */
>> -static const
>> +static
>>  struct processor_costs generic32_cost = {
>>    COSTS_N_INSNS (1), /* cost of an add instruction */
>>    COSTS_N_INSNS (1) + 1, /* cost of a lea instruction */
>> @@ -2900,6 +2900,150 @@ ix86_debug_options (void)
>>
>>    return;
>>  }
>> +
>> +static const char *stringop_alg_names[] = {
>> +#define DEF_ENUM
>> +#define DEF_ALG(alg, name) #name,
>> +#include "stringop.def"
>> +#undef DEF_ENUM
>> +#undef DEF_ALG
>> +};
>> +
>> +/* Parse parameter string passed to -mmemcpy-strategy= or -mmemset-strategy=.
>> +   The string is of the following form (or comma separated list of it):
>> +
>> +     strategy_alg:max_size:[align|noalign]
>> +
>> +   where the full size range for the strategy is either [0, max_size] or
>> +   [min_size, max_size], in which min_size is the max_size + 1 of the
>> +   preceding range.  The last size range must have max_size == -1.
>> +
>> +   Examples:
>> +
>> +    1.
>> +       -mmemcpy-strategy=libcall:-1:noalign
>> +
>> +      this is equivalent to (for known size memcpy) -mstringop-strategy=libcall
>> +
>> +
>> +   2.
>> +      -mmemset-strategy=rep_8byte:16:noalign,vector_loop:2048:align,libcall:-1:noalign
>> +
>> +      This is to tell the compiler to use the following strategy for memset
>> +      1) when the expected size is between [1, 16], use rep_8byte strategy;
>> +      2) when the size is between [17, 2048], use vector_loop;
>> +      3) when the size is > 2048, use libcall.
>> +
>> +*/
>> +
>> +struct stringop_size_range
>> +{
>> +  int min;
>> +  int max;
>> +  stringop_alg alg;
>> +  bool noalign;
>> +};
>> +
>> +static void
>> +ix86_parse_stringop_strategy_string (char *strategy_str, bool is_memset)
>> +{
>> +  const struct stringop_algs *default_algs;
>> +  stringop_size_range input_ranges[MAX_STRINGOP_ALGS];
>> +  char *curr_range_str, *next_range_str;
>> +  int i = 0, n = 0;
>> +
>> +  if (is_memset)
>> +    default_algs = &ix86_cost->memset[TARGET_64BIT != 0];
>> +  else
>> +    default_algs = &ix86_cost->memcpy[TARGET_64BIT != 0];
>> +
>> +  curr_range_str = strategy_str;
>> +
>> +  do {
>> +
>> +    int mins, maxs;
>> +    stringop_alg alg;
>> +    char alg_name[128];
>> +    char align[16];
>> +
>> +    next_range_str = strchr (curr_range_str, ',');
>> +    if (next_range_str)
>> +      *next_range_str++ = '\0';
>> +
>> +    if (3 != sscanf (curr_range_str, "%[^:]:%d:%s", alg_name, &maxs, align))
>> +      {
>> +        warning (0, "Wrong arg %s to option %s", curr_range_str,
>> +                 is_memset ? "-mmemset_strategy=" : "-mmemcpy_strategy=");
>> +        return;
>> +      }
>> +
>> +    if (n > 0 && (maxs < (mins = input_ranges[n - 1].max + 1) && maxs != -1))
>> +      {
>> +        warning (0, "Size ranges of option %s should be increasing",
>> +                 is_memset ? "-mmemset_strategy=" : "-mmemcpy_strategy=");
>> +        return;
>> +      }
>> +
>> +    for (i = 0; i < last_alg; i++)
>> +      {
>> +        if (!strcmp (alg_name, stringop_alg_names[i]))
>> +          {
>> +            alg = (stringop_alg) i;
>> +            break;
>> +          }
>> +      }
>> +
>> +    if (i == last_alg)
>> +      {
>> +        warning (0, "Wrong stringop strategy name %s specified for option %s",
>> +                 alg_name,
>> +                 is_memset ? "-mmemset_strategy=" : "-mmemcpy_strategy=");
>> +        return;
>> +      }
>> +
>> +    input_ranges[n].min = mins;
>> +    input_ranges[n].max = maxs;
>> +    input_ranges[n].alg = alg;
>> +    if (!strcmp (align, "align"))
>> +      input_ranges[n].noalign = false;
>> +    else if (!strcmp (align, "noalign"))
>> +      input_ranges[n].noalign = true;
>> +    else
>> +      {
>> +        warning (0, "Unknown alignment %s specified for option %s",
>> +                 align, is_memset ? "-mmemset_strategy=" : "-mmemcpy_strategy=");
>> +        return;
>> +      }
>> +    n++;
>> +    curr_range_str = next_range_str;
>> +  } while (curr_range_str);
>> +
>> +  if (input_ranges[n - 1].max != -1)
>> +    {
>> +      warning (0, "The max value for the last size range should be -1"
>> +               " for option %s",
>> +               is_memset ? "-mmemset_strategy=" : "-mmemcpy_strategy=");
>> +      return;
>> +    }
>> +
>> +  if (n > MAX_STRINGOP_ALGS)
>> +    {
>> +      warning (0, "Too many size ranges specified in option %s",
>> +               is_memset ? "-mmemset_strategy=" : "-mmemcpy_strategy=");
>> +      return;
>> +    }
>> +
>> +  /* Now override the default algs array */
>> +  for (i = 0; i < n; i++)
>> +    {
>> +      *const_cast<int *>(&default_algs->size[i].max) = input_ranges[i].max;
>> +      *const_cast<stringop_alg *>(&default_algs->size[i].alg)
>> +          = input_ranges[i].alg;
>> +      *const_cast<int *>(&default_algs->size[i].noalign)
>> +          = input_ranges[i].noalign;
>> +    }
>> +}
>> +
>>
>>  /* Override various settings based on options. If MAIN_ARGS_P, the
>>     options are from the command line, otherwise they are from
>> @@ -4021,6 +4165,21 @@ ix86_option_override_internal (bool main
>>    /* Handle stack protector */
>>    if (!global_options_set.x_ix86_stack_protector_guard)
>>      ix86_stack_protector_guard = TARGET_HAS_BIONIC ? SSP_GLOBAL : SSP_TLS;
>> +
>> +  /* Handle -mmemcpy-strategy= and -mmemset-strategy= */
>> +  if (ix86_tune_memcpy_strategy)
>> +    {
>> +      char *str = xstrdup (ix86_tune_memcpy_strategy);
>> +      ix86_parse_stringop_strategy_string (str, false);
>> +      free (str);
>> +    }
>> +
>> +  if (ix86_tune_memset_strategy)
>> +    {
>> +      char *str = xstrdup (ix86_tune_memset_strategy);
>> +      ix86_parse_stringop_strategy_string (str, true);
>> +      free (str);
>> +    }
>>  }
>>
>>  /* Implement the TARGET_OPTION_OVERRIDE hook. */
>> @@ -22903,6 +23062,7 @@ ix86_expand_movmem (rtx dst, rtx src, rt
>>      {
>>      case libcall:
>>      case no_stringop:
>> +    case last_alg:
>>        gcc_unreachable ();
>>      case loop_1_byte:
>>        need_zero_guard = true;
>> @@ -23093,6 +23253,7 @@ ix86_expand_movmem (rtx dst, rtx src, rt
>>      {
>>      case libcall:
>>      case no_stringop:
>> +    case last_alg:
>>        gcc_unreachable ();
>>      case loop_1_byte:
>>      case loop:
>> @@ -23304,6 +23465,7 @@ ix86_expand_setmem (rtx dst, rtx count_e
>>      {
>>      case libcall:
>>      case no_stringop:
>> +    case last_alg:
>>        gcc_unreachable ();
>>      case loop:
>>        need_zero_guard = true;
>> @@ -23481,6 +23643,7 @@ ix86_expand_setmem (rtx dst, rtx count_e
>>      {
>>      case libcall:
>>      case no_stringop:
>> +    case last_alg:
>>        gcc_unreachable ();
>>      case loop_1_byte:
>>      case loop:
>> Index: config/i386/i386-opts.h
>> ===================================================================
>> --- config/i386/i386-opts.h (revision 201458)
>> +++ config/i386/i386-opts.h (working copy)
>> @@ -28,15 +28,17 @@ see the files COPYING3 and COPYING.RUNTI
>>  /* Algorithm to expand string function with. */
>>  enum stringop_alg
>>  {
>> -   no_stringop,
>> -   libcall,
>> -   rep_prefix_1_byte,
>> -   rep_prefix_4_byte,
>> -   rep_prefix_8_byte,
>> -   loop_1_byte,
>> -   loop,
>> -   unrolled_loop,
>> -   vector_loop
>> +#undef DEF_ENUM
>> +#define DEF_ENUM
>> +
>> +#undef DEF_ALG
>> +#define DEF_ALG(alg, name) alg,
>> +
>> +#include "stringop.def"
>> +last_alg
>> +
>> +#undef DEF_ENUM
>> +#undef DEF_ALG
>>  };
>>
>>  /* Available call abi. */
>> Index: doc/invoke.texi
>> ===================================================================
>> --- doc/invoke.texi (revision 201458)
>> +++ doc/invoke.texi (working copy)
>> @@ -649,6 +649,7 @@ Objective-C and Objective-C++ Dialects}.
>>  -mbmi2 -mrtm -mlwp -mthreads @gol
>>  -mno-align-stringops -minline-all-stringops @gol
>>  -minline-stringops-dynamically -mstringop-strategy=@var{alg} @gol
>> +-mmemcpy-strategy=@var{strategy} -mmemset-strategy=@var{strategy}
>>  -mpush-args -maccumulate-outgoing-args -m128bit-long-double @gol
>>  -m96bit-long-double -mlong-double-64 -mlong-double-80 @gol
>>  -mregparm=@var{num} -msseregparm @gol
>> @@ -14598,6 +14599,24 @@ Expand into an inline loop.
>>  Always use a library call.
>>  @end table
>>
>> +@item -mmemcpy-strategy=@var{strategy}
>> +@opindex mmemcpy-strategy=@var{strategy}
>> +Override the internal decision heuristic to decide if @code{__builtin_memcpy}
>> +should be inlined and what inline algorithm to use when the expected size
>> +of the copy operation is known. @var{strategy}
>> +is a comma-separated list of @var{alg}:@var{max_size}:@var{dest_align} triplets.
>> +@var{alg} is specified in @option{-mstringop-strategy}, @var{max_size} specifies
>> +the max byte size with which inline algorithm @var{alg} is allowed. For the last
>> +triplet, the @var{max_size} must be @code{-1}. The @var{max_size} of the triplets
>> +in the list must be specified in increasing order. The minimal byte size for
The minimal byte size for >> +@var{alg} is @code{0} for the first triplet and @code{@var{max_size} + 1} >> of the >> +preceding range. >> + >> +@item -mmemset-strategy=@var{strategy} >> +@opindex mmemset-strategy=@var{strategy} >> +The option is similar to @option{-mmemcpy-strategy=} except that it is to >> control >> +@code{__builtin_memset} expansion. >> + >> @item -momit-leaf-frame-pointer >> @opindex momit-leaf-frame-pointer >> Don't keep the frame pointer in a register for leaf functions. This >> Index: testsuite/gcc.target/i386/memcpy-strategy-1.c >> =================================================================== >> --- testsuite/gcc.target/i386/memcpy-strategy-1.c (revision 0) >> +++ testsuite/gcc.target/i386/memcpy-strategy-1.c (revision 0) >> @@ -0,0 +1,12 @@ >> +/* { dg-do compile } */ >> +/* { dg-options "-O2 -march=atom -mmemcpy-strategy=vector_loop:-1:align" } >> */ >> +/* { dg-final { scan-assembler-times "movdqa" 8 { target { ! { ia32 } } } } >> } */ >> +/* { dg-final { scan-assembler-times "movdqa" 4 { target { ia32 } } } } */ >> + >> +char a[2048]; >> +char b[2048]; >> +void t (void) >> +{ >> + __builtin_memcpy (a, b, 2048); >> +} >> + >> Index: testsuite/gcc.target/i386/memcpy-strategy-2.c >> =================================================================== >> --- testsuite/gcc.target/i386/memcpy-strategy-2.c (revision 0) >> +++ testsuite/gcc.target/i386/memcpy-strategy-2.c (revision 0) >> @@ -0,0 +1,12 @@ >> +/* { dg-do compile } */ >> +/* { dg-options "-O2 -march=atom >> -mmemcpy-strategy=vector_loop:3000:align,libcall:-1:align" } */ >> +/* { dg-final { scan-assembler-times "movdqa" 8 { target { ! 
{ ia32 } } } } >> } */ >> +/* { dg-final { scan-assembler-times "movdqa" 4 { target { ia32 } } } } */ >> + >> +char a[2048]; >> +char b[2048]; >> +void t (void) >> +{ >> + __builtin_memcpy (a, b, 2048); >> +} >> + >> Index: testsuite/gcc.target/i386/memset-strategy-1.c >> =================================================================== >> --- testsuite/gcc.target/i386/memset-strategy-1.c (revision 0) >> +++ testsuite/gcc.target/i386/memset-strategy-1.c (revision 0) >> @@ -0,0 +1,10 @@ >> +/* { dg-do compile } */ >> +/* { dg-options "-O2 -march=atom -mmemset-strategy=libcall:-1:align" } */ >> +/* { dg-final { scan-assembler-times "memset" 2 } } */ >> + >> +char a[2048]; >> +void t (void) >> +{ >> + __builtin_memset (a, 1, 2048); >> +} >> + >> Index: testsuite/gcc.target/i386/memcpy-strategy-3.c >> =================================================================== >> --- testsuite/gcc.target/i386/memcpy-strategy-3.c (revision 0) >> +++ testsuite/gcc.target/i386/memcpy-strategy-3.c (revision 0) >> @@ -0,0 +1,11 @@ >> +/* { dg-do compile } */ >> +/* { dg-options "-O2 -march=atom >> -mmemcpy-strategy=vector_loop:2000:align,libcall:-1:align" } */ >> +/* { dg-final { scan-assembler-times "memcpy" 2 } } */ >> + >> +char a[2048]; >> +char b[2048]; >> +void t (void) >> +{ >> + __builtin_memcpy (a, b, 2048); >> +} >> + >
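The patch keeps the list of algorithms in a single file, config/i386/stringop.def, and expands it several times with different DEF_ALG definitions: as enumerators in i386-opts.h (followed by the last_alg sentinel), as the stringop_alg_names string table in i386.c, and as EnumValue records in stringop.opt. The sketch below shows the same X-macro technique in one self-contained C file. Assumptions: the list is inlined as a hypothetical STRINGOP_ALGS macro rather than #included, the DEF_ENUM lines are left out because they expand to nothing everywhere except stringop.opt, and lookup_alg is a hypothetical helper modeled on the name-matching loop in ix86_parse_stringop_strategy_string.

```c
#include <assert.h>
#include <string.h>

/* Stand-in for stringop.def, inlined so the sketch is one file.
   Each entry is DEF_ALG (enumerator, option spelling).  */
#define STRINGOP_ALGS(DEF_ALG)             \
  DEF_ALG (no_stringop, no_stringop)       \
  DEF_ALG (libcall, libcall)               \
  DEF_ALG (rep_prefix_1_byte, rep_byte)    \
  DEF_ALG (rep_prefix_4_byte, rep_4byte)   \
  DEF_ALG (rep_prefix_8_byte, rep_8byte)   \
  DEF_ALG (loop_1_byte, byte_loop)         \
  DEF_ALG (loop, loop)                     \
  DEF_ALG (unrolled_loop, unrolled_loop)   \
  DEF_ALG (vector_loop, vector_loop)

/* Expansion 1: the enumerators, then a sentinel equal to their count.  */
#define AS_ENUM(alg, name) alg,
enum stringop_alg { STRINGOP_ALGS (AS_ENUM) last_alg };
#undef AS_ENUM

/* Expansion 2: the option spellings, in the same order as the enum.  */
#define AS_NAME(alg, name) #name,
static const char *const stringop_alg_names[] = { STRINGOP_ALGS (AS_NAME) };
#undef AS_NAME

/* C11: fail the build if the table and the enum ever differ in size.  */
static_assert (sizeof stringop_alg_names / sizeof stringop_alg_names[0]
               == last_alg, "enum and name table out of sync");

/* Map an option spelling to its enumerator; last_alg if unknown.  */
static enum stringop_alg
lookup_alg (const char *name)
{
  for (int i = 0; i < last_alg; i++)
    if (strcmp (name, stringop_alg_names[i]) == 0)
      return (enum stringop_alg) i;
  return last_alg;
}
```

Because both expansions come from one list, adding an algorithm means editing only that list; the enumerator name (rep_prefix_8_byte) and the user-facing spelling (rep_8byte) can still differ, which is why each entry carries both.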
Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi (revision 201458)
+++ doc/invoke.texi (working copy)
@@ -649,6 +649,7 @@ Objective-C and Objective-C++ Dialects}.
 -mbmi2 -mrtm -mlwp -mthreads @gol
 -mno-align-stringops -minline-all-stringops @gol
 -minline-stringops-dynamically -mstringop-strategy=@var{alg} @gol
+-mmemcpy-strategy=@var{strategy} -mmemset-strategy=@var{strategy} @gol
 -mpush-args -maccumulate-outgoing-args -m128bit-long-double @gol
 -m96bit-long-double -mlong-double-64 -mlong-double-80 @gol
 -mregparm=@var{num} -msseregparm @gol
@@ -14598,6 +14599,24 @@ Expand into an inline loop.
 Always use a library call.
 @end table

+@item -mmemcpy-strategy=@var{strategy}
+@opindex mmemcpy-strategy=@var{strategy}
+Override the internal decision heuristic to decide if @code{__builtin_memcpy}
+should be inlined and what inline algorithm to use when the expected size
+of the copy operation is known.  @var{strategy}
+is a comma-separated list of @var{alg}:@var{max_size}:@var{dest_align} triplets.
+@var{alg} is specified in @option{-mstringop-strategy}, @var{max_size} specifies
+the max byte size with which inline algorithm @var{alg} is allowed.  For the last
+triplet, the @var{max_size} must be @code{-1}.  The @var{max_size} of the triplets
+in the list must be specified in increasing order.  The minimal byte size for
+@var{alg} is @code{0} for the first triplet and @code{@var{max_size} + 1} of the
+preceding range.  @var{dest_align} is either @code{align} or @code{noalign}; @code{noalign} skips the destination alignment adjustment.
+
+@item -mmemset-strategy=@var{strategy}
+@opindex mmemset-strategy=@var{strategy}
+This option is similar to @option{-mmemcpy-strategy=} except that it controls
+@code{__builtin_memset} expansion.
+
 @item -momit-leaf-frame-pointer
 @opindex momit-leaf-frame-pointer
 Don't keep the frame pointer in a register for leaf functions.  This
Index: testsuite/gcc.target/i386/memcpy-strategy-2.c
===================================================================
--- testsuite/gcc.target/i386/memcpy-strategy-2.c (revision 0)
+++ testsuite/gcc.target/i386/memcpy-strategy-2.c (revision 0)
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=atom -mmemcpy-strategy=vector_loop:3000:align,libcall:-1:align" } */
+/* { dg-final { scan-assembler-times "movdqa" 8 { target { ! { ia32 } } } } } */
+/* { dg-final { scan-assembler-times "movdqa" 4 { target { ia32 } } } } */
+
+char a[2048];
+char b[2048];
+void t (void)
+{
+  __builtin_memcpy (a, b, 2048);
+}
+
Index: testsuite/gcc.target/i386/memset-strategy-1.c
===================================================================
--- testsuite/gcc.target/i386/memset-strategy-1.c (revision 0)
+++ testsuite/gcc.target/i386/memset-strategy-1.c (revision 0)
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=atom -mmemset-strategy=libcall:-1:align" } */
+/* { dg-final { scan-assembler-times "memset" 2 } } */
+
+char a[2048];
+void t (void)
+{
+  __builtin_memset (a, 1, 2048);
+}
+
Index: testsuite/gcc.target/i386/memcpy-strategy-3.c
===================================================================
--- testsuite/gcc.target/i386/memcpy-strategy-3.c (revision 0)
+++ testsuite/gcc.target/i386/memcpy-strategy-3.c (revision 0)
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=atom -mmemcpy-strategy=vector_loop:2000:align,libcall:-1:align" } */
+/* { dg-final { scan-assembler-times "memcpy" 2 } } */
+
+char a[2048];
+char b[2048];
+void t (void)
+{
+  __builtin_memcpy (a, b, 2048);
+}
+
Index: testsuite/gcc.target/i386/memcpy-strategy-1.c
===================================================================
--- testsuite/gcc.target/i386/memcpy-strategy-1.c (revision 0)
+++ testsuite/gcc.target/i386/memcpy-strategy-1.c (revision 0)
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=atom -mmemcpy-strategy=vector_loop:-1:align" } */
+/* { dg-final { scan-assembler-times "movdqa" 8 { target { ! { ia32 } } } } } */
+/* { dg-final { scan-assembler-times "movdqa" 4 { target { ia32 } } } } */
+
+char a[2048];
+char b[2048];
+void t (void)
+{
+  __builtin_memcpy (a, b, 2048);
+}
+
Index: config/i386/stringop.def
===================================================================
--- config/i386/stringop.def (revision 0)
+++ config/i386/stringop.def (revision 0)
@@ -0,0 +1,42 @@
+/* Definitions for option handling for IA-32.
+   Copyright (C) 2013 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+GNU General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+DEF_ENUM
+DEF_ALG (no_stringop, no_stringop)
+DEF_ENUM
+DEF_ALG (libcall, libcall)
+DEF_ENUM
+DEF_ALG (rep_prefix_1_byte, rep_byte)
+DEF_ENUM
+DEF_ALG (rep_prefix_4_byte, rep_4byte)
+DEF_ENUM
+DEF_ALG (rep_prefix_8_byte, rep_8byte)
+DEF_ENUM
+DEF_ALG (loop_1_byte, byte_loop)
+DEF_ENUM
+DEF_ALG (loop, loop)
+DEF_ENUM
+DEF_ALG (unrolled_loop, unrolled_loop)
+DEF_ENUM
+DEF_ALG (vector_loop, vector_loop)
Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c (revision 201458)
+++ config/i386/i386.c (working copy)
@@ -156,7 +156,7 @@ struct processor_costs ix86_size_cost =
 };

 /* Processor costs (relative to an add) */
-static const
+static
 struct processor_costs i386_cost = { /* 386 specific costs */
   COSTS_N_INSNS (1), /* cost of an add instruction */
   COSTS_N_INSNS (1), /* cost of a lea instruction */
@@ -226,7 +226,7 @@ struct processor_costs i386_cost = { /*
   1, /* cond_not_taken_branch_cost. */
 };

-static const
+static
 struct processor_costs i486_cost = { /* 486 specific costs */
   COSTS_N_INSNS (1), /* cost of an add instruction */
   COSTS_N_INSNS (1), /* cost of a lea instruction */
@@ -298,7 +298,7 @@ struct processor_costs i486_cost = { /*
   1, /* cond_not_taken_branch_cost. */
 };

-static const
+static
 struct processor_costs pentium_cost = {
   COSTS_N_INSNS (1), /* cost of an add instruction */
   COSTS_N_INSNS (1), /* cost of a lea instruction */
@@ -368,7 +368,7 @@ struct processor_costs pentium_cost = {
   1, /* cond_not_taken_branch_cost. */
 };

-static const
+static
 struct processor_costs pentiumpro_cost = {
   COSTS_N_INSNS (1), /* cost of an add instruction */
   COSTS_N_INSNS (1), /* cost of a lea instruction */
@@ -447,7 +447,7 @@ struct processor_costs pentiumpro_cost =
   1, /* cond_not_taken_branch_cost. */
 };

-static const
+static
 struct processor_costs geode_cost = {
   COSTS_N_INSNS (1), /* cost of an add instruction */
   COSTS_N_INSNS (1), /* cost of a lea instruction */
@@ -518,7 +518,7 @@ struct processor_costs geode_cost = {
   1, /* cond_not_taken_branch_cost. */
 };

-static const
+static
 struct processor_costs k6_cost = {
   COSTS_N_INSNS (1), /* cost of an add instruction */
   COSTS_N_INSNS (2), /* cost of a lea instruction */
@@ -591,7 +591,7 @@ struct processor_costs k6_cost = {
   1, /* cond_not_taken_branch_cost. */
 };

-static const
+static
 struct processor_costs athlon_cost = {
   COSTS_N_INSNS (1), /* cost of an add instruction */
   COSTS_N_INSNS (2), /* cost of a lea instruction */
@@ -664,7 +664,7 @@ struct processor_costs athlon_cost = {
   1, /* cond_not_taken_branch_cost. */
 };

-static const
+static
 struct processor_costs k8_cost = {
   COSTS_N_INSNS (1), /* cost of an add instruction */
   COSTS_N_INSNS (2), /* cost of a lea instruction */
@@ -1265,7 +1265,7 @@ struct processor_costs btver2_cost = {
   1, /* cond_not_taken_branch_cost. */
 };

-static const
+static
 struct processor_costs pentium4_cost = {
   COSTS_N_INSNS (1), /* cost of an add instruction */
   COSTS_N_INSNS (3), /* cost of a lea instruction */
@@ -1336,7 +1336,7 @@ struct processor_costs pentium4_cost = {
   1, /* cond_not_taken_branch_cost. */
 };

-static const
+static
 struct processor_costs nocona_cost = {
   COSTS_N_INSNS (1), /* cost of an add instruction */
   COSTS_N_INSNS (1), /* cost of a lea instruction */
@@ -1409,7 +1409,7 @@ struct processor_costs nocona_cost = {
   1, /* cond_not_taken_branch_cost. */
 };

-static const
+static
 struct processor_costs atom_cost = {
   COSTS_N_INSNS (1), /* cost of an add instruction */
   COSTS_N_INSNS (1) + 1, /* cost of a lea instruction */
@@ -1556,7 +1556,7 @@ struct processor_costs slm_cost = {
 };

 /* Generic64 should produce code tuned for Nocona and K8. */
-static const
+static
 struct processor_costs generic64_cost = {
   COSTS_N_INSNS (1), /* cost of an add instruction */
   /* On all chips taken into consideration lea is 2 cycles and more. With
@@ -1635,7 +1635,7 @@ struct processor_costs generic64_cost =
 };

 /* core_cost should produce code tuned for Core familly of CPUs. */
-static const
+static
 struct processor_costs core_cost = {
   COSTS_N_INSNS (1), /* cost of an add instruction */
   /* On all chips taken into consideration lea is 2 cycles and more. With
@@ -1717,7 +1717,7 @@ struct processor_costs core_cost = {

 /* Generic32 should produce code tuned for PPro, Pentium4, Nocona,
    Athlon and K8. */
-static const
+static
 struct processor_costs generic32_cost = {
   COSTS_N_INSNS (1), /* cost of an add instruction */
   COSTS_N_INSNS (1) + 1, /* cost of a lea instruction */
@@ -2900,6 +2900,148 @@ ix86_debug_options (void)

   return;
 }
+
+static const char *stringop_alg_names[] = {
+#define DEF_ENUM
+#define DEF_ALG(alg, name) #name,
+#include "stringop.def"
+#undef DEF_ENUM
+#undef DEF_ALG
+};
+
+/* Parse parameter string passed to -mmemcpy-strategy= or -mmemset-strategy=.
+   The string is of the following form (or comma separated list of it):
+
+     strategy_alg:max_size:[align|noalign]
+
+   where the full size range for the strategy is either [0, max_size] or
+   [min_size, max_size], in which min_size is the max_size + 1 of the
+   preceding range.  The last size range must have max_size == -1.
+
+   Examples:
+
+    1.
+       -mmemcpy-strategy=libcall:-1:noalign
+
+      this is equivalent to (for known size memcpy) -mstringop-strategy=libcall
+
+
+   2.
+      -mmemset-strategy=rep_8byte:16:noalign,vector_loop:2048:align,libcall:-1:noalign
+
+      This is to tell the compiler to use the following strategy for memset
+      1) when the expected size is between [1, 16], use rep_8byte strategy;
+      2) when the size is between [17, 2048], use vector_loop;
+      3) when the size is > 2048, use libcall.  */
+
+struct stringop_size_range
+{
+  int min;
+  int max;
+  stringop_alg alg;
+  bool noalign;
+};
+
+static void
+ix86_parse_stringop_strategy_string (char *strategy_str, bool is_memset)
+{
+  const struct stringop_algs *default_algs;
+  stringop_size_range input_ranges[MAX_STRINGOP_ALGS];
+  char *curr_range_str, *next_range_str;
+  int i = 0, n = 0;
+
+  if (is_memset)
+    default_algs = &ix86_cost->memset[TARGET_64BIT != 0];
+  else
+    default_algs = &ix86_cost->memcpy[TARGET_64BIT != 0];
+
+  curr_range_str = strategy_str;
+
+  do {
+
+    int mins = 0, maxs;
+    stringop_alg alg;
+    char alg_name[128];
+    char align[16];
+
+    next_range_str = strchr (curr_range_str, ',');
+    if (next_range_str)
+      *next_range_str++ = '\0';
+
+    if (3 != sscanf (curr_range_str, "%127[^:]:%d:%15s", alg_name, &maxs, align))
+      {
+        warning (0, "Wrong arg %s to option %s", curr_range_str,
+                 is_memset ? "-mmemset-strategy=" : "-mmemcpy-strategy=");
+        return;
+      }
+
+    if (n > 0 && (maxs < (mins = input_ranges[n - 1].max + 1) && maxs != -1))
+      {
+        warning (0, "Size ranges of option %s should be increasing",
+                 is_memset ? "-mmemset-strategy=" : "-mmemcpy-strategy=");
+        return;
+      }
+
+    for (i = 0; i < last_alg; i++)
+      {
+        if (!strcmp (alg_name, stringop_alg_names[i]))
+          {
+            alg = (stringop_alg) i;
+            break;
+          }
+      }
+
+    if (i == last_alg)
+      {
+        warning (0, "Wrong stringop strategy name %s specified for option %s",
+                 alg_name,
+                 is_memset ? "-mmemset-strategy=" : "-mmemcpy-strategy=");
+        return;
+      }
+
+    if (n >= MAX_STRINGOP_ALGS)
+      {
+        warning (0, "Too many size ranges specified in option %s",
+                 is_memset ? "-mmemset-strategy=" : "-mmemcpy-strategy=");
+        return;
+      }
+
+    input_ranges[n].min = mins;
+    input_ranges[n].max = maxs;
+    input_ranges[n].alg = alg;
+    if (!strcmp (align, "align"))
+      input_ranges[n].noalign = false;
+    else if (!strcmp (align, "noalign"))
+      input_ranges[n].noalign = true;
+    else
+      {
+        warning (0, "Unknown alignment %s specified for option %s",
+                 align, is_memset ? "-mmemset-strategy=" : "-mmemcpy-strategy=");
+        return;
+      }
+    n++;
+    curr_range_str = next_range_str;
+  } while (curr_range_str);
+
+  if (input_ranges[n - 1].max != -1)
+    {
+      warning (0, "The max value for the last size range should be -1"
+               " for option %s",
+               is_memset ? "-mmemset-strategy=" : "-mmemcpy-strategy=");
+      return;
+    }
+
+  /* Now override the default algs array.  */
+  for (i = 0; i < n; i++)
+    {
+      *const_cast<int *>(&default_algs->size[i].max) = input_ranges[i].max;
+      *const_cast<stringop_alg *>(&default_algs->size[i].alg)
+          = input_ranges[i].alg;
+      *const_cast<int *>(&default_algs->size[i].noalign)
+          = input_ranges[i].noalign;
+    }
+}
+

 /* Override various settings based on options.  If MAIN_ARGS_P, the
    options are from the command line, otherwise they are from
@@ -4021,6 +4163,21 @@ ix86_option_override_internal (bool main
   /* Handle stack protector */
   if (!global_options_set.x_ix86_stack_protector_guard)
     ix86_stack_protector_guard = TARGET_HAS_BIONIC ? SSP_GLOBAL : SSP_TLS;
+
+  /* Handle -mmemcpy-strategy= and -mmemset-strategy=.  */
+  if (ix86_tune_memcpy_strategy)
+    {
+      char *str = xstrdup (ix86_tune_memcpy_strategy);
+      ix86_parse_stringop_strategy_string (str, false);
+      free (str);
+    }
+
+  if (ix86_tune_memset_strategy)
+    {
+      char *str = xstrdup (ix86_tune_memset_strategy);
+      ix86_parse_stringop_strategy_string (str, true);
+      free (str);
+    }
 }

 /* Implement the TARGET_OPTION_OVERRIDE hook.
*/ @@ -22903,6 +23060,7 @@ ix86_expand_movmem (rtx dst, rtx src, rt { case libcall: case no_stringop: + case last_alg: gcc_unreachable (); case loop_1_byte: need_zero_guard = true; @@ -23093,6 +23251,7 @@ ix86_expand_movmem (rtx dst, rtx src, rt { case libcall: case no_stringop: + case last_alg: gcc_unreachable (); case loop_1_byte: case loop: @@ -23304,6 +23463,7 @@ ix86_expand_setmem (rtx dst, rtx count_e { case libcall: case no_stringop: + case last_alg: gcc_unreachable (); case loop: need_zero_guard = true; @@ -23481,6 +23641,7 @@ ix86_expand_setmem (rtx dst, rtx count_e { case libcall: case no_stringop: + case last_alg: gcc_unreachable (); case loop_1_byte: case loop: Index: config/i386/stringop.opt =================================================================== --- config/i386/stringop.opt (revision 0) +++ config/i386/stringop.opt (revision 0) @@ -0,0 +1,36 @@ +/* Definitions for option handling for IA-32. + Copyright (C) 2013 Free Software Foundation, Inc. + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 3, or (at your option) +any later version. + +GCC is distributed in the hope that it will be useful, +but WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +GNU General Public License for more details. + +Under Section 7 of GPL version 3, you are granted additional +permissions described in the GCC Runtime Library Exception, version +3.1, as published by the Free Software Foundation. + +You should have received a copy of the GNU General Public License and +a copy of the GCC Runtime Library Exception along with this program; +see the files COPYING3 and COPYING.RUNTIME respectively. If not, see +<http://www.gnu.org/licenses/>. 
*/ + +Enum(stringop_alg) String(rep_byte) Value(rep_prefix_1_byte) + +#undef DEF_ENUM +#define DEF_ENUM EnumValue + +#undef DEF_ALG +#define DEF_ALG(alg, name) Enum(stringop_alg) String(name) Value(alg) + +#include "stringop.def" + +#undef DEF_ENUM +#undef DEF_ALG Index: config/i386/i386-opts.h =================================================================== --- config/i386/i386-opts.h (revision 201458) +++ config/i386/i386-opts.h (working copy) @@ -28,15 +28,17 @@ see the files COPYING3 and COPYING.RUNTI /* Algorithm to expand string function with. */ enum stringop_alg { - no_stringop, - libcall, - rep_prefix_1_byte, - rep_prefix_4_byte, - rep_prefix_8_byte, - loop_1_byte, - loop, - unrolled_loop, - vector_loop +#undef DEF_ENUM +#define DEF_ENUM + +#undef DEF_ALG +#define DEF_ALG(alg, name) alg, + +#include "stringop.def" +last_alg + +#undef DEF_ENUM +#undef DEF_ALG }; /* Available call abi. */ Index: config/i386/i386.opt =================================================================== --- config/i386/i386.opt (revision 201458) +++ config/i386/i386.opt (working copy) @@ -316,6 +316,14 @@ mstack-arg-probe Target Report Mask(STACK_PROBE) Save Enable stack probing +mmemcpy-strategy= +Target RejectNegative Joined Var(ix86_tune_memcpy_strategy) +Specify memcpy expansion strategy when expected size is known + +mmemset-strategy= +Target RejectNegative Joined Var(ix86_tune_memset_strategy) +Specify memset expansion strategy when expected size is known + mstringop-strategy= Target RejectNegative Joined Enum(stringop_alg) Var(ix86_stringop_alg) Init(no_stringop) Chose strategy to generate stringop using