[ARM] Fix CLZ_DEFINED_VALUE_AT_ZERO for vector modes
The CLZ_DEFINED_VALUE_AT_ZERO macro is harded to return 32. For the vector intrinsic vclz this is incorrect and should return the value eight. The CTZ_DEFINED_VALUE_AT_ZERO has the same issue. Tested on arm-linux-gnueabihf, arm-linux-gnueabi. 2014-10-08 Michael Collison * config/arm/arm.h (CLZ_DEFINED_VALUE_AT_ZERO) : Update to support vector modes (CTZ_DEFINED_VALUE_AT_ZERO): Ditto -- Michael Collison Linaro Toolchain Working Group michael.colli...@linaro.org --- ../../../../linaro-gcc4_9_git/gcc/config/arm/arm.h 2014-10-08 13:49:01.109819957 -0700 +++ arm.h 2014-10-08 13:56:19.509841175 -0700 @@ -2138,8 +2138,10 @@ extern int making_const_table; : reverse_condition (code)) /* The arm5 clz instruction returns 32. */ -#define CLZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) ((VALUE) = 32, 1) -#define CTZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) ((VALUE) = 32, 1) +#define CLZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) \ + ((VALUE) = GET_MODE_UNIT_BITSIZE (MODE)) +#define CTZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) \ + ((VALUE) = GET_MODE_UNIT_BITSIZE (MODE)) #define CC_STATUS_INIT \ do { cfun->machine->thumb1_cc_insn = NULL_RTX; } while (0)
Re: [ARM] Fix CLZ_DEFINED_VALUE_AT_ZERO for vector modes
On Thu, Oct 9, 2014 at 12:05 AM, Michael Collison wrote: > > The CLZ_DEFINED_VALUE_AT_ZERO macro is harded to return 32. For the vector > intrinsic vclz this is incorrect and should return the value eight. The > CTZ_DEFINED_VALUE_AT_ZERO has the same issue. Do you have a testcase? I almost think you should have a testcase which causes the constant folding. Though I don't think there is constant folding of the vector clz/ctz happening. Thanks, Andrew > > Tested on arm-linux-gnueabihf, arm-linux-gnueabi. > > 2014-10-08 Michael Collison > > * config/arm/arm.h (CLZ_DEFINED_VALUE_AT_ZERO) : Update > to support vector modes > (CTZ_DEFINED_VALUE_AT_ZERO): Ditto > > -- > Michael Collison > Linaro Toolchain Working Group > michael.colli...@linaro.org >
Re: Towards GNU11
On Wed, Oct 08, 2014 at 08:39:40PM -0600, Jeff Law wrote: > I like it. And one could reasonably argue that now is the time to change > since that maximizes the time for folks to find broken code. Yep, this is definitely stage1 stuff. We still have a few weeks, but I wouldn't want to rush such a change in the nick of time. > I'd go so far as to conditionally approve -- if other maintainers don't > shout out in the next week or so against, then I feel this should go > forward. Thanks. I will wait at least until the end of next week. I'd like to hear from Joseph ;). > I know it's really early, but a "porting to ..." document ought to be > started and have something in it about these changes. Both how to fix the > broken code or how to go back to c89. Absolutely. I'll start something up once it's in. I feel that especially the inline semantics change should be addressed therein. Marek
Re: [ARM] Fix CLZ_DEFINED_VALUE_AT_ZERO for vector modes
Yes this problem was found with Christophe's neon intrinsic tests which are awaiting approval. The problem was found by passing a value of zero to the vclz vector intrinsic. On 10/09/2014 12:11 AM, Andrew Pinski wrote: On Thu, Oct 9, 2014 at 12:05 AM, Michael Collison wrote: The CLZ_DEFINED_VALUE_AT_ZERO macro is harded to return 32. For the vector intrinsic vclz this is incorrect and should return the value eight. The CTZ_DEFINED_VALUE_AT_ZERO has the same issue. Do you have a testcase? I almost think you should have a testcase which causes the constant folding. Though I don't think there is constant folding of the vector clz/ctz happening. Thanks, Andrew Tested on arm-linux-gnueabihf, arm-linux-gnueabi. 2014-10-08 Michael Collison * config/arm/arm.h (CLZ_DEFINED_VALUE_AT_ZERO) : Update to support vector modes (CTZ_DEFINED_VALUE_AT_ZERO): Ditto -- Michael Collison Linaro Toolchain Working Group michael.colli...@linaro.org -- Michael Collison Linaro Toolchain Working Group michael.colli...@linaro.org
[patch] Don't install libquadmath.info if libquadmath is not actually built
The attached patch prevents libquadmath from installing its info file if the library is not actually built. Tested on x86_64-linux, both on a built with libquadmath (normal) and without (tweaking configure so it thinks _float128 is not supported). OK to commit? quadmath.ChangeLog Description: Binary data quadmath.diff Description: Binary data
Re: [patch] Don't install libquadmath.info if libquadmath is not actually built
> The attached patch prevents libquadmath from installing its info file if the > library is not actually built. > Tested on x86_64-linux, both on a built with libquadmath (normal) and without > (tweaking configure so it thinks _float128 is not supported). > > OK to commit? Again, with correct ChangeLog entry including regeneration of Makefile.in quadmath.ChangeLog Description: Binary data quadmath.diff Description: Binary data
Re: [patch] Don't install libquadmath.info if libquadmath is not actually built
On Thu, Oct 09, 2014 at 09:27:55AM +0200, FX wrote: > > The attached patch prevents libquadmath from installing its info file if > > the library is not actually built. > > Tested on x86_64-linux, both on a built with libquadmath (normal) and > > without (tweaking configure so it thinks _float128 is not supported). > > > > OK to commit? > > Again, with correct ChangeLog entry including regeneration of Makefile.in > Index: Makefile.am === --- Makefile.am (revision 215887) +++ Makefile.am (working copy) @@ -158,10 +158,18 @@ # the relative path from the current `Makefile.am' to `texinfo.tex'. TEXINFO_TEX = ../gcc/doc/include/texinfo.tex + # Defines info, dvi, pdf and html targets MAKEINFOFLAGS = -I $(srcdir)/../gcc/doc/include + +if BUILD_LIBQUADMATH info_TEXINFOS = libquadmath.texi libquadmath_TEXINFOS = libquadmath-vers.texi +else +info_TEXINFOS = +libquadmath_TEXINFOS = +endif libquadmath-vers.texi: echo "@set BUGURL $(REPORT_BUGS_TEXI)" > $@ + Please avoid the vertical whitespace changes before MAKEINFOFLAGS and after libquadmath-vers.texi goal. Ok with that change. Jakub
Re: [PATCH, Pointer Bounds Checker 14/x] Passes [2/n] IPA passes
2014-10-09 2:40 GMT+04:00 Jan Hubicka : >> >> I prevent clone's body from removal and therefore original should call >> clone (otherwise clone may have no callers and be removed). I think >> call edge may work (need to recall other cases when reference do its >> work) but is it OK to have call with no stmt for non thunk nodes? > > OK, I guess I understand what makes your situation different from what we have > already. Basically you have two functions with bodies already produced (by > early optimization) but you want to keep body of the second around > to later turn the first into thunk. But I do not see why this can not happen > early? Clones are created not only for functions with bodies. Aliases and thunks may also have clones. Having call edges for aliases and more than one edge for thunks seemed more intrusive to me than having a new reference. Regarding early transformation into thunks - I need to try it. It sounds reasonable and might work. > > I suppose we could either go with fake call edge or the new reference you > have. > I basically wonder what would be least intrussive (and easiest to maintain) > way > for doing so - adding special kinds of functions always produce extra > maintenance > overhead in long term and we already have very many of them... >> >> > >> >> +instrumented function in the code but it still should be counted >> >> +as reachable if the original function is reachable. >> >> + >> >> +When original function bodies are not needed anymore we release >> >> +them and transform functions into a special kind of thunks. Each >> >> +thunk has a call edge to the instrumented version. These thunks >> >> +help to keep externally visible instrumented functions visible >> >> +when linker reolution files are used. Linker has no info about >> >> +connection between original and instrumented function and >> >> +therefore we may wrongly decide (due to difference in assember >> >> +names) that instrumented function version is local and can be >> >> +removed. */ >> > >> > Can you, please, give me an example what happens here? >> > Also how this relates to LTO? >> >> When we compile with LTO linker provides a resolution file with >> functions usage information. Consider I have external function test >> and its clone test.chkp. Linker says I have calls to test and no >> calls to test.chkp. Compiler just removes test.chkp then. It doesn't >> happen when test is trunsformed into a thunk of test.chkp. > > Does the bounds checking work across translation unit boundaries somehow? > Would it make sense to do the redirection and instrumentation as an IPA pass? Instrumentation is function local and I don't see benefit from making it an IPA pass. Ilya > > Honza >> >> Ilya >> >> > >> > Honza >> >> + >> >> +#define CHKP_BOUNDS_OF_SYMBOL_PREFIX "__chkp_bounds_of_" >> >> + >> >> +/* Build a clone of FNDECL with a modified name. */ >> >> + >> >> +static tree >> >> +chkp_build_instrumented_fndecl (tree fndecl) >> >> +{ >> >> + tree new_decl = copy_node (fndecl); >> >> + tree new_name; >> >> + std::string s; >> >> + >> >> + /* We want called_as_built_in recall instrumented calls >> >> + to instrumented built-in functions. Therefore use >> >> + DECL_NAME for cloning instead of DECL_ASSEMBLER_NAME. */ >> >> + s = IDENTIFIER_POINTER (DECL_NAME (fndecl)); >> >> + s += ".chkp"; >> >> + DECL_NAME (new_decl) = get_identifier (s.c_str ()); >> >> + >> >> + /* References to the original and to the instrumented version >> >> + should look the same in the output assembly. And we cannot >> >> + use the same assembler name for the instrumented version >> >> + because it conflicts with decl merging algorithms in LTO. >> >> + Achieve the result by using transparent alias name for the >> >> + instrumented version. */ >> >> + s = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (fndecl)); >> >> + s += ".chkp"; >> >> + new_name = get_identifier (s.c_str ()); >> >> + IDENTIFIER_TRANSPARENT_ALIAS (new_name) = 1; >> >> + TREE_CHAIN (new_name) = DECL_ASSEMBLER_NAME (fndecl); >> >> + SET_DECL_ASSEMBLER_NAME (new_decl, new_name); >> >> + >> >> + /* For functions with body versioning will make a copy of arguments. >> >> + For functions with no body we need to do it here. */ >> >> + if (!gimple_has_body_p (fndecl)) >> >> +DECL_ARGUMENTS (new_decl) = copy_list (DECL_ARGUMENTS (fndecl)); >> >> + >> >> + /* We are going to modify attributes list and therefore should >> >> + make own copy. */ >> >> + DECL_ATTRIBUTES (new_decl) = copy_list (DECL_ATTRIBUTES (fndecl)); >> >> + >> >> + return new_decl; >> >> +} >> >> + >> >> + >> >> +/* Fix operands of attribute from ATTRS list named ATTR_NAME. >> >> + Integer operands are replaced with values according to >> >> + INDEXES map having LEN elements. For operands out of len >> >> + we just add DELTA. */ >> >> + >> >> +static void >> >> +chkp_map_attr_arg_indexes (tree at
[PATCH] Fix PR61969
This fixes an invalid NRV optimization done for returning a non-automatic var. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. Richard. 2014-10-08 Richard Biener PR tree-optimization/61969 * tree-nrv.c (pass_nrv::execute): Properly test for automatic variables. Index: gcc/tree-nrv.c === --- gcc/tree-nrv.c (revision 215917) +++ gcc/tree-nrv.c (working copy) @@ -216,8 +216,7 @@ pass_nrv::execute (function *fun) same type and alignment as the function's result. */ if (TREE_CODE (found) != VAR_DECL || TREE_THIS_VOLATILE (found) - || DECL_CONTEXT (found) != current_function_decl - || TREE_STATIC (found) + || !auto_var_in_fn_p (found, current_function_decl) || TREE_ADDRESSABLE (found) || DECL_ALIGN (found) > DECL_ALIGN (result) || !useless_type_conversion_p (result_type,
Re: [ARM] Fix CLZ_DEFINED_VALUE_AT_ZERO for vector modes
On 09/10/14 08:05, Michael Collison wrote: The CLZ_DEFINED_VALUE_AT_ZERO macro is harded to return 32. For the vector intrinsic vclz this is incorrect and should return the value eight. The CTZ_DEFINED_VALUE_AT_ZERO has the same issue. Tested on arm-linux-gnueabihf, arm-linux-gnueabi. 2014-10-08 Michael Collison * config/arm/arm.h (CLZ_DEFINED_VALUE_AT_ZERO) : Update to support vector modes (CTZ_DEFINED_VALUE_AT_ZERO): Ditto Update comment? /* The arm5 clz instruction returns 32. */ Thanks, Tejas.
Re: [ARM] Fix CLZ_DEFINED_VALUE_AT_ZERO for vector modes
Tejas, You are correct. I will update the comment. On 10/9/2014 12:55 AM, Tejas Belagod wrote: On 09/10/14 08:05, Michael Collison wrote: The CLZ_DEFINED_VALUE_AT_ZERO macro is harded to return 32. For the vector intrinsic vclz this is incorrect and should return the value eight. The CTZ_DEFINED_VALUE_AT_ZERO has the same issue. Tested on arm-linux-gnueabihf, arm-linux-gnueabi. 2014-10-08 Michael Collison * config/arm/arm.h (CLZ_DEFINED_VALUE_AT_ZERO) : Update to support vector modes (CTZ_DEFINED_VALUE_AT_ZERO): Ditto Update comment? /* The arm5 clz instruction returns 32. */ Thanks, Tejas.
[PATCH] Fix PR63445
This fixes a bogus strict overflow warning from VRPs simplify_cond_using_ranges. I believe that when optimizing equality compares from (T) x != CST to x != (T) CST we do not have to care about the fact whether the computation of x possibly overflowed. Besides that it looks odd that we guard the transform by overflow infinities / strict-overflow behavior - those should only guard the warning. Jeff - you added the overflow checking stuff, can you remember why you did that? (I didn't adjust those checks to not apply for equality compares with this patch) Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk (probably latent on the 4.9 branch but the testcase doesn't fail there). Thanks, Richard. 2014-10-08 Richard Biener PR tree-optimization/63445 * tree-vrp.c (simplify_cond_using_ranges): Only warn about overflow for non-equality compares. * gcc.dg/Wstrict-overflow-26.c: New testcase. Index: gcc/tree-vrp.c === --- gcc/tree-vrp.c (revision 215917) +++ gcc/tree-vrp.c (working copy) @@ -9189,8 +9189,9 @@ simplify_cond_using_ranges (gimple stmt) /* If the range overflowed and the user has asked for warnings when strict overflow semantics were used to optimize code, issue an appropriate warning. */ - if ((is_negative_overflow_infinity (vr->min) - || is_positive_overflow_infinity (vr->max)) + if (cond_code != EQ_EXPR && cond_code != NE_EXPR + && (is_negative_overflow_infinity (vr->min) + || is_positive_overflow_infinity (vr->max)) && issue_strict_overflow_warning (WARN_STRICT_OVERFLOW_CONDITIONAL)) { location_t location; Index: gcc/testsuite/gcc.dg/Wstrict-overflow-26.c === --- gcc/testsuite/gcc.dg/Wstrict-overflow-26.c (revision 0) +++ gcc/testsuite/gcc.dg/Wstrict-overflow-26.c (working copy) @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -Wstrict-overflow" } */ + +int +f (int i, int j) +{ + unsigned int c = 0; + if (i < j) +{ + unsigned int n = j - i; + unsigned int i; + for (i = 0; i < n; i++) /* { dg-bogus "signed overflow" } */ + c++; +} + return c; +}
Re: [patch] Excessive alignment in ix86_data_alignment
On 08 Oct 23:02, Petr Murzin wrote: > Hi, > I have measured performance impact on Haswell platform according to this > input: > https://gcc.gnu.org/ml/gcc-patches/2014-06/msg00978.html Same in plain text: For `-O2': TestPrevious Current Ratio(%) 400.perlbench 46.2000 46.2000 +0% 401.bzip2 29. 29. +0% 403.gcc 41. 41.2000 +0.48% 429.mcf 53.7000 53.3000 -0.74% 445.gobmk 32.3000 32.3000 +0% 456.hmmer 32.3000 32.3000 +0% 458.sjeng 36.1000 36.4000 +0.83% 462.libquantum 84.5000 84.3000 -0.23% 464.h264ref 64.4000 64.4000 +0% 471.omnetpp 27.7000 27.9000 +0.72% 473.astar 28.1000 28. -0.35% 483.xalancbmk 49.9000 50.2000 +0.60% 410.bwaves 93.2000 93.4000 +0.21% 416.gamess 41.9000 41.8000 -0.23% 433.milc37.9000 37.8000 -0.26% 434.zeusmp 48.5000 48.4000 -0.20% 435.gromacs 32.5000 32.6000 +0.30% 436.cactusADM 82.8000 80.9000 -2.29% 437.leslie3d55.6000 55.7000 +0.17% 444.namd32.1000 32.1000 +0% 447.dealII 67.5000 67.5000 +0% 450.soplex 48. 47.9000 -0.20% 453.povray 58.3000 58.4000 +0.17% 454.calculix37.6000 37.6000 +0% 459.GemsFDTD44.5000 44.7000 +0.44% 465.tonto 28.6000 28.6000 +0% 470.lbm 67.9000 67.8000 -0.14% 481.wrf 49.4000 49.5000 +0.20% 482.sphinx3 68.7000 68.6000 -0.14% Geomeans: INT 41.14 41.18 +0.11% FP 49.82 49.76 -0.12% ALL 46.03 46.01 -0.02% For `-O3': TestPrevious Current Ratio(%) 400.perlbench 46.2000 46.2000 +0% 401.bzip2 29. 29.1000 +0.34% 403.gcc 41.1000 41.2000 +0.24% 429.mcf 53.4000 54.6000 +2.24% 445.gobmk 32.3000 32.3000 +0% 456.hmmer 32.3000 32.3000 +0% 458.sjeng 36.1000 36.4000 +0.83% 462.libquantum 84. 84.8000 +0.95% 464.h264ref 64.2000 64.3000 +0.15% 471.omnetpp 27.6000 27.8000 +0.72% 473.astar 28. 28. +0% 483.xalancbmk 50.1000 49.8000 -0.59% 410.bwaves 93.5000 93.6000 +0.10% 416.gamess 41.8000 41.9000 +0.23% 433.milc37.8000 37.8000 +0% 434.zeusmp 48.5000 48.4000 -0.20% 435.gromacs 32.6000 32.4000 -0.61% 436.cactusADM 80.4000 81.2000 +0.99% 437.leslie3d55.7000 55.7000 +0% 444.namd32.1000 32.1000 +0% 447.dealII 67.6000 67.6000 +0% 450.soplex 48.6000 48.2000 -0.82% 453.povray 58.3000 58.5000 +0.34% 454.calculix37.6000 37.6000 +0% 459.GemsFDTD44.7000 44.6000 -0.22% 465.tonto 28.6000 28.6000 +0% 470.lbm 68.1000 68.1000 +0% 481.wrf 49.5000 49.6000 +0.20% 482.sphinx3 68.5000 68. -0.72% Geomeans: INT 41.08 41.25 +0.41% FP 49.80 49.78 -0.04% ALL 45.99 46.06 +0.14% -- Thanks, K
Re: [PATCH] PR libstdc++/60132, implement the remaining C++11 is_trivially_* traits
Hi, On 10/09/2014 08:10 AM, Ville Voutilainen wrote: Ok, another try (I don't have write-after-approval, so please commit the patch once you think it's ok): Thanks Ville from me too. I'm taking care of the commit with a few very minor tweaks: 1- In new library testcases we are consistently using -std=gnu++11 and -std=gnu++14 (because the C++98 default is -std=gnu++98) 2- Uniform the dates to 2014-10-09. 3- Adjust the dg-error line numbers of 3 *_neg.cc testcases. Thanks again, Paolo. 2014-10-09 Ville Voutilainen PR libstdc++/60132 * include/std/type_traits (is_trivially_copyable, is_trivially_constructible, is_trivially_default_constructible, is_trivially_copy_constructible, is_trivially_move_constructible, is_trivially_assignable, is_trivially_copy_assignable, is_trivially_move_assignable): New. * testsuite/20_util/is_trivially_assignable/requirements/ typedefs.cc: Likewise. * testsuite/20_util/is_trivially_assignable/requirements/ explicit_instantiation.cc: Likewise. * testsuite/20_util/is_trivially_assignable/value.cc: Likewise. * testsuite/20_util/is_trivially_constructible/requirements/ typedefs.cc: Likewise. * testsuite/20_util/is_trivially_constructible/requirements/ explicit_instantiation.cc: Likewise. * testsuite/20_util/is_trivially_constructible/value.cc: Likewise. * testsuite/20_util/is_trivially_copyable/requirements/ typedefs.cc: Likewise. * testsuite/20_util/is_trivially_copyable/requirements/ explicit_instantiation.cc: Likewise. * testsuite/20_util/is_trivially_copyable/value.cc: Likewise. * testsuite/20_util/is_trivially_copy_assignable/requirements/ typedefs.cc: Likewise. * testsuite/20_util/is_trivially_copy_assignable/requirements/ explicit_instantiation.cc: Likewise. * testsuite/20_util/is_trivially_copy_assignable/value.cc: Likewise. * testsuite/20_util/is_trivially_copy_constructible/requirements/ typedefs.cc: Likewise. * testsuite/20_util/is_trivially_copy_constructible/requirements/ explicit_instantiation.cc: Likewise. * testsuite/20_util/is_trivially_copy_constructible/value.cc: Likewise. * testsuite/20_util/is_trivially_default_constructible/requirements/ typedefs.cc: Likewise. * testsuite/20_util/is_trivially_default_constructible/requirements/ explicit_instantiation.cc: Likewise. * testsuite/20_util/is_trivially_default_constructible/ value.cc: Likewise. * testsuite/20_util/is_trivially_move_assignable/requirements/ typedefs.cc: Likewise. * testsuite/20_util/is_trivially_move_assignable/requirements/ explicit_instantiation.cc: Likewise. * testsuite/20_util/is_trivially_move_assignable/value.cc: Likewise. * testsuite/20_util/is_trivially_move_constructible/requirements/ typedefs.cc: Likewise. * testsuite/20_util/is_trivially_move_constructible/requirements/ explicit_instantiation.cc: Likewise. * testsuite/20_util/is_trivially_move_constructible/value.cc: Likewise. * testsuite/20_util/declval/requirements/1_neg.cc: Adjust dg-error line number. * testsuite/20_util/make_signed/requirements/typedefs_neg.cc: Likewise. * testsuite/20_util/make_unsigned/requirements/typedefs_neg.cc: Likewise. Index: include/std/type_traits === --- include/std/type_traits (revision 216027) +++ include/std/type_traits (working copy) @@ -606,7 +606,11 @@ : public integral_constant { }; - // is_trivially_copyable (still unimplemented) + // is_trivially_copyable + template +struct is_trivially_copyable +: public integral_constant +{ }; /// is_standard_layout template @@ -1282,20 +1286,59 @@ : public __is_nt_move_assignable_impl<_Tp> { }; - /// is_trivially_constructible (still unimplemented) + /// is_trivially_constructible + template +struct is_trivially_constructible +: public __and_, integral_constant>::type +{ }; - /// is_trivially_default_constructible (still unimplemented) + /// is_trivially_default_constructible + template +struct is_trivially_default_constructible +: public is_trivially_constructible<_Tp>::type +{ }; - /// is_trivially_copy_constructible (still unimplemented) + /// is_trivially_copy_constructible + template +struct is_trivially_copy_constructible +: public __and_, + integral_constant>::type +{ }; + + /// is_trivially_move_constructible + template +struct is_trivially_move_constructible +: public __and_, + integral_constant>::type +{ }; - /// is_trivially_move_constructible (still unimplemented) + /// is_tri
Re: RFA: AVR: add infrastructure for device packages
2014-10-08 21:50 GMT+04:00 Joern Rennecke : > As the steering commitee still hasn't spoken on the maintainership issue, > apparently this still has to go the write-after-approval route. > > The purpose of this patch is to make it possible to add support for new > devices (MCUs) to the AVR toolchain, without having to re-build the > entire toolchain. This capability is desirable because new MCUs are added > fairly frequently. > > There are multiple parts of the toolchain involved. > gcc changes multilibbing to key off the new -march option; the -mmcu option > is translated via DRIVER_SELF_SPECS into a -specs option, and the > individual spec files contain the required settings like -march, and various > more detailed settings (some of which are for new options). > > binutils provides new relocation and relaxation facilities to allow referring > symbolically to symbol differences and/or I/O addresses. > avr-libc puts the device-specifc header settings in avr/io*.h, and a few > small device-specific likbale functions into a device-specific library. > > The other toolchain parts are staged here: > g...@github.com:embecosm/avr-binutils-gdb.git avr-mainline > g...@github.com:embecosm/avr-libc.git avr-libc-embecosm-mainline > > > Attached is the GCC patch for the basic device package infrastructure. > OK to apply? Please, apply. Denis.
Re: [patch] Excessive alignment in ix86_data_alignment
On Thu, Oct 9, 2014 at 10:25 AM, Kirill Yukhin wrote: > On 08 Oct 23:02, Petr Murzin wrote: >> Hi, >> I have measured performance impact on Haswell platform according to this >> input: >> https://gcc.gnu.org/ml/gcc-patches/2014-06/msg00978.html What about older processors? The optimization was introduced well before Haswell for then current processors, and it was based on the recommendation from Intel optimization guide. If this optimization doesn't apply for new processors, then tune option should be introduced and set accordingly. Uros.
[DOC PATCH] Add documentation for -fsanitize=enum,bool
We are missing documentation for -fsanitize=bool and -fsanitize=enum, so I've put something together. Ok for trunk? 2014-10-09 Marek Polacek * doc/invoke.texi: Document -fsanitize=bool and -fsanitize=enum. diff --git gcc/doc/invoke.texi gcc/doc/invoke.texi index 5fe7e15..8f3eb16 100644 --- gcc/doc/invoke.texi +++ gcc/doc/invoke.texi @@ -5604,6 +5604,19 @@ This option enables instrumentation of return statements in functions marked with @code{returns_nonnull} function attribute, to detect returning of null values from such functions. +@item -fsanitize=bool +@opindex fsanitize=bool + +This option enables instrumentation of loads from bool. If a value other +than 0/1 is loaded, a run-time error is issued. + +@item -fsanitize=enum +@opindex fsanitize=enum + +This option enables instrumentation of loads from an enum type. If +a value outside the range of values for the enum type is loaded, +a run-time error is issued. + @end table While @option{-ftrapv} causes traps for signed overflows to be emitted, Marek
Re: [DOC PATCH] Add documentation for -fsanitize=enum,bool
On Thu, Oct 09, 2014 at 10:47:34AM +0200, Marek Polacek wrote: > We are missing documentation for -fsanitize=bool and -fsanitize=enum, > so I've put something together. > > Ok for trunk? > > 2014-10-09 Marek Polacek > > * doc/invoke.texi: Document -fsanitize=bool and -fsanitize=enum. Ok, thanks. Jakub
[PING] [4.8 & 4.9] Backport of r211885
PING? > > The enclosed patch for 4.8 & 4.9 branch is a backport of r211885 from trunk. > > The only change is to use: > > for (def_rec = DF_INSN_INFO_DEFS (insn_info); *def_rec; def_rec++) > > other than the new FOR_EACH_INSN_INFO_DEF interface. > > Bootstrapped on x86_64-SUSE-Linux for both branches. OK to apply? > > > Index: gcc/loop-invariant.c > > === > --- gcc/loop-invariant.c(revision 216001) > +++ gcc/loop-invariant.c(working copy) > @@ -839,6 +839,41 @@ check_dependencies (rtx insn, bitmap depends_on) >return true; > } > > +/* Pre-check candidate DEST to skip the one which can not make a valid insn > + during move_invariant_reg. SIMPLE is to skip HARD_REGISTER. */ > + > +static bool > +pre_check_invariant_p (bool simple, rtx dest) { > + if (simple && REG_P (dest) && DF_REG_DEF_COUNT (REGNO (dest)) > 1) > +{ > + df_ref use; > + rtx ref; > + unsigned int i = REGNO (dest); > + struct df_insn_info *insn_info; > + df_ref *def_rec; > + > + for (use = DF_REG_USE_CHAIN (i); use; use = DF_REF_NEXT_REG > (use)) > +{ > + ref = DF_REF_INSN (use); > + insn_info = DF_INSN_INFO_GET (ref); > + > + for (def_rec = DF_INSN_INFO_DEFS (insn_info); *def_rec; def_rec++) > +if (DF_REF_REGNO (*def_rec) == i) > + { > +/* Multi definitions at this stage, most likely are due to > + instruction constraints, which requires both read and write > + on the same register. Since move_invariant_reg is not > + powerful enough to handle such cases, just ignore the INV > + and leave the chance to others. */ > +return false; > + } > +} > +} > + > + return true; > +} > + > /* Finds invariant in INSN. ALWAYS_REACHED is true if the insn is always > executed. ALWAYS_EXECUTED is true if the insn is always executed, > unless the program ends due to a function call. */ @@ -869,6 +904,7 > @@ find_invariant_insn (rtx insn, bool always_reached > simple = false; > >if (!may_assign_reg_p (SET_DEST (set)) > + || !pre_check_invariant_p (simple, dest) >|| !check_maybe_invariant (SET_SRC (set))) > return; > > > Cheers, > Felix
Re: [4.8 & 4.9] Backport of r211885
On Wed, Oct 08, 2014 at 11:00:24PM +0800, Felix Yang wrote: > The enclosed patch for 4.8 & 4.9 branch is a backport of r211885 from trunk. > > The only change is to use: > > for (def_rec = DF_INSN_INFO_DEFS (insn_info); *def_rec; def_rec++) > > other than the new FOR_EACH_INSN_INFO_DEF interface. > > Bootstrapped on x86_64-SUSE-Linux for both branches. OK to apply? ChangeLog entry is missing, plus description why do you want to backport it. If it fixes a bug on the branches, it would be better to have a bugzilla PR for that, and definitely a testcase. > --- gcc/loop-invariant.c(revision 216001) > +++ gcc/loop-invariant.c(working copy) > @@ -839,6 +839,41 @@ check_dependencies (rtx insn, bitmap depends_on) >return true; > } > > +/* Pre-check candidate DEST to skip the one which can not make a valid insn > + during move_invariant_reg. SIMPLE is to skip HARD_REGISTER. */ > + > +static bool > +pre_check_invariant_p (bool simple, rtx dest) > +{ > + if (simple && REG_P (dest) && DF_REG_DEF_COUNT (REGNO (dest)) > 1) > +{ > + df_ref use; > + rtx ref; > + unsigned int i = REGNO (dest); > + struct df_insn_info *insn_info; > + df_ref *def_rec; > + > + for (use = DF_REG_USE_CHAIN (i); use; use = DF_REF_NEXT_REG (use)) > +{ > + ref = DF_REF_INSN (use); > + insn_info = DF_INSN_INFO_GET (ref); > + > + for (def_rec = DF_INSN_INFO_DEFS (insn_info); *def_rec; def_rec++) > +if (DF_REF_REGNO (*def_rec) == i) > + { > +/* Multi definitions at this stage, most likely are due to > + instruction constraints, which requires both read and write > + on the same register. Since move_invariant_reg is not > + powerful enough to handle such cases, just ignore the INV > + and leave the chance to others. */ > +return false; > + } > +} > +} > + > + return true; > +} > + > /* Finds invariant in INSN. ALWAYS_REACHED is true if the insn is always > executed. ALWAYS_EXECUTED is true if the insn is always executed, > unless the program ends due to a function call. */ > @@ -869,6 +904,7 @@ find_invariant_insn (rtx insn, bool always_reached > simple = false; > >if (!may_assign_reg_p (SET_DEST (set)) > + || !pre_check_invariant_p (simple, dest) >|| !check_maybe_invariant (SET_SRC (set))) > return; > > > Cheers, > Felix Jakub
Re: [fortran,patch] Emit code for some IEEE functions in the front-end
Ok, thanks for the patch! On Sat, Oct 4, 2014 at 12:55 PM, FX wrote: > ping > > >> Hi all, >> >> The attached patch improves our code generation for some of the >> IEEE_ARITHMETIC functions: some testing functions (is*) and some arithmetic >> (logb, rint, scalb, …). The interfaces are still present in the module file >> generated as part of the library (which allows, in particular, for accurate >> testing of the extent of support we have), but we catch them before emitting >> the actual function call and emit front-end-generated code instead. This >> code uses some intrinsics that we didn’t use in the front-end so far (some >> type generic, some not), so I have added them (logb, remainder, rint, >> signbit). >> >> The patch is nice because it improves the quality of the code generated, >> eliminating in many cases the need for a function call. It is also a >> prerequisite to extend our IEEE support to more floating-point types >> (extended precision and binary128, on some targets including i386/x86_64). >> Without it, we would have a combinatorial explosion of the number of >> “helper” functions in the library. >> >> Also, I’m removing symbols from gfortran.map, but no branching/release has >> occurred since I added them in the first place: it should be all good. >> >> Regtested on x86_64-apple-darwin14. This regresses ieee_2.f90, at -m32 >> -fno-float-store only, where we seem to trigger a missimplification of >> __builtin_rint(). I’ll send, just after this one, a mail to gcc to get some >> help on that, and track the issue separately. >> >> OK to commit? >> FX > -- Janne Blomqvist
[PATCH] Fix PR63380
The following fixes PR63380 by making sure to not ignore unused but possibly trapping stmts when comparing basic-blocks. Bootstrap / regtest running on x86_64-unknown-linux-gnu. Richard. 2014-10-09 Richard Biener PR tree-optimization/63380 * tree-ssa-tail-merge.c (stmt_local_def): Exclude stmts that may trap. * gcc.dg/torture/pr63380-1.c: New testcase. Index: gcc/tree-ssa-tail-merge.c === --- gcc/tree-ssa-tail-merge.c (revision 215917) +++ gcc/tree-ssa-tail-merge.c (working copy) @@ -313,9 +313,9 @@ stmt_local_def (gimple stmt) tree val; def_operand_p def_p; - if (gimple_has_side_effects (stmt) - || stmt_could_throw_p (stmt) - || gimple_vdef (stmt) != NULL_TREE) + if (gimple_vdef (stmt) != NULL_TREE + || gimple_has_side_effects (stmt) + || gimple_could_trap_p_1 (stmt, false, false)) return false; def_p = SINGLE_SSA_DEF_OPERAND (stmt, SSA_OP_DEF); Index: gcc/testsuite/gcc.dg/torture/pr63380-1.c === --- gcc/testsuite/gcc.dg/torture/pr63380-1.c(revision 0) +++ gcc/testsuite/gcc.dg/torture/pr63380-1.c(working copy) @@ -0,0 +1,15 @@ +/* { dg-do run } */ + +int a = 0, b = 1, c = 0, d = 1, e, f, g, h; +int +main () +{ + e = 1 >> d; + f = ((31 / (1 > e)) || c) / 2; + g = b || a; + h = 31 / g; + if (!h) +__builtin_abort(); + return 0; +} + Index: gcc/testsuite/gcc.dg/torture/pr63380-2.c === --- gcc/testsuite/gcc.dg/torture/pr63380-2.c(revision 0) +++ gcc/testsuite/gcc.dg/torture/pr63380-2.c(working copy) @@ -0,0 +1,10 @@ +/* { dg-do run } */ + +int a = 0, b = 0, c = 0, d, e; +int +main (void) +{ + d = ((20 % (1 != b)) && c) + 2147483647; + e = 20 % (a >= 0); + return 0; +}
Re: [4.8 & 4.9] Backport of r211885
> > On Wed, Oct 08, 2014 at 11:00:24PM +0800, Felix Yang wrote: > > The enclosed patch for 4.8 & 4.9 branch is a backport of r211885 from trunk. > > > > The only change is to use: > > > > for (def_rec = DF_INSN_INFO_DEFS (insn_info); *def_rec; def_rec++) > > > > other than the new FOR_EACH_INSN_INFO_DEF interface. > > > > Bootstrapped on x86_64-SUSE-Linux for both branches. OK to apply? > > ChangeLog entry is missing, plus description why do you want to backport it. > If it fixes a bug on the branches, it would be better to have a bugzilla PR > for that, > and definitely a testcase. > Yeah, I will add a ChangeLog entry for this patch when it is committed. I encountered the same issue when working on my local customized 4.8/4.9 branches. Not reproduceable with the official 4.8/4.9 branches. I thinks it's just an enhancement for the loop invariant pass to make it more versatile. It's better that 4.8/4.9 branches also inlcude this enhancement. OK?
[PATCH] Fix PR63379
This fixes SLP vectorization of a MIN/MAX reduction where we choose a "neutral" element as the initial value of the first SLP group member. That's of course wrong. Simply don't choose one in which case the proper initial values will used. Bootstrap / regtest running on x86_64-unknown-linux-gnu. Richard. 2014-10-09 Richard Biener PR tree-optimization/63379 * tree-vect-slp.c (vect_get_constant_vectors): Do not compute a neutral operand for min/max. * gcc.dg/vect/pr63379.c: New testcase. Index: gcc/tree-vect-slp.c === --- gcc/tree-vect-slp.c (revision 215917) +++ gcc/tree-vect-slp.c (working copy) @@ -2395,13 +2395,8 @@ vect_get_constant_vectors (tree op, slp_ neutral_op = build_int_cst (TREE_TYPE (op), -1); break; - case MAX_EXPR: - case MIN_EXPR: -def_stmt = SSA_NAME_DEF_STMT (op); -loop = (gimple_bb (stmt))->loop_father; -neutral_op = PHI_ARG_DEF_FROM_EDGE (def_stmt, -loop_preheader_edge (loop)); -break; + /* For MIN/MAX we don't have an easy neutral operand but +the initial values can be used fine here. */ default: neutral_op = NULL; Index: gcc/testsuite/gcc.dg/vect/pr63379.c === --- gcc/testsuite/gcc.dg/vect/pr63379.c (revision 0) +++ gcc/testsuite/gcc.dg/vect/pr63379.c (working copy) @@ -0,0 +1,43 @@ +/* PR tree-optimization/63379 */ +/* { dg-do run } */ + +#include "tree-vect.h" + +extern void abort (void); + +typedef struct { +int x; +int y; +} Point; + +Point pt_array[25]; + +void __attribute__((noinline,noclone)) +generate_array(void) +{ + unsigned int i; + for (i = 0; i<25; i++) +{ + pt_array[i].x = i; + pt_array[i].y = 1000+i; +} +} + +int main() +{ + check_vect (); + generate_array (); + Point min_pt = pt_array[0]; + Point *ptr, *ptr_end; + for (ptr = pt_array+1, ptr_end = pt_array+25; ptr != ptr_end; ++ptr) +{ + min_pt.x = (min_pt.x < ptr->x) ? min_pt.x : ptr->x; + min_pt.y = (min_pt.y < ptr->y) ? min_pt.y : ptr->y; +} + + if (min_pt.x != 0 || min_pt.y != 1000) +abort (); + return 0; +} + +/* { dg-final { cleanup-tree-dump "vect" } } */
Re: [4.8 & 4.9] Backport of r211885
On Thu, Oct 09, 2014 at 09:04:49AM +, Yangfei (Felix) wrote: > > > On Wed, Oct 08, 2014 at 11:00:24PM +0800, Felix Yang wrote: > > > The enclosed patch for 4.8 & 4.9 branch is a backport of r211885 from > > > trunk. > > > > > > The only change is to use: > > > > > > for (def_rec = DF_INSN_INFO_DEFS (insn_info); *def_rec; def_rec++) > > > > > > other than the new FOR_EACH_INSN_INFO_DEF interface. > > > > > > Bootstrapped on x86_64-SUSE-Linux for both branches. OK to apply? > > > > ChangeLog entry is missing, plus description why do you want to backport it. > > If it fixes a bug on the branches, it would be better to have a bugzilla PR > > for that, > > and definitely a testcase. > > > > Yeah, I will add a ChangeLog entry for this patch when it is committed. > I encountered the same issue when working on my local customized 4.8/4.9 > branches. Not reproduceable with the official 4.8/4.9 branches. > I thinks it's just an enhancement for the loop invariant pass to make it more > versatile. It's better that 4.8/4.9 branches also inlcude this enhancement. > OK? If it is just an enhancement, then those generally are not backported to release branches (exceptions possible of course, but there needs to be a strong reason). Each pass has some risk of breaking something, exposing previously only latent bugs in later passes etc. Jakub
Re: [PATCH 5/5] add libcc1
On 31/07/14 05:47, Jeff Law wrote: > On 06/19/14 14:52, Tom Tromey wrote: >> Here's a new version of patch #5. >> I've removed the generated code; let's see if it gets through without >> compression. >> >> I think this addresses all the reviews: >> >> * It uses gcc-plugin.m4 to disable the plugin >> * It does some configure checks for needed functionality, and disables >>the plugin if they are not found >> * libcc1 and the plugin now do a protocol version handshake at >>startup >> * The diagnostic overriding code is now in the plugin, not in gcc proper >> * gdb now tells libcc1 about the target triplet, and libcc1 uses >>this to invoke the proper GCC. This is done by (ewww) searching $PATH. >> >> Tom >> >> 2014-06-19 Phil Muldoon >> Tom Tromey > So my biggest concern here is long term maintenance -- who's going to own > care and feeding of these bits over time. Sorry for taking so long to reply. We've talked, on irc and elsewhere a little (some at the Cauldron too!). I think the consensus is as nobody has explicitly mentioned anything, this is OK to go in? So to be a pain, I think we should get an archivable "OK to check-in" and that I, or other members of the Red Hat team (or anyone else that comes along that is interested), will maintain this. FWIW, I don't really see bit-rot as an issue because 1) I'll be around and so will other hackers working on this -- I think it is very important to GDB; 2) It's not really a patch-set I think is horribly susceptible to bit rot anyway. > My inclination is to go ahead and approve, but explicitly note that if the > bits do start to rot that we'll be fairly aggressive at disabling/removing > them. That's a fair condition and I can happily live with that. Agreed on the conditional here. > Now that my position is out there for everyone to see, give the other > maintainers a few days (say until Monday) to chime in with any objections. Well it's been a few months, so Monday has long gone ;) > Obviously if there are no objections and you check in the change, please be > on the lookout for any fallout. I'm particularly concerned about AIX, > Solaris and other non-linux platforms. Noted. > Does this deserve a mention in the news file? I am not sure. All the interface to this is really through GDB. I'll let someone else tell me yes or no for news. The patch set I have on my desk is ready to go, and I believe all alterations have been approved in previous email threads. Cheers, Phil
Re: [PATCH 5/5] add libcc1
On Thu, Oct 09, 2014 at 10:07:23AM +0100, Phil Muldoon wrote: > >> 2014-06-19 Phil Muldoon > >> Tom Tromey > > > So my biggest concern here is long term maintenance -- who's going to own > > care and feeding of these bits over time. > > Sorry for taking so long to reply. We've talked, on irc and elsewhere > a little (some at the Cauldron too!). I think the consensus is as > nobody has explicitly mentioned anything, this is OK to go in? So to > be a pain, I think we should get an archivable "OK to check-in" and > that I, or other members of the Red Hat team (or anyone else that The series is ok for trunk. Please retest it before checking in. > > Does this deserve a mention in the news file? > > I am not sure. All the interface to this is really through GDB. > I'll let someone else tell me yes or no for news. The patch set I have on my > desk is ready to go, and I believe all alterations have been approved > in previous email threads. Yeah, I think it belongs into GDB news file probably, because that is the place where it is user visible. Jakub
[PATCH] Clean up duplicated function seq_cost
Hi, The are two implementations of seq_cost. The function bodies are exactly the same. The patch removes one of them and make the other global. Bootstrap and no make check regression on X86-64. OK for trunk? Thanks! -Zhenqiang ChangeLog: 2014-10-09 Zhenqiang Chen * cfgloopanal.c (seq_cost): Make it global. * rtl.h (seq_cost): New prototype. * tree-ssa-loop-ivopts.c (seq_cost): Delete. diff --git a/gcc/cfgloopanal.c b/gcc/cfgloopanal.c index 7ea1a5f..dd37aa0 100644 --- a/gcc/cfgloopanal.c +++ b/gcc/cfgloopanal.c @@ -304,7 +304,7 @@ get_loop_level (const struct loop *loop) /* Returns estimate on cost of computing SEQ. */ -static unsigned +unsigned seq_cost (const rtx_insn *seq, bool speed) { unsigned cost = 0; diff --git a/gcc/rtl.h b/gcc/rtl.h index e73f731..b697417 100644 --- a/gcc/rtl.h +++ b/gcc/rtl.h @@ -2921,6 +2921,7 @@ extern rtx_insn *find_first_parameter_load (rtx_insn *, rtx_insn *); extern bool keep_with_call_p (const rtx_insn *); extern bool label_is_jump_target_p (const_rtx, const rtx_insn *); extern int insn_rtx_cost (rtx, bool); +extern unsigned seq_cost (const rtx_insn *, bool); /* Given an insn and condition, return a canonical description of the test being made. */ diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c index 400798a..087ca26 100644 --- a/gcc/tree-ssa-loop-ivopts.c +++ b/gcc/tree-ssa-loop-ivopts.c @@ -2842,26 +2842,6 @@ get_use_iv_cost (struct ivopts_data *data, struct iv_use *use, return NULL; } -/* Returns estimate on cost of computing SEQ. */ - -static unsigned -seq_cost (rtx_insn *seq, bool speed) -{ - unsigned cost = 0; - rtx set; - - for (; seq; seq = NEXT_INSN (seq)) -{ - set = single_set (seq); - if (set) - cost += set_src_cost (SET_SRC (set), speed); - else - cost++; -} - - return cost; -} - /* Produce DECL_RTL for object obj so it looks like it is stored in memory. */ static rtx produce_memory_decl_rtl (tree obj, int *regno)
Re: [4.8 & 4.9] Backport of r211885
> On Thu, Oct 09, 2014 at 09:04:49AM +, Yangfei (Felix) wrote: > > > > On Wed, Oct 08, 2014 at 11:00:24PM +0800, Felix Yang wrote: > > > > The enclosed patch for 4.8 & 4.9 branch is a backport of r211885 from > trunk. > > > > > > > > The only change is to use: > > > > > > > > for (def_rec = DF_INSN_INFO_DEFS (insn_info); *def_rec; def_rec++) > > > > > > > > other than the new FOR_EACH_INSN_INFO_DEF interface. > > > > > > > > Bootstrapped on x86_64-SUSE-Linux for both branches. OK to apply? > > > > > > ChangeLog entry is missing, plus description why do you want to backport > it. > > > If it fixes a bug on the branches, it would be better to have a > > > bugzilla PR for that, and definitely a testcase. > > > > > > > Yeah, I will add a ChangeLog entry for this patch when it is committed. > > I encountered the same issue when working on my local customized 4.8/4.9 > branches. Not reproduceable with the official 4.8/4.9 branches. > > I thinks it's just an enhancement for the loop invariant pass to make it > > more > versatile. It's better that 4.8/4.9 branches also inlcude this enhancement. > > OK? > > If it is just an enhancement, then those generally are not backported to > release > branches (exceptions possible of course, but there needs to be a strong > reason). > Each pass has some risk of breaking something, exposing previously only latent > bugs in later passes etc. > > Jakub We can treat it as bugfix, as we got incorrect code when it triggers. It just happens so rarely. Does it worth backporting?
Re: [PATCH] Clean up duplicated function seq_cost
On Thu, Oct 9, 2014 at 11:14 AM, Zhenqiang Chen wrote: > Hi, > > The are two implementations of seq_cost. The function bodies are exactly the > same. The patch removes one of them and make the other global. > > Bootstrap and no make check regression on X86-64. > > OK for trunk? The prototype should go to cfgloopanal.c. Thanks, RIchard. > Thanks! > -Zhenqiang > > ChangeLog: > 2014-10-09 Zhenqiang Chen > > * cfgloopanal.c (seq_cost): Make it global. > * rtl.h (seq_cost): New prototype. > * tree-ssa-loop-ivopts.c (seq_cost): Delete. > > diff --git a/gcc/cfgloopanal.c b/gcc/cfgloopanal.c > index 7ea1a5f..dd37aa0 100644 > --- a/gcc/cfgloopanal.c > +++ b/gcc/cfgloopanal.c > @@ -304,7 +304,7 @@ get_loop_level (const struct loop *loop) > > /* Returns estimate on cost of computing SEQ. */ > > -static unsigned > +unsigned > seq_cost (const rtx_insn *seq, bool speed) > { >unsigned cost = 0; > diff --git a/gcc/rtl.h b/gcc/rtl.h > index e73f731..b697417 100644 > --- a/gcc/rtl.h > +++ b/gcc/rtl.h > @@ -2921,6 +2921,7 @@ extern rtx_insn *find_first_parameter_load (rtx_insn > *, rtx_insn *); > extern bool keep_with_call_p (const rtx_insn *); > extern bool label_is_jump_target_p (const_rtx, const rtx_insn *); > extern int insn_rtx_cost (rtx, bool); > +extern unsigned seq_cost (const rtx_insn *, bool); > > /* Given an insn and condition, return a canonical description of > the test being made. */ > diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c > index 400798a..087ca26 100644 > --- a/gcc/tree-ssa-loop-ivopts.c > +++ b/gcc/tree-ssa-loop-ivopts.c > @@ -2842,26 +2842,6 @@ get_use_iv_cost (struct ivopts_data *data, struct > iv_use *use, >return NULL; > } > > -/* Returns estimate on cost of computing SEQ. */ > - > -static unsigned > -seq_cost (rtx_insn *seq, bool speed) > -{ > - unsigned cost = 0; > - rtx set; > - > - for (; seq; seq = NEXT_INSN (seq)) > -{ > - set = single_set (seq); > - if (set) > - cost += set_src_cost (SET_SRC (set), speed); > - else > - cost++; > -} > - > - return cost; > -} > - > /* Produce DECL_RTL for object obj so it looks like it is stored in memory. > */ > static rtx > produce_memory_decl_rtl (tree obj, int *regno) > > >
Re: [PATCH] Clean up duplicated function seq_cost
On Thu, Oct 9, 2014 at 11:20 AM, Richard Biener wrote: > On Thu, Oct 9, 2014 at 11:14 AM, Zhenqiang Chen > wrote: >> Hi, >> >> The are two implementations of seq_cost. The function bodies are exactly the >> same. The patch removes one of them and make the other global. >> >> Bootstrap and no make check regression on X86-64. >> >> OK for trunk? > > The prototype should go to cfgloopanal.c. Err - cfgloopanal.h of course ;) Or rather the function sounds misplaced in cfgloopanal.c. Richard. > > Thanks, > RIchard. > >> Thanks! >> -Zhenqiang >> >> ChangeLog: >> 2014-10-09 Zhenqiang Chen >> >> * cfgloopanal.c (seq_cost): Make it global. >> * rtl.h (seq_cost): New prototype. >> * tree-ssa-loop-ivopts.c (seq_cost): Delete. >> >> diff --git a/gcc/cfgloopanal.c b/gcc/cfgloopanal.c >> index 7ea1a5f..dd37aa0 100644 >> --- a/gcc/cfgloopanal.c >> +++ b/gcc/cfgloopanal.c >> @@ -304,7 +304,7 @@ get_loop_level (const struct loop *loop) >> >> /* Returns estimate on cost of computing SEQ. */ >> >> -static unsigned >> +unsigned >> seq_cost (const rtx_insn *seq, bool speed) >> { >>unsigned cost = 0; >> diff --git a/gcc/rtl.h b/gcc/rtl.h >> index e73f731..b697417 100644 >> --- a/gcc/rtl.h >> +++ b/gcc/rtl.h >> @@ -2921,6 +2921,7 @@ extern rtx_insn *find_first_parameter_load (rtx_insn >> *, rtx_insn *); >> extern bool keep_with_call_p (const rtx_insn *); >> extern bool label_is_jump_target_p (const_rtx, const rtx_insn *); >> extern int insn_rtx_cost (rtx, bool); >> +extern unsigned seq_cost (const rtx_insn *, bool); >> >> /* Given an insn and condition, return a canonical description of >> the test being made. */ >> diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c >> index 400798a..087ca26 100644 >> --- a/gcc/tree-ssa-loop-ivopts.c >> +++ b/gcc/tree-ssa-loop-ivopts.c >> @@ -2842,26 +2842,6 @@ get_use_iv_cost (struct ivopts_data *data, struct >> iv_use *use, >>return NULL; >> } >> >> -/* Returns estimate on cost of computing SEQ. */ >> - >> -static unsigned >> -seq_cost (rtx_insn *seq, bool speed) >> -{ >> - unsigned cost = 0; >> - rtx set; >> - >> - for (; seq; seq = NEXT_INSN (seq)) >> -{ >> - set = single_set (seq); >> - if (set) >> - cost += set_src_cost (SET_SRC (set), speed); >> - else >> - cost++; >> -} >> - >> - return cost; >> -} >> - >> /* Produce DECL_RTL for object obj so it looks like it is stored in memory. >> */ >> static rtx >> produce_memory_decl_rtl (tree obj, int *regno) >> >> >>
RE: [PATCH, C++] Fix PR63366: __complex not equivalent to __complex double in C++
Ping? > -Original Message- > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme > Sent: Monday, September 29, 2014 3:33 PM > To: gcc-patches@gcc.gnu.org > Subject: [PATCH, C++] Fix PR63366: __complex not equivalent to > __complex double in C++ > > According to a comment in grokdeclarator in file gcc/cp/decl.c: > > /* If we just have "complex", it is equivalent to > "complex double", but if any modifiers at all are specified it is > the complex form of TYPE. E.g, "complex short" is > "complex short int". */ > > Yet, __complex is equivalent to __complex int as shows the following > testcase: > > #include > > int > main (void) > { > return typeid (__complex) != typeid (__complex int); > } > > The following patch fix the problem. > > > ChangeLog are as follows: > > *** gcc/cp/ChangeLog *** > > 2014-09-26 Thomas Preud'homme > > PR C++/63366 > * decl.c (grokdeclarator): Set defaulted_int when defaulting to > int > because type is null. > > *** gcc/testsuite/ChangeLog *** > > 2014-10-26 Thomas Preud'homme > > PR C++/63366 > * g++.dg/torture/pr63366.C: New test. > > > diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c > index d26a432..449efdf 100644 > --- a/gcc/cp/decl.c > +++ b/gcc/cp/decl.c > @@ -9212,6 +9212,7 @@ grokdeclarator (const cp_declarator *declarator, > "ISO C++ forbids declaration of %qs with no type", name); > >type = integer_type_node; > + defaulted_int = 1; > } > >ctype = NULL_TREE; > diff --git a/gcc/testsuite/g++.dg/torture/pr63366.C > b/gcc/testsuite/g++.dg/torture/pr63366.C > new file mode 100644 > index 000..af59b98 > --- /dev/null > +++ b/gcc/testsuite/g++.dg/torture/pr63366.C > @@ -0,0 +1,11 @@ > +// { dg-do run } > +// { dg-options "-fpermissive" } > +// { dg-prune-output "ISO C\\+\\+ forbids declaration of 'type name' > with no type" } > + > +#include > + > +int > +main (void) > +{ > + return typeid (__complex) != typeid (__complex double); > +} > > > Is this ok for trunk? > > Best regards, > > Thomas Preud'homme > > >
答复: [4.8 & 4.9] Backport of r211885
> > > > On Thu, Oct 09, 2014 at 09:04:49AM +, Yangfei (Felix) wrote: > > > > > On Wed, Oct 08, 2014 at 11:00:24PM +0800, Felix Yang wrote: > > > > > The enclosed patch for 4.8 & 4.9 branch is a backport of r211885 > > > > > from > > trunk. > > > > > > > > > > The only change is to use: > > > > > > > > > > for (def_rec = DF_INSN_INFO_DEFS (insn_info); *def_rec; > > > > > def_rec++) > > > > > > > > > > other than the new FOR_EACH_INSN_INFO_DEF interface. > > > > > > > > > > Bootstrapped on x86_64-SUSE-Linux for both branches. OK to apply? > > > > > > > > ChangeLog entry is missing, plus description why do you want to > > > > backport > > it. > > > > If it fixes a bug on the branches, it would be better to have a > > > > bugzilla PR for that, and definitely a testcase. > > > > > > > > > > Yeah, I will add a ChangeLog entry for this patch when it is committed. > > > I encountered the same issue when working on my local customized > > > 4.8/4.9 > > branches. Not reproduceable with the official 4.8/4.9 branches. > > > I thinks it's just an enhancement for the loop invariant pass to > > > make it more > > versatile. It's better that 4.8/4.9 branches also inlcude this enhancement. > > > OK? > > > > If it is just an enhancement, then those generally are not backported > > to release branches (exceptions possible of course, but there needs to be a > strong reason). > > Each pass has some risk of breaking something, exposing previously > > only latent bugs in later passes etc. > > > > Jakub > > We can treat it as bugfix, as we got incorrect code when it triggers. > It just happens so rarely. Does it worth backporting? And the patch fix this bug by making the loop invariant pass more conservative. I didn't find a PR or testcase on trunk for this patch either.
Re: [patch] Turn 1 lra_assert into 1 gcc_assert
> OK, I guess I can copy-and-paste reload1.c:spill_failure there. This generates the same error message than reload1.c:spill_failure (modulo the class of the spill register, but it's not mentioned in the asm case either). Tested on x86_64-suse-linux, OK for the mainline? 2014-10-09 Eric Botcazou * lra-assigns.c (assign_by_spills): Error out on spill failure. -- Eric BotcazouIndex: lra-assigns.c === --- lra-assigns.c (revision 215968) +++ lra-assigns.c (working copy) @@ -1286,10 +1286,9 @@ assign_by_spills (void) break; if (iter > 0) { - /* We did not assign hard regs to reload pseudos after two - iteration. It means something is wrong with asm insn - constraints. Report it. */ - bool asm_p = false; + /* We did not assign hard regs to reload pseudos after two iterations. + Either it's an asm and something is wrong with the constraints, or + we have run out of spill registers; error out in either case. */ bitmap_head failed_reload_insns; bitmap_initialize (&failed_reload_insns, ®_obstack); @@ -1299,7 +1298,7 @@ assign_by_spills (void) bitmap_ior_into (&failed_reload_insns, &lra_reg_info[regno].insn_bitmap); /* Assign an arbitrary hard register of regno class to - avoid further trouble with the asm insns. */ + avoid further trouble with this insn. */ bitmap_clear_bit (&all_spilled_pseudos, regno); assign_hard_regno (ira_class_hard_regs[regno_allocno_class_array[regno]][0], @@ -1310,7 +1309,6 @@ assign_by_spills (void) insn = lra_insn_recog_data[u]->insn; if (asm_noperands (PATTERN (insn)) >= 0) { - asm_p = true; error_for_asm (insn, "% operand has impossible constraints"); /* Avoid further trouble with this insn. @@ -1331,8 +1329,12 @@ assign_by_spills (void) lra_set_insn_deleted (insn); } } + else + { + error ("unable to find a register to spill"); + fatal_insn ("this is the insn:", insn); + } } - lra_assert (asm_p); break; } /* This is a very rare event. We can not assign a hard register
Re: [fortran,patch] Emit code for some IEEE functions in the front-end
> Ok, thanks for the patch! Committed as rev. 216036. Thanks for the review. FX
[PATCH i386 AVX512] [64/n] Add rest of VI1-AVX2: vpack[us]wb.
Hello, This patch adds rest of vpack instruction patterns. Bootstrapped. gcc.target/i386.exp tests on top of patch-set show no regressions. under simulator. Is it ok for trunk? gcc/ * config/i386/sse.md (define_insn "_packsswb"): Add masking. (define_insn "_packuswb"): Ditto. -- Thanks, K diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index d3e9635..594e692 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -11007,21 +11007,21 @@ DONE; }) -(define_insn "_packsswb" - [(set (match_operand:VI1_AVX2 0 "register_operand" "=x,x") +(define_insn "_packsswb" + [(set (match_operand:VI1_AVX2 0 "register_operand" "=x,v") (vec_concat:VI1_AVX2 (ss_truncate: - (match_operand: 1 "register_operand" "0,x")) + (match_operand: 1 "register_operand" "0,v")) (ss_truncate: - (match_operand: 2 "nonimmediate_operand" "xm,xm"] - "TARGET_SSE2" + (match_operand: 2 "nonimmediate_operand" "xm,vm"] + "TARGET_SSE2 && && " "@ packsswb\t{%2, %0|%0, %2} - vpacksswb\t{%2, %1, %0|%0, %1, %2}" + vpacksswb\t{%2, %1, %0|%0, %1, %2}" [(set_attr "isa" "noavx,avx") (set_attr "type" "sselog") (set_attr "prefix_data16" "1,*") - (set_attr "prefix" "orig,vex") + (set_attr "prefix" "orig,maybe_evex") (set_attr "mode" "")]) (define_insn "_packssdw" @@ -11041,17 +11041,17 @@ (set_attr "prefix" "orig,vex") (set_attr "mode" "")]) -(define_insn "_packuswb" - [(set (match_operand:VI1_AVX2 0 "register_operand" "=x,x") +(define_insn "_packuswb" + [(set (match_operand:VI1_AVX2 0 "register_operand" "=x,v") (vec_concat:VI1_AVX2 (us_truncate: - (match_operand: 1 "register_operand" "0,x")) + (match_operand: 1 "register_operand" "0,v")) (us_truncate: - (match_operand: 2 "nonimmediate_operand" "xm,xm"] - "TARGET_SSE2" + (match_operand: 2 "nonimmediate_operand" "xm,vm"] + "TARGET_SSE2 && && " "@ packuswb\t{%2, %0|%0, %2} - vpackuswb\t{%2, %1, %0|%0, %1, %2}" + vpackuswb\t{%2, %1, %0|%0, %1, %2}" [(set_attr "isa" "noavx,avx") (set_attr "type" "sselog") (set_attr "prefix_data16" "1,*")
[PATCH i386 AVX512] [65/n] Add rest of VI1-AVX2: mul insn pattern.
Hello, This tiny patch extend mul insn pattern to support masking. Bootstrapped. AVX-512* tests on top of patch-set all pass under simulator. Is it ok for trunk? gcc/ * config/i386/sse.md (define_expand "mul3"): Add masking. -- Thanks, K diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 594e692..a3b2477 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -9238,11 +9238,11 @@ (set_attr "prefix" "orig,maybe_evex") (set_attr "mode" "TI")]) -(define_expand "mul3" +(define_expand "mul3" [(set (match_operand:VI1_AVX2 0 "register_operand") (mult:VI1_AVX2 (match_operand:VI1_AVX2 1 "register_operand") (match_operand:VI1_AVX2 2 "register_operand")))] - "TARGET_SSE2" + "TARGET_SSE2 && && " { ix86_expand_vecop_qihi (MULT, operands[0], operands[1], operands[2]); DONE;
[PATCH i386 AVX512] [66/n] Extend vpalignr insn patterns.
Hello, This patch extends vpalignr insn patterns. It also introduces dedicated `masked' version of pattern w/o substing. Bootstrapped. AVX-512* tests on top of patch-set all pass under simulator. Is it ok for trunk? gcc/ * config/i386/sse.md (define_mode_iterator SSESCALARMODE): Add V4TI mode. (define_insn "_palignr_mask"): New. (define_insn "_palignr"): Add EVEX version. -- Thanks, K diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index a3b2477..79b6012 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -351,7 +351,7 @@ ;; ??? This should probably be dropped in favor of VIMAX_AVX2. (define_mode_iterator SSESCALARMODE - [(V2TI "TARGET_AVX2") TI]) + [(V4TI "TARGET_AVX512BW") (V2TI "TARGET_AVX2") TI]) (define_mode_iterator VI12_AVX2 [(V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX2") V16QI @@ -13621,11 +13621,33 @@ (set (attr "prefix_rex") (symbol_ref "x86_extended_reg_mentioned_p (insn)")) (set_attr "mode" "DI")]) +(define_insn "_palignr_mask" + [(set (match_operand:VI1_AVX2 0 "register_operand" "=v") +(vec_merge:VI1_AVX2 + (unspec:VI1_AVX2 + [(match_operand:VI1_AVX2 1 "register_operand" "v") +(match_operand:VI1_AVX2 2 "nonimmediate_operand" "vm") +(match_operand:SI 3 "const_0_to_255_mul_8_operand" "n")] + UNSPEC_PALIGNR) + (match_operand:VI1_AVX2 4 "vector_move_operand" "0C") + (match_operand: 5 "register_operand" "Yk")))] + "TARGET_AVX512BW && ( == 64 || TARGET_AVX512VL)" +{ + operands[3] = GEN_INT (INTVAL (operands[3]) / 8); + return "vpalignr\t{%3, %2, %1, %0%{%5%}%N4|%0%{%5%}%N4, %1, %2, %3}"; +} + [(set_attr "type" "sseishft") + (set_attr "atom_unit" "sishuf") + (set_attr "prefix_extra" "1") + (set_attr "length_immediate" "1") + (set_attr "prefix" "evex") + (set_attr "mode" "")]) + (define_insn "_palignr" - [(set (match_operand:SSESCALARMODE 0 "register_operand" "=x,x") + [(set (match_operand:SSESCALARMODE 0 "register_operand" "=x,v") (unspec:SSESCALARMODE - [(match_operand:SSESCALARMODE 1 "register_operand" "0,x") - (match_operand:SSESCALARMODE 2 "nonimmediate_operand" "xm,xm") + [(match_operand:SSESCALARMODE 1 "register_operand" "0,v") + (match_operand:SSESCALARMODE 2 "nonimmediate_operand" "xm,vm") (match_operand:SI 3 "const_0_to_255_mul_8_operand" "n,n")] UNSPEC_PALIGNR))] "TARGET_SSSE3"
Re: [i386] Replace builtins with vector extensions
Ping https://gcc.gnu.org/ml/gcc-patches/2014-07/msg01812.html (another part of the discussion is around https://gcc.gnu.org/ml/gcc-patches/2014-06/msg02288.html ) Most people who commented seem cautiously in favor. The least favorable was Ulrich who suggested to go with it but keep the old behavior accessible if the user defines some macro (which imho would lose a large part of the simplification benefits of the patch) https://gcc.gnu.org/ml/gcc-patches/2014-06/msg02328.html If this is accepted, I will gladly prepare patches removing the unused builtins and extending this to a few more operations (integer vectors in particular). If this is not the direction we want to go, I'd like to hear it clearly so I can move on... My main doubt with the current patch is whether it is better to write simply (both variables have type __m128d): __A + __B or, as we will have to do for integers: (__m128d)((__v2df)__A + (__v2df)__B) -- Marc Glisse
[PATCH i386 AVX512] [67/n] Update constraints in vec_dup insn pattern.
Hello, This tiny patch updates constraints in vec_dup insn pattern. Bootstrapped. AVX-512* tests on top of patch-set all pass under simulator. Is it ok for trunk? gcc/ * config/i386/sse.md (define_insn "vec_dup"): Update constraints. -- Thanks, K diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 79b6012..f88d3d0 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -16448,9 +16448,9 @@ [V8SI V8SF V4DI V4DF]) (define_insn "vec_dup" - [(set (match_operand:AVX_VEC_DUP_MODE 0 "register_operand" "=x,x,x") + [(set (match_operand:AVX_VEC_DUP_MODE 0 "register_operand" "=x,v,x") (vec_duplicate:AVX_VEC_DUP_MODE - (match_operand: 1 "nonimmediate_operand" "m,x,?x")))] + (match_operand: 1 "nonimmediate_operand" "m,v,?x")))] "TARGET_AVX" "@ vbroadcast\t{%1, %0|%0, %1}
Re: [patch] Add -static-libquadmath option
Version 2 of the patch, now handling the darwin case (thanks Iain) and expressely noting in the documentation the implications on redistribution (thanks Joseph). Bootstrapped and regtested on x86_64-apple-darwin14. OK to commit? I need a C/driver options maintainer, or global reviewer, to OK the C changes (but they really are obvious). As Iain suggested the darwin.h change, I consider it pre-approved by him :) > We have a -static-libgfortran option, but on targets where we support > quad-prec math through libquadmath, we didn’t have an equivalent > -static-libquadmath so far. This patch adds it, in what I think is a rather > straightforward manner. > > The only minor complication comes from the fact that previously, linking > libquadmath was conditionally through libgfortran.spec. So the spec was > modified: when we use -static-libquadmath, the driver itself includes the > -lquadmath (surrounded by the necessary static linking directives), so that > libgfortran.spec shouldn’t do it again. > > Bootstrapped and regtested on x86_64 linux. OK to commit? static_quad.ChangeLog Description: Binary data static_quad.diff Description: Binary data
Re: 答复: [4.8 & 4.9] Backport of r211885
On Thu, Oct 9, 2014 at 11:35 AM, Yangfei (Felix) wrote: > >> > > > On Thu, Oct 09, 2014 at 09:04:49AM +, Yangfei (Felix) wrote: >> > > > > On Wed, Oct 08, 2014 at 11:00:24PM +0800, Felix Yang wrote: >> > > > > The enclosed patch for 4.8 & 4.9 branch is a backport of r211885 >> > > > > from >> > trunk. >> > > > > >> > > > > The only change is to use: >> > > > > >> > > > > for (def_rec = DF_INSN_INFO_DEFS (insn_info); *def_rec; >> > > > > def_rec++) >> > > > > >> > > > > other than the new FOR_EACH_INSN_INFO_DEF interface. >> > > > > >> > > > > Bootstrapped on x86_64-SUSE-Linux for both branches. OK to apply? >> > > > >> > > > ChangeLog entry is missing, plus description why do you want to >> > > > backport >> > it. >> > > > If it fixes a bug on the branches, it would be better to have a >> > > > bugzilla PR for that, and definitely a testcase. >> > > > >> > > >> > > Yeah, I will add a ChangeLog entry for this patch when it is committed. >> > > I encountered the same issue when working on my local customized >> > > 4.8/4.9 >> > branches. Not reproduceable with the official 4.8/4.9 branches. >> > > I thinks it's just an enhancement for the loop invariant pass to >> > > make it more >> > versatile. It's better that 4.8/4.9 branches also inlcude this enhancement. >> > > OK? >> > >> > If it is just an enhancement, then those generally are not backported >> > to release branches (exceptions possible of course, but there needs to be a >> strong reason). >> > Each pass has some risk of breaking something, exposing previously >> > only latent bugs in later passes etc. >> > >> > Jakub >> >> We can treat it as bugfix, as we got incorrect code when it triggers. >> It just happens so rarely. Does it worth backporting? > > And the patch fix this bug by making the loop invariant pass more > conservative. > I didn't find a PR or testcase on trunk for this patch either. We at least want a testcase for the "we got incorrect code when it triggers". Richard. >
Re: [patch] Turn 1 lra_assert into 1 gcc_assert
On Thu, Oct 9, 2014 at 11:39 AM, Eric Botcazou wrote: >> OK, I guess I can copy-and-paste reload1.c:spill_failure there. > > This generates the same error message than reload1.c:spill_failure (modulo the > class of the spill register, but it's not mentioned in the asm case either). > > Tested on x86_64-suse-linux, OK for the mainline? Ok. Thanks, Richard. > > 2014-10-09 Eric Botcazou > > * lra-assigns.c (assign_by_spills): Error out on spill failure. > > > -- > Eric Botcazou
Re: [patch] Add -static-libquadmath option
Hi FX, On 9 Oct 2014, at 11:39, FX wrote: > Version 2 of the patch, now handling the darwin case (thanks Iain) and > expressely noting in the documentation the implications on redistribution > (thanks Joseph). > Bootstrapped and regtested on x86_64-apple-darwin14. > > OK to commit? > > I need a C/driver options maintainer, or global reviewer, to OK the C changes > (but they really are obvious). > As Iain suggested the darwin.h change, I consider it pre-approved by him :) But it still needs to be OK'd by either a global reviewer or one of the listed Darwin maintainers ;) ... ... (ccing Mike) Iain > > > >> We have a -static-libgfortran option, but on targets where we support >> quad-prec math through libquadmath, we didn’t have an equivalent >> -static-libquadmath so far. This patch adds it, in what I think is a rather >> straightforward manner. >> >> The only minor complication comes from the fact that previously, linking >> libquadmath was conditionally through libgfortran.spec. So the spec was >> modified: when we use -static-libquadmath, the driver itself includes the >> -lquadmath (surrounded by the necessary static linking directives), so that >> libgfortran.spec shouldn’t do it again. >> >> Bootstrapped and regtested on x86_64 linux. OK to commit? > > >
Re: [PATCH] Add zero-overhead looping for xtensa backend
Hello Sterling, My paper work with the FSF has finished and we can now move forward with this patch :-) I rebased the patch on the latest trunk. Attached please find version 3 of the patch. And the enclosed patch also includes the two points pointed by you, do you like it? Make check regression tested with xtensa-elf-gcc built from trunk with this patch. OK to apply? Index: gcc/ChangeLog === --- gcc/ChangeLog(revision 216036) +++ gcc/ChangeLog(working copy) @@ -1,3 +1,19 @@ +2014-10-09 Felix Yang + +* config/xtensa/xtensa.h (TARGET_LOOPS): New Macro. +* config/xtensa/xtensa.c (xtensa_reorg): New. +(xtensa_reorg_loops): New. +(xtensa_can_use_doloop_p): New. +(xtensa_invalid_within_doloop): New. +(hwloop_optimize): New. +(hwloop_fail): New. +(hwloop_pattern_reg): New. +(xtensa_emit_loop_end): Modified to emit the zero-overhead loop end label. +(xtensa_doloop_hooks): Define. +* config/xtensa/xtensa.md (doloop_end): New. +(zero_cost_loop_start): Rewritten. +(zero_cost_loop_end): Rewritten. + 2014-10-09 Joern Rennecke * config/avr/avr.opt (mmcu=): Change to have a string value. Index: gcc/config/xtensa/xtensa.md === --- gcc/config/xtensa/xtensa.md(revision 216036) +++ gcc/config/xtensa/xtensa.md(working copy) @@ -35,6 +35,8 @@ (UNSPEC_TLS_CALL9) (UNSPEC_TP10) (UNSPEC_MEMW11) + (UNSPEC_LSETUP_START 12) + (UNSPEC_LSETUP_END13) (UNSPECV_SET_FP1) (UNSPECV_ENTRY2) @@ -1289,41 +1291,67 @@ (set_attr "length""3")]) +;; Zero-overhead looping support. + ;; Define the loop insns used by bct optimization to represent the -;; start and end of a zero-overhead loop (in loop.c). This start -;; template generates the loop insn; the end template doesn't generate -;; any instructions since loop end is handled in hardware. +;; start and end of a zero-overhead loop. This start template generates +;; the loop insn; the end template doesn't generate any instructions since +;; loop end is handled in hardware. (define_insn "zero_cost_loop_start" [(set (pc) -(if_then_else (eq (match_operand:SI 0 "register_operand" "a") - (const_int 0)) - (label_ref (match_operand 1 "" "")) - (pc))) - (set (reg:SI 19) -(plus:SI (match_dup 0) (const_int -1)))] +(if_then_else (ne (match_operand:SI 0 "register_operand" "a") + (const_int 1)) + (label_ref (match_operand 1 "" "")) + (pc))) + (set (match_operand:SI 2 "register_operand" "+a0") +(plus (match_dup 2) + (const_int -1))) + (unspec [(const_int 0)] UNSPEC_LSETUP_START)] "" - "loopnez\t%0, %l1" + "loop\t%0, %l1_LEND" [(set_attr "type""jump") (set_attr "mode""none") (set_attr "length""3")]) (define_insn "zero_cost_loop_end" [(set (pc) -(if_then_else (ne (reg:SI 19) (const_int 0)) - (label_ref (match_operand 0 "" "")) - (pc))) - (set (reg:SI 19) -(plus:SI (reg:SI 19) (const_int -1)))] +(if_then_else (ne (match_operand:SI 0 "register_operand" "a") + (const_int 1)) + (label_ref (match_operand 1 "" "")) + (pc))) + (set (match_operand:SI 2 "register_operand" "+a0") +(plus (match_dup 2) + (const_int -1))) + (unspec [(const_int 0)] UNSPEC_LSETUP_END)] "" { -xtensa_emit_loop_end (insn, operands); -return ""; + xtensa_emit_loop_end (insn, operands); + return ""; } [(set_attr "type""jump") (set_attr "mode""none") (set_attr "length""0")]) +; operand 0 is the loop count pseudo register +; operand 1 is the label to jump to at the top of the loop +(define_expand "doloop_end" + [(parallel [(set (pc) (if_then_else + (ne (match_operand:SI 0 "" "") + (const_int 1)) + (label_ref (match_operand 1 "" "")) + (pc))) + (set (match_dup 0) + (plus:SI (match_dup 0) +(const_int -1))) + (unspec [(const_int 0)] UNSPEC_LSETUP_END)])] + "" +{ + /* The loop optimizer doesn't check the predicates... */ + if (GET_MODE (operands[0]) != SImode) +FAIL; +}) + ;; Setting a register from a comparison. Index: gcc/config/xtensa/xtensa.c === --- gcc/config/xtensa/xtensa.c(revision 216036) +++ gcc/config/xtensa/xtensa.c(working copy) @@ -61,6 +61,8 @@ along with GCC; see the file COPYING3. If not see #include "gimplify.h" #include "df.h" #include "builtins.h" +#include "dumpfile.h" +#include "hw-doloop.h" /* Enumeration for all of the rel
Re: 答复: [4.8 & 4.9] Backport of r211885
The issue is that I cannot reproduce it on the official released branches. It happens on my local GCC branch (with a new port). Let's see if the original author of the patch has an testcase. Zhenqiang, do you have one that can reproduce this bug with the official 4.8/4.9 branches? Thanks. Cheers, Felix On Thu, Oct 9, 2014 at 6:41 PM, Richard Biener wrote: > On Thu, Oct 9, 2014 at 11:35 AM, Yangfei (Felix) > wrote: >> >>> > > > On Thu, Oct 09, 2014 at 09:04:49AM +, Yangfei (Felix) wrote: >>> > > > > On Wed, Oct 08, 2014 at 11:00:24PM +0800, Felix Yang wrote: >>> > > > > The enclosed patch for 4.8 & 4.9 branch is a backport of r211885 >>> > > > > from >>> > trunk. >>> > > > > >>> > > > > The only change is to use: >>> > > > > >>> > > > > for (def_rec = DF_INSN_INFO_DEFS (insn_info); *def_rec; >>> > > > > def_rec++) >>> > > > > >>> > > > > other than the new FOR_EACH_INSN_INFO_DEF interface. >>> > > > > >>> > > > > Bootstrapped on x86_64-SUSE-Linux for both branches. OK to apply? >>> > > > >>> > > > ChangeLog entry is missing, plus description why do you want to >>> > > > backport >>> > it. >>> > > > If it fixes a bug on the branches, it would be better to have a >>> > > > bugzilla PR for that, and definitely a testcase. >>> > > > >>> > > >>> > > Yeah, I will add a ChangeLog entry for this patch when it is committed. >>> > > I encountered the same issue when working on my local customized >>> > > 4.8/4.9 >>> > branches. Not reproduceable with the official 4.8/4.9 branches. >>> > > I thinks it's just an enhancement for the loop invariant pass to >>> > > make it more >>> > versatile. It's better that 4.8/4.9 branches also inlcude this >>> > enhancement. >>> > > OK? >>> > >>> > If it is just an enhancement, then those generally are not backported >>> > to release branches (exceptions possible of course, but there needs to be >>> > a >>> strong reason). >>> > Each pass has some risk of breaking something, exposing previously >>> > only latent bugs in later passes etc. >>> > >>> > Jakub >>> >>> We can treat it as bugfix, as we got incorrect code when it triggers. >>> It just happens so rarely. Does it worth backporting? >> >> And the patch fix this bug by making the loop invariant pass more >> conservative. >> I didn't find a PR or testcase on trunk for this patch either. > > We at least want a testcase for the "we got incorrect code when it triggers". > > Richard. > >>
[PATCH i386 AVX512] [68/n] Add vpmullw, vpacksdw, pmaddwd insn patterns.
Hello, This patch extends vpmullw, vpacksdw and pmaddwd insn patterns. Bootstrapped. AVX-512* tests on top of patch-set all pass under simulator. Is it ok for trunk? gcc/ * config/i386/sse.md (define_c_enum "unspec"): Add UNSPEC_PMADDWD512. (define_mode_iterator VI2_AVX2): Add V32HI mode. (define_expand "mul3"): Add masking. (define_insn "*mul3"): Ditto. (define_expand "mul3_highpart"): Ditto. (define_insn "*mul3_highpart"): Ditto. (define_insn "avx512bw_pmaddwd512"): New. (define_mode_attr SDOT_PMADD_SUF): Ditto. (define_expand "sdot_prod"): Add . (define_insn "_packssdw"): Add masking. (define_insn "*_pmulhrsw3"): Ditto. (define_insn "avx2_packusdw"): Delete. (define_insn "sse4_1_packusdw"): Ditto. (define_insn "_packusdw"): New. -- Thanks, K diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index f88d3d0..90414c7 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -132,6 +132,7 @@ ;; For AVX512BW support UNSPEC_DBPSADBW UNSPEC_PMADDUBSW512 + UNSPEC_PMADDWD512 UNSPEC_PSHUFHW UNSPEC_PSHUFLW UNSPEC_CVTINT2MASK @@ -301,7 +302,7 @@ [(V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX2") V16QI]) (define_mode_iterator VI2_AVX2 - [(V16HI "TARGET_AVX2") V8HI]) + [(V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX2") V8HI]) (define_mode_iterator VI2_AVX512F [(V32HI "TARGET_AVX512F") (V16HI "TARGET_AVX2") V8HI]) @@ -9248,28 +9249,30 @@ DONE; }) -(define_expand "mul3" +(define_expand "mul3" [(set (match_operand:VI2_AVX2 0 "register_operand") (mult:VI2_AVX2 (match_operand:VI2_AVX2 1 "nonimmediate_operand") (match_operand:VI2_AVX2 2 "nonimmediate_operand")))] - "TARGET_SSE2" + "TARGET_SSE2 && && " "ix86_fixup_binary_operands_no_copy (MULT, mode, operands);") -(define_insn "*mul3" - [(set (match_operand:VI2_AVX2 0 "register_operand" "=x,x") -(mult:VI2_AVX2 (match_operand:VI2_AVX2 1 "nonimmediate_operand" "%0,v") - (match_operand:VI2_AVX2 2 "nonimmediate_operand" "xm,vm")))] - "TARGET_SSE2 && ix86_binary_operator_ok (MULT, mode, operands)" +(define_insn "*mul3" + [(set (match_operand:VI2_AVX2 0 "register_operand" "=x,v") + (mult:VI2_AVX2 (match_operand:VI2_AVX2 1 "nonimmediate_operand" "%0,v") + (match_operand:VI2_AVX2 2 "nonimmediate_operand" "xm,vm")))] + "TARGET_SSE2 + && ix86_binary_operator_ok (MULT, mode, operands) + && && " "@ pmullw\t{%2, %0|%0, %2} - vpmullw\t{%2, %1, %0|%0, %1, %2}" + vpmullw\t{%2, %1, %0|%0, %1, %2}" [(set_attr "isa" "noavx,avx") (set_attr "type" "sseimul") (set_attr "prefix_data16" "1,*") (set_attr "prefix" "orig,vex") (set_attr "mode" "")]) -(define_expand "mul3_highpart" +(define_expand "mul3_highpart" [(set (match_operand:VI2_AVX2 0 "register_operand") (truncate:VI2_AVX2 (lshiftrt: @@ -9279,23 +9282,26 @@ (any_extend: (match_operand:VI2_AVX2 2 "nonimmediate_operand"))) (const_int 16] - "TARGET_SSE2" + "TARGET_SSE2 + && && " "ix86_fixup_binary_operands_no_copy (MULT, mode, operands);") -(define_insn "*mul3_highpart" - [(set (match_operand:VI2_AVX2 0 "register_operand" "=x,x") +(define_insn "*mul3_highpart" + [(set (match_operand:VI2_AVX2 0 "register_operand" "=x,v") (truncate:VI2_AVX2 (lshiftrt: (mult: (any_extend: - (match_operand:VI2_AVX2 1 "nonimmediate_operand" "%0,x")) + (match_operand:VI2_AVX2 1 "nonimmediate_operand" "%0,v")) (any_extend: - (match_operand:VI2_AVX2 2 "nonimmediate_operand" "xm,xm"))) + (match_operand:VI2_AVX2 2 "nonimmediate_operand" "xm,vm"))) (const_int 16] - "TARGET_SSE2 && ix86_binary_operator_ok (MULT, mode, operands)" + "TARGET_SSE2 + && ix86_binary_operator_ok (MULT, mode, operands) + && && " "@ pmulhw\t{%2, %0|%0, %2} - vpmulhw\t{%2, %1, %0|%0, %1, %2}" + vpmulhw\t{%2, %1, %0|%0, %1, %2}" [(set_attr "isa" "noavx,avx") (set_attr "type" "sseimul") (set_attr "prefix_data16" "1,*") @@ -9538,6 +9544,18 @@ (set_attr "prefix" "orig,vex") (set_attr "mode" "TI")]) +(define_insn "avx512bw_pmaddwd512" + [(set (match_operand: 0 "register_operand" "=v") + (unspec: +[(match_operand:VI2_AVX2 1 "register_operand" "v") + (match_operand:VI2_AVX2 2 "nonimmediate_operand" "vm")] + UNSPEC_PMADDWD512))] + "TARGET_AVX512BW && " + "vpmaddwd\t{%2, %1, %0|%0, %1, %2}"; + [(set_attr "type" "sseiadd") + (set_attr "prefix" "evex") + (set_attr "mode" "XI")]) + (define_expand "avx2_pmaddwd" [(set (match_operand:V8SI 0 "register_operand") (plus:V8SI @@ -9778,6 +9796,9 @@ DONE; }) +(define_mode_attr SDOT_PMADD_SUF + [(V32HI "512v32hi") (V16HI "") (V8HI "")]) +
Re: [PATCH i386 AVX512] [68/n] Add vpmullw, vpacksdw, pmaddwd insn patterns.
On 09 Oct 15:07, Kirill Yukhin wrote: > +(define_insn "*mul3" > + [(set (match_operand:VI2_AVX2 0 "register_operand" "=x,v") > + (mult:VI2_AVX2 (match_operand:VI2_AVX2 1 "nonimmediate_operand" "%0,v") > +(match_operand:VI2_AVX2 2 "nonimmediate_operand" > "xm,vm")))] > + "TARGET_SSE2 > + && ix86_binary_operator_ok (MULT, mode, operands) > + && && " Just noticed, that need to swap target check with operads check. -- Thanks, K
[PATCH i386 AVX512] [69/n] Add vpmulhrsw insn support.
Hello, This patch adds support for vpmulhrsw insn. Bootstrapped. AVX-512* tests on top of patch-set all pass under simulator. Is it ok for trunk? gcc/ * config/i386/sse.md (define_insn "avx512bw_umulhrswv32hi3"): New. (define_expand "_pmulhrsw3_mask"): Ditto. -- Thanks, K diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 90414c7..25bf3d8 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -13439,6 +13439,41 @@ (set_attr "prefix" "evex") (set_attr "mode" "XI")]) +(define_insn "avx512bw_umulhrswv32hi3" + [(set (match_operand:V32HI 0 "register_operand" "=v") + (truncate:V32HI + (lshiftrt:V32SI + (plus:V32SI + (lshiftrt:V32SI + (mult:V32SI + (sign_extend:V32SI + (match_operand:V32HI 1 "nonimmediate_operand" "%v")) + (sign_extend:V32SI + (match_operand:V32HI 2 "nonimmediate_operand" "vm"))) + (const_int 14)) + (const_vector:V32HI [(const_int 1) (const_int 1) + (const_int 1) (const_int 1) + (const_int 1) (const_int 1) + (const_int 1) (const_int 1) + (const_int 1) (const_int 1) + (const_int 1) (const_int 1) + (const_int 1) (const_int 1) + (const_int 1) (const_int 1) + (const_int 1) (const_int 1) + (const_int 1) (const_int 1) + (const_int 1) (const_int 1) + (const_int 1) (const_int 1) + (const_int 1) (const_int 1) + (const_int 1) (const_int 1) + (const_int 1) (const_int 1) + (const_int 1) (const_int 1)])) + (const_int 1] + "TARGET_AVX512BW" + "vpmulhrsw\t{%2, %1, %0|%0, %1, %2}" + [(set_attr "type" "sseimul") + (set_attr "prefix" "evex") + (set_attr "mode" "XI")]) + (define_insn "ssse3_pmaddubsw128" [(set (match_operand:V8HI 0 "register_operand" "=x,x") (ss_plus:V8HI @@ -13516,6 +13551,29 @@ (define_mode_iterator PMULHRSW [V4HI V8HI (V16HI "TARGET_AVX2")]) +(define_expand "_pmulhrsw3_mask" + [(set (match_operand:PMULHRSW 0 "register_operand") + (vec_merge:PMULHRSW + (truncate:PMULHRSW + (lshiftrt: + (plus: + (lshiftrt: + (mult: + (sign_extend: + (match_operand:PMULHRSW 1 "nonimmediate_operand")) + (sign_extend: + (match_operand:PMULHRSW 2 "nonimmediate_operand"))) + (const_int 14)) + (match_dup 5)) + (const_int 1))) + (match_operand:PMULHRSW 3 "register_operand") + (match_operand: 4 "register_operand")))] + "TARGET_AVX512BW && TARGET_AVX512VL" +{ + operands[5] = CONST1_RTX(mode); + ix86_fixup_binary_operands_no_copy (MULT, mode, operands); +}) + (define_expand "_pmulhrsw3" [(set (match_operand:PMULHRSW 0 "register_operand") (truncate:PMULHRSW
Re: [PATCH RFC]Pair load store instructions using a generic scheduling fusion pass
On Oct 7, 2014, at 10:28 PM, Jeff Law wrote: > It's certainly advantageous for sched2 to split insns that generate multiple > instructions. So, on my port, I have a load multiple that is just one instruction, and it is a single clock cycle (to enque it).
Re: [PATCH, ARM] attribute target (thumb,arm)
On 10/08/2014 06:56 PM, Ramana Radhakrishnan wrote: > Hi Christian, << snipped agreed stuf >> > 3) about inlining >I dislike inlining different modes, From a conceptual use, a user > might want to switch mode only when changing a function's hotness. > Usually inlining a cold function into a hot one is not what the user > explicitly required when setting different mode attributes for them, > __attribute__((thumb)) should not imply coldness or hotness. Inlining > between cold and hot functions should be done based on profile feedback. > The choice of compiling in "Thumb1" state for coldness is a separate one > because that's where the choice needs to be made. Ideally yes. but I think that a user motivation to use target attribute (("thumb") is to reduce code size even in the cases where PFO is not available (libraries, kernel or user application build spec packaged without profile data). And there are cases where static probabilities are not enough and that a user wants it own control with gprof or oprofile. But in this case, we could point to the __attribute__ (("cold")) on the function ? That would probably be the best workaround to propose if we recommend this But here is another scenario: Using of attribute (("arm")) for exception entry points is indeed not related to hotness. But consider a typical thumb binary with an entry point in arm compiled in C (ex handler, a kernel...). Today due to the file boundary the thumb part is not inlined into the arm point. (Using -flto is not possible because the whole gimple would be thumb). Now, using attribute (("target")) for the functions others than the entry point, with your approach they would all be inlined (assuming the cost allow this) and we would end up with a arm binary instead of a thumb binary... But there are still 3 points : - At least 2 other target (i386, Powerpc) that support attribute_target disable inlining between modes that are not subsets. I like to think about homogeneity between targets and I find odd to have different inlining rules... - Scanning the function body to check for ASM_INPUT does not look very elegant (if this matters) because the asm could well be unrelated The only case when it will always be a win to inline thumb into arm is when the cost of the inlined body is less than a BX instruction (but still, with branch prediction this cost is pondered). > >> The compiler would take a decision that is not what the user wrote. And >> in addition if you consider the few instructions to modify R15 to switch >> state that would end up with more code executed in the critical path, >> voiding a possible size of speed gain. > I do not expect there to be any additional instructions needed to switch > state. If function x is inlined into function y the state would be lost > and the state would be in terms of the state of function x. Yes, indeed. I was in a LCM/mode-switching thinking mode when writing this. In this case the mode is inherited. > Obviously if the user doesn't want inlining - the user would add > attributes to disable inlining. You do have extensions such as > __attribute__((noinline)) and __attribute__((never_inline)) to give the > user that control and those bits need to be used in addition. Those attributes are overkill. They would disable inlining between caller-callee of a same mode. This is not what we want > > The attribute then purely reflects then the output instruction state of > the function if a copy of it's body is laid out separately in the output. > > IMHO, the heuristics for inlining should be the best judge of when > functions should be inlined between one and another and we shouldn't be > second guessing that in the backend > > If there is a copy of the function to be put out by the compiler, only > then should we choose this based on the state of the "target" i.e. arm > or thumb. > Yes, So to summarize, we can: 1) don't inline between different modes. Same behavior with other targets. Solves asm case 2) always inline unless the function contains asm statements. ( I reject adding a new compilation switch) 3) always inline. But recommend the use of attribute ((noinline)) to handle asm or attribute ((cold,hot)) in the absence of profile datas I obviously prefer 1) safe and homogenous, 3) is the worse as it requires additional user action (poor user). 2) is less worse. Thanks for supporting me ::) Christian
[PATCH i386 AVX512] [70/n]
Hello, This patch further extends maxmin patterns. Bootstrapped. AVX-512* tests on top of patch-set all pass under simulator. Is it ok for trunk? gcc/ * config/i386/sse.md (define_insn "*3_finite"): Fix pattern conditions order. (define_insn "*sse4_1_3"): Add masking. (define_insn "*sse4_1_3"0: Ditto. -- Thanks, K diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 25bf3d8..a833cd9 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -1898,8 +1898,8 @@ (match_operand:VF 1 "" "%0,v") (match_operand:VF 2 "" "xm,")))] "TARGET_SSE && flag_finite_math_only - && ix86_binary_operator_ok (, mode, operands) - && && " + && && + && ix86_binary_operator_ok (, mode, operands)" "@ \t{%2, %0|%0, %2} v\t{%2, %1, %0|%0, %1, %2}" @@ -10209,15 +10209,17 @@ } }) -(define_insn "*sse4_1_3" - [(set (match_operand:VI14_128 0 "register_operand" "=x,x") +(define_insn "*sse4_1_3" + [(set (match_operand:VI14_128 0 "register_operand" "=x,v") (smaxmin:VI14_128 - (match_operand:VI14_128 1 "nonimmediate_operand" "%0,x") - (match_operand:VI14_128 2 "nonimmediate_operand" "xm,xm")))] - "TARGET_SSE4_1 && ix86_binary_operator_ok (, mode, operands)" + (match_operand:VI14_128 1 "nonimmediate_operand" "%0,v") + (match_operand:VI14_128 2 "nonimmediate_operand" "xm,vm")))] + "TARGET_SSE4_1 + && + && ix86_binary_operator_ok (, mode, operands)" "@ p\t{%2, %0|%0, %2} - vp\t{%2, %1, %0|%0, %1, %2}" + vp\t{%2, %1, %0|%0, %1, %2}" [(set_attr "isa" "noavx,avx") (set_attr "type" "sseiadd") (set_attr "prefix_extra" "1,*") @@ -10290,15 +10292,17 @@ } }) -(define_insn "*sse4_1_3" - [(set (match_operand:VI24_128 0 "register_operand" "=x,x") +(define_insn "*sse4_1_3" + [(set (match_operand:VI24_128 0 "register_operand" "=x,v") (umaxmin:VI24_128 - (match_operand:VI24_128 1 "nonimmediate_operand" "%0,x") - (match_operand:VI24_128 2 "nonimmediate_operand" "xm,xm")))] - "TARGET_SSE4_1 && ix86_binary_operator_ok (, mode, operands)" + (match_operand:VI24_128 1 "nonimmediate_operand" "%0,v") + (match_operand:VI24_128 2 "nonimmediate_operand" "xm,vm")))] + "TARGET_SSE4_1 + && + && ix86_binary_operator_ok (, mode, operands)" "@ p\t{%2, %0|%0, %2} - vp\t{%2, %1, %0|%0, %1, %2}" + vp\t{%2, %1, %0|%0, %1, %2}" [(set_attr "isa" "noavx,avx") (set_attr "type" "sseiadd") (set_attr "prefix_extra" "1,*")
Re: [i386] Replace builtins with vector extensions
On Thu, Oct 9, 2014 at 12:33 PM, Marc Glisse wrote: > Ping https://gcc.gnu.org/ml/gcc-patches/2014-07/msg01812.html > > (another part of the discussion is around > https://gcc.gnu.org/ml/gcc-patches/2014-06/msg02288.html ) > > Most people who commented seem cautiously in favor. The least favorable was > Ulrich who suggested to go with it but keep the old behavior accessible if > the user defines some macro (which imho would lose a large part of the > simplification benefits of the patch) > https://gcc.gnu.org/ml/gcc-patches/2014-06/msg02328.html > > If this is accepted, I will gladly prepare patches removing the unused > builtins and extending this to a few more operations (integer vectors in > particular). If this is not the direction we want to go, I'd like to hear it > clearly so I can move on... Well, I'm undecided. The current approach is proven to work OK, there is no bugs reported in this area and the performance is apparently OK. There should be clear benefits in order to change something that "ain't broken", and at least some proof that we won't regress in this area with the new approach. On the other hand, if the new approach opens new optimization opportunities (without regression!), I'm in favor of it, including the fact that new code won't produce equivalent assembly - as long as functionality of the optimized asm stays the same (obviously, I'd say). Please also note that this is quite big project. There are plenty of intrinsics and I for one don't want another partial transition ... TL/DR: If there are benefits, no regressions and you think you'll finish the transition, let's go for it. Uros.
[PATCH i386 AVX512] [71/n] [Obvious?] Remove redudant iterator attribute.
Hello, This obvious patch removes redundant iterator attribute Bootstrapped. Is it ok for trunk? gcc/ * config/i386/sse.md (define_mode_attr avx2_avx512f): Remove. -- Thanks, K diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index a833cd9..cf415c3 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -451,12 +451,6 @@ (V4SI "vec") (V8SI "avx2") (V2DI "vec") (V4DI "avx2")]) -(define_mode_attr avx2_avx512f - [(V4SI "avx2") (V8SI "avx2") (V16SI "avx512f") - (V2DI "avx2") (V4DI "avx2") (V8DI "avx512f") - (V8SF "avx2") (V16SF "avx512f") - (V4DF "avx2") (V8DF "avx512f")]) - (define_mode_attr avx2_avx512 [(V4SI "avx2") (V8SI "avx2") (V16SI "avx512f") (V2DI "avx2") (V4DI "avx2") (V8DI "avx512f")
Re: [PATCH] Don't call fatal_error before error reporting has been initialized.
Ping. On 29 Sep 18:02, Ilya Tocar wrote: > Hi, > > Currently if call to atexit (lto_wrapper_cleanup) fails we > won't report error as we haven't initialized error-reporting > infrastructure. This patch moves this call after diagnostic_initialize. > I hope that we can't exit inside diagnostic_initialize. Otherwise we > won't cleanup after it. > Ok for trunk? > > 2014-09-29 Ilya Tocar > > * lto-wrapper.c (main): Don't call fatal_error before > diagnostic_initialize. > --- > gcc/lto-wrapper.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c > index 08fd090..39e13b8 100644 > --- a/gcc/lto-wrapper.c > +++ b/gcc/lto-wrapper.c > @@ -870,13 +870,13 @@ main (int argc, char *argv[]) > >xmalloc_set_program_name (progname); > > - if (atexit (lto_wrapper_cleanup) != 0) > -fatal_error ("atexit failed"); > - >gcc_init_libintl (); > >diagnostic_initialize (global_dc, 0); > > + if (atexit (lto_wrapper_cleanup) != 0) > +fatal_error ("atexit failed"); > + >if (signal (SIGINT, SIG_IGN) != SIG_IGN) > signal (SIGINT, fatal_signal); > #ifdef SIGHUP > -- > 1.8.3.1 >
[PATCH i386 AVX512] [72/n] Extend VI itterator.
Hello, This patch extends VI mode iterator. Bootstrapped. AVX-512* tests on top of patch-set all pass under simulator. Is it ok for trunk? gcc/ * config/i386/i386.c (ix86_expand_vector_logical_operator): Handle V16SF and V8DF modes. * config/i386/sse.md (define_mode_iterator VI): Add V64QI and V32HI modes. -- Thanks, K diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index d759a45..257e12b 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -17502,8 +17502,10 @@ ix86_expand_vector_logical_operator (enum rtx_code code, enum machine_mode mode, { case V4SFmode: case V8SFmode: + case V16SFmode: case V2DFmode: case V4DFmode: + case V8DFmode: dst = gen_reg_rtx (GET_MODE (SUBREG_REG (op1))); if (GET_CODE (op2) == CONST_VECTOR) { diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index cf415c3..852cb30 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -265,8 +265,8 @@ ;; All vector integer modes (define_mode_iterator VI [(V16SI "TARGET_AVX512F") (V8DI "TARGET_AVX512F") - (V32QI "TARGET_AVX") V16QI - (V16HI "TARGET_AVX") V8HI + (V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX") V16QI + (V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX") V8HI (V8SI "TARGET_AVX") V4SI (V4DI "TARGET_AVX") V2DI])
[PATCH i386 AVX512] [73/n] Extend reduc min/max autogen.
Hello, This patch extends pattern for reducation maxmin autogen. Bootstrapped. AVX-512* tests on top of patch-set all pass under simulator. Is it ok for trunk? gcc/ * config/i386/sse.md (define_mode_iterator REDUC_SMINMAX_MODE): Add V64QI and V32HI modes. -- Thanks, K diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 852cb30..d4adc33 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -2316,7 +2316,8 @@ [(V32QI "TARGET_AVX2") (V16HI "TARGET_AVX2") (V8SI "TARGET_AVX2") (V4DI "TARGET_AVX2") (V8SF "TARGET_AVX") (V4DF "TARGET_AVX") - (V4SF "TARGET_SSE") (V16SI "TARGET_AVX512F") + (V4SF "TARGET_SSE") (V64QI "TARGET_AVX512BW") + (V32HI "TARGET_AVX512BW") (V16SI "TARGET_AVX512F") (V8DI "TARGET_AVX512F") (V16SF "TARGET_AVX512F") (V8DF "TARGET_AVX512F")])
Re: [patch] Add -static-libquadmath option
> But it still needs to be OK'd by either a global reviewer or one of the > listed Darwin maintainers ;) ... > ... (ccing Mike) Duh me. I thought you were a darwin maintainer. Sorry. FX
[PATCH i386 AVX512] [74/n] Add byte/word max/mix reduction.
Hello, Bootstrapped. AVX-512* tests on top of patch-set all pass under simulator. Is it ok for trunk? gcc/ * config/i386/i386.c (emit_reduc_half): Handle V64QI and V32HI mode. * config/i386/sse.md (define_mode_iterator VI_AVX512BW): New. (define_expand "reduc__"): Use VI512_48F_12BW. -- Thanks, K diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 257e12b..e79210c 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -41246,6 +41246,8 @@ emit_reduc_half (rtx dest, rtx src, int i) GEN_INT (i / 2)); } break; +case V64QImode: +case V32HImode: case V16SImode: case V16SFmode: case V8DImode: diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index d4adc33..106fa00 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -509,6 +509,8 @@ (define_mode_iterator VI48_256 [V8SI V4DI]) (define_mode_iterator VI48_512 [V16SI V8DI]) (define_mode_iterator VI4_256_8_512 [V8SI V8DI]) +(define_mode_iterator VI_AVX512BW + [V16SI V8DI (V32HI "TARGET_AVX512BW") (V64QI "TARGET_AVX512BW")]) ;; Int-float size matches (define_mode_iterator VI4F_128 [V4SI V4SF]) @@ -2332,9 +2334,9 @@ }) (define_expand "reduc__" - [(umaxmin:VI48_512 - (match_operand:VI48_512 0 "register_operand") - (match_operand:VI48_512 1 "register_operand"))] + [(umaxmin:VI_AVX512BW + (match_operand:VI_AVX512BW 0 "register_operand") + (match_operand:VI_AVX512BW 1 "register_operand"))] "TARGET_AVX512F" { ix86_expand_reduc (gen_3, operands[0], operands[1]);
Re: [PATCH 4/n] OpenMP 4.0 offloading infrastructure: lto-wrapper
On 08 Oct 12:26, Jakub Jelinek wrote: > On Thu, Oct 02, 2014 at 07:14:57PM +0400, Ilya Verbin wrote: > > @@ -1296,6 +1297,9 @@ static const char *const standard_startfile_prefix_2 > > relative to the driver. */ > > static const char *const tooldir_base_prefix = TOOLDIR_BASE_PREFIX; > > > > +/* A prefix to be used when this is an accelerator compiler. */ > > +static const char *const accel_dir_suffix = ACCEL_DIR_SUFFIX; > > Is ACCEL_DIR_SUFFIX here "" or something starting with "/ ? > > The reason I'm asking is that otherwise it seems gcc.c heavily uses > dir_separator_str for the separators. Don't have any experience with > targets where DIR_SEPARATOR is not / though, perhaps it is a non-issue. It is "" for the host compiler and starts with "/" for the accel compiler. Looking at gcc/configure.ac, there are a lot of paths with slashes. So, I think it's ok to define ACCEL_DIR_SUFFIX in such a way. > > +#ifndef ACCEL_COMPILER > >/* We need to check standard_exec_prefix/just_machine_suffix/specs > > for any override of as, ld and libraries. */ > >specs_file = (char *) alloca (strlen (standard_exec_prefix) > >+ strlen (just_machine_suffix) + sizeof ("specs")); > > - > >strcpy (specs_file, standard_exec_prefix); > >strcat (specs_file, just_machine_suffix); > >strcat (specs_file, "specs"); > >if (access (specs_file, R_OK) == 0) > > read_specs (specs_file, true, false); > > +#endif > > Why do you want to disable specs reading for the accel compiler? > Then users won't have the possibility to override defaults etc. easily... Bernd, Do you need this ifndef for PTX? Or I could remove it? > > @@ -7097,6 +7103,16 @@ main (int argc, char **argv) > >obstack_grow (&collect_obstack, argv[0], strlen (argv[0]) + 1); > >xputenv (XOBFINISH (&collect_obstack, char *)); > > > > + if (strlen (OFFLOAD_TARGETS) > 0) > > +{ > > + obstack_init (&collect_obstack); > > + obstack_grow (&collect_obstack, "OFFLOAD_TARGET_NAMES=", > > + sizeof ("OFFLOAD_TARGET_NAMES=") - 1); > > + obstack_grow (&collect_obstack, OFFLOAD_TARGETS, > > + strlen (OFFLOAD_TARGETS) + 1); > > + xputenv (XOBFINISH (&collect_obstack, char *)); > > +} > > + > > I'm surprised to see the obstack_init call here, but looking at > gcc.c, I'm surprised to see it in more places. > > I've always thought that obstack_init was something you invoke generally > once on a given obstack object, then work with the obstack and then > obstack_free (..., NULL) it at the end. Now, if it wants the obstack to be > live until exit, it just would not obstack_free it. But calling > obstack_init on the already initialized obstack results IMHO in memory > leaks. It should be initialized just once somewhere. Right, I removed obstack_init from this patch. > > +#define OFFLOAD_TARGET_NAMES_ENV "OFFLOAD_TARGET_NAMES" > > Missing comment about the env var and what it is for. Fixed. > > + char *suffix > > += XALLOCAVEC (char, strlen ("/accel//mkoffload") + strlen (target) + > > 1); > > Use sizeof ("/accel//mkoffload") + strlen (target) instead? Done. > > + out: > > The goto is probably unnecessary here, indenting all the lines by > 4 more spaces and using if (compiler) { ... } instead doesn't need > any further warpping. Done. > > + /* By default linker does not discard .gnu.offload_lto_* sections. > > */ > > + const char *linker_script = make_temp_file ("_linker_script.x"); > > + FILE *stream = fopen (linker_script, "w"); > > + if (!stream) > > + fatal_error ("fopen %s: %m", linker_script); > > + fprintf (stream, "SECTIONS { /DISCARD/ : { *(" > > + OFFLOAD_SECTION_NAME_PREFIX "*) } }\n"); > > + fclose (stream); > > + printf ("%s\n", linker_script); > > + > > + goto finish; > > +} > > Does this work with gold? Are there any other linkers that support plugins, > but don't support linker scripts this way? Oops, gold does not support scripts, outputted from plugins :( "error: SECTIONS seen after other input files; try -T/--script" Probably, we should update default linker scripts in binutils? But without latest ld/gold all binaries compiled without -flto and with offloading will contain intermediate bytecode... Thanks, -- Ilya
Re: [PATCH 4/n] OpenMP 4.0 offloading infrastructure: lto-wrapper
On 10/09/2014 02:07 PM, Ilya Verbin wrote: +#ifndef ACCEL_COMPILER /* We need to check standard_exec_prefix/just_machine_suffix/specs for any override of as, ld and libraries. */ specs_file = (char *) alloca (strlen (standard_exec_prefix) + strlen (just_machine_suffix) + sizeof ("specs")); - strcpy (specs_file, standard_exec_prefix); strcat (specs_file, just_machine_suffix); strcat (specs_file, "specs"); if (access (specs_file, R_OK) == 0) read_specs (specs_file, true, false); +#endif Why do you want to disable specs reading for the accel compiler? Then users won't have the possibility to override defaults etc. easily... Bernd, Do you need this ifndef for PTX? Or I could remove it? I suspect the paths aren't right. If that can be fixed it should be fine to remove the ifdef. Bernd
[PATCH i386 AVX512] [75/n] Update vec_init.
Hello, This patch extends vec_init-related routines/patterns. Bootstrapped. AVX-512* tests on top of patch-set all pass under simulator. Is it ok for trunk? gcc/ * config/i386/i386.c (ix86_expand_vector_init_duplicate): Handle V64QI and V32HI modes, update V8HI, V16QI, V32QI modes handling. (ix86_expand_vector_init_general): Handle V64QI and V32HI modes. * config/i386/sse.md (define_mode_iterator VI48F_512): Rename to ... (define_mode_iterator VI48F_I12_AVX512BW): ... this. Extend to AVX-512BW modes. (define_expand "vec_init"): Use VI48F_I12_AVX512BW. -- Thanks, K diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index e79210c..7c34431 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -39791,6 +39791,8 @@ ix86_expand_vector_init_duplicate (bool mmx_ok, enum machine_mode mode, case V8SFmode: case V8SImode: case V2DFmode: +case V64QImode: +case V32HImode: case V2DImode: case V4SFmode: case V4SImode: @@ -39821,6 +39823,9 @@ ix86_expand_vector_init_duplicate (bool mmx_ok, enum machine_mode mode, goto widen; case V8HImode: + if (TARGET_AVX512VL) +return ix86_vector_duplicate_value (mode, target, val); + if (TARGET_SSE2) { struct expand_vec_perm_d dperm; @@ -39851,6 +39856,9 @@ ix86_expand_vector_init_duplicate (bool mmx_ok, enum machine_mode mode, goto widen; case V16QImode: + if (TARGET_AVX512VL) +return ix86_vector_duplicate_value (mode, target, val); + if (TARGET_SSE2) goto permute; goto widen; @@ -39880,16 +39888,19 @@ ix86_expand_vector_init_duplicate (bool mmx_ok, enum machine_mode mode, case V16HImode: case V32QImode: - { - enum machine_mode hvmode = (mode == V16HImode ? V8HImode : V16QImode); - rtx x = gen_reg_rtx (hvmode); + if (TARGET_AVX512VL) +return ix86_vector_duplicate_value (mode, target, val); + else + { + enum machine_mode hvmode = (mode == V16HImode ? V8HImode : V16QImode); + rtx x = gen_reg_rtx (hvmode); - ok = ix86_expand_vector_init_duplicate (false, hvmode, x, val); - gcc_assert (ok); + ok = ix86_expand_vector_init_duplicate (false, hvmode, x, val); + gcc_assert (ok); - x = gen_rtx_VEC_CONCAT (mode, x, x); - emit_insn (gen_rtx_SET (VOIDmode, target, x)); - } + x = gen_rtx_VEC_CONCAT (mode, x, x); + emit_insn (gen_rtx_SET (VOIDmode, target, x)); + } return true; default: @@ -40451,8 +40462,9 @@ static void ix86_expand_vector_init_general (bool mmx_ok, enum machine_mode mode, rtx target, rtx vals) { - rtx ops[64], op0, op1; + rtx ops[64], op0, op1, op2, op3, op4, op5; enum machine_mode half_mode = VOIDmode; + enum machine_mode quarter_mode = VOIDmode; int n, i; switch (mode) @@ -40503,6 +40515,42 @@ half: gen_rtx_VEC_CONCAT (mode, op0, op1))); return; +case V64QImode: + quarter_mode = V16QImode; + half_mode = V32QImode; + goto quarter; + +case V32HImode: + quarter_mode = V8HImode; + half_mode = V16HImode; + goto quarter; + +quarter: + n = GET_MODE_NUNITS (mode); + for (i = 0; i < n; i++) + ops[i] = XVECEXP (vals, 0, i); + op0 = gen_reg_rtx (quarter_mode); + op1 = gen_reg_rtx (quarter_mode); + op2 = gen_reg_rtx (quarter_mode); + op3 = gen_reg_rtx (quarter_mode); + op4 = gen_reg_rtx (half_mode); + op5 = gen_reg_rtx (half_mode); + ix86_expand_vector_init_interleave (quarter_mode, op0, ops, + n >> 3); + ix86_expand_vector_init_interleave (quarter_mode, op1, + &ops [n >> 2], n >> 3); + ix86_expand_vector_init_interleave (quarter_mode, op2, + &ops [n >> 1], n >> 3); + ix86_expand_vector_init_interleave (quarter_mode, op3, + &ops [(n >> 1) | (n >> 2)], n >> 3); + emit_insn (gen_rtx_SET (VOIDmode, op4, + gen_rtx_VEC_CONCAT (half_mode, op0, op1))); + emit_insn (gen_rtx_SET (VOIDmode, op5, + gen_rtx_VEC_CONCAT (half_mode, op2, op3))); + emit_insn (gen_rtx_SET (VOIDmode, target, + gen_rtx_VEC_CONCAT (mode, op4, op5))); + return; + case V16QImode: if (!TARGET_SSE4_1) break; diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 106fa00..a60b0d1 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -524,7 +524,9 @@ (V16SI "TARGET_AVX512F") (V16SF "TARGET_AVX512F") (V8DI "TARGET_AVX512F") (V8DF "TARGET_AVX512F") (V4DI "TARGET_AVX512VL") (V4DF "TARGET_AVX512VL")]) -(define_mode_iterator VI48F_512 [V16SI V
Re: [PATCH i386 AVX512] [63.1/n] Add vpshufb, perm autogen (except for v64qi).
Hi, I think this patch should be split in 2 parts: V64QI related and non-V64QI related. This part contains non-V64QI related changes. Also I've noticed, that not all patterns using VI1_AVX2, actually have AVX512 versions, so fixed bogus patterns. On 06 Oct 16:10, Jakub Jelinek wrote: > On Mon, Oct 06, 2014 at 04:55:28PM +0400, Kirill Yukhin wrote: > > --- a/gcc/config/i386/i386.c > > +++ b/gcc/config/i386/i386.c > > @@ -21364,20 +21364,113 @@ ix86_expand_vec_perm_vpermi2 (rtx target, rtx > > op0, rtx mask, rtx op1) > >enum machine_mode mode = GET_MODE (op0); > >switch (mode) > > { > > + /* There is no byte version of vpermi2. So we use vpermi2w. */ > > +case V64QImode: ... > > I believe this case doesn't belong to this function, other than this > case ix86_expand_vec_perm_vpermi2 emits always just a single insn, and > so it should always do that, and there should be a separate function > that expands the worst case of V64QImode full 2 operand permutation. > See my previous mail, IMHO it is doable with 5 instructions rather than 7. > And IMHO we should have a separate function which emits that, supposedly > one for the constant permutations, one for the variable case (perhaps > then your 7 insn sequence is best?). This will be done in following patch. > > Also, IMHO rather than building a CONST_VECTOR ahead in each of the callers, > supposedly ix86_expand_vec_perm_vpermi2 could take the arguments it takes > right now plus D, either D would be NULL (then it would behave as now), or > SEL would be NULL, then it would create a CONST_VECTOR on the fly if needed. > I.e. the function would start with a switch that would just contain the > if (...) > return false; > hunks plus break; for the success case, then code to generate CONST_VECTOR > if sel is NULL_RTX from d, and finally another switch with just the emit > cases. Done. > > > +case V8HImode: > > + if (!TARGET_AVX512VL) > > + return false; > > + emit_insn (gen_avx512vl_vpermi2varv8hi3 (target, op0, > > + force_reg (V8HImode, mask), > > op1)); > > + return true; > > +case V16HImode: > > + if (!TARGET_AVX512VL) > > + return false; > > + emit_insn (gen_avx512vl_vpermi2varv16hi3 (target, op0, > > +force_reg (V16HImode, mask), op1)); > > + return true; > > Aren't these two insns there only if both TARGET_AVX512VL && TARGET_AVX512BW? > I mean, the ISA pdf mentions both of the CPUID flags simultaneously, and I > think neither of these depends on the other one in GCC. That's unlike insns > where CPUID AVX512VL and AVX512F are mentioned together, because in GCC > AVX512VL depends on AVX512F. > Good catch! > > @@ -42662,7 +42764,12 @@ expand_vec_perm_blend (struct expand_vec_perm_d *d) > > > >if (d->one_operand_p) > > return false; > > - if (TARGET_AVX2 && GET_MODE_SIZE (vmode) == 32) > > + if (TARGET_AVX512F && GET_MODE_SIZE (vmode) == 64 && > > + GET_MODE_SIZE (GET_MODE_INNER (vmode)) >= 4) > > Formatting, && belongs on the second line. > Fixed. > > +; > > + else if (TARGET_AVX512VL) > > I'd add && GET_MODE_SIZE (GET_MODE_INNER (vmode) == 64 here. > AVX512VL is not going to handle 64-bit vectors, or 1024-bit ones, > and the == 32 and == 16 cases are handled because AVX512VL implies > TARGET_AVX2 and TARGET_SSE4_1, doesn't it? > As TARGET_AVX512VL always implies TARGET_AVX2 and TARGET_SSE4_1 and works only on 32/16-byte mode this case is redundant, so I've removed it. > > @@ -43012,6 +43125,17 @@ expand_vec_perm_pshufb (struct expand_vec_perm_d > > *d) > > return false; > > } > > } > > + else if (GET_MODE_SIZE (d->vmode) == 64) > > + { > > + if (!TARGET_AVX512BW) > > + return false; > > + if (vmode == V64QImode) > > + { > > + for (i = 0; i < nelt; ++i) > > + if ((d->perm[i] ^ i) & (nelt / 4)) > > + return false; > > Missing comment, I'd duplicate the > /* vpshufb only works intra lanes, it is not > possible to shuffle bytes in between the lanes. */ > comment there. > Done. > > @@ -43109,12 +43237,24 @@ expand_vec_perm_1 (struct expand_vec_perm_d *d) > > rtx (*gen) (rtx, rtx) = NULL; > > switch (d->vmode) > > { > > + case V64QImode: > > + if (TARGET_AVX512VL) > > VL? Isn't that BW? > > > + gen = gen_avx512bw_vec_dupv64qi; > > + break; > > case V32QImode: > > gen = gen_avx2_pbroadcastv32qi_1; > > break; > > + case V32HImode: > > + if (TARGET_AVX512VL) > > Ditto. > Fixed. > > @@ -43216,6 +43368,14 @@ expand_vec_perm_1 (struct expand_vec_perm_d *d) > > mode = V8DImode; > >else if (mode == V16SFmode) > > mode = V16SImode; > > + else if (mode == V4DFmode) > > +mode = V4DImode; > > + else if (mode == V2DFmode) > > +mode = V2DImode; > > + else if (mod
RE: [RFC: Patch, PR 60102] [4.9/4.10 Regression] powerpc fp-bit ices@dwf_regno
> -Original Message- > From: Maciej W. Rozycki [mailto:ma...@codesourcery.com] > To: Ulrich Weigand > Cc: Dharmakan Rohit-B30502; Wienskoski Edmar-RA8797; David Edelsohn; gcc- > patc...@gcc.gnu.org; Alan Modra; Jakub Jelinek > Subject: Re: [RFC: Patch, PR 60102] [4.9/4.10 Regression] powerpc fp-bit > ices@dwf_regno [snip] > > > > Rohit, could you verify whether this fixes the SPE problem? > > Thanks for your work, I have applied your change to my 4.9 setup, rebuilt the > compiler and smoke-tested it with gdb.base/store.exp and my e500v2 multilib > only; powerpc-linux-gnu target. Unfortunately your patch hasn't changed > anything, the same failures remain. > > Just to be sure I haven't missed anything I reverted your change and Rohit's > one as well, rebuilt the compiler and rerun this testing and this time all > failures > were gone. So it looks like your fix didn't completely cover the damage > Rohit's > change caused. :( Ulrich/Maciej, The patch works for me. Tested with GCC v4.9 branch rev 216036 and GCC trunk rev 216027. Regards, Rohit
Re: [AArch64] Fix predicate and constraint mismatch in logical atomic operations
On 25 September 2014 21:30, Segher Boessenkool wrote: > On Thu, Sep 25, 2014 at 10:33:17AM -0700, Michael Collison wrote: >> The problem is the "CONST_INT 0", not a large constant. This constant is >> not accepted by the predicate, but is accepted by the constraint. > > Yes, bad choice of words, sorry. Just read "big" as "not matching the > predicate". The point is that everything works fine until RA, and that > makes it hard to make a useful test. > Hi, IIUC, a testcase is not required/practicable. So, may I ping this patch? https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02209.html Thanks, Christophe. > > Segher > > > p.s. Please don't top-post. Thanks.
Re: [i386] Replace builtins with vector extensions
On Thu, 9 Oct 2014, Uros Bizjak wrote: On Thu, Oct 9, 2014 at 12:33 PM, Marc Glisse wrote: Ping https://gcc.gnu.org/ml/gcc-patches/2014-07/msg01812.html (another part of the discussion is around https://gcc.gnu.org/ml/gcc-patches/2014-06/msg02288.html ) Most people who commented seem cautiously in favor. The least favorable was Ulrich who suggested to go with it but keep the old behavior accessible if the user defines some macro (which imho would lose a large part of the simplification benefits of the patch) https://gcc.gnu.org/ml/gcc-patches/2014-06/msg02328.html If this is accepted, I will gladly prepare patches removing the unused builtins and extending this to a few more operations (integer vectors in particular). If this is not the direction we want to go, I'd like to hear it clearly so I can move on... Well, I'm undecided. First, thanks for answering, it helps me a lot to know what others think. The current approach is proven to work OK, there is no bugs reported in this area and the performance is apparently OK. There should be clear benefits in order to change something that "ain't broken", and at least some proof that we won't regress in this area with the new approach. There are quite a few enhancement PRs asking for more performance, but indeed no (or very few) complaints about correctness or about gcc turning their code into something worse than what they wrote, which I completely agree weighs more. On the other hand, if the new approach opens new optimization opportunities (without regression!), I'm in favor of it, including the fact that new code won't produce equivalent assembly - as long as functionality of the optimized asm stays the same (obviously, I'd say). Please also note that this is quite big project. There are plenty of intrinsics and I for one don't want another partial transition ... That might be an issue : this transition is partial by nature. Many intrinsics cannot (easily) be expressed in GIMPLE, and among those that can be represented, we only want to change those for which we are confident that we will not regress the quality of the code. From the reactions, I would assume that we want to be quite conservative at the beginning, and maybe we can reconsider some other intrinsics later. The best I can offer is consistency: if addition of v2df is changed, addition of v4df is changed as well (and say any +-*/ of float/double vectors of any supported size). Another block would be +-*/% for integer vectors. And construction / access (most construction is already builtin-free). And remove the unused builtins in the same patch that makes them unused. If you don't like those blocks, I can write one mega-patch that does all these, if we roughly agree on the list beforehand, so it goes in all at once. Would that be good enough? TL/DR: If there are benefits, no regressions and you think you'll finish the transition, let's go for it. -- Marc Glisse
[PATCH] Fix GCC tests fail for installed toolchain due to ASan, UBSan and TSan testsuites drop GCC_EXEC_PREFIX.
Hi, After enabling ASan, TSan and UBSan testsuites for installed toolchain, many tests started to fail. This is caused by wrong logic in {asan, ubsan, tsan}_finish functions. Here, restore_ld_library_path is called, that is wrong, because it drops some env variables ( GCC_EXEC_PREFIX, LD_LIBRARY_PATH, etc) to state that was before gcc-dg.exp initialized testing environment, so installed GCC will be confused to find some needed stuff later. Removing restore_ld_library_path from {asan, ubsan, tsan}_finish seems to fix the issue. Tested on x86_64-pc-linux-gnu, ok to commit? -Maxim gcc/testsuite/ChangeLog: 2014-10-09 Max Ostapenko * lib/asan-dg.exp (asan_finish): Remove restore_ld_library_path_env_vars. * lib/tsan-dg.exp (tsan_finish): Likewise. * lib/ubsan-dg.exp (ubsan_finish): Likewise. diff --git a/gcc/testsuite/lib/asan-dg.exp b/gcc/testsuite/lib/asan-dg.exp index 9769138..c98fd3c 100644 --- a/gcc/testsuite/lib/asan-dg.exp +++ b/gcc/testsuite/lib/asan-dg.exp @@ -132,7 +132,6 @@ proc asan_finish { args } { unset TEST_ALWAYS_FLAGS } } -restore_ld_library_path_env_vars } # Symbolize lines like diff --git a/gcc/testsuite/lib/tsan-dg.exp b/gcc/testsuite/lib/tsan-dg.exp index 54ec404..6f7a4d9 100644 --- a/gcc/testsuite/lib/tsan-dg.exp +++ b/gcc/testsuite/lib/tsan-dg.exp @@ -143,5 +143,4 @@ proc tsan_finish { args } { } else { unset dg-do-what-default } -restore_ld_library_path_env_vars } diff --git a/gcc/testsuite/lib/ubsan-dg.exp b/gcc/testsuite/lib/ubsan-dg.exp index 5a7a653..87c460f 100644 --- a/gcc/testsuite/lib/ubsan-dg.exp +++ b/gcc/testsuite/lib/ubsan-dg.exp @@ -114,5 +114,4 @@ proc ubsan_finish { args } { unset TEST_ALWAYS_FLAGS } } -restore_ld_library_path_env_vars }
Re: Towards GNU11
Am 08.10.2014 um 09:16 schrieb Richard Biener: > On Tue, 7 Oct 2014, Marek Polacek wrote: > I think it makes sense to do this (and I expect C++ will follow > with defaulting to -std=c++11 once the ABI stuff has settled). > > Of course it would be nice to look at the actual fallout in > a whole-distribution rebuild... I can certainly do that, once stage1 is finished, hopefully for more than x86 architectures. What happened to the plans to stabilize the libstdc++ c++11 ABI? Is this still a target for GCC 5? Matthias
Re: [i386] Replace builtins with vector extensions
On Thu, Oct 9, 2014 at 2:28 PM, Marc Glisse wrote: > On Thu, 9 Oct 2014, Uros Bizjak wrote: > >> On Thu, Oct 9, 2014 at 12:33 PM, Marc Glisse wrote: >>> >>> Ping https://gcc.gnu.org/ml/gcc-patches/2014-07/msg01812.html >>> >>> (another part of the discussion is around >>> https://gcc.gnu.org/ml/gcc-patches/2014-06/msg02288.html ) >>> >>> Most people who commented seem cautiously in favor. The least favorable >>> was >>> Ulrich who suggested to go with it but keep the old behavior accessible >>> if >>> the user defines some macro (which imho would lose a large part of the >>> simplification benefits of the patch) >>> https://gcc.gnu.org/ml/gcc-patches/2014-06/msg02328.html >>> >>> If this is accepted, I will gladly prepare patches removing the unused >>> builtins and extending this to a few more operations (integer vectors in >>> particular). If this is not the direction we want to go, I'd like to hear >>> it >>> clearly so I can move on... >> >> >> Well, I'm undecided. > > > First, thanks for answering, it helps me a lot to know what others think. > >> The current approach is proven to work OK, there is no bugs reported >> in this area and the performance is apparently OK. There should be >> clear benefits in order to change something that "ain't broken", and >> at least some proof that we won't regress in this area with the new >> approach. > > > There are quite a few enhancement PRs asking for more performance, but > indeed no (or very few) complaints about correctness or about gcc turning > their code into something worse than what they wrote, which I completely > agree weighs more. > >> On the other hand, if the new approach opens new optimization >> opportunities (without regression!), I'm in favor of it, including the >> fact that new code won't produce equivalent assembly - as long as >> functionality of the optimized asm stays the same (obviously, I'd >> say). >> >> Please also note that this is quite big project. There are plenty of >> intrinsics and I for one don't want another partial transition ... > > > That might be an issue : this transition is partial by nature. Many > intrinsics cannot (easily) be expressed in GIMPLE, and among those that can > be represented, we only want to change those for which we are confident that > we will not regress the quality of the code. From the reactions, I would > assume that we want to be quite conservative at the beginning, and maybe we > can reconsider some other intrinsics later. > > The best I can offer is consistency: if addition of v2df is changed, > addition of v4df is changed as well (and say any +-*/ of float/double > vectors of any supported size). Another block would be +-*/% for integer > vectors. And construction / access (most construction is already > builtin-free). And remove the unused builtins in the same patch that makes > them unused. If you don't like those blocks, I can write one mega-patch that > does all these, if we roughly agree on the list beforehand, so it goes in > all at once. > > Would that be good enough? OK, let's go in the proposed way, more detailed: - we begin with +-*/ of float/double vectors. IMO, this would result in a relatively small and easily reviewable patch to iron out the details of the approach. Alternatively, we can begin with floats only. - commit the patch and wait for the sky to fall down. - we play a bit with the compiler to check generated code and corner cases (some kind of Q/A) and wait if someone finds a problem (say, a couple of weeks). - if there are no problems, continue with integer builtins following the established approach, otherwise we revert everything and go back to the drawing board. - repeat the procedure for other builtins. I propose to wait a couple of days for possible comments before we get the ball rolling. Uros.
Re: [PATCH, C++] Fix PR63366: __complex not equivalent to __complex double in C++
On 10/09/2014 05:24 AM, Thomas Preud'homme wrote: "ISO C++ forbids declaration of %qs with no type", name); type = integer_type_node; + defaulted_int = 1; I would think we want to handle this up in the existing defaulted_int block: /* No type at all: default to `int', and set DEFAULTED_INT because it was not a user-defined typedef. */ if (type == NULL_TREE && (signed_p || unsigned_p || long_p || short_p)) + return typeid (__complex) != typeid (__complex double); Don't we want this to be '=='? Jason
Re: constexpr and const qual in C++14
OK, thanks. Jason
Re: [PATCH, C++] Fix PR63366: __complex not equivalent to __complex double in C++
On 10/09/14 09:25, Jason Merrill wrote: On 10/09/2014 05:24 AM, Thomas Preud'homme wrote: "ISO C++ forbids declaration of %qs with no type", name); type = integer_type_node; + defaulted_int = 1; I would think we want to handle this up in the existing defaulted_int block: my thought was to at least put it next to the explicit_int = -1 above. /* No type at all: default to `int', and set DEFAULTED_INT because it was not a user-defined typedef. */ if (type == NULL_TREE && (signed_p || unsigned_p || long_p || short_p)) + return typeid (__complex) != typeid (__complex double); Don't we want this to be '=='? I think it wants to check for _complex being __complex double, not not being _complex int (other failure modes exist!) nathan -- Nathan Sidwell
Re: [C++ PAtch] More C++11 and C++14 constexpr work
On 10/08/2014 03:47 PM, Paolo Carlini wrote: (check_constexpr_ctor_body): Use it; add bool parameter. This function seems to only be called in one place; why add the parameter? Jason
Re: [PATCH, C++] Fix PR63366: __complex not equivalent to __complex double in C++
On 10/09/2014 09:30 AM, Nathan Sidwell wrote: + return typeid (__complex) != typeid (__complex double); Don't we want this to be '=='? I think it wants to check for _complex being __complex double, not not being _complex int (other failure modes exist!) Right, I was forgetting about returning nonzero for failure. Jason
Re: C++ Patch for c++/60894
On 10/08/2014 01:30 PM, Fabien Chêne wrote: 2014-10-07 23:13 GMT+02:00 Jason Merrill : It seems to me that the problem is that lookup_and_check_tag is rejecting a USING_DECL rather than returning it. What if we return the USING_DECL? If the USING_DECL is returned, the code below will be rejected as expected, but the error message will not mention the line where the USING_DECL appears as the previous definition, but at the target declaration of the USING_DECL instead. I think that's what happens if you strip the USING_DECL and return what it points to; if you return the USING_DECL itself that shouldn't happen (though then the caller needs to be able to deal with getting a USING_DECL). Jason
Re: Towards GNU11
On 10/09/2014 08:45 AM, Matthias Klose wrote: What happened to the plans to stabilize the libstdc++ c++11 ABI? Is this still a target for GCC 5? Yes. Jason
Re: [RFC: Patch, PR 60102] [4.9/4.10 Regression] powerpc fp-bit ices@dwf_regno
rohitarulraj wrote: > > -Original Message- > > From: Maciej W. Rozycki [mailto:ma...@codesourcery.com] > > To: Ulrich Weigand > > Cc: Dharmakan Rohit-B30502; Wienskoski Edmar-RA8797; David Edelsohn; gcc- > > patc...@gcc.gnu.org; Alan Modra; Jakub Jelinek > > Subject: Re: [RFC: Patch, PR 60102] [4.9/4.10 Regression] powerpc fp-bit > > ices@dwf_regno > [snip] > > > > > > Rohit, could you verify whether this fixes the SPE problem? > >=20 > > Thanks for your work, I have applied your change to my 4.9 setup, rebuil= > t the > > compiler and smoke-tested it with gdb.base/store.exp and my e500v2 multil= > ib > > only; powerpc-linux-gnu target. Unfortunately your patch hasn't changed > > anything, the same failures remain. > >=20 > > Just to be sure I haven't missed anything I reverted your change and Roh= > it's > > one as well, rebuilt the compiler and rerun this testing and this time al= > l failures > > were gone. So it looks like your fix didn't completely cover the damage = > Rohit's > > change caused. :( > > Ulrich/Maciej, > > The patch works for me. > Tested with GCC v4.9 branch rev 216036 and GCC trunk rev 216027. Thanks for testing! Can you work with Maciej to find out why he's seeing different results? Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: __intN patch 3/5: main __int128 -> __intN conversion.
OK, thanks. Jason
Re: [C++ PAtch] More C++11 and C++14 constexpr work
Hi, On 10/09/2014 03:31 PM, Jason Merrill wrote: On 10/08/2014 03:47 PM, Paolo Carlini wrote: (check_constexpr_ctor_body): Use it; add bool parameter. This function seems to only be called in one place; why add the parameter? Is also called recursively by check_constexpr_ctor_body_1 and without the complain boolean we end up printing the error message twice. Paolo.
Re: [Ping] Port of VTV for Cygwin and MinGW
On 27.09.2014 12:50, Kai Tietz wrote: > Hi Patrick, > > the mingw/cygwin part your patch looks fine to me. Nevertheless I > have one question regarding to you. Do you have FSF papers for gcc > already? As I asked an overseer and he didn't found you on the list. > > Regards, > Kai > The papers FSF have been taken care of, and the signed papers have been exchanged. A short recap: Mail with the latest patch and changelog: https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02200.html Approved: * gcc/config/i386/* * libstdc++-v3/* * libvtv/* Not approved: * gcc/cp/vtable-class-hierarchy.c * gcc/varasm.c * libgcc/Makefile.in * libgcc/config.host * libiberty/obstack.c Regards, Patrick
Re: [C++ PAtch] More C++11 and C++14 constexpr work
.. a simple example in C++11 would be: struct S { constexpr S() { { struct T { }; } } }; Paolo.
Re: [PATCH, i386, Pointer Bounds Checker 31/x] Pointer Bounds Checker builtins for i386 target
It appeared I changed a semantics of BNDMK expand when replaced tree operations with rtl ones. Original code: + op1 = expand_normal (fold_build2 (PLUS_EXPR, TREE_TYPE (arg1), + arg1, integer_minus_one_node)); + op1 = force_reg (Pmode, op1); Modified code: + op1 = expand_normal (arg1); + + if (!register_operand (op1, Pmode)) + op1 = ix86_zero_extend_to_Pmode (op1); + + /* Builtin arg1 is size of block but instruction op1 should +be (size - 1). */ + op1 = expand_simple_binop (Pmode, PLUS, op1, constm1_rtx, +op1, 1, OPTAB_DIRECT); The problem is that in the fixed version we may modify value of a pseudo register into which arg1 is expanded which means incorrect value for all following usages of arg1. Didn't reveal it early because programs surprisingly rarely hit this bug. I do following change to fix it: op1 = expand_simple_binop (Pmode, PLUS, op1, constm1_rtx, -op1, 1, OPTAB_DIRECT); +NULL, 1, OPTAB_DIRECT); Similar problem was also fixed for BNDNARROW. Does it look OK? Thanks, Ilya -- diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def index 9161287..5421ba9 100644 --- a/gcc/config/i386/i386-builtin-types.def +++ b/gcc/config/i386/i386-builtin-types.def @@ -47,6 +47,7 @@ DEF_PRIMITIVE_TYPE (UCHAR, unsigned_char_type_node) DEF_PRIMITIVE_TYPE (QI, char_type_node) DEF_PRIMITIVE_TYPE (HI, intHI_type_node) DEF_PRIMITIVE_TYPE (SI, intSI_type_node) +DEF_PRIMITIVE_TYPE (BND, pointer_bounds_type_node) # ??? Logically this should be intDI_type_node, but that maps to "long" # with 64-bit, and that's not how the emmintrin.h is written. Again, # changing this would change name mangling. @@ -60,6 +61,7 @@ DEF_PRIMITIVE_TYPE (USHORT, short_unsigned_type_node) DEF_PRIMITIVE_TYPE (INT, integer_type_node) DEF_PRIMITIVE_TYPE (UINT, unsigned_type_node) DEF_PRIMITIVE_TYPE (UNSIGNED, unsigned_type_node) +DEF_PRIMITIVE_TYPE (ULONG, long_unsigned_type_node) DEF_PRIMITIVE_TYPE (LONGLONG, long_long_integer_type_node) DEF_PRIMITIVE_TYPE (ULONGLONG, long_long_unsigned_type_node) DEF_PRIMITIVE_TYPE (UINT8, unsigned_char_type_node) @@ -810,3 +812,15 @@ DEF_FUNCTION_TYPE_ALIAS (V2DI_FTYPE_V2DI_V2DI, TF) DEF_FUNCTION_TYPE_ALIAS (V4SF_FTYPE_V4SF_V4SF, TF) DEF_FUNCTION_TYPE_ALIAS (V4SI_FTYPE_V4SI_V4SI, TF) DEF_FUNCTION_TYPE_ALIAS (V8HI_FTYPE_V8HI_V8HI, TF) + +# MPX builtins +DEF_FUNCTION_TYPE (BND, PCVOID, ULONG) +DEF_FUNCTION_TYPE (VOID, PCVOID, BND) +DEF_FUNCTION_TYPE (VOID, PCVOID, BND, PCVOID) +DEF_FUNCTION_TYPE (BND, PCVOID, PCVOID) +DEF_FUNCTION_TYPE (BND, PCVOID) +DEF_FUNCTION_TYPE (BND, BND, BND) +DEF_FUNCTION_TYPE (PVOID, PVOID, PVOID, ULONG) +DEF_FUNCTION_TYPE (PVOID, PCVOID, BND, ULONG) +DEF_FUNCTION_TYPE (ULONG, VOID) +DEF_FUNCTION_TYPE (PVOID, BND) diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 2b39ff1..3f464de 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -84,6 +84,8 @@ along with GCC; see the file COPYING3. If not see #include "tree-vectorizer.h" #include "shrink-wrap.h" #include "builtins.h" +#include "tree-chkp.h" +#include "rtl-chkp.h" static rtx legitimize_dllimport_symbol (rtx, bool); static rtx legitimize_pe_coff_extern_decl (rtx, bool); @@ -28776,6 +28778,19 @@ enum ix86_builtins IX86_BUILTIN_XABORT, IX86_BUILTIN_XTEST, + /* MPX */ + IX86_BUILTIN_BNDMK, + IX86_BUILTIN_BNDSTX, + IX86_BUILTIN_BNDLDX, + IX86_BUILTIN_BNDCL, + IX86_BUILTIN_BNDCU, + IX86_BUILTIN_BNDRET, + IX86_BUILTIN_BNDNARROW, + IX86_BUILTIN_BNDINT, + IX86_BUILTIN_SIZEOF, + IX86_BUILTIN_BNDLOWER, + IX86_BUILTIN_BNDUPPER, + /* BMI instructions. */ IX86_BUILTIN_BEXTR32, IX86_BUILTIN_BEXTR64, @@ -28853,6 +28868,8 @@ struct builtin_isa { enum ix86_builtin_func_type tcode; /* type to use in the declaration */ HOST_WIDE_INT isa; /* isa_flags this builtin is defined for */ bool const_p;/* true if the declaration is constant */ + bool leaf_p; /* true if the declaration has leaf attribute */ + bool nothrow_p; /* true if the declaration has nothrow attribute */ bool set_and_not_built_p; }; @@ -28904,6 +28921,8 @@ def_builtin (HOST_WIDE_INT mask, const char *name, ix86_builtins[(int) code] = NULL_TREE; ix86_builtins_isa[(int) code].tcode = tcode; ix86_builtins_isa[(int) code].name = name; + ix86_builtins_isa[(int) code].leaf_p = false; + ix86_builtins_isa[(int) code].nothrow_p = false; ix86_builtins_isa[(int) code].const_p = false; ix86_builtins_isa[(int) code].set_and_not_built_p = true; } @@ -28954,6 +28973,11 @@ ix86_add_new_builtins (HOST_WIDE_INT isa) ix86_builtins[i] = decl; if (ix86_builtins_isa[i].const_p) TREE_READONLY (decl) = 1; +
Re: [PATCH, ARM] attribute target (thumb,arm)
On 09/10/14 12:35, Christian Bruel wrote: > > On 10/08/2014 06:56 PM, Ramana Radhakrishnan wrote: >> Hi Christian, > > << snipped agreed stuf >> >> 3) about inlining >>I dislike inlining different modes, From a conceptual use, a user >> might want to switch mode only when changing a function's hotness. >> Usually inlining a cold function into a hot one is not what the user >> explicitly required when setting different mode attributes for them, >> __attribute__((thumb)) should not imply coldness or hotness. Inlining >> between cold and hot functions should be done based on profile feedback. >> The choice of compiling in "Thumb1" state for coldness is a separate one >> because that's where the choice needs to be made. > > Ideally yes. but I think that a user motivation to use target attribute > (("thumb") is to reduce code size even in the cases where PFO is not > available (libraries, kernel or user application build spec packaged > without profile data). And there are cases where static probabilities > are not enough and that a user wants it own control with gprof or oprofile. > But in this case, we could point to the __attribute__ (("cold")) on the > function ? That would probably be the best workaround to propose if we > recommend this > Hot vs cold is interesting, but arm/thumb shouldn't be used to imply that. The days when ARM=fast, thumb=small are in the past now, and thumb2 code should be both fast and small. Indeed, smaller thumb2 code can be faster than larger ARM code simply because you can get more of it in the cache. The use of arm vs thumb is likely to be much more subtle now. > But here is another scenario: Using of attribute (("arm")) for exception > entry points is indeed not related to hotness. But consider a typical > thumb binary with an entry point in arm compiled in C (ex handler, a > kernel...). Today due to the file boundary the thumb part is not inlined > into the arm point. (Using -flto is not possible because the whole > gimple would be thumb). > We have no-inline attributes for scenarios like that. I don't think a specific use case should dominate other cases. > Now, using attribute (("target")) for the functions others than the > entry point, with your approach they would all be inlined (assuming the > cost allow this) and we would end up with a arm binary instead of a > thumb binary... > > But there are still 3 points : > > - At least 2 other target (i386, Powerpc) that support attribute_target > disable inlining between modes that are not subsets. I like to think > about homogeneity between targets and I find odd to have different > inlining rules... > That's because use of specific instructions must not be allowed to leak past a gating check that is in the caller. It would be a disaster if a function that used a neon register, for example, was allowed to leak into code that might run on a target with no Neon register file. The ARM/thumb distinction shouldn't, by default, be limited in that manner. I believe inlining could happen from a subset of the archtiecture into a function using a superset, just not vice-versa. > - Scanning the function body to check for ASM_INPUT does not look very > elegant (if this matters) because the asm could well be unrelated > > The only case when it will always be a win to inline thumb into arm is > when the cost of the inlined body is less than a BX instruction (but > still, with branch prediction this cost is pondered). > One of the problems with not inlining is that the C++ abstraction penalty is likely to shoot up. There will be many major lost optimization opportunities if we start down that path. So I think inlining should only be disabled if there's some technical reason why it should be disabled, not because of some 'it might not always be ideal' feelings. Furthermore, we should expect users to use the other attributes consistently when they expect specific behaviours to occur. My 2p. R. > >> >>> The compiler would take a decision that is not what the user wrote. And >>> in addition if you consider the few instructions to modify R15 to switch >>> state that would end up with more code executed in the critical path, >>> voiding a possible size of speed gain. >> I do not expect there to be any additional instructions needed to switch >> state. If function x is inlined into function y the state would be lost >> and the state would be in terms of the state of function x. > Yes, indeed. I was in a LCM/mode-switching thinking mode when writing > this. In this case the mode is inherited. > >> Obviously if the user doesn't want inlining - the user would add >> attributes to disable inlining. You do have extensions such as >> __attribute__((noinline)) and __attribute__((never_inline)) to give the >> user that control and those bits need to be used in addition. > > Those attributes are overkill. They would disable inlining between > caller-callee of a same mode. This is not what we want > >> >> The att
Re: [C++ PAtch] More C++11 and C++14 constexpr work
On 10/09/2014 09:49 AM, Paolo Carlini wrote: Hi, On 10/09/2014 03:31 PM, Jason Merrill wrote: On 10/08/2014 03:47 PM, Paolo Carlini wrote: (check_constexpr_ctor_body): Use it; add bool parameter. This function seems to only be called in one place; why add the parameter? Is also called recursively by check_constexpr_ctor_body_1 and without the complain boolean we end up printing the error message twice. Ah, guess I overlooked that. OK. Jason
Re: [Ping] Port of VTV for Cygwin and MinGW
2014-10-09 15:52 GMT+02:00 Patrick Wollgast : > On 27.09.2014 12:50, Kai Tietz wrote: >> Hi Patrick, >> >> the mingw/cygwin part your patch looks fine to me. Nevertheless I >> have one question regarding to you. Do you have FSF papers for gcc >> already? As I asked an overseer and he didn't found you on the list. >> >> Regards, >> Kai >> > > The papers FSF have been taken care of, and the signed papers have been > exchanged. > > > A short recap: > > Mail with the latest patch and changelog: > https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02200.html > > Approved: > * gcc/config/i386/* > * libstdc++-v3/* > * libvtv/* > > Not approved: > * gcc/cp/vtable-class-hierarchy.c Index: gcc/cp/vtable-class-hierarchy.c === --- gcc/cp/vtable-class-hierarchy.c(Revision 214408) +++ gcc/cp/vtable-class-hierarchy.c(Arbeitskopie) @@ -1182,7 +1182,7 @@ vtv_generate_init_routine (void) TREE_STATIC (vtv_fndecl) = 1; TREE_USED (vtv_fndecl) = 1; DECL_PRESERVE_P (vtv_fndecl) = 1; - if (flag_vtable_verify == VTV_PREINIT_PRIORITY) + if (flag_vtable_verify == VTV_PREINIT_PRIORITY && !TARGET_PECOFF) You need to check that TARGET_PECOFF is defined. Otherwise you break compilation for none i386 targets. DECL_STATIC_CONSTRUCTOR (vtv_fndecl) = 0; gimplify_function_tree (vtv_fndecl); @@ -1190,7 +1190,7 @@ vtv_generate_init_routine (void) cgraph_process_new_functions (); - if (flag_vtable_verify == VTV_PREINIT_PRIORITY) + if (flag_vtable_verify == VTV_PREINIT_PRIORITY && !TARGET_PECOFF) See above. Likewise assemble_vtv_preinit_initializer (vtv_fndecl); } > * gcc/varasm.c Index: gcc/varasm.c === --- gcc/varasm.c(Revision 214408) +++ gcc/varasm.c(Arbeitskopie) @@ -2165,6 +2165,33 @@ assemble_variable (tree decl, int top_le DECL_NAME (decl)); in_section = sect; #else + /* Neither OBJECT_FORMAT_PE, nor OBJECT_FORMAT_COFF is set here. + Therefore the following check is used. + In case a the target is PE or COFF a comdat group section + is created, e.g. .vtable_map_vars$foo. The linker places + everything in .vtable_map_vars at the end. + + A fix could be made in + gcc/config/i386/winnt.c: i386_pe_unique_section. */ + if (TARGET_PECOFF) You need to test, if TARGET_PECOFF is defined! + { +char *name; + +if (TREE_CODE (DECL_NAME (decl)) == IDENTIFIER_NODE) + name = ACONCAT ((sect->named.name, "$", + IDENTIFIER_POINTER (DECL_NAME (decl)), NULL)); +else + name = ACONCAT ((sect->named.name, "$", + IDENTIFIER_POINTER (DECL_COMDAT_GROUP (DECL_NAME (decl))), + NULL)); + +targetm.asm_out.named_section (name, + sect->named.common.flags + | SECTION_LINKONCE, Here it seems to me that you have some whitespace issues, + DECL_NAME (decl)); +in_section = sect; +} +else switch_to_section (sect); #endif > * libgcc/Makefile.in Looks ok to me. > * libgcc/config.host Looks fine to me, too. > * libiberty/obstack.c Why you use instead of C-runtime exit/abort-functions the platform-functions to terminate the process. This looks to me like useless change. For cygwin this might be even wrong in some aspects. What is the reasoning for this change? Another note I have about re-implementation of mprotect in --- libvtv/vtv_malloc.cc. Why you need that? it is already part of libgcc for mingw. And for cygwin this function is part of cygwin's library itself. So why re-implementing it here? > Regards, > Patrick Regards, Kai
[PATCH] Fix r216010 fallout
This fixes fallout from r216010, which causes Firefox build failures. Just move the gcc_assert below the new if statement. Boostrapped and tested on powerpc64-unknown-linux-gnu. Ok for trunk? Thanks. 2014-10-09 Markus Trippelsdorf * pa-polymorphic-call.c (check_stmt_for_type_change): Move assertion. 2014-10-09 Markus Trippelsdorf * /g++.dg/ipa/polymorphic-call-1.C: New testcase. diff --git a/gcc/ipa-polymorphic-call.c b/gcc/ipa-polymorphic-call.c index 51c6709a8655..7d58601ae365 100644 --- a/gcc/ipa-polymorphic-call.c +++ b/gcc/ipa-polymorphic-call.c @@ -1424,9 +1424,9 @@ check_stmt_for_type_change (ao_ref *ao ATTRIBUTE_UNUSED, tree vdef, void *data) } type = extr_type_from_vtbl_ptr_store (stmt, tci, &offset); - gcc_assert (!type || TYPE_MAIN_VARIANT (type) == type); if (type == error_mark_node) return false; + gcc_assert (!type || TYPE_MAIN_VARIANT (type) == type); if (!type) { if (dump_file) diff --git a/gcc/testsuite/g++.dg/ipa/polymorphic-call-1.C b/gcc/testsuite/g++.dg/ipa/polymorphic-call-1.C new file mode 100644 index ..2b5b54478349 --- /dev/null +++ b/gcc/testsuite/g++.dg/ipa/polymorphic-call-1.C @@ -0,0 +1,49 @@ +// { dg-do compile } +// { dg-options "-O2" } +class A; +class B +{ + A *mRawPtr; + +public: + void *StartAssignment___trans_tmp_2; + A ** + m_fn1 () + { +StartAssignment___trans_tmp_2 = &mRawPtr; +return reinterpret_cast (StartAssignment___trans_tmp_2); + } +}; +class C +{ +public: + C (B &p1) : mTargetSmartPtr (p1) {} + operator A **() { return mTargetSmartPtr.m_fn1 (); } + B &mTargetSmartPtr; +}; +class A +{ +public: + A (); +}; +class D +{ + D (bool); + B mNewEntry; + virtual int m_fn2 (); +}; +C +fn1 (B &p1) +{ + return p1; +} +void +fn2 (bool, A **) +{ + new A; +} +D::D (bool p1) +{ + A **a = fn1 (mNewEntry); + fn2 (p1, a); +} -- Markus
Re: [PATCH] Fix r216010 fallout
On Thu, Oct 09, 2014 at 04:47:05PM +0200, Markus Trippelsdorf wrote: > 2014-10-09 Markus Trippelsdorf > > * /g++.dg/ipa/polymorphic-call-1.C: New testcase. Please drop the leading '/' when you commit this patch. Marek
[PATCH i386 AVX512] [76/n] Extend int 2 float conversions.
Hello, This patch extends autogeneration of SI-2-SF conversions. Bootstrapped. AVX-512* tests on top of patch-set all pass under simulator. Is it ok for trunk? gcc/ * config/i386/i386.c (ix86_expand_vector_convert_uns_vsivsf): Handle V16SI mode and TARGET_AVX512VL. -- Thanks, K diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 7c34431..8a7853e 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -18811,6 +18811,19 @@ ix86_expand_vector_convert_uns_vsivsf (rtx target, rtx val) enum machine_mode fltmode = GET_MODE (target); rtx (*cvt) (rtx, rtx); + if (intmode == V16SImode) +{ + emit_insn (gen_ufloatv16siv16sf2 (target, val)); + return; +} + if (TARGET_AVX512VL) +{ + if (intmode == V4SImode) +emit_insn (gen_ufloatv4siv4sf2 (target, val)); + else +emit_insn (gen_ufloatv8siv8sf2 (target, val)); + return; +} if (intmode == V4SImode) cvt = gen_floatv4siv4sf2; else
Re: [PATCH 1/2] [AARCH64,NEON] Add patterns + builtins for vld[234](q?)_lane_* intrinsics
+(define_insn "vec_load_lanesoi_lane" Best to prepend "aarch64_" the pattern name, IMHO, else it looks like a standard pattern name(eg. vec_load_lanes) at first glance. Otherwise, LGTM(but I can't approve it). Thanks for this patch. Thanks, Tejas. + [(set (match_operand:OI 0 "register_operand" "=w") + (unspec:OI [(match_operand: 1 "aarch64_simd_struct_operand" "Utv") + (match_operand:OI 2 "register_operand" "0") + (match_operand:SI 3 "immediate_operand" "i") + (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY) ] + UNSPEC_LD2_LANE))] + "TARGET_SIMD" + "ld2\\t{%S0. - %T0.}[%3], %1" + [(set_attr "type" "neon_load2_one_lane")] +) + (define_insn "vec_store_lanesoi" [(set (match_operand:OI 0 "aarch64_simd_struct_operand" "=Utv") (unspec:OI [(match_operand:OI 1 "register_operand" "w") @@ -4022,6 +4034,18 @@ [(set_attr "type" "neon_load3_3reg")] ) +(define_insn "vec_load_lanesci_lane" + [(set (match_operand:CI 0 "register_operand" "=w") + (unspec:CI [(match_operand: 1 "aarch64_simd_struct_operand" "Utv") + (match_operand:CI 2 "register_operand" "0") + (match_operand:SI 3 "immediate_operand" "i") + (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] + UNSPEC_LD3_LANE))] + "TARGET_SIMD" + "ld3\\t{%S0. - %U0.}[%3], %1" + [(set_attr "type" "neon_load3_one_lane")] +) + (define_insn "vec_store_lanesci" [(set (match_operand:CI 0 "aarch64_simd_struct_operand" "=Utv") (unspec:CI [(match_operand:CI 1 "register_operand" "w") @@ -4053,6 +4077,18 @@ [(set_attr "type" "neon_load4_4reg")] ) +(define_insn "vec_load_lanesxi_lane" + [(set (match_operand:XI 0 "register_operand" "=w") + (unspec:XI [(match_operand: 1 "aarch64_simd_struct_operand" "Utv") + (match_operand:XI 2 "register_operand" "0") + (match_operand:SI 3 "immediate_operand" "i") + (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] + UNSPEC_LD4_LANE))] + "TARGET_SIMD" + "ld4\\t{%S0. - %V0.}[%3], %1" + [(set_attr "type" "neon_load4_one_lane")] +) + (define_insn "vec_store_lanesxi" [(set (match_operand:XI 0 "aarch64_simd_struct_operand" "=Utv") (unspec:XI [(match_operand:XI 1 "register_operand" "w") @@ -4366,6 +4402,65 @@ DONE; }) +(define_expand "aarch64_ld2_lane" + [(match_operand:OI 0 "register_operand" "=w") + (match_operand:DI 1 "register_operand" "w") + (match_operand:OI 2 "register_operand" "0") + (match_operand:SI 3 "immediate_operand" "i") + (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] + "TARGET_SIMD" +{ + enum machine_mode mode = mode; + rtx mem = gen_rtx_MEM (mode, operands[1]); + + aarch64_simd_lane_bounds (operands[3], 0, GET_MODE_NUNITS (mode)); + emit_insn (gen_vec_load_lanesoi_lane (operands[0], + mem, + operands[2], + operands[3])); + DONE; +}) + +(define_expand "aarch64_ld3_lane" + [(match_operand:CI 0 "register_operand" "=w") + (match_operand:DI 1 "register_operand" "w") + (match_operand:CI 2 "register_operand" "0") + (match_operand:SI 3 "immediate_operand" "i") + (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] + "TARGET_SIMD" +{ + enum machine_mode mode = mode; + rtx mem = gen_rtx_MEM (mode, operands[1]); + + aarch64_simd_lane_bounds (operands[3], 0, GET_MODE_NUNITS (mode)); + emit_insn (gen_vec_load_lanesci_lane (operands[0], + mem, + operands[2], + operands[3])); + DONE; +}) + +(define_expand "aarch64_ld4_lane" + [(match_operand:XI 0 "register_operand" "=w") + (match_operand:DI 1 "register_operand" "w") + (match_operand:XI 2 "register_operand" "0") + (match_operand:SI 3 "immediate_operand" "i") + (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] + "TARGET_SIMD" +{ + enum machine_mode mode = mode; + rtx mem = gen_rtx_MEM (mode, operands[1]); + + aarch64_simd_lane_bounds (operands[3], 0, GET_MODE_NUNITS (mode)); + emit_insn (gen_vec_load_lanesxi_lane (operands[0], + mem, + operands[2], + operands[3])); + DONE; +}) + + + ;; Expanders for builtins to extract vector registers from large ;; opaque integer modes. diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 74b554e..6b5f51f 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -92,6 +92,9 @@ UNSPEC_LD2 UNSPEC_LD3 UNSPEC_LD4 +UNSPEC_LD2_LANE +UNSPEC_LD3_LANE +UNSPEC_LD4_LANE UNSPEC_MB UNSPEC_NOP
[PATCH i386 AVX512] [77/n] Use blend for cond-set V32HI and V64QI.
Hello, This patch extends movcc/vcond autogen. Bootstrapped. AVX-512* tests on top of patch-set all pass under simulator. Is it ok for trunk? gcc/ * config/i386/i386.c (ix86_expand_sse_movcc): Handle V64QI and V32HI mode. (ix86_expand_int_vcond): Ditto. -- Thanks, K diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 8a7853e..f86aa1b 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -21015,6 +21015,12 @@ ix86_expand_sse_movcc (rtx dest, rtx cmp, rtx op_true, rtx op_false) } break; + case V64QImode: + gen = gen_avx512bw_blendmv64qi; + break; + case V32HImode: + gen = gen_avx512bw_blendmv32hi; + break; case V16SImode: gen = gen_avx512f_blendmv16si; break; @@ -21331,6 +21337,8 @@ ix86_expand_int_vcond (rtx operands[]) } break; + case V64QImode: + case V32HImode: case V32QImode: case V16HImode: case V16QImode:
Re: [PATCH 2/2] [AARCH64,NEON] Convert arm_neon.h to use new builtins for vld[234](q?)_lane_*
On 08/10/14 18:27, charles.bay...@linaro.org wrote: From: Charles Baylis This patch replaces the inline assembler implementations of the vld[234](q?)_lane_* intrinsics with new versions which exploit the new builtin functions added in patch 1. Tested (with the rest of the patch series) with make check on aarch64-oe-linux with qemu, and also causes no regressions in clyon's NEON intrinsics tests. Charles Baylis * config/aarch64/arm_neon.h (__LD2_LANE_FUNC): Rewrite using builtins, update uses to use new macro arguments. (__LD3_LANE_FUNC): Likewise. (__LD4_LANE_FUNC): Likewise. Change-Id: I3bd5934b5c4f6127088193c1ab12848144d5540a --- gcc/config/aarch64/arm_neon.h | 377 -- 1 file changed, 255 insertions(+), 122 deletions(-) diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h index 9b1873f..19ce261 100644 --- a/gcc/config/aarch64/arm_neon.h +++ b/gcc/config/aarch64/arm_neon.h @@ -11805,47 +11805,83 @@ __LD2R_FUNC (uint16x8x2_t, uint16x2_t, uint16_t, 8h, u16, q) __LD2R_FUNC (uint32x4x2_t, uint32x2_t, uint32_t, 4s, u32, q) __LD2R_FUNC (uint64x2x2_t, uint64x2_t, uint64_t, 2d, u64, q) -#define __LD2_LANE_FUNC(rettype, ptrtype, regsuffix, \ - lnsuffix, funcsuffix, Q)\ - __extension__ static __inline rettype \ - __attribute__ ((__always_inline__)) \ - vld2 ## Q ## _lane_ ## funcsuffix (const ptrtype *ptr, \ -rettype b, const int c)\ - {\ -rettype result;\ -__asm__ ("ld1 {v16." #regsuffix ", v17." #regsuffix "}, %1\n\t"\ -"ld2 {v16." #lnsuffix ", v17." #lnsuffix "}[%3], %2\n\t" \ -"st1 {v16." #regsuffix ", v17." #regsuffix "}, %0\n\t" \ -: "=Q"(result) \ -: "Q"(b), "Q"(*(const rettype *)ptr), "i"(c) \ -: "memory", "v16", "v17"); \ -return result; \ - } - -__LD2_LANE_FUNC (int8x8x2_t, uint8_t, 8b, b, s8,) -__LD2_LANE_FUNC (float32x2x2_t, float32_t, 2s, s, f32,) -__LD2_LANE_FUNC (float64x1x2_t, float64_t, 1d, d, f64,) -__LD2_LANE_FUNC (poly8x8x2_t, poly8_t, 8b, b, p8,) -__LD2_LANE_FUNC (poly16x4x2_t, poly16_t, 4h, h, p16,) -__LD2_LANE_FUNC (int16x4x2_t, int16_t, 4h, h, s16,) -__LD2_LANE_FUNC (int32x2x2_t, int32_t, 2s, s, s32,) -__LD2_LANE_FUNC (int64x1x2_t, int64_t, 1d, d, s64,) -__LD2_LANE_FUNC (uint8x8x2_t, uint8_t, 8b, b, u8,) -__LD2_LANE_FUNC (uint16x4x2_t, uint16_t, 4h, h, u16,) -__LD2_LANE_FUNC (uint32x2x2_t, uint32_t, 2s, s, u32,) -__LD2_LANE_FUNC (uint64x1x2_t, uint64_t, 1d, d, u64,) -__LD2_LANE_FUNC (float32x4x2_t, float32_t, 4s, s, f32, q) -__LD2_LANE_FUNC (float64x2x2_t, float64_t, 2d, d, f64, q) -__LD2_LANE_FUNC (poly8x16x2_t, poly8_t, 16b, b, p8, q) -__LD2_LANE_FUNC (poly16x8x2_t, poly16_t, 8h, h, p16, q) -__LD2_LANE_FUNC (int8x16x2_t, int8_t, 16b, b, s8, q) -__LD2_LANE_FUNC (int16x8x2_t, int16_t, 8h, h, s16, q) -__LD2_LANE_FUNC (int32x4x2_t, int32_t, 4s, s, s32, q) -__LD2_LANE_FUNC (int64x2x2_t, int64_t, 2d, d, s64, q) -__LD2_LANE_FUNC (uint8x16x2_t, uint8_t, 16b, b, u8, q) -__LD2_LANE_FUNC (uint16x8x2_t, uint16_t, 8h, h, u16, q) -__LD2_LANE_FUNC (uint32x4x2_t, uint32_t, 4s, s, u32, q) -__LD2_LANE_FUNC (uint64x2x2_t, uint64_t, 2d, d, u64, q) +#define __LD2_LANE_FUNC(intype, vectype, largetype, ptrtype, \ +mode, ptrmode, funcsuffix, signedtype)\ +__extension__ static __inline intype __attribute__ ((__always_inline__)) \ +vld2_lane_##funcsuffix (const ptrtype * __ptr, intype __b, const int __c) \ +{ \ + __builtin_aarch64_simd_oi __o; \ + largetype __temp; \ + __temp.val[0] = \ +vcombine_##funcsuffix (__b.val[0], vcreate_##funcsuffix (0)); \ + __temp.val[1] = \ +vcombine_##funcsuffix (__b.val[1], vcreate_##funcsuffix (0)); \ + __o = __builtin_aarch64_set_qregoi##mode (__o, \ + (signedtype) __temp.val[0], \ + 0); \ + __o = __builtin_aarch64_set_qregoi##mode (__o, \ + (signedtype) __temp.val[1], \ + 1); \ + __o =__bui
Re: [i386] Replace builtins with vector extensions
Hello folks, On 09 Oct 14:57, Uros Bizjak wrote: > On Thu, Oct 9, 2014 at 2:28 PM, Marc Glisse wrote: > > On Thu, 9 Oct 2014, Uros Bizjak wrote: > OK, let's go in the proposed way, more detailed: > > - we begin with +-*/ of float/double vectors. IMO, this would result > in a relatively small and easily reviewable patch to iron out the > details of the approach. Alternatively, we can begin with floats only. > - commit the patch and wait for the sky to fall down. > - we play a bit with the compiler to check generated code and corner > cases (some kind of Q/A) and wait if someone finds a problem (say, a > couple of weeks). > - if there are no problems, continue with integer builtins following > the established approach, otherwise we revert everything and go back > to the drawing board. > - repeat the procedure for other builtins. > > I propose to wait a couple of days for possible comments before we get > the ball rolling. Let me repeat, I think this is good idea to do. I just wanted to kindly ask you wait for about 1-2ww before checking-in this things. I hope in that time AVX-512VL,BW,DQ will hit trunk completely and *lots* more intrinsics will be added (I think intrinsics is subject of ~[85/n] patch). -- Thanks, K > > Uros.
Re: [PATCH i386 AVX512] [64/n] Add rest of VI1-AVX2: vpack[us]wb.
On Thu, Oct 9, 2014 at 12:09 PM, Kirill Yukhin wrote: > Hello, > This patch adds rest of vpack instruction patterns. > > Bootstrapped. > gcc.target/i386.exp tests on top of patch-set show no regressions. > under simulator. > > Is it ok for trunk? > > gcc/ > * config/i386/sse.md > (define_insn "_packsswb"): Add masking. > (define_insn "_packuswb"): Ditto. OK. Thanks, Uros.
Re: [C++ PAtch] More C++11 and C++14 constexpr work
Hi, On 10/09/2014 04:18 PM, Jason Merrill wrote: On 10/09/2014 09:49 AM, Paolo Carlini wrote: Hi, On 10/09/2014 03:31 PM, Jason Merrill wrote: On 10/08/2014 03:47 PM, Paolo Carlini wrote: (check_constexpr_ctor_body): Use it; add bool parameter. This function seems to only be called in one place; why add the parameter? Is also called recursively by check_constexpr_ctor_body_1 and without the complain boolean we end up printing the error message twice. Ah, guess I overlooked that. OK. Thanks, I'm going to commit the patch. I noticed today that given the actual C++11 the error messages we provide: "constexpr constructor does not have empty body" and: "body of constexpr function ‘XXX’ not a return-statement" are rather outdated and misleading. In principle we should probably also provide more fine grained error messages, but if you have suggestions for less misleading catch all, I volunteer to do the change and adjust the testcases... Also, I have been thinking that it would probably make sense to move constexpr-related code to a separate cp/constexpr.c: what do you think? Functions with *constexpr* in the name, the various cxx_eval_* and the various potential_constant_* would qualify, I think. Paolo.
Re: [PATCH i386 AVX512] [65/n] Add rest of VI1-AVX2: mul insn pattern.
On Thu, Oct 9, 2014 at 12:19 PM, Kirill Yukhin wrote: > Hello, > This tiny patch extend mul insn pattern to support > masking. > > Bootstrapped. > AVX-512* tests on top of patch-set all pass > under simulator. > > Is it ok for trunk? > > gcc/ > * config/i386/sse.md > (define_expand "mul3"): Add masking. OK. Thanks, Uros.
Re: [patch] Excessive alignment in ix86_data_alignment
On Thu, Oct 9, 2014 at 1:37 AM, Uros Bizjak wrote: > On Thu, Oct 9, 2014 at 10:25 AM, Kirill Yukhin > wrote: >> On 08 Oct 23:02, Petr Murzin wrote: >>> Hi, >>> I have measured performance impact on Haswell platform according to this >>> input: >>> https://gcc.gnu.org/ml/gcc-patches/2014-06/msg00978.html Kirill, please mention: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61296 in your ChangeLog. > What about older processors? Kirill, please collect data on Nehelam/Westmere, Sandybrigde/Ivybride and Silvermont. > The optimization was introduced well before Haswell for then current > processors, and it was based on the recommendation from Intel > optimization guide. If this optimization doesn't apply for new > processors, then tune option should be introduced and set accordingly. > I believe the original excessive alignment was introduced by cut/paste from https://gcc.gnu.org/git/?p=gcc.git;a=patch;h=ed45e834f305d1f2709bf200a13d5beebc2fcfee to improve x86 FP performance, which might be partially copied from CONSTANT_ALIGNMENT: https://gcc.gnu.org/git/?p=gcc.git;a=patch;h=f7d6703c5d83fc9fb06246d6eb49e9b61098045c -- H.J.
Re: [PATCH i386 AVX512] [66/n] Extend vpalignr insn patterns.
On Thu, Oct 9, 2014 at 12:28 PM, Kirill Yukhin wrote: > Hello, > This patch extends vpalignr insn patterns. > It also introduces dedicated `masked' version of pattern > w/o substing. > > Bootstrapped. > AVX-512* tests on top of patch-set all pass > under simulator. > > Is it ok for trunk? > > gcc/ > * config/i386/sse.md > (define_mode_iterator SSESCALARMODE): Add V4TI mode. > (define_insn "_palignr_mask"): New. > (define_insn "_palignr"): Add EVEX version. OK, although SSESCALARMODE became even more messy ... Just FYI: V1TI in VIMAX_AVX2 iterator is used to prevent moves of TImode values from SSE to general regs on x86_64. The same reasoning applies to V1DI MMX mode on x86_32. Thanks, Uros.
Re: [PATCH i386 AVX512] [67/n] Update constraints in vec_dup insn pattern.
On Thu, Oct 9, 2014 at 12:34 PM, Kirill Yukhin wrote: > Hello, > This tiny patch updates constraints in vec_dup insn > pattern. > > Bootstrapped. > AVX-512* tests on top of patch-set all pass > under simulator. > > Is it ok for trunk? > > gcc/ > * config/i386/sse.md > (define_insn "vec_dup"): Update constraints. OK. Thanks, Uros.
Re: [i386] Replace builtins with vector extensions
On Thu, Oct 9, 2014 at 5:57 AM, Uros Bizjak wrote: > On Thu, Oct 9, 2014 at 2:28 PM, Marc Glisse wrote: >> On Thu, 9 Oct 2014, Uros Bizjak wrote: >> >>> On Thu, Oct 9, 2014 at 12:33 PM, Marc Glisse wrote: Ping https://gcc.gnu.org/ml/gcc-patches/2014-07/msg01812.html (another part of the discussion is around https://gcc.gnu.org/ml/gcc-patches/2014-06/msg02288.html ) Most people who commented seem cautiously in favor. The least favorable was Ulrich who suggested to go with it but keep the old behavior accessible if the user defines some macro (which imho would lose a large part of the simplification benefits of the patch) https://gcc.gnu.org/ml/gcc-patches/2014-06/msg02328.html If this is accepted, I will gladly prepare patches removing the unused builtins and extending this to a few more operations (integer vectors in particular). If this is not the direction we want to go, I'd like to hear it clearly so I can move on... >>> >>> >>> Well, I'm undecided. >> >> >> First, thanks for answering, it helps me a lot to know what others think. >> >>> The current approach is proven to work OK, there is no bugs reported >>> in this area and the performance is apparently OK. There should be >>> clear benefits in order to change something that "ain't broken", and >>> at least some proof that we won't regress in this area with the new >>> approach. >> >> >> There are quite a few enhancement PRs asking for more performance, but >> indeed no (or very few) complaints about correctness or about gcc turning >> their code into something worse than what they wrote, which I completely >> agree weighs more. >> >>> On the other hand, if the new approach opens new optimization >>> opportunities (without regression!), I'm in favor of it, including the >>> fact that new code won't produce equivalent assembly - as long as >>> functionality of the optimized asm stays the same (obviously, I'd >>> say). >>> >>> Please also note that this is quite big project. There are plenty of >>> intrinsics and I for one don't want another partial transition ... >> >> >> That might be an issue : this transition is partial by nature. Many >> intrinsics cannot (easily) be expressed in GIMPLE, and among those that can >> be represented, we only want to change those for which we are confident that >> we will not regress the quality of the code. From the reactions, I would >> assume that we want to be quite conservative at the beginning, and maybe we >> can reconsider some other intrinsics later. >> >> The best I can offer is consistency: if addition of v2df is changed, >> addition of v4df is changed as well (and say any +-*/ of float/double >> vectors of any supported size). Another block would be +-*/% for integer >> vectors. And construction / access (most construction is already >> builtin-free). And remove the unused builtins in the same patch that makes >> them unused. If you don't like those blocks, I can write one mega-patch that >> does all these, if we roughly agree on the list beforehand, so it goes in >> all at once. >> >> Would that be good enough? > > OK, let's go in the proposed way, more detailed: > > - we begin with +-*/ of float/double vectors. IMO, this would result > in a relatively small and easily reviewable patch to iron out the > details of the approach. Alternatively, we can begin with floats only. > - commit the patch and wait for the sky to fall down. > - we play a bit with the compiler to check generated code and corner > cases (some kind of Q/A) and wait if someone finds a problem (say, a > couple of weeks). > - if there are no problems, continue with integer builtins following > the established approach, otherwise we revert everything and go back > to the drawing board. > - repeat the procedure for other builtins. > > I propose to wait a couple of days for possible comments before we get > the ball rolling. > We should also include some testcases to show code improvement for each change. Thanks. -- H.J.
Re: [PATCH i386 AVX512] [68/n] Add vpmullw, vpacksdw, pmaddwd insn patterns.
On Thu, Oct 9, 2014 at 1:07 PM, Kirill Yukhin wrote: > Hello, > This patch extends vpmullw, vpacksdw and pmaddwd > insn patterns. > > Bootstrapped. > AVX-512* tests on top of patch-set all pass > under simulator. > > Is it ok for trunk? > > gcc/ > * config/i386/sse.md > (define_c_enum "unspec"): Add UNSPEC_PMADDWD512. > (define_mode_iterator VI2_AVX2): Add V32HI mode. > (define_expand "mul3"): Add masking. > (define_insn "*mul3"): Ditto. > (define_expand "mul3_highpart"): Ditto. > (define_insn "*mul3_highpart"): Ditto. > (define_insn "avx512bw_pmaddwd512"): New. > (define_mode_attr SDOT_PMADD_SUF): Ditto. > (define_expand "sdot_prod"): Add . > (define_insn "_packssdw"): Add masking. > (define_insn "*_pmulhrsw3"): Ditto. > (define_insn "avx2_packusdw"): Delete. > (define_insn "sse4_1_packusdw"): Ditto. > (define_insn "_packusdw"): New. OK. > + "TARGET_SSE2 > + && ix86_binary_operator_ok (MULT, mode, operands) > + && && " > > Just noticed, that need to swap target check with operads check. No need to worry for minor issues now, but looking at the sse.md, it looks to me like a case for a quick cleanup patch to correct these inconsistencies. Thanks, Uros.