[ARM] Fix CLZ_DEFINED_VALUE_AT_ZERO for vector modes

2014-10-09 Thread Michael Collison


The CLZ_DEFINED_VALUE_AT_ZERO macro is harded to return 32. For the 
vector intrinsic vclz this is incorrect and should return the value 
eight. The CTZ_DEFINED_VALUE_AT_ZERO has the same issue.


Tested on arm-linux-gnueabihf, arm-linux-gnueabi.

2014-10-08  Michael Collison  

* config/arm/arm.h (CLZ_DEFINED_VALUE_AT_ZERO) : Update
to support vector modes
(CTZ_DEFINED_VALUE_AT_ZERO): Ditto

--
Michael Collison
Linaro Toolchain Working Group
michael.colli...@linaro.org

--- ../../../../linaro-gcc4_9_git/gcc/config/arm/arm.h	2014-10-08 13:49:01.109819957 -0700
+++ arm.h	2014-10-08 13:56:19.509841175 -0700
@@ -2138,8 +2138,10 @@ extern int making_const_table;
: reverse_condition (code))
 
 /* The arm5 clz instruction returns 32.  */
-#define CLZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE)  ((VALUE) = 32, 1)
-#define CTZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE)  ((VALUE) = 32, 1)
+#define CLZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) \
+  ((VALUE) = GET_MODE_UNIT_BITSIZE (MODE))
+#define CTZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) \
+  ((VALUE) = GET_MODE_UNIT_BITSIZE (MODE))
 
 #define CC_STATUS_INIT \
   do { cfun->machine->thumb1_cc_insn = NULL_RTX; } while (0)


Re: [ARM] Fix CLZ_DEFINED_VALUE_AT_ZERO for vector modes

2014-10-09 Thread Andrew Pinski
On Thu, Oct 9, 2014 at 12:05 AM, Michael Collison
 wrote:
>
> The CLZ_DEFINED_VALUE_AT_ZERO macro is harded to return 32. For the vector
> intrinsic vclz this is incorrect and should return the value eight. The
> CTZ_DEFINED_VALUE_AT_ZERO has the same issue.


Do you have a testcase?  I almost think you should have a testcase
which causes the constant folding.  Though I don't think there is
constant folding of the vector clz/ctz happening.

Thanks,
Andrew

>
> Tested on arm-linux-gnueabihf, arm-linux-gnueabi.
>
> 2014-10-08  Michael Collison  
>
> * config/arm/arm.h (CLZ_DEFINED_VALUE_AT_ZERO) : Update
> to support vector modes
> (CTZ_DEFINED_VALUE_AT_ZERO): Ditto
>
> --
> Michael Collison
> Linaro Toolchain Working Group
> michael.colli...@linaro.org
>


Re: Towards GNU11

2014-10-09 Thread Marek Polacek
On Wed, Oct 08, 2014 at 08:39:40PM -0600, Jeff Law wrote:
> I like it.  And one could reasonably argue that now is the time to change
> since that maximizes the time for folks to find broken code.

Yep, this is definitely stage1 stuff.  We still have a few weeks, but
I wouldn't want to rush such a change in the nick of time.

> I'd go so far as to conditionally approve -- if other maintainers don't
> shout out in the next week or so against, then I feel this should go
> forward.
 
Thanks.  I will wait at least until the end of next week.

I'd like to hear from Joseph ;).

> I know it's really early, but a "porting to ..." document ought to be
> started and have something in it about these changes.  Both how to fix the
> broken code or how to go back to c89.

Absolutely.  I'll start something up once it's in.  I feel that
especially the inline semantics change should be addressed therein.

Marek


Re: [ARM] Fix CLZ_DEFINED_VALUE_AT_ZERO for vector modes

2014-10-09 Thread Michael Collison


Yes this problem was found with Christophe's neon intrinsic tests which 
are awaiting approval. The problem was found by passing a value of zero 
to the vclz vector intrinsic.


On 10/09/2014 12:11 AM, Andrew Pinski wrote:

On Thu, Oct 9, 2014 at 12:05 AM, Michael Collison
 wrote:

The CLZ_DEFINED_VALUE_AT_ZERO macro is harded to return 32. For the vector
intrinsic vclz this is incorrect and should return the value eight. The
CTZ_DEFINED_VALUE_AT_ZERO has the same issue.


Do you have a testcase?  I almost think you should have a testcase
which causes the constant folding.  Though I don't think there is
constant folding of the vector clz/ctz happening.

Thanks,
Andrew


Tested on arm-linux-gnueabihf, arm-linux-gnueabi.

2014-10-08  Michael Collison  

 * config/arm/arm.h (CLZ_DEFINED_VALUE_AT_ZERO) : Update
 to support vector modes
 (CTZ_DEFINED_VALUE_AT_ZERO): Ditto

--
Michael Collison
Linaro Toolchain Working Group
michael.colli...@linaro.org



--
Michael Collison
Linaro Toolchain Working Group
michael.colli...@linaro.org



[patch] Don't install libquadmath.info if libquadmath is not actually built

2014-10-09 Thread FX
The attached patch prevents libquadmath from installing its info file if the 
library is not actually built.
Tested on x86_64-linux, both on a built with libquadmath (normal) and without 
(tweaking configure so it thinks _float128 is not supported).

OK to commit?




quadmath.ChangeLog
Description: Binary data


quadmath.diff
Description: Binary data


Re: [patch] Don't install libquadmath.info if libquadmath is not actually built

2014-10-09 Thread FX
> The attached patch prevents libquadmath from installing its info file if the 
> library is not actually built.
> Tested on x86_64-linux, both on a built with libquadmath (normal) and without 
> (tweaking configure so it thinks _float128 is not supported).
> 
> OK to commit?

Again, with correct ChangeLog entry including regeneration of Makefile.in



quadmath.ChangeLog
Description: Binary data


quadmath.diff
Description: Binary data


Re: [patch] Don't install libquadmath.info if libquadmath is not actually built

2014-10-09 Thread Jakub Jelinek
On Thu, Oct 09, 2014 at 09:27:55AM +0200, FX wrote:
> > The attached patch prevents libquadmath from installing its info file if 
> > the library is not actually built.
> > Tested on x86_64-linux, both on a built with libquadmath (normal) and 
> > without (tweaking configure so it thinks _float128 is not supported).
> > 
> > OK to commit?
> 
> Again, with correct ChangeLog entry including regeneration of Makefile.in
> 

Index: Makefile.am
===
--- Makefile.am (revision 215887)
+++ Makefile.am (working copy)
@@ -158,10 +158,18 @@
 # the relative path from the current `Makefile.am' to `texinfo.tex'.
 TEXINFO_TEX   = ../gcc/doc/include/texinfo.tex
 
+
 # Defines info, dvi, pdf and html targets
 MAKEINFOFLAGS = -I $(srcdir)/../gcc/doc/include
+
+if BUILD_LIBQUADMATH
 info_TEXINFOS = libquadmath.texi
 libquadmath_TEXINFOS = libquadmath-vers.texi
+else
+info_TEXINFOS = 
+libquadmath_TEXINFOS = 
+endif
 
 libquadmath-vers.texi:
echo "@set BUGURL $(REPORT_BUGS_TEXI)" > $@
+

Please avoid the vertical whitespace changes before MAKEINFOFLAGS and after
libquadmath-vers.texi goal.
Ok with that change.

Jakub


Re: [PATCH, Pointer Bounds Checker 14/x] Passes [2/n] IPA passes

2014-10-09 Thread Ilya Enkovich
2014-10-09 2:40 GMT+04:00 Jan Hubicka :
>>
>> I prevent clone's body from removal and therefore original should call
>> clone (otherwise clone may have no callers and be removed).  I think
>> call edge may work (need to recall other cases when reference do its
>> work) but is it OK to have call with no stmt for non thunk nodes?
>
> OK, I guess I understand what makes your situation different from what we have
> already.  Basically you have two functions with bodies already produced (by
> early optimization) but you want to keep body of the second around
> to later turn the first into thunk.  But I do not see why this can not happen 
> early?

Clones are created not only for functions with bodies.  Aliases and
thunks may also have clones.  Having call edges for aliases and more
than one edge for thunks seemed more intrusive to me than having a new
reference.  Regarding early transformation into thunks - I need to try
it.  It sounds reasonable and might work.

>
> I suppose we could either go with fake call edge or the new reference you 
> have.
> I basically wonder what would be least intrussive (and easiest to maintain) 
> way
> for doing so - adding special kinds of functions always produce extra 
> maintenance
> overhead in long term and we already have very many of them...
>>
>> >
>> >> +instrumented function in the code but it still should be counted
>> >> +as reachable if the original function is reachable.
>> >> +
>> >> +When original function bodies are not needed anymore we release
>> >> +them and transform functions into a special kind of thunks.  Each
>> >> +thunk has a call edge to the instrumented version.  These thunks
>> >> +help to keep externally visible instrumented functions visible
>> >> +when linker reolution files are used.  Linker has no info about
>> >> +connection between original and instrumented function and
>> >> +therefore we may wrongly decide (due to difference in assember
>> >> +names) that instrumented function version is local and can be
>> >> +removed.  */
>> >
>> > Can you, please, give me an example what happens here?
>> > Also how this relates to LTO?
>>
>> When we compile with LTO linker provides a resolution file with
>> functions usage information.  Consider I have external function test
>> and its clone test.chkp.  Linker says I have calls to test and no
>> calls to test.chkp.  Compiler just removes test.chkp then.  It doesn't
>> happen when test is trunsformed into a thunk of test.chkp.
>
> Does the bounds checking work across translation unit boundaries somehow?
> Would it make sense to do the redirection and instrumentation as an IPA pass?

Instrumentation is function local and I don't see benefit from making
it an IPA pass.

Ilya

>
> Honza
>>
>> Ilya
>>
>> >
>> > Honza
>> >> +
>> >> +#define CHKP_BOUNDS_OF_SYMBOL_PREFIX "__chkp_bounds_of_"
>> >> +
>> >> +/* Build a clone of FNDECL with a modified name.  */
>> >> +
>> >> +static tree
>> >> +chkp_build_instrumented_fndecl (tree fndecl)
>> >> +{
>> >> +  tree new_decl = copy_node (fndecl);
>> >> +  tree new_name;
>> >> +  std::string s;
>> >> +
>> >> +  /* We want called_as_built_in recall instrumented calls
>> >> + to instrumented built-in functions.  Therefore use
>> >> + DECL_NAME for cloning instead of DECL_ASSEMBLER_NAME.  */
>> >> +  s = IDENTIFIER_POINTER (DECL_NAME (fndecl));
>> >> +  s += ".chkp";
>> >> +  DECL_NAME (new_decl) = get_identifier (s.c_str ());
>> >> +
>> >> +  /* References to the original and to the instrumented version
>> >> + should look the same in the output assembly.  And we cannot
>> >> + use the same assembler name for the instrumented version
>> >> + because it conflicts with decl merging algorithms in LTO.
>> >> + Achieve the result by using transparent alias name for the
>> >> + instrumented version.  */
>> >> +  s = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (fndecl));
>> >> +  s += ".chkp";
>> >> +  new_name = get_identifier (s.c_str ());
>> >> +  IDENTIFIER_TRANSPARENT_ALIAS (new_name) = 1;
>> >> +  TREE_CHAIN (new_name) = DECL_ASSEMBLER_NAME (fndecl);
>> >> +  SET_DECL_ASSEMBLER_NAME (new_decl, new_name);
>> >> +
>> >> +  /* For functions with body versioning will make a copy of arguments.
>> >> + For functions with no body we need to do it here.  */
>> >> +  if (!gimple_has_body_p (fndecl))
>> >> +DECL_ARGUMENTS (new_decl) = copy_list (DECL_ARGUMENTS (fndecl));
>> >> +
>> >> +  /* We are going to modify attributes list and therefore should
>> >> + make own copy.  */
>> >> +  DECL_ATTRIBUTES (new_decl) = copy_list (DECL_ATTRIBUTES (fndecl));
>> >> +
>> >> +  return new_decl;
>> >> +}
>> >> +
>> >> +
>> >> +/* Fix operands of attribute from ATTRS list named ATTR_NAME.
>> >> +   Integer operands are replaced with values according to
>> >> +   INDEXES map having LEN elements.  For operands out of len
>> >> +   we just add DELTA.  */
>> >> +
>> >> +static void
>> >> +chkp_map_attr_arg_indexes (tree at

[PATCH] Fix PR61969

2014-10-09 Thread Richard Biener

This fixes an invalid NRV optimization done for returning a
non-automatic var.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2014-10-08  Richard Biener  

PR tree-optimization/61969
* tree-nrv.c (pass_nrv::execute): Properly test for automatic
variables.

Index: gcc/tree-nrv.c
===
--- gcc/tree-nrv.c  (revision 215917)
+++ gcc/tree-nrv.c  (working copy)
@@ -216,8 +216,7 @@ pass_nrv::execute (function *fun)
 same type and alignment as the function's result.  */
  if (TREE_CODE (found) != VAR_DECL
  || TREE_THIS_VOLATILE (found)
- || DECL_CONTEXT (found) != current_function_decl
- || TREE_STATIC (found)
+ || !auto_var_in_fn_p (found, current_function_decl)
  || TREE_ADDRESSABLE (found)
  || DECL_ALIGN (found) > DECL_ALIGN (result)
  || !useless_type_conversion_p (result_type,


Re: [ARM] Fix CLZ_DEFINED_VALUE_AT_ZERO for vector modes

2014-10-09 Thread Tejas Belagod

On 09/10/14 08:05, Michael Collison wrote:


The CLZ_DEFINED_VALUE_AT_ZERO macro is harded to return 32. For the
vector intrinsic vclz this is incorrect and should return the value
eight. The CTZ_DEFINED_VALUE_AT_ZERO has the same issue.

Tested on arm-linux-gnueabihf, arm-linux-gnueabi.

2014-10-08  Michael Collison  

  * config/arm/arm.h (CLZ_DEFINED_VALUE_AT_ZERO) : Update
  to support vector modes
  (CTZ_DEFINED_VALUE_AT_ZERO): Ditto



Update comment?

/* The arm5 clz instruction returns 32.  */


Thanks,
Tejas.



Re: [ARM] Fix CLZ_DEFINED_VALUE_AT_ZERO for vector modes

2014-10-09 Thread Michael Collison


Tejas,

You are correct. I will update the comment.

On 10/9/2014 12:55 AM, Tejas Belagod wrote:

On 09/10/14 08:05, Michael Collison wrote:


The CLZ_DEFINED_VALUE_AT_ZERO macro is harded to return 32. For the
vector intrinsic vclz this is incorrect and should return the value
eight. The CTZ_DEFINED_VALUE_AT_ZERO has the same issue.

Tested on arm-linux-gnueabihf, arm-linux-gnueabi.

2014-10-08  Michael Collison 

  * config/arm/arm.h (CLZ_DEFINED_VALUE_AT_ZERO) : Update
  to support vector modes
  (CTZ_DEFINED_VALUE_AT_ZERO): Ditto



Update comment?

/* The arm5 clz instruction returns 32.  */


Thanks,
Tejas.





[PATCH] Fix PR63445

2014-10-09 Thread Richard Biener

This fixes a bogus strict overflow warning from VRPs 
simplify_cond_using_ranges.  I believe that when optimizing
equality compares from (T) x != CST to x != (T) CST we do
not have to care about the fact whether the computation of x
possibly overflowed.

Besides that it looks odd that we guard the transform by
overflow infinities / strict-overflow behavior - those
should only guard the warning.  Jeff - you added the overflow
checking stuff, can you remember why you did that?  (I didn't
adjust those checks to not apply for equality compares with
this patch)

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk
(probably latent on the 4.9 branch but the testcase doesn't fail there).

Thanks,
Richard.

2014-10-08  Richard Biener  

PR tree-optimization/63445
* tree-vrp.c (simplify_cond_using_ranges): Only warn about
overflow for non-equality compares.

* gcc.dg/Wstrict-overflow-26.c: New testcase.

Index: gcc/tree-vrp.c
===
--- gcc/tree-vrp.c  (revision 215917)
+++ gcc/tree-vrp.c  (working copy)
@@ -9189,8 +9189,9 @@ simplify_cond_using_ranges (gimple stmt)
  /* If the range overflowed and the user has asked for warnings
 when strict overflow semantics were used to optimize code,
 issue an appropriate warning.  */
- if ((is_negative_overflow_infinity (vr->min)
-  || is_positive_overflow_infinity (vr->max))
+ if (cond_code != EQ_EXPR && cond_code != NE_EXPR
+ && (is_negative_overflow_infinity (vr->min)
+ || is_positive_overflow_infinity (vr->max))
  && issue_strict_overflow_warning 
(WARN_STRICT_OVERFLOW_CONDITIONAL))
{
  location_t location;
Index: gcc/testsuite/gcc.dg/Wstrict-overflow-26.c
===
--- gcc/testsuite/gcc.dg/Wstrict-overflow-26.c  (revision 0)
+++ gcc/testsuite/gcc.dg/Wstrict-overflow-26.c  (working copy)
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -Wstrict-overflow" } */
+
+int
+f (int i, int j)
+{
+  unsigned int c = 0;  
+  if (i < j)
+{
+  unsigned int n = j - i;
+  unsigned int i;
+  for (i = 0; i < n; i++) /* { dg-bogus "signed overflow" } */
+   c++;
+}
+  return c;
+}


Re: [patch] Excessive alignment in ix86_data_alignment

2014-10-09 Thread Kirill Yukhin
On 08 Oct 23:02, Petr Murzin wrote:
> Hi,
> I have measured performance impact on Haswell platform according to this
> input:
> https://gcc.gnu.org/ml/gcc-patches/2014-06/msg00978.html
Same in plain text:
For `-O2':
TestPrevious Current Ratio(%)
400.perlbench   46.2000 46.2000 +0%
401.bzip2   29. 29. +0%
403.gcc 41. 41.2000 +0.48%
429.mcf 53.7000 53.3000 -0.74%
445.gobmk   32.3000 32.3000 +0%
456.hmmer   32.3000 32.3000 +0%
458.sjeng   36.1000 36.4000 +0.83%
462.libquantum  84.5000 84.3000 -0.23%
464.h264ref 64.4000 64.4000 +0%
471.omnetpp 27.7000 27.9000 +0.72%
473.astar   28.1000 28. -0.35%
483.xalancbmk   49.9000 50.2000 +0.60%
410.bwaves  93.2000 93.4000 +0.21%
416.gamess  41.9000 41.8000 -0.23%
433.milc37.9000 37.8000 -0.26%
434.zeusmp  48.5000 48.4000 -0.20%
435.gromacs 32.5000 32.6000 +0.30%
436.cactusADM   82.8000 80.9000 -2.29%
437.leslie3d55.6000 55.7000 +0.17%
444.namd32.1000 32.1000 +0%
447.dealII  67.5000 67.5000 +0%
450.soplex  48. 47.9000 -0.20%
453.povray  58.3000 58.4000 +0.17%
454.calculix37.6000 37.6000 +0%
459.GemsFDTD44.5000 44.7000 +0.44%
465.tonto   28.6000 28.6000 +0%
470.lbm 67.9000 67.8000 -0.14%
481.wrf 49.4000 49.5000 +0.20%
482.sphinx3 68.7000 68.6000 -0.14%
Geomeans:
INT 41.14   41.18   +0.11%
FP  49.82   49.76   -0.12%
ALL 46.03   46.01   -0.02%

For `-O3':
TestPrevious Current Ratio(%)
400.perlbench   46.2000 46.2000 +0%
401.bzip2   29. 29.1000 +0.34%
403.gcc 41.1000 41.2000 +0.24%
429.mcf 53.4000 54.6000 +2.24%
445.gobmk   32.3000 32.3000 +0%
456.hmmer   32.3000 32.3000 +0%
458.sjeng   36.1000 36.4000 +0.83%
462.libquantum  84. 84.8000 +0.95%
464.h264ref 64.2000 64.3000 +0.15%
471.omnetpp 27.6000 27.8000 +0.72%
473.astar   28. 28. +0%
483.xalancbmk   50.1000 49.8000 -0.59%
410.bwaves  93.5000 93.6000 +0.10%
416.gamess  41.8000 41.9000 +0.23%
433.milc37.8000 37.8000 +0%
434.zeusmp  48.5000 48.4000 -0.20%
435.gromacs 32.6000 32.4000 -0.61%
436.cactusADM   80.4000 81.2000 +0.99%
437.leslie3d55.7000 55.7000 +0%
444.namd32.1000 32.1000 +0%
447.dealII  67.6000 67.6000 +0%
450.soplex  48.6000 48.2000 -0.82%
453.povray  58.3000 58.5000 +0.34%
454.calculix37.6000 37.6000 +0%
459.GemsFDTD44.7000 44.6000 -0.22%
465.tonto   28.6000 28.6000 +0%
470.lbm 68.1000 68.1000 +0%
481.wrf 49.5000 49.6000 +0.20%
482.sphinx3 68.5000 68. -0.72%
Geomeans:
INT 41.08   41.25   +0.41%
FP  49.80   49.78   -0.04%
ALL 45.99   46.06   +0.14%

--
Thanks, K


Re: [PATCH] PR libstdc++/60132, implement the remaining C++11 is_trivially_* traits

2014-10-09 Thread Paolo Carlini

Hi,

On 10/09/2014 08:10 AM, Ville Voutilainen wrote:
Ok, another try (I don't have write-after-approval, so please commit 
the patch once you think it's ok):
Thanks Ville from me too. I'm taking care of the commit with a few very 
minor tweaks:
1- In new library testcases we are consistently using -std=gnu++11 and 
-std=gnu++14 (because the C++98 default is -std=gnu++98)

2- Uniform the dates to 2014-10-09.
3- Adjust the dg-error line numbers of 3 *_neg.cc testcases.

Thanks again,
Paolo.


2014-10-09  Ville Voutilainen  

PR libstdc++/60132
* include/std/type_traits (is_trivially_copyable,
is_trivially_constructible, is_trivially_default_constructible,
is_trivially_copy_constructible, is_trivially_move_constructible,
is_trivially_assignable, is_trivially_copy_assignable,
is_trivially_move_assignable): New.
* testsuite/20_util/is_trivially_assignable/requirements/
typedefs.cc: Likewise.
* testsuite/20_util/is_trivially_assignable/requirements/
explicit_instantiation.cc: Likewise.
* testsuite/20_util/is_trivially_assignable/value.cc: Likewise.
* testsuite/20_util/is_trivially_constructible/requirements/
typedefs.cc: Likewise.
* testsuite/20_util/is_trivially_constructible/requirements/
explicit_instantiation.cc: Likewise.
* testsuite/20_util/is_trivially_constructible/value.cc: Likewise.
* testsuite/20_util/is_trivially_copyable/requirements/
typedefs.cc: Likewise.
* testsuite/20_util/is_trivially_copyable/requirements/
explicit_instantiation.cc: Likewise.
* testsuite/20_util/is_trivially_copyable/value.cc: Likewise.
* testsuite/20_util/is_trivially_copy_assignable/requirements/
typedefs.cc: Likewise.
* testsuite/20_util/is_trivially_copy_assignable/requirements/
explicit_instantiation.cc: Likewise.
* testsuite/20_util/is_trivially_copy_assignable/value.cc: Likewise.
* testsuite/20_util/is_trivially_copy_constructible/requirements/
typedefs.cc: Likewise.
* testsuite/20_util/is_trivially_copy_constructible/requirements/
explicit_instantiation.cc: Likewise.
* testsuite/20_util/is_trivially_copy_constructible/value.cc: Likewise.
* testsuite/20_util/is_trivially_default_constructible/requirements/
typedefs.cc: Likewise.
* testsuite/20_util/is_trivially_default_constructible/requirements/
explicit_instantiation.cc: Likewise.
* testsuite/20_util/is_trivially_default_constructible/
value.cc: Likewise.
* testsuite/20_util/is_trivially_move_assignable/requirements/
typedefs.cc: Likewise.
* testsuite/20_util/is_trivially_move_assignable/requirements/
explicit_instantiation.cc: Likewise.
* testsuite/20_util/is_trivially_move_assignable/value.cc: Likewise.
* testsuite/20_util/is_trivially_move_constructible/requirements/
typedefs.cc: Likewise.
* testsuite/20_util/is_trivially_move_constructible/requirements/
explicit_instantiation.cc: Likewise.
* testsuite/20_util/is_trivially_move_constructible/value.cc:
Likewise.
* testsuite/20_util/declval/requirements/1_neg.cc: Adjust dg-error
line number.
* testsuite/20_util/make_signed/requirements/typedefs_neg.cc:
Likewise.
* testsuite/20_util/make_unsigned/requirements/typedefs_neg.cc:
Likewise.
Index: include/std/type_traits
===
--- include/std/type_traits (revision 216027)
+++ include/std/type_traits (working copy)
@@ -606,7 +606,11 @@
 : public integral_constant
 { };
 
-  // is_trivially_copyable (still unimplemented)
+  // is_trivially_copyable
+  template
+struct is_trivially_copyable
+: public integral_constant
+{ };
 
   /// is_standard_layout
   template
@@ -1282,20 +1286,59 @@
 : public __is_nt_move_assignable_impl<_Tp>
 { };
 
-  /// is_trivially_constructible (still unimplemented)
+  /// is_trivially_constructible
+  template
+struct is_trivially_constructible
+: public __and_, integral_constant>::type
+{ };
   
-  /// is_trivially_default_constructible (still unimplemented)
+  /// is_trivially_default_constructible
+  template
+struct is_trivially_default_constructible
+: public is_trivially_constructible<_Tp>::type
+{ };
 
-  /// is_trivially_copy_constructible (still unimplemented)
+  /// is_trivially_copy_constructible
+  template
+struct is_trivially_copy_constructible
+: public __and_, 
+   integral_constant>::type
+{ };
+  
+  /// is_trivially_move_constructible
+  template
+struct is_trivially_move_constructible
+: public __and_, 
+   integral_constant>::type
+{ };
 
-  /// is_trivially_move_constructible (still unimplemented)
+  /// is_tri

Re: RFA: AVR: add infrastructure for device packages

2014-10-09 Thread Denis Chertykov
2014-10-08 21:50 GMT+04:00 Joern Rennecke :
> As the steering commitee still hasn't spoken on the maintainership issue,
> apparently this still has to go the write-after-approval route.
>
> The purpose of this patch is to make it possible to add support for new
> devices (MCUs) to the AVR toolchain, without having to re-build the
> entire toolchain.  This capability is desirable because new MCUs are added
> fairly frequently.
>
> There are multiple parts of the toolchain involved.
> gcc changes multilibbing to key off the new -march option; the -mmcu option
> is translated via DRIVER_SELF_SPECS into a -specs option, and the
> individual spec files contain the required settings like -march, and various
> more detailed settings (some of which are for new options).
>
> binutils provides new relocation and relaxation facilities to allow referring
> symbolically to symbol differences and/or I/O addresses.
> avr-libc puts the device-specifc header settings in avr/io*.h, and a few
> small device-specific likbale functions into a device-specific library.
>
> The other toolchain parts are staged here:
> g...@github.com:embecosm/avr-binutils-gdb.git avr-mainline
> g...@github.com:embecosm/avr-libc.git avr-libc-embecosm-mainline
>
>
> Attached is the GCC patch for the basic device package infrastructure.
> OK to apply?


Please, apply.

Denis.


Re: [patch] Excessive alignment in ix86_data_alignment

2014-10-09 Thread Uros Bizjak
On Thu, Oct 9, 2014 at 10:25 AM, Kirill Yukhin  wrote:
> On 08 Oct 23:02, Petr Murzin wrote:
>> Hi,
>> I have measured performance impact on Haswell platform according to this
>> input:
>> https://gcc.gnu.org/ml/gcc-patches/2014-06/msg00978.html

What about older processors?

The optimization was introduced well before Haswell for then current
processors, and it was based on the recommendation from Intel
optimization guide. If this optimization doesn't apply for new
processors, then tune option should be introduced and set accordingly.

Uros.


[DOC PATCH] Add documentation for -fsanitize=enum,bool

2014-10-09 Thread Marek Polacek
We are missing documentation for -fsanitize=bool and -fsanitize=enum,
so I've put something together.

Ok for trunk?

2014-10-09  Marek Polacek  

* doc/invoke.texi: Document -fsanitize=bool and -fsanitize=enum.

diff --git gcc/doc/invoke.texi gcc/doc/invoke.texi
index 5fe7e15..8f3eb16 100644
--- gcc/doc/invoke.texi
+++ gcc/doc/invoke.texi
@@ -5604,6 +5604,19 @@ This option enables instrumentation of return statements 
in functions
 marked with @code{returns_nonnull} function attribute, to detect returning
 of null values from such functions.
 
+@item -fsanitize=bool
+@opindex fsanitize=bool
+
+This option enables instrumentation of loads from bool.  If a value other
+than 0/1 is loaded, a run-time error is issued.
+
+@item -fsanitize=enum
+@opindex fsanitize=enum
+
+This option enables instrumentation of loads from an enum type.  If
+a value outside the range of values for the enum type is loaded,
+a run-time error is issued.
+
 @end table
 
 While @option{-ftrapv} causes traps for signed overflows to be emitted,

Marek


Re: [DOC PATCH] Add documentation for -fsanitize=enum,bool

2014-10-09 Thread Jakub Jelinek
On Thu, Oct 09, 2014 at 10:47:34AM +0200, Marek Polacek wrote:
> We are missing documentation for -fsanitize=bool and -fsanitize=enum,
> so I've put something together.
> 
> Ok for trunk?
> 
> 2014-10-09  Marek Polacek  
> 
>   * doc/invoke.texi: Document -fsanitize=bool and -fsanitize=enum.

Ok, thanks.

Jakub


[PING] [4.8 & 4.9] Backport of r211885

2014-10-09 Thread Yangfei (Felix)
PING?


> 
> The enclosed patch for 4.8 & 4.9 branch is a backport of r211885 from trunk.
> 
> The only change is to use:
> 
> for (def_rec = DF_INSN_INFO_DEFS (insn_info); *def_rec; def_rec++)
> 
> other than the new FOR_EACH_INSN_INFO_DEF interface.
> 
> Bootstrapped on x86_64-SUSE-Linux for both branches. OK to apply?
> 
> 
> Index: gcc/loop-invariant.c
> 
> ===
> --- gcc/loop-invariant.c(revision 216001)
> +++ gcc/loop-invariant.c(working copy)
> @@ -839,6 +839,41 @@ check_dependencies (rtx insn, bitmap depends_on)
>return true;
>  }
> 
> +/* Pre-check candidate DEST to skip the one which can not make a valid insn
> +   during move_invariant_reg.  SIMPLE is to skip HARD_REGISTER.  */
> +
> +static bool
> +pre_check_invariant_p (bool simple, rtx dest) {
> +  if (simple && REG_P (dest) && DF_REG_DEF_COUNT (REGNO (dest)) > 1)
> +{
> +  df_ref use;
> +  rtx ref;
> +  unsigned int i = REGNO (dest);
> +  struct df_insn_info *insn_info;
> +  df_ref *def_rec;
> +
> +  for (use = DF_REG_USE_CHAIN (i); use; use = DF_REF_NEXT_REG
> (use))
> +{
> +  ref = DF_REF_INSN (use);
> +  insn_info = DF_INSN_INFO_GET (ref);
> +
> +  for (def_rec = DF_INSN_INFO_DEFS (insn_info); *def_rec; def_rec++)
> +if (DF_REF_REGNO (*def_rec) == i)
> +  {
> +/* Multi definitions at this stage, most likely are due to
> +   instruction constraints, which requires both read and write
> +   on the same register.  Since move_invariant_reg is not
> +   powerful enough to handle such cases, just ignore the INV
> +   and leave the chance to others.  */
> +return false;
> +  }
> +}
> +}
> +
> +  return true;
> +}
> +
>  /* Finds invariant in INSN.  ALWAYS_REACHED is true if the insn is always
> executed.  ALWAYS_EXECUTED is true if the insn is always executed,
> unless the program ends due to a function call.  */ @@ -869,6 +904,7
> @@ find_invariant_insn (rtx insn, bool always_reached
>  simple = false;
> 
>if (!may_assign_reg_p (SET_DEST (set))
> +  || !pre_check_invariant_p (simple, dest)
>|| !check_maybe_invariant (SET_SRC (set)))
>  return;
> 
> 
> Cheers,
> Felix


Re: [4.8 & 4.9] Backport of r211885

2014-10-09 Thread Jakub Jelinek
On Wed, Oct 08, 2014 at 11:00:24PM +0800, Felix Yang wrote:
> The enclosed patch for 4.8 & 4.9 branch is a backport of r211885 from trunk.
> 
> The only change is to use:
> 
> for (def_rec = DF_INSN_INFO_DEFS (insn_info); *def_rec; def_rec++)
> 
> other than the new FOR_EACH_INSN_INFO_DEF interface.
> 
> Bootstrapped on x86_64-SUSE-Linux for both branches. OK to apply?

ChangeLog entry is missing, plus description why do you want to backport it.
If it fixes a bug on the branches, it would be better to have a bugzilla
PR for that, and definitely a testcase.

> --- gcc/loop-invariant.c(revision 216001)
> +++ gcc/loop-invariant.c(working copy)
> @@ -839,6 +839,41 @@ check_dependencies (rtx insn, bitmap depends_on)
>return true;
>  }
> 
> +/* Pre-check candidate DEST to skip the one which can not make a valid insn
> +   during move_invariant_reg.  SIMPLE is to skip HARD_REGISTER.  */
> +
> +static bool
> +pre_check_invariant_p (bool simple, rtx dest)
> +{
> +  if (simple && REG_P (dest) && DF_REG_DEF_COUNT (REGNO (dest)) > 1)
> +{
> +  df_ref use;
> +  rtx ref;
> +  unsigned int i = REGNO (dest);
> +  struct df_insn_info *insn_info;
> +  df_ref *def_rec;
> +
> +  for (use = DF_REG_USE_CHAIN (i); use; use = DF_REF_NEXT_REG (use))
> +{
> +  ref = DF_REF_INSN (use);
> +  insn_info = DF_INSN_INFO_GET (ref);
> +
> +  for (def_rec = DF_INSN_INFO_DEFS (insn_info); *def_rec; def_rec++)
> +if (DF_REF_REGNO (*def_rec) == i)
> +  {
> +/* Multi definitions at this stage, most likely are due to
> +   instruction constraints, which requires both read and write
> +   on the same register.  Since move_invariant_reg is not
> +   powerful enough to handle such cases, just ignore the INV
> +   and leave the chance to others.  */
> +return false;
> +  }
> +}
> +}
> +
> +  return true;
> +}
> +
>  /* Finds invariant in INSN.  ALWAYS_REACHED is true if the insn is always
> executed.  ALWAYS_EXECUTED is true if the insn is always executed,
> unless the program ends due to a function call.  */
> @@ -869,6 +904,7 @@ find_invariant_insn (rtx insn, bool always_reached
>  simple = false;
> 
>if (!may_assign_reg_p (SET_DEST (set))
> +  || !pre_check_invariant_p (simple, dest)
>|| !check_maybe_invariant (SET_SRC (set)))
>  return;
> 
> 
> Cheers,
> Felix

Jakub


Re: [fortran,patch] Emit code for some IEEE functions in the front-end

2014-10-09 Thread Janne Blomqvist
Ok, thanks for the patch!

On Sat, Oct 4, 2014 at 12:55 PM, FX  wrote:
> ping
>
>
>> Hi all,
>>
>> The attached patch improves our code generation for some of the 
>> IEEE_ARITHMETIC functions: some testing functions (is*) and some arithmetic 
>> (logb, rint, scalb, …). The interfaces are still present in the module file 
>> generated as part of the library (which allows, in particular, for accurate 
>> testing of the extent of support we have), but we catch them before emitting 
>> the actual function call and emit front-end-generated code instead. This 
>> code uses some intrinsics that we didn’t use in the front-end so far (some 
>> type generic, some not), so I have added them (logb, remainder, rint, 
>> signbit).
>>
>> The patch is nice because it improves the quality of the code generated, 
>> eliminating in many cases the need for a function call. It is also a 
>> prerequisite to extend our IEEE support to more floating-point types 
>> (extended precision and binary128, on some targets including i386/x86_64). 
>> Without it, we would have a combinatorial explosion of the number of 
>> “helper” functions in the library.
>>
>> Also, I’m removing symbols from gfortran.map, but no branching/release has 
>> occurred since I added them in the first place: it should be all good.
>>
>> Regtested on x86_64-apple-darwin14. This regresses ieee_2.f90, at -m32 
>> -fno-float-store only, where we seem to trigger a missimplification of 
>> __builtin_rint(). I’ll send, just after this one, a mail to gcc to get some 
>> help on that, and track the issue separately.
>>
>> OK to commit?
>> FX
>



-- 
Janne Blomqvist


[PATCH] Fix PR63380

2014-10-09 Thread Richard Biener

The following fixes PR63380 by making sure to not ignore unused
but possibly trapping stmts when comparing basic-blocks.

Bootstrap / regtest running on x86_64-unknown-linux-gnu.

Richard.

2014-10-09  Richard Biener  

PR tree-optimization/63380
* tree-ssa-tail-merge.c (stmt_local_def): Exclude stmts that
may trap.

* gcc.dg/torture/pr63380-1.c: New testcase.

Index: gcc/tree-ssa-tail-merge.c
===
--- gcc/tree-ssa-tail-merge.c   (revision 215917)
+++ gcc/tree-ssa-tail-merge.c   (working copy)
@@ -313,9 +313,9 @@ stmt_local_def (gimple stmt)
   tree val;
   def_operand_p def_p;
 
-  if (gimple_has_side_effects (stmt)
-  || stmt_could_throw_p (stmt)
-  || gimple_vdef (stmt) != NULL_TREE)
+  if (gimple_vdef (stmt) != NULL_TREE
+  || gimple_has_side_effects (stmt)
+  || gimple_could_trap_p_1 (stmt, false, false))
 return false;
 
   def_p = SINGLE_SSA_DEF_OPERAND (stmt, SSA_OP_DEF);
Index: gcc/testsuite/gcc.dg/torture/pr63380-1.c
===
--- gcc/testsuite/gcc.dg/torture/pr63380-1.c(revision 0)
+++ gcc/testsuite/gcc.dg/torture/pr63380-1.c(working copy)
@@ -0,0 +1,15 @@
+/* { dg-do run } */
+
+int a = 0, b = 1, c = 0, d = 1, e, f, g, h;
+int
+main ()
+{
+  e = 1 >> d;
+  f = ((31 / (1 > e)) || c) / 2;
+  g = b || a;
+  h = 31 / g;
+  if (!h)
+__builtin_abort();
+  return 0;
+}
+
Index: gcc/testsuite/gcc.dg/torture/pr63380-2.c
===
--- gcc/testsuite/gcc.dg/torture/pr63380-2.c(revision 0)
+++ gcc/testsuite/gcc.dg/torture/pr63380-2.c(working copy)
@@ -0,0 +1,10 @@
+/* { dg-do run } */
+
+int a = 0, b = 0, c = 0, d, e;
+int
+main (void)
+{
+  d = ((20 % (1 != b)) && c) + 2147483647;
+  e = 20 % (a >= 0);
+  return 0;
+}


Re: [4.8 & 4.9] Backport of r211885

2014-10-09 Thread Yangfei (Felix)
> > On Wed, Oct 08, 2014 at 11:00:24PM +0800, Felix Yang wrote:
> > The enclosed patch for 4.8 & 4.9 branch is a backport of r211885 from trunk.
> >
> > The only change is to use:
> >
> > for (def_rec = DF_INSN_INFO_DEFS (insn_info); *def_rec; def_rec++)
> >
> > other than the new FOR_EACH_INSN_INFO_DEF interface.
> >
> > Bootstrapped on x86_64-SUSE-Linux for both branches. OK to apply?
> 
> ChangeLog entry is missing, plus description why do you want to backport it.
> If it fixes a bug on the branches, it would be better to have a bugzilla PR 
> for that,
> and definitely a testcase.
> 

Yeah, I will add a ChangeLog entry for this patch when it is committed. 
I encountered the same issue when working on my local customized 4.8/4.9 
branches. Not reproduceable with the official 4.8/4.9 branches. 
I thinks it's just an enhancement for the loop invariant pass to make it more 
versatile. It's better that 4.8/4.9 branches also inlcude this enhancement. 
OK? 


[PATCH] Fix PR63379

2014-10-09 Thread Richard Biener

This fixes SLP vectorization of a MIN/MAX reduction where we choose
a "neutral" element as the initial value of the first SLP group
member.  That's of course wrong.  Simply don't choose one in which
case the proper initial values will used.

Bootstrap / regtest running on x86_64-unknown-linux-gnu.

Richard.

2014-10-09  Richard Biener  

PR tree-optimization/63379
* tree-vect-slp.c (vect_get_constant_vectors): Do not compute
a neutral operand for min/max.

* gcc.dg/vect/pr63379.c: New testcase.

Index: gcc/tree-vect-slp.c
===
--- gcc/tree-vect-slp.c (revision 215917)
+++ gcc/tree-vect-slp.c (working copy)
@@ -2395,13 +2395,8 @@ vect_get_constant_vectors (tree op, slp_
 neutral_op = build_int_cst (TREE_TYPE (op), -1);
 break;
 
-  case MAX_EXPR:
-  case MIN_EXPR:
-def_stmt = SSA_NAME_DEF_STMT (op);
-loop = (gimple_bb (stmt))->loop_father;
-neutral_op = PHI_ARG_DEF_FROM_EDGE (def_stmt,
-loop_preheader_edge (loop));
-break;
+ /* For MIN/MAX we don't have an easy neutral operand but
+the initial values can be used fine here.  */
 
   default:
 neutral_op = NULL;
Index: gcc/testsuite/gcc.dg/vect/pr63379.c
===
--- gcc/testsuite/gcc.dg/vect/pr63379.c (revision 0)
+++ gcc/testsuite/gcc.dg/vect/pr63379.c (working copy)
@@ -0,0 +1,43 @@
+/* PR tree-optimization/63379  */
+/* { dg-do run } */
+
+#include "tree-vect.h"
+
+extern void abort (void);
+
+typedef struct {
+int x;
+int y;
+} Point;
+
+Point pt_array[25];
+
+void __attribute__((noinline,noclone))
+generate_array(void)
+{
+  unsigned int i;
+  for (i = 0; i<25; i++)
+{
+  pt_array[i].x = i;
+  pt_array[i].y = 1000+i;
+}
+}
+
+int main()
+{
+  check_vect ();
+  generate_array ();
+  Point min_pt = pt_array[0];
+  Point *ptr, *ptr_end;
+  for (ptr = pt_array+1, ptr_end = pt_array+25; ptr != ptr_end; ++ptr)
+{
+  min_pt.x = (min_pt.x < ptr->x) ? min_pt.x : ptr->x;
+  min_pt.y = (min_pt.y < ptr->y) ? min_pt.y : ptr->y;
+}
+
+  if (min_pt.x != 0 || min_pt.y != 1000)
+abort ();
+  return 0;
+}
+
+/* { dg-final { cleanup-tree-dump "vect" } } */


Re: [4.8 & 4.9] Backport of r211885

2014-10-09 Thread Jakub Jelinek
On Thu, Oct 09, 2014 at 09:04:49AM +, Yangfei (Felix) wrote:
> > > On Wed, Oct 08, 2014 at 11:00:24PM +0800, Felix Yang wrote:
> > > The enclosed patch for 4.8 & 4.9 branch is a backport of r211885 from 
> > > trunk.
> > >
> > > The only change is to use:
> > >
> > > for (def_rec = DF_INSN_INFO_DEFS (insn_info); *def_rec; def_rec++)
> > >
> > > other than the new FOR_EACH_INSN_INFO_DEF interface.
> > >
> > > Bootstrapped on x86_64-SUSE-Linux for both branches. OK to apply?
> > 
> > ChangeLog entry is missing, plus description why do you want to backport it.
> > If it fixes a bug on the branches, it would be better to have a bugzilla PR 
> > for that,
> > and definitely a testcase.
> > 
> 
> Yeah, I will add a ChangeLog entry for this patch when it is committed. 
> I encountered the same issue when working on my local customized 4.8/4.9 
> branches. Not reproduceable with the official 4.8/4.9 branches. 
> I thinks it's just an enhancement for the loop invariant pass to make it more 
> versatile. It's better that 4.8/4.9 branches also inlcude this enhancement. 
> OK? 

If it is just an enhancement, then those generally are not backported to
release branches (exceptions possible of course, but there needs to be a
strong reason).  Each pass has some risk of breaking something, exposing
previously only latent bugs in later passes etc.

Jakub


Re: [PATCH 5/5] add libcc1

2014-10-09 Thread Phil Muldoon
On 31/07/14 05:47, Jeff Law wrote:
> On 06/19/14 14:52, Tom Tromey wrote:
>> Here's a new version of patch #5.
>> I've removed the generated code; let's see if it gets through without
>> compression.
>>
>> I think this addresses all the reviews:
>>
>> * It uses gcc-plugin.m4 to disable the plugin
>> * It does some configure checks for needed functionality, and disables
>>the plugin if they are not found
>> * libcc1 and the plugin now do a protocol version handshake at
>>startup
>> * The diagnostic overriding code is now in the plugin, not in gcc proper
>> * gdb now tells libcc1 about the target triplet, and libcc1 uses
>>this to invoke the proper GCC.  This is done by (ewww) searching $PATH.
>>
>> Tom
>>
>> 2014-06-19  Phil Muldoon  
>> Tom Tromey  

> So my biggest concern here is long term maintenance -- who's going to own 
> care and feeding of these bits over time.

Sorry for taking so long to reply.  We've talked, on irc and elsewhere
a little (some at the Cauldron too!).  I think the consensus is as
nobody has explicitly mentioned anything, this is OK to go in?  So to
be a pain, I think we should get an archivable "OK to check-in" and
that I, or other members of the Red Hat team (or anyone else that
comes along that is interested), will maintain this.  FWIW, I don't
really see bit-rot as an issue because 1) I'll be around and so will
other hackers working on this -- I think it is very important to GDB;
2) It's not really a patch-set I think is horribly susceptible to bit
rot anyway.

> My inclination is to go ahead and approve, but explicitly note that if the 
> bits do start to rot that we'll be fairly aggressive at disabling/removing 
> them.

That's a fair condition and I can happily live with that. Agreed on
the conditional here.

> Now that my position is out there for everyone to see, give the other 
> maintainers a few days (say until Monday) to chime in with any objections.

Well it's been a few  months, so Monday has long gone ;)

> Obviously if there are no objections and you check in the change, please be 
> on the lookout for any fallout.  I'm particularly concerned about AIX, 
> Solaris and other non-linux platforms.

Noted.

> Does this deserve a mention in the news file?

I am not sure.  All the interface to this is really through GDB.
I'll let someone else tell me yes or no for news.  The patch set I have on my
desk is ready to go, and I believe all alterations have been approved
in previous email threads.

Cheers,

Phil



Re: [PATCH 5/5] add libcc1

2014-10-09 Thread Jakub Jelinek
On Thu, Oct 09, 2014 at 10:07:23AM +0100, Phil Muldoon wrote:
> >> 2014-06-19  Phil Muldoon  
> >> Tom Tromey  
> 
> > So my biggest concern here is long term maintenance -- who's going to own 
> > care and feeding of these bits over time.
> 
> Sorry for taking so long to reply.  We've talked, on irc and elsewhere
> a little (some at the Cauldron too!).  I think the consensus is as
> nobody has explicitly mentioned anything, this is OK to go in?  So to
> be a pain, I think we should get an archivable "OK to check-in" and
> that I, or other members of the Red Hat team (or anyone else that

The series is ok for trunk.  Please retest it before checking in.

> > Does this deserve a mention in the news file?
> 
> I am not sure.  All the interface to this is really through GDB.
> I'll let someone else tell me yes or no for news.  The patch set I have on my
> desk is ready to go, and I believe all alterations have been approved
> in previous email threads.

Yeah, I think it belongs into GDB news file probably, because that is the
place where it is user visible. 

Jakub


[PATCH] Clean up duplicated function seq_cost

2014-10-09 Thread Zhenqiang Chen
Hi,

The are two implementations of seq_cost. The function bodies are exactly the
same. The patch removes one of them and make the other global.

Bootstrap and no make check regression on X86-64.

OK for trunk?

Thanks!
-Zhenqiang

ChangeLog:
2014-10-09  Zhenqiang Chen  

* cfgloopanal.c (seq_cost): Make it global.
* rtl.h (seq_cost): New prototype.
* tree-ssa-loop-ivopts.c (seq_cost): Delete.

diff --git a/gcc/cfgloopanal.c b/gcc/cfgloopanal.c
index 7ea1a5f..dd37aa0 100644
--- a/gcc/cfgloopanal.c
+++ b/gcc/cfgloopanal.c
@@ -304,7 +304,7 @@ get_loop_level (const struct loop *loop)
 
 /* Returns estimate on cost of computing SEQ.  */
 
-static unsigned
+unsigned
 seq_cost (const rtx_insn *seq, bool speed)
 {
   unsigned cost = 0;
diff --git a/gcc/rtl.h b/gcc/rtl.h
index e73f731..b697417 100644
--- a/gcc/rtl.h
+++ b/gcc/rtl.h
@@ -2921,6 +2921,7 @@ extern rtx_insn *find_first_parameter_load (rtx_insn
*, rtx_insn *);
 extern bool keep_with_call_p (const rtx_insn *);
 extern bool label_is_jump_target_p (const_rtx, const rtx_insn *);
 extern int insn_rtx_cost (rtx, bool);
+extern unsigned seq_cost (const rtx_insn *, bool);
 
 /* Given an insn and condition, return a canonical description of
the test being made.  */
diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 400798a..087ca26 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -2842,26 +2842,6 @@ get_use_iv_cost (struct ivopts_data *data, struct
iv_use *use,
   return NULL;
 }
 
-/* Returns estimate on cost of computing SEQ.  */
-
-static unsigned
-seq_cost (rtx_insn *seq, bool speed)
-{
-  unsigned cost = 0;
-  rtx set;
-
-  for (; seq; seq = NEXT_INSN (seq))
-{
-  set = single_set (seq);
-  if (set)
-   cost += set_src_cost (SET_SRC (set), speed);
-  else
-   cost++;
-}
-
-  return cost;
-}
-
 /* Produce DECL_RTL for object obj so it looks like it is stored in memory.
*/
 static rtx
 produce_memory_decl_rtl (tree obj, int *regno)





Re: [4.8 & 4.9] Backport of r211885

2014-10-09 Thread Yangfei (Felix)
> On Thu, Oct 09, 2014 at 09:04:49AM +, Yangfei (Felix) wrote:
> > > > On Wed, Oct 08, 2014 at 11:00:24PM +0800, Felix Yang wrote:
> > > > The enclosed patch for 4.8 & 4.9 branch is a backport of r211885 from
> trunk.
> > > >
> > > > The only change is to use:
> > > >
> > > > for (def_rec = DF_INSN_INFO_DEFS (insn_info); *def_rec; def_rec++)
> > > >
> > > > other than the new FOR_EACH_INSN_INFO_DEF interface.
> > > >
> > > > Bootstrapped on x86_64-SUSE-Linux for both branches. OK to apply?
> > >
> > > ChangeLog entry is missing, plus description why do you want to backport
> it.
> > > If it fixes a bug on the branches, it would be better to have a
> > > bugzilla PR for that, and definitely a testcase.
> > >
> >
> > Yeah, I will add a ChangeLog entry for this patch when it is committed.
> > I encountered the same issue when working on my local customized 4.8/4.9
> branches. Not reproduceable with the official 4.8/4.9 branches.
> > I thinks it's just an enhancement for the loop invariant pass to make it 
> > more
> versatile. It's better that 4.8/4.9 branches also inlcude this enhancement.
> > OK?
> 
> If it is just an enhancement, then those generally are not backported to 
> release
> branches (exceptions possible of course, but there needs to be a strong 
> reason).
> Each pass has some risk of breaking something, exposing previously only latent
> bugs in later passes etc.
> 
>   Jakub

We can treat it as bugfix, as we got incorrect code when it triggers.
It just happens so rarely. Does it worth backporting?



Re: [PATCH] Clean up duplicated function seq_cost

2014-10-09 Thread Richard Biener
On Thu, Oct 9, 2014 at 11:14 AM, Zhenqiang Chen  wrote:
> Hi,
>
> The are two implementations of seq_cost. The function bodies are exactly the
> same. The patch removes one of them and make the other global.
>
> Bootstrap and no make check regression on X86-64.
>
> OK for trunk?

The prototype should go to cfgloopanal.c.

Thanks,
RIchard.

> Thanks!
> -Zhenqiang
>
> ChangeLog:
> 2014-10-09  Zhenqiang Chen  
>
> * cfgloopanal.c (seq_cost): Make it global.
> * rtl.h (seq_cost): New prototype.
> * tree-ssa-loop-ivopts.c (seq_cost): Delete.
>
> diff --git a/gcc/cfgloopanal.c b/gcc/cfgloopanal.c
> index 7ea1a5f..dd37aa0 100644
> --- a/gcc/cfgloopanal.c
> +++ b/gcc/cfgloopanal.c
> @@ -304,7 +304,7 @@ get_loop_level (const struct loop *loop)
>
>  /* Returns estimate on cost of computing SEQ.  */
>
> -static unsigned
> +unsigned
>  seq_cost (const rtx_insn *seq, bool speed)
>  {
>unsigned cost = 0;
> diff --git a/gcc/rtl.h b/gcc/rtl.h
> index e73f731..b697417 100644
> --- a/gcc/rtl.h
> +++ b/gcc/rtl.h
> @@ -2921,6 +2921,7 @@ extern rtx_insn *find_first_parameter_load (rtx_insn
> *, rtx_insn *);
>  extern bool keep_with_call_p (const rtx_insn *);
>  extern bool label_is_jump_target_p (const_rtx, const rtx_insn *);
>  extern int insn_rtx_cost (rtx, bool);
> +extern unsigned seq_cost (const rtx_insn *, bool);
>
>  /* Given an insn and condition, return a canonical description of
> the test being made.  */
> diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
> index 400798a..087ca26 100644
> --- a/gcc/tree-ssa-loop-ivopts.c
> +++ b/gcc/tree-ssa-loop-ivopts.c
> @@ -2842,26 +2842,6 @@ get_use_iv_cost (struct ivopts_data *data, struct
> iv_use *use,
>return NULL;
>  }
>
> -/* Returns estimate on cost of computing SEQ.  */
> -
> -static unsigned
> -seq_cost (rtx_insn *seq, bool speed)
> -{
> -  unsigned cost = 0;
> -  rtx set;
> -
> -  for (; seq; seq = NEXT_INSN (seq))
> -{
> -  set = single_set (seq);
> -  if (set)
> -   cost += set_src_cost (SET_SRC (set), speed);
> -  else
> -   cost++;
> -}
> -
> -  return cost;
> -}
> -
>  /* Produce DECL_RTL for object obj so it looks like it is stored in memory.
> */
>  static rtx
>  produce_memory_decl_rtl (tree obj, int *regno)
>
>
>


Re: [PATCH] Clean up duplicated function seq_cost

2014-10-09 Thread Richard Biener
On Thu, Oct 9, 2014 at 11:20 AM, Richard Biener
 wrote:
> On Thu, Oct 9, 2014 at 11:14 AM, Zhenqiang Chen  
> wrote:
>> Hi,
>>
>> The are two implementations of seq_cost. The function bodies are exactly the
>> same. The patch removes one of them and make the other global.
>>
>> Bootstrap and no make check regression on X86-64.
>>
>> OK for trunk?
>
> The prototype should go to cfgloopanal.c.

Err - cfgloopanal.h of course ;)  Or rather the function sounds misplaced
in cfgloopanal.c.

Richard.

>
> Thanks,
> RIchard.
>
>> Thanks!
>> -Zhenqiang
>>
>> ChangeLog:
>> 2014-10-09  Zhenqiang Chen  
>>
>> * cfgloopanal.c (seq_cost): Make it global.
>> * rtl.h (seq_cost): New prototype.
>> * tree-ssa-loop-ivopts.c (seq_cost): Delete.
>>
>> diff --git a/gcc/cfgloopanal.c b/gcc/cfgloopanal.c
>> index 7ea1a5f..dd37aa0 100644
>> --- a/gcc/cfgloopanal.c
>> +++ b/gcc/cfgloopanal.c
>> @@ -304,7 +304,7 @@ get_loop_level (const struct loop *loop)
>>
>>  /* Returns estimate on cost of computing SEQ.  */
>>
>> -static unsigned
>> +unsigned
>>  seq_cost (const rtx_insn *seq, bool speed)
>>  {
>>unsigned cost = 0;
>> diff --git a/gcc/rtl.h b/gcc/rtl.h
>> index e73f731..b697417 100644
>> --- a/gcc/rtl.h
>> +++ b/gcc/rtl.h
>> @@ -2921,6 +2921,7 @@ extern rtx_insn *find_first_parameter_load (rtx_insn
>> *, rtx_insn *);
>>  extern bool keep_with_call_p (const rtx_insn *);
>>  extern bool label_is_jump_target_p (const_rtx, const rtx_insn *);
>>  extern int insn_rtx_cost (rtx, bool);
>> +extern unsigned seq_cost (const rtx_insn *, bool);
>>
>>  /* Given an insn and condition, return a canonical description of
>> the test being made.  */
>> diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
>> index 400798a..087ca26 100644
>> --- a/gcc/tree-ssa-loop-ivopts.c
>> +++ b/gcc/tree-ssa-loop-ivopts.c
>> @@ -2842,26 +2842,6 @@ get_use_iv_cost (struct ivopts_data *data, struct
>> iv_use *use,
>>return NULL;
>>  }
>>
>> -/* Returns estimate on cost of computing SEQ.  */
>> -
>> -static unsigned
>> -seq_cost (rtx_insn *seq, bool speed)
>> -{
>> -  unsigned cost = 0;
>> -  rtx set;
>> -
>> -  for (; seq; seq = NEXT_INSN (seq))
>> -{
>> -  set = single_set (seq);
>> -  if (set)
>> -   cost += set_src_cost (SET_SRC (set), speed);
>> -  else
>> -   cost++;
>> -}
>> -
>> -  return cost;
>> -}
>> -
>>  /* Produce DECL_RTL for object obj so it looks like it is stored in memory.
>> */
>>  static rtx
>>  produce_memory_decl_rtl (tree obj, int *regno)
>>
>>
>>


RE: [PATCH, C++] Fix PR63366: __complex not equivalent to __complex double in C++

2014-10-09 Thread Thomas Preud'homme
Ping?

> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
> Sent: Monday, September 29, 2014 3:33 PM
> To: gcc-patches@gcc.gnu.org
> Subject: [PATCH, C++] Fix PR63366: __complex not equivalent to
> __complex double in C++
> 
> According to a comment in grokdeclarator in file gcc/cp/decl.c:
> 
> /* If we just have "complex", it is equivalent to
> "complex double", but if any modifiers at all are specified it is
> the complex form of TYPE. E.g, "complex short" is
> "complex short int". */
> 
> Yet, __complex is equivalent to __complex int as shows the following
> testcase:
> 
> #include 
> 
> int
> main (void)
> {
>   return typeid (__complex) != typeid (__complex int);
> }
> 
> The following patch fix the problem.
> 
> 
> ChangeLog are as follows:
> 
> *** gcc/cp/ChangeLog ***
> 
> 2014-09-26  Thomas Preud'homme  
> 
>   PR C++/63366
>   * decl.c (grokdeclarator): Set defaulted_int when defaulting to
> int
>   because type is null.
> 
> *** gcc/testsuite/ChangeLog ***
> 
> 2014-10-26  Thomas Preud'homme  
> 
>   PR C++/63366
>   * g++.dg/torture/pr63366.C: New test.
> 
> 
> diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
> index d26a432..449efdf 100644
> --- a/gcc/cp/decl.c
> +++ b/gcc/cp/decl.c
> @@ -9212,6 +9212,7 @@ grokdeclarator (const cp_declarator *declarator,
>   "ISO C++ forbids declaration of %qs with no type", name);
> 
>type = integer_type_node;
> +  defaulted_int = 1;
>  }
> 
>ctype = NULL_TREE;
> diff --git a/gcc/testsuite/g++.dg/torture/pr63366.C
> b/gcc/testsuite/g++.dg/torture/pr63366.C
> new file mode 100644
> index 000..af59b98
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/torture/pr63366.C
> @@ -0,0 +1,11 @@
> +// { dg-do run }
> +// { dg-options "-fpermissive" }
> +// { dg-prune-output "ISO C\\+\\+ forbids declaration of 'type name'
> with no type" }
> +
> +#include 
> +
> +int
> +main (void)
> +{
> +  return typeid (__complex) != typeid (__complex double);
> +}
> 
> 
> Is this ok for trunk?
> 
> Best regards,
> 
> Thomas Preud'homme
> 
> 
> 






答复: [4.8 & 4.9] Backport of r211885

2014-10-09 Thread Yangfei (Felix)

> > > > On Thu, Oct 09, 2014 at 09:04:49AM +, Yangfei (Felix) wrote:
> > > > > On Wed, Oct 08, 2014 at 11:00:24PM +0800, Felix Yang wrote:
> > > > > The enclosed patch for 4.8 & 4.9 branch is a backport of r211885
> > > > > from
> > trunk.
> > > > >
> > > > > The only change is to use:
> > > > >
> > > > > for (def_rec = DF_INSN_INFO_DEFS (insn_info); *def_rec;
> > > > > def_rec++)
> > > > >
> > > > > other than the new FOR_EACH_INSN_INFO_DEF interface.
> > > > >
> > > > > Bootstrapped on x86_64-SUSE-Linux for both branches. OK to apply?
> > > >
> > > > ChangeLog entry is missing, plus description why do you want to
> > > > backport
> > it.
> > > > If it fixes a bug on the branches, it would be better to have a
> > > > bugzilla PR for that, and definitely a testcase.
> > > >
> > >
> > > Yeah, I will add a ChangeLog entry for this patch when it is committed.
> > > I encountered the same issue when working on my local customized
> > > 4.8/4.9
> > branches. Not reproduceable with the official 4.8/4.9 branches.
> > > I thinks it's just an enhancement for the loop invariant pass to
> > > make it more
> > versatile. It's better that 4.8/4.9 branches also inlcude this enhancement.
> > > OK?
> >
> > If it is just an enhancement, then those generally are not backported
> > to release branches (exceptions possible of course, but there needs to be a
> strong reason).
> > Each pass has some risk of breaking something, exposing previously
> > only latent bugs in later passes etc.
> >
> > Jakub
> 
> We can treat it as bugfix, as we got incorrect code when it triggers.
> It just happens so rarely. Does it worth backporting?

And the patch fix this bug by making the loop invariant pass more conservative. 
I didn't find a PR or testcase on trunk for this patch either. 




Re: [patch] Turn 1 lra_assert into 1 gcc_assert

2014-10-09 Thread Eric Botcazou
> OK, I guess I can copy-and-paste reload1.c:spill_failure there.

This generates the same error message than reload1.c:spill_failure (modulo the 
class of the spill register, but it's not mentioned in the asm case either).

Tested on x86_64-suse-linux, OK for the mainline?


2014-10-09  Eric Botcazou  

* lra-assigns.c (assign_by_spills): Error out on spill failure.


-- 
Eric BotcazouIndex: lra-assigns.c
===
--- lra-assigns.c	(revision 215968)
+++ lra-assigns.c	(working copy)
@@ -1286,10 +1286,9 @@ assign_by_spills (void)
 	break;
   if (iter > 0)
 	{
-	  /* We did not assign hard regs to reload pseudos after two
-	 iteration.  It means something is wrong with asm insn
-	 constraints.  Report it.  */
-	  bool asm_p = false;
+	  /* We did not assign hard regs to reload pseudos after two iterations.
+	 Either it's an asm and something is wrong with the constraints, or
+	 we have run out of spill registers; error out in either case.  */
 	  bitmap_head failed_reload_insns;
 
 	  bitmap_initialize (&failed_reload_insns, ®_obstack);
@@ -1299,7 +1298,7 @@ assign_by_spills (void)
 	  bitmap_ior_into (&failed_reload_insns,
 			   &lra_reg_info[regno].insn_bitmap);
 	  /* Assign an arbitrary hard register of regno class to
-		 avoid further trouble with the asm insns.  */
+		 avoid further trouble with this insn.  */
 	  bitmap_clear_bit (&all_spilled_pseudos, regno);
 	  assign_hard_regno
 		(ira_class_hard_regs[regno_allocno_class_array[regno]][0],
@@ -1310,7 +1309,6 @@ assign_by_spills (void)
 	  insn = lra_insn_recog_data[u]->insn;
 	  if (asm_noperands (PATTERN (insn)) >= 0)
 		{
-		  asm_p = true;
 		  error_for_asm (insn,
  "% operand has impossible constraints");
 		  /* Avoid further trouble with this insn.
@@ -1331,8 +1329,12 @@ assign_by_spills (void)
 		  lra_set_insn_deleted (insn);
 		}
 		}
+	  else
+		{
+		  error ("unable to find a register to spill");
+		  fatal_insn ("this is the insn:", insn);
+		}
 	}
-	  lra_assert (asm_p);
 	  break;
 	}
   /* This is a very rare event.  We can not assign a hard register

Re: [fortran,patch] Emit code for some IEEE functions in the front-end

2014-10-09 Thread FX
> Ok, thanks for the patch!

Committed as rev. 216036.
Thanks for the review.

FX


[PATCH i386 AVX512] [64/n] Add rest of VI1-AVX2: vpack[us]wb.

2014-10-09 Thread Kirill Yukhin
Hello,
This patch adds rest of vpack instruction patterns.

Bootstrapped.
gcc.target/i386.exp tests on top of patch-set show no regressions.
under simulator.

Is it ok for trunk?

gcc/
* config/i386/sse.md
(define_insn "_packsswb"): Add masking.
(define_insn "_packuswb"): Ditto.

--
Thanks, K

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index d3e9635..594e692 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -11007,21 +11007,21 @@
   DONE;
 })
 
-(define_insn "_packsswb"
-  [(set (match_operand:VI1_AVX2 0 "register_operand" "=x,x")
+(define_insn "_packsswb"
+  [(set (match_operand:VI1_AVX2 0 "register_operand" "=x,v")
(vec_concat:VI1_AVX2
  (ss_truncate:
-   (match_operand: 1 "register_operand" "0,x"))
+   (match_operand: 1 "register_operand" "0,v"))
  (ss_truncate:
-   (match_operand: 2 "nonimmediate_operand" "xm,xm"]
-  "TARGET_SSE2"
+   (match_operand: 2 "nonimmediate_operand" "xm,vm"]
+  "TARGET_SSE2 &&  && "
   "@
packsswb\t{%2, %0|%0, %2}
-   vpacksswb\t{%2, %1, %0|%0, %1, %2}"
+   vpacksswb\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,avx")
(set_attr "type" "sselog")
(set_attr "prefix_data16" "1,*")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "orig,maybe_evex")
(set_attr "mode" "")])
 
 (define_insn "_packssdw"
@@ -11041,17 +11041,17 @@
(set_attr "prefix" "orig,vex")
(set_attr "mode" "")])
 
-(define_insn "_packuswb"
-  [(set (match_operand:VI1_AVX2 0 "register_operand" "=x,x")
+(define_insn "_packuswb"
+  [(set (match_operand:VI1_AVX2 0 "register_operand" "=x,v")
(vec_concat:VI1_AVX2
  (us_truncate:
-   (match_operand: 1 "register_operand" "0,x"))
+   (match_operand: 1 "register_operand" "0,v"))
  (us_truncate:
-   (match_operand: 2 "nonimmediate_operand" "xm,xm"]
-  "TARGET_SSE2"
+   (match_operand: 2 "nonimmediate_operand" "xm,vm"]
+  "TARGET_SSE2 &&  && "
   "@
packuswb\t{%2, %0|%0, %2}
-   vpackuswb\t{%2, %1, %0|%0, %1, %2}"
+   vpackuswb\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,avx")
(set_attr "type" "sselog")
(set_attr "prefix_data16" "1,*")


[PATCH i386 AVX512] [65/n] Add rest of VI1-AVX2: mul insn pattern.

2014-10-09 Thread Kirill Yukhin
Hello,
This tiny patch extend mul insn pattern to support
masking.

Bootstrapped.
AVX-512* tests on top of patch-set all pass
under simulator.

Is it ok for trunk?

gcc/
* config/i386/sse.md
(define_expand "mul3"): Add masking.

--
Thanks, K

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 594e692..a3b2477 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -9238,11 +9238,11 @@
(set_attr "prefix" "orig,maybe_evex")
(set_attr "mode" "TI")])
 
-(define_expand "mul3"
+(define_expand "mul3"
   [(set (match_operand:VI1_AVX2 0 "register_operand")
(mult:VI1_AVX2 (match_operand:VI1_AVX2 1 "register_operand")
   (match_operand:VI1_AVX2 2 "register_operand")))]
-  "TARGET_SSE2"
+  "TARGET_SSE2 &&  && "
 {
   ix86_expand_vecop_qihi (MULT, operands[0], operands[1], operands[2]);
   DONE;


[PATCH i386 AVX512] [66/n] Extend vpalignr insn patterns.

2014-10-09 Thread Kirill Yukhin
Hello,
This patch extends vpalignr insn patterns.
It also introduces dedicated `masked' version of pattern
w/o substing.

Bootstrapped.
AVX-512* tests on top of patch-set all pass
under simulator.

Is it ok for trunk?

gcc/
* config/i386/sse.md
(define_mode_iterator SSESCALARMODE): Add V4TI mode.
(define_insn "_palignr_mask"): New.
(define_insn "_palignr"): Add EVEX version.

--
Thanks, K

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index a3b2477..79b6012 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -351,7 +351,7 @@
 
 ;; ??? This should probably be dropped in favor of VIMAX_AVX2.
 (define_mode_iterator SSESCALARMODE
-  [(V2TI "TARGET_AVX2") TI])
+  [(V4TI "TARGET_AVX512BW") (V2TI "TARGET_AVX2") TI])
 
 (define_mode_iterator VI12_AVX2
   [(V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX2") V16QI
@@ -13621,11 +13621,33 @@
(set (attr "prefix_rex") (symbol_ref "x86_extended_reg_mentioned_p (insn)"))
(set_attr "mode" "DI")])
 
+(define_insn "_palignr_mask"
+  [(set (match_operand:VI1_AVX2 0 "register_operand" "=v")
+(vec_merge:VI1_AVX2
+ (unspec:VI1_AVX2
+   [(match_operand:VI1_AVX2 1 "register_operand" "v")
+(match_operand:VI1_AVX2 2 "nonimmediate_operand" "vm")
+(match_operand:SI 3 "const_0_to_255_mul_8_operand" "n")]
+   UNSPEC_PALIGNR)
+   (match_operand:VI1_AVX2 4 "vector_move_operand" "0C")
+   (match_operand: 5 "register_operand" "Yk")))]
+  "TARGET_AVX512BW && ( == 64 || TARGET_AVX512VL)"
+{
+  operands[3] = GEN_INT (INTVAL (operands[3]) / 8);
+  return "vpalignr\t{%3, %2, %1, %0%{%5%}%N4|%0%{%5%}%N4, %1, %2, %3}";
+}
+  [(set_attr "type" "sseishft")
+   (set_attr "atom_unit" "sishuf")
+   (set_attr "prefix_extra" "1")
+   (set_attr "length_immediate" "1")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "")])
+
 (define_insn "_palignr"
-  [(set (match_operand:SSESCALARMODE 0 "register_operand" "=x,x")
+  [(set (match_operand:SSESCALARMODE 0 "register_operand" "=x,v")
(unspec:SSESCALARMODE
- [(match_operand:SSESCALARMODE 1 "register_operand" "0,x")
-  (match_operand:SSESCALARMODE 2 "nonimmediate_operand" "xm,xm")
+ [(match_operand:SSESCALARMODE 1 "register_operand" "0,v")
+  (match_operand:SSESCALARMODE 2 "nonimmediate_operand" "xm,vm")
   (match_operand:SI 3 "const_0_to_255_mul_8_operand" "n,n")]
  UNSPEC_PALIGNR))]
   "TARGET_SSSE3"


Re: [i386] Replace builtins with vector extensions

2014-10-09 Thread Marc Glisse

Ping https://gcc.gnu.org/ml/gcc-patches/2014-07/msg01812.html

(another part of the discussion is around 
https://gcc.gnu.org/ml/gcc-patches/2014-06/msg02288.html )


Most people who commented seem cautiously in favor. The least favorable 
was Ulrich who suggested to go with it but keep the old behavior 
accessible if the user defines some macro (which imho would lose a large 
part of the simplification benefits of the patch)

https://gcc.gnu.org/ml/gcc-patches/2014-06/msg02328.html

If this is accepted, I will gladly prepare patches removing the unused 
builtins and extending this to a few more operations (integer vectors in 
particular). If this is not the direction we want to go, I'd like to hear 
it clearly so I can move on...


My main doubt with the current patch is whether it is better to write 
simply (both variables have type __m128d):

__A + __B
or, as we will have to do for integers:
(__m128d)((__v2df)__A + (__v2df)__B)

--
Marc Glisse


[PATCH i386 AVX512] [67/n] Update constraints in vec_dup insn pattern.

2014-10-09 Thread Kirill Yukhin
Hello,
This tiny patch updates constraints in vec_dup insn
pattern.

Bootstrapped.
AVX-512* tests on top of patch-set all pass
under simulator.

Is it ok for trunk?

gcc/
* config/i386/sse.md
(define_insn "vec_dup"): Update constraints.

--
Thanks, K

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 79b6012..f88d3d0 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -16448,9 +16448,9 @@
   [V8SI V8SF V4DI V4DF])
 
 (define_insn "vec_dup"
-  [(set (match_operand:AVX_VEC_DUP_MODE 0 "register_operand" "=x,x,x")
+  [(set (match_operand:AVX_VEC_DUP_MODE 0 "register_operand" "=x,v,x")
(vec_duplicate:AVX_VEC_DUP_MODE
- (match_operand: 1 "nonimmediate_operand" "m,x,?x")))]
+ (match_operand: 1 "nonimmediate_operand" "m,v,?x")))]
   "TARGET_AVX"
   "@
vbroadcast\t{%1, %0|%0, %1}


Re: [patch] Add -static-libquadmath option

2014-10-09 Thread FX
Version 2 of the patch, now handling the darwin case (thanks Iain) and 
expressely noting in the documentation the implications on redistribution 
(thanks Joseph).
Bootstrapped and regtested on x86_64-apple-darwin14.

OK to commit?

I need a C/driver options maintainer, or global reviewer, to OK the C changes 
(but they really are obvious).
As Iain suggested the darwin.h change, I consider it pre-approved by him :)




> We have a -static-libgfortran option, but on targets where we support 
> quad-prec math through libquadmath, we didn’t have an equivalent 
> -static-libquadmath so far. This patch adds it, in what I think is a rather 
> straightforward manner.
> 
> The only minor complication comes from the fact that previously, linking 
> libquadmath was conditionally through libgfortran.spec. So the spec was 
> modified: when we use -static-libquadmath, the driver itself includes the 
> -lquadmath (surrounded by the necessary static linking directives), so that 
> libgfortran.spec shouldn’t do it again.
> 
> Bootstrapped and regtested on x86_64 linux. OK to commit?




static_quad.ChangeLog
Description: Binary data


static_quad.diff
Description: Binary data


Re: 答复: [4.8 & 4.9] Backport of r211885

2014-10-09 Thread Richard Biener
On Thu, Oct 9, 2014 at 11:35 AM, Yangfei (Felix)  wrote:
>
>> > > > On Thu, Oct 09, 2014 at 09:04:49AM +, Yangfei (Felix) wrote:
>> > > > > On Wed, Oct 08, 2014 at 11:00:24PM +0800, Felix Yang wrote:
>> > > > > The enclosed patch for 4.8 & 4.9 branch is a backport of r211885
>> > > > > from
>> > trunk.
>> > > > >
>> > > > > The only change is to use:
>> > > > >
>> > > > > for (def_rec = DF_INSN_INFO_DEFS (insn_info); *def_rec;
>> > > > > def_rec++)
>> > > > >
>> > > > > other than the new FOR_EACH_INSN_INFO_DEF interface.
>> > > > >
>> > > > > Bootstrapped on x86_64-SUSE-Linux for both branches. OK to apply?
>> > > >
>> > > > ChangeLog entry is missing, plus description why do you want to
>> > > > backport
>> > it.
>> > > > If it fixes a bug on the branches, it would be better to have a
>> > > > bugzilla PR for that, and definitely a testcase.
>> > > >
>> > >
>> > > Yeah, I will add a ChangeLog entry for this patch when it is committed.
>> > > I encountered the same issue when working on my local customized
>> > > 4.8/4.9
>> > branches. Not reproduceable with the official 4.8/4.9 branches.
>> > > I thinks it's just an enhancement for the loop invariant pass to
>> > > make it more
>> > versatile. It's better that 4.8/4.9 branches also inlcude this enhancement.
>> > > OK?
>> >
>> > If it is just an enhancement, then those generally are not backported
>> > to release branches (exceptions possible of course, but there needs to be a
>> strong reason).
>> > Each pass has some risk of breaking something, exposing previously
>> > only latent bugs in later passes etc.
>> >
>> > Jakub
>>
>> We can treat it as bugfix, as we got incorrect code when it triggers.
>> It just happens so rarely. Does it worth backporting?
>
> And the patch fix this bug by making the loop invariant pass more 
> conservative.
> I didn't find a PR or testcase on trunk for this patch either.

We at least want a testcase for the "we got incorrect code when it triggers".

Richard.

>


Re: [patch] Turn 1 lra_assert into 1 gcc_assert

2014-10-09 Thread Richard Biener
On Thu, Oct 9, 2014 at 11:39 AM, Eric Botcazou  wrote:
>> OK, I guess I can copy-and-paste reload1.c:spill_failure there.
>
> This generates the same error message than reload1.c:spill_failure (modulo the
> class of the spill register, but it's not mentioned in the asm case either).
>
> Tested on x86_64-suse-linux, OK for the mainline?

Ok.

Thanks,
Richard.

>
> 2014-10-09  Eric Botcazou  
>
> * lra-assigns.c (assign_by_spills): Error out on spill failure.
>
>
> --
> Eric Botcazou


Re: [patch] Add -static-libquadmath option

2014-10-09 Thread Iain Sandoe
Hi FX,

On 9 Oct 2014, at 11:39, FX wrote:

> Version 2 of the patch, now handling the darwin case (thanks Iain) and 
> expressely noting in the documentation the implications on redistribution 
> (thanks Joseph).
> Bootstrapped and regtested on x86_64-apple-darwin14.
> 
> OK to commit?
> 
> I need a C/driver options maintainer, or global reviewer, to OK the C changes 
> (but they really are obvious).
> As Iain suggested the darwin.h change, I consider it pre-approved by him :)

But it still needs to be OK'd by either a global reviewer or one of the listed 
Darwin maintainers ;) ...
... (ccing Mike)

Iain
> 
> 
> 
>> We have a -static-libgfortran option, but on targets where we support 
>> quad-prec math through libquadmath, we didn’t have an equivalent 
>> -static-libquadmath so far. This patch adds it, in what I think is a rather 
>> straightforward manner.
>> 
>> The only minor complication comes from the fact that previously, linking 
>> libquadmath was conditionally through libgfortran.spec. So the spec was 
>> modified: when we use -static-libquadmath, the driver itself includes the 
>> -lquadmath (surrounded by the necessary static linking directives), so that 
>> libgfortran.spec shouldn’t do it again.
>> 
>> Bootstrapped and regtested on x86_64 linux. OK to commit?
> 
> 
> 



Re: [PATCH] Add zero-overhead looping for xtensa backend

2014-10-09 Thread Felix Yang
Hello Sterling,

 My paper work with the FSF has finished and we can now move
forward with this patch :-)
 I rebased the patch on the latest trunk. Attached please find
version 3 of the patch.
 And the enclosed patch also includes the two points pointed by
you, do you like it?
 Make check regression tested with xtensa-elf-gcc built from trunk
with this patch.
 OK to apply?

Index: gcc/ChangeLog
===
--- gcc/ChangeLog(revision 216036)
+++ gcc/ChangeLog(working copy)
@@ -1,3 +1,19 @@
+2014-10-09  Felix Yang  
+
+* config/xtensa/xtensa.h (TARGET_LOOPS): New Macro.
+* config/xtensa/xtensa.c (xtensa_reorg): New.
+(xtensa_reorg_loops): New.
+(xtensa_can_use_doloop_p): New.
+(xtensa_invalid_within_doloop): New.
+(hwloop_optimize): New.
+(hwloop_fail): New.
+(hwloop_pattern_reg): New.
+(xtensa_emit_loop_end): Modified to emit the zero-overhead loop end label.
+(xtensa_doloop_hooks): Define.
+* config/xtensa/xtensa.md (doloop_end): New.
+(zero_cost_loop_start): Rewritten.
+(zero_cost_loop_end): Rewritten.
+
 2014-10-09  Joern Rennecke  

 * config/avr/avr.opt (mmcu=): Change to have a string value.
Index: gcc/config/xtensa/xtensa.md
===
--- gcc/config/xtensa/xtensa.md(revision 216036)
+++ gcc/config/xtensa/xtensa.md(working copy)
@@ -35,6 +35,8 @@
   (UNSPEC_TLS_CALL9)
   (UNSPEC_TP10)
   (UNSPEC_MEMW11)
+  (UNSPEC_LSETUP_START  12)
+  (UNSPEC_LSETUP_END13)

   (UNSPECV_SET_FP1)
   (UNSPECV_ENTRY2)
@@ -1289,41 +1291,67 @@
(set_attr "length""3")])


+;; Zero-overhead looping support.
+
 ;; Define the loop insns used by bct optimization to represent the
-;; start and end of a zero-overhead loop (in loop.c).  This start
-;; template generates the loop insn; the end template doesn't generate
-;; any instructions since loop end is handled in hardware.
+;; start and end of a zero-overhead loop.  This start template generates
+;; the loop insn; the end template doesn't generate any instructions since
+;; loop end is handled in hardware.

 (define_insn "zero_cost_loop_start"
   [(set (pc)
-(if_then_else (eq (match_operand:SI 0 "register_operand" "a")
-  (const_int 0))
-  (label_ref (match_operand 1 "" ""))
-  (pc)))
-   (set (reg:SI 19)
-(plus:SI (match_dup 0) (const_int -1)))]
+(if_then_else (ne (match_operand:SI 0 "register_operand" "a")
+  (const_int 1))
+  (label_ref (match_operand 1 "" ""))
+  (pc)))
+   (set (match_operand:SI 2 "register_operand" "+a0")
+(plus (match_dup 2)
+  (const_int -1)))
+   (unspec [(const_int 0)] UNSPEC_LSETUP_START)]
   ""
-  "loopnez\t%0, %l1"
+  "loop\t%0, %l1_LEND"
   [(set_attr "type""jump")
(set_attr "mode""none")
(set_attr "length""3")])

 (define_insn "zero_cost_loop_end"
   [(set (pc)
-(if_then_else (ne (reg:SI 19) (const_int 0))
-  (label_ref (match_operand 0 "" ""))
-  (pc)))
-   (set (reg:SI 19)
-(plus:SI (reg:SI 19) (const_int -1)))]
+(if_then_else (ne (match_operand:SI 0 "register_operand" "a")
+  (const_int 1))
+  (label_ref (match_operand 1 "" ""))
+  (pc)))
+   (set (match_operand:SI 2 "register_operand" "+a0")
+(plus (match_dup 2)
+  (const_int -1)))
+   (unspec [(const_int 0)] UNSPEC_LSETUP_END)]
   ""
 {
-xtensa_emit_loop_end (insn, operands);
-return "";
+  xtensa_emit_loop_end (insn, operands);
+  return "";
 }
   [(set_attr "type""jump")
(set_attr "mode""none")
(set_attr "length""0")])

+; operand 0 is the loop count pseudo register
+; operand 1 is the label to jump to at the top of the loop
+(define_expand "doloop_end"
+  [(parallel [(set (pc) (if_then_else
+  (ne (match_operand:SI 0 "" "")
+  (const_int 1))
+  (label_ref (match_operand 1 "" ""))
+  (pc)))
+  (set (match_dup 0)
+   (plus:SI (match_dup 0)
+(const_int -1)))
+  (unspec [(const_int 0)] UNSPEC_LSETUP_END)])]
+  ""
+{
+  /* The loop optimizer doesn't check the predicates... */
+  if (GET_MODE (operands[0]) != SImode)
+FAIL;
+})
+

 ;; Setting a register from a comparison.

Index: gcc/config/xtensa/xtensa.c
===
--- gcc/config/xtensa/xtensa.c(revision 216036)
+++ gcc/config/xtensa/xtensa.c(working copy)
@@ -61,6 +61,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimplify.h"
 #include "df.h"
 #include "builtins.h"
+#include "dumpfile.h"
+#include "hw-doloop.h"


 /* Enumeration for all of the rel

Re: 答复: [4.8 & 4.9] Backport of r211885

2014-10-09 Thread Felix Yang
The issue is that I cannot reproduce it on the official released branches.
It happens on my local GCC branch (with a new port).
Let's see if the original author of the patch has an testcase.
Zhenqiang, do you have one that can reproduce this bug with the
official 4.8/4.9 branches?
Thanks.
Cheers,
Felix


On Thu, Oct 9, 2014 at 6:41 PM, Richard Biener
 wrote:
> On Thu, Oct 9, 2014 at 11:35 AM, Yangfei (Felix)  
> wrote:
>>
>>> > > > On Thu, Oct 09, 2014 at 09:04:49AM +, Yangfei (Felix) wrote:
>>> > > > > On Wed, Oct 08, 2014 at 11:00:24PM +0800, Felix Yang wrote:
>>> > > > > The enclosed patch for 4.8 & 4.9 branch is a backport of r211885
>>> > > > > from
>>> > trunk.
>>> > > > >
>>> > > > > The only change is to use:
>>> > > > >
>>> > > > > for (def_rec = DF_INSN_INFO_DEFS (insn_info); *def_rec;
>>> > > > > def_rec++)
>>> > > > >
>>> > > > > other than the new FOR_EACH_INSN_INFO_DEF interface.
>>> > > > >
>>> > > > > Bootstrapped on x86_64-SUSE-Linux for both branches. OK to apply?
>>> > > >
>>> > > > ChangeLog entry is missing, plus description why do you want to
>>> > > > backport
>>> > it.
>>> > > > If it fixes a bug on the branches, it would be better to have a
>>> > > > bugzilla PR for that, and definitely a testcase.
>>> > > >
>>> > >
>>> > > Yeah, I will add a ChangeLog entry for this patch when it is committed.
>>> > > I encountered the same issue when working on my local customized
>>> > > 4.8/4.9
>>> > branches. Not reproduceable with the official 4.8/4.9 branches.
>>> > > I thinks it's just an enhancement for the loop invariant pass to
>>> > > make it more
>>> > versatile. It's better that 4.8/4.9 branches also inlcude this 
>>> > enhancement.
>>> > > OK?
>>> >
>>> > If it is just an enhancement, then those generally are not backported
>>> > to release branches (exceptions possible of course, but there needs to be 
>>> > a
>>> strong reason).
>>> > Each pass has some risk of breaking something, exposing previously
>>> > only latent bugs in later passes etc.
>>> >
>>> > Jakub
>>>
>>> We can treat it as bugfix, as we got incorrect code when it triggers.
>>> It just happens so rarely. Does it worth backporting?
>>
>> And the patch fix this bug by making the loop invariant pass more 
>> conservative.
>> I didn't find a PR or testcase on trunk for this patch either.
>
> We at least want a testcase for the "we got incorrect code when it triggers".
>
> Richard.
>
>>


[PATCH i386 AVX512] [68/n] Add vpmullw, vpacksdw, pmaddwd insn patterns.

2014-10-09 Thread Kirill Yukhin
Hello,
This patch extends vpmullw, vpacksdw and pmaddwd
insn patterns.

Bootstrapped.
AVX-512* tests on top of patch-set all pass
under simulator.

Is it ok for trunk?

gcc/
* config/i386/sse.md
(define_c_enum "unspec"): Add UNSPEC_PMADDWD512.
(define_mode_iterator VI2_AVX2): Add V32HI mode.
(define_expand "mul3"): Add masking.
(define_insn "*mul3"): Ditto.
(define_expand "mul3_highpart"): Ditto.
(define_insn "*mul3_highpart"): Ditto.
(define_insn "avx512bw_pmaddwd512"): New.
(define_mode_attr SDOT_PMADD_SUF): Ditto.
(define_expand "sdot_prod"): Add .
(define_insn "_packssdw"): Add masking.
(define_insn "*_pmulhrsw3"): Ditto.
(define_insn "avx2_packusdw"): Delete.
(define_insn "sse4_1_packusdw"): Ditto.
(define_insn "_packusdw"): New.

--
Thanks, K

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index f88d3d0..90414c7 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -132,6 +132,7 @@
   ;; For AVX512BW support
   UNSPEC_DBPSADBW
   UNSPEC_PMADDUBSW512
+  UNSPEC_PMADDWD512
   UNSPEC_PSHUFHW
   UNSPEC_PSHUFLW
   UNSPEC_CVTINT2MASK
@@ -301,7 +302,7 @@
   [(V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX2") V16QI])
 
 (define_mode_iterator VI2_AVX2
-  [(V16HI "TARGET_AVX2") V8HI])
+  [(V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX2") V8HI])
 
 (define_mode_iterator VI2_AVX512F
   [(V32HI "TARGET_AVX512F") (V16HI "TARGET_AVX2") V8HI])
@@ -9248,28 +9249,30 @@
   DONE;
 })
 
-(define_expand "mul3"
+(define_expand "mul3"
   [(set (match_operand:VI2_AVX2 0 "register_operand")
(mult:VI2_AVX2 (match_operand:VI2_AVX2 1 "nonimmediate_operand")
   (match_operand:VI2_AVX2 2 "nonimmediate_operand")))]
-  "TARGET_SSE2"
+  "TARGET_SSE2 &&  && "
   "ix86_fixup_binary_operands_no_copy (MULT, mode, operands);")
 
-(define_insn "*mul3"
-  [(set (match_operand:VI2_AVX2 0 "register_operand" "=x,x")
-(mult:VI2_AVX2 (match_operand:VI2_AVX2 1 "nonimmediate_operand" "%0,v")
-   (match_operand:VI2_AVX2 2 "nonimmediate_operand" 
"xm,vm")))]
-  "TARGET_SSE2 && ix86_binary_operator_ok (MULT, mode, operands)"
+(define_insn "*mul3"
+  [(set (match_operand:VI2_AVX2 0 "register_operand" "=x,v")
+   (mult:VI2_AVX2 (match_operand:VI2_AVX2 1 "nonimmediate_operand" "%0,v")
+  (match_operand:VI2_AVX2 2 "nonimmediate_operand" 
"xm,vm")))]
+  "TARGET_SSE2
+   && ix86_binary_operator_ok (MULT, mode, operands)
+   &&  && "
   "@
pmullw\t{%2, %0|%0, %2}
-   vpmullw\t{%2, %1, %0|%0, %1, %2}"
+   vpmullw\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,avx")
(set_attr "type" "sseimul")
(set_attr "prefix_data16" "1,*")
(set_attr "prefix" "orig,vex")
(set_attr "mode" "")])
 
-(define_expand "mul3_highpart"
+(define_expand "mul3_highpart"
   [(set (match_operand:VI2_AVX2 0 "register_operand")
(truncate:VI2_AVX2
  (lshiftrt:
@@ -9279,23 +9282,26 @@
  (any_extend:
(match_operand:VI2_AVX2 2 "nonimmediate_operand")))
(const_int 16]
-  "TARGET_SSE2"
+  "TARGET_SSE2
+   &&  && "
   "ix86_fixup_binary_operands_no_copy (MULT, mode, operands);")
 
-(define_insn "*mul3_highpart"
-  [(set (match_operand:VI2_AVX2 0 "register_operand" "=x,x")
+(define_insn "*mul3_highpart"
+  [(set (match_operand:VI2_AVX2 0 "register_operand" "=x,v")
(truncate:VI2_AVX2
  (lshiftrt:
(mult:
  (any_extend:
-   (match_operand:VI2_AVX2 1 "nonimmediate_operand" "%0,x"))
+   (match_operand:VI2_AVX2 1 "nonimmediate_operand" "%0,v"))
  (any_extend:
-   (match_operand:VI2_AVX2 2 "nonimmediate_operand" "xm,xm")))
+   (match_operand:VI2_AVX2 2 "nonimmediate_operand" "xm,vm")))
(const_int 16]
-  "TARGET_SSE2 && ix86_binary_operator_ok (MULT, mode, operands)"
+  "TARGET_SSE2
+   && ix86_binary_operator_ok (MULT, mode, operands)
+   &&  && "
   "@
pmulhw\t{%2, %0|%0, %2}
-   vpmulhw\t{%2, %1, %0|%0, %1, %2}"
+   vpmulhw\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,avx")
(set_attr "type" "sseimul")
(set_attr "prefix_data16" "1,*")
@@ -9538,6 +9544,18 @@
(set_attr "prefix" "orig,vex")
(set_attr "mode" "TI")])
 
+(define_insn "avx512bw_pmaddwd512"
+  [(set (match_operand: 0 "register_operand" "=v")
+  (unspec:
+[(match_operand:VI2_AVX2 1 "register_operand" "v")
+ (match_operand:VI2_AVX2 2 "nonimmediate_operand" "vm")]
+ UNSPEC_PMADDWD512))]
+   "TARGET_AVX512BW && "
+   "vpmaddwd\t{%2, %1, %0|%0, %1, %2}";
+  [(set_attr "type" "sseiadd")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "XI")])
+
 (define_expand "avx2_pmaddwd"
   [(set (match_operand:V8SI 0 "register_operand")
(plus:V8SI
@@ -9778,6 +9796,9 @@
   DONE;
 })
 
+(define_mode_attr SDOT_PMADD_SUF
+  [(V32HI "512v32hi") (V16HI "") (V8HI "")])
+

Re: [PATCH i386 AVX512] [68/n] Add vpmullw, vpacksdw, pmaddwd insn patterns.

2014-10-09 Thread Kirill Yukhin
On 09 Oct 15:07, Kirill Yukhin wrote:
> +(define_insn "*mul3"
> +  [(set (match_operand:VI2_AVX2 0 "register_operand" "=x,v")
> + (mult:VI2_AVX2 (match_operand:VI2_AVX2 1 "nonimmediate_operand" "%0,v")
> +(match_operand:VI2_AVX2 2 "nonimmediate_operand" 
> "xm,vm")))]
> +  "TARGET_SSE2
> +   && ix86_binary_operator_ok (MULT, mode, operands)
> +   &&  && "
Just noticed, that need to swap target check with operads check.


--
Thanks, K


[PATCH i386 AVX512] [69/n] Add vpmulhrsw insn support.

2014-10-09 Thread Kirill Yukhin
Hello,
This patch adds support for vpmulhrsw insn.

Bootstrapped.
AVX-512* tests on top of patch-set all pass
under simulator.

Is it ok for trunk?

gcc/
* config/i386/sse.md
(define_insn "avx512bw_umulhrswv32hi3"): New.
(define_expand "_pmulhrsw3_mask"): Ditto.

--
Thanks, K

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 90414c7..25bf3d8 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -13439,6 +13439,41 @@
(set_attr "prefix" "evex")
(set_attr "mode" "XI")])
 
+(define_insn "avx512bw_umulhrswv32hi3"
+  [(set (match_operand:V32HI 0 "register_operand" "=v")
+   (truncate:V32HI
+ (lshiftrt:V32SI
+   (plus:V32SI
+ (lshiftrt:V32SI
+   (mult:V32SI
+ (sign_extend:V32SI
+   (match_operand:V32HI 1 "nonimmediate_operand" "%v"))
+ (sign_extend:V32SI
+   (match_operand:V32HI 2 "nonimmediate_operand" "vm")))
+   (const_int 14))
+ (const_vector:V32HI [(const_int 1) (const_int 1)
+  (const_int 1) (const_int 1)
+  (const_int 1) (const_int 1)
+  (const_int 1) (const_int 1)
+  (const_int 1) (const_int 1)
+  (const_int 1) (const_int 1)
+  (const_int 1) (const_int 1)
+  (const_int 1) (const_int 1)
+  (const_int 1) (const_int 1)
+  (const_int 1) (const_int 1)
+  (const_int 1) (const_int 1)
+  (const_int 1) (const_int 1)
+  (const_int 1) (const_int 1)
+  (const_int 1) (const_int 1)
+  (const_int 1) (const_int 1)
+  (const_int 1) (const_int 1)]))
+   (const_int 1]
+  "TARGET_AVX512BW"
+  "vpmulhrsw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "sseimul")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "XI")])
+
 (define_insn "ssse3_pmaddubsw128"
   [(set (match_operand:V8HI 0 "register_operand" "=x,x")
(ss_plus:V8HI
@@ -13516,6 +13551,29 @@
 (define_mode_iterator PMULHRSW
   [V4HI V8HI (V16HI "TARGET_AVX2")])
 
+(define_expand "_pmulhrsw3_mask"
+  [(set (match_operand:PMULHRSW 0 "register_operand")
+   (vec_merge:PMULHRSW
+ (truncate:PMULHRSW
+   (lshiftrt:
+ (plus:
+   (lshiftrt:
+ (mult:
+   (sign_extend:
+ (match_operand:PMULHRSW 1 "nonimmediate_operand"))
+   (sign_extend:
+ (match_operand:PMULHRSW 2 "nonimmediate_operand")))
+ (const_int 14))
+   (match_dup 5))
+ (const_int 1)))
+ (match_operand:PMULHRSW 3 "register_operand")
+ (match_operand: 4 "register_operand")))]
+  "TARGET_AVX512BW && TARGET_AVX512VL"
+{
+  operands[5] = CONST1_RTX(mode);
+  ix86_fixup_binary_operands_no_copy (MULT, mode, operands);
+})
+
 (define_expand "_pmulhrsw3"
   [(set (match_operand:PMULHRSW 0 "register_operand")
(truncate:PMULHRSW


Re: [PATCH RFC]Pair load store instructions using a generic scheduling fusion pass

2014-10-09 Thread Mike Stump
On Oct 7, 2014, at 10:28 PM, Jeff Law  wrote:
> It's certainly advantageous for sched2 to split insns that generate multiple 
> instructions.

So, on my port, I have a load multiple that is just one instruction, and it is 
a single clock cycle (to enque it).

Re: [PATCH, ARM] attribute target (thumb,arm)

2014-10-09 Thread Christian Bruel

On 10/08/2014 06:56 PM, Ramana Radhakrishnan wrote:
> Hi Christian,

<< snipped agreed stuf >>
> 3) about inlining
>I dislike inlining different modes, From a conceptual use, a user
> might want to switch mode only when changing a function's hotness.
> Usually inlining a cold function into a hot one is not what the user
> explicitly required when setting different mode attributes for them,
> __attribute__((thumb)) should not imply coldness or hotness. Inlining 
> between cold and hot functions should be done based on profile feedback. 
> The choice of compiling in "Thumb1" state for coldness is a separate one 
> because that's where the choice needs to be made.

Ideally yes. but I think that a user motivation to use target attribute
(("thumb") is to reduce code size even in the cases where PFO is not
available (libraries, kernel or user application build spec packaged
without profile data). And there are cases where static probabilities
are not enough and that a user wants it own control with gprof or oprofile.
But in this case, we could point to the __attribute__ (("cold")) on the
function ? That would probably be the best workaround to propose if we
recommend this

But here is another scenario: Using of attribute (("arm")) for exception
entry points is indeed not related to hotness. But consider a typical
thumb binary with an entry point in arm compiled in C (ex handler, a
kernel...). Today due to the file boundary the thumb part is not inlined
into the arm point. (Using -flto is not possible because the whole
gimple would be thumb).

Now, using attribute (("target")) for the functions others than the
entry point, with your approach they would all be inlined (assuming the
cost allow this) and we would end up with a arm binary instead of a
thumb binary...

But there are still 3 points  :

- At least 2 other target (i386, Powerpc) that support attribute_target
disable inlining between modes that are not subsets. I like to think
about homogeneity between targets and I find odd to have different
inlining rules...

- Scanning the function body to check for ASM_INPUT does not look very
elegant (if this matters) because the asm could well be unrelated

The only case when it will always be a win to inline thumb into arm is
when the cost of the inlined body is less than a BX instruction (but
still, with branch prediction this cost is pondered).


>
>> The compiler would take a decision that is not what the user wrote. And
>> in addition if you consider the few instructions to modify R15 to switch
>> state that would end up with more code executed in the critical path,
>> voiding a possible size of speed gain.
> I do not expect there to be any additional instructions needed to switch 
> state. If function x is inlined into function y the state would be lost 
> and the state would be in terms of the state of function x.
Yes, indeed. I was in a LCM/mode-switching thinking mode when writing
this. In this case the mode is inherited.

> Obviously if the user doesn't want inlining - the user would add 
> attributes to disable inlining. You do have extensions such as 
> __attribute__((noinline)) and __attribute__((never_inline)) to give the 
> user that control and those bits need to be used in addition.

Those attributes are overkill. They would disable inlining between
caller-callee of a same mode. This is not what we want

>
> The attribute then purely reflects then the output instruction state of 
> the function if a copy of it's body is laid out separately in the output.
>
> IMHO, the heuristics for inlining should be the best judge of when 
> functions should be inlined between one and another and we shouldn't be 
> second guessing that in the backend
>
> If there is a copy of the function to be put out by the compiler, only 
> then should we choose this based on the state of the "target" i.e. arm 
> or thumb.
>
Yes,

So to summarize, we can:

  1) don't inline between different modes. Same behavior with other
targets. Solves asm case
  2) always inline unless the function contains asm statements. ( I
reject adding a new compilation switch)
  3) always inline. But recommend the use of attribute ((noinline)) to
handle asm or attribute ((cold,hot)) in the absence of profile datas

I obviously prefer 1) safe and  homogenous, 3) is the worse as it
requires additional user action (poor user). 2) is less worse.

Thanks for supporting me ::)

Christian



[PATCH i386 AVX512] [70/n]

2014-10-09 Thread Kirill Yukhin
Hello,
This patch further extends maxmin patterns.

Bootstrapped.
AVX-512* tests on top of patch-set all pass
under simulator.

Is it ok for trunk?

gcc/
* config/i386/sse.md
(define_insn "*3_finite"):
Fix pattern conditions order.
(define_insn "*sse4_1_3"): Add masking.
(define_insn "*sse4_1_3"0: Ditto.

--
Thanks, K

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 25bf3d8..a833cd9 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -1898,8 +1898,8 @@
  (match_operand:VF 1 "" "%0,v")
  (match_operand:VF 2 "" 
"xm,")))]
   "TARGET_SSE && flag_finite_math_only
-   && ix86_binary_operator_ok (, mode, operands)
-   &&  && "
+   &&  && 
+   && ix86_binary_operator_ok (, mode, operands)"
   "@
\t{%2, %0|%0, %2}
v\t{%2, %1, 
%0|%0, %1, %2}"
@@ -10209,15 +10209,17 @@
 }
 })
 
-(define_insn "*sse4_1_3"
-  [(set (match_operand:VI14_128 0 "register_operand" "=x,x")
+(define_insn "*sse4_1_3"
+  [(set (match_operand:VI14_128 0 "register_operand" "=x,v")
(smaxmin:VI14_128
- (match_operand:VI14_128 1 "nonimmediate_operand" "%0,x")
- (match_operand:VI14_128 2 "nonimmediate_operand" "xm,xm")))]
-  "TARGET_SSE4_1 && ix86_binary_operator_ok (, mode, operands)"
+ (match_operand:VI14_128 1 "nonimmediate_operand" "%0,v")
+ (match_operand:VI14_128 2 "nonimmediate_operand" "xm,vm")))]
+  "TARGET_SSE4_1
+   && 
+   && ix86_binary_operator_ok (, mode, operands)"
   "@
p\t{%2, %0|%0, %2}
-   vp\t{%2, %1, %0|%0, %1, %2}"
+   vp\t{%2, %1, 
%0|%0, %1, %2}"
   [(set_attr "isa" "noavx,avx")
(set_attr "type" "sseiadd")
(set_attr "prefix_extra" "1,*")
@@ -10290,15 +10292,17 @@
 }
 })
 
-(define_insn "*sse4_1_3"
-  [(set (match_operand:VI24_128 0 "register_operand" "=x,x")
+(define_insn "*sse4_1_3"
+  [(set (match_operand:VI24_128 0 "register_operand" "=x,v")
(umaxmin:VI24_128
- (match_operand:VI24_128 1 "nonimmediate_operand" "%0,x")
- (match_operand:VI24_128 2 "nonimmediate_operand" "xm,xm")))]
-  "TARGET_SSE4_1 && ix86_binary_operator_ok (, mode, operands)"
+ (match_operand:VI24_128 1 "nonimmediate_operand" "%0,v")
+ (match_operand:VI24_128 2 "nonimmediate_operand" "xm,vm")))]
+  "TARGET_SSE4_1
+   && 
+   && ix86_binary_operator_ok (, mode, operands)"
   "@
p\t{%2, %0|%0, %2}
-   vp\t{%2, %1, %0|%0, %1, %2}"
+   vp\t{%2, %1, 
%0|%0, %1, %2}"
   [(set_attr "isa" "noavx,avx")
(set_attr "type" "sseiadd")
(set_attr "prefix_extra" "1,*")


Re: [i386] Replace builtins with vector extensions

2014-10-09 Thread Uros Bizjak
On Thu, Oct 9, 2014 at 12:33 PM, Marc Glisse  wrote:
> Ping https://gcc.gnu.org/ml/gcc-patches/2014-07/msg01812.html
>
> (another part of the discussion is around
> https://gcc.gnu.org/ml/gcc-patches/2014-06/msg02288.html )
>
> Most people who commented seem cautiously in favor. The least favorable was
> Ulrich who suggested to go with it but keep the old behavior accessible if
> the user defines some macro (which imho would lose a large part of the
> simplification benefits of the patch)
> https://gcc.gnu.org/ml/gcc-patches/2014-06/msg02328.html
>
> If this is accepted, I will gladly prepare patches removing the unused
> builtins and extending this to a few more operations (integer vectors in
> particular). If this is not the direction we want to go, I'd like to hear it
> clearly so I can move on...

Well, I'm undecided.

The current approach is proven to work OK, there is no bugs reported
in this area and the performance is apparently OK. There should be
clear benefits in order to change something that "ain't broken", and
at least some proof that we won't regress in this area with the new
approach.

On the other hand, if the new approach opens new optimization
opportunities (without regression!), I'm in favor of it, including the
fact that new code won't produce equivalent assembly - as long as
functionality of the optimized asm stays the same (obviously, I'd
say).

Please also note that this is quite big project. There are plenty of
intrinsics and I for one don't want another partial transition ...

TL/DR: If there are benefits, no regressions and you think you'll
finish the transition, let's go for it.

Uros.


[PATCH i386 AVX512] [71/n] [Obvious?] Remove redudant iterator attribute.

2014-10-09 Thread Kirill Yukhin
Hello,
This obvious patch removes redundant iterator attribute

Bootstrapped.

Is it ok for trunk?

gcc/
* config/i386/sse.md (define_mode_attr avx2_avx512f): Remove.
--
Thanks, K

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index a833cd9..cf415c3 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -451,12 +451,6 @@
(V4SI "vec") (V8SI "avx2")
(V2DI "vec") (V4DI "avx2")])
 
-(define_mode_attr avx2_avx512f
-  [(V4SI "avx2") (V8SI "avx2") (V16SI "avx512f")
-   (V2DI "avx2") (V4DI "avx2") (V8DI "avx512f")
-   (V8SF "avx2") (V16SF "avx512f")
-   (V4DF "avx2") (V8DF "avx512f")])
-
 (define_mode_attr avx2_avx512
   [(V4SI "avx2") (V8SI "avx2") (V16SI "avx512f")
(V2DI "avx2") (V4DI "avx2") (V8DI "avx512f")


Re: [PATCH] Don't call fatal_error before error reporting has been initialized.

2014-10-09 Thread Ilya Tocar
Ping.

On 29 Sep 18:02, Ilya Tocar wrote:
> Hi,
> 
> Currently if call to atexit (lto_wrapper_cleanup) fails we
> won't report error as we haven't initialized error-reporting
> infrastructure. This patch moves this call after diagnostic_initialize.
> I hope that we can't  exit inside diagnostic_initialize. Otherwise we
> won't cleanup after it.
> Ok for trunk?
> 
> 2014-09-29  Ilya Tocar  
> 
>   * lto-wrapper.c (main): Don't call fatal_error before
>   diagnostic_initialize.
> ---
>  gcc/lto-wrapper.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c
> index 08fd090..39e13b8 100644
> --- a/gcc/lto-wrapper.c
> +++ b/gcc/lto-wrapper.c
> @@ -870,13 +870,13 @@ main (int argc, char *argv[])
>  
>xmalloc_set_program_name (progname);
>  
> -  if (atexit (lto_wrapper_cleanup) != 0)
> -fatal_error ("atexit failed");
> -
>gcc_init_libintl ();
>  
>diagnostic_initialize (global_dc, 0);
>  
> +  if (atexit (lto_wrapper_cleanup) != 0)
> +fatal_error ("atexit failed");
> +
>if (signal (SIGINT, SIG_IGN) != SIG_IGN)
>  signal (SIGINT, fatal_signal);
>  #ifdef SIGHUP
> -- 
> 1.8.3.1
> 


[PATCH i386 AVX512] [72/n] Extend VI itterator.

2014-10-09 Thread Kirill Yukhin
Hello,
This patch extends VI mode iterator.

Bootstrapped.
AVX-512* tests on top of patch-set all pass
under simulator.

Is it ok for trunk?

gcc/
* config/i386/i386.c
(ix86_expand_vector_logical_operator): Handle V16SF and V8DF modes.
* config/i386/sse.md
(define_mode_iterator VI): Add V64QI and V32HI modes.

--
Thanks, K

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index d759a45..257e12b 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -17502,8 +17502,10 @@ ix86_expand_vector_logical_operator (enum rtx_code 
code, enum machine_mode mode,
{
case V4SFmode:
case V8SFmode:
+   case V16SFmode:
case V2DFmode:
case V4DFmode:
+   case V8DFmode:
  dst = gen_reg_rtx (GET_MODE (SUBREG_REG (op1)));
  if (GET_CODE (op2) == CONST_VECTOR)
{
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index cf415c3..852cb30 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -265,8 +265,8 @@
 ;; All vector integer modes
 (define_mode_iterator VI
   [(V16SI "TARGET_AVX512F") (V8DI "TARGET_AVX512F")
-   (V32QI "TARGET_AVX") V16QI
-   (V16HI "TARGET_AVX") V8HI
+   (V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX") V16QI
+   (V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX") V8HI
(V8SI "TARGET_AVX") V4SI
(V4DI "TARGET_AVX") V2DI])
 


[PATCH i386 AVX512] [73/n] Extend reduc min/max autogen.

2014-10-09 Thread Kirill Yukhin
Hello,
This patch extends pattern for reducation maxmin autogen.

Bootstrapped.
AVX-512* tests on top of patch-set all pass
under simulator.

Is it ok for trunk?

gcc/
* config/i386/sse.md
(define_mode_iterator REDUC_SMINMAX_MODE): Add V64QI and V32HI modes.

--
Thanks, K

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 852cb30..d4adc33 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -2316,7 +2316,8 @@
   [(V32QI "TARGET_AVX2") (V16HI "TARGET_AVX2")
(V8SI "TARGET_AVX2") (V4DI "TARGET_AVX2")
(V8SF "TARGET_AVX") (V4DF "TARGET_AVX")
-   (V4SF "TARGET_SSE") (V16SI "TARGET_AVX512F")
+   (V4SF "TARGET_SSE") (V64QI "TARGET_AVX512BW")
+   (V32HI "TARGET_AVX512BW") (V16SI "TARGET_AVX512F")
(V8DI "TARGET_AVX512F") (V16SF "TARGET_AVX512F")
(V8DF "TARGET_AVX512F")])


Re: [patch] Add -static-libquadmath option

2014-10-09 Thread FX
> But it still needs to be OK'd by either a global reviewer or one of the 
> listed Darwin maintainers ;) ...
> ... (ccing Mike)

Duh me. I thought you were a darwin maintainer. Sorry.

FX

[PATCH i386 AVX512] [74/n] Add byte/word max/mix reduction.

2014-10-09 Thread Kirill Yukhin
Hello,

Bootstrapped.
AVX-512* tests on top of patch-set all pass
under simulator.

Is it ok for trunk?

gcc/
* config/i386/i386.c
(emit_reduc_half): Handle V64QI and V32HI mode.
* config/i386/sse.md
(define_mode_iterator VI_AVX512BW): New.
(define_expand "reduc__"): Use VI512_48F_12BW.

--
Thanks, K

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 257e12b..e79210c 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -41246,6 +41246,8 @@ emit_reduc_half (rtx dest, rtx src, int i)
GEN_INT (i / 2));
}
   break;
+case V64QImode:
+case V32HImode:
 case V16SImode:
 case V16SFmode:
 case V8DImode:
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index d4adc33..106fa00 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -509,6 +509,8 @@
 (define_mode_iterator VI48_256 [V8SI V4DI])
 (define_mode_iterator VI48_512 [V16SI V8DI])
 (define_mode_iterator VI4_256_8_512 [V8SI V8DI])
+(define_mode_iterator VI_AVX512BW
+  [V16SI V8DI (V32HI "TARGET_AVX512BW") (V64QI "TARGET_AVX512BW")])
 
 ;; Int-float size matches
 (define_mode_iterator VI4F_128 [V4SI V4SF])
@@ -2332,9 +2334,9 @@
 })
 
 (define_expand "reduc__"
-  [(umaxmin:VI48_512
- (match_operand:VI48_512 0 "register_operand")
- (match_operand:VI48_512 1 "register_operand"))]
+  [(umaxmin:VI_AVX512BW
+ (match_operand:VI_AVX512BW 0 "register_operand")
+ (match_operand:VI_AVX512BW 1 "register_operand"))]
   "TARGET_AVX512F"
 {
   ix86_expand_reduc (gen_3, operands[0], operands[1]);


Re: [PATCH 4/n] OpenMP 4.0 offloading infrastructure: lto-wrapper

2014-10-09 Thread Ilya Verbin
On 08 Oct 12:26, Jakub Jelinek wrote:
> On Thu, Oct 02, 2014 at 07:14:57PM +0400, Ilya Verbin wrote:
> > @@ -1296,6 +1297,9 @@ static const char *const standard_startfile_prefix_2
> > relative to the driver.  */
> >  static const char *const tooldir_base_prefix = TOOLDIR_BASE_PREFIX;
> >  
> > +/* A prefix to be used when this is an accelerator compiler.  */
> > +static const char *const accel_dir_suffix = ACCEL_DIR_SUFFIX;
> 
> Is ACCEL_DIR_SUFFIX here "" or something starting with "/ ?
> 
> The reason I'm asking is that otherwise it seems gcc.c heavily uses
> dir_separator_str for the separators.  Don't have any experience with
> targets where DIR_SEPARATOR is not / though, perhaps it is a non-issue.

It is "" for the host compiler and starts with "/" for the accel compiler.
Looking at gcc/configure.ac, there are a lot of paths with slashes.
So, I think it's ok to define ACCEL_DIR_SUFFIX in such a way.

> > +#ifndef ACCEL_COMPILER
> >/* We need to check standard_exec_prefix/just_machine_suffix/specs
> >   for any override of as, ld and libraries.  */
> >specs_file = (char *) alloca (strlen (standard_exec_prefix)
> >+ strlen (just_machine_suffix) + sizeof ("specs"));
> > -
> >strcpy (specs_file, standard_exec_prefix);
> >strcat (specs_file, just_machine_suffix);
> >strcat (specs_file, "specs");
> >if (access (specs_file, R_OK) == 0)
> >  read_specs (specs_file, true, false);
> > +#endif
> 
> Why do you want to disable specs reading for the accel compiler?
> Then users won't have the possibility to override defaults etc. easily...

Bernd,
Do you need this ifndef for PTX?  Or I could remove it?

> > @@ -7097,6 +7103,16 @@ main (int argc, char **argv)
> >obstack_grow (&collect_obstack, argv[0], strlen (argv[0]) + 1);
> >xputenv (XOBFINISH (&collect_obstack, char *));
> >  
> > +  if (strlen (OFFLOAD_TARGETS) > 0)
> > +{
> > +  obstack_init (&collect_obstack);
> > +  obstack_grow (&collect_obstack, "OFFLOAD_TARGET_NAMES=",
> > +   sizeof ("OFFLOAD_TARGET_NAMES=") - 1);
> > +  obstack_grow (&collect_obstack, OFFLOAD_TARGETS,
> > +   strlen (OFFLOAD_TARGETS) + 1);
> > +  xputenv (XOBFINISH (&collect_obstack, char *));
> > +}
> > +
> 
> I'm surprised to see the obstack_init call here, but looking at
> gcc.c, I'm surprised to see it in more places.
> 
> I've always thought that obstack_init was something you invoke generally
> once on a given obstack object, then work with the obstack and then
> obstack_free (..., NULL) it at the end.  Now, if it wants the obstack to be
> live until exit, it just would not obstack_free it.  But calling
> obstack_init on the already initialized obstack results IMHO in memory
> leaks.  It should be initialized just once somewhere.

Right, I removed obstack_init from this patch.

> > +#define OFFLOAD_TARGET_NAMES_ENV   "OFFLOAD_TARGET_NAMES"
> 
> Missing comment about the env var and what it is for.

Fixed.

> > +  char *suffix
> > += XALLOCAVEC (char, strlen ("/accel//mkoffload") + strlen (target) + 
> > 1);
> 
> Use sizeof ("/accel//mkoffload") + strlen (target) instead?

Done.

> > + out:
> 
> The goto is probably unnecessary here, indenting all the lines by
> 4 more spaces and using if (compiler) { ... } instead doesn't need
> any further warpping.

Done.

> > +  /* By default linker does not discard .gnu.offload_lto_* sections.  
> > */
> > +  const char *linker_script = make_temp_file ("_linker_script.x");
> > +  FILE *stream = fopen (linker_script, "w");
> > +  if (!stream)
> > +   fatal_error ("fopen %s: %m", linker_script);
> > +  fprintf (stream, "SECTIONS { /DISCARD/ : { *("
> > +  OFFLOAD_SECTION_NAME_PREFIX "*) } }\n");
> > +  fclose (stream);
> > +  printf ("%s\n", linker_script);
> > +
> > +  goto finish;
> > +}
> 
> Does this work with gold?  Are there any other linkers that support plugins,
> but don't support linker scripts this way?

Oops, gold does not support scripts, outputted from plugins :(
"error: SECTIONS seen after other input files; try -T/--script"

Probably, we should update default linker scripts in binutils?
But without latest ld/gold all binaries compiled without -flto and with
offloading will contain intermediate bytecode...

Thanks,
  -- Ilya


Re: [PATCH 4/n] OpenMP 4.0 offloading infrastructure: lto-wrapper

2014-10-09 Thread Bernd Schmidt

On 10/09/2014 02:07 PM, Ilya Verbin wrote:

+#ifndef ACCEL_COMPILER
/* We need to check standard_exec_prefix/just_machine_suffix/specs
   for any override of as, ld and libraries.  */
specs_file = (char *) alloca (strlen (standard_exec_prefix)
   + strlen (just_machine_suffix) + sizeof ("specs"));
-
strcpy (specs_file, standard_exec_prefix);
strcat (specs_file, just_machine_suffix);
strcat (specs_file, "specs");
if (access (specs_file, R_OK) == 0)
  read_specs (specs_file, true, false);
+#endif


Why do you want to disable specs reading for the accel compiler?
Then users won't have the possibility to override defaults etc. easily...


Bernd,
Do you need this ifndef for PTX?  Or I could remove it?


I suspect the paths aren't right. If that can be fixed it should be fine 
to remove the ifdef.



Bernd



[PATCH i386 AVX512] [75/n] Update vec_init.

2014-10-09 Thread Kirill Yukhin
Hello,
This patch extends vec_init-related routines/patterns.

Bootstrapped.
AVX-512* tests on top of patch-set all pass
under simulator.

Is it ok for trunk?

gcc/
* config/i386/i386.c
(ix86_expand_vector_init_duplicate): Handle V64QI and V32HI modes,
update V8HI, V16QI, V32QI modes handling.
(ix86_expand_vector_init_general): Handle V64QI and V32HI modes.
* config/i386/sse.md
(define_mode_iterator VI48F_512): Rename to ...
(define_mode_iterator VI48F_I12_AVX512BW): ... this. Extend to
AVX-512BW modes.
(define_expand "vec_init"): Use VI48F_I12_AVX512BW.

--
Thanks, K

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index e79210c..7c34431 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -39791,6 +39791,8 @@ ix86_expand_vector_init_duplicate (bool mmx_ok, enum 
machine_mode mode,
 case V8SFmode:
 case V8SImode:
 case V2DFmode:
+case V64QImode:
+case V32HImode:
 case V2DImode:
 case V4SFmode:
 case V4SImode:
@@ -39821,6 +39823,9 @@ ix86_expand_vector_init_duplicate (bool mmx_ok, enum 
machine_mode mode,
   goto widen;
 
 case V8HImode:
+  if (TARGET_AVX512VL)
+return ix86_vector_duplicate_value (mode, target, val);
+
   if (TARGET_SSE2)
{
  struct expand_vec_perm_d dperm;
@@ -39851,6 +39856,9 @@ ix86_expand_vector_init_duplicate (bool mmx_ok, enum 
machine_mode mode,
   goto widen;
 
 case V16QImode:
+  if (TARGET_AVX512VL)
+return ix86_vector_duplicate_value (mode, target, val);
+
   if (TARGET_SSE2)
goto permute;
   goto widen;
@@ -39880,16 +39888,19 @@ ix86_expand_vector_init_duplicate (bool mmx_ok, enum 
machine_mode mode,
 
 case V16HImode:
 case V32QImode:
-  {
-   enum machine_mode hvmode = (mode == V16HImode ? V8HImode : V16QImode);
-   rtx x = gen_reg_rtx (hvmode);
+  if (TARGET_AVX512VL)
+return ix86_vector_duplicate_value (mode, target, val);
+  else
+   {
+ enum machine_mode hvmode = (mode == V16HImode ? V8HImode : V16QImode);
+ rtx x = gen_reg_rtx (hvmode);
 
-   ok = ix86_expand_vector_init_duplicate (false, hvmode, x, val);
-   gcc_assert (ok);
+ ok = ix86_expand_vector_init_duplicate (false, hvmode, x, val);
+ gcc_assert (ok);
 
-   x = gen_rtx_VEC_CONCAT (mode, x, x);
-   emit_insn (gen_rtx_SET (VOIDmode, target, x));
-  }
+ x = gen_rtx_VEC_CONCAT (mode, x, x);
+ emit_insn (gen_rtx_SET (VOIDmode, target, x));
+   }
   return true;
 
 default:
@@ -40451,8 +40462,9 @@ static void
 ix86_expand_vector_init_general (bool mmx_ok, enum machine_mode mode,
 rtx target, rtx vals)
 {
-  rtx ops[64], op0, op1;
+  rtx ops[64], op0, op1, op2, op3, op4, op5;
   enum machine_mode half_mode = VOIDmode;
+  enum machine_mode quarter_mode = VOIDmode;
   int n, i;
 
   switch (mode)
@@ -40503,6 +40515,42 @@ half:
  gen_rtx_VEC_CONCAT (mode, op0, op1)));
   return;
 
+case V64QImode:
+  quarter_mode = V16QImode;
+  half_mode = V32QImode;
+  goto quarter;
+
+case V32HImode:
+  quarter_mode = V8HImode;
+  half_mode = V16HImode;
+  goto quarter;
+
+quarter:
+  n = GET_MODE_NUNITS (mode);
+  for (i = 0; i < n; i++)
+   ops[i] = XVECEXP (vals, 0, i);
+  op0 = gen_reg_rtx (quarter_mode);
+  op1 = gen_reg_rtx (quarter_mode);
+  op2 = gen_reg_rtx (quarter_mode);
+  op3 = gen_reg_rtx (quarter_mode);
+  op4 = gen_reg_rtx (half_mode);
+  op5 = gen_reg_rtx (half_mode);
+  ix86_expand_vector_init_interleave (quarter_mode, op0, ops,
+ n >> 3);
+  ix86_expand_vector_init_interleave (quarter_mode, op1,
+ &ops [n >> 2], n >> 3);
+  ix86_expand_vector_init_interleave (quarter_mode, op2,
+ &ops [n >> 1], n >> 3);
+  ix86_expand_vector_init_interleave (quarter_mode, op3,
+ &ops [(n >> 1) | (n >> 2)], n >> 3);
+  emit_insn (gen_rtx_SET (VOIDmode, op4,
+ gen_rtx_VEC_CONCAT (half_mode, op0, op1)));
+  emit_insn (gen_rtx_SET (VOIDmode, op5,
+ gen_rtx_VEC_CONCAT (half_mode, op2, op3)));
+  emit_insn (gen_rtx_SET (VOIDmode, target,
+ gen_rtx_VEC_CONCAT (mode, op4, op5)));
+  return;
+
 case V16QImode:
   if (!TARGET_SSE4_1)
break;
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 106fa00..a60b0d1 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -524,7 +524,9 @@
   (V16SI "TARGET_AVX512F") (V16SF "TARGET_AVX512F")
   (V8DI  "TARGET_AVX512F") (V8DF  "TARGET_AVX512F")
   (V4DI  "TARGET_AVX512VL") (V4DF  "TARGET_AVX512VL")])
-(define_mode_iterator VI48F_512 [V16SI V

Re: [PATCH i386 AVX512] [63.1/n] Add vpshufb, perm autogen (except for v64qi).

2014-10-09 Thread Ilya Tocar
Hi,

I think this patch should be split in 2 parts:
V64QI related and non-V64QI related.
This part contains non-V64QI related changes.
Also I've noticed, that not all patterns using VI1_AVX2,
actually have AVX512 versions, so fixed bogus patterns.

On 06 Oct 16:10, Jakub Jelinek wrote:
> On Mon, Oct 06, 2014 at 04:55:28PM +0400, Kirill Yukhin wrote:
> > --- a/gcc/config/i386/i386.c
> > +++ b/gcc/config/i386/i386.c
> > @@ -21364,20 +21364,113 @@ ix86_expand_vec_perm_vpermi2 (rtx target, rtx 
> > op0, rtx mask, rtx op1)
> >enum machine_mode mode = GET_MODE (op0);
> >switch (mode)
> >  {
> > +  /* There is no byte version of vpermi2.  So we use vpermi2w.  */
> > +case V64QImode:
...
> 
> I believe this case doesn't belong to this function, other than this
> case ix86_expand_vec_perm_vpermi2 emits always just a single insn, and
> so it should always do that, and there should be a separate function
> that expands the worst case of V64QImode full 2 operand permutation.
> See my previous mail, IMHO it is doable with 5 instructions rather than 7.
> And IMHO we should have a separate function which emits that, supposedly
> one for the constant permutations, one for the variable case (perhaps
> then your 7 insn sequence is best?).
This will be done in following patch.
> 
> Also, IMHO rather than building a CONST_VECTOR ahead in each of the callers,
> supposedly ix86_expand_vec_perm_vpermi2 could take the arguments it takes
> right now plus D, either D would be NULL (then it would behave as now), or
> SEL would be NULL, then it would create a CONST_VECTOR on the fly if needed.
> I.e. the function would start with a switch that would just contain the
> if (...)
> return false; 
> hunks plus break; for the success case, then code to generate CONST_VECTOR
> if sel is NULL_RTX from d, and finally another switch with just the emit
> cases.
Done.

> 
> > +case V8HImode:
> > +  if (!TARGET_AVX512VL)
> > +   return false;
> > +  emit_insn (gen_avx512vl_vpermi2varv8hi3 (target, op0,
> > +  force_reg (V8HImode, mask), 
> > op1));
> > +  return true;
> > +case V16HImode:
> > +  if (!TARGET_AVX512VL)
> > +   return false;
> > +  emit_insn (gen_avx512vl_vpermi2varv16hi3 (target, op0,
> > +force_reg (V16HImode, mask), op1));
> > +  return true;
> 
> Aren't these two insns there only if both TARGET_AVX512VL && TARGET_AVX512BW?
> I mean, the ISA pdf mentions both of the CPUID flags simultaneously, and I
> think neither of these depends on the other one in GCC.  That's unlike insns
> where CPUID AVX512VL and AVX512F are mentioned together, because in GCC
> AVX512VL depends on AVX512F.
>
Good catch!

> > @@ -42662,7 +42764,12 @@ expand_vec_perm_blend (struct expand_vec_perm_d *d)
> >  
> >if (d->one_operand_p)
> >  return false;
> > -  if (TARGET_AVX2 && GET_MODE_SIZE (vmode) == 32)
> > +  if (TARGET_AVX512F && GET_MODE_SIZE (vmode) == 64 &&
> > +  GET_MODE_SIZE (GET_MODE_INNER (vmode)) >= 4)
> 
> Formatting, && belongs on the second line.
> 
Fixed.

> > +;
> > +  else if (TARGET_AVX512VL)
> 
> I'd add && GET_MODE_SIZE (GET_MODE_INNER (vmode) == 64 here.
> AVX512VL is not going to handle 64-bit vectors, or 1024-bit ones,
> and the == 32 and == 16 cases are handled because AVX512VL implies
> TARGET_AVX2 and TARGET_SSE4_1, doesn't it?
> 
As TARGET_AVX512VL always implies TARGET_AVX2 and TARGET_SSE4_1 and
works only on 32/16-byte mode this case is redundant, so I've removed
it.

> > @@ -43012,6 +43125,17 @@ expand_vec_perm_pshufb (struct expand_vec_perm_d 
> > *d)
> >   return false;
> > }
> > }
> > +  else if (GET_MODE_SIZE (d->vmode) == 64)
> > +   {
> > + if (!TARGET_AVX512BW)
> > +   return false;
> > + if (vmode == V64QImode)
> > +   {
> > + for (i = 0; i < nelt; ++i)
> > +   if ((d->perm[i] ^ i) & (nelt / 4))
> > + return false;
> 
> Missing comment, I'd duplicate the
>   /* vpshufb only works intra lanes, it is not
>  possible to shuffle bytes in between the lanes.  */
> comment there.
> 
Done.

> > @@ -43109,12 +43237,24 @@ expand_vec_perm_1 (struct expand_vec_perm_d *d)
> >   rtx (*gen) (rtx, rtx) = NULL;
> >   switch (d->vmode)
> > {
> > +   case V64QImode:
> > + if (TARGET_AVX512VL)
> 
> VL?  Isn't that BW?
> 
> > +   gen = gen_avx512bw_vec_dupv64qi;
> > + break;
> > case V32QImode:
> >   gen = gen_avx2_pbroadcastv32qi_1;
> >   break;
> > +   case V32HImode:
> > + if (TARGET_AVX512VL)
> 
> Ditto.
>
Fixed.

> > @@ -43216,6 +43368,14 @@ expand_vec_perm_1 (struct expand_vec_perm_d *d)
> >  mode = V8DImode;
> >else if (mode == V16SFmode)
> >  mode = V16SImode;
> > +  else if (mode == V4DFmode)
> > +mode = V4DImode;
> > +  else if (mode == V2DFmode)
> > +mode = V2DImode;
> > +  else if (mod

RE: [RFC: Patch, PR 60102] [4.9/4.10 Regression] powerpc fp-bit ices@dwf_regno

2014-10-09 Thread rohitarul...@freescale.com

> -Original Message-
> From: Maciej W. Rozycki [mailto:ma...@codesourcery.com]
> To: Ulrich Weigand
> Cc: Dharmakan Rohit-B30502; Wienskoski Edmar-RA8797; David Edelsohn; gcc-
> patc...@gcc.gnu.org; Alan Modra; Jakub Jelinek
> Subject: Re: [RFC: Patch, PR 60102] [4.9/4.10 Regression] powerpc fp-bit
> ices@dwf_regno
[snip]
> >
> > Rohit, could you verify whether this fixes the SPE problem?
> 
>  Thanks for your work, I have applied your change to my 4.9 setup, rebuilt the
> compiler and smoke-tested it with gdb.base/store.exp and my e500v2 multilib
> only; powerpc-linux-gnu target.  Unfortunately your patch hasn't changed
> anything, the same failures remain.
> 
>  Just to be sure I haven't missed anything I reverted your change and Rohit's
> one as well, rebuilt the compiler and rerun this testing and this time all 
> failures
> were gone.  So it looks like your fix didn't completely cover the damage 
> Rohit's
> change caused. :(

Ulrich/Maciej,

The patch works for me.
Tested with GCC v4.9 branch rev 216036 and GCC trunk rev 216027.

Regards,
Rohit



Re: [AArch64] Fix predicate and constraint mismatch in logical atomic operations

2014-10-09 Thread Christophe Lyon
On 25 September 2014 21:30, Segher Boessenkool
 wrote:
> On Thu, Sep 25, 2014 at 10:33:17AM -0700, Michael Collison wrote:
>> The problem is the "CONST_INT 0", not a large constant. This constant is
>> not accepted by the predicate, but is accepted by the constraint.
>
> Yes, bad choice of words, sorry.  Just read "big" as "not matching the
> predicate".  The point is that everything works fine until RA, and that
> makes it hard to make a useful test.
>

Hi,

IIUC, a testcase is not required/practicable.
So, may I ping this patch?
https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02209.html

Thanks,

Christophe.

>
> Segher
>
>
> p.s.  Please don't top-post.  Thanks.


Re: [i386] Replace builtins with vector extensions

2014-10-09 Thread Marc Glisse

On Thu, 9 Oct 2014, Uros Bizjak wrote:


On Thu, Oct 9, 2014 at 12:33 PM, Marc Glisse  wrote:

Ping https://gcc.gnu.org/ml/gcc-patches/2014-07/msg01812.html

(another part of the discussion is around
https://gcc.gnu.org/ml/gcc-patches/2014-06/msg02288.html )

Most people who commented seem cautiously in favor. The least favorable was
Ulrich who suggested to go with it but keep the old behavior accessible if
the user defines some macro (which imho would lose a large part of the
simplification benefits of the patch)
https://gcc.gnu.org/ml/gcc-patches/2014-06/msg02328.html

If this is accepted, I will gladly prepare patches removing the unused
builtins and extending this to a few more operations (integer vectors in
particular). If this is not the direction we want to go, I'd like to hear it
clearly so I can move on...


Well, I'm undecided.


First, thanks for answering, it helps me a lot to know what others think.


The current approach is proven to work OK, there is no bugs reported
in this area and the performance is apparently OK. There should be
clear benefits in order to change something that "ain't broken", and
at least some proof that we won't regress in this area with the new
approach.


There are quite a few enhancement PRs asking for more performance, but 
indeed no (or very few) complaints about correctness or about gcc turning 
their code into something worse than what they wrote, which I completely 
agree weighs more.



On the other hand, if the new approach opens new optimization
opportunities (without regression!), I'm in favor of it, including the
fact that new code won't produce equivalent assembly - as long as
functionality of the optimized asm stays the same (obviously, I'd
say).

Please also note that this is quite big project. There are plenty of
intrinsics and I for one don't want another partial transition ...


That might be an issue : this transition is partial by nature. Many 
intrinsics cannot (easily) be expressed in GIMPLE, and among those that 
can be represented, we only want to change those for which we are 
confident that we will not regress the quality of the code. From the 
reactions, I would assume that we want to be quite conservative at the 
beginning, and maybe we can reconsider some other intrinsics later.


The best I can offer is consistency: if addition of v2df is changed, 
addition of v4df is changed as well (and say any +-*/ of float/double 
vectors of any supported size). Another block would be +-*/% for integer 
vectors. And construction / access (most construction is already 
builtin-free). And remove the unused builtins in the same patch that makes 
them unused. If you don't like those blocks, I can write one mega-patch 
that does all these, if we roughly agree on the list beforehand, so it 
goes in all at once.


Would that be good enough?


TL/DR: If there are benefits, no regressions and you think you'll
finish the transition, let's go for it.


--
Marc Glisse


[PATCH] Fix GCC tests fail for installed toolchain due to ASan, UBSan and TSan testsuites drop GCC_EXEC_PREFIX.

2014-10-09 Thread Maxim Ostapenko

Hi,

After enabling ASan, TSan and UBSan testsuites for installed toolchain, 
many tests started to fail. This is caused by wrong logic in {asan, 
ubsan, tsan}_finish
functions. Here, restore_ld_library_path is called, that is wrong, 
because it drops some env variables ( GCC_EXEC_PREFIX, LD_LIBRARY_PATH, 
etc) to state that was before gcc-dg.exp initialized testing 
environment, so installed GCC will be confused to find some needed stuff 
later.


Removing restore_ld_library_path from {asan, ubsan, tsan}_finish seems 
to fix the issue.


Tested on x86_64-pc-linux-gnu, ok to commit?

-Maxim
gcc/testsuite/ChangeLog:

2014-10-09  Max Ostapenko  

	* lib/asan-dg.exp (asan_finish): Remove restore_ld_library_path_env_vars.
	* lib/tsan-dg.exp (tsan_finish): Likewise.
	* lib/ubsan-dg.exp (ubsan_finish): Likewise.

diff --git a/gcc/testsuite/lib/asan-dg.exp b/gcc/testsuite/lib/asan-dg.exp
index 9769138..c98fd3c 100644
--- a/gcc/testsuite/lib/asan-dg.exp
+++ b/gcc/testsuite/lib/asan-dg.exp
@@ -132,7 +132,6 @@ proc asan_finish { args } {
 	unset TEST_ALWAYS_FLAGS
 	}
 }
-restore_ld_library_path_env_vars
 }
 
 # Symbolize lines like
diff --git a/gcc/testsuite/lib/tsan-dg.exp b/gcc/testsuite/lib/tsan-dg.exp
index 54ec404..6f7a4d9 100644
--- a/gcc/testsuite/lib/tsan-dg.exp
+++ b/gcc/testsuite/lib/tsan-dg.exp
@@ -143,5 +143,4 @@ proc tsan_finish { args } {
 } else {
 	unset dg-do-what-default
 }
-restore_ld_library_path_env_vars
 }
diff --git a/gcc/testsuite/lib/ubsan-dg.exp b/gcc/testsuite/lib/ubsan-dg.exp
index 5a7a653..87c460f 100644
--- a/gcc/testsuite/lib/ubsan-dg.exp
+++ b/gcc/testsuite/lib/ubsan-dg.exp
@@ -114,5 +114,4 @@ proc ubsan_finish { args } {
 	unset TEST_ALWAYS_FLAGS
 	}
 }
-restore_ld_library_path_env_vars
 }


Re: Towards GNU11

2014-10-09 Thread Matthias Klose
Am 08.10.2014 um 09:16 schrieb Richard Biener:
> On Tue, 7 Oct 2014, Marek Polacek wrote:
> I think it makes sense to do this (and I expect C++ will follow
> with defaulting to -std=c++11 once the ABI stuff has settled).
> 
> Of course it would be nice to look at the actual fallout in
> a whole-distribution rebuild...

I can certainly do that, once stage1 is finished, hopefully for more than x86
architectures.

What happened to the plans to stabilize the libstdc++ c++11 ABI?  Is this still
a target for GCC 5?

  Matthias



Re: [i386] Replace builtins with vector extensions

2014-10-09 Thread Uros Bizjak
On Thu, Oct 9, 2014 at 2:28 PM, Marc Glisse  wrote:
> On Thu, 9 Oct 2014, Uros Bizjak wrote:
>
>> On Thu, Oct 9, 2014 at 12:33 PM, Marc Glisse  wrote:
>>>
>>> Ping https://gcc.gnu.org/ml/gcc-patches/2014-07/msg01812.html
>>>
>>> (another part of the discussion is around
>>> https://gcc.gnu.org/ml/gcc-patches/2014-06/msg02288.html )
>>>
>>> Most people who commented seem cautiously in favor. The least favorable
>>> was
>>> Ulrich who suggested to go with it but keep the old behavior accessible
>>> if
>>> the user defines some macro (which imho would lose a large part of the
>>> simplification benefits of the patch)
>>> https://gcc.gnu.org/ml/gcc-patches/2014-06/msg02328.html
>>>
>>> If this is accepted, I will gladly prepare patches removing the unused
>>> builtins and extending this to a few more operations (integer vectors in
>>> particular). If this is not the direction we want to go, I'd like to hear
>>> it
>>> clearly so I can move on...
>>
>>
>> Well, I'm undecided.
>
>
> First, thanks for answering, it helps me a lot to know what others think.
>
>> The current approach is proven to work OK, there is no bugs reported
>> in this area and the performance is apparently OK. There should be
>> clear benefits in order to change something that "ain't broken", and
>> at least some proof that we won't regress in this area with the new
>> approach.
>
>
> There are quite a few enhancement PRs asking for more performance, but
> indeed no (or very few) complaints about correctness or about gcc turning
> their code into something worse than what they wrote, which I completely
> agree weighs more.
>
>> On the other hand, if the new approach opens new optimization
>> opportunities (without regression!), I'm in favor of it, including the
>> fact that new code won't produce equivalent assembly - as long as
>> functionality of the optimized asm stays the same (obviously, I'd
>> say).
>>
>> Please also note that this is quite big project. There are plenty of
>> intrinsics and I for one don't want another partial transition ...
>
>
> That might be an issue : this transition is partial by nature. Many
> intrinsics cannot (easily) be expressed in GIMPLE, and among those that can
> be represented, we only want to change those for which we are confident that
> we will not regress the quality of the code. From the reactions, I would
> assume that we want to be quite conservative at the beginning, and maybe we
> can reconsider some other intrinsics later.
>
> The best I can offer is consistency: if addition of v2df is changed,
> addition of v4df is changed as well (and say any +-*/ of float/double
> vectors of any supported size). Another block would be +-*/% for integer
> vectors. And construction / access (most construction is already
> builtin-free). And remove the unused builtins in the same patch that makes
> them unused. If you don't like those blocks, I can write one mega-patch that
> does all these, if we roughly agree on the list beforehand, so it goes in
> all at once.
>
> Would that be good enough?

OK, let's go in the proposed way, more detailed:

- we begin with +-*/ of float/double vectors. IMO, this would result
in a relatively small and easily reviewable patch to iron out the
details of the approach. Alternatively, we can begin with floats only.
- commit the patch and wait for the sky to fall down.
- we play a bit with the compiler to check generated code and corner
cases (some kind of Q/A) and wait if someone finds a problem (say, a
couple of weeks).
- if there are no problems, continue with integer builtins following
the established approach, otherwise we revert everything and go back
to the drawing board.
- repeat the procedure for other builtins.

I propose to wait a couple of days for possible comments before we get
the ball rolling.

Uros.


Re: [PATCH, C++] Fix PR63366: __complex not equivalent to __complex double in C++

2014-10-09 Thread Jason Merrill

On 10/09/2014 05:24 AM, Thomas Preud'homme wrote:

   "ISO C++ forbids declaration of %qs with no type", name);

type = integer_type_node;
+  defaulted_int = 1;


I would think we want to handle this up in the existing defaulted_int block:


  /* No type at all: default to `int', and set DEFAULTED_INT
 because it was not a user-defined typedef.  */
  if (type == NULL_TREE && (signed_p || unsigned_p || long_p || short_p))



+  return typeid (__complex) != typeid (__complex double);


Don't we want this to be '=='?

Jason



Re: constexpr and const qual in C++14

2014-10-09 Thread Jason Merrill

OK, thanks.

Jason


Re: [PATCH, C++] Fix PR63366: __complex not equivalent to __complex double in C++

2014-10-09 Thread Nathan Sidwell

On 10/09/14 09:25, Jason Merrill wrote:

On 10/09/2014 05:24 AM, Thomas Preud'homme wrote:

   "ISO C++ forbids declaration of %qs with no type", name);

type = integer_type_node;
+  defaulted_int = 1;


I would think we want to handle this up in the existing defaulted_int block:

my thought was to at least put it next to the explicit_int = -1 above.


  /* No type at all: default to `int', and set DEFAULTED_INT
 because it was not a user-defined typedef.  */
  if (type == NULL_TREE && (signed_p || unsigned_p || long_p || short_p))



+  return typeid (__complex) != typeid (__complex double);


Don't we want this to be '=='?


I think it wants to check for _complex being __complex double, not not being 
_complex int  (other failure modes exist!)


nathan

--
Nathan Sidwell


Re: [C++ PAtch] More C++11 and C++14 constexpr work

2014-10-09 Thread Jason Merrill

On 10/08/2014 03:47 PM, Paolo Carlini wrote:

(check_constexpr_ctor_body): Use it; add bool parameter.


This function seems to only be called in one place; why add the parameter?

Jason



Re: [PATCH, C++] Fix PR63366: __complex not equivalent to __complex double in C++

2014-10-09 Thread Jason Merrill

On 10/09/2014 09:30 AM, Nathan Sidwell wrote:

+  return typeid (__complex) != typeid (__complex double);


Don't we want this to be '=='?


I think it wants to check for _complex being __complex double, not not
being _complex int  (other failure modes exist!)


Right, I was forgetting about returning nonzero for failure.

Jason



Re: C++ Patch for c++/60894

2014-10-09 Thread Jason Merrill

On 10/08/2014 01:30 PM, Fabien Chêne wrote:

2014-10-07 23:13 GMT+02:00 Jason Merrill :

It seems to me that the problem is that lookup_and_check_tag is rejecting a
USING_DECL rather than returning it.  What if we return the USING_DECL?


If the USING_DECL is returned, the code below will be rejected as
expected, but the error message will not mention the line where the
USING_DECL appears as the previous definition, but at the target
declaration of the USING_DECL instead.


I think that's what happens if you strip the USING_DECL and return what 
it points to; if you return the USING_DECL itself that shouldn't happen 
(though then the caller needs to be able to deal with getting a USING_DECL).


Jason



Re: Towards GNU11

2014-10-09 Thread Jason Merrill

On 10/09/2014 08:45 AM, Matthias Klose wrote:

What happened to the plans to stabilize the libstdc++ c++11 ABI?  Is this still
a target for GCC 5?


Yes.

Jason




Re: [RFC: Patch, PR 60102] [4.9/4.10 Regression] powerpc fp-bit ices@dwf_regno

2014-10-09 Thread Ulrich Weigand
rohitarulraj wrote:
> > -Original Message-
> > From: Maciej W. Rozycki [mailto:ma...@codesourcery.com]
> > To: Ulrich Weigand
> > Cc: Dharmakan Rohit-B30502; Wienskoski Edmar-RA8797; David Edelsohn; gcc-
> > patc...@gcc.gnu.org; Alan Modra; Jakub Jelinek
> > Subject: Re: [RFC: Patch, PR 60102] [4.9/4.10 Regression] powerpc fp-bit
> > ices@dwf_regno
> [snip]
> > >
> > > Rohit, could you verify whether this fixes the SPE problem?
> >=20
> >  Thanks for your work, I have applied your change to my 4.9 setup, rebuil=
> t the
> > compiler and smoke-tested it with gdb.base/store.exp and my e500v2 multil=
> ib
> > only; powerpc-linux-gnu target.  Unfortunately your patch hasn't changed
> > anything, the same failures remain.
> >=20
> >  Just to be sure I haven't missed anything I reverted your change and Roh=
> it's
> > one as well, rebuilt the compiler and rerun this testing and this time al=
> l failures
> > were gone.  So it looks like your fix didn't completely cover the damage =
> Rohit's
> > change caused. :(
> 
> Ulrich/Maciej,
> 
> The patch works for me.
> Tested with GCC v4.9 branch rev 216036 and GCC trunk rev 216027.

Thanks for testing!  Can you work with Maciej to find out why he's
seeing different results?

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: __intN patch 3/5: main __int128 -> __intN conversion.

2014-10-09 Thread Jason Merrill

OK, thanks.

Jason


Re: [C++ PAtch] More C++11 and C++14 constexpr work

2014-10-09 Thread Paolo Carlini

Hi,

On 10/09/2014 03:31 PM, Jason Merrill wrote:

On 10/08/2014 03:47 PM, Paolo Carlini wrote:

(check_constexpr_ctor_body): Use it; add bool parameter.

This function seems to only be called in one place; why add the parameter?
Is also called recursively by check_constexpr_ctor_body_1 and without 
the complain boolean we end up printing the error message twice.


Paolo.


Re: [Ping] Port of VTV for Cygwin and MinGW

2014-10-09 Thread Patrick Wollgast
On 27.09.2014 12:50, Kai Tietz wrote:
> Hi Patrick,
> 
> the mingw/cygwin part your patch looks fine to me.  Nevertheless I
> have one question regarding to you.  Do you have FSF papers for gcc
> already?  As I asked an overseer and he didn't found you on the list.
> 
> Regards,
> Kai
> 

The papers FSF have been taken care of, and the signed papers have been
exchanged.


A short recap:

Mail with the latest patch and changelog:
https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02200.html

Approved:
* gcc/config/i386/*
* libstdc++-v3/*
* libvtv/*

Not approved:
* gcc/cp/vtable-class-hierarchy.c
* gcc/varasm.c
* libgcc/Makefile.in
* libgcc/config.host
* libiberty/obstack.c

Regards,
Patrick


Re: [C++ PAtch] More C++11 and C++14 constexpr work

2014-10-09 Thread Paolo Carlini

.. a simple example in C++11 would be:

struct S
{
  constexpr S() { { struct T { }; } }
};

Paolo.


Re: [PATCH, i386, Pointer Bounds Checker 31/x] Pointer Bounds Checker builtins for i386 target

2014-10-09 Thread Ilya Enkovich
It appeared I changed a semantics of BNDMK expand when replaced tree operations 
with rtl ones.

Original code:

+  op1 = expand_normal (fold_build2 (PLUS_EXPR, TREE_TYPE (arg1),
+   arg1, integer_minus_one_node));
+  op1 = force_reg (Pmode, op1);

Modified code:

+  op1 = expand_normal (arg1);
+
+  if (!register_operand (op1, Pmode))
+   op1 = ix86_zero_extend_to_Pmode (op1);
+
+  /* Builtin arg1 is size of block but instruction op1 should
+be (size - 1).  */
+  op1 = expand_simple_binop (Pmode, PLUS, op1, constm1_rtx,
+op1, 1, OPTAB_DIRECT);

The problem is that in the fixed version we may modify value of a pseudo 
register into which arg1 is expanded which means incorrect value for all 
following usages of arg1.

Didn't reveal it early because programs surprisingly rarely hit this bug.  I do 
following change to fix it:

   op1 = expand_simple_binop (Pmode, PLUS, op1, constm1_rtx,
-op1, 1, OPTAB_DIRECT);
+NULL, 1, OPTAB_DIRECT);

Similar problem was also fixed for BNDNARROW.  Does it look OK?

Thanks,
Ilya
--
diff --git a/gcc/config/i386/i386-builtin-types.def 
b/gcc/config/i386/i386-builtin-types.def
index 9161287..5421ba9 100644
--- a/gcc/config/i386/i386-builtin-types.def
+++ b/gcc/config/i386/i386-builtin-types.def
@@ -47,6 +47,7 @@ DEF_PRIMITIVE_TYPE (UCHAR, unsigned_char_type_node)
 DEF_PRIMITIVE_TYPE (QI, char_type_node)
 DEF_PRIMITIVE_TYPE (HI, intHI_type_node)
 DEF_PRIMITIVE_TYPE (SI, intSI_type_node)
+DEF_PRIMITIVE_TYPE (BND, pointer_bounds_type_node)
 # ??? Logically this should be intDI_type_node, but that maps to "long"
 # with 64-bit, and that's not how the emmintrin.h is written.  Again, 
 # changing this would change name mangling.
@@ -60,6 +61,7 @@ DEF_PRIMITIVE_TYPE (USHORT, short_unsigned_type_node)
 DEF_PRIMITIVE_TYPE (INT, integer_type_node)
 DEF_PRIMITIVE_TYPE (UINT, unsigned_type_node)
 DEF_PRIMITIVE_TYPE (UNSIGNED, unsigned_type_node)
+DEF_PRIMITIVE_TYPE (ULONG, long_unsigned_type_node)
 DEF_PRIMITIVE_TYPE (LONGLONG, long_long_integer_type_node)
 DEF_PRIMITIVE_TYPE (ULONGLONG, long_long_unsigned_type_node)
 DEF_PRIMITIVE_TYPE (UINT8, unsigned_char_type_node)
@@ -810,3 +812,15 @@ DEF_FUNCTION_TYPE_ALIAS (V2DI_FTYPE_V2DI_V2DI, TF)
 DEF_FUNCTION_TYPE_ALIAS (V4SF_FTYPE_V4SF_V4SF, TF)
 DEF_FUNCTION_TYPE_ALIAS (V4SI_FTYPE_V4SI_V4SI, TF)
 DEF_FUNCTION_TYPE_ALIAS (V8HI_FTYPE_V8HI_V8HI, TF)
+
+# MPX builtins
+DEF_FUNCTION_TYPE (BND, PCVOID, ULONG)
+DEF_FUNCTION_TYPE (VOID, PCVOID, BND)
+DEF_FUNCTION_TYPE (VOID, PCVOID, BND, PCVOID)
+DEF_FUNCTION_TYPE (BND, PCVOID, PCVOID)
+DEF_FUNCTION_TYPE (BND, PCVOID)
+DEF_FUNCTION_TYPE (BND, BND, BND)
+DEF_FUNCTION_TYPE (PVOID, PVOID, PVOID, ULONG)
+DEF_FUNCTION_TYPE (PVOID, PCVOID, BND, ULONG)
+DEF_FUNCTION_TYPE (ULONG, VOID)
+DEF_FUNCTION_TYPE (PVOID, BND)
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 2b39ff1..3f464de 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -84,6 +84,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-vectorizer.h"
 #include "shrink-wrap.h"
 #include "builtins.h"
+#include "tree-chkp.h"
+#include "rtl-chkp.h"
 
 static rtx legitimize_dllimport_symbol (rtx, bool);
 static rtx legitimize_pe_coff_extern_decl (rtx, bool);
@@ -28776,6 +28778,19 @@ enum ix86_builtins
   IX86_BUILTIN_XABORT,
   IX86_BUILTIN_XTEST,
 
+  /* MPX */
+  IX86_BUILTIN_BNDMK,
+  IX86_BUILTIN_BNDSTX,
+  IX86_BUILTIN_BNDLDX,
+  IX86_BUILTIN_BNDCL,
+  IX86_BUILTIN_BNDCU,
+  IX86_BUILTIN_BNDRET,
+  IX86_BUILTIN_BNDNARROW,
+  IX86_BUILTIN_BNDINT,
+  IX86_BUILTIN_SIZEOF,
+  IX86_BUILTIN_BNDLOWER,
+  IX86_BUILTIN_BNDUPPER,
+
   /* BMI instructions.  */
   IX86_BUILTIN_BEXTR32,
   IX86_BUILTIN_BEXTR64,
@@ -28853,6 +28868,8 @@ struct builtin_isa {
   enum ix86_builtin_func_type tcode; /* type to use in the declaration */
   HOST_WIDE_INT isa;   /* isa_flags this builtin is defined for */
   bool const_p;/* true if the declaration is constant 
*/
+  bool leaf_p; /* true if the declaration has leaf attribute */
+  bool nothrow_p;  /* true if the declaration has nothrow 
attribute */
   bool set_and_not_built_p;
 };
 
@@ -28904,6 +28921,8 @@ def_builtin (HOST_WIDE_INT mask, const char *name,
  ix86_builtins[(int) code] = NULL_TREE;
  ix86_builtins_isa[(int) code].tcode = tcode;
  ix86_builtins_isa[(int) code].name = name;
+ ix86_builtins_isa[(int) code].leaf_p = false;
+ ix86_builtins_isa[(int) code].nothrow_p = false;
  ix86_builtins_isa[(int) code].const_p = false;
  ix86_builtins_isa[(int) code].set_and_not_built_p = true;
}
@@ -28954,6 +28973,11 @@ ix86_add_new_builtins (HOST_WIDE_INT isa)
  ix86_builtins[i] = decl;
  if (ix86_builtins_isa[i].const_p)
TREE_READONLY (decl) = 1;
+

Re: [PATCH, ARM] attribute target (thumb,arm)

2014-10-09 Thread Richard Earnshaw
On 09/10/14 12:35, Christian Bruel wrote:
> 
> On 10/08/2014 06:56 PM, Ramana Radhakrishnan wrote:
>> Hi Christian,
> 
> << snipped agreed stuf >>
>> 3) about inlining
>>I dislike inlining different modes, From a conceptual use, a user
>> might want to switch mode only when changing a function's hotness.
>> Usually inlining a cold function into a hot one is not what the user
>> explicitly required when setting different mode attributes for them,
>> __attribute__((thumb)) should not imply coldness or hotness. Inlining 
>> between cold and hot functions should be done based on profile feedback. 
>> The choice of compiling in "Thumb1" state for coldness is a separate one 
>> because that's where the choice needs to be made.
> 
> Ideally yes. but I think that a user motivation to use target attribute
> (("thumb") is to reduce code size even in the cases where PFO is not
> available (libraries, kernel or user application build spec packaged
> without profile data). And there are cases where static probabilities
> are not enough and that a user wants it own control with gprof or oprofile.
> But in this case, we could point to the __attribute__ (("cold")) on the
> function ? That would probably be the best workaround to propose if we
> recommend this
> 

Hot vs cold is interesting, but arm/thumb shouldn't be used to imply
that.  The days when ARM=fast, thumb=small are in the past now, and
thumb2 code should be both fast and small.  Indeed, smaller thumb2 code
can be faster than larger ARM code simply because you can get more of it
in the cache.  The use of arm vs thumb is likely to be much more subtle now.

> But here is another scenario: Using of attribute (("arm")) for exception
> entry points is indeed not related to hotness. But consider a typical
> thumb binary with an entry point in arm compiled in C (ex handler, a
> kernel...). Today due to the file boundary the thumb part is not inlined
> into the arm point. (Using -flto is not possible because the whole
> gimple would be thumb).
> 

We have no-inline attributes for scenarios like that.  I don't think a
specific use case should dominate other cases.

> Now, using attribute (("target")) for the functions others than the
> entry point, with your approach they would all be inlined (assuming the
> cost allow this) and we would end up with a arm binary instead of a
> thumb binary...
> 
> But there are still 3 points  :
> 
> - At least 2 other target (i386, Powerpc) that support attribute_target
> disable inlining between modes that are not subsets. I like to think
> about homogeneity between targets and I find odd to have different
> inlining rules...
> 

That's because use of specific instructions must not be allowed to leak
past a gating check that is in the caller.  It would be a disaster if a
function that used a neon register, for example, was allowed to leak
into code that might run on a target with no Neon register file.  The
ARM/thumb distinction shouldn't, by default, be limited in that manner.

I believe inlining could happen from a subset of the archtiecture into a
function using a superset, just not vice-versa.

> - Scanning the function body to check for ASM_INPUT does not look very
> elegant (if this matters) because the asm could well be unrelated
> 
> The only case when it will always be a win to inline thumb into arm is
> when the cost of the inlined body is less than a BX instruction (but
> still, with branch prediction this cost is pondered).
> 

One of the problems with not inlining is that the C++ abstraction
penalty is likely to shoot up.  There will be many major lost
optimization opportunities if we start down that path.

So I think inlining should only be disabled if there's some technical
reason why it should be disabled, not because of some 'it might not
always be ideal' feelings.  Furthermore, we should expect users to use
the other attributes consistently when they expect specific behaviours
to occur.

My 2p.

R.

> 
>>
>>> The compiler would take a decision that is not what the user wrote. And
>>> in addition if you consider the few instructions to modify R15 to switch
>>> state that would end up with more code executed in the critical path,
>>> voiding a possible size of speed gain.
>> I do not expect there to be any additional instructions needed to switch 
>> state. If function x is inlined into function y the state would be lost 
>> and the state would be in terms of the state of function x.
> Yes, indeed. I was in a LCM/mode-switching thinking mode when writing
> this. In this case the mode is inherited.
> 
>> Obviously if the user doesn't want inlining - the user would add 
>> attributes to disable inlining. You do have extensions such as 
>> __attribute__((noinline)) and __attribute__((never_inline)) to give the 
>> user that control and those bits need to be used in addition.
> 
> Those attributes are overkill. They would disable inlining between
> caller-callee of a same mode. This is not what we want
> 
>>
>> The att

Re: [C++ PAtch] More C++11 and C++14 constexpr work

2014-10-09 Thread Jason Merrill

On 10/09/2014 09:49 AM, Paolo Carlini wrote:

Hi,

On 10/09/2014 03:31 PM, Jason Merrill wrote:

On 10/08/2014 03:47 PM, Paolo Carlini wrote:

(check_constexpr_ctor_body): Use it; add bool parameter.

This function seems to only be called in one place; why add the
parameter?

Is also called recursively by check_constexpr_ctor_body_1 and without
the complain boolean we end up printing the error message twice.


Ah, guess I overlooked that.  OK.

Jason




Re: [Ping] Port of VTV for Cygwin and MinGW

2014-10-09 Thread Kai Tietz
2014-10-09 15:52 GMT+02:00 Patrick Wollgast :
> On 27.09.2014 12:50, Kai Tietz wrote:
>> Hi Patrick,
>>
>> the mingw/cygwin part your patch looks fine to me.  Nevertheless I
>> have one question regarding to you.  Do you have FSF papers for gcc
>> already?  As I asked an overseer and he didn't found you on the list.
>>
>> Regards,
>> Kai
>>
>
> The papers FSF have been taken care of, and the signed papers have been
> exchanged.
>
>
> A short recap:
>
> Mail with the latest patch and changelog:
> https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02200.html
>
> Approved:
> * gcc/config/i386/*
> * libstdc++-v3/*
> * libvtv/*
>
> Not approved:
> * gcc/cp/vtable-class-hierarchy.c

Index: gcc/cp/vtable-class-hierarchy.c
===
--- gcc/cp/vtable-class-hierarchy.c(Revision 214408)
+++ gcc/cp/vtable-class-hierarchy.c(Arbeitskopie)
@@ -1182,7 +1182,7 @@ vtv_generate_init_routine (void)
   TREE_STATIC (vtv_fndecl) = 1;
   TREE_USED (vtv_fndecl) = 1;
   DECL_PRESERVE_P (vtv_fndecl) = 1;
-  if (flag_vtable_verify == VTV_PREINIT_PRIORITY)
+  if (flag_vtable_verify == VTV_PREINIT_PRIORITY && !TARGET_PECOFF)

You need to check that TARGET_PECOFF is defined.  Otherwise you break
compilation for none i386 targets.

 DECL_STATIC_CONSTRUCTOR (vtv_fndecl) = 0;

   gimplify_function_tree (vtv_fndecl);
@@ -1190,7 +1190,7 @@ vtv_generate_init_routine (void)

   cgraph_process_new_functions ();

-  if (flag_vtable_verify == VTV_PREINIT_PRIORITY)
+  if (flag_vtable_verify == VTV_PREINIT_PRIORITY && !TARGET_PECOFF)

See above. Likewise
 assemble_vtv_preinit_initializer (vtv_fndecl);

 }


> * gcc/varasm.c

Index: gcc/varasm.c
===
--- gcc/varasm.c(Revision 214408)
+++ gcc/varasm.c(Arbeitskopie)
@@ -2165,6 +2165,33 @@ assemble_variable (tree decl, int top_le
  DECL_NAME (decl));
   in_section = sect;
 #else
+  /* Neither OBJECT_FORMAT_PE, nor OBJECT_FORMAT_COFF is set here.
+ Therefore the following check is used.
+ In case a the target is PE or COFF a comdat group section
+ is created, e.g. .vtable_map_vars$foo. The linker places
+ everything in .vtable_map_vars at the end.
+
+ A fix could be made in
+ gcc/config/i386/winnt.c: i386_pe_unique_section. */
+  if (TARGET_PECOFF)

You need to test, if TARGET_PECOFF is defined!

+  {
+char *name;
+
+if (TREE_CODE (DECL_NAME (decl)) == IDENTIFIER_NODE)
+  name = ACONCAT ((sect->named.name, "$",
+   IDENTIFIER_POINTER (DECL_NAME (decl)), NULL));
+else
+  name = ACONCAT ((sect->named.name, "$",
+   IDENTIFIER_POINTER (DECL_COMDAT_GROUP (DECL_NAME (decl))),
+   NULL));
+
+targetm.asm_out.named_section (name,
+ sect->named.common.flags
+   | SECTION_LINKONCE,

Here it seems to me that you have some whitespace issues,

+   DECL_NAME (decl));
+in_section = sect;
+}
+else
   switch_to_section (sect);
 #endif


> * libgcc/Makefile.in

Looks ok to me.

> * libgcc/config.host

Looks fine to me, too.

> * libiberty/obstack.c

Why you use instead of C-runtime exit/abort-functions the
platform-functions to terminate the process.  This looks to me like
useless change.  For cygwin this might be even wrong in some aspects.
What is the reasoning for this change?

Another note I have about re-implementation of mprotect in ---
libvtv/vtv_malloc.cc.  Why you need that?  it is already part of
libgcc for mingw.  And for cygwin this function is part of cygwin's
library itself.  So why re-implementing it here?


> Regards,
> Patrick

Regards,
Kai


[PATCH] Fix r216010 fallout

2014-10-09 Thread Markus Trippelsdorf
This fixes fallout from r216010, which causes Firefox build failures.
Just move the gcc_assert below the new if statement.

Boostrapped and tested on powerpc64-unknown-linux-gnu.
Ok for trunk?
Thanks.

2014-10-09  Markus Trippelsdorf  

* pa-polymorphic-call.c (check_stmt_for_type_change): Move
assertion.

2014-10-09  Markus Trippelsdorf  

* /g++.dg/ipa/polymorphic-call-1.C: New testcase.


diff --git a/gcc/ipa-polymorphic-call.c b/gcc/ipa-polymorphic-call.c
index 51c6709a8655..7d58601ae365 100644
--- a/gcc/ipa-polymorphic-call.c
+++ b/gcc/ipa-polymorphic-call.c
@@ -1424,9 +1424,9 @@ check_stmt_for_type_change (ao_ref *ao ATTRIBUTE_UNUSED, 
tree vdef, void *data)
}
 
   type = extr_type_from_vtbl_ptr_store (stmt, tci, &offset);
-  gcc_assert (!type || TYPE_MAIN_VARIANT (type) == type);
   if (type == error_mark_node)
return false;
+  gcc_assert (!type || TYPE_MAIN_VARIANT (type) == type);
   if (!type)
{
  if (dump_file)
diff --git a/gcc/testsuite/g++.dg/ipa/polymorphic-call-1.C 
b/gcc/testsuite/g++.dg/ipa/polymorphic-call-1.C
new file mode 100644
index ..2b5b54478349
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ipa/polymorphic-call-1.C
@@ -0,0 +1,49 @@
+// { dg-do compile }
+// { dg-options "-O2" }
+class A;
+class B
+{
+  A *mRawPtr;
+
+public:
+  void *StartAssignment___trans_tmp_2;
+  A **
+  m_fn1 ()
+  {
+StartAssignment___trans_tmp_2 = &mRawPtr;
+return reinterpret_cast (StartAssignment___trans_tmp_2);
+  }
+};
+class C
+{
+public:
+  C (B &p1) : mTargetSmartPtr (p1) {}
+  operator A **() { return mTargetSmartPtr.m_fn1 (); }
+  B &mTargetSmartPtr;
+};
+class A
+{
+public:
+  A ();
+};
+class D
+{
+  D (bool);
+  B mNewEntry;
+  virtual int m_fn2 ();
+};
+C
+fn1 (B &p1)
+{
+  return p1;
+}
+void
+fn2 (bool, A **)
+{
+  new A;
+}
+D::D (bool p1)
+{
+  A **a = fn1 (mNewEntry);
+  fn2 (p1, a);
+}
-- 
Markus


Re: [PATCH] Fix r216010 fallout

2014-10-09 Thread Marek Polacek
On Thu, Oct 09, 2014 at 04:47:05PM +0200, Markus Trippelsdorf wrote:
> 2014-10-09  Markus Trippelsdorf  
> 
>   * /g++.dg/ipa/polymorphic-call-1.C: New testcase.

Please drop the leading '/' when you commit this patch.

Marek


[PATCH i386 AVX512] [76/n] Extend int 2 float conversions.

2014-10-09 Thread Kirill Yukhin
Hello,
This patch extends autogeneration of SI-2-SF
conversions.

Bootstrapped.
AVX-512* tests on top of patch-set all pass
under simulator.

Is it ok for trunk?

gcc/
* config/i386/i386.c
(ix86_expand_vector_convert_uns_vsivsf): Handle V16SI mode and
TARGET_AVX512VL.

--
Thanks, K

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 7c34431..8a7853e 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -18811,6 +18811,19 @@ ix86_expand_vector_convert_uns_vsivsf (rtx target, rtx 
val)
   enum machine_mode fltmode = GET_MODE (target);
   rtx (*cvt) (rtx, rtx);
 
+  if (intmode == V16SImode)
+{
+  emit_insn (gen_ufloatv16siv16sf2 (target, val));
+  return;
+}
+  if (TARGET_AVX512VL)
+{
+  if (intmode == V4SImode)
+emit_insn (gen_ufloatv4siv4sf2 (target, val));
+  else
+emit_insn (gen_ufloatv8siv8sf2 (target, val));
+  return;
+}
   if (intmode == V4SImode)
 cvt = gen_floatv4siv4sf2;
   else


Re: [PATCH 1/2] [AARCH64,NEON] Add patterns + builtins for vld[234](q?)_lane_* intrinsics

2014-10-09 Thread Tejas Belagod


+(define_insn "vec_load_lanesoi_lane"


Best to prepend "aarch64_" the pattern name, IMHO, else it looks like a 
standard pattern name(eg. vec_load_lanes) at first glance.


Otherwise, LGTM(but I can't approve it). Thanks for this patch.

Thanks,
Tejas.


+  [(set (match_operand:OI 0 "register_operand" "=w")
+   (unspec:OI [(match_operand: 1 "aarch64_simd_struct_operand" 
"Utv")
+   (match_operand:OI 2 "register_operand" "0")
+   (match_operand:SI 3 "immediate_operand" "i")
+   (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY) ]
+  UNSPEC_LD2_LANE))]
+  "TARGET_SIMD"
+  "ld2\\t{%S0. - %T0.}[%3], %1"
+  [(set_attr "type" "neon_load2_one_lane")]
+)
+
  (define_insn "vec_store_lanesoi"
[(set (match_operand:OI 0 "aarch64_simd_struct_operand" "=Utv")
(unspec:OI [(match_operand:OI 1 "register_operand" "w")
@@ -4022,6 +4034,18 @@
[(set_attr "type" "neon_load3_3reg")]
  )

+(define_insn "vec_load_lanesci_lane"
+  [(set (match_operand:CI 0 "register_operand" "=w")
+   (unspec:CI [(match_operand: 1 "aarch64_simd_struct_operand" 
"Utv")
+   (match_operand:CI 2 "register_operand" "0")
+   (match_operand:SI 3 "immediate_operand" "i")
+   (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
+  UNSPEC_LD3_LANE))]
+  "TARGET_SIMD"
+  "ld3\\t{%S0. - %U0.}[%3], %1"
+  [(set_attr "type" "neon_load3_one_lane")]
+)
+
  (define_insn "vec_store_lanesci"
[(set (match_operand:CI 0 "aarch64_simd_struct_operand" "=Utv")
(unspec:CI [(match_operand:CI 1 "register_operand" "w")
@@ -4053,6 +4077,18 @@
[(set_attr "type" "neon_load4_4reg")]
  )

+(define_insn "vec_load_lanesxi_lane"
+  [(set (match_operand:XI 0 "register_operand" "=w")
+   (unspec:XI [(match_operand: 1 "aarch64_simd_struct_operand" 
"Utv")
+   (match_operand:XI 2 "register_operand" "0")
+   (match_operand:SI 3 "immediate_operand" "i")
+   (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
+  UNSPEC_LD4_LANE))]
+  "TARGET_SIMD"
+  "ld4\\t{%S0. - %V0.}[%3], %1"
+  [(set_attr "type" "neon_load4_one_lane")]
+)
+
  (define_insn "vec_store_lanesxi"
[(set (match_operand:XI 0 "aarch64_simd_struct_operand" "=Utv")
(unspec:XI [(match_operand:XI 1 "register_operand" "w")
@@ -4366,6 +4402,65 @@
DONE;
  })

+(define_expand "aarch64_ld2_lane"
+  [(match_operand:OI 0 "register_operand" "=w")
+   (match_operand:DI 1 "register_operand" "w")
+   (match_operand:OI 2 "register_operand" "0")
+   (match_operand:SI 3 "immediate_operand" "i")
+   (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
+  "TARGET_SIMD"
+{
+  enum machine_mode mode = mode;
+  rtx mem = gen_rtx_MEM (mode, operands[1]);
+
+  aarch64_simd_lane_bounds (operands[3], 0, GET_MODE_NUNITS (mode));
+  emit_insn (gen_vec_load_lanesoi_lane (operands[0],
+  mem,
+  operands[2],
+  operands[3]));
+  DONE;
+})
+
+(define_expand "aarch64_ld3_lane"
+  [(match_operand:CI 0 "register_operand" "=w")
+   (match_operand:DI 1 "register_operand" "w")
+   (match_operand:CI 2 "register_operand" "0")
+   (match_operand:SI 3 "immediate_operand" "i")
+   (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
+  "TARGET_SIMD"
+{
+  enum machine_mode mode = mode;
+  rtx mem = gen_rtx_MEM (mode, operands[1]);
+
+  aarch64_simd_lane_bounds (operands[3], 0, GET_MODE_NUNITS (mode));
+  emit_insn (gen_vec_load_lanesci_lane (operands[0],
+  mem,
+  operands[2],
+  operands[3]));
+  DONE;
+})
+
+(define_expand "aarch64_ld4_lane"
+  [(match_operand:XI 0 "register_operand" "=w")
+   (match_operand:DI 1 "register_operand" "w")
+   (match_operand:XI 2 "register_operand" "0")
+   (match_operand:SI 3 "immediate_operand" "i")
+   (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
+  "TARGET_SIMD"
+{
+  enum machine_mode mode = mode;
+  rtx mem = gen_rtx_MEM (mode, operands[1]);
+
+  aarch64_simd_lane_bounds (operands[3], 0, GET_MODE_NUNITS (mode));
+  emit_insn (gen_vec_load_lanesxi_lane (operands[0],
+  mem,
+  operands[2],
+  operands[3]));
+  DONE;
+})
+
+
+
  ;; Expanders for builtins to extract vector registers from large
  ;; opaque integer modes.

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 74b554e..6b5f51f 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -92,6 +92,9 @@
  UNSPEC_LD2
  UNSPEC_LD3
  UNSPEC_LD4
+UNSPEC_LD2_LANE
+UNSPEC_LD3_LANE
+UNSPEC_LD4_LANE
  UNSPEC_MB
  UNSPEC_NOP
  

[PATCH i386 AVX512] [77/n] Use blend for cond-set V32HI and V64QI.

2014-10-09 Thread Kirill Yukhin
Hello,
This patch extends movcc/vcond autogen.

Bootstrapped.
AVX-512* tests on top of patch-set all pass
under simulator.

Is it ok for trunk?

gcc/
* config/i386/i386.c
(ix86_expand_sse_movcc): Handle V64QI and V32HI mode.
(ix86_expand_int_vcond): Ditto.

--
Thanks, K

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 8a7853e..f86aa1b 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -21015,6 +21015,12 @@ ix86_expand_sse_movcc (rtx dest, rtx cmp, rtx op_true, 
rtx op_false)
}
  break;
 
+   case V64QImode:
+ gen = gen_avx512bw_blendmv64qi;
+ break;
+   case V32HImode:
+ gen = gen_avx512bw_blendmv32hi;
+ break;
case V16SImode:
  gen = gen_avx512f_blendmv16si;
  break;
@@ -21331,6 +21337,8 @@ ix86_expand_int_vcond (rtx operands[])
}
  break;
 
+   case V64QImode:
+   case V32HImode:
case V32QImode:
case V16HImode:
case V16QImode:


Re: [PATCH 2/2] [AARCH64,NEON] Convert arm_neon.h to use new builtins for vld[234](q?)_lane_*

2014-10-09 Thread Tejas Belagod

On 08/10/14 18:27, charles.bay...@linaro.org wrote:

From: Charles Baylis 

This patch replaces the inline assembler implementations of the
vld[234](q?)_lane_* intrinsics with new versions which exploit the new builtin
functions added in patch 1.

Tested (with the rest of the patch series) with make check on aarch64-oe-linux
with qemu, and also causes no regressions in clyon's NEON intrinsics tests.

  Charles Baylis  

 * config/aarch64/arm_neon.h (__LD2_LANE_FUNC): Rewrite using builtins,
 update uses to use new macro arguments.
 (__LD3_LANE_FUNC): Likewise.
 (__LD4_LANE_FUNC): Likewise.

Change-Id: I3bd5934b5c4f6127088193c1ab12848144d5540a
---
  gcc/config/aarch64/arm_neon.h | 377 --
  1 file changed, 255 insertions(+), 122 deletions(-)

diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 9b1873f..19ce261 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -11805,47 +11805,83 @@ __LD2R_FUNC (uint16x8x2_t, uint16x2_t, uint16_t, 8h, 
u16, q)
  __LD2R_FUNC (uint32x4x2_t, uint32x2_t, uint32_t, 4s, u32, q)
  __LD2R_FUNC (uint64x2x2_t, uint64x2_t, uint64_t, 2d, u64, q)

-#define __LD2_LANE_FUNC(rettype, ptrtype, regsuffix,   \
-   lnsuffix, funcsuffix, Q)\
-  __extension__ static __inline rettype
\
-  __attribute__ ((__always_inline__))  \
-  vld2 ## Q ## _lane_ ## funcsuffix (const ptrtype *ptr,   \
-rettype b, const int c)\
-  {\
-rettype result;\
-__asm__ ("ld1 {v16." #regsuffix ", v17." #regsuffix "}, %1\n\t"\
-"ld2 {v16." #lnsuffix ", v17." #lnsuffix "}[%3], %2\n\t"   \
-"st1 {v16." #regsuffix ", v17." #regsuffix "}, %0\n\t" \
-: "=Q"(result) \
-: "Q"(b), "Q"(*(const rettype *)ptr), "i"(c)   \
-: "memory", "v16", "v17"); \
-return result; \
-  }
-
-__LD2_LANE_FUNC (int8x8x2_t, uint8_t, 8b, b, s8,)
-__LD2_LANE_FUNC (float32x2x2_t, float32_t, 2s, s, f32,)
-__LD2_LANE_FUNC (float64x1x2_t, float64_t, 1d, d, f64,)
-__LD2_LANE_FUNC (poly8x8x2_t, poly8_t, 8b, b, p8,)
-__LD2_LANE_FUNC (poly16x4x2_t, poly16_t, 4h, h, p16,)
-__LD2_LANE_FUNC (int16x4x2_t, int16_t, 4h, h, s16,)
-__LD2_LANE_FUNC (int32x2x2_t, int32_t, 2s, s, s32,)
-__LD2_LANE_FUNC (int64x1x2_t, int64_t, 1d, d, s64,)
-__LD2_LANE_FUNC (uint8x8x2_t, uint8_t, 8b, b, u8,)
-__LD2_LANE_FUNC (uint16x4x2_t, uint16_t, 4h, h, u16,)
-__LD2_LANE_FUNC (uint32x2x2_t, uint32_t, 2s, s, u32,)
-__LD2_LANE_FUNC (uint64x1x2_t, uint64_t, 1d, d, u64,)
-__LD2_LANE_FUNC (float32x4x2_t, float32_t, 4s, s, f32, q)
-__LD2_LANE_FUNC (float64x2x2_t, float64_t, 2d, d, f64, q)
-__LD2_LANE_FUNC (poly8x16x2_t, poly8_t, 16b, b, p8, q)
-__LD2_LANE_FUNC (poly16x8x2_t, poly16_t, 8h, h, p16, q)
-__LD2_LANE_FUNC (int8x16x2_t, int8_t, 16b, b, s8, q)
-__LD2_LANE_FUNC (int16x8x2_t, int16_t, 8h, h, s16, q)
-__LD2_LANE_FUNC (int32x4x2_t, int32_t, 4s, s, s32, q)
-__LD2_LANE_FUNC (int64x2x2_t, int64_t, 2d, d, s64, q)
-__LD2_LANE_FUNC (uint8x16x2_t, uint8_t, 16b, b, u8, q)
-__LD2_LANE_FUNC (uint16x8x2_t, uint16_t, 8h, h, u16, q)
-__LD2_LANE_FUNC (uint32x4x2_t, uint32_t, 4s, s, u32, q)
-__LD2_LANE_FUNC (uint64x2x2_t, uint64_t, 2d, d, u64, q)
+#define __LD2_LANE_FUNC(intype, vectype, largetype, ptrtype,  \
+mode, ptrmode, funcsuffix, signedtype)\
+__extension__ static __inline intype __attribute__ ((__always_inline__))   \
+vld2_lane_##funcsuffix (const ptrtype * __ptr, intype __b, const int __c)  \
+{ \
+  __builtin_aarch64_simd_oi __o;  \
+  largetype __temp;   \
+  __temp.val[0] = \
+vcombine_##funcsuffix (__b.val[0], vcreate_##funcsuffix (0)); \
+  __temp.val[1] = \
+vcombine_##funcsuffix (__b.val[1], vcreate_##funcsuffix (0)); \
+  __o = __builtin_aarch64_set_qregoi##mode (__o,  \
+  (signedtype) __temp.val[0], \
+  0); \
+  __o = __builtin_aarch64_set_qregoi##mode (__o,  \
+  (signedtype) __temp.val[1], \
+  1); \
+  __o =__bui

Re: [i386] Replace builtins with vector extensions

2014-10-09 Thread Kirill Yukhin
Hello folks,
On 09 Oct 14:57, Uros Bizjak wrote:
> On Thu, Oct 9, 2014 at 2:28 PM, Marc Glisse  wrote:
> > On Thu, 9 Oct 2014, Uros Bizjak wrote:
> OK, let's go in the proposed way, more detailed:
> 
> - we begin with +-*/ of float/double vectors. IMO, this would result
> in a relatively small and easily reviewable patch to iron out the
> details of the approach. Alternatively, we can begin with floats only.
> - commit the patch and wait for the sky to fall down.
> - we play a bit with the compiler to check generated code and corner
> cases (some kind of Q/A) and wait if someone finds a problem (say, a
> couple of weeks).
> - if there are no problems, continue with integer builtins following
> the established approach, otherwise we revert everything and go back
> to the drawing board.
> - repeat the procedure for other builtins.
> 
> I propose to wait a couple of days for possible comments before we get
> the ball rolling.
Let me repeat, I think this is good idea to do.
I just wanted to kindly ask you wait for about 1-2ww before checking-in
this things.
I hope in that time AVX-512VL,BW,DQ will hit trunk completely 
and *lots* more intrinsics will be added (I think intrinsics is
subject of ~[85/n] patch).

--
Thanks, K

> 
> Uros.


Re: [PATCH i386 AVX512] [64/n] Add rest of VI1-AVX2: vpack[us]wb.

2014-10-09 Thread Uros Bizjak
On Thu, Oct 9, 2014 at 12:09 PM, Kirill Yukhin  wrote:
> Hello,
> This patch adds rest of vpack instruction patterns.
>
> Bootstrapped.
> gcc.target/i386.exp tests on top of patch-set show no regressions.
> under simulator.
>
> Is it ok for trunk?
>
> gcc/
> * config/i386/sse.md
> (define_insn "_packsswb"): Add masking.
> (define_insn "_packuswb"): Ditto.

OK.

Thanks,
Uros.


Re: [C++ PAtch] More C++11 and C++14 constexpr work

2014-10-09 Thread Paolo Carlini

Hi,

On 10/09/2014 04:18 PM, Jason Merrill wrote:

On 10/09/2014 09:49 AM, Paolo Carlini wrote:

Hi,

On 10/09/2014 03:31 PM, Jason Merrill wrote:

On 10/08/2014 03:47 PM, Paolo Carlini wrote:

(check_constexpr_ctor_body): Use it; add bool parameter.

This function seems to only be called in one place; why add the
parameter?

Is also called recursively by check_constexpr_ctor_body_1 and without
the complain boolean we end up printing the error message twice.

Ah, guess I overlooked that.  OK.

Thanks, I'm going to commit the patch.

I noticed today that given the actual C++11 the error messages we provide:

"constexpr constructor does not have empty body"

and:

"body of constexpr function ‘XXX’ not a return-statement"

are rather outdated and misleading. In principle we should probably also 
provide more fine grained error messages, but if you have suggestions 
for less misleading catch all, I volunteer to do the change and adjust 
the testcases...


Also, I have been thinking that it would probably make sense to move 
constexpr-related code to a separate cp/constexpr.c: what do you think? 
Functions with *constexpr* in the name, the various cxx_eval_* and the 
various potential_constant_* would qualify, I think.


Paolo.


Re: [PATCH i386 AVX512] [65/n] Add rest of VI1-AVX2: mul insn pattern.

2014-10-09 Thread Uros Bizjak
On Thu, Oct 9, 2014 at 12:19 PM, Kirill Yukhin  wrote:
> Hello,
> This tiny patch extend mul insn pattern to support
> masking.
>
> Bootstrapped.
> AVX-512* tests on top of patch-set all pass
> under simulator.
>
> Is it ok for trunk?
>
> gcc/
> * config/i386/sse.md
> (define_expand "mul3"): Add masking.

OK.

Thanks,
Uros.


Re: [patch] Excessive alignment in ix86_data_alignment

2014-10-09 Thread H.J. Lu
On Thu, Oct 9, 2014 at 1:37 AM, Uros Bizjak  wrote:
> On Thu, Oct 9, 2014 at 10:25 AM, Kirill Yukhin  
> wrote:
>> On 08 Oct 23:02, Petr Murzin wrote:
>>> Hi,
>>> I have measured performance impact on Haswell platform according to this
>>> input:
>>> https://gcc.gnu.org/ml/gcc-patches/2014-06/msg00978.html

Kirill, please mention:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61296

in your ChangeLog.

> What about older processors?

Kirill, please collect data on Nehelam/Westmere, Sandybrigde/Ivybride
and Silvermont.

> The optimization was introduced well before Haswell for then current
> processors, and it was based on the recommendation from Intel
> optimization guide. If this optimization doesn't apply for new
> processors, then tune option should be introduced and set accordingly.
>

I believe the original excessive alignment was introduced by cut/paste
from

https://gcc.gnu.org/git/?p=gcc.git;a=patch;h=ed45e834f305d1f2709bf200a13d5beebc2fcfee

to improve x86 FP performance, which might be partially copied from
CONSTANT_ALIGNMENT:

https://gcc.gnu.org/git/?p=gcc.git;a=patch;h=f7d6703c5d83fc9fb06246d6eb49e9b61098045c


-- 
H.J.


Re: [PATCH i386 AVX512] [66/n] Extend vpalignr insn patterns.

2014-10-09 Thread Uros Bizjak
On Thu, Oct 9, 2014 at 12:28 PM, Kirill Yukhin  wrote:
> Hello,
> This patch extends vpalignr insn patterns.
> It also introduces dedicated `masked' version of pattern
> w/o substing.
>
> Bootstrapped.
> AVX-512* tests on top of patch-set all pass
> under simulator.
>
> Is it ok for trunk?
>
> gcc/
> * config/i386/sse.md
> (define_mode_iterator SSESCALARMODE): Add V4TI mode.
> (define_insn "_palignr_mask"): New.
> (define_insn "_palignr"): Add EVEX version.

OK, although SSESCALARMODE became even more messy ...

Just FYI: V1TI in VIMAX_AVX2 iterator is used to prevent moves of
TImode values from SSE to general regs on x86_64. The same reasoning
applies to V1DI MMX mode on x86_32.

Thanks,
Uros.


Re: [PATCH i386 AVX512] [67/n] Update constraints in vec_dup insn pattern.

2014-10-09 Thread Uros Bizjak
On Thu, Oct 9, 2014 at 12:34 PM, Kirill Yukhin  wrote:
> Hello,
> This tiny patch updates constraints in vec_dup insn
> pattern.
>
> Bootstrapped.
> AVX-512* tests on top of patch-set all pass
> under simulator.
>
> Is it ok for trunk?
>
> gcc/
> * config/i386/sse.md
> (define_insn "vec_dup"): Update constraints.

OK.

Thanks,
Uros.


Re: [i386] Replace builtins with vector extensions

2014-10-09 Thread H.J. Lu
On Thu, Oct 9, 2014 at 5:57 AM, Uros Bizjak  wrote:
> On Thu, Oct 9, 2014 at 2:28 PM, Marc Glisse  wrote:
>> On Thu, 9 Oct 2014, Uros Bizjak wrote:
>>
>>> On Thu, Oct 9, 2014 at 12:33 PM, Marc Glisse  wrote:

 Ping https://gcc.gnu.org/ml/gcc-patches/2014-07/msg01812.html

 (another part of the discussion is around
 https://gcc.gnu.org/ml/gcc-patches/2014-06/msg02288.html )

 Most people who commented seem cautiously in favor. The least favorable
 was
 Ulrich who suggested to go with it but keep the old behavior accessible
 if
 the user defines some macro (which imho would lose a large part of the
 simplification benefits of the patch)
 https://gcc.gnu.org/ml/gcc-patches/2014-06/msg02328.html

 If this is accepted, I will gladly prepare patches removing the unused
 builtins and extending this to a few more operations (integer vectors in
 particular). If this is not the direction we want to go, I'd like to hear
 it
 clearly so I can move on...
>>>
>>>
>>> Well, I'm undecided.
>>
>>
>> First, thanks for answering, it helps me a lot to know what others think.
>>
>>> The current approach is proven to work OK, there is no bugs reported
>>> in this area and the performance is apparently OK. There should be
>>> clear benefits in order to change something that "ain't broken", and
>>> at least some proof that we won't regress in this area with the new
>>> approach.
>>
>>
>> There are quite a few enhancement PRs asking for more performance, but
>> indeed no (or very few) complaints about correctness or about gcc turning
>> their code into something worse than what they wrote, which I completely
>> agree weighs more.
>>
>>> On the other hand, if the new approach opens new optimization
>>> opportunities (without regression!), I'm in favor of it, including the
>>> fact that new code won't produce equivalent assembly - as long as
>>> functionality of the optimized asm stays the same (obviously, I'd
>>> say).
>>>
>>> Please also note that this is quite big project. There are plenty of
>>> intrinsics and I for one don't want another partial transition ...
>>
>>
>> That might be an issue : this transition is partial by nature. Many
>> intrinsics cannot (easily) be expressed in GIMPLE, and among those that can
>> be represented, we only want to change those for which we are confident that
>> we will not regress the quality of the code. From the reactions, I would
>> assume that we want to be quite conservative at the beginning, and maybe we
>> can reconsider some other intrinsics later.
>>
>> The best I can offer is consistency: if addition of v2df is changed,
>> addition of v4df is changed as well (and say any +-*/ of float/double
>> vectors of any supported size). Another block would be +-*/% for integer
>> vectors. And construction / access (most construction is already
>> builtin-free). And remove the unused builtins in the same patch that makes
>> them unused. If you don't like those blocks, I can write one mega-patch that
>> does all these, if we roughly agree on the list beforehand, so it goes in
>> all at once.
>>
>> Would that be good enough?
>
> OK, let's go in the proposed way, more detailed:
>
> - we begin with +-*/ of float/double vectors. IMO, this would result
> in a relatively small and easily reviewable patch to iron out the
> details of the approach. Alternatively, we can begin with floats only.
> - commit the patch and wait for the sky to fall down.
> - we play a bit with the compiler to check generated code and corner
> cases (some kind of Q/A) and wait if someone finds a problem (say, a
> couple of weeks).
> - if there are no problems, continue with integer builtins following
> the established approach, otherwise we revert everything and go back
> to the drawing board.
> - repeat the procedure for other builtins.
>
> I propose to wait a couple of days for possible comments before we get
> the ball rolling.
>

We should also include some testcases to show code improvement
for each change.

Thanks.


-- 
H.J.


Re: [PATCH i386 AVX512] [68/n] Add vpmullw, vpacksdw, pmaddwd insn patterns.

2014-10-09 Thread Uros Bizjak
On Thu, Oct 9, 2014 at 1:07 PM, Kirill Yukhin  wrote:
> Hello,
> This patch extends vpmullw, vpacksdw and pmaddwd
> insn patterns.
>
> Bootstrapped.
> AVX-512* tests on top of patch-set all pass
> under simulator.
>
> Is it ok for trunk?
>
> gcc/
> * config/i386/sse.md
> (define_c_enum "unspec"): Add UNSPEC_PMADDWD512.
> (define_mode_iterator VI2_AVX2): Add V32HI mode.
> (define_expand "mul3"): Add masking.
> (define_insn "*mul3"): Ditto.
> (define_expand "mul3_highpart"): Ditto.
> (define_insn "*mul3_highpart"): Ditto.
> (define_insn "avx512bw_pmaddwd512"): New.
> (define_mode_attr SDOT_PMADD_SUF): Ditto.
> (define_expand "sdot_prod"): Add .
> (define_insn "_packssdw"): Add masking.
> (define_insn "*_pmulhrsw3"): Ditto.
> (define_insn "avx2_packusdw"): Delete.
> (define_insn "sse4_1_packusdw"): Ditto.
> (define_insn "_packusdw"): New.

OK.

> +  "TARGET_SSE2
> +   && ix86_binary_operator_ok (MULT, mode, operands)
> +   &&  && "
>
> Just noticed, that need to swap target check with operads check.

No need to worry for minor issues now, but looking at the sse.md, it
looks to me like a case for a quick cleanup patch to correct these
inconsistencies.

Thanks,
Uros.


  1   2   >