Re: Ignore debug insns in memcmp optimization

2016-06-08 Thread Richard Biener
On Tue, Jun 7, 2016 at 5:22 PM, Bernd Schmidt  wrote:
> This fixes a few PRs from the last few days. Fully tested on x86_64-linux.
> Ok?

Ok.

Richard.

>
> Bernd


Re: [PATCH][2/3] Vectorize inductions that are live after the loop

2016-06-08 Thread Richard Biener
On Tue, Jun 7, 2016 at 4:32 PM, Alan Hayward  wrote:
>
>
> On 07/06/2016 10:28, "Rainer Orth"  wrote:
>
>>Alan Hayward  writes:
>>
>>> On 05/06/2016 12:00, "Andreas Schwab"  wrote:
>>>
>>>>Alan Hayward  writes:
>>>>
>>>>>* gcc.dg/vect/vect-live-2.c: New test.
>>>>
>>>>This test fails on powerpc64 (with -m64, but not with -m32):
>>>>
>>>>$ grep 'vectorized.*loops' ./vect-live-2.c.149t.vect
>>>>../gcc/testsuite/gcc.dg/vect/vect-live-2.c:10:1: note: vectorized 0 loops in function.
>>>>../gcc/testsuite/gcc.dg/vect/vect-live-2.c:29:1: note: vectorized 0 loops in function.
>>>>
>>>
>>> "note: not vectorized: relevant stmt not supported: _1 = (long
>>> unsigned int) j_24;"
>>>
>>> This is failing because power does not support vectorising a cast
>>> from int to long.  (It works on power 32-bit because longs are
>>> 32-bit and therefore there is no need to cast.)
>>>
>>> Can someone please suggest a target-supports define (or another
>>> method) I can use to disable this test for power 64-bit (but not
>>> 32-bit)?  I tried using vect_multiple_sizes, but that will also
>>> disable the test on x86 without avx.
>>
>>I'm also seeing new FAILs on Solaris/SPARC:
>>
>>+FAIL: gcc.dg/vect/vect-live-2.c -flto -ffat-lto-objects
>>scan-tree-dump-times vect "vectorized 1 loops" 1
>>+FAIL: gcc.dg/vect/vect-live-2.c scan-tree-dump-times vect
>>"vectorized 1 loops" 1
>>
>>32- and 64-bit:
>>
>>vect-live-2.c:16:3: note: not vectorized: relevant stmt not supported:
>>_2 = j.0_1 * 4;
>>vect-live-2.c:48:7: note: not vectorized: control flow in loop.
>>vect-live-2.c:35:3: note: not vectorized: loop contains function calls
>>or data references that cannot be analyzed
>>
>>and
>>
>>+FAIL: gcc.dg/vect/vect-live-slp-3.c -flto -ffat-lto-objects
>>scan-tree-dump-times vect "vec_stmt_relevant_p: stmt live but not
>>relevant" 4
>>+FAIL: gcc.dg/vect/vect-live-slp-3.c -flto -ffat-lto-objects
>>scan-tree-dump-times vect "vectorized 1 loops" 4
>>+FAIL: gcc.dg/vect/vect-live-slp-3.c -flto -ffat-lto-objects
>>scan-tree-dump-times vect "vectorizing stmts using SLP" 4
>>+FAIL: gcc.dg/vect/vect-live-slp-3.c scan-tree-dump-times vect
>>"vec_stmt_relevant_p: stmt live but not relevant" 4
>>+FAIL: gcc.dg/vect/vect-live-slp-3.c scan-tree-dump-times vect
>>"vectorized 1 loops" 4
>>+FAIL: gcc.dg/vect/vect-live-slp-3.c scan-tree-dump-times vect
>>"vectorizing stmts using SLP" 4
>>
>>vect-live-slp-3.c:29:1: note: not vectorized: no vectype for stmt: n0_29
>>= *_4;
>>vect-live-slp-3.c:30:1: note: not vectorized: no vectype for stmt: n0_29
>>= *_4;
>>vect-live-slp-3.c:31:1: note: not vectorized: no vectype for stmt: n0_29
>>= *_4;
>>vect-live-slp-3.c:32:1: note: not vectorized: no vectype for stmt: n0_29
>>= *_4;
>>vect-live-slp-3.c:62:4: note: not vectorized: control flow in loop.
>>vect-live-slp-3.c:45:3: note: not vectorized: loop contains function
>>calls or data references that cannot be analyzed
>>
>
>
> I’ve been trying both these tests on x86, aarch64, power and sparc.
>
> vect-live-slp-3.c
> Fails on  power 64 (altivec & vsx), sparc 64 (vis 2 & 3)
>   - due to long int unsupported
> Pass on x86, aarch64, power 32 (altivec & vsx), sparc 32 (vis 2 & 3)
>
> vect-live-2.c
> Fails on power 64 (altivec & vsx), sparc 64 (vis 2 & 3)
>   - due to long int unsupported
> Fails on sparc 32 (vis 2)
>   - due to multiply/shift not supported
> Pass on x86, aarch64, power 32 (altivec & vsx), sparc 32 (vis 3)
>
>
> Therefore I think both tests should be gated on “vect_long”.
> In addition, vect-live-2.c should also be gated on “vect_shift”.
>
> “vect_long” is not currently enabled for aarch64, but should be.
>
> Also “vect_shift” is not currently enabled for sparc 32 (vis 3), but
> probably should be. I leave this as a task for a sparc maintainer to
> add (as I’m unable to test).
>
>
>
>
> This patch fixes the targets for vect-live-slp-3.c and vect-live-2.c.
> It also adds aarch64 to vect_long.
>
> As a side consequence, the following vector tests are now enabled for
> aarch64:
> pr18425.c, pr30843.c, pr36493.c, pr42193.c and pr60656.c
>
> Tested on aarch64 and x86.
> Tested by inspection on power and sparc.
>
> Ok to commit?

Ok.

Thanks,
Richard.

> testsuite/
> * gcc.dg/vect/vect-live-slp-3.c: Update effective target.
> * gcc.dg/vect/vect-live-2.c: Likewise.
> * lib/target-supports.exp: Add aarch64 to vect_long.
>
>
>
>
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-live-2.c
> b/gcc/testsuite/gcc.dg/vect/vect-live-2.c
> index
> 53adc3fee006e0577a4cf2f9ba8fe091d2a09353..9460624a515945bdd72f98a0b1a6751fd
> c7a75de 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-live-2.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-live-2.c
> @@ -1,4 +1,5 @@
> -/* { dg-require-effective-target vect_int } */
> +/* { dg-require-effective-target vect_long } */
> +/* { dg-require-effective-target vect_shift } */
>  /* { dg-additional-options "-fno-tree-scev-cprop" } */
>
>  #include "tree-vect.h"
> diff --git a/gcc/testsuite/gcc.dg/vect/v

Re: [PATCH] integer overflow checking builtins in constant expressions

2016-06-08 Thread Jakub Jelinek
On Tue, Jun 07, 2016 at 03:56:43PM -0600, Martin Sebor wrote:
> >+The built-in functions promote the first two operands into infinite 
> >precision signed type
> >+and perform addition on those promoted operands. The result is then
> >+cast to the type the third argument.
> 
> The above is missing an "of" (it should read "type of the third
> argument".)

Thanks for spotting that, fixed below.
Additionally, I've noticed that ATTR_NOTHROW_TYPEGENERIC_LEAF
for the 3 new builtins is wrong, unlike the other overflow builtins,
the *_p ones are also const.  It shouldn't change much, because we lower it
very early to the internal fns that are const, but at least from the POV of
the FEs it describes the builtins properly.

Ok for trunk?

2016-06-08  Martin Sebor  
Jakub Jelinek  

PR c++/70507
PR c/68120
* builtins.def (BUILT_IN_ADD_OVERFLOW_P, BUILT_IN_SUB_OVERFLOW_P,
BUILT_IN_MUL_OVERFLOW_P): New builtins.
* builtins.c: Include gimple-fold.h.
(fold_builtin_arith_overflow): Handle
BUILT_IN_{ADD,SUB,MUL}_OVERFLOW_P.
(fold_builtin_3): Likewise.
* doc/extend.texi (Integer Overflow Builtins): Document
__builtin_{add,sub,mul}_overflow_p.
gcc/c/
* c-typeck.c (convert_arguments): Don't promote last argument
of BUILT_IN_{ADD,SUB,MUL}_OVERFLOW_P.
gcc/cp/
* constexpr.c: Include gimple-fold.h.
(cxx_eval_internal_function): New function.
(cxx_eval_call_expression): Call it.
(potential_constant_expression_1): Handle integer arithmetic
overflow built-ins.
* tree.c (builtin_valid_in_constant_expr_p): Handle
BUILT_IN_{ADD,SUB,MUL}_OVERFLOW_P.
gcc/c-family/
* c-common.c (check_builtin_function_arguments): Handle
BUILT_IN_{ADD,SUB,MUL}_OVERFLOW_P.
gcc/testsuite/
* c-c++-common/builtin-arith-overflow-1.c: Add test cases.
* c-c++-common/builtin-arith-overflow-2.c: New test.
* g++.dg/ext/builtin-arith-overflow-1.C: New test.
* g++.dg/cpp0x/constexpr-arith-overflow.C: New test.
* g++.dg/cpp1y/constexpr-arith-overflow.C: New test.

--- gcc/builtins.def.jj 2016-06-06 14:40:35.619347198 +0200
+++ gcc/builtins.def2016-06-07 17:46:52.034206039 +0200
@@ -710,6 +710,9 @@ DEF_C94_BUILTIN(BUILT_IN_TOWUPPE
 DEF_GCC_BUILTIN(BUILT_IN_ADD_OVERFLOW, "add_overflow", BT_FN_BOOL_VAR, 
ATTR_NOTHROW_TYPEGENERIC_LEAF)
 DEF_GCC_BUILTIN(BUILT_IN_SUB_OVERFLOW, "sub_overflow", BT_FN_BOOL_VAR, 
ATTR_NOTHROW_TYPEGENERIC_LEAF)
 DEF_GCC_BUILTIN(BUILT_IN_MUL_OVERFLOW, "mul_overflow", BT_FN_BOOL_VAR, 
ATTR_NOTHROW_TYPEGENERIC_LEAF)
+DEF_GCC_BUILTIN(BUILT_IN_ADD_OVERFLOW_P, "add_overflow_p", 
BT_FN_BOOL_VAR, ATTR_CONST_NOTHROW_TYPEGENERIC_LEAF)
+DEF_GCC_BUILTIN(BUILT_IN_SUB_OVERFLOW_P, "sub_overflow_p", 
BT_FN_BOOL_VAR, ATTR_CONST_NOTHROW_TYPEGENERIC_LEAF)
+DEF_GCC_BUILTIN(BUILT_IN_MUL_OVERFLOW_P, "mul_overflow_p", 
BT_FN_BOOL_VAR, ATTR_CONST_NOTHROW_TYPEGENERIC_LEAF)
 /* Clang compatibility.  */
 DEF_GCC_BUILTIN(BUILT_IN_SADD_OVERFLOW, "sadd_overflow", 
BT_FN_BOOL_INT_INT_INTPTR, ATTR_NOTHROW_LEAF_LIST)
 DEF_GCC_BUILTIN(BUILT_IN_SADDL_OVERFLOW, "saddl_overflow", 
BT_FN_BOOL_LONG_LONG_LONGPTR, ATTR_NOTHROW_LEAF_LIST)
--- gcc/builtins.c.jj   2016-06-06 14:40:35.601347427 +0200
+++ gcc/builtins.c  2016-06-07 17:46:51.958207014 +0200
@@ -64,6 +64,7 @@ along with GCC; see the file COPYING3.
 #include "rtl-chkp.h"
 #include "internal-fn.h"
 #include "case-cfn-macros.h"
+#include "gimple-fold.h"
 
 
 struct target_builtins default_target_builtins;
@@ -7943,18 +7944,28 @@ fold_builtin_unordered_cmp (location_t l
 /* Fold __builtin_{,s,u}{add,sub,mul}{,l,ll}_overflow, either into normal
arithmetics if it can never overflow, or into internal functions that
return both result of arithmetics and overflowed boolean flag in
-   a complex integer result, or some other check for overflow.  */
+   a complex integer result, or some other check for overflow.
+   Similarly fold __builtin_{add,sub,mul}_overflow_p to just the overflow
+   checking part of that.  */
 
 static tree
 fold_builtin_arith_overflow (location_t loc, enum built_in_function fcode,
 tree arg0, tree arg1, tree arg2)
 {
   enum internal_fn ifn = IFN_LAST;
-  tree type = TREE_TYPE (TREE_TYPE (arg2));
-  tree mem_arg2 = build_fold_indirect_ref_loc (loc, arg2);
+  /* The code of the expression corresponding to the type-generic
+ built-in, or ERROR_MARK for the type-specific ones.  */
+  enum tree_code opcode = ERROR_MARK;
+  bool ovf_only = false;
+
   switch (fcode)
 {
+case BUILT_IN_ADD_OVERFLOW_P:
+  ovf_only = true;
+  /* FALLTHRU */
 case BUILT_IN_ADD_OVERFLOW:
+  opcode = PLUS_EXPR;
+  /* FALLTHRU */
 case BUILT_IN_SADD_OVERFLOW:
 case BUILT_IN_SADDL_OVERFLOW:
 case BUILT_IN_SADDLL_OVERFLOW:
@@ -7963,7 +7974,12 @@ fold_builtin_arith_overflo

Re: [Patch ARM/AArch64 09/11] Add missing vrnd{,a,m,n,p,x} tests.

2016-06-08 Thread Christophe Lyon
On 7 June 2016 at 19:05, Wilco Dijkstra  wrote:
> Hi,
>
>
> These new tests cause failures due to running on non-ARMv8 hardware - the
> target check should be arm_v8_neon_hw. Also they don't run on AArch64
> hardware as arm_v8_neon_ok/arm_v8_neon_hw isn't true.

This really makes sense.

I use QEMU to run the tests, and according to my logs, the tests are compiled
with -mfpu=neon-fp-armv8 -march=armv8-a
and QEMU --cpu cortex-a9 (on the validation configurations intended to
validate armv7-a).

So... it looks like QEMU failed to reject the invalid instructions?
I'm using QEMU-2.4.1.

> check_effective_target_arm_v8_neon_hw in testsuite/lib/target-supports.exp
> needs to be extended to allow running on AArch64 as well, as these tests
> pass when I remove the dg-require-effective-target line.

Probably, I didn't take AArch64 into account when I added these.
Completing the AArch64 intrinsics tests is still to be done.

Christophe

>
> Wilco
>
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrnd.c
> b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrnd.c
> new file mode 100644
> index 000..5f492d4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrnd.c
> @@ -0,0 +1,16 @@
> +/* { dg-require-effective-target arm_v8_neon_ok } */
>
> This should be arm_v8_neon_hw (the arm_v8_neon_ok can only be used for
> compilation).
>
> +/* { dg-add-options arm_v8_neon } */
>
> 


[PATCH] Fix PR71452

2016-06-08 Thread Richard Biener

The following fixes a bug when rewriting a memory location into SSA
form.  For the testcase we didn't consider the case where the type
we end up using for the SSA name does not have enough precision
to cover all values of the dynamic type (thus we only need to consider
stores).

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2016-06-08  Richard Biener  

PR tree-optimization/71452
* tree-ssa.c (non_rewritable_lvalue_p): Make sure that the
type used for the SSA rewrite has enough precision to cover
the dynamic type of the location.

* gcc.dg/torture/pr71452.c: New testcase.

Index: gcc/tree-ssa.c
===
*** gcc/tree-ssa.c  (revision 237196)
--- gcc/tree-ssa.c  (working copy)
*** non_rewritable_lvalue_p (tree lhs)
*** 1292,1297 
--- 1320,1333 
if (integer_zerop (TREE_OPERAND (lhs, 1))
  && DECL_P (decl)
  && DECL_SIZE (decl) == TYPE_SIZE (TREE_TYPE (lhs))
+ /* If the dynamic type of the decl has larger precision than
+the decl itself we can't use the decls type for SSA rewriting.  */
+ && ((! INTEGRAL_TYPE_P (TREE_TYPE (decl))
+  || compare_tree_int (DECL_SIZE (decl),
+   TYPE_PRECISION (TREE_TYPE (decl))) == 0)
+ || (INTEGRAL_TYPE_P (TREE_TYPE (lhs))
+ && (TYPE_PRECISION (TREE_TYPE (decl))
+ >= TYPE_PRECISION (TREE_TYPE (lhs)
  && (TREE_THIS_VOLATILE (decl) == TREE_THIS_VOLATILE (lhs)))
return false;
  
Index: gcc/testsuite/gcc.dg/torture/pr71452.c
===
*** gcc/testsuite/gcc.dg/torture/pr71452.c  (revision 0)
--- gcc/testsuite/gcc.dg/torture/pr71452.c  (working copy)
***
*** 0 
--- 1,10 
+ /* { dg-do run } */
+ 
+ int main()
+ {
+   _Bool b;
+   *(char *)&b = 123;
+   if (*(char *)&b != 123)
+ __builtin_abort ();
+   return 0;
+ }
Index: gcc/testsuite/g++.dg/torture/pr71452.C
===
*** gcc/testsuite/g++.dg/torture/pr71452.C  (revision 0)
--- gcc/testsuite/g++.dg/torture/pr71452.C  (working copy)
***
*** 0 
--- 1,10 
+ // { dg-do run }
+ 
+ int main()
+ {
+   bool b;
+   *(char *)&b = 123;
+   if (*(char *)&b != 123)
+ __builtin_abort ();
+   return 0;
+ }


Re: [C++ Patch/RFC] PR 70572 ("[4.9/5/6/7 Regression] ICE on code with decltype (auto) on x86_64-linux-gnu in digest_init_r")

2016-06-08 Thread Paolo Carlini
.. shall we fix this in gcc-6-branch too or not? It's just an ICE on
invalid code, but we don't emit any diagnostic before the crash.


Thanks,
Paolo.


Re: [PATCH 9/17][ARM] Add NEON FP16 arithmetic instructions.

2016-06-08 Thread James Greenhalgh
On Thu, May 19, 2016 at 05:29:16PM +, Joseph Myers wrote:
> On Thu, 19 May 2016, Jiong Wang wrote:
> 
> > Then,
> > 
> >   * if we add scalar HF mode to standard patterns, vector HF modes operation
> > will be
> > turned into scalar HF operations instead of scalar SF operations.
> > 
> >   * if we add vector HF mode to standard patterns, vector HF modes 
> > operations
> > will
> > generate vector HF instructions directly.
> > 
> >   Will this still cause precision inconsistence with old gcc when there are
> > cascade
> >   vector float operations?
> 
> I'm not sure inconsistency with old GCC is what's relevant here.
> 
> Standard-named RTL patterns have particular semantics.  Those semantics do 
> not depend on the target architecture (except where there are target 
> macros / hooks to define such dependence).  If you have an instruction 
> that matches those target-independent semantics, it should be available 
> for the standard-named pattern.  I believe that is the case here, for both 
> the scalar and the vector instructions - they have the standard semantics, 
> so should be available for the standard patterns.
> 
> It is the responsibility of the target-independent parts of the compiler 
> to ensure that the RTL generated matches the source code semantics, so 
> that providing a standard pattern for an instruction that matches the 
> pattern's semantics does not cause any problems regarding source code 
> semantics.
> 
> That said: if the expander in old GCC is converting a vector HF operation 
> into scalar SF operations, I'd expect it also to include a conversion from 
> SFmode back to HFmode after those operations, since it will be producing a 
> vector HF result.  And that would apply for each individual operation 
> expanded.  So I would not expect inconsistency to arise from making direct 
> HFmode operations available (given that the semantics of scalar + - * / 
> are the same whether you do them directly on HFmode or promote to SFmode, 
> do the operation there and then convert the result back to HFmode before 
> doing any further operations on it).

I think the confusion here is that these two functions:

  float16x8_t
  __attribute__ ((noinline)) 
  foo (float16x8_t a, float16x8_t b, float16x8_t c)
  {
return a * b / c;
  }

  float16_t
  __attribute__ ((noinline)) 
  bar (float16_t a, float16_t b, float16_t c)
  {
return a * b / c;
  }

Have different behaviours in terms of when they extend and truncate between
floating-point precisions.

A full testcase calling these functions is attached.

Compile with

  `gcc -O3`
 for AArch64 ARMv8-A
  `gcc -O3 -mfloat-abi=hard -mfpu=neon-fp16 -mfp16-format=ieee -march=armv7-a`
 for ARMv7-A 

This prints:

  Fail:
Scalar Input256.00
Scalar Output   256.00
Vector input256.00
Vector output   inf
  Fail:
Scalar Input3.300781
Scalar Output   3.300781
Vector input3.300781
Vector output   3.302734
  Fail:
Scalar Input1.00
Scalar Output   1.00
Vector input1.00
Vector output   inf
  Fail:
Scalar Input0.03
Scalar Output   0.03
Vector input0.03
Vector output   0.00
  Fail:
Scalar Input0.000400
Scalar Output   0.000400
Vector input0.000400
Vector output   0.000447

foo, operating on vectors, remains in 16-bit precision throughout gimple,
will scalarise during veclower, and will add float_extend and float_truncate
around each operation during expand to preserve the 16-bit rounding
behaviour. For this testcase, that means two truncates per vector element.
One after the multiply, one after the divide.

bar, operating on scalars, adds promotions early due to TARGET_PROMOTED_TYPE.
In gimple we stay in 32-bit precision for the two operations, and we
truncate only after both operations. That means one truncate, taking place
after the divide.

However, I find this surprising at a language level, though I see
that Clang 3.8 has the same behaviour.  ACLE doesn't mention the GCC
vector extensions, so doesn't specify the behaviour of the arithmetic
operators on vector-of-float16_t types. GCC's vector extension documentation
gives this definition for arithmetic operations:

  The types defined in this manner can be used with a subset of normal
  C operations. Currently, GCC allows using the following operators on
  these types: +, -, *, /, unary minus, ^, |, &, ~, %.

  The operations behave like C++ valarrays. Addition is defined as
  the addition of the corresponding elements of the operands. For
  example, in the code below, each of the 4 elements in a is added to
  the corresponding 4 elements in b and the resulting vector is stored
  in c.

  Subtraction, multiplication, division, and the logical operations
  operate in a similar manner. Likewise, the result of using the unary
  minus or complem

Re: [Patch ARM/AArch64 09/11] Add missing vrnd{,a,m,n,p,x} tests.

2016-06-08 Thread Christophe Lyon
On 8 June 2016 at 09:37, Christophe Lyon  wrote:
> On 7 June 2016 at 19:05, Wilco Dijkstra  wrote:
>> Hi,
>>
>>
>> These new tests cause failures due to running on non-ARMv8 hardware - the
>> target check should be arm_v8_neon_hw. Also they don't run on AArch64
>> hardware as arm_v8_neon_ok/arm_v8_neon_hw isn't true.
>
> This really makes sense.
>
> I use QEMU to run the tests, and according to my logs, the tests are compiled
> with -mfpu=neon-fp-armv8 -march=armv8-a
> and QEMU --cpu cortex-a9 (on the validation configurations intended to
> validate armv7-a).
>
> So... it looks like QEMU failed to reject the invalid instructions?
> I'm using QEMU-2.4.1.
>
Looking in more detail, objdump says:
   1074c:   f3fa05a0    vrintz.f32  d16, d16
and qemu -d in_asm says:
0x0001074c:  f3fa05a0  vabal.uq8, d26, d16

and I've just had the same behaviour with QEMU-2.6.0

Incorrect decoding probably means incorrect execution
(but how does the test manage to pass?).

Christophe

>> check_effective_target_arm_v8_neon_hw in testsuite/lib/target-supports.exp
>> needs to be extended to allow running on AArch64 as well, as these tests
>> pass when I remove the dg-require-effective-target line.
>
> Probably, I didn't take AArch64 into account when added these.
> AArch64 intrinsics tests completion is still to be done.
>
> Christophe
>
>>
>> Wilco
>>
>>
>> diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrnd.c
>> b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrnd.c
>> new file mode 100644
>> index 000..5f492d4
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrnd.c
>> @@ -0,0 +1,16 @@
>> +/* { dg-require-effective-target arm_v8_neon_ok } */
>>
>> This should be arm_v8_neon_hw (the arm_v8_neon_ok can only be used for
>> compilation).
>>
>> +/* { dg-add-options arm_v8_neon } */
>>
>> 


Re: [Patch ARM/AArch64 09/11] Add missing vrnd{,a,m,n,p,x} tests.

2016-06-08 Thread Christophe Lyon
On 8 June 2016 at 10:47, Christophe Lyon  wrote:
> On 8 June 2016 at 09:37, Christophe Lyon  wrote:
>> On 7 June 2016 at 19:05, Wilco Dijkstra  wrote:
>>> Hi,
>>>
>>>
>>> These new tests cause failures due to running on non-ARMv8 hardware - the
>>> target check should be arm_v8_neon_hw. Also they don't run on AArch64
>>> hardware as arm_v8_neon_ok/arm_v8_neon_hw isn't true.
>>
>> This really makes sense.
>>
>> I use QEMU to run the tests, and according to my logs, the tests are compiled
>> with -mfpu=neon-fp-armv8 -march=armv8-a
>> and QEMU --cpu cortex-a9 (on the validation configurations intended to
>> validate armv7-a).
>>
>> So... it looks like QEMU failed to reject the invalid instructions?
>> I'm using QEMU-2.4.1.
>>
> Looking in more details, objdump says:
>1074c:   f3fa05a0vrintz.f32  d16, d16
> and qemu -d in_asm says:
> 0x0001074c:  f3fa05a0  vabal.uq8, d26, d16
>
> and I've just had the same behaviour with QEMU-2.6.0
>
> incorrect decoding probably means incorrect execution
> (but how does the test manage to pass?).
>
After running QEMU in debug mode, it seems that QEMU simply fails to
reject the instruction, and executes it correctly.
I'm going to file a bug.

Thanks for catching this.

> Christophe
>
>>> check_effective_target_arm_v8_neon_hw in testsuite/lib/target-supports.exp
>>> needs to be extended to allow running on AArch64 as well, as these tests
>>> pass when I remove the dg-require-effective-target line.
>>
>> Probably, I didn't take AArch64 into account when added these.
>> AArch64 intrinsics tests completion is still to be done.
>>
>> Christophe
>>
>>>
>>> Wilco
>>>
>>>
>>> diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrnd.c
>>> b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrnd.c
>>> new file mode 100644
>>> index 000..5f492d4
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrnd.c
>>> @@ -0,0 +1,16 @@
>>> +/* { dg-require-effective-target arm_v8_neon_ok } */
>>>
>>> This should be arm_v8_neon_hw (the arm_v8_neon_ok can only be used for
>>> compilation).
>>>
>>> +/* { dg-add-options arm_v8_neon } */
>>>
>>> 


Re: [PATCH 5/9] regrename: Don't run if function was separately shrink-wrapped

2016-06-08 Thread Bernd Schmidt

On 06/08/2016 03:47 AM, Segher Boessenkool wrote:

+  /* regrename creates wrong code for exception handling, if used together
+ with separate shrink-wrapping.  Disable for now, until we have
+ figured out what exactly is going on.  */


That needs to be figured out now or it'll be there forever.


Bernd



Re: [PATCH] Add selftest for pretty-print.c (v2)

2016-06-08 Thread Bernd Schmidt

On 06/08/2016 02:56 AM, David Malcolm wrote:

Good idea.  In the following I did it by adding 0x12345678 as a
successor argument to each test.  I chose that bit pattern on the
grounds that each nybble is unique and non-zero.
I printed them with %x to make it easier (I hope) to track down
problems.

Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.

OK for trunk?


Sure, that was implied by "otherwise OK".


Bernd



Re: [PATCH] Fix SLP wrong-code with VECTOR_BOOLEAN_TYPE_P (PR tree-optimization/71259)

2016-06-08 Thread Christophe Lyon
On 7 June 2016 at 11:28, Jakub Jelinek  wrote:
> On Tue, Jun 07, 2016 at 11:23:01AM +0200, Christophe Lyon wrote:
>> > --- gcc/testsuite/gcc.dg/vect/pr71259.c.jj  2016-06-03 
>> > 17:05:37.693475438 +0200
>> > +++ gcc/testsuite/gcc.dg/vect/pr71259.c 2016-06-03 17:05:32.418544731 +0200
>> > @@ -0,0 +1,28 @@
>> > +/* PR tree-optimization/71259 */
>> > +/* { dg-do run } */
>> > +/* { dg-options "-O3" } */
>
> Would changing this from dg-options to dg-additional-options help for the
> ARM issues?
> check_vect () is the standard way for testing for HW vectorization support
> and hundreds of tests use it.
>

This does fix the problem for pr71259.
I've also tried to replace all the dg-options by dg-additional-options
in vect/*.c, and this improves:
gcc.dg/vect/vect-shift-2-big-array.c
gcc.dg/vect/vect-shift-2.c

It has no effect on arm/aarch64 on these tests (which already pass or
are unsupported):
no-tree-pre-pr45241.c
pr18308.c
pr24049.c
pr33373.c
pr36228.c
pr42395.c
pr42604.c
pr46663.c
(unsupported) pr48765.c
pr49093.c
pr49352.c
pr52298.c
pr52870.c
pr53185.c
pr53773.c
pr56695.c
(unsupported) pr62171.c
pr63530.c
pr68339.c
(unsupported) vect-82_64.c
(unsupported) vect-83_64.c
vect-debug-pr41926.c
vect-fold-1.c
vect-singleton_1.c

So: should I change dg-options into dg-additional-options for all the
tests for consistency, or only on the 3 ones where it makes them pass?
(pr71259.c, vect-shift-2-big-array.c, vect-shift-2.c)

Thanks

Christophe.

>> > +/* { dg-additional-options "-mavx" { target avx_runtime } } */
>> > +
>> > +#include "tree-vect.h"
>> > +
>> > +long a, b[1][44][2];
>> > +long long c[44][17][2];
>> > +
>> > +int
>> > +main ()
>> > +{
>> > +  int i, j, k;
>> > +  check_vect ();
>> > +  asm volatile ("" : : : "memory");
>> > +  for (i = 0; i < 44; i++)
>> > +for (j = 0; j < 17; j++)
>> > +  for (k = 0; k < 2; k++)
>> > +   c[i][j][k] = (30995740 >= *(k + *(j + *b)) != (a != 8)) - 
>> > 5105075050047261684;
>> > +  asm volatile ("" : : : "memory");
>> > +  for (i = 0; i < 44; i++)
>> > +for (j = 0; j < 17; j++)
>> > +  for (k = 0; k < 2; k++)
>> > +   if (c[i][j][k] != -5105075050047261684)
>> > + __builtin_abort ();
>> > +  return 0;
>> > +}
>> >
>>
>> This new test fails on ARM targets where the default FPU is not Neon like.
>> The error message I'm seeing is:
>> In file included from
>> /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/gcc.dg/vect/pr71259.c:6:0:
>> /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/gcc.dg/vect/tree-vect.h:
>> In function 'check_vect':
>> /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/gcc.dg/vect/tree-vect.h:65:5:
>> error: inconsistent operand constraints in an 'asm'
>>
>> Well, the same error message actually appears with other tests; I
>> noticed this one because it is a new one.
>>
>> The arm code is:
>> /* On some processors without NEON support, this instruction may
>>be a no-op, on others it may trap, so check that it executes
>>correctly.  */
>> long long a = 0, b = 1;
>> asm ("vorr %P0, %P1, %P2"
>>  : "=w" (a)
>>  : "0" (a), "w" (b));
>>
>> ... which has been here since 2007 :(
>>
>> IIUC, its purpose is to check Neon availability, but this makes the
>> tests fail instead of
>> being unsupported.
>>
>> Why not use an effective-target check instead?
>
> Jakub


Introduce param for copy loop headers pass

2016-06-08 Thread Jan Hubicka
Hi,
I think 20 insns to copy for a loop header is way too much.  The constant
came from jump.c, which was operating on quite a different IL and compiler.
This patch adds a --param for it so we can fine-tune it for the new
millennium.

Bootstrapped/regtested x86_64-linux, OK?

Honza

* invoke.texi (max-loop-headers-insns): Document.
* params.def (PARAM_MAX_LOOP_HEADER_INSNS): New.
* tree-ssa-loop-ch.c (should_duplicate_loop_header_p): Update comment.
(ch_base::copy_headers): Use PARAM_MAX_LOOP_HEADER_INSNS.

Index: doc/invoke.texi
===
--- doc/invoke.texi (revision 237184)
+++ doc/invoke.texi (working copy)
@@ -9066,6 +9066,9 @@ The maximum number of insns of an unswit
 @item max-unswitch-level
 The maximum number of branches unswitched in a single loop.
 
+@item max-loop-header-insns
+The maximum number of insns in loop header duplicated by copy loop headers pass.
+
 @item lim-expensive
 The minimum cost of an expensive expression in the loop invariant motion.
 
Index: params.def
===
--- params.def  (revision 237184)
+++ params.def  (working copy)
@@ -344,6 +344,13 @@ DEFPARAM(PARAM_MAX_UNSWITCH_LEVEL,
"The maximum number of unswitchings in a single loop.",
3, 0, 0)
 
+/* The maximum number of insns in loop header duplicated by copy loop headers
+   pass.  */
+DEFPARAM(PARAM_MAX_LOOP_HEADER_INSNS,
+   "max-loop-header-insns",
+   "The maximum number of insns in loop header duplicated by copy loop headers pass.",
+   20, 0, 0)
+
 /* The maximum number of iterations of a loop the brute force algorithm
for analysis of # of iterations of the loop tries to evaluate.  */
 DEFPARAM(PARAM_MAX_ITERATIONS_TO_TRACK,
Index: tree-ssa-loop-ch.c
===
--- tree-ssa-loop-ch.c  (revision 237184)
+++ tree-ssa-loop-ch.c  (working copy)
@@ -33,6 +33,7 @@ along with GCC; see the file COPYING3.
 #include "tree-inline.h"
 #include "tree-ssa-scopedtables.h"
 #include "tree-ssa-threadedge.h"
+#include "params.h"
 
 /* Duplicates headers of loops if they are small enough, so that the statements
in the loop body are always executed when the loop is entered.  This
@@ -106,8 +107,7 @@ should_duplicate_loop_header_p (basic_bl
   return false;
 }
 
-  /* Approximately copy the conditions that used to be used in jump.c --
- at most 20 insns and no calls.  */
+  /* Count number of instructions and punt on calls.  */
   for (bsi = gsi_start_bb (header); !gsi_end_p (bsi); gsi_next (&bsi))
 {
   last = gsi_stmt (bsi);
@@ -290,8 +290,8 @@ ch_base::copy_headers (function *fun)
 
   FOR_EACH_LOOP (loop, 0)
 {
-  /* Copy at most 20 insns.  */
-  int limit = 20;
+  int ninsns = PARAM_VALUE (PARAM_MAX_LOOP_HEADER_INSNS);
+  int limit = ninsns;
   if (dump_file && (dump_flags & TDF_DETAILS))
fprintf (dump_file,
 "Analyzing loop %i\n", loop->num);
@@ -333,7 +333,8 @@ ch_base::copy_headers (function *fun)
fprintf (dump_file,
 "Duplicating header of the loop %d up to edge %d->%d,"
 " %i insns.\n",
-loop->num, exit->src->index, exit->dest->index, 20 - limit);
+loop->num, exit->src->index, exit->dest->index,
+ninsns - limit);
 
   /* Ensure that the header will have just the latch as a predecessor
 inside the loop.  */


Re: [v2][AArch64, 1/6] Reimplement scalar fixed-point intrinsics

2016-06-08 Thread James Greenhalgh
On Mon, Jun 06, 2016 at 02:38:58PM +0100, Jiong Wang wrote:
> On 27/05/16 17:52, Jiong Wang wrote:
> >
> >
> >On 27/05/16 14:03, James Greenhalgh wrote:
> >>On Tue, May 24, 2016 at 09:23:36AM +0100, Jiong Wang wrote:
> >>> * config/aarch64/aarch64-simd-builtins.def: Rename to
> >>> aarch64-builtins.def.
> >>Why? We already have some number of intrinsics in here that are not
> >>strictly SIMD, but I don't see the value in the rename?
> >
> >Mostly because this builtin infrastructure is handy, and I want to
> >implement some vfp builtins in this .def file instead of implementing
> >the raw structures inside aarch64-builtins.c.
> >
> >And there may be more and more such builtins in the future, so I
> >renamed this file.
> >
> >
> >Is this OK?
> >
> >>>+(define_int_iterator FCVT_FIXED2F_SCALAR [UNSPEC_SCVTF_SCALAR
> >>>UNSPEC_UCVTF_SCALAR])
> >>Again, do we need the "SCALAR" versions at all?
> >
> >That's because for scalar fixed-point conversion, we have two types of
> >instructions to support this.
> >
> >  * scalar instruction from vfp
> >  * scalar variant instruction from simd
> >
> >One is guarded by TARGET_FLOAT, the other is guarded by TARGET_SIMD, and
> >their instruction format is different, so I want to keep them in
> >aarch64.md and aarch64-simd.md seperately.
> >
> >The other reason is these two use different patterns:
> >
> >  * vfp scalar support conversion between different size, for example,
> >SF->DI, DF->SI, so it's using two mode iterators, GPI and GPF, and
> >is utilizing the product of the two to cover all supported
> >conversions, sfsi, sfdi, dfsi, dfdi, sisf, sidf, disf, didf.
> >
> >  * simd scalar only supports conversion between the same size, so a single
> >mode iterator is used to cover sfsi, sisf, dfdi, didf.
> >
> >For intrinsics implementation, I used builtins backed by vfp scalar
> >instead of simd scalar which requires the input sitting inside
> >vector register.
> >
> >I remember the simd scalar pattern was here because it's anyway needed
> >by patch [2/6] which extends its modes naturally to vector modes. I was
> >thinking it's better to keep simd scalar variant with this scalar
> >intrinsics enable patch.
> >
> >Is this OK?

This is OK. Just watch the length of some of your ChangeLog lines when you
commit.

Thanks,
James

> gcc/
> 2016-06-06  Jiong Wang
> 
> * config/aarch64/aarch64-builtins.c (TYPES_BINOP_USS): New
> (TYPES_BINOP_SUS): Likewise.
> (aarch64_simd_builtin_data): Update include file name.
> (aarch64_builtins): Likewise.
> * config/aarch64/aarch64-simd-builtins.def (scvtf): New entries
> for conversion between scalar float-point and fixed-point.
> (ucvtf): Likewise.
> (fcvtzs): Likewise.
> (fcvtzu): Likewise.
> * config/aarch64/aarch64.md
> (3: New
> pattern for conversion between scalar float to fixed-pointer.
> (: Likewise.
> (UNSPEC_FCVTZS): New UNSPEC enumeration.
> (UNSPEC_FCVTZU): Likewise.
> (UNSPEC_SCVTF): Likewise.
> (UNSPEC_UCVTF): Likewise.
> * config/aarch64/arm_neon.h (vcvtd_n_f64_s64): Remove inline
> assembly.  Use builtin.
> (vcvtd_n_f64_u64): Likewise.
> (vcvtd_n_s64_f64): Likewise.
> (vcvtd_n_u64_f64): Likewise.
> (vcvtd_n_f32_s32): Likewise.
> (vcvts_n_f32_u32): Likewise.
> (vcvtd_n_s32_f32): Likewise.
> (vcvts_n_u32_f32): Likewise.
> * config/aarch64/iterators.md (fcvt_target): Support integer to float 
> mapping.
> (FCVT_TARGET): Likewise.
> (FCVT_FIXED2F): New iterator.
> (FCVT_F2FIXED): Likewise.
> (fcvt_fixed_insn): New define_int_attr.
> 




Re: [v2][AArch64, 2/6] Reimplement vector fixed-point intrinsics

2016-06-08 Thread James Greenhalgh
On Mon, Jun 06, 2016 at 02:39:38PM +0100, Jiong Wang wrote:
> Based on top of [1/6], this patch reimplement vector intrinsics for
> conversion between floating-point and fixed-point.

OK.

Thanks,
James

> 
> gcc/
> 2016-06-06  Jiong Wang
> 
> * config/aarch64/aarch64-builtins.def (scvtf): Register vector modes.
> (ucvtf): Likewise.
> (fcvtzs): Likewise.
> (fcvtzu): Likewise.
> * config/aarch64/aarch64-simd.md
> (3): New.
> (3): Likewise.
> * config/aarch64/arm_neon.h (vcvt_n_f32_s32): Remove inline assembly.
> Use builtin.
> (vcvt_n_f32_u32): Likewise.
> (vcvt_n_s32_f32): Likewise.
> (vcvt_n_u32_f32): Likewise.
> (vcvtq_n_f32_s32): Likewise.
> (vcvtq_n_f32_u32): Likewise.
> (vcvtq_n_f64_s64): Likewise.
> (vcvtq_n_f64_u64): Likewise.
> (vcvtq_n_s32_f32): Likewise.
> (vcvtq_n_s64_f64): Likewise.
> (vcvtq_n_u32_f32): Likewise.
> (vcvtq_n_u64_f64): Likewise.
> * config/aarch64/iterators.md (VDQ_SDI): New mode iterator.
> (VSDQ_SDI): Likewise.
> (fcvt_target): Support V4DI, V4SI and V2SI.
> (FCVT_TARGET): Likewise.
> 




Re: [v2][AArch64, 3/6] Reimplement frsqrte intrinsics

2016-06-08 Thread James Greenhalgh
On Mon, Jun 06, 2016 at 02:40:22PM +0100, Jiong Wang wrote:
> These intrinsics were implemented before the instruction pattern
> "aarch64_rsqrte" was added, so they were implemented through
> inline assembly.
> 
> This migrates the implementation to builtins.

OK. Thanks for the extra work in this patch set to add the missing
intrinsics. I'm glad to tick another couple off the TODO list!

Thanks,
James

> 
> gcc/
> 2016-06-06  Jiong Wang
> 
> * config/aarch64/aarch64-builtins.def (rsqrte): New builtins for modes
> VALLF.
> * config/aarch64/aarch64-simd.md (aarch64_rsqrte_2): Rename to
> "aarch64_rsqrte".
> * config/aarch64/aarch64.c (get_rsqrte_type): Update gen* name.
> * config/aarch64/arm_neon.h (vrsqrts_f32): Remove inline assembly.
> Use builtin.
> (vrsqrted_f64): Likewise.
> (vrsqrte_f32): Likewise.
> (vrsqrte_f64): Likewise.
> (vrsqrteq_f32): Likewise.
> (vrsqrteq_f64): Likewise.
> 



Re: [v2][AArch64, 4/6] Reimplement frsqrts intrinsics

2016-06-08 Thread James Greenhalgh
On Mon, Jun 06, 2016 at 02:40:33PM +0100, Jiong Wang wrote:
> Similar to [3/6], these intrinsics were implemented before the instruction
> pattern "aarch64_rsqrts" was added, so they were implemented
> through inline assembly.
> 
> This migrates the implementation to builtins.

OK.

Thanks,
James




Re: [v2][AArch64, 5/6] Reimplement fabd intrinsics & merge rtl patterns

2016-06-08 Thread James Greenhalgh
On Mon, Jun 06, 2016 at 02:40:45PM +0100, Jiong Wang wrote:
> These intrinsics were implemented before "fabd_3" was introduced.
> Meanwhile
> the patterns "fabd_3" and "*fabd_scalar3" can be merged into a
> single "fabd3" using VALLF.
> 
> This patch migrates the implementation to builtins backed by this pattern.

OK, but watch your ChangeLog format and line length.

Thanks,
James

> 
> gcc/
> 2016-06-01  Jiong Wang 
> 
> * config/aarch64/aarch64-builtins.def (fabd): New builtins
> for modes
> VALLF.
> * config/aarch64/aarch64-simd.md (fabd_3): Extend
> modes from VDQF
> to VALLF.  Rename to "fabd3".
> "*fabd_scalar3): Delete.
> * config/aarch64/arm_neon.h (vabds_f32): Remove inline assembly.
> Use builtin.
> (vabdd_f64): Likewise.
> (vabd_f32): Likewise.
> (vabd_f64): Likewise.
> (vabdq_f32): Likewise.
> (vabdq_f64): Likewise.




Re: [v2][AArch64, 6/6] Reimplement vpadd intrinsics & extend rtl patterns to all modes

2016-06-08 Thread James Greenhalgh
On Mon, Jun 06, 2016 at 02:40:55PM +0100, Jiong Wang wrote:
> These intrinsics were implemented by inline assembly using the "faddp" instruction.
> There was a pattern "aarch64_addpv4sf" which supports V4SF mode only, while
> we can
> extend this pattern to support VDQF mode, then we can reimplement these
> intrinsics through builtins.

OK. But watch your ChangeLog format and line length.

Thanks again for this second spin of this patch set. I'm much happier
knowing that we don't have to revisit some of these intrinsics.

Thanks,
James

> 
> gcc/
> 2016-06-06  Jiong Wang
> 
> * config/aarch64/aarch64-builtins.def (faddp): New builtins for modes 
> in VDQF.
> * config/aarch64/aarch64-simd.md (aarch64_faddp): New.
> (arch64_addpv4sf): Delete.
> (reduc_plus_scal_v4sf): Use "gen_aarch64_faddpv4sf" instead of
> "gen_aarch64_addpv4sf".
> * config/aarch64/arm_neon.h (vpadd_f32): Remove inline assembly.  Use
> builtin.
> (vpadds_f32): Likewise.
> (vpaddq_f32): Likewise.
> (vpaddq_f64): Likewise.
> 



[PATCH] Improve optimize pragma/attribute diagnostics

2016-06-08 Thread Richard Biener

As pointed out by Manu in BZ.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied as obvious.

Richard.

2016-06-08  Richard Biener  

* c-common.c (parse_optimize_options): Improve diagnostic messages.

Index: gcc/c-family/c-common.c
===
--- gcc/c-family/c-common.c (revision 237174)
+++ gcc/c-family/c-common.c (working copy)
@@ -9542,10 +9542,10 @@ parse_optimize_options (tree args, bool
  ret = false;
  if (attr_p)
warning (OPT_Wattributes,
-"bad option %s to optimize attribute", p);
+"bad option %qs to attribute %<optimize%>", p);
  else
warning (OPT_Wpragmas,
-"bad option %s to pragma attribute", p);
+"bad option %qs to pragma %<optimize%>", p);
  continue;
}
 
@@ -9589,11 +9589,11 @@ parse_optimize_options (tree args, bool
  ret = false;
  if (attr_p)
warning (OPT_Wattributes,
-"bad option %s to optimize attribute",
+"bad option %qs to attribute %<optimize%>",
 decoded_options[i].orig_option_with_args_text);
  else
warning (OPT_Wpragmas,
-"bad option %s to pragma attribute",
+"bad option %qs to pragma %<optimize%>",
 decoded_options[i].orig_option_with_args_text);
  continue;
}


Re: Update probabilities in predict.def to match reality

2016-06-08 Thread Andreas Schwab
Jan Hubicka  writes:

> Bootstrapped/regtested x86_64-linux, will commit it later today.

FAIL: gcc.dg/tree-ssa/slsr-8.c scan-tree-dump-times optimized " w?* " 7

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: [PATCH] Fix SLP wrong-code with VECTOR_BOOLEAN_TYPE_P (PR tree-optimization/71259)

2016-06-08 Thread Richard Biener
On Wed, 8 Jun 2016, Christophe Lyon wrote:

> On 7 June 2016 at 11:28, Jakub Jelinek  wrote:
> > On Tue, Jun 07, 2016 at 11:23:01AM +0200, Christophe Lyon wrote:
> >> > --- gcc/testsuite/gcc.dg/vect/pr71259.c.jj  2016-06-03 
> >> > 17:05:37.693475438 +0200
> >> > +++ gcc/testsuite/gcc.dg/vect/pr71259.c 2016-06-03 17:05:32.418544731 
> >> > +0200
> >> > @@ -0,0 +1,28 @@
> >> > +/* PR tree-optimization/71259 */
> >> > +/* { dg-do run } */
> >> > +/* { dg-options "-O3" } */
> >
> > Would changing this from dg-options to dg-additional-options help for the
> > ARM issues?
> > check_vect () is the standard way for testing for HW vectorization support
> > and hundreds of tests use it.
> >
> 
> This does fix the problem for pr71259.
> I've also tried to replace all the dg-options by dg-additional-options
> in vect/*.c, and this improves:
> gcc.dg/vect/vect-shift-2-big-array.c
> gcc.dg/vect/vect-shift-2.c
> 
> It has no effect on arm/aarch64 on these tests (which already pass or
> are unsupported):
> no-tree-pre-pr45241.c
> pr18308.c
> pr24049.c
> pr33373.c
> pr36228.c
> pr42395.c
> pr42604.c
> pr46663.c
> (unsupported) pr48765.c
> pr49093.c
> pr49352.c
> pr52298.c
> pr52870.c
> pr53185.c
> pr53773.c
> pr56695.c
> (unsupported) pr62171.c
> pr63530.c
> pr68339.c
> (unsupported) vect-82_64.c
> (unsupported) vect-83_64.c
> vect-debug-pr41926.c
> vect-fold-1.c
> vect-singleton_1.c
> 
> So: should I change dg-options into dg-additional-options for all the
> tests for consistency, or only on the 3 ones where it makes them pass?
> (pr71259.c, vect-shift-2-big-array.c, vect-shift-2.c)

I think all tests should use dg-additional-options.

Richard.

> Thanks
> 
> Christophe.
> 
> >> > +/* { dg-additional-options "-mavx" { target avx_runtime } } */
> >> > +
> >> > +#include "tree-vect.h"
> >> > +
> >> > +long a, b[1][44][2];
> >> > +long long c[44][17][2];
> >> > +
> >> > +int
> >> > +main ()
> >> > +{
> >> > +  int i, j, k;
> >> > +  check_vect ();
> >> > +  asm volatile ("" : : : "memory");
> >> > +  for (i = 0; i < 44; i++)
> >> > +for (j = 0; j < 17; j++)
> >> > +  for (k = 0; k < 2; k++)
> >> > +   c[i][j][k] = (30995740 >= *(k + *(j + *b)) != (a != 8)) - 5105075050047261684;
> >> > +  asm volatile ("" : : : "memory");
> >> > +  for (i = 0; i < 44; i++)
> >> > +for (j = 0; j < 17; j++)
> >> > +  for (k = 0; k < 2; k++)
> >> > +   if (c[i][j][k] != -5105075050047261684)
> >> > + __builtin_abort ();
> >> > +  return 0;
> >> > +}
> >> >
> >>
> >> This new test fails on ARM targets where the default FPU is not Neon like.
> >> The error message I'm seeing is:
> >> In file included from
> >> /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/gcc.dg/vect/pr71259.c:6:0:
> >> /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/gcc.dg/vect/tree-vect.h:
> >> In function 'check_vect':
> >> /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/gcc.dg/vect/tree-vect.h:65:5:
> >> error: inconsistent operand constraints in an 'asm'
> >>
> >> Well, the same error message actually appears with other tests, I did
> >> notice this one because
> >> it is a new one.
> >>
> >> The arm code is:
> >> /* On some processors without NEON support, this instruction may
> >>be a no-op, on others it may trap, so check that it executes
> >>correctly.  */
> >> long long a = 0, b = 1;
> >> asm ("vorr %P0, %P1, %P2"
> >>  : "=w" (a)
> >>  : "0" (a), "w" (b));
> >>
> >> ... which has been here since 2007 :(
> >>
> >> IIUC, its purpose is to check Neon availability, but this makes the
> >> tests fail instead of
> >> being unsupported.
> >>
> >> Why not use an effective-target check instead?
> >
> > Jakub
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [PATCH] Fix SLP wrong-code with VECTOR_BOOLEAN_TYPE_P (PR tree-optimization/71259)

2016-06-08 Thread Jakub Jelinek
On Wed, Jun 08, 2016 at 12:26:17PM +0200, Richard Biener wrote:
> > So: should I change dg-options into dg-additional-options for all the
> > tests for consistency, or only on the 3 ones where it makes them pass?
> > (pr71259.c, vect-shift-2-big-array.c, vect-shift-2.c)
> 
> I think all tests should use dg-additional-options.

All tests in {gcc,g++}.dg/vect/, right?  I agree with that.

Jakub
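For readers unfamiliar with the distinction being settled here: dg-options replaces the option set the test driver computed (for vect.exp that includes the per-target vectorization flags), while dg-additional-options appends to it. A hypothetical test header showing both forms:

```c
/* dg-options discards the harness flags (e.g. the -mavx or -maltivec
   that vect.exp picks per target); only -O3 is used:  */
/* { dg-options "-O3" } */

/* dg-additional-options appends -O3 to the harness flags, so the
   target-specific vectorization options survive:  */
/* { dg-additional-options "-O3" } */
```

This is why switching the vect testsuite to dg-additional-options makes previously failing tests pass: the vectorization flags are no longer thrown away.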


Re: [PATCH] Fix SLP wrong-code with VECTOR_BOOLEAN_TYPE_P (PR tree-optimization/71259)

2016-06-08 Thread Richard Biener
On Wed, 8 Jun 2016, Jakub Jelinek wrote:

> On Wed, Jun 08, 2016 at 12:26:17PM +0200, Richard Biener wrote:
> > > So: should I change dg-options into dg-additional-options for all the
> > > tests for consistency, or only on the 3 ones where it makes them pass?
> > > (pr71259.c, vect-shift-2-big-array.c, vect-shift-2.c)
> > 
> > I think all tests should use dg-additional-options.
> 
> All tests in {gcc,g++}.dg/vect/, right?  I agree with that.

Yes.  [and most of the vect.exp fancy-filename stuff should be replaced
by adding dg-additional-options]

Richard.


[PATCH] Fold x/x to 1, 0/x to 0 and 0%x to 0 consistently

2016-06-08 Thread Richard Biener

The following works around PR70992 but the issue came up repeatedly
that we are not very consistent in preserving the undefined behavior
of division or modulo by zero.  Ok - the only inconsistency is
that we fold 0 % x to 0 but not 0 % 0 (with literal zero).

Now that folding is no longer done early in the C-family FEs, the
number of diagnostic regressions with the patch below is two.

FAIL: g++.dg/cpp1y/constexpr-sfinae.C  -std=c++14 (test for excess errors)
FAIL: gcc.dg/wcaselabel-1.c  (test for errors, line 10)

And then there is a -fnon-call-exceptions testcase

FAIL: gcc.c-torture/execute/20101011-1.c   -O1  execution test
FAIL: gcc.c-torture/execute/20101011-1.c   -O2  execution test
FAIL: gcc.c-torture/execute/20101011-1.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
FAIL: gcc.c-torture/execute/20101011-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  execution test
FAIL: gcc.c-torture/execute/20101011-1.c   -O3 -g  execution test
FAIL: gcc.c-torture/execute/20101011-1.c   -Os  execution test

which tests that 0/0 traps (on targets where it does).  This shows
we might want to guard the simplifications against -fnon-call-exceptions.

The other way to fix the inconsistency is of course to not rely
on undefinedness in 0 % x simplification and disable that if x
is not known to be nonzero.  We can introduce the other transforms
with properly guarding against a zero 2nd operand as well.

So - any opinions here?

Thanks,
Richard.



2016-06-08  Richard Biener  

PR middle-end/70992
* match.pd (X / X -> 1): Add.
(0 / X -> 0): Likewise.
(0 % X -> 0): Remove restriction on X not being literal zero.

Index: gcc/match.pd
===
--- gcc/match.pd(revision 237205)
+++ gcc/match.pd(working copy)
@@ -140,12 +140,19 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
|| !COMPLEX_FLOAT_TYPE_P (type)))
(negate @0)))
 
-/* Make sure to preserve divisions by zero.  This is the reason why
-   we don't simplify x / x to 1 or 0 / x to 0.  */
 (for op (mult trunc_div ceil_div floor_div round_div exact_div)
   (simplify
 (op @0 integer_onep)
 (non_lvalue @0)))
+/* Make sure to preserve divisions by zero.  This is the reason why
+   we don't simplify x / x to 1 or 0 / x to 0.  */
+(for op (trunc_div ceil_div floor_div round_div exact_div)
+  (simplify
+(op @0 @0)
+{ build_one_cst (type); })
+  (simplify
+(op integer_zerop@0 @1)
+@0))
 
 /* X / -1 is -X.  */
 (for div (trunc_div ceil_div floor_div round_div exact_div)
@@ -255,9 +262,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  /* 0 % X is always zero.  */
  (simplify
   (mod integer_zerop@0 @1)
-  /* But not for 0 % 0 so that we can get the proper warnings and errors.  */
-  (if (!integer_zerop (@1))
-   @0))
+  @0)
  /* X % 1 is always zero.  */
  (simplify
   (mod @0 integer_onep)


Add a test for C DR#423 (PR c/65471)

2016-06-08 Thread Marek Polacek
Reading  it occurred
to me that we might resolve c/65471, dealing with type interpretation in
_Generic, too.  Since it turned out that GCC already does the right thing,
I'm only adding a new test.  (We should discard qualifiers from the
controlling expression of _Generic.)

Regarding the bit-field issue, it seems that we should do nothing and keep
it implementation-defined for now.

Tested on x86_64-linux, ok for trunk?

2016-06-08  Marek Polacek  

PR c/65471
* gcc.dg/c11-generic-3.c: New test.

diff --git gcc/testsuite/gcc.dg/c11-generic-3.c gcc/testsuite/gcc.dg/c11-generic-3.c
index e69de29..8bac21e 100644
--- gcc/testsuite/gcc.dg/c11-generic-3.c
+++ gcc/testsuite/gcc.dg/c11-generic-3.c
@@ -0,0 +1,10 @@
+/* Test C11 _Generic.  Test we follow the resolution of DR#423.  */
+/* { dg-do compile } */
+/* { dg-options "-std=c11 -pedantic-errors" } */
+
+char const *a = _Generic ("bla", char *: "");
+char const *b = _Generic ("bla", char[4]: ""); /* { dg-error "not compatible with any association" } */
+char const *c = _Generic ((int const) { 0 }, int: "");
+char const *d = _Generic ((int const) { 0 }, int const: ""); /* { dg-error "not compatible with any association" } */
+char const *e = _Generic (+(int const) { 0 }, int: "");
+char const *f = _Generic (+(int const) { 0 }, int const: ""); /* { dg-error "not compatible with any association" } */

Marek


Re: [PATCH GCC]Remove duplicated alias check in vectorizer

2016-06-08 Thread Richard Biener
On Mon, Jun 6, 2016 at 6:00 PM, Bin Cheng  wrote:
> Hi,
> GCC now generates duplicated alias checks in the vectorizer when versioning loops. 
>  In current implementation, DR_OFFSET and DR_INIT are added together too 
> early when creating structure dr_with_seg_len.  This has two disadvantages: 
> A) structure dr_with_seg_len_pair_t is only canonicalized against 
> DR_BASE_ADDRESS in function vect_prune_runtime_alias_test_list, while it 
> should be against DR_OFFSET too; B) When function 
> vect_prune_runtime_alias_test_list tries to merge alias checks with 
> consecutive memory references, it can only handle DRs with constant DR_OFFSET 
> + DR_INIT, as in below code:
>   /* We consider the case that DR_B1 and DR_B2 are same memrefs,
>  and DR_A1 and DR_A2 are two consecutive memrefs.  */
>   //... ...
>   if (!operand_equal_p (DR_BASE_ADDRESS (dr_a1->dr),
> DR_BASE_ADDRESS (dr_a2->dr),
> 0)
>   || !tree_fits_shwi_p (dr_a1->offset)
>   || !tree_fits_shwi_p (dr_a2->offset))
> continue;
>
> Both disadvantages result in duplicated/unnecessary alias checks, as well as 
> bloated condition basic block of loop versioning.
> This patch fixes the issue.  Bootstrap and test on x86_64 and AArch64.  Is it 
> OK?
> Test gfortran.dg/vect/vect-8.f90 failed now.  It scans for "vectorized 20 
> loops" but with this patch there are more than 20 loops vectorized.  The 
> additional loop wasn't vectorized because # of alias checks exceeded 
> parameter bound "vect-max-version-for-alias-checks" w/o this patch.
>
> There are other issues in vectorizer alias checking, I will tackle them in 
> follow up patches.

Ok.

Thanks,
Richard.

> Thanks,
> bin
>
> 2016-06-03  Bin Cheng  
>
> * tree-vectorizer.h (struct dr_with_seg_len): Remove class
> member OFFSET.
> * tree-vect-data-refs.c (operator ==): Handle DR_OFFSET directly,
> rather than OFFSET.
> (comp_dr_with_seg_len_pair, comp_dr_with_seg_len_pair): Ditto.
> (vect_create_cond_for_alias_checks): Ditto.
> (vect_prune_runtime_alias_test_list): Also canonicalize pairs
> against DR_OFFSET.  Handle DR_OFFSET directly when pruning alias
> checks.
>
> gcc/testsuite/ChangeLog
> 2016-06-03  Bin Cheng  
>
> * gcc.dg/vect/vect-alias-check-1.c: New test.


Re: [PATCH 1/2][v3] Drop excess size used for run time allocated stack variables.

2016-06-08 Thread Bernd Schmidt

On 05/25/2016 03:30 PM, Dominik Vogt wrote:

* explow.c (allocate_dynamic_stack_space): Simplify knowing that
MUST_ALIGN was always true and extra_align is always BITS_PER_UNIT.


I tried to do some archaeology to find out how the code came to look the 
way it currently does. A relevant message appears to be


https://gcc.gnu.org/ml/gcc-patches/2011-01/msg00836.html

There's some discussion about how STACK_POINTER_OFFSET shouldn't cause us 
to have to align, and postponing that optimization to gcc-4.7. Since 
STACK_POINTER_OFFSET should be constant, it ought to be easy enough to 
take it into account.


So, I'm undecided. Your cleanup is valid as the code stands right now, 
but I'm undecided whether we shouldn't fix the potentially unnecessary 
extra alignment instead.



Bernd


Re: [PATCH 2/2][v3] Drop excess size used for run time allocated stack variables.

2016-06-08 Thread Bernd Schmidt

On 05/25/2016 03:32 PM, Dominik Vogt wrote:

* explow.c (round_push): Use known adjustment.
(allocate_dynamic_stack_space): Pass known adjustment to round_push.
gcc/testsuite/ChangeLog



I was thinking about whether it would be possible/desirable to eliminate 
the double add entirely, but I couldn't find a way to structure the code 
in a way that seems better than what you have. So, ...



 /* Round the size of a block to be pushed up to the boundary required
-   by this machine.  SIZE is the desired size, which need not be constant.  */
+   by this machine.  SIZE is the desired size, which need not be constant.
+   ALREADY_ADDED is the number of units that have already been added to SIZE
+   for other alignment reasons.
+*/


The */ goes on the last line of the comment.


+/* PR/50938: Check that alloca () reserves the correct amount of stack space.
+ */


Same here really, even if it's only a test.

Ok with these fixed.


Bernd


[Patch, avr] Fix broken stack-usage-1.c test

2016-06-08 Thread Senthil Kumar Selvaraj
Hi,

  A recent patch I submitted fixed broken -fstack-usage for the avr
  target, by including the size of the return address pushed to the stack
  (https://gcc.gnu.org/ml/gcc-patches/2016-05/msg01715.html).

  I forgot to send this testcase modification with that patch - here's
  the fix for making gcc.dg/stack-usage-1.c pass again for avr.

  If this is ok, could someone commit please? I don't have commit
  access.

Regards
Senthil

gcc/testsuite/ChangeLog

2016-06-08  Senthil Kumar Selvaraj  

* gcc.dg/stack-usage-1.c (SIZE): Consider return address
  when setting SIZE.

diff --git gcc/testsuite/gcc.dg/stack-usage-1.c gcc/testsuite/gcc.dg/stack-usage-1.c
index 7864c6a..bdc5656 100644
--- gcc/testsuite/gcc.dg/stack-usage-1.c
+++ gcc/testsuite/gcc.dg/stack-usage-1.c
@@ -64,7 +64,11 @@
 #define SIZE 240
 #  endif
 #elif defined (__AVR__)
-#  define SIZE 254
+#if defined (__AVR_3_BYTE_PC__ )
+#  define SIZE 251 /* 256 - 2 bytes for Y - 3 bytes for return address */
+#else
+#  define SIZE 252 /* 256 - 2 bytes for Y - 2 bytes for return address */
+#endif
 #elif defined (__s390x__)
 #  define SIZE 96  /* 256 - 160 bytes for register save area */
 #elif defined (__s390__)


Re: increase alignment of global structs in increase_alignment pass

2016-06-08 Thread Prathamesh Kulkarni
On 7 June 2016 at 20:17, Wilco Dijkstra  wrote:
>
> After your commit these tests fail on AArch64:
>
> UNRESOLVED: gcc.dg/vect/section-anchors-vect-70.c scan-ipa-dump-times
> increase_alignment "Increasing alignment of decl" 0
> UNRESOLVED: gcc.dg/vect/section-anchors-vect-70.c scan-ipa-dump-times
> increase_alignment "Increasing alignment of decl" 3
> UNRESOLVED: gcc.dg/vect/section-anchors-vect-71.c scan-ipa-dump-times
> increase_alignment "Increasing alignment of decl" 0
> UNRESOLVED: gcc.dg/vect/section-anchors-vect-72.c scan-ipa-dump-times
> increase_alignment "Increasing alignment of decl" 0
> UNRESOLVED: gcc.dg/vect/section-anchors-vect-72.c scan-ipa-dump-times
> increase_alignment "Increasing alignment of decl" 3
>
> Did you mean to commit these tests as aligned-section-anchors-vect-*.c? That
> would enable the -fdump-ipa-increase_alignment-details that appears to be
> required by these tests.
Oops, sorry for that. I had bootstrapped and tested the patch on
aarch64-linux-gnu before committing;
I wonder why these didn't show up with the compare_tests script, or maybe I
missed something :/
I renamed the above files in r237207 and the tests change from
UNRESOLVED to PASS.

Thanks,
Prathamesh
>
> Wilco
>


Re: Introduce param for copy loop headers pass

2016-06-08 Thread Richard Biener
On Wed, 8 Jun 2016, Jan Hubicka wrote:

> Hi,
> I think 20 insns to copy for a loop header is way too much. The constant came
> from jump.c, which was operating with quite a different IL and compiler.
> This patch adds a --param for it so we can fine-tune it for the new millennium.
> 
> Bootstrapped/regtested x86_64-linux, OK?

Ok.

Thanks
Richard.

> Honza
> 
>   * invoke.texi (max-loop-headers-insns): Document.
>   * params.def (PARAM_MAX_LOOP_HEADER_INSNS): New.
>   * tree-ssa-loop-ch.c (should_duplicate_loop_header_p): Update comment.
>   (ch_base::copy_headers): Use PARAM_MAX_LOOP_HEADER_INSNS.
> 
> Index: doc/invoke.texi
> ===
> --- doc/invoke.texi   (revision 237184)
> +++ doc/invoke.texi   (working copy)
> @@ -9066,6 +9066,9 @@ The maximum number of insns of an unswit
>  @item max-unswitch-level
>  The maximum number of branches unswitched in a single loop.
>  
> +@item max-loop-headers-insns
> +The maximum number of insns in loop header duplicated by copy loop headers pass.
> +
>  @item lim-expensive
>  The minimum cost of an expensive expression in the loop invariant motion.
>  
> Index: params.def
> ===
> --- params.def(revision 237184)
> +++ params.def(working copy)
> @@ -344,6 +344,13 @@ DEFPARAM(PARAM_MAX_UNSWITCH_LEVEL,
>   "The maximum number of unswitchings in a single loop.",
>   3, 0, 0)
>  
> +/* The maximum number of insns in loop header duplicated by copy loop headers
> +   pass.  */
> +DEFPARAM(PARAM_MAX_LOOP_HEADER_INSNS,
> + "max-loop-header-insns",
> + "The maximum number of insns in loop header duplicated by copy loop headers pass.",
> + 20, 0, 0)
> +
>  /* The maximum number of iterations of a loop the brute force algorithm
> for analysis of # of iterations of the loop tries to evaluate.  */
>  DEFPARAM(PARAM_MAX_ITERATIONS_TO_TRACK,
> Index: tree-ssa-loop-ch.c
> ===
> --- tree-ssa-loop-ch.c(revision 237184)
> +++ tree-ssa-loop-ch.c(working copy)
> @@ -33,6 +33,7 @@ along with GCC; see the file COPYING3.
>  #include "tree-inline.h"
>  #include "tree-ssa-scopedtables.h"
>  #include "tree-ssa-threadedge.h"
> +#include "params.h"
>  
>  /* Duplicates headers of loops if they are small enough, so that the 
> statements
> in the loop body are always executed when the loop is entered.  This
> @@ -106,8 +107,7 @@ should_duplicate_loop_header_p (basic_bl
>return false;
>  }
>  
> -  /* Approximately copy the conditions that used to be used in jump.c --
> - at most 20 insns and no calls.  */
> +  /* Count number of instructions and punt on calls.  */
>for (bsi = gsi_start_bb (header); !gsi_end_p (bsi); gsi_next (&bsi))
>  {
>last = gsi_stmt (bsi);
> @@ -290,8 +290,8 @@ ch_base::copy_headers (function *fun)
>  
>FOR_EACH_LOOP (loop, 0)
>  {
> -  /* Copy at most 20 insns.  */
> -  int limit = 20;
> +  int ninsns = PARAM_VALUE (PARAM_MAX_LOOP_HEADER_INSNS);
> +  int limit = ninsns;
>if (dump_file && (dump_flags & TDF_DETAILS))
>   fprintf (dump_file,
>"Analyzing loop %i\n", loop->num);
> @@ -333,7 +333,8 @@ ch_base::copy_headers (function *fun)
>   fprintf (dump_file,
>"Duplicating header of the loop %d up to edge %d->%d,"
>" %i insns.\n",
> -  loop->num, exit->src->index, exit->dest->index, 20 - limit);
> +  loop->num, exit->src->index, exit->dest->index,
> +  ninsns - limit);
>  
>/* Ensure that the header will have just the latch as a predecessor
>inside the loop.  */
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: Introduce param for copy loop headers pass

2016-06-08 Thread Bernd Schmidt

On 06/08/2016 11:31 AM, Jan Hubicka wrote:

I think 20 insns to copy for a loop header is way too much. The constant came
from jump.c, which was operating with quite a different IL and compiler.
This patch adds a --param for it so we can fine-tune it for the new millennium.



+@item max-loop-headers-insns
+The maximum number of insns in loop header duplicated by copy loop headers pass.
+


"the copy loop headers pass", here and in params.def.


-  int limit = 20;
+  int ninsns = PARAM_VALUE (PARAM_MAX_LOOP_HEADER_INSNS);
+  int limit = ninsns;


The naming is somewhat unfortunate, I think limit should be the initial 
limit, and something like remaining_limit should be the name of the one 
that counts down.


Otherwise ok.


Bernd


Re: Update probabilities in predict.def to match reality

2016-06-08 Thread Martin Liška
On 06/07/2016 09:27 PM, Jan Hubicka wrote:
> There are bugs in a few predictors - the goto predictor is dead because the FE
> code was dropped,
> the return predictor is a bit random because the CFG is optimized (it should
> probably be done in the FE),
> loop iv compare seems bogus, and fortran fail alloc does not seem to work as
> intended.
> I added FIXMEs and will address them incrementally.

Hi.

I've just investigated the 'fail alloc' predicate, which occurs in just 2 SPEC2006
benchmarks:

437.leslie3d
HEURISTICS    BRANCHES   (REL)   HITRATE             COVERAGE   COVERAGE   (REL)
fail alloc          15    1.3%   100.00% / 100.00%         15      15.00    0.0%

and

459.GemsFDTD
HEURISTICS    BRANCHES   (REL)   HITRATE             COVERAGE   COVERAGE   (REL)
fail alloc         580   12.3%    61.21% / 100.00%        580     580.00    0.0%

The first one contains just couple of edges, while Gems contains quite many.
As we use the PRED_FORTRAN_FAIL_ALLOC predicate for multiple edges, I've
split the predicate
into several to observe what happens:

a) It fails in situations where we decorate ALLOCATABLE, which can be called 
for an already allocated object.
All such edges have hits == 1. The predicate is set properly.

b) Very similar situation in deallocation, where it can be called for already
released memory.
All such edges have hits == 1. The predicate is set properly.

I've also tried polyhedron, where the predicate behaves as follows:
HEURISTICS    BRANCHES   (REL)   HITRATE             COVERAGE   COVERAGE   (REL)
fail alloc         572    6.0%    66.08% / 100.00%        572     572.00    0.0%

The only reason why it fails is array allocation, where the function is called 
for an already allocated array.
Thus the predicate should be also fine.

It's hard to guess how to properly set the predictor. The name is a bit 
misleading as it's not tightly connected
to a memory allocation failure. As I don't have any real-world fortran code 
base, it's quite hard to catch some
representative numbers. I would alter the number to 70-80%.

Thoughts?

Martin


Re: [PATCH 0/9] separate shrink-wrapping

2016-06-08 Thread Bernd Schmidt

On 06/08/2016 03:47 AM, Segher Boessenkool wrote:

This patch series introduces separate shrink-wrapping.

[...]

The next six patches are to prevent later passes from mishandling the
epilogue instructions that now appear before the epilogue: mostly, you
cannot do much to instructions with a REG_CFA_RESTORE note without
confusing dwarf2cfi.  The cprop one is for prologue instructions.


I'll need a while to sort out my thoughts about this. On the whole I 
like having the ability to do this, but I'm worried about the fragility 
it introduces in passes after shrink-wrapping. Ideally we'd need an ix86 
implementation for test coverage reasons.


Is the usage of the word "concern" here standard for this kind of thing? 
It seems odd somehow but maybe that's just me.



Bernd


Re: Update probabilities in predict.def to match reality

2016-06-08 Thread Jan Hubicka
> On 06/07/2016 09:27 PM, Jan Hubicka wrote:
> > There are bugs in a few predictors - the goto predictor is dead because the
> > FE code was dropped, the return predictor is a bit random because the CFG is
> > optimized (it should probably be done in the FE), loop iv compare seems
> > bogus, and fortran fail alloc does not seem to work as intended.
> > I added FIXMEs and will address them incrementally.
> 
> Hi.
> 
> I've just investigated 'fail alloc' predicate which occurs just in 2 SPEC2006 
> benchmarks:
> 
> 437.leslie3d
> HEURISTICS                           BRANCHES  (REL)  HITRATE                COVERAGE COVERAGE  (REL)
> fail alloc                                 15   1.3% 100.00% / 100.00%            15    15.00   0.0%
> 
> and
> 
> 459.GemsFDTD
> HEURISTICS                           BRANCHES  (REL)  HITRATE                COVERAGE COVERAGE  (REL)
> fail alloc                                580  12.3%  61.21% / 100.00%           580   580.00   0.0%
> 
> The first one contains just a couple of edges, while Gems contains quite many.
> As we use the PRED_FORTRAN_FAIL_ALLOC predicate for multiple edges, I've
> split the predicate into several to observe what happens:
> 
> a) It fails in situations where we decorate ALLOCATABLE, which can be called
> for an already allocated object.
> All such edges have hits == 1. The predicate is set properly.
> 
> b) Very similar situation in deallocation, where it can be called for
> already released memory.
> All such edges have hits == 1. The predicate is set properly.
> 
> I've also tried polyhedron, where the predicate behaves as follows:
> HEURISTICS                           BRANCHES  (REL)  HITRATE                COVERAGE COVERAGE  (REL)
> fail alloc                                572   6.0%  66.08% / 100.00%           572   572.00   0.0%
> 
> The only reason why it fails is array allocation, where the function is
> called for an already allocated array.
> Thus the predicate should also be fine.
> 
> It's hard to guess how to properly set the predictor. The name is a bit
> misleading as it's not tightly connected to a memory allocation failure.
> As I don't have any real-world fortran code base, it's quite hard to gather
> representative numbers. I would alter the number to 70-80%.

Either that or we can drop the predictor. It was added by me and I obviously
got confused here.  If it is common for the function to fail because things are
already allocated, I think we could just leave it unpredicted and do whatever
the generic code does here.

Thanks for investigating this!
Honza
> 
> Thoughts?
> 
> Martin


Re: Update probabilities in predict.def to match reality

2016-06-08 Thread Martin Liška
On 06/08/2016 12:21 PM, Andreas Schwab wrote:
> Jan Hubicka  writes:
> 
>> Bootstrapped/regtested x86_64-linux, will commit it later today.
> 
> FAIL: gcc.dg/tree-ssa/slsr-8.c scan-tree-dump-times optimized " w?* " 7
> 
> Andreas.
> 

Hi.

It's caused by different probabilities for BB 2:

@@ -11,11 +11,11 @@
 ;; 3 succs { 4 }
 ;; 4 succs { 1 }
 Predictions for bb 2
-  DS theory heuristics: 78.4%
-  first match heuristics (ignored): 85.0%
-  combined heuristics: 78.4%
-  pointer (on trees) heuristics: 85.0%
-  early return (on trees) heuristics: 39.0%
+  DS theory heuristics: 66.5%
+  first match heuristics (ignored): 70.0%
+  combined heuristics: 66.5%
+  pointer (on trees) heuristics: 70.0%
+  early return (on trees) heuristics: 46.0%

Which leads to a different decision made by tree-ssa-sink:

+++ /tmp/sl-new/slsr-8.c.127t.sink  2016-06-08 14:07:59.747958332 +0200
@@ -21,6 +21,16 @@
  from bb 2 to bb 3
 Sinking a3_17 = s_11(D) * 6;
  from bb 2 to bb 3
+Sinking x2_16 = c_13(D) + _6;
+ from bb 2 to bb 5
+Sinking _6 = -_5;
+ from bb 2 to bb 5
+Sinking _5 = _4 * 4;
+ from bb 2 to bb 5
+Sinking _4 = (long unsigned int) a2_15;
+ from bb 2 to bb 5
+Sinking a2_15 = s_11(D) * 4;
+ from bb 2 to bb 5
 f (int s, int * c)
 {
   int * x3;
@@ -46,17 +56,17 @@
   _2 = _1 * 4;
   _3 = -_2;
   x1_14 = c_13(D) + _3;
-  a2_15 = s_11(D) * 4;
-  _4 = (long unsigned int) a2_15;
-  _5 = _4 * 4;
-  _6 = -_5;
-  x2_16 = c_13(D) + _6;
   if (x1_14 != 0B)
 goto ;
   else
 goto ;
 
   :
+  a2_15 = s_11(D) * 4;
+  _4 = (long unsigned int) a2_15;
+  _5 = _4 * 4;
+  _6 = -_5;
+  x2_16 = c_13(D) + _6;
   goto ;
 
   :

That eventually leads to 9 occurrences of the scanned pattern. However, I'm not 
sure if the test-case makes
sense any longer?

Thanks,
Martin

;; Function f (f, funcdef_no=0, decl_uid=1747, cgraph_uid=0, symbol_order=0)

;; 1 loops found
;;
;; Loop 0
;;  header 0, latch 1
;;  depth 0, outer -1
;;  nodes: 0 1 2 5 3 4
;; 2 succs { 5 3 }
;; 5 succs { 4 }
;; 3 succs { 4 }
;; 4 succs { 1 }
Sinking x3_18 = c_13(D) + _9;
 from bb 2 to bb 3
Sinking _9 = -_8;
 from bb 2 to bb 3
Sinking _8 = _7 * 4;
 from bb 2 to bb 3
Sinking _7 = (long unsigned int) a3_17;
 from bb 2 to bb 3
Sinking a3_17 = s_11(D) * 6;
 from bb 2 to bb 3
Sinking x2_16 = c_13(D) + _6;
 from bb 2 to bb 5
Sinking _6 = -_5;
 from bb 2 to bb 5
Sinking _5 = _4 * 4;
 from bb 2 to bb 5
Sinking _4 = (long unsigned int) a2_15;
 from bb 2 to bb 5
Sinking a2_15 = s_11(D) * 4;
 from bb 2 to bb 5
f (int s, int * c)
{
  int * x3;
  int * x2;
  int * x1;
  int a3;
  int a2;
  int a1;
  long unsigned int _1;
  long unsigned int _2;
  sizetype _3;
  long unsigned int _4;
  long unsigned int _5;
  sizetype _6;
  long unsigned int _7;
  long unsigned int _8;
  sizetype _9;
  int * iftmp.0_10;

  :
  a1_12 = s_11(D) * 2;
  _1 = (long unsigned int) a1_12;
  _2 = _1 * 4;
  _3 = -_2;
  x1_14 = c_13(D) + _3;
  if (x1_14 != 0B)
goto ;
  else
goto ;

  :
  a2_15 = s_11(D) * 4;
  _4 = (long unsigned int) a2_15;
  _5 = _4 * 4;
  _6 = -_5;
  x2_16 = c_13(D) + _6;
  goto ;

  :
  a3_17 = s_11(D) * 6;
  _7 = (long unsigned int) a3_17;
  _8 = _7 * 4;
  _9 = -_8;
  x3_18 = c_13(D) + _9;

  :
  # iftmp.0_10 = PHI 
  return iftmp.0_10;

}




Re: [PATCH 1/2][v3] Drop excess size used for run time allocated stack variables.

2016-06-08 Thread Eric Botcazou
> There's some discussion about how STACK_POINT_OFFSET shouldn't cause us
> to have to align, and postponing that optimization to gcc-4.7. Since
> STACK_POINTER_OFFSET should be constant, it ought to be easy enough to
> take it into account.

See the "Minor cleanup to allocate_dynamic_stack_space" subthread, I think 
that the real issue is STACK_DYNAMIC_OFFSET.

-- 
Eric Botcazou


Re: [PATCH 2/2] Add edge predictions pruning

2016-06-08 Thread Martin Liška
Hi.

I'm sending the second version of the patch, where I fixed the dump_prediction
function to dump the proper edge.

Thanks,
Martin
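(For reference, the analyze_brprob.py change in the patch below has to keep
matching the old dump lines while also accepting the new per-edge form; a
quick standalone check of that regex - the two sample dump lines here are made
up for illustration, following the exec/hit tail the script already parses:)

```python
import re

# Same pattern as in the patch, written as a raw string.
r = re.compile(r'  (.*) heuristics( of edge [0-9]*->[0-9]*)?( \(.*\))?: '
               r'(.*)%.*exec ([0-9]*) hit ([0-9]*)')

old_style = '  loop exit heuristics: 91.0% exec 1000 hit 910'
new_style = ('  loop exit heuristics of edge 3->5 (single edge duplicate): '
             '91.0% exec 1000 hit 910')

m_old = r.match(old_style)  # old format: groups 2 and 3 stay None
m_new = r.match(new_style)  # new format: edge and reason are captured
```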
>From 0d82e8def140636fe186888a525fe84e329d676b Mon Sep 17 00:00:00 2001
From: marxin 
Date: Tue, 31 May 2016 17:29:53 +0200
Subject: [PATCH 2/4] Add edge predictions pruning

contrib/ChangeLog:

2016-06-01  Martin Liska  

	* analyze_brprob.py: Cover new dump output format.

gcc/ChangeLog:

2016-06-01  Martin Liska  

	* predict.c (dump_prediction): Add new argument.
	(enum predictor_reason): New enum.
	(struct predictor_hash): New struct.
	(predictor_hash::hash): New function.
	(predictor_hash::equal): Likewise.
	(not_removed_prediction_p): New function.
	(prune_predictions_for_bb): Likewise.
	(combine_predictions_for_bb): Prune predictions.
---
 contrib/analyze_brprob.py |  10 +--
 gcc/predict.c | 180 --
 2 files changed, 165 insertions(+), 25 deletions(-)

diff --git a/contrib/analyze_brprob.py b/contrib/analyze_brprob.py
index 36371ff..9416eed 100755
--- a/contrib/analyze_brprob.py
+++ b/contrib/analyze_brprob.py
@@ -122,14 +122,14 @@ if len(sys.argv) != 2:
 exit(1)
 
 profile = Profile(sys.argv[1])
-r = re.compile('  (.*) heuristics: (.*)%.*exec ([0-9]*) hit ([0-9]*)')
+r = re.compile('  (.*) heuristics( of edge [0-9]*->[0-9]*)?( \\(.*\\))?: (.*)%.*exec ([0-9]*) hit ([0-9]*)')
 for l in open(profile.filename).readlines():
 m = r.match(l)
-if m != None:
+if m != None and m.group(3) == None:
 name = m.group(1)
-prediction = float(m.group(2))
-count = int(m.group(3))
-hits = int(m.group(4))
+prediction = float(m.group(4))
+count = int(m.group(5))
+hits = int(m.group(6))
 
 profile.add(name, prediction, count, hits)
 
diff --git a/gcc/predict.c b/gcc/predict.c
index e058793..f2ecc4a 100644
--- a/gcc/predict.c
+++ b/gcc/predict.c
@@ -55,13 +55,25 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa-loop.h"
 #include "tree-scalar-evolution.h"
 
+enum predictor_reason
+{
+  NONE,
+  IGNORED,
+  SINGLE_EDGE_DUPLICATE,
+  EDGE_PAIR_DUPLICATE
+};
+
+static const char *reason_messages[] = {"", " (ignored)",
+" (single edge duplicate)", " (edge pair duplicate)"};
+
 /* real constants: 0, 1, 1-1/REG_BR_PROB_BASE, REG_BR_PROB_BASE,
 		   1/REG_BR_PROB_BASE, 0.5, BB_FREQ_MAX.  */
 static sreal real_almost_one, real_br_prob_base,
 	 real_inv_br_prob_base, real_one_half, real_bb_freq_max;
 
 static void combine_predictions_for_insn (rtx_insn *, basic_block);
-static void dump_prediction (FILE *, enum br_predictor, int, basic_block, int);
+static void dump_prediction (FILE *, enum br_predictor, int, basic_block,
+			 enum predictor_reason, edge);
 static void predict_paths_leading_to (basic_block, enum br_predictor, enum prediction);
 static void predict_paths_leading_to_edge (edge, enum br_predictor, enum prediction);
 static bool can_predict_insn_p (const rtx_insn *);
@@ -723,21 +735,31 @@ invert_br_probabilities (rtx insn)
 
 static void
 dump_prediction (FILE *file, enum br_predictor predictor, int probability,
-		 basic_block bb, int used)
+		 basic_block bb, enum predictor_reason reason = NONE,
+		 edge ep_edge = NULL)
 {
-  edge e;
+  edge e = ep_edge;
   edge_iterator ei;
 
   if (!file)
 return;
 
-  FOR_EACH_EDGE (e, ei, bb->succs)
-if (! (e->flags & EDGE_FALLTHRU))
-  break;
+  if (e == NULL)
+FOR_EACH_EDGE (e, ei, bb->succs)
+  if (! (e->flags & EDGE_FALLTHRU))
+	break;
 
-  fprintf (file, "  %s heuristics%s: %.1f%%",
+  char edge_info_str[128];
+  if (ep_edge)
+sprintf (edge_info_str, " of edge %d->%d", ep_edge->src->index,
+	 ep_edge->dest->index);
+  else
+edge_info_str[0] = '\0';
+
+  fprintf (file, "  %s heuristics%s%s: %.1f%%",
 	   predictor_info[predictor].name,
-	   used ? "" : " (ignored)", probability * 100.0 / REG_BR_PROB_BASE);
+	   edge_info_str, reason_messages[reason],
+	   probability * 100.0 / REG_BR_PROB_BASE);
 
   if (bb->count)
 {
@@ -834,18 +856,18 @@ combine_predictions_for_insn (rtx_insn *insn, basic_block bb)
 
   if (!found)
 dump_prediction (dump_file, PRED_NO_PREDICTION,
-		 combined_probability, bb, true);
+		 combined_probability, bb);
   else
 {
   dump_prediction (dump_file, PRED_DS_THEORY, combined_probability,
-		   bb, !first_match);
+		   bb, !first_match ? NONE : IGNORED);
   dump_prediction (dump_file, PRED_FIRST_MATCH, best_probability,
-		   bb, first_match);
+		   bb, first_match ? NONE: IGNORED);
 }
 
   if (first_match)
 combined_probability = best_probability;
-  dump_prediction (dump_file, PRED_COMBINED, combined_probability, bb, true);
+  dump_prediction (dump_file, PRED_COMBINED, combined_probability, bb);
 
   while (*pnote)
 {
@@ -856,7 +878,8 @@ combine_predictions_for_insn (rtx_insn *insn, basic_block bb)
 	  int probability = INTVAL (XEXP (XEXP (*pnote, 0), 1));
 
 	  dump_predict

[PATCH 3/N] Add sorting support to analyze_brprob script

2016-06-08 Thread Martin Liška
Hello.

This is a small follow-up, where I would like to add a new argument to the
analyze_brprob.py script. With the patch, one can sort predictors by e.g.
hitrate:


Example CPU2006
HEURISTICS                           BRANCHES  (REL)  HITRATE                COVERAGE COVERAGE  (REL)
loop iv compare                            33   0.1%  20.27% /  86.24%      30630826   30.63M   0.0%
no prediction                           10406  19.5%  33.41% /  84.76%  139755242456  139.76G  14.1%
early return (on trees)                  6328  11.9%  54.20% /  86.48%   33569991740   33.57G   3.4%
guessed loop iterations                   112   0.2%  62.06% /  64.49%     958458522  958.46M   0.1%
fail alloc                                595   1.1%  62.18% / 100.00%           595   595.00   0.0%
opcode values positive (on trees)        4266   8.0%  64.30% /  91.28%   16931889792   16.93G   1.7%
opcode values nonequal (on trees)        6600  12.4%  66.23% /  80.60%   71483051282   71.48G   7.2%
continue                                  507   0.9%  66.66% /  82.85%   10086808016   10.09G   1.0%
call                                    11351  21.3%  67.16% /  92.24%   34680666103   34.68G   3.5%
loop iterations                          2689   5.0%  67.99% /  67.99%  408309517405  408.31G  41.3%
DS theory                               26385  49.4%  68.62% /  85.44%  146974369890  146.97G  14.9%
const return                              271   0.5%  69.39% /  87.09%     301566712  301.57M   0.0%
pointer (on trees)                       6230  11.7%  69.59% /  87.18%   16667735314   16.67G   1.7%
combined                                53398 100.0%  70.31% /  80.36%  989164856862  989.16G 100.0%
goto                                       78   0.1%  70.36% /  96.96%     951041538  951.04M   0.1%
first match                             16607  31.1%  78.00% /  78.42%  702435244516  702.44G  71.0%
extra loop exit                           141   0.3%  82.80% /  88.17%    1696946942    1.70G   0.2%
null return                               393   0.7%  91.47% /  93.08%    3268678197    3.27G   0.3%
loop exit                                9909  18.6%  91.80% /  92.81%  282927773783  282.93G  28.6%
guess loop iv compare                     178   0.3%  97.81% /  97.85%    4375086453    4.38G   0.4%
negative return                           277   0.5%  97.94% /  99.23%    1062119028    1.06G   0.1%
noreturn call                            2372   4.4% 100.00% / 100.00%    8356562323    8.36G   0.8%
overflow                                 1282   2.4% 100.00% / 100.00%     175074177  175.07M   0.0%
zero-sized array                          677   1.3% 100.00% / 100.00%     112723803  112.72M   0.0%
unconditional jump                        103   0.2% 100.00% / 100.00%        491001  491.00K   0.0%
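(The mechanics of --sorting boil down to swapping the key function passed to
sorted(); a minimal standalone sketch, using a few rows taken from the table
above:)

```python
# A few rows from the CPU2006 table above, keyed by predictor name.
data = {
    'call':      {'branches': 11351, 'hitrate': 67.16,  'coverage': 34680666103},
    'loop exit': {'branches': 9909,  'hitrate': 91.80,  'coverage': 282927773783},
    'overflow':  {'branches': 1282,  'hitrate': 100.00, 'coverage': 175074177},
}

def dump_order(data, sorting='branches'):
    """Return predictor names in the order the report would print them."""
    return [name for name, _ in
            sorted(data.items(), key=lambda item: item[1][sorting])]
```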

Martin
>From fc40cf57a5b822558d32182b9937ba6dafd62377 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Thu, 2 Jun 2016 13:15:08 +0200
Subject: [PATCH 3/4] Add sorting support to analyze_brprob script

contrib/ChangeLog:

2016-06-08  Martin Liska  

	* analyze_brprob.py: Add new argument --sorting.
---
 contrib/analyze_brprob.py | 26 +++---
 1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/contrib/analyze_brprob.py b/contrib/analyze_brprob.py
index 9416eed..9808c46 100755
--- a/contrib/analyze_brprob.py
+++ b/contrib/analyze_brprob.py
@@ -65,6 +65,7 @@
 import sys
 import os
 import re
+import argparse
 
 def percentage(a, b):
 return 100.0 * a / b
@@ -77,6 +78,9 @@ class Summary:
 self.hits = 0
 self.fits = 0
 
+def get_hitrate(self):
+return self.hits / self.count
+
 def count_formatted(self):
 v = self.count
 for unit in ['','K','M','G','T','P','E','Z']:
@@ -108,22 +112,30 @@ class Profile:
 def count_max(self):
 return max([v.count for k, v in self.heuristics.items()])
 
-def dump(self):
+def dump(self, sorting):
+sorter = lambda x: x[1].branches
+if sorting == 'hitrate':
+sorter = lambda x: x[1].get_hitrate()
+elif sorting == 'coverage':
+sorter = lambda x: x[1].count
+
 print('%-36s %8s %6s  %-16s %14s %8s %6s' % ('HEURISTICS', 'BRANCHES', '(REL)',
   'HITRATE', 'COVERAGE', 'COVERAGE', '(REL)'))
-for (k, v) in sorted(self.heuristics.items(), key = lambda x: x[1].branches):
+for (k, v) in sorted(self.heuristics.items(), key = sorter):
 print('%-36s %8i %5.1f%% %6.2f%% / %6.2f%% %14i %8s %5.1f%%' %
 (k, v.branches, percentage(v.branches, self.branches_max ()),
  percentage(v.hits, v.count), percentage(v.fits, v.count),
  v.count, v.count_formatted(), percentage(v.count, self.count_max()) ))
 
-if len(sys.argv) != 2:
-print('Usage: ./analyze_brprob.py dump_file')
-exit(1)
+parser = argparse.ArgumentParser()
+parser.add_argument('dump_file', metavar = 'dump_file

[PATCH 4/N] Add new analyze_brprob_spec.py script

2016-06-08 Thread Martin Liška
Hi.

The second follow-up patch adds a new script which is a simple wrapper around
analyze_brprob.py and can be used to dump statistics for results that are in
different folders (like SPEC benchmarks).

Sample:
./contrib/analyze_brprob_spec.py --sorting=hitrate 
/home/marxin/Programming/cpu2006/benchspec/CPU2006/

Sample output:
401.bzip2
HEURISTICS                           BRANCHES  (REL)  HITRATE                COVERAGE COVERAGE  (REL)
no prediction                             107  14.0%  19.45% /  84.37%    2768134733    2.77G  10.8%
opcode values nonequal (on trees)          76  10.0%  32.24% /  85.06%    4034681344    4.03G  15.8%
call                                       95  12.5%  45.50% /  93.31%     152224913  152.22M   0.6%
DS theory                                 275  36.1%  45.56% /  84.30%    7308863904    7.31G  28.6%
continue                                   14   1.8%  48.44% /  73.14%    1479774996    1.48G   5.8%
guessed loop iterations                    12   1.6%  68.30% /  71.61%     269705737  269.71M   1.1%
combined                                  762 100.0%  69.52% /  89.32%   25553311262   25.55G 100.0%
goto                                       40   5.2%  72.41% /  98.80%     882062676  882.06M   3.5%
opcode values positive (on trees)          40   5.2%  76.74% /  88.09%    1394104926    1.39G   5.5%
pointer (on trees)                         61   8.0%  83.79% / 100.00%        931107  931.11K   0.0%
early return (on trees)                    31   4.1%  84.39% /  84.41%    2548058402    2.55G  10.0%
first match                               380  49.9%  89.79% /  92.57%   15476312625   15.48G  60.6%
loop exit                                 316  41.5%  90.09% /  92.88%   15065219828   15.07G  59.0%
guess loop iv compare                       2   0.3%  99.61% /  99.61%      26987995   26.99M   0.1%
loop iv compare                             1   0.1%  99.61% /  99.61%        105411  105.41K   0.0%
loop iterations                            38   5.0%  99.64% /  99.64%     140236649  140.24M   0.5%
null return                                 2   0.3% 100.00% / 100.00%            18    18.00   0.0%
noreturn call                              13   1.7% 100.00% / 100.00%       1045000    1.04M   0.0%
const return                                2   0.3% 100.00% / 100.00%           816   816.00   0.0%
negative return                            62   8.1% 100.00% / 100.00%     618097152  618.10M   2.4%

410.bwaves
HEURISTICS                           BRANCHES  (REL)  HITRATE                COVERAGE COVERAGE  (REL)
call                                        1   0.6%   0.00% / 100.00%            20    20.00   0.0%
no prediction                               6   3.7%   0.28% /  99.72%       2704184    2.70M   0.1%
opcode values nonequal (on trees)           4   2.4%  60.00% /  70.00%           200   200.00   0.0%
loop iterations                             7   4.3%  80.00% /  80.00%     112892000  112.89M   2.4%
first match                                83  50.6%  81.67% /  81.67%    4393885465    4.39G  92.1%
loop exit                                  76  46.3%  81.71% /  81.71%    4280993465    4.28G  89.8%
combined                                  164 100.0%  83.05% /  83.11%    4768545507    4.77G 100.0%
DS theory                                  75  45.7% 100.00% / 100.00%     371955858  371.96M   7.8%
early return (on trees)                     3   1.8% 100.00% / 100.00%           688   688.00   0.0%
opcode values positive (on trees)          71  43.3% 100.00% / 100.00%     371955658  371.96M   7.8%

...

Thanks,
Martin

>From ca9806bf77bd90df43913f5f1552ed16379dcf38 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Fri, 3 Jun 2016 12:46:43 +0200
Subject: [PATCH 4/4] Add new analyze_brprob_spec.py script

contrib/ChangeLog:

2016-06-08  Martin Liska  

	* analyze_brprob_spec.py: New file.
---
 contrib/analyze_brprob_spec.py | 58 ++
 1 file changed, 58 insertions(+)
 create mode 100755 contrib/analyze_brprob_spec.py

diff --git a/contrib/analyze_brprob_spec.py b/contrib/analyze_brprob_spec.py
new file mode 100755
index 000..a28eaac
--- /dev/null
+++ b/contrib/analyze_brprob_spec.py
@@ -0,0 +1,58 @@
+#!/usr/bin/env python3
+
+# This file is part of GCC.
+#
+# GCC is free software; you can redistribute it and/or modify it under
+# the terms of the GNU General Public License as published by the Free
+# Software Foundation; either version 3, or (at your option) any later
+# version.
+#
+# GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+# WARRANTY; without even the implied warranty of MERCHANTABILITY or
+# FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+# for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# .  */
+
+import sys

Re: [PATCH, RFC] First cut at using vec_construct for strided loads

2016-06-08 Thread Richard Biener
On Wed, Jun 13, 2012 at 4:18 AM, William J. Schmidt
 wrote:
> This patch is a follow-up to the discussion generated by
> http://gcc.gnu.org/ml/gcc-patches/2012-06/msg00546.html.  I've added
> vec_construct to the cost model for use in vect_model_load_cost, and
> implemented a cost calculation that makes sense to me for PowerPC.  I'm
> less certain about the default, i386, and spu implementations.  I took a
> guess at i386 from the discussions we had, and used the same calculation
> for the default and for spu.  I'm hoping you or others can fill in the
> blanks if I guessed badly.
>
> The i386 cost for vec_construct is different from all the others, which
> are parameterized for each processor description.  This should probably
> be parameterized in some way as well, but thought you'd know better than
> I how that should be.  Perhaps instead of
>
> elements / 2 + 1
>
> it should be
>
> (elements / 2) * X + Y
>
> where X and Y are taken from the processor description, and represent
> the cost of a merge and a permute, respectively.  Let me know what you
> think.

Just trying to understand how you arrived at the above formulas while
investigating the strangely low cost of 9 for v16qi construction.  If we
pairwise reduce elements with a cost of 1 then we arrive at a cost of
elements - 1; that's also what you'd get by not accounting for an initial move
of element zero into a vector and then inserting each remaining element into
that with elements - 1 inserts.
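(To make the comparison concrete, here is a sketch of the old default formula
versus the insert-based count from the patch below; for v16qi, i.e. 16
elements, the old default gives the suspicious 9 while the new one gives 15:)

```python
def old_vec_construct_cost(elements):
    # Previous default: roughly one op per element pair, plus one.
    return elements // 2 + 1

def new_vec_construct_cost(elements):
    # Proposed: elements - 1 inserts; the initial move of element
    # zero into the vector is not counted.
    return elements - 1
```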

This also matches up with code-generation on x86_64 for

vT foo (T a, T b, ...)
{
  return (vT) {a, b, ... };
}

for any vector / element type combination I tried.  Thus the patch below.

I'll bootstrap / test that on x86_64-linux and I'm leaving other
targets to target
maintainers.

Ok for the i386 parts?

Thanks,
Richard.

2016-06-08  Richard Biener  

* targhooks.c (default_builtin_vectorization_cost): Adjust
vec_construct cost.
* config/i386/i386.c (ix86_builtin_vectorization_cost): Likewise.

Index: gcc/targhooks.c
===
--- gcc/targhooks.c (revision 237196)
+++ gcc/targhooks.c (working copy)
@@ -589,8 +589,7 @@ default_builtin_vectorization_cost (enum
 return 3;

   case vec_construct:
-   elements = TYPE_VECTOR_SUBPARTS (vectype);
-   return elements / 2 + 1;
+   return TYPE_VECTOR_SUBPARTS (vectype) - 1;

   default:
 gcc_unreachable ();
Index: gcc/config/i386/i386.c
===
--- gcc/config/i386/i386.c  (revision 237196)
+++ gcc/config/i386/i386.c  (working copy)
@@ -49503,8 +49520,6 @@ static int
 ix86_builtin_vectorization_cost (enum vect_cost_for_stmt type_of_cost,
  tree vectype, int)
 {
-  unsigned elements;
-
   switch (type_of_cost)
 {
   case scalar_stmt:
@@ -49546,8 +49561,7 @@ ix86_builtin_vectorization_cost (enum ve
 return ix86_cost->vec_stmt_cost;

   case vec_construct:
-   elements = TYPE_VECTOR_SUBPARTS (vectype);
-   return ix86_cost->vec_stmt_cost * (elements / 2 + 1);
+   return ix86_cost->vec_stmt_cost * (TYPE_VECTOR_SUBPARTS (vectype) - 1);

   default:
 gcc_unreachable ();


> Thanks,
> Bill
>
>
> 2012-06-12  Bill Schmidt  
>
> * targhooks.c (default_builtin_vectorized_conversion): Handle
> vec_construct, using vectype to base cost on subparts.
> * target.h (enum vect_cost_for_stmt): Add vec_construct.
> * tree-vect-stmts.c (vect_model_load_cost): Use vec_construct
> instead of scalar_to-vec.
> * config/spu/spu.c (spu_builtin_vectorization_cost): Handle
> vec_construct in same way as default for now.
> * config/i386/i386.c (ix86_builtin_vectorization_cost): Likewise.
> * config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost):
> Handle vec_construct, including special case for 32-bit loads.
>
>
> Index: gcc/targhooks.c
> ===
> --- gcc/targhooks.c (revision 188482)
> +++ gcc/targhooks.c (working copy)
> @@ -499,9 +499,11 @@ default_builtin_vectorized_conversion (unsigned in
>
>  int
>  default_builtin_vectorization_cost (enum vect_cost_for_stmt type_of_cost,
> -tree vectype ATTRIBUTE_UNUSED,
> +tree vectype,
>  int misalign ATTRIBUTE_UNUSED)
>  {
> +  unsigned elements;
> +
>switch (type_of_cost)
>  {
>case scalar_stmt:
> @@ -524,6 +526,11 @@ default_builtin_vectorization_cost (enum vect_cost
>case cond_branch_taken:
>  return 3;
>
> +  case vec_construct:
> +   elements = TYPE_VECTOR_SUBPARTS (vectype);
> +   gcc_assert (elements > 1);
> +   return elements / 2 + 1;
> +
>default:
>  gcc_unreachable ();
>  }
> Index: gcc/target.h
> 

Re: Update probabilities in predict.def to match reality

2016-06-08 Thread Jan Hubicka
> On 06/08/2016 12:21 PM, Andreas Schwab wrote:
> > Jan Hubicka  writes:
> > 
> >> Bootstrapped/regtested x86_64-linux, will commit it later today.
> > 
> > FAIL: gcc.dg/tree-ssa/slsr-8.c scan-tree-dump-times optimized " w?* " 7
> > 
> > Andreas.
> > 
> 
> Hi.
> 
> It's caused by different probabilities for BB 2:
> 
> @@ -11,11 +11,11 @@
>  ;; 3 succs { 4 }
>  ;; 4 succs { 1 }
>  Predictions for bb 2
> -  DS theory heuristics: 78.4%
> -  first match heuristics (ignored): 85.0%
> -  combined heuristics: 78.4%
> -  pointer (on trees) heuristics: 85.0%
> -  early return (on trees) heuristics: 39.0%
> +  DS theory heuristics: 66.5%
> +  first match heuristics (ignored): 70.0%
> +  combined heuristics: 66.5%
> +  pointer (on trees) heuristics: 70.0%
> +  early return (on trees) heuristics: 46.0%

I see this is because sinking is done when PARAM_SINK_FREQUENCY_THRESHOLD
is met, and that is 75%, which seems quite ambitious for guessed profiles
that tend to be flat.  (Also the code should use counts where available.)
For some optimizers we have two thresholds - one for the guessed profile and
one for FDO. Perhaps it would make sense to benchmark how decreasing this
threshold affects performance & code size.

What are the downsides of sinking? Increased register pressure? For non-loop
branches it is a bit iffy to rely on static branch prediction to even give the
right direction of the branch. It happens in about 65% of cases (where a
perfect predictor would do 85%), so we may try to come up with a heuristic
that does not fully rely on the profile.

We could probably fix the testcase by adding --param sink-frequency-threshold=55

Honza
> 
> Which leads to a different decision made by tree-ssa-sink:
> 
> +++ /tmp/sl-new/slsr-8.c.127t.sink  2016-06-08 14:07:59.747958332 +0200
> @@ -21,6 +21,16 @@
>   from bb 2 to bb 3
>  Sinking a3_17 = s_11(D) * 6;
>   from bb 2 to bb 3
> +Sinking x2_16 = c_13(D) + _6;
> + from bb 2 to bb 5
> +Sinking _6 = -_5;
> + from bb 2 to bb 5
> +Sinking _5 = _4 * 4;
> + from bb 2 to bb 5
> +Sinking _4 = (long unsigned int) a2_15;
> + from bb 2 to bb 5
> +Sinking a2_15 = s_11(D) * 4;
> + from bb 2 to bb 5
>  f (int s, int * c)
>  {
>int * x3;
> @@ -46,17 +56,17 @@
>_2 = _1 * 4;
>_3 = -_2;
>x1_14 = c_13(D) + _3;
> -  a2_15 = s_11(D) * 4;
> -  _4 = (long unsigned int) a2_15;
> -  _5 = _4 * 4;
> -  _6 = -_5;
> -  x2_16 = c_13(D) + _6;
>if (x1_14 != 0B)
>  goto ;
>else
>  goto ;
>  
>:
> +  a2_15 = s_11(D) * 4;
> +  _4 = (long unsigned int) a2_15;
> +  _5 = _4 * 4;
> +  _6 = -_5;
> +  x2_16 = c_13(D) + _6;
>goto ;
>  
>:
> 
> That eventually leads to 9 occurrences of the scanned pattern. However, I'm 
> not sure if the test-case makes
> sense any longer?
> 
> Thanks,
> Martin

> 
> ;; Function f (f, funcdef_no=0, decl_uid=1747, cgraph_uid=0, symbol_order=0)
> 
> ;; 1 loops found
> ;;
> ;; Loop 0
> ;;  header 0, latch 1
> ;;  depth 0, outer -1
> ;;  nodes: 0 1 2 5 3 4
> ;; 2 succs { 5 3 }
> ;; 5 succs { 4 }
> ;; 3 succs { 4 }
> ;; 4 succs { 1 }
> Sinking x3_18 = c_13(D) + _9;
>  from bb 2 to bb 3
> Sinking _9 = -_8;
>  from bb 2 to bb 3
> Sinking _8 = _7 * 4;
>  from bb 2 to bb 3
> Sinking _7 = (long unsigned int) a3_17;
>  from bb 2 to bb 3
> Sinking a3_17 = s_11(D) * 6;
>  from bb 2 to bb 3
> Sinking x2_16 = c_13(D) + _6;
>  from bb 2 to bb 5
> Sinking _6 = -_5;
>  from bb 2 to bb 5
> Sinking _5 = _4 * 4;
>  from bb 2 to bb 5
> Sinking _4 = (long unsigned int) a2_15;
>  from bb 2 to bb 5
> Sinking a2_15 = s_11(D) * 4;
>  from bb 2 to bb 5
> f (int s, int * c)
> {
>   int * x3;
>   int * x2;
>   int * x1;
>   int a3;
>   int a2;
>   int a1;
>   long unsigned int _1;
>   long unsigned int _2;
>   sizetype _3;
>   long unsigned int _4;
>   long unsigned int _5;
>   sizetype _6;
>   long unsigned int _7;
>   long unsigned int _8;
>   sizetype _9;
>   int * iftmp.0_10;
> 
>   :
>   a1_12 = s_11(D) * 2;
>   _1 = (long unsigned int) a1_12;
>   _2 = _1 * 4;
>   _3 = -_2;
>   x1_14 = c_13(D) + _3;
>   if (x1_14 != 0B)
> goto ;
>   else
> goto ;
> 
>   :
>   a2_15 = s_11(D) * 4;
>   _4 = (long unsigned int) a2_15;
>   _5 = _4 * 4;
>   _6 = -_5;
>   x2_16 = c_13(D) + _6;
>   goto ;
> 
>   :
>   a3_17 = s_11(D) * 6;
>   _7 = (long unsigned int) a3_17;
>   _8 = _7 * 4;
>   _9 = -_8;
>   x3_18 = c_13(D) + _9;
> 
>   :
>   # iftmp.0_10 = PHI 
>   return iftmp.0_10;
> 
> }
> 
> 



[C++ Patch/RFC] Tiny tsubst tweak

2016-06-08 Thread Paolo Carlini

Hi,

while looking a bit into c++/71169 I noticed that in one place we first
call tsubst_aggr_type and tsubst_copy and then check whether either
returned error_mark_node. Performance-wise, it would be better to check
the return value of the former as soon as possible and, in that case, return
immediately without calling the latter. That's assuming I'm not missing
something of course... but the below passes testing.
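(The shape of the change, as a language-agnostic sketch with hypothetical
stand-ins for tsubst_aggr_type and tsubst_copy - the point is simply that the
second substitution is skipped once the first one fails:)

```python
calls = {"copy": 0}

def substitute_context(fail):          # stands in for tsubst_aggr_type
    return None if fail else "ctx"     # None plays the role of error_mark_node

def substitute_name():                 # stands in for tsubst_copy
    calls["copy"] += 1
    return "name"

def old_shape(fail):
    ctx = substitute_context(fail)
    f = substitute_name()              # always runs, even on error
    if ctx is None or f is None:
        return None
    return (ctx, f)

def new_shape(fail):
    ctx = substitute_context(fail)
    if ctx is None:                    # bail out before the second call
        return None
    f = substitute_name()
    return None if f is None else (ctx, f)
```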


Thanks,
Paolo.

//
Index: pt.c
===
--- pt.c(revision 237196)
+++ pt.c(working copy)
@@ -13430,10 +13430,12 @@ tsubst (tree t, tree args, tsubst_flags_t complain
   {
tree ctx = tsubst_aggr_type (TYPE_CONTEXT (t), args, complain,
 in_decl, /*entering_scope=*/1);
+   if (ctx == error_mark_node)
+ return error_mark_node;
+
tree f = tsubst_copy (TYPENAME_TYPE_FULLNAME (t), args,
  complain, in_decl);
-
-   if (ctx == error_mark_node || f == error_mark_node)
+   if (f == error_mark_node)
  return error_mark_node;
 
if (!MAYBE_CLASS_TYPE_P (ctx))


Re: [C++ PATCH] Fix -Wunused-* regression (PR c++/71442)

2016-06-08 Thread Jakub Jelinek
On Tue, Jun 07, 2016 at 05:23:34PM +0200, Jakub Jelinek wrote:
> Marek has recently added code to set TREE_USED bits on the elements of
> TREE_VEC referenced in SIZEOF_EXPR.
> But, as the testcase shows, it can be used on various parameter/argument
> packs, some of them have types as elements, others decls.
> And IMHO we want to set TREE_USED only on the decls listed in those,
> for types TREE_USED should be a property of the type regardless of whether
> the type is mentioned in sizeof... or not, otherwise we suddenly stop
> diagnosing any unused vars with those types.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/6.2/5.5?
> 
> 2016-06-07  Jakub Jelinek  
> 
>   PR c++/71442
>   * pt.c (tsubst_copy): Only set TREE_USED on DECLs.
> 
>   * g++.dg/cpp0x/Wunused-variable-1.C: New test.

Richi pointed in the PR that I've screwed up the testcase, it doesn't FAIL
with unpatched compiler.

Here is the same patch with slightly adjusted testcase that does fail
with unpatched trunk, 6.1 or 5.4.  Bootstrapped/regtested again on
x86_64-linux and i686-linux, ok for trunk/6.2/5.5?

2016-06-08  Jakub Jelinek  

PR c++/71442
* pt.c (tsubst_copy): Only set TREE_USED on DECLs.

* g++.dg/cpp0x/Wunused-variable-1.C: New test.

--- gcc/cp/pt.c.jj  2016-06-01 14:17:12.0 +0200
+++ gcc/cp/pt.c 2016-06-07 14:29:16.608041125 +0200
@@ -14160,7 +14160,8 @@ tsubst_copy (tree t, tree args, tsubst_f
  len = TREE_VEC_LENGTH (expanded);
  /* Set TREE_USED for the benefit of -Wunused.  */
  for (int i = 0; i < len; i++)
-   TREE_USED (TREE_VEC_ELT (expanded, i)) = true;
+   if (DECL_P (TREE_VEC_ELT (expanded, i)))
+ TREE_USED (TREE_VEC_ELT (expanded, i)) = true;
}
 
  if (expanded == error_mark_node)
--- gcc/testsuite/g++.dg/cpp0x/Wunused-variable-1.C.jj  2016-06-07 
14:31:15.514486508 +0200
+++ gcc/testsuite/g++.dg/cpp0x/Wunused-variable-1.C 2016-06-07 
14:32:13.526730026 +0200
@@ -0,0 +1,37 @@
+// PR c++/71442
+// { dg-do compile { target c++11 } }
+// { dg-options "-Wunused-variable" }
+
+struct C
+{
+  template
+  int operator()(Ts &&...)
+  {
+return sizeof...(Ts);
+  }
+};
+
+int
+foo ()
+{
+  C {} (1, 1L, 1LL, 1.0);
+}
+
+template
+void
+bar ()
+{
+  char a;  // { dg-warning "unused variable" }
+  short b; // { dg-warning "unused variable" }
+  int c;   // { dg-warning "unused variable" }
+  long d;  // { dg-warning "unused variable" }
+  long long e; // { dg-warning "unused variable" }
+  float f; // { dg-warning "unused variable" }
+  double g;// { dg-warning "unused variable" }
+}
+
+void
+baz ()
+{
+  bar <0> ();
+}


Jakub


Re: [PATCH 2/2] Add edge predictions pruning

2016-06-08 Thread Jan Hubicka
> 2016-06-01  Martin Liska  
> 
>   * analyze_brprob.py: Cover new dump output format.
> 
> gcc/ChangeLog:
> 
> 2016-06-01  Martin Liska  
> 
>   * predict.c (dump_prediction): Add new argument.
>   (enum predictor_reason): New enum.
>   (struct predictor_hash): New struct.
>   (predictor_hash::hash): New function.
>   (predictor_hash::equal): Likewise.
>   (not_removed_prediction_p): New function.
>   (prune_predictions_for_bb): Likewise.
>   (combine_predictions_for_bb): Prune predictions.
> ---
>  contrib/analyze_brprob.py |  10 +--
>  gcc/predict.c | 180 
> --
>  2 files changed, 165 insertions(+), 25 deletions(-)
> 
> diff --git a/contrib/analyze_brprob.py b/contrib/analyze_brprob.py
> index 36371ff..9416eed 100755
> --- a/contrib/analyze_brprob.py
> +++ b/contrib/analyze_brprob.py
> @@ -122,14 +122,14 @@ if len(sys.argv) != 2:
>  exit(1)
>  
>  profile = Profile(sys.argv[1])
> -r = re.compile('  (.*) heuristics: (.*)%.*exec ([0-9]*) hit ([0-9]*)')
> +r = re.compile('  (.*) heuristics( of edge [0-9]*->[0-9]*)?( \\(.*\\))?: 
> (.*)%.*exec ([0-9]*) hit ([0-9]*)')
>  for l in open(profile.filename).readlines():
>  m = r.match(l)
> -if m != None:
> +if m != None and m.group(3) == None:
>  name = m.group(1)
> -prediction = float(m.group(2))
> -count = int(m.group(3))
> -hits = int(m.group(4))
> +prediction = float(m.group(4))
> +count = int(m.group(5))
> +hits = int(m.group(6))
>  
>  profile.add(name, prediction, count, hits)
>  
> diff --git a/gcc/predict.c b/gcc/predict.c
> index e058793..f2ecc4a 100644
> --- a/gcc/predict.c
> +++ b/gcc/predict.c
> @@ -55,13 +55,25 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree-ssa-loop.h"
>  #include "tree-scalar-evolution.h"
>  
> +enum predictor_reason
Add comment, please
> +{
> +  NONE,
> +  IGNORED,
> +  SINGLE_EDGE_DUPLICATE,
> +  EDGE_PAIR_DUPLICATE
> +};
> +
> +static const char *reason_messages[] = {"", " (ignored)",
> +" (single edge duplicate)", " (edge pair duplicate)"};

And here too.
> +
>  /* real constants: 0, 1, 1-1/REG_BR_PROB_BASE, REG_BR_PROB_BASE,
>  1/REG_BR_PROB_BASE, 0.5, BB_FREQ_MAX.  */
>  static sreal real_almost_one, real_br_prob_base,
>real_inv_br_prob_base, real_one_half, real_bb_freq_max;
>  
>  static void combine_predictions_for_insn (rtx_insn *, basic_block);
> -static void dump_prediction (FILE *, enum br_predictor, int, basic_block, 
> int);
> +static void dump_prediction (FILE *, enum br_predictor, int, basic_block,
> +  enum predictor_reason, edge);
>  static void predict_paths_leading_to (basic_block, enum br_predictor, enum 
> prediction);
>  static void predict_paths_leading_to_edge (edge, enum br_predictor, enum 
> prediction);
>  static bool can_predict_insn_p (const rtx_insn *);
> @@ -723,21 +735,31 @@ invert_br_probabilities (rtx insn)
>  
>  static void
>  dump_prediction (FILE *file, enum br_predictor predictor, int probability,
> -  basic_block bb, int used)
> +  basic_block bb, enum predictor_reason reason = NONE,
> +  edge ep_edge = NULL)
>  {
> -  edge e;
> +  edge e = ep_edge;
>edge_iterator ei;
>  
>if (!file)
>  return;
>  
> -  FOR_EACH_EDGE (e, ei, bb->succs)
> -if (! (e->flags & EDGE_FALLTHRU))
> -  break;
> +  if (e == NULL)
> +FOR_EACH_EDGE (e, ei, bb->succs)
> +  if (! (e->flags & EDGE_FALLTHRU))
> + break;
>  
> -  fprintf (file, "  %s heuristics%s: %.1f%%",
> +  char edge_info_str[128];
> +  if (ep_edge)
> +sprintf (edge_info_str, " of edge %d->%d", ep_edge->src->index,
> +  ep_edge->dest->index);
> +  else
> +edge_info_str[0] = '\0';
> +
> +  fprintf (file, "  %s heuristics%s%s: %.1f%%",
>  predictor_info[predictor].name,
> -used ? "" : " (ignored)", probability * 100.0 / REG_BR_PROB_BASE);
> +edge_info_str, reason_messages[reason],
> +probability * 100.0 / REG_BR_PROB_BASE);
>  
>if (bb->count)
>  {
> @@ -834,18 +856,18 @@ combine_predictions_for_insn (rtx_insn *insn, 
> basic_block bb)
>  
>if (!found)
>  dump_prediction (dump_file, PRED_NO_PREDICTION,
> -  combined_probability, bb, true);
> +  combined_probability, bb);
>else
>  {
>dump_prediction (dump_file, PRED_DS_THEORY, combined_probability,
> -bb, !first_match);
> +bb, !first_match ? NONE : IGNORED);
>dump_prediction (dump_file, PRED_FIRST_MATCH, best_probability,
> -bb, first_match);
> +bb, first_match ? NONE: IGNORED);
>  }
>  
>if (first_match)
>  combined_probability = best_probability;
> -  dump_prediction (dump_file, PRED_COMBINED, combined_probability, bb, true);
> +  dump_prediction (dump_file, PRED_COMBINED, combined_
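
[The updated dump format that the patched analyze_brprob.py regex has to
cope with can be checked in isolation.  The sketch below is an editor's
illustration, not the contrib script itself: only the regex comes from the
patch, and the sample dump lines are made up.]

```python
import re

# Regex from the patched analyze_brprob.py: group 2 captures an optional
# "of edge N->M" part, group 3 an optional parenthesized reason such as
# " (ignored)".  Predictions carrying a reason are skipped (group 3 != None).
r = re.compile('  (.*) heuristics( of edge [0-9]*->[0-9]*)?( \\(.*\\))?: '
               '(.*)%.*exec ([0-9]*) hit ([0-9]*)')

# Hypothetical dump lines, one per-edge prediction and one pruned prediction.
lines = [
    '  pointer (on trees) heuristics of edge 2->4: 70.0% exec 100 hit 70',
    '  first match heuristics (ignored): 85.0% exec 100 hit 85',
]

kept = []
for l in lines:
    m = r.match(l)
    if m is not None and m.group(3) is None:
        kept.append((m.group(1), float(m.group(4)),
                     int(m.group(5)), int(m.group(6))))
print(kept)  # only the non-pruned prediction survives
```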

[PATCH] Improve "str" + 2 > "str" folding (PR c++/71448)

2016-06-08 Thread Jakub Jelinek
Hi!

For the purposes of fold_comparison, various constants (in this case
STRING_CST) work the same as decls, in particular we know the objects
extents and can determine possible pointer wrapping.
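[The intuition can be sketched outside the compiler: once the base is a known
object of known size, in-bounds offsets cannot wrap, so a relational pointer
comparison reduces to comparing the offsets.  A hedged illustration in plain
Python, not the fold-const.c logic itself:]

```python
import operator

def fold_ptr_compare(base_size, off0, off1, op):
    """Fold `base + off0 <op> base + off1` when both offsets stay within
    the known extent of the base object, so no pointer wrapping is possible.
    Returns the folded boolean, or None if folding would be unsafe."""
    ops = {'<': operator.lt, '<=': operator.le,
           '>': operator.gt, '>=': operator.ge,
           '==': operator.eq, '!=': operator.ne}
    if 0 <= off0 <= base_size and 0 <= off1 <= base_size:
        return ops[op](off0, off1)
    return None  # offsets may leave the object: leave the comparison alone

# "foo" from the testcase is a 4-byte object ("foo" plus its NUL terminator).
print(fold_ptr_compare(4, 2, 0, '>'))   # foo + 2 > foo   -> True
print(fold_ptr_compare(4, 0, 3, '<='))  # foo <= 3 + foo  -> True
print(fold_ptr_compare(4, 9, 0, '>'))   # out of bounds   -> None
```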

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk
(and maybe later for 6.x)?

2016-06-08  Jakub Jelinek  
Richard Biener  

PR c++/71448
* fold-const.c (fold_comparison): Handle CONSTANT_CLASS_P (base0)
the same as DECL_P (base0) for indirect_base0.  Use equality_code
in one further place.

* g++.dg/torture/pr71448.C: New test.

--- gcc/fold-const.c.jj 2016-06-06 19:39:40.0 +0200
+++ gcc/fold-const.c2016-06-08 10:23:15.129178865 +0200
@@ -8527,9 +8527,9 @@ fold_comparison (location_t loc, enum tr
  if ((offset0 == offset1
   || (offset0 && offset1
   && operand_equal_p (offset0, offset1, 0)))
- && (code == EQ_EXPR
- || code == NE_EXPR
- || (indirect_base0 && DECL_P (base0))
+ && (equality_code
+ || (indirect_base0
+ && (DECL_P (base0) || CONSTANT_CLASS_P (base0)))
  || POINTER_TYPE_OVERFLOW_UNDEFINED))
 
{
@@ -8568,7 +8568,8 @@ fold_comparison (location_t loc, enum tr
 6.5.6/8 and /9 with respect to the signed ptrdiff_t.  */
  else if (bitpos0 == bitpos1
   && (equality_code
-  || (indirect_base0 && DECL_P (base0))
+  || (indirect_base0
+  && (DECL_P (base0) || CONSTANT_CLASS_P (base0)))
   || POINTER_TYPE_OVERFLOW_UNDEFINED))
{
  /* By converting to signed sizetype we cover middle-end pointer
--- gcc/testsuite/g++.dg/torture/pr71448.C.jj   2016-06-08 10:32:17.409952602 
+0200
+++ gcc/testsuite/g++.dg/torture/pr71448.C  2016-06-08 10:33:38.396872265 
+0200
@@ -0,0 +1,27 @@
+// PR c++/71448
+// { dg-do compile }
+// { dg-additional-options "-std=c++11" }
+
+static constexpr const char foo[] = "foo";
+static constexpr const char *bar = "bar";
+
+static_assert ((foo + 3 - foo) == 3, "check");
+static_assert (foo + 2 != foo, "check");
+static_assert (foo + 2 >= foo, "check");
+static_assert (3 + foo >= foo, "check");
+static_assert (foo <= foo + 2, "check");
+static_assert (foo <= 3 + foo, "check");
+static_assert (foo + 2 > foo, "check");
+static_assert (3 + foo > foo, "check");
+static_assert (foo < 2 + foo, "check");
+static_assert (foo < foo + 3, "check");
+static_assert ((bar + 3 - bar) == 3, "check");
+static_assert (bar + 2 != bar, "check");
+static_assert (2 + bar >= bar, "check");
+static_assert (bar + 3 >= bar, "check");
+static_assert (bar <= bar + 2, "check");
+static_assert (bar <= 3 + bar, "check");
+static_assert (bar + 2 > bar, "check");
+static_assert (3 + bar > bar, "check");
+static_assert (bar < 2 + bar, "check");
+static_assert (bar < bar + 3, "check");

Jakub


Re: [PATCH 0/9] separate shrink-wrapping

2016-06-08 Thread Eric Botcazou
> Is the usage of the word "concern" here standard for this kind of thing?
> It seems odd somehow but maybe that's just me.

No, I find it quite odd too.

-- 
Eric Botcazou


Re: Update probabilities in predict.def to match reality

2016-06-08 Thread Richard Biener
On Wed, Jun 8, 2016 at 2:35 PM, Jan Hubicka  wrote:
>> On 06/08/2016 12:21 PM, Andreas Schwab wrote:
>> > Jan Hubicka  writes:
>> >
>> >> Bootstrapped/regtested x86_64-linux, will commit it later today.
>> >
>> > FAIL: gcc.dg/tree-ssa/slsr-8.c scan-tree-dump-times optimized " w?* " 7
>> >
>> > Andreas.
>> >
>>
>> Hi.
>>
>> It's caused by different probabilities for BB 2:
>>
>> @@ -11,11 +11,11 @@
>>  ;; 3 succs { 4 }
>>  ;; 4 succs { 1 }
>>  Predictions for bb 2
>> -  DS theory heuristics: 78.4%
>> -  first match heuristics (ignored): 85.0%
>> -  combined heuristics: 78.4%
>> -  pointer (on trees) heuristics: 85.0%
>> -  early return (on trees) heuristics: 39.0%
>> +  DS theory heuristics: 66.5%
>> +  first match heuristics (ignored): 70.0%
>> +  combined heuristics: 66.5%
>> +  pointer (on trees) heuristics: 70.0%
>> +  early return (on trees) heuristics: 46.0%
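
[An aside on the dump above: the "DS theory" percentages are the
Dempster-Shafer combination of the individual heuristics, and both the old
and new values can be reproduced with a few lines of arithmetic.  This is an
illustrative sketch with plain floats, not GCC's REG_BR_PROB_BASE
fixed-point code:]

```python
def ds_combine(probs):
    """Dempster-Shafer-style combination of independent branch predictors."""
    combined = 0.5  # start from an uninformative prior
    for p in probs:
        num = combined * p
        combined = num / (num + (1.0 - combined) * (1.0 - p))
    return combined

# Old predict.def values: pointer 85%, early return 39% -> 78.4% combined.
old = ds_combine([0.85, 0.39])
# New values: pointer 70%, early return 46% -> 66.5% combined.
new = ds_combine([0.70, 0.46])
print(round(old * 100, 1), round(new * 100, 1))  # -> 78.4 66.5
```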
>
> I see this is because sinking is done when PARAM_SINK_FREQUENCY_THRESHOLD
> is met and that is 75% which seems quite ambitious for guessed profiles
> that tend to be flat.  (Also, the code should use counts where available.)
> For some optimizers we have two thresholds - one for guessed profile and one
> for FDO. Perhaps it would make sense to benchmark how decreasing this 
> threshold
> affects performance & code size.
>
> What are the downsides of sinking? Increased register pressure?

Possibly.  But that depends on the whole stmt chain that eventually gets sunk,
and this heuristic is for single stmts - which makes it somewhat fishy.  I'd
simply benchmark removing it ...   Eventually it tries to avoid sinking to
post-dominated blocks this way (those should have the same frequency), not sure.

Richard.

> For non-loop
> branches it is a bit iffy to rely on static branch prediction to even give the
> right direction of the branch. It happens in about 65% of cases (where a perfect
> predictor would do 85%), so we may try to come up with heuristics that do not
> fully rely on the profile.
>
> We could probably fix the testcase by adding --param 
> sink-frequency-threshold=55
>
> Honza
>>
>> Which leads to a different decision made by tree-ssa-sink:
>>
>> +++ /tmp/sl-new/slsr-8.c.127t.sink2016-06-08 14:07:59.747958332 +0200
>> @@ -21,6 +21,16 @@
>>   from bb 2 to bb 3
>>  Sinking a3_17 = s_11(D) * 6;
>>   from bb 2 to bb 3
>> +Sinking x2_16 = c_13(D) + _6;
>> + from bb 2 to bb 5
>> +Sinking _6 = -_5;
>> + from bb 2 to bb 5
>> +Sinking _5 = _4 * 4;
>> + from bb 2 to bb 5
>> +Sinking _4 = (long unsigned int) a2_15;
>> + from bb 2 to bb 5
>> +Sinking a2_15 = s_11(D) * 4;
>> + from bb 2 to bb 5
>>  f (int s, int * c)
>>  {
>>int * x3;
>> @@ -46,17 +56,17 @@
>>_2 = _1 * 4;
>>_3 = -_2;
>>x1_14 = c_13(D) + _3;
>> -  a2_15 = s_11(D) * 4;
>> -  _4 = (long unsigned int) a2_15;
>> -  _5 = _4 * 4;
>> -  _6 = -_5;
>> -  x2_16 = c_13(D) + _6;
>>if (x1_14 != 0B)
>>  goto ;
>>else
>>  goto ;
>>
>>:
>> +  a2_15 = s_11(D) * 4;
>> +  _4 = (long unsigned int) a2_15;
>> +  _5 = _4 * 4;
>> +  _6 = -_5;
>> +  x2_16 = c_13(D) + _6;
>>goto ;
>>
>>:
>>
>> That eventually leads to 9 occurrences of the scanned pattern. However, I'm 
>> not sure if the test-case makes
>> sense any longer?
>>
>> Thanks,
>> Martin
>
>>
>> ;; Function f (f, funcdef_no=0, decl_uid=1747, cgraph_uid=0, symbol_order=0)
>>
>> ;; 1 loops found
>> ;;
>> ;; Loop 0
>> ;;  header 0, latch 1
>> ;;  depth 0, outer -1
>> ;;  nodes: 0 1 2 5 3 4
>> ;; 2 succs { 5 3 }
>> ;; 5 succs { 4 }
>> ;; 3 succs { 4 }
>> ;; 4 succs { 1 }
>> Sinking x3_18 = c_13(D) + _9;
>>  from bb 2 to bb 3
>> Sinking _9 = -_8;
>>  from bb 2 to bb 3
>> Sinking _8 = _7 * 4;
>>  from bb 2 to bb 3
>> Sinking _7 = (long unsigned int) a3_17;
>>  from bb 2 to bb 3
>> Sinking a3_17 = s_11(D) * 6;
>>  from bb 2 to bb 3
>> Sinking x2_16 = c_13(D) + _6;
>>  from bb 2 to bb 5
>> Sinking _6 = -_5;
>>  from bb 2 to bb 5
>> Sinking _5 = _4 * 4;
>>  from bb 2 to bb 5
>> Sinking _4 = (long unsigned int) a2_15;
>>  from bb 2 to bb 5
>> Sinking a2_15 = s_11(D) * 4;
>>  from bb 2 to bb 5
>> f (int s, int * c)
>> {
>>   int * x3;
>>   int * x2;
>>   int * x1;
>>   int a3;
>>   int a2;
>>   int a1;
>>   long unsigned int _1;
>>   long unsigned int _2;
>>   sizetype _3;
>>   long unsigned int _4;
>>   long unsigned int _5;
>>   sizetype _6;
>>   long unsigned int _7;
>>   long unsigned int _8;
>>   sizetype _9;
>>   int * iftmp.0_10;
>>
>>   :
>>   a1_12 = s_11(D) * 2;
>>   _1 = (long unsigned int) a1_12;
>>   _2 = _1 * 4;
>>   _3 = -_2;
>>   x1_14 = c_13(D) + _3;
>>   if (x1_14 != 0B)
>> goto ;
>>   else
>> goto ;
>>
>>   :
>>   a2_15 = s_11(D) * 4;
>>   _4 = (long unsigned int) a2_15;
>>   _5 = _4 * 4;
>>   _6 = -_5;
>>   x2_16 = c_13(D) + _6;
>>   goto ;
>>
>>   :
>>   a3_17 = s_11(D) * 6;
>>   _7 = (long unsigned int) a3_17;
>>   _8 = _7 * 4;
>>   _9 = -_8;
>>   x3_18 = c_13(D) + _9;
>>
>>   :
>>   # iftmp.0_10 = PHI 
>>   return iftmp.0_10;
>>
>> }
>>
>>
>


Re: [C++ Patch/RFC] Tiny tsubst tweak

2016-06-08 Thread Paolo Carlini
... well, I suppose that in principle the super-safe thing to do in such 
cases would be to check immediately, and return early only if we 
are in a SFINAE context.  Like in the untested patchlet attached.


Paolo.

/
Index: pt.c
===
--- pt.c(revision 237196)
+++ pt.c(working copy)
@@ -13430,6 +13430,10 @@ tsubst (tree t, tree args, tsubst_flags_t complain
   {
tree ctx = tsubst_aggr_type (TYPE_CONTEXT (t), args, complain,
 in_decl, /*entering_scope=*/1);
+   if (!(complain & tf_error)
+   && ctx == error_mark_node)
+ return error_mark_node;
+
tree f = tsubst_copy (TYPENAME_TYPE_FULLNAME (t), args,
  complain, in_decl);
 


Re: [PATCH] Improve "str" + 2 > "str" folding (PR c++/71448)

2016-06-08 Thread Richard Biener
On Wed, 8 Jun 2016, Jakub Jelinek wrote:

> Hi!
> 
> For the purposes of fold_comparison, various constants (in this case
> STRING_CST) work the same as decls, in particular we know the objects
> extents and can determine possible pointer wrapping.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk
> (and maybe later for 6.x)?

Ok.

Thanks,
Richard.

> 
> 2016-06-08  Jakub Jelinek  
>   Richard Biener  
> 
>   PR c++/71448
>   * fold-const.c (fold_comparison): Handle CONSTANT_CLASS_P (base0)
>   the same as DECL_P (base0) for indirect_base0.  Use equality_code
>   in one further place.
> 
>   * g++.dg/torture/pr71448.C: New test.
> 
> --- gcc/fold-const.c.jj   2016-06-06 19:39:40.0 +0200
> +++ gcc/fold-const.c  2016-06-08 10:23:15.129178865 +0200
> @@ -8527,9 +8527,9 @@ fold_comparison (location_t loc, enum tr
> if ((offset0 == offset1
>  || (offset0 && offset1
>  && operand_equal_p (offset0, offset1, 0)))
> -   && (code == EQ_EXPR
> -   || code == NE_EXPR
> -   || (indirect_base0 && DECL_P (base0))
> +   && (equality_code
> +   || (indirect_base0
> +   && (DECL_P (base0) || CONSTANT_CLASS_P (base0)))
> || POINTER_TYPE_OVERFLOW_UNDEFINED))
>  
>   {
> @@ -8568,7 +8568,8 @@ fold_comparison (location_t loc, enum tr
>6.5.6/8 and /9 with respect to the signed ptrdiff_t.  */
> else if (bitpos0 == bitpos1
>  && (equality_code
> -|| (indirect_base0 && DECL_P (base0))
> +|| (indirect_base0
> +&& (DECL_P (base0) || CONSTANT_CLASS_P (base0)))
>  || POINTER_TYPE_OVERFLOW_UNDEFINED))
>   {
> /* By converting to signed sizetype we cover middle-end pointer
> --- gcc/testsuite/g++.dg/torture/pr71448.C.jj 2016-06-08 10:32:17.409952602 
> +0200
> +++ gcc/testsuite/g++.dg/torture/pr71448.C2016-06-08 10:33:38.396872265 
> +0200
> @@ -0,0 +1,27 @@
> +// PR c++/71448
> +// { dg-do compile }
> +// { dg-additional-options "-std=c++11" }
> +
> +static constexpr const char foo[] = "foo";
> +static constexpr const char *bar = "bar";
> +
> +static_assert ((foo + 3 - foo) == 3, "check");
> +static_assert (foo + 2 != foo, "check");
> +static_assert (foo + 2 >= foo, "check");
> +static_assert (3 + foo >= foo, "check");
> +static_assert (foo <= foo + 2, "check");
> +static_assert (foo <= 3 + foo, "check");
> +static_assert (foo + 2 > foo, "check");
> +static_assert (3 + foo > foo, "check");
> +static_assert (foo < 2 + foo, "check");
> +static_assert (foo < foo + 3, "check");
> +static_assert ((bar + 3 - bar) == 3, "check");
> +static_assert (bar + 2 != bar, "check");
> +static_assert (2 + bar >= bar, "check");
> +static_assert (bar + 3 >= bar, "check");
> +static_assert (bar <= bar + 2, "check");
> +static_assert (bar <= 3 + bar, "check");
> +static_assert (bar + 2 > bar, "check");
> +static_assert (3 + bar > bar, "check");
> +static_assert (bar < 2 + bar, "check");
> +static_assert (bar < bar + 3, "check");
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: Update probabilities in predict.def to match reality

2016-06-08 Thread Jan Hubicka
> On Wed, Jun 8, 2016 at 2:35 PM, Jan Hubicka  wrote:
> >> On 06/08/2016 12:21 PM, Andreas Schwab wrote:
> >> > Jan Hubicka  writes:
> >> >
> >> >> Bootstrapped/regtested x86_64-linux, will commit it later today.
> >> >
> >> > FAIL: gcc.dg/tree-ssa/slsr-8.c scan-tree-dump-times optimized " w?* 
> >> > " 7
> >> >
> >> > Andreas.
> >> >
> >>
> >> Hi.
> >>
> >> It's caused by different probabilities for BB 2:
> >>
> >> @@ -11,11 +11,11 @@
> >>  ;; 3 succs { 4 }
> >>  ;; 4 succs { 1 }
> >>  Predictions for bb 2
> >> -  DS theory heuristics: 78.4%
> >> -  first match heuristics (ignored): 85.0%
> >> -  combined heuristics: 78.4%
> >> -  pointer (on trees) heuristics: 85.0%
> >> -  early return (on trees) heuristics: 39.0%
> >> +  DS theory heuristics: 66.5%
> >> +  first match heuristics (ignored): 70.0%
> >> +  combined heuristics: 66.5%
> >> +  pointer (on trees) heuristics: 70.0%
> >> +  early return (on trees) heuristics: 46.0%
> >
> > I see this is because sinking is done when PARAM_SINK_FREQUENCY_THRESHOLD
> > is met and that is 75% which seems quite ambitious for guessed profiles
> > that tend to be flat.  (Also, the code should use counts where available.)
> > For some optimizers we have two thresholds - one for guessed profile and one
> > for FDO. Perhaps it would make sense to benchmark how decreasing this 
> > threshold
> > affects performance & code size.
> >
> > What are the downsides of sinking? Increased register pressure?
> 
> Possibly.  But that depends on the whole stmt chain that eventually gets
> sunk,
> and this heuristic is for single stmts - which makes it somewhat fishy.  I'd

Yep, the usual problem ;)

> simply benchmark removing it ...   Eventually it tries to avoid sinking to
> post-dominated blocks this way (those should have the same frequency), not 
> sure.

Ruling out post-dominators in addition to checking the profile would definitely
be a useful safety check, and you probably don't want to sink into loops (not
sure if that can happen). I will take a look.

Honza


[PATCH] Remove strided SLP load vectorization restriction

2016-06-08 Thread Richard Biener

Currently we only handle group_size <= nunits && nunits % group_size == 0
strided SLP loads.  That's overly restrictive as we can chunk
group_size > nunits && group_size % nunits == 0 loads and handle all
other cases by constructing the vector from scalars (as we'd do for
non-SLP).
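
[The chunking choice the patch makes in vectorizable_load can be summarized
as follows.  The sketch mirrors the nloads/lnel selection in the diff below;
the function itself is illustrative, not lifted from tree-vect-stmts.c.]

```python
def strided_slp_load_shape(group_size, nunits):
    """Return (nloads, lnel): how many loads build one vector, and how many
    group elements each load covers, for a strided SLP load."""
    if group_size < nunits and nunits % group_size == 0:
        # Several whole groups fit in one vector: load sub-vectors.
        return nunits // group_size, group_size
    if group_size >= nunits and group_size % nunits == 0:
        # The group spans whole vectors: one full-vector load at a time.
        return 1, nunits
    # Everything else: build the vector element by element from scalars.
    return nunits, 1

print(strided_slp_load_shape(2, 8))   # small group: 4 sub-vector loads
print(strided_slp_load_shape(16, 8))  # large group: full-vector loads
print(strided_slp_load_shape(3, 8))   # no even chunking: scalar fallback
```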

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2016-06-08  Richard Biener  

* tree-vect-stmts.c (vectorizable_load): Remove restrictions
on strided SLP loads and fall back to scalar loads in case
we can't chunk them.

* gcc.dg/vect/slp-43.c: New testcase.

Index: gcc/tree-vect-stmts.c
===
*** gcc/tree-vect-stmts.c   (revision 237205)
--- gcc/tree-vect-stmts.c   (working copy)
*** vectorizable_load (gimple *stmt, gimple_
*** 6440,6456 
}
  }
else if (STMT_VINFO_STRIDED_P (stmt_info))
! {
!   if (grouped_load
! && slp
! && (group_size > nunits
! || nunits % group_size != 0))
!   {
! dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
!  "unhandled strided group load\n");
! return false;
!   }
! }
else
  {
negative = tree_int_cst_compare (nested_in_vect_loop
--- 6440,6446 
}
  }
else if (STMT_VINFO_STRIDED_P (stmt_info))
! ;
else
  {
negative = tree_int_cst_compare (nested_in_vect_loop
*** vectorizable_load (gimple *stmt, gimple_
*** 6744,6759 
running_off = offvar;
alias_off = build_int_cst (reference_alias_ptr_type (DR_REF 
(first_dr)), 0);
int nloads = nunits;
tree ltype = TREE_TYPE (vectype);
auto_vec dr_chain;
if (slp)
{
! nloads = nunits / group_size;
! if (group_size < nunits)
!   ltype = build_vector_type (TREE_TYPE (vectype), group_size);
! else
!   ltype = vectype;
! ltype = build_aligned_type (ltype, TYPE_ALIGN (TREE_TYPE (vectype)));
  /* For SLP permutation support we need to load the whole group,
 not only the number of vector stmts the permutation result
 fits in.  */
--- 6734,6762 
running_off = offvar;
alias_off = build_int_cst (reference_alias_ptr_type (DR_REF 
(first_dr)), 0);
int nloads = nunits;
+   int lnel = 1;
tree ltype = TREE_TYPE (vectype);
auto_vec dr_chain;
if (slp)
{
! if (group_size < nunits
! && nunits % group_size == 0)
!   {
! nloads = nunits / group_size;
! lnel = group_size;
! ltype = build_vector_type (TREE_TYPE (vectype), group_size);
! ltype = build_aligned_type (ltype,
! TYPE_ALIGN (TREE_TYPE (vectype)));
!   }
! else if (group_size >= nunits
!  && group_size % nunits == 0)
!   {
! nloads = 1;
! lnel = nunits;
! ltype = vectype;
! ltype = build_aligned_type (ltype,
! TYPE_ALIGN (TREE_TYPE (vectype)));
!   }
  /* For SLP permutation support we need to load the whole group,
 not only the number of vector stmts the permutation result
 fits in.  */
*** vectorizable_load (gimple *stmt, gimple_
*** 6765,6812 
  else
ncopies = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node);
}
for (j = 0; j < ncopies; j++)
{
- tree vec_inv;
- 
  if (nloads > 1)
{
! vec_alloc (v, nloads);
! for (i = 0; i < nloads; i++)
{
! tree newref, newoff;
! gimple *incr;
! newref = build2 (MEM_REF, ltype, running_off, alias_off);
! 
! newref = force_gimple_operand_gsi (gsi, newref, true,
!NULL_TREE, true,
!GSI_SAME_STMT);
! CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, newref);
! newoff = copy_ssa_name (running_off);
! incr = gimple_build_assign (newoff, POINTER_PLUS_EXPR,
! running_off, stride_step);
  vect_finish_stmt_generation (stmt, incr, gsi);
  
  running_off = newoff;
}
- 
- vec_inv = build_constructor (vectype, v);
- new_temp = vect_init_vector (stmt, vec_inv, vectype, gsi);
- new_stmt = SSA_NAME_DEF_STMT (new_temp);
}
! else
{
! new_stmt = gimple_build_assign (make_ssa_name (ltype),
! build2 (MEM_REF, ltype,
! running_off, alias_off));

Re: Update probabilities in predict.def to match reality

2016-06-08 Thread Richard Biener
On Wed, Jun 8, 2016 at 2:50 PM, Jan Hubicka  wrote:
>> On Wed, Jun 8, 2016 at 2:35 PM, Jan Hubicka  wrote:
>> >> On 06/08/2016 12:21 PM, Andreas Schwab wrote:
>> >> > Jan Hubicka  writes:
>> >> >
>> >> >> Bootstrapped/regtested x86_64-linux, will commit it later today.
>> >> >
>> >> > FAIL: gcc.dg/tree-ssa/slsr-8.c scan-tree-dump-times optimized " w?* 
>> >> > " 7
>> >> >
>> >> > Andreas.
>> >> >
>> >>
>> >> Hi.
>> >>
>> >> It's caused by different probabilities for BB 2:
>> >>
>> >> @@ -11,11 +11,11 @@
>> >>  ;; 3 succs { 4 }
>> >>  ;; 4 succs { 1 }
>> >>  Predictions for bb 2
>> >> -  DS theory heuristics: 78.4%
>> >> -  first match heuristics (ignored): 85.0%
>> >> -  combined heuristics: 78.4%
>> >> -  pointer (on trees) heuristics: 85.0%
>> >> -  early return (on trees) heuristics: 39.0%
>> >> +  DS theory heuristics: 66.5%
>> >> +  first match heuristics (ignored): 70.0%
>> >> +  combined heuristics: 66.5%
>> >> +  pointer (on trees) heuristics: 70.0%
>> >> +  early return (on trees) heuristics: 46.0%
>> >
>> > I see this is because sinking is done when PARAM_SINK_FREQUENCY_THRESHOLD
>> > is met and that is 75% which seems quite ambitious for guessed profiles
>> > that tend to be flat.  (Also, the code should use counts where available.)
>> > For some optimizers we have two thresholds - one for guessed profile and 
>> > one
>> > for FDO. Perhaps it would make sense to benchmark how decreasing this 
>> > threshold
>> > affects performance & code size.
>> >
>> > What are the downsides of sinking? Increased register pressure?
>>
>> Possibly.  But that depends on the whole stmt chain that eventually gets
>> sunk,
>> and this heuristic is for single stmts - which makes it somewhat fishy.  I'd
>
> Yep, the usual problem ;)
>
>> simply benchmark removing it ...   Eventually it tries to avoid sinking to
>> post-dominated blocks this way (those should have the same frequency), not 
>> sure.
>
> Ruling out post-dominators in addition to checking the profile would definitely
> be a useful safety check, and you probably don't want to sink into loops (not
> sure if that can happen). I will take a look.

That is already taken care of with the loop depth check.

I think "sinking" into post-dominated regions should be done by a hypothetical
GIMPLE scheduling pass.  I'm not sure the sinking pass will even consider such
locations (you'd have to double check).

Richard.

> Honza


[Patch, testsuite] Skip some more tests for targets with int size < 32

2016-06-08 Thread Senthil Kumar Selvaraj
Hi,

  This patch adds an int32plus requirement to a few more tests - these
  were failing for the avr target.

  bswap-2.c uses left shifts wider than 16 bits on a char, and
  pr68067-{1,2} use an out-of-range negative number (INT_MIN for a 32-bit int).

  If this is ok, could someone commit please? I don't have commit access.

Regards
Senthil

gcc/testsuite/ChangeLog

2016-06-08  Senthil Kumar Selvaraj  

* gcc.c-torture/execute/bswap-2.c: Require int32plus.
	* gcc.dg/torture/pr68067-1.c: Likewise.
* gcc.dg/torture/pr68067-2.c: Likewise.

diff --git gcc/testsuite/gcc.c-torture/execute/bswap-2.c 
gcc/testsuite/gcc.c-torture/execute/bswap-2.c
index 88132fe..63e7807 100644
--- gcc/testsuite/gcc.c-torture/execute/bswap-2.c
+++ gcc/testsuite/gcc.c-torture/execute/bswap-2.c
@@ -1,3 +1,5 @@
+/* { dg-require-effective-target int32plus } */
+
 #ifdef __UINT32_TYPE__
 typedef __UINT32_TYPE__ uint32_t;
 #else
diff --git gcc/testsuite/gcc.dg/torture/pr68067-1.c 
gcc/testsuite/gcc.dg/torture/pr68067-1.c
index a7b6aa0..f8ad3ca 100644
--- gcc/testsuite/gcc.dg/torture/pr68067-1.c
+++ gcc/testsuite/gcc.dg/torture/pr68067-1.c
@@ -1,4 +1,5 @@
 /* { dg-do run } */
+/* { dg-require-effective-target int32plus } */
 
 int main()
 {
diff --git gcc/testsuite/gcc.dg/torture/pr68067-2.c 
gcc/testsuite/gcc.dg/torture/pr68067-2.c
index 38a459b..e03bf22 100644
--- gcc/testsuite/gcc.dg/torture/pr68067-2.c
+++ gcc/testsuite/gcc.dg/torture/pr68067-2.c
@@ -1,4 +1,5 @@
 /* { dg-do run } */
+/* { dg-require-effective-target int32plus } */
 
 int main()
 {
-- 
2.7.4



Re: [PATCH,rs6000] Add built-in function support for new Power9 vector absolute difference unsigned instructions

2016-06-08 Thread Bill Schmidt

> On Jun 7, 2016, at 2:29 AM, Richard Biener  wrote:
> 
> On Tue, Jun 7, 2016 at 1:58 AM, Kelvin Nilsen
>  wrote:
>> 
>> This patch adds built-in function support for the ISA 3.0 vabsub,
>> vabsduh, and vabsduw instructions.
>> 
>> I have bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
>> regressions.  Is this ok for the trunk?
>> 
>> I have also tested against the gcc-6 branch without regressions.  Is
>> this ok for backporting to gcc6 after a few days of burn-in time on the
>> trunk?
> 
> It sounds like these match SAD_EXPR and thus should allow vectorizing
> gcc.dg/vect/slp-reduc-sad.c and gcc.dg/vect/vect-reduc-sad.c using SAD?

Very possibly.  I need to look at what the vectorizer expects for SAD_EXPR.
At first glance, these are not a direct match, as SAD_EXPR has a widening 
component that these instructions don’t have.  So although I think they could 
be used, the required sequence might be a little ugly on POWER, which does
not have double-wide vectors.  We’ll need to look at it more carefully to see if
we can generate code that’s better than scalar for those tests.

If not, we’ll probably be in the market for a non-widening equivalent one of
these days…  But right now we’re just focusing on enablement of the
instructions.  Looking at effective vectorization with them is on my list for 
later.
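
[For readers of the thread, the mismatch Bill describes can be sketched in a
few lines: SAD_EXPR reduces narrow-element absolute differences into a wider
accumulator, while vabsdu* only produces same-width element-wise differences.
Illustrative Python, not vectorizer or rs6000 backend code; lane values are
invented for the example.]

```python
def vabsdu(a, b):
    """Element-wise absolute difference, same (unsigned) element width."""
    return [abs(x - y) for x, y in zip(a, b)]

def sad_expr(a, b, acc=0):
    """SAD_EXPR: absolute differences of narrow elements, reduced into a
    wider accumulator -- the widening step the vabsdu* insns don't provide."""
    return acc + sum(vabsdu(a, b))

a = [10, 200, 30, 40]   # pretend these are unsigned-byte lanes
b = [12, 190, 35, 25]
print(vabsdu(a, b))     # -> [2, 10, 5, 15]
print(sad_expr(a, b))   # -> 32
```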

— Bill

> 
> Richard.
> 
>> gcc/testsuite/ChangeLog:
>> 
>> 2016-06-06  Kelvin Nilsen  
>> 
>>* gcc.target/powerpc/vadsdu-0.c: New test.
>>* gcc.target/powerpc/vadsdu-1.c: New test.
>>* gcc.target/powerpc/vadsdu-2.c: New test.
>>* gcc.target/powerpc/vadsdu-3.c: New test.
>>* gcc.target/powerpc/vadsdu-4.c: New test.
>>* gcc.target/powerpc/vadsdu-5.c: New test.
>>* gcc.target/powerpc/vadsdub-1.c: New test.
>>* gcc.target/powerpc/vadsdub-2.c: New test.
>>* gcc.target/powerpc/vadsduh-1.c: New test.
>>* gcc.target/powerpc/vadsduh-2.c: New test.
>>* gcc.target/powerpc/vadsduw-1.c: New test.
>>* gcc.target/powerpc/vadsduw-2.c: New test.
>> 
>> 
>> gcc/ChangeLog:
>> 
>> 2016-06-06  Kelvin Nilsen  
>> 
>>* config/rs6000/altivec.h (vec_adu): New macro for vector absolute
>>difference unsigned.
>>(vec_adub): New macro for vector absolute difference unsigned
>>byte.
>>(vec_aduh): New macro for vector absolute difference unsigned
>>half-word.
>>(vec_aduw): New macro for vector absolute difference unsigned word.
>>* config/rs6000/altivec.md (UNSPEC_VADU): New value.
>>(vadu3): New insn.
>>(*p9_vadu3): New insn.
>>* config/rs6000/rs6000-builtin.def (vadub): New built-in
>>definition.
>>(vaduh): New built-in definition.
>>(vaduw): New built-in definition.
>>(vadu): New overloaded built-in definition.
>>(vadub): New overloaded built-in definition.
>>(vaduh): New overloaded built-in definition.
>>(vaduw): New overloaded built-in definition.
>>* config/rs6000/rs6000-c.c (altivec_overloaded_builtins): Add
>>overloaded vector absolute difference unsigned functions.
>>* doc/extend.texi (PowerPC AltiVec Built-in Functions): Document
>>the ISA 3.0 vector absolute difference unsigned built-in functions.
>> 
>> Index: gcc/config/rs6000/altivec.h
>> ===
>> --- gcc/config/rs6000/altivec.h (revision 237045)
>> +++ gcc/config/rs6000/altivec.h (working copy)
>> @@ -401,6 +401,11 @@
>> #define vec_vprtybq __builtin_vec_vprtybq
>> #endif
>> 
>> +#define vec_adu __builtin_vec_vadu
>> +#define vec_adub __builtin_vec_vadub
>> +#define vec_aduh __builtin_vec_vaduh
>> +#define vec_aduw __builtin_vec_vaduw
>> +
>> #define vec_slv __builtin_vec_vslv
>> #define vec_srv __builtin_vec_vsrv
>> #endif
>> Index: gcc/config/rs6000/altivec.md
>> ===
>> --- gcc/config/rs6000/altivec.md(revision 237045)
>> +++ gcc/config/rs6000/altivec.md(working copy)
>> @@ -114,6 +114,7 @@
>>UNSPEC_STVLXL
>>UNSPEC_STVRX
>>UNSPEC_STVRXL
>> +   UNSPEC_VADU
>>UNSPEC_VSLV
>>UNSPEC_VSRV
>>UNSPEC_VMULWHUB
>> @@ -3464,6 +3465,25 @@
>>   [(set_attr "length" "4")
>>(set_attr "type" "vecsimple")])
>> 
>> +;; Vector absolute difference unsigned
>> +(define_expand "vadu3"
>> +  [(set (match_operand:VI 0 "register_operand" "")
>> +(unspec:VI [(match_operand:VI 1 "register_operand" "")
>> +   (match_operand:VI 2 "register_operand" "")]
>> + UNSPEC_VADU))]
>> +  "TARGET_P9_VECTOR")
>> +
>> +;; Vector absolute difference unsigned
>> +(define_insn "*p9_vadu3"
>> +  [(set (match_operand:VI 0 "register_operand" "=v")
>> +(unspec:VI [(match_operand:VI 1 "register_operand" "v")
>> +   (match_operand:VI 2 "register_operand" "v")]
>> + UNSPEC_VADU))]
>> +  "TARGET_P9_VECTOR"
>> 

[PATCH] Testcase for PR68558

2016-06-08 Thread Richard Biener

Vectorized since GCC 6.

Tested on x86_64-unknown-linux-gnu, applied.

Richard.

2016-06-08  Richard Biener  

* gcc.dg/vect/slp-44.c: New testcase.

Index: gcc/testsuite/gcc.dg/vect/slp-44.c
===
--- gcc/testsuite/gcc.dg/vect/slp-44.c  (revision 0)
+++ gcc/testsuite/gcc.dg/vect/slp-44.c  (working copy)
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_int } */
+
+void IMB_double_fast_x (int * __restrict__ destf,
+   int * __restrict__ dest, int y,
+   int * __restrict__ p1f)
+{
+  int i;
+  for (i = y; i > 0; i--)
+{
+  *dest++ = 0;
+  destf[0] = p1f[0];
+  destf[1] = p1f[1];
+  destf[2] = p1f[2];
+  destf[3] = p1f[3];
+  destf[4] = p1f[8];
+  destf[5] = p1f[9];
+  destf[6] = p1f[10];
+  destf[7] = p1f[11];
+  destf += 8;
+  p1f += 12;
+}
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { target { 
vect_hw_misalign && vect_perm } } } } */


libgomp: Unconfuse offload plugins vs. offload targets

2016-06-08 Thread Thomas Schwinge
Hi!

This got me confused recently, so I took the time to clean it up.  OK
to commit?

commit 5a1b2d8440a459fd2c623ba76fd6cab478ada54f
Author: Thomas Schwinge 
Date:   Wed Jun 8 15:18:11 2016 +0200

libgomp: Unconfuse offload plugins vs. offload targets

libgomp/
* plugin/configfrag.ac: Populate and AC_SUBST offload_plugins
instead of offload_targets, and AC_DEFINE_UNQUOTED OFFLOAD_PLUGINS
instead of OFFLOAD_TARGETS.
* target.c (gomp_target_init): Adjust to that.
* testsuite/lib/libgomp.exp: Likewise.
* testsuite/libgomp-test-support.exp.in: Likewise.
* Makefile.in: Regenerate.
* config.h.in: Regenerate.
* configure: Regenerate.
* testsuite/Makefile.in: Regenerate.
---
 libgomp/Makefile.in   |  2 +-
 libgomp/config.h.in   |  4 ++--
 libgomp/configure | 34 ++-
 libgomp/plugin/configfrag.ac  | 34 ++-
 libgomp/target.c  |  8 +++
 libgomp/testsuite/Makefile.in |  2 +-
 libgomp/testsuite/lib/libgomp.exp | 25 +---
 libgomp/testsuite/libgomp-test-support.exp.in |  2 +-
 8 files changed, 56 insertions(+), 55 deletions(-)

diff --git libgomp/Makefile.in libgomp/Makefile.in
[...]
diff --git libgomp/config.h.in libgomp/config.h.in
[...]
diff --git libgomp/configure libgomp/configure
[...]
diff --git libgomp/plugin/configfrag.ac libgomp/plugin/configfrag.ac
index 88b4156..93d3a71 100644
--- libgomp/plugin/configfrag.ac
+++ libgomp/plugin/configfrag.ac
@@ -26,8 +26,6 @@
 # see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
# <http://www.gnu.org/licenses/>.
 
-offload_targets=
-AC_SUBST(offload_targets)
 plugin_support=yes
 AC_CHECK_LIB(dl, dlsym, , [plugin_support=no])
 if test x"$plugin_support" = xyes; then
@@ -142,7 +140,10 @@ AC_SUBST(PLUGIN_HSA_LIBS)
 
 
 
-# Get offload targets and path to install tree of offloading compiler.
+# Parse offload targets, and figure out libgomp plugin, and configure the
+# corresponding offload compiler.
+offload_plugins=
+AC_SUBST(offload_plugins)
 offload_additional_options=
 offload_additional_lib_paths=
 AC_SUBST(offload_additional_options)
@@ -151,13 +152,13 @@ if test x"$enable_offload_targets" != x; then
   for tgt in `echo $enable_offload_targets | sed -e 's#,# #g'`; do
 tgt_dir=`echo $tgt | grep '=' | sed 's/.*=//'`
 tgt=`echo $tgt | sed 's/=.*//'`
-tgt_name=
+tgt_plugin=
 case $tgt in
   *-intelmic-* | *-intelmicemul-*)
-   tgt_name=intelmic
+   tgt_plugin=intelmic
;;
   nvptx*)
-tgt_name=nvptx
+   tgt_plugin=nvptx
PLUGIN_NVPTX=$tgt
PLUGIN_NVPTX_CPPFLAGS=$CUDA_DRIVER_CPPFLAGS
PLUGIN_NVPTX_LDFLAGS=$CUDA_DRIVER_LDFLAGS
@@ -184,7 +185,7 @@ if test x"$enable_offload_targets" != x; then
;;
esac
;;
-  hsa*)
+  hsa)
case "${target}" in
  x86_64-*-*)
case " ${CC} ${CFLAGS} " in
@@ -192,7 +193,7 @@ if test x"$enable_offload_targets" != x; then
PLUGIN_HSA=0
;;
  *)
-   tgt_name=hsa
+   tgt_plugin=hsa
PLUGIN_HSA=$tgt
PLUGIN_HSA_CPPFLAGS=$HSA_RUNTIME_CPPFLAGS
PLUGIN_HSA_LDFLAGS="$HSA_RUNTIME_LDFLAGS $HSA_KMT_LDFLAGS"
@@ -214,7 +215,7 @@ if test x"$enable_offload_targets" != x; then
LDFLAGS=$PLUGIN_HSA_save_LDFLAGS
LIBS=$PLUGIN_HSA_save_LIBS
case $PLUGIN_HSA in
- hsa*)
+ hsa)
HSA_PLUGIN=0
AC_MSG_ERROR([HSA run-time package required for HSA 
support])
;;
@@ -231,16 +232,17 @@ if test x"$enable_offload_targets" != x; then
AC_MSG_ERROR([unknown offload target specified])
;;
 esac
-if test x"$tgt_name" = x; then
+if test x"$tgt_plugin" = x; then
   # Don't configure libgomp for this offloading target if we don't build
   # the corresponding plugin.
   continue
-elif test x"$offload_targets" = x; then
-  offload_targets=$tgt_name
+elif test x"$offload_plugins" = x; then
+  offload_plugins=$tgt_plugin
 else
-  offload_targets=$offload_targets,$tgt_name
+  offload_plugins=$offload_plugins,$tgt_plugin
 fi
-if test "$tgt_name" = hsa; then
+# Configure additional search paths.
+if test "$tgt_plugin" = hsa; then
   # Offloading compilation is all handled by the target compiler.
   :
 elif test x"$tgt_dir" != x; then
@@ -252,8 +254,8 @@ if test x"$enable_offload_targets" != x; then
 fi
   done
 fi
-AC_DEFINE_UNQUOTED(OFFLOAD_TARGETS, "$offload_targets",
-  [Define to offload targets, separated by commas.])
+AC_DEFINE_UNQUOTED(OFFLOAD_PLUGINS, "$offload_plugins",
+  [Define to offl

[PING] [PR c/71381] C/C++ OpenACC cache directive rejects valid syntax

2016-06-08 Thread Thomas Schwinge
Hi!

Ping.

On Thu, 2 Jun 2016 13:47:08 +0200, I wrote:
> On Wed, 05 Nov 2014 17:29:19 +0100, I wrote:
> > In r217145, I applied Jim's patch to gomp-4_0-branch:
> > 
> > commit 4361f9b6b2c74c2961c3a5290a4945abe2d7a444
> > Author: tschwinge 
> > Date:   Wed Nov 5 16:26:47 2014 +
> > 
> > OpenACC cache directive for C.
> 
> (That, and the corresponding C++ changes later made it into trunk.)
> 
> > --- gcc/c/c-parser.c
> > +++ gcc/c/c-parser.c
> > @@ -10053,6 +10053,14 @@ c_parser_omp_variable_list (c_parser *parser,
> > {
> >   switch (kind)
> > {
> > +   case OMP_NO_CLAUSE_CACHE:
> > + if (c_parser_peek_token (parser)->type != CPP_OPEN_SQUARE)
> > +   {
> > + c_parser_error (parser, "expected %<[%>");
> > + t = error_mark_node;
> > + break;
> > +   }
> > + /* FALL THROUGH.  */
> > case OMP_CLAUSE_MAP:
> > case OMP_CLAUSE_FROM:
> > case OMP_CLAUSE_TO:
> 
> Strictly speaking (OpenACC 2.0a specification), that is correct: the
> OpenACC cache directive explicitly only allows "array elements or
> subarrays".  However, I wonder if it would make sense to allow complete
> arrays as a GNU extension?  That is, syntactic sugar to allow "cache (a)"
> to mean "cache (a[0:LENGTH])"?
> 
> > @@ -10091,6 +10099,29 @@ c_parser_omp_variable_list (c_parser *parser,
> >   t = error_mark_node;
> >   break;
> > }
> > +
> > + if (kind == OMP_NO_CLAUSE_CACHE)
> > +   {
> > + mark_exp_read (low_bound);
> > + mark_exp_read (length);
> > +
> > + if (TREE_CODE (low_bound) != INTEGER_CST
> > + && !TREE_READONLY (low_bound))
> > +   {
> > + error_at (clause_loc,
> > +   "%qD is not a constant", low_bound);
> > + t = error_mark_node;
> > +   }
> 
> While OpenACC 2.0a specifies that "the lower bound is a constant", it
> also permits the lower bound to be a "loop invariant, or the for loop
> index variable plus or minus a constant or loop invariant".  So, we're
> rejecting valid syntax here.
> 
> > +
> > + if (TREE_CODE (length) != INTEGER_CST
> > + && !TREE_READONLY (length))
> > +   {
> > + error_at (clause_loc,
> > +   "%qD is not a constant", length);
> > + t = error_mark_node;
> > +   }
> > +   }
> 
> The idea is correct (OpenACC 2.0a: "the length is a constant"), but we
> can't reliably check that here; for example:
> 
> #pragma acc cache (a[0:n + 1])
> 
> ... will run into an ICE, "tree check: expected tree that contains 'decl
> minimal' structure, have 'plus_expr' in [...]".
> 
> Currently we're discarding the OpenACC cache directive in the middle end;
> I expect checking of the lower bound and length will come automatically
> as soon as we start to do something with OACC_CACHE/OMP_CLAUSE__CACHE_.
> Until then, I propose we simply remove these checks from the front ends.
> OK for trunk and gcc-6-branch?
> 
> commit a620ebe6fa509ec6441ba87276e55078eb2d00fc
> Author: Thomas Schwinge 
> Date:   Thu Jun 2 12:19:49 2016 +0200
> 
> [PR c/71381] C/C++ OpenACC cache directive rejects valid syntax
> 
>   gcc/c/
>   PR c/71381
>   * c-parser.c (c_parser_omp_variable_list) <OMP_CLAUSE__CACHE_>:
>   Loosen checking.
>   gcc/cp/
>   PR c/71381
>   * parser.c (cp_parser_omp_var_list_no_open) <OMP_CLAUSE__CACHE_>:
>   Loosen checking.
>   gcc/fortran/
>   PR c/71381
>   * openmp.c (gfc_match_oacc_cache): Add comment.
>   gcc/testsuite/
>   PR c/71381
>   * c-c++-common/goacc/cache-1.c: Update.  Move invalid usage tests
>   to...
>   * c-c++-common/goacc/cache-2.c: ... this new file.
>   * gfortran.dg/goacc/cache-1.f95: Move invalid usage tests to...
>   * gfortran.dg/goacc/cache-2.f95: ... this new file.
>   * gfortran.dg/goacc/coarray.f95: Update OpenACC cache directive
>   usage.
>   * gfortran.dg/goacc/cray.f95: Likewise.
>   * gfortran.dg/goacc/loop-1.f95: Likewise.
>   libgomp/
>   PR c/71381
>   * testsuite/libgomp.oacc-c-c++-common/cache-1.c: #include
>   "../../../gcc/testsuite/c-c++-common/goacc/cache-1.c".
>   * testsuite/libgomp.oacc-fortran/cache-1.f95: New file.
> 
>   gcc/
>   * omp-low.c (scan_sharing_clauses): Don't expect
>   OMP_CLAUSE__CACHE_.
> ---
>  gcc/c/c-parser.c   | 22 +---
>  gcc/cp/parser.c| 22 +---
>  gcc/fortran/openmp.c   |  5 ++
>  gcc/omp-low.c  |  6 --
>  gcc/testsuite/c-c++-common/goacc/cache-1.c | 66 
> --
>  .../c-c++-common/goacc/{cache-1.c => c

[PATCH][SPARC] Fix cpu auto-detection in M7 and S7 (Sonoma)

2016-06-08 Thread Jose E. Marchesi

Starting with the M7 we will be using the same identifiers to identify
the cpu in both Solaris (via kstat) and GNU/Linux (via /proc/cpuinfo).
This little patch fixes the SPARC M7 entry in cpu_names, and also adds
an entry for the Sonoma SoC.

Tested in sparc64-*-linux-gnu, sparcv9-*-linux-gnu and
sparc-sun-solaris2.11 targets.

2016-06-08  Jose E. Marchesi  

* config/sparc/driver-sparc.c (cpu_names): Fix the entry for the
SPARC-M7 and add an entry for SPARC-S7 cpus (Sonoma).


diff --git a/gcc/config/sparc/driver-sparc.c b/gcc/config/sparc/driver-sparc.c
index b81763e..ea174bf 100644
--- a/gcc/config/sparc/driver-sparc.c
+++ b/gcc/config/sparc/driver-sparc.c
@@ -57,7 +57,6 @@ static const struct cpu_names {
   { "UltraSPARC-T2+",  "niagara2" },
   { "SPARC-T3","niagara3" },
   { "SPARC-T4","niagara4" },
-  { "SPARC-M7","niagara7" },
 #else
   { "SuperSparc",  "supersparc" },
   { "HyperSparc",  "hypersparc" },
@@ -74,9 +73,10 @@ static const struct cpu_names {
   { "UltraSparc T2",   "niagara2" },
   { "UltraSparc T3",   "niagara3" },
   { "UltraSparc T4",   "niagara4" },
-  { "UltraSparc M7",   "niagara7" },
   { "LEON","leon3" },
 #endif
+  { "SPARC-M7","niagara7" },
+  { "SPARC-S7","niagara7" },
   { NULL,  NULL }
   };
 


Re: [PATCH, RFC] First cut at using vec_construct for strided loads

2016-06-08 Thread Bill Schmidt
Hi Richard,

> On Jun 8, 2016, at 7:29 AM, Richard Biener  wrote:
> 
> On Wed, Jun 13, 2012 at 4:18 AM, William J. Schmidt
>  wrote:
>> This patch is a follow-up to the discussion generated by
>> http://gcc.gnu.org/ml/gcc-patches/2012-06/msg00546.html.  I've added
>> vec_construct to the cost model for use in vect_model_load_cost, and
>> implemented a cost calculation that makes sense to me for PowerPC.  I'm
>> less certain about the default, i386, and spu implementations.  I took a
>> guess at i386 from the discussions we had, and used the same calculation
>> for the default and for spu.  I'm hoping you or others can fill in the
>> blanks if I guessed badly.
>> 
>> The i386 cost for vec_construct is different from all the others, which
>> are parameterized for each processor description.  This should probably
>> be parameterized in some way as well, but thought you'd know better than
>> I how that should be.  Perhaps instead of
>> 
>>elements / 2 + 1
>> 
>> it should be
>> 
>>(elements / 2) * X + Y
>> 
>> where X and Y are taken from the processor description, and represent
>> the cost of a merge and a permute, respectively.  Let me know what you
>> think.
> 
> Just trying to understand how you arrived at the above formulas in 
> investigating
> strangely low cost for v16qi construction of 9.  If we pairwise reduce 
> elements
> with a cost of 1 then we arrive at a cost of elements - 1, that's what you'd
> get with not accounting an initial move of element zero into a vector and then
> inserting each other element into that with elements - 1 inserts.

What I wrote there only makes partial sense for certain types on Power, so far
as I can tell, and even then it doesn’t generalize properly.  When the scalar
registers are contained in the vector registers (as happens for floating-point
on Power), then you can do some merges and other forms of permutes to combine
them faster than doing specific inserts.  But that isn’t a general solution
even on Power; for the integer modes we still do inserts.

So what you have makes sense to me, and what’s currently in place for Power
needs work also, so far as I can tell.  I’ll take a note to revisit this.

— Bill

> 
> This also matches up with code-generation on x86_64 for
> 
> vT foo (T a, T b, ...)
> {
>  return (vT) {a, b, ... };
> }
> 
> for any vector / element type combination I tried.  Thus the patch below.
> 
> I'll bootstrap / test that on x86_64-linux and I'm leaving other
> targets to target
> maintainers.
> 
> Ok for the i386 parts?
> 
> Thanks,
> Richard.
> 
> 2016-06-08  Richard Biener  
> 
>* targhooks.c (default_builtin_vectorization_cost): Adjust
>vec_construct cost.
>* config/i386/i386.c (ix86_builtin_vectorization_cost): Likewise.
> 
> Index: gcc/targhooks.c
> ===
> --- gcc/targhooks.c (revision 237196)
> +++ gcc/targhooks.c (working copy)
> @@ -589,8 +589,7 @@ default_builtin_vectorization_cost (enum
> return 3;
> 
>   case vec_construct:
> -   elements = TYPE_VECTOR_SUBPARTS (vectype);
> -   return elements / 2 + 1;
> +   return TYPE_VECTOR_SUBPARTS (vectype) - 1;
> 
>   default:
> gcc_unreachable ();
> Index: gcc/config/i386/i386.c
> ===
> --- gcc/config/i386/i386.c  (revision 237196)
> +++ gcc/config/i386/i386.c  (working copy)
> @@ -49503,8 +49520,6 @@ static int
> ix86_builtin_vectorization_cost (enum vect_cost_for_stmt type_of_cost,
>  tree vectype, int)
> {
> -  unsigned elements;
> -
>   switch (type_of_cost)
> {
>   case scalar_stmt:
> @@ -49546,8 +49561,7 @@ ix86_builtin_vectorization_cost (enum ve
> return ix86_cost->vec_stmt_cost;
> 
>   case vec_construct:
> -   elements = TYPE_VECTOR_SUBPARTS (vectype);
> -   return ix86_cost->vec_stmt_cost * (elements / 2 + 1);
> +   return ix86_cost->vec_stmt_cost * (TYPE_VECTOR_SUBPARTS (vectype) - 
> 1);
> 
>   default:
> gcc_unreachable ();
> 
> 
>> Thanks,
>> Bill
>> 
>> 
>> 2012-06-12  Bill Schmidt  
>> 
>>* targhooks.c (default_builtin_vectorized_conversion): Handle
>>vec_construct, using vectype to base cost on subparts.
>>* target.h (enum vect_cost_for_stmt): Add vec_construct.
>>* tree-vect-stmts.c (vect_model_load_cost): Use vec_construct
>>instead of scalar_to_vec.
>>* config/spu/spu.c (spu_builtin_vectorization_cost): Handle
>>vec_construct in same way as default for now.
>>* config/i386/i386.c (ix86_builtin_vectorization_cost): Likewise.
>>* config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost):
>>Handle vec_construct, including special case for 32-bit loads.
>> 
>> 
>> Index: gcc/targhooks.c
>> ===
>> --- gcc/targhooks.c (revision 188482)
>> +

Re: [PING] [PR other/70945] Handle function_glibc_finite_math in offloading

2016-06-08 Thread Thomas Schwinge
Hi!

On Tue, 7 Jun 2016 08:54:10 +0200, Jakub Jelinek  wrote:
> On Mon, Jun 06, 2016 at 09:11:18PM +, Joseph Myers wrote:
> > On Fri, 3 Jun 2016, Jakub Jelinek wrote:
> > > I think it would be better to just add this support to newlib.
> > 
> > That suggestion doesn't really make sense to me.  Why should newlib be 
> > expected to follow the same choices as glibc regarding what variants of 
> > libm functions to export, beyond the standard names for those functions, 
> > or how to name any variants it does export?

ACK.  I'm thinking somewhat along the lines that in GCC's offloading
compilation, the target ("host") side should "sanitize" the code
presented to the offloading compilers.  One first step for that I have
presented here, to sanitize the finite math functions that get
special-cased by glibc.

A next step is to define the set of external functions/symbols that are
permitted to be used in offloaded code, and then sanitize these at the
glibc header file level, for example, by adding (new) attributes, and
similar.

> I'm not saying newlib in general, let newlib do whatever they want, but
> I'm talking about offloading port(s) of newlib, which IMHO should provide
> translation layer from the host headers to the offloading target functions.

In earlier emails I argued against this, and you didn't reject it back
then.  So you're saying that we'll need a compatibility/translation layer
for any kind of target libc that people may be using (which arguably is
glibc primarily, but do we intend to limit us to that?), and keep that
maintained as these target libcs evolve?

> The thing is, I think it is much better to have this layer in a source form
> where you can easily modify it than inside of the compiler where you have to
> hardwire everything in there.  It could sit in some offloading directory of
> newlib, which the offloading port(s) could share.

I argue this should be as close as possible to the origin, which is the
glibc header files, and as these are not feasible to be adjusted quickly,
we instead do it in the compilation process, for now.

As I understand Joseph's point, similar handling will be required for
vector function variants anyway.

> The __*_finite functions aren't the only one, what if glibc the next half a
> year adds another 4-5 of the finite math functions?

Huh?  In your proposed model I would then have to react to that, and
alter the translation layer, whereas in my model it would just continue
to work?

> What about e.g.
> -D_FORTIFY_SOURCE=2 string functions, etc.?

Obviously, these will require similar handling.  I specifically said:
"The first thing I'm working on is math functions usage in offloaded
regions".


Grüße
 Thomas


Re: [PATCH v1] Support for SPARC M7 and VIS 4.0

2016-06-08 Thread Eric Botcazou
> Committed to trunk.  Thanks.

You're welcome.  Now backported onto the 6 branch, together with your previous 
patch for --with-cpu-{32,64}.

I have also attached an update for the wwwdocs module, OK to install?

-- 
Eric Botcazou

Index: gcc-6/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-6/changes.html,v
retrieving revision 1.84
diff -u -r1.84 changes.html
--- gcc-6/changes.html	3 Jun 2016 08:25:44 -	1.84
+++ gcc-6/changes.html	8 Jun 2016 13:49:37 -
@@ -849,5 +849,23 @@
 	generation of PIE by default.
   
 
+GCC 6.2
+
+This is the https://gcc.gnu.org/bugzilla/buglist.cgi?bug_status=RESOLVED&resolution=FIXED&target_milestone=6.2";>list
+of problem reports (PRs) from GCC's bug tracking system that are
+known to be fixed in the 6.2 release. This list might not be
+complete (that is, it is possible that some PRs that have been fixed
+are not listed here).
+
+Target Specific Changes
+
+SPARC
+  
+Support for --with-cpu-32 and --with-cpu-64
+configure options has been added on bi-architecture platforms.
+Support for the SPARC M7 (Niagara 7) processor has been added.
+Support for the VIS 4.0 instruction set has been added.
+  
+
 
 


Re: [PING] [PR other/70945] Handle function_glibc_finite_math in offloading

2016-06-08 Thread Jakub Jelinek
On Wed, Jun 08, 2016 at 03:47:54PM +0200, Thomas Schwinge wrote:
> > I'm not saying newlib in general, let newlib do whatever they want, but
> > I'm talking about offloading port(s) of newlib, which IMHO should provide
> > translation layer from the host headers to the offloading target functions.
> 
> In earlier emails I argued against this, and you didn't reject it back
> then.  So you're saying that we'll need a compatibility/translation layer
> for any kind of target libc that people may be using (which arguably is
> glibc primarily, but do we intend to limit us to that?), and keep that
> maintained as these target libcs evolve?

Yes, to me that looks simplest.

> > The thing is, I think it is much better to have this layer in a source form
> > where you can easily modify it than inside of the compiler where you have to
> > hardwire everything in there.  It could sit in some offloading directory of
> > newlib, which the offloading port(s) could share.
> 
> I argue this should be as close as possible to the origin, which is the
> glibc header files, and as these are not feasible to be adjusted quickly,
> we instead do it in the compilation process, for now.

For one, I don't understand the argument about hard upgrades of the libc
headers, upgrading libc isn't much harder than updating the compiler, and
you have always the possibility to fixinclude the headers or whatever else.

But, I'm also not convinced how would you like to change the headers.
The finite math entrypoints are just one example of many things you can do
in libc headers, asm redirects to another entrypoint, so you'd need some way
to say, and if offloading to target XYZ (e.g. I think for XeonPhi that is
not needed), redirect to this instead.  Consider the various glibc macros or
inlines, e.g. for -D_FORTIFY_SOURCE*, or optimized string.h in
bits/string2.h, you can end up with various macros for standard functions,
I really don't think it would be easy to somehow undo those changes except
for providing a compatibility layer in the offloading library.
Whether through some special markup in libc headers or (much worse)
hardwiring it all in the compiler.

Sure, we should document what APIs we are willing to support, but then
adding a compat layer shouldn't be that hard.

> > The __*_finite functions aren't the only one, what if glibc the next half a
> > year adds another 4-5 of the finite math functions?
> 
> Huh?  In your proposed model I would then have to react to that, and
> alter the translation layer, whereas in my model it would just continue
> to work?

No, in your model you'd need to update the compiler to hardwire further
stuff in it.

Jakub


Re: [PATCH][SPARC] Fix cpu auto-detection in M7 and S7 (Sonoma)

2016-06-08 Thread Eric Botcazou
> Starting with the M7 we will be using the same identifiers to identify
> the cpu in both Solaris (via kstat) and GNU/Linux (via /proc/cpuinfo).
> This little patch fixes the SPARC M7 entry in cpu_names, and also adds
> an entry for the Sonoma SoC.
> 
> Tested in sparc64-*-linux-gnu, sparcv9-*-linux-gnu and
> sparc-sun-solaris2.11 targets.
> 
> 2016-06-08  Jose E. Marchesi  
> 
>   * config/sparc/driver-sparc.c (cpu_names): Fix the entry for the
>   SPARC-M7 and add an entry for SPARC-S7 cpus (Sonoma).

OK, but it needs to be applied both on mainline and 6 branch.

-- 
Eric Botcazou


Re: [PATCH, RFC] First cut at using vec_construct for strided loads

2016-06-08 Thread Richard Biener
On Wed, 8 Jun 2016, Bill Schmidt wrote:

> Hi Richard,
> 
> > On Jun 8, 2016, at 7:29 AM, Richard Biener  
> > wrote:
> > 
> > On Wed, Jun 13, 2012 at 4:18 AM, William J. Schmidt
> >  wrote:
> >> This patch is a follow-up to the discussion generated by
> >> http://gcc.gnu.org/ml/gcc-patches/2012-06/msg00546.html.  I've added
> >> vec_construct to the cost model for use in vect_model_load_cost, and
> >> implemented a cost calculation that makes sense to me for PowerPC.  I'm
> >> less certain about the default, i386, and spu implementations.  I took a
> >> guess at i386 from the discussions we had, and used the same calculation
> >> for the default and for spu.  I'm hoping you or others can fill in the
> >> blanks if I guessed badly.
> >> 
> >> The i386 cost for vec_construct is different from all the others, which
> >> are parameterized for each processor description.  This should probably
> >> be parameterized in some way as well, but thought you'd know better than
> >> I how that should be.  Perhaps instead of
> >> 
> >>elements / 2 + 1
> >> 
> >> it should be
> >> 
> >>(elements / 2) * X + Y
> >> 
> >> where X and Y are taken from the processor description, and represent
> >> the cost of a merge and a permute, respectively.  Let me know what you
> >> think.
> > 
> > Just trying to understand how you arrived at the above formulas in 
> > investigating
> > strangely low cost for v16qi construction of 9.  If we pairwise reduce 
> > elements
> > with a cost of 1 then we arrive at a cost of elements - 1, that's what you'd
> > get with not accounting an initial move of element zero into a vector and 
> > then
> > inserting each other element into that with elements - 1 inserts.
> 
> What I wrote there only makes partial sense for certain types on Power, so
> far as I can tell, and even then it doesn’t generalize properly.  When the
> scalar registers are contained in the vector registers (as happens for
> floating-point on Power), then you can do some merges and other forms of
> permutes to combine them faster than doing specific inserts.  But that isn’t
> a general solution even on Power; for the integer modes we still do inserts.

You mean Power has instructions to combine more than two vector registers
into one?  Otherwise you still need n / 2 plus n / 4 plus n / 8 ...
"permutes" which boils down to n - 1.
 
> So what you have makes sense to me, and what’s currently in place for Power
> needs work also, so far as I can tell.  I’ll take a note to revisit this.

Thanks.
Richard.

> — Bill
> 
> > 
> > This also matches up with code-generation on x86_64 for
> > 
> > vT foo (T a, T b, ...)
> > {
> >  return (vT) {a, b, ... };
> > }
> > 
> > for any vector / element type combination I tried.  Thus the patch below.
> > 
> > I'll bootstrap / test that on x86_64-linux and I'm leaving other
> > targets to target
> > maintainers.
> > 
> > Ok for the i386 parts?
> > 
> > Thanks,
> > Richard.
> > 
> > 2016-06-08  Richard Biener  
> > 
> >* targhooks.c (default_builtin_vectorization_cost): Adjust
> >vec_construct cost.
> >* config/i386/i386.c (ix86_builtin_vectorization_cost): Likewise.
> > 
> > Index: gcc/targhooks.c
> > ===
> > --- gcc/targhooks.c (revision 237196)
> > +++ gcc/targhooks.c (working copy)
> > @@ -589,8 +589,7 @@ default_builtin_vectorization_cost (enum
> > return 3;
> > 
> >   case vec_construct:
> > -   elements = TYPE_VECTOR_SUBPARTS (vectype);
> > -   return elements / 2 + 1;
> > +   return TYPE_VECTOR_SUBPARTS (vectype) - 1;
> > 
> >   default:
> > gcc_unreachable ();
> > Index: gcc/config/i386/i386.c
> > ===
> > --- gcc/config/i386/i386.c  (revision 237196)
> > +++ gcc/config/i386/i386.c  (working copy)
> > @@ -49503,8 +49520,6 @@ static int
> > ix86_builtin_vectorization_cost (enum vect_cost_for_stmt type_of_cost,
> >  tree vectype, int)
> > {
> > -  unsigned elements;
> > -
> >   switch (type_of_cost)
> > {
> >   case scalar_stmt:
> > @@ -49546,8 +49561,7 @@ ix86_builtin_vectorization_cost (enum ve
> > return ix86_cost->vec_stmt_cost;
> > 
> >   case vec_construct:
> > -   elements = TYPE_VECTOR_SUBPARTS (vectype);
> > -   return ix86_cost->vec_stmt_cost * (elements / 2 + 1);
> > +   return ix86_cost->vec_stmt_cost * (TYPE_VECTOR_SUBPARTS (vectype) - 
> > 1);
> > 
> >   default:
> > gcc_unreachable ();
> > 
> > 
> >> Thanks,
> >> Bill
> >> 
> >> 
> >> 2012-06-12  Bill Schmidt  
> >> 
> >>* targhooks.c (default_builtin_vectorized_conversion): Handle
> >>vec_construct, using vectype to base cost on subparts.
> >>* target.h (enum vect_cost_for_stmt): Add vec_construct.
> >>* tree-vect-stmts.c (vect_model_load_cost): Use vec_construct
> >>instead of scalar_

Re: [PING] [PR c/71381] C/C++ OpenACC cache directive rejects valid syntax

2016-06-08 Thread Jakub Jelinek
On Wed, Jun 08, 2016 at 03:28:57PM +0200, Thomas Schwinge wrote:
> > [PR c/71381] C/C++ OpenACC cache directive rejects valid syntax
> > 
> > gcc/c/
> > PR c/71381
> > * c-parser.c (c_parser_omp_variable_list) <OMP_CLAUSE__CACHE_>:
> > Loosen checking.
> > gcc/cp/
> > PR c/71381
> > * parser.c (cp_parser_omp_var_list_no_open) <OMP_CLAUSE__CACHE_>:
> > Loosen checking.
> > gcc/fortran/
> > PR c/71381
> > * openmp.c (gfc_match_oacc_cache): Add comment.
> > gcc/testsuite/
> > PR c/71381
> > * c-c++-common/goacc/cache-1.c: Update.  Move invalid usage 
> > tests
> > to...
> > * c-c++-common/goacc/cache-2.c: ... this new file.
> > * gfortran.dg/goacc/cache-1.f95: Move invalid usage tests to...
> > * gfortran.dg/goacc/cache-2.f95: ... this new file.
> > * gfortran.dg/goacc/coarray.f95: Update OpenACC cache directive
> > usage.
> > * gfortran.dg/goacc/cray.f95: Likewise.
> > * gfortran.dg/goacc/loop-1.f95: Likewise.
> > libgomp/
> > PR c/71381
> > * testsuite/libgomp.oacc-c-c++-common/cache-1.c: #include
> > "../../../gcc/testsuite/c-c++-common/goacc/cache-1.c".
> > * testsuite/libgomp.oacc-fortran/cache-1.f95: New file.
> > 
> > gcc/
> > * omp-low.c (scan_sharing_clauses): Don't expect
> > OMP_CLAUSE__CACHE_.

Ok.

> > --- gcc/c/c-parser.c
> > +++ gcc/c/c-parser.c
> > @@ -10601,6 +10601,9 @@ c_parser_omp_variable_list (c_parser *parser,
> >   switch (kind)
> > {
> > case OMP_CLAUSE__CACHE_:
> > + /* The OpenACC cache directive explicitly only allows "array
> > +elements or subarrays".  Would it make sense to allow complete
> > +arrays as a GNU extension?  */

Please try to not add GNU extensions on top of OpenACC, unless strictly
necessary.
It is better if the compiler is strict and there is interoperability.
If you think it should accept something that it doesn't, talk to the OpenACC
committee.

Jakub
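
[For reference, the subarray syntax under discussion can be sketched as
below.  This is an illustrative fragment added here, not code from the
patch; the identifiers are hypothetical.  The lower bound is the for
loop index variable, which OpenACC 2.0a permits; compiled as plain C
without OpenACC support the pragma is simply ignored.]

```c
/* Sum an array in blocks of four, hinting that each subarray a[i:4]
   should be cached.  The lower bound is the loop index variable and
   the length is a constant, both valid per OpenACC 2.0a.  */
static float
sum_blocks (const float *a, int n)
{
  float s = 0.0f;
  for (int i = 0; i < n; i += 4)
    {
#pragma acc cache (a[i:4])
      for (int j = 0; j < 4; j++)
        s += a[i + j];
    }
  return s;
}
```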


Re: libgomp: Unconfuse offload plugins vs. offload targets

2016-06-08 Thread Jakub Jelinek
On Wed, Jun 08, 2016 at 03:27:44PM +0200, Thomas Schwinge wrote:
> Hi!
> 
> This got me confused recently, so I took the effort to clean it up.  OK
> to commit?

As I said earlier, I don't find anything confusing on what we have there
and would strongly prefer not to change it.
Can you submit the actual testsuite change which got hidden in all the
renaming separately?

Thanks.

Jakub


Re: RFC [1/2] divmod transform

2016-06-08 Thread Richard Biener
On Fri, 3 Jun 2016, Jim Wilson wrote:

> On Mon, May 30, 2016 at 12:45 AM, Richard Biener  wrote:
> > Joseph - do you know sth about why there's not a full set of divmod
> > libfuncs in libgcc?
> 
> Because udivmoddi4 isn't a libfunc, it is a helper function for the
> div and mod libfuncs.  Since we can compute the signed div and mod
> results from udivmoddi4, there was no need to also add a signed
> version of it.  It was given a libfunc style name so that we had the
> option of making it a libfunc in the future, but that never happened.
> There was no support for calling any divmod libfunc until it was added
> as a special case to call an ARM library (not libgcc) function.  This
> happened here
> 
> 2004-08-09  Mark Mitchell  
> 
> * config.gcc (arm*-*-eabi*): New target.
> * defaults.h (TARGET_LIBGCC_FUNCS): New macro.
> (TARGET_LIB_INT_CMP_BIASED): Likewise.
> * expmed.c (expand_divmod): Try a two-valued divmod function as a
> last resort.
> ...
> * config/arm/arm.c (arm_init_libfuncs): New function.
> (arm_compute_initial_elimination_offset): Return HOST_WIDE_INT.
> (TARGET_INIT_LIBFUNCS): Define it.
> ...
> 
> Later, two ports added their own divmod libfuncs, but I don't see any
> evidence that they were ever used, since there is no support for
> calling divmod other than the expand_divmod last resort code that only
> triggers for ARM.
> 
> It is only now that Prathamesh is adding gimple support for divmod
> operations that we need to worry about getting this right, without
> breaking the existing ARM library support or the existing udivmoddi4
> support.

Ok, so as he is primarily targeting the special arm divmod libcall
I suppose we can live with special-casing libcall handling to
udivmoddi4.  It would be nice to not lie about divmod availability
as libcall though... - it looks like the libcall is also guarded
on TARGET_HAS_NO_HW_DIVIDE (unless it was available historically
like on x86).

So not sure where to go from here.

Richard.
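
[As an aside, Jim's point that the signed results can be computed from
the unsigned helper can be sketched as below.  This is an illustration
added here, not libgcc's actual implementation; the real __udivmoddi4
performs the division itself rather than using the / and % operators.]

```c
/* Stand-in for libgcc's __udivmoddi4: unsigned 64-bit division
   returning the quotient and storing the remainder.  */
static unsigned long long
udivmod64 (unsigned long long n, unsigned long long d,
           unsigned long long *rem)
{
  *rem = n % d;
  return n / d;
}

/* Signed divmod in terms of the unsigned helper: divide the
   magnitudes, then fix up the signs.  C truncates toward zero, so
   the quotient is negative iff the operand signs differ, and the
   remainder takes the sign of the dividend.  */
static void
divmod64 (long long a, long long b, long long *q, long long *r)
{
  unsigned long long ua = a < 0 ? -(unsigned long long) a : a;
  unsigned long long ub = b < 0 ? -(unsigned long long) b : b;
  unsigned long long ur;
  unsigned long long uq = udivmod64 (ua, ub, &ur);
  *q = (a < 0) != (b < 0) ? -(long long) uq : (long long) uq;
  *r = a < 0 ? -(long long) ur : (long long) ur;
}
```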


Re: [PATCH v1] Support for SPARC M7 and VIS 4.0

2016-06-08 Thread Gerald Pfeifer
On Wed, 8 Jun 2016, Eric Botcazou wrote:
> I have also attached an update for the wwwdocs module, OK to install?

Yes, definitely!

Thank you,
Gerald



Re: move increase_alignment from simple to regular ipa pass

2016-06-08 Thread Richard Biener
On Tue, 7 Jun 2016, Prathamesh Kulkarni wrote:

> On 3 June 2016 at 13:35, Jan Hubicka  wrote:
> >> > fsection-anchors
> >> > Common Report Var(flag_section_anchors)
> >> > Access data in the same section from shared anchor points.
> >>
> >> Funny.  I see the following on trunk:
> >>
> >> fsection-anchors
> >> Common Report Var(flag_section_anchors) Optimization
> >> Access data in the same section from shared anchor points.
> >
> > Aha, my local change from last year still inmy tree. Sorry.
> > Yep, having it as Optimization makes sense, but we need to be sure it works 
> > as intended.
> >>
> >> > flag_section_anchors is not declared as Optimization, so it can't be 
> >> > function
> >> > specific right now. It probably should because it is an optimization.  
> >> > This
> >> > makes me wonder what happens when one function has anchors enabled and 
> >> > another
> >> > doesn't?  Probably anchoring or not anchoring the var will then depend 
> >> > on what
> >> > function comes first in the compilation order and then we will need to 
> >> > make
> >> > the backend grok the case where a static var is anchored but 
> >> > flag_section_anchors is
> >> > off.
> >>
> >> This is because we represent the anchor with DECL_RTL, right?  Maybe
> >> DECL_RTL of globals needs to be re-computed for each function...
> >
> > I would rather anchor the variable if it is used by at least one function 
> > that is compiled
> > with anchors.  Accessing anchors is IMO no slower than accessing symbols. 
> > But I am not
> > that familiar with this code...
> >>
> >> > I dunno what is the desired behaviour for LTOing together different code
> >> > models.
> >>
> >> Good question.  There's always the choice to remove 'Optimization' and
> >> enforce same setting for all TUs we LTO in lto-wrapper.
> >
> > Yep. Not sure what is better - I did not really think of targets that use 
> > both
> > models.
> Um, I am not really sure what to do next to convert increase_alignment
> to a regular pass; I would be grateful
> for suggestions.

I think it would be nice to work towards transitioning 
flag_section_anchors to a flag on varpool nodes, thereby removing
the Optimization flag from common.opt:fsection-anchors

That would simplify the walk over varpool candidates.

Richard.

> Thanks,
> Prathamesh
> >
> > Honza
> >>
> >> Richard.
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [PING^4][PATCHv2, ARM, libgcc] New aeabi_idiv function for armv6-m

2016-06-08 Thread Andre Vieira (lists)
Ping.

On 19/05/16 11:19, Andre Vieira (lists) wrote:
> Ping for GCC-7, patch applies cleanly, passed make check for cortex-m0.
> 
> Might be worth mentioning that this patch has been used in three
> releases of the GNU ARM embedded toolchain, using GCC versions 4.9 and
> 5, and no issues have been reported so far.
> 
> On 25/01/16 17:15, Andre Vieira (lists) wrote:
>> Ping.
>>
>> On 27/10/15 17:03, Andre Vieira wrote:
>>> Ping.
>>>
>>> BR,
>>> Andre
>>>
>>> On 13/10/15 18:01, Andre Vieira wrote:
 This patch ports the aeabi_idiv routine from Linaro Cortex-Strings
 (https://git.linaro.org/toolchain/cortex-strings.git), which was
 contributed by ARM under Free BSD license.

 The new aeabi_idiv routine is used to replace the one in
 libgcc/config/arm/lib1funcs.S. This replacement happens within the
 Thumb1 wrapper. The new routine is under LGPLv3 license.

 The main advantage of this version is that it can improve the
 performance of the aeabi_idiv function for Thumb1. This solution will
 also increase the code size. So it will only be used if
 __OPTIMIZE_SIZE__ is not defined.

 Make check passed for armv6-m.

 libgcc/ChangeLog:
 2015-08-10  Hale Wang  
   Andre Vieira  

 * config/arm/lib1funcs.S: Add new wrapper.

>>
> 



Re: [PATCH] Fix SLP wrong-code with VECTOR_BOOLEAN_TYPE_P (PR tree-optimization/71259)

2016-06-08 Thread Christophe Lyon
On 8 June 2016 at 12:33, Richard Biener  wrote:
> On Wed, 8 Jun 2016, Jakub Jelinek wrote:
>
>> On Wed, Jun 08, 2016 at 12:26:17PM +0200, Richard Biener wrote:
>> > > So: should I change dg-options into dg-additional-options for all the
>> > > tests for consistency, or only on the 3 ones where it makes them pass?
>> > > (pr71259.c, vect-shift-2-big-array.c, vect-shift-2.c)
>> >
>> > I think all tests should use dg-additional-options.
>>
>> All tests in {gcc,g++}.dg/vect/, right?  I agree with that.
>
> Yes.  [and most of the vect.exp fancy-filename stuff should be replaced
> by adding dg-additional-options]
>

I've tried the attached patch (which does only dg-options ->
dg-additional-options).
For GCC, it's better, except that on arm-none-eabi qemu complains about
an illegal instruction when asked to use arm926 and GCC is configured with
the default cpu. Maybe that's because check_vect does not have the expected
behaviour? (I haven't checked yet which instruction causes that because it
will take a bit of time to reproduce the needed environment manually.)

For G++, the tests now pass with --std=c++XX instead of std=gnu++XX.

Is it OK?

  Christophe Lyon  

   * gcc.dg/vect/YYY.c: Use dg-additional-options instead of dg-options.


> Richard.
diff --git a/gcc/testsuite/g++.dg/vect/pr33834_2.cc 
b/gcc/testsuite/g++.dg/vect/pr33834_2.cc
index ecaf588..49e72d2 100644
--- a/gcc/testsuite/g++.dg/vect/pr33834_2.cc
+++ b/gcc/testsuite/g++.dg/vect/pr33834_2.cc
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -ftree-vectorize" } */
+/* { dg-additional-options "-O3 -ftree-vectorize" } */
 
 /* Testcase by Martin Michlmayr  */
 
diff --git a/gcc/testsuite/g++.dg/vect/pr33860a.cc 
b/gcc/testsuite/g++.dg/vect/pr33860a.cc
index 0e5164f..bbfdeef 100644
--- a/gcc/testsuite/g++.dg/vect/pr33860a.cc
+++ b/gcc/testsuite/g++.dg/vect/pr33860a.cc
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-Wno-psabi" { target { { i?86-*-* x86_64-*-* } && ilp32 } } } 
*/
+/* { dg-additional-options "-Wno-psabi" { target { { i?86-*-* x86_64-*-* } && 
ilp32 } } } */
 
 /* Testcase by Martin Michlmayr  */
 
diff --git a/gcc/testsuite/g++.dg/vect/pr45470-a.cc 
b/gcc/testsuite/g++.dg/vect/pr45470-a.cc
index 98ce4ca..ba5873c 100644
--- a/gcc/testsuite/g++.dg/vect/pr45470-a.cc
+++ b/gcc/testsuite/g++.dg/vect/pr45470-a.cc
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O1 -ftree-vectorize -fnon-call-exceptions" } */
+/* { dg-additional-options "-O1 -ftree-vectorize -fnon-call-exceptions" } */
 
 struct A
 {
diff --git a/gcc/testsuite/g++.dg/vect/pr45470-b.cc 
b/gcc/testsuite/g++.dg/vect/pr45470-b.cc
index 3ad66ec..ce04f8e 100644
--- a/gcc/testsuite/g++.dg/vect/pr45470-b.cc
+++ b/gcc/testsuite/g++.dg/vect/pr45470-b.cc
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O1 -ftree-vectorize -fno-vect-cost-model 
-fnon-call-exceptions" } */
+/* { dg-additional-options "-O1 -ftree-vectorize -fno-vect-cost-model 
-fnon-call-exceptions" } */
 
 template < typename _Tp > struct new_allocator
 {
diff --git a/gcc/testsuite/g++.dg/vect/pr60896.cc 
b/gcc/testsuite/g++.dg/vect/pr60896.cc
index c6ce68b..b4ff0d3 100644
--- a/gcc/testsuite/g++.dg/vect/pr60896.cc
+++ b/gcc/testsuite/g++.dg/vect/pr60896.cc
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O3" } */
+/* { dg-additional-options "-O3" } */
 
 struct A
 {
diff --git a/gcc/testsuite/gcc.dg/vect/no-tree-pre-pr45241.c 
b/gcc/testsuite/gcc.dg/vect/no-tree-pre-pr45241.c
index 54aa89b..00055b8 100644
--- a/gcc/testsuite/gcc.dg/vect/no-tree-pre-pr45241.c
+++ b/gcc/testsuite/gcc.dg/vect/no-tree-pre-pr45241.c
@@ -1,6 +1,6 @@
 /* PR tree-optimization/45241 */
 /* { dg-do compile } */
-/* { dg-options "-ftree-vectorize" } */
+/* { dg-additional-options "-ftree-vectorize" } */
 
 int
 foo (short x)
diff --git a/gcc/testsuite/gcc.dg/vect/pr18308.c 
b/gcc/testsuite/gcc.dg/vect/pr18308.c
index b71f08e..51bcc83 100644
--- a/gcc/testsuite/gcc.dg/vect/pr18308.c
+++ b/gcc/testsuite/gcc.dg/vect/pr18308.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -ftree-vectorize -funroll-loops" } */
+/* { dg-additional-options "-O -ftree-vectorize -funroll-loops" } */
 void foo();
 
 void bar(int j)
diff --git a/gcc/testsuite/gcc.dg/vect/pr24049.c 
b/gcc/testsuite/gcc.dg/vect/pr24049.c
index a7798bd..dd3e94c 100644
--- a/gcc/testsuite/gcc.dg/vect/pr24049.c
+++ b/gcc/testsuite/gcc.dg/vect/pr24049.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O1 -ftree-vectorize --param ggc-min-heapsize=0 --param 
ggc-min-expand=0" } */
+/* { dg-additional-options "-O1 -ftree-vectorize --param ggc-min-heapsize=0 
--param ggc-min-expand=0" } */
 
 void unscrunch (unsigned char *, int *);
 
diff --git a/gcc/testsuite/gcc.dg/vect/pr33373.c 
b/gcc/testsuite/gcc.dg/vect/pr33373.c
index efba2ab..7ab6223 100644
--- a/gcc/testsuite/gcc.dg/vect/pr33373.c
+++ b/gcc/testsuite/gcc.dg/vect/pr33373.c
@@ -1,4 +1,4 @@
-/* { dg-options "-Wno-shift-overflow" } */
+/* { dg-additional-options "-

Re: [PATCH] Fix SLP wrong-code with VECTOR_BOOLEAN_TYPE_P (PR tree-optimization/71259)

2016-06-08 Thread Jakub Jelinek
On Wed, Jun 08, 2016 at 04:44:00PM +0200, Christophe Lyon wrote:
> I've tried the attached patch (which does only dg-options ->
> dg-additional-options).
> For GCC, it's better, except that on arm-none-eabi qemu complains about
> an illegal instruction when asked to use arm926 and GCC is configured with
> the default cpu. Maybe that's because check_vect does not have the expected

check_vect installs a SIGILL handler and, if the insn is invalid, expects
a signal to be raised.  Is that not the case with qemu?  Or is qemu just
being too noisy?

>   Christophe Lyon  
> 
>* gcc.dg/vect/YYY.c: Use dg-additional-options instead of dg-options.

Please list all the changed tests in the ChangeLog (with : Likewise. for
all but the first one).

Ok with that change.

Jakub


Re: [PATCH AArch64]Support missing vcond pattern by adding/using vec_cmp/vcond_mask patterns.

2016-06-08 Thread Bin Cheng
> From: James Greenhalgh 
> Sent: 31 May 2016 16:24
> To: Bin Cheng
> Cc: gcc-patches@gcc.gnu.org; nd
> Subject: Re: [PATCH AArch64]Support missing vcond pattern by adding/using 
> vec_cmp/vcond_mask patterns.
> 
> On Tue, May 17, 2016 at 09:02:22AM +, Bin Cheng wrote:
> > Hi,
> > Alan and Renlin noticed that some vcond patterns are not supported in
> > AArch64(or AArch32?) backend, and they both had some patches fixing this.
> > After investigation, I agree with them that vcond/vcondu in AArch64's 
> > backend
> > should be re-implemented using vec_cmp/vcond_mask patterns, so here comes
> > this patch which is based on Alan's.  This patch supports all vcond/vcondu
> > patterns by implementing/using vec_cmp and vcond_mask patterns.  Different 
> > to
> > the original patch, it doesn't change GCC's expanding process, and it keeps
> > vcond patterns.  The patch also introduces vec_cmp*_internal to support
> > special case optimization for vcond/vcondu which current implementation 
> > does.
> > Apart from Alan's patch, I also learned ideas from Renlin's, and it is my
> > change that shall be blamed if any potential bug is introduced.
> > 
> > With this patch, GCC's test condition "vect_cond_mixed" can be enabled on
> > AArch64 (in a following patch).  Bootstrap and test on AArch64.  Is it OK?
> > BTW, this patch is necessary for gcc.dg/vect/PR56541.c (on AArch64) which 
> > was
> > added before in tree if-conversion patch.
> 
> Splitting this patch would have been very helpful. One patch each for the
> new standard pattern names, and one patch for the refactor of vcond. As
> it is, this patch is rather difficult to read.
Done, patch split into two with one implementing new vcond_mask&vec_cmp 
patterns, and another re-writing vcond patterns.

> 
> > diff --git a/gcc/config/aarch64/aarch64-simd.md 
> > b/gcc/config/aarch64/aarch64-simd.md
> > index bd73bce..f51473a 100644
> > --- a/gcc/config/aarch64/aarch64-simd.md
> > +++ b/gcc/config/aarch64/aarch64-simd.md
> > @@ -1053,7 +1053,7 @@
> >  }
> >  
> >cmp_fmt = gen_rtx_fmt_ee (cmp_operator, V2DImode, operands[1], 
> > operands[2]);
> > -  emit_insn (gen_aarch64_vcond_internalv2div2di (operands[0], operands[1],
> > +  emit_insn (gen_vcondv2div2di (operands[0], operands[1],
> >operands[2], cmp_fmt, operands[1], operands[2]));
> >DONE;
> >  })
> > @@ -2225,204 +2225,215 @@
> >DONE;
> >  })
> >  
> > -(define_expand "aarch64_vcond_internal"
> > +(define_expand "vcond_mask_"
> > +  [(match_operand:VALLDI 0 "register_operand")
> > +   (match_operand:VALLDI 1 "nonmemory_operand")
> > +   (match_operand:VALLDI 2 "nonmemory_operand")
> > +   (match_operand: 3 "register_operand")]
> > +  "TARGET_SIMD"
> > +{
> > +  /* If we have (a = (P) ? -1 : 0);
> > + Then we can simply move the generated mask (result must be int).  */
> > +  if (operands[1] == CONSTM1_RTX (mode)
> > +  && operands[2] == CONST0_RTX (mode))
> > +emit_move_insn (operands[0], operands[3]);
> > +  /* Similarly, (a = (P) ? 0 : -1) is just inverting the generated mask.  
> > */
> > +  else if (operands[1] == CONST0_RTX (mode)
> > +&& operands[2] == CONSTM1_RTX (mode))
> > +emit_insn (gen_one_cmpl2 (operands[0], operands[3]));
> > +  else
> > +{
> > +  if (!REG_P (operands[1]))
> > + operands[1] = force_reg (mode, operands[1]);
> > +  if (!REG_P (operands[2]))
> > + operands[2] = force_reg (mode, operands[2]);
> > +  emit_insn (gen_aarch64_simd_bsl (operands[0], operands[3],
> > +  operands[1], operands[2]));
> > +}
> > +
> > +  DONE;
> > +})
> > +
> 
> This pattern is fine.
> 
> > +;; Patterns comparing two vectors to produce a mask.
> 
> This comment is insufficient. The logic in vec_cmp_internal
> does not always return the expected mask (in particular for NE), but this
> is not made clear in the comment.
Comments added to various places to make the code easier to understand and 
maintain.

> 
> > +
> > +(define_expand "vec_cmp_internal"
> >[(set (match_operand:VSDQ_I_DI 0 "register_operand")
> > - (if_then_else:VSDQ_I_DI
> > -   (match_operator 3 "comparison_operator"
> > - [(match_operand:VSDQ_I_DI 4 "register_operand")
> > -  (match_operand:VSDQ_I_DI 5 "nonmemory_operand")])
> > -   (match_operand:VSDQ_I_DI 1 "nonmemory_operand")
> > -   (match_operand:VSDQ_I_DI 2 "nonmemory_operand")))]
> > +   (match_operator 1 "comparison_operator"
> > + [(match_operand:VSDQ_I_DI 2 "register_operand")
> > +  (match_operand:VSDQ_I_DI 3 "nonmemory_operand")]))]
> >"TARGET_SIMD"
> 
> 
> 
> > +(define_expand "vec_cmp"
> > +  [(set (match_operand:VSDQ_I_DI 0 "register_operand")
> > +   (match_operator 1 "comparison_operator"
> > + [(match_operand:VSDQ_I_DI 2 "register_operand")
> > +  (match_operand:VSDQ_I_DI 3 "nonmemory_operand")]))]
> > +  "TARGET_SIMD"
> > +{
> > +  enum rtx_code code = GET_CODE (operands[1]

[gomp4.5] !$omp declare target changes

2016-06-08 Thread Jakub Jelinek
Hi!

I've committed following patch to implement OpenMP 4.5 declare target
construct.

In addition, I've fixed omp declare simd handling, which in free form
would incorrectly accept
SUBROUTINE FOO(A)
!$OMP DECLARE SIMDLINEAR(A)
  INTEGER :: A
END SUBROUTINE
(no space between SIMD and following clause name).

Tested on x86_64-linux, committed to gomp-4_5-branch.

2016-06-08  Jakub Jelinek  

* gfortran.h (symbol_attribute): Add omp_declare_target_link bitfield.
(struct gfc_omp_namelist): Add u.common field.
(struct gfc_common_head): Change omp_declare_target into bitfield.
Add omp_declare_target_link bitfield.
(gfc_add_omp_declare_target_link): New prototype.
* openmp.c (gfc_match_omp_to_link): New function.
(gfc_match_omp_clauses): Use it for to and link clauses in declare
target construct.
(OMP_DECLARE_TARGET_CLAUSES): Define.
(gfc_match_omp_declare_target): Rewritten for OpenMP 4.5.
* symbol.c (check_conflict): Handle omp_declare_target_link.
(gfc_add_omp_declare_target_link): New function.
(gfc_copy_attr): Copy omp_declare_target_link.
* module.c (enum ab_attribute): Add AB_OMP_DECLARE_TARGET_LINK.
(attr_bits): Add AB_OMP_DECLARE_TARGET_LINK entry.
(mio_symbol_attribute): Save and restore omp_declare_target_link bit.
* f95-lang.c (gfc_attribute_table): Add "omp declare target link".
* trans-decl.c (add_attributes_to_decl): Add "omp declare target link"
instead of "omp declare target" for omp_declare_target_link.
* trans-common.c (build_common_decl): Likewise.

* openmp.c (gfc_match_omp_declare_simd): If not using the form with
(proc-name), require space before first clause.
testsuite/
* gfortran.dg/gomp/declare-target-1.f90: New test.
* gfortran.dg/gomp/declare-target-2.f90: New test.

--- gcc/fortran/gfortran.h.jj   2016-05-25 18:23:54.0 +0200
+++ gcc/fortran/gfortran.h  2016-06-07 15:29:18.170184003 +0200
@@ -849,6 +849,7 @@ typedef struct
 
   /* Mentioned in OMP DECLARE TARGET.  */
   unsigned omp_declare_target:1;
+  unsigned omp_declare_target_link:1;
 
   /* Mentioned in OACC DECLARE.  */
   unsigned oacc_declare_create:1;
@@ -1157,6 +1158,7 @@ typedef struct gfc_omp_namelist
   gfc_omp_depend_op depend_op;
   gfc_omp_map_op map_op;
   gfc_omp_linear_op linear_op;
+  struct gfc_common_head *common;
 } u;
   struct gfc_omp_namelist_udr *udr;
   struct gfc_omp_namelist *next;
@@ -1561,7 +1563,9 @@ struct gfc_undo_change_set
 typedef struct gfc_common_head
 {
   locus where;
-  char use_assoc, saved, threadprivate, omp_declare_target;
+  char use_assoc, saved, threadprivate;
+  unsigned char omp_declare_target : 1;
+  unsigned char omp_declare_target_link : 1;
   char name[GFC_MAX_SYMBOL_LEN + 1];
   struct gfc_symbol *head;
   const char* binding_label;
@@ -2840,6 +2844,8 @@ bool gfc_add_result (symbol_attribute *,
 bool gfc_add_save (symbol_attribute *, save_state, const char *, locus *);
 bool gfc_add_threadprivate (symbol_attribute *, const char *, locus *);
 bool gfc_add_omp_declare_target (symbol_attribute *, const char *, locus *);
+bool gfc_add_omp_declare_target_link (symbol_attribute *, const char *,
+ locus *);
 bool gfc_add_saved_common (symbol_attribute *, locus *);
 bool gfc_add_target (symbol_attribute *, locus *);
 bool gfc_add_dummy (symbol_attribute *, const char *, locus *);
--- gcc/fortran/openmp.c.jj 2016-05-31 19:33:55.0 +0200
+++ gcc/fortran/openmp.c2016-06-08 16:10:55.309586149 +0200
@@ -340,6 +340,96 @@ cleanup:
   return MATCH_ERROR;
 }
 
+/* Match a variable/procedure/common block list and construct a namelist
+   from it.  */
+
+static match
+gfc_match_omp_to_link (const char *str, gfc_omp_namelist **list)
+{
+  gfc_omp_namelist *head, *tail, *p;
+  locus old_loc, cur_loc;
+  char n[GFC_MAX_SYMBOL_LEN+1];
+  gfc_symbol *sym;
+  match m;
+  gfc_symtree *st;
+
+  head = tail = NULL;
+
+  old_loc = gfc_current_locus;
+
+  m = gfc_match (str);
+  if (m != MATCH_YES)
+return m;
+
+  for (;;)
+{
+  cur_loc = gfc_current_locus;
+  m = gfc_match_symbol (&sym, 1);
+  switch (m)
+   {
+   case MATCH_YES:
+ p = gfc_get_omp_namelist ();
+ if (head == NULL)
+   head = tail = p;
+ else
+   {
+ tail->next = p;
+ tail = tail->next;
+   }
+ tail->sym = sym;
+ tail->where = cur_loc;
+ goto next_item;
+   case MATCH_NO:
+ break;
+   case MATCH_ERROR:
+ goto cleanup;
+   }
+
+  m = gfc_match (" / %n /", n);
+  if (m == MATCH_ERROR)
+   goto cleanup;
+  if (m == MATCH_NO)
+   goto syntax;
+
+  st = gfc_find_symtree (gfc_current_ns->common_root, n);
+  if (st == NULL)
+   {
+ gfc_error ("COMMON block /%s/ not found at

Re: [PATCH 3/N] Add sorting support to analyze_brprob script

2016-06-08 Thread Jan Hubicka
> Hello.
> 
> This is a small followup, where I would like to add new argument to 
> analyze_brprob.py
> script file. With the patch, one can sort predictors by e.g. hitrate:

OK,
thanks!

Honza


Re: [PATCH 4/N] Add new analyze_brprob_spec.py script

2016-06-08 Thread Jan Hubicka
> Hi.
> 
> The second follow up patch adds new script which is a simple wrapper around
> analyze_brprob.py and can be used to dump statistics for results that are in
> different folder (like SPEC benchmarks).
> 
> Sample:
> ./contrib/analyze_brprob_spec.py --sorting=hitrate 
> /home/marxin/Programming/cpu2006/benchspec/CPU2006/
> 
> Sample output:
> 401.bzip2
> HEURISTICS   BRANCHES  (REL)  HITRATE
> COVERAGE COVERAGE  (REL)
> no prediction 107  14.0%  19.45% /  84.37% 
> 27681347332.77G  10.8%
> opcode values nonequal (on trees)  76  10.0%  32.24% /  85.06% 
> 40346813444.03G  15.8%
> call   95  12.5%  45.50% /  93.31%  
> 152224913  152.22M   0.6%
> DS theory 275  36.1%  45.56% /  84.30% 
> 73088639047.31G  28.6%
> continue   14   1.8%  48.44% /  73.14% 
> 14797749961.48G   5.8%
> guessed loop iterations12   1.6%  68.30% /  71.61%  
> 269705737  269.71M   1.1%
> combined  762 100.0%  69.52% /  89.32%
> 25553311262   25.55G 100.0%
> goto   40   5.2%  72.41% /  98.80%  
> 882062676  882.06M   3.5%
> opcode values positive (on trees)  40   5.2%  76.74% /  88.09% 
> 13941049261.39G   5.5%
> pointer (on trees) 61   8.0%  83.79% / 100.00%
>  931107  931.11K   0.0%
> early return (on trees)31   4.1%  84.39% /  84.41% 
> 25480584022.55G  10.0%
> first match   380  49.9%  89.79% /  92.57%
> 15476312625   15.48G  60.6%
> loop exit 316  41.5%  90.09% /  92.88%
> 15065219828   15.07G  59.0%
> guess loop iv compare   2   0.3%  99.61% /  99.61%   
> 26987995   26.99M   0.1%
> loop iv compare 1   0.1%  99.61% /  99.61%
>  105411  105.41K   0.0%
> loop iterations38   5.0%  99.64% /  99.64%  
> 140236649  140.24M   0.5%
> null return 2   0.3% 100.00% / 100.00%
>  1818.00   0.0%
> noreturn call  13   1.7% 100.00% / 100.00%
> 10450001.04M   0.0%
> const return2   0.3% 100.00% / 100.00%
> 816   816.00   0.0%
> negative return62   8.1% 100.00% / 100.00%  
> 618097152  618.10M   2.4%
> 
> 410.bwaves
> HEURISTICS   BRANCHES  (REL)  HITRATE
> COVERAGE COVERAGE  (REL)
> call1   0.6%   0.00% / 100.00%
>  2020.00   0.0%
> no prediction   6   3.7%   0.28% /  99.72%
> 27041842.70M   0.1%
> opcode values nonequal (on trees)   4   2.4%  60.00% /  70.00%
> 200   200.00   0.0%
> loop iterations 7   4.3%  80.00% /  80.00%  
> 112892000  112.89M   2.4%
> first match83  50.6%  81.67% /  81.67% 
> 43938854654.39G  92.1%
> loop exit  76  46.3%  81.71% /  81.71% 
> 42809934654.28G  89.8%
> combined  164 100.0%  83.05% /  83.11% 
> 47685455074.77G 100.0%
> DS theory  75  45.7% 100.00% / 100.00%  
> 371955858  371.96M   7.8%
> early return (on trees) 3   1.8% 100.00% / 100.00%
> 688   688.00   0.0%
> opcode values positive (on trees)  71  43.3% 100.00% / 100.00%  
> 371955658  371.96M   7.8%
> 
> ...
> 
> Thanks,
> Martin
> 

> >From ca9806bf77bd90df43913f5f1552ed16379dcf38 Mon Sep 17 00:00:00 2001
> From: marxin 
> Date: Fri, 3 Jun 2016 12:46:43 +0200
> Subject: [PATCH 4/4] Add new analyze_brprob_spec.py script
> 
> contrib/ChangeLog:
> 
> 2016-06-08  Martin Liska  
> 
>   * analyze_brprob_spec.py: New file.

OK,
thanks

Honza
> ---
>  contrib/analyze_brprob_spec.py | 58 
> ++
>  1 file changed, 58 insertions(+)
>  create mode 100755 contrib/analyze_brprob_spec.py
> 
> diff --git a/contrib/analyze_brprob_spec.py b/contrib/analyze_brprob_spec.py
> new file mode 100755
> index 000..a28eaac
> --- /dev/null
> +++ b/contrib/analyze_brprob_spec.py
> @@ -0,0 +1,58 @@
> +#!/usr/bin/env python3
> +
> +# This file is part of GCC.
> +#
> +# GCC is free software; you can redistribute it and/or modify it under
> +# the terms of the GNU General Public License as published by the Free
> +# Software Foundation; either version 3, or (at your option) any later
> +# version.
> +#
> +# GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +# WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +# FITNESS

Re: move increase_alignment from simple to regular ipa pass

2016-06-08 Thread Jan Hubicka
> I think it would be nice to work towards transitioning 
> flag_section_anchors to a flag on varpool nodes, thereby removing
> the Optimization flag from common.opt:fsection-anchors
> 
> That would simplify the walk over varpool candidates.

Makes sense to me, too. There are more candidates for stuff that should be
variable-specific in common.opt (such as variable alignment, -fdata-sections,
-fmerge-constants) and targets.  We may try to do it in an easy-to-extend way
so that incrementally we can get rid of those global flags, too.

One thing that needs to be done for LTO is sane merging; I guess in this case
it is clear that the variable should be anchored when its prevailing definition
is.

Honza
> 
> Richard.
> 
> > Thanks,
> > Prathamesh
> > >
> > > Honza
> > >>
> > >> Richard.
> > 
> > 
> 
> -- 
> Richard Biener 
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
> 21284 (AG Nuernberg)


Re: [PATCH] Fold x/x to 1, 0/x to 0 and 0%x to 0 consistently

2016-06-08 Thread Marc Glisse

On Wed, 8 Jun 2016, Richard Biener wrote:


The following works around PR70992 but the issue came up repeatedly
that we are not very consistent in preserving the undefined behavior
of division or modulo by zero.  Ok - the only inconsistency is
that we fold 0 % x to 0 but not 0 % 0 (with literal zero).

After folding is now no longer done early in the C family FEs the
number of diagnostic regressions with the patch below is two.

FAIL: g++.dg/cpp1y/constexpr-sfinae.C  -std=c++14 (test for excess errors)
FAIL: gcc.dg/wcaselabel-1.c  (test for errors, line 10)

And then there is a -fnon-call-exceptions testcase

FAIL: gcc.c-torture/execute/20101011-1.c   -O1  execution test
FAIL: gcc.c-torture/execute/20101011-1.c   -O2  execution test
FAIL: gcc.c-torture/execute/20101011-1.c   -O2 -flto
-fno-use-linker-plugin -flto-partition=none  execution test
FAIL: gcc.c-torture/execute/20101011-1.c   -O2 -flto -fuse-linker-plugin
-fno-fat-lto-objects  execution test
FAIL: gcc.c-torture/execute/20101011-1.c   -O3 -g  execution test
FAIL: gcc.c-torture/execute/20101011-1.c   -Os  execution test

which tests that 0/0 traps (on targets where it does).  This shows
we might want to guard the simplifications against -fnon-call-exceptions.

The other way to fix the inconsistency is of course to not rely
on undefinedness in 0 % x simplification and disable that if x
is not known to be nonzero.  We can introduce the other transforms
with properly guarding against a zero 2nd operand as well.

So - any opinions here?


Note that currently, VRP optimizes 0/0 to 0 (but not 0%0) when we don't 
pass -fnon-call-exceptions.


If we guard with !flag_non_call_exceptions || tree_expr_nonzero_p(...), 
the transformation seems safe for the middle-end, so the main issue is 
front-ends.


A few random ideas I was considering:
* restrict it to GIMPLE, so we can't have a regression in the front-ends.
* fold x/0 to 0 with TREE_OVERFLOW set, to tell the front-end that 
something is going on.
* fold to (x/y,0) or (x/y,1) so the division by 0 is still there, but 
C++11 constexpr might give a strange message about it, and folding might 
not be idempotent.


But as long as we don't always perform the simplification, we need some 
other way to break the cycle for PR70992.


--
Marc Glisse


Re: [PATCH 0/9] separate shrink-wrapping

2016-06-08 Thread Segher Boessenkool
On Wed, Jun 08, 2016 at 01:55:55PM +0200, Bernd Schmidt wrote:
> On 06/08/2016 03:47 AM, Segher Boessenkool wrote:
> >This patch series introduces separate shrink-wrapping.
> [...]
> >The next six patches are to prevent later passes from mishandling the
> >epilogue instructions that now appear before the epilogue: mostly, you
> >cannot do much to instructions with a REG_CFA_RESTORE note without
> >confusing dwarf2cfi.  The cprop one is for prologue instructions.
> 
> I'll need a while to sort out my thoughts about this. On the whole I 
> like having the ability to do this, but I'm worried about the fragility 
> it introduces in passes after shrink-wrapping.

On the plus side I should have caught most of it now.  And the failures
are rarely silent, they show up during compilation already.

Most of the problems are code changes the later passes want to do that
are valid in themselves but that dwarf2cfi does not like, such as not
restoring a callee-saved register before a noreturn call.  Those later
passes already know not to touch epilogue instructions, but only for the
single epilogue, not for instructions scattered throughout the whole function.

> Ideally we'd need an ix86 implementation for test coverage reasons.

Yes, but someone who knows the x86 backend well will have to write that.

> Is the usage of the word "concern" here standard for this kind of thing? 
> It seems odd somehow but maybe that's just me.

There is no standard naming for this as far as I know.  I'll gladly
use a better name anyone comes up with.


Segher


Re: libgomp: Unconfuse offload plugins vs. offload targets

2016-06-08 Thread Thomas Schwinge
Hi!

On Wed, 8 Jun 2016 16:08:38 +0200, Jakub Jelinek  wrote:
> On Wed, Jun 08, 2016 at 03:27:44PM +0200, Thomas Schwinge wrote:
> > This got me confused recently, so I took the effort to clean it up.  OK
> > to commit?
> 
> As I said earlier, I don't find anything confusing on what we have there
> and would strongly prefer not to change it.

Please explain why you are rejecting clean-up patches that make the code
(variable names) actually match its semantics and make it easier for the
reader?

> Can you submit the actual testsuite change which got hidden in all the
> renaming separately?
> 
> Thanks.

I submitted that more than a month ago, and pinged it thrice,
.
As you can see, there actually is a difference between offload_plugins
and offload_targets (for example, "intelmic"
vs. "x86_64-intelmicemul-linux-gnu"), and I'm using both variables -- to
avoid having to translate the more specific
"x86_64-intelmicemul-linux-gnu" (which we required in the test harness)
into the less specific "intelmic" (for plugin loading) in
libgomp/target.c.  I can do that, so that we can continue to use just a
single offload_targets variable, but I consider that a less elegant
solution.


Grüße
 Thomas


Re: [PATCH][SPARC] Fix cpu auto-detection in M7 and S7 (Sonoma)

2016-06-08 Thread Jose E. Marchesi

> Starting with the M7 we will be using the same identifiers to identify
> the cpu in both Solaris (via kstat) and GNU/Linux (via /proc/cpuinfo).
> This little patch fixes the SPARC M7 entry in cpu_names, and also adds
> an entry for the Sonoma SoC.
> 
> Tested in sparc64-*-linux-gnu, sparcv9-*-linux-gnu and
> sparc-sun-solaris2.11 targets.
> 
> 2016-06-08  Jose E. Marchesi  
> 
>   * config/sparc/driver-sparc.c (cpu_names): Fix the entry for the
>   SPARC-M7 and add an entry for SPARC-S7 cpus (Sonoma).

OK, but it needs to be applied both on mainline and 6 branch.

No problem.  Do I need the approval of a maintainer in particular for
commits in the 6 branch?


Re: [PATCH][3/3][RTL ifcvt] PR middle-end/37780: Conditional expression with __builtin_clz() should be optimized out

2016-06-08 Thread Kyrill Tkachov


On 07/06/16 20:34, Christophe Lyon wrote:

On 26 May 2016 at 11:53, Kyrill Tkachov  wrote:

Hi all,

In this PR we want to optimise:
int foo (int i)
{
   return (i == 0) ? N : __builtin_clz (i);
}

on targets where CLZ is defined at zero to the constant 'N'.
This is determined at the RTL level through the CLZ_DEFINED_VALUE_AT_ZERO
macro.
The obvious place to implement this would be in combine through simplify-rtx
where we'd
recognise an IF_THEN_ELSE of the form:
(set (reg:SI r1)
  (if_then_else:SI (ne (reg:SI r2)
   (const_int 0 [0]))
(clz:SI (reg:SI r2))
(const_int 32)))

and if CLZ_DEFINED_VALUE_AT_ZERO is defined to 32 for SImode we'd simplify
it into
just (clz:SI (reg:SI r2)).
However, I found this doesn't quite happen for a couple of reasons:
1) This depends on ifcvt or some other pass to have created a conditional
move of the
two branches that provide the IF_THEN_ELSE to propagate the const_int and
clz operation into.

2) Combine will refuse to propagate r2 from the above example into both the
condition and the
CLZ at the same time, so the most we see is:
(set (reg:SI r1)
  (if_then_else:SI (ne (reg:CC cc)
 (const_int 0))
(clz:SI (reg:SI r2))
(const_int 32)))

which is not enough information to perform the simplification.

This patch implements the optimisation in ce1 using the noce ifcvt
framework.
During ifcvt noce_process_if_block can see that we're trying to optimise
something
of the form (x == 0 ? const_int : CLZ (x)) and so it has visibility of all
the information
needed to perform the transformation.

The transformation is performed by adding a new noce_try* function that
tries to put the
condition and the 'then' and 'else' arms into an IF_THEN_ELSE rtx and try to
simplify that
using the simplify-rtx machinery. That way, we can implement the
simplification logic in
simplify-rtx.c where it belongs.

A similar transformation for CTZ is implemented as well.
So for code:
int foo (int i)
{
   return (i == 0) ? 32 : __builtin_clz (i);
}

On aarch64 we now emit:
foo:
 clz w0, w0
 ret

instead of:
foo:
 mov w1, 32
 clz w2, w0
 cmp w0, 0
 csel    w0, w2, w1, ne
 ret

and for arm similarly we generate:
foo:
 clz r0, r0
 bx  lr

instead of:
foo:
 cmp r0, #0
 clzne   r0, r0
 moveq   r0, #32
 bx  lr


and for x86_64 with -O2 -mlzcnt we generate:
foo:
 xorl    %eax, %eax
 lzcntl  %edi, %eax
 ret

instead of:
foo:
 xorl    %eax, %eax
 movl    $32, %edx
 lzcntl  %edi, %eax
 testl   %edi, %edi
 cmove   %edx, %eax
 ret


I tried getting this to work on other targets as well, but encountered
difficulties.
For example on powerpc the two arms of the condition seen during ifcvt are:

(insn 4 22 11 4 (set (reg:DI 156 [  ])
 (const_int 32 [0x20])) clz.c:3 434 {*movdi_internal64}
  (nil))
and
(insn 10 9 23 3 (set (subreg/s/u:SI (reg:DI 156 [  ]) 0)
 (clz:SI (subreg/u:SI (reg/v:DI 157 [ i ]) 0))) clz.c:3 132 {clzsi2}
  (expr_list:REG_DEAD (reg/v:DI 157 [ i ])
 (nil)))

So the setup code in noce_process_if_block sees that the set destination is
not the same
((reg:DI 156 [  ]) and (subreg/s/u:SI (reg:DI 156 [  ]) 0))
so it bails out on the rtx_interchangeable_p (x, SET_DEST (set_b)) check.
I suppose that's a consequence of how SImode operations are represented in
early RTL
on powerpc, I don't know what to do there. Perhaps that part of ivcvt can be
taught to handle
destinations that are subregs of one another, but that would be a separate
patch.

Anyway, is this patch ok for trunk?

Bootstrapped and tested on arm-none-linux-gnueabihf, aarch64-none-linux-gnu,
x86_64-pc-linux-gnu.

Thanks,
Kyrill

2016-05-26  Kyrylo Tkachov  

 PR middle-end/37780
 * ifcvt.c (noce_try_ifelse_collapse): New function.
 Declare prototype.
 (noce_process_if_block): Call noce_try_ifelse_collapse.
 * simplify-rtx.c (simplify_cond_clz_ctz): New function.
 (simplify_ternary_operation): Use the above to simplify
 conditional CLZ/CTZ expressions.

2016-05-26  Kyrylo Tkachov  

 PR middle-end/37780
 * gcc.c-torture/execute/pr37780.c: New test.
 * gcc.target/aarch64/pr37780_1.c: Likewise.
 * gcc.target/arm/pr37780_1.c: Likewise.

Hi Kyrylo,


Hi Christophe,


I've noticed that gcc.target/arm/pr37780_1.c fails on
arm if arch < v6.
I first tried to fix the effective-target guard (IIRC, the doc
says clz is available starting with v5t), but that isn't sufficient.

When compiling for armv5-t, the scan-assembler directives
fail. It seems to work with v6t2, so I am wondering whether
it's just a matter of increasing the effective-target arch version,
or if you really intended to make the test pass on these old
architectures?


I've dug into it a bit.
I think the problem is that CLZ is available with ARMv

Re: [PATCH 0/9] separate shrink-wrapping

2016-06-08 Thread Bernd Schmidt

On 06/08/2016 05:16 PM, Segher Boessenkool wrote:

On the plus side I should have caught most of it now.  And the failures
are rarely silent, they show up during compilation already.


That does count as a plus. Aborts in dwarf2cfi, I assume.


Most of the problems are code changes the later passes want to do that
are valid in themselves but that dwarf2cfi does not like, like not restoring
a callee-saved register before a noreturn call.  Those later patches
already know not to touch epilogue instructions, but only for the single
epilogue, not for instructions scattered throughout the whole function.


Yeah, that's a problem though - having to disable otherwise valid 
transformations is always a source of errors.


Is there a strong reason to keep thread_p_e_insns at its current 
position in the compilation process, or could it be moved later to 
expose this problem to fewer passes?



There is no standard naming for this as far as I know.  I'll gladly
use a better name anyone comes up with.


Maybe just subpart?


Bernd


[Patch, lra] PR70751, correct the cost for spilling non-pseudo into memory

2016-06-08 Thread Jiong Wang

As discussed on the PR

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70751,

here is the patch.

For this particular failure on arm, *arm_movsi_insn has the following operand
constraints:
  
  operand 0: "=rk,r,r,r,rk,m"

  operand 1: "rk, I,K,j,mi,rk"

GCC won't explicitly refuse an unmatched CT_MEMORY operand (r235184) if it
comes from substitution, so alternative (alt) 4 got a chance to compete with
alt 0, and eventually became the winner, as it has rld_nregs=0 while alt 0
has rld_nregs=1.

I feel it's OK to give alt 4 a chance here, but we should calculate the cost
correctly.

For alt 4, it should be treated as a spill into memory, but currently LRA
only recognizes a spill for a pseudo register, while the spilled rtx for
alt 4 is a plus after equiv substitution.

 (plus:SI (reg/f:SI 102 sfp)
(const_int 4 [0x4]))

This patch thus lets lra-constraints cost the spill of a non-pseudo as well
and fixes the regression.

x86_64/aarch64/arm bootstrap and regression OK.
arm bootstrapped cc1 is about 0.3% smaller in code size.

OK for trunk?

gcc/
PR rtl-optimization/70751
* lra-constraints.c (process_alt_operands): Recognize non-pseudo
spilled into memory.

diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
index e4e6c8c..8f2db87 100644
--- a/gcc/lra-constraints.c
+++ b/gcc/lra-constraints.c
@@ -2474,14 +2474,29 @@ process_alt_operands (int only_alternative)
 	  /* We are trying to spill pseudo into memory.  It is
 		 usually more costly than moving to a hard register
 		 although it might takes the same number of
-		 reloads.  */
-	  if (no_regs_p && REG_P (op) && hard_regno[nop] >= 0)
+		 reloads.
+
+		 PR 70751, non-pseudo spill may happen also.  For example, if an
+		 operand comes from equiv substitution, then we won't reject it
+		 if it's an unmatched CT_MEMORY in the above code (see r235184).
+		 Suppose a target allows both register and memory in the
+		 operand constraint alternatives, then it's typical that an
+		 eliminable register has a substitution of "base + offset" which
+		 can either be reloaded by a simple "new_reg <= base + offset"
+		 which will match the register constraint, or a similar reg
+		 addition followed by further spill to and reload from memory
+		 which will match the memory constraint, but this memory spill
+		 will be much more costly usually.
+
+		 Code below increases the reject for both pseudo and non-pseudo
+		 spill.  */
+	  if (no_regs_p && !(REG_P (op) && hard_regno[nop] < 0))
 		{
 		  if (lra_dump_file != NULL)
 		fprintf
 		  (lra_dump_file,
-		   "%d Spill pseudo into memory: reject+=3\n",
-		   nop);
+		   "%d Spill %spseudo into memory: reject+=3\n",
+		   nop, REG_P (op) ? "" : "Non-");
 		  reject += 3;
 		  if (VECTOR_MODE_P (mode))
 		{


Re: [PATCH 1/2][AArch64] Implement AAPCS64 updates for alignment attribute

2016-06-08 Thread James Greenhalgh
On Tue, Jun 07, 2016 at 12:07:03PM +0100, James Greenhalgh wrote:
> On Fri, Jan 22, 2016 at 05:16:00PM +, Alan Lawrence wrote:
> > 
> > On 21/01/16 17:23, Alan Lawrence wrote:
> > > On 18/01/16 17:10, Eric Botcazou wrote:
> > >>
> > >> Could you post the list of files that differ?  How do they differ 
> > >> exactly?
> > >
> > > Hmmm. Well, I definitely had this failing to bootstrap once. I repeated 
> > > that, to
> > > try to identify exactly what the differences wereand it succeeded 
> > > even with
> > > my pre-AAPCS64-update host compiler. So, this is probably a false alarm; 
> > > I'm
> > > bootstrapping again, after a rebase, to make sure...
> > >
> > > --Alan
> > 
> > Ok, rebased onto a more recent build, and bootstrapping with Ada posed no
> > problems. Sorry for the noise.
> > 
> > However, I had to drop the assert that TYPE_FIELDS was non-null because of 
> > some
> > C++ testcases.
> > 
> > Is this version OK for trunk?
> 
> Now that we're in GCC7, this version of the patch is OK for trunk.
> 
> From my reading of Richard's AAPCS update, this patch implements the
> rules as required.
> 
> I'll give this a day for any last minute comments from Richard/Marcus,
> then commit this on your behalf tomorrow.

I've now committed this on Alan's behalf as revisions r237224 (this patch)
and r237225 (the tests) respectively.

Thanks,
James



Re: [PATCH 0/9] separate shrink-wrapping

2016-06-08 Thread Segher Boessenkool
On Wed, Jun 08, 2016 at 06:43:23PM +0200, Bernd Schmidt wrote:
> On 06/08/2016 05:16 PM, Segher Boessenkool wrote:
> >On the plus side I should have caught most of it now.  And the failures
> >are rarely silent, they show up during compilation already.
> 
> That does count as a plus. Aborts in dwarf2cfi, I assume.

Yeah.

> >Most of the problems are code changes the later passes want to do that
> >are valid in themselves but that dwarf2cfi does not like, like not restoring
> >a callee-saved register before a noreturn call.  Those later patches
> >already know not to touch epilogue instructions, but only for the single
> >epilogue, not for instructions scattered throughout the whole function.
> 
> Yeah, that's a problem though - having to disable otherwise valid 
> transformations is always a source of errors.

Ideally we would be able to e.g. dead-code delete frame restores and not
have dwarf2cfi throw fits.  I'm not certain we can always express that
in the CFI though (two paths joining at the same block, having different
locations for the saved registers -- one restored and one not).  Maybe
we could just say a register is restored even when it's not, but that
seems very fragile.

One thing I should try is put a USE of the saved registers at such
exits, maybe that helps those passes that now delete frame restores
to not do that.

> Is there a strong reason to keep thread_p_e_insns at its current 
> position in the compilation process, or could it be moved later to 
> expose this problem to fewer passes?

peephole2, bbro, split4/5, sched2 should stay later.  It seems reasonable
we still want some dce/dse there.  rnreg needs to be late, too; maybe
not that late though.  Dunno about cprop_hardreg, but I guess it wants to
be after peep2.

> >There is no standard naming for this as far as I know.  I'll gladly
> >use a better name anyone comes up with.
> 
> Maybe just subpart?

That is maybe just a bit too generic.  Naming, such a hard problem :-)


Segher


Re: [PATCH] Fold x/x to 1, 0/x to 0 and 0%x to 0 consistently

2016-06-08 Thread Jason Merrill
On Wed, Jun 8, 2016 at 11:16 AM, Marc Glisse  wrote:
> On Wed, 8 Jun 2016, Richard Biener wrote:
>
>> The following works around PR70992 but the issue came up repeatedly
>> that we are not very consistent in preserving the undefined behavior
>> of division or modulo by zero.  Ok - the only inconsistency is
>> that we fold 0 % x to 0 but not 0 % 0 (with literal zero).
>>
>> After folding is now no longer done early in the C family FEs the
>> number of diagnostic regressions with the patch below is two.
>>
>> FAIL: g++.dg/cpp1y/constexpr-sfinae.C  -std=c++14 (test for excess errors)

Yep. We don't want to fold away undefined behavior in a constexpr
function, since constexpr evaluation wants to detect undefined
behavior and treat the expression as non-constant in that case.

> A few random ideas I was considering:
> * restrict it to GIMPLE, so we can't have a regression in the front-ends.
> * fold x/0 to 0 with TREE_OVERFLOW set, to tell the front-end that something
> is going on.
> * fold to (x/y,0) or (x/y,1) so the division by 0 is still there, but C++11
> constexpr might give a strange message about it, and folding might not be
> idempotent.

Any of these would avoid the constexpr regression, though the second
would make the diagnostic worse.  Or the front end could copy
constexpr function bodies before folding.

Jason


Re: [C++ Patch/RFC] Tiny tsubst tweak

2016-06-08 Thread Jason Merrill
The first patch is OK.

Jason


Re: [PATCH] Fold x/x to 1, 0/x to 0 and 0%x to 0 consistently

2016-06-08 Thread Jakub Jelinek
On Wed, Jun 08, 2016 at 01:43:56PM -0400, Jason Merrill wrote:
> > A few random ideas I was considering:
> > * restrict it to GIMPLE, so we can't have a regression in the front-ends.
> > * fold x/0 to 0 with TREE_OVERFLOW set, to tell the front-end that something
> > is going on.
> > * fold to (x/y,0) or (x/y,1) so the division by 0 is still there, but C++11
> > constexpr might give a strange message about it, and folding might not be
> > idempotent.
> 
> Any of these would avoid the constexpr regression, though the second
> would make the diagnostic worse.  Or the front end could copy
> constexpr function bodies before folding.

Or, both cxx_eval_binary_expression and cp_fold would need to
not fold if the divisor is integer_zerop.

Jakub


Re: [C++ PATCH] Fix -Wunused-* regression (PR c++/71442)

2016-06-08 Thread Jason Merrill
OK.

Jason


Re: [PATCH][SPARC] Fix cpu auto-detection in M7 and S7 (Sonoma)

2016-06-08 Thread Jose E. Marchesi

> Starting with the M7 we will be using the same identifiers to identify
> the cpu in both Solaris (via kstat) and GNU/Linux (via /proc/cpuinfo).
> This little patch fixes the SPARC M7 entry in cpu_names, and also adds
> an entry for the Sonoma SoC.
> 
> Tested in sparc64-*-linux-gnu, sparcv9-*-linux-gnu and
> sparc-sun-solaris2.11 targets.
> 
> 2016-06-08  Jose E. Marchesi  
> 
>   * config/sparc/driver-sparc.c (cpu_names): Fix the entry for the
>   SPARC-M7 and add an entry for SPARC-S7 cpus (Sonoma).

OK, but it needs to be applied both on mainline and 6 branch.

No problem.  Do I need the approval of a maintainer in particular for
commits in the 6 branch?

Oh, never mind.  Now I read your email properly :)


Re: [PATCH] integer overflow checking builtins in constant expressions

2016-06-08 Thread Jason Merrill

OK.

Jason


Re: [C++ Patch] Fix some simple location issues

2016-06-08 Thread Jason Merrill

OK.

Jason


Re: [PATCH][SPARC] Fix cpu auto-detection in M7 and S7 (Sonoma)

2016-06-08 Thread Jose E. Marchesi

> Starting with the M7 we will be using the same identifiers to identify
> the cpu in both Solaris (via kstat) and GNU/Linux (via /proc/cpuinfo).
> This little patch fixes the SPARC M7 entry in cpu_names, and also adds
> an entry for the Sonoma SoC.
> 
> Tested in sparc64-*-linux-gnu, sparcv9-*-linux-gnu and
> sparc-sun-solaris2.11 targets.
> 
> 2016-06-08  Jose E. Marchesi  
> 
>   * config/sparc/driver-sparc.c (cpu_names): Fix the entry for the
>   SPARC-M7 and add an entry for SPARC-S7 cpus (Sonoma).

OK, but it needs to be applied both on mainline and 6 branch.

Applied to both trunk and branches/gcc-6-branch.
Thanks.


Re: [PATCH 9/17][ARM] Add NEON FP16 arithmetic instructions.

2016-06-08 Thread Joseph Myers
On Wed, 8 Jun 2016, James Greenhalgh wrote:

> My question is whether you consider the different behaviour between scalar
> float16_t and vector-of-float16_t types to be a bug? I can think of some

No, because it matches how things work for vectors of integer types.  
E.g.:

typedef unsigned char vuc __attribute__((vector_size(8)));

vuc a = { 128, 128, 128, 128, 128, 128, 128, 128 }, b;

int
main (void)
{
  b = a / (a + a);
  return 0;
}

(Does a divide-by-zero, because (a + a) is evaluated without promotion to 
vector of int.)

It's a general rule for vector operations that there are no promotions 
that change the bit-size of the vectors, so arithmetic is done directly on 
unsigned char in this case, even though it normally would not be.  
Conversions when the types match apart from signedness are, as the comment 
in c_common_type notes, not fully defined.

  /* If one type is a vector type, return that type.  (How the usual
 arithmetic conversions apply to the vector types extension is not
 precisely specified.)  */

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Add a test for C DR#423 (PR c/65471)

2016-06-08 Thread Joseph Myers
On Wed, 8 Jun 2016, Marek Polacek wrote:

> Reading  it occurred to me that we might resolve c/65471, dealing with type interpretation in
> to me that we might resolve c/65471, dealing with type interpretation in
> _Generic, too.  Since it turned out that GCC already does the right thing,
> I'm only adding a new test.  (We should discard qualifiers from controlling 
> expression of _Generic.)
> 
> Regarding the bit-field issue, it seems that we should do nothing and keep
> it implementation-defined for now.
> 
> Tested on x86_64-linux, ok for trunk?

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Container debug light mode

2016-06-08 Thread François Dumont

Hi

Here is the patch I already proposed to introduce the debug light 
mode for vector and deque containers.


It also simplifies some internal calls.

* include/debug/debug.h
(__glibcxx_requires_non_empty_range, __glibcxx_requires_nonempty)
(__glibcxx_requires_subscript): Move...
* include/debug/assertions.h: ...here and add __builtin_expect.
(_GLIBCXX_DEBUG_ONLY): Remove ; value.
* include/bits/stl_deque.h
(std::deque<>::operator[]): Add __glibcxx_requires_subscript check.
(std::deque<>::front()): Add __glibcxx_requires_nonempty check.
(std::deque<>::back()): Likewise.
(std::deque<>::pop_front()): Likewise.
(std::deque<>::pop_back()): Likewise.
(std::deque<>::swap(deque&)): Add allocator check.
(std::deque<>::operator=): Call _M_assign_aux.
(std::deque<>::assign(initializer_list<>)): Likewise.
(std::deque<>::resize(size_t, const value_type&)): Call _M_fill_insert.
(std::deque<>::insert(const_iterator, initializer_list<>)):
Call _M_range_insert_aux.
(std::deque<>::_M_assign_aux(It, It, std::forward_iterator_tag):
Likewise.
(std::deque<>::_M_fill_assign): Call _M_fill_insert.
(std::deque<>::_M_move_assign2): Call _M_assign_aux.
* include/bits/deque.tcc
(std::deque<>::operator=): Call _M_range_insert_aux.
(std::deque<>::_M_assign_aux(It, It, std::input_iterator_tag)):
Likewise.
* include/bits/stl_vector.h
(std::vector<>::operator[]): Add __glibcxx_requires_subscript check.
(std::vector<>::front()): Add __glibcxx_requires_nonempty check.
(std::vector<>::back()): Likewise.
(std::vector<>::pop_back()): Likewise.
(std::vector<>::swap(vector&)): Add allocator check.
(std::vector<>::operator=): Call _M_assign_aux.
(std::vector<>::assign(initializer_list<>)): Likewise.
(std::vector<>::resize(size_t, const value_type&)): Call _M_fill_insert.
(std::vector<>::insert(const_iterator, initializer_list<>)):
Call _M_range_insert.
* include/bits/vector.tcc (std::vector<>::_M_assign_aux): Likewise.

Successfully ran the vector and deque tests under Linux x86_64 for now; will 
complete testing before commit.


François
Index: include/bits/deque.tcc
===
--- include/bits/deque.tcc	(revision 237180)
+++ include/bits/deque.tcc	(working copy)
@@ -119,7 +119,8 @@
 	{
 	  const_iterator __mid = __x.begin() + difference_type(__len);
 	  std::copy(__x.begin(), __mid, this->_M_impl._M_start);
-	  insert(this->_M_impl._M_finish, __mid, __x.end());
+	  _M_range_insert_aux(this->_M_impl._M_finish, __mid, __x.end(),
+  std::random_access_iterator_tag());
 	}
 	}
   return *this;
@@ -280,7 +281,8 @@
 if (__first == __last)
   _M_erase_at_end(__cur);
 else
-  insert(end(), __first, __last);
+  _M_range_insert_aux(end(), __first, __last,
+			  std::__iterator_category(__first));
   }
 
   template 
Index: include/bits/stl_deque.h
===
--- include/bits/stl_deque.h	(revision 237180)
+++ include/bits/stl_deque.h	(working copy)
@@ -63,6 +63,8 @@
 #include 
 #endif
 
+#include 
+
 namespace std _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
@@ -1081,7 +1083,8 @@
   deque&
   operator=(initializer_list __l)
   {
-	this->assign(__l.begin(), __l.end());
+	_M_assign_aux(__l.begin(), __l.end(),
+		  random_access_iterator_tag());
 	return *this;
   }
 #endif
@@ -1142,7 +1145,7 @@
*/
   void
   assign(initializer_list __l)
-  { this->assign(__l.begin(), __l.end()); }
+  { _M_assign_aux(__l.begin(), __l.end(), random_access_iterator_tag()); }
 #endif
 
   /// Get a copy of the memory allocation object.
@@ -1306,7 +1309,7 @@
   {
 	const size_type __len = size();
 	if (__new_size > __len)
-	  insert(this->_M_impl._M_finish, __new_size - __len, __x);
+	  _M_fill_insert(this->_M_impl._M_finish, __new_size - __len, __x);
 	else if (__new_size < __len)
 	  _M_erase_at_end(this->_M_impl._M_start
 			  + difference_type(__new_size));
@@ -1328,7 +1331,7 @@
   {
 	const size_type __len = size();
 	if (__new_size > __len)
-	  insert(this->_M_impl._M_finish, __new_size - __len, __x);
+	  _M_fill_insert(this->_M_impl._M_finish, __new_size - __len, __x);
 	else if (__new_size < __len)
 	  _M_erase_at_end(this->_M_impl._M_start
 			  + difference_type(__new_size));
@@ -1364,7 +1367,10 @@
*/
   reference
   operator[](size_type __n) _GLIBCXX_NOEXCEPT
-  { return this->_M_impl._M_start[difference_type(__n)]; }
+  {
+	__glibcxx_requires_subscript(__n);
+	return this->_M_impl._M_start[difference_type(__n)];
+  }
 
   /**
*  @brief Subscript access to the data contained in the %deque.
@@ -1379,7 +1385,10 @@
*/
   const_reference
   operator[](size_type __n) const _GLIBCXX_NOEXCEPT
-  { return

Re: [PATCH][SPARC] Fix cpu auto-detection in M7 and S7 (Sonoma)

2016-06-08 Thread Eric Botcazou
> No problem.  Do I need the approval of a maintainer in particular for
> commits in the 6 branch?

No, maintainers can approve patches for branches outside of release periods.

-- 
Eric Botcazou

