Re: [PATCH 1/4][Ada,DJGPP] Ada support for DJGPP

2016-08-15 Thread Eric Botcazou
> Both '/' and '\' must be supported as directory separators. So
> DIR_SEPARATOR='/' is not OK in this case.

Understood.
 
> Unconditionally converting '/' to '\' in a DJGPP native build causes
> gnatmake to break. I retested it today with gcc-6.1.0. The problem is that
> the special directory name /dev/env/DJDIR is used as the prefix for DJGPP
> (it resolves to $DJDIR at execution time).

So it's only because of the /dev/ thing, i.e. this would work without it?  If 
so, can we restrict the special-casing to this block of code?

  --  Replace all '/' by Directory Separators (this is for Windows)

  if Directory_Separator /= '/' then
     for Index in 1 .. End_Path loop
        if Path_Buffer (Index) = '/' then
           Path_Buffer (Index) := Directory_Separator;
        end if;
     end loop;
  end if;

IOW, can we disable it for the /dev/ thing and leave the rest untouched?
Since DIR_SEPARATOR=='\', the block immediately below will be disabled too.
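
A minimal sketch of the restriction being asked about, written in C++ rather than Ada purely for illustration. The helper name normalize_separators and the plain "/dev/" prefix test are assumptions, not the actual GNAT runtime code: the conversion is skipped only for paths under /dev/ and every other path is handled as before.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not the GNAT runtime): convert '/' to the native
// directory separator unless the path starts with the special "/dev/" prefix.
std::string normalize_separators(std::string path, char dir_sep)
{
  const std::string dev_prefix = "/dev/";

  // Nothing to do when '/' already is the separator, and leave the
  // /dev/... names alone so that /dev/env/DJDIR keeps working.
  if (dir_sep == '/' || path.compare(0, dev_prefix.size(), dev_prefix) == 0)
    return path;

  for (char &c : path)
    if (c == '/')
      c = dir_sep;
  return path;
}
```

With dir_sep set to '\', a path such as c:/djgpp/bin is converted while /dev/env/DJDIR/lib is returned unchanged.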

-- 
Eric Botcazou


[committed] Fix typo in my recent const_with_all_bytes_same fix (PR tree-optimization/72824)

2016-08-15 Thread Jakub Jelinek
Hi!

Martin has reported a typo in my recent const_with_all_bytes_same change,
apparently we don't have any test coverage for that, so I've added a test
and committed to trunk and 6.2 as obvious.
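
To illustrate why the new test uses a vector mixing 0.f and -0.f, here is a simplified stand-in for the "all bytes the same" idea. It is only a sketch: all_bytes_same is a hypothetical helper working on raw memory, not GCC's tree-based const_with_all_bytes_same, and it assumes IEEE 754 floats.

```cpp
#include <cassert>
#include <cstddef>

// Simplified model, not GCC's implementation: return the repeated byte
// value (0..255) if every byte of the object is identical, -1 otherwise.
int all_bytes_same(const void *data, std::size_t size)
{
  const unsigned char *bytes = static_cast<const unsigned char *>(data);
  if (size == 0)
    return -1;
  for (std::size_t i = 1; i < size; ++i)
    if (bytes[i] != bytes[0])
      return -1;
  return bytes[0];
}
```

Under that assumption an array of four 0.f values yields 0, while { 0.f, -0.f, 0.f, -0.f } yields -1 because the sign bit of -0.f makes one byte differ. A check that looks at only one element of such a vector can wrongly conclude that a memset would do.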

2016-08-15  Martin Liska  
Jakub Jelinek  

PR tree-optimization/72824
* tree-loop-distribution.c (const_with_all_bytes_same): Fix a typo.

* gcc.c-torture/execute/ieee/pr72824-2.c: New test.

--- gcc/tree-loop-distribution.c.jj 2016-08-09 09:46:27.0 +0200
+++ gcc/tree-loop-distribution.c 2016-08-15 10:21:03.982598447 +0200
@@ -774,7 +774,7 @@ const_with_all_bytes_same (tree val)
case VECTOR_CST:
  unsigned int j;
  for (j = 0; j < VECTOR_CST_NELTS (val); ++j)
-   if (const_with_all_bytes_same (VECTOR_CST_ELT (val, i)))
+   if (const_with_all_bytes_same (VECTOR_CST_ELT (val, j)))
  break;
  if (j == VECTOR_CST_NELTS (val))
return 0;
--- gcc/testsuite/gcc.c-torture/execute/ieee/pr72824-2.c.jj 2016-08-15 10:23:32.735731036 +0200
+++ gcc/testsuite/gcc.c-torture/execute/ieee/pr72824-2.c 2016-08-15 10:20:05.0 +0200
@@ -0,0 +1,21 @@
+/* PR tree-optimization/72824 */
+
+typedef float V __attribute__((vector_size (4 * sizeof (float))));
+
+static inline void
+foo (V *x, V value)
+{
+  int i;
+  for (i = 0; i < 32; ++i)
+    x[i] = value;
+}
+
+int
+main ()
+{
+  V x[32];
+  foo (x, (V) { 0.f, -0.f, 0.f, -0.f });
+  if (__builtin_copysignf (1.0, x[3][1]) != -1.0f)
+    __builtin_abort ();
+  return 0;
+}

Jakub


Patch ping

2016-08-15 Thread Jakub Jelinek
Hi!

I'd like to ping the following fix:
PR71910 - http://gcc.gnu.org/ml/gcc-patches/2016-08/msg00624.html

Jakub


[PATCH, libstdc++] Fixed PR72840: dg-error syntax in 20_util/ratio/cons/cons_overflow_neg.cc as obvious

2016-08-15 Thread Thomas Preudhomme
Libstdc++-v3's test 20_util/ratio/cons/cons_overflow_neg.cc is missing the
closing curly braces for 2 dg-error directives, which makes dejagnu ignore
them. Fixed as obvious.
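
As a rough illustration of the problem, the following sketch flags a source line whose dg- directive has unbalanced braces. The function has_unclosed_dg_directive is a hypothetical helper, not part of DejaGnu or the GCC testsuite, and it uses a simple brace count that ignores braces inside quoted strings.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical lint helper: true if the line contains a "{ dg-" directive
// whose braces are not balanced by the end of the line.
bool has_unclosed_dg_directive(const std::string &line)
{
  const std::size_t open = line.find("{ dg-");
  if (open == std::string::npos)
    return false;

  int depth = 0;
  for (std::size_t i = open; i < line.size(); ++i)
    {
      if (line[i] == '{')
        ++depth;
      else if (line[i] == '}')
        --depth;
    }
  return depth != 0;
}
```

Applied to the two lines changed by the patch, the old versions are reported as unclosed and the fixed versions are not; nested selectors such as { target *-*-* } balance out and are accepted.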


ChangeLog entry is as follows:

*** libstdc++-v3/ChangeLog ***

2016-08-15  Thomas Preud'homme  

PR libstdc++/72840
* testsuite/20_util/ratio/cons/cons_overflow_neg.cc: Fix dg-error
syntax.


Best regards,

Thomas
diff --git a/libstdc++-v3/testsuite/20_util/ratio/cons/cons_overflow_neg.cc b/libstdc++-v3/testsuite/20_util/ratio/cons/cons_overflow_neg.cc
index a101d2938a798324a56fb0ba4e503a9e66b30001..51a7926d35b4cdc0fe5e7bd23c805dfccfb77a44 100644
--- a/libstdc++-v3/testsuite/20_util/ratio/cons/cons_overflow_neg.cc
+++ b/libstdc++-v3/testsuite/20_util/ratio/cons/cons_overflow_neg.cc
@@ -37,13 +37,13 @@ test02()
 void
 test03()
 {
-  std::ratio<1, INTMAX_MIN> r1 __attribute__((unused)); // { dg-error "required from here"
+  std::ratio<1, INTMAX_MIN> r1 __attribute__((unused)); // { dg-error "required from here" }
 }
 
 void
 test04()
 {
-  std::ratio<1,0> r1 __attribute__((unused)); // { dg-error "required from here"
+  std::ratio<1,0> r1 __attribute__((unused)); // { dg-error "required from here" }
 }
 
 // { dg-error "denominator cannot be zero" "" { target *-*-* } 265 }


[doc] Document GNU make version for libjava on Solaris

2016-08-15 Thread Eric Botcazou
It turns out that neither GNU make 3.80 nor 3.81 can build libjava on Solaris 
with the Solaris linker because of the final "ver-sun" recipe.  Probably not 
worth fixing at this point, so this patch simply documents it instead.

Tested with "make doc", applied on all active branches as obvious.


2016-08-15  Eric Botcazou  

* doc/install.texi (*-*-solaris2*): Fix version number and document
requirement on GNU make for building libjava with the Solaris linker.

-- 
Eric Botcazou

Index: doc/install.texi
===
--- doc/install.texi	(revision 239324)
+++ doc/install.texi	(working copy)
@@ -4511,7 +4511,7 @@ supported as cross-compilation target only.
 @c alone is too unspecific and must be avoided.
 @anchor{x-x-solaris2}
 @heading *-*-solaris2*
-Support for Solaris 9 has been removed in GCC 4.10.  Support for Solaris
+Support for Solaris 9 has been removed in GCC 5.  Support for Solaris
 8 has been removed in GCC 4.8.  Support for Solaris 7 has been removed
 in GCC 4.6.
 
@@ -4583,12 +4583,15 @@ features, so better stay with Solaris @command{ld}
 plugin (@option{-fuse-linker-plugin}) with GNU @command{ld}, GNU
 binutils @emph{must} be configured with @option{--enable-largefile}.
 
-To enable symbol versioning in @samp{libstdc++} with Solaris @command{ld},
+To enable symbol versioning in @samp{libstdc++} with the Solaris linker,
 you need to have any version of GNU @command{c++filt}, which is part of
 GNU binutils.  @samp{libstdc++} symbol versioning will be disabled if no
-appropriate version is found.  Solaris @command{c++filt} from the Solaris Studio
-compilers does @emph{not} work.
+appropriate version is found.  Solaris @command{c++filt} from the Solaris
+Studio compilers does @emph{not} work.
 
+GNU @command{make} version 3.82 or later is required to build libjava
+with the Solaris linker.
+
 Sun bug 4927647 sometimes causes random spurious testsuite failures
 related to missing diagnostic output.  This bug doesn't affect GCC
 itself, rather it is a kernel bug triggered by the @command{expect}


Re: [PATCH] Add mark_spam.py script

2016-08-15 Thread Martin Liška
This is the version of the script I've just installed as r239467.

Martin
From 6385fc5c8729dcabd791c5b0cc5ba2ff64e68489 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Mon, 15 Aug 2016 11:28:35 +0200
Subject: [PATCH] Enhance mark_spam.py script

contrib/ChangeLog:

2016-08-15  Martin Liska  

	* mark_spam.py: Add error handling and reset
	other properties of attachments and bugs.
---
 contrib/mark_spam.py | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/contrib/mark_spam.py b/contrib/mark_spam.py
index 569a03d..f206356 100755
--- a/contrib/mark_spam.py
+++ b/contrib/mark_spam.py
@@ -34,6 +34,10 @@ def mark_as_spam(id, api_key, verbose):
     r = requests.get(u)
     response = json.loads(r.text)
 
+    if 'error' in response and response['error']:
+        print(response['message'])
+        return
+
     # 2) mark the bug as spam
     cc_list = response['bugs'][0]['cc']
     data = {
@@ -49,6 +53,7 @@ def mark_as_spam(id, api_key, verbose):
         'cc': {'remove': cc_list},
         'priority': 'P5',
         'severity': 'trivial',
+        'url': '',
         'assigned_to': 'unassig...@gcc.gnu.org' }
 
     r = requests.put(u, json = data)
@@ -74,7 +79,12 @@ def mark_as_spam(id, api_key, verbose):
     for a in attachments:
         attachment_id = a['id']
         url = '%sbug/attachment/%d' % (base_url, attachment_id)
-        r = requests.put(url, json = {'ids': [attachment_id], 'summary': 'spam', 'comment': 'spam', 'is_obsolete': True, 'api_key': api_key})
+        r = requests.put(url, json = {'ids': [attachment_id],
+            'summary': 'spam',
+            'file_name': 'spam',
+            'content_type': 'application/x-spam',
+            'is_obsolete': True,
+            'api_key': api_key})
         if verbose:
             print(r)
             print(r.text)
-- 
2.9.2



Re: [PATCH] Add mark_spam.py script

2016-08-15 Thread Jakub Jelinek
On Mon, Aug 15, 2016 at 11:31:22AM +0200, Martin Liška wrote:
> diff --git a/contrib/mark_spam.py b/contrib/mark_spam.py
> index 569a03d..f206356 100755
> --- a/contrib/mark_spam.py
> +++ b/contrib/mark_spam.py
> @@ -34,6 +34,10 @@ def mark_as_spam(id, api_key, verbose):
>  r = requests.get(u)
>  response = json.loads(r.text)
>  
> +if 'error' in response and response['error']:
> +print(response['message'])
> +return
> +
>  # 2) mark the bug as spam
>  cc_list = response['bugs'][0]['cc']
>  data = {
> @@ -49,6 +53,7 @@ def mark_as_spam(id, api_key, verbose):
>  'cc': {'remove': cc_list},
>  'priority': 'P5',
>  'severity': 'trivial',
> +'url': '',
>  'assigned_to': 'unassig...@gcc.gnu.org' }
>  
>  r = requests.put(u, json = data)
> @@ -74,7 +79,12 @@ def mark_as_spam(id, api_key, verbose):
>  for a in attachments:
>  attachment_id = a['id']
>  url = '%sbug/attachment/%d' % (base_url, attachment_id)
> -r = requests.put(url, json = {'ids': [attachment_id], 'summary': 
> 'spam', 'comment': 'spam', 'is_obsolete': True, 'api_key': api_key})
> +r = requests.put(url, json = {'ids': [attachment_id],
> +'summary': 'spam',
> +'file_name': 'spam',
> +'content_type': 'application/x-spam',
> +'is_obsolete': True,

Is dropping 'comment': 'spam' intentional?

> +'api_key': api_key})
>  if verbose:
>  print(r)
>  print(r.text)

Jakub


Re: [PATCH] Add mark_spam.py script

2016-08-15 Thread Martin Liška
On 08/15/2016 11:37 AM, Jakub Jelinek wrote:
> On Mon, Aug 15, 2016 at 11:31:22AM +0200, Martin Liška wrote:
>> diff --git a/contrib/mark_spam.py b/contrib/mark_spam.py
>> index 569a03d..f206356 100755
>> --- a/contrib/mark_spam.py
>> +++ b/contrib/mark_spam.py
>> @@ -34,6 +34,10 @@ def mark_as_spam(id, api_key, verbose):
>>  r = requests.get(u)
>>  response = json.loads(r.text)
>>  
>> +if 'error' in response and response['error']:
>> +print(response['message'])
>> +return
>> +
>>  # 2) mark the bug as spam
>>  cc_list = response['bugs'][0]['cc']
>>  data = {
>> @@ -49,6 +53,7 @@ def mark_as_spam(id, api_key, verbose):
>>  'cc': {'remove': cc_list},
>>  'priority': 'P5',
>>  'severity': 'trivial',
>> +'url': '',
>>  'assigned_to': 'unassig...@gcc.gnu.org' }
>>  
>>  r = requests.put(u, json = data)
>> @@ -74,7 +79,12 @@ def mark_as_spam(id, api_key, verbose):
>>  for a in attachments:
>>  attachment_id = a['id']
>>  url = '%sbug/attachment/%d' % (base_url, attachment_id)
>> -r = requests.put(url, json = {'ids': [attachment_id], 'summary': 
>> 'spam', 'comment': 'spam', 'is_obsolete': True, 'api_key': api_key})
>> +r = requests.put(url, json = {'ids': [attachment_id],
>> +'summary': 'spam',
>> +'file_name': 'spam',
>> +'content_type': 'application/x-spam',
>> +'is_obsolete': True,
> 
> Is dropping 'comment': 'spam' intentional?

Yes, it's not necessary to add a comment about the change for an attachment.
As the name of the attachment is set to spam, that is already obvious from
the comment that is made for the change.

Martin

> 
>> +'api_key': api_key})
>>  if verbose:
>>  print(r)
>>  print(r.text)
> 
>   Jakub
> 



Re: [PATCH, libstdc++v3]: Fallback to read/write ops in case sendfile fails with ENOSYS or EINVAL.

2016-08-15 Thread Jonathan Wakely

On 11/08/16 21:27 +0200, Uros Bizjak wrote:

> Hello!
>
> Attached patch implements note from sendfile manpage:
>
>   Applications may wish to fall back to read(2)/write(2) in the case
>   where sendfile() fails with EINVAL or ENOSYS.
>
> Also, the patch fixes a small inconsistency in how
> _GLIBCXX_USE_FCHMODAT config flag is handled in do_copy_file function.
>
> 2016-08-11  Uros Bizjak
>
>    * src/filesystem/ops.cc: Always include ostream and
>    ext/stdio_filebuf.h.
>    (do_copy_file): Check if _GLIBCXX_USE_FCHMODAT is defined.
>    [_GLIBCXX_USE_SENDFILE]: Fallback to read/write operations in case
>    sendfile fails with ENOSYS or EINVAL.
>
> Patch was bootstrapped and regression tested on x86_64-linux-gnu
> {,-m32} on CentOS 5.11 (where sendfile returns EINVAL for file->file
> copy) and Fedora 24. In addition, the patch was bootstrapped and
> regression tested with _GLIBCXX_USE_SENDFILE manually disabled after
> configure.
>
> OK for mainline?


Yes, thanks.

The src/filesystem/ops.cc code is identical on the gcc-5 and gcc-6
branches, so could you backport it there too please? (After the 6.3
release though).
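
For readers who have not seen the pattern, here is a minimal sketch of the fallback the manpage note describes. It is not the code from src/filesystem/ops.cc: copy_fd, copy_with_read_write and copy_file are hypothetical helpers, and the sketch assumes Linux (sys/sendfile.h) with plain POSIX read/write as the fallback.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <unistd.h>

// Plain read/write copy from the current offset of IN until end of file.
bool copy_with_read_write(int in, int out)
{
  char buf[4096];
  for (;;)
    {
      ssize_t n = read(in, buf, sizeof buf);
      if (n == 0)
        return true;
      if (n < 0)
        {
          if (errno == EINTR)
            continue;
          return false;
        }
      for (ssize_t off = 0; off < n;)
        {
          ssize_t w = write(out, buf + off, n - off);
          if (w < 0)
            {
              if (errno == EINTR)
                continue;
              return false;
            }
          off += w;
        }
    }
}

// Try sendfile first; fall back to read/write if it fails with EINVAL or
// ENOSYS.  With a null offset pointer sendfile advances IN's file offset,
// so the fallback resumes where sendfile stopped.
bool copy_fd(int in, int out, std::size_t count)
{
  while (count > 0)
    {
      ssize_t n = sendfile(out, in, nullptr, count);
      if (n < 0)
        {
          if (errno == EINVAL || errno == ENOSYS)
            return copy_with_read_write(in, out);
          return false;
        }
      if (n == 0)
        break; // source is shorter than expected
      count -= n;
    }
  return true;
}

// Hypothetical wrapper that opens both files and copies FROM to TO.
bool copy_file(const char *from, const char *to)
{
  int in = open(from, O_RDONLY);
  if (in < 0)
    return false;
  struct stat st;
  if (fstat(in, &st) != 0)
    {
      close(in);
      return false;
    }
  int out = open(to, O_WRONLY | O_CREAT | O_TRUNC, 0600);
  if (out < 0)
    {
      close(in);
      return false;
    }
  bool ok = copy_fd(in, out, st.st_size);
  close(in);
  if (close(out) != 0)
    ok = false;
  return ok;
}
```

On a kernel where sendfile refuses file-to-file copies the first call fails with EINVAL before any data has moved, so the read/write loop copies the whole file.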




Re: [PATCH] Extend -falign-FOO=N to N[,M]: the second number is max padding

2016-08-15 Thread Richard Biener
On Fri, Aug 12, 2016 at 9:00 PM, Denys Vlasenko  wrote:
> On 08/12/2016 05:20 PM, Denys Vlasenko wrote:
>>>
>>> Yes, I know all that.  Fetching is one thing.  Loop cache is for instance
>>> another (more important) thing.  Not aligning the loop head increases
>>> chance of the whole loop being split over more cache lines than
>>> necessary.
>>> Jump predictors also don't necessarily decode/remember the whole
>>> instruction address.  And so on.
>>>
>>>> Aligning to 8 bytes within a cacheline does not speed things up. It
>>>> simply wastes bytes without speeding up anything.
>>>
>>>
>>> It's not that easy, which is why I have asked if you have _measured_ the
>>> correctness of your theory of it not mattering?  All the alignment
>>> adjustments in GCC were included after measurements.  In particular the
>>> align-by-8-always (for loop heads) was included after some large
>>> regressions on cpu2000, in 2007 (core2 duo at that time).
>>>
>>> So, I'm never much thrilled about listing reasons for why performance
>>> can't possibly be affected, especially when we know that it once _was_
>>> affected, when there's an easy way to show that it's not affected.
>>
>>
>> z.S:
>>
>> #compile with: gcc -nostartfiles -nostdlib
>> _start: .globl _start
>>         .p2align 8
>>         mov     $4000*1000*1000, %eax # 5-byte insn
>>         nop     # 6
>>         nop     # 7
>>         nop     # 8
>> loop:   dec     %eax
>>         lea     (%ebx), %ebx
>>         jnz     loop
>>         push    $0
>>         ret     # SEGV
>>
>> This program loops 4 billion times, then exits (by crashing).
>
> ...
>>
>> Looks like loop alignment to 8 bytes does not matter (in this particular
>> example).
>
>
>
> I looked into it more. I read Agner Fog's
> http://www.agner.org/optimize/microarchitecture.pdf
>
> Since Nehalem, Intel CPUs have a loopback buffer,
> implemented differently in different CPUs.
>
> I use the following code, a 4-billion-iteration loop
> with various numbers of padding NOPs:
>
> 00400100 <_start>:
>   400100:   b8 00 28 6b ee          mov    $0xee6b2800,%eax
>   400105:   90                      nop
>   400106:   90                      nop
> 00400107 :
>   400107:   ff c8                   dec    %eax
>   400109:   8d 88 d2 04 00 00       lea    0x4d2(%rax),%ecx
>   40010f:   75 f6                   jne    400107
>
>   400111:   b8 e7 00 00 00          mov    $0xe7,%eax
>   400116:   0f 05                   syscall
>
> On Skylake, the loop slows down if its body crosses 16 bytes
> (as shown above - last JNE insn doesn't fit).
>
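
The boundary arithmetic behind that observation can be sketched as follows. The helper fits_in_aligned_block is hypothetical; the 10-byte loop length comes from the disassembly above (2 bytes for dec, 6 for lea, 2 for jne), and the 16-byte block size is the one named in the message.

```cpp
#include <cassert>
#include <cstdint>

// True if the byte range [start, start + len) lies within a single
// aligned block of BLOCK bytes.
constexpr bool fits_in_aligned_block(std::uint64_t start, std::uint64_t len,
                                     std::uint64_t block = 16)
{
  return len > 0 && start / block == (start + len - 1) / block;
}

// The 10-byte loop fits when it starts at 0x400106 but not at 0x400107
// or 0x400108, where the final jne spills into the next 16-byte block.
static_assert(fits_in_aligned_block(0x400106, 10), "starts at ...06: fits");
static_assert(!fits_in_aligned_block(0x400107, 10), "starts at ...07: crosses");
static_assert(!fits_in_aligned_block(0x400108, 10), "starts at ...08: crosses");
```

This matches the three measurements that follow: only the variant starting at 00400106 keeps the whole loop body inside one block.
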
> With loop starting at 00400106 and fitting into an aligned 16-byte
> block:
>
>  Performance counter stats for './z6' (10 runs):
>     1209.051244  task-clock (msec)  #    0.999 CPUs utilized    ( +-  0.99% )
>               5  context-switches   #    0.004 K/sec            ( +- 11.11% )
>               2  page-faults        #    0.002 K/sec            ( +-  4.76% )
>   4,101,694,215  cycles             #    3.392 GHz              ( +-  0.51% )
>  12,027,931,896  instructions       #    2.93  insn per cycle   ( +-  0.00% )
>   4,005,295,446  branches           # 3312.759 M/sec            ( +-  0.00% )
>          15,828  branch-misses      #    0.00% of all branches  ( +-  4.49% )
>     1.209910890 seconds time elapsed                            ( +-  0.99% )
>
> With loop starting at 00400107:
>
>  Performance counter stats for './z7' (10 runs):
>     1408.362422  task-clock (msec)  #    0.999 CPUs utilized    ( +-  1.23% )
>               5  context-switches   #    0.004 K/sec            ( +- 15.59% )
>               2  page-faults        #    0.001 K/sec            ( +-  4.76% )
>   4,749,031,319  cycles             #    3.372 GHz              ( +-  0.34% )
>  12,032,488,082  instructions       #    2.53  insn per cycle   ( +-  0.00% )
>   4,006,159,536  branches           # 2844.552 M/sec            ( +-  0.00% )
>           6,946  branch-misses      #    0.00% of all branches  ( +-  3.88% )
>     1.409459099 seconds time elapsed                            ( +-  1.23% )
>
> With loop starting at 00400108:
>
>  Performance counter stats for './z8' (10 runs):
>     1407.127953  task-clock (msec)  #    0.999 CPUs utilized    ( +-  1.09% )
>               6  context-switches   #    0.004 K/sec            ( +- 15.70% )
>               2  page-faults        #    0.002 K/sec            ( +-  6.64% )
>   4,747,410,967  cycles             #    3.374 GHz              ( +-  0.39% )
>  12,032,462,223  instructions       #    2.53  insn per cycle   ( +-  0.00% )
>   4,006,154,637  branches           # 2847.044 M/sec            ( +-  0.00% )
>           7,324  branch-misses      #    0.00% of all branches  ( +-

[PATCH] PR71752 - SLP: Maintain operand ordering when creating vec defs

2016-08-15 Thread Alan Hayward
The testcase pr71752.c was failing because the SLP code was generating an
SLP vector using arguments from the SLP scalar stmts, but was using the
wrong argument number.

vect_get_slp_defs() is given a vector of operands. When calling down to
vect_get_constant_vectors it uses i as op_num, making the assumption that
the first op in the vector refers to the first argument in the SLP scalar
statement, the second op refers to the second arg and so on.

However, previously in vectorizable_reduction, the call to
vect_get_vec_defs destroyed this ordering by potentially only passing op1.

The solution is in vectorizable_reduction to create a vector of operands
equal in size to the number of arguments in the SLP statements. We maintain
the argument ordering and if we don't require defs for that argument we
instead push NULL into the vector. In vect_get_slp_defs we need to handle
cases where an op might be NULL.
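
The ordering idea can be sketched outside of GCC like this. It is only a model: build_slp_ops is a hypothetical function, operands are represented by strings instead of trees, and a null pointer stands in for the NULL placeholder described above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of the fix: keep one slot per statement argument so
// that the index in the vector equals the argument number, and store a
// null pointer in the slot of the reduction operand, which needs no defs.
std::vector<const std::string *>
build_slp_ops(const std::vector<std::string> &ops, std::size_t reduc_index)
{
  std::vector<const std::string *> slp_ops;
  slp_ops.reserve(ops.size());
  for (std::size_t i = 0; i < ops.size(); ++i)
    slp_ops.push_back(i == reduc_index ? nullptr : &ops[i]);
  return slp_ops;
}
```

A consumer that walks the result can then use the loop index directly as the operand number and simply skip null entries, instead of guessing which argument a compacted list refers to.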

Tested with a check run on X86 and AArch64.
Ok to commit?


Changelog:

gcc/
* tree-vect-loop.c (vectorizable_reduction): Keep SLP operand ordering.
* tree-vect-slp.c (vect_get_slp_defs): Handle null operands.

gcc/testsuite/
* gcc.dg/vect/pr71752.c: New.



Thanks,
Alan.




diff --git a/gcc/testsuite/gcc.dg/vect/pr71752.c b/gcc/testsuite/gcc.dg/vect/pr71752.c
new file mode 100644
index ..8d26754b4fedf8b104caae8742a445dffbf23f0a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr71752.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+
+unsigned int q4, yg;
+
+unsigned int
+w6 (unsigned int z5, unsigned int jv)
+{
+  unsigned int *f2 = &jv;
+
+  while (*f2 < 21)
+{
+  q4 -= jv;
+  z5 -= jv;
+  f2 = &yg;
+  ++(*f2);
+}
+  return z5;
+}
+
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 2a7e0c6661bc1ba82c9f03720e550749f2252a7c..826481af3d1d8b29bcdbd7d81c0fd5a859ecd9b0 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -5364,7 +5364,7 @@ vectorizable_reduction (gimple *stmt, gimple_stmt_iterator *gsi,
   auto_vec vect_defs;
   auto_vec phis;
   int vec_num;
-  tree def0, def1, tem, op0, op1 = NULL_TREE;
+  tree def0, def1, tem, op1 = NULL_TREE;
   bool first_p = true;
   tree cr_index_scalar_type = NULL_TREE, cr_index_vector_type = NULL_TREE;
   gimple *cond_expr_induction_def_stmt = NULL;
@@ -5964,29 +5964,36 @@ vectorizable_reduction (gimple *stmt, gimple_stmt_iterator *gsi,
   /* Handle uses.  */
   if (j == 0)
 {
-  op0 = ops[!reduc_index];
-  if (op_type == ternary_op)
-{
-  if (reduc_index == 0)
-op1 = ops[2];
-  else
-op1 = ops[1];
-}
+ if (slp_node)
+   {
+ /* Get vec defs for all the operands except the reduction index,
+   ensuring the ordering of the ops in the vector is kept.  */
+ auto_vec slp_ops;
+ auto_vec, 3> vec_defs;

-  if (slp_node)
-vect_get_vec_defs (op0, op1, stmt, &vec_oprnds0, &vec_oprnds1,
-   slp_node, -1);
+ slp_ops.quick_push ((reduc_index == 0) ? NULL : ops[0]);
+ slp_ops.quick_push ((reduc_index == 1) ? NULL : ops[1]);
+ if (op_type == ternary_op)
+   slp_ops.quick_push ((reduc_index == 2) ? NULL : ops[2]);
+
+ vect_get_slp_defs (slp_ops, slp_node, &vec_defs, -1);
+
+ vec_oprnds0.safe_splice (vec_defs[(reduc_index == 0) ? 1 : 0]);
+ if (op_type == ternary_op)
+   vec_oprnds1.safe_splice (vec_defs[(reduc_index == 2) ? 1 : 2]);
+   }
   else
-{
+   {
   loop_vec_def0 = vect_get_vec_def_for_operand
(ops[!reduc_index],
 stmt);
   vec_oprnds0.quick_push (loop_vec_def0);
   if (op_type == ternary_op)
{
+op1 = (reduc_index == 0) ? ops[2] : ops[1];
  loop_vec_def1 = vect_get_vec_def_for_operand (op1, stmt);
  vec_oprnds1.quick_push (loop_vec_def1);
}
-}
+   }
 }
   else
 {
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index fb325d54f1084461d44cd54a98e5b7f99541a188..7c480d59c823b5258255c8be047f050c83cc91fd 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -3200,10 +3200,19 @@ vect_get_slp_defs (vec ops, slp_tree slp_node,
   vec vec_defs;
   tree oprnd;
   bool vectorized_defs;
+  bool first_iteration = true;

   first_stmt = SLP_TREE_SCALAR_STMTS (slp_node)[0];
   FOR_EACH_VEC_ELT (ops, i, oprnd)
 {
+  if (oprnd == NULL)
+   {
+ vec_defs = vNULL;
+ vec_defs.create (0);
+ vec_oprnds->quick_push (vec_defs);
+ continue;
+   }
+
   /* For each operand we check if it has vectorized definitions in a
child
 node or we need to create them (for

Re: [PATCH] Add mark_spam.py script

2016-08-15 Thread Jakub Jelinek
On Mon, Aug 15, 2016 at 11:43:11AM +0200, Martin Liška wrote:
> > Is dropping 'comment': 'spam' intentional?
> 
> Yes, it's not necessary to add a comment about the change for an attachment.
> As the name of the attachment is set to spam, that is already obvious from
> the comment that is made for the change.

But can't the comment added for the attachment contain also some spam text
that should be sanitized?

Jakub


[patch, OpenACC] Fix reduction lowering segfault in omp-low

2016-08-15 Thread Chung-Lin Tang
Hi Jakub,
This patch fixes an OpenACC reduction lowering segfault which
triggers when nested acc loop directives are present.
Cesar has reviewed this patch internally (since he mostly wrote
the code originally)
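
As an illustration of the lookup change, here is a small standalone model. It does not use GCC's omp_context or tree types: Context, maybe_lookup and resolve_in_enclosing are hypothetical stand-ins that mirror the shape of the loop in the patch, where the enclosing contexts are searched outwards and the original variable is used when no mapping is found.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a lowering context with a link to its parent.
struct Context
{
  std::map<std::string, std::string> decls; // original name -> remapped name
  const Context *outer = nullptr;
};

// Returns the mapping for NAME in CTX, or a null pointer if there is none
// (unlike a strict lookup, a missing entry is not an error).
const std::string *maybe_lookup(const std::string &name, const Context &ctx)
{
  auto it = ctx.decls.find(name);
  return it == ctx.decls.end() ? nullptr : &it->second;
}

// Search the enclosing contexts from the innermost outwards and fall back
// to the original name if none of them has a mapping.
std::string resolve_in_enclosing(const std::string &orig, const Context &ctx)
{
  for (const Context *c = ctx.outer; c != nullptr; c = c->outer)
    if (const std::string *t = maybe_lookup(orig, *c))
      return *t;
  return orig;
}
```

With three nested contexts where only the outermost one maps the variable, a strict lookup in the immediate parent would fail, whereas this walk finds the mapping two levels up.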

Patch has been tested and committed to gomp-4_0-branch,
is this also okay for trunk?

Thanks,
Chung-Lin

2016-08-15  Chung-Lin Tang  

* omp-low.c (lower_oacc_reductions): Adjust variable lookup to use
maybe_lookup_decl, to handle nested acc loop directives.

Index: omp-low.c
===
--- omp-low.c   (revision 239324)
+++ omp-low.c   (working copy)
@@ -5687,10 +5687,19 @@ lower_oacc_reductions (location_t loc, tree clause
outgoing = var;
incoming = omp_reduction_init_op (loc, rcode, type);
  }
-   else if (ctx->outer)
- incoming = outgoing = lookup_decl (orig, ctx->outer);
else
- incoming = outgoing = orig;
+ {
+   /* Try to look at enclosing contexts for reduction var,
+  use original if no mapping found.  */
+   tree t = NULL_TREE;
+   omp_context *c = ctx->outer;
+   while (c && !t)
+ {
+   t = maybe_lookup_decl (orig, c);
+   c = c->outer;
+ }
+   incoming = outgoing = (t ? t : orig);
+ }
  
  has_outer_reduction:;
  }


Re: [v3 PATCH] Implement LWG 2744 and LWG 2754.

2016-08-15 Thread Jonathan Wakely

On 09/08/16 18:00 +0300, Ville Voutilainen wrote:
_Args&&...> = false>

diff --git a/libstdc++-v3/include/std/utility b/libstdc++-v3/include/std/utility
index 0c03644..e1a523f 100644
--- a/libstdc++-v3/include/std/utility
+++ b/libstdc++-v3/include/std/utility
@@ -356,6 +356,19 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  template 
in_place_tag in_place(__in_place_index<_Idx>*) {terminate();}

+  template
+struct __is_in_place_impl : false_type
+{ };
+
+  template
+  struct __is_in_place_impl> : true_type


Indentation nit.


+{ };
+
+  template
+struct __is_in_place
+: public __is_in_place_impl>>


Any reason not to use decay_t here? In all the cases where decay is
different to stripping references and cv-qualifiers the result will be
false either way.
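
For reference, a small sketch of where decay and "strip references and cv-qualifiers" actually differ. The alias strip_cvref_t and the function decay_matches_strip are names made up for this example; they are not part of the patch or of libstdc++.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical alias: remove the reference, then the cv-qualifiers.
template <typename T>
using strip_cvref_t = std::remove_cv_t<std::remove_reference_t<T>>;

// True when std::decay gives the same type as plain stripping.
template <typename T>
constexpr bool decay_matches_strip()
{
  return std::is_same<std::decay_t<T>, strip_cvref_t<T>>::value;
}

// Ordinary object types: both give int.
static_assert(decay_matches_strip<const int &>(), "same for object types");
// Arrays: decay gives int*, stripping gives int[3].
static_assert(!decay_matches_strip<int (&)[3]>(), "differs for arrays");
// Functions: decay gives void (*)(), stripping gives void ().
static_assert(!decay_matches_strip<void (&)()>(), "differs for functions");
```

The two only diverge for array and function types, which is the set of cases the remark above is about.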

I wouldn't have bothered with the std:: qualification either, but it's
fine as it is.

OK for trunk, thanks.



Re: [v3 PATCH] Implement C++17 make_from_tuple.

2016-08-15 Thread Jonathan Wakely

On 14/08/16 21:27 +0300, Ville Voutilainen wrote:

> Here. Tested on Linux-x64. I made the test for the macro value compare
> it relatively rather than exactly;
> I don't think our tests should necessarily break just because a macro
> value is updated.
>
> 2016-08-14  Ville Voutilainen
>
>    Add a feature macro for C++17 make_from_tuple.
>    * include/std/tuple (__cpp_lib_make_from_tuple): New.
>    * testsuite/20_util/tuple/make_from_tuple/1.cc: Adjust.


OK, thanks.



Re: [patch, OpenACC] Fix reduction lowering segfault in omp-low

2016-08-15 Thread Jakub Jelinek
On Mon, Aug 15, 2016 at 05:52:29PM +0800, Chung-Lin Tang wrote:
> Hi Jakub,
> This patch fixes an OpenACC reduction lowering segfault which
> triggers when nested acc loop directives are present.
> Cesar has reviewed this patch internally (since he mostly wrote
> the code originally)
> 
> Patch has been tested and committed to gomp-4_0-branch,
> is this also okay for trunk?
> 
> Thanks,
> Chung-Lin
> 
> 2016-08-15  Chung-Lin Tang  
> 
> * omp-low.c (lower_oacc_reductions): Adjust variable lookup to use
> maybe_lookup_decl, to handle nested acc loop directives.

Is this covered by an existing testcase in the testsuite?
If not, can you please add a testcase for it.
Otherwise LGTM (not extra happy about accepting any kinds of contexts,
but I hope the nesting diagnostics error out on OpenMP contexts mixed with
OpenACC ones and hope that there can't be some other OpenACC context around
that you wouldn't want to handle).

> Index: omp-low.c
> ===
> --- omp-low.c (revision 239324)
> +++ omp-low.c (working copy)
> @@ -5687,10 +5687,19 @@ lower_oacc_reductions (location_t loc, tree clause
>   outgoing = var;
>   incoming = omp_reduction_init_op (loc, rcode, type);
> }
> - else if (ctx->outer)
> -   incoming = outgoing = lookup_decl (orig, ctx->outer);
>   else
> -   incoming = outgoing = orig;
> +   {
> + /* Try to look at enclosing contexts for reduction var,
> +use original if no mapping found.  */
> + tree t = NULL_TREE;
> + omp_context *c = ctx->outer;
> + while (c && !t)
> +   {
> + t = maybe_lookup_decl (orig, c);
> + c = c->outer;
> +   }
> + incoming = outgoing = (t ? t : orig);
> +   }
> 
> has_outer_reduction:;
> }


Jakub


Re: [PATCH] Fix early debug regression with DW_AT_string_length (PR debug/71906)

2016-08-15 Thread Jakub Jelinek
On Fri, Aug 12, 2016 at 07:57:42PM +0200, Jakub Jelinek wrote:
> On Fri, Aug 12, 2016 at 01:47:14PM -0400, Jason Merrill wrote:
> > On 07/21/2016 12:53 PM, Jakub Jelinek wrote:
> > >  size = int_size_in_bytes (TREE_TYPE (szdecl));
> > ...
> > >+ if (size != DWARF2_ADDR_SIZE)
> > >+   add_AT_unsigned (array_die, DW_AT_byte_size, size);
> > 
> > For DWARF5, where DW_AT_byte_size is always the size of the array type, I
> > think this should be DW_AT_string_length_byte_size.
> 
> Sure, but this is just reindenting existing code,
> DW_AT_string_length_byte_size isn't yet in dwarf2.def nor anywhere else.
> When DWARF5 will make it into the public beta, I'll try to spend some time
> on implementing the 5 support, but I think it would be better done
> incrementally then, not part of this patch.

I've committed the patch with the following incremental patch on top of it
for the trunk (and for 6.2 just the original patch):

2016-08-15  Jakub Jelinek  

* dwarf2.def (DW_AT_string_length_bit_size,
DW_AT_string_length_byte_size): New attributes.

* dwarf2out.c (struct checksum_attributes): Add
at_string_length_bit_size and at_string_length_byte_size fields.
(collect_checksum_attributes): Handle DW_AT_string_length_bit_size
and DW_AT_string_length_byte_size.
(die_checksum_ordered): Handle at_string_length_bit_size and
at_string_length_byte_size.
(gen_array_type_die): For dwarf_version >= 5 emit
DW_AT_string_length_byte_size instead of DW_AT_byte_size.
(adjust_string_types): For dwarf_version >= 5 remove
DW_AT_string_length_byte_size instead of DW_AT_byte_size.
(resolve_addr): Likewise.

--- include/dwarf2.def.jj   2016-08-12 11:12:47.0 +0200
+++ include/dwarf2.def  2016-08-15 11:03:44.742465435 +0200
@@ -309,6 +309,8 @@ DW_AT (DW_AT_const_expr, 0x6c)
 DW_AT (DW_AT_enum_class, 0x6d)
 DW_AT (DW_AT_linkage_name, 0x6e)
 /* DWARF 5.  */
+DW_AT (DW_AT_string_length_bit_size, 0x6f)
+DW_AT (DW_AT_string_length_byte_size, 0x70)
 DW_AT (DW_AT_noreturn, 0x87)
 DW_AT (DW_AT_deleted, 0x8a)
 DW_AT (DW_AT_defaulted, 0x8b)
--- gcc/dwarf2out.c.jj  2016-08-15 11:02:41.0 +0200
+++ gcc/dwarf2out.c 2016-08-15 11:11:40.023507518 +0200
@@ -6363,6 +6363,8 @@ struct checksum_attributes
   dw_attr_node *at_small;
   dw_attr_node *at_segment;
   dw_attr_node *at_string_length;
+  dw_attr_node *at_string_length_bit_size;
+  dw_attr_node *at_string_length_byte_size;
   dw_attr_node *at_threads_scaled;
   dw_attr_node *at_upper_bound;
   dw_attr_node *at_use_location;
@@ -6502,6 +6504,12 @@ collect_checksum_attributes (struct chec
 case DW_AT_string_length:
   attrs->at_string_length = a;
   break;
+   case DW_AT_string_length_bit_size:
+ attrs->at_string_length_bit_size = a;
+ break;
+   case DW_AT_string_length_byte_size:
+ attrs->at_string_length_byte_size = a;
+ break;
 case DW_AT_threads_scaled:
   attrs->at_threads_scaled = a;
   break;
@@ -6588,6 +6596,8 @@ die_checksum_ordered (dw_die_ref die, st
   CHECKSUM_ATTR (attrs.at_small);
   CHECKSUM_ATTR (attrs.at_segment);
   CHECKSUM_ATTR (attrs.at_string_length);
+  CHECKSUM_ATTR (attrs.at_string_length_bit_size);
+  CHECKSUM_ATTR (attrs.at_string_length_byte_size);
   CHECKSUM_ATTR (attrs.at_threads_scaled);
   CHECKSUM_ATTR (attrs.at_upper_bound);
   CHECKSUM_ATTR (attrs.at_use_location);
@@ -19355,7 +19365,9 @@ gen_array_type_die (tree type, dw_die_re
  add_AT_location_description (array_die, DW_AT_string_length,
   loc);
  if (size != DWARF2_ADDR_SIZE)
-   add_AT_unsigned (array_die, DW_AT_byte_size, size);
+   add_AT_unsigned (array_die, dwarf_version >= 5
+   ? DW_AT_string_length_byte_size
+   : DW_AT_byte_size, size);
}
}
}
@@ -19448,7 +19460,9 @@ adjust_string_types (void)
   else
{
  remove_AT (array_die, DW_AT_string_length);
- remove_AT (array_die, DW_AT_byte_size);
+ remove_AT (array_die, dwarf_version >= 5
+   ? DW_AT_string_length_byte_size
+   : DW_AT_byte_size);
}
 }
 }
@@ -26909,8 +26923,8 @@ copy_deref_exprloc (dw_loc_descr_ref exp
 
 /* For DW_AT_string_length attribute with DW_OP_call4 reference to a variable
or argument, adjust it if needed and return:
-   -1 if the DW_AT_string_length attribute and DW_AT_byte_size attribute
-  if present should be removed
+   -1 if the DW_AT_string_length attribute and DW_AT_{string_length_,}byte_size
+  attribute if present should be removed
0 keep the attribute as is if the referenced var or argument has
  only DWARF expression that covers a

Re: [PATCH] Add mark_spam.py script

2016-08-15 Thread Martin Liška
On 08/15/2016 11:48 AM, Jakub Jelinek wrote:
> But can't the comment added for the attachment contain also some spam text
> that should be sanitized?
> 
>   Jakub

It can; currently we mark just the first comment as spam. If there's a spam PR
which contains multiple comments, I'll extend the script.

Here's a sample of an attachment marked as spam:
https://gcc.gnu.org/bugzilla/attachment.cgi?id=39437&action=edit

Martin


Re: [v3 PATCH] Implement C++17 make_from_tuple.

2016-08-15 Thread Jonathan Wakely

On 11/08/16 03:04 +0300, Ville Voutilainen wrote:

+
+  template 
+constexpr _Tp
+__make_from_tuple_impl(_Tuple&& __t, index_sequence<_Idx...>)
+{ return _Tp(get<_Idx>(std::forward<_Tuple>(__t))...); }


We need to use std::get here.
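
A user-level sketch of the same shape with the call qualified as std::get. This is not the libstdc++ code: the names in namespace sketch are made up for the example, and it relies only on C++14 library features (index_sequence, decay_t). Presumably the point of qualifying the call is to avoid argument-dependent lookup finding some other get.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>

namespace sketch
{
  // Expand the tuple elements into a constructor call for T.
  template <typename T, typename Tuple, std::size_t... Idx>
  constexpr T
  make_from_tuple_impl(Tuple &&t, std::index_sequence<Idx...>)
  {
    // Qualified std::get, so no other 'get' can be picked up by ADL.
    return T(std::get<Idx>(std::forward<Tuple>(t))...);
  }

  // Hypothetical equivalent of make_from_tuple for illustration only.
  template <typename T, typename Tuple>
  constexpr T
  make_from_tuple_sketch(Tuple &&t)
  {
    return sketch::make_from_tuple_impl<T>(
      std::forward<Tuple>(t),
      std::make_index_sequence<
        std::tuple_size<std::decay_t<Tuple>>::value>{});
  }

  // Example target type with a two-argument constructor.
  struct Point
  {
    int x;
    double y;
    constexpr Point(int x_, double y_) : x(x_), y(y_) { }
  };
}
```

Calling sketch::make_from_tuple_sketch<sketch::Point>(std::make_tuple(3, 4.5)) constructs a Point with x set to 3 and y set to 4.5.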


+
+  template 
+constexpr _Tp
+make_from_tuple(_Tuple&& __t)
+{
+  return __make_from_tuple_impl<_Tp>(
+std::forward<_Tuple>(__t),
+   make_index_sequence>>{});
+}
#endif // C++17


It would be nice to add a conditional 'noexcept' to this function, but
doing so is a bit complicated, as I discovered when trying to do it
for std::apply().

What we need is a version of tuple_element which gives you the result
of std::get on the tuple, taking into account its value category,
something like:

 template
   struct __tuple_element_ref
   : add_lvalue_reference> { };

 template
   struct __tuple_element_ref<_Nm, _Tuple&&>
   : add_rvalue_reference> { };

 template
   struct __tuple_element_ref<_Nm, _Tuple&>
   : add_lvalue_reference> { };

 template
   using __tuple_element_ref_t
 = typename __tuple_element_ref<_Nm, _Tuple>::type;

And then for std::__make_from_tuple_impl use:

 noexcept(is_nothrow_constructible_v<_Tp, __tuple_element_ref_t<_Idx, _Tuple>...>)

And for std::__apply_impl use:

 noexcept(is_nothrow_callable_v<_Fn&&(__tuple_element_ref_t<_Idx, _Tuple>...)>)
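
For illustration only, here is a minimal standalone sketch of the conditional
noexcept idea. It is not the libstdc++ code: instead of the trait
specializations above it takes the type of std::get via decltype, and the
names get_result_t, make_from_tuple_impl, make_from_tuple_sketch, Point and
MayThrow are made up for the example. It assumes a C++17 compiler.

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>

// Illustrative name: the type std::get<N> returns for a Tuple deduced from a
// forwarding reference (T for an rvalue argument, T& for an lvalue).
template <std::size_t N, typename Tuple>
using get_result_t = decltype(std::get<N>(std::declval<Tuple>()));

// noexcept exactly when T can be constructed without throwing from the
// results of std::get on the forwarded tuple.
template <typename T, typename Tuple, std::size_t... Idx>
constexpr T
make_from_tuple_impl(Tuple&& t, std::index_sequence<Idx...>)
  noexcept(std::is_nothrow_constructible_v<T, get_result_t<Idx, Tuple>...>)
{ return T(std::get<Idx>(std::forward<Tuple>(t))...); }

// Forwards the exception specification of the helper.
template <typename T, typename Tuple>
constexpr T
make_from_tuple_sketch(Tuple&& t)
  noexcept(noexcept(make_from_tuple_impl<T>(
    std::forward<Tuple>(t),
    std::make_index_sequence<std::tuple_size_v<std::decay_t<Tuple>>>{})))
{
  return make_from_tuple_impl<T>(
    std::forward<Tuple>(t),
    std::make_index_sequence<std::tuple_size_v<std::decay_t<Tuple>>>{});
}

// Demo types: one nothrow constructor, one potentially throwing constructor.
struct Point { int x, y; Point(int a, int b) noexcept : x(a), y(b) {} };
struct MayThrow { int v; explicit MayThrow(int a) : v(a) {} };

static_assert(noexcept(make_from_tuple_sketch<Point>(
                std::declval<std::tuple<int, int>>())),
              "nothrow constructor propagates");
static_assert(!noexcept(make_from_tuple_sketch<MayThrow>(
                std::declval<std::tuple<int>&>())),
              "potentially throwing constructor propagates");
```

The decltype form sidesteps the question of which reference type each
specialization should add, at the cost of instantiating std::get in the
exception specification.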



Re: [PATCH 3/4] Add support to run auto-vectorization tests for multiple effective targets

2016-08-15 Thread Trevor Saunders
On Tue, Jul 26, 2016 at 01:51:33PM +, Robert Suchanek wrote:
> Hi,
> 
> > On May 5, 2016, at 8:14 AM, Robert Suchanek  
> > wrote:
> > >
> > > I'm resending this patch as it has been rebased and updated.  I reverted a
> > change
> > > to check_effective_target_vect_call_lrint procedure because it does not 
> > > use
> > > cached result.
> > 
> > Ok.
> > 
> > Please ensure that the compilation flag is mixed into the test case name so
> > that as you iterate over them, the test case names are unique.
> 
> An effective target is likely to have a unique flag to enable a given set of
> SIMD operations and this is mixed into test case names.
> 
> I double-checked this with mips-mti-linux-gnu where auto-vectorization tests
> can be run twice i.e. for -mmsa and -mpaired-single.
> 
> The patch was rebased once again and tested on x86_64-unknown-linux-gnu.
> 
> Committed as r238755.

Unfortunately this broke make check-c
RUNTESTFLAGS='vect.exp=*no-vfa-vect-dv-2.c
--target_board=unix\{-m32,-m64\}', causing the result of the
vect_aligned_arrays check to be cached between the -m64 and -m32 variants.
That is incorrect, at least on my machine: if you actually run that test
for -m32 and -m64 you get different results.  In both cases et_index is 0,
so you use the cached value the second time, but that's not correct
because the options changed.
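
To make the failure mode concrete, here is a small model of such a cache. It
is not the DejaGnu/Tcl code in the testsuite; the class, the Probe type and
the flag strings are hypothetical. The point it sketches is that the cache key
has to include the option set, not just et_index.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for an effective-target check whose answer depends on
// the option set under test (for example -m32 versus -m64).
using Probe = std::function<bool(const std::string& flags)>;

class EffectiveTargetCache {
public:
  // Key on both the effective-target index and the flags, so a result
  // computed for one option set is never reused for another.
  bool check(int et_index, const std::string& flags, const Probe& probe) {
    const auto key = std::make_pair(et_index, flags);
    auto it = cache_.find(key);
    if (it == cache_.end())
      it = cache_.emplace(key, probe(flags)).first;  // first query: run probe
    return it->second;
  }

private:
  std::map<std::pair<int, std::string>, bool> cache_;
};
```

With a key of et_index alone, the second variant would silently get the first
variant's answer, which is the behaviour described above.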

I suspect this also causes some random vectorizer tests to appear and
disappear during regression testing with the same -m64 and -m32, but I'm
not absolutely sure of that part.

Thanks!

Trev

> 
> Thanks and regards,
> Robert
> 


Re: [PATCH] gcov: add new option (--hash-names) (PR gcov-profile/36412).

2016-08-15 Thread Nathan Sidwell

On 08/09/16 10:32, Martin Liška wrote:

Hello.

Following enhancement for gcov solves issues when we cannot create a file due
to a filesystem path length limit. Selected approach utilizes existing md5sum
functions.

Patch survives make check -k RUNTESTFLAGS="gcov.exp" on x86_64-linux-gnu.

Ready for trunk?
Thanks,
Martin



+ [@option{-e}|@option{--hash-names}]
'--hash-filenames' would be better.  Let's not confuse the user into thinking
that maybe the function names are hashed.  (Or perhaps '--hash-paths'?  The
world's a little unclear on whether 'filename' means the last bit of the file
path or the whole thing.)



+/* For situations when a long name can potentially hit filesystem path limit,
+   let's calculate md5sum of the patch and create file x.gcov##md5sum.gcov.  */
+
+static int flag_hash_names = 0;
+
s/patch/path.
Which bit of 'x.gcov##md5sum.gcov' is the hash?  Is it 'x' or something else?
Perhaps this more detailed comment should be near where the filename is
generated, and this flag just labelled as something like 'hash long pathnames'.


+  fnotice (file, "  -e, --hash-namesUse hash of file path in "
.. and ..
+  { "long-file-names",  no_argument,   NULL, 'e' },

don't seem to match?  Why 'e'?

nathan


Re: [PATCH] Fix invalid memory access in gcc.c (driver/72765)

2016-08-15 Thread Jakub Jelinek
On Fri, Aug 12, 2016 at 02:22:56PM +0200, Martin Liška wrote:
> Simple patch corrects assumption about string length, however the hunk in
> save_string is kind of discussable as one can have a string with '\0' chars
> which is length enough? 
> 
> Thoughts?
> 
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
> 
> Ready to be installed?
> Martin

> >From c7a7e1be3c113ee0f610d627426b8f241357b86e Mon Sep 17 00:00:00 2001
> From: marxin 
> Date: Tue, 9 Aug 2016 13:04:57 +0200
> Subject: [PATCH] Fix invalid memory access in gcc.c (driver/72765)
> 
> gcc/ChangeLog:
> 
> 2016-08-09  Martin Liska  
> 
>   PR driver/72765
>   * gcc.c (do_spec_1): Call save_string with the right size.
>   (save_string): Do an assert about string we copy.
> ---
>  gcc/gcc.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/gcc.c b/gcc/gcc.c
> index 7460f6a..a5c4a19 100644
> --- a/gcc/gcc.c
> +++ b/gcc/gcc.c
> @@ -5420,8 +5420,9 @@ do_spec_1 (const char *spec, int inswitch, const char 
> *soft_matched_part)
>   if (files_differ)
>  #endif
> {
> - temp_filename = save_string (temp_filename,
> -  temp_filename_length + 
> 1);
> + temp_filename
> +   = save_string (temp_filename,
> +  temp_filename_length - 1);
>   obstack_grow (&obstack, temp_filename,
>   temp_filename_length);
>   arg_going = 1;

This is ok for trunk/6.2 (if you commit RSN).

> @@ -8362,6 +8363,7 @@ save_string (const char *s, int len)
>  {
>char *result = XNEWVEC (char, len + 1);
>  
> +  gcc_assert (strlen (s) >= (unsigned int)len);
>memcpy (result, s, len);
>result[len] = 0;
>return result;

I'd leave this one out (at least from 6.x); if anything, use just
gcc_checking_assert (since otherwise it doesn't make much sense to pass len
if you are going to recompute it anyway) and put a space between the cast and 
len.
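
A rough standalone sketch of that suggestion, not the gcc.c code: the macro
SKETCH_CHECKING is a hypothetical stand-in for a checking-enabled build, and
the function name and the smart-pointer return type are chosen for the
example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical switch standing in for a checking-enabled build.
#ifndef SKETCH_CHECKING
#define SKETCH_CHECKING 1
#endif

// Copy the first LEN bytes of S into a fresh NUL-terminated buffer.
std::unique_ptr<char[]> save_string_sketch(const char* s, int len) {
#if SKETCH_CHECKING
  // Only checking builds pay for the strlen; other builds trust LEN.
  assert(std::strlen(s) >= static_cast<std::size_t>(len));
#endif
  std::unique_ptr<char[]> result(new char[len + 1]);
  std::memcpy(result.get(), s, static_cast<std::size_t>(len));
  result[len] = '\0';
  return result;
}
```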

Jakub


[PATCH] Some timevar fixes

2016-08-15 Thread Richard Biener

I had those in my tree.  Committed as obvious.

Richard.

2016-08-15  Richard Biener  

* ree.c (rest_of_handle_ree): Remove redundant timevar push/pop.
* config/i386/i386.c (pass_data_insert_vzeroupper): Account to
TV_MACH_DEP.
(pass_data_stv): Likewise.

Index: gcc/ree.c
===
--- gcc/ree.c   (revision 239460)
+++ gcc/ree.c   (working copy)
@@ -1247,9 +1247,7 @@ find_and_remove_re (void)
 static unsigned int
 rest_of_handle_ree (void)
 {
-  timevar_push (TV_REE);
   find_and_remove_re ();
-  timevar_pop (TV_REE);
   return 0;
 }
 
Index: gcc/config/i386/i386.c
===
--- gcc/config/i386/i386.c  (revision 239460)
+++ gcc/config/i386/i386.c  (working copy)
@@ -4057,7 +4057,7 @@ const pass_data pass_data_insert_vzeroup
   RTL_PASS, /* type */
   "vzeroupper", /* name */
   OPTGROUP_NONE, /* optinfo_flags */
-  TV_NONE, /* tv_id */
+  TV_MACH_DEP, /* tv_id */
   0, /* properties_required */
   0, /* properties_provided */
   0, /* properties_destroyed */
@@ -4092,7 +4092,7 @@ const pass_data pass_data_stv =
   RTL_PASS, /* type */
   "stv", /* name */
   OPTGROUP_NONE, /* optinfo_flags */
-  TV_NONE, /* tv_id */
+  TV_MACH_DEP, /* tv_id */
   0, /* properties_required */
   0, /* properties_provided */
   0, /* properties_destroyed */



[PATCH] Fix val-prof-7.c on --target_board 'unix/-m32'

2016-08-15 Thread Martin Liška
Hello.

The test case uses a size of memory operations that cannot be handled by core2
in 32-bit mode. Fixed in the patch.

Survives:
make check -k -j10 RUNTESTFLAGS="tree-prof.exp --target_board 'unix/-m32'"
make check -k -j10 RUNTESTFLAGS="tree-prof.exp"

on x86_64-linux-gnu.

Ready for trunk?
Thanks,
Martin
From e8663bc8b2c721eff3003aa591d52b2b15132b88 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Mon, 15 Aug 2016 13:09:36 +0200
Subject: [PATCH] Fix val-prof-7.c on --target_board 'unix/-m32'

gcc/testsuite/ChangeLog:

2016-08-15  Martin Liska  

	* gcc.dg/tree-prof/val-prof-7.c (int main): Change size
	of memory operations so that it can be handled by core2
	in 32-bit mode.
---
 gcc/testsuite/gcc.dg/tree-prof/val-prof-7.c | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-prof/val-prof-7.c b/gcc/testsuite/gcc.dg/tree-prof/val-prof-7.c
index 3e636aa..84ec9fb 100644
--- a/gcc/testsuite/gcc.dg/tree-prof/val-prof-7.c
+++ b/gcc/testsuite/gcc.dg/tree-prof/val-prof-7.c
@@ -57,25 +57,25 @@ int main() {
   buffer1 = __builtin_malloc (1000);
   buffer2 = __builtin_malloc (1000);
 
-  test_stringops_with_values_0 (8, 111);
-  test_stringops_with_values_1 (111, 111);
-  test_stringops_with_values_2 (257, 111);
+  test_stringops_with_values_0 (8, 55);
+  test_stringops_with_values_1 (55, 55);
+  test_stringops_with_values_2 (257, 55);
 
   return 0;
 }
 
 /* { dg-final-use-not-autofdo { scan-ipa-dump "Single value 8 stringop transformation on __builtin_bzero" "profile" } } */
-/* { dg-final-use-not-autofdo { scan-ipa-dump "Single value 111 stringop transformation on __builtin_bzero" "profile" } } */
-/* { dg-final-use-not-autofdo { scan-ipa-dump-times "Single value 257 stringop transformation on __builtin_bzero" 0 "profile" } } */
+/* { dg-final-use-not-autofdo { scan-ipa-dump "Single value 55 stringop transformation on __builtin_bzero" "profile" } } */
+/* { dg-final-use-not-autofdo { scan-ipa-dump-times "Single value 32 stringop transformation on __builtin_bzero" 0 "profile" } } */
 
 /* { dg-final-use-not-autofdo { scan-ipa-dump "Single value 8 stringop transformation on __builtin_memcpy" "profile" } } */
-/* { dg-final-use-not-autofdo { scan-ipa-dump "Single value 111 stringop transformation on __builtin_memcpy" "profile" } } */
-/* { dg-final-use-not-autofdo { scan-ipa-dump-times "Single value 257 stringop transformation on __builtin_memcpy" 0 "profile" } } */
+/* { dg-final-use-not-autofdo { scan-ipa-dump "Single value 55 stringop transformation on __builtin_memcpy" "profile" } } */
+/* { dg-final-use-not-autofdo { scan-ipa-dump-times "Single value 32 stringop transformation on __builtin_memcpy" 0 "profile" } } */
 
 /* { dg-final-use-not-autofdo { scan-ipa-dump "Single value 8 stringop transformation on __builtin_mempcpy" "profile" } } */
-/* { dg-final-use-not-autofdo { scan-ipa-dump "Single value 111 stringop transformation on __builtin_mempcpy" "profile" } } */
-/* { dg-final-use-not-autofdo { scan-ipa-dump-times "Single value 257 stringop transformation on __builtin_mempcpy" 0 "profile" } } */
+/* { dg-final-use-not-autofdo { scan-ipa-dump "Single value 55 stringop transformation on __builtin_mempcpy" "profile" } } */
+/* { dg-final-use-not-autofdo { scan-ipa-dump-times "Single value 32 stringop transformation on __builtin_mempcpy" 0 "profile" } } */
 
 /* { dg-final-use-not-autofdo { scan-ipa-dump "Single value 8 stringop transformation on __builtin_memset" "profile" } } */
-/* { dg-final-use-not-autofdo { scan-ipa-dump "Single value 111 stringop transformation on __builtin_memset" "profile" } } */
-/* { dg-final-use-not-autofdo { scan-ipa-dump-times "Single value 257 stringop transformation on __builtin_memset" 0 "profile" } } */
+/* { dg-final-use-not-autofdo { scan-ipa-dump "Single value 55 stringop transformation on __builtin_memset" "profile" } } */
+/* { dg-final-use-not-autofdo { scan-ipa-dump-times "Single value 32 stringop transformation on __builtin_memset" 0 "profile" } } */
-- 
2.9.2



Re: [PATCH] Fix invalid memory access in gcc.c (driver/72765)

2016-08-15 Thread Martin Liška
On 08/15/2016 01:02 PM, Jakub Jelinek wrote:
> On Fri, Aug 12, 2016 at 02:22:56PM +0200, Martin Liška wrote:
>> Simple patch corrects assumption about string length, however the hunk in
>> save_string is kind of discussable as one can have a string with '\0' chars
>> which is length enough? 
>>
>> Thoughts?
>>
>> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>>
>> Ready to be installed?
>> Martin
> 
>> >From c7a7e1be3c113ee0f610d627426b8f241357b86e Mon Sep 17 00:00:00 2001
>> From: marxin 
>> Date: Tue, 9 Aug 2016 13:04:57 +0200
>> Subject: [PATCH] Fix invalid memory access in gcc.c (driver/72765)
>>
>> gcc/ChangeLog:
>>
>> 2016-08-09  Martin Liska  
>>
>>  PR driver/72765
>>  * gcc.c (do_spec_1): Call save_string with the right size.
>>  (save_string): Do an assert about string we copy.
>> ---
>>  gcc/gcc.c | 6 --
>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/gcc/gcc.c b/gcc/gcc.c
>> index 7460f6a..a5c4a19 100644
>> --- a/gcc/gcc.c
>> +++ b/gcc/gcc.c
>> @@ -5420,8 +5420,9 @@ do_spec_1 (const char *spec, int inswitch, const char 
>> *soft_matched_part)
>>  if (files_differ)
>>  #endif
>>{
>> -temp_filename = save_string (temp_filename,
>> - temp_filename_length + 
>> 1);
>> +temp_filename
>> +  = save_string (temp_filename,
>> + temp_filename_length - 1);
>>  obstack_grow (&obstack, temp_filename,
>>  temp_filename_length);
>>  arg_going = 1;
> 
> This is ok for trunk/6.2 (if you commit RSN).

Ok, I'll commit the first hunk to both GCC-5 and GCC-6 branches.

> 
>> @@ -8362,6 +8363,7 @@ save_string (const char *s, int len)
>>  {
>>char *result = XNEWVEC (char, len + 1);
>>  
>> +  gcc_assert (strlen (s) >= (unsigned int)len);
>>memcpy (result, s, len);
>>result[len] = 0;
>>return result;
> 
> I'd leave this one out (at least from 6.x); if anything, use just
> gcc_checking_assert (since otherwise it doesn't make much sense to pass len
> if you are going to recompute it anyway) and put a space between the cast and 
> len.

And I'll commit the checking assert for trunk.

Martin

> 
>   Jakub
> 



Re: [PATCH] PR71752 - SLP: Maintain operand ordering when creating vec defs

2016-08-15 Thread Richard Biener
On Mon, Aug 15, 2016 at 11:48 AM, Alan Hayward  wrote:
> The testcase pr71752.c was failing because the SLP code was generating an SLP
> vector using arguments from the SLP scalar stmts, but was using the wrong
> argument number.
>
> vect_get_slp_defs() is given a vector of operands. When calling down to
> vect_get_constant_vectors it uses i as op_num - making the assumption that the
> first op in the vector refers to the first argument in the SLP scalar
> statement, the second op refers to the second arg and so on.
>
> However, previously in vectorizable_reduction, the call to vect_get_vec_defs
> destroyed this ordering by potentially only passing op1.
>
> The solution is in vectorizable_reduction to create a vector of operands equal
> in size to the number of arguments in the SLP statements. We maintain the
> argument ordering and if we don't require defs for that argument we instead
> push NULL into the vector. In vect_get_slp_defs we need to handle cases where
> an op might be NULL.
>
> Tested with a check run on X86 and AArch64.
> Ok to commit?
>

Ugh.  Note the logic in vect_get_slp_defs is incredibly fragile - I think you
can't simply "skip" ops the way you do as you need to still increment
child_index accordingly for ignored ops.

Why not let the function compute defs for all ops?  That said, the
vectorizable_reduction part certainly is fixing a bug (I think I've seen
similar issues elsewhere though).
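
A toy model of the child_index concern, using plain containers rather than
GCC's SLP data structures. It assumes, as a simplification, exactly one child
per operand position; the function name and the string payloads are invented
for the example.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Simplified model, not GCC's data structures: ops[i] is the operand at
// argument position i (nullopt when no defs are wanted for it) and
// children[k] holds the defs of the k-th child, one child per position.
std::vector<std::optional<std::string>>
collect_defs(const std::vector<std::optional<std::string>>& ops,
             const std::vector<std::string>& children)
{
  std::vector<std::optional<std::string>> defs;
  std::size_t child_index = 0;
  for (const auto& op : ops) {
    if (op && child_index < children.size())
      defs.push_back(children[child_index]);
    else
      defs.push_back(std::nullopt);  // placeholder keeps positions aligned
    // Advance for skipped operands too; otherwise the next operand would be
    // paired with the child that belongs to the skipped one.
    ++child_index;
  }
  return defs;
}
```

If the increment were done only for non-null operands, a skipped first operand
would shift every later operand onto the wrong child.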

Richard.

> Changelog:
>
> gcc/
> * tree-vect-loop.c (vectorizable_reduction): Keep SLP operand ordering.
> * tree-vect-slp.c (vect_get_slp_defs): Handle null operands.
>
> gcc/testsuite/
> * gcc.dg/vect/pr71752.c: New.
>
>
>
> Thanks,
> Alan.
>
>
>
>
> diff --git a/gcc/testsuite/gcc.dg/vect/pr71752.c
> b/gcc/testsuite/gcc.dg/vect/pr71752.c
> new file mode 100644
> index
> ..8d26754b4fedf8b104caae8742a445dff
> bf23f0a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/pr71752.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +
> +unsigned int q4, yg;
> +
> +unsigned int
> +w6 (unsigned int z5, unsigned int jv)
> +{
> +  unsigned int *f2 = &jv;
> +
> +  while (*f2 < 21)
> +{
> +  q4 -= jv;
> +  z5 -= jv;
> +  f2 = &yg;
> +  ++(*f2);
> +}
> +  return z5;
> +}
> +
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index
> 2a7e0c6661bc1ba82c9f03720e550749f2252a7c..826481af3d1d8b29bcdbd7d81c0fd5a85
> 9ecd9b0 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -5364,7 +5364,7 @@ vectorizable_reduction (gimple *stmt,
> gimple_stmt_iterator *gsi,
>auto_vec vect_defs;
>auto_vec phis;
>int vec_num;
> -  tree def0, def1, tem, op0, op1 = NULL_TREE;
> +  tree def0, def1, tem, op1 = NULL_TREE;
>bool first_p = true;
>tree cr_index_scalar_type = NULL_TREE, cr_index_vector_type = NULL_TREE;
>gimple *cond_expr_induction_def_stmt = NULL;
> @@ -5964,29 +5964,36 @@ vectorizable_reduction (gimple *stmt,
> gimple_stmt_iterator *gsi,
>/* Handle uses.  */
>if (j == 0)
>  {
> -  op0 = ops[!reduc_index];
> -  if (op_type == ternary_op)
> -{
> -  if (reduc_index == 0)
> -op1 = ops[2];
> -  else
> -op1 = ops[1];
> -}
> + if (slp_node)
> +   {
> + /* Get vec defs for all the operands except the reduction index,
> +   ensuring the ordering of the ops in the vector is kept.  */
> + auto_vec slp_ops;
> + auto_vec, 3> vec_defs;
>
> -  if (slp_node)
> -vect_get_vec_defs (op0, op1, stmt, &vec_oprnds0, &vec_oprnds1,
> -   slp_node, -1);
> + slp_ops.quick_push ((reduc_index == 0) ? NULL : ops[0]);
> + slp_ops.quick_push ((reduc_index == 1) ? NULL : ops[1]);
> + if (op_type == ternary_op)
> +   slp_ops.quick_push ((reduc_index == 2) ? NULL : ops[2]);
> +
> + vect_get_slp_defs (slp_ops, slp_node, &vec_defs, -1);
> +
> + vec_oprnds0.safe_splice (vec_defs[(reduc_index == 0) ? 1 : 0]);
> + if (op_type == ternary_op)
> +   vec_oprnds1.safe_splice (vec_defs[(reduc_index == 2) ? 1 : 
> 2]);
> +   }
>else
> -{
> +   {
>loop_vec_def0 = vect_get_vec_def_for_operand
> (ops[!reduc_index],
>  stmt);
>vec_oprnds0.quick_push (loop_vec_def0);
>if (op_type == ternary_op)
> {
> +op1 = (reduc_index == 0) ? ops[2] : ops[1];
>   loop_vec_def1 = vect_get_vec_def_for_operand (op1, stmt);
>   vec_oprnds1.quick_push (loop_vec_def1);
> }
> -}
> +   }
>  }
>else
>  {
> diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
> in

Re: [PATCH] gcov-tool: Do not segfault in merge operation (PR, gcov-profile/67097).

2016-08-15 Thread Martin Liška
On 08/08/2016 04:16 PM, Martin Liška wrote:
> Hi.
> 
> Following simple patch is a fix for $subject.
> 
> Ready for trunk?
> Martin
> 

Patch has been approved on IRC, installed as r239478.

Martin


[PATCH, PR70895] Add copy mapping for reductions on OpenACC loop directives

2016-08-15 Thread Chung-Lin Tang
Hi Jakub,
per the discussion on the bugzilla PR page, reductions on OpenACC loop
directives will automatically get a copy clause mapping on an enclosing
parallel construct (unless bounded by a local variable or an explicit
firstprivate clause).
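
As a user-level illustration of the intended behaviour (a sketch, not one of
the testsuite files; the function name sum_to is made up), a reduction on a
loop directive inside a parallel construct with no explicit copy(sum) clause
would look like this. Built without -fopenacc the pragmas are ignored and the
loop simply runs sequentially.

```cpp
// Sum the integers 1..n.  The parallel construct carries no explicit data
// clause for `sum`; with the mapping described above, the reduction result is
// expected to be copied back to the host variable.
int sum_to(int n) {
  int sum = 0;
#pragma acc parallel
  {
#pragma acc loop gang worker vector reduction(+:sum)
    for (int i = 1; i <= n; ++i)
      sum += i;
  }
  return sum;
}
```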

There is also a patch for the libgomp testsuite cases. Aside from the
Fortran case, which now needs explicit firstprivate clauses to work,
the other C/C++ cases have been adjusted to remove explicit copy clauses.
(I have not searched exhaustively to eliminate them all, though.)

This has been tested using gomp-4_0-branch, which is based on GCC 6,
which is what this PR was originally filed for.

I will be committing this soon for gomp-4_0-branch;
is this okay for gcc-6-branch and trunk as well?

Thanks,
Chung-Lin

2016-08-15  Chung-Lin Tang  

PR middle-end/70895
gcc/
* gimplify.c (omp_add_variable): Adjust/add variable mapping on
enclosing parallel construct for reduction variables on OpenACC loop
directives.

libgomp/
* testsuite/libgomp.oacc-fortran/reduction-7.f90: Add explicit
firstprivate clauses.
* testsuite/libgomp.oacc-c-c++-common/reduction-7.c: Remove explicit
copy clauses.
* testsuite/libgomp.oacc-c-c++-common/reduction-cplx-flt.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/reduction-flt.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/collapse-2.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-red-wv-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/collapse-4.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-red-v-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/reduction-cplx-dbl.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-red-g-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-red-gwv-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/reduction-dbl.c: Likewise.
Index: gcc/gimplify.c
===
--- gcc/gimplify.c	(revision 239471)
+++ gcc/gimplify.c	(working copy)
@@ -5897,6 +5897,37 @@ omp_add_variable (struct gimplify_omp_ctx *ctx, tr
 n->value |= flags;
   else
 splay_tree_insert (ctx->variables, (splay_tree_key)decl, flags);
+
+  /* For reductions clauses in OpenACC loop directives, by default create a
+ copy clause on the enclosing parallel construct for carrying back the
+ results.  */
+  if (ctx->region_type == ORT_ACC && (flags & GOVD_REDUCTION))
+{
+  struct gimplify_omp_ctx *outer_ctx = ctx->outer_context;
+  while (outer_ctx)
+	{
+	  n = splay_tree_lookup (outer_ctx->variables, (splay_tree_key)decl);
+	  if (n != NULL)
+	{
+	  /* Ignore local variables and explicitly declared clauses.  */
+	  if (n->value & (GOVD_LOCAL | GOVD_EXPLICIT))
+		break;
+	  else if (outer_ctx->region_type == ORT_ACC_PARALLEL)
+		{
+		  /* Remove firstprivate and make it a copy map.  */
+		  n->value &= ~GOVD_FIRSTPRIVATE;
+		  n->value |= GOVD_MAP;
+		}
+	}
+	  else if (outer_ctx->region_type == ORT_ACC_PARALLEL)
+	{
+	  splay_tree_insert (outer_ctx->variables, (splay_tree_key)decl,
+ GOVD_MAP | GOVD_SEEN);
+	  break;
+	}
+	  outer_ctx = outer_ctx->outer_context;
+	}
+}
 }
 
 /* Notice a threadprivate variable DECL used in OMP context CTX.
Index: libgomp/testsuite/libgomp.oacc-fortran/reduction-7.f90
===
--- libgomp/testsuite/libgomp.oacc-fortran/reduction-7.f90	(revision 239471)
+++ libgomp/testsuite/libgomp.oacc-fortran/reduction-7.f90	(working copy)
@@ -50,7 +50,7 @@ subroutine redsub_private(sum, n, arr)
 end subroutine redsub_private
 
 
-! Bogus reduction on an impliclitly firstprivate variable.  The results do
+! Bogus reduction on a firstprivate variable.  The results do
 ! survive the parallel region.  The goal here is to ensure that gfortran
 ! doesn't ICE.
 
@@ -58,7 +58,7 @@ subroutine redsub_bogus(sum, n)
   integer :: sum, n, arr(n)
   integer :: i
 
-  !$acc parallel
+  !$acc parallel firstprivate(sum)
   !$acc loop gang worker vector reduction (+:sum)
   do i = 1, n
  sum = sum + 1
@@ -72,7 +72,7 @@ subroutine redsub_combined(sum, n, arr)
   integer :: sum, n, arr(n)
   integer :: i, j
 
-  !$acc parallel copy (arr)
+  !$acc parallel copy (arr) firstprivate(sum)
   !$acc loop gang
   do i = 1, n
  sum = i;
Index: libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-v-1.c
===
--- libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-v-1.c	(revision 239471)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-v-1.c	(working copy)
@@ -12,7 +12,7 @@ int main ()
   int ondev = 0;
   int t = 0,  h = 0;
 
-#pragma acc parallel vector_length(32) copy(t) copy(ondev)
+#pragma acc parallel vector_length(32) copy(ondev)
   {

Re: [PATCH] gcov: add new option (--hash-names) (PR gcov-profile/36412).

2016-08-15 Thread Martin Liška
On 08/15/2016 12:47 PM, Nathan Sidwell wrote:
> On 08/09/16 10:32, Martin Liška wrote:
>> Hello.
>>
>> Following enhancement for gcov solves issues when we cannot create a file
>> due to a filesystem path length limit. Selected approach utilizes existing
>> md5sum functions.
>>
>> Patch survives make check -k RUNTESTFLAGS="gcov.exp" on x86_64-linux-gnu.
>>
>> Ready for trunk?
>> Thanks,
>> Martin
>>
> 
> + [@option{-e}|@option{--hash-names}]
> '--hash-filenames' would be better.  Let's not confuse the user into thinking
> that maybe the function names are hashed.  (Or perhaps '--hash-paths'?  The
> world's a little unclear on whether 'filename' means the last bit of the file
> path or the whole thing.)
> 
> 
> +/* For situations when a long name can potentially hit filesystem path limit,
> +   let's calculate md5sum of the patch and create file x.gcov##md5sum.gcov.  
> */
> +
> +static int flag_hash_names = 0;
> +
> s/patch/path.
> Which bit of 'x.gcov##md5sum.gcov' is the hash?  Is it 'x' or something
> else?  Perhaps this more detailed comment should be near where the filename is
> generated, and this flag just labelled as something like 'hash long pathnames'.
> 
> +  fnotice (file, "  -e, --hash-namesUse hash of file path in 
> "
> .. and ..
> +  { "long-file-names",  no_argument,   NULL, 'e' },

Hi Nathan.

All nits are addressed in the second version of the patch.

> 
> don't seem to match?  Why 'e'?

I've renamed it to -x; a lot of letters are already taken.
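
For reference, a minimal standalone sketch of keeping the long name, the short
letter and the short-option string in sync with getopt_long. This is not the
gcov.c code: parse_args and the reduced option table are written for the
example, and only the one option is handled.

```cpp
#include <getopt.h>

// Flag named after the option under discussion.
static int flag_hash_filenames = 0;

// One table entry per long option; the last field is the short letter that
// getopt_long returns, so it must also appear in the short-option string.
static const struct option long_options[] = {
  { "hash-filenames", no_argument, nullptr, 'x' },
  { nullptr, 0, nullptr, 0 }
};

// Parse argv; return 0 on success and -1 on an unknown option.
int parse_args(int argc, char** argv) {
  optind = 1;  // restart scanning so the function can be called repeatedly
  int opt;
  while ((opt = getopt_long(argc, argv, "x", long_options, nullptr)) != -1) {
    switch (opt) {
      case 'x':
        flag_hash_filenames = 1;
        break;
      default:
        return -1;
    }
  }
  return 0;
}
```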

Martin

> 
> nathan

From b923bc8d838cf1de01a97db8f5ea5c78519a782b Mon Sep 17 00:00:00 2001
From: marxin 
Date: Tue, 9 Aug 2016 16:27:10 +0200
Subject: [PATCH] gcov: add new option (--hash-filenames) (PR
 gcov-profile/36412).

gcc/ChangeLog:

2016-08-09  Martin Liska  

	PR gcov-profile/36412
	* doc/gcov.texi: Document --hash-filenames(-x).
	* gcov.c (print_usage): Add the option.
	(process_args): Process the option.
	(md5sum_to_hex): New function.
	(make_gcov_file_name): Do the md5sum and append it to a
	filename.
---
 gcc/doc/gcov.texi |  7 +++
 gcc/gcov.c| 48 ++--
 2 files changed, 53 insertions(+), 2 deletions(-)

diff --git a/gcc/doc/gcov.texi b/gcc/doc/gcov.texi
index df58df8..1737416 100644
--- a/gcc/doc/gcov.texi
+++ b/gcc/doc/gcov.texi
@@ -133,6 +133,7 @@ gcov [@option{-v}|@option{--version}] [@option{-h}|@option{--help}]
  [@option{-r}|@option{--relative-only}]
  [@option{-s}|@option{--source-prefix} @var{directory}]
  [@option{-u}|@option{--unconditional-branches}]
+ [@option{-x}|@option{--hash-filenames}]
  @var{files}
 @c man end
 @c man begin SEEALSO
@@ -278,6 +279,12 @@ branch:28,nottaken
 Display demangled function names in output. The default is to show
 mangled function names.
 
+@item -x
+@itemx --hash-filenames
+For situations when a long name can potentially hit filesystem path limit,
+let's calculate md5sum of the path and create file
+@file{source_file.c##.gcov}.
+
 @end table
 
 @command{gcov} should be run with the current directory the same as that
diff --git a/gcc/gcov.c b/gcc/gcov.c
index 30fc167..614f371 100644
--- a/gcc/gcov.c
+++ b/gcc/gcov.c
@@ -43,6 +43,7 @@ along with Gcov; see the file COPYING3.  If not see
 
 #include 
 #include 
+#include "md5.h"
 
 using namespace std;
 
@@ -359,6 +360,11 @@ static int flag_demangled_names = 0;
 
 static int flag_long_names = 0;
 
+/* For situations when a long name can potentially hit filesystem path limit,
+   let's calculate md5sum of the path and append it to a file name.  */
+
+static int flag_hash_filenames = 0;
+
 /* Output count information for every basic block, not merely those
that contain line number information.  */
 
@@ -667,6 +673,7 @@ print_usage (int error_p)
   fnotice (file, "  -s, --source-prefix DIR Source prefix to elide\n");
   fnotice (file, "  -u, --unconditional-branchesShow unconditional branch counts too\n");
   fnotice (file, "  -v, --version   Print version number, then exit\n");
+  fnotice (file, "  -x, --hash-filenamesHash long pathnames\n");
   fnotice (file, "\nFor bug reporting instructions, please see:\n%s.\n",
 	   bug_report_url);
   exit (status);
@@ -706,6 +713,7 @@ static const struct option options[] =
   { "source-prefix",required_argument, NULL, 's' },
   { "unconditional-branches", no_argument, NULL, 'u' },
   { "display-progress", no_argument,   NULL, 'd' },
+  { "hash-filenames",	no_argument,   NULL, 'x' },
   { 0, 0, 0, 0 }
 };
 
@@ -716,8 +724,8 @@ process_args (int argc, char **argv)
 {
   int opt;
 
-  while ((opt = getopt_long (argc, argv, "abcdfhilmno:s:pruv", options, NULL)) !=
- -1)
+  const char *opts = "abcdfhilmno:s:pruvx";
+  while ((opt = getopt_long (argc, argv, opts, options, NULL)) != -1)
 {
   switch (opt)
 	{
@@ -770,6 +778,9 @@ process_args (int argc, char **argv)
   break;
 	case 'v':
 	  print_version ();
+	case 'x':
+

Re: [PATCH, COMMITTED] Add branch_changer.py script to maintainer-scripts

2016-08-15 Thread Martin Liška
On 08/14/2016 06:35 PM, Gerald Pfeifer wrote:
> The one thing that would be really good to add is documentation on
> the usage of the script and/or examples for relevant invocations.
> 
> Thanks,
> Gerald

Good idea; I'm sending a second version of the patch, to which I added usage samples.

Ready to be installed?
Martin
From f56db0ac860fd92c0f4112119a05038ea9de7ce9 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Fri, 12 Aug 2016 15:27:29 +0200
Subject: [PATCH] Document branch_changer.py script

maintainer-scripts/ChangeLog:

2016-08-12  Martin Liska  

	* branch_changer.py: Describe the script. Add sample usage
	of the script.
---
 maintainer-scripts/branch_changer.py | 58 
 1 file changed, 58 insertions(+)

diff --git a/maintainer-scripts/branch_changer.py b/maintainer-scripts/branch_changer.py
index 5e1681b..607d9ac 100755
--- a/maintainer-scripts/branch_changer.py
+++ b/maintainer-scripts/branch_changer.py
@@ -1,9 +1,67 @@
 #!/usr/bin/env python3
 
+# Script is used by maintainers to modify bugzilla entries in a batch mode.
+# Currently, the scripts can remove and add a release from/to PRs that are
+# prefixed with '[x Regression]'. Apart from that, the script can also
+# change a target milestone and optionally enhance list of known-to-fail
+# versions.
+#
+# The script utilizes the Bugzilla API, as documented here:
+# http://bugzilla.readthedocs.io/en/latest/api/index.html
+
 # The script requires simplejson, requests, semantic_version packages, in case
 # of openSUSE:
 # zypper in python3-simplejson python3-requests
 # pip3 install semantic_version
+#
+# Sample usages of the script:
+#
+# $ ./maintainer-scripts/branch_changer.py api_key --new-target-milestone=6.2:6.3 --comment '6.2 has been released' --add-known-to-fail=6.2 --limit 3
+# PR28628 (Not forcing alignment of arrays in structs  with -fsection-anchors)
+#   changing target milestone: "6.2" to "6.3" (same branch)
+#   adding comment: "6.2 has been released"
+#   changing known_to_fail: "" to "6.2"
+# PR35514 (Gcc shoud generate symbol type for undefined symbol)
+#   changing target milestone: "6.2" to "6.3" (same branch)
+#   adding comment: "6.2 has been released"
+#   changing known_to_fail: "" to "6.2"
+# PR39589 (make -Wmissing-field-initializers=2 work with "designated initializers" ?)
+#   changing target milestone: "6.2" to "6.3" (same branch)
+#   adding comment: "6.2 has been released"
+#   changing known_to_fail: "" to "6.2"
+#
+# Modified PRs: 3
+#
+# $ ./maintainer-scripts/branch_changer.py api_key --new-target-milestone=5.5:6.3 --comment 'GCC 5 branch is being closed' --remove 5 --limit 3
+# PR8270 ([5/6/7 Regression] back-slash white space newline with comments, no warning)
+#   changing summary: "[5/6/7 Regression] back-slash white space newline with comments, no warning" to "[6/7 Regression] back-slash white space newline with comments, no warning"
+# PR12245 ([5/6/7 regression] Uses lots of memory when compiling large initialized arrays)
+#   changing summary: "[5/6/7 regression] Uses lots of memory when compiling large initialized arrays" to "[6/7 regression] Uses lots of memory when compiling large initialized arrays"
+# PR14179 ([5/6/7 Regression] out of memory while parsing array with many initializers)
+#   changing summary: "[5/6/7 Regression] out of memory while parsing array with many initializers" to "[6/7 Regression] out of memory while parsing array with many initializers"
+#
+# Modified PRs: 3
+# PR8270 ([5/6/7 Regression] back-slash white space newline with comments, no warning)
+#   changing target milestone: "5.5" to "6.3" (regresses with the new milestone)
+#   adding comment: "GCC 5 branch is being closed"
+# PR12245 ([5/6/7 regression] Uses lots of memory when compiling large initialized arrays)
+#   changing target milestone: "5.5" to "6.3" (regresses with the new milestone)
+#   adding comment: "GCC 5 branch is being closed"
+# PR14179 ([5/6/7 Regression] out of memory while parsing array with many initializers)
+#   changing target milestone: "5.5" to "6.3" (regresses with the new milestone)
+#   adding comment: "GCC 5 branch is being closed"
+#
+# Modified PRs: 3
+#
+# $ ./maintainer-scripts/branch_changer.py api_key --add=7:8 --limit 3
+# PR8270 ([5/6/7 Regression] back-slash white space newline with comments, no warning)
+#   changing summary: "[5/6/7 Regression] back-slash white space newline with comments, no warning" to "[5/6/7/8 Regression] back-slash white space newline with comments, no warning"
+# PR12245 ([5/6/7 regression] Uses lots of memory when compiling large initialized arrays)
+#   changing summary: "[5/6/7 regression] Uses lots of memory when compiling large initialized arrays" to "[5/6/7/8 regression] Uses lots of memory when compiling large initialized arrays"
+# PR14179 ([5/6/7 Regression] out of memory while parsing array with many initializers)
+#   changing summary: "[5/6/7 Regression] out of memory while parsing array with many initialize

Re: [PATCH] Extend -falign-FOO=N to N[,M]: the second number is max padding

2016-08-15 Thread Denys Vlasenko

On 08/15/2016 11:45 AM, Richard Biener wrote:

>> Thus. For this CPU, alignment of loops to 8 bytes is wrong: it helps if it
>> happens to align a loop to 16 bytes, but it may in fact hurt performance if
>> it happens to align a loop to 16+8 bytes and this pushes loop's body end
>> over the next 16-byte boundary, as it happens in the above example.
>>
>> I suspect something similar was seen sometime ago on a different, earlier
>> CPU, and on _that_ CPU decoder/loop buffer idiosyncrasies are such that it
>> likes 8 byte alignment.
>>
>> It's not true that such alignment is always a win.
>
> It looks to me that all you want is to drop the 8-byte alignment on
> entities that are smaller than a cacheline.


I don't think it can be simplified to this.

An example. A loop 122 bytes long fits into either two or three 64-byte
cachelines, depending on where it starts. If it starts in bytes 0..6 of a
cacheline, it fits into two cachelines. If it starts 7 or more bytes into a
cacheline, it doesn't fit.

8-byte alignment is worse for such a loop than not doing it.
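
To make the arithmetic concrete, here is a minimal sketch (not part of the
patch; the helper names cachelines_touched and padding_to are made up for
illustration). It counts how many lines a code region touches and how much
padding an alignment request inserts. By this count a 122-byte body stays
within two 64-byte lines for start offsets up to 128 - 122 = 6.

```c
#include <assert.h>

/* Number of cachelines touched by a code region of LEN bytes that starts
   START bytes into a line of LINE bytes.  Hypothetical helper.  */
static unsigned
cachelines_touched (unsigned start, unsigned len, unsigned line)
{
  assert (len > 0 && line > 0 && start < line);
  /* The last byte of the region sits at offset START + LEN - 1.  */
  return (start + len - 1) / line + 1;
}

/* Padding inserted in front of a region at offset START when it is
   aligned to ALIGN bytes.  Hypothetical helper.  */
static unsigned
padding_to (unsigned start, unsigned align)
{
  assert (align > 0);
  return (align - start % align) % align;
}
```

For instance, a loop that would start at offset 3 touches two lines; padding
it to the next 8-byte boundary moves it to offset 8, where the same 122 bytes
touch three.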

It's even worse for the use case which prompted me to create these patches:
-falign-functions. Linux kernel people want to align all functions
to 64 bytes, but only if the necessary padding is, say, 9 bytes or less.
The rationale is that function calls are often "cold", i.e. function body
is not in L1, and it would be even slower if first insn(s) would require
two L1 loads, not one, to be decoded.

Hence -falign-functions=64,10. This would be a very efficient packing:
only ~15% of all functions would need any padding (the remaining 85%
would start 10 or more bytes before end of cacheline and thus need
no padding), and among those 15% the average padding length would be
only 5 bytes. With very small code size increase, we'd gain a lot
in speed.
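
The ~15% / 5 bytes estimate can be checked with a small sketch. Two
assumptions here are mine: function start offsets are taken to be uniformly
distributed within a cacheline, and "64,10" is read as "pad by at most 9
bytes", following the "9 bytes or less" wording above. The names pad_stats
and count_padding are made up for illustration.

```c
#include <assert.h>

struct pad_stats
{
  unsigned padded;         /* start offsets that receive padding */
  unsigned total_padding;  /* sum of padding bytes over those offsets */
};

/* For every possible start offset within an ALIGN-byte line, pad up to the
   next boundary only if that takes at most MAX_PAD bytes.  */
static struct pad_stats
count_padding (unsigned align, unsigned max_pad)
{
  struct pad_stats s = { 0, 0 };
  assert (align > 0);
  for (unsigned off = 0; off < align; off++)
    {
      unsigned pad = (align - off) % align;
      if (pad != 0 && pad <= max_pad)
        {
          s.padded++;
          s.total_padding += pad;
        }
    }
  return s;
}
```

With count_padding (64, 9), 9 of the 64 offsets (about 14%) get padded, and
the padding among those sums to 45 bytes, i.e. 5 bytes on average.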

This nice optimistic picture is currently destroyed by unnecessary
and not-asked-for "subalignment" to 8 bytes, which now adds 4.5 bytes
of padding on average *to every function*, as a "bonus" making
it *less* efficient versus instruction fetch, not more efficient!


IOW: I am proposing to remove this code because it seems arbitrary: it helped
on one particular CPU model, and maybe only on some particular benchmarks.
On other CPUs, or in other scenarios, it's harmful.
It should not be done now for all CPUs and all programs.

If there is value in the ability to do a "subalignment" within a larger
alignment, maybe we can make it a separate option, and let the user specify it
if they want?


Re: [v3 PATCH] Implement LWG 2744 and LWG 2754.

2016-08-15 Thread Ville Voutilainen
On 15 August 2016 at 12:53, Jonathan Wakely  wrote:
>> +  template
>> +struct __is_in_place_impl : false_type
>> +{ };
>> +
>> +  template
>> +  struct __is_in_place_impl> : true_type
> Indentation nit.

Will fix.

>
>> +{ };
>> +
>> +  template
>> +struct __is_in_place
>> +: public
>> __is_in_place_impl>>
>
>
> Any reason not to use decay_t here? In all the cases where decay is
> different to stripping references and cv-qualifiers the result will be
> false either way.
>
> I wouldn't have bothered with the std:: qualification either, but it's
> fine as it is.


The reason for not using decay and for the qualification is that the trait is
closely related to its copy-paste origin, which is __is_optional. ;) I should
also add a test for the case where one attempts to pass in_place through the
ValueType&& parameter, aka a case where a type is not default-constructible
but is constructible from in_place, and in_place-construction is used.


Re: [PATCH] gcov: add new option (--hash-names) (PR gcov-profile/36412).

2016-08-15 Thread Nathan Sidwell

On 08/15/16 07:43, Martin Liška wrote:


All nits are applied in the second version of patch.



don't seem to match?  Why 'e'?


I've renamed it to -x, well, a lot of letters are already occupied.


I guess 'x' may be better.  If there is no good choice, do we really need a
short name? (There didn't seem to be a good letter available.  I'm ambivalent
so will leave it to you.)


gcc/doc/gcov.texi
+For situations when a long name can potentially hit filesystem path limit,
+let's calculate md5sum of the path and create file

Sorry I missed this first time round.  The language is straight from the
comment you added, so not really suitable for user documentation.  Add a bit
of backstory.  How about:


 "By default, gcov uses the full pathname of the source files to create an
output filename.  This can lead to long filenames that can overflow filesystem
limits.  This option creates names of the form
@file{@var{source-file}##@var{md5}.gcov}, where the @var{source-file}
component is the final filename part and the @var{md5} component is calculated
from the full mangled name that would have been used otherwise."
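
For illustration only, a minimal sketch of the naming scheme described above.
It is not gcov's code: a 32-bit FNV-1a hash stands in for md5 to keep the
example short, the hash is taken over the plain path rather than gcov's
mangled name, and fnv1a and make_hashed_name are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* 32-bit FNV-1a, used here only as a stand-in for md5.  */
static uint32_t
fnv1a (const char *s)
{
  uint32_t h = 2166136261u;
  assert (s != NULL);
  for (; *s; s++)
    {
      h ^= (unsigned char) *s;
      h *= 16777619u;
    }
  return h;
}

/* Write "<final path component>##<hash of full path>.gcov" into OUT.
   Returns what snprintf returns, i.e. the length the name needs.  */
static int
make_hashed_name (const char *path, char *out, size_t out_size)
{
  const char *base = strrchr (path, '/');
  base = base ? base + 1 : path;
  return snprintf (out, out_size, "%s##%08x.gcov", base,
                   (unsigned) fnv1a (path));
}
```

The point of the scheme is that the output length depends only on the final
path component plus a fixed-size hash, no matter how deep the source tree is.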


+  const char *opts = "abcdfhilmno:s:pruvx";
Could you fix the alphabetization while you're there? (s:pr)

+  /* With -x flag, file names will be in format:
+ source_file.c##.gcov.  */

The comment shouldn't really mention the option name.  Use some indirection to
avoid another place that could get inconsistent :)


/* When hashing filenames, we shorten them by only using the filename
   component and appending a hash of the full (mangled) pathname.  */

nathan


Re: [PATCH build/doc] Replacing libiberty with gnulib

2016-08-15 Thread ayush goel
Included gnulib’s config.h header file inside gcc’s config.h itself as
per the discussions.

Built and tested the system.

PFA the patch



2016-08-14 Ayush Goel 

* Makefile.def: Added gnulib as build & host library and dependency of
all-gcc on gnulib
* Makefile.in: regenerated
* configure.ac: Added gnulib as build and host library
* configure: regenerated
* gcc/Makefile.in: Added path to gnulib static library (libgnu.a) and
gnulib header files
* gcc/mkconfig.sh: Included gnulib’s config.h
* gcc/doc/sourcebuild.texi: Added gnulib and how to use the update
script to update/import gnu lib modules
* gnulib: created directory
* gnulib/Makefile.in: new file
* gnulib/configure.ac: new file
* gnulib/update-gnulib.sh: script to import gnulib modules using gnulib-tool
* gnulib/import: created by update-gnulib.sh
* gnulib/import/Makefile.in: imported from gnulib
* gnulib/import/alignof.h: Imported from gnulib
* gnulib/import/exitfail.c: Imported from gnulib
* gnulib/import/exitfail.h: Imported from gnulib
* gnulib/import/extra: Imported from gnulib
* gnulib/import/extra/snippet: Imported from gnulib
* gnulib/import/extra/snippet/_Noreturn.h: Imported from gnulib
* gnulib/import/extra/snippet/arg-nonnull.h: Imported from gnulib
* gnulib/import/extra/snippet/c++defs.h: Imported from gnulib
* gnulib/import/extra/snippet/warn-on-use.h: Imported from gnulib
* gnulib/import/gettext.h: Imported from gnulib
* gnulib/import/m4: Imported from gnulib
* gnulib/import/m4/00gnulib.m4: Imported from gnulib
* gnulib/import/m4/absolute-header.m4: Imported from gnulib
* gnulib/import/m4/extern-inline.m4: Imported from gnulib
* gnulib/import/m4/gnulib-cache.m4: Imported from gnulib
* gnulib/import/m4/gnulib-common.m4: Imported from gnulib
* gnulib/import/m4/gnulib-comp.m4: Imported from gnulib
* gnulib/import/m4/gnulib-tool.m4: Imported from gnulib
* gnulib/import/m4/include_next.m4: Imported from gnulib
* gnulib/import/m4/longlong.m4: Imported from gnulib
* gnulib/import/m4/multiarch.m4: Imported from gnulib
* gnulib/import/m4/obstack.m4: Imported from gnulib
* gnulib/import/m4/off_t.m4: Imported from gnulib
* gnulib/import/m4/ssize_t.m4: Imported from gnulib
* gnulib/import/m4/stddef_h.m4: Imported from gnulib
* gnulib/import/m4/stdint.m4: Imported from gnulib
* gnulib/import/m4/stdlib_h.m4: Imported from gnulib
* gnulib/import/m4/sys_types_h.m4: Imported from gnulib
* gnulib/import/m4/unistd_h.m4: Imported from gnulib
* gnulib/import/m4/warn-on-use.m4: Imported from gnulib
* gnulib/import/m4/wchar_t.m4: Imported from gnulib
* gnulib/import/obstack.c: Imported from gnulib
* gnulib/import/obstack.h: Imported from gnulib
* gnulib/import/stddef.in.h: Imported from gnulib
* gnulib/import/stdint.in.h: Imported from gnulib
* gnulib/import/stdlib.in.h: Imported from gnulib
* gnulib/import/sys: Imported from gnulib
* gnulib/import/sys_types.in.h: Imported from gnulib
* gnulib/import/unistd.c: Imported from gnulib
* gnulib/import/unistd.in.h: Imported from gnulib
* gnulib/stamp-h1: generated



-Ayush

On 10 August 2016 at 11:27:49 PM, Pedro Alves (pal...@redhat.com) wrote:
> 20-7-16 Ayush Goel
>
> * Makefile.def: Added gnulib as build & host library and dependency of
> all-gcc on gnulib
> * Makefile.in: regenerated
> * configure.ac: Added gnulib as build and host library
> * configure: regenerated
> * gcc/Makefile.in: Added path to gnulib static library (libgnu.a) and
> gnulib header files
> * gcc/doc/sourcebuild.texi: Added gnulib and how to use the update
> script to update/import gnu lib modules
> * gnulib: created directory
> * gnulib/Makefile.in: new file
> * gnulib/configure.ac: new file
> * gnulib/update-gnulib.sh: script to import gnulib modules using gnulib-tool
> * gnulib/import: created by update-gnulib.sh
> * gnulib/import/Makefile.in: imported from gnulib
> * gnulib/import/alignof.h: Imported from gnulib
> * gnulib/import/exitfail.c: Imported from gnulib
> * gnulib/import/exitfail.h: Imported from gnulib
> * gnulib/import/extra: Imported from gnulib
> * gnulib/import/extra/snippet: Imported from gnulib
> * gnulib/import/extra/snippet/_Noreturn.h: Imported from gnulib
> * gnulib/import/extra/snippet/arg-nonnull.h: Imported from gnulib
> * gnulib/import/extra/snippet/c++defs.h: Imported from gnulib
> * gnulib/import/extra/snippet/warn-on-use.h: Imported from gnulib
> * gnulib/import/gettext.h: Imported from gnulib
> * gnulib/import/m4: Imported from gnulib
> * gnulib/import/m4/00gnulib.m4: Imported from gnulib
> * gnulib/import/m4/absolute-header.m4: Imported from gnulib
> * gnulib/import/m4/extern-inline.m4: Imported from gnulib
> * gnulib/import/m4/gnulib-cache.m4: Imported from gnulib
> * gnulib/import/m4/gnulib-common.m4: Imported from gnulib
> * gnulib/import/m4/gnulib-comp.m4: Imported from gnulib
> * gnulib/import/m4/gnulib-tool.m4: Imported from gnulib
> * gnulib/import/m4/include_next.m4: Imported from gnulib
> * gnulib/import/m4/longlong.m4: Importe

Handle preprocessor constants with -fdump-ada-spec

2016-08-15 Thread Eric Botcazou
This makes -fdump-ada-spec dump preprocessor macros defined as integers and 
strings, in the form of Ada constants without declared type.

Tested on x86_64-suse-linux, applied on the mainline.
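
To give an idea of the integer mapping, here is a standalone sketch, separate
from the patch below and simplified: it assumes a NUL-terminated preprocessing
number, drops U/L suffixes, and copies anything containing a '.' unchanged.
The names copy_digits, based_literal and c_number_to_ada are made up for this
example and do not exist in c-ada-spec.c.

```c
#include <assert.h>
#include <string.h>

/* Copy NUM up to the first U/u/L/l suffix character; return the new end.  */
static char *
copy_digits (const char *num, char *out)
{
  while (*num != '\0' && strchr ("uUlL", *num) == NULL)
    *out++ = *num++;
  return out;
}

/* Emit an Ada based literal BASE#DIGITS#; return the new end.  */
static char *
based_literal (const char *base, const char *digits, char *out)
{
  size_t n = strlen (base);
  memcpy (out, base, n);
  out += n;
  *out++ = '#';
  out = copy_digits (digits, out);
  *out++ = '#';
  return out;
}

/* Convert the C number NUM into Ada syntax in OUT, which must have room
   for at least strlen (NUM) + 4 bytes.  */
static void
c_number_to_ada (const char *num, char *out)
{
  assert (num != NULL && out != NULL);
  if (num[0] != '0' || num[1] == '\0'
      || strchr ("uUlL", num[1]) != NULL || strchr (num, '.') != NULL)
    out = copy_digits (num, out);              /* decimal, zero, floating */
  else if (num[1] == 'x' || num[1] == 'X')
    out = based_literal ("16", num + 2, out);  /* 0x1F  -> 16#1F#  */
  else if (num[1] == 'b' || num[1] == 'B')
    out = based_literal ("2", num + 2, out);   /* 0b101 -> 2#101#  */
  else
    out = based_literal ("8", num + 1, out);   /* 017   -> 8#17#   */
  *out = '\0';
}
```

So a macro defined as 0x1F would come out as 16#1F#, 017 as 8#17#, and 10UL
as plain 10.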


2016-08-16  Eric Botcazou  
Arnaud Charlet  

c-family/
* c-ada-spec.c (dump_number): New function.
(handle_escape_character): Likewise.
(print_ada_macros): Add handling of constant integers and strings.


2016-08-16  Eric Botcazou  

* c-c++-common/dump-ada-spec-5.c: New test.

-- 
Eric Botcazou
Index: c-ada-spec.c
===
--- c-ada-spec.c	(revision 239324)
+++ c-ada-spec.c	(working copy)
@@ -116,6 +116,58 @@ macro_length (const cpp_macro *macro, int *support
   (*buffer_len)++;
 }
 
+/* Dump all digits/hex chars from NUMBER to BUFFER and return a pointer
+   to the character after the last character written.  */
+
+static unsigned char *
+dump_number (unsigned char *number, unsigned char *buffer)
+{
+  while (*number != '\0'
+	 && *number != 'U'
+	 && *number != 'u'
+	 && *number != 'l'
+	 && *number != 'L')
+*buffer++ = *number++;
+
+  return buffer;
+}
+
+/* Handle escape character C and convert to an Ada character into BUFFER.
+   Return a pointer to the character after the last character written, or
+   NULL if the escape character is not supported.  */
+
+static unsigned char *
+handle_escape_character (unsigned char *buffer, char c)
+{
+  switch (c)
+{
+  case '"':
+	*buffer++ = '"';
+	*buffer++ = '"';
+	break;
+
+  case 'n':
+	strcpy ((char *) buffer, "\" & ASCII.LF & \"");
+	buffer += 16;
+	break;
+
+  case 'r':
+	strcpy ((char *) buffer, "\" & ASCII.CR & \"");
+	buffer += 16;
+	break;
+
+  case 't':
+	strcpy ((char *) buffer, "\" & ASCII.HT & \"");
+	buffer += 16;
+	break;
+
+  default:
+	return NULL;
+}
+
+  return buffer;
+}
+
 /* Dump into PP a set of MAX_ADA_MACROS MACROS (C/C++) as Ada constants when
possible.  */
 
@@ -132,7 +184,7 @@ print_ada_macros (pretty_printer *pp, cpp_hashnode
   int supported = 1, prev_is_one = 0, buffer_len, param_len;
   int is_string = 0, is_char = 0;
   char *ada_name;
-  unsigned char *s, *params, *buffer, *buf_param, *char_one = NULL;
+  unsigned char *s, *params, *buffer, *buf_param, *char_one = NULL, *tmp;
 
   macro_length (macro, &supported, &buffer_len, &param_len);
   s = buffer = XALLOCAVEC (unsigned char, buffer_len);
@@ -246,8 +298,6 @@ print_ada_macros (pretty_printer *pp, cpp_hashnode
 		  case CPP_CHAR32:
 		  case CPP_UTF8CHAR:
 		  case CPP_NAME:
-		  case CPP_STRING:
-		  case CPP_NUMBER:
 		if (!macro->fun_like)
 		  supported = 0;
 		else
@@ -254,6 +304,27 @@ print_ada_macros (pretty_printer *pp, cpp_hashnode
 		  buffer = cpp_spell_token (parse_in, token, buffer, false);
 		break;
 
+		  case CPP_STRING:
+		is_string = 1;
+		{
+		  const unsigned char *s = token->val.str.text;
+
+		  for (; *s; s++)
+			if (*s == '\\')
+			  {
+			s++;
+			buffer = handle_escape_character (buffer, *s);
+			if (buffer == NULL)
+			  {
+supported = 0;
+break;
+			  }
+			  }
+			else
+			  *buffer++ = *s;
+		}
+		break;
+
 		  case CPP_CHAR:
 		is_char = 1;
 		{
@@ -278,6 +349,72 @@ print_ada_macros (pretty_printer *pp, cpp_hashnode
 		}
 		break;
 
+		  case CPP_NUMBER:
+		tmp = cpp_token_as_text (parse_in, token);
+
+		switch (*tmp)
+		  {
+			case '0':
+			  switch (tmp[1])
+			{
+			  case '\0':
+			  case 'l':
+			  case 'L':
+			  case 'u':
+			  case 'U':
+*buffer++ = '0';
+break;
+
+			  case 'x':
+			  case 'X':
+*buffer++ = '1';
+*buffer++ = '6';
+*buffer++ = '#';
+buffer = dump_number (tmp + 2, buffer);
+*buffer++ = '#';
+break;
+
+			  case 'b':
+			  case 'B':
+*buffer++ = '2';
+*buffer++ = '#';
+buffer = dump_number (tmp + 2, buffer);
+*buffer++ = '#';
+break;
+
+			  default:
+/* Dump floating constants unmodified.  */
+if (strchr ((const char *)tmp, '.'))
+  buffer = dump_number (tmp, buffer);
+else
+  {
+*buffer++ = '8';
+*buffer++ = '#';
+buffer = dump_number (tmp + 1, buffer);
+*buffer++ = '#';
+  }
+break;
+			}
+			  break;
+
+			case '1':
+			  if (tmp[1] == '\0' || tmp[1] == 'l' || tmp[1] == 'u'
+			  || tmp[1] == 'L' || tmp[1] == 'U')
+			{
+			  is_one = 1;
+			  char_one = buffer;
+			  *buffer++ = '1';
+			}
+			  else
+			buffer = dump_number (tmp, buffer);
+			  break;
+
+			default:
+			  buffer = dump_number (tmp, buffer);
+			  break;
+		  }
+		break;
+
 		  case CPP_LSHIFT:
 		if (prev_is_one)
 		  {
/* { dg-do compile } */
/* { dg-options "-fdump-ada-spec" } */

#define not_octal_constant 0.627

extern double foo (double);

/* { dg-final { scan-ada-spec-not "unsupported macro"

[PATCH] Fix PR76490

2016-08-15 Thread Richard Biener

The following fixes PR76490, which happens because of how VRP expects
+INF vs. +INF(OVF) to behave wrt comparisons.  I fixed all
operand_equal_p cases that matter.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2016-08-15  Richard Biener  

PR tree-optimization/76490
* tree-vrp.c (value_range_constant_singleton): Use vrp_operand_equal_p
to handle overflow max/min correctly.
(vrp_valueize): Likewise.
(union_ranges): Likewise.
(intersect_ranges): Likewise.

* gfortran.fortran-torture/compile/pr76490.f90: New testcase.

Index: gcc/tree-vrp.c
===
*** gcc/tree-vrp.c  (revision 239460)
--- gcc/tree-vrp.c  (working copy)
*** static tree
*** 1422,1428 
  value_range_constant_singleton (value_range *vr)
  {
if (vr->type == VR_RANGE
!   && operand_equal_p (vr->min, vr->max, 0)
&& is_gimple_min_invariant (vr->min))
  return vr->min;
  
--- 1423,1429 
  value_range_constant_singleton (value_range *vr)
  {
if (vr->type == VR_RANGE
!   && vrp_operand_equal_p (vr->min, vr->max)
&& is_gimple_min_invariant (vr->min))
  return vr->min;
  
*** vrp_valueize (tree name)
*** 7028,7035 
  {
value_range *vr = get_value_range (name);
if (vr->type == VR_RANGE
! && (vr->min == vr->max
! || operand_equal_p (vr->min, vr->max, 0)))
return vr->min;
  }
return name;
--- 7006,7012 
  {
value_range *vr = get_value_range (name);
if (vr->type == VR_RANGE
! && vrp_operand_equal_p (vr->min, vr->max))
return vr->min;
  }
return name;
*** union_ranges (enum value_range_type *vr0
*** 7995,8002 
  enum value_range_type vr1type,
  tree vr1min, tree vr1max)
  {
!   bool mineq = operand_equal_p (*vr0min, vr1min, 0);
!   bool maxeq = operand_equal_p (*vr0max, vr1max, 0);
  
/* [] is vr0, () is vr1 in the following classification comments.  */
if (mineq && maxeq)
--- 7972,7979 
  enum value_range_type vr1type,
  tree vr1min, tree vr1max)
  {
!   bool mineq = vrp_operand_equal_p (*vr0min, vr1min);
!   bool maxeq = vrp_operand_equal_p (*vr0max, vr1max);
  
/* [] is vr0, () is vr1 in the following classification comments.  */
if (mineq && maxeq)
*** intersect_ranges (enum value_range_type
*** 8266,8273 
  enum value_range_type vr1type,
  tree vr1min, tree vr1max)
  {
!   bool mineq = operand_equal_p (*vr0min, vr1min, 0);
!   bool maxeq = operand_equal_p (*vr0max, vr1max, 0);
  
/* [] is vr0, () is vr1 in the following classification comments.  */
if (mineq && maxeq)
--- 8243,8250 
  enum value_range_type vr1type,
  tree vr1min, tree vr1max)
  {
!   bool mineq = vrp_operand_equal_p (*vr0min, vr1min);
!   bool maxeq = vrp_operand_equal_p (*vr0max, vr1max);
  
/* [] is vr0, () is vr1 in the following classification comments.  */
if (mineq && maxeq)
Index: gcc/testsuite/gfortran.fortran-torture/compile/pr76490.f90
===
*** gcc/testsuite/gfortran.fortran-torture/compile/pr76490.f90  (revision 0)
--- gcc/testsuite/gfortran.fortran-torture/compile/pr76490.f90  (working copy)
***
*** 0 
--- 1,23 
+ program membug
+ call bug1()
+ end program membug
+ subroutine unknown(x1,y1,ibig)
+write(*,*)x1,y1,ibig
+ end subroutine unknown
+ subroutine bug1()
+ real arrayq(3000)
+isize=0
+ibig=-1
+x2=0
+ 10 continue   
+isize=isize+1
+arrayq(isize)=x2
+ 15 continue
+call unknown(x1,y1,ibig)
+if(ibig.eq.1)then
+   goto 10
+elseif(ibig.eq.2)then   
+   isize=max(1,isize-1)
+   goto 15
+endif
+ end subroutine bug1


[PATCH] Add verifier for virtual SSA form

2016-08-15 Thread Richard Biener

This adds a verifier that makes sure no overlapping life-ranges occur
for virtuals.
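
As an aside, the idea of the walk can be shown on a toy model. This is a
sketch only, not GCC code: struct stmt, struct block and NO_NAME are invented
for the example, virtual operands are plain integers, and the PHI argument is
stored on the source side of each edge instead of in the PHI node.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_STMTS 4
#define MAX_SUCCS 2
#define NO_NAME (-1)

struct stmt
{
  int vuse;  /* virtual operand read, or NO_NAME */
  int vdef;  /* virtual operand written, or NO_NAME */
};

struct block
{
  int phi_def;              /* result of the virtual PHI, or NO_NAME */
  int n_stmts;
  struct stmt stmts[MAX_STMTS];
  int n_succs;
  int succs[MAX_SUCCS];     /* successor block indices */
  int phi_args[MAX_SUCCS];  /* value fed to the PHI of succs[i] */
};

/* Walk the toy CFG from block BB.  CURRENT_VDEF is the only virtual name
   that may be live here; any other VUSE means two live ranges overlap.
   Returns true if an inconsistency was found.  */
static bool
verify_vssa (const struct block *cfg, int bb, int current_vdef,
             bool *visited)
{
  if (visited[bb])
    return false;
  visited[bb] = true;

  const struct block *b = &cfg[bb];
  if (b->phi_def != NO_NAME)
    current_vdef = b->phi_def;

  bool err = false;
  for (int i = 0; i < b->n_stmts; i++)
    {
      const struct stmt *s = &b->stmts[i];
      if (s->vuse == NO_NAME)
        continue;
      if (s->vuse != current_vdef)
        err = true;
      if (s->vdef != NO_NAME)
        current_vdef = s->vdef;
    }

  for (int i = 0; i < b->n_succs; i++)
    {
      int dest = b->succs[i];
      /* The PHI in DEST must receive the name live at the end of BB.  */
      if (cfg[dest].phi_def != NO_NAME && b->phi_args[i] != current_vdef)
        err = true;
      err |= verify_vssa (cfg, dest, current_vdef, visited);
    }
  return err;
}
```

A statement that still reads an older virtual name after a later store has
defined a new one is what the check flags.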

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2016-08-15  Richard Biener  

* tree-ssa.c: Include tree-cfg.h and tree-dfa.h.
(verify_vssa): New function verifying virtual SSA form.
(verify_ssa): Call it.

Index: gcc/tree-ssa.c
===
*** gcc/tree-ssa.c  (revision 239473)
--- gcc/tree-ssa.c  (working copy)
*** along with GCC; see the file COPYING3.
*** 39,44 
--- 39,46 
  #include "tree-ssa.h"
  #include "cfgloop.h"
  #include "cfgexpand.h"
+ #include "tree-cfg.h"
+ #include "tree-dfa.h"
  
  /* Pointer map of variable mappings, keyed by edge.  */
  static hash_map<edge, auto_vec<edge_var_map> > *edge_var_maps;
*** release_defs_bitset (bitmap toremove)
*** 603,608 
--- 605,669 
}
  }
  
+ /* Verify virtual SSA form.  */
+ 
+ bool
+ verify_vssa (basic_block bb, tree current_vdef, sbitmap visited)
+ {
+   if (bitmap_bit_p (visited, bb->index))
+ return false;
+ 
+   bitmap_set_bit (visited, bb->index);
+ 
+   /* Pick up virtual PHI def.  */
+   gphi *phi = get_virtual_phi (bb);
+   if (phi)
+ current_vdef = gimple_phi_result (phi);
+ 
+   /* Verify stmts.  */
+   bool err = false;
+   for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi);
+gsi_next (&gsi))
+ {
+   gimple *stmt = gsi_stmt (gsi);
+   tree vuse = gimple_vuse (stmt);
+   if (vuse)
+   {
+ if (vuse != current_vdef)
+   {
+ error ("stmt with wrong VUSE %qD, expected %qD",
+vuse, current_vdef);
+ print_gimple_stmt (stderr, stmt, 0, TDF_VOPS);
+ err = true;
+   }
+ tree vdef = gimple_vdef (stmt);
+ if (vdef)
+   current_vdef = vdef;
+   }
+ }
+ 
+   /* Verify destination PHI uses and recurse.  */
+   edge_iterator ei;
+   edge e;
+   FOR_EACH_EDGE (e, ei, bb->succs)
+ {
+   gphi *phi = get_virtual_phi (e->dest);
+   if (phi
+ && PHI_ARG_DEF_FROM_EDGE (phi, e) != current_vdef)
+   {
+ error ("PHI node with wrong VUSE %qD, expected %qD",
+PHI_ARG_DEF_FROM_EDGE (phi, e), current_vdef);
+ print_gimple_stmt (stderr, phi, 0, TDF_VOPS);
+ err = true;
+   }
+ 
+   /* Recurse.  */
+   err |= verify_vssa (e->dest, current_vdef, visited);
+ }
+ 
+   return err;
+ }
+ 
  /* Return true if SSA_NAME is malformed and mark it visited.
  
 IS_VIRTUAL is true if this SSA_NAME was found inside a virtual
*** verify_ssa (bool check_modified_stmt, bo
*** 1024,1029 
--- 1110,1125 
  
free (definition_block);
  
+   if (gimple_vop (cfun)
+   && ssa_default_def (cfun, gimple_vop (cfun)))
+ {
+   auto_sbitmap visited (last_basic_block_for_fn (cfun) + 1);
+   bitmap_clear (visited);
+   if (verify_vssa (ENTRY_BLOCK_PTR_FOR_FN (cfun),
+  ssa_default_def (cfun, gimple_vop (cfun)), visited))
+   goto err;
+ }
+ 
/* Restore the dominance information to its prior known state, so
   that we do not perturb the compiler's subsequent behavior.  */
if (orig_dom_state == DOM_NONE)



[PATCH] Fix PR23855

2016-08-15 Thread Richard Biener

For GCC 6, unswitching gained the ability to hoist loop guard checks.
The following extends this to make sure we can hoist multiple guards
(including those that were hoisted from inner loops) - this allows
us to hoist all guards in a three-loop nest completely out of the
nest (gcc.dg/loop-unswitch-2.c covers this case but expected only
two hoists).

In the process I've removed the virtual SSA rewrite we were doing
for each hoisted guard.

Bootstrap and regtest running on x86_64-unknown-linux-gnu (including
the virtual SSA form verifier patch).

I already built SPEC 2k6 for this patch (but w/o the verifier) without
issues, all test runs pass (with -Ofast -march=native on Skylake).

Richard.

2016-08-15  Richard Biener  

PR tree-optimization/23855
* tree-ssa-loop-unswitch.c: Include tree-ssa-loop-manip.h.
(tree_unswitch_outer_loop): Iterate find_loop_guard as long as we
find guards to hoist.  Do not update SSA form but rewrite virtuals
into loop closed SSA.
(find_loop_guard): Adjust to skip already hoisted guards.  Do
not mark virtuals for renaming or update SSA form.

* gcc.dg/loop-unswitch-2.c: Adjust.

Index: gcc/testsuite/gcc.dg/loop-unswitch-2.c
===
*** gcc/testsuite/gcc.dg/loop-unswitch-2.c  (revision 239473)
--- gcc/testsuite/gcc.dg/loop-unswitch-2.c  (working copy)
*** void foo (float **a, float **b, float *c
*** 11,15 
c[i] += a[i][k] * b[k][j];
  }
  
! /* { dg-final { scan-tree-dump-times "guard hoisted" 2 "unswitch" } } */
  
--- 11,15 
c[i] += a[i][k] * b[k][j];
  }
  
! /* { dg-final { scan-tree-dump-times "guard hoisted" 3 "unswitch" } } */
  
Index: gcc/tree-ssa-loop-unswitch.c
===
*** gcc/tree-ssa-loop-unswitch.c(revision 239478)
--- gcc/tree-ssa-loop-unswitch.c(working copy)
*** along with GCC; see the file COPYING3.
*** 37,42 
--- 37,43 
  #include "tree-inline.h"
  #include "gimple-iterator.h"
  #include "cfghooks.h"
+ #include "tree-ssa-loop-manip.h"
  
  /* This file implements the loop unswitching, i.e. transformation of loops like
  
*** tree_unswitch_outer_loop (struct loop *l
*** 451,464 
return false;
  }
  
!   guard = find_loop_guard (loop);
!   if (guard)
  {
hoist_guard (loop, guard);
!   update_ssa (TODO_update_ssa);
!   return true;
  }
!   return false;
  }
  
  /* Checks if the body of the LOOP is within an invariant guard.  If this
--- 452,466 
return false;
  }
  
!   bool changed = false;
!   while ((guard = find_loop_guard (loop)))
  {
+   if (! changed)
+   rewrite_virtuals_into_loop_closed_ssa (loop);
hoist_guard (loop, guard);
!   changed = true;
  }
!   return changed;
  }
  
  /* Checks if the body of the LOOP is within an invariant guard.  If this
*** find_loop_guard (struct loop *loop)
*** 501,513 
b) anything defined in something1, something2 and something3
   is not used outside of the loop.  */
  
!   while (single_succ_p (header))
! header = single_succ (header);
!   if (!last_stmt (header)
!   || gimple_code (last_stmt (header)) != GIMPLE_COND)
! return NULL;
! 
!   extract_true_false_edges_from_block (header, &te, &fe);
if (!flow_bb_inside_loop_p (loop, te->dest)
|| !flow_bb_inside_loop_p (loop, fe->dest))
  return NULL;
--- 503,530 
b) anything defined in something1, something2 and something3
   is not used outside of the loop.  */
  
!   gcond *cond;
!   do
! {
!   if (single_succ_p (header))
!   header = single_succ (header);
!   else
!   {
! cond = dyn_cast <gcond *> (last_stmt (header));
! if (! cond)
!   return NULL;
! extract_true_false_edges_from_block (header, &te, &fe);
! /* Make sure to skip earlier hoisted guards that are left
!in place as if (true).  */
! if (gimple_cond_true_p (cond))
!   header = te->dest;
! else if (gimple_cond_false_p (cond))
!   header = fe->dest;
! else
!   break;
!   }
! }
!   while (1);
if (!flow_bb_inside_loop_p (loop, te->dest)
|| !flow_bb_inside_loop_p (loop, fe->dest))
  return NULL;
*** find_loop_guard (struct loop *loop)
*** 549,555 
 guard_edge->src->index, guard_edge->dest->index, loop->num);
/* Check if condition operands do not have definitions inside loop since
   any bb copying is not performed.  */
!   FOR_EACH_SSA_TREE_OPERAND (use, last_stmt (header), iter, SSA_OP_USE)
  {
gimple *def = SSA_NAME_DEF_STMT (use);
basic_block def_bb = gimple_bb (def);
--- 566,572 
 guard_edge->src->index, guard_edge->dest->index, loop->num);
/* Check if condition o

Re: [PATCH 0/4] BRIG (HSAIL) frontend

2016-08-15 Thread Pekka Jääskeläinen
Hi,

I have updated the BRIG FE patches.
I addressed Martin Jambor's comments and rebased to the latest gcc trunk.
I will reply also to the other respective threads with updated patches.

The updated diffstat:


 .gitignore| 2 +-
 Makefile.def  | 3 +
 Makefile.in   |   489 +
 configure | 1 +
 configure.ac  | 1 +
 gcc/brig/Make-lang.in |   246 +
 gcc/brig/brig-builtins.h  |99 +
 gcc/brig/brig-c.h |66 +
 gcc/brig/brig-lang.c  |   760 +
 gcc/brig/brigfrontend/brig-arg-block-handler.cc   |66 +
 gcc/brig/brigfrontend/brig-atomic-inst-handler.cc |   265 +
 gcc/brig/brigfrontend/brig-basic-inst-handler.cc  |   887 +
 gcc/brig/brigfrontend/brig-branch-inst-handler.cc |   221 +
 gcc/brig/brigfrontend/brig-cmp-inst-handler.cc|   204 +
 gcc/brig/brigfrontend/brig-code-entry-handler.cc  |  1845 ++
 gcc/brig/brigfrontend/brig-code-entry-handler.h   |   422 +
 gcc/brig/brigfrontend/brig-comment-handler.cc |39 +
 gcc/brig/brigfrontend/brig-control-handler.cc |29 +
 .../brigfrontend/brig-copy-move-inst-handler.cc   |57 +
 gcc/brig/brigfrontend/brig-cvt-inst-handler.cc|   259 +
 gcc/brig/brigfrontend/brig-fbarrier-handler.cc|44 +
 gcc/brig/brigfrontend/brig-function-handler.cc|   374 +
 gcc/brig/brigfrontend/brig-function.cc|   719 +
 gcc/brig/brigfrontend/brig-function.h |   208 +
 gcc/brig/brigfrontend/brig-inst-mod-handler.cc|58 +
 gcc/brig/brigfrontend/brig-label-handler.cc   |37 +
 gcc/brig/brigfrontend/brig-lane-inst-handler.cc   |83 +
 gcc/brig/brigfrontend/brig-machine.c  |44 +
 gcc/brig/brigfrontend/brig-machine.h  |33 +
 gcc/brig/brigfrontend/brig-mem-inst-handler.cc|   182 +
 gcc/brig/brigfrontend/brig-module-handler.cc  |30 +
 gcc/brig/brigfrontend/brig-queue-inst-handler.cc  |93 +
 gcc/brig/brigfrontend/brig-seg-inst-handler.cc|   146 +
 gcc/brig/brigfrontend/brig-signal-inst-handler.cc |42 +
 gcc/brig/brigfrontend/brig-to-generic.cc  |   768 +
 gcc/brig/brigfrontend/brig-to-generic.h   |   217 +
 gcc/brig/brigfrontend/brig-util.cc|   349 +
 gcc/brig/brigfrontend/brig-util.h |49 +
 gcc/brig/brigfrontend/brig-variable-handler.cc|   254 +
 gcc/brig/brigfrontend/phsa.h  |40 +
 gcc/brig/brigspec.c   |   135 +
 gcc/brig/config-lang.in   |41 +
 gcc/brig/lang-specs.h |28 +
 gcc/brig/lang.opt |41 +
 gcc/builtin-types.def |78 +-
 gcc/builtins.def  |39 +
 gcc/doc/frontends.texi| 2 +-
 gcc/doc/invoke.texi   | 4 +
 gcc/doc/standards.texi| 8 +
 gcc/hsail-builtins.def|   652 +
 gcc/testsuite/brig.dg/README  |10 +
 gcc/testsuite/brig.dg/dg.exp  |27 +
 gcc/testsuite/brig.dg/test/gimple/alloca.hsail|37 +
 gcc/testsuite/brig.dg/test/gimple/atomics.hsail   |33 +
 gcc/testsuite/brig.dg/test/gimple/branches.hsail  |58 +
 gcc/testsuite/brig.dg/test/gimple/fbarrier.hsail  |74 +
 .../brig.dg/test/gimple/function_calls.hsail  |59 +
 gcc/testsuite/brig.dg/test/gimple/kernarg.hsail   |25 +
 gcc/testsuite/brig.dg/test/gimple/mem.hsail   |39 +
 gcc/testsuite/brig.dg/test/gimple/mulhi.hsail |33 +
 gcc/testsuite/brig.dg/test/gimple/packed.hsail|78 +
 .../brig.dg/test/gimple/smoke_test.hsail  |91 +
 gcc/testsuite/brig.dg/test/gimple/variables.hsail |   124 +
 gcc/testsuite/brig.dg/test/gimple/vector.hsail|57 +
 gcc/testsuite/lib/brig-dg.exp |29 +
 gcc/testsuite/lib/brig.exp|40 +
 include/hsa-interface.h   |   630 +
 libhsail-rt/Makefile.am   |   123 +
 libhsail-rt/Makefile.in   |   721 +
 libhsail-rt/README| 4 +
 libhsail-rt/aclocal.m4|   979 +
 libhsail-rt/config.h.in   |   217 +
 libhsail-rt/configure | 17162 ++
 libhsail-rt/configure.ac  |   150 +
 .../include/internal/phsa-queue-interface.h   |60 +
 libhsail-rt/include/internal/phsa-rt.h|97 +
 libhsail-rt/include/internal/workitems.h  |   103 +
 libhsail-rt/m4/libtool.m4 |  7997 
 

Re: [PATCH 1/4] BRIG (HSAIL) frontend: configuration file changes and misc

2016-08-15 Thread Pekka Jääskeläinen
Updated patch attached.

On Wed, May 18, 2016 at 7:58 PM, Pekka Jääskeläinen  wrote:
> Hi,
>
> Attached an updated patch (rebased + added .texi docs).
>
> On Mon, May 16, 2016 at 8:25 PM, Pekka Jääskeläinen wrote:
>> The configuration file changes and misc. updates required
>> by the BRIG frontend.
>>
>> Also, added include/hsa-interface.h which is hsa.h taken from libgomp
>> and will be shared by it (agreed with Martin Liška / SUSE).
>>
>> --
>> Pekka Jääskeläinen
>> Parmance


001-brig-fe-config-etc.patch.gz
Description: GNU Zip compressed data


Re: [PATCH 2/4] BRIG (HSAIL) frontend: The FE itself.

2016-08-15 Thread Pekka Jääskeläinen
Thanks a lot for the review Martin,

Responses inline:

On Mon, Aug 1, 2016 at 7:37 PM, Martin Jambor  wrote:
>
>   - I would definitely appreciate more comments.  All but the most
> trivial functions should have one that describes what the function
> does, what are its arguments and what it returns.  (And there
> should be one blank line between the comment and the function
> itself).


I added more comments. Please let me know if I missed a non-trivial one.

>
>
>   - We very much prefer c-style comments to c++ ones.  I hope they can
> be converted easily in some automated way.


Converted.

>
>
> So far I have the following specific comments:
>
> - brig-c.h
>   + please remove commented out extern GTY (()) tree brig_non_zero_struct


Done.

>
>
> - brig-lang.c:
>   + In the long run I would like to get rid of
>   opts->x_flag_whole_program = 0 in
> brig_langhook_init_options_struct, when did it cause issues, when
> you tried LTO?  Since there obviously is no linker-plugin support,
> I think we can leave it this way for now, though.


Agreed. I have not tried the current status of LTO, but it can be implemented
later.


>
>   + brig_langhook_handle_option has argument scode marked as unused
> but it is used.


Fixed.

>
>   + brig_langhook_type_for_size uses both supposedly unused arguments
> and I am surprised that handling just 64 bits is sufficient.


It is now also called for 32 bits, since the builtins are now introduced
the way they should be.

>
>   + brig_langhook_type_for_mode: the "FIXME: This static_cast should
> be in machmode.h" can be removed, the cast is in machmode.h
>
>   + brig_langhook_eh_personality comment refers to file
> libbrig/runtime/brig-unwind.c which does not seem to exist?
>   + convert has attributes marked as unused but they are used


Fixed these.

>
>   + The "FIXME: This is a hack to preserve trees that we create from
> the garbage collector." IMHO does not describe any real issue,
> using GTY roots for that is common practice.


Removed the FIXME.

> - brigspec.c:
>   + lang_specific_driver: if (len > 3 && strcmp (arg + len - 3,
> ".brig") == 0) can never be true, even if str ends with brig.
> Consequently, the if branch marked by FIXME later on in the
> function never gets executed.  So perhaps it is not necessary?


True. A copy-paste error.
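
To illustrate why the quoted comparison cannot match: starting at
len - 3 leaves only three characters to compare against the
five-character ".brig". Below is a minimal sketch of a length-safe
suffix check. has_suffix is a hypothetical helper written for this
illustration, not a function in brigspec.c.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper (not part of brigspec.c): return true if ARG
   ends with SUFFIX.  The comparison offset is derived from the length
   of SUFFIX, so the two cannot drift apart as in the quoted code.  */
bool
has_suffix (const char *arg, const char *suffix)
{
  size_t len = strlen (arg);
  size_t suffix_len = strlen (suffix);

  /* A string shorter than the suffix can never end with it.  */
  if (len < suffix_len)
    return false;

  return strcmp (arg + len - suffix_len, suffix) == 0;
}
```

With this shape, has_suffix ("kernel.brig", ".brig") is true, whereas
the quoted form compares "rig" against ".brig" and always fails.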
>
>
> - brigfrontend/*.cc in general:
>
>   + A lot of functions should have a comment.  I prefer to have them
> above the function body but if it is a method, a comment in the
> class definition is fine too (I think).  Often you have helpful
> comments inside the function but it really helps if you know what
> to expect before you start reading the function.  For example,
> brig_to_generic::add_global_variable needs a comment that it adds
> VAR_DECL to the list of global variables and if there is a "host
> def var" (I guess I know what that means but an explanation
> somewhere would not hurt either), it makes sure it points to the
> new VAR_DECL.  Another example: call_builtin is very difficult to
> review without a comment explaining what arguments it expects.
> Please make sure that all functions have a comment somewhere,
> perhaps with the exception of only the most trivial and
> self-evident.


Fixed these. I also scanned the sources and added new comments
where, e.g., the function name alone didn't make the purpose
self-evident.


>
>   + Is there any reason why you use internal_error instead of
> more common gcc_assert and/or gcc_unreachable?


I think it was because of the possibility to add a format str & a message
more easily while developing. I now converted them to asserts/unreachable
as the frontend can be considered feature complete.


>
> - brigfrontend/brig_to_generic.cc:
>
>   + Why does this filename have underscores while all the others have
> dashes? :-)


Renamed.

>
>   + What should the sanity check of data section in parse accomplish?


Nothing anymore. A leftover from debugging sessions, I believe.
Removed.

>   + In build_reinterpret_cast, it would be best if you could avoid
> constructing VIEW_CONVERT_EXPRs that have type with a different
> size from the type of its operand (even though I think that Ada
> also does this, it is considered unfortunate).  What are the cases
> when this happens?


This was needed due to the special treatment of f16 which is stored in
a 32b reg (variable). I added a comment about this.

>
> OK, adding another note later: For converting scalars (anything
> !AGGREGATE_TYPE_P), I think you pretty much always want to use
> NOP_EXPR rather than generating a V_C_E.  V_C_E is mainly used to
> change how we interpret aggregates (or type-cast between
> aggregates and scalars).  In particular, NOP_EXPR will also
> sign-extend properly when you are extending integers according to
> the type, whereas what will be in the "e

Re: [PATCH, PR70895] Add copy mapping for reductions on OpenACC loop directives

2016-08-15 Thread Jakub Jelinek
On Mon, Aug 15, 2016 at 07:25:48PM +0800, Chung-Lin Tang wrote:
> per the discussion on the bugzilla PR page, reductions on OpenACC loop
> directives will automatically get a copy clause mapping on an enclosing
> parallel construct (unless bounded by a local variable or an explicit
> firstprivate clause).
> 
> There is also a patch for libgomp testsuite cases. Aside from the
> fortran case which now needs explicit firstprivate clauses to work,
> other C/C++ cases have been adjusted to remove explicit copy clauses.
> (I have not exhaustively searched everywhere to eliminate them though)
> 
> This has been tested using gomp-4_0-branch, which is based on GCC 6,
> which is what this PR was originally filed for.
> 
> I will be committing this soon for gomp-4_0-branch,
> is this okay for gcc-6-branch and trunk as well?

The gimplify.c change is ok for trunk and 6.3 (after 6.2 is released).
As for the testsuite, I'll leave it to Thomas/Nathan on what they prefer,
I'd think that having explicit clauses in e.g. half of the testcases and
implicit ones in the other half wouldn't hurt, so that both are tested
enough.
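
As an illustration of the two styles being discussed, here is a
minimal sketch. The function names are hypothetical and this is not
one of the libgomp testcases. The first function relies on the
implicit copy mapping for the reduction variable described above; the
second writes the clause out. When built without -fopenacc the pragmas
are ignored and both functions simply run serially.

```c
/* Hypothetical example, not taken from the libgomp testsuite.  */

/* Relies on the implicit copy mapping of SUM on the enclosing
   parallel construct.  */
int
sum_implicit (const int *a, int n)
{
  int sum = 0;
#pragma acc parallel copyin (a[0:n])
  {
#pragma acc loop reduction (+:sum)
    for (int i = 0; i < n; i++)
      sum += a[i];
  }
  return sum;
}

/* Same loop, with the copy clause for SUM spelled out explicitly.  */
int
sum_explicit (const int *a, int n)
{
  int sum = 0;
#pragma acc parallel copyin (a[0:n]) copy (sum)
  {
#pragma acc loop reduction (+:sum)
    for (int i = 0; i < n; i++)
      sum += a[i];
  }
  return sum;
}
```

Keeping both forms in a testsuite exercises the implicit and the
explicit mapping paths.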

> 2016-08-15  Chung-Lin Tang  
> 
> PR middle-end/70895
> gcc/
> * gimplify.c (omp_add_variable): Adjust/add variable mapping on
> enclosing parallel construct for reduction variables on OpenACC loop
> directives.
> 
> libgomp/
> * testsuite/libgomp.oacc-fortran/reduction-7.f90: Add explicit
> firstprivate clauses.
> * testsuite/libgomp.oacc-c-c++-common/reduction-7.c: Remove explicit
> copy clauses.
> * testsuite/libgomp.oacc-c-c++-common/reduction-cplx-flt.c: Likewise.
> * testsuite/libgomp.oacc-c-c++-common/reduction-flt.c: Likewise.
> * testsuite/libgomp.oacc-c-c++-common/collapse-2.c: Likewise.
> * testsuite/libgomp.oacc-c-c++-common/loop-red-wv-1.c: Likewise.
> * testsuite/libgomp.oacc-c-c++-common/collapse-4.c: Likewise.
> * testsuite/libgomp.oacc-c-c++-common/loop-red-v-1.c: Likewise.
> * testsuite/libgomp.oacc-c-c++-common/reduction-cplx-dbl.c: Likewise.
> * testsuite/libgomp.oacc-c-c++-common/loop-red-g-1.c: Likewise.
> * testsuite/libgomp.oacc-c-c++-common/loop-red-gwv-1.c: Likewise.
> * testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c: Likewise.
> * testsuite/libgomp.oacc-c-c++-common/reduction-dbl.c: Likewise.

Jakub


Re: [PATCH 4/4] BRIG (HSAIL) frontend: smoke test suite

2016-08-15 Thread Pekka Jääskeläinen
Updated the test suite by refreshing the test strings to match the new output +
added a new regression test case.

On Mon, May 16, 2016 at 8:26 PM, Pekka Jääskeläinen  wrote:
> A smoke test suite. The patch has been tested more thoroughly with the
> proprietary HSA PRM conformance suite.
>
> Requires the HSAILasm tool to first compile the .hsail to .brig.
>
> --
> Pekka Jääskeläinen
> Parmance


004-brig-fe-testsuite.patch.gz
Description: GNU Zip compressed data


Re: [PATCH] Extend -falign-FOO=N to N[,M]: the second number is max padding

2016-08-15 Thread Richard Biener
On Mon, Aug 15, 2016 at 1:53 PM, Denys Vlasenko  wrote:
> On 08/15/2016 11:45 AM, Richard Biener wrote:
>>>
>>> Thus. For this CPU, alignment of loops to 8 bytes is wrong: it helps if it
>>> happens to align a loop to 16 bytes, but it may in fact hurt performance
>>> if it happens to align a loop to 16+8 bytes and this pushes loop's body
>>> end over the next 16-byte boundary, as it happens in the above example.
>>>
>>> I suspect something similar was seen sometime ago on a different, earlier
>>> CPU, and on _that_ CPU decoder/loop buffer idiosyncrasies are such that it
>>> likes 8 byte alignment.
>>>
>>> It's not true that such alignment is always a win.
>>
>>
>> It looks to me that all you want is to drop the 8-byte alignment on
>> entities that are smaller than a cacheline.
>
>
> I don't think it can be simplified to this.
>
> An example. A loop 122 bytes long fits into either two or three 64-byte
> cachelines, depending on where it starts. If it starts in bytes 0..6 in a
> cacheline, it fits into two cachelines. If it starts at 7 bytes or more
> into a cacheline, it doesn't fit.
>
> 8-byte alignment is worse for such a loop than not doing it.
>
> It's even worse for the use case which prompted me to create these patches:
> -falign-functions. Linux kernel people want to align all functions
> to 64 bytes, but only if the necessary padding is, say, 9 bytes or less.
> The rationale is that function calls are often "cold", i.e. function body
> is not in L1, and it would be even slower if first insn(s) would require
> two L1 loads, not one, to be decoded.
>
> Hence -falign-functions=64,10. This would be a very efficient packing:
> only ~15% of all functions would need any padding (the remaining 85%
> would start 10 or more bytes before end of cacheline and thus need
> no padding), and among those 15% the average padding length would be
> only 5 bytes. With very small code size increase, we'd gain a lot
> in speed.
>
> This nice optimistic picture is currently destroyed by unnecessary
> and not-asked-for "subalignment" to 8 bytes, which now adds 4.5 bytes
> of padding on average *to every function*, as a "bonus" making
> it *less* efficient versus instruction fetch, not more efficient!
>
>
> IOW: I am proposing to remove this code because it seems arbitrary: it
> helped on one particular CPU model, and maybe only on some particular
> benchmarks. On other CPUs, or in other scenarios, it's harmful.
> It should not be now done for all CPUs and all programs.
>
> If there is a value in the ability to do a "subalignment" within a larger
> alignment, maybe we can make it a separate option, and let user specify it
> if he wants?
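
The "align, but only if the padding is small" policy described in the
quoted message can be sketched as follows. This is an illustration
only, not the code from the patch. padding_with_max_skip and its
parameters are hypothetical names, and max_skip is taken to be the
largest padding that is still acceptable (9 bytes in the
-falign-functions=64,10 example).

```c
/* Hypothetical illustration, not code from the patch.  Return the
   number of padding bytes inserted before an entity that would
   otherwise start at ADDR, when aligning to ALIGN bytes only if at
   most MAX_SKIP bytes of padding are needed.  */
unsigned
padding_with_max_skip (unsigned addr, unsigned align, unsigned max_skip)
{
  /* Bytes needed to reach the next ALIGN boundary; 0 if already there.  */
  unsigned pad = (align - addr % align) % align;

  /* Too much padding required: leave the entity where it is.  */
  return pad <= max_skip ? pad : 0;
}
```

With align = 64 and max_skip = 9, only start offsets 55..63 within a
cacheline get padded. Assuming start offsets are uniformly
distributed, that is 9 of 64 offsets (about 14%) with an average
padding of 5 bytes, in line with the estimate quoted above.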

Controlling this separately makes sense IMHO.  Changing the default for
generic tuning has to be backed up with measurements and old CPUs not
benchmarked should retain the old value when tuned for them.

Let me rephrase the desire again.  The desire is to maximize the number
of instructions fetched with the first cacheline for any label that is branched
(forward) to.  A side-effect may be avoiding penalties for CPUs that have
an instruction started at only N-byte aligned space (not sure that exists
for an ISA with 1-byte opcodes).  For labels that are branched backward to
(thus loops) the desire is to minimize the number of cachelines that need
to be fetched to get the whole loop covered - ISTR CPUs have limits on that
number when it comes to handling loops with loop caches.  Branch target
buffers may also not like too many targets per cache-line -- I expect
8 2-byte functions in a cache-line to be very bad here.
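
The "number of cachelines that need to be fetched" for a loop depends
only on its start offset and size. A minimal sketch of that
arithmetic, using a hypothetical helper name and assuming a non-zero
size:

```c
/* Hypothetical helper for illustration.  Return how many cachelines
   of LINE bytes are touched by a block of SIZE bytes (SIZE > 0) that
   starts at byte address START.  */
unsigned
cachelines_spanned (unsigned start, unsigned size, unsigned line)
{
  unsigned first = start / line;              /* Line holding the first byte.  */
  unsigned last = (start + size - 1) / line;  /* Line holding the last byte.  */
  return last - first + 1;
}
```

For a 122-byte loop and 64-byte lines, a start at offset 0 touches two
lines while a start at offset 10 touches three, so padding that moves
the start the wrong way costs an extra line.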

If the situation cannot be improved on any of the above, any additional
"artificial" alignment only makes things worse (by enlarging code).

Richard.


Re: [v3 PATCH] Implement LWG 2744 and LWG 2754.

2016-08-15 Thread Ville Voutilainen
On 15 August 2016 at 14:55, Ville Voutilainen
 wrote:
>>> +  template
>>> +struct __is_in_place
>>> +: public
>>> __is_in_place_impl>>
>>
>>
>> Any reason not to use decay_t here? In all the cases where decay is
>> different to stripping references and cv-qualifiers the result will be
>> false either way.
>>
>> I wouldn't have bothered with the std:: qualification either, but it's
>> fine as it is.
>
>
> The reason for not using decay and for the qualification is that the
> trait is closely related to its copy-paste origin, which is __is_optional. ;)
> I should also add a test for the case where in_place is attempted to pass
> through the ValueType&& parameter, aka a case where a type is not
> default-constructible but is constructible from in_place, and
> in_place-construction is used.

Argh, the trait shouldn't either decay or remove_reference, and the
remove_const is superfluous.
These tags are references to functions, decaying them will cause bad mojo.

Now that I actually test that the in_place avoidance in the perfect
forwarder works, it's fixed.
Tested on Linux-x64.

2016-08-15  Ville Voutilainen  

Implement LWG 2744 and LWG 2754.
* include/std/any (any(ValueType&&)): Constrain with __is_in_place_type.
(any(in_place_type_t<_ValueType>, _Args&&...)): Use _Decay.
(any(in_place_type_t<_ValueType>, initializer_list<_Up>, _Args&&...)):
Likewise.
(emplace(_Args&&...)): Likewise.
(emplace(initializer_list<_Up>, _Args&&...)): Likewise.
* include/std/utility: (__is_in_place_type_impl): New.
(__is_in_place_type): Likewise.
* testsuite/20_util/any/assign/emplace.cc: Add tests for decaying
emplace.
* testsuite/20_util/any/cons/in_place.cc: Add tests for decaying
in_place constructor.
* testsuite/20_util/any/misc/any_cast_neg.cc: Adjust.
* testsuite/20_util/any/requirements.cc: Add a test for
in_place-constructing a non-default-constructible type.
diff --git a/libstdc++-v3/include/std/any b/libstdc++-v3/include/std/any
index 4add118..9160035 100644
--- a/libstdc++-v3/include/std/any
+++ b/libstdc++-v3/include/std/any
@@ -153,7 +153,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 /// Construct with a copy of @p __value as the contained object.
 template ,
  typename _Mgr = _Manager<_Tp>,
-  __any_constructible_t<_Tp, _ValueType&&> = true>
+  __any_constructible_t<_Tp, _ValueType&&> = true,
+ enable_if_t::value, bool> = true>
   any(_ValueType&& __value)
   : _M_manager(&_Mgr::_S_manage)
   {
@@ -164,9 +165,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 template ,
  typename _Mgr = _Manager<_Tp>,
   enable_if_t<__and_,
-__not_<
-  is_constructible<_Tp,
-   _ValueType&&>>>::value,
+__not_>,
+__not_<__is_in_place_type<_ValueType>>>::value,
  bool> = false>
   any(_ValueType&& __value)
   : _M_manager(&_Mgr::_S_manage)
@@ -175,10 +175,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   }
 
 /// Construct with an object created from @p __args as the contained 
object.
-template ,
  typename _Mgr = _Manager<_Tp>,
   __any_constructible_t<_Tp, _Args&&...> = false>
-  any(in_place_type_t<_Tp>, _Args&&... __args)
+  any(in_place_type_t<_ValueType>, _Args&&... __args)
   : _M_manager(&_Mgr::_S_manage)
   {
 _Mgr::_S_create(_M_storage, std::forward<_Args>(__args)...);
@@ -186,11 +187,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
 /// Construct with an object created from @p __il and @p __args as
 /// the contained object.
-template ,
  typename _Mgr = _Manager<_Tp>,
   __any_constructible_t<_Tp, initializer_list<_Up>,
_Args&&...> = false>
-  any(in_place_type_t<_Tp>, initializer_list<_Up> __il, _Args&&... __args)
+  any(in_place_type_t<_ValueType>,
+ initializer_list<_Up> __il, _Args&&... __args)
   : _M_manager(&_Mgr::_S_manage)
   {
 _Mgr::_S_create(_M_storage, __il, std::forward<_Args>(__args)...);
@@ -248,7 +251,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   }
 
 /// Emplace with an object created from @p __args as the contained object.
-template ,
  typename _Mgr = _Manager<_Tp>,
   __any_constructible_t<_Tp, _Args&&...> = false>
   void emplace(_Args&&... __args)
@@ -260,7 +264,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
 /// Emplace with an object created from @p __il and @p __args as
 /// the contained object.
-template ,
  typename _Mgr = _Manager<_Tp>,
   __any_constructible_t<_Tp, initializer_list<_Up>,
_Args&&...> = false>
diff --git a/libstdc++-v3/include/std/utility b/libstdc++-v3/include/std/utility
inde

Re: [PATCH] PR71752 - SLP: Maintain operand ordering when creating vec defs

2016-08-15 Thread Alan Hayward


On 15/08/2016 12:17, "Richard Biener"  wrote:

>On Mon, Aug 15, 2016 at 11:48 AM, Alan Hayward 
>wrote:
>> The testcase pr71752.c was failing because the SLP code was generating an
>> SLP vector using arguments from the SLP scalar stmts, but was using the
>> wrong argument number.
>>
>> vect_get_slp_defs() is given a vector of operands. When calling down to
>> vect_get_constant_vectors it uses i as op_num - making the assumption that
>> the first op in the vector refers to the first argument in the SLP scalar
>> statement, the second op refers to the second arg and so on.
>>
>> However, previously in vectorizable_reduction, the call to
>> vect_get_vec_defs destroyed this ordering by potentially only passing op1.
>>
>> The solution is in vectorizable_reduction to create a vector of operands
>> equal in size to the number of arguments in the SLP statements. We maintain
>> the argument ordering and if we don't require defs for that argument we
>> instead push NULL into the vector. In vect_get_slp_defs we need to handle
>> cases where an op might be NULL.
>>
>> Tested with a check run on X86 and AArch64.
>> Ok to commit?
>>
>
>Ugh.  Note the logic in vect_get_slp_defs is incredibly fragile - I think
>you can't simply "skip" ops the way you do as you need to still increment
>child_index accordingly for ignored ops.

Agreed, I should be maintaining the child_index.
Looking at the code, I need a valid oprnd for that code to work.

Given that the size of the ops vector is never more than 3, it probably makes
sense to reset child_index each time and do a quick for loop through all the
children.
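
The idea of keeping one slot per argument and marking unneeded ones
with NULL can be sketched in isolation. This is a hypothetical
stand-in using plain strings, not the vectorizer's data structures:
the point is only that slot i keeps meaning "argument i" even when
some slots are skipped.

```c
#include <stddef.h>

/* Hypothetical stand-in for walking an operand vector in which slot I
   always corresponds to argument I and unused slots hold NULL.  Store
   the argument numbers that need defs in NEEDED (which must have room
   for N_OPS entries) and return how many there are.  */
size_t
collect_needed_args (const char *const ops[], size_t n_ops, size_t needed[])
{
  size_t count = 0;

  for (size_t i = 0; i < n_ops; i++)
    {
      /* Skip the placeholder, but keep counting I so that later
         operands retain their original argument number.  */
      if (ops[i] == NULL)
        continue;
      needed[count++] = i;
    }

  return count;
}
```

Passing { NULL, "op1", NULL } reports a single needed operand with
argument number 1, rather than renumbering it to 0.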

>
>Why not let the function compute defs for all ops?  That said, the
>vectorizable_reduction part certainly is fixing a bug (I think I've seen
>similar issues elsewhere though).

If you let it compute defs for all ops then you can end up creating invalid
initial value SLP vectors which cause SSA failures (even though nothing uses
those vectors).

In the test case, SLP is defined for _3. Op1 of this is q4_lsm.5_8, but that is
a loop phi. Creating SLP vec refs for it results in vect_cst__67 and
vect_cst__68, which are clearly invalid:



:
  _58 = yg_lsm.7_23 + 1;
  _59 = _58 + 1;
  _60 = _58 + 2;
  vect_cst__61 = {yg_lsm.7_23, _58, _59, _60};
  vect_cst__62 = { 4, 4, 4, 4 };
  vect_cst__67 = {q4_lsm.5_8, z5_19, q4_lsm.5_8, z5_19};
  vect_cst__68 = {q4_lsm.5_8, z5_19, q4_lsm.5_8, z5_19};
  vect_cst__69 = {jv_9(D), jv_9(D), jv_9(D), jv_9(D)};
  vect_cst__70 = {jv_9(D), jv_9(D), jv_9(D), jv_9(D)};
  vect_cst__73 = {0, 0, 0, 0};
  vect_cst__74 = {q4_lsm.5_16, z5_10(D), 0, 0};
  vect_cst__91 = { 1, 1, 1, 1 };

  :
  # z5_19 = PHI 
  # q4_lsm.5_8 = PHI 
  # yg_lsm.7_17 = PHI 
  # vect_vec_iv_.14_63 = PHI 
  # vect__3.15_65 = PHI 
  # vect__3.15_66 = PHI 
  # ivtmp_94 = PHI <0(4), ivtmp_95(10)>
  vect_vec_iv_.14_64 = vect_vec_iv_.14_63 + vect_cst__62;
  _3 = q4_lsm.5_8 - jv_9(D);
  vect__3.15_71 = vect__3.15_65 - vect_cst__70;
  vect__3.15_72 = vect__3.15_66 - vect_cst__69;
  z5_13 = z5_19 - jv_9(D);
  vect__5.17_92 = vect_vec_iv_.14_63 + vect_cst__91;
  _5 = yg_lsm.7_17 + 1;
  ivtmp_95 = ivtmp_94 + 1;
  if (ivtmp_95 < bnd.11_18)
goto ;
  else
goto ;




>
>Richard.
>
>> Changelog:
>>
>> gcc/
>> * tree-vect-loop.c (vectorizable_reduction): Keep SLP operand
>>ordering.
>> * tree-vect-slp.c (vect_get_slp_defs): Handle null operands.
>>
>> gcc/testsuite/
>> * gcc.dg/vect/pr71752.c: New.
>>
>>
>>
>> Thanks,
>> Alan.
>>
>>
>>
>>
>> diff --git a/gcc/testsuite/gcc.dg/vect/pr71752.c
>> b/gcc/testsuite/gcc.dg/vect/pr71752.c
>> new file mode 100644
>> index
>> 
>>..8d26754b4fedf8b104caae8742a445d
>>ff
>> bf23f0a
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/vect/pr71752.c
>> @@ -0,0 +1,19 @@
>> +/* { dg-do compile } */
>> +
>> +unsigned int q4, yg;
>> +
>> +unsigned int
>> +w6 (unsigned int z5, unsigned int jv)
>> +{
>> +  unsigned int *f2 = &jv;
>> +
>> +  while (*f2 < 21)
>> +{
>> +  q4 -= jv;
>> +  z5 -= jv;
>> +  f2 = &yg;
>> +  ++(*f2);
>> +}
>> +  return z5;
>> +}
>> +
>> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
>> index
>> 
>>2a7e0c6661bc1ba82c9f03720e550749f2252a7c..826481af3d1d8b29bcdbd7d81c0fd5a
>>85
>> 9ecd9b0 100644
>> --- a/gcc/tree-vect-loop.c
>> +++ b/gcc/tree-vect-loop.c
>> @@ -5364,7 +5364,7 @@ vectorizable_reduction (gimple *stmt,
>> gimple_stmt_iterator *gsi,
>>auto_vec vect_defs;
>>auto_vec phis;
>>int vec_num;
>> -  tree def0, def1, tem, op0, op1 = NULL_TREE;
>> +  tree def0, def1, tem, op1 = NULL_TREE;
>>bool first_p = true;
>>tree cr_index_scalar_type = NULL_TREE, cr_index_vector_type =
>>NULL_TREE;
>>gimple *cond_expr_induction_def_stmt = NULL;
>> @@ -5964,29 +5964,36 @@ vectorizable_reduction (gimple *stmt,
>> gimple_stmt_iterator *gsi,
>>/* Handle uses.  */
>>if (j == 0)
>>  {
>> -  op0 = ops[!reduc_index];
>> -

Re: [PATCH build/doc] Replacing libiberty with gnulib

2016-08-15 Thread Manuel López-Ibáñez
On 15 August 2016 at 13:50, ayush goel  wrote:
> Included gnulib’s config.h header file inside gcc’s config.h itself as
> per the discussions.
>
> Built and tested the system.

You need to mention how you built it (languages, bootstrap, etc.) and
how you tested it (targets).

Do the other patches that you submitted still work with this new one?

Cheers,

Manuel.


Re: [PATCH] Fix val-prof-7.c on --target_board 'unix/-m32'

2016-08-15 Thread Jeff Law

On 08/15/2016 05:12 AM, Martin Liška wrote:

Hello.

The test-case uses size of memory operations which cannot be handled by core2
in 32-bit mode. Fixed in the patch.

Survives:
make check -k -j10 RUNTESTFLAGS="tree-prof.exp --target_board 'unix/-m32'"
make check -k -j10 RUNTESTFLAGS="tree-prof.exp"

on x86_64-linux-gnu.

Ready for trunk?
Thanks,
Martin


0001-Fix-val-prof-7.c-on-target_board-unix-m32.patch


From e8663bc8b2c721eff3003aa591d52b2b15132b88 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Mon, 15 Aug 2016 13:09:36 +0200
Subject: [PATCH] Fix val-prof-7.c on --target_board 'unix/-m32'

gcc/testsuite/ChangeLog:

2016-08-15  Martin Liska  

* gcc.dg/tree-prof/val-prof-7.c (int main): Change size
of memory operations so that it can be handled by core2
in 32-bit mode.

OK.
jeff


Re: [PATCH] lra: A multiple_sets is not a simple_move_p (PR73650)

2016-08-15 Thread Jeff Law

On 08/12/2016 12:38 PM, Segher Boessenkool wrote:

Hi!

In the PR we have a PARALLEL of a move and a compare (a "mr." instruction).
The compare is dead, so single_set on it returns just the move.  Then,
simple_move_p returns true; but the instruction does need reloads in this
case.  This patch solves this by making simple_move_p return false for
every multiple_sets instruction.

Bootstrapped and regression checked on powerpc64-linux (-m64,-m32).
Is this okay for trunk?


Segher


2016-08-12  Segher Boessenkool  

PR rtl-optimization/73650
* lra-constraints.c (simple_move_p): If the insn is multiple_sets
it is not a simple move.

OK.

Though I do wonder if it would be advantageous to try and rewrite such 
insns.


jeff



Re: [PATCH, COMMITTED] Add branch_changer.py script to maintainer-scripts

2016-08-15 Thread Gerald Pfeifer
Hi Martin,

On Mon, 15 Aug 2016, Martin Liška wrote:
> Ready to be installed?

you ignored (or I guess: missed) the updated patch that I
included in my previous message.  Can you use that instead
of your original?

As for the examples, I would omit the output of the script,
since otherwise we'll need to adjust that whenever the output
changes, and it's a bit lengthy. ;-)  And potentially use a 
few words to describe the scenario instead for those cases
where there is no --comment option?


Clearly this is an improvement, so let's give Richi a day to
chime in and then go ahead with whatever you have.

Gerald

Re: [PATCH] lra: A multiple_sets is not a simple_move_p (PR73650)

2016-08-15 Thread Segher Boessenkool
On Mon, Aug 15, 2016 at 09:37:34AM -0600, Jeff Law wrote:
> On 08/12/2016 12:38 PM, Segher Boessenkool wrote:
> >In the PR we have a PARALLEL of a move and a compare (a "mr." instruction).
> >The compare is dead, so single_set on it returns just the move.  Then,
> >simple_move_p returns true; but the instruction does need reloads in this
> >case.  This patch solves this by making simple_move_p return false for
> >every multiple_sets instruction.
> >
> >Bootstrapped and regression checked on powerpc64-linux (-m64,-m32).
> >Is this okay for trunk?
> >
> >
> >Segher
> >
> >
> >2016-08-12  Segher Boessenkool  
> >
> > PR rtl-optimization/73650
> > * lra-constraints.c (simple_move_p): If the insn is multiple_sets
> > it is not a simple move.
> OK.
> 
> Though I do wonder if it would be advantageous to try and rewrite such 
> insns.

Ah, I didn't mention that.  I tried that, using code similar to
eliminate_regs_in_insn (lra-eliminations.c, around line 1080, after the
comment starting with "/* First see if this insn remains valid when").
This works, but the REG_UNUSED still remains, and it seems something later
then deletes the remaining insn.

Cleaning up the notes requires some nasty code.

The multiple_sets fits nicely in simple_move_p, which says
/* Return true if the current move insn does not need processing as we
   already know that it satisfies its constraints.  */
(we could of course move all this to the caller).


Segher


[hsa-branch] Fix issues with firstbit and popcount source types

2016-08-15 Thread Martin Jambor
Hi,

this patch addresses a regression caused by my patch that avoided
useless register copies but in a few cases caused us to generate
instruction types that did not make the finalizer happy.

Fixed thusly.  I am going to commit to the branch now and will queue
it for trunk for later.

Thanks,

Martin


2016-08-12  Martin Jambor  

* hsa-gen.c (gen_hsa_unary_operation): Make sure the function does
not use bittype source type for firstbit and lastbit operations.
(gen_hsa_popcount_to_dest): Make sure the function uses a bittype
source type.

libgomp/
* testsuite/libgomp.hsa.c/bits-insns.c: New test.
---
 gcc/hsa-gen.c| 13 +++--
 libgomp/testsuite/libgomp.hsa.c/bits-insns.c | 73 
 2 files changed, 82 insertions(+), 4 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.hsa.c/bits-insns.c

diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c
index baa20b9..c946b2f 100644
--- a/gcc/hsa-gen.c
+++ b/gcc/hsa-gen.c
@@ -2957,8 +2957,12 @@ gen_hsa_unary_operation (BrigOpcode opcode, hsa_op_reg 
*dest,
   if (opcode == BRIG_OPCODE_MOV && hsa_needs_cvt (dest->m_type, op1->m_type))
 insn = new hsa_insn_cvt (dest, op1);
   else if (opcode == BRIG_OPCODE_FIRSTBIT || opcode == BRIG_OPCODE_LASTBIT)
-insn = new hsa_insn_srctype (2, opcode, BRIG_TYPE_U32, op1->m_type, NULL,
-op1);
+{
+  BrigType16_t srctype = hsa_type_integer_p (op1->m_type) ? op1->m_type
+   : hsa_unsigned_type_for_type (op1->m_type);
+  insn = new hsa_insn_srctype (2, opcode, BRIG_TYPE_U32, srctype, NULL,
+  op1);
+}
   else
 {
   insn = new hsa_insn_basic (2, opcode, dest->m_type, dest, op1);
@@ -4250,12 +4254,13 @@ gen_hsa_popcount_to_dest (hsa_op_reg *dest, 
hsa_op_with_type *arg, hsa_bb *hbb)
   if (hsa_type_bit_size (arg->m_type) < 32)
 arg = arg->get_in_type (BRIG_TYPE_B32, hbb);
 
+  BrigType16_t srctype = hsa_bittype_for_type (arg->m_type);
   if (!hsa_btype_p (arg->m_type))
-arg = arg->get_in_type (hsa_bittype_for_type (arg->m_type), hbb);
+arg = arg->get_in_type (srctype, hbb);
 
   hsa_insn_srctype *popcount
 = new hsa_insn_srctype (2, BRIG_OPCODE_POPCOUNT, BRIG_TYPE_U32,
-   arg->m_type, NULL, arg);
+   srctype, NULL, arg);
   hbb->append_insn (popcount);
   popcount->set_output_in_type (dest, 0, hbb);
 }
diff --git a/libgomp/testsuite/libgomp.hsa.c/bits-insns.c 
b/libgomp/testsuite/libgomp.hsa.c/bits-insns.c
new file mode 100644
index 000..21cac72
--- /dev/null
+++ b/libgomp/testsuite/libgomp.hsa.c/bits-insns.c
@@ -0,0 +1,73 @@
+#include 
+
+#define N 12
+
+int main()
+{
+  unsigned int arguments[N] = {0u, 1u, 2u, 3u, 111u, 333u, 444u, 0x8000u, 
0xu, 0xf000u, 0xff00u, 0xu};
+  int clrsb[N] = {};
+  int clz[N] = {};
+  int ctz[N] = {};
+  int ffs[N] = {};
+  int parity[N] = {};
+  int popcount[N] = {};
+
+  int ref_clrsb[N] = {};
+  int ref_clz[N] = {};
+  int ref_ctz[N] = {};
+  int ref_ffs[N] = {};
+  int ref_parity[N] = {};
+  int ref_popcount[N] = {};
+
+  for (unsigned i = 0; i < N; i++)
+{
+  ref_clrsb[i] = __builtin_clrsb (arguments[i]);
+  ref_clz[i] = __builtin_clz (arguments[i]);
+  ref_ctz[i] = __builtin_ctz (arguments[i]);
+  ref_ffs[i] = __builtin_ffs (arguments[i]);
+  ref_parity[i] = __builtin_parity (arguments[i]);
+  ref_popcount[i] = __builtin_popcount (arguments[i]);
+}
+
+  #pragma omp target map(from:clz, ctz, ffs, parity, popcount)
+  {
+for (unsigned i = 0; i < N; i++)
+{
+  clrsb[i] = __builtin_clrsb (arguments[i]);
+  clz[i] = __builtin_clz (arguments[i]);
+  ctz[i] = __builtin_ctz (arguments[i]);
+  ffs[i] = __builtin_ffs (arguments[i]);
+  parity[i] = __builtin_parity (arguments[i]);
+  popcount[i] = __builtin_popcount (arguments[i]);
+}
+  }
+
+  for (unsigned i = 0; i < N; i++)
+if (ref_clrsb[i] != clrsb[i])
+  __builtin_abort ();
+
+  /* CLZ of zero is undefined for zero.  */
+  for (unsigned i = 1; i < N; i++)
+if (ref_clz[i] != clz[i])
+  __builtin_abort ();
+
+  /* Likewise for ctz */
+  for (unsigned i = 1; i < N; i++)
+if (ref_ctz[i] != ctz[i])
+  __builtin_abort ();
+
+  for (unsigned i = 0; i < N; i++)
+if (ref_ffs[i] != ffs[i])
+  __builtin_abort ();
+
+  for (unsigned i = 0; i < N; i++)
+if (ref_parity[i] != parity[i])
+  __builtin_abort ();
+
+  for (unsigned i = 0; i < N; i++)
+if (ref_popcount[i] != popcount[i])
+  __builtin_abort ();
+
+  return 0;
+}
+
-- 
2.9.2



[hsa-branch] Fix ICE in binds_to_current_def_p

2016-08-15 Thread Martin Jambor
Hi,

we have found out that when building a shared object file with
pure/const HSA functions in them, we hit an assert in
binds_to_current_def_p.  The reason is that we are creating a clone
for the HSA implementation and even though normally clones are
private, ours isn't and so we need to set back not only the
TREE_PUBLIC flag but also the cgraph externally_visible flag.

I will commit it to the branch and queue for trunk for later.

Thanks,

Martin


2016-08-12  Martin Jambor  

* ipa-hsa.c (process_hsa_functions): Copy externally visible flag
to the node.
---
 gcc/ipa-hsa.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/ipa-hsa.c b/gcc/ipa-hsa.c
index 9ab4927..0fbe2e2 100644
--- a/gcc/ipa-hsa.c
+++ b/gcc/ipa-hsa.c
@@ -90,6 +90,7 @@ process_hsa_functions (void)
= node->create_virtual_clone (vec  (),
  NULL, NULL, "hsa");
  TREE_PUBLIC (clone->decl) = TREE_PUBLIC (node->decl);
+ clone->externally_visible = node->externally_visible;
 
  clone->force_output = true;
  hsa_summaries->link_functions (clone, node, s->m_kind, false);
@@ -107,6 +108,7 @@ process_hsa_functions (void)
= node->create_virtual_clone (vec  (),
  NULL, NULL, "hsa");
  TREE_PUBLIC (clone->decl) = TREE_PUBLIC (node->decl);
+ clone->externally_visible = node->externally_visible;
 
  if (!cgraph_local_p (node))
clone->force_output = true;
-- 
2.9.2



Re: backward threading heuristics tweek

2016-08-15 Thread Jeff Law

On 08/11/2016 06:35 AM, Jan Hubicka wrote:

This also caused:

FAIL: gcc.dg/tree-ssa/pr69270-3.c scan-tree-dump-times uncprop1 ", 1" 4

  Failures:
gcc.dg/tree-ssa/pr69270-3.c

  Bisected to:

  Author: hubicka
  Date:   Sun Aug 7 10:50:16 2016 +

* tree-ssa-threadbackward.c: Include tree-inline.h
(profitable_jump_thread_path): Use estimate_num_insns to estimate
size of copied block; for cold paths reduce duplication.
(find_jump_threads_backwards): Remove redundant tests.
(pass_thread_jumps::gate): Enable for -Os.
* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Update testcase.

  svn+ssh://gcc.gnu.org/svn/gcc/trunk@239219

On my aarch64-none-linux-gnu and x86_64-none-linux-gnu automatic bisect
robots.


Sorry for that - it seems I have missed this failure.  The reason is that FSM
now stops on:
  /* We avoid creating irreducible inner loops unless we thread through
 a multiway branch, in which case we have deemed it worth losing
 other loop optimizations later.

 We also consider it worth creating an irreducible inner loop if
 the number of copied statement is low relative to the length of
 the path -- in that case there's little the traditional loop
 optimizer would have done anyway, so an irreducible loop is not
 so bad.  */
  if (!threaded_multiway_branch && creates_irreducible_loop
  && (n_insns * PARAM_VALUE (PARAM_FSM_SCALE_PATH_STMTS)
  > path_length * PARAM_VALUE (PARAM_FSM_SCALE_PATH_BLOCKS)))

{
  if (dump_file && (dump_flags & TDF_DETAILS))
fprintf (dump_file,
 "FSM would create irreducible loop without threading "
 "multiway branch.\n");
  path->pop ();
  return NULL;
}

The path threaded now gets n_insns == 13 and path_length == 6. I guess the
difference is that the path consists of several calls that are considered
heavy by the new code size estimate, which is correct. It is definitely
heavier than a path consisting of a few increments.
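
Plugging those numbers into the quoted check, as a minimal sketch.
The function name is hypothetical, and the scale factors of 2
(statements) and 3 (blocks) used in the note below are an assumption
for illustration; the actual PARAM defaults should be checked in
params.def.

```c
#include <stdbool.h>

/* Hypothetical restatement of the quoted condition.  SCALE_STMTS and
   SCALE_BLOCKS stand in for PARAM_FSM_SCALE_PATH_STMTS and
   PARAM_FSM_SCALE_PATH_BLOCKS; their values are supplied by the
   caller and are assumptions in this sketch.  Return true if the
   path would be rejected.  */
bool
rejects_irreducible_path (int n_insns, int path_length,
                          int scale_stmts, int scale_blocks,
                          bool threaded_multiway_branch,
                          bool creates_irreducible_loop)
{
  return (!threaded_multiway_branch
          && creates_irreducible_loop
          && n_insns * scale_stmts > path_length * scale_blocks);
}
```

With n_insns = 13, path_length = 6 and the assumed factors, the test
is 26 > 18, so the path is rejected; at 8 statements (16 > 18 is
false) the same path would have been accepted.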

:
if (phi_inserted_5 == 0)
  goto ;
else
  goto ;

:
_2 = boo ();
if (_2 != 20)
  goto ;
else
  goto ;

:
_1 = arf ();
if (_1 != 10)
  goto ;
else
  goto ;

:
# phi_inserted_5 = PHI <0(2), phi_inserted_4(8)>
_3 = end_imm_use_stmt_p ();
if (_3 == 0)
  goto ;
else
  goto ;

loop latch.
:
# phi_inserted_4 = PHI 
next_imm_use_stmt ();

:
if (phi_inserted_5 == 0)
  goto ;
else
  goto ;


I would say that optimizing this path to dead is not the most important thing.
The question is whether there is really a problem with an irreducible loop.
There are two loops in the function body prior to threading:

;; Loop 0
;;  header 0, latch 1
;;  depth 0, outer -1
;;  nodes: 0 1 2 3 4 6 7 8 9 10
;;
;; Loop 1
;;  header 9, latch 8
;;  depth 1, outer 0
;;  nodes: 9 8 4 6 7 3
;; 2 succs { 9 }
;; 3 succs { 8 4 }
;; 4 succs { 8 6 }
;; 6 succs { 8 7 }
;; 7 succs { 8 }
;; 8 succs { 9 }
;; 9 succs { 3 10 }
;; 10 succs { 1 }

So the threaded path lives fully inside loop1: 6->8->9->3->4->6 propagating
that phi_inserted is 0 after the first iteration of the loop.  This looks like
a useful loop peeling opportunity which does not garble loop structure. So
perhaps threading paths starting at and passing through the loop latch
(i.e. peeling) is sane? Perhaps all paths fully captured in the loop in
question are?
Peeling like this has long been a point of contention -- it totally 
mucks things up like vectorizing.


The general issue is that the threader knows nothing about the 
characteristics of the loop -- thus peeling at this point is 
premature and just as likely to hinder performance as improve it.


I've never been happy with how this aspect of threading vs loop opts 
turned out and we have open BZs related to this rat's nest of issues.


jeff




Re: divmod transform: add test-cases

2016-08-15 Thread Jeff Law

On 08/12/2016 01:17 AM, Richard Biener wrote:


Note that for the main patch I don't like the current state of the
divmod libcall issue.  I think we need to solve this in a more
reasonable manner and not expose this oddness to a GIMPLE level pass.

I haven't looked at the main patch at all.



Any ideas welcome - I don't have a very good one :/

The best idea I have is to not lie about libfunc availability in
the optab handler.

Hard to argue with "to not lie about ...".

jeff


Re: Early jump threading

2016-08-15 Thread Jeff Law

On 08/12/2016 12:02 AM, Richard Biener wrote:


Hmm, isn't walking backwards from uses doing a lot of redundant stmt
walking compared to walking stmts once in forward direction?  To me
it sounds like a 'local' patterns matching like optimization rather
than a global one with proper data flow or a lattice?
You end up walking the use-def chain, so you look only at the chain of 
feeding statements.


Forward threading is actually worse because it tries to walk *past* the 
current point in the dominator walk.  For example at a dominance 
frontier we have multiple paths merging -- we will walk all statements 
in the merge block for every incoming path as well as all statements on 
the outgoing paths of the merge block.   We have to update the various 
tables, then unwind them as we work through all those paths.


I have not done a full analysis, but I strongly suspect the backwards 
threader will ultimately end up doing less work, both in the common and 
pathological cases.


jeff


Re: Early jump threading

2016-08-15 Thread Jeff Law

On 08/12/2016 05:27 AM, Jan Hubicka wrote:

* passes.def (pass_early_thread_jumps): Schedule after forwprop.
* tree-pass.h (make_pass_early_thread_jumps): Declare.
* tree-ssa-threadbackward.c (fsm_find_thread_path,
fsm_find_thread_path, profitable_jump_thread_path,
fsm_find_control_statement_thread_paths,
find_jump_threads_backwards): Add speed_p parameter.
(pass_data_early_thread_jumps): New pass.
(make_pass_early_thread_jumps): New function.

Index: passes.def
===
--- passes.def  (revision 239218)
+++ passes.def  (working copy)
@@ -84,6 +84,7 @@ along with GCC; see the file COPYING3.
  /* After CCP we rewrite no longer addressed locals into SSA
 form if possible.  */
  NEXT_PASS (pass_forwprop);
+  NEXT_PASS (pass_early_thread_jumps);


What's the reason for this placement?  I know Jeff argues that
as jump threading helps CSE we need to place it before CSE but
OTOH the FSM style threading relies on copies and redundancies
being optimized already and the above has only constants and copies
being propagated and forwprop left you with lots of dead code
(but it should also have copies and constants propagated but it
leaves PHIs alone, not propagating into them or removing degenerate
ones - sth to fix I guess).

So I'd be interested to see threading statistics when you place
the threading pass after early FRE (or cd_dce).  I guess early
FRE will already handle quite some of the simplistic "threading"
opportunities (it optimizes redundant checks) thus numbers may
even get worse here.

That said - if you put it before early FRE then I'd put it
right after CCP, not after forwprop.


I placed it just after forwprop because of the pattern it handles:
 bb0:
   x = a COND b;
   if (x) goto ... else goto ...

   Will be transformed into:

 bb0:
   if (a COND b) goto ... else goto ...
Note that extending the backward threader to handle the former style 
sequence is relatively straightforward.  In fact, building a bit of that 
kind of infrastructure is what I expect to be the biggest source of 
things we're missing relative to the forward threader.




In general threading helps forward propagators because of the code
specialization it does. It does not like dead code (as it will get accounted
and prevent duplication), degenerate PHIs (because it will do useless
duplication - something to fix probably), and unpropagated temporaries
(because fsm_find_control_statement_thread_paths does not look into them).
Yes, avoiding useless duplication and factoring identical paths from 
different incoming edges are definitely on the TODO list.  They're 
clearly a source of codesize issues when we compare the backward 
threader to the forward threader.


Presumably for unpropagated temporaries you're referring to cases where 
both operands of a COND_EXPR need to be looked up?  Right now we only 
walk one operand backward to a constant, but we really want to walk both.


jeff


Honza





Re: Implement -Wimplicit-fallthrough: core

2016-08-15 Thread Jeff Law

On 08/12/2016 09:58 AM, Joseph Myers wrote:

On Fri, 12 Aug 2016, Marek Polacek wrote:


Not sure if error is appropriate here, or whether I should downgrade the
error to a warning and ignore the attribute.


I'd say the semantics for uses of attributes that currently parse should
remain unchanged: they should not be reinterpreted (for non-fallthrough
attributes) to be attributes on a statement rather than on an empty
declaration.

Agreed.

jeff



[wwwdocs] Improve example at https://gcc.gnu.org/gcc-6/porting_to.html#flifetime-dse

2016-08-15 Thread Jonathan Wakely

I'm committing this patch to improve the wording of the -flifetime-dse
docs at https://gcc.gnu.org/gcc-6/porting_to.html#flifetime-dse and to
add a new paragraph explaining what the problem is and how to fix it.
The current text doesn't actually say where the UB is, and talks about
a placement new operator, but the example doesn't use placement new.

Committed to CVS.


In a later patch I'd also like to simplify the example to remove the
build(void) function and just do A* a = new A; in main(), because the
build function doesn't seem to serve any purpose. The assertion in the
example doesn't fail, so it doesn't demonstrate the problem, and so
the attribute((noinline)) is just noise. Am I missing something?

I'd also like to replace the GNU coding style with more idiomatic C++.
This is meant to be documentation for end-users writing C++, who don't
typically put spaces before the argument list of a function, or use
(void) for functions with no arguments. Writing (void) is redundant in
C++, and even sometimes considered an abomination.


Index: gcc-6/porting_to.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-6/porting_to.html,v
retrieving revision 1.23
diff -u -r1.23 porting_to.html
--- gcc-6/porting_to.html	23 May 2016 14:23:23 -	1.23
+++ gcc-6/porting_to.html	15 Aug 2016 17:27:21 -
@@ -343,11 +343,11 @@
 More aggressive optimization of -flifetime-dse
 
 
-The C++ compiler (with enabled -flifetime-dse)
-is more aggressive in dead-store elimination in situations where
-a memory store to a location precedes a constructor to the
-memory location. Described situation can be commonly found in programs
-which zero a memory that is eventually passed to a placement new operator:
+The C++ compiler (with -flifetime-dse enabled)
+is more aggressive about dead-store elimination in situations where
+a memory store to a location precedes the construction of an object at
+that memory location. Such situations are commonly found in programs
+which zero memory in a custom new operator:
 
 
 
@@ -384,6 +384,14 @@
 
 
 
+An object's constructor begins the lifetime of a new object at the relevant
+memory location, so any stores to that memory location which happen before
+the constructor are considered "dead stores" and so can be optimized away.
+If the memory needs to be initialized to specific values then that should be
+done by the constructor, not by code that happens before the constructor.
+
+
+
 If the program cannot be fixed to remove the undefined behavior then
 the option -flifetime-dse=1 can be used to disable
 this optimization.


Re: [wwwdocs] Improve example at https://gcc.gnu.org/gcc-6/porting_to.html#flifetime-dse

2016-08-15 Thread Jonathan Wakely

On 15/08/16 18:31 +0100, Jonathan Wakely wrote:

In a later patch I'd also like to simplify the example to remove the
build(void) function and just do A* a = new A; in main(), because the
build function doesn't seem to serve any purpose. The assertion in the
example doesn't fail, so it doesn't demonstrate the problem, and so
the attribute((noinline)) is just noise. Am I missing something?

I'd also like to replace the GNU coding style with more idiomatic C++.
This is meant to be documentation for end-users writing C++, who don't
typically put spaces before the argument list of a function, or use
(void) for functions with no arguments. Writing (void) is redundant in
C++, and even sometimes considered an abomination.


I think this would be an improvement, although I still can't get the
assertion to fail:

#include <stdlib.h>
#include <string.h>
#include <assert.h>

struct A
{
 A() {}

 void* operator new(size_t s)
 {
   void* ptr = malloc(s);
   memset(ptr, 0, s);
   return ptr;
 }

 void operator delete(void* ptr) { free(ptr); }

 int value;
};

int main()
{
 A* a =  new A;
 assert(a->value == 0); /* Use of uninitialized value */
 delete a;
}
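
For contrast, here is a minimal sketch of the shape the porting notes recommend, where the initial value is established by the constructor instead of by the custom operator new. The helper fresh_value() is a hypothetical name added for illustration; the rest follows the example above.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>

struct A
{
  // The constructor, not operator new, gives the member its initial value,
  // so nothing depends on stores made before the object's lifetime begins.
  A() : value(0) {}

  void* operator new(std::size_t s)
  {
    void* ptr = std::malloc(s);
    if (!ptr)
      throw std::bad_alloc();
    return ptr;
  }

  void operator delete(void* ptr) { std::free(ptr); }

  int value;
};

// Hypothetical helper: allocate an A, read the member, release it.
int fresh_value()
{
  A* a = new A;
  int v = a->value;  // reads a member the constructor initialized
  assert(v == 0);
  delete a;
  return v;
}
```

With this shape the result no longer depends on whether -flifetime-dse removes stores made inside operator new.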




libgo patch committed: fix go test -i with gccgo

2016-08-15 Thread Ian Lance Taylor
https://golang.org/issue/16701 points out that `go test -i` fails when
using gccgo.  This patch fixes the problem, by recognizing that the
go/build package will fail to load a standard import when using gccgo.
This is a gccgo-specific patch, because the standard go/build package
does not distinguish standard packages and user-written packages in
this way.  I will leave the issue open to find a better approach in
the future.  This patch bootstrapped and ran Go testsuite on
x86_64-pc-linux-gnu.  Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 239443)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-24e0c4c98e0614b1892316aca787f1c564f2d269
+affb1bf5fcd7abf05993c54313d8000b93a08d4a
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/go/cmd/go/pkg.go
===
--- libgo/go/cmd/go/pkg.go  (revision 238662)
+++ libgo/go/cmd/go/pkg.go  (working copy)
@@ -763,6 +763,13 @@ var cgoSyscallExclude = map[string]bool{
 func (p *Package) load(stk *importStack, bp *build.Package, err error) *Package {
p.copyBuild(bp)
 
+   // When using gccgo the go/build package will not be able to
+   // find a standard package.  It would be nicer to not get that
+   // error, but go/build doesn't know stdpkg.
+   if runtime.Compiler == "gccgo" && err != nil && p.Standard {
+   err = nil
+   }
+
// The localPrefix is the path we interpret ./ imports relative to.
// Synthesized main packages sometimes override this.
p.localPrefix = dirToImportPath(p.Dir)


Re: [PATCH] Extend -falign-FOO=N to N[,M]: the second number is max padding

2016-08-15 Thread Denys Vlasenko

On 08/15/2016 03:30 PM, Richard Biener wrote:

On Mon, Aug 15, 2016 at 1:53 PM, Denys Vlasenko  wrote:

On 08/15/2016 11:45 AM, Richard Biener wrote:


Thus. For this CPU, alignment of loops to 8 bytes is wrong: it helps if it
happens to align a loop to 16 bytes, but it may in fact hurt performance if
it happens to align a loop to 16+8 bytes and this pushes loop's body end
over the next 16-byte boundary, as it happens in the above example.

I suspect something similar was seen sometime ago on a different, earlier
CPU, and on _that_ CPU decoder/loop buffer idiosyncrasies are such that it
likes 8 byte alignment.

It's not true that such alignment is always a win.



It looks to me that all you want is to drop the 8-byte alignment on
entities that are smaller than a cacheline.



I don't think it can be simplified to this.

An example. A loop 122 bytes long fits into either two or three 64-byte
cachelines, depending on where it starts. If it starts in bytes 0..6 in a
cacheline, it fits into two cachelines. If it starts at 7 bytes or more into
a cacheline, it doesn't fit.

8-byte alignment is worse for such a loop than not doing it.
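
The cacheline count can be checked with a small helper. This is only an illustrative sketch; lines_spanned is a hypothetical name, and it assumes a nonzero length and a 64-byte line unless told otherwise. It makes it easy to see where the two-versus-three boundary falls for a 122-byte loop, and that every 8-byte-aligned start other than offset 0 lands in the three-line case.

```cpp
// Number of cache lines touched by `length` bytes that begin `offset` bytes
// into a cache line. Assumes length > 0.
constexpr unsigned lines_spanned(unsigned offset, unsigned length,
                                 unsigned line_size = 64)
{
  // Round the end position (relative to the start of the first line) up to
  // a whole number of lines.
  return (offset % line_size + length + line_size - 1) / line_size;
}
```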

It's even worse for the use case which prompted me to create these patches:
-falign-functions. Linux kernel people want to align all functions
to 64 bytes, but only if the necessary padding is, say, 9 bytes or less.
The rationale is that function calls are often "cold", i.e. function body
is not in L1, and it would be even slower if first insn(s) would require
two L1 loads, not one, to be decoded.

Hence -falign-functions=64,10. This would be a very efficient packing:
only ~15% of all functions would need any padding (the remaining 85%
would start 10 or more bytes before end of cacheline and thus need
no padding), and among those 15% the average padding length would be
only 5 bytes. With very small code size increase, we'd gain a lot
in speed.
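
The 15% and 5-byte figures can be reproduced with a short sketch. It assumes function start offsets are spread uniformly over the cacheline and reads "N,M" as described here: align to N bytes only when that takes fewer than M bytes of padding. The names padding_for, PaddingStats and alignment_stats are hypothetical.

```cpp
// Padding inserted at `addr` under "align to n, but only if the padding
// needed is less than m bytes"; otherwise no padding at all.
constexpr unsigned padding_for(unsigned addr, unsigned n, unsigned m)
{
  return ((n - addr % n) % n < m) ? (n - addr % n) % n : 0;
}

struct PaddingStats
{
  unsigned padded;         // how many of the n start offsets get padding
  unsigned total_padding;  // padding bytes summed over those offsets
};

// Walk every possible start offset within one n-byte block once.
constexpr PaddingStats alignment_stats(unsigned n, unsigned m)
{
  PaddingStats s{0, 0};
  for (unsigned off = 0; off < n; ++off)
    {
      unsigned pad = padding_for(off, n, m);
      if (pad != 0)
        {
          ++s.padded;
          s.total_padding += pad;
        }
    }
  return s;
}
```

For n = 64 and m = 10 this counts 9 padded offsets out of 64 (about 14%) with 45 bytes of padding in total, i.e. 5 bytes on average per padded function.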

This nice optimistic picture is currently destroyed by unnecessary
and not-asked-for "subalignment" to 8 bytes, which now adds 4.5 bytes
of padding on average *to every function*, as a "bonus" making
it *less* efficient versus instruction fetch, not more efficient!


IOW: I am proposing to remove this code because it seems arbitrary: it
helped on one particular CPU model, and maybe only on some particular
benchmarks. On other CPUs, or in other scenarios, it's harmful.
It should not be now done for all CPUs and all programs.

If there is a value in the ability to do a "subalignment" within a larger
alignment, maybe we can make it a separate option, and let user specify it
if he wants?


Controlling this separately makes sense IMHO.  Changing the default for
generic tuning has to be backed up with measurements and old CPUs not
benchmarked should retain the old value when tuned for them.

Let me rephrase the desire again.  The desire is to maximize the number
of instructions fetched with the first cacheline for any label that is branched
(forward) to.  A side-effect may be avoiding penalties for CPUs that have
an instruction started at only N-byte aligned space (not sure that exists
for an ISA with 1-byte opcodes).  For labels that are branched backward to
(thus loops) the desire is to minimize the number of cachelines that need
to be fetched to get the whole loop covered - ISTR CPUs have limits on that
number when it comes to handling loops with loop caches.  Branch target
buffers may also not like too many targets per cache-line -- I expect
8 2-byte functions in a cache-line to be very bad here.

If the situation cannot be improved on any of the above, any additional
"artificial" alignment only makes things worse (by enlarging code).


I have an idea.

Since I am extending -falign-foo directives anyway, I can add even more
functionality to them. Such as:

-falign-functions=N[,M[,N2[,M2]]]

This would emit

.balign N,M-1
[.balign N2,M2-1]   // only if N2 > 1

N2 can be made to default to 8 if N is > 8. This is exactly the current
behavior on x86.

For the use case I described (about kernel aligning functions to 64 bytes)
the desired flag would be:

-falign-functions=64,10,1

Does this look good to you?


[PATCH, i386]: Fix PR72867, incorrect optimization of VMINPS/VMAXPS at compile time

2016-08-15 Thread Uros Bizjak
Hello!

Attached patch applies min/max IEEE rules w.r.t. NaN and signed zeros
also to SSE intrinsics. Before the patch, intrinsics were expanded to
a generic min/max pattern with commutative operands.

Patched compiler expands intrinsics to an UNSPEC or generic min/max,
depending on flag_finite_math_only or flag_signed_zeros.
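
Why commutative operands are a problem can be shown with a scalar model of the hardware rule: the SSE min/max instructions return the second source operand when either input is a NaN or when both inputs are zeros. The sketch below is only a model of that rule for one lane, not the intrinsics or the patch itself; x86_style_min, x86_style_max and demo are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// One-lane model: the first operand wins only if the comparison is strictly
// true; a NaN in either position, or two zeros, selects the second operand.
float x86_style_min(float a, float b) { return a < b ? a : b; }
float x86_style_max(float a, float b) { return a > b ? a : b; }

void demo()
{
  const float nan = std::numeric_limits<float>::quiet_NaN();

  // Swapping the operands changes the result when a NaN is involved.
  assert(x86_style_min(nan, 1.0f) == 1.0f);
  assert(std::isnan(x86_style_min(1.0f, nan)));

  // With two zeros of opposite sign, the second operand is returned.
  assert(std::signbit(x86_style_min(0.0f, -0.0f)));
  assert(!std::signbit(x86_style_min(-0.0f, 0.0f)));
}
```

A pattern that treats the operands as interchangeable is therefore only safe when NaNs and signed zeros need not be honored.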

2016-08-15  Uros Bizjak  

PR target/72867
* config/i386/sse.md (3):
Emit ieee_3
for !flag_finite_math_only or flag_signed_zeros.
(*3): Rename from
*3_finite.  Do not
depend on flag_finite_math_only.
(ieee_3):
New insn pattern.
(*3): Remove.
(*ieee_smin3): Ditto.
(*ieee_smax3): Ditto.
* config/i386/mmx.md (mmx_v2sf3): Emit
mmx_ieee_v2sf3 for !flag_finite_math_only or
flag_signed_zeros.
(*mmx_v2sf3): Rename from *mmx_v2sf3_finite.  Do not
depend on flag_finite_math_only.
(mmx_ieee_v2sf3): New insn pattern.
(*mmx_v2sf3): Remove.
* config/i386/subst.md (round_saeonly_mask_arg3): New subst attribute.
* config/i386/i386.c (ix86_expand_sse_fp_minmax): Check
flag_signed_zeros instead of !flag_unsafe_math_optimizations.

testsuite/ChangeLog:

2016-08-15  Uros Bizjak  

PR target/72867
* gcc.target/i386/pr72867.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32} .

Committed to mainline SVN.

Uros.
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 239479)
+++ config/i386/i386.c  (working copy)
@@ -23404,7 +23404,7 @@ ix86_expand_sse_fp_minmax (rtx dest, enum rtx_code
 
   /* We want to check HONOR_NANS and HONOR_SIGNED_ZEROS here,
  but MODE may be a vector mode and thus not appropriate.  */
-  if (!flag_finite_math_only || !flag_unsafe_math_optimizations)
+  if (!flag_finite_math_only || flag_signed_zeros)
 {
   int u = is_min ? UNSPEC_IEEE_MIN : UNSPEC_IEEE_MAX;
   rtvec v;
Index: config/i386/i386.md
===
--- config/i386/i386.md (revision 239479)
+++ config/i386/i386.md (working copy)
@@ -885,6 +885,14 @@
  (umax "maxu") (umin "minu")])
 (define_code_attr maxmin_float [(smax "max") (smin "min")])
 
+(define_int_iterator IEEE_MAXMIN
+   [UNSPEC_IEEE_MAX
+UNSPEC_IEEE_MIN])
+
+(define_int_attr ieee_maxmin
+   [(UNSPEC_IEEE_MAX "max")
+(UNSPEC_IEEE_MIN "min")])
+
 ;; Mapping of logic operators
 (define_code_iterator any_logic [and ior xor])
 (define_code_iterator any_or [ior xor])
@@ -17401,14 +17409,6 @@
 ;; Their operands are not commutative, and thus they may be used in the
 ;; presence of -0.0 and NaN.
 
-(define_int_iterator IEEE_MAXMIN
-   [UNSPEC_IEEE_MAX
-UNSPEC_IEEE_MIN])
-
-(define_int_attr ieee_maxmin
-   [(UNSPEC_IEEE_MAX "max")
-(UNSPEC_IEEE_MIN "min")])
-
 (define_insn "*ieee_s3"
   [(set (match_operand:MODEF 0 "register_operand" "=x,v")
(unspec:MODEF
Index: config/i386/mmx.md
===
--- config/i386/mmx.md  (revision 239479)
+++ config/i386/mmx.md  (working copy)
@@ -296,10 +296,6 @@
(set_attr "prefix_extra" "1")
(set_attr "mode" "V2SF")])
 
-;; ??? For !flag_finite_math_only, the representation with SMIN/SMAX
-;; isn't really correct, as those rtl operators aren't defined when
-;; applied to NaNs.  Hopefully the optimizers won't get too smart on us.
-
 (define_expand "mmx_v2sf3"
   [(set (match_operand:V2SF 0 "register_operand")
 (smaxmin:V2SF
@@ -307,30 +303,47 @@
  (match_operand:V2SF 2 "nonimmediate_operand")))]
   "TARGET_3DNOW"
 {
-  if (!flag_finite_math_only)
-operands[1] = force_reg (V2SFmode, operands[1]);
-  ix86_fixup_binary_operands_no_copy (, V2SFmode, operands);
+  if (!flag_finite_math_only || flag_signed_zeros)
+{
+  operands[1] = force_reg (V2SFmode, operands[1]);
+  emit_insn (gen_mmx_ieee_v2sf3
+(operands[0], operands[1], operands[2]));
+  DONE;
+}
+  else
+ix86_fixup_binary_operands_no_copy (, V2SFmode, operands);
 })
 
-(define_insn "*mmx_v2sf3_finite"
+;; These versions of the min/max patterns are intentionally ignorant of
+;; their behavior wrt -0.0 and NaN (via the commutative operand mark).
+;; Since both the tree-level MAX_EXPR and the rtl-level SMAX operator
+;; are undefined in this condition, we're certain this is correct.
+
+(define_insn "*mmx_v2sf3"
   [(set (match_operand:V2SF 0 "register_operand" "=y")
 (smaxmin:V2SF
  (match_operand:V2SF 1 "nonimmediate_operand" "%0")
  (match_operand:V2SF 2 "nonimmediate_operand" "ym")))]
-  "TARGET_3DNOW && flag_finite_math_only
-   && ix86_binary_operator_ok (, V2SFmode, operands)"
+  "TARGET_3DNOW && ix86_binary_operator_ok (, V2SFmode, operands)"
   "pf\t{%2, %0|%0, %2}"
   [(set_attr "type" "mmxadd")
(set_attr "prefix_extra" "1")
(set_attr "mode" "V2SF")])
 
-(define_insn "*mmx_v2sf3"
+;; Thes

[patch, fortran, committed] Set deferred flag on typespec for temporary strings

2016-08-15 Thread Thomas Koenig

Hello world,

I just committed the attached patch as obvious and simple after
regression-testing.

One source of mysterious errors (and regressions) in the front end was
that the deferred flag on the typespec was not set for deferred strings.

Because some flags (including the deferred flag) in the typespec were
not output to the dump of the Fortran AST, I have also added this.

I will backport to 6 and 5 in a few days.

Regards

Thomas

 2016-08-15  Thomas Koenig  

* frontend-passes.c (create_var):  Set ts.deferred for
deferred-length character variables.
* dump-parse-tree.c (show_typespec):  Also dump
is_c_interop, is_iso_c and deferred flags.
Index: dump-parse-tree.c
===
--- dump-parse-tree.c	(Revision 239218)
+++ dump-parse-tree.c	(Arbeitskopie)
@@ -120,7 +120,15 @@ show_typespec (gfc_typespec *ts)
   fprintf (dumpfile, "%d", ts->kind);
   break;
 }
+  if (ts->is_c_interop)
+fputs (" C_INTEROP", dumpfile);
 
+  if (ts->is_iso_c)
+fputs (" ISO_C", dumpfile);
+
+  if (ts->deferred)
+fputs (" DEFERRED", dumpfile);
+
   fputc (')', dumpfile);
 }
 
Index: frontend-passes.c
===
--- frontend-passes.c	(Revision 239218)
+++ frontend-passes.c	(Arbeitskopie)
@@ -616,6 +616,7 @@ create_var (gfc_expr * e, const char *vname)
   gfc_code *n;
   gfc_namespace *ns;
   int i;
+  bool deferred;
 
   if (e->expr_type == EXPR_CONSTANT || is_fe_temp (e))
 return gfc_copy_expr (e);
@@ -666,6 +667,7 @@ create_var (gfc_expr * e, const char *vname)
 	}
 }
 
+  deferred = 0;
   if (e->ts.type == BT_CHARACTER && e->rank == 0)
 {
   gfc_expr *length;
@@ -675,7 +677,10 @@ create_var (gfc_expr * e, const char *vname)
   if (length)
 	symbol->ts.u.cl->length = length;
   else
-	symbol->attr.allocatable = 1;
+	{
+	  symbol->attr.allocatable = 1;
+	  deferred = 1;
+	}
 }
 
   symbol->attr.flavor = FL_VARIABLE;
@@ -687,6 +692,7 @@ create_var (gfc_expr * e, const char *vname)
   result = gfc_get_expr ();
   result->expr_type = EXPR_VARIABLE;
   result->ts = e->ts;
+  result->ts.deferred = deferred;
   result->rank = e->rank;
   result->shape = gfc_copy_shape (e->shape, e->rank);
   result->symtree = symtree;


Re: backward threading heuristics tweek

2016-08-15 Thread Jan Hubicka
> >So the threaded path lives fully inside loop1: 6->8->9->3->4->6 propagating
> >that phi_inserted is 0 after the first iteration of the loop.  This looks
> >like a useful loop peeling opportunity which does not garble loop structure.
> >So perhaps threading paths starting at and passing through the loop latch
> >(i.e. peeling) is sane? Perhaps all paths fully captured in the loop in
> >question are?
> Peeling like this has long been a point of contention -- it totally
> mucks things up like vectorizing.
> 
> The general issue is that the threader knows nothing about the
> characteristics of the loop -- thus peeling at this point is
> premature and just as likely to hinder performance as improve it.
> 
> I've never been happy with how this aspect of threading vs loop opts
> turned out and we have open BZs related to this rat's nest of issues.

Ok, then we perhaps just want to silence the testcase?

Honza
> 
> jeff
> 


Re: Ping! Re: [PATCH, Fortran] New flag -finit-derived to initialize components of derived types

2016-08-15 Thread Thomas Koenig

Hi Fritz,


Still waiting on a review for this patch. Comments/concerns from all
are welcome.



The patch looks fine to me.

OK for trunk, and thanks for the patch!

Regards

Thomas

@Gerald: Will a gcc-7/changes.html file be generated?  If so, we should
document this new flag (and others...)



Re: [PATCH] Add mark_spam.py script

2016-08-15 Thread Joseph Myers
On Mon, 15 Aug 2016, Martin Liška wrote:

> It can, currently we mark as spam just the first comment. If there's a spam PR
> which contains multiple comments, I'll extend the script.

There certainly are spam bugs where the spammer pasted their spam in a 
comment after creating the bug, rather than putting it in the initial bug 
description; see bug 76607, for example.  Maybe all comments created by 
the original bug submitter should be considered as spam, not just the 
initial bug description?

-- 
Joseph S. Myers
jos...@codesourcery.com

Ping^3 Re: Implement C _FloatN, _FloatNx types [version 5]

2016-08-15 Thread Joseph Myers
Ping^3.  This patch 
 
(non-C-front-end parts) is still pending review.  (We have approvals for 
Fortran and rs6000 parts.)

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [WIP] [PR fortran/72741] Rework Fortran OpenACC routine clause handling

2016-08-15 Thread Cesar Philippidis
On 08/11/2016 09:26 AM, Thomas Schwinge wrote:

> As Cesar asked for it, there is now a Git branch
> tschwinge/omp/pr72741-wip containing these changes (plus some other
> pending changes that I didn't single out at this time), at
> .
> (I expect it does, but I didn't verify that this actually builds; I have
> further changes on top of that.)  Cesar, please tell me if you'd like me
> to push this to GitHub, in case you want to use their review/commentary
> functions, or the like.

No, that git repository is fine.

> On Thu, 11 Aug 2016 17:40:26 +0200, Jakub Jelinek  wrote:
>> On Thu, Aug 11, 2016 at 05:18:43PM +0200, Thomas Schwinge wrote:
>>> --- gcc/fortran/gfortran.h
>>> +++ gcc/fortran/gfortran.h
> 
>>>  /* Symbol attribute structure.  */
>>> -typedef struct
>>> +typedef struct symbol_attribute
>>>  {
> 
>> While symbol_attribute is already bloated, I don't like bloating it this
>> much further.  Do you really need it for all symbols, or just all 
>> subroutines?
> 
> Certainly not for all symbols; just for what is valid to be used with the
> OpenACC routine directive, which per OpenACC 2.0a, 2.13.1 Routine
> Directive is:
> 
> In Fortran the syntax of the routine directive is:
> !$acc routine clause-list
> !$acc routine( name ) clause-list
> In Fortran, the routine directive without a name may appear within the 
> specification part of a subroutine or function definition, or within an 
> interface body for a subroutine or function in an interface block, and 
> applies to the containing subroutine or function. The routine directive with 
> a name may appear in the specification part of a subroutine, function or 
> module, and applies to the named subroutine or function.
> 
> (Pasting that in full just in case that contains some additional Fortran
> lingo, meaning more than "subroutines".)

I've avoided that problem in this patch. For the moment, I'm ignoring the
device_type problem and handling all of the matching errors in
gfc_match_oacc_routine. Your patch was handling those errors in
add_attributes_to_decls, which I think is too late.

device_type will require extra handling down the road. But instead of
introducing new attributes, we can just use the existing
gfc_oacc_routine_name struct to capture and chain all of the clauses for
all of the different device_types. Then we can teach
add_attributes_to_decls to call gfc_oacc_routine_dims to generate the
appropriate OACC_FUNCTION attribute for a given set of device_type clauses.

Note that besides checking for multiple acc routine directives, this
patch also handles the case where the optional name argument in 'acc
routine (NAME)' is the name of the current procedure. This was a TODO
item in gomp4.

Thomas, does this patch look ok to you for gomp4?

Cesar
2016-08-15  Cesar Philippidis  

	gcc/fortran/
	* openmp.c (gfc_match_oacc_routine): Error on repeated ACC ROUTINE
	directives.  Consider the optional NAME argument being the current
	procedure name.
	* trans-decl.c (add_attributes_to_decl): Use build_oacc_routine_dims
	to construct the oacc_function attribute arguments.

	gcc/testsuite/
	* gfortran.dg/goacc/pr72741-2.f: New test.
	* gfortran.dg/goacc/pr72741-intrinsic-1.f: Add test coverage.
	* gfortran.dg/goacc/pr72741-intrinsic-2.f: Likewise.
	* gfortran.dg/goacc/pr72741.f90: Likewise.


diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index 80f46c0..cb8efb8 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -1877,8 +1877,9 @@ gfc_match_oacc_cache (void)
   return MATCH_YES;
 }
 
-/* Determine the loop level for a routine.  Returns OACC_FUNCTION_NONE if
-   any error is detected.  */
+/* Determine the loop level for a routine.  Returns OACC_FUNCTION_NONE
+   if any error is detected.  Note that this function needs to be
+   called repeatedly for each DEVICE_TYPE.  */
 
 static oacc_function
 gfc_oacc_routine_dims (gfc_omp_clauses *clauses)
@@ -1925,6 +1926,7 @@ gfc_match_oacc_routine (void)
   gfc_omp_clauses *c = NULL;
   gfc_oacc_routine_name *n = NULL;
   oacc_function dims = OACC_FUNCTION_NONE;
+  bool seen_error = false;
 
   old_loc = gfc_current_locus;
 
@@ -1969,6 +1971,13 @@ gfc_match_oacc_routine (void)
 	  gfc_current_locus = old_loc;
 	  return MATCH_ERROR;
 	}
+
+	  /* Set sym to NULL if it matches the current procedure's
+	 name.  This will simplify the check for duplicate ACC
+	 ROUTINE attributes.  */
+	  if (gfc_current_ns->proc_name
+	  && !strcmp (buffer, gfc_current_ns->proc_name->name))
+	sym = NULL;
 	}
   else
 {
@@ -1993,19 +2002,24 @@ gfc_match_oacc_routine (void)
 	  != MATCH_YES))
 return MATCH_ERROR;
 
+  /* Scan for invalid routine geometry.  */
   dims = gfc_oacc_routine_dims (c);
   if (dims == OACC_FUNCTION_NONE)
 {
   gfc_error ("Multiple loop axes specified in !$ACC ROUTINE at %C");
-  goto cleanup;
+
+  /* Don't abort early, b

Re: [wwwdocs] Improve example at https://gcc.gnu.org/gcc-6/porting_to.html#flifetime-dse

2016-08-15 Thread Bernd Edlinger
Hi Jonathan,

> I think this would be an improvement, although I still can't get the
> assertion to fail:

Probably because the memory is still initialized to zero,
when it is used for the first time.

Try this:

#include <stdlib.h>
#include <string.h>
#include <assert.h>

struct A
{
 A() {}

 void* operator new(size_t s)
 {
   void* ptr = malloc(s);
   memset(ptr, 0xFF, s);
   return ptr;
 }

 void operator delete(void* ptr) { free(ptr); }

 int value;
};

int main()
{
 A* a =  new A;
 assert(a->value == -1); /* Use of uninitialized value */
 delete a;
}



Bernd.

[PING v2] Unreviewed GCC-6 patches

2016-08-15 Thread Jakub Sejdak
Hi!

I would like to ping a couple of unreviewed patches for GCC-6 branch
(they are already in trunk):

- Backport new Phoenix-RTOS OS name to config.sub
https://gcc.gnu.org/ml/gcc-patches/2016-07/msg01441.html

- Backport support for Phoenix-RTOS targets in GCC's config for ARM platform.
https://gcc.gnu.org/ml/gcc-patches/2016-07/msg01442.html

- Backport support for Phoenix-RTOS targets in libgcc's config for ARM platform.
https://gcc.gnu.org/ml/gcc-patches/2016-07/msg01440.html

Thanks,
Kuba Sejdak

-- 
Jakub Sejdak
Software Engineer
Phoenix Systems (www.phoesys.com)
+48 608 050 163