[Bug other/78161] [7 Regression] contrib/download_prerequisites fails on MacOS, Lubuntu, and Windows

2016-11-01 Thread damian at sourceryinstitute dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78161

--- Comment #2 from Damian Rouson  ---
I could close this because it turned out the failures were my error (I was
using a modified version of the script but thinking I was using the downloaded
version of the script).  If it's ok, I'll keep it open with the expectation
that I'll still submit a patch this week to broaden the download options, which
seems like a good idea, despite the fact that my bug report was a false alarm.

[Bug fortran/71902] [5/6 Regression] Unneeded temporary on reallocatable character assignment

2016-11-01 Thread tkoenig at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71902

--- Comment #11 from Thomas Koenig  ---
Author: tkoenig
Date: Tue Nov  1 08:12:00 2016
New Revision: 241732

URL: https://gcc.gnu.org/viewcvs?rev=241732&root=gcc&view=rev
Log:
2016-10-31  Thomas Koenig  

Backport from trunk
PR fortran/71902
* frontend-passes.c (realloc_string_callback): Also check for the
lhs being deferred.  Name temporary variable "realloc_string".

2016-10-31  Thomas Koenig  

Backport from trunk
PR fortran/71902
* gfortran.dg/dependency_47.f90:  New test.
* gfortran.dg/dependency_49.f90:  New test.



Added:
branches/gcc-5-branch/gcc/testsuite/gfortran.dg/dependency_47.f90
branches/gcc-5-branch/gcc/testsuite/gfortran.dg/dependency_49.f90
Modified:
branches/gcc-5-branch/gcc/fortran/ChangeLog
branches/gcc-5-branch/gcc/fortran/frontend-passes.c
branches/gcc-5-branch/gcc/testsuite/ChangeLog

[Bug fortran/71902] [5/6 Regression] Unneeded temporary on reallocatable character assignment

2016-11-01 Thread tkoenig at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71902

Thomas Koenig  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #12 from Thomas Koenig  ---
Fixed on all open branches, closing.

[Bug fortran/70959] [6/7 Regression] Invalid type determination due to expression in a type declaration statement

2016-11-01 Thread tkoenig at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70959

Thomas Koenig  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #11 from Thomas Koenig  ---
Fixed with https://gcc.gnu.org/viewcvs?rev=241689&root=gcc&view=rev , also
fixed on gcc 6. Closing.

[Bug target/77308] surprisingly large stack usage for sha512 on arm

2016-11-01 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308

--- Comment #33 from Richard Earnshaw  ---
(In reply to Wilco from comment #32)
> (In reply to Bernd Edlinger from comment #31)
> > Furthermore, if I want to do -Os the third condition is FALSE too.
> > But one ldrd must be shorter than two ldr ?
> > 
> > That seems wrong...
> 
> Indeed, on a target that supports LDRD you want to use LDRD if legal. LDM
> should only be tried on Thumb-1. Emitting LDRD from a peephole when the
> offset is in range will never increase code size so should always be enabled.

The logic is certainly strange.  Some cores run LDRD less quickly than they can
do LDM, or even two independent loads.  I suspect the logic is meant to be: use
LDRD if available and not (optimizing for speed on a slow LDRD-device).

[Bug rtl-optimization/78038] [5/6 Regression] internal compiler error: in get_sub_rtx, at ree.c:655

2016-11-01 Thread ktkachov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78038

--- Comment #11 from ktkachov at gcc dot gnu.org ---
Author: ktkachov
Date: Tue Nov  1 10:29:40 2016
New Revision: 241735

URL: https://gcc.gnu.org/viewcvs?rev=241735&root=gcc&view=rev
Log:
[ree] PR rtl-optimization/78038: Handle global register dataflow definitions in
ree

Backport from mainline
2016-10-21  Kyrylo Tkachov  

PR rtl-optimization/78038
* ree.c (get_defs): Return NULL if a defining insn for REG cannot
be deduced to set REG through the RTL structure.
(make_defs_and_copies_lists): Return false on a failing get_defs call.

* gcc.target/aarch64/pr78038.c: New test.


Added:
branches/gcc-6-branch/gcc/testsuite/gcc.target/aarch64/pr78038.c
Modified:
branches/gcc-6-branch/gcc/ChangeLog
branches/gcc-6-branch/gcc/ree.c
branches/gcc-6-branch/gcc/testsuite/ChangeLog

[Bug target/77308] surprisingly large stack usage for sha512 on arm

2016-11-01 Thread bernd.edlinger at hotmail dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308

--- Comment #34 from Bernd Edlinger  ---
(In reply to Richard Earnshaw from comment #33)
> (In reply to Wilco from comment #32)
> > (In reply to Bernd Edlinger from comment #31)
> > > Furthermore, if I want to do -Os the third condition is FALSE too.
> > > But one ldrd must be shorter than two ldr ?
> > > 
> > > That seems wrong...
> > 
> > Indeed, on a target that supports LDRD you want to use LDRD if legal. LDM
> > should only be tried on Thumb-1. Emitting LDRD from a peephole when the
> > offset is in range will never increase code size so should always be 
> > enabled.
> 
> The logic is certainly strange.  Some cores run LDRD less quickly than they
> can do LDM, or even two independent loads.  I suspect the logic is meant to
> be: use LDRD if available and not (optimizing for speed on a slow
> LDRD-device).

Ok, so instead of removing this completely I should change it to:
   TARGET_LDRD
   && (current_tune->prefer_ldrd_strd
   || optimize_function_for_size_p (cfun))

[Bug fortran/58750] Wrong code with realloc on assignment and array constructors with numeric type conversion

2016-11-01 Thread adam at aphirst dot karoo.co.uk
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58750

--- Comment #11 from Adam Hirst  ---
It's been almost another year, and I just wanted to confirm that this bug is
still present in the following version:

GNU Fortran (GCC) 6.2.1 20160830

[Bug fortran/58861] Realloc on assignment: Bogus "Array bound mismatch" error with -fcheck=bounds

2016-11-01 Thread adam at aphirst dot karoo.co.uk
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58861

--- Comment #5 from Adam Hirst  ---
It's been over a year, and I can confirm that this bug is present for all 3
examples in this thread, in the following version:

gcc (GCC) 6.2.1 20160830

[Bug target/77308] surprisingly large stack usage for sha512 on arm

2016-11-01 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308

--- Comment #35 from wilco at gcc dot gnu.org ---
(In reply to Richard Earnshaw from comment #30)
> (In reply to wilco from comment #29)
> >  Combine could help with
> > merging 2 loads/stores into a single instruction.
> 
> No, combine works strictly on dataflow dependencies.  Two stores cannot be
> dataflow related so won't be combined.  Loads would only be dataflow related
> if both loads fed into *exactly* one data-processing instruction after the
> split.  That's unlikely to happen so I very much doubt it would happen there
> either.

Right, so then either we need to look further when creating ldm/ldrd or when
splitting use a parallel of 2 SI mode loads.

(In reply to Richard Earnshaw from comment #33)
> (In reply to Wilco from comment #32)
> > (In reply to Bernd Edlinger from comment #31)
> > > Furthermore, if I want to do -Os the third condition is FALSE too.
> > > But one ldrd must be shorter than two ldr ?
> > > 
> > > That seems wrong...
> > 
> > Indeed, on a target that supports LDRD you want to use LDRD if legal. LDM
> > should only be tried on Thumb-1. Emitting LDRD from a peephole when the
> > offset is in range will never increase code size so should always be 
> > enabled.
> 
> The logic is certainly strange.  Some cores run LDRD less quickly than they
> can do LDM, or even two independent loads.  I suspect the logic is meant to
> be: use LDRD if available and not (optimizing for speed on a slow
> LDRD-device).

The issue is that the behaviour is not consistent. If DI mode accesses are
split early, LDRD is not used, but if not split, LDRD is used even on cores
where LDRD is not preferred or slow.

Selecting -mcpu=cortex-a57 while splitting early gives:

t0p:
        ldrd    r3, r2, [r0]
        adds    r3, r3, #1
        adc     r2, r2, #0
        strd    r3, r2, [r0]
        bx      lr

But with -mcpu=cortex-a53 (with -O2 or -Os):

t0p:
        ldr     r3, [r0]
        ldr     r2, [r0, #4]
        adds    r3, r3, #1
        str     r3, [r0]
        adc     r2, r2, #0
        str     r2, [r0, #4]
        bx      lr

GCC currently emits LDRD for both cases - so clearly LDRD was preferred...

[Bug fortran/59298] ICE when initialising PARAMETER array of derived-type (containing an array) using array constructor

2016-11-01 Thread adam at aphirst dot karoo.co.uk
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59298

--- Comment #7 from Adam Hirst  ---
It's been almost a year, and I wanted to confirm that this bug is still present
for the test case I attached to the thread, and for Janus' reduced example.
Also relevant might be that my comment at 2014-11-12 13:09:05 UTC still
applies; namely that removing the parentheses dodges the ICE permitting the
code to compile.

I'm using version:
gcc (GCC) 6.2.1 20160830

[Bug target/77308] surprisingly large stack usage for sha512 on arm

2016-11-01 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308

--- Comment #36 from wilco at gcc dot gnu.org ---
(In reply to Bernd Edlinger from comment #34)
> (In reply to Richard Earnshaw from comment #33)
> > (In reply to Wilco from comment #32)
> > > (In reply to Bernd Edlinger from comment #31)
> > > > Furthermore, if I want to do -Os the third condition is FALSE too.
> > > > But one ldrd must be shorter than two ldr ?
> > > > 
> > > > That seems wrong...
> > > 
> > > Indeed, on a target that supports LDRD you want to use LDRD if legal. LDM
> > > should only be tried on Thumb-1. Emitting LDRD from a peephole when the
> > > offset is in range will never increase code size so should always be 
> > > enabled.
> > 
> > The logic is certainly strange.  Some cores run LDRD less quickly than they
> > can do LDM, or even two independent loads.  I suspect the logic is meant to
> > be: use LDRD if available and not (optimizing for speed on a slow
> > LDRD-device).
> 
> Ok, so instead of removing this completely I should change it to:
>    TARGET_LDRD
>    && (current_tune->prefer_ldrd_strd
>    || optimize_function_for_size_p (cfun))

That's better but still won't emit LDRD as it seems most cores have
prefer_ldrd_strd disabled... Given that we currently always emit LDRD/STRD for
DI mode accesses, this should just check TARGET_LDRD.

[Bug target/77308] surprisingly large stack usage for sha512 on arm

2016-11-01 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308

--- Comment #37 from Richard Earnshaw  ---
(In reply to Bernd Edlinger from comment #34)
> (In reply to Richard Earnshaw from comment #33)

> > The logic is certainly strange.  Some cores run LDRD less quickly than they
> > can do LDM, or even two independent loads.  I suspect the logic is meant to
> > be: use LDRD if available and not (optimizing for speed on a slow
> > LDRD-device).
> 
> Ok, so instead of removing this completely I should change it to:
>    TARGET_LDRD
>    && (current_tune->prefer_ldrd_strd
>    || optimize_function_for_size_p (cfun))

That sounds about right.  Note that the original patch, back in 2013, said:

"* does not attempt to generate LDRD/STRD when optimizing for size and non of
the LDM/STM patterns match (but it would be easy to add),"
(https://gcc.gnu.org/ml/gcc-patches/2013-02/msg00604.html)

So it appears that this case was not attempted at the time.

I think when LDRD is not preferred we'd want to try to use LDM by preference if
the address offsets support it, even when optimizing for size.  Otherwise, use
LDRD if it supports the operation.

[Bug target/77933] Stack corruption on ARM when using high registers and __builtin_return_address

2016-11-01 Thread thopre01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77933

Thomas Preud'homme  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |ASSIGNED
   Last reconfirmed|                            |2016-11-01
           Assignee|unassigned at gcc dot gnu.org  |thopre01 at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #2 from Thomas Preud'homme  ---
Working on a patch.

[Bug target/78176] New: [MIPS] miscompiles ldxc1 with large pointers on 32-bits

2016-11-01 Thread james410 at cowgill dot org.uk
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78176

Bug ID: 78176
   Summary: [MIPS] miscompiles ldxc1 with large pointers on
32-bits
   Product: gcc
   Version: 6.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: james410 at cowgill dot org.uk
  Target Milestone: ---

Created attachment 39937
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=39937&action=edit
testcase

Originally from the 'flint' package on Debian failing to build:
https://buildd.debian.org/status/fetch.php?pkg=flint&arch=mipsel&ver=2.5.2-10&stamp=1477705959

The attached testcase when compiled and run on a 64-bit mips processor crashes
with a SIGBUS error. The same binary works when run on a 32-bit mips processor.

Compile with -O2 optimization like this:
> gcc -O2 -march=mips32r2 ldxc1.c -o ldxc1

The crash occurs at the ldxc1 instruction at the end of this excerpt from the
inner loop in the ldxc1_test function:

> 8e0:   02028023    subu    s0,s0,v0
> 8e4:   afbf003c    sw      ra,60(sp)
> 8e8:   0080b025    move    s6,a0
> 8ec:   8fb3005c    lw      s3,92(sp)
> [loop begins here]
> 8f0:   02e0a825    move    s5,s7
> 8f4:   26d6ffff    addiu   s6,s6,-1
> 8f8:   0256102a    slt     v0,s2,s6
> 8fc:   10400012    beqz    v0,948
> 900:   8f998054    lw      t9,-32684(gp)
> 904:   8e620000    lw      v0,0(s3)
> 908:   8ea6fff8    lw      a2,-8(s5)
> 90c:   8ee30000    lw      v1,0(s7)
> 910:   00551021    addu    v0,v0,s5
> 914:   26b5fffc    addiu   s5,s5,-4
> 918:   00511021    addu    v0,v0,s1
> 91c:   00c33023    subu    a2,a2,v1
> 920:   8c420000    lw      v0,0(v0)
> 924:   00541021    addu    v0,v0,s4
> 928:   2694fff8    addiu   s4,s4,-8
> 92c:   4e020301    ldxc1   $f12,v0(s0)

Before the ldxc1 instruction is executed, gdb reports that the values in v0 and
s0 are both large integers (above 0x80000000):
(gdb) print/x $v0
$1 = 0xfffee7f8
(gdb) print/x $s0
$2 = 0x80008b50

When added together, the lower 32 bits contain the correct pointer (in this
case on the stack). On a 32-bit processor this is fine.

On a 64-bit processor, we know that v0 and s0 are sign extended as the last
instructions to touch them were the addu at 924 and the subu at 8e0. So the
values in the registers are actually:
v0 = 0xfffffffffffee7f8
s0 = 0xffffffff80008b50

Adding these together (modulo 64-bit) gives the final pointer of
0xffffffff7fff7348 which is outside the user address space and thus results in
a SIGBUS.

I think GCC is assuming that the address calculated by the ldxc1 instruction is
modulo 32-bit when compiled for a 32-bit processor. However, this is not true
if the code is later run on a 64-bit processor.

[Bug target/78176] [MIPS] miscompiles ldxc1 with large pointers on 32-bits

2016-11-01 Thread james410 at cowgill dot org.uk
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78176

James Cowgill  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #39937|0                           |1
        is obsolete|                            |

--- Comment #1 from James Cowgill  ---
Created attachment 39938
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=39938&action=edit
testcase v2

Testcase actually initializes the arrays this time :)

The bug still occurs.

[Bug go/78145] Several go.test tests fail with error: integer constant overflow on 32bit targets

2016-11-01 Thread ian at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78145

--- Comment #2 from ian at gcc dot gnu.org  ---
Author: ian
Date: Tue Nov  1 13:46:10 2016
New Revision: 241740

URL: https://gcc.gnu.org/viewcvs?rev=241740&root=gcc&view=rev
Log:
PR go/78145
compiler: don't put print/println constants into temporaries

It's not necessary, and it breaks setting their type to int64/uint64
when appropriate.

This fixes GCC PR 78145.

Reviewed-on: https://go-review.googlesource.com/32475

Modified:
trunk/gcc/go/gofrontend/MERGE
trunk/gcc/go/gofrontend/expressions.cc

[Bug go/78145] Several go.test tests fail with error: integer constant overflow on 32bit targets

2016-11-01 Thread ian at airs dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78145

Ian Lance Taylor  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #3 from Ian Lance Taylor  ---
Fixed.

[Bug middle-end/78174] out of bounds array subscript in rtl.h NOTE_DATA macro

2016-11-01 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78174

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #5 from Jakub Jelinek  ---
Using __builtin_object_size (x, 1) for memset/memcpy etc. is just incorrect,
don't do that.

[Bug target/77308] surprisingly large stack usage for sha512 on arm

2016-11-01 Thread bernd.edlinger at hotmail dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308

Bernd Edlinger  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #39898|0                           |1
        is obsolete|                            |

--- Comment #38 from Bernd Edlinger  ---
Created attachment 39939
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=39939&action=edit
proposed patch, v2

Hi,

this is a new version that tries to fix the fallout of
the previous attempt.

I will attempt a bootstrap and reg-test later this week.

It splits the logical di3 pattern right at the expansion
when !TARGET_HARD_FLOAT or !TARGET_IWMMXT, in order not to
break the neon/iwmmxt patterns that seem to depend on it.

Simply disabling the logical di3 pattern made it impossible
to merge the ldrd/strd later because the ldr/str got expanded
too far away from each other.

It splits the adddi3/subdi3 in the split1 pass but only when
!TARGET_HARD_FLOAT, because other hard float patterns seem
to depend on it.

Note that the setting of the out register in the shift
expansion is only necessary in the case -mfpu=vfp -mhard-float;
in all other configurations this is now unnecessary.

So far I have only benchmarked with the sha512 test case
and a modified sha512 with the Sigma blocks decorated with bit-not (~).

Checked that the pr53447-*.c test cases work again.

Checked that this test case emits all ldrd/strd where expected:

cat test.c
void foo(long long* p)
{
  p[1] |= 0x10001;
  p[2] &= 0x10001;
  p[3] ^= 0x10001;
  p[4] += 0x10001;
  p[5] -= 0x10001;
  p[6] = ~p[6];
  p[7] <<= 5;
  p[8] >>= 5;
  p[9] -= p[10];
}

At -Os -mthumb -march=armv7-a with -msoft-float / -mhard-float, this patch
improves the number of ldrd/strd emitted to 100%.

I wonder if it is OK to emit ldrd at all when optimizing
for speed, given they are considered slower than ldm / 2x ldr ?

With -Os -mfpu=neon / -mfpu=vfp / -march=iwmmxt: checked that the stack usage
is still the same, around 2328 bytes.

With -Os -marm / thumb2: made sure that the stack usage is still 272 bytes.

Unlike the previous patch, thumb1 stack usage stays at 1588 bytes,
because thumb1 cannot split the adddi3 pattern, once it is emitted.

[Bug fortran/70696] [6.0] ICE on EVENT POST of host-associated EVENT_TYPE coarray

2016-11-01 Thread dominiq at lps dot ens.fr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70696

Dominique d'Humieres  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |WAITING

--- Comment #4 from Dominique d'Humieres  ---
Duplicate of pr67073?

[Bug target/77308] surprisingly large stack usage for sha512 on arm

2016-11-01 Thread bernd.edlinger at hotmail dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308

Bernd Edlinger  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #39939|0                           |1
        is obsolete|                            |

--- Comment #39 from Bernd Edlinger  ---
Created attachment 39940
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=39940&action=edit
proposed patch, v2

last upload was accidentally truncated.
uploaded the right patch.

Sorry.

[Bug target/77308] surprisingly large stack usage for sha512 on arm

2016-11-01 Thread bernd.edlinger at hotmail dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308

--- Comment #40 from Bernd Edlinger  ---
BTW: I found something strange in this pattern in neon.md:

(define_insn_and_split "orndi3_neon"
  [(set (match_operand:DI 0 "s_register_operand" "=w,?&r,?&r,?&r")
(ior:DI (not:DI (match_operand:DI 2 "s_register_operand" "w,0,0,r"))
(match_operand:DI 1 "s_register_operand" "w,r,r,0")))]
  "TARGET_NEON"
  "@
   vorn\t%P0, %P1, %P2
   #
   #
   #"
  "reload_completed && 
   (TARGET_NEON && !(IS_VFP_REGNUM (REGNO (operands[0]))))"
  [(set (match_dup 0) (ior:SI (not:SI (match_dup 2)) (match_dup 1)))
   (set (match_dup 3) (ior:SI (not:SI (match_dup 4)) (match_dup 5)))]
  "
  {
    if (TARGET_THUMB2)
      {
        operands[3] = gen_highpart (SImode, operands[0]);
        operands[0] = gen_lowpart (SImode, operands[0]);
        operands[4] = gen_highpart (SImode, operands[2]);
        operands[2] = gen_lowpart (SImode, operands[2]);
        operands[5] = gen_highpart (SImode, operands[1]);
        operands[1] = gen_lowpart (SImode, operands[1]);
      }
    else
      {
        emit_insn (gen_one_cmpldi2 (operands[0], operands[2]));
        emit_insn (gen_iordi3 (operands[0], operands[1], operands[0]));
        DONE;
      }
  }"
  [(set_attr "type" "neon_logic,multiple,multiple,multiple")
   (set_attr "length" "*,16,8,8")
   (set_attr "arch" "any,a,t2,t2")]
)


I think in alternative#4 we have operands[0] == operands[1]
and operands[2] != operands[0]

and then gen_one_cmpldi2 (operands[0], operands[2])
will overwrite the operand[1] before it is used in
gen_iordi3 (operands[0], operands[1], operands[0]) ??

[Bug fortran/64933] ASSOCIATE on a character variable does not allow substring expressions

2016-11-01 Thread pault at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64933

--- Comment #4 from Paul Thomas  ---
Created attachment 39941
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=39941&action=edit
Draft Patch for the PR

Bootstraps and regtests on FC21/x86_64. This testcase runs fine:

program test_this
  implicit none
  character(len = 15) :: char_var, char_var_dim (3)
  character(len = 80) :: buffer

! Original failing case reported in PR
  ASSOCIATE(should_work=>char_var)
should_work = "test succesful"
write (buffer, *) should_work(5:14)
  END ASSOCIATE

  if (trim (buffer) .ne. "  succesful") call abort

! Found to be failing during debugging
  ASSOCIATE(should_work=>char_var_dim)
should_work = ["test SUCCESFUL", "test_SUCCESFUL", "test.SUCCESFUL"]
write (buffer, *) should_work(:)(5:14)
  END ASSOCIATE

  if (trim (buffer) .ne. "  SUCCESFUL_SUCCESFUL.SUCCESFUL") call abort

! Found to be failing during debugging
  ASSOCIATE(should_work=>char_var_dim(1:2))
should_work = ["test SUCCESFUL", "test_SUCCESFUL", "test.SUCCESFUL"]
write (buffer, *) should_work(:)(5:14)
  END ASSOCIATE

  if (trim (buffer) .ne. "  SUCCESFUL_SUCCESFUL") call abort

end program

I'll submit and commit this weekend.

Paul

[Bug target/78176] [MIPS] miscompiles ldxc1 with large pointers on 32-bits

2016-11-01 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78176

--- Comment #2 from Andrew Pinski  ---
I think this code is undefined if you have wrapping pointers. No pointer should
ever be above INT_MAX in user space on mips32 due to the memory layout on
MIPS32.

[Bug c++/78177] New: adding -flto flags causes linker error "undefined reference to vtable"

2016-11-01 Thread steve.lorimer at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78177

Bug ID: 78177
   Summary: adding -flto flags causes linker error "undefined
reference to vtable"
   Product: gcc
   Version: 5.4.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: steve.lorimer at gmail dot com
  Target Milestone: ---

I have code which compiles and links fine.

I'm now trying to enable link-time optimizations, but adding -flto to my
compiler and linker flags is causing a linker error:

/usr/local/lib/libboost_thread.a(thread.o): \
In function `void
boost::throw_exception(boost::bad_lexical_cast
const&)':
   
thread.cpp:(.text._ZN5boost15throw_exceptionINS_16bad_lexical_castEEEvRKT_[_ZN5boost15throw_exceptionINS_16bad_lexical_castEEEvRKT_]+0x124):
\
undefined reference to `vtable for boost::bad_lexical_cast'

The only flag I've added is -flto.

set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} -flto" )
set(CMAKE_EXE_LINKER_FLAGS_RELEASE "${CMAKE_EXE_LINKER_FLAGS_RELEASE} -flto" )

To be clear:

Without -flto the app builds and links fine
With -flto (and no other changes), the app fails to link with the above error

I'm running version 5.4.0-6ubuntu1~16.04.2

[Bug target/78176] [MIPS] miscompiles ldxc1 with large pointers on 32-bits

2016-11-01 Thread james410 at cowgill dot org.uk
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78176

--- Comment #3 from James Cowgill  ---
As far as I can tell, all the pointers in the original C code are valid and do
not wrap. Some of the registers wrap, but they're not pointers (until added
with other registers).

[Bug target/77308] surprisingly large stack usage for sha512 on arm

2016-11-01 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308

--- Comment #41 from wilco at gcc dot gnu.org ---
(In reply to Bernd Edlinger from comment #40)
> BTW: I found something strange in this pattern in neon.md:
> 
> (define_insn_and_split "orndi3_neon"
>   [(set (match_operand:DI 0 "s_register_operand" "=w,?&r,?&r,?&r")
> (ior:DI (not:DI (match_operand:DI 2 "s_register_operand" "w,0,0,r"))
> (match_operand:DI 1 "s_register_operand" "w,r,r,0")))]
>   "TARGET_NEON"
>   "@
>    vorn\t%P0, %P1, %P2
>    #
>    #
>    #"
>   "reload_completed && 
>    (TARGET_NEON && !(IS_VFP_REGNUM (REGNO (operands[0]))))"
>   [(set (match_dup 0) (ior:SI (not:SI (match_dup 2)) (match_dup 1)))
>    (set (match_dup 3) (ior:SI (not:SI (match_dup 4)) (match_dup 5)))]
>   "
>   {
> if (TARGET_THUMB2)
>   {
> operands[3] = gen_highpart (SImode, operands[0]);
> operands[0] = gen_lowpart (SImode, operands[0]);
> operands[4] = gen_highpart (SImode, operands[2]);
> operands[2] = gen_lowpart (SImode, operands[2]);
> operands[5] = gen_highpart (SImode, operands[1]);
> operands[1] = gen_lowpart (SImode, operands[1]);
>   }
> else
>   {
> emit_insn (gen_one_cmpldi2 (operands[0], operands[2]));
> emit_insn (gen_iordi3 (operands[0], operands[1], operands[0]));
> DONE;
>   }
>   }"
>   [(set_attr "type" "neon_logic,multiple,multiple,multiple")
>    (set_attr "length" "*,16,8,8")
>    (set_attr "arch" "any,a,t2,t2")]
> )
> 
> 
> I think in alternative#4 we have operands[0] == operands[1]
> and operands[2] != operands[0]
> 
> and then gen_one_cmpldi2 (operands[0], operands[2])
> will overwrite the operand[1] before it is used in
> gen_iordi3 (operands[0], operands[1], operands[0]) ??

ARM only uses the 2nd alternative (set_attr "arch" "any,a,t2,t2"), so this is
correct. There is no need to support this pattern for ARM as ARM doesn't have
ORN, and if we expand early the whole pattern becomes redundant.

[Bug target/77308] surprisingly large stack usage for sha512 on arm

2016-11-01 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308

--- Comment #42 from wilco at gcc dot gnu.org ---
(In reply to Bernd Edlinger from comment #40)
> BTW: I found something strange in this pattern in neon.md:
> 
> (define_insn_and_split "orndi3_neon"
>   [(set (match_operand:DI 0 "s_register_operand" "=w,?&r,?&r,?&r")
> (ior:DI (not:DI (match_operand:DI 2 "s_register_operand" "w,0,0,r"))
> (match_operand:DI 1 "s_register_operand" "w,r,r,0")))]

Also it would be easy to support "&r,r,0" by doing op0 = ~(op0 = op2 & ~op0) so
there was no need to have ARM and Thumb-2 specific alternatives...

[Bug c++/78177] adding -flto flags causes linker error "undefined reference to vtable"

2016-11-01 Thread trippels at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78177

Markus Trippelsdorf  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |WAITING
   Last reconfirmed|                            |2016-11-01
                 CC|                            |trippels at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #1 from Markus Trippelsdorf  ---
Please provide a small testcase that shows the issue.

[Bug target/77308] surprisingly large stack usage for sha512 on arm

2016-11-01 Thread bernd.edlinger at hotmail dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308

--- Comment #43 from Bernd Edlinger  ---
(In reply to wilco from comment #41)
> 
> ARM only uses the 2nd alternative (set_attr "arch" "any,a,t2,t2"), so this
> is correct. There is no need to support this pattern for ARM as ARM doesn't
> have ORN, and we expand early the whole pattern becomes redundant.

Oh I see.  Thanks for clarifying that.

[Bug target/78176] [MIPS] miscompiles ldxc1 with large pointers on 32-bits

2016-11-01 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78176

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |wrong-code
             Target|                            |mips

--- Comment #4 from Andrew Pinski  ---
Then the problem is a reassociation issue. That is we should never get an
overflow happening for pointers for MIPS.

[Bug fortran/69544] [5/6/7 Regression] Internal compiler error with -Wall and where

2016-11-01 Thread tkoenig at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69544

--- Comment #5 from Thomas Koenig  ---
Author: tkoenig
Date: Tue Nov  1 16:18:18 2016
New Revision: 241745

URL: https://gcc.gnu.org/viewcvs?rev=241745&root=gcc&view=rev
Log:
2016-11-01  Thomas Koenig  

PR fortran/69544
* match.c (gfc_match_where):  Fill in locus for assigment
in simple WHERE statement.

2016-11-01  Thomas Koenig  

PR fortran/69544
* gfortran.dg/where_5.f90:  New test.


Added:
trunk/gcc/testsuite/gfortran.dg/where_5.f90
Modified:
trunk/gcc/fortran/ChangeLog
trunk/gcc/fortran/match.c
trunk/gcc/testsuite/ChangeLog

[Bug middle-end/78174] out of bounds array subscript in rtl.h NOTE_DATA macro

2016-11-01 Thread msebor at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78174

--- Comment #6 from Martin Sebor  ---
(In reply to Jakub Jelinek from comment #5)

Do you have an explanation of why it's "just incorrect?" or an example where it
results in warning on valid code?

I have found another compiler that issues a warning for the same code.  When
the test case from comment #2 is slightly modified and compiled with IBM XLC++
it produces the warning below.  A translation unit obtained from the emit-rtl.c
file causes tons of such warnings with this compiler.

$ cat u.c && xlC -O2 -xc++ u.c
struct A { int i, j; };
struct B { int i0, j0, i1, j1, i2, j2, i3, j3, i4, j4; };

struct C {
  union {
struct A a[1];
struct B b;
  } u;
};

struct D: C { };

extern "C" void* memset (void*, const void*, unsigned long);
void f (struct D *d)
{
  struct A *p = &d->u.a[3];
  memset (p, 0, sizeof *p);
}
"u.c", line 16.25: 1540-2907 (W) The subscript 3 is out of range. The valid
range is 0 to 0.

The IBM compiler also emits code that assumes there are no elements in the
array beyond the first and programs that assume otherwise tend to behave
unexpectedly.  For example, the following function only reads the value
p->u.b.k once and not in every iteration of the loop, computing a different
result than when it's compiled with GCC.

int f (struct D *p)
{
  int n = 0;

  for (int i = 0; i != p->u.b.k - 1; ++i, ++n)
p->u.a [i + 1].a = 3;

  return n;
}

Since GCC is intended to be compiled by other compilers besides itself it
should be written in portable C++ without relying on its own extensions,
especially those that are undocumented like the one that's the subject of this
bug.

In any event, since you seem to know about __builtin_object_size what the rest
of us don't it would be helpful if you could document these restrictions so we
know how to use the built-in correctly.  Alternatively, if you write them up
either here or in response to one of the other bugs I and others have opened
for it over the years I'd be happy to update the manual myself.  That said,
without a rationale for these restrictions comments like "it's just wrong,
don't do it" aren't helpful.

[Bug fortran/69544] [5/6/7 Regression] Internal compiler error with -Wall and where

2016-11-01 Thread dominiq at lps dot ens.fr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69544

--- Comment #6 from Dominique d'Humieres  ---
You did not read https://gcc.gnu.org/ml/fortran/2016-11/msg2.html!-(

[Bug target/77308] surprisingly large stack usage for sha512 on arm

2016-11-01 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308

--- Comment #44 from wilco at gcc dot gnu.org ---
(In reply to Bernd Edlinger from comment #38)
> Created attachment 39939 [details]
> proposed patch, v2
> 

> Unlike the previous patch, thumb1 stack usage stays at 1588 bytes,
> because thumb1 cannot split the adddi3 pattern, once it is emitted.

We can split into a new pattern that contains adds/adc together. Splitting
should help Thumb-1 the most as it has just 3 allocatable DI mode registers...

[Bug fortran/71796] Link error referencing compiler generated symbol __vtab_xxx

2016-11-01 Thread dominiq at lps dot ens.fr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71796

--- Comment #4 from Dominique d'Humieres  ---
> If you want a test case that exhibits no run time error upon successful
> compilation and linking, then replace the entire main program with an
> END statement.

I have been using that for years, and it was not what I asked for (sorry if I
have been unclear).

Presently the test obviously segfaults at run time.

What I'd like to have is an actual implementation of b_binding and the
expected output at run time.

[Bug middle-end/78174] out of bounds array subscript in rtl.h NOTE_DATA macro

2016-11-01 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78174

--- Comment #7 from Jakub Jelinek  ---
(In reply to Martin Sebor from comment #6)
> (In reply to Jakub Jelinek from comment #5)
> 
> Do you have an explanation of why it's "just incorrect," or an example where
> it results in a warning on valid code?

Only -D_FORTIFY_SOURCE=1, which always uses __bos (, 0), is a fully compliant
mode; -D_FORTIFY_SOURCE=2 is a mode that imposes additional restrictions (both
that %n in the *printf family can't be used in writable format strings and that
str*/stp* functions can't cross field boundaries).
Doing memset (field_address, 0, sizeof (whole_struct) - offsetof (struct,
field)); and similar is just so common that breaking it is not desirable even
in that extra mode.  The warning you want to add (which I'm still not
very happy about, because the important distinction between warning about
something that may or may not happen and warning that if you execute
this particular code path, you'll always __chk_fail () and abort the process is
lost, plus there is no separate option or warning level for "must fail",
"must invoke UB" and "may invoke UB") doesn't know whether the additional
restrictions are imposed or not, except when __builtin_*_chk builtins are used,
so IMNSHO you should use __bos (, 0) for all those other cases.
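
For illustration, the memset idiom mentioned above can be sketched in C as
follows.  This is a minimal sketch, not GCC code: the struct `record` and the
helper `clear_tail` are hypothetical names.  The length passed to memset runs
from the start of one field to the end of the enclosing object, which is the
extent that __bos (, 0) describes, rather than the size of that one field.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record: one field to keep, followed by fields to reset. */
struct record {
  int id;
  int flags;
  char name[8];
};

/* Zero everything from `flags` to the end of the struct in one call.
   The start address is the same as &r->flags; the length deliberately
   spans the following fields as well. */
void clear_tail (struct record *r)
{
  size_t off = offsetof (struct record, flags);
  memset ((char *) r + off, 0, sizeof (struct record) - off);
}
```

A check that only considers the size of `flags` itself would see this call as
writing past the field, even though it stays inside the object.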

It is true that ideally we'd use flexible array members for u.fld, except that
C++ doesn't have them, and somebody decided it is a good idea to wrap them into
further classes.  I know Coverity is unhappy about that; perhaps a way out of
this would be to just always use fld[10]; or whatever is the highest number of
RTL operands (similarly for tree_exp).  But that doesn't change the fact
that this is a very common technique used by tons of other programs and we do
not want to warn about that.
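
For comparison, here is a minimal C99 sketch of the flexible array member
alternative mentioned above.  The type `node` and the helper `node_alloc` are
made up for illustration and are not GCC's rtx or tree definitions.  The
trailing array is declared without a bound, so nothing in the type says it has
exactly one element.

```c
#include <stdlib.h>

/* Hypothetical variable-length node; `ops` is a C99 flexible array member. */
struct node {
  int n_ops;
  int ops[];
};

/* Allocate a node with room for n operands, all zero-initialized.
   Returns NULL on allocation failure. */
struct node *node_alloc (int n)
{
  struct node *p = calloc (1, sizeof *p + (size_t) n * sizeof p->ops[0]);
  if (p)
    p->n_ops = n;
  return p;
}
```

The caller owns the result and releases it with free ().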

[Bug target/77308] surprisingly large stack usage for sha512 on arm

2016-11-01 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308

--- Comment #46 from Richard Earnshaw  ---
(In reply to wilco from comment #44)
> (In reply to Bernd Edlinger from comment #38)
> > Created attachment 39939 [details]
> > proposed patch, v2
> > 
> 
> > Unlike the previous patch, thumb1 stack usage stays at 1588 bytes,
> > because thumb1 cannot split the adddi3 pattern, once it is emitted.
> 
> We can split into a new pattern that contains adds/adc together. Splitting
> should help Thumb-1 the most as it has just 3 allocatable DI mode
> registers...

Not on Thumb-1 we can't.  Because of register allocation limitations, we cannot
expose the flags until after register allocation has completed.  (The
register allocator needs to be able to insert loads, adds and copy instructions
between any two insns.  The add and copy instructions clobber the flags, making
early splitting impossible.)

[Bug target/77308] surprisingly large stack usage for sha512 on arm

2016-11-01 Thread bernd.edlinger at hotmail dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308

--- Comment #45 from Bernd Edlinger  ---
(In reply to wilco from comment #44)
> (In reply to Bernd Edlinger from comment #38)
> > Created attachment 39939 [details]
> > proposed patch, v2
> > 
> 
> > Unlike the previous patch, thumb1 stack usage stays at 1588 bytes,
> > because thumb1 cannot split the adddi3 pattern, once it is emitted.
> 
> We can split into a new pattern that contains adds/adc together. Splitting
> should help Thumb-1 the most as it has just 3 allocatable DI mode
> registers...

But we need to split the adds and the adc into two separate
patterns; then it can happen that the adc instruction's result
is unused, and that propagates to the inputs.

But since I read this comment in thumb1.md I have doubts:

;; Beware of splitting Thumb1 patterns that output multiple
;; assembly instructions, in particular instructions such as SBC and
;; ADC which consume flags.  For example, in the pattern thumb_subdi3
;; below, the output SUB implicitly sets the flags (assembled to SUBS)
;; and then the Carry flag is used by SBC to compute the correct
;; result.  If we split thumb_subdi3 pattern into two separate RTL
;; insns (using define_insn_and_split), the scheduler might place
;; other RTL insns between SUB and SBC, possibly modifying the Carry
;; flag used by SBC.  This might happen because most Thumb1 patterns
;; for flag-setting instructions do not have explicit RTL for setting
;; or clobbering the flags.  Instead, they have the attribute "conds"
;; with value "set" or "clob".  However, this attribute is not used to
;; identify dependencies and therefore the scheduler might reorder
;; these instructions.  Currently, this problem cannot happen because
;; there are no separate Thumb1 patterns for individual instructions
;; that consume flags (except conditional execution, which is treated
;; differently).  In particular there is no Thumb1 armv6-m pattern for
;; sbc or adc.


Disabling the adddi3 pattern worked with control flow instead of
passing the Carry flag from the adds to the adc pattern.
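
To make the role of the Carry flag concrete, here is a small C model of a
64-bit addition carried out on 32-bit halves.  It only illustrates the
arithmetic, not the RTL patterns, and the name `add64_split` is hypothetical.
The low-word add corresponds to ADDS, the high-word add to ADC, and the
comparison stands in for the carry that would otherwise travel through the
flags.

```c
#include <stdint.h>

/* Add two 64-bit values given as (low, high) 32-bit halves. */
uint64_t add64_split (uint32_t a_lo, uint32_t a_hi,
                      uint32_t b_lo, uint32_t b_hi)
{
  uint32_t lo = a_lo + b_lo;          /* ADDS: low word, wraps modulo 2^32 */
  uint32_t carry = lo < a_lo;         /* carry out, recovered by a compare */
  uint32_t hi = a_hi + b_hi + carry;  /* ADC: high word plus carry in */
  return ((uint64_t) hi << 32) | lo;
}
```

If anything that changes the carry is placed between the two halves, the high
word is computed from the wrong carry, which is the hazard the thumb1.md
comment describes.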

In the sha512 test case that was still profitable, but I think
that will not be the case in general.

I can live with the state of thumb1 at the moment.

I am more interested in early expansion of di patterns
in vfp / avoid_neon_for_64bits and so on.

Maybe if the user explicitly wants neon_for_64bits, so be it.

[Bug fortran/78178] New: ICE in WHERE statement with diagnostic

2016-11-01 Thread tkoenig at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78178

Bug ID: 78178
   Summary: ICE in WHERE statement with diagnostic
   Product: gcc
   Version: 7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: fortran
  Assignee: unassigned at gcc dot gnu.org
  Reporter: tkoenig at gcc dot gnu.org
  Target Milestone: ---

This is a spin-off from PR 65944.

As noted by Dominique in https://gcc.gnu.org/ml/fortran/2016-11/msg2.html ,
gfc_match_simple_where also has an issue with not setting the locus
correctly.  Test case:

! { dg-do compile }
! { dg-options "-Wcharacter-truncation" }
subroutine where_ice (i,j)

  implicit none

  character(8)  :: y(10,10,2)

  integer   :: i
  integer   :: j

  character(12) :: txt(5)
  if (.true.) where (txt(1:3) /= ''   )  y(1:3,i,j) = txt(1:3) ! { dg-warning "CHARACTER expression will be truncated" }

end subroutine where_ice

Problem is that the obvious and simple fix

Index: match.c
===
--- match.c (Revision 241745)
+++ match.c (Arbeitskopie)
@@ -6219,6 +6219,7 @@ match_simple_where (void)

   c->next = XCNEW (gfc_code);
   *c->next = new_st;
+  c->next->loc = gfc_current_locus;
   gfc_clear_new_st ();

   new_st.op = EXEC_WHERE;

leads to lots of regressions:

ig25@linux-fd1f:~/Krempel/Where> gfortran actual_array_offset_1.f90 
f951: internal compiler error: Segmentation fault
0xc2695f crash_signal
../../trunk/gcc/toplev.c:338
0x6f0ba6 resolve_select_type
../../trunk/gcc/fortran/resolve.c:8768
0x6e719c gfc_resolve_code(gfc_code*, gfc_namespace*)
../../trunk/gcc/fortran/resolve.c:11045
0x6e94a7 resolve_codes
../../trunk/gcc/fortran/resolve.c:16025
0x6e93ee resolve_codes
../../trunk/gcc/fortran/resolve.c:16010
0x6e956e gfc_resolve(gfc_namespace*)
../../trunk/gcc/fortran/resolve.c:16060
0x6d3ff2 gfc_parse_file()
../../trunk/gcc/fortran/parse.c:6092
0x717902 gfc_be_parse_file
../../trunk/gcc/fortran/f95-lang.c:198
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.

What on earth is going on there I don't know.  I think I will
build from a clean tree and retry, and in the meantime fix
PR 65944 on the other branches.

[Bug target/78118] xtensa: ICE in gcc-6.1.0/libgcc/libgcc2.c:1992:1: error: unrecognizable insn

2016-11-01 Thread jcmvbkbc at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78118

--- Comment #4 from jcmvbkbc at gcc dot gnu.org ---
Author: jcmvbkbc
Date: Tue Nov  1 17:16:33 2016
New Revision: 241748

URL: https://gcc.gnu.org/viewcvs?rev=241748&root=gcc&view=rev
Log:
xtensa: Fix PR target/78118

It started failing after the following commit: 32e90dc6a0cda45 ("PR
rtl-optimization/61047").

The change that made xtensa backend go ICE looks completely unrelated,
and indeed, the issue is caused by the side effect of
compute_frame_size() function call hidden in the
INITIAL_ELIMINATION_OFFSET macro. This call updates the value of the
xtensa_current_frame_size static variable, used in "return" instruction
predicate. Prior to the change the value of xtensa_current_frame_size was
set to 0 after the end of epilogue generation, which enabled the "return"
instruction for the CALL0 ABI, but after the change the additional
INITIAL_ELIMINATION_OFFSET calls make xtensa_current_frame_size non-zero
and "return" pattern unavailable.

Get rid of the global xtensa_current_frame_size and
xtensa_callee_save_size variables by moving them into the
machine_function structure. Implement predicate for the "return" pattern
as a function. Don't communicate completion of epilogue generation
through zeroing of xtensa_current_frame_size, add explicit epilogue_done
variable to the machine_function structure. Don't update stack frame
layout after the completion of reload.

2016-11-01  Max Filippov  
gcc/
* config/xtensa/xtensa-protos.h
(xtensa_use_return_instruction_p): New prototype.
* config/xtensa/xtensa.c (xtensa_current_frame_size,
xtensa_callee_save_size): Remove.
(struct machine_function): Add new fields: current_frame_size,
callee_save_size, frame_laid_out and epilogue_done.
(compute_frame_size, xtensa_expand_prologue,
xtensa_expand_epilogue): Replace xtensa_callee_save_size with
cfun->machine->callee_save_size and xtensa_current_frame_size
with cfun->machine->current_frame_size.
(compute_frame_size): Update cfun->machine->frame_laid_out and
don't update frame layout after reload completion.
(xtensa_expand_epilogue): Set cfun->machine->epilogue_done
instead of zeroing xtensa_current_frame_size.
(xtensa_use_return_instruction_p): New function.
* config/xtensa/xtensa.h (xtensa_current_frame_size): Remove
declaration.
(INITIAL_ELIMINATION_OFFSET): Use return value of
compute_frame_size instead of xtensa_current_frame_size value.
* config/xtensa/xtensa.md ("return" pattern): Use new predicate
function xtensa_use_return_instruction_p instead of inline code.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/xtensa/xtensa-protos.h
trunk/gcc/config/xtensa/xtensa.c
trunk/gcc/config/xtensa/xtensa.h
trunk/gcc/config/xtensa/xtensa.md
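
The restructuring described in the log above can be modeled roughly in C as
shown below.  This is a simplified sketch under stated assumptions, not the
actual xtensa.c code: the field names come from the ChangeLog, but the function
bodies, the explicit `reload_completed` parameter and the omission of the
windowed ABI are assumptions made for illustration.

```c
#include <stdbool.h>

/* Simplified stand-in for the per-function state named in the log. */
struct machine_function {
  int current_frame_size;
  int callee_save_size;
  bool frame_laid_out;
  bool epilogue_done;
};

/* Hypothetical model: lay out the frame and return its size.  Once reload
   has completed and a layout exists, return the recorded size instead of
   recomputing it. */
int compute_frame_size (struct machine_function *m, int size,
                        bool reload_completed)
{
  if (reload_completed && m->frame_laid_out)
    return m->current_frame_size;
  m->current_frame_size = size + m->callee_save_size;
  m->frame_laid_out = true;
  return m->current_frame_size;
}

/* Hypothetical predicate for the "return" pattern: usable after reload when
   the frame is empty or the epilogue has already been emitted. */
bool use_return_instruction_p (const struct machine_function *m,
                               bool reload_completed)
{
  if (!reload_completed)
    return false;
  return m->current_frame_size == 0 || m->epilogue_done;
}
```

With an explicit epilogue_done flag, a later frame-size query no longer changes
whether the "return" pattern is available.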

[Bug target/77308] surprisingly large stack usage for sha512 on arm

2016-11-01 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308

--- Comment #47 from wilco at gcc dot gnu.org ---
(In reply to Richard Earnshaw from comment #46)
> (In reply to wilco from comment #44)
> > (In reply to Bernd Edlinger from comment #38)
> > > Created attachment 39939 [details]
> > > proposed patch, v2
> > > 
> > 
> > > Unlike the previous patch, thumb1 stack usage stays at 1588 bytes,
> > > because thumb1 cannot split the adddi3 pattern, once it is emitted.
> > 
> > We can split into a new pattern that contains adds/adc together. Splitting
> > should help Thumb-1 the most as it has just 3 allocatable DI mode
> > registers...
> 
> Not on Thumb-1 we can't.  Because of register allocation limitations, we
> cannot expose the flags until after register allocation has completed. 
> (Since the register allocator needs to be able to insert loads, adds and
> copy instructions between any two insns.  The add and copy instructions
> clobber the flags, making early splitting impossible.

What I meant is splitting into a single new instruction using SI mode registers
rather than DI mode registers so that register allocation is more efficient.

[Bug target/78166] [6/7 Regression] hash.c:1887:1: error: unrecognizable insn

2016-11-01 Thread law at redhat dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78166

Jeffrey A. Law  changed:

   What|Removed |Added

   Priority|P3  |P4
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2016-11-01
 Ever confirmed|0   |1

--- Comment #5 from Jeffrey A. Law  ---
How unpleasant.  ISTM we have a few paths forward here.

1. Add back the patterns which accept non-canonical RTL
2. Hack up reload to know about the canonicalization rule and enforce it when
we reload an address
3. Handle the rewriting by using a secondary reload

#1 and #3 are PA specific fixes.  #2 means some ugly hacking in code we really
don't want to support anymore.  But #2 has the advantage that it'd address this
issue on other reload targets with shadd style insns and scaled memory
indexing.

#1 is easy and John has a patch for that.  I've prototyped #3, but don't really
like it.  I think going with #1 on the trunk and backported to gcc-6 is
probably best.
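
As a rough illustration of what a shadd style insn computes, the C sketch
below spells a scaled index plus base in two equivalent ways: a multiply by a
scale factor and a left shift.  The function names are hypothetical, and the
scale factors 2, 4 and 8 are an assumption based on PA's sh1add, sh2add and
sh3add instructions.  The canonicalization issue discussed above is about
which of the two spellings an insn pattern accepts, not about the value
computed.

```c
#include <stdint.h>

/* index * scale + base, the "mult" spelling (scale is 2, 4 or 8). */
uint32_t shadd_mult (uint32_t index, uint32_t scale, uint32_t base)
{
  return index * scale + base;
}

/* (index << shift) + base, the "shift" spelling (shift is 1, 2 or 3). */
uint32_t shadd_shift (uint32_t index, unsigned shift, uint32_t base)
{
  return (index << shift) + base;
}
```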

[Bug target/78166] [6/7 Regression] hash.c:1887:1: error: unrecognizable insn

2016-11-01 Thread dave.anglin at bell dot net
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78166

--- Comment #6 from dave.anglin at bell dot net ---
On 2016-11-01, at 1:22 PM, law at redhat dot com wrote:

> #1 is easy and John has a patch for that.  I've prototyped #3, but don't 
> really
> like it.  I think going with #1 on the trunk and backported to gcc-6 is
> probably best.

Thanks very much for the analysis.  I'll commit my change.

PS: I started on trying to implement lra but it's not going to be easy...

--
John David Anglin   dave.ang...@bell.net

[Bug rtl-optimization/14319] incorrect optimization of union of structs with common initial sequences

2016-11-01 Thread txr at alumni dot caltech.edu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=14319

--- Comment #10 from Tim Rentsch  ---
I would like to add a few comments to the discussion.

One:  C and C++ are different in how they treat unions.  My comments
here are about C.  I believe they apply to C++ as well, but I am not
as familiar with the C++ standard as the C standard, so please take
that into consideration.

Two:  The issue here is about how accesses involving struct and union
members interact with aliasing questions.  Clearly possible aliasing
is allowed IF one considers just the effective type rules (6.5 p6&7),
because the access types involved are both just 'int'.  What makes a
difference here is accesses being performed by way of the '->'
operator, and possible union membership for the objects in question.

Three:  The ISO C standard doesn't articulate clearly (at least, not
as clearly as I would like) what the aliasing implications are for
accesses involving struct and union members.  Obviously it would help
if the Standard were improved in this area.

Four:  Despite the last observation, the "one special guarantee" clause
(and hence also DR 257) is clearly not germane to this problem.  The
reason for this is that the "one special guarantee" clause is concerned
with read access ("inspect" is the word used in the Standard), but the
example code has no read accesses, only write accesses.  That paragraph
of the Standard is not relevant here.

Five:  For basically the same reason, this bug should not be considered
a duplicate of Bug 65892.  The example code in Bug 65892 _does_ involve
a read access, and the "one special guarantee" clause _is_ relevant to
that discussion.  Because of that, these two bugs should be separated
again, as their resolutions may be (and I believe probably will be)
different in the two cases.

I am expecting to post a separate comment for Bug 65892 shortly.

[Bug fortran/78178] ICE in WHERE statement with diagnostic

2016-11-01 Thread tkoenig at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78178

--- Comment #1 from Thomas Koenig  ---

> What on earth is going on there I don't know.  I think I will
> build from a clean tree and retry, and in the meantime fix

That should be PR 69544, of course.

[Bug target/78118] xtensa: ICE in gcc-6.1.0/libgcc/libgcc2.c:1992:1: error: unrecognizable insn

2016-11-01 Thread jcmvbkbc at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78118

jcmvbkbc at gcc dot gnu.org changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #5 from jcmvbkbc at gcc dot gnu.org ---
Fix committed to trunk.

[Bug target/78166] [6/7 Regression] hash.c:1887:1: error: unrecognizable insn

2016-11-01 Thread danglin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78166

--- Comment #7 from John David Anglin  ---
Author: danglin
Date: Tue Nov  1 18:15:57 2016
New Revision: 241749

URL: https://gcc.gnu.org/viewcvs?rev=241749&root=gcc&view=rev
Log:
PR target/78166
* config/pa/pa.md: Add new shift/add patterns to handle
(plus (mult (reg) (mem_shadd_operand)) (reg)) source operand.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/pa/pa.md

[Bug c/65892] gcc fails to implement N685 aliasing of union members

2016-11-01 Thread txr at alumni dot caltech.edu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65892

Tim Rentsch  changed:

   What|Removed |Added

 CC||txr at alumni dot caltech.edu

--- Comment #18 from Tim Rentsch  ---
I would like to add a few comments to the discussion.

One:  C and C++ are different in how they treat unions.  My comments
here are about C.  I believe they apply to C++ as well, but I am not
as familiar with the C++ standard as the C standard, so please take
that into consideration.

Two:  I have recently posted a comment for Bug 14319.  That comment
explains my reasoning why these two bugs should be separated and
not be considered duplicates.

Three:  I note the comments made by joseph with regard to the .s1/.s2
matter.  There may be a larger open question there, but to avoid
muddying the waters please assume that his change to use .s2 in the
initializer has been made.

Four:  I understand that there are also larger issues related to how
union membership may have a bearing on alias analysis.  My comments
here are confined to the particular case at hand, namely, given a
definition for union U followed by a definition for function f(),
could f() be optimized so the p1->m value is cached in a register
(or something similar) before the body of the if() is executed,
and the cached value used as the return value.

Five:  The answer to the question is clearly No.  The example code
is very much on point to the "one special guarantee" clause, and
so the read access p1->m is permitted.  As the access is permitted,
and as there are no other conditions present that cause undefined
or unspecified behavior, the behavior is well-defined, which means
any optimization that changes the unoptimized behavior is wrong.

Six:  To see the example code is covered under the "one special
guarantee" clause, note the second part of EXAMPLE 3 in 6.5.2.3.
In particular, the commentary in parentheses, "(because the union
type is not visible within function f)", shows that whether the
union type is defined before or after f() is the determining
factor here.  Whether a . or -> union membership operation is
present or not present has no bearing on the definedness of
the struct member access p1->m.

Seven:  I understand the objections about impacting alias analysis
and so forth.  I agree that it makes the analysis more difficult
(although not as sweeping in its implications as some comments
imply).  Despite the problems, the examples in the Standard, and
also the response to DR 257, both show that the committee members
fully intend that this case be covered under the "one special
guarantee" clause.

Eight:  In the meantime, I strongly recommend gcc be patched to
support the expected decision (which is the more conservative
choice) rather than suspending activity until some indefinite
time in the future.
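
For readers without the test case at hand, the shape of the code under
discussion can be sketched as below.  The sketch is modeled on EXAMPLE 3 in
6.5.2.3 with the union declared before f(), as described in point Four; it is
not copied from the bug report, and `f_via_union` is a hypothetical rewrite
added only for contrast.

```c
struct t1 { int m; };
struct t2 { int m; };
union U { struct t1 s1; struct t2 s2; };

/* The disputed form: both pointers may refer to members of one union U.
   Whether p1->m may be cached across the store to p2->m is the question. */
int f (struct t1 *p1, struct t2 *p2)
{
  if (p1->m < 0)
    p2->m = -p2->m;
  return p1->m;
}

/* Conservative rewrite: every access names the union, so the overlap
   between s1 and s2 is visible at each access. */
int f_via_union (union U *u)
{
  if (u->s1.m < 0)
    u->s2.m = -u->s2.m;
  return u->s1.m;
}
```

The second form stays within what the GCC manual documents for type punning
through a union, so its result does not depend on how this bug is resolved.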

[Bug target/78166] [6/7 Regression] hash.c:1887:1: error: unrecognizable insn

2016-11-01 Thread danglin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78166

--- Comment #8 from John David Anglin  ---
Author: danglin
Date: Tue Nov  1 18:17:58 2016
New Revision: 241750

URL: https://gcc.gnu.org/viewcvs?rev=241750&root=gcc&view=rev
Log:
PR target/78166
* config/pa/pa.md: Add new shift/add patterns to handle
(plus (mult (reg) (mem_shadd_operand)) (reg)) source operand.


Modified:
branches/gcc-6-branch/gcc/ChangeLog
branches/gcc-6-branch/gcc/config/pa/pa.md

[Bug target/78166] [6/7 Regression] hash.c:1887:1: error: unrecognizable insn

2016-11-01 Thread danglin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78166

John David Anglin  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #9 from John David Anglin  ---
Fixed.

[Bug libstdc++/78179] New: FAIL: 26_numerics/headers/cmath/hypot.cc execution test

2016-11-01 Thread danglin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78179

Bug ID: 78179
   Summary: FAIL: 26_numerics/headers/cmath/hypot.cc execution
test
   Product: gcc
   Version: 7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: danglin at gcc dot gnu.org
  Target Milestone: ---
  Host: hppa-unknown-linux-gnu
Target: hppa-unknown-linux-gnu
 Build: hppa-unknown-linux-gnu

spawn -ignore SIGHUP /home/dave/gnu/gcc/objdir/./gcc/xg++ -shared-libgcc
-B/home/dave/gnu/gcc/objdir/./gcc -nostdinc++
-L/home/dave/gnu/gcc/objdir/hppa-linux-gnu/libstdc++-v3/src
-L/home/dave/gnu/gcc/objdir/hppa-linux-gnu/libstdc++-v3/src/.libs
-L/home/dave/gnu/gcc/objdir/hppa-linux-gnu/libstdc++-v3/libsupc++/.libs
-B/home/dave/opt/gnu/gcc/gcc-7/hppa-linux-gnu/bin/
-B/home/dave/opt/gnu/gcc/gcc-7/hppa-linux-gnu/lib/
-isystem /home/dave/opt/gnu/gcc/gcc-7/hppa-linux-gnu/include
-isystem /home/dave/opt/gnu/gcc/gcc-7/hppa-linux-gnu/sys-include
-B/home/dave/gnu/gcc/objdir/hppa-linux-gnu/./libstdc++-v3/src/.libs
-fmessage-length=0 -fno-show-column -ffunction-sections -fdata-sections
-g -O2 -D_GNU_SOURCE -DLOCALEDIR="." -nostdinc++
-I/home/dave/gnu/gcc/objdir/hppa-linux-gnu/libstdc++-v3/include/hppa-linux-gnu
-I/home/dave/gnu/gcc/objdir/hppa-linux-gnu/libstdc++-v3/include
-I/home/dave/gnu/gcc/gcc/libstdc++-v3/libsupc++
-I/home/dave/gnu/gcc/gcc/libstdc++-v3/include/backward
-I/home/dave/gnu/gcc/gcc/libstdc++-v3/testsuite/util
/home/dave/gnu/gcc/gcc/libstdc++-v3/testsuite/26_numerics/headers/cmath/hypot.cc
-std=gnu++17 -fno-diagnostics-show-caret -fdiagnostics-color=never
./libtestc++.a -Wl,--gc-sections
-L/home/dave/gnu/gcc/objdir/hppa-linux-gnu/libstdc++-v3/src/filesystem/.libs
-lm -o ./hypot.exe
PASS: 26_numerics/headers/cmath/hypot.cc (test for excess errors)
Setting LD_LIBRARY_PATH to
:/home/dave/gnu/gcc/objdir/gcc:/home/dave/gnu/gcc/objdir/hppa-linux-gnu/./libstdc++-v3/../libgomp/.libs:/home/dave/gnu/gcc/objdir/hppa-linux-gnu/./libstdc++-v3/src/.libs::/home/dave/gnu/gcc/objdir/gcc:/home/dave/gnu/gcc/objdir/hppa-linux-gnu/./libstdc++-v3/../libgomp/.libs:/home/dave/gnu/gcc/objdir/hppa-linux-gnu/./libstdc++-v3/src/.libs:/home/dave/gnu/gcc/objdir/hppa-linux-gnu/libstdc++-v3/src/.libs:/home/dave/gnu/gcc/objdir/hppa-linux-gnu/libssp/.libs:/home/dave/gnu/gcc/objdir/hppa-linux-gnu/libgomp/.libs:/home/dave/gnu/gcc/objdir/hppa-linux-gnu/libatomic/.libs:/home/dave/gnu/gcc/objdir/./gcc:/home/dave/gnu/gcc/objdir/./prev-gcc
Execution timeout is: 300
spawn [open ...]
/home/dave/gnu/gcc/gcc/libstdc++-v3/testsuite/26_numerics/headers/cmath/hypot.cc:71:
void test(const testcase_hypot (&)[Num], Tp) [with Tp = long double;
unsigned int Num = 13u]: Assertion 'max_abs_frac < toler' failed.
FAIL: 26_numerics/headers/cmath/hypot.cc execution test

I believe this is a test error.  Long double and double are the same on
hppa-unknown-linux-gnu, but the test tolerances differ.
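
One way to keep tolerances consistent when two types share a format is to
derive them from each type's epsilon.  The C sketch below shows the idea only;
the function names and the factor 64 are hypothetical and are not the values
used in hypot.cc.

```c
#include <float.h>

/* Hypothetical rule: allow 64 units of rounding error in the given type. */
double toler_double (void)
{
  return 64 * DBL_EPSILON;
}

/* Same rule for long double; on a target where long double has the same
   format as double, LDBL_EPSILON equals DBL_EPSILON and so do the results. */
long double toler_long_double (void)
{
  return 64 * LDBL_EPSILON;
}
```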

[Bug c/65892] gcc fails to implement N685 aliasing of union members

2016-11-01 Thread joseph at codesourcery dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65892

--- Comment #19 from joseph at codesourcery dot com  ---
On Tue, 1 Nov 2016, txr at alumni dot caltech.edu wrote:

> Five:  The answer to the question is clearly No.  The example code
> is very much on point to the "one special guarantee" clause, and
> so the read access p1->m is permitted.  As the access is permitted,

I maintain that, as I said in comment#9, the textual history indicates 
that the original intent of saying things are permitted here is *only* an 
exception to the general implementation-defined nature of type punning, 
not to any other reason why things might be undefined (such as aliasing 
rules, data races, etc.).

[Bug c/65892] gcc fails to implement N685 aliasing of union members

2016-11-01 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65892

--- Comment #20 from rguenther at suse dot de  ---
On November 1, 2016 7:16:06 PM GMT+01:00, "txr at alumni dot caltech.edu"
 wrote:
>https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65892
>
>Tim Rentsch  changed:
>
>   What|Removed |Added
>
>  CC||txr at alumni dot caltech.edu
>
>--- Comment #18 from Tim Rentsch  ---
>I would like to add a few comments to the discussion.
>
>One:  C and C++ are different in how they treat unions.  My comments
>here are about C.  I believe they apply to C++ as well, but I am not
>as familiar with the C++ standard as the C standard, so please take
>that into consideration.
>
>Two:  I have recently posted a comment for Bug 14319.  That comment
>explains my reasoning why these two bugs should be separated and
>not be considered duplicates.
>
>Three:  I note the comments made by joseph with regard to the .s1/.s2
>matter.  There may be a larger open question there, but to avoid
>muddying the waters please assume that his change to use .s2 in the
>initializer has been made.
>
>Four:  I understand that there are also larger issues related to how
>union membership may have a bearing on alias analysis.  My comments
>here are confined to the particular case at hand, namely, given a
>definition for union U followed by a definition for function f(),
>could f() be optimized so the p1->m value is cached in a register
>(or something similar) before the body of the if() is executed,
>and the cached value used as the return value.
>
>Five:  The answer to the question is clearly No.  The example code
>is very much on point to the "one special guarantee" clause, and
>so the read access p1->m is permitted.  As the access is permitted,
>and as there are no other conditions present that cause undefined
>or unspecified behavior, the behavior is well-defined, which means
>any optimization that changes the unoptimized behavior is wrong.
>
>Six:  To see the example code is covered under the "one special
>guarantee" clause, note the second part of EXAMPLE 3 in 6.5.2.3.
>In particular, the commentary in parentheses, "(because the union
>type is not visible within function f)", shows that whether the
>union type is defined before or after f() is the determining
>factor here.  Whether a . or -> union membership operation is
>present or not present has no bearing on the definedness of
>the struct member access p1->m.
>
>Seven:  I understand the objections about impacting alias analysis
>and so forth.  I agree that it makes the analysis more difficult
>(although not as sweeping in its implications as some comments
>imply).  Despite the problems, the examples in the Standard, and
>also the response to DR 257, both show that the committee members
>fully intend that this case be covered under the "one special
>guarantee" clause.
>
>Eight:  In the meantime, I strongly recommend gcc be patched to
>support the expected decision (which is the more conservative
>choice) rather than suspending activity until some indefinite
>time in the future.

GCC already implements this if you specify -fno-strict-aliasing.

[Bug rtl-optimization/71785] Computed gotos are mostly optimized away

2016-11-01 Thread andres at anarazel dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71785

--- Comment #6 from Andres Freund  ---
Hi,

I can confirm this patch fixes the specific code generation issue I
complained about, leading to an overall 1.9% improvement in TPC-H
performance.  There are still some counterproductive jumps, but they're
unrelated to computed goto.

Thanks,

Andres

[Bug lto/78140] [7 Regression] libxul -flto uses 1GB more memory than gcc-6

2016-11-01 Thread trippels at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78140

--- Comment #7 from Markus Trippelsdorf  ---
BTW Firefox trunk fails to build for me:

ld: error: /tmp/ccsbLieS.ltrans29.ltrans.o: requires dynamic R_X86_64_PC32
reloc against '_ZN2js3jitL2R0E' which may overflow at runtime; recompile with
-fPIC
ld: error: read-only segment has dynamic relocations
/tmp/ccsbLieS.ltrans29.ltrans.o::function
js::jit::BaselineCompiler::emitCheckThis(js::jit::ValueOperand) [clone
.constprop.20226]: error: undefined reference to 'js::jit::R0'

Haven't looked into it yet. Could well be a Firefox bug.

[Bug fortran/78152] [6/7 Regression] coarray and associate

2016-11-01 Thread pault at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78152

Paul Thomas  changed:

   What|Removed |Added

 CC||pault at gcc dot gnu.org

--- Comment #4 from Paul Thomas  ---
(In reply to Steve Kargl from comment #3)
> I believe the posted code is neither conforming nor nonconforming.
> It has found a defect in the Fortran 2008 standard.  The code can
> be written as
> 
> subroutine co_assoc
>   implicit none
>   integer, parameter :: p = 5
>   real, allocatable :: a(:,:)[:,:]
>   allocate (a(p,p)[2,*])
>   associate (i => a(1:p, 1:p))
>   end associate
> end subroutine co_assoc
> 
> where I changed 'program' to 'subroutine'.  gfortran reports
> 
> % gfc -c -fcoarray=single a.f90
> a.f90:7:29:
> 
>associate (i => a(1:p, 1:p))
>  1
> Error: Variable 'i' at (1) is a coarray and is not ALLOCATABLE,
> SAVE nor a dummy argument
> 
> Now, looking at the standard one finds
> 
>8.1.3.3 Attributes of associate names
>...
>The cobounds of each codimension of the associating entity are
>the same as those of the selector.
> 
> If one assumes that an entity with cobounds and codimension is a
> coarray, then 'i' is a coarray.  But, by F2008:C526, one has
> 
>C526 A coarray or an object with a coarray ultimate component
> shall be a dummy argument or have the ALLOCATABLE or SAVE
> attribute.
> 
> 'i' is clearly not a dummy argument.  From 16.5.1.6
> 
>If the selector is allocatable, it shall be allocated; the
>associate name is associated with the data object and does
>not have the ALLOCATABLE attribute.
> 
> 'i' does not have the ALLOCATABLE attribute.  By inspection, one
> can see that 'i' also does not have the SAVE attribute.
> 
> In thinking about it, 8.1.3.3 shows that the standard intends
> to permit a coarray in an ASSOCIATE statement, but C526 imposes
> very limited conditions. 
> 
> Someone needs to submit an interpretation request to J3.

Hi Steve,

If you bypass the error for associate names, does such code run correctly?

Also, what do the other brands do? I am away from base at present but, as soon
as I am home, I will try the testcase with ifort.

Cheers

Paul

[Bug fortran/78152] [6/7 Regression] coarray and associate

2016-11-01 Thread sgk at troutmask dot apl.washington.edu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78152

--- Comment #5 from Steve Kargl  ---
On Tue, Nov 01, 2016 at 08:32:46PM +, pault at gcc dot gnu.org wrote:
> > 
> > Someone needs to submit an interpretation request to J3.
> 
> 
> If you bypass the error for associate names, does such code run correctly?

The code compiles with the rather trivial patch (watch for
cut-n-paste whitespace corruption).

Index: gcc/fortran/resolve.c
===
--- gcc/fortran/resolve.c   (revision 241667)
+++ gcc/fortran/resolve.c   (working copy)
@@ -14630,7 +14630,9 @@ resolve_symbol (gfc_symbol *sym)
   || (sym->ns->save_all && !sym->attr.automatic)
   || sym->ns->proc_name->attr.flavor == FL_MODULE
   || sym->ns->proc_name->attr.is_main_program
-  || sym->attr.function || sym->attr.result || sym->attr.use_assoc))
+  || sym->attr.function || sym->attr.result
+  || sym->attr.use_assoc
+  || sym->attr.associate_var))
 {
   gfc_error ("Variable %qs at %L is a coarray and is not ALLOCATABLE, SAVE "
 "nor a dummy argument", sym->name, &sym->declared_at);

The original testcase and my altered code are rather trivial
and don't do anything within the ASSOCIATE construct.  I
suspect that it will get optimized out.

> Also, what do the other brands do?

Fortunately, I use FreeBSD as my operating system, which
unfortunately limits me to gfortran.  I posted to c.l.f, 
but haven't got much feedback.

[Bug c++/78180] New: Poor optimization of std::array on gcc 4.8/5.4/6.2 as compared to simple raw array

2016-11-01 Thread barry.revzin at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78180

Bug ID: 78180
   Summary: Poor optimization of std::array on gcc 4.8/5.4/6.2 as
compared to simple raw array
   Product: gcc
   Version: 6.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: barry.revzin at gmail dot com
  Target Milestone: ---

Here is a complete benchmark comparing a bunch of simple operations on a
std::array<int64_t, 128> vs. an int64_t[128]. I'm using
https://github.com/google/benchmark and compiling with -std=c++11 -O3
-D_GLIBCXX_USE_CXX11_ABI=0:

=
#include <benchmark/benchmark.h>
#include <array>

template <typename C>
class Rolling
{
    C times_{};
    uint32_t idx_;
    const uint32_t size_;

public:
    Rolling(uint32_t size)
    : idx_(0)
    , size_(size)
    { }

    void add(int64_t t)
    {
        times_[idx_] = t;
        ++idx_;
        if (idx_ == size_) {
            idx_ = 0;
        }
    }

    bool exceeded(int64_t now, int64_t intv)
    {
        return now - times_[idx_] < intv;
    }
};

template <typename C>
void BM_Rolling(benchmark::State& state)
{
    Rolling<C> r(100);
    int64_t i = 0;
    int64_t exc = 0;

    while (state.KeepRunning()) {
        for (int i = 0; i < state.range(0); ++i) {
            r.add(i);
            if (r.exceeded(i, 100)) {
                benchmark::DoNotOptimize(++exc);
            }
        }
    }
}

#define JOIN(...) __VA_ARGS__
BENCHMARK_TEMPLATE(BM_Rolling, int64_t[128])->Range(8, 8<<10);
BENCHMARK_TEMPLATE(BM_Rolling, JOIN(std::array<int64_t, 128>))->Range(8, 8<<10);

BENCHMARK_MAIN();
=

This yields the following performance numbers (similar across 4.8.2, 5.4.0, and
6.2.0):

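Run on (16 X 3199.66 MHz CPU s)
2016-11-01 15:56:13
Benchmark                                             Time         CPU  Iterations
----------------------------------------------------------------------------------
BM_Rolling<JOIN(std::array<int64_t, 128>)>/8         18 ns       18 ns    39568747
BM_Rolling<JOIN(std::array<int64_t, 128>)>/64       135 ns      134 ns     5218330
BM_Rolling<JOIN(std::array<int64_t, 128>)>/512     1084 ns     1031 ns      678795
BM_Rolling<JOIN(std::array<int64_t, 128>)>/4k      8221 ns     8185 ns       85583
BM_Rolling<JOIN(std::array<int64_t, 128>)>/8k     16975 ns    16520 ns       42752
BM_Rolling<int64_t[128]>/8                           15 ns       15 ns    45940368
BM_Rolling<int64_t[128]>/64                         112 ns      111 ns     6301196
BM_Rolling<int64_t[128]>/512                        821 ns      817 ns      858168
BM_Rolling<int64_t[128]>/4k                        6538 ns     6496 ns      108570
BM_Rolling<int64_t[128]>/8k                       12957 ns    12902 ns       53582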

That is a large performance gap between std::array and raw array, where I
wouldn't expect any. When compiling with clang, I don't see any gap at all
(though for both containers, the performance is significantly worse than
gcc's).
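
For anyone who wants to reproduce the comparison without the Google Benchmark
dependency, a minimal standalone sketch follows. It is not part of the original
report: run_rolling, Result and the volatile sink (a crude stand-in for
benchmark::DoNotOptimize) are hypothetical names introduced for illustration,
and its timings will not be directly comparable to the numbers above.

```cpp
#include <array>
#include <chrono>
#include <cstdint>

// Same rolling window as in the report; C is the backing container type.
template <typename C>
class Rolling {
    C times_{};                 // value-initialized, so all slots start at 0
    std::uint32_t idx_ = 0;
    const std::uint32_t size_;

public:
    explicit Rolling(std::uint32_t size) : size_(size) {}

    void add(std::int64_t t) {
        times_[idx_] = t;
        if (++idx_ == size_) idx_ = 0;
    }

    bool exceeded(std::int64_t now, std::int64_t intv) const {
        return now - times_[idx_] < intv;
    }
};

// Hypothetical result record: hit count plus average nanoseconds per repeat.
struct Result {
    std::int64_t exceeded;
    double ns;
};

// Runs the inner loop of the benchmark `repeats` times over `n` elements.
template <typename C>
Result run_rolling(int n, int repeats) {
    Rolling<C> r(100);
    volatile std::int64_t sink = 0;  // keeps the counter from being optimized away
    std::int64_t exc = 0;

    auto start = std::chrono::steady_clock::now();
    for (int rep = 0; rep < repeats; ++rep) {
        for (int i = 0; i < n; ++i) {
            r.add(i);
            if (r.exceeded(i, 100)) {
                ++exc;
                sink = exc;
            }
        }
    }
    auto stop = std::chrono::steady_clock::now();

    double total_ns =
        std::chrono::duration<double, std::nano>(stop - start).count();
    return {exc, total_ns / repeats};
}
```

Instantiating run_rolling with std::int64_t[128] and with
std::array<std::int64_t, 128> should give the same exceeded count for the same
arguments; only the ns field would be expected to differ, if at all.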

[Bug fortran/69544] [5/6/7 Regression] Internal compiler error with -Wall and where

2016-11-01 Thread tkoenig at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69544

--- Comment #7 from Thomas Koenig  ---
Author: tkoenig
Date: Tue Nov  1 21:16:46 2016
New Revision: 241756

URL: https://gcc.gnu.org/viewcvs?rev=241756&root=gcc&view=rev
Log:
2016-11-01  Thomas Koenig  

PR fortran/78178
* match.c (match_simple_where):  Fill in locus for assignment
in simple WHERE statement.

2016-11-01  Thomas Koenig  

PR fortran/69544
* gfortran.dg/where_6.f90:  New test.


Added:
trunk/gcc/testsuite/gfortran.dg/where_6.f90
Modified:
trunk/gcc/fortran/ChangeLog
trunk/gcc/fortran/match.c
trunk/gcc/testsuite/ChangeLog

[Bug fortran/78178] ICE in WHERE statement with diagnostic

2016-11-01 Thread tkoenig at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78178

--- Comment #2 from Thomas Koenig  ---
Author: tkoenig
Date: Tue Nov  1 21:16:46 2016
New Revision: 241756

URL: https://gcc.gnu.org/viewcvs?rev=241756&root=gcc&view=rev
Log:
2016-11-01  Thomas Koenig  

PR fortran/78178
* match.c (match_simple_where):  Fill in locus for assignment
in simple WHERE statement.

2016-11-01  Thomas Koenig  

PR fortran/69544
* gfortran.dg/where_6.f90:  New test.


Added:
trunk/gcc/testsuite/gfortran.dg/where_6.f90
Modified:
trunk/gcc/fortran/ChangeLog
trunk/gcc/fortran/match.c
trunk/gcc/testsuite/ChangeLog

[Bug c++/78180] Poor optimization of std::array on gcc 4.8/5.4/6.2 as compared to simple raw array

2016-11-01 Thread barry.revzin at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78180

--- Comment #1 from Barry Revzin  ---
Upon further investigation, all of the difference in the benchmark comes from
the initialization of the raw array versus std::array. For some reason, in the
two examples, the two containers are initialized differently - and this
propagates down to using different registers throughout the rest of the
benchmark.
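
A minimal sketch of what is being initialized, assuming the initialization in
question is the `C times_{}` member of the Rolling class from the report.
Holder and all_zero are hypothetical names used only for illustration. At the
language level both containers end up zero-filled, so any difference would be
in the generated code, not in the values.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical wrapper mirroring the `C times_{}` member from the report.
template <typename C>
struct Holder {
    C times_{};  // value-initialization: every element becomes 0
};

// Returns true if all 128 elements of a freshly constructed Holder are zero.
template <typename C>
bool all_zero() {
    Holder<C> h;
    for (std::size_t i = 0; i < 128; ++i) {
        if (h.times_[i] != 0) return false;
    }
    return true;
}
```

Both all_zero<std::int64_t[128]>() and
all_zero<std::array<std::int64_t, 128>>() should return true.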

[Bug libstdc++/69332] [6/7 Regression] FAIL: libstdc++-prettyprinters/libfundts.cc print ab

2016-11-01 Thread danglin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69332

John David Anglin  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #6 from John David Anglin  ---
Duplicate.

*** This bug has been marked as a duplicate of bug 68735 ***

[Bug libstdc++/68735] FAIL: libstdc++-prettyprinters/libfundts.cc print ab

2016-11-01 Thread danglin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68735

--- Comment #1 from John David Anglin  ---
*** Bug 69332 has been marked as a duplicate of this bug. ***

[Bug fortran/78152] [6/7 Regression] coarray and associate

2016-11-01 Thread jvdelisle at charter dot net
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78152

--- Comment #6 from jvdelisle at charter dot net ---
On 11/01/2016 01:45 PM, sgk at troutmask dot apl.washington.edu wrote:
--- snip ---

> Fortunately, I use FreeBSD as my operating system, which
> unfortunately limits me to gfortran.  I posted to c.l.f,
> but haven't got much feedback.
>

Hi Steve,

I am in the process of building OpenCoarrays to give it a spin.

Maybe we can think of a reasonable test case to exercise the feature that is
not well defined yet and, from a practical point of view, see what makes sense.

Also, off topic. (pardon the long wind here)

I had a meeting with Damian Rouson today and we briefly talked about
OpenCoarrays.  He has mentioned on the gfortran list about enabling it into the
build of gfortran if a user chooses to do so. So I thought I would take a look.

Currently OpenCoarrays is built independently of gfortran and, in fact, if you
don't have the right version of gfortran available, it will build gcc/gfortran
as well.

I am running his install script now, and one issue is that it builds gcc itself
in single thread mode.  I want to modify this so that if it detects multiple
cores available, it can ask the user if they want to use these during build of
gcc.

On Linux I can cat /proc/cpuinfo to get at the information about the machine.

Is there an equivalent FreeBSD way to query number of cores available?

Jerry

[Bug fortran/78152] [6/7 Regression] coarray and associate

2016-11-01 Thread sgk at troutmask dot apl.washington.edu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78152

--- Comment #7 from Steve Kargl  ---
On Tue, Nov 01, 2016 at 11:06:30PM +, jvdelisle at charter dot net wrote:
> 
> I had a meeting with Damian Rouson today and we briefly talked about 
> OpenCoarrays.  He has mentioned on the gfortran list about enabling
> it into the build of gfortran if a user chooses to do so.  So I
> thought I would take a look.

Damian has sent a patch to start the process of incorporating
OpenCoarrays into GCC.  But, I believe that this needs a 
build maintainer or global maintainer to approve it.

> I am running his install script now, and one issue is that
> it builds gcc itself in single thread mode.  I want to modify
> this so that if it detects multiple cores available, it can ask
> the user if they want to use these during build of gcc.
> 
> On Linux I can cat /proc/cpuinfo to get at the information
> about the machine.
> 
> Is there an equivalent FreeBSD way to query number of cores available?
> 

The sysctl command may be of help.  On troutmask, I have

% sysctl hw.ncpu
hw.ncpu: 8
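
If the count is ever needed from a program and not from a shell script, a
minimal C++ sketch follows. It is an illustration only, not something from the
install script: available_cores is a hypothetical name, and it assumes a
Unix-like system where sysconf accepts _SC_NPROCESSORS_ONLN (available on
Linux and FreeBSD, though not strictly required by POSIX).

```cpp
#include <thread>
#include <unistd.h>

// Returns the number of online processors, or 1 if it cannot be determined.
unsigned available_cores() {
    // Assumption: _SC_NPROCESSORS_ONLN is provided by the platform's unistd.h.
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    if (n > 0) return static_cast<unsigned>(n);

    // Fallback: the standard library's hint, which is allowed to be 0.
    unsigned hint = std::thread::hardware_concurrency();
    return hint > 0 ? hint : 1;
}
```

On a machine like the one above, this would be expected to agree with what
sysctl hw.ncpu reports, but that has not been checked here.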

[Bug c++/69288] [concepts] Subsumption failure with constrained member functions of class template

2016-11-01 Thread Casey at Carter dot net
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69288

Casey Carter  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 CC||Casey at Carter dot net
 Resolution|--- |FIXED

--- Comment #1 from Casey Carter  ---
This no longer reproduces with 6.2 or trunk.

[Bug middle-end/78174] out of bounds array subscript in rtl.h NOTE_DATA macro

2016-11-01 Thread msebor at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78174

--- Comment #8 from Martin Sebor  ---
Okay, thanks.  Your comments seem to be focused on my patch and not so much on
this problem that was exposed by it.  I realize I invited those comments with
my response and do want to continue to have that discussion, but I want to have
it separately from this issue.  Here I'd like to get a confirmation of just the
NOTE_DATA problem so I can prepare a patch for it.

[Bug target/78168] [7 Regression] Second ICE in maybe_record_trace_start, at dwarf2cfi.c:2285

2016-11-01 Thread sebastian.hu...@embedded-brains.de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78168

--- Comment #11 from Sebastian Huber  ---
(In reply to Segher Boessenkool from comment #10)
> Doesn't fail with powerpc-rtems4.12 either.  Are you sure you built trunk?
> A clean build?

I tested again today using:

commit 89bcfdabe78607bf83aa58e3d2696a2c71e719e5
Author: tbsaunde 
Date:   Wed Nov 2 03:46:17 2016 +

remove cast from prev_nonnote_insn_bb

gcc/ChangeLog:

2016-11-01  Trevor Saunders  

* emit-rtl.c (prev_nonnote_insn_bb): Change argument type to
rtx_insn *.
* rtl.h (prev_nonnote_insn_bb): Adjust prototype.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@241773
138bc75d-0d04-0410-961f-82ee72b054a4

I still get the ICE. The following flags seem to be essential (e.g. no ICE with
-mno-spe):

-O2 -mcpu=8540 -mspe -mabi=spe -g

[Bug fortran/70696] [6.0] ICE on EVENT POST of host-associated EVENT_TYPE coarray

2016-11-01 Thread damian at sourceryinstitute dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70696

--- Comment #5 from Damian Rouson  ---
Hmm... I wouldn't be surprised if the root cause is the same or closely
related, but given that lock_type and event_type are not the same, I don't see
one as the duplicate of the other.