Re: [PATCH][GCC][front-end][build-machinery][opt-framework] Allow setting of stack-clash via configure options. [Patch (4/6)]

2018-07-26 Thread Alexandre Oliva
On Jul 25, 2018, Tamar Christina  wrote:

> gcc/
> 2018-07-25  Tamar Christina  

>   PR target/86486
>   * configure.ac: Add stack-clash-protection-guard-size.
>   * doc/install.texi: Document it.
>   * config.in (DEFAULT_STK_CLASH_GUARD_SIZE): New.
>   * params.def: Update comment for guard-size.
>   (PARAM_STACK_CLASH_PROTECTION_GUARD_SIZE,
>   PARAM_STACK_CLASH_PROTECTION_PROBE_INTERVAL): Update description.
>   * configure: Regenerate.

Thanks.  No objections from me.  I don't see any use of the new config
knob, though; assuming it's in a subsequent patch, I guess this one is
fine, but I'm not sure I'm entitled to approve it.
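The knob is not consumed by this patch. Here is a rough C sketch of how a configure-provided default might be picked up. It rests on two assumptions. First, the guard-size parameter is the log2 of the guard size in bytes. Second, DEFAULT_STK_CLASH_GUARD_SIZE stays 0 when the configure option is not given. HYPOTHETICAL_TARGET_GUARD_LOG2 and both helpers are invented for illustration and are not GCC code.

```c
#include <stdint.h>

/* Assumption: configure leaves this at 0 when the option is not given.  */
#ifndef DEFAULT_STK_CLASH_GUARD_SIZE
#define DEFAULT_STK_CLASH_GUARD_SIZE 0
#endif

/* Hypothetical target fallback: a 4 KiB guard, i.e. 2^12 bytes.  */
#define HYPOTHETICAL_TARGET_GUARD_LOG2 12

/* Return the default guard-size parameter (log2 of the size in bytes),
   preferring the configure-time value when one was provided.  */
static int default_guard_size_param (void)
{
  return DEFAULT_STK_CLASH_GUARD_SIZE != 0
         ? DEFAULT_STK_CLASH_GUARD_SIZE
         : HYPOTHETICAL_TARGET_GUARD_LOG2;
}

/* Convert the power-of-two parameter into a byte count.  */
static uint64_t guard_size_bytes (int log2_size)
{
  return (uint64_t) 1 << log2_size;
}
```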

-- 
Alexandre Oliva, freedom fighter   https://FSFLA.org/blogs/lxo
Be the change, be Free! FSF Latin America board member
GNU Toolchain Engineer    Free Software Evangelist


Re: [RFC 2/3, debug] Add fkeep-vars-live

2018-07-26 Thread Alexandre Oliva
On Jul 25, 2018, Jakub Jelinek  wrote:

> On Tue, Jul 24, 2018 at 04:11:11PM -0300, Alexandre Oliva wrote:
>> On Jul 24, 2018, Tom de Vries  wrote:
>> 
>> > This patch adds fake uses of user variables at the point where they go out 
>> > of
>> > scope, to keep user variables inspectable throughout the application.
>> 
>> I suggest also adding such uses before sets, so that variables aren't
>> regarded as dead and get optimized out in ranges between the end of a
>> live range and a subsequent assignment.

> But that can be done incrementally, right?

Sure

> the optimizers could still move it apart

*nod*
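This rough C illustration shows the two placements discussed. An empty asm statement stands in for a compiler-inserted fake use. FAKE_USE and scale_and_offset are hypothetical; the RFC inserts its uses inside the compiler, not via asm. The first use sits before a reassignment, as suggested above, and the second sits at the end of the variable's scope, as in the patch.

```c
/* Hypothetical stand-in for a compiler-inserted fake use: an empty asm
   that claims to read V, so V's current value must still be available
   (and thus inspectable in a debugger) at this point.  */
#define FAKE_USE(v) __asm__ volatile ("" : : "g" (v))

int scale_and_offset (int n)
{
  int x = n * 2;   /* first value of x */
  int y = x + 1;   /* last real use of that value */
  FAKE_USE (x);    /* use before the next set: without it, x could be
                      reported as optimized out until the reassignment */
  x = y * 3;       /* new value of x */
  FAKE_USE (x);    /* use at the end of x's scope, as in the RFC */
  return y;
}
```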



Re: [RFC 1/3, debug] Add fdebug-nops

2018-07-26 Thread Alexandre Oliva
On Jul 24, 2018, Tom de Vries  wrote:

>> I thought of a way to not break it: enable the debug info generation
>> machinery, including VTA and SFN, but discard those only at the very end
>> if -g is not enabled.  The downside is that it would likely slow -Og
>> down significantly, but who uses it without -g anyway?

> I thought of the same.  I've submitted a patch here that uses SFN:
> https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01391.html .

Nice!

> VTA is not needed AFAIU.

Yes, indeed.  It could avoid inserting some nops, if you were to refrain
from emitting them if there aren't any binds between neighbor SFNs, but
I like it better your way: it's even more like SFN support in the
debugger :-)
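Alexandre suggests skipping nops when no binds occur between neighboring SFNs. Here is one reading of that idea, modeled as a counting pass over a marker stream. This is a toy under our own assumptions. The marker enum, the rule that the first SFN always gets a nop, and both functions are hypothetical. The model says nothing about how GCC actually represents SFNs or binds. The one-nop-per-SFN count is only an assumed summary of the patch's approach.

```c
#include <stddef.h>

/* Toy model only: a statement stream reduced to markers.  */
enum marker { MARK_SFN, MARK_BIND, MARK_INSN };

/* Emit a nop at an SFN only if at least one bind appeared since the
   previous SFN; the first SFN always gets one.  */
static size_t count_nops_skipping_bindless (const enum marker *m, size_t n)
{
  size_t nops = 0;
  int seen_sfn = 0;
  int bind_since_sfn = 0;

  for (size_t i = 0; i < n; i++)
    {
      if (m[i] == MARK_BIND)
        bind_since_sfn = 1;
      else if (m[i] == MARK_SFN)
        {
          if (!seen_sfn || bind_since_sfn)
            nops++;
          seen_sfn = 1;
          bind_since_sfn = 0;
        }
    }
  return nops;
}

/* Assumed baseline: one nop per SFN, regardless of binds.  */
static size_t count_nops_every_sfn (const enum marker *m, size_t n)
{
  size_t nops = 0;
  for (size_t i = 0; i < n; i++)
    if (m[i] == MARK_SFN)
      nops++;
  return nops;
}
```

In the toy model, fewer nops come at the cost of less SFN-like behavior, which is the trade-off weighed in the reply above.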



[committed, libgomp, openacc, testsuite] Fix async/wait logic in lib-13.f90

2018-07-26 Thread Tom de Vries
Hi,

the purpose of the lib-13.f90 test-case is to test acc_wait_all_async.  The
test indeed calls acc_wait_all_async, but then subsequently calls
acc_wait_all, so the acc_wait_all_async functionality is not tested.
Furthermore, all acc_async_test calls are placed in a location where they are
not guaranteed to succeed, which explains why there's an xfail for the lower
optimization levels.

This patch fixes the problems by replacing acc_wait_all with an acc_wait on
the async id used for the acc_wait_all_async call, and moving the
acc_async_test calls to the correct locations.
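The ordering problem can be illustrated with a toy C model of async queues. This sketch is not libgomp's implementation. toy_async_test, toy_wait_all_async, toy_wait_async and toy_wait are hypothetical analogues of acc_async_test, acc_wait_all_async, acc_wait_async and acc_wait. In the model, a test issued before any wait may report unfinished work, which mirrors why the old placement was not guaranteed to succeed. Once the host waits on the join queue, every queue it depends on is complete. The same model covers the acc_wait_async (0, 1) pattern in lib-12.f90.

```c
#include <stdbool.h>

#define NQUEUES 8

/* pending[q] counts unfinished operations on queue q; waits_on[q][p]
   records that queue q cannot finish until queue p has.  */
struct toy_queues
{
  int pending[NQUEUES];
  bool waits_on[NQUEUES][NQUEUES];
};

/* acc_async_test analogue: true only if queue Q has no unfinished work.  */
static bool toy_async_test (const struct toy_queues *t, int q)
{
  return t->pending[q] == 0;
}

/* acc_wait_all_async analogue: queue Q joins every other queue.
   Nothing completes and the host does not block.  */
static void toy_wait_all_async (struct toy_queues *t, int q)
{
  for (int p = 0; p < NQUEUES; p++)
    if (p != q)
      t->waits_on[q][p] = true;
  t->pending[q]++;   /* the join itself is pending work on Q */
}

/* acc_wait_async analogue: queue DST waits for queue SRC.  */
static void toy_wait_async (struct toy_queues *t, int src, int dst)
{
  t->waits_on[dst][src] = true;
  t->pending[dst]++;
}

/* acc_wait analogue: block until queue Q and everything it waits on
   have completed.  */
static void toy_wait (struct toy_queues *t, int q)
{
  for (int p = 0; p < NQUEUES; p++)
    if (t->waits_on[q][p])
      {
        t->waits_on[q][p] = false;
        toy_wait (t, p);
      }
  t->pending[q] = 0;
}
```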

Reg-tested on x86_64 with nvptx accelerator.

Committed to trunk.

Thanks,
- Tom

[libgomp, openacc, testsuite] Fix async/wait logic in lib-13.f90

2018-07-26  Tom de Vries  

* testsuite/libgomp.oacc-fortran/lib-13.f90: Replace acc_wait_all with
acc_wait.  Move acc_async_test calls to correct locations.  Remove
xfail.

---
 libgomp/testsuite/libgomp.oacc-fortran/lib-13.f90 | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/libgomp/testsuite/libgomp.oacc-fortran/lib-13.f90 b/libgomp/testsuite/libgomp.oacc-fortran/lib-13.f90
index 6d713b1cd95..da944c35de9 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/lib-13.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/lib-13.f90
@@ -1,5 +1,4 @@
 ! { dg-do run }
-! { dg-xfail-run-if "TODO" { openacc_nvidia_accel_selected } { "-O0" "-O1" } { "" } }
 
 program main
   use openacc
@@ -22,13 +21,12 @@ program main
 end do
   !$acc end data
 
-  if (acc_async_test (1) .neqv. .TRUE.) call abort
-  if (acc_async_test (2) .neqv. .TRUE.) call abort
-
   call acc_wait_all_async (nprocs + 1)
 
-  if (acc_async_test (nprocs + 1) .neqv. .TRUE.) call abort
+  call acc_wait (nprocs + 1)
 
-  call acc_wait_all ()
+  if (acc_async_test (1) .neqv. .TRUE.) call abort
+  if (acc_async_test (2) .neqv. .TRUE.) call abort
+  if (acc_async_test (nprocs + 1) .neqv. .TRUE.) call abort
 
 end program


[committed, libgomp, openacc, testsuite] Fix async logic in lib-12.f90

2018-07-26 Thread Tom de Vries
Hi,

in testcase lib-12.f90, all acc_async_test calls are placed in a location
where they are not guaranteed to succeed, which explains why there's an xfail
for the lower optimization levels.

This patch fixes the problem by moving the acc_async_test calls to the correct
locations.

Reg-tested on x86_64 with nvptx accelerator.

Committed to trunk.

Thanks,
- Tom

[libgomp, openacc, testsuite] Fix async logic in lib-12.f90

2018-07-26  Tom de Vries  

* testsuite/libgomp.oacc-fortran/lib-12.f90: Move acc_async_test calls
to correct locations.  Remove xfail.

---
 libgomp/testsuite/libgomp.oacc-fortran/lib-12.f90 | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/libgomp/testsuite/libgomp.oacc-fortran/lib-12.f90 b/libgomp/testsuite/libgomp.oacc-fortran/lib-12.f90
index e307dfde374..6912f67d444 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/lib-12.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/lib-12.f90
@@ -1,5 +1,4 @@
 ! { dg-do run }
-! { dg-xfail-run-if "TODO" { openacc_nvidia_accel_selected } { "-O0" "-O1" } { "" } }
 
 program main
   use openacc
@@ -18,10 +17,9 @@ program main
 
   call acc_wait_async (0, 1)
 
-  if (acc_async_test (0) .neqv. .TRUE.) call abort
+  call acc_wait (1)
 
+  if (acc_async_test (0) .neqv. .TRUE.) call abort
   if (acc_async_test (1) .neqv. .TRUE.) call abort
 
-  call acc_wait (1)
-
 end program


Re: [PATCH][Middle-end] disable strcmp/strncmp inlining with O2 below and Os

2018-07-26 Thread Richard Biener
On Wed, 25 Jul 2018, Qing Zhao wrote:

> Hi,
> 
> As Wilco suggested, the newly added strcmp/strncmp inlining should only be
> enabled with O2 and above.
> 
> This is the simple patch for this change.
> 
> Tested on both x86 and aarch64.
> 
> Okay for trunk?

You should simply use

  if (optimize_insn_for_size_p ())
    return NULL_RTX;

to be properly profile-aware.  OK with that change.

Richard.

> Qing
> 
> gcc/ChangeLog:
> 
> +2018-07-25  Qing Zhao  
> +
> +   * builtins.c (inline_expand_builtin_string_cmp): Disable inlining
> +   when the optimization level is lower than 2 or when optimizing for size.
> +   
> 
> gcc/testsuite/ChangeLog:
> 
> +2018-07-25  Qing Zhao  
> +
> +   * gcc.dg/strcmpopt_5.c: Change to O2 to enable the transformation.
> +   * gcc.dg/strcmpopt_6.c: Likewise.
> +
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)


Re: [PATCH][GCC][Arm] Fix subreg crash in different way by enabling the FP16 pattern unconditionally.

2018-07-26 Thread Thomas Preudhomme
Hi Tamar,

On Wed, 25 Jul 2018 at 16:28, Tamar Christina  wrote:
>
> Hi Thomas,
>
> Thanks for the review!
>
> > >
> > > I don't believe the TARGET_FP16 guard to be needed, because the
> > > pattern doesn't actually generate code and requires another pattern
> > > for that, and a reg to reg move should always be possible anyway. So
> > > allowing the force to register here is safe and it allows the compiler
> > > to generate a correct error instead of ICEing in an infinite loop.
> >
> > How about subreg to subreg move? Doesn't that expand to more insns
> > (subreg to reg and reg to subreg)? Couldn't you improve the logic to check
> > that there is actually a mode change so that if there isn't (like moving from
> > one subreg to another) just expand to a single move?
> >
>
> Yes, but that is not a new issue. My patch is simply removing the TARGET_FP16 restrictions and
> merging two patterns that should be one using an iterator and nothing more.
>
> The redundant mov is already there and a different issue than the ICE I'm trying to fix.

It's there for movv4hf and movv6hf but your patch extends this problem
to movv2sf and movv4sf as well.

>
> None of the code inside the expander is needed at all, the code really only has an effect on subreg
> to subreg moves, as `force_reg` doesn't do anything when its argument is already a reg.
>
> The comment in the expander (which was already there) is wrong. The *reason* the ICE is fixed isn't
> because of the `force_reg`. It's because of the mere presence of the expander itself. The expander matches the
> standard mov$a optab and so this prevents emit_move_insn_1 from doing the move by subwords as it finds a pattern
> that's able to do the move.

Could you then fix the comment in your patch as well? I hadn't
understood the force_reg was not key here. You might want to update
the following sentence from your patch description if you are going to
include it in your commit message:

The way this is worked around in the back-end is that we have move patterns in
neon.md that usually just force the register instead of checking with the
back-end.

"The way this is worked around (..) that just force the register" is
what led me to believe the force_reg was important.

>
> The expander however always falls through and doesn’t stop RTL generation. You could remove all the code in there and have
> it properly match the *neon_mov instructions which will do the right thing later at code generation time and avoid the redundant
> moves.  My guess is the original `force_reg` was copied from the other patterns like `movti` and the existing `mov`. There it makes
> sense because the operands can be MEM or anything general_operand.
>
> However the redundant moves are a different problem than what I'm trying to solve here. So I think that's another patch which requires further
> testing.

I was just thinking of restricting when does the force_reg happens but
if it can be removed completely I agree it should probably be done in
a separate patch.

Oh by the way, is there something that prevents those expanders from ever
being used with a memory operand? Because the GCC internals manual contains the
following piece for the mov standard pattern (bold marks added by me):

"Second, these patterns are not used solely in the RTL generation pass. Even
the reload pass can generate move insns to copy values from stack slots into
temporary registers. When it does so, one of the operands is a hard register
and the other is an operand that can need to be reloaded into a register.
Therefore, when given such a pair of operands, the pattern must generate RTL
which needs no reloading and needs no temporary registers—no registers other
than the operands. For example, if you support the pattern with a define_expand,
then in such a case the define_expand *mustn’t call force_reg* or any
other such function which might generate new pseudo registers."

Best regards,

Thomas

>
> Regards,
> Tamar
>
> > Best regards,
> >
> > Thomas
> >
> > >
> > > This patch ensures gcc.target/arm/big-endian-subreg.c is fixed without
> > > introducing any regressions while fixing
> > >
> > > gcc.dg/vect/vect-nop-move.c execution test
> > > g++.dg/torture/vshuf-v2si.C   -O3 -g  execution test
> > > g++.dg/torture/vshuf-v4si.C   -O3 -g  execution test
> > > g++.dg/torture/vshuf-v8hi.C   -O3 -g  execution test
> > >
> > > Regtested on armeb-none-eabi and no regressions.
> > > Bootstrapped on arm-none-linux-gnueabihf and no issues.
> > >
> > >
> > > Ok for trunk?
> > >
> > > Thanks,
> > > Tamar
> > >
> > > gcc/
> > > 2018-07-23  Tamar Christina  
> > >
> > > PR target/84711
> > > * config/arm/arm.c (arm_can_change_mode_class): Disallow subreg.
> > > * config/arm/neon.md (movv4hf, movv8hf): Refactored to..
> > > (mov): ..this and enable unconditionally.
> > >
> > > --


Re: [PATCH 3/3] Add user-friendly OpenACC diagnostics regarding detected parallelism.

2018-07-26 Thread Richard Biener
On Wed, Jul 25, 2018 at 5:30 PM Cesar Philippidis
 wrote:
>
> This patch teaches GCC to inform the user how it assigned parallelism
> to each OpenACC loop at compile time using the -fopt-info-note-omp
> flag. For instance, given the acc parallel loop nest:
>
>   #pragma acc parallel loop
>   for (...)
> #pragma acc loop vector
> for (...)
>
> GCC will report something like
>
>   foo.c:4:0: note: Detected parallelism 
>   foo.c:6:0: note: Detected parallelism 
>
> Note how only the inner loop specifies vector parallelism. In this
> example, GCC automatically assigned gang and worker parallelism to the
> outermost loop. Perhaps, going forward, it would be useful to
> distinguish which parallelism was specified by the user and which was
> assigned by the compiler. But that can be added in a follow up patch.
>
> Is this patch OK for trunk? I bootstrapped and regtested it for x86_64
> with nvptx offloading.

Shouldn't this use MSG_OPTIMIZED_LOCATIONS instead?  Are there
any other optinfo notes emitted?  Like when despite pragmas loops
are not handled or so?

> Thanks,
> Cesar
>
> 2018-XX-YY  Cesar Philippidis  
>
> gcc/
> * omp-offload.c (inform_oacc_loop): New function.
> (execute_oacc_device_lower): Use it to display loop parallelism.
>
> gcc/testsuite/
> * c-c++-common/goacc/note-parallelism.c: New test.
> * gfortran.dg/goacc/note-parallelism.f90: New test.
>
> (cherry picked from gomp-4_0-branch r245683, and gcc/testsuite/ parts of
> r245770)
>
> diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c
> index 0abf028..66b99bb 100644
> --- a/gcc/omp-offload.c
> +++ b/gcc/omp-offload.c
> @@ -866,6 +866,31 @@ debug_oacc_loop (oacc_loop *loop)
>dump_oacc_loop (stderr, loop, 0);
>  }
>
> +/* Provide diagnostics on OpenACC loops LOOP, its siblings and its
> +   children.  */
> +
> +static void
> +inform_oacc_loop (oacc_loop *loop)
> +{
> +  const char *seq = loop->mask == 0 ? " seq" : "";
> +  const char *gang = loop->mask & GOMP_DIM_MASK (GOMP_DIM_GANG)
> +    ? " gang" : "";
> +  const char *worker = loop->mask & GOMP_DIM_MASK (GOMP_DIM_WORKER)
> +    ? " worker" : "";
> +  const char *vector = loop->mask & GOMP_DIM_MASK (GOMP_DIM_VECTOR)
> +    ? " vector" : "";
> +  dump_location_t loc = dump_location_t::from_location_t (loop->loc);
> +
> +  dump_printf_loc (MSG_NOTE, loc,
> +                   "Detected parallelism \n", seq, gang,
> +                   worker, vector);
> +
> +  if (loop->child)
> +    inform_oacc_loop (loop->child);
> +  if (loop->sibling)
> +    inform_oacc_loop (loop->sibling);
> +}
> +
>  /* DFS walk of basic blocks BB onwards, creating OpenACC loop
> structures as we go.  By construction these loops are properly
> nested.  */
> @@ -1533,6 +1558,8 @@ execute_oacc_device_lower ()
>dump_oacc_loop (dump_file, loops, 0);
>fprintf (dump_file, "\n");
>  }
> +  if (dump_enabled_p () && loops->child)
> +    inform_oacc_loop (loops->child);
>
>/* Offloaded targets may introduce new basic blocks, which require
>   dominance information to update SSA.  */
> diff --git a/gcc/testsuite/c-c++-common/goacc/note-parallelism.c b/gcc/testsuite/c-c++-common/goacc/note-parallelism.c
> new file mode 100644
> index 000..3ec794c
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/goacc/note-parallelism.c
> @@ -0,0 +1,61 @@
> +/* Test the output of -fopt-info-note-omp.  */
> +
> +/* { dg-additional-options "-fopt-info-note-omp" } */
> +
> +int
> +main ()
> +{
> +  int x, y, z;
> +
> +#pragma acc parallel loop seq /* { dg-message "note: Detected parallelism " } */
> +  for (x = 0; x < 10; x++)
> +    ;
> +
> +#pragma acc parallel loop gang /* { dg-message "note: Detected parallelism " } */
> +  for (x = 0; x < 10; x++)
> +    ;
> +
> +#pragma acc parallel loop worker /* { dg-message "note: Detected parallelism " } */
> +  for (x = 0; x < 10; x++)
> +    ;
> +
> +#pragma acc parallel loop vector /* { dg-message "note: Detected parallelism " } */
> +  for (x = 0; x < 10; x++)
> +    ;
> +
> +#pragma acc parallel loop gang vector /* { dg-message "note: Detected parallelism " } */
> +  for (x = 0; x < 10; x++)
> +    ;
> +
> +#pragma acc parallel loop gang worker /* { dg-message "note: Detected parallelism " } */
> +  for (x = 0; x < 10; x++)
> +    ;
> +
> +#pragma acc parallel loop worker vector /* { dg-message "note: Detected parallelism " } */
> +  for (x = 0; x < 10; x++)
> +    ;
> +
> +#pragma acc parallel loop gang worker vector /* { dg-message "note: Detected parallelism " } */
> +  for (x = 0; x < 10; x++)
> +    ;
> +
> +#pragma acc parallel loop /* { dg-message "note: Detected parallelism  loop gang vector>" } */
> +  for (x = 0; x < 10; x++)
> +    ;
> +
> +#pragma acc parallel loop /* { dg-message "note: Detected parallelism  loop gang worker>" } */
> +  for (x = 0; x < 10; x++)
> +#pragma acc loop /* { dg-message "note: Detected parallelism  vector>" } */
> +for (y = 0; y < 10; y++)
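The mask decoding in inform_oacc_loop can be exercised on its own. This sketch assumes the GOMP_DIM_* values and the GOMP_DIM_MASK definition from GCC's include/gomp-constants.h. Its "parallelism:" prefix is illustrative only, since the exact message format is not preserved in the quoted patch. describe_mask is hypothetical and handles a single loop; the real function then recurses into the child and sibling loops.

```c
#include <stdio.h>
#include <string.h>

/* Values assumed to match GCC's include/gomp-constants.h.  */
#define GOMP_DIM_GANG   0
#define GOMP_DIM_WORKER 1
#define GOMP_DIM_VECTOR 2
#define GOMP_DIM_MASK(X) (1u << (X))

/* Render a partitioning mask using the same word choices as
   inform_oacc_loop: " seq" for an empty mask, otherwise any of
   " gang", " worker" and " vector".  */
static void describe_mask (unsigned mask, char *buf, size_t len)
{
  snprintf (buf, len, "parallelism:%s%s%s%s",
            mask == 0 ? " seq" : "",
            (mask & GOMP_DIM_MASK (GOMP_DIM_GANG)) ? " gang" : "",
            (mask & GOMP_DIM_MASK (GOMP_DIM_WORKER)) ? " worker" : "",
            (mask & GOMP_DIM_MASK (GOMP_DIM_VECTOR)) ? " vector" : "");
}
```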

Re: [PATCH] treat -Wxxx-larger-than=HWI_MAX special (PR 86631)

2018-07-26 Thread Richard Biener
On Wed, Jul 25, 2018 at 5:54 PM Martin Sebor  wrote:
>
> On 07/25/2018 08:57 AM, Jakub Jelinek wrote:
> > On Wed, Jul 25, 2018 at 08:54:13AM -0600, Martin Sebor wrote:
> >> I don't mean for the special value to be used except internally
> >> for the defaults.  Otherwise, users wanting to override the default
> >> will choose a value other than it.  I'm happy to document it in
> >> the .opt file for internal users though.
> >>
> >> -1 has the documented effect of disabling the warnings altogether
> >> (-1 is SIZE_MAX) so while I agree that -1 looks better it doesn't
> >> work.  (It would need more significant changes.)
> >
> > The variable is signed, so -1 is not SIZE_MAX.  Even if -1 disables it, you
> > could use e.g. -2 or other negative value for the other special case.
>
> The -Wxxx-larger-than=N distinguish three ranges of argument
> values (treated as unsigned):
>
>1.  [0, HOST_WIDE_INT_MAX)
>2.  HOST_WIDE_INT_MAX
>3.  [HOST_WIDE_INT_MAX + 1, Infinity)

But it doesn't make sense for those to be host dependent.

I think numerical user input should be limited to [0, ptrdiff_max]
and cases (1) and (2) should be simply merged, I see no value
in distinguishing them.  -Wxxx-larger-than should be aliased
to [0, ptrdiff_max], case (3) is achieved by -Wno-xxx-larger-than.

I think you are over-engineering this and the user-interface is
awful.

> (1) implies warnings for allocations in excess of the size.  For
> the alloca/VLA warnings it also means warnings for allocations
> that may be unbounded.  (This feels like a bit of a wart.)
>
> (2) implies warnings for allocations in excess of PTRDIFF_MAX
> only.  For the alloca/VLA warnings it also disables warnings
> for allocations that may be unbounded (also a bit of a wart)
>
> (3) isn't treated consistently by all options (yet) but for
> the alloca/VLA warnings it means no warnings.  Since
> the argument value is stored in signed HOST_WIDE_INT this
> range is strictly negative.
>
> Any value from (3) could in theory be made special and used
> instead of -1 or HOST_WIDE_INT_MAX as a placeholder for
> PTRDIFF_MAX.  But no matter what the choice is, it removes
> the value from the usable set in (3) (i.e., it doesn't have
> the expected effect of disabling the warning).
>
> I don't see the advantage of picking -2 over any other negative
> number.  As inelegant as the current choice of HOST_WIDE_INT_MAX
> may be, it seems less arbitrary and less intrusive than picking
> a random value from the negative range.
>
> Martin
>
> PS The handling of these ranges isn't consistent across all
> the options because they were each developed independently
> and without necessarily aiming for it.  I think making them
> more consistent would be nice as a followup patch.  I would
> expect consistency to be achievable more easily if baking
> special cases into the design is kept to a minimum.  It
> would also help to remove some existing special cases.
> E.g., by introducing a distinct option for the special case
> of diagnosing unbounded alloca/VLA allocations and removing
> it from -W{alloca,vla}-larger-than=.
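Martin's three ranges can be sketched as a classifier over the stored argument. The sketch assumes a 64-bit HOST_WIDE_INT, with HWI_MAX standing in for HOST_WIDE_INT_MAX. It only models the alloca/VLA interpretation of range (3), and classify_larger_than is hypothetical. It also shows the point Jakub and Martin disagree on: -1 and -2 both land in range (3), so reserving either one as a placeholder takes it out of the disabling range.

```c
#include <stdint.h>

/* Assumes a 64-bit HOST_WIDE_INT.  */
#define HWI_MAX INT64_MAX

enum larger_than_range
{
  RANGE_EXPLICIT_LIMIT,   /* (1) [0, HWI_MAX): warn above the given size */
  RANGE_PTRDIFF_DEFAULT,  /* (2) HWI_MAX: warn above PTRDIFF_MAX only */
  RANGE_DISABLED          /* (3) [HWI_MAX + 1, inf): no alloca/VLA warnings */
};

/* The argument is stored signed but classified as unsigned, so every
   negative stored value falls into range (3).  */
static enum larger_than_range classify_larger_than (int64_t stored)
{
  uint64_t u = (uint64_t) stored;
  if (u < (uint64_t) HWI_MAX)
    return RANGE_EXPLICIT_LIMIT;
  if (u == (uint64_t) HWI_MAX)
    return RANGE_PTRDIFF_DEFAULT;
  return RANGE_DISABLED;
}
```

Richard's alternative would instead limit user input to [0, PTRDIFF_MAX], merging ranges (1) and (2) and leaving range (3) to -Wno-xxx-larger-than.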


Re: [PATCH 1/7] Add __builtin_speculation_safe_value

2018-07-26 Thread Richard Biener
On Wed, Jul 25, 2018 at 8:03 PM Richard Earnshaw (lists)
 wrote:
>
> On 24/07/18 18:26, Richard Biener wrote:
> > So, please make resolve_overloaded_builtin return a no-op on such targets
> > which means you can remove the above warning.  Maybe such targets
> > shouldn't advertise / initialize the builtins at all?
>
> So I tried to make resolve_overloaded_builtin collapse the builtin
> entirely if it's not needed by the machine, transforming
>
>   x = __b_s_s_v (y);
>
> into
>
>   x = y;
>
>
> but I can't see how to make any side-effects on the optional second
> argument hang around.  It's somewhat obscure, but if the user does write
>
>  x = __b_s_s_v (y, z++);
>
> then z++ does still need to be performed.
>
> The problem seems to be that the callers of resolve_overloaded_builtin
> expect just a simple value result - they can't, for example, deal with a
> statement list and just calling save_expr on the argument isn't enough;
> so I can't see an obvious way to force the z++ expression back into the
> token stream at this point.
>
> Any ideas?  The alternative seems to be that we must keep the call until
> such time as the builtins are lowered during expansion, which pretty
> much loses all the benefits you were looking for.

Use a COMPOUND_EXPR: (z++, y).  So,

 /* ARG1 is the value argument, ARG2 the optional second argument.  */
 if (TREE_SIDE_EFFECTS (arg2))
   res = build2_loc (loc, COMPOUND_EXPR, type, arg2, arg1);
 else
   res = arg1;

Richard.

>
> R.
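At the C level, the COMPOUND_EXPR Richard suggests behaves like the comma operator. The second builtin argument is evaluated only for its side effects, and the result is the first argument. The following minimal sketch uses a hypothetical function, fold_with_side_effect. It shows that z++ still happens when x = __b_s_s_v (y, z++) collapses to x = (z++, y).

```c
/* Plain-C analogue of the folded form: evaluate the second argument for
   its side effect, then yield the first argument's value.  */
static int fold_with_side_effect (int y, int *z)
{
  int x = ((*z)++, y);   /* like COMPOUND_EXPR (z++, y) */
  return x;
}
```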


Re: [PATCH] Fix target clones (PR gcov-profile/85370).

2018-07-26 Thread Martin Liška
On 07/25/2018 03:50 PM, Richard Biener wrote:
> On Wed, Jul 25, 2018 at 3:38 PM Martin Liška  wrote:
>>
>> Hi.
>>
>> Target clones have DECL_ARTIFICIAL set to 1, but we want to
>> provide --coverage for that. With patched GCC one can see:
>>
>> -:0:Source:pr85370.c
>> -:0:Graph:pr85370.gcno
>> -:0:Data:pr85370.gcda
>> -:0:Runs:1
>> -:0:Programs:1
>> -:1:__attribute__((target_clones("arch=slm","default")))
>> 1:2:int foo1 (int a, int b) { // executed  wrongly
>> 1:3:  return a + b;
>> -:4:}
>> --
>> foo1.arch_slm.0:
>> 0:2:int foo1 (int a, int b) { // executed  wrongly
>> 0:3:  return a + b;
>> -:4:}
>> --
>> foo1.default.1:
>> 1:2:int foo1 (int a, int b) { // executed  wrongly
>> 1:3:  return a + b;
>> -:4:}
>> --
>> -:5:
>> 1:6:int foo2 (int a, int b) {
>> 1:7:  return a + b;
>> -:8:}
>> -:9:
>> 1:   10:int main() {
>> 1:   11:  foo1(1, 1);
>> 1:   12:  foo2(1, 1);
>> 1:   13:  return 1;
>> -:   14:}
>>
>> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>> Will install in couple of days if no objection.
> 
> I wonder if representing the clones as artificial but have their body be
> marked as inline instance of the original function works for gcov? 

Do you mean inlined functions? If so, these are fine, as inlined GIMPLE
statements still point to the original source code.

> I think
> it should for debuggers.  A similar case is probably the
> static_constructors_and_destructors

Actually, static_c_a_d were the motivation for the exclusion. Their location is set
to the last line of the source file, which is not intentional. Similarly, implicit
ctors/dtors (e.g. Centering<3>::Centering(Centering<3> const&)) are ignored
as they don't have any real line of code in a source file.

> function which has all ctors/dtors of static objects inlined into but itself is
> of course artificial.  Is that handled correctly?

Hope I explained enough?
Martin

> 
> Richard.
> 
>> Martin
>>
>> gcc/ChangeLog:
>>
>> 2018-07-25  Martin Liska  
>>
>> PR gcov-profile/85370
>> * coverage.c (coverage_begin_function): Do not mark target
>> clones as artificial functions.
>> ---
>>  gcc/coverage.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>>



[PATCH] Fix an UBSAN error in cp/parse.c (PR c++/86653).

2018-07-26 Thread Martin Liška
Hello.

A simple patch that initializes a boolean variable before it's used.
The variable is left uninitialized when error recovery happens.

Ready for trunk after testing?
Thanks,
Martin

gcc/cp/ChangeLog:

2018-07-26  Martin Liska  

PR c++/86653
* parser.c (cp_parser_condition): Initialize non_constant_p
to false.
---
 gcc/cp/parser.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index d44a6b88028..93c812f80d7 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -11721,7 +11721,7 @@ cp_parser_condition (cp_parser* parser)
   if (cp_parser_parse_definitely (parser))
 	{
 	  tree pushed_scope;
-	  bool non_constant_p;
+	  bool non_constant_p = false;
 	  int flags = LOOKUP_ONLYCONVERTING;
 
 	  if (!cp_parser_check_condition_declarator (parser, declarator, loc))
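Here is a small C sketch of the failure mode described. It assumes, as the mail says, that the flag is only written on the non-error path. parse_constant and condition_is_non_constant are hypothetical and are not cp/parser.c code. Without the initializer, the error path would read an indeterminate bool, which is the kind of load UBSan's -fsanitize=bool check reports.

```c
#include <stdbool.h>

/* Hypothetical helper that, like the call in question, only writes
   *NON_CONSTANT_P when it succeeds.  */
static int parse_constant (int token, bool *non_constant_p)
{
  if (token < 0)
    return -1;                 /* error recovery: flag left untouched */
  *non_constant_p = token > 100;
  return 0;
}

static bool condition_is_non_constant (int token)
{
  bool non_constant_p = false; /* the fix: a defined value on every path */
  parse_constant (token, &non_constant_p);
  return non_constant_p;       /* without the initializer, the error path
                                  would read an indeterminate bool */
}
```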



Re: [PATCH] warn for strlen of arrays with missing nul (PR 86552)

2018-07-26 Thread Bernd Edlinger
> @@ -567,13 +597,17 @@ string_length (const void *ptr, unsigned eltsize, unsigned maxelts)
> accesses.  Note that this implies the result is not going to be emitted
>into the instruction stream.
>  
> +   When ARR is non-null and the string is not properly nul-terminated,
> +   set *ARR to the declaration of the outermost constant object whose
> +   initializer (or one of its elements) is not nul-terminated.
> +
> The value returned is of type `ssizetype'.
>  
> Unfortunately, string_constant can't access the values of const char
> arrays with initializers, so neither can we do so here.  */

Maybe drop that sentence when it is no longer true?

>  
>  tree
> -c_strlen (tree src, int only_value)
> +c_strlen (tree src, int only_value, tree *arr /* = NULL */)
>  {
>STRIP_NOPS (src);
>if (TREE_CODE (src) == COND_EXPR
> @@ -581,24 +615,31 @@ c_strlen (tree src, int only_value)
>  {
>tree len1, len2;
>  
> -  len1 = c_strlen (TREE_OPERAND (src, 1), only_value);
> -  len2 = c_strlen (TREE_OPERAND (src, 2), only_value);
> +  len1 = c_strlen (TREE_OPERAND (src, 1), only_value, arr);
> +  len2 = c_strlen (TREE_OPERAND (src, 2), only_value, arr);

Wow, what happens here if the first operand is not nul-terminated and the
second is nul-terminated?

>if (tree_int_cst_equal (len1, len2))
>   return len1;
>  }
>  
>if (TREE_CODE (src) == COMPOUND_EXPR
>&& (only_value || !TREE_SIDE_EFFECTS (TREE_OPERAND (src, 0))))


Bernd.

Re: [PATCH] Make strlen range computations more conservative

2018-07-26 Thread Richard Biener
On Wed, 25 Jul 2018, Martin Sebor wrote:

> > BUT - for the string_constant and c_strlen functions we are,
> > in all cases we return something interesting, able to look
> > at an initializer which then determines that type.  Hopefully.
> > I think the strlen() folding code when it sets SSA ranges
> > now looks at types ...?
> > 
> > Consider
> > 
> > struct X { int i; char c[4]; int j;};
> > struct Y { char c[16]; };
> > 
> > void foo (struct X *p, struct Y *q)
> > {
> >   memcpy (p, q, sizeof (struct Y));
> >   if (strlen ((char *)(struct Y *)p + 4) < 7)
> >     abort ();
> > }
> > 
> > here the GIMPLE IL looks like
> > 
> >   const char * _1;
> > 
> >[local count: 1073741825]:
> >   _5 = MEM[(char * {ref-all})q_4(D)];
> >   MEM[(char * {ref-all})p_6(D)] = _5;
> >   _1 = p_6(D) + 4;
> >   _2 = __builtin_strlen (_1);
> > 
> > and I guess Martin would argue that since p is of type struct X
> > + 4 gets you to c[4] and thus strlen of that cannot be larger
> > than 3.  But of course the middle-end doesn't work like that
> > and luckily we do not try to draw such conclusions or we
> > are somehow lucky that for the testcase as written above we do not
> > (I'm not sure whether Martins changes in this area would derive
> > such conclusions in principle).
> 
> Only if the strlen argument were p->c.
> 
> > NOTE - we do not know the dynamic type here since we do not know
> > the dynamic type of the memory pointed-to by q!  We can only
> > derive that at q+4 there must be some object that we can
> > validly call strlen on (where Martin again thinks strlen
> > imposes constraints that memchr does not - sth I do not agree
> > with from a QOI perspective)
> 
> The dynamic type is a murky area.

It's well-specified in the middle-end.  A store changes the
dynamic type of the stored-to object.  If that type is
compatible with the surrounding object's dynamic type, that one
is not affected; if not, the surrounding object's dynamic
type becomes unspecified.  There is TYPE_TYPELESS_STORAGE
to somewhat control "compatibility" of subobjects.

> As you said, above we don't
> know whether *p is an allocated object or not.  Strictly speaking,
> we would need to treat it as such.  It would basically mean
> throwing out all type information and treating objects simply
> as blobs of bytes.  But that's not what GCC or other compilers do
> either.

It is what GCC does unless it sees a store to the memory.  Basically
pointers carry no type information, only (visible!) stores
(and loads to some extent) provide information about dynamic types
of objects (allocated or declared - GCC doesn't make a difference there).

> For instance, in the modified foo below, GCC eliminates
> the test because it assumes that *p and *q don't overlap.  It
> does that because they are members of structs of unrelated types
> access to which cannot alias.  I.e., not just the type of
> the access matters (here int and char) but so does the type of
> the enclosing object.  If it were otherwise and only the type
> of the access mattered then eliminating the test below wouldn't
> be valid (objects can have their stored value accessed by either
> an lvalue of a compatible type or char).
> 
>   void foo (struct X *p, struct Y *q)
>   {
>     int j = p->j;
>     q->c[__builtin_offsetof (struct X, j)] = 0;
>     if (j != p->j)
>       __builtin_abort ();
>   }

Here GCC sees both a load and a store from which it derives the
information.  And yes, it looks at the full access structure,
which contains a dereference of p and of q.  That, and the fact
that the store to q->c[] (which for GCC implies a store to *q!)
changes the dynamic type, is what the conclusion rests on.

> Clarifying (and adjusting if necessary) this area is among
> the goals of the C object model proposal and the ongoing study
> group.  We have been talking about some of these cases there
> and trying to come up with ways to let code do what it needs
> to do without compromising existing language rules, which was
> the consensus position within WG14 when the study group was
> formed: i.e., to clarify or reaffirm existing rules and, in
> cases of ambiguity or where the standard is unintentionally
> overly permissive, favor tighter rules over looser ones.

There is also the C++ object model and the Ada object model and ...

GCC already has an object model in its middle-end and that is
not going to change.  And obviously it was modeled after the
requirements from the languages the middle-end supports.  The
latest change was made necessary by C++ (placement new and
storage re-use specifically).

Richard.
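Richard's first example can be made concrete with a sketch in which the storage behind p is a malloc'd buffer sized for struct Y. The middle-end must allow for this situation, since pointers carry no type information. strlen_after_copy is hypothetical. The point is that strlen at p + 4 can be 11 here, well beyond the 3 characters that a bound derived from struct X would allow.

```c
#include <stdlib.h>
#include <string.h>

struct X { int i; char c[4]; int j; };
struct Y { char c[16]; };

/* Copy a struct Y into storage that is then viewed through a struct X
   pointer, and measure the string starting 4 bytes in.  The storage is
   large enough for a struct Y, so nothing limits the result to the size
   of X's c member.  */
static size_t strlen_after_copy (const struct Y *q)
{
  void *storage = malloc (sizeof (struct Y));
  if (!storage)
    return (size_t) -1;
  struct X *p = storage;
  memcpy (p, q, sizeof (struct Y));
  size_t n = strlen ((char *) p + 4);
  free (storage);
  return n;
}
```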



Re: [PATCH] Fix target clones (PR gcov-profile/85370).

2018-07-26 Thread Richard Biener
On Thu, Jul 26, 2018 at 10:44 AM Martin Liška  wrote:
>
> On 07/25/2018 03:50 PM, Richard Biener wrote:
> > On Wed, Jul 25, 2018 at 3:38 PM Martin Liška  wrote:
> >>
> >> Hi.
> >>
> >> Target clones have DECL_ARTIFICIAL set to 1, but we want to
> >> provide --coverage for that. With patched GCC one can see:
> >>
> >> -:0:Source:pr85370.c
> >> -:0:Graph:pr85370.gcno
> >> -:0:Data:pr85370.gcda
> >> -:0:Runs:1
> >> -:0:Programs:1
> >> -:1:__attribute__((target_clones("arch=slm","default")))
> >> 1:2:int foo1 (int a, int b) { // executed  wrongly
> >> 1:3:  return a + b;
> >> -:4:}
> >> --
> >> foo1.arch_slm.0:
> >> 0:2:int foo1 (int a, int b) { // executed  wrongly
> >> 0:3:  return a + b;
> >> -:4:}
> >> --
> >> foo1.default.1:
> >> 1:2:int foo1 (int a, int b) { // executed  wrongly
> >> 1:3:  return a + b;
> >> -:4:}
> >> --
> >> -:5:
> >> 1:6:int foo2 (int a, int b) {
> >> 1:7:  return a + b;
> >> -:8:}
> >> -:9:
> >> 1:   10:int main() {
> >> 1:   11:  foo1(1, 1);
> >> 1:   12:  foo2(1, 1);
> >> 1:   13:  return 1;
> >> -:   14:}
> >>
> >> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
> >> Will install in a couple of days if there are no objections.
> >
> > I wonder if representing the clones as artificial, but having their body
> > marked as an inline instance of the original function, would work for gcov?
>
> Do you mean inlined functions? If so, these are fine, as gimple statements
> that were inlined still point to the original source code.
>
> > I think it should for debuggers.  A similar case is probably the
> > static_constructors_and_destructors
>
> Actually, static_c_a_d were the motivation for the exclusion. Their location
> is set to the last line of the source file, which is not intentional. Similarly,
> implicit ctors/dtors (e.g. Centering<3>::Centering(Centering<3> const&)) are
> ignored, as they don't have any real line of code in a source file.
>
> > function which has all ctors/dtors of static objects inlined into it, but
> > itself is of course artificial.  Is that handled correctly?
>
> Hope I explained enough?

The question is what you'd like to see.  Looking at your figure above, it
looks like you want to see separate coverage for the different clones, even
when they are auto-generated by GCC?  Isn't that inconsistent with, for
example, IPA-CP generated clones or inline instances?  Manual multiversions
in source are already reported separately?

Richard.

> Martin
>
> >
> > Richard.
> >
> >> Martin
> >>
> >> gcc/ChangeLog:
> >>
> >> 2018-07-25  Martin Liska  
> >>
> >> PR gcov-profile/85370
> >> * coverage.c (coverage_begin_function): Do not mark target
> >> clones as artificial functions.
> >> ---
> >>  gcc/coverage.c | 3 ++-
> >>  1 file changed, 2 insertions(+), 1 deletion(-)
> >>
> >>
>


[PATCH v2 0/7] Support partial clobbers around TLS calls on Aarch64 SVE

2018-07-26 Thread Alan Hayward
This is a rebase of the patch posted in November.
Its aim is to support aarch64 SVE register preservation around TLS calls
by adding a CLOBBER_HIGH expression.

Across a TLS call, Aarch64 SVE does not explicitly preserve the
SVE vector registers. However, the Neon vector registers are preserved.
Because the two register sets overlap, this means the lower 128 bits of all
SVE vector registers will be preserved.

The existing GCC code assumes all Neon and SVE registers are clobbered
across TLS calls.

This patch introduces a CLOBBER_HIGH expression, which behaves a bit like
a CLOBBER expression. CLOBBER_HIGH can only refer to a single register.
The mode of the expression indicates the size of the lower bits which
will be preserved. If the register contains a value bigger than this
mode, then the code will treat the register as clobbered; otherwise the
register remains untouched.

This means that, in order to evaluate whether a clobber high is relevant,
we need to ensure the mode of the existing value in a register is tracked.
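To illustrate why the mode has to be tracked, here is a minimal sketch (not GCC code; `struct reg_state` and `apply_clobber_high` are hypothetical names) that records, for each hard register, the size in bytes of the value it currently holds, and applies a clobber high that preserves only the lower `preserved_bytes` of one register. Fixed byte counts stand in for machine modes here, which is a simplification.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-hard-register state: whether a known value is held,
   and how many bytes of the register that value occupies (its mode size).  */
struct reg_state
{
  bool valid;
  unsigned int value_bytes;
};

/* Apply a clobber high of register REGNO that preserves only the lower
   PRESERVED_BYTES bytes.  The tracked value survives only if it fits
   entirely within the preserved part.  Return true if the value was lost.  */
bool
apply_clobber_high (struct reg_state *regs, unsigned int nregs,
		    unsigned int regno, unsigned int preserved_bytes)
{
  assert (regno < nregs);
  if (!regs[regno].valid)
    return false;
  if (regs[regno].value_bytes <= preserved_bytes)
    return false;
  regs[regno].valid = false;
  return true;
}
```

With 16 preserved bytes, a tracked 16-byte (Neon-sized) value survives while a 32-byte value (for example a 256-bit SVE vector) is lost. Without the recorded size, the only safe choice would be to treat every clobber high as a full clobber.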

The first two patches introduce CLOBBER_HIGH and generation support.
The next patch adds a helper function for easily determining if a register
gets clobbered by a CLOBBER_HIGH.
The next three patches add clobber high checks to all of the passes. I
couldn't think of a better way of splitting this up (maybe needs dividing
into smaller patches?).
Finally, the last patch adds the CLOBBER_HIGHs around TLS calls for
aarch64 SVE, along with some test cases.

Alan Hayward (7):
  Add CLOBBER_HIGH expression
  Generation support for CLOBBER_HIGH
  Add func to check if register is clobbered by clobber_high
  lra support for clobber_high
  cse support for clobber_high
  Remaining support for clobber high
  Enable clobber high for tls descs on Aarch64

 gcc/alias.c|  11 ++
 gcc/cfgexpand.c|   1 +
 gcc/combine-stack-adj.c|   1 +
 gcc/combine.c  |  38 -
 gcc/config/aarch64/aarch64.md  |  69 ++--
 gcc/cse.c  | 187 ++---
 gcc/cselib.c   |  42 +++--
 gcc/cselib.h   |   2 +-
 gcc/dce.c  |  11 +-
 gcc/df-scan.c  |   6 +
 gcc/doc/rtl.texi   |  15 +-
 gcc/dwarf2out.c|   1 +
 gcc/emit-rtl.c |  16 ++
 gcc/genconfig.c|   1 +
 gcc/genemit.c  |  51 +++---
 gcc/genrecog.c |   3 +-
 gcc/haifa-sched.c  |   3 +
 gcc/ira-build.c|   5 +
 gcc/ira-costs.c|   7 +
 gcc/ira.c  |   6 +-
 gcc/jump.c |   1 +
 gcc/lra-eliminations.c |  11 ++
 gcc/lra-int.h  |   2 +
 gcc/lra-lives.c|  31 ++--
 gcc/lra.c  |  66 +---
 gcc/postreload-gcse.c  |  21 ++-
 gcc/postreload.c   |  25 ++-
 gcc/print-rtl.c|   1 +
 gcc/recog.c|   9 +-
 gcc/regcprop.c |  10 +-
 gcc/reginfo.c  |   4 +
 gcc/reload1.c  |  16 +-
 gcc/reorg.c|  27 ++-
 gcc/resource.c |  24 ++-
 gcc/rtl.c  |  15 ++
 gcc/rtl.def|  10 ++
 gcc/rtl.h  |  27 ++-
 gcc/rtlanal.c  |  47 +-
 gcc/sched-deps.c   |  15 +-
 .../gcc.target/aarch64/sve_tls_preserve_1.c|  19 +++
 .../gcc.target/aarch64/sve_tls_preserve_2.c|  24 +++
 .../gcc.target/aarch64/sve_tls_preserve_3.c|  24 +++
 42 files changed, 725 insertions(+), 180 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve_tls_preserve_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve_tls_preserve_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve_tls_preserve_3.c

-- 
2.15.2 (Apple Git-101.1)



[PATCH v2 3/7] Add func to check if register is clobbered by clobber_high

2018-07-26 Thread Alan Hayward
Given a CLOBBER_HIGH expression and a register, this function checks whether
the register will be clobbered.

A second version, taking a register number and mode, exists for cases where
the register expression is not available.

The function will be used throughout the following patches.

2018-07-25  Alan Hayward  

* rtl.h (reg_is_clobbered_by_clobber_high): Add declarations.
* rtlanal.c (reg_is_clobbered_by_clobber_high): Add function.
---
 gcc/rtl.h | 10 ++
 gcc/rtlanal.c | 29 +
 2 files changed, 39 insertions(+)

diff --git a/gcc/rtl.h b/gcc/rtl.h
index f42d749511d..d549b0aad0b 100644
--- a/gcc/rtl.h
+++ b/gcc/rtl.h
@@ -3467,6 +3467,16 @@ extern bool tablejump_p (const rtx_insn *, rtx_insn **, rtx_jump_table_data **);
 extern int computed_jump_p (const rtx_insn *);
 extern bool tls_referenced_p (const_rtx);
 extern bool contains_mem_rtx_p (rtx x);
+extern bool reg_is_clobbered_by_clobber_high (unsigned int, machine_mode,
+ const_rtx);
+
+/* Convenient wrapper for reg_is_clobbered_by_clobber_high.  */
+inline bool
+reg_is_clobbered_by_clobber_high (const_rtx x, const_rtx clobber_high_op)
+{
+  return reg_is_clobbered_by_clobber_high (REGNO (x), GET_MODE (x),
+  clobber_high_op);
+}
 
 /* Overload for refers_to_regno_p for checking a single register.  */
 inline bool
diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index 9f84d7f2a8c..1cab1545744 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -6551,3 +6551,32 @@ tls_referenced_p (const_rtx x)
   return true;
   return false;
 }
+
+/* Return true if reg REGNO with mode REG_MODE would be clobbered by the
+   clobber_high operand in CLOBBER_HIGH_OP.  */
+
+bool
+reg_is_clobbered_by_clobber_high (unsigned int regno, machine_mode reg_mode,
+ const_rtx clobber_high_op)
+{
+  unsigned int clobber_regno = REGNO (clobber_high_op);
+  machine_mode clobber_mode = GET_MODE (clobber_high_op);
+  unsigned char regno_nregs = hard_regno_nregs (regno, reg_mode);
+
+  /* Clobber high should always span exactly one register.  */
+  gcc_assert (REG_NREGS (clobber_high_op) == 1);
+
+  /* Clobber high needs to match with one of the registers in X.  */
+  if (clobber_regno < regno || clobber_regno >= regno + regno_nregs)
+return false;
+
+  gcc_assert (reg_mode != BLKmode && clobber_mode != BLKmode);
+
+  if (reg_mode == VOIDmode)
+return clobber_mode != VOIDmode;
+
+  /* Clobber high will clobber if its size might be greater than the size of
+ register regno.  */
+  return maybe_gt (exact_div (GET_MODE_SIZE (reg_mode), regno_nregs),
+GET_MODE_SIZE (clobber_mode));
+}
-- 
2.15.2 (Apple Git-101.1)
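As a rough illustration of the rule in reg_is_clobbered_by_clobber_high above, the sketch below replaces machine modes and poly_int sizes with plain byte counts. That is an assumption: with SVE the real sizes are only known up to a runtime multiple, which is why the patch uses maybe_gt and exact_div. The VOIDmode special case is omitted, and `value_clobbered_by_clobber_high` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>

/* Fixed-size model: a value of VALUE_BYTES bytes lives in NREGS
   consecutive hard registers starting at REGNO.  The clobber high names
   the single register CLOBBER_REGNO and preserves its lower
   PRESERVED_BYTES bytes.  Return true if the value would be clobbered.  */
bool
value_clobbered_by_clobber_high (unsigned int regno, unsigned int nregs,
				 unsigned int value_bytes,
				 unsigned int clobber_regno,
				 unsigned int preserved_bytes)
{
  assert (nregs > 0 && value_bytes % nregs == 0);

  /* The clobber high only matters if it names one of the value's
     registers.  */
  if (clobber_regno < regno || clobber_regno >= regno + nregs)
    return false;

  /* Compare the part of the value held in each register with the
     preserved size.  */
  return value_bytes / nregs > preserved_bytes;
}
```

For a clobber high that preserves 16 bytes, a 32-byte value in one register is clobbered, a 16-byte value is not, and a 32-byte value split across two registers (16 bytes each) is not either.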



[PATCH v2 4/7] lra support for clobber_high

2018-07-26 Thread Alan Hayward
The lra specific changes for clobber_high.

2018-07-25  Alan Hayward  

* lra-eliminations.c (lra_eliminate_regs_1): Check for clobber high.
(mark_not_eliminable): Likewise.
* lra-int.h (struct lra_insn_reg): Add clobber high marker.
* lra-lives.c (process_bb_lives): Check for clobber high.
* lra.c (new_insn_reg): Remember clobber highs.
(collect_non_operand_hard_regs): Check for clobber high.
(lra_set_insn_recog_data): Likewise.
(add_regs_to_insn_regno_info): Likewise.
(lra_update_insn_regno_info): Likewise.
---
 gcc/lra-eliminations.c | 11 +
 gcc/lra-int.h  |  2 ++
 gcc/lra-lives.c| 31 
 gcc/lra.c  | 66 +++---
 4 files changed, 80 insertions(+), 30 deletions(-)

diff --git a/gcc/lra-eliminations.c b/gcc/lra-eliminations.c
index f5f104020b3..d0cfaa8714a 100644
--- a/gcc/lra-eliminations.c
+++ b/gcc/lra-eliminations.c
@@ -654,6 +654,7 @@ lra_eliminate_regs_1 (rtx_insn *insn, rtx x, machine_mode mem_mode,
   return x;
 
 case CLOBBER:
+case CLOBBER_HIGH:
 case SET:
   gcc_unreachable ();
 
@@ -806,6 +807,16 @@ mark_not_eliminable (rtx x, machine_mode mem_mode)
setup_can_eliminate (ep, false);
   return;
 
+case CLOBBER_HIGH:
+  gcc_assert (REG_P (XEXP (x, 0)));
+  gcc_assert (REGNO (XEXP (x, 0)) < FIRST_PSEUDO_REGISTER);
+  for (ep = reg_eliminate;
+  ep < &reg_eliminate[NUM_ELIMINABLE_REGS];
+  ep++)
+   if (reg_is_clobbered_by_clobber_high (ep->to_rtx, XEXP (x, 0)))
+ setup_can_eliminate (ep, false);
+  return;
+
 case SET:
   if (SET_DEST (x) == stack_pointer_rtx
  && GET_CODE (SET_SRC (x)) == PLUS
diff --git a/gcc/lra-int.h b/gcc/lra-int.h
index 86e103b7480..5267b53c5e3 100644
--- a/gcc/lra-int.h
+++ b/gcc/lra-int.h
@@ -168,6 +168,8 @@ struct lra_insn_reg
   /* True if there is an early clobber alternative for this
  operand.  */
   unsigned int early_clobber : 1;
+  /* True if the reg is clobber highed by the operand.  */
+  unsigned int clobber_high : 1;
   /* The corresponding regno of the register.  */
   int regno;
   /* Next reg info of the same insn.  */
diff --git a/gcc/lra-lives.c b/gcc/lra-lives.c
index 920fd02b997..433c819d9e3 100644
--- a/gcc/lra-lives.c
+++ b/gcc/lra-lives.c
@@ -658,7 +658,7 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
   bool call_p;
   int n_alt, dst_regno, src_regno;
   rtx set;
-  struct lra_insn_reg *reg;
+  struct lra_insn_reg *reg, *hr;
 
   if (!NONDEBUG_INSN_P (curr_insn))
continue;
@@ -690,11 +690,12 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
break;
  }
  for (reg = curr_static_id->hard_regs; reg != NULL; reg = reg->next)
-   if (reg->type != OP_IN)
+   if (reg->type != OP_IN && !reg->clobber_high)
  {
remove_p = false;
break;
  }
+
  if (remove_p && ! volatile_refs_p (PATTERN (curr_insn)))
{
  dst_regno = REGNO (SET_DEST (set));
@@ -812,14 +813,24 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
 unused values because they still conflict with quantities
 that are live at the time of the definition.  */
   for (reg = curr_id->regs; reg != NULL; reg = reg->next)
-   if (reg->type != OP_IN)
- {
-   need_curr_point_incr
- |= mark_regno_live (reg->regno, reg->biggest_mode,
- curr_point);
-   check_pseudos_live_through_calls (reg->regno,
- last_call_used_reg_set);
- }
+   {
+ if (reg->type != OP_IN)
+   {
+ need_curr_point_incr
+   |= mark_regno_live (reg->regno, reg->biggest_mode,
+   curr_point);
+ check_pseudos_live_through_calls (reg->regno,
+   last_call_used_reg_set);
+   }
+
+ if (reg->regno >= FIRST_PSEUDO_REGISTER)
+   for (hr = curr_static_id->hard_regs; hr != NULL; hr = hr->next)
+ if (hr->clobber_high
+ && maybe_gt (GET_MODE_SIZE (PSEUDO_REGNO_MODE (reg->regno)),
+  GET_MODE_SIZE (hr->biggest_mode)))
+   SET_HARD_REG_BIT (lra_reg_info[reg->regno].conflict_hard_regs,
+ hr->regno);
+   }
 
   for (reg = curr_static_id->hard_regs; reg != NULL; reg = reg->next)
if (reg->type != OP_IN)
diff --git a/gcc/lra.c b/gcc/lra.c
index b410b90f126..aa768fb2a23 100644
--- a/gcc/lra.c
+++ b/gcc/lra.c
@@ -535,13 +535,14 @@ object_allocator lra_insn_reg_pool ("insn regs");
clobbered in the insn (EARLY_CLOBBER), and reference to the next

[PATCH v2 2/7] Generation support for CLOBBER_HIGH

2018-07-26 Thread Alan Hayward
Ensure that the operand of a clobber high is a register expression.
The md_rtx_info is passed through so that the error case can report the
location of the offending pattern.

2018-07-25  Alan Hayward  

* emit-rtl.c (verify_rtx_sharing): Check for CLOBBER_HIGH.
(copy_insn_1): Likewise.
(gen_hard_reg_clobber_high): New gen function.
* genconfig.c (walk_insn_part): Check for CLOBBER_HIGH.
* genemit.c (gen_exp): Likewise.
(gen_emit_seq): Pass through info.
(gen_insn): Check for CLOBBER_HIGH.
(gen_expand): Pass through info.
(gen_split): Likewise.
(output_add_clobbers): Likewise.
* genrecog.c (validate_pattern): Check for CLOBBER_HIGH.
(remove_clobbers): Likewise.
* rtl.h (gen_hard_reg_clobber_high): New declaration.
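gen_hard_reg_clobber_high, added to emit-rtl.c by this patch, follows the same pattern as gen_hard_reg_clobber: it hands out one shared rtx per (mode, regno) pair and creates it on first use. The sketch below shows that memoization pattern outside GCC. The types and `get_clobber_high` are hypothetical, and GCC's version keeps its table in a GTY((deletable)) array that the garbage collector is allowed to clear, rather than using malloc.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for machine modes and hard registers.  */
enum { NUM_MODES = 4, NUM_HARD_REGS = 8 };

struct clobber_high_node
{
  int mode;
  unsigned int regno;
};

/* One shared node per (mode, regno) pair, created on first request.  */
static struct clobber_high_node *cache[NUM_MODES][NUM_HARD_REGS];

struct clobber_high_node *
get_clobber_high (int mode, unsigned int regno)
{
  assert (mode >= 0 && mode < NUM_MODES && regno < NUM_HARD_REGS);
  if (!cache[mode][regno])
    {
      struct clobber_high_node *n = malloc (sizeof *n);
      if (!n)
	abort ();
      n->mode = mode;
      n->regno = regno;
      cache[mode][regno] = n;
    }
  return cache[mode][regno];
}
```

Repeated requests for the same pair return the same object, which is only safe because clobbers of hard registers may be shared (the rule the CLOBBER_HIGH cases added to verify_rtx_sharing and copy_insn_1 rely on).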
---
 gcc/emit-rtl.c  | 16 
 gcc/genconfig.c |  1 +
 gcc/genemit.c   | 51 +++
 gcc/genrecog.c  |  3 ++-
 gcc/rtl.h   |  1 +
 5 files changed, 51 insertions(+), 21 deletions(-)

diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
index e4b070486e8..6a32bcbdaf6 100644
--- a/gcc/emit-rtl.c
+++ b/gcc/emit-rtl.c
@@ -2865,6 +2865,7 @@ verify_rtx_sharing (rtx orig, rtx insn)
   /* SCRATCH must be shared because they represent distinct values.  */
   return;
 case CLOBBER:
+case CLOBBER_HIGH:
   /* Share clobbers of hard registers (like cc0), but do not share pseudo reg
  clobbers or clobbers of hard registers that originated as pseudos.
  This is needed to allow safe register renaming.  */
@@ -3118,6 +3119,7 @@ repeat:
   /* SCRATCH must be shared because they represent distinct values.  */
   return;
 case CLOBBER:
+case CLOBBER_HIGH:
   /* Share clobbers of hard registers (like cc0), but do not share pseudo reg
  clobbers or clobbers of hard registers that originated as pseudos.
  This is needed to allow safe register renaming.  */
@@ -5690,6 +5692,7 @@ copy_insn_1 (rtx orig)
 case SIMPLE_RETURN:
   return orig;
 case CLOBBER:
+case CLOBBER_HIGH:
   /* Share clobbers of hard registers (like cc0), but do not share pseudo reg
  clobbers or clobbers of hard registers that originated as pseudos.
  This is needed to allow safe register renaming.  */
@@ -6508,6 +6511,19 @@ gen_hard_reg_clobber (machine_mode mode, unsigned int regno)
gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (mode, regno)));
 }
 
+static GTY((deletable)) rtx
+hard_reg_clobbers_high[NUM_MACHINE_MODES][FIRST_PSEUDO_REGISTER];
+
+rtx
+gen_hard_reg_clobber_high (machine_mode mode, unsigned int regno)
+{
+  if (hard_reg_clobbers_high[mode][regno])
+return hard_reg_clobbers_high[mode][regno];
+  else
+return (hard_reg_clobbers_high[mode][regno]
+   = gen_rtx_CLOBBER_HIGH (VOIDmode, gen_rtx_REG (mode, regno)));
+}
+
 location_t prologue_location;
 location_t epilogue_location;
 
diff --git a/gcc/genconfig.c b/gcc/genconfig.c
index c1bfde8d54b..745d5374b39 100644
--- a/gcc/genconfig.c
+++ b/gcc/genconfig.c
@@ -72,6 +72,7 @@ walk_insn_part (rtx part, int recog_p, int non_pc_set_src)
   switch (code)
 {
 case CLOBBER:
+case CLOBBER_HIGH:
   clobbers_seen_this_insn++;
   break;
 
diff --git a/gcc/genemit.c b/gcc/genemit.c
index f4179e2b631..86e792f7396 100644
--- a/gcc/genemit.c
+++ b/gcc/genemit.c
@@ -79,7 +79,7 @@ gen_rtx_scratch (rtx x, enum rtx_code subroutine_type)
substituting any operand references appearing within.  */
 
 static void
-gen_exp (rtx x, enum rtx_code subroutine_type, char *used)
+gen_exp (rtx x, enum rtx_code subroutine_type, char *used, md_rtx_info *info)
 {
   RTX_CODE code;
   int i;
@@ -123,7 +123,7 @@ gen_exp (rtx x, enum rtx_code subroutine_type, char *used)
   for (i = 0; i < XVECLEN (x, 1); i++)
{
  printf (",\n\t\t");
- gen_exp (XVECEXP (x, 1, i), subroutine_type, used);
+ gen_exp (XVECEXP (x, 1, i), subroutine_type, used, info);
}
   printf (")");
   return;
@@ -137,7 +137,7 @@ gen_exp (rtx x, enum rtx_code subroutine_type, char *used)
   for (i = 0; i < XVECLEN (x, 2); i++)
{
  printf (",\n\t\t");
- gen_exp (XVECEXP (x, 2, i), subroutine_type, used);
+ gen_exp (XVECEXP (x, 2, i), subroutine_type, used, info);
}
   printf (")");
   return;
@@ -163,12 +163,21 @@ gen_exp (rtx x, enum rtx_code subroutine_type, char *used)
 case CLOBBER:
   if (REG_P (XEXP (x, 0)))
{
- printf ("gen_hard_reg_clobber (%smode, %i)", GET_MODE_NAME (GET_MODE (XEXP (x, 0))),
-REGNO (XEXP (x, 0)));
+ printf ("gen_hard_reg_clobber (%smode, %i)",
+ GET_MODE_NAME (GET_MODE (XEXP (x, 0))),
+ REGNO (XEXP (x, 0)));
  return;
}
   break;
-
+case CLOBBER_HIGH:
+  if (!REG_P (XEXP (x, 0)))
+   error ("CLOBBER_HIGH argument is not a register expr, at %s:%d",
+  in

[PATCH v2 1/7] Add CLOBBER_HIGH expression

2018-07-26 Thread Alan Hayward
Includes documentation.

2018-07-25  Alan Hayward  

* doc/rtl.texi (clobber_high): Add.
(parallel): Add in clobber high
* rtl.c (rtl_check_failed_code3): Add function.
* rtl.def (CLOBBER_HIGH): Add expression.
* rtl.h (RTL_CHECKC3): Add macro.
(rtl_check_failed_code3): Add declaration.
(XC3EXP): Add macro.
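RTL_CHECKC3 extends the existing two-code check (RTL_CHECKC2) to three accepted codes. A plausible use, not visible in the truncated diff below, is letting an accessor that already accepts SET and CLOBBER also accept CLOBBER_HIGH. The sketch below models the check as an ordinary function over a hypothetical miniature node type. The real macro uses a GNU statement expression so that its RTX argument is evaluated only once, and reports failures through rtl_check_failed_code3.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical miniature of an rtx: a code plus an operand array.  */
enum node_code { CODE_SET, CODE_CLOBBER, CODE_CLOBBER_HIGH, CODE_USE };

struct node
{
  enum node_code code;
  int fld[2];
};

/* Return a pointer to operand N of X, aborting unless X has one of the
   three accepted codes C1, C2 or C3.  */
int *
checked_operand3 (struct node *x, int n, enum node_code c1,
		  enum node_code c2, enum node_code c3)
{
  assert (n >= 0 && n < 2);
  if (x->code != c1 && x->code != c2 && x->code != c3)
    {
      fprintf (stderr, "check failed: unexpected code %d\n", (int) x->code);
      abort ();
    }
  return &x->fld[n];
}
```

Because the function returns a pointer, the checked operand can be both read and assigned through, mirroring how the RTL checking macros yield an lvalue.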
---
 gcc/doc/rtl.texi | 15 ++-
 gcc/rtl.c| 11 +++
 gcc/rtl.def  | 10 ++
 gcc/rtl.h| 16 +++-
 4 files changed, 50 insertions(+), 2 deletions(-)

diff --git a/gcc/doc/rtl.texi b/gcc/doc/rtl.texi
index a37d9ac5389..20c57732679 100644
--- a/gcc/doc/rtl.texi
+++ b/gcc/doc/rtl.texi
@@ -3296,6 +3296,18 @@ There is one other known use for clobbering a pseudo register in a
 clobbered by the insn.  In this case, using the same pseudo register in
 the clobber and elsewhere in the insn produces the expected results.
 
+@findex clobber_high
+@item (clobber_high @var{x})
+Represents the storing or possible storing of an unpredictable,
+undescribed value into the upper parts of @var{x}. The mode of the expression
+represents the lower parts of the register which will not be overwritten.
+@var{x} must be a reg expression.
+
+One place this is used is when calling into functions where the registers are
+preserved, but only up to a given number of bits.  For example when using
+Aarch64 SVE, calling a TLS descriptor will cause only the lower 128 bits of
+each of the vector registers to be preserved.
+
 @findex use
 @item (use @var{x})
 Represents the use of the value of @var{x}.  It indicates that the
@@ -3349,7 +3361,8 @@ Represents several side effects performed in parallel.  The square
 brackets stand for a vector; the operand of @code{parallel} is a
 vector of expressions.  @var{x0}, @var{x1} and so on are individual
 side effect expressions---expressions of code @code{set}, @code{call},
-@code{return}, @code{simple_return}, @code{clobber} or @code{use}.
+@code{return}, @code{simple_return}, @code{clobber}, @code{use} or
+@code{clobber_high}.
 
 ``In parallel'' means that first all the values used in the individual
 side-effects are computed, and second all the actual side-effects are
diff --git a/gcc/rtl.c b/gcc/rtl.c
index 90bbc7c6861..985db1c14f0 100644
--- a/gcc/rtl.c
+++ b/gcc/rtl.c
@@ -856,6 +856,17 @@ rtl_check_failed_code2 (const_rtx r, enum rtx_code code1, enum rtx_code code2,
  func, trim_filename (file), line);
 }
 
+void
+rtl_check_failed_code3 (const_rtx r, enum rtx_code code1, enum rtx_code code2,
+   enum rtx_code code3, const char *file, int line,
+   const char *func)
+{
+  internal_error
+("RTL check: expected code '%s', '%s' or '%s', have '%s' in %s, at %s:%d",
+ GET_RTX_NAME (code1), GET_RTX_NAME (code2), GET_RTX_NAME (code3),
+ GET_RTX_NAME (GET_CODE (r)), func, trim_filename (file), line);
+}
+
 void
 rtl_check_failed_code_mode (const_rtx r, enum rtx_code code, machine_mode mode,
bool not_mode, const char *file, int line,
diff --git a/gcc/rtl.def b/gcc/rtl.def
index 2578a0ccb9e..0ed27505545 100644
--- a/gcc/rtl.def
+++ b/gcc/rtl.def
@@ -312,6 +312,16 @@ DEF_RTL_EXPR(USE, "use", "e", RTX_EXTRA)
is considered undeletable before reload.  */
 DEF_RTL_EXPR(CLOBBER, "clobber", "e", RTX_EXTRA)
 
+/* Indicate that the upper parts of something are clobbered in a way that we
+   don't want to explain.  The MODE references the lower bits that will be
+   preserved.  Anything above that size will be clobbered.
+
+   CLOBBER_HIGH only occurs as the operand of a PARALLEL rtx.  It cannot appear
+   in other contexts, and unlike CLOBBER, it cannot appear on its own.
+   CLOBBER_HIGH can only be used with fixed register rtxes.  */
+
+DEF_RTL_EXPR(CLOBBER_HIGH, "clobber_high", "e", RTX_EXTRA)
+
 /* Call a subroutine.
Operand 1 is the address to call.
Operand 2 is the number of arguments.  */
diff --git a/gcc/rtl.h b/gcc/rtl.h
index 565ce3abbe4..5e07e9bee80 100644
--- a/gcc/rtl.h
+++ b/gcc/rtl.h
@@ -1100,6 +1100,14 @@ is_a_helper ::test (rtx_insn *insn)
   __FUNCTION__); \
  &_rtx->u.fld[_n]; }))
 
+#define RTL_CHECKC3(RTX, N, C1, C2, C3) __extension__  \
+(*({ __typeof (RTX) const _rtx = (RTX); const int _n = (N);\
+ const enum rtx_code _code = GET_CODE (_rtx);  \
+ if (_code != (C1) && _code != (C2) && _code != (C3))  \
+   rtl_check_failed_code3 (_rtx, (C1), (C2), (C3), __FILE__,   \
+  __LINE__, __FUNCTION__); \
+ &_rtx->u.fld[_n]; }))
+
 #define RTVEC_ELT(RTVEC, I) __extension__  \
 (*({ __typeof (RTVEC) const _rtvec = (RTVEC); const int _i = (I);  \
  if (_i < 0 || _i >= GET_NUM_ELEM (_rtvec))  \
@@ -1190,6 +1198,10 @@ extern void rtl_check_failed_code1 (const_rtx, enum rtx_co

[PATCH v2 5/7] cse support for clobber_high

2018-07-26 Thread Alan Hayward
The cse specific changes for clobber_high.

2018-07-25  Alan Hayward  

* cse.c (invalidate_reg): New function extracted from...
(invalidate): ...here.
(canonicalize_insn): Check for clobber high.
(invalidate_from_clobbers): Invalidate clobber highs.
(invalidate_from_sets_and_clobbers): Likewise.
(count_reg_usage): Check for clobber high.
(insn_live_p): Likewise.
* cselib.c (cselib_expand_value_rtx_1): Likewise.
(cselib_invalidate_regno): Check for clobber in setter.
(cselib_invalidate_rtx): Pass through setter.
(cselib_invalidate_rtx_note_stores):
(cselib_process_insn): Check for clobber high.
* cselib.h (cselib_invalidate_rtx): Add operand.
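For an ordinary store, invalidate_reg in the cse.c diff below removes a hard-register table entry whenever its register range overlaps the stored register's range (the `tendregno > regno && tregno < endregno` test). That test is a half-open interval overlap check; a minimal standalone version, with the hypothetical name `hard_reg_ranges_overlap`, is:

```c
#include <assert.h>
#include <stdbool.h>

/* Return true if hard-register ranges [TREGNO, TENDREGNO) and
   [REGNO, ENDREGNO) share at least one register.  */
bool
hard_reg_ranges_overlap (unsigned int tregno, unsigned int tendregno,
			 unsigned int regno, unsigned int endregno)
{
  assert (tregno < tendregno && regno < endregno);
  return tendregno > regno && tregno < endregno;
}
```

When clobber_high is set, invalidate_reg skips this test and asks reg_is_clobbered_by_clobber_high instead, so a TImode entry in a vector register is not removed by the table scan for a TImode clobber high of that register, even though the two ranges overlap.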
---
 gcc/cse.c| 187 +++
 gcc/cselib.c |  42 ++
 gcc/cselib.h |   2 +-
 3 files changed, 156 insertions(+), 75 deletions(-)

diff --git a/gcc/cse.c b/gcc/cse.c
index 4e94152b380..3d7888b7093 100644
--- a/gcc/cse.c
+++ b/gcc/cse.c
@@ -559,6 +559,7 @@ static struct table_elt *insert_with_costs (rtx, struct table_elt *, unsigned,
 static struct table_elt *insert (rtx, struct table_elt *, unsigned,
 machine_mode);
 static void merge_equiv_classes (struct table_elt *, struct table_elt *);
+static void invalidate_reg (rtx, bool);
 static void invalidate (rtx, machine_mode);
 static void remove_invalid_refs (unsigned int);
 static void remove_invalid_subreg_refs (unsigned int, poly_uint64,
@@ -1818,7 +1819,85 @@ check_dependence (const_rtx x, rtx exp, machine_mode mode, rtx addr)
 }
   return false;
 }
-
+
+/* Remove from the hash table, or mark as invalid, all expressions whose
+   values could be altered by storing in register X.
+
+   CLOBBER_HIGH is set if X was part of a CLOBBER_HIGH expression.  */
+
+static void
+invalidate_reg (rtx x, bool clobber_high)
+{
+  gcc_assert (GET_CODE (x) == REG);
+
+  /* If X is a register, dependencies on its contents are recorded
+ through the qty number mechanism.  Just change the qty number of
+ the register, mark it as invalid for expressions that refer to it,
+ and remove it itself.  */
+  unsigned int regno = REGNO (x);
+  unsigned int hash = HASH (x, GET_MODE (x));
+
+  /* Remove REGNO from any quantity list it might be on and indicate
+ that its value might have changed.  If it is a pseudo, remove its
+ entry from the hash table.
+
+ For a hard register, we do the first two actions above for any
+ additional hard registers corresponding to X.  Then, if any of these
+ registers are in the table, we must remove any REG entries that
+ overlap these registers.  */
+
+  delete_reg_equiv (regno);
+  REG_TICK (regno)++;
+  SUBREG_TICKED (regno) = -1;
+
+  if (regno >= FIRST_PSEUDO_REGISTER)
+{
+  gcc_assert (!clobber_high);
+  remove_pseudo_from_table (x, hash);
+}
+  else
+{
+  HOST_WIDE_INT in_table = TEST_HARD_REG_BIT (hard_regs_in_table, regno);
+  unsigned int endregno = END_REGNO (x);
+  unsigned int rn;
+  struct table_elt *p, *next;
+
+  CLEAR_HARD_REG_BIT (hard_regs_in_table, regno);
+
+  for (rn = regno + 1; rn < endregno; rn++)
+   {
+ in_table |= TEST_HARD_REG_BIT (hard_regs_in_table, rn);
+ CLEAR_HARD_REG_BIT (hard_regs_in_table, rn);
+ delete_reg_equiv (rn);
+ REG_TICK (rn)++;
+ SUBREG_TICKED (rn) = -1;
+   }
+
+  if (in_table)
+   for (hash = 0; hash < HASH_SIZE; hash++)
+ for (p = table[hash]; p; p = next)
+   {
+ next = p->next_same_hash;
+
+ if (!REG_P (p->exp) || REGNO (p->exp) >= FIRST_PSEUDO_REGISTER)
+   continue;
+
+ if (clobber_high)
+   {
+ if (reg_is_clobbered_by_clobber_high (p->exp, x))
+   remove_from_table (p, hash);
+   }
+ else
+   {
+ unsigned int tregno = REGNO (p->exp);
+ unsigned int tendregno = END_REGNO (p->exp);
+ if (tendregno > regno && tregno < endregno)
+   remove_from_table (p, hash);
+   }
+   }
+}
+}
+
 /* Remove from the hash table, or mark as invalid, all expressions whose
values could be altered by storing in X.  X is a register, a subreg, or
a memory reference with nonvarying address (because, when a memory
@@ -1841,65 +1920,7 @@ invalidate (rtx x, machine_mode full_mode)
   switch (GET_CODE (x))
 {
 case REG:
-  {
-   /* If X is a register, dependencies on its contents are recorded
-  through the qty number mechanism.  Just change the qty number of
-  the register, mark it as invalid for expressions that refer to it,
-  and remove it itself.  */
-   unsigned int regno = REGNO (x);
-   unsigned int hash = HASH (x, GET_MODE (x));
-
-   /* Remove REGNO from an

[PATCH v2 7/7] Enable clobber high for tls descs on Aarch64

2018-07-26 Thread Alan Hayward
Add the clobber high expressions to tls_desc for aarch64.
This patch also adds three tests.

In addition, I tested by taking the gcc torture test suite and making
all global variables __thread, then amending the suite to compile with -fpic,
save the .s file, and run for only one given -O level.
I ran this before and after the patch and compared the resulting .s files,
ensuring that there were no ASM changes.
I discarded the 10% of tests that failed to compile (because the code in
the test was no longer valid C).
I did this for -O0, -O2 and -O3 on both x86 and aarch64 and observed no
difference between ASM files before and after the patch.

Alan.

2018-07-25  Alan Hayward  

gcc/
* config/aarch64/aarch64.md: Add clobber highs to tls_desc.

gcc/testsuite/
* gcc.target/aarch64/sve_tls_preserve_1.c: New test.
* gcc.target/aarch64/sve_tls_preserve_2.c: New test.
* gcc.target/aarch64/sve_tls_preserve_3.c: New test.
---
 gcc/config/aarch64/aarch64.md  | 69 ++
 .../gcc.target/aarch64/sve_tls_preserve_1.c| 19 ++
 .../gcc.target/aarch64/sve_tls_preserve_2.c| 24 
 .../gcc.target/aarch64/sve_tls_preserve_3.c| 24 
 4 files changed, 124 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve_tls_preserve_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve_tls_preserve_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve_tls_preserve_3.c

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index e9c16f9697b..a41d6e15bc8 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -57,14 +57,36 @@
 (LR_REGNUM 30)
 (SP_REGNUM 31)
 (V0_REGNUM 32)
+(V1_REGNUM 33)
+(V2_REGNUM 34)
+(V3_REGNUM 35)
 (V4_REGNUM 36)
+(V5_REGNUM 37)
+(V6_REGNUM 38)
+(V7_REGNUM 39)
 (V8_REGNUM 40)
+(V9_REGNUM 41)
+(V10_REGNUM 42)
+(V11_REGNUM 43)
 (V12_REGNUM 44)
+(V13_REGNUM 45)
+(V14_REGNUM 46)
 (V15_REGNUM 47)
 (V16_REGNUM 48)
+(V17_REGNUM 49)
+(V18_REGNUM 50)
+(V19_REGNUM 51)
 (V20_REGNUM 52)
+(V21_REGNUM 53)
+(V22_REGNUM 54)
+(V23_REGNUM 55)
 (V24_REGNUM 56)
+(V25_REGNUM 57)
+(V26_REGNUM 58)
+(V27_REGNUM 59)
 (V28_REGNUM 60)
+(V29_REGNUM 61)
+(V30_REGNUM 62)
 (V31_REGNUM 63)
 (LAST_SAVED_REGNUM 63)
 (SFP_REGNUM 64)
@@ -6302,24 +6324,47 @@
   [(set_attr "type" "call")
(set_attr "length" "16")])
 
-;; For SVE, model tlsdesc calls as clobbering all vector and predicate
-;; registers, on top of the usual R0 and LR.  In reality the calls
-;; preserve the low 128 bits of the vector registers, but we don't
-;; yet have a way of representing that in the instruction pattern.
+;; For SVE, model tlsdesc calls as clobbering all but the lower 128 bits
+;; of each vector register, and as clobbering all predicate registers,
+;; on top of the usual R0 and LR.
 (define_insn "tlsdesc_small_sve_"
   [(set (reg:PTR R0_REGNUM)
 (unspec:PTR [(match_operand 0 "aarch64_valid_symref" "S")]
UNSPEC_TLSDESC))
(clobber (reg:DI LR_REGNUM))
(clobber (reg:CC CC_REGNUM))
-   (clobber (reg:XI V0_REGNUM))
-   (clobber (reg:XI V4_REGNUM))
-   (clobber (reg:XI V8_REGNUM))
-   (clobber (reg:XI V12_REGNUM))
-   (clobber (reg:XI V16_REGNUM))
-   (clobber (reg:XI V20_REGNUM))
-   (clobber (reg:XI V24_REGNUM))
-   (clobber (reg:XI V28_REGNUM))
+   (clobber_high (reg:TI V0_REGNUM))
+   (clobber_high (reg:TI V1_REGNUM))
+   (clobber_high (reg:TI V2_REGNUM))
+   (clobber_high (reg:TI V3_REGNUM))
+   (clobber_high (reg:TI V4_REGNUM))
+   (clobber_high (reg:TI V5_REGNUM))
+   (clobber_high (reg:TI V6_REGNUM))
+   (clobber_high (reg:TI V7_REGNUM))
+   (clobber_high (reg:TI V8_REGNUM))
+   (clobber_high (reg:TI V9_REGNUM))
+   (clobber_high (reg:TI V10_REGNUM))
+   (clobber_high (reg:TI V11_REGNUM))
+   (clobber_high (reg:TI V12_REGNUM))
+   (clobber_high (reg:TI V13_REGNUM))
+   (clobber_high (reg:TI V14_REGNUM))
+   (clobber_high (reg:TI V15_REGNUM))
+   (clobber_high (reg:TI V16_REGNUM))
+   (clobber_high (reg:TI V17_REGNUM))
+   (clobber_high (reg:TI V18_REGNUM))
+   (clobber_high (reg:TI V19_REGNUM))
+   (clobber_high (reg:TI V20_REGNUM))
+   (clobber_high (reg:TI V21_REGNUM))
+   (clobber_high (reg:TI V22_REGNUM))
+   (clobber_high (reg:TI V23_REGNUM))
+   (clobber_high (reg:TI V24_REGNUM))
+   (clobber_high (reg:TI V25_REGNUM))
+   (clobber_high (reg:TI V26_REGNUM))
+   (clobber_high (reg:TI V27_REGNUM)

[PATCH v2 6/7] Remaining support for clobber high

2018-07-26 Thread Alan Hayward
Add the remainder of the clobber high checks.
Happy to split this into smaller patches if required (there didn't
seem to be any obvious way to split it).

2018-07-25  Alan Hayward  

* alias.c (record_set): Check for clobber high.
* cfgexpand.c (expand_gimple_stmt): Likewise.
* combine-stack-adj.c (single_set_for_csa): Likewise.
* combine.c (find_single_use_1): Likewise.
(set_nonzero_bits_and_sign_copies): Likewise.
(get_combine_src_dest): Likewise.
(is_parallel_of_n_reg_sets): Likewise.
(try_combine): Likewise.
(record_dead_and_set_regs_1): Likewise.
(reg_dead_at_p_1): Likewise.
(reg_dead_at_p): Likewise.
* dce.c (deletable_insn_p): Likewise.
(mark_nonreg_stores_1): Likewise.
(mark_nonreg_stores_2): Likewise.
* df-scan.c (df_find_hard_reg_defs): Likewise.
(df_uses_record): Likewise.
(df_get_call_refs): Likewise.
* dwarf2out.c (mem_loc_descriptor): Likewise.
* haifa-sched.c (haifa_classify_rtx): Likewise.
* ira-build.c (create_insn_allocnos): Likewise.
* ira-costs.c (scan_one_insn): Likewise.
* ira.c (equiv_init_movable_p): Likewise.
(rtx_moveable_p): Likewise.
(interesting_dest_for_shprep): Likewise.
* jump.c (mark_jump_label_1): Likewise.
* postreload-gcse.c (record_opr_changes): Likewise.
* postreload.c (reload_cse_simplify): Likewise.
(struct reg_use): Add source expr.
(reload_combine): Check for clobber high.
(reload_combine_note_use): Likewise.
(reload_cse_move2add): Likewise.
(move2add_note_store): Likewise.
* print-rtl.c (print_pattern): Likewise.
* recog.c (decode_asm_operands): Likewise.
(store_data_bypass_p): Likewise.
(if_test_bypass_p): Likewise.
* regcprop.c (kill_clobbered_value): Likewise.
(kill_set_value): Likewise.
* reginfo.c (reg_scan_mark_refs): Likewise.
* reload1.c (maybe_fix_stack_asms): Likewise.
(eliminate_regs_1): Likewise.
(elimination_effects): Likewise.
(mark_not_eliminable): Likewise.
(scan_paradoxical_subregs): Likewise.
(forget_old_reloads_1): Likewise.
* reorg.c (find_end_label): Likewise.
(try_merge_delay_insns): Likewise.
(redundant_insn): Likewise.
(own_thread_p): Likewise.
(fill_simple_delay_slots): Likewise.
(fill_slots_from_thread): Likewise.
(dbr_schedule): Likewise.
* resource.c (update_live_status): Likewise.
(mark_referenced_resources): Likewise.
(mark_set_resources): Likewise.
* rtl.c (copy_rtx): Likewise.
* rtlanal.c (reg_referenced_p): Likewise.
(single_set_2): Likewise.
(noop_move_p): Likewise.
(note_stores): Likewise.
* sched-deps.c (sched_analyze_reg): Likewise.
(sched_analyze_insn): Likewise.
---
 gcc/alias.c | 11 +++
 gcc/cfgexpand.c |  1 +
 gcc/combine-stack-adj.c |  1 +
 gcc/combine.c   | 38 +-
 gcc/dce.c   | 11 +--
 gcc/df-scan.c   |  6 ++
 gcc/dwarf2out.c |  1 +
 gcc/haifa-sched.c   |  3 +++
 gcc/ira-build.c |  5 +
 gcc/ira-costs.c |  7 +++
 gcc/ira.c   |  6 +-
 gcc/jump.c  |  1 +
 gcc/postreload-gcse.c   | 21 -
 gcc/postreload.c| 25 -
 gcc/print-rtl.c |  1 +
 gcc/recog.c |  9 ++---
 gcc/regcprop.c  | 10 --
 gcc/reginfo.c   |  4 
 gcc/reload1.c   | 16 +++-
 gcc/reorg.c | 27 ++-
 gcc/resource.c  | 24 +---
 gcc/rtl.c   |  4 
 gcc/rtlanal.c   | 18 +++---
 gcc/sched-deps.c| 15 ++-
 24 files changed, 225 insertions(+), 40 deletions(-)

diff --git a/gcc/alias.c b/gcc/alias.c
index 2091dfbf3d7..748da2b6951 100644
--- a/gcc/alias.c
+++ b/gcc/alias.c
@@ -1554,6 +1554,17 @@ record_set (rtx dest, const_rtx set, void *data ATTRIBUTE_UNUSED)
  new_reg_base_value[regno] = 0;
  return;
}
+  /* A CLOBBER_HIGH only wipes out the old value if the mode of the old
+value is greater than that of the clobber.  */
+  else if (GET_CODE (set) == CLOBBER_HIGH)
+   {
+ if (new_reg_base_value[regno] != 0
+ && reg_is_clobbered_by_clobber_high (
+  regno, GET_MODE (new_reg_base_value[regno]), XEXP (set, 0)))
+   new_reg_base_value[regno] = 0;
+ return;
+   }
+
   src = SET_SRC (set);
 }
   else
diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index d6e3c382085..39db6ed435f 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -3750,6 +3750,7 @@ expand_gimple_stmt (gimple *stmt)
  

[wwwdocs] Repeat the 8.2 C++ ABI change also in the 8.2 changes.html section

2018-07-26 Thread Jakub Jelinek
Hi!

I've noticed that while Jason documented the -fabi-version=13/-Wabi=12
8.2 fix in the C++ section, there is no mention of it in the 8.2 section,
so people who just quickly look at what has changed significantly in 8.2
will not notice it.

This patch fixes the markup in Jason's changes  etc. and adds a short
note also to the 8.2 section, with a cross reference to the C++ section.

Ok for wwwdocs?

--- htdocs/gcc-8/changes.html   22 Jul 2018 08:26:21 -  1.91
+++ htdocs/gcc-8/changes.html   26 Jul 2018 09:18:19 -
@@ -536,7 +536,7 @@ $ gcc unclosed-2.c
 
 C++
 
-  GCC 8 (-fabi-version=12) has a couple of corrections to the calling
+  GCC 8 (-fabi-version=12) has a couple of corrections to the calling
 convention, which changes the ABI for some uncommon code:
   Passing an empty class as an argument now takes up no space on
x86_64, as required by the psABI.
@@ -547,12 +547,13 @@ $ gcc unclosed-2.c
impossible.
   WARNING: In GCC 8.1 the second change mistakenly also affects
classes with a deleted copy constructor and defaulted trivial move
-   constructor (bug c++/86094).  This issue is fixed in GCC 8.2
-   (-fabi-version=13).
+   constructor (bug <a href="https://gcc.gnu.org/PR86094">c++/86094</a>).
+   This issue is fixed in GCC 8.2 (-fabi-version=13).
 
-You can test whether these changes affect your code with -Wabi=11 (or
--Wabi=12 in GCC 8.2 for the third issue); if these changes are problematic
-for your project, the GCC 7 ABI can be selected with -fabi-version=11.
+You can test whether these changes affect your code with
+-Wabi=11 (or -Wabi=12 in GCC 8.2 for the third issue);
+if these changes are problematic for your project, the GCC 7 ABI can be selected
+with -fabi-version=11.
   
   The value of the C++11 alignof operator has been corrected
 to match C _Alignof (minimum alignment) rather than
@@ -1327,6 +1328,17 @@ are not listed here).
in the partitioning algorithm while building large binaries.
   
 
+Language Specific Changes
+
+C++
+  GCC 8.2 fixed a bug introduced in GCC 8.1 affecting passing or returning
+  of classes with a deleted copy constructor and defaulted trivial move
+  constructor (bug <a href="https://gcc.gnu.org/PR86094">c++/86094</a>).
+  GCC 8.2 introduces -fabi-version=13 and makes it the default,
+  ABI incompatibilities between GCC 8.1 and 8.2 can be reported with
+  -Wabi=12.  See C++ changes for more
+  details.
+
 Target Specific Changes
 
 IA-32/x86-64

Jakub


Re: [wwwdocs] Repeat the 8.2 C++ ABI change also in the 8.2 changes.html section

2018-07-26 Thread Richard Biener
On Thu, 26 Jul 2018, Jakub Jelinek wrote:

> Hi!
> 
> I've noticed that while Jason documented the -fabi-version=13/-Wabi=12
> 8.2 fix in the C++ section, there is no mention of it in the 8.2 section,
> so people who just quickly look at what has changed significantly in 8.2
> will not notice it.
> 
> This patch fixes the markup in Jason's changes  etc. and adds a short
> note also to the 8.2 section, with a cross reference to the C++ section.
> 
> Ok for wwwdocs?

LGTM.

Thanks,
Richard.

> --- htdocs/gcc-8/changes.html 22 Jul 2018 08:26:21 -  1.91
> +++ htdocs/gcc-8/changes.html 26 Jul 2018 09:18:19 -
> @@ -536,7 +536,7 @@ $ gcc unclosed-2.c
>  
>  C++
>  
> -  GCC 8 (-fabi-version=12) has a couple of corrections to the calling
> +  GCC 8 (-fabi-version=12) has a couple of corrections to the calling
>  convention, which changes the ABI for some uncommon code:
>Passing an empty class as an argument now takes up no space on
>   x86_64, as required by the psABI.
> @@ -547,12 +547,13 @@ $ gcc unclosed-2.c
>   impossible.
>WARNING: In GCC 8.1 the second change mistakenly also affects
>   classes with a deleted copy constructor and defaulted trivial move
> - constructor (bug c++/86094).  This issue is fixed in GCC 8.2
> - (-fabi-version=13).
> + constructor (bug <a href="https://gcc.gnu.org/PR86094">c++/86094</a>).
> + This issue is fixed in GCC 8.2 (-fabi-version=13).
>  
> -You can test whether these changes affect your code with -Wabi=11 (or
> --Wabi=12 in GCC 8.2 for the third issue); if these changes are problematic
> -for your project, the GCC 7 ABI can be selected with -fabi-version=11.
> +You can test whether these changes affect your code with
> +-Wabi=11 (or -Wabi=12 in GCC 8.2 for the third issue);
> +if these changes are problematic for your project, the GCC 7 ABI can be selected
> +with -fabi-version=11.
>
>The value of the C++11 alignof operator has been corrected
>  to match C _Alignof (minimum alignment) rather than
> @@ -1327,6 +1328,17 @@ are not listed here).
>   in the partitioning algorithm while building large binaries.
>
>  
> +Language Specific Changes
> +
> +C++
> +  GCC 8.2 fixed a bug introduced in GCC 8.1 affecting passing or returning
> +  of classes with a deleted copy constructor and defaulted trivial move
> +  constructor (bug <a href="https://gcc.gnu.org/PR86094">c++/86094</a>).
> +  GCC 8.2 introduces -fabi-version=13 and makes it the default,
> +  ABI incompatibilities between GCC 8.1 and 8.2 can be reported with
> +  -Wabi=12.  See C++ changes for more
> +  details.
> +
>  Target Specific Changes
>  
>  IA-32/x86-64
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)


Re: [PATCH 1/7] Add __builtin_speculation_safe_value

2018-07-26 Thread Richard Earnshaw (lists)
On 25/07/18 14:47, Richard Biener wrote:
> On Wed, Jul 25, 2018 at 2:41 PM Richard Earnshaw (lists)
>  wrote:
>>
>> On 25/07/18 11:36, Richard Biener wrote:
>>> On Wed, Jul 25, 2018 at 11:49 AM Richard Earnshaw (lists)
>>>  wrote:

 On 24/07/18 18:26, Richard Biener wrote:
> On Mon, Jul 9, 2018 at 6:40 PM Richard Earnshaw
>  wrote:
>>
>>
>> This patch defines a new intrinsic function
>> __builtin_speculation_safe_value.  A generic default implementation is
>> defined which will attempt to use the backend pattern
>> "speculation_safe_barrier".  If this pattern is not defined, or if it
>> is not available, then the compiler will emit a warning, but
>> compilation will continue.
>>
>> Note that the test spec-barrier-1.c will currently fail on all
>> targets.  This is deliberate; the failure will go away when
>> appropriate action is taken for each target backend.
>
> So given this series is supposed to be backported I question
>
> +rtx
> +default_speculation_safe_value (machine_mode mode ATTRIBUTE_UNUSED,
> +   rtx result, rtx val,
> +   rtx failval ATTRIBUTE_UNUSED)
> +{
> +  emit_move_insn (result, val);
> +#ifdef HAVE_speculation_barrier
> +  /* Assume the target knows what it is doing: if it defines a
> + speculation barrier, but it is not enabled, then assume that one
> + isn't needed.  */
> +  if (HAVE_speculation_barrier)
> +emit_insn (gen_speculation_barrier ());
> +
> +#else
> +  warning_at (input_location, 0,
> + "this target does not define a speculation barrier; "
> + "your program will still execute correctly, but speculation "
> + "will not be inhibited");
> +#endif
> +  return result;
>
> which makes all but aarch64 archs warn on __builtin_speculation_safe_value
> uses, even those that do not suffer from Spectre, like all those embedded
> targets where implementations usually do not speculate at all.
>
> In fact, for those targets the builtin gets in the way of optimization on
> GIMPLE as well, so we should fold it away early if neither the target hook
> is implemented nor there is a speculation_barrier insn.
>
> So, please make resolve_overloaded_builtin return a no-op on such targets
> which means you can remove the above warning.  Maybe such targets
> shouldn't advertise / initialize the builtins at all?

 I disagree with your approach here.  Why would users not want to know
 when the compiler is failing to implement a security feature when it
 should?  As for targets that don't need something, they can easily
 define the hook as described to suppress the warning.

 Or are you just suggesting moving the warning to
 resolve_overloaded_builtin?
>>>
>>> Well.  You could argue, as I do, that we shouldn't even support
>>> __builtin_speculation_safe_value for archs that do not need it or have
>>> it not implemented.  That way users can decide:
>>>
>>> #if __HAVE_SPECULATION_SAFE_VALUE
>>>  
>>> #else
>>> #warning oops // or nothing
>>> #endif
>>>
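The feature-test pattern suggested above could look like this in user code. This is a sketch under assumptions: it relies on the __HAVE_SPECULATION_SAFE_VALUE predefine and the builtin's default failure value of zero, as discussed in this thread; check the documentation of the GCC release you actually use. load_checked is a hypothetical example function, and compilers without the macro simply fall back to a plain bounds-checked load.

```c
#include <assert.h>
#include <stddef.h>

/* Bounds-checked load.  Where the compiler advertises the builtin, the
   index is passed through it so that, if the CPU speculatively executes
   this path with an out-of-bounds index, a safe value is used instead.  */
static int
load_checked (const int *array, size_t size, size_t untrusted_index)
{
  if (untrusted_index < size)
    {
#ifdef __HAVE_SPECULATION_SAFE_VALUE
      untrusted_index = __builtin_speculation_safe_value (untrusted_index);
#endif
      return array[untrusted_index];
    }
  return 0;
}
```

Architecturally the result is the same with or without the builtin; only speculative behaviour differs.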
>>
>> So how about removing the predefine of __HAVE_S_S_V when the builtin is
>> a nop, but then leaving the warning in if people try to use it anyway?
> 
> Little bit inconsistent but I guess I could live with that.  It still leaves
> the question open for how to declare you do not need speculation
> barriers at all then.
> 
 Other ports will need to take action, but in general, it can be as
 simple as, eg patch 2 or 3 do for the Arm and AArch64 backends - or
 simpler still if nothing is needed for that architecture.
>>>
>>> Then that should be the default.  You might argue we'll only see
>>> __builtin_speculation_safe_value uses for things like Firefox which
>>> is unlikely to be built for AVR (just to give an example).  But people
>>> are going to test build just on x86 and if they build with -Werror
>>> this will break builds on all targets that didn't even get the chance
>>> to implement this feature.
>>>
 There is a test which is intended to fail on targets that have not yet
 been patched - I thought that was better than hard-failing the build,
 especially given that we want to back-port.

 Port maintainers DO need to decide what to do about speculation, even if
 it is explicitly that no mitigation is needed.
>>>
>>> Agreed.  But I didn't yet see a request for maintainers to decide that?
>>>
>>
>> consider it made, then :-)
> 
> I suspect that drew their attention ;)
> 
> So a different idea would be to produce patches implementing the hook for
> each target "empty", CC the target maintainers and hope they quickly
> ack if the target doesn't have a speculation problem.  Others then would
> get no patch (from you) and thus raise a warning?
> 
> Maybe at leas

[PATCH] combine: Another hard register problem (PR85805)

2018-07-26 Thread Segher Boessenkool
The current code in reg_nonzero_bits_for_combine allows using the
reg_stat info when last_set_mode is a different integer mode.  This is
completely wrong for non-pseudos.  For example, as in the PR, a value
in a DImode hard register is set by eight writes to its constituent
QImode parts.  The value written to the DImode is not the same as that
written to the lowest-numbered QImode!

This patch fixes it.  Committing.  Will backport later, too.
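A toy model (not GCC code) of why the QImode record cannot describe the DImode value: the eight bytes of a 64-bit "hard register" are written as separate parts, and the entry for the lowest-numbered part only knows about the byte written to it. A little-endian layout is assumed and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Write BYTE into QImode part PART (0 = lowest-numbered) of REG.  */
static uint64_t
set_qimode_part (uint64_t reg, unsigned part, uint8_t byte)
{
  uint64_t mask = (uint64_t) 0xff << (8 * part);
  return (reg & ~mask) | ((uint64_t) byte << (8 * part));
}

/* Write all eight parts, as in the PR, and return the DImode value.  */
static uint64_t
write_all_parts (const uint8_t bytes[8])
{
  uint64_t reg = 0;
  for (unsigned part = 0; part < 8; part++)
    reg = set_qimode_part (reg, part, bytes[part]);
  return reg;
}
```

If bytes[0] is 0x01, the last-set information for the lowest-numbered register says "only bit 0 can be nonzero", yet the DImode value can have bits set in its upper 56 bits, so reusing that information across modes for hard registers gives wrong nonzero bits.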


Segher


2018-07-26  Segher Boessenkool  

PR rtl-optimization/85805
* combine.c (reg_nonzero_bits_for_combine): Only use the last set
value for hard registers if that was written in the same mode.

---
 gcc/combine.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/combine.c b/gcc/combine.c
index 09cbad4..fe71f3a 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -10186,7 +10186,8 @@ reg_nonzero_bits_for_combine (const_rtx x, scalar_int_mode xmode,
   rsp = &reg_stat[REGNO (x)];
   if (rsp->last_set_value != 0
   && (rsp->last_set_mode == mode
- || (GET_MODE_CLASS (rsp->last_set_mode) == MODE_INT
+ || (REGNO (x) >= FIRST_PSEUDO_REGISTER
+ && GET_MODE_CLASS (rsp->last_set_mode) == MODE_INT
  && GET_MODE_CLASS (mode) == MODE_INT))
   && ((rsp->last_set_label >= label_tick_ebb_start
   && rsp->last_set_label < label_tick)
-- 
1.8.3.1



Re: [Patch, avr, PR86635] Fix miscompilation with __memx and libgcc float function __gtsf2

2018-07-26 Thread Senthil Kumar Selvaraj


Senthil Kumar Selvaraj writes:

> The patch below fixes a miscompilation of function calls with __memx address
> space arguments.
>
> For code like
>
> extern const  __memx float a;
> extern const float b;
>
> int diff () { return a > b; }
>
> when compiled with -Os, a is never loaded and passed in as an argument
> to the __gtsf2 libgcc function.
>
> Turns out that mov for the variable a expands into
>
> (insn 8 7 9 2 (parallel [
> (set (reg:SF 22 r22)
> (mem/u/c:SF (reg/f:PSI 47) [1 a+0 S4 A8 AS7]))
> (clobber (reg:SF 22 r22))
> (clobber (reg:QI 21 r21))
> (clobber (reg:HI 30 r30))
> ]) "test.c":4 36 {xloadsf_A}
>  (expr_list:REG_DEAD (reg/f:PSI 47)
> (expr_list:REG_UNUSED (reg:HI 30 r30)
> (expr_list:REG_EQUAL (mem/u/c:SF (symbol_ref:PSI ("a") [flags 0xe40]  ) [1 a+0 S4 A8 AS7])
> (nil)
>
> The ud_dce pass sees this insn and deletes it as reg:SF r22 is both set
> and clobbered.
>
> Georg-Johann pointed out a similar issue (PR63633), and that was fixed
> by introducing a pseudo as the target of set. This patch does the same -
> adds an avr_emit2_fix_outputs for gen functions with 2 operands, that
> detects hard reg conflicts with clobbered regs and substitutes pseudos
> in their place.
>
> The patch also adds a testcase to verify a is actually read. Reg testing
> passed. Ok to commit to trunk?

Sent an out-of-date patch. Here's the right one.
>
> Regards
> Senthil
>
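The conflict being detected is an overlap between the byte registers an output occupies and the registers the insn clobbers. The sketch below reuses the mask formula of avr's regmask helper (one bit per byte register, starting at REGNO); output_needs_pseudo is a hypothetical name, and this only models the overlap test, not what avr_fix_operands does internally.

```c
#include <assert.h>
#include <stdbool.h>

/* One bit per byte register occupied by a SIZE-byte value at REGNO,
   the same formula as avr's regmask (mode size in bytes).  */
static unsigned
regmask_bytes (unsigned size, unsigned regno)
{
  return ((1u << size) - 1) << regno;
}

/* True if an output at REGNO of SIZE bytes overlaps CLOBBER_MASK, the
   case where a pseudo must be substituted so that DCE does not see a
   set of a register that the same insn also clobbers.  */
static bool
output_needs_pseudo (unsigned size, unsigned regno, unsigned clobber_mask)
{
  return (regmask_bytes (size, regno) & clobber_mask) != 0;
}
```

For xload_A with an SFmode value, the clobbers are r22..r25, r21 and Z (r30:r31); an output in r22 (as in the PR) or r18 overlaps them, while one in r14 does not.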
gcc/ChangeLog:

2018-07-25  Senthil Kumar Selvaraj  

* config/avr/avr-protos.h (avr_emit2_fix_outputs): New prototype.
  * config/avr/avr.c (avr_emit2_fix_outputs): New function.
  * config/avr/avr.md (mov): Wrap gen_xload_A call
with avr_emit2_fix_outputs.

gcc/testsuite/ChangeLog:

2018-07-25  Senthil Kumar Selvaraj  

* gcc.target/avr/torture/pr86635.c: New test.



diff --git gcc/config/avr/avr-protos.h gcc/config/avr/avr-protos.h
index 5622e9035a0..f8db418582e 100644
--- gcc/config/avr/avr-protos.h
+++ gcc/config/avr/avr-protos.h
@@ -135,6 +135,7 @@ regmask (machine_mode mode, unsigned regno)
 }
 
 extern void avr_fix_inputs (rtx*, unsigned, unsigned);
+extern bool avr_emit2_fix_outputs (rtx (*)(rtx,rtx), rtx*, unsigned, unsigned);
 extern bool avr_emit3_fix_outputs (rtx (*)(rtx,rtx,rtx), rtx*, unsigned, unsigned);
 
 extern rtx lpm_reg_rtx;
diff --git gcc/config/avr/avr.c gcc/config/avr/avr.c
index 81c35e7fbc2..996d5187c52 100644
--- gcc/config/avr/avr.c
+++ gcc/config/avr/avr.c
@@ -13335,6 +13335,34 @@ avr_emit3_fix_outputs (rtx (*gen)(rtx,rtx,rtx), rtx *op,
   return avr_move_fixed_operands (op, hreg, opmask);
 }
 
+/* Same as avr_emit3_fix_outputs, but for 2 operands */
+bool
+avr_emit2_fix_outputs (rtx (*gen)(rtx,rtx), rtx *op,
+   unsigned opmask, unsigned rmask)
+{
+  const int n = 2;
+  rtx hreg[n];
+
+  /* It is letigimate for GEN to call this function, and in order not to
+ get self-recursive we use the following static kludge.  This is the
+ only way not to duplicate all expanders and to avoid ugly and
+ hard-to-maintain C-code instead of the much more appreciated RTL
+ representation as supplied by define_expand.  */
+  static bool lock = false;
+
+  gcc_assert (opmask < (1u << n));
+
+  if (lock)
+return false;
+
+  avr_fix_operands (op, hreg, opmask, rmask);
+
+  lock = true;
+  emit_insn (gen (op[0], op[1]));
+  lock = false;
+
+  return avr_move_fixed_operands (op, hreg, opmask);
+}
 
 /* Worker function for movmemhi expander.
XOP[0]  Destination as MEM:BLK
diff --git gcc/config/avr/avr.md gcc/config/avr/avr.md
index e619e695418..033a428e9f3 100644
--- gcc/config/avr/avr.md
+++ gcc/config/avr/avr.md
@@ -672,7 +672,14 @@
  ; insn-emit does not depend on the mode, it's all about operands. */
   emit_insn (gen_xload8qi_A (dest, src));
 else
-  emit_insn (gen_xload_A (dest, src));
+  {
+operands[0] = dest; operands[1] = src;
+if (!avr_emit2_fix_outputs (gen_xload_A, operands, 1 << 0,
+  regmask (mode, 22)
+   | regmask (QImode, 21)
+   | regmask (HImode, REG_Z)))
+  FAIL;
+  }
 
 DONE;
   }
diff --git gcc/testsuite/gcc.target/avr/torture/pr86635.c gcc/testsuite/gcc.target/avr/torture/pr86635.c
new file mode 100644
index 000..f91367f7e7a
--- /dev/null
+++ gcc/testsuite/gcc.target/avr/torture/pr86635.c
@@ -0,0 +1,9 @@
+/* { dg-do compile { target { ! avr_tiny } } } */
+/* { dg-options "-std=gnu99" } */
+
+extern const __memx float a;
+extern const float b;
+
+unsigned long diff () { return a > b; }
+
+/* { dg-final { scan-assembler "call __xload_4" } } */


Re: [36/46] Add a pattern_stmt_p field to stmt_vec_info

2018-07-26 Thread Richard Sandiford
Richard Biener  writes:
> On Wed, Jul 25, 2018 at 1:09 PM Richard Sandiford
>  wrote:
>>
>> Richard Biener  writes:
>> > On Tue, Jul 24, 2018 at 12:07 PM Richard Sandiford
>> >  wrote:
>> >>
>> >> This patch adds a pattern_stmt_p field to stmt_vec_info, so that it's
>> >> possible to tell whether the statement is a pattern statement without
>> >> referring to other statements.  The new field goes in what was
>> >> previously a hole in the structure, so the size is the same as before.
>> >
>> > Not sure what the advantage is?  is_pattern_stmt_p () looks nicer
>> > than ->is_pattern_p
>>
>> I can keep the function wrapper if you prefer that.  But having a
>> statement "know" whether it's a pattern stmt makes things like
>> freeing stmt_vec_infos simpler (see later patches in the series).
>
> Ah, ok.
>
>> It should also be cheaper to test, but that's much more minor.
>
> So please keep the wrapper.

Like this?

> I guess at some point we should decide what to do with all
> the STMT_VINFO_ macros (and the others, {LOOP,BB}_ stuff
> is already used inconsistently).

Yeah...
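The difference between the old and new is_pattern_stmt_p can be shown with a cut-down stand-in for _stmt_vec_info. The struct and function names are hypothetical; the linking mirrors what pattern recognition does in simplified form (the original statement gets in_pattern_p and points at the pattern statement, which points back).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_stmt_info
{
  struct toy_stmt_info *related_stmt;
  bool in_pattern_p;    /* Set on the original stmt that a pattern replaces.  */
  bool pattern_stmt_p;  /* Set on the replacement pattern stmt itself.  */
};

/* Old test: needs to look at the related statement.  */
static bool
is_pattern_old (const struct toy_stmt_info *s)
{
  return s->related_stmt && s->related_stmt->in_pattern_p;
}

/* New test: the statement carries its own flag.  */
static bool
is_pattern_new (const struct toy_stmt_info *s)
{
  return s->pattern_stmt_p;
}

/* Link ORIG and its replacement PAT (simplified).  */
static void
link_pattern (struct toy_stmt_info *orig, struct toy_stmt_info *pat)
{
  pat->pattern_stmt_p = true;
  pat->related_stmt = orig;
  orig->related_stmt = pat;
  orig->in_pattern_p = true;
}
```

Both predicates agree, but the new one stays valid even if the related statement has already been freed.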


2018-07-26  Richard Sandiford  

gcc/
* tree-vectorizer.h (_stmt_vec_info::pattern_stmt_p): New field.
(is_pattern_stmt_p): Use it.
* tree-vect-patterns.c (vect_init_pattern_stmt): Set pattern_stmt_p
on pattern statements.

Index: gcc/tree-vectorizer.h
===
--- gcc/tree-vectorizer.h   2018-07-26 11:28:18.0 +0100
+++ gcc/tree-vectorizer.h   2018-07-26 11:28:19.072951054 +0100
@@ -791,6 +791,12 @@ struct _stmt_vec_info {
   /* Stmt is part of some pattern (computation idiom)  */
   bool in_pattern_p;
 
+  /* True if the statement was created during pattern recognition as
+ part of the replacement for RELATED_STMT.  This implies that the
+ statement isn't part of any basic block, although for convenience
+ its gimple_bb is the same as for RELATED_STMT.  */
+  bool pattern_stmt_p;
+
   /* Is this statement vectorizable or should it be skipped in (partial)
  vectorization.  */
   bool vectorizable;
@@ -1157,8 +1163,7 @@ get_later_stmt (stmt_vec_info stmt1_info
 static inline bool
 is_pattern_stmt_p (stmt_vec_info stmt_info)
 {
-  stmt_vec_info related_stmt_info = STMT_VINFO_RELATED_STMT (stmt_info);
-  return related_stmt_info && STMT_VINFO_IN_PATTERN_P (related_stmt_info);
+  return stmt_info->pattern_stmt_p;
 }
 
 /* Return true if BB is a loop header.  */
Index: gcc/tree-vect-patterns.c
===
--- gcc/tree-vect-patterns.c2018-07-26 11:28:18.0 +0100
+++ gcc/tree-vect-patterns.c2018-07-26 11:28:19.068951168 +0100
@@ -108,6 +108,7 @@ vect_init_pattern_stmt (gimple *pattern_
 pattern_stmt_info = orig_stmt_info->vinfo->add_stmt (pattern_stmt);
   gimple_set_bb (pattern_stmt, gimple_bb (orig_stmt_info->stmt));
 
+  pattern_stmt_info->pattern_stmt_p = true;
   STMT_VINFO_RELATED_STMT (pattern_stmt_info) = orig_stmt_info;
   STMT_VINFO_DEF_TYPE (pattern_stmt_info)
 = STMT_VINFO_DEF_TYPE (orig_stmt_info);


[PATCH] doc: discourage const/volatile on register variables

2018-07-26 Thread Alexander Monakov
Hi,

when using explicit register variables ('register int foo asm("%ebp");'),
using const/volatile type qualifiers on the declaration does not result in
the logically expected effect.

The main issue is that gcc-8 got "better" at propagating initializers of
'register const' variables to their uses in asm operands, losing the
association with the register and thus causing the operand to
unexpectedly appear in some other register. This caused build issues for
the Linux kernel and was reported a couple of times in the GCC Bugzilla.

This patch adds a few lines to the documentation to say that qualifiers
won't work as expected. OK for trunk?

Thanks.
Alexander

PR target/86673
	* doc/extend.texi (Global Register Variables): Discourage use of type
qualifiers.
(Local Register Variables): Likewise.

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 7b471ec40f7..9a41f2753e9 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -9591,6 +9591,11 @@ a global variable the declaration appears outside a function. The
 @code{static}. The register name must be a valid register name for the
 target platform.
 
+Do not use type qualifiers such as @code{const} and @code{volatile}, as
+the result may be contrary to your expectations. In  particular, using
+the @code{volatile} qualifier does not fully prevent the compiler from
+optimizing accesses to the register.
+
 Registers are a scarce resource on most systems and allowing the 
 compiler to manage their usage usually results in the best code. However, 
 under special circumstances it can make sense to reserve some globally.
@@ -9698,6 +9703,12 @@ but for a local variable the declaration appears within a function.  The
 @code{static}.  The register name must be a valid register name for the
 target platform.
 
+Do not use type qualifiers such as @code{const} and @code{volatile}, as
+the result may be contrary to your expectations. In particular, when
+the @code{const} qualifier is used, the compiler may substitute the
+variable with its initializer in @code{asm} statements, which may cause
+the corresponding operand to appear in a different register.
+
 As with global register variables, it is recommended that you choose 
 a register that is normally saved and restored by function calls on your 
 machine, so that calls to library routines will not clobber it.


Re: [37/46] Associate alignment information with stmt_vec_infos

2018-07-26 Thread Richard Sandiford
Richard Biener  writes:
> On Tue, Jul 24, 2018 at 12:08 PM Richard Sandiford
>  wrote:
>>
>> Alignment information is really a property of a stmt_vec_info
>> (and the way we want to vectorise it) rather than the original scalar dr.
>> I think that was true even before the recent dr sharing.
>
> But that is only so as long as we handle only stmts with a single DR.
> In reality alignment info _is_ a property of the DR and not of the stmt.
>
> So you're doing a shortcut here, shouldn't we rename
> dr_misalignment to stmt_dr_misalignment then?
>
> Otherwise I don't see how this makes sense semantically.

OK, the patch below takes a different approach, suggested in the
38/46 thread.  The idea is to make dr_aux link back to both the scalar
data_reference and the containing stmt_vec_info, so that it becomes a
lookup-free key for a vectorisable reference.

The data_reference link is just STMT_VINFO_DATA_REF, moved from
_stmt_vec_info.  The stmt pointer is a new field and always tracks
the current stmt_vec_info for the reference (which might be a pattern
stmt or the original stmt).

Then 38/46 can use dr_aux instead of data_reference (compared to current
sources) and instead of stmt_vec_info (compared to the original series).
This still avoids the repeated lookups that the series is trying to avoid.

The patch also makes the dr_aux in the current (possibly pattern) stmt
be the one that counts, rather than have the information stay with the
original DR_STMT.  A new macro (STMT_VINFO_DR_INFO) gives this
information for a given stmt_vec_info.

The changes together should make it easier to have multiple dr_auxs
in a single statement.
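A cut-down model of the new layout: the per-reference info embedded in each statement links back to both the scalar data reference and the statement that currently owns it, so either can be reached without a lookup. All names here are hypothetical, and updating the old copy's back-link in toy_move_dr is an assumption of this sketch rather than a description of vec_info::move_dr.

```c
#include <assert.h>
#include <stddef.h>

struct toy_dr { int id; };            /* stands in for data_reference */
struct toy_stmt;

struct toy_dr_info
{
  struct toy_dr *dr;          /* the scalar data reference */
  struct toy_stmt *stmt;      /* the statement that currently owns it */
  int misalignment;           /* -1 if unknown */
};

struct toy_stmt
{
  struct toy_dr_info dr_aux;  /* embedded, as in _stmt_vec_info */
};

/* Hand the reference from OLD_STMT to NEW_STMT (e.g. a pattern stmt);
   afterwards both copies name NEW_STMT as the current owner.  */
static void
toy_move_dr (struct toy_stmt *new_stmt, struct toy_stmt *old_stmt)
{
  new_stmt->dr_aux = old_stmt->dr_aux;
  new_stmt->dr_aux.stmt = new_stmt;
  old_stmt->dr_aux.stmt = new_stmt;
}
```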

Thanks,
Richard


2018-07-26  Richard Sandiford  

gcc/
* tree-vectorizer.h (vec_info::move_dr): New member function.
(dataref_aux): Rename to...
(dr_vec_info): ...this and add "dr" and "stmt" fields.
(_stmt_vec_info::dr_aux): Update accordingly.
(_stmt_vec_info::data_ref_info): Delete.
(STMT_VINFO_GROUPED_ACCESS, DR_GROUP_FIRST_ELEMENT)
(DR_GROUP_NEXT_ELEMENT, DR_GROUP_SIZE, DR_GROUP_STORE_COUNT)
	(DR_GROUP_GAP, DR_GROUP_SAME_DR_STMT, REDUC_GROUP_FIRST_ELEMENT)
(REDUC_GROUP_NEXT_ELEMENT, REDUC_GROUP_SIZE): Use dr_aux.dr instead
of data_ref.
(STMT_VINFO_DATA_REF): Likewise.  Turn into an lvalue.
(STMT_VINFO_DR_INFO): New macro.
	(DR_VECT_AUX): Use STMT_VINFO_DR_INFO and vect_dr_stmt.
(set_dr_misalignment): Update after rename of dataref_aux.
(vect_dr_stmt): Move earlier in file.  Return dr_aux.stmt.
* tree-vect-stmts.c (new_stmt_vec_info): Remove redundant
initialization of STMT_VINFO_DATA_REF.
* tree-vectorizer.c (vec_info::move_dr): New function.
* tree-vect-patterns.c (vect_recog_bool_pattern)
(vect_recog_mask_conversion_pattern)
(vect_recog_gather_scatter_pattern): Use it.
* tree-vect-data-refs.c (vect_analyze_data_refs): Initialize
the "dr" and "stmt" fields of dr_vec_info instead of
STMT_VINFO_DATA_REF.

Index: gcc/tree-vectorizer.h
===
--- gcc/tree-vectorizer.h   2018-07-26 11:30:55.0 +0100
+++ gcc/tree-vectorizer.h   2018-07-26 11:30:56.197256524 +0100
@@ -240,6 +240,7 @@ struct vec_info {
   stmt_vec_info lookup_stmt (gimple *);
   stmt_vec_info lookup_def (tree);
   stmt_vec_info lookup_single_use (tree);
+  void move_dr (stmt_vec_info, stmt_vec_info);
 
   /* The type of vectorization.  */
   vec_kind kind;
@@ -767,7 +768,11 @@ enum vect_memory_access_type {
   VMAT_GATHER_SCATTER
 };
 
-struct dataref_aux {
+struct dr_vec_info {
+  /* The data reference itself.  */
+  data_reference *dr;
+  /* The statement that contains the data reference.  */
+  stmt_vec_info stmt;
   /* The misalignment in bytes of the reference, or -1 if not known.  */
   int misalignment;
   /* The byte alignment that we'd ideally like the reference to have,
@@ -818,11 +823,7 @@ struct _stmt_vec_info {
  data-ref (array/pointer/struct access). A GIMPLE stmt is expected to have
  at most one such data-ref.  */
 
-  /* Information about the data-ref (access function, etc),
- relative to the inner-most containing loop.  */
-  struct data_reference *data_ref_info;
-
-  dataref_aux dr_aux;
+  dr_vec_info dr_aux;
 
   /* Information about the data-ref relative to this loop
  nest (the loop that is being considered for vectorization).  */
@@ -996,7 +997,7 @@ #define STMT_VINFO_LIVE_P(S)
 #define STMT_VINFO_VECTYPE(S)  (S)->vectype
 #define STMT_VINFO_VEC_STMT(S) (S)->vectorized_stmt
 #define STMT_VINFO_VECTORIZABLE(S) (S)->vectorizable
-#define STMT_VINFO_DATA_REF(S) (S)->data_ref_info
+#define STMT_VINFO_DATA_REF(S) ((S)->dr_aux.dr + 0)
 #define STMT_VINFO_GATHER_SCATTER_P(S)(S)->gather_scatter_p
 #define STMT_VINFO_STRIDED_P(S)   (S)->strided_p
 #define

[PATCH] Add linker_output as prefix for LTO temps (PR lto/86548).

2018-07-26 Thread Martin Liška
Hi.

As requested in the PR, we now use the linker output name as a prefix for
LTO temporary files:

Example:
$ gcc -flto main.o a.o --save-temps -o mybinary

generates:
$ ls /tmp/mybinary*
/tmp/mybinary  /tmp/mybinary.ltrans0.o  /tmp/mybinary.ltrans0.s  /tmp/mybinary.ltrans.out
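The template that the patched make_temp_file_with_prefix hands to mkstemps is BASE + PREFIX + "XXXXXX" + SUFFIX, with lto-wrapper deriving the prefix as "<linker_output>.". The sketch below rebuilds just that string arithmetic with hypothetical helper names; it does not create any file, and it uses memcpy instead of the patch's strcpy chain.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* BASE + PREFIX + "XXXXXX" + SUFFIX; NULL PREFIX means "cc" and NULL
   SUFFIX means "", as in the patch.  Returns NULL if malloc fails.  */
static char *
build_temp_template (const char *base, const char *prefix, const char *suffix)
{
  static const char xs[] = "XXXXXX";   /* mkstemps needs 6 X's */
  if (prefix == NULL)
    prefix = "cc";
  if (suffix == NULL)
    suffix = "";
  size_t base_len = strlen (base), prefix_len = strlen (prefix);
  size_t xs_len = sizeof xs - 1, suffix_len = strlen (suffix);
  char *t = malloc (base_len + prefix_len + xs_len + suffix_len + 1);
  if (t == NULL)
    return NULL;
  memcpy (t, base, base_len);
  memcpy (t + base_len, prefix, prefix_len);
  memcpy (t + base_len + prefix_len, xs, xs_len);
  /* Copy the suffix including its terminating NUL.  */
  memcpy (t + base_len + prefix_len + xs_len, suffix, suffix_len + 1);
  return t;
}

/* "<linker_output>." as built in lto-wrapper, or NULL.  */
static char *
prefix_from_linker_output (const char *linker_output)
{
  if (linker_output == NULL)
    return NULL;
  char *p = malloc (strlen (linker_output) + 2);
  if (p == NULL)
    return NULL;
  strcpy (p, linker_output);
  strcat (p, ".");
  return p;
}
```

For example, a linker output of "mybinary" with base "/tmp/" yields the template "/tmp/mybinary.XXXXXX.ltrans.out", and mkstemps then replaces the X's.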

The patch bootstraps on ppc64le-redhat-linux and survives regression tests.

Ready to be installed?
Martin

gcc/ChangeLog:

2018-07-26  Martin Liska  

PR lto/86548
* lto-wrapper.c: Add linker_output as prefix
for ltrans_output_file.

include/ChangeLog:

2018-07-26  Martin Liska  

PR lto/86548
* libiberty.h (make_temp_file_with_prefix): New function.

libiberty/ChangeLog:

2018-07-26  Martin Liska  

PR lto/86548
* make-temp-file.c (TEMP_FILE): Remove leading 'cc'.
(make_temp_file): Call make_temp_file_with_prefix with
first argument set to NULL.
(make_temp_file_with_prefix): Support also prefix.
---
 gcc/lto-wrapper.c  | 14 +-
 include/libiberty.h|  5 +
 libiberty/make-temp-file.c | 24 ++--
 3 files changed, 36 insertions(+), 7 deletions(-)


diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c
index c3eb00dc0c2..cf4a8c659e0 100644
--- a/gcc/lto-wrapper.c
+++ b/gcc/lto-wrapper.c
@@ -1373,7 +1373,19 @@ cont1:
 	  strcat (ltrans_output_file, ".ltrans.out");
 	}
   else
-	ltrans_output_file = make_temp_file (".ltrans.out");
+	{
+	  char *prefix = NULL;
+	  if (linker_output)
+	{
+	  prefix = (char *) xmalloc (strlen (linker_output) + 2);
+	  strcpy (prefix, linker_output);
+	  strcat (prefix, ".");
+	}
+
+	  ltrans_output_file = make_temp_file_with_prefix (prefix,
+			   ".ltrans.out");
+	  free (prefix);
+	}
   list_option_full = (char *) xmalloc (sizeof (char) *
 		 (strlen (ltrans_output_file) + list_option_len + 1));
   tmp = list_option_full;
diff --git a/include/libiberty.h b/include/libiberty.h
index dc09e791e41..0823614c00e 100644
--- a/include/libiberty.h
+++ b/include/libiberty.h
@@ -239,6 +239,11 @@ extern char *choose_temp_base (void) ATTRIBUTE_MALLOC ATTRIBUTE_RETURNS_NONNULL;
 
 extern char *make_temp_file (const char *) ATTRIBUTE_MALLOC;
 
+/* Return a temporary file name with given PREFIX and SUFFIX
+   or NULL if unable to create one.  */
+
+extern char *make_temp_file_with_prefix (const char *, const char *) ATTRIBUTE_MALLOC;
+
 /* Remove a link to a file unless it is special. */
 
 extern int unlink_if_ordinary (const char *);
diff --git a/libiberty/make-temp-file.c b/libiberty/make-temp-file.c
index 89faed7f09e..21b05457542 100644
--- a/libiberty/make-temp-file.c
+++ b/libiberty/make-temp-file.c
@@ -56,7 +56,7 @@ extern int mkstemps (char *, int);
 
 /* Name of temporary file.
mktemp requires 6 trailing X's.  */
-#define TEMP_FILE "ccXXXXXX"
+#define TEMP_FILE "XXXXXX"
 #define TEMP_FILE_LEN (sizeof(TEMP_FILE) - 1)
 
 #if !defined(_WIN32) || defined(__CYGWIN__)
@@ -181,25 +181,31 @@ string is @code{malloc}ed, and the temporary file has been created.
 */
 
 char *
-make_temp_file (const char *suffix)
+make_temp_file_with_prefix (const char *prefix, const char *suffix)
 {
   const char *base = choose_tmpdir ();
   char *temp_filename;
-  int base_len, suffix_len;
+  int base_len, suffix_len, prefix_len;
   int fd;
 
+  if (prefix == 0)
+prefix = "cc";
+
   if (suffix == 0)
 suffix = "";
 
   base_len = strlen (base);
+  prefix_len = strlen (prefix);
   suffix_len = strlen (suffix);
 
   temp_filename = XNEWVEC (char, base_len
 			   + TEMP_FILE_LEN
-			   + suffix_len + 1);
+			   + suffix_len
+			   + prefix_len + 1);
   strcpy (temp_filename, base);
-  strcpy (temp_filename + base_len, TEMP_FILE);
-  strcpy (temp_filename + base_len + TEMP_FILE_LEN, suffix);
+  strcpy (temp_filename + base_len, prefix);
+  strcpy (temp_filename + base_len + prefix_len, TEMP_FILE);
+  strcpy (temp_filename + base_len + prefix_len + TEMP_FILE_LEN, suffix);
 
   fd = mkstemps (temp_filename, suffix_len);
   /* Mkstemps failed.  It may be EPERM, ENOSPC etc.  */
@@ -214,3 +220,9 @@ make_temp_file (const char *suffix)
 abort ();
   return temp_filename;
 }
+
+char *
+make_temp_file (const char *suffix)
+{
+  return make_temp_file_with_prefix (NULL, suffix);
+}
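The patch lays out the template as the temporary directory, then the prefix (defaulting to "cc"), then the six X's that mkstemps replaces, then the suffix, and passes the suffix length so that mkstemps leaves the suffix untouched. The sketch below reproduces only that string layout in plain C. `build_temp_template` is a hypothetical helper, not a libiberty function, and it deliberately does not create any file.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Six trailing X's, which mkstemps-style functions replace in place.  */
#define TEMPLATE_XS "XXXXXX"
#define TEMPLATE_XS_LEN (sizeof (TEMPLATE_XS) - 1)

/* Hypothetical helper: return a malloc'd template laid out as
   BASE PREFIX XXXXXX SUFFIX, using the same defaults as the patch
   (PREFIX "cc", SUFFIX "").  Returns NULL if malloc fails.  */
static char *
build_temp_template (const char *base, const char *prefix,
                     const char *suffix)
{
  size_t base_len, prefix_len, suffix_len;
  char *t;

  assert (base != NULL);
  if (prefix == NULL)
    prefix = "cc";
  if (suffix == NULL)
    suffix = "";

  base_len = strlen (base);
  prefix_len = strlen (prefix);
  suffix_len = strlen (suffix);

  /* One extra byte for the terminating NUL.  */
  t = malloc (base_len + prefix_len + TEMPLATE_XS_LEN + suffix_len + 1);
  if (t == NULL)
    return NULL;

  strcpy (t, base);
  strcpy (t + base_len, prefix);
  strcpy (t + base_len + prefix_len, TEMPLATE_XS);
  strcpy (t + base_len + prefix_len + TEMPLATE_XS_LEN, suffix);
  return t;
}
```

For example, with a base of "/tmp/", a prefix of "a.out." (the linker_output name plus "." as built in the hunk above) and a suffix of ".ltrans.out", the template is "/tmp/a.out.XXXXXX.ltrans.out"; with a NULL prefix it falls back to "/tmp/ccXXXXXX.ltrans.out", matching the old make_temp_file behaviour.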



RE: [PATCH][GCC][Arm] Fix subreg crash in different way by enabling the FP16 pattern unconditionally.

2018-07-26 Thread Tamar Christina
Hi Thomas,

> -Original Message-
> From: Thomas Preudhomme 
> Sent: Thursday, July 26, 2018 09:29
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; Ramana Radhakrishnan
> ; Richard Earnshaw
> ; ni...@redhat.com; Kyrylo Tkachov
> 
> Subject: Re: [PATCH][GCC][Arm] Fix subreg crash in different way by
> enabling the FP16 pattern unconditionally.
> 
> Hi Tamar,
> 
> On Wed, 25 Jul 2018 at 16:28, Tamar Christina 
> wrote:
> >
> > Hi Thomas,
> >
> > Thanks for the review!
> >
> > > >
> > > > I don't believe the TARGET_FP16 guard to be needed, because the
> > > > pattern doesn't actually generate code and requires another
> > > > pattern for that, and a reg to reg move should always be possible
> > > > anyway. So allowing the force to register here is safe and it
> > > > allows the compiler to generate a correct error instead of ICEing in an
> infinite loop.
> > >
> > > How about subreg to subreg move? Doesn't that expand to more insns
> > > (subreg to reg and reg to subreg)? Couldn't you improve the logic to
> > > check that there is actually a mode change so that if there isn't
> > > (like moving from one subreg to another) just expand to a single move?
> > >
> >
> > Yes, but that is not a new issue. My patch is simply removing the
> > TARGET_FP16 restrictions and merging two patterns that should be one
> using an iterator and nothing more.
> >
> > The redundant mov is already there and a different issue than the ICE I'm
> trying to fix.
> 
> It's there for movv4hf and movv8hf but your patch extends this problem to
> movv2sf and movv4sf as well.

I don't understand how it can. My patch just replaces one pattern for V4HF and
one for V8HF with one pattern operating on VH.

;; Vector modes for 16-bit floating-point support.
(define_mode_iterator VH [V8HF V4HF])

My pattern has absolutely no effect on V2SF and V4SF or any of the other modes.

> 
> >
> > None of the code inside the expander is needed at all, the code really
> > only has an effect on subreg to subreg moves, as `force_reg` doesn't do
> anything when its argument is already a reg.
> >
> > The comment in the expander (which was already there) is wrong. The
> > *reason* the ICE is fixed isn't because of the `force_reg`. It's
> > because of the mere presence of the expander itself. The expander
> > matches the standard mov$a optab and so this prevents
> emit_move_insn_1 from doing the move by subwords as it finds a pattern
> that's able to do the move.
> 
> Could you then fix the comment in your patch as well? I hadn't understood
> the force_reg was not key here. You might want to update the following
> sentence from your patch description if you are going to include it in your
> commit message:

I'll update the comment in the patch. The cover letter won't be included in the commit,
but it does accurately reflect the current state of affairs. The patch will do the force_reg,
it's just not the reason it works.

> 
> The way this is worked around in the back-end is that we have move
> patterns in neon.md that usually just force the register instead of checking
> with the back-end.
> 
> "The way this is worked around (..) that just force the register" is what led
> me to believe the force_reg was important.
> 
> >
> > The expander however always falls through and doesn’t stop RTL
> > generation. You could remove all the code in there and have it
> > properly match the *neon_mov instructions which will do the right
> > thing later at code generation time and avoid the redundant moves.  My
> guess is the original `force_reg` was copied from the other patterns like
> `movti` and the existing `mov`. There it makes sense because the
> operands can be MEM or anything general_operand.
> >
> > However the redundant moves are a different problem than what I'm
> > trying to solve here. So I think that's another patch which requires further
> testing.
> 
> I was just thinking of restricting when does the force_reg happens but if it
> can be removed completely I agree it should probably be done in a separate
> patch.
> 
> Oh by the way, is there something that prevent those expander to ever be
> used with a memory operand? Because the GCC internals contains the
> following piece for mov standard pattern (bold marks added by me):
> 
> "Second, these patterns are not used solely in the RTL generation pass. Even
> the reload pass can generate move insns to copy values from stack slots into
> temporary registers. When it does so, one of the operands is a hard register
> and the other is an operand that can need to be reloaded into a register.
> Therefore, when given such a pair of operands, the pattern must generate
> RTL which needs no reloading and needs no temporary registers—no
> registers other than the operands. For example, if you support the pattern
> with a define_expand, then in such a case the define_expand *mustn’t call
> force_reg* or any other such function which might generate new pseudo
> registers."

When used during expand the operand 

Re: [38/46] Pass stmt_vec_infos instead of data_references where relevant

2018-07-26 Thread Richard Sandiford
Richard Sandiford  writes:
> Richard Biener  writes:
>> On Tue, Jul 24, 2018 at 12:08 PM Richard Sandiford
>>  wrote:
>>>
>>> This patch makes various routines (mostly in tree-vect-data-refs.c)
>>> take stmt_vec_infos rather than data_references.  The affected routines
>>> are really dealing with the way that an access is going to be vectorised
>>> for a particular stmt_vec_info, rather than with the original scalar
>>> access described by the data_reference.
>>
>> Similar.  Doesn't it make more sense to pass both stmt_info and DR to
>> the functions?
>
> Not sure.  If we...
>
>> We currently cannot handle aggregate copies in the to-be-vectorized IL
>> but rely on SRA and friends to elide those.  That's the only two-DR
>> stmt I can think of for vectorization.  Maybe aggregate by-value / return
>> function calls with OMP SIMD if that supports this somehow.
>
> ...did this then I don't think a data_reference would be the natural
> way of identifying a DR within a stmt_vec_info.  Presumably the
> stmt_vec_info would need multiple STMT_VINFO_DATA_REFS and dr_auxs.
> If both of those were vectors then a (stmt_vec_info, index) pair
> might make more sense than (stmt_vec_info, data_reference).
>
> Alternatively we could move STMT_VINFO_DATA_REF into dataref_aux,
> so that there's a back-pointer to the DR, add a stmt_vec_info
> field to dataref_aux too, and then use dataref_aux instead of
> stmt_vec_info as the key.

New patch 37/46 does that.  The one below goes through and uses
dr_vec_info instead of data_reference in code that is dealing
with the way that a reference is going to be vectorised.
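The arrangement that 37/46 introduces, where each dr_vec_info points back at both its scalar data_reference and the stmt_vec_info that currently implements it, can be modelled with a few toy C structs. The names below only echo GCC's; the struct layouts and the functions `toy_dr_info` and `toy_move_dr` are simplified, hypothetical stand-ins for illustration, not the real tree-vectorizer.h definitions.

```c
#include <assert.h>
#include <stddef.h>

struct toy_stmt_info;

/* Stand-in for the scalar data_reference.  */
struct toy_data_ref
{
  int id;
};

/* Stand-in for dr_vec_info: back-pointers to the scalar reference
   and to the statement that currently implements it.  */
struct toy_dr_vec_info
{
  struct toy_data_ref *dr;
  struct toy_stmt_info *stmt;
  int misalignment;             /* -1 if unknown.  */
};

/* Stand-in for _stmt_vec_info, embedding its dr_aux.  */
struct toy_stmt_info
{
  int is_pattern;
  struct toy_dr_vec_info dr_aux;
};

/* Analogue of STMT_VINFO_DR_INFO: the dr_vec_info of STMT.  */
static struct toy_dr_vec_info *
toy_dr_info (struct toy_stmt_info *stmt)
{
  return &stmt->dr_aux;
}

/* Analogue of vec_info::move_dr: NEW_STMT (e.g. a pattern stmt) now
   implements the reference that OLD_STMT used to implement.  */
static void
toy_move_dr (struct toy_stmt_info *new_stmt, struct toy_stmt_info *old_stmt)
{
  assert (!old_stmt->is_pattern);
  /* The original stmt keeps pointing at whichever stmt now owns it.  */
  old_stmt->dr_aux.stmt = new_stmt;
  /* The new stmt's copy (including dr and stmt) is the one that counts.  */
  new_stmt->dr_aux = old_stmt->dr_aux;
}
```

Once a dr_vec_info is in hand, both the scalar reference and the owning statement are one field access away, which is why passing it around avoids the repeated lookups the series is trying to remove.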

Thanks,
Richard


2018-07-26  Richard Sandiford  

gcc/
* tree-vectorizer.h (set_dr_misalignment, dr_misalignment)
(DR_TARGET_ALIGNMENT, aligned_access_p, known_alignment_for_access_p)
(vect_known_alignment_in_bytes, vect_dr_behavior)
(vect_get_scalar_dr_size): Take references as dr_vec_infos
instead of data_references.  Update calls to other routines for
which the same change has been made.
* tree-vect-data-refs.c (vect_preserves_scalar_order_p): Take
dr_vec_infos instead of stmt_vec_infos.
(vect_analyze_data_ref_dependence): Update call accordingly.
(vect_slp_analyze_data_ref_dependence)
(vect_record_base_alignments): Use DR_VECT_AUX.
(vect_calculate_target_alignment, vect_compute_data_ref_alignment)
(vect_update_misalignment_for_peel, verify_data_ref_alignment)
(vector_alignment_reachable_p, vect_get_data_access_cost)
(vect_peeling_supportable, vect_analyze_group_access_1)
(vect_analyze_group_access, vect_analyze_data_ref_access)
(vect_vfa_segment_size, vect_vfa_access_size, vect_vfa_align)
(vect_compile_time_alias, vect_small_gap_p)
(vectorizable_with_step_bound_p, vect_duplicate_ssa_name_ptr_info):
(vect_supportable_dr_alignment): Take references as dr_vec_infos
instead of data_references.  Update calls to other routines for
which the same change has been made.
(vect_verify_datarefs_alignment, vect_get_peeling_costs_all_drs)
(vect_find_same_alignment_drs, vect_analyze_data_refs_alignment)
(vect_slp_analyze_and_verify_node_alignment)
(vect_analyze_data_ref_accesses, vect_prune_runtime_alias_test_list)
(vect_create_addr_base_for_vector_ref, vect_create_data_ref_ptr)
(vect_setup_realignment): Use dr_vec_infos.  Update calls after
above changes.
(_vect_peel_info::dr): Replace with...
(_vect_peel_info::dr_info): ...this new field.
(vect_peeling_hash_get_most_frequent)
(vect_peeling_hash_choose_best_peeling): Update accordingly.
(vect_peeling_hash_get_lowest_cost):
(vect_enhance_data_refs_alignment): Likewise.  Update calls to other
routines for which the same change has been made.
(vect_peeling_hash_insert): Likewise.  Take a dr_vec_info instead of a
data_reference.
* tree-vect-loop-manip.c (get_misalign_in_elems)
(vect_gen_prolog_loop_niters): Use dr_vec_infos.  Update calls after
above changes.
* tree-vect-loop.c (vect_analyze_loop_2): Likewise.
* tree-vect-stmts.c (vect_get_store_cost, vect_get_load_cost)
(vect_truncate_gather_scatter_offset, compare_step_with_zero)
(get_group_load_store_type, get_negative_load_store_type)
(vect_get_data_ptr_increment, vectorizable_store)
(vectorizable_load): Likewise.
(ensure_base_align): Take a dr_vec_info instead of a data_reference.
Update calls to other routines for which the same change has been made.

Index: gcc/tree-vectorizer.h
===
--- gcc/tree-vectorizer.h   2018-07-26 11:30:56.197256524 +0100
+++ gcc/tree-vectorizer.h   2018-07-26 11:42:19.035663718 +0100
@@ -1294,15 +1294,15 @@ #define DR_MISALIGNMENT_UNKNOWN (-1)
 #defin

[39/46 v2] Change STMT_VINFO_UNALIGNED_DR to a dr_vec_info

2018-07-26 Thread Richard Sandiford
[Updated after new 37/46 and 38/46]

After previous changes, it makes more sense for STMT_VINFO_UNALIGNED_DR
to be dr_vec_info rather than a data_reference.


2018-07-26  Richard Sandiford  

gcc/
* tree-vectorizer.h (_loop_vec_info::unaligned_dr): Change to
dr_vec_info.
* tree-vect-data-refs.c (vect_enhance_data_refs_alignment): Update
accordingly.
* tree-vect-loop.c (vect_analyze_loop_2): Likewise.
* tree-vect-loop-manip.c (get_misalign_in_elems): Likewise.
(vect_gen_prolog_loop_niters): Likewise.

Index: gcc/tree-vectorizer.h
===
--- gcc/tree-vectorizer.h   2018-07-26 11:42:19.035663718 +0100
+++ gcc/tree-vectorizer.h   2018-07-26 11:42:24.919598492 +0100
@@ -437,7 +437,7 @@ typedef struct _loop_vec_info : public v
   tree mask_compare_type;
 
   /* Unknown DRs according to which loop was peeled.  */
-  struct data_reference *unaligned_dr;
+  struct dr_vec_info *unaligned_dr;
 
   /* peeling_for_alignment indicates whether peeling for alignment will take
  place, and what the peeling factor should be:
Index: gcc/tree-vect-data-refs.c
===
--- gcc/tree-vect-data-refs.c   2018-07-26 11:42:19.031663762 +0100
+++ gcc/tree-vect-data-refs.c   2018-07-26 11:42:24.915598537 +0100
@@ -2135,7 +2135,7 @@ vect_enhance_data_refs_alignment (loop_v
vect_update_misalignment_for_peel (dr_info, dr0_info, npeel);
  }
 
-  LOOP_VINFO_UNALIGNED_DR (loop_vinfo) = dr0_info->dr;
+  LOOP_VINFO_UNALIGNED_DR (loop_vinfo) = dr0_info;
   if (npeel)
 LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) = npeel;
   else
Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c   2018-07-26 11:42:19.031663762 +0100
+++ gcc/tree-vect-loop.c   2018-07-26 11:42:24.919598492 +0100
@@ -2142,8 +2142,7 @@ vect_analyze_loop_2 (loop_vec_info loop_
  /* Niters for peeled prolog loop.  */
  if (LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) < 0)
{
- dr_vec_info *dr_info
-   = DR_VECT_AUX (LOOP_VINFO_UNALIGNED_DR (loop_vinfo));
+ dr_vec_info *dr_info = LOOP_VINFO_UNALIGNED_DR (loop_vinfo);
  tree vectype = STMT_VINFO_VECTYPE (dr_info->stmt);
  niters_th += TYPE_VECTOR_SUBPARTS (vectype) - 1;
}
Index: gcc/tree-vect-loop-manip.c
===
--- gcc/tree-vect-loop-manip.c  2018-07-26 11:42:19.031663762 +0100
+++ gcc/tree-vect-loop-manip.c  2018-07-26 11:42:24.915598537 +0100
@@ -1560,7 +1560,7 @@ vect_update_ivs_after_vectorizer (loop_v
 static tree
 get_misalign_in_elems (gimple **seq, loop_vec_info loop_vinfo)
 {
-  dr_vec_info *dr_info = DR_VECT_AUX (LOOP_VINFO_UNALIGNED_DR (loop_vinfo));
+  dr_vec_info *dr_info = LOOP_VINFO_UNALIGNED_DR (loop_vinfo);
   stmt_vec_info stmt_info = dr_info->stmt;
   tree vectype = STMT_VINFO_VECTYPE (stmt_info);
 
@@ -1627,7 +1627,7 @@ get_misalign_in_elems (gimple **seq, loo
 vect_gen_prolog_loop_niters (loop_vec_info loop_vinfo,
 basic_block bb, int *bound)
 {
-  dr_vec_info *dr_info = DR_VECT_AUX (LOOP_VINFO_UNALIGNED_DR (loop_vinfo));
+  dr_vec_info *dr_info = LOOP_VINFO_UNALIGNED_DR (loop_vinfo);
   tree var;
   tree niters_type = TREE_TYPE (LOOP_VINFO_NITERS (loop_vinfo));
   gimple_seq stmts = NULL, new_stmts = NULL;


RE: [PATCH][GCC][front-end][build-machinery][opt-framework] Allow setting of stack-clash via configure options. [Patch (4/6)]

2018-07-26 Thread Tamar Christina
Hi Alexandre,

> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org 
> On Behalf Of Alexandre Oliva
> Sent: Thursday, July 26, 2018 08:46
> To: Tamar Christina 
> Cc: Joseph Myers ; Jeff Law
> ; gcc-patches@gcc.gnu.org; nd ;
> bonz...@gnu.org; d...@redhat.com; nero...@gcc.gnu.org;
> ralf.wildenh...@gmx.de
> Subject: Re: [PATCH][GCC][front-end][build-machinery][opt-framework]
> Allow setting of stack-clash via configure options. [Patch (4/6)]
> 
> On Jul 25, 2018, Tamar Christina  wrote:
> 
> > gcc/
> > 2018-07-25  Tamar Christina  
> 
> > PR target/86486
> > * configure.ac: Add stack-clash-protection-guard-size.
> > * doc/install.texi: Document it.
> > * config.in (DEFAULT_STK_CLASH_GUARD_SIZE): New.
> > * params.def: Update comment for guard-size.
> > (PARAM_STACK_CLASH_PROTECTION_GUARD_SIZE,
> > PARAM_STACK_CLASH_PROTECTION_PROBE_INTERVAL): Update description.
> > * configure: Regenerate.
> 
> Thanks.  No objections from me.  I don't see any use of the new config knob,
> though; assuming it's in a subsequent patch, I guess this one is fine, but I'm
> not sure I'm entitled to approve it.

Yup it's in a subsequent AArch64 patch.  That's no problem, Jeff still has to review
the other front-end patch so I'll have to wait for approval there anyway.

Thanks for the review and comments!

Regards,
Tamar

> 
> --
> Alexandre Oliva, freedom fighter   https://FSFLA.org/blogs/lxo
> Be the change, be Free! FSF Latin America board member
> GNU Toolchain EngineerFree Software Evangelist


[40/46 v2] Add vec_info::lookup_dr

2018-07-26 Thread Richard Sandiford
[Updated after new 37/46 and 38/46.  41 onwards are unaffected.]

This patch replaces DR_VECT_AUX and vect_dr_stmt with a new
vec_info::lookup_dr function, so that the lookup is relative
to a particular vec_info rather than to global state.


2018-07-26  Richard Sandiford  

gcc/
* tree-vectorizer.h (vec_info::lookup_dr): New member function.
(vect_dr_stmt): Delete.
* tree-vectorizer.c (vec_info::lookup_dr): New function.
* tree-vect-loop-manip.c (vect_update_inits_of_drs): Use it instead
of DR_VECT_AUX.
* tree-vect-data-refs.c (vect_analyze_possibly_independent_ddr)
(vect_analyze_data_ref_dependence, vect_record_base_alignments)
(vect_verify_datarefs_alignment, vect_peeling_supportable)
(vect_analyze_data_ref_accesses, vect_prune_runtime_alias_test_list)
(vect_analyze_data_refs): Likewise.
(vect_slp_analyze_data_ref_dependence): Likewise.  Take a vec_info
argument.
(vect_find_same_alignment_drs): Likewise.
(vect_slp_analyze_node_dependences): Update calls accordingly.
(vect_analyze_data_refs_alignment): Likewise.  Use vec_info::lookup_dr
instead of DR_VECT_AUX.
(vect_get_peeling_costs_all_drs): Take a loop_vec_info instead
of a vector of data references.  Use vec_info::lookup_dr instead of
DR_VECT_AUX.
(vect_peeling_hash_get_lowest_cost): Update calls accordingly.
(vect_enhance_data_refs_alignment): Likewise.  Use vec_info::lookup_dr
instead of DR_VECT_AUX.

Index: gcc/tree-vectorizer.h
===
--- gcc/tree-vectorizer.h   2018-07-26 11:42:24.919598492 +0100
+++ gcc/tree-vectorizer.h   2018-07-26 11:42:29.387548800 +0100
@@ -240,6 +240,7 @@ struct vec_info {
   stmt_vec_info lookup_stmt (gimple *);
   stmt_vec_info lookup_def (tree);
   stmt_vec_info lookup_single_use (tree);
+  struct dr_vec_info *lookup_dr (data_reference *);
   void move_dr (stmt_vec_info, stmt_vec_info);
 
   /* The type of vectorization.  */
@@ -1062,8 +1063,6 @@ #define HYBRID_SLP_STMT(S)
 #define PURE_SLP_STMT(S)  ((S)->slp_type == pure_slp)
 #define STMT_SLP_TYPE(S)   (S)->slp_type
 
-#define DR_VECT_AUX(dr) (STMT_VINFO_DR_INFO (vect_dr_stmt (dr)))
-
 #define VECT_MAX_COST 1000
 
 /* The maximum number of intermediate steps required in multi-step type
@@ -1273,20 +1272,6 @@ add_stmt_costs (void *data, stmt_vector_
   cost->misalign, cost->where);
 }
 
-/* Return the stmt DR is in.  For DR_STMT that have been replaced by
-   a pattern this returns the corresponding pattern stmt.  Otherwise
-   DR_STMT is returned.  */
-
-inline stmt_vec_info
-vect_dr_stmt (data_reference *dr)
-{
-  gimple *stmt = DR_STMT (dr);
-  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
-  /* DR_STMT should never refer to a stmt in a pattern replacement.  */
-  gcc_checking_assert (!is_pattern_stmt_p (stmt_info));
-  return stmt_info->dr_aux.stmt;
-}
-
 /*-*/
 /* Info on data references alignment.  */
 /*-*/
Index: gcc/tree-vectorizer.c
===
--- gcc/tree-vectorizer.c   2018-07-26 11:30:56.197256524 +0100
+++ gcc/tree-vectorizer.c   2018-07-26 11:42:29.387548800 +0100
@@ -562,6 +562,17 @@ vec_info::lookup_single_use (tree lhs)
   return NULL;
 }
 
+/* Return vectorization information about DR.  */
+
+dr_vec_info *
+vec_info::lookup_dr (data_reference *dr)
+{
+  stmt_vec_info stmt_info = lookup_stmt (DR_STMT (dr));
+  /* DR_STMT should never refer to a stmt in a pattern replacement.  */
+  gcc_checking_assert (!is_pattern_stmt_p (stmt_info));
+  return STMT_VINFO_DR_INFO (stmt_info->dr_aux.stmt);
+}
+
 /* Record that NEW_STMT_INFO now implements the same data reference
as OLD_STMT_INFO.  */
 
Index: gcc/tree-vect-loop-manip.c
===
--- gcc/tree-vect-loop-manip.c  2018-07-26 11:42:24.915598537 +0100
+++ gcc/tree-vect-loop-manip.c  2018-07-26 11:42:29.387548800 +0100
@@ -1754,8 +1754,8 @@ vect_update_inits_of_drs (loop_vec_info
 
   FOR_EACH_VEC_ELT (datarefs, i, dr)
 {
-  gimple *stmt = DR_STMT (dr);
-  if (!STMT_VINFO_GATHER_SCATTER_P (vinfo_for_stmt (stmt)))
+  dr_vec_info *dr_info = loop_vinfo->lookup_dr (dr);
+  if (!STMT_VINFO_GATHER_SCATTER_P (dr_info->stmt))
vect_update_init_of_dr (dr, niters, code);
 }
 }
Index: gcc/tree-vect-data-refs.c
===
--- gcc/tree-vect-data-refs.c   2018-07-26 11:42:24.915598537 +0100
+++ gcc/tree-vect-data-refs.c   2018-07-26 11:42:29.387548800 +0100
@@ -269,10 +269,10 @@ vect_analyze_possibly_independent_ddr (d
 
 Note that the alias checks will be re

Re: [PATCH] PR libstdc++/70940 optimize pmr::resource_adaptor for allocators using malloc

2018-07-26 Thread Rainer Orth
Hi Jonathan,

> Rainer, this is another place where alignof(max_align_t) gets encoded
> into the ABI, so is affected by PR 77691 as well.

indeed, fixed by the following patch.  Tested on i386-pc-solaris2.11,
ok for mainline?

The ugly thing about xfailing the affected tests is that they will XPASS
once in a while when malloc happens to return 16-byte aligned memory.
However, I'm reluctant to skip them instead at least while there's a
chance that Solaris will fix 32-bit x86 malloc alignment post Solaris
11.4.
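The occasional XPASS comes from the difference between what a test assumes (alignment to alignof(max_align_t)) and what malloc actually guarantees on the target. A small C11 sketch can count how many live allocations happen to meet that alignment; `is_aligned` and `count_max_aligned_mallocs` are hypothetical helpers written for illustration, and the block size and count are arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Return nonzero if P is a multiple of ALIGN (a power of two).  */
static int
is_aligned (const void *p, size_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  return ((uintptr_t) p & (align - 1)) == 0;
}

/* Keep N blocks of SIZE bytes live at once and count how many are
   aligned to alignof (max_align_t).  Returns -1 if the bookkeeping
   array cannot be allocated; failed block allocations are not counted.  */
static int
count_max_aligned_mallocs (size_t n, size_t size)
{
  void **blocks = calloc (n, sizeof *blocks);
  int aligned = 0;

  if (blocks == NULL)
    return -1;
  for (size_t i = 0; i < n; i++)
    {
      blocks[i] = malloc (size);
      if (blocks[i] != NULL && is_aligned (blocks[i], _Alignof (max_align_t)))
        aligned++;
    }
  for (size_t i = 0; i < n; i++)
    free (blocks[i]);           /* free (NULL) is a no-op.  */
  free (blocks);
  return aligned;
}
```

Where malloc honours alignof(max_align_t), every block counts; where it only guarantees a smaller alignment, some blocks may still happen to be suitably aligned, which is how an xfailed run-time test can sometimes pass.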

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2018-07-25  Rainer Orth  

PR libstdc++/77691
* testsuite/experimental/memory_resource/new_delete_resource.cc:
xfail execution on 32-bit Solaris/x86.

# HG changeset patch
# Parent  5af7194620544c9e848e8bfa4181759921729028
xfail experimental/memory_resource/new_delete_resource.cc on 32-bit Solaris/x86 (PR libstdc++/77691)

diff --git a/libstdc++-v3/testsuite/experimental/memory_resource/new_delete_resource.cc b/libstdc++-v3/testsuite/experimental/memory_resource/new_delete_resource.cc
--- a/libstdc++-v3/testsuite/experimental/memory_resource/new_delete_resource.cc
+++ b/libstdc++-v3/testsuite/experimental/memory_resource/new_delete_resource.cc
@@ -16,6 +16,7 @@
 // .
 
 // { dg-do run { target c++14 } }
+// { dg-xfail-run-if "PR libstdc++/77691" { { i?86-*-solaris2.* x86_64-*-solaris2.* } && ilp32 } }
 
 #include 
 #include 


Re: [37/46] Associate alignment information with stmt_vec_infos

2018-07-26 Thread Richard Biener
On Thu, Jul 26, 2018 at 12:55 PM Richard Sandiford
 wrote:
>
> Richard Biener  writes:
> > On Tue, Jul 24, 2018 at 12:08 PM Richard Sandiford
> >  wrote:
> >>
> >> Alignment information is really a property of a stmt_vec_info
> >> (and the way we want to vectorise it) rather than the original scalar dr.
> >> I think that was true even before the recent dr sharing.
> >
> > But that is only so as long as we handle only stmts with a single DR.
> > In reality alignment info _is_ a property of the DR and not of the stmt.
> >
> > So you're doing a shortcut here, shouldn't we rename
> > dr_misalignment to stmt_dr_misalignment then?
> >
> > Otherwise I don't see how this makes sense semantically.
>
> OK, the patch below takes a different approach, suggested in the
> 38/46 thread.  The idea is to make dr_aux link back to both the scalar
> data_reference and the containing stmt_vec_info, so that it becomes a
> lookup-free key for a vectorisable reference.
>
> The data_reference link is just STMT_VINFO_DATA_REF, moved from
> _stmt_vec_info.  The stmt pointer is a new field and always tracks
> the current stmt_vec_info for the reference (which might be a pattern
> stmt or the original stmt).
>
> Then 38/46 can use dr_aux instead of data_reference (compared to current
> sources) and instead of stmt_vec_info (compared to the original series).
> This still avoids the repeated lookups that the series is trying to avoid.
>
> The patch also makes the dr_aux in the current (possibly pattern) stmt
> be the one that counts, rather than have the information stay with the
> original DR_STMT.  A new macro (STMT_VINFO_DR_INFO) gives this
> information for a given stmt_vec_info.
>
> The changes together should make it easier to have multiple dr_auxs
> in a single statement.

I like this.

OK.
Richard.

> Thanks,
> Richard
>
>
> 2018-07-26  Richard Sandiford  
>
> gcc/
> * tree-vectorizer.h (vec_info::move_dr): New member function.
> (dataref_aux): Rename to...
> (dr_vec_info): ...this and add "dr" and "stmt" fields.
> (_stmt_vec_info::dr_aux): Update accordingly.
> (_stmt_vec_info::data_ref_info): Delete.
> (STMT_VINFO_GROUPED_ACCESS, DR_GROUP_FIRST_ELEMENT)
> (DR_GROUP_NEXT_ELEMENT, DR_GROUP_SIZE, DR_GROUP_STORE_COUNT)
> (DR_GROUP_GAP, DR_GROUP_SAME_DR_STMT, REDUC_GROUP_FIRST_ELEMENT):
> (REDUC_GROUP_NEXT_ELEMENT, REDUC_GROUP_SIZE): Use dr_aux.dr instead
> of data_ref.
> (STMT_VINFO_DATA_REF): Likewise.  Turn into an lvalue.
> (STMT_VINFO_DR_INFO): New macro.
> (DR_VECT_AUX): Use STMT_VINFO_DR_INFO and vect_dr_stmt.
> (set_dr_misalignment): Update after rename of dataref_aux.
> (vect_dr_stmt): Move earlier in file.  Return dr_aux.stmt.
> * tree-vect-stmts.c (new_stmt_vec_info): Remove redundant
> initialization of STMT_VINFO_DATA_REF.
> * tree-vectorizer.c (vec_info::move_dr): New function.
> * tree-vect-patterns.c (vect_recog_bool_pattern)
> (vect_recog_mask_conversion_pattern)
> (vect_recog_gather_scatter_pattern): Use it.
> * tree-vect-data-refs.c (vect_analyze_data_refs): Initialize
> the "dr" and "stmt" fields of dr_vec_info instead of
> STMT_VINFO_DATA_REF.
>
> Index: gcc/tree-vectorizer.h
> ===
> --- gcc/tree-vectorizer.h   2018-07-26 11:30:55.0 +0100
> +++ gcc/tree-vectorizer.h   2018-07-26 11:30:56.197256524 +0100
> @@ -240,6 +240,7 @@ struct vec_info {
>stmt_vec_info lookup_stmt (gimple *);
>stmt_vec_info lookup_def (tree);
>stmt_vec_info lookup_single_use (tree);
> +  void move_dr (stmt_vec_info, stmt_vec_info);
>
>/* The type of vectorization.  */
>vec_kind kind;
> @@ -767,7 +768,11 @@ enum vect_memory_access_type {
>VMAT_GATHER_SCATTER
>  };
>
> -struct dataref_aux {
> +struct dr_vec_info {
> +  /* The data reference itself.  */
> +  data_reference *dr;
> +  /* The statement that contains the data reference.  */
> +  stmt_vec_info stmt;
>/* The misalignment in bytes of the reference, or -1 if not known.  */
>int misalignment;
>/* The byte alignment that we'd ideally like the reference to have,
> @@ -818,11 +823,7 @@ struct _stmt_vec_info {
>   data-ref (array/pointer/struct access). A GIMPLE stmt is expected to have
>   at most one such data-ref.  */
>
> -  /* Information about the data-ref (access function, etc),
> - relative to the inner-most containing loop.  */
> -  struct data_reference *data_ref_info;
> -
> -  dataref_aux dr_aux;
> +  dr_vec_info dr_aux;
>
>/* Information about the data-ref relative to this loop
>   nest (the loop that is being considered for vectorization).  */
> @@ -996,7 +997,7 @@ #define STMT_VINFO_LIVE_P(S)
>  #define STMT_VINFO_VECTYPE(S)  (S)->vectype
>  #define STMT_VINFO_VEC_STMT(S) (S)->vectorized_stmt
>  #define STMT_VINFO_VECTORIZABLE

Re: [38/46] Pass stmt_vec_infos instead of data_references where relevant

2018-07-26 Thread Richard Biener
On Thu, Jul 26, 2018 at 1:05 PM Richard Sandiford
 wrote:
>
> Richard Sandiford  writes:
> > Richard Biener  writes:
> >> On Tue, Jul 24, 2018 at 12:08 PM Richard Sandiford
> >>  wrote:
> >>>
> >>> This patch makes various routines (mostly in tree-vect-data-refs.c)
> >>> take stmt_vec_infos rather than data_references.  The affected routines
> >>> are really dealing with the way that an access is going to be vectorised
> >>> for a particular stmt_vec_info, rather than with the original scalar
> >>> access described by the data_reference.
> >>
> >> Similar.  Doesn't it make more sense to pass both stmt_info and DR to
> >> the functions?
> >
> > Not sure.  If we...
> >
> >> We currently cannot handle aggregate copies in the to-be-vectorized IL
> >> but rely on SRA and friends to elide those.  That's the only two-DR
> >> stmt I can think of for vectorization.  Maybe aggregate by-value / return
> >> function calls with OMP SIMD if that supports this somehow.
> >
> > ...did this then I don't think a data_reference would be the natural
> > way of identifying a DR within a stmt_vec_info.  Presumably the
> > stmt_vec_info would need multiple STMT_VINFO_DATA_REFS and dr_auxs.
> > If both of those were vectors then a (stmt_vec_info, index) pair
> > might make more sense than (stmt_vec_info, data_reference).
> >
> > Alternatively we could move STMT_VINFO_DATA_REF into dataref_aux,
> > so that there's a back-pointer to the DR, add a stmt_vec_info
> > field to dataref_aux too, and then use dataref_aux instead of
> > stmt_vec_info as the key.
>
> New patch 37/46 does that.  The one below goes through and uses
> dr_vec_info instead of data_reference in code that is dealing
> with the way that a reference is going to be vectorised.

OK.

> Thanks,
> Richard
>
>
> 2018-07-26  Richard Sandiford  
>
> gcc/
> * tree-vectorizer.h (set_dr_misalignment, dr_misalignment)
> (DR_TARGET_ALIGNMENT, aligned_access_p, known_alignment_for_access_p)
> (vect_known_alignment_in_bytes, vect_dr_behavior)
> (vect_get_scalar_dr_size): Take references as dr_vec_infos
> instead of data_references.  Update calls to other routines for
> which the same change has been made.
> * tree-vect-data-refs.c (vect_preserves_scalar_order_p): Take
> dr_vec_infos instead of stmt_vec_infos.
> (vect_analyze_data_ref_dependence): Update call accordingly.
> (vect_slp_analyze_data_ref_dependence)
> (vect_record_base_alignments): Use DR_VECT_AUX.
> (vect_calculate_target_alignment, vect_compute_data_ref_alignment)
> (vect_update_misalignment_for_peel, verify_data_ref_alignment)
> (vector_alignment_reachable_p, vect_get_data_access_cost)
> (vect_peeling_supportable, vect_analyze_group_access_1)
> (vect_analyze_group_access, vect_analyze_data_ref_access)
> (vect_vfa_segment_size, vect_vfa_access_size, vect_vfa_align)
> (vect_compile_time_alias, vect_small_gap_p)
> (vectorizable_with_step_bound_p, vect_duplicate_ssa_name_ptr_info):
> (vect_supportable_dr_alignment): Take references as dr_vec_infos
> instead of data_references.  Update calls to other routines for
> which the same change has been made.
> (vect_verify_datarefs_alignment, vect_get_peeling_costs_all_drs)
> (vect_find_same_alignment_drs, vect_analyze_data_refs_alignment)
> (vect_slp_analyze_and_verify_node_alignment)
> (vect_analyze_data_ref_accesses, vect_prune_runtime_alias_test_list)
> (vect_create_addr_base_for_vector_ref, vect_create_data_ref_ptr)
> (vect_setup_realignment): Use dr_vec_infos.  Update calls after
> above changes.
> (_vect_peel_info::dr): Replace with...
> (_vect_peel_info::dr_info): ...this new field.
> (vect_peeling_hash_get_most_frequent)
> (vect_peeling_hash_choose_best_peeling): Update accordingly.
> (vect_peeling_hash_get_lowest_cost):
> (vect_enhance_data_refs_alignment): Likewise.  Update calls to other
> routines for which the same change has been made.
> (vect_peeling_hash_insert): Likewise.  Take a dr_vec_info instead of a
> data_reference.
> * tree-vect-loop-manip.c (get_misalign_in_elems)
> (vect_gen_prolog_loop_niters): Use dr_vec_infos.  Update calls after
> above changes.
> * tree-vect-loop.c (vect_analyze_loop_2): Likewise.
> * tree-vect-stmts.c (vect_get_store_cost, vect_get_load_cost)
> (vect_truncate_gather_scatter_offset, compare_step_with_zero)
> (get_group_load_store_type, get_negative_load_store_type)
> (vect_get_data_ptr_increment, vectorizable_store)
> (vectorizable_load): Likewise.
> (ensure_base_align): Take a dr_vec_info instead of a data_reference.
> Update calls to other routines for which the same change has been made.
>
> Index: gcc/tree-vectorizer.h
> =

Re: [39/46 v2] Change STMT_VINFO_UNALIGNED_DR to a dr_vec_info

2018-07-26 Thread Richard Biener
On Thu, Jul 26, 2018 at 1:08 PM Richard Sandiford
 wrote:
>
> [Updated after new 37/46 and 38/46]
>
> After previous changes, it makes more sense for STMT_VINFO_UNALIGNED_DR
> to be dr_vec_info rather than a data_reference.

OK.

>
> 2018-07-26  Richard Sandiford  
>
> gcc/
> * tree-vectorizer.h (_loop_vec_info::unaligned_dr): Change to
> dr_vec_info.
> * tree-vect-data-refs.c (vect_enhance_data_refs_alignment): Update
> accordingly.
> * tree-vect-loop.c (vect_analyze_loop_2): Likewise.
> * tree-vect-loop-manip.c (get_misalign_in_elems): Likewise.
> (vect_gen_prolog_loop_niters): Likewise.
>
> Index: gcc/tree-vectorizer.h
> ===
> --- gcc/tree-vectorizer.h   2018-07-26 11:42:19.035663718 +0100
> +++ gcc/tree-vectorizer.h   2018-07-26 11:42:24.919598492 +0100
> @@ -437,7 +437,7 @@ typedef struct _loop_vec_info : public v
>tree mask_compare_type;
>
>/* Unknown DRs according to which loop was peeled.  */
> -  struct data_reference *unaligned_dr;
> +  struct dr_vec_info *unaligned_dr;
>
>/* peeling_for_alignment indicates whether peeling for alignment will take
>   place, and what the peeling factor should be:
> Index: gcc/tree-vect-data-refs.c
> ===
> --- gcc/tree-vect-data-refs.c   2018-07-26 11:42:19.031663762 +0100
> +++ gcc/tree-vect-data-refs.c   2018-07-26 11:42:24.915598537 +0100
> @@ -2135,7 +2135,7 @@ vect_enhance_data_refs_alignment (loop_v
> vect_update_misalignment_for_peel (dr_info, dr0_info, npeel);
>   }
>
> -  LOOP_VINFO_UNALIGNED_DR (loop_vinfo) = dr0_info->dr;
> +  LOOP_VINFO_UNALIGNED_DR (loop_vinfo) = dr0_info;
>if (npeel)
>  LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) = npeel;
>else
> Index: gcc/tree-vect-loop.c
> ===
> --- gcc/tree-vect-loop.c2018-07-26 11:42:19.031663762 +0100
> +++ gcc/tree-vect-loop.c2018-07-26 11:42:24.919598492 +0100
> @@ -2142,8 +2142,7 @@ vect_analyze_loop_2 (loop_vec_info loop_
>   /* Niters for peeled prolog loop.  */
>   if (LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) < 0)
> {
> - dr_vec_info *dr_info
> -   = DR_VECT_AUX (LOOP_VINFO_UNALIGNED_DR (loop_vinfo));
> + dr_vec_info *dr_info = LOOP_VINFO_UNALIGNED_DR (loop_vinfo);
>   tree vectype = STMT_VINFO_VECTYPE (dr_info->stmt);
>   niters_th += TYPE_VECTOR_SUBPARTS (vectype) - 1;
> }
> Index: gcc/tree-vect-loop-manip.c
> ===
> --- gcc/tree-vect-loop-manip.c  2018-07-26 11:42:19.031663762 +0100
> +++ gcc/tree-vect-loop-manip.c  2018-07-26 11:42:24.915598537 +0100
> @@ -1560,7 +1560,7 @@ vect_update_ivs_after_vectorizer (loop_v
>  static tree
>  get_misalign_in_elems (gimple **seq, loop_vec_info loop_vinfo)
>  {
> -  dr_vec_info *dr_info = DR_VECT_AUX (LOOP_VINFO_UNALIGNED_DR (loop_vinfo));
> +  dr_vec_info *dr_info = LOOP_VINFO_UNALIGNED_DR (loop_vinfo);
>stmt_vec_info stmt_info = dr_info->stmt;
>tree vectype = STMT_VINFO_VECTYPE (stmt_info);
>
> @@ -1627,7 +1627,7 @@ get_misalign_in_elems (gimple **seq, loo
>  vect_gen_prolog_loop_niters (loop_vec_info loop_vinfo,
>  basic_block bb, int *bound)
>  {
> -  dr_vec_info *dr_info = DR_VECT_AUX (LOOP_VINFO_UNALIGNED_DR (loop_vinfo));
> +  dr_vec_info *dr_info = LOOP_VINFO_UNALIGNED_DR (loop_vinfo);
>tree var;
>tree niters_type = TREE_TYPE (LOOP_VINFO_NITERS (loop_vinfo));
>gimple_seq stmts = NULL, new_stmts = NULL;


Re: [36/46] Add a pattern_stmt_p field to stmt_vec_info

2018-07-26 Thread Richard Biener
On Thu, Jul 26, 2018 at 12:29 PM Richard Sandiford
 wrote:
>
> Richard Biener  writes:
> > On Wed, Jul 25, 2018 at 1:09 PM Richard Sandiford
> >  wrote:
> >>
> >> Richard Biener  writes:
> >> > On Tue, Jul 24, 2018 at 12:07 PM Richard Sandiford
> >> >  wrote:
> >> >>
> >> >> This patch adds a pattern_stmt_p field to stmt_vec_info, so that it's
> >> >> possible to tell whether the statement is a pattern statement without
> >> >> referring to other statements.  The new field goes in what was
> >> >> previously a hole in the structure, so the size is the same as before.
> >> >
> >> > Not sure what the advantage is?  is_pattern_stmt_p () looks nicer
> >> > than ->is_pattern_p
> >>
> >> I can keep the function wrapper if you prefer that.  But having a
> >> statement "know" whether it's a pattern stmt makes things like
> >> freeing stmt_vec_infos simpler (see later patches in the series).
> >
> > Ah, ok.
> >
> >> It should also be cheaper to test, but that's much more minor.
> >
> > So please keep the wrapper.
>
> Like this?

Yes, OK.

Thanks,
Richard.

> > I guess at some point we should decide what to do with all
> > the STMT_VINFO_ macros (and the others, {LOOP,BB}_ stuff
> > is already used inconsistently).
>
> Yeah...
>
>
> 2018-07-26  Richard Sandiford  
>
> gcc/
> * tree-vectorizer.h (_stmt_vec_info::pattern_stmt_p): New field.
> (is_pattern_stmt_p): Use it.
> * tree-vect-patterns.c (vect_init_pattern_stmt): Set pattern_stmt_p
> on pattern statements.
>
> Index: gcc/tree-vectorizer.h
> ===
> --- gcc/tree-vectorizer.h   2018-07-26 11:28:18.0 +0100
> +++ gcc/tree-vectorizer.h   2018-07-26 11:28:19.072951054 +0100
> @@ -791,6 +791,12 @@ struct _stmt_vec_info {
>/* Stmt is part of some pattern (computation idiom)  */
>bool in_pattern_p;
>
> +  /* True if the statement was created during pattern recognition as
> + part of the replacement for RELATED_STMT.  This implies that the
> + statement isn't part of any basic block, although for convenience
> + its gimple_bb is the same as for RELATED_STMT.  */
> +  bool pattern_stmt_p;
> +
>/* Is this statement vectorizable or should it be skipped in (partial)
>   vectorization.  */
>bool vectorizable;
> @@ -1157,8 +1163,7 @@ get_later_stmt (stmt_vec_info stmt1_info
>  static inline bool
>  is_pattern_stmt_p (stmt_vec_info stmt_info)
>  {
> -  stmt_vec_info related_stmt_info = STMT_VINFO_RELATED_STMT (stmt_info);
> -  return related_stmt_info && STMT_VINFO_IN_PATTERN_P (related_stmt_info);
> +  return stmt_info->pattern_stmt_p;
>  }
>
>  /* Return true if BB is a loop header.  */
> Index: gcc/tree-vect-patterns.c
> ===
> --- gcc/tree-vect-patterns.c2018-07-26 11:28:18.0 +0100
> +++ gcc/tree-vect-patterns.c2018-07-26 11:28:19.068951168 +0100
> @@ -108,6 +108,7 @@ vect_init_pattern_stmt (gimple *pattern_
>  pattern_stmt_info = orig_stmt_info->vinfo->add_stmt (pattern_stmt);
>gimple_set_bb (pattern_stmt, gimple_bb (orig_stmt_info->stmt));
>
> +  pattern_stmt_info->pattern_stmt_p = true;
>STMT_VINFO_RELATED_STMT (pattern_stmt_info) = orig_stmt_info;
>STMT_VINFO_DEF_TYPE (pattern_stmt_info)
>  = STMT_VINFO_DEF_TYPE (orig_stmt_info);
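The claim that the size is unchanged relies on struct padding. On common ABIs, a structure that contains a pointer-aligned member followed by a single bool has trailing padding, so a second bool fits without growing the structure. The following minimal sketch uses hypothetical, heavily simplified stand-ins for _stmt_vec_info; the real structure has many more fields, and its layout is not reproduced here. It also keeps the flag behind an inline wrapper, as requested in review:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the structure before the change:
   a pointer-aligned member followed by one flag.  */
struct info_before
{
  void *related_stmt;
  bool in_pattern_p;
  /* Trailing padding up to pointer alignment lives here.  */
};

/* The same structure with the new flag placed in that padding.  */
struct info_after
{
  void *related_stmt;
  bool in_pattern_p;
  bool pattern_stmt_p;
};

/* Keep the query behind a wrapper, in the spirit of is_pattern_stmt_p,
   so callers do not depend on how the flag is stored.  */
static inline bool
info_is_pattern_stmt_p (const struct info_after *info)
{
  return info->pattern_stmt_p;
}
```

On a typical LP64 target, both structures occupy 16 bytes. The exact sizes depend on the ABI, but they match wherever pointer alignment is at least 2.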


Re: [40/46 v2] Add vec_info::lookup_dr

2018-07-26 Thread Richard Biener
On Thu, Jul 26, 2018 at 1:10 PM Richard Sandiford
 wrote:
>
> [Updated after new 37/46 and 38/46.  41 onwards are unaffected.]
>
> This patch replaces DR_VECT_AUX and vect_dr_stmt with a new
> vec_info::lookup_dr function, so that the lookup is relative
> to a particular vec_info rather than to global state.

OK.

>
> 2018-07-26  Richard Sandiford  
>
> gcc/
> * tree-vectorizer.h (vec_info::lookup_dr): New member function.
> (vect_dr_stmt): Delete.
> * tree-vectorizer.c (vec_info::lookup_dr): New function.
> * tree-vect-loop-manip.c (vect_update_inits_of_drs): Use it instead
> of DR_VECT_AUX.
> * tree-vect-data-refs.c (vect_analyze_possibly_independent_ddr)
> (vect_analyze_data_ref_dependence, vect_record_base_alignments)
> (vect_verify_datarefs_alignment, vect_peeling_supportable)
> (vect_analyze_data_ref_accesses, vect_prune_runtime_alias_test_list)
> (vect_analyze_data_refs): Likewise.
> (vect_slp_analyze_data_ref_dependence): Likewise.  Take a vec_info
> argument.
> (vect_find_same_alignment_drs): Likewise.
> (vect_slp_analyze_node_dependences): Update calls accordingly.
> (vect_analyze_data_refs_alignment): Likewise.  Use vec_info::lookup_dr
> instead of DR_VECT_AUX.
> (vect_get_peeling_costs_all_drs): Take a loop_vec_info instead
> of a vector of data references.  Use vec_info::lookup_dr instead of
> DR_VECT_AUX.
> (vect_peeling_hash_get_lowest_cost): Update calls accordingly.
> (vect_enhance_data_refs_alignment): Likewise.  Use vec_info::lookup_dr
> instead of DR_VECT_AUX.
>
> Index: gcc/tree-vectorizer.h
> ===
> --- gcc/tree-vectorizer.h   2018-07-26 11:42:24.919598492 +0100
> +++ gcc/tree-vectorizer.h   2018-07-26 11:42:29.387548800 +0100
> @@ -240,6 +240,7 @@ struct vec_info {
>stmt_vec_info lookup_stmt (gimple *);
>stmt_vec_info lookup_def (tree);
>stmt_vec_info lookup_single_use (tree);
> +  struct dr_vec_info *lookup_dr (data_reference *);
>void move_dr (stmt_vec_info, stmt_vec_info);
>
>/* The type of vectorization.  */
> @@ -1062,8 +1063,6 @@ #define HYBRID_SLP_STMT(S)
>  #define PURE_SLP_STMT(S)  ((S)->slp_type == pure_slp)
>  #define STMT_SLP_TYPE(S)   (S)->slp_type
>
> -#define DR_VECT_AUX(dr) (STMT_VINFO_DR_INFO (vect_dr_stmt (dr)))
> -
>  #define VECT_MAX_COST 1000
>
>  /* The maximum number of intermediate steps required in multi-step type
> @@ -1273,20 +1272,6 @@ add_stmt_costs (void *data, stmt_vector_
>cost->misalign, cost->where);
>  }
>
> -/* Return the stmt DR is in.  For DR_STMT that have been replaced by
> -   a pattern this returns the corresponding pattern stmt.  Otherwise
> -   DR_STMT is returned.  */
> -
> -inline stmt_vec_info
> -vect_dr_stmt (data_reference *dr)
> -{
> -  gimple *stmt = DR_STMT (dr);
> -  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
> -  /* DR_STMT should never refer to a stmt in a pattern replacement.  */
> -  gcc_checking_assert (!is_pattern_stmt_p (stmt_info));
> -  return stmt_info->dr_aux.stmt;
> -}
> -
>  /*-*/
>  /* Info on data references alignment.  */
>  /*-*/
> Index: gcc/tree-vectorizer.c
> ===
> --- gcc/tree-vectorizer.c   2018-07-26 11:30:56.197256524 +0100
> +++ gcc/tree-vectorizer.c   2018-07-26 11:42:29.387548800 +0100
> @@ -562,6 +562,17 @@ vec_info::lookup_single_use (tree lhs)
>return NULL;
>  }
>
> +/* Return vectorization information about DR.  */
> +
> +dr_vec_info *
> +vec_info::lookup_dr (data_reference *dr)
> +{
> +  stmt_vec_info stmt_info = lookup_stmt (DR_STMT (dr));
> +  /* DR_STMT should never refer to a stmt in a pattern replacement.  */
> +  gcc_checking_assert (!is_pattern_stmt_p (stmt_info));
> +  return STMT_VINFO_DR_INFO (stmt_info->dr_aux.stmt);
> +}
> +
>  /* Record that NEW_STMT_INFO now implements the same data reference
> as OLD_STMT_INFO.  */
>
> Index: gcc/tree-vect-loop-manip.c
> ===
> --- gcc/tree-vect-loop-manip.c  2018-07-26 11:42:24.915598537 +0100
> +++ gcc/tree-vect-loop-manip.c  2018-07-26 11:42:29.387548800 +0100
> @@ -1754,8 +1754,8 @@ vect_update_inits_of_drs (loop_vec_info
>
>FOR_EACH_VEC_ELT (datarefs, i, dr)
>  {
> -  gimple *stmt = DR_STMT (dr);
> -  if (!STMT_VINFO_GATHER_SCATTER_P (vinfo_for_stmt (stmt)))
> +  dr_vec_info *dr_info = loop_vinfo->lookup_dr (dr);
> +  if (!STMT_VINFO_GATHER_SCATTER_P (dr_info->stmt))
> vect_update_init_of_dr (dr, niters, code);
>  }
>  }
> Index: gcc/tree-vect-data-refs.c
> ==
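The idea behind vec_info::lookup_dr is that the mapping from a data reference to its vectorizer-side information belongs to one analysis context, not to process-wide state. Here is a toy model of that idea. The types are hypothetical stand-ins for data_reference, dr_vec_info and vec_info. GCC itself reaches the information through DR_STMT and the statement's dr_aux, not through a linear search like this one:

```c
#include <stddef.h>

/* Hypothetical stand-ins, not GCC's types.  */
struct dref { int id; };                 /* ~ data_reference */
struct dref_info                          /* ~ dr_vec_info */
{
  const struct dref *dr;
  int misalignment;
};

#define MAX_DREFS 8

struct analysis_ctx                       /* ~ vec_info */
{
  struct dref_info infos[MAX_DREFS];
  size_t n;
};

/* Record per-context information for DR; returns NULL if CTX is full.  */
static struct dref_info *
ctx_add_dr (struct analysis_ctx *ctx, const struct dref *dr, int misalignment)
{
  if (ctx->n == MAX_DREFS)
    return NULL;
  struct dref_info *info = &ctx->infos[ctx->n++];
  info->dr = dr;
  info->misalignment = misalignment;
  return info;
}

/* Look DR up relative to CTX only: two contexts can hold different
   information for the same data reference.  */
static struct dref_info *
ctx_lookup_dr (struct analysis_ctx *ctx, const struct dref *dr)
{
  for (size_t i = 0; i < ctx->n; ++i)
    if (ctx->infos[i].dr == dr)
      return &ctx->infos[i];
  return NULL;
}
```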

Re: [PATCH] combine: Another hard register problem (PR85805)

2018-07-26 Thread Richard Biener
On Thu, Jul 26, 2018 at 12:12 PM Segher Boessenkool
 wrote:
>
> The current code in reg_nonzero_bits_for_combine allows using the
> reg_stat info when last_set_mode is a different integer mode.  This is
> completely wrong for non-pseudos.  For example, as in the PR, a value
> in a DImode hard register is set by eight writes to its constituent
> QImode parts.  The value written to the DImode is not the same as that
> written to the lowest-numbered QImode!
>
> This patch fixes it.  Committing.  Will backport later, too.

testcase?

>
> Segher
>
>
> 2018-07-26  Segher Boessenkool  
>
> PR rtl-optimization/85805
> * combine.c (reg_nonzero_bits_for_combine): Only use the last set
> value for hard registers if that was written in the same mode.
>
> ---
>  gcc/combine.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/combine.c b/gcc/combine.c
> index 09cbad4..fe71f3a 100644
> --- a/gcc/combine.c
> +++ b/gcc/combine.c
> @@ -10186,7 +10186,8 @@ reg_nonzero_bits_for_combine (const_rtx x, scalar_int_mode xmode,
>rsp = &reg_stat[REGNO (x)];
>if (rsp->last_set_value != 0
>&& (rsp->last_set_mode == mode
> - || (GET_MODE_CLASS (rsp->last_set_mode) == MODE_INT
> + || (REGNO (x) >= FIRST_PSEUDO_REGISTER
> + && GET_MODE_CLASS (rsp->last_set_mode) == MODE_INT
>   && GET_MODE_CLASS (mode) == MODE_INT))
>&& ((rsp->last_set_label >= label_tick_ebb_start
>&& rsp->last_set_label < label_tick)
> --
> 1.8.3.1
>
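Reusing the last-set value across integer modes is unsound for hard registers. The following plain-C model illustrates why. It is hypothetical, not combine code, and it mirrors the situation from the PR: a 64-bit register filled by eight byte-sized writes. If the value written to the lowest-numbered byte is zero-extended and used as the nonzero-bits mask for the whole register, the mask claims that the upper bits are zero when they are not:

```c
#include <stdint.h>

/* Model a DImode hard register assembled from eight QImode writes;
   byte I lands in bits 8*I .. 8*I+7.  */
static uint64_t
assemble_from_bytes (const uint8_t bytes[8])
{
  uint64_t reg = 0;
  for (int i = 0; i < 8; ++i)
    reg |= (uint64_t) bytes[i] << (8 * i);
  return reg;
}

/* The unsound shortcut: treat the value written to the lowest-numbered
   byte as if it described the whole 64-bit register.  */
static uint64_t
nonzero_bits_from_low_byte (const uint8_t bytes[8])
{
  return (uint64_t) bytes[0];
}

/* A nonzero-bits mask is sound only if no set bit falls outside it.  */
static int
mask_is_sound (uint64_t value, uint64_t mask)
{
  return (value & ~mask) == 0;
}
```

With the bytes { 0x01, 0, 0, 0, 0, 0, 0, 0x80 }, the register holds 0x8000000000000001, but the low-byte shortcut yields the mask 0x01. The patch avoids this by allowing the mode-mismatch case only for pseudos (REGNO (x) >= FIRST_PSEUDO_REGISTER).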


Re: [PATCH] doc: discourage const/volatile on register variables

2018-07-26 Thread Richard Biener
On Thu, Jul 26, 2018 at 12:29 PM Alexander Monakov  wrote:
>
> Hi,
>
> when using explicit register variables ('register int foo asm("%ebp");'),
> using const/volatile type qualifiers on the declaration does not result in
> the logically expected effect.
>
> The main issue is that gcc-8 got "better" at propagating initializers of
> 'register const' variables to their uses in asm operands, losing the
> association with the register and thus causing the operand to
> unexpectedly appear in some other register. This caused build issues for
> the Linux kernel and was reported a couple of times in the GCC Bugzilla.
>
> This patch adds a few lines to the documentation to say that qualifiers
> won't work as expected. OK for trunk?

Looks ok to me.  Maybe we should change FEs to ignore those
qualifiers on explicit register variables and emit a warning like

warning: const/volatile qualifier ignored on X

?

> Thanks.
> Alexander
>
> PR target/86673
> * doc/extend.texi (Global Register Variables): Discourage use of type
> qualifiers.
> (Local Register Variables): Likewise.
>
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index 7b471ec40f7..9a41f2753e9 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -9591,6 +9591,11 @@ a global variable the declaration appears outside a function. The
>  @code{static}. The register name must be a valid register name for the
>  target platform.
>
> +Do not use type qualifiers such as @code{const} and @code{volatile}, as
> +the result may be contrary to your expectations. In particular, using
> +the @code{volatile} qualifier does not fully prevent the compiler from
> +optimizing accesses to the register.
> +
>  Registers are a scarce resource on most systems and allowing the
>  compiler to manage their usage usually results in the best code. However,
>  under special circumstances it can make sense to reserve some globally.
> @@ -9698,6 +9703,12 @@ but for a local variable the declaration appears within a function.  The
>  @code{static}.  The register name must be a valid register name for the
>  target platform.
>
> +Do not use type qualifiers such as @code{const} and @code{volatile}, as
> +the result may be contrary to your expectations. In particular, when
> +the @code{const} qualifier is used, the compiler may substitute the
> +variable with its initializer in @code{asm} statements, which may cause
> +the corresponding operand to appear in a different register.
> +
>  As with global register variables, it is recommended that you choose
>  a register that is normally saved and restored by function calls on your
>  machine, so that calls to library routines will not clobber it.


Re: [PATCH] PR libstdc++/70940 optimize pmr::resource_adaptor for allocators using malloc

2018-07-26 Thread Jonathan Wakely

On 26/07/18 13:11 +0200, Rainer Orth wrote:
> Hi Jonathan,
>
>> Rainer, this is another place where alignof(max_align_t) gets encoded
>> into the ABI, so is affected by PR 77691 as well.
>
> indeed, fixed by the following patch.  Tested on i386-pc-solaris2.11,
> ok for mainline?

OK, thanks.

> The ugly thing about xfailing the affected tests is that they will XPASS
> once in a while when malloc happens to return 16-byte aligned memory.
> However, I'm reluctant to skip them instead at least while there's a
> chance that Solaris will fix 32-bit x86 malloc alignment post Solaris
> 11.4.

Yes, it isn't ideal to have them flip between XFAIL and XPASS, but I
agree that simply skipping them is worse.
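Whether such a test passes depends only on the address that malloc happens to return, which is why an XFAIL can turn into an occasional XPASS. Here is a minimal C11 sketch of the check involved. The helper names are hypothetical and are not taken from the libstdc++ testsuite:

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* True if P is a multiple of ALIGN, which must be a power of two.  */
static int
is_aligned (const void *p, size_t align)
{
  return ((uintptr_t) p & (align - 1)) == 0;
}

/* Does this particular malloc result satisfy alignof (max_align_t)?
   On a target whose malloc guarantees less alignment than max_align_t
   requires, the answer can vary from one call to the next.  */
static int
malloc_result_max_aligned (size_t size)
{
  void *p = malloc (size);
  int ok = p != NULL && is_aligned (p, alignof (max_align_t));
  free (p);
  return ok;
}
```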




Re: [PATCH][GCC][Arm] Fix subreg crash in different way by enabling the FP16 pattern unconditionally.

2018-07-26 Thread Thomas Preudhomme
On Thu, 26 Jul 2018 at 12:01, Tamar Christina  wrote:
>
> Hi Thomas,
>
> > -Original Message-
> > From: Thomas Preudhomme 
> > Sent: Thursday, July 26, 2018 09:29
> > To: Tamar Christina 
> > Cc: gcc-patches@gcc.gnu.org; nd ; Ramana Radhakrishnan
> > ; Richard Earnshaw
> > ; ni...@redhat.com; Kyrylo Tkachov
> > 
> > Subject: Re: [PATCH][GCC][Arm] Fix subreg crash in different way by
> > enabling the FP16 pattern unconditionally.
> >
> > Hi Tamar,
> >
> > On Wed, 25 Jul 2018 at 16:28, Tamar Christina 
> > wrote:
> > >
> > > Hi Thomas,
> > >
> > > Thanks for the review!
> > >
> > > > >
> > > > > I don't believe the TARGET_FP16 guard to be needed, because the
> > > > > pattern doesn't actually generate code and requires another
> > > > > pattern for that, and a reg to reg move should always be possible
> > > > > anyway. So allowing the force to register here is safe and it
> > > > > allows the compiler to generate a correct error instead of ICEing in
> > > > > an infinite loop.
> > > >
> > > > How about subreg to subreg move? Doesn't that expand to more insns
> > > > (subreg to reg and reg to subreg)? Couldn't you improve the logic to
> > > > check that there is actually a mode change so that if there isn't
> > > > (like moving from one subreg to another) just expand to a single move?
> > > >
> > >
> > > Yes, but that is not a new issue. My patch is simply removing the
> > > TARGET_FP16 restrictions and merging two patterns that should be one
> > > using an iterator and nothing more.
> > >
> > > The redundant mov is already there and a different issue than the ICE I'm
> > > trying to fix.
> >
> > It's there for movv4hf and movv6hf but your patch extends this problem to
> > movv2sf and movv4sf as well.
>
> I don't understand how it can. My patch just replaces one pattern for V4HF and
> one for V8HF with one pattern operating on VH.
>
> ;; Vector modes for 16-bit floating-point support.
> (define_mode_iterator VH [V8HF V4HF])
>
> My pattern has absolutely no effect on V2SF and V4SF or any of the other 
> modes.

My bad, I was looking at VF.

>
> >
> > >
> > > None of the code inside the expander is needed at all; the code really
> > > only has an effect on subreg to subreg moves, as `force_reg` doesn't do
> > > anything when its argument is already a reg.
> > >
> > > The comment in the expander (which was already there) is wrong. The
> > > *reason* the ICE is fixed isn't because of the `force_reg`. It's
> > > because of the mere presence of the expander itself. The expander
> > > matches the standard mov$a optab and so this prevents
> > > emit_move_insn_1 from doing the move by subwords as it finds a pattern
> > > that's able to do the move.
> >
> > Could you then fix the comment in your patch as well? I hadn't understood
> > the force_reg was not key here. You might want to update the following
> > sentence from your patch description if you are going to include it in your
> > commit message:
>
> I'll update the comment in the patch. The cover letter won't be included in
> the commit, but it does accurately reflect the current state of affairs. The
> patch will do the force_reg; it's just not the reason it works.

Understood.

>
> >
> > The way this is worked around in the back-end is that we have move
> > patterns in neon.md that usually just force the register instead of checking
> > with the back-end.
> >
> > "The way this is worked around (..) that just force the register" is what
> > led me to believe the force_reg was important.
> >
> > >
> > > The expander however always falls through and doesn’t stop RTL
> > > generation. You could remove all the code in there and have it
> > > properly match the *neon_mov instructions which will do the right
> > > thing later at code generation time and avoid the redundant moves.  My
> > > guess is the original `force_reg` was copied from the other patterns like
> > > `movti` and the existing `mov`. There it makes sense because the
> > > operands can be MEM or anything general_operand.
> > >
> > > However the redundant moves are a different problem than what I'm
> > > trying to solve here. So I think that's another patch which requires
> > > further testing.
> >
> > I was just thinking of restricting when does the force_reg happens but if it
> > can be removed completely I agree it should probably be done in a separate
> > patch.
> >
> > Oh by the way, is there something that prevents those expanders from ever
> > being used with a memory operand? Because the GCC internals manual contains the
> > following piece for mov standard pattern (bold marks added by me):
> >
> > "Second, these patterns are not used solely in the RTL generation pass. Even
> > the reload pass can generate move insns to copy values from stack slots into
> > temporary registers. When it does so, one of the operands is a hard register
> > and the other is an operand that can need to be reloaded into a register.
> > Therefore, when given such a pair of operands, the pattern must generate
> > RTL whi

Re: [PATCH] Add linker_output as prefix for LTO temps (PR lto/86548).

2018-07-26 Thread Richard Biener
On Thu, Jul 26, 2018 at 12:55 PM Martin Liška  wrote:
>
> Hi.
>
> As requested in the PR, now we produce prefixes for temp files in LTO:
>
> Example:
> $ gcc -flto main.o a.o --save-temps -o mybinary
>
> generates:
> $ ls /tmp/mybinary*
> /tmp/mybinary  /tmp/mybinary.ltrans0.o  /tmp/mybinary.ltrans0.s  /tmp/mybinary.ltrans.out

It will be /tmp/mybinary.abc421.ltrans0.o
/tmp/mybinary.abc421.ltrans1.o, etc., correct?

Otherwise there's the chance to trash user files which isn't good.

> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>
> Ready to be installed?

OK if the above is correct.

Thanks,
Richard.

> Martin
>
> gcc/ChangeLog:
>
> 2018-07-26  Martin Liska  
>
> PR lto/86548
> * lto-wrapper.c: Add linker_output as prefix
> for ltrans_output_file.
>
> include/ChangeLog:
>
> 2018-07-26  Martin Liska  
>
> PR lto/86548
> * libiberty.h (make_temp_file_with_prefix): New function.
>
> libiberty/ChangeLog:
>
> 2018-07-26  Martin Liska  
>
> PR lto/86548
> * make-temp-file.c (TEMP_FILE): Remove leading 'cc'.
> (make_temp_file): Call make_temp_file_with_prefix with
> first argument set to NULL.
> (make_temp_file_with_prefix): Support also prefix.
> ---
>  gcc/lto-wrapper.c  | 14 +-
>  include/libiberty.h|  5 +
>  libiberty/make-temp-file.c | 24 ++--
>  3 files changed, 36 insertions(+), 7 deletions(-)
>
>
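Richard's concern is that a predictable name such as /tmp/mybinary.ltrans0.o could clobber an existing file, so the output prefix should be combined with a random component. Below is a minimal POSIX sketch of that naming scheme. It uses mkstemps, a glibc/BSD extension. The helper itself is hypothetical and is not libiberty's make_temp_file_with_prefix:

```c
#define _DEFAULT_SOURCE /* expose mkstemps in glibc */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Create an empty file named DIR/PREFIX.XXXXXXSUFFIX, where XXXXXX is
   replaced by random characters, and return its malloc'd name (or NULL).
   The random part keeps a user's own PREFIX.SUFFIX files untouched.  */
static char *
make_prefixed_temp (const char *dir, const char *prefix, const char *suffix)
{
  size_t len = strlen (dir) + strlen (prefix) + strlen (suffix)
               + sizeof "/.XXXXXX";   /* includes the terminating NUL */
  char *name = malloc (len);
  if (name == NULL)
    return NULL;
  snprintf (name, len, "%s/%s.XXXXXX%s", dir, prefix, suffix);

  /* mkstemps replaces the six X's that precede the suffix and creates
     the file exclusively, so an existing file is never reused.  */
  int fd = mkstemps (name, (int) strlen (suffix));
  if (fd < 0)
    {
      free (name);
      return NULL;
    }
  close (fd);
  return name;
}
```

With the directory "/tmp", the prefix "mybinary" and the suffix ".ltrans.out", this produces names shaped like the /tmp/mybinary.VkVmXd.ltrans.out that appears in the strace output later in the thread. The caller is responsible for unlinking the file and freeing the name.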


Re: [PATCH] combine: Another hard register problem (PR85805)

2018-07-26 Thread Segher Boessenkool
On Thu, Jul 26, 2018 at 01:16:42PM +0200, Richard Biener wrote:
> On Thu, Jul 26, 2018 at 12:12 PM Segher Boessenkool
>  wrote:
> >
> > The current code in reg_nonzero_bits_for_combine allows using the
> > reg_stat info when last_set_mode is a different integer mode.  This is
> > completely wrong for non-pseudos.  For example, as in the PR, a value
> > in a DImode hard register is set by eight writes to its constituent
> > QImode parts.  The value written to the DImode is not the same as that
> > written to the lowest-numbered QImode!
> >
> > This patch fixes it.  Committing.  Will backport later, too.
> 
> testcase?

Feel free to write one?


Segher


Re: [patch] adjust default nvptx launch geometry for OpenACC offloaded regions

2018-07-26 Thread Tom de Vries
> Content-Type: text/x-patch; name="trunk-libgomp-default-par.diff"
> Content-Transfer-Encoding: 7bit
> Content-Disposition: attachment; filename="trunk-libgomp-default-par.diff"

From https://gcc.gnu.org/contribute.html#patches :
...
We prefer patches posted as plain text or as MIME parts of type
text/x-patch or text/plain, disposition inline, encoded as 7bit or 8bit.
It is strongly discouraged to post patches as MIME parts of type
application/whatever, disposition attachment or encoded as base64 or
quoted-printable.
...

Please post with content-disposition inline instead of attachment (or,
as plain text).

Thanks,
- Tom


Re: [PATCH] Add linker_output as prefix for LTO temps (PR lto/86548).

2018-07-26 Thread Martin Liška
On 07/26/2018 01:34 PM, Richard Biener wrote:
> On Thu, Jul 26, 2018 at 12:55 PM Martin Liška  wrote:
>>
>> Hi.
>>
>> As requested in the PR, now we produce prefixes for temp files in LTO:
>>
>> Example:
>> $ gcc -flto main.o a.o --save-temps -o mybinary
>>
>> generates:
>> $ ls /tmp/mybinary*
>> /tmp/mybinary  /tmp/mybinary.ltrans0.o  /tmp/mybinary.ltrans0.s  /tmp/mybinary.ltrans.out
> 
> It will be /tmp/mybinary.abc421.ltrans0.o
> /tmp/mybinary.abc421.ltrans1.o, etc., correct?

Yes, --save-temps changes which file names are used. Before patch:

$ strace -f -s512 gcc -flto a.o main.o -o mybinary 2>&1 | grep execv | grep ltrans
[pid 23926] 
execve("/home/marxin/bin/gcc/lib/gcc/x86_64-pc-linux-gnu/9.0.0/lto1", 
["/home/marxin/bin/gcc/lib/gcc/x86_64-pc-linux-gnu/9.0.0/lto1", "-quiet", 
"-dumpdir", "./", "-dumpbase", "mybinary.wpa", "-mtune=generic", 
"-march=x86-64", "-mtune=generic", "-march=x86-64", "-auxbase", "a", 
"-fno-openmp", "-fno-openacc", "-fltrans-output-list=/tmp/ccPVzNR6.ltrans.out", 
"-fwpa", "-fresolution=/tmp/ccuocrhY.res", "-flinker-output=exec", 
"@/tmp/ccHQA575"], 0x7fffd960 /* 105 vars */ 
[pid 23928] 
execve("/home/marxin/bin/gcc/lib/gcc/x86_64-pc-linux-gnu/9.0.0/lto1", 
["/home/marxin/bin/gcc/lib/gcc/x86_64-pc-linux-gnu/9.0.0/lto1", "-quiet", 
"-dumpdir", "./", "-dumpbase", "mybinary.ltrans0", "-mtune=generic", 
"-march=x86-64", "-mtune=generic", "-march=x86-64", "-auxbase-strip", 
"/tmp/ccPVzNR6.ltrans0.ltrans.o", "-fno-openmp", "-fno-openacc", "-fltrans", 
"@/tmp/ccssDdS8", "-o", "/tmp/cclsKY4G.s"], 0x7fffd960 /* 105 vars */ 

[pid 23929] execve("/home/marxin/bin/gcc/bin//as", ["as", "--64", "-o", 
"/tmp/ccPVzNR6.ltrans0.ltrans.o", "/tmp/cclsKY4G.s"], 0x7fffd960 /* 105 
vars */) = -1 ENOENT (No such file or directory)
[pid 23929] execve("/home/marxin/bin/as", ["as", "--64", "-o", 
"/tmp/ccPVzNR6.ltrans0.ltrans.o", "/tmp/cclsKY4G.s"], 0x7fffd960 /* 105 
vars */) = -1 ENOENT (No such file or directory)
[pid 23929] execve("/usr/local/bin/as", ["as", "--64", "-o", 
"/tmp/ccPVzNR6.ltrans0.ltrans.o", "/tmp/cclsKY4G.s"], 0x7fffd960 /* 105 
vars */) = -1 ENOENT (No such file or directory)
[pid 23929] execve("/usr/bin/as", ["as", "--64", "-o", 
"/tmp/ccPVzNR6.ltrans0.ltrans.o", "/tmp/cclsKY4G.s"], 0x7fffd960 /* 105 
vars */ 

after:

$ strace -f -s512 gcc -flto a.o main.o -o mybinary 2>&1 | grep execv | grep ltrans
[pid 16379] 
execve("/home/marxin/bin/gcc/lib/gcc/x86_64-pc-linux-gnu/9.0.0/lto1", 
["/home/marxin/bin/gcc/lib/gcc/x86_64-pc-linux-gnu/9.0.0/lto1", "-quiet", 
"-dumpdir", "./", "-dumpbase", "mybinary.wpa", "-mtune=generic", 
"-march=x86-64", "-mtune=generic", "-march=x86-64", "-auxbase", "a", 
"-fno-openmp", "-fno-openacc", 
"-fltrans-output-list=/tmp/mybinary.VkVmXd.ltrans.out", "-fwpa", 
"-fresolution=/tmp/ccnQ6e55.res", "-flinker-output=exec", "@/tmp/cc0Sv5Fe"], 
0x7fffd960 /* 105 vars */ 
[pid 16381] 
execve("/home/marxin/bin/gcc/lib/gcc/x86_64-pc-linux-gnu/9.0.0/lto1", 
["/home/marxin/bin/gcc/lib/gcc/x86_64-pc-linux-gnu/9.0.0/lto1", "-quiet", 
"-dumpdir", "./", "-dumpbase", "mybinary.ltrans0", "-mtune=generic", 
"-march=x86-64", "-mtune=generic", "-march=x86-64", "-auxbase-strip", 
"/tmp/mybinary.VkVmXd.ltrans0.ltrans.o", "-fno-openmp", "-fno-openacc", 
"-fltrans", "@/tmp/ccDY6Ojf", "-o", "/tmp/ccrmLPAG.s"], 0x7fffd960 /* 105 
vars */ 
[pid 16382] execve("/home/marxin/bin/gcc/bin//as", ["as", "--64", "-o", 
"/tmp/mybinary.VkVmXd.ltrans0.ltrans.o", "/tmp/ccrmLPAG.s"], 0x7fffd960 /* 
105 vars */) = -1 ENOENT (No such file or directory)
[pid 16382] execve("/home/marxin/bin/as", ["as", "--64", "-o", 
"/tmp/mybinary.VkVmXd.ltrans0.ltrans.o", "/tmp/ccrmLPAG.s"], 0x7fffd960 /* 
105 vars */) = -1 ENOENT (No such file or directory)
[pid 16382] execve("/usr/local/bin/as", ["as", "--64", "-o", 
"/tmp/mybinary.VkVmXd.ltrans0.ltrans.o", "/tmp/ccrmLPAG.s"], 0x7fffd960 /* 
105 vars */) = -1 ENOENT (No such file or directory)
[pid 16382] execve("/usr/bin/as", ["as", "--64", "-o", 
"/tmp/mybinary.VkVmXd.ltrans0.ltrans.o", "/tmp/ccrmLPAG.s"], 0x7fffd960 /* 
105 vars */ 


> 
> Otherwise there's the chance to trash user files which isn't good.

Sure, the patch behaves fine. I'll install it.

Martin

> 
>> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>>
>> Ready to be installed?
> 
> OK if the above is correct.
> 
> Thanks,
> Richard.
> 
>> Martin
>>
>> gcc/ChangeLog:
>>
>> 2018-07-26  Martin Liska  
>>
>> PR lto/86548
>> * lto-wrapper.c: Add linker_output as prefix
>> for ltrans_output_file.
>>
>> include/ChangeLog:
>>
>> 2018-07-26  Martin Liska  
>>
>> PR lto/86548
>> * libiberty.h (make_temp_file_with_prefix): New function.
>>
>> libiberty/ChangeLog:
>>
>> 2018-07-26  Martin Liska  
>>
>> PR lto/86548
>> * make-temp-file.c (TEMP_FILE): Remove leading 'cc'.
>> (make_temp_file): Call make_temp_file_with_prefix with
>> first argument set to N

[libgomp, nvptx] Move device property sampling from nvptx_exec to nvptx_open

2018-07-26 Thread Tom de Vries
> diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
> index 89326e57741..5022e462a3d 100644
> --- a/libgomp/plugin/plugin-nvptx.c
> +++ b/libgomp/plugin/plugin-nvptx.c
> @@ -1120,6 +1126,7 @@ nvptx_exec (void (*fn), size_t mapnum, void **hostaddrs, void **devaddrs,
>void *hp, *dp;
>struct nvptx_thread *nvthd = nvptx_thread ();
>const char *maybe_abort_msg = "(perhaps abort was called)";
> +  int dev_size = nvthd->ptx_dev->num_sms;
>  
>function = targ_fn->fn;
>  
> @@ -1150,23 +1156,20 @@ nvptx_exec (void (*fn), size_t mapnum, void **hostaddrs, void **devaddrs,
> for (int i = 0; i < GOMP_DIM_MAX; ++i)
>   default_dims[i] = GOMP_PLUGIN_acc_default_dim (i);
>  
> -   int warp_size, block_size, dev_size, cpu_size;
> +   int warp_size, block_size, cpu_size;
> CUdevice dev = nvptx_thread()->ptx_dev->dev;
> /* 32 is the default for known hardware.  */
> int gang = 0, worker = 32, vector = 32;
> -   CUdevice_attribute cu_tpb, cu_ws, cu_mpc, cu_tpm;
> +   CUdevice_attribute cu_tpb, cu_ws, cu_tpm;
>  
> cu_tpb = CU_DEVICE_ATTRIBUTE_MAX_THREADS_PER_BLOCK;
> cu_ws = CU_DEVICE_ATTRIBUTE_WARP_SIZE;
> -   cu_mpc = CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT;
> cu_tpm  = CU_DEVICE_ATTRIBUTE_MAX_THREADS_PER_MULTIPROCESSOR;
>  
> if (CUDA_CALL_NOCHECK (cuDeviceGetAttribute, &block_size, cu_tpb,
>dev) == CUDA_SUCCESS
> && CUDA_CALL_NOCHECK (cuDeviceGetAttribute, &warp_size, cu_ws,
>   dev) == CUDA_SUCCESS
> -   && CUDA_CALL_NOCHECK (cuDeviceGetAttribute, &dev_size, cu_mpc,
> - dev) == CUDA_SUCCESS
> && CUDA_CALL_NOCHECK (cuDeviceGetAttribute, &cpu_size, cu_tpm,
>   dev) == CUDA_SUCCESS)
>   {

This is a good idea (and should have been an independent patch of course).

Furthermore, it's better to move the remaining cuDeviceGetAttribute
calls to nvptx_open, as was already suggested by Thomas here (
https://gcc.gnu.org/ml/gcc-patches/2017-02/msg01020.html ).

Committed to trunk.

- Tom
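The pattern in this patch is to query device properties once, when the device is opened, and cache them in the per-device structure, so that each kernel launch only reads fields. Below is a self-contained sketch of that pattern. query_attribute is a hypothetical stand-in for cuDeviceGetAttribute: it returns fixed values and counts its calls. default_workers is a made-up placeholder for the launch-geometry heuristics and is not the plugin's actual logic:

```c
#include <stddef.h>

/* Hypothetical attribute IDs standing in for CU_DEVICE_ATTRIBUTE_*.  */
enum attr { ATTR_WARP_SIZE, ATTR_MAX_THREADS_PER_BLOCK,
            ATTR_MAX_THREADS_PER_SM, ATTR_COUNT };

static int query_count;   /* number of simulated driver queries */

/* Simulated driver query; returns 0 on success in this model.  */
static int
query_attribute (enum attr a, int *value)
{
  static const int fake_values[ATTR_COUNT] = { 32, 1024, 2048 };
  ++query_count;
  *value = fake_values[a];
  return 0;
}

struct device
{
  int warp_size;
  int max_threads_per_block;
  int max_threads_per_sm;
};

/* Sample every property once, when the device is opened.  */
static int
open_device (struct device *dev)
{
  if (query_attribute (ATTR_WARP_SIZE, &dev->warp_size)
      || query_attribute (ATTR_MAX_THREADS_PER_BLOCK,
                          &dev->max_threads_per_block)
      || query_attribute (ATTR_MAX_THREADS_PER_SM, &dev->max_threads_per_sm))
    return -1;
  return 0;
}

/* Per-launch code only reads the cached fields; no driver calls here.  */
static int
default_workers (const struct device *dev, int vector)
{
  return dev->max_threads_per_block / vector;
}
```

Opening the device costs three queries, and every later launch costs none, which is the effect of moving the cuDeviceGetAttribute calls out of nvptx_exec.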
[libgomp, nvptx] Move device property sampling from nvptx_exec to nvptx_open

Move sampling of device properties from nvptx_exec to nvptx_open, and assume
the sampling always succeeds.  This simplifies the default dimension
initialization code in nvptx_exec.

2018-07-26  Cesar Philippidis  
	Tom de Vries  

	* plugin/plugin-nvptx.c (struct ptx_device): Add warp_size,
	max_threads_per_block and max_threads_per_multiprocessor fields.
	(nvptx_open_device): Initialize new fields.
	(nvptx_exec): Use num_sms, and new fields.

---
 libgomp/plugin/plugin-nvptx.c | 53 +--
 1 file changed, 26 insertions(+), 27 deletions(-)

diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 89326e57741..5d9b5151e95 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -414,6 +414,9 @@ struct ptx_device
   int num_sms;
   int regs_per_block;
   int regs_per_sm;
+  int warp_size;
+  int max_threads_per_block;
+  int max_threads_per_multiprocessor;
 
   struct ptx_image_data *images;  /* Images loaded on device.  */
   pthread_mutex_t image_lock; /* Lock for above list.  */
@@ -800,6 +803,15 @@ nvptx_open_device (int n)
   GOMP_PLUGIN_error ("Only warp size 32 is supported");
   return NULL;
 }
+  ptx_dev->warp_size = pi;
+
+  CUDA_CALL_ERET (NULL, cuDeviceGetAttribute, &pi,
+		  CU_DEVICE_ATTRIBUTE_MAX_THREADS_PER_BLOCK, dev);
+  ptx_dev->max_threads_per_block = pi;
+
+  CUDA_CALL_ERET (NULL, cuDeviceGetAttribute, &pi,
+		  CU_DEVICE_ATTRIBUTE_MAX_THREADS_PER_MULTIPROCESSOR, dev);
+  ptx_dev->max_threads_per_multiprocessor = pi;
 
   r = CUDA_CALL_NOCHECK (cuDeviceGetAttribute, &async_engines,
 			 CU_DEVICE_ATTRIBUTE_ASYNC_ENGINE_COUNT, dev);
@@ -1150,33 +1162,20 @@ nvptx_exec (void (*fn), size_t mapnum, void **hostaddrs, void **devaddrs,
 	  for (int i = 0; i < GOMP_DIM_MAX; ++i)
 	default_dims[i] = GOMP_PLUGIN_acc_default_dim (i);
 
-	  int warp_size, block_size, dev_size, cpu_size;
-	  CUdevice dev = nvptx_thread()->ptx_dev->dev;
-	  /* 32 is the default for known hardware.  */
-	  int gang = 0, worker = 32, vector = 32;
-	  CUdevice_attribute cu_tpb, cu_ws, cu_mpc, cu_tpm;
-
-	  cu_tpb = CU_DEVICE_ATTRIBUTE_MAX_THREADS_PER_BLOCK;
-	  cu_ws = CU_DEVICE_ATTRIBUTE_WARP_SIZE;
-	  cu_mpc = CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT;
-	  cu_tpm  = CU_DEVICE_ATTRIBUTE_MAX_THREADS_PER_MULTIPROCESSOR;
-
-	  if (CUDA_CALL_NOCHECK (cuDeviceGetAttribute, &block_size, cu_tpb,
- dev) == CUDA_SUCCESS
-	  && CUDA_CALL_NOCHECK (cuDeviceGetAttribute, &warp_size, cu_ws,
-dev) == CUDA_SUCCESS
-	  && CUDA_CALL_NOCHECK (cuDeviceGetAttribute, &dev_size, cu_mpc,
-dev) == CUDA_SUCCESS
-	  && CUDA_CALL_NOCHECK (cuDeviceGetAttribute, &cpu_size, cu_

Re: [PATCH] Fix segfault in -fsave-optimization-record (PR tree-optimization/86636)

2018-07-26 Thread Andre Vieira (lists)
On 24/07/18 15:12, Richard Biener wrote:
> On Tue, Jul 24, 2018 at 1:44 AM David Malcolm  wrote:
>>
>> There are various ways that it's possible for a gimple statement to
>> have an UNKNOWN_LOCATION, and for that UNKNOWN_LOCATION to be wrapped
>> in an ad-hoc location to capture inlining information.
>>
>> For such a location, LOCATION_FILE (loc) is NULL.
>>
>> Various places in -fsave-optimization-record were checking for
>>   loc != UNKNOWN_LOCATION
>> and were passing LOCATION_FILE (loc) to code that assumed a non-NULL
>> filename, thus leading to segfaults for the above cases.
>>
>> This patch updates the tests to use
>>   LOCATION_LOCUS (loc) != UNKNOWN_LOCATION
>> instead, to look through ad-hoc location wrappers, fixing the segfaults.
>>
>> It also adds various assertions to the affected code.
>>
>> Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu; adds
>> 8 PASS results to gcc.sum.
>>
>> OK for trunk?
> 
> OK.
> 
> Richard.
> 
>>
>> gcc/ChangeLog:
>> PR tree-optimization/86636
>> * json.cc (json::object::set): Fix comment.  Add assertions.
>> (json::array::append): Move here from json.h.  Add comment and an
>> assertion.
>> (json::string::string): Likewise.
>> * json.h (json::array::append): Move to json.cc.
>> (json::string::string): Likewise.
>> * optinfo-emit-json.cc
>> (optrecord_json_writer::impl_location_to_json): Assert that we
>> aren't attempting to write out UNKNOWN_LOCATION, or an ad-hoc
>> wrapper around it.  Expand the location once, rather than three
>> times.
>> (optrecord_json_writer::inlining_chain_to_json): Fix the check for
>> UNKNOWN_LOCATION, to use LOCATION_LOCUS to look through ad-hoc
>> wrappers.
>> (optrecord_json_writer::optinfo_to_json): Likewise, in four
>> places.  Fix some overlong lines.
>>
>> gcc/testsuite/ChangeLog:
>> PR tree-optimization/86636
>> * gcc.c-torture/compile/pr86636.c: New test.
>> ---
>>  gcc/json.cc   | 24 +++-
>>  gcc/json.h|  4 ++--
>>  gcc/optinfo-emit-json.cc  | 25 +++--
>>  gcc/testsuite/gcc.c-torture/compile/pr86636.c |  8 
>>  4 files changed, 48 insertions(+), 13 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr86636.c
>>
>> diff --git a/gcc/json.cc b/gcc/json.cc
>> index 3c2aa77..3ead980 100644
>> --- a/gcc/json.cc
>> +++ b/gcc/json.cc
>> @@ -76,12 +76,15 @@ object::print (pretty_printer *pp) const
>>pp_character (pp, '}');
>>  }
>>
>> -/* Set the json::value * for KEY, taking ownership of VALUE
>> +/* Set the json::value * for KEY, taking ownership of V
>> (and taking a copy of KEY if necessary).  */
>>
>>  void
>>  object::set (const char *key, value *v)
>>  {
>> +  gcc_assert (key);
>> +  gcc_assert (v);
>> +
>>value **ptr = m_map.get (key);
>>if (ptr)
>>  {
>> @@ -126,6 +129,15 @@ array::print (pretty_printer *pp) const
>>pp_character (pp, ']');
>>  }
>>
>> +/* Append non-NULL value V to a json::array, taking ownership of V.  */
>> +
>> +void
>> +array::append (value *v)
>> +{
>> +  gcc_assert (v);
>> +  m_elements.safe_push (v);
>> +}
>> +
>>  /* class json::number, a subclass of json::value, wrapping a double.  */
>>
>>  /* Implementation of json::value::print for json::number.  */
>> @@ -140,6 +152,16 @@ number::print (pretty_printer *pp) const
>>
>>  /* class json::string, a subclass of json::value.  */
>>
>> +/* json::string's ctor.  */
>> +
>> +string::string (const char *utf8)
>> +{
>> +  gcc_assert (utf8);
>> +  m_utf8 = xstrdup (utf8);
>> +}
>> +
>> +/* Implementation of json::value::print for json::string.  */
>> +
>>  void
>>  string::print (pretty_printer *pp) const
>>  {
>> diff --git a/gcc/json.h b/gcc/json.h
>> index 5c3274c..154d9e1 100644
>> --- a/gcc/json.h
>> +++ b/gcc/json.h
>> @@ -107,7 +107,7 @@ class array : public value
>>enum kind get_kind () const FINAL OVERRIDE { return JSON_ARRAY; }
>>void print (pretty_printer *pp) const FINAL OVERRIDE;
>>
>> -  void append (value *v) { m_elements.safe_push (v); }
>> +  void append (value *v);
>>
>>   private:
>>auto_vec m_elements;
>> @@ -134,7 +134,7 @@ class number : public value
>>  class string : public value
>>  {
>>   public:
>> -  string (const char *utf8) : m_utf8 (xstrdup (utf8)) {}
>> +  string (const char *utf8);
>>~string () { free (m_utf8); }
>>
>>enum kind get_kind () const FINAL OVERRIDE { return JSON_STRING; }
>> diff --git a/gcc/optinfo-emit-json.cc b/gcc/optinfo-emit-json.cc
>> index bf1172a..6460a81 100644
>> --- a/gcc/optinfo-emit-json.cc
>> +++ b/gcc/optinfo-emit-json.cc
>> @@ -202,10 +202,12 @@ optrecord_json_writer::impl_location_to_json 
>> (dump_impl_location_t loc)
>>  json::object *
>>  optrecord_json_writer::location_to_json (location_t loc)
>>  {
>> +  gcc_assert (LOCATION_LOCUS (loc

Re: [PATCH] Add linker_output as prefix for LTO temps (PR lto/86548).

2018-07-26 Thread Richard Biener
On Thu, Jul 26, 2018 at 2:12 PM Martin Liška  wrote:
>
> On 07/26/2018 01:34 PM, Richard Biener wrote:
> > On Thu, Jul 26, 2018 at 12:55 PM Martin Liška  wrote:
> >>
> >> Hi.
> >>
> >> As requested in the PR, now we produce prefixes for temp files in LTO:
> >>
> >> Example:
> >> $ gcc -flto main.o a.o --save-temps -o mybinary
> >>
> >> generates:
> >> $ ls /tmp/mybinary*
> >> /tmp/mybinary  /tmp/mybinary.ltrans0.o  /tmp/mybinary.ltrans0.s  
> >> /tmp/mybinary.ltrans.out
> >
> > It will be /tmp/mybinary.abc421.ltrans0.o
> > /tmp/mybinary.abc421.ltrans1.o, etc., correct?
>
> Yes, --save-temps changes which file names are used. Before patch:
>
> $ strace -f -s512 gcc -flto a.o main.o -o mybinary 2>&1 | grep execv | grep 
> ltrans
> [pid 23926] 
> execve("/home/marxin/bin/gcc/lib/gcc/x86_64-pc-linux-gnu/9.0.0/lto1", 
> ["/home/marxin/bin/gcc/lib/gcc/x86_64-pc-linux-gnu/9.0.0/lto1", "-quiet", 
> "-dumpdir", "./", "-dumpbase", "mybinary.wpa", "-mtune=generic", 
> "-march=x86-64", "-mtune=generic", "-march=x86-64", "-auxbase", "a", 
> "-fno-openmp", "-fno-openacc", 
> "-fltrans-output-list=/tmp/ccPVzNR6.ltrans.out", "-fwpa", 
> "-fresolution=/tmp/ccuocrhY.res", "-flinker-output=exec", "@/tmp/ccHQA575"], 
> 0x7fffd960 /* 105 vars */ 
> [pid 23928] 
> execve("/home/marxin/bin/gcc/lib/gcc/x86_64-pc-linux-gnu/9.0.0/lto1", 
> ["/home/marxin/bin/gcc/lib/gcc/x86_64-pc-linux-gnu/9.0.0/lto1", "-quiet", 
> "-dumpdir", "./", "-dumpbase", "mybinary.ltrans0", "-mtune=generic", 
> "-march=x86-64", "-mtune=generic", "-march=x86-64", "-auxbase-strip", 
> "/tmp/ccPVzNR6.ltrans0.ltrans.o", "-fno-openmp", "-fno-openacc", "-fltrans", 
> "@/tmp/ccssDdS8", "-o", "/tmp/cclsKY4G.s"], 0x7fffd960 /* 105 vars */ 
> 
> [pid 23929] execve("/home/marxin/bin/gcc/bin//as", ["as", "--64", "-o", 
> "/tmp/ccPVzNR6.ltrans0.ltrans.o", "/tmp/cclsKY4G.s"], 0x7fffd960 /* 105 
> vars */) = -1 ENOENT (No such file or directory)
> [pid 23929] execve("/home/marxin/bin/as", ["as", "--64", "-o", 
> "/tmp/ccPVzNR6.ltrans0.ltrans.o", "/tmp/cclsKY4G.s"], 0x7fffd960 /* 105 
> vars */) = -1 ENOENT (No such file or directory)
> [pid 23929] execve("/usr/local/bin/as", ["as", "--64", "-o", 
> "/tmp/ccPVzNR6.ltrans0.ltrans.o", "/tmp/cclsKY4G.s"], 0x7fffd960 /* 105 
> vars */) = -1 ENOENT (No such file or directory)
> [pid 23929] execve("/usr/bin/as", ["as", "--64", "-o", 
> "/tmp/ccPVzNR6.ltrans0.ltrans.o", "/tmp/cclsKY4G.s"], 0x7fffd960 /* 105 
> vars */ 
>
> after:
>
> $ strace -f -s512 gcc -flto a.o main.o -o mybinary 2>&1 | grep execv | grep 
> ltrans
> [pid 16379] 
> execve("/home/marxin/bin/gcc/lib/gcc/x86_64-pc-linux-gnu/9.0.0/lto1", 
> ["/home/marxin/bin/gcc/lib/gcc/x86_64-pc-linux-gnu/9.0.0/lto1", "-quiet", 
> "-dumpdir", "./", "-dumpbase", "mybinary.wpa", "-mtune=generic", 
> "-march=x86-64", "-mtune=generic", "-march=x86-64", "-auxbase", "a", 
> "-fno-openmp", "-fno-openacc", 
> "-fltrans-output-list=/tmp/mybinary.VkVmXd.ltrans.out", "-fwpa", 
> "-fresolution=/tmp/ccnQ6e55.res", "-flinker-output=exec", "@/tmp/cc0Sv5Fe"], 
> 0x7fffd960 /* 105 vars */ 
> [pid 16381] 
> execve("/home/marxin/bin/gcc/lib/gcc/x86_64-pc-linux-gnu/9.0.0/lto1", 
> ["/home/marxin/bin/gcc/lib/gcc/x86_64-pc-linux-gnu/9.0.0/lto1", "-quiet", 
> "-dumpdir", "./", "-dumpbase", "mybinary.ltrans0", "-mtune=generic", 
> "-march=x86-64", "-mtune=generic", "-march=x86-64", "-auxbase-strip", 
> "/tmp/mybinary.VkVmXd.ltrans0.ltrans.o", "-fno-openmp", "-fno-openacc", 
> "-fltrans", "@/tmp/ccDY6Ojf", "-o", "/tmp/ccrmLPAG.s"], 0x7fffd960 /* 105 
> vars */ 
> [pid 16382] execve("/home/marxin/bin/gcc/bin//as", ["as", "--64", "-o", 
> "/tmp/mybinary.VkVmXd.ltrans0.ltrans.o", "/tmp/ccrmLPAG.s"], 0x7fffd960 
> /* 105 vars */) = -1 ENOENT (No such file or directory)
> [pid 16382] execve("/home/marxin/bin/as", ["as", "--64", "-o", 
> "/tmp/mybinary.VkVmXd.ltrans0.ltrans.o", "/tmp/ccrmLPAG.s"], 0x7fffd960 
> /* 105 vars */) = -1 ENOENT (No such file or directory)
> [pid 16382] execve("/usr/local/bin/as", ["as", "--64", "-o", 
> "/tmp/mybinary.VkVmXd.ltrans0.ltrans.o", "/tmp/ccrmLPAG.s"], 0x7fffd960 
> /* 105 vars */) = -1 ENOENT (No such file or directory)
> [pid 16382] execve("/usr/bin/as", ["as", "--64", "-o", 
> "/tmp/mybinary.VkVmXd.ltrans0.ltrans.o", "/tmp/ccrmLPAG.s"], 0x7fffd960 
> /* 105 vars */ 
>
>
> >
> > Otherwise there's the chance to trash user files which isn't good.
>
> Sure, the patch behaves fine. I'll install it.

Btw, it would be more natural if with -save-temps those files were in
the same directory as the output (see how we handle
-fresolution= for example) and were named without abcd1234 stuff.  That
also avoids $tmp creep if you use -save-temps
multiple times...

Richard.

> Martin
>
> >
> >> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
> >>
> >> Ready to be installed?
> >
> > OK if the above is correct.
> >
> > Thanks,
> > Richard.
> >
> >> Martin
> >>
> >> gcc/ChangeLog:
> >>
> >> 2018-07-26  Martin Liska 

Re: [PATCH 1/7] Add __builtin_speculation_safe_value

2018-07-26 Thread Richard Biener
On Thu, Jul 26, 2018 at 12:03 PM Richard Earnshaw (lists)
 wrote:
>
> On 25/07/18 14:47, Richard Biener wrote:
> > On Wed, Jul 25, 2018 at 2:41 PM Richard Earnshaw (lists)
> >  wrote:
> >>
> >> On 25/07/18 11:36, Richard Biener wrote:
> >>> On Wed, Jul 25, 2018 at 11:49 AM Richard Earnshaw (lists)
> >>>  wrote:
> 
>  On 24/07/18 18:26, Richard Biener wrote:
> > On Mon, Jul 9, 2018 at 6:40 PM Richard Earnshaw
> >  wrote:
> >>
> >>
> >> This patch defines a new intrinsic function
> >> __builtin_speculation_safe_value.  A generic default implementation is
> >> defined which will attempt to use the backend pattern
> >> "speculation_safe_barrier".  If this pattern is not defined, or if it
> >> is not available, then the compiler will emit a warning, but
> >> compilation will continue.
> >>
> >> Note that the test spec-barrier-1.c will currently fail on all
> >> targets.  This is deliberate, the failure will go away when
> >> appropriate action is taken for each target backend.
> >
> > So given this series is supposed to be backported I question
> >
> > +rtx
> > +default_speculation_safe_value (machine_mode mode ATTRIBUTE_UNUSED,
> > +   rtx result, rtx val,
> > +   rtx failval ATTRIBUTE_UNUSED)
> > +{
> > +  emit_move_insn (result, val);
> > +#ifdef HAVE_speculation_barrier
> > +  /* Assume the target knows what it is doing: if it defines a
> > + speculation barrier, but it is not enabled, then assume that one
> > + isn't needed.  */
> > +  if (HAVE_speculation_barrier)
> > +emit_insn (gen_speculation_barrier ());
> > +
> > +#else
> > +  warning_at (input_location, 0,
> > + "this target does not define a speculation barrier; "
> > + "your program will still execute correctly, but 
> > speculation "
> > + "will not be inhibited");
> > +#endif
> > +  return result;
> >
> > which makes all but aarch64 archs warn on 
> > __builtin_speculation_safe_value
> > uses, even those that do not suffer from Spectre like all those 
> > embedded targets
> > where implementations usually do not speculate at all.
> >
> > In fact for those targets the builtin stays in the way of optimization 
> > on GIMPLE
> > as well so we should fold it away early if neither the target hook is
> > implemented
> > nor there is a speculation_barrier insn.
> >
> > So, please make resolve_overloaded_builtin return a no-op on such 
> > targets
> > which means you can remove the above warning.  Maybe such targets
> > shouldn't advertise / initialize the builtins at all?
> 
>  I disagree with your approach here.  Why would users not want to know
>  when the compiler is failing to implement a security feature when it
>  should?  As for targets that don't need something, they can easily
>  define the hook as described to suppress the warning.
> 
>  Or are you just suggesting moving the warning to resolve overloaded 
>  builtin.
> >>>
> >>> Well.  You could argue I say we shouldn't even support
> >>> __builtin_speculation_safe_value
> >>> for archs that do not need it or have it not implemented.  That way users 
> >>> can
> >>> decide:
> >>>
> >>> #if __HAVE_SPECULATION_SAFE_VALUE
> >>>  
> >>> #else
> >>> #warning oops // or nothing
> >>> #endif
> >>>
> >>
> >> So how about removing the predefine of __HAVE_S_S_V when the builtin is
> >> a nop, but then leaving the warning in if people try to use it anyway?
> >
> > Little bit inconsistent but I guess I could live with that.  It still leaves
> > the question open for how to declare you do not need speculation
> > barriers at all then.
> >
>  Other ports will need to take action, but in general, it can be as
>  simple as, eg patch 2 or 3 do for the Arm and AArch64 backends - or
>  simpler still if nothing is needed for that architecture.
> >>>
> >>> Then that should be the default.  You might argue we'll only see
> >>> __builtin_speculation_safe_value uses for things like Firefox which
> >>> is unlikely built for AVR (just to make an example).  But people
> >>> are going to test build just on x86 and if they build with -Werror
> >>> this will break builds on all targets that didn't even get the chance
> >>> to implement this feature.
> >>>
>  There is a test which is intended to fail on targets that have not yet
>  been patched - I thought that was better than hard-failing the build,
>  especially given that we want to back-port.
> 
>  Port maintainers DO need to decide what to do about speculation, even if
>  it is explicitly that no mitigation is needed.
> >>>
> >>> Agreed.  But I didn't yet see a request for maintainers to decide that?
> >>>
> >>
> >> consider it made, then :-)
> >
> > I suspect that drew t

Re: [patch] adjust default nvptx launch geometry for OpenACC offloaded regions

2018-07-26 Thread Tom de Vries
>> Right, in fact there are two separate things you're trying to address
>> here: launch failure and occupancy heuristic, so split the patch.

> That hunk was small, so I included it with this patch. Although if you
> insist, I can remove it.

Please, for future reference, always assume that I insist instead of
asking me, unless you have an argument to present why that is not a good
idea. And just to be clear here: "small" is not such an argument.

Please keep in mind ( https://gcc.gnu.org/contribute.html#patches ):
...
Don't mix together changes made for different reasons. Send them
individually.
...

> +  /* Check if the accelerator has sufficient hardware resources to
> + launch the offloaded kernel.  */
> +  if (dims[GOMP_DIM_WORKER] * dims[GOMP_DIM_VECTOR]
> +  > targ_fn->max_threads_per_block)
> +GOMP_PLUGIN_fatal ("The Nvidia accelerator has insufficient resources to"
> +" launch '%s' with num_workers = %d and vector_length ="
> +" %d; recompile the program with 'num_workers = x and"
> +" vector_length = y' on that offloaded region or "
> +"'-fopenacc-dim=-:x:y' where x * y <= %d.\n",
> +targ_fn->launch->fn, dims[GOMP_DIM_WORKER],
> +dims[GOMP_DIM_VECTOR], targ_fn->max_threads_per_block);
> +

This is copied from the state on an openacc branch where vector-length
is variable, and the error message text doesn't make sense on current
trunk for that reason. Also, it suggests a syntax for -fopenacc-dim
that's not supported on trunk.

Committed as attached.

Thanks,
- Tom
[libgomp, nvptx] Add error with recompilation hint for launch failure

Currently, when a kernel is launched with too many workers, it results in a CUDA
launch failure.  This is triggered, for instance, by parallel-loop-1.c at -O0 on
a Quadro M1200.

This patch detects this situation, and errors out with a hint on how to fix it.

Built and reg-tested on x86_64 with nvptx accelerator.

2018-07-26  Cesar Philippidis  
	Tom de Vries  

	* plugin/plugin-nvptx.c (nvptx_exec): Error if the hardware doesn't have
	sufficient resources to launch a kernel, and give a hint on how to fix
	it.

---
 libgomp/plugin/plugin-nvptx.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 5d9b5151e95..3a4077a1315 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -1204,6 +1204,21 @@ nvptx_exec (void (*fn), size_t mapnum, void **hostaddrs, void **devaddrs,
 	  dims[i] = default_dims[i];
 }
 
+  /* Check if the accelerator has sufficient hardware resources to
+ launch the offloaded kernel.  */
+  if (dims[GOMP_DIM_WORKER] * dims[GOMP_DIM_VECTOR]
+  > targ_fn->max_threads_per_block)
+{
+  int suggest_workers
+	= targ_fn->max_threads_per_block / dims[GOMP_DIM_VECTOR];
+  GOMP_PLUGIN_fatal ("The Nvidia accelerator has insufficient resources to"
+			 " launch '%s' with num_workers = %d; recompile the"
+			 " program with 'num_workers = %d' on that offloaded"
+			 " region or '-fopenacc-dim=:%d'",
+			 targ_fn->launch->fn, dims[GOMP_DIM_WORKER],
+			 suggest_workers, suggest_workers);
+}
+
   /* This reserves a chunk of a pre-allocated page of memory mapped on both
  the host and the device. HP is a host pointer to the new chunk, and DP is
  the corresponding device pointer.  */


Re: [PATCH 1/7] Add __builtin_speculation_safe_value

2018-07-26 Thread Richard Earnshaw (lists)
On 26/07/18 13:41, Richard Biener wrote:
> On Thu, Jul 26, 2018 at 12:03 PM Richard Earnshaw (lists)
>  wrote:
>>
>> On 25/07/18 14:47, Richard Biener wrote:
>>> On Wed, Jul 25, 2018 at 2:41 PM Richard Earnshaw (lists)
>>>  wrote:

 On 25/07/18 11:36, Richard Biener wrote:
> On Wed, Jul 25, 2018 at 11:49 AM Richard Earnshaw (lists)
>  wrote:
>>
>> On 24/07/18 18:26, Richard Biener wrote:
>>> On Mon, Jul 9, 2018 at 6:40 PM Richard Earnshaw
>>>  wrote:


 This patch defines a new intrinsic function
 __builtin_speculation_safe_value.  A generic default implementation is
 defined which will attempt to use the backend pattern
 "speculation_safe_barrier".  If this pattern is not defined, or if it
 is not available, then the compiler will emit a warning, but
 compilation will continue.

 Note that the test spec-barrier-1.c will currently fail on all
 targets.  This is deliberate, the failure will go away when
 appropriate action is taken for each target backend.
>>>
>>> So given this series is supposed to be backported I question
>>>
>>> +rtx
>>> +default_speculation_safe_value (machine_mode mode ATTRIBUTE_UNUSED,
>>> +   rtx result, rtx val,
>>> +   rtx failval ATTRIBUTE_UNUSED)
>>> +{
>>> +  emit_move_insn (result, val);
>>> +#ifdef HAVE_speculation_barrier
>>> +  /* Assume the target knows what it is doing: if it defines a
>>> + speculation barrier, but it is not enabled, then assume that one
>>> + isn't needed.  */
>>> +  if (HAVE_speculation_barrier)
>>> +emit_insn (gen_speculation_barrier ());
>>> +
>>> +#else
>>> +  warning_at (input_location, 0,
>>> + "this target does not define a speculation barrier; "
>>> + "your program will still execute correctly, but 
>>> speculation "
>>> + "will not be inhibited");
>>> +#endif
>>> +  return result;
>>>
>>> which makes all but aarch64 archs warn on 
>>> __builtin_speculation_safe_value
>>> uses, even those that do not suffer from Spectre like all those 
>>> embedded targets
>>> where implementations usually do not speculate at all.
>>>
>>> In fact for those targets the builtin stays in the way of optimization 
>>> on GIMPLE
>>> as well so we should fold it away early if neither the target hook is
>>> implemented
>>> nor there is a speculation_barrier insn.
>>>
>>> So, please make resolve_overloaded_builtin return a no-op on such 
>>> targets
>>> which means you can remove the above warning.  Maybe such targets
>>> shouldn't advertise / initialize the builtins at all?
>>
>> I disagree with your approach here.  Why would users not want to know
>> when the compiler is failing to implement a security feature when it
>> should?  As for targets that don't need something, they can easily
>> define the hook as described to suppress the warning.
>>
>> Or are you just suggesting moving the warning to resolve overloaded 
>> builtin.
>
> Well.  You could argue I say we shouldn't even support
> __builtin_speculation_safe_value
> for archs that do not need it or have it not implemented.  That way users 
> can
> decide:
>
> #if __HAVE_SPECULATION_SAFE_VALUE
>  
> #else
> #warning oops // or nothing
> #endif
>

 So how about removing the predefine of __HAVE_S_S_V when the builtin is
 a nop, but then leaving the warning in if people try to use it anyway?
>>>
>>> Little bit inconsistent but I guess I could live with that.  It still leaves
>>> the question open for how to declare you do not need speculation
>>> barriers at all then.
>>>
>> Other ports will need to take action, but in general, it can be as
>> simple as, eg patch 2 or 3 do for the Arm and AArch64 backends - or
>> simpler still if nothing is needed for that architecture.
>
> Then that should be the default.  You might argue we'll only see
> __builtin_speculation_safe_value uses for things like Firefox which
> is unlikely built for AVR (just to make an example).  But people
> are going to test build just on x86 and if they build with -Werror
> this will break builds on all targets that didn't even get the chance
> to implement this feature.
>
>> There is a test which is intended to fail on targets that have not yet
>> been patched - I thought that was better than hard-failing the build,
>> especially given that we want to back-port.
>>
>> Port maintainers DO need to decide what to do about speculation, even if
>> it is explicitly that no mitigation is needed.
>
> Agreed.  But I didn't yet see a request for maintainers to decide that?
>

 consider 

Re: [PATCH 1/7] Add __builtin_speculation_safe_value

2018-07-26 Thread Richard Biener
On Thu, Jul 26, 2018 at 3:06 PM Richard Earnshaw (lists)
 wrote:
>
> On 26/07/18 13:41, Richard Biener wrote:
> > On Thu, Jul 26, 2018 at 12:03 PM Richard Earnshaw (lists)
> >  wrote:
> >>
> >> On 25/07/18 14:47, Richard Biener wrote:
> >>> On Wed, Jul 25, 2018 at 2:41 PM Richard Earnshaw (lists)
> >>>  wrote:
> 
>  On 25/07/18 11:36, Richard Biener wrote:
> > On Wed, Jul 25, 2018 at 11:49 AM Richard Earnshaw (lists)
> >  wrote:
> >>
> >> On 24/07/18 18:26, Richard Biener wrote:
> >>> On Mon, Jul 9, 2018 at 6:40 PM Richard Earnshaw
> >>>  wrote:
> 
> 
>  This patch defines a new intrinsic function
>  __builtin_speculation_safe_value.  A generic default implementation 
>  is
>  defined which will attempt to use the backend pattern
>  "speculation_safe_barrier".  If this pattern is not defined, or if it
>  is not available, then the compiler will emit a warning, but
>  compilation will continue.
> 
>  Note that the test spec-barrier-1.c will currently fail on all
>  targets.  This is deliberate, the failure will go away when
>  appropriate action is taken for each target backend.
> >>>
> >>> So given this series is supposed to be backported I question
> >>>
> >>> +rtx
> >>> +default_speculation_safe_value (machine_mode mode ATTRIBUTE_UNUSED,
> >>> +   rtx result, rtx val,
> >>> +   rtx failval ATTRIBUTE_UNUSED)
> >>> +{
> >>> +  emit_move_insn (result, val);
> >>> +#ifdef HAVE_speculation_barrier
> >>> +  /* Assume the target knows what it is doing: if it defines a
> >>> + speculation barrier, but it is not enabled, then assume that one
> >>> + isn't needed.  */
> >>> +  if (HAVE_speculation_barrier)
> >>> +emit_insn (gen_speculation_barrier ());
> >>> +
> >>> +#else
> >>> +  warning_at (input_location, 0,
> >>> + "this target does not define a speculation barrier; "
> >>> + "your program will still execute correctly, but 
> >>> speculation "
> >>> + "will not be inhibited");
> >>> +#endif
> >>> +  return result;
> >>>
> >>> which makes all but aarch64 archs warn on 
> >>> __builtin_speculation_safe_value
> >>> uses, even those that do not suffer from Spectre like all those 
> >>> embedded targets
> >>> where implementations usually do not speculate at all.
> >>>
> >>> In fact for those targets the builtin stays in the way of 
> >>> optimization on GIMPLE
> >>> as well so we should fold it away early if neither the target hook is
> >>> implemented
> >>> nor there is a speculation_barrier insn.
> >>>
> >>> So, please make resolve_overloaded_builtin return a no-op on such 
> >>> targets
> >>> which means you can remove the above warning.  Maybe such targets
> >>> shouldn't advertise / initialize the builtins at all?
> >>
> >> I disagree with your approach here.  Why would users not want to know
> >> when the compiler is failing to implement a security feature when it
> >> should?  As for targets that don't need something, they can easily
> >> define the hook as described to suppress the warning.
> >>
> >> Or are you just suggesting moving the warning to resolve overloaded 
> >> builtin.
> >
> > Well.  You could argue I say we shouldn't even support
> > __builtin_speculation_safe_value
> > for archs that do not need it or have it not implemented.  That way 
> > users can
> > decide:
> >
> > #if __HAVE_SPECULATION_SAFE_VALUE
> >  
> > #else
> > #warning oops // or nothing
> > #endif
> >
> 
>  So how about removing the predefine of __HAVE_S_S_V when the builtin is
>  a nop, but then leaving the warning in if people try to use it anyway?
> >>>
> >>> Little bit inconsistent but I guess I could live with that.  It still 
> >>> leaves
> >>> the question open for how to declare you do not need speculation
> >>> barriers at all then.
> >>>
> >> Other ports will need to take action, but in general, it can be as
> >> simple as, eg patch 2 or 3 do for the Arm and AArch64 backends - or
> >> simpler still if nothing is needed for that architecture.
> >
> > Then that should be the default.  You might argue we'll only see
> > __builtin_speculation_safe_value uses for things like Firefox which
> > is unlikely built for AVR (just to make an example).  But people
> > are going to test build just on x86 and if they build with -Werror
> > this will break builds on all targets that didn't even get the chance
> > to implement this feature.
> >
> >> There is a test which is intended to fail on targets that have not yet
> >> been patched - I thought that was better than hard-failing the bu

Build fail on gthr-simple.h targets (Re: AsyncI/O patch committed)

2018-07-26 Thread Ulrich Weigand
Nicholas Koenig wrote:

> Hello everyone,
> 
> I have committed the async I/O patch as r262978.
> 
> The test cases are in libgomp.fortran for now, maybe that can be changed 
> later.

It looks like this broke building libgfortran on spu, and presumably
any platform that uses gthr-simple instead of gthr-posix.

The problem is that io/asynch.h unconditionally uses a couple of
features that are not provided by gthr-simplex, in particular
  __gthread_cond_t
and
  __gthread_equal / __gthread_self

According to the documentation in gthr.h, the former is only available
if __GTHREAD_HAS_COND is defined, and the latter are only available if
__GTHREADS_CXX0X is defined.  Neither is true for gthr-simple.h.

To fix the build error, either libgfortran should only use those features
conditionally on those defines, or else the gthr.h logic needs to be
changed and (stubs for) those features provided in gthr-simple.h as well.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



[PATCH, rs6000] Replace __uint128_t and __int128_t with __uint128 and __int128 in PowerPC built-in documentation

2018-07-26 Thread Kelvin Nilsen
To improve internal consistency and to improve consistency with published ABI 
documents, this patch replaces the __uint128_t type with __uint128 and replaces 
__int128_t with __int128.

I have built and regression tested this patch on powerpc64le-unknown-linux with 
no regressions.  I have also built and reviewed the gcc.pdf file.

Is this ok for trunk?

gcc/ChangeLog:

2018-07-25  Kelvin Nilsen  

* doc/extend.texi (Basic PowerPC Built-in Functions Available on
ISA 2.05):  Replace __uint128_t with __uint128 and __int128_t with
__int128 in built-in function prototypes.
(PowerPC AltiVec Built-in Functions on ISA 2.07): Likewise.
(PowerPC AltiVec Built-in Functions on ISA 3.0): Likewise.

Index: gcc/doc/extend.texi
===
--- gcc/doc/extend.texi (revision 262977)
+++ gcc/doc/extend.texi (working copy)
@@ -15762,9 +15762,9 @@ long long __builtin_divde (long long, long long);
 unsigned long long __builtin_divdeu (unsigned long long, unsigned long long);
 int __builtin_divwe (int, int);
 unsigned int __builtin_divweu (unsigned int, unsigned int);
-vector __int128_t __builtin_pack_vector_int128 (long long, long long);
+vector __int128 __builtin_pack_vector_int128 (long long, long long);
 void __builtin_rs6000_speculation_barrier (void);
-long long __builtin_unpack_vector_int128 (vector __int128_t, signed char);
+long long __builtin_unpack_vector_int128 (vector __int128, signed char);
 @end smallexample
 
 Of these, the @code{__builtin_divde} and @code{__builtin_divdeu} functions
@@ -18331,57 +18331,57 @@ vector unsigned long long vec_vupklsw (vector int)
 If the ISA 2.07 additions to the vector/scalar (power8-vector)
 instruction set are available, the following additional functions are
 available for 64-bit targets.  New vector types
-(@var{vector __int128_t} and @var{vector __uint128_t}) are available
-to hold the @var{__int128_t} and @var{__uint128_t} types to use these
+(@var{vector __int128} and @var{vector __uint128}) are available
+to hold the @var{__int128} and @var{__uint128} types to use these
 builtins.
 
 The normal vector extract, and set operations work on
-@var{vector __int128_t} and @var{vector __uint128_t} types,
+@var{vector __int128} and @var{vector __uint128} types,
 but the index value must be 0.
 
 @smallexample
-vector __int128_t vec_vaddcuq (vector __int128_t, vector __int128_t);
-vector __uint128_t vec_vaddcuq (vector __uint128_t, vector __uint128_t);
+vector __int128 vec_vaddcuq (vector __int128, vector __int128);
+vector __uint128 vec_vaddcuq (vector __uint128, vector __uint128);
 
-vector __int128_t vec_vadduqm (vector __int128_t, vector __int128_t);
-vector __uint128_t vec_vadduqm (vector __uint128_t, vector __uint128_t);
+vector __int128 vec_vadduqm (vector __int128, vector __int128);
+vector __uint128 vec_vadduqm (vector __uint128, vector __uint128);
 
-vector __int128_t vec_vaddecuq (vector __int128_t, vector __int128_t,
-vector __int128_t);
-vector __uint128_t vec_vaddecuq (vector __uint128_t, vector __uint128_t,
- vector __uint128_t);
+vector __int128 vec_vaddecuq (vector __int128, vector __int128,
+vector __int128);
+vector __uint128 vec_vaddecuq (vector __uint128, vector __uint128,
+ vector __uint128);
 
-vector __int128_t vec_vaddeuqm (vector __int128_t, vector __int128_t,
-vector __int128_t);
-vector __uint128_t vec_vaddeuqm (vector __uint128_t, vector __uint128_t,
- vector __uint128_t);
+vector __int128 vec_vaddeuqm (vector __int128, vector __int128,
+vector __int128);
+vector __uint128 vec_vaddeuqm (vector __uint128, vector __uint128,
+ vector __uint128);
 
-vector __int128_t vec_vsubecuq (vector __int128_t, vector __int128_t,
-vector __int128_t);
-vector __uint128_t vec_vsubecuq (vector __uint128_t, vector __uint128_t,
- vector __uint128_t);
+vector __int128 vec_vsubecuq (vector __int128, vector __int128,
+vector __int128);
+vector __uint128 vec_vsubecuq (vector __uint128, vector __uint128,
+ vector __uint128);
 
-vector __int128_t vec_vsubeuqm (vector __int128_t, vector __int128_t,
-vector __int128_t);
-vector __uint128_t vec_vsubeuqm (vector __uint128_t, vector __uint128_t,
- vector __uint128_t);
+vector __int128 vec_vsubeuqm (vector __int128, vector __int128,
+vector __int128);
+vector __uint128 vec_vsubeuqm (vector __uint128, vector __uint128,
+ vector __uint128);
 
-vector __int128_t vec_vsubcuq (vector __int128_t, vector __int128_t);
-vector __uint128_t vec_vsubcu

Re: Build fail on gthr-single.h targets (Re: AsyncI/O patch committed)

2018-07-26 Thread Ulrich Weigand
I wrote:
> Nicholas Koenig wrote:
> 
> > Hello everyone,
> > 
> > I have committed the async I/O patch as r262978.
> > 
> > The test cases are in libgomp.fortran for now, maybe that can be changed 
> > later.
> 
> It looks like this broke building libgfortran on spu, and presumably
> any platform that uses gthr-simple instead of gthr-posix.

The file is called gthr-single.h, not gthr-simple.h ... sorry for the typo.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH] Add linker_output as prefix for LTO temps (PR lto/86548).

2018-07-26 Thread Martin Liška
On 07/26/2018 02:26 PM, Richard Biener wrote:
> On Thu, Jul 26, 2018 at 2:12 PM Martin Liška  wrote:
>>
>> On 07/26/2018 01:34 PM, Richard Biener wrote:
>>> On Thu, Jul 26, 2018 at 12:55 PM Martin Liška  wrote:

 Hi.

 As requested in the PR, now we produce prefixes for temp files in LTO:

 Example:
 $ gcc -flto main.o a.o --save-temps -o mybinary

 generates:
 $ ls /tmp/mybinary*
 /tmp/mybinary  /tmp/mybinary.ltrans0.o  /tmp/mybinary.ltrans0.s  
 /tmp/mybinary.ltrans.out
>>>
>>> It will be /tmp/mybinary.abc421.ltrans0.o
>>> /tmp/mybinary.abc421.ltrans1.o, etc., correct?
>>
>> Yes, --save-temps changes which file names are used. Before patch:
>>
>> $ strace -f -s512 gcc -flto a.o main.o -o mybinary 2>&1 | grep execv | grep ltrans
>> [pid 23926] 
>> execve("/home/marxin/bin/gcc/lib/gcc/x86_64-pc-linux-gnu/9.0.0/lto1", 
>> ["/home/marxin/bin/gcc/lib/gcc/x86_64-pc-linux-gnu/9.0.0/lto1", "-quiet", 
>> "-dumpdir", "./", "-dumpbase", "mybinary.wpa", "-mtune=generic", 
>> "-march=x86-64", "-mtune=generic", "-march=x86-64", "-auxbase", "a", 
>> "-fno-openmp", "-fno-openacc", 
>> "-fltrans-output-list=/tmp/ccPVzNR6.ltrans.out", "-fwpa", 
>> "-fresolution=/tmp/ccuocrhY.res", "-flinker-output=exec", "@/tmp/ccHQA575"], 
>> 0x7fffd960 /* 105 vars */ 
>> [pid 23928] 
>> execve("/home/marxin/bin/gcc/lib/gcc/x86_64-pc-linux-gnu/9.0.0/lto1", 
>> ["/home/marxin/bin/gcc/lib/gcc/x86_64-pc-linux-gnu/9.0.0/lto1", "-quiet", 
>> "-dumpdir", "./", "-dumpbase", "mybinary.ltrans0", "-mtune=generic", 
>> "-march=x86-64", "-mtune=generic", "-march=x86-64", "-auxbase-strip", 
>> "/tmp/ccPVzNR6.ltrans0.ltrans.o", "-fno-openmp", "-fno-openacc", "-fltrans", 
>> "@/tmp/ccssDdS8", "-o", "/tmp/cclsKY4G.s"], 0x7fffd960 /* 105 vars */ 
>> 
>> [pid 23929] execve("/home/marxin/bin/gcc/bin//as", ["as", "--64", "-o", 
>> "/tmp/ccPVzNR6.ltrans0.ltrans.o", "/tmp/cclsKY4G.s"], 0x7fffd960 /* 105 
>> vars */) = -1 ENOENT (No such file or directory)
>> [pid 23929] execve("/home/marxin/bin/as", ["as", "--64", "-o", 
>> "/tmp/ccPVzNR6.ltrans0.ltrans.o", "/tmp/cclsKY4G.s"], 0x7fffd960 /* 105 
>> vars */) = -1 ENOENT (No such file or directory)
>> [pid 23929] execve("/usr/local/bin/as", ["as", "--64", "-o", 
>> "/tmp/ccPVzNR6.ltrans0.ltrans.o", "/tmp/cclsKY4G.s"], 0x7fffd960 /* 105 
>> vars */) = -1 ENOENT (No such file or directory)
>> [pid 23929] execve("/usr/bin/as", ["as", "--64", "-o", 
>> "/tmp/ccPVzNR6.ltrans0.ltrans.o", "/tmp/cclsKY4G.s"], 0x7fffd960 /* 105 
>> vars */ 
>>
>> after:
>>
>> $ strace -f -s512 gcc -flto a.o main.o -o mybinary 2>&1 | grep execv | grep ltrans
>> [pid 16379] 
>> execve("/home/marxin/bin/gcc/lib/gcc/x86_64-pc-linux-gnu/9.0.0/lto1", 
>> ["/home/marxin/bin/gcc/lib/gcc/x86_64-pc-linux-gnu/9.0.0/lto1", "-quiet", 
>> "-dumpdir", "./", "-dumpbase", "mybinary.wpa", "-mtune=generic", 
>> "-march=x86-64", "-mtune=generic", "-march=x86-64", "-auxbase", "a", 
>> "-fno-openmp", "-fno-openacc", 
>> "-fltrans-output-list=/tmp/mybinary.VkVmXd.ltrans.out", "-fwpa", 
>> "-fresolution=/tmp/ccnQ6e55.res", "-flinker-output=exec", "@/tmp/cc0Sv5Fe"], 
>> 0x7fffd960 /* 105 vars */ 
>> [pid 16381] 
>> execve("/home/marxin/bin/gcc/lib/gcc/x86_64-pc-linux-gnu/9.0.0/lto1", 
>> ["/home/marxin/bin/gcc/lib/gcc/x86_64-pc-linux-gnu/9.0.0/lto1", "-quiet", 
>> "-dumpdir", "./", "-dumpbase", "mybinary.ltrans0", "-mtune=generic", 
>> "-march=x86-64", "-mtune=generic", "-march=x86-64", "-auxbase-strip", 
>> "/tmp/mybinary.VkVmXd.ltrans0.ltrans.o", "-fno-openmp", "-fno-openacc", 
>> "-fltrans", "@/tmp/ccDY6Ojf", "-o", "/tmp/ccrmLPAG.s"], 0x7fffd960 /* 
>> 105 vars */ 
>> [pid 16382] execve("/home/marxin/bin/gcc/bin//as", ["as", "--64", "-o", 
>> "/tmp/mybinary.VkVmXd.ltrans0.ltrans.o", "/tmp/ccrmLPAG.s"], 0x7fffd960 
>> /* 105 vars */) = -1 ENOENT (No such file or directory)
>> [pid 16382] execve("/home/marxin/bin/as", ["as", "--64", "-o", 
>> "/tmp/mybinary.VkVmXd.ltrans0.ltrans.o", "/tmp/ccrmLPAG.s"], 0x7fffd960 
>> /* 105 vars */) = -1 ENOENT (No such file or directory)
>> [pid 16382] execve("/usr/local/bin/as", ["as", "--64", "-o", 
>> "/tmp/mybinary.VkVmXd.ltrans0.ltrans.o", "/tmp/ccrmLPAG.s"], 0x7fffd960 
>> /* 105 vars */) = -1 ENOENT (No such file or directory)
>> [pid 16382] execve("/usr/bin/as", ["as", "--64", "-o", 
>> "/tmp/mybinary.VkVmXd.ltrans0.ltrans.o", "/tmp/ccrmLPAG.s"], 0x7fffd960 
>> /* 105 vars */ 
>>
>>
>>>
>>> Otherwise there's the chance to trash user files which isn't good.
>>
>> Sure, the patch behaves fine. I'll install it.
> 
> Btw, it would be more natural if with -save-temps those files were in
> the same directory as the output (see how we handle
> -fresolution= for example) and be named without abcd1234 stuff.  That
> also avoids $tmp creep if you use -save-temps
> multiple times...

If I see correctly they are:

$ strace -f -s512 gcc -flto a.o main.o -o mybinary --save-temps 2>&1 | grep execv | grep ltrans
[pid 11343] 
execv

[PATCH 1/8] Remove dependency on _GLIBCXX_USE_C99_STDINT_TR1

2018-07-26 Thread jwakely
From: Jonathan Wakely 

By adding fallback definitions of std::intmax_t and std::uintmax_t it's
possible to define  without _GLIBCXX_USE_C99_STDINT_TR1. This in
turn allows most of  to be defined, which removes the dependency
on _GLIBCXX_USE_C99_STDINT_TR1 for all of the C++11 concurrency features.

The compiler defines __INTMAX_TYPE__ and __UINTMAX_TYPE__
unconditionally so it should be safe to rely on them.
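A minimal sketch of the fallback described above, assuming a GCC- or Clang-compatible compiler that predefines __INTMAX_TYPE__ and __UINTMAX_TYPE__. The namespace demo and the long long last-resort branch are hypothetical and only illustrate the idea; they are not the actual change to include/c_global/cstdint.

```cpp
#include <type_traits>

// Illustrative only: name the widest integer types without <cstdint>,
// using the macros the compiler predefines unconditionally.
namespace demo
{
#if defined(__INTMAX_TYPE__) && defined(__UINTMAX_TYPE__)
  typedef __INTMAX_TYPE__  intmax_t;   // e.g. expands to "long int"
  typedef __UINTMAX_TYPE__ uintmax_t;  // e.g. expands to "long unsigned int"
#else
  // Hypothetical last resort: assume long long is the widest type.
  typedef long long          intmax_t;
  typedef unsigned long long uintmax_t;
#endif
} // namespace demo

// The fallback must yield a signed/unsigned pair of the same width.
static_assert(std::is_signed<demo::intmax_t>::value, "intmax_t is signed");
static_assert(std::is_unsigned<demo::uintmax_t>::value, "uintmax_t is unsigned");
static_assert(sizeof(demo::intmax_t) == sizeof(demo::uintmax_t),
              "intmax_t and uintmax_t have the same width");
```

Because the macros are always defined by GCC, the #else branch should never be taken there; it is shown only to make the assumption explicit.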

* include/bits/atomic_futex.h [!_GLIBCXX_USE_C99_STDINT_TR1]
(__atomic_futex_unsigned_base): Remove dependency on
_GLIBCXX_USE_C99_STDINT_TR1 macro.
* include/bits/unique_lock.h [!_GLIBCXX_USE_C99_STDINT_TR1]
(unique_lock): Remove dependency on _GLIBCXX_USE_C99_STDINT_TR1.
* include/c_global/cstdint [!_GLIBCXX_USE_C99_STDINT_TR1] (intmax_t)
(uintmax_t): Define using predefined macros.
* include/std/chrono [!_GLIBCXX_USE_C99_STDINT_TR1] (duration)
(time_point, system_clock, high_resolution_clock, steady_clock): Remove
dependency on _GLIBCXX_USE_C99_STDINT_TR1 macro.
(nanoseconds, microseconds, milliseconds, seconds, minutes, hours):
[!_GLIBCXX_USE_C99_STDINT_TR1]: Define using __INT64_TYPE__ or
long long when  is not usable.
* include/std/condition_variable [!_GLIBCXX_USE_C99_STDINT_TR1]
(condition_variable, condition_variable_any): Remove dependency on
_GLIBCXX_USE_C99_STDINT_TR1.
* include/std/future [!_GLIBCXX_USE_C99_STDINT_TR1] (future, promise)
(packaged_task, async): Likewise.
* include/std/mutex [!_GLIBCXX_USE_C99_STDINT_TR1] (recursive_mutex)
(timed_mutex, recursive_timed_mutex, try_lock, lock, scoped_lock)
(once_flag, call_once): Likewise.
* include/std/ratio [!_GLIBCXX_USE_C99_STDINT_TR1] (ratio): Likewise.
* include/std/shared_mutex [!_GLIBCXX_USE_C99_STDINT_TR1]
(shared_mutex, shared_timed_mutex, shared_lock): Likewise.
* include/std/thread [!_GLIBCXX_USE_C99_STDINT_TR1] (thread)
(this_thread::get_id, this_thread::yield, this_thread::sleep_for)
(this_thread::sleep_until): Likewise.
* src/c++11/chrono.cc: Remove dependency on
_GLIBCXX_USE_C99_STDINT_TR1 macro.
* src/c++11/condition_variable.cc: Likewise.
* src/c++11/futex.cc: Likewise.
* src/c++11/future.cc: Likewise.
* src/c++11/mutex.cc: Likewise.
* src/c++11/thread.cc: Likewise.
* testsuite/20_util/duration/literals/range_neg.cc: Adjust dg-error.
* testsuite/20_util/duration/requirements/typedefs_neg1.cc: Likewise.
* testsuite/20_util/duration/requirements/typedefs_neg2.cc: Likewise.
* testsuite/20_util/duration/requirements/typedefs_neg3.cc: Likewise.
* testsuite/20_util/ratio/cons/cons_overflow_neg.cc: Likewise.
* testsuite/20_util/ratio/operations/ops_overflow_neg.cc: Likewise.

diff --git a/libstdc++-v3/ChangeLog b/libstdc++-v3/ChangeLog
index f0855a6cd91..a3665ee8b6a 100644
--- a/libstdc++-v3/ChangeLog
+++ b/libstdc++-v3/ChangeLog
@@ -1,3 +1,46 @@
+2018-07-26  Jonathan Wakely  
+
+   * include/bits/atomic_futex.h [!_GLIBCXX_USE_C99_STDINT_TR1]
+   (__atomic_futex_unsigned_base): Remove dependency on
+   _GLIBCXX_USE_C99_STDINT_TR1 macro.
+   * include/bits/unique_lock.h [!_GLIBCXX_USE_C99_STDINT_TR1]
+   (unique_lock): Remove dependency on _GLIBCXX_USE_C99_STDINT_TR1.
+   * include/c_global/cstdint [!_GLIBCXX_USE_C99_STDINT_TR1] (intmax_t)
+   (uintmax_t): Define using predefined macros.
+   * include/std/chrono [!_GLIBCXX_USE_C99_STDINT_TR1] (duration)
+   (time_point, system_clock, high_resolution_clock, steady_clock): Remove
+   dependency on _GLIBCXX_USE_C99_STDINT_TR1 macro.
+   (nanoseconds, microseconds, milliseconds, seconds, minutes, hours):
+   [!_GLIBCXX_USE_C99_STDINT_TR1]: Define using __INT64_TYPE__ or
+   long long when  is not usable.
+   * include/std/condition_variable [!_GLIBCXX_USE_C99_STDINT_TR1]
+   (condition_variable, condition_variable_any): Remove dependency on
+   _GLIBCXX_USE_C99_STDINT_TR1.
+   * include/std/future [!_GLIBCXX_USE_C99_STDINT_TR1] (future, promise)
+   (packaged_task, async): Likewise.
+   * include/std/mutex [!_GLIBCXX_USE_C99_STDINT_TR1] (recursive_mutex)
+   (timed_mutex, recursive_timed_mutex, try_lock, lock, scoped_lock)
+   (once_flag, call_once): Likewise.
+   * include/std/ratio [!_GLIBCXX_USE_C99_STDINT_TR1] (ratio): Likewise.
+   * include/std/shared_mutex [!_GLIBCXX_USE_C99_STDINT_TR1]
+   (shared_mutex, shared_timed_mutex, shared_lock): Likewise.
+   * include/std/thread [!_GLIBCXX_USE_C99_STDINT_TR1] (thread)
+   (this_thread::get_id, this_thread::yield, this_thread::sleep_for)
+   (this_thread::sleep_until): Likewise.
+   * src/c++11/chrono.cc: Remove dependency on
+   _GLIBCXX_USE_C99_STDINT_TR1 macro.
+   * src/c++11/condition_variable.cc: L

[PATCH 0/8] Reduce/remove dependencies on _GLIBCXX_USE_C99_STDINT_TR1

2018-07-26 Thread jwakely
From: Jonathan Wakely 

Currently huge swathes of the library are only enabled conditionally by:

#ifdef _GLIBCXX_USE_C99_STDINT_TR1

This macro was created as part of the TR1 implementation, to detect whether
the C++98 compiler has access to a working  header from C99. In
C++11 that header is required, and may even be provided by GCC itself. Having
a large portion of the C++11 library depend on a feature that is almost
guaranteed to be present for C++11 just complicates and obfuscates the code.

There are also a number of places that use features that depend on the macro,
but aren't guarded by the macro. This means if the macro were to be undefined
for some target, the library wouldn't even build!

Several of the dependencies turn out to be unnecessary. For example every
instantiation of strings and streams using char16_t was guarded by the macro,
because char_traits wants to use std::uint_least16_t (and similarly for
char32_t). We can define good-enough char_traits specializations even if the
 types are not available. Every use of  is guarded by the
macro, because  depends on  and that uses std::intmax_t and
std::uintmax_t. By defining those two types in  even when we don't
have a working  we can define most of the C++11 concurrency library
unconditionally (or to be only conditional on _GLIBCXX_HAS_GTHREADS).

The remaining dependencies are related to , which makes heavy use of
the  types. I haven't tried to do anything about that, but have
added some missing checks for the macro, and some missing dg-require-cstdint
directives to tests that depend on  or .

Tested powerpc64le-linux, committed to trunk.



[PATCH 3/8] Modify some library internals to work without

2018-07-26 Thread jwakely
From: Jonathan Wakely 

std::__detail::__clp2 used uint_fast32_t and uint_fast64_t without
checking _GLIBCXX_USE_C99_STDINT_TR1 which was a potential bug. A
simpler implementation based on the new std::__ceil2 code performs
better and doesn't depend on  types.
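For comparison, here is a minimal sketch of both formulations as free functions. The names clp2_clz and clp2_smear are hypothetical, and the clz version assumes std::size_t has the same width as unsigned long (true on LP64 and ILP32 targets) so that __builtin_clzl can be applied to it.

```cpp
#include <cstddef>
#include <limits>

// Assumption of this sketch: size_t and unsigned long have the same width.
static_assert(sizeof(std::size_t) == sizeof(unsigned long),
              "sketch assumes size_t and unsigned long have the same width");

// Smallest power of two not less than n; 0 and 1 map to themselves.
// Count-leading-zeros form, in the style of the new __clp2.
// n must not exceed the largest power of two representable in size_t.
constexpr std::size_t clp2_clz(std::size_t n) noexcept
{
  return n < 2 ? n
               : std::size_t(1) << (std::numeric_limits<unsigned long>::digits
                                    - __builtin_clzl(n - 1ul));
}

// Bit-smearing form (Hacker's Delight, Figure 3-3) as in the old __clp2,
// written as a loop over shift distances 1, 2, 4, ... (C++14 constexpr).
constexpr std::size_t clp2_smear(std::size_t n) noexcept
{
  std::size_t x = n - 1;  // wraps to SIZE_MAX for n == 0, so the result is 0
  for (int s = 1; s < std::numeric_limits<std::size_t>::digits; s <<= 1)
    x |= x >> s;          // copy the highest set bit into every lower bit
  return x + 1;
}
```

The clz form replaces the chain of dependent shift/or steps with a single count-leading-zeros operation, which is typically one instruction on common targets.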

std::align and other C++11 functions in  were unnecessarily
missing when _GLIBCXX_USE_C99_STDINT_TR1 was not defined.

* include/bits/hashtable_policy.h (__detail::__clp2): Use faster
implementation that doesn't depend on  types.
* include/std/memory (align) [!_GLIBCXX_USE_C99_STDINT_TR1]: Use
std::size_t when std::uintptr_t is not usable.
[!_GLIBCXX_USE_C99_STDINT_TR1] (pointer_safety, declare_reachable)
(undeclare_reachable, declare_no_pointers, undeclare_no_pointers):
Define independent of _GLIBCXX_USE_C99_STDINT_TR1.
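The rounding step in std::align only needs unsigned integer arithmetic on the address, which is why std::size_t can stand in for std::uintptr_t as long as it is at least as wide as a pointer. A minimal integer-only sketch, with hypothetical helper names align_up and align_padding, that models the address as a plain number rather than converting a real pointer:

```cpp
#include <cstddef>

// Same precondition as the fallback in the patch: a pointer value fits in size_t.
static_assert(sizeof(std::size_t) >= sizeof(void*),
              "std::size_t must be a suitable substitute for std::uintptr_t");

// Round addr up to the next multiple of align (a power of two) using the
// expression from std::align: (addr - 1 + align) & -align.
// Unsigned arithmetic wraps, so -align equals the mask ~(align - 1).
constexpr std::size_t align_up(std::size_t addr, std::size_t align) noexcept
{
  return (addr - 1u + align) & -align;
}

// Padding std::align would skip before placing an object at addr.
constexpr std::size_t align_padding(std::size_t addr, std::size_t align) noexcept
{
  return align_up(addr, align) - addr;
}
```

The real std::align additionally checks that size plus padding fits in space, and only then advances ptr and reduces space.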

diff --git a/libstdc++-v3/ChangeLog b/libstdc++-v3/ChangeLog
index 10b1496af81..66ee23d1fc7 100644
--- a/libstdc++-v3/ChangeLog
+++ b/libstdc++-v3/ChangeLog
@@ -1,5 +1,13 @@
 2018-07-26  Jonathan Wakely  
 
+   * include/bits/hashtable_policy.h (__detail::__clp2): Use faster
+   implementation that doesn't depend on  types.
+   * include/std/memory (align) [!_GLIBCXX_USE_C99_STDINT_TR1]: Use
+   std::size_t when std::uintptr_t is not usable.
+   [!_GLIBCXX_USE_C99_STDINT_TR1] (pointer_safety, declare_reachable)
+   (undeclare_reachable, declare_no_pointers, undeclare_no_pointers):
+   Define independent of _GLIBCXX_USE_C99_STDINT_TR1.
+
* include/bits/basic_string.h [!_GLIBCXX_USE_C99_STDINT_TR1]
(hash, hash): Remove dependency on
_GLIBCXX_USE_C99_STDINT_TR1.
diff --git a/libstdc++-v3/include/bits/hashtable_policy.h b/libstdc++-v3/include/bits/hashtable_policy.h
index 3ff6b14a90f..d7497711071 100644
--- a/libstdc++-v3/include/bits/hashtable_policy.h
+++ b/libstdc++-v3/include/bits/hashtable_policy.h
@@ -32,7 +32,7 @@
 #define _HASHTABLE_POLICY_H 1
 
 #include// for std::tuple, std::forward_as_tuple
-#include  // for std::uint_fast64_t
+#include   // for std::numeric_limits
 #include  // for std::min.
 
 namespace std _GLIBCXX_VISIBILITY(default)
@@ -504,27 +504,15 @@ namespace __detail
 { return __num & (__den - 1); }
   };
 
-  /// Compute closest power of 2.
-  _GLIBCXX14_CONSTEXPR
+  /// Compute closest power of 2 not less than __n
   inline std::size_t
   __clp2(std::size_t __n) noexcept
   {
-#if __SIZEOF_SIZE_T__ >= 8
-std::uint_fast64_t __x = __n;
-#else
-std::uint_fast32_t __x = __n;
-#endif
-// Algorithm from Hacker's Delight, Figure 3-3.
-__x = __x - 1;
-__x = __x | (__x >> 1);
-__x = __x | (__x >> 2);
-__x = __x | (__x >> 4);
-__x = __x | (__x >> 8);
-__x = __x | (__x >>16);
-#if __SIZEOF_SIZE_T__ >= 8
-__x = __x | (__x >>32);
-#endif
-return __x + 1;
+// Equivalent to return __n ? std::ceil2(__n) : 0;
+if (__n < 2)
+  return __n;
+return 1ul << (numeric_limits::digits
+   - __builtin_clzl(__n - 1ul));
   }
 
   /// Rehash policy providing power of 2 bucket numbers. Avoids modulo
diff --git a/libstdc++-v3/include/std/memory b/libstdc++-v3/include/std/memory
index f3559a91327..9689540fb81 100644
--- a/libstdc++-v3/include/std/memory
+++ b/libstdc++-v3/include/std/memory
@@ -88,8 +88,7 @@
 #endif
 
 #if __cplusplus >= 201103L
-#  include 
-#  ifdef _GLIBCXX_USE_C99_STDINT_TR1
+#include 
 namespace std _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
@@ -113,7 +112,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 inline void*
 align(size_t __align, size_t __size, void*& __ptr, size_t& __space) noexcept
 {
+#ifdef _GLIBCXX_USE_C99_STDINT_TR1
   const auto __intptr = reinterpret_cast(__ptr);
+#else
+  // Cannot use std::uintptr_t so assume that std::size_t can be used instead.
+  static_assert(sizeof(size_t) >= sizeof(void*),
+  "std::size_t must be a suitable substitute for std::uintptr_t");
+  const auto __intptr = reinterpret_cast(__ptr);
+#endif
   const auto __aligned = (__intptr - 1u + __align) & -__align;
   const auto __diff = __aligned - __intptr;
   if ((__size + __diff) > __space)
@@ -147,7 +153,6 @@ get_pointer_safety() noexcept { return pointer_safety::relaxed; }
 
 _GLIBCXX_END_NAMESPACE_VERSION
 } // namespace
-#endif // _GLIBCXX_USE_C99_STDINT_TR1
 #endif // C++11
 
 #endif /* _GLIBCXX_MEMORY */
-- 
2.14.4



[PATCH 2/8] Remove char16_t and char32_t dependency on

2018-07-26 Thread jwakely
From: Jonathan Wakely 

The char16_t and char32_t types are automatically defined by the
compiler and do not depend on support in . The char_traits
specializations depend on uint_leastNN_t but can be made to work anyway
by using the predefined macros, or as a last resort make_unsigned.
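A minimal sketch of that selection logic, assuming a GCC-compatible compiler. The namespace demo and the names u16_int_type, u32_int_type and to_int_type are hypothetical; they stand in for the int_type members of the char_traits<char16_t> and char_traits<char32_t> specializations rather than reproducing the patch.

```cpp
#include <type_traits>

// Illustrative only: pick an int_type for char16_t/char32_t traits without
// <cstdint>, preferring the predefined macros and falling back to
// std::make_unsigned on the character type itself.
namespace demo
{
#ifdef __UINT_LEAST16_TYPE__
  typedef __UINT_LEAST16_TYPE__ u16_int_type;
#else
  typedef std::make_unsigned<char16_t>::type u16_int_type;
#endif

#ifdef __UINT_LEAST32_TYPE__
  typedef __UINT_LEAST32_TYPE__ u32_int_type;
#else
  typedef std::make_unsigned<char32_t>::type u32_int_type;
#endif

  // Widen a code unit to int_type; must be lossless for every value.
  constexpr u16_int_type to_int_type(char16_t c) noexcept { return u16_int_type(c); }
  constexpr u32_int_type to_int_type(char32_t c) noexcept { return u32_int_type(c); }
} // namespace demo

static_assert(std::is_unsigned<demo::u16_int_type>::value
              && sizeof(demo::u16_int_type) >= sizeof(char16_t),
              "u16_int_type can represent every char16_t value");
static_assert(std::is_unsigned<demo::u32_int_type>::value
              && sizeof(demo::u32_int_type) >= sizeof(char32_t),
              "u32_int_type can represent every char32_t value");
```

Either branch yields an unsigned type at least as wide as the character type, which is all the traits need for a lossless to_int_type.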

* include/bits/basic_string.h [!_GLIBCXX_USE_C99_STDINT_TR1]
(hash, hash): Remove dependency on
_GLIBCXX_USE_C99_STDINT_TR1.
* include/bits/char_traits.h [!_GLIBCXX_USE_C99_STDINT_TR1]
(char_traits, char_traits): Remove dependency on
_GLIBCXX_USE_C99_STDINT_TR1. Use __UINT_LEAST16_TYPE__ and
__UINT_LEAST32_TYPE__ or make_unsigned when  is not usable.
* include/bits/codecvt.h [!_GLIBCXX_USE_C99_STDINT_TR1]
(codecvt)
(codecvt)
(codecvt_byname)
(codecvt_byname): Remove dependency
on _GLIBCXX_USE_C99_STDINT_TR1.
* include/bits/locale_facets.h [!_GLIBCXX_USE_C99_STDINT_TR1]
(_GLIBCXX_NUM_UNICODE_FACETS): Likewise.
* include/bits/stringfwd.h [!_GLIBCXX_USE_C99_STDINT_TR1]
(char_traits, char_traits)
(basic_string, basic_string): Remove dependency
on _GLIBCXX_USE_C99_STDINT_TR1.
* include/experimental/string_view [!_GLIBCXX_USE_C99_STDINT_TR1]
(u16string_view, u32string_view, hash)
(hash, operator""sv(const char16_t, size_t))
(operator""sv(const char32_t, size_t)): Likewise.
* include/ext/vstring.h [!_GLIBCXX_USE_C99_STDINT_TR1]
(hash<__u16vstring>, hash<__u32vstring>): Likewise.
* include/ext/vstring_fwd.h [!_GLIBCXX_USE_C99_STDINT_TR1]
(__u16vstring, __u16sso_string, __u16rc_string, __u32vstring)
(__u32sso_string, __u32rc_string): Likewise.
* include/std/codecvt [!_GLIBCXX_USE_C99_STDINT_TR1] (codecvt_mode)
(codecvt_utf8, codecvt_utf16, codecvt_utf8_utf16): Likewise.
* include/std/string_view [!_GLIBCXX_USE_C99_STDINT_TR1]
(u16string_view, u32string_view, hash)
(hash, operator""sv(const char16_t, size_t))
(operator""sv(const char32_t, size_t)): Likewise.
* src/c++11/codecvt.cc: Likewise.
* src/c++98/locale_init.cc: Likewise.
* src/c++98/localename.cc: Likewise.

diff --git a/libstdc++-v3/ChangeLog b/libstdc++-v3/ChangeLog
index a3665ee8b6a..10b1496af81 100644
--- a/libstdc++-v3/ChangeLog
+++ b/libstdc++-v3/ChangeLog
@@ -1,5 +1,43 @@
 2018-07-26  Jonathan Wakely  
 
+   * include/bits/basic_string.h [!_GLIBCXX_USE_C99_STDINT_TR1]
+   (hash, hash): Remove dependency on
+   _GLIBCXX_USE_C99_STDINT_TR1.
+   * include/bits/char_traits.h [!_GLIBCXX_USE_C99_STDINT_TR1]
+   (char_traits, char_traits): Remove dependency on
+   _GLIBCXX_USE_C99_STDINT_TR1. Use __UINT_LEAST16_TYPE__ and
+   __UINT_LEAST32_TYPE__ or make_unsigned when  is not usable.
+   * include/bits/codecvt.h [!_GLIBCXX_USE_C99_STDINT_TR1]
+   (codecvt)
+   (codecvt)
+   (codecvt_byname)
+   (codecvt_byname): Remove dependency
+   on _GLIBCXX_USE_C99_STDINT_TR1.
+   * include/bits/locale_facets.h [!_GLIBCXX_USE_C99_STDINT_TR1]
+   (_GLIBCXX_NUM_UNICODE_FACETS): Likewise.
+   * include/bits/stringfwd.h [!_GLIBCXX_USE_C99_STDINT_TR1]
+   (char_traits, char_traits)
+   (basic_string, basic_string): Remove dependency
+   on _GLIBCXX_USE_C99_STDINT_TR1.
+   * include/experimental/string_view [!_GLIBCXX_USE_C99_STDINT_TR1]
+   (u16string_view, u32string_view, hash)
+   (hash, operator""sv(const char16_t, size_t))
+   (operator""sv(const char32_t, size_t)): Likewise.
+   * include/ext/vstring.h [!_GLIBCXX_USE_C99_STDINT_TR1]
+   (hash<__u16vstring>, hash<__u32vstring>): Likewise.
+   * include/ext/vstring_fwd.h [!_GLIBCXX_USE_C99_STDINT_TR1]
+   (__u16vstring, __u16sso_string, __u16rc_string, __u32vstring)
+   (__u32sso_string, __u32rc_string): Likewise.
+   * include/std/codecvt [!_GLIBCXX_USE_C99_STDINT_TR1] (codecvt_mode)
+   (codecvt_utf8, codecvt_utf16, codecvt_utf8_utf16): Likewise.
+   * include/std/string_view [!_GLIBCXX_USE_C99_STDINT_TR1]
+   (u16string_view, u32string_view, hash)
+   (hash, operator""sv(const char16_t, size_t))
+   (operator""sv(const char32_t, size_t)): Likewise.
+   * src/c++11/codecvt.cc: Likewise.
+   * src/c++98/locale_init.cc: Likewise.
+   * src/c++98/localename.cc: Likewise.
+
* include/bits/atomic_futex.h [!_GLIBCXX_USE_C99_STDINT_TR1]
(__atomic_futex_unsigned_base): Remove dependency on
_GLIBCXX_USE_C99_STDINT_TR1 macro.
diff --git a/libstdc++-v3/include/bits/basic_string.h b/libstdc++-v3/include/bits/basic_string.h
index 2d1b9dc6c29..c9463989ddc 100644
--- a/libstdc++-v3/include/bits/basic_string.h
+++ b/libstdc++-v3/include/bits/basic_string.h
@@ -6662,7 +6662,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #endif
 #endif /* _GLIBCXX_COMPATIBILITY_CXX0X */
 
-#ifdef _GLIBCXX_USE_C99_STDIN

[PATCH 4/8] Add missing checks for _GLIBCXX_USE_C99_STDINT_TR1

2018-07-26 Thread jwakely
From: Jonathan Wakely 

The throw_allocator extension depends on  which depends on
_GLIBCXX_USE_C99_STDINT_TR1.

The Transactional Memory support uses fixed-width integer types from
.

* include/ext/throw_allocator.h [!_GLIBCXX_USE_C99_STDINT_TR1]
(random_condition, throw_value_random, throw_allocator_random)
(std::hash): Do not define when  is
not usable.
* src/c++11/cow-stdexcept.cc [!_GLIBCXX_USE_C99_STDINT_TR1]: Do not
define transactional memory support when  is not usable.

diff --git a/libstdc++-v3/ChangeLog b/libstdc++-v3/ChangeLog
index 66ee23d1fc7..285ea6b7dca 100644
--- a/libstdc++-v3/ChangeLog
+++ b/libstdc++-v3/ChangeLog
@@ -1,5 +1,12 @@
 2018-07-26  Jonathan Wakely  
 
+   * include/ext/throw_allocator.h [!_GLIBCXX_USE_C99_STDINT_TR1]
+   (random_condition, throw_value_random, throw_allocator_random)
+   (std::hash): Do not define when  is
+   not usable.
+   * src/c++11/cow-stdexcept.cc [!_GLIBCXX_USE_C99_STDINT_TR1]: Do not
+   define transactional memory support when  is not usable.
+
* include/bits/hashtable_policy.h (__detail::__clp2): Use faster
implementation that doesn't depend on  types.
* include/std/memory (align) [!_GLIBCXX_USE_C99_STDINT_TR1]: Use
diff --git a/libstdc++-v3/include/ext/throw_allocator.h b/libstdc++-v3/include/ext/throw_allocator.h
index 7fd2ca149a0..dd7c69e 100644
--- a/libstdc++-v3/include/ext/throw_allocator.h
+++ b/libstdc++-v3/include/ext/throw_allocator.h
@@ -482,7 +482,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 }
   };
 
-
+#ifdef _GLIBCXX_USE_C99_STDINT_TR1
   /**
*  @brief Base class for random probability control and throw.
*/
@@ -596,7 +596,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   return _S_e;
 }
   };
-
+#endif // _GLIBCXX_USE_C99_STDINT_TR1
 
   /**
*  @brief Class with exception generation control. Intended to be
@@ -752,6 +752,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #endif
   };
 
+#ifdef _GLIBCXX_USE_C99_STDINT_TR1
   /// Type throwing via random condition.
   struct throw_value_random : public throw_value_base
   {
@@ -782,7 +783,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 operator=(throw_value_random&&) = default;
 #endif
   };
-
+#endif // _GLIBCXX_USE_C99_STDINT_TR1
 
   /**
*  @brief Allocator class with logging and exception generation control.
@@ -920,6 +921,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   ~throw_allocator_limit() _GLIBCXX_USE_NOEXCEPT { }
 };
 
+#ifdef _GLIBCXX_USE_C99_STDINT_TR1
   /// Allocator throwing via random condition.
   template
 struct throw_allocator_random
@@ -940,6 +942,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   ~throw_allocator_random() _GLIBCXX_USE_NOEXCEPT { }
 };
+#endif // _GLIBCXX_USE_C99_STDINT_TR1
 
 _GLIBCXX_END_NAMESPACE_VERSION
 } // namespace
@@ -965,6 +968,7 @@ namespace std _GLIBCXX_VISIBILITY(default)
   }
 };
 
+#ifdef _GLIBCXX_USE_C99_STDINT_TR1
   /// Explicit specialization of std::hash for __gnu_cxx::throw_value_random.
   template<>
 struct hash<__gnu_cxx::throw_value_random>
@@ -979,6 +983,7 @@ namespace std _GLIBCXX_VISIBILITY(default)
return __result;
   }
 };
+#endif
 } // end namespace std
 #endif
 
diff --git a/libstdc++-v3/src/c++11/cow-stdexcept.cc b/libstdc++-v3/src/c++11/cow-stdexcept.cc
index a2df7892fd4..54859d58820 100644
--- a/libstdc++-v3/src/c++11/cow-stdexcept.cc
+++ b/libstdc++-v3/src/c++11/cow-stdexcept.cc
@@ -198,6 +198,7 @@ _GLIBCXX_END_NAMESPACE_VERSION
 // declared transaction-safe, so we just don't provide transactional clones
 // in this case.
 #if _GLIBCXX_USE_WEAK_REF
+#ifdef _GLIBCXX_USE_C99_STDINT_TR1
 
 extern "C" {
 
@@ -456,4 +457,5 @@ CTORDTOR(15underflow_error, std::underflow_error, runtime_error)
 
 }
 
+#endif  // _GLIBCXX_USE_C99_STDINT_TR1
 #endif  // _GLIBCXX_USE_WEAK_REF
-- 
2.14.4



[PATCH 6/8] Remove dg-require-cstdint directive from tests

2018-07-26 Thread jwakely
From: Jonathan Wakely 

Tests for components which are no longer dependent on
_GLIBCXX_USE_C99_STDINT_TR1 do not need to require .

* testsuite/30_threads/async/42819.cc: Remove dg-require-cstdint
directive.
* testsuite/30_threads/async/49668.cc: Likewise.
* testsuite/30_threads/async/54297.cc: Likewise.
* testsuite/30_threads/async/84532.cc: Likewise.
* testsuite/30_threads/async/any.cc: Likewise.
* testsuite/30_threads/async/async.cc: Likewise.
* testsuite/30_threads/async/except.cc: Likewise.
* testsuite/30_threads/async/forced_unwind.cc: Likewise.
* testsuite/30_threads/async/launch.cc: Likewise.
* testsuite/30_threads/async/lwg2021.cc: Likewise.
* testsuite/30_threads/async/sync.cc: Likewise.
* testsuite/30_threads/call_once/39909.cc: Likewise.
* testsuite/30_threads/call_once/49668.cc: Likewise.
* testsuite/30_threads/call_once/60497.cc: Likewise.
* testsuite/30_threads/call_once/call_once1.cc: Likewise.
* testsuite/30_threads/call_once/constexpr.cc: Likewise.
* testsuite/30_threads/call_once/dr2442.cc: Likewise.
* testsuite/30_threads/call_once/once_flag.cc: Likewise.
* testsuite/30_threads/condition_variable/54185.cc: Likewise.
* testsuite/30_threads/condition_variable/cons/1.cc: Likewise.
* testsuite/30_threads/condition_variable/cons/assign_neg.cc:
Likewise.
* testsuite/30_threads/condition_variable/cons/copy_neg.cc: Likewise.
* testsuite/30_threads/condition_variable/members/1.cc: Likewise.
* testsuite/30_threads/condition_variable/members/2.cc: Likewise.
* testsuite/30_threads/condition_variable/members/3.cc: Likewise.
* testsuite/30_threads/condition_variable/members/53841.cc: Likewise.
* testsuite/30_threads/condition_variable/members/68519.cc: Likewise.
* testsuite/30_threads/condition_variable/native_handle/typesizes.cc:
Likewise.
* testsuite/30_threads/condition_variable/requirements/
standard_layout.cc: Likewise.
* testsuite/30_threads/condition_variable/requirements/typedefs.cc:
Likewise.
* testsuite/30_threads/condition_variable_any/50862.cc: Likewise.
* testsuite/30_threads/condition_variable_any/53830.cc: Likewise.
* testsuite/30_threads/condition_variable_any/cons/1.cc: Likewise.
* testsuite/30_threads/condition_variable_any/cons/assign_neg.cc:
Likewise.
* testsuite/30_threads/condition_variable_any/cons/copy_neg.cc:
Likewise.
* testsuite/30_threads/condition_variable_any/members/1.cc: Likewise.
* testsuite/30_threads/condition_variable_any/members/2.cc: Likewise.
* testsuite/30_threads/future/cons/assign_neg.cc: Likewise.
* testsuite/30_threads/future/cons/constexpr.cc: Likewise.
* testsuite/30_threads/future/cons/copy_neg.cc: Likewise.
* testsuite/30_threads/future/cons/default.cc: Likewise.
* testsuite/30_threads/future/cons/move.cc: Likewise.
* testsuite/30_threads/future/cons/move_assign.cc: Likewise.
* testsuite/30_threads/future/members/45133.cc: Likewise.
* testsuite/30_threads/future/members/get.cc: Likewise.
* testsuite/30_threads/future/members/get2.cc: Likewise.
* testsuite/30_threads/future/members/share.cc: Likewise.
* testsuite/30_threads/future/members/valid.cc: Likewise.
* testsuite/30_threads/future/members/wait.cc: Likewise.
* testsuite/30_threads/future/members/wait_for.cc: Likewise.
* testsuite/30_threads/future/members/wait_until.cc: Likewise.
* testsuite/30_threads/future/requirements/explicit_instantiation.cc:
Likewise.
* testsuite/30_threads/headers/condition_variable/types_std_c++0x.cc:
Likewise.
* testsuite/30_threads/headers/future/types_std_c++0x.cc: Likewise.
* testsuite/30_threads/headers/mutex/types_std_c++0x.cc: Likewise.
* testsuite/30_threads/headers/thread/std_c++0x_neg.cc: Likewise.
* testsuite/30_threads/headers/thread/types_std_c++0x.cc: Likewise.
* testsuite/30_threads/lock/1.cc: Likewise.
* testsuite/30_threads/lock/2.cc: Likewise.
* testsuite/30_threads/lock/3.cc: Likewise.
* testsuite/30_threads/lock/4.cc: Likewise.
* testsuite/30_threads/lock_guard/cons/1.cc: Likewise.
* testsuite/30_threads/lock_guard/requirements/
explicit_instantiation.cc: Likewise.
* testsuite/30_threads/lock_guard/requirements/typedefs.cc: Likewise.
* testsuite/30_threads/mutex/cons/1.cc: Likewise.
* testsuite/30_threads/mutex/cons/assign_neg.cc: Likewise.
* testsuite/30_threads/mutex/cons/constexpr.cc: Likewise.
* testsuite/30_threads/mutex/cons/copy_neg.cc: Likewise.
* testsuite/30_threads/mutex/dest/destructor_locked.cc: Likewise.
* tes

[PATCH 5/8] Remove dg-require-cstdint directive from tests

2018-07-26 Thread jwakely
From: Jonathan Wakely 

Tests for components which are no longer dependent on
_GLIBCXX_USE_C99_STDINT_TR1 do not need to require .

* testsuite/18_support/numeric_limits/char16_32_t.cc: Qualify names
from namespace std.
* testsuite/20_util/align/2.cc: Remove dg-require-cstdint directive.
* testsuite/20_util/duration/arithmetic/1.cc: Likewise.
* testsuite/20_util/duration/arithmetic/2.cc: Likewise.
* testsuite/20_util/duration/arithmetic/dr2020.cc: Likewise.
* testsuite/20_util/duration/arithmetic/dr934-1.cc: Likewise.
* testsuite/20_util/duration/arithmetic/dr934-2.cc: Likewise.
* testsuite/20_util/duration/comparison_operators/1.cc: Likewise.
* testsuite/20_util/duration/cons/1.cc: Likewise.
* testsuite/20_util/duration/cons/1_neg.cc: Likewise.
* testsuite/20_util/duration/cons/2.cc: Likewise.
* testsuite/20_util/duration/cons/54025.cc: Likewise.
* testsuite/20_util/duration/cons/dr974_neg.cc: Likewise.
* testsuite/20_util/duration/requirements/explicit_instantiation/
explicit_instantiation.cc: Likewise.
* testsuite/20_util/duration/requirements/typedefs_neg1.cc: Likewise.
* testsuite/20_util/duration/requirements/typedefs_neg2.cc: Likewise.
* testsuite/20_util/duration/requirements/typedefs_neg3.cc: Likewise.
* testsuite/20_util/make_signed/requirements/typedefs-4.cc: Likewise.
* testsuite/20_util/ratio/comparisons/comp1.cc: Likewise.
* testsuite/20_util/ratio/comparisons/comp2.cc: Likewise.
* testsuite/20_util/ratio/comparisons/comp3.cc: Likewise.
* testsuite/20_util/ratio/cons/cons1.cc: Likewise.
* testsuite/20_util/ratio/operations/45866.cc: Likewise.
* testsuite/20_util/ratio/operations/47913.cc: Likewise.
* testsuite/20_util/ratio/operations/53840.cc: Likewise.
* testsuite/20_util/ratio/operations/ops1.cc: Likewise.
* testsuite/20_util/shared_ptr/atomic/3.cc: Likewise.
* testsuite/20_util/system_clock/1.cc: Likewise.
* testsuite/20_util/time_point/1.cc: Likewise.
* testsuite/20_util/time_point/2.cc: Likewise.
* testsuite/20_util/time_point/3.cc: Likewise.
* testsuite/20_util/time_point/requirements/explicit_instantiation/
explicit_instantiation.cc: Likewise.
* testsuite/21_strings/basic_string/requirements/
explicit_instantiation/char16_t/1.cc: Likewise.
* testsuite/21_strings/basic_string/requirements/
explicit_instantiation/char32_t/1.cc: Likewise.
* testsuite/21_strings/basic_string_view/requirements/
explicit_instantiation/char16_t/1.cc: Likewise.
* testsuite/21_strings/basic_string_view/requirements/
explicit_instantiation/char32_t/1.cc: Likewise.
* testsuite/21_strings/char_traits/requirements/
explicit_instantiation/char16_t/1.cc: Likewise.
* testsuite/21_strings/char_traits/requirements/
explicit_instantiation/char32_t/1.cc: Likewise.
* testsuite/21_strings/headers/string/types_std_c++0x.cc: Likewise.
* testsuite/22_locale/codecvt/char16_t.cc: Likewise.
* testsuite/22_locale/codecvt/char32_t.cc: Likewise.
* testsuite/22_locale/codecvt/codecvt_utf16/requirements/1.cc:
Likewise.
* testsuite/22_locale/codecvt/codecvt_utf8/requirements/1.cc:
Likewise.
* testsuite/22_locale/codecvt/codecvt_utf8_utf16/requirements/1.cc:
Likewise.
* testsuite/22_locale/codecvt/utf8.cc: Likewise.
* testsuite/23_containers/vector/bool/72847.cc: Likewise.
* testsuite/23_containers/vector/debug/multithreaded_swap.cc:
Likewise.
* testsuite/experimental/string_view/requirements/
explicit_instantiation/char16_t/1.cc: Likewise.
* testsuite/experimental/string_view/requirements/
explicit_instantiation/char32_t/1.cc: Likewise.
* testsuite/ext/vstring/requirements/explicit_instantiation/char16_t/
1.cc: Likewise.
* testsuite/ext/vstring/requirements/explicit_instantiation/char32_t/
1.cc: Likewise.

diff --git a/libstdc++-v3/ChangeLog b/libstdc++-v3/ChangeLog
index 285ea6b7dca..028f269e6f4 100644
--- a/libstdc++-v3/ChangeLog
+++ b/libstdc++-v3/ChangeLog
@@ -1,5 +1,74 @@
 2018-07-26  Jonathan Wakely  
 
+   * testsuite/18_support/numeric_limits/char16_32_t.cc: Qualify names
+   from namespace std.
+   * testsuite/20_util/align/2.cc: Remove dg-require-cstdint directive.
+   * testsuite/20_util/duration/arithmetic/1.cc: Likewise.
+   * testsuite/20_util/duration/arithmetic/2.cc: Likewise.
+   * testsuite/20_util/duration/arithmetic/dr2020.cc: Likewise.
+   * testsuite/20_util/duration/arithmetic/dr934-1.cc: Likewise.
+   * testsuite/20_util/duration/arithmetic/dr934-2.cc: Likewise.
+   * testsuite/20_util/duration/comparison_operators/1.cc: Likewise.
+   * t

[PATCH 8/8] Add missing dg-require-cstdint directives to tests

2018-07-26 Thread jwakely
From: Jonathan Wakely 

* testsuite/18_support/aligned_alloc/aligned_alloc.cc: Add
dg-require-cstdint directive.
* testsuite/20_util/allocator/overaligned.cc: Likewise.
* testsuite/20_util/any/cons/aligned.cc: Likewise.
* testsuite/20_util/monotonic_buffer_resource/allocate.cc: Likewise.
* testsuite/20_util/monotonic_buffer_resource/deallocate.cc: Likewise.
* testsuite/20_util/shared_ptr/thread/default_weaktoshared.cc:
Likewise.
* testsuite/20_util/shared_ptr/thread/mutex_weaktoshared.cc: Likewise.
* testsuite/23_containers/list/modifiers/insert/25288.cc: Likewise.
* testsuite/23_containers/set/allocator/move_assign.cc: Likewise.
* testsuite/25_algorithms/make_heap/complexity.cc: Likewise.
* testsuite/25_algorithms/pop_heap/complexity.cc: Require cstdint and
random_device effective-target.
* testsuite/25_algorithms/push_heap/complexity.cc: Likewise.
* testsuite/25_algorithms/sample/1.cc: Require cstdint.
* testsuite/25_algorithms/sample/2.cc: Likewise.
* testsuite/25_algorithms/sort_heap/complexity.cc: Require cstdint
and random_device.
* testsuite/26_numerics/headers/random/types_std_c++0x.cc: Require
cstdint.
* testsuite/26_numerics/random/chi_squared_distribution/83833.cc:
Likewise.
* testsuite/26_numerics/random/discard_block_engine/requirements/
constexpr_data.cc: Likewise.
* testsuite/26_numerics/random/discard_block_engine/requirements/
constexpr_functions.cc: Likewise.
* testsuite/26_numerics/random/independent_bits_engine/requirements/
constexpr_functions.cc: Likewise.
* testsuite/26_numerics/random/linear_congruential_engine/requirements/
constexpr_data.cc: Likewise.
* testsuite/26_numerics/random/linear_congruential_engine/requirements/
constexpr_functions.cc: Likewise.
* testsuite/26_numerics/random/mersenne_twister_engine/requirements/
constexpr_data.cc: Likewise.
* testsuite/26_numerics/random/mersenne_twister_engine/requirements/
constexpr_functions.cc: Likewise.
* testsuite/26_numerics/random/pr60037-neg.cc: Likewise.
* testsuite/26_numerics/random/seed_seq/cons/65631.cc: Likewise.
* testsuite/26_numerics/random/shuffle_order_engine/requirements/
constexpr_data.cc: Add dg-require-cstdint directive.
* testsuite/26_numerics/random/shuffle_order_engine/requirements/
constexpr_functions.cc: Likewise.
* testsuite/26_numerics/random/subtract_with_carry_engine/requirements/
constexpr_data.cc: Likewise.
* testsuite/26_numerics/random/subtract_with_carry_engine/requirements/
constexpr_functions.cc: Likewise.
* testsuite/26_numerics/random/uniform_real_distribution/operators/
64351.cc: Likewise.
* testsuite/29_atomics/headers/atomic/types_std_c++0x.cc: Likewise.
* testsuite/experimental/algorithm/sample-2.cc: Likewise.
* testsuite/experimental/algorithm/sample.cc: Likewise.
* testsuite/experimental/algorithm/search.cc: Likewise.
* testsuite/experimental/algorithm/shuffle.cc: Likewise.
* testsuite/experimental/any/cons/aligned.cc: Likewise.
* testsuite/experimental/memory_resource/new_delete_resource.cc:
Likewise.
* testsuite/experimental/memory_resource/resource_adaptor.cc: Likewise.
* testsuite/experimental/random/randint.cc: Likewise.
* testsuite/experimental/source_location/1.cc: Likewise.
* testsuite/ext/bitmap_allocator/overaligned.cc: Likewise.
* testsuite/ext/malloc_allocator/overaligned.cc: Likewise.
* testsuite/ext/mt_allocator/overaligned.cc: Likewise.
* testsuite/ext/new_allocator/overaligned.cc: Likewise.
* testsuite/ext/pb_ds/regression/hash_map_rand.cc: Likewise.
* testsuite/ext/pb_ds/regression/hash_set_rand.cc: Likewise.
* testsuite/ext/pb_ds/regression/list_update_map_rand.cc: Likewise.
* testsuite/ext/pb_ds/regression/list_update_set_rand.cc: Likewise.
* testsuite/ext/pb_ds/regression/priority_queue_rand.cc: Likewise.
* testsuite/ext/pb_ds/regression/tree_map_rand.cc: Likewise.
* testsuite/ext/pb_ds/regression/tree_set_rand.cc: Likewise.
* testsuite/ext/pb_ds/regression/trie_map_rand.cc: Likewise.
* testsuite/ext/pb_ds/regression/trie_set_rand.cc: Likewise.
* testsuite/ext/pool_allocator/overaligned.cc: Likewise.
* testsuite/ext/throw_allocator/check_allocate_max_size.cc: Likewise.
* testsuite/ext/throw_allocator/check_deallocate_null.cc: Likewise.
* testsuite/ext/throw_allocator/check_delete.cc: Likewise.
* testsuite/ext/throw_allocator/check_new.cc: Likewise.
* testsuite/ext/throw_allocator/deallocate_global.cc: Likewise.
* testsuite/ext/throw_allocator

[PATCH 7/8] Remove dg-require-cstdint directive from tests

2018-07-26 Thread jwakely
From: Jonathan Wakely 

Tests for components which are no longer dependent on
_GLIBCXX_USE_C99_STDINT_TR1 do not need to require .

* testsuite/30_threads/recursive_mutex/cons/1.cc: Likewise.
* testsuite/30_threads/recursive_mutex/cons/assign_neg.cc: Likewise.
* testsuite/30_threads/recursive_mutex/cons/copy_neg.cc: Likewise.
* testsuite/30_threads/recursive_mutex/dest/destructor_locked.cc:
Likewise.
* testsuite/30_threads/recursive_mutex/lock/1.cc: Likewise.
* testsuite/30_threads/recursive_mutex/native_handle/1.cc: Likewise.
* testsuite/30_threads/recursive_mutex/native_handle/typesizes.cc:
Likewise.
* testsuite/30_threads/recursive_mutex/requirements/standard_layout.cc:
Likewise.
* testsuite/30_threads/recursive_mutex/requirements/typedefs.cc:
Likewise.
* testsuite/30_threads/recursive_mutex/try_lock/1.cc: Likewise.
* testsuite/30_threads/recursive_mutex/try_lock/2.cc: Likewise.
* testsuite/30_threads/recursive_mutex/unlock/1.cc: Likewise.
* testsuite/30_threads/recursive_mutex/unlock/2.cc: Likewise.
* testsuite/30_threads/recursive_timed_mutex/cons/1.cc: Likewise.
* testsuite/30_threads/recursive_timed_mutex/cons/assign_neg.cc:
Likewise.
* testsuite/30_threads/recursive_timed_mutex/cons/copy_neg.cc:
Likewise.
* testsuite/30_threads/recursive_timed_mutex/dest/
destructor_locked.cc: Likewise.
* testsuite/30_threads/recursive_timed_mutex/lock/1.cc: Likewise.
* testsuite/30_threads/recursive_timed_mutex/lock/2.cc: Likewise.
* testsuite/30_threads/recursive_timed_mutex/native_handle/1.cc:
Likewise.
* testsuite/30_threads/recursive_timed_mutex/native_handle/
typesizes.cc: Likewise.
* testsuite/30_threads/recursive_timed_mutex/requirements/typedefs.cc:
Likewise.
* testsuite/30_threads/recursive_timed_mutex/try_lock/1.cc:
Likewise.
* testsuite/30_threads/recursive_timed_mutex/try_lock/2.cc:
Likewise.
* testsuite/30_threads/recursive_timed_mutex/try_lock_for/1.cc:
Likewise.
* testsuite/30_threads/recursive_timed_mutex/try_lock_for/2.cc:
Likewise.
* testsuite/30_threads/recursive_timed_mutex/try_lock_for/3.cc:
Likewise.
* testsuite/30_threads/recursive_timed_mutex/try_lock_until/1.cc:
Likewise.
* testsuite/30_threads/recursive_timed_mutex/try_lock_until/2.cc:
Likewise.
* testsuite/30_threads/recursive_timed_mutex/unlock/1.cc: Likewise.
* testsuite/30_threads/recursive_timed_mutex/unlock/2.cc: Likewise.
* testsuite/30_threads/scoped_lock/cons/1.cc: Likewise.
* testsuite/30_threads/scoped_lock/requirements/
explicit_instantiation.cc: Likewise.
* testsuite/30_threads/scoped_lock/requirements/typedefs.cc: Likewise.
* testsuite/30_threads/shared_future/cons/assign.cc: Likewise.
* testsuite/30_threads/shared_future/cons/constexpr.cc: Likewise.
* testsuite/30_threads/shared_future/cons/copy.cc: Likewise.
* testsuite/30_threads/shared_future/cons/default.cc: Likewise.
* testsuite/30_threads/shared_future/cons/move.cc: Likewise.
* testsuite/30_threads/shared_future/cons/move_assign.cc: Likewise.
* testsuite/30_threads/shared_future/members/45133.cc: Likewise.
* testsuite/30_threads/shared_future/members/get.cc: Likewise.
* testsuite/30_threads/shared_future/members/get2.cc: Likewise.
* testsuite/30_threads/shared_future/members/valid.cc: Likewise.
* testsuite/30_threads/shared_future/members/wait.cc: Likewise.
* testsuite/30_threads/shared_future/members/wait_for.cc: Likewise.
* testsuite/30_threads/shared_future/members/wait_until.cc: Likewise.
* testsuite/30_threads/shared_future/requirements/
explicit_instantiation.cc: Likewise.
* testsuite/30_threads/shared_lock/cons/1.cc: Likewise.
* testsuite/30_threads/shared_lock/cons/2.cc: Likewise.
* testsuite/30_threads/shared_lock/cons/3.cc: Likewise.
* testsuite/30_threads/shared_lock/cons/4.cc: Likewise.
* testsuite/30_threads/shared_lock/cons/5.cc: Likewise.
* testsuite/30_threads/shared_lock/cons/6.cc: Likewise.
* testsuite/30_threads/shared_lock/locking/1.cc: Likewise.
* testsuite/30_threads/shared_lock/locking/2.cc: Likewise.
* testsuite/30_threads/shared_lock/locking/3.cc: Likewise.
* testsuite/30_threads/shared_lock/locking/4.cc: Likewise.
* testsuite/30_threads/shared_lock/modifiers/1.cc: Likewise.
* testsuite/30_threads/shared_lock/requirements/
explicit_instantiation.cc: Likewise.
* testsuite/30_threads/shared_lock/requirements/typedefs.cc: Likewise.
* testsuite/30_threads/shared_mutex/cons/1.cc: Likewise.
* tests

Re: [PATCH 3/3] Add user-friendly OpenACC diagnostics regarding detected parallelism.

2018-07-26 Thread Cesar Philippidis
On 07/26/2018 01:33 AM, Richard Biener wrote:
> On Wed, Jul 25, 2018 at 5:30 PM Cesar Philippidis
>  wrote:
>>
>> This patch teaches GCC to inform the user how it assigned parallelism
>> to each OpenACC loop at compile time using the -fopt-info-note-omp
>> flag. For instance, given the acc parallel loop nest:
>>
>>   #pragma acc parallel loop
>>   for (...)
>> #pragma acc loop vector
>> for (...)
>>
>> GCC will report something like
>>
>>   foo.c:4:0: note: Detected parallelism 
>>   foo.c:6:0: note: Detected parallelism 
>>
>> Note how only the inner loop specifies vector parallelism. In this
>> example, GCC automatically assigned gang and worker parallelism to the
>> outermost loop. Perhaps, going forward, it would be useful to
>> distinguish which parallelism was specified by the user and which was
>> assigned by the compiler. But that can be added in a follow up patch.
>>
>> Is this patch OK for trunk? I bootstrapped and regtested it for x86_64
>> with nvptx offloading.
> 
> Shouldn't this use MSG_OPTIMIZED_LOCATIONS instead?  Are there
> any other optinfo notes emitted?  Like when despite pragmas loops
> are not handled or so?

Early on I was just using the diagnostics in omp-grid.c as a model, but
yes, it does make sense to use MSG_OPTIMIZED_LOCATIONS instead of
MSG_NOTE. And no, these are the only optinfo notes that we're emitting
at the moment. All of the other diagnostics are just errors and
warnings, although we probably should revisit that for some of the
forthcoming acc routine diagnostics. Going forward, now that there's an
interest in automatic parallelism inside acc kernels, we do plan on
expanding the diagnostics.

The attached revised patch now uses MSG_OPTIMIZED_LOCATIONS for the
diagnostics. If this gets approved for trunk, I'll go ahead and backport
it to og8 and update the OpenACC wiki to change the usage of
-fopt-info-note-omp to -fopt-info-optimized-omp.
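For readers of the archive, the mask decoding and traversal order of inform_oacc_loop can be modelled in plain C. This is a simplified sketch, not GCC code: struct loop_node, describe_loops, DIM_* and DIM_MASK are hypothetical stand-ins for GCC's internal oacc_loop and the GOMP_DIM_* constants, and the output goes into a buffer instead of the dump machinery. Like the patch, it reports "seq" when no partitioning bit is set and visits a loop, then its children, then its later siblings.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical dimension numbering, modelled on the GOMP_DIM_* constants.  */
enum { DIM_GANG = 0, DIM_WORKER = 1, DIM_VECTOR = 2 };
#define DIM_MASK(d) (1u << (d))

/* Hypothetical, simplified stand-in for GCC's internal oacc_loop.  */
struct loop_node
{
  unsigned mask;              /* partitioning bits assigned to this loop */
  int line;                   /* source line of the loop directive */
  struct loop_node *child;    /* first nested loop, or NULL */
  struct loop_node *sibling;  /* next loop at the same depth, or NULL */
};

/* Append one "LINE: parallelism" entry per loop to BUF.  Each loop is
   reported before its children, and its children before its later
   siblings, matching the order of the recursive walk in the patch.  */
static void
describe_loops (const struct loop_node *loop, char *buf, size_t size)
{
  for (; loop != NULL; loop = loop->sibling)
    {
      size_t used = strlen (buf);   /* always < size, so size - used >= 1 */
      snprintf (buf + used, size - used, "%d:%s%s%s%s\n", loop->line,
                loop->mask == 0 ? " seq" : "",
                (loop->mask & DIM_MASK (DIM_GANG)) ? " gang" : "",
                (loop->mask & DIM_MASK (DIM_WORKER)) ? " worker" : "",
                (loop->mask & DIM_MASK (DIM_VECTOR)) ? " vector" : "");
      describe_loops (loop->child, buf, size);
    }
}
```

For the nest in the example above (gang and worker assigned to the outer loop on line 4, vector on line 6), the model yields "4: gang worker" followed by "6: vector".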

Is this OK for trunk?

Thanks,
Cesar
2018-XX-YY  Cesar Philippidis  

	gcc/
	* omp-offload.c (inform_oacc_loop): New function.
	(execute_oacc_device_lower): Use it to display loop parallelism.

	gcc/testsuite/
	* c-c++-common/goacc/note-parallelism.c: New test.
	* gfortran.dg/goacc/note-parallelism.f90: New test.

(cherry picked from gomp-4_0-branch r245683, and gcc/testsuite/ parts of
r245770)

use MSG_OPTIMIZED_LOCATIONS instead of MSG_NOTE
---
 gcc/omp-offload.c | 27 
 .../c-c++-common/goacc/note-parallelism.c | 61 ++
 .../gfortran.dg/goacc/note-parallelism.f90| 62 +++
 3 files changed, 150 insertions(+)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/note-parallelism.c
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/note-parallelism.f90

diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c
index 0abf0283c9e..3582dda3d1a 100644
--- a/gcc/omp-offload.c
+++ b/gcc/omp-offload.c
@@ -866,6 +866,31 @@ debug_oacc_loop (oacc_loop *loop)
   dump_oacc_loop (stderr, loop, 0);
 }
 
+/* Provide diagnostics on OpenACC loops LOOP, its siblings and its
+   children.  */
+
+static void
+inform_oacc_loop (oacc_loop *loop)
+{
+  const char *seq = loop->mask == 0 ? " seq" : "";
+  const char *gang = loop->mask & GOMP_DIM_MASK (GOMP_DIM_GANG)
+    ? " gang" : "";
+  const char *worker = loop->mask & GOMP_DIM_MASK (GOMP_DIM_WORKER)
+    ? " worker" : "";
+  const char *vector = loop->mask & GOMP_DIM_MASK (GOMP_DIM_VECTOR)
+    ? " vector" : "";
+  dump_location_t loc = dump_location_t::from_location_t (loop->loc);
+
+  dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, loc,
+		   "Detected parallelism \n", seq, gang,
+		   worker, vector);
+
+  if (loop->child)
+    inform_oacc_loop (loop->child);
+  if (loop->sibling)
+    inform_oacc_loop (loop->sibling);
+}
+
 /* DFS walk of basic blocks BB onwards, creating OpenACC loop
structures as we go.  By construction these loops are properly
nested.  */
@@ -1533,6 +1558,8 @@ execute_oacc_device_lower ()
   dump_oacc_loop (dump_file, loops, 0);
   fprintf (dump_file, "\n");
 }
+  if (dump_enabled_p () && loops->child)
+    inform_oacc_loop (loops->child);
 
   /* Offloaded targets may introduce new basic blocks, which require
  dominance information to update SSA.  */
diff --git a/gcc/testsuite/c-c++-common/goacc/note-parallelism.c b/gcc/testsuite/c-c++-common/goacc/note-parallelism.c
new file mode 100644
index 000..2e50d86cd23
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/note-parallelism.c
@@ -0,0 +1,61 @@
+/* Test the output of -fopt-info-note-omp.  */
+
+/* { dg-additional-options "-fopt-info-note-optimized" } */
+
+int
+main ()
+{
+  int x, y, z;
+
+#pragma acc parallel loop seq /* { dg-message "note: Detected parallelism " } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc parallel loop gang /* { dg-message "note: Detected parallelism " } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc parallel loop worker /* { dg-message "note: Detected parallelism " } */
+  for (x = 0; x < 

Re: [patch] adjust default nvptx launch geometry for OpenACC offloaded regions

2018-07-26 Thread Cesar Philippidis
Hi Tom,

I see that you're reviewing the libgomp changes. Please disregard the
following hunk:

On 07/11/2018 12:13 PM, Cesar Philippidis wrote:
> @@ -1199,12 +1202,59 @@ nvptx_exec (void (*fn), size_t mapnum, void **hostaddrs, void **devaddrs,
>default_dims[GOMP_DIM_VECTOR]);
>   }
>pthread_mutex_unlock (&ptx_dev_lock);
> +  int vectors = default_dims[GOMP_DIM_VECTOR];
> +  int workers = default_dims[GOMP_DIM_WORKER];
> +  int gangs = default_dims[GOMP_DIM_GANG];
> +
> +  if (nvptx_thread()->ptx_dev->driver_version > 6050)
> + {
> +   int grids, blocks;
> +   CUDA_CALL_ASSERT (cuOccupancyMaxPotentialBlockSize, &grids,
> + &blocks, function, NULL, 0,
> + dims[GOMP_DIM_WORKER] * dims[GOMP_DIM_VECTOR]);
> +   GOMP_PLUGIN_debug (0, "cuOccupancyMaxPotentialBlockSize: "
> +  "grid = %d, block = %d\n", grids, blocks);
> +
> +   gangs = grids * dev_size;
> +   workers = blocks / vectors;
> + }

I revisited this change yesterday and I noticed it was setting gangs
incorrectly. Basically, gangs should be set as follows

  gangs = grids * (blocks / warp_size);

or, to be closer to og8, as

  gangs = 2 * grids * (blocks / warp_size);

The magic constant 2 is there to prevent thread starvation. It's the
same idea as make -j<2*#threads>.
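The two gang formulas above amount to simple arithmetic; here is a hedged sketch. compute_gangs and its parameters are hypothetical names, not libgomp code: grids and blocks stand for the two values that cuOccupancyMaxPotentialBlockSize reports, and oversubscribe is 1 for the first formula and 2 for the og8-style variant.

```c
/* Hypothetical helper: derive a gang count from the occupancy results.
   GRIDS and BLOCKS are the two values reported by the occupancy query,
   WARP_SIZE is the device warp size, and OVERSUBSCRIBE is the
   starvation-avoiding factor (1 or 2 in the formulas above).  */
static int
compute_gangs (int grids, int blocks, int warp_size, int oversubscribe)
{
  if (warp_size <= 0)
    return 0;                           /* avoid dividing by zero */
  int warps_per_block = blocks / warp_size;
  return oversubscribe * grids * warps_per_block;
}
```

For example, with grids = 80, blocks = 1024 and a warp size of 32, the first formula gives 80 * 32 = 2560 gangs and the og8-style one gives 5120.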

Anyway, I'm still experimenting with that change. There are still some
discrepancies between the way that I select num_workers and how the
driver does. The driver appears to be a little bit more conservative,
but according to the thread occupancy calculator, that should yield
greater performance on GPUs.

I just wanted to give you a heads up because you seem to be working on this.

Thanks for all of your reviews!

By the way, are you now the maintainer of the libgomp nvptx plugin?

Cesar


Re: [PATCH] [AArch64, Falkor] Switch to using Falkor-specific vector costs

2018-07-26 Thread Kyrill Tkachov

Hi Luis,

On 25/07/18 19:10, Luis Machado wrote:

The adjusted vector costs give Falkor a reasonable boost in performance for FP
benchmarks (both CPU2017 and CPU2006) and do not change INT benchmarks
much. The FP gains are about 0.7% for CPU2017 and 1.54% for CPU2006.

OK for trunk?



The patch looks ok and safe to me (though you'll need approval from the 
maintainers).

I'd be interested to see what workloads in CPU2017 were affected by this.
Any chance you could post the breakdown in numbers from CPU2017?

Thanks,
Kyrill


gcc/ChangeLog:

2018-07-25  Luis Machado  

* config/aarch64/aarch64.c (qdf24xx_vector_cost): New.
(qdf24xx_tunings) : Set to qdf24xx_vector_cost.
---
 gcc/config/aarch64/aarch64.c | 22 +-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index fa01475..d443aee 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -430,6 +430,26 @@ static const struct cpu_vector_cost generic_vector_cost =
   1 /* cond_not_taken_branch_cost  */
 };

+/* Qualcomm QDF24xx costs for vector insn classes.  */
+static const struct cpu_vector_cost qdf24xx_vector_cost =
+{
+  1, /* scalar_int_stmt_cost  */
+  1, /* scalar_fp_stmt_cost  */
+  1, /* scalar_load_cost  */
+  1, /* scalar_store_cost  */
+  1, /* vec_int_stmt_cost  */
+  3, /* vec_fp_stmt_cost  */
+  2, /* vec_permute_cost  */
+  1, /* vec_to_scalar_cost  */
+  1, /* scalar_to_vec_cost  */
+  1, /* vec_align_load_cost  */
+  1, /* vec_unalign_load_cost  */
+  1, /* vec_unalign_store_cost  */
+  1, /* vec_store_cost  */
+  3, /* cond_taken_branch_cost  */
+  1  /* cond_not_taken_branch_cost  */
+};
+
 /* ThunderX costs for vector insn classes.  */
 static const struct cpu_vector_cost thunderx_vector_cost =
 {
@@ -890,7 +910,7 @@ static const struct tune_params qdf24xx_tunings =
   &qdf24xx_extra_costs,
   &qdf24xx_addrcost_table,
   &qdf24xx_regmove_cost,
-  &generic_vector_cost,
+  &qdf24xx_vector_cost,
   &generic_branch_cost,
   &generic_approx_modes,
   4, /* memmov_cost  */
--
2.7.4
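To illustrate how a single field such as vec_fp_stmt_cost can steer vectorization decisions, here is a deliberately simplified toy model. It is an assumption for illustration only: GCC's real cost model also accounts for loads, stores, permutes, prologue/epilogue overhead and more, and struct toy_vector_costs and toy_vector_is_cheaper are hypothetical names. Only the two cost values are taken from qdf24xx_vector_cost.

```c
/* Toy profitability check; far simpler than GCC's vectorizer cost model.  */
struct toy_vector_costs
{
  int scalar_fp_stmt_cost;
  int vec_fp_stmt_cost;
};

/* Return nonzero if one vector iteration of a loop body containing
   FP_STMTS floating-point statements is cheaper than the VF scalar
   iterations it replaces.  */
static int
toy_vector_is_cheaper (const struct toy_vector_costs *c, int fp_stmts, int vf)
{
  int scalar_cost = fp_stmts * c->scalar_fp_stmt_cost * vf;
  int vector_cost = fp_stmts * c->vec_fp_stmt_cost;
  return vector_cost < scalar_cost;
}
```

With the qdf24xx values (scalar_fp_stmt_cost 1, vec_fp_stmt_cost 3), a two-lane FP vector no longer beats the scalar loop in this model (6 vs 4), while a four-lane one still does (6 vs 8).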





PING [PATCH] libsanitizer: Mark REAL(swapcontext) with indirect_return attribute on x86

2018-07-26 Thread H.J. Lu
On Fri, Jul 20, 2018 at 1:11 PM, H.J. Lu  wrote:
> Cherry-pick compiler-rt revision 337603:
>
> When shadow stack from Intel CET is enabled, the first instruction of all
> indirect branch targets must be a special instruction, ENDBR.
>
> lib/asan/asan_interceptors.cc has
>
> ...
>   int res = REAL(swapcontext)(oucp, ucp);
> ...
>
> REAL(swapcontext) is a function pointer to swapcontext in libc.  Since
> swapcontext may return via an indirect branch on x86 when shadow stack is
> enabled, as in this case,
>
> int res = REAL(swapcontext)(oucp, ucp);  // This call may return
>                                          // via an indirect branch.
>
> the compiler must insert ENDBR after the call, like
>
> call *bar(%rip)
> endbr64
>
> I opened an LLVM bug:
>
> https://bugs.llvm.org/show_bug.cgi?id=38207
>
> to add the indirect_return attribute so that it can be used to tell the
> compiler to insert ENDBR after the REAL(swapcontext) call.  We mark
> REAL(swapcontext) with the indirect_return attribute if it is available.
>
> This fixed:
>
> https://bugs.llvm.org/show_bug.cgi?id=38249
>
> Reviewed By: eugenis
>
> Differential Revision: https://reviews.llvm.org/D49608
>
> OK for trunk?
>
> H.J.
> ---
> PR target/86560
> * asan/asan_interceptors.cc (swapcontext): Call REAL(swapcontext)
> with indirect_return attribute on x86 if indirect_return attribute
> is available.
> * sanitizer_common/sanitizer_internal_defs.h (__has_attribute):
> New.
> ---
>  libsanitizer/asan/asan_interceptors.cc  | 8 
>  libsanitizer/sanitizer_common/sanitizer_internal_defs.h | 5 +
>  2 files changed, 13 insertions(+)
>
> diff --git a/libsanitizer/asan/asan_interceptors.cc b/libsanitizer/asan/asan_interceptors.cc
> index a8f4b72723f..552cf9347af 100644
> --- a/libsanitizer/asan/asan_interceptors.cc
> +++ b/libsanitizer/asan/asan_interceptors.cc
> @@ -267,7 +267,15 @@ INTERCEPTOR(int, swapcontext, struct ucontext_t *oucp,
>uptr stack, ssize;
>ReadContextStack(ucp, &stack, &ssize);
>ClearShadowMemoryForContextStack(stack, ssize);
> +#if __has_attribute(__indirect_return__) && \
> +(defined(__x86_64__) || defined(__i386__))
> +  int (*real_swapcontext)(struct ucontext_t *, struct ucontext_t *)
> +__attribute__((__indirect_return__))
> += REAL(swapcontext);
> +  int res = real_swapcontext(oucp, ucp);
> +#else
>int res = REAL(swapcontext)(oucp, ucp);
> +#endif
>// swapcontext technically does not return, but program may swap context to
>// "oucp" later, that would look as if swapcontext() returned 0.
>// We need to clear shadow for ucp once again, as it may be in arbitrary
> diff --git a/libsanitizer/sanitizer_common/sanitizer_internal_defs.h b/libsanitizer/sanitizer_common/sanitizer_internal_defs.h
> index edd6a21c122..4413a88bea0 100644
> --- a/libsanitizer/sanitizer_common/sanitizer_internal_defs.h
> +++ b/libsanitizer/sanitizer_common/sanitizer_internal_defs.h
> @@ -104,6 +104,11 @@
>  # define __has_feature(x) 0
>  #endif
>
> +// Older GCCs do not understand __has_attribute.
> +#if !defined(__has_attribute)
> +# define __has_attribute(x) 0
> +#endif
> +
>  // For portability reasons we do not include stddef.h, stdint.h or any other
>  // system header, but we do need some basic types that are not defined
>  // in a portable way by the language itself.
> --
> 2.17.1
>
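The same guard pattern can be reused outside the sanitizer. Below is a minimal sketch, assuming a compiler that supports the x86 indirect_return function-type attribute for the first branch (other compilers fall back to a plain call); switch_fn and call_switch are hypothetical names. Presumably the attribute only changes code generation when CET instrumentation such as -fcf-protection is enabled.

```c
/* Older compilers may not provide __has_attribute; treat every attribute
   as unavailable there (the same fallback the patch adds to
   sanitizer_internal_defs.h).  */
#ifndef __has_attribute
# define __has_attribute(x) 0
#endif

/* Hypothetical type for a context-switching function such as swapcontext.  */
typedef int (*switch_fn) (void *from, void *to);

/* Call FN.  On x86 with a compiler that knows indirect_return, the call
   goes through a pointer carrying the attribute, so that with CET
   instrumentation enabled the compiler can emit ENDBR after the call.  */
int
call_switch (switch_fn fn, void *from, void *to)
{
#if __has_attribute(__indirect_return__) && \
    (defined(__x86_64__) || defined(__i386__))
  int (*real_fn) (void *, void *) __attribute__((__indirect_return__)) = fn;
  return real_fn (from, to);
#else
  return fn (from, to);
#endif
}
```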

Any objections?


-- 
H.J.


PING [PATCH] i386: Remove _Unwind_Frames_Increment

2018-07-26 Thread H.J. Lu
On Fri, Jul 20, 2018 at 11:15 AM, H.J. Lu  wrote:
> Tested on CET SDV using the CET kernel on cet branch at:
>
> https://github.com/yyu168/linux_cet/tree/cet
>
> OK for trunk and GCC 8 branch?
>
> Thanks.
>
>
> H.J.
> ---
> The CET kernel has been changed to place a restore token on the shadow
> stack for signal handlers to enhance security.  This is usually
> transparent to user programs, since the kernel pops the restore token
> when the signal handler returns.  But when an exception is thrown from
> a signal handler, we now need to remove _Unwind_Frames_Increment so
> that the restore token is popped from the shadow stack.  Otherwise, we get
>
> FAIL: g++.dg/torture/pr85334.C   -O0  execution test
> FAIL: g++.dg/torture/pr85334.C   -O1  execution test
> FAIL: g++.dg/torture/pr85334.C   -O2  execution test
> FAIL: g++.dg/torture/pr85334.C   -O3 -g  execution test
> FAIL: g++.dg/torture/pr85334.C   -Os  execution test
> FAIL: g++.dg/torture/pr85334.C   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
>
> PR libgcc/85334
> * config/i386/shadow-stack-unwind.h (_Unwind_Frames_Increment):
> Removed.
> ---
>  libgcc/config/i386/shadow-stack-unwind.h | 5 -
>  1 file changed, 5 deletions(-)
>
> diff --git a/libgcc/config/i386/shadow-stack-unwind.h b/libgcc/config/i386/shadow-stack-unwind.h
> index a32f3e74b52..40f48df2aec 100644
> --- a/libgcc/config/i386/shadow-stack-unwind.h
> +++ b/libgcc/config/i386/shadow-stack-unwind.h
> @@ -49,8 +49,3 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> }   \
>  }  \
>  while (0)
> -
> -/* Increment frame count.  Skip signal frames.  */
> -#undef _Unwind_Frames_Increment
> -#define _Unwind_Frames_Increment(context, frames) \
> -  if (!_Unwind_IsSignalFrame (context)) frames++
> --
> 2.17.1
>
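One way to picture the change is a toy model, not libgcc code: struct toy_frame and both counting functions are hypothetical. Assuming each signal frame now owns one shadow-stack entry (its restore token), a counter that skips signal frames comes up one entry short per signal frame, while counting every frame covers the token as well.

```c
#include <stdbool.h>

/* Hypothetical summary of one frame visited while unwinding.  */
struct toy_frame
{
  bool is_signal_frame;
};

/* What the removed override did: count only non-signal frames.  */
static int
count_frames_skipping_signal (const struct toy_frame *frames, int n)
{
  int count = 0;
  for (int i = 0; i < n; i++)
    if (!frames[i].is_signal_frame)
      count++;
  return count;
}

/* Counting every frame: in this model, one shadow-stack entry per frame,
   including the entry holding a signal frame's restore token.  */
static int
count_frames_all (const struct toy_frame *frames, int n)
{
  (void) frames;   /* every frame counts, so the flags are not inspected */
  return n;
}
```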

I will check it into trunk tomorrow if there is no objection.


-- 
H.J.


Re: PING [PATCH] libsanitizer: Mark REAL(swapcontext) with indirect_return attribute on x86

2018-07-26 Thread Jakub Jelinek
On Thu, Jul 26, 2018 at 07:38:34AM -0700, H.J. Lu wrote:
> > PR target/86560
> > * asan/asan_interceptors.cc (swapcontext): Call REAL(swapcontext)
> > with indirect_return attribute on x86 if indirect_return attribute
> > is available.
> > * sanitizer_common/sanitizer_internal_defs.h (__has_attribute):
> > New.

If it is a cherry-pick, just say so in the ChangeLog entry.
Ok with that change.

Jakub


[PATCH] Print heuristics probability fraction part with 2 digits.

2018-07-26 Thread Martin Liška
Hi.

It's just a cosmetic change: print two digits in the fractional part of
heuristic probabilities. This helps distinguish 100% from
PROB_VERY_LIKELY (99.96%), which the old one-digit format printed as 100.0%.
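The effect of switching from %.1f to %.2f can be checked in isolation. This is a small sketch: format_probability is a hypothetical helper that reproduces only the arithmetic of the fprintf in dump_prediction, and REG_BR_PROB_BASE is assumed to be 10000, so PROB_VERY_LIKELY corresponds to 9996.

```c
#include <stdio.h>

/* Assumed value of GCC's REG_BR_PROB_BASE.  */
#define REG_BR_PROB_BASE 10000

/* Format PROBABILITY (out of REG_BR_PROB_BASE) as a percentage with
   DIGITS fractional digits, using the same arithmetic as dump_prediction.  */
static void
format_probability (char *buf, size_t size, int probability, int digits)
{
  snprintf (buf, size, "%.*f%%", digits,
            probability * 100.0 / REG_BR_PROB_BASE);
}
```

With one digit, 9996 prints as "100.0%"; with two it prints as "99.96%". Likewise, a probability of 5 changes from "0.1%" to "0.05%", which matches the adjusted pattern in predict-13.c.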

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Ready to be installed?
Martin

gcc/ChangeLog:

2018-07-26  Martin Liska  

* predict.c (dump_prediction): Print 2 digits
in the fractional part.

gcc/testsuite/ChangeLog:

2018-07-26  Martin Liska  

* gcc.dg/predict-1.c: Adjust scanned pattern to cover 2 digits.
* gcc.dg/predict-13.c: Likewise.
* gcc.dg/predict-3.c: Likewise.
* gcc.dg/predict-4.c: Likewise.
* gcc.dg/predict-5.c: Likewise.
* gcc.dg/predict-6.c: Likewise.
* gcc.dg/predict-9.c: Likewise.
* gfortran.dg/predict-1.f90: Likewise.
---
 gcc/predict.c   | 2 +-
 gcc/testsuite/gcc.dg/predict-1.c| 2 +-
 gcc/testsuite/gcc.dg/predict-13.c   | 4 ++--
 gcc/testsuite/gcc.dg/predict-3.c| 2 +-
 gcc/testsuite/gcc.dg/predict-4.c| 2 +-
 gcc/testsuite/gcc.dg/predict-5.c| 2 +-
 gcc/testsuite/gcc.dg/predict-6.c| 2 +-
 gcc/testsuite/gcc.dg/predict-9.c| 4 ++--
 gcc/testsuite/gfortran.dg/predict-1.f90 | 2 +-
 9 files changed, 11 insertions(+), 11 deletions(-)


diff --git a/gcc/predict.c b/gcc/predict.c
index 65e088fb8df..a6769eda1c7 100644
--- a/gcc/predict.c
+++ b/gcc/predict.c
@@ -734,7 +734,7 @@ dump_prediction (FILE *file, enum br_predictor predictor, int probability,
   else
 edge_info_str[0] = '\0';
 
-  fprintf (file, "  %s heuristics%s%s: %.1f%%",
+  fprintf (file, "  %s heuristics%s%s: %.2f%%",
 	   predictor_info[predictor].name,
 	   edge_info_str, reason_messages[reason],
 	   probability * 100.0 / REG_BR_PROB_BASE);
diff --git a/gcc/testsuite/gcc.dg/predict-1.c b/gcc/testsuite/gcc.dg/predict-1.c
index 4ba26e6e256..9e5605a2e84 100644
--- a/gcc/testsuite/gcc.dg/predict-1.c
+++ b/gcc/testsuite/gcc.dg/predict-1.c
@@ -23,4 +23,4 @@ void foo (int bound)
 }
 }
 
-/* { dg-final { scan-tree-dump-times "guess loop iv compare heuristics of edge\[^:\]*: 36.0%" 4 "profile_estimate"} } */
+/* { dg-final { scan-tree-dump-times "guess loop iv compare heuristics of edge\[^:\]*: 36.00%" 4 "profile_estimate"} } */
diff --git a/gcc/testsuite/gcc.dg/predict-13.c b/gcc/testsuite/gcc.dg/predict-13.c
index 385be9e1389..c6da45f8127 100644
--- a/gcc/testsuite/gcc.dg/predict-13.c
+++ b/gcc/testsuite/gcc.dg/predict-13.c
@@ -20,5 +20,5 @@ int main(int argc, char **argv)
   return 10;
 }
 
-/* { dg-final { scan-tree-dump-times "combined heuristics of edge\[^:\]*: 33.3%" 3 "profile_estimate"} } */
-/* { dg-final { scan-tree-dump-times "combined heuristics of edge\[^:\]*: 0.1%" 2 "profile_estimate"} } */
+/* { dg-final { scan-tree-dump-times "combined heuristics of edge\[^:\]*: 33.30%" 3 "profile_estimate"} } */
+/* { dg-final { scan-tree-dump-times "combined heuristics of edge\[^:\]*: 0.05%" 2 "profile_estimate"} } */
diff --git a/gcc/testsuite/gcc.dg/predict-3.c b/gcc/testsuite/gcc.dg/predict-3.c
index 81addde1667..f3f416345e5 100644
--- a/gcc/testsuite/gcc.dg/predict-3.c
+++ b/gcc/testsuite/gcc.dg/predict-3.c
@@ -25,4 +25,4 @@ void foo (int bound)
 }
 }
 
-/* { dg-final { scan-tree-dump-times "guess loop iv compare heuristics of edge\[^:\]*: 64.0%" 3 "profile_estimate"} } */
+/* { dg-final { scan-tree-dump-times "guess loop iv compare heuristics of edge\[^:\]*: 64.00%" 3 "profile_estimate"} } */
diff --git a/gcc/testsuite/gcc.dg/predict-4.c b/gcc/testsuite/gcc.dg/predict-4.c
index 2ac2ec5721d..851afb1cff5 100644
--- a/gcc/testsuite/gcc.dg/predict-4.c
+++ b/gcc/testsuite/gcc.dg/predict-4.c
@@ -15,4 +15,4 @@ void foo (int bound)
 }
 }
 
-/* { dg-final { scan-tree-dump "  loop iv compare heuristics of edge\[^:\]*: 50.0%" "profile_estimate"} } */
+/* { dg-final { scan-tree-dump "  loop iv compare heuristics of edge\[^:\]*: 50.00%" "profile_estimate"} } */
diff --git a/gcc/testsuite/gcc.dg/predict-5.c b/gcc/testsuite/gcc.dg/predict-5.c
index c80b2928d57..5af5db1825e 100644
--- a/gcc/testsuite/gcc.dg/predict-5.c
+++ b/gcc/testsuite/gcc.dg/predict-5.c
@@ -21,4 +21,4 @@ void foo (int base, int bound)
 }
 }
 
-/* { dg-final { scan-tree-dump-times "guess loop iv compare heuristics of edge\[^:\]*: 64.0%" 4 "profile_estimate"} } */
+/* { dg-final { scan-tree-dump-times "guess loop iv compare heuristics of edge\[^:\]*: 64.00%" 4 "profile_estimate"} } */
diff --git a/gcc/testsuite/gcc.dg/predict-6.c b/gcc/testsuite/gcc.dg/predict-6.c
index 3acc7644629..5d6fbf158f2 100644
--- a/gcc/testsuite/gcc.dg/predict-6.c
+++ b/gcc/testsuite/gcc.dg/predict-6.c
@@ -21,4 +21,4 @@ void foo (int base, int bound)
 }
 }
 
-/* { dg-final { scan-tree-dump-times "guess loop iv compare heuristics of edge\[^:\]*: 36.0%" 4 "profile_estimate"} } */
+/* { dg-final { scan-tree-dump-times "guess loop iv compare heuristics of edge\[^:\]*: 36.00%" 4 "profile_estimate"} } */
diff --git a/gcc/testsuite/gcc.dg/predic

[PATCH] Add malloc predictor (PR middle-end/83023).

2018-07-26 Thread Martin Liška
Hi.

The following patch implements a new predictor for calls to malloc-like
functions, whose return value is almost always non-null.
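For comparison, this is the kind of hint a programmer would otherwise write by hand. It is a hedged sketch: zeroed_buffer is a hypothetical function, and __builtin_expect only approximates what PRED_MALLOC_NONNULL derives automatically (the probabilities assigned by the two predictors differ).

```c
#include <stdlib.h>
#include <string.h>

/* Allocate N bytes and zero them.  The explicit __builtin_expect marks the
   success path as likely; with the patch, GCC would infer a similar hint
   for the null check after a call to a malloc-like function.  */
void *
zeroed_buffer (size_t n)
{
  void *p = malloc (n);
  if (__builtin_expect (p != NULL, 1))
    memset (p, 0, n);
  return p;
}
```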

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Ready to be installed?
Martin

gcc/ChangeLog:

2018-07-26  Martin Liska  

PR middle-end/83023
* predict.c (expr_expected_value_1): Handle DECL_IS_MALLOC
declarations.
* predict.def (PRED_MALLOC_NONNULL): New predictor.

gcc/testsuite/ChangeLog:

2018-07-26  Martin Liska  

PR middle-end/83023
* gcc.dg/predict-16.c: New test.
---
 gcc/predict.c |  8 
 gcc/predict.def   |  3 +++
 gcc/testsuite/gcc.dg/predict-16.c | 31 +++
 3 files changed, 42 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/predict-16.c


diff --git a/gcc/predict.c b/gcc/predict.c
index a6769eda1c7..a7b2223c697 100644
--- a/gcc/predict.c
+++ b/gcc/predict.c
@@ -2380,6 +2380,14 @@ expr_expected_value_1 (tree type, tree op0, enum tree_code code,
 		}
 	  return NULL;
 	}
+
+	  if (DECL_IS_MALLOC (decl))
+	{
+	  if (predictor)
+		*predictor = PRED_MALLOC_NONNULL;
+	  return boolean_true_node;
+	}
+
 	  if (DECL_BUILT_IN_CLASS (decl) == BUILT_IN_NORMAL)
 	switch (DECL_FUNCTION_CODE (decl))
 	  {
diff --git a/gcc/predict.def b/gcc/predict.def
index 4ed97ed165c..8036fac84c5 100644
--- a/gcc/predict.def
+++ b/gcc/predict.def
@@ -169,6 +169,9 @@ DEF_PREDICTOR (PRED_HOT_LABEL, "hot label", HITRATE (85), 0)
 DEF_PREDICTOR (PRED_COLD_LABEL, "cold label", PROB_VERY_LIKELY,
 	   PRED_FLAG_FIRST_MATCH)
 
+/* Return value of malloc function is almost always non-null.  */
+DEF_PREDICTOR (PRED_MALLOC_NONNULL, "malloc returned non-NULL", \
+	   PROB_VERY_LIKELY, PRED_FLAG_FIRST_MATCH)
 
 /* The following predictors are used in Fortran. */
 
diff --git a/gcc/testsuite/gcc.dg/predict-16.c b/gcc/testsuite/gcc.dg/predict-16.c
new file mode 100644
index 000..3a3e943bb79
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/predict-16.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-profile_estimate" } */
+
+#include 
+#include 
+
+void *r;
+void *r2;
+void *r3;
+void *r4;
+
+void *m (size_t s, int c)
+{
+  r = malloc (s);
+  if (r)
+    memset (r, 0, s);
+
+  r2 = calloc (s, 0);
+  if (r2)
+    memset (r2, 0, s);
+
+  r3 = __builtin_malloc (s);
+  if (r3)
+    memset (r3, 0, s);
+
+  r4 = __builtin_calloc (s, 0);
+  if (r4)
+    memset (r4, 0, s);
+}
+
+/* { dg-final { scan-tree-dump-times "malloc returned non-NULL heuristics of edge\[^:\]*: 99.96%" 4 "profile_estimate"} } */
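The 99.96% expected by the test corresponds to PROB_VERY_LIKELY, assuming the gcc/predict.h definitions of the time (REG_BR_PROB_BASE of 10000 and PROB_VERY_UNLIKELY defined as REG_BR_PROB_BASE / 2000 - 1). A small sketch of that arithmetic, with the macros copied in as assumptions:

```c
/* Assumed values, mirroring gcc/predict.h at the time of this patch.  */
#define REG_BR_PROB_BASE 10000
#define PROB_VERY_UNLIKELY (REG_BR_PROB_BASE / 2000 - 1)         /* 4 */
#define PROB_VERY_LIKELY (REG_BR_PROB_BASE - PROB_VERY_UNLIKELY) /* 9996 */

/* Whole-percent part of a probability scaled by REG_BR_PROB_BASE.  */
int prob_percent_whole (int prob)
{
  return prob * 100 / REG_BR_PROB_BASE;
}

/* Hundredths-of-a-percent part, for printing with two decimals.  */
int prob_percent_hundredths (int prob)
{
  return prob * 10000 / REG_BR_PROB_BASE % 100;
}
```

For PROB_VERY_LIKELY this gives 99 and 96, i.e. the "99.96%" the dump scan looks for.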



Re: [PATCH] [AArch64, Falkor] Switch to using Falkor-specific vector costs

2018-07-26 Thread Luis Machado

Hi Kyrill,

On 07/26/2018 11:34 AM, Kyrill Tkachov wrote:

Hi Luis,

On 25/07/18 19:10, Luis Machado wrote:
The adjusted vector costs give Falkor a reasonable boost in 
performance for FP benchmarks (both CPU2017 and CPU2006) and don't
change INT benchmarks much. The FP gains are about 0.7% for CPU2017
and 1.54% for CPU2006.

OK for trunk?



The patch looks ok and safe to me (though you'll need approval from the 
maintainers).


I'd be interested to see what workloads in CPU2017 were affected by this.
Any chance you could post the breakdown in numbers from CPU2017?



Sure. Here it is (speed):

605.mcf_s: -1.8%
620.omnetpp_s: -2% (tends to be noisy)
623.xalancbmk_s: 2%
654.roms_s: 7%

INT mean: -0.09%
FP mean: 0.70%

It is worth mentioning that I noticed bigger improvements in CPU2017 rate, 
but I did not record those numbers for the final run. The speed 
benchmarks seem to have a slightly different performance profile.


Here's a breakdown of the biggest changes from CPU2006 in case you're 
interested:


410.bwaves: 5.4%
434.zeusmp: 9.7%
436.cactusADM: -12.3%
437.leslie3d: 5.2%
459.GemsFDTD: 16.9%

cactusADM seems to have a pretty big loop that is a win if vectorized, 
but experimentation showed me it is tricky to get GCC to vectorize that 
specific loop without also vectorizing particular loops from the other 
benchmarks.


It would be nice to get cactusADM back up though.


Re: Fwd: [PATCH, rs6000] Replace __uint128_t and __int128_t with __uint128 and __int128 in Power PC built-in documentation

2018-07-26 Thread Segher Boessenkool
On Thu, Jul 26, 2018 at 08:40:01AM -0500, Kelvin Nilsen wrote:
> To improve internal consistency and to improve consistency with published ABI 
> documents, this patch replaces the __uint128_t type with __uint128 and 
> replaces __int128_t with __int128.

> Is this ok for trunk?

Looks good, thanks!  Most (all?) of these functions are not documented
in the ABI, but this is a step forward anyway.  Okay for trunk.

What do things like error messages involving these functions look like?
What types do those say?


Segher


Re: [PATCH] treat -Wxxx-larger-than=HWI_MAX special (PR 86631)

2018-07-26 Thread Martin Sebor

On 07/26/2018 02:38 AM, Richard Biener wrote:

On Wed, Jul 25, 2018 at 5:54 PM Martin Sebor  wrote:


On 07/25/2018 08:57 AM, Jakub Jelinek wrote:

On Wed, Jul 25, 2018 at 08:54:13AM -0600, Martin Sebor wrote:

I don't mean for the special value to be used except internally
for the defaults.  Otherwise, users wanting to override the default
will choose a value other than it.  I'm happy to document it in
the .opt file for internal users though.

-1 has the documented effect of disabling the warnings altogether
(-1 is SIZE_MAX) so while I agree that -1 looks better it doesn't
work.  (It would need more significant changes.)


The variable is signed, so -1 is not SIZE_MAX.  Even if -1 disables it, you
could use e.g. -2 or other negative value for the other special case.


The -Wxxx-larger-than=N distinguish three ranges of argument
values (treated as unsigned):

   1.  [0, HOST_WIDE_INT_MAX)
   2.  HOST_WIDE_INT_MAX
   3.  [HOST_WIDE_INT_MAX + 1, Infinity)
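The three ranges could be told apart along these lines. This is a hypothetical sketch rather than the patch's code: it assumes a 64-bit HOST_WIDE_INT and reads range 2 as the internal default and range 3 as "warning disabled", following the description of -1 (SIZE_MAX) earlier in the thread.

```c
#include <stdint.h>

/* Hypothetical sketch: assumes HOST_WIDE_INT is a 64-bit signed type,
   so HOST_WIDE_INT_MAX == INT64_MAX.  */
enum larger_than_kind
{
  LARGER_THAN_LIMIT,    /* 1. [0, HOST_WIDE_INT_MAX): an explicit limit.  */
  LARGER_THAN_DEFAULT,  /* 2. HOST_WIDE_INT_MAX: special internal default.  */
  LARGER_THAN_DISABLED  /* 3. larger values, e.g. -1 read as SIZE_MAX.  */
};

enum larger_than_kind
classify_larger_than (uint64_t arg)
{
  const uint64_t hwi_max = (uint64_t) INT64_MAX;

  if (arg < hwi_max)
    return LARGER_THAN_LIMIT;
  if (arg == hwi_max)
    return LARGER_THAN_DEFAULT;
  return LARGER_THAN_DISABLED;
}
```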


But it doesn't make sense for those to be host dependent.


It isn't when the values are handled by each warning.  That's
also the point of this patch: to remove this (unintended)
dependency.


I think numerical user input should be limited to [0, ptrdiff_max]
and cases (1) and (2) should be simply merged, I see no value
in distinguishing them.  -Wxxx-larger-than should be aliased
to [0, ptrdiff_max], case (3) is achieved by -Wno-xxx-larger-than.

I think you are over-engineering this and the user-interface is
awful.


Thank you.

I agree that what you describe would be the ideal solution.
As I explained in the description of the patch, I did consider
handling PTRDIFF_MAX but the target-dependent value is not
available at the time the option argument is processed.  We
don't even know yet what the target data model is.

This is the best I came up with.  What do you suggest instead?

Martin


Re: [PATCH][Middle-end] disable strcmp/strncmp inlining with O2 below and Os

2018-07-26 Thread Qing Zhao


> On Jul 26, 2018, at 3:26 AM, Richard Biener  wrote:
> 
> On Wed, 25 Jul 2018, Qing Zhao wrote:
> 
>> Hi,
>> 
>> As Wilco suggested, the newly added strcmp/strncmp inlining should only be 
>> enabled at -O2 and above.
>> 
>> This is the simple patch for this change.
>> 
>> Tested on both x86 and aarch64.
>> 
>> Okay for trunk?
> 
> You should simply use
> 
>  if (optimize_insn_for_size_p ())
>    return NULL_RTX;
> 
> to be properly profile-aware.  OK with that change.

Thanks for the review.

I will make the change, retest it, and then commit it.

Qing
> 
> Richard.
> 



Re: [PATCH] Add malloc predictor (PR middle-end/83023).

2018-07-26 Thread Marc Glisse

On Thu, 26 Jul 2018, Martin Liška wrote:


Following patch implements new predictors that annotates malloc-like functions.
These almost every time return a non-null value.


Out of curiosity (the __builtin_expect there doesn't hurt and we don't 
need to remove it), does it make __builtin_expect unnecessary in the 
implementation of operator new (libstdc++-v3/libsupc++/new_op.cc)? It 
looks like


  while (__builtin_expect ((p = malloc (sz)) == 0, false))
    {
      new_handler handler = std::get_new_handler ();
      if (! handler)
        _GLIBCXX_THROW_OR_ABORT(bad_alloc());
      handler ();
    }

where being in a loop may trigger opposite heuristics.
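A C analogue of the operator new loop quoted above, to make the shape concrete. The handler type and the alloc_or_abort name are hypothetical stand-ins for std::new_handler and operator new; abort () stands in for _GLIBCXX_THROW_OR_ABORT.

```c
#include <stdlib.h>

/* Hypothetical stand-in for std::new_handler.  */
typedef void (*alloc_handler) (void);
static alloc_handler current_handler = NULL;

/* The malloc result is tested in the loop condition: __builtin_expect
   (and, with the patch, the malloc predictor) says "leave the loop",
   while generic loop heuristics may favour staying in it.  */
void *alloc_or_abort (size_t sz)
{
  void *p;
  while (__builtin_expect ((p = malloc (sz)) == 0, 0))
    {
      alloc_handler handler = current_handler;
      if (!handler)
        abort ();
      handler ();
    }
  return p;
}
```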

--
Marc Glisse


Re: Build fail on gthr-simple.h targets (Re: AsyncI/O patch committed)

2018-07-26 Thread David Edelsohn
> Ulrich Weigand wrote:

> Nicholas Koenig wrote:
>
>> Hello everyone,
>>
>> I have committed the async I/O patch as r262978.
>>
>> The test cases are in libgomp.fortran for now, maybe that can be changed
>> later.
>
> It looks like this broke building libgfortran on spu, and presumably
> any platform that uses gthr-simple instead of gthr-posix.

Yes, this broke bootstrap for AIX as well.

- David


Re: [patch] adjust default nvptx launch geometry for OpenACC offloaded regions

2018-07-26 Thread Tom de Vries
On 07/26/2018 04:27 PM, Cesar Philippidis wrote:
> Hi Tom,
> 
> I see that you're reviewing the libgomp changes. Please disregard the
> following hunk:
> 
> On 07/11/2018 12:13 PM, Cesar Philippidis wrote:
>> @@ -1199,12 +1202,59 @@ nvptx_exec (void (*fn), size_t mapnum, void 
>> **hostaddrs, void **devaddrs,
>>   default_dims[GOMP_DIM_VECTOR]);
>>  }
>>pthread_mutex_unlock (&ptx_dev_lock);
>> +  int vectors = default_dims[GOMP_DIM_VECTOR];
>> +  int workers = default_dims[GOMP_DIM_WORKER];
>> +  int gangs = default_dims[GOMP_DIM_GANG];
>> +
>> +  if (nvptx_thread()->ptx_dev->driver_version > 6050)
>> +{
>> +  int grids, blocks;
>> +  CUDA_CALL_ASSERT (cuOccupancyMaxPotentialBlockSize, &grids,
>> +&blocks, function, NULL, 0,
>> +dims[GOMP_DIM_WORKER] * dims[GOMP_DIM_VECTOR]);
>> +  GOMP_PLUGIN_debug (0, "cuOccupancyMaxPotentialBlockSize: "
>> + "grid = %d, block = %d\n", grids, blocks);
>> +
>> +  gangs = grids * dev_size;
>> +  workers = blocks / vectors;
>> +}
> 
> I revisited this change yesterday and I noticed it was setting gangs
> incorrectly. Basically, gangs should be set as follows
> 
>   gangs = grids * (blocks / warp_size);
> 
> or, to be closer to og8, as
> 
>   gangs = 2 * grids * (blocks / warp_size);
> 
> The magic constant 2 is there to prevent thread starvation; it is
> similar to the idea behind make -j<2*#threads>.
> 
> Anyway, I'm still experimenting with that change. There are still some
> discrepancies between the way that I select num_workers and how the
> driver does. The driver appears to be a little bit more conservative,
> but according to the thread occupancy calculator, that should yield
> greater performance on GPUs.
> 
> I just wanted to give you a heads up because you seem to be working on this.
> 
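As a worked example of the gang formula above: a hypothetical helper, where grids and blocks stand for the two values cuOccupancyMaxPotentialBlockSize returns in the quoted hunk, and the warp size of 32 is an assumption.

```c
/* Assumed warp size for NVIDIA GPUs.  */
#define WARP_SIZE 32

/* gangs = factor * grids * (blocks / WARP_SIZE); factor 1 is the
   corrected formula, factor 2 the og8-style oversubscription.  */
int gangs_from_occupancy (int grids, int blocks, int factor)
{
  return factor * grids * (blocks / WARP_SIZE);
}
```

For example, grids = 80 and blocks = 1024 give 80 * 32 = 2560 gangs, or 5120 with the factor of 2.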

Ack, thanks for letting me know.

> Thanks for all of your reviews!
> 
> By the way, are you now maintainer of the libgomp nvptx plugin?

I'm not sure if that's a separate thing.

AFAIU the responsibilities of the nvptx maintainer are:
- the nvptx backend (under supervision of the global maintainers)
- and anything nvptx-y in all other components (under supervision of the
  component and global maintainers)

So, I'd say I'm on the hook to review patches for the nvptx plugin in
libgomp.

Thanks,
- Tom


Re: [PATCH 2/8] Remove char16_t and char32_t dependency on

2018-07-26 Thread Marek Polacek
On Thu, Jul 26, 2018 at 03:01:51PM +0100, jwak...@redhat.com wrote:
> --- a/libstdc++-v3/src/c++98/locale_init.cc
> +++ b/libstdc++-v3/src/c++98/locale_init.cc
> @@ -201,7 +201,6 @@ namespace
>fake_messages_w messages_w;
>  #endif
>  
> -#ifdef _GLIBCXX_USE_C99_STDINT_TR1
>typedef char fake_codecvt_c16[sizeof(codecvt)]
>__attribute__ ((aligned(__alignof__(codecvt;
>fake_codecvt_c16 codecvt_c16;
> @@ -209,7 +208,6 @@ namespace
>typedef char fake_codecvt_c32[sizeof(codecvt)]
>__attribute__ ((aligned(__alignof__(codecvt;
>fake_codecvt_c32 codecvt_c32;
> -#endif
>  
>// Storage for "C" locale caches.
>typedef char fake_num_cache_c[sizeof(std::__numpunct_cache)]
> @@ -329,7 +327,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>  &std::ctype::id,
>  &codecvt::id,
>  #endif
> -#ifdef _GLIBCXX_USE_C99_STDINT_TR1
> +#if _GLIBCXX_NUM_UNICODE_FACETS != 0
>  &codecvt::id,
>  &codecvt::id,
>  #endif
> @@ -536,7 +534,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>  _M_init_facet(new (&messages_w) std::messages(1));
>  #endif
>  
> -#ifdef _GLIBCXX_USE_C99_STDINT_TR1
> +#ifdef _GLIBCXX_NUM_UNICODE_FACETS != 0

This seems like a mistake; ok to fix it with the following?

2018-07-26  Marek Polacek  

* src/c++98/locale_init.cc: Fix #ifdef condition.

--- gcc/libstdc++-v3/src/c++98/locale_init.cc
+++ gcc/libstdc++-v3/src/c++98/locale_init.cc
@@ -534,7 +534,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 _M_init_facet(new (&messages_w) std::messages(1));
 #endif
 
-#ifdef _GLIBCXX_NUM_UNICODE_FACETS != 0
+#if _GLIBCXX_NUM_UNICODE_FACETS != 0
 _M_init_facet(new (&codecvt_c16) codecvt(1));
 _M_init_facet(new (&codecvt_c32) codecvt(1));
 #endif


Re: [PATCH 2/8] Remove char16_t and char32_t dependency on

2018-07-26 Thread Jonathan Wakely

On 26/07/18 11:51 -0400, Marek Polacek wrote:

On Thu, Jul 26, 2018 at 03:01:51PM +0100, jwak...@redhat.com wrote:

--- a/libstdc++-v3/src/c++98/locale_init.cc
+++ b/libstdc++-v3/src/c++98/locale_init.cc
@@ -201,7 +201,6 @@ namespace
   fake_messages_w messages_w;
 #endif

-#ifdef _GLIBCXX_USE_C99_STDINT_TR1
   typedef char fake_codecvt_c16[sizeof(codecvt)]
   __attribute__ ((aligned(__alignof__(codecvt;
   fake_codecvt_c16 codecvt_c16;
@@ -209,7 +208,6 @@ namespace
   typedef char fake_codecvt_c32[sizeof(codecvt)]
   __attribute__ ((aligned(__alignof__(codecvt;
   fake_codecvt_c32 codecvt_c32;
-#endif

   // Storage for "C" locale caches.
   typedef char fake_num_cache_c[sizeof(std::__numpunct_cache)]
@@ -329,7 +327,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 &std::ctype::id,
 &codecvt::id,
 #endif
-#ifdef _GLIBCXX_USE_C99_STDINT_TR1
+#if _GLIBCXX_NUM_UNICODE_FACETS != 0
 &codecvt::id,
 &codecvt::id,
 #endif
@@ -536,7 +534,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 _M_init_facet(new (&messages_w) std::messages(1));
 #endif

-#ifdef _GLIBCXX_USE_C99_STDINT_TR1
+#ifdef _GLIBCXX_NUM_UNICODE_FACETS != 0


This seems like a mistake; ok to fix it with the following?


Doh, yes please do - thanks.


2018-07-26  Marek Polacek  

* src/c++98/locale_init.cc: Fix #ifdef condition.

--- gcc/libstdc++-v3/src/c++98/locale_init.cc
+++ gcc/libstdc++-v3/src/c++98/locale_init.cc
@@ -534,7 +534,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
_M_init_facet(new (&messages_w) std::messages(1));
#endif

-#ifdef _GLIBCXX_NUM_UNICODE_FACETS != 0
+#if _GLIBCXX_NUM_UNICODE_FACETS != 0
_M_init_facet(new (&codecvt_c16) codecvt(1));
_M_init_facet(new (&codecvt_c32) codecvt(1));
#endif


Re: [PATCH 0/8] Reduce/remove dependencies on _GLIBCXX_USE_C99_STDINT_TR1

2018-07-26 Thread Cesar Philippidis
On 07/26/2018 07:01 AM, jwak...@redhat.com wrote:
> From: Jonathan Wakely 

It looks like you're using git send-email for this patch series. And it
seems like you made the same mistake that I did when you configured git
sendemail.from. According to the git send-email manpage, from should be
your email address; however, it really wants it to be of the form
  Full Name 

This is not a huge deal because the email went through, but it was
something that wasn't immediately obvious to me.

Cesar


Re: [PATCH 2/8] Remove char16_t and char32_t dependency on

2018-07-26 Thread Jonathan Wakely

On 26/07/18 16:59 +0100, Jonathan Wakely wrote:

On 26/07/18 11:51 -0400, Marek Polacek wrote:

On Thu, Jul 26, 2018 at 03:01:51PM +0100, jwak...@redhat.com wrote:

--- a/libstdc++-v3/src/c++98/locale_init.cc
+++ b/libstdc++-v3/src/c++98/locale_init.cc
@@ -201,7 +201,6 @@ namespace
  fake_messages_w messages_w;
#endif

-#ifdef _GLIBCXX_USE_C99_STDINT_TR1
  typedef char fake_codecvt_c16[sizeof(codecvt)]
  __attribute__ ((aligned(__alignof__(codecvt;
  fake_codecvt_c16 codecvt_c16;
@@ -209,7 +208,6 @@ namespace
  typedef char fake_codecvt_c32[sizeof(codecvt)]
  __attribute__ ((aligned(__alignof__(codecvt;
  fake_codecvt_c32 codecvt_c32;
-#endif

  // Storage for "C" locale caches.
  typedef char fake_num_cache_c[sizeof(std::__numpunct_cache)]
@@ -329,7 +327,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
&std::ctype::id,
&codecvt::id,
#endif
-#ifdef _GLIBCXX_USE_C99_STDINT_TR1
+#if _GLIBCXX_NUM_UNICODE_FACETS != 0
&codecvt::id,
&codecvt::id,
#endif
@@ -536,7 +534,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
_M_init_facet(new (&messages_w) std::messages(1));
#endif

-#ifdef _GLIBCXX_USE_C99_STDINT_TR1
+#ifdef _GLIBCXX_NUM_UNICODE_FACETS != 0


This seems like a mistake; ok to fix it with the following?


Doh, yes please do - thanks.


The warning scrolled by too fast for me to see:

/home/jwakely/src/gcc/libstdc++-v3/src/c++98/locale_init.cc:537:36: warning: extra tokens at end of #ifdef directive
#ifdef _GLIBCXX_NUM_UNICODE_FACETS != 0
                                   ^~

The downside of building with -j30 on a big machine.
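A minimal illustration of why the directive was wrong (macro and function names are hypothetical): #ifdef only tests whether a macro is defined, so it is true whenever the macro is defined, even as 0, while #if evaluates the expression as intended.

```c
/* Hypothetical stand-in for _GLIBCXX_NUM_UNICODE_FACETS: defined, but 0.  */
#define NUM_UNICODE_FACETS 0

/* #ifdef only asks whether the macro is defined.  Writing
   "#ifdef NUM_UNICODE_FACETS != 0" does not change that: GCC warns about
   the extra tokens and ignores them, so the branch is still taken.  */
int facets_enabled_ifdef (void)
{
#ifdef NUM_UNICODE_FACETS
  return 1;
#else
  return 0;
#endif
}

/* #if evaluates the expression, which is what the code intended.  */
int facets_enabled_if (void)
{
#if NUM_UNICODE_FACETS != 0
  return 1;
#else
  return 0;
#endif
}
```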




Re: [PATCH 0/8] Reduce/remove dependencies on _GLIBCXX_USE_C99_STDINT_TR1

2018-07-26 Thread Jonathan Wakely

On 26/07/18 08:59 -0700, Cesar Philippidis wrote:

On 07/26/2018 07:01 AM, jwak...@redhat.com wrote:

From: Jonathan Wakely 


It looks like you're using git send-email for this patch series. And it
seems like you made the same mistake that I did when you configured git
sendmail.from. According to the git sent-email manpage, from should be
your email address, however, it really wants it to be in of the form

 Full Name 

This is not a huge deal because the email went through, but it was
something that wasn't immediately obvious to me.


Indeed :-)

I already changed the config, but thanks for confirming it was what
caused the problem.




[committed] Don't ignore OpenMP map clauses for declare target to vars if there is always modifier (PR middle-end/86660)

2018-07-26 Thread Jakub Jelinek
Hi!

We can't ignore map clauses for variables that are declare target to
if the map clause has the always modifier, because we need to copy the data to
and/or from the device as the user requested.  This has the slightly undesirable
effect that the vars inside the construct are remapped; perhaps with more
work we could keep using the device var directly.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk
and 8.3.

2018-07-26  Jakub Jelinek  

PR middle-end/86660
* omp-low.c (scan_sharing_clauses): Don't ignore map clauses for
declare target to variables if they have always,{to,from,tofrom} map
kinds.

* testsuite/libgomp.c/pr86660.c: New test.

--- gcc/omp-low.c.jj   2018-07-17 12:54:13.543991017 +0200
+++ gcc/omp-low.c   2018-07-26 13:43:08.453714154 +0200
@@ -1183,13 +1183,16 @@ scan_sharing_clauses (tree clauses, omp_
  /* Global variables with "omp declare target" attribute
 don't need to be copied, the receiver side will use them
 directly.  However, global variables with "omp declare target link"
-attribute need to be copied.  */
+attribute need to be copied.  Or when ALWAYS modifier is used.  */
  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
  && DECL_P (decl)
  && ((OMP_CLAUSE_MAP_KIND (c) != GOMP_MAP_FIRSTPRIVATE_POINTER
   && (OMP_CLAUSE_MAP_KIND (c)
   != GOMP_MAP_FIRSTPRIVATE_REFERENCE))
  || TREE_CODE (TREE_TYPE (decl)) == ARRAY_TYPE)
+ && OMP_CLAUSE_MAP_KIND (c) != GOMP_MAP_ALWAYS_TO
+ && OMP_CLAUSE_MAP_KIND (c) != GOMP_MAP_ALWAYS_FROM
+ && OMP_CLAUSE_MAP_KIND (c) != GOMP_MAP_ALWAYS_TOFROM
  && is_global_var (maybe_lookup_decl_in_outer_ctx (decl, ctx))
  && varpool_node::get_create (decl)->offloadable
  && !lookup_attribute ("omp declare target link",
--- libgomp/testsuite/libgomp.c/pr86660.c.jj   2018-07-26 14:37:49.269202230 +0200
+++ libgomp/testsuite/libgomp.c/pr86660.c   2018-07-26 14:35:31.938979831 +0200
@@ -0,0 +1,28 @@
+/* PR middle-end/86660 */
+
+#pragma omp declare target
+int v[20];
+
+void
+foo (void)
+{
+  if (v[7] != 2)
+    __builtin_abort ();
+  v[7] = 1;
+}
+#pragma omp end declare target
+
+int
+main ()
+{
+  v[5] = 8;
+  v[7] = 2;
+  #pragma omp target map (always, tofrom: v)
+  {
+    foo ();
+    v[5] = 3;
+  }
+  if (v[7] != 1 || v[5] != 3)
+    __builtin_abort ();
+  return 0;
+}

Jakub


[committed] Partial fix for for-15.C (PR middle-end/86660)

2018-07-26 Thread Jakub Jelinek
Hi!

As mentioned in the PR and now that the middle-end bug is fixed, this fixes
thinkos in the testcase; the result variable has to be omp declare target,
otherwise the declared target functions called from the target regions can't
access it.

The testcase still fails to assemble due to exceptions, but that is
something to be fixed in the nvptx backend (sure, we could add
-fno-exceptions temporarily though for nvptx offloading only).

2018-07-26  Jakub Jelinek  

PR testsuite/86660
* testsuite/libgomp.c++/for-15.C (results): Include it in
omp declare target region.
(main): Use map (always, tofrom: results) instead of
map (tofrom: results).

--- libgomp/testsuite/libgomp.c++/for-15.C.jj   2018-07-11 15:09:05.997745784 +0200
+++ libgomp/testsuite/libgomp.c++/for-15.C  2018-07-25 12:30:59.490564748 +0200
@@ -88,11 +88,9 @@ private:
 
 template  const I &J::begin () { return b; }
 template  const I &J::end () { return e; }
-#pragma omp end declare target
 
 int results[2000];
 
-#pragma omp declare target
 template 
 void
 baz (I &i)
@@ -186,37 +184,37 @@ main ()
 a[i] = i;
   #pragma omp target data map (to: a)
   {
-#pragma omp target teams map (tofrom: results)
+#pragma omp target teams map (always, tofrom: results)
 {
   J j (&a[75], &a[1945]);
   f1 (j);
 }
 check (i >= 75 && i < 1945 && (i - 75) % 3 == 0);
-#pragma omp target teams map (tofrom: results)
+#pragma omp target teams map (always, tofrom: results)
 {
   J j (&a[63], &a[1949]);
   f2 (j);
 }
 check (i >= 63 && i < 1949);
-#pragma omp target teams map (tofrom: results)
+#pragma omp target teams map (always, tofrom: results)
 {
   J j (&a[58], &a[1979]);
   f3 <2> (j);
 }
 check (i >= 58 && i < 1979 && (i - 58) % 6 == 0);
-#pragma omp target teams map (tofrom: results)
+#pragma omp target teams map (always, tofrom: results)
 {
   J j (&a[59], &a[1981]);
   f4 <9> (j);
 }
 check (i >= 59 && i < 1981 && (i - 59) % 9 == 0);
-#pragma omp target teams map (tofrom: results)
+#pragma omp target teams map (always, tofrom: results)
 {
   J j (&a[52], &a[1972]);
   f5 (j);
 }
 check (i >= 52 && i < 1972 && (i - 52) % 4 == 0);
-#pragma omp target teams map (tofrom: results)
+#pragma omp target teams map (always, tofrom: results)
 {
   J j (&a[31], &a[1827]);
   f6 (j);

Jakub


Re: [Patch-86512]: Subnormal float support in armv7(with -msoft-float) for intrinsics

2018-07-26 Thread Nicolas Pitre
Umesh Kalappa wrote:

> Any more suggestions or comments on the patch?

The patch is suboptimal as it introduces 2 additional instructions in a 
fairly common path for a branch that is very unlikely to be taken in 
practice.

I'm therefore proposing this alternative patch to fix the issue in an 
optimal way. I'm also using this opportunity to update my email address 
as the one currently in those files has been obsolete for more than 10 
years at this point, in the hope that I get notified of similar issues 
directly in the future.

diff --git a/libgcc/ChangeLog b/libgcc/ChangeLog
index c13bf4cb2f6..c19d05c8a2e 100644
--- a/libgcc/ChangeLog
+++ b/libgcc/ChangeLog
@@ -1,3 +1,9 @@
+2018-07-26  Nicolas Pitre 
+
+   * config/arm/ieee754-df.S: Don't shortcut denormal handling when
+   exponent goes negative. Update my email address.
+   * config/arm/ieee754-sf.S: Likewise.
+
 2018-07-05  James Clarke  
 
* configure: Regenerated.
diff --git a/libgcc/config/arm/ieee754-df.S b/libgcc/config/arm/ieee754-df.S
index 8741aa99245..ee7a9835394 100644
--- a/libgcc/config/arm/ieee754-df.S
+++ b/libgcc/config/arm/ieee754-df.S
@@ -1,7 +1,7 @@
 /* ieee754-df.S double-precision floating point support for ARM
 
Copyright (C) 2003-2018 Free Software Foundation, Inc.
-   Contributed by Nicolas Pitre (n...@cam.org)
+   Contributed by Nicolas Pitre (n...@fluxnic.net)
 
This file is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the
@@ -238,9 +238,10 @@ LSYM(Lad_a):
movs ip, ip, lsl #1
adcs xl, xl, xl
adc xh, xh, xh
-   tst xh, #0x0010
-   sub r4, r4, #1
-   bne LSYM(Lad_e)
+   subs    r4, r4, #1
+   do_it   hs
+   tsths   xh, #0x0010
+   bhi LSYM(Lad_e)
 
@ No rounding necessary since ip will always be 0 at this point.
 LSYM(Lad_l):
diff --git a/libgcc/config/arm/ieee754-sf.S b/libgcc/config/arm/ieee754-sf.S
index d80d5e9080c..640c97ed550 100644
--- a/libgcc/config/arm/ieee754-sf.S
+++ b/libgcc/config/arm/ieee754-sf.S
@@ -1,7 +1,7 @@
 /* ieee754-sf.S single-precision floating point support for ARM
 
Copyright (C) 2003-2018 Free Software Foundation, Inc.
-   Contributed by Nicolas Pitre (n...@cam.org)
+   Contributed by Nicolas Pitre (n...@fluxnic.net)
 
This file is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the
@@ -168,9 +168,10 @@ LSYM(Lad_e):
 LSYM(Lad_a):
movs r1, r1, lsl #1
adc r0, r0, r0
-   tst r0, #0x0080
-   sub r2, r2, #1
-   bne LSYM(Lad_e)
+   subs    r2, r2, #1
+   do_it   hs
+   tsths   r0, #0x0080
+   bhi LSYM(Lad_e)

@ No rounding necessary since r1 will always be 0 at this point.
 LSYM(Lad_l):
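The intent of the change, as described in the ChangeLog, can be modelled in C. This is a hypothetical sketch of the branch condition only, not libgcc code: the function names are made up, the exponent is modelled as a signed integer, and IMPLICIT_BIT_HI is the implicit mantissa bit in the high word of a double.

```c
#include <stdint.h>

/* Implicit mantissa bit of a double within its high word (bit 20).  */
#define IMPLICIT_BIT_HI 0x00100000u

/* Old sequence (tst; sub; bne): take the fast path whenever the
   implicit bit is set, even if decrementing the exponent made it
   negative.  */
int old_takes_fast_path (int32_t exp_before, uint32_t xh)
{
  (void) exp_before;  /* the exponent was not consulted */
  return (xh & IMPLICIT_BIT_HI) != 0;
}

/* New sequence (subs; tsths; bhi), as intended by the ChangeLog: take
   the fast path only if the decremented exponent is still non-negative
   and the implicit bit is set; otherwise fall through to the
   denormal-aware path.  */
int new_takes_fast_path (int32_t exp_before, uint32_t xh)
{
  int32_t exp_after = exp_before - 1;
  return exp_after >= 0 && (xh & IMPLICIT_BIT_HI) != 0;
}
```

The two conditions differ only when the exponent underflows, which is exactly the denormal case the old sequence short-cut.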


Backports to 8.3

2018-07-26 Thread Jakub Jelinek
Hi!

I've backported 4 commits of mine to gcc-8-branch, after
bootstrapping/regtesting them on x86_64-linux and i686-linux.

Jakub
2018-07-26  Jakub Jelinek  

Backported from mainline
2018-07-10  Jakub Jelinek  

PR fortran/86421
* module.c (omp_declare_simd_clauses): Add LINEAR with _REF, _VAL and
_UVAL suffixes.
(mio_omp_declare_simd): Save and restore ref, val and uval modifiers
on linear clauses.  Initialize n->where to gfc_current_locus.

* gfortran.dg/vect/pr86421.f90: New test.

--- gcc/fortran/module.c(revision 262534)
+++ gcc/fortran/module.c(revision 262535)
@@ -4098,6 +4098,9 @@ static const mstring omp_declare_simd_cl
 minit ("UNIFORM", 3),
 minit ("LINEAR", 4),
 minit ("ALIGNED", 5),
+minit ("LINEAR_REF", 33),
+minit ("LINEAR_VAL", 34),
+minit ("LINEAR_UVAL", 35),
 minit (NULL, -1)
 };
 
@@ -4140,7 +4143,10 @@ mio_omp_declare_simd (gfc_namespace *ns,
}
  for (n = ods->clauses->lists[OMP_LIST_LINEAR]; n; n = n->next)
{
- mio_name (4, omp_declare_simd_clauses);
+ if (n->u.linear_op == OMP_LINEAR_DEFAULT)
+   mio_name (4, omp_declare_simd_clauses);
+ else
+   mio_name (32 + n->u.linear_op, omp_declare_simd_clauses);
  mio_symbol_ref (&n->sym);
  mio_expr (&n->expr);
}
@@ -4181,11 +4187,20 @@ mio_omp_declare_simd (gfc_namespace *ns,
case 4:
case 5:
  *ptrs[t - 3] = n = gfc_get_omp_namelist ();
+   finish_namelist:
+ n->where = gfc_current_locus;
  ptrs[t - 3] = &n->next;
  mio_symbol_ref (&n->sym);
  if (t != 3)
mio_expr (&n->expr);
  break;
+   case 33:
+   case 34:
+   case 35:
+ *ptrs[1] = n = gfc_get_omp_namelist ();
+ n->u.linear_op = (enum gfc_omp_linear_op) (t - 32);
+ t = 4;
+ goto finish_namelist;
}
}
 }
--- gcc/testsuite/gfortran.dg/vect/pr86421.f90  (nonexistent)
+++ gcc/testsuite/gfortran.dg/vect/pr86421.f90  (revision 262535)
@@ -0,0 +1,35 @@
+! PR fortran/86421
+! { dg-require-effective-target vect_simd_clones }
+! { dg-additional-options "-fopenmp-simd" }
+! { dg-additional-options "-mavx" { target avx_runtime } }
+
+module mod86421
+  implicit none
+contains
+  subroutine foo(x, y, z)
+real :: x
+integer :: y, z
+!$omp declare simd linear(ref(x)) linear(val(y)) linear(uval(z))
+x = x + y
+z = z + 1
+  end subroutine
+end module mod86421
+
+program pr86421
+  use mod86421
+  implicit none
+  integer :: i, j
+  real :: a(64)
+  j = 0
+  do i = 1, 64
+a(i) = i
+  end do
+  !$omp simd
+  do i = 1, 64
+call foo (a(i), i, j)
+  end do
+  do i = 1, 64
+if (a(i) .ne. (2 * i)) stop 1
+  end do
+  if (j .ne. 64) stop 2
+end program pr86421
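The module.c change encodes the linear modifier in the clause tag it writes: plain LINEAR keeps tag 4, while REF, VAL and UVAL are written as 32 plus the modifier (33 to 35) and decoded by subtracting 32. A hypothetical C mirror of that mapping, assuming the enumerator order DEFAULT, REF, VAL, UVAL implied by the minit table:

```c
/* Hypothetical mirror of enum gfc_omp_linear_op (order assumed).  */
enum linear_op { LINEAR_DEFAULT, LINEAR_REF, LINEAR_VAL, LINEAR_UVAL };

/* Tag written to the module file for a LINEAR clause.  */
int linear_tag (enum linear_op op)
{
  return op == LINEAR_DEFAULT ? 4 : 32 + (int) op;
}

/* Modifier recovered when reading the tag back.  */
enum linear_op linear_op_from_tag (int tag)
{
  return tag == 4 ? LINEAR_DEFAULT : (enum linear_op) (tag - 32);
}
```

Keeping tag 4 for the default case means modules without modifiers are written exactly as before.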
2018-07-26  Jakub Jelinek  

Backported from mainline
2018-07-17  Jakub Jelinek  

PR middle-end/86539
* gimplify.c (gimplify_omp_for): Ensure taskloop firstprivatized init
and cond temporaries don't have reference type if iterator has
pointer type.  For init use &for_pre_body instead of pre_p if
for_pre_body is non-empty.

* testsuite/libgomp.c++/pr86539.C: New test.

--- gcc/gimplify.c  (revision 262775)
+++ gcc/gimplify.c  (revision 262776)
@@ -9811,9 +9811,26 @@ gimplify_omp_for (tree *expr_p, gimple_s
  t = TREE_VEC_ELT (OMP_FOR_INIT (for_stmt), i);
  if (!is_gimple_constant (TREE_OPERAND (t, 1)))
{
+ tree type = TREE_TYPE (TREE_OPERAND (t, 0));
  TREE_OPERAND (t, 1)
= get_initialized_tmp_var (TREE_OPERAND (t, 1),
-  pre_p, NULL, false);
+  gimple_seq_empty_p (for_pre_body)
+  ? pre_p : &for_pre_body, NULL,
+  false);
+ /* Reference to pointer conversion is considered useless,
+but is significant for firstprivate clause.  Force it
+here.  */
+ if (TREE_CODE (type) == POINTER_TYPE
+ && (TREE_CODE (TREE_TYPE (TREE_OPERAND (t, 1)))
+ == REFERENCE_TYPE))
+   {
+ tree v = create_tmp_var (TYPE_MAIN_VARIANT (type));
+ tree m = build2 (INIT_EXPR, TREE_TYPE (v), v,
+  TREE_OPERAND (t, 1));
+ gimplify_and_add (m, gimple_seq_empty_p (for_pre_body)
+  ? pre_p : &for_pre_body);
+ TREE_OPERAND (t, 1) = v;
+   }
  tree c = build_omp_clause (input_location,
 OMP_CLAUSE_FIRSTPRIVATE);
   

Re: [PATCH] enhance strlen to understand MEM_REF and partial overlaps (PR 86042, 86043)

2018-07-26 Thread Martin Sebor

On 06/29/2018 11:05 AM, Jeff Law wrote:

On 06/07/2018 09:57 AM, Martin Sebor wrote:

The attached patch enhances the strlen pass to more consistently
deal with MEM_REF assignments (PR 86042) and to track string
lengths across calls to memcpy that overwrite parts of a string
with sequences of non-nul characters (PR 86043).

Fixes for both bugs rely on changes to the same code so I chose
to include them in the same patch.

To fix PR 86042 the patch extends handle_char_store() to deal with
more forms of multi-character assignments from MEM_REF (originally
introduced in r256180).  To handle assignments from strings of
multiple nuls the patch also extends the initializer_zerop()
function to understand MEM_REFs of the form:

   MEM[(char * {ref-all})&a] = MEM[(char * {ref-all})"..."];

The solution for PR 86043 consists of two parts: the extension
above which lets handle_char_store() recognize assignments of
sequences of non-null characters that overwrite some portion of
the leading non-zero characters in the destination and avoid
discarding the destination information, and a similar extension
to handle_builtin_memcpy().
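For reference, the shape of code PR 86043 concerns is roughly the following (an illustrative sketch, not the strlenopt-44.c test from the patch; the function name is made up): memcpy overwrites part of the leading non-nul characters with other non-nul characters, so the string length is unchanged and the strlen pass can in principle keep tracking it.

```c
#include <string.h>

/* memcpy replaces two of the leading non-nul characters with other
   non-nul characters, so the length of the string does not change.  */
size_t overwrite_prefix (void)
{
  char a[8];
  strcpy (a, "123456");
  memcpy (a, "ab", 2);   /* a is now "ab3456" */
  return strlen (a);     /* still 6 */
}
```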

Martin

gcc-86042.diff


PR tree-optimization/86042 - missing strlen optimization after second strcpy

gcc/ChangeLog:

PR tree-optimization/86042
* tree-ssa-strlen.c (handle_builtin_memcpy): Handle strict overlaps.
(get_string_cst_length): Rename...
(get_min_string_length): ...to this.  Add argument.
(handle_char_store): Extend to handle multi-character stores by
MEM_REF.
* tree.c (initializer_zerop): Use new argument.  Handle MEM_REF.
* tree.h (initializer_zerop): Add argument.

gcc/testsuite/ChangeLog:

PR tree-optimization/86042
* gcc.dg/strlenopt-44.c: New test.

OK.


I missed your approval and didn't get to committing the patch
until today.  While retesting it on top of fresh trunk I noticed
a few test failures due to other recent strlen changes.  I adjusted
the patch to avoid most of them, opened bug 86688 for one that
I think needs a separate code change, and xfailed the affected
test cases until that bug is resolved.

Martin



Re: [GCC][PATCH][Aarch64] Stop redundant zero-extension after UMOV when in DI mode

2018-07-26 Thread Sam Tebbs



On 07/25/2018 07:08 PM, Sudakshina Das wrote:

Hi Sam

On 25/07/18 14:08, Sam Tebbs wrote:

On 07/23/2018 05:01 PM, Sudakshina Das wrote:

Hi Sam


On Monday 23 July 2018 11:39 AM, Sam Tebbs wrote:

Hi all,

This patch extends the aarch64_get_lane_zero_extendsi instruction definition to
also cover DI mode. This prevents a redundant AND instruction from being
generated due to the pattern failing to be matched.

Example:

typedef char v16qi __attribute__ ((vector_size (16)));

unsigned long long
foo (v16qi a)
{
  return a[0];
}

Previously generated:

foo:
    umov    w0, v0.b[0]
    and x0, x0, 255
    ret

And now generates:

foo:
    umov    w0, v0.b[0]
    ret

Bootstrapped on aarch64-none-linux-gnu and tested on aarch64-none-elf with no
regressions.

gcc/
2018-07-23  Sam Tebbs 

    * config/aarch64/aarch64-simd.md
    (*aarch64_get_lane_zero_extendsi):
    Rename to...
(*aarch64_get_lane_zero_extend): ... This.
    Use GPI iterator instead of SI mode.

gcc/testsuite
2018-07-23  Sam Tebbs 

    * gcc.target/aarch64/extract_zero_extend.c: New file

You will need an approval from a maintainer, but I would only add 
one request to this:


diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md

index 89e38e6..15fb661 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3032,15 +3032,16 @@
   [(set_attr "type" "neon_to_gp")]
 )

-(define_insn "*aarch64_get_lane_zero_extendsi"
-  [(set (match_operand:SI 0 "register_operand" "=r")
-    (zero_extend:SI
+(define_insn "*aarch64_get_lane_zero_extend"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+    (zero_extend:GPI

Since you are adding 4 new patterns with this change, could you add
more cases in your test as well to make sure you have coverage for 
each of them.


Thanks
Sudi


Hi Sudi,

Thanks for the feedback. Here is an updated patch that adds more 
testcases to cover the patterns generated by the different mode 
combinations. The changelog and description from my original email 
still apply.




Thanks for making the changes and adding more test cases. I do however
see that you are only covering 2 out of the 4 new
*aarch64_get_lane_zero_extenddi<> patterns. The
*aarch64_get_lane_zero_extendsi<> patterns already existed; I don't mind
those tests. I would just ask you to add the other two new patterns
as well. Also, since the different versions of the pattern generate
the same instruction (foo_16qi and foo_8qi, for example, both produce
the same instruction), I would suggest adding -fdump-rtl-final (or any
relevant RTL dump) to the dg-options and using scan-rtl-dump to scan
for the pattern name. Something like:
/* { dg-do compile } */
/* { dg-options "-O3 -fdump-rtl-final" } */
...
...
/* { dg-final { scan-rtl-dump "aarch64_get_lane_zero_extenddiv16qi" "final" } } */
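A test along the lines Sudi suggests might cover all eight mode combinations (GPI = SI, DI times VDQQH = V8QI, V16QI, V4HI, V8HI). This is a hypothetical sketch, not the committed extract_zero_extend.c: the function names are made up, unsigned element types are chosen so that every lane read is a zero extension (on AArch64 plain `short` is signed), and the pattern names are assumed to follow the "aarch64_get_lane_zero_extenddiv16qi" form quoted above.

```c
/* { dg-do compile } */
/* { dg-options "-O3 -fdump-rtl-final" } */

typedef unsigned char v8qi __attribute__ ((vector_size (8)));
typedef unsigned char v16qi __attribute__ ((vector_size (16)));
typedef unsigned short v4hi __attribute__ ((vector_size (8)));
typedef unsigned short v8hi __attribute__ ((vector_size (16)));

/* SImode results: handled by the pre-existing SI patterns.  */
unsigned int foo_si_v8qi (v8qi a) { return a[0]; }
unsigned int foo_si_v16qi (v16qi a) { return a[0]; }
unsigned int foo_si_v4hi (v4hi a) { return a[0]; }
unsigned int foo_si_v8hi (v8hi a) { return a[0]; }

/* DImode results: the four combinations added by the GPI iterator.  */
unsigned long long foo_di_v8qi (v8qi a) { return a[0]; }
unsigned long long foo_di_v16qi (v16qi a) { return a[0]; }
unsigned long long foo_di_v4hi (v4hi a) { return a[0]; }
unsigned long long foo_di_v8hi (v8hi a) { return a[0]; }

/* Assumed pattern names; scanned in the final RTL dump.  */
/* { dg-final { scan-rtl-dump "aarch64_get_lane_zero_extendsiv8qi" "final" } } */
/* { dg-final { scan-rtl-dump "aarch64_get_lane_zero_extendsiv16qi" "final" } } */
/* { dg-final { scan-rtl-dump "aarch64_get_lane_zero_extendsiv4hi" "final" } } */
/* { dg-final { scan-rtl-dump "aarch64_get_lane_zero_extendsiv8hi" "final" } } */
/* { dg-final { scan-rtl-dump "aarch64_get_lane_zero_extenddiv8qi" "final" } } */
/* { dg-final { scan-rtl-dump "aarch64_get_lane_zero_extenddiv16qi" "final" } } */
/* { dg-final { scan-rtl-dump "aarch64_get_lane_zero_extenddiv4hi" "final" } } */
/* { dg-final { scan-rtl-dump "aarch64_get_lane_zero_extenddiv8hi" "final" } } */
```

Scanning the RTL dump rather than the assembly distinguishes cases such as foo_di_v8qi and foo_di_v16qi, which would emit identical UMOV instructions.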


Thanks
Sudi


Hi Sudi,

Thanks again. Here's an update that adds 4 more tests, so all 8 patterns
generated are now tested for!

Below is the updated changelog

gcc/
2018-07-26  Sam Tebbs  

    * config/aarch64/aarch64-simd.md
    (*aarch64_get_lane_zero_extendsi): Rename to...
    (*aarch64_get_lane_zero_extend): ... This.
    Use GPI iterator instead of SI mode.

gcc/testsuite
2018-07-26  Sam Tebbs  

    * gcc.target/aarch64/extract_zero_extend.c: New file





   (vec_select:
     (match_operand:VDQQH 1 "register_operand" "w")
     (parallel [(match_operand:SI 2 "immediate_operand" "i")]]
   "TARGET_SIMD"
   {
-    operands[2] = aarch64_endian_lane_rtx (mode, INTVAL (operands[2]));
+    operands[2] = aarch64_endian_lane_rtx (mode,
+                       INTVAL (operands[2]));
 return "umov\\t%w0, %1.[%2]";
   }
   [(set_attr "type" "neon_to_gp")]
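The `aarch64_endian_lane_rtx` call in the pattern above rewrites operand 2 from GCC's element index to the architectural lane that UMOV prints. The sketch below models that mapping, assuming it follows the ENDIAN_LANE_N macro in aarch64.h (element N maps to lane NUNITS - 1 - N on big-endian, and is unchanged on little-endian); `endian_lane` is a hypothetical name.

```c
#include <stdbool.h>

/* Hypothetical model of the renumbering assumed to be done by
   aarch64_endian_lane_rtx via ENDIAN_LANE_N: GCC's vector element N
   is architectural lane NUNITS - 1 - N on big-endian targets.  */
static unsigned
endian_lane (unsigned nunits, unsigned n, bool big_endian)
{
  return big_endian ? nunits - 1 - n : n;
}
```

Under this model, `return a[0];` on a 16-byte vector would print `v0.b[0]` on aarch64 but `v0.b[15]` on aarch64_be, which is why the pattern adjusts the operand before emitting the instruction.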






diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index f1784d72e55c412d076de43f2f7aad4632d55ecb..e92a3b49c65e84d2a16a2a480c359a0b4d8fa3e3 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3033,15 +3033,16 @@
   [(set_attr "type" "neon_to_gp")]
 )
 
-(define_insn "*aarch64_get_lane_zero_extendsi"
-  [(set (match_operand:SI 0 "register_operand" "=r")
-	(zero_extend:SI
+(define_insn "*aarch64_get_lane_zero_extend"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+	(zero_extend:GPI
 	  (vec_select:
 	(match_operand:VDQQH 1 "register_operand" "w")
 	(parallel [(match_operand:SI 2 "immediate_operand" "i")]]
   "TARGET_SIMD"
   {
-operands[2] = aarch64_endian_lane_rtx (mode, INTVAL (operands[2]));
+operands[2] = aarch64_endian_lane_rtx (mode,
+	   INTVAL (operands[2]));
 return "umov\\t%w0, %1.[%2]";
   }
   [(set_attr "type" "neon_to_gp")]
diff --git a/gcc/testsuite/gcc.target/aarch64/extract_zero_extend.c b/gcc/testsuite/gcc.target/aarch64/extract_zero_extend.c
new file mode 100644
index ..a294b261909a1d67ab339c929f2609dcda01c067
--- /dev/nul

Re: [PATCH] haiku: Initial build support

2018-07-26 Thread Joseph Myers
On Mon, 16 Jul 2018, Alexander von Gluck IV wrote:

> * We have been dragging these around since gcc 4.x.
> * Some tweaks will likely be needed, but this gets our foot
>   in the door.
> 
> Authors:
>   Fredrik Holmqvist
>   Jerome Duval
>   Augustin Cavalier
>   François Revol
>   Simon South
>   Jessica Hamilton
>   Ithamar R. Adema
>   Oliver Tappe
>   Jonathan Schleifer
>   .. and maybe more!

Before this can be reviewed, we'll need copyright assignments (with 
employer disclaimers where applicable) on file at the FSF from everyone 
who contributed a legally significant amount of code (more than around 15 
lines).  Without those, reviewers can't safely look at the changes in 
detail.

https://gcc.gnu.org/contribute.html

https://git.savannah.gnu.org/cgit/gnulib.git/plain/doc/Copyright/request-assign.future

Then, please make sure that only substantive changes are included - that 
there are no diff lines that are purely changing trailing whitespace in 
existing code, for example.  Please ensure that all copyright and license 
notices follow current standards (which means using ranges of years ending 
in 2018, GPLv3 notices and a URL not an FSF postal address).  For changes 
to existing code, especially, please make sure to include sufficient 
rationale in the patch submission to explain those changes, why they are 
needed and the approach taken to them.

For new target OS support, I'd expect details to be provided of the test 
results on that OS for the various architectures supported by GCC.  Are 
you planning, if the support is accepted in GCC, to maintain a bot that 
keeps running the GCC testsuite for GCC mainline for this OS for the 
various target architectures supported, at least daily or thereabouts, and 
posts the results to the gcc-testresults list, and to keep monitoring the 
test results and fixing OS-specific issues that show up?  It's much better 
for issues to be identified within a day or two of the commit causing them 
than many months later, possibly only after a release has come out with 
the issue - but that requires an ongoing commitment to keep monitoring 
test results, posting them to gcc-testresults and keeping them in good 
shape.

> diff --git a/libtool.m4 b/libtool.m4

Is this an exact backport of a change from upstream libtool git?  If so, 
please give the commit reference.  If not, give the URL of the submission 
to upstream libtool.  We don't want local libtool changes that aren't 
backports or at least proposed upstream without objections, to avoid 
making future updates from upstream libtool harder.

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCH] Make function clone name numbering independent.

2018-07-26 Thread Michael Ploujnikov
On 2018-07-24 09:57 AM, Michael Ploujnikov wrote:
> On 2018-07-20 06:05 AM, Richard Biener wrote:
>>>  /* Return a new assembler name for a clone with SUFFIX of a decl named
>>> NAME.  */
>>> @@ -521,14 +521,13 @@ tree
>>>  clone_function_name_1 (const char *name, const char *suffix)
>>
>> pass this function the counter to use
>>
>>>  {
>>>size_t len = strlen (name);
>>> -  char *tmp_name, *prefix;
>>> +  char *prefix;
>>>
>>>prefix = XALLOCAVEC (char, len + strlen (suffix) + 2);
>>>memcpy (prefix, name, len);
>>>strcpy (prefix + len + 1, suffix);
>>>prefix[len] = symbol_table::symbol_suffix_separator ();
>>> -  ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, clone_fn_id_num++);
>>
>> and keep using ASM_FORMAT_PRIVATE_NAME here.  You need to change
>> the lto/lto-partition.c caller (just use zero as counter).
>>
>>> -  return get_identifier (tmp_name);
>>> +  return get_identifier (prefix);
>>>  }
>>>
>>>  /* Return a new assembler name for a clone of DECL with SUFFIX.  */
>>> @@ -537,7 +536,17 @@ tree
>>>  clone_function_name (tree decl, const char *suffix)
>>>  {
>>>tree name = DECL_ASSEMBLER_NAME (decl);
>>> -  return clone_function_name_1 (IDENTIFIER_POINTER (name), suffix);
>>> +  const char *decl_name = IDENTIFIER_POINTER (name);
>>> +  char *numbered_name;
>>> +  unsigned int *suffix_counter;
>>> +  if (!clone_fn_ids) {
>>> +/* Initialize the per-function counter hash table if this is the first call */
>>> +clone_fn_ids = hash_map::create_ggc (64);
>>> +  }
>>
>> I still do not like throwing memory at the problem in this way for the
>> little benefit this change provides :/
>>
>> So no approval from me at this point...
>>
>> Richard.
> 
> Can you give me an idea of the memory constraints that are involved?
> 
> The highest memory usage increase that I could find was when compiling
> a source file (from Linux) with roughly 10,000 functions. It showed a 2kB
> increase over the pre-patch usage of 6936kB, which is barely 0.03%.
> 
> Using a single counter can result in more confusing namespacing when
> you have .bar.clone.4 despite there only being 3 clones of .bar.
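The numbering difference described above can be illustrated with a small C sketch. It is not the patch itself: the names are hypothetical, a linear table stands in for the hash_map the patch uses, and it assumes GCC's usual "." separator and zero-based counter, giving names of the form NAME.SUFFIX.N.

```c
#include <stdio.h>
#include <string.h>

#define MAX_NAMES 64

/* Hypothetical stand-in for the per-function counter map.  */
struct clone_counter
{
  char name[64];
  unsigned next;
};

static struct clone_counter counters[MAX_NAMES];
static unsigned n_counters;
static unsigned global_counter;  /* models one counter shared by all functions */

/* Return the counter for NAME, creating it on first use (NULL if full).  */
static unsigned *
counter_for (const char *name)
{
  for (unsigned i = 0; i < n_counters; i++)
    if (strcmp (counters[i].name, name) == 0)
      return &counters[i].next;
  if (n_counters == MAX_NAMES)
    return NULL;
  snprintf (counters[n_counters].name, sizeof counters[n_counters].name,
            "%s", name);
  counters[n_counters].next = 0;
  return &counters[n_counters++].next;
}

/* Write NAME.SUFFIX.N into BUF, taking N from a per-name counter when
   PER_NAME is nonzero, otherwise from the shared global counter.  */
static int
clone_name (char *buf, size_t size, const char *name, const char *suffix,
            int per_name)
{
  unsigned *ctr = per_name ? counter_for (name) : &global_counter;
  if (!ctr)
    return -1;
  return snprintf (buf, size, "%s.%s.%u", name, suffix, (*ctr)++);
}
```

With the shared counter, cloning foo twice and then bar three times yields bar.clone.4 for the third bar clone; with per-name counters it is bar.clone.2, and adding or removing clones of foo no longer renumbers bar's clones.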
> 
> From a practical point of view, this change is helpful to anyone
> diffing binary output, such as forensic analysts, Debian Reproducible
> Builds, or even someone validating compiler output (before and after an
> input source patch). The spurious differences that this patch eliminates
> are a distraction and could even be misleading. For example, applying a
> source patch to the same Linux source produces the following binary
> diff before my change:
> 
> --- /tmp/output.o.objdump
> +++ /tmp/patched-output.o.objdump
> @@ -1,5 +1,5 @@
> 
> -/tmp/uverbs_cmd/output.o: file format elf32-i386
> +/tmp/uverbs_cmd/patched-output.o: file format elf32-i386
> 
> 
>  Disassembly of section .text.get_order:
> @@ -265,12 +265,12 @@
>     3:   e9 fc ff ff ff          jmp    4
>                4: R_386_PC32    .text.put_uobj_read
> 
> -Disassembly of section .text.trace_kmalloc.constprop.3:
> +Disassembly of section .text.trace_kmalloc.constprop.4:
> 
> - :
> + :
>     0:   83 3d 04 00 00 00 00    cmpl   $0x0,0x4
>                2: R_386_32      __tracepoint_kmalloc
> -   7:   74 34                   je     3d
> +   7:   74 34                   je     3d
>     9:   55                      push   %ebp
>     a:   89 cd                   mov    %ecx,%ebp
>     c:   57                      push   %edi
> @@ -281,7 +281,7 @@
>    13:   8b 1d 10 00 00 00       mov    0x10,%ebx
>               15: R_386_32      __tracepoint_kmalloc
>    19:   85 db                   test   %ebx,%ebx
> -  1b:   74 1b                   je     38
> +  1b:   74 1b                   je     38
>    1d:   68 d0 00 00 00          push   $0xd0
>    22:   89 fa                   mov    %edi,%edx
>    24:   89 f0                   mov    %esi,%eax
> @@ -292,7 +292,7 @@
>    31:   58                      pop    %eax
>    32:   83 3b 00                cmpl   $0x0,(%ebx)
>    35:   5a                      pop    %edx
> -  36:   eb e3                   jmp    1b
> +  36:   eb e3                   jmp    1b
>    38:   5b                      pop    %ebx
>    39:   5e                      pop    %esi
>    3a:   5f                      pop    %edi
> @@ -846,7 +846,7 @@
>    78:   b8 5f 00 00 00          mov    $0x5f,%eax
>               79: R_386_32      .text.ib_uverbs_alloc_pd
>    7d:   e8 fc ff ff ff          call   7e
> -             7e: R_386_PC32    .text.trace_kmalloc.constprop.3
> +             7e: R_386_PC32    .text.trace_kmalloc.constprop.4
>    82:   c7 45 d4 f4 ff ff ff    movl   $0xfff4,-0x2c(%ebp)
>    89:   59                      pop    %ecx
>    8a:   85 db                   test   %ebx,%ebx
> @@ -1068,7 +1068,7 @@
>9

New template for 'gcc' made available

2018-07-26 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.  (If you have
any questions, send them to .)

A new POT file for textual domain 'gcc' has been made available
to the language teams for translation.  It is archived as:

http://translationproject.org/POT-files/gcc-8.2.0.pot

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

Below is the URL which has been provided to the translators of your
package.  Please inform the translation coordinator, at the address
at the bottom, if this information is not current:

https://ftp.gnu.org/gnu/gcc/gcc-8.2.0/gcc-8.2.0.tar.xz

Translated PO files will later be automatically e-mailed to you.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.



