date:20150929

Re: [PATCH] liboffloadmic emulation mode: make it asynchronous

2015-09-29 Thread Jakub Jelinek

On Mon, Sep 28, 2015 at 05:53:42PM +0300, Ilya Verbin wrote:
> Currently the COI emulator is single-threaded, i.e. it is able to run only one
> target function at a time, e.g. the following testcase:
> 
>   #pragma omp parallel sections num_threads(2)
>   {
>     #pragma omp section
>     #pragma omp target
>     while (1)
>       putchar ('.');
> 
>     #pragma omp section
>     #pragma omp target
>     while (1)
>       putchar ('o');
>   }
> 
> prints only dots using emul, while using real libcoi it prints:
> ...o.o.o.o...o...o.oo.o.o.ooo.oo...o.o.o...o.ooo
> Of course, it's not possible to test new OpenMP 4.1's async features using such
> an emulator.
> 
> The patch below makes it asynchronous: it creates an auxiliary thread for each
> COIPipeline in host and in target processes.  In general, a new COIPipeline is
> created by liboffloadmic for each host thread with offload, i.e. the example
> above has:
> 4 threads in the host process (2 OpenMP threads + 2 auxiliary threads) and
> 3 threads in the target process (1 main thread + 2 auxiliary threads).
> An auxiliary host thread runs a target function in the new thread in target
> process and waits for its completion.  When the function is finished, the host
> thread signals an event and can run a callback, if it is registered.
> liboffloadmic waits for signalled events by calling COIEventWait.
> This is identical to how real libcoi works.
> 
> make check-target-libgomp and some internal tests did not show any regression.
> TSan report is clean.  Is it OK for trunk?

For now ok.  Though, I'd say I'd prefer if there were no auxiliary threads
on the host side, just whatever thread is asked to send something to/from
the device, wait for something and/or poll for something just polling the
pipes.  Are there auxiliary host threads also for the case when using
the real COI, offloading to hw?
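
For readers following the thread, the scheme described in the quoted patch
(one auxiliary thread per pipeline that runs the function and then signals a
completion event the caller waits on) might look roughly like the sketch
below.  It uses plain POSIX threads; struct event, struct pipeline_job,
pipeline_run_async and every other name here are hypothetical and are not
taken from the emulator sources or from the COI API.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a completion event: a flag protected by a
   mutex, with a condition variable for waiters.  */
struct event
{
  pthread_mutex_t lock;
  pthread_cond_t cond;
  int signalled;
};

/* Hypothetical unit of work handed to the auxiliary thread.  */
struct pipeline_job
{
  void (*fn) (void *);		/* Function to run asynchronously.  */
  void *arg;			/* Its argument.  */
  struct event *done;		/* Signalled once FN has returned.  */
};

void
event_init (struct event *ev)
{
  pthread_mutex_init (&ev->lock, NULL);
  pthread_cond_init (&ev->cond, NULL);
  ev->signalled = 0;
}

void
event_signal (struct event *ev)
{
  pthread_mutex_lock (&ev->lock);
  ev->signalled = 1;
  pthread_cond_broadcast (&ev->cond);
  pthread_mutex_unlock (&ev->lock);
}

/* Block until EV has been signalled.  */
void
event_wait (struct event *ev)
{
  pthread_mutex_lock (&ev->lock);
  while (!ev->signalled)
    pthread_cond_wait (&ev->cond, &ev->lock);
  pthread_mutex_unlock (&ev->lock);
}

/* Body of the auxiliary thread: run the job, then signal completion.  */
void *
pipeline_thread (void *data)
{
  struct pipeline_job *job = (struct pipeline_job *) data;
  job->fn (job->arg);
  event_signal (job->done);
  return NULL;
}

/* Start JOB on its own thread; returns 0 on success.  */
int
pipeline_run_async (struct pipeline_job *job, pthread_t *thread)
{
  assert (job != NULL && job->fn != NULL && job->done != NULL);
  return pthread_create (thread, NULL, pipeline_thread, job);
}

/* Trivial example function to run on the pipeline.  */
void
increment (void *arg)
{
  ++*(int *) arg;
}
```

The caller remains free to do other work between pipeline_run_async and
event_wait.  The alternative raised in the reply above would drop the
auxiliary host thread and have the waiting thread poll the pipes itself.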
> 
> 
> liboffloadmic/
>   * plugin/libgomp-plugin-intelmic.cpp (OFFLOAD_ACTIVE_WAIT_ENV): New
>   define.
>   (init): Set OFFLOAD_ACTIVE_WAIT env var to 0, if it is not set.
>   * runtime/emulator/coi_common.h (PIPE_HOST_PATH): Replace with ...
>   (PIPE_HOST2TGT_NAME): ... this.
>   (PIPE_TARGET_PATH): Replace with ...
>   (PIPE_TGT2HOST_NAME): ... this.
>   (MALLOCN): New define.
>   (READN): Likewise.
>   (WRITEN): Likewise.
>   (enum cmd_t): Replace CMD_RUN_FUNCTION with CMD_PIPELINE_RUN_FUNCTION.
>   Add CMD_PIPELINE_CREATE, CMD_PIPELINE_DESTROY.
>   * runtime/emulator/coi_device.cpp (engine_dir): New static variable.
>   (pipeline_thread_routine): New static function.
>   (COIProcessWaitForShutdown): Use global engine_dir instead of mic_dir.
>   Rename pipe_host and pipe_target to pipe_host2tgt and pipe_tgt2host.
>   If cmd is CMD_PIPELINE_CREATE, create a new thread for the pipeline.
>   Remove cmd == CMD_RUN_FUNCTION case.
>   * runtime/emulator/coi_device.h (COIERRORN): New define.
>   * runtime/emulator/coi_host.cpp: Include set, map, queue.
>   Replace typedefs with enums and structs.
>   (struct Function): Remove name, add num_buffers, bufs_size,
>   bufs_data_target, misc_data_len, misc_data, return_value_len,
>   return_value, completion_event.
>   (struct Callback): New.
>   (struct Process): Remove pipeline.  Add pipe_host2tgt and pipe_tgt2host.
>   (struct Pipeline): Remove pipe_host and pipe_target.  Add thread,
>   destroy, is_destroyed, pipe_host2tgt_path, pipe_tgt2host_path,
>   pipe_host2tgt, pipe_tgt2host, queue, process.
>   (max_pipeline_num): New static variable.
>   (pipelines): Likewise.
>   (max_event_num): Likewise.
>   (non_signalled_events): Likewise.
>   (errored_events): Likewise.
>   (callbacks): Likewise.
>   (cleanup): Do not check tmp_dirs before free.
>   (start_critical_section): New static function.
>   (finish_critical_section): Likewise.
>   (pipeline_is_destroyed): Likewise.
>   (maybe_invoke_callback): Likewise.
>   (signal_event): Likewise.
>   (get_event_result): Likewise.
>   (COIBufferCopy): Rename arguments according to headers.  Add asserts.
>   Use process' main pipes, instead of pipeline's pipes.  Signal completion
>   event.
>   (COIBufferCreate): Rename arguments according to headers.  Add asserts.
>   Use process' main pipes, instead of pipeline's pipes.
>   (COIBufferCreateFromMemory): Rename arguments according to headers.
>   Add asserts.
>   (COIBufferDestroy): Rename arguments according to headers.  Add asserts.
>   Use process' main pipes, instead of pipeline's pipes.
>   (COIBufferGetSinkAddress): Rename arguments according to headers.
>   Add asserts.
>   (COIBufferMap): Rename arguments according to headers.  Add asserts.
>   Signal completion event.
>   (COIBufferRead): Likewise.
>   (COIBufferSetState): Likewise.
>   (COIBuf

Re: [PATCH][AArch64] Add separate insn sched class for vector LDP & STP

2015-09-29 Thread Marcus Shawcroft


On 29/09/15 00:52, Evandro Menezes wrote:

In some micro-architectures the insns to load or store pairs of vector
registers are implemented rather differently from those affecting lanes
in vector registers.  Then, it's important that such insns be described
likewise differently in the scheduling model.

This patch adds the insn types neon_ldp{,_q} and neon_stp{,_q} apart
from the current neon_load2_2reg_q and neon_store2_2reg_q types,
respectively.



Hi,

The AArch64 part of this is OK. Please wait for Kyrill or Ramana to 
comment on ARM side.  Cheers /Marcus



Thank you,

-- Evandro Menezes


0001-AArch64-Add-separate-insn-sched-class-for-vector-LDP.patch


 From 340249dcd2af8dfce486cb4f62d4eaf285c6a799 Mon Sep 17 00:00:00 2001
From: Evandro Menezes
Date: Mon, 28 Sep 2015 15:00:00 -0500
Subject: [PATCH] [AArch64] Add separate insn sched class for vector LDP & STP

2015-09-28  Evandro Menezes

gcc/
* config/arm/types.md (neon_ldp, neon_ldp_q, neon_stp, neon_stp_q):
add new insn types for vector load and store pairs.


s/add/Add/ and likewise the rest of the changelog comments.


* config/arm/cortex-a53.md (cortex_a53_f_load_2reg): add insn
types "neon_ldp{,_q}".
* config/arm/cortex-a57.md (neon_load_c): add insn types
"neon_ldp{,_q}".
(neon_store_complex): add insn types "neon_stp{,_q}".
* config/aarch64/aarch64-simd.md (aarch64_be_movoi): add insn types
"neon_{ldp,stp}_q".

Re: Ping^2 Re: Pass -foffload targets from driver to libgomp at link time

2015-09-29 Thread Jakub Jelinek

On Mon, Sep 28, 2015 at 11:39:10AM +0200, Thomas Schwinge wrote:
> Hi!
> 
> On Fri, 11 Sep 2015 17:43:49 +0200, Jakub Jelinek  wrote:
> > So, do I understand well that you'll call GOMP_set_offload_targets from
> > construct[ors] of all shared libraries (and the binary) that contain offloaded
> > code?  If yes, that is surely going to fail the assertions in there.
> 
> Indeed.  My original plan has been to generate/invoke this constructor
> only for/from the final executable and not for any shared libraries, but
> it seems I didn't implement this correctly.

How would you mean to implement it?  -fopenmp or -fopenacc code with
offloading bits might not be in the final executable at all, nor in shared
libraries it is linked against; such libraries could be only dlopened,
consider say python plugin.  And this is not just made up, perhaps not with
offloading yet, but people regularly use OpenMP code in plugins and then we
get complaints that fork child of the main program is not allowed to do
anything but async-signal-safe functions.
> 
> > You can dlopen such libraries etc.  What if you link one library with
> > -fopenmp=nvptx-none and another one with -fopenmp=x86_64-intelmicemul-linux?
> 
> So, the first question to answer is: what do we expect to happen in this
> case, or similarly, if the executable and any shared libraries are
> compiled with different/incompatible -foffload options?

As the device numbers are per-process, the only possibility I see is that
all the physically available devices are always available, and just if you
try to offload from some code to a device that doesn't support it, you get
host fallback.  Because, one shared library could carefully use device(xyz)
to offload to say XeonPhi it is compiled for and supports, and another
library device(abc) to offload to PTX it is compiled for and supports.

> For this, I propose that the only mode of operation that we currently can
> support is that all of the executable and any shared libraries agree on
> the offload targets specified by -foffload, and I thus propose the
> following patch on top of what Joseph has posted before (passes the
> testsuite, but not yet tested otherwise):

See above, no.

Jakub

Re: [PATCH][PR67666] Handle single restrict pointer in struct in create_variable_info_for_1

2015-09-29 Thread Richard Biener

On Tue, 29 Sep 2015, Tom de Vries wrote:

> On 22/09/15 09:49, Richard Biener wrote:
> > On Tue, 22 Sep 2015, Tom de Vries wrote:
> > 
> > > Hi,
> > > 
> > > Consider this test-case:
> > > 
> > > struct ps
> > > {
> > > >   int *__restrict__ p;
> > > };
> > > 
> > > void
> > > f (struct ps &__restrict__ ps1)
> > > {
> > > >   *(ps1.p) = 1;
> > > }
> > > 
> > > 
> > > Atm, the restrict on p has no effect. Now, say we add a field to the
> > > struct:
> > > 
> > > struct ps
> > > {
> > > >   int *__restrict__ p;
> > > >   int a;
> > > };
> > > 
> > > 
> > > Then the restrict on p does have the desired effect.
> > > 
> > > 
> > > This patch fixes the handling of structs with a single field in alias
> > > analysis.
> > > 
> > > Bootstrapped and reg-tested on x86_64.
> > > 
> > > OK for trunk?
> > 
> > Ok.
> > 
> 
> Hi,
> 
> I wonder if this follow-up patch is necessary.
> 
> Now that we handle structs with one field in the final loop of
> create_variable_info_for_1, should we set the is_full_var field as well? It
> used to be set for such structs before I committed the "Handle single restrict
> pointer in struct in create_variable_info_for_1" patch.

Yeah, I suppose so.  But I'd set vi->is_full_var to true when
allocating 'vi':

  vi = new_var_info (decl, name);
  vi->fullsize = tree_to_uhwi (declsize);
 +  if (fieldstack.length () == 1) 
 +   vi->is_full_var = true;


Ok with that change.

Thanks,
Richard.
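
As background for the test case quoted below, here is a minimal sketch of
what the restrict qualifier on the single field lets a compiler assume.  It
is written in C with the same __restrict__ spelling (a GCC/Clang extension);
the function store_and_reload and its second parameter are hypothetical and
do not appear in the patch or in the test suite.

```c
#include <assert.h>

/* A struct whose only field is a restrict-qualified pointer.  */
struct ps
{
  int *__restrict__ p;
};

/* Because ps1->p is restrict-qualified, a compiler may assume that the
   store through Q does not modify *ps1->p, and so may return the
   constant 1 without reloading.  Callers must not pass aliasing
   pointers.  */
int
store_and_reload (struct ps *__restrict__ ps1, int *q)
{
  assert (ps1 != 0 && ps1->p != 0 && q != 0);
  *ps1->p = 1;
  *q = 2;
  return *ps1->p;
}
```

With non-aliasing arguments the function returns 1 whether or not the
optimization is applied; the restrict information only affects whether the
final load can be removed.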

> Thanks,
> - Tom
> 
> diff --git a/gcc/tree-ssa-structalias.c b/gcc/tree-ssa-structalias.c
> index 8d86dcb..26d97a3 100644
> --- a/gcc/tree-ssa-structalias.c
> +++ b/gcc/tree-ssa-structalias.c
> @@ -5720,6 +5720,8 @@ create_variable_info_for_1 (tree decl, const char *name)
>newvi->offset = fo->offset;
>newvi->size = fo->size;
>newvi->fullsize = vi->fullsize;
> +  if (fieldstack.length () == 1)
> +   newvi->is_full_var = true;
>newvi->may_have_pointers = fo->may_have_pointers;
>newvi->only_restrict_pointers = fo->only_restrict_pointers;
>if (i + 1 < fieldstack.length ())
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton,
HRB 21284 (AG Nuernberg)

Re: [AArch64] Fix Prefetch ICE

2015-09-29 Thread Marcus Shawcroft

On 28 September 2015 at 06:27, Hurugalawadi, Naveen
 wrote:
> Hi Marcus,
>
> Thanks for the review and comments.
>
>>> OK and can you back port to 5 ?
>
> Please find attached the backported patch on gcc-5-branch.
>
> Regression tested on AArch64 without any issues.
>
> 2015-09-28  Andrew Pinski  
>
> ChangeLog
>
> * config/aarch64/aarch64.md (prefetch):
> Change the predicate of operand 0 to register_operand.

Thank you, please commit it if you have not already.
/M

Re: [PATCH][AArch64] Add separate insn sched class for vector LDP & STP

2015-09-29 Thread Kyrill Tkachov



On 29/09/15 09:03, Marcus Shawcroft wrote:

On 29/09/15 00:52, Evandro Menezes wrote:

In some micro-architectures the insns to load or store pairs of vector
registers are implemented rather differently from those affecting lanes
in vector registers.  Then, it's important that such insns be described
likewise differently in the scheduling model.

This patch adds the insn types neon_ldp{,_q} and neon_stp{,_q} apart
from the current neon_load2_2reg_q and neon_store2_2reg_q types,
respectively.


Hi,

The AArch64 part of this is OK. Please wait for Kyrill or Ramana to
comment on ARM side.  Cheers /Marcus



This is ok arm-wise. I see the instructions being modelled
with this type don't have a direct arm equivalent anyway.
Marcus' comment on the ChangeLog still applies.

Thanks,
Kyrill


Thank you,

-- Evandro Menezes


0001-AArch64-Add-separate-insn-sched-class-for-vector-LDP.patch


  From 340249dcd2af8dfce486cb4f62d4eaf285c6a799 Mon Sep 17 00:00:00 2001
From: Evandro Menezes
Date: Mon, 28 Sep 2015 15:00:00 -0500
Subject: [PATCH] [AArch64] Add separate insn sched class for vector LDP & STP

2015-09-28  Evandro Menezes

gcc/
* config/arm/types.md (neon_ldp, neon_ldp_q, neon_stp, neon_stp_q):
add new insn types for vector load and store pairs.

s/add/Add/ and likewise the rest of the changelog comments.


* config/arm/cortex-a53.md (cortex_a53_f_load_2reg): add insn
types "neon_ldp{,_q}".
* config/arm/cortex-a57.md (neon_load_c): add insn types
"neon_ldp{,_q}".
(neon_store_complex): add insn types "neon_stp{,_q}".
* config/aarch64/aarch64-simd.md (aarch64_be_movoi): add insn types
"neon_{ldp,stp}_q".

Re: [gomp4] error on acc loops not associated with offloaded acc regions

2015-09-29 Thread Thomas Schwinge

Hi Cesar!

On Mon, 28 Sep 2015 10:08:34 -0700, Cesar Philippidis wrote:
> I've applied this patch to gomp-4_0-branch which teaches omplower how to
> error when it detects acc loops which aren't nested inside an acc
> parallel or kernels region or located within a function marked as an acc
> routine. A couple of test cases needed to be updated.
> 
> The error message is kind of long. Let me know if it should be revised.

>   gcc/testsuite/
>   * c-c++-common/goacc/non-routine.c: New test.
>   * c-c++-common/goacc-gomp/nesting-1.c: Add checks for invalid loop
>   nesting.
>   * c-c++-common/goacc-gomp/nesting-fail-1.c: Likewise.
>   * c-c++-common/goacc/clauses-fail.c: Likewise.
>   * c-c++-common/goacc/sb-1.c: Likewise.
>   * c-c++-common/goacc/sb-3.c: Likewise.
>   * gcc.dg/goacc/sb-1.c: Likewise.
>   * gcc.dg/goacc/sb-3.c: Likewise.

What about any Fortran test cases?

> --- a/gcc/omp-low.c
> +++ b/gcc/omp-low.c
> @@ -2901,6 +2901,14 @@ check_omp_nesting_restrictions (gimple *stmt, omp_context *ctx)
>   }
> return true;
>   }
> +  if (is_gimple_omp_oacc (stmt) && ctx == NULL
> +   && get_oacc_fn_attrib (current_function_decl) == NULL)
> + {
> +   error_at (gimple_location (stmt),
> + "acc loops must be associated with an acc region or "
> + "routine");
> +   return false;
> + }
>/* FALLTHRU */
>  case GIMPLE_CALL:
>if (is_gimple_call (stmt)

I see that the error reporting doesn't really use a consistent style
currently, but what about something like "loop directive must be
associated with compute region" (where "compute region" is the language
used by OpenACC 2.0a to mean the structured block associated with a
compute construct as well as routine directive)?
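
To make the distinction concrete, here is a small sketch of the two ways a
loop directive can be associated with a compute region: inside a function
marked with a routine directive, or inside a parallel construct.  The
function names and the choice of vector parallelism are assumptions made for
this illustration.  Without OpenACC enabled the pragmas are simply ignored
and the code runs sequentially.

```c
#include <assert.h>

/* The loop directive below is acceptable because the enclosing function
   is marked as an acc routine.  Without the routine directive, this is
   the kind of orphaned loop the new error is meant to diagnose.  */
#pragma acc routine vector
void
scale (int *a, int n, int factor)
{
#pragma acc loop vector
  for (int i = 0; i < n; i++)
    a[i] *= factor;
}

/* Here the loop is part of a combined parallel construct, so it is
   associated with a compute region directly.  */
int
sum (const int *a, int n)
{
  int total = 0;

  assert (n >= 0);
#pragma acc parallel loop reduction(+:total) copyin(a[0:n])
  for (int i = 0; i < n; i++)
    total += a[i];

  return total;
}
```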

> --- a/gcc/testsuite/c-c++-common/goacc-gomp/nesting-1.c
> +++ b/gcc/testsuite/c-c++-common/goacc-gomp/nesting-1.c
> @@ -20,6 +20,7 @@ f_acc_kernels (void)
>}
>  }
>  
> +#pragma acc routine
>  void
>  f_acc_loop (void)
>  {

OK, but...

> --- a/gcc/testsuite/c-c++-common/goacc-gomp/nesting-fail-1.c
> +++ b/gcc/testsuite/c-c++-common/goacc-gomp/nesting-fail-1.c
> @@ -361,72 +361,72 @@ f_acc_data (void)
>  void
>  f_acc_loop (void)
>  {
> -#pragma acc loop
> +#pragma acc loop /* { dg-error "acc loops must be associated with an acc region or routine" } */
>for (i = 0; i < 2; ++i)
>  {
> -#pragma omp parallel /* { dg-error "non-OpenACC construct inside of OpenACC region" } */
> +#pragma omp parallel
>;
>  }

... here you're changing what this is meant to be testing, so please
restore the original meaning (by adding "#pragma acc routine" to this
function, I suppose), and then perhaps add whichever additional test
cases you deem necessary.

> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/goacc/non-routine.c
> @@ -0,0 +1,16 @@
> +/* This program validates the behavior of acc loops which are
> +   not associated with a parallel or kernles region or routine.  */

:-) Thanks for adding such a comment -- this is missing in too many test
cases.


Grüße,
 Thomas



[gomp4, committed] Ignore reduction clauses in kernels region

2015-09-29 Thread Tom de Vries


Hi,

this patch filters out reduction clauses in an oacc kernels region. This 
fixes an ICE in the test-case.


Committed to gomp-4_0-branch.

Thanks,
- Tom
Ignore reduction clauses in kernels region

2015-09-29  Tom de Vries  

	* omp-low.c (ctx_in_oacc_kernels_region): New function.
	(scan_omp_for): Filter out reduction clauses in kernels region.

	* c-c++-common/goacc/kernels-acc-loop-reduction.c: New test.
---
 gcc/omp-low.c  | 18 +++-
 .../goacc/kernels-acc-loop-reduction.c | 25 ++
 2 files changed, 42 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-acc-loop-reduction.c

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index a5904eb..597035f 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -2579,6 +2579,20 @@ oacc_loop_or_target_p (gimple *stmt)
 	  && gimple_omp_for_kind (stmt) == GF_OMP_FOR_KIND_OACC_LOOP));
 }
 
+bool
+ctx_in_oacc_kernels_region (omp_context *ctx)
+{
+  for (;ctx != NULL; ctx = ctx->outer)
+{
+  gimple *stmt = ctx->stmt;
+  if (gimple_code (stmt) == GIMPLE_OMP_TARGET
+	  && gimple_omp_target_kind (stmt) == GF_OMP_TARGET_KIND_OACC_KERNELS)
+	return true;
+}
+
+  return false;
+}
+
 /* Scan a GIMPLE_OMP_FOR.  */
 
 static void
@@ -2592,6 +2606,7 @@ scan_omp_for (gomp_for *stmt, omp_context *outer_ctx)
   bool auto_clause = false;
   bool seq_clause = false;
   int gwv_routine = 0;
+  bool in_oacc_kernels_region = ctx_in_oacc_kernels_region (outer_ctx);
 
   if (outer_ctx)
 outer_type = gimple_code (outer_ctx->stmt);
@@ -2665,7 +2680,8 @@ scan_omp_for (gomp_for *stmt, omp_context *outer_ctx)
 
   /* Filter out any OpenACC clauses which aren't associated with
 	 gangs, workers or vectors.  Such reductions are no-ops.  */
-  if (extract_oacc_loop_mask (ctx) == 0)
+  if (extract_oacc_loop_mask (ctx) == 0
+	  || in_oacc_kernels_region)
 	{
 	  /* First filter out the clauses at the beginning of the chain.  */
 	  while (clauses && OMP_CLAUSE_CODE (clauses) == OMP_CLAUSE_REDUCTION)
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-acc-loop-reduction.c b/gcc/testsuite/c-c++-common/goacc/kernels-acc-loop-reduction.c
new file mode 100644
index 000..f3aa4e7
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-acc-loop-reduction.c
@@ -0,0 +1,25 @@
+/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-ftree-parallelize-loops=32" } */
+/* { dg-additional-options "-fdump-tree-parloops_oacc_kernels-all" } */
+/* { dg-additional-options "-fdump-tree-optimized" } */
+
+unsigned int
+foo (int n, unsigned int *a)
+{
+  unsigned int sum = 0;
+
+#pragma acc kernels loop gang reduction(+:sum)
+  for (int i = 0; i < n; i++)
+sum += a[i];
+
+  return sum;
+}
+
+/* Check that only one loop is analyzed, and that it can be parallelized.  */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops_oacc_kernels" } } */
+/* { dg-final { scan-tree-dump-not "FAILED:" "parloops_oacc_kernels" } } */
+
+/* Check that the loop has been split off into a function.  */
+/* { dg-final { scan-tree-dump-times "(?n);; Function .*foo.*\\._omp_fn\\.0" 1 "optimized" } } */
+
+/* { dg-final { scan-tree-dump-times "(?n)pragma omp target oacc_parallel.*num_gangs\\(32\\)" 1 "parloops_oacc_kernels" } } */
-- 
1.9.1
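
The check added by this patch walks the chain of enclosing contexts until it
finds a kernels region.  The same pattern, stripped of GCC internals, can be
sketched as below.  The enum, struct context and both function names are
hypothetical simplifications and are not the omp-low.c data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical region kinds; the real code inspects gimple statements.  */
enum region_kind { REGION_DATA, REGION_PARALLEL, REGION_KERNELS, REGION_LOOP };

/* Hypothetical context record linked to its enclosing context.  */
struct context
{
  enum region_kind kind;
  struct context *outer;
};

/* Return true if CTX or any enclosing context is a kernels region.
   A NULL context means "no enclosing region".  */
bool
context_in_kernels_region (const struct context *ctx)
{
  for (; ctx != NULL; ctx = ctx->outer)
    if (ctx->kind == REGION_KERNELS)
      return true;

  return false;
}

/* Small self-check showing the intended behavior.  */
void
context_self_check (void)
{
  struct context kernels = { REGION_KERNELS, NULL };
  struct context loop = { REGION_LOOP, &kernels };
  struct context parallel = { REGION_PARALLEL, NULL };

  assert (context_in_kernels_region (&loop));
  assert (!context_in_kernels_region (&parallel));
  assert (!context_in_kernels_region (NULL));
}
```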

[gomp4, committed] Don't unnecessarily set address taken in expand_omp_for_generic

2015-09-29 Thread Tom de Vries


Hi,

this patch sets the address taken bit for istart0 and iend0 in
expand_omp_for_generic only if necessary. This fixes an ICE while
compiling the test-case.


Committed to gomp-4_0-branch.

Thanks,
- Tom
Don't unnecessarily set address taken in expand_omp_for_generic

2015-09-29  Tom de Vries  

	* omp-low.c (expand_omp_for_generic): Only set address taken for istart0
	and iend0 if necessary.

	* c-c++-common/goacc/kernels-acc-loop-smaller-equal.c: New test.
---
 gcc/omp-low.c  | 10 ++---
 .../goacc/kernels-acc-loop-smaller-equal.c | 25 ++
 2 files changed, 32 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-acc-loop-smaller-equal.c

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 597035f..a53a872 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -6564,7 +6564,7 @@ expand_omp_for_generic (struct omp_region *region,
   gassign *assign_stmt;
   bool in_combined_parallel = is_combined_parallel (region);
   bool broken_loop = region->cont == NULL;
-  bool seq_loop = (!start_fn || !next_fn);
+  bool seq_loop = (start_fn == BUILT_IN_NONE || next_fn == BUILT_IN_NONE);
   edge e, ne;
   tree *counts = NULL;
   int i;
@@ -6576,8 +6576,12 @@ expand_omp_for_generic (struct omp_region *region,
   type = TREE_TYPE (fd->loop.v);
   istart0 = create_tmp_var (fd->iter_type, ".istart0");
   iend0 = create_tmp_var (fd->iter_type, ".iend0");
-  TREE_ADDRESSABLE (istart0) = 1;
-  TREE_ADDRESSABLE (iend0) = 1;
+
+  if (!seq_loop)
+{
+  TREE_ADDRESSABLE (istart0) = 1;
+  TREE_ADDRESSABLE (iend0) = 1;
+}
 
   /* See if we need to bias by LLONG_MIN.  */
   if (fd->iter_type == long_long_unsigned_type_node
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-acc-loop-smaller-equal.c b/gcc/testsuite/c-c++-common/goacc/kernels-acc-loop-smaller-equal.c
new file mode 100644
index 000..ba7414a
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-acc-loop-smaller-equal.c
@@ -0,0 +1,25 @@
+/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-ftree-parallelize-loops=32" } */
+/* { dg-additional-options "-fdump-tree-parloops_oacc_kernels-all" } */
+/* { dg-additional-options "-fdump-tree-optimized" } */
+
+unsigned int
+foo (int n)
+{
+  unsigned int sum = 1;
+
+  #pragma acc kernels loop
+  for (int i = 1; i <= n; i++)
+sum += i;
+
+  return sum;
+}
+
+/* Check that only one loop is analyzed, and that it can be parallelized.  */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops_oacc_kernels" } } */
+/* { dg-final { scan-tree-dump-not "FAILED:" "parloops_oacc_kernels" } } */
+
+/* Check that the loop has been split off into a function.  */
+/* { dg-final { scan-tree-dump-times "(?n);; Function .*foo.*\\._omp_fn\\.0" 1 "optimized" } } */
+
+/* { dg-final { scan-tree-dump-times "(?n)pragma omp target oacc_parallel.*num_gangs\\(32\\)" 1 "parloops_oacc_kernels" } } */
-- 
1.9.1

Re: [Graphite] Redesign Graphite scop detection

2015-09-29 Thread Andreas Schwab

FAIL: gcc.dg/graphite/interchange-1.c execution test
FAIL: gcc.dg/graphite/interchange-10.c execution test
FAIL: gcc.dg/graphite/interchange-11.c execution test
FAIL: gcc.dg/graphite/interchange-3.c execution test
FAIL: gcc.dg/graphite/interchange-4.c execution test
FAIL: gcc.dg/graphite/interchange-7.c execution test
FAIL: gcc.dg/graphite/pr46185.c execution test
FAIL: gcc.dg/graphite/uns-block-1.c execution test
FAIL: gcc.dg/graphite/uns-interchange-12.c execution test
FAIL: gcc.dg/graphite/uns-interchange-14.c execution test
FAIL: gcc.dg/graphite/uns-interchange-15.c execution test
FAIL: gcc.dg/graphite/uns-interchange-9.c execution test
FAIL: gcc.dg/graphite/uns-interchange-mvt.c execution test
FAIL: gfortran.dg/graphite/block-1.f90   -O  (internal compiler error)

/daten/aranym/gcc/gcc-20150929/gcc/testsuite/gfortran.dg/graphite/block-1.f90:1:0: internal compiler error: in extract_affine_chrec, at graphite-sese-to-poly.c:605
0xece332 extract_affine_chrec
../../gcc/graphite-sese-to-poly.c:604
0xece332 extract_affine
../../gcc/graphite-sese-to-poly.c:791
0xecdcec extract_affine_chrec
../../gcc/graphite-sese-to-poly.c:595
0xecdcec extract_affine
../../gcc/graphite-sese-to-poly.c:791
0xed3476 pdr_add_memory_accesses
../../gcc/graphite-sese-to-poly.c:1477
0xed3476 build_poly_dr
../../gcc/graphite-sese-to-poly.c:1572
0xed3476 build_pbb_drs
../../gcc/graphite-sese-to-poly.c:1836
0xed3476 build_scop_drs
../../gcc/graphite-sese-to-poly.c:1919
0xed3476 build_poly_scop(scop*)
../../gcc/graphite-sese-to-poly.c:3179
0xebdfc2 graphite_transform_loops()
../../gcc/graphite.c:318
0xebe6a0 graphite_transforms
../../gcc/graphite.c:353
0xebe6a0 execute
../../gcc/graphite.c:430

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

Re: [Patch ifcvt costs 0/3] Introduce a new target hook for ifcvt costs.

2015-09-29 Thread Richard Biener

On Fri, Sep 25, 2015 at 5:04 PM, James Greenhalgh
 wrote:
> Hi,
>
> In relation to the patch I put up for review a few weeks ago to teach
> RTL if-convert to handle multiple sets in a basic block [1], I was
> asking about a sensible cost model to use. There was some consensus at
> Cauldron that what should be done in this situation is to introduce a
> target hook that delegates answering the question to the target.

Err - the consensus was to _not_ add a gazillion special target hooks
but instead enhance what we have with rtx_cost so that passes can
rely on comparing before and after costs of a sequence of insns.

Richard.
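
As a rough illustration of the "compare before and after costs" idea, the
sketch below sums a per-instruction cost over two candidate sequences and
accepts the replacement when it is no more expensive.  All of it is
hypothetical: struct insn_sketch, the cost numbers and the "no more
expensive" rule are assumptions for this example and do not reflect how
rtx_cost or the ifcvt pass are actually implemented.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical instruction record carrying a precomputed cost.  */
struct insn_sketch
{
  const char *name;
  int cost;
};

/* Sum the costs of the N instructions in SEQ.  */
int
seq_cost_sketch (const struct insn_sketch *seq, size_t n)
{
  int total = 0;

  assert (seq != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    total += seq[i].cost;

  return total;
}

/* Return nonzero if replacing BEFORE with AFTER does not increase the
   total cost.  Treating equal cost as profitable is an assumption.  */
int
replacement_is_profitable (const struct insn_sketch *before, size_t n_before,
			   const struct insn_sketch *after, size_t n_after)
{
  return seq_cost_sketch (after, n_after)
	 <= seq_cost_sketch (before, n_before);
}
```

The point of this shape is that a pass only needs one generic comparison,
with the target influencing the outcome through the per-instruction costs
rather than through a dedicated hook for each transformation.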

> This patch series introduces that new target hook to provide cost
> decisions for the RTL ifcvt pass.
>
> The idea is to give the target full visibility of the proposed
> transformation, and allow it to respond as to whether if-conversion in that
> way is profitable.
>
> In order to preserve current behaviour across targets, we will need the
> default implementation to keep to the strategy of simply comparing branch
> cost against a magic number. Patch 1/3 performs this refactoring, which is
> a bit hairy in some corner cases.
>
> Patch 2/3 is a simple code move, pulling the definition of the if_info
> structure used by RTL if-convert in to ifcvt.h where it can be included
> by targets.
>
> Patch 3/3 then introduces the new target hook, with the same default
> behaviour as was previously in noce_is_profitable_p.
>
> The series has been bootstrapped on ARM, AArch64 and x86_64 targets, and
> I've verified with Spec2000 and Spec2006 runs that there are no code
> generation differences for any of these three targets after the patch.
>
> I also gave ultrasparc3 a quick go, from what I could see, I changed the
> register allocation for the floating-point condition code registers.
> Presumably this is a side effect of first constructing RTXen that I then
> discard. I didn't see anything which looked like more frequent reloads or
> substantial code generation changes, though I'm not familiar with the
> intricacies of the Sparc condition registers :).
>
> I've included a patch 4/3, to give an example of what a target might want
> to do with this hook. It needs work for tuning and deciding how the function
> should actually behave, but works if it is thought of as more of a
> strawman/prototype than a patch submission.
>
> Are parts 1, 2 and 3 OK?
>
> Thanks,
> James
>
> [1]: https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00781.html
>
> ---
> [Patch ifcvt 1/3] Factor out cost calculations from noce cases
>
> 2015-09-26  James Greenhalgh  
>
> * ifcvt.c (noce_if_info): Add a magic_number field :-(.
> (noce_is_profitable_p): New.
> (noce_try_store_flag_constants): Move cost calculation
> to after sequence generation, factor it out to noce_is_profitable_p.
> (noce_try_addcc): Likewise.
> (noce_try_store_flag_mask): Likewise.
> (noce_try_cmove): Likewise.
> (noce_try_cmove_arith): Likewise.
> (noce_try_sign_mask): Add comment regarding cost calculations.
>
> [Patch ifcvt 2/3] Move noce_if_info in to ifcvt.h
>
> 2015-09-26  James Greenhalgh  
>
> * ifcvt.c (noce_if_info): Move to...
> * ifcvt.h (noce_if_info): ...Here.
>
> [Patch ifcvt 3/3] Create a new target hook for deciding profitability
> of noce if-conversion
>
> 2015-09-26  James Greenhalgh  
>
> * target.def (costs): New hook vector.
> (ifcvt_noce_profitable_p): New hook.
> * doc/tm.texi.in: Document it.
> * doc/tm.texi: Regenerate.
> * targhooks.h (default_ifcvt_noce_profitable_p): New.
> * targhooks.c (default_ifcvt_noce_profitable_p): New.
> * ifcvt.c (noce_profitable_p): Use new target hook.
>
> [Patch Prototype AArch64 ifcvt 4/3] Wire up the new if-convert costs
> hook for AArch64
>
> 2015-09-26  James Greenhalgh  
>
> * config/aarch64/aarch64.c
> (aarch64_additional_branch_cost_for_probability): New.
> (aarch64_ifcvt_noce_profitable_p): Likewise.
> (TARGET_COSTS_IFCVT_NOCE_PROFITABLE_P): Likewise.

Re: [Patch Prototype AArch64 ifcvt 4/3] Wire up the new if-convert costs hook for AArch64

2015-09-29 Thread Richard Biener

On Fri, Sep 25, 2015 at 5:04 PM, James Greenhalgh
 wrote:
>
> Hi,
>
> This patch is a simple prototype showing how a target might choose
> to implement TARGET_COSTS_IFCVT_NOCE_IS_PROFITABLE_P.  It has not been
> tuned, tested or looked at in any meaningful way.
>
> While the patch is in need of more detailed analysis it is sufficient to
> serve as an indication of what direction I was aiming for with this
> patch set.
>
> Clearly this is not OK for trunk without further work, but I thought I'd
> include it as an afterthought for the costs rework.

First of all don't include math.h or use FP math on the host.  If you need
fractional arithmetic use sreal.

It looks like with your hook implementation you are mostly hiding magic
numbers in the target.  I'm not sure how this is better than exposing them
as user-accessible --params (and thus their defaults controllable by
the target).

Richard.

> Thanks,
> James
>
> ---
> 2015-09-26  James Greenhalgh  
>
> * config/aarch64/aarch64.c
> (aarch64_additional_branch_cost_for_probability): New.
> (aarch64_ifcvt_noce_profitable_p): Likewise.
> (TARGET_COSTS_IFCVT_NOCE_PROFITABLE_P): Likewise.
>

Re: [gomp4, committed] Ignore reduction clauses in kernels region

2015-09-29 Thread Tom de Vries


On 29/09/15 11:49, Tom de Vries wrote:

Hi,

this patch filters out reduction clauses in an oacc kernels region. This
fixes an ICE in the test-case.

Committed to gomp-4_0-branch.


I've committed this follow-up patch that marks the function 
ctx_in_oacc_kernels_region static, and adds the missing function header 
comment.


Thanks,
- Tom

Make ctx_in_oacc_kernels_region static

2015-09-29  Tom de Vries  

	* omp-low.c (ctx_in_oacc_kernels_region): Make static.  Add missing
	function header comment.
---
 gcc/omp-low.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 75044a5..64f6168 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -2579,7 +2579,9 @@ oacc_loop_or_target_p (gimple *stmt)
 	  && gimple_omp_for_kind (stmt) == GF_OMP_FOR_KIND_OACC_LOOP));
 }
 
-bool
+/* Return true if ctx is part of an oacc kernels region.  */
+
+static bool
 ctx_in_oacc_kernels_region (omp_context *ctx)
 {
   for (;ctx != NULL; ctx = ctx->outer)
-- 
1.9.1

Re: [PATCH 1/4] Add mkoffload for Intel MIC

2015-09-29 Thread Richard Biener

On Mon, Sep 28, 2015 at 2:05 PM, Bernd Schmidt  wrote:
> On 09/28/2015 02:00 PM, Jakub Jelinek wrote:
>>
>> On Mon, Sep 28, 2015 at 01:27:32PM +0200, Bernd Schmidt wrote:

>>>> I've removed obstack_ptr_grow for arrays with known sizes after this
>>>> review:
>>>> https://gcc.gnu.org/ml/gcc-patches/2014-10/msg02210.html
>>>
>>>
>>> That's unfortunate, I think that made the code less future-proof. IMO we
>>> should revert to the obstack method especially if Thomas' -v patch goes
>>> in.
>>
>>
>> Why?  If the number of arguments is bound by a small constant, using
>> automatic fixed size array is certainly more efficient, and I really don't
>> see it as less readable or maintainable.
>
>
> The code becomes harder to modify, with more room for error, and you no
> longer have consistency in how you build argv arrays within the same file.
> The obstack method is pretty much foolproof and doesn't even remotely allow
> for the possibility of a buffer overflow, and adding new arguments, even
> conditionally, is entirely trivial. Efficiency is really not an issue for
> building arguments compared to the cost of executing another binary.

I agree that obstacks are better here.  Efficiency shouldn't matter here.
But we're in C++ now so can't we statically construct the array with
sth like

const char *new_argv[] = { "objcopy", ... };

?  Thus have the compiler figure out the number of args.  That would work
for me as well.

Richard.

>
> Bernd
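
A minimal sketch of the static-array idea discussed above, assuming a fixed argument list. The file names are hypothetical placeholders, since the real arguments are elided ("...") in the thread; only the size deduction is the point here.

```cpp
#include <cstddef>

// Hypothetical argument list; the actual objcopy arguments are not shown
// in the thread.  The trailing nullptr terminates the argv-style array.
static const char *const new_argv[] = {
  "objcopy", "input.o", "output.o", nullptr
};

// Element count deduced by the compiler from the initializer,
// excluding the nullptr terminator.
static constexpr std::size_t new_argc
  = sizeof new_argv / sizeof new_argv[0] - 1;
```

Adding an unconditional argument only means adding an initializer; the count follows automatically. This form does not cover arguments that are passed conditionally.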

Re: [patch, committed] Dump function attributes

2015-09-29 Thread Richard Biener

On Tue, Sep 29, 2015 at 7:43 AM, Tom de Vries  wrote:
> [ was: Re: [RFC] Dump function attributes ]
>
> On 28/09/15 17:17, Bernd Schmidt wrote:
>>
>> On 09/28/2015 04:32 PM, Tom de Vries wrote:
>>>
>>> patch below prints the function attributes in the dump file.
>>
>>
>>> foo ()
>>> [ noclone , noinline ]
>>> {
>>> ...
>>>
>>> Good idea?
>>>
>>> If so, do we want one attribute per line?
>>
>>
>> Only for really long ones I'd think. Patch is ok for now.
>>
>>
>
> Reposting patch with ChangeLog entry added.
>
> Bootstrapped and reg-tested on x86_64.
>
> Committed to trunk.

Hmpf.  I always like to make the dump-files as copy&paste-able into testcases
as possible.  So why did you invent a new syntax for attributes instead of using
the existing __attribute__(("noclone", "noinline")) (in this case)?  Did you
verify how attributes with arguments get printed?

Thanks,
Richard.

>
> Thanks,
> - Tom

Re: [PR64164] drop copyrename, integrate into expand

2015-09-29 Thread Szabolcs Nagy


On 23/09/15 21:07, Alexandre Oliva wrote:

On Sep 18, 2015, Alan Lawrence  wrote:


With the latest git commit 2b27ef197ece54c4573c5a748b0d40076e35412c on
branch aoliva/pr64164, I am now able to build a cross toolchain for
aarch64 and aarch64_be, and can confirm the ABI failure is fixed on
the branch.




this commit

commit 33cc9081157a8c90460e4c0bdda2ac461a3822cc
Author: aoliva 
Date:   2015-09-27 09:02:00 +

revert to assign_parms assignments using default defs
...

introduced a test failure on arm-none-eabi (using newlib, compiling
with -mthumb -march=armv8-a -mfpu=crypto-neon-fp-armv8 -mfloat-abi=hard ):

FAIL: gcc.target/arm/pr43920-2.c scan-assembler-times pop 2

spawn arm-none-eabi-size pr43920-2.o
   text    data     bss     dec     hex filename
     56       0       0      56      38 pr43920-2.o
text size is 56
FAIL: gcc.target/arm/pr43920-2.c object-size text <= 54

(i haven't looked into the failure, attached asm output before and after).


Thanks for the confirmation.  I've made one further tweak for cris and
lm32, dropping the assert that caused build failures for libstdc++
atomics parms that required more alignment than
MAX_SUPPORTED_STACK_ALIGNMENT, consolidated the patchset and retested it
with a more recent baseline (r228019), with native regstraps on
x86_64-linux-gnu, i686-linux-gnu, powerpc64-linux-gnu,
powerpc64le-linux-gnu, and cross toolchain builds for the following 73
platforms: aarch64_be-elf aarch64-elf arm-eabi armeb-eabihf
arm-symbianelf avr-elf bfin-elf c6x-elf cr16-elf cris-elf crisv32-elf
epiphany-elf fido-elf fr30-elf frv-elf ft32-elf h8300-elf i686-elf
ia64-elf iq2000-elf lm32-elf m32c-elf m32r-elf m32rle-elf m68k-elf
mcore-elf mep-elf microblaze-elf mips64el-elf mips64-elf mips64orion-elf
mips64vr-elf mipsel-elf mipsisa32-elfoabi mipsisa64-elfoabi
mipsisa64r2el-elf mipsisa64r2-sde-elf mipsisa64sb1-elf
mipsisa64sr71k-elf mipstx39-elf mn10300-elf moxie-elf msp430-elf
nds32be-elf nds32le-elf nios2-elf pdp11-aout powerpc-eabialtivec
powerpc-eabi powerpc-eabisimaltivec powerpc-eabisim powerpc-eabispe
powerpcle-eabi powerpcle-eabisim powerpcle-elf powerpc-xilinx-eabi
ppc64-eabi ppc-eabi ppc-elf rl78-elf rx-elf sh64-elf sh-elf
sh-superh-elf sparc64-elf sparc-elf sparc-leon-elf spu-elf v850e-elf
v850-elf visium-elf xstormy16-elf xtensa-elf.  Not all of them succeeded
in building, but those that didn't failed at the very same spots before
and after this patch.


This patch doesn't really add much functionality.  It rather
reimplements a lot of the ugly and fragile stuff I put in in the
previous big patchset in a far more robust and pleasant way.  It fixes a
number of regressions in the process, mainly because, instead of
modifying assign_parms so as to let cfgexpand do part of its job, it
reverts all of the RTL assignment for parameters and results to
assign_parms.  cfgexpand now leaves the RTL assignment of partitions
containing default defs or parms and results to assign_parms, and
assign_parms uses a single callback, set_parm_rtl, to tell cfgexpand the
assignment for the partition containing the default def of each
parameter.

This required introducing default defs for all parms and results, even
if unused; we could refrain from creating them, and refrain from
initializing those parameters (at least when optimizing), but that would
require messing with the fragile bits in assign_parms again, and it
would bring little benefit, since RTL optimization will likely notice
the initialization is unused and drop it anyway.  Besides, adding the
default defs was actually needed to fix a regression in the previous
patch, and even with the current patch it helps make sure we don't
assign more than one default def to the same SSA partition (the previous
patch attempted to do that, but there was a bug, fixed in the current
patch).  Having unused default defs makes it easier for us to decide
whether to use an entry_value rtx for the initial debug insn of a parm.
We track partitions holding default defs for parms and results with a
bitmap; we used to have a bitmap that tracked partitions holding default
defs, but it was unused!  I just renamed it and repurposed it.

I've also added checking asserts to set_rtl, to verify that, when we
expect a REG, we get a REG, and that it has the expected mode.  set_rtl
was also adjusted to record anonymous SSA names or their base types in
attrs of REGs or MEMs, respectively, so that code that relied on the
attrs to detect properties of the decl types no longer regress just
because we no longer generate decls for anonymous SSA names.  Since
there were prior uses of types in MEM attrs, that was expected to go
smoothly, but I was surprised at how smoothly adding SSA names to REG
attrs went.  No adjustments required!

I also tightened a bit the conditions for coalescing: we used to require
the same canonical type; I've added tests for same alignment
requirements, and for same signedness.  OTOH, I've added a few more
coalesce candidates for RESULT_DE

Re: [PATCH] liboffloadmic emulation mode: make it asynchronous

2015-09-29 Thread Ilya Verbin

On Tue, Sep 29, 2015 at 09:01:33 +0200, Jakub Jelinek wrote:
> On Mon, Sep 28, 2015 at 05:53:42PM +0300, Ilya Verbin wrote:
> > Currently the COI emulator is single-threaded, i.e. it is able to run only one
> > target function at a time, e.g. the following testcase:
> > 
> >   #pragma omp parallel sections num_threads(2)
> >     {
> >       #pragma omp section
> >       #pragma omp target
> >       while (1)
> >         putchar ('.');
> > 
> >       #pragma omp section
> >       #pragma omp target
> >       while (1)
> >         putchar ('o');
> >     }
> > 
> > prints only dots using emul, while using real libcoi it prints:
> > ...o.o.o.o...o...o.oo.o.o.ooo.oo...o.o.o...o.ooo
> > Of course, it's not possible to test new OpenMP 4.1's async features using such
> > an emulator.
> > 
> > The patch below makes it asynchronous: it creates an auxiliary thread for each
> > COIPipeline in host and in target processes.  In general, a new COIPipeline is
> > created by liboffloadmic for each host thread with offload, i.e. the example
> > above has:
> > 4 threads in the host process (2 OpenMP threads + 2 auxiliary threads) and
> > 3 threads in the target process (1 main thread + 2 auxiliary threads).
> > An auxiliary host thread runs a target function in the new thread in target
> > process and waits for its completion.  When the function is finished, the host
> > thread signals an event and can run a callback, if it is registered.
> > liboffloadmic waits for signalled events by calling COIEventWait.
> > This is identical to how real libcoi works.
> > 
> > make check-target-libgomp and some internal tests did not show any regression.
> > TSan report is clean.  Is it OK for trunk?
> 
> For now ok.  Though, I'd say I'd prefer if there were no auxiliary threads
> on the host side, just whatever thread is asked to send something to/from
> the device, wait for something and/or poll for something just polling the
> pipes.  Are there auxiliary host threads also for the case when using
> the real COI, offloading to hw?

Yes.

  -- Ilya

Re: [patch, committed] Dump function attributes

2015-09-29 Thread Tom de Vries


On 29/09/15 12:36, Richard Biener wrote:
> On Tue, Sep 29, 2015 at 7:43 AM, Tom de Vries  wrote:
>> [ was: Re: [RFC] Dump function attributes ]
>>
>> On 28/09/15 17:17, Bernd Schmidt wrote:
>>>
>>> On 09/28/2015 04:32 PM, Tom de Vries wrote:
>>>>
>>>> patch below prints the function attributes in the dump file.
>>>
>>>> foo ()
>>>> [ noclone , noinline ]
>>>> {
>>>> ...
>>>>
>>>> Good idea?
>>>>
>>>> If so, do we want one attribute per line?
>>>
>>> Only for really long ones I'd think. Patch is ok for now.
>>
>> Reposting patch with ChangeLog entry added.
>>
>> Bootstrapped and reg-tested on x86_64.
>>
>> Committed to trunk.
>
> Hmpf.  I always like to make the dump-files as much copy&paste-able to testcases
> as possible.

Hmm, interesting. Not something I use, but I can imagine it's useful.

> So why did you invent a new syntax for attributes instead of using
> the existing __attribute__(("noclone", "noinline")) (in this case)?

My main concerns were:
- being able to see in dump files what the actual attributes of a
  function are (rather than having to figure it out in a debug session).
- being able to write testcases that can test for the presence of those
  attributes in dump files

> Did you verify
> how attributes with arguments get printed?


F.i. an oacc offload function compiled by the host compiler is annotated 
as follows:


before pass_oacc_transform (in the gomp-4_0-branch):
...
[ oacc function 32, , , omp target entrypoint ]
...

after pass_oacc_transform:

[ oacc function 1, 1, 1, omp target entrypoint ]
...

Thanks,
- Tom

Re: [patch, committed] Dump function attributes

2015-09-29 Thread Richard Biener

On Tue, Sep 29, 2015 at 1:23 PM, Tom de Vries  wrote:
> On 29/09/15 12:36, Richard Biener wrote:
>>
>> On Tue, Sep 29, 2015 at 7:43 AM, Tom de Vries 
>> wrote:
>>>
>>> [ was: Re: [RFC] Dump function attributes ]
>>>
>>> On 28/09/15 17:17, Bernd Schmidt wrote:


>>>> On 09/28/2015 04:32 PM, Tom de Vries wrote:
>>>>>
>>>>> patch below prints the function attributes in the dump file.
>>>>
>>>>> foo ()
>>>>> [ noclone , noinline ]
>>>>> {
>>>>> ...
>>>>>
>>>>> Good idea?
>>>>>
>>>>> If so, do we want one attribute per line?
>>>>
>>>> Only for really long ones I'd think. Patch is ok for now.


>>>
>>> Reposting patch with ChangeLog entry added.
>>>
>>> Bootstrapped and reg-tested on x86_64.
>>>
>>> Committed to trunk.
>>
>>
>> Hmpf.  I always like to make the dump-files as much copy&paste-able to
>> testcases as possible.
>
>
> Hmm, interesting. Not something I use, but I can imagine it's useful.
>
>> So why did you invent a new syntax for attributes instead of using
>> the existing __attribute__(("noclone", "noinline")) (in this case)?
>
>
> My main concerns were:
> - being able to see in dump files what the actual attributes of a
>   function are (rather than having to figure it out in a debug session).
> - being able to write testcases that can test for the presence of those
>   attributes in dump files
>
>> Did you verify
>> how attributes with arguments get printed?
>
>
> F.i. an oacc offload function compiled by the host compiler is annotated as
> follows:
>
> before pass_oacc_transform (in the gomp-4_0-branch):
> ...
> [ oacc function 32, , , omp target entrypoint ]
> ...
>
> after pass_oacc_transform:
> 
> [ oacc function 1, 1, 1, omp target entrypoint ]
> .

Hmm, ok.  So without some extra dump_attribute_list work, wrapping
__attribute__(( ... )) around the above doesn't make it more amenable
to cut&pasting.

Richard.

>
> Thanks,
> - Tom
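
To illustrate the copy-and-paste concern, here is a rough sketch of rendering attribute names in source syntax rather than the bracketed dump syntax. `source_style` is a hypothetical helper written for this note, not a GCC function, and it only handles attributes without arguments; entries such as `oacc function 32, , ,` shown above would need more work, which is the point made in the reply.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of GCC): renders argument-less attribute
// names as a single __attribute__((...)) specifier.
std::string source_style(const std::vector<std::string>& names) {
  if (names.empty())
    return "";
  std::string out = "__attribute__((";
  for (std::size_t i = 0; i < names.size(); ++i) {
    if (i != 0)
      out += ", ";   // separate consecutive attribute names
    out += names[i];
  }
  out += "))";
  return out;
}
```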

[PATCH, testsuite]: Check all variables to be non-zero before signbit tests in tg-tests.h

2015-09-29 Thread Uros Bizjak

Hello!

On targets where denormals are flushed to zero with
-funsafe-math-optimizations (x86 SSE and alpha), a zero value can
enter the signbit tests in unsafe math mode. Since signs of
zeroes and NaNs are not preserved in unsafe math mode,
gcc.dg/pr28796-2.c can fail on these targets.

We already have a check for a non-zero double value in place for unsafe
math mode. The attached patch adds tests that guarantee that the
float and long double values are also non-zero before the signbit tests.

2015-09-29  Uros Bizjak  

* gcc.dg/tg-tests.h (foo_1) [UNSAFE]: Also check if f and ld are
non-zero for __builtin_signbit tests.

Tested on alpha-linux-gnu (where the patch fixes gcc.dg/pr28796-2.c
failure) and x86_64-linux-gnu {,-m32}.

Committed to mainline SVN.

Uros.

Index: gcc.dg/tg-tests.h
===
--- gcc.dg/tg-tests.h   (revision 228229)
+++ gcc.dg/tg-tests.h   (working copy)
@@ -82,7 +82,7 @@

   /* Sign bit of zeros and nans is not preserved in unsafe math mode.  */
 #ifdef UNSAFE
-  if (!res_isnan && d != 0)
+  if (!res_isnan && f != 0 && d != 0 && ld != 0)
 #endif
 {
   if ((__builtin_signbit (f) ? 1 : 0) != res_signbit)
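
The guard added by the patch can be read as a small predicate. The sketch below restates it outside the testsuite; the function name is made up for illustration and the parameters mirror the variables `f`, `d`, `ld` and `res_isnan` from tg-tests.h.

```cpp
// Illustrative restatement of the patched condition (name is hypothetical).
// In unsafe math mode the sign of zeros and NaNs is not preserved, so the
// signbit comparison is only meaningful when no operand is zero or NaN.
bool signbit_test_applies(float f, double d, long double ld, bool res_isnan) {
  return !res_isnan && f != 0 && d != 0 && ld != 0;
}
```

Checking all three values matters because a denormal that is non-zero as a double may already have been flushed to zero in one of the other types.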

[patch] libstdc++/67747 Allocate space for dirent::d_name

2015-09-29 Thread Jonathan Wakely


POSIX says that dirent::d_name has an unspecified length, so calls to
readdir_r must pass a buffer with enough trailing space for
{NAME_MAX}+1 characters. I wasn't doing that, which works OK on
GNU/Linux and BSD where d_name is a large array, but fails on Solaris
32-bit.

This uses pathconf to get NAME_MAX and allocates a buffer.

Tested powerpc64le-linux and x86_64-dragonfly4.1, I'm going to commit
this to trunk today (and backport all the filesystem fixes to
gcc-5-branch).

commit 16ff5d124b8e6c5d1f9dd4edb81b6ca5c9129134
Author: Jonathan Wakely 
Date:   Tue Sep 29 11:58:19 2015 +0100

PR libstdc++/67747 Allocate space for dirent::d_name

	PR libstdc++/67747
	* src/filesystem/dir.cc (_Dir::dirent_buf): New member.
	(get_name_max): New function.
	(native_readdir) [_GLIBCXX_FILESYSTEM_IS_WINDOWS]: Copy to supplied
	dirent object. Handle end of directory.
	(_Dir::advance): Allocate space for d_name.

diff --git a/libstdc++-v3/src/filesystem/dir.cc b/libstdc++-v3/src/filesystem/dir.cc
index bce751c..d29f8eb 100644
--- a/libstdc++-v3/src/filesystem/dir.cc
+++ b/libstdc++-v3/src/filesystem/dir.cc
@@ -25,8 +25,12 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
+#ifdef _GLIBCXX_HAVE_UNISTD_H
+# include 
+#endif
 #ifdef _GLIBCXX_HAVE_DIRENT_H
 # ifdef _GLIBCXX_HAVE_SYS_TYPES_H
 #  include 
@@ -64,20 +68,23 @@ struct fs::_Dir
   fs::path		path;
   directory_entry	entry;
   file_type		type = file_type::none;
+  unique_ptr<char[]>	dirent_buf;
 };
 
 namespace
 {
   template<typename Bitmask>
-inline bool is_set(Bitmask obj, Bitmask bits)
+inline bool
+is_set(Bitmask obj, Bitmask bits)
 {
   return (obj & bits) != Bitmask::none;
 }
 
   // Returns {dirp, p} on success, {nullptr, p} on error.
   // If an ignored EACCES error occurs returns {}.
-  fs::_Dir
-  open_dir(const fs::path& p, fs::directory_options options, std::error_code* ec)
+  inline fs::_Dir
+  open_dir(const fs::path& p, fs::directory_options options,
+	   std::error_code* ec)
   {
 if (ec)
   ec->clear();
@@ -99,8 +106,22 @@ namespace
 return {nullptr, p};
   }
 
+  inline long
+  get_name_max(const fs::path& path __attribute__((__unused__)))
+  {
+#ifdef _GLIBCXX_HAVE_UNISTD_H
+long name_max = pathconf(path.c_str(), _PC_NAME_MAX);
+if (name_max != -1)
+  return name_max;
+#endif
+
+// Maximum path component on Windows is 255 (UTF-16?) characters,
+// which is a reasonable default for POSIX too.
+return 255;
+  }
+
   inline fs::file_type
-  get_file_type(const dirent& d __attribute__((__unused__)))
+  get_file_type(const ::dirent& d __attribute__((__unused__)))
   {
 #ifdef _GLIBCXX_HAVE_STRUCT_DIRENT_D_TYPE
 switch (d.d_type)
@@ -129,12 +150,26 @@ namespace
 #endif
   }
 
-  int
+  inline int
   native_readdir(DIR* dirp, ::dirent*& entryp)
   {
 #ifdef _GLIBCXX_FILESYSTEM_IS_WINDOWS
-if ((entryp = ::readdir(dirp)))
-  return 0;
+const int saved_errno = errno;
+errno = 0;
+if (auto entp = ::readdir(dirp))
+  {
+	size_t name_len = strlen(entp->d_name);
+	if (name_len > 255)
+	  return ENAMETOOLONG;
+	size_t len = offsetof(::dirent, d_name) + name_len + 1;
+	memcpy(entryp, entp, len);
+	return 0;
+  }
+else if (errno == 0) // End of directory reached.
+  {
+	errno = saved_errno;
+	entryp = nullptr;
+  }
 return errno;
 #else
 return ::readdir_r(dirp, entryp, &entryp);
@@ -142,6 +177,7 @@ namespace
   }
 }
 
+
 // Returns false when the end of the directory entries is reached.
 // Reports errors by setting ec or throwing.
 bool
@@ -150,9 +186,15 @@ fs::_Dir::advance(error_code* ec, directory_options options)
   if (ec)
 ec->clear();
 
-  ::dirent ent;
-  ::dirent* result = &ent;
-  if (int err = native_readdir(dirp, result))
+  if (!dirent_buf)
+{
+  size_t len = offsetof(::dirent, d_name) + get_name_max(path) + 1;
+  dirent_buf.reset(new char[len]);
+}
+
+  ::dirent* entp = reinterpret_cast<::dirent*>(dirent_buf.get());
+
+  if (int err = native_readdir(dirp, entp))
 {
   if (err == EACCES
 && is_set(options, directory_options::skip_permission_denied))
@@ -165,13 +207,13 @@ fs::_Dir::advance(error_code* ec, directory_options options)
   ec->assign(err, std::generic_category());
   return true;
 }
-  else if (result != nullptr)
+  else if (entp != nullptr)
 {
   // skip past dot and dot-dot
-  if (!strcmp(ent.d_name, ".") || !strcmp(ent.d_name, ".."))
+  if (!strcmp(entp->d_name, ".") || !strcmp(entp->d_name, ".."))
 	return advance(ec, options);
-  entry = fs::directory_entry{path / ent.d_name};
-  type = get_file_type(ent);
+  entry = fs::directory_entry{path / entp->d_name};
+  type = get_file_type(*entp);
   return true;
 }
   else
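
The buffer sizing used by the patch can be sketched in isolation. This is a simplified restatement, assuming a POSIX system with `pathconf`; the function names are chosen for this note and are not the ones in dir.cc, and the 255 fallback is the same default the patch uses.

```cpp
#include <cstddef>
#include <dirent.h>
#include <memory>
#include <unistd.h>

// Longest file name allowed in `dir`, or 255 when pathconf cannot tell.
long name_max_for(const char *dir) {
  long n = ::pathconf(dir, _PC_NAME_MAX);
  return n != -1 ? n : 255;
}

// Bytes needed for a dirent whose d_name can hold NAME_MAX characters
// plus the terminating NUL.
std::size_t dirent_buffer_size(const char *dir) {
  return offsetof(::dirent, d_name)
         + static_cast<std::size_t>(name_max_for(dir)) + 1;
}

// Heap buffer large enough to be used as a dirent for entries of `dir`.
std::unique_ptr<char[]> make_dirent_buffer(const char *dir) {
  return std::unique_ptr<char[]>(new char[dirent_buffer_size(dir)]);
}
```

Sizing from `offsetof` rather than `sizeof(dirent)` is what makes this work on systems where `d_name` is declared as a one-element array.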

Re: [PATCH 1/4] Add mkoffload for Intel MIC

2015-09-29 Thread Bernd Schmidt


On 09/29/2015 12:29 PM, Richard Biener wrote:

I agree that obstacks are better here.  Efficiency shouldn't matter here.
But we're in C++ now so can't we statically construct the array with
sth like

const char *new_argv[] = { "objcopy", ... };

?  Thus have the compiler figure out the number of args.  That would work
for me as well.


The issue is that the code is about to be changed to conditionally pass 
certain arguments ("-v"), so you no longer have a fixed arglist.



Bernd
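
A short sketch of why a growable list suits conditional arguments. It uses `std::vector` as a stand-in for the obstack approach discussed in the thread, and the file names and the `verbose` parameter are hypothetical; the real mkoffload code is not reproduced here.

```cpp
#include <string>
#include <vector>

// Builds an argv-style list.  `verbose` stands in for whatever condition
// decides that "-v" is passed; the file names are placeholders.
std::vector<const char *> build_argv(bool verbose) {
  std::vector<const char *> argv;
  argv.push_back("objcopy");
  if (verbose)
    argv.push_back("-v");      // conditional argument, no size bookkeeping
  argv.push_back("input.o");
  argv.push_back("output.o");
  argv.push_back(nullptr);     // argv terminator
  return argv;
}
```

With a fixed-size array, each new conditional argument would also require revisiting the array bound, which is the maintenance risk raised earlier in the thread.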

Re: [PATCH] Use stdint-wrap.h on *-*-netbsd[56]*

2015-09-29 Thread Jonathan Wakely


Ping.

On 18/09/15 13:59 +0100, Jonathan Wakely wrote:

This patch adjusts config.gcc so that it installs <stdint.h> for NetBSD
5.x and 6.x, which is necessary for the C++ library because the host
<stdint.h> has:

#if !defined(__cplusplus) || defined(__STDC_LIMIT_MACROS)
#include 
#endif

#if !defined(__cplusplus) || defined(__STDC_CONSTANT_MACROS)
#include 
#endif

This means that contrary to the C++11 standard the stdint macros are
only defined when __STDC_CONSTANT_MACROS / __STDC_LIMIT_MACROS are
defined.

I first noted the problem earlier this year and opened
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65806

I rediscovered the problem when I broke netbsd bootstrap by including
 during bootstrap with https://gcc.gnu.org/r227684

That header uses UINT32_C, which is not defined without this patch.

NetBSD 7.x should be OK, because it knows about C++11 (see the link in
the PR for details).

Tested x86_64-unknown-netbsd5.1, OK for trunk?




diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index affc5ba..9450dcb 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,7 @@
+2015-09-16  Jonathan Wakely  
+
+   * config.gcc (*-*-netbsd[5-6]*): Set use_gcc_stdint=wrap.
+
2015-09-15  Alan Lawrence  

* config/aarch64/aarch64-simd.md
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 75807f5..394ded3 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -788,6 +788,14 @@ case ${target} in
  default_use_cxa_atexit=yes
  ;;
  esac
+
+  # NetBSD 5.x and 6.x provide <stdint.h> but require
+  # __STDC_LIMIT_MACROS and __STDC_CONSTANT_MACROS for C++.
+  case ${target} in
+*-*-netbsd[5-6]* | *-*-netbsdelf[5-6]*)
+  use_gcc_stdint=wrap
+  ;;
+  esac
  ;;
*-*-openbsd*)
  tmake_file="t-openbsd"
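
For context, a small example of what C++11 code expects from the header: the limit and constant macros are available without defining `__STDC_LIMIT_MACROS` or `__STDC_CONSTANT_MACROS` first. The variable names are illustrative only.

```cpp
#include <cstdint>

// No __STDC_CONSTANT_MACROS / __STDC_LIMIT_MACROS defined here: C++11
// requires the macros below to be available regardless.
static_assert(UINT32_C(7) == 7u, "UINT32_C must be usable in C++11");

constexpr std::uint32_t all_ones = UINT32_C(0xFFFFFFFF);  // constant macro
constexpr std::uint32_t max32 = UINT32_MAX;               // limit macro
```

On a host header that guards these macros as quoted above, this translation unit fails to compile unless the wrapper header is installed.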

[PATCH] Clarify __atomic_compare_exchange_n docs

2015-09-29 Thread Jonathan Wakely


Someone on IRC incorrectly parsed the docs at
https://gcc.gnu.org/onlinedocs/gcc-5.2.0/gcc/_005f_005fatomic-Builtins.html#index-g_t_005f_005fatomic_005fcompare_005fexchange_005fn-3536
as:

 IF
 (
  desired is written into *ptr
  AND
  the execution is considered to conform to the memory model
  specified by success_memmodel.
 )
 {
  true is returned
 }
 otherwise ...

rather than the intended:

 IF ( desired is written into *ptr )
 {
  true is returned
  AND
  the execution is considered to conform to the memory model
  specified by success_memmodel.
 }
 otherwise ...

So they asked:


What is otherwise, here? Can I make the function return false even
when 'desired' has been written into 'ptr'? How do I do it? I could
not write an example, so far.


This patch rewords it to avoid the ambiguity.

I've also replaced the rather clunky "the operation is considered to
conform to" phrasing. (It's only _considered_ to? So does it or doesn't
it use that memory order?) Instead I've used the terminology from the
C and C++ standards, which say "memory is affected according to".

OK for trunk?

commit 370a92b7f4d318957a70d0d3f1185f1c6f282ff3
Author: Jonathan Wakely 
Date:   Tue Sep 29 12:45:21 2015 +0100

	* doc/extend.texi (__atomic Builtins): Clarify compare_exchange
	effects.

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 8406945..0de94f2 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -9353,17 +9353,17 @@ This compares the contents of @code{*@var{ptr}} with the contents of
 @code{*@var{expected}}. If equal, the operation is a @emph{read-modify-write}
 operation that writes @var{desired} into @code{*@var{ptr}}.  If they are not
 equal, the operation is a @emph{read} and the current contents of
-@code{*@var{ptr}} is written into @code{*@var{expected}}.  @var{weak} is true
+@code{*@var{ptr}} are written into @code{*@var{expected}}.  @var{weak} is true
 for weak compare_exchange, and false for the strong variation.  Many targets 
 only offer the strong variation and ignore the parameter.  When in doubt, use
 the strong variation.
 
-True is returned if @var{desired} is written into
-@code{*@var{ptr}} and the operation is considered to conform to the
+If @var{desired} is written into @code{*@var{ptr}} then true is returned
+and memory is affected according to the
 memory order specified by @var{success_memorder}.  There are no
 restrictions on what memory order can be used here.
 
-False is returned otherwise, and the operation is considered to conform
+Otherwise, false is returned and memory is affected according
 to @var{failure_memorder}. This memory order cannot be
 @code{__ATOMIC_RELEASE} nor @code{__ATOMIC_ACQ_REL}.  It also cannot be a
 stronger order than that specified by @var{success_memorder}.
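
A brief example of the two outcomes described by the reworded text, using the strong variation. The wrapper name `try_swap` is made up for this note; the memory orders are one valid choice, with a failure order that is neither release nor stronger than the success order.

```cpp
// Tries to replace *ptr with desired if *ptr equals *expected.
// On success returns true; on failure returns false and stores the
// current contents of *ptr into *expected.
bool try_swap(int *ptr, int *expected, int desired) {
  return __atomic_compare_exchange_n(ptr, expected, desired,
                                     /*weak=*/false,
                                     __ATOMIC_SEQ_CST,   // success order
                                     __ATOMIC_RELAXED);  // failure order
}
```

With the strong variation, false is returned only when the comparison failed, so `desired` is never written in that case.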

Re: [RFC, PR target/65105] Use vector instructions for scalar 64bit computations on 32bit target

2015-09-29 Thread H.J. Lu

On Wed, Sep 23, 2015 at 3:29 AM, Uros Bizjak  wrote:
> On Wed, Sep 23, 2015 at 12:19 PM, Ilya Enkovich  wrote:
>> On 14 Sep 17:50, Uros Bizjak wrote:
>>>
>>> +(define_insn_and_split "*zext<mode>_doubleword"
>>> +  [(set (match_operand:DI 0 "register_operand" "=r")
>>> +	(zero_extend:DI (match_operand:SWI24 1 "nonimmediate_operand" "rm")))]
>>> +  "!TARGET_64BIT && TARGET_STV && TARGET_SSE2"
>>> +  "#"
>>> +  "&& reload_completed && GENERAL_REG_P (operands[0])"
>>> +  [(set (match_dup 0) (zero_extend:SI (match_dup 1)))
>>> +   (set (match_dup 2) (const_int 0))]
>>> +  "split_double_mode (DImode, &operands[0], 1, &operands[0], &operands[2]);")
>>> +
>>> +(define_insn_and_split "*zextqi_doubleword"
>>> +  [(set (match_operand:DI 0 "register_operand" "=r")
>>> +	(zero_extend:DI (match_operand:QI 1 "nonimmediate_operand" "qm")))]
>>> +  "!TARGET_64BIT && TARGET_STV && TARGET_SSE2"
>>> +  "#"
>>> +  "&& reload_completed && GENERAL_REG_P (operands[0])"
>>> +  [(set (match_dup 0) (zero_extend:SI (match_dup 1)))
>>> +   (set (match_dup 2) (const_int 0))]
>>> +  "split_double_mode (DImode, &operands[0], 1, &operands[0], &operands[2]);")
>>> +
>>>
>>> Please put the above patterns together with other zero_extend
>>> patterns. You can also merge these two patterns using SWI124 mode
>>> iterator with <r> mode attribute as a register constraint. Also, no
>>> need to check for GENERAL_REG_P after reload, when "r" constraint is
>>> in effect:
>>>
>>> (define_insn_and_split "*zext<mode>_doubleword"
>>>   [(set (match_operand:DI 0 "register_operand" "=r")
>>> 	(zero_extend:DI (match_operand:SWI124 1 "nonimmediate_operand" "<r>m")))]
>>>   "!TARGET_64BIT && TARGET_STV && TARGET_SSE2"
>>>   "#"
>>>   "&& reload_completed"
>>>   [(set (match_dup 0) (zero_extend:SI (match_dup 1)))
>>>    (set (match_dup 2) (const_int 0))]
>>>   "split_double_mode (DImode, &operands[0], 1, &operands[0], &operands[2]);")
>>
>> The register constraint doesn't affect the split, and I need GENERAL_REG_P
>> to filter out the other register cases.
>
> OK.
>
>> I merged QI and HI cases of zext but made a separate pattern for SI case
>> because it doesn't need zero_extend in resulting code.  Bootstrapped and
>> regtested for x86_64-unknown-linux-gnu.
>
> This change is OK.
>
> The patch LGTM, but please wait a couple of days if Jeff has some
> comment on algorithmic aspect of the patch.
>
> Thanks,
> Uros.
>
>>
>> Thanks,
>> Ilya
>> --
>> gcc/
>>
>> 2015-09-23  Ilya Enkovich  
>>
>> * config/i386/i386.c: Include dbgcnt.h.
>> (has_non_address_hard_reg): New.
>> (convertible_comparison_p): New.
>> (scalar_to_vector_candidate_p): New.
>> (remove_non_convertible_regs): New.
>> (scalar_chain): New.
>> (scalar_chain::scalar_chain): New.
>> (scalar_chain::~scalar_chain): New.
>> (scalar_chain::add_to_queue): New.
>> (scalar_chain::mark_dual_mode_def): New.
>> (scalar_chain::analyze_register_chain): New.
>> (scalar_chain::add_insn): New.
>> (scalar_chain::build): New.
>> (scalar_chain::compute_convert_gain): New.
>> (scalar_chain::replace_with_subreg): New.
>> (scalar_chain::replace_with_subreg_in_insn): New.
>> (scalar_chain::emit_conversion_insns): New.
>> (scalar_chain::make_vector_copies): New.
>> (scalar_chain::convert_reg): New.
>> (scalar_chain::convert_op): New.
>> (scalar_chain::convert_insn): New.
>> (scalar_chain::convert): New.
>> (convert_scalars_to_vector): New.
>> (pass_data_stv): New.
>> (pass_stv): New.
>> (make_pass_stv): New.
>> (ix86_option_override): Create and register stv pass.
>> (flag_opts): Add -mstv.
>> (ix86_option_override_internal): Likewise.
>> * config/i386/i386.md (SWIM1248x): New.
>> (*movdi_internal): Add xmm to mem alternative for TARGET_STV.
>> (and<mode>3): Use SWIM1248x iterator instead of SWIM.
>> (*anddi3_doubleword): New.
>> (*zext<mode>_doubleword): New.
>> (*zextsi_doubleword): New.
>> (<code><mode>3): Use SWIM1248x iterator instead of SWIM.
>> (*<code>di3_doubleword): New.
>> * config/i386/i386.opt (mstv): New.
>> * dbgcnt.def (stv_conversion): New.
>>

This caused:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67761



-- 
H.J.
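
As a plain C++ analogy for the zero-extension splits quoted above: on a 32-bit target a 64-bit value lives in two 32-bit words, so zero-extending a narrower unsigned value amounts to extending it into the low word and clearing the high word. The type and function names here are illustrative and do not correspond to anything in i386.md.

```cpp
#include <cstdint>

// A 64-bit value held as two 32-bit words, as on a 32-bit target.
struct word_pair {
  std::uint32_t lo;
  std::uint32_t hi;
};

// Zero-extend an unsigned 8-, 16- or 32-bit value to a double word:
// the low word receives the value, the high word is set to zero.
template <typename T>
word_pair zext_doubleword(T x) {
  word_pair r;
  r.lo = static_cast<std::uint32_t>(x);
  r.hi = 0;
  return r;
}

// Reassemble the two words, to check the result against a 64-bit value.
std::uint64_t join(word_pair p) {
  return (static_cast<std::uint64_t>(p.hi) << 32) | p.lo;
}
```

For a 32-bit input the low word is just a copy, which matches the remark that the SI case needs no zero_extend in the resulting code.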

[PATCH] Fix PR67170

2015-09-29 Thread Richard Biener


The following patch addresses PR67170 which shows we fail to disambiguate
INTENT(IN) variables against for example recursive calls.  The trick
in solving this is to notice that when a function has a fn spec
attribute that says memory reachable by a parameter is not modified
then that memory behaves as if it were readonly throughout the function
and thus it doesn't have a dependence on any other reference in that
function.

In the PR I prototyped a patch in the alias oracle itself but that's
too expensive (we need to find the index of a PARM_DECL).  Thus the
following patch implements that trick in the value-numbering machinery
instead.  Going with the alias oracle patch would still be possible
if we decide on caching the fn spec information in a place that is
O(1) accessible from relevant memory references (thus either on the
SSA default def or the PARM_DECL itself).

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

This improves a future important benchmark implementing a
Sudoku puzzle solver considerably (~10% on x86_64 IIRC).

Richard.

2015-09-29  Richard Biener  

PR tree-optimization/67170
* tree-ssa-alias.h (get_continuation_for_phi): Adjust
the translate function pointer parameter to get the
bool whether to disambiguate only by reference.
(walk_non_aliased_vuses): Likewise.
* tree-ssa-alias.c (maybe_skip_until): Adjust.
(get_continuation_for_phi_1): Likewise.
(get_continuation_for_phi): Likewise.
(walk_non_aliased_vuses): Likewise.
* tree-ssa-sccvn.c (const_parms): New bitmap.
(vn_reference_lookup_3): Adjust for interface change.
Disambiguate parameters pointing to readonly memory.
(free_scc_vn): Free const_parms.
(run_scc_vn): Initialize const_parms from a fn spec attribute.

* gfortran.dg/pr67170.f90: New testcase.

Index: gcc/tree-ssa-alias.c
===
*** gcc/tree-ssa-alias.c(revision 228230)
--- gcc/tree-ssa-alias.c(working copy)
*** static bool
*** 2442,2448 
  maybe_skip_until (gimple *phi, tree target, ao_ref *ref,
  tree vuse, unsigned int *cnt, bitmap *visited,
  bool abort_on_visited,
! void *(*translate)(ao_ref *, tree, void *, bool),
  void *data)
  {
basic_block bb = gimple_bb (phi);
--- 2442,2448 
  maybe_skip_until (gimple *phi, tree target, ao_ref *ref,
  tree vuse, unsigned int *cnt, bitmap *visited,
  bool abort_on_visited,
! void *(*translate)(ao_ref *, tree, void *, bool *),
  void *data)
  {
basic_block bb = gimple_bb (phi);
*** maybe_skip_until (gimple *phi, tree targ
*** 2477,2484 
  ++*cnt;
  if (stmt_may_clobber_ref_p_1 (def_stmt, ref))
{
  if (translate
! && (*translate) (ref, vuse, data, true) == NULL)
;
  else
return false;
--- 2477,2485 
  ++*cnt;
  if (stmt_may_clobber_ref_p_1 (def_stmt, ref))
{
+ bool disambiguate_only = true;
  if (translate
! && (*translate) (ref, vuse, data, &disambiguate_only) == NULL)
;
  else
return false;
*** static tree
*** 2505,2511 
  get_continuation_for_phi_1 (gimple *phi, tree arg0, tree arg1,
ao_ref *ref, unsigned int *cnt,
bitmap *visited, bool abort_on_visited,
!   void *(*translate)(ao_ref *, tree, void *, bool),
void *data)
  {
gimple *def0 = SSA_NAME_DEF_STMT (arg0);
--- 2506,2512 
  get_continuation_for_phi_1 (gimple *phi, tree arg0, tree arg1,
ao_ref *ref, unsigned int *cnt,
bitmap *visited, bool abort_on_visited,
!   void *(*translate)(ao_ref *, tree, void *, bool *),
void *data)
  {
gimple *def0 = SSA_NAME_DEF_STMT (arg0);
*** get_continuation_for_phi_1 (gimple *phi,
*** 2547,2559 
else if ((common_vuse = gimple_vuse (def0))
   && common_vuse == gimple_vuse (def1))
  {
*cnt += 2;
if ((!stmt_may_clobber_ref_p_1 (def0, ref)
   || (translate
!  && (*translate) (ref, arg0, data, true) == NULL))
  && (!stmt_may_clobber_ref_p_1 (def1, ref)
  || (translate
! && (*translate) (ref, arg1, data, true) == NULL)))
return common_vuse;
  }
  
--- 2548,2561 
else if ((common_vuse = gimple_vuse (def0))
   && common_vuse == gimple_vuse (def1))
  {
+   bool disambiguate_only = true;
*cnt += 2;
if ((!stmt_may_clobber_ref_p_1 (def0, ref)
   || (translate
!

[gomp4] fold acc_on_device

2015-09-29 Thread Nathan Sidwell


I've committed this patch to gomp4.

It removes acc_on_device handling from the oacc_xform pass, and moves it into 
the builtin folder.  I force the runtime version to be built with optimization 
and remove the expander too.


Expansion is rather later than I'm comfortable with, but until we have use cases 
where it causes a problem, this is fine.


Bernd, I'd managed to confuse myself last week -- compiling w/o optimization can 
generate a different set of rtl dumps than with optimization, so I ended up 
seeing some stale ones.


Will prepare trunk versions next ...

nathan
2015-09-29  Nathan Sidwell  

	gcc/
	* omp-low.c (oacc_xform_on_device): Delete.
	(oacc_xform_dim): Return bool.
	(execute_oacc_transform): Don't handle acc_on_device here.  Adjust
	rescan logic.
	* builtins.c (expand_builtin_acc_on_device): Delete.
	(expand_builtin): Do not call it.
	(fold_builtin_1): Fold acc_on_device.

	libgomp/
	* oacc-init.c (acc_on_device): Compile with optimization.
	* config/nvptx/oacc-init.c (acc_on_device): Compile with optimization.

Index: gcc/omp-low.c
===
--- gcc/omp-low.c	(revision 228215)
+++ gcc/omp-low.c	(working copy)
@@ -14719,45 +14719,10 @@ make_pass_late_lower_omp (gcc::context *
   return new pass_late_lower_omp (ctxt);
 }
 
-/* Transform an acc_on_device call.  OpenACC 2.0a requires this folded at
-   compile time for constant operands.  We always fold it.  In an
-   offloaded function we're never 'none'.  */
-
-static void
-oacc_xform_on_device (gcall *call)
-{
-  tree arg = gimple_call_arg (call, 0);
-  unsigned val = GOMP_DEVICE_HOST;
-	  
-#ifdef ACCEL_COMPILER
-  val = GOMP_DEVICE_NOT_HOST;
-#endif
-  tree result = build2 (EQ_EXPR, boolean_type_node, arg,
-			build_int_cst (integer_type_node, val));
-#ifdef ACCEL_COMPILER
-  {
-tree dev  = build2 (EQ_EXPR, boolean_type_node, arg,
-			build_int_cst (integer_type_node,
-   ACCEL_COMPILER_acc_device));
-result = build2 (TRUTH_OR_EXPR, boolean_type_node, result, dev);
-  }
-#endif
-  result = fold_convert (integer_type_node, result);
-  tree lhs = gimple_call_lhs (call);
-  gimple_seq seq = NULL;
-
-  push_gimplify_context (true);
-  gimplify_assign (lhs, result, &seq);
-  pop_gimplify_context (NULL);
-
-  gimple_stmt_iterator gsi = gsi_for_stmt (call);
-  gsi_replace_with_seq (&gsi, seq, false);
-}
-
 /* Transform oacc_dim_size and oacc_dim_pos internal function calls to
constants, where possible.  */
 
-static void
+static bool
 oacc_xform_dim (gcall *call, const int dims[], bool is_pos)
 {
   tree arg = gimple_call_arg (call, 0);
@@ -14766,13 +14731,13 @@ oacc_xform_dim (gcall *call, const int d
 
   if (!size)
 /* Dimension size is dynamic.  */
-return;
+return false;
   
   if (is_pos)
 {
   if (size != 1)
 	/* Size is more than 1, so POS might be non-zero.  */
-	return;
+	return false;
   size = 0;
 }
 
@@ -14783,6 +14748,7 @@ oacc_xform_dim (gcall *call, const int d
 
   gimple_stmt_iterator gsi = gsi_for_stmt (call);
   gsi_replace (&gsi, g, false);
+  return true;
 }
 
 /* Validate and update the dimensions for offloaded FN.  ATTRS is the
@@ -14868,64 +14834,57 @@ execute_oacc_transform ()
 for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi);)
   {
 	gimple *stmt = gsi_stmt (gsi);
-	int rescan = 0;
-	
 	if (!is_gimple_call (stmt))
 	  {
 	gsi_next (&gsi);
 	continue;
 	  }
 
+	gcall *call = as_a  (stmt);
+	if (!gimple_call_internal_p (call))
+	  {
+	gsi_next (&gsi);
+	continue;
+	  }
+
 	/* Rewind to allow rescan.  */
 	gsi_prev (&gsi);
+	int rescan = 0;
+	unsigned ifn_code = gimple_call_internal_fn (call);
 
-	gcall *call = as_a  (stmt);
-	
-	if (gimple_call_builtin_p (call, BUILT_IN_ACC_ON_DEVICE))
-	  /* acc_on_device must be evaluated at compile time for
-	 constant arguments.  */
+	switch (ifn_code)
 	  {
-	oacc_xform_on_device (call);
+	  default: break;
+
+	  case IFN_GOACC_DIM_POS:
+	  case IFN_GOACC_DIM_SIZE:
+	if (oacc_xform_dim (call, dims, ifn_code == IFN_GOACC_DIM_POS))
+	  rescan = 1;
+	break;
+
+	  case IFN_GOACC_REDUCTION_SETUP:
+	  case IFN_GOACC_REDUCTION_INIT:
+	  case IFN_GOACC_REDUCTION_FINI:
+	  case IFN_GOACC_REDUCTION_TEARDOWN:
+	/* Mark the function for SSA renaming.  */
+	mark_virtual_operands_for_renaming (cfun);
+	targetm.goacc.reduction (call);
 	rescan = 1;
+	break;
+
+	  case IFN_UNIQUE:
+	{
+	  unsigned code = TREE_INT_CST_LOW (gimple_call_arg (call, 0));
+
+	  if ((code == IFN_UNIQUE_OACC_FORK
+		   || code == IFN_UNIQUE_OACC_JOIN)
+		  && (targetm.goacc.fork_join
+		  (call, dims, code == IFN_UNIQUE_OACC_FORK)))
+		rescan = -1;
+	  break;
+	}
 	  }
-	else if (gimple_call_internal_p (call))
-	  {
-	unsigned ifn_code = gimple_call_internal_fn (call);
-	switch (ifn_code)
-	  {
-	  default: break;
-
-	  case IFN_GOACC_DIM_POS:
-	  case IFN_GOACC_DIM_SIZE:
-		oac

Re: [PATCH 1/3, libgomp] Adjust offload plugin interface for avoiding deadlock on exit

2015-09-29 Thread Chung-Lin Tang

On 2015/9/25 上午 04:27, Ilya Verbin wrote:
> On Thu, Aug 27, 2015 at 21:44:50 +0800, Chung-Lin Tang wrote:
>> We've discovered that, for several of the libgomp plugin interface routines,
>> if the target-specific routine calls exit() (usually upon a fatal condition),
>> deadlock ensues. We found this using nvptx, but it's possible on intelmic as
>> well.
>>
>> This is because many of the plugin routines are called with the device lock
>> held, and when exit() is called inside the plugin code, the
>> GOMP_unregister_var() destructor tries to iterate through and acquire all
>> device locks to clean up. Since we already hold one of the device locks, this
>> just gets stuck.  Also, because gomp_mutex_t is a simple futex-based lock
>> implementation (instead of pthreads), we don't have a trylock mechanism to
>> use either.
>>
>> So this patch tries to alleviate this problem by changing the plugin
>> interface; the plugin routines that are called while holding the device lock
>> are adjusted to never exit fatally, but to return a value back to libgomp
>> proper to indicate execution results. The core libgomp code may then unlock
>> and call gomp_fatal().
>>
>> We believe this is the right route to solve the problem, since there are only
>> two accel target plugins so far. Besides the nvptx plugin, I have made some
>> effort to update the intelmic plugin as well, though it's not as thoroughly
>> audited. Intel folks might want to further make sure your plugin code is free
>> of this problem as well.
>>
>> This patch contains the libgomp proper changes. The nvptx and intelmic
>> patches follow. I have tested the libgomp testsuite without regressions for
>> both accel targets; is this okay for trunk?
> 
> (I have no objections)
> 
> However, in case of intelmic, these exit()s are just the tip of the iceberg,
> because underlying liboffloadmic contains other exit()s at fatal errors.
> And I don't know what to do with such deadlocks.
> 
>   -- Ilya

Yes, I think I saw more things to adjust wrt this issue within liboffloadmic,
though I hope this plugin interface change can set things ready.
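
To make the interface change concrete, here is a minimal sketch of the
pattern, assuming a std::mutex in place of gomp_mutex_t.  The names device,
plugin_copy and core_copy are hypothetical and do not correspond to the
actual libgomp or plugin entry points; the real core code would call
gomp_fatal() where this sketch returns the message.

```cpp
#include <cassert>
#include <mutex>
#include <string>

// Hypothetical device descriptor; libgomp's real structure is different.
struct device
{
  std::mutex lock;
  bool broken = false;
};

// Hypothetical plugin routine, called with dev.lock held.  It reports
// failure through its return value and never calls exit () itself.
bool plugin_copy (device &dev, std::string &err)
{
  if (dev.broken)
    {
      err = "copy failed";
      return false;
    }
  return true;
}

// Caller in the core library: release the lock first, then report.
// The error text is returned here so the sketch can be exercised; a
// real implementation would raise a fatal error at this point instead.
std::string core_copy (device &dev)
{
  std::string err;
  bool ok;
  {
    std::lock_guard<std::mutex> guard (dev.lock);
    ok = plugin_copy (dev, err);
  }
  // The lock is no longer held, so cleanup code that wants every
  // device lock cannot block on this one.
  return ok ? std::string () : err;
}
```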

And ping again, for the libgomp proper changes.

Thanks,
Chung-Lin

Re: [PATCH 2/4 v2] bb-reorder: Add the "simple" algorithm

2015-09-29 Thread Bernd Schmidt


On 09/25/2015 04:16 PM, Segher Boessenkool wrote:

v2 changes:
- Add a file header comment;
- Use "for" loop initial declarations;
- Handle asm goto.

Testing this on x86_64-linux; okay if it succeeds?


No objections from me. Let's give Steven another day or so to comment.


Bernd

Re: [patch] Reduce space and time overhead of std::thread

2015-09-29 Thread Jonathan Wakely


On 23/09/15 17:18 +0100, Jonathan Wakely wrote:

For PR 65393 I avoided some unnecessary shared_ptr copies while
launching a std::thread. This goes further and avoids shared_ptr
entirely, using unique_ptr instead. This reduces the memory overhead
of a std::thread by 32 bytes (on 64-bit) and avoids any
reference-count updates.

The downside is it exports some new symbols, and we have to keep the
old code for backwards compatibility, but I think it's worth doing.

Does anybody disagree?


Tested powerpc64le-linux and x86_64-dragonfly4.1.

Committed to trunk.
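
As a rough illustration of the ownership model described above, the sketch
below passes a type-erased callable through a raw pointer and reclaims it
with unique_ptr on the other side.  It is not the libstdc++ code: state,
state_impl, make_state and thread_entry are names invented for this example,
and no thread is actually created.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

// Abstract base: all the thread entry point needs to know about.
struct state
{
  virtual ~state () = default;
  virtual void run () = 0;
};

// Concrete wrapper holding an arbitrary callable by value.
template<typename Callable>
struct state_impl : state
{
  Callable func;
  explicit state_impl (Callable f) : func (std::move (f)) { }
  void run () override { func (); }
};

// Build the wrapper; exactly one owner, no reference count.
template<typename Callable>
std::unique_ptr<state> make_state (Callable &&f)
{
  using impl = state_impl<typename std::decay<Callable>::type>;
  return std::unique_ptr<state> (new impl (std::forward<Callable> (f)));
}

// Entry point with the void* shape that pthread_create expects.  It
// takes ownership of the raw pointer and frees the state after running.
void *thread_entry (void *p)
{
  std::unique_ptr<state> s (static_cast<state *> (p));
  s->run ();
  return nullptr;
}
```

The launching side would call release() on the unique_ptr only once the
thread has been created successfully, so a failed launch still frees the
state.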



commit 2d7e89aae8ac12dd7a6b2083e5169679c1200cc5
Author: Jonathan Wakely 
Date:   Thu Mar 12 13:23:23 2015 +

   Reduce space and time overhead of std::thread
   
   	PR libstdc++/65393

* config/abi/pre/gnu.ver: Export new symbols.
* include/std/thread (thread::_State, thread::_State_impl): New types.
(thread::_M_start_thread): Add overload taking unique_ptr<_State>.
(thread::_M_make_routine): Remove.
(thread::_S_make_state): Add.
(thread::_Impl_base, thread::_Impl, thread::_M_start_thread)
[_GLIBCXX_THREAD_ABI_COMPAT] Only declare conditionally.
* src/c++11/thread.cc (execute_native_thread_routine): Rename to
execute_native_thread_routine_compat and re-define to use _State.
(thread::_State::~_State()): Define.
(thread::_M_make_thread): Define new overload.
(thread::_M_make_thread) [_GLIBCXX_THREAD_ABI_COMPAT]: Only define old
overloads conditionally.

diff --git a/libstdc++-v3/config/abi/pre/gnu.ver 
b/libstdc++-v3/config/abi/pre/gnu.ver
index d42cd37..08d9bc6 100644
--- a/libstdc++-v3/config/abi/pre/gnu.ver
+++ b/libstdc++-v3/config/abi/pre/gnu.ver
@@ -1870,6 +1870,11 @@ GLIBCXX_3.4.22 {
# std::uncaught_exceptions()
_ZSt19uncaught_exceptionsv;

+# std::thread::_State::~_State()
+_ZT[ISV]NSt6thread6_StateE;
+_ZNSt6thread6_StateD[012]Ev;
+
_ZNSt6thread15_M_start_threadESt10unique_ptrINS_6_StateESt14default_deleteIS1_EEPFvvE;
+
} GLIBCXX_3.4.21;

# Symbols in the support library (libsupc++) have their own tag.
diff --git a/libstdc++-v3/include/std/thread b/libstdc++-v3/include/std/thread
index ebbda62..c67ec46 100644
--- a/libstdc++-v3/include/std/thread
+++ b/libstdc++-v3/include/std/thread
@@ -60,9 +60,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  class thread
  {
  public:
+// Abstract base class for types that wrap arbitrary functors to be
+// invoked in the new thread of execution.
+struct _State
+{
+  virtual ~_State();
+  virtual void _M_run() = 0;
+};
+using _State_ptr = unique_ptr<_State>;
+
typedef __gthread_t native_handle_type;
-struct _Impl_base;
-typedef shared_ptr<_Impl_base>   __shared_base_type;

/// thread::id
class id
@@ -92,29 +99,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
operator<<(basic_ostream<_CharT, _Traits>& __out, thread::id __id);
};

-// Simple base type that the templatized, derived class containing
-// an arbitrary functor can be converted to and called.
-struct _Impl_base
-{
-  __shared_base_type   _M_this_ptr;
-
-  inline virtual ~_Impl_base();
-
-  virtual void _M_run() = 0;
-};
-
-template
-  struct _Impl : public _Impl_base
-  {
-   _Callable   _M_func;
-
-   _Impl(_Callable&& __f) : _M_func(std::forward<_Callable>(__f))
-   { }
-
-   void
-   _M_run() { _M_func(); }
-  };
-
  private:
id  _M_id;

@@ -133,16 +117,15 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  thread(_Callable&& __f, _Args&&... __args)
  {
#ifdef GTHR_ACTIVE_PROXY
-   // Create a reference to pthread_create, not just the gthr weak symbol
-_M_start_thread(_M_make_routine(std::__bind_simple(
-std::forward<_Callable>(__f),
-std::forward<_Args>(__args)...)),
-   reinterpret_cast(&pthread_create));
+   // Create a reference to pthread_create, not just the gthr weak symbol.
+   auto __depend = reinterpret_cast(&pthread_create);
#else
-_M_start_thread(_M_make_routine(std::__bind_simple(
-std::forward<_Callable>(__f),
-std::forward<_Args>(__args)...)));
+   auto __depend = nullptr;
#endif
+_M_start_thread(_S_make_state(
+ std::__bind_simple(std::forward<_Callable>(__f),
+std::forward<_Args>(__args)...)),
+   __depend);
  }

~thread()
@@ -190,23 +173,48 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
hardware_concurrency() noexcept;

  private:
+template
+  struct _State_impl : public _State
+  {
+   _Callable   _M_func;
+
+   _State_impl(_Callable&& __f) : _M_func(std::forward<_Callable>(__f))
+   { }
+
+   void
+   _M_run() { _M_func(); }
+  };
+
+void
+_M_start_thread(_State_ptr, void (*)());
+
+template
+  static _St

[PATCH] Fix PR67741

2015-09-29 Thread Richard Biener


Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2015-09-29  Richard Biener  

PR tree-optimization/67741
* tree-ssa-math-opts.c (pass_cse_sincos::execute): Only recognize
builtin calls with correct signature.

* gcc.dg/torture/pr67741.c: New testcase.

Index: gcc/tree-ssa-math-opts.c
===
*** gcc/tree-ssa-math-opts.c(revision 228115)
--- gcc/tree-ssa-math-opts.c(working copy)
*** pass_cse_sincos::execute (function *fun)
*** 1738,1752 
 of a basic block.  */
  cleanup_eh = false;
  
! if (is_gimple_call (stmt)
! && gimple_call_lhs (stmt)
! && (fndecl = gimple_call_fndecl (stmt))
! && DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL)
{
  tree arg, arg0, arg1, result;
  HOST_WIDE_INT n;
  location_t loc;
  
  switch (DECL_FUNCTION_CODE (fndecl))
{
CASE_FLT_FN (BUILT_IN_COS):
--- 1738,1751 
 of a basic block.  */
  cleanup_eh = false;
  
! if (gimple_call_builtin_p (stmt, BUILT_IN_NORMAL)
! && gimple_call_lhs (stmt))
{
  tree arg, arg0, arg1, result;
  HOST_WIDE_INT n;
  location_t loc;
  
+ fndecl = gimple_call_fndecl (stmt);
  switch (DECL_FUNCTION_CODE (fndecl))
{
CASE_FLT_FN (BUILT_IN_COS):
Index: gcc/testsuite/gcc.dg/torture/pr67741.c
===
*** gcc/testsuite/gcc.dg/torture/pr67741.c  (revision 0)
--- gcc/testsuite/gcc.dg/torture/pr67741.c  (working copy)
***
*** 0 
--- 1,13 
+ /* { dg-do compile } */
+ 
+ struct singlecomplex { float real, imag ; } ;
+ struct doublecomplex { double real, imag ; } ;
+ struct extendedcomplex { long double real, imag ; } ;
+ extern double cabs();
+ float cabsf(fc)
+  struct singlecomplex fc;  /* { dg-warning "doesn't match" } */
+ {
+   struct doublecomplex dc ;
+   dc.real=fc.real; dc.imag=fc.imag;
+   return (float) cabs(dc);
+ }

Re: [PATCH] Convert SPARC to LRA

2015-09-29 Thread Oleg Endo

On Mon, 2015-09-28 at 15:28 -0500, Segher Boessenkool wrote:

> We can at least change the default to LRA, so new ports get it unless
> they like to hurt themselves.
> 
> I don't think it makes sense to keep reload around *just* for the ports
> that are in "maintenance mode": by the time we are down to *just* those
> ports, it makes more sense to relabel them as "unmaintained".

Just for my understanding ... what's the definition of "maintenance
mode" or "unmaintained"?

Cheers,
Oleg

[patch] libstdc++/67583 Fix invalid sputn calls in tests

2015-09-29 Thread Jonathan Wakely


As the PR says, we're calling sputn with a string that is shorter than
the length we specify.

I'm not sure if the length was significant (I don't think so), but
rather than change that I extended the strings to that length.

Tested powerpc64le-linux, committed to trunk.
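
The safe pattern is to take the count from the string itself, so sputn can
never read past the end of the source.  A small sketch (put_all is a helper
name made up for this example, not something in the testsuite):

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Write STR to BUF, using the string's own size as the count.
std::streamsize put_all (std::stringbuf &buf, const std::string &str)
{
  return buf.sputn (str.data (), static_cast<std::streamsize> (str.size ()));
}
```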

commit 4f835652c412fbd0300c8b045a5836d116ff56c8
Author: Jonathan Wakely 
Date:   Tue Sep 29 14:14:15 2015 +0100

PR libstdc++/67583 Fix invalid sputn calls in tests

	PR libstdc++/67583
	* testsuite/27_io/basic_stringbuf/seekoff/char/1.cc: Fix sputn call
	with mismatched arguments.
	* testsuite/27_io/basic_stringbuf/seekoff/wchar_t/1.cc: Likewise.

diff --git a/libstdc++-v3/testsuite/27_io/basic_stringbuf/seekoff/char/1.cc b/libstdc++-v3/testsuite/27_io/basic_stringbuf/seekoff/char/1.cc
index ddc6d97..2cd7696 100644
--- a/libstdc++-v3/testsuite/27_io/basic_stringbuf/seekoff/char/1.cc
+++ b/libstdc++-v3/testsuite/27_io/basic_stringbuf/seekoff/char/1.cc
@@ -88,8 +88,10 @@ void test04()
   VERIFY( strmsz_2 != strmsz_1 );
   VERIFY( strmsz_2 == 1 );
   // end part three
+  str_tmp = " ravi shankar meets carlos santana in LoHa   ";
+  str_tmp += str_tmp;
   strmsz_1 = strb_01.str().size();
-  strmsz_2 = strb_01.sputn(" ravi shankar meets carlos santana in LoHa", 90);
+  strmsz_2 = strb_01.sputn(str_tmp.c_str(), str_tmp.size());
   strb_01.pubseekoff(0, std::ios_base::end);
   strb_01.sputc('<');
   str_tmp = strb_01.str();
diff --git a/libstdc++-v3/testsuite/27_io/basic_stringbuf/seekoff/wchar_t/1.cc b/libstdc++-v3/testsuite/27_io/basic_stringbuf/seekoff/wchar_t/1.cc
index 8678536..0dd0974 100644
--- a/libstdc++-v3/testsuite/27_io/basic_stringbuf/seekoff/wchar_t/1.cc
+++ b/libstdc++-v3/testsuite/27_io/basic_stringbuf/seekoff/wchar_t/1.cc
@@ -88,8 +88,10 @@ void test04()
   VERIFY( strmsz_2 != strmsz_1 );
   VERIFY( strmsz_2 == 1 );
   // end part three
+  str_tmp = L" ravi shankar meets carlos santana in LoHa   ";
+  str_tmp += str_tmp;
   strmsz_1 = strb_01.str().size();
-  strmsz_2 = strb_01.sputn(L" ravi shankar meets carlos santana in LoHa", 90);
+  strmsz_2 = strb_01.sputn(str_tmp.c_str(), str_tmp.size());
   strb_01.pubseekoff(0, std::ios_base::end);
   strb_01.sputc(L'<');
   str_tmp = strb_01.str();

Re: [PATCH] Convert SPARC to LRA

2015-09-29 Thread Jeff Law


On 09/29/2015 07:19 AM, Oleg Endo wrote:

On Mon, 2015-09-28 at 15:28 -0500, Segher Boessenkool wrote:


We can at least change the default to LRA, so new ports get it unless
they like to hurt themselves.

I don't think it makes sense to keep reload around *just* for the ports
that are in "maintenance mode": by the time we are down to *just* those
ports, it makes more sense to relabel them as "unmaintained".


Just for my understanding ... what's the definition of "maintenance
mode" or "unmaintained"?

I'm not sure there's any formal definition.

If the port isn't getting tested, bugs aren't getting fixed, it fails to
build, etc., then it's probably a good bet you could put it into the
unmaintained bucket.


If the port does get occasional fixes (primarily driven by BZs), but isn't
getting updated on a regular basis (such as conversion to LRA,
conversion to RTL prologue/epilogue, etc.), and is maybe only getting
occasional testing, then it's probably fair to call it in
maintenance mode.  A great example IMHO would be the m68k.


I would say we probably have many ports in maintenance mode right now. 
Not sure if any are in the unmaintained mode with perhaps the exception 
of interix.


jeff

Re: [PATCH] Convert SPARC to LRA

2015-09-29 Thread Jeff Law


On 09/28/2015 02:28 PM, Segher Boessenkool wrote:

On Mon, Sep 28, 2015 at 03:23:37PM -0400, Vladimir Makarov wrote:

There are more ports using reload than LRA now.  Even some major ports
(e.g. ppc64) did not switch to LRA.


There still are some failures in the testsuite (ICEs even) so we're
not there yet.


I usually tell target maintainers that if they don't switch to LRA they
will probably have problems with maintenance and development in the long
run.  New things are easier to implement in LRA.


It is also true that new *ports* are easier to do with LRA than with
reload :-)
Right.  And if we set the expectation that a new port must use LRA, then 
I think we're fine.




We can at least change the default to LRA, so new ports get it unless
they like to hurt themselves.

I don't think it makes sense to keep reload around *just* for the ports
that are in "maintenance mode": by the time we are down to *just* those
ports, it makes more sense to relabel them as "unmaintained".
FWIW, I tried to build a simple cc0 target with LRA (v850-elf), but it 
fell over pretty early.  Essentially LRA doesn't seem to be cc0-aware in 
split_reg as ultimately inserted something between a cc0-setter and 
cc0-user.  Oops.



jeff

Re: [gomp4] Remove erroneous test and unreachable situation.

2015-09-29 Thread James Norris


Hi,

The original patch still missed some situations (thanks Cesar!)
and the attached patch addresses those. It also adds some new
tests.

Jim

Index: libgomp/ChangeLog.gomp
===
--- libgomp/ChangeLog.gomp	(revision 228245)
+++ libgomp/ChangeLog.gomp	(working copy)
@@ -1,3 +1,7 @@
+2015-09-29  James Norris  
+
+	* testsuite/libgomp.oacc-fortran/routine-9.f90: New test.
+
 2015-09-29  Nathan Sidwell  
 
 	* oacc-init.c (acc_on_device): Compile with optimization.
Index: libgomp/testsuite/libgomp.oacc-fortran/routine-9.f90
===
--- libgomp/testsuite/libgomp.oacc-fortran/routine-9.f90	(revision 0)
+++ libgomp/testsuite/libgomp.oacc-fortran/routine-9.f90	(revision 0)
@@ -0,0 +1,31 @@
+! { dg-do run }
+! { dg-options "-fno-inline" }
+
+program main
+  implicit none
+  integer, parameter :: n = 10
+  integer :: a(n), i
+  integer, external :: fact
+  !$acc routine (fact)
+  !$acc parallel
+  !$acc loop
+  do i = 1, n
+ a(i) = fact (i)
+  end do
+  !$acc end parallel
+  do i = 1, n
+ if (a(i) .ne. fact(i)) call abort
+  end do
+end program main
+
+recursive function fact (x) result (res)
+  implicit none
+  !$acc routine (fact)
+  integer, intent(in) :: x
+  integer :: res
+  if (x < 1) then
+ res = 1
+  else
+ res = x * fact(x - 1)
+  end if
+end function fact
Index: gcc/testsuite/ChangeLog.gomp
===
--- gcc/testsuite/ChangeLog.gomp	(revision 228245)
+++ gcc/testsuite/ChangeLog.gomp	(working copy)
@@ -1,3 +1,7 @@
+2015-08-29  James Norris  
+
+	* gfortran.dg/goacc/routine-6.f90: New test.
+
 2015-09-29  Tom de Vries  
 
 	* c-c++-common/goacc/kernels-acc-loop-smaller-equal.c: New test.
Index: gcc/testsuite/gfortran.dg/goacc/routine-6.f90
===
--- gcc/testsuite/gfortran.dg/goacc/routine-6.f90	(revision 0)
+++ gcc/testsuite/gfortran.dg/goacc/routine-6.f90	(revision 0)
@@ -0,0 +1,79 @@
+
+module m
+  integer m1int
+contains
+  subroutine subr5 (x) 
+  implicit none
+  !$acc routine (subr5)
+  !$acc routine (m1int) ! { dg-error "invalid function name" }
+  integer, intent(inout) :: x
+  if (x < 1) then
+ x = 1
+  else
+ x = x * x - 1
+  end if
+  end subroutine subr5
+end module m
+
+program main
+  implicit none
+  interface
+function subr6 (x) 
+!$acc routine (subr6) ! { dg-error "without list is allowed in interface" }
+integer, intent (in) :: x
+integer :: subr6
+end function subr6
+  end interface
+  integer, parameter :: n = 10
+  integer :: a(n), i
+  !$acc routine (subr1) ! { dg-error "invalid function name" }
+  external :: subr2
+  !$acc routine (subr2)
+  !$acc parallel
+  !$acc loop
+  do i = 1, n
+ call subr1 (i)
+ call subr2 (i)
+  end do
+  !$acc end parallel
+end program main
+
+subroutine subr1 (x) 
+  !$acc routine
+  integer, intent(inout) :: x
+  if (x < 1) then
+ x = 1
+  else
+ x = x * x - 1
+  end if
+end subroutine subr1
+
+subroutine subr2 (x) 
+  !$acc routine (subr1) ! { dg-error "invalid function name" }
+  integer, intent(inout) :: x
+  if (x < 1) then
+ x = 1
+  else
+ x = x * x - 1
+  end if
+end subroutine subr2
+
+subroutine subr3 (x) 
+  !$acc routine (subr3)
+  integer, intent(inout) :: x
+  if (x < 1) then
+ x = 1
+  else
+ call subr4 (x)
+  end if
+end subroutine subr3
+
+subroutine subr4 (x) 
+  !$acc routine (subr4)
+  integer, intent(inout) :: x
+  if (x < 1) then
+ x = 1
+  else
+ x = x * x - 1
+  end if
+end subroutine subr4
Index: gcc/fortran/openmp.c
===
--- gcc/fortran/openmp.c	(revision 228245)
+++ gcc/fortran/openmp.c	(working copy)
@@ -1745,11 +1745,35 @@ gfc_match_oacc_routine (void)
 
   if (m == MATCH_YES)
 {
-  /* Scan for a function name/string.  */
-  m = gfc_match_symbol (&sym, 0);
+  char buffer[GFC_MAX_SYMBOL_LEN + 1];
+  gfc_symtree *st;
 
-  if (m == MATCH_NO)
+  m = gfc_match_name (buffer);
+  if (m == MATCH_YES)
 	{
+	  st = gfc_find_symtree (gfc_current_ns->sym_root, buffer);
+	  if (st)
+	{
+	  sym = st->n.sym;
+	  if (strcmp (sym->name, gfc_current_ns->proc_name->name) == 0)
+	sym = NULL;
+	}
+
+	  if (st == NULL
+	  || (sym
+		  && !sym->attr.external
+		  && !sym->attr.function
+		  && !sym->attr.subroutine))
+	{
+	  gfc_error ("Syntax error in !$ACC ROUTINE ( NAME ) at %C, "
+			 "invalid function name %s",
+			 (sym) ? sym->name : buffer);
+	  gfc_current_locus = old_loc;
+	  return MATCH_ERROR;
+	}
+	}
+  else
+{
 	  gfc_error ("Syntax error in !$ACC ROUTINE ( NAME ) at %C");
 	  gfc_current_locus = old_loc;
 	  return MATCH_ERROR;
@@ -1761,7 +1785,7 @@ gfc_match_oacc_routine (void)
 		 " ')' after NAME");
 	  gfc_current_locus = old_loc;
 	  return MATCH_ERROR;

[PATCH] x86 interrupt attribute

2015-09-29 Thread Yulia Koval

Hi,

The patch below implements the interrupt attribute for x86 processors.

The interrupt and exception handlers are called by x86 processors.
X86 hardware pushes information onto the stack and calls the handler.  The
requirements are:

1. Both interrupt and exception handlers must use the 'IRET'
instruction, instead of the 'RET' instruction, to return from the
handlers.
2. All registers are callee-saved in interrupt and exception handlers.
3. The difference between interrupt and exception handlers is that the
exception handler must pop 'ERROR_CODE' off the stack before the 'IRET'
instruction.

The design goals of interrupt and exception handlers for x86 processors
are:

1. Support both 32-bit and 64-bit modes.
2. Flexible for compilers to optimize.
3. Easy to use by programmers.

To implement interrupt and exception handlers for x86 processors, a
compiler should support:

'interrupt' attribute

Use this attribute to indicate that the specified function with
mandatory arguments is an interrupt or exception handler.  The
compiler generates function entry and exit sequences suitable for use
in an interrupt handler when this attribute is present.  The 'IRET'
instruction, instead of the 'RET' instruction, is used to return from
interrupt or exception handlers.  All registers, except for the EFLAGS
register which is restored by the 'IRET' instruction, are preserved by
the compiler.

Any interruptible-without-stack-switch code must be compiled with
-mno-red-zone since interrupt handlers can and will, because of the
hardware design, touch the red zone.

1. interrupt handler must be declared with a mandatory pointer argument:

struct interrupt_frame;

__attribute__ ((interrupt))
void
f (struct interrupt_frame *frame)
{
...
}

and the user must properly define the structure the pointer points to.

2. exception handler:

The exception handler is very similar to the interrupt handler with a
different mandatory function signature:

#ifdef __x86_64__
typedef unsigned long long int uword_t;
#else
typedef unsigned int uword_t;
#endif

struct interrupt_frame;

__attribute__ ((interrupt))
void
f (struct interrupt_frame *frame, uword_t error_code)
{
...
}

and the compiler pops the error code off the stack before the 'IRET'
instruction.

The exception handler should only be used for exceptions which push an
error code, and all other exceptions must use the interrupt handler.
The system will crash if the wrong handler is used.

Bootstrapped/regtested on Linux/x86_64 and Linux/i686.

Ok for trunk?



2015-09-29  Julia Koval
	    H.J. Lu

	PR target/67630
	PR target/67634
	* config/i386/i386-protos.h (ix86_interrupt_return_nregs): New.
	* config/i386/i386.c (ix86_frame): Add nbndregs and nmaskregs.
	(ix86_interrupt_return_nregs): New variable.
	(ix86_nsaved_bndregs): New function.
	(ix86_nsaved_maskregs): Likewise.
	(ix86_reg_save_area_size): Likewise.
	(ix86_nsaved_sseregs): Don't return 0 in interrupt handler.
	(ix86_compute_frame_layout): Set nbndregs and nmaskregs.  Set
	save_regs_using_mov to true to save bound and mask registers.
	Call ix86_reg_save_area_size to get register save area size.
	Allocate space to save full vector registers in interrupt handler.
	(ix86_emit_save_reg_using_mov): Set alignment to word_mode
	alignment when saving full vector registers in interrupt handler.
	(ix86_emit_save_regs_using_mov): Use regno_reg_rtx to get
	register size.
	(ix86_emit_restore_regs_using_mov): Likewise.
	(ix86_emit_save_sse_regs_using_mov): Save full vector registers in
	interrupt handler.
	(ix86_emit_restore_sse_regs_using_mov): Restore full vector
	registers in interrupt handler.
	(ix86_expand_epilogue): Use move to restore bound registers.
	* config/i386/sse.md (*mov_internal): Handle misaligned
	SSE load and store in interrupt handler.

	PR target/66960
	* config/i386/i386.c (ix86_conditional_register_usage): Set
	ix86_interrupt_return_nregs/
	(ix86_set_current_function): Set is_interrupt and is_exception.
	Mark arguments in interrupt handler as used.
	(ix86_function_ok_for_sibcall): Return false if in interrupt
	handler.
	(type_natural_mode): Don't warn ABI change for MMX in interrupt
	handler.
	(ix86_function_arg_advance): Skip for callee in interrupt
	handler.
	(ix86_function_arg): Handle arguments for callee in interrupt
	handler.
	(ix86_can_use_return_insn_p): Don't use `ret'

Re: OpenACC subarray data alignment in fortran

2015-09-29 Thread Cesar Philippidis

Ping.

In the meantime, I'll apply this patch to gomp-4_0-branch.

Cesar

On 09/22/2015 08:24 AM, Cesar Philippidis wrote:
> In both OpenACC and OpenMP, each subarray has at least two data mappings
> associated with it, one for the pointer and another for the data in
> the array section (Fortran also has a pset mapping). One problem I
> observed in Fortran is that array section data is cast to char *.
> Consequently, when lower_omp_target assigns alignment for the subarray
> data, it does so incorrectly. This is a problem on nvptx if you have a
> data clause such as
> 
>   integer foo
>   real*8 bar (100)
> 
>   !$acc data copy (foo, bar(1:100))
> 
> Here, the data associated with bar could get aligned on a 4 byte
> boundary instead of 8 byte. That causes problems on nvptx targets.
> 
> My fix for this is to prevent the Fortran front end from casting the
> data pointers to char *. I only prevented casting in the code which
> handles OMP_CLAUSE_MAP. The subarrays associated with OMP_CLAUSE_SHARED
> also get cast to char *, but I left those as-is because I'm not that
> familiar with how non-OpenMP target regions get lowered.
> 
> Is this patch OK for trunk?
> 
> Thanks,
> Cesar
>
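
The effect of losing the element type can be illustrated with plain offset
arithmetic.  This is only an analogy under the assumption that alignment is
derived from the pointed-to type; it does not reproduce what
lower_omp_target actually does, and align_up and place_after are names made
up for this sketch.

```cpp
#include <cassert>
#include <cstddef>

// Round OFFSET up to the next multiple of ALIGN (a power of two).
constexpr std::size_t align_up (std::size_t offset, std::size_t align)
{
  return (offset + align - 1) & ~(align - 1);
}

// Offset at which data of type T would be placed after PREV_BYTES of
// earlier data, if the alignment is taken from T.
template<typename T>
constexpr std::size_t place_after (std::size_t prev_bytes)
{
  return align_up (prev_bytes, alignof (T));
}
```

After a 4-byte integer, place_after<char> (4) stays at offset 4, while
place_after<double> (4) moves to a multiple of alignof (double).  Once the
data is typed as char, only the first answer is available.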

Re: [Patch, Fortran, 66927, v2] [6 Regression] ICE in gfc_conf_procedure_call

2015-09-29 Thread Andre Vehreschild

Hi Mikael, hi all,

sorry for the late reply, but I was a bit busy lately and the patch was
not as easy as expected. 

Mikael, I addressed your question about clarifying the comment and while
doing so the question arose "what happens when the function returns a
class object?" You have one guess; correct: ICE! This extended patch
now addresses the ICE and furthermore makes more consistent use of
the temporary created for the source= expression. I.e., when the
temporary is a class object, its vtab is more often retrieved from the
temporary and no longer generated from the gfc_expr's typespec.

To efficiently copy - in the class/derived cases - the data, I had to
open up the gfc_copy_class_to_class() routine a little bit, in that
it accepts the destination object to be a BT_DERIVED, too.

I provide two testcases now and had to fix class_array_15, which was
expecting one too many calls to __builtin_free. With this patch the
creation of an unnecessary temporary object is prevented, which in
consequence leads to one fewer call to __builtin_free to free the
allocatable component of the temporary object.

Bootstraps and regtests ok on x86_64-linux-gnu/f21.

Ok for trunk?

Regards,
Andre

On Sun, 9 Aug 2015 14:37:03 +0200
Mikael Morin  wrote:

> Le 06/08/2015 14:00, Mikael Morin a écrit :
> > Let me have a look at it.
> >
> So, I've had a look at it.
> This is a Pandora's box that I don't want to open.
> So your change is OK.
> However, could you clarify the comment?
> Function calls returning a class object are either pointer or 
> allocatable, so they don't call gfc_conv_expr_descriptor already, they 
> aren't an exception...
> 
> Mikael


-- 
Andre Vehreschild * Email: vehre ad gmx dot de 


diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c
index a6b761b..504b08a 100644
--- a/gcc/fortran/trans-array.c
+++ b/gcc/fortran/trans-array.c
@@ -3222,7 +3222,7 @@ build_array_ref (tree desc, tree offset, tree decl, tree vptr)
 {
   type = gfc_get_element_type (type);
   tmp = TREE_OPERAND (cdecl, 0);
-  tmp = gfc_get_class_array_ref (offset, tmp);
+  tmp = gfc_get_class_array_ref (offset, tmp, NULL_TREE);
   tmp = fold_convert (build_pointer_type (type), tmp);
   tmp = build_fold_indirect_ref_loc (input_location, tmp);
   return tmp;
@@ -7079,9 +7079,20 @@ gfc_conv_expr_descriptor (gfc_se *se, gfc_expr *expr)
 	}
 	  else if (GFC_ARRAY_TYPE_P (TREE_TYPE (desc)) || se->use_offset)
 	{
+	  bool toonebased;
 	  tmp = gfc_conv_array_lbound (desc, n);
+	  toonebased = integer_onep (tmp);
+	  // lb(arr) - from (- start + 1)
 	  tmp = fold_build2_loc (input_location, MINUS_EXPR,
  TREE_TYPE (base), tmp, from);
+	  if (onebased && toonebased)
+		{
+		  tmp = fold_build2_loc (input_location, MINUS_EXPR,
+	 TREE_TYPE (base), tmp, start);
+		  tmp = fold_build2_loc (input_location, PLUS_EXPR,
+	 TREE_TYPE (base), tmp,
+	 gfc_index_one_node);
+		}
 	  tmp = fold_build2_loc (input_location, MULT_EXPR,
  TREE_TYPE (base), tmp,
  gfc_conv_array_stride (desc, n));
@@ -7155,12 +7166,13 @@ gfc_conv_expr_descriptor (gfc_se *se, gfc_expr *expr)
   /* For class arrays add the class tree into the saved descriptor to
  enable getting of _vptr and the like.  */
   if (expr->expr_type == EXPR_VARIABLE && VAR_P (desc)
-  && IS_CLASS_ARRAY (expr->symtree->n.sym)
-  && DECL_LANG_SPECIFIC (expr->symtree->n.sym->backend_decl))
+  && IS_CLASS_ARRAY (expr->symtree->n.sym))
 {
   gfc_allocate_lang_decl (desc);
   GFC_DECL_SAVED_DESCRIPTOR (desc) =
-	  GFC_DECL_SAVED_DESCRIPTOR (expr->symtree->n.sym->backend_decl);
+	  DECL_LANG_SPECIFIC (expr->symtree->n.sym->backend_decl) ?
+	GFC_DECL_SAVED_DESCRIPTOR (expr->symtree->n.sym->backend_decl)
+	  : expr->symtree->n.sym->backend_decl;
 }
   if (!se->direct_byref || se->byref_noassign)
 {
diff --git a/gcc/fortran/trans-expr.c b/gcc/fortran/trans-expr.c
index e086fe3..90b5140 100644
--- a/gcc/fortran/trans-expr.c
+++ b/gcc/fortran/trans-expr.c
@@ -1039,9 +1039,10 @@ gfc_conv_class_to_class (gfc_se *parmse, gfc_expr *e, gfc_typespec class_ts,
of the referenced element.  */
 
 tree
-gfc_get_class_array_ref (tree index, tree class_decl)
+gfc_get_class_array_ref (tree index, tree class_decl, tree data_comp)
 {
-  tree data = gfc_class_data_get (class_decl);
+  tree data = data_comp != NULL_TREE ? data_comp :
+   gfc_class_data_get (class_decl);
   tree size = gfc_class_vtab_size_get (class_decl);
   tree offset = fold_build2_loc (input_location, MULT_EXPR,
  gfc_array_index_type,
@@ -1075,6 +1076,7 @@ gfc_copy_class_to_class (tree from, tree to, tree nelems, bool unlimited)
   tree stdcopy;
   tree extcopy;
   tree index;
+  bool is_from_desc = false, is_to_class = false;
 
   args = NULL;
   /* To prevent warnings on uninitialized variables.  */
@@ -1088,7 +1090,19 @@ gfc_copy_class

Re: [PATCH] Convert SPARC to LRA

2015-09-29 Thread Richard Biener

On Tue, Sep 29, 2015 at 3:39 PM, Jeff Law  wrote:
> On 09/29/2015 07:19 AM, Oleg Endo wrote:
>>
>> On Mon, 2015-09-28 at 15:28 -0500, Segher Boessenkool wrote:
>>
>>> We can at least change the default to LRA, so new ports get it unless
>>> they like to hurt themselves.
>>>
>>> I don't think it makes sense to keep reload around *just* for the ports
>>> that are in "maintenance mode": by the time we are down to *just* those
>>> ports, it makes more sense to relabel them as "unmaintained".
>>
>>
>> Just for my understanding ... what's the definition of "maintenance
>> mode" or "unmaintained"?
>
> I'm not sure there's any formal definition.
>
> If the port isn't getting tested, bugs aren't getting fixed, fails to build,
> etc then it's probably a good bet you could put it into the unmaintained
> bucket.
>
> If the port does get occasional fixes (primarily driven by BZs), but not
> getting updated on a regular basis (such as conversion to LRA, conversion to
> RTL prologue/epilogue, etc), may be only getting occasional testing, etc.
> Then it's probably fair to call it in maintenance mode.  A great example
> IMHO would be the m68k.

Another criterion would be available hardware, for which both the PA and
alpha ports are good examples.  When you can't buy new hardware, targets that
could formerly host GCC quickly rot to the state where only cross-compilation
is viable (and having an "old" GCC is good enough).

> I would say we probably have many ports in maintenance mode right now. Not
> sure if any are in the unmaintained mode with perhaps the exception of
> interix.

I'd say that all ports not in maintenance mode should be at least secondary
archs, as we can expect maintainers to be around to keep them at the quality
level we expect for secondary targets.  Now I'd like to draw the opposite
conclusion and declare all non-primary/secondary targets as in
maintenance mode ... ;)
We have 49 targets (counting directories) and 7 of them compose the list of
primary and secondary triplets.

Richard.

> jeff

Re: [PATCH] Clear variables with stale SSA_NAME_RANGE_INFO (PR tree-optimization/67690)

2015-09-29 Thread Marek Polacek

On Fri, Sep 25, 2015 at 06:22:44PM +0200, Richard Biener wrote:
> On September 25, 2015 3:49:34 PM GMT+02:00, Marek Polacek 
>  wrote:
> >On Fri, Sep 25, 2015 at 09:29:30AM +0200, Richard Biener wrote:
> >> On Thu, 24 Sep 2015, Marek Polacek wrote:
> >> 
> >> > As Richi said in
> >,
> >> > using recorded SSA name range infos in VRP is likely to expose
> >errors in the
> >> > ranges.  This PR is such a case.  As discussed in the PR, after
> >tail merging
> >> > via PRE the range infos cannot be relied upon anymore, so we need
> >to clear
> >> > them.
> >> > 
> >> > Since tree-ssa-ifcombine.c already had code to clean up the flow
> >data in a BB,
> >> > I've factored it out to a common function.
> >> > 
> >> > Bootstrapped/regtested on x86_64-linux, ok for trunk and 5?
> >> 
> >> I believe for tail-merge you also need to clear range info on
> >> PHI defs in the BB.  For ifcombine this wasn't necessary (no PHI
> >nodes
> >> in the relevant CFG), but it's ok to extend the new 
> >> reset_flow_sensitive_info_in_bb function to also reset PHI defs.
> >
> >All right.
> > 
> >> Ok with that change.
> >
> >Since I'm not completely sure if I did the right thing here, could you
> >please have another look at the new function?
> 
> Doesn't work that way.  You need to iterate over the PHI sequence separately 
> via gsi_start_phis(bb), etc.

Oops, sorry.  So like this?

Bootstrapped/regtested on x86_64-linux, ok for trunk (and a similar
patch for 5)?

2015-09-29  Marek Polacek  

PR tree-optimization/67690
* tree-ssa-ifcombine.c (pass_tree_ifcombine::execute): Call
reset_flow_sensitive_info_in_bb.
* tree-ssa-tail-merge.c (replace_block_by): Likewise.
* tree-ssanames.c: Include "gimple-iterator.h".
(reset_flow_sensitive_info_in_bb): New function.
* tree-ssanames.h (reset_flow_sensitive_info_in_bb): Declare.

* gcc.dg/torture/pr67690.c: New test.

diff --git gcc/testsuite/gcc.dg/torture/pr67690.c gcc/testsuite/gcc.dg/torture/pr67690.c
index e69de29..491de51 100644
--- gcc/testsuite/gcc.dg/torture/pr67690.c
+++ gcc/testsuite/gcc.dg/torture/pr67690.c
@@ -0,0 +1,32 @@
+/* { dg-do run } */
+
+const int c1 = 1;
+const int c2 = 2;
+
+int
+check (int i)
+{
+  int j;
+  if (i >= 0)
+    j = c2 - i;
+  else
+    j = c2 - i;
+  return c2 - c1 + 1 > j;
+}
+
+int invoke (int *pi) __attribute__ ((noinline,noclone));
+int
+invoke (int *pi)
+{
+  return check (*pi);
+}
+
+int
+main ()
+{
+  int i = c1;
+  int ret = invoke (&i);
+  if (!ret)
+    __builtin_abort ();
+  return 0;
+}
diff --git gcc/tree-ssa-ifcombine.c gcc/tree-ssa-ifcombine.c
index 9f04174..66be430 100644
--- gcc/tree-ssa-ifcombine.c
+++ gcc/tree-ssa-ifcombine.c
@@ -769,16 +769,7 @@ pass_tree_ifcombine::execute (function *fun)
  {
/* Clear range info from all stmts in BB which is now executed
   conditional on a always true/false condition.  */
-   for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
-!gsi_end_p (gsi); gsi_next (&gsi))
- {
-   gimple *stmt = gsi_stmt (gsi);
-   ssa_op_iter i;
-   tree op;
-   FOR_EACH_SSA_TREE_OPERAND (op, stmt, i, SSA_OP_DEF)
- reset_flow_sensitive_info (op);
- }
-
+   reset_flow_sensitive_info_in_bb (bb);
cfg_changed |= true;
  }
 }
diff --git gcc/tree-ssa-tail-merge.c gcc/tree-ssa-tail-merge.c
index 0ce59e8..487961e 100644
--- gcc/tree-ssa-tail-merge.c
+++ gcc/tree-ssa-tail-merge.c
@@ -1534,6 +1534,10 @@ replace_block_by (basic_block bb1, basic_block bb2)
   e2->probability = GCOV_COMPUTE_SCALE (e2->count, out_sum);
 }
 
+  /* Clear range info from all stmts in BB2 -- this transformation
+ could make them out of date.  */
+  reset_flow_sensitive_info_in_bb (bb2);
+
   /* Do updates that use bb1, before deleting bb1.  */
   release_last_vdef (bb1);
   same_succ_flush_bb (bb1);
diff --git gcc/tree-ssanames.c gcc/tree-ssanames.c
index 4199290..7235dc3 100644
--- gcc/tree-ssanames.c
+++ gcc/tree-ssanames.c
@@ -23,6 +23,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "backend.h"
 #include "tree.h"
 #include "gimple.h"
+#include "gimple-iterator.h"
 #include "hard-reg-set.h"
 #include "ssa.h"
 #include "alias.h"
@@ -544,6 +545,29 @@ reset_flow_sensitive_info (tree name)
 SSA_NAME_RANGE_INFO (name) = NULL;
 }
 
+/* Clear all flow sensitive data from all statements and PHI definitions
+   in BB.  */
+
+void
+reset_flow_sensitive_info_in_bb (basic_block bb)
+{
+  for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi);
+       gsi_next (&gsi))
+    {
+      gimple *stmt = gsi_stmt (gsi);
+      ssa_op_iter i;
+      tree op;
+      FOR_EACH_SSA_TREE_OPERAND (op, stmt, i, SSA_OP_DEF)
+        reset_flow_sensitive_info (op);
+    }
+
+  for (gphi_iterator gsi = gsi_start_phis (bb); !gsi_end_p (gsi);
+   gsi_

Re: patch to fix PR66424

2015-09-29 Thread Matthias Klose

This was marked as a regression in 5 and 6, but never backported to the 
gcc-5-branch. Is it time to backport?


Matthias

On 21.07.2015 21:54, Vladimir Makarov wrote:

   The following patch fixes

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66424

   The patch was tested and bootstrapped on x86/x86-64.

   Committed as rev. 226053.

2015-07-21  Vladimir Makarov  

 PR ipa/66424.
 * lra-remat.c (operand_to_remat): Prevent using insns with input
 subregs processed separately by IRA.

2015-07-21  Vladimir Makarov  

 PR ipa/66424.
 * gcc.target/i386/pr66424.c: New.

Re: [PATCH] Convert SPARC to LRA

2015-09-29 Thread Peter Bergner

On Mon, 2015-09-28 at 15:28 -0500, Segher Boessenkool wrote:
> On Mon, Sep 28, 2015 at 03:23:37PM -0400, Vladimir Makarov wrote:
> > There are more ports using reload than LRA now.  Even some major ports 
> > (e.g. ppc64) did not switch to LRA.
> 
> There still are some failures in the testsuite (ICEs even) so we're
> not there yet.

I've started looking through the failures with a target of getting POWER
converted to LRA before the switch to stage3.  From a quick scan, I see what
looks like two different ICEs on multiple tests and one wrong code gen issue.

The first ICE seems to be due to a conversion to long double, and LRA ends
up going into an infinite loop spilling things until it hits a threshold and
quits with an ICE.  I haven't spent enough time to determine whether this
is an LRA or port issue yet though.  The simplest test case I have at the
moment is:

bergner@genoa:~/gcc/BUGS/LRA/20011123-1$ cat bug2.i
void
foo (long double *ldb1, double *db1)
{
  *ldb1 = *db1;
}
bergner@genoa:~/gcc/BUGS/LRA/20011123-1$ /home/bergner/gcc/build/gcc-fsf-mainline-bootstrap-lra-default-debug/gcc/xgcc -B/home/bergner/gcc/build/gcc-fsf-mainline-bootstrap-lra-default-debug/gcc/ -S -O1 -mvsx -S bug2.i
bug2.i: In function ‘foo’:
bug2.i:5:1: internal compiler error: Max. number of generated reload insns per insn is achieved (90)

 }
 ^
0x10962903 lra_constraints(bool)
/home/bergner/gcc/gcc-fsf-mainline-bootstrap-lra-default/gcc/lra-constraints.c:4351
0x10942af7 lra(_IO_FILE*)
/home/bergner/gcc/gcc-fsf-mainline-bootstrap-lra-default/gcc/lra.c:2298
0x108c0ac7 do_reload
/home/bergner/gcc/gcc-fsf-mainline-bootstrap-lra-default/gcc/ira.c:5391
0x108c1183 execute
/home/bergner/gcc/gcc-fsf-mainline-bootstrap-lra-default/gcc/ira.c:5562


After IRA, things are pretty simple, with just the following one insn which
needs a reload/spill, since we don't have memory to memory ops on POWER:

(insn 7 4 10 2 (parallel [
            (set (mem:TF (reg:DI 3 3 [ ldb1 ]) [0 *ldb1_5(D)+0 S16 A128])
                (float_extend:TF (mem:DF (reg:DI 4 4 [ db1 ]) [0 *db1_2(D)+0 S8 A64])))
            (use (const_double:DF 0.0 [0x0.0p+0]))
        ]) bug2.i:4 445 {*extenddftf2_internal}
     (expr_list:REG_DEAD (reg:DI 4 4 [ db1 ])
        (expr_list:REG_DEAD (reg:DI 3 3 [ ldb1 ])
            (nil))))

LRA then comes along and gives us the following, which looks good:

(insn 7 4 11 2 (parallel [
            (set (reg:TF 159)
                (float_extend:TF (mem:DF (reg:DI 4 4 [ db1 ]) [0 *db1_2(D)+0 S8 A64])))
            (use (const_double:DF 0.0 [0x0.0p+0]))
        ]) bug2.i:4 445 {*extenddftf2_internal}
     (expr_list:REG_DEAD (reg:DI 4 4 [ db1 ])
        (expr_list:REG_DEAD (reg:DI 3 3 [ ldb1 ])
            (nil))))

(insn 11 7 10 2 (set (mem:TF (reg:DI 3 3 [ ldb1 ]) [0 *ldb1_5(D)+0 S16 A128])
(reg:TF 159)) bug2.i:4 435 {*movtf_64bit_dm}
 (nil))

but for some reason, it thinks reg 159 needs reloading and gives us:

(insn 7 4 12 2 (parallel [
            (set (reg:TF 159)
                (float_extend:TF (mem:DF (reg:DI 4 4 [ db1 ]) [0 *db1_2(D)+0 S8 A64])))
            (use (const_double:DF 0.0 [0x0.0p+0]))
        ]) bug2.i:4 445 {*extenddftf2_internal}
     (expr_list:REG_DEAD (reg:DI 4 4 [ db1 ])
        (expr_list:REG_DEAD (reg:DI 3 3 [ ldb1 ])
            (nil))))

(insn 12 7 11 2 (set (reg:TF 160 [159])
(reg:TF 159)) bug2.i:4 435 {*movtf_64bit_dm}
 (nil))

(insn 11 12 10 2 (set (mem:TF (reg:DI 3 3 [ ldb1 ]) [0 *ldb1_5(D)+0 S16 A128])
(reg:TF 160 [159])) bug2.i:4 435 {*movtf_64bit_dm}
 (nil))

and we end up doing it again and again and...until we hit the reload threshold
and ICE.  That's as far as I've gotten at this point.  Comments welcome since
I've had to put this on the shelf at the moment while working on next year's
work schedule for our team.

I haven't had a chance to look into the other ICE or wrong code gen issue yet,
but will eventually get to those.

Peter

Re: [Patch ifcvt costs 0/3] Introduce a new target hook for ifcvt costs.

2015-09-29 Thread James Greenhalgh

On Tue, Sep 29, 2015 at 11:16:37AM +0100, Richard Biener wrote:
> On Fri, Sep 25, 2015 at 5:04 PM, James Greenhalgh
>  wrote:
> > Hi,
> >
> > In relation to the patch I put up for review a few weeks ago to teach
> > RTL if-convert to handle multiple sets in a basic block [1], I was
> > asking about a sensible cost model to use. There was some consensus at
> > Cauldron that what should be done in this situation is to introduce a
> > target hook that delegates answering the question to the target.
> 
> Err - the consensus was to _not_ add gazillion of special target hooks
> but instead enhance what we have with rtx_cost so that passes can
> rely on comparing before and after costs of a sequence of insns.

Ah, I was not able to attend Cauldron this year, so I was trying to pick out
"consensus" from the video. Rewatching it now, I see a better phrase would
be "suggestion with some support".

Watching the video a second time, it seems your proposal is that we improve
the RTX costs infrastructure to handle sequences of Gimple/RTX. That would
get us some way to making a smart decision in if-convert, but I'm not
convinced it allows us to answer the question we are interested in.

We have the rtx for before and after, and we can generate costs for these
sequences. This allows us to calculate some weighted cost of the
instructions based on the calculated probabilities that each block is
executed. However, we are missing information on how expensive the branch
is, and we have no way to get that through an RTX-costs infrastructure.

We could add a hook to give a cost in COSTS_N_INSNS units to a branch based
on its predictability. This is difficult as COSTS_N_INSNS units can differ
depending on whether you are talking about floating-point or integer code.
By this I mean, the compiler considers a SET which costs more than
COSTS_N_INSNS (1) to be "expensive". Consequently, some targets set the cost
of an integer SET and of a floating-point SET both to COSTS_N_INSNS (1).
In reality, these instructions may have different latency performance
characteristics. What real world quantity are we trying to invoke when we
say a branch costs the same as 3 SET instructions of any type? It certainly
isn't mispredict penalty (likely measured in cycles, not relative to the cost
of a SET instruction, which may well be completely free on modern x86
processors), nor is it the cost of executing the branch instruction which
is often constant to resolve regardless of predicted/mispredicted status.

On the other side of the equation, we want a cost for the converted
sequence. We can build a cost of the generated rtl sequence, but for
targets like AArch64 this is going to be wildly off. AArch64 will expand
(a > b) ? x : y; as a set to the CC register, followed by a conditional
move based on the CC register. Consequently, where we have multiple sets
back to back we end up with:

  set CC (a > b)
  set x1 (CC ? x : y)
  set CC (a > b)
  set x2 (CC ? x : z)
  set CC (a > b)
  set x3 (CC ? x : k)

Which we know will be simplified later to:

  set CC (a > b)
  set x1 (CC ? x : y)
  set x2 (CC ? x : z)
  set x3 (CC ? x : k)

I imagine other targets have something similar in their expansion of
movcc (though I haven't looked).

Our comparison for if-conversion then must be:

  weighted_old_cost = (taken_probability * (then_bb_cost)
                       + (1 - taken_probability) * (else_bb_cost));
  branch_cost = branch_cost_in_insns (taken_probability)
  weighted_new_cost = redundancy_factor (new_sequence) * seq_cost (new_sequence)

  profitable = weighted_new_cost <= weighted_old_cost + branch_cost

And we must define:

  branch_cost_in_insns (taken_probability)
  redundancy_factor (new_sequence)

At that point, I feel you are better giving the entire sequence to the
target and asking it to implement whatever logic is needed to return a
profitable/unprofitable analysis of the transformation.
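
To make the comparison concrete, here is a minimal sketch of it in C++. It is only an illustration: the struct, its field names and any numbers fed to it are hypothetical, the two would-be hooks are modelled as plain data rather than target callbacks, and the old cost is taken to be the probability-weighted sum of the two blocks' costs.

```cpp
// Hypothetical inputs to the if-conversion profitability question.
// None of these names exist in GCC; they only mirror the pseudocode above.
struct IfConvertCosts {
  double taken_probability;  // probability that the "then" block executes
  double then_bb_cost;       // cost of the "then" block
  double else_bb_cost;       // cost of the "else" block
  double branch_cost;        // stand-in for branch_cost_in_insns (taken_probability)
  double new_seq_cost;       // stand-in for seq_cost (new_sequence)
  double redundancy_factor;  // stand-in for redundancy_factor (new_sequence)
};

// Expected cost of the original, branchy code (excluding the branch itself).
inline double weighted_old_cost(const IfConvertCosts& c) {
  return c.taken_probability * c.then_bb_cost
         + (1.0 - c.taken_probability) * c.else_bb_cost;
}

// Cost of the converted sequence, scaled down by the share of it that is
// expected to survive later simplification.
inline double weighted_new_cost(const IfConvertCosts& c) {
  return c.redundancy_factor * c.new_seq_cost;
}

// The conversion is considered profitable when the new sequence is no more
// expensive than the old blocks plus the branch that is removed.
inline bool profitable(const IfConvertCosts& c) {
  return weighted_new_cost(c) <= weighted_old_cost(c) + c.branch_cost;
}
```

Note how the verdict flips purely on branch_cost and redundancy_factor, the two quantities that are hard to define outside the target.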

The "redundancy_factor" in particular is pretty tough to define in a way
which makes sense outside of if_convert, without adding some pretty
detailed analysis to decide what might or might not be eliminated by
later passes. The alternative is to weight the other side of the equation
by tuning the cost of branch_cost_in_insns high. This only serves to increase
the disconnect between a real-world cost and a number to tweak to game
code generation.

If you have a different way of phrasing the if-conversion question that
avoids the two very specific hooks, I'd be happy to try taking the patches
in that direction. I don't see a way to implement this as just queries to
a costing function which does not need substantial target and pass
dependent tweaking to make behave correctly.

Thanks,
James

> > This patch series introduces that new target hook to provide cost
> > decisions for the RTL ifcvt pass.
> >
> > The idea is to give the target full visibility of the proposed
> > transformation, and allow it to respond as to whether if-conversion in that
> > way is p

[patch] Leave errno unchanged by successful std::stoi etc

2015-09-29 Thread Jonathan Wakely


We set errno=0 in __gnu_cxx::__stoa in order to reliably detect when
it gets set to ERANGE. This restores the previous value when the
conversion is successful.

Tested powerpc64le-linux, committed to trunk.
commit 412f75dc37b1048e14996c9caafa46c00db8eb30
Author: Jonathan Wakely 
Date:   Tue Sep 29 15:09:23 2015 +0100

Leave errno unchanged by successful std::stoi etc

	* include/ext/string_conversions.h (__stoa): Save and restore errno.
	* testsuite/21_strings/basic_string/numeric_conversions/char/errno.cc:
	New.

diff --git a/libstdc++-v3/include/ext/string_conversions.h b/libstdc++-v3/include/ext/string_conversions.h
index f4648a8..58387a2 100644
--- a/libstdc++-v3/include/ext/string_conversions.h
+++ b/libstdc++-v3/include/ext/string_conversions.h
@@ -58,6 +58,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   _Ret __ret;
 
   _CharT* __endptr;
+  const int __saved_errno = errno;
   errno = 0;
   const _TRet __tmp = __convf(__str, &__endptr, __base...);
 
@@ -70,6 +71,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	std::__throw_out_of_range(__name);
   else
 	__ret = __tmp;
+  errno = __saved_errno;
 
   if (__idx)
 	*__idx = __endptr - __str;
diff --git a/libstdc++-v3/testsuite/21_strings/basic_string/numeric_conversions/char/errno.cc b/libstdc++-v3/testsuite/21_strings/basic_string/numeric_conversions/char/errno.cc
new file mode 100644
index 0000000..4079744
--- /dev/null
+++ b/libstdc++-v3/testsuite/21_strings/basic_string/numeric_conversions/char/errno.cc
@@ -0,0 +1,36 @@
+// Copyright (C) 2015 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+// { dg-options "-std=gnu++11" }
+// { dg-require-string-conversions "" }
+
+#include <string>
+#include <testsuite_hooks.h>
+
+void
+test01()
+{
+  errno = ERANGE;
+  std::stoi("42");
+  VERIFY( errno == ERANGE ); // errno should not be altered by successful call
+}
+
+int
+main()
+{
+  test01();
+}

Re: [PATCH] Clarify __atomic_compare_exchange_n docs

2015-09-29 Thread Sandra Loosemore


On 09/29/2015 06:00 AM, Jonathan Wakely wrote:

Someone on IRC incorrectly parsed the docs at
https://gcc.gnu.org/onlinedocs/gcc-5.2.0/gcc/_005f_005fatomic-Builtins.html#index-g_t_005f_005fatomic_005fcompare_005fexchange_005fn-3536

as:

  IF
  (
   desired is written into *ptr
   AND
   the execution is considered to conform to the memory model
   specified by success_memmodel.
  )
  {
   true is returned
  }
  otherwise ...

rather than the intended:

  IF ( desired is written into *ptr )
  {
   true is returned
   AND
   the execution is considered to conform to the memory model
   specified by success_memmodel.
  }
  otherwise ...
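
The intended reading can be checked with a small sketch. The wrapper name cas_strong is made up for illustration; the builtin and the memory-order macros are the ones documented on the page linked above.

```cpp
// Strong compare-exchange on an int, sequentially consistent on both the
// success and the failure path.  Returns true exactly when `desired` was
// written into *ptr; otherwise *expected is updated with the current value.
inline bool cas_strong(int* ptr, int* expected, int desired) {
  return __atomic_compare_exchange_n(ptr, expected, desired, /*weak=*/false,
                                     __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}
```

With the strong form, a false return means *ptr did not match *expected, and *ptr is left untouched.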

So they asked:


What is otherwise, here? Can I make the function return false even
when 'desired' has been written into 'ptr'? How do I do it? I could
not write an example, so far.


This patch rewords it to avoid the ambiguity.

I've also replaced the rather clunky "the operation is considered to
conform to" phrasing. (It's only _considered_ to? So does it or doesn't
it use that memory order?) Instead I've used the terminology from the
C and C++ standards, which say "memory is affected according to".

OK for trunk?


This is OK, as far as it goes, but while we're at it, can we do 
something to fix the description of the weak parameter?



@@ -9353,17 +9353,17 @@ This compares the contents of @code{*@var{ptr}} with the contents of
 @code{*@var{expected}}. If equal, the operation is a @emph{read-modify-write}
 operation that writes @var{desired} into @code{*@var{ptr}}.  If they are not
 equal, the operation is a @emph{read} and the current contents of
-@code{*@var{ptr}} is written into @code{*@var{expected}}.  @var{weak} is true
+@code{*@var{ptr}} are written into @code{*@var{expected}}.  @var{weak} is true
 for weak compare_exchange, and false for the strong variation.  Many targets
 only offer the strong variation and ignore the parameter.  When in doubt, use
 the strong variation.


What is "weak compare_exchange", and what is "the strong variation", and 
how do they differ in terms of behavior?


-Sandra
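
For reference, in the C and C++ standards' terminology a weak compare-exchange may fail spuriously, that is, return false even though *ptr equals *expected, while the strong form fails only when the two values differ. Weak is therefore normally used inside a retry loop. A minimal sketch, with a hypothetical helper name:

```cpp
// Atomically add `delta` to *ptr using a weak compare-exchange retry loop
// and return the value *ptr held before the addition.
inline int fetch_add_weak(int* ptr, int delta) {
  int expected = __atomic_load_n(ptr, __ATOMIC_RELAXED);
  while (!__atomic_compare_exchange_n(ptr, &expected, expected + delta,
                                      /*weak=*/true,
                                      __ATOMIC_SEQ_CST, __ATOMIC_RELAXED)) {
    // On failure, genuine or spurious, `expected` has been refreshed with
    // the current contents of *ptr, so the loop simply tries again.
  }
  return expected;
}
```

Because the loop retries anyway, a spurious failure costs one extra iteration and nothing else.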

[gomp4] Rename oacc_transform pass

2015-09-29 Thread Nathan Sidwell

I've committed this to gomp4 branch.  It renames the oacc_transform pass to 
oacc_device_lower, in line with the (now withdrawn) patch for mainline.


I'm preparing a version of the pass for mainline with a different initial use 
than acc_on_device folding.


nathan
2015-09-29  Nathan Sidwell  
	Cesar Philippidis  

	* passes.def: Rename pass_oacc_transform to pass_oacc_device_lower.
	* tree-pass.h (make_pass_oacc_transform): Rename to ...
	(make_pass_oacc_device_lower): ... here.
	* doc/invoke.texi (oaccdevlow): Document tree dump flag.
	* omp-low.c (execute_oacc_transform): Rename to ...
	(execute_oacc_device_lower): ... here.
	(pass_data pass_data_oacc_transform): Rename to ...
	(pass_data pass_data_oacc_device_lower): ... here. Adjust name.
	(class pass_oacc_transform): Rename to ...
	(class pass_oacc_device_lower): ... here.
	(make_pass_oacc_transform): Rename to ...
	(make_pass_oacc_device_lower): ... here.

Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi	(revision 228241)
+++ gcc/doc/invoke.texi	(working copy)
@@ -1,3 +1,4 @@
+
 @c Copyright (C) 1988-2015 Free Software Foundation, Inc.
 @c This is part of the GCC manual.
 @c For copying conditions, see the file gcc.texi.
@@ -332,6 +333,7 @@ Objective-C and Objective-C++ Dialects}.
 -fdump-passes @gol
 -fdump-statistics @gol
 -fdump-tree-all @gol
+-fdump-tree-accdevlow @gol
 -fdump-tree-original@r{[}-@var{n}@r{]}  @gol
 -fdump-tree-optimized@r{[}-@var{n}@r{]} @gol
 -fdump-tree-cfg -fdump-tree-alias @gol
@@ -7246,6 +7248,11 @@ is made by appending @file{.slp} to the
 Dump each function after Value Range Propagation (VRP).  The file name
 is made by appending @file{.vrp} to the source file name.
 
+@item oaccdevlow
+@opindex fdump-tree-oaccdevlow
+Dump each function after applying device-specific OpenACC transformations.
+The file name is made by appending @file{.oaccdevlow} to the source file name.
+
 @item all
 @opindex fdump-tree-all
 Enable all the available tree dumps with the flags provided in this option.
Index: gcc/passes.def
===
--- gcc/passes.def	(revision 228241)
+++ gcc/passes.def	(working copy)
@@ -164,7 +164,7 @@ along with GCC; see the file COPYING3.
   INSERT_PASSES_AFTER (all_passes)
   NEXT_PASS (pass_fixup_cfg);
   NEXT_PASS (pass_lower_eh_dispatch);
-  NEXT_PASS (pass_oacc_transform);
+  NEXT_PASS (pass_oacc_device_lower);
   NEXT_PASS (pass_all_optimizations);
   PUSH_INSERT_PASSES_WITHIN (pass_all_optimizations)
   NEXT_PASS (pass_remove_cgraph_callee_edges);
Index: gcc/tree-pass.h
===
--- gcc/tree-pass.h	(revision 228241)
+++ gcc/tree-pass.h	(working copy)
@@ -411,7 +411,7 @@ extern gimple_opt_pass *make_pass_late_l
 extern gimple_opt_pass *make_pass_diagnose_omp_blocks (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_expand_omp (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_expand_omp_ssa (gcc::context *ctxt);
-extern gimple_opt_pass *make_pass_oacc_transform (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_oacc_device_lower (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_object_sizes (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_strlen (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_fold_builtins (gcc::context *ctxt);
Index: gcc/omp-low.c
===
--- gcc/omp-low.c	(revision 228241)
+++ gcc/omp-low.c	(working copy)
@@ -14836,7 +14836,7 @@ oacc_validate_dims (tree fn, tree attrs,
point (including the host fallback).  */
 
 static unsigned int
-execute_oacc_transform ()
+execute_oacc_device_lower ()
 {
   tree attrs = get_oacc_fn_attrib (current_function_decl);
   int dims[GOMP_DIM_MAX];
@@ -15036,10 +15036,10 @@ default_goacc_reduction (gcall *call)
 
 namespace {
 
-const pass_data pass_data_oacc_transform =
+const pass_data pass_data_oacc_device_lower =
 {
   GIMPLE_PASS, /* type */
-  "fold_oacc_transform", /* name */
+  "accdevlow", /* name */
   OPTGROUP_NONE, /* optinfo_flags */
   TV_NONE, /* tv_id */
   PROP_cfg, /* properties_required */
@@ -15049,11 +15049,11 @@ const pass_data pass_data_oacc_transform
   TODO_update_ssa | TODO_cleanup_cfg, /* todo_flags_finish */
 };
 
-class pass_oacc_transform : public gimple_opt_pass
+class pass_oacc_device_lower : public gimple_opt_pass
 {
 public:
-  pass_oacc_transform (gcc::context *ctxt)
-: gimple_opt_pass (pass_data_oacc_transform, ctxt)
+  pass_oacc_device_lower (gcc::context *ctxt)
+: gimple_opt_pass (pass_data_oacc_device_lower, ctxt)
   {}
 
   /* opt_pass methods: */
@@ -15064,17 +15064,17 @@ public:
   if (!gate)
 	return 0;
 
-  return execute_oacc_transform ();
+  return execute_oacc_device_lower ();
 }
 
-}; // class pass_oacc_transform
+}; // class pass_oacc_device_lower
 
 } // anon namespace
 
 gimple_opt_pass *
-make_pas

Re: [patch] Leave errno unchanged by successful std::stoi etc

2015-09-29 Thread Jakub Jelinek

On Tue, Sep 29, 2015 at 04:15:41PM +0100, Jonathan Wakely wrote:
> We set errno=0 in __gnu_cxx::__stoa in order to reliably detect when
> it gets set to ERANGE. This restores the previous value when the
> conversion is successful.
> 
> Tested powerpc64le-linux, committed to trunk.

> commit 412f75dc37b1048e14996c9caafa46c00db8eb30
> Author: Jonathan Wakely 
> Date:   Tue Sep 29 15:09:23 2015 +0100
> 
> Leave errno unchanged by successful std::stoi etc
> 
>   * include/ext/string_conversions.h (__stoa): Save and restore errno.
>   * testsuite/21_strings/basic_string/numeric_conversions/char/errno.cc:
>   New.
> 
> diff --git a/libstdc++-v3/include/ext/string_conversions.h 
> b/libstdc++-v3/include/ext/string_conversions.h
> index f4648a8..58387a2 100644
> --- a/libstdc++-v3/include/ext/string_conversions.h
> +++ b/libstdc++-v3/include/ext/string_conversions.h
> @@ -58,6 +58,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>_Ret __ret;
>  
>_CharT* __endptr;
> +  const int __saved_errno = errno;
>errno = 0;
>const _TRet __tmp = __convf(__str, &__endptr, __base...);
>  
> @@ -70,6 +71,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>   std::__throw_out_of_range(__name);
>else
>   __ret = __tmp;
> +  errno = __saved_errno;

That looks wrong to me, you only restore errno if you don't throw :(.
If you throw, then errno might remain 0, which is IMHO undesirable.
So, I'd say you want to restore it earlier, right after __convf, and
immediately before that copy the current errno to some other temporary
for the use in the condition?  Or restore errno = __saved_errno;
in all the 3 spots instead of just one.

Jakub
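
A sketch of the first alternative described above, outside libstdc++: copy the conversion's errno into a temporary, restore the caller's value immediately, and only then decide whether to throw. The helper name is hypothetical and std::strtol stands in for __convf; this is not the __stoa code.

```cpp
#include <cerrno>
#include <cstdlib>
#include <stdexcept>

// Parse a long, leaving errno as the caller had it on every exit path,
// including the throwing ones.
inline long parse_long_preserving_errno(const char* str, int base = 10) {
  const int saved_errno = errno;  // remember the caller's errno
  errno = 0;                      // so that ERANGE can be detected reliably
  char* end = nullptr;
  const long value = std::strtol(str, &end, base);
  const int conv_errno = errno;   // result of the conversion itself
  errno = saved_errno;            // restore before any throw

  if (end == str)
    throw std::invalid_argument("parse_long_preserving_errno");
  if (conv_errno == ERANGE)
    throw std::out_of_range("parse_long_preserving_errno");
  return value;
}
```

Whether errno should instead remain ERANGE when out_of_range is thrown is a separate design choice; this sketch restores it unconditionally.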

[C PATCH] Fix missing warning (PR c/67730)

2015-09-29 Thread Marek Polacek

This fixes missing warning for the attached testcase.  In such a case,
we must use the expansion point location.  I didn't simply add
  loc = expansion_point_location_if_in_system_header (loc);
as might be seen elsewhere in the codebase because we pass LOC down to
convert_for_assignment where many of the warnings are issued and I was
nervous about passing a different location there.

Bootstrapped/regtested on x86_64-linux, ok for trunk and 5?

2015-09-29  Marek Polacek  

PR c/67730
* c-typeck.c (c_finish_return): Use the expansion point location for
certain "return with value" warnings.

* gcc.dg/pr67730.c: New test.

diff --git gcc/c/c-typeck.c gcc/c/c-typeck.c
index 3b26231..a11ccb2 100644
--- gcc/c/c-typeck.c
+++ gcc/c/c-typeck.c
@@ -9369,8 +9369,12 @@ c_finish_return (location_t loc, tree retval, tree origtype)
   bool npc = false;
   size_t rank = 0;
 
+  /* Use the expansion point to handle cases such as returning NULL
+ in a function returning void.  */
+  source_location xloc = expansion_point_location_if_in_system_header (loc);
+
   if (TREE_THIS_VOLATILE (current_function_decl))
-    warning_at (loc, 0,
+    warning_at (xloc, 0,
                "function declared %<noreturn%> has a %<return%> statement");
 
   if (flag_cilkplus && contains_array_notation_expr (retval))
@@ -9425,10 +9429,10 @@ c_finish_return (location_t loc, tree retval, tree origtype)
 {
   current_function_returns_null = 1;
   if (TREE_CODE (TREE_TYPE (retval)) != VOID_TYPE)
-   pedwarn (loc, 0,
+   pedwarn (xloc, 0,
  "%<return%> with a value, in function returning void");
   else
-   pedwarn (loc, OPT_Wpedantic, "ISO C forbids "
+   pedwarn (xloc, OPT_Wpedantic, "ISO C forbids "
  "%<return%> with expression, in function returning void");
 }
   else
diff --git gcc/testsuite/gcc.dg/pr67730.c gcc/testsuite/gcc.dg/pr67730.c
index e69de29..54d73a6 100644
--- gcc/testsuite/gcc.dg/pr67730.c
+++ gcc/testsuite/gcc.dg/pr67730.c
@@ -0,0 +1,11 @@
+/* PR c/67730 */
+/* { dg-do compile } */
+/* { dg-options "" } */
+
+#include <stddef.h>
+
+void
+fn1 (void)
+{
+  return NULL; /* { dg-warning "10:.return. with a value" } */
+}

Marek

Re: [Patch,optimization]: Optimized changes in the estimate register pressure cost.

2015-09-29 Thread Pat Haugen


On 09/25/2015 11:51 PM, Ajit Kumar Agarwal wrote:

I have made the following changes in the estimate_reg_pressure_cost
function used by the loop invariant and IVOPTS passes.

Earlier, estimate_reg_pressure_cost used the cost of the n_new variables
generated by loop invariant motion and IVOPTS. That is not sufficient for
the register pressure calculation, which should use n_new + n_old to
compute the cost. n_old is the number of registers used inside the loop,
and the effect of the n_new new variables on register pressure depends on
how they interact with those registers. The register-register move cost
or the spill cost should therefore account for both the registers already
in use and the new variables generated. Moving new variables increases or
decreases the register pressure, and that change is based on the overall
cost of n_new + n_old variables.

Thus the register pressure caused by the new variables depends on the new
variables and their impact on the registers used inside the loop, and the
cost should consider n_new + n_old overall.
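
To make the difference concrete, here is a simplified model of the two cost schemes as the message describes them. It is a sketch only: the `Target` struct, its fields and the thresholds are assumptions for illustration, and the actual patch is not shown in this thread.

```cpp
#include <cassert>

// Hypothetical target parameters; real values come from the target.
struct Target {
  unsigned available_regs;  // registers usable inside the loop
  unsigned reserved_regs;   // head-room kept free
  unsigned move_cost;       // cost of a register-register move
  unsigned spill_cost;      // cost of a spill
};

// Cost scaled by the new variables only (the scheme described as "earlier").
inline unsigned cost_new_only(const Target& t, unsigned n_new, unsigned n_old) {
  const unsigned needed = n_new + n_old;
  if (needed + t.reserved_regs <= t.available_regs)
    return 0;
  if (needed <= t.available_regs)
    return t.move_cost * n_new;
  return t.spill_cost * n_new;
}

// Cost scaled by n_new + n_old (the scheme the message proposes).
inline unsigned cost_new_plus_old(const Target& t, unsigned n_new,
                                  unsigned n_old) {
  const unsigned needed = n_new + n_old;
  if (needed + t.reserved_regs <= t.available_regs)
    return 0;
  if (needed <= t.available_regs)
    return t.move_cost * needed;
  return t.spill_cost * needed;
}
```

With `Target{8, 2, 1, 4}`, two new variables and eight registers already in use, the first scheme charges 8 and the second charges 40, so the second penalises loops that are already under pressure much more heavily.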

Bootstrap for i386 and reg tested on i386 with the change is fine.

SPEC CPU 2000 benchmarks were run, with the following impact on
performance and code size.

Ratio with the optimization vs ratio without the optimization for INT benchmarks:
(3807.632 vs 3804.661)

Ratio with the optimization vs ratio without the optimization for FP benchmarks:
(4668.743 vs 4778.741)

Code size reduction with respect to FP SPEC CPU 2000 benchmarks:

Number of instructions with optimization = 1094117
Number of instructions without optimization = 1094659

Reduction in number of instructions with the optimization = 542 instructions.

I tried your patch on powerpc64le using CPU2006. There was a small 
degradation in mcf (-1.5%) and small improvement in bwaves (+1.3%), the 
remaining benchmarks (and overall results) were neutral.


-Pat

Re: [patch] Leave errno unchanged by successful std::stoi etc

2015-09-29 Thread Jonathan Wakely


On 29/09/15 17:25 +0200, Jakub Jelinek wrote:

On Tue, Sep 29, 2015 at 04:15:41PM +0100, Jonathan Wakely wrote:

We set errno=0 in __gnu_cxx::__stoa in order to reliably detect when
it gets set to ERANGE. This restores the previous value when the
conversion is successful.

Tested powerpc64le-linux, committed to trunk.



commit 412f75dc37b1048e14996c9caafa46c00db8eb30
Author: Jonathan Wakely 
Date:   Tue Sep 29 15:09:23 2015 +0100

Leave errno unchanged by successful std::stoi etc

* include/ext/string_conversions.h (__stoa): Save and restore errno.
* testsuite/21_strings/basic_string/numeric_conversions/char/errno.cc:
New.

diff --git a/libstdc++-v3/include/ext/string_conversions.h b/libstdc++-v3/include/ext/string_conversions.h
index f4648a8..58387a2 100644
--- a/libstdc++-v3/include/ext/string_conversions.h
+++ b/libstdc++-v3/include/ext/string_conversions.h
@@ -58,6 +58,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   _Ret __ret;

   _CharT* __endptr;
+  const int __saved_errno = errno;
   errno = 0;
   const _TRet __tmp = __convf(__str, &__endptr, __base...);

@@ -70,6 +71,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
std::__throw_out_of_range(__name);
   else
__ret = __tmp;
+  errno = __saved_errno;


That looks wrong to me, you only restore errno if you don't throw :(.
If you throw, then errno might remain 0, which is IMHO undesirable.


My thinking was that a failed conversion that throws an exception
should be allowed to modify errno, and that the second case sets it to
ERANGE sometimes anyway.

But I suppose it would be better to consistently set it to non-zero
when an exception is thrown, or consistently restore the original
value in all cases.


So, I'd say you want to restore it earlier, right after __convf, and
immediately before that copy the current errno to some other temporary
for the use in the condition?  Or restore errno = __saved_errno;
in all the 3 spots instead of just one.
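
The first of those options can be sketched as follows. This is a minimal illustration, not the libstdc++ code: `parse_long` is a hypothetical stand-in for `__stoa`, and `std::strtol` stands in for `__convf`.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <stdexcept>

// Copy the conversion's errno to a temporary and restore the caller's errno
// immediately, before any of the exit paths.
inline long parse_long(const char* str, int base = 10) {
  const int saved_errno = errno;  // the caller's errno
  errno = 0;                      // so that ERANGE can be detected reliably
  char* end = nullptr;
  const long value = std::strtol(str, &end, base);
  const int conv_errno = errno;   // what the conversion reported
  errno = saved_errno;            // restored for return and for both throws

  if (end == str)
    throw std::invalid_argument("parse_long");
  if (conv_errno == ERANGE)
    throw std::out_of_range("parse_long");
  return value;
}
```

The caller's errno survives a successful conversion as well as both exceptions, at the price of one extra local variable.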


Or in a destructor so it happens however we exit the function, like
this ...


diff --git a/libstdc++-v3/include/ext/string_conversions.h b/libstdc++-v3/include/ext/string_conversions.h
index 58387a2..3b62c9a 100644
--- a/libstdc++-v3/include/ext/string_conversions.h
+++ b/libstdc++-v3/include/ext/string_conversions.h
@@ -58,8 +58,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   _Ret __ret;
 
   _CharT* __endptr;
-  const int __saved_errno = errno;
-  errno = 0;
+
+  struct _Restore_errno {
+	  _Restore_errno() : _M_errno(errno) { errno = 0; }
+	  ~_Restore_errno() { errno = _M_errno; }
+	  int _M_errno;
+  } const __restore;
+
   const _TRet __tmp = __convf(__str, &__endptr, __base...);
 
   if (__endptr == __str)
@@ -71,7 +76,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	std::__throw_out_of_range(__name);
   else
 	__ret = __tmp;
-  errno = __saved_errno;
 
   if (__idx)
 	*__idx = __endptr - __str;
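
The same pattern can be tried outside of libstdc++. The sketch below follows the shape of `_Restore_errno` from the patch, but `ErrnoGuard` and `parse_long_guarded` are hypothetical names and `std::strtol` stands in for `__convf`.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <stdexcept>

// RAII guard: saves errno and clears it on construction, restores the saved
// value on destruction, whichever way the enclosing scope is left.
struct ErrnoGuard {
  ErrnoGuard() : saved(errno) { errno = 0; }
  ~ErrnoGuard() { errno = saved; }
  ErrnoGuard(const ErrnoGuard&) = delete;
  ErrnoGuard& operator=(const ErrnoGuard&) = delete;
  int saved;
};

inline long parse_long_guarded(const char* str, int base = 10) {
  const ErrnoGuard guard;  // errno is 0 from here until the function exits
  char* end = nullptr;
  const long value = std::strtol(str, &end, base);
  if (end == str)
    throw std::invalid_argument("parse_long_guarded");
  if (errno == ERANGE)
    throw std::out_of_range("parse_long_guarded");
  return value;  // the guard restores errno on return and during unwinding
}
```

Because the restore lives in a destructor, no exit path can forget it, which is the point made above about it happening "however we exit the function".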

[PATCH] Fix undefined behaviour in msp430 port

2015-09-29 Thread Jeff Law



Similar to the fixes from the weekend.  Avoiding left shifts of negative 
signed values in the obvious way.


Tested by building msp430 targets from config-list.mk.

Installed on the trunk.

Jeff
commit 679cec5bd2f9ca9c6dabff89d0103790d560c0cb
Author: Jeff Law 
Date:   Mon Sep 28 19:24:56 2015 -0400

[PATCH] Fix undefined behaviour in msp430 port

   * config/msp430/msp430.c (msp430_legitimate_constant): Fix undefined
left shift behaviour.
* config/msp430/constraints.md ('L' constraint): Similarly.
('Ys' constraint): Similarly.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 03f566c..1b9985a 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,10 @@
+2015-09-29  Jeff Law  
+
+   * config/msp430/msp430.c (msp430_legitimate_constant): Fix undefined
+   left shift behaviour.
+   * config/msp430/constraints.md ('L' constraint): Similarly.
+   ('Ys' constraint): Similarly.
+
 2015-09-29  Richard Biener  
 
PR tree-optimization/67170
diff --git a/gcc/config/msp430/constraints.md b/gcc/config/msp430/constraints.md
index 30f944c..dfda152 100644
--- a/gcc/config/msp430/constraints.md
+++ b/gcc/config/msp430/constraints.md
@@ -32,7 +32,7 @@
 (define_constraint "L"
   "Integer constant -1^20..1^19."
   (and (match_code "const_int")
-   (match_test "IN_RANGE (ival, -1 << 20, 1 << 19)")))
+   (match_test "IN_RANGE (ival, HOST_WIDE_INT_M1U << 20, 1 << 19)")))
 
 (define_constraint "M"
   "Integer constant 1-4."
@@ -77,7 +77,7 @@
(and (match_code "plus" "0")
 (and (match_code "reg" "00")
  (match_test ("CONST_INT_P (XEXP (XEXP (op, 0), 1))"))
- (match_test ("IN_RANGE (INTVAL (XEXP (XEXP (op, 0), 1)), -1 << 15, (1 << 15)-1)"
+ (match_test ("IN_RANGE (INTVAL (XEXP (XEXP (op, 0), 1)), HOST_WIDE_INT_M1U << 15, (1 << 15)-1)"
(match_code "reg" "0")
)))
 
diff --git a/gcc/config/msp430/msp430.c b/gcc/config/msp430/msp430.c
index d2308cb..ba8d862 100644
--- a/gcc/config/msp430/msp430.c
+++ b/gcc/config/msp430/msp430.c
@@ -998,7 +998,7 @@ msp430_legitimate_constant (machine_mode mode, rtx x)
 /* GCC does not know the width of the PSImode, so make
sure that it does not try to use a constant value that
is out of range.  */
-|| (INTVAL (x) < (1 << 20) && INTVAL (x) >= (-1 << 20));
+|| (INTVAL (x) < (1 << 20) && INTVAL (x) >= (HOST_WIDE_INT)(HOST_WIDE_INT_M1U << 20));
 }
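
For readers who want to see the idiom in isolation: left-shifting a negative signed value is undefined in C and in C++ before C++20, so the lower bound is built by shifting an unsigned all-ones value and converting afterwards. The sketch below assumes that HOST_WIDE_INT_M1U is an unsigned all-ones value of HOST_WIDE_INT width and models it with `std::uint64_t`; `neg_pow2` and `in_range` are hypothetical helpers, not GCC functions.

```cpp
#include <cassert>
#include <cstdint>

// -(2^n) without shifting a negative signed value: shift an unsigned
// all-ones pattern, then convert. The conversion back to signed is modular
// on GCC and other two's complement targets, and guaranteed from C++20 on.
// Requires n < 64.
constexpr std::int64_t neg_pow2(unsigned n) {
  return static_cast<std::int64_t>(~std::uint64_t{0} << n);
}

// Inclusive range check, similar in spirit to GCC's IN_RANGE macro.
constexpr bool in_range(std::int64_t v, std::int64_t lo, std::int64_t hi) {
  return lo <= v && v <= hi;
}

static_assert(neg_pow2(20) == -1048576, "lower bound is -(2^20)");
```

For the 'L' constraint this gives the same bounds as before, for example `in_range(v, neg_pow2(20), 1 << 19)`, with no undefined shift involved.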

[PATCH] remove dead code of commutative_reductions

2015-09-29 Thread Sebastian Pop

This code is not used anymore after we removed the previous loop optimizer (not
based on the ISL scheduler.)  We will add back the detection of commutative
reductions after we improve the code generation of scalar dependences (by not
going out of SSA for scalar dependences just to expose them to the data
dependence graph.)

Patch passed bootstrap and check on x86_64-linux with ISL-0.15.
I will commit this patch to trunk.

2015-09-29  Sebastian Pop  
Aditya Kumar  

* graphite-sese-to-poly.c (gsi_for_phi_node): Remove.
(nb_data_writes_in_bb): Remove.
(split_pbb): Remove.
(split_reduction_stmt): Remove.
(is_reduction_operation_p): Remove.
(phi_contains_arg): Remove.
(follow_ssa_with_commutative_ops): Remove.
(detect_commutative_reduction_arg): Remove.
(detect_commutative_reduction_assign): Remove.
(follow_inital_value_to_phi): Remove.
(edge_initial_value_for_loop_phi): Remove.
(initial_value_for_loop_phi): Remove.
(used_outside_reduction): Remove.
(detect_commutative_reduction): Remove.
(translate_scalar_reduction_to_array_for_stmt): Remove.
(remove_phi): Remove.
(dr_indices_valid_in_loop): Remove.
(close_phi_written_to_memory): Remove.
(translate_scalar_reduction_to_array): Remove.
(rewrite_commutative_reductions_out_of_ssa_close_phi): Remove.
(rewrite_commutative_reductions_out_of_ssa_loop): Remove.
(rewrite_commutative_reductions_out_of_ssa): Remove.
(build_poly_scop): Remove call to 
rewrite_commutative_reductions_out_of_ssa.
---
 gcc/graphite-sese-to-poly.c | 602 
 1 file changed, 602 deletions(-)

diff --git a/gcc/graphite-sese-to-poly.c b/gcc/graphite-sese-to-poly.c
index 3b8dd56..26f75e9 100644
--- a/gcc/graphite-sese-to-poly.c
+++ b/gcc/graphite-sese-to-poly.c
@@ -1919,22 +1919,6 @@ build_scop_drs (scop_p scop)
 build_pbb_drs (pbb);
 }
 
-/* Return a gsi at the position of the phi node STMT.  */
-
-static gphi_iterator
-gsi_for_phi_node (gphi *stmt)
-{
-  gphi_iterator psi;
-  basic_block bb = gimple_bb (stmt);
-
-  for (psi = gsi_start_phis (bb); !gsi_end_p (psi); gsi_next (&psi))
-if (stmt == psi.phi ())
-  return psi;
-
-  gcc_unreachable ();
-  return psi;
-}
-
 /* Analyze all the data references of STMTS and add them to the
GBB_DATA_REFS vector of BB.  */
 
@@ -2515,590 +2499,6 @@ nb_pbbs_in_loops (scop_p scop)
   return res;
 }
 
-/* Return the number of data references in BB that write in
-   memory.  */
-
-static int
-nb_data_writes_in_bb (basic_block bb)
-{
-  int res = 0;
-  gimple_stmt_iterator gsi;
-
-  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
-if (gimple_vdef (gsi_stmt (gsi)))
-  res++;
-
-  return res;
-}
-
-/* Splits at STMT the basic block BB represented as PBB in the
-   polyhedral form.  */
-
-static edge
-split_pbb (scop_p scop, poly_bb_p pbb, basic_block bb, gimple *stmt)
-{
-  edge e1 = split_block (bb, stmt);
-  new_pbb_from_pbb (scop, pbb, e1->dest);
-  return e1;
-}
-
-/* Splits STMT out of its current BB.  This is done for reduction
-   statements for which we want to ignore data dependences.  */
-
-static basic_block
-split_reduction_stmt (scop_p scop, gimple *stmt)
-{
-  basic_block bb = gimple_bb (stmt);
-  poly_bb_p pbb = pbb_from_bb (bb);
-  gimple_bb_p gbb = gbb_from_bb (bb);
-  edge e1;
-  int i;
-  data_reference_p dr;
-
-  /* Do not split basic blocks with no writes to memory: the reduction
- will be the only write to memory.  */
-  if (nb_data_writes_in_bb (bb) == 0
-  /* Or if we have already marked BB as a reduction.  */
-  || PBB_IS_REDUCTION (pbb_from_bb (bb)))
-return bb;
-
-  e1 = split_pbb (scop, pbb, bb, stmt);
-
-  /* Split once more only when the reduction stmt is not the only one
- left in the original BB.  */
-  if (!gsi_one_before_end_p (gsi_start_nondebug_bb (bb)))
-{
-  gimple_stmt_iterator gsi = gsi_last_bb (bb);
-  gsi_prev (&gsi);
-  e1 = split_pbb (scop, pbb, bb, gsi_stmt (gsi));
-}
-
-  /* A part of the data references will end in a different basic block
- after the split: move the DRs from the original GBB to the newly
- created GBB1.  */
-  FOR_EACH_VEC_ELT (GBB_DATA_REFS (gbb), i, dr)
-{
-  basic_block bb1 = gimple_bb (DR_STMT (dr));
-
-  if (bb1 != bb)
-   {
- gimple_bb_p gbb1 = gbb_from_bb (bb1);
- GBB_DATA_REFS (gbb1).safe_push (dr);
- GBB_DATA_REFS (gbb).ordered_remove (i);
- i--;
-   }
-}
-
-  return e1->dest;
-}
-
-/* Return true when stmt is a reduction operation.  */
-
-static inline bool
-is_reduction_operation_p (gimple *stmt)
-{
-  enum tree_code code;
-
-  gcc_assert (is_gimple_assign (stmt));
-  code = gimple_assign_rhs_code (stmt);
-
-  if (!commutative_

Re: [patch] Leave errno unchanged by successful std::stoi etc

2015-09-29 Thread Jakub Jelinek

On Tue, Sep 29, 2015 at 05:10:20PM +0100, Jonathan Wakely wrote:
> >That looks wrong to me, you only restore errno if you don't throw :(.
> >If you throw, then errno might remain 0, which is IMHO undesirable.
> 
> My thinking was that a failed conversion that throws an exception
> should be allowed to modify errno, and that the second case sets it to
> ERANGE sometimes anyway.

Well, you can modify errno, you just shouldn't change it from non-zero to
zero as far as the user is concerned.

http://pubs.opengroup.org/onlinepubs/009695399/functions/errno.html
"No function in this volume of IEEE Std 1003.1-2001 shall set errno to 0."
Of course, this part of STL is not POSIX, still, as you said, it would be
nice to guarantee the same.
> 
> But I suppose it would be better to consistently set it to non-zero
> when an exception is thrown, or consistently restore the original
> value in all cases.
> 
> >So, I'd say you want to restore it earlier, right after __convf, and
> >immediately before that copy the current errno to some other temporary
> >for the use in the condition?  Or restore errno = __saved_errno;
> >in all the 3 spots instead of just one.
> 
> Or in a destructor so it happens however we exit the function, like
> this ...

Works for me.

Jakub

[PATCH] Fix undefined behaviour in rl78 port

2015-09-29 Thread Jeff Law


And in the rl78 port.  Tested by building the rl78 targets in config-list.mk.

Installed on the trunk.

Jeff
commit 6d8cde85a30e36e5b5842b8d66837a8b4815d197
Author: Jeff Law 
Date:   Mon Sep 28 19:25:04 2015 -0400

[PATCH] Fix undefined behaviour in rl78 port
* config/rl78/rl78-expand.md (movqi): Fix undefined left shift
behaviour.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 1b9985a..79dc89f 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,8 @@
 2015-09-29  Jeff Law  
 
+   * config/rl78/rl78-expand.md (movqi): Fix undefined left shift
+   behaviour.
+
* config/msp430/msp430.c (msp430_legitimate_constant): Fix undefined
left shift behaviour.
* config/msp430/constraints.md ('L' constraint): Similarly.
diff --git a/gcc/config/rl78/rl78-expand.md b/gcc/config/rl78/rl78-expand.md
index 0335a4d..67e6620 100644
--- a/gcc/config/rl78/rl78-expand.md
+++ b/gcc/config/rl78/rl78-expand.md
@@ -48,7 +48,7 @@
&& ! REG_P (operands[0]))
operands[1] = copy_to_mode_reg (QImode, operands[1]);
 
-if (CONST_INT_P (operands[1]) && ! IN_RANGE (INTVAL (operands[1]), (-1 << 8) + 1, (1 << 8) - 1))
+if (CONST_INT_P (operands[1]) && ! IN_RANGE (INTVAL (operands[1]), (HOST_WIDE_INT_M1U << 8) + 1, (1 << 8) - 1))
   FAIL;
   }
 )

Re: [C PATCH] Fix missing warning (PR c/67730)

2015-09-29 Thread Joseph Myers

On Tue, 29 Sep 2015, Marek Polacek wrote:

> This fixes missing warning for the attached testcase.  In such a case,
> we must use the expansion point location.  I didn't simply add
>   loc = expansion_point_location_if_in_system_header (loc);
> as might be seen elsewhere in the codebase because we pass LOC down to
> convert_for_assignment where many of the warnings are issued and I was
> nervous about passing a different location there.

I suppose that for the convert_for_assignment cases you should warn if the 
user's code is in any way responsible for the issue, which includes if a 
user's function returns a wrong-type macro defined in a system header (or 
for that matter if a system header contains a return but the user chose 
the argument to that return, but that seems much less likely).

> Bootstrapped/regtested on x86_64-linux, ok for trunk and 5?

OK (though followups may be needed for any other issues).

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [C PATCH] Fix missing warning (PR c/67730)

2015-09-29 Thread Marek Polacek

On Tue, Sep 29, 2015 at 06:04:55PM +0200, Marc Glisse wrote:
> On Tue, 29 Sep 2015, Marek Polacek wrote:
> 
> >This fixes missing warning for the attached testcase.  In such a case,
> >we must use the expansion point location.  I didn't simply add
> > loc = expansion_point_location_if_in_system_header (loc);
> >as might be seen elsewhere in the codebase because we pass LOC down to
> >convert_for_assignment where many of the warnings are issued and I was
> >nervous about passing a different location there.
> 
> I assume this means that the other missing warning from
> http://stackoverflow.com/questions/32732281/no-warning-when-returning-null-with-gcc
> (same code but change the return type from void to int)
> is not fixed at the same time?

Nope, I wasn't aware of that one :(.  Maybe we want the
  loc = expansion_point_location_if_in_system_header (loc);
line after all...

Marek

Re: [C PATCH] Fix missing warning (PR c/67730)

2015-09-29 Thread Marc Glisse


On Tue, 29 Sep 2015, Marek Polacek wrote:


This fixes missing warning for the attached testcase.  In such a case,
we must use the expansion point location.  I didn't simply add
 loc = expansion_point_location_if_in_system_header (loc);
as might be seen elsewhere in the codebase because we pass LOC down to
convert_for_assignment where many of the warnings are issued and I was
nervous about passing a different location there.


I assume this means that the other missing warning from
http://stackoverflow.com/questions/32732281/no-warning-when-returning-null-with-gcc
(same code but change the return type from void to int)
is not fixed at the same time?

--
Marc Glisse

[PATCH] Fix undefined behaviour in rx port

2015-09-29 Thread Jeff Law


And the rx port.  Tested by building the rx targets in config-list.mk.

Installed on the trunk.

Jeff
commit 67dd8bdfba4072f24ea1a2bd07ffacc91185ee89
Author: Jeff Law 
Date:   Mon Sep 28 19:25:14 2015 -0400

[PATCH] Fix undefined behaviour in rx port
* config/rx/constraints.md (Int08): Fix undefined left shift
behaviour.
(Sint08, Sint16, Sint24): Likewise.
* config/rx/rx.c (rx_get_stack_layout): Likewise.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 79dc89f..53a52a6 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,10 @@
 2015-09-29  Jeff Law  
 
+   * config/rx/constraints.md (Int08): Fix undefined left shift
+   behaviour.
+   (Sint08, Sint16, Sint24): Likewise.
+   * config/rx/rx.c (rx_get_stack_layout): Likewise.
+
* config/rl78/rl78-expand.md (movqi): Fix undefined left shift
behaviour.
 
diff --git a/gcc/config/rx/constraints.md b/gcc/config/rx/constraints.md
index d46f9da..b41c232 100644
--- a/gcc/config/rx/constraints.md
+++ b/gcc/config/rx/constraints.md
@@ -28,28 +28,28 @@
 (define_constraint "Int08"
   "@internal A signed or unsigned 8-bit immediate value"
   (and (match_code "const_int")
-   (match_test "IN_RANGE (ival, (-1 << 8), (1 << 8) - 1)")
+   (match_test "IN_RANGE (ival, (HOST_WIDE_INT_M1U << 8), (1 << 8) - 1)")
   )
 )
 
 (define_constraint "Sint08"
   "@internal A signed 8-bit immediate value"
   (and (match_code "const_int")
-   (match_test "IN_RANGE (ival, (-1 << 7), (1 << 7) - 1)")
+   (match_test "IN_RANGE (ival, (HOST_WIDE_INT_M1U << 7), (1 << 7) - 1)")
   )
 )
 
 (define_constraint "Sint16"
   "@internal A signed 16-bit immediate value"
   (and (match_code "const_int")
-   (match_test "IN_RANGE (ival, (-1 << 15), (1 << 15) - 1)")
+   (match_test "IN_RANGE (ival, (HOST_WIDE_INT_M1U << 15), (1 << 15) - 1)")
   )
 )
 
 (define_constraint "Sint24"
   "@internal A signed 24-bit immediate value"
   (and (match_code "const_int")
-   (match_test "IN_RANGE (ival, (-1 << 23), (1 << 23) - 1)")
+   (match_test "IN_RANGE (ival, (HOST_WIDE_INT_M1U << 23), (1 << 23) - 1)")
   )
 )
 
diff --git a/gcc/config/rx/rx.c b/gcc/config/rx/rx.c
index c68f29e..6d911d2 100644
--- a/gcc/config/rx/rx.c
+++ b/gcc/config/rx/rx.c
@@ -1561,7 +1561,7 @@ rx_get_stack_layout (unsigned int * lowest,
  PUSHM.
 
  FIXME: Is it worth improving this heuristic ?  */
-  pushed_mask = (-1 << low) & ~(-1 << (high + 1));
+  pushed_mask = (HOST_WIDE_INT_M1U << low) & ~(HOST_WIDE_INT_M1U << (high + 1));
   unneeded_pushes = (pushed_mask & (~ save_mask)) & pushed_mask;
 
   if ((fixed_reg && fixed_reg <= high)
@@ -1667,7 +1667,7 @@ ok_for_max_constant (HOST_WIDE_INT val)
 
   /* rx_max_constant_size specifies the maximum number
  of bytes that can be used to hold a signed value.  */
-  return IN_RANGE (val, (-1 << (rx_max_constant_size * 8)),
+  return IN_RANGE (val, (HOST_WIDE_INT_M1U << (rx_max_constant_size * 8)),
( 1 << (rx_max_constant_size * 8)));
 }
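
The `pushed_mask` expression in rx_get_stack_layout is the usual way of building a mask with a contiguous run of bits set. A standalone sketch using only unsigned shifts follows; `kAllOnes` and `bit_range_mask` are hypothetical names, and 64 bits is an assumption standing in for the width of HOST_WIDE_INT.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kAllOnes = ~std::uint64_t{0};

// Mask with bits low..high (inclusive) set. Assumes low <= high and
// high + 1 < 64, so that neither shift count reaches the type's width.
constexpr std::uint64_t bit_range_mask(unsigned low, unsigned high) {
  return (kAllOnes << low) & ~(kAllOnes << (high + 1));
}
```

For registers 6 through 13, `bit_range_mask(6, 13)` yields `0x3FC0`. The left operand clears everything below bit 6, the right operand clears everything above bit 13.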

Re: [PATCH] remove dead code of commutative_reductions

2015-09-29 Thread Tobias Grosser


On 09/29/2015 06:26 PM, Sebastian Pop wrote:

This code is not used anymore after we removed the previous loop optimizer (not
based on the ISL scheduler.)  We will add back the detection of commutative
reductions after we improve the code generation of scalar dependences (by not
going out of SSA for scalar dependences just to expose them to the data
dependence graph.)

Patch passed bootstrap and check on x86_64-linux with ISL-0.15.
I will commit this patch to trunk.


LGTM.

Regarding the handling of scalars, Polly does this now by only virtually
modeling them as memory dependences, but leaving them as registers until
code generation. The final code generation is still done by alloca(ing) a
memory slot and then generating loads/stores from this memory slot. This is
significantly easier than trying to directly generate SSA again.

Best,
Tobias

[PATCH, PR target/67761] Fix i686-- bootstrap comparison failure

2015-09-29 Thread Ilya Enkovich

Hi,

My recently introduced STV pass doesn't skip debug instructions, which makes
the transformation (mostly the cost computation) depend on debug info and
causes a bootstrap comparison failure.  This patch fixes that.  Bootstrapped
for i686-linux.  Testing for x86_64-unknown-linux-gnu{,m32} is in progress.
OK for trunk if it passes?

Thanks,
Ilya
--
gcc/

2015-09-29  Ilya Enkovich  

* config/i386/i386.c (scalar_chain::analyze_register_chain): Ignore
debug insns.
(scalar_chain::convert_reg): Likewise.

gcc/testsuite/

2015-09-29  Ilya Enkovich  

* gcc.target/i386/pr67761.c: New test.


diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 6f2380f..7b3ffb0 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2919,6 +2919,10 @@ scalar_chain::analyze_register_chain (bitmap candidates, df_ref ref)
   for (chain = DF_REF_CHAIN (ref); chain; chain = chain->next)
 {
   unsigned uid = DF_REF_INSN_UID (chain->ref);
+
+  if (!NONDEBUG_INSN_P (DF_REF_INSN (chain->ref)))
+   continue;
+
   if (!DF_REF_REG_MEM_P (chain->ref))
{
  if (bitmap_bit_p (insns, uid))
@@ -3279,7 +3283,7 @@ scalar_chain::convert_reg (unsigned regno)
bitmap_clear_bit (conv, DF_REF_INSN_UID (ref));
  }
   }
-else
+else if (NONDEBUG_INSN_P (DF_REF_INSN (ref)))
   {
replace_rtx (DF_REF_INSN (ref), reg, scopy);
df_insn_rescan (DF_REF_INSN (ref));
diff --git a/gcc/testsuite/gcc.target/i386/pr67761.c b/gcc/testsuite/gcc.target/i386/pr67761.c
new file mode 100644
index 000..9b13d58
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr67761.c
@@ -0,0 +1,13 @@
+/* PR target/pr67761 */
+/* { dg-do run { target { ia32 } } } */
+/* { dg-options "-O2 -march=slm -g" } */
+/* { dg-final { scan-assembler "paddq" } } */
+
+void
+test (long long *values, long long val, long long delta)
+{
+  unsigned i;
+
+  for (i = 0; i < 128; i++, val += delta)
+values[i] = val;
+}
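
Why skipping debug instructions matters can be shown with a toy model: if a cost computation counts debug-only instructions, the decision can differ between builds with and without -g, and so can the generated code. The sketch below is not GCC code. `Insn` and `chain_gain` are hypothetical, and the per-instruction gain is an invented number.

```cpp
#include <cassert>
#include <vector>

// Toy instruction record; not GCC's rtx_insn.
struct Insn {
  bool is_debug;  // true for debug-only instructions emitted under -g
  int gain;       // estimated benefit of converting this instruction
};

// Sum the gain over real instructions only, so that the result is the same
// whether or not debug instructions are present in the chain.
inline int chain_gain(const std::vector<Insn>& chain) {
  int total = 0;
  for (const Insn& insn : chain)
    if (!insn.is_debug)
      total += insn.gain;
  return total;
}
```

A chain with an extra debug instruction then yields the same gain as the chain without it. That invariant is the one the bootstrap comparison relies on.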

[PATCH] Fix undefined behaviour in SH port

2015-09-29 Thread Jeff Law

More left shifts of negative signed values to fix in the SH port.  I'm 
not sure how these were missed last week or if they were introduced 
between the point when I tested last week and yesterday.  Regardless, 
they're fixed in the obvious way.


Tested by building all the sh targets from config-list.mk.

Installed on the trunk.

Jeff
commit d1349379450b8e11dcc7adfe678028b674a63cf1
Author: Jeff Law 
Date:   Mon Sep 28 19:25:20 2015 -0400

[PATCH] Fix undefined behaviour in SH port

* config/sh/sh.c (gen_shl_and): Fix undefined left shift
behaviour.
(gen_shl_sext): Likewise.
* config/sh/sh.md (divsi3): Likewise.
(imm->ext_dest_operand splitter): Likewise.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index cce1ba5..22c09b7 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,11 @@
+2015-09-29  Jeff Law  
+
+   * config/sh/sh.c (gen_shl_and): Fix undefined left shift
+   behaviour.
+   (gen_shl_sext): Likewise.
+   * config/sh/sh.md (divsi3): Likewise.
+   (imm->ext_dest_operand splitter): Likewise.
+
 2015-09-29  Evandro Menezes  
 
* config/arm/types.md (neon_ldp, neon_ldp_q, neon_stp, neon_stp_q):
diff --git a/gcc/config/sh/sh.c b/gcc/config/sh/sh.c
index 16fb575..904201b 100644
--- a/gcc/config/sh/sh.c
+++ b/gcc/config/sh/sh.c
@@ -4342,7 +4342,7 @@ gen_shl_and (rtx dest, rtx left_rtx, rtx mask_rtx, rtx source)
 that don't matter.  This way, we might be able to get a shorter
 signed constant.  */
   if (mask & ((HOST_WIDE_INT) 1 << (31 - total_shift)))
-   mask |= (HOST_WIDE_INT) ~0 << (31 - total_shift);
+   mask |= (HOST_WIDE_INT) ((HOST_WIDE_INT_M1U) << (31 - total_shift));
 case 2:
   /* Don't expand fine-grained when combining, because that will
  make the pattern fail.  */
@@ -4626,7 +4626,7 @@ gen_shl_sext (rtx dest, rtx left_rtx, rtx size_rtx, rtx source)
}
   emit_insn (gen_andsi3 (dest, source, GEN_INT ((1 << insize) - 1)));
   emit_insn (gen_xorsi3 (dest, dest, GEN_INT (1 << (insize - 1))));
-  emit_insn (gen_addsi3 (dest, dest, GEN_INT (-1 << (insize - 1))));
+  emit_insn (gen_addsi3 (dest, dest, GEN_INT (HOST_WIDE_INT_M1U << (insize - 1))));
   operands[0] = dest;
   operands[2] = kind == 7 ? GEN_INT (left + 1) : left_rtx;
   gen_shifty_op (ASHIFT, operands);
diff --git a/gcc/config/sh/sh.md b/gcc/config/sh/sh.md
index 8a388bc..d758e3b 100644
--- a/gcc/config/sh/sh.md
+++ b/gcc/config/sh/sh.md
@@ -3052,7 +3052,7 @@
  tab_base = force_reg (DImode, tab_base);
}
   if (TARGET_DIVIDE_INV20U)
-   i2p27 = force_reg (DImode, GEN_INT (-2 << 27));
+   i2p27 = force_reg (DImode, GEN_INT ((unsigned HOST_WIDE_INT)-2 << 27));
   else
i2p27 = GEN_INT (0);
   if (TARGET_DIVIDE_INV20U || TARGET_DIVIDE_INV20L)
@@ -7875,7 +7875,7 @@ label:
  break;
}
  /* Try movi / mshflo.l w/ r63.  */
- val2 = val + ((HOST_WIDE_INT) -1 << 32);
+ val2 = val + ((HOST_WIDE_INT) (HOST_WIDE_INT_M1U << 32));
  if ((HOST_WIDE_INT) val2 < 0 && CONST_OK_FOR_I16 (val2))
{
  operands[1] = gen_mshflo_l_di (operands[0], operands[0],
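
The and / xor / add sequence emitted in gen_shl_sext is a standard way to sign-extend a bit field, and it shows why the constant -(2^(insize-1)) is needed there. Below is a standalone sketch with all arithmetic done on unsigned values. `sign_extend` is a hypothetical helper, and 32 bits is an assumption matching the SImode patterns.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend the low `bits` bits of x, for 1 <= bits <= 31.
inline std::int32_t sign_extend(std::uint32_t x, unsigned bits) {
  const std::uint32_t sign = std::uint32_t{1} << (bits - 1);
  const std::uint32_t low = x & ((std::uint32_t{1} << bits) - 1);  // and
  const std::uint32_t flipped = low ^ sign;                        // xor
  // add -(2^(bits-1)), written as an unsigned all-ones value shifted left
  const std::uint32_t result = flipped + (~std::uint32_t{0} << (bits - 1));
  return static_cast<std::int32_t>(result);
}
```

For an 8-bit field, `sign_extend(0xFF, 8)` gives -1 and `sign_extend(0x7F, 8)` gives 127. Unsigned wrap-around does the work that the shifted negative constant did in the original code.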

Re: patch to fix PR66424

2015-09-29 Thread Vladimir Makarov


On 09/29/2015 10:23 AM, Matthias Klose wrote:
This was marked as a regression in 5 and 6, but never backported to 
the gcc-5-branch. Is it time to backport?



Thanks for the reminder.  I've just committed the patch to gcc 5 branch.

Patch for PR 66424 has been backported to GCC-5 branch

2015-09-29 Thread Vladimir Makarov


  The following patch has been committed to gcc 5 branch as rev. 228256.

  The patch was bootstrapped and tested on x86/x86-64.


Index: ChangeLog
===
--- ChangeLog	(revision 228250)
+++ ChangeLog	(working copy)
@@ -1,3 +1,12 @@
+2015-09-29  Vladimir Makarov  
+
+	Backport from mainline
+	2015-07-21  Vladimir Makarov  
+
+	PR ipa/66424.
+	* lra-remat.c (operand_to_remat): Prevent using insns with input
+	subregs processed separately by IRA.
+
 2015-09-29  Andreas Krebbel  
 
 	Backport from mainline
@@ -31,7 +40,7 @@
 	("vec_scatter_element_SI"): Replace gf mode
 	attribute with bhfgq.
 
-2015-09-29  Andrew Pinski  
+2015-09-29  Andrew Pinski  
 
 	* config/aarch64/aarch64.md (prefetch):
 	Change the predicate of operand 0 to register_operand.
Index: lra-remat.c
===
--- lra-remat.c	(revision 228250)
+++ lra-remat.c	(working copy)
@@ -432,6 +432,16 @@ operand_to_remat (rtx_insn *insn)
 	  return -1;
 	found_reg = reg;
   }
+/* IRA calculates conflicts separately for subregs of two words
+   pseudo.  Even if the pseudo lives, e.g. one its subreg can be
+   used lately, another subreg hard register can be already used
+   for something else.  In such case, it is not safe to
+   rematerialize the insn.  */
+else if (reg->type == OP_IN && reg->subreg_p
+	 && reg->regno >= FIRST_PSEUDO_REGISTER
+	 && (GET_MODE_SIZE (PSEUDO_REGNO_MODE (reg->regno))
+		 == 2 * UNITS_PER_WORD))
+  return -1;
   if (found_reg == NULL)
 return -1;
   if (found_reg->regno < FIRST_PSEUDO_REGISTER)
Index: testsuite/ChangeLog
===
--- testsuite/ChangeLog	(revision 228250)
+++ testsuite/ChangeLog	(working copy)
@@ -1,3 +1,11 @@
+2015-09-29  Vladimir Makarov  
+
+	Backport from mainline
+	2015-07-21  Vladimir Makarov  
+
+	PR ipa/66424.
+	* gcc.target/i386/pr66424.c: New.
+
 2015-09-29  Andreas Krebbel  
 
 	Backport from mainline

Re: [patch] Leave errno unchanged by successful std::stoi etc

2015-09-29 Thread Martin Sebor


On 09/29/2015 10:15 AM, Jakub Jelinek wrote:

On Tue, Sep 29, 2015 at 05:10:20PM +0100, Jonathan Wakely wrote:

That looks wrong to me, you only restore errno if you don't throw :(.
If you throw, then errno might remain 0, which is IMHO undesirable.


My thinking was that a failed conversion that throws an exception
should be allowed to modify errno, and that the second case sets it to
ERANGE sometimes anyway.


Well, you can modify errno, you just shouldn't change it from non-zero to
zero as far as the user is concerned.

http://pubs.opengroup.org/onlinepubs/009695399/functions/errno.html
"No function in this volume of IEEE Std 1003.1-2001 shall set errno to 0."
Of course, this part of STL is not POSIX, still, as you said, it would be
nice to guarantee the same.


FWIW, I agree. It's a helpful property. If libstdc++ provides
the POSIX/C guarantee it would be nice to document it in the
manual.

That said, this part of the C++ spec (stoi and related) is specified
to such a level of detail that one might argue that the functions
aren't allowed to reset errno in an observable way.

As an aside, I objected to this specification when it was first
proposed, not because of the errno guarantee, but because the
functions were meant to be light-weight, efficient, and certainly
thread-safe means of converting strings to numbers. Specifying
their effects as opposed to their postconditions means that they can't
be implemented independent of strtol and the C locale, which makes
them anything but light-weight, and prone to data races in
programs that call setlocale.

Martin

[PATCH] Fix warnings building pdp11 port

2015-09-29 Thread Jeff Law

The pdp11 port fails to build with the trunk because of a warning. 
Essentially VRP determines that the result of using BRANCH_COST is a 
constant with the range [0..1].  That's always less than 4, 3 and the 
various other magic constants used with BRANCH_COST and VRP issues a 
warning about that comparison.


I expect we're going to be overhauling BRANCH_COST shortly.  In the mean 
time, this just revectors BRANCH_COST for the pdp11 into a function to 
prevent VRP from collapsing the test and issuing the warning.


Yes, this means more code in the pdp11 cross compiler.  I'm not terribly 
concerned about that and I couldn't stand the idea of scattering 
diagnostic push/pop stuff all over the place to make just the pdp11 port 
happy.



Tested by building the pdp11 targets from config-list.mk.

Installed on the trunk.

Jeff

[PATCH] Fix building microblaze targets with trunk

2015-09-29 Thread Jeff Law

The microblaze port has a "*p++" statement which computes a result that 
is never used (the memory result).  This removes the spurious memory 
dereference and the unused value warning.


Tested by building the microblaze targets in config-list.mk.

Installed on the trunk.

Jeff
commit b2e58a1a53a3bbba60bd39ce53beb9fd706742f4
Author: Jeff Law 
Date:   Tue Sep 29 11:59:11 2015 -0400

[PATCH] Fix building microblaze targets with trunk
* config/microblaze/microblaze.c (microblaze_version_to_int): Remove
computation of unused value.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 13e930a..8d55423 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,8 @@
 2015-09-29  Jeff Law  
 
+   * config/microblaze/microblaze.c (microblaze_version_to_int): Remove
+   computation of unused value.
+
* config/pdp11/pdp11.c (pdp11_branch_cost): New function.
* config/pdp11/pdp11.h (BRANCH_COST): Call function rather than
inline macro expansion.
diff --git a/gcc/config/microblaze/microblaze.c 
b/gcc/config/microblaze/microblaze.c
index 6e7745a..ebcf65a 100644
--- a/gcc/config/microblaze/microblaze.c
+++ b/gcc/config/microblaze/microblaze.c
@@ -1640,7 +1640,7 @@ microblaze_version_to_int (const char *version)
{   /* Looking for major  */
   if (*p == '.')
 {
-  *v++;
+  v++;
 }
   else
 {

[PATCH] Fix building interix targets

2015-09-29 Thread Jeff Law



I'm resisting the temptation to declare interix dead (it's been tried 
before).  I'm guessing it hasn't built since early 2012.  But the fix is 
trivial enough and it's not like interix needs lots of care and maintenance.


Tested by building the interix targets in config-list.mk.

Installed on the trunk.

Jeff
commit 2cdebba6f51af63ad820568cbd439f296e7c4d82
Author: Jeff Law 
Date:   Tue Sep 29 11:58:51 2015 -0400

[PATCH] Fix building interix targets

* config/i386/t-interix (winnt-stubs.o): Fix compilation rule.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 87de440..68149c4 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,7 @@
 2015-09-29  Jeff Law  
 
+   * config/i386/t-interix (winnt-stubs.o): Fix compilation rule.
+
* config/sh/sh.c (gen_shl_and): Fix undefined left shift
behaviour.
(gen_shl_sext): Likewise.
diff --git a/gcc/config/i386/t-interix b/gcc/config/i386/t-interix
index db35dbe..dd59b85 100644
--- a/gcc/config/i386/t-interix
+++ b/gcc/config/i386/t-interix
@@ -25,6 +25,6 @@ winnt.o: $(srcdir)/config/i386/winnt.c $(CONFIG_H) 
$(SYSTEM_H) coretypes.h \
 winnt-stubs.o: $(srcdir)/config/i386/winnt-stubs.c $(CONFIG_H) $(SYSTEM_H) 
coretypes.h \
   $(TM_H) $(RTL_H) $(REGS_H) hard-reg-set.h output.h $(TREE_H) flags.h \
   $(TM_P_H) toplev.h $(HASHTAB_H) $(GGC_H)
-   $(COMPILER) -c $(ALL_CFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
+   $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
$(srcdir)/config/i386/winnt-stubs.c

Re: [PATCH] Fix building microblaze targets with trunk

2015-09-29 Thread Michael Eager


On 09/29/2015 10:01 AM, Jeff Law wrote:

The microblaze port has a "*p++" statement which computes a result that is never 
used (the memory
result).  This removes the spurious memory dereference and the unused value 
warning.

Tested by building the microblaze targets in config-all.mk.

Installed on the trunk.



OK.


--
Michael Eager    ea...@eagercon.com
1960 Park Blvd., Palo Alto, CA 94306  650-325-8077

[gomp4.1] Fixup handling of doacross loops with noreturn body

2015-09-29 Thread Jakub Jelinek

On Thu, Sep 24, 2015 at 08:32:10PM +0200, Jakub Jelinek wrote:
> then there is a bug with ordered loops that have noreturn body (need to add
> some edge for that case and condition checking),

This patch fixes the above issue.  If any of the ordered > collapse
loops might have zero iterations, we need to deal with the !cont_bb
(aka broken_loop) case.  For lastprivate reasons it is not enough to simply
check the conditions and fall through into the cont_bb case; we have to
emit all the loops, and just for the innermost one handle the case that there
is no fallthru from the body to the cont_bb block.  The innermost loop could
have zero iterations while some of the outer ones have all non-zero
iterations, at which point we want lastprivate to contain the initial value
of the innermost iterator and the last iteration's values of the outer ones.

2015-09-29  Jakub Jelinek  

* omp-low.c (expand_omp_for_ordered_loops): Handle the case
when cont_bb has no predecessors.
(expand_omp_for_generic): If any of the ordered loops above
collapsed loops could have zero iterations for broken_loop,
create a cont_bb and continue as if the loop is not broken.

* testsuite/libgomp.c/doacross-1.c (main): Adjust, so that one
of the doacross loops has noreturn loop body.

--- gcc/omp-low.c.jj2015-09-25 18:17:13.0 +0200
+++ gcc/omp-low.c   2015-09-29 19:07:25.366494422 +0200
@@ -7345,36 +7345,44 @@ expand_omp_for_ordered_loops (struct omp
   basic_block new_body = e1->dest;
   if (body_bb == cont_bb)
cont_bb = new_body;
-  gsi = gsi_last_bb (cont_bb);
-  if (POINTER_TYPE_P (type))
-   t = fold_build_pointer_plus (fd->loops[i].v,
-fold_convert (sizetype,
-  fd->loops[i].step));
-  else
-   t = fold_build2 (PLUS_EXPR, type, fd->loops[i].v,
-fold_convert (type, fd->loops[i].step));
-  expand_omp_build_assign (&gsi, fd->loops[i].v, t);
-  if (counts[i])
-   {
- t = fold_build2 (PLUS_EXPR, fd->iter_type, counts[i],
-  build_int_cst (fd->iter_type, 1));
- expand_omp_build_assign (&gsi, counts[i], t);
- t = counts[i];
-   }
-  else
+  edge e2 = NULL;
+  basic_block new_header;
+  if (EDGE_COUNT (cont_bb->preds) > 0)
{
- t = fold_build2 (MINUS_EXPR, TREE_TYPE (fd->loops[i].v),
-  fd->loops[i].v, fd->loops[i].n1);
- t = fold_convert (fd->iter_type, t);
- t = force_gimple_operand_gsi (&gsi, t, true, NULL_TREE,
-   true, GSI_SAME_STMT);
+ gsi = gsi_last_bb (cont_bb);
+ if (POINTER_TYPE_P (type))
+   t = fold_build_pointer_plus (fd->loops[i].v,
+fold_convert (sizetype,
+  fd->loops[i].step));
+ else
+   t = fold_build2 (PLUS_EXPR, type, fd->loops[i].v,
+fold_convert (type, fd->loops[i].step));
+ expand_omp_build_assign (&gsi, fd->loops[i].v, t);
+ if (counts[i])
+   {
+ t = fold_build2 (PLUS_EXPR, fd->iter_type, counts[i],
+  build_int_cst (fd->iter_type, 1));
+ expand_omp_build_assign (&gsi, counts[i], t);
+ t = counts[i];
+   }
+ else
+   {
+ t = fold_build2 (MINUS_EXPR, TREE_TYPE (fd->loops[i].v),
+  fd->loops[i].v, fd->loops[i].n1);
+ t = fold_convert (fd->iter_type, t);
+ t = force_gimple_operand_gsi (&gsi, t, true, NULL_TREE,
+   true, GSI_SAME_STMT);
+   }
+ aref = build4 (ARRAY_REF, fd->iter_type, counts[fd->ordered],
+size_int (i - fd->collapse + 1),
+NULL_TREE, NULL_TREE);
+ expand_omp_build_assign (&gsi, aref, t);
+ gsi_prev (&gsi);
+ e2 = split_block (cont_bb, gsi_stmt (gsi));
+ new_header = e2->dest;
}
-  aref = build4 (ARRAY_REF, fd->iter_type, counts[fd->ordered],
-size_int (i - fd->collapse + 1), NULL_TREE, NULL_TREE);
-  expand_omp_build_assign (&gsi, aref, t);
-  gsi_prev (&gsi);
-  edge e2 = split_block (cont_bb, gsi_stmt (gsi));
-  basic_block new_header = e2->dest;
+  else
+   new_header = cont_bb;
   gsi = gsi_after_labels (new_header);
   tree v = force_gimple_operand_gsi (&gsi, fd->loops[i].v, true, NULL_TREE,
 true, GSI_SAME_STMT);
@@ -7395,10 +7403,13 @@ expand_omp_for_ordered_loops (struct omp
   set_immediate_dominator (CDI_DOMINATORS, new_header, body_bb);
   set_immediate_dominator (CDI_DOMINATORS, new_body, new_header);
 
-  struct loop *loop = alloc_lo

Re: [PATCH] Fix warnings building pdp11 port

2015-09-29 Thread Trevor Saunders

On Tue, Sep 29, 2015 at 10:55:46AM -0600, Jeff Law wrote:
> The pdp11 port fails to build with the trunk because of a warning.
> Essentially VRP determines that the result of using BRANCH_COST is a
> constant with the range [0..1].  That's always less than 4, 3 and the
> various other magic constants used with BRANCH_COST and VRP issues a warning
> about that comparison.
> 
> I expect we're going to be overhauling BRANCH_COST shortly.  In the mean
> time, this just revectors BRANCH_COST for the pdp11 into a function to
> prevent VRP from collapsing the test and issuing the warning.
> 
> Yes, this means more code in the pdp11 cross compiler.  I'm not terribly
> concerned about that and I couldn't stand the idea of scattering diagnostic
> push/pop stuff all over the place to make just the pdp11 port happy.

ENOPATCH, but it seems like that's the right direction anyway since it
makes it slightly easier to convert the macro to a hook ;)

Trev

> 
> 
> Tested by building the pdp11 targets from config-all.mk.
> 
> Installed on the trunk.
> 
> Jeff

[PATCH,committed] xfail Fortran tests on i386-freebsd

2015-09-29 Thread Steve Kargl

Neither test mentioned below has a chance to ever pass
on i386-*-freebsd* without a rewrite of the testcases.
So, I've xfailed both.


2015-09-29  Steven G. Kargl  

gfortran.dg/ieee/ieee_4.f90: xfail on i386-*-freebsd*
gfortran.dg/round_4.f90: ditto.

Index: gfortran.dg/ieee/ieee_4.f90
===
--- gfortran.dg/ieee/ieee_4.f90 (revision 228261)
+++ gfortran.dg/ieee/ieee_4.f90 (working copy)
@@ -1,4 +1,4 @@
-! { dg-do run }
+! { dg-do run { xfail i386-*-freebsd* } }
 
   use :: ieee_arithmetic
   implicit none
Index: gfortran.dg/round_4.f90
===
--- gfortran.dg/round_4.f90 (revision 228261)
+++ gfortran.dg/round_4.f90 (working copy)
@@ -1,4 +1,4 @@
-! { dg-do run }
+! { dg-do run { xfail i386-*-freebsd* } }
 ! { dg-add-options ieee }
 ! { dg-skip-if "PR libfortran/58015" { hppa*-*-hpux* } }
 ! { dg-skip-if "IBM long double 31 bits of precision, test requires 38" { 
powerpc*-*-linux* } }
-- 
Steve

Fold acc_on_device

2015-09-29 Thread Nathan Sidwell

This patch folds acc_on_device as a regular builtin, but postponed until we know 
which compiler we're in.  As suggested by Bernd, we use the existing builtin 
folding machinery.


Trunk is still using  the older PTX runtime scheme (Thomas is working on that), 
so the only change there is in the  host-side libgomp piece.
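
As a sketch of what the fold boils down to: the call becomes a comparison of the argument against two constants, and which two depends on whether the host or the accelerator compiler is doing the folding.  The enum values and function names below are placeholders of mine, not the real GOMP_DEVICE_* definitions.

```c
#include <assert.h>

/* Placeholder device codes for illustration only.  */
enum { DEV_NONE = 0, DEV_HOST = 2, DEV_NOT_HOST = 4, DEV_ACCEL = 5 };

/* What the host compiler would fold the call to.  */
int
on_device_host_view (int dev)
{
  return (dev == DEV_HOST || dev == DEV_NONE) ? 1 : 0;
}

/* What the accelerator compiler would fold the call to.  */
int
on_device_accel_view (int dev)
{
  return (dev == DEV_NOT_HOST || dev == DEV_ACCEL) ? 1 : 0;
}

/* Small usage demonstration.  */
void
on_device_demo (void)
{
  assert (on_device_host_view (DEV_HOST) == 1);
  assert (on_device_accel_view (DEV_HOST) == 0);
}
```

With a constant argument either expression folds to 0 or 1, which is the point of doing this as a fold rather than at RTL expansion.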


Ok for trunk?

nathan
2015-09-29  Nathan Sidwell  

	gcc/
	* builtins.c (expand_builtin_acc_on_device): Delete.
	(expand_builtin): Don't call it.
	(fold_builtin_1): Fold acc_on_device.

	libgomp/
	* oacc-init.c (acc_on_device): Force optimization level.

Index: libgomp/oacc-init.c
===
--- libgomp/oacc-init.c	(revision 228250)
+++ libgomp/oacc-init.c	(working copy)
@@ -620,10 +620,12 @@ acc_set_device_num (int ord, acc_device_
 
 ialias (acc_set_device_num)
 
-int
+/* Compile on_device with optimization, so that the compiler expands
+   this, rather than generating infinitely recursive code.  */
+
+int __attribute__ ((__optimize__ ("O2")))
 acc_on_device (acc_device_t dev)
 {
-  /* Just rely on the compiler builtin.  */
   return __builtin_acc_on_device (dev);
 }
 
Index: gcc/builtins.c
===
--- gcc/builtins.c	(revision 228250)
+++ gcc/builtins.c	(working copy)
@@ -5859,46 +5859,6 @@ expand_stack_save (void)
 }
 
 
-/* Expand OpenACC acc_on_device.
-
-   This has to happen late (that is, not in early folding; expand_builtin_*,
-   rather than fold_builtin_*), as we have to act differently for host and
-   acceleration device (ACCEL_COMPILER conditional).  */
-
-static rtx
-expand_builtin_acc_on_device (tree exp, rtx target)
-{
-  if (!validate_arglist (exp, INTEGER_TYPE, VOID_TYPE))
-return NULL_RTX;
-
-  tree arg = CALL_EXPR_ARG (exp, 0);
-
-  /* Return (arg == v1 || arg == v2) ? 1 : 0.  */
-  machine_mode v_mode = TYPE_MODE (TREE_TYPE (arg));
-  rtx v = expand_normal (arg), v1, v2;
-#ifdef ACCEL_COMPILER
-  v1 = GEN_INT (GOMP_DEVICE_NOT_HOST);
-  v2 = GEN_INT (ACCEL_COMPILER_acc_device);
-#else
-  v1 = GEN_INT (GOMP_DEVICE_NONE);
-  v2 = GEN_INT (GOMP_DEVICE_HOST);
-#endif
-  machine_mode target_mode = TYPE_MODE (integer_type_node);
-  if (!target || !register_operand (target, target_mode))
-target = gen_reg_rtx (target_mode);
-  emit_move_insn (target, const1_rtx);
-  rtx_code_label *done_label = gen_label_rtx ();
-  do_compare_rtx_and_jump (v, v1, EQ, false, v_mode, NULL_RTX,
-			   NULL, done_label, PROB_EVEN);
-  do_compare_rtx_and_jump (v, v2, EQ, false, v_mode, NULL_RTX,
-			   NULL, done_label, PROB_EVEN);
-  emit_move_insn (target, const0_rtx);
-  emit_label (done_label);
-
-  return target;
-}
-
-
 /* Expand an expression EXP that calls a built-in function,
with result going to TARGET if that's convenient
(and in mode MODE if that's convenient).
@@ -7036,9 +6996,8 @@ expand_builtin (tree exp, rtx target, rt
   break;
 
 case BUILT_IN_ACC_ON_DEVICE:
-  target = expand_builtin_acc_on_device (exp, target);
-  if (target)
-	return target;
+  /* Do library call, if we failed to expand the builtin when
+	 folding.  */
   break;
 
 default:	/* just do library call, if unknown builtin */
@@ -10271,6 +10230,27 @@ fold_builtin_1 (location_t loc, tree fnd
 	return build_empty_stmt (loc);
   break;
 
+case BUILT_IN_ACC_ON_DEVICE:
+  /* Don't fold on_device until we know which compiler is active.  */
+  if (symtab->state == EXPANSION)
+	{
+	  unsigned val_host = GOMP_DEVICE_HOST;
+	  unsigned val_dev = GOMP_DEVICE_NONE;
+
+#ifdef ACCEL_COMPILER
+	  val_host = GOMP_DEVICE_NOT_HOST;
+	  val_dev = ACCEL_COMPILER_acc_device;
+#endif
+	  tree host = build2 (EQ_EXPR, boolean_type_node, arg0,
+			  build_int_cst (integer_type_node, val_host));
+	  tree dev = build2 (EQ_EXPR, boolean_type_node, arg0,
+			 build_int_cst (integer_type_node, val_dev));
+
+	  tree result = build2 (TRUTH_OR_EXPR, boolean_type_node, host, dev);
+	  return fold_convert (integer_type_node, result);
+	}
+  break;
+
 default:
   break;
 }

Re: [PATCH] Fix warnings building pdp11 port

2015-09-29 Thread Jeff Law


On 09/29/2015 12:11 PM, Trevor Saunders wrote:

On Tue, Sep 29, 2015 at 10:55:46AM -0600, Jeff Law wrote:

The pdp11 port fails to build with the trunk because of a warning.
Essentially VRP determines that the result of using BRANCH_COST is a
constant with the range [0..1].  That's always less than 4, 3 and the
various other magic constants used with BRANCH_COST and VRP issues a warning
about that comparison.

I expect we're going to be overhauling BRANCH_COST shortly.  In the mean
time, this just revectors BRANCH_COST for the pdp11 into a function to
prevent VRP from collapsing the test and issuing the warning.

Yes, this means more code in the pdp11 cross compiler.  I'm not terribly
concerned about that and I couldn't stand the idea of scattering diagnostic
push/pop stuff all over the place to make just the pdp11 port happy.


ENOPATCH, but it seems like that's the right direction anyway since it
makes it slightly easier to convert the macro to a hook ;)

Bah.  Attached this time :-)

Yea, hookization was in the back of my mind when I made the final choice 
to use a function call.


Jeff
commit c6fc406c69342fdcca25ee48294bd43dd90facc2
Author: law 
Date:   Tue Sep 29 16:56:04 2015 +

[PATCH] Fix warnings building pdp11 port

* config/pdp11/pdp11.c (pdp11_branch_cost): New function.
* config/pdp11/pdp11.h (BRANCH_COST): Call function rather than
inline macro expansion.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@228259 
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 68149c4..13e930a 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,9 @@
 2015-09-29  Jeff Law  
 
+   * config/pdp11/pdp11.c (pdp11_branch_cost): New function.
+   * config/pdp11/pdp11.h (BRANCH_COST): Call function rather than
+   inline macro expansion.
+
* config/i386/t-interix (winnt-stubs.o): Fix compilation rule.
 
* config/sh/sh.c (gen_shl_and): Fix undefined left shift
diff --git a/gcc/config/pdp11/pdp11-protos.h b/gcc/config/pdp11/pdp11-protos.h
index 86c6da3..aca3d82 100644
--- a/gcc/config/pdp11/pdp11-protos.h
+++ b/gcc/config/pdp11/pdp11-protos.h
@@ -47,3 +47,4 @@ extern void output_ascii (FILE *, const char *, int);
 extern void pdp11_asm_output_var (FILE *, const char *, int, int, bool);
 extern void pdp11_expand_prologue (void);
 extern void pdp11_expand_epilogue (void);
+extern int pdp11_branch_cost (void);
diff --git a/gcc/config/pdp11/pdp11.c b/gcc/config/pdp11/pdp11.c
index f0c2a5d..8eb37c6 100644
--- a/gcc/config/pdp11/pdp11.c
+++ b/gcc/config/pdp11/pdp11.c
@@ -1933,4 +1933,10 @@ pdp11_scalar_mode_supported_p (machine_mode mode)
   return default_scalar_mode_supported_p (mode);
 }
 
+int
+pdp11_branch_cost ()
+{
+  return (TARGET_BRANCH_CHEAP ? 0 : 1);
+}
+
 struct gcc_target targetm = TARGET_INITIALIZER;
diff --git a/gcc/config/pdp11/pdp11.h b/gcc/config/pdp11/pdp11.h
index 1d947f3..8339f1c 100644
--- a/gcc/config/pdp11/pdp11.h
+++ b/gcc/config/pdp11/pdp11.h
@@ -660,8 +660,7 @@ extern rtx cc0_reg_rtx;
 /* there is no point in avoiding branches on a pdp, 
since branches are really cheap - I just want to find out
how much difference the BRANCH_COST macro makes in code */
-#define BRANCH_COST(speed_p, predictable_p) (TARGET_BRANCH_CHEAP ? 0 : 1)
-
+#define BRANCH_COST(speed_p, predictable_p) pdp11_branch_cost ()
 
 #define COMPARE_FLAG_MODE HImode

Re: [gomp4] error on acc loops not associated with offloaded acc regions

2015-09-29 Thread Cesar Philippidis

On 09/29/2015 02:48 AM, Thomas Schwinge wrote:

> On Mon, 28 Sep 2015 10:08:34 -0700, Cesar Philippidis 
>  wrote:
>> I've applied this patch to gomp-4_0-branch which teaches omplower how to
>> error when it detects acc loops which aren't nested inside an acc
>> parallel or kernels region or located within a function marked as an acc
>> routine. A couple of test cases needed to be updated.
>>
>> The error message is kind of long. Let me know if it should be revised.
> 
>>  gcc/testsuite/
>>  * c-c++-common/goacc/non-routine.c: New test.
>>  * c-c++-common/goacc-gomp/nesting-1.c: Add checks for invalid loop
>>  nesting.
>>  * c-c++-common/goacc-gomp/nesting-fail-1.c: Likewise.
>>  * c-c++-common/goacc/clauses-fail.c: Likewise.
>>  * c-c++-common/goacc/sb-1.c: Likewise.
>>  * c-c++-common/goacc/sb-3.c: Likewise.
>>  * gcc.dg/goacc/sb-1.c: Likewise.
>>  * gcc.dg/goacc/sb-3.c: Likewise.
> 
> What about any Fortran test cases?

My first thought was that we didn't need one because this is generic
error handling in omplow, and there are already a lot of C test cases
exercising it. However a Fortran test can't hurt, so I added one in this
new patch. Note that I had to create a new test instead of hijacking an
existing test, because the Fortran front end bails out when it detects
errors before it hands anything over to omplow. And the existing tests
had a bunch of expected front end errors.

>> --- a/gcc/omp-low.c
>> +++ b/gcc/omp-low.c
>> @@ -2901,6 +2901,14 @@ check_omp_nesting_restrictions (gimple *stmt, 
>> omp_context *ctx)
>>  }
>>return true;
>>  }
>> +  if (is_gimple_omp_oacc (stmt) && ctx == NULL
>> +  && get_oacc_fn_attrib (current_function_decl) == NULL)
>> +{
>> +  error_at (gimple_location (stmt),
>> +"acc loops must be associated with an acc region or "
>> +"routine");
>> +  return false;
>> +}
>>/* FALLTHRU */
>>  case GIMPLE_CALL:
>>if (is_gimple_call (stmt)
> 
> I see that the error reporting doesn't really use a consistent style
> currently, but what about something like "loop directive must be
> associated with compute region" (where "compute region" is the language
> used by OpenACC 2.0a to mean the structured block associated with a
> compute construct as well as routine directive)?

That sounds reasonable, but it's not much shorter.

>> --- a/gcc/testsuite/c-c++-common/goacc-gomp/nesting-1.c
>> +++ b/gcc/testsuite/c-c++-common/goacc-gomp/nesting-1.c
>> @@ -20,6 +20,7 @@ f_acc_kernels (void)
>>}
>>  }
>>  
>> +#pragma acc routine
>>  void
>>  f_acc_loop (void)
>>  {
> 
> OK, but...
> 
>> --- a/gcc/testsuite/c-c++-common/goacc-gomp/nesting-fail-1.c
>> +++ b/gcc/testsuite/c-c++-common/goacc-gomp/nesting-fail-1.c
>> @@ -361,72 +361,72 @@ f_acc_data (void)
>>  void
>>  f_acc_loop (void)
>>  {
>> -#pragma acc loop
>> +#pragma acc loop /* { dg-error "acc loops must be associated with an acc 
>> region or routine" } */
>>for (i = 0; i < 2; ++i)
>>  {
>> -#pragma omp parallel /* { dg-error "non-OpenACC construct inside of OpenACC 
>> region" } */
>> +#pragma omp parallel
>>;
>>  }
> 
> ... here you're changing what this is meant to be testing, so please
> restore the original meaning (by adding "#pragma acc routine" to this
> function, I suppose), and then perhaps add whichever additional test
> cases you deem necessary.

I was wondering about that too. After thinking about it some more, I did
as you suggested -- reverted those changes and used a routine pragma.

>> --- /dev/null
>> +++ b/gcc/testsuite/c-c++-common/goacc/non-routine.c
>> @@ -0,0 +1,16 @@
>> +/* This program validates the behavior of acc loops which are
>> +   not associated with a parallel or kernles region or routine.  */
> 
> :-) Thanks for adding such a comment -- this is missing in too many test
> cases.

We definitely need more of them. I'm now starting to forget what I was
trying to test several months ago.

I'll apply this patch to gomp4.

Cesar

2015-09-29  Cesar Philippidis  

	gcc/
	* omp-low.c (check_omp_nesting_restrictions): Update the error
	message for loops not affiliated with acc compute regions.

	gcc/testsuite/
	* c-c++-common/goacc-gomp/nesting-fail-1.c (f_omp): Revert changes and
	mark the function as an acc routine.
	* c-c++-common/goacc/clauses-fail.c: Likewise.
	* c-c++-common/goacc/loop-1.c: Likewise.
	* c-c++-common/goacc/non-routine.c: Likewise.
	* c-c++-common/goacc/sb-1.c: Likewise.
	* c-c++-common/goacc/sb-3.c: Likewise.
	* gcc.dg/goacc/sb-1.c: Likewise.
	* gcc.dg/goacc/sb-3.c: Likewise.
	* gfortran.dg/goacc/loop-4.f95: New test.


diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index ba8cdf4..dff013d 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -2923,8 +2923,8 @@ check_omp_nesting_restrictions (gimple *stmt, omp_context *ctx)
 	  && get_oacc_fn_attrib (current_function_decl) == NULL)
 	{
 	  error_at (gimple_location (stmt),
-		"acc loops must be

New OpenACC pass and Target Hook

2015-09-29 Thread Nathan Sidwell

This patch implements an OpenACC device-specific lowering pass, and an OpenACC 
target hook for validating compute dimensions.


The pass 'oaccdevlow' is inserted early after LTO readback.  It is active for 
offloaded OpenACC functions, and OpenACC routines.  Currently its only action is 
to validate the compute dimensions specified for an offloaded function.


The new hook performs the validation. It can change dimensions and  issue 
diagnostics etc.  The default hook simply sets all dimensions to 1, which is 
what is required on the host.  The PTX backend overrides this hook, but 
currently does no validation.  When the partitioned execution patch(es) are 
ready, it will make sense for the backend to validate -- this is already working 
on the branch, FWIW.
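
To illustrate the default behaviour described above, here is a simplified sketch in plain C.  It drops the decl and fn_level arguments of the real hook, and the name validate_dims_default is mine; it only shows the "force everything to 1, report whether anything changed" contract.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical version of a default validation hook: every
   dimension (including the -1 "use the default" marker) is forced to 1.
   Returns true if any entry was modified.  */
bool
validate_dims_default (int dims[], size_t n)
{
  assert (dims != NULL);
  bool changed = false;
  for (size_t i = 0; i < n; i++)
    if (dims[i] != 1)
      {
	dims[i] = 1;
	changed = true;
      }
  return changed;
}
```

A target that supports real partitioning would instead fill in its own defaults for the -1 entries and diagnose values it cannot honour.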


ok for trunk?

nathan
2015-09-29  Nathan Sidwell  
	Cesar Philippidis  

	gcc/
	* config/nvptx/nvptx.c (nvptx_validate_dims): New.
	(TARGET_GOACC_VALIDATE_DIMS): Override.
	* target.def (TARGET_GOACC): New target hook prefix.
	(validate_dims): New hook.
	* targhooks.h (default_goacc_validate_dims): New.
	* omp-low.c (oacc_validate_dims): New.
	(execute_oacc_device_lower): New.
	(default_goacc_validate_dims): New.
	(pass_data_oacc_device_lower): New.
	(pass_oacc_device_lower): New pass.
	(make_pass_oacc_device_lower): New.
	* tree-pass.h (make_pass_oacc_device_lower): Declare.
	* passes.def (pass_oacc_device_lower): Add it.
	* doc/tm.texi: Rebuilt.
	* doc/tm.texi.in (TARGET_GOACC_VALIDATE_DIMS): Add hook.
	* doc/invoke.texi (oaccdevlow): Document tree dump flag.

Index: gcc/config/nvptx/nvptx.c
===
--- gcc/config/nvptx/nvptx.c	(revision 228245)
+++ gcc/config/nvptx/nvptx.c	(working copy)
@@ -2141,6 +2141,22 @@ nvptx_file_end (void)
   fputs (func_decls.str().c_str(), asm_out_file);
 }
 
+/* Validate compute dimensions, fill in non-unity defaults.  FN_LEVEL
+   indicates the level at which a routine might spawn a loop.  It is
+   negative for non-routines.  */
+
+static bool
+nvptx_validate_dims (tree ARG_UNUSED (decl), int *ARG_UNUSED (dims),
+		 int ARG_UNUSED (fn_level))
+{
+  bool changed = false;
+
+  /* TODO: Leave dimensions unaltered.  Partitioned execution needs
+ porting before filtering dimensions makes sense.  */
+
+  return changed;
+}
+
 #undef TARGET_OPTION_OVERRIDE
 #define TARGET_OPTION_OVERRIDE nvptx_option_override
 
@@ -2227,6 +2243,9 @@ nvptx_file_end (void)
 #undef TARGET_VECTOR_ALIGNMENT
 #define TARGET_VECTOR_ALIGNMENT nvptx_vector_alignment
 
+#undef TARGET_GOACC_VALIDATE_DIMS
+#define TARGET_GOACC_VALIDATE_DIMS nvptx_validate_dims
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-nvptx.h"
Index: gcc/target.def
===
--- gcc/target.def	(revision 228245)
+++ gcc/target.def	(working copy)
@@ -1639,6 +1639,23 @@ int, (struct cgraph_node *), NULL)
 
 HOOK_VECTOR_END (simd_clone)
 
+/* Functions relating to openacc.  */
+#undef HOOK_PREFIX
+#define HOOK_PREFIX "TARGET_GOACC_"
+HOOK_VECTOR (TARGET_GOACC, goacc)
+
+DEFHOOK
+(validate_dims,
+"This hook should check the launch dimensions provided.  It should fill\n\
+in anything that needs to default to non-unity and verify non-defaults.\n\
+Defaults are represented as -1.  Diagnostics should be issued as\n\
+appropriate.  Return true if changes have been made.  You must override\n\
+this hook to provide dimensions larger than 1.",
+bool, (tree decl, int dims[], int fn_level),
+default_goacc_validate_dims)
+
+HOOK_VECTOR_END (goacc)
+
 /* Functions relating to vectorization.  */
 #undef HOOK_PREFIX
 #define HOOK_PREFIX "TARGET_VECTORIZE_"
Index: gcc/targhooks.h
===
--- gcc/targhooks.h	(revision 228245)
+++ gcc/targhooks.h	(working copy)
@@ -107,6 +107,9 @@ extern unsigned default_add_stmt_cost (v
 extern void default_finish_cost (void *, unsigned *, unsigned *, unsigned *);
 extern void default_destroy_cost_data (void *);
 
+/* OpenACC hooks.  */
+extern bool default_goacc_validate_dims (tree, int [], int);
+
 /* These are here, and not in hooks.[ch], because not all users of
hooks.h include tm.h, and thus we don't have CUMULATIVE_ARGS.  */
 
Index: gcc/omp-low.c
===
--- gcc/omp-low.c	(revision 228245)
+++ gcc/omp-low.c	(working copy)
@@ -14020,4 +14019,146 @@ omp_finish_file (void)
 }
 }
 
+/* Validate and update the dimensions for offloaded FN.  ATTRS is the
+   raw attribute.  DIMS is an array of dimensions, which is returned.
+   Returns the function level dimensionality --  the level at which an
+   offload routine wishes to partition a loop.  */
+
+static int
+oacc_validate_dims (tree fn, tree attrs, int *dims)
+{
+  tree purpose[GOMP_DIM_MAX];
+  unsigned ix;
+  tree pos = TREE_VALUE (attrs);
+  int fn_level = -1;
+
+  /* Make sure the attribute creator attached the dimension
+ informati

Re: [PATCH] x86 interrupt attribute

2015-09-29 Thread Mike Stump

To be feature complete, it would be nice to have two styles of interrupt 
functions, one that returns with iret, and one that returns with ret.  The 
point is that the user might want to call functions from an interrupt handler 
and not save and restore all call clobbered registers.  By allowing a ret style 
interrupt handler, calls to a ret style interrupt routine can avoid saving and 
restoring all call clobbered registers.

Oh, and I wish that all the port independent code for interrupt functions was 
shared across all ports, as redoing all this code for each port is silly (sad). 
 An example of this would be the sibcall code, the fact that all call saved 
registers need to be saved is another.  The EPILOGUE_USES or the gen_rtx_USE is 
yet another.  Type checking the return type to ensure the return type is void, 
likely another.

One last comment, most folks use EPILOGUE_USES and mark up the registers as 
used.  You don’t.  I’m not sure if both ways work equally well, or if there is 
a reason to prefer one over the other.  Maybe someone could comment on this, as 
in my port I use EPILOGUE_USES and it seems to work just fine.

On Sep 29, 2015, at 6:49 AM, Yulia Koval  wrote:
> +  /* Always need to save SSE registrers in interrupt handler.  */

Spelling registrers -> registers.

Not in your code, but I noticed it:

>/* All pointer bounds argumntas are handled separately here.  */

Spelling argumntas -> arguments?

Re: [patch] libstdc++/67747 Allocate space for dirent::d_name

2015-09-29 Thread Martin Sebor


On 09/29/2015 05:37 AM, Jonathan Wakely wrote:

POSIX says that dirent::d_name has an unspecified length, so calls to
readdir_r must pass a buffer with enough trailing space for
{NAME_MAX}+1 characters. I wasn't doing that, which works OK on
GNU/Linux and BSD where d_name is a large array, but fails on Solaris
32-bit.

This uses pathconf to get NAME_MAX and allocates a buffer.

Tested powerpc64le-linux and x86_64-dragonfly4.1, I'm going to commit
this to trunk today (and backport all the filesystem fixes to
gcc-5-branch).


Calling pathconf is only necessary when _POSIX_NO_TRUNC is zero
which I think exists mainly for legacy file systems. Otherwise,
it's safe to use NAME_MAX instead. Avoiding the call to pathconf
also avoids the TOCTOU between it and the call to opendir, and
hardcoding the value makes it possible to avoid dynamically
allocating the dirent buffer.

I didn't remember the MAX_PATH value on Windows anymore but from
what I've just read online it sounds like it's defined to 260.

Defaulting to 255 on POSIX is appropriate. On XSI systems, the
minimum required value is _XOPEN_NAME_MAX which is 255 (I would
suggest using the macro instead when it's defined). Otherwise,
the strictly conforming minimum value would be 14 -- the value
of _POSIX_NAME_MAX, but since 255 is greater it's fine.

Other than that, I tend to be leery of using plain char arrays
as buffers for objects of bigger types. I don't know to what
extent this is a problem for libstdc++ anymore as more and more
hardware is tolerant of misaligned accesses and as the default
new expression typically returns memory suitably aligned for
the largest fundamental type. But since there is no requirement
in the language that it do so, I would tend to err on the
side of caution and use operator new (as opposed to
new char[len]).
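
For concreteness, a rough C sketch of the sizing and allocation being discussed.  The helper names are mine, the 255 fallback is the value mentioned above, and malloc is used here because it returns storage suitably aligned for any object type, which sidesteps the char-array alignment concern.

```c
#include <assert.h>
#include <dirent.h>
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: bytes needed for a dirent whose d_name can hold
   the longest name allowed in DIR, plus the terminating NUL.  */
size_t
dirent_buffer_size (const char *dir)
{
  assert (dir != NULL);
  long name_max = pathconf (dir, _PC_NAME_MAX);
  if (name_max < 0)
    name_max = 255;	/* Error or no limit reported: assume 255.  */
  size_t size = offsetof (struct dirent, d_name) + (size_t) name_max + 1;
  if (size < sizeof (struct dirent))
    size = sizeof (struct dirent);
  return size;
}

/* Hypothetical helper: allocate such a buffer; the caller frees it.  */
struct dirent *
alloc_dirent (const char *dir)
{
  return malloc (dirent_buffer_size (dir));
}
```

Replacing the pathconf call with a compile-time NAME_MAX would make the size a constant and remove the need for the dynamic allocation altogether.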

Martin

PS I'm interpreting _POSIX_NO_TRUNC being zero as more
restrictive than if it was non-zero and so calling pathconf(p,
_PC_NO_TRUNC) should be required to also return non-zero for
such an implementation, regardless of p. But let me check that
I'm reading it right.

Re: [Patch ifcvt costs 0/3] Introduce a new target hook for ifcvt costs.

2015-09-29 Thread Mike Stump

On Sep 29, 2015, at 7:31 AM, James Greenhalgh  wrote:
> On Tue, Sep 29, 2015 at 11:16:37AM +0100, Richard Biener wrote:
>> On Fri, Sep 25, 2015 at 5:04 PM, James Greenhalgh
>>  wrote:
>>> 
>>> In relation to the patch I put up for review a few weeks ago to teach
>>> RTL if-convert to handle multiple sets in a basic block [1], I was
>>> asking about a sensible cost model to use. There was some consensus at
>>> Cauldron that what should be done in this situation is to introduce a
>>> target hook that delegates answering the question to the target.
>> 
>> Err - the consensus was to _not_ add gazillion of special target hooks
>> but instead enhance what we have with rtx_cost so that passes can
>> rely on comparing before and after costs of a sequence of insns.
> 
> Ah, I was not able to attend Cauldron this year, so I was trying to pick out
> "consensus" from the video. Rewatching it now, I see a better phrase would
> be "suggestion with some support".

I’m not a big fan of rtx_cost.  To me it feels more like a crude 
sledgehammer.  Now, that is the gcc way, we have a ton of these things, but it 
would be nice to refine the tools so that the big escape hatch isn’t used as 
often and we have finer-grained ways of doing things.  rtx_cost should be what a 
code-generator generates with most new ports when they use the nice api to do a 
port.  The old sledgehammer-wielding ports may well always define rtx_cost 
themselves, but we should shoot for something better.

As a concrete example, I now have a code-generator for enum reg_class, 
N_REG_CLASSES, REG_CLASS_NAMES, REG_CLASS_CONTENTS, REGISTER_NAMES, 
FIXED_REGISTERS, CALL_USED_REGISTERS, ADDITIONAL_REGISTER_NAMES, 
REG_ALLOC_ORDER and more (some binutils code-gen to do with registers), and oh 
my, it is so much nicer to use than the original api.  If you only ever have 
to write these things once, fine, but, if you develop and prototype CPUs, the 
existing interface is, well, less than ideal.  I can do things like:

gccrclass
  rc_gprs = "GENERAL";

r gpr[] = { rc_gprs, Fixed, Used,
"$zero", "$sp", "$fp", "$lr" };
r gpr_sav[] = { Notfixed, Notused, alias ("$save_first"),
"$sav1",   "$sav2",   "$sav3",   "$sav4",

and get all the other goop I need for free.  I’d encourage people to find a way 
to do up an rtx_cost generator.  If you're a port maintainer, and want to redo 
your port to use a nicer api to do the registers, let me know.  I’d love to see 
progress made to rid gcc of the old crappy apis.
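
As a rough illustration of the single-description idea (this is not the
generator described above, whose input format is only partly shown), the
sketch below keeps one register table in an X-macro and derives
REGISTER_NAMES, FIXED_REGISTERS and CALL_USED_REGISTERS from it.  The register
names and their fixed/used flags follow the snippet above; REG_TABLE, the X_*
helpers and reg_index are made-up names for the example.

```c
#include <string.h>

/* One row per register: name, fixed flag, call-used flag.
   Hypothetical table; only the names and flags come from the mail.  */
#define REG_TABLE(X)   \
  X ("$zero", 1, 1)    \
  X ("$sp",   1, 1)    \
  X ("$fp",   1, 1)    \
  X ("$lr",   1, 1)    \
  X ("$sav1", 0, 0)    \
  X ("$sav2", 0, 0)

/* Column selectors used to project the table.  */
#define X_NAME(name, fixed, used)  name,
#define X_FIXED(name, fixed, used) fixed,
#define X_USED(name, fixed, used)  used,

/* The target macros, all derived from the same table.  */
#define REGISTER_NAMES      { REG_TABLE (X_NAME) }
#define FIXED_REGISTERS     { REG_TABLE (X_FIXED) }
#define CALL_USED_REGISTERS { REG_TABLE (X_USED) }

const char *const reg_names[] = REGISTER_NAMES;
const char fixed_regs[] = FIXED_REGISTERS;
const char call_used_regs[] = CALL_USED_REGISTERS;

enum { NUM_REGS = sizeof reg_names / sizeof reg_names[0] };

/* Return the index of register NAME, or -1 if it is not in the table.  */
int
reg_index (const char *name)
{
  for (int i = 0; i < NUM_REGS; i++)
    if (strcmp (reg_names[i], name) == 0)
      return i;
  return -1;
}
```

Adding or reordering a register then means touching one row, and the three
arrays cannot drift out of sync.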

[openacc] use cuda error routine

2015-09-29 Thread Nathan Sidwell

The cuda library has provided cuGetErrorString since at least 5.5, along with 
documentation of same.  What's been missing until cuda 7.0 is a declaration in 
the cuda header file.
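
The calling convention involved (status code as the return value, description
through an out-parameter, fixed fallback string on failure) can be sketched
without the CUDA headers.  Everything named mock_* below is a hypothetical
stand-in, not the CUDA API, and the behaviour of the mock lookup is an
assumption made for the example.

```c
#include <stddef.h>

/* Hypothetical stand-in for CUresult.  */
typedef enum
{
  MOCK_SUCCESS = 0,
  MOCK_ERROR_INVALID_VALUE = 1
} mock_result;

/* Hypothetical stand-in for the library lookup: store a description of R
   in *DESC and return MOCK_SUCCESS, or fail for codes it does not know.  */
static mock_result
mock_get_error_string (mock_result r, const char **desc)
{
  switch (r)
    {
    case MOCK_SUCCESS:
      *desc = "no error";
      return MOCK_SUCCESS;
    case MOCK_ERROR_INVALID_VALUE:
      *desc = "invalid argument";
      return MOCK_SUCCESS;
    default:
      *desc = NULL;
      return MOCK_ERROR_INVALID_VALUE;
    }
}

/* Same shape as the reimplemented cuda_error: ask the library, and fall
   back to a fixed string when the lookup itself fails.  */
const char *
mock_error (mock_result r)
{
  const char *desc;

  if (mock_get_error_string (r, &desc) != MOCK_SUCCESS)
    desc = "unknown error";

  return desc;
}
```

Compared with a hand-maintained table, the caller no longer has to track new
error codes as the library adds them.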


I've merged this patch from the gomp4 branch to the nvptx libgomp plugin.

nathan
2015-09-29  Nathan Sidwell  

	* plugin/plugin-nvptx.c (ARRAYSIZE): Delete.
	(cuda_errlist): Delete.
	(cuda_error): Reimplement.

Index: libgomp/plugin/plugin-nvptx.c
===
--- libgomp/plugin/plugin-nvptx.c	(revision 228242)
+++ libgomp/plugin/plugin-nvptx.c	(working copy)
@@ -47,84 +47,21 @@
 #include 
 #include 
 
-#define	ARRAYSIZE(X) (sizeof (X) / sizeof ((X)[0]))
-
-static const struct
-{
-  CUresult r;
-  const char *m;
-} cuda_errlist[]=
-{
-  { CUDA_ERROR_INVALID_VALUE, "invalid value" },
-  { CUDA_ERROR_OUT_OF_MEMORY, "out of memory" },
-  { CUDA_ERROR_NOT_INITIALIZED, "not initialized" },
-  { CUDA_ERROR_DEINITIALIZED, "deinitialized" },
-  { CUDA_ERROR_PROFILER_DISABLED, "profiler disabled" },
-  { CUDA_ERROR_PROFILER_NOT_INITIALIZED, "profiler not initialized" },
-  { CUDA_ERROR_PROFILER_ALREADY_STARTED, "already started" },
-  { CUDA_ERROR_PROFILER_ALREADY_STOPPED, "already stopped" },
-  { CUDA_ERROR_NO_DEVICE, "no device" },
-  { CUDA_ERROR_INVALID_DEVICE, "invalid device" },
-  { CUDA_ERROR_INVALID_IMAGE, "invalid image" },
-  { CUDA_ERROR_INVALID_CONTEXT, "invalid context" },
-  { CUDA_ERROR_CONTEXT_ALREADY_CURRENT, "context already current" },
-  { CUDA_ERROR_MAP_FAILED, "map error" },
-  { CUDA_ERROR_UNMAP_FAILED, "unmap error" },
-  { CUDA_ERROR_ARRAY_IS_MAPPED, "array is mapped" },
-  { CUDA_ERROR_ALREADY_MAPPED, "already mapped" },
-  { CUDA_ERROR_NO_BINARY_FOR_GPU, "no binary for gpu" },
-  { CUDA_ERROR_ALREADY_ACQUIRED, "already acquired" },
-  { CUDA_ERROR_NOT_MAPPED, "not mapped" },
-  { CUDA_ERROR_NOT_MAPPED_AS_ARRAY, "not mapped as array" },
-  { CUDA_ERROR_NOT_MAPPED_AS_POINTER, "not mapped as pointer" },
-  { CUDA_ERROR_ECC_UNCORRECTABLE, "ecc uncorrectable" },
-  { CUDA_ERROR_UNSUPPORTED_LIMIT, "unsupported limit" },
-  { CUDA_ERROR_CONTEXT_ALREADY_IN_USE, "context already in use" },
-  { CUDA_ERROR_PEER_ACCESS_UNSUPPORTED, "peer access unsupported" },
-  { CUDA_ERROR_INVALID_SOURCE, "invalid source" },
-  { CUDA_ERROR_FILE_NOT_FOUND, "file not found" },
-  { CUDA_ERROR_SHARED_OBJECT_SYMBOL_NOT_FOUND,
-   "shared object symbol not found" },
-  { CUDA_ERROR_SHARED_OBJECT_INIT_FAILED, "shared object init error" },
-  { CUDA_ERROR_OPERATING_SYSTEM, "operating system" },
-  { CUDA_ERROR_INVALID_HANDLE, "invalid handle" },
-  { CUDA_ERROR_NOT_FOUND, "not found" },
-  { CUDA_ERROR_NOT_READY, "not ready" },
-  { CUDA_ERROR_LAUNCH_FAILED, "launch error" },
-  { CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES, "launch out of resources" },
-  { CUDA_ERROR_LAUNCH_TIMEOUT, "launch timeout" },
-  { CUDA_ERROR_LAUNCH_INCOMPATIBLE_TEXTURING,
- "launch incompatibe texturing" },
-  { CUDA_ERROR_PEER_ACCESS_ALREADY_ENABLED, "peer access already enabled" },
-  { CUDA_ERROR_PEER_ACCESS_NOT_ENABLED, "peer access not enabled " },
-  { CUDA_ERROR_PRIMARY_CONTEXT_ACTIVE, "primary cotext active" },
-  { CUDA_ERROR_CONTEXT_IS_DESTROYED, "context is destroyed" },
-  { CUDA_ERROR_ASSERT, "assert" },
-  { CUDA_ERROR_TOO_MANY_PEERS, "too many peers" },
-  { CUDA_ERROR_HOST_MEMORY_ALREADY_REGISTERED,
-   "host memory already registered" },
-  { CUDA_ERROR_HOST_MEMORY_NOT_REGISTERED, "host memory not registered" },
-  { CUDA_ERROR_NOT_PERMITTED, "not permitted" },
-  { CUDA_ERROR_NOT_SUPPORTED, "not supported" },
-  { CUDA_ERROR_UNKNOWN, "unknown" }
-};
-
 static const char *
 cuda_error (CUresult r)
 {
-  int i;
-
-  for (i = 0; i < ARRAYSIZE (cuda_errlist); i++)
-{
-  if (cuda_errlist[i].r == r)
-	return cuda_errlist[i].m;
-}
-
-  static char errmsg[30];
+#if CUDA_VERSION < 7000
+  /* Specified in documentation and present in library from at least
+ 5.5.  Not declared in header file prior to 7.0.  */
+  extern CUresult cuGetErrorString (CUresult, const char **);
+#endif
+  const char *desc;
 
-  snprintf (errmsg, sizeof (errmsg), "unknown error code: %d", r);
+  r = cuGetErrorString (r, &desc);
+  if (r != CUDA_SUCCESS)
+desc = "unknown cuda error";
 
-  return errmsg;
+  return desc;
 }
 
 static unsigned int instantiated_devices = 0;

Re: Elimitate duplication of get_catalogs in different abi

2015-09-29 Thread François Dumont

On 25/09/2015 17:58, Jonathan Wakely wrote:
> On 25/09/15 16:10 +0100, Jonathan Wakely wrote:
>> On 25/09/15 16:08 +0100, Jonathan Wakely wrote:
>>> On 23/09/15 21:28 +0200, François Dumont wrote:
 On 05/09/2015 23:02, François Dumont wrote:
> On 22/08/2015 14:24, Daniel Krügler wrote:
>> 2015-08-21 23:11 GMT+02:00 François Dumont :
>>> I think I found a better way to handle this problem. It is
>>> c++locale.cc
>>> that needs to be built with --fimplicit-templates. I even think
>>> that the
>>> *_cow.cc file do not need this option but as I don't know what
>>> is the
>>> drawback of this option I kept it. I also explicitly used the
>>> file name
>>> c++locale.cc even if it is an alias to a configurable source
>>> file.  I
>>> guess there must be some variable to use no ?
>>>
>>> With this patch there are 6 additional symbols. I guess I need to
>>> declare those in the scripts even if it is for internal library
>>> usage,
>>> right ?
>> I would expect that the new Catalog_info definition either has
>> deleted
>> or properly (user-)defined copy constructor and copy assignment
>> operator.
>>
>>
>> - Daniel
>>
> This type is used in C++98 so I need to make those private, not
> deleted.
>
> With this change, is the patch ok to commit ?
>
> François
>

 What about this patch ?

 I am still uncomfortable in exposing those implementation details
 in the
 versioned symbols but I don't know how to do otherwise. Do you
 want me
 to push this code in std::__detail namespace ?
>>>
>>> I think because the types are only used internally in the library we
>>> don't need to export them. The other code inside the shared library
>>> can refer to those symbols without them being exported.
>>>
>>> That way users can't see their names (because they're not in any
>>> public headers) and can't use the symbols (because they're not
>>> exported) so they're pure internal implementation details.
>>>
>>> I tested it briefly and it seems to work, so if you can confirm it
>>> still works then the patch is OK without the changes to gnu.ver
>>
>> Oh, the problem is that the symbols are matched by patterns in the
>> _GLIBCXX_3.4 version, so get exported with that version instead. Gah.
>>
>> In that case your patch would not have worked on Solaris anyway, as
>> the Solaris linker gives an error if a symbol matches patterns in more
>> than one symbol version.
>>
>> Let me try to adjust the gnu.ver script to make this work ...
>
> This should do it ...
>
Indeed, I just reran all tests with success. I am re-attaching the patch.
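
To see why narrowing the wildcard keeps the new types out of GLIBCXX_3.4, here
is a small sketch that uses POSIX fnmatch as an approximation of the linker's
glob matching (an assumption: the linker's matcher is not necessarily
identical to fnmatch).  The patterns are the ones in the gnu.ver change; the
helper name glob_matches is made up.

```c
#include <fnmatch.h>

/* Return 1 if the demangled symbol NAME matches the glob PATTERN,
   0 otherwise.  */
int
glob_matches (const char *pattern, const char *name)
{
  return fnmatch (pattern, name, 0) == 0;
}
```

With the old pattern, glob_matches ("std::[A-Z]*", "std::Catalogs") is true,
so the symbol is picked up by that version node.  With the adjusted pattern,
glob_matches ("std::[ABD-Z]*", "std::Catalogs") is false, because the bracket
expression no longer contains 'C'.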

2015-09-30  François Dumont  
Jonathan Wakely  

* config/locale/gnu/messages_members.cc (Catalog_info, Catalogs):
Move...
* config/locale/gnu/c++locale_internal.h: ...here in std namespace.
* config/locale/gnu/c_locale.cc: Move implementation of latter here.
* config/abi/pre/gnu.ver: Adjust.

Ok to commit ?

François

diff --git a/libstdc++-v3/config/abi/pre/gnu.ver b/libstdc++-v3/config/abi/pre/gnu.ver
index d42cd37..c761052 100644
--- a/libstdc++-v3/config/abi/pre/gnu.ver
+++ b/libstdc++-v3/config/abi/pre/gnu.ver
@@ -24,7 +24,7 @@ GLIBCXX_3.4 {
 # Names inside the 'extern' block are demangled names.
 extern "C++"
 {
-  std::[A-Z]*;
+  std::[ABD-Z]*;
   std::a[a-c]*;
   std::ad[a-n]*;
   std::ad[p-z]*;
@@ -106,7 +106,7 @@ GLIBCXX_3.4 {
 # std::istringstream*;
   std::istrstream*;
   std::i[t-z]*;
-  std::[A-Zj-k]*;
+  std::[j-k]*;
 # std::length_error::l*;
 # std::length_error::~l*;
   std::locale::[A-Za-e]*;
@@ -132,9 +132,8 @@ GLIBCXX_3.4 {
 # std::logic_error::l*;
   std::logic_error::what*;
 # std::logic_error::~l*;
-# std::[A-Zm-r]*;
-# std::[A-Zm]*;
-  std::[A-Z]*;
+# std::[m-r]*;
+# std::[m]*;
   std::messages[^_]*;
 # std::messages_byname*;
   std::money_*;
@@ -175,11 +174,13 @@ GLIBCXX_3.4 {
 # std::t[i-n]*;
   std::tr1::h[^a]*;
   std::t[s-z]*;
-# std::[A-Zu-z]*;
+# std::[u-z]*;
 # std::underflow_error::u*;
 # std::underflow_error::~u*;
   std::unexpected*;
-  std::[A-Zv-z]*;
+  std::valarray*;
+  # std::vector*
+  std::[w-z]*;
   std::_List_node_base::hook*;
   std::_List_node_base::swap*;
   std::_List_node_base::unhook*;
diff --git a/libstdc++-v3/config/locale/gnu/c++locale_internal.h b/libstdc++-v3/config/locale/gnu/c++locale_internal.h
index f1959d6..7db354c 100644
--- a/libstdc++-v3/config/locale/gnu/c++locale_internal.h
+++ b/libstdc++-v3/config/locale/gnu/c++locale_internal.h
@@ -36,8 +36,13 @@
 #include 
 #include 
 
+#include 
+#include 	// ::strdup
+
+#include 
+
 #if __GLIBC__ > 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ > 2)
-  
+
 extern "C" __typeof(nl_langinfo_l) __nl_langinfo_l;
 extern

[committed, PATCH] Fix typos in comments in config/i386/i386.c

2015-09-29 Thread H.J. Lu

Index: gcc/ChangeLog
===
--- gcc/ChangeLog   (revision 228264)
+++ gcc/ChangeLog   (working copy)
@@ -1,3 +1,8 @@
+2015-09-29  H.J. Lu  
+
+   * config/i386/i386.c (ix86_function_arg): Fix typo in comments.
+   (ix86_nsaved_sseregs): Likewise.
+
 2015-09-29  Jeff Law  
 
* config/microblaze/microblaze.c (microblaze_version_to_int): Remove
Index: gcc/config/i386/i386.c
===
--- gcc/config/i386/i386.c  (revision 228264)
+++ gcc/config/i386/i386.c  (working copy)
@@ -8998,7 +8998,7 @@ ix86_function_arg (cumulative_args_t cum
   HOST_WIDE_INT bytes, words;
   rtx arg;
 
-  /* All pointer bounds argumntas are handled separately here.  */
+  /* All pointer bounds arguments are handled separately here.  */
   if ((type && POINTER_BOUNDS_TYPE_P (type))
   || POINTER_BOUNDS_MODE_P (mode))
 {
@@ -11084,7 +11084,7 @@ ix86_nsaved_regs (void)
   return nregs;
 }
 
-/* Return number of saved SSE registrers.  */
+/* Return number of saved SSE registers.  */
 
 static int
 ix86_nsaved_sseregs (void)

Re: Fold acc_on_device

2015-09-29 Thread Bernd Schmidt


On 09/29/2015 08:21 PM, Nathan Sidwell wrote:

This patch folds acc_on_device as a regular builtin, but postponed until
we know which compiler we're in.  As suggested by Bernd, we use the
existing builtin folding machinery.

Trunk is still using the older PTX runtime scheme (Thomas is working on
that), so the only change there is in the host-side libgomp piece.

Ok for trunk?


Ok, although I really don't quite see the need to drop the expander.


Bernd

Re: Fold acc_on_device

2015-09-29 Thread Nathan Sidwell


On 09/29/15 15:52, Bernd Schmidt wrote:


Ok, although I really don't quite see the need to drop the expander.


Unnecessary code duplication.  It's better to say something once in one place, 
than try and say it twice in two different places.


nathan

Re: [PATCH] x86 interrupt attribute

2015-09-29 Thread H.J. Lu

On Tue, Sep 29, 2015 at 11:49 AM, Mike Stump  wrote:
> To be feature complete, it would be nice to have two styles of interrupt 
> functions, one that returns with iret, and one that returns with ret.  The 
> point is that the user might want to call functions from an interrupt handler 
> and not save and restore all call clobbered registers.  By allowing a ret 
> style interrupt handler, calls to a ret style interrupt routine can avoid 
> saving and restoring all call clobbered registers.

Do you have a testcase for this?  I think the current implementation
covers most use cases.

> Oh, and I wish that all the port independent code for interrupt functions was 
> shared across all ports, as redoing all this code for each port is silly 
> (sad).  An example of this would be the sibcall code, the fact that all call 
> saved registers need to be saved is another.  The EPILOGUE_USES or the 
> gen_rtx_USE is yet another.  Type checking the return type to ensure the 
> return type is void, likely another.

A very good point, but beyond this implementation :-(.

> One last comment, most folks use EPILOGUE_USES and mark up the registers as 
> used.  You don’t.  I’m not sure if both ways work equally well, or if there 
> is a reason to prefer one over the other.  Maybe someone could comment on 
> this, as in my port I use EPILOGUE_USES and it seems to work just fine.

We will take a look.

> On Sep 29, 2015, at 6:49 AM, Yulia Koval  wrote:
>> +  /* Always need to save SSE registrers in interrupt handler.  */
>
> Spelling registrers -> registers.
>
> Not in your code, but I noticed it:
>
>>/* All pointer bounds argumntas are handled separately here.  */
>
> Spelling argumntas -> arguments?

I checked in an obvious patch to fix those typos.

Thanks.


-- 
H.J.

[PATCH] Make compute_deps, extend_schedule static

2015-09-29 Thread Aditya Kumar

From: hiraditya 

No functional changes intended. Passes make check and bootstrap.
gcc/ChangeLog:

2015-09-29  Aditya Kumar  

* graphite-dependences.c (scop_get_dependences): Moved it down
so that compute_deps, now static, is defined before its use.
* graphite-poly.h: Removed compute_deps, and extend_schedule.
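
The reordering matters because a static function has no prototype in a header,
so its definition (or a forward declaration) has to precede the first call in
the file.  A minimal sketch with hypothetical names (count_nonzero,
any_nonzero), not the graphite code itself:

```c
#include <stddef.h>

/* Internal helper.  `static' gives it internal linkage, so nothing in a
   header announces it; it must be defined before its first use below.  */
static size_t
count_nonzero (const int *v, size_t n)
{
  size_t count = 0;

  for (size_t i = 0; i < n; i++)
    if (v[i] != 0)
      count++;
  return count;
}

/* Public entry point, placed after the helper it calls.  */
int
any_nonzero (const int *v, size_t n)
{
  return count_nonzero (v, n) > 0;
}
```

Had any_nonzero come first, the file would need a separate
`static size_t count_nonzero (const int *, size_t);' declaration above it.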

---
 gcc/graphite-dependences.c | 62 +++---
 gcc/graphite-poly.h| 16 
 2 files changed, 31 insertions(+), 47 deletions(-)

diff --git a/gcc/graphite-dependences.c b/gcc/graphite-dependences.c
index 85f16f3..e39394a 100644
--- a/gcc/graphite-dependences.c
+++ b/gcc/graphite-dependences.c
@@ -47,35 +47,6 @@ along with GCC; see the file COPYING3.  If not see
 #include "graphite-poly.h"
 
 
-isl_union_map *
-scop_get_dependences (scop_p scop)
-{
-  isl_union_map *dependences;
-
-  if (!scop->must_raw)
-compute_deps (scop, SCOP_BBS (scop),
- &scop->must_raw, &scop->may_raw,
- &scop->must_raw_no_source, &scop->may_raw_no_source,
- &scop->must_war, &scop->may_war,
- &scop->must_war_no_source, &scop->may_war_no_source,
- &scop->must_waw, &scop->may_waw,
- &scop->must_waw_no_source, &scop->may_waw_no_source);
-
-  dependences = isl_union_map_copy (scop->must_raw);
-  dependences = isl_union_map_union (dependences,
-isl_union_map_copy (scop->must_war));
-  dependences = isl_union_map_union (dependences,
-isl_union_map_copy (scop->must_waw));
-  dependences = isl_union_map_union (dependences,
-isl_union_map_copy (scop->may_raw));
-  dependences = isl_union_map_union (dependences,
-isl_union_map_copy (scop->may_war));
-  dependences = isl_union_map_union (dependences,
-isl_union_map_copy (scop->may_waw));
-
-  return dependences;
-}
-
 /* Add the constraints from the set S to the domain of MAP.  */
 
 static isl_map *
@@ -252,7 +223,7 @@ extend_schedule_1 (__isl_take isl_map *map, void *user)
 
 /* Return a relation that has uniform output dimensions.  */
 
-__isl_give isl_union_map *
+static __isl_give isl_union_map *
 extend_schedule (__isl_take isl_union_map *x)
 {
   int max = 0;
@@ -519,7 +490,7 @@ subtract_commutative_associative_deps (scop_p scop,
 /* Compute the original data dependences in SCOP for all the reads and
writes in PBBS.  */
 
-void
+static void
 compute_deps (scop_p scop, vec pbbs,
  isl_union_map **must_raw,
  isl_union_map **may_raw,
@@ -595,6 +566,35 @@ transform_is_safe (scop_p scop, isl_union_map *transform)
   return res;
 }
 
+isl_union_map *
+scop_get_dependences (scop_p scop)
+{
+  isl_union_map *dependences;
+
+  if (!scop->must_raw)
+compute_deps (scop, SCOP_BBS (scop),
+ &scop->must_raw, &scop->may_raw,
+ &scop->must_raw_no_source, &scop->may_raw_no_source,
+ &scop->must_war, &scop->may_war,
+ &scop->must_war_no_source, &scop->may_war_no_source,
+ &scop->must_waw, &scop->may_waw,
+ &scop->must_waw_no_source, &scop->may_waw_no_source);
+
+  dependences = isl_union_map_copy (scop->must_raw);
+  dependences = isl_union_map_union (dependences,
+isl_union_map_copy (scop->must_war));
+  dependences = isl_union_map_union (dependences,
+isl_union_map_copy (scop->must_waw));
+  dependences = isl_union_map_union (dependences,
+isl_union_map_copy (scop->may_raw));
+  dependences = isl_union_map_union (dependences,
+isl_union_map_copy (scop->may_war));
+  dependences = isl_union_map_union (dependences,
+isl_union_map_copy (scop->may_waw));
+
+  return dependences;
+}
+
 /* Return true when the SCOP transformed schedule is correct.  */
 
 bool
diff --git a/gcc/graphite-poly.h b/gcc/graphite-poly.h
index 3bd22f0..b2dbd36 100644
--- a/gcc/graphite-poly.h
+++ b/gcc/graphite-poly.h
@@ -456,22 +456,6 @@ scop_set_nb_params (scop_p scop, graphite_dim_t nb_params)
 }
 
 bool graphite_legal_transform (scop_p);
-__isl_give isl_union_map *extend_schedule (__isl_take isl_union_map *);
-
-void
-compute_deps (scop_p scop, vec pbbs,
- isl_union_map **must_raw,
- isl_union_map **may_raw,
- isl_union_map **must_raw_no_source,
- isl_union_map **may_raw_no_source,
- isl_union_map **must_war,
- isl_union_map **may_war,
- isl_union_map **must_war_no_source,
- isl_union_map **may_war_no_source,
- isl_union_map **must_waw,
- isl_union_map **may_waw,
- isl_union_map **must_waw_no_source,
- isl_union_map **may_waw_no_source);
 
 isl_unio

[Patch] Add OPT_Wattributes to ignored attributes on template args

2015-09-29 Thread Ryan Mansfield


Hi,

In canonicalize_type_argument attributes are being discarded with a 
warning. Should it be added to OPT_Wattributes?


2015-09-29  Ryan Mansfield  

* pt.c (canonicalize_type_argument): Use OPT_Wattributes in 
warning.



Index: cp/pt.c
===
--- cp/pt.c (revision 228265)
+++ cp/pt.c (working copy)
@@ -6888,7 +6888,7 @@
   tree canon = strip_typedefs (arg, &removed_attributes);
   if (removed_attributes
   && (complain & tf_warning))
-warning (0, "ignoring attributes on template argument %qT", arg);
+warning (OPT_Wattributes, "ignoring attributes on template argument %qT", arg);

   return canon;
 }

Regards,

Ryan Mansfield

[testsuite] Fix order of dg-do and dg-require-effective-target directives

2015-09-29 Thread Christophe Lyon

I have noticed that both dg-do and dg-require-effective-target modify
the value of dg-do-what, which means that dg-do directives must appear
before dg-require-effective-target.

Indeed if the effective-target property is false, but dg-do is
executed later, the test would fail instead of being unsupported.
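
A minimal sketch of a testcase header in the order described, with dg-do first
and dg-require-effective-target after it.  The vect_int keyword and the
`#define N 16' line are taken from one of the testcases touched by the patch;
the function body is hypothetical filler.

```c
/* { dg-do compile } */
/* { dg-require-effective-target vect_int } */

#define N 16

/* A simple reduction loop of the kind a vectorizer test might contain.  */
int
sum_array (const int *a)
{
  int sum = 0;

  for (int i = 0; i < N; i++)
    sum += a[i];
  return sum;
}
```

With the two directives swapped, a target lacking vect_int would have the
unsupported marking overwritten when dg-do is processed afterwards.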

The attached patch fixes the order on the few testcases where I
noticed it was wrong.

Tested on several arm* and aarch64* targets/multilibs with no regression.

OK?

Christophe.
2015-09-29  Christophe Lyon  

* g++.dg/cpp0x/stdint.C: Move dg-require-effective-target after
dg-do.
* g++.dg/gomp/tls-wrap4.C: Likewise.
* gcc.dg/atomic-op-optimize.c: Likewise.
* gcc.dg/pr54087.c: Likewise.
* gcc.dg/tls/section-2.c: Likewise.
* gcc.dg/vect/costmodel/ppc/costmodel-bb-slp-9a-pr63175.c:
Likewise.
* gcc.dg/vect/costmodel/ppc/costmodel-pr37194.c: Likewise.
* gcc.dg/vect/trapv-vect-reduc-4.c: Likewise.
* gcc.target/arm/divzero.c: Likewise.
* gcc.target/arm/sibcall-2.c: Likewise.
* gcc.target/arm/thumb1-Os-mult.c: Likewise.
* gcc.target/arm/thumb1-load-64bit-constant-1.c: Likewise.
* gcc.target/arm/thumb1-load-64bit-constant-2.c: Likewise.
* gcc.target/arm/thumb1-load-64bit-constant-3.c: Likewise.
* gcc.target/arm/volatile-bitfields-1.c: Likewise.
* gcc.target/arm/volatile-bitfields-2.c: Likewise.
* gcc.target/arm/volatile-bitfields-3.c: Likewise.
* gcc.target/arm/volatile-bitfields-4.c: Likewise.
* gfortran.dg/default_format_2.f90: Likewise.
* gfortran.dg/default_format_denormal_2.f90: Likewise.
diff --git a/gcc/testsuite/g++.dg/cpp0x/stdint.C b/gcc/testsuite/g++.dg/cpp0x/stdint.C
index 434d458..6c213d7 100644
--- a/gcc/testsuite/g++.dg/cpp0x/stdint.C
+++ b/gcc/testsuite/g++.dg/cpp0x/stdint.C
@@ -1,6 +1,6 @@
 // PR c++/52764
-// { dg-require-effective-target stdint_types }
 // { dg-do compile { target c++11 } }
+// { dg-require-effective-target stdint_types }
 
 #include 
 
diff --git a/gcc/testsuite/g++.dg/gomp/tls-wrap4.C b/gcc/testsuite/g++.dg/gomp/tls-wrap4.C
index 59a5683..dca249d 100644
--- a/gcc/testsuite/g++.dg/gomp/tls-wrap4.C
+++ b/gcc/testsuite/g++.dg/gomp/tls-wrap4.C
@@ -1,8 +1,8 @@
 // We don't need to call the wrapper through the PLT; we can use a separate
 // copy per shared object.
 
-// { dg-require-effective-target tls }
 // { dg-do compile { target c++11 } }
+// { dg-require-effective-target tls }
 // { dg-options "-fPIC" }
 // { dg-final { scan-assembler-not "_ZTW1i@PLT" { target i?86-*-* x86_64-*-* } } }
 
diff --git a/gcc/testsuite/gcc.dg/atomic-op-optimize.c b/gcc/testsuite/gcc.dg/atomic-op-optimize.c
index d2e960a..66efee4 100644
--- a/gcc/testsuite/gcc.dg/atomic-op-optimize.c
+++ b/gcc/testsuite/gcc.dg/atomic-op-optimize.c
@@ -2,8 +2,8 @@
Test that it at happens on x86 by making sure there are 2 xchg's and no
compare_exchange loop.  */
 
-/* { dg-require-effective-target sync_int_long } */
 /* { dg-do compile { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-require-effective-target sync_int_long } */
 /* { dg-final { scan-assembler-times "cmpxchg" 0 } } */
 /* { dg-final { scan-assembler-times "xchg" 2 } } */
 
diff --git a/gcc/testsuite/gcc.dg/pr54087.c b/gcc/testsuite/gcc.dg/pr54087.c
index abb0af3..5874e9c 100644
--- a/gcc/testsuite/gcc.dg/pr54087.c
+++ b/gcc/testsuite/gcc.dg/pr54087.c
@@ -1,7 +1,7 @@
 /* PR54087.  Verify __atomic_sub (val) uses __atomic_add (-val) if there is no
  atomic_aub.  */
-/* { dg-require-effective-target sync_int_long } */
 /* { dg-do compile { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-require-effective-target sync_int_long } */
 /* { dg-final { scan-assembler-times "xadd" 2 } } */
 
 
diff --git a/gcc/testsuite/gcc.dg/tls/section-2.c b/gcc/testsuite/gcc.dg/tls/section-2.c
index 8f11def..9c21307 100644
--- a/gcc/testsuite/gcc.dg/tls/section-2.c
+++ b/gcc/testsuite/gcc.dg/tls/section-2.c
@@ -1,7 +1,7 @@
 /* Verify that we get errors for trying to put TLS data in 
sections which can't work.  */
-/* { dg-require-effective-target tls } */
 /* { dg-do compile { target *-*-vxworks } } */
+/* { dg-require-effective-target tls } */
 
 #define A(X)	__attribute__((section(X)))
 
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-bb-slp-9a-pr63175.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-bb-slp-9a-pr63175.c
index bf6053d..409e685 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-bb-slp-9a-pr63175.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-bb-slp-9a-pr63175.c
@@ -1,5 +1,5 @@
-/* { dg-require-effective-target vect_int } */
 /* { dg-do compile } */
+/* { dg-require-effective-target vect_int } */
 
 #define N 16 
 
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-pr37194.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-pr37194.c
index e0093c4..5b5dd1b 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/

RE: [PATCH][AArch64] Add separate insn sched class for vector LDP & STP

2015-09-29 Thread Evandro Menezes

It's been committed as 228253.

Thank y'all for playing.

Cheers,

-- 
Evandro Menezes  Austin, TX


> -Original Message-
> From: Kyrill Tkachov [mailto:kyrylo.tkac...@arm.com]
> Sent: Tuesday, September 29, 2015 4:01
> To: Marcus Shawcroft; Evandro Menezes; gcc-patches@gcc.gnu.org
> Cc: James Greenhalgh; Ramana Radhakrishnan
> Subject: Re: [PATCH][AArch64] Add separate insn sched class for vector LDP &
> STP
> 
> 
> On 29/09/15 09:03, Marcus Shawcroft wrote:
> > On 29/09/15 00:52, Evandro Menezes wrote:
> >> In some micro-architectures the insns to load or store pairs of
> >> vector registers are implemented rather differently from those
> >> affecting lanes in vector registers.  Then, it's important that such
> >> insns be described likewise differently in the scheduling model.
> >>
> >> This patch adds the insn types neon_ldp{,_q} and neon_stp{,_q} apart
> >> from the current neon_load2_2reg_q and neon_store2_2reg_q types,
> >> respectively.
> >>
> > Hi,
> >
> > The AArch64 part of this is OK. Please wait for Kyrill or Ramana to
> > comment on ARM side.  Cheers /Marcus
> >
> 
> This is ok arm-wise. I see the instructions being modelled with this type
> don't have a direct arm equivalent anyway.
> Marcus' comment on the ChangeLog still applies.
> 
> Thanks,
> Kyrill
> 
> >> Thank you,
> >>
> >> -- Evandro Menezes
> >>
> >>
> >> 0001-AArch64-Add-separate-insn-sched-class-for-vector-LDP.patch
> >>
> >>
> >>   From 340249dcd2af8dfce486cb4f62d4eaf285c6a799 Mon Sep 17 00:00:00
> >> 2001
> >> From: Evandro Menezes
> >> Date: Mon, 28 Sep 2015 15:00:00 -0500
> >> Subject: [PATCH] [AArch64] Add separate insn sched class for vector
> >> LDP & STP
> >>
> >> 2015-09-28  Evandro Menezes
> >>
> >>gcc/
> >>* config/arm/types.md (neon_ldp, neon_ldp_q, neon_stp, neon_stp_q):
> >>add new insn types for vector load and store pairs.
> > s/add/Add/ and likewise the rest of the changelog comments.
> >
> >>* config/arm/cortex-a53.md (cortex_a53_f_load_2reg): add insn
> >>types "neon_ldp{,_q}".
> >>* config/arm/cortex-a57.md (neon_load_c): add insn types
> >>"neon_ldp{,_q}".
> >>(neon_store_complex): add insn types "neon_stp{,_q}".
> >>* config/aarch64/aarch64-simd.md (aarch64_be_movoi): add insn types
> >>"neon_{ldp,stp}_q".

Re: [PATCH] Clear variables with stale SSA_NAME_RANGE_INFO (PR tree-optimization/67690)

2015-09-29 Thread Richard Biener

On September 29, 2015 4:21:16 PM GMT+02:00, Marek Polacek  
wrote:
>On Fri, Sep 25, 2015 at 06:22:44PM +0200, Richard Biener wrote:
>> On September 25, 2015 3:49:34 PM GMT+02:00, Marek Polacek
> wrote:
>> >On Fri, Sep 25, 2015 at 09:29:30AM +0200, Richard Biener wrote:
>> >> On Thu, 24 Sep 2015, Marek Polacek wrote:
>> >> 
>> >> > As Richi said in
>> >,
>> >> > using recorded SSA name range infos in VRP is likely to expose
>> >errors in the
>> >> > ranges.  This PR is such a case.  As discussed in the PR, after
>> >tail merging
>> >> > via PRE the range infos cannot be relied upon anymore, so we
>need
>> >to clear
>> >> > them.
>> >> > 
>> >> > Since tree-ssa-ifcombine.c already had code to clean up the flow
>> >data in a BB,
>> >> > I've factored it out to a common function.
>> >> > 
>> >> > Bootstrapped/regtested on x86_64-linux, ok for trunk and 5?
>> >> 
>> >> I believe for tail-merge you also need to clear range info on
>> >> PHI defs in the BB.  For ifcombine this wasn't necessary (no PHI
>> >nodes
>> >> in the relevant CFG), but it's ok to extend the new 
>> >> reset_flow_sensitive_info_in_bb function to also reset PHI defs.
>> >
>> >All right.
>> > 
>> >> Ok with that change.
>> >
>> >Since I'm not completely sure if I did the right thing here, could
>you
>> >please have another look at the new function?
>> 
>> Doesn't work that way.  You need to iterate over the PHI sequence
>separately via gsi_start_phis(bb), etc.
>
>Oops, sorry.  So like this?
>
>Bootstrapped/regtested on x86_64-linux, ok for trunk (and a similar
>patch for 5)?

Yes, thanks
Richard.

>2015-09-29  Marek Polacek  
>
>   PR tree-optimization/67690
>   * tree-ssa-ifcombine.c (pass_tree_ifcombine::execute): Call
>   reset_flow_sensitive_info_in_bb.
>   * tree-ssa-tail-merge.c (replace_block_by): Likewise.
>   * tree-ssanames.c: Include "gimple-iterator.h".
>   (reset_flow_sensitive_info_in_bb): New function.
>   * tree-ssanames.h (reset_flow_sensitive_info_in_bb): Declare.
>
>   * gcc.dg/torture/pr67690.c: New test.
>
>diff --git gcc/testsuite/gcc.dg/torture/pr67690.c
>gcc/testsuite/gcc.dg/torture/pr67690.c
>index e69de29..491de51 100644
>--- gcc/testsuite/gcc.dg/torture/pr67690.c
>+++ gcc/testsuite/gcc.dg/torture/pr67690.c
>@@ -0,0 +1,32 @@
>+/* { dg-do run } */
>+
>+const int c1 = 1;
>+const int c2 = 2;
>+
>+int
>+check (int i)
>+{
>+  int j;
>+  if (i >= 0)
>+j = c2 - i;
>+  else
>+j = c2 - i;
>+  return c2 - c1 + 1 > j;
>+}
>+
>+int invoke (int *pi) __attribute__ ((noinline,noclone));
>+int
>+invoke (int *pi)
>+{
>+  return check (*pi);
>+}
>+
>+int
>+main ()
>+{
>+  int i = c1;
>+  int ret = invoke (&i);
>+  if (!ret)
>+__builtin_abort ();
>+  return 0;
>+}
>diff --git gcc/tree-ssa-ifcombine.c gcc/tree-ssa-ifcombine.c
>index 9f04174..66be430 100644
>--- gcc/tree-ssa-ifcombine.c
>+++ gcc/tree-ssa-ifcombine.c
>@@ -769,16 +769,7 @@ pass_tree_ifcombine::execute (function *fun)
> {
>   /* Clear range info from all stmts in BB which is now executed
>  conditional on a always true/false condition.  */
>-  for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
>-   !gsi_end_p (gsi); gsi_next (&gsi))
>-{
>-  gimple *stmt = gsi_stmt (gsi);
>-  ssa_op_iter i;
>-  tree op;
>-  FOR_EACH_SSA_TREE_OPERAND (op, stmt, i, SSA_OP_DEF)
>-reset_flow_sensitive_info (op);
>-}
>-
>+  reset_flow_sensitive_info_in_bb (bb);
>   cfg_changed |= true;
> }
> }
>diff --git gcc/tree-ssa-tail-merge.c gcc/tree-ssa-tail-merge.c
>index 0ce59e8..487961e 100644
>--- gcc/tree-ssa-tail-merge.c
>+++ gcc/tree-ssa-tail-merge.c
>@@ -1534,6 +1534,10 @@ replace_block_by (basic_block bb1, basic_block
>bb2)
>   e2->probability = GCOV_COMPUTE_SCALE (e2->count, out_sum);
> }
> 
>+  /* Clear range info from all stmts in BB2 -- this transformation
>+ could make them out of date.  */
>+  reset_flow_sensitive_info_in_bb (bb2);
>+
>   /* Do updates that use bb1, before deleting bb1.  */
>   release_last_vdef (bb1);
>   same_succ_flush_bb (bb1);
>diff --git gcc/tree-ssanames.c gcc/tree-ssanames.c
>index 4199290..7235dc3 100644
>--- gcc/tree-ssanames.c
>+++ gcc/tree-ssanames.c
>@@ -23,6 +23,7 @@ along with GCC; see the file COPYING3.  If not see
> #include "backend.h"
> #include "tree.h"
> #include "gimple.h"
>+#include "gimple-iterator.h"
> #include "hard-reg-set.h"
> #include "ssa.h"
> #include "alias.h"
>@@ -544,6 +545,29 @@ reset_flow_sensitive_info (tree name)
> SSA_NAME_RANGE_INFO (name) = NULL;
> }
> 
>+/* Clear all flow sensitive data from all statements and PHI
>definitions
>+   in BB.  */
>+
>+void
>+reset_flow_sensitive_info_in_bb (basic_block bb)
>+{
>+  for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi);
>+   gsi_next (&gsi))
>+{
>+  gimple *stmt = gsi_stmt (gsi);
>+

[google][gcc-4_9] Remove unused key field in gcov_fn_info

2015-09-29 Thread Rong Xu

Hi,

This patch is for google/gcc-4_9 branch.

The 'key' field in gcov_fn_info is designed to allow gcov function
data to be COMDATTed, but the comdat elimination never works. This
patch removes this field to reduce the instrumented object size.
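
The size effect can be sketched with a simplified mock of the record.  The
field names follow the ones visible in the patch (key, ident,
lineno_checksum, cfg_checksum, ctrs); the typedefs, the ctr_info layout and
the helper bytes_saved_per_function are assumptions for illustration, and the
real struct on the branch may carry more fields.

```c
#include <stddef.h>

typedef unsigned int gcov_unsigned_t;  /* assumed width */
typedef long long gcov_type;           /* assumed width */

struct gcov_info;                      /* opaque here */

struct ctr_info
{
  gcov_unsigned_t num;
  gcov_type *values;
};

/* Layout with the per-function back pointer.  */
struct fn_info_with_key
{
  const struct gcov_info *key;
  gcov_unsigned_t ident;
  gcov_unsigned_t lineno_checksum;
  gcov_unsigned_t cfg_checksum;
  struct ctr_info ctrs[1];
};

/* Layout after the patch: same fields minus key.  */
struct fn_info_without_key
{
  gcov_unsigned_t ident;
  gcov_unsigned_t lineno_checksum;
  gcov_unsigned_t cfg_checksum;
  struct ctr_info ctrs[1];
};

/* Bytes saved for each instrumented function in this mock layout.  */
size_t
bytes_saved_per_function (void)
{
  return sizeof (struct fn_info_with_key)
	 - sizeof (struct fn_info_without_key);
}
```

Since one such record is emitted per instrumented function, the saving scales
with the number of functions in the object.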

Thanks,

-Rong
Removed the unused 'key' field in gcov_fn_info to reduce the 
instrumented object size.

2015-09-29  Rong Xu  

* gcc/coverage.c (build_fn_info_type): Remove 'key'
field. (build_fn_info): Ditto.
(coverage_obj_fn): Ditto.
* libgcc/libgcov.h (struct gcov_fn_info): Ditto.
* libgcc/libgcov-driver.c (gcov_compute_histogram): Ditto.
(gcov_exit_compute_summary): Ditto.
(gcov_exit_merge_gcda): Ditto.
(gcov_write_func_counters): Ditto.
(gcov_clear): Ditto.
* libgcc/libgcov-util.c (tag_function): Ditto.
(gcov_merge): Ditto.
(gcov_profile_scale): Ditto.
(gcov_profile_normalize): Ditto.
(compute_one_gcov): Ditto.
(gcov_info_count_all_cold): Ditto.

Index: gcc/coverage.c
===
--- gcc/coverage.c  (revision 228223)
+++ gcc/coverage.c  (working copy)
@@ -189,7 +189,7 @@ static void read_counts_file (const char *, unsign
 static tree build_var (tree, tree, int);
 static void build_fn_info_type (tree, unsigned, tree);
 static void build_info_type (tree, tree);
-static tree build_fn_info (const struct coverage_data *, tree, tree);
+static tree build_fn_info (const struct coverage_data *, tree);
 static tree build_info (tree, tree);
 static bool coverage_obj_init (void);
 static vec *coverage_obj_fn
@@ -1668,16 +1668,9 @@ build_fn_info_type (tree type, unsigned counters,
 
   finish_builtin_struct (ctr_info, "__gcov_ctr_info", fields, NULL_TREE);
 
-  /* key */
-  field = build_decl (BUILTINS_LOCATION, FIELD_DECL, NULL_TREE,
- build_pointer_type (build_qualified_type
- (gcov_info_type, TYPE_QUAL_CONST)));
-  fields = field;
-
   /* ident */
   field = build_decl (BUILTINS_LOCATION, FIELD_DECL, NULL_TREE,
  get_gcov_unsigned_t ());
-  DECL_CHAIN (field) = fields;
   fields = field;
 
   /* lineno_checksum */
@@ -1705,10 +1698,10 @@ build_fn_info_type (tree type, unsigned counters,
 
 /* Returns a CONSTRUCTOR for a gcov_fn_info.  DATA is
the coverage data for the function and TYPE is the gcov_fn_info
-   RECORD_TYPE.  KEY is the object file key.  */
+   RECORD_TYPE.  */
 
 static tree
-build_fn_info (const struct coverage_data *data, tree type, tree key)
+build_fn_info (const struct coverage_data *data, tree type)
 {
   tree fields = TYPE_FIELDS (type);
   tree ctr_type;
@@ -1716,11 +1709,6 @@ static tree
   vec *v1 = NULL;
   vec *v2 = NULL;
 
-  /* key */
-  CONSTRUCTOR_APPEND_ELT (v1, fields,
- build1 (ADDR_EXPR, TREE_TYPE (fields), key));
-  fields = DECL_CHAIN (fields);
-  
   /* ident */
   CONSTRUCTOR_APPEND_ELT (v1, fields,
  build_int_cstu (get_gcov_unsigned_t (),
@@ -2556,7 +2544,7 @@ static vec *
 coverage_obj_fn (vec *ctor, tree fn,
 struct coverage_data const *data)
 {
-  tree init = build_fn_info (data, gcov_fn_info_type, gcov_info_var);
+  tree init = build_fn_info (data, gcov_fn_info_type);
   tree var = build_var (fn, gcov_fn_info_type, -1);
   
   DECL_INITIAL (var) = init;
Index: libgcc/libgcov-driver.c
===
--- libgcc/libgcov-driver.c (revision 227984)
+++ libgcc/libgcov-driver.c (working copy)
@@ -380,7 +380,7 @@ gcov_compute_histogram (struct gcov_summary *sum)
 {
   gfi_ptr = gi_ptr->functions[f_ix];
 
-  if (!gfi_ptr || gfi_ptr->key != gi_ptr)
+  if (!gfi_ptr)
 continue;
 
   ci_ptr = &gfi_ptr->ctrs[ctr_info_ix];
@@ -430,9 +430,6 @@ gcov_exit_compute_summary (struct gcov_summary *th
 {
   gfi_ptr = gi_ptr->functions[f_ix];
 
-  if (gfi_ptr && gfi_ptr->key != gi_ptr)
-gfi_ptr = 0;
-
   crc32 = crc32_unsigned (crc32, gfi_ptr ? gfi_ptr->cfg_checksum : 0);
   crc32 = crc32_unsigned (crc32,
   gfi_ptr ? gfi_ptr->lineno_checksum : 0);
@@ -688,7 +685,7 @@ gcov_exit_merge_gcda (struct gcov_info *gi_ptr,
   if (length != GCOV_TAG_FUNCTION_LENGTH)
 goto read_mismatch;
 
-  if (!gfi_ptr || gfi_ptr->key != gi_ptr)
+  if (!gfi_ptr)
 {
   /* This function appears in the other program.  We
  need to buffer the information in order to write
@@ -832,10 +829,8 @@ gcov_write_func_counters (struct gcov_info *gi_ptr
   else
 {
   gfi_ptr = gi_ptr->functions[f_ix];
-  if (gfi_ptr && gfi_ptr->key == gi_ptr)
+  if (gfi_ptr)
 length = GCOV_TAG_FUNCTION_LENGTH;
-  else
-length = 0;
 }
 
   gcov_write_tag_length

Re: [PATCH] Make compute_deps, extend_schedule static

2015-09-29 Thread Tobias Grosser


On 09/29/2015 10:19 PM, Aditya Kumar wrote:

From: hiraditya 

No functional changes intended. Passes make check and bootstrap.


LGTM.

Tobias

Re: [testsuite] Fix order of dg-do and dg-require-effective-target directives

2015-09-29 Thread Mike Stump

On Sep 29, 2015, at 1:29 PM, Christophe Lyon  wrote:
> The attached patch fixes the order on the few testcases where I
> noticed it was wrong.

> OK?

Ok.

Re: [AArch64_be] Fix vtbl[34] and vtbx4

2015-09-29 Thread Christophe Lyon

Ping?


On 15 September 2015 at 18:25, Christophe Lyon
 wrote:
> This patch re-implements vtbl[34] and vtbx4 AdvSIMD intrinsics using
> existing builtins, and fixes the behaviour on aarch64_be.
>
> Tested on aarch64_be-none-elf and aarch64-none-elf using the Foundation Model.
>
> OK?
>
> Christophe.

Re: [PATCH] x86 interrupt attribute

2015-09-29 Thread H.J. Lu

On Tue, Sep 29, 2015 at 1:16 PM, H.J. Lu  wrote:
> On Tue, Sep 29, 2015 at 11:49 AM, Mike Stump  wrote:
>> To be feature complete, it would be nice to have two styles of interrupt 
>> functions, one that returns with iret, and one that returns with ret.  The 
>> point is that the user might want to call functions from an interrupt handler 
>> and not save and restore all call clobbered registers.  By allowing a ret 
>> style interrupt handler, calls to a ret style interrupt routine can avoid 
>> saving and restoring all call clobbered registers.
>
> Do you have a testcase for this?  I think the current implementation
> covers most use cases.
>
>> Oh, and I wish that all the port independent code for interrupt functions 
>> was shared across all ports, as redoing all this code for each port is silly 
>> (sad).  An example of this would be the sibcall code, the fact that all 
>> call saved registers need to be saved is another.  The EPILOGUE_USES or the 
>> gen_rtx_USE is yet another.  Type checking the return type to ensure the 
>> return type is void, likely another.
>
> A very good point, but beyond this implementation :-(.
>
>> One last comment, most folks use EPILOGUE_USES and mark up the registers as 
>> used.  You don’t.  I’m not sure if both ways work equally well, or if there 
>> is a reason to prefer one over the other.  Maybe someone could comment on 
>> this, as in my port I use EPILOGUE_USES and it seems to work just fine.
>
> We will take a look.

Julia, I checked a patch into hjl/interrupt/master branch to
define EPILOGUE_USES in i386:

commit f3a6675a8d69d810d2cad0c090a762094a0a8622
Author: H.J. Lu 
Date:   Tue Sep 29 13:47:18 2015 -0700

Define EPILOGUE_USES in i386

Define EPILOGUE_USES in i386 so that all preserved registers are used
by the epilogue of interrupt handler.  Don't explicitly mark BP and SP
registers as used since they are always used in epilogue.

Please take a look.


-- 
H.J.

Re: [PATCH] x86 interrupt attribute

2015-09-29 Thread Mike Stump

On Sep 29, 2015, at 1:16 PM, H.J. Lu  wrote:
> On Tue, Sep 29, 2015 at 11:49 AM, Mike Stump  wrote:
>> To be feature complete, it would be nice to have two styles of interrupt 
>> functions, one that returns with iret, and one that returns with ret.  The 
>> point is that the user might want to call functions from an interrupt handler 
>> and not save and restore all call clobbered registers.  By allowing a ret 
>> style interrupt handler, calls to a ret style interrupt routine can avoid 
>> saving and restoring all call clobbered registers.
> 
> Do you have a testcase for this?  I think the current implementation
> covers most use cases.

When I wrote my interrupt support for my cpu, I ran these through the code 
generator…  I have many registers, and noticed saving and restoring them all 
just because two interrupt handlers used the same routine was silly.  Test case 
is trivial:

interrupt void foo2() {
  bar();
}

interrupt void foo1() {
  bar();
}

If more than 1-2 registers are saved, then it is likely saving all call-used 
registers.  Saving all means that one cannot use functions to compose semantics 
and still attain performance.  Performance of ISR routines is, I think, worth 
shooting for; given that it is easy enough to attain, I don’t see the harm in 
doing that.  Even if the first implementation doesn’t bother with performance, 
if you spec the other function, the user code need never change; and when 
performance does matter, it is then a mere matter of enhancing the code gen to 
do the right thing.  It is pretty easy to get most of the benefit without much 
work.  I call the main interrupt function interrupt, and the recursive (ret 
style) one I call interruptr.  The r is for recursive.

Re: [PATCH] x86 interrupt attribute

2015-09-29 Thread Mike Stump

On Sep 29, 2015, at 1:59 PM, H.J. Lu  wrote:
> commit f3a6675a8d69d810d2cad0c090a762094a0a8622
> Author: H.J. Lu 
> Date:   Tue Sep 29 13:47:18 2015 -0700
> 
>Define EPILOGUE_USES in i386 so that all preserved registers are used
>by the epilogue of interrupt handler.  Don't explicitly mark BP and SP
>registers as used since they are always used in epilogue.
> 
> Please take a look.

Oh, too bad you didn’t copy it here.  The easiest thing to blow is the addition 
of reload_completed && on the condition:

  /* An interrupt handler must preserve some registers that are
     ordinarily call-clobbered.  */
  if (reload_completed
      && myarch_interrupt_func (current_function_decl)
      && save_reg_p (regno))
    return true;

Without it, the optimizer will blow chunks all over the place, and code-gen will 
not be very good even if it doesn’t.  I’d love this to be shared across all 
ports, as it is cryptic and test cases are usually not elaborate enough to find 
the problem.  When we ported a large library to our system that made extensive 
use of complex interrupt routines, the compiler blew chunks.  With lesser code, 
we never even noticed a problem.
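
The effect of the reload_completed guard can be modelled outside the compiler. The following is only a toy model: reload_done, in_interrupt_handler, reg_is_saved and epilogue_uses_reg are hypothetical stand-ins for reload_completed, the port's interrupt-attribute check, save_reg_p and the EPILOGUE_USES predicate, and none of it is code from any real port.

```c
#include <assert.h>
#include <stdbool.h>

/* All names below are hypothetical stand-ins, not GCC internals.  */
#define NUM_REGS 8

static bool reload_done;              /* models reload_completed */
static bool in_interrupt_handler;     /* models the attribute check */
static bool reg_is_saved[NUM_REGS];   /* models save_reg_p */

/* Report a register as used by the epilogue only once register
   allocation has finished, so earlier passes are not constrained.  */
static bool
epilogue_uses_reg (int regno)
{
  assert (regno >= 0 && regno < NUM_REGS);
  return reload_done
	 && in_interrupt_handler
	 && reg_is_saved[regno];
}
```

Before reload_done is set, the predicate answers false for every register; afterwards it answers true only for registers the handler actually saves.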

Re: [PATCH] x86 interrupt attribute

2015-09-29 Thread Mike Stump

On Sep 29, 2015, at 2:23 PM, Mike Stump  wrote:
> On Sep 29, 2015, at 1:59 PM, H.J. Lu  wrote:
>> commit f3a6675a8d69d810d2cad0c090a762094a0a8622
>> Author: H.J. Lu 
>> Date:   Tue Sep 29 13:47:18 2015 -0700
>> 
>>   Define EPILOGUE_USES in i386

>> Please take a look.

Oh, and with that, I don’t think one needs the generated USEs anymore.

[RFA][PATCH] Fix building cr16-elf with trunk compiler

2015-09-29 Thread Jeff Law



This code from builtins.c:

  /* If we don't need too much alignment, we'll have been guaranteed
 proper alignment by get_trampoline_type.  */
  if (TRAMPOLINE_ALIGNMENT <= STACK_BOUNDARY)
    return tramp;


It's entirely conceivable that TRAMPOLINE_ALIGNMENT will be the same as 
STACK_BOUNDARY.  And if they are, then -Wtautological-compare will 
complain bitterly.


This affects the cr16 port and possibly others (I've had this fix in my 
tree while running the config-all.mk builds).


Given the real possibility that those two objects are the same and thus 
the complaint from -Wtautological-compare, it seems best to simply 
disable -Wtautological-compare for this function.
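
For readers unfamiliar with the technique, here is a minimal standalone sketch of scoping a diagnostic to one function. TRAMP_ALIGN_BITS, STACK_BOUNDARY_BITS and round_addr are made-up names with made-up values chosen to be equal; the sketch does not claim that any particular compiler version warns on this comparison.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical target values; on some ports the two are equal.  */
#define TRAMP_ALIGN_BITS 64
#define STACK_BOUNDARY_BITS 64

#if defined (__GNUC__)
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wtautological-compare"
#endif

/* Round ADDR up to the trampoline alignment unless the stack
   boundary already guarantees it.  */
static uintptr_t
round_addr (uintptr_t addr)
{
  if (TRAMP_ALIGN_BITS <= STACK_BOUNDARY_BITS)
    return addr;

  uintptr_t align = TRAMP_ALIGN_BITS / 8;
  assert ((align & (align - 1)) == 0);
  return (addr + align - 1) & ~(align - 1);
}

#if defined (__GNUC__)
#pragma GCC diagnostic pop
#endif
```

The push/pop pair keeps the warning enabled for the rest of the translation unit, so only the one comparison that is expected to be constant on some targets is exempted.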


Bootstrapped and regression tested on x86_64-linux-gnu and also used to 
successfully build cr16-elf cross compilers from config-all.mk.


OK for the trunk?

Other alternatives would be to obfuscate the appropriate macros in the 
cr16 port.  That seemed wrong in this case to me.


Jeff
* builtins.c (round_trampoline_addr): Turn off -Wtautological-compare
when compiling this function.

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 1592810..e4ed470 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -4830,6 +4830,11 @@ expand_builtin___clear_cache (tree exp)
   return const0_rtx;
 }
 
+#if GCC_VERSION >= 6000
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wtautological-compare"
+#endif
+
 /* Given a trampoline address, make sure it satisfies TRAMPOLINE_ALIGNMENT.  */
 
 static rtx
@@ -4854,6 +4859,9 @@ round_trampoline_addr (rtx tramp)
 
   return tramp;
 }
+#if GCC_VERSION >= 6000
+#pragma GCC diagnostic pop
+#endif
 
 static rtx
 expand_builtin_init_trampoline (tree exp, bool onstack)

[PATCH] use MIN fusion for ISL-14

2015-09-29 Thread Sebastian Pop

This patch fixes PR67754 by reverting an earlier unintended change.
We now generate a much simpler AST for interchange-1.c:

ISL AST generated by ISL:
{
  for (int c1 = 0; c1 <= 1334; c1 += 1) {
    S_7(c1);
    for (int c3 = 0; c3 <= 1334; c3 += 1)
      S_4(c1, c3);
    S_5(c1);
  }
  for (int c1 = 0; c1 <= 1334; c1 += 1)
    S_10(c1);
  S_8();
}
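
The AST above reads as ordinary C. As a sketch, the loop nest can be compiled with stub statements that only count how often they run; the counters and the run_schedule wrapper are additions for illustration, and the real statement bodies from interchange-1.c are not reproduced.

```c
#include <assert.h>

/* Execution counters; the statement bodies themselves are stubs.  */
static long n_s4, n_s5, n_s7, n_s8, n_s10;

static void
S_4 (int c1, int c3)
{
  assert (c1 >= 0 && c1 <= 1334 && c3 >= 0 && c3 <= 1334);
  n_s4++;
}

static void S_5 (int c1) { (void) c1; n_s5++; }
static void S_7 (int c1) { (void) c1; n_s7++; }
static void S_10 (int c1) { (void) c1; n_s10++; }
static void S_8 (void) { n_s8++; }

/* The schedule exactly as printed in the AST.  */
static void
run_schedule (void)
{
  for (int c1 = 0; c1 <= 1334; c1 += 1) {
    S_7 (c1);
    for (int c3 = 0; c3 <= 1334; c3 += 1)
      S_4 (c1, c3);
    S_5 (c1);
  }
  for (int c1 = 0; c1 <= 1334; c1 += 1)
    S_10 (c1);
  S_8 ();
}
```

Calling run_schedule once executes S_4 for every (c1, c3) pair, S_7 and S_5 once per outer iteration, S_10 once per iteration of the second loop, and S_8 a single time.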

Bootstrap and check pass on x86_64-linux with isl-0.14.1

  PR tree-optimization/67754
  * graphite-optimize-isl.c (optimize_isl): Call
  isl_options_set_schedule_fuse with ISL_SCHEDULE_FUSE_MIN for ISL-14.
---
 gcc/graphite-optimize-isl.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/graphite-optimize-isl.c b/gcc/graphite-optimize-isl.c
index 4b82174..512c64c 100644
--- a/gcc/graphite-optimize-isl.c
+++ b/gcc/graphite-optimize-isl.c
@@ -327,9 +327,10 @@ optimize_isl (scop_p scop)
   isl_options_set_schedule_max_constant_term (scop->ctx, CONSTANT_BOUND);
   isl_options_set_schedule_maximize_band_depth (scop->ctx, 1);
 #ifdef HAVE_ISL_OPTIONS_SET_SCHEDULE_SERIALIZE_SCCS
+  /* ISL-0.15 or later.  */
   isl_options_set_schedule_serialize_sccs (scop->ctx, 1);
 #else
-  isl_options_set_schedule_fuse (scop->ctx, ISL_SCHEDULE_FUSE_MAX);
+  isl_options_set_schedule_fuse (scop->ctx, ISL_SCHEDULE_FUSE_MIN);
 #endif
 
 #ifdef HAVE_ISL_SCHED_CONSTRAINTS_COMPUTE_SCHEDULE
-- 
2.1.0.243.g30d45f7
