date:20160226

[PATCH] Fix PR69951

2016-02-26 Thread Richard Biener


The following fixes PR69951, hopefully the last case of decl alias
issues with alias analysis.  This time it's points-to and the DECL_UIDs
used in points-to sets not being canonicalized.

The simplest (and cheapest) fix is to make aliases refer to the
ultimate alias target via their DECL_PT_UID which we conveniently
have available.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2016-02-26  Richard Biener  

PR tree-optimization/69551
* tree-ssa-structalias.c (get_constraint_for_ssa_var): When
looking through aliases adjust DECL_PT_UID to refer to the
ultimate alias target.

* gcc.dg/torture/pr69951.c: New testcase.

Index: gcc/tree-ssa-structalias.c
===
*** gcc/tree-ssa-structalias.c  (revision 233693)
--- gcc/tree-ssa-structalias.c  (working copy)
*** get_constraint_for_ssa_var (tree t, vec<
*** 2943,2948 
--- 2943,2956 
if (node && node->alias && node->analyzed)
{
  node = node->ultimate_alias_target ();
+ /* Canonicalize the PT uid of all aliases to the ultimate target.
+???  Hopefully the set of aliases can't change in a way that
+changes the ultimate alias target.  */
+ gcc_assert ((! DECL_PT_UID_SET_P (node->decl)
+  || DECL_PT_UID (node->decl) == DECL_UID (node->decl))
+ && (! DECL_PT_UID_SET_P (t)
+ || DECL_PT_UID (t) == DECL_UID (node->decl)));
+ DECL_PT_UID (t) = DECL_UID (node->decl);
  t = node->decl;
}
  }
Index: gcc/testsuite/gcc.dg/torture/pr69951.c
===
*** gcc/testsuite/gcc.dg/torture/pr69951.c  (revision 0)
--- gcc/testsuite/gcc.dg/torture/pr69951.c  (working copy)
***
*** 0 
--- 1,21 
+ /* { dg-do run } */
+ /* { dg-require-alias "" } */
+ 
+ extern void abort (void);
+ 
+ int a = 1, c = 1;
+ extern int b __attribute__((alias("a")));
+ extern int d __attribute__((alias("c")));
+ int main(int argc)
+ {
+   int *p, *q;
+   if (argc)
+ p = &c, q = &d;
+   else
+ p = &b, q = &d;
+   *p = 1;
+   *q = 2;
+   if (*p == 1)
+ abort();
+   return 0;
+ }

Re: [PATCH] Fix DW_OP_GNU_implicit_pointer referring to DW_TAG_dwarf_procedure (PR debug/69947)

2016-02-26 Thread Pierre-Marie de Rodat


On 02/25/2016 05:46 PM, Jakub Jelinek wrote:

Ah, I've been looking for something that would set die_perennial_p, but
actually you just set die_mark later on instead for those.
So IMHO the right fix is just handle all the ops that could directly or
indirectly contain references to other DIEs, rather than just handling
the 3 you have there.


Agreed! (as per the patch I sent)


Going to bootstrap/regtest this on x86_64-linux and i686-linux now.

Is this ok for trunk if it passes testing?


I guess this was not for me as I’m not maintainer, but: this looks 
completely fine to me. :-) Thanks!


--
Pierre-Marie de Rodat

[PATCH, PR69956] Fix multi-step conversion of boolean vectors

2016-02-26 Thread Ilya Enkovich

Hi,

Currently multi-step vector conversion tries to compute
intermediate type from its mode but it doesn't work for
boolean vectors.  This patch introduces a computation
of intermediate vector masks.  Bootstrapped and tested
on x86_64-pc-linux-gnu.  OK for trunk?

Thanks,
Ilya
--
gcc/

2016-02-26  Ilya Enkovich  

PR tree-optimization/69956
* tree-vect-stmts.c (supportable_widening_operation): Support
multi-step conversion of boolean vectors.
(supportable_narrowing_operation): Likewise.

gcc/testsuite/

2016-02-26  Ilya Enkovich  

PR tree-optimization/69956
* gcc.dg/pr69956.c: New test.


diff --git a/gcc/testsuite/gcc.dg/pr69956.c b/gcc/testsuite/gcc.dg/pr69956.c
new file mode 100644
index 000..37d24d4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr69956.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize" } */
+/* { dg-additional-options "-march=skylake-avx512" { target { i?86-*-* 
x86_64-*-* } } } */
+
+void
+fn1 (char *b, char *d, int *c, int i)
+{
+  for (; i; i++, d++)
+if (b[i])
+  *d = c[i];
+}
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 9678d7c..182b277 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -9000,9 +9000,19 @@ supportable_widening_operation (enum tree_code code, 
gimple *stmt,
   for (i = 0; i < MAX_INTERM_CVT_STEPS; i++)
 {
   intermediate_mode = insn_data[icode1].operand[0].mode;
-  intermediate_type
-   = lang_hooks.types.type_for_mode (intermediate_mode,
- TYPE_UNSIGNED (prev_type));
+  if (VECTOR_BOOLEAN_TYPE_P (prev_type))
+   {
+ intermediate_type
+   = build_truth_vector_type (TYPE_VECTOR_SUBPARTS (prev_type) / 2,
+  current_vector_size);
+ if (intermediate_mode != TYPE_MODE (intermediate_type))
+   return false;
+   }
+  else
+   intermediate_type
+ = lang_hooks.types.type_for_mode (intermediate_mode,
+   TYPE_UNSIGNED (prev_type));
+
   optab3 = optab_for_tree_code (c1, intermediate_type, optab_default);
   optab4 = optab_for_tree_code (c2, intermediate_type, optab_default);
 
@@ -9065,7 +9075,7 @@ supportable_narrowing_operation (enum tree_code code,
   tree vectype = vectype_in;
   tree narrow_vectype = vectype_out;
   enum tree_code c1;
-  tree intermediate_type;
+  tree intermediate_type, prev_type;
   machine_mode intermediate_mode, prev_mode;
   int i;
   bool uns;
@@ -9111,6 +9121,7 @@ supportable_narrowing_operation (enum tree_code code,
   /* Check if it's a multi-step conversion that can be done using intermediate
  types.  */
   prev_mode = vec_mode;
+  prev_type = vectype;
   if (code == FIX_TRUNC_EXPR)
 uns = TYPE_UNSIGNED (vectype_out);
   else
@@ -9145,8 +9156,17 @@ supportable_narrowing_operation (enum tree_code code,
   for (i = 0; i < MAX_INTERM_CVT_STEPS; i++)
 {
   intermediate_mode = insn_data[icode1].operand[0].mode;
-  intermediate_type
-   = lang_hooks.types.type_for_mode (intermediate_mode, uns);
+  if (VECTOR_BOOLEAN_TYPE_P (prev_type))
+   {
+ intermediate_type
+   = build_truth_vector_type (TYPE_VECTOR_SUBPARTS (prev_type) * 2,
+  current_vector_size);
+ if (intermediate_mode != TYPE_MODE (intermediate_type))
+ return false;
+   }
+  else
+   intermediate_type
+ = lang_hooks.types.type_for_mode (intermediate_mode, uns);
   interm_optab
= optab_for_tree_code (VEC_PACK_TRUNC_EXPR, intermediate_type,
   optab_default);
@@ -9164,6 +9184,7 @@ supportable_narrowing_operation (enum tree_code code,
return true;
 
   prev_mode = intermediate_mode;
+  prev_type = intermediate_type;
   optab1 = interm_optab;
 }

Re: [DOC, PATCH] Mention --enable-valgrind-annotations in install.texi

2016-02-26 Thread Martin Liška

On 12/09/2015 11:01 AM, Markus Trippelsdorf wrote:
> Sorry, but this is simply awful ;-). How about:
> 
> Mark selected memory related operations in the compiler when run under
> valgrind to suppress false positives.

Hi.

I forgot to install the patch, following Markus' correction, installed as 
r233735.

Martin

Re: [DOC,PATCH] Mention clog10, clog10f an clog10l in Builtins section.

2016-02-26 Thread Martin Liška

On 12/04/2015 10:45 AM, Martin Liška wrote:
> Hello.
> 
> I noticed that Builtins section of documentation does not mention 
> clog10{,f,l} functions.
> I've tried to write a patch, however I'm not sure how should be these 
> functions described.
> 
> Thanks,
> Martin
> 

PING

Re: [ARM] Add support for overflow add, sub, and neg operations

2016-02-26 Thread Michael Collison




On 02/25/2016 02:51 AM, Kyrill Tkachov wrote:

Hi Michael,

On 24/02/16 23:02, Michael Collison wrote:
This patch adds support for builtin overflow of add, subtract and 
negate. This patch is targeted for gcc 7 stage 1. It was tested with 
no regressions in arm and thumb modes on the following targets:


arm-non-linux-gnueabi
arm-non-linux-gnuabihf
armeb-none-linux-gnuabihf
arm-non-eabi



I'll have a deeper look once we're closer to GCC 7 development.
I've got a few comments in the meantime.


2016-02-24 Michael Collison 

* config/arm/arm-modes.def: Add new condition code mode CC_V
to represent the overflow bit.
* config/arm/arm.c (maybe_get_arm_condition_code):
Add support for CC_Vmode.
* config/arm/arm.md (addv4, add3_compareV,
addsi3_compareV_upper): New patterns to support signed
builtin overflow add operations.
(uaddv4, add3_compareC, addsi3_compareV_upper):
New patterns to support unsigned builtin add overflow operations.
(subv4, sub3_compare1): New patterns to support signed
builtin overflow subtract operations,
(usubv4): New patterns to support unsigned builtin subtract
overflow operations.
(negvsi3, negvdi3, negdi2_compre, negsi2_carryin_compare): New 
patterns

to support builtin overflow negate operations.




Can you please summarise what sequences are generated for these 
operations, and how

they are better than the default fallback sequences.


Sure for a simple test case such as:

int
fn3 (int x, int y, int *ovf)
{
  int res;
  *ovf = __builtin_sadd_overflow (x, y, &res);
  return res;
}

Current trunk at -O2 generates

fn3:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
cmp r1, #0
mov r3, #0
add r1, r0, r1
blt .L4
cmp r1, r0
blt .L3
.L2:
str r3, [r2]
mov r0, r1
bx  lr
.L4:
cmp r1, r0
ble .L2
.L3:
mov r3, #1
b   .L2

With the overflow patch this now generates:

   addsr0, r0, r1
   movvs   r3, #1
   movvc   r3, #0
   str r3, [r2]
   bx  lr

Also, we'd need tests for each of these overflow operations, since 
these are pretty complex

patterns that are being added.


The patterns are tested now most notably by tests in:

c-c++-common/torture/builtin-arith-overflow*.c

I had a few failures I resolved so the builtin overflow arithmetic 
functions are definitely being exercised.


Also, you may want to consider splitting this into a patch series, 
each adding a single
overflow operation, together with its tests. That way it will be 
easier to keep track of
which pattern applies to which use case and they can go in 
independently of each other.


Let me know if you still fell the same way given the existing test cases.



+(define_expand "uaddv4"
+  [(match_operand:SIDI 0 "register_operand")
+   (match_operand:SIDI 1 "register_operand")
+   (match_operand:SIDI 2 "register_operand")
+   (match_operand 3 "")]
+  "TARGET_ARM"
+{
+  emit_insn (gen_add3_compareC (operands[0], operands[1], 
operands[2]));

+
+  rtx x;
+  x = gen_rtx_NE (VOIDmode, gen_rtx_REG (CC_Cmode, CC_REGNUM), 
const0_rtx);

+  x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
+gen_rtx_LABEL_REF (VOIDmode, operands[3]),
+pc_rtx);
+  emit_jump_insn (gen_rtx_SET (pc_rtx, x));
+  DONE;
+})
+

I notice this and many other patterns in this patch are guarded on 
TARGET_ARM. Is there any reason why they

should be restricted to arm state and not be TARGET_32BIT ?
I thought about this as well. I will test will TARGET_32BIT and get back 
to you.



Thanks,
Kyrill


--
Michael Collison
Linaro Toolchain Working Group
michael.colli...@linaro.org

Re: [PATCH] Do not gather mem stats in run_exit_handles (PR middle-end/69919)

2016-02-26 Thread Martin Liška

On 02/25/2016 07:02 PM, David Malcolm wrote:
> Was this tested with "jit" in --enable-languages?   I ask because
> toplev::main can be run repeatedly by libgccjit, and it looks like
> nothing can reset after_memory_report.  (Though I don't typically build
> with --enable-gather-detailed-mem-stats, and the jit docs don't mention
> it).

Wasn't, however as I've just retested that with jit in --enable-languages it 
works.
However, you are right that it's not reset anywhere. I will try to come up with
a better solution for GCC 7.

Martin

Re: [PATCH 3/4] Replace ENABLE_CHECKING with CHECKING_P in dwarf2out

2016-02-26 Thread Richard Biener

On Thu, Feb 25, 2016 at 11:06 AM, marxin  wrote:
> gcc/ChangeLog:

Ok.

Richard.

> 2016-02-25  Martin Liska  
>
> * dwarf2out.c (new_loc_descr): Replace ENABLE_CHECKING with
> CHECKING_P.
> (resolve_args_picking_1): Likewise.
> * dwarf2out.h (struct GTY): Likewise.
> ---
>  gcc/dwarf2out.c | 6 +++---
>  gcc/dwarf2out.h | 2 +-
>  2 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
> index 97e192b..a8c21d8 100644
> --- a/gcc/dwarf2out.c
> +++ b/gcc/dwarf2out.c
> @@ -1325,7 +1325,7 @@ new_loc_descr (enum dwarf_location_atom op, unsigned 
> HOST_WIDE_INT oprnd1,
>dw_loc_descr_ref descr = ggc_cleared_alloc ();
>
>descr->dw_loc_opc = op;
> -#if ENABLE_CHECKING
> +#if CHECKING_P
>descr->dw_loc_frame_offset = -1;
>  #endif
>descr->dw_loc_oprnd1.val_class = dw_val_class_unsigned_const;
> @@ -15369,14 +15369,14 @@ resolve_args_picking_1 (dw_loc_descr_ref loc, 
> unsigned initial_frame_offset,
>/* If we already met this node, there is nothing to compute anymore.  
> */
>if (visited.add (l))
> {
> -#if ENABLE_CHECKING
> +#if CHECKING_P
>   /* Make sure that the stack size is consistent wherever the 
> execution
>  flow comes from.  */
>   gcc_assert ((unsigned) l->dw_loc_frame_offset == frame_offset_);
>  #endif
>   break;
> }
> -#if ENABLE_CHECKING
> +#if CHECKING_P
>l->dw_loc_frame_offset = frame_offset_;
>  #endif
>
> diff --git a/gcc/dwarf2out.h b/gcc/dwarf2out.h
> index a96ac38..91b3d6b 100644
> --- a/gcc/dwarf2out.h
> +++ b/gcc/dwarf2out.h
> @@ -239,7 +239,7 @@ struct GTY((chain_next ("%h.dw_loc_next"))) 
> dw_loc_descr_node {
>   frame offset.  */
>unsigned int frame_offset_rel : 1;
>int dw_loc_addr;
> -#if ENABLE_CHECKING
> +#if CHECKING_P
>/* When translating a function into a DWARF procedure, contains the frame
>   offset *before* evaluating this operation.  It is -1 when not yet
>   initialized.  */
> --
> 2.7.0
>
>

Re: [PATCH 4/4] Poison ENABLE_CHECKING macro

2016-02-26 Thread Richard Biener

On Thu, Feb 25, 2016 at 11:11 AM, marxin  wrote:
> gcc/ChangeLog:

Ok.

Richard.

> 2016-02-25  Martin Liska  
>
> * system.h: Poison ENABLE_CHECKING macro.
> ---
>  gcc/system.h | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/gcc/system.h b/gcc/system.h
> index 445073c..cb54541 100644
> --- a/gcc/system.h
> +++ b/gcc/system.h
> @@ -1014,6 +1014,10 @@ extern void fancy_abort (const char *, int, const char 
> *) ATTRIBUTE_NORETURN;
>  #undef rindex
>   #pragma GCC poison bcopy bzero bcmp rindex
>
> +/* Poison ENABLE_CHECKING macro that should be replaced with
> +   'if (flag_checking)', or with CHECKING_P macro.  */
> +#pragma GCC poison ENABLE_CHECKING
> +
>  #endif /* GCC >= 3.0 */
>
>  /* This macro allows casting away const-ness to pass -Wcast-qual
> --
> 2.7.0
>

Clear visibility of TYPE_DECL

2016-02-26 Thread Jan Hubicka

Hi,
while looking into PR69589 I noticed that types are not merged when pragma
visibility does not match. This is because C++ FE stores visibility into 
TYPE_DECL
that is used by FE only.  This patch clears the flag in free_lang_data.

Bootstrapped/regtested x86_64-linux and tested it makes no difference for
Firefox LTO binary, OK?

PR lto/69589
* tree.c (free_lang_data_in_decl): Clear visibility of TYPE_DECL.
Index: tree.c
===
--- tree.c  (revision 233707)
+++ tree.c  (working copy)
@@ -5472,8 +5473,13 @@ free_lang_data_in_decl (tree decl)
  || (decl_function_context (decl) && !TREE_STATIC (decl)))
DECL_INITIAL (decl) = NULL_TREE;
 }
-  else if (TREE_CODE (decl) == TYPE_DECL
-  || TREE_CODE (decl) == FIELD_DECL)
+  else if (TREE_CODE (decl) == TYPE_DECL)
+{
+  DECL_VISIBILITY (decl) = VISIBILITY_DEFAULT;
+  DECL_VISIBILITY_SPECIFIED (decl) = 0;
+  DECL_INITIAL (decl) = NULL_TREE;
+}
+  else if (TREE_CODE (decl) == FIELD_DECL)
 DECL_INITIAL (decl) = NULL_TREE;
   else if (TREE_CODE (decl) == TRANSLATION_UNIT_DECL
&& DECL_INITIAL (decl)

Avoid redundant DECL_ASSEMBLER_NAME computations for ODR types

2016-02-26 Thread Jan Hubicka

Hi,
while looking into the PR testcase I noticed that we detect wrong duplicate
types. This is because we compute DECL_ASSEMBLER_NAME of a type variant which
is not necessary.

Bootstrapped/regested x86_64-linux and I checked dumps of xalancbmk to verify
that nothing changes in ODR type merging except for the duplicates. OK?

Honza
* tree.c (need_assembler_name_p): Do not compute mangled name for
variabt types.
Index: tree.c
===
--- tree.c  (revision 233707)
+++ tree.c  (working copy)
@@ -5329,6 +5329,7 @@ need_assembler_name_p (tree decl)
   && TREE_CODE (decl) == TYPE_DECL
   && DECL_NAME (decl)
   && decl == TYPE_NAME (TREE_TYPE (decl))
+  && TYPE_MAIN_VARIANT (TREE_TYPE (decl)) == TREE_TYPE (decl)
   && !TYPE_ARTIFICIAL (TREE_TYPE (decl))
   && (type_with_linkage_p (TREE_TYPE (decl))
  || TREE_CODE (TREE_TYPE (decl)) == INTEGER_TYPE)

Re: [RFC][PATCH][PR40921] Convert x + (-y * z * z) into x - y * z * z

2016-02-26 Thread Richard Biener

On Fri, Feb 26, 2016 at 4:02 AM, kugan
 wrote:
>
>
> Hi,
>
> This is an attempt to fix missed optimization: x + (-y * z * z) => x - y * z
> * z as reported in PR40921.
>
> Regression tested and bootstrapped on x86-64-linux-gnu with no new
> regressions.
>
> Is this OK for next stage1?

Err.  I think the way you implement that in reassoc is ad-hoc and not
related to reassoc at all.

In fact what reassoc is missing is to handle

 -y * z * (-w) * x -> y * x * w * x

thus optimize negates as if they were additional * -1 entries in a
multiplication chain.  And
then optimize a single remaining * -1 in the result chain to a negate.

Then match.pd handles x + (-y) -> x - y (independent of -frounding-math btw).

So no, this isn't ok as-is, IMHO you want to expand the multiplication ops chain
pulling in the * -1 ops (if single-use, of course).

Richard.

> Thanks,
> Kugan
>
>
> gcc/ChangeLog:
>
> 2016-02-26  Kugan Vivekanandarajah  
>
> PR middle-end/40921
> * tree-ssa-reassoc.c (propagate_neg_to_sub_or_add): New.
> (reassociate_bb): Call propagate_neg_to_sub_or_add.
>
>
> gcc/testsuite/ChangeLog:
>
> 2016-02-26  Kugan Vivekanandarajah  
>
> PR middle-end/40921
> * gcc.dg/tree-ssa/pr40921.c: New test.

Re: Avoid redundant DECL_ASSEMBLER_NAME computations for ODR types

2016-02-26 Thread Jakub Jelinek

On Fri, Feb 26, 2016 at 11:52:50AM +0100, Jan Hubicka wrote:
> Hi,
> while looking into the PR testcase I noticed that we detect wrong duplicate
> types. This is because we compute DECL_ASSEMBLER_NAME of a type variant which
> is not necessary.
> 
> Bootstrapped/regested x86_64-linux and I checked dumps of xalancbmk to verify
> that nothing changes in ODR type merging except for the duplicates. OK?
> 
> Honza
>   * tree.c (need_assembler_name_p): Do not compute mangled name for
>   variabt types.

s/variabt/variant/

Jakub

Revert gcc r227962

2016-02-26 Thread JonY

Hi,

I've submitted a patch that was committed as r227962, it causes some
unintended side effects (namely libuuid on Cygwin). Can someone please
revert?

Kai still needs some time to setup his gcc development environment.

Thanks.



signature.asc
Description: OpenPGP digital signature

Re: [RFC][PATCH][PR63586] Convert x+x+x+x into 4*x

2016-02-26 Thread Richard Biener

On Fri, Feb 26, 2016 at 4:02 AM, kugan
 wrote:
>
>
> Hi,
>
> This is an attempt to fix missed optimization: x+x+x+x -> 4*x as reported in
> PR63586.
>
> Regression tested and bootstrapped on x86-64-linux-gnu with no new
> regressions.
>
> Is this OK for next stage1?

That looks better, but I think the unordered_remove will break operand sorting
and thus you probably don't handle x + x + x + x + y + y + y + y + y +
y + z + z + z + z
optimally.

I'd say you simply want to avoid the recursion and collect a vector of
[start, end] pairs
before doing any modification to the ops vector.

Richard.

> Thanks,
> Kugan
>
>
> gcc/testsuite/ChangeLog:
>
> 2016-02-26  Kugan Vivekanandarajah  
>
> PR middle-end/63586
> * gcc.dg/tree-ssa/reassoc-14.c: Fix multiply count.
>
> gcc/ChangeLog:
>
> 2016-02-26  Kugan Vivekanandarajah  
>
> PR middle-end/63586
> * tree-ssa-reassoc.c (transform_add_to_multiply): New.
> (reassociate_bb): Call transform_add_to_multiply.
>
>

Re: [PATCH] [RFA] [PR tree-optmization/69740] Schedule loop fixups when needed

2016-02-26 Thread Richard Biener

On Fri, Feb 26, 2016 at 8:50 AM, Jeff Law  wrote:
> On 02/25/2016 03:00 AM, Richard Biener wrote:
>>
>>
>> So I fail to see how only successor edges are relevant.  Isn't the
>> important
>> case to catch whether we remove an edge marked EDGE_IRREDUCIBLE_LOOP?
>> Even if the BB persists we might have exposed a new loop here.
>>
>> Note that it is not safe to look at {BB,EDGE}_IRREDUCIBLE_LOOP if the loop
>> state does not have LOOPS_HAVE_MARKED_IRREDUCIBLE_REGIONS set
>> (the flags may be stale or missing).  So it might be that we can't rely on
>> non-loop passes modifying the CFG to handle this optimistically.
>>
>> Thus, how about (my main point) moving this to remove_edge instead, like
>
> Yea.  That works.  The !loops_state_satisfies_p check will almost certainly
> cause us to trigger loop cleanups more often, but I think it's the
> right/safe thing to do to catch cases where we haven't go the
> IRREDUCIBLE_LOOP flags set.
>
>
> Bootstrapped and regression tested on x86_64-linux-gnu.  OK for the trunk?

Ok with a comment added before that check.

Richard.

> Thanks,
> Jeff
>
>
> PR tree-optimization/69740
> * cfghooks.c (remove_edge): Request loop fixups if we delete
> an edge that might turn an irreducible loop into a natural
> loop.
>
> PR tree-optimization/69740
> * gcc.c-torture/compile/pr69740-1.c: New test.
> * gcc.c-torture/compile/pr69740-2.c: New test.
>
> diff --git a/gcc/cfghooks.c b/gcc/cfghooks.c
> index bbb1017..f80e455 100644
> --- a/gcc/cfghooks.c
> +++ b/gcc/cfghooks.c
> @@ -408,7 +408,14 @@ void
>  remove_edge (edge e)
>  {
>if (current_loops != NULL)
> -rescan_loop_exit (e, false, true);
> +{
> +  rescan_loop_exit (e, false, true);
> +
> +  if (!loops_state_satisfies_p (LOOPS_HAVE_MARKED_IRREDUCIBLE_REGIONS)
> + || (e->flags & EDGE_IRREDUCIBLE_LOOP)
> + || (e->dest->flags & BB_IRREDUCIBLE_LOOP))
> +   loops_state_set (LOOPS_NEED_FIXUP);
> +}
>
>/* This is probably not needed, but it doesn't hurt.  */
>/* FIXME: This should be called via a remove_edge hook.  */
> diff --git a/gcc/testsuite/gcc.c-torture/compile/pr69740-1.c
> b/gcc/testsuite/gcc.c-torture/compile/pr69740-1.c
> new file mode 100644
> index 000..ac867d8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/compile/pr69740-1.c
> @@ -0,0 +1,12 @@
> +char a;
> +short b;
> +void fn1() {
> +  if (b)
> +;
> +  else {
> +int c[1] = {0};
> +  l1:;
> +  }
> +  if (a)
> +goto l1;
> +}
> diff --git a/gcc/testsuite/gcc.c-torture/compile/pr69740-2.c
> b/gcc/testsuite/gcc.c-torture/compile/pr69740-2.c
> new file mode 100644
> index 000..a89c9a0
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/compile/pr69740-2.c
> @@ -0,0 +1,19 @@
> +inline int foo(int *p1, int p2) {
> +  int z = *p1;
> +  while (z > p2)
> +p2 = 2;
> +  return z;
> +}
> +int main() {
> +  int i;
> +  for (;;) {
> +int j, k;
> +i = foo(&k, 7);
> +if (k)
> +  j = i;
> +else
> +  k = j;
> +if (2 != j)
> +  __builtin_abort();
> +  }
> +}
>

Re: [DOC,PATCH] Mention clog10, clog10f an clog10l in Builtins section.

2016-02-26 Thread Richard Biener

On Fri, Feb 26, 2016 at 11:05 AM, Martin Liška  wrote:
> On 12/04/2015 10:45 AM, Martin Liška wrote:
>> Hello.
>>
>> I noticed that Builtins section of documentation does not mention 
>> clog10{,f,l} functions.
>> I've tried to write a patch, however I'm not sure how should be these 
>> functions described.
>>
>> Thanks,
>> Martin
>>

Ok with also addign them to the @findex list above.

Richard.

> PING

Re: Clear visibility of TYPE_DECL

2016-02-26 Thread Richard Biener

On Fri, 26 Feb 2016, Jan Hubicka wrote:

> Hi,
> while looking into PR69589 I noticed that types are not merged when pragma
> visibility does not match. This is because C++ FE stores visibility into 
> TYPE_DECL
> that is used by FE only.  This patch clears the flag in free_lang_data.
> 
> Bootstrapped/regtested x86_64-linux and tested it makes no difference for
> Firefox LTO binary, OK?

Ok.

Richard.

>   PR lto/69589
>   * tree.c (free_lang_data_in_decl): Clear visibility of TYPE_DECL.
> Index: tree.c
> ===
> --- tree.c(revision 233707)
> +++ tree.c(working copy)
> @@ -5472,8 +5473,13 @@ free_lang_data_in_decl (tree decl)
> || (decl_function_context (decl) && !TREE_STATIC (decl)))
>   DECL_INITIAL (decl) = NULL_TREE;
>  }
> -  else if (TREE_CODE (decl) == TYPE_DECL
> -|| TREE_CODE (decl) == FIELD_DECL)
> +  else if (TREE_CODE (decl) == TYPE_DECL)
> +{
> +  DECL_VISIBILITY (decl) = VISIBILITY_DEFAULT;
> +  DECL_VISIBILITY_SPECIFIED (decl) = 0;
> +  DECL_INITIAL (decl) = NULL_TREE;
> +}
> +  else if (TREE_CODE (decl) == FIELD_DECL)
>  DECL_INITIAL (decl) = NULL_TREE;
>else if (TREE_CODE (decl) == TRANSLATION_UNIT_DECL
> && DECL_INITIAL (decl)

Re: Avoid redundant DECL_ASSEMBLER_NAME computations for ODR types

2016-02-26 Thread Richard Biener

On Fri, Feb 26, 2016 at 12:04 PM, Jakub Jelinek  wrote:
> On Fri, Feb 26, 2016 at 11:52:50AM +0100, Jan Hubicka wrote:
>> Hi,
>> while looking into the PR testcase I noticed that we detect wrong duplicate
>> types. This is because we compute DECL_ASSEMBLER_NAME of a type variant which
>> is not necessary.
>>
>> Bootstrapped/regested x86_64-linux and I checked dumps of xalancbmk to verify
>> that nothing changes in ODR type merging except for the duplicates. OK?
>>
>> Honza
>>   * tree.c (need_assembler_name_p): Do not compute mangled name for
>>   variabt types.
>
> s/variabt/variant/

Ok with that fix.

Richard.

> Jakub

Re: [DOC,PATCH] Mention clog10, clog10f an clog10l in Builtins section.

2016-02-26 Thread Martin Liška

On 02/26/2016 12:11 PM, Richard Biener wrote:
> Ok with also addign them to the @findex list above.
> 
> Richard.

Thanks for review, updated version installed as r233738.

Martin

[PATCH] Fix up memset handling in DSE (PR rtl-optimization/69891)

2016-02-26 Thread Jakub Jelinek

Hi!

As analyzed by Eric, DSE mishandles memset calls if it can't figure out what
the arguments to memset exactly are (it handles only register arguments
right now), or if the second or third arguments are not CONST_INTs.
In that case we don't call record_store, because we don't know what to call
it on; but we need to treat it as a wild store, which is handled by the
clear_rhs_from_active_local_stores () function (which record_store also uses
if it can't figure out what exactly is being overwritten).

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

2016-02-26  Jakub Jelinek  
Eric Botcazou  

PR rtl-optimization/69891
* dse.c (scan_insn): If we can't figure out memset arguments
or they are non-constant, call clear_rhs_from_active_local_stores.

* gcc.target/i386/pr69891.c: New test.

--- gcc/dse.c.jj2016-01-19 13:32:12.0 +0100
+++ gcc/dse.c   2016-02-26 11:03:36.694206088 +0100
@@ -2556,6 +2556,8 @@ scan_insn (bb_info_t bb_info, rtx_insn *
  active_local_stores = insn_info;
}
}
+ else
+   clear_rhs_from_active_local_stores ();
}
}
   else if (SIBLING_CALL_P (insn) && reload_completed)
--- gcc/testsuite/gcc.target/i386/pr69891.c.jj  2016-02-26 11:09:45.492079225 
+0100
+++ gcc/testsuite/gcc.target/i386/pr69891.c 2016-02-26 11:10:58.941058170 
+0100
@@ -0,0 +1,30 @@
+/* PR rtl-optimization/69891 */
+/* { dg-do run } */
+/* { dg-options "-O -fno-tree-fre -mstringop-strategy=libcall -Wno-psabi" } */
+/* { dg-additional-options "-mno-sse" { target ia32 } } */
+
+typedef unsigned short A;
+typedef unsigned short B __attribute__ ((vector_size (32)));
+typedef unsigned int C;
+typedef unsigned int D __attribute__ ((vector_size (32)));
+typedef unsigned long long E;
+typedef unsigned long long F __attribute__ ((vector_size (32)));
+
+__attribute__((noinline, noclone)) unsigned
+foo(D a, B b, D c, F d)
+{
+  b /= (B) {1, -c[0]} | 1;
+  c[0] |= 7;
+  a %= c | 1;
+  c ^= c;
+  return a[0] + b[15] + c[0] + d[3];
+}
+
+int
+main ()
+{
+  unsigned x = foo ((D) {}, (B) {}, (D) {}, (F) {});
+  if (x != 0)
+__builtin_abort ();
+  return 0;
+}

Jakub

Re: [PATCH, rs6000] Fixing PR 67145

2016-02-26 Thread David Edelsohn

On Fri, Feb 26, 2016 at 12:08 AM, Richard Henderson  wrote:
> It's the simplify-rtx.c portion of the patch that fixes the i686 regression.
>
> In the PR, Alan raises some good points, but I don't believe that we can
> address those for gcc6.  A new rtl reassoc optimization that takes loop
> invariance into account will have to wait.
>
> But we do need to take care of the rs6000 ice that results, and that's the
> bulk of the patch -- allowing CA to be sorted to any register of the plus
> chain.
>
> Some notes:
>
> ca_operand doesn't work as written, since CA_REGNO is not an available
> register, and thus doesn't satisfy register_operand.
>
> Is there any particular reason that subf<>3_carry_in_m1 was written with
> minus rather than plus like all of the other patterns?

Segher wrote the rs6000 carry infrastructure, so he's the best one to comment.

- David

Re: [AArch64] Disable pcrelative_literal_loads with fix-cortex-a53-843419

2016-02-26 Thread Christophe Lyon

Ping?

On 26 January 2016 at 15:43, Christophe Lyon  wrote:
> With the attachment
>
>
> On 26 January 2016 at 15:42, Christophe Lyon  
> wrote:
>> Hi,
>>
>> This is a followup to PR63304.
>>
>> As discussed in bugzilla, this patch disables pcrelative_literal_loads
>> when -mfix-cortex-a53-843419 (or its default configure option) is
>> used.
>>
>> I copied the behavior of -mfix-cortex-a53-835769 (e.g. in
>> aarch64_can_inline_p), and I have tested by building the Linux kernel
>> using -mfix-cortex-a53-843419 and checked that
>> R_AARCH64_ADR_PREL_PG_HI21 relocations are not emitted anymore (under
>> CONFIG_ARM64_ERRATUM_843419).
>>
>> For reference, this is motivated by:
>> https://bugs.linaro.org/show_bug.cgi?id=1994
>> and further details on Launchpad:
>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1533009
>>
>> OK for trunk?
>>
>> Thanks,
>>
>> Christophe.

Re: [PATCH][AArch64] Replace insn to zero up DF register

2016-02-26 Thread Wilco Dijkstra

Evandro Menezes  wrote:
>
> I have a question though: is it necessary to add the "fp" and "simd"
> attributes to both movsf_aarch64 and movdf_aarch64 as well?

You need at least the "simd" attribute, but providing "fp" as well is clearer
(in principle the TARGET_FLOAT check in the pattern condition is 
redundant as a result, but the movhf and movtf patterns already do both).

Also you want to use the smallest possible SIMD size as these are
scalar operations and some microarchitectures execute 64-bit operations
more efficiently than 128-bit ones, so:

mov\\t%0.h[0], %w1
+   movi\\t%0.4h, #0
umov\\t%w0, %1.h[0]

fmov\\t%s0, %w1
+   movi\\t%0.2s, #0
fmov\\t%w0, %s1

With those changes it should be ready for commit once you get the OK from 
James/Marcus.

Wilco

Re: [PATCH] Fix up memset handling in DSE (PR rtl-optimization/69891)

2016-02-26 Thread Richard Biener

On Fri, 26 Feb 2016, Jakub Jelinek wrote:

> Hi!
> 
> As analyzed by Eric, DSE mishandles memset calls if it can't figure out what
> the arguments to memset exactly are (it handles only register arguments
> right now), or if the second or third arguments are not CONST_INTs.
> In that case we don't call record_store, because we don't know what to call
> it on; but we need to treat it as a wild store, which is handled by the
> clear_rhs_from_active_local_stores () function (which record_store also uses
> if it can't figure out what exactly is being overwritten).
> 
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
> trunk?

Ok.

Thanks,
Richard.

> 2016-02-26  Jakub Jelinek  
>   Eric Botcazou  
> 
>   PR rtl-optimization/69891
>   * dse.c (scan_insn): If we can't figure out memset arguments
>   or they are non-constant, call clear_rhs_from_active_local_stores.
> 
>   * gcc.target/i386/pr69891.c: New test.
> 
> --- gcc/dse.c.jj  2016-01-19 13:32:12.0 +0100
> +++ gcc/dse.c 2016-02-26 11:03:36.694206088 +0100
> @@ -2556,6 +2556,8 @@ scan_insn (bb_info_t bb_info, rtx_insn *
> active_local_stores = insn_info;
>   }
>   }
> +   else
> + clear_rhs_from_active_local_stores ();
>   }
>   }
>else if (SIBLING_CALL_P (insn) && reload_completed)
> --- gcc/testsuite/gcc.target/i386/pr69891.c.jj2016-02-26 
> 11:09:45.492079225 +0100
> +++ gcc/testsuite/gcc.target/i386/pr69891.c   2016-02-26 11:10:58.941058170 
> +0100
> @@ -0,0 +1,30 @@
> +/* PR rtl-optimization/69891 */
> +/* { dg-do run } */
> +/* { dg-options "-O -fno-tree-fre -mstringop-strategy=libcall -Wno-psabi" } 
> */
> +/* { dg-additional-options "-mno-sse" { target ia32 } } */
> +
> +typedef unsigned short A;
> +typedef unsigned short B __attribute__ ((vector_size (32)));
> +typedef unsigned int C;
> +typedef unsigned int D __attribute__ ((vector_size (32)));
> +typedef unsigned long long E;
> +typedef unsigned long long F __attribute__ ((vector_size (32)));
> +
> +__attribute__((noinline, noclone)) unsigned
> +foo(D a, B b, D c, F d)
> +{
> +  b /= (B) {1, -c[0]} | 1;
> +  c[0] |= 7;
> +  a %= c | 1;
> +  c ^= c;
> +  return a[0] + b[15] + c[0] + d[3];
> +}
> +
> +int
> +main ()
> +{
> +  unsigned x = foo ((D) {}, (B) {}, (D) {}, (F) {});
> +  if (x != 0)
> +__builtin_abort ();
> +  return 0;
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)

[PATCH][LRA]Don't generate reload for output scratch operand from reload instruction.

2016-02-26 Thread Renlin Li


Hi all,

I admit that, the title looks a little bit confusing.

The situation is like this,
To make insn_1 strict, lra generates a new insn_1_reload insn.
In insn_1_reload, there is a scratch operand with this form
clobber (match_scratch:MODE x "=&r")

When lra tries to reload insn_1_reload in later iteration, a new pseudo
register (let say RXX) is created to replace this scratch operand in-place.
Additionally, a new insn will be generated and inserted after insn_1_reload to
finish the reload. It's in this form:
(set scratch, RXX)

And this instruction is illegal. no target implements this kind of pattern.
LRA will ICE because of this.
"internal compiler error: in lra_set_insn_recog_data, at lra.c:964"

And indeed, this pattern has no side-effect. The scratch operand should
stay inside the pattern.

Normally, at the very beginning of LRA reload, all scratch operands will be
replaced by newly created pseudo register. However, this is a problem when
generated reload insn has output scratch operand.

I have checked, x86, arm, aarch64, mips, arc all have such patterns. But it's
not triggered. In my case, it's triggered by compiling glibc with local change.

So a simple change is made in this patch. The output operand is reloaded only
when it's not a scratch operand and it's not unused since then.


aarch64-none-linux-gnu bootstrap and regression test OK.
x86_64-linux bootstrap and regression test OK.
OK for trunk?

Regards,
Renlin Li


gcc/ChangeLog:

2016-02-26  Renlin Li

* lra-constraints.c (curr_insn_transform): Don't generate reload for
output scratch operand.

diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
index 08cf0aa6c4208bb60ba5071bad1255d587f1cb4a..ef5809ff226cca69bb711bfc5dab55e24caba01a 100644
--- a/gcc/lra-constraints.c
+++ b/gcc/lra-constraints.c
@@ -3882,6 +3882,8 @@ curr_insn_transform (bool check_only_p)
 	}
 	  *loc = new_reg;
 	  if (type != OP_IN
+	  /* Don't generate reload for output scratch operand.  */
+	  && GET_CODE (old) != SCRATCH
 	  && find_reg_note (curr_insn, REG_UNUSED, old) == NULL_RTX)
 	{
 	  start_sequence ();

Re: [PATCH][LRA]Don't generate reload for output scratch operand from reload instruction.

2016-02-26 Thread Richard Biener

On Fri, Feb 26, 2016 at 1:54 PM, Renlin Li  wrote:
> Hi all,
>
> I admit that, the title looks a little bit confusing.
>
> The situation is like this,
> To make insn_1 strict, lra generates a new insn_1_reload insn.
> In insn_1_reload, there is a scratch operand with this form
> clobber (match_scratch:MODE x "=&r")
>
> When lra tries to reload insn_1_reload in later iteration, a new pseudo
> register (let say RXX) is created to replace this scratch operand in-place.
> Additionally, a new insn will be generated and inserted after insn_1_reload
> to
> finish the reload. It's in this form:
> (set scratch, RXX)
>
> And this instruction is illegal. no target implements this kind of pattern.
> LRA will ICE because of this.
> "internal compiler error: in lra_set_insn_recog_data, at lra.c:964"
>
> And indeed, this pattern has no side-effect. The scratch operand should
> stay inside the pattern.
>
> Normally, at the very beginning of LRA reload, all scratch operands will be
> replaced by newly created pseudo register. However, this is a problem when
> generated reload insn has output scratch operand.
>
> I have checked, x86, arm, aarch64, mips, arc all have such patterns. But
> it's
> not triggered. In my case, it's triggered by compiling glibc with local
> change.
>
> So a simple change is made in this patch. The output operand is reloaded
> only
> when it's not a scratch operand and it's not unused since then.
>
>
> aarch64-none-linux-gnu bootstrap and regression test OK.
> x86_64-linux bootstrap and regression test OK.
> OK for trunk?

Please extract a testcase from your modified glibc sources.

Richard.

> Regards,
> Renlin Li
>
>
> gcc/ChangeLog:
>
> 2016-02-26  Renlin Li
>
> * lra-constraints.c (curr_insn_transform): Don't generate reload for
> output scratch operand.
>

Re: [PATCH][LRA]Don't generate reload for output scratch operand from reload instruction.

2016-02-26 Thread Renlin Li


Hi Richard,

On 26/02/16 12:57, Richard Biener wrote:

On Fri, Feb 26, 2016 at 1:54 PM, Renlin Li  wrote:


I have checked, x86, arm, aarch64, mips, arc all have such patterns. But
it's
not triggered. In my case, it's triggered by compiling glibc with local
change.



Please extract a testcase from your modified glibc sources.

Richard.


The change is to the compiler. I change the compiler and tries to build 
a linux toolchain.


I tried to create a test case. But no luck.

It needs to make LRA generate a reload pattern with a non-input scratch 
register operand.

This type of pattern actually is quite common in the target backends.

I also searched the GCC bugzilla website, there is no similar bug reported.

Regards,
Renlin



Regards,
Renlin Li


gcc/ChangeLog:

2016-02-26  Renlin Li

 * lra-constraints.c (curr_insn_transform): Don't generate reload for
 output scratch operand.

Re: [WWWDocs] Deprecate support for non-thumb ARM devices

2016-02-26 Thread Richard Earnshaw (lists)

On 24/02/16 13:59, Richard Earnshaw (lists) wrote:
> After discussion with the ARM port maintainers we have decided that now
> is probably the right time to deprecate support for versions of the ARM
> Architecture prior to ARMv4t.  This will allow us to clean up some of
> the code base going forwards by being able to assume:
> - Presence of half-word data accesses
> - Presence of Thumb and therefore of interworking instructions.
> 
> This patch records the status change in the GCC-6 release notes.
> 
> I propose to commit this patch later this week.
> 

Now done.

R.

> R.
> 
> 
> deprecate.patch
> 
> 
> Index: htdocs/gcc-6/changes.html
> ===
> RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-6/changes.html,v
> retrieving revision 1.61
> diff -u -r1.61 changes.html
> --- htdocs/gcc-6/changes.html 19 Feb 2016 05:00:54 -  1.61
> +++ htdocs/gcc-6/changes.html 19 Feb 2016 14:47:31 -
> @@ -340,7 +340,14 @@
>  ARM
> 
>   
> -   The arm port now supports target attributes and pragmas.  Please
> +   Support for revisions of the ARM architecture prior to ARMv4t has
> +   been deprecated and will be removed in a future GCC release.
> +   This affects ARM6, ARM7 (but not ARM7TDMI), ARM8, StrongARM, and
> +   Faraday fa526 and fa626 devices, which do not have support for
> +   the Thumb execution state.
> + 
> + 
> +   The ARM port now supports target attributes and pragmas.  Please
> refer to the  href="https://gcc.gnu.org/onlinedocs/gcc/ARM-Function-Attributes.html#ARM-Function-Attributes";>
> documentation for details of available attributes and
> pragmas as well as usage instructions.
>

[PATCH, rs6000] Fix PR61397 (test case update for P8 vector loads/stores)

2016-02-26 Thread Bill Schmidt

Hi,

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61397 was almost resolved a
year ago, but had a proposed patch by Mike Meissner that was never
vetted and committed.  I've reviewed the patch and tested it on GCC 5
and GCC 6, and with the patch applied we see the test pass for both
32-bit and 64-bit on a Power8 big-endian platform, as well as for 64-bit
on a Power8 little-endian platform.

As I understand it, the test case got out of sync with the
implementation in GCC 5, and this rewrite of the test case restores
order.  I've verified that the original compilation options from Andreas
Schwab for this test case result in correct generation of lxsdx, which
was not the case with the original report.

The test case is extremely different in GCC 4.9.  As Mike has noted in
the PR, the -mupper-regs support does not exist in GCC 4.9, so the
rewritten test case does not apply there.

As stated, verified on powerpc64-unknown-linux-gnu (-m32, -m64) and
powerpc64le-unknown-linux-gnu (-m64).  Is this ok for trunk and GCC 5?

Thanks,
Bill


2016-02-26  Michael Meissner 
Bill Schmidt  

* gcc.target/powerpc/p8vector-ldst.c: Adjust to test desired
functionality for both 32-bit and 64-bit.


--- gcc/testsuite/gcc.target/powerpc/p8vector-ldst.c(revision 220948)
+++ gcc/testsuite/gcc.target/powerpc/p8vector-ldst.c(working copy)
@@ -1,4 +1,4 @@ 
-/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
+/* { dg-do compile { target { powerpc*-*-* } } } */
 /* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
 /* { dg-require-effective-target powerpc_p8vector_ok } */
 /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { 
"-mcpu=power8" } } */
@@ -51,13 +51,14 @@ load_store_sf (unsigned long num,
   float value37= 0.0f;
   float value38= 0.0f;
   float value39= 0.0f;
-  unsigned long in_mask;
-  unsigned long out_mask;
+  unsigned long in_mask, in_mask2;
+  unsigned long out_mask, out_mask2;
   unsigned long i;
 
   for (i = 0; i < num; i++)
 {
   in_mask = *in_mask_ptr++;
+  in_mask2 = *in_mask_ptr++;
   if ((in_mask & (1L <<  0)) != 0L)
value00 = *from_ptr++;
 
@@ -118,67 +119,68 @@ load_store_sf (unsigned long num,
   if ((in_mask & (1L << 19)) != 0L)
value19 = *from_ptr++;
 
-  if ((in_mask & (1L << 20)) != 0L)
+  if ((in_mask2 & (1L << 0)) != 0L)
value20 = *from_ptr++;
 
-  if ((in_mask & (1L << 21)) != 0L)
+  if ((in_mask2 & (1L << 1)) != 0L)
value21 = *from_ptr++;
 
-  if ((in_mask & (1L << 22)) != 0L)
+  if ((in_mask2 & (1L << 2)) != 0L)
value22 = *from_ptr++;
 
-  if ((in_mask & (1L << 23)) != 0L)
+  if ((in_mask2 & (1L << 3)) != 0L)
value23 = *from_ptr++;
 
-  if ((in_mask & (1L << 24)) != 0L)
+  if ((in_mask2 & (1L << 4)) != 0L)
value24 = *from_ptr++;
 
-  if ((in_mask & (1L << 25)) != 0L)
+  if ((in_mask2 & (1L << 5)) != 0L)
value25 = *from_ptr++;
 
-  if ((in_mask & (1L << 26)) != 0L)
+  if ((in_mask2 & (1L << 6)) != 0L)
value26 = *from_ptr++;
 
-  if ((in_mask & (1L << 27)) != 0L)
+  if ((in_mask2 & (1L << 7)) != 0L)
value27 = *from_ptr++;
 
-  if ((in_mask & (1L << 28)) != 0L)
+  if ((in_mask2 & (1L << 8)) != 0L)
value28 = *from_ptr++;
 
-  if ((in_mask & (1L << 29)) != 0L)
+  if ((in_mask2 & (1L << 9)) != 0L)
value29 = *from_ptr++;
 
-  if ((in_mask & (1L << 30)) != 0L)
+  if ((in_mask2 & (1L << 10)) != 0L)
value30 = *from_ptr++;
 
-  if ((in_mask & (1L << 31)) != 0L)
+  if ((in_mask2 & (1L << 11)) != 0L)
value31 = *from_ptr++;
 
-  if ((in_mask & (1L << 32)) != 0L)
+  if ((in_mask2 & (1L << 12)) != 0L)
value32 = *from_ptr++;
 
-  if ((in_mask & (1L << 33)) != 0L)
+  if ((in_mask2 & (1L << 13)) != 0L)
value33 = *from_ptr++;
 
-  if ((in_mask & (1L << 34)) != 0L)
+  if ((in_mask2 & (1L << 14)) != 0L)
value34 = *from_ptr++;
 
-  if ((in_mask & (1L << 35)) != 0L)
+  if ((in_mask2 & (1L << 15)) != 0L)
value35 = *from_ptr++;
 
-  if ((in_mask & (1L << 36)) != 0L)
+  if ((in_mask2 & (1L << 16)) != 0L)
value36 = *from_ptr++;
 
-  if ((in_mask & (1L << 37)) != 0L)
+  if ((in_mask2 & (1L << 17)) != 0L)
value37 = *from_ptr++;
 
-  if ((in_mask & (1L << 38)) != 0L)
+  if ((in_mask2 & (1L << 18)) != 0L)
value38 = *from_ptr++;
 
-  if ((in_mask & (1L << 39)) != 0L)
+  if ((in_mask2 & (1L << 19)) != 0L)
value39 = *from_ptr++;
 
   out_mask = *out_mask_ptr++;
+  out_mask2 = *out_mask_ptr++;
   if ((out_mask & (1L <<  0)) != 0L)
*to_ptr++ = value00;
 
@@ -239,64 +241,64 @@ load_store_sf (unsigned long num,
   if ((out_mask & (1L << 19)) != 0L)
*to_ptr++ = value19;
 
-  if ((out_mask & (1L << 20)) != 0L)
+  if ((out_mask2 & (1L << 0)) != 0L)

[PATCH][ARM][wwwdocs] Mention Cortex-A32 and Cortex-A35 support in changes.html for GCC 6

2016-02-26 Thread Kyrill Tkachov


Hi all,

This patch adds a note to changes.html about the added support for Cortex-A32 
and Cortex-A35.

Ok to commit?

Thanks,
Kyrill
Index: htdocs/gcc-6/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-6/changes.html,v
retrieving revision 1.62
diff -U 3 -r1.62 changes.html
--- htdocs/gcc-6/changes.html	24 Feb 2016 09:36:06 -	1.62
+++ htdocs/gcc-6/changes.html	25 Feb 2016 10:51:14 -
@@ -345,6 +345,15 @@
documentation for details of available attributes and
pragmas as well as usage instructions.
  
+ 
+   Support has been added for the following processors
+   (GCC identifiers in parentheses): ARM Cortex-A32
+   (cortex-a32), ARM Cortex-A35 (cortex-a35).
+   The GCC identifiers can be used
+   as arguments to the -mcpu or -mtune options,
+   for example: -mcpu=cortex-a32 or
+   -mtune=cortex-a35.
+

Re: [PATCH, rs6000] Fix PR61397 (test case update for P8 vector loads/stores)

2016-02-26 Thread David Edelsohn

On Fri, Feb 26, 2016 at 9:18 AM, Bill Schmidt
 wrote:
> Hi,
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61397 was almost resolved a
> year ago, but had a proposed patch by Mike Meissner that was never
> vetted and committed.  I've reviewed the patch and tested it on GCC 5
> and GCC 6, and with the patch applied we see the test pass for both
> 32-bit and 64-bit on a Power8 big-endian platform, as well as for 64-bit
> on a Power8 little-endian platform.
>
> As I understand it, the test case got out of sync with the
> implementation in GCC 5, and this rewrite of the test case restores
> order.  I've verified that the original compilation options from Andreas
> Schwab for this test case result in correct generation of lxsdx, which
> was not the case with the original report.
>
> The test case is extremely different in GCC 4.9.  As Mike has noted in
> the PR, the -mupper-regs support does not exist in GCC 4.9, so the
> rewritten test case does not apply there.
>
> As stated, verified on powerpc64-unknown-linux-gnu (-m32, -m64) and
> powerpc64le-unknown-linux-gnu (-m64).  Is this ok for trunk and GCC 5?
>
> Thanks,
> Bill
>
>
> 2016-02-26  Michael Meissner 
> Bill Schmidt  
>
> * gcc.target/powerpc/p8vector-ldst.c: Adjust to test desired
> functionality for both 32-bit and 64-bit.

Okay.

Thanks, David

[PATCH] Fix PR69720

2016-02-26 Thread Richard Biener


The following fixes PR69720 where with nested reductions that require
unrolling the inner loop (and thus having multiple PHIs) we fail to
properly build the reduction epilogue.  Existing testcases in the
testsuite are also affected but for them it doesn't matter as
adding zero can be omitted safely ...

Bootstrapped on x86_64-unknown-linux-gnu, testing still in progress.

We might be able to remove the adjustment_def case completely now,
but that's not appropriate at this stage so I removed only the case
we were handling not correctly.

Richard.

2016-02-26  Richard Biener  

PR tree-optimization/69720
* tree-vect-loop.c (get_initial_def_for_reduction): Avoid
the adjustment_def path for possibly vectorized defs.
(vect_create_epilog_for_reduction): Handle vectorized initial
defs properly.

* gcc.dg/vect/vect-outer-pr69720.c: New testcase.

Index: gcc/tree-vect-loop.c
===
*** gcc/tree-vect-loop.c(revision 233734)
--- gcc/tree-vect-loop.c(working copy)
*** get_initial_def_for_reduction (gimple *s
*** 4110,4115 
--- 4119,4133 
return vect_create_destination_var (init_val, vectype);
  }
  
+   /* In case of a nested reduction do not use an adjustment def as
+  that case is not supported by the epilogue generation correctly
+  if ncopies is not one.  */
+   if (adjustment_def && nested_in_vect_loop)
+ {
+   *adjustment_def = NULL;
+   return vect_get_vec_def_for_operand (init_val, stmt);
+ }
+ 
switch (code)
  {
case WIDEN_SUM_EXPR:
*** get_initial_def_for_reduction (gimple *s
*** 4124,4135 
  /* ADJUSMENT_DEF is NULL when called from
 vect_create_epilog_for_reduction to vectorize double reduction.  */
  if (adjustment_def)
!   {
! if (nested_in_vect_loop)
!   *adjustment_def = vect_get_vec_def_for_operand (init_val, stmt);
! else
!   *adjustment_def = init_val;
!   }
  
  if (code == MULT_EXPR)
{
--- 4142,4148 
  /* ADJUSMENT_DEF is NULL when called from
 vect_create_epilog_for_reduction to vectorize double reduction.  */
  if (adjustment_def)
! *adjustment_def = init_val;
  
  if (code == MULT_EXPR)
{
*** vect_create_epilog_for_reduction (vec

[C PATCH] Fix C error-recovery (PR c/69796, PR c/69974)

2016-02-26 Thread Jakub Jelinek

Hi!

Already PR69483 and these two further PRs show that it is really a bad idea
to set TREE_TYPE of decls with incomplete types to error_mark_node, there
are lots of places in the middle-end that don't expect error_mark_nodes
appearing so late.

I've bootstrapped/regtested on x86_64-linux and i686-linux following patch
which just doesn't do anything with the type, we don't emit the error
multiple times, because if the FE emits errors, cgraphunit doesn't attempt
to assemble variables or compile functions (just gimplifies them) if
seen_error ().  Attached is then untested alternative, set the type to
something that should hopefully not cause ICEs nor further errors/warnings
during the error-recovery.

Ok for trunk, or shall I test the other patch instead?

2016-02-26  Jakub Jelinek  

PR c/69796
PR c/69974
* c-parser.c (c_parser_translation_unit): Don't change TREE_TYPE
of incomplete decls to error_mark_node.

* gcc.dg/pr69796.c: New test.
* gcc.dg/pr69974.c: New test.

--- gcc/c/c-parser.c.jj 2016-02-24 09:33:37.0 +0100
+++ gcc/c/c-parser.c2016-02-26 13:31:00.947133760 +0100
@@ -1436,10 +1436,7 @@ c_parser_translation_unit (c_parser *par
   tree decl;
   FOR_EACH_VEC_ELT (incomplete_record_decls, i, decl)
 if (DECL_SIZE (decl) == NULL_TREE && TREE_TYPE (decl) != error_mark_node)
-  {
-   error ("storage size of %q+D isn%'t known", decl);
-   TREE_TYPE (decl) = error_mark_node;
-  }
+  error ("storage size of %q+D isn%'t known", decl);
 }
 
 /* Parse an external declaration (C90 6.7, C99 6.9).
--- gcc/testsuite/gcc.dg/pr69796.c.jj   2016-02-26 13:16:32.352323768 +0100
+++ gcc/testsuite/gcc.dg/pr69796.c  2016-02-26 13:16:16.0 +0100
@@ -0,0 +1,10 @@
+/* PR c/69796 */
+/* { dg-do compile } */
+
+struct S s;/* { dg-error "storage size of 's' isn't known" } */
+
+void
+foo ()
+{
+  s a; /* { dg-error "expression statement has incomplete type|expected" } */
+}
--- gcc/testsuite/gcc.dg/pr69974.c.jj   2016-02-26 13:13:40.239740046 +0100
+++ gcc/testsuite/gcc.dg/pr69974.c  2016-02-26 13:13:29.0 +0100
@@ -0,0 +1,13 @@
+/* PR c/69974 */
+/* { dg-do compile } */
+
+struct S;
+char foo (struct S *);
+struct S a;/* { dg-error "storage size of 'a' isn't known" } */
+int b;
+
+void
+bar ()
+{
+  b &= foo (&a);
+}


Jakub
2016-02-26  Jakub Jelinek  

PR c/69796
PR c/69974
* c-parser.c (c_parser_translation_unit): For incomplete decls
set type to char_type_node instead of error_mark_node.

* gcc.dg/pr69796.c: New test.
* gcc.dg/pr69974.c: New test.

--- gcc/c/c-parser.c.jj 2016-02-24 09:33:37.0 +0100
+++ gcc/c/c-parser.c2016-02-26 13:04:16.416657335 +0100
@@ -1438,7 +1438,10 @@ c_parser_translation_unit (c_parser *par
 if (DECL_SIZE (decl) == NULL_TREE && TREE_TYPE (decl) != error_mark_node)
   {
error ("storage size of %q+D isn%'t known", decl);
-   TREE_TYPE (decl) = error_mark_node;
+   /* Don't set the type to error_mark_node, the middle-end is unprepared
+  to see error_mark_node type appearing so late in the IL,
+  so using some other type is better for error recovery.  */
+   TREE_TYPE (decl) = char_type_node;
   }
 }
 
--- gcc/testsuite/gcc.dg/pr69796.c.jj   2016-02-26 13:16:32.352323768 +0100
+++ gcc/testsuite/gcc.dg/pr69796.c  2016-02-26 13:16:16.0 +0100
@@ -0,0 +1,10 @@
+/* PR c/69796 */
+/* { dg-do compile } */
+
+struct S s;/* { dg-error "storage size of 's' isn't known" } */
+
+void
+foo ()
+{
+  s a; /* { dg-error "expression statement has incomplete type|expected" } */
+}
--- gcc/testsuite/gcc.dg/pr69974.c.jj   2016-02-26 13:13:40.239740046 +0100
+++ gcc/testsuite/gcc.dg/pr69974.c  2016-02-26 13:13:29.0 +0100
@@ -0,0 +1,13 @@
+/* PR c/69974 */
+/* { dg-do compile } */
+
+struct S;
+char foo (struct S *);
+struct S a;/* { dg-error "storage size of 'a' isn't known" } */
+int b;
+
+void
+bar ()
+{
+  b &= foo (&a);
+}

Re: [PATCH][ARM][wwwdocs] Mention Cortex-A32 and Cortex-A35 support in changes.html for GCC 6

2016-02-26 Thread Richard Earnshaw (lists)

On 26/02/16 14:25, Kyrill Tkachov wrote:
> Hi all,
> 
> This patch adds a note to changes.html about the added support for
> Cortex-A32 and Cortex-A35.
> 
> Ok to commit?
> 

OK.

R.

> Thanks,
> Kyrill
> 
> wwwdocs-a32-a35.patch
> 
> 
> Index: htdocs/gcc-6/changes.html
> ===
> RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-6/changes.html,v
> retrieving revision 1.62
> diff -U 3 -r1.62 changes.html
> --- htdocs/gcc-6/changes.html 24 Feb 2016 09:36:06 -  1.62
> +++ htdocs/gcc-6/changes.html 25 Feb 2016 10:51:14 -
> @@ -345,6 +345,15 @@
> documentation for details of available attributes and
> pragmas as well as usage instructions.
>   
> + 
> +   Support has been added for the following processors
> +   (GCC identifiers in parentheses): ARM Cortex-A32
> +   (cortex-a32), ARM Cortex-A35 (cortex-a35).
> +   The GCC identifiers can be used
> +   as arguments to the -mcpu or -mtune options,
> +   for example: -mcpu=cortex-a32 or
> +   -mtune=cortex-a35.
> + 
> 
>  
>  
>

Re: [PATCH 1/2][AArch64] Implement AAPCS64 updates for alignment attribute

2016-02-26 Thread James Greenhalgh

On Mon, Feb 22, 2016 at 03:07:09PM +, Alan Lawrence wrote:
> On 22/01/16 17:16, Alan Lawrence wrote:
> >
> >On 21/01/16 17:23, Alan Lawrence wrote:
> >>On 18/01/16 17:10, Eric Botcazou wrote:
> >>>
> >>>Could you post the list of files that differ?  How do they differ exactly?
> >>
> >>Hmmm. Well, I definitely had this failing to bootstrap once. I repeated 
> >>that, to
> >>try to identify exactly what the differences wereand it succeeded even 
> >>with
> >>my pre-AAPCS64-update host compiler. So, this is probably a false alarm; I'm
> >>bootstrapping again, after a rebase, to make sure...
> >>
> >>--Alan
> >
> >Ok, rebased onto a more recent build, and bootstrapping with Ada posed no
> >problems. Sorry for the noise.
> >
> >However, I had to drop the assert that TYPE_FIELDS was non-null because of 
> >some
> >C++ testcases.
> >
> >Is this version OK for trunk?
> >
> >--Alan
> >
> >gcc/ChangeLog:
> >
> > * gcc/config/aarch64/aarch64.c (aarch64_function_arg_alignment):
> > Rewrite, looking one level down for records and arrays.
> >---
> >  gcc/config/aarch64/aarch64.c | 31 ---
> >  1 file changed, 16 insertions(+), 15 deletions(-)
> >
> >diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> >index 9142ac0..b084f83 100644
> >--- a/gcc/config/aarch64/aarch64.c
> >+++ b/gcc/config/aarch64/aarch64.c
> >@@ -1925,22 +1925,23 @@ aarch64_vfp_is_call_candidate (cumulative_args_t 
> >pcum_v, machine_mode mode,
> >  static unsigned int
> >  aarch64_function_arg_alignment (machine_mode mode, const_tree type)
> >  {
> >-  unsigned int alignment;
> >+  if (!type)
> >+return GET_MODE_ALIGNMENT (mode);
> >+  if (integer_zerop (TYPE_SIZE (type)))
> >+return 0;
> >
> >-  if (type)
> >-{
> >-  if (!integer_zerop (TYPE_SIZE (type)))
> >-{
> >-  if (TYPE_MODE (type) == mode)
> >-alignment = TYPE_ALIGN (type);
> >-  else
> >-alignment = GET_MODE_ALIGNMENT (mode);
> >-}
> >-  else
> >-alignment = 0;
> >-}
> >-  else
> >-alignment = GET_MODE_ALIGNMENT (mode);
> >+  gcc_assert (TYPE_MODE (type) == mode);
> >+
> >+  if (!AGGREGATE_TYPE_P (type))
> >+return TYPE_ALIGN (TYPE_MAIN_VARIANT (type));
> >+
> >+  if (TREE_CODE (type) == ARRAY_TYPE)
> >+return TYPE_ALIGN (TREE_TYPE (type));
> >+
> >+  unsigned int alignment = 0;
> >+
> >+  for (tree field = TYPE_FIELDS (type); field; field = DECL_CHAIN (field))
> >+alignment = std::max (alignment, DECL_ALIGN (field));
> >
> >return alignment;
> >  }
> >
> 
> 
> Ping.
> 
> If this is not appropriate for GCC6, then is it OK for Stage 1 of GCC 7?

I think this needs to be a GCC 7 fix at this point.

Additionally, I'd like to fully understand PR69841 before we take this
patch.

In particular, under what circumstances can the C++ front end set DECL_ALIGN
of a type to be wider than we expect. Can we ever end up with 128-bit
alignment of a template parameter when we were expecting 32- or 64-bit
alignment, and if we can what are the implications on this patch?

Thanks,
James

Re: [AArch64] Emit square root using the Newton series

2016-02-26 Thread James Greenhalgh

On Mon, Feb 22, 2016 at 06:50:44PM -0600, Evandro Menezes wrote:
> In preparation for the patch adding the Newton series also for
> square root, I'd like to propose this patch changing the name of the
> existing tuning flag for the reciprocal square root.

This is fine, other names like sw_rsqrt, expand_rsqrt, nr_rsqrt would also
be OK. Pick your favourite!

One comment on the replacement invoke.texi text below, otherwise this is
OK to apply now.

> diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
> index 5cbd4cd..155d2bd 100644
> --- a/gcc/config/aarch64/aarch64.opt
> +++ b/gcc/config/aarch64/aarch64.opt
> @@ -151,5 +151,5 @@ PC relative literal loads.
>  
>  mlow-precision-recip-sqrt
>  Common Var(flag_mrecip_low_precision_sqrt) Optimization
> -When calculating a sqrt approximation, run fewer steps.
> +Calculate the reciprocal square-root approximation in fewer steps.
>  This reduces precision, but can result in faster computation.
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 490df93..eeff24d 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -12879,12 +12879,10 @@ corresponding flag to the linker.
>  @item -mno-low-precision-recip-sqrt
>  @opindex -mlow-precision-recip-sqrt
>  @opindex -mno-low-precision-recip-sqrt
> -The square root estimate uses two steps instead of three for 
> double-precision,
> -and one step instead of two for single-precision.
> -Thus reducing latency and precision.
> -This is only relevant if @option{-ffast-math} activates
> -reciprocal square root estimate instructions.
> -Which in turn depends on the target processor.
> +The reciprocal square root approximation uses one step less than otherwise,
> +thus reducing latency and precision.

When calculating the reciprocal square root approximation, use one less
step than otherwise, thus reducing latency and precision.

Thanks,
James

[PATCH] S/390: PR69709 Fix risbg splitter

2016-02-26 Thread Andreas Krebbel

This fixes a wrong code generation problem with the splitters introduced
with that patch: https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01840.html

The target operand is used as temporary.  This fails if it matches the
source of the left shift which is read after writing the temporary.

Thanks to Dominik for debugging it and thanks to Richard for the fix!

Bootstrapped and regtested on s390x with-arch=z13.

Bye,

-Andreas-

gcc/ChangeLog:

2016-02-26  Richard Henderson  

PR target/69709
* config/s390/s390.md (risbg and risbgn splitters): Allocate new
pseudo in case the target rtx matches the source of the left
shift.

gcc/testsuite/ChangeLog:

2016-02-26  Andreas Krebbel  

PR target/69709
* gcc.target/s390/pr69709.c: New test.
---
 gcc/config/s390/s390.md | 24 
 gcc/testsuite/gcc.target/s390/pr69709.c | 39 +
 2 files changed, 59 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/pr69709.c

diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index 2c90eae..8f92018 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -3876,13 +3876,21 @@
 (ashift:GPR (match_operand:GPR 3 "nonimmediate_operand" "")
 (match_operand:GPR 4 "nonzero_shift_count_operand" 
""]
   "TARGET_ZEC12 && UINTVAL (operands[2]) + UINTVAL (operands[4]) >= "
-  [(set (match_dup 0)
+  [(set (match_dup 6)
(lshiftrt:GPR (match_dup 1) (match_dup 2)))
(set (match_dup 0)
-   (ior:GPR (and:GPR (match_dup 0) (match_dup 5))
+   (ior:GPR (and:GPR (match_dup 6) (match_dup 5))
 (ashift:GPR (match_dup 3) (match_dup 4]
 {
   operands[5] = GEN_INT ((1UL << UINTVAL (operands[4])) - 1);
+  if (rtx_equal_p (operands[0], operands[3]))
+{
+  if (!can_create_pseudo_p ())
+   FAIL;
+  operands[6] = gen_reg_rtx (mode);
+}
+  else
+operands[6] = operands[0];
 })
 
 (define_split
@@ -3894,15 +3902,23 @@
   (match_operand:GPR 4 
"nonzero_shift_count_operand" ""
  (clobber (reg:CC CC_REGNUM))])]
   "TARGET_Z10 && !TARGET_ZEC12 && UINTVAL (operands[2]) + UINTVAL 
(operands[4]) >= "
-  [(set (match_dup 0)
+  [(set (match_dup 6)
(lshiftrt:GPR (match_dup 1) (match_dup 2)))
(parallel
 [(set (match_dup 0)
- (ior:GPR (and:GPR (match_dup 0) (match_dup 5))
+ (ior:GPR (and:GPR (match_dup 6) (match_dup 5))
   (ashift:GPR (match_dup 3) (match_dup 4
  (clobber (reg:CC CC_REGNUM))])]
 {
   operands[5] = GEN_INT ((1UL << UINTVAL (operands[4])) - 1);
+  if (rtx_equal_p (operands[0], operands[3]))
+{
+  if (!can_create_pseudo_p ())
+   FAIL;
+  operands[6] = gen_reg_rtx (mode);
+}
+  else
+operands[6] = operands[0];
 })
 
 (define_insn "*rsbg__noshift"
diff --git a/gcc/testsuite/gcc.target/s390/pr69709.c 
b/gcc/testsuite/gcc.target/s390/pr69709.c
new file mode 100644
index 000..e9aa024
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/pr69709.c
@@ -0,0 +1,39 @@
+/* PR69709 This testcase used to fail due to a broken risbg
+   splitter.  */
+
+/* { dg-do run } */
+/* { dg-options "-O3 -march=z10" } */
+
+
+typedef struct
+{
+  unsigned int sig[2];
+}
+val_t;
+
+unsigned int __attribute__ ((noinline))
+div_significands (const val_t * a)
+{
+  val_t u = *a;
+  int bit = 64;
+  unsigned int r;
+  do
+{
+  u.sig[1] = (u.sig[1] << 1) | (u.sig[0] >> 31);
+  u.sig[0] = 42;
+
+  if (bit == 64)
+   r = u.sig[1];
+}
+  while (--bit >= 0);
+  return r;
+}
+
+int
+main (void)
+{
+  val_t a = { { 0x1, 0x1 } };
+  if (div_significands (&a) != 2)
+__builtin_abort ();
+  return 0;
+}
-- 
1.9.1

Re: Wonly-top-basic-asm

2016-02-26 Thread Bernd Schmidt


On 02/21/2016 11:27 AM, David Wohlferd wrote:

So now what?  I have one Bernd who likes the sample, and one who
doesn't.  Obviously I think what I'm proposing is better than what's
there now and I've done my best to say why.  But me believing it to be
better doesn't get anything checked in.


I hadn't thought it through well enough. Jan's objection (order isn't 
guaranteed) is relevant. I'd drop the example.



Bernd

Re: Fix PR44281 (bad RA with global regs)

2016-02-26 Thread Bernd Schmidt


On 02/22/2016 03:37 PM, Richard Biener wrote:


Do calls properly clobber them even if they are not in the set of
call-clobbered regs?
Are asm()s properly using/clobbering them?  I think you are allowed to use them
in asm()s without adding constraints for them?


Calls do, asms currently don't AFAICT. Not sure whether it's allowed to 
use them, but I think it should be straightforward to adjust df-scan.


Michael Matz wrote:


Some jit-like code uses global reg vars to
reserve registers for the generated code.  It would have to add
-ffixed- compilation options instead of only the global vars after
the patch.  I think the advantages of having "good RA" with global reg
vars are dubious enough to not warrant breaking backward compatibility.


I don't see the need for new options - if we have proper live info for 
calls and asms, what other reason could there be for incomplete liveness 
information?


RB:
> I'd say this is sth for GCC 7.

Not really objecting to that. Would you ack it if I extended df-scan for 
the asm case?



Bernd

Re: [PATCH][AArch64] Remove an unused reload hook.

2016-02-26 Thread James Greenhalgh

On Thu, Feb 25, 2016 at 12:00:58PM +0100, Yvan Roux wrote:
> Hi,
> 
> On 26 January 2015 at 18:01, Matthew Wahab  wrote:
> > Hello,
> >
> > The LEGITIMIZE_RELOAD_ADDRESS macro is only needed for reload. Since the
> > Aarch64 backend no longer supports reload, this macro is not needed and this
> > patch removes it.
> >
> > Tested aarch64-none-linux-gnu with gcc-check. No new failures.
> >
> > Ok for trunk?
> > Matthew
> >
> > gcc/
> > 2015-01-26  Matthew Wahab  
> >
> > * config/aarch64/aarch64.h (LEGITIMIZE_RELOAD_ADDRESS): Remove.
> > * config/aarch64/arch64-protos.h
> > (aarch64_legitimize_reload_address): Remove.
> > * config/aarch64/aarch64.c (aarch64_legitimize_reload_address):
> > Remove.
> 
> It seems that this old patch was forgotten, I guess that it'll have to
> wait for GCC 7 now, but I think it's a good thing top cleanup the
> reload specific hooks and constructions (I've another patch on for
> that under on-going).

Agreed. Please repost the patch once GCC 7 development opens and I'll take
a look at it then.

Thanks,
James

Re: [PATCH][AArch64] PR target/69613: Return zero TARGET_SHIFT_TRUNCATION_MASK when SHIFT_COUNT_TRUNCATED is false

2016-02-26 Thread James Greenhalgh

On Thu, Feb 25, 2016 at 09:25:45AM +, Kyrill Tkachov wrote:
> Hi all,
> 
> In this wrong-code PR we get bad code when synthesising a TImode right shift
> by variable amount using DImode shifts during expand.
> 
> The expand_double_word_shift function expands two paths: one where the
> variable amount is greater than GET_MODE_BITSIZE (DImode) (word_mode for 
> aarch64) at runtime
> and one where it's less than that, and performs a conditional select between 
> the two in the end.
> 
> The second path is the one that goes wrong here.
> The algorithm it uses for calculating a TImode shift when the variable shift 
> amount is <64 is:
> 
> 
>   --DImode-   --DImode- --DImode-   --DImode-
> { |__ target-hi ___| |___ target-lo ___| } = { |___ source-hi ___| |___ 
> source-lo ___| } >> var_amnt
> 
> 1) carry = source-hi << 1
> 2) tmp = ~var_amnt // either that or BITS_PER_WORD - 1 - var_amnt depending 
> on shift_truncation_mask
> 3) carry = carry << tmp // net effect is that carry is source-hi shifted left 
> by BITS_PER_WORD - var_amnt
> 4) target-lo = source-lo >>u var_amnt //unsigned shift.
> 5) target-lo = target-lo | carry
> 6) target-hi = source-hi >> var_amnt
> 
> where the two DImode words source-hi and source-lo are the two words of the
> source TImode value and var_amnt is the register holding the variable shift
> amount. This is performed by the expand_subword_shift function in optabs.c.
> 
> Step 2) is valid only if the target truncates the shift amount by the width
> of the type its shifting, that is SHIFT_COUNT_TRUNCATED is true and
> TARGET_SHIFT_TRUNCATION_MASK is 63 in this case.
> 
> Step 3) is the one that goes wrong.  On aarch64 a left shift is usually
> implemented using the LSL instruction but we also have alternatives that can
> use the SIMD registers and the USHL instruction.  In this case we end up
> using the USHL instruction. However, although the USHL instruction does
> a DImode shift, it doesn't truncate the shift amount by 64, but rather by
> 255. Furthermore, the shift amount is interpreted as a 2's complement signed
> integer and if it's negative performs a right shift.  This is in contrast
> with the normal LSL instruction which just performs an unsigned shift by an
> amount truncated by 64.
> 
> Now, on aarch64 SHIFT_COUNT_TRUNCATED is defined as !TARGET_SIMD, so we don't
> assume shift amounts are truncated unless we know we can only ever use the
> LSL instructions for variable shifts.
> 
> However, we still return 63 as the TARGET_SHIFT_TRUNCATION_MASK for DImode,
> so expand_subword_shift assumes that since the mask is non-zero it's a valid
> shift truncation mask.  The documentation for TARGET_SHIFT_TRUNCATION_MASK
> says:
> " The default implementation of this function returns
>   @code{GET_MODE_BITSIZE (@var{mode}) - 1} if @code{SHIFT_COUNT_TRUNCATED}
>   and 0 otherwise."
> 
> So since for TARGET_SIMD we cannot guarantee that all shifts truncate their
> amount, we should be returning 0 in TARGET_SHIFT_TRUNCATION_MASK when
> SHIFT_COUNT_TRUNCATED is false.  This is what the patch does, and with it
> step 2) becomes:
> 2) tmp = BITS_PER_WORD - 1 - var_amnt
> which behaves correctly on aarch64.
> 
> Unfortunately, the testcase from the PR has very recently gone latent on
> trunk because it depends on register allocation picking a particular
> alternative from the *aarch64_ashl_sisd_or_int_3 pattern in aarch64.md.
> I tried to do some inline asm tricks to get it to force the correct
> constraints but was unsuccessful. Nevertheless I've included the testcase in
> the patch. I suppose it's better than nothing.

Thanks for the great write-up. This level of detail is very helpful for
review.

OK for trunk.

Thanks,
James

> 
> Bootstrapped and tested on aarch64.
> 
> Ok for trunk?
> 
> Thanks,
> Kyrill
> 
> 
> 2016-02-25  Kyrylo Tkachov  
> 
> PR target/69613
> * config/aarch64/aarch64.c (aarch64_shift_truncation_mask):
> Return 0 if !SHIFT_COUNT_TRUNCATED
> 
> 2016-02-25  Kyrylo Tkachov  
> 
> PR target/69613
> * gcc.dg/torture/pr69613.c: New test.

> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 
> d0d15b4feee70a5ca6af8dd16c7667cbcd844dbf..2e69e345545e591d1de76c2d175aac476e6e1107
>  100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -11171,7 +11171,8 @@ static unsigned HOST_WIDE_INT
>  aarch64_shift_truncation_mask (machine_mode mode)
>  {
>return
> -(aarch64_vector_mode_supported_p (mode)
> +(!SHIFT_COUNT_TRUNCATED
> + || aarch64_vector_mode_supported_p (mode)
>   || aarch64_vect_struct_mode_p (mode)) ? 0 : (GET_MODE_BITSIZE (mode) - 
> 1);
>  }
>  
> diff --git a/gcc/testsuite/gcc.dg/torture/pr69613.c 
> b/gcc/testsuite/gcc.dg/torture/pr69613.c
> new file mode 100644
> index 
> ..44f2b0cc91ac4b12c4d255b608f95bc8bf016923
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/torture/pr69613.c
> @@ -0,0 +1

Re: [PATCH][AArch64] Set TREE_TARGET_GLOBALS in aarch64_set_current_function when new tree is the default node to recalculate optab availability

2016-02-26 Thread James Greenhalgh

On Thu, Feb 25, 2016 at 11:04:21AM +, Kyrill Tkachov wrote:
> Hi all,
> 
> Seems like aarch64 is suffering from something similar to PR 69245 as well.
> If a target pragma sets the target state to the same as the
> target_option_default_node the node is just a pointer to
> target_option_default_node rather than a distinct identical node. So we must
> still restore the target globals even when setting to
> target_option_default_node in order to force the midend to recompute the
> availability of various optabs.
> 
> If we don't do it, we can get in a problem like in the testcase where the
> isa_flags are all set correctly, but the optab HAVE_* predicates have not
> been recomputed.
> 
> There is also a related issue present when popping/resetting target pragmas
> for which I'll send out a patch separately.
> 
> Bootstrapped and tested on aarch64.
> 
> Ok for trunk?

OK.

Thanks,
James

> Thanks,
> Kyrill
> 
> 
> 2016-02-25  Kyrylo Tkachov  
> 
> PR target/69245
> * config/aarch64/aarch64.c (aarch64_set_current_function): Save/restore
> target globals when switching to target_option_default_node.
> 
> 2016-02-25  Kyrylo Tkachov  
> 
> PR target/69245
> * gcc.target/aarch64/pr69245_1.c: New test.

[omp, hsa] Do not gridify simd constructs

2016-02-26 Thread Martin Jambor

Hi,

unfortunately, I have missed two execution failures when using HSA
(teams-6.f90 and target1.f90), both of which are caused by not
handling the simd part of combined target teams distribute parallel
for simd construct.  I have not really thought through how exactly
should GPUs teat the simd construct in a combined construct but of
course we must not miscompile.

So I'd like to commit the following patch which just disallows
gridification of such cases to trunk now and am working on a fix that
removes the simd loop for the hsa branch.  Eventually we might also
want to hsa-vectorize the body even though the explicit loop is
missing, but that will also mean that the HSA grid size has to shrink
appropriately.

Bootstrapped and tested on x86_64-linux, with and without HSA
enabled.  OK for trunk?

Thanks,

Martin

2016-02-17  Martin Jambor  

* omp-low.c (grid_find_ungridifiable_statement): Store problematic
statements to wi->info.  Also disallow omp simd constructs.
(grid_target_follows_gridifiable_pattern): Use wi.info to dump reason
for not gridifying.  Dump special string for omp_for.
---
 gcc/omp-low.c | 26 ++
 1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index fcbb3e0..989d03e 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -17241,7 +17241,7 @@ grid_find_single_omp_among_assignments (gimple_seq seq, 
location_t target_loc,
 static tree
 grid_find_ungridifiable_statement (gimple_stmt_iterator *gsi,
   bool *handled_ops_p,
-  struct walk_stmt_info *)
+  struct walk_stmt_info *wi)
 {
   *handled_ops_p = false;
   gimple *stmt = gsi_stmt (*gsi);
@@ -17251,6 +17251,7 @@ grid_find_ungridifiable_statement (gimple_stmt_iterator 
*gsi,
   if (gimple_call_noreturn_p (as_a  (stmt)))
{
  *handled_ops_p = true;
+ wi->info = stmt;
  return error_mark_node;
}
   break;
@@ -17266,8 +17267,19 @@ grid_find_ungridifiable_statement 
(gimple_stmt_iterator *gsi,
 case GIMPLE_OMP_TARGET:
 case GIMPLE_OMP_ORDERED:
   *handled_ops_p = true;
+  wi->info = stmt;
   return error_mark_node;
 
+case GIMPLE_OMP_FOR:
+  if ((gimple_omp_for_kind (stmt) & GF_OMP_FOR_SIMD)
+ && gimple_omp_for_combined_into_p (stmt))
+   {
+ *handled_ops_p = true;
+ wi->info = stmt;
+ return error_mark_node;
+   }
+  break;
+
 default:
   break;
 }
@@ -17509,10 +17521,11 @@ grid_target_follows_gridifiable_pattern (gomp_target 
*target, tree *group_size_p
 
   struct walk_stmt_info wi;
   memset (&wi, 0, sizeof (wi));
-  if (gimple *bad = walk_gimple_seq (gimple_omp_body (gfor),
-grid_find_ungridifiable_statement,
-NULL, &wi))
+  if (walk_gimple_seq (gimple_omp_body (gfor),
+  grid_find_ungridifiable_statement,
+  NULL, &wi))
 {
+  gimple *bad = (gimple *) wi.info;
   if (dump_enabled_p ())
{
  if (is_gimple_call (bad))
@@ -17520,6 +17533,11 @@ grid_target_follows_gridifiable_pattern (gomp_target 
*target, tree *group_size_p
 "Will not turn target construct into a gridified "
 " GPGPU kernel because the inner loop contains "
 "call to a noreturn function\n");
+ if (gimple_code (bad) == GIMPLE_OMP_FOR)
+   dump_printf_loc (MSG_NOTE, tloc,
+"Will not turn target construct into a gridified "
+" GPGPU kernel because the inner loop contains "
+"a simd construct\n");
  else
dump_printf_loc (MSG_NOTE, tloc,
 "Will not turn target construct into a gridified "
-- 
2.7.1

[hsa] Fail in presence of atomic operations in private segment

2016-02-26 Thread Martin Jambor

Hi,

the HSA does not allow atomic instructions operating on the private
segment because they are quite pointless.  In the long term, we should
put addressable local variables in the global memory somewhere (I'm
going to look at the nvidia stack effort soon).  But for now we need
to fail in the compiler, because otherwise there is testcase in the
libgomp testsuite that complains about the finalizer errors in the
output of the execute test.  Bootstrapped and tested, I'll commit it
to trunk shortly.

Thanks,

Martin

2016-02-17  Martin Jambor  

* hsa-gen.c (gen_hsa_ternary_atomic_for_builtin): Fail in presence of
atomic operations in private segment.
---
 gcc/hsa-gen.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c
index 28e8b6f..717e755 100644
--- a/gcc/hsa-gen.c
+++ b/gcc/hsa-gen.c
@@ -4557,8 +4557,13 @@ gen_hsa_ternary_atomic_for_builtin (bool ret_orig,
 
   hsa_op_address *addr;
   addr = get_address_from_value (gimple_call_arg (stmt, 0), hbb);
-  /* TODO: Warn if addr has private segment, because the finalizer will not
- accept that (and it does not make much sense).  */
+  if (addr->m_symbol && addr->m_symbol->m_segment == BRIG_SEGMENT_PRIVATE)
+{
+  HSA_SORRY_AT (gimple_location (stmt),
+   "HSA does not implement atomic operations in private "
+   "segment");
+  return;
+}
   hsa_op_base *op = hsa_reg_or_immed_for_gimple_op (gimple_call_arg (stmt, 1),
hbb);
 
-- 
2.7.1

[hsa] Satisfy conditional move operand type constrains

2016-02-26 Thread Martin Jambor

Hi,

It turned out that we got the types of conditional move operands and
their immediate operands wrong in, partly because we missed a detail
in the specs, partly because the finalizer/verifier is even stricter
in one aspect.  I'll commit the bootstrapped and tested fix below
shortly.

Thanks,

Martin

2016-02-17  Martin Jambor  

* hsa.h (is_a_helper): New overload for hsa_op_immed for
hsa_op_with_type operands.
(hsa_unsigned_type_for_type): Declare.
* hsa.c (hsa_unsigned_type_for_type): New function.
* hsa-gen.c (gen_hsa_binary_operation): Use hsa_unsigned_type_for_type.
(gen_hsa_insns_for_operation_assignment): Satisfy constrains of
the finalizer.  Do not emit extra move.
---
 gcc/hsa-gen.c | 28 +++-
 gcc/hsa.c |  8 
 gcc/hsa.h | 12 
 3 files changed, 31 insertions(+), 17 deletions(-)

diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c
index 717e755..a7dc220 100644
--- a/gcc/hsa-gen.c
+++ b/gcc/hsa-gen.c
@@ -3022,7 +3022,7 @@ gen_hsa_binary_operation (int opcode, hsa_op_reg *dest,
   && is_a  (op2))
 {
   hsa_op_immed *i = dyn_cast  (op2);
-  i->set_type (hsa_uint_for_bitsize (hsa_type_bit_size (i->m_type)));
+  i->set_type (hsa_unsigned_type_for_type (i->m_type));
 }
 
   hsa_insn_basic *insn = new hsa_insn_basic (3, opcode, dest->m_type, dest,
@@ -3233,27 +3233,21 @@ gen_hsa_insns_for_operation_assignment (gimple *assign, 
hsa_bb *hbb)
ctrl = r;
  }
 
-   hsa_op_with_type *rhs2_reg = hsa_reg_or_immed_for_gimple_op (rhs2, hbb);
-   hsa_op_with_type *rhs3_reg = hsa_reg_or_immed_for_gimple_op (rhs3, hbb);
-
-   BrigType16_t btype = hsa_bittype_for_type (dest->m_type);
-   hsa_op_reg *tmp = new hsa_op_reg (btype);
+   hsa_op_with_type *op2 = hsa_reg_or_immed_for_gimple_op (rhs2, hbb);
+   hsa_op_with_type *op3 = hsa_reg_or_immed_for_gimple_op (rhs3, hbb);
 
-   rhs2_reg->m_type = btype;
-   rhs3_reg->m_type = btype;
+   BrigType16_t utype = hsa_unsigned_type_for_type (dest->m_type);
+   if (is_a  (op2))
+ op2->m_type = utype;
+   if (is_a  (op3))
+ op3->m_type = utype;
 
hsa_insn_basic *insn
- = new hsa_insn_basic (4, BRIG_OPCODE_CMOV, tmp->m_type, tmp, ctrl,
-   rhs2_reg, rhs3_reg);
+ = new hsa_insn_basic (4, BRIG_OPCODE_CMOV,
+   hsa_bittype_for_type (dest->m_type),
+   dest, ctrl, op2, op3);
 
hbb->append_insn (insn);
-
-   /* As operands of a CMOV insn must be Bx types, we have to emit
-  a conversion insn.  */
-   hsa_insn_basic *mov = new hsa_insn_basic (2, BRIG_OPCODE_MOV,
- dest->m_type, dest, tmp);
-   hbb->append_insn (mov);
-
return;
   }
 case COMPLEX_EXPR:
diff --git a/gcc/hsa.c b/gcc/hsa.c
index f0b3205..9537e29 100644
--- a/gcc/hsa.c
+++ b/gcc/hsa.c
@@ -472,6 +472,14 @@ hsa_bittype_for_type (BrigType16_t t)
   return hsa_bittype_for_bitsize (hsa_type_bit_size (t));
 }
 
+/* Return HSA unsigned integer type with the same size as the type T.  */
+
+BrigType16_t
+hsa_unsigned_type_for_type (BrigType16_t t)
+{
+  return hsa_uint_for_bitsize (hsa_type_bit_size (t));
+}
+
 /* Return true if and only if TYPE is a floating point number type.  */
 
 bool
diff --git a/gcc/hsa.h b/gcc/hsa.h
index f0436f3..275a954 100644
--- a/gcc/hsa.h
+++ b/gcc/hsa.h
@@ -199,6 +199,17 @@ is_a_helper ::test (hsa_op_base *p)
   return p->m_kind == BRIG_KIND_OPERAND_CONSTANT_BYTES;
 }
 
+/* Likewise, but for a more specified base. */
+
+template <>
+template <>
+inline bool
+is_a_helper ::test (hsa_op_with_type *p)
+{
+  return p->m_kind == BRIG_KIND_OPERAND_CONSTANT_BYTES;
+}
+
+
 /* HSA register operand.  */
 
 class hsa_op_reg : public hsa_op_with_type
@@ -1326,6 +1337,7 @@ BrigType16_t hsa_bittype_for_bitsize (unsigned bitsize);
 BrigType16_t hsa_uint_for_bitsize (unsigned bitsize);
 BrigType16_t hsa_float_for_bitsize (unsigned bitsize);
 BrigType16_t hsa_bittype_for_type (BrigType16_t t);
+BrigType16_t hsa_unsigned_type_for_type (BrigType16_t t);
 bool hsa_type_float_p (BrigType16_t type);
 bool hsa_type_integer_p (BrigType16_t type);
 bool hsa_btype_p (BrigType16_t type);
-- 
2.7.1

[hsa/69674] Make testsuite libgomp.c/for-3.c compile with -m32

2016-02-26 Thread Martin Jambor

Hi,

the following wrong decision about HSA pointer type leads to hitting
an assert later in the out-of-ssa phase for 32bit targets.  I have not
attempted to test HSA on a 32bit i686 yet, I believe there is no
functional runtime (although it should be possible to come up with
one) and we do not build libgomp plugin for it, but we should not ICE
and the new code below is the correct way of doing things anyway.  So
I will commit it shortly, it has been included in a
bootstrap/testsuite run.

Thanks,

Martin

2016-02-18  Martin Jambor  

pr hsa/69674
* hsa-gen.c (gen_hsa_phi_from_gimple_phi): Use proper hsa type for
pointers.
(gen_hsa_addr): Allow integer constants in TMR_INDEX2.
---
 gcc/hsa-gen.c | 17 +
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c
index a7dc220..a295fda 100644
--- a/gcc/hsa-gen.c
+++ b/gcc/hsa-gen.c
@@ -2105,9 +2105,17 @@ gen_hsa_addr (tree ref, hsa_bb *hbb, HOST_WIDE_INT 
*output_bitsize = NULL,
}
   if (TMR_INDEX2 (ref))
{
- hsa_op_base *disp2 = hsa_cfun->reg_for_gimple_ssa
-   (TMR_INDEX2 (ref))->get_in_type (addrtype, hbb);
- reg = add_addr_regs_if_needed (reg, as_a  (disp2), hbb);
+ if (TREE_CODE (TMR_INDEX2 (ref)) == SSA_NAME)
+   {
+ hsa_op_base *disp2 = hsa_cfun->reg_for_gimple_ssa
+   (TMR_INDEX2 (ref))->get_in_type (addrtype, hbb);
+ reg = add_addr_regs_if_needed (reg, as_a  (disp2),
+hbb);
+   }
+ else if (TREE_CODE (TMR_INDEX2 (ref)) == INTEGER_CST)
+   offset += wi::to_offset (TMR_INDEX2 (ref));
+ else
+   gcc_unreachable ();
}
   offset += wi::to_offset (TMR_OFFSET (ref));
   break;
@@ -5329,7 +5337,8 @@ gen_hsa_phi_from_gimple_phi (gimple *phi_stmt, hsa_bb 
*hbb)
  hsa_op_address *addr = gen_hsa_addr (TREE_OPERAND (op, 0),
   hbb_src);
 
- hsa_op_reg *dest = new hsa_op_reg (BRIG_TYPE_U64);
+ hsa_op_reg *dest
+   = new hsa_op_reg (hsa_get_segment_addr_type 
(BRIG_SEGMENT_FLAT));
  hsa_insn_basic *insn
= new hsa_insn_basic (2, BRIG_OPCODE_LDA, BRIG_TYPE_U64,
  dest, addr);
-- 
2.7.1

[hsa/69568] Fix ld instruction type for packed data

2016-02-26 Thread Martin Jambor

Hi,

this is a fix for another type of HSA immediate operands that we got
wrong.  Apparently, the immediate value that is to be used in an
operation on packed type must be unsigned (unless it is 128bit, then
it has to be bit-typed, which is however something we are handling
correctly even now).

The patch below fulfills that requirement.  I have bootstrapped and
tested it and will commit it shortly.

Thanks,

Martin

2016-02-18  Martin Jambor  

PR hsa/69568
* hsa.h (hsa_type_packed_p): Declare.
* hsa.c (hsa_type_packed_p): New function.
* hsa-gen.c (mem_type_for_type): Use unsigned type for packed
loads.
(gen_hsa_insns_for_store): Use hsa_type_packed_p.
* hsa-brig.c (emit_basic_insn): Likewise.
---
 gcc/hsa-brig.c | 2 +-
 gcc/hsa-gen.c  | 6 --
 gcc/hsa.c  | 8 
 gcc/hsa.h  | 1 +
 4 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/gcc/hsa-brig.c b/gcc/hsa-brig.c
index cfbac58..61cfd8b 100644
--- a/gcc/hsa-brig.c
+++ b/gcc/hsa-brig.c
@@ -1803,7 +1803,7 @@ emit_basic_insn (hsa_insn_basic *insn)
   repr.base.type = lendian16 (type);
   repr.base.operands = lendian32 (emit_insn_operands (insn));
 
-  if ((type & BRIG_TYPE_PACK_MASK) != BRIG_TYPE_PACK_NONE)
+  if (hsa_type_packed_p (type))
 {
   if (hsa_type_float_p (type)
  && !hsa_opcode_floating_bit_insn_p (insn->m_opcode))
diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c
index a295fda..3b915cc 100644
--- a/gcc/hsa-gen.c
+++ b/gcc/hsa-gen.c
@@ -754,11 +754,13 @@ mem_type_for_type (BrigType16_t type)
  unsigned type?).  */
   if ((type & BRIG_TYPE_PACK_MASK) == BRIG_TYPE_PACK_128)
 return BRIG_TYPE_B128;
-  else if (hsa_btype_p (type))
+  else if (hsa_btype_p (type) || hsa_type_packed_p (type))
 {
   unsigned bitsize = hsa_type_bit_size (type);
   if (bitsize < 128)
return hsa_uint_for_bitsize (bitsize);
+  else
+   return hsa_bittype_for_bitsize (bitsize);
 }
   return type;
 }
@@ -2648,7 +2650,7 @@ gen_hsa_insns_for_store (tree lhs, hsa_op_base *src, 
hsa_bb *hbb)
  we can modify the above in place.  */
   if (hsa_op_immed *imm = dyn_cast  (src))
 {
-  if ((imm->m_type & BRIG_TYPE_PACK_MASK) == BRIG_TYPE_PACK_NONE)
+  if (!hsa_type_packed_p (imm->m_type))
imm->m_type = mem->m_type;
   else
{
diff --git a/gcc/hsa.c b/gcc/hsa.c
index 9537e29..09bfd28 100644
--- a/gcc/hsa.c
+++ b/gcc/hsa.c
@@ -480,6 +480,14 @@ hsa_unsigned_type_for_type (BrigType16_t t)
   return hsa_uint_for_bitsize (hsa_type_bit_size (t));
 }
 
+/* Return true if TYPE is a packed HSA type.  */
+
+bool
+hsa_type_packed_p (BrigType16_t type)
+{
+  return (type & BRIG_TYPE_PACK_MASK) != BRIG_TYPE_PACK_NONE;
+}
+
 /* Return true if and only if TYPE is a floating point number type.  */
 
 bool
diff --git a/gcc/hsa.h b/gcc/hsa.h
index 275a954..3a13fbd 100644
--- a/gcc/hsa.h
+++ b/gcc/hsa.h
@@ -1338,6 +1338,7 @@ BrigType16_t hsa_uint_for_bitsize (unsigned bitsize);
 BrigType16_t hsa_float_for_bitsize (unsigned bitsize);
 BrigType16_t hsa_bittype_for_type (BrigType16_t t);
 BrigType16_t hsa_unsigned_type_for_type (BrigType16_t t);
+bool hsa_type_packed_p (BrigType16_t type);
 bool hsa_type_float_p (BrigType16_t type);
 bool hsa_type_integer_p (BrigType16_t type);
 bool hsa_btype_p (BrigType16_t type);
-- 
2.7.1

[hsa, testsuite] Gridification tests

2016-02-26 Thread Martin Jambor

Hi,

the patch below adds a DejaGNU effective target predicate (is that the
correct dejagnu term?) offload_hsa so that selected tests can be run
only if the hsa offloading is enabled.  I hope it is fairly standard
stuff.  Additionally, it adds one C/C++ and one Fortran testsuite to
check that gridification happens.

Tested, both with and without HSA enabled.  OK for trunk?

Thanks,

Martin

2016-02-10  Martin Jambor  

* target-supports.exp (check_effective_target_offload_hsa): New.
* c-c++-common/gomp/gridify-1.c: New test.
* gfortran.dg/gomp/gridify-1.f90: Likewise.
---
 gcc/testsuite/c-c++-common/gomp/gridify-1.c  | 54 
 gcc/testsuite/gfortran.dg/gomp/gridify-1.f90 | 16 +
 gcc/testsuite/lib/target-supports.exp|  8 +
 3 files changed, 78 insertions(+)
 create mode 100644 gcc/testsuite/c-c++-common/gomp/gridify-1.c
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/gridify-1.f90

diff --git a/gcc/testsuite/c-c++-common/gomp/gridify-1.c 
b/gcc/testsuite/c-c++-common/gomp/gridify-1.c
new file mode 100644
index 000..ba7a866
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/gridify-1.c
@@ -0,0 +1,54 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target offload_hsa } */
+/* { dg-options "-fopenmp -fdump-tree-omplower-details" } */
+
+void
+foo1 (int n, int *a, int workgroup_size)
+{
+  int i;
+#pragma omp target
+#pragma omp teams thread_limit(workgroup_size)
+#pragma omp distribute parallel for shared(a) firstprivate(n) private(i)
+for (i = 0; i < n; i++)
+  a[i]++;
+}
+
+void
+foo2 (int j, int n, int *a)
+{
+  int i;
+#pragma omp target teams
+#pragma omp distribute parallel for shared(a) firstprivate(n) private(i) 
firstprivate(j)
+for (i = j + 1; i < n; i++)
+  a[i] = i;
+}
+
+void
+foo3 (int j, int n, int *a)
+{
+  int i;
+#pragma omp target teams
+#pragma omp distribute parallel for shared(a) firstprivate(n) private(i) 
firstprivate(j)
+  for (i = j + 1; i < n; i += 3)
+a[i] = i;
+}
+
+void
+foo4 (int j, int n, int *a)
+{
+#pragma omp parallel
+  {
+#pragma omp single
+{
+  int i;
+#pragma omp target
+#pragma omp teams
+#pragma omp distribute parallel for shared(a) firstprivate(n) private(i) 
firstprivate(j)
+  for (i = j + 1; i < n; i += 3)
+   a[i] = i;
+}
+  }
+}
+
+
+/* { dg-final { scan-tree-dump-times "Target construct will be turned into a 
gridified GPGPU kernel" 4 "omplower" } } */
diff --git a/gcc/testsuite/gfortran.dg/gomp/gridify-1.f90 
b/gcc/testsuite/gfortran.dg/gomp/gridify-1.f90
new file mode 100644
index 000..00ff7f5
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/gridify-1.f90
@@ -0,0 +1,16 @@
+! { dg-do compile }
+! { dg-require-effective-target offload_hsa }
+! { dg-options "-fopenmp -fdump-tree-omplower-details" } */
+
+subroutine vector_square(n, a, b)
+  integer i, n, b(n), a(n)
+!$omp target teams
+!$omp distribute parallel do
+  do i=1,n
+  b(i) = a(i) * a(i)
+  enddo
+!$omp end distribute parallel do
+!$omp end target teams
+end subroutine vector_square
+
+! { dg-final { scan-tree-dump "Target construct will be turned into a 
gridified GPGPU kernel" "omplower" } }
diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index 0b4252f..fac4c3c 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -6936,3 +6936,11 @@ proc check_effective_target_offload_nvptx { } {
int main () {return 0;}
 } "-foffload=nvptx-none" ]
 }
+
+# Return 1 if the compiler has been configured with hsa offloading.
+
+proc check_effective_target_offload_hsa { } {
+return [check_no_compiler_messages offload_hsa assembly {
+   int main () {return 0;}
+} "-foffload=hsa" ]
+}
-- 
2.7.1

[hsa, testsuite] Suppress hsa warnings in compiler gomp tests

2016-02-26 Thread Martin Jambor

Hi,

compilation of (some) target constructs in the following testcases
fails and the warning the compiler gives out by default are then
reported as excess errors.  There are two options how to deal with
them.  Either we can change the gomp.exp files to pass -Wno-hsa as the
default set of options and then override it in the tests which we want
to be compiled also to HSAIL, or disable it explicitely only in the
tests where we know about the warnings.  Both approaches work.

I have selected the second option so that if we regress in the sense
that we fail to produce an HSAIL for a test for which we could do it
earlier, I can detect it and examine the issue.  The explicit -Wno-hsa
options are added by the patch below, which has been tested both with
and without HSA.  OK for trunk?

Thanks,

Martin

2016-02-12  Martin Jambor  

testsuite/
* c-c++-common/gomp/clauses-1.c: Add -Wno-hsa option.
* c-c++-common/gomp/if-1.c: Likewise.
* c-c++-common/gomp/nesting-1.c: Likewise.
* c-c++-common/gomp/nesting-warn-1.c: Likewise.
* c-c++-common/gomp/pr61486-2.c: Likewise.
* c-c++-common/gomp/target-teams-1.c: Likewise.
* g++.dg/gomp/target-teams-1.C: Likewise.
* gcc.dg/gomp/pr68128-2.c: Likewise.
* gfortran.dg/gomp/target1.f90: Likewise.
* gfortran.dg/gomp/target2.f90: Likewise.
* gfortran.dg/gomp/target3.f90: Likewise.
---
 gcc/testsuite/c-c++-common/gomp/clauses-1.c  | 2 +-
 gcc/testsuite/c-c++-common/gomp/if-1.c   | 2 +-
 gcc/testsuite/c-c++-common/gomp/nesting-1.c  | 2 ++
 gcc/testsuite/c-c++-common/gomp/nesting-warn-1.c | 2 ++
 gcc/testsuite/c-c++-common/gomp/pr61486-2.c  | 2 +-
 gcc/testsuite/c-c++-common/gomp/target-teams-1.c | 2 +-
 gcc/testsuite/g++.dg/gomp/target-teams-1.C   | 2 +-
 gcc/testsuite/gcc.dg/gomp/pr68128-2.c| 2 +-
 gcc/testsuite/gfortran.dg/gomp/target1.f90   | 2 +-
 gcc/testsuite/gfortran.dg/gomp/target2.f90   | 2 +-
 gcc/testsuite/gfortran.dg/gomp/target3.f90   | 2 +-
 11 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/gcc/testsuite/c-c++-common/gomp/clauses-1.c 
b/gcc/testsuite/c-c++-common/gomp/clauses-1.c
index 2d1c352..8eb514d 100644
--- a/gcc/testsuite/c-c++-common/gomp/clauses-1.c
+++ b/gcc/testsuite/c-c++-common/gomp/clauses-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-fopenmp" } */
+/* { dg-options "-fopenmp -Wno-hsa" } */
 /* { dg-additional-options "-std=c99" { target c } } */
 
 int t;
diff --git a/gcc/testsuite/c-c++-common/gomp/if-1.c 
b/gcc/testsuite/c-c++-common/gomp/if-1.c
index 4ba708c..3a2862f 100644
--- a/gcc/testsuite/c-c++-common/gomp/if-1.c
+++ b/gcc/testsuite/c-c++-common/gomp/if-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-fopenmp" } */
+/* { dg-options "-fopenmp -Wno-hsa" } */
 
 void
 foo (int a, int b, int *p, int *q)
diff --git a/gcc/testsuite/c-c++-common/gomp/nesting-1.c 
b/gcc/testsuite/c-c++-common/gomp/nesting-1.c
index 61b2f81..395f9d7 100644
--- a/gcc/testsuite/c-c++-common/gomp/nesting-1.c
+++ b/gcc/testsuite/c-c++-common/gomp/nesting-1.c
@@ -1,3 +1,5 @@
+/* { dg-options "-fopenmp -Wno-hsa" } */
+
 extern int i;
 
 void
diff --git a/gcc/testsuite/c-c++-common/gomp/nesting-warn-1.c 
b/gcc/testsuite/c-c++-common/gomp/nesting-warn-1.c
index 800ad53..30d037b 100644
--- a/gcc/testsuite/c-c++-common/gomp/nesting-warn-1.c
+++ b/gcc/testsuite/c-c++-common/gomp/nesting-warn-1.c
@@ -1,3 +1,5 @@
+/* { dg-options "-fopenmp -Wno-hsa" } */
+
 extern int i;
 
 void
diff --git a/gcc/testsuite/c-c++-common/gomp/pr61486-2.c 
b/gcc/testsuite/c-c++-common/gomp/pr61486-2.c
index db97143..009e224 100644
--- a/gcc/testsuite/c-c++-common/gomp/pr61486-2.c
+++ b/gcc/testsuite/c-c++-common/gomp/pr61486-2.c
@@ -1,6 +1,6 @@
 /* PR middle-end/61486 */
 /* { dg-do compile } */
-/* { dg-options "-fopenmp" } */
+/* { dg-options "-fopenmp -Wno-hsa" } */
 /* { dg-require-effective-target alloca } */
 
 #pragma omp declare target
diff --git a/gcc/testsuite/c-c++-common/gomp/target-teams-1.c 
b/gcc/testsuite/c-c++-common/gomp/target-teams-1.c
index 0a707c2..94c997f 100644
--- a/gcc/testsuite/c-c++-common/gomp/target-teams-1.c
+++ b/gcc/testsuite/c-c++-common/gomp/target-teams-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-fopenmp -fdump-tree-gimple" } */
+/* { dg-options "-fopenmp -fdump-tree-gimple -Wno-hsa" } */
 
 int v = 6;
 void bar (int);
diff --git a/gcc/testsuite/g++.dg/gomp/target-teams-1.C 
b/gcc/testsuite/g++.dg/gomp/target-teams-1.C
index 0a97de0..7e2ba39 100644
--- a/gcc/testsuite/g++.dg/gomp/target-teams-1.C
+++ b/gcc/testsuite/g++.dg/gomp/target-teams-1.C
@@ -1,5 +1,5 @@
 // { dg-do compile }
-// { dg-options "-fopenmp -fdump-tree-gimple" }
+// { dg-options "-fopenmp -fdump-tree-gimple -Wno-hsa" }
 
 int v = 6;
 void bar (int);
diff --git a/gcc/testsuite/gcc.dg/gomp/pr68128-2.c 
b/gcc/testsuite/gcc.dg/gomp/pr68128-2.c
index 58a07e9..6cdb9d4 100644
--- a/gcc/testsuite/gcc.dg/gomp/pr68

[hsa,testsuite] Introduce offload_device_shared_as effective target

2016-02-26 Thread Martin Jambor

Hello,

this patch has been written by Keith McDaniel when he was working at
AMD (and so it should be covered by their blanket copyright
assignment) and adds a libgomp testsuite effective target predicate
offload_device_shared_as that allows us to run tests only when target
constructs are run on a device with shared memory (this includes the
host).  Keith included a C++ test to illustrate the use.

I have tested this thoroughly, both with and without HSA (enabled or
present).  OK for trunk?

Thanks,

Martin

2016-02-10  Keith McDaniel 
Martin Jambor  

* testsuite/lib/libgomp.exp
(check_effective_target_offload_device_shared_as): New proc.
* testsuite/libgomp.c++/declare_target-1.C: New test.
---
 libgomp/testsuite/lib/libgomp.exp| 13 
 libgomp/testsuite/libgomp.c++/declare_target-1.C | 38 
 2 files changed, 51 insertions(+)
 create mode 100644 libgomp/testsuite/libgomp.c++/declare_target-1.C

diff --git a/libgomp/testsuite/lib/libgomp.exp 
b/libgomp/testsuite/lib/libgomp.exp
index a4c9d83..154a447 100644
--- a/libgomp/testsuite/lib/libgomp.exp
+++ b/libgomp/testsuite/lib/libgomp.exp
@@ -343,6 +343,19 @@ proc check_effective_target_offload_device_nonshared_as { 
} {
}
 } ]
 }
+  
+# Return 1 if offload device is available and it has shared address space.
+proc check_effective_target_offload_device_shared_as { } {
+return [check_runtime_nocache offload_device_shared_as {
+  int main ()
+   {
+ int x = 10;
+ #pragma omp target map(to: x)
+   x++;
+ return x == 10;
+   }
+} ]
+}
 
 # Return 1 if at least one nvidia board is present.
 
diff --git a/libgomp/testsuite/libgomp.c++/declare_target-1.C 
b/libgomp/testsuite/libgomp.c++/declare_target-1.C
new file mode 100644
index 000..4394bb1
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c++/declare_target-1.C
@@ -0,0 +1,38 @@
+// { dg-do run }
+// { dg-require-effective-target offload_device_shared_as }
+
+#include 
+
+struct typeX
+{
+  int a;
+};
+
+class typeY
+{
+public:
+  int foo () { return a^0x01; }
+  int a;
+};
+
+#pragma omp declare target
+struct typeX varX;
+class typeY varY;
+#pragma omp end declare target
+
+int main ()
+{
+  varX.a = 0;
+  varY.a = 0;
+
+  #pragma omp target
+{
+  varX.a = 100;
+  varY.a = 100;
+}
+
+  if (varX.a != 100 || varY.a != 100)
+abort ();
+
+  return 0;
+}
-- 
2.7.1

[hsa, testsuite] Suppress hsa warnings in libgomp tests

2016-02-26 Thread Martin Jambor

Hi,

just like with the compiler gomp testsuite, we need to add -Wno-hsa to
options when compiling libgomp testcases in order not to have "excess
errors" failures when HSA is enabled.  There are quite many of such
testcases on the trunk because I have disabled the dynamic parallelism
way of executing stuff.  Hopefully we'll be able to revert many of the
hunks below when we get that working.

The patch has been tested both with and without HSA enabled.  OK for
trunk?

Thanks,

Martin

2016-02-12  Martin Jambor  

* testsuite/libgomp.c++/examples-4/target_data-5.C: Do not generate
HSA warnings.
* testsuite/libgomp.c++/for-11.C: Likewise.
* testsuite/libgomp.c++/for-13.C: Likewise.
* testsuite/libgomp.c++/for-14.C: Likewise.
* testsuite/libgomp.c++/pr66199-2.C: Likewise.
* testsuite/libgomp.c++/pr66199-4.C: Likewise.
* testsuite/libgomp.c++/pr66199-5.C: Likewise.
* testsuite/libgomp.c++/pr66199-6.C: Likewise.
* testsuite/libgomp.c++/pr66199-7.C: Likewise.
* testsuite/libgomp.c++/pr66199-8.C: Likewise.
* testsuite/libgomp.c++/target-1.C: Likewise.
* testsuite/libgomp.c++/target-2.C: Likewise.
* testsuite/libgomp.c++/target-3.C: Likewise.
* testsuite/libgomp.c++/target-8.C: Likewise.
* testsuite/libgomp.c/examples-4/async_target-1.c: Likewise.
* testsuite/libgomp.c/examples-4/declare_target-3.c: Likewise.
* testsuite/libgomp.c/examples-4/declare_target-4.c: Likewise.
* testsuite/libgomp.c/examples-4/declare_target-5.c: Likewise.
* testsuite/libgomp.c/examples-4/target-1.c: Likewise.
* testsuite/libgomp.c/examples-4/target-2.c: Likewise.
* testsuite/libgomp.c/examples-4/target-3.c: Likewise.
* testsuite/libgomp.c/examples-4/target-4.c: Likewise.
* testsuite/libgomp.c/examples-4/target_data-1.c: Likewise.
* testsuite/libgomp.c/examples-4/target_data-2.c: Likewise.
* testsuite/libgomp.c/examples-4/target_data-3.c: Likewise.
* testsuite/libgomp.c/examples-4/target_data-4.c: Likewise.
* testsuite/libgomp.c/examples-4/target_update-1.c: Likewise.
* testsuite/libgomp.c/examples-4/target_update-2.c: Likewise.
* testsuite/libgomp.c/examples-4/teams-2.c: Likewise.
* testsuite/libgomp.c/examples-4/teams-3.c: Likewise.
* testsuite/libgomp.c/examples-4/teams-4.c: Likewise.
* testsuite/libgomp.c/examples-4/teams-6.c: Likewise.
* testsuite/libgomp.c/for-3.c: Likewise.
* testsuite/libgomp.c/for-5.c: Likewise.
* testsuite/libgomp.c/for-6.c: Likewise.
* testsuite/libgomp.c/pr66199-2.c: Likewise.
* testsuite/libgomp.c/pr66199-4.c: Likewise.
* testsuite/libgomp.c/pr66199-5.c: Likewise.
* testsuite/libgomp.c/pr66199-6.c: Likewise.
* testsuite/libgomp.c/pr66199-7.c: Likewise.
* testsuite/libgomp.c/pr66199-8.c: Likewise.
* testsuite/libgomp.c/pr66714.c: Likewise.
* testsuite/libgomp.c/target-1.c: Likewise.
* testsuite/libgomp.c/target-16.c: Likewise.
* testsuite/libgomp.c/target-2.c: Likewise.
* testsuite/libgomp.c/target-31.c: Likewise.
* testsuite/libgomp.c/target-32.c: Likewise.
* testsuite/libgomp.c/target-35.c: Likewise.
* testsuite/libgomp.c/target-5.c: Likewise.
* testsuite/libgomp.c/target-6.c: Likewise.
* testsuite/libgomp.c/target-critical-1.c: Likewise.
* testsuite/libgomp.c/target-teams-1.c: Likewise.
* testsuite/libgomp.c/thread-limit-2.c: Likewise.
* testsuite/libgomp.c/thread-limit-3.c: Likewise.
* testsuite/libgomp.fortran/examples-4/async_target-1.f90: Likewise.
* testsuite/libgomp.fortran/examples-4/async_target-2.f90: Likewise.
* testsuite/libgomp.fortran/examples-4/declare_target-2.f90: Likewise.
* testsuite/libgomp.fortran/examples-4/declare_target-3.f90: Likewise.
* testsuite/libgomp.fortran/examples-4/declare_target-4.f90: Likewise.
* testsuite/libgomp.fortran/examples-4/declare_target-5.f90: Likewise.
* testsuite/libgomp.fortran/examples-4/target-1.f90: Likewise.
* testsuite/libgomp.fortran/examples-4/target-2.f90: Likewise.
* testsuite/libgomp.fortran/examples-4/target-3.f90: Likewise.
* testsuite/libgomp.fortran/examples-4/target-4.f90: Likewise.
* testsuite/libgomp.fortran/examples-4/target_data-1.f90: Likewise.
* testsuite/libgomp.fortran/examples-4/target_data-2.f90: Likewise.
* testsuite/libgomp.fortran/examples-4/target_data-3.f90: Likewise.
* testsuite/libgomp.fortran/examples-4/target_data-4.f90: Likewise.
* testsuite/libgomp.fortran/examples-4/target_data-5.f90: Likewise.
* testsuite/libgomp.fortran/examples-4/target_update-1.f90: Likewise.
* testsuite/libgomp.fortran/examples-4/target_update-2.f90: Likewise.
* testsuite/libgom

[hsa,testsuite] Adjust libgomp tests that do not work on host fallback

2016-02-26 Thread Martin Jambor

Hi,

this patch avoids run-time failures in libgomp testsuite that
curtrently happen when HSA offloading is actually used.  All of them
currently require the offload_device effective target which the patch
changes to offload_device_nonshared_as one.

For some tests, such as libgomp.c/examples-4/device-1.c this is really
the correct thing to do because the test explicitely checks that
changes that happen in a target construct and are not mapped back are
not observable on the host.

However, the majority of the tests has a different problem.  The test
for some reason is not compiled into HSAIL (usually because it would
require the dynamic parallelism path which is disabled or because it
calls abort from within target which HSA so far cannot handle) and so
the host fallback is called, even though the test actually is not
supposed to be called on it.  The tests then call
omp_is_initial_device to verify they are not running on the host and
decide to fail.

Changing the effective target only to devices with non-shared memory
probably isn't the really correct fix.  We basically want to disable
the host fallback for them regardeless of address spaces but I cannot
think of a simple and generic way of doing that.  However, all
testcases for non-shared memory devices were written with disallowed
fallback in mind and so this soulution also gives the desired result.

Tested both with and without HSA (enabled or present).  OK for trunk?

Thanks,

Martin

2016-02-12  Martin Jambor  

libgomp/
* testsuite/libgomp.c/examples-4/async_target-2.c: Only run on
non-shared memory accelerators.
* testsuite/libgomp.c/examples-4/device-1.c: Likewise.
* testsuite/libgomp.c/examples-4/target-5.c: Likewise.
* testsuite/libgomp.c/examples-4/target_data-6.c: Likewise.
* testsuite/libgomp.c/examples-4/target_data-7.c: Likewise.
* testsuite/libgomp.fortran/examples-4/async_target-2.f90: Likewise.
* testsuite/libgomp.fortran/examples-4/device-1.f90: Likewise.
* testsuite/libgomp.fortran/examples-4/target-5.f90: Likewise.
* testsuite/libgomp.fortran/examples-4/target_data-6.f90: Likewise.
* testsuite/libgomp.fortran/examples-4/target_data-7.f90: Likewise.
---
 libgomp/testsuite/libgomp.c/examples-4/async_target-2.c | 2 +-
 libgomp/testsuite/libgomp.c/examples-4/device-1.c   | 2 +-
 libgomp/testsuite/libgomp.c/examples-4/target-5.c   | 2 +-
 libgomp/testsuite/libgomp.c/examples-4/target_data-6.c  | 2 +-
 libgomp/testsuite/libgomp.c/examples-4/target_data-7.c  | 2 +-
 libgomp/testsuite/libgomp.fortran/examples-4/async_target-2.f90 | 2 +-
 libgomp/testsuite/libgomp.fortran/examples-4/device-1.f90   | 2 +-
 libgomp/testsuite/libgomp.fortran/examples-4/target-5.f90   | 2 +-
 libgomp/testsuite/libgomp.fortran/examples-4/target_data-6.f90  | 2 +-
 libgomp/testsuite/libgomp.fortran/examples-4/target_data-7.f90  | 2 +-
 10 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/libgomp/testsuite/libgomp.c/examples-4/async_target-2.c 
b/libgomp/testsuite/libgomp.c/examples-4/async_target-2.c
index ce63328..0c76f8e 100644
--- a/libgomp/testsuite/libgomp.c/examples-4/async_target-2.c
+++ b/libgomp/testsuite/libgomp.c/examples-4/async_target-2.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-require-effective-target offload_device } */
+/* { dg-require-effective-target offload_device_nonshared_as } */
 
 #include 
 #include 
diff --git a/libgomp/testsuite/libgomp.c/examples-4/device-1.c 
b/libgomp/testsuite/libgomp.c/examples-4/device-1.c
index dad8572..46aa160 100644
--- a/libgomp/testsuite/libgomp.c/examples-4/device-1.c
+++ b/libgomp/testsuite/libgomp.c/examples-4/device-1.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-require-effective-target offload_device } */
+/* { dg-require-effective-target offload_device_nonshared_as } */
 
 #include 
 #include 
diff --git a/libgomp/testsuite/libgomp.c/examples-4/target-5.c 
b/libgomp/testsuite/libgomp.c/examples-4/target-5.c
index 1853fba..1c14bae 100644
--- a/libgomp/testsuite/libgomp.c/examples-4/target-5.c
+++ b/libgomp/testsuite/libgomp.c/examples-4/target-5.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-require-effective-target offload_device } */
+/* { dg-require-effective-target offload_device_nonshared_as } */
 
 #include 
 #include 
diff --git a/libgomp/testsuite/libgomp.c/examples-4/target_data-6.c 
b/libgomp/testsuite/libgomp.c/examples-4/target_data-6.c
index affeb49..57c7c0c 100644
--- a/libgomp/testsuite/libgomp.c/examples-4/target_data-6.c
+++ b/libgomp/testsuite/libgomp.c/examples-4/target_data-6.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-require-effective-target offload_device } */
+/* { dg-require-effective-target offload_device_nonshared_as } */
 
 #include 
 #include 
diff --git a/libgomp/testsuite/libgomp.c/examples-4/target_data-7.c 
b/libgomp/testsuite/libgomp.c/examples-4/target_data-7.c
index c18d480..8ec41ea 100644
--- a/libgomp/

[hsa,testsuite] New directory for HSA-specific C testcases

2016-02-26 Thread Martin Jambor

Hi,

we would like a place to have some HSA-specific tests, which would
only run not only when HSA is enabled at configuration time but also
when HSA hardware is present and used for offloading.

The only way to detect that situation I could think of is to run a
simple kernel with environment variable HSA_DEBUG set and look HSA
libgomp plugin debug output in the stderr stream.  So I started with a
copy of proc check_runtime_nocache and repurposed it for this task and
then guarded any execution of tests by that predicate in the new c.exp
file.  The new directory is in the libgomp testsuite because (at least
now) there is no other way of running hsa stuff but through libgomp.
All tests there are run twice, at -O0 and -O2, because we have found
that both levels are very useful for detecting errors in the lowering
to HSA.

I have very little experience with tcl, expect or DejaGNU and would
appreciate very much any feedback or guidance of anyone more
experience in these areas.  In particular, it would probably be better
to structure the new predicate as an effective-target one but I am not
sure what amount of caching would I have to do manually for such a new
runtime test.

After I incorporate all feedback, I would of course like to commit
this to trunk.  Needless to say, the patch has been tested and works.

Thanks,

Martin

2016-02-22  Martin Jambor  
Martin Liska 

* testsuite/lib/libgomp.exp (check_hsa_offloading_available): New.
* testsuite/libgomp.hsa.c/c.exp: Likewise.
* testsuite/libgomp.hsa.c/alloca-1.c: Likewise.
* testsuite/libgomp.hsa.c/bitfield-1.c: Likewise.
* testsuite/libgomp.hsa.c/builtins-1.c: Likewise.
* testsuite/libgomp.hsa.c/complex-1.c: Likewise.
* testsuite/libgomp.hsa.c/formal-actual-args-1.c: Likewise.
* testsuite/libgomp.hsa.c/function-call-1.c: Likewise.
* testsuite/libgomp.hsa.c/get-level-1.c: Likewise.
* testsuite/libgomp.hsa.c/gridify-1.c: Likewise.
* testsuite/libgomp.hsa.c/gridify-2.c: Likewise.
* testsuite/libgomp.hsa.c/gridify-3.c: Likewise.
* testsuite/libgomp.hsa.c/gridify-4.c: Likewise.
* testsuite/libgomp.hsa.c/memory-operations-1.c: Likewise.
* testsuite/libgomp.hsa.c/pr69568.c: Likewise.
* testsuite/libgomp.hsa.c/rotate-1.c: Likewise.
* testsuite/libgomp.hsa.c/switch-1.c: Likewise.
* testsuite/libgomp.hsa.c/switch-branch-1.c: Likewise.
---
 libgomp/testsuite/lib/libgomp.exp  |  44 ++
 libgomp/testsuite/libgomp.hsa.c/alloca-1.c |  25 
 libgomp/testsuite/libgomp.hsa.c/bitfield-1.c   | 160 +
 libgomp/testsuite/libgomp.hsa.c/builtins-1.c   |  97 +
 libgomp/testsuite/libgomp.hsa.c/c.exp  |  41 ++
 libgomp/testsuite/libgomp.hsa.c/complex-1.c|  65 +
 .../testsuite/libgomp.hsa.c/formal-actual-args-1.c |  83 +++
 libgomp/testsuite/libgomp.hsa.c/function-call-1.c  |  50 +++
 libgomp/testsuite/libgomp.hsa.c/get-level-1.c  |  26 
 libgomp/testsuite/libgomp.hsa.c/gridify-1.c|  26 
 libgomp/testsuite/libgomp.hsa.c/gridify-2.c|  26 
 libgomp/testsuite/libgomp.hsa.c/gridify-3.c|  39 +
 libgomp/testsuite/libgomp.hsa.c/gridify-4.c|  45 ++
 .../testsuite/libgomp.hsa.c/memory-operations-1.c  |  92 
 libgomp/testsuite/libgomp.hsa.c/pr69568.c  |  41 ++
 libgomp/testsuite/libgomp.hsa.c/rotate-1.c |  39 +
 libgomp/testsuite/libgomp.hsa.c/switch-1.c | 145 +++
 libgomp/testsuite/libgomp.hsa.c/switch-branch-1.c  | 116 +++
 18 files changed, 1160 insertions(+)
 create mode 100644 libgomp/testsuite/libgomp.hsa.c/alloca-1.c
 create mode 100644 libgomp/testsuite/libgomp.hsa.c/bitfield-1.c
 create mode 100644 libgomp/testsuite/libgomp.hsa.c/builtins-1.c
 create mode 100644 libgomp/testsuite/libgomp.hsa.c/c.exp
 create mode 100644 libgomp/testsuite/libgomp.hsa.c/complex-1.c
 create mode 100644 libgomp/testsuite/libgomp.hsa.c/formal-actual-args-1.c
 create mode 100644 libgomp/testsuite/libgomp.hsa.c/function-call-1.c
 create mode 100644 libgomp/testsuite/libgomp.hsa.c/get-level-1.c
 create mode 100644 libgomp/testsuite/libgomp.hsa.c/gridify-1.c
 create mode 100644 libgomp/testsuite/libgomp.hsa.c/gridify-2.c
 create mode 100644 libgomp/testsuite/libgomp.hsa.c/gridify-3.c
 create mode 100644 libgomp/testsuite/libgomp.hsa.c/gridify-4.c
 create mode 100644 libgomp/testsuite/libgomp.hsa.c/memory-operations-1.c
 create mode 100644 libgomp/testsuite/libgomp.hsa.c/pr69568.c
 create mode 100644 libgomp/testsuite/libgomp.hsa.c/rotate-1.c
 create mode 100644 libgomp/testsuite/libgomp.hsa.c/switch-1.c
 create mode 100644 libgomp/testsuite/libgomp.hsa.c/switch-branch-1.c

diff --git a/libgomp/testsuite/lib/libgomp.exp 
b/libgomp/testsuite/lib/libgomp.exp
index 154a447..1c917c8 100644
--- a/libgomp/testsuite/lib/libgomp.exp
+

[PATCH] Fix powerpc shift/rotate/mask insn handling (PR target/69946)

2016-02-26 Thread Jakub Jelinek

Hi!

Segher has added last year a few routines for the shift/rotate + mask
patterns, insns always have one predicate which tests if PowerPC supports
such pattern, and another that emits the instruction for it.

The testcase in the patch is miscompiled, we end up with an instruction
with out of bound shift count, because there is disagreement between the
analysis phase, which does some changes (changes shifts by 0
into rotate and some rotates into left or right shifts (for the last one
with changed shift count)), but those changes are only done virtually,
the predicate can't change the instruction, it either accepts it or rejects
it.  Then during output, we don't perform those changes, treat it as
rotate or left or right shift just based on what the actual IL has.
In some cases it is harmless, in other cases, as the testcase shows, it is
harmful.

Also, I've noticed that the preparation phase of both
rs6000_is_valid_shift_mask and rs6000_is_valid_insert_mask is pretty much
the same (the only difference is that the latter requires the shift/rotate
count to be always CONST_INT, while the former allows also a REG in there).

So, what the following patch does is that it moves the preparation
from the predicate functions to a new static inline helper function,
and then uses that function in both the predicate functions, and also
in both the output functions.

Bootstrapped/regtested on powerpc64{,le}-linux, ok for trunk?

2016-02-26  Jakub Jelinek  

PR target/69946
* config/rs6000/rs6000.c (rs6000_is_valid_shift_mask_1): New
function.
(rs6000_is_valid_shift_mask, rs6000_is_valid_insert_mask): Use it.
(rs6000_insn_for_shift_mask, rs6000_insn_for_insert_mask): Likewise.
Adjust for possible changes of shift/rotate CODE and shift count SH.

* gcc.dg/pr69946.c: New test.

--- gcc/config/rs6000/rs6000.c.jj   2016-02-24 22:33:32.0 +0100
+++ gcc/config/rs6000/rs6000.c  2016-02-26 12:05:39.489021521 +0100
@@ -17319,42 +17319,60 @@ rs6000_insn_for_and_mask (machine_mode m
   gcc_unreachable ();
 }
 
-/* Return whether MASK (a CONST_INT) is a valid mask for any rlw[i]nm,
-   rld[i]cl, rld[i]cr, or rld[i]c instruction, to implement an AND with
-   shift SHIFT (a ROTATE, ASHIFT, or LSHIFTRT) in mode MODE.  */
+/* Helper for rs6000_is_valid_shift_mask, rs6000_insn_for_shift_mask,
+   rs6000_is_valid_insert_mask and rs6000_insn_for_insert_mask.  */
 
-bool
-rs6000_is_valid_shift_mask (rtx mask, rtx shift, machine_mode mode)
+static inline bool
+rs6000_is_valid_shift_mask_1 (rtx mask, rtx shift, machine_mode mode,
+ int *nb, int *ne, int *n, int *sh,
+ rtx_code *code)
 {
-  int nb, ne;
-
-  if (!rs6000_is_valid_mask (mask, &nb, &ne, mode))
+  if (!rs6000_is_valid_mask (mask, nb, ne, mode))
 return false;
 
-  int n = GET_MODE_PRECISION (mode);
-  int sh = -1;
+  *n = GET_MODE_PRECISION (mode);
+  *sh = -1;
 
   if (CONST_INT_P (XEXP (shift, 1)))
 {
-  sh = INTVAL (XEXP (shift, 1));
-  if (sh < 0 || sh >= n)
+  *sh = INTVAL (XEXP (shift, 1));
+  if (*sh < 0 || *sh >= *n)
return false;
 }
 
-  rtx_code code = GET_CODE (shift);
+  *code = GET_CODE (shift);
 
   /* Convert any shift by 0 to a rotate, to simplify below code.  */
-  if (sh == 0)
-code = ROTATE;
+  if (*sh == 0)
+*code = ROTATE;
 
   /* Convert rotate to simple shift if we can, to make analysis simpler.  */
-  if (code == ROTATE && sh >= 0 && nb >= ne && ne >= sh)
-code = ASHIFT;
-  if (code == ROTATE && sh >= 0 && nb >= ne && nb < sh)
+  if (*code == ROTATE && *sh >= 0 && *nb >= *ne)
 {
-  code = LSHIFTRT;
-  sh = n - sh;
+  if (*ne >= *sh)
+   *code = ASHIFT;
+  else if (*nb < *sh)
+   {
+ *code = LSHIFTRT;
+ *sh = *n - *sh;
+   }
 }
+  return true;
+}
+
+/* Return whether MASK (a CONST_INT) is a valid mask for any rlw[i]nm,
+   rld[i]cl, rld[i]cr, or rld[i]c instruction, to implement an AND with
+   shift SHIFT (a ROTATE, ASHIFT, or LSHIFTRT) in mode MODE.  */
+
+bool
+rs6000_is_valid_shift_mask (rtx mask, rtx shift, machine_mode mode)
+{
+  int nb, ne, n, sh;
+  rtx_code code;
+
+  if (!rs6000_is_valid_shift_mask_1 (mask, shift, mode, &nb, &ne,
+&n, &sh, &code))
+return false;
 
   /* DImode rotates need rld*.  */
   if (mode == DImode && code == ROTATE)
@@ -17398,15 +17416,20 @@ rs6000_is_valid_shift_mask (rtx mask, rt
 const char *
 rs6000_insn_for_shift_mask (machine_mode mode, rtx *operands, bool dot)
 {
-  int nb, ne;
+  int nb, ne, n, sh;
+  rtx_code code;
 
-  if (!rs6000_is_valid_mask (operands[3], &nb, &ne, mode))
+  if (!rs6000_is_valid_shift_mask_1 (operands[3], operands[4], mode, &nb, &ne,
+&n, &sh, &code))
 gcc_unreachable ();
 
+  if (sh != -1)
+operands[2] = GEN_INT (sh);
+
   if (mode == DImode && ne == 0)
 {
-  if (GET_CODE (operands[4])

Re: [omp, hsa] Do not gridify simd constructs

2016-02-26 Thread Jakub Jelinek

On Fri, Feb 26, 2016 at 04:57:19PM +0100, Martin Jambor wrote:
> unfortunately, I have missed two execution failures when using HSA
> (teams-6.f90 and target1.f90), both of which are caused by not
> handling the simd part of combined target teams distribute parallel
> for simd construct.  I have not really thought through how exactly
> should GPUs teat the simd construct in a combined construct but of
> course we must not miscompile.
> 
> So I'd like to commit the following patch which just disallows
> gridification of such cases to trunk now and am working on a fix that
> removes the simd loop for the hsa branch.  Eventually we might also
> want to hsa-vectorize the body even though the explicit loop is
> missing, but that will also mean that the HSA grid size has to shrink
> appropriately.
> 
> Bootstrapped and tested on x86_64-linux, with and without HSA
> enabled.  OK for trunk?
> 
> Thanks,
> 
> Martin
> 
> 2016-02-17  Martin Jambor  
> 
> * omp-low.c (grid_find_ungridifiable_statement): Store problematic
> statements to wi->info.  Also disallow omp simd constructs.
> (grid_target_follows_gridifiable_pattern): Use wi.info to dump reason
> for not gridifying.  Dump special string for omp_for.

Ok (though hope that for GCC 7 you actually handle it better).

Jakub

Re: [hsa, testsuite] Suppress hsa warnings in libgomp tests

2016-02-26 Thread Jakub Jelinek

On Fri, Feb 26, 2016 at 04:59:57PM +0100, Martin Jambor wrote:
> just like with the compiler gomp testsuite, we need to add -Wno-hsa to
> options when compiling libgomp testcases in order not to have "excess
> errors" failures when HSA is enabled.  There are quite many of such
> testcases on the trunk because I have disabled the dynamic parallelism
> way of executing stuff.  Hopefully we'll be able to revert many of the
> hunks below when we get that working.
> 
> The patch has been tested both with and without HSA enabled.  OK for
> trunk?

I don't like this very much.
Couldn't you instead add -Wno-hsa next to -fopenmp in *.exp, and just where
you want to explicitly check the hsa warnings, enable it manually in
dg-options or dg-additional-options (it would need to be guarded with hsa
being enabled etc. anyway).

Jakub

Re: [hsa,testsuite] Introduce offload_device_shared_as effective target

2016-02-26 Thread Jakub Jelinek

On Fri, Feb 26, 2016 at 04:59:42PM +0100, Martin Jambor wrote:
> this patch has been written by Keith McDaniel when he was working at
> AMD (and so it should be covered by their blanket copyright
> assignment) and adds a libgomp testsuite effective target predicate
> offload_device_shared_as that allows us to run tests only when target
> constructs are run on a device with shared memory (this includes the
> host).  Keith included a C++ test to illustrate the use.
> 
> I have tested this thoroughly, both with and without HSA (enabled or
> present).  OK for trunk?
> 
> Thanks,
> 
> Martin
> 
> 2016-02-10  Keith McDaniel 
>   Martin Jambor  
> 
>   * testsuite/lib/libgomp.exp
>   (check_effective_target_offload_device_shared_as): New proc.
>   * testsuite/libgomp.c++/declare_target-1.C: New test.

Ok.

Jakub

Re: [PATCH 1/3] Add aarch64--rtems target

2016-02-26 Thread James Greenhalgh

On Thu, Feb 25, 2016 at 02:49:08PM -0600, Joel Sherrill wrote:
>   * gcc/config.gcc, libgcc/config.host: Add aarch64-*-rtems*.
>   * gcc/config/aarch64/rtems.h: New file.

OK.

Thanks,
James

> ---
>  gcc/config.gcc | 11 +--
>  gcc/config/aarch64/rtems.h | 28 
>  libgcc/config.host |  2 +-
>  3 files changed, 38 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/config/aarch64/rtems.h

Re: [PATCH 3/3] contrib/config-list.mk: Add aarch64-rtems and x86_64-rtems

2016-02-26 Thread James Greenhalgh

On Thu, Feb 25, 2016 at 02:49:10PM -0600, Joel Sherrill wrote:
>   * contrib/config-list.mk: Add aarch64-rtems and x86_64-rtems

The AArch64 part of this is OK.

Thanks,
James

> ---
>  contrib/config-list.mk | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)

[PR 69920] Prevent SRA from leaving a removed SSA_NAME in IL

2016-02-26 Thread Martin Jambor

Hi,

my fix for PR 69666 has caused quite a few regressions accross the
borad where SRA removed a SSA_NAME which however still was in the IL
(and usually stumbled upon it itself straight away).

The removal path should not be executed when there is an SSA_NAME on
the LHS, the code clearly is not ready for it.  Before my patch, we
got always lucky because the statement was simply modified elsewhere
when the LHS was an SSA_NAME.  However, even that was not 10%
guaranteed because of the !access_has_replacements_p (racc) part of
the changed condition.

The patch below fixes the ICEs simply by guarding the removal code to
only work when the LHS is not an SSA_NAME.  This means that the safe
path below it is going to execute.

I have bootstrapped and tested the patch on x86_64-linux.  I'd like to
commit it to trunk as soon as it gets approved and then I'd like to
commit it to gcc-5 branch together with the PR 69666 fix a few days
afterwards.  OK?

Thanks,

Martin


2016-02-26  Martin Jambor  

PR middle-end/69920
* tree-sra.c (sra_modify_assign): Do not remove loads of
uninitialized aggregates to SSA_NAMEs.

testsuite/
* gcc.dg/torture/pr69932.c: New test.
* gcc.dg/torture/pr69936.c: Likewise.
---
 gcc/testsuite/gcc.dg/torture/pr69932.c | 10 ++
 gcc/testsuite/gcc.dg/torture/pr69936.c | 24 
 gcc/tree-sra.c |  3 ++-
 3 files changed, 36 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr69932.c
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr69936.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr69932.c 
b/gcc/testsuite/gcc.dg/torture/pr69932.c
new file mode 100644
index 000..4b82130
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr69932.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+
+int a;
+void fn1() {
+  int b = 4;
+  short c[4];
+  c[b] = c[a];
+  if (c[2]) {}
+
+}
diff --git a/gcc/testsuite/gcc.dg/torture/pr69936.c 
b/gcc/testsuite/gcc.dg/torture/pr69936.c
new file mode 100644
index 000..3023bbb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr69936.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+
+int a;
+char b;
+void fn1(int p1) {}
+
+int fn2() { return 5; }
+
+void fn3() {
+  if (fn2())
+;
+  else {
+char c[5];
+c[0] = 5;
+  lbl_608:
+fn1(c[9]);
+int d = c[9];
+c[2] | a;
+d = c[b];
+  }
+  goto lbl_608;
+}
+
+int main() { return 0; }
diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index 663ded2..366f413 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -3504,7 +3504,8 @@ sra_modify_assign (gimple *stmt, gimple_stmt_iterator 
*gsi)
   else
{
  if (access_has_children_p (racc)
- && !racc->grp_unscalarized_data)
+ && !racc->grp_unscalarized_data
+ && TREE_CODE (lhs) != SSA_NAME)
{
  if (dump_file)
{
-- 
2.7.1

Re: [PATCH] Fix powerpc shift/rotate/mask insn handling (PR target/69946)

2016-02-26 Thread David Edelsohn

On Fri, Feb 26, 2016 at 11:02 AM, Jakub Jelinek  wrote:
> Hi!
>
> Segher has added last year a few routines for the shift/rotate + mask
> patterns, insns always have one predicate which tests if PowerPC supports
> such pattern, and another that emits the instruction for it.
>
> The testcase in the patch is miscompiled, we end up with an instruction
> with out of bound shift count, because there is disagreement between the
> analysis phase, which does some changes (changes shifts by 0
> into rotate and some rotates into left or right shifts (for the last one
> with changed shift count)), but those changes are only done virtually,
> the predicate can't change the instruction, it either accepts it or rejects
> it.  Then during output, we don't perform those changes, treat it as
> rotate or left or right shift just based on what the actual IL has.
> In some cases it is harmless, in other cases, as the testcase shows, it is
> harmful.
>
> Also, I've noticed that the preparation phase of both
> rs6000_is_valid_shift_mask and rs6000_is_valid_insert_mask is pretty much
> the same (the only difference is that the latter requires the shift/rotate
> count to be always CONST_INT, while the former allows also a REG in there).
>
> So, what the following patch does is that it moves the preparation
> from the predicate functions to a new static inline helper function,
> and then uses that function in both the predicate functions, and also
> in both the output functions.
>
> Bootstrapped/regtested on powerpc64{,le}-linux, ok for trunk?
>
> 2016-02-26  Jakub Jelinek  
>
> PR target/69946
> * config/rs6000/rs6000.c (rs6000_is_valid_shift_mask_1): New
> function.
> (rs6000_is_valid_shift_mask, rs6000_is_valid_insert_mask): Use it.
> (rs6000_insn_for_shift_mask, rs6000_insn_for_insert_mask): Likewise.
> Adjust for possible changes of shift/rotate CODE and shift count SH.
>
> * gcc.dg/pr69946.c: New test.

Segher's patch from Wednesday should fix this.

We can address the duplication separately in GCC 7.

Thanks, David

Re: (Non-)offloading diagnostics

2016-02-26 Thread Thomas Schwinge

Hi!

In light of the -Whsa testsuite patches just posted, I think we first
need to clarify the general policy questions I posted a month ago:

On Tue, 26 Jan 2016 11:46:14 +0100, I wrote:
> On Thu, 10 Dec 2015 18:51:48 +0100, Martin Jambor  wrote:
> > On Mon, Dec 07, 2015 at 12:46:45PM +0100, Jakub Jelinek wrote:
> > > On Mon, Dec 07, 2015 at 12:17:58PM +0100, Martin Jambor wrote:
> > > > [...]  There are no failing
> > > > testcases if HSA is not configured.  If it is, there are some, all of
> > > > which fall into one the following categories:
> > > > 
> > > >   1) HSA cannot compile a function for one reason or another (most
> > > >  common cause is inability of HSA to take an address of a function
> > > >  or make an indirect call) and gives a warning, which is regarded
> > > >  as an "excess error" by dejagnu.
> 
> Confirmed:
> 
> [...]/gcc/testsuite/c-c++-common/gomp/clauses-1.c: In function 
> 'bar._omp_fn.26.hsa.31':
> cc1: warning: could not emit HSAIL for the function [-Whsa]
> cc1: note: support for HSA does not implement non-gridified OpenMP 
> parallel constructs.
> [...]
> 
> ..., and many more.  So, with --enable-offload-targets=[...],hsa we
> regress (PASS -> FAIL; "test for excess errors") such compile tests.
> 
> > > It would be good if there is a -W* switch to turn such warnings off.
> > > Not just for the purposes of dejagnu libgomp testing, but say one
> > > might try to compile a program primarily say for XeonPhi or PTX 
> > > offloading,
> > > but have HSA enabled to, but care primarily about the former two, etc.
> > 
> > All these warnings are in the -Whsa group and can be suppressed with
> > -Wno-hsa.
> 
> These compile tests are done without any -W* flags; -Whsa is enabled by
> default.

I'm a proponent of enabling as many useful warnings by default, or if not
by default, then with -Wall.  -Whsa is enabled by default, and has thus
set a precedent of doing that.

> How to address this mismatch?  Put -Wno-has into all regressing
> test case files individually?  Run the affected testsuites with -Wno-hsa?
> Not enable -Whsa by default (but I agree it's useful to users)?
> (Instead, enable with -Wall, which any sane user should be specifying?)

Even if a bit tedious, my preference actually is to add to the test cases
an (expected) dg-warning everywhere where such a non-offloading warning
currently triggers, because that's what users will be seeing (with -Whsa
enabled by default), and because that will make it obvious (PASS -> FAIL
for the warning check) when that warning disappears (say, because the
compiler can now offload the respective construct, yay).

> A very similar problem also exists for nvptx offloading (Nathan CCed),
> where we emit similar warnings (enabled by default).  As nvptx offloading
> happens during link-time (not compile-time, as with hsa offloading),
> these don't affect GCC's compile tests, but need to be worked around in
> libgomp test cases.


Grüße
 Thomas

Re: (Non-)offloading diagnostics

2016-02-26 Thread Martin Jambor

Hi,

On Fri, Feb 26, 2016 at 05:46:33PM +0100, Thomas Schwinge wrote:
> Hi!
> 
> In light of the -Whsa testsuite patches just posted, I think we first
> need to clarify the general policy questions I posted a month ago:
> 
> On Tue, 26 Jan 2016 11:46:14 +0100, I wrote:
> > On Thu, 10 Dec 2015 18:51:48 +0100, Martin Jambor  wrote:
> > > On Mon, Dec 07, 2015 at 12:46:45PM +0100, Jakub Jelinek wrote:
> > > > On Mon, Dec 07, 2015 at 12:17:58PM +0100, Martin Jambor wrote:
> > > > > [...]  There are no failing
> > > > > testcases if HSA is not configured.  If it is, there are some, all of
> > > > > which fall into one the following categories:
> > > > > 
> > > > >   1) HSA cannot compile a function for one reason or another (most
> > > > >  common cause is inability of HSA to take an address of a function
> > > > >  or make an indirect call) and gives a warning, which is regarded
> > > > >  as an "excess error" by dejagnu.
> > 
> > Confirmed:
> > 
> > [...]/gcc/testsuite/c-c++-common/gomp/clauses-1.c: In function 
> > 'bar._omp_fn.26.hsa.31':
> > cc1: warning: could not emit HSAIL for the function [-Whsa]
> > cc1: note: support for HSA does not implement non-gridified OpenMP 
> > parallel constructs.
> > [...]
> > 
> > ..., and many more.  So, with --enable-offload-targets=[...],hsa we
> > regress (PASS -> FAIL; "test for excess errors") such compile tests.
> > 
> > > > It would be good if there is a -W* switch to turn such warnings off.
> > > > Not just for the purposes of dejagnu libgomp testing, but say one
> > > > might try to compile a program primarily say for XeonPhi or PTX 
> > > > offloading,
> > > > but have HSA enabled to, but care primarily about the former two, etc.
> > > 
> > > All these warnings are in the -Whsa group and can be suppressed with
> > > -Wno-hsa.
> > 
> > These compile tests are done without any -W* flags; -Whsa is enabled by
> > default.
> 
> I'm a proponent of enabling as many useful warnings by default, or if not
> by default, then with -Wall.  -Whsa is enabled by default, and has thus
> set a precedent of doing that.

I am not sure I'd go as far as "as many as possible," but in the case
of -Whsa, the warnings get emitted only if HSA offloading is
configured and especially only if the user used OMP and its target
construct.  This means that it is relevant only for a rather small
class of users and it's not a "your code looks weird" kind of warning
but a "the compiler is not doing what you clearly asked for" warning.
So that is why we decided to warn unconditionally.

But as far as I understand, gcc does not give any promises about
warnings, so I believe decisions like a defaultness of a warning can
be revisited at any point in the future, for example if people learn
not to expect some constructs to be offloaded to GPUs.  Moreover, the
conventions regarding offloading are still being settled and still
will for quite some time so nobody should really expect such details
to be set in stone.

> 
> > How to address this mismatch?  Put -Wno-has into all regressing
> > test case files individually?  Run the affected testsuites with -Wno-hsa?
> > Not enable -Whsa by default (but I agree it's useful to users)?
> > (Instead, enable with -Wall, which any sane user should be specifying?)
> 
> Even if a bit tedious, my preference actually is to add to the test cases
> an (expected) dg-warning everywhere where such a non-offloading warning
> currently triggers, because that's what users will be seeing (with -Whsa
> enabled by default), and because that will make it obvious (PASS -> FAIL
> for the warning check) when that warning disappears (say, because the
> compiler can now offload the respective construct, yay).

That is my opinion as well, except that given the number of warnings
now (with dynamic parallelism disabled), I prefer to work on the file
granularity.  Also, often testcases use macros heavily and putting
dg-warning into them is somewhere between weird and outright
impossible.

On the other hand, as you have probably noticed, Jakub asked me to
pass -Wno-hsa to all tests instead so he seems to have the opposing
point of view.  I must say that I am not really ready to argue about
this too much, especially if we have our own HSA testsuite directory.

Martin

> 
> > A very similar problem also exists for nvptx offloading (Nathan CCed),
> > where we emit similar warnings (enabled by default).  As nvptx offloading
> > happens during link-time (not compile-time, as with hsa offloading),
> > these don't affect GCC's compile tests, but need to be worked around in
> > libgomp test cases.
> 
> 
> Grüße
>  Thomas

Re: [Fortran, Patch] (Coarrays) Wrong events size

2016-02-26 Thread Thomas Koenig


Hi Allessandro,


* PING *


Looks obvious and simple enough for me.

OK.

Thanks for the patch!

Thomas

[PATCH] update include/plugin-api.h to add hooks for section alignment + size

2016-02-26 Thread Than McIntosh

Hello,

I would like to update the gcc plugin API to include interfaces for
querying section size and alignment. The intent is to make it easier
for plugins to do link-time reordering of .bss/.data/.rodata sections
to reduce padding and/or improve cache utilization.

I've posted a patch to binutils that includes changes to the gold
linker to implement these hooks, you can find details at
http://sourceware.org/ml/binutils/2016-02/msg00400.html.

The attached patch is to make sure that the gcc plugin-api.h is in
sync with the corresponding header in binutils.

Questions + comments welcome.

Thanks, Than
diff --git a/include/plugin-api.h b/include/plugin-api.h
index 1bf1750..3ee2e2b 100644
--- a/include/plugin-api.h
+++ b/include/plugin-api.h
@@ -1,6 +1,6 @@
 /* plugin-api.h -- External linker plugin API.  */
 
-/* Copyright (C) 2009-2015 Free Software Foundation, Inc.
+/* Copyright (C) 2009-2016 Free Software Foundation, Inc.
Written by Cary Coutant .
 
This file is part of binutils.
@@ -345,6 +345,26 @@ enum ld_plugin_status
 const struct ld_plugin_section * section_list,
 unsigned int num_sections);
 
+/* The linker's interface for retrieving the section alignment requirement
+   of a specific section in an object.  This interface should only be invoked 
in the
+   claim_file handler.  This function sets *ADDRALIGN to the ELF sh_addralign
+   value of the input section.  */
+
+typedef
+enum ld_plugin_status
+(*ld_plugin_get_input_section_alignment) (const struct ld_plugin_section 
section,
+  unsigned int *addralign);
+
+/* The linker's interface for retrieving the section size of a specific section
+   in an object.  This interface should only be invoked in the claim_file 
handler.
+   This function sets *SECSIZE to the ELF sh_size
+   value of the input section.  */
+
+typedef
+enum ld_plugin_status
+(*ld_plugin_get_input_section_size) (const struct ld_plugin_section section,
+ uint64_t *secsize);
+
 enum ld_plugin_level
 {
   LDPL_INFO,
@@ -384,7 +404,9 @@ enum ld_plugin_tag
   LDPT_ALLOW_SECTION_ORDERING = 24,
   LDPT_GET_SYMBOLS_V2 = 25,
   LDPT_ALLOW_UNIQUE_SEGMENT_FOR_SECTIONS = 26,
-  LDPT_UNIQUE_SEGMENT_FOR_SECTIONS = 27
+  LDPT_UNIQUE_SEGMENT_FOR_SECTIONS = 27,
+  LDPT_GET_INPUT_SECTION_ALIGNMENT = 28,
+  LDPT_GET_INPUT_SECTION_SIZE = 29
 };
 
 /* The plugin transfer vector.  */
@@ -416,6 +438,8 @@ struct ld_plugin_tv
 ld_plugin_allow_section_ordering tv_allow_section_ordering;
 ld_plugin_allow_unique_segment_for_sections 
tv_allow_unique_segment_for_sections; 
 ld_plugin_unique_segment_for_sections tv_unique_segment_for_sections;
+ld_plugin_get_input_section_alignment tv_get_input_section_alignment;
+ld_plugin_get_input_section_size tv_get_input_section_size;
   } tv_u;
 };

libgo patch committed: Add some Getsockopt functions to the syscall package

2016-02-26 Thread Ian Lance Taylor

This patch adds some Getsockopt functions to the syscall package.
These functions exist in the master library, but never made it into
gccgo's syscall package.  This fixes GCC PR 69966.  Bootstrapped and
ran Go testsuite on x86_64-pc-linux-gnu.  Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 233670)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-156f5f0152797ac2afe5f23803aeb3c7b8f8418e
+3de822d11255d439fac9717897b017aae2de18c2
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/go/syscall/socket.go
===
--- libgo/go/syscall/socket.go  (revision 232239)
+++ libgo/go/syscall/socket.go  (working copy)
@@ -251,6 +251,13 @@ func GetsockoptIPv6Mreq(fd, level, opt i
return &value, err
 }
 
+func GetsockoptICMPv6Filter(fd, level, opt int) (*ICMPv6Filter, error) {
+   var value ICMPv6Filter
+   vallen := Socklen_t(SizeofICMPv6Filter)
+   err := getsockopt(fd, level, opt, unsafe.Pointer(&value), &vallen)
+   return &value, err
+}
+
 //sys  setsockopt(s int, level int, name int, val unsafe.Pointer, vallen 
Socklen_t) (err error)
 //setsockopt(s _C_int, level _C_int, optname _C_int, val *byte, vallen 
Socklen_t) _C_int
 
Index: libgo/go/syscall/socket_bsd.go
===
--- libgo/go/syscall/socket_bsd.go  (revision 232239)
+++ libgo/go/syscall/socket_bsd.go  (working copy)
@@ -80,3 +80,10 @@ func BindToDevice(fd int, device string)
 func anyToSockaddrOS(rsa *RawSockaddrAny) (Sockaddr, error) {
return nil, EAFNOSUPPORT
 }
+
+func GetsockoptIPv6MTUInfo(fd, level, opt int) (*IPv6MTUInfo, error) {
+   var value IPv6MTUInfo
+   vallen := Socklen_t(SizeofIPv6MTUInfo)
+   err := getsockopt(fd, level, opt, unsafe.Pointer(&value), &vallen)
+   return &value, err
+}
Index: libgo/go/syscall/socket_linux.go
===
--- libgo/go/syscall/socket_linux.go(revision 232239)
+++ libgo/go/syscall/socket_linux.go(working copy)
@@ -168,6 +168,20 @@ func anyToSockaddrOS(rsa *RawSockaddrAny
return nil, EAFNOSUPPORT
 }
 
+func GetsockoptIPv6MTUInfo(fd, level, opt int) (*IPv6MTUInfo, error) {
+   var value IPv6MTUInfo
+   vallen := Socklen_t(SizeofIPv6MTUInfo)
+   err := getsockopt(fd, level, opt, unsafe.Pointer(&value), &vallen)
+   return &value, err
+}
+
+func GetsockoptUcred(fd, level, opt int) (*Ucred, error) {
+   var value Ucred
+   vallen := Socklen_t(SizeofUcred)
+   err := getsockopt(fd, level, opt, unsafe.Pointer(&value), &vallen)
+   return &value, err
+}
+
 //sysnbEpollCreate(size int) (fd int, err error)
 //epoll_create(size _C_int) _C_int
 
Index: libgo/mksysinfo.sh
===
--- libgo/mksysinfo.sh  (revision 232239)
+++ libgo/mksysinfo.sh  (working copy)
@@ -870,6 +870,14 @@ if ! grep 'type ICMPv6Filter ' ${OUT} >
   echo 'type ICMPv6Filter struct { Data [8]uint32 }' >> ${OUT}
 fi
 
+# The ip6_mtuinfo struct.
+grep '^type _ip6_mtuinfo ' gen-sysinfo.go | \
+sed -e 's/_ip6_mtuinfo/IPv6MTUInfo/' \
+  -e 's/ip6m_addr/Addr/' \
+  -e 's/_sockaddr_in6/RawSockaddrInet6/' \
+  -e 's/ip6m_mtu/Mtu/' \
+>> ${OUT}
+
 # Try to guess the type to use for fd_set.
 fd_set=`grep '^type _fd_set ' gen-sysinfo.go || true`
 fds_bits_type="_C_long"
@@ -1464,7 +1472,7 @@ set cmsghdr Cmsghdr ip_mreq IPMreq ip_mr
 msghdr Msghdr nlattr NlAttr nlmsgerr NlMsgerr nlmsghdr NlMsghdr \
 rtattr RtAttr rtgenmsg RtGenmsg rtmsg RtMsg rtnexthop RtNexthop \
 sock_filter SockFilter sock_fprog SockFprog ucred Ucred \
-icmp6_filter ICMPv6Filter
+icmp6_filter ICMPv6Filter ip6_mtuinfo IPv6MTUInfo
 while test $# != 0; do
 nc=$1
 ngo=$2

Re: [PR 69920] Prevent SRA from leaving a removed SSA_NAME in IL

2016-02-26 Thread Richard Biener

On February 26, 2016 5:15:43 PM GMT+01:00, Martin Jambor  
wrote:
>Hi,
>
>my fix for PR 69666 has caused quite a few regressions accross the
>borad where SRA removed a SSA_NAME which however still was in the IL
>(and usually stumbled upon it itself straight away).
>
>The removal path should not be executed when there is an SSA_NAME on
>the LHS, the code clearly is not ready for it.  Before my patch, we
>got always lucky because the statement was simply modified elsewhere
>when the LHS was an SSA_NAME.  However, even that was not 10%
>guaranteed because of the !access_has_replacements_p (racc) part of
>the changed condition.
>
>The patch below fixes the ICEs simply by guarding the removal code to
>only work when the LHS is not an SSA_NAME.  This means that the safe
>path below it is going to execute.
>
>I have bootstrapped and tested the patch on x86_64-linux.  I'd like to
>commit it to trunk as soon as it gets approved and then I'd like to
>commit it to gcc-5 branch together with the PR 69666 fix a few days
>afterwards.  OK?

OK.

Richard.

>Thanks,
>
>Martin
>
>
>2016-02-26  Martin Jambor  
>
>   PR middle-end/69920
>   * tree-sra.c (sra_modify_assign): Do not remove loads of
>   uninitialized aggregates to SSA_NAMEs.
>
>testsuite/
>   * gcc.dg/torture/pr69932.c: New test.
>   * gcc.dg/torture/pr69936.c: Likewise.
>---
> gcc/testsuite/gcc.dg/torture/pr69932.c | 10 ++
> gcc/testsuite/gcc.dg/torture/pr69936.c | 24 
> gcc/tree-sra.c |  3 ++-
> 3 files changed, 36 insertions(+), 1 deletion(-)
> create mode 100644 gcc/testsuite/gcc.dg/torture/pr69932.c
> create mode 100644 gcc/testsuite/gcc.dg/torture/pr69936.c
>
>diff --git a/gcc/testsuite/gcc.dg/torture/pr69932.c
>b/gcc/testsuite/gcc.dg/torture/pr69932.c
>new file mode 100644
>index 000..4b82130
>--- /dev/null
>+++ b/gcc/testsuite/gcc.dg/torture/pr69932.c
>@@ -0,0 +1,10 @@
>+/* { dg-do compile } */
>+
>+int a;
>+void fn1() {
>+  int b = 4;
>+  short c[4];
>+  c[b] = c[a];
>+  if (c[2]) {}
>+
>+}
>diff --git a/gcc/testsuite/gcc.dg/torture/pr69936.c
>b/gcc/testsuite/gcc.dg/torture/pr69936.c
>new file mode 100644
>index 000..3023bbb
>--- /dev/null
>+++ b/gcc/testsuite/gcc.dg/torture/pr69936.c
>@@ -0,0 +1,24 @@
>+/* { dg-do compile } */
>+
>+int a;
>+char b;
>+void fn1(int p1) {}
>+
>+int fn2() { return 5; }
>+
>+void fn3() {
>+  if (fn2())
>+;
>+  else {
>+char c[5];
>+c[0] = 5;
>+  lbl_608:
>+fn1(c[9]);
>+int d = c[9];
>+c[2] | a;
>+d = c[b];
>+  }
>+  goto lbl_608;
>+}
>+
>+int main() { return 0; }
>diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
>index 663ded2..366f413 100644
>--- a/gcc/tree-sra.c
>+++ b/gcc/tree-sra.c
>@@ -3504,7 +3504,8 @@ sra_modify_assign (gimple *stmt,
>gimple_stmt_iterator *gsi)
>   else
>   {
> if (access_has_children_p (racc)
>-&& !racc->grp_unscalarized_data)
>+&& !racc->grp_unscalarized_data
>+&& TREE_CODE (lhs) != SSA_NAME)
>   {
> if (dump_file)
>   {

Re: (Non-)offloading diagnostics

2016-02-26 Thread Jakub Jelinek

On Fri, Feb 26, 2016 at 06:18:13PM +0100, Martin Jambor wrote:
> > I'm a proponent of enabling as many useful warnings by default, or if not
> > by default, then with -Wall.  -Whsa is enabled by default, and has thus
> > set a precedent of doing that.
> 
> I am not sure I'd go as far as "as many as possible," but in the case
> of -Whsa, the warnings get emitted only if HSA offloading is
> configured and especially only if the user used OMP and its target
> construct.  This means that it is relevant only for a rather small
> class of users and it's not a "your code looks weird" kind of warning
> but a "the compiler is not doing what you clearly asked for" warning.
> So that is why we decided to warn unconditionally.
> 
> But as far as I understand, gcc does not give any promises about
> warnings, so I believe decisions like a defaultness of a warning can
> be revisited at any point in the future, for example if people learn
> not to expect some constructs to be offloaded to GPUs.  Moreover, the
> conventions regarding offloading are still being settled and still
> will for quite some time so nobody should really expect such details
> to be set in stone.

The thing is, most of the tests in the libgomp.{c,c++,fortran}/ testsuite
are (meant to be) valid OpenMP testcases, having them full of dozens of
dg-warning lines where every of the 10+ different offloading target warns
about something would be a maintainance nightmare.  E.g. when adding
new OpenMP tests, one would need to configure all the offloaders
(individually?), for some you need hw not every committer has, for others
there are other issues (e.g., is the required amdkfd going to be submitted
for upstream kernel?  I might have hard time convincing our kernel
maintainers to use that instead of what is in upstream kernel others).

So, IMHO if you want to check for warnings, do that as Martin has added a
new subdir with only hsa OpenMP tests, if you want test warnings on tests we
already have elsewhere, #include them in the other dir, dg-do link instead
of run (so that it is not run multiple times), and check for the warnings;
you could also use -foffload=hsa in there to make sure you only have to care
about hsa warnings, and not NVPTX, or whatever other offloader.

Jakub

Re: [Patch, Fortran] PR 69495: unused-label warning does not tell which flag triggered it

2016-02-26 Thread Janus Weil

double-ping ...



2016-02-12 0:29 GMT+01:00 Janus Weil :
> ping!
>
> 2016-02-05 19:19 GMT+01:00 Janus Weil :
>> Hi all,
>>
>> I have slightly updated the patch now to avoid string-breaking issues
>> (even if it may not be a problem at all, as mentioned by Jospeh). Also
>> I removed the questionable part in intrinsic.c that I was not sure
>> about.
>>
>> This version of the patch should not be too far from obvious now. Ok for 
>> trunk?
>>
>> Cheers,
>> Janus
>>
>>
>>
>> 2016-02-03 23:27 GMT+01:00 Joseph Myers :
>>> On Wed, 3 Feb 2016, Manfred Schwarb wrote:
>>>
 There are 2 things with translation, and there is a third issue:
 - As you noticed, breaking things differently means translation has to be
   done again.
 - Normally, each string is translated independently, and depending on the
   language there may be lack of context (e.g. adjectives get different
 suffixes
   depending on the noun).
>>>
>>> I believe gettext works fine with (compile-time) string constant
>>> concatenation - that is, extracts the whole concatenated string for
>>> translation, so these are non-issues.  What doesn't work includes:
>>>
>>> * Runtime concatenation of strings or otherwise putting English fragments
>>> together at runtime.
>>>
>>> * String constant concatenation where one of the concatenated pieces comes
>>> from a macro expansion.
>>>
>>> * The argument for translation being a conditional expression:
>>>
>>>   error (cond ? "message 1" : "message 2");
>>>
>>> (in this case, only one of the messages may be extracted for translation,
>>> so you need to mark both of them up with appropriate macros such as G_).
>>>
>>> --
>>> Joseph S. Myers
>>> jos...@codesourcery.com

Re: [Fortran, Patch] (Coarrays) Wrong events size

2016-02-26 Thread Paul Richard Thomas

Dear Alessandro,

Seconded! I saw your ping on my phone and was going to respond.
well, now :-)

Thanks for the patch

Paul

On 26 February 2016 at 18:29, Thomas Koenig  wrote:
> Hi Allessandro,
>
>> * PING *
>
>
> Looks obvious and simple enough for me.
>
> OK.
>
> Thanks for the patch!
>
> Thomas
>



-- 
The difference between genius and stupidity is; genius has its limits.

Albert Einstein

Re: Fix PR44281 (bad RA with global regs)

2016-02-26 Thread Jeff Law


On 02/26/2016 08:41 AM, Bernd Schmidt wrote:

On 02/22/2016 03:37 PM, Richard Biener wrote:


Do calls properly clobber them even if they are not in the set of
call-clobbered regs?
Are asm()s properly using/clobbering them?  I think you are allowed to
use them
in asm()s without adding constraints for them?


Calls do, asms currently don't AFAICT. Not sure whether it's allowed to
use them, but I think it should be straightforward to adjust df-scan.
The other case that came to mind was signal handlers.  What happens if 
we're using the global register as a scratch, we hit a memory reference 
that faults and inside the signal handler the original code expects to 
be able to use the global register the user set up?


If that's a valid use scenario, then there's probably all kinds of DF 
stuff that we'd need to expose.


jeff

Re: (Non-)offloading diagnostics

2016-02-26 Thread Martin Jambor

Hi,

On Fri, Feb 26, 2016 at 06:51:34PM +0100, Jakub Jelinek wrote:
> On Fri, Feb 26, 2016 at 06:18:13PM +0100, Martin Jambor wrote:
> > > I'm a proponent of enabling as many useful warnings by default, or if not
> > > by default, then with -Wall.  -Whsa is enabled by default, and has thus
> > > set a precedent of doing that.
> > 
> > I am not sure I'd go as far as "as many as possible," but in the case
> > of -Whsa, the warnings get emitted only if HSA offloading is
> > configured and especially only if the user used OMP and its target
> > construct.  This means that it is relevant only for a rather small
> > class of users and it's not a "your code looks weird" kind of warning
> > but a "the compiler is not doing what you clearly asked for" warning.
> > So that is why we decided to warn unconditionally.
> > 
> > But as far as I understand, gcc does not give any promises about
> > warnings, so I believe decisions like a defaultness of a warning can
> > be revisited at any point in the future, for example if people learn
> > not to expect some constructs to be offloaded to GPUs.  Moreover, the
> > conventions regarding offloading are still being settled and still
> > will for quite some time so nobody should really expect such details
> > to be set in stone.
> 
> The thing is, most of the tests in the libgomp.{c,c++,fortran}/ testsuite
> are (meant to be) valid OpenMP testcases, having them full of dozens of
> dg-warning lines where every of the 10+ different offloading target warns
> about something would be a maintainance nightmare.

Agreed, having such dg-warnings would definitely be an overkill.  I
only intended to mark the whole test with an option. I am willing to
be looking for new hsa warnings and examine them myself, adding the
option if necessary.  I would not expect the originator of the
testcase or anybody who does not care for HSA to do it.

> E.g. when adding
> new OpenMP tests, one would need to configure all the offloaders
> (individually?), for some you need hw not every committer has,

No special hardware is necessary to see the warning (though you need
at least https://github.com/HSAFoundation/HSA-Runtime-Reference-Source
to build the libgomp plugin).  Once gcc decides to emit HSAIL, it of
course has to work as expected.  There should be no need to configure
hsa individually either.

> for others
> there are other issues (e.g., is the required amdkfd going to be submitted
> for upstream kernel?  I might have hard time convincing our kernel
> maintainers to use that instead of what is in upstream kernel others).

I am being repeatedly told it will be and very soon, but apparently it
takes longer than AMD anticipated.  I don't think anybody expects any
distribution to pick it up on their own (...but you know, Red Hat is
actually the company that now employs the upstream kernel kfd
maintainer ;-).

> So, IMHO if you want to check for warnings, do that as Martin has added a
> new subdir with only hsa OpenMP tests, if you want test warnings on tests we
> already have elsewhere, #include them in the other dir, dg-do link instead
> of run (so that it is not run multiple times), and check for the warnings;
> you could also use -foffload=hsa in there to make sure you only have to care
> about hsa warnings, and not NVPTX, or whatever other offloader.
> 

Just to be clear, I never wanted to be testing for presence of
warnings, I see no value in that.

All in all, I am willing to add -Wno-hsa to default options and only
have these warnings on in dedicated HSA directories.  I will amend the
posted patches once testsuite maintainers look at my initial proposal
for the first such directory.

Martin

Re: [PATCH] powerpc: Handle DImode rotatert implemented with rlwinm (PR69946)

2016-02-26 Thread Segher Boessenkool

On Thu, Feb 25, 2016 at 10:52:29AM -0500, David Edelsohn wrote:
> Please add a short comment explaining why rs6000_insn_for_shift_mask
> doesn't need to match the logic in rs6000_is_valid_shift_mask
> converting rotates to simple shifts.

I added this comment:

--- trunk/gcc/config/rs6000/rs6000.c2016/02/26 18:17:02 233754
+++ trunk/gcc/config/rs6000/rs6000.c2016/02/26 18:49:18 233755
@@ -17438,9 +17438,12 @@ rs6000_insn_for_shift_mask (machine_mode
operands[2] = GEN_INT (32 - INTVAL (operands[2]));
   operands[3] = GEN_INT (31 - nb);
   operands[4] = GEN_INT (31 - ne);
+  /* This insn can also be a 64-bit rotate with mask that really makes
+it just a shift right (with mask); the %h below are to adjust for
+that situation (shift count is >= 32 in that case).  */
   if (dot)
-   return "rlw%I2nm. %0,%1,%2,%3,%4";
-  return "rlw%I2nm %0,%1,%2,%3,%4";
+   return "rlw%I2nm. %0,%1,%h2,%3,%4";
+  return "rlw%I2nm %0,%1,%h2,%3,%4";
 }
 
   gcc_unreachable ();


Segher

Re: [hsa merge 08/10] HSAIL BRIG description header file

2016-02-26 Thread Martin Jambor

Hi,

I hope I've got some good news:

On Thu, Jan 14, 2016 at 05:18:56PM -0800, Ian Lance Taylor wrote:
> Jakub Jelinek  writes:
>
> > On Wed, Jan 13, 2016 at 06:39:33PM +0100, Martin Jambor wrote:
> >> the following patch adds a BRIG (binary representation of HSAIL)
> >> representation description.  It is within a single header file
> >> describing the binary structures and constants of the format.
> >>
> >> The file comes from the HSA Foundation (I have only added the
> >> HSA_BRIG_FORMAT_H macro and check and removed some weird comments
> >> which are not present in proposed future versions of the file) and is
> >> licensed under "University of Illinois/NCSA Open Source License."
> >>
> >> The license is "GPL-compatible" according to FSF
> >> (http://www.gnu.org/licenses/license-list.en.html#GPLCompatibleLicenses)
> >> so I believe we can have it in GCC.  Nevertheless, it is not GPL and
> >> there is no copyright assignment for it, but the situation is
> >> hopefully analogous to some other libraries that have their upstream
> >> elsewhere but we ship them as part of the GCC.
> >>
> >> In the previous posting of this patch
> >> (https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00721.html) I have
> >> requested a permission from the steering committee to include this file
> >> with a different upstream in GCC.  I have not received an official
> >> reply but since I have been chosen to be the HSA maintainer, I tend to
> >> think there were no legal objections against HSA going forward,
> >> including this file.
>
> Martin, could you ask the HSA Foundation or AMD or whoever if there is
> any way they could remove the second requirement of the license?  It
> adds yet another case where anybody distributing GCC has to list yet
> another copyright notice.
>

I have asked HSA foundation to do just that and apparently they agreed
to change the licensing of the file (in upcoming versions of HSA) to
the MIT license.  IIUC, the reading of the license header would then
be the one below.  I hope that means the problematic requirements will
be gone and we will be able to just use their file.  If, however, you
still think there will be issues preventing us from doing that, please
let me know as soon as possible.

Thanks,

Martin

The license is going to be:

The MIT License (MIT)

Copyright (c) 2016, HSA Foundation, Inc

 * All rights reserved.

Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
of the Software, and to permit persons to whom the Software is furnished to do
so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE."

Re: [PATCH] powerpc: Handle DImode rotatert implemented with rlwinm (PR69946)

2016-02-26 Thread David Edelsohn

On Fri, Feb 26, 2016 at 1:52 PM, Segher Boessenkool
 wrote:
> On Thu, Feb 25, 2016 at 10:52:29AM -0500, David Edelsohn wrote:
>> Please add a short comment explaining why rs6000_insn_for_shift_mask
>> doesn't need to match the logic in rs6000_is_valid_shift_mask
>> converting rotates to simple shifts.
>
> I added this comment:
>
> --- trunk/gcc/config/rs6000/rs6000.c2016/02/26 18:17:02 233754
> +++ trunk/gcc/config/rs6000/rs6000.c2016/02/26 18:49:18 233755
> @@ -17438,9 +17438,12 @@ rs6000_insn_for_shift_mask (machine_mode
> operands[2] = GEN_INT (32 - INTVAL (operands[2]));
>operands[3] = GEN_INT (31 - nb);
>operands[4] = GEN_INT (31 - ne);
> +  /* This insn can also be a 64-bit rotate with mask that really makes
> +it just a shift right (with mask); the %h below are to adjust for
> +that situation (shift count is >= 32 in that case).  */
>if (dot)
> -   return "rlw%I2nm. %0,%1,%2,%3,%4";
> -  return "rlw%I2nm %0,%1,%2,%3,%4";
> +   return "rlw%I2nm. %0,%1,%h2,%3,%4";
> +  return "rlw%I2nm %0,%1,%h2,%3,%4";
>  }
>
>gcc_unreachable ();

Thanks!

- David

[PATCH] Fix "no-vsx" target attribute handling (PR target/69969)

2016-02-26 Thread Jakub Jelinek

Hi!

Most of the errors and warnings in rs6000_option_override_internal
are emitted only if the particular option is explicit, e.g.
  if (TARGET_P9_DFORM && !TARGET_P9_VECTOR)
{
  if (rs6000_isa_flags_explicit & OPTION_MASK_P9_VECTOR)
error ("-mpower9-dform requires -mpower9-vector");
  rs6000_isa_flags &= ~OPTION_MASK_P9_DFORM;
}
and many others, which is right, but for the
-mallow-movmisalign requires -mvsx
error it doesn't do this, so if say -mcpu=power8 compiled TU
contains a routine with target ("no-vsx") attribute, we get this
error, even when the user hasn't done anything we should complain about.

Fixed by following what we do for the other options, bootstrapped/regtested
on powerpc64le-linux (and powerpc64-linux, but regtest is still pending
there).  Ok for trunk?

2016-02-26  Jakub Jelinek  

PR target/69969
* config/rs6000/rs6000.c (rs6000_option_override_internal): Don't
complain about -mallow-movmisalign without -mvsx if
TARGET_ALLOW_MOVMISALIGN was not set explicitly.

* gcc.target/powerpc/pr69969.c: New test.

--- gcc/config/rs6000/rs6000.c.jj   2016-02-26 12:05:39.0 +0100
+++ gcc/config/rs6000/rs6000.c  2016-02-26 15:58:29.250259330 +0100
@@ -4207,7 +4207,8 @@ rs6000_option_override_internal (bool gl
 
   else if (TARGET_ALLOW_MOVMISALIGN && !TARGET_VSX)
 {
-  if (TARGET_ALLOW_MOVMISALIGN > 0)
+  if (TARGET_ALLOW_MOVMISALIGN > 0
+ && global_options_set.x_TARGET_ALLOW_MOVMISALIGN)
error ("-mallow-movmisalign requires -mvsx");
 
   TARGET_ALLOW_MOVMISALIGN = 0;
--- gcc/testsuite/gcc.target/powerpc/pr69969.c.jj   2016-02-26 
16:03:34.992101828 +0100
+++ gcc/testsuite/gcc.target/powerpc/pr69969.c  2016-02-26 16:03:25.0 
+0100
@@ -0,0 +1,7 @@
+/* PR target/69969 */
+/* { dg-do compile } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { 
"-mcpu=power8" } } */
+/* { dg-options "-mcpu=power8" } */
+
+int bar (int x) { return x; }
+__attribute__((__target__("no-vsx"))) int foo (int x) { return x; } /* { 
dg-bogus "-mallow-movmisalign requires -mvsx" } */

Jakub

Re: [hsa merge 08/10] HSAIL BRIG description header file

2016-02-26 Thread Mike Stump

On Feb 26, 2016, at 10:58 AM, Martin Jambor  wrote:
> I have asked HSA foundation

Thanks.

> The license is going to be:
> 
> The MIT License (MIT)

Wonderful.

[PATCH] Fix up recent texinfo regression

2016-02-26 Thread Jakub Jelinek

Hi!

I've noticed today:
../../gcc/doc/extend.texi:10717: warning: `.' or `,' must follow @xref, not A
../../gcc/doc/extend.texi:10764: warning: `.' or `,' must follow @xref, not A

After reading info texinfo on @xref and @pxref, I believe it is invalid
to use @xref this way, in the middle of a sentence, and indeed it looks
quite weird what is produced.  So the following patch attempts to fix it up
by using @xref in a separate sentence.  Tested on x86_64-linux, ok for
trunk?

BTW, it also seems that we use see @xref{...}, or See @xref{...}, in various
places, that also looks wrong, for info that results in
see *note ...::,
but in *.pdf it is probably
see See ...,
Thus, perhaps we should check for this and remove the see/See words before
@xref, and if we want see instead of See in *.pdf, use @pxref.

2016-02-26  Jakub Jelinek  

* doc/extend.texi (__builtin_alloca, __builtin_alloca_with_align):
Fix @xref usage.

--- gcc/doc/extend.texi.jj  2016-02-26 16:49:58.0 +0100
+++ gcc/doc/extend.texi 2016-02-26 20:41:16.611200672 +0100
@@ -10714,10 +10714,11 @@ it is the responsibility of its caller t
 cause it to exceed the stack size limit.
 The @code{__builtin_alloca} function is provided to make it possible to
 allocate on the stack arrays of bytes with an upper bound that may be
-computed at run time.  Since C99 @xref{Variable Length} Arrays offer
+computed at run time.  Since C99 Variable Length Arrays offer
 similar functionality under a portable, more convenient, and safer
 interface they are recommended instead, in both C99 and C++ programs
 where GCC provides them as an extension.
+@xref{Variable Length}, for details.
 
 @end deftypefn
 
@@ -10761,10 +10762,10 @@ the argument doesn't cause it to exceed
 The @code{__builtin_alloca_with_align} function is provided to make
 it possible to allocate on the stack overaligned arrays of bytes with
 an upper bound that may be computed at run time.  Since C99
-@xref{Variable Length} Arrays offer the same functionality under
+Variable Length Arrays offer the same functionality under
 a portable, more convenient, and safer interface they are recommended
 instead, in both C99 and C++ programs where GCC provides them as
-an extension.
+an extension.  @xref{Variable Length}, for details.
 
 @end deftypefn
 


Jakub

C++ PATCH for c++/69958 (wrong sizeof...)

2016-02-26 Thread Jason Merrill


alias templates are supposed to be transparent, so when we see

template 
using wrapped2 = list>;

the size_for needs to be expanded into sizeof...(something). 
There is no way to write this "something" in C++, but it's simple enough 
to leave it as an *_ARGUMENT_PACK within the compiler.


Tested x86_64-pc-linux-gnu, applying to trunk.
commit 772d7b6f7c13119525da6e470c8729f14f35152f
Author: Jason Merrill 
Date:   Fri Feb 26 13:53:08 2016 -0500

	PR c++/69958
	* pt.c (make_argument_pack): New.
	(tsubst_copy) [SIZEOF_EXPR]: Handle partial expansion.
	(tsubst_copy_and_build): Likewise.

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index cd3eb67..b5855a8 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -7,6 +7,25 @@ get_pattern_parm (tree parm, tree tmpl)
   return patparm;
 }
 
+/* Make an argument pack out of the TREE_VEC VEC.  */
+
+static tree
+make_argument_pack (tree vec)
+{
+  tree pack;
+  tree elt = TREE_VEC_ELT (vec, 0);
+  if (TYPE_P (elt))
+pack = cxx_make_type (TYPE_ARGUMENT_PACK);
+  else
+{
+  pack = make_node (NONTYPE_ARGUMENT_PACK);
+  TREE_TYPE (pack) = TREE_TYPE (elt);
+  TREE_CONSTANT (pack) = 1;
+}
+  SET_ARGUMENT_PACK_ARGS (pack, vec);
+  return pack;
+}
+
 /* Substitute ARGS into the vector or list of template arguments T.  */
 
 static tree
@@ -14066,7 +14085,8 @@ tsubst_copy (tree t, tree args, tsubst_flags_t complain, tree in_decl)
   }
 
 case SIZEOF_EXPR:
-  if (PACK_EXPANSION_P (TREE_OPERAND (t, 0)))
+  if (PACK_EXPANSION_P (TREE_OPERAND (t, 0))
+	  || ARGUMENT_PACK_P (TREE_OPERAND (t, 0)))
 {
   tree expanded, op = TREE_OPERAND (t, 0);
 	  int len = 0;
@@ -14077,7 +14097,11 @@ tsubst_copy (tree t, tree args, tsubst_flags_t complain, tree in_decl)
 	  ++cp_unevaluated_operand;
 	  ++c_inhibit_evaluation_warnings;
 	  /* We only want to compute the number of arguments.  */
-	  expanded = tsubst_pack_expansion (op, args, complain, in_decl);
+	  if (PACK_EXPANSION_P (op))
+	expanded = tsubst_pack_expansion (op, args, complain, in_decl);
+	  else
+	expanded = tsubst_template_args (ARGUMENT_PACK_ARGS (op),
+	 args, complain, in_decl);
 	  --cp_unevaluated_operand;
 	  --c_inhibit_evaluation_warnings;
 
@@ -14093,13 +14117,15 @@ tsubst_copy (tree t, tree args, tsubst_flags_t complain, tree in_decl)
 	return error_mark_node;
 	  else if (PACK_EXPANSION_P (expanded)
 		   || (TREE_CODE (expanded) == TREE_VEC
-		   && len > 0
-		   && PACK_EXPANSION_P (TREE_VEC_ELT (expanded, len-1
+		   && pack_expansion_args_count (expanded)))
+
 	{
-	  if (TREE_CODE (expanded) == TREE_VEC)
-		expanded = TREE_VEC_ELT (expanded, len - 1);
+	  if (PACK_EXPANSION_P (expanded))
+		/* OK.  */;
+	  else if (TREE_VEC_LENGTH (expanded) == 1)
+		expanded = TREE_VEC_ELT (expanded, 0);
 	  else
-		PACK_EXPANSION_SIZEOF_P (expanded) = true;
+		expanded = make_argument_pack (expanded);
 
 	  if (TYPE_P (expanded))
 		return cxx_sizeof_or_alignof_type (expanded, SIZEOF_EXPR, 
@@ -16162,7 +16188,8 @@ tsubst_copy_and_build (tree t,
 	  length, stride, TREE_TYPE (op1)));
   }
 case SIZEOF_EXPR:
-  if (PACK_EXPANSION_P (TREE_OPERAND (t, 0)))
+  if (PACK_EXPANSION_P (TREE_OPERAND (t, 0))
+	  || ARGUMENT_PACK_P (TREE_OPERAND (t, 0)))
 	RETURN (tsubst_copy (t, args, complain, in_decl));
   /* Fall through */
   
diff --git a/gcc/testsuite/g++.dg/cpp0x/variadic-sizeof4.C b/gcc/testsuite/g++.dg/cpp0x/variadic-sizeof4.C
new file mode 100644
index 000..1187429
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/variadic-sizeof4.C
@@ -0,0 +1,33 @@
+// PR c++/69958
+// { dg-do compile { target c++11 } }
+
+typedef decltype(sizeof(int)) size_t;
+
+template 
+struct list { };
+
+template 
+struct size {  };
+
+template 
+using size_for = size;
+
+template struct assert_same;
+template struct assert_same {};
+
+template 
+using wrapped = list>;
+
+// This assertion fails (produces size<4>)
+assert_same<
+list>,
+wrapped> a3;
+
+
+template 
+using wrapped2 = list>;
+
+// This assertion fails (produces size<2>)
+assert_same<
+list>,
+wrapped2> a4;
diff --git a/gcc/testsuite/g++.dg/cpp0x/variadic-sizeof4a.C b/gcc/testsuite/g++.dg/cpp0x/variadic-sizeof4a.C
new file mode 100644
index 000..0e8096d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/variadic-sizeof4a.C
@@ -0,0 +1,33 @@
+// PR c++/69958
+// { dg-do compile { target c++11 } }
+
+typedef decltype(sizeof(int)) size_t;
+
+template 
+struct list { };
+
+template 
+struct size {  };
+
+template 
+using size_for = size;
+
+template struct assert_same;
+template struct assert_same {};
+
+template 
+using wrapped = list>;
+
+// This assertion fails (produces size<4>)
+assert_same<
+list>,
+  wrapped> a3;
+
+
+template 
+using wrapped2 = list>;
+
+// This assertion fails (produces size<2>)
+assert_same<
+list>,
+  wrapped2> a4;

Re: [PATCH] Fix "no-vsx" target attribute handling (PR target/69969)

2016-02-26 Thread David Edelsohn

On Fri, Feb 26, 2016 at 2:27 PM, Jakub Jelinek  wrote:
> Hi!
>
> Most of the errors and warnings in rs6000_option_override_internal
> are emitted only if the particular option is explicit, e.g.
>   if (TARGET_P9_DFORM && !TARGET_P9_VECTOR)
> {
>   if (rs6000_isa_flags_explicit & OPTION_MASK_P9_VECTOR)
> error ("-mpower9-dform requires -mpower9-vector");
>   rs6000_isa_flags &= ~OPTION_MASK_P9_DFORM;
> }
> and many others, which is right, but for the
> -mallow-movmisalign requires -mvsx
> error it doesn't do this, so if say -mcpu=power8 compiled TU
> contains a routine with target ("no-vsx") attribute, we get this
> error, even when the user hasn't done anything we should complain about.
>
> Fixed by following what we do for the other options, bootstrapped/regtested
> on powerpc64le-linux (and powerpc64-linux, but regtest is still pending
> there).  Ok for trunk?
>
> 2016-02-26  Jakub Jelinek  
>
> PR target/69969
> * config/rs6000/rs6000.c (rs6000_option_override_internal): Don't
> complain about -mallow-movmisalign without -mvsx if
> TARGET_ALLOW_MOVMISALIGN was not set explicitly.
>
> * gcc.target/powerpc/pr69969.c: New test.

Seems reasonable.  LGTM

Thanks, David

Re: [PATCH] Fix up recent texinfo regression

2016-02-26 Thread Martin Sebor


On 02/26/2016 12:51 PM, Jakub Jelinek wrote:

Hi!

I've noticed today:
../../gcc/doc/extend.texi:10717: warning: `.' or `,' must follow @xref, not A
../../gcc/doc/extend.texi:10764: warning: `.' or `,' must follow @xref, not A

After reading info texinfo on @xref and @pxref, I believe it is invalid
to use @xref this way, in the middle of a sentence, and indeed it looks
quite weird what is produced.  So the following patch attempts to fix it up
by using @xref in a separate sentence.  Tested on x86_64-linux, ok for
trunk?


Thanks for fixing it up.  I noticed the warning yesterday and have
been meaning to look into it.

Your change seems fine to me though I wonder if using a plain @ref{}
instead would do the right thing.  It's used in a bunch of places to
refer to the Common Function Attributes section (for example).  That
way the referenced text wouldn't need to be repeated.

Martin



BTW, it also seems that we use see @xref{...}, or See @xref{...}, in various
places, that also looks wrong, for info that results in
see *note ...::,
but in *.pdf it is probably
see See ...,
Thus, perhaps we should check for this and remove the see/See words before
@xref, and if we want see instead of See in *.pdf, use @pxref.

2016-02-26  Jakub Jelinek  

* doc/extend.texi (__builtin_alloca, __builtin_alloca_with_align):
Fix @xref usage.

--- gcc/doc/extend.texi.jj  2016-02-26 16:49:58.0 +0100
+++ gcc/doc/extend.texi 2016-02-26 20:41:16.611200672 +0100
@@ -10714,10 +10714,11 @@ it is the responsibility of its caller t
  cause it to exceed the stack size limit.
  The @code{__builtin_alloca} function is provided to make it possible to
  allocate on the stack arrays of bytes with an upper bound that may be
-computed at run time.  Since C99 @xref{Variable Length} Arrays offer
+computed at run time.  Since C99 Variable Length Arrays offer
  similar functionality under a portable, more convenient, and safer
  interface they are recommended instead, in both C99 and C++ programs
  where GCC provides them as an extension.
+@xref{Variable Length}, for details.

  @end deftypefn

@@ -10761,10 +10762,10 @@ the argument doesn't cause it to exceed
  The @code{__builtin_alloca_with_align} function is provided to make
  it possible to allocate on the stack overaligned arrays of bytes with
  an upper bound that may be computed at run time.  Since C99
-@xref{Variable Length} Arrays offer the same functionality under
+Variable Length Arrays offer the same functionality under
  a portable, more convenient, and safer interface they are recommended
  instead, in both C99 and C++ programs where GCC provides them as
-an extension.
+an extension.  @xref{Variable Length}, for details.

  @end deftypefn



Jakub

Re: [PATCH] Fix PR69760

2016-02-26 Thread H.J. Lu

On Wed, Feb 24, 2016 at 6:49 AM, Richard Biener  wrote:
>
> The following fixes bogus SCEV analysis for expressions that are only
> executed conditionally [note: conditionally here doesn't include
> after a taken exit].  Basically we have to make sure further analysis
> does not attempt to use undefined overflow for expressions we don't
> know whether they are computed in the original source (for all
> loop iterations).  This would result in bogus CHRECs as can be seen
> in this PR.
>
> The solution is to re-write those expressions in a way so overflow
> behavior is well-defined.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.
>
> Richard.
>
> 2016-02-24  Richard Biener  
> Jakub Jelinek  
>
> PR middle-end/69760
> * tree-scalar-evolution.c (interpret_rhs_expr): Re-write
> conditionally executed ops to well-defined overflow behavior.
>

This caused:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69983


-- 
H.J.

Re: [PATCH, rs6000] Fixing PR 67145

2016-02-26 Thread Segher Boessenkool

On Thu, Feb 25, 2016 at 09:08:32PM -0800, Richard Henderson wrote:
> It's the simplify-rtx.c portion of the patch that fixes the i686 regression.
> 
> In the PR, Alan raises some good points, but I don't believe that we can 
> address those for gcc6.  A new rtl reassoc optimization that takes loop 
> invariance into account will have to wait.
> 
> But we do need to take care of the rs6000 ice that results, and that's the 
> bulk of the patch -- allowing CA to be sorted to any register of the plus 
> chain.

The rs6000 change is much too risky for stage 4 as well in my opinion.
It also seems it will regress codegen (for which we have no testcases
yet, sigh).

What is the rs6000 ICE?

> Some notes:
> 
> ca_operand doesn't work as written, since CA_REGNO is not an available 
> register, and thus doesn't satisfy register_operand.

Yes, curious.  But when it was written it *did* match.  Huh.

> Is there any particular reason that subf<>3_carry_in_m1 was written with 
> minus rather than plus like all of the other patterns?

It's what RTL simplification comes up with, so it is the only thing
combine tries.  The same reason why there are separate patterns for 0
and -1 at all.

Segher

Re: [PATCH, rs6000] Fixing PR 67145

2016-02-26 Thread Richard Henderson

On 02/26/2016 12:41 PM, Segher Boessenkool wrote:
> On Thu, Feb 25, 2016 at 09:08:32PM -0800, Richard Henderson wrote:
>> It's the simplify-rtx.c portion of the patch that fixes the i686 regression.
>>
>> In the PR, Alan raises some good points, but I don't believe that we can 
>> address those for gcc6.  A new rtl reassoc optimization that takes loop 
>> invariance into account will have to wait.
>>
>> But we do need to take care of the rs6000 ice that results, and that's the 
>> bulk of the patch -- allowing CA to be sorted to any register of the plus 
>> chain.
> 
> The rs6000 change is much too risky for stage 4 as well in my opinion.
> It also seems it will regress codegen (for which we have no testcases
> yet, sigh).
> 
> What is the rs6000 ICE?

The simplify-rtx.c patch causes (reg:M ca) to get sorted to a different spot in
the (plus (plus r1 r2) r3) chain than the rs6000 backend expects, producing an
ICE due to an unrecognizable insn.

The rs6000 patch changes the backend to allow (reg:M ca) to appear exactly once
in (plus (plus r1 r2) r3).  And we do have a test case for it -- the ICE.

How do you imagine the rs6000 change will regress codegen?

>> ca_operand doesn't work as written, since CA_REGNO is not an available 
>> register, and thus doesn't satisfy register_operand.
> 
> Yes, curious.  But when it was written it *did* match.  Huh.

It probably started to fail with r181760, back in 2011.  But the predicate
wasn't much used in the rs6000 backend, so I guess it hadn't really mattered.

>> Is there any particular reason that subf<>3_carry_in_m1 was written with 
>> minus rather than plus like all of the other patterns?
> 
> It's what RTL simplification comes up with, so it is the only thing
> combine tries.  The same reason why there are separate patterns for 0
> and -1 at all.

Well, it appears to use the new pattern just as well.  A few trivial tests are
in fact simplified as one would expect during combine.

r~

Re: [PATCH] Fix up recent texinfo regression

2016-02-26 Thread Jakub Jelinek

On Fri, Feb 26, 2016 at 01:09:57PM -0700, Martin Sebor wrote:
> On 02/26/2016 12:51 PM, Jakub Jelinek wrote:
> >I've noticed today:
> >../../gcc/doc/extend.texi:10717: warning: `.' or `,' must follow @xref, not A
> >../../gcc/doc/extend.texi:10764: warning: `.' or `,' must follow @xref, not A
> >
> >After reading info texinfo on @xref and @pxref, I believe it is invalid
> >to use @xref this way, in the middle of a sentence, and indeed it looks
> >quite weird what is produced.  So the following patch attempts to fix it up
> >by using @xref in a separate sentence.  Tested on x86_64-linux, ok for
> >trunk?
> 
> Thanks for fixing it up.  I noticed the warning yesterday and have
> been meaning to look into it.
> 
> Your change seems fine to me though I wonder if using a plain @ref{}
> instead would do the right thing.  It's used in a bunch of places to
> refer to the Common Function Attributes section (for example).  That
> way the referenced text wouldn't need to be repeated.

It would read as
See C99 Section x.y [Variable Length], page NN Arrays offer
in the pdf
and
See C99 *note Variable Length:: Arrays offer
if I read the texinfo doc right.  Both look just too weird to me.

Jakub

Re: [PATCH, rs6000] Fixing PR 67145

2016-02-26 Thread Segher Boessenkool

On Fri, Feb 26, 2016 at 12:51:10PM -0800, Richard Henderson wrote:
> > What is the rs6000 ICE?
> 
> The simplify-rtx.c patch causes (reg:M ca) to get sorted to a different spot 
> in
> the (plus (plus r1 r2) r3) chain than the rs6000 backend expects, producing an
> ICE due to an unrecognizable insn.

So it changes existing RTL to a form that does not pass recog.  Uh.

> How do you imagine the rs6000 change will regress codegen?

Combine of sequences with double-length adds.

> >> ca_operand doesn't work as written, since CA_REGNO is not an available 
> >> register, and thus doesn't satisfy register_operand.
> > 
> > Yes, curious.  But when it was written it *did* match.  Huh.
> 
> It probably started to fail with r181760, back in 2011.  But the predicate
> wasn't much used in the rs6000 backend, so I guess it hadn't really mattered.

It failed more recently though, that's what r215429 is about.  Oh, before
that it did test just the code, didn't use register_operand.  *facepalm*


Segher

Re: [PATCH, rs6000] Fixing PR 67145

2016-02-26 Thread Segher Boessenkool

On Thu, Feb 25, 2016 at 09:08:32PM -0800, Richard Henderson wrote:
> diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
> index 450fa8b..9d55e7b 100644
> --- a/gcc/simplify-rtx.c
> +++ b/gcc/simplify-rtx.c
> @@ -4421,9 +4421,17 @@ simplify_plus_minus (enum rtx_code code, machine_mode 
> mode, rtx op0,
>n_ops = i;
>  }
>  
> -  /* If nothing changed, fail.  */
> +  /* If nothing changed, check that rematerialization of rtl instructions
> + is still required.  */
>if (!canonicalized)
> -return NULL_RTX;
> +{
> +  /* Perform rematerialization if only all operands are registers and
> + all operations are PLUS.  */
> +  for (i = 0; i < n_ops; i++)
> + if (ops[i].neg || !REG_P (ops[i].op))
> +   return NULL_RTX;
> +  goto gen_result;
> +}

If you check for fixed registers as well here, does that work for you?


Segher

[PING^2] Re: [PATCH] [wwwdocs] Add a "Plugin issues" section to the GCC 6 porting guide

2016-02-26 Thread David Malcolm

Ping

On Thu, 2016-02-18 at 10:44 -0500, David Malcolm wrote:
> Ping:
> https://gcc.gnu.org/ml/gcc-patches/2016-02/msg00765.html
> 
> 
> On Thu, 2016-02-11 at 10:12 -0500, David Malcolm wrote:
> > I've (mostly) ported gcc-python-plugin to gcc 6.  The attached
> > patch
> > for the gcc website starts a new "Plugin issues" section, and
> > covers
> > the biggest issue I ran into (FWIW the suggested compatibility
> > typedef
> > is the one I committed to gcc-python-plugin).
> > 
> > Validates.
> > 
> > OK to commit?

Re: [PATCH, rs6000] Fixing PR 67145

2016-02-26 Thread Richard Henderson

On 02/26/2016 01:01 PM, Segher Boessenkool wrote:
>> How do you imagine the rs6000 change will regress codegen?
> 
> Combine of sequences with double-length adds.

What sort of test case are you imagining here?  The trivial tests I've looked
at have all been optimal (before and after).

> It failed more recently though, that's what r215429 is about.  Oh, before
> that it did test just the code, didn't use register_operand.  *facepalm*

Ah yes, exactly.  I didn't think to annotate ca_operand for more recent
changes.  ;-)

r~

Re: Revert gcc r227962

2016-02-26 Thread Jeff Law


On 02/26/2016 04:04 AM, JonY wrote:

Hi,

I've submitted a patch that was committed as r227962, it causes some
unintended side effects (namely libuuid on Cygwin). Can someone please
revert?

Kai still needs some time to setup his gcc development environment.
We'd need to have a better sense of why this is causing problems.  If 
Kai could chime in with a description of the problem it would be helpful.


jeff

Re: [PATCH 2/4] Replace ENABLE_CHECKING macro with flag_checking in GNAT

2016-02-26 Thread Jeff Law


On 02/24/2016 07:10 AM, marxin wrote:

gcc/ada/ChangeLog:

2016-02-24  Martin Liska  

* gcc-interface/utils.c (set_reverse_storage_order_on_pad_type):
Replace ENABLE_CHECKING macro with flag_checking.

OK.
jeff

Re: [PATCH] [wwwdocs] Add a "Plugin issues" section to the GCC 6 porting guide

2016-02-26 Thread Gerald Pfeifer

Hi David,

On Thu, 11 Feb 2016, David Malcolm wrote:
> I've (mostly) ported gcc-python-plugin to gcc 6.  The attached patch
> for the gcc website starts a new "Plugin issues" section, and covers
> the biggest issue I ran into (FWIW the suggested compatibility typedef
> is the one I committed to gcc-python-plugin).

this is lovely, thanks for doing it!

Just some small editorial comments; please go ahead and commit
after making (or at least considering) them.

Index: htdocs/gcc-6/porting_to.html
===
+"gimple" became a struct, rather than a pointer

Should this be   gimple?

+Prior to GCC 6, "gimple" meant a pointer to a statement.  It was a
+typedef aliasing the type struct gimple_statement_base *:

Same here.

+"gimple" is now the statement struct itself, not a pointer.
+The "gimple" struct is now the base class of the gimple statement class
+hierarchy, and throughout gcc every gimple was changed to a

..and here.

gcc -> GCC

How about "every instance of gimple" ?


+(revision
+https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=42acab1cd6812e2d9e49f4132176f5505f49a0e5";>r227941
+was the commit in question).  The typedef const_gimple is no more;

"is the commit" ?

+change.  If you aim for compatibility between both gcc 6 and earlier
+releases of gcc, it may be cleanest to introduce a compatibility typedef

GCC 6

GCC

Thank you,
Gerald

Re: [wwwdocs] Update -Wnonnull description

2016-02-26 Thread Gerald Pfeifer

On Thu, 25 Feb 2016, Marek Polacek wrote:
> Now that -Wnonnull-comare has been split out of -Wnonnull, we should 
> also update the porting_to text.  Is this sufficient?

It is for me. :-)

Unless anyone else complains, go ahead and commit.

Thanks,
Gerald

Re: [PATCH, rs6000] Fixing PR 67145

2016-02-26 Thread Richard Henderson

On 02/26/2016 01:03 PM, Segher Boessenkool wrote:
> On Thu, Feb 25, 2016 at 09:08:32PM -0800, Richard Henderson wrote:
>> +  /* Perform rematerialization if only all operands are registers and
>> + all operations are PLUS.  */
>> +  for (i = 0; i < n_ops; i++)
>> +if (ops[i].neg || !REG_P (ops[i].op))
>> +  return NULL_RTX;
>> +  goto gen_result;
>> +}
> 
> If you check for fixed registers as well here, does that work for you?

Maybe.  It prevents canonicalization of reg+fp vs fp+reg, which could well
occur via arithmetic on locally allocated arrays.

I guess for the purposes of stage4 I'd be willing to do

  if (ops[i].neg
  || !REG_P (ops[i].op)
  || (REGNO (ops[i].op) < FIRST_PSEUDO_REGISTER
  && fixed_regs[REGNO (ops[i].op)]
  && !global_regs[REGNO (ops[i].op)]
  && ops[i].op != frame_pointer_rtx
  && ops[i].op != arg_pointer_rtx
  && ops[i].op != stack_pointer_rtx))

It's pretty ugly though, and I wouldn't want to keep this forever.

The rs6000 change really ought to be evaluated at some point.  Given its scope,
I see little difference to doing that now vs putting it off to gcc7.

r~

Re: [PATCH, rs6000] Fixing PR 67145

2016-02-26 Thread Segher Boessenkool

On Fri, Feb 26, 2016 at 01:10:17PM -0800, Richard Henderson wrote:
> On 02/26/2016 01:01 PM, Segher Boessenkool wrote:
> >> How do you imagine the rs6000 change will regress codegen?
> > 
> > Combine of sequences with double-length adds.
> 
> What sort of test case are you imagining here?  The trivial tests I've looked
> at have all been optimal (before and after).

There are many different combinations, for example (32-bit code)

long long add_s42(long long a, int b)
{
return a + ((long long) b << 32) + 42;
}

and we generate optimal code on BE for all of those (not on LE yet,
and there is the issue with open-coded carry chains, and and and, and
things shift around.  Also all the scc things.  But I really should
submit some testcases).

Segher

Re: [PATCH] Fix up recent texinfo regression

2016-02-26 Thread Martin Sebor


On 02/26/2016 01:55 PM, Jakub Jelinek wrote:

On Fri, Feb 26, 2016 at 01:09:57PM -0700, Martin Sebor wrote:

On 02/26/2016 12:51 PM, Jakub Jelinek wrote:

I've noticed today:
../../gcc/doc/extend.texi:10717: warning: `.' or `,' must follow @xref, not A
../../gcc/doc/extend.texi:10764: warning: `.' or `,' must follow @xref, not A

After reading info texinfo on @xref and @pxref, I believe it is invalid
to use @xref this way, in the middle of a sentence, and indeed it looks
quite weird what is produced.  So the following patch attempts to fix it up
by using @xref in a separate sentence.  Tested on x86_64-linux, ok for
trunk?


Thanks for fixing it up.  I noticed the warning yesterday and have
been meaning to look into it.

Your change seems fine to me though I wonder if using a plain @ref{}
instead would do the right thing.  It's used in a bunch of places to
refer to the Common Function Attributes section (for example).  That
way the referenced text wouldn't need to be repeated.


It would read as
See C99 Section x.y [Variable Length], page NN Arrays offer
in the pdf
and
See C99 *note Variable Length:: Arrays offer
if I read the texinfo doc right.  Both look just too weird to me.


I agree, that wouldn't look right.  What you have should look
better.

FWIW, more out of curiosity of how it works than anything else
I pulled up the PDF manual to see what some uses of @ref{} get
rendered as.  It doesn't look like the "Section X.Y" part is
there, contrary to my reading of the Texinfo manual.  For
example, the reference below:

  See also the @ref{AVR Named Address Spaces} section for
  an alternate way to locate and access data in flash memory.

looks like this in the GCC 5.3.0 PDF manual:

  See also the [AVR Named Address Spaces], page 382 section for
  an alternate way...

It doesn't look quite right either but the "Section X.Y" text
isn't here (the text in brackets is the link to the section).

There is a way to replace the name of the referenced section
with arbitrary text.  I found an example of it here:

  Notice that attribute @ref{AVR Variable Attributes,,@code{progmem}}
  locates data in flash

which renders as

  Notice that attribute [progmem], page 434 locates data in
  flash

So presumably, to get what I was going for, the reference would
have to be written like so:

  Since C99 @ref{Variable Length,,Variable Length Arrays} offer
  similar functionality

although it would still have the "page 123" text appended to it.
I don't see how to avoid that.  It's probably best to stick to
only basic uses in sentences of their own or in parenthetical
notes.

Martin

Re: [PATCH][AArch64] PR target/69613: Return zero TARGET_SHIFT_TRUNCATION_MASK when SHIFT_COUNT_TRUNCATED is false

2016-02-26 Thread H.J. Lu

On Fri, Feb 26, 2016 at 7:50 AM, James Greenhalgh
 wrote:
> On Thu, Feb 25, 2016 at 09:25:45AM +, Kyrill Tkachov wrote:
>> Hi all,
>>
>> In this wrong-code PR we get bad code when synthesising a TImode right shift
>> by variable amount using DImode shifts during expand.
>>
>> The expand_double_word_shift function expands two paths: one where the
>> variable amount is greater than GET_MODE_BITSIZE (DImode) (word_mode for 
>> aarch64) at runtime
>> and one where it's less than that, and performs a conditional select between 
>> the two in the end.
>>
>> The second path is the one that goes wrong here.
>> The algorithm it uses for calculating a TImode shift when the variable shift 
>> amount is <64 is:
>>
>>
>>   --DImode-   --DImode- --DImode-   --DImode-
>> { |__ target-hi ___| |___ target-lo ___| } = { |___ source-hi ___| |___ 
>> source-lo ___| } >> var_amnt
>>
>> 1) carry = source-hi << 1
>> 2) tmp = ~var_amnt // either that or BITS_PER_WORD - 1 - var_amnt depending 
>> on shift_truncation_mask
>> 3) carry = carry << tmp // net effect is that carry is source-hi shifted 
>> left by BITS_PER_WORD - var_amnt
>> 4) target-lo = source-lo >>u var_amnt //unsigned shift.
>> 5) target-lo = target-lo | carry
>> 6) target-hi = source-hi >> var_amnt
>>
>> where the two DImode words source-hi and source-lo are the two words of the
>> source TImode value and var_amnt is the register holding the variable shift
>> amount. This is performed by the expand_subword_shift function in optabs.c.
>>
>> Step 2) is valid only if the target truncates the shift amount by the width
>> of the type its shifting, that is SHIFT_COUNT_TRUNCATED is true and
>> TARGET_SHIFT_TRUNCATION_MASK is 63 in this case.
>>
>> Step 3) is the one that goes wrong.  On aarch64 a left shift is usually
>> implemented using the LSL instruction but we also have alternatives that can
>> use the SIMD registers and the USHL instruction.  In this case we end up
>> using the USHL instruction. However, although the USHL instruction does
>> a DImode shift, it doesn't truncate the shift amount by 64, but rather by
>> 255. Furthermore, the shift amount is interpreted as a 2's complement signed
>> integer and if it's negative performs a right shift.  This is in contrast
>> with the normal LSL instruction which just performs an unsigned shift by an
>> amount truncated by 64.
>>
>> Now, on aarch64 SHIFT_COUNT_TRUNCATED is defined as !TARGET_SIMD, so we don't
>> assume shift amounts are truncated unless we know we can only ever use the
>> LSL instructions for variable shifts.
>>
>> However, we still return 63 as the TARGET_SHIFT_TRUNCATION_MASK for DImode,
>> so expand_subword_shift assumes that since the mask is non-zero it's a valid
>> shift truncation mask.  The documentation for TARGET_SHIFT_TRUNCATION_MASK
>> says:
>> " The default implementation of this function returns
>>   @code{GET_MODE_BITSIZE (@var{mode}) - 1} if @code{SHIFT_COUNT_TRUNCATED}
>>   and 0 otherwise."
>>
>> So since for TARGET_SIMD we cannot guarantee that all shifts truncate their
>> amount, we should be returning 0 in TARGET_SHIFT_TRUNCATION_MASK when
>> SHIFT_COUNT_TRUNCATED is false.  This is what the patch does, and with it
>> step 2) becomes:
>> 2) tmp = BITS_PER_WORD - 1 - var_amnt
>> which behaves correctly on aarch64.
>>
>> Unfortunately, the testcase from the PR has very recently gone latent on
>> trunk because it depends on register allocation picking a particular
>> alternative from the *aarch64_ashl_sisd_or_int_3 pattern in aarch64.md.
>> I tried to do some inline asm tricks to get it to force the correct
>> constraints but was unsuccessful. Nevertheless I've included the testcase in
>> the patch. I suppose it's better than nothing.
>
> Thanks for the great write-up. This level of detail is very helpful for
> review.
>
> OK for trunk.
>
> Thanks,
> James
>
>>
>> Bootstrapped and tested on aarch64.
>>
>> Ok for trunk?
>>
>> Thanks,
>> Kyrill
>>
>>
>> 2016-02-25  Kyrylo Tkachov  
>>
>> PR target/69613
>> * config/aarch64/aarch64.c (aarch64_shift_truncation_mask):
>> Return 0 if !SHIFT_COUNT_TRUNCATED
>>
>> 2016-02-25  Kyrylo Tkachov  
>>
>> PR target/69613
>> * gcc.dg/torture/pr69613.c: New test.
>
>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>> index 
>> d0d15b4feee70a5ca6af8dd16c7667cbcd844dbf..2e69e345545e591d1de76c2d175aac476e6e1107
>>  100644
>> --- a/gcc/config/aarch64/aarch64.c
>> +++ b/gcc/config/aarch64/aarch64.c
>> @@ -11171,7 +11171,8 @@ static unsigned HOST_WIDE_INT
>>  aarch64_shift_truncation_mask (machine_mode mode)
>>  {
>>return
>> -(aarch64_vector_mode_supported_p (mode)
>> +(!SHIFT_COUNT_TRUNCATED
>> + || aarch64_vector_mode_supported_p (mode)
>>   || aarch64_vect_struct_mode_p (mode)) ? 0 : (GET_MODE_BITSIZE (mode) - 
>> 1);
>>  }
>>
>> diff --git a/gcc/testsuite/gcc.dg/torture/pr69613.c 
>> b/gcc/testsuite/gcc.dg/torture/pr69613.c
>> new file mode 100644
>> ind

Re: [C PATCH] Fix C error-recovery (PR c/69796, PR c/69974)

2016-02-26 Thread Jeff Law


On 02/26/2016 07:43 AM, Jakub Jelinek wrote:

Hi!

Already PR69483 and these two further PRs show that it is really a bad idea
to set TREE_TYPE of decls with incomplete types to error_mark_node, there
are lots of places in the middle-end that don't expect error_mark_nodes
appearing so late.

I've bootstrapped/regtested on x86_64-linux and i686-linux following patch
which just doesn't do anything with the type, we don't emit the error
multiple times, because if the FE emits errors, cgraphunit doesn't attempt
to assemble variables or compile functions (just gimplifies them) if
seen_error ().  Attached is then untested alternative, set the type to
something that should hopefully not cause ICEs nor further errors/warnings
during the error-recovery.

Ok for trunk, or shall I test the other patch instead?

2016-02-26  Jakub Jelinek  

PR c/69796
PR c/69974
* c-parser.c (c_parser_translation_unit): Don't change TREE_TYPE
of incomplete decls to error_mark_node.

* gcc.dg/pr69796.c: New test.
* gcc.dg/pr69974.c: New test.
This one leaves the type incomplete, right?  So ISTM it's somewhat more 
likely than the second to expose other errors later with code that 
doesn't expect the type to be incomplete (much like other code doesn't 
expect to find error_mark_node in here).


The second patch at least puts a real type in there.  I suspect that's 
less likely to cause problems downstream, except perhaps with diagnostics.


I could argue for either.  I almost asked for the latter to be tested, 
but the more I think about it, I don't like slamming in another type 
like that.


I'll conditionally approve -- if nobody objects in 72hrs, consider the 
first patch OK for the trunk.


jeff

Re: [PATCH, rs6000] Fixing PR 67145

2016-02-26 Thread Segher Boessenkool

On Fri, Feb 26, 2016 at 01:35:10PM -0800, Richard Henderson wrote:
> On 02/26/2016 01:03 PM, Segher Boessenkool wrote:
> > On Thu, Feb 25, 2016 at 09:08:32PM -0800, Richard Henderson wrote:
> >> +  /* Perform rematerialization if only all operands are registers and
> >> + all operations are PLUS.  */
> >> +  for (i = 0; i < n_ops; i++)
> >> +  if (ops[i].neg || !REG_P (ops[i].op))
> >> +return NULL_RTX;
> >> +  goto gen_result;
> >> +}
> > 
> > If you check for fixed registers as well here, does that work for you?
> 
> Maybe.  It prevents canonicalization of reg+fp vs fp+reg, which could well
> occur via arithmetic on locally allocated arrays.

Where are these canonicalization rules described?

> I guess for the purposes of stage4 I'd be willing to do
> 
>   if (ops[i].neg
>   || !REG_P (ops[i].op)
>   || (REGNO (ops[i].op) < FIRST_PSEUDO_REGISTER
>   && fixed_regs[REGNO (ops[i].op)]
>   && !global_regs[REGNO (ops[i].op)]
>   && ops[i].op != frame_pointer_rtx
>   && ops[i].op != arg_pointer_rtx
>   && ops[i].op != stack_pointer_rtx))
> 
> It's pretty ugly though, and I wouldn't want to keep this forever.

Yeah.

> The rs6000 change really ought to be evaluated at some point.  Given its 
> scope,
> I see little difference to doing that now vs putting it off to gcc7.

It is stage 4.  This rs6000 change has almost 100% chance of introducing
regressions.


Segher

1 2 >

1 - 100 of 109 matches

Mail list logo