Re: [PATCH] Fix PR91790

2019-09-20 Thread Richard Biener
On Thu, 19 Sep 2019, Bill Schmidt wrote:

> 
> On 9/19/19 1:34 PM, Segher Boessenkool wrote:
> > Hi!
> >
> > On Tue, Sep 17, 2019 at 09:45:54AM +0200, Richard Biener wrote:
> >> The following fixes an old vectorizer issue with realignment support
> >> (thus only powerpc is affected) and BB vectorization.  The realignment
> >> token is set up from the wrong data-ref which causes an SSA verification
> >> failure but in other circumstances might simply generate wrong code.
> >>
> >> Bootstrap running on x86_64-unknown-linux-gnu, I'll install this
> >> as obvious on trunk.
> >>
> >> PPC folks - you know best how to appropriately test a target
> >> where we use the re-alignment optimization.  IIRC on later
> >> powerpc hardware this isn't exercised at all since we can use
> >> unaligned accesses.
> >>
> >> The issue is at least present on the GCC 9 branch as well but I'd
> >> appreciate testing where it exercises the path before considering
> >> a backport.
> > Is there a testcase?
> 
> 
> Richard, can you turn the PR's reported test into a torture test case?  We
> post P7 big-endian results frequently to gcc-testresults, and this bug hasn't
> fired on anything there, so it's not covered by existing tests.  Nothing has
> turned up on the testers since your patch went in, so having the new test
> added should be sufficient, I'd think.  P7 or older running big-endian is
> what's needed to test realignment support.

I was hoping you guys could take a stab at that, eventually also creating
a wrong-code runnable testcase (one that would alignment-fault at runtime
for using the wrong alignment token).  The testcase in the PR is
unwieldly, aka C++, and it likely requires machine specific options
to more reliably trigger - IIRC I needed -mcpu=e300c3 or so thus
apply subtarget specific tuning.

OTOH the patch was so obvious...

Richard.

> Thanks,
> 
> Bill
> 
> >
> > You can use -malign-natural to get stricter alignment requirements,
> > that might help.
> >
> > Cc:ing Bill, this is vectorizer :-)
> >
> >
> > Segher
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 247165 (AG München)

Re: [PATCH] Come up with debug counter for store-merging.

2019-09-20 Thread Martin Liška
On 9/19/19 4:57 PM, Bernhard Reutner-Fischer wrote:
> On Wed, 18 Sep 2019 11:01:59 +0200
> Richard Biener  wrote:
> 
>> On Wed, Sep 18, 2019 at 9:22 AM Martin Liška  wrote:
>>>
>>> Hi.
>>>
>>> After I spent quite some time with PR91758, I would like
>>> to see a debug counter in store merging for the next time.
>>>
>>> Ready to be installed?  
>> OK.
> 
>> @@ -4195,7 +4196,8 @@ imm_store_chain_info::output_merged_stores ()
>>bool ret = false;
>>FOR_EACH_VEC_ELT (m_merged_store_groups, i, merged_store)
>>  {
>> -  if (output_merged_store (merged_store))
>> +  if (dbg_cnt (store_merging)
>> +  && output_merged_store (merged_store))
> 
> Doesn't this over-count actual merged stores? You really count the number of 
> attempted store merges, not the actual successful stores merged. As i would 
> expect to count the latter, i would have expected ..
> 
>>  {
>>unsigned int j;
>>store_immediate_info *store;
> 
> dbg_cnt (store_merging) here, i.e. after the guard that determines if
> the merge is benefical, no?

Hi.

You are right. It's more about attempts rather than number of transformations.
However, the function imm_store_chain_info::output_merged_store emits GIMPLE 
statements
and I was unable to find a single place where to put the debug counter.

Martin

> 
> thanks,
> 



[PATCH] Reduction of conditional operations for vectorization

2019-09-20 Thread Yuliang Wang
Hi,

ifcvt transforms the following conditional operation reduction pattern:

  if ( condition )
a = a OP b;
  else
a = a OP c;

Into:

  a_1 = a OP b;
  a_2 = a OP c;
  a = condition ? a_1 : a_2;

Where OP is one of { plus minus mult min max and ior eor }.

This patch further optimizes the above to:

  a_0 = condition ? b : c;
  a = a OP a_0;

Which enables vectorization on AArch64.
Also supported are permutations of the above operand ordering subject to 
commutativity of OP.

Added new tests. Built and regression tested on aarch64-none-elf and 
aarch64-linux-gnu.

Best Regards,
Yuliang Wang


gcc/ChangeLog:

2019-09-19  Yuliang Wang  

* match.pd (for cnd (cond vec_cond)): New match statements for the
above patterns.
* doc/sourcebuild.texi (vect_condred_si): Document new target selector.

gcc/testsuite/ChangeLog:

2019-09-19  Yuliang Wang  

* gcc.target/aarch64/sve2/condred_1.c: New test.
* gcc.dg/vect/vect-condred-1.c: As above.
* gcc.dg/vect/vect-condred-2.c: As above.
* gcc.dg/vect/vect-condred-3.c: As above.
* gcc.dg/vect/vect-condred-4.c: As above.
* gcc.dg/vect/vect-condred-5.c: As above.
* gcc.dg/vect/vect-condred-6.c: As above.
* gcc.dg/vect/vect-condred-7.c: As above.
* gcc.dg/vect/vect-condred-8.c: As above.
* lib/target-supports.exp (check_effective_target_vect_condred_si):
Return true for AArch64 without SVE.


rb11852.patch
Description: rb11852.patch


Re: [PATCH][x86] Fix PR91814

2019-09-20 Thread Uros Bizjak
On Thu, Sep 19, 2019 at 5:43 PM Uros Bizjak  wrote:
>
> On Thu, Sep 19, 2019 at 5:30 PM Richard Biener  wrote:
> >
> >
> > Boostrapped and tested on x86_64-unknown-linux-gnu.
> >
> > OK?
>
> OK.

Hm, something is not working correctly here. For the testcase, I get:

main:
vmovq   %xmm0, %xmm0
vpxor   %xmm1, %xmm1, %xmm1
vpsubq  %xmm0, %xmm1, %xmm1
vpmaxsq %xmm1, %xmm0, %xmm0
vmovq   %xmm0, %rax
movabsq %rax, .LC0+11532131096
xorl%eax, %eax
ret

The first insn uses uninitialized reg.

The _.stv pass misses initialization of r94 reg:

(note 2 3 7 2 NOTE_INSN_FUNCTION_BEG)
(note 7 2 24 2 NOTE_INSN_DELETED)
(insn 24 7 8 2 (set (subreg:V2DI (reg:DI 93) 0)
(vec_concat:V2DI (reg:DI 94)
(const_int 0 [0]))) "pr67271.c":11:17 -1
 (nil))

Uros.

> Thanks,
> Uros.
>
> > Thanks,
> > Richard.
> >
> > 2019-09-19  Richard Biener  
> >
> > PR target/91814
> > * config/i386/i386-features.c (gen_gpr_to_xmm_move_src):
> > Force operand to a register if it isn't nonimmediate_operand.
> >
> > Index: gcc/config/i386/i386-features.c
> > ===
> > --- gcc/config/i386/i386-features.c (revision 275959)
> > +++ gcc/config/i386/i386-features.c (working copy)
> > @@ -668,10 +668,13 @@ scalar_chain::emit_conversion_insns (rtx
> >  static rtx
> >  gen_gpr_to_xmm_move_src (enum machine_mode vmode, rtx gpr)
> >  {
> > +  if (!nonimmediate_operand (gpr, GET_MODE_INNER (vmode)))
> > +gpr = force_reg (GET_MODE_INNER (vmode), gpr);
> >switch (GET_MODE_NUNITS (vmode))
> >  {
> >  case 1:
> > -  return gen_rtx_SUBREG (vmode, gpr, 0);
> > +  /* We are not using this case currently.  */
> > +  gcc_unreachable ();
> >  case 2:
> >return gen_rtx_VEC_CONCAT (vmode, gpr,
> >  CONST0_RTX (GET_MODE_INNER (vmode)));


Re: [patch, testsuite, arm] Fix ICE in gcc.dg/gimplefe-28.c

2019-09-20 Thread Kyrill Tkachov



On 9/19/19 7:44 PM, Sandra Loosemore wrote:

On 9/19/19 2:40 AM, Kyrill Tkachov wrote:


Index: gcc/testsuite/lib/target-supports.exp
===
--- gcc/testsuite/lib/target-supports.exp    (revision 275699)
+++ gcc/testsuite/lib/target-supports.exp    (working copy)
@@ -6670,6 +6670,9 @@ proc add_options_for_sqrt_insn { flags }
  if { [istarget amdgcn*-*-*] } {
  return "$flags -ffast-math"
  }
+    if { [istarget arm*-*-*] } {
+    return "$flags -mfpu=vfp -mfloat-abi=softfp"
+    }
  return $flags
  }

I tend to think it's a bit cleaner to use one of the 
add_options_for_* helpers we have that know the right float-abi and 
mfpu options to pass.
Would using add_options_for_arm_fp or add_options_for_arm_vfp3 work 
here?


add_options_for_arm_fp only adds a -mfloat-abi option which isn't 
sufficient by itself to enable floating-point on the default multilib 
for arm-none-eabi.  It needs -mfpu= too.


My understanding is that all VFP fpus have support for sqrt so why 
require vfp3?


I could put the magic options in a new function called 
add_options_for_arm_vfp instead of directly in 
add_options_for_sqrt_insn, and fix it up to cope with -mfloat-abi=hard 
multilibs too.  Would that be an improvement?


Yeah, an add_options_for_arm_vfp is what we ideally need here.

Thanks,

Kyrill





-Sandra


Re: [PATCH][x86] Fix PR91814

2019-09-20 Thread Richard Biener
On Fri, 20 Sep 2019, Uros Bizjak wrote:

> On Thu, Sep 19, 2019 at 5:43 PM Uros Bizjak  wrote:
> >
> > On Thu, Sep 19, 2019 at 5:30 PM Richard Biener  wrote:
> > >
> > >
> > > Boostrapped and tested on x86_64-unknown-linux-gnu.
> > >
> > > OK?
> >
> > OK.
> 
> Hm, something is not working correctly here. For the testcase, I get:
> 
> main:
> vmovq   %xmm0, %xmm0
> vpxor   %xmm1, %xmm1, %xmm1
> vpsubq  %xmm0, %xmm1, %xmm1
> vpmaxsq %xmm1, %xmm0, %xmm0
> vmovq   %xmm0, %rax
> movabsq %rax, .LC0+11532131096
> xorl%eax, %eax
> ret
> 
> The first insn uses uninitialized reg.
> 
> The _.stv pass misses initialization of r94 reg:
> 
> (note 2 3 7 2 NOTE_INSN_FUNCTION_BEG)
> (note 7 2 24 2 NOTE_INSN_DELETED)
> (insn 24 7 8 2 (set (subreg:V2DI (reg:DI 93) 0)
> (vec_concat:V2DI (reg:DI 94)
> (const_int 0 [0]))) "pr67271.c":11:17 -1
>  (nil))

Hmm, it emits the instruction in the wrong spot for the caller

  emit_insn_before (gen_rtx_SET (gen_rtx_SUBREG (vmode, tmp, 0),
 gen_gpr_to_xmm_move_src (vmode, 
*op)),
insn);

we can do the following which I am testing now.

Richard.

2019-09-20  Richard Biener  

PR target/91814
* config/i386/i386-features.c (gen_gpr_to_xmm_move_src): Add
before parameter to indicate where to emit a needed move to.
(general_scalar_chain::convert_op): Use it.

Index: gcc/config/i386/i386-features.c
===
--- gcc/config/i386/i386-features.c (revision 275989)
+++ gcc/config/i386/i386-features.c (working copy)
@@ -666,18 +666,26 @@ scalar_chain::emit_conversion_insns (rtx
zeroing the upper parts.  */
 
 static rtx
-gen_gpr_to_xmm_move_src (enum machine_mode vmode, rtx gpr)
+gen_gpr_to_xmm_move_src (enum machine_mode vmode, rtx gpr,
+rtx_insn *before = NULL)
 {
-  if (!nonimmediate_operand (gpr, GET_MODE_INNER (vmode)))
-gpr = force_reg (GET_MODE_INNER (vmode), gpr);
+  enum machine_mode smode = GET_MODE_INNER (vmode);
+  if (!nonimmediate_operand (gpr, smode))
+{
+  rtx tem = gen_reg_rtx (smode);
+  if (before)
+   emit_insn_before (gen_rtx_SET (tem, gpr), before);
+  else
+   emit_move_insn (tem, gpr);
+  gpr = tem;
+}
   switch (GET_MODE_NUNITS (vmode))
 {
 case 1:
   /* We are not using this case currently.  */
   gcc_unreachable ();
 case 2:
-  return gen_rtx_VEC_CONCAT (vmode, gpr,
-CONST0_RTX (GET_MODE_INNER (vmode)));
+  return gen_rtx_VEC_CONCAT (vmode, gpr, CONST0_RTX (smode));
 default:
   return gen_rtx_VEC_MERGE (vmode, gen_rtx_VEC_DUPLICATE (vmode, gpr),
CONST0_RTX (vmode), GEN_INT (HOST_WIDE_INT_1U));
@@ -836,7 +844,8 @@ general_scalar_chain::convert_op (rtx *o
   rtx tmp = gen_reg_rtx (GET_MODE (*op));
 
   emit_insn_before (gen_rtx_SET (gen_rtx_SUBREG (vmode, tmp, 0),
-gen_gpr_to_xmm_move_src (vmode, *op)),
+gen_gpr_to_xmm_move_src (vmode, *op,
+ insn)),
insn);
   *op = gen_rtx_SUBREG (vmode, tmp, 0);
 


[PATCH] Fix PR91822

2019-09-20 Thread Richard Biener


The following restores the for_reduction parameter.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2019-09-20  Richard Biener  

PR tree-optimization/91822
* tree-vectorizer.h (vectorizable_condition): Restore for_reduction
parameter.
* tree-vect-loop.c (vectorizable_reduction): Adjust asserts
for reduc_index in nested cycles, adjust vectorizable_condition
calls.
* tree-vect-stmts.c (vectorizable_condition): Restore for_reduction
parameter.
(vect_analyze_stmt): Adjust.
(vect_transform_stmt): Likewise.

Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c(revision 275988)
+++ gcc/tree-vect-loop.c(working copy)
@@ -6534,9 +6534,10 @@ vectorizable_reduction (stmt_vec_info st
 {
   /* Only call during the analysis stage, otherwise we'll lose
 STMT_VINFO_TYPE.  */
-  gcc_assert (reduc_index > 0);
+  gcc_assert (nested_cycle || reduc_index > 0);
   if (!vec_stmt && !vectorizable_condition (stmt_info, gsi, NULL,
-   reduc_index, NULL, cost_vec))
+   true, reduc_index,
+   NULL, cost_vec))
 {
   if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -6991,7 +6992,7 @@ vectorizable_reduction (stmt_vec_info st
 {
   gcc_assert (!slp_node && reduc_index > 0);
   return vectorizable_condition (stmt_info, gsi, vec_stmt,
-reduc_index, NULL, NULL);
+true, reduc_index, NULL, NULL);
 }
 
   /* Create the destination vector  */
@@ -7021,8 +7022,8 @@ vectorizable_reduction (stmt_vec_info st
 {
   if (code == COND_EXPR)
 {
-  gcc_assert (!slp_node && reduc_index > 0);
- vectorizable_condition (stmt_info, gsi, vec_stmt,
+  gcc_assert (!slp_node && (nested_cycle || reduc_index > 0));
+ vectorizable_condition (stmt_info, gsi, vec_stmt, true,
  reduc_index, NULL, NULL);
   break;
 }
Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c   (revision 275988)
+++ gcc/tree-vect-stmts.c   (working copy)
@@ -9778,7 +9778,8 @@ vect_is_simple_cond (tree cond, vec_info
 
 bool
 vectorizable_condition (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
-   stmt_vec_info *vec_stmt, int reduc_index,
+   stmt_vec_info *vec_stmt, bool for_reduction,
+   int reduc_index,
slp_tree slp_node, stmt_vector_for_cost *cost_vec)
 {
   vec_info *vinfo = stmt_info->vinfo;
@@ -9807,7 +9808,6 @@ vectorizable_condition (stmt_vec_info st
   vec vec_oprnds3 = vNULL;
   tree vec_cmp_type;
   bool masked = false;
-  bool for_reduction = (reduc_index > 0);
 
   if (for_reduction && STMT_SLP_TYPE (stmt_info))
 return false;
@@ -10668,7 +10668,7 @@ vect_analyze_stmt (stmt_vec_info stmt_in
 node_instance, cost_vec)
  || vectorizable_induction (stmt_info, NULL, NULL, node, cost_vec)
  || vectorizable_shift (stmt_info, NULL, NULL, node, cost_vec)
- || vectorizable_condition (stmt_info, NULL, NULL, 0, node,
+ || vectorizable_condition (stmt_info, NULL, NULL, false, -1, node,
 cost_vec)
  || vectorizable_comparison (stmt_info, NULL, NULL, node,
  cost_vec));
@@ -10687,7 +10687,7 @@ vect_analyze_stmt (stmt_vec_info stmt_in
  || vectorizable_load (stmt_info, NULL, NULL, node, node_instance,
cost_vec)
  || vectorizable_store (stmt_info, NULL, NULL, node, cost_vec)
- || vectorizable_condition (stmt_info, NULL, NULL, 0, node,
+ || vectorizable_condition (stmt_info, NULL, NULL, false, -1, node,
 cost_vec)
  || vectorizable_comparison (stmt_info, NULL, NULL, node,
  cost_vec));
@@ -10792,7 +10792,7 @@ vect_transform_stmt (stmt_vec_info stmt_
   break;
 
 case condition_vec_info_type:
-  done = vectorizable_condition (stmt_info, gsi, &vec_stmt, 0,
+  done = vectorizable_condition (stmt_info, gsi, &vec_stmt, false, -1,
 slp_node, NULL);
   gcc_assert (done);
   break;
Index: gcc/tree-vectorizer.h
===
--- gcc/tree-vectorizer.h   (revision 275988)
+++ gcc/tree-vectorizer.h   (working copy)
@@ -1534,7 +1534,7 @@ extern void vect_remove_stores (stmt_vec
 extern opt_result vect_analyze_stmt (

Re: [PATCH][x86] Fix PR91814

2019-09-20 Thread Uros Bizjak
On Fri, Sep 20, 2019 at 10:32 AM Richard Biener  wrote:
>
> On Fri, 20 Sep 2019, Uros Bizjak wrote:
>
> > On Thu, Sep 19, 2019 at 5:43 PM Uros Bizjak  wrote:
> > >
> > > On Thu, Sep 19, 2019 at 5:30 PM Richard Biener  wrote:
> > > >
> > > >
> > > > Boostrapped and tested on x86_64-unknown-linux-gnu.
> > > >
> > > > OK?
> > >
> > > OK.
> >
> > Hm, something is not working correctly here. For the testcase, I get:
> >
> > main:
> > vmovq   %xmm0, %xmm0
> > vpxor   %xmm1, %xmm1, %xmm1
> > vpsubq  %xmm0, %xmm1, %xmm1
> > vpmaxsq %xmm1, %xmm0, %xmm0
> > vmovq   %xmm0, %rax
> > movabsq %rax, .LC0+11532131096
> > xorl%eax, %eax
> > ret
> >
> > The first insn uses uninitialized reg.
> >
> > The _.stv pass misses initialization of r94 reg:
> >
> > (note 2 3 7 2 NOTE_INSN_FUNCTION_BEG)
> > (note 7 2 24 2 NOTE_INSN_DELETED)
> > (insn 24 7 8 2 (set (subreg:V2DI (reg:DI 93) 0)
> > (vec_concat:V2DI (reg:DI 94)
> > (const_int 0 [0]))) "pr67271.c":11:17 -1
> >  (nil))
>
> Hmm, it emits the instruction in the wrong spot for the caller
>
>   emit_insn_before (gen_rtx_SET (gen_rtx_SUBREG (vmode, tmp, 0),
>  gen_gpr_to_xmm_move_src (vmode,
> *op)),
> insn);
>
> we can do the following which I am testing now.

How about the attached patch? The only place we process memory operand
is from convert_op (see the function comment), and by using
gen_rtx_SET directly, we can still generate MOVABS, which is otherwise
split by using emit_move_insn.

Can you please test the attached patch instead?

Thanks,
Uros.
diff --git a/gcc/config/i386/i386-features.c b/gcc/config/i386/i386-features.c
index 546d78d99b53..9b297bac1910 100644
--- a/gcc/config/i386/i386-features.c
+++ b/gcc/config/i386/i386-features.c
@@ -668,8 +668,6 @@ scalar_chain::emit_conversion_insns (rtx insns, rtx_insn 
*after)
 static rtx
 gen_gpr_to_xmm_move_src (enum machine_mode vmode, rtx gpr)
 {
-  if (!nonimmediate_operand (gpr, GET_MODE_INNER (vmode)))
-gpr = force_reg (GET_MODE_INNER (vmode), gpr);
   switch (GET_MODE_NUNITS (vmode))
 {
 case 1:
@@ -835,6 +833,15 @@ general_scalar_chain::convert_op (rtx *op, rtx_insn *insn)
 {
   rtx tmp = gen_reg_rtx (GET_MODE (*op));
 
+  /* Handle movabs.  */
+  if (!memory_operand (*op, GET_MODE (*op)))
+   {
+ rtx tmp2 = gen_reg_rtx (GET_MODE (*op));
+
+ emit_insn_before (gen_rtx_SET (tmp2, *op), insn);
+ *op = tmp2;
+   }
+
   emit_insn_before (gen_rtx_SET (gen_rtx_SUBREG (vmode, tmp, 0),
 gen_gpr_to_xmm_move_src (vmode, *op)),
insn);


Fix PR c/91815

2019-09-20 Thread Eric Botcazou
This gets rid of a bogus error issued by the C compiler on a type definition 
at file scope when there is a preceding external declaration in a local scope 
for the same symbol (but the error is still issued if the preceding external 
declaration is also at file scope).

Bootstrapped/regtested on x86-64/Linux, approved by Joseph in the audit trail 
and applied on the mainline.


2019-09-20  Eric Botcazou  

PR c/91815
* c-decl.c (pushdecl): In C detect duplicate declarations across scopes
of identifiers in the external scope only for variables and functions.


2019-09-20  Eric Botcazou  

* gcc.dg/typedef-var-1.c: New test.
* gcc.dg/typedef-var-2.c: Likewise.

-- 
Eric BotcazouIndex: c-decl.c
===
--- c-decl.c	(revision 275988)
+++ c-decl.c	(working copy)
@@ -3130,8 +3130,11 @@ pushdecl (tree x)
  detecting duplicate declarations of the same object, no matter
  what scope they are in; this is what we do here.  (C99 6.2.7p2:
  All declarations that refer to the same object or function shall
- have compatible type; otherwise, the behavior is undefined.)  */
-  if (DECL_EXTERNAL (x) || scope == file_scope)
+ have compatible type; otherwise, the behavior is undefined.)
+ However, in Objective-C, we also want to detect declarations
+ conflicting with those of the basic types.  */
+  if ((DECL_EXTERNAL (x) || scope == file_scope)
+  && (VAR_OR_FUNCTION_DECL_P (x) || c_dialect_objc ()))
 {
   tree type = TREE_TYPE (x);
   tree vistype = NULL_TREE;
/* PR c/91815 */
/* { dg-do compile } */

int f (void)
{
  extern int t;
  extern float v;   

  return (v > 0.0f);
}

typedef float t;

t v = 4.5f;
/* PR c/91815 */
/* { dg-do compile } */

int f (void)
{
  extern float v;   

  return (v > 0.0f);
}

extern int t;

typedef float t; /* { dg-error "redeclared as different kind of symbol" } */

t v = 4.5f;


Re: PR88751: Backport to GCC 8 and 9 branches?

2019-09-20 Thread Andreas Krebbel
On 06.09.19 12:48, Richard Biener wrote:
> On Fri, Sep 6, 2019 at 10:11 AM Andreas Krebbel  wrote:
>>
>> Hi,
>>
>> since this caused a critical performance regression in the OpenJ9 byte code 
>> interpreter after
>> migrating from GCC 4.8 to GCC 7 I would like to backport this patch also to 
>> GCC 8 and 9 branch.
>>
>> Ok - after bootstrap and regression test went fine?
> 
> Looks reasonable to me.  But what about GCC 7?  I assume you also verified the
> actual performance regression is gone.

I've committed the patch to GCC 7 and 8 branch after verifying that the change 
has the desired
effect on the source code file from OpenJ9.

GCC 9 branch is currently frozen. Ok, to apply there as well?

Andreas

> 
> Richard.
> 
>>
>> Andreas
>>
>>
>> commit d3dc20418aad41af83fe45ccba527deb0b334983
>> Author: krebbel 
>> Date:   Thu Jun 6 11:35:04 2019 +
>>
>> Fix PR88751
>>
>> This patch implements a small improvement for the heuristic in lra
>> which decides when it has to activate the simpler register allocation
>> algorithm.
>>
>> gcc/ChangeLog:
>>
>> 2019-06-06  Andreas Krebbel  
>>
>> PR rtl-optimization/88751
>> * ira.c (ira): Use the number of the actually referenced 
>> registers
>> when calculating the threshold.
>>
>>
>>
>> git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@271996 
>> 138bc75d-0d04-0410-961f-82ee72b054a4
>>
>>
>> diff --git a/gcc/ira.c b/gcc/ira.c
>> index 4a14fb31583..725636d8dc5 100644
>> --- a/gcc/ira.c
>> +++ b/gcc/ira.c
>> @@ -5198,6 +5198,8 @@ ira (FILE *f)
>>int ira_max_point_before_emit;
>>bool saved_flag_caller_saves = flag_caller_saves;
>>enum ira_region saved_flag_ira_region = flag_ira_region;
>> +  unsigned int i;
>> +  int num_used_regs = 0;
>>
>>clear_bb_flags ();
>>
>> @@ -5213,12 +5215,17 @@ ira (FILE *f)
>>
>>ira_conflicts_p = optimize > 0;
>>
>> +  /* Determine the number of pseudos actually requiring coloring.  */
>> +  for (i = FIRST_PSEUDO_REGISTER; i < DF_REG_SIZE (df); i++)
>> +num_used_regs += !!(DF_REG_USE_COUNT (i) + DF_REG_DEF_COUNT (i));
>> +
>>/* If there are too many pseudos and/or basic blocks (e.g. 10K
>>   pseudos and 10K blocks or 100K pseudos and 1K blocks), we will
>>   use simplified and faster algorithms in LRA.  */
>>lra_simple_p
>>  = (ira_use_lra_p
>> -   && max_reg_num () >= (1 << 26) / last_basic_block_for_fn (cfun));
>> +   && num_used_regs >= (1 << 26) / last_basic_block_for_fn (cfun));
>> +
>>if (lra_simple_p)
>>  {
>>/* It permits to skip live range splitting in LRA.  */
>>



[SPARC] Fix PR target/91269

2019-09-20 Thread Eric Botcazou
This is a regression present on the mainline and 9 branch: the compiler (LRA) 
generates an instruction using unaligned double FP registers.  This has been a 
known weakness since the switch to LRA because there is no longer an explicit 
constraint for aligned double FP registers.

Bootstrapped/regtested on SPARC/Solaris and SPARC64/Linux, applied on the 
mainline and 9 branch.


2019-09-20  Eric Botcazou  

PR target/91269
* config/sparc/sparc.h (HARD_REGNO_CALLER_SAVE_MODE): Define.


2019-09-20  Eric Botcazou  

* gcc.dg/pr91269.c: New test.

-- 
Eric BotcazouIndex: config/sparc/sparc.h
===
--- config/sparc/sparc.h	(revision 275746)
+++ config/sparc/sparc.h	(working copy)
@@ -711,6 +711,13 @@ along with GCC; see the file COPYING3.
register window instruction in the prologue.  */
 #define HARD_REGNO_RENAME_OK(FROM, TO) ((FROM) != 1)
 
+/* Select a register mode required for caller save of hard regno REGNO.
+   Contrary to what is documented, the default is not the smallest suitable
+   mode but the largest suitable mode for the given (REGNO, NREGS) pair and
+   it quickly creates paradoxical subregs that can be problematic.  */
+#define HARD_REGNO_CALLER_SAVE_MODE(REGNO, NREGS, MODE) \
+  ((MODE) == VOIDmode ? choose_hard_reg_mode (REGNO, NREGS, false) : (MODE))
+
 /* Specify the registers used for certain standard purposes.
The values of these macros are register numbers.  */
 
/* PR target/91269 */
/* Testcase by Sergei Trofimovich  */

/* { dg-do assemble } */
/* { dg-options "-O2 -Wno-int-conversion" }  */
/* { dg-additional-options "-fcall-used-g6 -fPIE -mcpu=niagara4" { target sparc*-*-* } } */

struct m;

enum { a = 2 };
int b[1];
int d[2715];
int e, f, h;
enum { i = 2 } j;
inline int c(int k) {
  char *cp;
  if (k >= 62 && k <= 247)
cp = b[k];
  if (cp)
return 65533;
  return 2;
}
inline int g(int k) {
  if (k < sizeof(d))
return e;
  return 0;
}

int u(struct m*, char*, char*);

int l(struct m *k, char n, long o, int *p) {
  int q, flags = j, r, s, lasttwo = *p;
  char inptr, outptr;
  while (inptr) {
if (__builtin_expect(h, 0))
  break;
unsigned ch = inptr;
if (lasttwo) {
  long need = lasttwo >> 3;
  if (__builtin_expect(need > n, 0))
break;
} else if (s == i) {
  long t = c(ch);
  if (t != 65533) {
int jch = g(ch);
if (jch & 8)
  continue;
  }
}
if (ch <= 5)
  ;
else {
  long t = c(ch);
  if (t != 65533)
;
  else {
switch (f >> 8)
case 79:
  q = f == 20308 || f == 20350;
if (q)
  if (j)
r = u(k, &inptr, &outptr);
s = *p;
if (r)
  if (o && flags & a)
break;
  }
}
  }
}


[PATCH] Fix PR91821

2019-09-20 Thread Richard Biener


The following makes sure pattern detection also works with
"non-canonical" order of operands in reduction stmts.

Bootstrapped and tested on x86_64-unknown-linux-gnu, verified it
fixes the powerpc FAIL, applied.

Richard.

2019-09-20  Richard Biener  

PR tree-optimization/91821
* tree-vect-loop.c (check_reduction_path): Check we can compute
reduc_idx.
(vect_is_simple_reduction): Set STMT_VINFO_REDUC_IDX.
* tree-vect-patterns.c (vect_reassociating_reduction_p): Return
operands in canonical order.
* tree-vectorizer.c (vec_info::new_stmt_vec_info): Initialize
STMT_VINFO_REDUC_IDX.
* tree-vectorizer.h (_stmt_vec_info::reduc_idx): New.
(STMT_VINFO_REDUC_IDX): Likewise.

Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c(revision 275988)
+++ gcc/tree-vect-loop.c(working copy)
@@ -2658,7 +2658,13 @@ pop:
   gimple *use_stmt = USE_STMT (path[i].second);
   tree op = USE_FROM_PTR (path[i].second);
   if (! has_single_use (op)
- || ! is_gimple_assign (use_stmt))
+ || ! is_gimple_assign (use_stmt)
+ /* The following make sure we can compute the operand index
+easily plus it mostly disallows chaining via COND_EXPR condition
+operands.  */
+ || (gimple_assign_rhs1 (use_stmt) != op
+ && gimple_assign_rhs2 (use_stmt) != op
+ && gimple_assign_rhs3 (use_stmt) != op))
{
  fail = true;
  break;
@@ -3058,6 +3064,7 @@ vect_is_simple_reduction (loop_vec_info
  || !flow_bb_inside_loop_p (loop, gimple_bb (def1_info->stmt))
  || vect_valid_reduction_input_p (def1_info)))
 {
+  STMT_VINFO_REDUC_IDX (def_stmt_info) = 1;
   if (dump_enabled_p ())
report_vect_op (MSG_NOTE, def_stmt, "detected reduction: ");
   return def_stmt_info;
@@ -3070,6 +3077,7 @@ vect_is_simple_reduction (loop_vec_info
  || !flow_bb_inside_loop_p (loop, gimple_bb (def2_info->stmt))
  || vect_valid_reduction_input_p (def2_info)))
 {
+  STMT_VINFO_REDUC_IDX (def_stmt_info) = 0;
   if (dump_enabled_p ())
report_vect_op (MSG_NOTE, def_stmt, "detected reduction: ");
   return def_stmt_info;
@@ -3084,16 +3092,18 @@ vect_is_simple_reduction (loop_vec_info
  restriction is that all operations in the chain are the same.  */
   auto_vec reduc_chain;
   unsigned i;
+  bool is_slp_reduc = !nested_in_vect_loop && code != COND_EXPR;
   for (i = path.length () - 1; i >= 1; --i)
{
  gimple *stmt = USE_STMT (path[i].second);
  if (gimple_assign_rhs_code (stmt) != code)
-   break;
- reduc_chain.safe_push (loop_info->lookup_stmt (stmt));
+   is_slp_reduc = false;
+ stmt_vec_info stmt_info = loop_info->lookup_stmt (stmt);
+ STMT_VINFO_REDUC_IDX (stmt_info)
+   = path[i].second->use - gimple_assign_rhs1_ptr (stmt);
+ reduc_chain.safe_push (stmt_info);
}
-  if (i == 0
- && ! nested_in_vect_loop
- && code != COND_EXPR)
+  if (is_slp_reduc)
{
  for (unsigned i = 0; i < reduc_chain.length () - 1; ++i)
{
Index: gcc/tree-vect-patterns.c
===
--- gcc/tree-vect-patterns.c(revision 275988)
+++ gcc/tree-vect-patterns.c(working copy)
@@ -868,6 +868,8 @@ vect_reassociating_reduction_p (stmt_vec
 
   *op0_out = gimple_assign_rhs1 (assign);
   *op1_out = gimple_assign_rhs2 (assign);
+  if (commutative_tree_code (code) && STMT_VINFO_REDUC_IDX (stmt_info) == 0)
+std::swap (*op0_out, *op1_out);
   return true;
 }
 
Index: gcc/tree-vectorizer.c
===
--- gcc/tree-vectorizer.c   (revision 275988)
+++ gcc/tree-vectorizer.c   (working copy)
@@ -639,6 +639,7 @@ vec_info::new_stmt_vec_info (gimple *stm
   STMT_VINFO_VECTORIZABLE (res) = true;
   STMT_VINFO_VEC_REDUCTION_TYPE (res) = TREE_CODE_REDUCTION;
   STMT_VINFO_VEC_CONST_COND_REDUC_CODE (res) = ERROR_MARK;
+  STMT_VINFO_REDUC_IDX (res) = -1;
   STMT_VINFO_SLP_VECT_ONLY (res) = false;
 
   if (gimple_code (stmt) == GIMPLE_PHI
Index: gcc/tree-vectorizer.h
===
--- gcc/tree-vectorizer.h   (revision 275988)
+++ gcc/tree-vectorizer.h   (working copy)
@@ -941,6 +941,10 @@ public:
  vect_force_simple_reduction.  */
   enum vect_reduction_type reduc_type;
 
+  /* On a stmt participating in the reduction the index of the operand
+ on the reduction SSA cycle.  */
+  int reduc_idx;
+
   /* On a reduction PHI the def returned by vect_force_simple_reduction.
  On the def returned by vect_force_simple_reduction the
  corresponding PHI.  */
@@ -1030,6 +1034,7 @@ STMT_VINFO_BB_VINFO (stmt_vec_info stmt_
 #define STMT_

Re: [PATCH] Remove vectorizer reduction operand swapping

2019-09-20 Thread Rainer Orth
Hi Richard,

> On Thu, 19 Sep 2019, Richard Sandiford wrote:
>
>> Richard Biener  writes:
>> > It shouldn't be neccessary.
>> 
>> SVE is the counter-example :-)  But the fix is simpler than the code
>> you removed, so it's still a net win.
>
> Yeah, I meant it shouldn't be necessary to swap operands of the
> original scalar stmts since we keep track of the "order" via
> reduc_index.
>
>> Fixes various vect.exp and aarch64-sve.exp ICEs.  OK if it passes
>> proper testing?
>
> OK.

I suspect this patch caused

+FAIL: gcc.dg/pr88031.c (internal compiler error)
+FAIL: gcc.dg/pr88031.c (test for excess errors)

I'm seeing it on 32 and 64-bit Solaris/x86, but there are also several
reports on aarch64, armv8l, i686, powerpc64*, s390x, x86_64.

Excess errors:
during GIMPLE pass: vect
/vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/pr88031.c:6:6: internal 
compiler error: in vectorizable_reduction, at tree-vect-loop.c:6662
0x938ecdf vectorizable_reduction(_stmt_vec_info*, gimple_stmt_iterator*, 
_stmt_vec_info**, _slp_tree*, _slp_instance*, vec*)
/vol/gcc/src/hg/trunk/local/gcc/tree-vect-loop.c:6662
0x936ac2a vect_analyze_stmt(_stmt_vec_info*, bool*, _slp_tree*, _slp_instance*, 
vec*)
/vol/gcc/src/hg/trunk/local/gcc/tree-vect-stmts.c:10667
0x938afce vect_analyze_loop_operations
/vol/gcc/src/hg/trunk/local/gcc/tree-vect-loop.c:1580
0x938afce vect_analyze_loop_2
/vol/gcc/src/hg/trunk/local/gcc/tree-vect-loop.c:2026
0x938afce vect_analyze_loop(loop*, _loop_vec_info*, vec_info_shared*)
/vol/gcc/src/hg/trunk/local/gcc/tree-vect-loop.c:2329
0x93ab60e try_vectorize_loop_1
/vol/gcc/src/hg/trunk/local/gcc/tree-vectorizer.c:884
0x93abd6c try_vectorize_loop
/vol/gcc/src/hg/trunk/local/gcc/tree-vectorizer.c:1030
0x93ac2fb vectorize_loops()
/vol/gcc/src/hg/trunk/local/gcc/tree-vectorizer.c:1104

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH] Remove vectorizer reduction operand swapping

2019-09-20 Thread Richard Biener
On Fri, 20 Sep 2019, Rainer Orth wrote:

> Hi Richard,
> 
> > On Thu, 19 Sep 2019, Richard Sandiford wrote:
> >
> >> Richard Biener  writes:
> >> > It shouldn't be neccessary.
> >> 
> >> SVE is the counter-example :-)  But the fix is simpler than the code
> >> you removed, so it's still a net win.
> >
> > Yeah, I meant it shouldn't be necessary to swap operands of the
> > original scalar stmts since we keep track of the "order" via
> > reduc_index.
> >
> >> Fixes various vect.exp and aarch64-sve.exp ICEs.  OK if it passes
> >> proper testing?
> >
> > OK.
> 
> I suspect this patch caused
> 
> +FAIL: gcc.dg/pr88031.c (internal compiler error)
> +FAIL: gcc.dg/pr88031.c (test for excess errors)
> 
> I'm seeing it on 32 and 64-bit Solaris/x86, but there are also several
> reports on aarch64, armv8l, i686, powerpc64*, s390x, x86_64.

PR91822, should be fixed now.

Richard.


[COMMITTED][GCC9] Backport RISC-V: Fix more splitters accidentally calling gen_reg_rtx.

2019-09-20 Thread Kito Cheng
This patch fix PR target/91635, back port to gcc-9-branch from trunk,
committed :)
From 430008a798822a0b6894308c841e77a5adc920d5 Mon Sep 17 00:00:00 2001
From: kito 
Date: Fri, 20 Sep 2019 10:41:51 +
Subject: [PATCH] RISC-V: Fix more splitters accidentally calling gen_reg_rtx.

	PR target/91683
	* config/riscv/riscv-protos.h (riscv_split_symbol): New bool parameter.
	(riscv_move_integer): Likewise.
	* config/riscv/riscv.c (riscv_split_integer): Pass FALSE for new
	riscv_move_integer arg.
	(riscv_legitimize_move): Likewise.
	(riscv_force_temporary): New parameter in_splitter.  Don't call
	force_reg if true.
	(riscv_unspec_offset_high): Pass FALSE for new riscv_force_temporary
	arg.
	(riscv_add_offset): Likewise.
	(riscv_split_symbol): New parameter in_splitter.  Pass to
	riscv_force_temporary.
	(riscv_legitimize_address): Pass FALSE for new riscv_split_symbol
	arg.
	(riscv_move_integer): New parameter in_splitter.  New local
	can_create_psuedo.  Don't call riscv_split_integer or force_reg when
	in_splitter TRUE.
	(riscv_legitimize_const_move): Pass FALSE for new riscv_move_integer,
	riscv_split_symbol, and riscv_force_temporary args.
	* config/riscv/riscv.md (low+1): Pass TRUE for new
	riscv_move_integer arg.
	(low+2): Pass TRUE for new riscv_split_symbol arg.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gcc-9-branch@275997 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog   | 29 +
 gcc/config/riscv/riscv-protos.h |  4 +--
 gcc/config/riscv/riscv.c| 45 -
 gcc/config/riscv/riscv.md   |  6 ++---
 4 files changed, 62 insertions(+), 22 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 1d694a7f7c9..f1f923a06dd 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,32 @@
+2019-09-20  Kito Cheng  
+
+	Backport from mainline
+	2019-09-18  Jim Wilson  
+
+	PR target/91683
+	* config/riscv/riscv-protos.h (riscv_split_symbol): New bool parameter.
+	(riscv_move_integer): Likewise.
+	* config/riscv/riscv.c (riscv_split_integer): Pass FALSE for new
+	riscv_move_integer arg.
+	(riscv_legitimize_move): Likewise.
+	(riscv_force_temporary): New parameter in_splitter.  Don't call
+	force_reg if true.
+	(riscv_unspec_offset_high): Pass FALSE for new riscv_force_temporary
+	arg.
+	(riscv_add_offset): Likewise.
+	(riscv_split_symbol): New parameter in_splitter.  Pass to
+	riscv_force_temporary.
+	(riscv_legitimize_address): Pass FALSE for new riscv_split_symbol
+	arg.
+	(riscv_move_integer): New parameter in_splitter.  New local
+	can_create_psuedo.  Don't call riscv_split_integer or force_reg when
+	in_splitter TRUE.
+	(riscv_legitimize_const_move): Pass FALSE for new riscv_move_integer,
+	riscv_split_symbol, and riscv_force_temporary args.
+	* config/riscv/riscv.md (low+1): Pass TRUE for new
+	riscv_move_integer arg.
+	(low+2): Pass TRUE for new riscv_split_symbol arg.
+
 2019-09-20  Eric Botcazou  
 
 	PR target/91269
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 8b510f87df8..5b0bbdd7cb4 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -44,10 +44,10 @@ extern int riscv_const_insns (rtx);
 extern int riscv_split_const_insns (rtx);
 extern int riscv_load_store_insns (rtx, rtx_insn *);
 extern rtx riscv_emit_move (rtx, rtx);
-extern bool riscv_split_symbol (rtx, rtx, machine_mode, rtx *);
+extern bool riscv_split_symbol (rtx, rtx, machine_mode, rtx *, bool);
 extern bool riscv_split_symbol_type (enum riscv_symbol_type);
 extern rtx riscv_unspec_address (rtx, enum riscv_symbol_type);
-extern void riscv_move_integer (rtx, rtx, HOST_WIDE_INT);
+extern void riscv_move_integer (rtx, rtx, HOST_WIDE_INT, bool);
 extern bool riscv_legitimize_move (machine_mode, rtx, rtx);
 extern rtx riscv_subword (rtx, bool);
 extern bool riscv_split_64bit_move_p (rtx, rtx);
diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
index 35219956c80..5cb295d3abb 100644
--- a/gcc/config/riscv/riscv.c
+++ b/gcc/config/riscv/riscv.c
@@ -508,8 +508,8 @@ riscv_split_integer (HOST_WIDE_INT val, machine_mode mode)
   unsigned HOST_WIDE_INT hival = sext_hwi ((val - loval) >> 32, 32);
   rtx hi = gen_reg_rtx (mode), lo = gen_reg_rtx (mode);
 
-  riscv_move_integer (hi, hi, hival);
-  riscv_move_integer (lo, lo, loval);
+  riscv_move_integer (hi, hi, hival, FALSE);
+  riscv_move_integer (lo, lo, loval, FALSE);
 
   hi = gen_rtx_fmt_ee (ASHIFT, mode, hi, GEN_INT (32));
   hi = force_reg (mode, hi);
@@ -1021,9 +1021,12 @@ riscv_force_binary (machine_mode mode, enum rtx_code code, rtx x, rtx y)
are allowed, copy it into a new register, otherwise use DEST.  */
 
 static rtx
-riscv_force_temporary (rtx dest, rtx value)
+riscv_force_temporary (rtx dest, rtx value, bool in_splitter)
 {
-  if (can_create_pseudo_p ())
+  /* We can't call gen_reg_rtx from a splitter, because this might realloc
+ the regno_reg_rtx array, which would invalidate reg rtx pointers in the
+

Re: [PATCH][x86] Fix PR91814

2019-09-20 Thread Richard Biener
On Fri, 20 Sep 2019, Uros Bizjak wrote:

> On Fri, Sep 20, 2019 at 10:32 AM Richard Biener  wrote:
> >
> > On Fri, 20 Sep 2019, Uros Bizjak wrote:
> >
> > > On Thu, Sep 19, 2019 at 5:43 PM Uros Bizjak  wrote:
> > > >
> > > > On Thu, Sep 19, 2019 at 5:30 PM Richard Biener  
> > > > wrote:
> > > > >
> > > > >
> > > > > Boostrapped and tested on x86_64-unknown-linux-gnu.
> > > > >
> > > > > OK?
> > > >
> > > > OK.
> > >
> > > Hm, something is not working correctly here. For the testcase, I get:
> > >
> > > main:
> > > vmovq   %xmm0, %xmm0
> > > vpxor   %xmm1, %xmm1, %xmm1
> > > vpsubq  %xmm0, %xmm1, %xmm1
> > > vpmaxsq %xmm1, %xmm0, %xmm0
> > > vmovq   %xmm0, %rax
> > > movabsq %rax, .LC0+11532131096
> > > xorl%eax, %eax
> > > ret
> > >
> > > The first insn uses uninitialized reg.
> > >
> > > The _.stv pass misses initialization of r94 reg:
> > >
> > > (note 2 3 7 2 NOTE_INSN_FUNCTION_BEG)
> > > (note 7 2 24 2 NOTE_INSN_DELETED)
> > > (insn 24 7 8 2 (set (subreg:V2DI (reg:DI 93) 0)
> > > (vec_concat:V2DI (reg:DI 94)
> > > (const_int 0 [0]))) "pr67271.c":11:17 -1
> > >  (nil))
> >
> > Hmm, it emits the instruction in the wrong spot for the caller
> >
> >   emit_insn_before (gen_rtx_SET (gen_rtx_SUBREG (vmode, tmp, 0),
> >  gen_gpr_to_xmm_move_src (vmode,
> > *op)),
> > insn);
> >
> > we can do the following which I am testing now.
> 
> How about the attached patch? The only place we process memory operand
> is from convert_op (see the function comment), and by using
> gen_rtx_SET directly, we can still generate MOVABS, which is otherwise
> split by using emit_move_insn.
> 
> Can you please test the attached patch instead?

Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?

Thanks,
Richard.

2019-09-20  Richard Biener  
Uros Bizjak  

PR target/91814
* config/i386/i386-features.c (gen_gpr_to_xmm_move_src): Revert
previous change.
(general_scalar_chain::convert_op): Force not suitable memory
operands to a register.

Index: gcc/config/i386/i386-features.c
===
--- gcc/config/i386/i386-features.c (revision 275995)
+++ gcc/config/i386/i386-features.c (working copy)
@@ -668,8 +668,6 @@ scalar_chain::emit_conversion_insns (rtx
 static rtx
 gen_gpr_to_xmm_move_src (enum machine_mode vmode, rtx gpr)
 {
-  if (!nonimmediate_operand (gpr, GET_MODE_INNER (vmode)))
-gpr = force_reg (GET_MODE_INNER (vmode), gpr);
   switch (GET_MODE_NUNITS (vmode))
 {
 case 1:
@@ -835,6 +833,15 @@ general_scalar_chain::convert_op (rtx *o
 {
   rtx tmp = gen_reg_rtx (GET_MODE (*op));
 
+  /* Handle movabs.  */
+  if (!memory_operand (*op, GET_MODE (*op)))
+   {
+ rtx tmp2 = gen_reg_rtx (GET_MODE (*op));
+
+ emit_insn_before (gen_rtx_SET (tmp2, *op), insn);
+ *op = tmp2;
+   }
+
   emit_insn_before (gen_rtx_SET (gen_rtx_SUBREG (vmode, tmp, 0),
 gen_gpr_to_xmm_move_src (vmode, *op)),
insn);


Re: [PATCH][x86] Fix PR91814

2019-09-20 Thread Uros Bizjak
On Fri, Sep 20, 2019 at 1:11 PM Richard Biener  wrote:
>
> On Fri, 20 Sep 2019, Uros Bizjak wrote:
>
> > On Fri, Sep 20, 2019 at 10:32 AM Richard Biener  wrote:
> > >
> > > On Fri, 20 Sep 2019, Uros Bizjak wrote:
> > >
> > > > On Thu, Sep 19, 2019 at 5:43 PM Uros Bizjak  wrote:
> > > > >
> > > > > On Thu, Sep 19, 2019 at 5:30 PM Richard Biener  
> > > > > wrote:
> > > > > >
> > > > > >
> > > > > > Boostrapped and tested on x86_64-unknown-linux-gnu.
> > > > > >
> > > > > > OK?
> > > > >
> > > > > OK.
> > > >
> > > > Hm, something is not working correctly here. For the testcase, I get:
> > > >
> > > > main:
> > > > vmovq   %xmm0, %xmm0
> > > > vpxor   %xmm1, %xmm1, %xmm1
> > > > vpsubq  %xmm0, %xmm1, %xmm1
> > > > vpmaxsq %xmm1, %xmm0, %xmm0
> > > > vmovq   %xmm0, %rax
> > > > movabsq %rax, .LC0+11532131096
> > > > xorl%eax, %eax
> > > > ret
> > > >
> > > > The first insn uses uninitialized reg.
> > > >
> > > > The _.stv pass misses initialization of r94 reg:
> > > >
> > > > (note 2 3 7 2 NOTE_INSN_FUNCTION_BEG)
> > > > (note 7 2 24 2 NOTE_INSN_DELETED)
> > > > (insn 24 7 8 2 (set (subreg:V2DI (reg:DI 93) 0)
> > > > (vec_concat:V2DI (reg:DI 94)
> > > > (const_int 0 [0]))) "pr67271.c":11:17 -1
> > > >  (nil))
> > >
> > > Hmm, it emits the instruction in the wrong spot for the caller
> > >
> > >   emit_insn_before (gen_rtx_SET (gen_rtx_SUBREG (vmode, tmp, 0),
> > >  gen_gpr_to_xmm_move_src (vmode,
> > > *op)),
> > > insn);
> > >
> > > we can do the following which I am testing now.
> >
> > How about the attached patch? The only place we process memory operand
> > is from convert_op (see the function comment), and by using
> > gen_rtx_SET directly, we can still generate MOVABS, which is otherwise
> > split by using emit_move_insn.
> >
> > Can you please test the attached patch instead?
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?
>
> Thanks,
> Richard.
>
> 2019-09-20  Richard Biener  
> Uros Bizjak  
>
> PR target/91814
> * config/i386/i386-features.c (gen_gpr_to_xmm_move_src): Revert
> previous change.
> (general_scalar_chain::convert_op): Force not suitable memory
> operands to a register.

OK.

Thanks,
Uros.

> Index: gcc/config/i386/i386-features.c
> ===
> --- gcc/config/i386/i386-features.c (revision 275995)
> +++ gcc/config/i386/i386-features.c (working copy)
> @@ -668,8 +668,6 @@ scalar_chain::emit_conversion_insns (rtx
>  static rtx
>  gen_gpr_to_xmm_move_src (enum machine_mode vmode, rtx gpr)
>  {
> -  if (!nonimmediate_operand (gpr, GET_MODE_INNER (vmode)))
> -gpr = force_reg (GET_MODE_INNER (vmode), gpr);
>switch (GET_MODE_NUNITS (vmode))
>  {
>  case 1:
> @@ -835,6 +833,15 @@ general_scalar_chain::convert_op (rtx *o
>  {
>rtx tmp = gen_reg_rtx (GET_MODE (*op));
>
> +  /* Handle movabs.  */
> +  if (!memory_operand (*op, GET_MODE (*op)))
> +   {
> + rtx tmp2 = gen_reg_rtx (GET_MODE (*op));
> +
> + emit_insn_before (gen_rtx_SET (tmp2, *op), insn);
> + *op = tmp2;
> +   }
> +
>emit_insn_before (gen_rtx_SET (gen_rtx_SUBREG (vmode, tmp, 0),
>  gen_gpr_to_xmm_move_src (vmode, *op)),
> insn);


Re: [PATCH] Reduction of conditional operations for vectorization

2019-09-20 Thread Richard Biener
On Fri, Sep 20, 2019 at 10:09 AM Yuliang Wang  wrote:
>
> Hi,
>
> ifcvt transforms the following conditional operation reduction pattern:
>
>   if ( condition )
> a = a OP b;
>   else
> a = a OP c;
>
> Into:
>
>   a_1 = a OP b;
>   a_2 = a OP c;
>   a = condition ? a_1 : a_2;
>
> Where OP is one of { plus minus mult min max and ior eor }.
>
> This patch further optimizes the above to:
>
>   a_0 = condition ? b : c;
>   a = a OP a_0;
>
> Which enables vectorization on AArch64.
> Also supported are permutations of the above operand ordering subject to 
> commutativity of OP.
>
> Added new tests. Built and regression tested on aarch64-none-elf and 
> aarch64-linux-gnu.

@@ -3206,7 +3206,41 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  /* !A ? B : C -> A ? C : B.  */
  (simplify
   (cnd (logical_inverted_value truth_valued_p@0) @1 @2)
-  (cnd @0 @2 @1)))
+  (cnd @0 @2 @1))
+
+ /* !A ? B : C -> A ? C : B.  */
+ (simplify
+  (cnd (logical_inverted_value truth_valued_p@0) @1 @2)
+  (cnd @0 @2 @1))
+

looks like you duplicate the above pattern.  Should have raised a warning
in the genmatch run.

The patch header shows you are not working against trunk?

+ (for op (plus minus mult
+ min max
+ bit_and bit_ior bit_xor)
+  (simplify
+   (cnd @0 (op @1 @2) (op @1 @3))
+   (op @1 (cnd @0 @2 @3)))
+  (simplify
+   (cnd @0 (op @1 @2) (op @3 @2))
+   (op (cnd @0 @1 @3) @2))
+  (if (op != MINUS_EXPR)
+   (simplify
+(cnd @0 (op @1 @2) (op @3 @1))
+(op @1 (cnd @0 @2 @3)))
+   (simplify
+(cnd @0 (op @2 @1) (op @1 @3))
+(op @1 (cnd @0 @2 @3)

if you would have dropped minus handling this simplifies to

(for op (...)
 (simpify
  (cnd @0 (op:c @1 @2) (op:c @1 @3))
  (op @1 (cnd @0 @2 @3)))

you can then add minus special-cases if they are important

 (simplify
  (cnd @0 (minus @1 @2) (minus @1 @3))
...
 (simplify
  (cnd @0 (minus @2 @1) (minus @3 @1))

I think that's clearer.

+ /* Hack: generic-match causes infinite recursion
+by reverting this transformation when
+i) -fno-trapping-math is enabled, and
+ii) the common operand does not need to be wrapped in a SAVE_EXPR.  */

What's the specific transform that causes this?  Yes, there are some
left in fold-const.c.

Thanks,
Richard.

> Best Regards,
> Yuliang Wang
>
>
> gcc/ChangeLog:
>
> 2019-09-19  Yuliang Wang  
>
> * match.pd (for cnd (cond vec_cond)): New match statements for the
> above patterns.
> * doc/sourcebuild.texi (vect_condred_si): Document new target 
> selector.
>
> gcc/testsuite/ChangeLog:
>
> 2019-09-19  Yuliang Wang  
>
> * gcc.target/aarch64/sve2/condred_1.c: New test.
> * gcc.dg/vect/vect-condred-1.c: As above.
> * gcc.dg/vect/vect-condred-2.c: As above.
> * gcc.dg/vect/vect-condred-3.c: As above.
> * gcc.dg/vect/vect-condred-4.c: As above.
> * gcc.dg/vect/vect-condred-5.c: As above.
> * gcc.dg/vect/vect-condred-6.c: As above.
> * gcc.dg/vect/vect-condred-7.c: As above.
> * gcc.dg/vect/vect-condred-8.c: As above.
> * lib/target-supports.exp (check_effective_target_vect_condred_si):
> Return true for AArch64 without SVE.


Re: PR88751: Backport to GCC 8 and 9 branches?

2019-09-20 Thread Richard Biener
On Fri, Sep 20, 2019 at 11:28 AM Andreas Krebbel  wrote:
>
> On 06.09.19 12:48, Richard Biener wrote:
> > On Fri, Sep 6, 2019 at 10:11 AM Andreas Krebbel  
> > wrote:
> >>
> >> Hi,
> >>
> >> since this caused a critical performance regression in the OpenJ9 byte 
> >> code interpreter after
> >> migrating from GCC 4.8 to GCC 7 I would like to backport this patch also 
> >> to GCC 8 and 9 branch.
> >>
> >> Ok - after bootstrap and regression test went fine?
> >
> > Looks reasonable to me.  But what about GCC 7?  I assume you also verified 
> > the
> > actual performance regression is gone.
>
> I've committed the patch to GCC 7 and 8 branch after verifying that the 
> change has the desired
> effect on the source code file from OpenJ9.
>
> GCC 9 branch is currently frozen. Ok, to apply there as well?

Yes, it shouldn't be frozen anymore...

Richard.

> Andreas
>
> >
> > Richard.
> >
> >>
> >> Andreas
> >>
> >>
> >> commit d3dc20418aad41af83fe45ccba527deb0b334983
> >> Author: krebbel 
> >> Date:   Thu Jun 6 11:35:04 2019 +
> >>
> >> Fix PR88751
> >>
> >> This patch implements a small improvement for the heuristic in lra
> >> which decides when it has to activate the simpler register allocation
> >> algorithm.
> >>
> >> gcc/ChangeLog:
> >>
> >> 2019-06-06  Andreas Krebbel  
> >>
> >> PR rtl-optimization/88751
> >> * ira.c (ira): Use the number of the actually referenced 
> >> registers
> >> when calculating the threshold.
> >>
> >>
> >>
> >> git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@271996 
> >> 138bc75d-0d04-0410-961f-82ee72b054a4
> >>
> >>
> >> diff --git a/gcc/ira.c b/gcc/ira.c
> >> index 4a14fb31583..725636d8dc5 100644
> >> --- a/gcc/ira.c
> >> +++ b/gcc/ira.c
> >> @@ -5198,6 +5198,8 @@ ira (FILE *f)
> >>int ira_max_point_before_emit;
> >>bool saved_flag_caller_saves = flag_caller_saves;
> >>enum ira_region saved_flag_ira_region = flag_ira_region;
> >> +  unsigned int i;
> >> +  int num_used_regs = 0;
> >>
> >>clear_bb_flags ();
> >>
> >> @@ -5213,12 +5215,17 @@ ira (FILE *f)
> >>
> >>ira_conflicts_p = optimize > 0;
> >>
> >> +  /* Determine the number of pseudos actually requiring coloring.  */
> >> +  for (i = FIRST_PSEUDO_REGISTER; i < DF_REG_SIZE (df); i++)
> >> +num_used_regs += !!(DF_REG_USE_COUNT (i) + DF_REG_DEF_COUNT (i));
> >> +
> >>/* If there are too many pseudos and/or basic blocks (e.g. 10K
> >>   pseudos and 10K blocks or 100K pseudos and 1K blocks), we will
> >>   use simplified and faster algorithms in LRA.  */
> >>lra_simple_p
> >>  = (ira_use_lra_p
> >> -   && max_reg_num () >= (1 << 26) / last_basic_block_for_fn (cfun));
> >> +   && num_used_regs >= (1 << 26) / last_basic_block_for_fn (cfun));
> >> +
> >>if (lra_simple_p)
> >>  {
> >>/* It permits to skip live range splitting in LRA.  */
> >>
>


Re: [Ada] Fix 32/64bit mistake on SYSTEM_INFO component in s-win32

2019-09-20 Thread Olivier Hainque
Hello Iain & Rainer,

> On 19 Sep 2019, at 18:40, Olivier Hainque  wrote:

>>> Given that the test cannot compile on anything but *-*-linux* and
>>> *-*-mingw*, I'd rather restrict the test to those two (or more targets
>>> that decide to implement the missing interface).

> Works for me as well.
[...]
> We can take care of adding the required filter.

I have just committed the attached patch to this effect.
Thanks again for the heads-up!

Cheers,

Olivier

---

2019-09-20  Olivier Hainque  

testsuite/

* gnat.dg/system_info1.adb: Restrict to *-*-linux* and *-*-mingw*.



system_info1.diff
Description: Binary data


Re: [ARM/FDPIC v6 13/24] [ARM] FDPIC: Force LSB bit for PC in Cortex-M architecture

2019-09-20 Thread Matthew Malcomson
On 19/09/19 16:14, Kyrill Tkachov wrote:
> 
> On 9/19/19 4:13 PM, Christophe Lyon wrote:
>> On Tue, 17 Sep 2019 at 14:08, Christophe Lyon  
>> wrote:
>> >
>> > On 17/09/2019 13:38, Wilco Dijkstra wrote:
>> > > Hi Christophe,
>> > >
>> > > Can you explain this in more detail - it doesn't make sense to me 
>> to force the
>> > > Thumb bit during unwinding since it should already be correct, 
>> even on a
>> > > Thumb-only CPU. Perhaps the kernel code that pushes an incorrect 
>> address on
>> > > the stack could be fixed instead?
>> > >
>> > >> Without this, when we are unwinding across a signal frame we can 
>> jump
>> > >> to an even address which leads to an exception.
>> > >>
>> > >> This is needed in __gnu_persnality_sigframe_fdpic() when 
>> restoring the
>> > >> PC from the signal frame since the PC saved by the kernel has the 
>> LSB
>> > >> bit set to zero.
>> > >
>> > > Wilco
>> > > .
>> > >
>> >
>> > Indeed, I've noticed the problem mentioned by Matthew since I 
>> committed that patch.
>> >
>> > I was about to propose a fix, replacing #if (__thumb__) with #if 
>> (!__ARM_ARCH_ISA_ARM), but you are right: maybe the kernel code should 
>> be fixed instead.
>> >
>> > So far I haven't managed to reproduce a failure in FDPIC mode 
>> without this patch though...
>> >
>> > Thanks and sorry for the breakage.
>> >
>>
>> I'm having problems with the board I use for testing, so I propose to
>> revert that patch until I have a better description of the problem it
>> fixed.
>> OK?
> 
> Ok by me as long as lives the fdpic toolchain in a usable state (barring 
> the potential issue here)

Thanks Christophe -- reverting that patch would help our internal 
testing a lot!
MM

> 
> Thanks,
> 
> Kyrill
> 
> 
>>
>> Christophe
>>
>> > Christophe
>> >



[PATCH] Some vect_create_epilog_for_reduction refactoring

2019-09-20 Thread Richard Biener


Plus simplification in get_initial_def_for_reduction, avoiding some
of the + 0 adjustments I've seen created.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2019-09-20  Richard Biener  

* tree-vect-loop.c (get_initial_def_for_reduction): Simplify,
avoid adjusting by + 0 or * 1.
(vect_create_epilog_for_reduction): Get reduction code only
when necessary.  Deal with adjustment_def only when necessary.

Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c(revision 275997)
+++ gcc/tree-vect-loop.c(working copy)
@@ -4003,8 +4003,10 @@ get_initial_def_for_reduction (stmt_vec_
   gcc_assert (nested_in_vect_loop_p (loop, stmt_vinfo)
  || loop == (gimple_bb (stmt_vinfo->stmt))->loop_father);
 
-  vect_reduction_type reduction_type
-= STMT_VINFO_VEC_REDUCTION_TYPE (stmt_vinfo);
+  /* ADJUSTMENT_DEF is NULL when called from
+ vect_create_epilog_for_reduction to vectorize double reduction.  */
+  if (adjustment_def)
+*adjustment_def = NULL;
 
   switch (code)
 {
@@ -4018,11 +4020,6 @@ get_initial_def_for_reduction (stmt_vec_
 case MULT_EXPR:
 case BIT_AND_EXPR:
   {
-/* ADJUSTMENT_DEF is NULL when called from
-   vect_create_epilog_for_reduction to vectorize double reduction.  */
-if (adjustment_def)
- *adjustment_def = init_val;
-
 if (code == MULT_EXPR)
   {
 real_init_val = dconst1;
@@ -4037,10 +4034,14 @@ get_initial_def_for_reduction (stmt_vec_
 else
   def_for_init = build_int_cst (scalar_type, int_init_val);
 
-   if (adjustment_def)
- /* Option1: the first element is '0' or '1' as well.  */
- init_def = gimple_build_vector_from_val (&stmts, vectype,
-  def_for_init);
+   if (adjustment_def || operand_equal_p (def_for_init, init_val, 0))
+ {
+   /* Option1: the first element is '0' or '1' as well.  */
+   if (!operand_equal_p (def_for_init, init_val, 0))
+ *adjustment_def = init_val;
+   init_def = gimple_build_vector_from_val (&stmts, vectype,
+def_for_init);
+ }
else if (!TYPE_VECTOR_SUBPARTS (vectype).is_constant ())
  {
/* Option2 (variable length): the first element is INIT_VAL.  */
@@ -4064,16 +4065,8 @@ get_initial_def_for_reduction (stmt_vec_
 case MAX_EXPR:
 case COND_EXPR:
   {
-   if (adjustment_def)
-  {
-   *adjustment_def = NULL_TREE;
-   if (reduction_type != COND_REDUCTION
-   && reduction_type != EXTRACT_LAST_REDUCTION)
- {
-   init_def = vect_get_vec_def_for_operand (init_val, stmt_vinfo);
-   break;
- }
- }
+   gcc_assert (useless_type_conversion_p (TREE_TYPE (vectype),
+  TREE_TYPE (init_val)));
init_val = gimple_convert (&stmts, TREE_TYPE (vectype), init_val);
init_def = gimple_build_vector_from_val (&stmts, vectype, init_val);
   }
@@ -4304,7 +4297,6 @@ vect_create_epilog_for_reduction (vecstmt);
   gimple *exit_phi;
   tree bitsize;
   tree adjustment_def = NULL;
@@ -4645,12 +4637,6 @@ vect_create_epilog_for_reduction (vecstmt);
-  /* For MINUS_EXPR the initial vector is [init_val,0,...,0], therefore,
- partial results are added and not subtracted.  */
-  if (code == MINUS_EXPR) 
-code = PLUS_EXPR;
   
   scalar_dest = gimple_assign_lhs (orig_stmt_info->stmt);
   scalar_type = TREE_TYPE (scalar_dest);
@@ -4665,7 +4651,14 @@ vect_create_epilog_for_reduction (vecstmt);
+  /* For MINUS_EXPR the initial vector is [init_val,0,...,0], therefore,
+ partial results are added and not subtracted.  */
+  if (code == MINUS_EXPR) 
+code = PLUS_EXPR;
 
   /* SLP reduction without reduction chain, e.g.,
  # a1 = phi 
@@ -5352,12 +5345,7 @@ vect_create_epilog_for_reduction (vecinner;
-
+ 
   /* 2.5 Adjust the final result by the initial value of the reduction
 variable. (When such adjustment is not needed, then
 'adjustment_def' is zero).  For example, if code is PLUS we create:
@@ -5401,6 +5389,10 @@ vect_finalize_reduction:
 
   new_phis[0] = epilog_stmt;
 }
+}
+
+  if (double_reduc)
+loop = loop->inner;
 
   /* 2.6  Handle the loop-exit phis.  Replace the uses of scalar loop-exit
   phis with new adjusted scalar results, i.e., replace use 


Re: [Ada] Fix 32/64bit mistake on SYSTEM_INFO component in s-win32

2019-09-20 Thread Rainer Orth
Hi Iain,

>> On 19 Sep 2019, at 15:51, Rainer Orth  wrote:
>> 
>
 On 18 Sep 2019, at 09:39, Pierre-Marie de Rodat  
 wrote:
 
>>> 
 gcc/testsuite/
 
* gnat.dg/system_info1.adb: New testcase.
>>> 
>>> This new test fails everywhere on Darwin, which doesn’t have an
>>> implementation for
>>> System.Task_Info.Number_Of_Processors
>>> 
>>> Given 
>>> "pragma Obsolescent (Task_Info, "use System.Multiprocessors and CPU
>>> aspect”);”
>>> 
>>> is it worth me trying to implement the Task_Info stuff?
>> 
>> I'm seeing the same on Solaris (will be every non-Linux/MinGW target).
>> I've implemented Number_Of_Processors using
>> sysconf(__SC_NPROCESSORS_ONLN), which is also available on Darwin.  Will
>> submit the patch tomorrow once testing has finished…
>
> OK. it’s likely that the same (almost) patch will work for Darwin which
> also has that Posix call.
>
> I’ll look at your patch and adapt it for Darwin then,
>
> I don’t *think* we can make:
>
> s-tasinf__posix.ad? 
>
> using that since  I think __SC_NPROCESSORS_ONLN is an optional addition? 
> (but ICBW about that)

there are two issues here:

* _SC_NPROCESSORS_ONLN is not in XPG7 according to
  https://pubs.opengroup.org/onlinepubs/9699919799/.  gnulib
  (lib/nproc.c) lists it as an addition in glibc,
  Mac OS X 10.5, FreeBSD, AIX, OSF/1, Solaris, Cygwin, Haiku.

* In theory, one could at least move the sysconf declaration to a common
  s-osinte__posix.ads since that function *is* part of XPG7.  In fact,
  we already have a couple of duplicate declarations in
  s-osinte__android.ads, s-osinte__kfreebsd-gnu.ads,
  s-osinte__lynxos178.adb, s-osinte__linux.ads, and
  s-taprop__solaris.adb (with a different return type).  While it would
  be good to unify them to avoid the duplication, I don't see how to do
  so: s-osinte.ads is already linked to one of several different
  target-specific versions.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [Ada] Fix 32/64bit mistake on SYSTEM_INFO component in s-win32

2019-09-20 Thread Rainer Orth
Hi Olivier,

>> On 19 Sep 2019, at 18:40, Olivier Hainque  wrote:
>
 Given that the test cannot compile on anything but *-*-linux* and
 *-*-mingw*, I'd rather restrict the test to those two (or more targets
 that decide to implement the missing interface).
>
>> Works for me as well.
> [...]
>> We can take care of adding the required filter.
>
> I have just committed the attached patch to this effect.
> Thanks again for the heads-up!

thanks.

> 2019-09-20  Olivier Hainque  
>
>   testsuite/
>
> * gnat.dg/system_info1.adb: Restrict to *-*-linux* and *-*-mingw*.

I'd keep the target list alphabetical.  Can do so when the Solaris patch
goes in.

Here's what I've successfully tested last night on both
i386-pc-solaris2.11 and sparc-sun-solaris2.11, shamelessly stolen from
the Linux counterparts.  If it is acceptable, I'd add *-*-solaris2.* to
the target list when merging.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2019-09-19  Rainer Orth  

* libgnarl/s-osinte__solaris.ads (sysconf): Declare.
(SC_NPROCESSORS_ONLN): Define.
* libgnarl/s-tasinf__solaris.ads (Number_Of_Processors): Declare.
* libgnarl/s-tasinf__solaris.adb (N_CPU): New variable.
(Number_Of_Processors): New function.

# HG changeset patch
# Parent  48b6a952486ff0ee28ce4b9c1bd4f77a4078491d
Provide Task_Info.Number_Of_Processors on Solaris

diff --git a/gcc/ada/libgnarl/s-osinte__solaris.ads b/gcc/ada/libgnarl/s-osinte__solaris.ads
--- a/gcc/ada/libgnarl/s-osinte__solaris.ads
+++ b/gcc/ada/libgnarl/s-osinte__solaris.ads
@@ -259,6 +259,11 @@ package System.OS_Interface is
function To_Timespec (D : Duration) return timespec;
pragma Inline (To_Timespec);
 
+   function sysconf (name : int) return long;
+   pragma Import (C, sysconf);
+
+   SC_NPROCESSORS_ONLN : constant := 15;
+
-
-- Process --
-
diff --git a/gcc/ada/libgnarl/s-tasinf__solaris.adb b/gcc/ada/libgnarl/s-tasinf__solaris.adb
--- a/gcc/ada/libgnarl/s-tasinf__solaris.adb
+++ b/gcc/ada/libgnarl/s-tasinf__solaris.adb
@@ -84,4 +84,23 @@ package body System.Task_Info is
   return (False, False);
end Unbound_Thread_Attributes;
 
+   N_CPU : Natural := 0;
+   pragma Atomic (N_CPU);
+   --  Cache CPU number. Use pragma Atomic to avoid a race condition when
+   --  setting N_CPU in Number_Of_Processors below.
+
+   --
+   -- Number_Of_Processors --
+   --
+
+   function Number_Of_Processors return Positive is
+   begin
+  if N_CPU = 0 then
+ N_CPU := Natural
+   (OS_Interface.sysconf (OS_Interface.SC_NPROCESSORS_ONLN));
+  end if;
+
+  return N_CPU;
+   end Number_Of_Processors;
+
 end System.Task_Info;
diff --git a/gcc/ada/libgnarl/s-tasinf__solaris.ads b/gcc/ada/libgnarl/s-tasinf__solaris.ads
--- a/gcc/ada/libgnarl/s-tasinf__solaris.ads
+++ b/gcc/ada/libgnarl/s-tasinf__solaris.ads
@@ -139,4 +139,7 @@ package System.Task_Info is
 
Unspecified_Task_Info : constant Task_Info_Type := null;
 
+   function Number_Of_Processors return Positive;
+   --  Returns the number of processors on the running host
+
 end System.Task_Info;


Re: Make assemble_real generate canonical CONST_INTs

2019-09-20 Thread Christophe Lyon
On Wed, 18 Sep 2019 at 11:41, Richard Sandiford
 wrote:
>
> Richard Biener  writes:
> > On Tue, Sep 17, 2019 at 4:33 PM Richard Sandiford
> >  wrote:
> >>
> >> assemble_real used GEN_INT to create integers directly from the
> >> longs returned by real_to_target.  assemble_integer then went on
> >> to interpret the const_ints as though they had the mode corresponding
> >> to the accompanying size parameter:
> >>
> >>   imode = mode_for_size (size * BITS_PER_UNIT, mclass, 0).require ();
> >>
> >>   for (i = 0; i < size; i += subsize)
> >> {
> >>   rtx partial = simplify_subreg (omode, x, imode, i);
> >>
> >> But in the assemble_real case, X might not be canonical for IMODE.
> >>
> >> If the interface to assemble_integer is supposed to allow outputting
> >> (say) the low 4 bytes of a DImode integer, then the simplify_subreg
> >> above is wrong.  But if the number of bytes passed to assemble_integer
> >> is supposed to be the number of bytes that the integer actually contains,
> >> assemble_real is wrong.
> >>
> >> This patch takes the latter interpretation and makes assemble_real
> >> generate const_ints that are canonical for the number of bytes passed.
> >>
> >> The flip_storage_order handling assumes that each long is a full
> >> SImode, which e.g. excludes BITS_PER_UNIT != 8 and float formats
> >> whose memory size is not a multiple of 32 bits (which includes
> >> HFmode at least).  The patch therefore leaves that code alone.
> >> If interpreting each integer as SImode is correct, the const_ints
> >> that it generates are also correct.
> >>
> >> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  Also tested
> >> by making sure that there were no new errors from a range of
> >> cross-built targets.  OK to install?
> >>
> >> Richard
> >>
> >>
> >> 2019-09-17  Richard Sandiford  
> >>
> >> gcc/
> >> * varasm.c (assemble_real): Generate canonical const_ints.
> >>
> >> Index: gcc/varasm.c
> >> ===
> >> --- gcc/varasm.c2019-09-05 08:49:30.829739618 +0100
> >> +++ gcc/varasm.c2019-09-17 15:30:10.400740515 +0100
> >> @@ -2873,25 +2873,27 @@ assemble_real (REAL_VALUE_TYPE d, scalar
> >>real_to_target (data, &d, mode);
> >>
> >>/* Put out the first word with the specified alignment.  */
> >> +  unsigned int chunk_nunits = MIN (nunits, units_per);
> >>if (reverse)
> >>  elt = flip_storage_order (SImode, gen_int_mode (data[nelts - 1], 
> >> SImode));
> >>else
> >> -elt = GEN_INT (data[0]);
> >> -  assemble_integer (elt, MIN (nunits, units_per), align, 1);
> >> -  nunits -= units_per;
> >> +elt = GEN_INT (sext_hwi (data[0], chunk_nunits * BITS_PER_UNIT));
> >
> > why the appearant difference between the storage-order flipping
> > variant using gen_int_mode vs. the GEN_INT with sext_hwi?
> > Can't we use gen_int_mode in the non-flipping path and be done with that?
>
> Yeah, I mentioned this in the covering note.  The flip_storage_order
> stuff only seems to work for floats that are a multiple of 32 bits in
> size, so it doesn't e.g. handle HFmode or 80-bit floats, whereas the
> new "else" does.  Hard-coding SImode also hard-codes BITS_PER_UNIT==8,
> unlike the "else".
>
> So if anything, it's flip_storage_order that might need to change
> to avoid hard-coding SImode.  That doesn't look like a trivial change
> though.  E.g. the number of bytes passed to assemble_integer would need
> to match the number of bytes in data[nelts - 1] rather than data[0].
> The alignment code below would also need to be adjusted.  Fixing that
> (if it is a bug) seems like a separate change and TBH I'd rather not
> touch it here.
>

Hi Richard,

I suspect you've probably noticed already, but in case you haven't:
this patch causes a regression on arm:
FAIL: gcc.target/arm/fp16-compile-alt-3.c scan-assembler \t.short\t49152
FAIL: gcc.target/arm/fp16-compile-ieee-3.c scan-assembler \t.short\t49152

Christophe

> Thanks,
> Richard
>
> >
> >> +  assemble_integer (elt, chunk_nunits, align, 1);
> >> +  nunits -= chunk_nunits;
> >>
> >>/* Subsequent words need only 32-bit alignment.  */
> >>align = min_align (align, 32);
> >>
> >>for (int i = 1; i < nelts; i++)
> >>  {
> >> +  chunk_nunits = MIN (nunits, units_per);
> >>if (reverse)
> >> elt = flip_storage_order (SImode,
> >>   gen_int_mode (data[nelts - 1 - i], 
> >> SImode));
> >>else
> >> -   elt = GEN_INT (data[i]);
> >> -  assemble_integer (elt, MIN (nunits, units_per), align, 1);
> >> -  nunits -= units_per;
> >> +   elt = GEN_INT (sext_hwi (data[i], chunk_nunits * BITS_PER_UNIT));
> >> +  assemble_integer (elt, chunk_nunits, align, 1);
> >> +  nunits -= chunk_nunits;
> >>  }
> >>  }
> >>


RE: [PATCH] Reduction of conditional operations for vectorization

2019-09-20 Thread Yuliang Wang
Hi Richard,

Thanks for your comments and tips. fold_binary_op_with_conditional_arg performs 
the reverse transformation to this patch in certain situations:

/* Transform `a + (b ? x : y)' into `b ? (a + x) : (a + y)'. 
   ... */

static tree
fold_binary_op_with_conditional_arg (location_t loc,
...

/* This transformation is only worthwhile if we don't have to wrap ARG
   in a SAVE_EXPR and the operation can be simplified without recursing
   on at least one of the branches once its pushed inside the COND_EXPR.  */
if (!TREE_CONSTANT (arg)
&& (TREE_SIDE_EFFECTS (arg) ...)
  return NULL_TREE;
...

For instance, this causes infinite recursion in 
gcc.dg/vect/fast-math-vect-call-2 because ARG is a float literal.

Regards,
Yuliang

-Original Message-
From: Richard Biener  
Sent: 20 September 2019 13:02
To: Yuliang Wang 
Cc: gcc-patches@gcc.gnu.org; nd ; Richard Sandiford 

Subject: Re: [PATCH] Reduction of conditional operations for vectorization

On Fri, Sep 20, 2019 at 10:09 AM Yuliang Wang  wrote:
>
> Hi,
>
> ifcvt transforms the following conditional operation reduction pattern:
>
>   if ( condition )
> a = a OP b;
>   else
> a = a OP c;
>
> Into:
>
>   a_1 = a OP b;
>   a_2 = a OP c;
>   a = condition ? a_1 : a_2;
>
> Where OP is one of { plus minus mult min max and ior eor }.
>
> This patch further optimizes the above to:
>
>   a_0 = condition ? b : c;
>   a = a OP a_0;
>
> Which enables vectorization on AArch64.
> Also supported are permutations of the above operand ordering subject to 
> commutativity of OP.
>
> Added new tests. Built and regression tested on aarch64-none-elf and 
> aarch64-linux-gnu.

@@ -3206,7 +3206,41 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  /* !A ? B : C -> A ? C : B.  */
  (simplify
   (cnd (logical_inverted_value truth_valued_p@0) @1 @2)
-  (cnd @0 @2 @1)))
+  (cnd @0 @2 @1))
+
+ /* !A ? B : C -> A ? C : B.  */
+ (simplify
+  (cnd (logical_inverted_value truth_valued_p@0) @1 @2)  (cnd @0 @2 
+ @1))
+

looks like you duplicate the above pattern.  Should have raised a warning in 
the genmatch run.

The patch header shows you are not working against trunk?

+ (for op (plus minus mult
+ min max
+ bit_and bit_ior bit_xor)
+  (simplify
+   (cnd @0 (op @1 @2) (op @1 @3))
+   (op @1 (cnd @0 @2 @3)))
+  (simplify
+   (cnd @0 (op @1 @2) (op @3 @2))
+   (op (cnd @0 @1 @3) @2))
+  (if (op != MINUS_EXPR)
+   (simplify
+(cnd @0 (op @1 @2) (op @3 @1))
+(op @1 (cnd @0 @2 @3)))
+   (simplify
+(cnd @0 (op @2 @1) (op @1 @3))
+(op @1 (cnd @0 @2 @3)

if you would have dropped minus handling this simplifies to

(for op (...)
 (simpify
  (cnd @0 (op:c @1 @2) (op:c @1 @3))
  (op @1 (cnd @0 @2 @3)))

you can then add minus special-cases if they are important

 (simplify
  (cnd @0 (minus @1 @2) (minus @1 @3))
...
 (simplify
  (cnd @0 (minus @2 @1) (minus @3 @1))

I think that's clearer.

+ /* Hack: generic-match causes infinite recursion
+by reverting this transformation when
+i) -fno-trapping-math is enabled, and
+ii) the common operand does not need to be wrapped in a SAVE_EXPR.  
+ */

What's the specific transform that causes this?  Yes, there are some left in 
fold-const.c.

Thanks,
Richard.

> Best Regards,
> Yuliang Wang
>
>
> gcc/ChangeLog:
>
> 2019-09-19  Yuliang Wang  
>
> * match.pd (for cnd (cond vec_cond)): New match statements for the
> above patterns.
> * doc/sourcebuild.texi (vect_condred_si): Document new target 
> selector.
>
> gcc/testsuite/ChangeLog:
>
> 2019-09-19  Yuliang Wang  
>
> * gcc.target/aarch64/sve2/condred_1.c: New test.
> * gcc.dg/vect/vect-condred-1.c: As above.
> * gcc.dg/vect/vect-condred-2.c: As above.
> * gcc.dg/vect/vect-condred-3.c: As above.
> * gcc.dg/vect/vect-condred-4.c: As above.
> * gcc.dg/vect/vect-condred-5.c: As above.
> * gcc.dg/vect/vect-condred-6.c: As above.
> * gcc.dg/vect/vect-condred-7.c: As above.
> * gcc.dg/vect/vect-condred-8.c: As above.
> * lib/target-supports.exp (check_effective_target_vect_condred_si):
> Return true for AArch64 without SVE.


Re: [Ada] Fix 32/64bit mistake on SYSTEM_INFO component in s-win32

2019-09-20 Thread Arnaud Charlet
> I'd keep the target list alphabetical.  Can do so when the Solaris patch
> goes in.
> 
> Here's what I've successfully tested last night on both
> i386-pc-solaris2.11 and sparc-sun-solaris2.11, shamelessly stolen from
> the Linux counterparts.  If it is acceptable, I'd add *-*-solaris2.* to
> the target list when merging.

The change is OK.

> 2019-09-19  Rainer Orth  
> 
>   * libgnarl/s-osinte__solaris.ads (sysconf): Declare.
>   (SC_NPROCESSORS_ONLN): Define.
>   * libgnarl/s-tasinf__solaris.ads (Number_Of_Processors): Declare.
>   * libgnarl/s-tasinf__solaris.adb (N_CPU): New variable.
>   (Number_Of_Processors): New function.


RE: [PATCH] Reduction of conditional operations for vectorization

2019-09-20 Thread Marc Glisse

On Fri, 20 Sep 2019, Yuliang Wang wrote:


Hi Richard,

Thanks for your comments and tips. fold_binary_op_with_conditional_arg performs 
the reverse transformation to this patch in certain situations:

/* Transform `a + (b ? x : y)' into `b ? (a + x) : (a + y)'.
  ... */

static tree
fold_binary_op_with_conditional_arg (location_t loc,
...

/* This transformation is only worthwhile if we don't have to wrap ARG
  in a SAVE_EXPR and the operation can be simplified without recursing
  on at least one of the branches once its pushed inside the COND_EXPR.  */
if (!TREE_CONSTANT (arg)
   && (TREE_SIDE_EFFECTS (arg) ...)
 return NULL_TREE;
...

For instance, this causes infinite recursion in 
gcc.dg/vect/fast-math-vect-call-2 because ARG is a float literal.


IIRC, the conditions in this function always seemed bogus to me. See 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57755 and links. I don't know 
if it is still relevant...


--
Marc Glisse


Re: [ARM/FDPIC v6 13/24] [ARM] FDPIC: Force LSB bit for PC in Cortex-M architecture

2019-09-20 Thread Christophe Lyon
On Fri, 20 Sep 2019 at 14:51, Matthew Malcomson
 wrote:
>
> On 19/09/19 16:14, Kyrill Tkachov wrote:
> >
> > On 9/19/19 4:13 PM, Christophe Lyon wrote:
> >> On Tue, 17 Sep 2019 at 14:08, Christophe Lyon 
> >> wrote:
> >> >
> >> > On 17/09/2019 13:38, Wilco Dijkstra wrote:
> >> > > Hi Christophe,
> >> > >
> >> > > Can you explain this in more detail - it doesn't make sense to me
> >> to force the
> >> > > Thumb bit during unwinding since it should already be correct,
> >> even on a
> >> > > Thumb-only CPU. Perhaps the kernel code that pushes an incorrect
> >> address on
> >> > > the stack could be fixed instead?
> >> > >
> >> > >> Without this, when we are unwinding across a signal frame we can
> >> jump
> >> > >> to an even address which leads to an exception.
> >> > >>
> >> > >> This is needed in __gnu_persnality_sigframe_fdpic() when
> >> restoring the
> >> > >> PC from the signal frame since the PC saved by the kernel has the
> >> LSB
> >> > >> bit set to zero.
> >> > >
> >> > > Wilco
> >> > > .
> >> > >
> >> >
> >> > Indeed, I've noticed the problem mentioned by Matthew since I
> >> committed that patch.
> >> >
> >> > I was about to propose a fix, replacing #if (__thumb__) with #if
> >> (!__ARM_ARCH_ISA_ARM), but you are right: maybe the kernel code should
> >> be fixed instead.
> >> >
> >> > So far I haven't managed to reproduce a failure in FDPIC mode
> >> without this patch though...
> >> >
> >> > Thanks and sorry for the breakage.
> >> >
> >>
> >> I'm having problems with the board I use for testing, so I propose to
> >> revert that patch until I have a better description of the problem it
> >> fixed.
> >> OK?
> >
> > Ok by me as long as lives the fdpic toolchain in a usable state (barring
> > the potential issue here)
>
> Thanks Christophe -- reverting that patch would help our internal
> testing a lot!
> MM
>
OK, I've reverted it.

Christophe

> >
> > Thanks,
> >
> > Kyrill
> >
> >
> >>
> >> Christophe
> >>
> >> > Christophe
> >> >
>


Deprecating cc0 (and consequently cc0 targets)

2019-09-20 Thread Jeff Law
At the register allocation BOF during the Cauldron someone (I forget
who) raised the question of when/how do we get rid of reload.

The first step in that process is to drop support for cc0.  cc0 is a
horribly antiquated mechanism for describing how to handle condition
codes.  It has numerous limitations, the most problematical being the
requirement that the cc0 setter and cc0 user must be consecutive in the
IL.  This requirement bleeds all over the compiler resulting in code
that is harder to understand and maintain.  I'm fairly confident this
code is broken in various ways, particularly WRT exceptions.

So this message is serving as official notice that we are *deprecating*
all cc0 support in gcc-10.  We are not removing any code or targets at
this time -- removals would happen during the gcc-11 cycle.


avr
cris
h8300
m68k
vax
cr16

This is based on actually looking at the backend patterns.
backends.html indicates the mn103 port would be affected, but I think
it's just out-of-date.  Similarly it fails to mention cr16, but cr16
clearly has affected patterns (I'll address that separately)

This patch deprecates the affected targets.  With some magic folks can
continue to build them.  However, if nobody steps forward to convert
them from cc0 to MODE_CC those targets will be removed during the gcc-11
cycle.

If someone is interested in possibly doing a conversion, I would
strongly recommend reviewing the best documentation we have on the subject:

https://gcc.gnu.org/wiki/CC0Transition

I worked with my son about a year ago to transition the v850 to MODE_CC
using that page as a guideline.  We also have a partial transition for
the H8/300 port (not far enough to even build anything yet).  I had
hoped we would finish the H8/300 port last year and would have tackled
another, but events have gotten in the way.

The transition isn't terribly hard, but does take time because of the
number of patterns that have to be changed.

Let the discussion begin...


Jeff
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 69d0a024d85..0c1637e8be1 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -248,6 +248,12 @@ md_file=
 # Obsolete configurations.
 case ${target} in
   tile*-*-*\
+  avr*-*-* \
+  h8300*-*-*   \
+  cris*-*-*\
+  m68k*-*-*\
+  vax*-*-* \
+  cr16*-*-*\
  )
 if test "x$enable_obsolete" != xyes; then
   echo "*** Configuration ${target} is obsolete." >&2
@@ -273,7 +279,6 @@ case ${target} in
  | arm*-*-uclinux* \
  | i[34567]86-go32-*   \
  | i[34567]86-*-go32*  \
- | m68k-*-uclinuxoldabi*   \
  | mips64orion*-*-rtems*   \
  | pdp11-*-bsd \
  | powerpc*-*-linux*paired*\
@@ -294,7 +299,6 @@ case ${target} in
  | *-*-solaris2.[0-9].*\
  | *-*-solaris2.10*\
  | *-*-sysv*   \
- | vax-*-vms*  \
  )
echo "*** Configuration ${target} not supported" 1>&2
exit 1


Re: Avoid adding impossible copies in ira-conflicts.c:process_reg_shuffles

2019-09-20 Thread Vladimir Makarov

On 09/17/2019 12:50 PM, Richard Sandiford wrote:

If an insn requires two operands to be tied, and the input operand dies
in the insn, IRA acts as though there were a copy from the input to the
output with the same execution frequency as the insn.  Allocating the
same register to the input and the output then saves the cost of a move.

If there is no such tie, but an input operand nevertheless dies
in the insn, IRA creates a similar move, but with an eighth of the
frequency.  This helps to ensure that chains of instructions reuse
registers in a natural way, rather than using arbitrarily different
registers for no reason.

This heuristic seems to work well in the vast majority of cases.
However, for SVE, the handling of untied operands ends up creating
copies between dying predicate registers and vector outputs, even though
vector and predicate registers are distinct classes and can never be
tied.  This is a particular problem because the dying predicate tends
to be the loop control predicate, which is used by most instructions
in a vector loop and so (rightly) has a very high allocation priority.
Any copies involving the loop predicate therefore tend to get processed
before copies involving only vector registers.  The end result is that
we tend to allocate the output of the last vector instruction in a loop
ahead of its natural place in the allocation order and don't benefit
from chains created between vector registers.

This patch tries to avoid the problem by not adding register shuffle
copies if there appears to be no chance that the two operands could be
allocated to the same register.

Tested on aarch64-linux-gnu and x86_64-linux-gnu (which needs the
pr82361-*.c tweak I just posted).  Also tested by comparing the
gcc.c-torture, gcc.dg and g++.dg output for one target per CPU
directory at -O2 -ftree-vectorize.  The table below summarises
the tests that had more or fewer assembly lines after the patch
(always a bad metric, but it's just to get a flavour):

Target Tests  Delta   Best  Worst Median
== =  =     = ==
aarch64-linux-gnu 11-16-15  9 -1
aarch64_be-linux-gnu  10-16-15  9 -1
alpha-linux-gnu   10 12 -5 10  2
amdgcn-amdhsa 51 48   -138174 -1
arc-elf4 -3 -2  2 -2
bfin-elf  53 19-13  5  1
cris-elf   1 -1 -1 -1 -1
frv-linux-gnu139 22-16 12 -1
h8300-elf  2  2  1  1  1
hppa64-hp-hpux11.23   20  5 -3  3  1
i686-apple-darwin 42   -305-87 11 -1
i686-pc-linux-gnu 32  -3205   -402 15-15
ia64-linux-gnu   774   -636   -274261 -1
m32r-elf 179  -1386   -161 31 -2
m68k-linux-gnu21-64-18  4 -2
mipsel-linux-gnu  61  8-25 34  1
mipsisa64-linux-gnu   40-11-10 13 -1
mmix   4  1 -2  6 -2
mn10300-elf9  0 -6  5  1
pdp11 13 -6 -5  2  1
powerpc-ibm-aix7.0   152   1736   -142481 -1
powerpc64-linux-gnu  111  -1366   -370 39 -1
powerpc64le-linux-gnu136  -1183   -409241 -1
pru-elf   45 24 -6 17  1
riscv32-elf   22-83-62 10 -2
riscv64-elf   26-78-19  4 -1
s390-linux-gnu43   -145-32 16 -1
s390x-linux-gnu   40   -285-96 16 -1
visium-elf   125   -317-50 10 -1
x86_64-darwin 40-83-42 13 -1
x86_64-linux-gnu  39   -577   -164 23 -1
xstormy16-elf 26-12 -4  4 -1


As usually a great report and patch explanation, Richard.
Thank you for finding this problem and resolving it.
The patch is ok for the trunk.



2019-09-17  Richard Sandiford  

gcc/
* ira-conflicts.c (can_use_same_reg_p): New function.
(process_reg_shuffles): Take an insn parameter.  Ignore cases
in which input operand op_num could seemingly never be allocated
to the same register as the destination.
(add_insn_allocno_copies): Update call to process_reg_shuffles.

gcc/testsuite/
* gcc.target/aarch64/sve/cond_convert_1.c: Remove XFAILs.
* gcc.target/aarch64/sve/cond_convert_4.c: Likewise.
* gcc.target/aarch64/sve/cond_unary_2.c: Likewise.

Index: gcc/ira-conflicts.c
===
--- gcc/ira-conflicts.c 2019-09-12 10:52:55.0 +0100
+++ gcc/ira-conflicts.c 2019-09-17 17:43:06.426031052 +0100
@@ -325,12 +325,37 @@ process_regs_for_copy (rtx reg1, rtx reg
return true;
  }
  
-/* Process all of the output registers of the cur

[Patch] PR fortran/78260 - OpenACC + OpenMP target fixes - esp. with function-result variables

2019-09-20 Thread Tobias Burnus

Hi all,

This patch does two things:

(A) For OpenACC, only, it fixes the is-variable check. That check missed 
to reject module names (as noted in the PR) but as my testing showed, it 
also wrongly rejected function-result variables. (i.e. where the 
return-value variable has the same name as the function). - For the 
invalid input of the PR, gfortran gave an ICE in the gimplifier.


(B) Using such function-result variables did not work properly. OpenACC 
used in both cases (see pr78260-2.f90) the function name – and at least 
one variant failed with an ICE.


OpenMP used the result variable for "target data map" but not for 
"target update". Additionally "task depend" had the same issue.



Bootstrapped and regtested on x86_64-gnu-linux w/o accelerator.

I intent to build/regtest it also applied to the OG9 (openacc-gnu-9) 
branch and run the test case with actual nvptx+AMDGCN offloading, but I 
have not done so, yet.


OK for the trunk?

Tobias

PS: Regtesting fails for continuation_6.f but that's PR fortran/91253 
(fails to show a warning when a newer GLIBC is used). And for 
gfortran.dg/vect/vect-8.f90 fails because 23 instead of 22 loops get 
vectorized - probably someone (richi?) didn't update the expected value.


2019-09-20  Tobias Burnus  

	PR fortran/78260
	* openmp.c (gfc_resolve_oacc_declare): Reject all
	non variables but accept function result variables.
	* trans-openmp.c (gfc_trans_omp_clauses): Handle
	function-result variables for remaing cases.

2019-09-20  Tobias Burnus  

	PR fortran/78260
	* gfortran.dg/goacc/parameter.f95: Change
	dg-error as it is now detected earlier.
	* gfortran.dg/goacc/pr85701.f90: Modify to
	use a separate result variable.
	* gfortran.dg/goacc/pr78260.f90: New.
	* gfortran.dg/goacc/pr78260-2.f90: New.
	* gfortran.dg/gomp/pr78260.f90: New.
	* gfortran.dg/gomp/pr78260-2.f90: New.
	* gfortran.dg/gomp/pr78260-3.f90: New.

diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index 44fcb9db8c6..bda7f288989 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -6048,18 +6048,14 @@ gfc_resolve_oacc_declare (gfc_namespace *ns)
 	for (n = oc->clauses->lists[list]; n; n = n->next)
 	  {
 	n->sym->mark = 0;
-	if (n->sym->attr.function || n->sym->attr.subroutine)
+	if (n->sym->attr.flavor != FL_VARIABLE
+		&& (n->sym->attr.flavor != FL_PROCEDURE
+		|| n->sym->result != n->sym))
 	  {
 		gfc_error ("Object %qs is not a variable at %L",
 			   n->sym->name, &oc->loc);
 		continue;
 	  }
-	if (n->sym->attr.flavor == FL_PARAMETER)
-	  {
-		gfc_error ("PARAMETER object %qs is not allowed at %L",
-			   n->sym->name, &oc->loc);
-		continue;
-	  }
 
 	if (n->expr && n->expr->ref->type == REF_ARRAY)
 	  {
diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index 8eae7bc0a52..b4c77aebf4d 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -2075,7 +2075,7 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses,
 	  tree node = build_omp_clause (input_location, OMP_CLAUSE_DEPEND);
 	  if (n->expr == NULL || n->expr->ref->u.ar.type == AR_FULL)
 		{
-		  tree decl = gfc_get_symbol_decl (n->sym);
+		  tree decl = gfc_trans_omp_variable (n->sym, false);
 		  if (gfc_omp_privatize_by_reference (decl))
 		decl = build_fold_indirect_ref (decl);
 		  if (GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (decl)))
@@ -2136,7 +2136,7 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses,
 	  tree node2 = NULL_TREE;
 	  tree node3 = NULL_TREE;
 	  tree node4 = NULL_TREE;
-	  tree decl = gfc_get_symbol_decl (n->sym);
+	  tree decl = gfc_trans_omp_variable (n->sym, false);
 	  if (DECL_P (decl))
 		TREE_ADDRESSABLE (decl) = 1;
 	  if (n->expr == NULL || n->expr->ref->u.ar.type == AR_FULL)
@@ -2398,7 +2398,7 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses,
 	  tree node = build_omp_clause (input_location, clause_code);
 	  if (n->expr == NULL || n->expr->ref->u.ar.type == AR_FULL)
 		{
-		  tree decl = gfc_get_symbol_decl (n->sym);
+		  tree decl = gfc_trans_omp_variable (n->sym, false);
 		  if (gfc_omp_privatize_by_reference (decl))
 		decl = build_fold_indirect_ref (decl);
 		  else if (DECL_P (decl))
diff --git a/gcc/testsuite/gfortran.dg/goacc/parameter.f95 b/gcc/testsuite/gfortran.dg/goacc/parameter.f95
index 84274611915..cbe67dba788 100644
--- a/gcc/testsuite/gfortran.dg/goacc/parameter.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/parameter.f95
@@ -6,7 +6,7 @@ contains
 implicit none
 integer :: i
 integer, parameter :: a = 1
-!$acc declare device_resident (a) ! { dg-error "PARAMETER" }
+!$acc declare device_resident (a) ! { dg-error "is not a variable" }
 !$acc data copy (a) ! { dg-error "not a variable" }
 !$acc end data
 !$acc data deviceptr (a) ! { dg-error "not a variable" }
diff --git a/gcc/testsuite/gfortran.dg/goacc/pr78260-2.f90 b/gcc/testsuite/gfortran.dg/goacc/pr7826

Re: [Patch] PR fortran/78260 - OpenACC + OpenMP target fixes - esp. with function-result variables

2019-09-20 Thread Jakub Jelinek
On Fri, Sep 20, 2019 at 05:47:59PM +0200, Tobias Burnus wrote:
> This patch does two things:
> 
> (A) For OpenACC, only, it fixes the is-variable check. That check missed to
> reject module names (as noted in the PR) but as my testing showed, it also
> wrongly rejected function-result variables. (i.e. where the return-value
> variable has the same name as the function). - For the invalid input of the
> PR, gfortran gave an ICE in the gimplifier.
> 
> (B) Using such function-result variables did not work properly. OpenACC used
> in both cases (see pr78260-2.f90) the function name – and at least one
> variant failed with an ICE.
> 
> OpenMP used the result variable for "target data map" but not for "target
> update". Additionally "task depend" had the same issue.
> 
> 
> Bootstrapped and regtested on x86_64-gnu-linux w/o accelerator.
> 
> I intent to build/regtest it also applied to the OG9 (openacc-gnu-9) branch
> and run the test case with actual nvptx+AMDGCN offloading, but I have not
> done so, yet.
> 
> OK for the trunk?

LGTM, thanks.
I'm not entirely sure about the detailed #pragma matches in the dumps, if
they turn out to be too hard to maintain, we can always remove those or
adjust so that they match with fewer details.

> 2019-09-20  Tobias Burnus  
> 
>   PR fortran/78260
>   * openmp.c (gfc_resolve_oacc_declare): Reject all
>   non variables but accept function result variables.
>   * trans-openmp.c (gfc_trans_omp_clauses): Handle
>   function-result variables for remaing cases.
> 
> 2019-09-20  Tobias Burnus  
> 
>   PR fortran/78260
>   * gfortran.dg/goacc/parameter.f95: Change
>   dg-error as it is now detected earlier.
>   * gfortran.dg/goacc/pr85701.f90: Modify to
>   use a separate result variable.
>   * gfortran.dg/goacc/pr78260.f90: New.
>   * gfortran.dg/goacc/pr78260-2.f90: New.
>   * gfortran.dg/gomp/pr78260.f90: New.
>   * gfortran.dg/gomp/pr78260-2.f90: New.
>   * gfortran.dg/gomp/pr78260-3.f90: New.

Jakub


Re: [Patch] PR fortran/78260 - OpenACC + OpenMP target fixes - esp. with function-result variables

2019-09-20 Thread Steve Kargl
On Fri, Sep 20, 2019 at 05:47:59PM +0200, Tobias Burnus wrote:
> (A) For OpenACC, only, it fixes the is-variable check. That check missed 
> to reject module names (as noted in the PR) but as my testing showed, it 
> also wrongly rejected function-result variables. (i.e. where the 
> return-value variable has the same name as the function). - For the 
> invalid input of the PR, gfortran gave an ICE in the gimplifier.
> 
> (B) Using such function-result variables did not work properly. OpenACC 
> used in both cases (see pr78260-2.f90) the function name – and at least 
> one variant failed with an ICE.
> 
> OpenMP used the result variable for "target data map" but not for 
> "target update". Additionally "task depend" had the same issue.
> 
> 
> Bootstrapped and regtested on x86_64-gnu-linux w/o accelerator.
> 
> I intent to build/regtest it also applied to the OG9 (openacc-gnu-9) 
> branch and run the test case with actual nvptx+AMDGCN offloading, but I 
> have not done so, yet.
> 
> OK for the trunk?

Patch looks ok to me, but I'll defer to Jakub (opemp) and/or
Thomas Schwinge (openacc).

PS: Hopefully, Welcome back!

-- 
steve


[OG9][committed] - [Patch] PR fortran/78260 - OpenACC + OpenMP target fixes - esp. with function-result variables

2019-09-20 Thread Tobias Burnus

I have now committed this patch also to the OG9 (openacc-gcc-9) branch.

Thanks for the reviews of the trunk patch!

Tobias

On 9/20/19 5:47 PM, Tobias Burnus wrote:

Hi all,

This patch does two things:

(A) For OpenACC, only, it fixes the is-variable check. That check 
missed to reject module names (as noted in the PR) but as my testing 
showed, it also wrongly rejected function-result variables. (i.e. 
where the return-value variable has the same name as the function). - 
For the invalid input of the PR, gfortran gave an ICE in the gimplifier.


(B) Using such function-result variables did not work properly. 
OpenACC used in both cases (see pr78260-2.f90) the function name – and 
at least one variant failed with an ICE.


OpenMP used the result variable for "target data map" but not for 
"target update". Additionally "task depend" had the same issue.



Bootstrapped and regtested on x86_64-gnu-linux w/o accelerator.

I intent to build/regtest it also applied to the OG9 (openacc-gnu-9) 
branch and run the test case with actual nvptx+AMDGCN offloading, but 
I have not done so, yet.


OK for the trunk?

Tobias

PS: Regtesting fails for continuation_6.f but that's PR fortran/91253 
(fails to show a warning when a newer GLIBC is used). And for 
gfortran.dg/vect/vect-8.f90 fails because 23 instead of 22 loops get 
vectorized - probably someone (richi?) didn't update the expected value.




Question on direction of GCC support for HWASAN.

2019-09-20 Thread Matthew Malcomson
Hello,

I'm nearing the point where I think the hardware-asan patch series could
go in and would appreciate some feedback on what features need to be
implemented before it could be put into trunk.
(The RFC can be found at
https://gcc.gnu.org/ml/gcc-patches/2019-09/msg00387.html, the only
change to the general approach I've made since then is to use backend
hooks to determine the RTL to emit instead of introducing a new RTX
called `ADDTAG`)

---
Note that hwasan will not have support for architectures other than
AArch64 to start with, and any support that could be added later for
architectures without TBI would be difficult.

As a comparison; LLVM has support for hwasan on x86_64 "but it's not
really practical because any library built without instrumentation is a
big source of false positives"
https://gcc.gnu.org/ml/gcc-patches/2019-09/msg00773.html.


---
We would really like to enable using HWASAN for the kernel (in KHWASAN
-- https://lwn.net/Articles/748455/) when compiling with GCC, while also
allowing use of hwasan for testing/debugging of userland processes
(i.e. not to be run in production).
A secondary benefit is that the hwasan instrumentation should provide a
framework for adding stack-tagging MTE support to the compiler.

The implementation is unlikely to be production-quality since
development on libhwasan is only on its `platform` ABI.  This libhwasan
ABI requires changes to the system libc so that it calls into libhwasan
on interesting events.
I haven't looked into adding these changes to glibc, but expect that
most people running a Linux distribution would not want to install a
special glibc to use this sanitizer.

The alternate libhwasan ABI is called the `interceptor` ABI.  This
intercepts certain functions in the system libc so that libhwasan can
run its own code.  The method of intercepting events has some known
difficulties.  Also, as one of the LLVM hwasan developers mentioned,
this ABI is not being developed any more
https://gcc.gnu.org/ml/gcc-patches/2019-09/msg00773.html.


---
If this aim is acceptable, then the biggest question I have is whether
HWASAN could be accepted as a first stage without having the option for
inline instrumentation.  As it stands I've just implemented the
function-call instrumentation for checking tags and tagging memory, with
the idea that inline instrumentation could be added later.

Similarly I'm not planning on implementing global instrumentation for 
this first stage and hoping that's OK with the community.

Cheers,
Matthew


Re: Deprecating cc0 (and consequently cc0 targets)

2019-09-20 Thread Richard Biener
On September 20, 2019 5:38:38 PM GMT+02:00, Jeff Law  wrote:
>At the register allocation BOF during the Cauldron someone (I forget
>who) raised the question of when/how do we get rid of reload.
>
>The first step in that process is to drop support for cc0.  cc0 is a
>horribly antiquated mechanism for describing how to handle condition
>codes.  It has numerous limitations, the most problematical being the
>requirement that the cc0 setter and cc0 user must be consecutive in the
>IL.  This requirement bleeds all over the compiler resulting in code
>that is harder to understand and maintain.  I'm fairly confident this
>code is broken in various ways, particularly WRT exceptions.
>
>So this message is serving as official notice that we are *deprecating*
>all cc0 support in gcc-10.  We are not removing any code or targets at
>this time -- removals would happen during the gcc-11 cycle.
>
>
>avr
>cris
>h8300
>m68k
>vax
>cr16
>
>This is based on actually looking at the backend patterns.
>backends.html indicates the mn103 port would be affected, but I think
>it's just out-of-date.  Similarly it fails to mention cr16, but cr16
>clearly has affected patterns (I'll address that separately)
>
>This patch deprecates the affected targets.  With some magic folks can
>continue to build them.  However, if nobody steps forward to convert
>them from cc0 to MODE_CC those targets will be removed during the
>gcc-11
>cycle.
>
>If someone is interested in possibly doing a conversion, I would
>strongly recommend reviewing the best documentation we have on the
>subject:
>
>https://gcc.gnu.org/wiki/CC0Transition
>
>I worked with my son about a year ago to transition the v850 to MODE_CC
>using that page as a guideline.  We also have a partial transition for
>the H8/300 port (not far enough to even build anything yet).  I had
>hoped we would finish the H8/300 port last year and would have tackled
>another, but events have gotten in the way.
>
>The transition isn't terribly hard, but does take time because of the
>number of patterns that have to be changed.
>
>Let the discussion begin...

It seems to be that for the goal to keep a target alive a variant #2 conversion 
(according to the wiki) should be closely mirror CC0 behavior and thus should 
be easier to achieve and with less patterns touched than the 'good' variant. 
Can you acknowledge that and would such 'simple' conversions be OK? 

Richard. 

>
>Jeff



Re: Deprecating cc0 (and consequently cc0 targets)

2019-09-20 Thread Joseph Myers
I think the m68k-*-uclinuxoldabi* and vax-*-vms* cases need to move up to 
the top of that case statement (duplicating the "not supported" error), to 
keep that error even in the --enable-obsolete case.

I see tile* targets are referenced in the diff context.  They were 
obsoleted in GCC 8 (and have also been removed from glibc and the Linux 
kernel), isn't it time to remove them?

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Deprecating cc0 (and consequently cc0 targets)

2019-09-20 Thread Jeff Law
On 9/20/19 11:22 AM, Richard Biener wrote:
> On September 20, 2019 5:38:38 PM GMT+02:00, Jeff Law 
> wrote:
>> At the register allocation BOF during the Cauldron someone (I
>> forget who) raised the question of when/how do we get rid of
>> reload.
>> 
>> The first step in that process is to drop support for cc0.  cc0 is
>> a horribly antiquated mechanism for describing how to handle
>> condition codes.  It has numerous limitations, the most
>> problematical being the requirement that the cc0 setter and cc0
>> user must be consecutive in the IL.  This requirement bleeds all
>> over the compiler resulting in code that is harder to understand
>> and maintain.  I'm fairly confident this code is broken in various
>> ways, particularly WRT exceptions.
>> 
>> So this message is serving as official notice that we are
>> *deprecating* all cc0 support in gcc-10.  We are not removing any
>> code or targets at this time -- removals would happen during the
>> gcc-11 cycle.
>> 
>> 
>> avr cris h8300 m68k vax cr16
>> 
>> This is based on actually looking at the backend patterns. 
>> backends.html indicates the mn103 port would be affected, but I
>> think it's just out-of-date.  Similarly it fails to mention cr16,
>> but cr16 clearly has affected patterns (I'll address that
>> separately)
>> 
>> This patch deprecates the affected targets.  With some magic folks
>> can continue to build them.  However, if nobody steps forward to
>> convert them from cc0 to MODE_CC those targets will be removed
>> during the gcc-11 cycle.
>> 
>> If someone is interested in possibly doing a conversion, I would 
>> strongly recommend reviewing the best documentation we have on the 
>> subject:
>> 
>> https://gcc.gnu.org/wiki/CC0Transition
>> 
>> I worked with my son about a year ago to transition the v850 to
>> MODE_CC using that page as a guideline.  We also have a partial
>> transition for the H8/300 port (not far enough to even build
>> anything yet).  I had hoped we would finish the H8/300 port last
>> year and would have tackled another, but events have gotten in the
>> way.
>> 
>> The transition isn't terribly hard, but does take time because of
>> the number of patterns that have to be changed.
>> 
>> Let the discussion begin...
> 
> It seems to be that for the goal to keep a target alive a variant #2
> conversion (according to the wiki) should be closely mirror CC0
> behavior and thus should be easier to achieve and with less patterns
> touched than the 'good' variant. Can you acknowledge that and would
> such 'simple' conversions be OK?
Unless the target has condition code preserving arithmetic a variant #2
conversion is the only way to go.

Now if someone did a variant #2 without the optimization bits (ie,
adding appropriate _set_flags pattern variants), I think that should be
considered explicitly OK even though the code quality would potentially
suffer.  Essentially it's a choice between dropping the port or keeping
the port, but with slightly less optimization.  THe latter seems like a
better choice to me.

jeff


[Darwin, X86, testsuite, committed] Fix naked-1.c fail.

2019-09-20 Thread Iain Sandoe
This fails at m32 because the scan-asm is looking for an absence
of "ret”.  Darwin is generating the correct code for the function
but the picbase thunk has a 'ret' insn.  Fixed by making the test
use -mdynamic-no-pic for m32.

tested on x86_64-darwin16, x86_64-pc-linux-gnu (m32/m64)
applied to mainline
thanks
Iain

gcc/testsuite/ChangeLog:

2019-09-20  Iain Sandoe  

* gcc.target/i386/naked-1.c: Alter options to use non-
PIC codegen for m32 Darwin.

diff --git a/gcc/testsuite/gcc.target/i386/naked-1.c 
b/gcc/testsuite/gcc.target/i386/naked-1.c
index 07bb10edd8..f51773c46a 100644
--- a/gcc/testsuite/gcc.target/i386/naked-1.c
+++ b/gcc/testsuite/gcc.target/i386/naked-1.c
@@ -1,5 +1,7 @@
 /* { dg-do compile } */
-/* { dg-options "-O0 -fno-pic" } */
+/* { dg-options "-O0" } */
+/* { dg-additional-options "-fno-pic" { target { ! *-*-darwin* } } } */
+/* { dg-additional-options "-mdynamic-no-pic" { target { *-*-darwin* && ia32 } 
} } */
 
 /* Verify that __attribute__((naked)) produces a naked function 
that does not use ret to return but traps at the end.  */



Re: [PATCH target/86811] Mark VAX as not needing speculation barriers

2019-09-20 Thread Jeff Law
On 9/17/19 2:59 PM, co...@sdf.org wrote:
> According to Bob Supnik (who is a very authoritative source on VAX),
> 
>> Funny you should ask. The short answer is no. No VAX ever did
>> speculative or out of order execution.
> 
> As such, marking VAX as not needing speculation barriers.
> 
> 
>   PR target/86811
>   * config/vax/vax.c (TARGET_HAVE_SPECULATION_SAFE_VALUE):
>   Define to speculation_safe_value_not_needed.
Installed on the trunk.
jeff


Re: Extend neg_const_int simplifications to other const rtxes

2019-09-20 Thread Jeff Law
On 9/18/19 12:56 AM, Richard Sandiford wrote:
> This patch generalises some neg_const_int-based rtx simplifications
> so that they handle all CONST_SCALAR_INTs and also CONST_POLY_INT.
> This actually simplifies things a bit, since we no longer have
> to treat HOST_WIDE_INT_MIN specially.
> 
> This is tested by later SVE patches.
> 
> Tested on aarch64-linux-gnu with SVE (with and without follow-on
> patches) and x86_64-linux-gnu.  OK to install?
> 
> Richard
> 
> 
> 2019-09-18  Richard Sandiford  
> 
> gcc/
>   * simplify-rtx.c (neg_const_int): Replace with...
>   (neg_poly_int_rtx): ...this new function.
>   (simplify_binary_operation_1): Extend (minus x C) -> (plus X -C)
>   to all CONST_SCALAR_INTs and to CONST_POLY_INT.
>   (simplify_plus_minus): Likewise for constant terms here.
OK
jeff


[PATCH] PR fortran/91426: Colorize %L text to match diagnostic_show_locus

2019-09-20 Thread David Malcolm
PR fortran/91426 reports that the following program

program main
10 continue
10 continue
end

yields:

label.f90:2:2:

2 | 10 continue
  |  1
3 | 10 continue
  |  2
Error: Duplicate statement label 10 at (1) and (2)

where the '1' and '2' annotating lines 2 and 3 in the source are
colored red and green respectively, but the "(1)" and "(2)" in the
"Error" line are not colored - only the word "Error" in that line is
colored (red).

PR fortran/91426 covers this inconsistency, and there was some
debate there on ways we could address it.

Within the diagnostics subsystem as of GCC 6 onwards, a
rich_location can contain multiple location_t values.  The first is
considered the "primary" location, subsequent ones are considered
"secondary" locations.

diagnostic_show_locus colorizes locations in the annotated source
lines by using the "diagnostic_kind" color for the primary
location_t within the rich_location, and then alternating between
two other colors for any secondary location_t values within the
rich_location ("range1" and "range2" within GCC_COLORS).

fortran/error.c currently supports at most 2 locations per
diagnostic (via the %L formatting code).

Hence fortran diagnostics have a primary location_t, and some have
a single secondary location_t.

Based on discussion with tkoenig in the bug report, this patch
tweaks fortran's error.c so that it colorizes the "(1)" and "(2)" in
the "Error" line, with their colors matching those of the "1" and
"2" in the source line annotations.

For a single-%L diagnostic, it colorizes the single "(1)" so that
its color matches that of the "1" in the source line annotation.

My view is that this makes the overall output more readable, making
the association between the text of the diagnostic and the
annotated source clearer.

For example, if you have multiple diagnostics with a mixture of
warnings and errors, the differences in colorization make it
clearer (to me, at least) what's being referred to in the text
of each diagnostic (e.g. I've been testing with
gfortran.dg/achar_3.f90 -Wall).

As normal, no colorization is done if colorization is disabled
e.g. via '-fdiagnostics-color=never' or if stderr isn't going to a
tty.

Successfully bootstrapped®rtested on x86_64-pc-linux-gnu.

I think technically I can self-approve this, but I'm not a
day-to-day user of fortran; does this look sane?

Thanks
David

gcc/fortran/ChangeLog:
PR fortran/91426
* error.c (curr_diagnostic): New static variable.
(gfc_report_diagnostic): New static function.
(gfc_warning): Replace call to diagnostic_report_diagnostic with
call to gfc_report_diagnostic.
(gfc_format_decoder): Colorize the text of %L and %C to match the
colorization used by diagnostic_show_locus.
(gfc_warning_now_at): Replace call to diagnostic_report_diagnostic with
call to gfc_report_diagnostic.
(gfc_warning_now): Likewise.
(gfc_warning_internal): Likewise.
(gfc_error_now): Likewise.
(gfc_fatal_error): Likewise.
(gfc_error_opt): Likewise.
(gfc_internal_error): Likewise.
---
 gcc/fortran/error.c | 44 
 1 file changed, 36 insertions(+), 8 deletions(-)

diff --git a/gcc/fortran/error.c b/gcc/fortran/error.c
index 68a2791..a0ce7a6 100644
--- a/gcc/fortran/error.c
+++ b/gcc/fortran/error.c
@@ -760,6 +760,23 @@ gfc_clear_pp_buffer (output_buffer *this_buffer)
   global_dc->last_location = UNKNOWN_LOCATION;
 }
 
+/* The currently-printing diagnostic, for use by gfc_format_decoder,
+   for colorizing %C and %L.  */
+
+static diagnostic_info *curr_diagnostic;
+
+/* A helper function to call diagnostic_report_diagnostic, while setting
+   curr_diagnostic for the duration of the call.  */
+
+static bool
+gfc_report_diagnostic (diagnostic_info *diagnostic)
+{
+  gcc_assert (diagnostic != NULL);
+  curr_diagnostic = diagnostic;
+  bool ret = diagnostic_report_diagnostic (global_dc, diagnostic);
+  curr_diagnostic = NULL;
+  return ret;
+}
 
 /* This is just a helper function to avoid duplicating the logic of
gfc_warning.  */
@@ -789,7 +806,7 @@ gfc_warning (int opt, const char *gmsgid, va_list ap)
   diagnostic_set_info (&diagnostic, gmsgid, &argp, &rich_loc,
   DK_WARNING);
   diagnostic.option_index = opt;
-  bool ret = diagnostic_report_diagnostic (global_dc, &diagnostic);
+  bool ret = gfc_report_diagnostic (&diagnostic);
 
   if (buffered_p)
 {
@@ -954,7 +971,18 @@ gfc_format_decoder (pretty_printer *pp, text_info *text, 
const char *spec,
 loc->lb->location,
 offset);
text->set_location (loc_num, src_loc, SHOW_RANGE_WITH_CARET);
+   /* Colorize the markers to match the color choices of
+  diagnostic_show_locus (the initial location has a color given
+  by the "kind" of the diagnostic, the secondary location has
+

[PATCH] 2019-09-20 Kamil Rytarowski

2019-09-20 Thread Kamil Rytarowski
GCC version of https://reviews.llvm.org/D67719

From 422827582d84e078df2a8e303d807c830a155ab5 Mon Sep 17 00:00:00 2001
From: Kamil Rytarowski 
Date: Fri, 20 Sep 2019 22:02:09 +0200
Subject: [PATCH] 2019-09-20  Kamil Rytarowski  

* cppbuiltin.c (define_builtin_macros_for_compilation_flags): Add new
builtin __SANITIZE_LEAK__ macros for fsanitize=leak switch.
* doc/cpp.texi: Document new macros.

* c-c++-common/lsan/sanitize-leak-macro.c: New test.
---
 gcc/ChangeLog|  6 ++
 gcc/cppbuiltin.c |  3 +++
 gcc/doc/cpp.texi |  3 +++
 gcc/testsuite/ChangeLog  |  4 
 .../c-c++-common/lsan/sanitize-lsan-macro.c  | 12 
 5 files changed, 28 insertions(+)
 create mode 100644 gcc/testsuite/c-c++-common/lsan/sanitize-lsan-macro.c

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 76269e17eb0..1bddb4835a0 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,9 @@
+2019-09-20  Kamil Rytarowski  
+
+   * cppbuiltin.c (define_builtin_macros_for_compilation_flags): Add new
+   builtin __SANITIZE_LEAK__ macros for fsanitize=leak switch.
+   * doc/cpp.texi: Document new macros.
+
 2019-09-20  Richard Biener  
Uros Bizjak  

diff --git a/gcc/cppbuiltin.c b/gcc/cppbuiltin.c
index 60e5bedc366..5a1b6555c53 100644
--- a/gcc/cppbuiltin.c
+++ b/gcc/cppbuiltin.c
@@ -93,6 +93,9 @@ define_builtin_macros_for_compilation_flags (cpp_reader 
*pfile)
   if (flag_sanitize & SANITIZE_ADDRESS)
 cpp_define (pfile, "__SANITIZE_ADDRESS__");

+  if (flag_sanitize & SANITIZE_LEAK)
+cpp_define (pfile, "__SANITIZE_LEAK__");
+
   if (flag_sanitize & SANITIZE_THREAD)
 cpp_define (pfile, "__SANITIZE_THREAD__");

diff --git a/gcc/doc/cpp.texi b/gcc/doc/cpp.texi
index f2de39a270c..74861eaf9c5 100644
--- a/gcc/doc/cpp.texi
+++ b/gcc/doc/cpp.texi
@@ -2359,6 +2359,9 @@ in use.
 This macro is defined, with value 1, when @option{-fsanitize=address}
 or @option{-fsanitize=kernel-address} are in use.

+@item __SANITIZE_LEAK__
+This macro is defined, with value 1, when @option{-fsanitize=leak} is in use.
+
 @item __SANITIZE_THREAD__
 This macro is defined, with value 1, when @option{-fsanitize=thread} is in use.

diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 05c25ee28ce..4b37e1f3643 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,7 @@
+2019-09-20  Kamil Rytarowski  
+
+   * c-c++-common/lsan/sanitize-leak-macro.c: New test.
+
 2019-09-20  Iain Sandoe  

* gcc.target/i386/naked-1.c: Alter options to use non-
diff --git a/gcc/testsuite/c-c++-common/lsan/sanitize-lsan-macro.c 
b/gcc/testsuite/c-c++-common/lsan/sanitize-lsan-macro.c
new file mode 100644
index 000..c588aa32e86
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/lsan/sanitize-lsan-macro.c
@@ -0,0 +1,12 @@
+/* Check that -fsanitize=leak options defines __SANITIZE_LEAK__ macros.  */
+/* { dg-do compile } */
+/* { dg-skip-if "" { *-*-* } { "*" } { "-O0" } } */
+
+int
+main ()
+{
+#ifndef __SANITIZE_LEAK__
+  bad construction
+#endif
+  return 0;
+}
--
2.23.0


[PATCH] [og9] Handle references in OpenACC "private" clauses

2019-09-20 Thread Julian Brown
This patch rewrites reference-type variables appearing in OpenACC
"private" clauses in a similar way to how such variables are handled in
reduction clauses. Otherwise, the mechanism used to privatize reference
variables is currently ill-suited to the worker-partitioning mechanism
used for AMD GCN, and each worker ends up accessing worker 0's copy of
those reference variables via broadcast pointers. Rewriting reference
variables to non-reference-type scalars sidesteps that problem.

This is intended as a somewhat temporary solution: it works for the
newly-included tests, but is not very elegant.

Tested with offloading to AMD GCN. I will apply to the
openacc-gcc-9-branch shortly.

Cheers,

Julian

ChangeLog

gcc/
* gimplify.c (localize_reductions): Rewrite references for
OMP_CLAUSE_PRIVATE also.

libgomp/
* testsuite/libgomp.oacc-fortran/privatized-ref-1.f95: New test.
* testsuite/libgomp.oacc-c++/privatized-ref-2.C: New test.
* testsuite/libgomp.oacc-c++/privatized-ref-3.C: New test.
---
 gcc/ChangeLog.openacc |  5 ++
 gcc/gimplify.c| 15 
 libgomp/ChangeLog.openacc |  6 ++
 .../libgomp.oacc-c++/privatized-ref-2.C   | 64 +
 .../libgomp.oacc-c++/privatized-ref-3.C   | 64 +
 .../libgomp.oacc-fortran/privatized-ref-1.f95 | 71 +++
 6 files changed, 225 insertions(+)
 create mode 100644 libgomp/testsuite/libgomp.oacc-c++/privatized-ref-2.C
 create mode 100644 libgomp/testsuite/libgomp.oacc-c++/privatized-ref-3.C
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-1.f95

diff --git a/gcc/ChangeLog.openacc b/gcc/ChangeLog.openacc
index fe584959153..523b6eb1d74 100644
--- a/gcc/ChangeLog.openacc
+++ b/gcc/ChangeLog.openacc
@@ -1,3 +1,8 @@
+2019-09-20  Julian Brown  
+
+   * gimplify.c (localize_reductions): Rewrite references for
+   OMP_CLAUSE_PRIVATE also.
+
 2019-09-17  Tobias Burnus  
 
* config/gcn/gcn.c (gcn_expand_scalar_to_vector_address,
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index d16611d3617..d95ad5d4baa 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -10879,6 +10879,21 @@ localize_reductions (tree clauses, tree body)
 
OMP_CLAUSE_REDUCTION_PRIVATE_DECL (c) = new_var;
   }
+else if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_PRIVATE)
+  {
+   var = OMP_CLAUSE_DECL (c);
+
+   if (!lang_hooks.decls.omp_privatize_by_reference (var))
+ continue;
+
+   type = TREE_TYPE (TREE_TYPE (var));
+   new_var = create_tmp_var (type, IDENTIFIER_POINTER (DECL_NAME (var)));
+
+   pr.ref_var = var;
+   pr.local_var = new_var;
+
+   walk_tree (&body, localize_reductions_r, &pr, NULL);
+  }
 }
 
 
diff --git a/libgomp/ChangeLog.openacc b/libgomp/ChangeLog.openacc
index 7813760e642..d9d1c353e31 100644
--- a/libgomp/ChangeLog.openacc
+++ b/libgomp/ChangeLog.openacc
@@ -1,3 +1,9 @@
+2019-09-20  Julian Brown  
+
+   * testsuite/libgomp.oacc-fortran/privatized-ref-1.f95: New test.
+   * testsuite/libgomp.oacc-c++/privatized-ref-2.C: New test.
+   * testsuite/libgomp.oacc-c++/privatized-ref-3.C: New test.
+
 2019-09-19  Julian Brown  
 
* plugin/plugin-nvptx.c (GOMP_OFFLOAD_openacc_async_host2dev):
diff --git a/libgomp/testsuite/libgomp.oacc-c++/privatized-ref-2.C 
b/libgomp/testsuite/libgomp.oacc-c++/privatized-ref-2.C
new file mode 100644
index 000..3884f163132
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c++/privatized-ref-2.C
@@ -0,0 +1,64 @@
+/* { dg-do run } */
+
+#include 
+
+void workers (void)
+{
+  double res[65536];
+  int i;
+
+#pragma acc parallel copyout(res) num_gangs(64) num_workers(64)
+  {
+int i, j;
+#pragma acc loop gang
+for (i = 0; i < 256; i++)
+  {
+#pragma acc loop worker
+   for (j = 0; j < 256; j++)
+ {
+   int tmpvar;
+   int &tmpref = tmpvar;
+   tmpref = (i * 256 + j) * 99;
+   res[i * 256 + j] = tmpref;
+ }
+  }
+  }
+
+  for (i = 0; i < 65536; i++)
+if (res[i] != i * 99)
+  abort ();
+}
+
+void vectors (void)
+{
+  double res[65536];
+  int i;
+
+#pragma acc parallel copyout(res) num_gangs(64) num_workers(64)
+  {
+int i, j;
+#pragma acc loop gang worker
+for (i = 0; i < 256; i++)
+  {
+#pragma acc loop vector
+   for (j = 0; j < 256; j++)
+ {
+   int tmpvar;
+   int &tmpref = tmpvar;
+   tmpref = (i * 256 + j) * 101;
+   res[i * 256 + j] = tmpref;
+ }
+  }
+  }
+
+  for (i = 0; i < 65536; i++)
+if (res[i] != i * 101)
+  abort ();
+}
+
+int main (int argc, char *argv[])
+{
+  workers ();
+  vectors ();
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c++/privatized-ref-3.C 
b/libgomp/testsuite/libgomp.oacc-c++/privatized-ref-3.C
new file mode 100644
index 000..c1a10cba31b
--- /dev/null
+++ b/libgomp/testsuite/lib

Re: [SVE] PR91532

2019-09-20 Thread Jeff Law
On 9/19/19 10:19 AM, Prathamesh Kulkarni wrote:
> Hi,
> For PR91532, the dead store is trivially deleted if we place dse pass
> between ifcvt and vect. Would it be OK to add another instance of dse there ?
> Or should we add an ad-hoc "basic-block dse" sub-pass to ifcvt that
> will clean up the dead store ?
I'd hesitate to add another DSE pass.  If there's one nearby could we
move the existing pass?


Jeff


Re: AW: Fix for type confusion bug on microblaze

2019-09-20 Thread Jeff Law
On 9/12/19 1:48 AM, Jonas Pfeil wrote:
> Yes, you are correct. Tested it and it works as intended.
> 
> Thanks,
> 
> Jonas
> 
> --- a/gcc/config/microblaze/microblaze.h
> +++ b/gcc/config/microblaze/microblaze.h
> @@ -695,7 +695,7 @@ do {  
>   \
>fprintf (STREAM, "\t.align\t%d\n", (LOG))
>  
>  #define ASM_OUTPUT_SKIP(STREAM,SIZE)   \
> -  fprintf (STREAM, "\t.space\t%lu\n", (SIZE))
> +  fprintf (STREAM, "\t.space\t" HOST_WIDE_INT_PRINT_UNSIGNED "\n", (SIZE))
>  
>  #define ASCII_DATA_ASM_OP  "\t.ascii\t"
>  #define STRING_ASM_OP  "\t.asciz\t"
THanks.  I'll be installing this momentarily on the trunk.

jeff


[PATCH] Remove spurious extended character(s) for pa.c

2019-09-20 Thread John David Anglin
This fixes build failure on trunk.  Patch committed to all active branches.

Dave
-- 
John David Anglin  dave.ang...@bell.net

2019-09-20  John David Anglin  

* config/pa/pa.c (pa_trampoline_init): Remove spurious extended
character.

Index: config/pa/pa.c
===
--- config/pa/pa.c  (revision 275998)
+++ config/pa/pa.c  (working copy)
@@ -10197,7 +10197,7 @@
 }

 #ifdef HAVE_ENABLE_EXECUTE_STACK
-  emit_library_call (gen_rtx_SYMBOL_REF (Pmode, "__enable_execute_stack"),
+  emit_library_call (gen_rtx_SYMBOL_REF (Pmode, "__enable_execute_stack"),
 LCT_NORMAL, VOIDmode, XEXP (m_tramp, 0), Pmode);
 #endif
 }


Re: [PATCH] Microblaze: Type confusion in fprintf

2019-09-20 Thread Jeff Law
On 9/12/19 7:33 AM, Jonas Pfeil wrote:
> A type confusion leads to illegal (and nonsensical) assembly on certain host
> architectures (e.g. ARM), where HOST_WIDE_INT (64 bit) and unsigned long (32
> bit) have different alignments. So this has printed an uninitialized
> register instead of the intended value. This fixes float constant
> initialization on an ARM->Microblaze cross compiler.
> 
> --- a/gcc/config/microblaze/microblaze.c
> +++ b/gcc/config/microblaze/microblaze.c
> @@ -2468,7 +2468,7 @@ print_operand (FILE * file, rtx op, int letter)
>   unsigned long value_long;
>   REAL_VALUE_TO_TARGET_SINGLE (*CONST_DOUBLE_REAL_VALUE (op),
>value_long);
> - fprintf (file, HOST_WIDE_INT_PRINT_HEX, value_long);
> + fprintf (file, "%#lx", value_long);
> }
>else
> {
> 
But shouldn't HOST_WIDE_INT_PRINT_HEX here result in the same thing?

>From hwint.h:

#ifndef HAVE_INTTYPES_H
#if INT64_T_IS_LONG
# define GCC_PRI64 HOST_LONG_FORMAT
#else
# define GCC_PRI64 HOST_LONG_LONG_FORMAT
#endif

[ ... ]

#undef PRIx64
#define PRIx64 GCC_PRI64 "x"
[ ... ]

#define HOST_WIDE_INT_PRINT_HEX "%#" PRIx64


If that's not giving us back %#lx, then it seems to me like something is
wrong elsewhere.

jeff


Re: [C][C++] Allow targets to check calls to BUILT_IN_MD functions

2019-09-20 Thread Jeff Law
On 9/17/19 8:49 AM, Richard Sandiford wrote:
> For SVE, we'd like the frontends to check calls to target-specific
> built-in functions in the same way that they already do for "normal"
> builtins.  This patch adds a target hook for that and extends
> check_builtin_function_arguments accordingly.
> 
> A slight complication is that when TARGET_RESOLVE_OVERLOADED_BUILTIN
> has resolved an overload, it can use build_function_call_vec to build
> the call to the underlying non-overloaded function decl.  This in
> turn coerces the arguments to the function type and then calls
> check_builtin_function_arguments to check the final call.  If the
> target does find a problem in this final call, it can be useful
> to refer to the original overloaded function decl in diagnostics,
> since that's what the user wrote.  The patch therefore passes the
> original decl as a final optional parameter to build_function_call_vec.
> 
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  Also tested with the
> follow-on SVE changes that make use of this.  OK to install?
> 
> Richard
> 
> 
> 2019-09-17  Richard Sandiford  
> 
> gcc/
>   * target.def (check_builtin_call): New target hook.
>   * doc/tm.texi.in (TARGET_CHECK_BUILTIN_CALL): New @hook.
>   * doc/tm.texi: Regenerate.
> 
> gcc/c-family/
>   * c-common.h (build_function_call_vec): Take the original
>   function decl as an optional final parameter.
>   (check_builtin_function_arguments): Take the original function decl.
>   * c-common.c (check_builtin_function_arguments): Likewise.
>   Handle all built-in functions, not just BUILT_IN_NORMAL ones.
>   Use targetm.check_builtin_call to check BUILT_IN_MD functions.
> 
> gcc/c/
>   * c-typeck.c (build_function_call_vec): Take the original function
>   decl as an optional final parameter.  Pass all built-in calls to
>   check_builtin_function_arguments.
> 
> gcc/cp/
>   * cp-tree.h (build_cxx_call): Take the original function decl
>   as an optional final parameter.
>   (cp_build_function_call_vec): Likewise.
>   * call.c (build_cxx_call): Likewise.  Pass all built-in calls to
>   check_builtin_function_arguments.
>   * typeck.c (build_function_call_vec): Take the original function
>   decl as an optional final parameter and pass it to
>   cp_build_function_call_vec.
>   (cp_build_function_call_vec): Take the original function
>   decl as an optional final parameter and pass it to build_cxx_call.
LGTM.

jeff


Re: [PATCH] Sync MIPS support from libffi master repository

2019-09-20 Thread Jeff Law
On 9/11/19 1:12 PM, Aurelien Jarno wrote:
> On 2019-08-08 16:22, Jeff Law wrote:
>> On 8/8/19 4:18 PM, Aurelien Jarno wrote:
>>>
>>> It is used by libgo, but it seems to be the last user.
>> Ah, yes, I should have remembered libgo.  And it links via
>> ../../blahblah.  Thanks for digging into it.
>>
>> So, yea, we need it.  So I think the best path forward is to just resync
>> everything to the upstream master.
> 
> I have noticed that parisc and riscv support have commits that haven't
> been propagated on the upstream libffi side. Things might be easier to
> sync if those changes get merged on the libffi side.
> 
FWIW, I bugged Anthony & DJ about libffi a bit during our team meetings
earlier this week.  Essentially my goal was to emphasize to them that
getting a release out the door was desperately needed.

I doubt that changes any of the calculus here.

At least according out the ChangeLog we've only backported RISC-V bits
from upstream.  It's not as clear for the PA bits John checked in a few
years ago.


jeff


Re: [PATCH] V4, patch #1: Rework prefixed/pc-relative lookup

2019-09-20 Thread Segher Boessenkool
Hi Mike,

On Wed, Sep 18, 2019 at 07:49:18PM -0400, Michael Meissner wrote:
> This patch reworks the prefixed and pc-relative memory matching functions.

This mostly looks fine, thanks!  A few smaller things:


>   (pcrel_external_address): Replace with new implementation using
>   address_to_insn_form..

(Two dots is one too many).


> +(define_predicate "pcrel_local_or_external_address"
> +  (match_code "label_ref,symbol_ref,const")
> +{
> +  enum insn_form iform = address_to_insn_form (op, mode, 
> NON_PREFIXED_DEFAULT);
> +  return (iform == INSN_FORM_PCREL_EXTERNAL || iform == 
> INSN_FORM_PCREL_LOCAL);
> +})

(define_predicate "pcrel_local_or_external_address"
  (ior (match_operand 0 "pcrel_local_address")
   (match_operand 0 "pcrel_external_address")))

(or similar) please.  genpreds will generate effectively the same code
as you had automatically.


> +/* Different PowerPC instruction formats that are used by GCC.  There are
> +   various other instruction formats used by the PowerPC hardware, but the
> +   these formats are not currently used by GCC.  */
> +
> +enum insn_form {
> +  INSN_FORM_BAD, /* Bad instruction format.  */
> +  INSN_FORM_BASE_REG,/* Base register only.  */
> +  INSN_FORM_D,   /* Base register + 16-bit numeric 
> offset.  */
> +  INSN_FORM_DS,  /* Base register + 14-bit offset + 00.  
> */
> +  INSN_FORM_DQ,  /* Base register + 12-bit offset + 
> .  */

It may be easier to describe DS-form as "D-form, with the offset aligned
to a (single) word" and DQ-form as "D-form, with the offset aligned to a
quad-word".  (Or what you do below; see below).

> +  INSN_FORM_X,   /* Base register + index register.  */
> +  INSN_FORM_UPDATE,  /* Address udpates base register.  */

(typo, "updates").

> +  INSN_FORM_LO_SUM,  /* Special offset instruction.  */

That's a somewhat lame description :-)  It's not really a separate form
insn anyway, hrm.  Can you come up with a better comment?  I have no
suggestions, so yeah maybe just keep it like you have.


> +  INSN_FORM_PREFIXED_NUMERIC,/* Base register + 34 bit numeric 
> offset.  */
> +  INSN_FORM_PCREL_LOCAL, /* Pc-relative local symbol.  */
> +  INSN_FORM_PCREL_EXTERNAL   /* Pc-relative external symbol.  */
> +};

Either pc or PC please.  It's an initialism.


> +/* Instruction format for the non-prefixed version of a load or store.  This 
> is
> +   used to determine if a 16-bit offset is valid to be used with a 
> non-prefixed
> +   (traditional) instruction or if the bottom bits of the offset cannot be 
> used
> +   with a DS or DQ instruction format, and GCC has to use a prefixed
> +   instruction for the load or store.  */
> +
> +enum non_prefixed {
> +  NON_PREFIXED_DEFAULT,  /* Use the default.  */
> +  NON_PREFIXED_D,/* All 16-bits are valid.  */
> +  NON_PREFIXED_DS,   /* Bottom 2 bits must be 0.  */
> +  NON_PREFIXED_DQ,   /* Bottom 4 bits must be 0.  */
> +  NON_PREFIXED_X /* No offset memory form exists.  */
> +};

Yeah the DS- and DQ-form descriptions here are nicer I think, thanks.

Maybe non_prefixed_form is a clearer name?  But it is longer of course.
You decide.


> +  if (SYMBOL_REF_P (x) && !SYMBOL_REF_LOCAL_P (x))
> + fputs ("@got", file);
> +
>fputs ("@pcrel", file);

I'd just use fprintf btw, GCC knows since decades to optimise that to
fputs, and it is easier to read IMO.  Not that fputs is so super bad,
but every little we do not have to think helps (helps us think, think
about more important matters!)


> +/* Given an address (ADDR), a mode (MODE), and what the format of the
> +   non-prefixed address (NON_PREFIXED_INSN) is, return the instruction format
> +   for the address.  */
> +
> +enum insn_form
> +address_to_insn_form (rtx addr,
> +   machine_mode mode,
> +   enum non_prefixed non_prefixed_insn)

non_prefixed_form, instead?


> +{
> +  rtx op0, op1;

You can declare these at first use.  Declaring things in multiple blocks
(so with shorter scopes) is a bit nicer.

> +  /* If we don't support offset addressing, make sure only indexed addressing
> + is allowed.  We special case SDmode so that the register allocator does
> + try to move SDmode through GPR registers, but instead uses the 32-bit
> + integer read/write instructions for the floating point registers.  */

Does *not* try?

Read/write, do you mean load/store?  lfiwzx and friends?


> +  if (GET_CODE (addr) != PLUS)
> +return GET_CODE (addr) == LO_SUM ? INSN_FORM_LO_SUM : INSN_FORM_BAD;

  if (GET_CODE (addr) == LO_SUM)
return INSN_FORM_LO_SUM;

  if (GET_CODE (addr) != PLUS)
return INSN_FORM_BAD;

(Avoid using the conditional operator if you can use an "if" statement
just as well; easier to read).

> +  op0 = XEXP (addr, 0);
> +  op1 = XEXP (addr, 1);
> +
> +  if (REG_P (op1) || SUBR

[PATCH, AArch64] Fix PR target/91834

2019-09-20 Thread Richard Henderson
As diagnosed in the PR.

* config/aarch64/lse.S (LDNM): Ensure STXR output does not
overlap the inputs.
---
 libgcc/config/aarch64/lse.S | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libgcc/config/aarch64/lse.S b/libgcc/config/aarch64/lse.S
index a5f6673596c..c7979382ad7 100644
--- a/libgcc/config/aarch64/lse.S
+++ b/libgcc/config/aarch64/lse.S
@@ -227,8 +227,8 @@ STARTFN NAME(LDNM)
 8: mov s(tmp0), s(0)
 0: LDXRs(0), [x1]
OP  s(tmp1), s(0), s(tmp0)
-   STXRw(tmp1), s(tmp1), [x1]
-   cbnzw(tmp1), 0b
+   STXRw(tmp2), s(tmp1), [x1]
+   cbnzw(tmp2), 0b
ret
 
 ENDFN  NAME(LDNM)
-- 
2.17.1



[PATCH, AArch64] PR target/91833

2019-09-20 Thread Richard Henderson
Tested on aarch64-linux (glibc) and aarch64-elf (installed newlib).

The existing configure claims to be generated by 2.69, but there
are changes wrt the autoconf distributed with Ubuntu 18.  Nothing
that seems untoward though.


r~


* config/aarch64/lse-init.c: Include auto-target.h.  Disable
initialization if !HAVE_SYS_AUXV_H.
* configure.ac (AC_CHECK_HEADERS): Add sys/auxv.h.
* config.in, configure: Rebuild.
---
 libgcc/config/aarch64/lse-init.c |  4 +++-
 libgcc/config.in |  8 
 libgcc/configure | 26 +++---
 libgcc/configure.ac  |  2 +-
 4 files changed, 31 insertions(+), 9 deletions(-)
 mode change 100644 => 100755 libgcc/configure

diff --git a/libgcc/config/aarch64/lse-init.c b/libgcc/config/aarch64/lse-init.c
index 33d29147479..1a8f4c55213 100644
--- a/libgcc/config/aarch64/lse-init.c
+++ b/libgcc/config/aarch64/lse-init.c
@@ -23,12 +23,14 @@ a copy of the GCC Runtime Library Exception along with this 
program;
 see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 .  */
 
+#include "auto-target.h"
+
 /* Define the symbol gating the LSE implementations.  */
 _Bool __aarch64_have_lse_atomics
   __attribute__((visibility("hidden"), nocommon));
 
 /* Disable initialization of __aarch64_have_lse_atomics during bootstrap.  */
-#ifndef inhibit_libc
+#if !defined(inhibit_libc) && defined(HAVE_SYS_AUXV_H)
 # include 
 
 /* Disable initialization if the system headers are too old.  */
diff --git a/libgcc/config.in b/libgcc/config.in
index d634af9d949..59a3d8daf52 100644
--- a/libgcc/config.in
+++ b/libgcc/config.in
@@ -43,6 +43,9 @@
 /* Define to 1 if you have the  header file. */
 #undef HAVE_STRING_H
 
+/* Define to 1 if you have the  header file. */
+#undef HAVE_SYS_AUXV_H
+
 /* Define to 1 if you have the  header file. */
 #undef HAVE_SYS_STAT_H
 
@@ -82,6 +85,11 @@
 /* Define to 1 if the target use emutls for thread-local storage. */
 #undef USE_EMUTLS
 
+/* Enable large inode numbers on Mac OS X 10.5.  */
+#ifndef _DARWIN_USE_64_BIT_INODE
+# define _DARWIN_USE_64_BIT_INODE 1
+#endif
+
 /* Number of bits in a file offset, on hosts where this is settable. */
 #undef _FILE_OFFSET_BITS
 
diff --git a/libgcc/configure b/libgcc/configure
old mode 100644
new mode 100755
index 29f647319b4..28c7394b3f9
--- a/libgcc/configure
+++ b/libgcc/configure
@@ -675,6 +675,7 @@ infodir
 docdir
 oldincludedir
 includedir
+runstatedir
 localstatedir
 sharedstatedir
 sysconfdir
@@ -765,6 +766,7 @@ datadir='${datarootdir}'
 sysconfdir='${prefix}/etc'
 sharedstatedir='${prefix}/com'
 localstatedir='${prefix}/var'
+runstatedir='${localstatedir}/run'
 includedir='${prefix}/include'
 oldincludedir='/usr/include'
 docdir='${datarootdir}/doc/${PACKAGE_TARNAME}'
@@ -1017,6 +1019,15 @@ do
   | -silent | --silent | --silen | --sile | --sil)
 silent=yes ;;
 
+  -runstatedir | --runstatedir | --runstatedi | --runstated \
+  | --runstate | --runstat | --runsta | --runst | --runs \
+  | --run | --ru | --r)
+ac_prev=runstatedir ;;
+  -runstatedir=* | --runstatedir=* | --runstatedi=* | --runstated=* \
+  | --runstate=* | --runstat=* | --runsta=* | --runst=* | --runs=* \
+  | --run=* | --ru=* | --r=*)
+runstatedir=$ac_optarg ;;
+
   -sbindir | --sbindir | --sbindi | --sbind | --sbin | --sbi | --sb)
 ac_prev=sbindir ;;
   -sbindir=* | --sbindir=* | --sbindi=* | --sbind=* | --sbin=* \
@@ -1154,7 +1165,7 @@ fi
 for ac_var in  exec_prefix prefix bindir sbindir libexecdir datarootdir \
datadir sysconfdir sharedstatedir localstatedir includedir \
oldincludedir docdir infodir htmldir dvidir pdfdir psdir \
-   libdir localedir mandir
+   libdir localedir mandir runstatedir
 do
   eval ac_val=\$$ac_var
   # Remove trailing slashes.
@@ -1307,6 +1318,7 @@ Fine tuning of the installation directories:
   --sysconfdir=DIRread-only single-machine data [PREFIX/etc]
   --sharedstatedir=DIRmodifiable architecture-independent data [PREFIX/com]
   --localstatedir=DIR modifiable single-machine data [PREFIX/var]
+  --runstatedir=DIR   modifiable per-process data [LOCALSTATEDIR/run]
   --libdir=DIRobject code libraries [EPREFIX/lib]
   --includedir=DIRC header files [PREFIX/include]
   --oldincludedir=DIR C header files for non-gcc [/usr/include]
@@ -4173,7 +4185,7 @@ else
 We can't simply define LARGE_OFF_T to be 9223372036854775807,
 since some C++ compilers masquerading as C compilers
 incorrectly reject 9223372036854775807.  */
-#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62))
+#define LARGE_OFF_T off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 31))
   int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721
   && LARGE_OFF_T % 2147483647 == 1)
  ? 1 : -1];
@@ -4219,7 +4231,7 @@ else
 We can't simply define LARGE_OFF_T to be

[PATCH] Fix up sqrt(x) < c and sqrt(x) >= c match.pd folding (PR tree-optimization/91734, take 2)

2019-09-20 Thread Jakub Jelinek
On Mon, Sep 16, 2019 at 08:56:58AM +0200, Richard Biener wrote:
> > As mentioned in the PR, the sqrt (x) < c optimization into x < c*c
> > sometimes breaks the boundary case, if c2=c*c is inexact then in some cases
> > we need to optimize it into x <= c*c rather than x < c*c.  The original
> > bugreport is when c is small and c2 is 0.0, then obviously we need <= 0.0
> > rather than < 0.0, but the testcase includes another example where it makes
> > a difference, plus has a >= testcase too.
> > 
> > Bootstrapped/regtested on powerpc64le-linux, ok for trunk?
> 
> I was hoping Joseph might chime in here...  anyway, does this assume
> round-to-nearest or does it work with round to +-Inf as well?  I
> realize this all is under flag_unsafe_math_optimizations, but
> this flag is notoriously underspecified...  So the question is
> whether we should disable the transform if c*c isn't exact and
> flag_rounding_math?  The transform also doesn't seem to guard
> against isnan (c) (-funsafe-math-optimizations sets
> -fno-trapping-math and -fno-signed-zeros but not -ffinite-math-only
> or disables itself on -frounding-math)

Here is an updated patch, which on top of the previous patch:
1) punts for -frounding-math
2) punts for sqrt comparisons against NaN constant
3) for the c*c inexact also handles the other two comparisons that
apparently need to be handled too
4) for all 4 comparisons also checks nexttoward (c2, 0.0) or nexttoward (c2,
inf) depending on the comparison kind, because as Joseph correctly noted,
with rounding to nearest up to 3 different floating point values can have
the same sqrt result, and if c2 is the middle one from them, we need to use
the 1 ulp smaller or larger one in the comparison
5) had to adjust the testcase, because while it worked fine on powerpc64le,
on x86_64 if the test is linked with -ffast-math/-Ofast etc., crtfastmath.o
is linked in and subnormals are flushed to zero, which is not what we want
for the testcase (at least for a subset of the tests).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

BTW, I've used attached programs to look for the problematic cases on random
float/doubles and the cases the patch handles seem to be the only
problematic ones, there is never need to go further than one nexttoward to 0
or inf.

2019-09-21  Jakub Jelinek  

PR tree-optimization/91734
* generic-match-head.c: Include fold-const-call.h.
* match.pd (sqrt(x) cmp c): Check the boundary value and
in case inexact computation of c*c affects comparison of the boundary,
turn LT_EXPR into LE_EXPR, GE_EXPR into GT_EXPR, LE_EXPR into LT_EXPR
or GT_EXPR into GE_EXPR.  Punt for sqrt comparisons against NaN and
for -frounding-math.  For c2, try the next smaller or larger floating
point constant depending on comparison code and if it has the same
sqrt as c2, use it instead of c2.

* gcc.dg/pr91734.c: New test.

--- gcc/generic-match-head.c.jj 2019-09-20 12:24:56.376189996 +0200
+++ gcc/generic-match-head.c2019-09-20 12:43:08.017273166 +0200
@@ -29,6 +29,7 @@ along with GCC; see the file COPYING3.
 #include "cgraph.h"
 #include "vec-perm-indices.h"
 #include "fold-const.h"
+#include "fold-const-call.h"
 #include "stor-layout.h"
 #include "tree-dfa.h"
 #include "builtins.h"
--- gcc/match.pd.jj 2019-09-20 12:25:27.323710388 +0200
+++ gcc/match.pd2019-09-20 17:20:22.974316837 +0200
@@ -3711,8 +3711,7 @@ (define_operator_list COND_TERNARY
  (cmp { tem; } @1)
 
  /* Fold comparisons against built-in math functions.  */
- (if (flag_unsafe_math_optimizations
-  && ! flag_errno_math)
+ (if (flag_unsafe_math_optimizations && ! flag_errno_math)
   (for sq (SQRT)
(simplify
 (cmp (sq @0) REAL_CST@1)
@@ -3747,56 +3746,108 @@ (define_operator_list COND_TERNARY
  if x is negative or NaN.  Due to -funsafe-math-optimizations,
  the results for other x follow from natural arithmetic.  */
(cmp @0 @1)))
- (if (cmp == GT_EXPR || cmp == GE_EXPR)
+ (if ((cmp == LT_EXPR
+  || cmp == LE_EXPR
+  || cmp == GT_EXPR
+  || cmp == GE_EXPR)
+ && !REAL_VALUE_ISNAN (TREE_REAL_CST (@1))
+ /* Give up for -frounding-math.  */
+ && !HONOR_SIGN_DEPENDENT_ROUNDING (TREE_TYPE (@0)))
   (with
{
- REAL_VALUE_TYPE c2;
+REAL_VALUE_TYPE c2;
+enum tree_code ncmp = cmp;
+const real_format *fmt
+  = REAL_MODE_FORMAT (TYPE_MODE (TREE_TYPE (@0)));
 real_arithmetic (&c2, MULT_EXPR,
  &TREE_REAL_CST (@1), &TREE_REAL_CST (@1));
-real_convert (&c2, TYPE_MODE (TREE_TYPE (@0)), &c2);
+real_convert (&c2, fmt, &c2);
+/* See PR91734: if c2 is inexact and sqrt(c2) < c (or sqrt(c2) >= c),
+   then change LT_EXPR into LE_EXPR or GE_EXPR into GT_EXPR.  */
+if (!REAL_VALUE_ISINF (c2))
+  {
+tree c3 = fold_const_call (CFN

Re: [C++ PATCH 4/4] PR c++/30277 - int-width bit-field promotion.

2019-09-20 Thread Jakub Jelinek
On Mon, Sep 16, 2019 at 12:33:28AM -0400, Jason Merrill wrote:
> Here, if cp_perform_integral_promotions saw that the TREE_TYPE of a
> bit-field reference was the same as the type it promotes to, it didn't do
> anything.  But then decay_conversion saw that the bit-field reference was
> unchanged, and converted it to its declared type.  So I needed to add
> something to make it clear that promotion has been done.  But then the 33819
> change caused trouble by looking through the NOP_EXPR I just added.  This
> was the wrong fix for that bug; I've now fixed that better by recognizing in
> cp_perform_integral_promotions that we won't promote a bit-field larger than
> 32 bits, so we should use the declared type.
> 
> Tested x86_64-pc-linux-gnu, applying to trunk.
> 
>   PR c++/33819 - long bit-field promotion.
>   * typeck.c (cp_perform_integral_promotions): Handle large bit-fields
>   properly.  Handle 32-bit non-int bit-fields properly.
>   (is_bitfield_expr_with_lowered_type): Don't look through NOP_EXPR.

> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/expr/bitfield14.C
> @@ -0,0 +1,17 @@
> +// PR c++/30277
> +// { dg-do compile { target c++11 } }
> +
> +struct S
> +{
> +  signed long l: 32;
> +};
> +
> +void foo(long) = delete;
> +void foo(int) {}
> +
> +int main()
> +{
> +  S x = {1};
> +  foo(x.l+0);
> +  return 0;
> +}

This testcase fails on all targets where int and long have the same
precision.  Is that a bug on the compiler side, or should the testcase just
use a type with a larger precision than int (i.e. long long), like following
(tested on x86_64-linux and i686-linux):

2019-09-21  Jakub Jelinek  

PR c++/30277
* g++.dg/expr/bitfield14.C (struct S): Use signed long long instead
of signed long.
(foo): Use long long instead of long.

--- gcc/testsuite/g++.dg/expr/bitfield14.C.jj   2019-09-20 12:25:24.833748976 
+0200
+++ gcc/testsuite/g++.dg/expr/bitfield14.C  2019-09-21 08:17:18.500234760 
+0200
@@ -3,10 +3,10 @@
 
 struct S
 {
-  signed long l: 32;
+  signed long long l: 32;
 };
 
-void foo(long) = delete;
+void foo(long long) = delete;
 void foo(int) {}
 
 int main()


Jakub