Re: Canonicalization of compares performed as side-effect operations

2019-08-07 Thread Uros Bizjak
On Tue, Aug 6, 2019 at 11:03 PM Richard Earnshaw (lists)
 wrote:
>
> On 06/08/2019 18:39, Uros Bizjak wrote:
> > Hello!
> >
> >> On Tue, Aug 06, 2019 at 05:49:17PM +0100, Richard Earnshaw (lists) wrote:
> >>> On 06/08/2019 17:39, Segher Boessenkool wrote:
> >> What's wrong with describing the canonical form in your MD?  You'll 
> >> need
> >> some reversed condition code thingy, but that's it?
> >
> > It doesn't describe what the instruction does.  The negation has a side
> > effect of setting the flags, but the flags are swapped because the
> > side-effect comparison is swapped from a normal compare.  As I
> > mentioned, SELECT_CC_MODE doesn't help because it can't see the context
> > and the comparison just looks 'normal'.
> 
>  Sure, and we can work on making combine do what you want, but your 
>  existing
>  pattern is *incorrect*.  It needs fixing, and probably before we do other
>  things.
> >>>
> >>> Why is it incorrect?  It's not canonical, sure.  But the cannonical form
> >>> does NOT describe what the instruction does.
> >>
> >> More precisely: not having the canonical form of this in your MD is what
> >> is incorrect.
> >
> > There is TARGET_CANONICALIZE_COMPARISON target hook that can solve the
> > issue here. Please see x86, where we canonicalize
> > *cmp__i387 via
> > ix86_canonicalize_comparison. As said in the comment:
> >
> >   /* The order of operands in x87 ficom compare is forced by combine in
> >  simplify_comparison () function. Float operator is treated as RTX_OBJ
> >  with a precedence over other operators and is always put in the first
> >  place. Swap condition and operands to match ficom instruction.  */
> >
> > The issue looks similar to me.
> >
>
> That hook can't help as it can't see anything other than the operands of
> the compare, and from the operands it looks as though the operands
> should be swapped.  To correctly canonicalize this you need the whole
> parallel, or some other hint that the operation is a side-effect of
> another operation.

I see. FYI, there is the same problem in i386.md with "*neg2_cmpz"

--cut here--
;; The problem with neg is that it does not perform (compare x 0),
;; it really performs (compare 0 x), which leaves us with the zero
;; flag being the only useful item.

(define_insn "*neg2_cmpz"
  [(set (reg:CCZ FLAGS_REG)
(compare:CCZ
  (neg:SWI (match_operand:SWI 1 "nonimmediate_operand" "0"))
   (const_int 0)))
   (set (match_operand:SWI 0 "nonimmediate_operand" "=m")
(neg:SWI (match_dup 1)))]
--cut here--

However, when trying to generate the pattern using following testcase:

--cut here--
void foo (void);
void bar (void);

int
test (int a, int b)
{
  int r;

  if (r = -b)
foo ();
  else
bar ();

  return r;
}
--cut here--

combine doesn't even consider the combination, since the compare looks
at the original value:

(insn 7 4 8 2 (parallel [
(set (reg/v:SI 82 [  ])
(neg:SI (reg/v:SI 84 [ b ])))
(clobber (reg:CC 17 flags))
]) "neg.c":9:9 461 {*negsi2_1}
 (expr_list:REG_UNUSED (reg:CC 17 flags)
(nil)))
(insn 8 7 9 2 (set (reg:CCZ 17 flags)
(compare:CCZ (reg/v:SI 84 [ b ])
(const_int 0 [0]))) "neg.c":9:6 7 {*cmpsi_ccno_1}
 (expr_list:REG_DEAD (reg/v:SI 84 [ b ])
(nil)))

probably due to tree optimizers that give us:

  r_3 = -b_2(D);
  if (b_2(D) != 0)
goto ; [50.00%]
  else
goto ; [50.00%]

Uros.


[committed] Add support for use_device_addr clause

2019-08-07 Thread Jakub Jelinek
Hi!

This patch adds support for use_device_addr clause and restricts
use_device_ptr clause to pointers or for C++ references to pointers.
Before use_device_ptr handled both pointers and arrays and references to
them, the arrays as remapping just their address, newly the latter is what
is done by use_device_addr and can be done also with other types, not just
arrays.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2019-08-07  Jakub Jelinek  

* tree-core.h (enum omp_clause_code): Adjust OMP_CLAUSE_USE_DEVICE_PTR
OpenMP description.  Add OMP_CLAUSE_USE_DEVICE_ADDR clause.
* tree.c (omp_clause_num_ops, omp_clause_code_name): Add entries
for OMP_CLAUSE_USE_DEVICE_ADDR clause.
(walk_tree_1): Handle OMP_CLAUSE_USE_DEVICE_ADDR.
* tree-pretty-print.c (dump_omp_clause): Likewise.
* tree-nested.c (convert_nonlocal_omp_clauses,
convert_local_omp_clauses): Likewise.
* gimplify.c (gimplify_scan_omp_clauses, gimplify_adjust_omp_clauses):
Likewise.
* omp-low.c (scan_sharing_clauses, lower_omp_target): Likewise.
Treat OMP_CLAUSE_USE_DEVICE_ADDR like OMP_CLAUSE_USE_DEVICE_PTR
clause with array or reference to array types, no matter what type
except for reference it has.
gcc/c-family/
* c-pragma.h (enum pragma_omp_clause): Add
PRAGMA_OMP_CLAUSE_USE_DEVICE_ADDR.  Set PRAGMA_OACC_CLAUSE_USE_DEVICE
equal to PRAGMA_OMP_CLAUSE_USE_DEVICE_PTR instead of being a separate
enumeration value.
gcc/c/
* c-parser.c (c_parser_omp_clause_name): Parse use_device_addr clause.
(c_parser_omp_clause_use_device_addr): New function.
(c_parser_omp_all_clauses): Handle PRAGMA_OMP_CLAUSE_USE_DEVICE_ADDR.
(OMP_TARGET_DATA_CLAUSE_MASK): Add PRAGMA_OMP_CLAUSE_USE_DEVICE_ADDR.
(c_parser_omp_target_data): Handle PRAGMA_OMP_CLAUSE_USE_DEVICE_ADDR
like PRAGMA_OMP_CLAUSE_USE_DEVICE_PTR, adjust diagnostics about no
map or use_device_* clauses.
* c-typeck.c (c_finish_omp_clauses): For OMP_CLAUSE_USE_DEVICE_PTR
in OpenMP, require pointer type rather than pointer or array type.
Handle OMP_CLAUSE_USE_DEVICE_ADDR.
gcc/cp/
* parser.c (cp_parser_omp_clause_name): Parse use_device_addr clause.
(cp_parser_omp_all_clauses): Handle PRAGMA_OMP_CLAUSE_USE_DEVICE_ADDR.
(OMP_TARGET_DATA_CLAUSE_MASK): Add PRAGMA_OMP_CLAUSE_USE_DEVICE_ADDR.
(cp_parser_omp_target_data): Handle PRAGMA_OMP_CLAUSE_USE_DEVICE_ADDR
like PRAGMA_OMP_CLAUSE_USE_DEVICE_PTR, adjust diagnostics about no
map or use_device_* clauses.
* semantics.c (finish_omp_clauses): For OMP_CLAUSE_USE_DEVICE_PTR
in OpenMP, require pointer or reference to pointer type rather than
pointer or array or reference to pointer or array type. Handle
OMP_CLAUSE_USE_DEVICE_ADDR.
* pt.c (tsubst_omp_clauses): Handle OMP_CLAUSE_USE_DEVICE_ADDR.
gcc/testsuite/
* c-c++-common/gomp/target-data-1.c (foo): Use use_device_addr clause
instead of use_device_ptr clause where required by OpenMP 5.0, add
further tests for both use_device_ptr and use_device_addr clauses.
libgomp/
* testsuite/libgomp.c/target-18.c (struct S): New type.
(foo): Use use_device_addr clause instead of use_device_ptr clause
where required by OpenMP 5.0, add further tests for both use_device_ptr
and use_device_addr clauses.
* testsuite/libgomp.c++/target-9.C (struct S): New type.
(foo): Use use_device_addr clause instead of use_device_ptr clause
where required by OpenMP 5.0, add further tests for both use_device_ptr
and use_device_addr clauses.  Add t and u arguments.
(main): Adjust caller.

--- gcc/tree-core.h.jj  2019-08-06 09:22:22.306952590 +0200
+++ gcc/tree-core.h 2019-08-06 10:41:51.277680188 +0200
@@ -307,9 +307,12 @@ enum omp_clause_code {
   OMP_CLAUSE_MAP,
 
   /* OpenACC clause: use_device (variable-list).
- OpenMP clause: use_device_ptr (variable-list).  */
+ OpenMP clause: use_device_ptr (ptr-list).  */
   OMP_CLAUSE_USE_DEVICE_PTR,
 
+  /* OpenMP clause: use_device_addr (variable-list).  */
+  OMP_CLAUSE_USE_DEVICE_ADDR,
+
   /* OpenMP clause: is_device_ptr (variable-list).  */
   OMP_CLAUSE_IS_DEVICE_PTR,
 
--- gcc/tree.c.jj   2019-08-06 09:22:15.753052010 +0200
+++ gcc/tree.c  2019-08-06 10:41:51.270680293 +0200
@@ -299,6 +299,7 @@ unsigned const char omp_clause_num_ops[]
   2, /* OMP_CLAUSE_TO  */
   2, /* OMP_CLAUSE_MAP  */
   1, /* OMP_CLAUSE_USE_DEVICE_PTR  */
+  1, /* OMP_CLAUSE_USE_DEVICE_ADDR  */
   1, /* OMP_CLAUSE_IS_DEVICE_PTR  */
   1, /* OMP_CLAUSE_INCLUSIVE  */
   1, /* OMP_CLAUSE_EXCLUSIVE  */
@@ -382,6 +383,7 @@ const char * const omp_clause_code_name[
   "to",
   "map",
   "use_device_ptr",
+  "use_device_addr",
   "is_device_ptr",
   "inclusive",
   "exclusive",
@@ -12384,6 +12386,7 @@ wal

[PATCH] PR 53796 Make inquire(file=, recl=) conform to F2018

2019-08-07 Thread Janne Blomqvist
In my original patch to fix PR 53796 I forgot to fix the behavior for
unconnected units when inquiring via filename. This patch fixes that.

Regtested on x86_64-pc-linux-gnu, committed as obvious.

libgfortran/ChangeLog:

2019-08-07  Janne Blomqvist  

PR fortran/53796
* io/inquire.c (inquire_via_filename): Set recl to -1 for
unconnected units.

gcc/testsuite/ChangeLog:

2019-08-07  Janne Blomqvist  

PR fortran/53796
* gfortran.dg/inquire_recl_f2018.f90: Test for unconnected unit
with inquire via filename.
---
 gcc/testsuite/gfortran.dg/inquire_recl_f2018.f90 | 7 +++
 libgfortran/io/inquire.c | 4 +++-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gfortran.dg/inquire_recl_f2018.f90 
b/gcc/testsuite/gfortran.dg/inquire_recl_f2018.f90
index b744e920f7b..dfb4092a45f 100644
--- a/gcc/testsuite/gfortran.dg/inquire_recl_f2018.f90
+++ b/gcc/testsuite/gfortran.dg/inquire_recl_f2018.f90
@@ -39,4 +39,11 @@ program inqrecl
   if (r /= -2) then
  STOP 5
   end if
+
+  ! Also inquire by filename for a non-opened unit is considered
+  ! unconnected similar to the first test.
+  inquire(file='unconnectedfile.txt', recl=r)
+  if (r /= -1) then
+ stop 6
+  end if
 end program inqrecl
diff --git a/libgfortran/io/inquire.c b/libgfortran/io/inquire.c
index 05cfc14233f..c2174d0a307 100644
--- a/libgfortran/io/inquire.c
+++ b/libgfortran/io/inquire.c
@@ -706,7 +706,9 @@ inquire_via_filename (st_parameter_inquire *iqp)
 }
 
   if ((cf & IOPARM_INQUIRE_HAS_RECL_OUT) != 0)
-*iqp->recl_out = 0;
+/* F2018 (N2137) 12.10.2.26: If there is no connection, recl is
+   assigned the value -1.  */
+*iqp->recl_out = -1;
 
   if ((cf & IOPARM_INQUIRE_HAS_NEXTREC) != 0)
 *iqp->nextrec = 0;
-- 
2.17.1



Re: [PATCH] Fix file descriptor existence of MinGW.

2019-08-07 Thread Martin Liška
On 8/6/19 6:18 PM, Martin Sebor wrote:
> On 8/6/19 9:55 AM, Martin Liška wrote:
>> On 8/6/19 5:35 PM, Martin Sebor wrote:
>>> On 8/6/19 6:04 AM, Martin Liška wrote:
 Hi.

 The patch is about proper checking of file descriptors on Windows.

 Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>>>
>>> Is there a way to share the definition of the new function so
>>> it doesn't have to be duplicated?
>>
>> I would like to, but I'm wondering which header and source file
>> should I use for it?
> 
> I'm not sure.  It's a general purpose function that could go
> in some utility library like libiberty but I don't know what
> the process for enhancing it is to tell if that's appropriate.

Yep, I like the suggested approach. I've done that in the updated
patch.

> 
>>> Other than that, I'm wondering if the guard for F_GETFD is
>>> necessary and when (the original code didn't guard its use).
>>
>> Because the value is not defined on MinGW targets where _get_osfhandle
>> call must be used ;)
> 
> I was unclear -- sorry.  What I meant was: since there is
> a check for MinGW, is the check for F_GETFD necessary, or
> would this be sufficient and preferable?
> 
>   #if defined(_WIN32)
>     return _get_osfhandle (fd) != -1;
>   else
>     return fcntl (fd, F_GETFD) >= 0;
>   #endif
> 
> That way it's clear that only two possibilities are expected.

Ah yes, included.

Martin

> 
> Martin
> 
>>
>> Martin
>>
>>>
>>> Martin
>>>
>>>

 @Pekka: Can you please test it on Windows?

 Ready to be installed?
 Thanks,
 Martin

 gcc/ChangeLog:

 2019-08-06  Martin Liska  

  PR bootstrap/91352
  * gcc.c (fd_exists): New.
  (driver::detect_jobserver): Use fd_exists.
  * lto-wrapper.c (fd_exists): New.
  (jobserver_active_p): Use fd_exists.
 ---
    gcc/gcc.c | 19 +--
    gcc/lto-wrapper.c | 19 +--
    2 files changed, 34 insertions(+), 4 deletions(-)


>>>
>>
> 

>From e76fdfabf88305b26eb53690c251d016e6c5e455 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Tue, 6 Aug 2019 13:04:40 +0200
Subject: [PATCH] Fix file descriptor existence of MinGW.

gcc/ChangeLog:

2019-08-07  Martin Liska  

	PR bootstrap/91352
	* gcc.c (driver::detect_jobserver): Use fd_exists.
	* lto-wrapper.c (jobserver_active_p): Likewise.

include/ChangeLog:

2019-08-07  Martin Liska  

	PR bootstrap/91352
	* libiberty.h (fd_exists): New function.

libiberty/ChangeLog:

2019-08-07  Martin Liska  

	PR bootstrap/91352
	* lrealpath.c (fd_exists): New function.
---
 gcc/gcc.c |  4 ++--
 gcc/lto-wrapper.c |  4 ++--
 include/libiberty.h   |  4 
 libiberty/lrealpath.c | 14 ++
 4 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/gcc/gcc.c b/gcc/gcc.c
index 18a07426290..e86b35453c2 100644
--- a/gcc/gcc.c
+++ b/gcc/gcc.c
@@ -8380,8 +8380,8 @@ driver::detect_jobserver () const
 	= (sscanf (n + strlen (needle), "%d,%d", &rfd, &wfd) == 2
 	   && rfd > 0
 	   && wfd > 0
-	   && fcntl (rfd, F_GETFD) >= 0
-	   && fcntl (wfd, F_GETFD) >= 0);
+	   && fd_exists (rfd)
+	   && fd_exists (wfd));
 
 	  /* Drop the jobserver if it's not working now.  */
 	  if (!jobserver)
diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c
index 3414adedd26..a3b0130b7cb 100644
--- a/gcc/lto-wrapper.c
+++ b/gcc/lto-wrapper.c
@@ -1237,8 +1237,8 @@ jobserver_active_p (void)
   return (sscanf (n + strlen (needle), "%d,%d", &rfd, &wfd) == 2
 	  && rfd > 0
 	  && wfd > 0
-	  && fcntl (rfd, F_GETFD) >= 0
-	  && fcntl (wfd, F_GETFD) >= 0);
+	  && fd_exists (rfd)
+	  && fd_exists (wfd));
 }
 
 /* Execute gcc. ARGC is the number of arguments. ARGV contains the arguments. */
diff --git a/include/libiberty.h b/include/libiberty.h
index 635519e088a..254c00bcc6b 100644
--- a/include/libiberty.h
+++ b/include/libiberty.h
@@ -137,6 +137,10 @@ extern const char *unix_lbasename (const char *) ATTRIBUTE_RETURNS_NONNULL ATTRI
 
 extern char *lrealpath (const char *);
 
+/* Return true when FD file descriptor exists.  */
+
+extern int fd_exists (int fd);
+
 /* Concatenate an arbitrary number of strings.  You must pass NULL as
the last argument of this function, to terminate the list of
strings.  Allocates memory using xmalloc.  */
diff --git a/libiberty/lrealpath.c b/libiberty/lrealpath.c
index 7f66dc2b1bd..4fdb522e3f2 100644
--- a/libiberty/lrealpath.c
+++ b/libiberty/lrealpath.c
@@ -49,6 +49,7 @@ components will be simplified.  The returned value will be allocated using
 #ifdef HAVE_STRING_H
 #include 
 #endif
+#include 
 
 /* On GNU libc systems the declaration is only visible with _GNU_SOURCE.  */
 #if defined(HAVE_CANONICALIZE_FILE_NAME) \
@@ -155,3 +156,16 @@ lrealpath (const char *filename)
   /* This system is a lost cause, just duplicate the filename.  */
   return strdup (filename);
 }
+
+/* Return true when FD file descriptor exists.  */
+
+int
+fd_exis

Re: [PATCH] Detect not-cloned new/delete operators in DCE.

2019-08-07 Thread Martin Liška
On 8/6/19 7:02 PM, Martin Jambor wrote:
> Hi,
> 
> unfortunately I cannot look into the problem now and I don't have my
> phone set up to review patches in a sane way, but to answer your
> question below...

Thank you Martin for answer. It can definitely wait once you're back
at the office.

> 
> 
> On Tue, Aug 06 2019, Martin Liška wrote:
>> On 8/6/19 2:42 PM, Martin Liška wrote:
> 
> ...
> 
>>> Hm, strange that the ISRA clones don't have n->clone_of set. It's created 
>>> here:
>>>
> ...
> 
>>>
>>> @Martin, @Honza: Why do we not set clone_of in this transformation?
>>
> ...node->clone_of is set only while a clone created in the true IPA
> stages has no body on its own and shares the body with the
> original. These clones form a tree and their clone_of is cleared when
> they get a body.  IPA-SRA is not a true IPA pass and the clones it
> creates are created with create_clone_with_body (or similarly named)
> method which immediately gives them a body, so setting clone_of would be
> wrong. (The new IPA-SRA is a true IPA pass and so its clones have phase
> when their clone_of is set).
> 
> When an IPA-stage clone gets its body (when it is materialized),
> node->former_clone_of gets set to the decl (as opposed to cgraph_node)
> of the original node and hopefully create_clone_with_body sets it
> too. Can you perhaps use that?
> 
>> If I'm correct cgraph_node::clone is used for inline clones only?
>>
> 
> IPA-CP also creates clones, it does so by calling
> cgraph_node::create_virtual_clone but that also sets clone_of.

Hm, I can see neither of cgraph_node::clone_of and cgraph_node::former_clone_of 
set
for my ISRA clone:

grep clone /tmp/node
  next_sibling_clone = , 
  prev_sibling_clone = , 
  clones = , 
  clone_of = , 
  former_clone_of = , 
  simdclone = 0x0, 
  simd_clones = , 
  clone = {
  tm_clone = 0, 

Anyway, I'm planning to use DECL_ABSTRACT_ORIGIN here.

Martin

> 
> Martin
> 



Re: [PATCH] Detect not-cloned new/delete operators in DCE.

2019-08-07 Thread Martin Liška
On 8/6/19 5:44 PM, Marc Glisse wrote:
> On Tue, 6 Aug 2019, Martin Liška wrote:
> 
>> Anyway, I'm sending patch that considers only such new/delete operators
>> that are not a clone of an original type. That should make the current
>> DCE code more solid.
> 
> DECL_IS_REPLACEABLE_OPERATOR_NEW_P seems to have been replaced with 
> DECL_IS_OPERATOR_NEW_P. Is that on purpose?

Whoops, that was not intentional, thanks for heads up.

> 
> I like your cleanup of having a single function to decide if this is the kind 
> of operator new/delete DCE can handle, but you may have introduced long lines 
> in the substitution.
> 

Long lines should be fixed now as well.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin
>From 6ccff415a8931558cf7f1a01b34daec999e9f5af Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Tue, 6 Aug 2019 16:14:48 +0200
Subject: [PATCH] Detect not-cloned new/delete operators in DCE.

gcc/ChangeLog:

2019-08-06  Martin Liska  

	* gimple.c (gimple_call_operator_delete_p): Remove.
	* gimple.h (gimple_call_operator_delete_p): Likewise.
	* tree-ssa-dce.c (operator_new_candidate_p): New.
	(operator_delete_candidate_p): Likewise.
	(mark_stmt_if_obviously_necessary): Use operator_new_candidate_p
	and operator_delete_candidate_p in order to detect operators
	that are not not clones.
	(mark_all_reaching_defs_necessary_1): Likewise.
	(propagate_necessity): Likewise.
	(eliminate_unnecessary_stmts): Likewise.
---
 gcc/gimple.c   | 12 -
 gcc/gimple.h   |  1 -
 gcc/tree-ssa-dce.c | 63 --
 3 files changed, 38 insertions(+), 38 deletions(-)

diff --git a/gcc/gimple.c b/gcc/gimple.c
index 633ef512a19..684b8831b4d 100644
--- a/gcc/gimple.c
+++ b/gcc/gimple.c
@@ -2707,18 +2707,6 @@ gimple_builtin_call_types_compatible_p (const gimple *stmt, tree fndecl)
   return true;
 }
 
-/* Return true when STMT is operator delete call.  */
-
-bool
-gimple_call_operator_delete_p (const gcall *stmt)
-{
-  tree fndecl;
-
-  if ((fndecl = gimple_call_fndecl (stmt)) != NULL_TREE)
-return DECL_IS_OPERATOR_DELETE_P (fndecl);
-  return false;
-}
-
 /* Return true when STMT is builtins call.  */
 
 bool
diff --git a/gcc/gimple.h b/gcc/gimple.h
index 55f5d0d33d9..7a1e1f49099 100644
--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -1548,7 +1548,6 @@ extern alias_set_type gimple_get_alias_set (tree);
 extern bool gimple_ior_addresses_taken (bitmap, gimple *);
 extern bool gimple_builtin_call_types_compatible_p (const gimple *, tree);
 extern combined_fn gimple_call_combined_fn (const gimple *);
-extern bool gimple_call_operator_delete_p (const gcall *);
 extern bool gimple_call_builtin_p (const gimple *);
 extern bool gimple_call_builtin_p (const gimple *, enum built_in_class);
 extern bool gimple_call_builtin_p (const gimple *, enum built_in_function);
diff --git a/gcc/tree-ssa-dce.c b/gcc/tree-ssa-dce.c
index afb7bd9dedc..66eb085b65f 100644
--- a/gcc/tree-ssa-dce.c
+++ b/gcc/tree-ssa-dce.c
@@ -114,6 +114,25 @@ static bool cfg_altered;
 /* When non-NULL holds map from basic block index into the postorder.  */
 static int *bb_postorder;
 
+/* Return true when FNDECL is a new operator and not a clone.  */
+
+static bool
+operator_new_candidate_p (tree fndecl)
+{
+  return (fndecl != NULL_TREE
+	  && DECL_IS_REPLACEABLE_OPERATOR_NEW_P (fndecl)
+	  && DECL_ABSTRACT_ORIGIN (fndecl) == NULL_TREE);
+}
+
+/* Return true when FNDECL is a delete operator and not a clone.  */
+
+static bool
+operator_delete_candidate_p (tree fndecl)
+{
+  return (fndecl != NULL_TREE
+	  && DECL_IS_OPERATOR_DELETE_P (fndecl)
+	  && DECL_ABSTRACT_ORIGIN (fndecl) == NULL_TREE);
+}
 
 /* True if we should treat any stmt with a vdef as necessary.  */
 
@@ -248,7 +267,7 @@ mark_stmt_if_obviously_necessary (gimple *stmt, bool aggressive)
 
 	if (callee != NULL_TREE
 	&& flag_allocation_dce
-	&& DECL_IS_REPLACEABLE_OPERATOR_NEW_P (callee))
+	&& operator_new_candidate_p (callee))
 	  return;
 
 	/* Most, but not all function calls are required.  Function calls that
@@ -613,8 +632,8 @@ mark_all_reaching_defs_necessary_1 (ao_ref *ref ATTRIBUTE_UNUSED,
 	  }
 
   if (callee != NULL_TREE
-	  && (DECL_IS_REPLACEABLE_OPERATOR_NEW_P (callee)
-	  || DECL_IS_OPERATOR_DELETE_P (callee)))
+	  && (operator_new_candidate_p (callee)
+	  || operator_delete_candidate_p (callee)))
 	return false;
 }
 
@@ -800,21 +819,18 @@ propagate_necessity (bool aggressive)
 	 which feed this statement's uses as necessary.  */
 	  ssa_op_iter iter;
 	  tree use;
+	  tree fndecl;
 
 	  /* If this is a call to free which is directly fed by an
 	 allocation function do not mark that necessary through
 	 processing the argument.  */
 	  bool is_delete_operator
 	= (is_gimple_call (stmt)
-	   && gimple_call_operator_delete_p (as_a  (stmt)));
+	   && (fndecl = gimple_call_fndecl (as_a  (stmt)))
+	   && operator_delete_candidate_p (fndecl)

Re: [PATCH] Fix file descriptor existence of MinGW.

2019-08-07 Thread Jakub Jelinek
On Wed, Aug 07, 2019 at 10:45:08AM +0200, Martin Liška wrote:
> @@ -155,3 +156,16 @@ lrealpath (const char *filename)
>/* This system is a lost cause, just duplicate the filename.  */
>return strdup (filename);
>  }
> +
> +/* Return true when FD file descriptor exists.  */
> +
> +int
> +fd_exists (int fd)
> +{
> +#if defined(_WIN32)
> +  HANDLE h = (HANDLE) _get_osfhandle (fd);
> +  return h != (HANDLE) -1;
> +#else
> +  return fcntl (fd, F_GETFD) >= 0;
> +#endif
> +}

Judging from gnulib
http://erislabs.net/gitweb/?p=gnulib.git;a=commitdiff_plain;h=021c8619190757f535c72ad5cdb1d624e19620d6
the #if condition for Windows should be probably different
and it would be worth to have a fallback for when F_GETFD isn't defined
e.g. through dup2 (fd, fd) < 0.

And for the libiberty changes you want Ian to review them.

Jakub


Re: [PATCH][RFC][x86] Fix PR91154, add SImode smax, allow SImode add in SSE regs

2019-08-07 Thread Richard Biener
On Mon, 5 Aug 2019, Uros Bizjak wrote:

> On Mon, Aug 5, 2019 at 3:29 PM Richard Biener  wrote:
> 
> > > > > > > > (define_mode_iterator MAXMIN_IMODE [SI "TARGET_SSE4_1"] [DI 
> > > > > > > > "TARGET_AVX512F"])
> > > > > > > >
> > > > > > > > and then we need to split DImode for 32bits, too.
> > > > > > >
> > > > > > > For now, please add "TARGET_64BIT && TARGET_AVX512F" for DImode
> > > > > > > condition, I'll provide _doubleword splitter later.
> > > > > >
> > > > > > Shouldn't that be TARGET_AVX512VL instead?  Or does the insn use 
> > > > > > %g0 etc.
> > > > > > to force use of %zmmN?
> > > > >
> > > > > It generates V4SI mode, so - yes, AVX512VL.
> > > >
> > > > case SMAX:
> > > > case SMIN:
> > > > case UMAX:
> > > > case UMIN:
> > > >   if ((mode == DImode && (!TARGET_64BIT || !TARGET_AVX512VL))
> > > >   || (mode == SImode && !TARGET_SSE4_1))
> > > > return false;
> > > >
> > > > so there's no way to use AVX512VL for 32bit?
> > >
> > > There is a way, but on 32bit targets, we need to split DImode
> > > operation to a sequence of SImode operations for unconverted pattern.
> > > This is of course doable, but somehow more complex than simply
> > > emitting a DImode compare + DImode cmove, which is what current
> > > splitter does. So, a follow-up task.
> >
> > Ah, OK.  So for the above condition we can elide the !TARGET_64BIT
> > check we just need to properly split if we enable the scalar minmax
> > pattern for DImode on 32bits, the STV conversion would go fine.
> 
> Yes, that is correct.

So I tested the patch below (now with appropriate ChangeLog) on
x86_64-unknown-linux-gnu.  I've thrown it at SPEC CPU 2006 with
the obvious hmmer improvement, now checking for off-noise results
with a 3-run on those that may have one (with more than +-1 second
differences in the 1-run).

As-is the patch likely runs into the splitting issue for DImode
on i?86 and the patch misses functional testcases.  I'll do the
hmmer loop with both DImode and SImode and testcases to trigger
all pattern variants with the different ISAs we have.

Some of the patch could be split out (the cost changes that are
also effective for DImode for example).

AFAICS we could go with only adding SImode avoiding the DImode
splitting thing and this would solve the hmmer regression.

Thanks,
Richard.

2019-08-07  Richard Biener  

PR target/91154
* config/i386/i386-features.h (scalar_chain::scalar_chain): Add
mode arguments.
(scalar_chain::smode): New member.
(scalar_chain::vmode): Likewise.
(dimode_scalar_chain): Rename to...
(general_scalar_chain): ... this.
(general_scalar_chain::general_scalar_chain): Take mode arguments.
(timode_scalar_chain::timode_scalar_chain): Initialize scalar_chain
base with TImode and V1TImode.
* config/i386/i386-features.c (scalar_chain::scalar_chain): Adjust.
(general_scalar_chain::vector_const_cost): Adjust for SImode
chains.
(general_scalar_chain::compute_convert_gain): Likewise.  Fix
reg-reg move cost gain, use ix86_cost->sse_op cost and adjust
scalar costs.  Add {S,U}{MIN,MAX} support.  Dump per-instruction
gain if not zero.
(general_scalar_chain::replace_with_subreg): Use vmode/smode.
(general_scalar_chain::make_vector_copies): Likewise.  Handle
non-DImode chains appropriately.
(general_scalar_chain::convert_reg): Likewise.
(general_scalar_chain::convert_op): Likewise.
(general_scalar_chain::convert_insn): Likewise.  Add
fatal_insn_not_found if the result is not recognized.
(convertible_comparison_p): Pass in the scalar mode and use that.
(general_scalar_to_vector_candidate_p): Likewise.  Rename from
dimode_scalar_to_vector_candidate_p.  Add {S,U}{MIN,MAX} support.
(scalar_to_vector_candidate_p): Remove by inlining into single
caller.
(general_remove_non_convertible_regs): Rename from
dimode_remove_non_convertible_regs.
(remove_non_convertible_regs): Remove by inlining into single caller.
(convert_scalars_to_vector): Handle SImode and DImode chains
in addition to TImode chains.
* config/i386/i386.md (3): New insn split after STV.

Index: gcc/config/i386/i386-features.c
===
--- gcc/config/i386/i386-features.c (revision 274111)
+++ gcc/config/i386/i386-features.c (working copy)
@@ -276,8 +276,11 @@ unsigned scalar_chain::max_id = 0;
 
 /* Initialize new chain.  */
 
-scalar_chain::scalar_chain ()
+scalar_chain::scalar_chain (enum machine_mode smode_, enum machine_mode vmode_)
 {
+  smode = smode_;
+  vmode = vmode_;
+
   chain_id = ++max_id;
 
if (dump_file)
@@ -319,7 +322,7 @@ scalar_chain::add_to_queue (unsigned ins
conversion.  */
 
 void
-dimode_scalar_chain::mark_dual_mode_def (df_ref def)
+general_scalar_chain::mark_

Re: [PATCH] Handle new operators with no arguments in DCE.

2019-08-07 Thread Richard Biener
On Tue, Aug 6, 2019 at 2:42 PM Martin Liška  wrote:
>
> On 8/5/19 3:46 PM, Marc Glisse wrote:
> > On Mon, 5 Aug 2019, Martin Liška wrote:
> >
> >> You are right. It can really lead to confusion of the DCE.
> >>
> >> What we have is DECL_ABSTRACT_ORIGIN(decl) which we can use to indicate 
> >> operators
> >> that were somehow modified by an IPA optimization.
> >
> > Looks similar to the cgraph_node->clone_of that Richard was talking about. 
> > I have no idea which one would be best.
>
>
> Hm, strange that the ISRA clones don't have n->clone_of set. It's created 
> here:
>
> #0  cgraph_node::create (decl= _ZN1AdlEPvd.isra.0>) at /home/marxin/Programming/gcc/gcc/profile-count.h:751
> #1  0x00bc8342 in cgraph_node::create_version_clone 
> (this=, 
> new_decl=, redirect_callers=..., bbs_to_copy=0x0, 
> suffix=0x1b74427 "isra") at 
> /home/marxin/Programming/gcc/gcc/cgraphclones.c:960
> #2  0x00bc9b9a in cgraph_node::create_version_clone_with_body 
> (this=this@entry=, 
> redirect_callers=redirect_callers@entry=..., tree_map=tree_map@entry=0x0, 
> args_to_skip=args_to_skip@entry=0x0,
> skip_return=skip_return@entry=false, bbs_to_copy=bbs_to_copy@entry=0x0, 
> new_entry_block=, suffix=, 
> target_attributes=) at 
> /home/marxin/Programming/gcc/gcc/cgraphclones.c:1071
> #3  0x010e4611 in modify_function (adjustments=..., node= * 0x77208000 "operator delete"/64>) at 
> /home/marxin/Programming/gcc/gcc/tree-sra.c:5493
> #4  ipa_early_sra () at /home/marxin/Programming/gcc/gcc/tree-sra.c:5731
> #5  (anonymous namespace)::pass_early_ipa_sra::execute (this=) 
> at /home/marxin/Programming/gcc/gcc/tree-sra.c:5778
>
> @Martin, @Honza: Why do we not set clone_of in this transformation?
>
> >
> >> Do you believe it will be sufficient?
> >
> > In DCE only consider the operator delete that are not clones? (possibly the 
> > same for operator new? I don't know how much we can change the return value 
> > of a function in a clone) I guess that should work, and it wouldn't impact 
> > the common case with default, global operator new/delete.
>
> I tent to limit that the only cgraph nodes that are not clones. I'm going to 
> prepare a patch for it.

I think the simplest way to achieve this is to not copy, aka clear,
DECL_IS_OPERATOR_* when cloning and removing arguments
(cloning for a constant align argument should be OK for example, as is
for a constant address).  Or simply always when cloning.

Richard.

> >
> > An alternative would be to clear the DECL_IS_OPERATOR_DELETE flag when 
> > creating a clone (possibly only if it modified the first parameter?). There 
> > is probably not much information in the fact that a function that doesn't 
> > even take a pointer used to be an operator delete.
> >
> >
> > By the way, I just thought of something, now that we handle class-specific 
> > operator new/delete (but you could do the same with the global replacable 
> > ones as well):
> >
> > #include 
> > int count = 0;
> > struct A {
> >   __attribute__((malloc,noinline))
> >   static void* operator new(unsigned long sz){++count;return ::operator 
> > new(sz);}
> >   static void operator delete(void* ptr){--count;::operator delete(ptr);}
> > };
> > int main(){
> >   delete new A;
> >   printf("%d\n",count);
> > }
> >
> > If we do not inline anything, we can remove the pair and nothing touches 
> > count. If we inline both new and delete, we can then remove the inner pair 
> > instead, count increases and decreases, fine. If we inline only one of 
> > them, and DCE the mismatched pair new/delete, we get something inconsistent 
> > (count is -1).
> >
> > This seems to indicate we should check that the new and delete match 
> > somehow...
> >
>


Re: Protect tree_to_shwi call in unmodified_param_1

2019-08-07 Thread Richard Biener
On Tue, Aug 6, 2019 at 4:48 PM Richard Sandiford
 wrote:
>
> Richard Biener  writes:
> > On Mon, Aug 5, 2019 at 10:58 AM Richard Sandiford
> >  wrote:
> >>
> >> unmodified_param_1 used tree_to_shwi without first checking
> >> tree_fits_shwi_p.  This is needed by the SVE ACLE support and
> >> is hard to test independently.
> >>
> >> Tested on aarch64-linux-gnu, armeb-eabi and x86_64-linux-gnu.
> >> OK to install?
> >
> > Hmm, a better fix would be to use poly_X for *size_p.  There are also
> > quite a number of callers with NULL size_p which you unduly
> > pessimize for SVE.
>
> OK, you guilt-tripped me into doing it properly. :-)  How does this look?
>
> Tested as before.

OK.

Thanks,
Richard.

> Richard
>
>
> 2019-08-06  Richard Sandiford  
>
> gcc/
> * data-streamer.h (streamer_write_poly_uint64): Declare.
> (streamer_read_poly_uint64): Likewise.
> * data-streamer-in.c (streamer_read_poly_uint64): New function.
> * data-streamer-out.c (streamer_write_poly_uint64): Likewise.
> * ipa-predicate.h (condition::size): Turn into a poly_int64.
> (add_condition): Take a poly_int64 size.
> * ipa-predicate.c (add_condition): Likewise.
> * ipa-prop.h (ipa_load_from_parm_agg): Take a poly_int64 size pointer.
> * ipa-prop.c (ipa_load_from_parm_agg): Likewise.
> (ipcp_modif_dom_walker::before_dom_children): Update accordingly.
> * ipa-fnsummary.c (evaluate_conditions_for_known_args): Handle
> condition::size as a poly_int64.
> (unmodified_parm_1): Take a poly_int64 size pointer.
> (unmodified_parm): Likewise.
> (unmodified_parm_or_parm_agg_item): Likewise.
> (set_cond_stmt_execution_predicate): Update accordingly.
> (set_switch_stmt_execution_predicate): Likewise.
> (will_be_nonconstant_expr_predicate): Likewise.
> (will_be_nonconstant_predicate): Likewise.
> (inline_read_section): Stream condition::size as a poly_int.
> (ipa_fn_summary_write): Likewise.
>
> Index: gcc/data-streamer.h
> ===
> --- gcc/data-streamer.h 2019-07-10 19:41:26.375898187 +0100
> +++ gcc/data-streamer.h 2019-08-06 15:45:32.908297910 +0100
> @@ -53,6 +53,7 @@ HOST_WIDE_INT bp_unpack_var_len_int (str
>  void streamer_write_zero (struct output_block *);
>  void streamer_write_uhwi (struct output_block *, unsigned HOST_WIDE_INT);
>  void streamer_write_hwi (struct output_block *, HOST_WIDE_INT);
> +void streamer_write_poly_uint64 (struct output_block *, poly_uint64);
>  void streamer_write_gcov_count (struct output_block *, gcov_type);
>  void streamer_write_string (struct output_block *, struct lto_output_stream 
> *,
> const char *, bool);
> @@ -82,6 +83,7 @@ const char *bp_unpack_indexed_string (cl
>  const char *bp_unpack_string (class data_in *, struct bitpack_d *);
>  unsigned HOST_WIDE_INT streamer_read_uhwi (class lto_input_block *);
>  HOST_WIDE_INT streamer_read_hwi (class lto_input_block *);
> +poly_uint64 streamer_read_poly_uint64 (class lto_input_block *);
>  gcov_type streamer_read_gcov_count (class lto_input_block *);
>  wide_int streamer_read_wide_int (class lto_input_block *);
>  widest_int streamer_read_widest_int (class lto_input_block *);
> Index: gcc/data-streamer-in.c
> ===
> --- gcc/data-streamer-in.c  2019-07-10 19:41:20.147948065 +0100
> +++ gcc/data-streamer-in.c  2019-08-06 15:45:32.908297910 +0100
> @@ -175,6 +175,17 @@ streamer_read_hwi (class lto_input_block
>  }
>  }
>
> +/* Read a poly_uint64 from IB.  */
> +
> +poly_uint64
> +streamer_read_poly_uint64 (class lto_input_block *ib)
> +{
> +  poly_uint64 res;
> +  for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
> +res.coeffs[i] = streamer_read_uhwi (ib);
> +  return res;
> +}
> +
>  /* Read gcov_type value from IB.  */
>
>  gcov_type
> Index: gcc/data-streamer-out.c
> ===
> --- gcc/data-streamer-out.c 2019-03-08 18:14:27.285004225 +
> +++ gcc/data-streamer-out.c 2019-08-06 15:45:32.908297910 +0100
> @@ -220,6 +220,15 @@ streamer_write_hwi (struct output_block
>streamer_write_hwi_stream (ob->main_stream, work);
>  }
>
> +/* Write a poly_uint64 value WORK to OB->main_stream.  */
> +
> +void
> +streamer_write_poly_uint64 (struct output_block *ob, poly_uint64 work)
> +{
> +  for (int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
> +streamer_write_uhwi_stream (ob->main_stream, work.coeffs[i]);
> +}
> +
>  /* Write a gcov counter value WORK to OB->main_stream.  */
>
>  void
> Index: gcc/ipa-predicate.h
> ===
> --- gcc/ipa-predicate.h 2019-07-10 19:41:27.163891877 +0100
> +++ gcc/ipa-predicate.h 2019-08-06 15:45:32.908297910 +0100
> @@ -31,7 +31,7 @@ struct GTY(()) condition
>   loaded.  */
>

[PATCH] Fix -MF clash for LTO.

2019-08-07 Thread Martin Liška
Hi.

The patch is conclusion of long-lasting discussion in the PR.
I've just verified that our gcc9 package with the patch attached
can build emacs, and so it's fixed.

Note that the CL_DRIVER instead of CL_LANG_ALL still fails.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

gcc/ChangeLog:

2019-08-06  Martin Liska  

PR target/91130
* lto-wrapper.c (find_and_merge_options): Use (CL_LANG_ALL | CL_DRIVER)
for get_options_from_collect_gcc_options instead of CL_LANG_ALL.
---
 gcc/lto-wrapper.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c
index 3414adedd26..f1fa48c4688 100644
--- a/gcc/lto-wrapper.c
+++ b/gcc/lto-wrapper.c
@@ -1010,7 +1010,7 @@ find_and_merge_options (int fd, off_t file_offset, const char *prefix,
   struct cl_decoded_option *f2decoded_options;
   unsigned int f2decoded_options_count;
   get_options_from_collect_gcc_options (collect_gcc,
-	fopts, CL_LANG_ALL,
+	fopts, (CL_LANG_ALL | CL_DRIVER),
 	&f2decoded_options,
 	&f2decoded_options_count);
   if (!fdecoded_options)



Re: [PATCH] Fix -MF clash for LTO.

2019-08-07 Thread Jakub Jelinek
On Wed, Aug 07, 2019 at 11:54:44AM +0200, Martin Liška wrote:
> The patch is conclusion of long-lasting discussion in the PR.
> I've just verified that our gcc9 package with the patch attached
> can build emacs, and so it's fixed.

Richard said he has a patch in testing that he plans to submit today.

> Note that the CL_DRIVER instead of CL_LANG_ALL still fails.

What fails?  The -MF handling should not.

If CL_DRIVER | CL_LANG_ALL is really needed, then there shouldn't be ()s
around it.  And lto-wrapper.c has two spots currently using CL_LANG_ALL,
both need changing, not just one.

Jakub


Re: [PATCH] Detect not-cloned new/delete operators in DCE.

2019-08-07 Thread Richard Biener
On Wed, Aug 7, 2019 at 10:56 AM Martin Liška  wrote:
>
> On 8/6/19 5:44 PM, Marc Glisse wrote:
> > On Tue, 6 Aug 2019, Martin Liška wrote:
> >
> >> Anyway, I'm sending patch that considers only such new/delete operators
> >> that are not a clone of an original type. That should make the current
> >> DCE code more solid.
> >
> > DECL_IS_REPLACEABLE_OPERATOR_NEW_P seems to have been replaced with 
> > DECL_IS_OPERATOR_NEW_P. Is that on purpose?
>
> Whoops, that was not intentional, thanks for heads up.
>
> >
> > I like your cleanup of having a single function to decide if this is the 
> > kind of operator new/delete DCE can handle, but you may have introduced 
> > long lines in the substitution.
> >
>
> Long lines should be fixed now as well.
>
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>
> Ready to be installed?

As said in the other thread I prefer you unsed DECL_IS_OPERATOR_* when
cloning.  As much as
I dislike void memcpy (int yeah, double foo); be represented as
BUILT_IN_MEMCPY I dislike
an operator that doesn't follow its signature as such.

If you go as far as looking at the abstract origin you could figure
the original param position
by looking at the to-be-DCEd operator invocations DECL_ARGUMENTS, look at _its_
abstract origin and match that up when walking the operators abstract
origin DECL_ARGUMENTS...

But I think this is all hardly worth the trouble.

Richard.

> Thanks,
> Martin


Re: [PATCH] Fix file descriptor existence of MinGW.

2019-08-07 Thread Martin Liška
On 8/7/19 10:56 AM, Jakub Jelinek wrote:
> On Wed, Aug 07, 2019 at 10:45:08AM +0200, Martin Liška wrote:
>> @@ -155,3 +156,16 @@ lrealpath (const char *filename)
>>/* This system is a lost cause, just duplicate the filename.  */
>>return strdup (filename);
>>  }
>> +
>> +/* Return true when FD file descriptor exists.  */
>> +
>> +int
>> +fd_exists (int fd)
>> +{
>> +#if defined(_WIN32)
>> +  HANDLE h = (HANDLE) _get_osfhandle (fd);
>> +  return h != (HANDLE) -1;
>> +#else
>> +  return fcntl (fd, F_GETFD) >= 0;
>> +#endif
>> +}
> 
> Judging from gnulib
> http://erislabs.net/gitweb/?p=gnulib.git;a=commitdiff_plain;h=021c8619190757f535c72ad5cdb1d624e19620d6
> the #if condition for Windows should be probably different
> and it would be worth to have a fallback for when F_GETFD isn't defined
> e.g. through dup2 (fd, fd) < 0.

Thank you for the review. I updated the patch accordingly.

Martin

> 
> And for the libiberty changes you want Ian to review them.
> 
>   Jakub
> 

>From 7d3593a4189bd970ae752abab36c8a5bc4681847 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Tue, 6 Aug 2019 13:04:40 +0200
Subject: [PATCH] Fix file descriptor existence of MinGW.

gcc/ChangeLog:

2019-08-07  Martin Liska  

	PR bootstrap/91352
	* gcc.c (driver::detect_jobserver): Use fd_exists.
	* lto-wrapper.c (jobserver_active_p): Likewise.

include/ChangeLog:

2019-08-07  Martin Liska  

	PR bootstrap/91352
	* libiberty.h (fd_exists): New function.

libiberty/ChangeLog:

2019-08-07  Martin Liska  

	PR bootstrap/91352
	* lrealpath.c (fd_exists): New function.
---
 gcc/gcc.c |  4 ++--
 gcc/lto-wrapper.c |  4 ++--
 include/libiberty.h   |  4 
 libiberty/lrealpath.c | 16 
 4 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/gcc/gcc.c b/gcc/gcc.c
index 18a07426290..e86b35453c2 100644
--- a/gcc/gcc.c
+++ b/gcc/gcc.c
@@ -8380,8 +8380,8 @@ driver::detect_jobserver () const
 	= (sscanf (n + strlen (needle), "%d,%d", &rfd, &wfd) == 2
 	   && rfd > 0
 	   && wfd > 0
-	   && fcntl (rfd, F_GETFD) >= 0
-	   && fcntl (wfd, F_GETFD) >= 0);
+	   && fd_exists (rfd)
+	   && fd_exists (wfd));
 
 	  /* Drop the jobserver if it's not working now.  */
 	  if (!jobserver)
diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c
index 3414adedd26..a3b0130b7cb 100644
--- a/gcc/lto-wrapper.c
+++ b/gcc/lto-wrapper.c
@@ -1237,8 +1237,8 @@ jobserver_active_p (void)
   return (sscanf (n + strlen (needle), "%d,%d", &rfd, &wfd) == 2
 	  && rfd > 0
 	  && wfd > 0
-	  && fcntl (rfd, F_GETFD) >= 0
-	  && fcntl (wfd, F_GETFD) >= 0);
+	  && fd_exists (rfd)
+	  && fd_exists (wfd));
 }
 
 /* Execute gcc. ARGC is the number of arguments. ARGV contains the arguments. */
diff --git a/include/libiberty.h b/include/libiberty.h
index 635519e088a..254c00bcc6b 100644
--- a/include/libiberty.h
+++ b/include/libiberty.h
@@ -137,6 +137,10 @@ extern const char *unix_lbasename (const char *) ATTRIBUTE_RETURNS_NONNULL ATTRI
 
 extern char *lrealpath (const char *);
 
+/* Return true when FD file descriptor exists.  */
+
+extern int fd_exists (int fd);
+
 /* Concatenate an arbitrary number of strings.  You must pass NULL as
the last argument of this function, to terminate the list of
strings.  Allocates memory using xmalloc.  */
diff --git a/libiberty/lrealpath.c b/libiberty/lrealpath.c
index 7f66dc2b1bd..fcf9c0bb9bb 100644
--- a/libiberty/lrealpath.c
+++ b/libiberty/lrealpath.c
@@ -49,6 +49,7 @@ components will be simplified.  The returned value will be allocated using
 #ifdef HAVE_STRING_H
 #include 
 #endif
+#include 
 
 /* On GNU libc systems the declaration is only visible with _GNU_SOURCE.  */
 #if defined(HAVE_CANONICALIZE_FILE_NAME) \
@@ -155,3 +156,18 @@ lrealpath (const char *filename)
   /* This system is a lost cause, just duplicate the filename.  */
   return strdup (filename);
 }
+
+/* Return true when FD file descriptor exists.  */
+
+int
+fd_exists (int fd)
+{
+#if defined(_WIN32)
+  HANDLE h = (HANDLE) _get_osfhandle (fd);
+  return h != (HANDLE) -1;
+#elif defined(F_GETFD)
+  return fcntl (fd, F_GETFD) >= 0;
+#else
+  return dup2 (fd, fd) < 0;
+#endif
+}
-- 
2.22.0



Re: [PATCH] Fix -MF clash for LTO.

2019-08-07 Thread Martin Liška
On 8/7/19 12:04 PM, Jakub Jelinek wrote:
> On Wed, Aug 07, 2019 at 11:54:44AM +0200, Martin Liška wrote:
>> The patch is conclusion of long-lasting discussion in the PR.
>> I've just verified that our gcc9 package with the patch attached
>> can build emacs, and so it's fixed.
> 
> Richard said he has a patch in testing that he plans to submit today.
> 
>> Note that the CL_DRIVER instead of CL_LANG_ALL still fails.
> 
> What fails?  The -MF handling should not.

This still fails:

$ echo "int main() {}" > main.c && gcc -c -flto main.c && gcc -o a.out main.o 
-flto -MMD -MF deps

> 
> If CL_DRIVER | CL_LANG_ALL is really needed, then there shouldn't be ()s
> around it.  And lto-wrapper.c has two spots currently using CL_LANG_ALL,
> both need changing, not just one.> 
>   Jakub
> 



Re: [PATCH] Fix simd attribute handling on aarch64 (version 2)

2019-08-07 Thread Szabolcs Nagy
On 31/07/2019 08:25, Richard Sandiford wrote:
> Steve Ellcey  writes:
>>
>> 2019-07-30  Steve Ellcey  
>>
>>  * omp-simd-clone.c (simd_clone_adjust_return_type): Remove call to
>>  build_distinct_type_copy.
>>  (simd_clone_adjust_argument_types): Ditto.
>>  (simd_clone_adjust): Call build_distinct_type_copy here.
>>  (expand_simd_clones): Ditto.
>>
>> 2019-07-30  Steve Ellcey  
>>
>>  * gcc.target/aarch64/simd_pcs_attribute.c: New test.
>>  * gcc.target/aarch64/simd_pcs_attribute-2.c: Ditto.
>>  * gcc.target/aarch64/simd_pcs_attribute-3.c: Ditto.
> 
> OK if there are no objections in 48 hours.

i think this should be backported to gcc-9 too.


Re: [PATCH] Handle new operators with no arguments in DCE.

2019-08-07 Thread Martin Liška
On 8/7/19 11:51 AM, Richard Biener wrote:
> I think the simplest way to achieve this is to not copy, aka clear,
> DECL_IS_OPERATOR_* when cloning and removing arguments
> (cloning for a constant align argument should be OK for example, as is
> for a constant address).  Or simply always when cloning.

Ok, then I'm suggesting following tested patch.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin
>From b7ed2c7c208e96ecff77a0127570a858a486c0b8 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Wed, 7 Aug 2019 12:00:58 +0200
Subject: [PATCH] When cloning set operator new/delete to false.

gcc/ChangeLog:

2019-08-07  Martin Liska  

	* cgraphclones.c (set_new_clone_decl_and_node_flags): Drop
	IS_OPERATOR_NEW and IS_OPERATOR_DELETE.
	(create_version_clone_with_body): Likewise.
---
 gcc/cgraphclones.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/gcc/cgraphclones.c b/gcc/cgraphclones.c
index fd867ecac91..28cf2ecc411 100644
--- a/gcc/cgraphclones.c
+++ b/gcc/cgraphclones.c
@@ -248,6 +248,8 @@ set_new_clone_decl_and_node_flags (cgraph_node *new_node)
   DECL_VIRTUAL_P (new_node->decl) = 0;
   DECL_STATIC_CONSTRUCTOR (new_node->decl) = 0;
   DECL_STATIC_DESTRUCTOR (new_node->decl) = 0;
+  DECL_SET_IS_OPERATOR_NEW (new_node->decl, 0);
+  DECL_SET_IS_OPERATOR_DELETE (new_node->decl, 0);
 
   new_node->externally_visible = 0;
   new_node->local.local = 1;
@@ -1065,6 +1067,8 @@ cgraph_node::create_version_clone_with_body
   /* When the old decl was a con-/destructor make sure the clone isn't.  */
   DECL_STATIC_CONSTRUCTOR (new_decl) = 0;
   DECL_STATIC_DESTRUCTOR (new_decl) = 0;
+  DECL_SET_IS_OPERATOR_NEW (new_decl, 0);
+  DECL_SET_IS_OPERATOR_DELETE (new_decl, 0);
 
   /* Create the new version's call-graph node.
  and update the edges of the new node. */
-- 
2.22.0



Re: [PATCH] Handle new operators with no arguments in DCE.

2019-08-07 Thread Jakub Jelinek
On Wed, Aug 07, 2019 at 12:44:28PM +0200, Martin Liška wrote:
> On 8/7/19 11:51 AM, Richard Biener wrote:
> > I think the simplest way to achieve this is to not copy, aka clear,
> > DECL_IS_OPERATOR_* when cloning and removing arguments
> > (cloning for a constant align argument should be OK for example, as is
> > for a constant address).  Or simply always when cloning.
> 
> Ok, then I'm suggesting following tested patch.
> 
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

What about LAMBDA_FUNCTION, doesn't cloning which changes arguments in any
way invalidate that too, i.e. shouldn't it be just 
  FUNCTION_DECL_DECL_TYPE (new_node->decl) = NONE;
instead?  On the other side, if the cloning doesn't change arguments in any
way, do we still want to clear those flags?

Jakub


[Patch, ira] Invalid assert in reload1.c::finish_spills?

2019-08-07 Thread SenthilKumar.Selvaraj
Hi,

  gcc/testsuite/c-c++-common/pr60101.c fails with an ICE for the 
  avr target, because of a gcc_assert firing at reload1.c:4233

  The assert (in the patch below) looks bogus to me, as it's in 
  the if block of

if (! ira_conflicts_p || reg_renumber[i] >= 0)

  For this testcase and for the avr target, ira_conflicts_p is 
  false because build_conflict_bit_table bailed out early 
  (conflict table too big).
  If reg_renumber[i] is now negative, the assert fires and causes 
  the ICE.

  Getting rid of the assert (patch below) makes the ICE go away, 
  not sure if that's the right fix though.

  Comments?

Regards
Senthil


diff --git a/gcc/reload1.c b/gcc/reload1.c
index 38ee356a791..5acba706bee 100644
--- a/gcc/reload1.c
+++ b/gcc/reload1.c
@@ -4230,7 +4230,6 @@ finish_spills (int global)
/* Record the current hard register the pseudo is allocated to
   in pseudo_previous_regs so we avoid reallocating it to the
   same hard reg in a later pass.  */
-   gcc_assert (reg_renumber[i] >= 0);

SET_HARD_REG_BIT (pseudo_previous_regs[i], reg_renumber[i]);
/* Mark it as no longer having a hard register home.  */


Re: [PATCH][RFC][x86] Fix PR91154, add SImode smax, allow SImode add in SSE regs

2019-08-07 Thread Richard Biener
On Wed, 7 Aug 2019, Richard Biener wrote:

> On Mon, 5 Aug 2019, Uros Bizjak wrote:
> 
> > On Mon, Aug 5, 2019 at 3:29 PM Richard Biener  wrote:
> > 
> > > > > > > > > (define_mode_iterator MAXMIN_IMODE [SI "TARGET_SSE4_1"] [DI 
> > > > > > > > > "TARGET_AVX512F"])
> > > > > > > > >
> > > > > > > > > and then we need to split DImode for 32bits, too.
> > > > > > > >
> > > > > > > > For now, please add "TARGET_64BIT && TARGET_AVX512F" for DImode
> > > > > > > > condition, I'll provide _doubleword splitter later.
> > > > > > >
> > > > > > > Shouldn't that be TARGET_AVX512VL instead?  Or does the insn use 
> > > > > > > %g0 etc.
> > > > > > > to force use of %zmmN?
> > > > > >
> > > > > > It generates V4SI mode, so - yes, AVX512VL.
> > > > >
> > > > > case SMAX:
> > > > > case SMIN:
> > > > > case UMAX:
> > > > > case UMIN:
> > > > >   if ((mode == DImode && (!TARGET_64BIT || !TARGET_AVX512VL))
> > > > >   || (mode == SImode && !TARGET_SSE4_1))
> > > > > return false;
> > > > >
> > > > > so there's no way to use AVX512VL for 32bit?
> > > >
> > > > There is a way, but on 32bit targets, we need to split DImode
> > > > operation to a sequence of SImode operations for unconverted pattern.
> > > > This is of course doable, but somehow more complex than simply
> > > > emitting a DImode compare + DImode cmove, which is what current
> > > > splitter does. So, a follow-up task.
> > >
> > > Ah, OK.  So for the above condition we can elide the !TARGET_64BIT
> > > check we just need to properly split if we enable the scalar minmax
> > > pattern for DImode on 32bits, the STV conversion would go fine.
> > 
> > Yes, that is correct.
> 
> So I tested the patch below (now with appropriate ChangeLog) on
> x86_64-unknown-linux-gnu.  I've thrown it at SPEC CPU 2006 with
> the obvious hmmer improvement, now checking for off-noise results
> with a 3-run on those that may have one (with more than +-1 second
> differences in the 1-run).
> 
> As-is the patch likely runs into the splitting issue for DImode
> on i?86 and the patch misses functional testcases.  I'll do the
> hmmer loop with both DImode and SImode and testcases to trigger
> all pattern variants with the different ISAs we have.
> 
> Some of the patch could be split out (the cost changes that are
> also effective for DImode for example).
> 
> AFAICS we could go with only adding SImode avoiding the DImode
> splitting thing and this would solve the hmmer regression.

I've additionally bootstrapped with --with-arch=nehalem which
reveals

FAIL: gcc.target/i386/minmax-2.c scan-assembler test
FAIL: gcc.target/i386/minmax-2.c scan-assembler-not cmp

we emit cmp + cmov here now with -msse4.1 (as soon as the max
pattern is enabled I guess)

Otherwise testing is clean, so I suppose this is the net effect
of just doing the SImode chains;  I don't have AVX512 HW handily
available to really test the DImode path.

Would you be fine to simplify the patch down to SImode chain handling?

Thanks,
Richard.

> Thanks,
> Richard.
> 
> 2019-08-07  Richard Biener  
> 
>   PR target/91154
>   * config/i386/i386-features.h (scalar_chain::scalar_chain): Add
>   mode arguments.
>   (scalar_chain::smode): New member.
>   (scalar_chain::vmode): Likewise.
>   (dimode_scalar_chain): Rename to...
>   (general_scalar_chain): ... this.
>   (general_scalar_chain::general_scalar_chain): Take mode arguments.
>   (timode_scalar_chain::timode_scalar_chain): Initialize scalar_chain
>   base with TImode and V1TImode.
>   * config/i386/i386-features.c (scalar_chain::scalar_chain): Adjust.
>   (general_scalar_chain::vector_const_cost): Adjust for SImode
>   chains.
>   (general_scalar_chain::compute_convert_gain): Likewise.  Fix
>   reg-reg move cost gain, use ix86_cost->sse_op cost and adjust
>   scalar costs.  Add {S,U}{MIN,MAX} support.  Dump per-instruction
>   gain if not zero.
>   (general_scalar_chain::replace_with_subreg): Use vmode/smode.
>   (general_scalar_chain::make_vector_copies): Likewise.  Handle
>   non-DImode chains appropriately.
>   (general_scalar_chain::convert_reg): Likewise.
>   (general_scalar_chain::convert_op): Likewise.
>   (general_scalar_chain::convert_insn): Likewise.  Add
>   fatal_insn_not_found if the result is not recognized.
>   (convertible_comparison_p): Pass in the scalar mode and use that.
>   (general_scalar_to_vector_candidate_p): Likewise.  Rename from
>   dimode_scalar_to_vector_candidate_p.  Add {S,U}{MIN,MAX} support.
>   (scalar_to_vector_candidate_p): Remove by inlining into single
>   caller.
>   (general_remove_non_convertible_regs): Rename from
>   dimode_remove_non_convertible_regs.
>   (remove_non_convertible_regs): Remove by inlining into single caller.
>   (convert_scalars_to_vector): Handle SImode and DImode chains
>   in addition to TImode chains.
>   * config/i386/i386.md (3): New insn 

Re: [PATCH 4/9] Strengthen alias_ptr_types_compatible_p in LTO mode.

2019-08-07 Thread Richard Biener
On Tue, Aug 6, 2019 at 5:43 PM Martin Liska  wrote:

This warrants a comment like

  /* This function originally abstracts from simply comparing
get_deref_alias_set
 so that we are sure this still computes the same result after LTO
type merging
 is applied.  When in LTO type merging is done we can actually do
this compare.  */
  if (in_lto_p)
return get_deref_alias_set (t1) == get_deref_alias_set (t2);
...

also note you want to call get_deref_alias_set as mentioned in the
function comment.

OK with this change.

Thanks,
Richard.

> gcc/ChangeLog:
>
> 2019-07-24  Martin Liska  
>
> * alias.c (alias_ptr_types_compatible_p): Strengten
> type comparison in LTO mode.
> ---
>  gcc/alias.c | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>


Re: [PATCH] Handle new operators with no arguments in DCE.

2019-08-07 Thread Martin Liška
On 8/7/19 12:51 PM, Jakub Jelinek wrote:
> On Wed, Aug 07, 2019 at 12:44:28PM +0200, Martin Liška wrote:
>> On 8/7/19 11:51 AM, Richard Biener wrote:
>>> I think the simplest way to achieve this is to not copy, aka clear,
>>> DECL_IS_OPERATOR_* when cloning and removing arguments
>>> (cloning for a constant align argument should be OK for example, as is
>>> for a constant address).  Or simply always when cloning.
>>
>> Ok, then I'm suggesting following tested patch.
>>
>> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> 
> What about LAMBDA_FUNCTION, doesn't cloning which changes arguments in any
> way invalidate that too, i.e. shouldn't it be just 
>   FUNCTION_DECL_DECL_TYPE (new_node->decl) = NONE;

Well, how are lambdas involved in the new/delete DCE here? Lambdas with removed
arguments should not interfere here.

> instead?  On the other side, if the cloning doesn't change arguments in any
> way, do we still want to clear those flags?

Well, I would consider it safer to drop it always.

Martin

> 
>   Jakub
> 



Re: [PATCH 3/9] operand_equal_p: add support for OBJ_TYPE_REF.

2019-08-07 Thread Richard Biener
On Tue, Aug 6, 2019 at 5:43 PM Martin Liska  wrote:
>
>
> gcc/ChangeLog:

+   /* Virtual table call.  */
+   case OBJ_TYPE_REF:
+ {
+   if (!operand_equal_p (OBJ_TYPE_REF_EXPR (arg0),
+ OBJ_TYPE_REF_EXPR (arg1), flags))
+ return false;
+   if (virtual_method_call_p (arg0))
+ {
+   if (tree_to_uhwi (OBJ_TYPE_REF_TOKEN (arg0))
+   != tree_to_uhwi (OBJ_TYPE_REF_TOKEN (arg1)))
+ return false;
+   if (!types_same_for_odr (obj_type_ref_class (arg0),
+obj_type_ref_class (arg1)))
+ return false;
+   if (!operand_equal_p (OBJ_TYPE_REF_OBJECT (arg0),
+ OBJ_TYPE_REF_OBJECT (arg1), flags))
+ return false;

this all gets deep into the devirt machinery, including looking at
ODR type hashes.  So I'm not sure if we really want to handle
it this "optimistic" in operand_equal_p and completely ignore
other operands when !virtual_method_call_p?  That is, why
not compare OBJ_TYPE_REF_TOKEN/OBJECT always at least?

Do we then have cases where the OBJ_TYPE_REF is actually
distinct according to the remaining check?

+ }


> 2019-07-24  Martin Liska  
>
> * fold-const.c (operand_equal_p): Support OBJ_TYPE_REF.
> * tree.c (add_expr): Hash parts of OBJ_TYPE_REF.
> ---
>  gcc/fold-const.c | 21 +
>  gcc/tree.c   |  9 +
>  2 files changed, 30 insertions(+)
>


Re: [PATCH][RFC][x86] Fix PR91154, add SImode smax, allow SImode add in SSE regs

2019-08-07 Thread Uros Bizjak
On Wed, Aug 7, 2019 at 1:51 PM Richard Biener  wrote:
>
> On Wed, 7 Aug 2019, Richard Biener wrote:
>
> > On Mon, 5 Aug 2019, Uros Bizjak wrote:
> >
> > > On Mon, Aug 5, 2019 at 3:29 PM Richard Biener  wrote:
> > >
> > > > > > > > > > (define_mode_iterator MAXMIN_IMODE [SI "TARGET_SSE4_1"] [DI 
> > > > > > > > > > "TARGET_AVX512F"])
> > > > > > > > > >
> > > > > > > > > > and then we need to split DImode for 32bits, too.
> > > > > > > > >
> > > > > > > > > For now, please add "TARGET_64BIT && TARGET_AVX512F" for 
> > > > > > > > > DImode
> > > > > > > > > condition, I'll provide _doubleword splitter later.
> > > > > > > >
> > > > > > > > Shouldn't that be TARGET_AVX512VL instead?  Or does the insn 
> > > > > > > > use %g0 etc.
> > > > > > > > to force use of %zmmN?
> > > > > > >
> > > > > > > It generates V4SI mode, so - yes, AVX512VL.
> > > > > >
> > > > > > case SMAX:
> > > > > > case SMIN:
> > > > > > case UMAX:
> > > > > > case UMIN:
> > > > > >   if ((mode == DImode && (!TARGET_64BIT || !TARGET_AVX512VL))
> > > > > >   || (mode == SImode && !TARGET_SSE4_1))
> > > > > > return false;
> > > > > >
> > > > > > so there's no way to use AVX512VL for 32bit?
> > > > >
> > > > > There is a way, but on 32bit targets, we need to split DImode
> > > > > operation to a sequence of SImode operations for unconverted pattern.
> > > > > This is of course doable, but somehow more complex than simply
> > > > > emitting a DImode compare + DImode cmove, which is what current
> > > > > splitter does. So, a follow-up task.
> > > >
> > > > Ah, OK.  So for the above condition we can elide the !TARGET_64BIT
> > > > check we just need to properly split if we enable the scalar minmax
> > > > pattern for DImode on 32bits, the STV conversion would go fine.
> > >
> > > Yes, that is correct.
> >
> > So I tested the patch below (now with appropriate ChangeLog) on
> > x86_64-unknown-linux-gnu.  I've thrown it at SPEC CPU 2006 with
> > the obvious hmmer improvement, now checking for off-noise results
> > with a 3-run on those that may have one (with more than +-1 second
> > differences in the 1-run).
> >
> > As-is the patch likely runs into the splitting issue for DImode
> > on i?86 and the patch misses functional testcases.  I'll do the
> > hmmer loop with both DImode and SImode and testcases to trigger
> > all pattern variants with the different ISAs we have.
> >
> > Some of the patch could be split out (the cost changes that are
> > also effective for DImode for example).
> >
> > AFAICS we could go with only adding SImode avoiding the DImode
> > splitting thing and this would solve the hmmer regression.
>
> I've additionally bootstrapped with --with-arch=nehalem which
> reveals
>
> FAIL: gcc.target/i386/minmax-2.c scan-assembler test
> FAIL: gcc.target/i386/minmax-2.c scan-assembler-not cmp
>
> we emit cmp + cmov here now with -msse4.1 (as soon as the max
> pattern is enabled I guess)
>
> Otherwise testing is clean, so I suppose this is the net effect
> of just doing the SImode chains;  I don't have AVX512 HW handily
> available to really test the DImode path.
>
> Would you be fine to simplify the patch down to SImode chain handling?

Just leave DImode for a couple of days to see what HJ's autotesters
reveal. I'd just disable DImode for 32bit targets for now, we know
that splitters are missing.

Some remarks below.

Uros.

>
> Thanks,
> Richard.
>
> > Thanks,
> > Richard.
> >
> > 2019-08-07  Richard Biener  
> >
> >   PR target/91154
> >   * config/i386/i386-features.h (scalar_chain::scalar_chain): Add
> >   mode arguments.
> >   (scalar_chain::smode): New member.
> >   (scalar_chain::vmode): Likewise.
> >   (dimode_scalar_chain): Rename to...
> >   (general_scalar_chain): ... this.
> >   (general_scalar_chain::general_scalar_chain): Take mode arguments.
> >   (timode_scalar_chain::timode_scalar_chain): Initialize scalar_chain
> >   base with TImode and V1TImode.
> >   * config/i386/i386-features.c (scalar_chain::scalar_chain): Adjust.
> >   (general_scalar_chain::vector_const_cost): Adjust for SImode
> >   chains.
> >   (general_scalar_chain::compute_convert_gain): Likewise.  Fix
> >   reg-reg move cost gain, use ix86_cost->sse_op cost and adjust
> >   scalar costs.  Add {S,U}{MIN,MAX} support.  Dump per-instruction
> >   gain if not zero.
> >   (general_scalar_chain::replace_with_subreg): Use vmode/smode.
> >   (general_scalar_chain::make_vector_copies): Likewise.  Handle
> >   non-DImode chains appropriately.
> >   (general_scalar_chain::convert_reg): Likewise.
> >   (general_scalar_chain::convert_op): Likewise.
> >   (general_scalar_chain::convert_insn): Likewise.  Add
> >   fatal_insn_not_found if the result is not recognized.
> >   (convertible_comparison_p): Pass in the scalar mode and use that.
> >   (general_scalar_to_vector_candidate_p): Likewise.  Rename from
> >   dimode_scalar_to

Re: [PATCH] Fix file descriptor existence of MinGW.

2019-08-07 Thread Martin Liška
Hi.

There's one enhanced version where I added HAVE_FCNTL_H.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin
>From 980d84c31f4c27a4f8314808d1ffa548b3c7bcc8 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Tue, 6 Aug 2019 13:04:40 +0200
Subject: [PATCH] Fix file descriptor existence of MinGW.

gcc/ChangeLog:

2019-08-07  Martin Liska  

	PR bootstrap/91352
	* gcc.c (driver::detect_jobserver): Use fd_exists.
	* lto-wrapper.c (jobserver_active_p): Likewise.

include/ChangeLog:

2019-08-07  Martin Liska  

	PR bootstrap/91352
	* libiberty.h (fd_exists): New function.

libiberty/ChangeLog:

2019-08-07  Martin Liska  

	PR bootstrap/91352
	* lrealpath.c (fd_exists): New function.
---
 gcc/gcc.c |  4 ++--
 gcc/lto-wrapper.c |  4 ++--
 include/libiberty.h   |  4 
 libiberty/lrealpath.c | 16 
 4 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/gcc/gcc.c b/gcc/gcc.c
index 18a07426290..e86b35453c2 100644
--- a/gcc/gcc.c
+++ b/gcc/gcc.c
@@ -8380,8 +8380,8 @@ driver::detect_jobserver () const
 	= (sscanf (n + strlen (needle), "%d,%d", &rfd, &wfd) == 2
 	   && rfd > 0
 	   && wfd > 0
-	   && fcntl (rfd, F_GETFD) >= 0
-	   && fcntl (wfd, F_GETFD) >= 0);
+	   && fd_exists (rfd)
+	   && fd_exists (wfd));
 
 	  /* Drop the jobserver if it's not working now.  */
 	  if (!jobserver)
diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c
index 3414adedd26..a3b0130b7cb 100644
--- a/gcc/lto-wrapper.c
+++ b/gcc/lto-wrapper.c
@@ -1237,8 +1237,8 @@ jobserver_active_p (void)
   return (sscanf (n + strlen (needle), "%d,%d", &rfd, &wfd) == 2
 	  && rfd > 0
 	  && wfd > 0
-	  && fcntl (rfd, F_GETFD) >= 0
-	  && fcntl (wfd, F_GETFD) >= 0);
+	  && fd_exists (rfd)
+	  && fd_exists (wfd));
 }
 
 /* Execute gcc. ARGC is the number of arguments. ARGV contains the arguments. */
diff --git a/include/libiberty.h b/include/libiberty.h
index 635519e088a..254c00bcc6b 100644
--- a/include/libiberty.h
+++ b/include/libiberty.h
@@ -137,6 +137,10 @@ extern const char *unix_lbasename (const char *) ATTRIBUTE_RETURNS_NONNULL ATTRI
 
 extern char *lrealpath (const char *);
 
+/* Return true when FD file descriptor exists.  */
+
+extern int fd_exists (int fd);
+
 /* Concatenate an arbitrary number of strings.  You must pass NULL as
the last argument of this function, to terminate the list of
strings.  Allocates memory using xmalloc.  */
diff --git a/libiberty/lrealpath.c b/libiberty/lrealpath.c
index 7f66dc2b1bd..6d6c161f7c6 100644
--- a/libiberty/lrealpath.c
+++ b/libiberty/lrealpath.c
@@ -49,6 +49,9 @@ components will be simplified.  The returned value will be allocated using
 #ifdef HAVE_STRING_H
 #include 
 #endif
+#ifdef HAVE_FCNTL_H
+#include 
+#endif
 
 /* On GNU libc systems the declaration is only visible with _GNU_SOURCE.  */
 #if defined(HAVE_CANONICALIZE_FILE_NAME) \
@@ -155,3 +158,16 @@ lrealpath (const char *filename)
   /* This system is a lost cause, just duplicate the filename.  */
   return strdup (filename);
 }
+
+/* Return true when FD file descriptor exists.  */
+
+int
+fd_exists (int fd)
+{
+#if defined(_WIN32)
+  HANDLE h = (HANDLE) _get_osfhandle (fd);
+  return h != (HANDLE) -1;
+#else
+  return fcntl (fd, F_GETFD) >= 0;
+#endif
+}
-- 
2.22.0



Re: [PATCH 2/9] operand_equal_p: add support for FIELD_DECL

2019-08-07 Thread Richard Biener
On Tue, Aug 6, 2019 at 5:44 PM Martin Liska  wrote:
>
>
> gcc/ChangeLog:

So I suppose this isn't to call operand_equal_p on two FIELD_DECLs
but to make two COMPONENT_REFs "more equal"?  If so I then
I suggest to make this change "local" to the COMPONENT_REF handling.
This also interacts with path-based disambiguation so you want to make
sure to only make things equal here iff it wouldn't change the outcome
of path-based analysis.  Honza?

Richard.

> 2019-07-24  Martin Liska  
>
> * fold-const.c (operand_equal_p): Support FIELD_DECL
> as well.
> * tree.c (add_expr): Hast DECL_FIELD_OFFSET and
> DECL_FIELD_BIT_OFFSET for FIELD_DECL.
>
> gcc/testsuite/ChangeLog:
>
> 2019-07-24  Martin Liska  
>
> * gcc.dg/vect/vect-35-big-array.c: Vectorize one more loop.
> * gcc.dg/vect/vect-35.c: Likewise.
> * gcc.dg/pr70740.c: Move from torture and set -O2.
> * gfortran.dg/vect/vect-8.f90: Update scanned pattern.
> ---
>  gcc/fold-const.c  | 34 ---
>  gcc/testsuite/gcc.dg/{torture => }/pr70740.c  |  3 +-
>  gcc/testsuite/gcc.dg/vect/vect-35-big-array.c |  3 +-
>  gcc/testsuite/gcc.dg/vect/vect-35.c   |  3 +-
>  gcc/testsuite/gfortran.dg/vect/vect-8.f90 |  2 +-
>  gcc/tree.c|  4 +++
>  6 files changed, 38 insertions(+), 11 deletions(-)
>  rename gcc/testsuite/gcc.dg/{torture => }/pr70740.c (77%)
>


Re: [PATCH 1/9] Replace int with boolean in predicate functions.

2019-08-07 Thread Richard Biener
On Tue, Aug 6, 2019 at 5:45 PM Martin Liska  wrote:
>
>
> gcc/ChangeLog:

OK.

Richard.

> 2019-07-24  Martin Liska  
>
> * fold-const.c (twoval_comparison_p): Replace int
> with bool as a return type.
> (simple_operand_p): Likewise.
> (operand_equal_p): Replace int with bool as a return type.
> * fold-const.h (operand_equal_p): Likewise.
> ---
>  gcc/fold-const.c | 148 +++
>  gcc/fold-const.h |   2 +-
>  2 files changed, 75 insertions(+), 75 deletions(-)
>


Re: [PATCH][RFC][x86] Fix PR91154, add SImode smax, allow SImode add in SSE regs

2019-08-07 Thread Uros Bizjak
On Wed, Aug 7, 2019 at 1:51 PM Richard Biener  wrote:
>
> On Wed, 7 Aug 2019, Richard Biener wrote:
>
> > On Mon, 5 Aug 2019, Uros Bizjak wrote:
> >
> > > On Mon, Aug 5, 2019 at 3:29 PM Richard Biener  wrote:
> > >
> > > > > > > > > > (define_mode_iterator MAXMIN_IMODE [SI "TARGET_SSE4_1"] [DI 
> > > > > > > > > > "TARGET_AVX512F"])
> > > > > > > > > >
> > > > > > > > > > and then we need to split DImode for 32bits, too.
> > > > > > > > >
> > > > > > > > > For now, please add "TARGET_64BIT && TARGET_AVX512F" for 
> > > > > > > > > DImode
> > > > > > > > > condition, I'll provide _doubleword splitter later.
> > > > > > > >
> > > > > > > > Shouldn't that be TARGET_AVX512VL instead?  Or does the insn 
> > > > > > > > use %g0 etc.
> > > > > > > > to force use of %zmmN?
> > > > > > >
> > > > > > > It generates V4SI mode, so - yes, AVX512VL.
> > > > > >
> > > > > > case SMAX:
> > > > > > case SMIN:
> > > > > > case UMAX:
> > > > > > case UMIN:
> > > > > >   if ((mode == DImode && (!TARGET_64BIT || !TARGET_AVX512VL))
> > > > > >   || (mode == SImode && !TARGET_SSE4_1))
> > > > > > return false;
> > > > > >
> > > > > > so there's no way to use AVX512VL for 32bit?
> > > > >
> > > > > There is a way, but on 32bit targets, we need to split DImode
> > > > > operation to a sequence of SImode operations for unconverted pattern.
> > > > > This is of course doable, but somehow more complex than simply
> > > > > emitting a DImode compare + DImode cmove, which is what current
> > > > > splitter does. So, a follow-up task.
> > > >
> > > > Ah, OK.  So for the above condition we can elide the !TARGET_64BIT
> > > > check we just need to properly split if we enable the scalar minmax
> > > > pattern for DImode on 32bits, the STV conversion would go fine.
> > >
> > > Yes, that is correct.
> >
> > So I tested the patch below (now with appropriate ChangeLog) on
> > x86_64-unknown-linux-gnu.  I've thrown it at SPEC CPU 2006 with
> > the obvious hmmer improvement, now checking for off-noise results
> > with a 3-run on those that may have one (with more than +-1 second
> > differences in the 1-run).
> >
> > As-is the patch likely runs into the splitting issue for DImode
> > on i?86 and the patch misses functional testcases.  I'll do the
> > hmmer loop with both DImode and SImode and testcases to trigger
> > all pattern variants with the different ISAs we have.
> >
> > Some of the patch could be split out (the cost changes that are
> > also effective for DImode for example).
> >
> > AFAICS we could go with only adding SImode avoiding the DImode
> > splitting thing and this would solve the hmmer regression.
>
> I've additionally bootstrapped with --with-arch=nehalem which
> reveals
>
> FAIL: gcc.target/i386/minmax-2.c scan-assembler test
> FAIL: gcc.target/i386/minmax-2.c scan-assembler-not cmp
>
> we emit cmp + cmov here now with -msse4.1 (as soon as the max
> pattern is enabled I guess)

Actually, we have to split using ix86_expand_int_compare. This will
generate optimized CC mode.

Uros.

>
> Otherwise testing is clean, so I suppose this is the net effect
> of just doing the SImode chains;  I don't have AVX512 HW handily
> available to really test the DImode path.
>
> Would you be fine to simplify the patch down to SImode chain handling?
>
> Thanks,
> Richard.
>
> > Thanks,
> > Richard.
> >
> > 2019-08-07  Richard Biener  
> >
> >   PR target/91154
> >   * config/i386/i386-features.h (scalar_chain::scalar_chain): Add
> >   mode arguments.
> >   (scalar_chain::smode): New member.
> >   (scalar_chain::vmode): Likewise.
> >   (dimode_scalar_chain): Rename to...
> >   (general_scalar_chain): ... this.
> >   (general_scalar_chain::general_scalar_chain): Take mode arguments.
> >   (timode_scalar_chain::timode_scalar_chain): Initialize scalar_chain
> >   base with TImode and V1TImode.
> >   * config/i386/i386-features.c (scalar_chain::scalar_chain): Adjust.
> >   (general_scalar_chain::vector_const_cost): Adjust for SImode
> >   chains.
> >   (general_scalar_chain::compute_convert_gain): Likewise.  Fix
> >   reg-reg move cost gain, use ix86_cost->sse_op cost and adjust
> >   scalar costs.  Add {S,U}{MIN,MAX} support.  Dump per-instruction
> >   gain if not zero.
> >   (general_scalar_chain::replace_with_subreg): Use vmode/smode.
> >   (general_scalar_chain::make_vector_copies): Likewise.  Handle
> >   non-DImode chains appropriately.
> >   (general_scalar_chain::convert_reg): Likewise.
> >   (general_scalar_chain::convert_op): Likewise.
> >   (general_scalar_chain::convert_insn): Likewise.  Add
> >   fatal_insn_not_found if the result is not recognized.
> >   (convertible_comparison_p): Pass in the scalar mode and use that.
> >   (general_scalar_to_vector_candidate_p): Likewise.  Rename from
> >   dimode_scalar_to_vector_candidate_p.  Add {S,U}{MIN,MAX} support.
> >   (scalar_to_vector_candidate_

Re: [Patch, ira] Invalid assert in reload1.c::finish_spills?

2019-08-07 Thread Michael Matz
Hi,

On Wed, 7 Aug 2019, senthilkumar.selva...@microchip.com wrote:

>   gcc/testsuite/c-c++-common/pr60101.c fails with an ICE for the 
>   avr target, because of a gcc_assert firing at reload1.c:4233
> 
>   The assert (in the patch below) looks bogus to me, as it's in 
>   the if block of
> 
> if (! ira_conflicts_p || reg_renumber[i] >= 0)
> 
>   For this testcase and for the avr target, ira_conflicts_p is 
>   false because build_conflict_bit_table bailed out early 
>   (conflict table too big).
>   If reg_renumber[i] is now negative, the assert fires and causes 
>   the ICE.
> 
>   Getting rid of the assert (patch below) makes the ICE go away, 
>   not sure if that's the right fix though.

But that obviously can't work, because of the immediately following line,
also quoted in your patch:

>   SET_HARD_REG_BIT (pseudo_previous_regs[i], reg_renumber[i]);

If reg_renumber[i] is negative that wreaks havok.  So some other logic is 
broken.  If it's really okay to have reg_renumber be negative here without 
IRA conflicts (and I don't know if that's the case), then the assert needs 
to become a condition for this statement.


Ciao,
Michael.


Re: [PATCH][RFC][x86] Fix PR91154, add SImode smax, allow SImode add in SSE regs

2019-08-07 Thread Uros Bizjak
On Wed, Aug 7, 2019 at 2:20 PM Uros Bizjak  wrote:
>
> On Wed, Aug 7, 2019 at 1:51 PM Richard Biener  wrote:
> >
> > On Wed, 7 Aug 2019, Richard Biener wrote:
> >
> > > On Mon, 5 Aug 2019, Uros Bizjak wrote:
> > >
> > > > On Mon, Aug 5, 2019 at 3:29 PM Richard Biener  wrote:
> > > >
> > > > > > > > > > > (define_mode_iterator MAXMIN_IMODE [SI "TARGET_SSE4_1"] 
> > > > > > > > > > > [DI "TARGET_AVX512F"])
> > > > > > > > > > >
> > > > > > > > > > > and then we need to split DImode for 32bits, too.
> > > > > > > > > >
> > > > > > > > > > For now, please add "TARGET_64BIT && TARGET_AVX512F" for 
> > > > > > > > > > DImode
> > > > > > > > > > condition, I'll provide _doubleword splitter later.
> > > > > > > > >
> > > > > > > > > Shouldn't that be TARGET_AVX512VL instead?  Or does the insn 
> > > > > > > > > use %g0 etc.
> > > > > > > > > to force use of %zmmN?
> > > > > > > >
> > > > > > > > It generates V4SI mode, so - yes, AVX512VL.
> > > > > > >
> > > > > > > case SMAX:
> > > > > > > case SMIN:
> > > > > > > case UMAX:
> > > > > > > case UMIN:
> > > > > > >   if ((mode == DImode && (!TARGET_64BIT || !TARGET_AVX512VL))
> > > > > > >   || (mode == SImode && !TARGET_SSE4_1))
> > > > > > > return false;
> > > > > > >
> > > > > > > so there's no way to use AVX512VL for 32bit?
> > > > > >
> > > > > > There is a way, but on 32bit targets, we need to split DImode
> > > > > > operation to a sequence of SImode operations for unconverted 
> > > > > > pattern.
> > > > > > This is of course doable, but somehow more complex than simply
> > > > > > emitting a DImode compare + DImode cmove, which is what current
> > > > > > splitter does. So, a follow-up task.
> > > > >
> > > > > Ah, OK.  So for the above condition we can elide the !TARGET_64BIT
> > > > > check we just need to properly split if we enable the scalar minmax
> > > > > pattern for DImode on 32bits, the STV conversion would go fine.
> > > >
> > > > Yes, that is correct.
> > >
> > > So I tested the patch below (now with appropriate ChangeLog) on
> > > x86_64-unknown-linux-gnu.  I've thrown it at SPEC CPU 2006 with
> > > the obvious hmmer improvement, now checking for off-noise results
> > > with a 3-run on those that may have one (with more than +-1 second
> > > differences in the 1-run).
> > >
> > > As-is the patch likely runs into the splitting issue for DImode
> > > on i?86 and the patch misses functional testcases.  I'll do the
> > > hmmer loop with both DImode and SImode and testcases to trigger
> > > all pattern variants with the different ISAs we have.
> > >
> > > Some of the patch could be split out (the cost changes that are
> > > also effective for DImode for example).
> > >
> > > AFAICS we could go with only adding SImode avoiding the DImode
> > > splitting thing and this would solve the hmmer regression.
> >
> > I've additionally bootstrapped with --with-arch=nehalem which
> > reveals
> >
> > FAIL: gcc.target/i386/minmax-2.c scan-assembler test
> > FAIL: gcc.target/i386/minmax-2.c scan-assembler-not cmp
> >
> > we emit cmp + cmov here now with -msse4.1 (as soon as the max
> > pattern is enabled I guess)
>
> Actually, we have to split using ix86_expand_int_compare. This will
> generate optimized CC mode.

So, this only matters for comparisons against zero. Currently, the
insn_and_split pattern allows only registers, but we can add other
types, too. I'd say that this is benign issue.

Uros.


PR driver/91130 Use CL_DRIVER when handling of COLLECT_GCC_OPTIONS in lto-wrapper.c

2019-08-07 Thread Richard Earnshaw (lists)
Some options are handled differently by the main driver (gcc, g++, etc) 
from the back-end compiler programs (cc1, cc1plus, etc) in that in the 
driver they do not take an additional argument, while in the compiler 
programs they do.  The processing option option CL_DRIVER controls this 
alternative interpretation of the options.


The environment variable COLLECT_GCC_OPTIONS is the list of options to 
add to a compile if the compiler re-invokes itself at some point.  As 
such, the options are driver options, so CL_DRIVER should be used when 
processing this list.  Currently lto-wrapper is doing this incorrectly.


PR driver/91130
* lto-wrapper.c (find_and_merge_options): Use CL_DRIVER when
processing COLLECT_GCC_OPTIONS.
(run_gcc): Likewise.

Bootstrapped on aarch64-linux

OK?

NB, this is essentially the same as Richi's patch in the PR.  I'll let 
you decide which to take...


diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c
index 3414ade..f93ff50 100644
--- a/gcc/lto-wrapper.c
+++ b/gcc/lto-wrapper.c
@@ -1010,7 +1010,7 @@ find_and_merge_options (int fd, off_t file_offset, const char *prefix,
   struct cl_decoded_option *f2decoded_options;
   unsigned int f2decoded_options_count;
   get_options_from_collect_gcc_options (collect_gcc,
-	fopts, CL_LANG_ALL,
+	fopts, CL_DRIVER,
 	&f2decoded_options,
 	&f2decoded_options_count);
   if (!fdecoded_options)
@@ -1283,7 +1283,7 @@ run_gcc (unsigned argc, char *argv[])
 fatal_error (input_location,
 		 "environment variable % must be set");
   get_options_from_collect_gcc_options (collect_gcc, collect_gcc_options,
-	CL_LANG_ALL,
+	CL_DRIVER,
 	&decoded_options,
 	&decoded_options_count);
 


Re: [PATCH][RFC][x86] Fix PR91154, add SImode smax, allow SImode add in SSE regs

2019-08-07 Thread Richard Biener
On Wed, 7 Aug 2019, Uros Bizjak wrote:

> On Wed, Aug 7, 2019 at 2:20 PM Uros Bizjak  wrote:
> >
> > On Wed, Aug 7, 2019 at 1:51 PM Richard Biener  wrote:
> > >
> > > On Wed, 7 Aug 2019, Richard Biener wrote:
> > >
> > > > On Mon, 5 Aug 2019, Uros Bizjak wrote:
> > > >
> > > > > On Mon, Aug 5, 2019 at 3:29 PM Richard Biener  
> > > > > wrote:
> > > > >
> > > > > > > > > > > > (define_mode_iterator MAXMIN_IMODE [SI "TARGET_SSE4_1"] 
> > > > > > > > > > > > [DI "TARGET_AVX512F"])
> > > > > > > > > > > >
> > > > > > > > > > > > and then we need to split DImode for 32bits, too.
> > > > > > > > > > >
> > > > > > > > > > > For now, please add "TARGET_64BIT && TARGET_AVX512F" for 
> > > > > > > > > > > DImode
> > > > > > > > > > > condition, I'll provide _doubleword splitter later.
> > > > > > > > > >
> > > > > > > > > > Shouldn't that be TARGET_AVX512VL instead?  Or does the 
> > > > > > > > > > insn use %g0 etc.
> > > > > > > > > > to force use of %zmmN?
> > > > > > > > >
> > > > > > > > > It generates V4SI mode, so - yes, AVX512VL.
> > > > > > > >
> > > > > > > > case SMAX:
> > > > > > > > case SMIN:
> > > > > > > > case UMAX:
> > > > > > > > case UMIN:
> > > > > > > >   if ((mode == DImode && (!TARGET_64BIT || 
> > > > > > > > !TARGET_AVX512VL))
> > > > > > > >   || (mode == SImode && !TARGET_SSE4_1))
> > > > > > > > return false;
> > > > > > > >
> > > > > > > > so there's no way to use AVX512VL for 32bit?
> > > > > > >
> > > > > > > There is a way, but on 32bit targets, we need to split DImode
> > > > > > > operation to a sequence of SImode operations for unconverted 
> > > > > > > pattern.
> > > > > > > This is of course doable, but somehow more complex than simply
> > > > > > > emitting a DImode compare + DImode cmove, which is what current
> > > > > > > splitter does. So, a follow-up task.
> > > > > >
> > > > > > Ah, OK.  So for the above condition we can elide the !TARGET_64BIT
> > > > > > check we just need to properly split if we enable the scalar minmax
> > > > > > pattern for DImode on 32bits, the STV conversion would go fine.
> > > > >
> > > > > Yes, that is correct.
> > > >
> > > > So I tested the patch below (now with appropriate ChangeLog) on
> > > > x86_64-unknown-linux-gnu.  I've thrown it at SPEC CPU 2006 with
> > > > the obvious hmmer improvement, now checking for off-noise results
> > > > with a 3-run on those that may have one (with more than +-1 second
> > > > differences in the 1-run).
> > > >
> > > > As-is the patch likely runs into the splitting issue for DImode
> > > > on i?86 and the patch misses functional testcases.  I'll do the
> > > > hmmer loop with both DImode and SImode and testcases to trigger
> > > > all pattern variants with the different ISAs we have.
> > > >
> > > > Some of the patch could be split out (the cost changes that are
> > > > also effective for DImode for example).
> > > >
> > > > AFAICS we could go with only adding SImode avoiding the DImode
> > > > splitting thing and this would solve the hmmer regression.
> > >
> > > I've additionally bootstrapped with --with-arch=nehalem which
> > > reveals
> > >
> > > FAIL: gcc.target/i386/minmax-2.c scan-assembler test
> > > FAIL: gcc.target/i386/minmax-2.c scan-assembler-not cmp
> > >
> > > we emit cmp + cmov here now with -msse4.1 (as soon as the max
> > > pattern is enabled I guess)
> >
> > Actually, we have to split using ix86_expand_int_compare. This will
> > generate optimized CC mode.
> 
> So, this only matters for comparisons against zero. Currently, the
> insn_and_split pattern allows only registers, but we can add other
> types, too. I'd say that this is benign issue.

OK.  So this is with your suggestions applied plus testcases as
promised.  If we remove DImode support minmax-5.c has to be adjusted
at least.

Currently re-bootstrapping / testing on x86_64-unknown-linux-gnu.

I'll followup with the performance assessment (currently only
testing on Haswell), but I guess it is easy enough to address
issues that pop up with the various auto-testers as followup
by adjusting the cost function (and we may get additional testcases
then as well).

OK if the re-testing shows no issues?

Thanks,
Richard.

2019-08-07  Richard Biener  

PR target/91154
* config/i386/i386-features.h (scalar_chain::scalar_chain): Add
mode arguments.
(scalar_chain::smode): New member.
(scalar_chain::vmode): Likewise.
(dimode_scalar_chain): Rename to...
(general_scalar_chain): ... this.
(general_scalar_chain::general_scalar_chain): Take mode arguments.
(timode_scalar_chain::timode_scalar_chain): Initialize scalar_chain
base with TImode and V1TImode.
* config/i386/i386-features.c (scalar_chain::scalar_chain): Adjust.
(general_scalar_chain::vector_const_cost): Adjust for SImode
chains.
(general_scalar_chain::compute_convert_gain): Likewise.  Fix
reg-reg move cost gain, use ix86_cost->sse_op cos

Re: [Committed] PR fortran/54072 -- More fun with BOZ

2019-08-07 Thread Mark Eggleston

Steve,

BOZ problems in the following areas

 * use of logical and character variables with BOZ constants
 * comparisons with BOZ constants
 * DATA statements

Comparing 9.1 and trunk:

character variables (9.1)

 * old style initialisation - not allowed (Incompatible types)
 * new style initialisation - not allowed (Cannot convert)
 * assignment - not allowed (Cannot convert)
 * DATA statement not allowed

character variables (trunk with -fallow-invalid-boz)

 * old style initialisation - not allowed (type mismatch)
 * new style initialisation - not allowed (Unclassifiable statement)
 * assignment - not allowed (Invalid use)
 * DATA statement allowed

logical variables (9.1)

 * old style initialisation - not allowed (Incompatible types)
 * new style initialisation - allowed with warning
 * assignment - allowed with warning
 * DATA statement not allowed (Incompatible types)

logical variables (trunk with -fallow-invalid-boz)

 * old style initialisation - not allowed (type mismatch)
 * new style initialisation - not allowed (Unclassifiable statement)
 * assignment - not allowed (invalid use)
 * DATA statement allowed

Comparisons with BOZ constants was allowed using 9.1 with 
-Wconversion-extra:


    5 | if (i4 .eq. z'1000') then
  |    1
Warning: Conversion from INTEGER(4) to INTEGER(16) at (1) 
[-Wconversion-extra]


Using trunk  with -fallow-invalid-boz comparison is not allowed:

    5 | if (i4 .eq. z'1000') then
  |    1
Error: Operands of comparison operator '.eq.' at (1) are INTEGER(4)/BOZ

I would have expected a suitable warning about using the BOZ in an 
inappropriate place.


DATA statements for logical and character variable compile but do not work:

program test
  character(4) :: c
  data c / z'41424344' /
  write(*, *) "'" // c // "'", transfer(c, 0_4)
end program test

Outputs:

 ''   0

program test
  logical(4) b / z'0001' /
  write(*, *) b
end program test

Outputs:

 F

Apologies if this should have been reported via bugzilla. If so let me 
know and I'll submit it split into 2 or 3 bug reports.


The trunk compiler was built on x86_64 hardware using code from 
Subversion revision 274157.



regards,
Mark Eggleston


On 24/07/2019 00:05, Steve Kargl wrote:

I've committed the attached patch as a follow-up to
the recent BOZ (r273747).  It removes a few leftover
comments as well as fixes the PR.

2019-07-23  Steven G. Kargl  

  PR fortran/54072
  * check.c (gfc_invalid_boz): Fix comment.
  (illegal_boz_arg): New function.
  (gfc_check_transfer): Use to arguments.
  (gfc_check_storage_size): Ditto.
  (gfc_check_complex): Remove leftover comment from BOZ patch.
  * primary.c (match_boz_constant): Remove leftover comment.


2019-07-23  Steven G. Kargl  

  PR fortran/54072
  * gfortran.dg/illegal_boz_arg_1.f90: New tests.


--
https://www.codethink.co.uk/privacy.html



Re: [PATCH][RFC][x86] Fix PR91154, add SImode smax, allow SImode add in SSE regs

2019-08-07 Thread Uros Bizjak
On Wed, Aug 7, 2019 at 2:52 PM Richard Biener  wrote:
>
> On Wed, 7 Aug 2019, Uros Bizjak wrote:
>
> > On Wed, Aug 7, 2019 at 2:20 PM Uros Bizjak  wrote:
> > >
> > > On Wed, Aug 7, 2019 at 1:51 PM Richard Biener  wrote:
> > > >
> > > > On Wed, 7 Aug 2019, Richard Biener wrote:
> > > >
> > > > > On Mon, 5 Aug 2019, Uros Bizjak wrote:
> > > > >
> > > > > > On Mon, Aug 5, 2019 at 3:29 PM Richard Biener  
> > > > > > wrote:
> > > > > >
> > > > > > > > > > > > > (define_mode_iterator MAXMIN_IMODE [SI 
> > > > > > > > > > > > > "TARGET_SSE4_1"] [DI "TARGET_AVX512F"])
> > > > > > > > > > > > >
> > > > > > > > > > > > > and then we need to split DImode for 32bits, too.
> > > > > > > > > > > >
> > > > > > > > > > > > For now, please add "TARGET_64BIT && TARGET_AVX512F" 
> > > > > > > > > > > > for DImode
> > > > > > > > > > > > condition, I'll provide _doubleword splitter later.
> > > > > > > > > > >
> > > > > > > > > > > Shouldn't that be TARGET_AVX512VL instead?  Or does the 
> > > > > > > > > > > insn use %g0 etc.
> > > > > > > > > > > to force use of %zmmN?
> > > > > > > > > >
> > > > > > > > > > It generates V4SI mode, so - yes, AVX512VL.
> > > > > > > > >
> > > > > > > > > case SMAX:
> > > > > > > > > case SMIN:
> > > > > > > > > case UMAX:
> > > > > > > > > case UMIN:
> > > > > > > > >   if ((mode == DImode && (!TARGET_64BIT || 
> > > > > > > > > !TARGET_AVX512VL))
> > > > > > > > >   || (mode == SImode && !TARGET_SSE4_1))
> > > > > > > > > return false;
> > > > > > > > >
> > > > > > > > > so there's no way to use AVX512VL for 32bit?
> > > > > > > >
> > > > > > > > There is a way, but on 32bit targets, we need to split DImode
> > > > > > > > operation to a sequence of SImode operations for unconverted 
> > > > > > > > pattern.
> > > > > > > > This is of course doable, but somehow more complex than simply
> > > > > > > > emitting a DImode compare + DImode cmove, which is what current
> > > > > > > > splitter does. So, a follow-up task.
> > > > > > >
> > > > > > > Ah, OK.  So for the above condition we can elide the !TARGET_64BIT
> > > > > > > check we just need to properly split if we enable the scalar 
> > > > > > > minmax
> > > > > > > pattern for DImode on 32bits, the STV conversion would go fine.
> > > > > >
> > > > > > Yes, that is correct.
> > > > >
> > > > > So I tested the patch below (now with appropriate ChangeLog) on
> > > > > x86_64-unknown-linux-gnu.  I've thrown it at SPEC CPU 2006 with
> > > > > the obvious hmmer improvement, now checking for off-noise results
> > > > > with a 3-run on those that may have one (with more than +-1 second
> > > > > differences in the 1-run).
> > > > >
> > > > > As-is the patch likely runs into the splitting issue for DImode
> > > > > on i?86 and the patch misses functional testcases.  I'll do the
> > > > > hmmer loop with both DImode and SImode and testcases to trigger
> > > > > all pattern variants with the different ISAs we have.
> > > > >
> > > > > Some of the patch could be split out (the cost changes that are
> > > > > also effective for DImode for example).
> > > > >
> > > > > AFAICS we could go with only adding SImode avoiding the DImode
> > > > > splitting thing and this would solve the hmmer regression.
> > > >
> > > > I've additionally bootstrapped with --with-arch=nehalem which
> > > > reveals
> > > >
> > > > FAIL: gcc.target/i386/minmax-2.c scan-assembler test
> > > > FAIL: gcc.target/i386/minmax-2.c scan-assembler-not cmp
> > > >
> > > > we emit cmp + cmov here now with -msse4.1 (as soon as the max
> > > > pattern is enabled I guess)
> > >
> > > Actually, we have to split using ix86_expand_int_compare. This will
> > > generate optimized CC mode.
> >
> > So, this only matters for comparisons against zero. Currently, the
> > insn_and_split pattern allows only registers, but we can add other
> > types, too. I'd say that this is benign issue.
>
> OK.  So this is with your suggestions applied plus testcases as
> promised.  If we remove DImode support minmax-5.c has to be adjusted
> at least.
>
> Currently re-bootstrapping / testing on x86_64-unknown-linux-gnu.
>
> I'll followup with the performance assessment (currently only
> testing on Haswell), but I guess it is easy enough to address
> issues that pop up with the various auto-testers as followup
> by adjusting the cost function (and we may get additional testcases
> then as well).
>
> OK if the re-testing shows no issues?
>
> Thanks,
> Richard.
>
> 2019-08-07  Richard Biener  
>
> PR target/91154
> * config/i386/i386-features.h (scalar_chain::scalar_chain): Add
> mode arguments.
> (scalar_chain::smode): New member.
> (scalar_chain::vmode): Likewise.
> (dimode_scalar_chain): Rename to...
> (general_scalar_chain): ... this.
> (general_scalar_chain::general_scalar_chain): Take mode arguments.
> (timode_scalar_chain::timode_scalar_chain): Initialize scalar_chain
> base with TImode and V1TImod

Re: C++ PATCH for c++/91346 - Implement C++2a P1668R1, allow unevaluated asm in constexpr

2019-08-07 Thread Marek Polacek
On Tue, Aug 06, 2019 at 09:25:45PM -0400, Jason Merrill wrote:
> Let's downgrade the errors in earlier standard modes to pedwarn. Ok with
> that change.

Works for me, here's what I'll apply once it passes testing.

I removed the diagnostic in potential_constant_expression_1/ASM_EXPR so that
we don't generate duplicate pedwarns for the same thing.  Hope that's OK.

2019-08-07  Marek Polacek  

PR c++/91346 - Implement P1668R1, allow unevaluated asm in constexpr.
* constexpr.c (cxx_eval_constant_expression): Handle ASM_EXPR.
(potential_constant_expression_1) : Allow.
* cp-tree.h (finish_asm_stmt): Adjust.
* parser.c (cp_parser_asm_definition): Grab the locaion of "asm" and
use it.  Change an error to a pedwarn.  Allow asm in C++2a, warn
otherwise.
* pt.c (tsubst_expr): Pass a location down to finish_asm_stmt.
* semantics.c (finish_asm_stmt): New location_t parameter.  Use it.

* g++.dg/cpp2a/inline-asm1.C: New test.
* g++.dg/cpp2a/inline-asm2.C: New test.
* g++.dg/cpp1y/constexpr-neg1.C: Adjust dg-error.

diff --git gcc/cp/constexpr.c gcc/cp/constexpr.c
index 36a66337433..e86b0789b84 100644
--- gcc/cp/constexpr.c
+++ gcc/cp/constexpr.c
@@ -5289,6 +5289,18 @@ cxx_eval_constant_expression (const constexpr_ctx *ctx, 
tree t,
   r = void_node;
   break;
 
+case ASM_EXPR:
+  if (!ctx->quiet)
+   {
+ error_at (cp_expr_loc_or_input_loc (t),
+   "inline assembly is not a constant expression");
+ inform (cp_expr_loc_or_input_loc (t),
+ "only unevaluated inline assembly is allowed in a "
+ "% function in C++2a");
+   }
+  *non_constant_p = true;
+  return t;
+
 default:
   if (STATEMENT_CODE_P (TREE_CODE (t)))
{
@@ -6469,13 +6481,18 @@ potential_constant_expression_1 (tree t, bool 
want_rval, bool strict, bool now,
   /* GCC internal stuff.  */
 case VA_ARG_EXPR:
 case TRANSACTION_EXPR:
-case ASM_EXPR:
 case AT_ENCODE_EXPR:
 fail:
   if (flags & tf_error)
error_at (loc, "expression %qE is not a constant expression", t);
   return false;
 
+case ASM_EXPR:
+  /* In C++2a, unevaluated inline assembly is permitted in constexpr
+functions.  If it's used in earlier standard modes, we pedwarn in
+cp_parser_asm_definition.  */
+  return true;
+
 case OBJ_TYPE_REF:
   if (cxx_dialect >= cxx2a)
/* In C++2a virtual calls can be constexpr, don't give up yet.  */
diff --git gcc/cp/cp-tree.h gcc/cp/cp-tree.h
index d4e67cdfd96..72ee1d61e97 100644
--- gcc/cp/cp-tree.h
+++ gcc/cp/cp-tree.h
@@ -7052,8 +7052,8 @@ enum {
 extern tree begin_compound_stmt(unsigned int);
 
 extern void finish_compound_stmt   (tree);
-extern tree finish_asm_stmt(int, tree, tree, tree, tree,
-tree, bool);
+extern tree finish_asm_stmt(location_t, int, tree, tree,
+tree, tree, tree, bool);
 extern tree finish_label_stmt  (tree);
 extern void finish_label_decl  (tree);
 extern cp_expr finish_parenthesized_expr   (cp_expr);
diff --git gcc/cp/parser.c gcc/cp/parser.c
index 4d07a6a3011..ccf89f0856f 100644
--- gcc/cp/parser.c
+++ gcc/cp/parser.c
@@ -19817,16 +19817,18 @@ cp_parser_asm_definition (cp_parser* parser)
   bool invalid_inputs_p = false;
   bool invalid_outputs_p = false;
   required_token missing = RT_NONE;
+  location_t asm_loc = cp_lexer_peek_token (parser->lexer)->location;
 
   /* Look for the `asm' keyword.  */
   cp_parser_require_keyword (parser, RID_ASM, RT_ASM);
 
+  /* In C++2a, unevaluated inline assembly is permitted in constexpr
+ functions.  */
   if (parser->in_function_body
-  && DECL_DECLARED_CONSTEXPR_P (current_function_decl))
-{
-  error ("% in % function");
-  cp_function_chain->invalid_constexpr = true;
-}
+  && DECL_DECLARED_CONSTEXPR_P (current_function_decl)
+  && (cxx_dialect < cxx2a))
+pedwarn (asm_loc, 0, "% in % function only available "
+"with %<-std=c++2a%> or %<-std=gnu++2a%>");
 
   /* Handle the asm-qualifier-list.  */
   location_t volatile_loc = UNKNOWN_LOCATION;
@@ -20032,7 +20034,7 @@ cp_parser_asm_definition (cp_parser* parser)
   /* Create the ASM_EXPR.  */
   if (parser->in_function_body)
{
- asm_stmt = finish_asm_stmt (volatile_p, string, outputs,
+ asm_stmt = finish_asm_stmt (asm_loc, volatile_p, string, outputs,
  inputs, clobbers, labels, inline_p);
  /* If the extended syntax was not used, mark the ASM_EXPR.  */
  if (!extended_p)
diff --git gcc/cp/pt.c gcc/cp/pt.c
index b1ad99d1481..b03968febb4 100644
--- gcc/cp/pt.c
+++ gcc/cp/pt.c
@@ -17396,8 +17396,9 @@ tsubst_expr (tree t, tree args,

Re: [PATCH] [LRA] Fix wrong-code PR 91109

2019-08-07 Thread Vladimir Makarov

On 8/5/19 4:37 PM, Bernd Edlinger wrote:

Hi!


PR 91109 is a wrong-code bug, where LRA is using a scratch register
which is actually not available for use, and thus gets clobbered
when it should not.  That seems to be mostly because the live range
info of the cloned schatch register is not working the way how update_scrach_ops
sets up the new register.  Moreover for the new register there is
a much better alternative free register available, so that just not
trying the re-use the previous hard register assignment solves the problem.

For more background please see the bugzilla PR 91109.

Since I am able to reproduce this bug with latest gcc-9 branch, I want
to ask right away, if it is okay to back-port after a while.


Boot-strapped and reg-tested on x86_64-pc-linux-gnu and armv7-linux-gnueabihf.
Is it OK for trunk?


Thank you for working on the problem which is severe as the wrong code 
is generated.  The patch is ok as an intermediate solution. You can 
commit it to the trunk and gcc-9 branch.


Still I think more work on the PR is needed.  If subsequent LRA sub-pass 
spills some pseudo to assign a hard register to the scratch of the 
rematerialized insn as it was in the original insn, it might make this 
rematerialization unprofitable.  So I'll think how to avoid the 
unprofitable rematerialization in such cases and would like to work on 
this  PR more.


Please, do not close the PR after committing the patch.  I am going to 
work on it more when stage3 starts.





Re: [PATCH] Fix file descriptor existence of MinGW.

2019-08-07 Thread Pekka S

On 7.8.2019 15:09, Martin Liška wrote:

Hi.

There's one enhanced version where I added HAVE_FCNTL_H.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin


Hi,

tested this one against the latest trunk and it seems to be working -- 
as in it's possible to bootstrap a working (cross) compiler that is 
hosted on mingw-w64.


-- Pekka


Re: PR driver/91130 Use CL_DRIVER when handling of COLLECT_GCC_OPTIONS in lto-wrapper.c

2019-08-07 Thread Jakub Jelinek
On Wed, Aug 07, 2019 at 01:42:34PM +0100, Richard Earnshaw (lists) wrote:
> Some options are handled differently by the main driver (gcc, g++, etc) from
> the back-end compiler programs (cc1, cc1plus, etc) in that in the driver
> they do not take an additional argument, while in the compiler programs they
> do.  The processing option option CL_DRIVER controls this alternative
> interpretation of the options.
> 
> The environment variable COLLECT_GCC_OPTIONS is the list of options to add
> to a compile if the compiler re-invokes itself at some point.  As such, the
> options are driver options, so CL_DRIVER should be used when processing this
> list.  Currently lto-wrapper is doing this incorrectly.
> 
>   PR driver/91130
>   * lto-wrapper.c (find_and_merge_options): Use CL_DRIVER when
>   processing COLLECT_GCC_OPTIONS.
>   (run_gcc): Likewise.
> 
> Bootstrapped on aarch64-linux
> 
> OK?
> 
> NB, this is essentially the same as Richi's patch in the PR.  I'll let you
> decide which to take...

I think I'd prefer the patch Richi pasted with the

--- gcc/lto-wrapper.c   2019-08-07 14:36:30.781562354 +0200
+++ gcc/lto-wrapper.c   2019-08-07 15:48:55.140279384 +0200
@@ -128,7 +128,7 @@ maybe_unlink (const char *file)
 #define DUMPBASE_SUFFIX ".ltrans18446744073709551615"
 
 /* Create decoded options from the COLLECT_GCC and COLLECT_GCC_OPTIONS
-   environment according to LANG_MASK.  */
+   environment.  */
 
 static void
 get_options_from_collect_gcc_options (const char *collect_gcc,

incremental change, because it really doesn't make sense to pass to
get_options_from_collect_gcc_options the same constant value in both
invocations (well, it didn't make sense before either).

Though, I'm fine if you commit your patch now as a fix and Richi's patch
with the above incremental change is applied on top of it incrementally
as a cleanup.

Jakub


Re: [PATCH][RFC][x86] Fix PR91154, add SImode smax, allow SImode add in SSE regs

2019-08-07 Thread Richard Biener
On Wed, 7 Aug 2019, Richard Biener wrote:

> On Mon, 5 Aug 2019, Uros Bizjak wrote:
> 
> > On Mon, Aug 5, 2019 at 3:29 PM Richard Biener  wrote:
> > 
> > > > > > > > > (define_mode_iterator MAXMIN_IMODE [SI "TARGET_SSE4_1"] [DI 
> > > > > > > > > "TARGET_AVX512F"])
> > > > > > > > >
> > > > > > > > > and then we need to split DImode for 32bits, too.
> > > > > > > >
> > > > > > > > For now, please add "TARGET_64BIT && TARGET_AVX512F" for DImode
> > > > > > > > condition, I'll provide _doubleword splitter later.
> > > > > > >
> > > > > > > Shouldn't that be TARGET_AVX512VL instead?  Or does the insn use 
> > > > > > > %g0 etc.
> > > > > > > to force use of %zmmN?
> > > > > >
> > > > > > It generates V4SI mode, so - yes, AVX512VL.
> > > > >
> > > > > case SMAX:
> > > > > case SMIN:
> > > > > case UMAX:
> > > > > case UMIN:
> > > > >   if ((mode == DImode && (!TARGET_64BIT || !TARGET_AVX512VL))
> > > > >   || (mode == SImode && !TARGET_SSE4_1))
> > > > > return false;
> > > > >
> > > > > so there's no way to use AVX512VL for 32bit?
> > > >
> > > > There is a way, but on 32bit targets, we need to split DImode
> > > > operation to a sequence of SImode operations for unconverted pattern.
> > > > This is of course doable, but somehow more complex than simply
> > > > emitting a DImode compare + DImode cmove, which is what current
> > > > splitter does. So, a follow-up task.
> > >
> > > Ah, OK.  So for the above condition we can elide the !TARGET_64BIT
> > > check we just need to properly split if we enable the scalar minmax
> > > pattern for DImode on 32bits, the STV conversion would go fine.
> > 
> > Yes, that is correct.
> 
> So I tested the patch below (now with appropriate ChangeLog) on
> x86_64-unknown-linux-gnu.  I've thrown it at SPEC CPU 2006 with
> the obvious hmmer improvement, now checking for off-noise results
> with a 3-run on those that may have one (with more than +-1 second
> differences in the 1-run).

Update on this one.  On Haswell I see (besides hmmer and the
ones +-1 second in the 1-run); base is unpatched, peak is patched:

401.bzip29650382   25.3 S9650380   
25.4 S
401.bzip29650381   25.3 *9650377   
25.6 *
401.bzip29650381   25.3 S9650376   
25.7 S

458.sjeng   12100433   28.0 S   12100433   
28.0 S
458.sjeng   12100428   28.3 S   12100424   
28.5 *
458.sjeng   12100432   28.0 *   12100424   
28.6 S

464.h264ref 22130413   53.6 S   22130422   
52.5 S
464.h264ref 22130413   53.6 *   22130421   
52.5 S
464.h264ref 22130413   53.6 S   22130421   
52.5 *

473.astar7020328   21.4 S7020316   
22.2 S
473.astar7020322   21.8 S7020314   
22.4 *
473.astar7020322   21.8 *7020311   
22.6 S

416.gamess  19580593   33.0 S   19580601   
32.6 S
416.gamess  19580593   33.0 S   19580601   
32.6 *
416.gamess  19580593   33.0 *   19580601   
32.6 S

so it's a loss for 464.h264ref and 416.gamess from the above numbers
and a slight win for the others (and a big one for 456.hmmer).

I plan to have a look at the two as followup only, possibly adding
a debug couter to be able to bisect to a specific chain.

Richard.


Re: [PATCH] Handle new operators with no arguments in DCE.

2019-08-07 Thread Richard Biener
On Wed, Aug 7, 2019 at 2:04 PM Martin Liška  wrote:
>
> On 8/7/19 12:51 PM, Jakub Jelinek wrote:
> > On Wed, Aug 07, 2019 at 12:44:28PM +0200, Martin Liška wrote:
> >> On 8/7/19 11:51 AM, Richard Biener wrote:
> >>> I think the simplest way to achieve this is to not copy, aka clear,
> >>> DECL_IS_OPERATOR_* when cloning and removing arguments
> >>> (cloning for a constant align argument should be OK for example, as is
> >>> for a constant address).  Or simply always when cloning.
> >>
> >> Ok, then I'm suggesting following tested patch.
> >>
> >> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> >
> > What about LAMBDA_FUNCTION, doesn't cloning which changes arguments in any
> > way invalidate that too, i.e. shouldn't it be just
> >   FUNCTION_DECL_DECL_TYPE (new_node->decl) = NONE;
>
> Well, how are lambdas involved in the new/delete DCE here? Lambdas with 
> removed
> arguments should not interfere here.

But for coverage where we do

  gcov_write_unsigned (DECL_ARTIFICIAL (current_function_decl)
   && !DECL_FUNCTION_VERSIONED (current_function_decl)
   && !DECL_LAMBDA_FUNCTION_P (current_function_decl));

all clones should be considered artificial?

Anyway, your patch is OK, we can think about lambdas separately.  Can you
simplify the DCE code after the patch?

Thanks,
Richard.

> > instead?  On the other side, if the cloning doesn't change arguments in any
> > way, do we still want to clear those flags?
>
> Well, I would consider it safer to drop it always.
>
> Martin
>
> >
> >   Jakub
> >
>


Re: [Patch, ira] Invalid assert in reload1.c::finish_spills?

2019-08-07 Thread Vladimir Makarov

On 8/7/19 7:36 AM, senthilkumar.selva...@microchip.com wrote:

Hi,

   gcc/testsuite/c-c++-common/pr60101.c fails with an ICE for the
   avr target, because of a gcc_assert firing at reload1.c:4233

   The assert (in the patch below) looks bogus to me, as it's in
   the if block of

 if (! ira_conflicts_p || reg_renumber[i] >= 0)

   For this testcase and for the avr target, ira_conflicts_p is
   false because build_conflict_bit_table bailed out early
   (conflict table too big).
   If reg_renumber[i] is now negative, the assert fires and causes
   the ICE.

   Getting rid of the assert (patch below) makes the ICE go away,
   not sure if that's the right fix though.

   Comments?


Mike Matz is right.  Removing the assertion will make the bug even worse 
(changing memory beyond pseudo_previous_regs).


I did some investigation.  This bug occurred from a 10 years old patch 
avoiding building big conflict tables in IRA.  And the assert was in 
reload before IRA.


I think the solution should be

Index: reload1.c
===
--- reload1.c   (revision 273762)
+++ reload1.c   (working copy)
@@ -4225,13 +4225,8 @@ finish_spills (int global)
   spill_reg_order[i] = -1;

   EXECUTE_IF_SET_IN_REG_SET (&spilled_pseudos, FIRST_PSEUDO_REGISTER, 
i, rsi)

-    if (! ira_conflicts_p || reg_renumber[i] >= 0)
+    if (reg_renumber[i] >= 0)
   {
-   /* Record the current hard register the pseudo is allocated to
-  in pseudo_previous_regs so we avoid reallocating it to the
-  same hard reg in a later pass.  */
-   gcc_assert (reg_renumber[i] >= 0);
-
    SET_HARD_REG_BIT (pseudo_previous_regs[i], reg_renumber[i]);
    /* Mark it as no longer having a hard register home.  */
    reg_renumber[i] = -1;

So if the patch works, you can commit it to the trunk.


iff --git a/gcc/reload1.c b/gcc/reload1.c
index 38ee356a791..5acba706bee 100644
--- a/gcc/reload1.c
+++ b/gcc/reload1.c
@@ -4230,7 +4230,6 @@ finish_spills (int global)
/* Record the current hard register the pseudo is allocated to
   in pseudo_previous_regs so we avoid reallocating it to the
   same hard reg in a later pass.  */
-   gcc_assert (reg_renumber[i] >= 0);

SET_HARD_REG_BIT (pseudo_previous_regs[i], reg_renumber[i]);
/* Mark it as no longer having a hard register home.  */





Re: [PATCH] Fix file descriptor existence of MinGW.

2019-08-07 Thread Ian Lance Taylor
On Wed, Aug 7, 2019 at 5:09 AM Martin Liška  wrote:
>
> There's one enhanced version where I added HAVE_FCNTL_H.
>
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>
> Ready to be installed?

I think a better name than fd_exists would be is_valid_fd.  If that's
OK with you, the libiberty part of this patch seems fine with that
change.  Thanks.

(Although I'm not quite sure what problem this is trying to solve.
What goes wrong if the driver just doesn't bother to check whether the
file descriptors listed in --jobserver-auth are valid?  Why would that
ever happen, and why would it matter if it did?  The ChangeLog talks
about the linker using the same file descriptors, but in that case
won't this test see the descriptors as valid?)

Ian


Re: [PATCH 9/9] Remove alias set comparison.

2019-08-07 Thread Martin Sebor

On 6/11/19 1:36 AM, Martin Liska wrote:


gcc/ChangeLog:

2019-07-24  Martin Liska  

* ipa-icf-gimple.c (func_checker::compatible_types_p):
Do not compare alias sets.  It's handled by operand_equal_p.

gcc/testsuite/ChangeLog:

2019-07-24  Martin Liska  

* c-c++-common/Wstringop-truncation-4.c: Disable IPA ICF.


What fails without the change?

Thanks
Martin


* gcc.dg/tree-ssa/pr64910-2.c: Likewise.
* gcc.dg/tree-ssa/pr79352.c: Likewise.
* gcc.dg/ipa/ipa-icf-40.c: New test.
---
  gcc/ipa-icf-gimple.c  | 12 ---
  .../c-c++-common/Wstringop-truncation-4.c |  2 +-
  gcc/testsuite/gcc.dg/ipa/ipa-icf-40.c | 32 +++
  gcc/testsuite/gcc.dg/tree-ssa/pr64910-2.c |  2 +-
  gcc/testsuite/gcc.dg/tree-ssa/pr79352.c   |  2 +-
  5 files changed, 35 insertions(+), 15 deletions(-)
  create mode 100644 gcc/testsuite/gcc.dg/ipa/ipa-icf-40.c





[PATCH][GCC][AARCH64] Add effective-target check to b key execution tests

2019-08-07 Thread Sam Tebbs
Hi all,

This patch adds an effective target check to the pointer authentication B key 
execution tests. These were failing with an assembler error when run with a 
non-recent version of binutils, and this change will instead make them 
unsupported in such cases.

Tested with a recent version of binutils where it passes, and with a non-recent 
version where it is unsupported.

OK for trunk?

Sam

gcc/testsuite
2019-08-02  Sam Tebbs

* lib/target-supports.exp
(check_effective_target_arm_v8_4a_bkey_directive): New proc.
* g++.target/aarch64/return_address_sign_b_exception.C,
return_address_sign_ab_exception.C: Add dg-require-effective-target
checks.

diff --git 
a/gcc/testsuite/g++.target/aarch64/return_address_sign_ab_exception.C 
b/gcc/testsuite/g++.target/aarch64/return_address_sign_ab_exception.C
index 520cd18..ead11de 100644
--- a/gcc/testsuite/g++.target/aarch64/return_address_sign_ab_exception.C
+++ b/gcc/testsuite/g++.target/aarch64/return_address_sign_ab_exception.C
@@ -1,5 +1,6 @@
 /* { dg-do run } */
 /* { dg-options "--save-temps" } */
+/* { dg-require-effective-target arm_v8_3a_bkey_directive } */
 
 __attribute__((target("branch-protection=pac-ret+leaf")))
 int foo_a () {
diff --git a/gcc/testsuite/g++.target/aarch64/return_address_sign_b_exception.C 
b/gcc/testsuite/g++.target/aarch64/return_address_sign_b_exception.C
index eab2869..2f82731 100644
--- a/gcc/testsuite/g++.target/aarch64/return_address_sign_b_exception.C
+++ b/gcc/testsuite/g++.target/aarch64/return_address_sign_b_exception.C
@@ -1,5 +1,6 @@
 /* { dg-do run } */
 /* { dg-options "-mbranch-protection=pac-ret+leaf+b-key --save-temps" } */
+/* { dg-require-effective-target arm_v8_3a_bkey_directive } */
 
 int foo () {
   throw 22;
diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index 815e837..3c50b89 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -9463,6 +9463,16 @@ proc check_effective_target_arm_v8_3a_complex_neon_hw { 
} {
 } [add_options_for_arm_v8_3a_complex_neon ""]]
 }
 
+# Return 1 if the assembler supports assembling the Armv8.3 pointer 
authentication B key directive
+proc check_effective_target_arm_v8_3a_bkey_directive { } {
+   return [check_no_compiler_messages cet object {
+   int main(void) {
+   asm (".cfi_b_key_frame");
+   return 0;
+   }
+   }]
+}
+
 # Returns 1 if the target is using glibc, 0 otherwise.
 
 proc check_effective_target_glibc { } {


Re: [PATCH 1/3] C++20 constexpr lib part 1/3

2019-08-07 Thread Ed Smith-Rowland via gcc-patches

On 8/6/19 11:30 AM, Steve Ellcey wrote:

Ed,

I have run into an ICE that I tracked down to this patch:

commit 02fefffe6b78c4c39169206aa40fb53f367ecba8
Author: emsr 
Date:   Thu Aug 1 15:25:42 2019 +

 2019-08-01  Edward Smith-Rowland  <3dw...@verizon.net>

 Implement C++20 p0202 - Add Constexpr Modifiers to Functions
 in  and  Headers.
 Implement C++20 p1023 - constexpr comparison operators for 
std::array.


Before I try to create a smaller test example (the current failure happens
when I build https://github.com/llnl/RAJAPerf.git) I was wondering if you
have already seen this ICE:

/extra/sellcey/raja-build-error/RAJAPerf/src/apps/WIP-COUPLE.cpp: In member 
function 'virtual void rajaperf::apps::COUPLE::runKernel(rajaperf::VariantID)':
/extra/sellcey/raja-build-error/RAJAPerf/src/apps/WIP-COUPLE.cpp:217:1: 
internal compiler error: Segmentation fault
   217 | } // end namespace rajaperf
   | ^
0xe81ddf crash_signal
../../gcc/gcc/toplev.c:326
0x968d14 lookup_page_table_entry
../../gcc/gcc/ggc-page.c:632
0x968d14 ggc_set_mark(void const*)
../../gcc/gcc/ggc-page.c:1531
0xbfeadf gt_ggc_mx_symtab_node(void*)
/extra/sellcey/gcc-tot/obj-gcc/gcc/gtype-desc.c:1302
0xd9d2a7 gt_ggc_ma_order
./gt-passes.h:31
0xd9d2a7 gt_ggc_ma_order
./gt-passes.h:26
0xb6f49f ggc_mark_root_tab
../../gcc/gcc/ggc-common.c:77
0xb6f6c3 ggc_mark_roots()
../../gcc/gcc/ggc-common.c:94
0x9696af ggc_collect()
../../gcc/gcc/ggc-page.c:2201
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.


I've been building C++14 code (and C++17 and C++20 code) with this patch 
for half a year and no ICE.


I don't see how the patch could impact pre C++20 code.

Ed




Re: PR driver/91130 Use CL_DRIVER when handling of COLLECT_GCC_OPTIONS in lto-wrapper.c

2019-08-07 Thread Richard Earnshaw (lists)

On 07/08/2019 14:51, Jakub Jelinek wrote:

On Wed, Aug 07, 2019 at 01:42:34PM +0100, Richard Earnshaw (lists) wrote:

Some options are handled differently by the main driver (gcc, g++, etc) from
the back-end compiler programs (cc1, cc1plus, etc) in that in the driver
they do not take an additional argument, while in the compiler programs they
do.  The processing option option CL_DRIVER controls this alternative
interpretation of the options.

The environment variable COLLECT_GCC_OPTIONS is the list of options to add
to a compile if the compiler re-invokes itself at some point.  As such, the
options are driver options, so CL_DRIVER should be used when processing this
list.  Currently lto-wrapper is doing this incorrectly.

PR driver/91130
* lto-wrapper.c (find_and_merge_options): Use CL_DRIVER when
processing COLLECT_GCC_OPTIONS.
(run_gcc): Likewise.

Bootstrapped on aarch64-linux

OK?

NB, this is essentially the same as Richi's patch in the PR.  I'll let you
decide which to take...


I think I'd prefer the patch Richi pasted with the

--- gcc/lto-wrapper.c   2019-08-07 14:36:30.781562354 +0200
+++ gcc/lto-wrapper.c   2019-08-07 15:48:55.140279384 +0200
@@ -128,7 +128,7 @@ maybe_unlink (const char *file)
  #define DUMPBASE_SUFFIX ".ltrans18446744073709551615"
  
  /* Create decoded options from the COLLECT_GCC and COLLECT_GCC_OPTIONS

-   environment according to LANG_MASK.  */
+   environment.  */
  
  static void

  get_options_from_collect_gcc_options (const char *collect_gcc,

incremental change, because it really doesn't make sense to pass to
get_options_from_collect_gcc_options the same constant value in both
invocations (well, it didn't make sense before either).

Though, I'm fine if you commit your patch now as a fix and Richi's patch
with the above incremental change is applied on top of it incrementally
as a cleanup.

Jakub



Ok, I'll do that.  Do you want it on the gcc-9 branch as well?  I'm 
running a bootstrap of it right now.


R.


Re: [Committed] PR fortran/54072 -- More fun with BOZ

2019-08-07 Thread Steve Kargl
On Wed, Aug 07, 2019 at 01:58:17PM +0100, Mark Eggleston wrote:
> 
> BOZ problems in the following areas
> 
>   * use of logical and character variables with BOZ constants
>   * comparisons with BOZ constants
>   * DATA statements
> 
> Comparing 9.1 and trunk:

The comparison is somewhat irrelevant.  I removed a 
a number of undocumented extensions when I made the
handling of BOZ conform to the F2018 Fortran standard.

> Comparisons with BOZ constants was allowed using 9.1 with 
> -Wconversion-extra:
> 
>      5 | if (i4 .eq. z'1000') then
>    |    1
> Warning: Conversion from INTEGER(4) to INTEGER(16) at (1) 
> [-Wconversion-extra]

This is the old behavior were a BOZ upon parsing is 
immediately converted to an INTEGER with the widest decimal
range.  It is a holdover from when I made BOZ work in
accordance with the Fortran 95 standard, where a BOZ is
only allowed as a data-stmt-constant.  On your target, that
is INTEGER(16).  Because of that conversion, a BOZ could
be used anywhere an INTEGER can be used.

> Using trunk  with -fallow-invalid-boz comparison is not allowed:
> 
>      5 | if (i4 .eq. z'1000') then
>    |    1
> Error: Operands of comparison operator '.eq.' at (1) are INTEGER(4)/BOZ
> 
> I would have expected a suitable warning about using the BOZ in an 
> inappropriate place.

A BOZ cannot be an operand to a binary operator.

Consider

x = 1.0 + z'40490fdb'   ! Is this 4.14159 or 1.07853005E+09

y = z'40490fdb' + z'40490fbd' + 1. ! Is this 2*pi+1 or 2.15...E+09.

Note, gfortran does left-to-right evaluation, but Fortran standard
does not require this ordering.  For 'x' it is possible to convert
op2 to the type of op1, which would give 4.1415  That behavior
is different in comparison to the historical accident of 1.08E9.
For 'y', there is no valid conversion of op1 into op2.  In older
versions, the first addition is of 2 INTEGER(16).  The second
addition converts a INTEGER(16) to a REAL(4) and then adds.

> 
> DATA statements for logical and character variable compile but do not work:
> 
> program test
>    character(4) :: c
>    data c / z'41424344' /
>    write(*, *) "'" // c // "'", transfer(c, 0_4)
> end program test
> 
> Outputs:
> 
>   ''   0
> 
> program test
>    logical(4) b / z'0001' /
>    write(*, *) b
> end program test
> 
> Outputs:
> 
>   F

>From the current Fortran working documenti, page 111:

   If a data-stmt-constant is a boz-literal-constant, the corresponding
   variable shall be of type integer.  The boz-literal-constant is
   treated as if it were converted by the intrinsic function INT(16.9.100)
   to type integer with the kind type parameter of the variable.

For the second program, I get 

gfcx -o z a.f90 && ./z
a.f90:8:26:

8 | logical(4) b / z'0001' /
  |  1
Error: BOZ at (1) cannot appear in an old-style initialization

which to me is acceptable.

For the first testcase, that should be rejected.  I thought I had
that fixed in my tree.  It probably got lost in one of numerous
versions of the BOZ rewrite.

> Apologies if this should have been reported via bugzilla. If so let me 
> know and I'll submit it split into 2 or 3 bug reports.

Reporting it here is fine.  I'll look at rejecting the one code
that compiles (as it shouldn't).  And, I'll toy with adding BOZ
as an operand of binary operators (I actually had this working in 
an ancient patch) under -fallow-invalid-boz.



-- 
Steve


Re: PR driver/91130 Use CL_DRIVER when handling of COLLECT_GCC_OPTIONS in lto-wrapper.c

2019-08-07 Thread Jakub Jelinek
On Wed, Aug 07, 2019 at 05:08:28PM +0100, Richard Earnshaw (lists) wrote:
> > Though, I'm fine if you commit your patch now as a fix and Richi's patch
> > with the above incremental change is applied on top of it incrementally
> > as a cleanup.
> > 
> > Jakub
> > 
> 
> Ok, I'll do that.

Thanks.

> Do you want it on the gcc-9 branch as well?  I'm running
> a bootstrap of it right now.

Yes, but can you defer for 9.2.1, i.e. Tuesday+ next week?

Jakub


Re: PR driver/91130 Use CL_DRIVER when handling of COLLECT_GCC_OPTIONS in lto-wrapper.c

2019-08-07 Thread Richard Earnshaw (lists)

On 07/08/2019 17:15, Jakub Jelinek wrote:

On Wed, Aug 07, 2019 at 05:08:28PM +0100, Richard Earnshaw (lists) wrote:

Though, I'm fine if you commit your patch now as a fix and Richi's patch
with the above incremental change is applied on top of it incrementally
as a cleanup.

Jakub



Ok, I'll do that.


Thanks.


Do you want it on the gcc-9 branch as well?  I'm running
a bootstrap of it right now.


Yes, but can you defer for 9.2.1, i.e. Tuesday+ next week?

Jakub



I'm OoO next week, but can do it when I get back.

R.


Re: C++ PATCH for c++/91346 - Implement C++2a P1668R1, allow unevaluated asm in constexpr

2019-08-07 Thread Jason Merrill
On Wed, Aug 7, 2019, 9:11 AM Marek Polacek  wrote:

> On Tue, Aug 06, 2019 at 09:25:45PM -0400, Jason Merrill wrote:
> > Let's downgrade the errors in earlier standard modes to pedwarn. Ok with
> > that change.
>
> Works for me, here's what I'll apply once it passes testing.
>
> I removed the diagnostic in potential_constant_expression_1/ASM_EXPR so
> that
> we don't generate duplicate pedwarns for the same thing.  Hope that's OK.
>
> 2019-08-07  Marek Polacek  
>
> PR c++/91346 - Implement P1668R1, allow unevaluated asm in
> constexpr.
> * constexpr.c (cxx_eval_constant_expression): Handle ASM_EXPR.
> (potential_constant_expression_1) : Allow.
> * cp-tree.h (finish_asm_stmt): Adjust.
> * parser.c (cp_parser_asm_definition): Grab the locaion of "asm"
> and
> use it.  Change an error to a pedwarn.  Allow asm in C++2a, warn
> otherwise.
> * pt.c (tsubst_expr): Pass a location down to finish_asm_stmt.
> * semantics.c (finish_asm_stmt): New location_t parameter.  Use it.
>
> * g++.dg/cpp2a/inline-asm1.C: New test.
> * g++.dg/cpp2a/inline-asm2.C: New test.
> * g++.dg/cpp1y/constexpr-neg1.C: Adjust dg-error.
>
> diff --git gcc/cp/constexpr.c gcc/cp/constexpr.c
> index 36a66337433..e86b0789b84 100644
> --- gcc/cp/constexpr.c
> +++ gcc/cp/constexpr.c
> @@ -5289,6 +5289,18 @@ cxx_eval_constant_expression (const constexpr_ctx
> *ctx, tree t,
>r = void_node;
>break;
>
> +case ASM_EXPR:
> +  if (!ctx->quiet)
> +   {
> + error_at (cp_expr_loc_or_input_loc (t),
> +   "inline assembly is not a constant expression");
> + inform (cp_expr_loc_or_input_loc (t),
> + "only unevaluated inline assembly is allowed in a "
> + "% function in C++2a");
> +   }
> +  *non_constant_p = true;
> +  return t;
> +
>  default:
>if (STATEMENT_CODE_P (TREE_CODE (t)))
> {
> @@ -6469,13 +6481,18 @@ potential_constant_expression_1 (tree t, bool
> want_rval, bool strict, bool now,
>/* GCC internal stuff.  */
>  case VA_ARG_EXPR:
>  case TRANSACTION_EXPR:
> -case ASM_EXPR:
>  case AT_ENCODE_EXPR:
>  fail:
>if (flags & tf_error)
> error_at (loc, "expression %qE is not a constant expression", t);
>return false;
>
> +case ASM_EXPR:
> +  /* In C++2a, unevaluated inline assembly is permitted in constexpr
> +functions.  If it's used in earlier standard modes, we pedwarn in
> +cp_parser_asm_definition.  */
> +  return true;
>

Actually, do we need this change? If it's (possibly) unevaluated, we
shouldn't get here.

 case OBJ_TYPE_REF:
>if (cxx_dialect >= cxx2a)
> /* In C++2a virtual calls can be constexpr, don't give up yet.  */
> diff --git gcc/cp/cp-tree.h gcc/cp/cp-tree.h
> index d4e67cdfd96..72ee1d61e97 100644
> --- gcc/cp/cp-tree.h
> +++ gcc/cp/cp-tree.h
> @@ -7052,8 +7052,8 @@ enum {
>  extern tree begin_compound_stmt(unsigned int);
>
>  extern void finish_compound_stmt   (tree);
> -extern tree finish_asm_stmt(int, tree, tree, tree,
> tree,
> -tree, bool);
> +extern tree finish_asm_stmt(location_t, int, tree,
> tree,
> +tree, tree, tree, bool);
>  extern tree finish_label_stmt  (tree);
>  extern void finish_label_decl  (tree);
>  extern cp_expr finish_parenthesized_expr   (cp_expr);
> diff --git gcc/cp/parser.c gcc/cp/parser.c
> index 4d07a6a3011..ccf89f0856f 100644
> --- gcc/cp/parser.c
> +++ gcc/cp/parser.c
> @@ -19817,16 +19817,18 @@ cp_parser_asm_definition (cp_parser* parser)
>bool invalid_inputs_p = false;
>bool invalid_outputs_p = false;
>required_token missing = RT_NONE;
> +  location_t asm_loc = cp_lexer_peek_token (parser->lexer)->location;
>
>/* Look for the `asm' keyword.  */
>cp_parser_require_keyword (parser, RID_ASM, RT_ASM);
>
> +  /* In C++2a, unevaluated inline assembly is permitted in constexpr
> + functions.  */
>if (parser->in_function_body
> -  && DECL_DECLARED_CONSTEXPR_P (current_function_decl))
> -{
> -  error ("% in % function");
> -  cp_function_chain->invalid_constexpr = true;
> -}
> +  && DECL_DECLARED_CONSTEXPR_P (current_function_decl)
> +  && (cxx_dialect < cxx2a))
> +pedwarn (asm_loc, 0, "% in % function only
> available "
> +"with %<-std=c++2a%> or %<-std=gnu++2a%>");
>
>/* Handle the asm-qualifier-list.  */
>location_t volatile_loc = UNKNOWN_LOCATION;
> @@ -20032,7 +20034,7 @@ cp_parser_asm_definition (cp_parser* parser)
>/* Create the ASM_EXPR.  */
>if (parser->in_function_body)
> {
> - asm_stmt = finish_asm_stmt (volatile_p, string, outputs,
> + asm_stmt = finish_asm_stmt (asm_

Re: C++ PATCH for c++/81429 - wrong parsing of constructor with C++11 attribute

2019-08-07 Thread Jason Merrill
On Tue, Aug 6, 2019, 11:28 PM Marek Polacek  wrote:

> In this PR, we are wrongly parsing a constructor if its first parameter
> begins
> with a C++11 attribute, e.g.:
>
>   struct S {
> S([[maybe_unused]] int a) { }
>   };
>
> If the GNU attribute format is used instead, there's no problem.
> C++11-style
> attribute on a later parameter is fine also.
>
> The problem is in cp_parser_constructor_declarator_p, in particular the
> code
> that checks whether we're dealing with something like "S (f) (int);", which
> is not a constructor.  We're checking if what comes after the first '(' is
> either ')', "...", of a parameter decl.  A parameter decl can start with
> the
> "attribute" keyword, which cp_lexer_next_token_is_decl_specifier_keyword
> recognizes, but a parameter decl can also start with a C++11-style
> attribute,
> which we forgot to realize.
>
> The first test uses -Wunused-parameter in order to check that the attribute
> is in effect.
>
> And I also noticed that we don't issue a pedwarn about maybe_unused (C++17
> attribute) in C++14: PR 91382.
>

Since unsupported attributes are supposed to be ignored, I don't think we
need to complain.

The patch is ok.

Bootstrapped/regtested on x86_64-linux, ok for trunk?  I think this should
> also go into 9.3.
>
> 2019-08-06  Marek Polacek  
>
> PR c++/81429 - wrong parsing of constructor with C++11 attribute.
> * parser.c (cp_parser_constructor_declarator_p): Handle the
> scenario
> when a parameter declaration begins with [[attribute]].
>
> * g++.dg/cpp0x/gen-attrs-68.C: New test.
> * g++.dg/cpp0x/gen-attrs-69.C: New test.
>
> diff --git gcc/cp/parser.c gcc/cp/parser.c
> index 79da7b52eb9..b8996c707b6 100644
> --- gcc/cp/parser.c
> +++ gcc/cp/parser.c
> @@ -27855,7 +27855,9 @@ cp_parser_constructor_declarator_p (cp_parser
> *parser, cp_parser_flags flags,
>   /* A parameter declaration begins with a decl-specifier,
>  which is either the "attribute" keyword, a storage class
>  specifier, or (usually) a type-specifier.  */
> - && !cp_lexer_next_token_is_decl_specifier_keyword
> (parser->lexer))
> + && !cp_lexer_next_token_is_decl_specifier_keyword (parser->lexer)
> + /* A parameter declaration can also begin with [[attribute]].  */
> + && !cp_next_tokens_can_be_std_attribute_p (parser))
> {
>   tree type;
>   tree pushed_scope = NULL_TREE;
> diff --git gcc/testsuite/g++.dg/cpp0x/gen-attrs-68.C
> gcc/testsuite/g++.dg/cpp0x/gen-attrs-68.C
> new file mode 100644
> index 000..6bede0659db
> --- /dev/null
> +++ gcc/testsuite/g++.dg/cpp0x/gen-attrs-68.C
> @@ -0,0 +1,40 @@
> +// PR c++/81429 - wrong parsing of constructor with C++11 attribute.
> +// { dg-do compile { target c++11 } }
> +// { dg-additional-options "-Wunused-parameter -Wno-pedantic" }
> +
> +void fn1([[maybe_unused]] int a) { }
> +void fn2(int a [[maybe_unused]]) { }
> +void fn3(__attribute__((unused)) int a) { }
> +void fn4(int a __attribute__((unused))) { }
> +
> +struct S1 {
> +  S1([[maybe_unused]] int a) { }
> +};
> +
> +struct S2 {
> +  S2([[maybe_unused]] int f, [[maybe_unused]] int a) { }
> +};
> +
> +struct S3 {
> +  S3(int a [[maybe_unused]]) { }
> +};
> +
> +struct S4 {
> +  S4(int f [[maybe_unused]], int a [[maybe_unused]]) { }
> +};
> +
> +struct S5 {
> +  S5(__attribute__((unused)) int a) { }
> +};
> +
> +struct S6 {
> +  S6(__attribute__((unused)) int f, __attribute__((unused)) int a) { }
> +};
> +
> +struct S7 {
> +  S7(int a __attribute__((unused))) { }
> +};
> +
> +struct S8 {
> +  S8(int f __attribute__((unused)), int a __attribute__((unused))) { }
> +};
> diff --git gcc/testsuite/g++.dg/cpp0x/gen-attrs-69.C
> gcc/testsuite/g++.dg/cpp0x/gen-attrs-69.C
> new file mode 100644
> index 000..43173dbbdf4
> --- /dev/null
> +++ gcc/testsuite/g++.dg/cpp0x/gen-attrs-69.C
> @@ -0,0 +1,40 @@
> +// PR c++/81429 - wrong parsing of constructor with C++11 attribute.
> +// { dg-do compile { target c++11 } }
> +// { dg-additional-options "-Wno-pedantic" }
> +
> +void fn1([[maybe_unused]] int);
> +void fn2(int a [[maybe_unused]]);
> +void fn3(__attribute__((unused)) int);
> +void fn4(int __attribute__((unused)));
> +
> +struct S1 {
> +  S1([[maybe_unused]] int);
> +};
> +
> +struct S2 {
> +  S2([[maybe_unused]] int, [[maybe_unused]] int);
> +};
> +
> +struct S3 {
> +  S3(int a [[maybe_unused]]);
> +};
> +
> +struct S4 {
> +  S4(int a [[maybe_unused]], int b [[maybe_unused]]);
> +};
> +
> +struct S5 {
> +  S5(__attribute__((unused)) int);
> +};
> +
> +struct S6 {
> +  S6(__attribute__((unused)) int, __attribute__((unused)) int);
> +};
> +
> +struct S7 {
> +  S7(int __attribute__((unused)));
> +};
> +
> +struct S8 {
> +  S8(int __attribute__((unused)), int __attribute__((unused)));
> +};
>


RFC: Help with updating LTO documentation

2019-08-07 Thread Steve Ellcey
While trying to use the -flto and -fwhole-program flags I ran into problems
understanding what they do.  I would like to update the documentation but I
still don't understand these flags enough to be able to describe their
behaviour.  Here is the document section I would like to fix but don't
have enough information to do so.

From lto.texi:

| @subsection LTO modes of operation
| 
| One of the main goals of the GCC link-time infrastructure was to allow
| effective compilation of large programs.  For this reason GCC implements two
| link-time compilation modes.
| 
| @enumerate
| @item   @emph{LTO mode}, in which the whole program is read into the
| compiler at link-time and optimized in a similar way as if it
| were a single source-level compilation unit.
| 
| @item   @emph{WHOPR or partitioned mode}, designed to utilize multiple
| CPUs and/or a distributed compilation environment to quickly link
| large applications.  WHOPR stands for WHOle Program optimizeR (not to
| be confused with the semantics of @option{-fwhole-program}).  It
| partitions the aggregated callgraph from many different @code{.o}
| files and distributes the compilation of the sub-graphs to different
| CPUs.

What flag(s) do I use (or not use) to enable @emph{LTO mode}?
I am guessing that if I use -flto but not -flto-partition on a
link, this is what I get.  Is that correct?

What flag(s) do I use to enable @emph{WHOPR or partitioned mode}?
I am guessing that this is -flto-partition?  Do I also need -flto if I
am using -flto-partition?  I don't see any description in lto.texi or in
common.opt of exactly what the various values for -flto-partition
(none, one, balanced, 1to1, max) do.  Does such a description exist
anywhere?

Steve Ellcey
sell...@marvell.com


Re: C++ PATCH for c++/81429 - wrong parsing of constructor with C++11 attribute

2019-08-07 Thread Marek Polacek
On Wed, Aug 07, 2019 at 12:32:54PM -0400, Jason Merrill wrote:
> On Tue, Aug 6, 2019, 11:28 PM Marek Polacek  wrote:
> 
> > In this PR, we are wrongly parsing a constructor if its first parameter
> > begins
> > with a C++11 attribute, e.g.:
> >
> >   struct S {
> > S([[maybe_unused]] int a) { }
> >   };
> >
> > If the GNU attribute format is used instead, there's no problem.
> > C++11-style
> > attribute on a later parameter is fine also.
> >
> > The problem is in cp_parser_constructor_declarator_p, in particular the
> > code
> > that checks whether we're dealing with something like "S (f) (int);", which
> > is not a constructor.  We're checking if what comes after the first '(' is
> > either ')', "...", of a parameter decl.  A parameter decl can start with
> > the
> > "attribute" keyword, which cp_lexer_next_token_is_decl_specifier_keyword
> > recognizes, but a parameter decl can also start with a C++11-style
> > attribute,
> > which we forgot to realize.
> >
> > The first test uses -Wunused-parameter in order to check that the attribute
> > is in effect.
> >
> > And I also noticed that we don't issue a pedwarn about maybe_unused (C++17
> > attribute) in C++14: PR 91382.
> >
> 
> Since unsupported attributes are supposed to be ignored, I don't think we
> need to complain.

Yup, I already closed the bug.

> The patch is ok.

Thanks!

Marek


Re: [PATCH] Fix simd attribute handling on aarch64 (version 2)

2019-08-07 Thread Steve Ellcey
On Wed, 2019-08-07 at 10:40 +, Szabolcs Nagy wrote:
> ---
> ---
> On 31/07/2019 08:25, Richard Sandiford wrote:
> > Steve Ellcey  writes:
> > > 
> > > 2019-07-30  Steve Ellcey  
> > > 
> > >   * omp-simd-clone.c (simd_clone_adjust_return_type): Remove call
> > > to
> > >   build_distinct_type_copy.
> > >   (simd_clone_adjust_argument_types): Ditto.
> > >   (simd_clone_adjust): Call build_distinct_type_copy here.
> > >   (expand_simd_clones): Ditto.
> > > 
> > > 2019-07-30  Steve Ellcey  
> > > 
> > >   * gcc.target/aarch64/simd_pcs_attribute.c: New test.
> > >   * gcc.target/aarch64/simd_pcs_attribute-2.c: Ditto.
> > >   * gcc.target/aarch64/simd_pcs_attribute-3.c: Ditto.
> > 
> > OK if there are no objections in 48 hours.
> 
> i think this should be backported to gcc-9 too.

Yes, I agree.  The 9.X branch is frozen right now for the 9.2 release,
I will backport it to the branch after it reopens assuming there are no
objections.

Steve Ellcey
sell...@marvell.com


[AArch64] Tweak handling of fp moves via int registers

2019-08-07 Thread Richard Sandiford
The AArch64 port uses define_splits to prefer moving certain float
constants via integer registers over loading them from memory.  E.g.:

(set (reg:SF X) (const_double:SF C))

splits to:

(set (reg:SI tmp) (const_int C'))
(set (reg:SF X) (subreg:SF (reg:SI tmp) 0))

The problem with using splits for this -- especially when the split
instruction is a constant move -- is that the original form is still
valid and can be recreated by later pre-RA passes.  (And I think that's
a valid thing for them to do, since they're folding away what appears in
rtl terms to be a redundant instruction.)

One pass that can do this is ira's combine_and_move_insns, which among
other things looks for registers that are set once and used once.
If the register is set to a rematerialisable value, the code tries
to fold that value into the single use.

We don't normally see this effect at -O2 and above because
combine_and_move_insns isn't run when -fsched-pressure is enabled
(which it is by default on AArch64).  But arguably the combine part is
useful independently of -fsched-pressure, and only the move part is
suspect.  So I don't think we should rely on the combination not
happening here.

The new tests demonstrate the problem by running the original tests
at -O instead of -O2.

This patch does the optimisation by splitting the moves at generation
time and rejecting the combined form while the split is still possible.
REG_EQUAL notes on the second move still give the original floating-point
value for passes that need it.

Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf.
OK to install?

Richard


2019-08-07  Richard Sandiford  

gcc/
* config/aarch64/aarch64-protos.h (aarch64_move_float_via_int_p):
Declare.
* config/aarch64/aarch64.c (aarch64_move_float_via_int_p): New
function, extracted from the GPF_HF move splitter.
* config/aarch64/aarch64.md: Remove GPF_HF move splitter.
(mov): Move via an integer register if
aarch64_move_float_via_int_p.
(*movhf_aarch64, *movsf_aarch64, *movdf_aarch64): Check
aarch64_move_float_via_int_p.
* config/aarch64/iterators.md (fcvt_target): Handle TI and TF.
(FCVT_TARGET): Likewise.

gcc/testsuite/
* gcc.target/aarch64/dbl_mov_immediate_2.c: New test.
* gcc.target/aarch64/f16_mov_immediate_5.c: Likewise.
* gcc.target/aarch64/flt_mov_immediate_2.c: Likewise.

Index: gcc/config/aarch64/aarch64-protos.h
===
--- gcc/config/aarch64/aarch64-protos.h 2019-07-01 09:37:06.704528805 +0100
+++ gcc/config/aarch64/aarch64-protos.h 2019-08-07 19:07:38.199739765 +0100
@@ -519,6 +519,7 @@ const char * aarch64_output_probe_stack_
 const char * aarch64_output_probe_sve_stack_clash (rtx, rtx, rtx, rtx);
 void aarch64_err_no_fpadvsimd (machine_mode);
 void aarch64_expand_epilogue (bool);
+bool aarch64_move_float_via_int_p (rtx);
 void aarch64_expand_mov_immediate (rtx, rtx, rtx (*) (rtx, rtx) = 0);
 rtx aarch64_ptrue_reg (machine_mode);
 rtx aarch64_pfalse_reg (machine_mode);
Index: gcc/config/aarch64/aarch64.c
===
--- gcc/config/aarch64/aarch64.c2019-07-16 09:11:06.449416469 +0100
+++ gcc/config/aarch64/aarch64.c2019-08-07 19:07:38.203739735 +0100
@@ -3278,6 +3278,22 @@ aarch64_expand_sve_const_vector (rtx des
   gcc_assert (vectors[0] == dest);
 }
 
+/* Return true if floating-point value SRC should be moved into an
+   integer register first and then moved into a floating-point register.
+   This means that SRC is a constant that cannot be moved directly into
+   floating-point registers but assembling it in integer registers is
+   better than forcing it to memory.  */
+bool
+aarch64_move_float_via_int_p (rtx src)
+{
+  return (GET_MODE (src) != TFmode
+ && GET_CODE (src) == CONST_DOUBLE
+ && can_create_pseudo_p ()
+ && !aarch64_can_const_movi_rtx_p (src, GET_MODE (src))
+ && !aarch64_float_const_representable_p (src)
+ && aarch64_float_const_rtx_p (src));
+}
+
 /* Set DEST to immediate IMM.  For SVE vector modes, GEN_VEC_DUPLICATE
is a pattern that can be used to set DEST to a replicated scalar
element.  */
Index: gcc/config/aarch64/aarch64.md
===
--- gcc/config/aarch64/aarch64.md   2019-08-05 17:46:20.713723611 +0100
+++ gcc/config/aarch64/aarch64.md   2019-08-07 19:07:38.203739735 +0100
@@ -1249,14 +1249,24 @@ (define_expand "mov"
 && ! (GET_CODE (operands[1]) == CONST_DOUBLE
  && aarch64_float_const_zero_rtx_p (operands[1])))
   operands[1] = force_reg (mode, operands[1]);
+
+if (aarch64_move_float_via_int_p (operands[1]))
+  {
+   rtx imm = simplify_gen_subreg (mode, operands[1],
+  mode, 0);
+   rtx tmp = force_reg (mode, imm);
+ 

Run the combine part of combine_and_move_insns even if -fsched-pressure

2019-08-07 Thread Richard Sandiford
The main IRA routine includes the code:

  /* Don't move insns if live range shrinkage or register
 pressure-sensitive scheduling were done because it will not
 improve allocation but likely worsen insn scheduling.  */
  if (optimize
  && !flag_live_range_shrinkage
  && !(flag_sched_pressure && flag_schedule_insns))
combine_and_move_insns ();

The comment about not moving insns for pressure-sensitive scheduling
makes sense, but I think the combine part of combine_and_move_insns is
still useful, since it's folding a set of an equivalent value into its
single user and so eliminates the need for one register altogether.

(That also means that it's likely to undo live range shrinkage in
some cases, so I think the blanket skip still makes sense there.)

Tested on aarch64-linux-gnu, aarch64_be-elf and x86_64-linux-gnu.
OK to install?

Richard


2019-08-07  Richard Sandiford  

gcc/
* ira.c (combine_and_move_insns): Don't move insns if
pressure-sensitive scheduling is enabled.
(ira): Remove check for pressure-sensitive scheduling here.

gcc/testsuite/
* gcc.target/aarch64/csinc-3.c: New test.
* gcc.target/aarch64/csinv-2.c: Likewise.

Index: gcc/ira.c
===
--- gcc/ira.c   2019-07-10 19:41:27.159891908 +0100
+++ gcc/ira.c   2019-08-07 19:12:57.945375459 +0100
@@ -3748,6 +3748,11 @@ combine_and_move_insns (void)
   auto_bitmap cleared_regs;
   int max = max_reg_num ();
 
+  /* Don't move insns if register pressure-sensitive scheduling was
+ done because it will not improve allocation but likely worsen insn
+ scheduling.  */
+  bool allow_move_p = !(flag_sched_pressure && flag_schedule_insns);
+
   for (int regno = FIRST_PSEUDO_REGISTER; regno < max; regno++)
 {
   if (!reg_equiv[regno].replace)
@@ -3829,7 +3834,7 @@ combine_and_move_insns (void)
 
   /* Move the initialization of the register to just before
 USE_INSN.  Update the flow information.  */
-  else if (prev_nondebug_insn (use_insn) != def_insn)
+  else if (allow_move_p && prev_nondebug_insn (use_insn) != def_insn)
{
  rtx_insn *new_insn;
 
@@ -5307,12 +5312,8 @@ ira (FILE *f)
   reg_equiv = XCNEWVEC (struct equivalence, max_reg_num ());
   update_equiv_regs ();
 
-  /* Don't move insns if live range shrinkage or register
- pressure-sensitive scheduling were done because it will not
- improve allocation but likely worsen insn scheduling.  */
-  if (optimize
-  && !flag_live_range_shrinkage
-  && !(flag_sched_pressure && flag_schedule_insns))
+  /* This subpass could undo the effects of live range shrinkage.  */
+  if (optimize && !flag_live_range_shrinkage)
 combine_and_move_insns ();
 
   /* Gather additional equivalences with memory.  */
Index: gcc/testsuite/gcc.target/aarch64/csinc-3.c
===
--- /dev/null   2019-07-30 08:53:31.317691683 +0100
+++ gcc/testsuite/gcc.target/aarch64/csinc-3.c  2019-08-07 19:12:57.945375459 
+0100
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+int
+foo (int a, int b)
+{
+  if (a < 0)
+return 1;
+  if (a == 0)
+return b;
+  return b + 1;
+}
+
+/* { dg-final { scan-assembler-not {\tmov\tw[0-9]+, 1\n} } } */
+/* { dg-final { scan-assembler {\tcsinc\tw[0-9]+, w[0-9]+, wzr, ge\n} } } */
Index: gcc/testsuite/gcc.target/aarch64/csinv-2.c
===
--- /dev/null   2019-07-30 08:53:31.317691683 +0100
+++ gcc/testsuite/gcc.target/aarch64/csinv-2.c  2019-08-07 19:12:57.945375459 
+0100
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+int
+foo (int a, int b)
+{
+  if (a < 0)
+return -1;
+  if (a == 0)
+return 0;
+  return 1;
+}
+
+/* { dg-final { scan-assembler-not {\tmov\tw[0-9]+, -1\n} } } */
+/* { dg-final { scan-assembler {\tcsinv\tw[0-9]+, w[0-9]+, wzr, ge\n} } } */


[AArch64] Add a "y" constraint for V0-V7

2019-08-07 Thread Richard Sandiford
Some indexed SVE FCMLA operations have a 3-bit register field that
requires one of Z0-Z7.  This patch adds a public "y" constraint for that.

The patch also documents "x", which is again intended to be a public
constraint.

Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf.
OK to install?

Richard


2019-08-07  Richard Sandiford  

gcc/
* doc/md.texi: Document the x and y constraints for AArch64.
* config/aarch64/aarch64.h (FP_LO8_REGNUM_P): New macro.
(FP_LO8_REGS): New reg_class.
(REG_CLASS_NAMES, REG_CLASS_CONTENTS): Add an entry for FP_LO8_REGS.
* config/aarch64/aarch64.c (aarch64_hard_regno_nregs)
(aarch64_regno_regclass, aarch64_class_max_nregs): Handle FP_LO8_REGS.
* config/aarch64/predicates.md (aarch64_simd_register): Use
FP_REGNUM_P instead of checking the classes manually.
* config/aarch64/constraints.md (y): New constraint.

gcc/testsuite/
* gcc.target/aarch64/asm-x-constraint-1.c: New test.
* gcc.target/aarch64/asm-y-constraint-1.c: Likewise.

Index: gcc/doc/md.texi
===
--- gcc/doc/md.texi 2019-07-12 08:54:11.881445730 +0100
+++ gcc/doc/md.texi 2019-08-07 19:17:05.935537080 +0100
@@ -1748,6 +1748,12 @@ The stack pointer register (@code{SP})
 @item w
 Floating point register, Advanced SIMD vector register or SVE vector register
 
+@item x
+Like @code{w}, but restricted to registers 0 to 15 inclusive.
+
+@item y
+Like @code{w}, but restricted to registers 0 to 7 inclusive.
+
 @item Upl
 One of the low eight SVE predicate registers (@code{P0} to @code{P7})
 
Index: gcc/config/aarch64/aarch64.h
===
--- gcc/config/aarch64/aarch64.h2019-08-05 17:46:20.717723584 +0100
+++ gcc/config/aarch64/aarch64.h2019-08-07 19:17:05.931537111 +0100
@@ -563,6 +563,9 @@ #define FP_REGNUM_P(REGNO)  \
 #define FP_LO_REGNUM_P(REGNO)\
   (((unsigned) (REGNO - V0_REGNUM)) <= (V15_REGNUM - V0_REGNUM))
 
+#define FP_LO8_REGNUM_P(REGNO)\
+  (((unsigned) (REGNO - V0_REGNUM)) <= (V7_REGNUM - V0_REGNUM))
+
 #define PR_REGNUM_P(REGNO)\
   (((unsigned) (REGNO - P0_REGNUM)) <= (P15_REGNUM - P0_REGNUM))
 
@@ -581,6 +584,7 @@ enum reg_class
   GENERAL_REGS,
   STACK_REG,
   POINTER_REGS,
+  FP_LO8_REGS,
   FP_LO_REGS,
   FP_REGS,
   POINTER_AND_FP_REGS,
@@ -600,6 +604,7 @@ #define REG_CLASS_NAMES \
   "GENERAL_REGS",  \
   "STACK_REG", \
   "POINTER_REGS",  \
+  "FP_LO8_REGS",   \
   "FP_LO_REGS",\
   "FP_REGS",   \
   "POINTER_AND_FP_REGS",   \
@@ -616,6 +621,7 @@ #define REG_CLASS_CONTENTS  
\
   { 0x7fff, 0x, 0x0003 },  /* GENERAL_REGS */  \
   { 0x8000, 0x, 0x },  /* STACK_REG */ \
   { 0x, 0x, 0x0003 },  /* POINTER_REGS */  \
+  { 0x, 0x00ff, 0x },   /* FP_LO8_REGS  */ \
   { 0x, 0x, 0x },   /* FP_LO_REGS  */  \
   { 0x, 0x, 0x },   /* FP_REGS  */ \
   { 0x, 0x, 0x0003 },  /* POINTER_AND_FP_REGS */\
Index: gcc/config/aarch64/aarch64.c
===
--- gcc/config/aarch64/aarch64.c2019-08-07 19:07:38.203739735 +0100
+++ gcc/config/aarch64/aarch64.c2019-08-07 19:17:05.931537111 +0100
@@ -1610,6 +1610,7 @@ aarch64_hard_regno_nregs (unsigned regno
 {
 case FP_REGS:
 case FP_LO_REGS:
+case FP_LO8_REGS:
   if (aarch64_sve_data_mode_p (mode))
return exact_div (GET_MODE_SIZE (mode),
  BYTES_PER_SVE_VECTOR).to_constant ();
@@ -8295,7 +8296,8 @@ aarch64_regno_regclass (unsigned regno)
 return POINTER_REGS;
 
   if (FP_REGNUM_P (regno))
-return FP_LO_REGNUM_P (regno) ?  FP_LO_REGS : FP_REGS;
+return (FP_LO8_REGNUM_P (regno) ? FP_LO8_REGS
+   : FP_LO_REGNUM_P (regno) ? FP_LO_REGS : FP_REGS);
 
   if (PR_REGNUM_P (regno))
 return PR_LO_REGNUM_P (regno) ? PR_LO_REGS : PR_HI_REGS;
@@ -8585,6 +8587,7 @@ aarch64_class_max_nregs (reg_class_t reg
 case POINTER_AND_FP_REGS:
 case FP_REGS:
 case FP_LO_REGS:
+case FP_LO8_REGS:
   if (aarch64_sve_data_mode_p (mode)
  && constant_multiple_p (GET_MODE_SIZE (mode),
  BYTES_PER_SVE_VECTOR, &nregs))
Index: gcc/config/aarch64/predicates.md
===
--- gcc/config/aarch64/predicates.md2019-05-12 12:27:15.753897237 +0100
+++ gcc/config/aarch64/predicates.md2019-08-07 19:17:0

[AArch64] Make aarch64_classify_vector_mode use a switch statement

2019-08-07 Thread Richard Sandiford
aarch64_classify_vector_mode used properties of a mode to test whether
the mode was a single Advanced SIMD vector, a single SVE vector, or a
tuple of SVE vectors.  That works well for current trunk and is simpler
than checking for modes by name.

However, for the ACLE and for planned autovec improvements, we also
need partial SVE vector modes that hold:

- half of the available 32-bit elements
- a half or quarter of the available 16-bit elements
- a half, quarter, or eighth of the available 8-bit elements

These should be packed in memory and unpacked in registers.  E.g.
VNx2SI has half the number of elements of VNx4SI, and so is half the
size in memory.  When stored in registers, each VNx2SI element occupies
the low 32 bits of a VNx2DI element, with the upper bits being undefined.

The upshot is that:

  GET_MODE_SIZE (VNx4SImode) == 2 * GET_MODE_SIZE (VNx2SImode)

since GET_MODE_SIZE must always be the memory size.  This in turn means
that for fixed-length SVE, some partial modes can have the same size as
Advanced SIMD modes.  We then need to be specific about which mode we're
dealing with.

This patch prepares for that by switching based on the mode instead
of querying properties.

A later patch makes sure that Advanced SIMD modes always win over
partial SVE vector modes in normal queries.

Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf.
OK to install?

Richard


2019-08-07  Richard Sandiford  

gcc/
* config/aarch64/aarch64.c (aarch64_classify_vector_mode): Switch
based on the mode instead of testing properties of it.

Index: gcc/config/aarch64/aarch64.c
===
--- gcc/config/aarch64/aarch64.c2019-08-07 19:17:05.931537111 +0100
+++ gcc/config/aarch64/aarch64.c2019-08-07 19:19:52.018305243 +0100
@@ -1474,34 +1474,68 @@ aarch64_classify_vector_mode (machine_mo
   if (aarch64_sve_pred_mode_p (mode))
 return VEC_SVE_PRED;
 
-  scalar_mode inner = GET_MODE_INNER (mode);
-  if (VECTOR_MODE_P (mode)
-  && (inner == QImode
- || inner == HImode
- || inner == HFmode
- || inner == SImode
- || inner == SFmode
- || inner == DImode
- || inner == DFmode))
+  /* Make the decision based on the mode's enum value rather than its
+ properties, so that we keep the correct classification regardless
+ of -msve-vector-bits.  */
+  switch (mode)
 {
-  if (TARGET_SVE)
-   {
- if (known_eq (GET_MODE_BITSIZE (mode), BITS_PER_SVE_VECTOR))
-   return VEC_SVE_DATA;
- if (known_eq (GET_MODE_BITSIZE (mode), BITS_PER_SVE_VECTOR * 2)
- || known_eq (GET_MODE_BITSIZE (mode), BITS_PER_SVE_VECTOR * 3)
- || known_eq (GET_MODE_BITSIZE (mode), BITS_PER_SVE_VECTOR * 4))
-   return VEC_SVE_DATA | VEC_STRUCT;
-   }
+/* Single SVE vectors.  */
+case E_VNx16QImode:
+case E_VNx8HImode:
+case E_VNx4SImode:
+case E_VNx2DImode:
+case E_VNx8HFmode:
+case E_VNx4SFmode:
+case E_VNx2DFmode:
+  return TARGET_SVE ? VEC_SVE_DATA : 0;
 
-  /* This includes V1DF but not V1DI (which doesn't exist).  */
-  if (TARGET_SIMD
- && (known_eq (GET_MODE_BITSIZE (mode), 64)
- || known_eq (GET_MODE_BITSIZE (mode), 128)))
-   return VEC_ADVSIMD;
-}
+/* x2 SVE vectors.  */
+case E_VNx32QImode:
+case E_VNx16HImode:
+case E_VNx8SImode:
+case E_VNx4DImode:
+case E_VNx16HFmode:
+case E_VNx8SFmode:
+case E_VNx4DFmode:
+/* x3 SVE vectors.  */
+case E_VNx48QImode:
+case E_VNx24HImode:
+case E_VNx12SImode:
+case E_VNx6DImode:
+case E_VNx24HFmode:
+case E_VNx12SFmode:
+case E_VNx6DFmode:
+/* x4 SVE vectors.  */
+case E_VNx64QImode:
+case E_VNx32HImode:
+case E_VNx16SImode:
+case E_VNx8DImode:
+case E_VNx32HFmode:
+case E_VNx16SFmode:
+case E_VNx8DFmode:
+  return TARGET_SVE ? VEC_SVE_DATA | VEC_STRUCT : 0;
+
+/* 64-bit Advanced SIMD vectors.  */
+case E_V8QImode:
+case E_V4HImode:
+case E_V2SImode:
+/* ...E_V1DImode doesn't exist.  */
+case E_V4HFmode:
+case E_V2SFmode:
+case E_V1DFmode:
+/* 128-bit Advanced SIMD vectors.  */
+case E_V16QImode:
+case E_V8HImode:
+case E_V4SImode:
+case E_V2DImode:
+case E_V8HFmode:
+case E_V4SFmode:
+case E_V2DFmode:
+  return TARGET_SIMD ? VEC_ADVSIMD : 0;
 
-  return 0;
+default:
+  return 0;
+}
 }
 
 /* Return true if MODE is any of the data vector modes, including


[committed][AArch64] Remove unused commutative attribute

2019-08-07 Thread Richard Sandiford
The commutative attribute was once used by the SVE conditional binary
expanders, but it's now dead code.

Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf.
Applied as r274182.

Richard


2019-08-07  Richard Sandiford  

gcc/
* config/aarch64/iterators.md (commutative): Remove.

Index: gcc/config/aarch64/iterators.md
===
--- gcc/config/aarch64/iterators.md 2019-08-07 19:27:50.0 +0100
+++ gcc/config/aarch64/iterators.md 2019-08-07 19:27:55.414720593 +0100
@@ -1925,10 +1925,3 @@ (define_int_attr sve_fmad_op [(UNSPEC_CO
  (UNSPEC_COND_FMLS "fmsb")
  (UNSPEC_COND_FNMLA "fnmad")
  (UNSPEC_COND_FNMLS "fnmsb")])
-
-(define_int_attr commutative [(UNSPEC_COND_ADD "true")
- (UNSPEC_COND_SUB "false")
- (UNSPEC_COND_MUL "true")
- (UNSPEC_COND_DIV "false")
- (UNSPEC_COND_MIN "true")
- (UNSPEC_COND_MAX "true")])


Re: C++ PATCH for c++/91346 - Implement C++2a P1668R1, allow unevaluated asm in constexpr

2019-08-07 Thread Marek Polacek
On Wed, Aug 07, 2019 at 12:23:25PM -0400, Jason Merrill wrote:
> On Wed, Aug 7, 2019, 9:11 AM Marek Polacek  wrote:
> 
> > On Tue, Aug 06, 2019 at 09:25:45PM -0400, Jason Merrill wrote:
> > > Let's downgrade the errors in earlier standard modes to pedwarn. Ok with
> > > that change.
> >
> > Works for me, here's what I'll apply once it passes testing.
> >
> > I removed the diagnostic in potential_constant_expression_1/ASM_EXPR so
> > that
> > we don't generate duplicate pedwarns for the same thing.  Hope that's OK.
> >
> > 2019-08-07  Marek Polacek  
> >
> > PR c++/91346 - Implement P1668R1, allow unevaluated asm in
> > constexpr.
> > * constexpr.c (cxx_eval_constant_expression): Handle ASM_EXPR.
> > (potential_constant_expression_1) : Allow.
> > * cp-tree.h (finish_asm_stmt): Adjust.
> > * parser.c (cp_parser_asm_definition): Grab the locaion of "asm"
> > and
> > use it.  Change an error to a pedwarn.  Allow asm in C++2a, warn
> > otherwise.
> > * pt.c (tsubst_expr): Pass a location down to finish_asm_stmt.
> > * semantics.c (finish_asm_stmt): New location_t parameter.  Use it.
> >
> > * g++.dg/cpp2a/inline-asm1.C: New test.
> > * g++.dg/cpp2a/inline-asm2.C: New test.
> > * g++.dg/cpp1y/constexpr-neg1.C: Adjust dg-error.
> >
> > diff --git gcc/cp/constexpr.c gcc/cp/constexpr.c
> > index 36a66337433..e86b0789b84 100644
> > --- gcc/cp/constexpr.c
> > +++ gcc/cp/constexpr.c
> > @@ -5289,6 +5289,18 @@ cxx_eval_constant_expression (const constexpr_ctx
> > *ctx, tree t,
> >r = void_node;
> >break;
> >
> > +case ASM_EXPR:
> > +  if (!ctx->quiet)
> > +   {
> > + error_at (cp_expr_loc_or_input_loc (t),
> > +   "inline assembly is not a constant expression");
> > + inform (cp_expr_loc_or_input_loc (t),
> > + "only unevaluated inline assembly is allowed in a "
> > + "% function in C++2a");
> > +   }
> > +  *non_constant_p = true;
> > +  return t;
> > +
> >  default:
> >if (STATEMENT_CODE_P (TREE_CODE (t)))
> > {
> > @@ -6469,13 +6481,18 @@ potential_constant_expression_1 (tree t, bool
> > want_rval, bool strict, bool now,
> >/* GCC internal stuff.  */
> >  case VA_ARG_EXPR:
> >  case TRANSACTION_EXPR:
> > -case ASM_EXPR:
> >  case AT_ENCODE_EXPR:
> >  fail:
> >if (flags & tf_error)
> > error_at (loc, "expression %qE is not a constant expression", t);
> >return false;
> >
> > +case ASM_EXPR:
> > +  /* In C++2a, unevaluated inline assembly is permitted in constexpr
> > +functions.  If it's used in earlier standard modes, we pedwarn in
> > +cp_parser_asm_definition.  */
> > +  return true;
> >
> 
> Actually, do we need this change? If it's (possibly) unevaluated, we
> shouldn't get here.

We can get here when using asm() in ({ }) like this (ugh):

constexpr int
foo (bool b)
{
  if (b)
   {
 constexpr int i = ({ asm(""); 42; });
 return i;
}
  else
return 42;
}
static_assert(foo(false) == 42, "");

With the current state of potential_constant_expression_1, we generate

inline-asm3.C: In function ‘constexpr int foo(bool)’:
inline-asm3.C:10:27: error: inline assembly is not a constant expression
   10 |  constexpr int i = ({ asm(""); 42; });
  |   ^~~
inline-asm3.C:10:27: note: only unevaluated inline assembly is allowed in a 
‘constexpr’ function in C++2a

which I thought was better than what we emit with the hunk revered:

inline-asm3.C: In function ‘constexpr int foo(bool)’:
inline-asm3.C:10:27: error: expression ‘’ is not a constant 
expression
   10 |  constexpr int i = ({ asm(""); 42; });
  |   ^~~

But I'm happy to revert that hunk if you want.

Marek


[PATCH, i386]: Fix PR 91385, Zero-extended negation is not generated

2019-08-07 Thread Uros Bizjak
It looks that combine lost some of its unwanted creativity. Added
testcase will keep it that way.

2019-08-07  Uroš Bizjak  

PR target/91385
* config/i386/sse.md (*negsi2_1_zext): Simplify insn pattern.
(*negsi2_cmpz_zext): Ditto.

testsuite/ChangeLog:

2019-08-07  Uroš Bizjak  

PR target/91385
* gcc.target/i386/pr91385.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN.

Uros.
Index: config/i386/i386.md
===
--- config/i386/i386.md (revision 274169)
+++ config/i386/i386.md (working copy)
@@ -9337,13 +9337,10 @@
   [(set_attr "type" "negnot")
(set_attr "mode" "")])
 
-;; Combine is quite creative about this pattern.
 (define_insn "*negsi2_1_zext"
   [(set (match_operand:DI 0 "register_operand" "=r")
-   (lshiftrt:DI
- (neg:DI (ashift:DI (match_operand:DI 1 "register_operand" "0")
-(const_int 32)))
-   (const_int 32)))
+   (zero_extend:DI
+ (neg:SI (match_operand:SI 1 "register_operand" "0"
(clobber (reg:CC FLAGS_REG))]
   "TARGET_64BIT && ix86_unary_operator_ok (NEG, SImode, operands)"
   "neg{l}\t%k0"
@@ -9369,16 +9366,11 @@
 (define_insn "*negsi2_cmpz_zext"
   [(set (reg:CCZ FLAGS_REG)
(compare:CCZ
- (lshiftrt:DI
-   (neg:DI (ashift:DI
- (match_operand:DI 1 "register_operand" "0")
- (const_int 32)))
-   (const_int 32))
+ (neg:SI (match_operand:SI 1 "register_operand" "0"))
  (const_int 0)))
(set (match_operand:DI 0 "register_operand" "=r")
-   (lshiftrt:DI (neg:DI (ashift:DI (match_dup 1)
-   (const_int 32)))
-(const_int 32)))]
+   (zero_extend:DI
+ (neg:SI (match_dup 1]
   "TARGET_64BIT && ix86_unary_operator_ok (NEG, SImode, operands)"
   "neg{l}\t%k0"
   [(set_attr "type" "negnot")
@@ -9698,7 +9690,6 @@
   [(set_attr "type" "negnot")
(set_attr "mode" "")])
 
-;; ??? Currently never generated - xor is used instead.
 (define_insn "*one_cmplsi2_1_zext"
   [(set (match_operand:DI 0 "register_operand" "=r")
(zero_extend:DI
@@ -9749,7 +9740,6 @@
  (set (match_dup 1)
   (xor:SWI (match_dup 3) (const_int -1)))])])
 
-;; ??? Currently never generated - xor is used instead.
 (define_insn "*one_cmplsi2_2_zext"
   [(set (reg FLAGS_REG)
(compare (not:SI (match_operand:SI 1 "register_operand" "0"))
Index: testsuite/gcc.target/i386/pr91385.c
===
--- testsuite/gcc.target/i386/pr91385.c (nonexistent)
+++ testsuite/gcc.target/i386/pr91385.c (working copy)
@@ -0,0 +1,9 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -dp" } */
+/* { dg-final { scan-assembler-not "zero_extendsidi" } } */
+
+unsigned long long
+foo (unsigned int a)
+{
+  return -a;
+}


[committed][AArch64] Make SVE UNSPEC_COND_*s match the insn mnemonic

2019-08-07 Thread Richard Sandiford
This patch makes the UNSPEC_COND* names match the instruction mnemonics,
rather than having the previous mixture in which some used instructions
while others used operator names.

Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf.
Applied as r274185.

Richard


2019-08-07  Richard Sandiford  

gcc/
* config/aarch64/iterators.md (UNSPEC_COND_ADD): Rename to...
(UNSPEC_COND_FADD): ...this.
(UNSPEC_COND_SUB): Rename to...
(UNSPEC_COND_FSUB): ...this.
(UNSPEC_COND_MUL): Rename to...
(UNSPEC_COND_FMUL): ...this.
(UNSPEC_COND_DIV): Rename to...
(UNSPEC_COND_FDIV): ...this.
(UNSPEC_COND_MAX): Rename to...
(UNSPEC_COND_FMAXNM): ...this.
(UNSPEC_COND_MIN): Rename to...
(UNSPEC_COND_FMINNM): ...this.
(UNSPEC_COND_LT): Rename to...
(UNSPEC_COND_FCMLT): ...this.
(UNSPEC_COND_LE): Rename to...
(UNSPEC_COND_FCMLE): ...this.
(UNSPEC_COND_EQ): Rename to...
(UNSPEC_COND_FCMEQ): ...this.
(UNSPEC_COND_NE): Rename to...
(UNSPEC_COND_FCMNE): ...this.
(UNSPEC_COND_GE): Rename to...
(UNSPEC_COND_FCMGE): ...this.
(UNSPEC_COND_GT): Rename to...
(UNSPEC_COND_FCMGT): ...this.
(SVE_COND_FP_BINARY, SVE_COND_FP_CMP, optab, cmp_op, sve_fp_op)
(sve_fp_op_rev): Update accordingly.
* config/aarch64/aarch64.c (aarch64_unspec_cond_code): Likewise.

Index: gcc/config/aarch64/iterators.md
===
--- gcc/config/aarch64/iterators.md 2019-08-07 19:32:11.708818768 +0100
+++ gcc/config/aarch64/iterators.md 2019-08-07 19:39:58.777340903 +0100
@@ -468,22 +468,22 @@ (define_c_enum "unspec"
 UNSPEC_XORF; Used in aarch64-sve.md.
 UNSPEC_SMUL_HIGHPART ; Used in aarch64-sve.md.
 UNSPEC_UMUL_HIGHPART ; Used in aarch64-sve.md.
-UNSPEC_COND_ADD; Used in aarch64-sve.md.
-UNSPEC_COND_SUB; Used in aarch64-sve.md.
-UNSPEC_COND_MUL; Used in aarch64-sve.md.
-UNSPEC_COND_DIV; Used in aarch64-sve.md.
-UNSPEC_COND_MAX; Used in aarch64-sve.md.
-UNSPEC_COND_MIN; Used in aarch64-sve.md.
+UNSPEC_COND_FADD   ; Used in aarch64-sve.md.
+UNSPEC_COND_FCMEQ  ; Used in aarch64-sve.md.
+UNSPEC_COND_FCMGE  ; Used in aarch64-sve.md.
+UNSPEC_COND_FCMGT  ; Used in aarch64-sve.md.
+UNSPEC_COND_FCMLE  ; Used in aarch64-sve.md.
+UNSPEC_COND_FCMLT  ; Used in aarch64-sve.md.
+UNSPEC_COND_FCMNE  ; Used in aarch64-sve.md.
+UNSPEC_COND_FDIV   ; Used in aarch64-sve.md.
+UNSPEC_COND_FMAXNM ; Used in aarch64-sve.md.
+UNSPEC_COND_FMINNM ; Used in aarch64-sve.md.
 UNSPEC_COND_FMLA   ; Used in aarch64-sve.md.
 UNSPEC_COND_FMLS   ; Used in aarch64-sve.md.
+UNSPEC_COND_FMUL   ; Used in aarch64-sve.md.
 UNSPEC_COND_FNMLA  ; Used in aarch64-sve.md.
 UNSPEC_COND_FNMLS  ; Used in aarch64-sve.md.
-UNSPEC_COND_LT ; Used in aarch64-sve.md.
-UNSPEC_COND_LE ; Used in aarch64-sve.md.
-UNSPEC_COND_EQ ; Used in aarch64-sve.md.
-UNSPEC_COND_NE ; Used in aarch64-sve.md.
-UNSPEC_COND_GE ; Used in aarch64-sve.md.
-UNSPEC_COND_GT ; Used in aarch64-sve.md.
+UNSPEC_COND_FSUB   ; Used in aarch64-sve.md.
 UNSPEC_LASTB   ; Used in aarch64-sve.md.
 UNSPEC_FCADD90 ; Used in aarch64-simd.md.
 UNSPEC_FCADD270; Used in aarch64-simd.md.
@@ -1609,18 +1609,24 @@ (define_int_iterator UNPACK_UNSIGNED [UN
 
 (define_int_iterator MUL_HIGHPART [UNSPEC_SMUL_HIGHPART UNSPEC_UMUL_HIGHPART])
 
-(define_int_iterator SVE_COND_FP_BINARY [UNSPEC_COND_ADD UNSPEC_COND_SUB
-UNSPEC_COND_MUL UNSPEC_COND_DIV
-UNSPEC_COND_MAX UNSPEC_COND_MIN])
+(define_int_iterator SVE_COND_FP_BINARY [UNSPEC_COND_FADD
+UNSPEC_COND_FDIV
+UNSPEC_COND_FMAXNM
+UNSPEC_COND_FMINNM
+UNSPEC_COND_FMUL
+UNSPEC_COND_FSUB])
 
 (define_int_iterator SVE_COND_FP_TERNARY [UNSPEC_COND_FMLA
  UNSPEC_COND_FMLS
  UNSPEC_COND_FNMLA
  UNSPEC_COND_FNMLS])
 
-(define_int_iterator SVE_COND_FP_CMP [UNSPEC_COND_LT UNSPEC_COND_LE
- UNSPEC_COND_EQ UNSPEC_COND_NE
- UNSPEC_COND_GE UNSPEC_COND_GT])
+(define_int_iterator SVE_COND_FP_CMP [UNSPEC_COND_FCMEQ
+ UNSPEC_COND_FCMGE
+ UNSPEC_COND_FCMGT
+ UNSPEC_COND_FCMLE
+ UNSPEC_COND_FCMLT
+ UNSPEC_COND_FC

[AArch64] Remove redundant SVE FADDA pattern

2019-08-07 Thread Richard Sandiford
*pred_fold_left_plus_ could no longer match anything, since
UNSPEC_FADDA now takes three operands.  Predicated FADDAs should
now go through mask_fold_left_plus_ instead.

Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf.
Applied as r274186.

Richard


2019-08-07  Richard Sandiford  

gcc/
* config/aarch64/aarch64-sve.md (*pred_fold_left_plus_): Delete.

Index: gcc/config/aarch64/aarch64-sve.md
===
--- gcc/config/aarch64/aarch64-sve.md   2019-08-07 19:37:15.646555313 +0100
+++ gcc/config/aarch64/aarch64-sve.md   2019-08-07 19:45:18.526961399 +0100
@@ -3468,21 +3468,6 @@ (define_insn "mask_fold_left_plus_
   "fadda\t%0, %3, %0, %2."
 )
 
-;; Predicated form of the above in-order reduction.
-(define_insn "*pred_fold_left_plus_"
-  [(set (match_operand: 0 "register_operand" "=w")
-   (unspec:
- [(match_operand: 1 "register_operand" "0")
-  (unspec:SVE_F
-[(match_operand: 2 "register_operand" "Upl")
- (match_operand:SVE_F 3 "register_operand" "w")
- (match_operand:SVE_F 4 "aarch64_simd_imm_zero")]
-UNSPEC_SEL)]
- UNSPEC_FADDA))]
-  "TARGET_SVE"
-  "fadda\t%0, %2, %0, %3."
-)
-
 ;; =
 ;; == Permutes
 ;; =


[committed][AArch64] Merge SVE FP unary patterns

2019-08-07 Thread Richard Sandiford
This patch merges the SVE FP rounding patterns with the other SVE
FP unary patterns.

At the moment, we only generate unary FP operations for full vectors,
so we can use (sqrt:VNx4SF ...) etc. in the rtl pattern.  With the ACLE,
it's also possible to generate predicated operations on partial vectors
without specifying a value for inactive lanes.  (sqrt:VNx4SF ...) would
then have different faulting behaviour from the instruction that the
pattern generates.

This patch therefore uses unspecs to represent the operations instead.
Later patches make this change for other patterns.

Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf.
Applied as r274187.

Richard


2019-08-07  Richard Sandiford  

gcc/
* config/aarch64/iterators.md (UNSPEC_COND_FABS, UNSPEC_COND_FNEG)
(UNSPEC_COND_FRINTA, UNSPEC_COND_FRINTI, UNSPEC_COND_FRINTM)
(UNSPEC_COND_FRINTN, UNSPEC_COND_FRINTP, UNSPEC_COND_FRINTX)
(UNSPEC_COND_FRINTZ, UNSPEC_COND_FSQRT): New unspecs.
(optab, sve_fp_op): Handle them.
(SVE_FP_UNARY): Delete.
(optab): Remove sqrt entry.
(sve_fp_op): Remove neg, abs and sqrt entries.
(SVE_COND_FP_UNARY): New int iterator.
* config/aarch64/aarch64-sve.md (2)
(*2): Delete.
(2): Replace with...
(2): ...this.
(*2): Replace with...
(*2): ...this.

Index: gcc/config/aarch64/iterators.md
===
--- gcc/config/aarch64/iterators.md 2019-08-07 19:42:57.592010072 +0100
+++ gcc/config/aarch64/iterators.md 2019-08-07 19:49:13.241215361 +0100
@@ -468,6 +468,7 @@ (define_c_enum "unspec"
 UNSPEC_XORF; Used in aarch64-sve.md.
 UNSPEC_SMUL_HIGHPART ; Used in aarch64-sve.md.
 UNSPEC_UMUL_HIGHPART ; Used in aarch64-sve.md.
+UNSPEC_COND_FABS   ; Used in aarch64-sve.md.
 UNSPEC_COND_FADD   ; Used in aarch64-sve.md.
 UNSPEC_COND_FCMEQ  ; Used in aarch64-sve.md.
 UNSPEC_COND_FCMGE  ; Used in aarch64-sve.md.
@@ -481,8 +482,17 @@ (define_c_enum "unspec"
 UNSPEC_COND_FMLA   ; Used in aarch64-sve.md.
 UNSPEC_COND_FMLS   ; Used in aarch64-sve.md.
 UNSPEC_COND_FMUL   ; Used in aarch64-sve.md.
+UNSPEC_COND_FNEG   ; Used in aarch64-sve.md.
 UNSPEC_COND_FNMLA  ; Used in aarch64-sve.md.
 UNSPEC_COND_FNMLS  ; Used in aarch64-sve.md.
+UNSPEC_COND_FRINTA ; Used in aarch64-sve.md.
+UNSPEC_COND_FRINTI ; Used in aarch64-sve.md.
+UNSPEC_COND_FRINTM ; Used in aarch64-sve.md.
+UNSPEC_COND_FRINTN ; Used in aarch64-sve.md.
+UNSPEC_COND_FRINTP ; Used in aarch64-sve.md.
+UNSPEC_COND_FRINTX ; Used in aarch64-sve.md.
+UNSPEC_COND_FRINTZ ; Used in aarch64-sve.md.
+UNSPEC_COND_FSQRT  ; Used in aarch64-sve.md.
 UNSPEC_COND_FSUB   ; Used in aarch64-sve.md.
 UNSPEC_LASTB   ; Used in aarch64-sve.md.
 UNSPEC_FCADD90 ; Used in aarch64-simd.md.
@@ -1241,9 +1251,6 @@ (define_code_iterator FAC_COMPARISONS [l
 ;; SVE integer unary operations.
 (define_code_iterator SVE_INT_UNARY [abs neg not popcount])
 
-;; SVE floating-point unary operations.
-(define_code_iterator SVE_FP_UNARY [abs neg sqrt])
-
 ;; SVE integer binary operations.
 (define_code_iterator SVE_INT_BINARY [plus minus mult smax umax smin umin
  and ior xor])
@@ -1307,8 +1314,7 @@ (define_code_attr optab [(ashift "ashl")
 (leu "leu")
 (geu "geu")
 (gtu "gtu")
-(abs "abs")
-(sqrt "sqrt")])
+(abs "abs")])
 
 ;; For comparison operators we use the FCM* and CM* instructions.
 ;; As there are no CMLE or CMLT instructions which act on 3 vector
@@ -1462,10 +1468,7 @@ (define_code_attr sve_int_op_rev [(plus
 ;; The floating-point SVE instruction that implements an rtx code.
 (define_code_attr sve_fp_op [(plus "fadd")
 (minus "fsub")
-(mult "fmul")
-(neg "fneg")
-(abs "fabs")
-(sqrt "fsqrt")])
+(mult "fmul")])
 
 ;; The SVE immediate constraint to use for an rtl code.
 (define_code_attr sve_imm_con [(eq "vsc")
@@ -1609,6 +1612,17 @@ (define_int_iterator UNPACK_UNSIGNED [UN
 
 (define_int_iterator MUL_HIGHPART [UNSPEC_SMUL_HIGHPART UNSPEC_UMUL_HIGHPART])
 
+(define_int_iterator SVE_COND_FP_UNARY [UNSPEC_COND_FABS
+   UNSPEC_COND_FNEG
+   UNSPEC_COND_FRINTA
+   UNSPEC_COND_FRINTI
+   UNSPEC_COND_FRINTM
+   UNSPEC_COND_FRINTN
+   UNSPEC_COND_FRINTP
+   UNSPEC_COND_FRINTX
+   UNSPEC_COND_FRI

Re: [Committed] PR fortran/54072 -- More fun with BOZ

2019-08-07 Thread Steve Kargl
On Wed, Aug 07, 2019 at 09:09:49AM -0700, Steve Kargl wrote:
> On Wed, Aug 07, 2019 at 01:58:17PM +0100, Mark Eggleston wrote:
> > 
> > DATA statements for logical and character variable compile but do not work:
> > 
> > program test
> >    character(4) :: c
> >    data c / z'41424344' /
> >    write(*, *) "'" // c // "'", transfer(c, 0_4)
> > end program test
> > 
> > Outputs:
> > 
> >   ''   0

Prior versions of gfortran give

% gfc9 -c a.f90
a.f90:3:10:

3 |data c / z'41424344' /
  |  1
Error: Incompatible types in DATA statement at (1); attempted conversion of 
INTEGER(16) to CHARACTER(1)

I have a patch that now does

gfcx -c a.f90
a.f90:3:10-23:

3 |data c / z'41424344' /
  |  12
Error: data-stmt-object at (1) has type 'CHARACTER', which conflicts with the 
BOZ literal constant at (2)

BTW, -fallow-invalid-boz does enable all previous broken 
usages of BOZ.  For example, BOZ can be an actual argument
in only a few intrinsic subprograms listed in F2018.  I've
allowed only a few exceptions such as AND(z'1234',4242)
which mirros IAND() is documented behavior. 

PS: Have you gotten write access to the source code repository, yet?

-- 
Steve


[committed][AArch64] Merge SVE FMAXNM/FMINNM patterns

2019-08-07 Thread Richard Sandiford
This patch makes us use the same define_insn for both the smax/smin
and fmax/fmin optabs.  It also continues the process started by
the earlier FP unary patch of moving predicated FP patterns from
rtx codes to unspecs.

There's no need to handle the FMAX and FMIN instructions until
the ACLE patch, since we only use FMAXNM and FMINNM at present.

Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf.
Applied as r274188.

Richard


2019-08-07  Richard Sandiford  

gcc/
* config/aarch64/iterators.md (SVE_COND_FP_MAXMIN_PUBLIC): New
int iterator.
(maxmin_uns_op): Handle UNSPEC_COND_FMAXNM and UNSPEC_COND_FMINNM.
* config/aarch64/aarch64-sve.md
(3): Rename to...
(3): ...this and
use a single unspec for the rhs.
(*3): Delete.
(3): Use a single unspec for the rhs.

Index: gcc/config/aarch64/iterators.md
===
--- gcc/config/aarch64/iterators.md 2019-08-07 19:51:33.208173107 +0100
+++ gcc/config/aarch64/iterators.md 2019-08-07 19:53:53.831126190 +0100
@@ -1630,6 +1630,11 @@ (define_int_iterator SVE_COND_FP_BINARY
 UNSPEC_COND_FMUL
 UNSPEC_COND_FSUB])
 
+;; Floating-point max/min operations that correspond to optabs,
+;; as opposed to those that are internal to the port.
+(define_int_iterator SVE_COND_FP_MAXMIN_PUBLIC [UNSPEC_COND_FMAXNM
+   UNSPEC_COND_FMINNM])
+
 (define_int_iterator SVE_COND_FP_TERNARY [UNSPEC_COND_FMLA
  UNSPEC_COND_FMLS
  UNSPEC_COND_FNMLA
@@ -1709,7 +1714,9 @@ (define_int_attr  maxmin_uns [(UNSPEC_UM
  (UNSPEC_FMINNMV "smin")
  (UNSPEC_FMINV "smin_nan")
  (UNSPEC_FMAXNM "fmax")
- (UNSPEC_FMINNM "fmin")])
+ (UNSPEC_FMINNM "fmin")
+ (UNSPEC_COND_FMAXNM "fmax")
+ (UNSPEC_COND_FMINNM "fmin")])
 
 (define_int_attr  maxmin_uns_op [(UNSPEC_UMAXV "umax")
 (UNSPEC_UMINV "umin")
Index: gcc/config/aarch64/aarch64-sve.md
===
--- gcc/config/aarch64/aarch64-sve.md   2019-08-07 19:51:33.204173137 +0100
+++ gcc/config/aarch64/aarch64-sve.md   2019-08-07 19:53:53.831126190 +0100
@@ -2186,69 +2186,51 @@ (define_expand "xorsign3"
 ;;  [FP] Maximum and minimum
 ;; -
 ;; Includes:
-;; - FMAX
 ;; - FMAXNM
-;; - FMIN
 ;; - FMINNM
 ;; -
 
-;; Unpredicated floating-point MAX/MIN.
-(define_expand "3"
+;; Unpredicated floating-point MAX/MIN (the rtx codes).  These are more
+;; relaxed than fmax/fmin, but we implement them in the same way.
+(define_expand "3"
   [(set (match_operand:SVE_F 0 "register_operand")
(unspec:SVE_F
  [(match_dup 3)
-  (FMAXMIN:SVE_F (match_operand:SVE_F 1 "register_operand")
- (match_operand:SVE_F 2 "register_operand"))]
- UNSPEC_MERGE_PTRUE))]
+  (match_operand:SVE_F 1 "register_operand")
+  (match_operand:SVE_F 2 "register_operand")]
+ SVE_COND_FP_MAXMIN_PUBLIC))]
   "TARGET_SVE"
   {
 operands[3] = aarch64_ptrue_reg (mode);
   }
 )
 
-;; Floating-point MAX/MIN predicated with a PTRUE.
-(define_insn "*3"
-  [(set (match_operand:SVE_F 0 "register_operand" "=w, ?&w")
-   (unspec:SVE_F
- [(match_operand: 1 "register_operand" "Upl, Upl")
-  (FMAXMIN:SVE_F (match_operand:SVE_F 2 "register_operand" "%0, w")
- (match_operand:SVE_F 3 "register_operand" "w, w"))]
- UNSPEC_MERGE_PTRUE))]
-  "TARGET_SVE"
-  "@
-   fnm\t%0., %1/m, %0., %3.
-   movprfx\t%0, %2\;fnm\t%0., %1/m, %0., %3."
-  [(set_attr "movprfx" "*,yes")]
-)
-
-;; Unpredicated fmax/fmin.
+;; Unpredicated fmax/fmin (the libm functions).
 (define_expand "3"
   [(set (match_operand:SVE_F 0 "register_operand")
(unspec:SVE_F
  [(match_dup 3)
-  (unspec:SVE_F [(match_operand:SVE_F 1 "register_operand")
- (match_operand:SVE_F 2 "register_operand")]
-FMAXMIN_UNS)]
- UNSPEC_MERGE_PTRUE))]
+  (match_operand:SVE_F 1 "register_operand")
+  (match_operand:SVE_F 2 "register_operand")]
+ SVE_COND_FP_MAXMIN_PUBLIC))]
   "TARGET_SVE"
   {
 operands[3] = aarch64_ptrue_reg (mode);
   }
 )
 
-;; fmax/fmin predicated with a PTRUE.
-(define_insn "*3"
+;; Predicated floating-point maximum/minimum.
+(define_insn "*3"
   [(set (match_operand:SVE_F 0 "register_operand" "=w, ?&w")
(unspec:SVE_F
  [(match_operand: 1 "register_

[committed][AArch64] Merge SVE ternary FP operations

2019-08-07 Thread Richard Sandiford
This patch combines the four individual fused multiply-add optabs
into one pattern and uses unspecs instead of rtx codes.  This is
part of a series of patches that change the SVE FP patterns so that
they can describe cases in which the predicate isn't all-true.

Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf.
Applied as r274189.

Richard


2019-08-07  Richard Sandiford  

gcc/
* config/aarch64/aarch64-sve.md (fma4, *fma4)
(fnma4, *fnma4, fnms4, *fnms4)
(fms4, *fms4): Replace with...
(4)
(*4): ...these new patterns.
Use unspecs instead of rtx codes.
(cond_, *cond__2, *cond__4)
(*cond__any): Add the predicate to SVE_COND_FP_TERNARY.

Index: gcc/config/aarch64/aarch64-sve.md
===
--- gcc/config/aarch64/aarch64-sve.md   2019-08-07 19:56:41.377879110 +0100
+++ gcc/config/aarch64/aarch64-sve.md   2019-08-07 19:58:39.608999273 +0100
@@ -85,10 +85,6 @@
 ;;  [INT] Dot product
 ;;  [INT] Sum of absolute differences
 ;;  [FP] General ternary arithmetic corresponding to unspecs
-;;  [FP] FMLA and FMAD
-;;  [FP] FMLS and FMSB
-;;  [FP] FNMLA and FNMAD
-;;  [FP] FNMLS and FNMSB
 ;;
 ;; == Comparisons and selects
 ;;  [INT,FP] Select based on predicates
@@ -2469,13 +2465,46 @@ (define_expand "sad"
 ;; - FNMSB
 ;; -
 
+;; Unpredicated floating-point ternary operations.
+(define_expand "4"
+  [(set (match_operand:SVE_F 0 "register_operand")
+   (unspec:SVE_F
+ [(match_dup 4)
+  (match_operand:SVE_F 1 "register_operand")
+  (match_operand:SVE_F 2 "register_operand")
+  (match_operand:SVE_F 3 "register_operand")]
+ SVE_COND_FP_TERNARY))]
+  "TARGET_SVE"
+  {
+operands[4] = aarch64_ptrue_reg (mode);
+  }
+)
+
+;; Predicated floating-point ternary operations.
+(define_insn "*4"
+  [(set (match_operand:SVE_F 0 "register_operand" "=w, w, ?&w")
+   (unspec:SVE_F
+ [(match_operand: 1 "register_operand" "Upl, Upl, Upl")
+  (match_operand:SVE_F 2 "register_operand" "%w, 0, w")
+  (match_operand:SVE_F 3 "register_operand" "w, w, w")
+  (match_operand:SVE_F 4 "register_operand" "0, w, w")]
+ SVE_COND_FP_TERNARY))]
+  "TARGET_SVE"
+  "@
+   \t%0., %1/m, %2., %3.
+   \t%0., %1/m, %3., %4.
+   movprfx\t%0, %4\;\t%0., %1/m, %2., %3."
+  [(set_attr "movprfx" "*,*,yes")]
+)
+
 ;; Predicated floating-point ternary operations with merging.
 (define_expand "cond_"
   [(set (match_operand:SVE_F 0 "register_operand")
(unspec:SVE_F
  [(match_operand: 1 "register_operand")
   (unspec:SVE_F
-[(match_operand:SVE_F 2 "register_operand")
+[(match_dup 1)
+ (match_operand:SVE_F 2 "register_operand")
  (match_operand:SVE_F 3 "register_operand")
  (match_operand:SVE_F 4 "register_operand")]
 SVE_COND_FP_TERNARY)
@@ -2496,7 +2525,8 @@ (define_insn "*cond__2"
(unspec:SVE_F
  [(match_operand: 1 "register_operand" "Upl, Upl")
   (unspec:SVE_F
-[(match_operand:SVE_F 2 "register_operand" "0, w")
+[(match_dup 1)
+ (match_operand:SVE_F 2 "register_operand" "0, w")
  (match_operand:SVE_F 3 "register_operand" "w, w")
  (match_operand:SVE_F 4 "register_operand" "w, w")]
 SVE_COND_FP_TERNARY)
@@ -2516,7 +2546,8 @@ (define_insn "*cond__4"
(unspec:SVE_F
  [(match_operand: 1 "register_operand" "Upl, Upl")
   (unspec:SVE_F
-[(match_operand:SVE_F 2 "register_operand" "w, w")
+[(match_dup 1)
+ (match_operand:SVE_F 2 "register_operand" "w, w")
  (match_operand:SVE_F 3 "register_operand" "w, w")
  (match_operand:SVE_F 4 "register_operand" "0, w")]
 SVE_COND_FP_TERNARY)
@@ -2536,7 +2567,8 @@ (define_insn_and_rewrite "*cond_<
(unspec:SVE_F
  [(match_operand: 1 "register_operand" "Upl, Upl, Upl")
   (unspec:SVE_F
-[(match_operand:SVE_F 2 "register_operand" "w, w, w")
+[(match_dup 1)
+ (match_operand:SVE_F 2 "register_operand" "w, w, w")
  (match_operand:SVE_F 3 "register_operand" "w, w, w")
  (match_operand:SVE_F 4 "register_operand" "w, w, w")]
 SVE_COND_FP_TERNARY)
@@ -2561,174 +2593,6 @@ (define_insn_and_rewrite "*cond_<
   [(set_attr "movprfx" "yes")]
 )
 
-;; -
-;;  [FP] FMLA and FMAD
-;; -
-;; Includes:
-;; - FMAD
-;; - FMLA
-;; -
-
-;; Unpredicated fma (%0 = (%1 * %2) + %3).
-(define_expand "fma4"
-  [(set (match_operand:SVE_F 0 "register_

[committed][AArch64] Merge SVE reduction patterns

2019-08-07 Thread Richard Sandiford
The reorg showed that we had an unnecessary separation between
the bitwise and max/min reductions for integers, and the
addition and max/min reductions for fp.

Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf.
Applied as r274190.

Richard


2019-08-07  Richard Sandiford  

gcc/
* config/aarch64/iterators.md (BITWISEV): Delete.
(SVE_INT_REDUCTION, SVE_FP_REDUCTION): New int iterators.
(optab): Handle UNSPEC_UMAXV, UNSPEC_UMINV, UNSPEC_SMAXV,
UNSPEC_SMINV, UNSPEC_FADDV, UNSPEC_FMAXNMV, UNSPEC_FMAXV,
UNSPEC_FMINNMV, UNSPEC_FMINV.
(bit_reduc_op): Delete.
(sve_int_op): New int attribute.
(sve_fp_op): Handle UNSPEC_FADDV, UNSPEC_FMAXNMV, UNSPEC_FMAXV,
UNSPEC_FMINNMV, UNSPEC_FMINV.
* config/aarch64/aarch64-sve.md
(reduc__scal_)
(*reduc__scal_)
(reduc__scal_)
(*reduc__scal_): Merge into...
(reduc__scal_)
(*reduc__scal_): ...these
new patterns.
(reduc_plus_scal_, *reduc_plus_scal_)
(reduc__scal_)
(*reduc__scal_): Merge into...
(reduc__scal_)
(*reduc__scal_): ...these
new patterns.

Index: gcc/config/aarch64/iterators.md
===
--- gcc/config/aarch64/iterators.md 2019-08-07 19:56:41.377879110 +0100
+++ gcc/config/aarch64/iterators.md 2019-08-07 20:03:03.667034750 +0100
@@ -1501,8 +1501,6 @@ (define_int_iterator MAXMINV [UNSPEC_UMA
 (define_int_iterator FMAXMINV [UNSPEC_FMAXV UNSPEC_FMINV
   UNSPEC_FMAXNMV UNSPEC_FMINNMV])
 
-(define_int_iterator BITWISEV [UNSPEC_ANDV UNSPEC_IORV UNSPEC_XORV])
-
 (define_int_iterator LOGICALF [UNSPEC_ANDF UNSPEC_IORF UNSPEC_XORF])
 
 (define_int_iterator HADDSUB [UNSPEC_SHADD UNSPEC_UHADD
@@ -1612,6 +1610,20 @@ (define_int_iterator UNPACK_UNSIGNED [UN
 
 (define_int_iterator MUL_HIGHPART [UNSPEC_SMUL_HIGHPART UNSPEC_UMUL_HIGHPART])
 
+(define_int_iterator SVE_INT_REDUCTION [UNSPEC_ANDV
+   UNSPEC_IORV
+   UNSPEC_SMAXV
+   UNSPEC_SMINV
+   UNSPEC_UMAXV
+   UNSPEC_UMINV
+   UNSPEC_XORV])
+
+(define_int_iterator SVE_FP_REDUCTION [UNSPEC_FADDV
+  UNSPEC_FMAXV
+  UNSPEC_FMAXNMV
+  UNSPEC_FMINV
+  UNSPEC_FMINNMV])
+
 (define_int_iterator SVE_COND_FP_UNARY [UNSPEC_COND_FABS
UNSPEC_COND_FNEG
UNSPEC_COND_FRINTA
@@ -1682,6 +1694,15 @@ (define_int_attr optab [(UNSPEC_ANDF "an
(UNSPEC_ANDV "and")
(UNSPEC_IORV "ior")
(UNSPEC_XORV "xor")
+   (UNSPEC_UMAXV "umax")
+   (UNSPEC_UMINV "umin")
+   (UNSPEC_SMAXV "smax")
+   (UNSPEC_SMINV "smin")
+   (UNSPEC_FADDV "plus")
+   (UNSPEC_FMAXNMV "smax")
+   (UNSPEC_FMAXV "smax_nan")
+   (UNSPEC_FMINNMV "smin")
+   (UNSPEC_FMINV "smin_nan")
(UNSPEC_COND_FABS "abs")
(UNSPEC_COND_FADD "add")
(UNSPEC_COND_FDIV "div")
@@ -1731,10 +1752,6 @@ (define_int_attr  maxmin_uns_op [(UNSPEC
 (UNSPEC_FMAXNM "fmaxnm")
 (UNSPEC_FMINNM "fminnm")])
 
-(define_int_attr bit_reduc_op [(UNSPEC_ANDV "andv")
-  (UNSPEC_IORV "orv")
-  (UNSPEC_XORV "eorv")])
-
 ;; The SVE logical instruction that implements an unspec.
 (define_int_attr logicalf_op [(UNSPEC_ANDF "and")
  (UNSPEC_IORF "orr")
@@ -1932,7 +1949,20 @@ (define_int_attr cmp_op [(UNSPEC_COND_FC
 (UNSPEC_COND_FCMLT "lt")
 (UNSPEC_COND_FCMNE "ne")])
 
-(define_int_attr sve_fp_op [(UNSPEC_COND_FABS "fabs")
+(define_int_attr sve_int_op [(UNSPEC_ANDV "andv")
+(UNSPEC_IORV "orv")
+(UNSPEC_XORV "eorv")
+(UNSPEC_UMAXV "umaxv")
+(UNSPEC_UMINV "uminv")
+(UNSPEC_SMAXV "smaxv")
+(UNSPEC_SMINV "sminv")])
+
+(define_int_attr sve_fp_op [(UNSPEC_FADDV "faddv")
+   (UNSPEC_FMAXNMV "fmaxnmv")
+   (UNSPEC_FMAXV "fmaxv")
+   (UNSPEC_FMINNMV "fminnmv")
+   (UNSPEC_FMINV "fminv")
+   (UNSPEC_COND_FABS "fabs")
   

[committed][AArch64] Prefer FPRs over GPRs for CLASTB

2019-08-07 Thread Richard Sandiford
This patch makes the SVE CLASTB GPR alternative more expensive than the
FPR alternative in order to avoid unnecessary cross-file moves.  It also
fixes the prefix used to print the FPR;  only handles 32-bit and
64-bit elements.

Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf.
Applied as r274191.

Richard


2019-08-07  Richard Sandiford  

gcc/
* config/aarch64/aarch64-sve.md (fold_extract_last_):
Disparage the GPR alternative relative to the FPR one.
Fix handling of 8-bit and 16-bit FPR values.

gcc/testsuite/
* gcc.target/aarch64/sve/clastb_8.c: New test.

Index: gcc/config/aarch64/aarch64-sve.md
===
--- gcc/config/aarch64/aarch64-sve.md   2019-08-07 20:05:39.025879238 +0100
+++ gcc/config/aarch64/aarch64-sve.md   2019-08-07 20:07:56.256858738 +0100
@@ -3104,7 +3104,7 @@ (define_insn "ptest_ptrue"
 ;; Set operand 0 to the last active element in operand 3, or to tied
 ;; operand 1 if no elements are active.
 (define_insn "fold_extract_last_"
-  [(set (match_operand: 0 "register_operand" "=r, w")
+  [(set (match_operand: 0 "register_operand" "=?r, w")
(unspec:
  [(match_operand: 1 "register_operand" "0, 0")
   (match_operand: 2 "register_operand" "Upl, Upl")
@@ -3113,7 +3113,7 @@ (define_insn "fold_extract_last_"
   "TARGET_SVE"
   "@
clastb\t%0, %2, %0, %3.
-   clastb\t%0, %2, %0, %3."
+   clastb\t%0, %2, %0, %3."
 )
 
 ;; -
Index: gcc/testsuite/gcc.target/aarch64/sve/clastb_8.c
===
--- /dev/null   2019-07-30 08:53:31.317691683 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/clastb_8.c 2019-08-07 
20:07:56.256858738 +0100
@@ -0,0 +1,25 @@
+/* { dg-do assemble { target aarch64_asm_sve_ok } } */
+/* { dg-options "-O2 -ftree-vectorize -msve-vector-bits=256 --save-temps" } */
+
+#include 
+
+#define TEST_TYPE(TYPE) \
+  void \
+  test_##TYPE (TYPE *ptr, TYPE *a, TYPE *b, TYPE min_v) \
+  { \
+TYPE last = *ptr; \
+for (int i = 0; i < 1024; i++) \
+  if (a[i] < min_v) \
+   last = b[i]; \
+*ptr = last; \
+  }
+
+TEST_TYPE (uint8_t);
+TEST_TYPE (uint16_t);
+TEST_TYPE (uint32_t);
+TEST_TYPE (uint64_t);
+
+/* { dg-final { scan-assembler {\tclastb\t(b[0-9]+), p[0-7], \1, z[0-9]+\.b\n} 
} } */
+/* { dg-final { scan-assembler {\tclastb\t(h[0-9]+), p[0-7], \1, z[0-9]+\.h\n} 
} } */
+/* { dg-final { scan-assembler {\tclastb\t(s[0-9]+), p[0-7], \1, z[0-9]+\.s\n} 
} } */
+/* { dg-final { scan-assembler {\tclastb\t(d[0-9]+), p[0-7], \1, z[0-9]+\.d\n} 
} } */


[committed][AArch64] Prefer FPRs over GPRs for INSR

2019-08-07 Thread Richard Sandiford
INSR of GPRs involves a cross-file move while INSR of FPRs doesn't.
We should therefore disparage the GPR version relative to the FPR
version.

The patch also adds MOVPRFX handling, but this is only tested
properly by the ACLE.

Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf.
Applied as r274192.

Richard


2019-08-07  Richard Sandiford  

gcc/
* config/aarch64/aarch64-sve.md (vec_shl_insert_): Add
MOVPRFX alternatives.  Make the GPR alternatives more expensive
than the FPR ones.

gcc/testsuite/
* gcc.target/aarch64/sve/init_12.c: Expect w1 to be moved into
a temporary FPR.

Index: gcc/config/aarch64/aarch64-sve.md
===
--- gcc/config/aarch64/aarch64-sve.md   2019-08-07 20:08:50.880452586 +0100
+++ gcc/config/aarch64/aarch64-sve.md   2019-08-07 20:10:43.051618619 +0100
@@ -835,15 +835,18 @@ (define_expand "vec_init"
 
 ;; Shift an SVE vector left and insert a scalar into element 0.
 (define_insn "vec_shl_insert_"
-  [(set (match_operand:SVE_ALL 0 "register_operand" "=w, w")
+  [(set (match_operand:SVE_ALL 0 "register_operand" "=?w, w, ??&w, ?&w")
(unspec:SVE_ALL
- [(match_operand:SVE_ALL 1 "register_operand" "0, 0")
-  (match_operand: 2 "register_operand" "rZ, w")]
+ [(match_operand:SVE_ALL 1 "register_operand" "0, 0, w, w")
+  (match_operand: 2 "aarch64_reg_or_zero" "rZ, w, rZ, w")]
  UNSPEC_INSR))]
   "TARGET_SVE"
   "@
insr\t%0., %2
-   insr\t%0., %2"
+   insr\t%0., %2
+   movprfx\t%0, %1\;insr\t%0., %2
+   movprfx\t%0, %1\;insr\t%0., %2"
+  [(set_attr "movprfx" "*,*,yes,yes")]
 )
 
 ;; -
Index: gcc/testsuite/gcc.target/aarch64/sve/init_12.c
===
--- gcc/testsuite/gcc.target/aarch64/sve/init_12.c  2019-07-29 
09:46:41.910859821 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/init_12.c  2019-08-07 
20:10:43.051618619 +0100
@@ -10,12 +10,13 @@ typedef int32_t vnx4si __attribute__((ve
 
 /*
 ** foo:
+** fmov(s[0-9]+), w1
 ** mov (z[0-9]+\.s), w2
 ** mov (z[0-9]+\.s), w0
-** insr\2, w1
-** insr\2, w1
-** insr\2, w1
-** zip1\2, \2, \1
+** insr\3, \1
+** insr\3, \1
+** insr\3, \1
+** zip1\3, \3, \2
 ** ...
 */
 __attribute__((noipa))


[committed][AArch64] Fix INSR for zero floats

2019-08-07 Thread Richard Sandiford
We used INSR to handle zero integers but not zero floats.

Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf.
Applied as r274193.

Richard


2019-08-07  Richard Sandiford  

gcc/
* config/aarch64/constraints.md (Z): Handle floating-point zeros too.
* config/aarch64/predicates.md (aarch64_reg_or_zero): Likewise.

gcc/testsuite/
* gcc.target/aarch64/sve/init_13.c: New test.

Index: gcc/config/aarch64/constraints.md
===
--- gcc/config/aarch64/constraints.md   2019-08-07 19:27:50.0 +0100
+++ gcc/config/aarch64/constraints.md   2019-08-07 20:14:16.110034873 +0100
@@ -114,8 +114,8 @@ (define_constraint "Y"
(match_test "aarch64_float_const_zero_rtx_p (op)")))
 
 (define_constraint "Z"
-  "Integer constant zero."
-  (match_test "op == const0_rtx"))
+  "Integer or floating-point constant zero."
+  (match_test "op == CONST0_RTX (GET_MODE (op))"))
 
 (define_constraint "Ush"
   "A constraint that matches an absolute symbolic address high part."
Index: gcc/config/aarch64/predicates.md
===
--- gcc/config/aarch64/predicates.md2019-08-07 19:27:50.0 +0100
+++ gcc/config/aarch64/predicates.md2019-08-07 20:14:16.110034873 +0100
@@ -57,9 +57,9 @@ (define_predicate "aarch64_simd_register
 (match_test "REGNO_REG_CLASS (REGNO (op)) == FP_REGS"
 
 (define_predicate "aarch64_reg_or_zero"
-  (and (match_code "reg,subreg,const_int")
+  (and (match_code "reg,subreg,const_int,const_double")
(ior (match_operand 0 "register_operand")
-   (match_test "op == const0_rtx"
+   (match_test "op == CONST0_RTX (GET_MODE (op))"
 
 (define_predicate "aarch64_reg_or_fp_zero"
   (ior (match_operand 0 "register_operand")
Index: gcc/testsuite/gcc.target/aarch64/sve/init_13.c
===
--- /dev/null   2019-07-30 08:53:31.317691683 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/init_13.c  2019-08-07 
20:14:16.110034873 +0100
@@ -0,0 +1,17 @@
+/* { dg-do assemble { target aarch64_asm_sve_ok } } */
+/* { dg-options "-O -msve-vector-bits=256 --save-temps" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+typedef float vnx4sf __attribute__((vector_size (32)));
+
+/*
+** foo:
+** mov (z[0-9]+\.s), s0
+** insr\1, wzr
+** ...
+*/
+vnx4sf
+foo (float a)
+{
+  return (vnx4sf) { 0.0f, a, a, a, a, a, a, a };
+}


[AArch64] Make the complete mnemonic

2019-08-07 Thread Richard Sandiford
The Advanced SIMD and SVE permute patterns both split the permute
operation into a base name and a hilo suffix.  That works well, but it
means that for "@" patterns, we need to pass the permute code twice,
once for the base name and once for the suffix.

Having a unified name avoids that and also makes the definitions
slightly simpler.

Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf.
OK to install?

Richard


2019-08-07  Richard Sandiford  

gcc/
* config/aarch64/iterators.md (perm_insn): Include the "1"/"2" suffix.
(perm_hilo): Remove UNSPEC_ZIP*, UNSEPC_TRN* and UNSPEC_UZP*.
* config/aarch64/aarch64-simd.md
(aarch64_): Rename to..
(aarch64_): ...this and remove perm_hilo
from the asm template.
* config/aarch64/aarch64-sve.md
(aarch64_): Rename to..
(aarch64_): ...this and remove perm_hilo
from the asm template.
(aarch64_): Rename to..
(aarch64_): ...this and remove perm_hilo
from the asm template.
* config/aarch64/aarch64-simd-builtins.def: Update comment.

Index: gcc/config/aarch64/iterators.md
===
--- gcc/config/aarch64/iterators.md 2019-08-07 20:05:39.029879208 +0100
+++ gcc/config/aarch64/iterators.md 2019-08-07 20:22:04.962550985 +0100
@@ -1888,18 +1888,15 @@ (define_int_attr pauth_hint_num [(UNSPEC
   (UNSPEC_AUTIA1716 "12")
   (UNSPEC_AUTIB1716 "14")])
 
-(define_int_attr perm_insn [(UNSPEC_ZIP1 "zip") (UNSPEC_ZIP2 "zip")
-   (UNSPEC_TRN1 "trn") (UNSPEC_TRN2 "trn")
-   (UNSPEC_UZP1 "uzp") (UNSPEC_UZP2 "uzp")])
+(define_int_attr perm_insn [(UNSPEC_ZIP1 "zip1") (UNSPEC_ZIP2 "zip2")
+   (UNSPEC_TRN1 "trn1") (UNSPEC_TRN2 "trn2")
+   (UNSPEC_UZP1 "uzp1") (UNSPEC_UZP2 "uzp2")])
 
 ; op code for REV instructions (size within which elements are reversed).
 (define_int_attr rev_op [(UNSPEC_REV64 "64") (UNSPEC_REV32 "32")
 (UNSPEC_REV16 "16")])
 
-(define_int_attr perm_hilo [(UNSPEC_ZIP1 "1") (UNSPEC_ZIP2 "2")
-   (UNSPEC_TRN1 "1") (UNSPEC_TRN2 "2")
-   (UNSPEC_UZP1 "1") (UNSPEC_UZP2 "2")
-   (UNSPEC_UNPACKSHI "hi") (UNSPEC_UNPACKUHI "hi")
+(define_int_attr perm_hilo [(UNSPEC_UNPACKSHI "hi") (UNSPEC_UNPACKUHI "hi")
(UNSPEC_UNPACKSLO "lo") (UNSPEC_UNPACKULO "lo")])
 
 ;; Return true if the associated optab refers to the high-numbered lanes,
Index: gcc/config/aarch64/aarch64-simd.md
===
--- gcc/config/aarch64/aarch64-simd.md  2019-07-29 09:39:46.666190099 +0100
+++ gcc/config/aarch64/aarch64-simd.md  2019-08-07 20:22:04.958551015 +0100
@@ -5781,13 +5781,13 @@ (define_insn_and_split "aarch64_combinev
 ;; This instruction's pattern is generated directly by
 ;; aarch64_expand_vec_perm_const, so any changes to the pattern would
 ;; need corresponding changes there.
-(define_insn "aarch64_"
+(define_insn "aarch64_"
   [(set (match_operand:VALL_F16 0 "register_operand" "=w")
(unspec:VALL_F16 [(match_operand:VALL_F16 1 "register_operand" "w")
  (match_operand:VALL_F16 2 "register_operand" "w")]
 PERMUTE))]
   "TARGET_SIMD"
-  "\\t%0., %1., %2."
+  "\\t%0., %1., %2."
   [(set_attr "type" "neon_permute")]
 )
 
Index: gcc/config/aarch64/aarch64-sve.md
===
--- gcc/config/aarch64/aarch64-sve.md   2019-08-07 20:12:11.454961437 +0100
+++ gcc/config/aarch64/aarch64-sve.md   2019-08-07 20:22:04.958551015 +0100
@@ -3356,13 +3356,13 @@ (define_insn "*aarch64_sve_rev16vnx16qi"
 
 ;; Permutes that take half the elements from one vector and half the
 ;; elements from the other.
-(define_insn "aarch64_sve_"
+(define_insn "aarch64_sve_"
   [(set (match_operand:SVE_ALL 0 "register_operand" "=w")
(unspec:SVE_ALL [(match_operand:SVE_ALL 1 "register_operand" "w")
 (match_operand:SVE_ALL 2 "register_operand" "w")]
PERMUTE))]
   "TARGET_SVE"
-  "\t%0., %1., %2."
+  "\t%0., %1., %2."
 )
 
 ;; Concatenate two vectors and extract a subvector.  Note that the
@@ -3395,13 +3395,13 @@ (define_insn "*aarch64_sve_ext"
 
 ;; Permutes that take half the elements from one vector and half the
 ;; elements from the other.
-(define_insn "*aarch64_sve_"
+(define_insn "*aarch64_sve_"
   [(set (match_operand:PRED_ALL 0 "register_operand" "=Upa")
(unspec:PRED_ALL [(match_operand:PRED_ALL 1 "register_operand" "Upa")
  (match_operand:PRED_ALL 2 "register_operand" "Upa")]
 PERMUTE))]
   "TARGET_SVE"
-  "\t%0., %1., %2."
+  "\t%0., %1., %2."
 )
 
 ;; ==

[AArch64] Split built-in function codes into major and minor codes

2019-08-07 Thread Richard Sandiford
It was easier to add the SVE ACLE support without enumerating every
function at build time.  This in turn meant that it was easier if the
SVE builtins occupied a distinct numberspace from the existing AArch64
ones, which *are* enumerated at build time.  This patch therefore
divides the built-in functions codes into "major" and "minor" codes.
At present the major code is just "general", but the SVE patch will add
"SVE" as well.

Also, it was convenient to put the SVE ACLE support in its own file,
so the patch makes aarch64.c provide the frontline target hooks directly,
forwarding to the other files for the real work.

The reason for organising the files this way is that aarch64.c needs
to define the target hook macros whatever happens, and having aarch64.c
macros forward to aarch64-builtins.c functions and aarch64-bulitins.c
functions forward to the SVE file seemed a bit indirect.  Doing things
the way the patch does them puts aarch64-builtins.c and the SVE code on
more of an equal footing.

The aarch64_(general_)gimple_fold_builtin change is mostly just
reindentation.  I've attached a -b version of the diff as well.

Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf.
OK to install when the ACLE patch itself is ready to install?

Richard


2019-08-07  Richard Sandiford  

gcc/
* config/aarch64/aarch64-protos.h (aarch64_builtin_class): New enum.
(AARCH64_BUILTIN_SHIFT, AARCH64_BUILTIN_CLASS): New constants.
(aarch64_gimple_fold_builtin, aarch64_mangle_builtin_type)
(aarch64_fold_builtin, aarch64_init_builtins, aarch64_expand_builtin):
(aarch64_builtin_decl, aarch64_builtin_rsqrt): Delete.
(aarch64_general_mangle_builtin_type, aarch64_general_init_builtins):
(aarch64_general_fold_builtin, aarch64_general_gimple_fold_builtin):
(aarch64_general_expand_builtin, aarch64_general_builtin_decl):
(aarch64_general_builtin_rsqrt): Declare.
* config/aarch64/aarch64-builtins.c (aarch64_general_add_builtin):
New function.
(aarch64_mangle_builtin_type): Rename to...
(aarch64_general_mangle_builtin_type): ...this.
(aarch64_init_fcmla_laneq_builtins, aarch64_init_simd_builtins)
(aarch64_init_crc32_builtins, aarch64_init_builtin_rsqrt)
(aarch64_init_pauth_hint_builtins, aarch64_init_tme_builtins): Use
aarch64_general_add_builtin instead of add_builtin_function.
(aarch64_init_builtins): Rename to...
(aarch64_general_init_builtins): ...this.  Use
aarch64_general_add_builtin instead of add_builtin_function.
(aarch64_builtin_decl): Rename to...
(aarch64_general_builtin_decl): ...this and remove the unused
arguments.
(aarch64_expand_builtin): Rename to...
(aarch64_general_expand_builtin): ...this and remove the unused
arguments.
(aarch64_builtin_rsqrt): Rename to...
(aarch64_general_builtin_rsqrt): ...this.
(aarch64_fold_builtin): Rename to...
(aarch64_general_fold_builtin): ...this.  Take the function subcode
and return type as arguments.  Remove the "ignored" argument.
(aarch64_gimple_fold_builtin): Rename to...
(aarch64_general_gimple_fold_builtin): ...this.  Take the function
subcode and gcall as arguments, and return the new function call.
* config/aarch64/aarch64.c (aarch64_init_builtins)
(aarch64_fold_builtin, aarch64_gimple_fold_builtin)
(aarch64_expand_builtin, aarch64_builtin_decl): New functions.
(aarch64_builtin_reciprocal): Call aarch64_general_builtin_rsqrt
instead of aarch64_builtin_rsqrt.
(aarch64_mangle_type): Call aarch64_general_mangle_builtin_type
instead of aarch64_mangle_builtin_type.

Index: gcc/config/aarch64/aarch64-protos.h
===
--- gcc/config/aarch64/aarch64-protos.h 2019-08-07 19:27:50.0 +0100
+++ gcc/config/aarch64/aarch64-protos.h 2019-08-07 20:25:20.057100264 +0100
@@ -406,6 +406,22 @@ enum aarch64_key_type {
 
 extern struct tune_params aarch64_tune_params;
 
+/* It's convenient to divide the built-in function codes into groups,
+   rather than having everything in a single enum.  This type enumerates
+   those groups.  */
+enum aarch64_builtin_class
+{
+  AARCH64_BUILTIN_GENERAL
+};
+
+/* Built-in function codes are structured so that the low
+   AARCH64_BUILTIN_SHIFT bits contain the aarch64_builtin_class
+   and the upper bits contain a group-specific subcode.  */
+const unsigned int AARCH64_BUILTIN_SHIFT = 1;
+
+/* Mask that selects the aarch64_builtin_class part of a function code.  */
+const unsigned int AARCH64_BUILTIN_CLASS = (1 << AARCH64_BUILTIN_SHIFT) - 1;
+
 void aarch64_post_cfi_startproc (void);
 poly_int64 aarch64_initial_elimination_offset (unsigned, unsigned);
 int aarch64_get_condition_code (rtx);
@@ -430,7 +446,6 @@ bool aarch64_float_const_rtx_p (rtx);
 bool aarch64_fu

Re: C++ PATCH for c++/91346 - Implement C++2a P1668R1, allow unevaluated asm in constexpr

2019-08-07 Thread Jason Merrill
On Wed, Aug 7, 2019, 2:35 PM Marek Polacek  wrote:

> On Wed, Aug 07, 2019 at 12:23:25PM -0400, Jason Merrill wrote:
> > On Wed, Aug 7, 2019, 9:11 AM Marek Polacek  wrote:
> >
> > > On Tue, Aug 06, 2019 at 09:25:45PM -0400, Jason Merrill wrote:
> > > > Let's downgrade the errors in earlier standard modes to pedwarn. Ok
> with
> > > > that change.
> > >
> > > Works for me, here's what I'll apply once it passes testing.
> > >
> > > I removed the diagnostic in potential_constant_expression_1/ASM_EXPR so
> > > that
> > > we don't generate duplicate pedwarns for the same thing.  Hope that's
> OK.
> > >
> > > 2019-08-07  Marek Polacek  
> > >
> > > PR c++/91346 - Implement P1668R1, allow unevaluated asm in
> > > constexpr.
> > > * constexpr.c (cxx_eval_constant_expression): Handle ASM_EXPR.
> > > (potential_constant_expression_1) : Allow.
> > > * cp-tree.h (finish_asm_stmt): Adjust.
> > > * parser.c (cp_parser_asm_definition): Grab the locaion of
> "asm"
> > > and
> > > use it.  Change an error to a pedwarn.  Allow asm in C++2a,
> warn
> > > otherwise.
> > > * pt.c (tsubst_expr): Pass a location down to finish_asm_stmt.
> > > * semantics.c (finish_asm_stmt): New location_t parameter.
> Use it.
> > >
> > > * g++.dg/cpp2a/inline-asm1.C: New test.
> > > * g++.dg/cpp2a/inline-asm2.C: New test.
> > > * g++.dg/cpp1y/constexpr-neg1.C: Adjust dg-error.
> > >
> > > diff --git gcc/cp/constexpr.c gcc/cp/constexpr.c
> > > index 36a66337433..e86b0789b84 100644
> > > --- gcc/cp/constexpr.c
> > > +++ gcc/cp/constexpr.c
> > > @@ -5289,6 +5289,18 @@ cxx_eval_constant_expression (const
> constexpr_ctx
> > > *ctx, tree t,
> > >r = void_node;
> > >break;
> > >
> > > +case ASM_EXPR:
> > > +  if (!ctx->quiet)
> > > +   {
> > > + error_at (cp_expr_loc_or_input_loc (t),
> > > +   "inline assembly is not a constant expression");
> > > + inform (cp_expr_loc_or_input_loc (t),
> > > + "only unevaluated inline assembly is allowed in a "
> > > + "% function in C++2a");
> > > +   }
> > > +  *non_constant_p = true;
> > > +  return t;
> > > +
> > >  default:
> > >if (STATEMENT_CODE_P (TREE_CODE (t)))
> > > {
> > > @@ -6469,13 +6481,18 @@ potential_constant_expression_1 (tree t, bool
> > > want_rval, bool strict, bool now,
> > >/* GCC internal stuff.  */
> > >  case VA_ARG_EXPR:
> > >  case TRANSACTION_EXPR:
> > > -case ASM_EXPR:
> > >  case AT_ENCODE_EXPR:
> > >  fail:
> > >if (flags & tf_error)
> > > error_at (loc, "expression %qE is not a constant expression",
> t);
> > >return false;
> > >
> > > +case ASM_EXPR:
> > > +  /* In C++2a, unevaluated inline assembly is permitted in
> constexpr
> > > +functions.  If it's used in earlier standard modes, we
> pedwarn in
> > > +cp_parser_asm_definition.  */
> > > +  return true;
> > >
> >
> > Actually, do we need this change? If it's (possibly) unevaluated, we
> > shouldn't get here.
>
> We can get here when using asm() in ({ }) like this (ugh):
>
> constexpr int
> foo (bool b)
> {
>   if (b)
>{
>  constexpr int i = ({ asm(""); 42; });
>  return i;
> }
>   else
> return 42;
> }
> static_assert(foo(false) == 42, "");
>
> With the current state of potential_constant_expression_1, we generate
>
> inline-asm3.C: In function ‘constexpr int foo(bool)’:
> inline-asm3.C:10:27: error: inline assembly is not a constant expression
>10 |  constexpr int i = ({ asm(""); 42; });
>   |   ^~~
> inline-asm3.C:10:27: note: only unevaluated inline assembly is allowed in
> a ‘constexpr’ function in C++2a
>
> which I thought was better than what we emit with the hunk revered:
>
> inline-asm3.C: In function ‘constexpr int foo(bool)’:
> inline-asm3.C:10:27: error: expression ‘’ is not a constant
> expression
>10 |  constexpr int i = ({ asm(""); 42; });
>   |   ^~~
>
> But I'm happy to revert that hunk if you want.
>

Sure, perhaps we could share the error message between the two functions.
In general it's better to reject something in potential_... if we can.

>


C++ PATCH for c++/91391 - bogus -Wcomma-subscript warning

2019-08-07 Thread Marek Polacek
When implementing -Wcomma-subscript I failed to realize that a comma in
a template-argument-list shouldn't be warned about.

But we can't simply ignore any commas inside < ... > because the following
needs to be caught:

  a[b < c, b > c];

This patch from Jakub fixes it by moving the warning to cp_parser_expression
where we can better detect top-level commas (and avoid saving tokens).

I've extended the patch to revert the cp_parser_skip_to_closing_square_bracket
changes I made in r274121 -- they are no longer needed.

Apologies for the thinko.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2019-08-07  Jakub Jelinek  
Marek Polacek  

PR c++/91391 - bogus -Wcomma-subscript warning.
* parser.c (cp_parser_postfix_open_square_expression): Don't warn about
a deprecated comma here.  Pass warn_comma_subscript down to
cp_parser_expression.
(cp_parser_expression): New bool parameter.  Warn about uses of a comma
operator within a subscripting expression.
(cp_parser_skip_to_closing_square_bracket): Revert to pre-r274121 state.
(cp_parser_skip_to_closing_square_bracket_1): Remove.

* g++.dg/cpp2a/comma5.C: New test.

diff --git gcc/cp/parser.c gcc/cp/parser.c
index 14b724095c4..eccc3749fd0 100644
--- gcc/cp/parser.c
+++ gcc/cp/parser.c
@@ -2102,7 +2102,7 @@ static cp_expr cp_parser_assignment_expression
 static enum tree_code cp_parser_assignment_operator_opt
   (cp_parser *);
 static cp_expr cp_parser_expression
-  (cp_parser *, cp_id_kind * = NULL, bool = false, bool = false);
+  (cp_parser *, cp_id_kind * = NULL, bool = false, bool = false, bool = false);
 static cp_expr cp_parser_constant_expression
   (cp_parser *, bool = false, bool * = NULL, bool = false);
 static cp_expr cp_parser_builtin_offsetof
@@ -2669,8 +2669,6 @@ static bool cp_parser_init_statement_p
   (cp_parser *);
 static bool cp_parser_skip_to_closing_square_bracket
   (cp_parser *);
-static int cp_parser_skip_to_closing_square_bracket_1
-  (cp_parser *, enum cpp_ttype);
 
 /* Concept-related syntactic transformations */
 
@@ -7524,33 +7522,9 @@ cp_parser_postfix_open_square_expression (cp_parser 
*parser,
  index = cp_parser_braced_list (parser, &expr_nonconst_p);
}
   else
-   {
- /* [depr.comma.subscript]: A comma expression appearing as
-the expr-or-braced-init-list of a subscripting expression
-is deprecated.  A parenthesized comma expression is not
-deprecated.  */
- if (warn_comma_subscript)
-   {
- /* Save tokens so that we can put them back.  */
- cp_lexer_save_tokens (parser->lexer);
-
- /* Look for ',' that is not nested in () or {}.  */
- if (cp_parser_skip_to_closing_square_bracket_1 (parser,
- CPP_COMMA) == -1)
-   {
- auto_diagnostic_group d;
- warning_at (cp_lexer_peek_token (parser->lexer)->location,
- OPT_Wcomma_subscript,
- "top-level comma expression in array subscript "
- "is deprecated");
-   }
-
- /* Roll back the tokens we skipped.  */
- cp_lexer_rollback_tokens (parser->lexer);
-   }
-
- index = cp_parser_expression (parser);
-   }
+   index = cp_parser_expression (parser, NULL, /*cast_p=*/false,
+ /*decltype_p=*/false,
+ /*warn_comma_p=*/warn_comma_subscript);
 }
 
   parser->greater_than_is_operator_p = saved_greater_than_is_operator_p;
@@ -9932,12 +9906,13 @@ cp_parser_assignment_operator_opt (cp_parser* parser)
CAST_P is true if this expression is the target of a cast.
DECLTYPE_P is true if this expression is the immediate operand of decltype,
  except possibly parenthesized or on the RHS of a comma (N3276).
+   WARN_COMMA_P is true if a comma should be diagnosed.
 
Returns a representation of the expression.  */
 
 static cp_expr
 cp_parser_expression (cp_parser* parser, cp_id_kind * pidk,
- bool cast_p, bool decltype_p)
+ bool cast_p, bool decltype_p, bool warn_comma_p)
 {
   cp_expr expression = NULL_TREE;
   location_t loc = UNKNOWN_LOCATION;
@@ -9984,6 +9959,17 @@ cp_parser_expression (cp_parser* parser, cp_id_kind * 
pidk,
break;
   /* Consume the `,'.  */
   loc = cp_lexer_peek_token (parser->lexer)->location;
+  if (warn_comma_p)
+   {
+ /* [depr.comma.subscript]: A comma expression appearing as
+the expr-or-braced-init-list of a subscripting expression
+is deprecated.  A parenthesized comma expression is not
+deprecated.  */
+ warning_at (loc, OPT_Wcomma_subscript,
+ "top-level comma expression in array subscript "
+ 

Re: C++ PATCH for c++/91346 - Implement C++2a P1668R1, allow unevaluated asm in constexpr

2019-08-07 Thread Marek Polacek
On Wed, Aug 07, 2019 at 03:38:02PM -0400, Jason Merrill wrote:
> On Wed, Aug 7, 2019, 2:35 PM Marek Polacek  wrote:
> 
> > On Wed, Aug 07, 2019 at 12:23:25PM -0400, Jason Merrill wrote:
> > > On Wed, Aug 7, 2019, 9:11 AM Marek Polacek  wrote:
> > >
> > > > On Tue, Aug 06, 2019 at 09:25:45PM -0400, Jason Merrill wrote:
> > > > > Let's downgrade the errors in earlier standard modes to pedwarn. Ok
> > with
> > > > > that change.
> > > >
> > > > Works for me, here's what I'll apply once it passes testing.
> > > >
> > > > I removed the diagnostic in potential_constant_expression_1/ASM_EXPR so
> > > > that
> > > > we don't generate duplicate pedwarns for the same thing.  Hope that's
> > OK.
> > > >
> > > > 2019-08-07  Marek Polacek  
> > > >
> > > > PR c++/91346 - Implement P1668R1, allow unevaluated asm in
> > > > constexpr.
> > > > * constexpr.c (cxx_eval_constant_expression): Handle ASM_EXPR.
> > > > (potential_constant_expression_1) : Allow.
> > > > * cp-tree.h (finish_asm_stmt): Adjust.
> > > > * parser.c (cp_parser_asm_definition): Grab the locaion of
> > "asm"
> > > > and
> > > > use it.  Change an error to a pedwarn.  Allow asm in C++2a,
> > warn
> > > > otherwise.
> > > > * pt.c (tsubst_expr): Pass a location down to finish_asm_stmt.
> > > > * semantics.c (finish_asm_stmt): New location_t parameter.
> > Use it.
> > > >
> > > > * g++.dg/cpp2a/inline-asm1.C: New test.
> > > > * g++.dg/cpp2a/inline-asm2.C: New test.
> > > > * g++.dg/cpp1y/constexpr-neg1.C: Adjust dg-error.
> > > >
> > > > diff --git gcc/cp/constexpr.c gcc/cp/constexpr.c
> > > > index 36a66337433..e86b0789b84 100644
> > > > --- gcc/cp/constexpr.c
> > > > +++ gcc/cp/constexpr.c
> > > > @@ -5289,6 +5289,18 @@ cxx_eval_constant_expression (const
> > constexpr_ctx
> > > > *ctx, tree t,
> > > >r = void_node;
> > > >break;
> > > >
> > > > +case ASM_EXPR:
> > > > +  if (!ctx->quiet)
> > > > +   {
> > > > + error_at (cp_expr_loc_or_input_loc (t),
> > > > +   "inline assembly is not a constant expression");
> > > > + inform (cp_expr_loc_or_input_loc (t),
> > > > + "only unevaluated inline assembly is allowed in a "
> > > > + "% function in C++2a");
> > > > +   }
> > > > +  *non_constant_p = true;
> > > > +  return t;
> > > > +
> > > >  default:
> > > >if (STATEMENT_CODE_P (TREE_CODE (t)))
> > > > {
> > > > @@ -6469,13 +6481,18 @@ potential_constant_expression_1 (tree t, bool
> > > > want_rval, bool strict, bool now,
> > > >/* GCC internal stuff.  */
> > > >  case VA_ARG_EXPR:
> > > >  case TRANSACTION_EXPR:
> > > > -case ASM_EXPR:
> > > >  case AT_ENCODE_EXPR:
> > > >  fail:
> > > >if (flags & tf_error)
> > > > error_at (loc, "expression %qE is not a constant expression",
> > t);
> > > >return false;
> > > >
> > > > +case ASM_EXPR:
> > > > +  /* In C++2a, unevaluated inline assembly is permitted in
> > constexpr
> > > > +functions.  If it's used in earlier standard modes, we
> > pedwarn in
> > > > +cp_parser_asm_definition.  */
> > > > +  return true;
> > > >
> > >
> > > Actually, do we need this change? If it's (possibly) unevaluated, we
> > > shouldn't get here.
> >
> > We can get here when using asm() in ({ }) like this (ugh):
> >
> > constexpr int
> > foo (bool b)
> > {
> >   if (b)
> >{
> >  constexpr int i = ({ asm(""); 42; });
> >  return i;
> > }
> >   else
> > return 42;
> > }
> > static_assert(foo(false) == 42, "");
> >
> > With the current state of potential_constant_expression_1, we generate
> >
> > inline-asm3.C: In function ‘constexpr int foo(bool)’:
> > inline-asm3.C:10:27: error: inline assembly is not a constant expression
> >10 |  constexpr int i = ({ asm(""); 42; });
> >   |   ^~~
> > inline-asm3.C:10:27: note: only unevaluated inline assembly is allowed in
> > a ‘constexpr’ function in C++2a
> >
> > which I thought was better than what we emit with the hunk revered:
> >
> > inline-asm3.C: In function ‘constexpr int foo(bool)’:
> > inline-asm3.C:10:27: error: expression ‘’ is not a constant
> > expression
> >10 |  constexpr int i = ({ asm(""); 42; });
> >   |   ^~~
> >
> > But I'm happy to revert that hunk if you want.
> >
> 
> Sure, perhaps we could share the error message between the two functions.
> In general it's better to reject something in potential_... if we can.

Yeah, that makes sense.  How about this further tweak then?

2019-08-07  Marek Polacek  

* constexpr.c (inline_asm_in_constexpr_error): New.
(cxx_eval_constant_expression) : Call it.
(potential_constant_expression_1) : Likewise.

* g++.dg/cpp2a/inline-asm3.C: New test.

diff --git gcc/cp/constexpr.c gcc/cp/constexpr.c
index e86b0789b84..63c89056

Re: C++ PATCH for c++/91346 - Implement C++2a P1668R1, allow unevaluated asm in constexpr

2019-08-07 Thread Jakub Jelinek
On Wed, Aug 07, 2019 at 04:30:29PM -0400, Marek Polacek wrote:
> +/* Complain about an attempt to evaluate inline assembly.  */
> +
> +static void
> +inline_asm_in_constexpr_error (location_t loc)
> +{
> +  error_at (loc, "inline assembly is not a constant expression");
> +  inform (loc, "only unevaluated inline assembly is allowed in a "
> +   "% function in C++2a");

Note, I think in this case you do want a
  auto_diagnostic_group d;
before the two diagnostic calls, whether you apply this patch or not,
because they are a diagnostic group together.  See PR84889.

Jakub


Re: C++ PATCH for c++/91346 - Implement C++2a P1668R1, allow unevaluated asm in constexpr

2019-08-07 Thread Marek Polacek
On Wed, Aug 07, 2019 at 10:37:04PM +0200, Jakub Jelinek wrote:
> On Wed, Aug 07, 2019 at 04:30:29PM -0400, Marek Polacek wrote:
> > +/* Complain about an attempt to evaluate inline assembly.  */
> > +
> > +static void
> > +inline_asm_in_constexpr_error (location_t loc)
> > +{
> > +  error_at (loc, "inline assembly is not a constant expression");
> > +  inform (loc, "only unevaluated inline assembly is allowed in a "
> > + "% function in C++2a");
> 
> Note, I think in this case you do want a
>   auto_diagnostic_group d;
> before the two diagnostic calls, whether you apply this patch or not,
> because they are a diagnostic group together.  See PR84889.

It's not like we talked about it just a while ago...  /facepalm

Will fix.

Marek


Re: Run the combine part of combine_and_move_insns even if -fsched-pressure

2019-08-07 Thread Segher Boessenkool
Hi Richard,

On Wed, Aug 07, 2019 at 07:16:03PM +0100, Richard Sandiford wrote:
> The main IRA routine includes the code:
> 
>   /* Don't move insns if live range shrinkage or register
>  pressure-sensitive scheduling were done because it will not
>  improve allocation but likely worsen insn scheduling.  */
>   if (optimize
>   && !flag_live_range_shrinkage
>   && !(flag_sched_pressure && flag_schedule_insns))
> combine_and_move_insns ();
> 
> The comment about not moving insns for pressure-sensitive scheduling
> makes sense, but I think the combine part of combine_and_move_insns is
> still useful, since it's folding a set of an equivalent value into its
> single user and so eliminates the need for one register altogether.

During which pass are the newly combined instructions created?  If not
late, why does combine refuse to do this combination?


Segher


Re: Start implementing -frounding-math

2019-08-07 Thread Joseph Myers
On Sat, 22 Jun 2019, Marc Glisse wrote:

> as discussed in the PR, this seems like a simple enough approach to handle
> FENV functionality safely, while keeping it possible to implement
> optimizations in the future.

Could you give a high-level description of the implementation approach, 
and how this design is intended to (eventually) achieve the required 
constraints on code movement and removal?  In 
 I listed those 
constraints as:

* General calls may set, clear or test exceptions, or manipulate the 
rounding mode (as may asms, depending on their inputs / outputs / 
clobbers).

* Floating-point operations have the rounding mode as input.  They may set 
(but not clear or test) floating-point exception flags.

* Thus in general floating-point operations may not be moved across most 
calls (or relevant asms), or values from one side of a call reused for the 
same operation with the same inputs appearing on the other side of the 
call.

* Statements such as "(void) (a * b);" can't be eliminated because they 
may raise exceptions.  (That's purely about exceptions, not rounding 
modes.)

(I should add that const function calls should not depend on the rounding 
mode, but pure calls may.  Also, on some architectures there are explicit 
register names for asms to use in inputs / outputs / clobbers to refer to 
the floating-point state registers, and asms not referring to those can be 
taken not to manipulate floating-point state, but other architectures 
don't have such names.  The safe approach for asms would be to assume that 
all asms on all architectures can manipulate floating-point state, until 
there is a way to declare what the relevant registers are.)

(I should also note that DFP has a separate rounding mode from binary FP, 
but that is unlikely to affect anything in this patch - although there 
might end up being potential minor optimizations from knowing that certain 
asms only involve one of the two rounding modes.)

> I'd like to handle this incrementally, rather than wait for a mega-patch that
> does everything, if that's ok. For instance, I didn't handle vectors in this
> first patch because the interaction with vector lowering was not completely
> obvious. Plus it may help get others to implement some parts of it ;-)

Are there testcases that could be added initially to demonstrate how this 
fixes cases that are currently broken, even if other cases aren't fixed?

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Start implementing -frounding-math

2019-08-07 Thread Joseph Myers
On Sun, 23 Jun 2019, Marc Glisse wrote:

> For constant expressions, I see a difference between
> constexpr double third = 1. / 3.;
> which really needs to be done at compile time, and
> const double third = 1. / 3.;
> which will try to evaluate the rhs as constexpr, but where the program is
> still valid if that fails. The second one clearly should refuse to be
> evaluated at compile time if we are specifying a dynamic rounding direction.

For C, initializers with static or thread storage duration always use 
round-to-nearest and discard exceptions (see F.8.2 and F.8.5).  This is 
unaffected by FENV_ACCESS (but *is* affected by FENV_ROUND).

> Note that C2x adds a pragma fenv_round that specifies a rounding direction for
> a region of code, which seems relevant for constant expressions. That pragma
> looks hard, but maybe some pieces would be nice to add.

FENV_ROUND (and FENV_DEC_ROUND) shouldn't be that hard, given the 
optimizers avoiding code movement that doesn't respect rounding modes 
(though I'm only thinking of C here, not C++).  You'd insert appropriate 
built-in function calls to save and restore the dynamic rounding modes in 
scopes with a constant rounding mode set, taking due care about scopes 
being left through goto etc., and restore the mode around calls to 
functions that aren't meant to be affected by the constant rounding modes 
- you'd also need a built-in function to indicate to make a call that is 
affected by the constant rounding modes (and make __builtin_tgmath do that 
as well), and to define all the relevant functions as macros using that 
built-in function in the standard library headers.  Optimizations for 
architectures supporting rounding modes embedded in instructions could 
come later.

Complications would include:

*  constants should use hex floats to avoid being affected by the 
constant rounding mode (in turn, this may mean disallowing the FENV_ROUND 
pragma in C90 mode because of the lack of hex floats there).  If they use 
decimal rather than hex they'd need to be very long constants to have 
exactly the right value in all rounding modes.

* The built-in functions to change the dynamic rounding mode can't involve 
calling fegetround / fesetround, because those are in libm and libm is not 
supposed to be required unless you call a function in , 
 or  (simply using a language feature such as a pragma 
should not introduce a libm dependency).  So a similar issue applies as 
applied with atomic compound assignment for floating-point types: every 
target with hardware floating point needs to have its own support for 
expanding those built-in functions inline, and relevant tests will FAIL 
(or be UNSUPPORTED through the compiler calling sorry () when the pragma 
is used) on targets without that support, until it is added.  (And in 
cases where the rounding modes is TLS data in libc rather than in 
hardware, such as soft-float PowerPC GNU/Linux and maybe some other cases 
for DFP, you need new implementation-namespace interfaces there to save / 
restore it.)

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] PR91195: fix -Wmaybe-uninitialized warning for conditional store optimization

2019-08-07 Thread Jeff Law
On 7/30/19 8:42 AM, Martin Sebor wrote:
> On 7/30/19 2:44 AM, Richard Biener wrote:
>> On Tue, Jul 30, 2019 at 10:35 AM Jakub Jelinek  wrote:
>>>
>>> On Tue, Jul 30, 2019 at 10:21:15AM +0200, Richard Biener wrote:
 Would you need to LTO stream & merge the bitmaps / maps somehow?
>>>
>>> Yes.  And if we do not throw unneeded warnings from the sets
>>> normally, LTO
>>> streaming might be a good time to do that, so that we merge in only
>>> warnings
>>> that will be tested during/post IPA.
>>>
 (maybe "unmap" to sth else when streaming individual stmts)  Not
 sure if
 a bitmap is really necessary, we could simply have a tree/gimple *
 -> vec<>
 mapping with records about emitted (or queued) diagnostics, like
 OPT_, message pairs or so.
>>>
>>> Right now we use it both for the case we've already emitted some
>>> diagnostics
>>> and don't want to emit it again later, or for the case where we insert
>>> something in the IL that a warning could be diagnosed on but we know we
>>> don't want that.  The latter is the case e.g. for when we add
>>> TREE_NO_WARNING on the last value in a statement expression so that
>>> we don't
>>> diagnose it as unused, or the case we are discussing here.
>>> If we need queued diagnostics, sure, we could add it in, but I don't
>>> see a
>>> need for that right now.  vecs compared to bitmap might be larger and
>>> harder
>>> to avoid duplicates.  I guess we'd need to do some analysis though,
>>> and e.g.
>>> if 99% of cases we need to disable just one warning and not multiple,
>>> perhaps optimize for that case.
>>
>> I was thinking we may want to use the same facility to record warnings we
>> want to not emit if we later figure a stmt was in an unreachable
>> region.  For
>> this we need to record the actual warning, not only the warning kind.
> 
> I was thinking of introducing a __builtin_warning() for this and
> issuing it if it's not eliminated just before RTL expansion.  That
> would let programs like libstdc++ use it too.
I've stated before, I think this is a *great* idea and would encourage
you to prototype it out to see if there's any significant gotchas.
There's several places I think we could use it.

> 
> The downside of these solutions is that warnings may be issued out
> of order as code moves around.  To have the issued in the order of
> increasing source lines they would need to be queued.
I think David is out right now, but I think he'd claim that emitting
diagnostics strictly in line number order isn't best anyway.  Building a
queuing mechanism with an eye towards a more complex comparison to
facilitate priority ordering of diagnostics would be a good thing.  He
may already have some prototype code, not sure on that.

jeff


Re: [PATCH v2] Use edge->indirect_unknown_callee in cgraph_edge::make_direct (PR ipa/89330).

2019-08-07 Thread Jeff Law
On 7/30/19 7:37 AM, Martin Liška wrote:
> Hi.
> 
> Thanks to Martin I was able to prepare a proper fix. The issue is that
> cgraph_edge::resolve_speculation can delete this pointer (yes, it's
> super nasty) and so that the caller can't use this->something
> right after the function returns.
> 
> For the long term, I'll rework the ::resolve_speculation function.
> 
> The patch survives --enable-checking bootstrap on x86_64-linux-gnu.
> 
> Ready to be installed after proper testing?
> Thanks,
> Martin
> 
Would this possibly be the cause of this error building the kernel:


> /opt/notnfs/law/jenkins/workspace/x86_64-linux-gnu/linux/lib/iov_iter.c: In 
> function 'memcpy_to_page':
> /opt/notnfs/law/jenkins/workspace/x86_64-linux-gnu/linux/lib/iov_iter.c:1718:1:
>  internal compiler error: in gt_ggc_mx_symtab_node, at gtype-desc.c:1382
>  1718 | EXPORT_SYMBOL(iov_iter_for_each_range);
>   | ^
> 0x66537c gt_ggc_mx_symtab_node(void*)
>   
> /opt/notnfs/law/jenkins/workspace/x86_64-linux-gnu/obj/gcc/gcc/gtype-desc.c:1382
>   CC  fs/super.o
> 0xcafbcd gt_ggc_ma_order
>   ./gt-passes.h:31
> 0xcafbcd gt_ggc_ma_order
>   ./gt-passes.h:26
> 0xac0555 ggc_mark_root_tab
>   ../../../gcc/gcc/ggc-common.c:77
> 0xac075c ggc_mark_roots()
>   ../../../gcc/gcc/ggc-common.c:94
> 0x8ee485 ggc_collect()
>   ../../../gcc/gcc/ggc-page.c:2201
> Please submit a full bug report,

Jeff


Re: [PATCH, i386]: Fix PR 91385, Zero-extended negation is not generated

2019-08-07 Thread Segher Boessenkool
On Wed, Aug 07, 2019 at 08:37:52PM +0200, Uros Bizjak wrote:
> It looks that combine lost some of its unwanted creativity. Added
> testcase will keep it that way.

Huh, I wonder why it ever did that :-)  Strange.

> -;; Combine is quite creative about this pattern.
>  (define_insn "*negsi2_1_zext"
>[(set (match_operand:DI 0 "register_operand" "=r")
> - (lshiftrt:DI
> -   (neg:DI (ashift:DI (match_operand:DI 1 "register_operand" "0")
> -  (const_int 32)))
> - (const_int 32)))
> + (zero_extend:DI
> +   (neg:SI (match_operand:SI 1 "register_operand" "0"
> (clobber (reg:CC FLAGS_REG))]

Thanks for adding a testcase!


Segher


Re: Run the combine part of combine_and_move_insns even if -fsched-pressure

2019-08-07 Thread Richard Sandiford
Segher Boessenkool  writes:
> Hi Richard,
>
> On Wed, Aug 07, 2019 at 07:16:03PM +0100, Richard Sandiford wrote:
>> The main IRA routine includes the code:
>> 
>>   /* Don't move insns if live range shrinkage or register
>>  pressure-sensitive scheduling were done because it will not
>>  improve allocation but likely worsen insn scheduling.  */
>>   if (optimize
>>   && !flag_live_range_shrinkage
>>   && !(flag_sched_pressure && flag_schedule_insns))
>> combine_and_move_insns ();
>> 
>> The comment about not moving insns for pressure-sensitive scheduling
>> makes sense, but I think the combine part of combine_and_move_insns is
>> still useful, since it's folding a set of an equivalent value into its
>> single user and so eliminates the need for one register altogether.
>
> During which pass are the newly combined instructions created?  If not
> late, why does combine refuse to do this combination?

During if_after_combine.

I should have said, but these tests aren't actually the motivating
examples.  They're just something I found while trawling the testsuite
for a nice short reproducer (gcc.c-torture/compile/scc.c).

The justification really stands independently of whichever tests end up
being bundled with the patch.  While we have code to do combinations
at this point in the pipeline, I think we should use it independently
of -fsched-pressure.  I'm pretty sure the current -fsched-pressure
condition was added specifically to prevent moves (which makes a lot
of sense) rather than to stop combinations too.

Thanks,
Richard


Re: Run the combine part of combine_and_move_insns even if -fsched-pressure

2019-08-07 Thread Segher Boessenkool
On Wed, Aug 07, 2019 at 11:21:48PM +0100, Richard Sandiford wrote:
> Segher Boessenkool  writes:
> > On Wed, Aug 07, 2019 at 07:16:03PM +0100, Richard Sandiford wrote:
> >> The main IRA routine includes the code:
> >> 
> >>   /* Don't move insns if live range shrinkage or register
> >>  pressure-sensitive scheduling were done because it will not
> >>  improve allocation but likely worsen insn scheduling.  */
> >>   if (optimize
> >>   && !flag_live_range_shrinkage
> >>   && !(flag_sched_pressure && flag_schedule_insns))
> >> combine_and_move_insns ();
> >> 
> >> The comment about not moving insns for pressure-sensitive scheduling
> >> makes sense, but I think the combine part of combine_and_move_insns is
> >> still useful, since it's folding a set of an equivalent value into its
> >> single user and so eliminates the need for one register altogether.
> >
> > During which pass are the newly combined instructions created?  If not
> > late, why does combine refuse to do this combination?
> 
> During if_after_combine.

So immediately after combine.

Why did combine refuse to do this itself?  What am I missing?


Segher


Re: Run the combine part of combine_and_move_insns even if -fsched-pressure

2019-08-07 Thread Segher Boessenkool
On Wed, Aug 07, 2019 at 05:26:45PM -0500, Segher Boessenkool wrote:
> On Wed, Aug 07, 2019 at 11:21:48PM +0100, Richard Sandiford wrote:
> > Segher Boessenkool  writes:
> > > On Wed, Aug 07, 2019 at 07:16:03PM +0100, Richard Sandiford wrote:
> > >> The main IRA routine includes the code:
> > >> 
> > >>   /* Don't move insns if live range shrinkage or register
> > >>  pressure-sensitive scheduling were done because it will not
> > >>  improve allocation but likely worsen insn scheduling.  */
> > >>   if (optimize
> > >>   && !flag_live_range_shrinkage
> > >>   && !(flag_sched_pressure && flag_schedule_insns))
> > >> combine_and_move_insns ();
> > >> 
> > >> The comment about not moving insns for pressure-sensitive scheduling
> > >> makes sense, but I think the combine part of combine_and_move_insns is
> > >> still useful, since it's folding a set of an equivalent value into its
> > >> single user and so eliminates the need for one register altogether.
> > >
> > > During which pass are the newly combined instructions created?  If not
> > > late, why does combine refuse to do this combination?
> > 
> > During if_after_combine.
> 
> So immediately after combine.
> 
> Why did combine refuse to do this itself?  What am I missing?

Oh, duh, these are *new* instructions created by that ce2 pass.  Duh :-)


Segher


[Committed] PR fortran/91359 -- Fix testcases that were munged

2019-08-07 Thread Steve Kargl
I've committed updated for the testcases for PR fortran/91359.
Both files had "! { dg do run }" instead og "! { dg-do run }",
and for some unexplained reason pr91359_1.f and pr91359_2.f
were identical.  Patch attached.

2019-08-07  Steven G. Kargl  

 PR fortran/91359
 * pr91359_2.f:  Fix missing hyphen in dg-do
 * pr91359_1.f:  Ditto.  Remove RESULT variable to test actual fix!

-- 
Steve
Index: gcc/testsuite/gfortran.dg/pr91359_1.f
===
--- gcc/testsuite/gfortran.dg/pr91359_1.f	(revision 274200)
+++ gcc/testsuite/gfortran.dg/pr91359_1.f	(working copy)
@@ -1,12 +1,12 @@
-! { dg do run }
+! { dg-do run }
 ! PR fortran/91359
 ! Orginal code contributed by Brian T. Carcich 
 !
-  logical function zero() result(a)
+  logical function zero()
  goto 2
 1return
-2a = .false.
- if (.not.a) goto 1
+2zero = .false.
+ if (.not.zero) goto 1
  return
   end
 
Index: gcc/testsuite/gfortran.dg/pr91359_2.f
===
--- gcc/testsuite/gfortran.dg/pr91359_2.f	(revision 274200)
+++ gcc/testsuite/gfortran.dg/pr91359_2.f	(working copy)
@@ -1,4 +1,4 @@
-! { dg do run }
+! { dg-do run }
 ! PR fortran/91359
 ! Orginal code contributed by Brian T. Carcich 
 !


[PATCH] enable gcc.dg/struct-ret-1.c on all targets

2019-08-07 Thread Martin Sebor

This test is reported as UNSUPPORTED when it runs on x86_64
and I expect everywhere else except hppa-*-*.  There's nothing
PA-RISC specific in it that I can see and it runs successfully,
so I'm thinking I'll enable it everywhere just to get rid of
the UNSUPPORTED result.

Jeff, it's a test you added back in 1997.  If you can think
of a reason not to enable it please let me know, otherwise
I'll go ahead with it as obvious.

Martin

diff --git a/gcc/testsuite/gcc.dg/struct-ret-1.c 
b/gcc/testsuite/gcc.dg/struct-ret-1.c

index 23c9e98130b..426843505c7 100644
--- a/gcc/testsuite/gcc.dg/struct-ret-1.c
+++ b/gcc/testsuite/gcc.dg/struct-ret-1.c
@@ -1,5 +1,5 @@
-/* { dg-do run { target hppa*-*-* } } */
-/* { dg-options { -O2 } { target hppa*-*-* } } */
+/* { dg-do run } */
+/* { dg-options -O2 } */
 extern void abort (void);
 extern void exit (int);
 typedef struct {



Re: Start implementing -frounding-math

2019-08-07 Thread Joseph Myers
On Mon, 24 Jun 2019, Richard Biener wrote:

> On the patch I'd name _DIV _RDIV (to match the tree code we are dealing
> with).  You miss _NEGATE and also the _FIX_TRUNC and _FLOAT in
> case those might trap with -ftrapping-math.  There are also internal

Negation (and abs and copysign) can never raise any exceptions even with 
signaling NaN arguments.

Conversion between integers and floating-point *can* raise exceptions 
(depending on the types involved, e.g. conversions from int to IEEE double 
are always exact with no exceptions raised).  And conversions from integer 
to floating-point, when the types mean they aren't necessarily exact, 
depend on the rounding mode (whereas conversions from floating-point to 
integer types always truncate towards 0).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: PR90724 - ICE with __sync_bool_compare_and_swap with -march=armv8.2-a

2019-08-07 Thread Prathamesh Kulkarni
On Thu, 1 Aug 2019 at 15:34, Prathamesh Kulkarni
 wrote:
>
> On Thu, 25 Jul 2019 at 11:56, Prathamesh Kulkarni
>  wrote:
> >
> > On Wed, 17 Jul 2019 at 18:15, Prathamesh Kulkarni
> >  wrote:
> > >
> > > On Wed, 17 Jul 2019 at 13:45, Kyrill Tkachov
> > >  wrote:
> > > >
> > > > Hi Prathamesh
> > > >
> > > > On 7/10/19 12:24 PM, Prathamesh Kulkarni wrote:
> > > > > Hi,
> > > > > For following test-case,
> > > > > static long long AL[24];
> > > > >
> > > > > int
> > > > > check_ok (void)
> > > > > {
> > > > >   return (__sync_bool_compare_and_swap (AL+1, 0x20003ll,
> > > > > 0x1234567890ll));
> > > > > }
> > > > >
> > > > > Compiling with -O2 -march=armv8.2-a results in:
> > > > > pr90724.c: In function ‘check_ok’:
> > > > > pr90724.c:7:1: error: unrecognizable insn:
> > > > > 7 | }
> > > > >   | ^
> > > > > (insn 11 10 12 2 (set (reg:CC 66 cc)
> > > > > (compare:CC (reg:DI 95)
> > > > > (const_int 8589934595 [0x20003]))) "pr90724.c":6:11 -1
> > > > >  (nil))
> > > > >
> > > > > IIUC, the issue is that 0x20003 falls outside the range of
> > > > > allowable immediate in cmp ? If it's replaced by a small constant then
> > > > > it works.
> > > > >
> > > > > The ICE results with -march=armv8.2-a because, we enter if
> > > > > (TARGET_LSE) { ... } condition
> > > > > in aarch64_expand_compare_and_swap, while with -march=armv8.a it goes
> > > > > into else,
> > > > > which forces oldval into register if the predicate fails to match.
> > > > >
> > > > > The attached patch checks if y (oldval) satisfies aarch64_plus_operand
> > > > > predicate and if not, forces it to be in register, which resolves ICE.
> > > > > Does it look OK ?
> > > > >
> > > > > Bootstrap+testing in progress on aarch64-linux-gnu.
> > > > >
> > > > > PS: The issue has nothing to do with SVE, which I incorrectly
> > > > > mentioned in bug report.
> > > > >
> > > > This looks ok to me (but you'll need maintainer approval).
> > > >
> > > > Does this fail on the branches as well?
> > > Hi Kyrill,
> > > Thanks for the review. The test also fails on gcc-9-branch (but not on 
> > > gcc-8).
> > Hi James,
> > Is the patch OK to commit  ?
> > https://gcc.gnu.org/ml/gcc-patches/2019-07/msg00793.html
> ping * 3: https://gcc.gnu.org/ml/gcc-patches/2019-07/msg00793.html
ping * 4: https://gcc.gnu.org/ml/gcc-patches/2019-07/msg00793.html

Thanks,
Prathamesh
>
> Thanks,
> Prathamesh
> >
> > Thanks,
> > Prathamesh
> > >
> > > Thanks,
> > > Prathamesh
> > > >
> > > > Thanks,
> > > >
> > > > Kyrill
> > > >
> > > >
> > > > > Thanks,
> > > > > Prathamesh


Re: Start implementing -frounding-math

2019-08-07 Thread Marc Glisse

On Wed, 7 Aug 2019, Joseph Myers wrote:


On Sat, 22 Jun 2019, Marc Glisse wrote:


as discussed in the PR, this seems like a simple enough approach to handle
FENV functionality safely, while keeping it possible to implement
optimizations in the future.


Could you give a high-level description of the implementation approach,


At the GIMPLE level, z = x + y is represented as a function call z = 
.FENV_PLUS (x, y, options). The floating point environment (rounding mode, 
exceptions) is considered to be somewhere in memory (I think it still 
works if it is a hard register). Unless options say so, .FENV_PLUS may 
read/write to memory. There are very little optimizations that can be done 
on general function calls, so this should avoid unwanted movement or 
removal. We can still implement some specific optimizations just for those 
functions.


At the RTL level, well the idea is that good back-ends would expand 
.FENV_PLUS however they want, but the default is to have the arguments and 
the result use an asm volatile pass-through, which is opaque to optimizers 
and prevents constant propagation, removal, movement, etc.


(the use of "options" is to avoid having many variants depending on 
whether we only care about rounding, exceptions, maybe ignore signed 
zeros, etc, with 0 as the strictest, always-safe version. For explicitly 
rounded operations as with pragma fenv_round, a different function might 
be better since the 0 case is not a safe replacement anymore)



and how this design is intended to (eventually) achieve the required
constraints on code movement and removal?  In
 I listed those
constraints as:

* General calls may set, clear or test exceptions, or manipulate the
rounding mode
(as may asms, depending on their inputs / outputs / clobbers).


If the asm is volatile, this works fine. I'll come back to this below.


* Floating-point operations have the rounding mode as input.  They may set
(but not clear or test) floating-point exception flags.

* Thus in general floating-point operations may not be moved across most
calls (or relevant asms), or values from one side of a call reused for the
same operation with the same inputs appearing on the other side of the
call.

* Statements such as "(void) (a * b);" can't be eliminated because they
may raise exceptions.  (That's purely about exceptions, not rounding
modes.)


I had to add TREE_SIDE_EFFECTS = 1 so the C++ front-end wouldn't remove it 
prematurely.



(I should add that const function calls should not depend on the rounding
mode, but pure calls may.


That perfectly fits with the idea of having the FP env as part of memory.


Also, on some architectures there are explicit
register names for asms to use in inputs / outputs / clobbers to refer to
the floating-point state registers, and asms not referring to those can be
taken not to manipulate floating-point state, but other architectures
don't have such names.  The safe approach for asms would be to assume that
all asms on all architectures can manipulate floating-point state, until
there is a way to declare what the relevant registers are.)


I assume that an asm using this register as a constraint is already 
prevented from moving across function calls somehow? If so, at least 
gimple seems safe.


For RTL, if those asm were volatile, the default expansion would be fine. 
If they don't need to be and somehow manage to cross the pass-through asm, 
I guess a target hook to add extra input/output/clobber to the 
pass-through asm would work. Or best the target would expand the 
operations to (unspec) insns that explicitly handle exactly those 
registers.



(I should also note that DFP has a separate rounding mode from binary FP,
but that is unlikely to affect anything in this patch - although there
might end up being potential minor optimizations from knowing that certain
asms only involve one of the two rounding modes.)


I'd like to handle this incrementally, rather than wait for a mega-patch that
does everything, if that's ok. For instance, I didn't handle vectors in this
first patch because the interaction with vector lowering was not completely
obvious. Plus it may help get others to implement some parts of it ;-)


Are there testcases that could be added initially to demonstrate how this
fixes cases that are currently broken, even if other cases aren't fixed?


Yes. I'll need to look into dg-require-effective-target fenv(_exceptions) 
to see how to disable those new tests where they are not supported. There 
are many easy tests that already start working, say computing 1./3 twice 
with a change of rounding mode in between and checking that the results 
differ, or computing 1./3 and ignoring the result but checking FE_INEXACT.



On Wed, 7 Aug 2019, Joseph Myers wrote:


On Sun, 23 Jun 2019, Marc Glisse wrote:


For constant expressions, I see a difference between
constexpr double third = 1. / 3.;
which really needs to be done at compile ti

[testsuite] use rand instead of random

2019-08-07 Thread Alexandre Oliva
rand is in ISO C, whereas random is only in POSIX, so it makes sense
to use the more portable function everywhere instead of falling back
from one to the other on systems that miss the less portable one.

Tested on x86_64-linux-gnu, native and cross to ppc.  Ok to install?


for  gcc/testsuite/ChangeLog

* gcc.target/i386/sse2-mul-1.c: Use rand.  Drop fallback.
* gcc.target/i386/sse4_1-blendps-2.c: Likewise.
* gcc.target/i386/sse4_1-blendps.c: Likewise.
* gcc.target/i386/xop-vshift-1.c: Likewise.
* gcc.target/powerpc/direct-move.h: Likewise.
---
 gcc/testsuite/gcc.target/i386/sse2-mul-1.c   |   13 -
 gcc/testsuite/gcc.target/i386/sse4_1-blendps-2.c |7 +--
 gcc/testsuite/gcc.target/i386/sse4_1-blendps.c   |7 +--
 gcc/testsuite/gcc.target/i386/xop-vshift-1.c |9 ++---
 gcc/testsuite/gcc.target/powerpc/direct-move.h   |2 +-
 5 files changed, 9 insertions(+), 29 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/sse2-mul-1.c 
b/gcc/testsuite/gcc.target/i386/sse2-mul-1.c
index 9cdc12763b0b9..af63d245d7916 100644
--- a/gcc/testsuite/gcc.target/i386/sse2-mul-1.c
+++ b/gcc/testsuite/gcc.target/i386/sse2-mul-1.c
@@ -14,11 +14,6 @@
 
 #include 
 
-/* mingw runtime don't provide random().  */
-#ifdef __MINGW32__
-#define random rand
-#endif
-
 #define N 512
 static short a1[N], a2[N], a3[N];
 static unsigned short b1[N], b2[N], b3[N];
@@ -160,12 +155,12 @@ TEST (void)
   asm volatile ("" : : "r" (&s2) : "memory");
   asm volatile ("" : : "r" (&s3) : "memory");
   asm volatile ("" : : "r" (&s4) : "memory");
-  b2[i] = (int) random ();
-  b3[i] = (int) random ();
+  b2[i] = (int) rand ();
+  b3[i] = (int) rand ();
   a2[i] = b2[i];
   a3[i] = b3[i];
-  d2[i] = (((int) random ()) << 16) | b2[i];
-  d3[i] = (((int) random ()) << 16) | b3[i];
+  d2[i] = (((int) rand ()) << 16) | b2[i];
+  d3[i] = (((int) rand ()) << 16) | b3[i];
   c2[i] = d2[i];
   c3[i] = d3[i];
   s1 += a2[i] * a3[i];
diff --git a/gcc/testsuite/gcc.target/i386/sse4_1-blendps-2.c 
b/gcc/testsuite/gcc.target/i386/sse4_1-blendps-2.c
index 8fe71b71c576a..512394ea37a00 100644
--- a/gcc/testsuite/gcc.target/i386/sse4_1-blendps-2.c
+++ b/gcc/testsuite/gcc.target/i386/sse4_1-blendps-2.c
@@ -8,11 +8,6 @@
 #include 
 #include 
 
-/* mingw runtime don't provide random().  */
-#ifdef __MINGW32__
-#define random rand
-#endif
-
 #define NUM 20
 
 #undef MASK
@@ -64,7 +59,7 @@ sse4_1_test (void)
   init_blendps (src1.f, src2.f);
 
   for (i = 0; i < 4; i++)
-src3.f[i] = (int) random ();
+src3.f[i] = (int) rand ();
 
   /* Check blendps imm8, m128, xmm */
   for (i = 0; i < NUM; i++)
diff --git a/gcc/testsuite/gcc.target/i386/sse4_1-blendps.c 
b/gcc/testsuite/gcc.target/i386/sse4_1-blendps.c
index 3f4b335aca6ae..556094bf9530e 100644
--- a/gcc/testsuite/gcc.target/i386/sse4_1-blendps.c
+++ b/gcc/testsuite/gcc.target/i386/sse4_1-blendps.c
@@ -16,11 +16,6 @@
 #include 
 #include 
 
-/* mingw runtime don't provide random().  */
-#ifdef __MINGW32__
-#define random rand
-#endif
-
 #define NUM 20
 
 #ifndef MASK
@@ -73,7 +68,7 @@ TEST (void)
   init_blendps (src1.f, src2.f);
 
   for (i = 0; i < 4; i++)
-src3.f[i] = (int) random ();
+src3.f[i] = (int) rand ();
 
   /* Check blendps imm8, m128, xmm */
   for (i = 0; i < NUM; i++)
diff --git a/gcc/testsuite/gcc.target/i386/xop-vshift-1.c 
b/gcc/testsuite/gcc.target/i386/xop-vshift-1.c
index ee3d2990390ab..f3713e8ffc656 100644
--- a/gcc/testsuite/gcc.target/i386/xop-vshift-1.c
+++ b/gcc/testsuite/gcc.target/i386/xop-vshift-1.c
@@ -19,11 +19,6 @@
 #define TYPE2 long long
 #endif
 
-/* mingw runtime don't provide random().  */
-#ifdef __MINGW32__
-#define random rand
-#endif
-
 signed TYPE1 a[N], b[N], g[N];
 unsigned TYPE1 c[N], h[N];
 signed TYPE2 d[N], e[N], j[N];
@@ -108,10 +103,10 @@ TEST ()
   for (i = 0; i < N; i++)
 {
   asm ("");
-  c[i] = (random () << 1) | (random () & 1);
+  c[i] = (rand () << 1) | (rand () & 1);
   b[i] = (i * 85) & (sizeof (TYPE1) * __CHAR_BIT__ - 1);
   a[i] = c[i];
-  d[i] = (random () << 1) | (random () & 1);
+  d[i] = (rand () << 1) | (rand () & 1);
   d[i] |= (unsigned long long) c[i] << 32;
   e[i] = (i * 85) & (sizeof (TYPE2) * __CHAR_BIT__ - 1);
   f[i] = d[i];
diff --git a/gcc/testsuite/gcc.target/powerpc/direct-move.h 
b/gcc/testsuite/gcc.target/powerpc/direct-move.h
index e29f10f5cd54f..80874adcea1ae 100644
--- a/gcc/testsuite/gcc.target/powerpc/direct-move.h
+++ b/gcc/testsuite/gcc.target/powerpc/direct-move.h
@@ -179,7 +179,7 @@ main (void)
   for (j = 0; j < 10; j++)
 {
   for (i = 0; i < sizeof (TYPE); i++)
-   u.bytes[i] = (unsigned char) (random () >> 4);
+   u.bytes[i] = (unsigned char) (rand () >> 4);
 
   test_value (u.value);
 }

-- 
Alexandre Oliva, freedom fighter  he/him   https://FSFLA.org/blogs/lxo
Be the change, be Free!

[committed] Support map + use_device_{addr,ptr} clauses for the same var on the same construct

2019-08-07 Thread Jakub Jelinek
Hi!

OpenMP 5.0 also newly says:
"If one or more of the use_device_ptr or use_device_addr clauses and one or 
more map
clauses are present on the same construct, the address conversions of 
use_device_addr and
use_device_ptr clauses will occur as if performed after all variables are 
mapped according to
those map clauses."
Before this rule, one had to use a separate target data or target enter data
to map the variables first before it was possible to use use_device_* on
another target data.

The patch allows such cases in the FEs, sorts the use_device_* clauses last
in the gimplifier, ensures omp lowering doesn't ICE on that and finally in
libgomp arranges that if the mapping clauses really precede all use_device_*
clauses and something has not been already found during the map clause
processing, we defer the use_device_* lookups until after the new variables
are mapped too.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2019-08-08  Jakub Jelinek  

* gimplify.c (omp_add_variable): Use GOVD_PRIVATE | GOVD_EXPLICIT
for VLA helper variables on target data even if not GOVD_FIRSTPRIVATE.
(gimplify_scan_omp_clauses): For OMP_CLAUSE_USE_DEVICE_* use just
GOVD_EXPLICIT flags.
(gimplify_omp_workshare): For OMP_TARGET_DATA move all
OMP_CLAUSE_USE_DEVICE_* clauses to the end of clauses chain.
* omp-low.c (scan_sharing_clauses): For OMP_CLAUSE_USE_DEVICE_*
call install_var_field with mask 11 instead of 3.
(lower_omp_target): For OMP_CLAUSE_USE_DEVICE_* use pass
(splay_tree_key) &DECL_UID (var) to build_sender_ref instead of var.
gcc/c/
* c-typeck.c (c_finish_omp_clauses): For C_ORT_OMP
OMP_CLAUSE_USE_DEVICE_* clauses use oacc_reduction_head bitmap
instead of generic_head to track duplicates.
gcc/cp/
* semantics.c (finish_omp_clauses): For C_ORT_OMP
OMP_CLAUSE_USE_DEVICE_* clauses use oacc_reduction_head bitmap
instead of generic_head to track duplicates.
libgomp/
* target.c (gomp_map_vars_internal): For GOMP_MAP_USE_DEVICE_PTR
perform the lookup in the first loop only if !not_found_cnt, otherwise
perform lookups for it in the second loop guarded with
if (not_found_cnt || has_firstprivate).
* testsuite/libgomp.c/target-37.c: New test.
* testsuite/libgomp.c++/target-22.C: New test.

--- gcc/gimplify.c.jj   2019-08-07 09:24:35.646096085 +0200
+++ gcc/gimplify.c  2019-08-07 12:32:49.927990656 +0200
@@ -6932,8 +6932,10 @@ omp_add_variable (struct gimplify_omp_ct
nflags = GOVD_MAP | GOVD_MAP_TO_ONLY | GOVD_EXPLICIT;
  else if (flags & GOVD_PRIVATE)
nflags = GOVD_PRIVATE;
- else if ((ctx->region_type & (ORT_TARGET | ORT_TARGET_DATA)) != 0
-  && (flags & GOVD_FIRSTPRIVATE))
+ else if (((ctx->region_type & (ORT_TARGET | ORT_TARGET_DATA)) != 0
+   && (flags & GOVD_FIRSTPRIVATE))
+  || (ctx->region_type == ORT_TARGET_DATA
+  && (flags & GOVD_DATA_SHARE_CLASS) == 0))
nflags = GOVD_PRIVATE | GOVD_EXPLICIT;
  else
nflags = GOVD_FIRSTPRIVATE;
@@ -9016,6 +9018,9 @@ gimplify_scan_omp_clauses (tree *list_p,
 
case OMP_CLAUSE_USE_DEVICE_PTR:
case OMP_CLAUSE_USE_DEVICE_ADDR:
+ flags = GOVD_EXPLICIT;
+ goto do_add;
+
case OMP_CLAUSE_IS_DEVICE_PTR:
  flags = GOVD_FIRSTPRIVATE | GOVD_EXPLICIT;
  goto do_add;
@@ -12404,8 +12409,27 @@ gimplify_omp_workshare (tree *expr_p, gi
  OMP_CLAUSES (expr));
   break;
 case OMP_TARGET_DATA:
-  stmt = gimple_build_omp_target (body, GF_OMP_TARGET_KIND_DATA,
- OMP_CLAUSES (expr));
+  /* Put use_device_{ptr,addr} clauses last, as map clauses are supposed
+to be evaluated before the use_device_{ptr,addr} clauses if they
+refer to the same variables.  */
+  {
+   tree use_device_clauses;
+   tree *pc, *uc = &use_device_clauses;
+   for (pc = &OMP_CLAUSES (expr); *pc; )
+ if (OMP_CLAUSE_CODE (*pc) == OMP_CLAUSE_USE_DEVICE_PTR
+ || OMP_CLAUSE_CODE (*pc) == OMP_CLAUSE_USE_DEVICE_ADDR)
+   {
+ *uc = *pc;
+ *pc = OMP_CLAUSE_CHAIN (*pc);
+ uc = &OMP_CLAUSE_CHAIN (*uc);
+   }
+ else
+   pc = &OMP_CLAUSE_CHAIN (*pc);
+   *uc = NULL_TREE;
+   *pc = use_device_clauses;
+   stmt = gimple_build_omp_target (body, GF_OMP_TARGET_KIND_DATA,
+   OMP_CLAUSES (expr));
+  }
   break;
 case OMP_TEAMS:
   stmt = gimple_build_omp_teams (body, OMP_CLAUSES (expr));
--- gcc/omp-low.c.jj2019-08-07 09:24:35.647096069 +0200
+++ gcc/omp-low.c   2019-08-07 11:12:44.137779196 +0200
@@ -1243,9 +1243,9 @@ scan_sharing_clauses (tree clauses, omp_
  if ((