Re: [PATCH][Tree-optimization/PR89772]fold memchr builtins for character not in constant nul-padded string

2019-05-09 Thread JunMa

在 2019/5/9 上午10:22, JunMa 写道:

在 2019/5/9 上午3:02, Bernd Edlinger 写道:

On 5/8/19 4:31 PM, Richard Biener wrote:

On Tue, May 7, 2019 at 4:34 AM JunMa  wrote:

在 2019/5/6 下午7:58, JunMa 写道:

在 2019/5/6 下午6:02, Richard Biener 写道:
On Thu, Mar 21, 2019 at 5:57 AM JunMa  
wrote:

Hi
For now, GCC cannot fold code like:

const char a[5] = "123";
__builtin_memchr (a, '7', sizeof a)

It tries to avoid folding past the string length although the length of a
is 5.  This is a bit conservative: it is safe to fold memchr/bcmp/memcmp
builtins when a constant string is stored in an array with some trailing
nuls.


This patch folds these cases by exposing additional length of
trailing nuls in c_getstr().
Bootstrapped/regtested on x86_64-linux, ok for trunk?
It's probably better if gimple_fold_builtin_memchr uses string_constant
directly instead?
We can use string_constant in gimple_fold_builtin_memchr; however, it is a
bit complex to use it in memchr/memcmp constant folding.

You are changing memcmp constant folding but only have a testcase
for memchr.

I'll add later.

If we decide to amend c_getstr I would rather make it return the
total length instead of the number of trailing zeros.

I think returning the total length is safe in other places as well.
I used the extra argument in the patch since I did not want any impact on
other parts at all.


Although it is safe to use the total length, I found that the function
inline_expand_builtin_string_cmp(), which uses c_getstr(), may emit
redundant RTL for trailing nul characters when the total length is
returned.

Also, it is trivial to handle constant strings with trailing nul characters.

So this updated patch follows Richard's suggestion: using string_constant
directly.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

Doesn't this fold to NULL for the case where we search for '\0' and it
doesn't occur in the string constant but only in the zero-padding?
So you'd need to conditionalize on c being non-zero (or handle
that case specially by actually finding the first zero in the padding)?
I think there was work to have all string constants zero terminated
but I'm not sure if we can rely on that here.  Bernd?

It turned out that there is a number of languages that don't have
zero-terminated strings by default, which would have complicated the patch
just too much for too little benefit.

In the end, it was more important to guarantee that
mem_size >= string_length holds.


The string_length returned from c_getstr() is equal to mem_size when
string_length > string_size, so I'll add an assert in the patch.

In C it is just a convention that string constants usually have a
zero-termination, but even in C there are ways in which string constants
can end up not zero-terminated.

There can in theory be optimizations that introduce non-zero-terminated
strings, like tree-ssa-forwprop.c, where a non-zero-terminated string
constant is folded in simplify_builtin_call.

In such a case, c_getstr might in theory return a string without
zero-termination, but I think it will be rather difficult to find a C test
case for that.

Well, if I had a test case for that I would probably fix it in c_getstr to
consider the implicit padding as equivalent to an explicit zero-termination.



c_getstr() makes sure the returned string has a zero-termination when the
2nd argument is NULL, but not when string_length is returned.  I think it
is the caller's responsibility to take care of both the returned string and
the length.





Updated the patch with an assert checking the condition
string_length <= string_size.  FYI.

Also Bootstrapped/regtested on x86_64-linux.

Regards
JunMa



Thanks
JunMa




Bernd.



Richard.


Regards
JunMa

gcc/ChangeLog

2019-05-07  Jun Ma 

	PR tree-optimization/89772
	* gimple-fold.c (gimple_fold_builtin_memchr): Consider trailing
	nuls in out-of-bound access checking.

gcc/testsuite/ChangeLog

2019-05-07  Jun Ma 

	PR tree-optimization/89772
  * gcc.dg/builtin-memchr-4.c: New test.

Thanks
JunMa

Richard.


Regards
JunMa


gcc/ChangeLog

2019-03-21  Jun Ma 

	PR tree-optimization/89772
	* fold-const.c (c_getstr): Add new parameter to get the length
	of additional trailing nuls after the constant string.
	* gimple-fold.c (gimple_fold_builtin_memchr): Consider trailing
	nuls in out-of-bound access checking.
	* fold-const-call.c (fold_const_call): Likewise.


gcc/testsuite/ChangeLog

2019-03-21  Jun Ma 

	PR tree-optimization/89772
   * gcc.dg/builtin-memchr-4.c: New test.




---
 gcc/gimple-fold.c   | 10 +-
 gcc/testsuite/gcc.dg/builtin-memchr-4.c | 30 ++
 2 files changed, 39 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/builtin-memchr-4.c

diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index 1b10bae..c4fd5b1 100644
--- a/gcc/gimple-fold.c
+++ b/gcc/gimple-fold.c
@@ -2557,7 +2557,15 @@ gimple_fold_builtin_memchr (gimple_stmt_iterator *gsi)
   const char *r = (const cha

Re: [PATCH][AArch64] Emit TARGET_DOTPROD-specific sequence for sadv16qi

2019-05-09 Thread Richard Sandiford
Kyrill Tkachov  writes:
> +;; Helper expander for aarch64_abd_3 to save the callers
> +;; the hassle of constructing the other arm of the MINUS.
> +(define_expand "abd_3"
> +  [(use (match_operand:VDQ_BHSI 0 "register_operand"))
> +   (USMAX:VDQ_BHSI (match_operand:VDQ_BHSI 1 "register_operand")
> +(match_operand:VDQ_BHSI 2 "register_operand"))]
> +  "TARGET_SIMD"
> +  {
> +rtx other_arm
> +  = gen_rtx_ (mode, operands[1], operands[2]);
> +emit_insn (gen_aarch64_abd_3 (operands[0], operands[1],
> +operands[2], other_arm));

Should be indented to the innermost "(" instead.

LGTM otherwise, but an AArch64 maintainer should have the final say.

Thanks,
Richard


[v3 PATCH] Inconsistency wrt Allocators in basic_string assignment vs. basic_string::assign (LWG2579)

2019-05-09 Thread Nina Dinka Ranns
Tested on Linux x86_64
Inconsistency wrt Allocators in basic_string assignment vs.
basic_string::assign (LWG2579)

2019-05-09  Nina Dinka Ranns  
Inconsistency wrt Allocators in basic_string assignment vs.
basic_string::assign (LWG2579)
* include/bits/basic_string.h (operator=(const basic_string&)):
Move allocator propagation decision from operator= to assign.
(assign(const basic_string&)): Likewise.
* testsuite/21_strings/basic_string/allocator/char/copy_assign.cc:
Add tests.
* testsuite/21_strings/basic_string/allocator/wchar_t/copy_assign.cc:
Add tests.
Index: libstdc++-v3/include/bits/basic_string.h
===
--- libstdc++-v3/include/bits/basic_string.h	(revision 270943)
+++ libstdc++-v3/include/bits/basic_string.h	(working copy)
@@ -664,35 +664,6 @@
   basic_string&
   operator=(const basic_string& __str)
   {
-#if __cplusplus >= 201103L
-	if (_Alloc_traits::_S_propagate_on_copy_assign())
-	  {
-	if (!_Alloc_traits::_S_always_equal() && !_M_is_local()
-		&& _M_get_allocator() != __str._M_get_allocator())
-	  {
-		// Propagating allocator cannot free existing storage so must
-		// deallocate it before replacing current allocator.
-		if (__str.size() <= _S_local_capacity)
-		  {
-		_M_destroy(_M_allocated_capacity);
-		_M_data(_M_local_data());
-		_M_set_length(0);
-		  }
-		else
-		  {
-		const auto __len = __str.size();
-		auto __alloc = __str._M_get_allocator();
-		// If this allocation throws there are no effects:
-		auto __ptr = _Alloc_traits::allocate(__alloc, __len + 1);
-		_M_destroy(_M_allocated_capacity);
-		_M_data(__ptr);
-		_M_capacity(__len);
-		_M_set_length(__len);
-		  }
-	  }
-	std::__alloc_on_copy(_M_get_allocator(), __str._M_get_allocator());
-	  }
-#endif
 	return this->assign(__str);
   }
 
@@ -1363,6 +1334,35 @@
   basic_string&
   assign(const basic_string& __str)
   {
+#if __cplusplus >= 201103L
+	if (_Alloc_traits::_S_propagate_on_copy_assign())
+	  {
+	if (!_Alloc_traits::_S_always_equal() && !_M_is_local()
+		&& _M_get_allocator() != __str._M_get_allocator())
+	  {
+		// Propagating allocator cannot free existing storage so must
+		// deallocate it before replacing current allocator.
+		if (__str.size() <= _S_local_capacity)
+		  {
+		_M_destroy(_M_allocated_capacity);
+		_M_data(_M_local_data());
+		_M_set_length(0);
+		  }
+		else
+		  {
+		const auto __len = __str.size();
+		auto __alloc = __str._M_get_allocator();
+		// If this allocation throws there are no effects:
+		auto __ptr = _Alloc_traits::allocate(__alloc, __len + 1);
+		_M_destroy(_M_allocated_capacity);
+		_M_data(__ptr);
+		_M_capacity(__len);
+		_M_set_length(__len);
+		  }
+	  }
+	std::__alloc_on_copy(_M_get_allocator(), __str._M_get_allocator());
+	  }
+#endif
 	this->_M_assign(__str);
 	return *this;
   }
Index: libstdc++-v3/testsuite/21_strings/basic_string/allocator/char/copy_assign.cc
===
--- libstdc++-v3/testsuite/21_strings/basic_string/allocator/char/copy_assign.cc	(revision 270943)
+++ libstdc++-v3/testsuite/21_strings/basic_string/allocator/char/copy_assign.cc	(working copy)
@@ -133,10 +133,47 @@
   VERIFY( v1.get_allocator() == a2 );
 }
 
+void test04()
+{
+  // LWG2579
+  typedef propagating_allocator alloc_type;
+
+  typedef std::basic_string test_type;
+
+  test_type v1("tralalala",alloc_type(1));
+  test_type v2("content", alloc_type(2));
+  test_type v3("content2", alloc_type(3));
+
+  v1.assign(v2);
+  v3 = v2;
+  VERIFY(2 == v1.get_allocator().get_personality());
+  VERIFY(2 == v3.get_allocator().get_personality());
+
+}
+
+void test05()
+{
+  // LWG2579
+  typedef propagating_allocator alloc_type;
+
+  typedef std::basic_string test_type;
+
+  test_type v1("tralalala",alloc_type(1));
+  test_type v2("content", alloc_type(2));
+  test_type v3("content2", alloc_type(3));
+
+  v1.assign(v2);
+  v3 = v2;
+  VERIFY(1 == v1.get_allocator().get_personality());
+  VERIFY(3 == v3.get_allocator().get_personality());
+}
+
 int main()
 {
   test01();
   test02();
   test03();
+  test04();
+  test05();
   return 0;
 }
Index: libstdc++-v3/testsuite/21_strings/basic_string/allocator/wchar_t/copy_assign.cc
===
--- libstdc++-v3/testsuite/21_strings/basic_string/allocator/wchar_t/copy_assign.cc	(revision 270943)
+++ libstdc++-v3/testsuite/21_strings/basic_string/allocator/wchar_t/copy_assign.cc	(working copy)
@@ -133,10 +133,46 @@
   VERIFY( v1.get_allocator() == a2 );
 }
 
+void test04()
+{
+  // LWG2579
+  typedef propagating_allocator alloc_type;
+
+  typedef std::basic_string test_type;
+
+  test_type v1(L"tralalala",alloc_type(1));
+  test_type v2(L"content", alloc_type(2));
+  tes

[committed] [PR89221] Continue to default to '--disable-frame-pointer' for x86 GNU systems (was: [PATCH v2, i386]: Fix PR89221, --enable-frame-pointer does not work as intended)

2019-05-09 Thread Thomas Schwinge
Hi!

On Sun, 10 Feb 2019 20:51:39 +0100, Uros Bizjak  wrote:
> On Fri, Feb 8, 2019 at 1:24 PM Uros Bizjak  wrote:
> > Attached patch fixes --enable-frame-pointer handling [...]

ACK.

> Please note that this fix will re-enable frame pointer for all targets
> but linux* or darwin[[8912]]. However, since builds for e.g. cygwin
> and mingw survived just well without frame pointers in the mean time,
> we should probably list these targets as targets without frame
> pointers by default.

I agree, this would cause the least surprise, to simply keep the previous
default of '--disable-frame-pointer'.  Until such a global change is
agreed on, and made...

> Maintainers should decide.

The GNU/Hurd maintainer has decided (and also made a decision for
GNU/k*BSD); committed the attached to trunk in r271028.


Grüße
 Thomas


From 679a49952fe543043047267cc8500c5cc1065b7b Mon Sep 17 00:00:00 2001
From: tschwinge 
Date: Thu, 9 May 2019 09:51:59 +
Subject: [PATCH] [PR89221] Continue to default to '--disable-frame-pointer'
 for x86 GNU systems

The recent trunk r270914 for PR89221 "--enable-frame-pointer does not work as
intended" fixed a scripting defect in the x86 '--enable-frame-pointer'
handling.

This has the side effect that, for example, for '--target=i686-gnu' this is now
enabled by default: 'USE_IX86_FRAME_POINTER=1' is added to 'tm_defines'.  Given
that it's highly unlikely that anyone would now suddenly want
'--enable-frame-pointer' as the default for any kind of GNU system, I'm
changing the default back for GNU systems, to match that of a 'target_os' of
'linux* | darwin[8912]*'.

	gcc/
	PR target/89221
	* configure.ac (--enable-frame-pointer): Disable by default for
	GNU systems.
	* configure: Regenerate.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@271028 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog| 7 +++
 gcc/configure| 4 ++--
 gcc/configure.ac | 4 ++--
 3 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 37447f854c0..ea96146b5b8 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,10 @@
+2019-05-09  Thomas Schwinge  
+
+	PR target/89221
+	* configure.ac (--enable-frame-pointer): Disable by default for
+	GNU systems.
+	* configure: Regenerate.
+
 2019-05-09  Alan Modra  
 
 	PR target/89271
diff --git a/gcc/configure b/gcc/configure
index 08cce6f1980..947d263a617 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -12197,8 +12197,8 @@ if test "${enable_frame_pointer+set}" = set; then :
 else
 
 case $target_os in
-linux* | darwin[8912]*)
-  # Enable -fomit-frame-pointer by default for Linux and Darwin with DWARF2.
+linux* | gnu* | darwin[8912]*)
+  # Enable -fomit-frame-pointer by default for these systems with DWARF2.
   enable_frame_pointer=no
   ;;
 *)
diff --git a/gcc/configure.ac b/gcc/configure.ac
index 7c526b9ada4..bfcdf526e44 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -1884,8 +1884,8 @@ AC_ARG_ENABLE(frame-pointer,
 		[enable -fno-omit-frame-pointer by default for x86])], [],
 [
 case $target_os in
-linux* | darwin[[8912]]*)
-  # Enable -fomit-frame-pointer by default for Linux and Darwin with DWARF2.
+linux* | gnu* | darwin[[8912]]*)
+  # Enable -fomit-frame-pointer by default for these systems with DWARF2.
   enable_frame_pointer=no
   ;;
 *)
-- 
2.17.1



signature.asc
Description: PGP signature


[committed] Clean up MPX-related stuff: CIF_CHKP (was: [PATCH] Clean up another MPX-related stuff.)

2019-05-09 Thread Thomas Schwinge
Hi!

On Wed, 13 Feb 2019 14:47:36 +0100, Richard Biener  
wrote:
> On February 13, 2019 6:53:17 AM GMT+01:00, "Martin Liška"  
> wrote:
> >As Honza noticed, there's still some leftover from MPX removal.
> >May I remove another bunch of fields now, or should I wait
> >for next stage1?
> 
> You can do it now. 

I recently stumbled across an additional leftover piece:

> 2019-02-13  Martin Liska  

>   * ipa-fnsummary.c (compute_fn_summary): Likewise.

| --- a/gcc/ipa-fnsummary.c
| +++ b/gcc/ipa-fnsummary.c
| @@ -2449,13 +2449,7 @@ compute_fn_summary (struct cgraph_node *node, bool 
early)
|info->account_size_time (2 * ipa_fn_summary::size_scale, 0, t, t);
|ipa_update_overall_fn_summary (node);
|info->self_size = info->size;
| -  /* We cannot inline instrumentation clones.  */
| -  if (node->thunk.add_pointer_bounds_args)
| - {
| -  info->inlinable = false;
| -  node->callees->inline_failed = CIF_CHKP;
| - }
| -  else if (stdarg_p (TREE_TYPE (node->decl)))
| +  if (stdarg_p (TREE_TYPE (node->decl)))
|   {

This removed the (only) user of 'CIF_CHKP', but didn't remove its
definition.  (Probably because of that one going by the un-prefixed
short-hand name of 'CHKP'?)  As obvious, now cleaned up on trunk in
r271029, and on gcc-9-branch in r271030, see attached.


Grüße
 Thomas


From 3f81dbf1e299a0966fbc1874255ef8f2ec03bc49 Mon Sep 17 00:00:00 2001
From: tschwinge 
Date: Thu, 9 May 2019 09:52:10 +
Subject: [PATCH] Clean up MPX-related stuff: CIF_CHKP

..., which was forgotten in recent r268844.

	gcc/
	* cif-code.def (CHKP): Remove.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@271029 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog| 2 ++
 gcc/cif-code.def | 4 
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index ea96146b5b8..7df4ae671de 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,7 @@
 2019-05-09  Thomas Schwinge  
 
+	* cif-code.def (CHKP): Remove.
+
 	PR target/89221
 	* configure.ac (--enable-frame-pointer): Disable by default for
 	GNU systems.
diff --git a/gcc/cif-code.def b/gcc/cif-code.def
index 0fabfebca1c..cee16cf7cb7 100644
--- a/gcc/cif-code.def
+++ b/gcc/cif-code.def
@@ -141,7 +141,3 @@ DEFCIFCODE(EXTERN_LIVE_ONLY_STATIC, CIF_FINAL_ERROR,
 /* We proved that the call is unreachable.  */
 DEFCIFCODE(UNREACHABLE, CIF_FINAL_ERROR,
 	   N_("unreachable"))
-
-/* We can't inline because of instrumentation thunk.  */
-DEFCIFCODE(CHKP, CIF_FINAL_ERROR,
-	   N_("caller is instrumentation thunk"))
-- 
2.17.1

From 3380d3e1bc51edb2d3b6baf6e213d699045f6d3a Mon Sep 17 00:00:00 2001
From: tschwinge 
Date: Thu, 9 May 2019 09:52:53 +
Subject: [PATCH] Clean up MPX-related stuff: CIF_CHKP

..., which was forgotten in recent r268844.

	gcc/
	* cif-code.def (CHKP): Remove.

trunk r271029

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gcc-9-branch@271030 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog| 4 
 gcc/cif-code.def | 4 
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index b7791f10e3c..3d66d8e4df4 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,7 @@
+2019-05-09  Thomas Schwinge  
+
+	* cif-code.def (CHKP): Remove.
+
 2019-05-07  Kelvin Nilsen  
 
 	Backport from mainline.
diff --git a/gcc/cif-code.def b/gcc/cif-code.def
index 0fabfebca1c..cee16cf7cb7 100644
--- a/gcc/cif-code.def
+++ b/gcc/cif-code.def
@@ -141,7 +141,3 @@ DEFCIFCODE(EXTERN_LIVE_ONLY_STATIC, CIF_FINAL_ERROR,
 /* We proved that the call is unreachable.  */
 DEFCIFCODE(UNREACHABLE, CIF_FINAL_ERROR,
 	   N_("unreachable"))
-
-/* We can't inline because of instrumentation thunk.  */
-DEFCIFCODE(CHKP, CIF_FINAL_ERROR,
-	   N_("caller is instrumentation thunk"))
-- 
2.17.1



signature.asc
Description: PGP signature


Re: [PATCH][stage1] Support profile (BB counts and edge probabilities) in GIMPLE FE.

2019-05-09 Thread Martin Liška
On 5/7/19 4:44 PM, Richard Biener wrote:
> On May 7, 2019 4:33:08 PM GMT+02:00, "Martin Liška"  wrote:
>> On 5/7/19 2:56 PM, Richard Biener wrote:
>>> But that can use the existing get_hot_bb_threshold since we never
>> want
>>> to dump -1 in case min_count was never initialized.
>>
>> Yes. But the function will call:
>>
>> get_hot_bb_threshold ()
>> {
>>  if (min_count == -1)
>>{
>> gcov_type t = profile_info->sum_max / PARAM_VALUE
>> (HOT_BB_COUNT_FRACTION);
>>  set_hot_bb_threshold (t);
>> ...
>>
>> which will cause a segfault in non-PGO run. Note that:
>> static gcov_type min_count = -1;
>>
>> is a non-exported variable so that's why I simply added the getter.
>>
>> Hope it's fine as is?
> 
> Oh, I see. Hmm, so we should get away with no min_count when all counter 
> kinds are non-PGO? 

Yes, in that situation we don't want to print:
/* --param=gimple-fe-computed-hot-bb-threshold=xyz */

Martin

> 
> Richard. 
> 
>>
>> Martin
> 



Re: [committed] Clean up MPX-related stuff: CIF_CHKP

2019-05-09 Thread Martin Liška
On 5/9/19 11:59 AM, Thomas Schwinge wrote:
> Hi!
> 
> On Wed, 13 Feb 2019 14:47:36 +0100, Richard Biener 
>  wrote:
>> On February 13, 2019 6:53:17 AM GMT+01:00, "Martin Liška"  
>> wrote:
>>> As Honza noticed, there's still some leftover from MPX removal.
>>> May I remove another bunch of fields now, or should I wait
>>> for next stage1?
>>
>> You can do it now. 
> 
> I recently stumbled across an additional leftover piece:
> 
>> 2019-02-13  Martin Liska  
> 
>>  * ipa-fnsummary.c (compute_fn_summary): Likewise.
> 
> | --- a/gcc/ipa-fnsummary.c
> | +++ b/gcc/ipa-fnsummary.c
> | @@ -2449,13 +2449,7 @@ compute_fn_summary (struct cgraph_node *node, bool 
> early)
> |info->account_size_time (2 * ipa_fn_summary::size_scale, 0, t, t);
> |ipa_update_overall_fn_summary (node);
> |info->self_size = info->size;
> | -  /* We cannot inline instrumentation clones.  */
> | -  if (node->thunk.add_pointer_bounds_args)
> | -   {
> | -  info->inlinable = false;
> | -  node->callees->inline_failed = CIF_CHKP;
> | -   }
> | -  else if (stdarg_p (TREE_TYPE (node->decl)))
> | +  if (stdarg_p (TREE_TYPE (node->decl)))
> | {
> 
> This removed the (only) user of 'CIF_CHKP', but didn't remove its
> definition.  (Probably because of that one going by the un-prefixed
> short-hand name of 'CHKP'?)  As obvious, now cleaned up on trunk in
> r271029, and on gcc-9-branch in r271030, see attached.
> 
> 
> Grüße
>  Thomas
> 
> 

Hi.

Thanks for the patch, it's obvious to me.

Martin


[PATCH] Fix Pr90395

2019-05-09 Thread Richard Biener


Bootstrapped/tested on x86_64-unknown-linux-gnu, applied.

Richard.

2019-05-09  Richard Biener  

PR tree-optimization/90395
* tree-ssa-forwprop.c (pass_forwprop::execute): Do not
rewrite vector stores that throw internally.

* gcc.dg/torture/pr90395.c: New testcase.

Index: gcc/tree-ssa-forwprop.c
===
--- gcc/tree-ssa-forwprop.c (revision 271000)
+++ gcc/tree-ssa-forwprop.c (working copy)
@@ -2570,6 +2570,7 @@ pass_forwprop::execute (function *fun)
  if (single_imm_use (lhs, &use_p, &use_stmt)
  && gimple_store_p (use_stmt)
  && !gimple_has_volatile_ops (use_stmt)
+ && !stmt_can_throw_internal (cfun, use_stmt)
  && is_gimple_assign (use_stmt)
  && (TREE_CODE (gimple_assign_lhs (use_stmt))
  != TARGET_MEM_REF))
Index: gcc/testsuite/gcc.dg/torture/pr90395.c
===
--- gcc/testsuite/gcc.dg/torture/pr90395.c  (nonexistent)
+++ gcc/testsuite/gcc.dg/torture/pr90395.c  (working copy)
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-fexceptions -fnon-call-exceptions" } */
+
+typedef int v16si __attribute__ ((__vector_size__ (64)));
+
+void
+rl (int uq)
+{
+  v16si qw[1];
+
+  qw[uq] = (v16si) { uq };
+}


[PATCH] Fix location where lto-dump is installed.

2019-05-09 Thread Martin Liška
Hi.

The patch is about proper install location of the newly added tool.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

gcc/lto/ChangeLog:

2019-05-09  Martin Liska  

* Make-lang.in: Use program_transform_name for lto-dump.
* config-lang.in: Do not mark lto-dump as a compiler, as we do
not want it installed at
lib/gcc/x86_64-pc-linux-gnu/10.0.0/lto-dump.
---
 gcc/lto/Make-lang.in   | 3 ++-
 gcc/lto/config-lang.in | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)


diff --git a/gcc/lto/Make-lang.in b/gcc/lto/Make-lang.in
index 92487e1f53e..afcfc9893b4 100644
--- a/gcc/lto/Make-lang.in
+++ b/gcc/lto/Make-lang.in
@@ -22,6 +22,7 @@
 # The name of the LTO compiler.
 LTO_EXE = lto1$(exeext)
 LTO_DUMP_EXE = lto-dump$(exeext)
+LTO_DUMP_INSTALL_NAME := $(shell echo lto-dump|sed '$(program_transform_name)')
 # The LTO-specific object files inclued in $(LTO_EXE).
 LTO_OBJS = lto/lto-lang.o lto/lto.o lto/lto-object.o attribs.o lto/lto-partition.o lto/lto-symtab.o lto/lto-common.o
 lto_OBJS = $(LTO_OBJS)
@@ -48,7 +49,7 @@ lto.rest.encap:
 lto.tags:
 lto.install-common: installdirs
 	$(INSTALL_PROGRAM) $(LTO_DUMP_EXE) \
-	$(DESTDIR)/$(bindir)/$(LTO_DUMP_EXE)
+	$(DESTDIR)/$(bindir)/$(LTO_DUMP_INSTALL_NAME)$(exeext)
 
 lto.install-man:
 lto.install-info:
diff --git a/gcc/lto/config-lang.in b/gcc/lto/config-lang.in
index 37c8f6e12b1..c7aa41f076b 100644
--- a/gcc/lto/config-lang.in
+++ b/gcc/lto/config-lang.in
@@ -18,7 +18,7 @@
 # .
 
 language="lto"
-compilers="lto1\$(exeext) lto-dump\$(exeext)"
+compilers="lto1\$(exeext)"
 
 gtfiles="\$(srcdir)/lto/lto-tree.h \$(srcdir)/lto/lto-lang.c \$(srcdir)/lto/lto.c \$(srcdir)/lto/lto.h \$(srcdir)/lto/lto-common.h \$(srcdir)/lto/lto-common.c \$(srcdir)/lto/lto-dump.c"
 



[PATCH] Fix __builtin_init_dwarf_reg_size_table when built with -mfpxx

2019-05-09 Thread Dragan Mladjenovic
From: "Dragan Mladjenovic" 


Hi all,

For TARGET_FLOATXX the odd-numbered FP registers in SFmode are
HARD_REGNO_CALL_PART_CLOBBERED.  This causes dwarf_frame_reg_mode to fall
back to VOIDmode and __builtin_init_dwarf_reg_size_table to record them
as zero sized.

This prevents libgcc's unwinder from ever restoring the high parts of
callee-saved double-precision registers.

This patch fixes the issue by forcing dwarf_frame_reg_mode to use SImode
for FP registers.

Bootstrapped and regression-tested on mipsel-unknown-linux-gnu;
no new failures found.


Best regards,
Dragan


gcc/ChangeLog:

2019-04-23  Dragan Mladjenovic  

	* config/mips/mips.c (mips_dwarf_frame_reg_mode): Replace
	TARGET_FLOAT64 with !TARGET_FLOAT32, thus handling both fp64 and
	fpxx modes.

gcc/testsuite/ChangeLog:

2019-04-23  Dragan Mladjenovic  

  * g++.dg/eh/o32-fp.C: New.
  * gcc.target/mips/dwarfregtable-1.c: New.
  * gcc.target/mips/dwarfregtable-2.c: New.
  * gcc.target/mips/dwarfregtable-3.c: New.
  * gcc.target/mips/dwarfregtable-4.c: New.
  * gcc.target/mips/dwarfregtable.h: New.

---
 gcc/config/mips/mips.c  |  2 +-
 gcc/testsuite/g++.dg/eh/o32-fp.C| 47 +
 gcc/testsuite/gcc.target/mips/dwarfregtable-1.c |  5 +++
 gcc/testsuite/gcc.target/mips/dwarfregtable-2.c |  5 +++
 gcc/testsuite/gcc.target/mips/dwarfregtable-3.c |  5 +++
 gcc/testsuite/gcc.target/mips/dwarfregtable-4.c |  5 +++
 gcc/testsuite/gcc.target/mips/dwarfregtable.h   | 22 
 7 files changed, 90 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/eh/o32-fp.C
 create mode 100644 gcc/testsuite/gcc.target/mips/dwarfregtable-1.c
 create mode 100644 gcc/testsuite/gcc.target/mips/dwarfregtable-2.c
 create mode 100644 gcc/testsuite/gcc.target/mips/dwarfregtable-3.c
 create mode 100644 gcc/testsuite/gcc.target/mips/dwarfregtable-4.c
 create mode 100644 gcc/testsuite/gcc.target/mips/dwarfregtable.h

diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index 1de33b2..c0c995a 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -9577,7 +9577,7 @@ mips_dwarf_frame_reg_mode (int regno)
 {
   machine_mode mode = default_dwarf_frame_reg_mode (regno);
 
-  if (FP_REG_P (regno) && mips_abi == ABI_32 && TARGET_FLOAT64)
+  if (FP_REG_P (regno) && mips_abi == ABI_32 && !TARGET_FLOAT32)
 mode = SImode;
 
   return mode;
diff --git a/gcc/testsuite/g++.dg/eh/o32-fp.C b/gcc/testsuite/g++.dg/eh/o32-fp.C
new file mode 100644
index 000..08fa51b
--- /dev/null
+++ b/gcc/testsuite/g++.dg/eh/o32-fp.C
@@ -0,0 +1,47 @@
+// Test whether call saved float are restored properly for O32 ABI
+// { dg-do run { target { { { mips*-*-linux* } && hard_float } && { ! mips64 } 
} } }
+// { dg-options "-O2" }
+
+void __attribute__((noinline))
+bar (void)
+{
+  throw 1;
+}
+
+void __attribute__((noinline))
+foo (void)
+{
+  register double f20 __asm__ ("f20") = 0.0;
+  register double f22 __asm__ ("f22") = 0.0;
+  register double f24 __asm__ ("f24") = 0.0;
+  register double f26 __asm__ ("f26") = 0.0;
+  register double f28 __asm__ ("f28") = 0.0;
+  register double f30 __asm__ ("f30") = 0.0;
+  __asm__ __volatile__("":"+f"(f20),"+f"(f22),"+f"(f24),"+f"(f26),"+f"(f30));
+  bar ();
+}
+
+int
+main (void)
+{
+  register double f20 __asm__ ("f20") = 12.0;
+  register double f22 __asm__ ("f22") = 13.0;
+  register double f24 __asm__ ("f24") = 14.0;
+  register double f26 __asm__ ("f26") = 15.0;
+  register double f28 __asm__ ("f28") = 16.0;
+  register double f30 __asm__ ("f30") = 17.0;
+
+  try
+{
+  foo ();
+}
+  catch (...)
+{
+  __asm__ ("":"+f"(f20),"+f"(f22),"+f"(f24),"+f"(f26),"+f"(f30));
+}
+
+  if (f20 != 12.0 || f22 != 13.0 || f24 != 14.0
+  || f26 != 15.0 || f28 != 16.0 || f30 != 17.0)
+__builtin_abort ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/mips/dwarfregtable-1.c 
b/gcc/testsuite/gcc.target/mips/dwarfregtable-1.c
new file mode 100644
index 000..93d0844
--- /dev/null
+++ b/gcc/testsuite/gcc.target/mips/dwarfregtable-1.c
@@ -0,0 +1,5 @@
+/* Check if content of dwarf reg size table matches the expected.  */
+/* { dg-do run } */
+/* { dg-options "-mabi=32 -mfp32" } */
+
+#include "dwarfregtable.h"
diff --git a/gcc/testsuite/gcc.target/mips/dwarfregtable-2.c 
b/gcc/testsuite/gcc.target/mips/dwarfregtable-2.c
new file mode 100644
index 000..c6dea94
--- /dev/null
+++ b/gcc/testsuite/gcc.target/mips/dwarfregtable-2.c
@@ -0,0 +1,5 @@
+/* Check if content of dwarf reg size table matches the expected.  */
+/* { dg-do run } */
+/* { dg-options "-mabi=32 -mfpxx" } */
+
+#include "dwarfregtable.h"
diff --git a/gcc/testsuite/gcc.target/mips/dwarfregtable-3.c 
b/gcc/testsuite/gcc.target/mips/dwarfregtable-3.c
new file mode 100644
index 000..87937c4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/mips/dwarfregtable-3.c
@@ -0,0 +1,5 @@
+/* Check if content of dwarf reg size table matches the expected.  */
+/* { dg-do run } */
+/* { dg-o

30_threads/thread/native_handle/typesizes.cc is no good

2019-05-09 Thread Alexandre Oliva
Other classes that have native_handle/typesizes.cc tests have
native_handle_type defined as a pointer type, and the pointed-to type is
native_handle, the type of the single data member of the class (*).  It
thus makes sense to check that the single data member and the enclosing
class type have the same size and alignment: a pointer to the enclosing
type should also be a valid pointer to its single member.

(*) this single data member is actually inherited or enclosed in another
class, but let's not get distracted by this irrelevant detail.

std::thread, however, does not follow this pattern.  Not because the
single data member is defined as a direct data member of a nested class,
rather than inherited from another class.  The key difference is that it
does not use native_handle_type's pointed-to type as the single data
member; it rather uses native_handle_type directly as the type of the
single data member.

On GNU/Linux, and presumably on most platforms, native_handle_type =
__gthread_t is not even a pointer type.  This key difference would have
been obvious if remove_pointer rejected non-pointer types, but that's
not the case.  When given __gthread_t -> pthread_t -> unsigned long int,
remove_pointer<T>::type is simply T, so we get unsigned long int back.  The
test works because there's no pointer type to strip off.

However, on a platform that defines __gthread_t as a pointer type, we
use that pointer type as native_handle_type and as the type for the
single data member of std::thread.  But then, the test compares size and
alignment of that type with those of the type obtained by removing one
indirection level.  We're comparing properties of different, unrelated
types.  Unless the pointed-to type happens to have, by chance, the size
and alignment of a pointer, the test will fail.


IOW, the test is no good: it's not testing what it should, and wherever
it passes, it's by accident.

In order to test what it should, we'd need to use an alternate test
function that does not strip off one indirection level from
native_handle_type, if the test is to remain.

Should I introduce such an alternate test function and adjust the test,
or just remove the broken test?

Or should we change std::thread::native_handle_type to __gthread_t*,
while keeping unchanged the type of the single data member
std::thread::id::_M_thread?

Thanks in advance,

-- 
Alexandre Oliva, freedom fighter  he/him   https://FSFLA.org/blogs/lxo
Be the change, be Free! FSF Latin America board member
GNU Toolchain EngineerFree Software Evangelist
Hay que enGNUrecerse, pero sin perder la terGNUra jamás - Che GNUevara


[RFC] SHIFT_COUNT_TRUNCATED and shift_truncation_mask

2019-05-09 Thread Robin Dapp
Hi,

while trying to improve s390 code generation for rotate and shift I
noticed superfluous subregs for shift count operands. In our backend we
already have quite cumbersome patterns that would need to be duplicated
(or complicated further by more subst patterns) in order to get rid of
the subregs.

I had already finished all the patterns when I realized that
SHIFT_COUNT_TRUNCATED and the target hook shift_truncation_mask already
exist and could do what is needed without extra patterns.  Just defining
shift_truncation_mask was not enough, though, as most of the additional
insns get introduced by combine.

Even SHIFT_COUNT_TRUNCATED is not a perfect match for what our hardware
does, because we only ever consider the last 6 bits of a shift operand.

Despite all the warnings in the other backends, most notably
SHIFT_COUNT_TRUNCATED being "discouraged" as mentioned in riscv.h, I
wrote the attached tentative patch.  It's a little ad-hoc, uses the
SHIFT_COUNT_TRUNCATED paths only if shift_truncation_mask () != 0 and,
instead of truncating via & (GET_MODE_BITSIZE (mode) - 1), it applies
the mask returned by shift_truncation_mask.  Doing so, the usage of both
"methods" actually reduces to a single one.

I assume both were originally intended for different purposes but
without knowing the history the separation seems artificial to me.  A
quick look at other backends showed that at least some (e.g. ARM) do not
use SHIFT_COUNT_TRUNCATED because the general behavior is not
fine-grained enough, e.g. the masks for shift and rotate differ.

While the attached patch will probably work for s390, it probably will
not for other targets.  In addition to what my patch does, would it be
useful to unify both truncation methods in a target hook that takes the
operation (shift, rotate, zero_extract, ...) as well as the mode as
arguments?  Thus, we would let the target decide what to do with the
specific combination of both.  Maybe this would also allow us to
distinguish bit-test operations from the rest.

--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -251,7 +251,7 @@ default_unwind_word_mode (void)
 /* The default implementation of TARGET_SHIFT_TRUNCATION_MASK.  */

 unsigned HOST_WIDE_INT
-default_shift_truncation_mask (machine_mode mode)
+default_shift_truncation_mask (enum rtx_code code, machine_mode mode)
 {
   [...]

Of course, when getting everything right in the backend, there will be
no difference in result, but in my experience it's easily possible to
forget a subreg/... somewhere and end up with worse code by accident.
Maybe there is another reason why SHIFT_COUNT_TRUNCATED is discouraged
that I missed entirely?

Regards
 Robin
diff --git a/gcc/combine.c b/gcc/combine.c
index 91e32c88c88..d2a659f929b 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -6445,14 +6445,12 @@ combine_simplify_rtx (rtx x, machine_mode op0_mode, int in_dest,
 	return simplify_shift_const (x, code, mode, XEXP (x, 0),
  INTVAL (XEXP (x, 1)));
 
-  else if (SHIFT_COUNT_TRUNCATED && !REG_P (XEXP (x, 1)))
+  else if (SHIFT_COUNT_TRUNCATED
+	  && targetm.shift_truncation_mask (mode)
+	  && !REG_P (XEXP (x, 1)))
 	SUBST (XEXP (x, 1),
 	   force_to_mode (XEXP (x, 1), GET_MODE (XEXP (x, 1)),
-			  (HOST_WIDE_INT_1U
-			   << exact_log2 (GET_MODE_UNIT_BITSIZE
-					  (GET_MODE (x))))
-			  - 1,
-			  0));
+		 targetm.shift_truncation_mask (mode), 0));
   break;
 
 default:
@@ -10594,8 +10592,8 @@ simplify_shift_const_1 (enum rtx_code code, machine_mode result_mode,
   /* Make sure and truncate the "natural" shift on the way in.  We don't
  want to do this inside the loop as it makes it more difficult to
  combine shifts.  */
-  if (SHIFT_COUNT_TRUNCATED)
-orig_count &= GET_MODE_UNIT_BITSIZE (mode) - 1;
+  if (SHIFT_COUNT_TRUNCATED && targetm.shift_truncation_mask (mode))
+orig_count &= targetm.shift_truncation_mask (mode);
 
   /* If we were given an invalid count, don't do anything except exactly
  what was requested.  */
@@ -12295,7 +12293,7 @@ simplify_comparison (enum rtx_code code, rtx *pop0, rtx *pop1)
 	 between the position and the location of the single bit.  */
 	  /* Except we can't if SHIFT_COUNT_TRUNCATED is set, since we might
 	 have already reduced the shift count modulo the word size.  */
-	  if (!SHIFT_COUNT_TRUNCATED
+	  if ((!SHIFT_COUNT_TRUNCATED || !targetm.shift_truncation_mask (mode))
 	  && CONST_INT_P (XEXP (op0, 0))
 	  && XEXP (op0, 1) == const1_rtx
 	  && equality_comparison_p && const_op == 0
diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index ad8eacdf4dc..1d723f29e1e 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -2320,6 +2320,7 @@ s390_single_part (rtx op,
 	part = i;
 	}
 }
+
   return part == -1 ? -1 : n_parts - 1 - part;
 }
 
@@ -2702,6 +2703,7 @@ s390_logical_operator_ok_p (rtx *operands)
   return true;
 }
 
+
 /* Narrow logical operation CODE of memory operand MEMOP with immediate
operand IM

[PATCH] Fix PR90402

2019-05-09 Thread Richard Biener


The following fixes PR90402 by avoiding value-numbering bringing the
loop header PHIs of the if-converted and non-if-converted loop copies
out of sync for the vectorizer.

This is done by teaching region VN to handle multiple entries
into the entry block, which allows us to value-number a cycle's
body without considering the backedge (just forcing all the
entry PHIs to varying).  Single-BB cycles probably still cannot
be handled since they'd have entry->dest == exit_bb, but I
didn't actually try...

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

Richard.

2019-05-09  Richard Biener  

PR tree-optimization/90402
* tree-if-conv.c (tree_if_conversion): Value number only
the loop body by making the latch an exit of the region
as well.
* tree-ssa-sccvn.c (process_bb): Add flag whether to skip
processing PHIs.
(do_rpo_vn): Deal with multiple edges into the entry block
that are not backedges inside the region by skipping PHIs
of the entry block.

* gcc.dg/torture/pr90402-1.c: New testcase.

Index: gcc/tree-if-conv.c
===
--- gcc/tree-if-conv.c  (revision 271030)
+++ gcc/tree-if-conv.c  (working copy)
@@ -3066,10 +3066,12 @@ tree_if_conversion (struct loop *loop, v
   ifcvt_local_dce (loop->header);
 
   /* Perform local CSE, this esp. helps the vectorizer analysis if loads
- and stores are involved.
+ and stores are involved.  CSE only the loop body, not the entry
+ PHIs, those are to be kept in sync with the non-if-converted copy.
  ???  We'll still keep dead stores though.  */
   exit_bbs = BITMAP_ALLOC (NULL);
   bitmap_set_bit (exit_bbs, single_exit (loop)->dest->index);
+  bitmap_set_bit (exit_bbs, loop->latch->index);
   todo |= do_rpo_vn (cfun, loop_preheader_edge (loop), exit_bbs);
   BITMAP_FREE (exit_bbs);
 
Index: gcc/tree-ssa-sccvn.c
===
--- gcc/tree-ssa-sccvn.c(revision 271030)
+++ gcc/tree-ssa-sccvn.c(working copy)
@@ -5978,7 +5978,7 @@ insert_related_predicates_on_edge (enum
 static unsigned
 process_bb (rpo_elim &avail, basic_block bb,
bool bb_visited, bool iterate_phis, bool iterate, bool eliminate,
-   bool do_region, bitmap exit_bbs)
+   bool do_region, bitmap exit_bbs, bool skip_phis)
 {
   unsigned todo = 0;
   edge_iterator ei;
@@ -5989,7 +5989,8 @@ process_bb (rpo_elim &avail, basic_block
   /* If we are in loop-closed SSA preserve this state.  This is
  relevant when called on regions from outside of FRE/PRE.  */
   bool lc_phi_nodes = false;
-  if (loops_state_satisfies_p (LOOP_CLOSED_SSA))
+  if (!skip_phis
+  && loops_state_satisfies_p (LOOP_CLOSED_SSA))
 FOR_EACH_EDGE (e, ei, bb->preds)
   if (e->src->loop_father != e->dest->loop_father
  && flow_loop_nested_p (e->dest->loop_father,
@@ -6010,67 +6011,68 @@ process_bb (rpo_elim &avail, basic_block
 }
 
   /* Value-number all defs in the basic-block.  */
-  for (gphi_iterator gsi = gsi_start_phis (bb); !gsi_end_p (gsi);
-   gsi_next (&gsi))
-{
-  gphi *phi = gsi.phi ();
-  tree res = PHI_RESULT (phi);
-  vn_ssa_aux_t res_info = VN_INFO (res);
-  if (!bb_visited)
-   {
- gcc_assert (!res_info->visited);
- res_info->valnum = VN_TOP;
- res_info->visited = true;
-   }
-
-  /* When not iterating force backedge values to varying.  */
-  visit_stmt (phi, !iterate_phis);
-  if (virtual_operand_p (res))
-   continue;
-
-  /* Eliminate */
-  /* The interesting case is gcc.dg/tree-ssa/pr22230.c for correctness
-how we handle backedges and availability.
-And gcc.dg/tree-ssa/ssa-sccvn-2.c for optimization.  */
-  tree val = res_info->valnum;
-  if (res != val && !iterate && eliminate)
-   {
- if (tree leader = avail.eliminate_avail (bb, res))
-   {
- if (leader != res
- /* Preserve loop-closed SSA form.  */
- && (! lc_phi_nodes
- || is_gimple_min_invariant (leader)))
-   {
- if (dump_file && (dump_flags & TDF_DETAILS))
-   {
- fprintf (dump_file, "Replaced redundant PHI node "
-  "defining ");
- print_generic_expr (dump_file, res);
- fprintf (dump_file, " with ");
- print_generic_expr (dump_file, leader);
- fprintf (dump_file, "\n");
-   }
- avail.eliminations++;
+  if (!skip_phis)
+for (gphi_iterator gsi = gsi_start_phis (bb); !gsi_end_p (gsi);
+gsi_next (&gsi))
+  {
+   gphi *phi = gsi.phi ();
+   tree res = PHI_RESULT (phi);
+   vn_ssa_aux_t res_info = VN_INFO (res);
+   if (!bb_visited)
+ {
+   gcc_assert (!res_info->visited);

[PATCH 0/2] gcov for huge cyclic graph (PR gcov-profile/90380)

2019-05-09 Thread marxin
Hi.

The PR is about a huge number of basic blocks that belong to a line.
Apparently, it's generated code coming from a Fortran module end statement.
I'm doing the fix in 2 steps:

1) I dropped NEGATIVE_LOOP as that can't happen.  I've tested that on GCC
   and it works fine.
2) The cycle detection algorithm should not use zero edges.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
I'll install it if there will be no objections.

Martin

marxin (2):
  Test for not existence of a negative loop (PR gcov-profile/90380).
  Do not follow zero edges in cycle detection (PR gcov-profile/90380).

 gcc/gcov.c | 61 +-
 1 file changed, 24 insertions(+), 37 deletions(-)

-- 
2.21.0



[PATCH 2/2] Do not follow zero edges in cycle detection (PR gcov-profile/90380).

2019-05-09 Thread marxin

gcc/ChangeLog:

2019-05-09  Martin Liska  

* gcov.c (circuit): Ignore zero count arcs.
---
 gcc/gcov.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/gcc/gcov.c b/gcc/gcov.c
index 6bcd2b23748..f409d67e12c 100644
--- a/gcc/gcov.c
+++ b/gcc/gcov.c
@@ -742,7 +742,9 @@ circuit (block_info *v, arc_vector_t &path, block_info *start,
   for (arc_info *arc = v->succ; arc; arc = arc->succ_next)
 {
   block_info *w = arc->dst;
-  if (w < start || !linfo.has_block (w))
+  if (w < start
+	  || arc->count == 0
+	  || !linfo.has_block (w))
 	continue;
 
   path.push_back (arc);
@@ -765,7 +767,9 @@ circuit (block_info *v, arc_vector_t &path, block_info *start,
 for (arc_info *arc = v->succ; arc; arc = arc->succ_next)
   {
 	block_info *w = arc->dst;
-	if (w < start || !linfo.has_block (w))
+	if (w < start
+	|| arc->count == 0
+	|| !linfo.has_block (w))
 	  continue;
 
 	size_t index


[PATCH] Add params for jump-table expansion params (PR middle-end/90340).

2019-05-09 Thread Martin Liška
Hi.

The patch comes up with 2 new params that drive jump table density
when optimizing for size and speed.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

gcc/ChangeLog:

2019-05-07  Martin Liska  

PR middle-end/90340
* doc/invoke.texi: New params.
* params.def (PARAM_JUMP_TABLE_MAX_GROWTH_RATIO_FOR_SIZE): New.
(PARAM_JUMP_TABLE_MAX_GROWTH_RATIO_FOR_SPEED): Likewise.
* tree-switch-conversion.c (jump_table_cluster::can_be_handled):
Use it.
* tree-switch-conversion.h (struct jump_table_cluster):
Likewise.

gcc/testsuite/ChangeLog:

2019-05-09  Martin Liska  

* gcc.dg/tree-ssa/pr90340-2.c: New test.
* gcc.dg/tree-ssa/pr90340.c: New test.
---
 gcc/doc/invoke.texi   | 10 
 gcc/params.def| 14 ++
 gcc/testsuite/gcc.dg/tree-ssa/pr90340-2.c | 31 +++
 gcc/testsuite/gcc.dg/tree-ssa/pr90340.c   | 31 +++
 gcc/tree-switch-conversion.c  | 11 +++-
 gcc/tree-switch-conversion.h  |  6 -
 6 files changed, 90 insertions(+), 13 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr90340-2.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr90340.c


diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index de7e1aaec67..ef35179e7bf 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -11889,6 +11889,16 @@ The smallest number of different values for which it is best to use a
 jump-table instead of a tree of conditional branches.  If the value is
 0, use the default for the machine.
 
+@item jump-table-max-growth-ratio-for-size
+The maximum code size growth ratio when expanding
+into a jump table (in percent).  The parameter is used when
+optimizing for size.
+
+@item jump-table-max-growth-ratio-for-speed
+The maximum code size growth ratio when expanding
+into a jump table (in percent).  The parameter is used when
+optimizing for speed.
+
 @item tree-reassoc-width
 Set the maximum number of instructions executed in parallel in
 reassociated tree. This parameter overrides target dependent
diff --git a/gcc/params.def b/gcc/params.def
index 904b3cbdf16..f8f842ed46e 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -1175,6 +1175,20 @@ DEFPARAM (PARAM_CASE_VALUES_THRESHOLD,
 	  "if 0, use the default for the machine.",
   0, 0, 0)
 
+DEFPARAM (PARAM_JUMP_TABLE_MAX_GROWTH_RATIO_FOR_SIZE,
+	  "jump-table-max-growth-ratio-for-size",
+	  "The maximum code size growth ratio when expanding "
+	  "into a jump table (in percent).  The parameter is used when "
+	  "optimizing for size.",
+	  300, 0, 0)
+
+DEFPARAM (PARAM_JUMP_TABLE_MAX_GROWTH_RATIO_FOR_SPEED,
+	  "jump-table-max-growth-ratio-for-speed",
+	  "The maximum code size growth ratio when expanding "
+	  "into a jump table (in percent).  The parameter is used when "
+	  "optimizing for speed.",
+	  800, 0, 0)
+
 /* Data race flags for C++0x memory model compliance.  */
 DEFPARAM (PARAM_ALLOW_STORE_DATA_RACES,
 	  "allow-store-data-races",
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr90340-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr90340-2.c
new file mode 100644
index 000..21099821786
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr90340-2.c
@@ -0,0 +1,31 @@
+/* { dg-do compile { target { { x86_64-*-* aarch64-*-* ia64-*-* powerpc64-*-* } && lp64 } } } */
+/* { dg-options "-Os --param jump-table-max-growth-ratio-for-size=200 -fdump-tree-switchlower1" } */
+
+int a;
+
+int foo(char c) {
+  switch (c) {
+  case 'c':
+return a;
+  case 's':
+return 3;
+  case 'n':
+return 1;
+  case '%':
+return -2;
+  case 'o':
+return a + 2;
+break;
+  case 'X':
+  case 'x':
+return ;
+  case 'd':
+  case 'i':
+  case 'u':
+return ;
+  default:
+return 0;
+  }
+}
+
+/* { dg-final { scan-tree-dump ";; GIMPLE switch case clusters: 37 88 99 100 105 110 111 115 117 120" "switchlower1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr90340.c b/gcc/testsuite/gcc.dg/tree-ssa/pr90340.c
new file mode 100644
index 000..8f3b87c1916
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr90340.c
@@ -0,0 +1,31 @@
+/* { dg-do compile { target { { x86_64-*-* aarch64-*-* ia64-*-* powerpc64-*-* } && lp64 } } } */
+/* { dg-options "-Os -fdump-tree-switchlower1" } */
+
+int a;
+
+int foo(char c) {
+  switch (c) {
+  case 'c':
+return a;
+  case 's':
+return 3;
+  case 'n':
+return 1;
+  case '%':
+return -2;
+  case 'o':
+return a + 2;
+break;
+  case 'X':
+  case 'x':
+return ;
+  case 'd':
+  case 'i':
+  case 'u':
+return ;
+  default:
+return 0;
+  }
+}
+
+/* { dg-final { scan-tree-dump ";; GIMPLE switch case clusters: 37 88 JT:99-120" "switchlower1" } } */
diff --git a/gcc/tree-switch-conversion.c b/gcc/tree-switch-conversion.c
index c3f2baf39d7..bedeb2fd865 100644
--- a/gcc/tree-switch-conversion.c
+++ b

[PATCH 1/2] Test for not existence of a negative loop (PR gcov-profile/90380).

2019-05-09 Thread marxin

gcc/ChangeLog:

2019-05-09  Martin Liska  

PR gcov-profile/90380
* gcov.c (enum loop_type): Remove the enum and
the operator.
(handle_cycle): Assert that we should not reach
a negative count.
(circuit): Use loop_found instead of a tri-state loop_type.
(get_cycles_count): Do not handle NEGATIVE_LOOP as it can't
happen.
---
 gcc/gcov.c | 53 ++---
 1 file changed, 18 insertions(+), 35 deletions(-)

diff --git a/gcc/gcov.c b/gcc/gcov.c
index 1fc37a07c34..6bcd2b23748 100644
--- a/gcc/gcov.c
+++ b/gcc/gcov.c
@@ -676,27 +676,11 @@ bool function_info::group_line_p (unsigned n, unsigned src_idx)
typedef vector<arc_info *> arc_vector_t;
typedef vector<block_info *> block_vector_t;
 
-/* Enum with types of loop in CFG.  */
-
-enum loop_type
-{
-  NO_LOOP = 0,
-  LOOP = 1,
-  NEGATIVE_LOOP = 3
-};
-
-/* Loop_type operator that merges two values: A and B.  */
-
-inline loop_type& operator |= (loop_type& a, loop_type b)
-{
-  return a = static_cast<loop_type> (a | b);
-}
-
 /* Handle cycle identified by EDGES, where the function finds minimum cs_count
and subtract the value from all counts.  The subtracted value is added
to COUNT.  Returns type of loop.  */
 
-static loop_type
+static void
 handle_cycle (const arc_vector_t &edges, int64_t &count)
 {
   /* Find the minimum edge of the cycle, and reduce all nodes in the cycle by
@@ -712,7 +696,7 @@ handle_cycle (const arc_vector_t &edges, int64_t &count)
   for (unsigned i = 0; i < edges.size (); i++)
 edges[i]->cs_count -= cycle_count;
 
-  return cycle_count < 0 ? NEGATIVE_LOOP : LOOP;
+  gcc_assert (cycle_count >= 0);
 }
 
 /* Unblock a block U from BLOCKED.  Apart from that, iterate all blocks
@@ -743,12 +727,12 @@ unblock (const block_info *u, block_vector_t &blocked,
blocked by a block.  COUNT is accumulated count of the current LINE.
Returns what type of loop it contains.  */
 
-static loop_type
+static bool
 circuit (block_info *v, arc_vector_t &path, block_info *start,
	 block_vector_t &blocked, vector<block_vector_t> &block_lists,
 	 line_info &linfo, int64_t &count)
 {
-  loop_type result = NO_LOOP;
+  bool loop_found = false;
 
   /* Add v to the block list.  */
   gcc_assert (find (blocked.begin (), blocked.end (), v) == blocked.end ());
@@ -763,15 +747,19 @@ circuit (block_info *v, arc_vector_t &path, block_info *start,
 
   path.push_back (arc);
   if (w == start)
-	/* Cycle has been found.  */
-	result |= handle_cycle (path, count);
+	{
+	  /* Cycle has been found.  */
+	  handle_cycle (path, count);
+	  loop_found = true;
+	}
   else if (find (blocked.begin (), blocked.end (), w) == blocked.end ())
-	result |= circuit (w, path, start, blocked, block_lists, linfo, count);
+	loop_found |= circuit (w, path, start, blocked, block_lists, linfo,
+			   count);
 
   path.pop_back ();
 }
 
-  if (result != NO_LOOP)
+  if (loop_found)
 unblock (v, blocked, block_lists);
   else
 for (arc_info *arc = v->succ; arc; arc = arc->succ_next)
@@ -788,14 +776,13 @@ circuit (block_info *v, arc_vector_t &path, block_info *start,
 	  list.push_back (v);
   }
 
-  return result;
+  return loop_found;
 }
 
-/* Find cycles for a LINFO.  If HANDLE_NEGATIVE_CYCLES is set and the line
-   contains a negative loop, then perform the same function once again.  */
+/* Find cycles for a LINFO.  */
 
 static gcov_type
-get_cycles_count (line_info &linfo, bool handle_negative_cycles = true)
+get_cycles_count (line_info &linfo)
 {
   /* Note that this algorithm works even if blocks aren't in sorted order.
  Each iteration of the circuit detection is completely independent
@@ -803,7 +790,7 @@ get_cycles_count (line_info &linfo, bool handle_negative_cycles = true)
  Therefore, operating on a permuted order (i.e., non-sorted) only
  has the effect of permuting the output cycles.  */
 
-  loop_type result = NO_LOOP;
+  bool loop_found = false;
   gcov_type count = 0;
  for (vector<block_info *>::iterator it = linfo.blocks.begin ();
it != linfo.blocks.end (); it++)
@@ -811,14 +798,10 @@ get_cycles_count (line_info &linfo, bool handle_negative_cycles = true)
   arc_vector_t path;
   block_vector_t blocked;
      vector<block_vector_t> block_lists;
-  result |= circuit (*it, path, *it, blocked, block_lists, linfo,
-			 count);
+  loop_found |= circuit (*it, path, *it, blocked, block_lists, linfo,
+			 count);
 }
 
-  /* If we have a negative cycle, repeat the find_cycles routine.  */
-  if (result == NEGATIVE_LOOP && handle_negative_cycles)
-count += get_cycles_count (linfo, false);
-
   return count;
 }
 


Re: [PATCH] PR libstdc++/61761 fix std::proj for targets without C99 cproj

2019-05-09 Thread Szabolcs Nagy
On 03/05/2019 20:25, Jonathan Wakely wrote:
> On 03/05/19 14:34 +, Szabolcs Nagy wrote:
>> there is still an execution failure, but that's not
>> related to copysign: proj(i*inf) returns i*inf instead of inf
>> i haven't figured out why:
>>
>> /work/b/src/gcc/libstdc++-v3/testsuite/26_numerics/complex/proj.cc:105: void 
>> test01(): Assertion 'eq( std::proj(c0p) ,
>> std::complex(pinf, +0.0) )' failed.
>> FAIL: 26_numerics/complex/proj.cc execution test
> 
> If std::copysign isn't avilable then we only provide the generic
> std::proj which doesn't support infinities, so all the tests using
> positive or negative infinity will give the wrong answer.
> 
> The r270759 change doesn't actually use std::copysign, only
> __builtin_copysign, but if the compiler doesn't expand that then it
> still requires libc to provide copysign. If autoconf decides copysign
> isn't available, std::proj doesn't support infinities.
> 
> I'm not sure whether to XFAIL the test in that case, or just make the
> tests for infinities conditional on the necessary support in libc e.g.

it seems newlib does not have full c99 math support on
aarch64 because several long double math functions are
missing (including copysignl).

i think it's better to xfail { ! copysignl } than hiding
the failure with ifdefs (but it requires a copysignl check
in the test system or a more general c99 math check)


> 
> --- a/libstdc++-v3/testsuite/26_numerics/complex/proj.cc
> +++ b/libstdc++-v3/testsuite/26_numerics/complex/proj.cc
> @@ -101,6 +101,7 @@ test01()
>   VERIFY( eq( std::proj(cqq)  , cqq ) );
>   VERIFY( eq( std::proj(-cqq) , -cqq ) );
> 
> +#ifdef _GLIBCXX_USE_C99_MATH_TR1
>   const std::complex c0p(0, pinf);
>   VERIFY( eq( std::proj(c0p)  , std::complex(pinf, +0.0) ) );
>   VERIFY( eq( std::proj(-c0p) , std::complex(pinf, -0.0) ) );
> @@ -164,6 +165,7 @@ test01()
>   const std::complex cnp(ninf, pinf);
>   VERIFY( eq( std::proj(cnp)  , std::complex(pinf, +0.0) ) );
>   VERIFY( eq( std::proj(-cnp) , std::complex(pinf, -0.0) ) );
> +#endif
> }
> 
> void
> @@ -215,6 +217,7 @@ test02()
>   VERIFY( eq( std::proj(cqq)  , cqq ) );
>   VERIFY( eq( std::proj(-cqq) , -cqq ) );
> 
> +#ifdef _GLIBCXX_USE_C99_MATH_TR1
>   const std::complex c0p(0, pinf);
>   VERIFY( eq( std::proj(c0p)  , std::complex(pinf, +0.0) ) );
>   VERIFY( eq( std::proj(-c0p) , std::complex(pinf, -0.0) ) );
> @@ -278,6 +281,7 @@ test02()
>   const std::complex cnp(ninf, pinf);
>   VERIFY( eq( std::proj(cnp)  , std::complex(pinf, +0.0) ) );
>   VERIFY( eq( std::proj(-cnp) , std::complex(pinf, -0.0) ) );
> +#endif
> }
> 
> void
> @@ -329,6 +333,7 @@ test03()
>   VERIFY( eq( std::proj(cqq)  , cqq ) );
>   VERIFY( eq( std::proj(-cqq) , -cqq ) );
> 
> +#ifdef _GLIBCXX_USE_C99_MATH_TR1
>   const std::complex c0p(0, pinf);
>   VERIFY( eq( std::proj(c0p)  , std::complex(pinf, +0.0) ) );
>   VERIFY( eq( std::proj(-c0p) , std::complex(pinf, -0.0) ) );
> @@ -392,6 +397,7 @@ test03()
>   const std::complex cnp(ninf, pinf);
>   VERIFY( eq( std::proj(cnp)  , std::complex(pinf, +0.0) ) );
>   VERIFY( eq( std::proj(-cnp) , std::complex(pinf, -0.0) ) );
> +#endif
> }
> 
> int
> 
> 
> 



Re: [PATCH] Implement LWG 2686, hash<error_condition>

2019-05-09 Thread Szabolcs Nagy
On 07/05/2019 13:21, Christophe Lyon wrote:
> On Tue, 7 May 2019 at 12:07, Jonathan Wakely  wrote:
>>
>> On 07/05/19 10:37 +0100, Jonathan Wakely wrote:
>>> On 07/05/19 11:05 +0200, Christophe Lyon wrote:
 On Sat, 4 May 2019 at 16:36, Jonathan Wakely  wrote:
>
> On 03/05/19 23:42 +0100, Jonathan Wakely wrote:
>> On 23/03/17 17:49 +, Jonathan Wakely wrote:
>>> On 12/03/17 13:16 +0100, Daniel Krügler wrote:
 The following is an *untested* patch suggestion, please verify.

 Notes: My interpretation is that hash<error_condition> should be
 defined outside of the _GLIBCXX_COMPATIBILITY_CXX0X block, please
 double-check that course of action.
>>>
>>> That's right.
>>>
 I noticed that the preexisting hash<error_code> did directly refer to
 the private members of error_code albeit those have public access
 functions. For consistency I mimicked that existing style when
 implementing hash<error_condition>.
>>>
>>> I see no reason for that, so I've removed the friend declaration and
>>> used the public member functions.
>>
>> I'm going to do the same for hash<error_code> too. It can also use the
>> public members instead of being a friend.
>>
>>
>>> Although this is a DR, I'm treating it as a new C++17 feature, so I've
>>> adjusted the patch to only add the new specialization for C++17 mode.
>>> We're too close to the GCC 7 release to be adding new things to the
>>> default mode, even minor things like this. After GCC 7 is released we
>>> can revisit it and decide if we want to enable it for all modes.
>>
>> We never revisited that, and it's still only enabled for C++17 and up.
>> I guess that's OK, but we could enabled it for C++11 and 14 on trunk
>> if we want. Anybody care enough to argue for that?
>>
>>> Here's what I've tested and will be committing.
>>>
>>>
>>
>>> commit 90ca0fd91f5c65af370beb20af06bdca257aaf63
>>> Author: Jonathan Wakely 
>>> Date:   Thu Mar 23 11:47:39 2017 +
>>>
>>>   Implement LWG 2686, std::hash<error_condition>, for C++17
>>>   2017-03-23  Daniel Kruegler  
>>>  Implement LWG 2686, Why is std::hash specialized for error_code,
>>>  but not error_condition?
>>>  * include/std/system_error (hash<error_condition>): Define for 
>>> C++17.
>>>  * testsuite/20_util/hash/operators/size_t.cc 
>>> (hash<error_condition>):
>>>  Instantiate test for error_condition.
>>>  * testsuite/20_util/hash/requirements/explicit_instantiation.cc
>>>  (hash<error_condition>): Instantiate hash<error_condition>.
>>>
>>> diff --git a/libstdc++-v3/include/std/system_error 
>>> b/libstdc++-v3/include/std/system_error
>>> index 6775a6e..ec7d25f 100644
>>> --- a/libstdc++-v3/include/std/system_error
>>> +++ b/libstdc++-v3/include/std/system_error
>>> @@ -373,14 +373,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>> _GLIBCXX_END_NAMESPACE_VERSION
>>> } // namespace
>>>
>>> -#ifndef _GLIBCXX_COMPATIBILITY_CXX0X
>>> -
>>> #include 
>>>
>>> namespace std _GLIBCXX_VISIBILITY(default)
>>> {
>>> _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>>
>>> +#ifndef _GLIBCXX_COMPATIBILITY_CXX0X
>>>  // DR 1182.
>>>  /// std::hash specialization for error_code.
>>>  template<>
>>> @@ -394,12 +393,27 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>>  return std::_Hash_impl::__hash_combine(__e._M_cat, __tmp);
>>>  }
>>>};
>>> +#endif // _GLIBCXX_COMPATIBILITY_CXX0X
>>> +
>>> +#if __cplusplus > 201402L
>>> +  // DR 2686.
>>> +  /// std::hash specialization for error_condition.
>>> +  template<>
>>> +    struct hash<error_condition>
>>> +    : public __hash_base<size_t, error_condition>
>>> +{
>>> +  size_t
>>> +  operator()(const error_condition& __e) const noexcept
>>> +  {
>>> + const size_t __tmp = std::_Hash_impl::hash(__e.value());
>>> + return std::_Hash_impl::__hash_combine(__e.category(), __tmp);
>>
>> When I changed this from using __e._M_cat (as in Daniel's patch) to
>> __e.category() I introduced a bug, because the former is a pointer to
>> the error_category (and error_category objects are unique and so can
>> be identified by their address) and the latter is the object itself,
>> so we hash the bytes of an abstract base class instead of hashing the
>> pointer to it. Oops.
>>
>> Patch coming up to fix that.
>
> Here's the fix. Tested powerpc64le-linux, committed to trunk.
>
> I'll backport this to 7, 8 and 9 as well.
>

 Hi Jonathan,

 Does the new test lack dg-require-filesystem-ts ?
>>>
>>> It lacks it, because it doesn't use the filesystem library at all.
>>>
 I'm seeing link failures on arm-eabi (using newlib):
 Excess errors:
 /libstdc++-v3/src/c++17/fs_ops.cc:806: undefined reference to `chdir'
 /libstdc++-v3/src/c++17/fs_ops.cc:583: undefined reference to `mkdir'
 /l

Re: [PATCH] Eliminates phi on branch for CMP result

2019-05-09 Thread Jeff Law
On 5/9/19 12:07 AM, Richard Biener wrote:
> On Wed, 8 May 2019, Jeff Law wrote:
> 
>> On 5/8/19 6:20 AM, Richard Biener wrote:
>>> On Wed, 8 May 2019, Jiufu Guo wrote:
>>>
 Hi,

 Thanks Richard, Segher, Andrew and all.

 Segher Boessenkool  writes:

> Let me try to answer some of this...
>
> On Tue, May 07, 2019 at 03:31:27PM +0200, Richard Biener wrote:
>> On Mon, 6 May 2019, Jiufu Guo wrote:
>>> This patch implements the optimization in PR77820.  The optimization
>>> eliminates phi and phi's basic block, if the phi is used only by
>>> condition branch, and the phi's incoming value in the result of a
>>> CMP result.
>
>> I'm not sure I like a new pass here ;)  The transform is basically
>> tail-duplicating the PHI block because the exit conditional can
>> be "simplfied" - that's something jump threading generally does
>> though it relies on "simplified" being a little bit more simplified
>> than above.
>
 Adding a new pass is just because it may be relatively easy to extend
 and maintain. 

Adding this micro-optimization into other passes is also a good
suggestion.

 - Similar with jump threading, this new transform is replacing jump
 old destination with new destination. While for this case, the new
 destination can not be determined at compile time. 

 - And as Andrew said: forwprop/backprop are similar, but this case is the
 opposite: it spreads common code (phi+gcond) into different preds.

 - And there is the phiopt pass, which focuses on making 'phi' stmts better.
 It also does something similar:
  bb0:
    if (a != b) goto bb2; else goto bb1;
  bb1:
  bb2:
    x = PHI <a (bb1), b (bb0)>;

 transform to:

  bb0:
  bb2:
    x = PHI <b (bb0)>;

 If I avoid adding a new pass, any suggestions about which existing pass
 (jump threading/forwprop/phiopt/...) would be more suitable to extend to
 support this new transform?
>>>
>>> The main thing the transform does is tail-duplicate the PHI block,
>>> if we'd just do that followup transforms would do the rest.
>> One might even claim this is really a transformation for cfg cleanups.
>> If we ignore the PHI what we have is a unconditional jump to a
>> conditional jump.  The obvious way to optimize that is to replace the
>> unconditional jump with a copy of the conditional jump.
> 
> I though about this too, but then quickly disregarded as too gross ;)
Yea, the changes to the CFG (and dominator tree and SSA graph) are
probably significant enough that trying to do it into cleanup_cfg is
just madness.  One of those few cases where doing something on RTL is
probably easier than gimple.

Jeff


Re: [PATCH][PR90106] Builtin call transformation changes in cdce pass

2019-05-09 Thread Jeff Law
On 5/8/19 7:20 PM, Bin.Cheng wrote:
> On Thu, May 9, 2019 at 5:31 AM Jeff Law  wrote:
>>
>> On 5/8/19 6:28 AM, Richard Biener wrote:
>>> On Wed, May 8, 2019 at 12:09 PM JunMa  wrote:

 Hi

 As PR90106 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90106),
 when gcc meets a builtin function call like:

y = sqrt (x);

 The cdce pass tries to transform the call into an internal function
 call and conditionally executes the call, using a simple range check on
 the arguments to detect the common cases in which errno does not need
 to be set. It looks like:

y = IFN_SQRT (x);
if (__builtin_isless (x, 0))
  sqrt (x);

 However, If the call is in tail position, for example:

y =  sqrt (x);
return y;

 will become:

y = IFN_SQRT (x);
if (__builtin_isless (x, 0))
  sqrt (x);
return y;

 This transformation breaks the tailcall pattern and prevents
 later tailcall optimizations.

 So this patch transforms a builtin call with a return value into an
 if-then-else form, which looks like:

y =  sqrt (x);
 ==>
if (__builtin_isless (x, 0))
  y = sqrt (x);
else
  y = IFN_SQRT (x);

 BTW, y = sqrt (x) could also be transformed like:

y = IFN_SQRT (x);
if (__builtin_isless (x, 0))
  y = sqrt (x);

 We don't choose this pattern because it emits worse assembly
 code (more move instructions and more register use) on x86_64.

 Bootstrapped/regtested on x86_64-linux, ok for trunk?
>>>
>>> OK.
>> JunMa -- do you have a copyright assignment on file and write access to
>> the repository?  If not we should take care of that before proceeding
>> further.
> Hi Jeff,
> Thanks very much for helping.
> Yes, he is under Alibaba's copyright assignment.  What else should we
> do other than noting it in this mailing list message?
> BTW, I think JunMa needs to apply write-access though.
If he's covered under a corporate assignment, then we're fine from that
standpoint.

For write access, we just need to go through the usual procedure.  JunMa
just needs to fill out this form, I'm happy to sponsor him.

https://sourceware.org/cgi-bin/pdw/ps_form.cgi

Jeff


Re: [PATCH] Add params for jump-table expansion params (PR middle-end/90340).

2019-05-09 Thread Jeff Law
On 5/9/19 7:09 AM, Martin Liška wrote:
> Hi.
> 
> The patch comes up with 2 new params that drive jump table density
> when optimizing for size and speed.
> 
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> 
> Ready to be installed?
> Thanks,
> Martin
> 
> gcc/ChangeLog:
> 
> 2019-05-07  Martin Liska  
> 
>   PR middle-end/90340
>   * doc/invoke.texi: New params.
>   * params.def (PARAM_JUMP_TABLE_MAX_GROWTH_RATIO_FOR_SIZE): New.
>   (PARAM_JUMP_TABLE_MAX_GROWTH_RATIO_FOR_SPEED): Likewise.
>   * tree-switch-conversion.c (jump_table_cluster::can_be_handled):
>   Use it.
>   * tree-switch-conversion.h (struct jump_table_cluster):
>   Likewise.
> 
> gcc/testsuite/ChangeLog:
> 
> 2019-05-09  Martin Liska  
> 
>   * gcc.dg/tree-ssa/pr90340-2.c: New test.
>   * gcc.dg/tree-ssa/pr90340.c: New test.
> ---
>  gcc/doc/invoke.texi   | 10 
>  gcc/params.def| 14 ++
>  gcc/testsuite/gcc.dg/tree-ssa/pr90340-2.c | 31 +++
>  gcc/testsuite/gcc.dg/tree-ssa/pr90340.c   | 31 +++
>  gcc/tree-switch-conversion.c  | 11 +++-
>  gcc/tree-switch-conversion.h  |  6 -
>  6 files changed, 90 insertions(+), 13 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr90340-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr90340.c
> 
> 
OK
jeff


Re: [PATCH] Fix location where lto-dump is installed.

2019-05-09 Thread Jeff Law
On 5/9/19 4:20 AM, Martin Liška wrote:
> Hi.
> 
> The patch is about proper install location of the newly added tool.
> 
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> 
> Ready to be installed?
> Thanks,
> Martin
> 
> gcc/lto/ChangeLog:
> 
> 2019-05-09  Martin Liska  
> 
>   * Make-lang.in: Use program_transform_name for lto-dump.
>   * config-lang.in: Do not mark lto-dump compiler as we don't
>   want to have it installed at
>   lib/gcc/x86_64-pc-linux-gnu/10.0.0/lto-dump.
> ---
>  gcc/lto/Make-lang.in   | 3 ++-
>  gcc/lto/config-lang.in | 2 +-
>  2 files changed, 3 insertions(+), 2 deletions(-)
> 
> 

OK
jeff


Re: [PATCH] Implement LWG 2686, hash<error_condition>

2019-05-09 Thread Jonathan Wakely
On Thu, 9 May 2019 at 15:43, Szabolcs Nagy  wrote:
>
> On 07/05/2019 13:21, Christophe Lyon wrote:
> > On Tue, 7 May 2019 at 12:07, Jonathan Wakely  wrote:
> >>
> >> On 07/05/19 10:37 +0100, Jonathan Wakely wrote:
> >>> On 07/05/19 11:05 +0200, Christophe Lyon wrote:
>  On Sat, 4 May 2019 at 16:36, Jonathan Wakely  wrote:
> >
> > On 03/05/19 23:42 +0100, Jonathan Wakely wrote:
> >> On 23/03/17 17:49 +, Jonathan Wakely wrote:
> >>> On 12/03/17 13:16 +0100, Daniel Krügler wrote:
>  The following is an *untested* patch suggestion, please verify.
> 
>  Notes: My interpretation is that hash<error_condition> should be
>  defined outside of the _GLIBCXX_COMPATIBILITY_CXX0X block, please
>  double-check that course of action.
> >>>
> >>> That's right.
> >>>
>  I noticed that the preexisting hash<error_code> did directly refer to
>  the private members of error_code albeit those have public access
>  functions. For consistency I mimicked that existing style when
>  implementing hash<error_condition>.
> >>>
> >>> I see no reason for that, so I've removed the friend declaration and
> >>> used the public member functions.
> >>
> >> I'm going to do the same for hash<error_code> too. It can also use the
> >> public members instead of being a friend.
> >>
> >>
> >>> Although this is a DR, I'm treating it as a new C++17 feature, so I've
> >>> adjusted the patch to only add the new specialization for C++17 mode.
> >>> We're too close to the GCC 7 release to be adding new things to the
> >>> default mode, even minor things like this. After GCC 7 is released we
> >>> can revisit it and decide if we want to enable it for all modes.
> >>
> >> We never revisited that, and it's still only enabled for C++17 and up.
> >> I guess that's OK, but we could enabled it for C++11 and 14 on trunk
> >> if we want. Anybody care enough to argue for that?
> >>
> >>> Here's what I've tested and will be committing.
> >>>
> >>>
> >>
> >>> commit 90ca0fd91f5c65af370beb20af06bdca257aaf63
> >>> Author: Jonathan Wakely 
> >>> Date:   Thu Mar 23 11:47:39 2017 +
> >>>
> >>>   Implement LWG 2686, std::hash<error_condition>, for C++17
> >>>   2017-03-23  Daniel Kruegler  
> >>>  Implement LWG 2686, Why is std::hash specialized for error_code,
> >>>  but not error_condition?
> >>>  * include/std/system_error (hash<error_condition>): Define for C++17.
> >>>  * testsuite/20_util/hash/operators/size_t.cc (hash<error_condition>):
> >>>  Instantiate test for error_condition.
> >>>  * testsuite/20_util/hash/requirements/explicit_instantiation.cc
> >>>  (hash<error_condition>): Instantiate hash<error_condition>.
> >>>
> >>> diff --git a/libstdc++-v3/include/std/system_error 
> >>> b/libstdc++-v3/include/std/system_error
> >>> index 6775a6e..ec7d25f 100644
> >>> --- a/libstdc++-v3/include/std/system_error
> >>> +++ b/libstdc++-v3/include/std/system_error
> >>> @@ -373,14 +373,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >>> _GLIBCXX_END_NAMESPACE_VERSION
> >>> } // namespace
> >>>
> >>> -#ifndef _GLIBCXX_COMPATIBILITY_CXX0X
> >>> -
> >>> #include 
> >>>
> >>> namespace std _GLIBCXX_VISIBILITY(default)
> >>> {
> >>> _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >>>
> >>> +#ifndef _GLIBCXX_COMPATIBILITY_CXX0X
> >>>  // DR 1182.
> >>>  /// std::hash specialization for error_code.
> >>>  template<>
> >>> @@ -394,12 +393,27 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >>>  return std::_Hash_impl::__hash_combine(__e._M_cat, __tmp);
> >>>  }
> >>>};
> >>> +#endif // _GLIBCXX_COMPATIBILITY_CXX0X
> >>> +
> >>> +#if __cplusplus > 201402L
> >>> +  // DR 2686.
> >>> +  /// std::hash specialization for error_condition.
> >>> +  template<>
> >>> +struct hash<error_condition>
> >>> +: public __hash_base
> >>> +{
> >>> +  size_t
> >>> +  operator()(const error_condition& __e) const noexcept
> >>> +  {
> >>> + const size_t __tmp = std::_Hash_impl::hash(__e.value());
> >>> + return std::_Hash_impl::__hash_combine(__e.category(), __tmp);
> >>
> >> When I changed this from using __e._M_cat (as in Daniel's patch) to
> >> __e.category() I introduced a bug, because the former is a pointer to
> >> the error_category (and error_category objects are unique and so can
> >> be identified by their address) and the latter is the object itself,
> >> so we hash the bytes of an abstract base class instead of hashing the
> >> pointer to it. Oops.
> >>
> >> Patch coming up to fix that.
> >
> > Here's the fix. Tested powerpc64le-linux, committed to trunk.
> >
> > I'll backport this to 7, 8 and 9 as well.
> >
> 
>  Hi Jonathan,
> 
>  Does the new test lack dg-require-filesystem-ts ?
> >>>
> >>> It lacks it, be

Re: [RFC] SHIFT_COUNT_TRUNCATED and shift_truncation_mask

2019-05-09 Thread Jeff Law
On 5/9/19 5:52 AM, Robin Dapp wrote:
> Hi,
> 
> while trying to improve s390 code generation for rotate and shift I
> noticed superfluous subregs for shift count operands. In our backend we
> already have quite cumbersome patterns that would need to be duplicated
> (or complicated further by more subst patterns) in order to get rid of
> the subregs.
> 
> I had already finished all the patterns when I realized that
> SHIFT_COUNT_TRUNCATED and the target hook shift_truncation_mask already
> exist and could do what is needed without extra patterns.  Just defining
>  shift_truncation_mask was not enough though as most of the additional
> insns get introduced by combine.
> 
> Even SHIFT_COUNT_TRUNCATED is no perfect match to what our hardware
> does because we always only consider the last 6 bits of a shift operand.
>
> Despite all the warnings in the other backends, most notably
> SHIFT_COUNT_TRUNCATED being "discouraged" as mentioned in riscv.h, I
> wrote the attached tentative patch.  It's a little ad-hoc, uses the
> SHIFT_COUNT_TRUNCATED paths only if shift_truncation_mask () != 0 and,
> instead of truncating via & (GET_MODE_BITSIZE (mode) - 1), it applies
> the mask returned by shift_truncation_mask.  Doing so, usage of both
> "methods" actually reduces to a single way.
The main reason it's discouraged is because some targets have insns
where the count would be truncated and others where it would not.  So,
for example, normal shifts might truncate but vector shifts might not
(mips), or shifts might truncate but bit tests do not (x86).

I don't know enough about the s390 architecture to know if there's any
corner cases.  You'd have to look at every pattern in your machine
description with a shift and verify that it's going to DTRT if the count
hasn't been truncated.


It would really help if you could provide testcases which show the
suboptimal code and any analysis you've done.


Jeff


[arm] PR target/90405 fix regression for thumb1 with -mtpcs-leaf-frame

2019-05-09 Thread Richard Earnshaw (lists)
-mtpcs-leaf-frame causes an APCS-style backtrace frame to be created
on the stack.  This should probably be deprecated, but it did reveal
an issue with the patch I committed previously to improve the code
generation when pushing high registers, in that
thumb_find_work_register had a different idea as to which registers
were available as scratch registers.

The new code actually does a better job of finding a viable work
register and doesn't rely so much on assumptions about the ABI, so it
seems better to adapt thumb_find_work_register to the new approach.
This way we can eliminate some rather crufty code.

gcc:
PR target/90405
* config/arm/arm.c (callee_saved_reg_p): Move before
thumb_find_work_register.
(thumb1_prologue_unused_call_clobbered_lo_regs): Move before
thumb_find_work_register.  Only call df_get_live_out once.
(thumb1_epilogue_unused_call_clobbered_lo_regs): Likewise.
(thumb_find_work_register): Use
thumb1_prologue_unused_call_clobbered_lo_regs instead of ad hoc
algorithms to locate a spare call clobbered reg.

gcc/testsuite:
PR target/90405
* gcc.target/arm/pr90405.c: New test.
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 91bb65130b8..528752ab01f 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -7638,6 +7638,41 @@ legitimize_pic_address (rtx orig, machine_mode mode, rtx reg, rtx pic_reg,
 }
 
 
+/* Whether a register is callee saved or not.  This is necessary because high
+   registers are marked as caller saved when optimizing for size on Thumb-1
+   targets despite being callee saved in order to avoid using them.  */
+#define callee_saved_reg_p(reg) \
+  (!call_used_regs[reg] \
+   || (TARGET_THUMB1 && optimize_size \
+   && reg >= FIRST_HI_REGNUM && reg <= LAST_HI_REGNUM))
+
+/* Return a mask for the call-clobbered low registers that are unused
+   at the end of the prologue.  */
+static unsigned long
+thumb1_prologue_unused_call_clobbered_lo_regs (void)
+{
+  unsigned long mask = 0;
+  bitmap prologue_live_out = df_get_live_out (ENTRY_BLOCK_PTR_FOR_FN (cfun));
+
+  for (int reg = FIRST_LO_REGNUM; reg <= LAST_LO_REGNUM; reg++)
+if (!callee_saved_reg_p (reg) && !REGNO_REG_SET_P (prologue_live_out, reg))
+  mask |= 1 << (reg - FIRST_LO_REGNUM);
+  return mask;
+}
+
+/* Similarly for the start of the epilogue.  */
+static unsigned long
+thumb1_epilogue_unused_call_clobbered_lo_regs (void)
+{
+  unsigned long mask = 0;
+  bitmap epilogue_live_in = df_get_live_in (EXIT_BLOCK_PTR_FOR_FN (cfun));
+
+  for (int reg = FIRST_LO_REGNUM; reg <= LAST_LO_REGNUM; reg++)
+if (!callee_saved_reg_p (reg) && !REGNO_REG_SET_P (epilogue_live_in, reg))
+  mask |= 1 << (reg - FIRST_LO_REGNUM);
+  return mask;
+}
+
 /* Find a spare register to use during the prolog of a function.  */
 
 static int
@@ -7645,45 +7680,16 @@ thumb_find_work_register (unsigned long pushed_regs_mask)
 {
   int reg;
 
+  unsigned long unused_regs
+= thumb1_prologue_unused_call_clobbered_lo_regs ();
+
   /* Check the argument registers first as these are call-used.  The
  register allocation order means that sometimes r3 might be used
  but earlier argument registers might not, so check them all.  */
-  for (reg = LAST_ARG_REGNUM; reg >= 0; reg --)
-if (!df_regs_ever_live_p (reg))
+  for (reg = LAST_LO_REGNUM; reg >= FIRST_LO_REGNUM; reg--)
+if (unused_regs & (1 << (reg - FIRST_LO_REGNUM)))
   return reg;
 
-  /* Before going on to check the call-saved registers we can try a couple
- more ways of deducing that r3 is available.  The first is when we are
- pushing anonymous arguments onto the stack and we have less than 4
- registers worth of fixed arguments(*).  In this case r3 will be part of
- the variable argument list and so we can be sure that it will be
- pushed right at the start of the function.  Hence it will be available
- for the rest of the prologue.
- (*): ie crtl->args.pretend_args_size is greater than 0.  */
-  if (cfun->machine->uses_anonymous_args
-  && crtl->args.pretend_args_size > 0)
-return LAST_ARG_REGNUM;
-
-  /* The other case is when we have fixed arguments but less than 4 registers
- worth.  In this case r3 might be used in the body of the function, but
- it is not being used to convey an argument into the function.  In theory
- we could just check crtl->args.size to see how many bytes are
- being passed in argument registers, but it seems that it is unreliable.
- Sometimes it will have the value 0 when in fact arguments are being
- passed.  (See testcase execute/2002-1.c for an example).  So we also
- check the args_info.nregs field as well.  The problem with this field is
- that it makes no allowances for arguments that are passed to the
- function but which are not used.  Hence we could miss an opportunity
- when a function has an unused argument in r3.  But it is better 

Re: [wwwdocs] Document GCC 9 Solaris changes

2019-05-09 Thread Jeff Law
On 5/6/19 6:51 AM, Rainer Orth wrote:
> Better late than never: here's the gcc-9/changes.html update listing
> Solaris improvements.  I'm all ears for suggestions about wording or
> markup improvements.
> 
> Ok?
> 
> Thanks.
> Rainer
> 
OK.  Please install if you haven't already.

jeff


[PATCH] [aarch64] Introduce flags for SVE2.

2019-05-09 Thread Matthew Malcomson
This patch adds support in the compiler for the architecture feature
flags that binutils will use to enable/disable the new "Future
Architecture Technologies" feature Scalable Vector Extension V2 (SVE2)
announced at Linaro Connect.

The "sve2" extension that enables the core sve2 instructions.
This also enables the sve extension, since sve is a requirement of sve2.

Extra optional sve2 features are the bitperm, sm4, aes, and sha3 extensions.
These are all given extra feature flags, "bitperm", "sve2-sm4",
"sve2-aes", and "sve2-sha3" respectively.
The sm4, aes, and sha3 extensions are explicitly marked as sve2
extensions to distinguish them from the corresponding NEON extensions.

When introducing macros to denote these new features, we went past what
a 32-bit value can represent, which means we need to change the type of
the variables working with these feature flags to ensure they use
64-bit quantities.

Tested with bootstrap on aarch64-none-linux-gnu and manually seeing that
-march=armv8-a+typo prints out the expected flags while using the new
feature flags does not complain about a missing flag (until reaching the
assembler).

gcc/ChangeLog:

2019-05-09  Matthew Malcomson  

* common/config/aarch64/aarch64-common.c
(struct aarch64_option_extension, struct processor_name_to_arch,
struct arch_to_arch_name, aarch64_parse_extension, opt_ext_cmp,
aarch64_contains_opt,
aarch64_get_extension_string_for_isa_flags): Change type of
variables storing flags to uint64_t.
* config/aarch64/aarch64-option-extensions.def (sve2, sve2-sm4,
sve2-aes, sve2-sha3, bitperm): New optional SVE2 extension flags.
* config/aarch64/aarch64.c (struct processor,
aarch64_parse_arch, aarch64_parse_cpu, aarch64_validate_mcpu,
aarch64_validate_march, aarch64_override_options,
aarch64_option_print, aarch64_handle_attr_isa_flags,
aarch64_declare_function_name, aarch64_start_file): Make flag
variables uint64_t.
* config/aarch64/aarch64.h (AARCH64_FL_SVE2, AARCH64_FL_SVE2AES,
AARCH64_FL_SVE2SM4, AARCH64_FL_SVE2SHA3,
AARCH64_FL_SVE2BITPERM): New macro feature flags.
* config/aarch64/aarch64.opt (aarch64_isa_flags): Make uint64_t.
* config/aarch64/driver-aarch64.c
(struct aarch64_arch_extension, struct aarch64_core_data,
struct aarch64_arch_driver_info, host_detect_local_cpu): Make
flag variables uint64_t.



### Attachment also inlined for ease of reply###


diff --git a/gcc/common/config/aarch64/aarch64-common.c 
b/gcc/common/config/aarch64/aarch64-common.c
index bab3ab3fa36c66906d1b4367e2b7bfb1bf6aa08c..816e182e85a0d3b6d8400be7089f96bc76eec9fe 100644
--- a/gcc/common/config/aarch64/aarch64-common.c
+++ b/gcc/common/config/aarch64/aarch64-common.c
@@ -170,9 +170,9 @@ aarch64_handle_option (struct gcc_options *opts,
 struct aarch64_option_extension
 {
   const char *const name;
-  const unsigned long flag_canonical;
-  const unsigned long flags_on;
-  const unsigned long flags_off;
+  const uint64_t flag_canonical;
+  const uint64_t flags_on;
+  const uint64_t flags_off;
   const bool is_synthetic;
 };
 
@@ -201,14 +201,14 @@ struct processor_name_to_arch
 {
   const std::string processor_name;
   const enum aarch64_arch arch;
-  const unsigned long flags;
+  const uint64_t flags;
 };
 
 struct arch_to_arch_name
 {
   const enum aarch64_arch arch;
   const std::string arch_name;
-  const unsigned long flags;
+  const uint64_t flags;
 };
 
 /* Map processor names to the architecture revision they implement and
@@ -238,7 +238,7 @@ static const struct arch_to_arch_name all_architectures[] =
a copy of the string is created and stored to INVALID_EXTENSION.  */
 
 enum aarch64_parse_opt_result
-aarch64_parse_extension (const char *str, unsigned long *isa_flags,
+aarch64_parse_extension (const char *str, uint64_t *isa_flags,
 std::string *invalid_extension)
 {
   /* The extension string is parsed left to right.  */
@@ -326,16 +326,18 @@ int opt_ext_cmp (const void* a, const void* b)
  turns on as a dependency.  As an example +dotprod turns on FL_DOTPROD and
  FL_SIMD.  As such the set of bits represented by this option is
  {FL_DOTPROD, FL_SIMD}. */
-  unsigned long total_flags_a = opt_a->flag_canonical & opt_a->flags_on;
-  unsigned long total_flags_b = opt_b->flag_canonical & opt_b->flags_on;
+  uint64_t total_flags_a = opt_a->flag_canonical & opt_a->flags_on;
+  uint64_t total_flags_b = opt_b->flag_canonical & opt_b->flags_on;
   int popcnt_a = popcount_hwi ((HOST_WIDE_INT)total_flags_a);
   int popcnt_b = popcount_hwi ((HOST_WIDE_INT)total_flags_b);
   int order = popcnt_b - popcnt_a;
 
   /* If they have the same amount of bits set, give it a more
- deterministic ordering by using the value of the bits themselves.  */
+ deterministic ordering by using the value of the bits themselves.
+  

RE: [Vectorizer] Add SLP support for masked loads

2019-05-09 Thread Alejandro Martinez Vicente
Hi Richards,

This is the new version of the patch, addressing your comments.

Alejandro

> -Original Message-
> From: Richard Sandiford 
> Sent: 08 May 2019 14:36
> To: Richard Biener 
> Cc: Alejandro Martinez Vicente ; GCC
> Patches ; nd 
> Subject: Re: [Vectorizer] Add SLP support for masked loads
> 
> Richard Biener  writes:
> > On Fri, Apr 26, 2019 at 3:14 PM Richard Sandiford
> >  wrote:
> >>
> >> Alejandro Martinez Vicente 
> writes:
> >> > Hi,
> >> >
> >> > Current vectorizer doesn't support masked loads for SLP. We should
> >> > add that, to allow things like:
> >> >
> >> > void
> >> > f (int *restrict x, int *restrict y, int *restrict z, int n) {
> >> >   for (int i = 0; i < n; i += 2)
> >> > {
> >> >   x[i] = y[i] ? z[i] : 1;
> >> >   x[i + 1] = y[i + 1] ? z[i + 1] : 2;
> >> > }
> >> > }
> >> >
> >> > to be vectorized using contiguous loads rather than LD2 and ST2.
> >> >
> >> > This patch was motivated by SVE, but it is completely generic and
> >> > should apply to any architecture with masked loads.
> >> >
> >> > After the patch is applied, the above code generates this output
> >> > (-march=armv8.2-a+sve -O2 -ftree-vectorize):
> >> >
> >> >  :
> >> >0: 717fcmp w3, #0x0
> >> >4: 540002cdb.le5c 
> >> >8: 51000464sub w4, w3, #0x1
> >> >c: d283mov x3, #0x0// #0
> >> >   10: 9005adrpx5, 0 
> >> >   14: 25d8e3e0ptrue   p0.d
> >> >   18: 53017c84lsr w4, w4, #1
> >> >   1c: 91a5add x5, x5, #0x0
> >> >   20: 11000484add w4, w4, #0x1
> >> >   24: 85c0e0a1ld1rd   {z1.d}, p0/z, [x5]
> >> >   28: 2598e3e3ptrue   p3.s
> >> >   2c: d37ff884lsl x4, x4, #1
> >> >   30: 25a41fe2whilelo p2.s, xzr, x4
> >> >   34: d503201fnop
> >> >   38: a5434820ld1w{z0.s}, p2/z, [x1, x3, lsl #2]
> >> >   3c: 25808c11cmpne   p1.s, p3/z, z0.s, #0
> >> >   40: 25808810cmpne   p0.s, p2/z, z0.s, #0
> >> >   44: a5434040ld1w{z0.s}, p0/z, [x2, x3, lsl #2]
> >> >   48: 05a1c400sel z0.s, p1, z0.s, z1.s
> >> >   4c: e5434800st1w{z0.s}, p2, [x0, x3, lsl #2]
> >> >   50: 04b0e3e3incwx3
> >> >   54: 25a41c62whilelo p2.s, x3, x4
> >> >   58: 5401b.ne38   // b.any
> >> >   5c: d65f03c0ret
> >> >
> >> >
> >> > I tested this patch in an aarch64 machine bootstrapping the
> >> > compiler and running the checks.
> >> >
> >> > Alejandro
> >> >
> >> > gcc/Changelog:
> >> >
> >> > 2019-01-16  Alejandro Martinez  
> >> >
> >> >   * config/aarch64/aarch64-sve.md (copysign3): New
> define_expand.
> >> >   (xorsign3): Likewise.
> >> >   internal-fn.c: Marked mask_load_direct and mask_store_direct as
> >> >   vectorizable.
> >> >   tree-data-ref.c (data_ref_compare_tree): Fixed comment typo.
> >> >   tree-vect-data-refs.c (can_group_stmts_p): Allow masked loads to be
> >> >   combined even if masks different.
> >> >   (slp_vect_only_p): New function to detect masked loads that are 
> >> > only
> >> >   vectorizable using SLP.
> >> >   (vect_analyze_data_ref_accesses): Mark SLP only vectorizable 
> >> > groups.
> >> >   tree-vect-loop.c (vect_dissolve_slp_only_groups): New function to
> >> >   dissolve SLP-only vectorizable groups when SLP has been discarded.
> >> >   (vect_analyze_loop_2): Call vect_dissolve_slp_only_groups when
> needed.
> >> >   tree-vect-slp.c (vect_get_and_check_slp_defs): Check masked loads
> >> >   masks.
> >> >   (vect_build_slp_tree_1): Fixed comment typo.
> >> >   (vect_build_slp_tree_2): Include masks from masked loads in SLP
> tree.
> >> >   tree-vect-stmts.c (vect_get_vec_defs_for_operand): New function to
> get
> >> >   vec_defs for operand with optional SLP and vectype.
> >> >   (vectorizable_load): Allow vectorizaion of masked loads for SLP 
> >> > only.
> >> >   tree-vectorizer.h (_stmt_vec_info): Added flag for SLP-only
> >> >   vectorizable.
> >> >   tree-vectorizer.c (vec_info::new_stmt_vec_info): Likewise.
> >> >
> >> > gcc/testsuite/Changelog:
> >> >
> >> > 2019-01-16  Alejandro Martinez  
> >> >
> >> >   * gcc.target/aarch64/sve/mask_load_slp_1.c: New test for SLP
> >> >   vectorized masked loads.
> >> >
> >> > diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c index
> >> > 4f2ef45..67eee59 100644
> >> > --- a/gcc/internal-fn.c
> >> > +++ b/gcc/internal-fn.c
> >> > @@ -100,11 +100,11 @@ init_internal_fns ()
> >> >  /* Create static initializers for the information returned by
> >> > direct_internal_fn.  */
> >> >  #define not_direct { -2, -2, false } -#define mask_load_direct {
> >> > -1, 2, false }
> >> > +#define mask_load_direct { -1, 2, true }
> >> >  #define load_lanes_direct { -1, -1, false }  #define
> >> > mask_load_lanes_direct { -1, -1, false }  #define
> >> 

Re: [RFC] SHIFT_COUNT_TRUNCATED and shift_truncation_mask

2019-05-09 Thread Segher Boessenkool
[ Please Cc: me on combine patches.  Thanks! ]

On Thu, May 09, 2019 at 01:52:58PM +0200, Robin Dapp wrote:
> I had already finished all the patterns when I realized that
> SHIFT_COUNT_TRUNCATED and the target hook shift_truncation_mask already
> exist and could do what is needed without extra patterns.  Just defining
>  shift_truncation_mask was not enough though as most of the additional
> insns get introduced by combine.

SHIFT_COUNT_TRUNCATED says that *all* of your (variable) shifts truncate
the shift count in a particular way.  targetm.shift_truncation_mask says
that *all* of your variable shifts in a particular mode are done in a
particular way.  Neither is true, for some (many?) targets.

For example, on Power, if done in the integer registers, simple shifts
use one bit more than the datum rotated (this works for rotates as well
of course).  In the vector registers, you get a mask of just the size
of the datum however.

If it is not a simple shift, but a shift-and-mask, *nothing* works: for
example, a DImode shift-and-mask with certain arguments is actually
implemented with a 32-bit instruction (but many use a 64-bit instruction).

It really depends on the machine instruction what the behaviour is, what
the mask has to be; and for that at a minimum you need to know the icode,
but many patterns have different machine insns in their alternatives, too.

> While the attached patch might probably work for s390, it will probably
> not for other targets.  In addition to what my patch does, would it be
> useful to unify both truncation methods in a target hook that takes the
> operation (shift, rotate, zero_extract, ...) as well as the mode as
> arguments?  Thus, we would let the target decide what to do with the
> specific combination of both.  Maybe this would also allow to
> distinguish bit test operations from the rest.

Will this be useful for many targets?  That's the question for all new
hooks that aren't needed, that are just a nicety.  It's a cost for
everyone, not just your target, which is a bad tradeoff if it doesn't
help enough targets, and helps enough.

> @@ -12295,7 +12293,7 @@ simplify_comparison (enum rtx_code code, rtx *pop0, 
> rtx *pop1)
>between the position and the location of the single bit.  */
> /* Except we can't if SHIFT_COUNT_TRUNCATED is set, since we might
>have already reduced the shift count modulo the word size.  */
> -   if (!SHIFT_COUNT_TRUNCATED
> +   if ((!SHIFT_COUNT_TRUNCATED || !targetm.shift_truncation_mask (mode))

Please make the (default) hook implementation use the macro, instead?

> diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
> index ad8eacdf4dc..1d723f29e1e 100644
> --- a/gcc/config/s390/s390.c
> +++ b/gcc/config/s390/s390.c
> @@ -2320,6 +2320,7 @@ s390_single_part (rtx op,
>   part = i;
>   }
>  }
> +

No spurious whitespace changes please.  There is a lot that can be
improved or should be fixed, but let's not do that one by one, and not
in random patches.


Segher


Re: [RFC] SHIFT_COUNT_TRUNCATED and shift_truncation_mask

2019-05-09 Thread Segher Boessenkool
On Thu, May 09, 2019 at 09:59:55AM -0600, Jeff Law wrote:
> The main reason it's discouraged is because some targets have insns
> where the count would be truncated and others where it would not.  So,
> for example, normal shifts might truncate but vector shifts might not
> (mips), or shifts might truncate but bit tests do not (x86).

And don't forget about zero_extract, on targets that are unfortunate
enough to have that.

> I don't know enough about the s390 architecture to know if there's any
> corner cases.  You'd have to look at every pattern in your machine
> description with a shift and verify that it's going to DTRT if the count
> hasn't been truncated.

Yeah.

> It would really help if you could provide testcases which show the
> suboptimal code and any analysis you've done.

I'd love to see this, too!


Segher


[committed] sel-sched: allow negative insn priority (PR 88879)

2019-05-09 Thread Alexander Monakov
Hi,

this fixes PR 88879 by mostly restoring the previous behavior (accepting
negative priority values).

Approved by Andrey off-list, applying.

PR rtl-optimization/88879
* sel-sched.c (sel_target_adjust_priority): Remove assert.

Index: sel-sched.c
===
--- sel-sched.c (revision 271038)
+++ sel-sched.c (working copy)
@@ -3331,8 +3331,6 @@
   else
 new_priority = priority;
 
-  gcc_assert (new_priority >= 0);
-
   /* If the priority has changed, adjust EXPR_PRIORITY_ADJ accordingly.  */
   EXPR_PRIORITY_ADJ (expr) = new_priority - EXPR_PRIORITY (expr);
 



Re: [Patch] Fix ix86_expand_sse_comi_round (PR Target/89750, PR Target/86444)

2019-05-09 Thread Jeff Law
On 5/6/19 11:38 PM, Hongtao Liu wrote:
> Hi Uros and GCC:
>   This patch is to fix ix86_expand_sse_comi_round whose implementation
> was not correct.
>   The new implementation aligns with _mm_cmp_round_s[sd]_mask.
> 
> Bootstrap and regression tests for x86 is fine.
> Ok for trunk?
> 
> 
> ChangeLog:
> gcc/
>* config/i386/i386-expand.c (ix86_expand_sse_comi_round):
>Modified, original implementation isn't correct.
> 
> gcc/testsuite
>* gcc.target/i386/avx512f-vcomisd-2.c: New runtime tests.
>* gcc.target/i386/avx512f-vcomiss-2.c: Likewise.
So you'll have to bear with me, I'm not really familiar with this code,
but in the absence of a maintainer I'll try to work through it.


> 
> -- BR, Hongtao
> 
> 
> 0001-Fix-ix86_expand_sse_comi_round.patch
> 
> Index: gcc/ChangeLog
> ===
> --- gcc/ChangeLog (revision 270933)
> +++ gcc/ChangeLog (working copy)
> @@ -1,3 +1,11 @@
> +2019-05-06  H.J. Lu  
> + Hongtao Liu  
> +
> + PR Target/89750
> + PR Target/86444
> + * config/i386/i386-expand.c (ix86_expand_sse_comi_round):
> + Modified, original implementation isn't correct.
> +
>  2019-05-06  Segher Boessenkool  
>  
>   * config/rs6000/rs6000.md (FIRST_ALTIVEC_REGNO, LAST_ALTIVEC_REGNO)
> Index: gcc/config/i386/i386-expand.c
> ===
> --- gcc/config/i386/i386-expand.c (revision 270933)
> +++ gcc/config/i386/i386-expand.c (working copy)
> @@ -9853,18 +9853,24 @@
>const struct insn_data_d *insn_p = &insn_data[icode];
>machine_mode mode0 = insn_p->operand[0].mode;
>machine_mode mode1 = insn_p->operand[1].mode;
> -  enum rtx_code comparison = UNEQ;
> -  bool need_ucomi = false;
>  
>/* See avxintrin.h for values.  */
> -  enum rtx_code comi_comparisons[32] =
> +  static const enum rtx_code comparisons[32] =
So I assume the comment refers to the _CMP_* #defines in avxintrin.h?


>  {
> -  UNEQ, GT, GE, UNORDERED, LTGT, UNLE, UNLT, ORDERED, UNEQ, UNLT,
> -  UNLE, LT, LTGT, GE, GT, LT, UNEQ, GT, GE, UNORDERED, LTGT, UNLE,
> -  UNLT, ORDERED, UNEQ, UNLT, UNLE, LT, LTGT, GE, GT, LT
> +  EQ, LT, LE, UNORDERED, NE, UNGE, UNGT, ORDERED,
> +  EQ, UNLT, UNLE, UNORDERED, LTGT, GE, GT, ORDERED,
> +  EQ, LT, LE, UNORDERED, NE, UNGE, UNGT, ORDERED,
> +  EQ, UNLT, UNLE, UNORDERED, LTGT, GE, GT, ORDERED
>  };

For CMP_EQ_UQ aren't we looking for an unordered comparison, so UNEQ
seems right, but you're using EQ.  Can you double-check this?  If it's
wrong, then please make sure we cover this case with a test.




> @@ -9932,11 +10021,37 @@
>  }
>  
>emit_insn (pat);
> +
> +  /* XXX: Set CCFPmode and check a different CCmode.  Does it work
> + correctly?  */
> +  if (GET_MODE (set_dst) != mode)
> +set_dst = gen_rtx_REG (mode, REGNO (set_dst));
This looks worrisome, even without the cryptic comment.  I don't think
you can just blindly change the mode like that.  Unless you happen to
know that the only things you test in the new mode were set in precisely
the same way as the old mode.

Jeff


[doc, committed] Remove stale reference to FOR_EACH_LOOP_BREAK

2019-05-09 Thread Bill Schmidt
Hi,

We removed FOR_EACH_LOOP_BREAK from the compiler six years ago, but it still
shows up in the internals manual.  Fix that.

Tested, committed as obvious.

Thanks,
Bill


2019-05-09  Bill Schmidt  

* doc/loop.texi: Remove reference to FOR_EACH_LOOP_BREAK.


Index: gcc/doc/loop.texi
===
--- gcc/doc/loop.texi   (revision 269976)
+++ gcc/doc/loop.texi   (working copy)
@@ -86,10 +86,7 @@ the direction of traversal and the set of loops vi
 guaranteed to be visited exactly once, regardless of the changes to the
 loop tree, and the loops may be removed during the traversal.  The newly
 created loops are never traversed, if they need to be visited, this
-must be done separately after their creation.  The @code{FOR_EACH_LOOP}
-macro allocates temporary variables.  If the @code{FOR_EACH_LOOP} loop
-were ended using break or goto, they would not be released;
-@code{FOR_EACH_LOOP_BREAK} macro must be used instead.
+must be done separately after their creation.
 
 Each basic block contains the reference to the innermost loop it belongs
 to (@code{loop_father}).  For this reason, it is only possible to have



Re: [PATCH][PR90106] Builtin call transformation changes in cdce pass

2019-05-09 Thread Jeff Law
On 5/8/19 8:25 PM, JunMa wrote:
> 在 2019/5/9 上午9:20, Bin.Cheng 写道:
>> On Thu, May 9, 2019 at 5:31 AM Jeff Law  wrote:
>>> On 5/8/19 6:28 AM, Richard Biener wrote:
 On Wed, May 8, 2019 at 12:09 PM JunMa  wrote:
> Hi
>
> As PR90106 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90106),
> when gcc meets builtin function call like:
>
>     y = sqrt (x);
>
> The cdce pass tries to transform the call into an internal function
> call, conditionally executing the original call, with a simple range
> check on the arguments that detects most cases where errno does not
> need to be set. It looks like:
>
>     y = IFN_SQRT (x);
>     if (__builtin_isless (x, 0))
>   sqrt (x);
>
> However, if the call is in tail position, for example:
>
>     y =  sqrt (x);
>     return y;
>
> will become:
>
>     y = IFN_SQRT (x);
>     if (__builtin_isless (x, 0))
>   sqrt (x);
>     return y;
>
> This transformation breaks the tailcall pattern, and prevents
> later tailcall optimizations.
>
> So this patch transforms builtin calls with a return value into an
> if-then-else form, which looks like:
>
>     y =  sqrt (x);
>  ==>
>     if (__builtin_isless (x, 0))
>   y = sqrt (x);
>     else
>   y = IFN_SQRT (x);
>
> BTW, y = sqrt (x) can also be transformed like:
>
>     y = IFN_SQRT (x);
>     if (__builtin_isless (x, 0))
>   y = sqrt (x);
>
> We don't choose this pattern because it emits worse assembly
> code (more move instructions and uses more registers) on x86_64.
>
> Bootstrapped/regtested on x86_64-linux, ok for trunk?
 OK.
>>> JunMa -- do you have a copyright assignment on file and write access to
>>> the repository?  If not we should take care of that before proceeding
>>> further.
>> Hi Jeff,
>> Thanks very much for helping.
>> Yes, he is under Alibaba's copyright assignment.  What else should we
>> do other than noticing in this mailing list message?
>> BTW, I think JunMa needs to apply write-access though.
> 
> Yes, I do not have write access, FYI.
OK.  So in my response to Bin I gave you a link to the form to fill out
to get write access.  List me as your sponsor.  Once that's all taken
care of and working, go ahead and commit this change to the trunk.

Jeff


Merge from trunk to gccgo branch

2019-05-09 Thread Ian Lance Taylor
I merged trunk revision 271040 to the gccgo branch.

Ian


Go patch committed: Avoid copy for string([]byte) conversion in map keys

2019-05-09 Thread Ian Lance Taylor
This patch to the Go frontend by Cherry Zhang avoids a copy for a
string([]byte) conversion used as a map key.  If a string([]byte)
conversion is used immediately as a key for a map read, we don't need
to copy the backing store of the byte slice, as mapaccess does not
keep a reference to it.

The gc compiler does more than this: it also avoids the copy if the
map key is a composite literal that contains the conversion as a
field, like T{ ... { ..., string(b), ... }, ... }. For now, we just
optimize the simple case, which is probably most common.

Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
to mainline.

Ian

2019-05-09  Cherry Zhang  

* go.dg/mapstring.go: New test.
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 271021)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-9c8581187b1c1a30036263728370f31cb846a274
+3dbf51c01c5d0acbf9ae47f77166fa9935881749
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/expressions.cc
===
--- gcc/go/gofrontend/expressions.cc(revision 271021)
+++ gcc/go/gofrontend/expressions.cc(working copy)
@@ -12158,6 +12158,13 @@ Map_index_expression::do_flatten(Gogo* g
   return Expression::make_error(loc);
 }
 
+  // Avoid copy for string([]byte) conversions used in map keys.
+  // mapaccess doesn't keep the reference, so this is safe.
+  Type_conversion_expression* ce = this->index_->conversion_expression();
+  if (ce != NULL && ce->type()->is_string_type()
+  && ce->expr()->type()->is_slice_type())
+ce->set_no_copy(true);
+
   if (!Type::are_identical(mt->key_type(), this->index_->type(),
   Type::COMPARE_ERRORS | Type::COMPARE_TAGS,
   NULL))
Index: gcc/go/gofrontend/statements.cc
===
--- gcc/go/gofrontend/statements.cc (revision 270993)
+++ gcc/go/gofrontend/statements.cc (working copy)
@@ -1307,6 +1307,13 @@ Tuple_map_assignment_statement::do_lower
   if (map_type == NULL)
 return Statement::make_error_statement(loc);
 
+  // Avoid copy for string([]byte) conversions used in map keys.
+  // mapaccess doesn't keep the reference, so this is safe.
+  Type_conversion_expression* ce = map_index->index()->conversion_expression();
+  if (ce != NULL && ce->type()->is_string_type()
+  && ce->expr()->type()->is_slice_type())
+ce->set_no_copy(true);
+
   Block* b = new Block(enclosing, loc);
 
   // Move out any subexpressions to make sure that functions are
Index: gcc/testsuite/go.dg/mapstring.go
===
--- gcc/testsuite/go.dg/mapstring.go(nonexistent)
+++ gcc/testsuite/go.dg/mapstring.go(working copy)
@@ -0,0 +1,11 @@
+// { dg-do compile }
+// { dg-options "-fgo-debug-optimization" }
+
+package p
+
+func F(m map[string]int, a, b []byte) int {
+   x := m[string(a)] // { dg-error "no copy string\\(\\\[\\\]byte\\)" }
+   y, ok := m[string(b)] // { dg-error "no copy string\\(\\\[\\\]byte\\)" }
+   _ = ok
+   return x + y
+}


Re: [PATCH] libphobos: RISC-V: Fix soft-float build errors with IEEE exception flags

2019-05-09 Thread Maciej W. Rozycki
Iain, Jakub --

> > > >  With all of the above in mind, OK to apply to trunk and to GCC 9?
> > >
> > > Looks OK to me.
> >
> > The change is ok for 9.1 if it can be committed to gcc-9-branch today.
> 
> Committed to both trunk and gcc-9-branch.

 Thank you both for the review.  However, Iain, in the future, when 
committing on my behalf rather than letting me do it myself, would you 
please do use the commit message I have prepared and be duly diligent 
with respect to the details I have provided?  Thank you.

 Kind regards,

  Maciej W. Rozycki

Re: [RFC] SHIFT_COUNT_TRUNCATED and shift_truncation_mask

2019-05-09 Thread Uros Bizjak
>> Even SHIFT_COUNT_TRUNCATED is no perfect match for what our hardware
>> does because we always only consider the last 6 bits of a shift operand.
>>
>> Despite all the warnings in the other backends, most notably
>> SHIFT_COUNT_TRUNCATED being "discouraged" as mentioned in riscv.h, I
>> wrote the attached tentative patch.  It's a little ad-hoc, uses the
>> SHIFT_COUNT_TRUNCATED paths only if shift_truncation_mask () != 0 and,
>> instead of truncating via & (GET_MODE_BITSIZE (mode) - 1), it applies
>> the mask returned by shift_truncation_mask.  Doing so, usage of both
>> "methods" actually reduces to a single way.
> The main reason it's discouraged is because some targets have insns
> where the count would be truncated and others where it would not.  So
> for example normal shifts might truncate, but vector shifts might not
> (mips), or shifts might truncate but bit tests do not (x86).

Bit tests on x86 also truncate [1], if the bit base operand specifies
a register, and we don't use BT with a memory location as a bit base.
I don't know what is referred with "(real or pretended) bit field
operations" in the documentation for SHIFT_COUNT_TRUNCATED:

 However, on some machines, such as the 80386 and the 680x0,
 truncation only applies to shift operations and not the (real or
 pretended) bit-field operations.  Define 'SHIFT_COUNT_TRUNCATED' to

Vector shifts don't truncate on x86, so x86 probably shares the same
destiny with MIPS. Maybe a machine mode argument could be passed to
SHIFT_COUNT_TRUNCATED to distinguish modes that truncate from modes
that don't.

[1] https://www.felixcloutier.com/x86/bt

Uros.


Re: [PATCH] netbsd EABI support

2019-05-09 Thread coypu
On Tue, Apr 09, 2019 at 05:36:47PM +0100, Richard Earnshaw (lists) wrote:
> > So we're well into stage4 which means technically it's too late for
> > something like this.  However, given it's limited scope I won't object
> > if the ARM port maintainers want to go forward.  Otherwise I'll queue it
> > for gcc-10.
> > 
> > jeff
> > 
> 
> I was about to approve this (modulo removing the now obsolete
> FPU_DEFAULT macro), until I noticed that it also modifies the generic
> NetBSD code as well.  I'm certainly not willing to approve that myself
> at this late stage, but if one of the NetBSD OS maintainers wants to
> step up and do so, I'll happily take the Arm back-end code as that's not
> a primary or secondary target.
> 
> R.

Congrats on a new release :-)
ping


Re: [PATCH] libphobos: RISC-V: Fix soft-float build errors with IEEE exception flags

2019-05-09 Thread Maciej W. Rozycki
On Mon, 6 May 2019, Jim Wilson wrote:

> >  Hmm, I've been thinking a little bit about this hybrid mode and I have
> > one question: how do we pass the IEEE rounding mode setting between `fcsr'
> > and softfp where we have `-march=rv32imafc -mabi=ilp32' and
> > `-march=rv32imac -mabi=ilp32' object modules interlinked?
> 
> If you look at libgcc/config/riscv/sfp-machine.h you will see
> definitions of FP_INIT_ROUNDMODE and FP_HANDLE_EXCEPTIONS for reading
> rounding mode from the fcsr before soft-float FP operations, and
> storing the exception flags back into fcsr after soft-float FP
> operations, if FP regs exist..  Whether this actually works, I don't
> know, I haven't tested it.

 I've looked at the code and it looks like it should do what's intended 
here.

>  I think in practice people using
> soft-float libraries are probably not relying on rounding modes and
> exception flags much if at all, so this is unlikely to be a problem.

 However I gather the intent for the hybrid mode has been to permit people 
to seamlessly interlink hard-float and soft-float code, perhaps libraries 
that have only been supplied in a binary form (whether we like it or not) 
and cannot therefore be rebuilt for the other FP model.  As such I think 
we should ensure that consistent operation has been architected for the 
reasonable scenarios, such that if any problem arises, then it's 
at most a bug and not a design fault.  It does appear to me we don't have 
a design fault here, so no need to worry.

> If someone does find a problem, I will worry about how to fix it then.

 Sure.  I just wanted to double-check we don't have an ABI issue here.
 
  Maciej

[PATCH] combine: Don't generate IF_THEN_ELSE

2019-05-09 Thread Segher Boessenkool
On all targets I managed to test (21) this results in better code.  Only
alpha ends up with slightly bigger code.

Committing to trunk.


Segher


2019-05-10  Segher Boessenkool  

* combine.c (combine_simplify_rtx): Don't make IF_THEN_ELSE RTL.

---
 gcc/combine.c | 8 
 1 file changed, 8 deletions(-)

diff --git a/gcc/combine.c b/gcc/combine.c
index 7b236225..8c4375f 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -5937,14 +5937,6 @@ combine_simplify_rtx (rtx x, machine_mode op0_mode, int 
in_dest,
 mode, VOIDmode,
 cond, cop1),
mode);
- else
-   return gen_rtx_IF_THEN_ELSE (mode,
-simplify_gen_relational (cond_code,
- mode,
- VOIDmode,
- cond,
- cop1),
-true_rtx, false_rtx);
 
  code = GET_CODE (x);
  op0_mode = VOIDmode;
-- 
1.8.3.1



Re: [PATCH] [MIPS] Fix PR/target 90357 20080502-1.c -O0 start with r269880

2019-05-09 Thread Paul Hua
ping ?

On Mon, May 6, 2019 at 4:34 PM Paul Hua  wrote:
>
> The attached patch fix pr90357, bootstraped and regressed test on
> mips64el-linux-gnu target.
> Ok for commit ?


Re: [PATCH][PR90106] Builtin call transformation changes in cdce pass

2019-05-09 Thread JunMa

在 2019/5/10 上午4:00, Jeff Law 写道:

On 5/8/19 8:25 PM, JunMa wrote:

在 2019/5/9 上午9:20, Bin.Cheng 写道:

On Thu, May 9, 2019 at 5:31 AM Jeff Law  wrote:

On 5/8/19 6:28 AM, Richard Biener wrote:

On Wed, May 8, 2019 at 12:09 PM JunMa  wrote:

Hi

As PR90106 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90106),
when gcc meets builtin function call like:

     y = sqrt (x);

The cdce pass tries to transform the call into an internal function
call, conditionally executing the original call, with a simple range
check on the arguments that detects most cases where errno does not
need to be set. It looks like:

     y = IFN_SQRT (x);
     if (__builtin_isless (x, 0))
   sqrt (x);

However, if the call is in tail position, for example:

     y =  sqrt (x);
     return y;

will become:

     y = IFN_SQRT (x);
     if (__builtin_isless (x, 0))
   sqrt (x);
     return y;

This transformation breaks the tailcall pattern, and prevents
later tailcall optimizations.

So this patch transforms builtin calls with a return value into an
if-then-else form, which looks like:

     y =  sqrt (x);
  ==>
     if (__builtin_isless (x, 0))
   y = sqrt (x);
     else
   y = IFN_SQRT (x);

BTW, y = sqrt (x) can also be transformed like:

     y = IFN_SQRT (x);
     if (__builtin_isless (x, 0))
   y = sqrt (x);

We don't choose this pattern because it emits worse assembly
code (more move instructions and uses more registers) on x86_64.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

OK.

JunMa -- do you have a copyright assignment on file and write access to
the repository?  If not we should take care of that before proceeding
further.

Hi Jeff,
Thanks very much for helping.
Yes, he is under Alibaba's copyright assignment.  What else should we
do other than noticing in this mailing list message?
BTW, I think JunMa needs to apply write-access though.

Yes, I do not have write access, FYI.

OK.  So in my response to Bin I gave you a link to the form to fill out
to get write access.  List me as your sponsor.  Once that's all taken
care of and working, go ahead and commit this change to the trunk.

Thanks!
JunMa

Jeff





Re: [PATCH 1/2] or1k: Fix code quality for volatile memory loads

2019-05-09 Thread Bernhard Reutner-Fischer
On 6 May 2019 15:16:20 CEST, Stafford Horne  wrote:
>Volatile memory does not match the memory_operand predicate.  This
>causes extra extend/mask instructions when reading
>from volatile memory.  On OpenRISC loading volitile memory can be

s/volitile/volatile/g

also at least in the test.
Thanks,


>diff --git a/gcc/testsuite/gcc.target/or1k/swap-2.c
>b/gcc/testsuite/gcc.target/or1k/swap-2.c
>new file mode 100644
>index 000..8ddea4e659f
>--- /dev/null
>+++ b/gcc/testsuite/gcc.target/or1k/swap-2.c

>+/* Check to ensure the volitile load does not get zero extended.  */



Re: [Patch] Fix ix86_expand_sse_comi_round (PR Target/89750, PR Target/86444)

2019-05-09 Thread Hongtao Liu
On Fri, May 10, 2019 at 3:55 AM Jeff Law  wrote:
>
> On 5/6/19 11:38 PM, Hongtao Liu wrote:
> > Hi Uros and GCC:
> >   This patch is to fix ix86_expand_sse_comi_round whose implementation
> > was not correct.
> >   The new implementation aligns with _mm_cmp_round_s[sd]_mask.
> >
> > Bootstrap and regression tests for x86 is fine.
> > Ok for trunk?
> >
> >
> > ChangeLog:
> > gcc/
> >* config/i386/i386-expand.c (ix86_expand_sse_comi_round):
> >Modified, original implementation isn't correct.
> >
> > gcc/testsuite
> >* gcc.target/i386/avx512f-vcomisd-2.c: New runtime tests.
> >* gcc.target/i386/avx512f-vcomisd-2.c: Likewise.
> So you'll have to bear with me, I'm not really familiar with this code,
> but in the absence of a maintainer I'll try to work through it.
>
>
> >
> > -- BR, Hongtao
> >
> >
> > 0001-Fix-ix86_expand_sse_comi_round.patch
> >
> > Index: gcc/ChangeLog
> > ===
> > --- gcc/ChangeLog (revision 270933)
> > +++ gcc/ChangeLog (working copy)
> > @@ -1,3 +1,11 @@
> > +2019-05-06  H.J. Lu  
> > + Hongtao Liu  
> > +
> > + PR Target/89750
> > + PR Target/86444
> > + * config/i386/i386-expand.c (ix86_expand_sse_comi_round):
> > + Modified, original implementation isn't correct.
> > +
> >  2019-05-06  Segher Boessenkool  
> >
> >   * config/rs6000/rs6000.md (FIRST_ALTIVEC_REGNO, LAST_ALTIVEC_REGNO)
> > Index: gcc/config/i386/i386-expand.c
> > ===
> > --- gcc/config/i386/i386-expand.c (revision 270933)
> > +++ gcc/config/i386/i386-expand.c (working copy)
> > @@ -9853,18 +9853,24 @@
> >const struct insn_data_d *insn_p = &insn_data[icode];
> >machine_mode mode0 = insn_p->operand[0].mode;
> >machine_mode mode1 = insn_p->operand[1].mode;
> > -  enum rtx_code comparison = UNEQ;
> > -  bool need_ucomi = false;
> >
> >/* See avxintrin.h for values.  */
> > -  enum rtx_code comi_comparisons[32] =
> > +  static const enum rtx_code comparisons[32] =
> So I assume the comment refers to the _CMP_* #defines in avxintrin.h?
>
  Yes.
>
> >  {
> > -  UNEQ, GT, GE, UNORDERED, LTGT, UNLE, UNLT, ORDERED, UNEQ, UNLT,
> > -  UNLE, LT, LTGT, GE, GT, LT, UNEQ, GT, GE, UNORDERED, LTGT, UNLE,
> > -  UNLT, ORDERED, UNEQ, UNLT, UNLE, LT, LTGT, GE, GT, LT
> > +  EQ, LT, LE, UNORDERED, NE, UNGE, UNGT, ORDERED,
> > +  EQ, UNLT, UNLE, UNORDERED, LTGT, GE, GT, ORDERED,
> > +  EQ, LT, LE, UNORDERED, NE, UNGE, UNGT, ORDERED,
> > +  EQ, UNLT, UNLE, UNORDERED, LTGT, GE, GT, ORDERED
> >  };
>
> For CMP_EQ_UQ aren't we looking for an unordered comparison, so UNEQ
> seems right, but you're using EQ.  Can you double-check this?  If it's
> wrong, then please make sure we cover this case with a test.
>
avx512f-vcomis[sd]-2.c covers all 32 compare predicates.
UNEQ and EQ behave differently when either operand is NaN; otherwise
they're the same.  Since NaN operands are handled separately, EQ/UNEQ
makes no difference, which is why the tests pass.
I'll correct it.
>
>
>
> > @@ -9932,11 +10021,37 @@
> >  }
> >
> >emit_insn (pat);
> > +
> > +  /* XXX: Set CCFPmode and check a different CCmode.  Does it work
> > + correctly?  */
> > +  if (GET_MODE (set_dst) != mode)
> > +set_dst = gen_rtx_REG (mode, REGNO (set_dst));
> This looks worrisome, even without the cryptic comment.  I don't think
> you can just blindly change the mode like that.  Unless you happen to
> know that the only things you test in the new mode were set in precisely
> the same way as the old mode.
>
Modified as:
+  /* NB: Set CCFPmode and check a different CCmode.  */
+  if (GET_MODE (set_dst) != mode)
+set_dst = gen_rtx_REG (mode, FLAGS_REG);

> Jeff



-- 
BR,
Hongtao


Re: [Patch] Fix ix86_expand_sse_comi_round (PR Target/89750, PR Target/86444)

2019-05-09 Thread Hongtao Liu
On Fri, May 10, 2019 at 12:54 PM Hongtao Liu  wrote:
>
> On Fri, May 10, 2019 at 3:55 AM Jeff Law  wrote:
> >
> > On 5/6/19 11:38 PM, Hongtao Liu wrote:
> > > Hi Uros and GCC:
> > >   This patch is to fix ix86_expand_sse_comi_round whose implementation
> > > was not correct.
> > >   The new implementation aligns with _mm_cmp_round_s[sd]_mask.
> > >
> > > Bootstrap and regression tests for x86 is fine.
> > > Ok for trunk?
> > >
> > >
> > > ChangeLog:
> > > gcc/
> > >* config/i386/i386-expand.c (ix86_expand_sse_comi_round):
> > >Modified, original implementation isn't correct.
> > >
> > > gcc/testsuite
> > >* gcc.target/i386/avx512f-vcomisd-2.c: New runtime tests.
> > >* gcc.target/i386/avx512f-vcomisd-2.c: Likewise.
> > So you'll have to bear with me, I'm not really familiar with this code,
> > but in the absence of a maintainer I'll try to work through it.
> >
> >
> > >
> > > -- BR, Hongtao
> > >
> > >
> > > 0001-Fix-ix86_expand_sse_comi_round.patch
> > >
> > > Index: gcc/ChangeLog
> > > ===
> > > --- gcc/ChangeLog (revision 270933)
> > > +++ gcc/ChangeLog (working copy)
> > > @@ -1,3 +1,11 @@
> > > +2019-05-06  H.J. Lu  
> > > + Hongtao Liu  
> > > +
> > > + PR Target/89750
> > > + PR Target/86444
> > > + * config/i386/i386-expand.c (ix86_expand_sse_comi_round):
> > > + Modified, original implementation isn't correct.
> > > +
> > >  2019-05-06  Segher Boessenkool  
> > >
> > >   * config/rs6000/rs6000.md (FIRST_ALTIVEC_REGNO, LAST_ALTIVEC_REGNO)
> > > Index: gcc/config/i386/i386-expand.c
> > > ===
> > > --- gcc/config/i386/i386-expand.c (revision 270933)
> > > +++ gcc/config/i386/i386-expand.c (working copy)
> > > @@ -9853,18 +9853,24 @@
> > >const struct insn_data_d *insn_p = &insn_data[icode];
> > >machine_mode mode0 = insn_p->operand[0].mode;
> > >machine_mode mode1 = insn_p->operand[1].mode;
> > > -  enum rtx_code comparison = UNEQ;
> > > -  bool need_ucomi = false;
> > >
> > >/* See avxintrin.h for values.  */
> > > -  enum rtx_code comi_comparisons[32] =
> > > +  static const enum rtx_code comparisons[32] =
> > So I assume the comment refers to the _CMP_* #defines in avxintrin.h?
> >
>   Yes.
> >
> > >  {
> > > -  UNEQ, GT, GE, UNORDERED, LTGT, UNLE, UNLT, ORDERED, UNEQ, UNLT,
> > > -  UNLE, LT, LTGT, GE, GT, LT, UNEQ, GT, GE, UNORDERED, LTGT, UNLE,
> > > -  UNLT, ORDERED, UNEQ, UNLT, UNLE, LT, LTGT, GE, GT, LT
> > > +  EQ, LT, LE, UNORDERED, NE, UNGE, UNGT, ORDERED,
> > > +  EQ, UNLT, UNLE, UNORDERED, LTGT, GE, GT, ORDERED,
> > > +  EQ, LT, LE, UNORDERED, NE, UNGE, UNGT, ORDERED,
> > > +  EQ, UNLT, UNLE, UNORDERED, LTGT, GE, GT, ORDERED
> > >  };
> >
> > For CMP_EQ_UQ aren't we looking for an unordered comparison, so UNEQ
> > seems right, but you're using EQ.  Can you double-check this?  If it's
> > wrong, then please make sure we cover this case with a test.
> >
> avx512f-vcomis[sd]-2.c covers all 32 compare predicates.
> UNEQ and EQ behave differently when either operand is NaN; otherwise
> they're the same.  Since NaN operands are handled separately, EQ/UNEQ
> makes no difference, which is why the tests pass.
> I'll correct it.
> >
> >
> >
> > > @@ -9932,11 +10021,37 @@
> > >  }
> > >
> > >emit_insn (pat);
> > > +
> > > +  /* XXX: Set CCFPmode and check a different CCmode.  Does it work
> > > + correctly?  */
> > > +  if (GET_MODE (set_dst) != mode)
> > > +set_dst = gen_rtx_REG (mode, REGNO (set_dst));
> > This looks worrisome, even without the cryptic comment.  I don't think
> > you can just blindly change the mode like that.  Unless you happen to
> > know that the only things you test in the new mode were set in precisely
> > the same way as the old mode.
> >
> Modified as:
> +  /* NB: Set CCFPmode and check a different CCmode.  */
> +  if (GET_MODE (set_dst) != mode)
> +set_dst = gen_rtx_REG (mode, FLAGS_REG);
>
> > Jeff
>
>
>
> --
> BR,
> Hongtao

Update patch.

-- 
BR,
Hongtao
Index: gcc/ChangeLog
===
--- gcc/ChangeLog	(revision 271051)
+++ gcc/ChangeLog	(working copy)
@@ -316,6 +316,14 @@
 	* tree-ssa-phiopt.c (two_value_replacement): Fix a typo in parameter
 	detection.
 
+2019-05-06  H.J. Lu  
+	Hongtao Liu  
+
+	PR target/89750
+	PR target/86444
+	* config/i386/i386-expand.c (ix86_expand_sse_comi_round):
+	Modified, original implementation isn't correct.
+
 2019-05-06  Segher Boessenkool  
 
 	* config/rs6000/rs6000.md (FIRST_ALTIVEC_REGNO, LAST_ALTIVEC_REGNO)
Index: gcc/config/i386/i386-expand.c
===
--- gcc/config/i386/i386-expand.c	(revision 271051)
+++ gcc/config/i386/i386-expand.c	(working copy)
@@ -9877,18 +9877,24 @@
   const struct insn_data_d *insn_p = &insn_data[icode];
   machine_mode mode0 =

Deque code cleanup and optimizations

2019-05-09 Thread François Dumont

Hi

    This patch implements a number of simplifications and optimizations 
already done for other containers like std::vector:


- Default default and move constructors

- The std::swap optimization

- The construction always equal allocator optimization

- Shortcuts on method calls.

    I removed several _M_map != nullptr checks because in the current 
implementation it can't be null. I have several patches following this 
one to support a null _M_map, but in that case we will be using a 
different code path.


    Now that we take shortcuts in several methods in C++11, some of 
them are simply unused in C++11 mode. For the moment I kept them, as 
long as we are not in versioned namespace scope, in order to maintain 
ABI compatibility. I wonder if I really need to consider ABI 
backward compatibility here?


    * include/bits/stl_deque.h (_Deque_base(_Deque_base&&, false_type)):
    Make private.
    (_Deque_base(_Deque_base&&, true_type)): Likewise. Remove _M_map check.
    (_Deque_base(_Deque_base&&, const allocator_type&)): New.
    (_Deque_base(_Deque_base&&, const allocator_type&, size_t)): Remove
    _M_map check.
    (_Deque_base::_Deque_impl_data): New.
    (_Deque_base::_Deque_impl): Inherit latter.
    (_Deque_base::_Deque_impl::_M_swap_data): Move...
    (_Deque_base::_Deque_impl_data::_M_swap_data): ... here.
    (_Deque_base::_Deque_impl()): Add noexcept qualification.
    (_Deque_base::_Deque_impl(_Deque_impl&&, _Tp_alloc_type&&)): New.
    (_Deque_base::_Deque_impl::_M_get_Tp_allocator()): Remove static_cast.
    (_Deque_base::_Deque_impl::_M_move_impl()): Remove _M_impl._M_map 
check.

    (deque<>::deque()): Default.
    (deque<>::deque(deque&&)): Default.
    (deque<>::deque(deque&&, const allocator_type&, false_type)): New.
    (deque<>::deque(deque&&, const allocator_type&, true_type)): New.
    (deque<>::deque(deque&&, const allocator_type&)): Delegate to latters.
    (deque<>::deque<_It>(_It, _It, const allocator_type&)): Use
    _M_range_initialize.
    (deque<>::assign<_It>(_It, _It)): Use _M_assign_aux.
    (deque<>::resize(size_type, const value_type&)): Share a single
    implementation.
    (deque<>::insert<_It>(const_iterator, _It, _It)): Use
    _M_range_insert_aux.
    [__cplusplus >= 201103L && 
_GLIBCXX_INLINE_VERSION](_M_initialize_dispatch):

    Remove.
    [__cplusplus >= 201103L && 
_GLIBCXX_INLINE_VERSION](_M_assign_dispatch):

    Remove.
    [__cplusplus >= 201103L && 
_GLIBCXX_INLINE_VERSION](_M_insert_dispatch):

    Remove.
    * testsuite/23_containers/deque/allocator/default_init.cc: New.

Tested under Linux x86_64 normal and debug modes.

Ok to commit ?

François

diff --git a/libstdc++-v3/include/bits/stl_deque.h b/libstdc++-v3/include/bits/stl_deque.h
index c050d1bf023..92c34207baa 100644
--- a/libstdc++-v3/include/bits/stl_deque.h
+++ b/libstdc++-v3/include/bits/stl_deque.h
@@ -401,7 +401,6 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 	_Map_alloc_type;
   typedef __gnu_cxx::__alloc_traits<_Map_alloc_type> _Map_alloc_traits;
 
-public:
   typedef _Alloc		  allocator_type;
 
   allocator_type
@@ -428,6 +427,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
   { /* Caller must initialize map. */ }
 
 #if __cplusplus >= 201103L
+private:
   _Deque_base(_Deque_base&& __x, false_type)
   : _M_impl(__x._M_move_impl())
   { }
@@ -436,84 +436,100 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
   : _M_impl(std::move(__x._M_get_Tp_allocator()))
   {
 	_M_initialize_map(0);
-	if (__x._M_impl._M_map)
 	this->_M_impl._M_swap_data(__x._M_impl);
   }
 
+protected:
   _Deque_base(_Deque_base&& __x)
   : _Deque_base(std::move(__x), typename _Alloc_traits::is_always_equal{})
   { }
 
+  _Deque_base(_Deque_base&& __x, const allocator_type& __a)
+  : _M_impl(std::move(__x._M_impl), _Tp_alloc_type(__a))
+  { __x._M_initialize_map(0); }
+
   _Deque_base(_Deque_base&& __x, const allocator_type& __a, size_t __n)
   : _M_impl(__a)
   {
 	if (__x.get_allocator() == __a)
-	  {
-	if (__x._M_impl._M_map)
 	  {
 	_M_initialize_map(0);
 	this->_M_impl._M_swap_data(__x._M_impl);
 	  }
-	  }
 	else
-	  {
 	  _M_initialize_map(__n);
   }
-  }
 #endif
 
   ~_Deque_base() _GLIBCXX_NOEXCEPT;
 
-protected:
   typedef typename iterator::_Map_pointer _Map_pointer;
 
-  //This struct encapsulates the implementation of the std::deque
-  //standard container and at the same time makes use of the EBO
-  //for empty allocators.
-  struct _Deque_impl
-  : public _Tp_alloc_type
+  struct _Deque_impl_data
   {
 	_Map_pointer _M_map;
 	size_t _M_map_size;
 	iterator _M_start;
 	iterator _M_finish;
 
-	_Deque_impl()
-	: _Tp_alloc_type(), _M_map(), _M_map_size(0),
-	  _M_start(), _M_finish()
+	_Deque_impl_data() _GLIBCXX_NOEXCEPT
+	: _M_map(), _M_map_size(), _M_start(), _M_finish()
+	{ }
+
+#if __cplusplus >= 201103L
+	_Deque_impl_data(const _Deque_impl_data&) = default;
+	_Deque_impl_data&
+	opera

Re: [PATCH][stage1] Support profile (BB counts and edge probabilities) in GIMPLE FE.

2019-05-09 Thread Bernhard Reutner-Fischer
On 7 May 2019 14:00:32 CEST, "Martin Liška"  wrote:

/The parameters is/s/parameters/parameter/

thanks,