date:20190918

[PATCH] Come up with debug counter for store-merging.

2019-09-18 Thread Martin Liška

Hi.

After I spent quite some time with PR91758, I would like
to see a debug counter in store merging for the next time.

Ready to be installed?
Thanks,
Martin

gcc/ChangeLog:

2019-09-18  Martin Liska  

* dbgcnt.def (store_merging): New counter.
* gimple-ssa-store-merging.c 
(imm_store_chain_info::output_merged_stores):
Use it in store merging.
---
 gcc/dbgcnt.def | 1 +
 gcc/gimple-ssa-store-merging.c | 4 +++-
 2 files changed, 4 insertions(+), 1 deletion(-)


diff --git a/gcc/dbgcnt.def b/gcc/dbgcnt.def
index 230072f7bb5..ef981aa6967 100644
--- a/gcc/dbgcnt.def
+++ b/gcc/dbgcnt.def
@@ -196,3 +196,4 @@ DEBUG_COUNTER (vect_loop)
 DEBUG_COUNTER (vect_slp)
 DEBUG_COUNTER (dom_unreachable_edges)
 DEBUG_COUNTER (match)
+DEBUG_COUNTER (store_merging)
diff --git a/gcc/gimple-ssa-store-merging.c b/gcc/gimple-ssa-store-merging.c
index 0bf64b314d6..5abaa7d18d8 100644
--- a/gcc/gimple-ssa-store-merging.c
+++ b/gcc/gimple-ssa-store-merging.c
@@ -166,6 +166,7 @@
 #include "rtl.h"
 #include "expr.h"	/* For get_bit_range.  */
 #include "optabs-tree.h"
+#include "dbgcnt.h"
 #include "selftest.h"
 
 /* The maximum size (in bits) of the stores this pass should generate.  */
@@ -4195,7 +4196,8 @@ imm_store_chain_info::output_merged_stores ()
   bool ret = false;
   FOR_EACH_VEC_ELT (m_merged_store_groups, i, merged_store)
 {
-  if (output_merged_store (merged_store))
+  if (dbg_cnt (store_merging)
+	  && output_merged_store (merged_store))
 	{
 	  unsigned int j;
 	  store_immediate_info *store;

Re: [x86] Tweak testcases for PR82361

2019-09-18 Thread Richard Sandiford

Uros Bizjak  writes:
> On Tue, Sep 17, 2019 at 6:34 PM Richard Sandiford
>  wrote:
>>
>> gcc/testsuite/gcc.target/i386/pr82361-[12].c check whether we
>> can optimise away a 32-to-64-bit zero extension of a 32-bit
>> division or modulus result.  Currently this fails for the modulus
>> part of f1 and f2 in pr82361-1.c:
>>
>> /* FIXME: We are still not able to optimize the modulo in f1/f2, only manage
>>one.  */
>> /* { dg-final { scan-assembler-times "movl\t%edx" 2 } } */
>>
>> pr82361-2.c instead expects no failures:
>>
>> /* Ditto %edx to %rdx zero extensions.  */
>> /* { dg-final { scan-assembler-not "movl\t%edx, %edx" } } */
>>
>> But we actually get the same zero-extensions for f1 and f2 in pr82361-2.c.
>> The reason they don't trigger a failure is that the RA allocates the
>> asm input for "d" to %rdi rather than %rdx, so we have:
>>
>> movl%rdi, %rdx
>>
>> instead of:
>>
>> movl%rdx, %rdx
>>
>> For the tests to work as expected, I think they have to force "c" and
>> "d" to be %rax and %rdx respectively.  We then see the same failure in
>> pr82361-2.c as for pr82361-1.c (but doubled, due to the 8-bit division
>> path).
>>
>> Tested on x86_64-linux-gnu.  OK to install?
>>
>> Richard
>>
>>
>> 2019-09-17  Richard Sandiford  
>>
>> gcc/testsuite/
>> * gcc.target/i386/pr82361-1.c (f1, f2, f3, f4, f5, f6): Force
>> "c" to be in %rax and "d" to be in %rdx.
>> * gcc.target/i386/pr82361-2.c: Expect 4 instances of "movl\t%edx".
>
> OK, with a comment improvement below.
>
> Thanks,
> Uros.
>
>> Index: gcc/testsuite/gcc.target/i386/pr82361-1.c
>> ===
>> --- gcc/testsuite/gcc.target/i386/pr82361-1.c   2019-03-08 
>> 18:14:39.040959532 +
>> +++ gcc/testsuite/gcc.target/i386/pr82361-1.c   2019-09-17 
>> 17:32:00.930930762 +0100
>> @@ -11,43 +11,43 @@
>>  void
>>  f1 (unsigned int a, unsigned int b)
>>  {
>> -  unsigned long long c = a / b;
>> -  unsigned long long d = a % b;
>> +  register unsigned long long c asm ("rax") = a / b;
>> +  register unsigned long long d asm ("rdx") = a % b;
>>asm volatile ("" : : "r" (c), "r" (d));
>>  }
>>
>>  void
>>  f2 (int a, int b)
>>  {
>> -  unsigned long long c = (unsigned int) (a / b);
>> -  unsigned long long d = (unsigned int) (a % b);
>> +  register unsigned long long c asm ("rax") = (unsigned int) (a / b);
>> +  register unsigned long long d asm ("rdx") = (unsigned int) (a % b);
>>asm volatile ("" : : "r" (c), "r" (d));
>>  }
>>
>>  void
>>  f3 (unsigned int a, unsigned int b)
>>  {
>> -  unsigned long long c = a / b;
>> +  register unsigned long long c asm ("rax") = a / b;
>>asm volatile ("" : : "r" (c));
>>  }
>>
>>  void
>>  f4 (int a, int b)
>>  {
>> -  unsigned long long c = (unsigned int) (a / b);
>> +  register unsigned long long c asm ("rax") = (unsigned int) (a / b);
>>asm volatile ("" : : "r" (c));
>>  }
>>
>>  void
>>  f5 (unsigned int a, unsigned int b)
>>  {
>> -  unsigned long long d = a % b;
>> +  register unsigned long long d asm ("rdx") = a % b;
>>asm volatile ("" : : "r" (d));
>>  }
>>
>>  void
>>  f6 (int a, int b)
>>  {
>> -  unsigned long long d = (unsigned int) (a % b);
>> +  register unsigned long long d asm ("rdx") = (unsigned int) (a % b);
>>asm volatile ("" : : "r" (d));
>>  }
>> Index: gcc/testsuite/gcc.target/i386/pr82361-2.c
>> ===
>> --- gcc/testsuite/gcc.target/i386/pr82361-2.c   2019-09-17 
>> 16:34:52.280124553 +0100
>> +++ gcc/testsuite/gcc.target/i386/pr82361-2.c   2019-09-17 
>> 17:32:00.930930762 +0100
>> @@ -4,7 +4,8 @@
>>  /* We should be able to optimize all %eax to %rax zero extensions, because
>> div and idiv instructions with 32-bit operands zero-extend both results. 
>>   */
>>  /* { dg-final { scan-assembler-not "movl\t%eax, %eax" } } */
>> -/* Ditto %edx to %rdx zero extensions.  */
>> -/* { dg-final { scan-assembler-not "movl\t%edx, %edx" } } */
>> +/* FIXME: We are still not able to optimize the modulo in f1/f2, only manage
>> +   one.  */
>
> Can we please change comment here and in pr82361-2.c to something like:
>
> /* FIXME: The compiler does not merge zero-extension to the modulo part.  */

Thanks, here's what I applied.

Richard


2019-09-18  Richard Sandiford  

gcc/testsuite/
* gcc.target/i386/pr82361-1.c (f1, f2, f3, f4, f5, f6): Force
"c" to be in %rax and "d" to be in %rdx.
* gcc.target/i386/pr82361-2.c: Expect 4 instances of "movl\t%edx".

Index: gcc/testsuite/gcc.target/i386/pr82361-1.c
===
--- gcc/testsuite/gcc.target/i386/pr82361-1.c   2019-09-17 18:00:14.0 
+0100
+++ gcc/testsuite/gcc.target/i386/pr82361-1.c   2019-09-18 08:37:39.030720198 
+0100
@@ -4,50 +4,50 @@
 /* We should be able to optimize all %eax to %rax zero extensions, because
div and idiv instructions with 32-bit operands zero-extend both res

Re: Patch RFA: Emit .cfi_sections after some input code has been seen

2019-09-18 Thread Richard Biener

On Tue, 17 Sep 2019, Ian Lance Taylor wrote:

> This seemingly innocuous change
> 
> 2019-09-11  Richard Biener  
> 
> * lto-opts.c (lto_write_options): Stream -g when debug is enabled.
> * lto-wrapper.c (merge_and_complain): Pick up -g.
> (append_compiler_options): Likewise.
> (run_gcc): Re-instantiate handling -g0 at link-time.
> * doc/invoke.texi (flto): Document debug info generation.
> 
> caused PR 91763, a test failure building Go code with -flto.  The
> problem only arose when using the GNU assembler on Solaris.
> 
> The bug is that when emitting debug info but not exception info, and
> when using gas, the DWARF code will emit
> .cfi_sections .debug_frame
> This will direct gas to emit unwind info into .debug_frame but not .eh_frame.
> 
> Go code requires unwind info, and the Go library expects it to be in
> .eh_frame.  The Go frontend always turns on exceptions, so this
> .cfi_sections directive is not used.
> 
> However, when using -flto, the lto1 program decides whether it is
> using exceptions based on what it reads from the input files.  Before
> lto1 sees any input files, flag_exceptions will have its default value
> of 0.  And lto1 initializes the debug info before seeing any input
> files, so the debug initialization thinks that exceptions are not in
> use, and emits the .cfi_sections directive.
> 
> This problem was uncovered by the above patch because Go code also
> turns on debugging by default, and so lto1 now sees a -g option that
> it did not see before.
> 
> This patch fixes the problem by moving the emission of .cfi_sections
> from debug init to the first time that the debug info needs to know
> whether CFI is supported.  This is only done when actually emitting
> debug info, and therefore after some input files have been read.
> 
> Bootstrapped and ran full testsuite on x86_64-pc-linux-gnu.  Tested
> that formerly failing case now passes on sparc-sun-solaris2.11.
> 
> OK for trunk?

Hmm.  To me it looks like there's nothing guaranteeing that
flag_exceptions is initialized appropriately since it's
set on function body read-in which is now on-demand.  So
I'm not sure that we cannot have functions output into
assembly before flag_exceptions is initialized.

Also dwarf2out_do_cfi_asm is a predicate which makes it
an awkward point.  Maybe at the time we emit the first
.cfi assembly instruction would be the correct (and latest)
point in time to emit this directive?

Anyways, I am testing the patch below which initializes
flag_exceptions before dwarf2out_assembly_start by
moving the initialization to a central place.

The lto_input_ts_function_decl_tree_pointers made this work
for most languages but not for those not having a language-specific
personality routine, so I have to check flag_exceptions as well
(the decls struct function are not input yet, but we do save
the CUs -fexception setting accordingly).

Note this doesn't solve PR91794 where the same issue for
-funwind-tables setting applies, but the fix could look
similar.

LTO bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2019-09-18  Richard Biener  

PR lto/91763
* lto-streamer-in.c (input_eh_regions): Move EH init to
lto_materialize_function.
* tree-streamer-in.c (lto_input_ts_function_decl_tree_pointers):
Likewise.

lto/
* lto.c (lto_materialize_function): Initialize EH by looking
at the function personality and flag_exceptions setting.

Index: gcc/lto-streamer-in.c
===
--- gcc/lto-streamer-in.c   (revision 275800)
+++ gcc/lto-streamer-in.c   (working copy)
@@ -615,11 +615,6 @@ input_eh_regions (struct lto_input_block
 
   lto_tag_check_range (tag, LTO_eh_table, LTO_eh_table);
 
-  /* If the file contains EH regions, then it was compiled with
- -fexceptions.  In that case, initialize the backend EH
- machinery.  */
-  lto_init_eh ();
-
   gcc_assert (fn->eh);
 
   root_region = streamer_read_hwi (ib);
Index: gcc/tree-streamer-in.c
===
--- gcc/tree-streamer-in.c  (revision 275800)
+++ gcc/tree-streamer-in.c  (working copy)
@@ -800,12 +800,6 @@ lto_input_ts_function_decl_tree_pointers
   }
   }
 #endif
-
-  /* If the file contains a function with an EH personality set,
- then it was compiled with -fexceptions.  In that case, initialize
- the backend EH machinery.  */
-  if (DECL_FUNCTION_PERSONALITY (expr))
-lto_init_eh ();
 }
 
 
Index: gcc/lto/lto.c
===
--- gcc/lto/lto.c   (revision 275800)
+++ gcc/lto/lto.c   (working copy)
@@ -218,6 +218,12 @@ lto_materialize_function (struct cgraph_
return;
   if (DECL_FUNCTION_PERSONALITY (decl) && !first_personality_decl)
first_personality_decl = DECL_FUNCTION_PERSONALITY (decl);
+  /* If the file contains a function with a language specific E

Re: [PATCH 5/9] Come up with an abstraction.

2019-09-18 Thread Martin Liška

Hello.

Ok, so the current IPA ICF transformation is being blocked by the
patch 2/9 (about FIELD_DECL). I asked Honza for a help here.
In the meantime, can you Richi make an opinion about the part 5 which
is about the interaction in between old operand_equal_p and a new
hook in IPA ICF?

Thanks,
Martin

Re: [PATCH] Come up with json::integer_number and use it in GCOV.

2019-09-18 Thread Martin Liška

PING^4

Just note that the author of the JSON implementation
in GCC is fine with the patch ;)

Martin

On 9/9/19 2:38 PM, Martin Liška wrote:
> PING^3
> 
> On 8/30/19 10:55 AM, Martin Liška wrote:
>> PING^2
>>
>> On 8/26/19 2:34 PM, Martin Liška wrote:
>>> PING^1
>>>
>>> On 8/13/19 1:51 PM, Martin Liška wrote:
 On 8/2/19 2:40 PM, David Malcolm wrote:
> Something that occurred to me reading the updated patch: maybe it would
> make things easier to have utility member functions of json::object to
> implicitly make the child, e.g.:
>
> void
> json::object::set (const char *key, long v)
> {
>set (key, new json::integer_number (v));
> }
>
> so that all those calls can be just:
>
>   obj->set ("line", exploc.line);
>   obj->set ("column", exploc.column);
>
> etc (assuming overloading is unambiguous).
>
> But that's probably orthogonal to this patch.

 Looks good to me. It's a candidate for a follow up patch.

>
>
>> And I changed all occurrences of float_number with integer_number
>> as you suggested.
> Thanks.
>
>> I'm currently testing the updated patch.
>> Martin
> The updated patch looks good to me, but technically I'm not a reviewer
> for these files.

 Sure, I hope @Jakub or @Richi can approve me that?
 Thanks,
 Martin

>
> Dave

>>>
>>
>

[patch,committed][OG9] Fix compiler warnings

2019-09-18 Thread Tobias Burnus

Committed the following patch to silence compiler warnings – some 
pointed to issues, which are now fixed.


Tobias

commit 500483e6ced44e2e0fea6a37e4f8c267ebaf826a
Author: Tobias Burnus 
Date:   Wed Sep 18 08:44:20 2019 +0200

Silence compiler warnings

gcc/
2019-09-17  Tobias Burnus  

* config/gcn/gcn.c (gcn_expand_scalar_to_vector_address,
gcn_md_reorg): Remove unused statement.
(gcn_emutls_var_init): Add missing return - after sorry abort.
* config/gcn/gcn.md (movdi_symbol_save_scc): Fix condition.
* config/gcn/mkoffload.c (process_obj): Remove unused variables.
* gimplify.c (gomp_oacc_needs_data_present): Likewise.
(gimplify_adjust_omp_clauses): Fix condition by adding ().
* omp-low.c (process_oacc_gangprivate_1): Comment unused
parameter name to silence unused warning.
* omp-sese.c (omp_sese_number, omp_sese_pseudo): Remove
superfluous ().
(oacc_do_neutering): Use signed int to avoid a warning.
* tree-ssa-structalias.c (find_func_aliases_for_builtin_call,
find_func_clobbers): Use unsigned to silence warning.

gcc/fortran/
2019-09-17  Tobias Burnus  

* trans-expr.c (gfc_auto_dereference_var): Use passed loc argument.

diff --git a/gcc/ChangeLog.openacc b/gcc/ChangeLog.openacc
index 8f3aee75449..fe584959153 100644
--- a/gcc/ChangeLog.openacc
+++ b/gcc/ChangeLog.openacc
@@ -1,3 +1,20 @@
+2019-09-17  Tobias Burnus  
+
+	* config/gcn/gcn.c (gcn_expand_scalar_to_vector_address,
+	gcn_md_reorg): Remove unused statement.
+	(gcn_emutls_var_init): Add missing return - after sorry abort.
+	* config/gcn/gcn.md (movdi_symbol_save_scc): Fix condition.
+	* config/gcn/mkoffload.c (process_obj): Remove unused variables.
+	* gimplify.c (gomp_oacc_needs_data_present): Likewise.
+	(gimplify_adjust_omp_clauses): Fix condition by adding ().
+	* omp-low.c (process_oacc_gangprivate_1): Comment unused
+	parameter name to silence unused warning.
+	* omp-sese.c (omp_sese_number, omp_sese_pseudo): Remove
+	superfluous ().
+	(oacc_do_neutering): Use signed int to avoid a warning.
+	* tree-ssa-structalias.c (find_func_aliases_for_builtin_call,
+	find_func_clobbers): Use unsigned to silence warning.
+
 2019-09-10  Julian Brown  
 
 	* config/gcn/mkoffload.c (process_asm): Remove omp_data_size,
diff --git a/gcc/config/gcn/gcn.c b/gcc/config/gcn/gcn.c
index f8434e4a4f1..e0a558b289a 100644
--- a/gcc/config/gcn/gcn.c
+++ b/gcc/config/gcn/gcn.c
@@ -1766,7 +1766,6 @@ gcn_expand_scalar_to_vector_address (machine_mode mode, rtx exec, rtx mem,
 	  /* tmp[:] += zext (mem_base)  */
 	  if (exec)
 	{
-	  rtx undef_di = gcn_gen_undef (DImode);
 	  emit_insn (gen_addv64si3_vcc_dup_exec (tmplo, mem_base_lo, tmplo,
 		 vcc, undef_v64si, exec));
 	  emit_insn (gen_addcv64si3_exec (tmphi, tmphi, const0_rtx,
@@ -3167,6 +3166,7 @@ tree
 gcn_emutls_var_init (tree, tree decl, tree)
 {
   sorry_at (DECL_SOURCE_LOCATION (decl), "TLS is not implemented for GCN.");
+  return NULL_TREE;
 }
 
 /* }}}  */
@@ -4292,8 +4292,6 @@ gcn_md_reorg (void)
 {
   basic_block bb;
   rtx exec_reg = gen_rtx_REG (DImode, EXEC_REG);
-  rtx exec_lo_reg = gen_rtx_REG (SImode, EXEC_LO_REG);
-  rtx exec_hi_reg = gen_rtx_REG (SImode, EXEC_HI_REG);
   regset_head live;
 
   INIT_REG_SET (&live);
diff --git a/gcc/config/gcn/gcn.md b/gcc/config/gcn/gcn.md
index 537bb260ff7..1f328528e4a 100644
--- a/gcc/config/gcn/gcn.md
+++ b/gcc/config/gcn/gcn.md
@@ -840,7 +840,7 @@
  [(set (match_operand:DI 0 "nonimmediate_operand" "=Sg")
(match_operand:DI 1 "general_operand" "Y"))
   (clobber (reg:BI CC_SAVE_REG))]
- "GET_CODE (operands[1]) == SYMBOL_REF || GET_CODE (operands[1]) == LABEL_REF
+ "(GET_CODE (operands[1]) == SYMBOL_REF || GET_CODE (operands[1]) == LABEL_REF)
   && (lra_in_progress || reload_completed)"
  "#"
  "reload_completed"
diff --git a/gcc/config/gcn/mkoffload.c b/gcc/config/gcn/mkoffload.c
index 593274bf054..c96ed23a2a6 100644
--- a/gcc/config/gcn/mkoffload.c
+++ b/gcc/config/gcn/mkoffload.c
@@ -367,8 +367,6 @@ process_obj (FILE *in, FILE *cfile)
 {
   size_t len = 0;
   const char *input = read_file (in, &len);
-  id_map const *id;
-  unsigned ix;
 
   /* Dump out an array containing the binary.
  FIXME: do this with objcopy.  */
diff --git a/gcc/fortran/ChangeLog.openacc b/gcc/fortran/ChangeLog.openacc
index 576e33fd567..a54fb4e4614 100644
--- a/gcc/fortran/ChangeLog.openacc
+++ b/gcc/fortran/ChangeLog.openacc
@@ -1,3 +1,7 @@
+2019-09-17  Tobias Burnus  
+
+	* trans-expr.c (gfc_auto_dereference_var): Use passed loc argument.
+
 2019-09-06  Andrew Stubbs  
 
 	Backport from mainline:
diff --git a/gcc/fortran/trans-expr.c b/gcc/fortran/trans-expr.c
index 7dc5ada9b6b..4a3bd9acd65 100644
--- a/gcc/fortran/trans-expr.c
+++ b/gcc/fortran/trans-expr.c
@@ -2576,7 +2576,7 @@ gfc_auto_dereference_var (location_t loc, gfc_symbol *sym,

[patch, committed] Use PRId64 in libgomp/config/linux

2019-09-18 Thread Tobias Burnus

Use PRId64 if available, otherwise use a cast. For some reasons, it 
failed during bootstrap with a -Werror even though %ld should be okay 
with int64_t on x86_64-gnu-linux. Nonetheless, using PRId64 is better.


Committed after testing on x86_64-gnu-linux.

Tobias

commit 8a8ebae1a419e1d3642d22874195acf6d5bae7d8
Author: Tobias Burnus 
Date:   Wed Sep 18 10:27:39 2019 +0200

Use PRId64 if available

libgomp/
2019-09-18  Tobias Burnus  

* linux/gomp_print.c (gomp_print_integer): Use PRId64 if available,
otherwise cast for %ld.

diff --git a/libgomp/ChangeLog.openacc b/libgomp/ChangeLog.openacc
index 1006b8149c8..db7f2a43b80 100644
--- a/libgomp/ChangeLog.openacc
+++ b/libgomp/ChangeLog.openacc
@@ -1,3 +1,8 @@
+2019-09-18  Tobias Burnus  
+
+	* linux/gomp_print.c (gomp_print_integer): Use PRId64 if available,
+	otherwise cast for %ld.
+
 2019-09-17  Julian Brown  
 
 	* libgomp-plugin.h (GOMP_OFFLOAD_openacc_async_host2dev): Update
diff --git a/libgomp/config/linux/gomp_print.c b/libgomp/config/linux/gomp_print.c
index 811bdd6e9a9..8b2e383440f 100644
--- a/libgomp/config/linux/gomp_print.c
+++ b/libgomp/config/linux/gomp_print.c
@@ -1,6 +1,11 @@
 #include 
 #include 
 
+#include "config.h"  /* For HAVE_INTTYPES_H.  */
+#ifdef HAVE_INTTYPES_H
+# include   /* For PRId64.  */
+#endif
+
 void
 gomp_print_string (const char *msg, const char *value)
 {
@@ -10,7 +15,11 @@ gomp_print_string (const char *msg, const char *value)
 void
 gomp_print_integer (const char *msg, int64_t value)
 {
-  printf ("%s%ld\n", msg, value);
+#ifdef HAVE_INTTYPES_H
+  printf ("%s%" PRId64 "\n", msg, value);
+#else
+  printf ("%s%ld\n", msg, (long) value);
+#endif
 }
 
 void

[Ada] Fix typo in error message

2019-09-18 Thread Pierre-Marie de Rodat

An error message mentions "gnamake", where it meant to mention
"gnatmake".

Tested on x86_64-pc-linux-gnu, committed on trunk

2019-09-18  Tom Tromey  

gcc/ada/

* make.adb (Initialize): Fix typo.--- gcc/ada/make.adb
+++ gcc/ada/make.adb
@@ -3789,7 +3789,7 @@ package body Make is
 
if Gprbuild = null then
   Fail_Program
-("project files are no longer supported by gnamake;" &
+("project files are no longer supported by gnatmake;" &
  " use gprbuild instead");
end if;

[Ada] Avoid uninitialized variable in bounded containers

2019-09-18 Thread Pierre-Marie de Rodat

In function Copy in Ada.Containers.Bounded_Ordered_Sets and other
bounded containers packages, remove a possible use of an uninitialized
variable. This was not a bug, because the uninitialized variable could
be used only if checks are suppressed, and the checks would have failed,
leading to erroneous execution.

However, it seems more robust this way, and is probably equally
efficient, and avoids a warning that is given if checks are suppressed,
and the -Wall switch is given, and optimization is turned on.

Tested on x86_64-pc-linux-gnu, committed on trunk

2019-09-18  Bob Duff  

gcc/ada/

* libgnat/a-cbhama.adb, libgnat/a-cbhase.adb,
libgnat/a-cbmutr.adb, libgnat/a-cborma.adb,
libgnat/a-cborse.adb, libgnat/a-cobove.adb (Copy): Avoid reading
the uninitialized variable C in the Checks = False case. Change
variable to be a constant.

gcc/testsuite/

* gnat.dg/containers1.adb, gnat.dg/containers1.ads: New
testcase.--- gcc/ada/libgnat/a-cbhama.adb
+++ gcc/ada/libgnat/a-cbhama.adb
@@ -262,18 +262,14 @@ package body Ada.Containers.Bounded_Hashed_Maps is
   Capacity : Count_Type := 0;
   Modulus  : Hash_Type := 0) return Map
is
-  C : Count_Type;
+  C : constant Count_Type :=
+(if Capacity = 0 then Source.Length
+ else Capacity);
   M : Hash_Type;
 
begin
-  if Capacity = 0 then
- C := Source.Length;
-
-  elsif Capacity >= Source.Length then
- C := Capacity;
-
-  elsif Checks then
- raise Capacity_Error with "Capacity value too small";
+  if Checks and then C < Source.Length then
+ raise Capacity_Error with "Capacity too small";
   end if;
 
   if Modulus = 0 then

--- gcc/ada/libgnat/a-cbhase.adb
+++ gcc/ada/libgnat/a-cbhase.adb
@@ -254,16 +254,14 @@ package body Ada.Containers.Bounded_Hashed_Sets is
   Capacity : Count_Type := 0;
   Modulus  : Hash_Type := 0) return Set
is
-  C : Count_Type;
+  C : constant Count_Type :=
+(if Capacity = 0 then Source.Length
+ else Capacity);
   M : Hash_Type;
 
begin
-  if Capacity = 0 then
- C := Source.Length;
-  elsif Capacity >= Source.Length then
- C := Capacity;
-  elsif Checks then
- raise Capacity_Error with "Capacity value too small";
+  if Checks and then C < Source.Length then
+ raise Capacity_Error with "Capacity too small";
   end if;
 
   if Modulus = 0 then

--- gcc/ada/libgnat/a-cbmutr.adb
+++ gcc/ada/libgnat/a-cbmutr.adb
@@ -625,15 +625,12 @@ package body Ada.Containers.Bounded_Multiway_Trees is
  (Source   : Tree;
   Capacity : Count_Type := 0) return Tree
is
-  C : Count_Type;
-
+  C : constant Count_Type :=
+(if Capacity = 0 then Source.Count
+ else Capacity);
begin
-  if Capacity = 0 then
- C := Source.Count;
-  elsif Capacity >= Source.Count then
- C := Capacity;
-  elsif Checks then
- raise Capacity_Error with "Capacity value too small";
+  if Checks and then C < Source.Count then
+ raise Capacity_Error with "Capacity too small";
   end if;
 
   return Target : Tree (Capacity => C) do

--- gcc/ada/libgnat/a-cborma.adb
+++ gcc/ada/libgnat/a-cborma.adb
@@ -464,17 +464,12 @@ package body Ada.Containers.Bounded_Ordered_Maps is
--
 
function Copy (Source : Map; Capacity : Count_Type := 0) return Map is
-  C : Count_Type;
-
+  C : constant Count_Type :=
+(if Capacity = 0 then Source.Length
+ else Capacity);
begin
-  if Capacity = 0 then
- C := Source.Length;
-
-  elsif Capacity >= Source.Length then
- C := Capacity;
-
-  elsif Checks then
- raise Capacity_Error with "Capacity value too small";
+  if Checks and then C < Source.Length then
+ raise Capacity_Error with "Capacity too small";
   end if;
 
   return Target : Map (Capacity => C) do

--- gcc/ada/libgnat/a-cborse.adb
+++ gcc/ada/libgnat/a-cborse.adb
@@ -442,15 +442,12 @@ package body Ada.Containers.Bounded_Ordered_Sets is
--
 
function Copy (Source : Set; Capacity : Count_Type := 0) return Set is
-  C : Count_Type;
-
+  C : constant Count_Type :=
+(if Capacity = 0 then Source.Length
+ else Capacity);
begin
-  if Capacity = 0 then
- C := Source.Length;
-  elsif Capacity >= Source.Length then
- C := Capacity;
-  elsif Checks then
- raise Capacity_Error with "Capacity value too small";
+  if Checks and then C < Source.Length then
+ raise Capacity_Error with "Capacity too small";
   end if;
 
   return Target : Set (Capacity => C) do

--- gcc/ada/libgnat/a-cobove.adb
+++ gcc/ada/libgnat/a-cobove.adb
@@ -451,18 +451,12 @@ package body Ada.Containers.Bounded_Vectors is
  (Source   : Vector;
   Capacity : Count_Type := 0) return Vector
is
-  C : Count_T

[Ada] Fix errno for rename for the VxWorks 6 target

2019-09-18 Thread Pierre-Marie de Rodat

This fixes the wrong errno for rename when the file is not existing on a
dosFs. In the end it makes Ada.Directories.Rename raising the right
exception in the case we are trying to move a file in a non existing
directory.

Tested on x86_64-pc-linux-gnu, committed on trunk

2019-09-18  Frederic Konrad  

gcc/ada/

* adaint.c: Include dosFsLib.h and vwModNum.h for VxWorks 6.
(__gnat_rename): Map S_dosFsLib_FILE_NOT_FOUND to ENOENT.--- gcc/ada/adaint.c
+++ gcc/ada/adaint.c
@@ -74,6 +74,12 @@
(such as chmod) are only available on VxWorks 6.  */
 #include "version.h"
 
+/* vwModNum.h and dosFsLib.h are needed for the VxWorks 6 rename workaround.
+   See below.  */
+#if (_WRS_VXWORKS_MAJOR == 6)
+#include 
+#include 
+#endif /* 6.x */
 #endif /* VxWorks */
 
 #if defined (__APPLE__)
@@ -754,6 +760,20 @@ __gnat_rename (char *from, char *to)
 S2WSC (wto, to, GNAT_MAX_PATH_LEN);
 return _trename (wfrom, wto);
   }
+#elif defined (__vxworks) && (_WRS_VXWORKS_MAJOR == 6)
+  {
+/* When used on a dos filesystem under VxWorks 6.9 rename will trigger a
+   S_dosFsLib_FILE_NOT_FOUND errno when the file is not found.  Let's map
+   that to ENOENT so Ada.Directory.Rename can detect that and raise the
+   Name_Error exception.  */
+int ret = rename (from, to);
+
+if (ret && (errno == S_dosFsLib_FILE_NOT_FOUND))
+  {
+errno = ENOENT;
+  }
+return ret;
+  }
 #else
   return rename (from, to);
 #endif

[Ada] Fix style issues in functional maps

2019-09-18 Thread Pierre-Marie de Rodat

Rename global constants from I to J. No functional changes.

Tested on x86_64-pc-linux-gnu, committed on trunk

2019-09-18  Claire Dross  

gcc/ada/

* libgnat/a-cofuma.adb (Remove, Elements_Equal_Except,
Keys_Included, Keys_Included_Except): Rename loop indexes and
global constants from I to J.--- gcc/ada/libgnat/a-cofuma.adb
+++ gcc/ada/libgnat/a-cofuma.adb
@@ -88,15 +88,15 @@ package body Ada.Containers.Functional_Maps with SPARK_Mode => Off is
   New_Key : Key_Type) return Boolean
is
begin
-  for I in 1 .. Length (Left.Keys) loop
+  for J in 1 .. Length (Left.Keys) loop
  declare
-K : constant Key_Type := Get (Left.Keys, I);
+K : constant Key_Type := Get (Left.Keys, J);
  begin
 if not Equivalent_Keys (K, New_Key)
   and then
 (Find (Right.Keys, K) = 0
   or else Get (Right.Elements, Find (Right.Keys, K)) /=
-  Get (Left.Elements, I))
+  Get (Left.Elements, J))
 then
return False;
 end if;
@@ -112,16 +112,16 @@ package body Ada.Containers.Functional_Maps with SPARK_Mode => Off is
   Y : Key_Type) return Boolean
is
begin
-  for I in 1 .. Length (Left.Keys) loop
+  for J in 1 .. Length (Left.Keys) loop
  declare
-K : constant Key_Type := Get (Left.Keys, I);
+K : constant Key_Type := Get (Left.Keys, J);
  begin
 if not Equivalent_Keys (K, X)
   and then not Equivalent_Keys (K, Y)
   and then
 (Find (Right.Keys, K) = 0
   or else Get (Right.Elements, Find (Right.Keys, K)) /=
-  Get (Left.Elements, I))
+  Get (Left.Elements, J))
 then
return False;
 end if;
@@ -173,9 +173,9 @@ package body Ada.Containers.Functional_Maps with SPARK_Mode => Off is
 
function Keys_Included (Left : Map; Right : Map) return Boolean is
begin
-  for I in 1 .. Length (Left.Keys) loop
+  for J in 1 .. Length (Left.Keys) loop
  declare
-K : constant Key_Type := Get (Left.Keys, I);
+K : constant Key_Type := Get (Left.Keys, J);
  begin
 if Find (Right.Keys, K) = 0 then
return False;
@@ -196,9 +196,9 @@ package body Ada.Containers.Functional_Maps with SPARK_Mode => Off is
   New_Key : Key_Type) return Boolean
is
begin
-  for I in 1 .. Length (Left.Keys) loop
+  for J in 1 .. Length (Left.Keys) loop
  declare
-K : constant Key_Type := Get (Left.Keys, I);
+K : constant Key_Type := Get (Left.Keys, J);
  begin
 if not Equivalent_Keys (K, New_Key)
   and then Find (Right.Keys, K) = 0
@@ -218,9 +218,9 @@ package body Ada.Containers.Functional_Maps with SPARK_Mode => Off is
   Y : Key_Type) return Boolean
is
begin
-  for I in 1 .. Length (Left.Keys) loop
+  for J in 1 .. Length (Left.Keys) loop
  declare
-K : constant Key_Type := Get (Left.Keys, I);
+K : constant Key_Type := Get (Left.Keys, J);
  begin
 if not Equivalent_Keys (K, X)
   and then not Equivalent_Keys (K, Y)
@@ -248,11 +248,11 @@ package body Ada.Containers.Functional_Maps with SPARK_Mode => Off is

 
function Remove (Container : Map; Key : Key_Type) return Map is
-  I : constant Extended_Index := Find (Container.Keys, Key);
+  J : constant Extended_Index := Find (Container.Keys, Key);
begin
   return
-(Keys => Remove (Container.Keys, I),
- Elements => Remove (Container.Elements, I));
+(Keys => Remove (Container.Keys, J),
+ Elements => Remove (Container.Elements, J));
end Remove;
 
---

[Ada] Fix 32/64bit mistake on SYSTEM_INFO component in s-win32

2019-09-18 Thread Pierre-Marie de Rodat

The dwActiveProcessorMask field in a SYSTEM_INFO structure on Windows
should be DWORD_PTR, an integer the size of a pointer.

In s-win32, it is currently declared as DWORD. This happens to work on
32bit hosts and is wrong on 64bit hosts, causing mishaps in accesses to
this component and all the following ones.

The proposed correction adds a definition for DWORD_PTR and uses it for
dwActiveProcessorMask in System.Win32.SYSTEM_INFO.

Tested on x86_64-pc-linux-gnu, committed on trunk

2019-09-18  Olivier Hainque  

gcc/ada/

* libgnat/s-win32.ads (DWORD_PTR): New type, pointer size
unsigned int.
(SYSTEM_INFO): Use it for dwActiveProcessorMask.

gcc/testsuite/

* gnat.dg/system_info1.adb: New testcase.--- gcc/ada/libgnat/s-win32.ads
+++ gcc/ada/libgnat/s-win32.ads
@@ -57,15 +57,16 @@ package System.Win32 is
INVALID_HANDLE_VALUE : constant HANDLE := -1;
INVALID_FILE_SIZE: constant := 16##;
 
-   type ULONG  is new Interfaces.C.unsigned_long;
-   type DWORD  is new Interfaces.C.unsigned_long;
-   type WORD   is new Interfaces.C.unsigned_short;
-   type BYTE   is new Interfaces.C.unsigned_char;
-   type LONG   is new Interfaces.C.long;
-   type CHAR   is new Interfaces.C.char;
-   type SIZE_T is new Interfaces.C.size_t;
-
-   type BOOL   is new Interfaces.C.int;
+   type ULONG is new Interfaces.C.unsigned_long;
+   type DWORD is new Interfaces.C.unsigned_long;
+   type WORD  is new Interfaces.C.unsigned_short;
+   type BYTE  is new Interfaces.C.unsigned_char;
+   type LONG  is new Interfaces.C.long;
+   type CHAR  is new Interfaces.C.char;
+   type SIZE_Tis new Interfaces.C.size_t;
+   type DWORD_PTR is mod 2 ** Standard'Address_Size;
+
+   type BOOL  is new Interfaces.C.int;
for BOOL'Size use Interfaces.C.int'Size;
 
type Bits1  is range 0 .. 2 ** 1 - 1;
@@ -265,7 +266,7 @@ package System.Win32 is
   dwPageSize  : DWORD;
   lpMinimumApplicationAddress : PVOID;
   lpMaximumApplicationAddress : PVOID;
-  dwActiveProcessorMask   : DWORD;
+  dwActiveProcessorMask   : DWORD_PTR;
   dwNumberOfProcessors: DWORD;
   dwProcessorType : DWORD;
   dwAllocationGranularity : DWORD;

--- /dev/null
new file mode 100644
+++ gcc/testsuite/gnat.dg/system_info1.adb
@@ -0,0 +1,23 @@
+--  { dg-do run }
+
+with System.Multiprocessors;
+with System.Task_Info;
+
+procedure System_Info1 is
+   Ncpus : constant System.Multiprocessors.CPU :=
+ System.Multiprocessors.Number_Of_CPUS;
+   Nprocs : constant Integer :=
+ System.Task_Info.Number_Of_Processors;
+
+   use type System.Multiprocessors.CPU;
+begin
+   if Nprocs <= 0 or else Nprocs > 1024 then
+  raise Program_Error;
+   end if;
+   if Ncpus <= 0 or else Ncpus > 1024 then
+  raise Program_Error;
+   end if;
+   if Nprocs /= Integer (Ncpus) then
+  raise Program_Error;
+   end if;
+end;
\ No newline at end of file

[Ada] Missing accessibility check on discrim assignment

2019-09-18 Thread Pierre-Marie de Rodat

This patch fixes an issue whereby assignments from anonymous access
descriminants which are part of stand alone objects of anonymous access
did not have runtime checks generated based on the accessibility level
of the object according to ARM 3.10.2 (12.5/3).

Tested on x86_64-pc-linux-gnu, committed on trunk

2019-09-18  Justin Squirek  

gcc/ada/

* exp_ch4.adb (Expand_N_Type_Conversion): Add calculation of an
alternative operand for the purposes of generating accessibility
checks.

gcc/testsuite/

* gnat.dg/access8.adb, gnat.dg/access8_pkg.adb,
gnat.dg/access8_pkg.ads: New testcase.--- gcc/ada/exp_ch4.adb
+++ gcc/ada/exp_ch4.adb
@@ -11001,6 +11001,7 @@ package body Exp_Ch4 is
procedure Expand_N_Type_Conversion (N : Node_Id) is
   Loc  : constant Source_Ptr := Sloc (N);
   Operand  : constant Node_Id:= Expression (N);
+  Operand_Acc  : Node_Id := Operand;
   Target_Type  : Entity_Id   := Etype (N);
   Operand_Type : Entity_Id   := Etype (Operand);
 
@@ -11718,6 +11719,15 @@ package body Exp_Ch4 is
   --  Case of converting to an access type
 
   if Is_Access_Type (Target_Type) then
+ --  In terms of accessibility rules, an anonymous access discriminant
+ --  is not considered separate from its parent object.
+
+ if Nkind (Operand) = N_Selected_Component
+   and then Ekind (Entity (Selector_Name (Operand))) = E_Discriminant
+   and then Ekind (Operand_Type) = E_Anonymous_Access_Type
+ then
+Operand_Acc := Original_Node (Prefix (Operand));
+ end if;
 
  --  If this type conversion was internally generated by the front end
  --  to displace the pointer to the object to reference an interface
@@ -11741,9 +11751,9 @@ package body Exp_Ch4 is
  --  other checks may still need to be applied below (such as tagged
  --  type checks).
 
- elsif Is_Entity_Name (Operand)
-   and then Has_Extra_Accessibility (Entity (Operand))
-   and then Ekind (Etype (Operand)) = E_Anonymous_Access_Type
+ elsif Is_Entity_Name (Operand_Acc)
+   and then Has_Extra_Accessibility (Entity (Operand_Acc))
+   and then Ekind (Etype (Operand_Acc)) = E_Anonymous_Access_Type
and then (Nkind (Original_Node (N)) /= N_Attribute_Reference
   or else Attribute_Name (Original_Node (N)) = Name_Access)
  then
@@ -11758,7 +11768,7 @@ package body Exp_Ch4 is
 
 else
Apply_Accessibility_Check
- (Operand, Target_Type, Insert_Node => Operand);
+ (Operand_Acc, Target_Type, Insert_Node => Operand);
 end if;
 
  --  If the level of the operand type is statically deeper than the

--- /dev/null
new file mode 100644
+++ gcc/testsuite/gnat.dg/access8.adb
@@ -0,0 +1,46 @@
+--  { dg-do run }
+--  { dg-options "-gnatws" }
+
+with Access8_Pkg;
+procedure Access8 is
+   Errors : Natural := 0;
+   outer_object_accessibility_check
+ : access Access8_Pkg.object;
+   outer_discriminant_accessibility_check
+ : access Access8_Pkg.discriminant;
+   Mistake
+ : access Access8_Pkg.discriminant;
+   outer_discriminant_copy_discriminant_check
+ : access Access8_Pkg.discriminant;
+begin
+   declare
+  obj
+: aliased Access8_Pkg.object := Access8_Pkg.get;
+  inner_object
+: access Access8_Pkg.object := obj'Access;
+  inner_discriminant
+: access Access8_Pkg.discriminant := obj.d;
+   begin
+  begin
+ outer_object_accessibility_check
+   := inner_object;--  ERROR
+  exception
+ when others => Errors := Errors + 1;
+  end;
+  begin
+ Mistake
+   := inner_object.d;  --  ERROR
+  exception
+ when others => Errors := Errors + 1;
+  end;
+  begin
+ outer_discriminant_copy_discriminant_check
+   := inner_discriminant;  --  ERROR
+  exception
+when others => Errors := Errors + 1;
+  end;
+  if Errors /= 3 then
+ raise Program_Error;
+  end if;
+   end;
+end;

--- /dev/null
new file mode 100644
+++ gcc/testsuite/gnat.dg/access8_pkg.adb
@@ -0,0 +1,30 @@
+--  { dg-options "-gnatws" }
+
+with Ada.Finalization;
+
+package body Access8_Pkg is
+
+   overriding procedure Initialize (O : in out Object) is
+   begin
+  null;
+   end;
+
+   overriding procedure Finalize (O : in out Object) is
+   begin
+  null;
+   end;
+
+   function Get return Object is
+   begin
+  return O : Object := Object'
+(Ada.Finalization.Limited_Controlled
+  with D => new discriminant);
+   end;
+
+   function Get_Access return access Object is
+   begin
+  return new Object'
+(Ada.Finalization.Limited_Controlled
+  with D => new Discriminant);
+   end;
+end;

--- /dev/null
new file mode 100644
+++ gcc/testsuite/g

[Ada] Don't fail a front-end assertion if errors have already been detected

2019-09-18 Thread Pierre-Marie de Rodat

In sem_eval.adb, we have an assertion that the type of a "null" literal
is an access type. It turns out that this assertion can fail when
processing an illegal program, e.g. one that contains something like
"Integer'(null)".  This leads to differences in the compiler's generated
output for such tests depending on whether assertions are/aren't
enabled; in particular, the "compilation abandoned due to previous
error" message generated in Comperr.Compiler_Abort. In order to avoid
these differences, we change the assertion so that it does not fail if
errors have already been posted on the given node.

Tested on x86_64-pc-linux-gnu, committed on trunk

2019-09-18  Steve Baird  

gcc/ada/

* sem_eval.adb (Expr_Value): Do not fail "the type of a null
literal must be an access type" assertion if errors have already
been posted on the given node.--- gcc/ada/sem_eval.adb
+++ gcc/ada/sem_eval.adb
@@ -4278,7 +4278,8 @@ package body Sem_Eval is
   --  The NULL access value
 
   elsif Kind = N_Null then
- pragma Assert (Is_Access_Type (Underlying_Type (Etype (N;
+ pragma Assert (Is_Access_Type (Underlying_Type (Etype (N)))
+   or else Error_Posted (N));
  Val := Uint_0;
 
   --  Character literal

[Ada] Factor out code for deciding statically known Constrained attributes

2019-09-18 Thread Pierre-Marie de Rodat

Create a separate routine in Exp_Util for deciding the value of the
Constrained attribute when it is statically known. This routine is used
in Exp_Attr and will be reused in the backend of GNATprove.

There is no impact on compilation and hence no test.

Tested on x86_64-pc-linux-gnu, committed on trunk

2019-09-18  Claire Dross  

gcc/ada/

* exp_attr.adb (Expand_N_Attribute_Reference): Call routine from
Exp_Util to know the value of the Constrained attribute in the
static case.
* exp_spark.adb (Expand_SPARK_N_Attribute_Reference): Make
implicit dereferences inside the Constrained attribute explicit.
* exp_util.ads, exp_util.adb
(Attribute_Constrained_Static_Value): New routine to compute the
value of a statically known reference to the Constrained
attribute.--- gcc/ada/exp_attr.adb
+++ gcc/ada/exp_attr.adb
@@ -2770,40 +2770,6 @@ package body Exp_Attr is
   when Attribute_Constrained => Constrained : declare
  Formal_Ent : constant Entity_Id := Param_Entity (Pref);
 
- function Is_Constrained_Aliased_View (Obj : Node_Id) return Boolean;
- --  Ada 2005 (AI-363): Returns True if the object name Obj denotes a
- --  view of an aliased object whose subtype is constrained.
-
- -
- -- Is_Constrained_Aliased_View --
- -
-
- function Is_Constrained_Aliased_View (Obj : Node_Id) return Boolean is
-E : Entity_Id;
-
- begin
-if Is_Entity_Name (Obj) then
-   E := Entity (Obj);
-
-   if Present (Renamed_Object (E)) then
-  return Is_Constrained_Aliased_View (Renamed_Object (E));
-   else
-  return Is_Aliased (E) and then Is_Constrained (Etype (E));
-   end if;
-
-else
-   return Is_Aliased_View (Obj)
-and then
-  (Is_Constrained (Etype (Obj))
- or else
-   (Nkind (Obj) = N_Explicit_Dereference
-  and then
-not Object_Type_Has_Constrained_Partial_View
-  (Typ  => Base_Type (Etype (Obj)),
-   Scop => Current_Scope)));
-end if;
- end Is_Constrained_Aliased_View;
-
   --  Start of processing for Constrained
 
   begin
@@ -2844,115 +2810,23 @@ package body Exp_Attr is
   New_Occurrence_Of
 (Extra_Constrained (Entity (Pref)), Sloc (N)));
 
- --  For all other entity names, we can tell at compile time
+ --  For all other cases, we can tell at compile time
 
- elsif Is_Entity_Name (Pref) then
-declare
-   Ent : constant Entity_Id   := Entity (Pref);
-   Res : Boolean;
-
-begin
-   --  (RM J.4) obsolescent cases
-
-   if Is_Type (Ent) then
-
-  --  Private type
-
-  if Is_Private_Type (Ent) then
- Res := not Has_Discriminants (Ent)
-  or else Is_Constrained (Ent);
-
-  --  It not a private type, must be a generic actual type
-  --  that corresponded to a private type. We know that this
-  --  correspondence holds, since otherwise the reference
-  --  within the generic template would have been illegal.
-
-  else
- if Is_Composite_Type (Underlying_Type (Ent)) then
-Res := Is_Constrained (Ent);
- else
-Res := True;
- end if;
-  end if;
-
-   else
-  --  For access type, apply access check as needed
-
-  if Is_Access_Type (Ptyp) then
- Apply_Access_Check (N);
-  end if;
-
-  --  If the prefix is not a variable or is aliased, then
-  --  definitely true; if it's a formal parameter without an
-  --  associated extra formal, then treat it as constrained.
-
-  --  Ada 2005 (AI-363): An aliased prefix must be known to be
-  --  constrained in order to set the attribute to True.
-
-  if not Is_Variable (Pref)
-or else Present (Formal_Ent)
-or else (Ada_Version < Ada_2005
-  and then Is_Aliased_View (Pref))
-or else (Ada_Version >= Ada_2005
-  and then Is_Constrained_Aliased_View (Pref))
-  then
- Res := True;
-
-  --  Variable case, look at type to see if it is constrained.
-  --  Note that the one case

[Ada] Ensure that Scan_Real result does not depend on trailing zeros

2019-09-18 Thread Pierre-Marie de Rodat

Previous change in that procedure to handle overflow issues during
scanning removed the special handling for trailing zeros in the decimal
part. Beside the absence of overflow during scanning the special
handling of these zeros is still necessary.

Tested on x86_64-pc-linux-gnu, committed on trunk

2019-09-18  Nicolas Roche  

gcc/ada/

* libgnat/s-valrea.adb (Scan_Integral_Digits): New procedure.
(Scan_Decimal_Digits): New procedure.
(As_Digit): New function.
(Scan_Real): Use Scan_Integral_Digits and Scan_Decimal_Digits.

gcc/testsuite/

* gnat.dg/float_value2.adb: New testcase.--- gcc/ada/libgnat/s-valrea.adb
+++ gcc/ada/libgnat/s-valrea.adb
@@ -29,346 +29,469 @@
 --  --
 --
 
-with System.Powten_Table;  use System.Powten_Table;
 with System.Val_Util;  use System.Val_Util;
 with System.Float_Control;
 
 package body System.Val_Real is
 
-   ---
-   -- Scan_Real --
-   ---
+   procedure Scan_Integral_Digits
+  (Str: String;
+   Index  : in out Integer;
+   Max: Integer;
+   Value  : out Long_Long_Integer;
+   Scale  : out Integer;
+   Base_Violation : in out Boolean;
+   Base   : Long_Long_Integer := 10;
+   Base_Specified : Boolean := False);
+   --  Scan the integral part of a real (i.e: before decimal separator)
+   --
+   --  The string parsed is Str (Index .. Max), and after the call Index will
+   --  point to the first non parsed character.
+   --
+   --  For each digit parsed either value := value * base + digit, or scale
+   --  is incremented by 1.
+   --
+   --  Base_Violation will be set to True a digit found is not part of the Base
+
+   procedure Scan_Decimal_Digits
+  (Str: String;
+   Index  : in out Integer;
+   Max: Integer;
+   Value  : in out Long_Long_Integer;
+   Scale  : in out Integer;
+   Base_Violation : in out Boolean;
+   Base   : Long_Long_Integer := 10;
+   Base_Specified : Boolean := False);
+   --  Scan the decimal part of a real (i.e: after decimal separator)
+   --
+   --  The string parsed is Str (Index .. Max), and after the call Index will
+   --  point to the first non parsed character.
+   --
+   --  For each digit parsed value = value * base + digit and scale is
+   --  decremented by 1. If precision limit is reached remaining digits are
+   --  still parsed but ignored.
+   --
+   --  Base_Violation will be set to True a digit found is not part of the Base
+
+   subtype Char_As_Digit is Long_Long_Integer range -2 .. 15;
+   subtype Valid_Digit is Char_As_Digit range 0 .. Char_As_Digit'Last;
+   Underscore : constant Char_As_Digit := -2;
+   E_Digit : constant Char_As_Digit := 14;
+
+   function As_Digit (C : Character) return Char_As_Digit;
+   --  Given a character return the digit it represent. If the character is
+   --  not a digit then a negative value is returned, -2 for underscore and
+   --  -1 for any other character.
+
+   Precision_Limit : constant Long_Long_Integer :=
+  2 ** (Long_Long_Float'Machine_Mantissa - 1) - 1;
+   --  This is an upper bound for the number of bits used to represent the
+   --  mantissa. Beyond that number, any digits parsed are useless.
+
+   --
+   -- As_Digit --
+   --
+
+   function As_Digit (C : Character) return Char_As_Digit
+   is
+   begin
+  case C is
+ when '0' .. '9' =>
+return Character'Pos (C) - Character'Pos ('0');
+ when 'a' .. 'f' =>
+return Character'Pos (C) - (Character'Pos ('a') - 10);
+ when 'A' .. 'F' =>
+return Character'Pos (C) - (Character'Pos ('A') - 10);
+ when '_' =>
+return Underscore;
+ when others =>
+return -1;
+  end case;
+   end As_Digit;
+
+   -
+   -- Scan_Decimal_Digits --
+   -
+
+   procedure Scan_Decimal_Digits
+  (Str: String;
+   Index  : in out Integer;
+   Max: Integer;
+   Value  : in out Long_Long_Integer;
+   Scale  : in out Integer;
+   Base_Violation : in out Boolean;
+   Base   : Long_Long_Integer := 10;
+   Base_Specified : Boolean := False)
 
-   function Scan_Real
- (Str : String;
-  Ptr : not null access Integer;
-  Max : Integer) return Long_Long_Float
is
-  P : Integer;
-  --  Local copy of string pointer
+  Precision_Limit_Reached : Boolean := False;
+  --  Set to True if addition of a digit will cause Value to be superior
+  --  to Precision_Limit.
 
-  Base : Long_Long_Float;
-  --  Base value
+  Digit : Char_As_Digit;
+  --  The current digit.
 
-  Uval : Long_Lon

[Ada] Spurious run time error on anonymous access formals

2019-09-18 Thread Pierre-Marie de Rodat

This patch fixes an issue whereby subprograms with anonymous access
formals may trigger spurious runtime accessibility errors when such
formals are used as actuals in calls to nested subprograms.

Running these commands:

  gnatmake -q pass.adb
  gnatmake -q fail.adb
  gnatmake -q test_main.adb
  gnatmake -q indirect_call_test.adb
  pass
  fail
  test_main
  indirect_call_test

On the following sources:

--  pass.adb

procedure Pass is

  function A (Param : access Integer) return Boolean is
type Typ is access all Integer;
function A_Inner (Param : access Integer) return Typ is
  begin
return Typ (Param); --  OK
  end;
begin
  return A_Inner (Param) = Typ (Param);
end;

  function B (Param : access Integer) return Boolean;
  function B (Param : access Integer) return Boolean is
type Typ is access all Integer;
function B_Inner (Param : access Integer) return Typ is
  begin
return Typ (Param); --  OK
  end;
begin
  return B_Inner (Param) = Typ (Param);
end;

  procedure C (Param : access Integer) is
type Typ is access all Integer;
Var : Typ;
procedure C_Inner (Param : access Integer) is
  begin
Var := Typ (Param); --  OK
  end;
begin
  C_Inner (Param);
end;

  procedure D (Param : access Integer);
  procedure D (Param : access Integer) is
type Typ is access all Integer;
Var : Typ;
procedure D_Inner (Param : access Integer) is
  begin
Var := Typ (Param); --  OK
  end;
begin
  D_Inner (Param);
end;

  protected type E is
function G (Param : access Integer) return Boolean;
procedure I (Param : access Integer);
  end;

  protected body E is
function F (Param : access Integer) return Boolean is
  type Typ is access all Integer;
  function F_Inner (Param : access Integer) return Typ is
begin
  return Typ (Param); --  OK
end;
  begin
return F_Inner (Param) = Typ (Param);
  end;

function G (Param : access Integer) return Boolean is
  type Typ is access all Integer;
  function G_Inner (Param : access Integer) return Typ is
begin
  return Typ (Param); --  OK
end;
  B : Boolean := F (Param); --  OK
  begin
return G_Inner (Param) = Typ (Param);
  end;

procedure H (Param : access Integer) is
  type Typ is access all Integer;
  Var : Typ;
  procedure H_Inner (Param : access Integer) is
begin
  Var := Typ (Param); --  OK
end;
  begin
H_Inner (Param);
  end;

procedure I (Param : access Integer) is
  type Typ is access all Integer;
  Var : Typ;
  procedure I_Inner (Param : access Integer) is
begin
  Var := Typ (Param); --  OK
end;
  begin
H (Param); --  OK
I_Inner (Param);
  end;
  end;

  task type J is end;

  task body J is
function K (Param : access Integer) return Boolean is
  type Typ is access all Integer;
  function K_Inner (Param : access Integer) return Typ is
begin
  return Typ (Param); --  OK
end;
  begin
return K_Inner (Param) = Typ (Param);
  end;

function L (Param : access Integer) return Boolean;
function L (Param : access Integer) return Boolean is
  type Typ is access all Integer;
  function L_Inner (Param : access Integer) return Typ is
begin
  return Typ (Param); --  OK
end;
  begin
return L_Inner (Param) = Typ (Param);
  end;

procedure M (Param : access Integer) is
  type Typ is access all Integer;
  Var : Typ;
  procedure M_Inner (Param : access Integer) is
begin
  Var := Typ (Param); --  OK
end;
  begin
M_Inner (Param);
  end;

procedure N (Param : access Integer);
procedure N (Param : access Integer) is
  type Typ is access all Integer;
  Var : Typ;
  procedure N_Inner (Param : access Integer) is
begin
  Var := Typ (Param); --  OK
end;
  begin
N_Inner (Param);
  end;
Var : aliased Integer := 666;
begin
  if K (Var'Access) then null; end if; --  OK
  if L (Var'Access) then null; end if; --  OK
  M (Var'Access);  --  OK
  N (Var'Access);  --  OK
end;

begin
  begin
begin
  declare
  Var  : aliased Integer := 666;
  T: J;
  Prot : E;
  begin
if A (Var'Access) then null; end if;  --  OK
if B (Var'Access) then null; end if;  --  OK
C (Var'Access);   --  OK
D (Var'Access);   --  OK
if Prot.G (Var'Access) then null; end if; --  OK
Prot.I (Var'Access);  --  OK
  end;
end;
  end;
end;

--  fail.adb

procedure Fail is
  Failures : Integer := 0;

  type Base_Typ is access all Integer

[Ada] No Storage_Error for an oversized disabled ghost array object

2019-09-18 Thread Pierre-Marie de Rodat

In some cases where the size computation for an object declaration will
unconditionally overflow, the FE generates code to raise Storage_Error
at the point of the object declaration (and may generate an associated
warning). Don't do this if the object declaration is an ignored (i.e.,
disabled) ghost declaration.

Tested on x86_64-pc-linux-gnu, committed on trunk

2019-09-18  Steve Baird  

gcc/ada/

* freeze.adb (Freeze_Object_Declaration): Do not call
Check_Large_Modular_Array when the object declaration being
frozen is an ignored ghost entity.

gcc/testsuite/

* gnat.dg/ghost7.adb, gnat.dg/ghost7.ads: New testcase.--- gcc/ada/freeze.adb
+++ gcc/ada/freeze.adb
@@ -3569,7 +3569,8 @@ package body Freeze is
 Error_Msg_N ("\??use explicit size clause to set size", E);
  end if;
 
- if Is_Array_Type (Typ) then
+ --  Declaring a too-big array in disabled ghost code is OK
+ if Is_Array_Type (Typ) and then not Is_Ignored_Ghost_Entity (E) then
 Check_Large_Modular_Array (Typ);
  end if;
   end Freeze_Object_Declaration;

--- /dev/null
new file mode 100644
+++ gcc/testsuite/gnat.dg/ghost7.adb
@@ -0,0 +1,6 @@
+--  { dg-do compile }
+--  { dg-options "-gnatwa" }
+
+package body Ghost7 is
+   procedure Dummy is null;
+end Ghost7;

--- /dev/null
new file mode 100644
+++ gcc/testsuite/gnat.dg/ghost7.ads
@@ -0,0 +1,8 @@
+pragma Restrictions (No_Exception_Propagation);
+
+package Ghost7 is
+   type Word64 is mod 2**64;
+   type My_Array_Type is array (Word64) of Boolean;
+   My_Array : My_Array_Type with Ghost;
+   procedure Dummy;
+end Ghost7;
\ No newline at end of file

[Ada] Improve efficiency of copying bit-packed slices

2019-09-18 Thread Pierre-Marie de Rodat

This patch substantially improves the efficiency of copying large slices
of bit-packed arrays, by copying 32 bits at a time instead of 1 at a
time.

Tested on x86_64-pc-linux-gnu, committed on trunk

2019-09-18  Bob Duff  

gcc/ada/

* exp_ch5.adb (Expand_Assign_Array_Loop_Or_Bitfield): The call
to Copy_Bitfield is now enabled.
(Expand_Assign_Array_Bitfield): Multiply 'Length times
'Component_Size "by hand" instead of using 'Size.--- gcc/ada/exp_ch5.adb
+++ gcc/ada/exp_ch5.adb
@@ -1411,12 +1411,21 @@ package body Exp_Ch5 is
   --  Compute the Size of the bitfield
 
   --  Note that the length check has already been done, so we can use the
-  --  size of either L or R.
+  --  size of either L or R; they are equal. We can't use 'Size here,
+  --  because sometimes bit fields get copied into a temp, and the 'Size
+  --  ends up being the size of the temp (e.g. an 8-bit temp containing
+  --  a 4-bit bit field).
 
   Size : constant Node_Id :=
-Make_Attribute_Reference (Loc,
-  Prefix => Duplicate_Subexpr (Name (N), True),
-  Attribute_Name => Name_Size);
+Make_Op_Multiply (Loc,
+  Make_Attribute_Reference (Loc,
+Prefix =>
+  Duplicate_Subexpr (Name (N), True),
+Attribute_Name => Name_Length),
+  Make_Attribute_Reference (Loc,
+Prefix =>
+  Duplicate_Subexpr (Name (N), True),
+Attribute_Name => Name_Component_Size));
 
begin
   return Make_Procedure_Call_Statement (Loc,
@@ -1466,10 +1475,7 @@ package body Exp_Ch5 is
   --  optimization in that case as well.  We could complicate this code by
   --  actually looking for such volatile and independent components.
 
-  --  Note that Expand_Assign_Array_Bitfield is disabled for now.
-
-  if False and then -- ???
-RTE_Available (RE_Copy_Bitfield)
+  if RTE_Available (RE_Copy_Bitfield)
 and then Is_Bit_Packed_Array (L_Type)
 and then Is_Bit_Packed_Array (R_Type)
 and then not Reverse_Storage_Order (L_Type)

[Ada] Crash on universal case expression in fixed-point division

2019-09-18 Thread Pierre-Marie de Rodat

This patch fixes a compiler abort on a case expression whose
alternatives are universal_real constants, when the case expression is
an operand in a multiplication or division whose other operand is of a
fixed-point type.

Tested on x86_64-pc-linux-gnu, committed on trunk

2019-09-18  Ed Schonberg  

gcc/ada/

* sem_res.adb (Set_Mixed_Node_Expression): If a conditional
expression has universal_real alternaitves and the context is
Universal_Fixed, as when it is an operand in a fixed-point
multiplication or division, resolve the expression with a
visible fixed-point type, which must be unique.

gcc/testsuite/

* gnat.dg/fixedpnt8.adb: New testcase.--- gcc/ada/sem_res.adb
+++ gcc/ada/sem_res.adb
@@ -5674,13 +5674,21 @@ package body Sem_Res is
 
  --  A universal real conditional expression can appear in a fixed-type
  --  context and must be resolved with that context to facilitate the
- --  code generation in the back end.
+ --  code generation in the back end. However, If the context is
+ --  Universal_fixed (i.e. as an operand of a multiplication/division
+ --  involving a fixed-point operand) the conditional expression must
+ --  resolve to a unique visible fixed_point type, normally Duration.
 
  elsif Nkind_In (N, N_Case_Expression, N_If_Expression)
and then Etype (N) = Universal_Real
and then Is_Fixed_Point_Type (B_Typ)
  then
-Resolve (N, B_Typ);
+if B_Typ = Universal_Fixed then
+   Resolve (N, Unique_Fixed_Point_Type (N));
+
+else
+   Resolve (N, B_Typ);
+end if;
 
  else
 Resolve (N);

--- /dev/null
new file mode 100644
+++ gcc/testsuite/gnat.dg/fixedpnt8.adb
@@ -0,0 +1,28 @@
+--  { dg-do compile }
+
+procedure Fixedpnt8 is
+
+   Ct_A : constant := 0.000_000_100;
+   Ct_B : constant := 0.000_000_025;
+
+   Ct_C : constant := 1_000;
+
+   type Number_Type is range 0 .. Ct_C;
+
+   subtype Index_Type is Number_Type range 1 .. Number_Type'Last;
+
+   type Kind_Enumerated_Type is
+  (A1,
+   A2);
+
+   Kind : Kind_Enumerated_Type := A1;
+
+   V : Duration := 10.0;
+
+   Last : constant Index_Type :=
+  Index_Type (V / (case Kind is --  { dg-warning "universal_fixed expression interpreted as type \"Standard.Duration\"" }
+  when A1 => Ct_B,
+  when A2 => Ct_A));
+begin
+   null;
+end Fixedpnt8;
\ No newline at end of file

[Ada] Refine type of Get_Homonym_Number result

2019-09-18 Thread Pierre-Marie de Rodat

Routine Get_Homonym_Number always returns a positive number. This is
explained in its comment and is evident from its body. No test attached,
because semantics is unaffected.

Tested on x86_64-pc-linux-gnu, committed on trunk

2019-09-18  Piotr Trojanek  

gcc/ada/

* exp_dbug.ads, exp_dbug.adb (Get_Homonym_Number): Refine type
from Nat to Pos.
* sem_util.adb (Add_Homonym_Suffix): Refine type of a local
variable.--- gcc/ada/exp_dbug.adb
+++ gcc/ada/exp_dbug.adb
@@ -1058,9 +1058,9 @@ package body Exp_Dbug is
-- Get_Homonym_Number --

 
-   function Get_Homonym_Number (E : Entity_Id) return Nat is
+   function Get_Homonym_Number (E : Entity_Id) return Pos is
   H  : Entity_Id := Homonym (E);
-  Nr : Nat := 1;
+  Nr : Pos := 1;
 
begin
   while Present (H) loop

--- gcc/ada/exp_dbug.ads
+++ gcc/ada/exp_dbug.ads
@@ -460,7 +460,7 @@ package Exp_Dbug is
-- Subprograms for Handling Qualification --

 
-   function Get_Homonym_Number (E : Entity_Id) return Nat;
+   function Get_Homonym_Number (E : Entity_Id) return Pos;
--  Return the homonym number for E, which is its position in the homonym
--  chain starting at 1. This is exported for use in GNATprove.
 

--- gcc/ada/sem_util.adb
+++ gcc/ada/sem_util.adb
@@ -26183,7 +26183,7 @@ package body Sem_Util is
 
  if Has_Homonym (U) then
 declare
-   N : constant Nat := Get_Homonym_Number (U);
+   N : constant Pos := Get_Homonym_Number (U);
S : constant String := N'Img;
 begin
if N > 1 then

[Ada] Fix spurious alignment warning on simple address clause

2019-09-18 Thread Pierre-Marie de Rodat

This eliminates a spurious alignment warning given by the compiler on an
address clause when the No_Exception_Propagation restriction is in
effect and the -gnatw.x switch is used. In this configuration the
address clauses whose expression is itself of the form X'Address would
not be sufficiently analyzed and, therefore, the compiler might give
false positive warnings.

Tested on x86_64-pc-linux-gnu, committed on trunk

2019-09-18  Eric Botcazou  

gcc/ada/

* checks.ads (Alignment_Warnings_Record): Add P component.
* checks.adb (Apply_Address_Clause_Check): Be prepared to kill
the warning also if the clause is of the form X'Address.
(Validate_Alignment_Check_Warning): Kill the warning if the
clause is of the form X'Address and the alignment of X is
compatible.

gcc/testsuite/

* gnat.dg/warn31.adb, gnat.dg/warn31.ads: New testcase.--- gcc/ada/checks.adb
+++ gcc/ada/checks.adb
@@ -808,7 +808,21 @@ package body Checks is
 
 if Compile_Time_Known_Value (Expr) then
Alignment_Warnings.Append
- ((E => E, A => Expr_Value (Expr), W => Warning_Msg));
+ ((E => E,
+   A => Expr_Value (Expr),
+   P => Empty,
+   W => Warning_Msg));
+
+--  Likewise if the expression is of the form X'Address
+
+elsif Nkind (Expr) = N_Attribute_Reference
+  and then Attribute_Name (Expr) = Name_Address
+then
+   Alignment_Warnings.Append
+ ((E => E,
+   A => No_Uint,
+   P => Prefix (Expr),
+   W => Warning_Msg));
 
 --  Add explanation of the warning generated by the check
 
@@ -10925,7 +10939,12 @@ package body Checks is
 renames Alignment_Warnings.Table (J);
  begin
 if Known_Alignment (AWR.E)
-  and then AWR.A mod Alignment (AWR.E) = 0
+  and then ((AWR.A /= No_Uint
+  and then AWR.A mod Alignment (AWR.E) = 0)
+or else (Present (AWR.P)
+  and then Has_Compatible_Alignment
+ (AWR.E, AWR.P, True) =
+   Known_Compatible))
 then
Delete_Warning_And_Continuations (AWR.W);
 end if;

--- gcc/ada/checks.ads
+++ gcc/ada/checks.ads
@@ -90,7 +90,7 @@ package Checks is
--  When we have address clauses, there is an issue of whether the address
--  specified is appropriate to the alignment. In the general case where the
--  address is dynamic, we generate a check and a possible warning (this
-   --  warning occurs for example if we have a restricted run time with the
+   --  warning occurs for example if we have a restricted runtime with the
--  restriction No_Exception_Propagation). We also issue this warning in
--  the case where the address is static, but we don't know the alignment
--  at the time we process the address clause. In such a case, we issue the
@@ -98,7 +98,7 @@ package Checks is
--  annotated the actual alignment chosen) that the warning was not needed.
 
--  To deal with deleting these potentially annoying warnings, we save the
-   --  warning information in a table, and then delete the waranings in the
+   --  warning information in a table, and then delete the warnings in the
--  post compilation validation stage if we can tell that the check would
--  never fail (in general the back end will also optimize away the check
--  in such cases).
@@ -113,6 +113,9 @@ package Checks is
   --  Compile time known value of address clause for which the alignment
   --  is to be checked once we know the alignment.
 
+  P : Node_Id;
+  --  Prefix of address clause when it is of the form X'Address
+
   W : Error_Msg_Id;
   --  Id of warning message we might delete
end record;

--- /dev/null
new file mode 100644
+++ gcc/testsuite/gnat.dg/warn31.adb
@@ -0,0 +1,5 @@
+--  { dg-do compile }
+--  { dg-options "-gnatw.x -gnatd.a" }
+package body Warn31 is
+procedure Dummy is null;
+end Warn31;

--- /dev/null
new file mode 100644
+++ gcc/testsuite/gnat.dg/warn31.ads
@@ -0,0 +1,20 @@
+pragma Restrictions (No_Exception_Propagation);
+
+package Warn31 is
+
+   type U16 is mod 2 ** 16;
+   type U32 is mod 2 ** 32;
+
+   type Pair is record
+  X, Y : U16;
+   end record;
+   for Pair'Alignment use U32'Alignment;
+
+   Blob : array (1 .. 2) of Pair;
+
+   Sum : array (1 .. 2) of U32;
+   for Sum'Address use Blob'Address;
+
+   procedure Dummy;
+
+end Warn31;
\ No newline at end of file

[Ada] Implement AI12-0086's rules for discriminants in aggregates

2019-09-18 Thread Pierre-Marie de Rodat

In Ada2012, a discriminant value that governs an active variant part in
an aggregate had to be static. AI12-0086 relaxes this restriction - if
the subtype of the discriminant value is a static subtype all of whose
values select the same variant, then that is good enough.

Tested on x86_64-pc-linux-gnu, committed on trunk

2019-09-18  Steve Baird  

gcc/ada/

* sem_util.ads (Interval_Lists): A new visible package. This
package is visible because it is also intended for eventual use
in Sem_Eval.Subtypes_Statically_Compatible when that function is
someday upgraded to handle static predicates correctly.  This
new package doesn't really need to be visible for now, but it
still seems like a good idea.
* sem_util.adb (Gather_Components): Implement AI12-0086 via the
following strategy. The existing code knows how to take a static
discriminant value and identify the corresponding variant; in
the newly-permitted case of a non-static value of a static
subtype, we arbitrarily select a value of the subtype and find
the corresponding variant using the existing code. Subsequently,
we check that every other value of the discriminant's subtype
corresponds to the same variant; this is done using the newly
introduced Interval_Lists package.
(Interval_Lists): Provide a body for the new package.

gcc/testsuite/

* gnat.dg/ai12_0086_example.adb: New testcase.--- gcc/ada/sem_util.adb
+++ gcc/ada/sem_util.adb
@@ -68,6 +68,7 @@ with Tbuild;   use Tbuild;
 with Ttypes;   use Ttypes;
 with Uname;use Uname;
 
+with GNAT.Heap_Sort_G;
 with GNAT.HTable; use GNAT.HTable;
 
 package body Sem_Util is
@@ -8885,11 +8886,17 @@ package body Sem_Util is
   Variant : Node_Id;
   Discrete_Choice : Node_Id;
   Comp_Item   : Node_Id;
+  Discrim : Entity_Id;
+  Discrim_Name: Node_Id;
 
-  Discrim   : Entity_Id;
-  Discrim_Name  : Node_Id;
-  Discrim_Value : Node_Id;
+  type Discriminant_Value_Status is
+(Static_Expr, Static_Subtype, Bad);
+  subtype Good_Discrim_Value_Status is Discriminant_Value_Status
+range Static_Expr .. Static_Subtype; -- range excludes Bad
 
+  Discrim_Value : Node_Id;
+  Discrim_Value_Subtype : Node_Id;
+  Discrim_Value_Status  : Discriminant_Value_Status := Bad;
begin
   Report_Errors := False;
 
@@ -9022,26 +9029,73 @@ package body Sem_Util is
   end loop Find_Constraint;
 
   Discrim_Value := Expression (Assoc);
+  if Is_OK_Static_Expression (Discrim_Value) then
+ Discrim_Value_Status := Static_Expr;
+  else
+ if Ada_Version >= Ada_2020 then
+if Original_Node (Discrim_Value) /= Discrim_Value
+   and then Nkind (Discrim_Value) = N_Type_Conversion
+   and then Etype (Original_Node (Discrim_Value))
+  = Etype (Expression (Discrim_Value))
+then
+   Discrim_Value_Subtype := Etype (Original_Node (Discrim_Value));
+   --  An unhelpful (for this code) type conversion may be
+   --  introduced in some cases; deal with it.
+else
+   Discrim_Value_Subtype := Etype (Discrim_Value);
+end if;
 
-  if not Is_OK_Static_Expression (Discrim_Value) then
+if Is_OK_Static_Subtype (Discrim_Value_Subtype) and then
+   not Is_Null_Range (Type_Low_Bound (Discrim_Value_Subtype),
+  Type_High_Bound (Discrim_Value_Subtype))
+then
+   --  Is_Null_Range test doesn't account for predicates, as in
+   --subtype Null_By_Predicate is Natural
+   --  with Static_Predicate => Null_By_Predicate < 0;
+   --  so test for that null case separately.
+
+   if (not Has_Static_Predicate (Discrim_Value_Subtype))
+ or else Present (First (Static_Discrete_Predicate
+   (Discrim_Value_Subtype)))
+   then
+  Discrim_Value_Status := Static_Subtype;
+   end if;
+end if;
+ end if;
 
- --  If the variant part is governed by a discriminant of the type
- --  this is an error. If the variant part and the discriminant are
- --  inherited from an ancestor this is legal (AI05-120) unless the
- --  components are being gathered for an aggregate, in which case
- --  the caller must check Report_Errors.
+ if Discrim_Value_Status = Bad then
 
- if Scope (Original_Record_Component
- ((Entity (First (Choices (Assoc)) = Typ
- then
-Error_Msg_FE
-  ("value for discriminant & must be static!",
-   Discrim_Value, Discrim);
-Why_Not_Static (Discrim_Value);
- end if;
+--

[Ada] Fix portability issues in access to subprograms

2019-09-18 Thread Pierre-Marie de Rodat

This patch improves the portability of the code generated by the
compiler for access to subprograms. Written by Richard Kenner.

Tested on x86_64-pc-linux-gnu, committed on trunk

2019-09-18  Javier Miranda  

gcc/ada/

* exp_ch4.adb (Expand_N_Op_Eq): The frontend assumes that we can
do a bit-for-bit comparison of two access to protected
subprogram pointers. However, there are two reasons why we may
not be able to do that: (1) there may be padding bits for
alignment before the access to subprogram, and (2) the access to
subprogram itself may not be compared bit-for- bit because the
activation record part is undefined: two pointers are equal iff
the subprogram addresses are equal. This patch fixes it by
forcing a field-by-field comparison.
* bindgen.adb (Gen_Adainit): The type No_Param_Proc is defined
in the library as having Favor_Top_Level, but when we create an
object of that type in the binder file we don't have that
pragma, so the types are different. This patch fixes this issue.
* libgnarl/s-interr.adb, libgnarl/s-interr__hwint.adb,
libgnarl/s-interr__sigaction.adb, libgnarl/s-interr__vxworks.adb
(Is_Registered): This routine erroneously assumes that the
access to protected subprogram is two addresses. We need to
create the same record that the compiler makes to ensure that
any padding is the same. Then we have to look at just the first
word of the access to subprogram. This patch fixes this issue.--- gcc/ada/bindgen.adb
+++ gcc/ada/bindgen.adb
@@ -524,6 +524,7 @@ package body Bindgen is
 and then not Configurable_Run_Time_On_Target
   then
  WBI ("   type No_Param_Proc is access procedure;");
+ WBI ("   pragma Favor_Top_Level (No_Param_Proc);");
  WBI ("");
   end if;
 

--- gcc/ada/exp_ch4.adb
+++ gcc/ada/exp_ch4.adb
@@ -8221,6 +8221,32 @@ package body Exp_Ch4 is
 Insert_Actions  (N, Bodies,   Suppress => All_Checks);
 Analyze_And_Resolve (N, Standard_Boolean, Suppress => All_Checks);
  end if;
+
+  --  If unnesting, handle elementary types whose Equivalent_Types are
+  --  records because there may be padding or undefined fields.
+
+  elsif Unnest_Subprogram_Mode
+and then Ekind_In (Typl, E_Class_Wide_Type,
+ E_Class_Wide_Subtype,
+ E_Access_Subprogram_Type,
+ E_Access_Protected_Subprogram_Type,
+ E_Anonymous_Access_Protected_Subprogram_Type,
+ E_Access_Subprogram_Type,
+ E_Exception_Type)
+and then Present (Equivalent_Type (Typl))
+and then Is_Record_Type (Equivalent_Type (Typl))
+  then
+ Typl := Equivalent_Type (Typl);
+ Remove_Side_Effects (Lhs);
+ Remove_Side_Effects (Rhs);
+ Rewrite (N,
+   Expand_Record_Equality (N, Typl,
+ Unchecked_Convert_To (Typl, Lhs),
+ Unchecked_Convert_To (Typl, Rhs),
+ Bodies));
+
+ Insert_Actions  (N, Bodies,   Suppress => All_Checks);
+ Analyze_And_Resolve (N, Standard_Boolean, Suppress => All_Checks);
   end if;
 
   --  Test if result is known at compile time
@@ -9497,10 +9523,21 @@ package body Exp_Ch4 is
   Typ : constant Entity_Id := Etype (Left_Opnd (N));
 
begin
-  --  Case of elementary type with standard operator
+  --  Case of elementary type with standard operator.  But if
+  --  unnesting, handle elementary types whose Equivalent_Types are
+  --  records because there may be padding or undefined fields.
 
   if Is_Elementary_Type (Typ)
 and then Sloc (Entity (N)) = Standard_Location
+and then not (Ekind_In (Typ, E_Class_Wide_Type,
+E_Class_Wide_Subtype,
+E_Access_Subprogram_Type,
+E_Access_Protected_Subprogram_Type,
+E_Anonymous_Access_Protected_Subprogram_Type,
+E_Access_Subprogram_Type,
+E_Exception_Type)
+and then Present (Equivalent_Type (Typ))
+and then Is_Record_Type (Equivalent_Type (Typ)))
   then
  Binary_Op_Validity_Checks (N);
 

--- gcc/ada/libgnarl/s-interr.adb
+++ gcc/ada/libgnarl/s-interr.adb
@@ -545,9 +545,11 @@ package body System.Interrupts is
 
function Is_Registered (Handler : Parameterless_Handler) return Boolean is
 
+  type Acc_Proc is access procedure;
+
   type Fat_Ptr is record
  Object_Addr  : System.Address;
- Handler_Addr : System.Address;
+ Handler_Addr : Acc_Proc;
   end record;
 
   function To_Fat_Ptr is new A

[Ada] Crash on aggregate with dscriminant in if-expression as default

2019-09-18 Thread Pierre-Marie de Rodat

This patch fixes a crash on a an aggregate for a discriminated type,
when a component of the aggregate is also a discriminated type
constrained by a discriminant of the enclosing object, and the default
value for the component is a conditional expression that includes
references to that outer discriminant.

Tested on x86_64-pc-linux-gnu, committed on trunk

2019-09-18  Ed Schonberg  

gcc/ada/

* exp_aggr.adb (Expand_Record_Aggregate, Rewrite_Discriminant):
After rewriting a reference to an outer discriminant as a
selected component of the enclosing object, analyze the selected
component to ensure that the entity of the selector name is
properly set. This is necessary when the aggregate appears
within an expression that may have been analyzed already.

gcc/testsuite/

* gnat.dg/discr58.adb: New testcase.--- gcc/ada/exp_aggr.adb
+++ gcc/ada/exp_aggr.adb
@@ -3103,6 +3103,13 @@ package body Exp_Aggr is
   Make_Selected_Component (Loc,
 Prefix=> New_Copy_Tree (Lhs),
 Selector_Name => Make_Identifier (Loc, Chars (Expr;
+
+--  The generated code will be reanalyzed, but if the reference
+--  to the discriminant appears within an already analyzed
+--  expression (e.g. a conditional) we must set its proper entity
+--  now. Context is an initialization procedure.
+
+Analyze (Expr);
  end if;
 
  return OK;

--- /dev/null
new file mode 100644
+++ gcc/testsuite/gnat.dg/discr58.adb
@@ -0,0 +1,33 @@
+--  { dg-do compile }
+
+with Ada.Text_IO; use Ada.Text_IO;
+
+procedure Discr58 is
+
+   type Field(Flag : Boolean := True) is record
+  case Flag is
+ when True  => Param1 : Boolean := False;
+ when False => Param2 : Boolean := True;
+  end case;
+   end record;
+
+   type Header(Flag : Boolean := True) is record
+  Param3 : Integer := 0;
+  Params : Field(Flag) := (if Flag = True then
+  (Flag => True, others => <>)
+   else
+  (Flag => False, others => <>));
+   end record;
+
+   type Message(Flag : Boolean) is record
+
+  -- This assignment crashes GNAT
+  The_Header : Header(Flag) := Header'(Flag => True, others => <>);
+   end record;
+
+   It : Message (True);
+begin
+   Put_Line("Hello World");
+   Put_Line (Boolean'Image (It.The_Header.Flag));
+   Put_Line (Boolean'Image (It.The_Header.Params.Flag));
+end Discr58;
\ No newline at end of file

[Ada] Fix sharing of expression in array aggregate with others choice

2019-09-18 Thread Pierre-Marie de Rodat

This change fixes a long-standing issue in the compiler that is
generally silent but may lead to wrong code generation in specific
circumstances.  When an others choice in an array aggregate spans
multiple ranges, the compiler may generate multiple (groups of)
assignments for the ranges.

The problem is that it internally reuses the original expression for all
the ranges, which is problematic if this expression gets rewritten
during the processing of one of the ranges and typically causes a new
temporary to be shared between different ranges.

The solution is to duplicate the original expression for each range.

Tested on x86_64-pc-linux-gnu, committed on trunk

2019-09-18  Eric Botcazou  

gcc/ada/

* exp_aggr.adb (Build_Array_Aggr_Code): In STEP 1 (c), duplicate
the expression and reset the Loop_Actions for each loop
generated for an others choice.

gcc/testsuite/

* gnat.dg/aggr28.adb: New testcase.--- gcc/ada/exp_aggr.adb
+++ gcc/ada/exp_aggr.adb
@@ -2075,7 +2075,6 @@ package body Exp_Aggr is
 Choice := First (Choice_List (Assoc));
 while Present (Choice) loop
if Nkind (Choice) = N_Others_Choice then
-  Set_Loop_Actions (Assoc, New_List);
   Others_Assoc := Assoc;
   exit;
end if;
@@ -2122,7 +2121,8 @@ package body Exp_Aggr is
 
  if Present (Others_Assoc) then
 declare
-   First : Boolean := True;
+   First: Boolean := True;
+   Dup_Expr : Node_Id;
 
 begin
for J in 0 .. Nb_Choices loop
@@ -2160,9 +2160,19 @@ package body Exp_Aggr is
 or else not Empty_Range (Low, High)
   then
  First := False;
+
+ --  Duplicate the expression in case we will be generating
+ --  several loops. As a result the expression is no longer
+ --  shared between the loops and is reevaluated for each
+ --  such loop.
+
+ Expr := Get_Assoc_Expr (Others_Assoc);
+ Dup_Expr := New_Copy_Tree (Expr);
+ Set_Parent (Dup_Expr, Parent (Expr));
+
+ Set_Loop_Actions (Others_Assoc, New_List);
  Append_List
-   (Gen_Loop (Low, High,
-  Get_Assoc_Expr (Others_Assoc)), To => New_Code);
+   (Gen_Loop (Low, High, Dup_Expr), To => New_Code);
   end if;
end loop;
 end;

--- /dev/null
new file mode 100644
+++ gcc/testsuite/gnat.dg/aggr28.adb
@@ -0,0 +1,29 @@
+--  { dg-do run }
+
+procedure Aggr28 is
+
+  Count : Natural := 0;
+
+  function Get (S: String) return String is
+  begin
+Count := Count + 1;
+return S;
+  end;
+
+  Max_Error_Length : constant := 8;
+  subtype Error_Type is String (1 .. Max_Error_Length);
+
+  type Rec is record
+Text : Error_Type;
+  end record;
+
+  type Arr is array (1 .. 16) of Rec;
+
+  Table : constant Arr :=
+(3 => (Text => Get ("INVALID ")), others => (Text => Get ("OTHERS  ")));
+
+begin
+  if Count /= Table'Length then
+raise Program_Error;
+  end if;
+end;
\ No newline at end of file

[Ada] Avoid gnatbind regression caused by Copy_Bitfield

2019-09-18 Thread Pierre-Marie de Rodat

The recent Copy_Bitfield change caused gnatbind to change elaboration
order, causing different error messages.

Tested on x86_64-pc-linux-gnu, committed on trunk

2019-09-18  Bob Duff  

gcc/ada/

* exp_ch5.adb (Expand_Assign_Array_Loop_Or_Bitfield): Move call
to RTE_Available later, so it doesn't disturb the elab order.
The RE_Copy_Bitfield entity is defined in package
System.Bitfields which has a dependency on package
System.Bitfield_Utils, which has it its spec:

   pragma Elaborate_Body;

The query on RTE_Available forces loading and analyzing
System.Bitfields and all its withed units.--- gcc/ada/exp_ch5.adb
+++ gcc/ada/exp_ch5.adb
@@ -1475,8 +1475,7 @@ package body Exp_Ch5 is
   --  optimization in that case as well.  We could complicate this code by
   --  actually looking for such volatile and independent components.
 
-  if RTE_Available (RE_Copy_Bitfield)
-and then Is_Bit_Packed_Array (L_Type)
+  if Is_Bit_Packed_Array (L_Type)
 and then Is_Bit_Packed_Array (R_Type)
 and then not Reverse_Storage_Order (L_Type)
 and then not Reverse_Storage_Order (R_Type)
@@ -1489,6 +1488,7 @@ package body Exp_Ch5 is
 and then not Has_Independent_Components (R_Type)
 and then not L_Prefix_Comp
 and then not R_Prefix_Comp
+and then RTE_Available (RE_Copy_Bitfield)
   then
  return Expand_Assign_Array_Bitfield
(N, Larray, Rarray, L_Type, R_Type, Rev);

[Ada] Skip entity name qualification in GNATprove mode

2019-09-18 Thread Pierre-Marie de Rodat

GNATprove was using the qualification of names for entities with local
homonyms in the same scope, requiring the use of a suffix to
differentiate them. This caused problems for correctly identifying
primitive equality operators. This case is now handled like the rest of
entities in GNATprove, by instead updating Unique_Name to append the
suffix on-the-fly where needed.

There is no impact on compilation and hence no test.

Tested on x86_64-pc-linux-gnu, committed on trunk

2019-09-18  Yannick Moy  

gcc/ada/

* exp_dbug.adb (Append_Homonym_Number): Use new function
Get_Homonym_Number.
(Get_Homonym_Number): New function to return the homonym number.
(Qualify_Entity_Name): Remove special case for GNATprove.
* exp_dbug.ads (Get_Homonym_Number): Make the new function
public for use in GNATprove.
* frontend.adb (Frontend): Do not qualify names in GNATprove
mode.
* sem_util.adb (Unique_Name): Append homonym suffix where needed
for entities which have local homonyms in the same scope.--- gcc/ada/exp_dbug.adb
+++ gcc/ada/exp_dbug.adb
@@ -219,26 +219,12 @@ package body Exp_Dbug is
 
begin
   if Has_Homonym (E) then
- declare
-H  : Entity_Id := Homonym (E);
-Nr : Nat := 1;
-
- begin
-while Present (H) loop
-   if Scope (H) = Scope (E) then
-  Nr := Nr + 1;
-   end if;
-
-   H := Homonym (H);
-end loop;
-
-if Homonym_Len > 0 then
-   Homonym_Len := Homonym_Len + 1;
-   Homonym_Numbers (Homonym_Len) := '_';
-end if;
+ if Homonym_Len > 0 then
+Homonym_Len := Homonym_Len + 1;
+Homonym_Numbers (Homonym_Len) := '_';
+ end if;
 
-Add_Nat_To_H (Nr);
- end;
+ Add_Nat_To_H (Get_Homonym_Number (E));
   end if;
end Append_Homonym_Number;
 
@@ -1068,6 +1054,26 @@ package body Exp_Dbug is
   end loop;
end Build_Subprogram_Instance_Renamings;
 
+   
+   -- Get_Homonym_Number --
+   
+
+   function Get_Homonym_Number (E : Entity_Id) return Nat is
+  H  : Entity_Id := Homonym (E);
+  Nr : Nat := 1;
+
+   begin
+  while Present (H) loop
+ if Scope (H) = Scope (E) then
+Nr := Nr + 1;
+ end if;
+
+ H := Homonym (H);
+  end loop;
+
+  return Nr;
+   end Get_Homonym_Number;
+

-- Get_Secondary_DT_External_Name --

@@ -1451,25 +1457,6 @@ package body Exp_Dbug is
   if Has_Qualified_Name (Ent) then
  return;
 
-  --  In formal verification mode, simply append a suffix for homonyms.
-  --  We used to qualify entity names as full expansion does, but this was
-  --  removed as this prevents the verification back-end from using a short
-  --  name for debugging and user interaction. The verification back-end
-  --  already takes care of qualifying names when needed. Still mark the
-  --  name as being qualified, as Qualify_Entity_Name may be called more
-  --  than once on the same entity.
-
-  elsif GNATprove_Mode then
- if Has_Homonym (Ent) then
-Get_Name_String (Chars (Ent));
-Append_Homonym_Number (Ent);
-Output_Homonym_Numbers_Suffix;
-Set_Chars (Ent, Name_Enter);
- end if;
-
- Set_Has_Qualified_Name (Ent);
- return;
-
   --  If the entity is a variable encoding the debug name for an object
   --  renaming, then the qualified name of the entity associated with the
   --  renamed object can now be incorporated in the debug name.

--- gcc/ada/exp_dbug.ads
+++ gcc/ada/exp_dbug.ads
@@ -460,6 +460,10 @@ package Exp_Dbug is
-- Subprograms for Handling Qualification --

 
+   function Get_Homonym_Number (E : Entity_Id) return Nat;
+   --  Return the homonym number for E, which is its position in the homonym
+   --  chain starting at 1. This is exported for use in GNATprove.
+
procedure Qualify_Entity_Names (N : Node_Id);
--  Given a node N, that represents a block, subprogram body, or package
--  body or spec, or protected or task type, sets a fully qualified name

--- gcc/ada/frontend.adb
+++ gcc/ada/frontend.adb
@@ -492,7 +492,9 @@ begin
 
--  Qualify all entity names in inner packages, package bodies, etc
 
-   Exp_Dbug.Qualify_All_Entity_Names;
+   if not GNATprove_Mode then
+  Exp_Dbug.Qualify_All_Entity_Names;
+   end if;
 
--  SCIL backend requirement. Check that SCIL nodes associated with
--  dispatching calls reference subprogram calls.

--- gcc/ada/sem_util.adb
+++ gcc/ada/sem_util.adb
@@ -33,6 +33,7 @@ with Elists;   use Elists;
 with Errout;   use Errout;
 with Erroutc;  use Erroutc;
 with Exp_Ch11; use Exp_

[Ada] Use static discriminant value for discriminated task record

2019-09-18 Thread Pierre-Marie de Rodat

This patch allows the construction of a static subtype for the generated
constrained Secondary_Stack component of a task for which a stack size
is specified, when compiling for a restricted run-time that forbids
dynamic allocation. Needed for LLVM.

Tested on x86_64-pc-linux-gnu, committed on trunk

2019-09-18  Ed Schonberg  

gcc/ada/

* sem_ch3.adb (Constrain_Component_Type): For a discriminated
type, handle the case of a constraint given by a conversion of a
discriminant of the enclosing type. Necessary when compiling a
discriminated task for a restricted run-time, when the generated
Secondary_Stack component may be set by means of an aspect on
the task type.--- gcc/ada/sem_ch3.adb
+++ gcc/ada/sem_ch3.adb
@@ -13258,7 +13258,9 @@ package body Sem_Ch3 is
 
   function Build_Constrained_Discriminated_Type
 (Old_Type : Entity_Id) return Entity_Id;
-  --  Ditto for record components
+  --  Ditto for record components. Handle the case where the constraint
+  --  is a conversion of the discriminant value, introduced during
+  --  expansion.
 
   function Build_Constrained_Access_Type
 (Old_Type : Entity_Id) return Entity_Id;
@@ -13443,6 +13445,17 @@ package body Sem_Ch3 is
 
 if Is_Discriminant (Expr) then
Need_To_Create_Itype := True;
+
+--  After expansion of discriminated task types, the value
+--  of the discriminant may be converted to a run-time type
+--  for restricted run-times. Propagate the value of the
+--  discriminant ss well, so that e.g. the secondary stack
+--  component has a static constraint. Necessry for LLVM.
+
+elsif Nkind (Expr) = N_Type_Conversion
+  and then Is_Discriminant (Expression (Expr))
+then
+   Need_To_Create_Itype := True;
 end if;
 
 Next_Elmt (Old_Constraint);
@@ -13457,6 +13470,12 @@ package body Sem_Ch3 is
 
if Is_Discriminant (Expr) then
   Expr := Get_Discr_Value (Expr);
+
+   elsif Nkind (Expr) = N_Type_Conversion
+ and then Is_Discriminant (Expression (Expr))
+   then
+  Expr := New_Copy_Tree (Expr);
+  Set_Expression (Expr, Get_Discr_Value (Expression (Expr)));
end if;
 
Append (New_Copy_Tree (Expr), To => Constr_List);

[Ada] Spurious ineffective use_clause warning

2019-09-18 Thread Pierre-Marie de Rodat

This patch fixes an issue whereby expansion of post conditions may lead
to spurious ineffective use_clause warnings when a use type clause is
present in a package specification and a use package clause exists in
the package body on the package containing said type.

Tested on x86_64-pc-linux-gnu, committed on trunk

2019-09-18  Justin Squirek  

gcc/ada/

* sem_ch8.adb (Use_One_Type): Add guard to prevent warning on a
reundant use package clause where there is no previous
use_clause in the chain.

gcc/testsuite/

* gnat.dg/warn30.adb, gnat.dg/warn30.ads: New testcase.--- gcc/ada/sem_ch8.adb
+++ gcc/ada/sem_ch8.adb
@@ -10337,11 +10337,18 @@ package body Sem_Ch8 is
  --  The package where T is declared is already used
 
  elsif In_Use (Scope (T)) then
-Error_Msg_Sloc :=
-  Sloc (Find_Most_Prev (Current_Use_Clause (Scope (T;
-Error_Msg_NE -- CODEFIX
-  ("& is already use-visible through package use clause #??",
-   Id, T);
+--  Due to expansion of contracts we could be attempting to issue
+--  a spurious warning - so verify there is a previous use clause.
+
+if Current_Use_Clause (Scope (T)) /=
+ Find_Most_Prev (Current_Use_Clause (Scope (T)))
+then
+   Error_Msg_Sloc :=
+ Sloc (Find_Most_Prev (Current_Use_Clause (Scope (T;
+   Error_Msg_NE -- CODEFIX
+ ("& is already use-visible through package use clause #??",
+  Id, T);
+end if;
 
  --  The current scope is the package where T is declared
 

--- /dev/null
new file mode 100644
+++ gcc/testsuite/gnat.dg/warn30.adb
@@ -0,0 +1,10 @@
+--  { dg-do compile }
+--  { dg-options "-gnatwa" }
+with Interfaces; use Interfaces;
+
+package body Warn30 is
+   procedure Incr (X : in out Interfaces.Integer_64) is
+   begin
+  X := X + 1;
+   end Incr;
+end Warn30;
\ No newline at end of file

--- /dev/null
new file mode 100644
+++ gcc/testsuite/gnat.dg/warn30.ads
@@ -0,0 +1,6 @@
+with Interfaces; use type Interfaces.Integer_64;
+
+package Warn30 is
+   procedure Incr (X : in out Interfaces.Integer_64) with
+ Post => X = X'Old + 1;
+end Warn30;

Re: Make assemble_real generate canonical CONST_INTs

2019-09-18 Thread Richard Biener

On Tue, Sep 17, 2019 at 4:33 PM Richard Sandiford
 wrote:
>
> assemble_real used GEN_INT to create integers directly from the
> longs returned by real_to_target.  assemble_integer then went on
> to interpret the const_ints as though they had the mode corresponding
> to the accompanying size parameter:
>
>   imode = mode_for_size (size * BITS_PER_UNIT, mclass, 0).require ();
>
>   for (i = 0; i < size; i += subsize)
> {
>   rtx partial = simplify_subreg (omode, x, imode, i);
>
> But in the assemble_real case, X might not be canonical for IMODE.
>
> If the interface to assemble_integer is supposed to allow outputting
> (say) the low 4 bytes of a DImode integer, then the simplify_subreg
> above is wrong.  But if the number of bytes passed to assemble_integer
> is supposed to be the number of bytes that the integer actually contains,
> assemble_real is wrong.
>
> This patch takes the latter interpretation and makes assemble_real
> generate const_ints that are canonical for the number of bytes passed.
>
> The flip_storage_order handling assumes that each long is a full
> SImode, which e.g. excludes BITS_PER_UNIT != 8 and float formats
> whose memory size is not a multiple of 32 bits (which includes
> HFmode at least).  The patch therefore leaves that code alone.
> If interpreting each integer as SImode is correct, the const_ints
> that it generates are also correct.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  Also tested
> by making sure that there were no new errors from a range of
> cross-built targets.  OK to install?
>
> Richard
>
>
> 2019-09-17  Richard Sandiford  
>
> gcc/
> * varasm.c (assemble_real): Generate canonical const_ints.
>
> Index: gcc/varasm.c
> ===
> --- gcc/varasm.c2019-09-05 08:49:30.829739618 +0100
> +++ gcc/varasm.c2019-09-17 15:30:10.400740515 +0100
> @@ -2873,25 +2873,27 @@ assemble_real (REAL_VALUE_TYPE d, scalar
>real_to_target (data, &d, mode);
>
>/* Put out the first word with the specified alignment.  */
> +  unsigned int chunk_nunits = MIN (nunits, units_per);
>if (reverse)
>  elt = flip_storage_order (SImode, gen_int_mode (data[nelts - 1], 
> SImode));
>else
> -elt = GEN_INT (data[0]);
> -  assemble_integer (elt, MIN (nunits, units_per), align, 1);
> -  nunits -= units_per;
> +elt = GEN_INT (sext_hwi (data[0], chunk_nunits * BITS_PER_UNIT));

why the appearant difference between the storage-order flipping
variant using gen_int_mode vs. the GEN_INT with sext_hwi?
Can't we use gen_int_mode in the non-flipping path and be done with that?

> +  assemble_integer (elt, chunk_nunits, align, 1);
> +  nunits -= chunk_nunits;
>
>/* Subsequent words need only 32-bit alignment.  */
>align = min_align (align, 32);
>
>for (int i = 1; i < nelts; i++)
>  {
> +  chunk_nunits = MIN (nunits, units_per);
>if (reverse)
> elt = flip_storage_order (SImode,
>   gen_int_mode (data[nelts - 1 - i], SImode));
>else
> -   elt = GEN_INT (data[i]);
> -  assemble_integer (elt, MIN (nunits, units_per), align, 1);
> -  nunits -= units_per;
> +   elt = GEN_INT (sext_hwi (data[i], chunk_nunits * BITS_PER_UNIT));
> +  assemble_integer (elt, chunk_nunits, align, 1);
> +  nunits -= chunk_nunits;
>  }
>  }
>

[Ada] Raise exception on call to Expect for a dead process

2019-09-18 Thread Pierre-Marie de Rodat

Call to Expect for a dead process results in SIGBUS signal on Linux
systems. Process_Died exception is raised in this case now.

Tested on x86_64-pc-linux-gnu, committed on trunk

2019-09-18  Vadim Godunko  

gcc/ada/

* libgnat/g-expect.adb (Expect_Internal): Don't include invalid
file descriptors into the set of file descriptors for Poll.
Raise Process_Died exception when computed set of file
descriptors to monitor is empty.

gcc/testsuite/

* gnat.dg/expect4.adb: New testcase.--- gcc/ada/libgnat/g-expect.adb
+++ gcc/ada/libgnat/g-expect.adb
@@ -653,7 +653,9 @@ package body GNAT.Expect is
 
begin
   for J in Descriptors'Range loop
- if Descriptors (J) /= null then
+ if Descriptors (J) /= null
+   and then Descriptors (J).Output_Fd /= Invalid_FD
+ then
 Fds (Fds'First + Fds_Count) := Descriptors (J).Output_Fd;
 Fds_To_Descriptor (Fds'First + Fds_Count) := J;
 Fds_Count := Fds_Count + 1;
@@ -667,6 +669,14 @@ package body GNAT.Expect is
  end if;
   end loop;
 
+  if Fds_Count = 0 then
+ --  There are no descriptors to monitor, it means that process died.
+
+ Result := Expect_Process_Died;
+
+ return;
+  end if;
+
   declare
  Buffer : aliased String (1 .. Buffer_Size);
  --  Buffer used for input. This is allocated only once, not for

--- /dev/null
new file mode 100644
+++ gcc/testsuite/gnat.dg/expect4.adb
@@ -0,0 +1,35 @@
+--  { dg-do run }
+
+with GNAT.Expect.TTY;
+with GNAT.OS_Lib;
+
+procedure Expect4 is
+   Pid: GNAT.Expect.TTY.TTY_Process_Descriptor;
+   Args   : GNAT.OS_Lib.Argument_List (1 .. 0);
+   Result : GNAT.Expect.Expect_Match;
+
+begin
+   Pid.Non_Blocking_Spawn ("true", Args);
+
+   begin
+  Pid.Expect (Result, ".*");
+
+  raise Program_Error;
+
+   exception
+  when GNAT.Expect.Process_Died =>
+ null;
+   end;
+
+   begin
+  Pid.Expect (Result, ".*");
+
+  raise Program_Error;
+
+   exception
+  when GNAT.Expect.Process_Died =>
+ null;
+   end;
+
+   Pid.Close;
+end Expect4;
\ No newline at end of file

Re: [PATCH][ARM] Enable code hoisting with -Os (PR80155)

2019-09-18 Thread Richard Biener

On Tue, Sep 17, 2019 at 7:18 PM Wilco Dijkstra  wrote:
>
> Hi Richard,
>
> > The issue with the bugzilla is that it lacked appropriate testcase(s) and 
> > thus
> > it is now a mess.  There are clear testcases (maybe not in the benchmarks 
> > you
>
> Agreed - it's not clear whether any of the proposed changes would actually
> help the original issue. My patch absolutely does.
>
> > care about) that benefit from code hoisting as enabler, mainly when control
> > flow can be then converted to data flow.  Also note that "size 
> > optimizations"
> > are important for all cases where followup transforms have size limits on 
> > the IL
> > in place.
>
> The gain from -fcode-hoisting is about 0.2% overall on Thumb-2. Ie. it's 
> definitely
> useful, but there are much larger gains to be had from other tweaks [1]. So 
> we can
> live without it until a better solution is found.

A "solution" for better eembc benchmark results?

The issues are all latent even w/o code-hoisting since you can do the
same transform at the source level.  Which is usually why I argue
trying to fix this in code-hoisting is not a complete fix.  Nor is turning
off random GIMPLE passes for specific benchmark regressions.

Anyway, it's arm maintainers call if you want to have such hacks in
place or not.

As a release manager I say that GCC isn't a benchmark compiler.

As the one "responsible" for the code-hoisting introduction I say that
as long as I don't have access to the actual benchmark I can't assess
wrongdoing of the pass nor suggest an appropriate place for optimization.

Richard.

>
> [1] https://gcc.gnu.org/ml/gcc-patches/2019-07/msg01739.html
>
> Wilco

[Ada] Code cleanup of alignment representation clauses in dispatch tables

2019-09-18 Thread Pierre-Marie de Rodat

This patch does not modify the functionality of the compiler; it avoids
generating non-required alignment representation clauses for dispatch
tables.

Tested on x86_64-pc-linux-gnu, committed on trunk

2019-09-18  Javier Miranda  

gcc/ada/

* exp_disp.adb (Make_DT, Make_Secondary_DT): Remove generation
of alignment representation clause for the following tables:
Predef_Prims, Iface_DT, TSD, ITable, DT.--- gcc/ada/exp_disp.adb
+++ gcc/ada/exp_disp.adb
@@ -4041,7 +4041,6 @@ package body Exp_Disp is
  -- predef-prim-op-thunk-2'address,
  -- ...
  -- predef-prim-op-thunk-n'address);
- --   for Predef_Prims'Alignment use Address'Alignment
 
  --  Create the thunks associated with the predefined primitives and
  --  save their entity to fill the aggregate.
@@ -4125,16 +4124,6 @@ package body Exp_Disp is
 Object_Definition   => New_Occurrence_Of
  (Defining_Identifier (Decl), Loc),
 Expression => New_Node));
-
-Append_To (Result,
-  Make_Attribute_Definition_Clause (Loc,
-Name   => New_Occurrence_Of (Predef_Prims, Loc),
-Chars  => Name_Alignment,
-Expression =>
-  Make_Attribute_Reference (Loc,
-Prefix =>
-  New_Occurrence_Of (RTE (RE_Integer_Address), Loc),
-Attribute_Name => Name_Alignment)));
  end;
 
  --  Generate
@@ -4143,6 +4132,7 @@ package body Exp_Disp is
  --  (OSD_Table => (1 => ,
  --   ...
  -- N => ));
+ --   for OSD'Alignment use Address'Alignment;
 
  --   Iface_DT : Dispatch_Table (Nb_Prims) :=
  --   ([ Signature   =>  ],
@@ -4154,7 +4144,6 @@ package body Exp_Disp is
  --  prim-op-2'address,
  --  ...
  --  prim-op-n'address));
- --   for Iface_DT'Alignment use Address'Alignment;
 
  --  Stage 3: Initialize the discriminant and the record components
 
@@ -4454,17 +4443,6 @@ package body Exp_Disp is
Make_Aggregate (Loc,
  Expressions => DT_Aggr_List)));
 
- Append_To (Result,
-   Make_Attribute_Definition_Clause (Loc,
- Name   => New_Occurrence_Of (Iface_DT, Loc),
- Chars  => Name_Alignment,
-
- Expression =>
-   Make_Attribute_Reference (Loc,
- Prefix =>
-   New_Occurrence_Of (RTE (RE_Integer_Address), Loc),
- Attribute_Name => Name_Alignment)));
-
  if Exporting_Table then
 Export_DT (Typ, Iface_DT, Suffix_Index);
 
@@ -4946,7 +4924,6 @@ package body Exp_Disp is
 
  --  Generate:
  --DT : No_Dispatch_Table_Wrapper;
- --for DT'Alignment use Address'Alignment;
  --DT_Ptr : Tag := !Tag (DT.NDT_Prims_Ptr'Address);
 
  if not Has_DT (Typ) then
@@ -4960,16 +4937,6 @@ package body Exp_Disp is
 (RTE (RE_No_Dispatch_Table_Wrapper), Loc)));
 
 Append_To (Result,
-  Make_Attribute_Definition_Clause (Loc,
-Name   => New_Occurrence_Of (DT, Loc),
-Chars  => Name_Alignment,
-Expression =>
-  Make_Attribute_Reference (Loc,
-Prefix =>
-  New_Occurrence_Of (RTE (RE_Integer_Address), Loc),
-Attribute_Name => Name_Alignment)));
-
-Append_To (Result,
   Make_Object_Declaration (Loc,
 Defining_Identifier => DT_Ptr,
 Object_Definition   => New_Occurrence_Of (RTE (RE_Tag), Loc),
@@ -5008,7 +4975,6 @@ package body Exp_Disp is
 
  --  Generate:
  --DT : Dispatch_Table_Wrapper (Nb_Prim);
- --for DT'Alignment use Address'Alignment;
  --DT_Ptr : Tag := !Tag (DT.Prims_Ptr'Address);
 
  else
@@ -5037,16 +5003,6 @@ package body Exp_Disp is
 Constraints => DT_Constr_List;
 
 Append_To (Result,
-  Make_Attribute_Definition_Clause (Loc,
-Name   => New_Occurrence_Of (DT, Loc),
-Chars  => Name_Alignment,
-Expression =>
-  Make_Attribute_Reference (Loc,
-Prefix =>
-  New_Occurrence_Of (RTE (RE_Integer_Address), Loc),
-Attribute_Name => Name_Alignment)));
-
-Append_To (Result,
   Make_Object_Declaration (Loc,
 Defining_Identifier => DT_Ptr,
 Object_Defi

Re: Two more POLY_INT cases for dwarf2out.c

2019-09-18 Thread Richard Biener

On Wed, Sep 18, 2019 at 8:49 AM Richard Sandiford
 wrote:
>
> loc_list_for_tree_1 and add_const_value_attribute currently ICE
> on POLY_INTs.  loc_list_for_tree_1 can do something sensible but
> add_const_value_attribute has to punt, since the constant there
> needs to be a link-time rather than load-time or run-time constant.
>
> This is tested by later SVE patches.
>
> Tested on aarch64-linux-gnu with SVE (with and without follow-on
> patches) and x86_64-linux-gnu.  OK to install?

OK.

Richard.

> Richard
>
>
> 2019-09-18  Richard Sandiford  
>
> gcc/
> * dwarf2out.c (loc_list_from_tree_1): Handle POLY_INT_CST.
> (add_const_value_attribute): Handle CONST_POLY_INT.
>
> Index: gcc/dwarf2out.c
> ===
> --- gcc/dwarf2out.c 2019-09-17 15:27:11.37402 +0100
> +++ gcc/dwarf2out.c 2019-09-18 07:47:42.297132785 +0100
> @@ -18568,6 +18568,24 @@ loc_list_from_tree_1 (tree loc, int want
> }
>break;
>
> +case POLY_INT_CST:
> +  {
> +   if (want_address)
> + {
> +   expansion_failed (loc, NULL_RTX,
> + "constant address with a runtime component");
> +   return 0;
> + }
> +   poly_int64 value;
> +   if (!poly_int_tree_p (loc, &value))
> + {
> +   expansion_failed (loc, NULL_RTX, "constant too big");
> +   return 0;
> + }
> +   ret = int_loc_descriptor (value);
> +  }
> +  break;
> +
>  case CONSTRUCTOR:
>  case REAL_CST:
>  case STRING_CST:
> @@ -19684,6 +19702,7 @@ add_const_value_attribute (dw_die_ref di
>  case MINUS:
>  case SIGN_EXTEND:
>  case ZERO_EXTEND:
> +case CONST_POLY_INT:
>return false;
>
>  case MEM:

Re: Handle variable-length vectors in compute_record_mode

2019-09-18 Thread Richard Biener

On Wed, Sep 18, 2019 at 8:52 AM Richard Sandiford
 wrote:
>
> This patch makes compute_record_mode handle SVE vectors in the
> same way as it would handle fixed-length vectors.  There should
> be no change in behaviour for other targets.
>
> This is needed for the SVE equivalent of arm_neon.h types like
> int8x8x2_t (i.e. a pair of int8x8_ts).
>
> Tested on aarch64-linux-gnu with SVE (with and without follow-on
> patches) and x86_64-linux-gnu.  OK to install?

OK.

Richard.

> Richard
>
>
> 2019-09-18  Richard Sandiford  
>
> gcc/
> * stor-layout.c (compute_record_mode): Operate on poly_uint64
> sizes instead of uhwi sizes.
>
> Index: gcc/stor-layout.c
> ===
> --- gcc/stor-layout.c   2019-08-20 09:52:22.522737142 +0100
> +++ gcc/stor-layout.c   2019-09-18 07:49:59.796102474 +0100
> @@ -1811,7 +1811,8 @@ compute_record_mode (tree type)
>   line.  */
>SET_TYPE_MODE (type, BLKmode);
>
> -  if (! tree_fits_uhwi_p (TYPE_SIZE (type)))
> +  poly_uint64 type_size;
> +  if (!poly_int_tree_p (TYPE_SIZE (type), &type_size))
>  return;
>
>/* A record which has any BLKmode members must itself be
> @@ -1822,20 +1823,21 @@ compute_record_mode (tree type)
>if (TREE_CODE (field) != FIELD_DECL)
> continue;
>
> +  poly_uint64 field_size;
>if (TREE_CODE (TREE_TYPE (field)) == ERROR_MARK
>   || (TYPE_MODE (TREE_TYPE (field)) == BLKmode
>   && ! TYPE_NO_FORCE_BLK (TREE_TYPE (field))
>   && !(TYPE_SIZE (TREE_TYPE (field)) != 0
>&& integer_zerop (TYPE_SIZE (TREE_TYPE (field)
> - || ! tree_fits_uhwi_p (bit_position (field))
> + || !tree_fits_poly_uint64_p (bit_position (field))
>   || DECL_SIZE (field) == 0
> - || ! tree_fits_uhwi_p (DECL_SIZE (field)))
> + || !poly_int_tree_p (DECL_SIZE (field), &field_size))
> return;
>
>/* If this field is the whole struct, remember its mode so
>  that, say, we can put a double in a class into a DF
>  register instead of forcing it to live in the stack.  */
> -  if (simple_cst_equal (TYPE_SIZE (type), DECL_SIZE (field))
> +  if (known_eq (field_size, type_size)
>   /* Partial int types (e.g. __int20) may have TYPE_SIZE equal to
>  wider types (e.g. int32), despite precision being less.  Ensure
>  that the TYPE_MODE of the struct does not get set to the partial
> @@ -1855,7 +1857,6 @@ compute_record_mode (tree type)
>   For UNION_TYPE, if the widest field is MODE_INT then use that mode.
>   If the widest field is MODE_PARTIAL_INT, and the union will be passed
>   by reference, then use that mode.  */
> -  poly_uint64 type_size;
>if ((TREE_CODE (type) == RECORD_TYPE
> || (TREE_CODE (type) == UNION_TYPE
>&& (GET_MODE_CLASS (mode) == MODE_INT
> @@ -1864,7 +1865,6 @@ compute_record_mode (tree type)
>(pack_cumulative_args (0),
> function_arg_info (type, mode, /*named=*/false)))
>&& mode != VOIDmode
> -  && poly_int_tree_p (TYPE_SIZE (type), &type_size)
>&& known_eq (GET_MODE_BITSIZE (mode), type_size))
>  ;
>else

Re: Don't treat variable-length vectors as VLAs during gimplification

2019-09-18 Thread Richard Biener

On Wed, Sep 18, 2019 at 8:53 AM Richard Sandiford
 wrote:
>
> Source-level SVE vectors should be gimplified in the same way
> as normal fixed-length vectors rather than as VLAs.
>
> This is tested by later SVE patches.
>
> Tested on aarch64-linux-gnu with SVE (with and without follow-on
> patches) and x86_64-linux-gnu.  OK to install?

OK.

Richard.

> Richard
>
>
> 2019-09-18  Richard Sandiford  
>
> gcc/
> * gimplify.c (gimplify_decl_expr): Use poly_int_tree_p instead
> of checking specifically for INTEGER_CST.
>
> Index: gcc/gimplify.c
> ===
> --- gcc/gimplify.c  2019-08-08 18:11:51.411313290 +0100
> +++ gcc/gimplify.c  2019-09-18 07:52:22.799034800 +0100
> @@ -1754,11 +1754,12 @@ gimplify_decl_expr (tree *stmt_p, gimple
>tree init = DECL_INITIAL (decl);
>bool is_vla = false;
>
> -  if (TREE_CODE (DECL_SIZE_UNIT (decl)) != INTEGER_CST
> +  poly_uint64 size;
> +  if (!poly_int_tree_p (DECL_SIZE_UNIT (decl), &size)
>   || (!TREE_STATIC (decl)
>   && flag_stack_check == GENERIC_STACK_CHECK
> - && compare_tree_int (DECL_SIZE_UNIT (decl),
> -  STACK_CHECK_MAX_VAR_SIZE) > 0))
> + && maybe_gt (size,
> +  (unsigned HOST_WIDE_INT) 
> STACK_CHECK_MAX_VAR_SIZE)))
> {
>   gimplify_vla_decl (decl, seq_p);
>   is_vla = true;

Re: Make get_value_for_expr check for INTEGER_CSTs

2019-09-18 Thread Richard Biener

On Wed, Sep 18, 2019 at 8:54 AM Richard Sandiford
 wrote:
>
> CONSTANT lattice values are symbolic constants rather than
> compile-time constants, so among other things can be POLY_INT_CSTs.
> This patch fixes a case in which we assumed all CONSTANTs were either
> ADDR_EXPRs or INTEGER_CSTs.
>
> This is tested by later SVE patches.
>
> Tested on aarch64-linux-gnu with SVE (with and without follow-on
> patches) and x86_64-linux-gnu.  OK to install?

OK.

Richard.

> Richard
>
>
> 2019-09-18  Richard Sandiford  
>
> gcc/
> * tree-ssa-ccp.c (get_value_for_expr): Check whether CONSTANTs
> are INTEGER_CSTs.
>
> Index: gcc/tree-ssa-ccp.c
> ===
> --- gcc/tree-ssa-ccp.c  2019-08-21 14:58:05.999057076 +0100
> +++ gcc/tree-ssa-ccp.c  2019-09-18 07:53:36.930481545 +0100
> @@ -615,9 +615,17 @@ get_value_for_expr (tree expr, bool for_
>   val.mask = -1;
> }
>if (for_bits_p
> - && val.lattice_val == CONSTANT
> - && TREE_CODE (val.value) == ADDR_EXPR)
> -   val = get_value_from_alignment (val.value);
> + && val.lattice_val == CONSTANT)
> +   {
> + if (TREE_CODE (val.value) == ADDR_EXPR)
> +   val = get_value_from_alignment (val.value);
> + else if (TREE_CODE (val.value) != INTEGER_CST)
> +   {
> + val.lattice_val = VARYING;
> + val.value = NULL_TREE;
> + val.mask = -1;
> +   }
> +   }
>/* Fall back to a copy value.  */
>if (!for_bits_p
>   && val.lattice_val == VARYING

Re: [PATCH] Come up with debug counter for store-merging.

2019-09-18 Thread Richard Biener

On Wed, Sep 18, 2019 at 9:22 AM Martin Liška  wrote:
>
> Hi.
>
> After I spent quite some time with PR91758, I would like
> to see a debug counter in store merging for the next time.
>
> Ready to be installed?
OK.

Richard.

> Thanks,
> Martin
>
> gcc/ChangeLog:
>
> 2019-09-18  Martin Liska  
>
> * dbgcnt.def (store_merging): New counter.
> * gimple-ssa-store-merging.c 
> (imm_store_chain_info::output_merged_stores):
> Use it in store merging.
> ---
>  gcc/dbgcnt.def | 1 +
>  gcc/gimple-ssa-store-merging.c | 4 +++-
>  2 files changed, 4 insertions(+), 1 deletion(-)
>
>

Re: Add ARRAY_REF based access patch disambiguation

2019-09-18 Thread Richard Biener

On Thu, 15 Aug 2019, Jan Hubicka wrote:

> Hi,
> here is updated version.
> > > +   /* We generally assume that both access paths starts by same sequence
> > > +  of refs.  However if number of array refs is not in sync, try
> > > +  to recover and pop elts until number match.  This helps the case
> > > +  where one access path starts by array and other by element.  */
> > 
> > I think part of the confusion is that we're doing this in the outer
> > while loop, so "starts" applies to all sub-paths we consider?
> > 
> > Or only to the innermost?
> > 
> > So - why's this inside the loop?  Some actual access path pair
> > examples in the comment would help.  And definitely more testcases
> > since the single one you add is too simplistic to need all this code ;)
> 
> I have added a testcase. Basically we hope the chain of array refs to
> end by same type and in that case we want to peel out the outermost. You
> are right that it does not work always especially if the innermost
> reference is array (which is rare).
> 
> I suppose one can do type matching once actually have
> same_type_for_tbaa_p working on array_ref.
> 
> I have added testcase, if you would preffer to move the logic out of the
> walking loop, I can do that. I can also just drop it and handle this
> later.
> 
> I put it into inner loop basically to increase chances that we
> succesfully walk access paths of different types in situation
> -fno-strict-aliasing is used and the type sequences are not fully
> compatible. 
> 
> I plan to put some love into -fno-strict-aliasing incrementally.
> 
> This patch adds testcase for the access paths of different lengths and
> fixes other issues discussed. It is another testcase where fre1 seems to
> give up and fre3 is needed.
> 
> Before fre1 we get:
> test (int i, int j)
> {
>   int[10][10] * innerptr;
>   int[10][10][10] * barptr.0_1;
>   int[10][10][10] * barptr.1_2;
>   int _9;
> 
>:
>   innerptr_4 = barptr;
>   barptr.0_1 = barptr;
>   (*barptr.0_1)[i_5(D)][2][j_6(D)] = 10;
>   (*innerptr_4)[3][j_6(D)] = 11;
>   barptr.1_2 = barptr;
>   _9 = (*barptr.1_2)[i_5(D)][2][j_6(D)];
>   return _9;
> 
> }
> 
> that is optimized to:
> test (int i, int j)
> {
>   int[10][10] * innerptr;
>   int _9;
> 
>:
>   innerptr_4 = barptr;
>   MEM[(int[10][10][10] *)innerptr_4][i_5(D)][2][j_6(D)] = 10;
>   (*innerptr_4)[3][j_6(D)] = 11;
>   _9 = MEM[(int[10][10][10] *)innerptr_4][i_5(D)][2][j_6(D)];
>   return _9;
> 
> }
> 
> before fre3 we get:
> test (int i, int j)
> {
>   int[10][10] * innerptr;
>   int _7;
> 
>[local count: 1073741824]:
>   innerptr_2 = barptr;
>   MEM[(int[10][10][10] *)innerptr_2][i_3(D)][2][j_4(D)] = 10;
>   (*innerptr_2)[3][j_4(D)] = 11;
>   _7 = MEM[(int[10][10][10] *)innerptr_2][i_3(D)][2][j_4(D)];
>   return _7;
> 
> }
> and fre3 does:
> test (int i, int j)
> {
>   int[10][10] * innerptr;
> 
>[local count: 1073741824]:
>   innerptr_2 = barptr;
>   MEM[(int[10][10][10] *)innerptr_2][i_3(D)][2][j_4(D)] = 10;
>   (*innerptr_2)[3][j_4(D)] = 11;
>   return 10;
> 
> }

This is OK.

Thanks and sorry for the delay...
Richard.

> Honza
> 
>   * tree-ssa-alias.c (nonoverlapping_component_refs_since_match_p):
>   Rename to ...
>   (nonoverlapping_refs_since_match_p): ... this; handle also
>   ARRAY_REFs.
>   (alias_stats): Update stats.
>   (dump_alias_stats): Likewise.
>   (cheap_array_ref_low_bound): New function.
>   (aliasing_matching_component_refs_p): Add partial_overlap
>   argument;
>   pass it to nonoverlapping_refs_since_match_p.
>   (aliasing_component_refs_walk): Update call of
>   aliasing_matching_component_refs_p
>   (nonoverlapping_array_refs_p): New function.
>   (decl_refs_may_alias_p, indirect_ref_may_alias_decl_p,
>   indirect_refs_may_alias_p): Update calls of
>   nonoverlapping_refs_since_match_p.
>   * gcc.dg/tree-ssa/alias-access-path-10.c: New testcase.
> 
> Index: testsuite/gcc.dg/tree-ssa/alias-access-path-10.c
> ===
> --- testsuite/gcc.dg/tree-ssa/alias-access-path-10.c  (nonexistent)
> +++ testsuite/gcc.dg/tree-ssa/alias-access-path-10.c  (working copy)
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-fre1" } */
> +
> +struct a {int array[3];} a[10];
> +int
> +test(int i,int j)
> +{
> +  a[i].array[1]=123;
> +  a[j].array[2]=2;
> +  return a[i].array[1];
> +}
> +/* { dg-final { scan-tree-dump-times "return 123" 1 "fre1"} } */
> Index: testsuite/gcc.dg/tree-ssa/alias-access-path-11.c
> ===
> --- testsuite/gcc.dg/tree-ssa/alias-access-path-11.c  (nonexistent)
> +++ testsuite/gcc.dg/tree-ssa/alias-access-path-11.c  (working copy)
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fno-strict-aliasing -fdump-tree-fre3" } */
> +typedef int outerarray[10][10][10];
> +typedef int innerarray[10][10];
> +outerarray *b

Re: Make assemble_real generate canonical CONST_INTs

2019-09-18 Thread Richard Sandiford

Richard Biener  writes:
> On Tue, Sep 17, 2019 at 4:33 PM Richard Sandiford
>  wrote:
>>
>> assemble_real used GEN_INT to create integers directly from the
>> longs returned by real_to_target.  assemble_integer then went on
>> to interpret the const_ints as though they had the mode corresponding
>> to the accompanying size parameter:
>>
>>   imode = mode_for_size (size * BITS_PER_UNIT, mclass, 0).require ();
>>
>>   for (i = 0; i < size; i += subsize)
>> {
>>   rtx partial = simplify_subreg (omode, x, imode, i);
>>
>> But in the assemble_real case, X might not be canonical for IMODE.
>>
>> If the interface to assemble_integer is supposed to allow outputting
>> (say) the low 4 bytes of a DImode integer, then the simplify_subreg
>> above is wrong.  But if the number of bytes passed to assemble_integer
>> is supposed to be the number of bytes that the integer actually contains,
>> assemble_real is wrong.
>>
>> This patch takes the latter interpretation and makes assemble_real
>> generate const_ints that are canonical for the number of bytes passed.
>>
>> The flip_storage_order handling assumes that each long is a full
>> SImode, which e.g. excludes BITS_PER_UNIT != 8 and float formats
>> whose memory size is not a multiple of 32 bits (which includes
>> HFmode at least).  The patch therefore leaves that code alone.
>> If interpreting each integer as SImode is correct, the const_ints
>> that it generates are also correct.
>>
>> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  Also tested
>> by making sure that there were no new errors from a range of
>> cross-built targets.  OK to install?
>>
>> Richard
>>
>>
>> 2019-09-17  Richard Sandiford  
>>
>> gcc/
>> * varasm.c (assemble_real): Generate canonical const_ints.
>>
>> Index: gcc/varasm.c
>> ===
>> --- gcc/varasm.c2019-09-05 08:49:30.829739618 +0100
>> +++ gcc/varasm.c2019-09-17 15:30:10.400740515 +0100
>> @@ -2873,25 +2873,27 @@ assemble_real (REAL_VALUE_TYPE d, scalar
>>real_to_target (data, &d, mode);
>>
>>/* Put out the first word with the specified alignment.  */
>> +  unsigned int chunk_nunits = MIN (nunits, units_per);
>>if (reverse)
>>  elt = flip_storage_order (SImode, gen_int_mode (data[nelts - 1], 
>> SImode));
>>else
>> -elt = GEN_INT (data[0]);
>> -  assemble_integer (elt, MIN (nunits, units_per), align, 1);
>> -  nunits -= units_per;
>> +elt = GEN_INT (sext_hwi (data[0], chunk_nunits * BITS_PER_UNIT));
>
> why the appearant difference between the storage-order flipping
> variant using gen_int_mode vs. the GEN_INT with sext_hwi?
> Can't we use gen_int_mode in the non-flipping path and be done with that?

Yeah, I mentioned this in the covering note.  The flip_storage_order
stuff only seems to work for floats that are a multiple of 32 bits in
size, so it doesn't e.g. handle HFmode or 80-bit floats, whereas the
new "else" does.  Hard-coding SImode also hard-codes BITS_PER_UNIT==8,
unlike the "else".

So if anything, it's flip_storage_order that might need to change
to avoid hard-coding SImode.  That doesn't look like a trivial change
though.  E.g. the number of bytes passed to assemble_integer would need
to match the number of bytes in data[nelts - 1] rather than data[0].
The alignment code below would also need to be adjusted.  Fixing that
(if it is a bug) seems like a separate change and TBH I'd rather not
touch it here.

Thanks,
Richard

>
>> +  assemble_integer (elt, chunk_nunits, align, 1);
>> +  nunits -= chunk_nunits;
>>
>>/* Subsequent words need only 32-bit alignment.  */
>>align = min_align (align, 32);
>>
>>for (int i = 1; i < nelts; i++)
>>  {
>> +  chunk_nunits = MIN (nunits, units_per);
>>if (reverse)
>> elt = flip_storage_order (SImode,
>>   gen_int_mode (data[nelts - 1 - i], 
>> SImode));
>>else
>> -   elt = GEN_INT (data[i]);
>> -  assemble_integer (elt, MIN (nunits, units_per), align, 1);
>> -  nunits -= units_per;
>> +   elt = GEN_INT (sext_hwi (data[i], chunk_nunits * BITS_PER_UNIT));
>> +  assemble_integer (elt, chunk_nunits, align, 1);
>> +  nunits -= chunk_nunits;
>>  }
>>  }
>>

Re: [PATCH] RISC-V: Allow more load/stores to be compressed

2019-09-18 Thread Kito Cheng

Hi Craig:

Some general review comment:
- Split new pass into new file.
- Add new option to enable/disable this pass.
- Could you extend this patch to support lw/sw/ld/sd/flw/fsw/fld/fsd?
  I think there is lots of common logic for supporting other types
compressed load/store
  instruction, but I'd like to see those support at once.
- Do you have experimental data about doing that after register
allocation/reload,
  I'd prefer doing such optimization after RA, because we can
accurately estimate
  how many byte we can gain, I guess it because RA didn't assign fit
src/dest reg
  or base reg so that increase code size?

On Fri, Sep 13, 2019 at 12:20 AM Craig Blackmore
 wrote:
>
> This patch aims to allow more load/store instructions to be compressed by
> replacing a load/store of 'base register + large offset' with a new load/store
> of 'new base + small offset'. If the new base gets stored in a compressed
> register, then the new load/store can be compressed. Since there is an 
> overhead
> in creating the new base, this change is only attempted when 'base register' 
> is
> referenced in at least 4 load/stores in a basic block.
>
> The optimization is implemented in a new RISC-V specific pass called
> shorten_memrefs which is enabled for RVC targets. It has been developed for 
> the
> 32-bit lw/sw instructions but could also be extended to 64-bit ld/sd in 
> future.
>
> The patch saves 164 bytes (0.3%) on a proprietary application (59450 bytes
> compared to 59286 bytes) compiled for rv32imc bare metal with -Os. On the
> Embench benchmark suite (https://www.embench.org/) we see code size reductions
> of up to 18 bytes (0.7%) and only two cases where code size is increased
> slightly, by 2 bytes each:
>
> Embench results (.text size in bytes, excluding .rodata)
>
> Benchmark   Without patch  With patch  Diff
> aha-mont64  1052   10520
> crc32   232232 0
> cubic   2446   24482
> edn 1454   1450-4
> huffbench   1642   16420
> matmult-int 420420 0
> minver  1056   10560
> nbody   714714 0
> nettle-aes  2888   2884-4
> nettle-sha256   5566   5564-2
> nsichneu15052  15052   0
> picojpeg8078   80780
> qrduino 6140   61400
> sglib-combined  2444   24440
> slre2438   2420-18
> st  880880 0
> statemate   3842   38420
> ud  702702 0
> wikisort4278   42802
> -
> Total   61324  61300   -24
>
> The patch has been tested on the following bare metal targets using QEMU
> and there were no regressions:
>
>   rv32i
>   rv32iac
>   rv32im
>   rv32imac
>   rv32imafc
>   rv64imac
>   rv64imafdc
>
> We noticed that sched2 undoes some of the addresses generated by this
> optimization and consequently increases code size, therefore this patch adds a
> check in sched-deps.c to avoid changes that are expected to increase code size
> when not optimizing for speed. Since this change touches target-independent
> code, the patch has been bootstrapped and tested on x86 with no regressions.
>
> gcc/ChangeLog
>
> * config/riscv/riscv.c (tree-pass.h): New include.
> (cfg.h) Likewise.
> (context.h) Likewise.
> (riscv_compressed_reg_p): New function.
> (riscv_compressed_lw_address_p): Likewise.
> (riscv_legitimize_address): Attempt to convert base + large_offset
> to compressible new_base + small_offset.
> (riscv_address_cost): Make anticipated compressed load/stores
> cheaper for code size than uncompressed load/stores.
> (class pass_shorten_memrefs): New pass.
> (pass_shorten_memrefs::execute): Likewise.
> (make_pass_shorten_memrefs): Likewise.
> (riscv_option_override): Register shorten_memrefs pass for
> TARGET_RVC.
> (riscv_register_priority): Move compressed register check to
> riscv_compressed_reg_p.
> * sched-deps.c (attempt_change): When optimizing for code size
> don't make change if it increases code size.
>
> ---
>  gcc/config/riscv/riscv.c | 179 
> +--
>  gcc/sched-deps.c |  10 +++
>  2 files changed, 183 insertions(+), 6 deletions(-)
>
> diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
> index 39bf87a..e510314 100644
> --- a/gcc/config/riscv/riscv.c
> +++ b/gcc/config/riscv/riscv.c
> @@ -55,6 +55,9 @@ along with GCC; see the file COPYING3.  If not see
>  #include "diagnostic.h"
>  #include "builtins.h"
>  #include "predict.h"
> +#include "tree-pass.h"
> +#include "cfg.h"
> +#include "context.h"
>
>  /* True if

Re: Make assemble_real generate canonical CONST_INTs

2019-09-18 Thread Richard Biener

On Wed, Sep 18, 2019 at 11:41 AM Richard Sandiford
 wrote:
>
> Richard Biener  writes:
> > On Tue, Sep 17, 2019 at 4:33 PM Richard Sandiford
> >  wrote:
> >>
> >> assemble_real used GEN_INT to create integers directly from the
> >> longs returned by real_to_target.  assemble_integer then went on
> >> to interpret the const_ints as though they had the mode corresponding
> >> to the accompanying size parameter:
> >>
> >>   imode = mode_for_size (size * BITS_PER_UNIT, mclass, 0).require ();
> >>
> >>   for (i = 0; i < size; i += subsize)
> >> {
> >>   rtx partial = simplify_subreg (omode, x, imode, i);
> >>
> >> But in the assemble_real case, X might not be canonical for IMODE.
> >>
> >> If the interface to assemble_integer is supposed to allow outputting
> >> (say) the low 4 bytes of a DImode integer, then the simplify_subreg
> >> above is wrong.  But if the number of bytes passed to assemble_integer
> >> is supposed to be the number of bytes that the integer actually contains,
> >> assemble_real is wrong.
> >>
> >> This patch takes the latter interpretation and makes assemble_real
> >> generate const_ints that are canonical for the number of bytes passed.
> >>
> >> The flip_storage_order handling assumes that each long is a full
> >> SImode, which e.g. excludes BITS_PER_UNIT != 8 and float formats
> >> whose memory size is not a multiple of 32 bits (which includes
> >> HFmode at least).  The patch therefore leaves that code alone.
> >> If interpreting each integer as SImode is correct, the const_ints
> >> that it generates are also correct.
> >>
> >> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  Also tested
> >> by making sure that there were no new errors from a range of
> >> cross-built targets.  OK to install?
> >>
> >> Richard
> >>
> >>
> >> 2019-09-17  Richard Sandiford  
> >>
> >> gcc/
> >> * varasm.c (assemble_real): Generate canonical const_ints.
> >>
> >> Index: gcc/varasm.c
> >> ===
> >> --- gcc/varasm.c2019-09-05 08:49:30.829739618 +0100
> >> +++ gcc/varasm.c2019-09-17 15:30:10.400740515 +0100
> >> @@ -2873,25 +2873,27 @@ assemble_real (REAL_VALUE_TYPE d, scalar
> >>real_to_target (data, &d, mode);
> >>
> >>/* Put out the first word with the specified alignment.  */
> >> +  unsigned int chunk_nunits = MIN (nunits, units_per);
> >>if (reverse)
> >>  elt = flip_storage_order (SImode, gen_int_mode (data[nelts - 1], 
> >> SImode));
> >>else
> >> -elt = GEN_INT (data[0]);
> >> -  assemble_integer (elt, MIN (nunits, units_per), align, 1);
> >> -  nunits -= units_per;
> >> +elt = GEN_INT (sext_hwi (data[0], chunk_nunits * BITS_PER_UNIT));
> >
> > why the appearant difference between the storage-order flipping
> > variant using gen_int_mode vs. the GEN_INT with sext_hwi?
> > Can't we use gen_int_mode in the non-flipping path and be done with that?
>
> Yeah, I mentioned this in the covering note.  The flip_storage_order
> stuff only seems to work for floats that are a multiple of 32 bits in
> size, so it doesn't e.g. handle HFmode or 80-bit floats, whereas the
> new "else" does.  Hard-coding SImode also hard-codes BITS_PER_UNIT==8,
> unlike the "else".
>
> So if anything, it's flip_storage_order that might need to change
> to avoid hard-coding SImode.  That doesn't look like a trivial change
> though.  E.g. the number of bytes passed to assemble_integer would need
> to match the number of bytes in data[nelts - 1] rather than data[0].
> The alignment code below would also need to be adjusted.  Fixing that
> (if it is a bug) seems like a separate change and TBH I'd rather not
> touch it here.

Hmm, ok.  Patch is OK then.

Thanks,
Richard.

> Thanks,
> Richard
>
> >
> >> +  assemble_integer (elt, chunk_nunits, align, 1);
> >> +  nunits -= chunk_nunits;
> >>
> >>/* Subsequent words need only 32-bit alignment.  */
> >>align = min_align (align, 32);
> >>
> >>for (int i = 1; i < nelts; i++)
> >>  {
> >> +  chunk_nunits = MIN (nunits, units_per);
> >>if (reverse)
> >> elt = flip_storage_order (SImode,
> >>   gen_int_mode (data[nelts - 1 - i], 
> >> SImode));
> >>else
> >> -   elt = GEN_INT (data[i]);
> >> -  assemble_integer (elt, MIN (nunits, units_per), align, 1);
> >> -  nunits -= units_per;
> >> +   elt = GEN_INT (sext_hwi (data[i], chunk_nunits * BITS_PER_UNIT));
> >> +  assemble_integer (elt, chunk_nunits, align, 1);
> >> +  nunits -= chunk_nunits;
> >>  }
> >>  }
> >>

Re: [PATCH] Prevent LTO section collision for a symbol name starting with '*'.

2019-09-18 Thread Martin Liška

On 9/11/19 1:38 PM, Martin Liška wrote:
> The inline_clone manipulation happens in cgraph_node::find_replacement where
> we manipulate the clone_of.

I fixed that but there's a similar situation which goes other way around:

cgraph_node *
cgraph_node::get_create (tree decl)
{
  cgraph_node *first_clone = cgraph_node::get (decl);

  if (first_clone && !first_clone->global.inlined_to)
return first_clone;

  cgraph_node *node = cgraph_node::create (decl);
  if (first_clone)
{
  first_clone->clone_of = node;

Here we come up with a new parent and this->clone_of is set to the parent.
We ought to come cgraph_node::order here, but I don't like.
Right now cgraph_node::order is a way how one can identify a node in IPA dumps.

The patch is breaking that. I'm not sure we want the patch right now.
Martin
>From 61603b9a995d6cab7938ad2b0e2a14fde3d02308 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Mon, 26 Aug 2019 11:59:23 +0200
Subject: [PATCH] Use symtab_node::order in LTO sections with body.

---
 gcc/cgraph.c | 13 +
 gcc/cgraphclones.c   |  5 +
 gcc/ipa-fnsummary.c  |  2 +-
 gcc/ipa-hsa.c|  2 +-
 gcc/ipa-icf.c|  2 +-
 gcc/ipa-prop.c   |  6 --
 gcc/lto-cgraph.c | 13 +
 gcc/lto-opts.c   |  2 +-
 gcc/lto-section-in.c | 12 +++-
 gcc/lto-section-out.c|  2 +-
 gcc/lto-streamer-in.c|  4 ++--
 gcc/lto-streamer-out.c   | 21 -
 gcc/lto-streamer.c   |  9 +++--
 gcc/lto-streamer.h   | 10 +++---
 gcc/lto/lto-common.c | 13 +++--
 gcc/testsuite/gcc.dg/lto/pr91393_0.c | 11 +++
 gcc/varpool.c|  7 ---
 17 files changed, 85 insertions(+), 49 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/lto/pr91393_0.c

diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index 843891e9e56..9f86c2bdd10 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -3602,12 +3602,17 @@ cgraph_node::get_untransformed_body (void)
   struct lto_in_decl_state *decl_state
 	 = lto_get_function_in_decl_state (file_data, decl);
 
+  cgraph_node *origin = this;
+  while (origin->clone_of)
+origin = origin->clone_of;
+
+  int stream_order = origin->order - file_data->order_base;
   data = lto_get_section_data (file_data, LTO_section_function_body,
-			   name, &len, decl_state->compressed);
+			   name, stream_order, &len,
+			   decl_state->compressed);
   if (!data)
-fatal_error (input_location, "%s: section %s is missing",
-		 file_data->file_name,
-		 name);
+fatal_error (input_location, "%s: section %s.%d is missing",
+		 file_data->file_name, name, stream_order);
 
   gcc_assert (DECL_STRUCT_FUNCTION (decl) == NULL);
 
diff --git a/gcc/cgraphclones.c b/gcc/cgraphclones.c
index fa753697c78..f47c50fbb00 100644
--- a/gcc/cgraphclones.c
+++ b/gcc/cgraphclones.c
@@ -772,6 +772,11 @@ cgraph_node::find_replacement (void)
 	  n->clone_of = next_inline_clone;
 	  n = n->next_sibling_clone;
 	}
+
+  /* Update order in order to be able to find a LTO section
+	 with function body.  */
+  replacement->order = order;
+
   return replacement;
 }
   else
diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c
index 1bf1806eaf8..b7e0571cdfc 100644
--- a/gcc/ipa-fnsummary.c
+++ b/gcc/ipa-fnsummary.c
@@ -3463,7 +3463,7 @@ ipa_fn_summary_read (void)
   size_t len;
   const char *data = lto_get_section_data (file_data,
 	   LTO_section_ipa_fn_summary,
-	   NULL, &len);
+	   NULL, 0, &len);
   if (data)
 	inline_read_section (file_data, data, len);
   else
diff --git a/gcc/ipa-hsa.c b/gcc/ipa-hsa.c
index 8af1d734d85..c9739fa6135 100644
--- a/gcc/ipa-hsa.c
+++ b/gcc/ipa-hsa.c
@@ -278,7 +278,7 @@ ipa_hsa_read_summary (void)
 {
   size_t len;
   const char *data = lto_get_section_data (file_data, LTO_section_ipa_hsa,
-	   NULL, &len);
+	   NULL, 0, &len);
 
   if (data)
 	ipa_hsa_read_section (file_data, data, len);
diff --git a/gcc/ipa-icf.c b/gcc/ipa-icf.c
index c9c3cb4a331..41663821d36 100644
--- a/gcc/ipa-icf.c
+++ b/gcc/ipa-icf.c
@@ -2387,7 +2387,7 @@ sem_item_optimizer::read_summary (void)
 {
   size_t len;
   const char *data = lto_get_section_data (file_data,
-			 LTO_section_ipa_icf, NULL, &len);
+			 LTO_section_ipa_icf, NULL, 0, &len);
 
   if (data)
 	read_section (file_data, data, len);
diff --git a/gcc/ipa-prop.c b/gcc/ipa-prop.c
index a23aa2590a0..9b36d96fd2a 100644
--- a/gcc/ipa-prop.c
+++ b/gcc/ipa-prop.c
@@ -4592,7 +4592,9 @@ ipa_prop_read_jump_functions (void)
   while ((file_data = file_data_vec[j++]))
 {
   size_t len;
-  const char *data = lto_get_section_data (file_data, LTO_section_jump_functions, NULL, &len);
+  const char *data = lto_get_s

Re: [PATCH] RISC-V: Fix bad insn splits with paradoxical subregs.

2019-09-18 Thread Kito Cheng

Hi Jakub, Richard:

This commit is fixing wrong code gen for RISC-V, does it OK to
backport to GCC 9 branch?

On Fri, Sep 6, 2019 at 4:34 AM Jim Wilson  wrote:
>
> Shifting by more than the size of a SUBREG_REG doesn't work, so we either
> need to disable splits if an input is paradoxical, or else we need to
> generate a clean temporary for intermediate results.
>
> This was tested with rv32i/newlib and rv64gc/linux cross builds and checks.
> There were no regressions.  The new pr91635.c testcase works with the patch
> and fails without it.  Also, code size for libc and libstdc++ was checked
> and is smaller or equal sized with the patch, ignoring alignment padding.
> The shift-shift-4.c testcase gives better code size with this patch.
> The shift-shift-5.c testcase gave worse code size with a draft version
> of this patch and is OK with the final version of this patch.
>
> Jakub wrote the first version of this patch, so gets primary credit for it.
>
> Committed.
>
> Jim
>
> gcc/
> PR target/91635
> * config/riscv/riscv.md (zero_extendsidi2, zero_extendhi2,
> extend2): Don't split if
> paradoxical_subreg_p (operands[0]).
> (*lshrsi3_zero_extend_3+1, *lshrsi3_zero_extend_3+2): Add clobber and
> use as intermediate value.
>
> gcc/testsuite/
> PR target/91635
> * gcc.c-torture/execute/pr91635.c: New test.
> * gcc.target/riscv/shift-shift-4.c: New test.
> * gcc.target/riscv/shift-shift-5.c: New test.
> ---
>  gcc/config/riscv/riscv.md | 30 +++---
>  gcc/testsuite/gcc.c-torture/execute/pr91635.c | 57 +++
>  .../gcc.target/riscv/shift-shift-4.c  | 13 +
>  .../gcc.target/riscv/shift-shift-5.c  | 16 ++
>  4 files changed, 107 insertions(+), 9 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr91635.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/shift-shift-4.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/shift-shift-5.c
>
> diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
> index 78260fcf6fd..744a027a1b7 100644
> --- a/gcc/config/riscv/riscv.md
> +++ b/gcc/config/riscv/riscv.md
> @@ -1051,7 +1051,9 @@
>"@
> #
> lwu\t%0,%1"
> -  "&& reload_completed && REG_P (operands[1])"
> +  "&& reload_completed
> +   && REG_P (operands[1])
> +   && !paradoxical_subreg_p (operands[0])"
>[(set (match_dup 0)
> (ashift:DI (match_dup 1) (const_int 32)))
> (set (match_dup 0)
> @@ -1068,7 +1070,9 @@
>"@
> #
> lhu\t%0,%1"
> -  "&& reload_completed && REG_P (operands[1])"
> +  "&& reload_completed
> +   && REG_P (operands[1])
> +   && !paradoxical_subreg_p (operands[0])"
>[(set (match_dup 0)
> (ashift:GPR (match_dup 1) (match_dup 2)))
> (set (match_dup 0)
> @@ -1117,7 +1121,9 @@
>"@
> #
> l\t%0,%1"
> -  "&& reload_completed && REG_P (operands[1])"
> +  "&& reload_completed
> +   && REG_P (operands[1])
> +   && !paradoxical_subreg_p (operands[0])"
>[(set (match_dup 0) (ashift:SI (match_dup 1) (match_dup 2)))
> (set (match_dup 0) (ashiftrt:SI (match_dup 0) (match_dup 2)))]
>  {
> @@ -1766,15 +1772,20 @@
>  ;; Handle AND with 2^N-1 for N from 12 to XLEN.  This can be split into
>  ;; two logical shifts.  Otherwise it requires 3 instructions: lui,
>  ;; xor/addi/srli, and.
> +
> +;; Generating a temporary for the shift output gives better combiner results;
> +;; and also fixes a problem where op0 could be a paradoxical reg and shifting
> +;; by amounts larger than the size of the SUBREG_REG doesn't work.
>  (define_split
>[(set (match_operand:GPR 0 "register_operand")
> (and:GPR (match_operand:GPR 1 "register_operand")
> -(match_operand:GPR 2 "p2m1_shift_operand")))]
> +(match_operand:GPR 2 "p2m1_shift_operand")))
> +   (clobber (match_operand:GPR 3 "register_operand"))]
>""
> - [(set (match_dup 0)
> + [(set (match_dup 3)
> (ashift:GPR (match_dup 1) (match_dup 2)))
>(set (match_dup 0)
> -   (lshiftrt:GPR (match_dup 0) (match_dup 2)))]
> +   (lshiftrt:GPR (match_dup 3) (match_dup 2)))]
>  {
>/* Op2 is a VOIDmode constant, so get the mode size from op1.  */
>operands[2] = GEN_INT (GET_MODE_BITSIZE (GET_MODE (operands[1]))
> @@ -1786,12 +1797,13 @@
>  (define_split
>[(set (match_operand:DI 0 "register_operand")
> (and:DI (match_operand:DI 1 "register_operand")
> -   (match_operand:DI 2 "high_mask_shift_operand")))]
> +   (match_operand:DI 2 "high_mask_shift_operand")))
> +   (clobber (match_operand:DI 3 "register_operand"))]
>"TARGET_64BIT"
> -  [(set (match_dup 0)
> +  [(set (match_dup 3)
> (lshiftrt:DI (match_dup 1) (match_dup 2)))
> (set (match_dup 0)
> -   (ashift:DI (match_dup 0) (match_dup 2)))]
> +   (ashift:DI (match_dup 3) (match_dup 2)))]
>  {
>operands[2] = GEN_INT (ctz_hwi (INTVAL (operands[2])));
>  })
> di

Re: [PATCH] RISC-V: Fix bad insn splits with paradoxical subregs.

2019-09-18 Thread Richard Biener

On Wed, 18 Sep 2019, Kito Cheng wrote:

> Hi Jakub, Richard:
> 
> This commit is fixing wrong code gen for RISC-V, does it OK to
> backport to GCC 9 branch?

Since it is target specific and for non-primary/secondary targets
it's the RISC-V maintainers call whether to allow backporting this.
Generally wrong-code issues can be backported even if they are not
regressions if the chance they introduce other issues is low.

Richard.

> On Fri, Sep 6, 2019 at 4:34 AM Jim Wilson  wrote:
> >
> > Shifting by more than the size of a SUBREG_REG doesn't work, so we either
> > need to disable splits if an input is paradoxical, or else we need to
> > generate a clean temporary for intermediate results.
> >
> > This was tested with rv32i/newlib and rv64gc/linux cross builds and checks.
> > There were no regressions.  The new pr91635.c testcase works with the patch
> > and fails without it.  Also, code size for libc and libstdc++ was checked
> > and is smaller or equal sized with the patch, ignoring alignment padding.
> > The shift-shift-4.c testcase gives better code size with this patch.
> > The shift-shift-5.c testcase gave worse code size with a draft version
> > of this patch and is OK with the final version of this patch.
> >
> > Jakub wrote the first version of this patch, so gets primary credit for it.
> >
> > Committed.
> >
> > Jim
> >
> > gcc/
> > PR target/91635
> > * config/riscv/riscv.md (zero_extendsidi2, zero_extendhi2,
> > extend2): Don't split if
> > paradoxical_subreg_p (operands[0]).
> > (*lshrsi3_zero_extend_3+1, *lshrsi3_zero_extend_3+2): Add clobber 
> > and
> > use as intermediate value.
> >
> > gcc/testsuite/
> > PR target/91635
> > * gcc.c-torture/execute/pr91635.c: New test.
> > * gcc.target/riscv/shift-shift-4.c: New test.
> > * gcc.target/riscv/shift-shift-5.c: New test.
> > ---
> >  gcc/config/riscv/riscv.md | 30 +++---
> >  gcc/testsuite/gcc.c-torture/execute/pr91635.c | 57 +++
> >  .../gcc.target/riscv/shift-shift-4.c  | 13 +
> >  .../gcc.target/riscv/shift-shift-5.c  | 16 ++
> >  4 files changed, 107 insertions(+), 9 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr91635.c
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/shift-shift-4.c
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/shift-shift-5.c
> >
> > diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
> > index 78260fcf6fd..744a027a1b7 100644
> > --- a/gcc/config/riscv/riscv.md
> > +++ b/gcc/config/riscv/riscv.md
> > @@ -1051,7 +1051,9 @@
> >"@
> > #
> > lwu\t%0,%1"
> > -  "&& reload_completed && REG_P (operands[1])"
> > +  "&& reload_completed
> > +   && REG_P (operands[1])
> > +   && !paradoxical_subreg_p (operands[0])"
> >[(set (match_dup 0)
> > (ashift:DI (match_dup 1) (const_int 32)))
> > (set (match_dup 0)
> > @@ -1068,7 +1070,9 @@
> >"@
> > #
> > lhu\t%0,%1"
> > -  "&& reload_completed && REG_P (operands[1])"
> > +  "&& reload_completed
> > +   && REG_P (operands[1])
> > +   && !paradoxical_subreg_p (operands[0])"
> >[(set (match_dup 0)
> > (ashift:GPR (match_dup 1) (match_dup 2)))
> > (set (match_dup 0)
> > @@ -1117,7 +1121,9 @@
> >"@
> > #
> > l\t%0,%1"
> > -  "&& reload_completed && REG_P (operands[1])"
> > +  "&& reload_completed
> > +   && REG_P (operands[1])
> > +   && !paradoxical_subreg_p (operands[0])"
> >[(set (match_dup 0) (ashift:SI (match_dup 1) (match_dup 2)))
> > (set (match_dup 0) (ashiftrt:SI (match_dup 0) (match_dup 2)))]
> >  {
> > @@ -1766,15 +1772,20 @@
> >  ;; Handle AND with 2^N-1 for N from 12 to XLEN.  This can be split into
> >  ;; two logical shifts.  Otherwise it requires 3 instructions: lui,
> >  ;; xor/addi/srli, and.
> > +
> > +;; Generating a temporary for the shift output gives better combiner 
> > results;
> > +;; and also fixes a problem where op0 could be a paradoxical reg and 
> > shifting
> > +;; by amounts larger than the size of the SUBREG_REG doesn't work.
> >  (define_split
> >[(set (match_operand:GPR 0 "register_operand")
> > (and:GPR (match_operand:GPR 1 "register_operand")
> > -(match_operand:GPR 2 "p2m1_shift_operand")))]
> > +(match_operand:GPR 2 "p2m1_shift_operand")))
> > +   (clobber (match_operand:GPR 3 "register_operand"))]
> >""
> > - [(set (match_dup 0)
> > + [(set (match_dup 3)
> > (ashift:GPR (match_dup 1) (match_dup 2)))
> >(set (match_dup 0)
> > -   (lshiftrt:GPR (match_dup 0) (match_dup 2)))]
> > +   (lshiftrt:GPR (match_dup 3) (match_dup 2)))]
> >  {
> >/* Op2 is a VOIDmode constant, so get the mode size from op1.  */
> >operands[2] = GEN_INT (GET_MODE_BITSIZE (GET_MODE (operands[1]))
> > @@ -1786,12 +1797,13 @@
> >  (define_split
> >[(set (match_operand:DI 0 "register_operand")
> > (and:DI (match_

Re: [PATCH] RISC-V: Fix bad insn splits with paradoxical subregs.

2019-09-18 Thread Kito Cheng

Hi Richard:

Got it, thanks :)

On Wed, Sep 18, 2019 at 6:25 PM Richard Biener  wrote:
>
> On Wed, 18 Sep 2019, Kito Cheng wrote:
>
> > Hi Jakub, Richard:
> >
> > This commit is fixing wrong code gen for RISC-V, does it OK to
> > backport to GCC 9 branch?
>
> Since it is target specific and for non-primary/secondary targets
> it's the RISC-V maintainers call whether to allow backporting this.
> Generally wrong-code issues can be backported even if they are not
> regressions if the chance they introduce other issues is low.
>
> Richard.
>
> > On Fri, Sep 6, 2019 at 4:34 AM Jim Wilson  wrote:
> > >
> > > Shifting by more than the size of a SUBREG_REG doesn't work, so we either
> > > need to disable splits if an input is paradoxical, or else we need to
> > > generate a clean temporary for intermediate results.
> > >
> > > This was tested with rv32i/newlib and rv64gc/linux cross builds and 
> > > checks.
> > > There were no regressions.  The new pr91635.c testcase works with the 
> > > patch
> > > and fails without it.  Also, code size for libc and libstdc++ was checked
> > > and is smaller or equal sized with the patch, ignoring alignment padding.
> > > The shift-shift-4.c testcase gives better code size with this patch.
> > > The shift-shift-5.c testcase gave worse code size with a draft version
> > > of this patch and is OK with the final version of this patch.
> > >
> > > Jakub wrote the first version of this patch, so gets primary credit for 
> > > it.
> > >
> > > Committed.
> > >
> > > Jim
> > >
> > > gcc/
> > > PR target/91635
> > > * config/riscv/riscv.md (zero_extendsidi2, 
> > > zero_extendhi2,
> > > extend2): Don't split if
> > > paradoxical_subreg_p (operands[0]).
> > > (*lshrsi3_zero_extend_3+1, *lshrsi3_zero_extend_3+2): Add clobber 
> > > and
> > > use as intermediate value.
> > >
> > > gcc/testsuite/
> > > PR target/91635
> > > * gcc.c-torture/execute/pr91635.c: New test.
> > > * gcc.target/riscv/shift-shift-4.c: New test.
> > > * gcc.target/riscv/shift-shift-5.c: New test.
> > > ---
> > >  gcc/config/riscv/riscv.md | 30 +++---
> > >  gcc/testsuite/gcc.c-torture/execute/pr91635.c | 57 +++
> > >  .../gcc.target/riscv/shift-shift-4.c  | 13 +
> > >  .../gcc.target/riscv/shift-shift-5.c  | 16 ++
> > >  4 files changed, 107 insertions(+), 9 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr91635.c
> > >  create mode 100644 gcc/testsuite/gcc.target/riscv/shift-shift-4.c
> > >  create mode 100644 gcc/testsuite/gcc.target/riscv/shift-shift-5.c
> > >
> > > diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
> > > index 78260fcf6fd..744a027a1b7 100644
> > > --- a/gcc/config/riscv/riscv.md
> > > +++ b/gcc/config/riscv/riscv.md
> > > @@ -1051,7 +1051,9 @@
> > >"@
> > > #
> > > lwu\t%0,%1"
> > > -  "&& reload_completed && REG_P (operands[1])"
> > > +  "&& reload_completed
> > > +   && REG_P (operands[1])
> > > +   && !paradoxical_subreg_p (operands[0])"
> > >[(set (match_dup 0)
> > > (ashift:DI (match_dup 1) (const_int 32)))
> > > (set (match_dup 0)
> > > @@ -1068,7 +1070,9 @@
> > >"@
> > > #
> > > lhu\t%0,%1"
> > > -  "&& reload_completed && REG_P (operands[1])"
> > > +  "&& reload_completed
> > > +   && REG_P (operands[1])
> > > +   && !paradoxical_subreg_p (operands[0])"
> > >[(set (match_dup 0)
> > > (ashift:GPR (match_dup 1) (match_dup 2)))
> > > (set (match_dup 0)
> > > @@ -1117,7 +1121,9 @@
> > >"@
> > > #
> > > l\t%0,%1"
> > > -  "&& reload_completed && REG_P (operands[1])"
> > > +  "&& reload_completed
> > > +   && REG_P (operands[1])
> > > +   && !paradoxical_subreg_p (operands[0])"
> > >[(set (match_dup 0) (ashift:SI (match_dup 1) (match_dup 2)))
> > > (set (match_dup 0) (ashiftrt:SI (match_dup 0) (match_dup 2)))]
> > >  {
> > > @@ -1766,15 +1772,20 @@
> > >  ;; Handle AND with 2^N-1 for N from 12 to XLEN.  This can be split into
> > >  ;; two logical shifts.  Otherwise it requires 3 instructions: lui,
> > >  ;; xor/addi/srli, and.
> > > +
> > > +;; Generating a temporary for the shift output gives better combiner 
> > > results;
> > > +;; and also fixes a problem where op0 could be a paradoxical reg and 
> > > shifting
> > > +;; by amounts larger than the size of the SUBREG_REG doesn't work.
> > >  (define_split
> > >[(set (match_operand:GPR 0 "register_operand")
> > > (and:GPR (match_operand:GPR 1 "register_operand")
> > > -(match_operand:GPR 2 "p2m1_shift_operand")))]
> > > +(match_operand:GPR 2 "p2m1_shift_operand")))
> > > +   (clobber (match_operand:GPR 3 "register_operand"))]
> > >""
> > > - [(set (match_dup 0)
> > > + [(set (match_dup 3)
> > > (ashift:GPR (match_dup 1) (match_dup 2)))
> > >(set (match_dup 0)
> > > -   (lshiftrt:GPR (match_dup 0) (match_

Re: [PATCH 1/2][vect]PR 88915: Vectorize epilogues when versioning loops

2019-09-18 Thread Andre Vieira (lists)


Hi Richard,

As I mentioned in the IRC channel, this is my current work in progress 
patch. It currently ICE's when vectorizing 
gcc/testsuite/gcc.c-torture/execute/nestfunc-2.c with '-O3' and '--param 
vect-epilogues-nomask=1'.


It ICE's because the epilogue loop (after if conversion) and main loop 
(before vectorization) are not the same, there are a bunch of extra BBs 
and the epilogue loop seems to need some cleaning up too.


Let me know if you see a way around this issue.

Cheers,
Andre
diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
index 0b0154ffd7bf031a005de993b101d9db6dd98c43..d01512ea46467f1cf77793bdc75b48e71b0b9641 100644
--- a/gcc/cfgloop.h
+++ b/gcc/cfgloop.h
@@ -21,6 +21,7 @@ along with GCC; see the file COPYING3.  If not see
 #define GCC_CFGLOOP_H
 
 #include "cfgloopmanip.h"
+#include "target.h"
 
 /* Structure to hold decision about unrolling/peeling.  */
 enum lpt_dec
@@ -268,6 +269,9 @@ public:
  the basic-block from being collected but its index can still be
  reused.  */
   basic_block former_header;
+
+  /* Keep track of vector sizes we know we can vectorize the epilogue with.  */
+  vector_sizes epilogue_vsizes;
 };
 
 /* Set if the loop is known to be infinite.  */
diff --git a/gcc/cfgloop.c b/gcc/cfgloop.c
index 4ad1f658708f83dbd8789666c26d4bd056837bc6..f3e81bcd00b3f125389aa15b12dc5201b3578d20 100644
--- a/gcc/cfgloop.c
+++ b/gcc/cfgloop.c
@@ -198,6 +198,7 @@ flow_loop_free (class loop *loop)
   exit->prev = exit;
 }
 
+  loop->epilogue_vsizes.release();
   ggc_free (loop->exits);
   ggc_free (loop);
 }
@@ -355,6 +356,7 @@ alloc_loop (void)
   loop->nb_iterations_upper_bound = 0;
   loop->nb_iterations_likely_upper_bound = 0;
   loop->nb_iterations_estimate = 0;
+  loop->epilogue_vsizes.create(8);
   return loop;
 }
 
diff --git a/gcc/gengtype.c b/gcc/gengtype.c
index 53317337cf8c8e8caefd6b819d28b3bba301e755..80fb6ef71465b24e034fa45d69fec56be6b2e7f8 100644
--- a/gcc/gengtype.c
+++ b/gcc/gengtype.c
@@ -5197,6 +5197,7 @@ main (int argc, char **argv)
   POS_HERE (do_scalar_typedef ("widest_int", &pos));
   POS_HERE (do_scalar_typedef ("int64_t", &pos));
   POS_HERE (do_scalar_typedef ("poly_int64", &pos));
+  POS_HERE (do_scalar_typedef ("poly_uint64", &pos));
   POS_HERE (do_scalar_typedef ("uint64_t", &pos));
   POS_HERE (do_scalar_typedef ("uint8", &pos));
   POS_HERE (do_scalar_typedef ("uintptr_t", &pos));
@@ -5206,6 +5207,7 @@ main (int argc, char **argv)
   POS_HERE (do_scalar_typedef ("machine_mode", &pos));
   POS_HERE (do_scalar_typedef ("fixed_size_mode", &pos));
   POS_HERE (do_scalar_typedef ("CONSTEXPR", &pos));
+  POS_HERE (do_scalar_typedef ("vector_sizes", &pos));
   POS_HERE (do_typedef ("PTR", 
 			create_pointer (resolve_typedef ("void", &pos)),
 			&pos));
diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
index 5c25441c70a271f04730486e513437fffa75b7e3..b1c13dafdeeec8d95f00c232822d3ab9b11f4046 100644
--- a/gcc/tree-vect-loop-manip.c
+++ b/gcc/tree-vect-loop-manip.c
@@ -26,6 +26,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree.h"
 #include "gimple.h"
 #include "cfghooks.h"
+#include "tree-if-conv.h"
 #include "tree-pass.h"
 #include "ssa.h"
 #include "fold-const.h"
@@ -1730,6 +1731,7 @@ vect_update_inits_of_drs (loop_vec_info loop_vinfo, tree niters,
 {
   unsigned int i;
   vec datarefs = LOOP_VINFO_DATAREFS (loop_vinfo);
+  vec datarefs_copy = loop_vinfo->shared->datarefs_copy;
   struct data_reference *dr;
 
   DUMP_VECT_SCOPE ("vect_update_inits_of_dr");
@@ -1756,6 +1758,12 @@ vect_update_inits_of_drs (loop_vec_info loop_vinfo, tree niters,
   if (!STMT_VINFO_GATHER_SCATTER_P (dr_info->stmt))
 	vect_update_init_of_dr (dr, niters, code);
 }
+  FOR_EACH_VEC_ELT (datarefs_copy, i, dr)
+{
+  dr_vec_info *dr_info = loop_vinfo->lookup_dr (dr);
+  if (!STMT_VINFO_GATHER_SCATTER_P (dr_info->stmt))
+	vect_update_init_of_dr (dr, niters, code);
+}
 }
 
 /* For the information recorded in LOOP_VINFO prepare the loop for peeling
@@ -2409,6 +2417,8 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
   profile_probability prob_prolog, prob_vector, prob_epilog;
   int estimated_vf;
   int prolog_peeling = 0;
+  bool vect_epilogues
+= loop_vinfo->epilogue_vinfos.length () > 0;
   /* We currently do not support prolog peeling if the target alignment is not
  known at compile time.  'vect_gen_prolog_loop_niters' depends on the
  target alignment being constant.  */
@@ -2469,12 +2479,12 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
   /* Prolog loop may be skipped.  */
   bool skip_prolog = (prolog_peeling != 0);
   /* Skip to epilog if scalar loop may be preferred.  It's only needed
- when we peel for epilog loop and when it hasn't been checked with
- loop versioning.  */
+ when we peel for epilog loop or when we loop version.  */
   bool skip_vector = (LOOP_VINFO_NITERS_KN

Fix conversions for built-in operator overloading candidates.

2019-09-18 Thread Feng Xue OS

Hi, Jason,

  With this patch, gcc-aarch64 build will encounter an error as the following. 
Not sure it is a bug, would you please check this? 

../../gcc/expmed.c:5602:19: error: ‘int_mode’ may be used uninitialized in this 
function [-Werror=maybe-uninitialized]
 5602 |   scalar_int_mode int_mode;

Thanks,
Feng

[PATCH] Disentangle autopar and vect reduction detection

2019-09-18 Thread Richard Biener



I'm planning on major refactoring work on the vectorizer reduction
handling and as usual parloops use of the machinery stands in the
way.  The following deals with this pre refactoring by simply
duplicating all the code into tree-parloops.c.  I guess a lot
of the code could be stripped or simplified but that's not my
task right now.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

>From 9d5260a2a1ebf97308233e03d6505629db343244 Mon Sep 17 00:00:00 2001
From: Richard Guenther 
Date: Wed, 18 Sep 2019 13:28:56 +0200
Subject: [PATCH] duplicate-vect-reduction-to-parloops

* tree-parloops.c (report_ploop_op): Copy from report_vect_op.
(parloops_valid_reduction_input_p): Copy from
valid_reduction_input_p.
(parloops_is_slp_reduction): Copy from vect_is_slp_reduction.
(parloops_needs_fold_left_reduction_p): Copy from
needs_fold_left_reduction_p.
(parloops_is_simple_reduction): Copy from
vect_is_simple_reduction.
(parloops_force_simple_reduction): Copy from
vect_force_simple_reduction.
(gather_scalar_reductions): Adjust.
* tree-vect-loop.c (vect_force_simple_reduction): Make static.
* tree-vectorizer.h (vect_force_simple_reduction): Remove.

diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index f5cb411f087..b6bb49b2fa8 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -88,7 +88,8 @@ along with GCC; see the file COPYING3.  If not see
More info can also be found at http://gcc.gnu.org/wiki/AutoParInGCC  */
 /*
   Reduction handling:
-  currently we use vect_force_simple_reduction() to detect reduction patterns.
+  currently we use code inspired by vect_force_simple_reduction to detect
+  reduction patterns.
   The code transformation will be introduced by an example.
 
 
@@ -182,6 +183,717 @@ parloop
 
 */
 
+/* Error reporting helper for parloops_is_simple_reduction below.  GIMPLE
+   statement STMT is printed with a message MSG. */
+
+static void
+report_ploop_op (dump_flags_t msg_type, gimple *stmt, const char *msg)
+{
+  dump_printf_loc (msg_type, vect_location, "%s%G", msg, stmt);
+}
+
+/* DEF_STMT_INFO occurs in a loop that contains a potential reduction
+   operation.  Return true if the results of DEF_STMT_INFO are something
+   that can be accumulated by such a reduction.  */
+
+static bool
+parloops_valid_reduction_input_p (stmt_vec_info def_stmt_info)
+{
+  return (is_gimple_assign (def_stmt_info->stmt)
+ || is_gimple_call (def_stmt_info->stmt)
+ || STMT_VINFO_DEF_TYPE (def_stmt_info) == vect_induction_def
+ || (gimple_code (def_stmt_info->stmt) == GIMPLE_PHI
+ && STMT_VINFO_DEF_TYPE (def_stmt_info) == vect_internal_def
+ && !is_loop_header_bb_p (gimple_bb (def_stmt_info->stmt;
+}
+
+/* Detect SLP reduction of the form:
+
+   #a1 = phi 
+   a2 = operation (a1)
+   a3 = operation (a2)
+   a4 = operation (a3)
+   a5 = operation (a4)
+
+   #a = phi 
+
+   PHI is the reduction phi node (#a1 = phi  above)
+   FIRST_STMT is the first reduction stmt in the chain
+   (a2 = operation (a1)).
+
+   Return TRUE if a reduction chain was detected.  */
+
+static bool
+parloops_is_slp_reduction (loop_vec_info loop_info, gimple *phi,
+  gimple *first_stmt)
+{
+  class loop *loop = (gimple_bb (phi))->loop_father;
+  class loop *vect_loop = LOOP_VINFO_LOOP (loop_info);
+  enum tree_code code;
+  gimple *loop_use_stmt = NULL;
+  stmt_vec_info use_stmt_info;
+  tree lhs;
+  imm_use_iterator imm_iter;
+  use_operand_p use_p;
+  int nloop_uses, size = 0, n_out_of_loop_uses;
+  bool found = false;
+
+  if (loop != vect_loop)
+return false;
+
+  auto_vec reduc_chain;
+  lhs = PHI_RESULT (phi);
+  code = gimple_assign_rhs_code (first_stmt);
+  while (1)
+{
+  nloop_uses = 0;
+  n_out_of_loop_uses = 0;
+  FOR_EACH_IMM_USE_FAST (use_p, imm_iter, lhs)
+{
+ gimple *use_stmt = USE_STMT (use_p);
+ if (is_gimple_debug (use_stmt))
+   continue;
+
+  /* Check if we got back to the reduction phi.  */
+ if (use_stmt == phi)
+{
+ loop_use_stmt = use_stmt;
+  found = true;
+  break;
+}
+
+  if (flow_bb_inside_loop_p (loop, gimple_bb (use_stmt)))
+{
+ loop_use_stmt = use_stmt;
+ nloop_uses++;
+}
+   else
+ n_out_of_loop_uses++;
+
+   /* There are can be either a single use in the loop or two uses in
+  phi nodes.  */
+   if (nloop_uses > 1 || (n_out_of_loop_uses && nloop_uses))
+ return false;
+}
+
+  if (found)
+break;
+
+  /* We reached a statement with no loop uses.  */
+  if (nloop_uses == 0)
+   return false;
+
+  /* This is a loop exit phi, and we haven't reached the reduction phi.  */
+  if (gimple_code (loop_use_stmt) == GIMPLE_PHI)
+

[Patch,committed][OG9] Fix dg-warning line numbers in libgomp

2019-09-18 Thread Tobias Burnus

This fixes the line numbers of dg-warnings (cf. 2019-07-31 commit 
fcea4b6e384e30231ab6d88b1f9feb1007b3e96b).


Committed to the openacc-gcc-9-branch.

Tobias

commit 0f2a4240229e97fdbcd3c8299642ed4b85f5b347
Author: Tobias Burnus 
Date:   Wed Sep 18 13:45:34 2019 +0200

libgomp - fix dg-warning line numbers

libgomp/
* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Fix dg-warning
line numbers.
* testsuite/libgomp.oacc-c-c++-common/serial-dims.c: Likewise.

diff --git a/libgomp/ChangeLog.openacc b/libgomp/ChangeLog.openacc
index db7f2a43b80..943a9e4a933 100644
--- a/libgomp/ChangeLog.openacc
+++ b/libgomp/ChangeLog.openacc
@@ -1,3 +1,9 @@
+2019-09-18  Tobias Burnus  
+
+	* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Fix dg-warning
+	line numbers.
+	* testsuite/libgomp.oacc-c-c++-common/serial-dims.c: Likewise.
+
 2019-09-18  Tobias Burnus  
 
 	* linux/gomp_print.c (gomp_print_integer): Use PRId64 if available,
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c
index d9f2c75e868..2c14f9c545a 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c
@@ -158,7 +158,7 @@ int main ()
 gangs_min = workers_min = vectors_min = INT_MAX;
 gangs_max = workers_max = vectors_max = INT_MIN;
 #pragma acc parallel copy (vectors_actual) /* { dg-warning "region contains vector partitioned code but is not vector partitioned" } */ \
-  /* { dg-warning "using vector_length \\(32\\), ignoring 1" "" { target openacc_nvidia_accel_selected } .-1 } */ \
+  /* { dg-warning "using vector_length \\(32\\), ignoring 1" "" { target openacc_nvidia_accel_selected } 160 } */ \
   vector_length (VECTORS) /* { dg-warning "'vector_length' value must be positive" "" { target c++ } } */
 {
   /* We're actually executing with vector_length (1), just the GCC nvptx
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/serial-dims.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/serial-dims.c
index fd4b17c40c2..3895405b2cf 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/serial-dims.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/serial-dims.c
@@ -59,10 +59,10 @@ int main ()
 gangs_max = workers_max = vectors_max = INT_MIN;
 gangs_actual = workers_actual = vectors_actual = 1;
 #pragma acc serial
-/* { dg-warning "region contains gang partitioned code but is not gang partitioned" "" { target *-*-* } .-1 } */
-/* { dg-warning "region contains worker partitioned code but is not worker partitioned" "" { target *-*-* } .-2 } */
-/* { dg-warning "region contains vector partitioned code but is not vector partitioned" "" { target *-*-* } .-3 } */
-/* { dg-warning "using vector_length \\(32\\), ignoring 1" "" { target openacc_nvidia_accel_selected } .-4 } */
+/* { dg-warning "region contains gang partitioned code but is not gang partitioned" "" { target *-*-* } 61 } */
+/* { dg-warning "region contains worker partitioned code but is not worker partitioned" "" { target *-*-* } 61 } */
+/* { dg-warning "region contains vector partitioned code but is not vector partitioned" "" { target *-*-* } 61 } */
+/* { dg-warning "using vector_length \\(32\\), ignoring 1" "" { target openacc_nvidia_accel_selected } 61 } */
 {
   if (acc_on_device (acc_device_nvidia))
 	{

[PATCH V4] Generalized predicate/condition for parameter reference in IPA (PR ipa/91088)

2019-09-18 Thread Feng Xue OS

>> +  if (unmodified_parm_or_parm_agg_item (fbi, stmt, expr, index_p, &size,
>> + aggpos))
>> + {
>> +  tree type = TREE_TYPE (expr);
>> +
>> +   /* Stop if found bit-field whose size does not equal any natural
>> +  integral type.  */
>> +   if (maybe_ne (tree_to_poly_int64 (TYPE_SIZE (type)), size))
>> + break;

> Why is this needed?
This is to exclude bit-field reference. Actually, the restrict is not required, 
a TODO to remove it. Now I directly check bit-field attribute.

>> - = add_condition (summary, index, size, &aggpos, this_code,
>> -  unshare_expr_without_location
>> -  (gimple_cond_rhs (last)));
>> + = add_condition (summary, index, type, &aggpos, this_code,
>> +  gimple_cond_rhs (last), param_ops);

> An why unsharing is no longer needed here?
> It is importnat to avoid anything which reffers to function local blocks
> to get into the global LTO stream.
Do unsharing inside add_condition, since constants in param_ops also need to be 
unshared.

>> +   if (op1.code != op2.code
>> +   || op1.val_is_rhs != op2.val_is_rhs
>> +   || !vrp_operand_equal_p (op1.val, op2.val)

> Why you need vrp_operand_equal_p instead of operand_equal_p here?
op1.val and op2.val might be NULL_TREE.

> Overall the patch looks very reasonable. We may end up with bit more
> general expressions (i.e. supporting arithmetics involving more than one
> operand).
If you means more than one constant operand, I'v changed the patch to support 
ternary operation.

And if you means more than one parameter operand, this will involve much more 
code change in ipa-fnsummary, it's better to let it be another TODO.

> I see you also fixed a lot of typos in comments, please go head and
> commit them separately as obvious.
Removed.

Thank for your comments.
Feng
---
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 0e3693598e7..05b1bb97c6b 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -11943,6 +11943,13 @@ For switch exceeding this limit, IPA-CP will not 
construct cloning cost
 predicate, which is used to estimate cloning benefit, for default case
 of the switch statement.
 
+@item ipa-max-param-expr-ops
+IPA-CP will analyze conditional statement that references some function
+parameter to estimate benefit for cloning upon certain constant value.
+But if number of operations in a parameter expression exceeds
+@option{ipa-max-param-expr-ops}, the expression is treated as complicated
+one, and is not handled by IPA analysis.
+
 @item lto-partitions
 Specify desired number of partitions produced during WHOPR compilation.
 The number of partitions should exceed the number of CPUs used for compilation.
diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c
index 1bf1806eaf8..c25e3395f59 100644
--- a/gcc/ipa-fnsummary.c
+++ b/gcc/ipa-fnsummary.c
@@ -331,6 +331,8 @@ evaluate_conditions_for_known_args (struct cgraph_node 
*node,
 {
   tree val;
   tree res;
+  int j;
+  struct expr_eval_op *op;
 
   /* We allow call stmt to have fewer arguments than the callee function
  (especially for K&R style programs).  So bound check here (we assume
@@ -382,7 +384,7 @@ evaluate_conditions_for_known_args (struct cgraph_node 
*node,
  continue;
}
 
-  if (maybe_ne (tree_to_poly_int64 (TYPE_SIZE (TREE_TYPE (val))), c->size))
+  if (TYPE_SIZE (c->type) != TYPE_SIZE (TREE_TYPE (val)))
{
  clause |= 1 << (i + predicate::first_dynamic_condition);
  nonspec_clause |= 1 << (i + predicate::first_dynamic_condition);
@@ -394,7 +396,30 @@ evaluate_conditions_for_known_args (struct cgraph_node 
*node,
  continue;
}
 
-  val = fold_unary (VIEW_CONVERT_EXPR, TREE_TYPE (c->val), val);
+  val = fold_unary (VIEW_CONVERT_EXPR, c->type, val);
+  for (j = 0; vec_safe_iterate (c->param_ops, j, &op); j++)
+   {
+ if (!val)
+   break;
+ if (!op->val[0])
+   val = fold_unary (op->code, op->type, val);
+ else if (!op->val[1])
+   val = fold_binary (op->code, op->type,
+  op->index ? op->val[0] : val,
+  op->index ? val : op->val[0]);
+ else if (op->index == 0)
+   val = fold_ternary (op->code, op->type,
+   val, op->val[0], op->val[1]);
+ else if (op->index == 1)
+   val = fold_ternary (op->code, op->type,
+   op->val[0], val, op->val[1]);
+ else if (op->index == 2)
+   val = fold_ternary (op->code, op->type,
+   op->val[0], op->val[1], val);
+ else
+   val = NULL_TREE;
+   }
+
   res = val
? fold_binary_to_constant (c->code, boolean_type_node, val, c->val)
: NULL;
@@ -11

[PATCH] Remove dead code

2019-09-18 Thread Richard Biener



This removes dead code from the vectorizer and makes a function static.

Bootstrapped / tested on x86_64-unknown-linux-gnu, applied.

Richard.

2019-09-18  Richard Biener  

* tree-vectorizer.h (get_initial_def_for_reduction): Remove.
* tree-vect-loop.c (get_initial_def_for_reduction): Make
static.
(vect_create_epilog_for_reduction): Remove dead code.

Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c(revision 275873)
+++ gcc/tree-vect-loop.c(working copy)
@@ -4203,7 +4203,7 @@ vect_model_induction_cost (stmt_vec_info
 
A cost model should help decide between these two schemes.  */
 
-tree
+static tree
 get_initial_def_for_reduction (stmt_vec_info stmt_vinfo, tree init_val,
tree *adjustment_def)
 {
@@ -4585,7 +4585,6 @@ vect_create_epilog_for_reduction (vec

Re: [PATCH, AArch64 v4 5/6] aarch64: Implement -moutline-atomics

2019-09-18 Thread Kyrill Tkachov




On 9/18/19 2:58 AM, Richard Henderson wrote:

* config/aarch64/aarch64.opt (-moutline-atomics): New.
* config/aarch64/aarch64.c (aarch64_atomic_ool_func): New.
(aarch64_ool_cas_names, aarch64_ool_swp_names): New.
(aarch64_ool_ldadd_names, aarch64_ool_ldset_names): New.
(aarch64_ool_ldclr_names, aarch64_ool_ldeor_names): New.
(aarch64_expand_compare_and_swap): Honor TARGET_OUTLINE_ATOMICS.
* config/aarch64/atomics.md (atomic_exchange): Likewise.
(atomic_): Likewise.
(atomic_fetch_): Likewise.
(atomic__fetch): Likewise.
testsuite/
* gcc.target/aarch64/atomic-op-acq_rel.c: Use -mno-outline-atomics.
* gcc.target/aarch64/atomic-comp-swap-release-acquire.c: Likewise.
* gcc.target/aarch64/atomic-op-acquire.c: Likewise.
* gcc.target/aarch64/atomic-op-char.c: Likewise.
* gcc.target/aarch64/atomic-op-consume.c: Likewise.
* gcc.target/aarch64/atomic-op-imm.c: Likewise.
* gcc.target/aarch64/atomic-op-int.c: Likewise.
* gcc.target/aarch64/atomic-op-long.c: Likewise.
* gcc.target/aarch64/atomic-op-relaxed.c: Likewise.
* gcc.target/aarch64/atomic-op-release.c: Likewise.
* gcc.target/aarch64/atomic-op-seq_cst.c: Likewise.
* gcc.target/aarch64/atomic-op-short.c: Likewise.
* gcc.target/aarch64/atomic_cmp_exchange_zero_reg_1.c: Likewise.
* gcc.target/aarch64/atomic_cmp_exchange_zero_strong_1.c: Likewise.
* gcc.target/aarch64/sync-comp-swap.c: Likewise.
* gcc.target/aarch64/sync-op-acquire.c: Likewise.
* gcc.target/aarch64/sync-op-full.c: Likewise.
---
  gcc/config/aarch64/aarch64-protos.h   | 13 +++
  gcc/config/aarch64/aarch64.c  | 87 +
  .../atomic-comp-swap-release-acquire.c|  2 +-
  .../gcc.target/aarch64/atomic-op-acq_rel.c|  2 +-
  .../gcc.target/aarch64/atomic-op-acquire.c|  2 +-
  .../gcc.target/aarch64/atomic-op-char.c   |  2 +-
  .../gcc.target/aarch64/atomic-op-consume.c|  2 +-
  .../gcc.target/aarch64/atomic-op-imm.c|  2 +-
  .../gcc.target/aarch64/atomic-op-int.c|  2 +-
  .../gcc.target/aarch64/atomic-op-long.c   |  2 +-
  .../gcc.target/aarch64/atomic-op-relaxed.c|  2 +-
  .../gcc.target/aarch64/atomic-op-release.c|  2 +-
  .../gcc.target/aarch64/atomic-op-seq_cst.c|  2 +-
  .../gcc.target/aarch64/atomic-op-short.c  |  2 +-
  .../aarch64/atomic_cmp_exchange_zero_reg_1.c  |  2 +-
  .../atomic_cmp_exchange_zero_strong_1.c   |  2 +-
  .../gcc.target/aarch64/sync-comp-swap.c   |  2 +-
  .../gcc.target/aarch64/sync-op-acquire.c  |  2 +-
  .../gcc.target/aarch64/sync-op-full.c |  2 +-
  gcc/config/aarch64/aarch64.opt|  3 +
  gcc/config/aarch64/atomics.md | 94 +--
  gcc/doc/invoke.texi   | 16 +++-
  22 files changed, 221 insertions(+), 26 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index c4b73d26df6..1c1aac7201a 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -696,4 +696,17 @@ poly_uint64 aarch64_regmode_natural_size (machine_mode);
  
  bool aarch64_high_bits_all_ones_p (HOST_WIDE_INT);
  
+struct atomic_ool_names

+{
+const char *str[5][4];
+};
+
+rtx aarch64_atomic_ool_func(machine_mode mode, rtx model_rtx,
+   const atomic_ool_names *names);
+extern const atomic_ool_names aarch64_ool_swp_names;
+extern const atomic_ool_names aarch64_ool_ldadd_names;
+extern const atomic_ool_names aarch64_ool_ldset_names;
+extern const atomic_ool_names aarch64_ool_ldclr_names;
+extern const atomic_ool_names aarch64_ool_ldeor_names;
+
  #endif /* GCC_AARCH64_PROTOS_H */
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index b937514e6f8..56a4a47db73 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -16867,6 +16867,82 @@ aarch64_emit_unlikely_jump (rtx insn)
add_reg_br_prob_note (jump, profile_probability::very_unlikely ());
  }
  
+/* We store the names of the various atomic helpers in a 5x4 array.

+   Return the libcall function given MODE, MODEL and NAMES.  */
+
+rtx
+aarch64_atomic_ool_func(machine_mode mode, rtx model_rtx,
+   const atomic_ool_names *names)
+{
+  memmodel model = memmodel_base (INTVAL (model_rtx));
+  int mode_idx, model_idx;
+
+  switch (mode)
+{
+case E_QImode:
+  mode_idx = 0;
+  break;
+case E_HImode:
+  mode_idx = 1;
+  break;
+case E_SImode:
+  mode_idx = 2;
+  break;
+case E_DImode:
+  mode_idx = 3;
+  break;
+case E_TImode:
+  mode_idx = 4;
+  break;
+default:
+  gcc_unreachable ();
+}
+
+  switch (model)
+{
+case MEMMODEL_RELAXED:
+  model_idx = 0;
+  break;
+case MEMMODEL_CONSUME:
+case MEMMODEL

Re: [PATCH, AArch64 v4 4/6] aarch64: Add out-of-line functions for LSE atomics

2019-09-18 Thread Kyrill Tkachov


On 9/18/19 2:58 AM, Richard Henderson wrote:

This is the libgcc part of the interface -- providing the functions.
Rationale is provided at the top of libgcc/config/aarch64/lse.S.

* config/aarch64/lse-init.c: New file.
* config/aarch64/lse.S: New file.
* config/aarch64/t-lse: New file.
* config.host: Add t-lse to all aarch64 tuples.
---
  libgcc/config/aarch64/lse-init.c |  45 ++
  libgcc/config.host   |   4 +
  libgcc/config/aarch64/lse.S  | 235 +++
  libgcc/config/aarch64/t-lse  |  44 ++
  4 files changed, 328 insertions(+)
  create mode 100644 libgcc/config/aarch64/lse-init.c
  create mode 100644 libgcc/config/aarch64/lse.S
  create mode 100644 libgcc/config/aarch64/t-lse

diff --git a/libgcc/config/aarch64/lse-init.c b/libgcc/config/aarch64/lse-init.c
new file mode 100644
index 000..51fb21d45c9
--- /dev/null
+++ b/libgcc/config/aarch64/lse-init.c
@@ -0,0 +1,45 @@
+/* Out-of-line LSE atomics for AArch64 architecture, Init.
+   Copyright (C) 2018 Free Software Foundation, Inc.
+   Contributed by Linaro Ltd.
+



This, and the other new files, will need an updated copyright date now.

Thanks,

Kyrill



+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+.  */
+
+/* Define the symbol gating the LSE implementations.  */
+_Bool __aarch64_have_lse_atomics
+  __attribute__((visibility("hidden"), nocommon));
+
+/* Disable initialization of __aarch64_have_lse_atomics during bootstrap.  */
+#ifndef inhibit_libc
+# include 
+
+/* Disable initialization if the system headers are too old.  */
+# if defined(AT_HWCAP) && defined(HWCAP_ATOMICS)
+
+static void __attribute__((constructor))
+init_have_lse_atomics (void)
+{
+  unsigned long hwcap = getauxval (AT_HWCAP);
+  __aarch64_have_lse_atomics = (hwcap & HWCAP_ATOMICS) != 0;
+}
+
+# endif /* HWCAP */
+#endif /* inhibit_libc */
diff --git a/libgcc/config.host b/libgcc/config.host
index 728e543ea39..122113fc519 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -350,12 +350,14 @@ aarch64*-*-elf | aarch64*-*-rtems*)
extra_parts="$extra_parts crtbegin.o crtend.o crti.o crtn.o"
extra_parts="$extra_parts crtfastmath.o"
tmake_file="${tmake_file} ${cpu_type}/t-aarch64"
+   tmake_file="${tmake_file} ${cpu_type}/t-lse t-slibgcc-libgcc"
tmake_file="${tmake_file} ${cpu_type}/t-softfp t-softfp t-crtfm"
md_unwind_header=aarch64/aarch64-unwind.h
;;
  aarch64*-*-freebsd*)
extra_parts="$extra_parts crtfastmath.o"
tmake_file="${tmake_file} ${cpu_type}/t-aarch64"
+   tmake_file="${tmake_file} ${cpu_type}/t-lse t-slibgcc-libgcc"
tmake_file="${tmake_file} ${cpu_type}/t-softfp t-softfp t-crtfm"
md_unwind_header=aarch64/freebsd-unwind.h
;;
@@ -367,12 +369,14 @@ aarch64*-*-netbsd*)
;;
  aarch64*-*-fuchsia*)
tmake_file="${tmake_file} ${cpu_type}/t-aarch64"
+   tmake_file="${tmake_file} ${cpu_type}/t-lse t-slibgcc-libgcc"
tmake_file="${tmake_file} ${cpu_type}/t-softfp t-softfp"
;;
  aarch64*-*-linux*)
extra_parts="$extra_parts crtfastmath.o"
md_unwind_header=aarch64/linux-unwind.h
tmake_file="${tmake_file} ${cpu_type}/t-aarch64"
+   tmake_file="${tmake_file} ${cpu_type}/t-lse t-slibgcc-libgcc"
tmake_file="${tmake_file} ${cpu_type}/t-softfp t-softfp t-crtfm"
;;
  alpha*-*-linux*)
diff --git a/libgcc/config/aarch64/lse.S b/libgcc/config/aarch64/lse.S
new file mode 100644
index 000..c24a39242ca
--- /dev/null
+++ b/libgcc/config/aarch64/lse.S
@@ -0,0 +1,235 @@
+/* Out-of-line LSE atomics for AArch64 architecture.
+   Copyright (C) 2018 Free Software Foundation, Inc.
+   Contributed by Linaro Ltd.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHA

Re: [PATCH, AArch64 v4 0/6] LSE atomics out-of-line

2019-09-18 Thread Kyrill Tkachov


Hi Richard,

On 9/18/19 2:58 AM, Richard Henderson wrote:

Version 3 was back in November:
https://gcc.gnu.org/ml/gcc-patches/2018-11/msg00062.html

Changes since v3:
   * Do not swap_commutative_operands_p in aarch64_gen_compare_reg.
 This is the probable cause of the bootstrap problem that Kyrill reported.
   * Add unwind markers to the out-of-line functions.
   * Use uxt{8,16} instead of mov in CAS functions,
 in preference to including the uxt with the cmp.
   * Prefer the lse case in the out-of-line fallthru (Wilco).
   * Name the option -moutline-atomics (Wilco)
   * Name the variable __aarch64_have_lse_atomics (Wilco);
 fix the definition in lse-init.c.
   * Rename the functions s/__aa64/__aarch64/ (Seemed sensible to match prev)
   * Always use Pmode for the address for libcalls, fixing ilp32 (Kyrill).

Still not done is a custom calling convention during code generation,
but that can come later as an optimization.

Tested aarch64-linux on a thunder x1.
I have not run tests on any platform supporting LSE, even qemu.


Thanks for this.

I've bootstrapped and tested this patch series on systems with and 
without LSE support, both with and without patch [6/6], so 4 setups in 
total.


It all looks clean for me.

I'm favour of this series going in (modulo patch 6/6, leaving the option 
to turn it on to the user).


I've got a couple of small comments on some of the patches that IMO can 
be fixed when committing.


I'll respond to them individually.

Thanks,

Kyrill


r~


Richard Henderson (6):
   aarch64: Extend %R for integer registers
   aarch64: Implement TImode compare-and-swap
   aarch64: Tidy aarch64_split_compare_and_swap
   aarch64: Add out-of-line functions for LSE atomics
   aarch64: Implement -moutline-atomics
   TESTING: Enable -moutline-atomics by default

  gcc/config/aarch64/aarch64-protos.h   |  13 +
  gcc/common/config/aarch64/aarch64-common.c|   6 +-
  gcc/config/aarch64/aarch64.c  | 204 +++
  .../atomic-comp-swap-release-acquire.c|   2 +-
  .../gcc.target/aarch64/atomic-op-acq_rel.c|   2 +-
  .../gcc.target/aarch64/atomic-op-acquire.c|   2 +-
  .../gcc.target/aarch64/atomic-op-char.c   |   2 +-
  .../gcc.target/aarch64/atomic-op-consume.c|   2 +-
  .../gcc.target/aarch64/atomic-op-imm.c|   2 +-
  .../gcc.target/aarch64/atomic-op-int.c|   2 +-
  .../gcc.target/aarch64/atomic-op-long.c   |   2 +-
  .../gcc.target/aarch64/atomic-op-relaxed.c|   2 +-
  .../gcc.target/aarch64/atomic-op-release.c|   2 +-
  .../gcc.target/aarch64/atomic-op-seq_cst.c|   2 +-
  .../gcc.target/aarch64/atomic-op-short.c  |   2 +-
  .../aarch64/atomic_cmp_exchange_zero_reg_1.c  |   2 +-
  .../atomic_cmp_exchange_zero_strong_1.c   |   2 +-
  .../gcc.target/aarch64/sync-comp-swap.c   |   2 +-
  .../gcc.target/aarch64/sync-op-acquire.c  |   2 +-
  .../gcc.target/aarch64/sync-op-full.c |   2 +-
  libgcc/config/aarch64/lse-init.c  |  45 
  gcc/config/aarch64/aarch64.opt|   3 +
  gcc/config/aarch64/atomics.md | 187 +-
  gcc/config/aarch64/iterators.md   |   3 +
  gcc/doc/invoke.texi   |  16 +-
  libgcc/config.host|   4 +
  libgcc/config/aarch64/lse.S   | 235 ++
  libgcc/config/aarch64/t-lse   |  44 
  28 files changed, 709 insertions(+), 85 deletions(-)
  create mode 100644 libgcc/config/aarch64/lse-init.c
  create mode 100644 libgcc/config/aarch64/lse.S
  create mode 100644 libgcc/config/aarch64/t-lse

[PATCH] Tweak clang-format configuration.

2019-09-18 Thread Martin Liška

Hi.

I'm going to install a patch to clang-format that tweaks configuration.
I'm planning to use it on daily basis and I'm planning to send an analysis
later on.

Martin

contrib/ChangeLog:

2019-09-18  Martin Liska  

* clang-format: Tweak configuration based on new
options offered.
---
 contrib/clang-format | 20 
 1 file changed, 16 insertions(+), 4 deletions(-)


diff --git a/contrib/clang-format b/contrib/clang-format
index d734001c06f..7a4e96f64ca 100644
--- a/contrib/clang-format
+++ b/contrib/clang-format
@@ -13,16 +13,21 @@
 # You should have received a copy of the GNU General Public License
 # along with this program.  If not, see .
 
-# clang-format 3.8+ (Mon Nov 16) is required
+# clang-format 7.0.1 is required
 #
 # To utilize the tool to lines just touched by a patch, use
-# clang-format-diff.py script, which can be downloaded here:
-# https://llvm.org/svn/llvm-project/cfe/trunk/tools/clang-format/clang-format-diff.py
+# clang-format-diff script that is usually also packaged with clang-format.
+#
+# Example of usage:
+# git diff -U0 --no-color | clang-format-diff -p1
+# (here the tool will generate a patch)
+# git diff -U0 --no-color | clang-format-diff -p1 -i
+# (modifications are applied)
 
 ---
 Language: Cpp
 AccessModifierOffset: -2
-AlwaysBreakAfterDefinitionReturnType: All
+AlwaysBreakAfterReturnType: TopLevel
 BinPackArguments: true
 BinPackParameters: true
 BraceWrapping:
@@ -37,6 +42,7 @@ BraceWrapping:
   BeforeCatch: true
   BeforeElse: true
   IndentBraces: true
+  SplitEmptyFunction: false
 BreakBeforeBinaryOperators: All
 BreakBeforeBraces: Custom
 BreakBeforeTernaryOperators: true
@@ -136,3 +142,9 @@ SpaceAfterCStyleCast: true
 SpaceBeforeParens: Always
 SpacesBeforeTrailingComments: 1
 UseTab: Always
+AlignEscapedNewlines: Right
+AlignTrailingComments: true
+AllowShortFunctionsOnASingleLine: All
+AlwaysBreakTemplateDeclarations: MultiLine
+KeepEmptyLinesAtTheStartOfBlocks: false
+Standard: Cpp03

[Patch, GCC]Backporting r269039 to gcc8

2019-09-18 Thread Delia Burduv

Hi,

I am trying to backport r269039 to gcc8 wich solved this bug report: 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86487 . I have tested it on 
arm-none-linux-gnueabi,aarch64-none-linux-gnu and x86_64-pc-linux-gnu 
and there was no regression. The patch applied cleanly. I don't have 
commit rights, so if it is ok can someone please commit it for me?

Thanks,
Delia

gcc/ChangeLog:
2019-09-13  Delia Burduv  

     Backport from trunk
     2019-02-20  Andre Vieira 

     PR target/86487
     * lra-constraints.c(uses_hard_regs_p): Fix handling of
     paradoxical SUBREGS.

gcc/testsuite/ChangeLog:
2019-09-13  Delia Burduv  

     Backport from trunk
     2019-02-20  Andre Vieira 

     PR target/86487
     * gcc.target/arm/pr86487.c: New.

diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
index 484e9fa148c32208cd3af39e3aaa944069933ac0..1dea8c959d8f0e7e2d39f0ccf1b97aa1f64b024f 100644
--- a/gcc/lra-constraints.c
+++ b/gcc/lra-constraints.c
@@ -1774,14 +1774,24 @@ uses_hard_regs_p (rtx x, HARD_REG_SET set)
 return false;
   code = GET_CODE (x);
   mode = GET_MODE (x);
+
   if (code == SUBREG)
 {
+  /* For all SUBREGs we want to check whether the full multi-register
+	 overlaps the set.  For normal SUBREGs this means 'get_hard_regno' of
+	 the inner register, for paradoxical SUBREGs this means the
+	 'get_hard_regno' of the full SUBREG and for complete SUBREGs either is
+	 fine.  Use the wider mode for all cases.  */
+  rtx subreg = SUBREG_REG (x);
   mode = wider_subreg_mode (x);
-  x = SUBREG_REG (x);
-  code = GET_CODE (x);
+  if (mode == GET_MODE (subreg))
+	{
+	  x = subreg;
+	  code = GET_CODE (x);
+	}
 }
 
-  if (REG_P (x))
+  if (REG_P (x) || SUBREG_P (x))
 {
   x_hard_regno = get_hard_regno (x, true);
   return (x_hard_regno >= 0
diff --git a/gcc/testsuite/gcc.target/arm/pr86487.c b/gcc/testsuite/gcc.target/arm/pr86487.c
new file mode 100644
index ..1c1db7852d91a82a1d2b6eaa4f3d4c6dbef107f5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr86487.c
@@ -0,0 +1,10 @@
+/* { dg-skip-if "" { *-*-* } { "-march=armv[0-6]*" "-mthumb" } { "" } } */
+/* { dg-require-effective-target arm_neon_hw } */
+/* { dg-options "-O1 -mbig-endian" } */
+/* { dg-add-options arm_neon } */
+int a, b, c, d;
+long long fn1(long long p2) { return p2 == 0 ? -1 : -1 % p2; }
+void fn2(long long p1, short p2, long p3) {
+  b = fn1((d || 6) & a);
+  c = b | p3;
+}

Re: [PATCH][ARM] Add logical DImode expanders

2019-09-18 Thread Kyrill Tkachov


Hi Wilco,

On 9/9/19 6:06 PM, Wilco Dijkstra wrote:

ping


We currently use default mid-end expanders for logical DImode operations.
 These split operations without first splitting off complex immediates or
 memory operands.  The resulting expansions are non-optimal and allow for
 fewer LDRD/STRD opportunities.  So add back explicit expanders which 
ensure

 memory operands and immediates are handled more efficiently.



Makes sense to me.



 Bootstrap OK on armhf, regress passes.

 ChangeLog:
 2019-08-29  Wilco Dijkstra  

 * config/arm/arm.md (anddi3): Expand explicitly.
 (iordi3): Likewise.
 (xordi3): Likewise.
 (one_cmpldi2): Likewise.
 * config/arm/arm.c (const_ok_for_dimode_op): Return true if one
 of the constant parts is simple.
 * config/arm/predicates.md (arm_anddi_operand): Add predicate.
 (arm_iordi_operand): Add predicate.
 (arm_xordi_operand): Add predicate.

 --

 diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
 index 
fb57880fe0568be96a04aee1b7d230e77121e3f5..1fec00baa2a5e510ef2c02d9766432cc7cd0a17b 
100644

 --- a/gcc/config/arm/arm.c
 +++ b/gcc/config/arm/arm.c
 @@ -4273,8 +4273,8 @@ const_ok_for_dimode_op (HOST_WIDE_INT i, enum 
rtx_code code)

  case AND:
  case IOR:
  case XOR:
 -  return (const_ok_for_op (hi_val, code) || hi_val == 0x)
 -  && (const_ok_for_op (lo_val, code) || lo_val == 
0x);

 +  return const_ok_for_op (hi_val, code) || hi_val == 0x
 +    || const_ok_for_op (lo_val, code) || lo_val == 0x;
  case PLUS:
    return arm_not_operand (hi, SImode) && arm_add_operand (lo, 
SImode);


 diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
 index 
ed49c4beda138633a84b58fe345cf5ba99103ab7..738d42fd164f117f1dec1108a824d984ccd70d09 
100644

 --- a/gcc/config/arm/arm.md
 +++ b/gcc/config/arm/arm.md
 @@ -2176,6 +2176,89 @@ (define_expand "divdf3"
    "")


 +; Expand logical operations.  The mid-end expander does not split 
off memory
 +; operands or complex immediates, which leads to fewer LDRD/STRD 
instructions.

 +; So an explicit expander is needed to generate better code.
 +
 +(define_expand "anddi3"
 +  [(set (match_operand:DI    0 "s_register_operand")
 +   (and:DI (match_operand:DI 1 "s_register_operand")
 +   (match_operand:DI 2 "arm_anddi_operand")))]
 +  "TARGET_32BIT"
 +  {
 +  rtx low  = simplify_gen_binary (AND, SImode,
 + gen_lowpart (SImode, operands[1]),
 + gen_lowpart (SImode, operands[2]));
 +  rtx high = simplify_gen_binary (AND, SImode,
 + gen_highpart (SImode, operands[1]),
 + gen_highpart_mode (SImode, DImode,
 +    operands[2]));
 +
 +  emit_insn (gen_rtx_SET (gen_lowpart (SImode, operands[0]), low));
 +  emit_insn (gen_rtx_SET (gen_highpart (SImode, operands[0]), 
high));

 +  DONE;
 +  }
 +)
 +
 +(define_expand "iordi3"
 +  [(set (match_operand:DI    0 "s_register_operand")
 +   (ior:DI (match_operand:DI 1 "s_register_operand")
 +   (match_operand:DI 2 "arm_iordi_operand")))]
 +  "TARGET_32BIT"
 +  {
 +  rtx low  = simplify_gen_binary (IOR, SImode,
 + gen_lowpart (SImode, operands[1]),
 + gen_lowpart (SImode, operands[2]));
 +  rtx high = simplify_gen_binary (IOR, SImode,
 + gen_highpart (SImode, operands[1]),
 + gen_highpart_mode (SImode, DImode,
 +    operands[2]));
 +
 +  emit_insn (gen_rtx_SET (gen_lowpart (SImode, operands[0]), low));
 +  emit_insn (gen_rtx_SET (gen_highpart (SImode, operands[0]), 
high));

 +  DONE;
 +  }
 +)
 +
 +(define_expand "xordi3"
 +  [(set (match_operand:DI    0 "s_register_operand")
 +   (xor:DI (match_operand:DI 1 "s_register_operand")
 +   (match_operand:DI 2 "arm_xordi_operand")))]
 +  "TARGET_32BIT"
 +  {
 +   rtx low  = simplify_gen_binary (XOR, SImode,
 +   gen_lowpart (SImode, 
operands[1]),
 +   gen_lowpart (SImode, 
operands[2]));

 +   rtx high = simplify_gen_binary (XOR, SImode,
 +   gen_highpart (SImode, 
operands[1]),

 + gen_highpart_mode (SImode, DImode,
 + operands[2]));
 +
 +   emit_insn (gen_rtx_SET (gen_lowpart (SImode, operands[0]), low));
 +   emit_insn (gen_rtx_SET (gen_highpart (SImode, operands[0]), 
high));

 +   DONE;
 +  }
 +)


We should be able to "compress" the above 3 patterns into one using code 
iterators.


Looks ok to me otherwise.

Thanks,

Kyrill


 +
 +(define_expand "one_cmpldi2"
 +  [(set (match_operand:DI 0 "s

Re: [PATCH][ARM] Cleanup multiply patterns

2019-09-18 Thread Kyrill Tkachov


Hi Wilco,

On 9/9/19 6:07 PM, Wilco Dijkstra wrote:

ping


Cleanup the 32-bit multiply patterns.  Merge the pre-Armv6 with the Armv6
 patterns, remove useless alternatives and order the accumulator operands
 to prefer MLA Ra, Rb, Rc, Ra whenever feasible.

 Bootstrap OK on armhf, regress passes.

 ChangeLog:
 2019-09-03  Wilco Dijkstra  

 * config/arm/arm.md (arm_mulsi3): Remove pattern.
 (arm_mulsi3_v6): Likewise.
 (mulsi3addsi_v6): Likewise.
 (mulsi3subsi): Likewise.
 (mul): Add new multiply pattern.
 (mla): Likewise.
 (mls): Likewise.

 --
 diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
 index 
66dafdc47b7cfc37c131764e482d47bcaab90538..681358512e88f6823d1b6d59038f387daaec226e 
100644

 --- a/gcc/config/arm/arm.md
 +++ b/gcc/config/arm/arm.md
 @@ -1594,64 +1594,44 @@ (define_expand "mulsi3"
    ""
  )

 -;; Use `&' and then `0' to prevent the operands 0 and 1 being the same
 -(define_insn "*arm_mulsi3"
 -  [(set (match_operand:SI  0 "s_register_operand" "=&r,&r")
 -   (mult:SI (match_operand:SI 2 "s_register_operand" "r,r")
 -    (match_operand:SI 1 "s_register_operand" "%0,r")))]
 -  "TARGET_32BIT && !arm_arch6"
 +;; Use `&' and then `0' to prevent operands 0 and 2 being the same
 +(define_insn "*mul"
 +  [(set (match_operand:SI  0 "s_register_operand" "=l,r,&r,&r")
 +   (mult:SI (match_operand:SI 2 "s_register_operand" "l,r,r,r")
 +    (match_operand:SI 1 "s_register_operand" "%0,r,0,r")))]
 +  "TARGET_32BIT"
    "mul%?\\t%0, %2, %1"
    [(set_attr "type" "mul")
 -   (set_attr "predicable" "yes")]
 -)
 -
 -(define_insn "*arm_mulsi3_v6"
 -  [(set (match_operand:SI  0 "s_register_operand" "=l,l,r")
 -   (mult:SI (match_operand:SI 1 "s_register_operand" "0,l,r")
 -    (match_operand:SI 2 "s_register_operand" "l,0,r")))]
 -  "TARGET_32BIT && arm_arch6"
 -  "mul%?\\t%0, %1, %2"
 -  [(set_attr "type" "mul")
 (set_attr "predicable" "yes")
 -   (set_attr "arch" "t2,t2,*")
 +   (set_attr "arch" "t2,v6,nov6,nov6")
 (set_attr "length" "4")
 -   (set_attr "predicable_short_it" "yes,yes,no")]
 +   (set_attr "predicable_short_it" "yes,no,*,*")]
  )

 -;; Unnamed templates to match MLA instruction.
 +;; MLA and MLS instruction. Use operand 1 for the accumulator to prefer
 +;; reusing the same register.

 -(define_insn "*mulsi3addsi"
 -  [(set (match_operand:SI 0 "s_register_operand" "=&r,&r,&r,&r")
 +(define_insn "*mla"
 +  [(set (match_operand:SI 0 "s_register_operand" "=r,&r,&r,&r")
  (plus:SI
 - (mult:SI (match_operand:SI 2 "s_register_operand" "r,r,r,r")
 -  (match_operand:SI 1 "s_register_operand" "%0,r,0,r"))
 - (match_operand:SI 3 "s_register_operand" "r,r,0,0")))]
 -  "TARGET_32BIT && !arm_arch6"
 -  "mla%?\\t%0, %2, %1, %3"
 -  [(set_attr "type" "mla")
 -   (set_attr "predicable" "yes")]
 -)
 -
 -(define_insn "*mulsi3addsi_v6"
 -  [(set (match_operand:SI 0 "s_register_operand" "=r")
 -   (plus:SI
 - (mult:SI (match_operand:SI 2 "s_register_operand" "r")
 -  (match_operand:SI 1 "s_register_operand" "r"))
 - (match_operand:SI 3 "s_register_operand" "r")))]
 -  "TARGET_32BIT && arm_arch6"
 -  "mla%?\\t%0, %2, %1, %3"
 + (mult:SI (match_operand:SI 3 "s_register_operand" "r,r,r,r")
 +  (match_operand:SI 2 "s_register_operand" "%r,r,0,r"))
 + (match_operand:SI 1 "s_register_operand" "r,0,r,r")))]
 +  "TARGET_32BIT"
 +  "mla%?\\t%0, %3, %2, %1"
    [(set_attr "type" "mla")
 -   (set_attr "predicable" "yes")]
 +   (set_attr "predicable" "yes")
 +   (set_attr "arch" "v6,nov6,nov6,nov6")]
  )

 -(define_insn "*mulsi3subsi"
 +(define_insn "*mls"
    [(set (match_operand:SI 0 "s_register_operand" "=r")
  (minus:SI
 - (match_operand:SI 3 "s_register_operand" "r")
 - (mult:SI (match_operand:SI 2 "s_register_operand" "r")
 -  (match_operand:SI 1 "s_register_operand" "r"]
 + (match_operand:SI 1 "s_register_operand" "r")
 + (mult:SI (match_operand:SI 3 "s_register_operand" "r")
 +  (match_operand:SI 2 "s_register_operand" "r"]



Looks like we'll want to mark operand 2 here with '%' as well?

Looks ok to me otherwise.

Thanks,

Kyrill



    "TARGET_32BIT && arm_arch_thumb2"
 -  "mls%?\\t%0, %2, %1, %3"
 +  "mls%?\\t%0, %3, %2, %1"
    [(set_attr "type" "mla")
 (set_attr "predicable" "yes")]
  )

Re: [PATCH][ARM] Cleanup 64-bit multiplies

2019-09-18 Thread Kyrill Tkachov


Hi Wilco,

On 9/9/19 6:08 PM, Wilco Dijkstra wrote:

ping


Cleanup 64-bit multiplies.  Combine the expanders using iterators.
 Merge the signed/unsigned multiplies as well as the pre-Armv6 and Armv6
 variants.  Split DImode operands early into parallel sets inside the
 MULL/MLAL instructions - this improves register allocation and avoids
 subreg issues due to other DImode operations splitting early.

Hmm... quite a lot going on this patch. Perhaps breaking it into a 
series would have been easier.


But I think I untangled it all and it looks like a good improvement.

Ok.

Thanks,

Kyrill




 Bootstrap OK on armhf, regress passes.

 ChangeLog:
 2019-09-03  Wilco Dijkstra  

 * config/arm/arm.md (maddsidi4): Remove expander.
 (mulsidi3adddi): Remove pattern.
 (mulsidi3adddi_v6): Likewise.
 (mulsidi3_nov6): Likewise.
 (mulsidi3_v6): Likewise.
 (umulsidi3): Remove expander.
 (umulsidi3_nov6): Remove pattern.
 (umulsidi3_v6): Likewise.
 (umulsidi3adddi): Likewise.
 (umulsidi3adddi_v6): Likewise.
 (mulsidi3): Add combined expander.
 (maddsidi4): Likewise.
 (mull): Add combined umull and smull pattern.
 (mlal): Likewise.
 * config/arm/iterators.md (Us): Add new iterator.
 --
 diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
 index 
1ab203810bf143927a8afa0d00d82537cd7c75ed..c1fea4abdbccedbbbed9a25cab133de5cacb1afb 
100644

 --- a/gcc/config/arm/arm.md
 +++ b/gcc/config/arm/arm.md
 @@ -1636,144 +1636,80 @@ (define_insn "*mls"
 (set_attr "predicable" "yes")]
  )

 -(define_expand "maddsidi4"
 -  [(set (match_operand:DI 0 "s_register_operand")
 -   (plus:DI
 -    (mult:DI
 - (sign_extend:DI (match_operand:SI 1 "s_register_operand"))
 - (sign_extend:DI (match_operand:SI 2 "s_register_operand")))
 -    (match_operand:DI 3 "s_register_operand")))]
 -  "TARGET_32BIT"
 -  "")
 -
 -(define_insn "*mulsidi3adddi"
 -  [(set (match_operand:DI 0 "s_register_operand" "=&r")
 -   (plus:DI
 -    (mult:DI
 - (sign_extend:DI (match_operand:SI 2 "s_register_operand" "%r"))
 - (sign_extend:DI (match_operand:SI 3 "s_register_operand" "r")))
 -    (match_operand:DI 1 "s_register_operand" "0")))]
 -  "TARGET_32BIT && !arm_arch6"
 -  "smlal%?\\t%Q0, %R0, %3, %2"
 -  [(set_attr "type" "smlal")
 -   (set_attr "predicable" "yes")]
 -)
 -
 -(define_insn "*mulsidi3adddi_v6"
 -  [(set (match_operand:DI 0 "s_register_operand" "=r")
 -   (plus:DI
 -    (mult:DI
 - (sign_extend:DI (match_operand:SI 2 "s_register_operand" "r"))
 - (sign_extend:DI (match_operand:SI 3 "s_register_operand" "r")))
 -    (match_operand:DI 1 "s_register_operand" "0")))]
 -  "TARGET_32BIT && arm_arch6"
 -  "smlal%?\\t%Q0, %R0, %3, %2"
 -  [(set_attr "type" "smlal")
 -   (set_attr "predicable" "yes")]
 -)
 -
  ;; 32x32->64 widening multiply.
 -;; As with mulsi3, the only difference between the v3-5 and v6+
 -;; versions of these patterns is the requirement that the output not
 -;; overlap the inputs, but that still means we have to have a named
 -;; expander and two different starred insns.
 +;; The only difference between the v3-5 and v6+ versions is the 
requirement

 +;; that the output does not overlap with either input.

 -(define_expand "mulsidi3"
 +(define_expand "mulsidi3"
    [(set (match_operand:DI 0 "s_register_operand")
  (mult:DI
 -    (sign_extend:DI (match_operand:SI 1 "s_register_operand"))
 -    (sign_extend:DI (match_operand:SI 2 "s_register_operand"]
 +    (SE:DI (match_operand:SI 1 "s_register_operand"))
 +    (SE:DI (match_operand:SI 2 "s_register_operand"]
    "TARGET_32BIT"
 -  ""
 -)
 -
 -(define_insn "*mulsidi3_nov6"
 -  [(set (match_operand:DI 0 "s_register_operand" "=&r")
 -   (mult:DI
 -    (sign_extend:DI (match_operand:SI 1 "s_register_operand" "%r"))
 -    (sign_extend:DI (match_operand:SI 2 "s_register_operand" 
"r"]

 -  "TARGET_32BIT && !arm_arch6"
 -  "smull%?\\t%Q0, %R0, %1, %2"
 -  [(set_attr "type" "smull")
 -   (set_attr "predicable" "yes")]
 -)
 -
 -(define_insn "*mulsidi3_v6"
 -  [(set (match_operand:DI 0 "s_register_operand" "=r")
 -   (mult:DI
 -    (sign_extend:DI (match_operand:SI 1 "s_register_operand" "r"))
 -    (sign_extend:DI (match_operand:SI 2 "s_register_operand" 
"r"]

 -  "TARGET_32BIT && arm_arch6"
 -  "smull%?\\t%Q0, %R0, %1, %2"
 -  [(set_attr "type" "smull")
 -   (set_attr "predicable" "yes")]
 +  {
 +  emit_insn (gen_mull (gen_lowpart (SImode, operands[0]),
 +  gen_highpart (SImode, operands[0]),
 +  operands[1], operands[2]));
 +  DONE;
 +  }
  )

 -(define_expand "umulsidi3"
 -  [(set (match_operand:DI 0 "s_register_operand")
 -   (mult:DI
 -    (zero_extend:DI (match_operand:SI 1 "s_register_operand"))
 -    (zero_extend:DI (match_operand:SI 2 "s_register_

Re: [PATCH][ARM] Cleanup highpart multiply patterns

2019-09-18 Thread Kyrill Tkachov


Hi Wilco,

On 9/9/19 6:07 PM, Wilco Dijkstra wrote:

ping


Cleanup the various highpart multiply patterns using iterators.
 As a result the signed and unsigned variants and the pre-Armv6
 multiply operand constraints are all handled in a single pattern
 and simple expander.

 Bootstrap OK on armhf, regress passes.



Ok.

Thanks,

Kyrill



 ChangeLog:
 2019-09-03  Wilco Dijkstra  

 * config/arm/arm.md (smulsi3_highpart): Use  and  
iterators.

 (smulsi3_highpart_nov6): Remove pattern.
 (smulsi3_highpart_v6): Likewise.
 (umulsi3_highpart): Likewise.
 (umulsi3_highpart_nov6): Likewise.
 (umulsi3_highpart_v6): Likewise.
 (mull_high): Add new combined multiply pattern.
 --
 diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
 index 
681358512e88f6823d1b6d59038f387daaec226e..1ab203810bf143927a8afa0d00d82537cd7c75ed 
100644

 --- a/gcc/config/arm/arm.md
 +++ b/gcc/config/arm/arm.md
 @@ -1776,92 +1776,34 @@ (define_insn "*umulsidi3adddi_v6"
 (set_attr "predicable" "yes")]
  )

 -(define_expand "smulsi3_highpart"
 +(define_expand "mulsi3_highpart"
    [(parallel
  [(set (match_operand:SI 0 "s_register_operand")
    (truncate:SI
 (lshiftrt:DI
  (mult:DI
 -    (sign_extend:DI (match_operand:SI 1 "s_register_operand"))
 -    (sign_extend:DI (match_operand:SI 2 "s_register_operand")))
 +    (SE:DI (match_operand:SI 1 "s_register_operand"))
 +    (SE:DI (match_operand:SI 2 "s_register_operand")))
  (const_int 32
   (clobber (match_scratch:SI 3 ""))])]
    "TARGET_32BIT"
    ""
  )

 -(define_insn "*smulsi3_highpart_nov6"
 -  [(set (match_operand:SI 0 "s_register_operand" "=&r,&r")
 +(define_insn "*mull_high"
 +  [(set (match_operand:SI 0 "s_register_operand" "=r,&r,&r")
  (truncate:SI
   (lshiftrt:DI
    (mult:DI
 -  (sign_extend:DI (match_operand:SI 1 "s_register_operand" 
"%0,r"))
 -  (sign_extend:DI (match_operand:SI 2 "s_register_operand" 
"r,r")))

 +  (SE:DI (match_operand:SI 1 "s_register_operand" "%r,0,r"))
 +  (SE:DI (match_operand:SI 2 "s_register_operand" "r,r,r")))
    (const_int 32
 -   (clobber (match_scratch:SI 3 "=&r,&r"))]
 -  "TARGET_32BIT && !arm_arch6"
 -  "smull%?\\t%3, %0, %2, %1"
 -  [(set_attr "type" "smull")
 -   (set_attr "predicable" "yes")]
 -)
 -
 -(define_insn "*smulsi3_highpart_v6"
 -  [(set (match_operand:SI 0 "s_register_operand" "=r")
 -   (truncate:SI
 -    (lshiftrt:DI
 - (mult:DI
 -  (sign_extend:DI (match_operand:SI 1 "s_register_operand" "r"))
 -  (sign_extend:DI (match_operand:SI 2 "s_register_operand" 
"r")))

 - (const_int 32
 -   (clobber (match_scratch:SI 3 "=r"))]
 -  "TARGET_32BIT && arm_arch6"
 -  "smull%?\\t%3, %0, %2, %1"
 -  [(set_attr "type" "smull")
 -   (set_attr "predicable" "yes")]
 -)
 -
 -(define_expand "umulsi3_highpart"
 -  [(parallel
 -    [(set (match_operand:SI 0 "s_register_operand")
 - (truncate:SI
 -  (lshiftrt:DI
 -   (mult:DI
 -    (zero_extend:DI (match_operand:SI 1 "s_register_operand"))
 - (zero_extend:DI (match_operand:SI 2 "s_register_operand")))
 -   (const_int 32
 - (clobber (match_scratch:SI 3 ""))])]
 +   (clobber (match_scratch:SI 3 "=r,&r,&r"))]
    "TARGET_32BIT"
 -  ""
 -)
 -
 -(define_insn "*umulsi3_highpart_nov6"
 -  [(set (match_operand:SI 0 "s_register_operand" "=&r,&r")
 -   (truncate:SI
 -    (lshiftrt:DI
 - (mult:DI
 -  (zero_extend:DI (match_operand:SI 1 "s_register_operand" 
"%0,r"))
 -  (zero_extend:DI (match_operand:SI 2 "s_register_operand" 
"r,r")))

 - (const_int 32
 -   (clobber (match_scratch:SI 3 "=&r,&r"))]
 -  "TARGET_32BIT && !arm_arch6"
 -  "umull%?\\t%3, %0, %2, %1"
 +  "mull%?\\t%3, %0, %2, %1"
    [(set_attr "type" "umull")
 -   (set_attr "predicable" "yes")]
 -)
 -
 -(define_insn "*umulsi3_highpart_v6"
 -  [(set (match_operand:SI 0 "s_register_operand" "=r")
 -   (truncate:SI
 -    (lshiftrt:DI
 - (mult:DI
 -  (zero_extend:DI (match_operand:SI 1 "s_register_operand" "r"))
 -  (zero_extend:DI (match_operand:SI 2 "s_register_operand" 
"r")))

 - (const_int 32
 -   (clobber (match_scratch:SI 3 "=r"))]
 -  "TARGET_32BIT && arm_arch6"
 -  "umull%?\\t%3, %0, %2, %1"
 -  [(set_attr "type" "umull")
 -   (set_attr "predicable" "yes")]
 +   (set_attr "predicable" "yes")
 +   (set_attr "arch" "v6,nov6,nov6")]
  )

  (define_insn "mulhisi3"

Re: [PATCH][ARM] Remove support for MULS

2019-09-18 Thread Kyrill Tkachov


Hi Wilco,

On 9/9/19 6:07 PM, Wilco Dijkstra wrote:

ping


Remove various MULS/MLAS patterns which are enabled when optimizing for
 size.  However the codesize gain from these patterns is so minimal that
 there is no point in keeping them.

I disagree. If they still trigger and generate better code than without 
we should keep them.


What kind of code is *common* varies greatly from user to user.

Thanks,

Kyrill



 Bootstrap OK on armhf, regress passes.

 ChangeLog:
 2019-09-03  Wilco Dijkstra  

 * config/arm/arm.md (mulsi3_compare0): Remove pattern.
 (mulsi3_compare0_v6): Likewise.
 (mulsi_compare0_scratch): Likewise.
 (mulsi_compare0_scratch_v6): Likewise.
 (mulsi3addsi_compare0): Likewise.
 (mulsi3addsi_compare0_v6): Likewise.
 (mulsi3addsi_compare0_scratch): Likewise.
 (mulsi3addsi_compare0_scratch_v6): Likewise.
 * config/arm/thumb2.md (thumb2_mulsi_short_compare0): Remove 
pattern.

 (thumb2_mulsi_short_compare0_scratch): Likewise.

 --
 diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
 index 
738d42fd164f117f1dec1108a824d984ccd70d09..66dafdc47b7cfc37c131764e482d47bcaab90538 
100644

 --- a/gcc/config/arm/arm.md
 +++ b/gcc/config/arm/arm.md
 @@ -1618,60 +1618,6 @@ (define_insn "*arm_mulsi3_v6"
 (set_attr "predicable_short_it" "yes,yes,no")]
  )

 -(define_insn "*mulsi3_compare0"
 -  [(set (reg:CC_NOOV CC_REGNUM)
 -   (compare:CC_NOOV (mult:SI
 - (match_operand:SI 2 "s_register_operand" "r,r")
 - (match_operand:SI 1 "s_register_operand" 
"%0,r"))

 -    (const_int 0)))
 -   (set (match_operand:SI 0 "s_register_operand" "=&r,&r")
 -   (mult:SI (match_dup 2) (match_dup 1)))]
 -  "TARGET_ARM && !arm_arch6"
 -  "muls%?\\t%0, %2, %1"
 -  [(set_attr "conds" "set")
 -   (set_attr "type" "muls")]
 -)
 -
 -(define_insn "*mulsi3_compare0_v6"
 -  [(set (reg:CC_NOOV CC_REGNUM)
 -   (compare:CC_NOOV (mult:SI
 - (match_operand:SI 2 "s_register_operand" "r")
 - (match_operand:SI 1 "s_register_operand" "r"))
 -    (const_int 0)))
 -   (set (match_operand:SI 0 "s_register_operand" "=r")
 -   (mult:SI (match_dup 2) (match_dup 1)))]
 -  "TARGET_ARM && arm_arch6 && optimize_size"
 -  "muls%?\\t%0, %2, %1"
 -  [(set_attr "conds" "set")
 -   (set_attr "type" "muls")]
 -)
 -
 -(define_insn "*mulsi_compare0_scratch"
 -  [(set (reg:CC_NOOV CC_REGNUM)
 -   (compare:CC_NOOV (mult:SI
 - (match_operand:SI 2 "s_register_operand" "r,r")
 - (match_operand:SI 1 "s_register_operand" 
"%0,r"))

 -    (const_int 0)))
 -   (clobber (match_scratch:SI 0 "=&r,&r"))]
 -  "TARGET_ARM && !arm_arch6"
 -  "muls%?\\t%0, %2, %1"
 -  [(set_attr "conds" "set")
 -   (set_attr "type" "muls")]
 -)
 -
 -(define_insn "*mulsi_compare0_scratch_v6"
 -  [(set (reg:CC_NOOV CC_REGNUM)
 -   (compare:CC_NOOV (mult:SI
 - (match_operand:SI 2 "s_register_operand" "r")
 - (match_operand:SI 1 "s_register_operand" "r"))
 -    (const_int 0)))
 -   (clobber (match_scratch:SI 0 "=r"))]
 -  "TARGET_ARM && arm_arch6 && optimize_size"
 -  "muls%?\\t%0, %2, %1"
 -  [(set_attr "conds" "set")
 -   (set_attr "type" "muls")]
 -)
 -
  ;; Unnamed templates to match MLA instruction.

  (define_insn "*mulsi3addsi"
 @@ -1698,70 +1644,6 @@ (define_insn "*mulsi3addsi_v6"
 (set_attr "predicable" "yes")]
  )

 -(define_insn "*mulsi3addsi_compare0"
 -  [(set (reg:CC_NOOV CC_REGNUM)
 -   (compare:CC_NOOV
 -    (plus:SI (mult:SI
 -  (match_operand:SI 2 "s_register_operand" "r,r,r,r")
 -  (match_operand:SI 1 "s_register_operand" "%0,r,0,r"))
 - (match_operand:SI 3 "s_register_operand" "r,r,0,0"))
 -    (const_int 0)))
 -   (set (match_operand:SI 0 "s_register_operand" "=&r,&r,&r,&r")
 -   (plus:SI (mult:SI (match_dup 2) (match_dup 1))
 -    (match_dup 3)))]
 -  "TARGET_ARM && arm_arch6"
 -  "mlas%?\\t%0, %2, %1, %3"
 -  [(set_attr "conds" "set")
 -   (set_attr "type" "mlas")]
 -)
 -
 -(define_insn "*mulsi3addsi_compare0_v6"
 -  [(set (reg:CC_NOOV CC_REGNUM)
 -   (compare:CC_NOOV
 -    (plus:SI (mult:SI
 -  (match_operand:SI 2 "s_register_operand" "r")
 -  (match_operand:SI 1 "s_register_operand" "r"))
 - (match_operand:SI 3 "s_register_operand" "r"))
 -    (const_int 0)))
 -   (set (match_operand:SI 0 "s_register_operand" "=r")
 -   (plus:SI (mult:SI (match_dup 2) (match_dup 1))
 -    (match_dup 3)))]
 -  "TARGET_ARM && arm_arch6 && optimize_size"
 -  "mlas%?\\t%0, %2, %1, %3"
 -  [(set_attr "conds" "set")
 -   (set_attr "type" "mlas")]
 -)
 -
 -(define_insn "*mulsi3addsi_compare0_scratch"
 -  [(set (reg:CC_NOOV CC_REGNUM)
 -   (compare:CC_NOOV

Re: [PATCH][ARM] Enable code hoisting with -Os (PR80155)

2019-09-18 Thread Prathamesh Kulkarni

On Wed, 18 Sep 2019 at 01:46, Richard Biener  wrote:
>
> On Tue, Sep 17, 2019 at 7:18 PM Wilco Dijkstra  wrote:
> >
> > Hi Richard,
> >
> > > The issue with the bugzilla is that it lacked appropriate testcase(s) and 
> > > thus
> > > it is now a mess.  There are clear testcases (maybe not in the benchmarks 
> > > you
> >
> > Agreed - it's not clear whether any of the proposed changes would actually
> > help the original issue. My patch absolutely does.
> >
> > > care about) that benefit from code hoisting as enabler, mainly when 
> > > control
> > > flow can be then converted to data flow.  Also note that "size 
> > > optimizations"
> > > are important for all cases where followup transforms have size limits on 
> > > the IL
> > > in place.
> >
> > The gain from -fcode-hoisting is about 0.2% overall on Thumb-2. Ie. it's 
> > definitely
> > useful, but there are much larger gains to be had from other tweaks [1]. So 
> > we can
> > live without it until a better solution is found.
>
> A "solution" for better eembc benchmark results?
>
> The issues are all latent even w/o code-hoisting since you can do the
> same transform at the source level.  Which is usually why I argue
> trying to fix this in code-hoisting is not a complete fix.  Nor is turning
> off random GIMPLE passes for specific benchmark regressions.
>
> Anyway, it's arm maintainers call if you want to have such hacks in
> place or not.
>
> As a release manager I say that GCC isn't a benchmark compiler.
>
> As the one "responsible" for the code-hoisting introduction I say that
> as long as I don't have access to the actual benchmark I can't assess
> wrongdoing of the pass nor suggest an appropriate place for optimization.
Hi Richard,
The actual benchmark function for PR80155 is almost identical to FMS()
function defined in
pr77445-2.c, and the test-case reproduces the same issue as in the benchmark.

Thanks,
Prathamesh
>
> Richard.
>
> >
> > [1] https://gcc.gnu.org/ml/gcc-patches/2019-07/msg01739.html
> >
> > Wilco

[PATCH] i386: Increase Skylake SImode pseudo register store cost

2019-09-18 Thread H.J. Lu

On Skylake, SImode store cost isn't less than half cost of 128-bit vector
store.  This patch increases Skylake SImode pseudo register store cost to
make it the same as QImode and HImode.

gcc/

PR target/91446
* config/i386/x86-tune-costs.h (skylake_cost): Increase SImode
pseudo register store cost from 3 to 6 to make it the same as
QImode and HImode.

gcc/testsuite/

PR target/91446
* gcc.target/i386/pr91446.c: New test.

OK for trunk?

Thanks.

-- 
H.J.
From 63c534a600ad412ad8224c8f01c58400780affe9 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Wed, 14 Aug 2019 14:01:15 -0700
Subject: [PATCH] i386: Increase Skylake SImode pseudo register store cost

On Skylake, SImode store cost isn't less than half cost of 128-bit vector
store.  This patch increases Skylake SImode pseudo register store cost to
make it the same as QImode and HImode.

gcc/

	PR target/91446
	* config/i386/x86-tune-costs.h (skylake_cost): Increase SImode
	pseudo register store cost from 3 to 6 to make it the same as
	QImode and HImode.

gcc/testsuite/

	PR target/91446
	* gcc.target/i386/pr91446.c: New test.
---
 gcc/config/i386/x86-tune-costs.h|  2 +-
 gcc/testsuite/gcc.target/i386/pr91446.c | 24 
 2 files changed, 25 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr91446.c

diff --git a/gcc/config/i386/x86-tune-costs.h b/gcc/config/i386/x86-tune-costs.h
index 00edece3eb68..42c9c2530c98 100644
--- a/gcc/config/i386/x86-tune-costs.h
+++ b/gcc/config/i386/x86-tune-costs.h
@@ -1638,7 +1638,7 @@ struct processor_costs skylake_cost = {
   {4, 4, 4},/* cost of loading integer registers
 	   in QImode, HImode and SImode.
 	   Relative to reg-reg move (2).  */
-  {6, 6, 3},/* cost of storing integer registers */
+  {6, 6, 6},/* cost of storing integer registers */
   {6, 6, 6, 10, 20},			/* cost of loading SSE register
 	   in 32bit, 64bit, 128bit, 256bit and 512bit */
   {8, 8, 8, 12, 24},			/* cost of storing SSE register
diff --git a/gcc/testsuite/gcc.target/i386/pr91446.c b/gcc/testsuite/gcc.target/i386/pr91446.c
new file mode 100644
index ..f7c4bea616da
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr91446.c
@@ -0,0 +1,24 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -march=skylake -ftree-slp-vectorize -mtune-ctrl=^sse_typeless_stores" } */
+
+typedef struct
+{
+  unsigned long long width, height;
+  long long x, y;
+} info;
+
+extern void bar (info *);
+
+void
+foo (unsigned long long width, unsigned long long height,
+ long long x, long long y)
+{
+  info t;
+  t.width = width;
+  t.height = height;
+  t.x = x;
+  t.y = y;
+  bar (&t);
+}
+
+/* { dg-final { scan-assembler-times "vmovdqa\[^\n\r\]*xmm\[0-9\]" 2 } } */
-- 
2.20.1

[PATCH] i386: Restore Skylake SImode hard register store cost

2019-09-18 Thread H.J. Lu

On Skylake, we should move integer register to SSE register without
going through memory.  This patch restores Skylake SImode hard register
store cost to 6.

gcc/

PR target/90878
* config/i386/x86-tune-costs.h (skylake_cost): Restore SImode
hard register store cost to 6.

gcc/testsuite/

PR target/90878
* gcc.target/i386/pr90878.c: New test.

OK for trunk?

Thanks.

-- 
H.J.
From 827748528f5418fb3d28d4f023bd158e7951d33c Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Wed, 14 Aug 2019 19:42:52 -0700
Subject: [PATCH] i386: Restore Skylake SImode hard register store cost

On Skylake, we should move integer register to SSE register without
going through memory.  This patch restores Skylake SImode hard register
store cost to 6.

gcc/

	PR target/90878
	* config/i386/x86-tune-costs.h (skylake_cost): Restore SImode
	hard register store cost to 6.

gcc/testsuite/

	PR target/90878
	* gcc.target/i386/pr90878.c: New test.
---
 gcc/config/i386/x86-tune-costs.h|  2 +-
 gcc/testsuite/gcc.target/i386/pr90878.c | 25 +
 2 files changed, 26 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90878.c

diff --git a/gcc/config/i386/x86-tune-costs.h b/gcc/config/i386/x86-tune-costs.h
index 00edece3eb68..7a2c7c55b4cc 100644
--- a/gcc/config/i386/x86-tune-costs.h
+++ b/gcc/config/i386/x86-tune-costs.h
@@ -1594,7 +1594,7 @@ struct processor_costs skylake_cost = {
   {4, 4, 4},/* cost of loading integer registers
 	   in QImode, HImode and SImode.
 	   Relative to reg-reg move (2).  */
-  {6, 6, 3},/* cost of storing integer registers */
+  {6, 6, 6},/* cost of storing integer registers */
   2,	/* cost of reg,reg fld/fst */
   {6, 6, 8},/* cost of loading fp registers
 	   in SFmode, DFmode and XFmode */
diff --git a/gcc/testsuite/gcc.target/i386/pr90878.c b/gcc/testsuite/gcc.target/i386/pr90878.c
new file mode 100644
index ..18dd64bdaa7a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr90878.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=skylake" } */
+
+union ieee754_float
+  {
+float f;
+
+struct
+  {
+	unsigned int mantissa:23;
+	unsigned int exponent:8;
+	unsigned int negative:1;
+  } ieee;
+};
+
+double
+foo (float f)
+{
+  union ieee754_float u;
+  u.f = f;
+  u.ieee.negative = 0;
+  return u.f;
+}
+
+/* { dg-final { scan-assembler-not "vcvtss2sd\[^\\n\]*\\\(%.sp\\\)" } } */
-- 
2.20.1

[PATCH] Remove vectorizer reduction operand swapping

2019-09-18 Thread Richard Biener



It shouldn't be neccessary.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.
(SLP part testing separately)

Richard.

2019-09-18  Richard Biener  

* tree-vect-loop.c (vect_is_simple_reduction): Remove operand
swapping.
(vectorize_fold_left_reduction): Remove assert.
(vectorizable_reduction): Also expect COND_EXPR non-reduction
operand in position 2.  Remove assert.

Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c(revision 275872)
+++ gcc/tree-vect-loop.c(working copy)
@@ -3278,56 +3278,8 @@ vect_is_simple_reduction (loop_vec_info
  || !flow_bb_inside_loop_p (loop, gimple_bb (def2_info->stmt))
  || vect_valid_reduction_input_p (def2_info)))
 {
-  if (! nested_in_vect_loop && orig_code != MINUS_EXPR)
-   {
- /* Check if we can swap operands (just for simplicity - so that
-the rest of the code can assume that the reduction variable
-is always the last (second) argument).  */
- if (code == COND_EXPR)
-   {
- /* Swap cond_expr by inverting the condition.  */
- tree cond_expr = gimple_assign_rhs1 (def_stmt);
- enum tree_code invert_code = ERROR_MARK;
- enum tree_code cond_code = TREE_CODE (cond_expr);
-
- if (TREE_CODE_CLASS (cond_code) == tcc_comparison)
-   {
- bool honor_nans = HONOR_NANS (TREE_OPERAND (cond_expr, 0));
- invert_code = invert_tree_comparison (cond_code, honor_nans);
-   }
- if (invert_code != ERROR_MARK)
-   {
- TREE_SET_CODE (cond_expr, invert_code);
- swap_ssa_operands (def_stmt,
-gimple_assign_rhs2_ptr (def_stmt),
-gimple_assign_rhs3_ptr (def_stmt));
-   }
- else
-   {
- if (dump_enabled_p ())
-   report_vect_op (MSG_NOTE, def_stmt,
-   "detected reduction: cannot swap operands "
-   "for cond_expr");
- return NULL;
-   }
-   }
- else
-   swap_ssa_operands (def_stmt, gimple_assign_rhs1_ptr (def_stmt),
-  gimple_assign_rhs2_ptr (def_stmt));
-
- if (dump_enabled_p ())
-   report_vect_op (MSG_NOTE, def_stmt,
-   "detected reduction: need to swap operands: ");
-
- if (CONSTANT_CLASS_P (gimple_assign_rhs1 (def_stmt)))
-   LOOP_VINFO_OPERANDS_SWAPPED (loop_info) = true;
-}
-  else
-{
-  if (dump_enabled_p ())
-report_vect_op (MSG_NOTE, def_stmt, "detected reduction: ");
-}
-
+  if (dump_enabled_p ())
+   report_vect_op (MSG_NOTE, def_stmt, "detected reduction: ");
   return def_stmt_info;
 }
 
@@ -5969,7 +5921,6 @@ vectorize_fold_left_reduction (stmt_vec_
   gcc_assert (!nested_in_vect_loop_p (loop, stmt_info));
   gcc_assert (ncopies == 1);
   gcc_assert (TREE_CODE_LENGTH (code) == binary_op);
-  gcc_assert (reduc_index == (code == MINUS_EXPR ? 0 : 1));
   gcc_assert (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
  == FOLD_LEFT_REDUCTION);
 
@@ -6542,9 +6493,9 @@ vectorizable_reduction (stmt_vec_info st
  reduc_index = i;
}
 
-  if (i == 1 && code == COND_EXPR)
+  if (code == COND_EXPR)
{
- /* Record how value of COND_EXPR is defined.  */
+ /* Record how the non-reduction-def value of COND_EXPR is defined.  */
  if (dt == vect_constant_def)
{
  cond_reduc_dt = dt;
@@ -6622,10 +6573,6 @@ vectorizable_reduction (stmt_vec_info st
  return false;
}
 
-  /* vect_is_simple_reduction ensured that operand 2 is the
-loop-carried operand.  */
-  gcc_assert (reduc_index == 2);
-
   /* Loop peeling modifies initial value of reduction PHI, which
 makes the reduction stmt to be transformed different to the
 original stmt analyzed.  We need to record reduction code for

Re: [patch, testsuite, arm] Fix ICE in gcc.dg/gimplefe-28.c

2019-09-18 Thread Mike Stump

On Sep 13, 2019, at 12:06 PM, Sandra Loosemore  wrote:
> 
> For the default multilib on arm-none-eabi, gcc.dg/gimplefe-28 has been 
> getting an ICE because, while the target-supports infrastructure is probing 
> to see if it can add the command-line options to enable the sqrt insn 
> ("-mfpu=vfp -mfloat-abi=softfp"), it is not actually adding those options 
> when building this testcase.  :-S  The hook to do this is already there; it 
> just needs a case for arm.
> 
> OK to commit?

Ok.

Hum, usually the arm people are so responsive.  I don't think I've seen a 
review of this, so when they don't, I will.

General note, I do prefer the target folk chime in on such patches instead of 
me, as there can be subtle target things that target folks track better than I 
and arm is one of those targets with so many wonder and subtle things.  :-)

Re: [PATCH][ARM] Cleanup multiply patterns

2019-09-18 Thread Wilco Dijkstra

Hi Kyrill,

>>  + (mult:SI (match_operand:SI 3 "s_register_operand" "r")
>>  +  (match_operand:SI 2 "s_register_operand" "r"]
>
> Looks like we'll want to mark operand 2 here with '%' as well?

That doesn't make any difference since both operands are identical.
It only helps if the operands are not the same, eg. rather than
"r,0" "0,r" you can write "%r" "0".

Wilco

Merge from GCC trunk to gccgo branch

2019-09-18 Thread Ian Lance Taylor

I merged trunk revision 275890 to the gccgo branch.

Ian

Re: [PATCH] i386: Increase Skylake SImode pseudo register store cost

2019-09-18 Thread Uros Bizjak

On Wed, Sep 18, 2019 at 8:04 PM H.J. Lu  wrote:
>
> On Skylake, SImode store cost isn't less than half cost of 128-bit vector
> store.  This patch increases Skylake SImode pseudo register store cost to
> make it the same as QImode and HImode.
>
> gcc/
>
> PR target/91446
> * config/i386/x86-tune-costs.h (skylake_cost): Increase SImode
> pseudo register store cost from 3 to 6 to make it the same as
> QImode and HImode.
>
> gcc/testsuite/
>
> PR target/91446
> * gcc.target/i386/pr91446.c: New test.
>
> OK for trunk?

I assume these tunings are backed by some benchmark results. So, OK.

Thanks,
Uros.

Re: [PATCH] i386: Restore Skylake SImode hard register store cost

2019-09-18 Thread Uros Bizjak

On Wed, Sep 18, 2019 at 8:06 PM H.J. Lu  wrote:
>
> On Skylake, we should move integer register to SSE register without
> going through memory.  This patch restores Skylake SImode hard register
> store cost to 6.
>
> gcc/
>
> PR target/90878
> * config/i386/x86-tune-costs.h (skylake_cost): Restore SImode
> hard register store cost to 6.
>
> gcc/testsuite/
>
> PR target/90878
> * gcc.target/i386/pr90878.c: New test.
>
> OK for trunk?

OK, as with your previous patch.

Thanks,
Uros.

Re: [PATCH][ARM] Add logical DImode expanders

2019-09-18 Thread Wilco Dijkstra

Hi Kyrill,

> We should be able to "compress" the above 3 patterns into one using code 
> iterators.

Good point, that makes sense. I've committed this:

ChangeLog:
2019-09-18  Wilco Dijkstra  

PR target/91738
* config/arm/arm.md (di3): Expand explicitly.
(one_cmpldi2): Likewise.
* config/arm/arm.c (const_ok_for_dimode_op): Return true if one
of the constant parts is simple.
* config/arm/iterators.md (LOGICAL): Add new code iterator.
(logical_op): Add new code attribute.
(logical_OP): Likewise.
* config/arm/predicates.md (arm_anddi_operand): Add predicate.
(arm_iordi_operand): Add predicate.
(arm_xordi_operand): Add predicate.

--
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
173e6363682c35aa72a9fa36c14b6324b59e347b..1fc90c62798978ea5abddb11fbf1d7acbc8a8dc3
 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -4273,8 +4273,8 @@ const_ok_for_dimode_op (HOST_WIDE_INT i, enum rtx_code 
code)
 case AND:
 case IOR:
 case XOR:
-  return (const_ok_for_op (hi_val, code) || hi_val == 0x)
-  && (const_ok_for_op (lo_val, code) || lo_val == 0x);
+  return const_ok_for_op (hi_val, code) || hi_val == 0x
+|| const_ok_for_op (lo_val, code) || lo_val == 0x;
 case PLUS:
   return arm_not_operand (hi, SImode) && arm_add_operand (lo, SImode);
 
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 
e23683114087f6cc9ee78376529da97cfe31d3a6..3943c4252b272d30f88f265e90ebc4cb88e3a615
 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -2176,6 +2176,49 @@ (define_expand "divdf3"
   "")
 
 
+; Expand logical operations.  The mid-end expander does not split off memory
+; operands or complex immediates, which leads to fewer LDRD/STRD instructions.
+; So an explicit expander is needed to generate better code.
+
+(define_expand "di3"
+  [(set (match_operand:DI0 "s_register_operand")
+   (LOGICAL:DI (match_operand:DI 1 "s_register_operand")
+   (match_operand:DI 2 "arm_di_operand")))]
+  "TARGET_32BIT"
+  {
+  rtx low  = simplify_gen_binary (, SImode,
+ gen_lowpart (SImode, operands[1]),
+ gen_lowpart (SImode, operands[2]));
+  rtx high = simplify_gen_binary (, SImode,
+ gen_highpart (SImode, operands[1]),
+ gen_highpart_mode (SImode, DImode,
+operands[2]));
+
+  emit_insn (gen_rtx_SET (gen_lowpart (SImode, operands[0]), low));
+  emit_insn (gen_rtx_SET (gen_highpart (SImode, operands[0]), high));
+  DONE;
+  }
+)
+
+(define_expand "one_cmpldi2"
+  [(set (match_operand:DI 0 "s_register_operand")
+   (not:DI (match_operand:DI 1 "s_register_operand")))]
+  "TARGET_32BIT"
+  {
+  rtx low  = simplify_gen_unary (NOT, SImode,
+gen_lowpart (SImode, operands[1]),
+SImode);
+  rtx high = simplify_gen_unary (NOT, SImode,
+gen_highpart_mode (SImode, DImode,
+   operands[1]),
+SImode);
+
+  emit_insn (gen_rtx_SET (gen_lowpart (SImode, operands[0]), low));
+  emit_insn (gen_rtx_SET (gen_highpart (SImode, operands[0]), high));
+  DONE;
+  }
+)
+
 ;; Split DImode and, ior, xor operations.  Simply perform the logical
 ;; operation on the upper and lower halves of the registers.
 ;; This is needed for atomic operations in arm_split_atomic_op.
diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 
fa6f0c0529d5364b1e1df705cb1029868578e38c..20fd96cb0445fcdf821c7c72f2dd30bae8590d0c
 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -239,6 +239,8 @@ (define_code_iterator COMPARISONS [eq gt ge le lt])
 ;; A list of ...
 (define_code_iterator IOR_XOR [ior xor])
 
+(define_code_iterator LOGICAL [and ior xor])
+
 ;; Operations on two halves of a quadword vector.
 (define_code_iterator VQH_OPS [plus smin smax umin umax])
 
@@ -285,6 +287,9 @@ (define_code_attr cmp_type [(eq "i") (gt "s") (ge "s") (lt 
"s") (le "s")])
 
 (define_code_attr vfml_op [(plus "a") (minus "s")])
 
+(define_code_attr logical_op [(ior "ior") (xor "xor") (and "and")])
+(define_code_attr logical_OP [(ior "IOR") (xor "XOR") (and "AND")])
+
 ;;
 ;; Int iterators
 ;;
diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
index 
983faac8a72ef75e80cc34031c07c6435902c36f..8b36e7ee462235ad26e132f1ccf98d28c2487d67
 100644
--- a/gcc/config/arm/predicates.md
+++ b/gcc/config/arm/predicates.md
@@ -206

Re: [PATCH] RISC-V: Fix bad insn splits with paradoxical subregs.

2019-09-18 Thread Jim Wilson

On Wed, Sep 18, 2019 at 3:27 AM Kito Cheng  wrote:
> On Wed, Sep 18, 2019 at 6:25 PM Richard Biener  wrote:
> > Since it is target specific and for non-primary/secondary targets
> > it's the RISC-V maintainers call whether to allow backporting this.
> > Generally wrong-code issues can be backported even if they are not
> > regressions if the chance they introduce other issues is low.

Speaking as a RISC-V maintainer, I'd like to see it backported.  That
is why I left the bug report open.

Jim

PowerPC future machine patches, version 4

2019-09-18 Thread Michael Meissner

This is a reworking of the patches that I posted as V3 at the end of August.

Unlike the last set of patches, I do not use the address mask bits in reg_addr,
but instead, I have a separate function that takes an address and decodes it
into the various different flavors (single register address, D-form 16-bit
address, X-form indexed address, numeric 34-bit offset, local pc-relative
address, etc.).  The caller then decides whether the address matches what they
are looking for.

I have two enumerations that I added to this series:

1)  insn_form: This is the address format (D, DS, DQ, X, etc.);

2)  non_prefixed: This is a limited enum that just describes the format of
the non-prefixed instruction to decide if an address needs to be
prefixed or not.

Originally, I was trying to re-use the same insn_form enumeration for both the
output and the input to say what the traditional instruction uses, but I
ultimately separated them to make it clearer.

As I said to you at the Caulron, when I replaced some of the predicates, I put
them in a different location in predicates.md, so that it would be clear that
the old version was completely eliminated, and replaced with a new version.

I also removed the two boolean arguments for the pc-relative matching, and
instead the address to insn_form just returns different values (34-bit numeric
offset, 34-bit bit pc-relative reference to a local symbol, 34-bit pc-relative
reference to an external symbol, etc.).

I did collapse the fix for vector extracts into the patches that enable general
prefixed addressing, so we don't have a possibility of bad code being
generated.

Right now, I'm not going to add the PCREL_OPT patches to this set, but I will
do it later, if these patches get applied.  I will rework it to meet the
comments you raised.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797

[PATCH] V4, patch #1: Rework prefixed/pc-relative lookup

2019-09-18 Thread Michael Meissner

This patch reworks the prefixed and pc-relative memory matching functions.

As I said in the intro message, I do not re-use the address mask bits in
reg_addr, but instead, I have a separate function that takes an address and
decodes it into the various different flavors (single register address, D-form
16-bit address, X-form indexed address, numeric 34-bit offset, local
pc-relative address, etc.).  The caller then decides whether the address
matches what they are looking for.

I have two enumerations that I added to this series:

1)  insn_form: This is the address format (D, DS, DQ, X, etc.);

2)  non_prefixed: This is a limited enum that just describes the format of
the non-prefixed instruction to decide if an address needs to be
prefixed or not.

Originally, I was trying to re-use the same insn_form enumeration for both the
output and the input to say what the traditional instruction uses, but I
ultimately separated them to make it clearer.

This is an infrastructure patch.  It needs the second patch to enable basic
pc-relative support.

I have done a bootstrap build with all of the patches applied, and there were
no regressions in the test suite.  After posting these patches, I will start a
job to build each set of patches in turn just to make sure there are no extra
warnings.

Can I commit this patch to the trunk?

2019-09-18  Michael Meissner  

* config/rs6000/predicates.md (pcrel_address): Delete predicate.
(pcrel_local_address): Replace pcrel_address predicate, use the
new function address_to_insn_form.
(pcrel_external_address): Replace with new implementation using
address_to_insn_form..
(prefixed_mem_operand): Delete predicate which is now unused.
(pcrel_external_mem_operand): Delete predicate which is now
unused.
* config/rs6000/rs6000-protos.h (enum insn_form): New
enumeration.
(enum non_prefixed): New enumeration.
(address_to_insn_form): New declaration.
* config/rs6000/rs6000.c (print_operand_address): Check for either
pc-relative local symbols or pc-relative external symbols.
(mode_supports_prefixed_address_p): Delete, no longer used.
(rs6000_prefixed_address_mode_p): Delete, no longer used.
(address_to_insn_form): New function to decode an address format.

Index: gcc/config/rs6000/predicates.md
===
--- gcc/config/rs6000/predicates.md (revision 275903)
+++ gcc/config/rs6000/predicates.md (working copy)
@@ -1625,82 +1625,7 @@ (define_predicate "small_toc_ref"
   return GET_CODE (op) == UNSPEC && XINT (op, 1) == UNSPEC_TOCREL;
 })
 
-;; Return true if the operand is a pc-relative address.
-(define_predicate "pcrel_address"
-  (match_code "label_ref,symbol_ref,const")
-{
-  if (!rs6000_pcrel_p (cfun))
-return false;
-
-  if (GET_CODE (op) == CONST)
-op = XEXP (op, 0);
-
-  /* Validate offset.  */
-  if (GET_CODE (op) == PLUS)
-{
-  rtx op0 = XEXP (op, 0);
-  rtx op1 = XEXP (op, 1);
-
-  if (!CONST_INT_P (op1) || !SIGNED_34BIT_OFFSET_P (INTVAL (op1)))
-   return false;
-
-  op = op0;
-}
-
-  if (LABEL_REF_P (op))
-return true;
-
-  return (SYMBOL_REF_P (op) && SYMBOL_REF_LOCAL_P (op));
-})
-
-;; Return true if the operand is an external symbol whose address can be loaded
-;; into a register using:
-;; PLD reg,label@pcrel@got
-;;
-;; The linker will either optimize this to either a PADDI if the label is
-;; defined locally in another module or a PLD of the address if the label is
-;; defined in another module.
-
-(define_predicate "pcrel_external_address"
-  (match_code "symbol_ref,const")
-{
-  if (!rs6000_pcrel_p (cfun))
-return false;
-
-  if (GET_CODE (op) == CONST)
-op = XEXP (op, 0);
-
-  /* Validate offset.  */
-  if (GET_CODE (op) == PLUS)
-{
-  rtx op0 = XEXP (op, 0);
-  rtx op1 = XEXP (op, 1);
-
-  if (!CONST_INT_P (op1) || !SIGNED_34BIT_OFFSET_P (INTVAL (op1)))
-   return false;
-
-  op = op0;
-}
-
-  return (SYMBOL_REF_P (op) && !SYMBOL_REF_LOCAL_P (op));
-})
-
-;; Return 1 if op is a prefixed memory operand.
-(define_predicate "prefixed_mem_operand"
-  (match_code "mem")
-{
-  return rs6000_prefixed_address_mode_p (XEXP (op, 0), GET_MODE (op));
-})
-
-;; Return 1 if op is a memory operand to an external variable when we
-;; support pc-relative addressing and the PCREL_OPT relocation to
-;; optimize references to it.
-(define_predicate "pcrel_external_mem_operand"
-  (match_code "mem")
-{
-  return pcrel_external_address (XEXP (op, 0), Pmode);
-})
-
+
 ;; Match the first insn (addis) in fusing the combination of addis and loads to
 ;; GPR registers on power8.
 (define_predicate "fusion_gpr_addis"
@@ -1857,3 +1782,31 @@ (define_predicate "fusion_addis_mem_comb
 
   return 0;
 })
+
+
+;; Return true if the operand is a pc-relative address to a local symbol or a
+;; label that ca

[PATCH], V4, patch #2: Add prefixed insn attribute

2019-09-18 Thread Michael Meissner

This patch adds the "prefixed" insn attribute that says whether or not the insn
generates a prefixed instruction or not.

The attributes "prefixed_length" and "non_prefixed_length" give then length in
bytes (12 and 4 by default) of the insn if it is prefixed or not.

The "length" attribute is set based on the "prefixed" attribute.

I use the target hooks ASM_OUTPUT_OPCODE and FINAL_PRESCAN_INSN to decide
whether to emit a leading "p" before the insn.

There are functions (prefixed_load_p, prefixed_store_p, and prefixed_paddi_p)
that given an insn type, say whether that particular insn type is prefixed or
not.

In addition, this patch adds the support in rs6000_emit_move to load up
pc-relative addresses, both local addresses defined in the same compilation
unit, and external addresses that might be need to be loaded from a .GOT
address table.

I have done a bootstrap build with all of the patches applied, and there were
no regressions in the test suite.  After posting these patches, I will start a
job to build each set of patches in turn just to make sure there are no extra
warnings.

Can I commit this patch to the trunk?

2019-09-18  Michael Meissner  

* config/rs6000/rs6000-protos.h (prefixed_load_p): New
declaration.
(prefixed_store_p): New declaration.
(prefixed_paddi_p): New declaration.
(rs6000_asm_output_opcode): New declaration.
(rs6000_final_prescan_insn): Move declaration and update calling
signature.
(address_is_prefixed): New helper inline function.
* config/rs6000/rs6000.c (rs6000_emit_move): Support loading
pc-relative addresses.
(reg_to_non_prefixed): New function to identify what the
non-prefixed memory instruction format is for a register.
(prefixed_load_p): New function to identify prefixed loads.
(prefixed_store_p): New function to identify prefixed stores.
(prefixed_paddi_p): New function to identify prefixed load
immediates.
(next_insn_prefixed_p): New static state variable.
(rs6000_final_prescan_insn): New function to determine if an insn
uses a prefixed instruction.
(rs6000_asm_output_opcode): New function to emit 'p' in front of a
prefixed instruction.
* config/rs6000/rs6000.h (FINAL_PRESCAN_INSN): New target hook.
(ASM_OUTPUT_OPCODE): New target hook.
* config/rs6000/rs6000.md (prefixed): New insn attribute for
prefixed instructions.
(prefixed_length): New insn attribute for the size of prefixed
instructions.
(non_prefixed_length): New insn attribute for the size of
non-prefixed instructions.
(pcrel_local_addr): New insn to load up a local pc-relative
address.
(pcrel_extern_addr): New insn to load up an external pc-relative
address.

Index: gcc/config/rs6000/rs6000-protos.h
===
--- gcc/config/rs6000/rs6000-protos.h   (revision 275908)
+++ gcc/config/rs6000/rs6000-protos.h   (working copy)
@@ -189,6 +189,30 @@ enum non_prefixed {
 
 extern enum insn_form address_to_insn_form (rtx, machine_mode,
enum non_prefixed);
+extern bool prefixed_load_p (rtx_insn *);
+extern bool prefixed_store_p (rtx_insn *);
+extern bool prefixed_paddi_p (rtx_insn *);
+extern void rs6000_asm_output_opcode (FILE *);
+extern void rs6000_final_prescan_insn (rtx_insn *, rtx [], int);
+
+/* Return true if the address is a prefixed instruction that can be directly
+   used in a memory instruction (i.e. using numeric offset or a pc-relative
+   reference to a local symbol).
+
+   References to external pc-relative symbols aren't allowed, because GCC has
+   to load the address into a register and then issue a separate load or
+   store.  */
+
+static inline bool
+address_is_prefixed (rtx addr,
+machine_mode mode,
+enum non_prefixed non_prefixed_insn)
+{
+  enum insn_form iform = address_to_insn_form (addr, mode,
+  non_prefixed_insn);
+  return (iform == INSN_FORM_PREFIXED_NUMERIC
+ || iform == INSN_FORM_PCREL_LOCAL);
+}
 #endif /* RTX_CODE */
 
 #ifdef TREE_CODE
@@ -268,8 +292,6 @@ extern void rs6000_d_target_versions (vo
 const char * rs6000_xcoff_strip_dollar (const char *);
 #endif
 
-void rs6000_final_prescan_insn (rtx_insn *, rtx *operand, int num_operands);
-
 extern unsigned char rs6000_class_max_nregs[][LIM_REG_CLASSES];
 extern unsigned char rs6000_hard_regno_nregs[][FIRST_PSEUDO_REGISTER];
 
Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 275908)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -9639,6 +9639,22 @@ rs6000_emit_move (rtx dest, rtx source,
  return;
}
 
+  /* Use the default pattern for loading up pc-relative addresses.

[PATCH], V4, patch #3: Fix up mov_64bit_dm

2019-09-18 Thread Michael Meissner

In doing the patches, I noticed that mov_64bit_dm had two alternatives
combined together.  This patch fixes the problem, before the next patch that
will need to modify mov_64bit_dm for prefixed addressing.

I have done a bootstrap build with all of the patches applied, and there were
no regressions in the test suite.  After posting these patches, I will start a
job to build each set of patches in turn just to make sure there are no extra
warnings.

Can I commit this patch to the trunk?

2019-09-18  Michael Meissner  

* config/rs6000/rs6000.md (mov_64bit_dm): Split the
alternatives for loading 0.0 to a GPR and loading a 128-bit
floating point type to a GPR.

Index: gcc/config/rs6000/rs6000.md
===
--- gcc/config/rs6000/rs6000.md (revision 275816)
+++ gcc/config/rs6000/rs6000.md (working copy)
@@ -7758,9 +7758,18 @@ (define_expand "mov"
 ;; not swapped like they are for TImode or TFmode.  Subregs therefore are
 ;; problematical.  Don't allow direct move for this case.
 
+;; FPR loadFPR store   FPR moveFPR zeroGPR load
+;; GPR zeroGPR store   GPR moveMFVSRD  MTVSRD
+
 (define_insn_and_split "*mov_64bit_dm"
-  [(set (match_operand:FMOVE128_FPR 0 "nonimmediate_operand" 
"=m,d,d,d,Y,r,r,r,d")
-   (match_operand:FMOVE128_FPR 1 "input_operand" 
"d,m,d,,r,Y,r,d,r"))]
+  [(set (match_operand:FMOVE128_FPR 0 "nonimmediate_operand"
+   "=m,d,  d,  d,  Y,
+r, r,  r,  r,  d")
+
+   (match_operand:FMOVE128_FPR 1 "input_operand"
+   "d, m,  d,  ,  r,
+, Y,  r,  d,  r"))]
+
   "TARGET_HARD_FLOAT && TARGET_POWERPC64 && FLOAT128_2REG_P (mode)
&& (mode != TDmode || WORDS_BIG_ENDIAN)
&& (gpc_reg_operand (operands[0], mode)
@@ -7769,8 +7778,8 @@ (define_insn_and_split "*mov_64bit
   "&& reload_completed"
   [(pc)]
 { rs6000_split_multireg_move (operands[0], operands[1]); DONE; }
-  [(set_attr "length" "8,8,8,8,12,12,8,8,8")
-   (set_attr "isa" "*,*,*,*,*,*,*,p8v,p8v")])
+  [(set_attr "length" "8")
+   (set_attr "isa" "*,*,*,*,*,*,*,*,p8v,p8v")])
 
 (define_insn_and_split "*movtd_64bit_nodm"
   [(set (match_operand:TD 0 "nonimmediate_operand" "=m,d,d,Y,r,r")

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797

libgo patch committed: Fixes for arm64 GoLLVM build

2019-09-18 Thread Ian Lance Taylor

This libgo patch by Xiangdong JI is for the GoLLVM build on arm64
GNU/Linux.  It incorporates a type definition of 'uint128' to
'runtime' and 'syscall' packages.  This fixes
https://golang.org/issue/33711.  Bootstrapped and ran Go testsuite on
x86_64-pc-linux-gnu.  Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 275814)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-09ca3c1ea8a52b5d3d6c4331c59d44e0b6bfab57
+d81ff42c367cce2110ccf5ddbadb6cc9bdf94e28
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/mkrsysinfo.sh
===
--- libgo/mkrsysinfo.sh (revision 275698)
+++ libgo/mkrsysinfo.sh (working copy)
@@ -209,3 +209,9 @@ grep '^type _kevent ' gen-sysinfo.go | \
 sed -e s'/_kevent/keventt/' \
   -e 's/ udata [^;}]*/ udata *byte/' \
 >> ${OUT}
+
+# Type 'uint128' is needed in a couple of type definitions on arm64,such
+# as _user_fpsimd_struct, _elf_fpregset_t, etc.
+if ! grep '^type uint128' ${OUT} > /dev/null 2>&1; then
+echo "type uint128 [16]byte" >> ${OUT}
+fi
Index: libgo/mksysinfo.sh
===
--- libgo/mksysinfo.sh  (revision 275698)
+++ libgo/mksysinfo.sh  (working copy)
@@ -1393,4 +1393,10 @@ grep '^type _mactun_info_t ' gen-sysinfo
 sed -e 's/_in6_addr_t/[16]byte/g' \
 >> ${OUT}
 
+# Type 'uint128' is needed in a couple of type definitions on arm64,such
+# as _user_fpsimd_struct, _elf_fpregset_t, etc.
+if ! grep '^type uint128' ${OUT} > /dev/null 2>&1; then
+echo "type uint128 [16]byte" >> ${OUT}
+fi
+
 exit $?
Index: libgo/sysinfo.c
===
--- libgo/sysinfo.c (revision 275698)
+++ libgo/sysinfo.c (working copy)
@@ -424,7 +424,11 @@ EREF(MNT_FORCE);
 
 #if defined(HAVE_SYS_PTRACE_H)
 // From 
+#if defined (__aarch64__)
+SREF(user_pt_regs);
+#else
 SREF(pt_regs);
+#endif
 EREF(PTRACE_PEEKTEXT);
 #endif

[PATCH], V4, patch #4: Enable prefixed/pc-rel addressing

2019-09-18 Thread Michael Meissner

This patch is the patch that goes through and enables prefixed and pc-relative
addressing on all modes, except for SDmode.  SDmode is special in that for its
main use, you need to only use X-form addressing.  While you can do D-form
addressing to load/store SDmode in GPR registers, I found you really don't want
to do that, as the register allocator will load/store the value and do a direct
move.

I also discovered that if you did a vector extract of a variable offset where
the vector address is pc-relative, the code was incorrect because it re-used
the base register temporary.  This code prevents the vector extract from
combining the extract operation and the memory.

As you suggested in the last series of patches, I have made stack_protect_setdi
and stack_protect_testdi not support prefixed insns in the actual insn, and the
expander converts the memory address to a non-prefixed form.

I have done a bootstrap build with all of the patches applied, and there were
no regressions in the test suite.  After posting these patches, I will start a
job to build each set of patches in turn just to make sure there are no extra
warnings.

Can I commit this patch to the trunk?

2019-09-18  Michael Meissner  

* config/rs6000/constraints.md (em constraint): New constraint for
non pc-relative memory.
* config/rs6000/predicates.md (lwa_operand): Allow odd offsets if
we have prefixed addressing.
(non_prefixed_memory): New predicate.
(non_pcrel_memory): New predicate.
(reg_or_non_pcrel_memory): New predicate.
* config/rs6000/rs6000-protos.h (make_memory_non_prefixed): New
declaration.
* config/rs6000/rs6000.c (rs6000_adjust_vec_address): Optimize
pc-relative addresses with constant offsets.  Signal an error if
we have a pc-relative address and a variable offset.
(rs6000_split_vec_extract_var): Signal an error if we have a
pc-relative address and a variable offset.
(quad_address_p): Add support for prefixed addresses.
(mem_operand_gpr): Add support for prefixed addresses.
(mem_operand_ds_form): Add support for prefixed addresses.
(rs6000_legitimate_offset_address_p): Add support for prefixed
addresses.
(rs6000_legitimate_address_p): Add support for prefixed
addresses.
(rs6000_mode_dependent_address): Add support for prefixed
addresses.
(rs6000_num_insns): New helper function.
(rs6000_insn_cost): Treat prefixed instructions as having the same
cost as non prefixed instructions, even though the prefixed
instructions are larger.
(make_memory_non_prefixed): New function to make a non-prefixed
memory operand.
* config/rs6000/rs6000.md (mov_64bit_dm): Add support for
prefixed addresses.
(movtd_64bit_nodm): Add support for prefixed addresses.
(stack_protect_setdi): Convert prefixed addresses to non-prefixed
addresses.  Allow for indexed addressing as well as offsettable.
(stack_protect_testdi): Convert prefixed addresses to non-prefixed
addresses.  Allow for indexed addressing as well as offsettable.
* config/rs6000/vsx.md (vsx_mov_64bit): Add support for
prefixed addresses.
(vsx_extract__var, VSX_D iterator): Do not allow a vector in
memory with a prefixed address to combine with variable offsets.
(vsx_extract_v4sf_var): Do not allow a vector in memory with a
prefixed address to combine with variable offsets.
(vsx_extract__var, VSX_EXTRACT_I iterator): Do not allow a
vector in memory with a prefixed address to combine with variable
offsets.
(vsx_extract__mode_var): Do not allow a vector in
memory with a prefixed address to combine with variable offsets.
* doc/md.texi (PowerPC constraints): Document 'em' constraint.

Index: gcc/config/rs6000/constraints.md
===
--- gcc/config/rs6000/constraints.md(revision 275894)
+++ gcc/config/rs6000/constraints.md(working copy)
@@ -210,6 +210,11 @@ several times, or that might not access
   (and (match_code "mem")
(match_test "GET_RTX_CLASS (GET_CODE (XEXP (op, 0))) != RTX_AUTOINC")))
 
+(define_memory_constraint "em"
+  "A memory operand that does not contain a pc-relative reference."
+  (and (match_code "mem")
+   (match_test "non_pcrel_memory (op, mode)")))
+
 (define_memory_constraint "Q"
   "Memory operand that is an offset from a register (it is usually better
 to use @samp{m} or @samp{es} in @code{asm} statements)"
Index: gcc/config/rs6000/predicates.md
===
--- gcc/config/rs6000/predicates.md (revision 275908)
+++ gcc/config/rs6000/predicates.md (working copy)
@@ -932,6 +932,14 @@ (define_predicate "lwa_operand"
 return false;
 
   addr = XEXP (inner, 0);
+

[PATCH] V4, patch #5: Use PLI (PADDI) to load up 34-bit DImode

2019-09-18 Thread Michael Meissner

This is a simple patch to enable loading up 34-bit DImode integer constants via
the PLI (PADDI) instruction.  At your suggestion, I moved it from the previous
patch.

Due to the ordering of the alternatives, it does force all of the alternatives
to move down by one.

I have done a bootstrap build with all of the patches applied, and there were
no regressions in the test suite.  After posting these patches, I will start a
job to build each set of patches in turn just to make sure there are no extra
warnings.

Can I commit this patch to the trunk?

2019-09-18  Michael Meissner  

* config/rs6000/rs6000.c (num_insns_constant_gpr): Add support for
PADDI to load up and/or add 34-bit integer constants.
(rs6000_rtx_costs): Treat constants loaded up with PADDI with the
same cost as normal 16-bit constants.
* config/rs6000/rs6000.md (movdi_internal64): Add support to load
up 34-bit integer constants with PADDI.
(movdi integer constant splitter): Add comment about PADDI.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 275911)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -5522,7 +5522,7 @@ static int
 num_insns_constant_gpr (HOST_WIDE_INT value)
 {
   /* signed constant loadable with addi */
-  if (((unsigned HOST_WIDE_INT) value + 0x8000) < 0x1)
+  if (SIGNED_16BIT_OFFSET_P (value))
 return 1;
 
   /* constant loadable with addis */
@@ -5530,6 +5530,10 @@ num_insns_constant_gpr (HOST_WIDE_INT va
   && (value >> 31 == -1 || value >> 31 == 0))
 return 1;
 
+  /* PADDI can support up to 34 bit signed integers.  */
+  else if (TARGET_PREFIXED_ADDR && SIGNED_34BIT_OFFSET_P (value))
+return 1;
+
   else if (TARGET_POWERPC64)
 {
   HOST_WIDE_INT low  = ((value & 0x) ^ 0x8000) - 0x8000;
@@ -20663,7 +20667,8 @@ rs6000_rtx_costs (rtx x, machine_mode mo
|| outer_code == PLUS
|| outer_code == MINUS)
   && (satisfies_constraint_I (x)
-  || satisfies_constraint_L (x)))
+  || satisfies_constraint_L (x)
+  || satisfies_constraint_eI (x)))
  || (outer_code == AND
  && (satisfies_constraint_K (x)
  || (mode == SImode
Index: gcc/config/rs6000/rs6000.md
===
--- gcc/config/rs6000/rs6000.md (revision 275911)
+++ gcc/config/rs6000/rs6000.md (working copy)
@@ -8806,24 +8806,24 @@ (define_split
   [(pc)]
 { rs6000_split_multireg_move (operands[0], operands[1]); DONE; })
 
-;;  GPR store  GPR load   GPR move   GPR li GPR lis GPR #
-;;  FPR store  FPR load   FPR move   AVX store  AVX store   AVX 
load
-;;  AVX load   VSX move   P9 0   P9 -1  AVX 0/-1VSX 0
-;;  VSX -1 P9 const   AVX const  From SPR   To SPR  
SPR<->SPR
-;;  VSX->GPR   GPR->VSX
+;;  GPR store  GPR load   GPR move   GPR li GPR lis GPR pli
+;;  GPR #  FPR store  FPR load   FPR move   AVX store   AVX 
store
+;;  AVX load   AVX load   VSX move   P9 0   P9 -1   AVX 
0/-1
+;;  VSX 0  VSX -1 P9 const   AVX const  From SPRTo SPR
+;;  SPR<->SPR  VSX->GPR   GPR->VSX
 (define_insn "*movdi_internal64"
   [(set (match_operand:DI 0 "nonimmediate_operand"
"=YZ,   r, r, r, r,  r,
-m, ^d,^d,wY,Z,  $v,
-$v,^wa,   wa,wa,v,  wa,
-wa,v, v, r, *h, *h,
-?r,?wa")
+r, m, ^d,^d,wY, Z,
+$v,$v,^wa,   wa,wa, v,
+wa,wa,v, v, r,  *h,
+*h,?r,?wa")
(match_operand:DI 1 "input_operand"
-   "r, YZ,r, I, L,  nF,
-^d,m, ^d,^v,$v, wY,
-Z, ^wa,   Oj,wM,OjwM,   Oj,
-wM,wS,wB,*h,r,  0,
-wa,r"))]
+   "r, YZ,r, I, L,  eI,
+nF,^d,m, ^d,^v, $v,
+wY,Z, ^wa,   Oj,wM, OjwM,
+Oj,wM,wS,wB,*h, r,
+0, wa,r"))]
   "TARGET_POWERPC64
&& (gpc_reg_operand (operands[0], DImode)
|| gpc_reg_operand (operands[1], DImode))"
@@ -8833,6 +8833,7 @@ (define_insn "*movdi_internal

[PATCH] V4, patch #6: Use PLI (PADDI) to load up 32-bit SImode constants

2019-09-18 Thread Michael Meissner

This patch is similar to the previous patch, except it loads up 32-bit SImode
constants instead of DImode constants.

I have done a bootstrap build with all of the patches applied, and there were
no regressions in the test suite.  After posting these patches, I will start a
job to build each set of patches in turn just to make sure there are no extra
warnings.

Can I commit this patch to the trunk?

2019-09-18  Michael Meissner  

* config/rs6000/rs6000.md (movsi_internal1): Add support to load
up 32-bit SImode integer constants with PADDI.
(movsi integer constant splitter): Do not split constant if PADDI
can load it up directly.

Index: gcc/config/rs6000/rs6000.md
===
--- gcc/config/rs6000/rs6000.md (revision 275912)
+++ gcc/config/rs6000/rs6000.md (working copy)
@@ -6908,22 +6908,22 @@ (define_insn "movsi_low"
 
 ;; MR   LA   LWZ  LFIWZX   LXSIWZX
 ;; STW  STFIWX   STXSIWX  LI   LIS
-;; #XXLORXXSPLTIB 0   XXSPLTIB -1  VSPLTISW
-;; XXLXOR 0 XXLORC -1P9 const MTVSRWZ  MFVSRWZ
-;; MF%1 MT%0 NOP
+;; PLI  #XXLORXXSPLTIB 0   XXSPLTIB -1
+;; VSPLTISW XXLXOR 0 XXLORC -1P9 const MTVSRWZ
+;; MFVSRWZ  MF%1 MT%0 NOP
 (define_insn "*movsi_internal1"
   [(set (match_operand:SI 0 "nonimmediate_operand"
"=r, r,   r,   d,   v,
 m,  Z,   Z,   r,   r,
-r,  wa,  wa,  wa,  v,
-wa, v,   v,   wa,  r,
-r,  *h,  *h")
+r,  r,   wa,  wa,  wa,
+v,  wa,  v,   v,   wa,
+r,  r,   *h,  *h")
(match_operand:SI 1 "input_operand"
"r,  U,   m,   Z,   Z,
 r,  d,   v,   I,   L,
-n,  wa,  O,   wM,  wB,
-O,  wM,  wS,  r,   wa,
-*h, r,   0"))]
+eI, n,   wa,  O,   wM,
+wB, O,   wM,  wS,  r,
+wa, *h,  r,   0"))]
   "gpc_reg_operand (operands[0], SImode)
|| gpc_reg_operand (operands[1], SImode)"
   "@
@@ -6937,6 +6937,7 @@ (define_insn "*movsi_internal1"
stxsiwx %x1,%y0
li %0,%1
lis %0,%v1
+   li %0,%1
#
xxlor %x0,%x1,%x1
xxspltib %x0,0
@@ -6953,21 +6954,21 @@ (define_insn "*movsi_internal1"
   [(set_attr "type"
"*,  *,   load,fpload,  fpload,
 store,  fpstore, fpstore, *,   *,
-*,  veclogical,  vecsimple,   vecsimple,   vecsimple,
-veclogical, veclogical,  vecsimple,   mffgpr,  mftgpr,
-*,  *,   *")
+*,  *,   veclogical,  vecsimple,   vecsimple,
+vecsimple,  veclogical,  veclogical,  vecsimple,   mffgpr,
+mftgpr, *,   *,   *")
(set_attr "length"
"*,  *,   *,   *,   *,
 *,  *,   *,   *,   *,
-8,  *,   *,   *,   *,
-*,  *,   8,   *,   *,
-*,  *,   *")
+*,  8,   *,   *,   *,
+*,  *,   *,   8,   *,
+*,  *,   *,   *")
(set_attr "isa"
"*,  *,   *,   p8v, p8v,
 *,  p8v, p8v, *,   *,
-*,  p8v, p9v, p9v, p8v,
-p9v,p8v, p9v, p8v, p8v,
-*,  *,   *")])
+fut,*,   p8v, p9v, p9v,
+p8v,p9v, p8v, p9v, p8v,
+p8v,*,   *,   *")])
 
 ;; Like movsi, but adjust a SF value to be used in a SI context, i.e.
 ;; (set (reg:SI ...) (subreg:SI (reg:SF ...) 0))
@@ -7112,14 +7113,15 @@ (define_insn "*movsi_from_df"
   "xscvdpsp %x0,%x1"
   [(set_attr "type" "fp")])
 
-;; Split a load of a large constant into the appropriate two-insn
-;

[PATCH] V4, patch #7: Use PADDI to add 34-bit constants

2019-09-18 Thread Michael Meissner

This patch now allows GCC to generate PADDI to add 34-bit constants.

I have done a bootstrap build with all of the patches applied, and there were
no regressions in the test suite.  After posting these patches, I will start a
job to build each set of patches in turn just to make sure there are no extra
warnings.

Can I commit this patch to the trunk?

2019-09-18  Michael Meissner  

* config/rs6000/predicates.md (add_operand): Add support for
PADDI.
* config/rs6000/rs6000.md (add3): Add support for PADDI.

Index: gcc/config/rs6000/predicates.md
===
--- gcc/config/rs6000/predicates.md (revision 275911)
+++ gcc/config/rs6000/predicates.md (working copy)
@@ -839,7 +839,8 @@ (define_special_predicate "indexed_addre
 (define_predicate "add_operand"
   (if_then_else (match_code "const_int")
 (match_test "satisfies_constraint_I (op)
-|| satisfies_constraint_L (op)")
+|| satisfies_constraint_L (op)
+|| satisfies_constraint_eI (op)")
 (match_operand 0 "gpc_reg_operand")))
 
 ;; Return 1 if the operand is either a non-special register, or 0, or -1.
Index: gcc/config/rs6000/rs6000.md
===
--- gcc/config/rs6000/rs6000.md (revision 275913)
+++ gcc/config/rs6000/rs6000.md (working copy)
@@ -1760,15 +1760,17 @@ (define_expand "add3"
 })
 
 (define_insn "*add3"
-  [(set (match_operand:GPR 0 "gpc_reg_operand" "=r,r,r")
-   (plus:GPR (match_operand:GPR 1 "gpc_reg_operand" "%r,b,b")
- (match_operand:GPR 2 "add_operand" "r,I,L")))]
+  [(set (match_operand:GPR 0 "gpc_reg_operand" "=r,r,r,r")
+   (plus:GPR (match_operand:GPR 1 "gpc_reg_operand" "%r,b,b,b")
+ (match_operand:GPR 2 "add_operand" "r,I,L,eI")))]
   ""
   "@
add %0,%1,%2
addi %0,%1,%2
-   addis %0,%1,%v2"
-  [(set_attr "type" "add")])
+   addis %0,%1,%v2
+   addi %0,%1,%2"
+  [(set_attr "type" "add")
+   (set_attr "isa" "*,*,*,fut")])
 
 (define_insn "*addsi3_high"
   [(set (match_operand:SI 0 "gpc_reg_operand" "=b")

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797

[PATCH] V4, patch #8: Enable -mpcrel on Linux 64-bit, but not on other targets

2019-09-18 Thread Michael Meissner

This is the last patch in the compiler series for now.  On Linux 64-bit systems
it will enable -mpcrel (and -mprefixed-addr) by default.  On other systems, it
will not enable these switches until the tm.h for the OS enables it.

I have the 3 patches for the test suite that will be following this if things
are settling down.  At the moment, the tests have not been modified, but I will
look at your comments to see if I need to modify anything.

In addition, I will be re-vamping PCREL_OPT to take into account your comments.
That will come at a later date.

I have done a bootstrap build with all of the patches applied, and there were
no regressions in the test suite (this includes running all of the tests not
yet submitted).  After posting these patches, I will start a job to build each
set of patches in turn just to make sure there are no extra warnings.

Can I commit this patch to the trunk?

2019-09-18  Michael Meissner  

* config/rs6000/linux64.h (TARGET_PREFIXED_ADDR_DEFAULT): Enable
prefixed addressing by default.
(TARGET_PCREL_DEFAULT): Enable pc-relative addressing by default.
* config/rs6000/rs6000-cpus.def (ISA_FUTURE_MASKS_SERVER): Only
enable -mprefixed-addr and -mpcrel if the OS tm.h says to enable
it.
(ADDRESSING_FUTURE_MASKS): New mask macro.
(OTHER_FUTURE_MASKS): Use ADDRESSING_FUTURE_MASKS.
* config/rs6000/rs6000.c (TARGET_PREFIXED_ADDR_DEFAULT): Do not
enable -mprefixed-addr unless the OS tm.h says to.
(TARGET_PCREL_DEFAULT): Do not enable -mpcrel unless the OS tm.h
says to.
(rs6000_option_override_internal): Do not enable -mprefixed-addr
or -mpcrel unless the OS tm.h says to enable it.  Add more checks
for -mcpu=future.

Index: gcc/config/rs6000/linux64.h
===
--- gcc/config/rs6000/linux64.h (revision 275894)
+++ gcc/config/rs6000/linux64.h (working copy)
@@ -640,3 +640,11 @@ extern int dot_symbols;
enabling the __float128 keyword.  */
 #undef TARGET_FLOAT128_ENABLE_TYPE
 #define TARGET_FLOAT128_ENABLE_TYPE 1
+
+/* Enable support for pc-relative and numeric prefixed addressing on the
+   'future' system.  */
+#undef  TARGET_PREFIXED_ADDR_DEFAULT
+#define TARGET_PREFIXED_ADDR_DEFAULT   1
+
+#undef  TARGET_PCREL_DEFAULT
+#define TARGET_PCREL_DEFAULT   1
Index: gcc/config/rs6000/rs6000-cpus.def
===
--- gcc/config/rs6000/rs6000-cpus.def   (revision 275894)
+++ gcc/config/rs6000/rs6000-cpus.def   (working copy)
@@ -75,15 +75,21 @@
 | OPTION_MASK_P8_VECTOR\
 | OPTION_MASK_P9_VECTOR)
 
-/* Support for a future processor's features.  Do not enable -mpcrel until it
-   is fully functional.  */
+/* Support for a future processor's features.  The prefixed and pc-relative
+   addressing bits are not added here.  Instead, rs6000.c adds them if the OS
+   tm.h says that it supports the addressing modes.  */
 #define ISA_FUTURE_MASKS_SERVER(ISA_3_0_MASKS_SERVER   
\
-| OPTION_MASK_FUTURE   \
+| OPTION_MASK_FUTURE)
+
+/* Addressing related flags on a future processor.  These flags are broken out
+   because not all targets will support either pc-relative addressing, or even
+   prefixed addressing, and we want to clear all of the addressing bits
+   on targets that cannot support prefixed/pcrel addressing.  */
+#define ADDRESSING_FUTURE_MASKS(OPTION_MASK_PCREL  
\
 | OPTION_MASK_PREFIXED_ADDR)
 
 /* Flags that need to be turned off if -mno-future.  */
-#define OTHER_FUTURE_MASKS (OPTION_MASK_PCREL  \
-| OPTION_MASK_PREFIXED_ADDR)
+#define OTHER_FUTURE_MASKS ADDRESSING_FUTURE_MASKS
 
 /* Flags that need to be turned off if -mno-power9-vector.  */
 #define OTHER_P9_VECTOR_MASKS  (OPTION_MASK_FLOAT128_HW\
Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 275912)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -98,6 +98,16 @@
 #endif
 #endif
 
+/* Set up the defaults for whether prefixed addressing is used, and if it is
+   used, whether we want to turn on pc-relative support by default.  */
+#ifndef TARGET_PREFIXED_ADDR_DEFAULT
+#define TARGET_PREFIXED_ADDR_DEFAULT   0
+#endif
+
+#ifndef TARGET_PCREL_DEFAULT
+#define TARGET_PCREL_DEFAULT   0
+#endif
+
 /* Support targetm.vectorize.builtin_mask_for_load.  */
 GTY(()) tree altivec_builtin_mask_for_load;
 
@@ -2532,6 +2542,14 @@ rs6000_debug_reg_global (void)
   if (TARGET_DIRECT_MOVE_128)
 fprintf (stderr, DEBUG_FMT_D, "VSX easy 64-bit mfvsrld element",
 (int)VECTOR_ELEMENT_M

RE: [C++ PATCH 2/4] Fix conversions for built-in operator overloading candidates.

2019-09-18 Thread JiangNing OS

Hi Jason,

This commit caused boot-strap failure on aarch64. Is it a bug? Can this be 
fixed ASAP?

../../gcc/gcc/expmed.c:5602:19: error: ���int_mode��� may be used uninitialized 
in this function [-Werror=maybe-uninitialized]
 5602 |   scalar_int_mode int_mode;
  |   ^~~~

Thanks,
-Jiangning

> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org 
> On Behalf Of Jason Merrill
> Sent: Monday, September 16, 2019 12:33 PM
> To: gcc-patches@gcc.gnu.org
> Subject: [C++ PATCH 2/4] Fix conversions for built-in operator overloading
> candidates.
> 
> While working on C++20 operator<=>, I noticed that build_new_op_1 was
> doing too much conversion when a built-in candidate was selected; the
> standard says it should only perform user-defined conversions, and then
> leave the normal operator semantics to handle any standard conversions.
> This is important for operator<=> because a comparison of two different
> unscoped enums is ill-formed; if we promote the enums to int here,
> cp_build_binary_op never gets to see the original operand types, so we can't
> give the error.
> 
> Tested x86_64-pc-linux-gnu, applying to trunk.
> 
>   * call.c (build_new_op_1): Don't apply any standard conversions to
>   the operands of a built-in operator.  Don't suppress conversions in
>   cp_build_unary_op.
>   * typeck.c (cp_build_unary_op): Do integral promotions for enums.
> ---
>  gcc/cp/call.c| 51 
>  gcc/cp/typeck.c  |  4 ++--
>  gcc/cp/ChangeLog |  7 +++
>  3 files changed, 34 insertions(+), 28 deletions(-)
> 
> diff --git a/gcc/cp/call.c b/gcc/cp/call.c index c3045d948c5..457fa6605c2
> 100644
> --- a/gcc/cp/call.c
> +++ b/gcc/cp/call.c
> @@ -6139,41 +6139,40 @@ build_new_op_1 (const op_location_t &loc,
> enum tree_code code, int flags,
> break;
>   }
> 
> -   /* We need to strip any leading REF_BIND so that bitfields
> -  don't cause errors.  This should not remove any important
> -  conversions, because builtins don't apply to class
> -  objects directly.  */
> +   /* "If a built-in candidate is selected by overload resolution, the
> +  operands of class type are converted to the types of the
> +  corresponding parameters of the selected operation function,
> +  except that the second standard conversion sequence of a
> +  user-defined conversion sequence (12.3.3.1.2) is not applied."
> +*/
> conv = cand->convs[0];
> -   if (conv->kind == ck_ref_bind)
> - conv = next_conversion (conv);
> -   arg1 = convert_like (conv, arg1, complain);
> +   if (conv->user_conv_p)
> + {
> +   while (conv->kind != ck_user)
> + conv = next_conversion (conv);
> +   arg1 = convert_like (conv, arg1, complain);
> + }
> 
> if (arg2)
>   {
> conv = cand->convs[1];
> -   if (conv->kind == ck_ref_bind)
> - conv = next_conversion (conv);
> -   else
> - arg2 = decay_conversion (arg2, complain);
> -
> -   /* We need to call warn_logical_operator before
> -  converting arg2 to a boolean_type, but after
> -  decaying an enumerator to its value.  */
> -   if (complain & tf_warning)
> - warn_logical_operator (loc, code, boolean_type_node,
> -code_orig_arg1, arg1,
> -code_orig_arg2, arg2);
> -
> -   arg2 = convert_like (conv, arg2, complain);
> +   if (conv->user_conv_p)
> + {
> +   while (conv->kind != ck_user)
> + conv = next_conversion (conv);
> +   arg2 = convert_like (conv, arg2, complain);
> + }
>   }
> +
> if (arg3)
>   {
> conv = cand->convs[2];
> -   if (conv->kind == ck_ref_bind)
> - conv = next_conversion (conv);
> -   convert_like (conv, arg3, complain);
> +   if (conv->user_conv_p)
> + {
> +   while (conv->kind != ck_user)
> + conv = next_conversion (conv);
> +   arg3 = convert_like (conv, arg3, complain);
> + }
>   }
> -
>   }
>  }
> 
> @@ -6241,7 +6240,7 @@ build_new_op_1 (const op_location_t &loc, enum
> tree_code code, int flags,
>  case REALPART_EXPR:
>  case IMAGPART_EXPR:
>  case ABS_EXPR:
> -  return cp_build_unary_op (code, arg1, candidates != 0, complain);
> +  return cp_build_unary_op (code, arg1, false, complain);
> 
>  case ARRAY_REF:
>return cp_build_array_ref (input_location, arg1, arg2, complain); diff 
> --git
> a/gcc/cp/typeck.c b/gcc/cp/typeck.c index 70094d1b426..620f2c9afdf 100644
> --- a/gcc/cp/typeck.c
> +++ b/gcc/cp/typeck.c
> @@ -6242,7 +6242,7 @@ cp_build_unary_op (enum tree_code code, tree
> xarg, bool noconvert,

[PATCH] RISC-V: Fix more splitters accidentally calling gen_reg_rtx.

2019-09-18 Thread Jim Wilson

Similar to a previous patch, adding a in_splitter arg to every function that
can be called from a define_split, so we can prevent calls to gen_reg_rtx
during combine, which can cause crashes.

Tested with riscv32-elf and riscv64-linux builds and checks with no
regressions.

Commmitted.

Jim

PR target/91683
* config/riscv/riscv-protos.h (riscv_split_symbol): New bool parameter.
(riscv_move_integer): Likewise.
* config/riscv/riscv.c (riscv_split_integer): Pass FALSE for new
riscv_move_integer arg.
(riscv_legitimize_move): Likewise.
(riscv_force_temporary): New parameter in_splitter.  Don't call
force_reg if true.
(riscv_unspec_offset_high): Pass FALSE for new riscv_force_temporary
arg.
(riscv_add_offset): Likewise.
(riscv_split_symbol): New parameter in_splitter.  Pass to
riscv_force_temporary.
(riscv_legitimize_address): Pass FALSE for new riscv_split_symbol
arg.
(riscv_move_integer): New parameter in_splitter.  New local
can_create_psuedo.  Don't call riscv_split_integer or force_reg when
in_splitter TRUE.
(riscv_legitimize_const_move): Pass FALSE for new riscv_move_integer,
riscv_split_symbol, and riscv_force_temporary args.
* config/riscv/riscv.md (low+1): Pass TRUE for new
riscv_move_integer arg.
(low+2): Pass TRUE for new riscv_split_symbol arg.
---
 gcc/config/riscv/riscv-protos.h |  4 +--
 gcc/config/riscv/riscv.c| 46 -
 gcc/config/riscv/riscv.md   |  6 ++---
 3 files changed, 33 insertions(+), 23 deletions(-)

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 69e39f7a208..5092294803c 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -44,10 +44,10 @@ extern int riscv_const_insns (rtx);
 extern int riscv_split_const_insns (rtx);
 extern int riscv_load_store_insns (rtx, rtx_insn *);
 extern rtx riscv_emit_move (rtx, rtx);
-extern bool riscv_split_symbol (rtx, rtx, machine_mode, rtx *);
+extern bool riscv_split_symbol (rtx, rtx, machine_mode, rtx *, bool);
 extern bool riscv_split_symbol_type (enum riscv_symbol_type);
 extern rtx riscv_unspec_address (rtx, enum riscv_symbol_type);
-extern void riscv_move_integer (rtx, rtx, HOST_WIDE_INT, machine_mode);
+extern void riscv_move_integer (rtx, rtx, HOST_WIDE_INT, machine_mode, bool);
 extern bool riscv_legitimize_move (machine_mode, rtx, rtx);
 extern rtx riscv_subword (rtx, bool);
 extern bool riscv_split_64bit_move_p (rtx, rtx);
diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
index 39bf87abf1c..b8a8778b92c 100644
--- a/gcc/config/riscv/riscv.c
+++ b/gcc/config/riscv/riscv.c
@@ -508,8 +508,8 @@ riscv_split_integer (HOST_WIDE_INT val, machine_mode mode)
   unsigned HOST_WIDE_INT hival = sext_hwi ((val - loval) >> 32, 32);
   rtx hi = gen_reg_rtx (mode), lo = gen_reg_rtx (mode);
 
-  riscv_move_integer (hi, hi, hival, mode);
-  riscv_move_integer (lo, lo, loval, mode);
+  riscv_move_integer (hi, hi, hival, mode, FALSE);
+  riscv_move_integer (lo, lo, loval, mode, FALSE);
 
   hi = gen_rtx_fmt_ee (ASHIFT, mode, hi, GEN_INT (32));
   hi = force_reg (mode, hi);
@@ -1021,9 +1021,12 @@ riscv_force_binary (machine_mode mode, enum rtx_code 
code, rtx x, rtx y)
are allowed, copy it into a new register, otherwise use DEST.  */
 
 static rtx
-riscv_force_temporary (rtx dest, rtx value)
+riscv_force_temporary (rtx dest, rtx value, bool in_splitter)
 {
-  if (can_create_pseudo_p ())
+  /* We can't call gen_reg_rtx from a splitter, because this might realloc
+ the regno_reg_rtx array, which would invalidate reg rtx pointers in the
+ combine undo buffer.  */
+  if (can_create_pseudo_p () && !in_splitter)
 return force_reg (Pmode, value);
   else
 {
@@ -1082,7 +1085,7 @@ static rtx
 riscv_unspec_offset_high (rtx temp, rtx addr, enum riscv_symbol_type 
symbol_type)
 {
   addr = gen_rtx_HIGH (Pmode, riscv_unspec_address (addr, symbol_type));
-  return riscv_force_temporary (temp, addr);
+  return riscv_force_temporary (temp, addr, FALSE);
 }
 
 /* Load an entry from the GOT for a TLS GD access.  */
@@ -1130,7 +1133,8 @@ static rtx riscv_tls_add_tp_le (rtx dest, rtx base, rtx 
sym)
is guaranteed to be a legitimate address for mode MODE.  */
 
 bool
-riscv_split_symbol (rtx temp, rtx addr, machine_mode mode, rtx *low_out)
+riscv_split_symbol (rtx temp, rtx addr, machine_mode mode, rtx *low_out,
+   bool in_splitter)
 {
   enum riscv_symbol_type symbol_type;
 
@@ -1146,7 +1150,7 @@ riscv_split_symbol (rtx temp, rtx addr, machine_mode 
mode, rtx *low_out)
   case SYMBOL_ABSOLUTE:
{
  rtx high = gen_rtx_HIGH (Pmode, copy_rtx (addr));
- high = riscv_force_temporary (temp, high);
+ high = riscv_force_temporary (temp, high, in_splitter);
  *low_out = gen_rtx_LO_SUM (Pmode, high, addr);

Re: [PATCH][ARM] Enable code hoisting with -Os (PR80155)

2019-09-18 Thread Prathamesh Kulkarni

On Wed, 18 Sep 2019 at 22:17, Prathamesh Kulkarni
 wrote:
>
> On Wed, 18 Sep 2019 at 01:46, Richard Biener  
> wrote:
> >
> > On Tue, Sep 17, 2019 at 7:18 PM Wilco Dijkstra  
> > wrote:
> > >
> > > Hi Richard,
> > >
> > > > The issue with the bugzilla is that it lacked appropriate testcase(s) 
> > > > and thus
> > > > it is now a mess.  There are clear testcases (maybe not in the 
> > > > benchmarks you
> > >
> > > Agreed - it's not clear whether any of the proposed changes would actually
> > > help the original issue. My patch absolutely does.
> > >
> > > > care about) that benefit from code hoisting as enabler, mainly when 
> > > > control
> > > > flow can be then converted to data flow.  Also note that "size 
> > > > optimizations"
> > > > are important for all cases where followup transforms have size limits 
> > > > on the IL
> > > > in place.
> > >
> > > The gain from -fcode-hoisting is about 0.2% overall on Thumb-2. Ie. it's 
> > > definitely
> > > useful, but there are much larger gains to be had from other tweaks [1]. 
> > > So we can
> > > live without it until a better solution is found.
> >
> > A "solution" for better eembc benchmark results?
> >
> > The issues are all latent even w/o code-hoisting since you can do the
> > same transform at the source level.  Which is usually why I argue
> > trying to fix this in code-hoisting is not a complete fix.  Nor is turning
> > off random GIMPLE passes for specific benchmark regressions.
> >
> > Anyway, it's arm maintainers call if you want to have such hacks in
> > place or not.
> >
> > As a release manager I say that GCC isn't a benchmark compiler.
> >
> > As the one "responsible" for the code-hoisting introduction I say that
> > as long as I don't have access to the actual benchmark I can't assess
> > wrongdoing of the pass nor suggest an appropriate place for optimization.
> Hi Richard,
> The actual benchmark function for PR80155 is almost identical to FMS()
> function defined in
> pr77445-2.c, and the test-case reproduces the same issue as in the benchmark.
Hi,
The attached patch is another workaround for hoisting. The rationale
behind the patch is
to avoid "long range" hoistings  for a "large enough" CFG.

The patch walks dom tree for the block and finds the "furthest" dom
block, for which
intersect_p (availout_in_some, AVAIL_OUT (dom_block)) is true. The
"distance" is measured
in terms of dom depth.

We can have two params say param_hoist_n_bbs to determine "large enough"
CFG, and param_hoist_expr_dist, that avoids hoisting if the "distance"
exceeds the param threshold.
For the values (hardcoded) in the patch, it "works" for avoiding the
spill and does not regress ssa-pre-*.c and ssa-hoist-*.c
tests (the param values could be made target-specific). Does the
approach look reasonable ?
My concern with current version is that walking dom tree per block in
do_hoist_insertion can end up being
pretty expensive. Any suggestions on how we could speed that up ?

Alternatively, we could consider hoisting from "bottom up", one block
at a time, and keep a map to count number of
times an expr is hoisted and refuse hoisting the expr further up if it
exceeds target-specific param threshold ?

Thanks,
Prathamesh
>
> Thanks,
> Prathamesh
> >
> > Richard.
> >
> > >
> > > [1] https://gcc.gnu.org/ml/gcc-patches/2019-07/msg01739.html
> > >
> > > Wilco
diff --git a/gcc/graph.h b/gcc/graph.h
index 5ec4f1c107f..c05cc6df25b 100644
--- a/gcc/graph.h
+++ b/gcc/graph.h
@@ -24,5 +24,6 @@ extern void print_graph_cfg (const char *, struct function *);
 extern void clean_graph_dump_file (const char *);
 extern void finish_graph_dump_file (const char *);
 extern void debug_dot_cfg (struct function *);
+extern void save_dot_cfg (struct function *);
 
 #endif /* ! GCC_GRAPH_H */
diff --git a/gcc/tree-ssa-pre.c b/gcc/tree-ssa-pre.c
index c618601a184..d13d1fdfd8a 100644
--- a/gcc/tree-ssa-pre.c
+++ b/gcc/tree-ssa-pre.c
@@ -3463,6 +3463,35 @@ do_pre_partial_partial_insertion (basic_block block, basic_block dom)
   return new_stuff;
 }
 
+/* Return longest "distance" of block, for which
+   intersect_p (availout_in_some, AVAIL_OUT (block)) is true.
+   Distance is measured in terms of dom depth.  */
+
+static int
+get_longest_dist (bitmap_head availout_in_some, basic_block block)
+{
+  basic_block son;
+  int dist;
+  int max_dist = 0;
+
+  for (son = first_dom_son (CDI_DOMINATORS, block);
+   son;
+   son = next_dom_son (CDI_DOMINATORS, son))
+{
+  dist = get_longest_dist (availout_in_some, son);
+  if (dist > max_dist)
+	max_dist = dist;
+}
+
+  if (max_dist != -1)
+return max_dist + 1;
+
+  if (bitmap_intersect_p (&availout_in_some, &AVAIL_OUT (block)->values))
+return 0;
+
+  return -1;
+}
+
 /* Insert expressions in BLOCK to compute hoistable values up.
Return TRUE if something was inserted, otherwise return FALSE.
The caller has to make sure that BLOCK has at least two successors.  */
@@ -3509,6 +3538,7 @@ do_hoist_insertion (basic_block b

[PATCH] Remove operand swapping from reduction vectorization (SLP)

2019-09-18 Thread Richard Biener



Here's the SLP part.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2019-09-19  Richard Biener  

* tree-parloops.c (parloops_is_slp_reduction): Do not set
LOOP_VINFO_OPERANDS_SWAPPED.
(parloops_is_simple_reduction): Likewise.
* tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Do not
initialize operands_swapped.
(_loop_vec_info::~_loop_vec_info): Do not re-canonicalize stmts.
(vect_is_slp_reduction): Do not swap operands.
* tree-vectorizer.h (_loop_vec_info::operands_swapped): Remove.
(LOOP_VINFO_OPERANDS_SWAPPED): Likewise.


Index: gcc/tree-parloops.c
===
--- gcc/tree-parloops.c (revision 275898)
+++ gcc/tree-parloops.c (working copy)
@@ -347,9 +347,6 @@ parloops_is_slp_reduction (loop_vec_info
 gimple_assign_rhs1_ptr (next_stmt),
  gimple_assign_rhs2_ptr (next_stmt));
  update_stmt (next_stmt);
-
- if (CONSTANT_CLASS_P (gimple_assign_rhs1 (next_stmt)))
-   LOOP_VINFO_OPERANDS_SWAPPED (loop_info) = true;
}
  else
return false;
@@ -831,9 +828,6 @@ parloops_is_simple_reduction (loop_vec_i
  if (dump_enabled_p ())
report_ploop_op (MSG_NOTE, def_stmt,
 "detected reduction: need to swap operands: ");
-
- if (CONSTANT_CLASS_P (gimple_assign_rhs1 (def_stmt)))
-   LOOP_VINFO_OPERANDS_SWAPPED (loop_info) = true;
 }
   else
 {
Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c(revision 275898)
+++ gcc/tree-vect-loop.c(working copy)
@@ -832,7 +832,6 @@ _loop_vec_info::_loop_vec_info (class lo
 fully_masked_p (false),
 peeling_for_gaps (false),
 peeling_for_niter (false),
-operands_swapped (false),
 no_data_dependencies (false),
 has_mask_store (false),
 scalar_loop_scaling (profile_probability::uninitialized ()),
@@ -906,57 +905,6 @@ release_vec_loop_masks (vec_loop_masks *
 
 _loop_vec_info::~_loop_vec_info ()
 {
-  int nbbs;
-  gimple_stmt_iterator si;
-  int j;
-
-  nbbs = loop->num_nodes;
-  for (j = 0; j < nbbs; j++)
-{
-  basic_block bb = bbs[j];
-  for (si = gsi_start_bb (bb); !gsi_end_p (si); )
-{
- gimple *stmt = gsi_stmt (si);
-
- /* We may have broken canonical form by moving a constant
-into RHS1 of a commutative op.  Fix such occurrences.  */
- if (operands_swapped && is_gimple_assign (stmt))
-   {
- enum tree_code code = gimple_assign_rhs_code (stmt);
-
- if ((code == PLUS_EXPR
-  || code == POINTER_PLUS_EXPR
-  || code == MULT_EXPR)
- && CONSTANT_CLASS_P (gimple_assign_rhs1 (stmt)))
-   swap_ssa_operands (stmt,
-  gimple_assign_rhs1_ptr (stmt),
-  gimple_assign_rhs2_ptr (stmt));
- else if (code == COND_EXPR
-  && CONSTANT_CLASS_P (gimple_assign_rhs2 (stmt)))
-   {
- tree cond_expr = gimple_assign_rhs1 (stmt);
- enum tree_code cond_code = TREE_CODE (cond_expr);
-
- if (TREE_CODE_CLASS (cond_code) == tcc_comparison)
-   {
- bool honor_nans = HONOR_NANS (TREE_OPERAND (cond_expr,
- 0));
- cond_code = invert_tree_comparison (cond_code,
- honor_nans);
- if (cond_code != ERROR_MARK)
-   {
- TREE_SET_CODE (cond_expr, cond_code);
- swap_ssa_operands (stmt,
-gimple_assign_rhs2_ptr (stmt),
-gimple_assign_rhs3_ptr (stmt));
-   }
-   }
-   }
-   }
-  gsi_next (&si);
-}
-}
-
   free (bbs);
 
   release_vec_loop_masks (&masks);
@@ -2715,7 +2663,8 @@ vect_is_slp_reduction (loop_vec_info loo
}
   else
{
-  tree op = gimple_assign_rhs2 (next_stmt);
+ gcc_assert (gimple_assign_rhs1 (next_stmt) == lhs);
+ tree op = gimple_assign_rhs2 (next_stmt);
  stmt_vec_info def_stmt_info = loop_info->lookup_def (op);
 
   /* Check that the other def is either defined in the loop
@@ -2725,23 +2674,12 @@ vect_is_slp_reduction (loop_vec_info loo
  && flow_bb_inside_loop_p (loop, gimple_bb (def_stmt_info->stmt))
  && vect_valid_reduction_input_p (def_stmt_info))
{
- if (dump_enabled_p ())
-   dump_printf_loc (MSG

[COMMITTED][GCC9] Backport RISC-V: Fix bad insn splits with paradoxical subregs.

2019-09-18 Thread Kito Cheng

This patch fix PR target/91635, fixing wrong code gen for RISC-V.
From 52e32e2f82b4fe09c253e230c4fe321a0341aae7 Mon Sep 17 00:00:00 2001
From: kito 
Date: Thu, 19 Sep 2019 06:38:23 +
Subject: [PATCH] RISC-V: Fix bad insn splits with paradoxical subregs.

Shifting by more than the size of a SUBREG_REG doesn't work, so we either
need to disable splits if an input is paradoxical, or else we need to
generate a clean temporary for intermediate results.

Jakub wrote the first version of this patch, so gets primary credit for it.

	gcc/
	PR target/91635
	* config/riscv/riscv.md (zero_extendsidi2, zero_extendhi2,
	extend2): Don't split if
	paradoxical_subreg_p (operands[0]).
	(*lshrsi3_zero_extend_3+1, *lshrsi3_zero_extend_3+2): Add clobber and
	use as intermediate value.

	gcc/testsuite/
	PR target/91635
	* gcc.c-torture/execute/pr91635.c: New test.
	* gcc.target/riscv/shift-shift-4.c: New test.
	* gcc.target/riscv/shift-shift-5.c: New test.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gcc-9-branch@275929 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog | 13 +
 gcc/config/riscv/riscv.md | 30 +++---
 gcc/testsuite/ChangeLog   | 11 
 gcc/testsuite/gcc.c-torture/execute/pr91635.c | 57 +++
 .../gcc.target/riscv/shift-shift-4.c  | 13 +
 .../gcc.target/riscv/shift-shift-5.c  | 16 ++
 6 files changed, 131 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr91635.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/shift-shift-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/shift-shift-5.c

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 95f978d5805..aa90ce5df7b 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,16 @@
+2019-09-19  Kito Cheng  
+
+	Backport from mainline
+	2019-09-05  Jakub Jelinek  
+		Jim Wilson  
+
+	PR target/91635
+	* config/riscv/riscv.md (zero_extendsidi2, zero_extendhi2,
+	extend2): Don't split if
+	paradoxical_subreg_p (operands[0]).
+	(*lshrsi3_zero_extend_3+1, *lshrsi3_zero_extend_3+2): Add clobber and
+	use as intermediate value.
+
 2019-09-11  Eric Botcazou  
 
 	PR rtl-optimization/89795
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index a8bac170e72..7850c41f3c7 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -1051,7 +1051,9 @@
   "@
#
lwu\t%0,%1"
-  "&& reload_completed && REG_P (operands[1])"
+  "&& reload_completed
+   && REG_P (operands[1])
+   && !paradoxical_subreg_p (operands[0])"
   [(set (match_dup 0)
 	(ashift:DI (match_dup 1) (const_int 32)))
(set (match_dup 0)
@@ -1068,7 +1070,9 @@
   "@
#
lhu\t%0,%1"
-  "&& reload_completed && REG_P (operands[1])"
+  "&& reload_completed
+   && REG_P (operands[1])
+   && !paradoxical_subreg_p (operands[0])"
   [(set (match_dup 0)
 	(ashift:GPR (match_dup 1) (match_dup 2)))
(set (match_dup 0)
@@ -1117,7 +1121,9 @@
   "@
#
l\t%0,%1"
-  "&& reload_completed && REG_P (operands[1])"
+  "&& reload_completed
+   && REG_P (operands[1])
+   && !paradoxical_subreg_p (operands[0])"
   [(set (match_dup 0) (ashift:SI (match_dup 1) (match_dup 2)))
(set (match_dup 0) (ashiftrt:SI (match_dup 0) (match_dup 2)))]
 {
@@ -1765,15 +1771,20 @@
 ;; Handle AND with 2^N-1 for N from 12 to XLEN.  This can be split into
 ;; two logical shifts.  Otherwise it requires 3 instructions: lui,
 ;; xor/addi/srli, and.
+
+;; Generating a temporary for the shift output gives better combiner results;
+;; and also fixes a problem where op0 could be a paradoxical reg and shifting
+;; by amounts larger than the size of the SUBREG_REG doesn't work.
 (define_split
   [(set (match_operand:GPR 0 "register_operand")
 	(and:GPR (match_operand:GPR 1 "register_operand")
-		 (match_operand:GPR 2 "p2m1_shift_operand")))]
+		 (match_operand:GPR 2 "p2m1_shift_operand")))
+   (clobber (match_operand:GPR 3 "register_operand"))]
   ""
- [(set (match_dup 0)
+ [(set (match_dup 3)
(ashift:GPR (match_dup 1) (match_dup 2)))
   (set (match_dup 0)
-   (lshiftrt:GPR (match_dup 0) (match_dup 2)))]
+   (lshiftrt:GPR (match_dup 3) (match_dup 2)))]
 {
   /* Op2 is a VOIDmode constant, so get the mode size from op1.  */
   operands[2] = GEN_INT (GET_MODE_BITSIZE (GET_MODE (operands[1]))
@@ -1785,12 +1796,13 @@
 (define_split
   [(set (match_operand:DI 0 "register_operand")
 	(and:DI (match_operand:DI 1 "register_operand")
-		(match_operand:DI 2 "high_mask_shift_operand")))]
+		(match_operand:DI 2 "high_mask_shift_operand")))
+   (clobber (match_operand:DI 3 "register_operand"))]
   "TARGET_64BIT"
-  [(set (match_dup 0)
+  [(set (match_dup 3)
 	(lshiftrt:DI (match_dup 1) (match_dup 2)))
(set (match_dup 0)
-	(ashift:DI (match_dup 0) (match_dup 2)))]
+	(ashift:DI (match_dup 3) (match_dup 2)))]
 {
   operands[2] = GEN_INT (ctz_hwi (INTVAL (operands[2])));
 })
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog

89 matches

Mail list logo