[PATCH v10] add -fpatchable-function-entry=N,M option

2017-07-05 Thread Torsten Duwe
Changes since v9:

* Do not store (declare static) the nop pattern template string.
  In the future, it might depend on the particular function
  being emitted. Fetch it freshly each time instead.

* On platforms without named sections, simply omit the recording
  of the nop locations. Run-time instrumentation can still work
  them out, if desired. Document this behaviour in a half sentence.

* Move the hook documentation to where it belongs. Texi file (re-)
  generation should work cleanly now.

* Documentation clarified as requested.
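
For readers who want to try the per-function attribute, here is a minimal
sketch of how it might be used. The two arguments follow the N,M meaning
described for the option (N NOPs in total, the entry point M NOPs into that
area). The PATCHABLE_ENTRY macro and the add_one function are illustrative
names only, not part of the patch, and the macro falls back to nothing on
compilers that do not know the attribute.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper macro: expands to the attribute when the compiler
   knows it, and to nothing otherwise.  */
#if defined(__has_attribute)
#  if __has_attribute(patchable_function_entry)
#    define PATCHABLE_ENTRY(n, m) \
       __attribute__((patchable_function_entry(n, m)))
#  endif
#endif
#ifndef PATCHABLE_ENTRY
#  define PATCHABLE_ENTRY(n, m)
#endif

/* Request 3 NOPs in total, with the entry point 1 NOP into the area.
   The NOPs do not change what the function computes.  */
PATCHABLE_ENTRY(3, 1)
int add_one (int x)
{
  assert (x < INT_MAX);
  return x + 1;
}
```

The same effect for every function in a translation unit would come from
passing -fpatchable-function-entry=3,1 on the command line.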

Torsten

---
gcc/c-family/ChangeLog
2017-07-04  Torsten Duwe  

* c-attribs.c (c_common_attribute_table): Add entry for
"patchable_function_entry".

gcc/lto/ChangeLog
2017-07-04  Torsten Duwe  

* lto-lang.c (lto_attribute_table): Add entry for
"patchable_function_entry".

gcc/ChangeLog
2017-07-04  Torsten Duwe  

* common.opt: Introduce -fpatchable-function-entry
command line option, and its variables function_entry_patch_area_size
and function_entry_patch_area_start.
* opts.c (common_handle_option): Add -fpatchable_function_entry_ case,
including a two-value parser.
* target.def (print_patchable_function_entry): New target hook.
* targhooks.h (default_print_patchable_function_entry): New function.
* targhooks.c (default_print_patchable_function_entry): Likewise.
* toplev.c (process_options): Switch off IPA-RA if
patchable function entries are being generated.
* varasm.c (assemble_start_function): Look at the
patchable-function-entry command line switch and current
function attributes and maybe generate NOP instructions by
calling the print_patchable_function_entry hook.
* doc/extend.texi: Document patchable_function_entry attribute.
* doc/invoke.texi: Document -fpatchable-function-entry
command line option.
* doc/tm.texi.in (TARGET_ASM_PRINT_PATCHABLE_FUNCTION_ENTRY):
New target hook.
* doc/tm.texi: Likewise.

gcc/testsuite/ChangeLog
2017-07-04  Torsten Duwe  

* c-c++-common/patchable_function_entry-default.c: New test.
* c-c++-common/patchable_function_entry-decl.c: Likewise.
* c-c++-common/patchable_function_entry-definition.c: Likewise.

diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
index 626ffa1cde7..ecb00c1d5b9 100644
--- a/gcc/c-family/c-attribs.c
+++ b/gcc/c-family/c-attribs.c
@@ -142,6 +142,8 @@ static tree handle_bnd_variable_size_attribute (tree *, tree, tree, int, bool *)
 static tree handle_bnd_legacy (tree *, tree, tree, int, bool *);
 static tree handle_bnd_instrument (tree *, tree, tree, int, bool *);
 static tree handle_fallthrough_attribute (tree *, tree, tree, int, bool *);
+static tree handle_patchable_function_entry_attribute (tree *, tree, tree,
+  int, bool *);
 
 /* Table of machine-independent attributes common to all C-like languages.
 
@@ -351,6 +353,9 @@ const struct attribute_spec c_common_attribute_table[] =
  handle_bnd_instrument, false },
   { "fallthrough",   0, 0, false, false, false,
  handle_fallthrough_attribute, false },
+  { "patchable_function_entry",1, 2, true, false, false,
+ handle_patchable_function_entry_attribute,
+ false },
   { NULL, 0, 0, false, false, false, NULL, false }
 };
 
@@ -3260,3 +3265,10 @@ handle_fallthrough_attribute (tree *, tree name, tree, int,
   *no_add_attrs = true;
   return NULL_TREE;
 }
+
+static tree
+handle_patchable_function_entry_attribute (tree *, tree, tree, int, bool *)
+{
+  /* Nothing to be done here.  */
+  return NULL_TREE;
+}
diff --git a/gcc/common.opt b/gcc/common.opt
index e81165c488b..78cfa568a95 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -163,6 +163,13 @@ bool flag_stack_usage_info = false
 Variable
 int flag_debug_asm
 
+; How many NOP insns to place at each function entry by default
+Variable
+HOST_WIDE_INT function_entry_patch_area_size
+
+; And how far the real asm entry point is into this area
+Variable
+HOST_WIDE_INT function_entry_patch_area_start
 
 ; Balance between GNAT encodings and standard DWARF to emit.
 Variable
@@ -2030,6 +2037,10 @@ fprofile-reorder-functions
 Common Report Var(flag_profile_reorder_functions)
 Enable function reordering that improves code placement.
 
+fpatchable-function-entry=
+Common Joined Optimization
+Insert NOP instructions at each function entry.
+
 frandom-seed
 Common Var(common_deferred_options) Defer
 
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 03ba8fc436c..a4c3c98b9f5 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -3105,6 +3105,27 @@ that affect more than one function.
 This attribute should be used for debugging purposes only.  It is not
 suitable in productio

Re: [PATCH][AArch64] Fix ILP32 memory access

2017-07-05 Thread Andrew Pinski
On Tue, Jun 27, 2017 at 6:39 AM, Wilco Dijkstra  wrote:
> This patch fixes a failure in gcc.target/aarch64/reload-valid-spoff.c
> triggered by https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01367.html -
> it supersedes https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01907.html
> as this fixes the root cause of the failure.
>
> In ILP32 all memory accesses must have Pmode as the base address, but
> aarch64_expand_mov_immediate wasn't emitting a conversion in one case.
> Besides fixing this add an assert that flags any MEM operands that are
> not Pmode.
>
> Passes regress (with/without ilp32). OK for commit?

This looks related to PR 80266, in that that one was also crashing due to
the store pair instruction, like what was reported here.

Thanks,
Andrew


>
> ChangeLog:
> 2017-06-27  Wilco Dijkstra  
>
> * config/aarch64/aarch64.c (aarch64_expand_mov_immediate):
> Convert memory address to Pmode.
> --
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 329d244e9cf16dbdf849e5dd02b3999caf0cd5a7..9038748ba049ba589f067f3f04c31704fe673d2c 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -1958,6 +1958,8 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
>   gcc_assert (can_create_pseudo_p ());
>   base = gen_reg_rtx (ptr_mode);
>   aarch64_expand_mov_immediate (base, XEXP (mem, 0));
> + if (ptr_mode != Pmode)
> +   base = convert_memory_address (Pmode, base);
>   mem = gen_rtx_MEM (ptr_mode, base);
> }
>
> @@ -5207,6 +5209,7 @@ aarch64_print_operand (FILE *f, rtx x, int code)
>
> case MEM:
>   output_address (GET_MODE (x), XEXP (x, 0));
> + gcc_assert (GET_MODE (XEXP (x, 0)) == Pmode);
>   break;
>
> case CONST:


Re: [PATCH 0/7] Support for the SPARC M8 cpu

2017-07-05 Thread Jose E. Marchesi

Hi Rainer.

> This patch series adds support for the SPARC M8 processor to GCC.
> The SPARC M8 processor implements the Oracle SPARC Architecture 2017.
[...]
> Note that full binutils support for M8 was upstreamed in May 19.
> Bootstrapped and tested in sparc64-linux-gnu.  No regressions.

since the patch is touching Solaris-specific files, too, please also
regtest it on Solaris (S12 with Studio 12.6 fbe would be best, I
suppose).

I tested on sparc-sun-solaris2.12 and it indeed caught a couple of
typos in the VIS4B built-ins registration in 32-bit mode.  There were no
regressions.

I am preparing a second version of the patch series that I will be
submitting today.


[ping] don't complain about undefined env vars in self specs on gcc -v

2017-07-05 Thread Olivier Hainque
Hello,

Ping for patch proposed here:
https://gcc.gnu.org/ml/gcc-patches/2017-06/msg00579.html

Thanks much in advance,

With Kind Regards,

Olivier

> On Jun 9, 2017, at 10:42 , Olivier Hainque  wrote:
> 
> Hello,
> 
> This is a follow-up improvement over the change
> introduced from
> 
> https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00414.html
> 
> << self-specs setup with configure --with-specs are allowed to contain %:getenv
>   environment variable references. We are using this capability in a few cases
>   for cross ports.
> 
>   In such configurations, gcc --help or gcc --version fail if the variables
>   aren't defined [...]
> 
>   It is a bit annoying to have to define environment variables just to
>   be able to retrieve version info.
> >>
> 
> The attached patch adjusts the code to allow a lone "gcc -v" as well,
> for the same reason.
> 
> I verified that it works as intended on ports configured with
> self specs as described above, that it doesn't change the behavior
> of a lone "gcc -v" on a regular x86-linux compiler, and that it
> bootstraps + regtests clean on x86_64-linux.
> 
> OK to commit ?
> 
> Thanks in advance,
> 
> With Kind Regards,
> 
> Olivier
> 
> 2017-06-09  Olivier Hainque  
> 
>* gcc.c (process_command): When deciding if undefined variables
>should be ignored when processing specs, accept "gcc -v" as well.
> 
> 
> 



Re: [patch,avr] Add support for devices with flash accessible by LD.

2017-07-05 Thread Georg-Johann Lay

On 04.07.2017 20:11, Richard Sandiford wrote:

Georg-Johann Lay  writes:

Hi,

This patch adds support for devices that can access flash memory
by LD* instructions, hence there is no need to put .rodata in RAM.

The default linker script for the new multilib versions already
supports this feature, it's similar to avrtiny, cf.

https://sourceware.org/PR21472

This patch does the following:

* Add multilib variants avrxmega3 and avrxmega3/short-calls.

* Add new option -mshort-calls for multilib selection between
devices with <= 8KiB flash and > 8KiB flash.

* Add specs handling for -mshort-calls:  The compiler knows
if this option is needed or not appropriate (similar to -msp8).

* Add new ISA feature AVR_ISA_RCALL for multilib selection
via -mshort-calls.

* Add a new row to architecture description that contains the
start address of flash memory in the RAM address range.
(The actual value is not needed).

* For devices with flash in RAM space, don't let .rodata
objects trigger need for __do_copy_data.

* Add some devices.

* Add configure test for Binutils PR21472.


Sorry if this has already been discussed, but it's useful to be
able to do things like:

   .../configure --target=avr-elf --with-cpu=arc700
   make -j... all-gcc

as a basic sanity test of a pan-target patch.  (I usually do
before-and-after assembly comparisons too if no changes are
expected.)  The way the configure test is written means that
it's no longer possible to do this without first building a
trunk version of binutils for avr-elf.

Thanks,
Richard


Okay, I already thought of a less aggressive approach, I'll
try to address it soon.

Johann




[PATCH v3][ASAN] Implement dynamic allocas/VLAs sanitization.

2017-07-05 Thread Maxim Ostapenko

Hi,

this is an updated patch that fixes the issues raised in the previous review.
Tested and bootstrapped on x86_64-unknown-linux-gnu and ppc64le-redhat-linux.

Could you take a look?

-Maxim
gcc/ChangeLog:

2017-07-05  Maxim Ostapenko  

	* asan.c: Include gimple-fold.h.
	(get_last_alloca_addr): New function.
	(handle_builtin_stack_restore): Likewise.
	(handle_builtin_alloca): Likewise.
	(asan_emit_allocas_unpoison): Likewise.
	(get_mem_refs_of_builtin_call): Add new parameter, remove const
	qualifier from first parameter. Handle BUILT_IN_ALLOCA,
	BUILT_IN_ALLOCA_WITH_ALIGN and BUILT_IN_STACK_RESTORE builtins.
	(instrument_builtin_call): Pass gimple iterator to
	get_mem_refs_of_builtin_call.
	(last_alloca_addr): New global.
	* asan.h (asan_emit_allocas_unpoison): Declare.
	* builtins.c (expand_asan_emit_allocas_unpoison): New function.
	(expand_builtin): Handle BUILT_IN_ASAN_ALLOCAS_UNPOISON.
	* cfgexpand.c (expand_used_vars): Call asan_emit_allocas_unpoison
	if function calls alloca.
	* gimple-fold.c (replace_call_with_value): Remove static keyword.
	* gimple-fold.h (replace_call_with_value): Declare.
	* internal-fn.c: Include asan.h.
	* sanitizer.def (BUILT_IN_ASAN_ALLOCA_POISON,
	BUILT_IN_ASAN_ALLOCAS_UNPOISON): New builtins.

gcc/testsuite/ChangeLog:

2017-07-05  Maxim Ostapenko  

	* c-c++-common/asan/alloca_big_alignment.c: New test.
	* c-c++-common/asan/alloca_detect_custom_size.c: Likewise.
	* c-c++-common/asan/alloca_instruments_all_paddings.c: Likewise.
	* c-c++-common/asan/alloca_loop_unpoisoning.c: Likewise.
	* c-c++-common/asan/alloca_overflow_partial.c: Likewise.
	* c-c++-common/asan/alloca_overflow_right.c: Likewise.
	* c-c++-common/asan/alloca_safe_access.c: Likewise.
	* c-c++-common/asan/alloca_underflow_left.c: Likewise.

diff --git a/gcc/asan.c b/gcc/asan.c
index 2de1640..236dd23 100644
--- a/gcc/asan.c
+++ b/gcc/asan.c
@@ -55,6 +55,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "langhooks.h"
 #include "cfgloop.h"
 #include "gimple-builder.h"
+#include "gimple-fold.h"
 #include "ubsan.h"
 #include "params.h"
 #include "builtins.h"
@@ -245,6 +246,7 @@ along with GCC; see the file COPYING3.  If not see
 static unsigned HOST_WIDE_INT asan_shadow_offset_value;
 static bool asan_shadow_offset_computed;
 static vec sanitized_sections;
+static tree last_alloca_addr;
 
 /* Set of variable declarations that are going to be guarded by
use-after-scope sanitizer.  */
@@ -529,11 +531,175 @@ get_mem_ref_of_assignment (const gassign *assignment,
   return true;
 }
 
+/* Return address of last allocated dynamic alloca.  */
+
+static tree
+get_last_alloca_addr ()
+{
+  if (last_alloca_addr)
+    return last_alloca_addr;
+
+  last_alloca_addr = create_tmp_reg (ptr_type_node, "last_alloca_addr");
+  gassign *g = gimple_build_assign (last_alloca_addr, null_pointer_node);
+  edge e = single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun));
+  gsi_insert_on_edge_immediate (e, g);
+  return last_alloca_addr;
+}
+
+/* Insert __asan_allocas_unpoison (top, bottom) call after
+   __builtin_stack_restore (new_sp) call.
+   The pseudocode of this routine should look like this:
+ __builtin_stack_restore (new_sp);
+ top = last_alloca_addr;
+ bot = new_sp;
+ __asan_allocas_unpoison (top, bot);
+ last_alloca_addr = new_sp;
+   In general, can't we use new_sp as bot parameter because on some
+   architectures SP has non zero offset from dynamic stack area.  Moreover, on
+   some architectures this offset (STACK_DYNAMIC_OFFSET) becomes known for each
+   particular function only after all callees were expanded to rtl.
+   The most noticeable example is PowerPC{,64}, see
+   http://refspecs.linuxfoundation.org/ELF/ppc64/PPC-elf64abi.html#DYNAM-STACK.
+   To overcome the issue we use following trick: pass new_sp as a second
+   parameter to __asan_allocas_unpoison and rewrite it during expansion with
+   virtual_stack_dynamic_rtx later in expand_asan_emit_allocas_unpoison
+   function.
+*/
+
+static void
+handle_builtin_stack_restore (gcall *call, gimple_stmt_iterator *iter)
+{
+  if (!iter)
+    return;
+
+  tree last_alloca = get_last_alloca_addr ();
+  tree restored_stack = gimple_call_arg (call, 0);
+  tree fn = builtin_decl_implicit (BUILT_IN_ASAN_ALLOCAS_UNPOISON);
+  gimple *g = gimple_build_call (fn, 2, last_alloca, restored_stack);
+  gsi_insert_after (iter, g, GSI_NEW_STMT);
+  g = gimple_build_assign (last_alloca, restored_stack);
+  gsi_insert_after (iter, g, GSI_NEW_STMT);
+}
+
+/* Deploy and poison redzones around __builtin_alloca call.  To do this, we
+   should replace this call with another one with changed parameters and
+   replace all its uses with new address, so
+ addr = __builtin_alloca (old_size, align);
+   is replaced by
+ new_size = old_size + additional_size;
+ tmp = __builtin_alloca (new_size, max (align, 32))
+ addr = tmp + 32 (first 32 bytes are for the left redzone);
+   ADDITIONAL_SIZE is added to make new memory allocation contain not only
+   requested memo

Re: [PATCH 0/7] Support for the SPARC M8 cpu

2017-07-05 Thread Rainer Orth
Hi Jose,

> > This patch series adds support for the SPARC M8 processor to GCC.
> > The SPARC M8 processor implements the Oracle SPARC Architecture 2017.
> [...]
> > Note that full binutils support for M8 was upstreamed in May 19.
> > Bootstrapped and tested in sparc64-linux-gnu.  No regressions.
> 
> since the patch is touching Solaris-specific files, too, please also
> regtest it on Solaris (S12 with Studio 12.6 fbe would be best, I
> suppose).
>
> I tested on sparc-sun-solaris2.12 and it indeed caught a couple of
> typos in the VIS4B built-ins registration in 32-bit mode.  There were no
> regressions.

fine, thanks.

> I am preparing a second version of the patch series that I will be
> submitting today.

Ok.  I was a bit astonished lately to find that a couple of SPARC
patches to binutils and gdb had only been tested (or even only
supported) on Linux/SPARC, not Solaris/SPARC.  However, it's great to see
Oracle finally engage in the community here.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH 2/3] Simplify wrapped binops

2017-07-05 Thread Robin Dapp
> While the initialization value doesn't matter (wi::add will overwrite it)
> better initialize both to false ;)  Ah, you mean because we want to
> transform only if get_range_info returned VR_RANGE.  Indeed somewhat
> unintuitive (but still the best variant for now).

> so I'm still missing a comment on why min_ovf && max_ovf is ok.
> The simple-minded would have written [...]

I suppose it's more a matter of considering too many things at the same
time for me...  I was still thinking of including more cases than
necessary for the regression.  Guess the attached version will do as
well and should not contain any more surprises.  If needed, I'll add
additional cases some time.
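
To illustrate why the overflow check on (A + CST1) matters, here is a small
stand-alone sketch, not taken from the patch. It uses fixed-width unsigned
types so that the wrap-around is well defined. The function names are made
up for this example, and rewrite_is_safe only models the range test for the
single constant CST1 = -1.

```c
#include <assert.h>
#include <stdint.h>

/* ((T)(A + CST1)) + CST2 with a 32-bit A, CST1 = -1 and CST2 = 1.
   The inner subtraction wraps in 32 bits before the widening cast.  */
uint64_t add_inside_cast (uint32_t a)
{
  return (uint64_t) (uint32_t) (a - 1u) + 1u;
}

/* (T)(A) + CST with CST = CST1 + CST2 = 0, i.e. the rewritten form.  */
uint64_t add_outside_cast (uint32_t a)
{
  return (uint64_t) a + 0u;
}

/* For CST1 = -1 the rewrite is only valid when A - 1 cannot wrap for any
   value in the known range [amin, amax], i.e. when amin >= 1.  */
int rewrite_is_safe (uint32_t amin, uint32_t amax)
{
  assert (amin <= amax);
  return amin >= 1u;
}
```

For a = 5 both forms give 5. For a = 0 the original form gives 4294967296
while the rewritten form gives 0, which is why the transformation has to be
guarded by range information.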

Tests in a followup message.

Regards
 Robin
diff --git a/gcc/match.pd b/gcc/match.pd
index 80a17ba..3acf8be 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1290,6 +1290,70 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (if (cst && !TREE_OVERFLOW (cst))
  (plus { cst; } @0
 
+/* ((T)(A + CST1)) + CST2 -> (T)(A) + CST  */
+#if GIMPLE
+  (simplify
+(plus (convert (plus@3 @0 INTEGER_CST@1)) INTEGER_CST@2)
+  (if (INTEGRAL_TYPE_P (type)
+   && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@3)))
+   /* Combine CST1 and CST2 to CST and convert to outer type if
+  (A + CST1)'s range does not overflow.  */
+   (with
+   {
+ tree inner_type = TREE_TYPE (@3);
+ wide_int wmin0, wmax0;
+ wide_int w1 = @1;
+
+ bool ovf_undef = TYPE_OVERFLOW_UNDEFINED (inner_type);
+ bool min_ovf = true, max_ovf = true;
+
+ enum value_range_type vr0 = get_range_info (@0, &wmin0, &wmax0);
+
+ if (ovf_undef || vr0 == VR_RANGE)
+   {
+ if (!ovf_undef && vr0 == VR_RANGE)
+	   {
+		 wi::add (wmin0, w1, TYPE_SIGN (inner_type), &min_ovf);
+		 wi::add (wmax0, w1, TYPE_SIGN (inner_type), &max_ovf);
+	   }
+	 w1 = w1.from (@1, TYPE_PRECISION (type), TYPE_SIGN (inner_type));
+   }
+   }
+   (if (ovf_undef || !(min_ovf || max_ovf))
+(plus (convert @0) { wide_int_to_tree (type, wi::add (w1, @2)); }
+ )
+#endif
+
+/* ((T)(A)) + CST -> (T)(A + CST)  */
+#if GIMPLE
+  (simplify
+   (plus (convert SSA_NAME@0) INTEGER_CST@1)
+(if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
+ && INTEGRAL_TYPE_P (type)
+ && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
+ && int_fits_type_p (@1, TREE_TYPE (@0)))
+ /* Perform binary operation inside the cast if the constant fits
+and (A + CST)'s range does not overflow.  */
+ (with
+  {
+bool min_ovf = true, max_ovf = true;
+tree inner_type = TREE_TYPE (@0);
+
+wide_int w1 = w1.from (@1, TYPE_PRECISION (inner_type), TYPE_SIGN
+  		(inner_type));
+
+wide_int wmin0, wmax0;
+if (get_range_info (@0, &wmin0, &wmax0) == VR_RANGE)
+  {
+wi::add (wmin0, w1, TYPE_SIGN (inner_type), &min_ovf);
+wi::add (wmax0, w1, TYPE_SIGN (inner_type), &max_ovf);
+  }
+  }
+ (if (!min_ovf && !max_ovf)
+  (convert (plus @0 { {wide_int_to_tree (TREE_TYPE (@0), w1)}; })))
+ )))
+#endif
+
   /* ~A + A -> -1 */
   (simplify
(plus:c (bit_not @0) @0)


Re: [PATCH 2/3] Simplify wrapped binops

2017-07-05 Thread Robin Dapp
[3/3] Tests

--

gcc/testsuite/ChangeLog:

2017-07-05  Robin Dapp  

* gcc.dg/wrapped-binop-simplify-signed-1.c: New test.
* gcc.dg/wrapped-binop-simplify-signed-2.c: New test.
* gcc.dg/wrapped-binop-simplify-unsigned-1.c: New test.
* gcc.dg/wrapped-binop-simplify-unsigned-2.c: New test.
diff --git a/gcc/testsuite/gcc.dg/wrapped-binop-simplify-signed-1.c b/gcc/testsuite/gcc.dg/wrapped-binop-simplify-signed-1.c
new file mode 100644
index 000..2571a07
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/wrapped-binop-simplify-signed-1.c
@@ -0,0 +1,65 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-ccp1-details" } */
+/* { dg-final { scan-tree-dump-times "gimple_simplified to" 12 "ccp1" } } */
+
+#include <limits.h>
+
+long foo(int a)
+{
+  return (long)(a - 2) + 1;
+}
+
+long bar(int a)
+{
+  return (long)(a + 3) - 1;
+}
+
+long baz(int a)
+{
+  return (long)(a - 1) + 2;
+}
+
+long baf(int a)
+{
+  return (long)(a + 1) - 2;
+}
+
+long bak(int a)
+{
+  return (long)(a + 1) + 3;
+}
+
+long bal(int a)
+{
+  return (long)(a - 7) - 4;
+}
+
+long bam(int a)
+{
+  return (long)(a - 1) - INT_MAX;
+}
+
+long bam2(int a)
+{
+  return (long)(a + 1) + INT_MAX;
+}
+
+long ban(int a)
+{
+  return (long)(a - 1) + INT_MIN;
+}
+
+long ban2(int a)
+{
+  return (long)(a + 1) - INT_MIN;
+}
+
+unsigned long baq(int a)
+{
+  return (unsigned long)(a + 1) - 1;
+}
+
+unsigned long baq2(int a)
+{
+  return (unsigned long)(a - 2) + 1;
+}
diff --git a/gcc/testsuite/gcc.dg/wrapped-binop-simplify-signed-2.c b/gcc/testsuite/gcc.dg/wrapped-binop-simplify-signed-2.c
new file mode 100644
index 000..5c897ba
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/wrapped-binop-simplify-signed-2.c
@@ -0,0 +1,39 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+#include <assert.h>
+#include <limits.h>
+
+int aa = -3;
+
+__attribute__((noinline))
+long foo (int a)
+{
+  return (long)(a - INT_MIN) + 1;
+}
+
+__attribute__((noinline))
+long foo2 (int a)
+{
+  if (a > -10 && a < 10)
+return (long)(a + 2) - 1;
+}
+
+__attribute__((noinline))
+long foo3 (int a)
+{
+  if (a > -10 && a < 10)
+return (long)(a) - 3;
+}
+
+int main()
+{
+  volatile long h = foo (aa);
+  assert (h == 2147483646);
+
+  volatile long i = foo2 (aa);
+  assert (i == -2);
+
+  volatile long j = foo3 (aa);
+  assert (j == -6);
+}
diff --git a/gcc/testsuite/gcc.dg/wrapped-binop-simplify-unsigned-1.c b/gcc/testsuite/gcc.dg/wrapped-binop-simplify-unsigned-1.c
new file mode 100644
index 000..04a7ca49
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/wrapped-binop-simplify-unsigned-1.c
@@ -0,0 +1,43 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-evrp-details -fdump-tree-ccp2-details -fdump-tree-vrp1-details" } */
+/* { dg-final { scan-tree-dump-times "gimple_simplified to" 2 "evrp" } } */
+/* { dg-final { scan-tree-dump-times "Match-and-simplified" 2 "ccp2" } } */
+/* { dg-final { scan-tree-dump-times "gimple_simplified to" 3 "vrp1" } } */
+
+#include <limits.h>
+
+unsigned long oof2(unsigned int a)
+{
+  if (a > 0)
+return (unsigned long)(a - 1) + 1;
+}
+
+unsigned long bah (unsigned int a)
+{
+  if (a > 0)
+return (unsigned long)(a - 1) - 1;
+}
+
+long baq3(unsigned int a)
+{
+  if (a > 0)
+return (long)(a - 1) + 1;
+}
+
+unsigned long bap(unsigned int a)
+{
+  if (a < UINT_MAX)
+return (unsigned long)(a + 1) + ULONG_MAX;
+}
+
+unsigned long bar3(unsigned int a)
+{
+  if (a < UINT_MAX)
+return (unsigned long)(a + 1) - 5;
+}
+
+unsigned long bar4(unsigned int a)
+{
+  if (a < UINT_MAX)
+return (unsigned long)(a + 1) - 6;
+}
diff --git a/gcc/testsuite/gcc.dg/wrapped-binop-simplify-unsigned-2.c b/gcc/testsuite/gcc.dg/wrapped-binop-simplify-unsigned-2.c
new file mode 100644
index 000..46290e7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/wrapped-binop-simplify-unsigned-2.c
@@ -0,0 +1,125 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+#include <assert.h>
+#include <limits.h>
+
+unsigned int a = 3;
+int aa = 3;
+int bb = 1;
+int cc = 4;
+unsigned int dd = 0;
+unsigned int ee = 4294967294u;
+
+__attribute__((noinline))
+unsigned long foo1 (unsigned int a)
+{
+  return (unsigned long)(UINT_MAX + 1) - 1;
+}
+
+__attribute__((noinline))
+unsigned long foo2 (unsigned int a)
+{
+  if (a < 4)
+return (unsigned long)(a - 4) + 1;
+}
+
+__attribute__((noinline))
+unsigned long foo3 (unsigned int a)
+{
+  if (a > 2)
+return (unsigned long)(a + UINT_MAX - 4) + 2;
+}
+
+__attribute__((noinline))
+unsigned long foo4 (unsigned int a)
+{
+  if (a > 2)
+return (unsigned long)(a - UINT_MAX) + UINT_MAX;
+}
+
+__attribute__((noinline))
+unsigned long foo5 (unsigned int a)
+{
+  if (a > 2)
+return (unsigned long)(a + UINT_MAX) - UINT_MAX;
+}
+
+__attribute__((noinline))
+long foo6 (unsigned int a)
+{
+  if (a > 2)
+return (long)(a - 4) + 1;
+}
+
+__attribute__((noinline))
+long foo7 (unsigned int a)
+{
+  if (a > 2)
+return (long)(a + UINT_MAX) + 1;
+}
+
+__attribute__((noinline))
+unsigned long foo8 (unsigned int a)
+{
+  if (a < 2)
+return (unsigned long)(a 

[RFC][PR 67336][PING^4] Verify pointers during stack unwind

2017-07-05 Thread Yuri Gribov
Hi all,

I've rebased the previous patch to trunk per Andrew's suggestion.
Original patch description/motivation/questions are in
https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01869.html

-Y


safe-unwind-2.patch
Description: Binary data
#include <stdio.h>
#include <string.h>

struct _Unwind_Context;

typedef int (*_Unwind_Trace_Fn)(struct _Unwind_Context *, void *vdata);

extern int _Unwind_Backtrace(_Unwind_Trace_Fn trace, void * trace_argument);
extern int _Unwind_Backtrace_Checked(_Unwind_Trace_Fn trace, void * trace_argument);

#ifdef CHECK_UNWIND
#define _Unwind_Backtrace _Unwind_Backtrace_Checked
#endif

extern void *_Unwind_GetIP (struct _Unwind_Context *context);

int simple_unwind (struct _Unwind_Context *context, void *vdata) {
  printf("Next frame: ");
  void *pc = _Unwind_GetIP(context);
  printf("%p\n", pc);
  return 0;
}

#define noinline __attribute__((noinline))

noinline int foo() {
  // Clobber stack to provoke errors in unwinder
  int x;
  void *p = &x;
  asm("" :: "r"(p));
  memset(p, 0xa, 128);

  printf("After clobbering stack\n");

  int ret = _Unwind_Backtrace(simple_unwind, 0);
  printf("After unwind: %d\n", ret);
  printf("We're going to fail now\n");

  return 0;
}

noinline int bar() {
  int x = foo();
  return x + 1;
}

int main() {
  bar();
  return 0;
}
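
For comparison with the stack-clobbering test above, the following is a
minimal sketch of the existing, unchecked interface on an intact stack. It
assumes the <unwind.h> header shipped with GCC (and Clang); count_frame and
count_stack_frames are names invented for this example. It does not use
_Unwind_Backtrace_Checked, which only exists with the proposed patch.

```c
#include <assert.h>
#include <stddef.h>
#include <unwind.h>

/* Callback invoked once per frame; DATA points to an int counter.  */
static _Unwind_Reason_Code
count_frame (struct _Unwind_Context *context, void *data)
{
  int *count = data;
  assert (count != NULL);
  /* Only count frames for which the unwinder reports a code address.  */
  if (_Unwind_GetIP (context) != 0)
    ++*count;
  return _URC_NO_REASON;   /* Keep walking towards the outermost frame.  */
}

/* Return the number of frames the unwinder can see from here.  */
int
count_stack_frames (void)
{
  int count = 0;
  _Unwind_Backtrace (count_frame, &count);
  return count;
}
```

On a healthy stack this simply walks every frame. The point of the RFC is
what happens when the frame data has been overwritten, which this sketch
deliberately does not exercise.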


Re: [PATCH v3][ASAN] Implement dynamic allocas/VLAs sanitization.

2017-07-05 Thread Jakub Jelinek
On Wed, Jul 05, 2017 at 11:24:15AM +0300, Maxim Ostapenko wrote:
> +   In general, can't we use new_sp as bot parameter because on some

s/can't we/we can't/

> +  /* new_alloca = new_alloca_with_rz + align.  */
> +  g = gimple_build_assign (make_ssa_name (ptr_type), POINTER_PLUS_EXPR,
> +new_alloca_with_rz,
> +build_int_cst (size_type_node,
> +   align / BITS_PER_UNIT));
> +  gsi_insert_before (iter, g, GSI_SAME_STMT);
> +  tree new_alloca = gimple_assign_lhs (g);
> +
> +  /* Replace old alloca ptr with NEW_ALLOCA.  */
> +  replace_call_with_value (iter, new_alloca);
> +
> +  /* Poison newly created alloca redzones:
> +  __asan_alloca_poison (new_alloca, old_size).  */
> +  fn = builtin_decl_implicit (BUILT_IN_ASAN_ALLOCA_POISON);
> +  gg = gimple_build_call (fn, 2, new_alloca, old_size);
> +  gsi_insert_before (iter, gg, GSI_SAME_STMT);
> +
> +  /* Save new_alloca_with_rz value into last_alloca to use it during
> + allocas unpoisoning.  */
> +  g = gimple_build_assign (last_alloca, new_alloca_with_rz);
> +  gsi_insert_before (iter, g, GSI_SAME_STMT);

I think the replace_call_with_value should go only after these two,
so that it matches the order in which the stmts are emitted.
Or maybe better, keep the __builtin_alloca call stmt in the IL,
instead of adding another one, just change its argument and return value,
and add some new stmts before the call and some after the call,
then you wouldn't need to export replace_call_with_value.

Also, the function comment above handle_builtin_alloca describes
only a small portion of the statements you actually emit, can you please
update it so that it also lists __asan_alloca_poison etc.?
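
As a reading aid for the layout that the handle_builtin_alloca comment
describes (new_size = old_size + additional_size, addr = tmp + 32), here is
a rough model in plain C. The 32-byte left redzone comes from that comment.
The padding of the user area up to a 32-byte boundary and the 32-byte right
redzone are assumptions made for this sketch, as are all names in it. The
actual computation in the patch may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { REDZONE_SIZE = 32 };   /* Left redzone size from the comment.  */

struct alloca_layout
{
  size_t new_size;      /* Size passed to the replacement alloca.  */
  size_t user_offset;   /* Offset of the user-visible address.  */
};

/* Model: [left redzone][user data][partial redzone][right redzone].
   The partial and right redzones are assumptions of this sketch.  */
struct alloca_layout
asan_alloca_layout (size_t old_size)
{
  struct alloca_layout l;
  size_t partial = (REDZONE_SIZE - old_size % REDZONE_SIZE) % REDZONE_SIZE;

  assert (old_size <= SIZE_MAX - 3 * REDZONE_SIZE);
  l.user_offset = REDZONE_SIZE;
  l.new_size = REDZONE_SIZE + old_size + partial + REDZONE_SIZE;
  return l;
}
```

Under these assumptions a 10-byte request becomes a 96-byte allocation with
the user pointer 32 bytes in, and a 64-byte request becomes 128 bytes.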

> +/* Emit a call to __asan_allocas_unpoison call in EXP.  Replace second argument
> +   of the call with virtual_stack_dynamic_rtx because in asan pass we emit a
> +   dummy value into second parameter relying on this function to perform the
> +   change.  See motivation for this in comment to handle_builtin_stack_restore
> +   function.  */
> +
> +static rtx
> +expand_asan_emit_allocas_unpoison (tree exp)
> +{
> +  tree arg0 = CALL_EXPR_ARG (exp, 0);
> +  rtx top = expand_expr (arg0, NULL_RTX, GET_MODE (virtual_stack_dynamic_rtx),
> +  EXPAND_NORMAL);
> +  rtx ret = init_one_libfunc ("__asan_allocas_unpoison");
> +  ret = emit_library_call_value (ret, NULL_RTX, LCT_NORMAL, ptr_mode, 2, top,
> +  TYPE_MODE (pointer_sized_int_node),
> +  virtual_stack_dynamic_rtx,
> +  TYPE_MODE (pointer_sized_int_node));

I think another possibility to implement this would be
  CALL_EXPR_ARG (exp, 1)
= make_tree (pointer_sized_int_mode, virtual_stack_dynamic_rtx);
  return expand_call (exp, const0_rtx, 1);

Otherwise LGTM.

Jakub


Re: [PATCH][Aarch64] Add support for overflow add and sub operations

2017-07-05 Thread Richard Earnshaw (lists)
On 19/05/17 22:11, Michael Collison wrote:
> Christophe,
> 
> I had a typo in the two test cases: "addcs" should have been "adcs". I caught 
> this previously but submitted the previous patch incorrectly. Updated patch 
> attached.
> 
> Okay for trunk?
> 

Apologies for the delay responding, I've been procrastinating over this
one.   In part it's due to the size of the patch with very little
top-level description of what's the motivation and overall approach to
the problem.

It would really help review if this could be split into multiple patches
with a description of what each stage achieves.

Anyway, there are a couple of obvious formatting issues to deal with
first, before we get into the details of the patch.

> -Original Message-
> From: Christophe Lyon [mailto:christophe.l...@linaro.org] 
> Sent: Friday, May 19, 2017 3:59 AM
> To: Michael Collison 
> Cc: gcc-patches@gcc.gnu.org; nd 
> Subject: Re: [PATCH][Aarch64] Add support for overflow add and sub operations
> 
> Hi Michael,
> 
> 
> On 19 May 2017 at 07:12, Michael Collison  wrote:
>> Hi,
>>
>> This patch improves code generation for builtin arithmetic overflow 
>> operations for the aarch64 backend. As an example, for a simple test case 
>> such as:
>>
>> int
>> f (int x, int y, int *ovf)
>> {
>>   int res;
>>   *ovf = __builtin_sadd_overflow (x, y, &res);
>>   return res;
>> }
>>
>> Current trunk at -O2 generates
>>
>> f:
>> mov w3, w0
>> mov w4, 0
>> add w0, w0, w1
>> tbnz w1, #31, .L4
>> cmp w0, w3
>> blt .L3
>> .L2:
>> str w4, [x2]
>> ret
>> .p2align 3
>> .L4:
>> cmp w0, w3
>> ble .L2
>> .L3:
>> mov w4, 1
>> b   .L2
>>
>>
>> With the patch this now generates:
>>
>> f:
>> adds w0, w0, w1
>> cset w1, vs
>> str w1, [x2]
>> ret
>>
>>
>> Original patch from Richard Henderson:
>>
>> https://gcc.gnu.org/ml/gcc-patches/2016-01/msg01903.html
>>
>>
>> Okay for trunk?
>>
>> 2017-05-17  Michael Collison  
>> Richard Henderson 
>>
>> * config/aarch64/aarch64-modes.def (CC_V): New.
>> * config/aarch64/aarch64-protos.h
>> (aarch64_add_128bit_scratch_regs): Declare
>> (aarch64_add_128bit_scratch_regs): Declare.
>> (aarch64_expand_subvti): Declare.
>> (aarch64_gen_unlikely_cbranch): Declare
>> * config/aarch64/aarch64.c (aarch64_select_cc_mode): Test
>> for signed overflow using CC_Vmode.
>> (aarch64_get_condition_code_1): Handle CC_Vmode.
>> (aarch64_gen_unlikely_cbranch): New function.
>> (aarch64_add_128bit_scratch_regs): New function.
>> (aarch64_subv_128bit_scratch_regs): New function.
>> (aarch64_expand_subvti): New function.
>> * config/aarch64/aarch64.md (addv4, uaddv4): New.
>> (addti3): Create simpler code if low part is already known to be 0.
>> (addvti4, uaddvti4): New.
>> (*add3_compareC_cconly_imm): New.
>> (*add3_compareC_cconly): New.
>> (*add3_compareC_imm): New.
>> (*add3_compareC): Rename from add3_compare1; do not
>> handle constants within this pattern.
>> (*add3_compareV_cconly_imm): New.
>> (*add3_compareV_cconly): New.
>> (*add3_compareV_imm): New.
>> (add3_compareV): New.
>> (add3_carryinC, add3_carryinV): New.
>> (*add3_carryinC_zero, *add3_carryinV_zero): New.
>> (*add3_carryinC, *add3_carryinV): New.
>> (subv4, usubv4): New.
>> (subti): Handle op1 zero.
>> (subvti4, usub4ti4): New.
>> (*sub3_compare1_imm): New.
>> (sub3_carryinCV): New.
>> (*sub3_carryinCV_z1_z2, *sub3_carryinCV_z1): New.
>> (*sub3_carryinCV_z2, *sub3_carryinCV): New.
>> * testsuite/gcc.target/arm/builtin_sadd_128.c: New testcase.
>> * testsuite/gcc.target/arm/builtin_saddl.c: New testcase.
>> * testsuite/gcc.target/arm/builtin_saddll.c: New testcase.
>> * testsuite/gcc.target/arm/builtin_uadd_128.c: New testcase.
>> * testsuite/gcc.target/arm/builtin_uaddl.c: New testcase.
>> * testsuite/gcc.target/arm/builtin_uaddll.c: New testcase.
>> * testsuite/gcc.target/arm/builtin_ssub_128.c: New testcase.
>> * testsuite/gcc.target/arm/builtin_ssubl.c: New testcase.
>> * testsuite/gcc.target/arm/builtin_ssubll.c: New testcase.
>> * testsuite/gcc.target/arm/builtin_usub_128.c: New testcase.
>> * testsuite/gcc.target/arm/builtin_usubl.c: New testcase.
>> * testsuite/gcc.target/arm/builtin_usubll.c: New testcase.
> 
> I've tried your patch, and 2 of the new tests FAIL:
> gcc.target/aarch64/builtin_sadd_128.c scan-assembler addcs
> gcc.target/aarch64/builtin_uadd_128.c scan-assembler addcs
> 
> Am I missing something?
> 
> Thanks,
> 
> Christophe
> 
> 
>
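
For readers following the overflow example quoted above, here is a minimal sketch of what the test case computes. It assumes a compiler that provides __builtin_sadd_overflow (GCC 5 or later, or Clang); sadd_overflows and demo are illustrative names that do not appear in the patch.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Same shape as the quoted test case: add and report signed overflow.  */
static int
f (int x, int y, int *ovf)
{
  int res;
  *ovf = __builtin_sadd_overflow (x, y, &res);
  return res;
}

/* Portable reference check (illustrative helper): x + y overflows int
   exactly when the exact sum lies outside [INT_MIN, INT_MAX].  */
static bool
sadd_overflows (int x, int y)
{
  if (y > 0)
    return x > INT_MAX - y;
  return x < INT_MIN - y;
}

/* The builtin and the reference check should agree.  */
static void
demo (void)
{
  int ovf;

  f (INT_MAX, INT_MAX, &ovf);
  assert ((ovf != 0) == sadd_overflows (INT_MAX, INT_MAX));

  f (-5, 7, &ovf);
  assert ((ovf != 0) == sadd_overflows (-5, 7));
}
```

The overflow condition is what the V flag carries after an adds instruction, which is why the shorter sequence in the quoted message can read it back with cset.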

Re: [PATCH][AArch64] Fix ILP32 memory access

2017-07-05 Thread Wilco Dijkstra
Andrew Pinski wrote:
>
> This looks related to PR 80266 in that one was crashing due to the
> store pair instruction like what was reported.

Yes, it's the same bug. I've now finally reproduced it: it seems many stack
addresses in Ada are SImode, which is incorrect (and can ultimately trigger
the LDP assertion, as it did with -mcmodel=large). I don't see any checks in
the backend that enforce Pmode for addresses; for example,
aarch64_classify_address doesn't reject SImode addresses...

Wilco

Re: [patch,avr] Add support for devices with flash accessible by LD.

2017-07-05 Thread Georg-Johann Lay

On 05.07.2017 10:17, Georg-Johann Lay wrote:

On 04.07.2017 20:11, Richard Sandiford wrote:

Georg-Johann Lay  writes:

Hi,

This patch adds support for devices that can access flash memory
by LD* instructions, hence there is no need to put .rodata in RAM.

The default linker script for the new multilib versions already
supports this feature, it's similar to avrtiny, cf.

https://sourceware.org/PR21472

This patch does the following:

* Add multilib variants avrxmega3 and avrxmega3/short-calls.

* Add new option -mshort-calls for multilib selection between
devices with <= 8KiB flash and > 8KiB flash.

* Add specs handling for -mshort-calls:  The compiler knows
if this option is needed or not appropriate (similar to -msp8).

* Add new ISA feature AVR_ISA_RCALL for multilib selection
via -mshort-calls.

* Add a new row to architecture description that contains the
start address of flash memory in the RAM address range.
(The actual value is not needed).

* For devices with flash in RAM space, don't let .rodata
objects trigger need for __do_copy_data.

* Add some devices.

* Add configure test for Binutils PR21472.


Sorry if this has already been discussed, but it's useful to be
able to do things like:

   .../configure --target=avr-elf --with-cpu=arc700
   make -j... all-gcc

as a basic sanity test of a pan-target patch.  (I usually do
before-and-after assembly comparisons too if no changes are
expected.)  The way the configure test is written means that
it's no longer possible to do this without first building a
trunk version of binutils for avr-elf.

Thanks,
Richard


Okay, I already thought of a less aggressive approach, I'll
try to address it soon.


Is the following addendum in order?

The avr maintainers appear to have been offline for several weeks already;
maybe a global maintainer can have a look and approve it for trunk?


Johann

gcc/
PR target/81072
* configure.ac [target=avr]: WARN instead of ERROR if avrxmega3
.rodata in flash test fails.
: Define if test passes.
* configure: Regenerate.
* config.in: Regenerate.
* config/avr/avr.c (avr_asm_named_section)
[HAVE_LD_AVR_AVRXMEGA3_RODATA_IN_FLASH]: Only trigger
__do_copy_data for stuff in .rodata if flash_pm_offset = 0.
(avr_asm_init_sections): Same.
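
To make the intended behaviour concrete, here is a small stand-alone model of the .rodata decision described in this ChangeLog. It is only a sketch: has_prefix plays the role of STR_PREFIX_P, and the ld_rodata_in_flash parameter is a hypothetical run-time stand-in for the configure-time macro HAVE_LD_AVR_AVRXMEGA3_RODATA_IN_FLASH.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* True if NAME starts with PREFIX.  */
static bool
has_prefix (const char *name, const char *prefix)
{
  return strncmp (name, prefix, strlen (prefix)) == 0;
}

/* Does a read-only section called NAME require __do_copy_data?
   FLASH_PM_OFFSET is the address at which flash is visible in the RAM
   address space, or 0 if it is not mapped there.  LD_RODATA_IN_FLASH
   says whether the linker script really leaves .rodata in flash.  */
static bool
rodata_needs_copy_data (const char *name, unsigned flash_pm_offset,
                        bool ld_rodata_in_flash)
{
  assert (name != NULL);

  /* Only rely on the flash mapping if the linker cooperates.  */
  if (ld_rodata_in_flash && flash_pm_offset != 0)
    return false;

  return (has_prefix (name, ".rodata")
          || has_prefix (name, ".gnu.linkonce.r"));
}
```

With an old linker (ld_rodata_in_flash false) the model falls back to the behaviour for devices without flash in the RAM address space, which is the point of warning instead of erroring out in configure.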

Index: config/avr/avr.c
===
--- config/avr/avr.c	(revision 249982)
+++ config/avr/avr.c	(working copy)
@@ -10001,7 +10001,9 @@ avr_asm_init_sections (void)
  resp. `avr_need_copy_data_p'.  If flash is not mapped to RAM then
  we have also to track .rodata because it is located in RAM then.  */
 
+#if defined HAVE_LD_AVR_AVRXMEGA3_RODATA_IN_FLASH
   if (0 == avr_arch->flash_pm_offset)
+#endif
 readonly_data_section->unnamed.callback = avr_output_data_section_asm_op;
   data_section->unnamed.callback = avr_output_data_section_asm_op;
   bss_section->unnamed.callback = avr_output_bss_section_asm_op;
@@ -10037,7 +10039,10 @@ avr_asm_named_section (const char *name,
 || STR_PREFIX_P (name, ".gnu.linkonce.d"));
 
   if (!avr_need_copy_data_p
-  && 0 == avr_arch->flash_pm_offset)
+#if defined HAVE_LD_AVR_AVRXMEGA3_RODATA_IN_FLASH
+  && 0 == avr_arch->flash_pm_offset
+#endif
+  )
 avr_need_copy_data_p = (STR_PREFIX_P (name, ".rodata")
 || STR_PREFIX_P (name, ".gnu.linkonce.r"));
 
Index: config.in
===
--- config.in	(revision 249982)
+++ config.in	(working copy)
@@ -1460,6 +1460,13 @@ that are supported for each access macro
 #endif
 
 
+/* Define if your default avr linker script for avrxmega3 leaves .rodata in
+   flash. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_LD_AVR_AVRXMEGA3_RODATA_IN_FLASH
+#endif
+
+
 /* Define if your linker supports -z bndplt */
 #ifndef USED_FOR_TARGET
 #undef HAVE_LD_BNDPLT_SUPPORT
Index: configure
===
--- configure	(revision 249982)
+++ configure	(working copy)
@@ -24851,29 +24851,32 @@ EOF
   ac_status=$?
   $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
   test $ac_status = 0; }; }
-if test -f conftest.nm
+if test -s conftest.nm
 then
 	if grep ' R xxvaryy' conftest.nm > /dev/null; then
 	{ $as_echo "$as_me:${as_lineno-$LINENO}: result: yes" >&5
 $as_echo "yes" >&6; }
-	rm -f conftest.s conftest.o conftest.elf conftest.nm
+
+$as_echo "#define HAVE_LD_AVR_AVRXMEGA3_RODATA_IN_FLASH 1" >>confdefs.h
+
 	else
 	{ $as_echo "$as_me:${as_lineno-$LINENO}: result: no: avrxmega3 .rodata located in RAM" >&5
 $as_echo "no: avrxmega3 .rodata located in RAM" >&6; }
 	echo "$as_me: nm output was" >&5
 	cat conftest.nm >&5
-	rm -f conftest.s conftest.o conftest.elf conftest.nm
 	avr_ld_ver="`$gcc_cv_ld -v | sed -e 's:^.* ::'`"
-	as_fn_error "support for avrxmega3 needs Bi

Re: [patch,avr][Ping #3] PR81075: Move jump-tables out of .text

2017-07-05 Thread Georg-Johann Lay

Ping #3

http://gcc.gnu.org/ml/gcc-patches/2017-06/msg01029.html

As avr maintainers are off-line, would a global maintainer have
a look at this?

Thanks,

Johann



On 27.06.2017 12:01, Georg-Johann Lay wrote:

Ping #2

http://gcc.gnu.org/ml/gcc-patches/2017-06/msg01029.html

On 14.06.2017 14:03, Georg-Johann Lay wrote:

Hi,

Since PR71151 we have jump-tables in .text so that branches
crossing the tables have longer offsets than needed.

This moves jump-tables out of .text again, not into
.progmem.gcc_sw_tables like before PR71151, but into
the existing, currently unused .jumptables.

Since PR63223 there is no restriction on the location
of jump-tables, they can even reside above 128KiB without
problems.

Also adds -mlog=insn_addresses to dump insn addresses
as asm comments before respective instruction.

The patch implements ASM_OUTPUT_ADDR_VEC so that avr.c
gains full control over the table generation.

Tested on ATmega2560.

Ok to apply?

Johann


gcc/
 Move jump-tables out of .text again.

 PR target/81075
 * config/avr/avr.c (ASM_OUTPUT_ADDR_VEC_ELT): Remove function.
 (ASM_OUTPUT_ADDR_VEC): New function.
 (avr_adjust_insn_length) [JUMP_TABLE_DATA_P]: Return 0.
 (avr_final_prescan_insn) [avr_log.insn_addresses]: Dump
 INSN_ADDRESSes as asm comment.
 * config/avr/avr.h (JUMP_TABLES_IN_TEXT_SECTION): Adjust comment.
 (ASM_OUTPUT_ADDR_VEC_ELT): Remove define.
 (ASM_OUTPUT_ADDR_VEC): Define to avr_output_addr_vec.
 * config/avr/avr.md (*tablejump): Adjust comment.
 * config/avr/elf.h (ASM_OUTPUT_BEFORE_CASE_LABEL): Remove.
 * config/avr/avr-log.c (avr_log_set_avr_log) :
 New detail.
 * config/avr/avr-protos.h (avr_output_addr_vec_elt): Remove proto.
 (avr_output_addr_vec): New proto.
 (avr_log_t) : New field.







Re: [PATCH v3][ASAN] Implement dynamic allocas/VLAs sanitization.

2017-07-05 Thread Maxim Ostapenko

On 05/07/17 12:34, Jakub Jelinek wrote:

On Wed, Jul 05, 2017 at 11:24:15AM +0300, Maxim Ostapenko wrote:

+   In general, can't we use new_sp as bot parameter because on some

s/can't we/we can't/


+  /* new_alloca = new_alloca_with_rz + align.  */
+  g = gimple_build_assign (make_ssa_name (ptr_type), POINTER_PLUS_EXPR,
+  new_alloca_with_rz,
+  build_int_cst (size_type_node,
+ align / BITS_PER_UNIT));
+  gsi_insert_before (iter, g, GSI_SAME_STMT);
+  tree new_alloca = gimple_assign_lhs (g);
+
+  /* Replace old alloca ptr with NEW_ALLOCA.  */
+  replace_call_with_value (iter, new_alloca);
+
+  /* Poison newly created alloca redzones:
+  __asan_alloca_poison (new_alloca, old_size).  */
+  fn = builtin_decl_implicit (BUILT_IN_ASAN_ALLOCA_POISON);
+  gg = gimple_build_call (fn, 2, new_alloca, old_size);
+  gsi_insert_before (iter, gg, GSI_SAME_STMT);
+
+  /* Save new_alloca_with_rz value into last_alloca to use it during
+ allocas unpoisoning.  */
+  g = gimple_build_assign (last_alloca, new_alloca_with_rz);
+  gsi_insert_before (iter, g, GSI_SAME_STMT);

I think the replace_call_with_value should go only after these two,
so that it matches the order in which the stmts are emitted.
Or maybe better, keep the __builtin_alloca call stmt in the IL
instead of adding another one, just change its argument and return value,
and add some new stmts before the call and some after the call,
then you wouldn't need to export replace_call_with_value.


But won't we need to replace all alloca uses manually in this case? E.g. 
to change str.1_18 value to _27?


  str.1_18 = __builtin_alloca_with_align (_16, 256);
  *str.1_18[index_19(D)] ={v} 49;

to

  _26 = __builtin_alloca_with_align (_25, 256);
  _27 = _26 + 32;
  __builtin___asan_alloca_poison (_27, _16);
  last_alloca_addr.4_32 = _26;
  str.1_18 = _27;
  *str.1_18[index_19(D)] ={v} 49;
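
The "+ 32" in the dump above follows from the alignment argument: __builtin_alloca_with_align takes its alignment in bits, and the quoted hunk adds align / BITS_PER_UNIT bytes to the padded allocation. A minimal sketch of that arithmetic, assuming 8-bit bytes; user_address and demo are hypothetical helpers, not functions from the patch:

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 8-bit bytes, as on the usual GCC targets.  */
#define BITS_PER_UNIT 8

/* Address handed to the user, given the start of the padded block
   (new_alloca_with_rz in the patch) and the alignment in bits.  */
static uintptr_t
user_address (uintptr_t padded_base, unsigned align_bits)
{
  assert (align_bits % BITS_PER_UNIT == 0);
  return padded_base + align_bits / BITS_PER_UNIT;
}

static void
demo (void)
{
  /* Alignment 256 bits gives the 32-byte offset seen in the dump.  */
  assert (user_address (0x1000, 256) - 0x1000 == 32);
}
```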



Also, the function comment above handle_builtin_alloca describes
only a small portion of the statements you actually emit; can you please
update it so that it also lists __asan_alloca_poison etc.?


+/* Emit a call to __asan_allocas_unpoison call in EXP.  Replace second argument
+   of the call with virtual_stack_dynamic_rtx because in asan pass we emit a
+   dummy value into second parameter relying on this function to perform the
+   change.  See motivation for this in comment to handle_builtin_stack_restore
+   function.  */
+
+static rtx
+expand_asan_emit_allocas_unpoison (tree exp)
+{
+  tree arg0 = CALL_EXPR_ARG (exp, 0);
+  rtx top = expand_expr (arg0, NULL_RTX, GET_MODE (virtual_stack_dynamic_rtx),
+EXPAND_NORMAL);
+  rtx ret = init_one_libfunc ("__asan_allocas_unpoison");
+  ret = emit_library_call_value (ret, NULL_RTX, LCT_NORMAL, ptr_mode, 2, top,
+TYPE_MODE (pointer_sized_int_node),
+virtual_stack_dynamic_rtx,
+TYPE_MODE (pointer_sized_int_node));

I think another possibility to implement this would be
   CALL_EXPR_ARG (exp, 1)
 = make_tree (pointer_sized_int_mode, virtual_stack_dynamic_rtx);
   return expand_call (exp, const0_rtx, 1);

Otherwise LGTM.

Jakub







Re: [PATCH v3][ASAN] Implement dynamic allocas/VLAs sanitization.

2017-07-05 Thread Jakub Jelinek
On Wed, Jul 05, 2017 at 01:19:27PM +0300, Maxim Ostapenko wrote:
> But won't we need to replace all alloca uses manually in this case? E.g. to
> change str.1_18 value to _27?
> 
>   str.1_18 = __builtin_alloca_with_align (_16, 256);
>   *str.1_18[index_19(D)] ={v} 49;
> 
> to
> 
>   _26 = __builtin_alloca_with_align (_25, 256);
>   _27 = _26 + 32;
>   __builtin___asan_alloca_poison (_27, _16);
>   last_alloca_addr.4_32 = _26;
>   str.1_18 = _27;
>   *str.1_18[index_19(D)] ={v} 49;

You could do that, e.g. using replace_uses_by.

Or you could save the lhs from the old __builtin_alloca*,
gimple_call_set_lhs to a new SSA_NAME, use that in the following stmt and
use the old lhs as the lhs of that.  If needed update SSA_NAME_DEF_STMT
(lhs) if the functions don't do it for you (but I think they should).

Anyway, this is not a strong requirement. The most important thing is to
fix the comments, then move the replace_call_with_value call; for the rest,
just try whether it works and, if it doesn't, keep what you have.

Jakub


Re: [patch,avr] Add support for devices with flash accessible by LD.

2017-07-05 Thread Richard Sandiford
Georg-Johann Lay  writes:
> On 05.07.2017 10:17, Georg-Johann Lay wrote:
>> On 04.07.2017 20:11, Richard Sandiford wrote:
>>> Georg-Johann Lay  writes:
>>>> Hi,
>>>>
>>>> This patch adds support for devices that can access flash memory
>>>> by LD* instructions, hence there is no need to put .rodata in RAM.
>>>>
>>>> The default linker script for the new multilib versions already
>>>> supports this feature, it's similar to avrtiny, cf.
>>>>
>>>> https://sourceware.org/PR21472
>>>>
>>>> This patch does the following:
>>>>
>>>> * Add multilib variants avrxmega3 and avrxmega3/short-calls.
>>>>
>>>> * Add new option -mshort-calls for multilib selection between
>>>> devices with <= 8KiB flash and > 8KiB flash.
>>>>
>>>> * Add specs handling for -mshort-calls:  The compiler knows
>>>> if this option is needed or not appropriate (similar to -msp8).
>>>>
>>>> * Add new ISA feature AVR_ISA_RCALL for multilib selection
>>>> via -mshort-calls.
>>>>
>>>> * Add a new row to architecture description that contains the
>>>> start address of flash memory in the RAM address range.
>>>> (The actual value is not needed).
>>>>
>>>> * For devices with flash in RAM space, don't let .rodata
>>>> objects trigger need for __do_copy_data.
>>>>
>>>> * Add some devices.
>>>>
>>>> * Add configure test for Binutils PR21472.
>>>
>>> Sorry if this has already been discussed, but it's useful to be
>>> able to do things like:
>>>
>>>    .../configure --target=avr-elf --with-cpu=arc700
>>>    make -j... all-gcc
>>>
>>> as a basic sanity test of a pan-target patch.  (I usually do
>>> before-and-after assembly comparisons too if no changes are
>>> expected.)  The way the configure test is written means that
>>> it's no longer possible to do this without first building a
>>> trunk version of binutils for avr-elf.
>>>
>>> Thanks,
>>> Richard
>> 
>> Okay, I already thought of a less aggressive approach, I'll
>> try to address it soon.
>
> Is the following addendum in order?
>
> The avr maintainers appear to be offline since several weeks already,
> maybe a global maintainer can have a look and approve it for trunk?

Thanks for doing this.  LGTM (though obviously I can't approve)

Richard


Re: [PATHC][x86] Scalar mask and round RTL templates

2017-07-05 Thread Kirill Yukhin
On 05 Jul 06:38, Peryt, Sebastian wrote:
> Hi Kirill,
> 
> Sorry for this confusion. I meant to write MDs for intrinsics. Those 
> intrinsics are all masked ones for ADD[SD,SS], SUB[SD,SS], MUL[SD,SS], 
> DIV[SD,SS],
> MIN[SD,SS] and MAX[SD,SS]. What I found is that for mask equal 0 they were 
> producing wrong results when old mask meta-template was used.
What you're talking about looks like a bug. Could you pls add a regression test
to your patch?

> Modified changelog below.
> 
> 2017-07-05  Sebastian Peryt  
> 
> gcc/
>   * config/i386/subst.md (mask_scalar, round_scalar, 
> round_saeonly_scalar): New meta-templates.
>   (mask_scalar_name, mask_scalar_operand3, round_scalar_name,
>   round_scalar_mask_operand3, round_scalar_mask_op3,
>   round_scalar_constraint, round_scalar_prefix, round_saeonly_scalar_name,
>   round_saeonly_scalar_mask_operand3, round_saeonly_scalar_mask_op3,
>   round_saeonly_scalar_constraint, round_saeonly_scalar_prefix): New 
> subst attribute.
>   * config/i386/sse.md
>   (_vm3): Renamed to ...
>   _vm3 
> ... this.
>   (_vm3): Renamed to 
> ...
>   _vm3 
> ... this.
>   (_vm3): Renamed to ...
>   _vm3 ... 
> this.
>   (v\t{%2, %1, 
> %0|
>   %0, %1, %2}): Changed to ...
>   v\t{%2, 
> %1, %0|
>   %0, %1, %2} ... this.
>   (v\t{%2, %1, 
> %0|
>   %0, %1, %2}): Changed to ...
>   v\t{%2, 
> %1, %0|
>   %0, %1, %2} ... this.
>   (v\t{%2, %1, 
> %0|
>   %0, %1, %2}): Changed to 
> ...
>   
> v\t{%2, %1, 
> %0|
>   %0, %1, %2} 
> ... this.
Max line length is 79 characters I suppose.

--
Thanks, K
> 
> Is it ok for trunk?
> 
> Thanks,
> Sebastian
> 
> -Original Message-
> From: Kirill Yukhin [mailto:kirill.yuk...@gmail.com] 
> Sent: Tuesday, July 4, 2017 7:45 PM
> To: Peryt, Sebastian 
> Cc: gcc-patches@gcc.gnu.org; Uros Bizjak 
> Subject: Re: [PATHC][x86] Scalar mask and round RTL templates
> 
> Hello Sebastian,
> On 23 Jun 09:00, Peryt, Sebastian wrote:
> > Hi,
> > 
> > This patch adds three extra RTL meta-templates for scalar round and mask. 
> > Additionally fixes errors caused by previous mask and round usage in some 
> > of the intrinsics that I found.
> Could you pls point which intrinsics did you fixed (or which errors)?
> I see only MD changes in your patch.
> 
> > 
> > 2017-06-23  Sebastian Peryt  
> > 
> > gcc/
> > * config/i386/subst.md (mask_scalar, round_scalar, 
> > round_saeonly_scalar): New templates.
> I'd call it meta-templates.
> > (mask_scalar_name, mask_scalar_operand3, round_scalar_name,
> > round_scalar_mask_operand3, round_scalar_mask_op3,
> > round_scalar_constraint, round_scalar_prefix, round_saeonly_scalar_name,
> > round_saeonly_scalar_mask_operand3, round_saeonly_scalar_mask_op3,
> > round_saeonly_scalar_constraint, round_saeonly_scalar_prefix): New 
> > subst attribute.
> > * config/i386/sse.md
> > (_vm3): Renamed to ...
> > _vm3 
> > ... this.
> > (_vm3): Renamed to 
> > ...
> > _vm3 
> > ... this.
> > (_vm3): Renamed to ...
> > _vm3 ... 
> > this.
> > (v\t{%2, %1, 
> > %0|%0, %1, %2}): 
> > Changed to ...
> > v\t{%2, 
> > %1, %0|%0, %1, 
> > %2} ... this.
> > (v\t{%2, %1, 
> > %0|%0, %1, %2}): 
> > Changed to ...
> > v\t{%2, 
> > %1, %0|%0, %1, 
> > %2} ... this.
> > (v\t{%2, %1, 
> > %0|%0, %1, 
> > %2}): Changed to ...
> > 
> > v\t{%2, 
> > %1, %0|%0, %1, 
> > %2} ... this.
> We need to obey conventions. Pls break long lines here.
> 
> --
> Thanks, K
> > 
> > Is it ok for trunk?
> > 
> > Thanks,
> > Sebastian
> 
> 


Re: [PATCH 0/7] Support for the SPARC M8 cpu

2017-07-05 Thread Jose E. Marchesi

> I am preparing a second version of the patch series that I will be
> submitting today.

Ok.  I was a bit astonished lately to find that a couple of SPARC
patches to binutils and gdb had only been tested (or even only
supported) on Linux/SPARC, not Solaris/SPARC.

For binutils, I always try to test all our patches with
--enable-targets=all, plus:

For changes in sparc code:

SPARC_TARGETS="sparc-aout sparc-linux sparc-vxworks sparc64-linux
   sparc-sun-solaris2.12"

For changes in common code, the above plus:

X86_TARGETS="i386-darwin i386-lynxos i586-linux i686-nacl i686-pc-beos
 i686-pc-elf i686-pe i686-vxworks x86_64-linux
 x86_64-w64-mingw32 x86_64-nacl "

MIPS_TARGETS="mips-linux mips-vxworks mips64-linux mipsel-linux-gnu
  mipsisa32el-linux mips64-openbsd mipstx39-elf"

For big extensive changes in common code, the above plus around 80 more
targets covering powerpc, arm, alpha, arc, vax, etc.

That said, we should probably do more native testing on Solaris/SPARC to
avoid breakage on that platform... we will look into improving that :)


Re: [PATCH v3][ASAN] Implement dynamic allocas/VLAs sanitization.

2017-07-05 Thread Maxim Ostapenko
Ok, I've fixed comments (not sure the note about optimization is 
well-formatted) and moved replace_call_with_value.

Looks better now?

On 05/07/17 13:28, Jakub Jelinek wrote:

Anyway, this is not a strong requirement, the most important is to
fix the comments, then move the replace_call_with_value call, the rest
is just try if it works and if it doesn't, keep what you have.


gcc/ChangeLog:

2017-07-05  Maxim Ostapenko  

	* asan.c: Include gimple-fold.h.
	(get_last_alloca_addr): New function.
	(handle_builtin_stack_restore): Likewise.
	(handle_builtin_alloca): Likewise.
	(asan_emit_allocas_unpoison): Likewise.
	(get_mem_refs_of_builtin_call): Add new parameter, remove const
	qualifier from first parameter. Handle BUILT_IN_ALLOCA,
	BUILT_IN_ALLOCA_WITH_ALIGN and BUILT_IN_STACK_RESTORE builtins.
	(instrument_builtin_call): Pass gimple iterator to
	get_mem_refs_of_builtin_call.
	(last_alloca_addr): New global.
	* asan.h (asan_emit_allocas_unpoison): Declare.
	* builtins.c (expand_asan_emit_allocas_unpoison): New function.
	(expand_builtin): Handle BUILT_IN_ASAN_ALLOCAS_UNPOISON.
	* cfgexpand.c (expand_used_vars): Call asan_emit_allocas_unpoison
	if function calls alloca.
	* gimple-fold.c (replace_call_with_value): Remove static keyword.
	* gimple-fold.h (replace_call_with_value): Declare.
	* internal-fn.c: Include asan.h.
	* sanitizer.def (BUILT_IN_ASAN_ALLOCA_POISON,
	BUILT_IN_ASAN_ALLOCAS_UNPOISON): New builtins.

gcc/testsuite/ChangeLog:

2017-07-05  Maxim Ostapenko  

	* c-c++-common/asan/alloca_big_alignment.c: New test.
	* c-c++-common/asan/alloca_detect_custom_size.c: Likewise.
	* c-c++-common/asan/alloca_instruments_all_paddings.c: Likewise.
	* c-c++-common/asan/alloca_loop_unpoisoning.c: Likewise.
	* c-c++-common/asan/alloca_overflow_partial.c: Likewise.
	* c-c++-common/asan/alloca_overflow_right.c: Likewise.
	* c-c++-common/asan/alloca_safe_access.c: Likewise.
	* c-c++-common/asan/alloca_underflow_left.c: Likewise.

diff --git a/gcc/asan.c b/gcc/asan.c
index 2de1640..3ec7341 100644
--- a/gcc/asan.c
+++ b/gcc/asan.c
@@ -55,6 +55,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "langhooks.h"
 #include "cfgloop.h"
 #include "gimple-builder.h"
+#include "gimple-fold.h"
 #include "ubsan.h"
 #include "params.h"
 #include "builtins.h"
@@ -245,6 +246,7 @@ along with GCC; see the file COPYING3.  If not see
 static unsigned HOST_WIDE_INT asan_shadow_offset_value;
 static bool asan_shadow_offset_computed;
 static vec sanitized_sections;
+static tree last_alloca_addr;
 
 /* Set of variable declarations that are going to be guarded by
use-after-scope sanitizer.  */
@@ -529,11 +531,186 @@ get_mem_ref_of_assignment (const gassign *assignment,
   return true;
 }
 
+/* Return address of last allocated dynamic alloca.  */
+
+static tree
+get_last_alloca_addr ()
+{
+  if (last_alloca_addr)
+return last_alloca_addr;
+
+  last_alloca_addr = create_tmp_reg (ptr_type_node, "last_alloca_addr");
+  gassign *g = gimple_build_assign (last_alloca_addr, null_pointer_node);
+  edge e = single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun));
+  gsi_insert_on_edge_immediate (e, g);
+  return last_alloca_addr;
+}
+
+/* Insert __asan_allocas_unpoison (top, bottom) call after
+   __builtin_stack_restore (new_sp) call.
+   The pseudocode of this routine should look like this:
+ __builtin_stack_restore (new_sp);
+ top = last_alloca_addr;
+ bot = new_sp;
+ __asan_allocas_unpoison (top, bot);
+ last_alloca_addr = new_sp;
+   In general, we can't use new_sp as bot parameter because on some
+   architectures SP has non zero offset from dynamic stack area.  Moreover, on
+   some architectures this offset (STACK_DYNAMIC_OFFSET) becomes known for each
+   particular function only after all callees were expanded to rtl.
+   The most noticeable example is PowerPC{,64}, see
+   http://refspecs.linuxfoundation.org/ELF/ppc64/PPC-elf64abi.html#DYNAM-STACK.
+   To overcome the issue we use following trick: pass new_sp as a second
+   parameter to __asan_allocas_unpoison and rewrite it during expansion with
+   virtual_stack_dynamic_rtx later in expand_asan_emit_allocas_unpoison
+   function.
+*/
+
+static void
+handle_builtin_stack_restore (gcall *call, gimple_stmt_iterator *iter)
+{
+  if (!iter)
+return;
+
+  tree last_alloca = get_last_alloca_addr ();
+  tree restored_stack = gimple_call_arg (call, 0);
+  tree fn = builtin_decl_implicit (BUILT_IN_ASAN_ALLOCAS_UNPOISON);
+  gimple *g = gimple_build_call (fn, 2, last_alloca, restored_stack);
+  gsi_insert_after (iter, g, GSI_NEW_STMT);
+  g = gimple_build_assign (last_alloca, restored_stack);
+  gsi_insert_after (iter, g, GSI_NEW_STMT);
+}
+
+/* Deploy and poison redzones around __builtin_alloca call.  To do this, we
+   should replace this call with another one with changed parameters and
+   replace all its uses with new address, so
+   addr = __builtin_alloca (old_size, align);
+   is replaced by
+   left_redzone_size = max (align, ASAN_RED_Z

[Patch ARM] Remove %? string from some Advanced SIMD patterns.

2017-07-05 Thread Ramana Radhakrishnan
Advanced SIMD patterns are not predicable, thus they should not have %? 
in their output templates. Found when auditing the code for something 
else. This has been in my tree for some time, bootstrapped and 
regression tested on armhf for armv7ve+simd as the architectural base.


Applied to trunk

  Ramana Radhakrishnan  

* config/arm/neon.md (fma4): Remove %?.
(fma4_intrinsic): Likewise.
(*fmsub4): Likewise.
(*fmsub4_intrinsic): Likewise.

regards
Ramana
commit b510e80f861b97496386fe58e6b6976a94a3afa1
Author: Ramana Radhakrishnan 
Date:   Mon Jun 26 14:51:30 2017 +

Remove %? from advanced SIMD patterns

* config/arm/neon.md (fma4): Remove %?
  (fma4_intrinsic): Likewise.
 (*fmsub4): Likewise.
 (*fmsub4_intrinsic): Likewise.

diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 0ce3fe415e6..33b25ff3c73 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -678,7 +678,7 @@
 (match_operand:VCVTF 2 "register_operand" "w")
 (match_operand:VCVTF 3 "register_operand" "0")))]
   "TARGET_NEON && TARGET_FMA && flag_unsafe_math_optimizations"
-  "vfma%?.\\t%0, %1, %2"
+  "vfma.\\t%0, %1, %2"
   [(set_attr "type" "neon_fp_mla_s")]
 )
 
@@ -688,7 +688,7 @@
 (match_operand:VCVTF 2 "register_operand" "w")
 (match_operand:VCVTF 3 "register_operand" "0")))]
   "TARGET_NEON && TARGET_FMA"
-  "vfma%?.\\t%0, %1, %2"
+  "vfma.\\t%0, %1, %2"
   [(set_attr "type" "neon_fp_mla_s")]
 )
 
@@ -720,7 +720,7 @@
   (match_operand:VCVTF 2 "register_operand" "w")
   (match_operand:VCVTF 3 "register_operand" "0")))]
   "TARGET_NEON && TARGET_FMA && flag_unsafe_math_optimizations"
-  "vfms%?.\\t%0, %1, %2"
+  "vfms.\\t%0, %1, %2"
   [(set_attr "type" "neon_fp_mla_s")]
 )
 
@@ -731,7 +731,7 @@
 (match_operand:VCVTF 2 "register_operand" "w")
 (match_operand:VCVTF 3 "register_operand" "0")))]
  "TARGET_NEON && TARGET_FMA"
- "vfms%?.\\t%0, %1, %2"
+ "vfms.\\t%0, %1, %2"
  [(set_attr "type" "neon_fp_mla_s")]
 )
 
@@ -752,7 +752,7 @@
 "s_register_operand" "w")]
NEON_VRINT))]
   "TARGET_NEON && TARGET_FPU_ARMV8"
-  "vrint%?.f32\\t%0, %1"
+  "vrint.f32\\t%0, %1"
   [(set_attr "type" "neon_fp_round_")]
 )
 


[patch,avr,committed] Fix PR81305

2017-07-05 Thread Georg-Johann Lay

Hi,

Instruction selection must not depend on "optimize" because
LDS / STS range might not cover range of IN / OUT.

This led to wrong ISR code for avrtiny.

Applied as obvious.

Also added some test coverage for ISRs which we didn't have
at all to date.

Johann
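
The rule behind the fix can be sketched in a few lines of stand-alone C. This is a simplified model, not the code from avr.c: io_address_p stands in for io_address_operand, the 64-register I/O window and the SFR offsets used in demo (0 for avrtiny, 0x20 for classic devices) are stated as assumptions, and the optimize parameter is there only to show that it is ignored.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { LOAD_IN, LOAD_LDS } load_insn;

/* Assumption: IN/OUT reach 64 I/O registers, which appear in the data
   address space starting at SFR_OFFSET.  */
static bool
io_address_p (unsigned addr, unsigned sfr_offset)
{
  return addr >= sfr_offset && addr - sfr_offset <= 0x3F;
}

/* Pick the instruction for a byte load from constant address ADDR.
   The choice depends on the address only, never on the optimization
   level, because LDS might not be able to reach an I/O address.  */
static load_insn
select_byte_load (unsigned addr, unsigned sfr_offset, int optimize)
{
  (void) optimize;  /* deliberately unused */
  return io_address_p (addr, sfr_offset) ? LOAD_IN : LOAD_LDS;
}

static void
demo (void)
{
  /* SREG is I/O register 0x3F; assume SFR offset 0 on avrtiny.  */
  assert (select_byte_load (0x3F, 0, 0) == LOAD_IN);
  assert (select_byte_load (0x3F, 0, 2) == LOAD_IN);
}
```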


gcc/
PR target/81305
* config/avr/avr.c (avr_out_movhi_mr_r_xmega) [CONSTANT_ADDRESS_P]:
Don't depend on "optimize > 0".
(out_movhi_r_mr, out_movqi_mr_r): Same.
(out_movhi_mr_r, out_movqi_r_mr): Same.
(avr_address_cost) [CONSTANT_ADDRESS_P]: Don't depend cost for
io_address_operand on "optimize > 0".

gcc/testsuite/
PR target/81305
* gcc.target/avr/isr-test.h: New file.
* gcc.target/avr/torture/isr-01-simple.c: New test.
* gcc.target/avr/torture/isr-02-call.c: New test.
* gcc.target/avr/torture/isr-03-fixed.c: New test.
Index: config/avr/avr.c
===
--- config/avr/avr.c	(revision 249997)
+++ config/avr/avr.c	(revision 249998)
@@ -3820,7 +3820,7 @@ out_movqi_r_mr (rtx_insn *insn, rtx op[]
   if (CONSTANT_ADDRESS_P (x))
 {
   int n_words = AVR_TINY ? 1 : 2;
-  return optimize > 0 && io_address_operand (x, QImode)
+  return io_address_operand (x, QImode)
 ? avr_asm_len ("in %0,%i1", op, plen, -1)
 : avr_asm_len ("lds %0,%m1", op, plen, -n_words);
 }
@@ -4088,7 +4088,7 @@ out_movhi_r_mr (rtx_insn *insn, rtx op[]
   else if (CONSTANT_ADDRESS_P (base))
 {
   int n_words = AVR_TINY ? 2 : 4;
-  return optimize > 0 && io_address_operand (base, HImode)
+  return io_address_operand (base, HImode)
 ? avr_asm_len ("in %A0,%i1" CR_TAB
"in %B0,%i1+1", op, plen, -2)
 
@@ -5215,7 +5215,7 @@ out_movqi_mr_r (rtx_insn *insn, rtx op[]
   if (CONSTANT_ADDRESS_P (x))
 {
   int n_words = AVR_TINY ? 1 : 2;
-  return optimize > 0 && io_address_operand (x, QImode)
+  return io_address_operand (x, QImode)
 ? avr_asm_len ("out %i0,%1", op, plen, -1)
 : avr_asm_len ("sts %m0,%1", op, plen, -n_words);
 }
@@ -5291,13 +5291,12 @@ avr_out_movhi_mr_r_xmega (rtx_insn *insn
 
   if (CONSTANT_ADDRESS_P (base))
 {
-  int n_words = AVR_TINY ? 2 : 4;
-  return optimize > 0 && io_address_operand (base, HImode)
+  return io_address_operand (base, HImode)
 ? avr_asm_len ("out %i0,%A1" CR_TAB
"out %i0+1,%B1", op, plen, -2)
 
 : avr_asm_len ("sts %m0,%A1" CR_TAB
-   "sts %m0+1,%B1", op, plen, -n_words);
+   "sts %m0+1,%B1", op, plen, -4);
 }
 
   if (reg_base > 0)
@@ -5477,7 +5476,7 @@ out_movhi_mr_r (rtx_insn *insn, rtx op[]
   if (CONSTANT_ADDRESS_P (base))
 {
   int n_words = AVR_TINY ? 2 : 4;
-  return optimize > 0 && io_address_operand (base, HImode)
+  return io_address_operand (base, HImode)
 ? avr_asm_len ("out %i0+1,%B1" CR_TAB
"out %i0,%A1", op, plen, -2)
 
@@ -11361,8 +11360,7 @@ avr_address_cost (rtx x, machine_mode mo
 }
   else if (CONSTANT_ADDRESS_P (x))
 {
-  if (optimize > 0
-  && io_address_operand (x, QImode))
+  if (io_address_operand (x, QImode))
 cost = 2;
 
   if (AVR_TINY
Index: testsuite/gcc.target/avr/isr-test.h
===
--- testsuite/gcc.target/avr/isr-test.h	(nonexistent)
+++ testsuite/gcc.target/avr/isr-test.h	(revision 249998)
@@ -0,0 +1,282 @@
+#ifndef ISR_TEST_H
+#define ISR_TEST_H
+
+#include <string.h>
+
+#define ISR(N,...)  \
+__attribute__ ((used, externally_visible , ## __VA_ARGS__)) \
+void __vector_##N (void);   \
+void __vector_##N (void)
+
+#define SFR(ADDR) (*(unsigned char volatile*) (__AVR_SFR_OFFSET__ + (ADDR)))
+#define CORE_SFRS SFR (0x38)
+#define SREG  SFR (0x3F)
+#define SPL   SFR (0x3D)
+#define EIND  SFR (0x3C)
+#define RAMPZ SFR (0x3B)
+#define RAMPY SFR (0x3A)
+#define RAMPX SFR (0x39)
+#define RAMPD SFR (0x38)
+
+#ifdef __AVR_HAVE_JMP_CALL__
+#define VEC_SIZE 4
+#else
+#define VEC_SIZE 2
+#endif
+
+#ifdef __AVR_TINY__
+#define FIRST_REG 16
+#else
+#define FIRST_REG 0
+#endif
+
+#define CR "\n\t"
+
+typedef struct
+{
+  unsigned char sfrs[8];
+  unsigned char gprs[32 - FIRST_REG];
+} regs_t;
+
+regs_t reginfo1, reginfo2;
+
+__attribute__((noinline))
+static void clear_reginfo (void)
+{
+  memset (reginfo1.sfrs, 0, sizeof (reginfo1.sfrs));
+  memset (reginfo2.sfrs, 0, sizeof (reginfo2.sfrs));
+}
+
+__attribute__((noinline))
+static void compare_reginfo (unsigned long gpr_ignore)
+{
+  signed char regno;
+  const unsigned char *preg1 = &reginfo1.gprs[0];
+  const unsigned char *preg2 = &reginfo2.gprs[0];
+
+  if (memcmp (&reginfo1, &reginfo2, 8))
+__builtin_abort();
+
+  gpr_ignore >>= FIRST_REG;
+
+for 

RE: [PATHC][x86] Scalar mask and round RTL templates

2017-07-05 Thread Peryt, Sebastian
Tests were added. I also updated Changelog and set the max line length to be 
equal to 79 characters.

gcc/
* config/i386/subst.md (mask_scalar, round_scalar,
round_saeonly_scalar): New meta-templates.
(mask_scalar_name, mask_scalar_operand3, round_scalar_name,
round_scalar_mask_operand3, round_scalar_mask_op3,
round_scalar_constraint, round_scalar_prefix, round_saeonly_scalar_name,
round_saeonly_scalar_mask_operand3, round_saeonly_scalar_mask_op3,
round_saeonly_scalar_constraint, 
round_saeonly_scalar_prefix): New subst attribute.
* config/i386/sse.md
(_vm3): Renamed to ...
_vm3
 ... this.
(_vm3): Renamed to 
...
_vm3
 ... this.
(_vm3): Renamed to ...
_vm3
 ... this.
(v
\t{%2, %1, %0|
%0, %1, %2}): Changed to ...
v
\t{%2, %1, %0|
%0, %1, %2} ... this.
(v
\t{%2, %1, %0|
%0, %1, %2}): Changed to ...
v
\t{%2, %1, %0|
%0, %1, %2} ... this.
(v
\t{%2, %1, %0|
%0, %1, %2}): Changed to 
...
v
\t{%2, %1, %0|
%0, %1, %2
} ... this.

gcc/testsuite
* gcc.target/i386/avx512f-vaddsd-3.c: New test for mask 0 verification.
* gcc.target/i386/avx512f-vaddss-3.c: Ditto.
* gcc.target/i386/avx512f-vdivsd-3.c: Ditto.
* gcc.target/i386/avx512f-vdivss-3.c: Ditto.
* gcc.target/i386/avx512f-vmaxsd-3.c: Ditto.
* gcc.target/i386/avx512f-vmaxss-3.c: Ditto.
* gcc.target/i386/avx512f-vminsd-3.c: Ditto.
* gcc.target/i386/avx512f-vminss-3.c: Ditto.
* gcc.target/i386/avx512f-vmulsd-3.c: Ditto.
* gcc.target/i386/avx512f-vmulss-3.c: Ditto.
* gcc.target/i386/avx512f-vsubsd-3.c: Ditto.
* gcc.target/i386/avx512f-vsubss-3.c: Ditto.

Is it ok for trunk?

Thanks,
Sebastian

-Original Message-
From: Kirill Yukhin [mailto:kirill.yuk...@gmail.com] 
Sent: Wednesday, July 5, 2017 12:36 PM
To: Peryt, Sebastian 
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATHC][x86] Scalar mask and round RTL templates

On 05 Jul 06:38, Peryt, Sebastian wrote:
> Hi Kirill,
> 
> Sorry for this confusion. I meant to write MDs for intrinsics. Those 
> intrinsics are all masked ones for ADD[SD,SS], SUB[SD,SS], MUL[SD,SS], 
> DIV[SD,SS], MIN[SD,SS] and MAX[SD,SS]. What I found is that for mask equal 0 
> they were producing wrong results when old mask meta-template was used.
What you're talking about looks like a bug. Could you pls add a regression test 
to your patch?

> Modified changelog below.
> 
> 2017-07-05  Sebastian Peryt  
> 
> gcc/
>   * config/i386/subst.md (mask_scalar, round_scalar, 
> round_saeonly_scalar): New meta-templates.
>   (mask_scalar_name, mask_scalar_operand3, round_scalar_name,
>   round_scalar_mask_operand3, round_scalar_mask_op3,
>   round_scalar_constraint, round_scalar_prefix, round_saeonly_scalar_name,
>   round_saeonly_scalar_mask_operand3, round_saeonly_scalar_mask_op3,
>   round_saeonly_scalar_constraint, round_saeonly_scalar_prefix): New 
> subst attribute.
>   * config/i386/sse.md
>   (_vm3): Renamed to ...
>   _vm3 
> ... this.
>   (_vm3): Renamed to 
> ...
>   _vm3 
> ... this.
>   (_vm3): Renamed to ...
>   _vm3 ... 
> this.
>   (v\t{%2, %1, 
> %0|
>   %0, %1, %2}): Changed to ...
>   v\t{%2, 
> %1, %0|
>   %0, %1, %2} ... this.
>   (v\t{%2, %1, 
> %0|
>   %0, %1, %2}): Changed to ...
>   v\t{%2, 
> %1, %0|
>   %0, %1, %2} ... this.
>   (v\t{%2, %1, 
> %0|
>   %0, %1, %2}): Changed to 
> ...
>   
> v\t{%2, %1, 
> %0|
>   %0, %1, %2} 
> ... this.
Max line length is 79 characters I suppose.

--
Thanks, K
> 
> Is it ok for trunk?
> 
> Thanks,
> Sebastian
> 
> -Original Message-
> From: Kirill Yukhin [mailto:kirill.yuk...@gmail.com]
> Sent: Tuesday, July 4, 2017 7:45 PM
> To: Peryt, Sebastian 
> Cc: gcc-patches@gcc.gnu.org; Uros Bizjak 
> Subject: Re: [PATHC][x86] Scalar mask and round RTL templates
> 
> Hello Sebastian,
> On 23 Jun 09:00, Peryt, Sebastian wrote:
> > Hi,
> > 
> > This patch adds three extra RTL meta-templates for scalar round and mask. 
> > Additionally fixes errors caused by previous mask and round usage in some 
> > of the intrinsics that I found.
> Could you pls point which intrinsics did you fixed (or which errors)?
> I see only MD changes in your patch.
> 
> > 
> > 2017-06-23  Sebastian Peryt  
> > 
> > gcc/
> > * config/i386/subst.md (mask_scalar, round_scalar, 
> > round_saeonly_scalar): New templates.
> I'd call it meta-templates.
> > (mask_scalar_name, mask_scalar_operand3, round_scalar_name,
> > round_scalar_mask_operand3, round_scalar_mask_op3,
> > round_scalar_constraint, round_scalar_prefix, round_saeonly_scalar_name,
> > round_saeonly_scalar_mask_operand3, round_saeonly_scalar_mask_op3,

[patch,avr,committed] Ad PR81072: Be less aggressive when testing for Binutils PR21472.

2017-07-05 Thread Georg-Johann Lay

Well, it's all only avr stuff... and I decided that it's obvious :-)

Applied the addendum to PR81072 / trunk r249124 from below.

Sorry for the inconvenience.

Johann


On 05.07.2017 12:30, Richard Sandiford wrote:

Georg-Johann Lay  writes:

On 05.07.2017 10:17, Georg-Johann Lay wrote:

On 04.07.2017 20:11, Richard Sandiford wrote:

Georg-Johann Lay  writes:

Hi,

This patch adds support for devices that can access flash memory
by LD* instructions, hence there is no need to put .rodata in RAM.

The default linker script for the new multilib versions already
supports this feature, it's similar to avrtiny, cf.

https://sourceware.org/PR21472

This patch does the following:

* Add multilib variants avrxmega3 and avrxmega3/short-calls.

* Add new option -mshort-calls for multilib selection between
 devices with <= 8KiB flash and > 8KiB flash.

* Add specs handling for -mshort-calls:  The compiler knows
 if this option is needed or not appropriate (similar to -msp8).

* Add new ISA feature AVR_ISA_RCALL for multilib selection
 via -mshort-calls.

* Add a new row to architecture description that contains the
 start address of flash memory in the RAM address range.
 (The actual value is not needed).

* For devices with flash in RAM space, don't let .rodata
 objects trigger need for __do_copy_data.

* Add some devices.

* Add configure test for Binutils PR21472.


Sorry if this has already been discussed, but it's useful to be
able to do things like:

.../configure --target=avr-elf --with-cpu=arc700
make -j... all-gcc

as a basic sanity test of a pan-target patch.  (I usually do
before-and-after assembly comparisons too if no changes are
expected.)  The way the configure test is written means that
it's no longer possible to do this without first building a
trunk version of binutils for avr-elf.

Thanks,
Richard


Okay, I already thought of a less aggressive approach, I'll
try to address it soon.


Is the following addendum in order?

The avr maintainers appear to have been offline for several weeks already,
maybe a global maintainer can have a look and approve it for trunk?


Thanks for doing this.  LGTM (though obviously I can't approve)

Richard



https://gcc.gnu.org/r25

gcc/
Degrade gracefully if Binutils PR21472 is not available.

PR target/81072
* configure.ac [target=avr]: WARN instead of ERROR if avrxmega3
.rodata in flash test fails.
(HAVE_LD_AVR_AVRXMEGA3_RODATA_IN_FLASH): Define it if test passes.
* configure: Regenerate.
* config.in: Regenerate.
* config/avr/avr.c (avr_asm_named_section)
[HAVE_LD_AVR_AVRXMEGA3_RODATA_IN_FLASH]: Only trigger
__do_copy_data for stuff in .rodata if flash_pm_offset = 0.
(avr_asm_init_sections): Same.
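
As a rough, standalone illustration of the intent (not part of the patch):
the sketch below expresses the decision as a plain C function.  The function
name and the runtime flag are made up for the example; in avr.c the choice is
made at build time via HAVE_LD_AVR_AVRXMEGA3_RODATA_IN_FLASH, and the offset
used in the checks is only an example value.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, for illustration only.

   ld_keeps_rodata_in_flash stands in for the configure result
   HAVE_LD_AVR_AVRXMEGA3_RODATA_IN_FLASH (a preprocessor macro in the
   real code, a runtime flag here so that both cases can be exercised).

   flash_pm_offset mirrors avr_arch->flash_pm_offset: zero means flash
   is not visible in the RAM address space.  */
bool
rodata_triggers_copy_data (bool ld_keeps_rodata_in_flash,
                           unsigned flash_pm_offset)
{
  /* Without linker support, .rodata ends up in RAM, so it always
     has to pull in __do_copy_data.  */
  if (!ld_keeps_rodata_in_flash)
    return true;

  /* With linker support, only devices without flash mapped into
     the RAM address space need the copy.  */
  return flash_pm_offset == 0;
}
```

With an old Binutils the function always answers "yes", which is the
behaviour GCC had before avrxmega3 support was added.
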
Index: config/avr/avr.c
===
--- config/avr/avr.c	(revision 249995)
+++ config/avr/avr.c	(working copy)
@@ -1,7 +1,9 @@ avr_asm_init_sections (void)
  resp. `avr_need_copy_data_p'.  If flash is not mapped to RAM then
  we have also to track .rodata because it is located in RAM then.  */
 
+#if defined HAVE_LD_AVR_AVRXMEGA3_RODATA_IN_FLASH
   if (0 == avr_arch->flash_pm_offset)
+#endif
 readonly_data_section->unnamed.callback = avr_output_data_section_asm_op;
   data_section->unnamed.callback = avr_output_data_section_asm_op;
   bss_section->unnamed.callback = avr_output_bss_section_asm_op;
@@ -10036,7 +10038,10 @@ avr_asm_named_section (const char *name,
 || STR_PREFIX_P (name, ".gnu.linkonce.d"));
 
   if (!avr_need_copy_data_p
-  && 0 == avr_arch->flash_pm_offset)
+#if defined HAVE_LD_AVR_AVRXMEGA3_RODATA_IN_FLASH
+  && 0 == avr_arch->flash_pm_offset
+#endif
+  )
 avr_need_copy_data_p = (STR_PREFIX_P (name, ".rodata")
 || STR_PREFIX_P (name, ".gnu.linkonce.r"));
 
Index: config.in
===
--- config.in	(revision 249982)
+++ config.in	(working copy)
@@ -1460,6 +1460,13 @@ that are supported for each access macro
 #endif
 
 
+/* Define if your default avr linker script for avrxmega3 leaves .rodata in
+   flash. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_LD_AVR_AVRXMEGA3_RODATA_IN_FLASH
+#endif
+
+
 /* Define if your linker supports -z bndplt */
 #ifndef USED_FOR_TARGET
 #undef HAVE_LD_BNDPLT_SUPPORT
Index: configure
===
--- configure	(revision 249982)
+++ configure	(working copy)
@@ -24851,29 +24851,32 @@ EOF
   ac_status=$?
   $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
   test $ac_status = 0; }; }
-if test -f conftest.nm
+if test -s conftest.nm
 then
 	if grep ' R xxvaryy' conftest.nm > /dev/null; then
 	{ $as_echo "$as_me:${as_lineno-$LINENO}: result: yes" >&5
 $as_echo "yes" >&6; }
-	rm -f conftest.s conftest.o conftest.elf conftest.nm
+
+$as_echo

[PATCH V2 2/7] sparc: put VIS compare instructions in it's own insn type and adjust DFAs

2017-07-05 Thread Jose E. Marchesi
This patch introduces a new value, viscmp, for the insn type attribute.
The VIS comparison insns are adapted to use it, and the DFA schedulers
are updated accordingly.
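
To illustrate the effect on one of the schedulers, here is a small
standalone C sketch (not part of the patch) of the updated n4_vis_logical
condition in niagara4.md.  The enum and function names are invented for the
illustration; only the condition itself is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the "type" and "fptype" insn attributes.  */
enum insn_type { TYPE_VISL, TYPE_PDISTN, TYPE_VISCMP, TYPE_OTHER };
enum insn_fptype { FPTYPE_SINGLE, FPTYPE_DOUBLE };

/* Mirrors the new n4_vis_logical condition:

     (ior (and (eq_attr "type" "visl,pdistn")
               (eq_attr "fptype" "double"))
          (eq_attr "type" "viscmp"))

   i.e. viscmp insns match regardless of their fptype.  */
bool
n4_vis_logical_p (enum insn_type type, enum insn_fptype fptype)
{
  bool logical_double = (type == TYPE_VISL || type == TYPE_PDISTN)
                        && fptype == FPTYPE_DOUBLE;
  return logical_double || type == TYPE_VISCMP;
}
```

This is why the VIS compare insns no longer need to set fptype to "double"
just to be picked up by this reservation.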

gcc/ChangeLog:

* config/sparc/sparc.md ("type"): New insn type viscmp.
("fcmp_vis"): Set insn type to
viscmp.
("fpcmp8_vis"): Likewise.
("fucmp8_vis"): Likewise.
("fpcmpu_vis"): Likewise.
* config/sparc/niagara7.md ("n7_vis_logical_v3pipe"): Handle
viscmp.
("n7_vis_logical_11cycle"): Likewise.
* config/sparc/niagara4.md ("n4_vis_logical"): Likewise.
* config/sparc/niagara2.md ("niag2_vis"): Likewise.
("niag3_vis"): Likewise.
* config/sparc/niagara.md ("niag_vis"): Likewise.
* config/sparc/ultra3.md ("us3_fga"): Likewise.
* config/sparc/ultra1_2.md ("us1_fga_double"): Likewise.
---
 gcc/ChangeLog| 17 +
 gcc/config/sparc/niagara.md  |  2 +-
 gcc/config/sparc/niagara2.md |  4 ++--
 gcc/config/sparc/niagara4.md |  5 +++--
 gcc/config/sparc/niagara7.md |  4 ++--
 gcc/config/sparc/sparc.md| 15 ++-
 gcc/config/sparc/ultra1_2.md |  8 
 gcc/config/sparc/ultra3.md   |  2 +-
 8 files changed, 36 insertions(+), 21 deletions(-)

diff --git a/gcc/config/sparc/niagara.md b/gcc/config/sparc/niagara.md
index f9a1f6d..a8e23b8 100644
--- a/gcc/config/sparc/niagara.md
+++ b/gcc/config/sparc/niagara.md
@@ -114,5 +114,5 @@
  */
 (define_insn_reservation "niag_vis" 8
   (and (eq_attr "cpu" "niagara")
-(eq_attr "type" "fga,visl,vismv,fgm_pack,fgm_mul,pdist,edge,edgen,gsr,array,bmask"))
+(eq_attr "type" "fga,visl,viscmp,vismv,fgm_pack,fgm_mul,pdist,edge,edgen,gsr,array,bmask"))
   "niag_pipe*8")
diff --git a/gcc/config/sparc/niagara2.md b/gcc/config/sparc/niagara2.md
index 34ee630..3190d55 100644
--- a/gcc/config/sparc/niagara2.md
+++ b/gcc/config/sparc/niagara2.md
@@ -111,10 +111,10 @@
 
 (define_insn_reservation "niag2_vis" 6
   (and (eq_attr "cpu" "niagara2")
-(eq_attr "type" "fga,vismv,visl,fgm_pack,fgm_mul,pdist,edge,edgen,array,bmask,gsr"))
+(eq_attr "type" "fga,vismv,visl,viscmp,fgm_pack,fgm_mul,pdist,edge,edgen,array,bmask,gsr"))
   "niag2_pipe*6")
 
 (define_insn_reservation "niag3_vis" 9
   (and (eq_attr "cpu" "niagara3")
-(eq_attr "type" "fga,vismv,visl,fgm_pack,fgm_mul,pdist,pdistn,edge,edgen,array,bmask,gsr"))
+(eq_attr "type" "fga,vismv,visl,viscmp,fgm_pack,fgm_mul,pdist,pdistn,edge,edgen,array,bmask,gsr"))
   "niag2_pipe*9")
diff --git a/gcc/config/sparc/niagara4.md b/gcc/config/sparc/niagara4.md
index cc1bb75..a3417d2 100644
--- a/gcc/config/sparc/niagara4.md
+++ b/gcc/config/sparc/niagara4.md
@@ -90,8 +90,9 @@
 
 (define_insn_reservation "n4_vis_logical" 3
   (and (eq_attr "cpu" "niagara4")
-(and (eq_attr "type" "visl,pdistn")
-  (eq_attr "fptype" "double")))
+   (ior (and (eq_attr "type" "visl,pdistn")
+ (eq_attr "fptype" "double"))
+(eq_attr "type" "viscmp")))
   "n4_slot1, nothing*2")
 
 (define_insn_reservation "n4_vis_logical_11cycle" 11
diff --git a/gcc/config/sparc/niagara7.md b/gcc/config/sparc/niagara7.md
index 3dc8f9e..3f46198 100644
--- a/gcc/config/sparc/niagara7.md
+++ b/gcc/config/sparc/niagara7.md
@@ -123,13 +123,13 @@
 
 (define_insn_reservation "n7_vis_logical_v3pipe" 11
   (and (eq_attr "cpu" "niagara7")
-(and (eq_attr "type" "visl,pdistn")
+(and (eq_attr "type" "visl,viscmp,pdistn")
  (eq_attr "v3pipe" "true")))
   "n7_slot1, nothing*2")
 
 (define_insn_reservation "n7_vis_logical_11cycle" 11
   (and (eq_attr "cpu" "niagara7")
-(and (eq_attr "type" "visl")
+(and (eq_attr "type" "visl,viscmp")
   (eq_attr "v3pipe" "false")))
   "n7_slot1, nothing*10")
 
diff --git a/gcc/config/sparc/sparc.md b/gcc/config/sparc/sparc.md
index da23060..04da8ae 100644
--- a/gcc/config/sparc/sparc.md
+++ b/gcc/config/sparc/sparc.md
@@ -281,7 +281,8 @@
fpcmp,
fpmul,fpdivs,fpdivd,
fpsqrts,fpsqrtd,
-   fga,visl,vismv,fgm_pack,fgm_mul,pdist,pdistn,edge,edgen,gsr,array,bmask,
+   fga,visl,vismv,viscmp,
+   fgm_pack,fgm_mul,pdist,pdistn,edge,edgen,gsr,array,bmask,
cmove,
ialuX,
multi,savew,flushw,iflush,trap,lzd"
@@ -9059,8 +9060,7 @@
 UNSPEC_FCMP))]
   "TARGET_VIS"
   "fcmp\t%1, %2, %0"
-  [(set_attr "type" "visl")
-   (set_attr "fptype" "double")
+  [(set_attr "type" "viscmp")
(set_attr "v3pipe" "true")])
 
 (define_insn "fpcmp8_vis"
@@ -9070,8 +9070,7 @@
 UNSPEC_FCMP))]
   "TARGET_VIS4"
   "fpcmp8\t%1, %2, %0"
-  [(set_attr "type" "visl")
-   (set_attr "fptype" "double")])
+  [(set_attr "type" "viscmp")])
 
 (define_expand "vcond"
   [(match_operand:GCM 0 "register_operand" "")
@@ -9427,8 +9426,7 @@
 UNSPEC_FUCMP))]
   "TARGET_VIS3"
   "fucmp8\t%1, %2, %0"
-  [(set_attr "type" "visl")
-   (set_attr "v3pipe" "true")])
+  [(set_attr "type" "viscmp")])
 
 (define_insn "fpcmpu_vis"
   [(set (match_operand:P 0 "register_operand" "=r")
@@ -9437,8 +9435,7 @@
 UNSPE

[PATCH V2 1/7] sparc: put bmask* instructions in it's own insn type and adjust DFAs

2017-07-05 Thread Jose E. Marchesi
This patch introduces a new value, bmask, for the insn type attribute.
The bmask instructions, which were previously typed as `array', are
adapted to use it, and the several DFA schedulers are updated
accordingly.
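
For reference, the reservations that pick up the new type, and their
latencies, can be summarized in a small standalone C table (not part of the
patch).  The struct and function names are invented for the illustration;
the reservation names and latencies are the ones visible in the diff below.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative only: which reservation handles `bmask' on each cpu
   after this patch, with the latency given in its
   define_insn_reservation.  */
struct bmask_reservation
{
  const char *cpu;
  const char *reservation;
  int latency;
};

const struct bmask_reservation bmask_reservations[] = {
  { "ultrasparc3", "us3_array", 2 },
  { "niagara",     "niag_vis",  8 },
  { "niagara2",    "niag2_vis", 6 },
  { "niagara3",    "niag3_vis", 9 },
  { "niagara4",    "n4_array",  12 },
  { "niagara7",    "n7_array",  12 },
};

/* Return the bmask latency for CPU, or -1 if CPU is not in the table.  */
int
bmask_latency (const char *cpu)
{
  size_t n = sizeof bmask_reservations / sizeof bmask_reservations[0];
  for (size_t i = 0; i < n; i++)
    if (strcmp (bmask_reservations[i].cpu, cpu) == 0)
      return bmask_reservations[i].latency;
  return -1;
}
```

The latencies are unchanged by this patch, since bmask stays in the same
reservations that handled it while it was typed as `array'.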

gcc/ChangeLog:

* config/sparc/sparc.md: New instruction type `bmask'.
(bmaskdi_vis): Use the `bmask' type.
(bmasksi_vis): Likewise.
* config/sparc/ultra3.md (us3_array): Likewise.
* config/sparc/niagara7.md (n7_array): Likewise.
* config/sparc/niagara4.md (n4_array): Likewise.
* config/sparc/niagara2.md (niag2_vis): Likewise.
(niag3_vis): Likewise.
* config/sparc/niagara.md (niag_vis): Likewise.
---
 gcc/ChangeLog| 12 
 gcc/config/sparc/niagara.md  |  2 +-
 gcc/config/sparc/niagara2.md |  4 ++--
 gcc/config/sparc/niagara4.md |  2 +-
 gcc/config/sparc/niagara7.md |  4 ++--
 gcc/config/sparc/sparc.md|  6 +++---
 gcc/config/sparc/ultra3.md   |  2 +-
 7 files changed, 22 insertions(+), 10 deletions(-)

diff --git a/gcc/config/sparc/niagara.md b/gcc/config/sparc/niagara.md
index f79771f..f9a1f6d 100644
--- a/gcc/config/sparc/niagara.md
+++ b/gcc/config/sparc/niagara.md
@@ -114,5 +114,5 @@
  */
 (define_insn_reservation "niag_vis" 8
   (and (eq_attr "cpu" "niagara")
-(eq_attr "type" "fga,visl,vismv,fgm_pack,fgm_mul,pdist,edge,edgen,gsr,array"))
+(eq_attr "type" "fga,visl,vismv,fgm_pack,fgm_mul,pdist,edge,edgen,gsr,array,bmask"))
   "niag_pipe*8")
diff --git a/gcc/config/sparc/niagara2.md b/gcc/config/sparc/niagara2.md
index 9bcdd06..34ee630 100644
--- a/gcc/config/sparc/niagara2.md
+++ b/gcc/config/sparc/niagara2.md
@@ -111,10 +111,10 @@
 
 (define_insn_reservation "niag2_vis" 6
   (and (eq_attr "cpu" "niagara2")
-(eq_attr "type" "fga,vismv,visl,fgm_pack,fgm_mul,pdist,edge,edgen,array,gsr"))
+(eq_attr "type" "fga,vismv,visl,fgm_pack,fgm_mul,pdist,edge,edgen,array,bmask,gsr"))
   "niag2_pipe*6")
 
 (define_insn_reservation "niag3_vis" 9
   (and (eq_attr "cpu" "niagara3")
-(eq_attr "type" "fga,vismv,visl,fgm_pack,fgm_mul,pdist,pdistn,edge,edgen,array,gsr"))
+(eq_attr "type" "fga,vismv,visl,fgm_pack,fgm_mul,pdist,pdistn,edge,edgen,array,bmask,gsr"))
   "niag2_pipe*9")
diff --git a/gcc/config/sparc/niagara4.md b/gcc/config/sparc/niagara4.md
index ad0a04b..cc1bb75 100644
--- a/gcc/config/sparc/niagara4.md
+++ b/gcc/config/sparc/niagara4.md
@@ -66,7 +66,7 @@
 
 (define_insn_reservation "n4_array" 12
   (and (eq_attr "cpu" "niagara4")
-(eq_attr "type" "array,edge,edgen"))
+(eq_attr "type" "array,bmask,edge,edgen"))
   "n4_slot1, nothing*11")
 
 (define_insn_reservation "n4_vis_move_1cycle" 1
diff --git a/gcc/config/sparc/niagara7.md b/gcc/config/sparc/niagara7.md
index 12d6ab0..3dc8f9e 100644
--- a/gcc/config/sparc/niagara7.md
+++ b/gcc/config/sparc/niagara7.md
@@ -71,7 +71,7 @@
 
 (define_insn_reservation "n7_array" 12
   (and (eq_attr "cpu" "niagara7")
-(eq_attr "type" "array,edge,edgen"))
+(eq_attr "type" "array,bmask,edge,edgen"))
   "n7_slot1, nothing*11")
 
 (define_insn_reservation "n7_fpdivs" 24
@@ -133,4 +133,4 @@
   (eq_attr "v3pipe" "false")))
   "n7_slot1, nothing*10")
 
-(define_bypass 3 "*_v3pipe" "*_v3pipe")
+(define_bypass 3 "n7*_v3pipe" "n7_*_v3pipe")
diff --git a/gcc/config/sparc/sparc.md b/gcc/config/sparc/sparc.md
index 5c5096b..da23060 100644
--- a/gcc/config/sparc/sparc.md
+++ b/gcc/config/sparc/sparc.md
@@ -281,7 +281,7 @@
fpcmp,
fpmul,fpdivs,fpdivd,
fpsqrts,fpsqrtd,
-   fga,visl,vismv,fgm_pack,fgm_mul,pdist,pdistn,edge,edgen,gsr,array,
+   fga,visl,vismv,fgm_pack,fgm_mul,pdist,pdistn,edge,edgen,gsr,array,bmask,
cmove,
ialuX,
multi,savew,flushw,iflush,trap,lzd"
@@ -9134,7 +9134,7 @@
 (plus:DI (match_dup 1) (match_dup 2)))]
   "TARGET_VIS2 && TARGET_ARCH64"
   "bmask\t%r1, %r2, %0"
-  [(set_attr "type" "array")
+  [(set_attr "type" "bmask")
(set_attr "v3pipe" "true")])
 
 (define_insn "bmasksi_vis"
@@ -9145,7 +9145,7 @@
 (zero_extend:DI (plus:SI (match_dup 1) (match_dup 2))))]
   "TARGET_VIS2"
   "bmask\t%r1, %r2, %0"
-  [(set_attr "type" "array")
+  [(set_attr "type" "bmask")
(set_attr "v3pipe" "true")])
 
 (define_insn "bshuffle_vis"
diff --git a/gcc/config/sparc/ultra3.md b/gcc/config/sparc/ultra3.md
index 6296b38..f5b81d6 100644
--- a/gcc/config/sparc/ultra3.md
+++ b/gcc/config/sparc/ultra3.md
@@ -56,7 +56,7 @@
 
 (define_insn_reservation "us3_array" 2
   (and (eq_attr "cpu" "ultrasparc3")
-(eq_attr "type" "array,edgen"))
+(eq_attr "type" "array,edgen,bmask"))
   "us3_ms + us3_slotany, nothing")
 
 ;; ??? Not entirely accurate.
-- 
2.3.4



[PATCH V2 0/7] Support for the SPARC M8 cpu

2017-07-05 Thread Jose E. Marchesi
[Changes from the previous version:
- Fixed two typos on the definition of the built-ins for fpcmple32shl
  and fpcmpde8shl in 32-bit mode.
- Bootstrapped and regtested in sparc-sun-solaris2.12.
- Rebased to today's master.]

This patch series adds support for the SPARC M8 processor to GCC.
The SPARC M8 processor implements the Oracle SPARC Architecture 2017.

The first four patches are preparatory work:

- bmask* instructions are put in their own instruction type.  It makes
  little sense to have them in the same category as array
  instructions.

- Similarly, VIS compare instructions are put in their own instruction
  type.  This is to better accommodate subtypes, which are not quite
  the same as the subtypes of `visl' instructions.

- The introduction of a new `subtype' insn attribute in sparc.md
  avoids the need for adjusting the instruction scheduler DFAs for
  previous cpu models every time a new cpu is introduced.

- The full set of SPARC instructions used in sparc.md, and their
  position in the type/subtype hierarchy, is documented in a comment.
  This eases the modification of the DFA schedulers, and the addition
  of new cpus.

- The M7 DFA scheduler is reworked:

  + To use the new type/subtype hierarchy.
  + The v3pipe insn attribute is no longer needed.
  + More accurate latencies for instructions.
  + The S4 core pipeline is documented in a comment in niagara7.md.

The next three patches introduce M8 support proper:

- Support for -mcpu=m8 (we are thus suggesting to abandon the niagaraN
  denomination for M8 and later processors.)

- Support for a new VIS level, VIS4B, covering the new VIS
  instructions introduced in OSA2017 and implemented in the M8.  Also
  built-ins.

  Note that no new VIS level was formally introduced in OSA2017, even
  if many new VIS instructions were added to the spec.  We introduced
  VIS4B for coherence (like availability of builtins and visintrin.h
  depending on the value of __VIS__) and avoided using VIS5 in case it
  is introduced in future versions of the Oracle SPARC Architecture.

- An M8 DFA scheduler:

  + Also based on the new type/subtype hierarchy.
  + The functional units in the S5 core are explicitly documented in a
    comment in m8.md.

See the individual patch descriptions for more information and
associated ChangeLog entries.

After this series gets integrated upstream we will be contributing more
support for M8 capabilities, such as support for using the new
misaligned load/store instructions for memory accesses known to be
misaligned at compile-time.

Note that full binutils support for M8 was upstreamed on May 19.

Bootstrapped and tested in sparc64-linux-gnu.  No regressions.
Bootstrapped and tested in sparc-sun-solaris2.12.  No regressions.


Jose E. Marchesi (7):
  sparc: put bmask* instructions in it's own insn type and adjust DFAs
  sparc: put VIS compare instructions in it's own insn type and adjust
DFAs
  sparc: introduce insn subtypes
  sparc: reworked M7 DFA based on instruction subtypes
  sparc: basic support for the SPARC M8 cpu
  sparc: support for VIS4B instructions
  sparc: M8 DFA scheduler

 gcc/ChangeLog   | 226 +
 gcc/config.gcc  |   2 +-
 gcc/config.in   |   4 +
 gcc/config/sparc/constraints.md |  12 +-
 gcc/config/sparc/driver-sparc.c |   1 +
 gcc/config/sparc/m8.md  | 242 ++
 gcc/config/sparc/niagara.md |   2 +-
 gcc/config/sparc/niagara2.md|   4 +-
 gcc/config/sparc/niagara4.md|   7 +-
 gcc/config/sparc/niagara7.md| 181 +-
 gcc/config/sparc/predicates.md  |  27 +++
 gcc/config/sparc/sol2.h |  14 +-
 gcc/config/sparc/sparc-c.c  |   7 +-
 gcc/config/sparc/sparc-opts.h   |   1 +
 gcc/config/sparc/sparc.c| 312 ++--
 gcc/config/sparc/sparc.h|  20 +-
 gcc/config/sparc/sparc.md   | 364 +---
 gcc/config/sparc/sparc.opt  |   7 +
 gcc/config/sparc/ultra1_2.md|   8 +-
 gcc/config/sparc/ultra3.md  |   4 +-
 gcc/configure   |  35 +++
 gcc/configure.ac|  12 +
 gcc/doc/extend.texi |  39 +++
 gcc/doc/invoke.texi |  25 +-
 gcc/testsuite/ChangeLog |   8 +
 gcc/testsuite/gcc.target/sparc/dictunpack.c |  25 ++
 gcc/testsuite/gcc.target/sparc/fpcmpdeshl.c |  25 ++
 gcc/testsuite/gcc.target/sparc/fpcmpshl.c   |  81 +++
 gcc/testsuite/gcc.target/sparc/fpcmpurshl.c |  25 ++
 gcc/testsuite/gcc.target/sparc/fpcmpushl.c  |  43 
 30 files changed, 1579 insertions(+), 184 deletions(-)
 create mode 100644 gcc/config/sparc/m8.md
 create mode 100644 gcc/testsuite/gcc.target/sparc/di

[PATCH V2 7/7] sparc: M8 DFA scheduler

2017-07-05 Thread Jose E. Marchesi
This patch adds a DFA scheduler modelling the core S5 in the SPARC M8
processors.
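
As an illustration of how a bypass in this description shortens a
producer/consumer latency, here is a small standalone C sketch (not part of
the patch).  The enum and function names are invented for the example; the
numbers are the ones used by the m8_integer, m8_imul and m8_idiv
reservations and by the m8_imul bypass in m8.md.

```c
#include <assert.h>

/* Hypothetical names for three of the reservations in m8.md.  */
enum m8_reservation { M8_INTEGER, M8_IMUL, M8_IDIV };

/* Default latency, as given in each define_insn_reservation.  */
int
m8_default_latency (enum m8_reservation producer)
{
  switch (producer)
    {
    case M8_INTEGER: return 1;
    case M8_IMUL:    return 10;
    case M8_IDIV:    return 30;
    }
  return -1;  /* Not reached for valid enum values.  */
}

/* Latency seen by CONSUMER for a result computed by PRODUCER.
   Models (define_bypass 9 "m8_imul" "m8_imul"): the result of an
   m8_imul insn is available one cycle earlier when the consumer is
   itself an m8_imul insn.  */
int
m8_latency (enum m8_reservation producer, enum m8_reservation consumer)
{
  if (producer == M8_IMUL && consumer == M8_IMUL)
    return 9;
  return m8_default_latency (producer);
}
```

Any other producer/consumer pair falls back to the default latency of the
producer's reservation.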

gcc/ChangeLog:

* config/sparc/m8.md: New file.
* config/sparc/sparc.md: Include m8.md.
---
 gcc/ChangeLog |   5 +
 gcc/config/sparc/m8.md| 242 ++
 gcc/config/sparc/sparc.md |   1 +
 3 files changed, 248 insertions(+)
 create mode 100644 gcc/config/sparc/m8.md

diff --git a/gcc/config/sparc/m8.md b/gcc/config/sparc/m8.md
new file mode 100644
index 000..f0fe1b2
--- /dev/null
+++ b/gcc/config/sparc/m8.md
@@ -0,0 +1,242 @@
+;; Scheduling description for the SPARC M8.
+;;   Copyright (C) 2017 Free Software Foundation, Inc.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+;; Things to improve:
+;;
+;; - Store instructions are implemented by micro-ops, one of which
+;;   generates the store address and is executed in the store address
+;;   generation unit in the slot0.  We need to model that.
+;;
+;; - There are two V3 pipes connected to different slots.  The current
+;;   implementation assumes that all the instructions executing in a
+;;   V3 pipe are issued to the unit in slot3.
+;;
+;; - Single-issue ALU operations incur an additional cycle of latency to
+;;   slot 0 and slot 1 instructions.  This is not currently reflected
+;;   in the DFA.
+
+(define_automaton "m8_0")
+
+;; The S5 core has two dual-issue queues, PQLS and PQEX.  Each queue
+;; is divided into two slots: PQLS corresponds to slots 0 and 1, and
+;; PQEX corresponds to slots 2 and 3.  The core can issue 4
+;; instructions per-cycle, and up to 4 instructions are committed each
+;; cycle.
+;;
+;;
+;;                   m8_slot0  - Load Unit.
+;;                             - Store address gen. Unit.
+;;
+;;
+;;   === PQLS ==>    m8_slot1  - Store data unit.
+;;                             - Branch unit.
+;;
+;;
+;;   === PQEX ==>    m8_slot2  - Integer Unit (EXU2).
+;;                             - 3-cycles Crypto Unit (SPU2).
+;;
+;;                   m8_slot3  - Integer Unit (EXU3).
+;;                             - 3-cycles Crypto Unit (SPU3).
+;;                             - Floating-point and graphics unit (FPG).
+;;                             - Long-latency Crypto Unit.
+;;                             - Oracle Numbers Unit (ONU).
+
+(define_cpu_unit "m8_slot0,m8_slot1,m8_slot2,m8_slot3" "m8_0")
+
+;; Some instructions stall the pipeline and prevent any other
+;; instruction from being issued in the same cycle.  We assume the
+;; same for multi-instruction insns.
+
+(define_reservation "m8_single_issue" "m8_slot0 + m8_slot1 + m8_slot2 + m8_slot3")
+
+(define_insn_reservation "m8_single" 1
+  (and (eq_attr "cpu" "m8")
+   (eq_attr "type" "multi,savew,flushw,trap,bmask"))
+  "m8_single_issue")
+
+;; Most of the instructions executing in the integer units have a
+;; latency of 1.
+
+(define_insn_reservation "m8_integer" 1
+  (and (eq_attr "cpu" "m8")
+   (eq_attr "type" "ialu,ialuX,shift,cmove,compare,bmask"))
+  "(m8_slot2 | m8_slot3)")
+
+;; Flushing the instruction memory takes 27 cycles.
+
+
+(define_insn_reservation "m8_iflush" 27
+  (and (eq_attr "cpu" "m8")
+   (eq_attr "type" "iflush"))
+  "(m8_slot2 | m8_slot3), nothing*26")
+
+;; The integer multiplication instructions have a latency of 10 cycles
+;; and execute in integer units.
+;;
+;; Likewise for array*, edge* and pdistn instructions.
+;;
+;; However, the latency is only 9 cycles if the consumer of the
+;; operation is also capable of 9 cycles latency.  We model this with
+;; a bypass.
+
+(define_insn_reservation "m8_imul" 10
+  (and (eq_attr "cpu" "m8")
+   (eq_attr "type" "imul,array,edge,edgen,pdistn"))
+  "(m8_slot2 | m8_slot3), nothing*12")
+
+(define_bypass 9 "m8_imul" "m8_imul")
+
+;; The integer division instructions `sdiv' and `udivx' have a latency
+;; of 30 cycles and execute in integer units.
+
+(define_insn_reservation "m8_idiv" 30
+  (and (eq_attr "cpu" "m8")
+   (eq_attr "type" "idiv"))
+  "(m8_slot2 | m8_slot3), nothing*29")
+
+;; Both integer and floating-point load instructions have a latency of
+;; only 3 cycles,

[PATCH V2 6/7] sparc: support for VIS4B instructions

2017-07-05 Thread Jose E. Marchesi
This patch adds support for the following VIS instructions, which are
introduced in the Oracle SPARC Architecture 2017 and implemented by the
SPARC M8 cpu:

- Dictionary unpack.
- Partitioned compare with shifted result.
- Unsigned partitioned compare with shifted result.
- Partitioned dual-equal compare with shifted result.
- Partitioned unsigned range compare with shifted result.

The facilities introduced are:

- A new option -mvis4b.
- Compiler built-ins for the above mentioned instructions.

Tests and documentation are also provided.
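
The built-ins take constant operands that must be in range; the new `q' and
`t' constraints cover unsigned 2-bit and unsigned 5-bit integer constants
respectively.  A minimal standalone C sketch of such a range check follows
(not part of the patch; the helper names are invented for the illustration
and are not the definitions of SPARC_IMM2_P or SPARC_IMM5_P):

```c
#include <assert.h>
#include <stdbool.h>

/* True if VALUE fits in an unsigned field of BITS bits.
   Assumes BITS is small (well below the width of long).  */
bool
fits_unsigned_bits (long value, unsigned bits)
{
  return value >= 0 && value < (1L << bits);
}

/* Hypothetical counterparts of the `q' (2-bit) and `t' (5-bit)
   constraints.  */
bool
imm2_ok (long value)
{
  return fits_unsigned_bits (value, 2);   /* 0 .. 3 */
}

bool
imm5_ok (long value)
{
  return fits_unsigned_bits (value, 5);   /* 0 .. 31 */
}
```

The dictunpack predicates in predicates.md may restrict the range further
per element size; that is not modelled here.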

gcc/ChangeLog:

* config/sparc/sparc.opt: New option -mvis4b.
* config/sparc/sparc.c (dump_target_flag_bits): Handle MASK_VIS4B.
(sparc_option_override): Handle VIS4B.
(enum sparc_builtins): Define
SPARC_BUILTIN_DICTUNPACK{8,16,32},
SPARC_BUILTIN_FPCMP{LE,GT,EQ,NE}{8,16,32}SHL,
SPARC_BUILTIN_FPCMPU{LE,GT}{8,16,32}SHL,
SPARC_BUILTIN_FPCMPDE{8,16,32}SHL and
SPARC_BUILTIN_FPCMPUR{8,16,32}SHL.
(check_constant_argument): New function.
(sparc_vis_init_builtins): Define builtins
__builtin_vis_dictunpack{8,16,32},
__builtin_vis_fpcmp{le,gt,eq,ne}{8,16,32}shl,
__builtin_vis_fpcmpu{le,gt}{8,16,32}shl,
__builtin_vis_fpcmpde{8,16,32}shl and
__builtin_vis_fpcmpur{8,16,32}shl.
(sparc_expand_builtin): Check that the constant operands to
__builtin_vis_fpcmp*shl and __builtin_vis_dictunpack* are indeed
constant and in range.
* config/sparc/sparc-c.c (sparc_target_macros): Handle
TARGET_VIS4B.
* config/sparc/sparc.h (SPARC_IMM2_P): Define.
(SPARC_IMM5_P): Likewise.
* config/sparc/sparc.md (cpu_feature): Add new feature "vis4b".
(enabled): Handle vis4b.
(UNSPEC_DICTUNPACK): New unspec.
(UNSPEC_FPCMPSHL): Likewise.
(UNSPEC_FPUCMPSHL): Likewise.
(UNSPEC_FPCMPDESHL): Likewise.
(UNSPEC_FPCMPURSHL): Likewise.
(cpu_feature): New CPU feature `vis4b'.
(dictunpack{8,16,32}): New insns.
(FPCSMODE): New mode iterator.
(fpcscond): New code iterator.
(fpcsucond): Likewise.
(fpcmp{le,gt,eq,ne}{8,16,32}{si,di}shl): New insns.
(fpcmpu{le,gt}{8,16,32}{si,di}shl): Likewise.
(fpcmpde{8,16,32}{si,di}shl): Likewise.
(fpcmpur{8,16,32}{si,di}shl): Likewise.
* config/sparc/constraints.md: Define constraints `q' for unsigned
2-bit integer constants and `t' for unsigned 5-bit integer
constants.
* config/sparc/predicates.md (imm5_operand_dictunpack8): New
predicate.
(imm5_operand_dictunpack16): Likewise.
(imm5_operand_dictunpack32): Likewise.
(imm2_operand): Likewise.
* doc/invoke.texi (SPARC Options): Document -mvis4b.
* doc/extend.texi (SPARC VIS Built-in Functions): Document the
dictunpack* and fpcmp*shl builtins.

gcc/testsuite/ChangeLog:

* gcc.target/sparc/dictunpack.c: New file.
* gcc.target/sparc/fpcmpdeshl.c: Likewise.
* gcc.target/sparc/fpcmpshl.c: Likewise.
* gcc.target/sparc/fpcmpurshl.c: Likewise.
* gcc.target/sparc/fpcmpushl.c: Likewise.
---
 gcc/ChangeLog   |  53 ++
 gcc/config/sparc/constraints.md |  12 +-
 gcc/config/sparc/predicates.md  |  27 +++
 gcc/config/sparc/sparc-c.c  |   7 +-
 gcc/config/sparc/sparc.c| 247 +++-
 gcc/config/sparc/sparc.h|   4 +
 gcc/config/sparc/sparc.md   |  73 +++-
 gcc/config/sparc/sparc.opt  |   4 +
 gcc/doc/extend.texi |  39 +
 gcc/doc/invoke.texi |  13 ++
 gcc/testsuite/ChangeLog |   8 +
 gcc/testsuite/gcc.target/sparc/dictunpack.c |  25 +++
 gcc/testsuite/gcc.target/sparc/fpcmpdeshl.c |  25 +++
 gcc/testsuite/gcc.target/sparc/fpcmpshl.c   |  81 +
 gcc/testsuite/gcc.target/sparc/fpcmpurshl.c |  25 +++
 gcc/testsuite/gcc.target/sparc/fpcmpushl.c  |  43 +
 16 files changed, 677 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/sparc/dictunpack.c
 create mode 100644 gcc/testsuite/gcc.target/sparc/fpcmpdeshl.c
 create mode 100644 gcc/testsuite/gcc.target/sparc/fpcmpshl.c
 create mode 100644 gcc/testsuite/gcc.target/sparc/fpcmpurshl.c
 create mode 100644 gcc/testsuite/gcc.target/sparc/fpcmpushl.c

diff --git a/gcc/config/sparc/constraints.md b/gcc/config/sparc/constraints.md
index 7c9ef74..cff5a61 100644
--- a/gcc/config/sparc/constraints.md
+++ b/gcc/config/sparc/constraints.md
@@ -19,7 +19,7 @@
 
 ;;; Unused letters:
 ;;; B
-;;;ajklq  tuv xyz
+;;;ajkluv xyz
 
 
 ;; Register constraints
@@ -58,6 +58,16 @@
 
 ;; Integer constant constraints
 
+(define_constraint "q"
+ "Unsigned 2-bit integer constant"
+  (and (match_code "const_

[PATCH V2 4/7] sparc: reworked M7 DFA based on instruction subtypes

2017-07-05 Thread Jose E. Marchesi
This patch reworks the M7 DFA scheduler to use instruction subtypes.  It
also removes the v3pipe insn attribute from sparc.md, as it is no longer
needed.

gcc/ChangeLog:

* config/sparc/niagara7.md: Rework the DFA scheduler to use insn
subtypes.
* config/sparc/sparc.md: Remove the `v3pipe' insn attribute.
("*movdi_insn_sp32"): Likewise.
("*movsi_insn"): Likewise.
("*movdi_insn_sp64"): Likewise.
("*movsf_insn"): Likewise.
("*movdf_insn_sp32"): Likewise.
("*movdf_insn_sp64"): Likewise.
("*zero_extendsidi2_insn_sp64"): Likewise.
("*sign_extendsidi2_insn"): Likewise.
("*mov_insn"): Likewise.
("*mov_insn_sp64"): Likewise.
("*mov_insn_sp32"): Likewise.
("3"): Likewise.
("3"): Likewise.
("*not_3"): Likewise.
("*nand_vis"): Likewise.
("*_not1_vis"): Likewise.
("*_not2_vis"): Likewise.
("one_cmpl2"): Likewise.
("faligndata_vis"): Likewise.
("alignaddrsi_vis"): Likewise.
("alignaddrdi_vis"): Likewise.
("alignaddrlsi_vis"): Likewise.
("alignaddrldi_vis"): Likewise.
("fcmp_vis"): Likewise.
("bmaskdi_vis"): Likewise.
("bmasksi_vis"): Likewise.
("bshuffle_vis"): Likewise.
("cmask8_vis"): Likewise.
("cmask16_vis"): Likewise.
("cmask32_vis"): Likewise.
("pdistn_vis"): Likewise.
("3"): Likewise.
---
 gcc/ChangeLog|  38 +
 gcc/config/sparc/niagara7.md | 181 ++-
 gcc/config/sparc/sparc.md|  93 +++---
 3 files changed, 192 insertions(+), 120 deletions(-)

diff --git a/gcc/config/sparc/niagara7.md b/gcc/config/sparc/niagara7.md
index 3f46198..23b6707 100644
--- a/gcc/config/sparc/niagara7.md
+++ b/gcc/config/sparc/niagara7.md
@@ -19,64 +19,120 @@
 
 (define_automaton "niagara7_0")
 
-(define_cpu_unit "n7_slot0,n7_slot1,n7_slot2" "niagara7_0")
-(define_reservation "n7_single_issue" "n7_slot0 + n7_slot1 + n7_slot2")
+;; The S4 core has a dual-issue queue.  This queue is divided into two
+;; slots.  One instruction can be issued each cycle to each slot, and
+;; up to 2 instructions are committed each cycle.  Each slot serves
+;; several execution units, as depicted below:
+;;
+;;
+;; m7_slot0 - Integer unit.
+;;  - Load/Store unit.
+;; === QUEUE ==>
+;;
+;; m7_slot1 - Integer unit.
+;;  - Branch unit.
+;;  - Floating-point and graphics unit.
+;;  - 3-cycles crypto unit.
 
-(define_cpu_unit "n7_load_store" "niagara7_0")
+(define_cpu_unit "n7_slot0,n7_slot1" "niagara7_0")
+
+;; Some instructions stall the pipeline and avoid any other
+;; instruction to be issued in the same cycle.  We assume the same for
+;; multi-instruction insns.
+
+(define_reservation "n7_single_issue" "n7_slot0 + n7_slot1")
 
 (define_insn_reservation "n7_single" 1
   (and (eq_attr "cpu" "niagara7")
 (eq_attr "type" "multi,savew,flushw,trap"))
   "n7_single_issue")
 
-(define_insn_reservation "n7_iflush" 27
-  (and (eq_attr "cpu" "niagara7")
-   (eq_attr "type" "iflush"))
-  "(n7_slot0 | n7_slot1), nothing*26")
+;; Most of the instructions executing in the integer unit have a
+;; latency of 1.
 
 (define_insn_reservation "n7_integer" 1
   (and (eq_attr "cpu" "niagara7")
 (eq_attr "type" "ialu,ialuX,shift,cmove,compare"))
   "(n7_slot0 | n7_slot1)")
 
+;; Flushing the instruction memory takes 27 cycles.
+
+(define_insn_reservation "n7_iflush" 27
+  (and (eq_attr "cpu" "niagara7")
+   (eq_attr "type" "iflush"))
+  "(n7_slot0 | n7_slot1), nothing*26")
+
+;; The integer multiplication instructions have a latency of 12 cycles
+;; and execute in the integer unit.
+;;
+;; Likewise for array*, edge* and pdistn instructions.
+
 (define_insn_reservation "n7_imul" 12
   (and (eq_attr "cpu" "niagara7")
-(eq_attr "type" "imul"))
-  "n7_slot1, nothing*11")
+(eq_attr "type" "imul,array,edge,edgen,pdistn"))
+  "(n7_slot0 | n7_slot1), nothing*11")
+
+;; The integer division instructions have a latency of 35 cycles and
+;; execute in the integer unit.
 
 (define_insn_reservation "n7_idiv" 35
   (and (eq_attr "cpu" "niagara7")
 (eq_attr "type" "idiv"))
-  "n7_slot1, nothing*34")
+  "(n7_slot0 | n7_slot1), nothing*34")
+
+;; Both integer and floating-point load instructions have a latency of
+;; 5 cycles, and execute in the slot0.
+;;
+;; The prefetch instruction also executes in the load/store unit, but
+;; its latency is only 1 cycle.
 
 (define_insn_reservation "n7_load" 5
   (and (eq_attr "cpu" "niagara7")
-(eq_attr "type" "load,fpload,sload"))
-  "(n7_slot0 + n7_load_store), nothing*4")
+   (ior (eq_attr "type" "fpload,sload")
+(and (eq_attr "type" "load")
+ (eq_attr "subtype" "regular"
+  "n7_slot0, nothing*4")
+
+(define_in

[PATCH V2 5/7] sparc: basic support for the SPARC M8 cpu

2017-07-05 Thread Jose E. Marchesi
This patch adds the following support for the SPARC M8 cpu, which
implements the Oracle SPARC Architecture 2017:

- Support for -mcpu=m8 and -mtune=m8.
- Definition of cpu target macros and specs in the backend.
- Tuning of backend parameters for the M8.
- Addition of a new cpu type m8 in the machine description.

gcc/ChangeLog:

* config.gcc: Handle m8 in --with-{cpu,tune} options.
* config.in: Add HAVE_AS_SPARC6 define.
* config/sparc/driver-sparc.c (cpu_names): Add entry for the SPARC
M8.
* config/sparc/sol2.h (CPP_CPU64_DEFAULT_SPEC): Define for
TARGET_CPU_m8.
(ASM_CPU32_DEFAULT_SPEC): Likewise.
(CPP_CPU_SPEC): Handle m8.
(ASM_CPU_SPEC): Likewise.
* config/sparc/sparc-opts.h (enum processor_type): Add
PROCESSOR_M8.
* config/sparc/sparc.c (m8_costs): New struct.
(sparc_option_override): Handle TARGET_CPU_m8.
(sparc32_initialize_trampoline): Likewise.
(sparc64_initialize_trampoline): Likewise.
(sparc_issue_rate): Likewise.
(sparc_register_move_cost): Likewise.
* config/sparc/sparc.h (TARGET_CPU_m8): Define.
(CPP_CPU64_DEFAULT_SPEC): Define for M8.
(ASM_CPU64_DEFAULT_SPEC): Likewise.
(CPP_CPU_SPEC): Handle M8.
(ASM_CPU_SPEC): Likewise.
(AS_M8_FLAG): Define.
* config/sparc/sparc.md: Add m8 to the cpu attribute.
* config/sparc/sparc.opt: New option -mcpu=m8 for sparc targets.
* configure.ac (HAVE_AS_SPARC6): Check for assembler support for
M8 instructions.
* configure: Regenerate.
* doc/invoke.texi (SPARC Options): Document -mcpu=m8 and
-mtune=m8.
---
 gcc/ChangeLog   | 33 +
 gcc/config.gcc  |  2 +-
 gcc/config.in   |  4 +++
 gcc/config/sparc/driver-sparc.c |  1 +
 gcc/config/sparc/sol2.h | 14 +++--
 gcc/config/sparc/sparc-opts.h   |  1 +
 gcc/config/sparc/sparc.c| 65 ++---
 gcc/config/sparc/sparc.h| 16 +-
 gcc/config/sparc/sparc.md   |  3 +-
 gcc/config/sparc/sparc.opt  |  3 ++
 gcc/configure   | 35 ++
 gcc/configure.ac| 12 
 gcc/doc/invoke.texi | 12 
 13 files changed, 181 insertions(+), 20 deletions(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index c5ae8ca..07d0410 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -4435,7 +4435,7 @@ case "${target}" in
| sparclite | f930 | f934 | sparclite86x \
| sparclet | tsc701 \
| v9 | ultrasparc | ultrasparc3 | niagara | niagara2 \
-   | niagara3 | niagara4 | niagara7)
+   | niagara3 | niagara4 | niagara7 | m8)
# OK
;;
*)
diff --git a/gcc/config.in b/gcc/config.in
index bf2aa7b..bff886a 100644
--- a/gcc/config.in
+++ b/gcc/config.in
@@ -660,6 +660,10 @@
 #undef HAVE_AS_SPARC5_VIS4
 #endif
 
+/* Define if your assembler supports SPARC6 instructions. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_AS_SPARC6
+#endif
 
 /* Define if your assembler and linker support GOTDATA_OP relocs. */
 #ifndef USED_FOR_TARGET
diff --git a/gcc/config/sparc/driver-sparc.c b/gcc/config/sparc/driver-sparc.c
index b96ef47..0c25d6c 100644
--- a/gcc/config/sparc/driver-sparc.c
+++ b/gcc/config/sparc/driver-sparc.c
@@ -79,6 +79,7 @@ static const struct cpu_names {
 #endif
   { "SPARC-M7","niagara7" },
   { "SPARC-S7","niagara7" },
+  { "SPARC-M8","m8" },
   { NULL,  NULL }
   };
 
diff --git a/gcc/config/sparc/sol2.h b/gcc/config/sparc/sol2.h
index 8a50bfe..b8177c0 100644
--- a/gcc/config/sparc/sol2.h
+++ b/gcc/config/sparc/sol2.h
@@ -174,13 +174,22 @@ along with GCC; see the file COPYING3.  If not see
 #define ASM_CPU64_DEFAULT_SPEC AS_SPARC64_FLAG AS_NIAGARA7_FLAG
 #endif
 
+#if TARGET_CPU_DEFAULT == TARGET_CPU_m8
+#undef CPP_CPU64_DEFAULT_SPEC
+#define CPP_CPU64_DEFAULT_SPEC ""
+#undef ASM_CPU32_DEFAULT_SPEC
+#define ASM_CPU32_DEFAULT_SPEC AS_SPARC32_FLAG AS_M8_FLAG
+#undef ASM_CPU64_DEFAULT_SPEC
+#define ASM_CPU64_DEFAULT_SPEC AS_SPARC64_FLAG AS_M8_FLAG
+#endif
+
 #undef CPP_CPU_SPEC
 #define CPP_CPU_SPEC "\
 %{mcpu=sparclet|mcpu=tsc701:-D__sparclet__} \
 %{mcpu=sparclite|mcpu-f930|mcpu=f934:-D__sparclite__} \
 %{mcpu=v8:" DEF_ARCH32_SPEC("-D__sparcv8") "} \
 %{mcpu=supersparc:-D__supersparc__ " DEF_ARCH32_SPEC("-D__sparcv8") "} \
-%{mcpu=v9|mcpu=ultrasparc|mcpu=ultrasparc3|mcpu=niagara|mcpu=niagara2|mcpu=niagara3|mcpu=niagara4|mcpu=niagara7:"
 DEF_ARCH32_SPEC("-D__sparcv8") "} \
+%{mcpu=v9|mcpu=ultrasparc|mcpu=ultrasparc3|mcpu=niagara|mcpu=niagara2|mcpu=niagara3|mcpu=niagara4|mcpu=niagara7|mcpu=m8:"
 DEF_ARCH32_SPEC("-D__sparcv8") "} \
 %{!mcpu*:%(cpp_cpu_default)} \
 "
 
@@ -290

[PATCH V2 3/7] sparc: introduce insn subtypes

2017-07-05 Thread Jose E. Marchesi
This patch introduces a new insn attribute `subtype', and marks
existing insns appropriately.  The resulting instruction hierarchy is
documented in a comment.

gcc/ChangeLog:

* config/sparc/sparc.md ("subtype"): New insn attribute.
("*wrgsr_sp64"): Set insn subtype.
("*rdgsr_sp64"): Likewise.
("alignaddrsi_vis"): Likewise.
("alignaddrdi_vis"): Likewise.
("alignaddrlsi_vis"): Likewise.
("alignaddrldi_vis"): Likewise.
("3"): Likewise.
("fexpand_vis"): Likewise.
("fpmerge_vis"): Likewise.
("faligndata_vis"): Likewise.
("bshuffle_vis"): Likewise.
("cmask8_vis"): Likewise.
("cmask16_vis"): Likewise.
("cmask32_vis"): Likewise.
("fchksm16_vis"): Likewise.
("v3"): Likewise.
("fmean16_vis"): Likewise.
("fp64_vis"): Likewise.
("v8qi3"): Likewise.
("3"): Likewise.
("3"): Likewise.
("3"): Likewise.
("v8qi3"): Likewise.
("3"): Likewise.
("*movqi_insn"): Likewise.
("*movhi_insn"): Likewise.
("*movsi_insn"): Likewise.
("movsi_pic_gotdata_op"): Likewise.
("*movdi_insn_sp32"): Likewise.
("*movdi_insn_sp64"): Likewise.
("movdi_pic_gotdata_op"): Likewise.
("*movsf_insn"): Likewise.
("*movdf_insn_sp32"): Likewise.
("*movdf_insn_sp64"): Likewise.
("*zero_extendhisi2_insn"): Likewise.
("*zero_extendqihi2_insn"): Likewise.
("*zero_extendqisi2_insn"): Likewise.
("*zero_extendqidi2_insn"): Likewise.
("*zero_extendhidi2_insn"): Likewise.
("*zero_extendsidi2_insn_sp64"): Likewise.
("ldfsr"): Likewise.
("prefetch_64"): Likewise.
("prefetch_32"): Likewise.
("tie_ld32"): Likewise.
("tie_ld64"): Likewise.
("*tldo_ldub_sp32"): Likewise.
("*tldo_ldub1_sp32"): Likewise.
("*tldo_ldub2_sp32"): Likewise.
("*tldo_ldub_sp64"): Likewise.
("*tldo_ldub1_sp64"): Likewise.
("*tldo_ldub2_sp64"): Likewise.
("*tldo_ldub3_sp64"): Likewise.
("*tldo_lduh_sp32"): Likewise.
("*tldo_lduh1_sp32"): Likewise.
("*tldo_lduh_sp64"): Likewise.
("*tldo_lduh1_sp64"): Likewise.
("*tldo_lduh2_sp64"): Likewise.
("*tldo_lduw_sp32"): Likewise.
("*tldo_lduw_sp64"): Likewise.
("*tldo_lduw1_sp64"): Likewise.
("*tldo_ldx_sp64"): Likewise.
("*mov_insn"): Likewise.
("*mov_insn_sp64"): Likewise.
("*mov_insn_sp32"): Likewise.
---
 gcc/ChangeLog |  68 
 gcc/config/sparc/sparc.md | 199 --
 2 files changed, 243 insertions(+), 24 deletions(-)

diff --git a/gcc/config/sparc/sparc.md b/gcc/config/sparc/sparc.md
index 04da8ae..d1bf6a7 100644
--- a/gcc/config/sparc/sparc.md
+++ b/gcc/config/sparc/sparc.md
@@ -268,7 +268,86 @@
  (eq_attr "cpu_feature" "vis4") (symbol_ref "TARGET_VIS4")]
 (const_int 0)))
 
-;; Insn type.
+;; The SPARC instructions used by the backend are organized into a
+;; hierarchy using the insn attributes "type" and "subtype".
+;;
+;; The mnemonics used in the list below are the architectural names
+;; used in the Oracle SPARC Architecture specs.  A / character
+;; separates the type from the subtype where appropriate.  For
+;; brevity, text enclosed in {} denotes alternatives, while text
+;; enclosed in [] is optional.
+;;
+;; Please keep this list updated.  It is of great help for keeping the
+;; correctness and coherence of the DFA schedulers.
+;;
+;; ialu:  
+;; ialuX: ADD[X]C SUB[X]C
+;; shift: SLL[X] SRL[X] SRA[X]
+;; cmove: MOV{A,N,NE,E,G,LE,GE,L,GU,LEU,CC,CS,POS,NEG,VC,VS}
+;;MOVF{A,N,U,G,UG,L,UL,LG,NE,E,UE,GE,UGE,LE,ULE,O}
+;;MOVR{Z,LEZ,LZ,NZ,GZ,GEZ}
+;; compare: ADDcc ADDCcc ANDcc ORcc SUBcc SUBCcc XORcc XNORcc
+;; imul: MULX SMUL[cc] UMUL UMULXHI XMULX XMULXHI
+;; idiv: UDIVX SDIVX
+;; flush: FLUSH
+;; load/regular: LD{UB,UH,UW} LDFSR
+;; load/prefetch: PREFETCH
+;; fpload: LDF LDDF LDQF
+;; sload: LD{SB,SH,SW}
+;; store: ST{B,H,W,X} STFSR
+;; fpstore: STF STDF STQF
+;; cbcond: CWB{NE,E,G,LE,GE,L,GU,LEU,CC,CS,POS,NEG,VC,VS}
+;; CXB{NE,E,G,LE,GE,L,GU,LEU,CC,CS,POS,NEG,VC,VS}
+;; uncond_branch: BA BPA JMPL
+;; branch: B{NE,E,G,LE,GE,L,GU,LEU,CC,CS,POS,NEG,VC,VS}
+;; BP{NE,E,G,LE,GE,L,GU,LEU,CC,CS,POS,NEG,VC,VS}
+;; FB{U,G,UG,L,UL,LG,NE,BE,UE,GE,UGE,LE,ULE,O}
+;; call: CALL
+;; return: RESTORE RETURN
+;; fpmove: FABS{s,d,q} FMOV{s,d,q} FNEG{s,d,q}
+;; fpcmove: FMOV{S,D,Q}{icc,xcc,fcc}
+;; fpcrmove: FMOVR{s,d,q}{Z,LEZ,LZ,NZ,GZ,GEZ}
+;; fp: FADD{s,d,q} FSUB{s,d,q} FHSUB{s,d} FNHADD{s,d} FNADD{s,d}
+;; FiTO{s,d,q} FsTO{i,x,d,q} FdTO{i,x,s,q} FxTO{d,s,q} FqTO{i,x,s,d}
+;; fpcmp: FCMP{s,d,q} FCMPE{s,d,q}
+;; fpmul: FMADD{s,d}  FMSUB{s,d} FMUL{s,d,q} FNMADD{s,d}
+;;FNMSUB{s,d} FNMUL{s,d} FNsMULd FsMULd
+;;FdMU

Re: [C++ PATCH] classtype_has_nothrow_assign_or_copy_p is confusing

2017-07-05 Thread Jason Merrill
On Mon, Jul 3, 2017 at 12:58 PM, Nathan Sidwell  wrote:
> I found classtype_has_nothrow_assign_or_copy_p confusing, trying to figure
> out when it should return false and when true.

I'm curious why you were looking at it?  It's only used by obsolete
trait built-ins.

Jason


Re: [C++ PATCH] classtype_has_nothrow_assign_or_copy_p is confusing

2017-07-05 Thread Nathan Sidwell

On 07/05/2017 10:41 AM, Jason Merrill wrote:
> On Mon, Jul 3, 2017 at 12:58 PM, Nathan Sidwell  wrote:
>> I found classtype_has_nothrow_assign_or_copy_p confusing, trying to figure
>> out when it should return false and when true.
>
> I'm curious why you were looking at it?  It's only used by obsolete
> trait built-ins.

It showed up grepping CLASSTYPE_CONSTRUCTORS.

Here's my plan for class member name handling:

1) turn the sorted field vector into a member hash by name.  Deploy
STAT_HACK as appropriate
2) find cdtors by name in method_vec, not magic slot
3) move all-but-conv-ops from METHOD_VEC into the member hash table
4) (maybe) put all the conv ops on a single overload, found by name in
the member hash table. (I'm thinking the separation by non-canonical
type is not a win)

Put the lookup & pushing routines in name-lookup.c.  Keep the hierarchy
searching in search.c


TYPE_METHODS & CLASSTYPE_METHOD_VEC die.

Happy to talk through this next week in Toronto (I presume you'll be there?)

nathan

--
Nathan Sidwell


Re: [PATCH] Simplify vec_merge of vec_duplicate with const_vector

2017-07-05 Thread Kyrill Tkachov


On 27/06/17 23:29, Jeff Law wrote:

On 06/06/2017 02:25 AM, Kyrill Tkachov wrote:

Hi all,

I'm trying to improve some of the RTL-level handling of vector lane
operations on aarch64 and that
involves dealing with a lot of vec_merge operations. One simplification
that I noticed missing
from simplify-rtx are combinations of vec_merge with vec_duplicate.
In this particular case:
(vec_merge (vec_duplicate (X)) (const_vector [A, B]) (const_int N))

which can be replaced with

(vec_concat (X) (B)) if N == 1 (0b01) or
(vec_concat (A) (X)) if N == 2 (0b10).
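
The two rewrites above can be checked with a small stand-alone model. This is only a sketch: Vec2, vec_duplicate, vec_concat and vec_merge below are illustrative C++ stand-ins rather than GCC's RTL interfaces, and the model assumes the documented vec_merge convention that a set mask bit selects the lane from the first operand and a clear bit selects it from the second.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a two-lane vector value such as V2DI.
using Vec2 = std::array<std::int64_t, 2>;

// Model of vec_duplicate: broadcast one scalar into every lane.
Vec2 vec_duplicate(std::int64_t x) { return {x, x}; }

// Model of vec_concat on two scalars: the first operand becomes lane 0.
Vec2 vec_concat(std::int64_t lane0, std::int64_t lane1) { return {lane0, lane1}; }

// Model of vec_merge: if bit i of mask is set, lane i comes from v1,
// otherwise it comes from v2.
Vec2 vec_merge(const Vec2 &v1, const Vec2 &v2, unsigned mask) {
  assert(mask < 4u && "only two lanes are modelled");
  Vec2 result{};
  for (unsigned i = 0; i < 2; ++i)
    result[i] = ((mask >> i) & 1u) ? v1[i] : v2[i];
  return result;
}

// Returns true when both rewrites hold for the given X, A and B.
bool simplification_holds(std::int64_t x, std::int64_t a, std::int64_t b) {
  const Vec2 cv = {a, b};
  return vec_merge(vec_duplicate(x), cv, 1) == vec_concat(x, b)
      && vec_merge(vec_duplicate(x), cv, 2) == vec_concat(a, x);
}
```

Calling simplification_holds (7, 1, 2) exercises both the N == 1 and N == 2 cases; the model says nothing about modes or about the other mask values.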

For the aarch64 testcase in this patch this simplification allows us to
try to combine:
(set (reg:V2DI 77 [ x ])
 (vec_concat:V2DI (mem:DI (reg:DI 0 x0 [ y ]) [1 *y_3(D)+0 S8 A64])
 (const_int 0 [0])))

instead of the more complex:
(set (reg:V2DI 77 [ x ])
 (vec_merge:V2DI (vec_duplicate:V2DI (mem:DI (reg:DI 0 x0 [ y ]) [1
*y_3(D)+0 S8 A64]))
 (const_vector:V2DI [
 (const_int 0 [0])
 (const_int 0 [0])
 ])
 (const_int 1 [0x1])))


For the simplified form above we already have an aarch64 pattern:
*aarch64_combinez which
is missing a DI/DFmode version due to an oversight, so this patch
extends that pattern as well to
use the VDC mode iterator that includes DI and DFmode (as well as V2HF
which VD_BHSI was missing).
The aarch64 hunk is needed to see the benefit of the simplify-rtx.c
hunk, so I didn't split them
into separate patches.

Before this for the testcase we'd generate:
construct_lanedi:
 movi v0.4s, 0
 ldr x0, [x0]
 ins v0.d[0], x0
 ret

construct_lanedf:
 movi v0.2d, 0
 ldr d1, [x0]
 ins v0.d[0], v1.d[0]
 ret

but now we can generate:
construct_lanedi:
 ldr d0, [x0]
 ret

construct_lanedf:
 ldr d0, [x0]
 ret

Bootstrapped and tested on aarch64-none-linux-gnu.

Ok for trunk?

Thanks,
Kyrill

2017-06-06  Kyrylo Tkachov  

 * simplify-rtx.c (simplify_ternary_operation, VEC_MERGE):
 Simplify vec_merge of vec_duplicate and const_vector.
 * config/aarch64/predicates.md (aarch64_simd_or_scalar_imm_zero):
 New predicate.
 * config/aarch64/aarch64-simd.md (*aarch64_combinez): Use VDC
 mode iterator.  Update predicate on operand 1 to
 handle non-const_vec constants.  Delete constraints.
 (*aarch64_combinez_be): Likewise for operand 2.

2017-06-06  Kyrylo Tkachov  

 * gcc.target/aarch64/construct_lane_zero_1.c: New test.

OK for the simplify-rtx parts.


Thanks Jeff.
Pinging the aarch64 parts at:
https://gcc.gnu.org/ml/gcc-patches/2017-06/msg00272.html

I've re-bootstrapped and re-tested the patches on top of current trunk.

Kyrill


jeff





Re: [PATCH] vec_merge + vec_duplicate + vec_concat simplification

2017-07-05 Thread Kyrill Tkachov


On 27/06/17 23:28, Jeff Law wrote:

On 06/06/2017 02:35 AM, Kyrill Tkachov wrote:

Hi all,

Another vec_merge simplification that's missing is transforming:
(vec_merge (vec_duplicate x) (vec_concat (y) (z)) (const_int N))
into
(vec_concat x z) if N == 1 (0b01) or
(vec_concat y x) if N == 2 (0b10)

For the testcase in this patch on aarch64 this allows us to try matching
during combine the pattern:
(set (reg:V2DI 78 [ x ])
 (vec_concat:V2DI
 (mem:DI (reg/v/f:DI 76 [ y ]) [1 *y_4(D)+0 S8 A64])
 (mem:DI (plus:DI (reg/v/f:DI 76 [ y ])
 (const_int 8 [0x8])) [1 MEM[(long long int *)y_4(D) +
8B]+0 S8 A64])))

rather than the more complex:
(set (reg:V2DI 78 [ x ])
 (vec_merge:V2DI (vec_duplicate:V2DI (mem:DI (plus:DI (reg/v/f:DI 76
[ y ])
 (const_int 8 [0x8])) [1 MEM[(long long int *)y_4(D)
+ 8B]+0 S8 A64]))
 (vec_duplicate:V2DI (mem:DI (reg/v/f:DI 76 [ y ]) [1 *y_4(D)+0
S8 A64]))
 (const_int 2 [0x2])))

We don't actually have an aarch64 pattern for the simplified version
above, but it's a simple enough
form to add, so this patch adds such a pattern that performs a
concatenated load of two 64-bit vectors
in adjacent memory locations as a single Q-register LDR. The new aarch64
pattern is needed to demonstrate
the effectiveness of the simplify-rtx change, so I've kept them together
as one patch.

Now for the testcase in the patch we can generate:
construct_lanedi:
 ldr q0, [x0]
 ret

construct_lanedf:
 ldr q0, [x0]
 ret

instead of:
construct_lanedi:
 ld1r {v0.2d}, [x0]
 ldr x0, [x0, 8]
 ins v0.d[1], x0
 ret

construct_lanedf:
 ld1r {v0.2d}, [x0]
 ldr d1, [x0, 8]
 ins v0.d[1], v1.d[0]
 ret

The new memory constraint Utq is needed because we need to allow only
the Q-register addressing modes but
the MEM expressions in the RTL pattern have 64-bit vector modes, and if
we don't constrain them they will
allow the D-register addressing modes during register allocation/address
mode selection, which will produce
invalid assembly.

Bootstrapped and tested on aarch64-none-linux-gnu.
Ok for trunk?

Thanks,
Kyrill

2017-06-06  Kyrylo Tkachov  

 * simplify-rtx.c (simplify_ternary_operation, VEC_MERGE):
 Simplify vec_merge of vec_duplicate and vec_concat.
 * config/aarch64/constraints.md (Utq): New constraint.
 * config/aarch64/aarch64-simd.md (load_pair_lanes): New
 define_insn.

2017-06-06  Kyrylo Tkachov  

 * gcc.target/aarch64/load_v2vec_lanes_1.c: New test.

OK for the simplify-rtx bits.


Thanks Jeff.
I'd like to ping the aarch64 bits:
https://gcc.gnu.org/ml/gcc-patches/2017-06/msg00273.html

I've re-bootstrapped and re-tested these patches on aarch64 with today's trunk.

Kyrill


jeff





Re: Default std::list default and move constructors

2017-07-05 Thread Jonathan Wakely

On 26/06/17 21:29 +0200, François Dumont wrote:

Hi

   Here is the patch to default the implementation of the std::list
default and move constructors.


   I introduce _List_node_header to take care of the move
implementation and also to isolate management of the optional list size
storage. I prefer it to using _List_node<size_t>, as the move
constructor seems complicated to implement with an __aligned_membuf.
It also avoids using raw memory as if it were a size_t without
constructing it. Even if the size_t constructor is trivial, I guess some
memory analyser could have complained about it.


No, that's not something we should be concerned about. As long as we
write to that memory before we read it, there's no problem. I still
like your new _List_node_header type, but not for this reason.



   * include/bits/stl_list.h
   (_List_node_base()): Define.
   (_List_node_base(_List_node_base*, _List_node_base*)): New.
   (struct _List_node_header): New.
   (_List_impl()): Fix noexcept qualification.
   (_List_impl(_List_impl&&)): New, default.
   (_List_impl(_List_impl&&, _Node_alloc_type&&)): New.
   (_List_base()): Default.
   (_List_base(_List_base&&)): Default.
   (_List_base(_List_base&&, _Node_alloc_type&&, true_type)): New.
   (_List_base(_List_base&&, _Node_alloc_type&&, false_type)): New.
   (_List_base(_List_base&&, _Node_alloc_type&&)): Use the latter two.
   (_List_base::_M_move_nodes): Adapt to use
   _List_node_header._M_move_nodes.
   (_List_base::_M_init): Likewise.
   (list<>()): Default.
   (list<>(list&&)): Default.
   (list<>::_M_move_assign(list&&, true_type)): Use _M_move_nodes.

   Tested under Linux x86_64.

Ok to commit ?


It's mostly good, but I'd like to make a few suggestions ...



diff --git a/libstdc++-v3/include/bits/stl_list.h 
b/libstdc++-v3/include/bits/stl_list.h
index 232885a..7e5 100644
--- a/libstdc++-v3/include/bits/stl_list.h
+++ b/libstdc++-v3/include/bits/stl_list.h
@@ -82,6 +82,17 @@ namespace std _GLIBCXX_VISIBILITY(default)
  _List_node_base* _M_next;
  _List_node_base* _M_prev;

+#if __cplusplus >= 201103L
+  _List_node_base() = default;
+#else
+  _List_node_base()
+  { }
+#endif
+
+  _List_node_base(_List_node_base* __next, _List_node_base* __prev)
+   : _M_next(__next), _M_prev(__prev)
+  { }
+


I think I'd prefer to leave this struct with no user-defined
constructors, instead of adding these.


  static void
  swap(_List_node_base& __x, _List_node_base& __y) _GLIBCXX_USE_NOEXCEPT;

@@ -99,6 +110,79 @@ namespace std _GLIBCXX_VISIBILITY(default)
  _M_unhook() _GLIBCXX_USE_NOEXCEPT;
};

+/// The %list node header.
+struct _List_node_header : public _List_node_base
+{
+private:
+#if _GLIBCXX_USE_CXX11_ABI
+  std::size_t _M_size;
+#endif


I don't think this needs to be private, because we don't have to worry
about users accessing this member. It's an internal-only type, and the
_M_next and _M_prev members are already public.

If it's public then the _List_base::_M_inc_size, _M_dec_size etc.
could access it directly, and we don't need to add duplicates of those
functions to _List_impl.


+
+  _List_node_base* _M_base() { return this; }


Is this function necessary?


+public:
+  _List_node_header() _GLIBCXX_NOEXCEPT
+  : _List_node_base(this, this)
+# if _GLIBCXX_USE_CXX11_ABI
+  , _M_size(0)
+# endif
+  { }


This could be:

 _List_node_header() _GLIBCXX_NOEXCEPT
 { _M_init(); }



+#if __cplusplus >= 201103L
+  _List_node_header(_List_node_header&& __x) noexcept
+  : _List_node_base(__x._M_next, __x._M_prev)


And this could use aggregate-initialization:

 : _List_node_base{__x._M_next, __x._M_prev}


+# if _GLIBCXX_USE_CXX11_ABI
+  , _M_size(__x._M_size)
+# endif
+{
+   if (__x._M_base()->_M_next == __x._M_base())
+ this->_M_next = this->_M_prev = this;
+   else
+ {
+   this->_M_next->_M_prev = this->_M_prev->_M_next = this->_M_base();
+   __x._M_init();
+ }
+  }



+#if _GLIBCXX_USE_CXX11_ABI
+  size_t _M_get_size() const { return _M_size; }
+  void _M_set_size(size_t __n) { _M_size = __n; }
+  void _M_inc_size(size_t __n) { _M_size += __n; }
+  void _M_dec_size(size_t __n) { _M_size -= __n; }
+#else
+  // dummy implementations used when the size is not stored
+  size_t _M_get_size() const { return 0; }
+  void _M_set_size(size_t) { }
+  void _M_inc_size(size_t) { }
+  void _M_dec_size(size_t) { }
+#endif


What do you think about only having _M_set_size() here?
The other functions are not needed here (assuming _M_size is public).

We could even get rid of _M_set_size and use #if in _M_init:


+  void
+  _M_init() _GLIBCXX_NOEXCEPT
+  {
+   this->_M_next = this->_M_prev = this;

#if _GLIBCXX_USE_CXX11_ABI
   _M_size = 0;
#endif


+   _M_set_size(0);
+  }
+};


This replaces a #if and #else and four functions with a #if and no
functions, which seem

Remove enum before machine_mode

2017-07-05 Thread Richard Sandiford
r216834 did a mass removal of "enum" before "machine_mode".  This patch
removes some new uses that have been added since then.

Tested on aarch64-linux-gnu and x86_64-linux-gnu, and also by building
at least one target per config/cpu.  Applied as obvious on the basis
that r216834 was OK.
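
As a minimal illustration of what the removal amounts to in C++: the keyword forms an elaborated type specifier that names the same type as the bare identifier, so dropping it does not change behaviour. The type demo_mode and its enumerators are invented for this sketch and are not GCC's machine_mode.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for machine_mode; the enumerators are invented.
enum demo_mode { DEMO_VOIDmode, DEMO_SImode, DEMO_DImode };

// C-style spelling, using an elaborated type specifier.
int size_in_bytes_old(enum demo_mode mode) {
  switch (mode) {
    case DEMO_SImode: return 4;
    case DEMO_DImode: return 8;
    default: return 0;
  }
}

// Spelling after removing the keyword; the parameter type is identical.
int size_in_bytes_new(demo_mode mode) { return size_in_bytes_old(mode); }

// Both spellings name one and the same type.
static_assert(std::is_same<enum demo_mode, demo_mode>::value,
              "enum demo_mode and demo_mode are the same type");

// Run-time check that the two spellings behave identically.
void check_spellings() {
  assert(size_in_bytes_old(DEMO_SImode) == size_in_bytes_new(DEMO_SImode));
  assert(size_in_bytes_old(DEMO_DImode) == size_in_bytes_new(DEMO_DImode));
}
```

One practical difference, offered here as an observation and not as the stated motivation of the patch: the elaborated form only compiles while the name denotes an enum, whereas the plain spelling places no such requirement on the type.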

Richard


2017-07-05  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* combine.c (simplify_if_then_else): Remove "enum" before
"machine_mode".
* compare-elim.c (can_eliminate_compare): Likewise.
* config/aarch64/aarch64-builtins.c (aarch64_simd_builtin_std_type):
Likewise.
(aarch64_lookup_simd_builtin_type): Likewise.
(aarch64_simd_builtin_type): Likewise.
(aarch64_init_simd_builtin_types): Likewise.
(aarch64_simd_expand_args): Likewise.
* config/aarch64/aarch64-protos.h (aarch64_simd_attr_length_rglist):
Likewise.
(aarch64_reverse_mask): Likewise.
(aarch64_simd_emit_reg_reg_move): Likewise.
(aarch64_gen_adjusted_ldpstp): Likewise.
(aarch64_ccmp_mode_to_code): Likewise.
(aarch64_operands_ok_for_ldpstp): Likewise.
(aarch64_operands_adjust_ok_for_ldpstp): Likewise.
* config/aarch64/aarch64.c (aarch64_ira_change_pseudo_allocno_class):
Likewise.
(aarch64_min_divisions_for_recip_mul): Likewise.
(aarch64_reassociation_width): Likewise.
(aarch64_get_condition_code_1): Likewise.
(aarch64_simd_emit_reg_reg_move): Likewise.
(aarch64_simd_attr_length_rglist): Likewise.
(aarch64_reverse_mask): Likewise.
(aarch64_operands_ok_for_ldpstp): Likewise.
(aarch64_operands_adjust_ok_for_ldpstp): Likewise.
(aarch64_gen_adjusted_ldpstp): Likewise.
* config/aarch64/cortex-a57-fma-steering.c (fma_node::rename):
Likewise.
* config/arc/arc.c (legitimate_offset_address_p): Likewise.
* config/arm/arm-builtins.c (arm_simd_builtin_std_type): Likewise.
(arm_lookup_simd_builtin_type): Likewise.
(arm_simd_builtin_type): Likewise.
(arm_init_simd_builtin_types): Likewise.
(arm_expand_builtin_args): Likewise.
* config/arm/arm-protos.h (arm_expand_builtin): Likewise.
* config/ft32/ft32.c (ft32_libcall_value): Likewise.
(ft32_setup_incoming_varargs): Likewise.
(ft32_function_arg): Likewise.
(ft32_function_arg_advance): Likewise.
(ft32_pass_by_reference): Likewise.
(ft32_arg_partial_bytes): Likewise.
(ft32_valid_pointer_mode): Likewise.
(ft32_addr_space_pointer_mode): Likewise.
(ft32_addr_space_legitimate_address_p): Likewise.
* config/i386/i386-protos.h (ix86_operands_ok_for_move_multiple):
Likewise.
* config/i386/i386.c (ix86_setup_incoming_vararg_bounds): Likewise.
(ix86_emit_outlined_ms2sysv_restore): Likewise.
(iamcu_alignment): Likewise.
(canonicalize_vector_int_perm): Likewise.
(ix86_noce_conversion_profitable_p): Likewise.
(ix86_mpx_bound_mode): Likewise.
(ix86_operands_ok_for_move_multiple): Likewise.
* config/microblaze/microblaze-protos.h
(microblaze_expand_conditional_branch_reg): Likewise.
* config/microblaze/microblaze.c
(microblaze_expand_conditional_branch_reg): Likewise.
* config/powerpcspe/powerpcspe.c (rs6000_init_hard_regno_mode_ok):
Likewise.
(rs6000_reassociation_width): Likewise.
(rs6000_invalid_binary_op): Likewise.
(fusion_p9_p): Likewise.
(emit_fusion_p9_load): Likewise.
(emit_fusion_p9_store): Likewise.
* config/riscv/riscv-protos.h (riscv_regno_mode_ok_for_base_p):
Likewise.
(riscv_hard_regno_mode_ok_p): Likewise.
(riscv_address_insns): Likewise.
(riscv_split_symbol): Likewise.
(riscv_legitimize_move): Likewise.
(riscv_function_value): Likewise.
(riscv_hard_regno_nregs): Likewise.
(riscv_expand_builtin): Likewise.
* config/riscv/riscv.c (riscv_build_integer_1): Likewise.
(riscv_build_integer): Likewise.
(riscv_split_integer): Likewise.
(riscv_legitimate_constant_p): Likewise.
(riscv_cannot_force_const_mem): Likewise.
(riscv_regno_mode_ok_for_base_p): Likewise.
(riscv_valid_base_register_p): Likewise.
(riscv_valid_offset_p): Likewise.
(riscv_valid_lo_sum_p): Likewise.
(riscv_classify_address): Likewise.
(riscv_legitimate_address_p): Likewise.
(riscv_address_insns): Likewise.
(riscv_load_store_insns): Likewise.
(riscv_force_binary): Likewise.
(riscv_split_symbol): Likewise.
(riscv_force_address): Likewise.
(riscv_legitimize_address): Likewise.
(riscv_move_integer): Likewise.
(riscv_legitimize_const_move): Likewise.
(riscv_legitimize_move): Likewise.
  

Use SET_DECL_MODE in libcc1

2017-07-05 Thread Richard Sandiford
Applied as obvious after testing on aarch64-linux-gnu and x86_64-linux-gnu.

Richard


2017-07-05  Richard Sandiford  

libcc1/
* libcp1plugin.cc (plugin_build_field): Use SET_DECL_MODE.

Index: libcc1/libcp1plugin.cc
===
--- libcc1/libcp1plugin.cc  2017-07-02 09:32:31.826745247 +0100
+++ libcc1/libcp1plugin.cc  2017-07-05 16:31:20.451371750 +0100
@@ -1887,7 +1887,7 @@ plugin_build_field (cc1_plugin::connecti
= c_build_bitfield_integer_type (bitsize, TYPE_UNSIGNED (field_type));
 }
 
-  DECL_MODE (decl) = TYPE_MODE (TREE_TYPE (decl));
+  SET_DECL_MODE (decl, TYPE_MODE (TREE_TYPE (decl)));
 
   // There's no way to recover this from DWARF.
   SET_DECL_OFFSET_ALIGN (decl, TYPE_PRECISION (pointer_sized_int_node));
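
The shape of the change, from assigning through an accessor to calling a setter, can be sketched in isolation. The message does not say why the setter is required; the model below simply assumes a read accessor that yields a value and not an assignable lvalue. All names (demo_decl, DEMO_DECL_MODE, DEMO_SET_DECL_MODE) are hypothetical and do not reproduce GCC's tree.h.

```cpp
#include <cassert>

// Hypothetical stand-in for machine_mode.
enum demo_mode { DEMO_VOIDmode, DEMO_SImode, DEMO_DImode };

// Hypothetical declaration node that stores its mode in a packed field.
struct demo_decl {
  unsigned mode_bits : 8;
};

// Read accessor: produces a value, so "DEMO_DECL_MODE (d) = m" would be
// rejected by the compiler.
#define DEMO_DECL_MODE(NODE) (static_cast<demo_mode>((NODE)->mode_bits))

// Write accessor: every store goes through this macro.
#define DEMO_SET_DECL_MODE(NODE, MODE) ((NODE)->mode_bits = (MODE))

// Stores a mode through the setter and reads it back through the getter.
demo_mode set_and_read(demo_decl *decl, demo_mode mode) {
  assert(decl != nullptr);
  DEMO_SET_DECL_MODE(decl, mode);
  return DEMO_DECL_MODE(decl);
}
```

With this split, the representation of the stored mode can change without touching the call sites, which only ever see the getter and the setter.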


Re: [PATCH] C/C++: add fix-it hints for various missing symbols

2017-07-05 Thread David Malcolm
On Mon, 2017-07-03 at 23:01 +, Joseph Myers wrote:
> Does the changed location fix bug 7356?

The patch as-written doesn't affect that bug, since the patch only
affects sites that use c_parser_require and cp_parser_require with
certain token types, and the diagnostic in PR 7356 is emitted by the C
FE here:

      /* This can appear in many cases looking nothing like a
         function definition, so we don't give a more specific
         error suggesting there was one.  */
      c_parser_error (parser, "expected %<=%>, %<,%>, %<;%>, %<asm%> "
                      "or %<__attribute__%>");

(the C++ FE handles it, emitting:

pr7356.c:1:1: error: ‘a’ does not name a type
 a//sample
 ^
)

c_parser_error currently uses the location of the next token, and
concatenates a description of the next token onto the message.

I tried hacking up c_parser_error to unconditionally attempt to use the
location immediately after the previous token.  This "fixes" PR 7356,
giving:

pr7356.c:1:2: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before 
‘typedef’
 a//sample
  ^

This error message might be better worded in terms of the
syntactic thing that came before, which would yield:

pr7356.c:1:2: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’
after declaration
 a//sample
  ^

or somesuch.  Doing so would presumably require adding an extra param to 
c_parser_error, e.g. an enum describing the syntactic elements that go before.

Does this sound worth pursuing as a followup?


Thanks
Dave


[ARM] Implement TARGET_FIXED_CONDITION_CODE_REGS

2017-07-05 Thread Richard Earnshaw (lists)

This patch implements TARGET_FIXED_CONDITION_CODE_REGS on ARM.

We have two main cases to consider: in Thumb1 code there are no
condition code registers, so we simply return false.  For other
cases we set the first pointer to CC_REGNUM and the second to
VFPCC_REGNUM iff generating hard-float code.

Running the CSiBE benchmark I see a couple of cases (both in the same
file) where this feature kicks in, so it's not a major change.
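
The contract described above can be modelled outside GCC as follows. This is a sketch under stated assumptions: the register numbers, the demo_target structure and count_cc_regs are invented for illustration, and only the return-value and out-parameter convention is meant to mirror the hook.

```cpp
#include <cassert>

// Invented register numbers; the real values live in the ARM backend.
constexpr unsigned DEMO_INVALID_REGNUM = ~0u;
constexpr unsigned DEMO_CC_REGNUM = 100;
constexpr unsigned DEMO_VFPCC_REGNUM = 101;

// Hypothetical summary of the target flags the hook consults.
struct demo_target {
  bool is_32bit;    // false models Thumb1
  bool hard_float;  // true when generating hard-float code
};

// Same shape as the hook: return false when there are no fixed condition
// code registers, otherwise fill in up to two register numbers.
bool demo_fixed_condition_code_regs(const demo_target &t,
                                    unsigned *p1, unsigned *p2) {
  assert(p1 != nullptr && p2 != nullptr);
  if (!t.is_32bit)
    return false;
  *p1 = DEMO_CC_REGNUM;
  *p2 = t.hard_float ? DEMO_VFPCC_REGNUM : DEMO_INVALID_REGNUM;
  return true;
}

// A caller's view: how many condition code registers need examining.
int count_cc_regs(const demo_target &t) {
  unsigned r1 = DEMO_INVALID_REGNUM;
  unsigned r2 = DEMO_INVALID_REGNUM;
  if (!demo_fixed_condition_code_regs(t, &r1, &r2))
    return 0;
  return r2 == DEMO_INVALID_REGNUM ? 1 : 2;
}
```

In this model a Thumb1 target reports no registers, a soft-float 32-bit target reports one, and a hard-float target reports two.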

2017-07-05  Richard Earnshaw  

* config/arm/arm.c (arm_fixed_condition_code_regs): New function.
(TARGET_FIXED_CONDITION_CODE_REGS): Redefine.

Installed on trunk.
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index d3a40b9..c6101ef 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -110,6 +110,7 @@ static void arm_print_operand_address (FILE *, machine_mode, rtx);
 static bool arm_print_operand_punct_valid_p (unsigned char code);
 static const char *fp_const_from_val (REAL_VALUE_TYPE *);
 static arm_cc get_arm_condition_code (rtx);
+static bool arm_fixed_condition_code_regs (unsigned int *, unsigned int *);
 static const char *output_multi_immediate (rtx *, const char *, const char *,
 	   int, HOST_WIDE_INT);
 static const char *shift_op (rtx, HOST_WIDE_INT *);
@@ -775,6 +776,9 @@ static const struct attribute_spec arm_attribute_table[] =
 #undef TARGET_CUSTOM_FUNCTION_DESCRIPTORS
 #define TARGET_CUSTOM_FUNCTION_DESCRIPTORS 2
 
+#undef TARGET_FIXED_CONDITION_CODE_REGS
+#define TARGET_FIXED_CONDITION_CODE_REGS arm_fixed_condition_code_regs
+
 
 /* Obstack for minipool constant handling.  */
 static struct obstack minipool_obstack;
@@ -22928,6 +22932,20 @@ get_arm_condition_code (rtx comparison)
   return code;
 }
 
+/* Implement TARGET_FIXED_CONDITION_CODE_REGS.  We only have condition
+   code registers when not targeting Thumb1.  The VFP condition register
+   only exists when generating hard-float code.  */
+static bool
+arm_fixed_condition_code_regs (unsigned int *p1, unsigned int *p2)
+{
+  if (!TARGET_32BIT)
+    return false;
+
+  *p1 = CC_REGNUM;
+  *p2 = TARGET_HARD_FLOAT ? VFPCC_REGNUM : INVALID_REGNUM;
+  return true;
+}
+
 /* Tell arm_asm_output_opcode to output IT blocks for conditionally executed
    instructions.  */
 void
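
For readers who do not have the back end to hand, the hook's contract can be modelled outside GCC. The following is only a sketch: TargetFlags and the k* register numbers are made-up stand-ins for TARGET_32BIT, TARGET_HARD_FLOAT, CC_REGNUM, VFPCC_REGNUM and INVALID_REGNUM, not the real definitions.

```cpp
#include <cassert>

// Stand-in register numbers; the real values live in the ARM back end.
constexpr unsigned int kInvalidRegnum = ~0u;  // plays the role of INVALID_REGNUM
constexpr unsigned int kCcRegnum = 100;       // hypothetical CC_REGNUM
constexpr unsigned int kVfpccRegnum = 101;    // hypothetical VFPCC_REGNUM

// Hypothetical model of the two target macros the hook consults.
struct TargetFlags {
  bool is_32bit;    // models TARGET_32BIT (false for Thumb1)
  bool hard_float;  // models TARGET_HARD_FLOAT
};

// Same shape as the hook: return false when there are no fixed
// condition-code registers, otherwise report up to two of them.
bool fixed_condition_code_regs(const TargetFlags &t,
                               unsigned int *p1, unsigned int *p2)
{
  assert(p1 && p2);
  if (!t.is_32bit)
    return false;

  *p1 = kCcRegnum;
  *p2 = t.hard_float ? kVfpccRegnum : kInvalidRegnum;
  return true;
}
```

A caller is expected to check the return value first and to treat an invalid second register as "only one condition-code register".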


Re: [PATCH v10] add -fpatchable-function-entry=N,M option

2017-07-05 Thread Sandra Loosemore

On 07/05/2017 01:36 AM, Torsten Duwe wrote:

Changes since v9:

* Do not store (declare static) the nop pattern template string.
   In the future, it might depend on the particular function
   being emitted. Fetch it freshly each time instead.

* On platforms without named sections, simply omit the recording
   of the nop locations. Run-time instrumentation can still fiddle
   it out, if desired. Document this behaviour in a half sentence.

* Move the hook documentation to where it belongs. Texi file (re-)
   generation should work cleanly now.

* Documentation clarified as requested.

Torsten

[snip]

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 03ba8fc436c..a4c3c98b9f5 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -3105,6 +3105,27 @@ that affect more than one function.
  This attribute should be used for debugging purposes only.  It is not
  suitable in production code.

+@item patchable_function_entry
+@cindex @code{patchable_function_entry} function attribute
+@cindex extra NOP instructions at the function entry point
+In case the target's text segment can be made writable at run time by
+any means, padding the function entry with a number of NOPs can be
+used to provide a universal tool for instrumentation.
+
+The @code{patchable_function_entry} function attribute can be used to
+change the number of NOPs to any desired value.  The two-value syntax
+is the same as for the command-line switch
+@option{-fpatchable-function-entry=N,M}, generating @var{N} NOPs, with
+the function entry point before the @var{M}th NOP instruction.
+@var{M} defaults to 0 if omitted e.g. function entry point is before
+the first NOP.
+
+If patchable function entries are enabled globally using the command
+line option @option{-fpatchable-function-entry=N,M}, then all functions


s/command line option/command-line option


+that are part of the instrumentation framework must disable
+instrumentation with the attribute @code{patchable_function_entry (0)}
+to prevent recursion.


The functions don't disable instrumentation, programmers disable the 
instrumentation on functions.  Rewrite this clause as


...then you must disable instrumentation on all functions that are part 
of the instrumentation framework with the attribute...
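
As an illustration of what the documented usage looks like from the programmer's side, here is a small sketch. The function names and the PATCH_AREA macro are invented for the example; the attribute itself is the one added by this patch, and the macro expands to nothing on compilers that do not report support for it.

```cpp
#include <cassert>

// Only use the attribute where the compiler claims to know it.
#if defined(__has_attribute)
#  if __has_attribute(patchable_function_entry)
#    define PATCH_AREA(n) __attribute__((patchable_function_entry(n)))
#  endif
#endif
#ifndef PATCH_AREA
#  define PATCH_AREA(n)
#endif

// Hypothetical helper belonging to the instrumentation framework:
// the programmer disables the patch area on it to prevent recursion.
PATCH_AREA(0) int trace_hook(int depth) { return depth + 1; }

// Ordinary function: asks for four NOPs regardless of the
// command-line setting, since the attribute takes precedence.
PATCH_AREA(4) int work(int x) { return trace_hook(x) * 2; }
```

The NOPs do not change what either function computes; they only reserve space at the entry point.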



diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 04cecf94405..1b8a4555b33 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -11520,6 +11520,34 @@ of the function name, it is considered to be a match.  
For C99 and C++
  extended identifiers, the function name must be given in UTF-8, not
  using universal character names.

+@item -fpatchable-function-entry=@var{N}[,@var{M}]
+@opindex fpatchable-function-entry
+Generate @var{N} NOPs right at the beginning
+of each function, with the function entry point before the @var{M}th NOP.
+If @var{M} is omitted, it defaults to @code{0} so the
+function entry points to the address just at the first NOP.
+The NOP instructions reserve extra space which can be used to patch in
+any desired instrumentation at run time, provided that the code segment
+is writable.  The amount of space is controllable indirectly via
+the number of NOPs; the NOP instruction used corresponds to the instruction
+emitted by the internal GCC back-end interface @code{gen_nop}.  This behavior
+is target-specific and may also depend on the architecture variant and/or
+other compilation options.
+
+For run-time identification, the starting addresses of these areas,
+which correspond to their respective function entries minus @var{M},
+are additionally collected in the @code{__patchable_function_entries}
+section of the resulting binary, if the platform supports it.
+
+Note that the value of @code{__attribute__ ((patchable_function_entry
+(N,M)))} takes precedence over command-line option
+@option{-fpatchable-function-entry=N,M}.  This can be used to increase
+the area size or to remove it completely on a single function.
+If @code{N=0}, no pad location is recorded.
+
+The NOP instructions are inserted at --- and maybe before, depending on
+@var{M} --- the function entry address, even before the prologue.


No spaces around the em-dashes '---'.


+DEFHOOK
+(print_patchable_function_entry,
+ "Generate a patchable area at the function start, consisting of\n\
+@var{patch_area_size} NOP instructions.  If the target supports named\n\
+sections and if @var{record_p} is true, insert a pointer to the current\n\
+location in the table of patchable functions.  This table will then be held\n\
+in a special section called @code{__patchable_function_entries}.",


I don't understand when "then" might be.  Can you rewrite this in the 
present tense?  "This table is held..."


-Sandra



Re: [PATCH v10] add -fpatchable-function-entry=N,M option

2017-07-05 Thread Richard Earnshaw (lists)
On 05/07/17 16:38, Sandra Loosemore wrote:
> On 07/05/2017 01:36 AM, Torsten Duwe wrote:
>> Changes since v9:
>>
>> * Do not store (declare static) the nop pattern template string.
>>In the future, it might depend on the particular function
>>being emitted. Fetch it freshly each time instead.
>>
>> * On platforms without named sections, simply omit the recording
>>of the nop locations. Run-time instrumentation can still fiddle
>>it out, if desired. Document this behaviour in a half sentence.
>>
>> * Move the hook documentation to where it belongs. Texi file (re-)
>>generation should work cleanly now.
>>
>> * Documentation clarified as requested.
>>
>> Torsten
>>
>> [snip]
>>
>> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
>> index 03ba8fc436c..a4c3c98b9f5 100644
>> --- a/gcc/doc/extend.texi
>> +++ b/gcc/doc/extend.texi
>> @@ -3105,6 +3105,27 @@ that affect more than one function.
>>   This attribute should be used for debugging purposes only.  It is not
>>   suitable in production code.
>>
>> +@item patchable_function_entry
>> +@cindex @code{patchable_function_entry} function attribute
>> +@cindex extra NOP instructions at the function entry point
>> +In case the target's text segment can be made writable at run time by
>> +any means, padding the function entry with a number of NOPs can be
>> +used to provide a universal tool for instrumentation.
>> +
>> +The @code{patchable_function_entry} function attribute can be used to
>> +change the number of NOPs to any desired value.  The two-value syntax
>> +is the same as for the command-line switch
>> +@option{-fpatchable-function-entry=N,M}, generating @var{N} NOPs, with
>> +the function entry point before the @var{M}th NOP instruction.
>> +@var{M} defaults to 0 if omitted e.g. function entry point is before
>> +the first NOP.
>> +
>> +If patchable function entries are enabled globally using the command
>> +line option @option{-fpatchable-function-entry=N,M}, then all functions
> 
> s/command line option/command-line option
> 
>> +that are part of the instrumentation framework must disable
>> +instrumentation with the attribute @code{patchable_function_entry (0)}
>> +to prevent recursion.
> 
> The functions don't disable instrumentation, programmers disable the
> instrumentation on functions.  Rewrite this clause as
> 
> ...then you must disable instrumentation on all functions that are part
> of the instrumentation framework with the attribute...
> 
>> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>> index 04cecf94405..1b8a4555b33 100644
>> --- a/gcc/doc/invoke.texi
>> +++ b/gcc/doc/invoke.texi
>> @@ -11520,6 +11520,34 @@ of the function name, it is considered to be
>> a match.  For C99 and C++
>>   extended identifiers, the function name must be given in UTF-8, not
>>   using universal character names.
>>
>> +@item -fpatchable-function-entry=@var{N}[,@var{M}]
>> +@opindex fpatchable-function-entry
>> +Generate @var{N} NOPs right at the beginning
>> +of each function, with the function entry point before the @var{M}th
>> NOP.
>> +If @var{M} is omitted, it defaults to @code{0} so the
>> +function entry points to the address just at the first NOP.
>> +The NOP instructions reserve extra space which can be used to patch in
>> +any desired instrumentation at run time, provided that the code segment
>> +is writable.  The amount of space is controllable indirectly via
>> +the number of NOPs; the NOP instruction used corresponds to the
>> instruction
>> +emitted by the internal GCC back-end interface @code{gen_nop}.  This
>> behavior
>> +is target-specific and may also depend on the architecture variant
>> and/or
>> +other compilation options.
>> +
>> +For run-time identification, the starting addresses of these areas,
>> +which correspond to their respective function entries minus @var{M},
>> +are additionally collected in the @code{__patchable_function_entries}
>> +section of the resulting binary, if the platform supports it.
>> +
>> +Note that the value of @code{__attribute__ ((patchable_function_entry
>> +(N,M)))} takes precedence over command-line option
>> +@option{-fpatchable-function-entry=N,M}.  This can be used to increase
>> +the area size or to remove it completely on a single function.
>> +If @code{N=0}, no pad location is recorded.
>> +
>> +The NOP instructions are inserted at --- and maybe before, depending on
>> +@var{M} --- the function entry address, even before the prologue.
> 
> No spaces around the em-dashes '---'.
> 
>> +DEFHOOK
>> +(print_patchable_function_entry,
>> + "Generate a patchable area at the function start, consisting of\n\
>> +@var{patch_area_size} NOP instructions.  If the target supports named\n\
>> +sections and if @var{record_p} is true, insert a pointer to the
>> current\n\
>> +location in the table of patchable functions.  This table will then
>> be held\n\
>> +in a special section called @code{__patchable_function_entries}.",
> 
> I don't understand when "then" might be.  Can you rewr

Re: [ping] don't complain about undefined env vars in self specs on gcc -v

2017-07-05 Thread Joseph Myers
On Wed, 5 Jul 2017, Olivier Hainque wrote:

> Hello,
> 
> Ping for patch proposed here:
> https://gcc.gnu.org/ml/gcc-patches/2017-06/msg00579.html

This patch is OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Forward list default default and move constructors

2017-07-05 Thread Jonathan Wakely

On 19/06/17 22:48 +0200, François Dumont wrote:

Hi

   Here is the patch to default the default and move constructors on 
the std::forward_list. Putting a move constructor on 
_Fwd_list_node_base helped limiting the code impact of this patch. It 
doesn't have any side effect as iterator types using this base type 
are not defining any move semantic.


I don't understand this comment.

1) The iterators only hold _Fwd_list_node_base* pointers, so that's why
they aren't affected. It's not because the iterators don't define move
semantics.

2) The iterators *do* have move semantics, they have
implicitly-declared move operations, which are identical to their
implicitly-defined copy operations (because moving a pointer is
identical to copying it).

3) Adding this move constructor has a pretty large side effect because
now its copy constructor and copy assignment operator are defined as
deleted, and it has no move assignment operator. That's OK, because we
never copy or move nodes (except in the new _Fwd_list_impl move ctor
you're adding). But it's a significant side effect. Please consider
adding the following to make those side effects explicit:

 _Fwd_list_node_base(const _Fwd_list_node_base&) = delete;
 _Fwd_list_node_base& operator=(const _Fwd_list_node_base&) = delete;
 _Fwd_list_node_base& operator=(_Fwd_list_node_base&&) = delete;
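
Points 2 and 3 can be checked with a few static assertions. This is a sketch only: NodeBase and Iter are simplified stand-ins, not the libstdc++ types, and the move constructor body is an assumption about what such a constructor would do.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Simplified stand-in for a list node base with a user-declared move ctor.
struct NodeBase {
  NodeBase() = default;
  NodeBase(NodeBase &&other) noexcept : next(other.next)
  { other.next = nullptr; }
  NodeBase *next = nullptr;
};

// Declaring the move constructor makes the copy operations implicitly
// deleted, and no move assignment operator is declared.
static_assert(std::is_move_constructible<NodeBase>::value, "");
static_assert(!std::is_copy_constructible<NodeBase>::value, "");
static_assert(!std::is_copy_assignable<NodeBase>::value, "");
static_assert(!std::is_move_assignable<NodeBase>::value, "");

// An iterator that only holds a pointer keeps its implicit copy and
// move operations; moving it is the same as copying it.
struct Iter { NodeBase *node = nullptr; };
static_assert(std::is_trivially_copyable<Iter>::value, "");
static_assert(std::is_copy_constructible<Iter>::value
              && std::is_move_constructible<Iter>::value, "");

// Moving a node transfers the link and empties the source.
inline bool move_transfers_link()
{
  NodeBase tail;
  NodeBase head;
  head.next = &tail;
  NodeBase moved(std::move(head));
  return moved.next == &tail && head.next == nullptr;
}
```

Spelling the three operations out as deleted, as suggested above, changes none of these results; it only makes them visible in the source.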


   I also took the time to optimize the move constructor with 
allocator when allocator is always equal. It avoids initializing an 
empty forward list for nothing.


   I think it is fine but could we have an abi issue because of the 
change in forward_list.tcc ?


Old code with undefined references to that constructor will still find
a definition in new code that explicitly instantiates a forward_list.

New code compiled after your change would not find the new
constructors (the ones with true_type and false_type parameters) in
old code that explicitly instantiated a forward_list.

Could you split that part of the change into a separate patch? The
changes to define constructors as defaulted are OK, so I'd like to
consider the proposed optimisation separately.





Re: [PATCH] C/C++: add fix-it hints for various missing symbols

2017-07-05 Thread Joseph Myers
On Wed, 5 Jul 2017, David Malcolm wrote:

> This error message might be better to be worded in terms of the
> syntactic thing that came before, which would yield:
> 
> pr7356.c:1:2: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’
> after declaration
>  a//sample
>   ^
> 
> or somesuch.  Doing so would presumably require adding an extra param to 
> c_parser_error, e.g. an enum describing the syntactic elements that go before.
> 
> Does this sound worth pursuing as a followup?

Yes.  When you're wording things in terms of what the syntax error comes 
after rather than saying it comes before some automatically-generated 
description of a token, it would be best if the caller passes the complete 
message in an i18n-friendly way, rather than using concat (bug 18248).
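
A rough sketch of that idea follows. SyntaxContext and expected_message are hypothetical names, not existing GCC interfaces, and the second message is invented for illustration. The point is that each context selects one complete sentence that a translator sees as a whole, instead of fragments glued together at run time.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical enum naming the syntactic element that came before.
enum class SyntaxContext { AfterDeclaration, AfterExpression };

// Each context maps to a complete message, so nothing is built by
// concatenating a list of tokens with a trailing "before <token>".
const char *expected_message(SyntaxContext ctx)
{
  switch (ctx) {
    case SyntaxContext::AfterDeclaration:
      return "expected '=', ',', ';', 'asm' or '__attribute__' "
             "after declaration";
    case SyntaxContext::AfterExpression:
      return "expected ';' after expression";
  }
  return "syntax error";
}
```

In a real front end the strings would additionally be marked for translation; that step is left out here.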

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATHC][x86] Scalar mask and round RTL templates

2017-07-05 Thread Kirill Yukhin
On 05 Jul 13:51, Peryt, Sebastian wrote:
> Tests were added. I also updated Changelog and set the max line length to be 
> equal to 79 characters.
Thanks!
> 
> Is it ok for trunk?
Your changes are OK for trunk. I've committed the patch.

--
Thanks, K
> 
> Thanks,
> Sebastian
> 
> -Original Message-
> From: Kirill Yukhin [mailto:kirill.yuk...@gmail.com] 
> Sent: Wednesday, July 5, 2017 12:36 PM
> To: Peryt, Sebastian 
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATHC][x86] Scalar mask and round RTL templates
> 
> On 05 Jul 06:38, Peryt, Sebastian wrote:
> > Hi Kirill,
> > 
> > Sorry for this confusion. I meant to write MDs for intrinsics. Those 
> > intrinsics are all masked ones for ADD[SD,SS], SUB[SD,SS], MUL[SD,SS], 
> > DIV[SD,SS], MIN[SD,SS] and MAX[SD,SS]. What I found is that for mask equal 
> > 0 they were producing wrong results when old mask meta-template was used.
> What you're talking about looks like a bug. Could you pls add a regression 
> test to your patch?
> 
> > Modified changelog below.
> > 
> > 2017-07-05  Sebastian Peryt  
> > 
> > gcc/
> > * config/i386/subst.md (mask_scalar, round_scalar, 
> > round_saeonly_scalar): New meta-templates.
> > (mask_scalar_name, mask_scalar_operand3, round_scalar_name,
> > round_scalar_mask_operand3, round_scalar_mask_op3,
> > round_scalar_constraint, round_scalar_prefix, round_saeonly_scalar_name,
> > round_saeonly_scalar_mask_operand3, round_saeonly_scalar_mask_op3,
> > round_saeonly_scalar_constraint, round_saeonly_scalar_prefix): New 
> > subst attribute.
> > * config/i386/sse.md
> > (_vm3): Renamed to ...
> > _vm3 
> > ... this.
> > (_vm3): Renamed to 
> > ...
> > _vm3 
> > ... this.
> > (_vm3): Renamed to ...
> > _vm3 ... 
> > this.
> > (v\t{%2, %1, 
> > %0|
> > %0, %1, %2}): Changed to ...
> > v\t{%2, 
> > %1, %0|
> > %0, %1, %2} ... this.
> > (v\t{%2, %1, 
> > %0|
> > %0, %1, %2}): Changed to ...
> > v\t{%2, 
> > %1, %0|
> > %0, %1, %2} ... this.
> > (v\t{%2, %1, 
> > %0|
> > %0, %1, %2}): Changed to 
> > ...
> > 
> > v\t{%2, 
> > %1, %0|
> > %0, %1, %2} 
> > ... this.
> Max line length is 79 characters I suppose.
> 
> --
> Thanks, K
> > 
> > Is it ok for trunk?
> > 
> > Thanks,
> > Sebastian
> > 
> > -Original Message-
> > From: Kirill Yukhin [mailto:kirill.yuk...@gmail.com]
> > Sent: Tuesday, July 4, 2017 7:45 PM
> > To: Peryt, Sebastian 
> > Cc: gcc-patches@gcc.gnu.org; Uros Bizjak 
> > Subject: Re: [PATHC][x86] Scalar mask and round RTL templates
> > 
> > Hello Sebastian,
> > On 23 Jun 09:00, Peryt, Sebastian wrote:
> > > Hi,
> > > 
> > > This patch adds three extra RTL meta-templates for scalar round and mask. 
> > > Additionally fixes errors caused by previous mask and round usage in some 
> > > of the intrinsics that I found.
> > Could you pls point out which intrinsics you fixed (or which errors)?
> > I see only MD changes in your patch.
> > 
> > > 
> > > 2017-06-23  Sebastian Peryt  
> > > 
> > > gcc/
> > >   * config/i386/subst.md (mask_scalar, round_scalar, 
> > > round_saeonly_scalar): New templates.
> > I'd call it meta-templates.
> > >   (mask_scalar_name, mask_scalar_operand3, round_scalar_name,
> > >   round_scalar_mask_operand3, round_scalar_mask_op3,
> > >   round_scalar_constraint, round_scalar_prefix, round_saeonly_scalar_name,
> > >   round_saeonly_scalar_mask_operand3, round_saeonly_scalar_mask_op3,
> > >   round_saeonly_scalar_constraint, round_saeonly_scalar_prefix): New 
> > > subst attribute.
> > >   * config/i386/sse.md
> > >   (_vm3): Renamed to ...
> > >   _vm3 
> > > ... this.
> > >   (_vm3): Renamed to 
> > > ...
> > >   _vm3 
> > > ... this.
> > >   (_vm3): Renamed to ...
> > >   _vm3 ... 
> > > this.
> > >   (v\t{%2, %1, 
> > > %0|%0, %1, %2}): 
> > > Changed to ...
> > >   v\t{%2, 
> > > %1, %0|%0, %1, 
> > > %2} ... this.
> > >   (v\t{%2, %1, 
> > > %0|%0, %1, %2}): 
> > > Changed to ...
> > >   v\t{%2, 
> > > %1, %0|%0, %1, 
> > > %2} ... this.
> > >   (v\t{%2, %1, 
> > > %0|%0, %1, 
> > > %2}): Changed to ...
> > >   
> > > v\t{%2, 
> > > %1, %0|%0, %1, 
> > > %2} ... this.
> > We need to obey conventions. Pls break long lines here.
> > 
> > --
> > Thanks, K
> > > 
> > > Is it ok for trunk?
> > > 
> > > Thanks,
> > > Sebastian
> > 
> > 




[PING**6] [PATCH, ARM] correctly encode the CC reg data flow

2017-07-05 Thread Bernd Edlinger
Ping...

On 06/14/17 14:33, Bernd Edlinger wrote:
> Ping...
> 
> On 06/01/17 18:00, Bernd Edlinger wrote:
>> Ping...
>>
>> On 05/12/17 18:49, Bernd Edlinger wrote:
>>> Ping...
>>>
>>> On 04/29/17 19:21, Bernd Edlinger wrote:
 Ping...

 On 04/20/17 20:11, Bernd Edlinger wrote:
> Ping...
>
> for this patch:
> https://gcc.gnu.org/ml/gcc-patches/2017-01/msg01351.html
>
> On 01/18/17 16:36, Bernd Edlinger wrote:
>> On 01/13/17 19:28, Bernd Edlinger wrote:
>>> On 01/13/17 17:10, Bernd Edlinger wrote:
 On 01/13/17 14:50, Richard Earnshaw (lists) wrote:
> On 18/12/16 12:58, Bernd Edlinger wrote:
>> Hi,
>>
>> this is related to PR77308, the follow-up patch will depend on 
>> this
>> one.
>>
>> When trying to split the *arm_cmpdi_insn and *arm_cmpdi_unsigned
>> before reload, a mis-compilation in libgcc function
>> __gnu_satfractdasq
>> was discovered, see [1] for more details.
>>
>> The reason seems to be that when the *arm_cmpdi_insn is directly
>> followed by a *arm_cmpdi_unsigned instruction, both are split
>> up into this:
>>
>>[(set (reg:CC CC_REGNUM)
>>  (compare:CC (match_dup 0) (match_dup 1)))
>> (parallel [(set (reg:CC CC_REGNUM)
>> (compare:CC (match_dup 3) (match_dup 4)))
>>(set (match_dup 2)
>> (minus:SI (match_dup 5)
>>  (ltu:SI (reg:CC_C CC_REGNUM) 
>> (const_int
>> 0])]
>>
>>[(set (reg:CC CC_REGNUM)
>>  (compare:CC (match_dup 2) (match_dup 3)))
>> (cond_exec (eq:SI (reg:CC CC_REGNUM) (const_int 0))
>>(set (reg:CC CC_REGNUM)
>> (compare:CC (match_dup 0) (match_dup 1]
>>
>> The problem is that the reg:CC from the *subsi3_carryin_compare
>> is not mentioning that the reg:CC is also dependent on the reg:CC
>> from before.  Therefore the *arm_cmpsi_insn appears to be
>> redundant and thus got removed, because the data values are
>> identical.
>>
>> I think that applies to a number of similar patterns where data
>> flow is happening through the CC reg.
>>
>> So this is a kind of correctness issue, and should be fixed
>> independently from the optimization issue PR77308.
>>
>> Therefore I think the patterns need to specify the true
>> value that will be in the CC reg, in order for cse to
>> know what the instructions are really doing.
>>
>>
>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>> Is it OK for trunk?
>>
>
> I agree you've found a valid problem here, but I have some issues
> with
> the patch itself.
>
>
> (define_insn_and_split "subdi3_compare1"
>   [(set (reg:CC_NCV CC_REGNUM)
> (compare:CC_NCV
>   (match_operand:DI 1 "register_operand" "r")
>   (match_operand:DI 2 "register_operand" "r")))
>(set (match_operand:DI 0 "register_operand" "=&r")
> (minus:DI (match_dup 1) (match_dup 2)))]
>   "TARGET_32BIT"
>   "#"
>   "&& reload_completed"
>   [(parallel [(set (reg:CC CC_REGNUM)
>(compare:CC (match_dup 1) (match_dup 2)))
>   (set (match_dup 0) (minus:SI (match_dup 1) (match_dup
> 2)))])
>(parallel [(set (reg:CC_C CC_REGNUM)
>(compare:CC_C
>  (zero_extend:DI (match_dup 4))
>  (plus:DI (zero_extend:DI (match_dup 5))
>   (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)
>   (set (match_dup 3)
>(minus:SI (minus:SI (match_dup 4) (match_dup 5))
>  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0])]
>
>
> This pattern is now no-longer self consistent in that before the
> split
> the overall result for the condition register is in mode 
> CC_NCV, but
> afterwards it is just CC_C.
>
> I think CC_NCV is the correct mode (the N, C and V bits all correctly
> reflect the result of the 64-bit comparison), but that then 
> implies
> that
> the cc mode of subsi3_carryin_compare is incorrect as well and
> should in
> fact also be CC_NCV.  Thinking about this pattern, I'm inclined to
> agree
> that CC_NCV is the correct mode for this operation
>
> I'm not sure if there are other consequences that will fall out 
> from
> fixing this (it's possible that we might need a change to
> selec

[PING**6] [PATCH, ARM] Further improve stack usage on sha512 (PR 77308)

2017-07-05 Thread Bernd Edlinger
Ping...

The latest version of this patch was here:
https://gcc.gnu.org/ml/gcc-patches/2017-04/msg01567.html

Thanks
Bernd.

On 06/14/17 14:34, Bernd Edlinger wrote:
> Ping...
> 
> On 06/01/17 18:01, Bernd Edlinger wrote:
>> Ping...
>>
>> On 05/12/17 18:49, Bernd Edlinger wrote:
>>> Ping...
>>>
>>> On 04/29/17 19:45, Bernd Edlinger wrote:
 Ping...

 I attached a rebased version since there was a merge conflict in
 the xordi3 pattern, otherwise the patch is still identical.
 It splits adddi3, subdi3, anddi3, iordi3, xordi3 and one_cmpldi2
 early when the target has no neon or iwmmxt.


 Thanks
 Bernd.



 On 11/28/16 20:42, Bernd Edlinger wrote:
> On 11/25/16 12:30, Ramana Radhakrishnan wrote:
>> On Sun, Nov 6, 2016 at 2:18 PM, Bernd Edlinger
>>  wrote:
>>> Hi!
>>>
>>> This improves the stack usage on the sha512 test case for the case
>>> without hardware fpu and without iwmmxt by splitting all di-mode
>>> patterns right while expanding which is similar to what the
>>> shift-pattern
>>> does.  It does nothing in the case iwmmxt and fpu=neon or vfp as
>>> well as
>>> thumb1.
>>>
>>
>> I would go further and do this in the absence of Neon, the VFP unit
>> being there doesn't help with DImode operations i.e. we do not 
>> have 64
>> bit integer arithmetic instructions without Neon. The main reason why
>> we have the DImode patterns split so late is to give a chance for
>> folks who want to do 64 bit arithmetic in Neon a chance to make this
>> work as well as support some of the 64 bit Neon intrinsics which IIRC
>> map down to these instructions. Doing this just for soft-float 
>> doesn't
>> improve the default case only. I don't usually test iwmmxt and I'm 
>> not
>> sure who has the ability to do so, thus keeping this restriction for
>> iwMMX is fine.
>>
>>
>
> Yes I understand, thanks for pointing that out.
>
> I was not aware that iwmmxt exists at all, but I noticed that most
> 64bit expansions work completely different, and would break if we 
> split
> the pattern early.
>
> I can however only look at the assembler output for iwmmxt, and make
> sure that the stack usage does not get worse.
>
> Thus the new version of the patch keeps only thumb1, neon and 
> iwmmxt as
> it is: around 1570 (thumb1), 2300 (neon) and 2200 (iwmmxt) bytes stack
> for the test cases, and vfp and soft-float at around 270 bytes stack
> usage.
>
>>> It reduces the stack usage from 2300 to near optimal 272 bytes (!).
>>>
>>> Note this also splits many ldrd/strd instructions and therefore I 
>>> will
>>> post a followup-patch that mitigates this effect by enabling the
>>> ldrd/strd
>>> peephole optimization after the necessary reg-testing.
>>>
>>>
>>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>>
>> What do you mean by arm-linux-gnueabihf - when folks say that I
>> interpret it as --with-arch=armv7-a --with-float=hard
>> --with-fpu=vfpv3-d16 or (--with-fpu=neon).
>>
>> If you've really bootstrapped and regtested it on armhf, doesn't this
>> patch as it stands have no effect there i.e. no change ?
>> arm-linux-gnueabihf usually means to me someone has configured with
>> --with-float=hard, so there are no regressions in the hard float ABI
>> case,
>>
>
> I know it proves little.  When I say arm-linux-gnueabihf
> I do in fact mean --enable-languages=all,ada,go,obj-c++
> --with-arch=armv7-a --with-tune=cortex-a9 --with-fpu=vfpv3-d16
> --with-float=hard.
>
> My main interest in the stack usage is of course not because of linux,
> but because of eCos where we have very small task stacks and in fact
> no fpu support by the O/S at all, so that patch is exactly what we 
> need.
>
>
> Bootstrapped and reg-tested on arm-linux-gnueabihf
> Is it OK for trunk?
>
>
> Thanks
> Bernd.


Re: [C++ PATCH] classtype_has_nothrow_assign_or_copy_p is confusing

2017-07-05 Thread Jason Merrill
On Wed, Jul 5, 2017 at 11:03 AM, Nathan Sidwell  wrote:
> On 07/05/2017 10:41 AM, Jason Merrill wrote:
>>
>> On Mon, Jul 3, 2017 at 12:58 PM, Nathan Sidwell  wrote:
>>>
>>> I found classtype_has_nothrow_assign_or_copy_p confusing, trying to
>>> figure
>>> out when it should return false and when true.
>>
>>
>> I'm curious why you were looking at it?  It's only used by obsolete
>> trait built-ins.
>
>
>
> It showed up grepping CLASSTYPE_CONSTRUCTORS.
>
> Here's my plan for class member name handling:
>
> 1) turn the sorted field vector into a member hash by name.  deploy
> STAT_HACK as appropriate
> 2) find cdtors by name in method_vec, not magic slot
> 3) move all-but-conv-ops from METHOD_VEC into the member hash table
> 4) (maybe) put all the conv ops on a single overload, found by name in the
> member hash table. (I'm thinking the separation by non-canonical type is not
> a win)

Sounds good.

Jason


Ping [Patch, fortran] PR70071

2017-07-05 Thread Harald Anlauf
The patch below has not been applied to the best of my knowledge.

Just a reminder for whoever cares.

Harald

On 05/04/17 20:19, Harald Anlauf wrote:
> On 05/04/17 18:15, Steve Kargl wrote:
>> On Thu, May 04, 2017 at 05:26:17PM +0200, Harald Anlauf wrote:
>>> While trying to clean up my working copy, I found that the trivial
>>> patch for the ICE-on-invalid as described in the PR regtests cleanly
>>> for 7-release on i686-pc-linux-gnu.
>>>
>>> Here's the cleaned-up version (diffs attached).
>>>
>>> 2017-05-04  Harald Anlauf  
>>>
>>> PR fortran/70071
>>> * array.c (gfc_ref_dimen_size): Handle bad subscript triplets.
>>>
>>> 2017-05-04  Harald Anlauf  
>>>
>>> PR fortran/70071
>>> * gfortran.dg/coarray_44.f90: New testcase.
>>>
>>
>> Harald,
>>
>> The patch looks reasonable.  Do you have a commit privilege?
>>
> 
> Steve,
> 
> no, I don't.
> 
> Would you like to take care of the patch?  Then please do so.
> 
> Thanks,
> Harald
> 



[PATCH] Add AddressSanitizer annotations to std::vector

2017-07-05 Thread Jonathan Wakely

This patch adds AddressSanitizer annotations to std::vector, so that
ASan can detect out-of-bounds accesses to the unused capacity of a
vector. e.g.

 std::vector<int> v(2);
 int* p = v.data();
 v.pop_back();
 return p[1];  // ERROR

This cannot be detected by Debug Mode, but with these annotations ASan
knows that only v.data()[0] is valid and will give an error.
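
To see why nothing else catches this, it helps to look at the vector's state after the pop_back. The sketch below (Layout and after_pop_back are names made up for the example) inspects that state without performing the invalid read: the buffer is unchanged and still has room for two ints, so p[1] points into live, allocated memory that simply is not part of the vector any more.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// What the vector looks like once the last element has been removed.
struct Layout {
  std::size_t size;      // number of valid elements
  std::size_t capacity;  // number of allocated slots
  bool same_buffer;      // true if pop_back kept the original storage
};

Layout after_pop_back()
{
  std::vector<int> v(2);
  const int *before = v.data();
  v.pop_back();
  return { v.size(), v.capacity(), v.data() == before };
}
```

Only index 0 is valid afterwards, yet the allocation still covers index 1; that gap between size and capacity is the region the annotations mark as inaccessible.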

The annotations are only enabled for vector<T, std::allocator<T>> and
only when std::allocator's base class is either malloc_allocator or
new_allocator. For other allocators the memory might not come from the
freestore and so isn't tracked by ASan.

Something similar has been on the google branches for some time:
https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=207517
This patch is a complete rewrite from scratch, because the google code
was not exception safe. If an exception happened while appending
elements to a vector, so that the size didn't change, the google code
did not undo the annotation for the increased size. It also didn't
annotate before deallocating, to mark the unused capacity as valid
again.
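
The exception-safety requirement can be pictured with a scope guard. This is a sketch under stated assumptions, not the code in the patch: Annotation stands in for the sanitizer's bookkeeping, and GrowGuard and append are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Stand-in for the sanitizer's view of how many elements are accessible.
struct Annotation { std::size_t accessible = 0; };

// Hypothetical guard: marks n extra slots accessible on construction
// and restores the old value on destruction unless commit() was called.
class GrowGuard {
public:
  GrowGuard(Annotation &a, std::size_t n)
    : ann_(a), old_(a.accessible), committed_(false)
  { ann_.accessible += n; }
  ~GrowGuard() { if (!committed_) ann_.accessible = old_; }
  GrowGuard(const GrowGuard &) = delete;
  GrowGuard &operator=(const GrowGuard &) = delete;
  void commit() { committed_ = true; }
private:
  Annotation &ann_;
  std::size_t old_;
  bool committed_;
};

// Models appending one element whose construction may throw.
bool append(Annotation &a, bool construction_throws)
{
  try {
    GrowGuard guard(a, 1);
    if (construction_throws)
      throw std::runtime_error("element constructor failed");
    guard.commit();  // the size really changed, keep the annotation
    return true;
  } catch (const std::runtime_error &) {
    return false;    // guard already rolled the annotation back
  }
}
```

If the element's constructor throws, the size did not change, so the accessible range must go back to what it was before the attempt.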

We can probably do similar annotations for std::deque, so that
partially filled pages are annotated. I also have a patch for
shared_ptr so that objects created by make_shared can be marked as
invalid after they're destroyed.

* config/allocator/malloc_allocator_base.h [__SANITIZE_ADDRESS__]
(_GLIBCXX_SANITIZE_STD_ALLOCATOR): Define.
* config/allocator/new_allocator_base.h [__SANITIZE_ADDRESS__]
(_GLIBCXX_SANITIZE_STD_ALLOCATOR): Define.
* include/bits/stl_vector.h [_GLIBCXX_SANITIZE_STD_ALLOCATOR]
(_Vector_impl::_Asan, _Vector_impl::_Asan::_Reinit)
(_Vector_impl::_Asan::_Grow, _GLIBCXX_ASAN_ANNOTATE_REINIT)
(_GLIBCXX_ASAN_ANNOTATE_GROW, _GLIBCXX_ASAN_ANNOTATE_GREW)
(_GLIBCXX_ASAN_ANNOTATE_SHRINK, _GLIBCXX_ASAN_ANNOTATE_BEFORE_DEALLOC):
Define annotation helper types and macros.
(vector::~vector, vector::push_back, vector::pop_back)
(vector::_M_erase_at_end): Add annotations.
* include/bits/vector.tcc (vector::reserve, vector::emplace_back)
(vector::insert, vector::_M_erase, vector::operator=)
(vector::_M_fill_assign, vector::_M_assign_aux)
(vector::_M_insert_rval, vector::_M_emplace_aux)
(vector::_M_insert_aux, vector::_M_realloc_insert)
(vector::_M_fill_insert, vector::_M_default_append)
(vector::_M_shrink_to_fit, vector::_M_range_insert): Annotate.

Tested x86_64-linux (using -fsanitize=address, with some local patches
to the testsuite) and powerpc64le-linux.

I plan to commit this to trunk tomorrow.

commit 396bdc35021083dafa4ebb29726219dcb9f1644c
Author: Jonathan Wakely 
Date:   Wed Jul 5 13:25:21 2017 +0100

Add AddressSanitizer annotations to std::vector

* config/allocator/malloc_allocator_base.h [__SANITIZE_ADDRESS__]
(_GLIBCXX_SANITIZE_STD_ALLOCATOR): Define.
* config/allocator/new_allocator_base.h [__SANITIZE_ADDRESS__]
(_GLIBCXX_SANITIZE_STD_ALLOCATOR): Define.
* include/bits/stl_vector.h [_GLIBCXX_SANITIZE_STD_ALLOCATOR]
(_Vector_impl::_Asan, _Vector_impl::_Asan::_Reinit)
(_Vector_impl::_Asan::_Grow, _GLIBCXX_ASAN_ANNOTATE_REINIT)
(_GLIBCXX_ASAN_ANNOTATE_GROW, _GLIBCXX_ASAN_ANNOTATE_GREW)
(_GLIBCXX_ASAN_ANNOTATE_SHRINK, _GLIBCXX_ASAN_ANNOTATE_BEFORE_DEALLOC):
Define annotation helper types and macros.
(vector::~vector, vector::push_back, vector::pop_back)
(vector::_M_erase_at_end): Add annotations.
* include/bits/vector.tcc (vector::reserve, vector::emplace_back)
(vector::insert, vector::_M_erase, vector::operator=)
(vector::_M_fill_assign, vector::_M_assign_aux)
(vector::_M_insert_rval, vector::_M_emplace_aux)
(vector::_M_insert_aux, vector::_M_realloc_insert)
(vector::_M_fill_insert, vector::_M_default_append)
(vector::_M_shrink_to_fit, vector::_M_range_insert): Annotate.

diff --git a/libstdc++-v3/config/allocator/malloc_allocator_base.h b/libstdc++-v3/config/allocator/malloc_allocator_base.h
index b091bbc..54e0837 100644
--- a/libstdc++-v3/config/allocator/malloc_allocator_base.h
+++ b/libstdc++-v3/config/allocator/malloc_allocator_base.h
@@ -52,4 +52,8 @@ namespace std
 # define __allocator_base  __gnu_cxx::malloc_allocator
 #endif
 
+#if defined(__SANITIZE_ADDRESS__) && !defined(_GLIBCXX_SANITIZE_STD_ALLOCATOR)
+# define _GLIBCXX_SANITIZE_STD_ALLOCATOR 1
+#endif
+
 #endif
diff --git a/libstdc++-v3/config/allocator/new_allocator_base.h b/libstdc++-v3/config/allocator/new_allocator_base.h
index 3d2bb67..e776ed3 100644
--- a/libstdc++-v3/config/allocator/new_allocator_base.h
+++ b/libstdc++-v3/config/allocator/new_allocator_base.h
@@ -52,4 +52,8 @@ namespace std
 # define __allocator_base  __gnu_cxx::new_allocator
 #endif
 
+#if defined(__SANITIZE_ADDRESS__) && !defined(_GLIBCXX_SANITIZE_STD_ALLOCATOR)
+# define _GLIBCXX_SANITIZE_STD_ALL

Re: [PATCH] Add AddressSanitizer annotations to std::vector

2017-07-05 Thread Jonathan Wakely

On 05/07/17 20:00 +0100, Jonathan Wakely wrote:

We can probably do similar annotations for std::deque, so that
partially filled pages are annotated. I also have a patch for
shared_ptr so that objects created by make_shared can be marked as
invalid after they're destroyed.


This is the make_shared annotation patch. This allows ASan to give an
error for:

 auto p = std::make_shared<int>();
 std::weak_ptr<int> w = p;
 int* pi = p.get();
 p = nullptr;
 return *pi;

The error isn't ideal, because ASan thinks we're using a container,
because that's what the annotations are intended for:

==4525==ERROR: AddressSanitizer: container-overflow
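
To make the argument order of the annotation call easier to follow, here is
a minimal sketch. RegionModel and ControlBlockModel are made-up stand-ins,
not the real ASan runtime or the real _Sp_counted_ptr_inplace layout; only
the (beg, end, old_mid, new_mid) order is taken from the declaration in the
patch below. After the call, [beg, new_mid) stays addressable and
[new_mid, end) does not.

```cpp
#include <cassert>

// Toy bookkeeping for one contiguous region (NOT the sanitizer runtime).
// Invariant: [beg_, mid_) is addressable, [mid_, end_) is poisoned.
class RegionModel {
public:
  RegionModel(const void* beg, const void* end)
  : beg_(static_cast<const char*>(beg)),
    end_(static_cast<const char*>(end)),
    mid_(end_)  // freshly allocated: everything is addressable
  { }

  // Same argument order as __sanitizer_annotate_contiguous_container.
  void annotate(const void* beg, const void* end,
                const void* old_mid, const void* new_mid) {
    assert(beg == beg_ && end == end_ && old_mid == mid_);
    const char* m = static_cast<const char*>(new_mid);
    assert(beg_ <= m && m <= end_);
    mid_ = m;
  }

  bool addressable(const void* p) const {
    const char* c = static_cast<const char*>(p);
    return beg_ <= c && c < mid_;
  }

private:
  const char* beg_;
  const char* end_;
  const char* mid_;
};

// Hypothetical control block: counters first, object storage last.
struct ControlBlockModel {
  long use_count;
  long weak_count;
  alignas(int) unsigned char storage[sizeof(int)];
};

// Mirrors the shape of the patch's call in _M_dispose:
//   __asan_annotate(this, &storage, this + 1, alloc)
// which forwards (beg, end, old_mid = end, new_mid = &storage).
inline void dispose(RegionModel& region, const ControlBlockModel& cb) {
  region.annotate(&cb, &cb + 1, &cb + 1, cb.storage);
}
```

In this model the counters remain addressable after dispose(), which is what
a weak_ptr still needs, while the storage of the destroyed object does not.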

commit d0478d9fbd17b9e9d165b4893f784ae897531713
Author: Jonathan Wakely 
Date:   Tue Jun 27 13:31:12 2017 +0100

Add AddressSanitizer annotations to std::make_shared

	* include/bits/shared_ptr_base.h [_GLIBCXX_SANITIZE_STD_ALLOCATOR]
	(__asan_annotate): Add.
	(_Sp_counted_ptr_inplace::_M_dispose): Call __asan_annotate.

diff --git a/libstdc++-v3/include/bits/shared_ptr_base.h b/libstdc++-v3/include/bits/shared_ptr_base.h
index 7e6766b..a720018 100644
--- a/libstdc++-v3/include/bits/shared_ptr_base.h
+++ b/libstdc++-v3/include/bits/shared_ptr_base.h
@@ -57,6 +57,12 @@
 #include 
 #include 
 
+#if _GLIBCXX_SANITIZE_STD_ALLOCATOR
+extern "C" void
+__sanitizer_annotate_contiguous_container(const void*, const void*,
+ const void*, const void*);
+#endif
+
 namespace std _GLIBCXX_VISIBILITY(default)
 {
 #if !__cpp_rtti
@@ -522,6 +528,19 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #endif
   };
 
+#if _GLIBCXX_SANITIZE_STD_ALLOCATOR
+  template<typename _Tp>
+inline void
+__asan_annotate(const void* __beg, const void* __mid, const void* __end,
+		const allocator<_Tp>&)
+{ __sanitizer_annotate_contiguous_container(__beg, __end, __end, __mid); }
+
+  template<typename _Alloc>
+inline void
+__asan_annotate(const void*, const void*, const void*, const _Alloc&)
+{ }
+#endif
+
   template
 class _Sp_counted_ptr_inplace final : public _Sp_counted_base<_Lp>
 {
@@ -556,6 +575,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   _M_dispose() noexcept
   {
 	allocator_traits<_Alloc>::destroy(_M_impl._M_alloc(), _M_ptr());
+#if _GLIBCXX_SANITIZE_STD_ALLOCATOR
+	__asan_annotate(this, &_M_impl._M_storage, this + 1, _M_impl._M_alloc());
+#endif
   }
 
   // Override because the allocator needs to know the dynamic type


Re: [PATCH] Add AddressSanitizer annotations to std::vector

2017-07-05 Thread Yuri Gribov
On Wed, Jul 5, 2017 at 8:00 PM, Jonathan Wakely  wrote:
> This patch adds AddressSanitizer annotations to std::vector, so that
> ASan can detect out-of-bounds accesses to the unused capacity of a
> vector. e.g.
>
>  std::vector<int> v(2);
>  int* p = v.data();
>  v.pop_back();
>  return p[1];  // ERROR
>
> This cannot be detected by Debug Mode, but with these annotations ASan
> knows that only v.data()[0] is valid and will give an error.
>
> The annotations are only enabled for vector<T, std::allocator<T>> and
> only when std::allocator's base class is either malloc_allocator or
> new_allocator. For other allocators the memory might not come from the
> freestore and so isn't tracked by ASan.

One important issue with enabling this by default is that it may
(will?) break separate sanitization (which is an extremely important
feature in practice). If one part of an application is sanitized but
the other isn't, and some poor std::vector is push_back'ed in the
latter and then accessed in the former, we'll get a false positive
because push_back wouldn't properly annotate memory.
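
A minimal sketch of the scenario, using a toy model rather than real
libstdc++ or ASan code. VecModel and the three helpers are hypothetical
names; annotated_size stands for what the sanitizer believes is valid.

```cpp
#include <cassert>
#include <cstddef>

// Toy vector state (hypothetical, for illustration only).
struct VecModel {
  std::size_t capacity = 4;
  std::size_t size = 0;
  std::size_t annotated_size = 0;  // elements the sanitizer considers valid
};

// push_back as built in a sanitized translation unit: the annotation
// is updated together with the size.
inline void push_back_sanitized(VecModel& v) {
  assert(v.size < v.capacity);
  ++v.size;
  v.annotated_size = v.size;
}

// push_back as built without the annotations: only the size changes.
inline void push_back_unsanitized(VecModel& v) {
  assert(v.size < v.capacity);
  ++v.size;
}

// The check an instrumented access would effectively perform.
inline bool access_reports_error(const VecModel& v, std::size_t index) {
  return index >= v.annotated_size;
}
```

With one sanitized and one unsanitized push_back, the vector holds two valid
elements, yet an instrumented read of element 1 is reported as an error.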

Perhaps hide this under a compilation flag (disabled by default)?

> Something similar has been on the google branches for some time:
> https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=207517
> This patch is a complete rewrite from scratch, because the google code
> was not exception safe. If an exception happened while appending
> elements to a vector, so that the size didn't change, the google code
> did not undo the annotation for the increased size. It also didn't
> annotate before deallocating, to mark the unused capacity as valid
> again.
>
> We can probably do similar annotations for std::deque, so that
> partially filled pages are annotated. I also have a patch for
> shared_ptr so that objects created by make_shared can be marked as
> invalid after they're destroyed.
>
> * config/allocator/malloc_allocator_base.h [__SANITIZE_ADDRESS__]
> (_GLIBCXX_SANITIZE_STD_ALLOCATOR): Define.
> * config/allocator/new_allocator_base.h [__SANITIZE_ADDRESS__]
> (_GLIBCXX_SANITIZE_STD_ALLOCATOR): Define.
> * include/bits/stl_vector.h [_GLIBCXX_SANITIZE_STD_ALLOCATOR]
> (_Vector_impl::_Asan, _Vector_impl::_Asan::_Reinit)
> (_Vector_impl::_Asan::_Grow, _GLIBCXX_ASAN_ANNOTATE_REINIT)
> (_GLIBCXX_ASAN_ANNOTATE_GROW, _GLIBCXX_ASAN_ANNOTATE_GREW)
> (_GLIBCXX_ASAN_ANNOTATE_SHRINK,
> _GLIBCXX_ASAN_ANNOTATE_BEFORE_DEALLOC):
> Define annotation helper types and macros.
> (vector::~vector, vector::push_back, vector::pop_back)
> (vector::_M_erase_at_end): Add annotations.
> * include/bits/vector.tcc (vector::reserve, vector::emplace_back)
> (vector::insert, vector::_M_erase, vector::operator=)
> (vector::_M_fill_assign, vector::_M_assign_aux)
> (vector::_M_insert_rval, vector::_M_emplace_aux)
> (vector::_M_insert_aux, vector::_M_realloc_insert)
> (vector::_M_fill_insert, vector::_M_default_append)
> (vector::_M_shrink_to_fit, vector::_M_range_insert): Annotate.
>
> Tested x86_64-linux (using -fsanitize=address, with some local patches
> to the testsuite) and powerpc64le-linux.
>
> I plan to commit this to trunk tomorrow.


[PATCH] dynamically set default num_gangs in OpenACC

2017-07-05 Thread Cesar Philippidis
Currently, the nvptx libgomp plugin indiscriminately sets num_gangs to
32 regardless of the underlying CUDA hardware. Depending on the GPU,
this value can be extremely conservative. The attached patch implements
a more sophisticated approach, which probes the hardware at run time to
calculate the number of gangs that would saturate the hardware's
resources. It should be noted that this solution may not be optimal;
I've seen other approaches where the compiler works with the runtime to
set num_gangs to be the number of loop iterations in the gang loop.
However, the approach taken in this patch greatly increases the
performance of OpenACC parallel code inside SPEC_ACCEL.
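
The patch body is cut off below, so the following is only a guess at the
kind of calculation "saturating the hardware" could mean, not the formula
the patch uses. The field names are borrowed from the ptx_device members in
the ChangeLog; DeviceProps and default_num_gangs are hypothetical, and the
sketch assumes a gang maps to one thread block of num_workers *
vector_length threads.

```cpp
#include <algorithm>

// Hypothetical subset of the device properties probed at run time.
struct DeviceProps {
  int multiprocessor_count;
  int max_threads_per_multiprocessor;
  int max_registers_per_multiprocessor;
};

// Assumed heuristic: launch as many gangs as can be resident at once,
// limited by both the thread budget and the register budget of each
// multiprocessor.
inline int default_num_gangs(const DeviceProps& dev, int regs_per_thread,
                             int num_workers, int vector_length) {
  const int threads_per_gang = num_workers * vector_length;
  if (threads_per_gang <= 0 || regs_per_thread <= 0)
    return 1;  // nothing sensible to compute; fall back to a single gang
  const int by_threads = dev.max_threads_per_multiprocessor / threads_per_gang;
  const int by_registers = dev.max_registers_per_multiprocessor
                           / (regs_per_thread * threads_per_gang);
  // Always allow at least one gang per multiprocessor.
  const int gangs_per_sm = std::max(1, std::min(by_threads, by_registers));
  return gangs_per_sm * dev.multiprocessor_count;
}
```

For a device with 20 multiprocessors, 2048 threads and 65536 registers per
multiprocessor, a kernel using 32 registers per thread with 8 workers of
vector length 32 would get 160 gangs under this assumption, compared with
the fixed default of 32.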

Besides selecting num_gangs dynamically, this patch also teaches the
GOMP_OPENACC_DIM environment variable parser to accept a '-' argument
for the num_gangs field. That argument allows the runtime to dynamically
set num_gangs, while still enabling the end user to specify num_workers
and vector_length.
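
As a rough illustration of the '-' handling, here is a sketch of such a
parser. It assumes the variable holds colon-separated gang:worker:vector
fields and that an empty field keeps the default; parse_openacc_dim, kUnset
and kDynamic are hypothetical names, and the real parser in plugin-nvptx.c
may differ in the details.

```cpp
#include <array>
#include <climits>
#include <cstddef>
#include <cstdlib>
#include <string>

constexpr int kUnset = 0;     // field omitted: keep the built-in default
constexpr int kDynamic = -1;  // hypothetical sentinel: runtime picks the value

// Parses "gangs:workers:vector". Only the gang field accepts "-".
inline std::array<int, 3> parse_openacc_dim(const std::string& text) {
  std::array<int, 3> dims{{kUnset, kUnset, kUnset}};
  std::size_t pos = 0;
  for (std::size_t i = 0; i < dims.size(); ++i) {
    const std::size_t colon = text.find(':', pos);
    const std::string field = text.substr(
        pos, colon == std::string::npos ? std::string::npos : colon - pos);
    if (i == 0 && field == "-") {
      dims[i] = kDynamic;
    } else if (!field.empty()) {
      char* end = nullptr;
      const long value = std::strtol(field.c_str(), &end, 10);
      // Accept only a fully consumed, positive number that fits in int.
      if (*end == '\0' && value > 0 && value <= INT_MAX)
        dims[i] = static_cast<int>(value);
    }
    if (colon == std::string::npos)
      break;
    pos = colon + 1;
  }
  return dims;
}
```

Under these assumptions, "-:8:32" asks for dynamic num_gangs while fixing
num_workers at 8 and vector_length at 32.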

Because the nvptx port does not perform any register allocation (that gets
deferred to the CUDA driver JIT), there are situations where the
hardware doesn't have sufficient resources to satisfy the default
num_workers. As a stopgap solution, this patch teaches the nvptx plugin
how to gracefully error whenever it encounters such a situation.
Furthermore, it will inform the user how to adjust num_workers to get
the program to work.
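
A sketch of what such a check could look like. The arithmetic (one gang must
fit in the per-block register budget) and the message wording are
assumptions, as are the names max_workers_for and check_num_workers; the
patch text for this part is not shown in this message.

```cpp
#include <string>

// Largest num_workers for which one gang still fits in the per-block
// register budget (assumed model: registers = regs_per_thread * threads).
inline int max_workers_for(int max_registers_per_block, int regs_per_thread,
                           int vector_length) {
  if (regs_per_thread <= 0 || vector_length <= 0)
    return 0;
  return max_registers_per_block / (regs_per_thread * vector_length);
}

// Returns an empty string if the launch fits, otherwise a hint for the user.
inline std::string check_num_workers(int num_workers,
                                     int max_registers_per_block,
                                     int regs_per_thread, int vector_length) {
  const int limit =
      max_workers_for(max_registers_per_block, regs_per_thread, vector_length);
  if (num_workers <= limit)
    return std::string();
  return "insufficient registers for num_workers=" +
         std::to_string(num_workers) + "; try num_workers <= " +
         std::to_string(limit);
}
```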

The latter two changes are extremely small, so I clumped them into a
single patch. Is this OK for trunk?

Thanks,
Cesar
2017-07-05  Cesar Philippidis  

	libgomp/
	* plugin/cuda/cuda.h (CUdevice_attribute): Add
	CU_DEVICE_ATTRIBUTE_MAX_SHARED_MEMORY_PER_MULTIPROCESSOR.
	(CUfunction_attribute): Add CU_FUNC_ATTRIBUTE_BINARY_VERSION.
	* plugin/plugin-nvptx.c (struct targ_fn_descriptor): Add num_regs
	member.
	(struct ptx_device): Rename num_sms, regs_per_block, regs_per_sm as
	multiprocessor_count, max_registers_per_block,
	max_registers_per_multiprocessor, respectively.  Add members
	warp_size, multiprocessor_count, max_shared_memory_per_multiprocessor,
	binary_version, register_allocation_unit_size,
	register_allocation_granularity. 
	(nvptx_open_device): Initialize new and renamed members in ptx_device.
	(nvptx_exec): Dynamically set num_gangs based on hardware resources.
	Add support for '-' gang argument to GOMP_OPENACC_DIM environment
	variable.  Describe how to reduce num_workers when the hardware lacks
	sufficient resources for the default.
	(GOMP_OFFLOAD_load_image): Initialize new and renamed
	targ_fn_descriptor members.
	(nvptx_adjust_launch_bounds): Adjust names of regs_per_sm and
	num_sms.


diff --git a/libgomp/plugin/cuda/cuda.h b/libgomp/plugin/cuda/cuda.h
index 25d5d19..3199a93 100644
--- a/libgomp/plugin/cuda/cuda.h
+++ b/libgomp/plugin/cuda/cuda.h
@@ -69,6 +69,7 @@ typedef enum {
   CU_DEVICE_ATTRIBUTE_CONCURRENT_KERNELS = 31,
   CU_DEVICE_ATTRIBUTE_MAX_THREADS_PER_MULTIPROCESSOR = 39,
   CU_DEVICE_ATTRIBUTE_ASYNC_ENGINE_COUNT = 40,
+  CU_DEVICE_ATTRIBUTE_MAX_SHARED_MEMORY_PER_MULTIPROCESSOR = 81,
   CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR = 82
 } CUdevice_attribute;
 
@@ -79,7 +80,8 @@ enum {
 
 typedef enum {
   CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK = 0,
-  CU_FUNC_ATTRIBUTE_NUM_REGS = 4
+  CU_FUNC_ATTRIBUTE_NUM_REGS = 4,
+  CU_FUNC_ATTRIBUTE_BINARY_VERSION = 6
 } CUfunction_attribute;
 
 typedef enum {
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 71630b5..802f76d 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -372,6 +372,9 @@ struct targ_fn_descriptor
   const struct targ_fn_launch *launch;
   int regs_per_thread;
   int max_threads_per_block;
+
+  /* Cuda function properties.  */
+  int num_regs;
 };
 
 /* A loaded PTX image.  */
@@ -408,9 +411,21 @@ struct ptx_device
   bool mkern;
   int  mode;
   int clock_khz;
-  int num_sms;
-  int regs_per_block;
-  int regs_per_sm;
+  int max_threads_per_block;
+  int warp_size;
+  int multiprocessor_count;
+  int max_threads_per_multiprocessor;
+  int max_registers_per_block;
+  int max_registers_per_multiprocessor;
+  int max_shared_memory_per_multiprocessor;
+
+  int binary_version;
+
+  /* register_allocation_unit_size and register_allocation_granularity
+ were extracted from the "Register Allocation Granularity" in
+ Nvidia's CUDA Occupancy Calculator spreadsheet.  */
+  int register_allocation_unit_size;
+  int register_allocation_granularity;
 
   struct ptx_image_data *images;  /* Images loaded on device.  */
   pthread_mutex_t image_lock; /* Lock for above list.  */
@@ -722,6 +737,9 @@ nvptx_open_device (int n)
   ptx_dev->ord = n;
   ptx_dev->dev = dev;
   ptx_dev->ctx_shared = false;
+  ptx_dev->binary_version = 0;
+  ptx_dev->register_allocation_unit_size = 0;
+  ptx_dev->register_allocation_granularity = 0;
 
   r = CUDA_CALL_NOCHECK (cuCtxGetDevice, &ctx_dev);
   if (r != CUDA_SUCCESS && r != CUDA_ERROR_INVALID_CONTEXT)
@@ -770,33 +788,46 @@ nvptx

Re: Ping [Patch, fortran] PR70071

2017-07-05 Thread Janus Weil
Hi Harald,

thanks for the reminder. I can take care of committing the patch for
you. Just give me a day or two ...

Cheers,
Janus



2017-07-05 20:44 GMT+02:00 Harald Anlauf :
> The patch below has not been applied to the best of my knowledge.
>
> Just a reminder for whoever cares.
>
> Harald
>
> On 05/04/17 20:19, Harald Anlauf wrote:
>> On 05/04/17 18:15, Steve Kargl wrote:
>>> On Thu, May 04, 2017 at 05:26:17PM +0200, Harald Anlauf wrote:
>>>> While trying to clean up my working copy, I found that the trivial
>>>> patch for the ICE-on-invalid as described in the PR regtests cleanly
>>>> for 7-release on i686-pc-linux-gnu.
>>>>
>>>> Here's the cleaned-up version (diffs attached).
>>>>
>>>> 2017-05-04  Harald Anlauf
>>>>
>>>> PR fortran/70071
>>>> * array.c (gfc_ref_dimen_size): Handle bad subscript triplets.
>>>>
>>>> 2017-05-04  Harald Anlauf
>>>>
>>>> PR fortran/70071
>>>> * gfortran.dg/coarray_44.f90: New testcase.
>>>>
>>>
>>> Harald,
>>>
>>> The patch looks reasonable.  Do you have a commit privilege?
>>>
>>
>> Steve,
>>
>> no, I don't.
>>
>> Would you like to take care of the patch?  Then please do so.
>>
>> Thanks,
>> Harald
>>
>


Re: [PATCH] Add AddressSanitizer annotations to std::vector

2017-07-05 Thread Jonathan Wakely

On 05/07/17 20:44 +0100, Yuri Gribov wrote:

On Wed, Jul 5, 2017 at 8:00 PM, Jonathan Wakely  wrote:

This patch adds AddressSanitizer annotations to std::vector, so that
ASan can detect out-of-bounds accesses to the unused capacity of a
vector. e.g.

 std::vector<int> v(2);
 int* p = v.data();
 v.pop_back();
 return p[1];  // ERROR

This cannot be detected by Debug Mode, but with these annotations ASan
knows that only v.data()[0] is valid and will give an error.

The annotations are only enabled for vector<T, std::allocator<T>> and
only when std::allocator's base class is either malloc_allocator or
new_allocator. For other allocators the memory might not come from the
freestore and so isn't tracked by ASan.


One important issue with enabling this by default is that it may
(will?) break separate sanitization (which is an extremely important
feature in practice). If one part of an application is sanitized but
the other isn't, and some poor std::vector is push_back'ed in the
latter and then accessed in the former, we'll get a false positive
because push_back wouldn't properly annotate memory.


Good point.


Perhaps hide this under a compilation flag (disabled by default)?


If you define _GLIBCXX_SANITIZE_STD_ALLOCATOR to 0 the annotations are
disabled. To make them disabled by default would need some changes, to
use separate macros for "the std::allocator base class can be
sanitized" and "the user wants std::vector to be sanitized".

I'll do that before committing.
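
A sketch of how the two-macro split described above might look. All three
EXAMPLE_ macro names are invented for illustration; the thread does not say
what the committed names will be.

```cpp
// 1. "The std::allocator base class can be sanitized": set by the
//    allocator base header when building with -fsanitize=address.
#if defined(__SANITIZE_ADDRESS__) && !defined(EXAMPLE_ALLOCATOR_CAN_BE_SANITIZED)
# define EXAMPLE_ALLOCATOR_CAN_BE_SANITIZED 1
#endif

// 2. "The user wants std::vector to be sanitized": an explicit opt-in,
//    e.g. -DEXAMPLE_USER_WANTS_SANITIZED_VECTOR on the command line.
//    The annotations are active only when both conditions hold.
#if EXAMPLE_ALLOCATOR_CAN_BE_SANITIZED && defined(EXAMPLE_USER_WANTS_SANITIZED_VECTOR)
# define EXAMPLE_ANNOTATE_VECTOR 1
#else
# define EXAMPLE_ANNOTATE_VECTOR 0
#endif

// Visible to ordinary code, so the choice can be inspected or tested.
constexpr bool vector_annotations_enabled = EXAMPLE_ANNOTATE_VECTOR != 0;
```

With this shape the annotations are off unless the user asks for them, so
mixing sanitized and unsanitized objects does not trip the false positive
described earlier in the thread.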



Re: [PATCH, rs6000] Add support to __builtin_cpu_supports() for two new HWCAP2 bits

2017-07-05 Thread Tulio Magno Quites Machado Filho
Segher Boessenkool  writes:

> On Fri, Jun 30, 2017 at 11:53:48AM -0500, Peter Bergner wrote:
>> >> Not use an installed header, that's not what I'm asking.  Share the
>> >> source file, i.e., just copy it over from the glibc source tree (it
>> >> should probably hold the master copy).  Fewer typos, cannot forget to
>> >> update some entry, etc.
>...
> Does glibc also have the names in our cpu_supports_info array?

It does have the names, but it isn't compatible with the cpu_supports_info
array.

> Should it have them?

Maybe, but glibc is frozen now and we won't be able to make changes until
glibc 2.26 is released.

-- 
Tulio Magno



Re: [PATCH] warn on mem calls modifying objects of non-trivial types (PR 80560)

2017-07-05 Thread Andrew Pinski
On Sun, Apr 30, 2017 at 1:02 PM, Pedro Alves  wrote:
> Hi Martin,
>
> Thanks much for doing this.  A few comments below, in light of my
> experience doing the equivalent checks in the gdb patch linked below,
> using standard C++11.
>
> On 04/29/2017 09:09 PM, Martin Sebor wrote:
>> Calling memset, memcpy, or similar to write to an object of
>> a non-trivial type (such as one that defines a ctor or dtor,
>> or has such a member) can break the invariants otherwise
>> maintained by the class and cause undefined behavior.
>>
>> The motivating example that prompted this work was a review of
>> a change that added to a plain old struct a new member with a ctor
>> and dtor (in this instance the member was of type std::vector).
>>
>> To help catch problems of this sort some projects (such as GDB)
>> have apparently even devised their own clever solutions to detect
>> them: https://sourceware.org/ml/gdb-patches/2017-04/msg00378.html.
>>
>> The attached patch adds a new warning, -Wnon-trivial-memaccess,
>> that has GCC detect these mistakes.  The patch also fixes up
>> a handful of instances of the problem in GCC.  These instances
>> correspond to the two patterns below:
>>
>>   struct A
>>   {
>> void *p;
>> void foo (int n) { p = malloc (n); }
>> ~A () { free (p); }
>>   };
>>
>>   void init (A *a)
>>   {
>> memset (a, 0, sizeof *a);
>>   }
>>
>> and
>>
>>   struct B
>>   {
>> int i;
>> ~A ();
>>   };
>
> (typo: "~B ();")
>
>>
>>   void copy (B *p, const B *q)
>>   {
>> memcpy (p, q, sizeof *p);
>> ...
>>}
>>
>
> IMO the check should be relaxed from "type is trivial" to "type is
> trivially copyable" (which is what the gdb detection at
> https://sourceware.org/ml/gdb-patches/2017-04/msg00378.html
> uses for memcpy/memmove).  Checking that the destination is trivial is
> going to generate false positives -- specifically, [basic-types]/3
> specifies that it's fine to memcpy trivially _copyable_ types, not
> trivial types.  A type can be both non-trivial and trivially copyable
> at the same time.  For example, this compiles, but triggers
> your new warning:
>
> #include 
> #include 
> #include 
>
> struct NonTrivialButTriviallyCopyable
> {
>   NonTrivialButTriviallyCopyable () : i (0) {}
>   int i;
> };
>
> static_assert (!std::is_trivial<NonTrivialButTriviallyCopyable>::value, "");
> static_assert (std::is_trivially_copyable<NonTrivialButTriviallyCopyable>::value, "");
>
> void copy (NonTrivialButTriviallyCopyable *dst, NonTrivialButTriviallyCopyable *src)
> {
>   memcpy (dst, src, sizeof (*src));
> }
>
> $ /opt/gcc/bin/g++ -std=gnu++11 trivial-warn.cc -o trivial-warn -g3 -O0 -Wall 
> -Wextra -c
> trivial-warn.cc: In function ‘void copy(NonTrivialButTriviallyCopyable*, 
> NonTrivialButTriviallyCopyable*)’:
> trivial-warn.cc:16:34: warning: calling ‘void* memcpy(void*, const void*, 
> size_t)’ with a pointer to a non-trivial type ‘struct 
> NonTrivialButTriviallyCopyable’ [-Wnon-trivial-memaccess]
>memcpy (dst, src, sizeof (*src));
>   ^
> $
>
> Implementations of vector-like classes can very well (and are
> encouraged) to make use of std::is_trivially_copyable to know whether
> they can copy a range of elements to new storage
> using memcpy/memmove/mempcpy.
>
> Running your patch against GDB trips on such a case:
>
> src/gdb/btrace.h: In function ‘btrace_insn_s* 
> VEC_btrace_insn_s_quick_insert(VEC_btrace_insn_s*, unsigned int, const 
> btrace_insn_s*, const char*, unsigned int)’:
> src/gdb/common/vec.h:948:62: error: calling ‘void* memmove(void*, const 
> void*, size_t)’ with a pointer to a non-trivial type ‘btrace_insn_s {aka 
> struct btrace_insn}’ [-Werror=non-trivial-memaccess]
>memmove (slot_ + 1, slot_, (vec_->num++ - ix_) * sizeof (T));\
>   ^
>
> There is nothing wrong with the code being warned here.
> While "struct btrace_insn" is non-trivial (has a user-provided default
> ctor), it is still trivially copyable.


Any news on getting a "fix" for this issue?  Right now it blocks my
testing of GCC/gdb because I am building the trunk of both in a CI
loop and my build is broken due to this warning.  Should I just add
--disable-werror to my gdb build instead?

Thanks,
Andrew Pinski

>
> Now, this gdb code is using the old VEC (originated from
> gcc's C days, it's not the current C++fied VEC implementation),
> but the point is that any other random vector-like container out there
> is free to optimize copy of a range of non-trivial but trivially
> copyable types using memcpy/memmove.
>
> Note that libstdc++ does not actually do that optimization, but
> that's just a missed optimization, see PR libstdc++/68350 [1]
> "std::uninitialized_copy overly restrictive for
> trivially_copyable types".  (libstdc++'s std::vector defers
> copy to std::uninitialized_copy.)
>
> [1] - https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68350
>
>> These aren't undefined and the patch could be tweaked to allow
>> them.
>
> I think they're undefined because the t

Re: [PATCH][testsuite] Add dg-require-stack-check

2017-07-05 Thread Christophe Lyon
On 4 July 2017 at 10:50, Christophe Lyon  wrote:
> On 3 July 2017 at 17:30, Jeff Law  wrote:
>> On 07/03/2017 09:00 AM, Christophe Lyon wrote:
>>> Hi,
>>>
>>> This is a follow-up to
>>> https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01791.html
>>>
>>> This patch adds dg-require-stack-check and updates the tests that use
>>> dg-options "-fstack-check" to avoid failures on configurations that do
>>> not support it.
>>>
>>> I merely copied what we currently do to check if visibility flags are
>>> supported, and cross-tested on aarch64 and arm targets with the
>>> results I expected.
>>>
>>> This means that my testing does not cover the changes I propose for
>>> i386 and gnat.
>>>
>>> Is it OK nonetheless?
>>>
>>> Thanks,
>>>
>>> Christophe
>>>
>>>
>>> stack-check-et.chlog.txt
>>>
>>>
>>> 2017-07-03  Christophe Lyon  
>>>
>>>   * lib/target-supports-dg.exp (dg-require-stack-check): New.
>>>   * lib/target-supports.exp (check_stack_check_available): New.
>>>   * g++.dg/other/i386-9.C: Add dg-require-stack-check.
>>>   * gcc.c-torture/compile/stack-check-1.c: Likewise.
>>>   * gcc.dg/graphite/run-id-pr47653.c: Likewise.
>>>   * gcc.dg/pr47443.c: Likewise.
>>>   * gcc.dg/pr48134.c: Likewise.
>>>   * gcc.dg/pr70017.c: Likewise.
>>>   * gcc.target/aarch64/stack-checking.c: Likewise.
>>>   * gcc.target/arm/stack-checking.c: Likewise.
>>>   * gcc.target/i386/pr48723.c: Likewise.
>>>   * gcc.target/i386/pr55672.c: Likewise.
>>>   * gcc.target/i386/pr67265-2.c: Likewise.
>>>   * gcc.target/i386/pr67265.c: Likewise.
>>>   * gnat.dg/opt49.adb: Likewise.
>>>   * gnat.dg/stack_check1.adb: Likewise.
>>>   * gnat.dg/stack_check2.adb: Likewise.
>>>   * gnat.dg/stack_check3.adb: Likewise.
>> ACK once you address Rainer's comments.  I've got further stack-check
>> tests in the queue which I'll update once your change goes in.
>>
>> jeff
>
> Here is an updated version, which adds documentation for 
> dg-require-stack-check.
>
> I also ran make check on x86_64 with Ada enabled and checked the logs:
> the updated i386/* and gnat.dg* tests all pass, and are preceded by
> the compilation of the "stack_check" sample.
>
> OK?

Jeff, let me know if/when you want me to commit this?

Thanks,

Christophe

>
> Thanks,
>
> Christophe


Re: [PATCH] warn on mem calls modifying objects of non-trivial types (PR 80560)

2017-07-05 Thread Martin Sebor

On 07/05/2017 02:58 PM, Andrew Pinski wrote:

On Sun, Apr 30, 2017 at 1:02 PM, Pedro Alves  wrote:

Hi Martin,

Thanks much for doing this.  A few comments below, in light of my
experience doing the equivalent checks in the gdb patch linked below,
using standard C++11.

On 04/29/2017 09:09 PM, Martin Sebor wrote:

Calling memset, memcpy, or similar to write to an object of
a non-trivial type (such as one that defines a ctor or dtor,
or has such a member) can break the invariants otherwise
maintained by the class and cause undefined behavior.

The motivating example that prompted this work was a review of
a change that added to a plain old struct a new member with a ctor
and dtor (in this instance the member was of type std::vector).

To help catch problems of this sort some projects (such as GDB)
have apparently even devised their own clever solutions to detect
them: https://sourceware.org/ml/gdb-patches/2017-04/msg00378.html.

The attached patch adds a new warning, -Wnon-trivial-memaccess,
that has GCC detect these mistakes.  The patch also fixes up
a handful of instances of the problem in GCC.  These instances
correspond to the two patterns below:

  struct A
  {
void *p;
void foo (int n) { p = malloc (n); }
~A () { free (p); }
  };

  void init (A *a)
  {
memset (a, 0, sizeof *a);
  }

and

  struct B
  {
int i;
~A ();
  };


(typo: "~B ();")



  void copy (B *p, const B *q)
  {
memcpy (p, q, sizeof *p);
...
   }



IMO the check should be relaxed from "type is trivial" to "type is
trivially copyable" (which is what the gdb detection at
https://sourceware.org/ml/gdb-patches/2017-04/msg00378.html
uses for memcpy/memmove).  Checking that the destination is trivial is
going to generate false positives -- specifically, [basic-types]/3
specifies that it's fine to memcpy trivially _copyable_ types, not
trivial types.  A type can be both non-trivial and trivially copyable
at the same time.  For example, this compiles, but triggers
your new warning:

#include 
#include 
#include 

struct NonTrivialButTriviallyCopyable
{
  NonTrivialButTriviallyCopyable () : i (0) {}
  int i;
};

static_assert (!std::is_trivial<NonTrivialButTriviallyCopyable>::value, "");
static_assert (std::is_trivially_copyable<NonTrivialButTriviallyCopyable>::value, "");

void copy (NonTrivialButTriviallyCopyable *dst, NonTrivialButTriviallyCopyable *src)
{
  memcpy (dst, src, sizeof (*src));
}

$ /opt/gcc/bin/g++ -std=gnu++11 trivial-warn.cc -o trivial-warn -g3 -O0 -Wall 
-Wextra -c
trivial-warn.cc: In function ‘void copy(NonTrivialButTriviallyCopyable*, 
NonTrivialButTriviallyCopyable*)’:
trivial-warn.cc:16:34: warning: calling ‘void* memcpy(void*, const void*, 
size_t)’ with a pointer to a non-trivial type ‘struct 
NonTrivialButTriviallyCopyable’ [-Wnon-trivial-memaccess]
   memcpy (dst, src, sizeof (*src));
  ^
$

Implementations of vector-like classes can very well (and are
encouraged) to make use of std::is_trivially_copyable to know whether
they can copy a range of elements to new storage
using memcpy/memmove/mempcpy.
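
A minimal sketch of that kind of dispatch, written for C++11 with tag
dispatch. copy_to_new_storage and detail::copy_range are hypothetical
helpers, not taken from GDB or libstdc++.

```cpp
#include <cstddef>
#include <cstring>
#include <new>
#include <type_traits>

namespace detail {

// Trivially copyable element type: a byte-wise copy is sufficient.
template <typename T>
void copy_range(T* dst, const T* src, std::size_t n, std::true_type) {
  if (n != 0)
    std::memcpy(dst, src, n * sizeof(T));
}

// Any other element type: copy-construct each element in place.
template <typename T>
void copy_range(T* dst, const T* src, std::size_t n, std::false_type) {
  for (std::size_t i = 0; i != n; ++i)
    ::new (static_cast<void*>(dst + i)) T(src[i]);
}

}  // namespace detail

// Copies n elements from src into storage at dst, choosing the strategy
// from std::is_trivially_copyable rather than std::is_trivial.
template <typename T>
void copy_to_new_storage(T* dst, const T* src, std::size_t n) {
  detail::copy_range(dst, src, n, std::is_trivially_copyable<T>{});
}
```

A type like NonTrivialButTriviallyCopyable above would take the memcpy path
here, which is the case the proposed warning flags.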

Running your patch against GDB trips on such a case:

src/gdb/btrace.h: In function ‘btrace_insn_s* 
VEC_btrace_insn_s_quick_insert(VEC_btrace_insn_s*, unsigned int, const 
btrace_insn_s*, const char*, unsigned int)’:
src/gdb/common/vec.h:948:62: error: calling ‘void* memmove(void*, const void*, 
size_t)’ with a pointer to a non-trivial type ‘btrace_insn_s {aka struct 
btrace_insn}’ [-Werror=non-trivial-memaccess]
   memmove (slot_ + 1, slot_, (vec_->num++ - ix_) * sizeof (T));\
  ^

There is nothing wrong with the code being warned here.
While "struct btrace_insn" is non-trivial (has a user-provided default
ctor), it is still trivially copyable.



Any news on getting a "fix" for this issue?  Right now it blocks my
testing of GCC/gdb because I am building the trunk of both in a CI
loop and my build is broken due to this warning.  Should I just add
--disable-werror to my gdb build instead?


I'm not aware of any serious bugs in the warning that need fixing.
The warning points out raw memory accesses to objects of non-trivial
types (among other things), or those with user-defined default or
copy ctors, dtor, or copy assignment operator.  Objects of such
types should be manipulated using these special member functions
rather than by raw memory functions.  In many (though not all(*))
cases the raw memory calls can leave such objects in an invalid
state and make using them undefined.
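
To make the distinction between user-provided and defaulted special member
functions concrete, here is a small sketch. The three struct names are
invented for the example; the static_asserts state only what the standard
type traits report.

```cpp
#include <type_traits>

// A user-provided default ctor makes the type non-trivial, although it
// is still trivially copyable.
struct UserProvidedCtor {
  UserProvidedCtor() : i(0) {}
  int i;
};

// Defaulting the ctor on its first declaration keeps the type trivial.
struct DefaultedCtor {
  DefaultedCtor() = default;
  int i;
};

// A user-provided dtor makes the type neither trivial nor trivially
// copyable, so raw memory copies of it are not sanctioned.
struct UserProvidedDtor {
  ~UserProvidedDtor() {}
  int i;
};

static_assert(!std::is_trivial<UserProvidedCtor>::value, "non-trivial");
static_assert(std::is_trivially_copyable<UserProvidedCtor>::value,
              "but trivially copyable");
static_assert(std::is_trivial<DefaultedCtor>::value, "trivial");
static_assert(!std::is_trivially_copyable<UserProvidedDtor>::value,
              "not trivially copyable");
```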

In the instance of the warning above, btrace_insn_s is a non-trivial
type because it has a user-defined default ctor, as a result of
defining a member of such a type (flags, which is of type
enum_flags).  To avoid the warning either
the memcpy/memmove calls should be replaced with a loop that makes
use of the special function(s), or in C++ 11 and later, the class
made trivial by defaulting the ctors and copy assignment operators.
In the GDB case, this can be done by replacing t

RE: Add support for use_hazard_barrier_return function attribute

2017-07-05 Thread Maciej W. Rozycki
On Fri, 23 Jun 2017, Prachi Godbole wrote:

> Index: gcc/config/mips/mips.md
> ===
> --- gcc/config/mips/mips.md   (revision 246899)
> +++ gcc/config/mips/mips.md   (working copy)
> @@ -6578,6 +6581,20 @@
>[(set_attr "type"  "jump")
> (set_attr "mode"  "none")])
>  
> +;; Insn to clear execution and instruction hazards while returning.
> +;; However, it doesn't clear hazards created by the insn in its delay slot.
> +;; Thus, explicitly place a nop in its delay slot.
> +
> +(define_insn "mips_hb_return_internal"
> +  [(return)
> +   (unspec_volatile [(match_operand 0 "pmode_register_operand" "")]
> + UNSPEC_JRHB)]
> +  ""
> +  {
> +return "%(jr.hb\t$31%/%)";
> +  }
> +  [(set_attr "insn_count" "2")])
> +
>  ;; Normal return.
>  
>  (define_insn "_internal"

 Nothing wrong with your proposed change, but overall I wonder if (as a 
follow-up change) we could find a nonintrusive way to have this pattern 
(and `clear_hazard_' as well) produce JRS.HB rather than JR.HB in 
microMIPS compilations, as using the 32-bit delay-slot NOP encoding where 
the 16-bit one would do is obviously a tiny, but completely unnecessary 
code space loss (and we do care about code space losses in microMIPS 
compilations; conserving space is the very purpose of the microMIPS ISA 
after all).

 Of course it wouldn't do if we rewrote the instruction pattern as 
"%(jr%!.hb\t$31%/%)" here, because the NOP that follows would have to come 
from an RTL instruction for `%!' to have any effect.  But perhaps we could 
emit RTL instead somehow rather than hardcoding the NOP with `%/'?

  Maciej


Re: Update profile for haifa-sched's recovery blocks

2017-07-05 Thread Jeff Law
On 07/04/2017 04:19 AM, Jan Hubicka wrote:
> Hi,
> this is another bug I noticed while looking into the Itanium regression.
> There is no profile attached to recovery blocks in the scheduler.
> I made them very unlikely, but I wonder if we can do better? After all
> we probably know the probability of the path on which the speculation
> succeeds?
I'd expect there's info around you could use to do better, but is it
worth the effort in practice?

Jeff


Re: [PATCH][testsuite] Add dg-require-stack-check

2017-07-05 Thread Jeff Law
On 07/04/2017 02:50 AM, Christophe Lyon wrote:
> On 3 July 2017 at 17:30, Jeff Law  wrote:
>> On 07/03/2017 09:00 AM, Christophe Lyon wrote:
>>> Hi,
>>>
>>> This is a follow-up to
>>> https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01791.html
>>>
>>> This patch adds dg-require-stack-check and updates the tests that use
>>> dg-options "-fstack-check" to avoid failures on configurations that do
>>> not support it.
>>>
>>> I merely copied what we currently do to check if visibility flags are
>>> supported, and cross-tested on aarch64 and arm targets with the
>>> results I expected.
>>>
>>> This means that my testing does not cover the changes I propose for
>>> i386 and gnat.
>>>
>>> Is it OK nonetheless?
>>>
>>> Thanks,
>>>
>>> Christophe
>>>
>>>
>>> stack-check-et.chlog.txt
>>>
>>>
>>> 2017-07-03  Christophe Lyon  
>>>
>>>   * lib/target-supports-dg.exp (dg-require-stack-check): New.
>>>   * lib/target-supports.exp (check_stack_check_available): New.
>>>   * g++.dg/other/i386-9.C: Add dg-require-stack-check.
>>>   * gcc.c-torture/compile/stack-check-1.c: Likewise.
>>>   * gcc.dg/graphite/run-id-pr47653.c: Likewise.
>>>   * gcc.dg/pr47443.c: Likewise.
>>>   * gcc.dg/pr48134.c: Likewise.
>>>   * gcc.dg/pr70017.c: Likewise.
>>>   * gcc.target/aarch64/stack-checking.c: Likewise.
>>>   * gcc.target/arm/stack-checking.c: Likewise.
>>>   * gcc.target/i386/pr48723.c: Likewise.
>>>   * gcc.target/i386/pr55672.c: Likewise.
>>>   * gcc.target/i386/pr67265-2.c: Likewise.
>>>   * gcc.target/i386/pr67265.c: Likewise.
>>>   * gnat.dg/opt49.adb: Likewise.
>>>   * gnat.dg/stack_check1.adb: Likewise.
>>>   * gnat.dg/stack_check2.adb: Likewise.
>>>   * gnat.dg/stack_check3.adb: Likewise.
>> ACK once you address Rainer's comments.  I've got further stack-check
>> tests in the queue which I'll update once your change goes in.
>>
>> jeff
> Here is an updated version, which adds documentation for 
> dg-require-stack-check.
> 
> I also ran make check on x86_64 with Ada enabled and checked the logs:
> the updated i386/* and gnat.dg* tests all pass, and are preceded by
> the compilation of the "stack_check" sample.
> 
> OK?
> 
> Thanks,
> 
> Christophe
> 
> 
> stack-check-et.chlog.txt
> 
> 
> 2017-07-04  Christophe Lyon  
> 
>   gcc/
>   * doc/sourcebuild.texi (Test Directives, Variants of
>   dg-require-support): Add documentation for dg-require-stack-check.
> 
>   gcc/testsuite/
>   * lib/target-supports-dg.exp (dg-require-stack-check): New.
>   * lib/target-supports.exp (check_stack_check_available): New.
>   * g++.dg/other/i386-9.C: Add dg-require-stack-check.
>   * gcc.c-torture/compile/stack-check-1.c: Likewise.
>   * gcc.dg/graphite/run-id-pr47653.c: Likewise.
>   * gcc.dg/pr47443.c: Likewise.
>   * gcc.dg/pr48134.c: Likewise.
>   * gcc.dg/pr70017.c: Likewise.
>   * gcc.target/aarch64/stack-checking.c: Likewise.
>   * gcc.target/arm/stack-checking.c: Likewise.
>   * gcc.target/i386/pr48723.c: Likewise.
>   * gcc.target/i386/pr55672.c: Likewise.
>   * gcc.target/i386/pr67265-2.c: Likewise.
>   * gcc.target/i386/pr67265.c: Likewise.
>   * gnat.dg/opt49.adb: Likewise.
>   * gnat.dg/stack_check1.adb: Likewise.
>   * gnat.dg/stack_check2.adb: Likewise.
>   * gnat.dg/stack_check3.adb: Likewise.
OK for the trunk.  Thanks for doing this!

Jeff


[Arm] Obsoleting Command line option -mstructure-size-boundary in eabi configurations

2017-07-05 Thread Michael Collison
NetBSD/Arm requires that the structure size boundary be
DEFAULT_STRUCTURE_SIZE_BOUNDARY (see config/arm/netbsd-elf.h for details).
This patch disallows -mstructure-size-boundary on NetBSD if the value is
not equal to DEFAULT_STRUCTURE_SIZE_BOUNDARY.

Okay for trunk?

2017-07-05  Michael Collison  

* config/arm/arm.c (arm_option_override): Disallow
-mstructure-size-boundary on netbsd if value is not
DEFAULT_STRUCTURE_SIZE_BOUNDARY.


pr1556.patch
Description: pr1556.patch