Re: [PATCH 01/15] add more coalescing to simplify constraints

2016-01-21 Thread Matthew Wahab

On 15/01/16 17:14, Sebastian Pop wrote:

From: Sebastian Pop 

2015-12-30  Aditya Kumar  
Sebastian Pop  

* graphite-dependences.c (constrain_domain): Add call to isl_*_coalesce.
(add_pdr_constraints): Same.
(scop_get_reads): Same.
(scop_get_must_writes): Same.
(scop_get_may_writes): Same.
(scop_get_original_schedule): Same.
(extend_schedule): Same.
(apply_schedule_on_deps): Same.
(carries_deps): Same.
(compute_deps): Same.
(scop_get_dependences): Same.
* graphite-isl-ast-to-gimple.c
(translate_isl_ast_to_gimple::generate_isl_schedule): Same.
* graphite-optimize-isl.c (get_schedule_for_band): Same.
(get_schedule_for_band_list): Same.
(get_schedule_map): Same.
(apply_schedule_map_to_scop): Same.
* graphite-sese-to-poly.c (build_pbb_scattering_polyhedrons): Same.
(build_loop_iteration_domains): Same.
(add_condition_to_pbb): Same.
(add_param_constraints): Same.
(pdr_add_memory_accesses): Same.
(pdr_add_data_dimensions): Same.
---
  gcc/graphite-dependences.c   | 63 ++--
  gcc/graphite-isl-ast-to-gimple.c |  2 ++
  gcc/graphite-optimize-isl.c  | 12 
  gcc/graphite-sese-to-poly.c  | 28 --
  4 files changed, 56 insertions(+), 49 deletions(-)



diff --git a/gcc/graphite-optimize-isl.c b/gcc/graphite-optimize-isl.c
index 9626e96..15dd5b0 100644
--- a/gcc/graphite-optimize-isl.c
+++ b/gcc/graphite-optimize-isl.c

[..]

  static isl_union_map *
  get_schedule_map (isl_schedule *schedule)
  {
-  isl_band_list *bandList = isl_schedule_get_band_forest (schedule);
-  isl_union_map *schedule_map = get_schedule_for_band_list (bandList);
+  isl_band_list *band_list = isl_schedule_get_band_forest (schedule);
+  isl_union_map *schedule_map = get_schedule_for_band_list (band_list);
isl_band_list_free (bandList);
return schedule_map;
  }


Building arm-none-linux-gnueabihf fails at the isl_band_list_free. Shouldn't bandList 
be band_list?


Matthew




Re: [PATCH 01/15] add more coalescing to simplify constraints

2016-01-21 Thread Matthew Wahab

On 21/01/16 14:22, Matthew Wahab wrote:

On 15/01/16 17:14, Sebastian Pop wrote:

From: Sebastian Pop 

2015-12-30  Aditya Kumar  
Sebastian Pop  

* graphite-dependences.c (constrain_domain): Add call to isl_*_coalesce.
(add_pdr_constraints): Same.
(scop_get_reads): Same.
(scop_get_must_writes): Same.
(scop_get_may_writes): Same.
(scop_get_original_schedule): Same.
(extend_schedule): Same.
(apply_schedule_on_deps): Same.
(carries_deps): Same.
(compute_deps): Same.
(scop_get_dependences): Same.
* graphite-isl-ast-to-gimple.c
(translate_isl_ast_to_gimple::generate_isl_schedule): Same.
* graphite-optimize-isl.c (get_schedule_for_band): Same.
(get_schedule_for_band_list): Same.
(get_schedule_map): Same.
(apply_schedule_map_to_scop): Same.
* graphite-sese-to-poly.c (build_pbb_scattering_polyhedrons): Same.
(build_loop_iteration_domains): Same.
(add_condition_to_pbb): Same.
(add_param_constraints): Same.
(pdr_add_memory_accesses): Same.
(pdr_add_data_dimensions): Same.
---
  gcc/graphite-dependences.c   | 63 ++--
  gcc/graphite-isl-ast-to-gimple.c |  2 ++
  gcc/graphite-optimize-isl.c  | 12 
  gcc/graphite-sese-to-poly.c  | 28 --
  4 files changed, 56 insertions(+), 49 deletions(-)



diff --git a/gcc/graphite-optimize-isl.c b/gcc/graphite-optimize-isl.c
index 9626e96..15dd5b0 100644
--- a/gcc/graphite-optimize-isl.c
+++ b/gcc/graphite-optimize-isl.c

[..]

  static isl_union_map *
  get_schedule_map (isl_schedule *schedule)
  {
-  isl_band_list *bandList = isl_schedule_get_band_forest (schedule);
-  isl_union_map *schedule_map = get_schedule_for_band_list (bandList);
+  isl_band_list *band_list = isl_schedule_get_band_forest (schedule);
+  isl_union_map *schedule_map = get_schedule_for_band_list (band_list);
isl_band_list_free (bandList);
return schedule_map;
  }


Building arm-none-linux-gnueabihf fails at the isl_band_list_free. Shouldn't
bandList be band_list?


Kyrill points out that it's already fixed: 
https://gcc.gnu.org/ml/gcc-patches/2016-01/msg01613.html


Matthew



Re: [PATCH][AArch64] Remove an unused reload hook.

2016-02-29 Thread Matthew Wahab

On 25/02/16 11:00, Yvan Roux wrote:

Hi,

On 26 January 2015 at 18:01, Matthew Wahab  wrote:

Hello,

The LEGITIMIZE_RELOAD_ADDRESS macro is only needed for reload. Since the
AArch64 backend no longer supports reload, this macro is not needed and this
patch removes it.

Tested aarch64-none-linux-gnu with check-gcc. No new failures.

Ok for trunk?
Matthew

gcc/
2015-01-26  Matthew Wahab  

 * config/aarch64/aarch64.h (LEGITIMIZE_RELOAD_ADDRESS): Remove.
 * config/aarch64/aarch64-protos.h
 (aarch64_legitimize_reload_address): Remove.
 * config/aarch64/aarch64.c (aarch64_legitimize_reload_address):
 Remove.


It seems that this old patch was forgotten. I guess that it'll have to
wait for GCC 7 now, but I think it's a good thing to clean up the
reload-specific hooks and constructs (I have another patch for that
on-going).



Thanks for spotting this. I'll take care of it when stage 1 opens.
Matthew



Re: [ARM] Correct spelling of references to ARMv6KZ

2015-07-23 Thread Matthew Wahab

On 24/06/15 10:25, Matthew Wahab wrote:

Ping. Attached updated patch which also actually removes "armv6zk" from
doc/invoke.texi.

Also, retested:
- arm-none-linux-gnueabihf: native bootstrap and make check.
- arm-none-eabi: cross-compiled make check.


gcc/
2015-07-23  Matthew Wahab  

* config/arm/arm-arches.def: Add "armv6kz". Replace 6ZK with 6KZ
and FL_FOR_ARCH6ZK with FL_FOR_ARCH6KZ.
* config/arm/arm-c.c (arm_cpu_builtins): Emit "__ARM_ARCH_6ZK__"
for armv6kz targets.
* config/arm/arm-cores.def: Replace 6ZK with 6KZ.
* config/arm/arm-protos.h (FL_ARCH6KZ): New.
(FL_FOR_ARCH6ZK): Remove.
(FL_FOR_ARCH6KZ): New.
(arm_arch6zk): New declaration.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/arm.c (arm_arch6kz): New.
(arm_option_override): Set arm_arch6kz.
* config/arm/arm.h (BASE_ARCH_6ZK): Rename to BASE_ARCH_6KZ.
* config/arm/driver-arm.c: Add "armv6kz".
* doc/invoke.texi: Replace "armv6zk" with "armv6kz".



Hello,

GCC supports ARM architecture ARMv6KZ but refers to it as ARMv6ZK. This is made
visible by the command line option -march=armv6zk and by the predefined macro
__ARM_ARCH_6ZK__.

This patch corrects the spelling internally and adds -march=armv6kz. To preserve
existing behaviour, -march=armv6zk is kept as an alias of -march=armv6kz and
both __ARM_ARCH_6KZ__ and __ARM_ARCH_6ZK__ macros are defined for the
architecture.

Use of -march=armv6kz will need to wait for binutils to be updated; a patch has
been submitted (https://sourceware.org/ml/binutils/2015-06/msg00236.html). Use
of the existing spelling, -march=armv6zk, still works with current binutils.

Tested arm-none-linux-gnueabihf with check-gcc.

Ok for trunk?
Matthew

gcc/
2015-06-24  Matthew Wahab  

* config/arm/arm-arches.def: Add "armv6kz". Replace 6ZK with 6KZ
and FL_FOR_ARCH6ZK with FL_FOR_ARCH6KZ.
* config/arm/arm-c.c (arm_cpu_builtins): Emit "__ARM_ARCH_6ZK__"
for armv6kz targets.
* config/arm/arm-cores.def: Replace 6ZK with 6KZ.
* config/arm/arm-protos.h (FL_ARCH6KZ): New.
(FL_FOR_ARCH6ZK): Remove.
(FL_FOR_ARCH6KZ): New.
(arm_arch6zk): New declaration.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/arm.c (arm_arch6kz): New.
(arm_option_override): Set arm_arch6kz.
* config/arm/arm.h (BASE_ARCH_6ZK): Rename to BASE_ARCH_6KZ.
* config/arm/driver-arm.c: Add "armv6kz".
  * doc/invoke.texi: Replace "armv6zk" with "armv6kz" and
"armv6zkt2" with "armv6kzt2".



diff --git a/gcc/config/arm/arm-arches.def b/gcc/config/arm/arm-arches.def
index 840c1ff..3dafaa5 100644
--- a/gcc/config/arm/arm-arches.def
+++ b/gcc/config/arm/arm-arches.def
@@ -44,7 +44,8 @@ ARM_ARCH("armv6",   arm1136js,  6,   FL_CO_PROC | FL_FOR_ARCH6)
 ARM_ARCH("armv6j",  arm1136js,  6J,  FL_CO_PROC | FL_FOR_ARCH6J)
 ARM_ARCH("armv6k",  mpcore,	6K,  FL_CO_PROC | FL_FOR_ARCH6K)
 ARM_ARCH("armv6z",  arm1176jzs, 6Z,  FL_CO_PROC | FL_FOR_ARCH6Z)
-ARM_ARCH("armv6zk", arm1176jzs, 6ZK, FL_CO_PROC | FL_FOR_ARCH6ZK)
+ARM_ARCH("armv6kz", arm1176jzs, 6KZ, FL_CO_PROC | FL_FOR_ARCH6KZ)
+ARM_ARCH("armv6zk", arm1176jzs, 6KZ, FL_CO_PROC | FL_FOR_ARCH6KZ)
 ARM_ARCH("armv6t2", arm1156t2s, 6T2, FL_CO_PROC | FL_FOR_ARCH6T2)
 ARM_ARCH("armv6-m", cortexm1,	6M,			  FL_FOR_ARCH6M)
 ARM_ARCH("armv6s-m", cortexm1,	6M,			  FL_FOR_ARCH6M)
diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c
index 297995b..9bf3973 100644
--- a/gcc/config/arm/arm-c.c
+++ b/gcc/config/arm/arm-c.c
@@ -167,6 +167,11 @@ arm_cpu_builtins (struct cpp_reader* pfile, int flags)
 }
   if (arm_arch_iwmmxt2)
 builtin_define ("__IWMMXT2__");
+  /* ARMv6KZ was originally identified as the misspelled __ARM_ARCH_6ZK__.  To
+     preserve the existing behaviour, the misspelled feature macro must still
+     be defined.  */
+  if (arm_arch6kz)
+    builtin_define ("__ARM_ARCH_6ZK__");
   if (TARGET_AAPCS_BASED)
 {
   if (arm_pcs_default == ARM_PCS_AAPCS_VFP)
diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def
index 103c314..9d47fcf 100644
--- a/gcc/config/arm/arm-cores.def
+++ b/gcc/config/arm/arm-cores.def
@@ -125,8 +125,8 @@ ARM_CORE("arm1026ej-s",	arm1026ejs, arm1026ejs,	5TEJ, FL_LDSCHED, 9e)
 /* V6 Architecture Processors */
 ARM_CORE("arm1136j-s",		arm1136js, arm1136js,		6J,  FL_LDSCHED, 9e)
 ARM_CORE("arm1136jf-s",		arm1136jfs, arm1136jfs,		6J,  FL_LDSCHED | FL_VFPV2, 9e)
-ARM_CORE("arm1176jz-s",		arm1176jzs, arm117

Re: [PATCH 1/4][ARM] Make room for more CPU feature flags.

2015-07-24 Thread Matthew Wahab

Ping. Updated patch attached.

Also, retested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and arm-none-eabi with cross-compiled make check.

On 22/06/15 16:41, Matthew Wahab wrote:

Hello,

The ARM backend uses an unsigned long to record CPU feature flags and there are
currently 30 bits in use. To be able to support new architecture features, the
current representation will need to be replaced so that more flags can be
recorded.

This series of patches replaces the single unsigned long with a representation
based on an array of unsigned longs. Constructors and operations are explicitly
defined for the new representation and the backend is updated to use the new
operations.

The individual patches:
- Make architecture flags explicit in arm-cores.def, to prepare for the changes.
- Add definitions for the new representation as type arm_feature_set and macros
with prefix ARM_FSET.
- Replace uses of the old representation with the arm_feature_set type and
operations.
- Rework arm-cores.def and arm-arches.def to make the feature set constructions
explicit.

The series tested for arm-none-linux-gnueabihf with check-gcc.

This patch moves the derived FL_FOR_ARCH##ARCH flags out of the expansion of
the ARM_CORE macro in arm.c and makes them explicit in the entries in
arm-cores.def.

This patch tested for arm-none-linux-gnueabihf with check-gcc.

Ok for trunk?
Matthew

2015-06-22  Matthew Wahab  

* gcc/config/arm/arm-cores.def: Add FL_FOR_ARCH flag for each
ARM_CORE entry.  Fix some white-space.
* gcc/config/arm/arm.c: Remove FL_FOR_ARCH derivation from
ARM_CORE definition.



From 898ac2cb977df8739a5bee7a16c78410f04c6dab Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Fri, 5 Jun 2015 12:33:34 +0100
Subject: [PATCH 1/4] [ARM] Make ARCH flags explicit in arm-cores.def

Change-Id: I13a79c89bebaf82aa921f0502b721ff5d9b92dbe
---
 gcc/config/arm/arm-cores.def | 200 +--
 gcc/config/arm/arm.c |   2 +-
 2 files changed, 101 insertions(+), 101 deletions(-)

diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def
index 103c314..f362c27 100644
--- a/gcc/config/arm/arm-cores.def
+++ b/gcc/config/arm/arm-cores.def
@@ -43,134 +43,134 @@
Some tools assume no whitespace up to the first "," in each entry.  */
 
 /* V2/V2A Architecture Processors */
-ARM_CORE("arm2", 	arm2, arm2,	2, FL_CO_PROC | FL_MODE26, slowmul)
-ARM_CORE("arm250", 	arm250, arm250,	2, FL_CO_PROC | FL_MODE26, slowmul)
-ARM_CORE("arm3",	arm3, arm3,	2, FL_CO_PROC | FL_MODE26, slowmul)
+ARM_CORE("arm2",	arm2, arm2,	2, FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH2, slowmul)
+ARM_CORE("arm250",	arm250, arm250,	2, FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH2, slowmul)
+ARM_CORE("arm3",	arm3, arm3,	2, FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH2, slowmul)
 
 /* V3 Architecture Processors */
-ARM_CORE("arm6",	arm6, arm6,		3, FL_CO_PROC | FL_MODE26, slowmul)
-ARM_CORE("arm60",	arm60, arm60,		3, FL_CO_PROC | FL_MODE26, slowmul)
-ARM_CORE("arm600",	arm600, arm600,		3, FL_CO_PROC | FL_MODE26 | FL_WBUF, slowmul)
-ARM_CORE("arm610",	arm610, arm610,		3, FL_MODE26 | FL_WBUF, slowmul)
-ARM_CORE("arm620",	arm620, arm620,		3, FL_CO_PROC | FL_MODE26 | FL_WBUF, slowmul)
-ARM_CORE("arm7",	arm7, arm7,		3, FL_CO_PROC | FL_MODE26, slowmul)
-ARM_CORE("arm7d",	arm7d, arm7d,		3, FL_CO_PROC | FL_MODE26, slowmul)
-ARM_CORE("arm7di",	arm7di, arm7di,		3, FL_CO_PROC | FL_MODE26, slowmul)
-ARM_CORE("arm70",	arm70, arm70,		3, FL_CO_PROC | FL_MODE26, slowmul)
-ARM_CORE("arm700",	arm700, arm700,		3, FL_CO_PROC | FL_MODE26 | FL_WBUF, slowmul)
-ARM_CORE("arm700i",	arm700i, arm700i,	3, FL_CO_PROC | FL_MODE26 | FL_WBUF, slowmul)
-ARM_CORE("arm710",	arm710, arm710,		3, FL_MODE26 | FL_WBUF, slowmul)
-ARM_CORE("arm720",	arm720, arm720,		3, FL_MODE26 | FL_WBUF, slowmul)
-ARM_CORE("arm710c",	arm710c, arm710c,	3, FL_MODE26 | FL_WBUF, slowmul)
-ARM_CORE("arm7100",	arm7100, arm7100,	3, FL_MODE26 | FL_WBUF, slowmul)
-ARM_CORE("arm7500",	arm7500, arm7500,	3, FL_MODE26 | FL_WBUF, slowmul)
+ARM_CORE("arm6",	arm6, arm6,		3, FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH3, slowmul)
+ARM_CORE("arm60",	arm60, arm60,		3, FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH3, slowmul)
+ARM_CORE("arm600",	arm600, arm600,		3, FL_CO_PROC | FL_MODE26 | FL_WBUF | FL_FOR_ARCH3, slowmul)
+ARM_CORE("arm610",	arm610, arm610,		3, FL_MODE26 | FL_WBUF | FL_FOR_ARCH3, slowmul)
+ARM_CORE("arm620",	arm620, arm620,		3, FL_CO_PROC | FL_MODE26 | FL_WBUF | FL_FOR_ARCH3, slowmul)
+ARM_CORE("arm7",	arm7, arm7,		3, FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH3, slowmul)
+ARM_CORE("arm7d",	arm7d, arm7d,		3, FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH3, slowmul)
+ARM_COR

Re: [PATCH 2/4][ARM] Add feature set definitions.

2015-07-24 Thread Matthew Wahab

Ping. Updated patch attached.

Also, retested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and arm-none-eabi with cross-compiled make check.


On 22/06/15 16:45, Matthew Wahab wrote:

Hello,

The ARM backend uses an unsigned long to record CPU feature flags and there are
currently 30 bits in use. This series of patches replaces the single unsigned
long with a representation based on an array of values.

This patch adds, but doesn't use, type arm_feature_set and macros prefixed
with ARM_FSET to represent and operate on feature sets.

Tested by building with no errors. Also tested as part of the series, for
arm-none-linux-gnueabihf with check-gcc.

Ok for trunk?
Matthew

gcc/
2015-06-22  Matthew Wahab  

* config/arm/arm-protos.h (FL_NONE): New.
(FL_ANY): New.
(arm_feature_set): New.
(ARM_FSET_MAKE): New.
(ARM_FSET_MAKE_CPU1): New.
(ARM_FSET_MAKE_CPU2): New.
(ARM_FSET_CPU1): New.
(ARM_FSET_CPU2): New.
(ARM_FSET_EMPTY): New.
(ARM_FSET_ANY): New.
(ARM_FSET_HAS_CPU1): New.
(ARM_FSET_HAS_CPU2): New.
(ARM_FSET_ADD_CPU1): New.
(ARM_FSET_ADD_CPU2): New.
(ARM_FSET_DEL_CPU1): New.
(ARM_FSET_DEL_CPU2): New.
(ARM_FSET_UNION): New.
(ARM_FSET_INTER): New.
(ARM_FSET_XOR): New.
(ARM_FSET_EXCLUDE): New.
(ARM_FSET_IS_EMPTY): New.
(ARM_FSET_CPU_SUBSET): New.



From f977c65b06b0a55a6d371004d0cc64ba216ee954 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Thu, 4 Jun 2015 15:35:25 +0100
Subject: [PATCH 2/4] Add feature set definitions.

Change-Id: I5f89b46ea57e35f477ec4751fea3cb6ee8fce251
---
 gcc/config/arm/arm-protos.h | 101 
 1 file changed, 101 insertions(+)

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 62f91ef..a19d54d 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -346,6 +346,8 @@ extern bool arm_is_constant_pool_ref (rtx);
 /* Flags used to identify the presence of processor capabilities.  */
 
 /* Bit values used to identify processor capabilities.  */
+#define FL_NONE	  (0)	  /* No flags.  */
+#define FL_ANY        (0xffffffff) /* All flags.  */
 #define FL_CO_PROC    (1 << 0)    /* Has external co-processor bus */
 #define FL_ARCH3M     (1 << 1)    /* Extended multiply */
 #define FL_MODE26     (1 << 2)    /* 26-bit mode support */
@@ -412,6 +414,105 @@ extern bool arm_is_constant_pool_ref (rtx);
 #define FL_FOR_ARCH7EM  (FL_FOR_ARCH7M | FL_ARCH7EM)
 #define FL_FOR_ARCH8A	(FL_FOR_ARCH7VE | FL_ARCH8)
 
+/* There are too many feature bits to fit in a single word so the set of cpu and
+   fpu capabilities is a structure.  A feature set is created and manipulated
+   with the ARM_FSET macros.  */
+
+typedef struct
+{
+  unsigned long cpu[2];
+} arm_feature_set;
+
+
+/* Initialize a feature set.  */
+
+#define ARM_FSET_MAKE(CPU1,CPU2) { { (CPU1), (CPU2) } }
+
+#define ARM_FSET_MAKE_CPU1(CPU1) ARM_FSET_MAKE ((CPU1), (FL_NONE))
+#define ARM_FSET_MAKE_CPU2(CPU2) ARM_FSET_MAKE ((FL_NONE), (CPU2))
+
+/* Accessors.  */
+
+#define ARM_FSET_CPU1(S) ((S).cpu[0])
+#define ARM_FSET_CPU2(S) ((S).cpu[1])
+
+/* Useful combinations.  */
+
+#define ARM_FSET_EMPTY ARM_FSET_MAKE (FL_NONE, FL_NONE)
+#define ARM_FSET_ANY ARM_FSET_MAKE (FL_ANY, FL_ANY)
+
+/* Tests for a specific CPU feature.  */
+
+#define ARM_FSET_HAS_CPU1(A, F)  (((A).cpu[0] & (F)) == F)
+#define ARM_FSET_HAS_CPU2(A, F)  (((A).cpu[1] & (F)) == F)
+
+/* Add a feature to a feature set.  */
+
+#define ARM_FSET_ADD_CPU1(DST, F)		\
+  do {						\
+    (DST).cpu[0] |= (F);			\
+  } while (0)
+
+#define ARM_FSET_ADD_CPU2(DST, F)		\
+  do {						\
+    (DST).cpu[1] |= (F);			\
+  } while (0)
+
+/* Remove a feature from a feature set.  */
+
+#define ARM_FSET_DEL_CPU1(DST, F)		\
+  do {						\
+    (DST).cpu[0] &= ~(F);			\
+  } while (0)
+
+#define ARM_FSET_DEL_CPU2(DST, F)		\
+  do {						\
+    (DST).cpu[1] &= ~(F);			\
+  } while (0)
+
+/* Union of feature sets.  */
+
+#define ARM_FSET_UNION(DST,F1,F2)		\
+  do {						\
+    (DST).cpu[0] = (F1).cpu[0] | (F2).cpu[0];	\
+    (DST).cpu[1] = (F1).cpu[1] | (F2).cpu[1];	\
+  } while (0)
+
+/* Intersection of feature sets.  */
+
+#define ARM_FSET_INTER(DST,F1,F2)		\
+  do {						\
+    (DST).cpu[0] = (F1).cpu[0] & (F2).cpu[0];	\
+    (DST).cpu[1] = (F1).cpu[1] & (F2).cpu[1];	\
+  } while (0)
+
+/* Exclusive disjunction.  */
+
+#define ARM_FSET_XOR(DST,F1,F2)			\
+  do {						\
+    (DST).cpu[0] = (F1).cpu[0] ^ (F2).cpu[0];	\
+    (DST).cpu[1] = (F1).cpu[1] ^ (F2).cpu[1];	\
+  } while (0)
+
+/* Difference of feature sets: F1 excluding the elements of F2.  */
+
+#define ARM_FSET_EXCLUDE(DST,F1,F2)		\
+  do {						\
+    (DST).cpu[0] = (F1).cpu[0] & ~(F2).cpu[0];	\
+    (DST).cpu[1] = (F1).cpu[1] & ~(F2).cpu[1];	\
+  } while (0)
+
+/* Test fo

Re: [PATCH 3/4][ARM] Use new feature set representation.

2015-07-24 Thread Matthew Wahab

Ping. Updated patch attached.

Also, retested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and arm-none-eabi with cross-compiled make check.


On 22/06/15 16:52, Matthew Wahab wrote:

Hello,

The ARM backend uses an unsigned long to record CPU feature flags and there are
currently 30 bits in use. This series of patches replaces the single unsigned
long with a representation based on an array of values.

This patch replaces the existing representation of CPU feature sets with the
type arm_feature_set and ARM_FSET macros added in an earlier patch in this
series.

Tested arm-none-linux-gnueabihf with check-gcc. Also tested as part of the
series for arm-none-linux-gnueabihf with check-gcc.

Ok for trunk?
Matthew

gcc/
2015-06-22  Matthew Wahab  

* config/arm/arm-builtins.c (def_mbuiltin): Use ARM_FSET macro.
* config/arm/arm-protos.h (insn_flags): Declare as type
arm_feature_set.
(tune_flags): Likewise.
* config/arm/arm.c (feature_count): New.
(insn_flags): Define as type arm_feature_set.
(tune_flags): Likewise.
(struct processors): Define field flags as type arm_feature_set.
(all_cores): Update for change to struct processors.
(all_architectures): Likewise.
(arm_option_check_internal): Use arm_feature_set and ARM_FSET macros.
(arm_option_override_internal): Likewise.
(arm_option_override): Likewise.



From 2a4ecb02633d41f965e8e05a374700125211a440 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Mon, 8 Jun 2015 14:11:13 +0100
Subject: [PATCH 3/4] Use feature sets.

Change-Id: I5a1b162102dd19b6376637218dc548502112cf4b
---
 gcc/config/arm/arm-builtins.c |   4 +-
 gcc/config/arm/arm-protos.h   |   4 +-
 gcc/config/arm/arm.c  | 131 --
 3 files changed, 80 insertions(+), 59 deletions(-)

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 65e72a4..5c03315 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -1072,10 +1072,10 @@ arm_init_neon_builtins (void)
 #undef NUM_DREG_TYPES
 #undef NUM_QREG_TYPES
 
-#define def_mbuiltin(MASK, NAME, TYPE, CODE)\
+#define def_mbuiltin(FLAG, NAME, TYPE, CODE)\
   do	\
 {	\
-  if ((MASK) & insn_flags)		\
+  if (ARM_FSET_HAS_CPU1 (insn_flags, (FLAG)))			\
 	{\
 	  tree bdecl;			\
 	  bdecl = add_builtin_function ((NAME), (TYPE), (CODE),		\
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index a19d54d..859b5d2 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -515,11 +515,11 @@ typedef struct
 
 /* The bits in this mask specify which
instructions we are allowed to generate.  */
-extern unsigned long insn_flags;
+extern arm_feature_set insn_flags;
 
 /* The bits in this mask specify which instruction scheduling options should
be used.  */
-extern unsigned long tune_flags;
+extern arm_feature_set tune_flags;
 
 /* Nonzero if this chip supports the ARM Architecture 3M extensions.  */
 extern int arm_arch3m;
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 4f90203..4190b3f 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -97,6 +97,7 @@ static void arm_add_gc_roots (void);
 static int arm_gen_constant (enum rtx_code, machine_mode, rtx,
 			 HOST_WIDE_INT, rtx, rtx, int, int);
 static unsigned bit_count (unsigned long);
+static unsigned feature_count (const arm_feature_set*);
 static int arm_address_register_rtx_p (rtx, int);
 static int arm_legitimate_index_p (machine_mode, rtx, RTX_CODE, int);
 static bool is_called_in_ARM_mode (tree);
@@ -767,11 +768,11 @@ static int thumb_call_reg_needed;
 
 /* The bits in this mask specify which
instructions we are allowed to generate.  */
-unsigned long insn_flags = 0;
+arm_feature_set insn_flags = ARM_FSET_EMPTY;
 
 /* The bits in this mask specify which instruction scheduling options should
be used.  */
-unsigned long tune_flags = 0;
+arm_feature_set tune_flags = ARM_FSET_EMPTY;
 
 /* The highest ARM architecture version supported by the
target.  */
@@ -924,7 +925,7 @@ struct processors
   enum processor_type core;
   const char *arch;
   enum base_architecture base_arch;
-  const unsigned long flags;
+  const arm_feature_set flags;
   const struct tune_params *const tune;
 };
 
@@ -2193,10 +2194,10 @@ static const struct processors all_cores[] =
   /* ARM Cores */
 #define ARM_CORE(NAME, X, IDENT, ARCH, FLAGS, COSTS) \
   {NAME, IDENT, #ARCH, BASE_ARCH_##ARCH,	  \
-FLAGS, &arm_##COSTS##_tune},
+   ARM_FSET_MAKE_CPU1 (FLAGS), &arm_##COSTS##_tune},
 #include "arm-cores.def"
 #undef ARM_CORE
-  {NULL, arm_none, NULL, BASE_ARCH_0, 0, NULL}
+  {NULL, arm_none, NULL, BASE_ARCH_0, ARM_FSET_EMPTY, NULL}
 };
 
 static const struct processors all_architectures[] =
@@ -2206,10 +2207,10 @@ static const struct processors all_arc

Re: [PATCH 4/4][ARM] Move initializer into arm-cores.def and arm-arches.def

2015-07-24 Thread Matthew Wahab

Ping. Updated patch attached.

Also, retested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and arm-none-eabi with cross-compiled make check.


On 22/06/15 16:54, Matthew Wahab wrote:

Hello,

The ARM backend uses an unsigned long to record CPU feature flags and there are
currently 30 bits in use. This series of patches replaces the single unsigned
long with a representation based on an array of values.

This patch updates the entries in the arm-core.def and arm-arches.def files
for the new arm_feature_set representation, moving the initializers from a macro
expansion and making them explicit in the file entries.

Tested for arm-none-linux-gnueabihf with check-gcc.

Ok for trunk?
Matthew

gcc/
2015-06-22  Matthew Wahab  

* config/arm/arm-arches.def: Replace single value flags with
initializer built from ARM_FSET_MAKE_CPU1.
* config/arm/arm-cores.def: Likewise.
* config/arm/arm.c: (all_cores): Remove ARM_FSET_MAKE_CPU1
derivation from the ARM_CORE macro definition, use the given value
instead.
(all_architectures): Remove ARM_FSET_MAKE_CPU1 derivation from the
ARM_ARCH macro definition, use the given value instead.



From 699d2d4c5b1683bddd2645690fd590ee8a812491 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Mon, 8 Jun 2015 16:15:52 +0100
Subject: [PATCH 4/4] Move feature sets into core and arch def files.

Change-Id: Ica484c7d9f46413c196b26a630ff49413b10289b
---
 gcc/config/arm/arm-arches.def |  56 ++--
 gcc/config/arm/arm-cores.def  | 200 +-
 gcc/config/arm/arm.c  |   4 +-
 3 files changed, 130 insertions(+), 130 deletions(-)

diff --git a/gcc/config/arm/arm-arches.def b/gcc/config/arm/arm-arches.def
index 840c1ff..6d0374a 100644
--- a/gcc/config/arm/arm-arches.def
+++ b/gcc/config/arm/arm-arches.def
@@ -28,33 +28,33 @@
 
genopt.sh assumes no whitespace up to the first "," in each entry.  */
 
-ARM_ARCH("armv2",   arm2,   2,   FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH2)
-ARM_ARCH("armv2a",  arm2,   2,   FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH2)
-ARM_ARCH("armv3",   arm6,   3,   FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH3)
-ARM_ARCH("armv3m",  arm7m,  3M,  FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH3M)
-ARM_ARCH("armv4",   arm7tdmi,   4,   FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH4)
+ARM_ARCH("armv2",   arm2,   2,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH2))
+ARM_ARCH("armv2a",  arm2,   2,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH2))
+ARM_ARCH("armv3",   arm6,   3,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH3))
+ARM_ARCH("armv3m",  arm7m,  3M,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH3M))
+ARM_ARCH("armv4",   arm7tdmi,   4,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH4))
 /* Strictly, FL_MODE26 is a permitted option for v4t, but there are no
implementations that support it, so we will leave it out for now.  */
-ARM_ARCH("armv4t",  arm7tdmi,   4T,  FL_CO_PROC | FL_FOR_ARCH4T)
-ARM_ARCH("armv5",   arm10tdmi,  5,   FL_CO_PROC | FL_FOR_ARCH5)
-ARM_ARCH("armv5t",  arm10tdmi,  5T,  FL_CO_PROC | FL_FOR_ARCH5T)
-ARM_ARCH("armv5e",  arm1026ejs, 5E,  FL_CO_PROC | FL_FOR_ARCH5E)
-ARM_ARCH("armv5te", arm1026ejs, 5TE, FL_CO_PROC | FL_FOR_ARCH5TE)
-ARM_ARCH("armv6",   arm1136js,  6,   FL_CO_PROC | FL_FOR_ARCH6)
-ARM_ARCH("armv6j",  arm1136js,  6J,  FL_CO_PROC | FL_FOR_ARCH6J)
-ARM_ARCH("armv6k",  mpcore,	6K,  FL_CO_PROC | FL_FOR_ARCH6K)
-ARM_ARCH("armv6z",  arm1176jzs, 6Z,  FL_CO_PROC | FL_FOR_ARCH6Z)
-ARM_ARCH("armv6zk", arm1176jzs, 6ZK, FL_CO_PROC | FL_FOR_ARCH6ZK)
-ARM_ARCH("armv6t2", arm1156t2s, 6T2, FL_CO_PROC | FL_FOR_ARCH6T2)
-ARM_ARCH("armv6-m", cortexm1,	6M,			  FL_FOR_ARCH6M)
-ARM_ARCH("armv6s-m", cortexm1,	6M,			  FL_FOR_ARCH6M)
-ARM_ARCH("armv7",   cortexa8,	7,   FL_CO_PROC |	  FL_FOR_ARCH7)
-ARM_ARCH("armv7-a", cortexa8,	7A,  FL_CO_PROC |	  FL_FOR_ARCH7A)
-ARM_ARCH("armv7ve", cortexa8,	7A,  FL_CO_PROC |	  FL_FOR_ARCH7VE)
-ARM_ARCH("armv7-r", cortexr4,	7R,  FL_CO_PROC |	  FL_FOR_ARCH7R)
-ARM_ARCH("armv7-m", cortexm3,	7M,  FL_CO_PROC |	  FL_FOR_ARCH7M)
-ARM_ARCH("armv7e-m", cortexm4,  7EM, FL_CO_PROC |	  FL_FOR_ARCH7EM)
-ARM_ARCH("armv8-a", cortexa53,  8A,  FL_CO_PROC | FL_FOR_ARCH8A)
-ARM_ARCH("armv8-a+crc",cortexa53, 8A,FL_CO_PROC | FL_CRC32  | FL_FOR_ARCH8A)
-ARM_ARCH("iwmmxt",  iwmmxt, 5TE, FL_LDSCHED | FL_STRONG | FL_FOR_ARCH5TE | FL_XSCALE | FL_IWMMXT)
-ARM_AR

Re: [ARM] Correct spelling of references to ARMv6KZ

2015-07-27 Thread Matthew Wahab

On 23/07/15 12:04, Kyrill Tkachov wrote:



GCC supports ARM architecture ARMv6KZ but refers to it as ARMv6ZK. This is made
visible by the command line option -march=armv6zk and by the predefined macro
__ARM_ARCH_6ZK__.

This patch corrects the spelling internally and adds -march=armv6kz. To preserve
existing behaviour, -march=armv6zk is kept as an alias of -march=armv6kz and
both __ARM_ARCH_6KZ__ and __ARM_ARCH_6ZK__ macros are defined for the
architecture.

Use of -march=armv6kz will need to wait for binutils to be updated,[..]


diff --git a/gcc/config/arm/driver-arm.c b/gcc/config/arm/driver-arm.c
index c715bb7..7873606 100644
--- a/gcc/config/arm/driver-arm.c
+++ b/gcc/config/arm/driver-arm.c
@@ -35,6 +35,7 @@ static struct vendor_cpu arm_cpu_table[] = {
  {"0xb02", "armv6k", "mpcore"},
  {"0xb36", "armv6j", "arm1136j-s"},
  {"0xb56", "armv6t2", "arm1156t2-s"},
+{"0xb76", "armv6kz", "arm1176jz-s"},
  {"0xb76", "armv6zk", "arm1176jz-s"},
  {"0xc05", "armv7-a", "cortex-a5"},
  {"0xc07", "armv7ve", "cortex-a7"},

This table is scanned from beginning to end, checking for the first field.
You introduce a duplicate here, so the second form will never be reached.
I'd suggest removing the wrong spelling from here, but the re-written march
string will be passed to the assembler, so if the assembler is old and
doesn't support the correct spelling we'll get errors. So it seems like in
order to preserve backwards compatibility we don't want to put the correctly
spelled entry here :(
But definitely add a comment here mentioning the deliberate oversight.



Respun patch attached. I've removed the "armv6kz" entry from
config/arm/driver-arm.c and replaced it with a comment on the "armv6zk" entry.


Tested for arm-none-linux-gnueabihf with native bootstrap and make check.

Matthew

gcc/
2015-07-27  Matthew Wahab  

* config/arm/arm-arches.def: Add "armv6kz". Replace 6ZK with 6KZ
and FL_FOR_ARCH6ZK with FL_FOR_ARCH6KZ.
* config/arm/arm-c.c (arm_cpu_builtins): Emit "__ARM_ARCH_6ZK__"
for armv6kz targets.
* config/arm/arm-cores.def: Replace 6ZK with 6KZ.
* config/arm/arm-protos.h (FL_ARCH6KZ): New.
(FL_FOR_ARCH6ZK): Remove.
(FL_FOR_ARCH6KZ): New.
(arm_arch6zk): New declaration.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/arm.c (arm_arch6kz): New.
(arm_option_override): Set arm_arch6kz.
* config/arm/arm.h (BASE_ARCH_6ZK): Rename to BASE_ARCH_6KZ.
* config/arm/driver-arm.c: Add comment to "armv6zk" entry.
* doc/invoke.texi: Replace "armv6zk" with "armv6kz".


diff --git a/gcc/config/arm/arm-arches.def b/gcc/config/arm/arm-arches.def
index 840c1ff..3dafaa5 100644
--- a/gcc/config/arm/arm-arches.def
+++ b/gcc/config/arm/arm-arches.def
@@ -44,7 +44,8 @@ ARM_ARCH("armv6",   arm1136js,  6,   FL_CO_PROC | FL_FOR_ARCH6)
 ARM_ARCH("armv6j",  arm1136js,  6J,  FL_CO_PROC | FL_FOR_ARCH6J)
 ARM_ARCH("armv6k",  mpcore,	6K,  FL_CO_PROC | FL_FOR_ARCH6K)
 ARM_ARCH("armv6z",  arm1176jzs, 6Z,  FL_CO_PROC | FL_FOR_ARCH6Z)
-ARM_ARCH("armv6zk", arm1176jzs, 6ZK, FL_CO_PROC | FL_FOR_ARCH6ZK)
+ARM_ARCH("armv6kz", arm1176jzs, 6KZ, FL_CO_PROC | FL_FOR_ARCH6KZ)
+ARM_ARCH("armv6zk", arm1176jzs, 6KZ, FL_CO_PROC | FL_FOR_ARCH6KZ)
 ARM_ARCH("armv6t2", arm1156t2s, 6T2, FL_CO_PROC | FL_FOR_ARCH6T2)
 ARM_ARCH("armv6-m", cortexm1,	6M,			  FL_FOR_ARCH6M)
 ARM_ARCH("armv6s-m", cortexm1,	6M,			  FL_FOR_ARCH6M)
diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c
index 297995b..9bf3973 100644
--- a/gcc/config/arm/arm-c.c
+++ b/gcc/config/arm/arm-c.c
@@ -167,6 +167,11 @@ arm_cpu_builtins (struct cpp_reader* pfile, int flags)
 }
   if (arm_arch_iwmmxt2)
 builtin_define ("__IWMMXT2__");
+  /* ARMv6KZ was originally identified as the misspelled __ARM_ARCH_6ZK__.  To
+     preserve the existing behaviour, the misspelled feature macro must still
+     be defined.  */
+  if (arm_arch6kz)
+    builtin_define ("__ARM_ARCH_6ZK__");
   if (TARGET_AAPCS_BASED)
 {
   if (arm_pcs_default == ARM_PCS_AAPCS_VFP)
diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def
index 103c314..9d47fcf 100644
--- a/gcc/config/arm/arm-cores.def
+++ b/gcc/config/arm/arm-cores.def
@@ -125,8 +125,8 @@ ARM_CORE("arm1026ej-s",	arm1026ejs, arm1026ejs,	5TEJ, FL_LDSCHED, 9e)
 /* V6 Architecture Processors */
 ARM_CORE("arm1136j-s",		arm1136js, arm1136js,		6J,  FL_

Re: [PATCH 1/4][ARM][PR target/65697][5.1] Backport stronger barriers for __sync fetch-op builtins.

2015-07-27 Thread Matthew Wahab

Ping. Updated patch attached.

Also, retested for arm-none-linux-gnueabihf with native bootstrap and make
check and for arm-none-eabi with cross compiled make check.

On 02/07/15 14:12, Matthew Wahab wrote:

The __sync builtins are implemented using barriers that are too weak for ARMv8
targets; this has been fixed on trunk for the ARM back-end. Since GCC 5.1 is
also generating the incorrect code, it should be fixed as well.

This patch backports the changes made to strengthen the barriers emitted for
the __sync fetch-and-op/op-and-fetch builtins.

The trunk patch submission is at
https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01410.html
The commit is at https://gcc.gnu.org/ml/gcc-cvs/2015-06/msg01235.html

Tested the series for arm-none-linux-gnueabihf with check-gcc

Ok for the branch?
Matthew

2015-07-02  Matthew Wahab  

 Backport from trunk:
 2015-06-29  Matthew Wahab  

 PR target/65697
 * config/arm/arm.c (arm_split_atomic_op): For ARMv8, replace an
 initial acquire barrier with final barrier.


From 0c2f209f869aead3475fe491f08cf7640d2bc8fe Mon Sep 17 00:00:00 2001
From: mwahab 
Date: Mon, 29 Jun 2015 16:03:34 +
Subject: [PATCH 1/4] 2015-07-01  Matthew Wahab  

Backport
	PR target/65697
	* config/arm/arm.c (arm_split_atomic_op): For ARMv8, replace an
	initial acquire barrier with final barrier.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@225132 138bc75d-0d04-0410-961f-82ee72b054a4

Conflicts:
	gcc/ChangeLog

Change-Id: I2074541794ecad8847ada04690cd9132a51b6404
---
 gcc/config/arm/arm.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 614ff0d..f694e74 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -27822,6 +27822,8 @@ arm_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem,
   rtx_code_label *label;
   rtx x;
 
+  bool is_armv8_sync = arm_arch8 && is_mm_sync (model);
+
   bool use_acquire = TARGET_HAVE_LDACQ
  && !(is_mm_relaxed (model) || is_mm_consume (model)
 			  || is_mm_release (model));
@@ -27830,6 +27832,11 @@ arm_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem,
  && !(is_mm_relaxed (model) || is_mm_consume (model)
 			  || is_mm_acquire (model));
 
+  /* For ARMv8, a load-acquire is too weak for __sync memory orders.  Instead,
+     a full barrier is emitted after the store-release.  */
+  if (is_armv8_sync)
+    use_acquire = false;
+
   /* Checks whether a barrier is needed and emits one accordingly.  */
   if (!(use_acquire || use_release))
 arm_pre_atomic_barrier (model);
@@ -27900,7 +27907,8 @@ arm_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem,
   emit_unlikely_jump (gen_cbranchsi4 (x, cond, const0_rtx, label));
 
   /* Checks whether a barrier is needed and emits one accordingly.  */
-  if (!(use_acquire || use_release))
+  if (is_armv8_sync
+  || !(use_acquire || use_release))
 arm_post_atomic_barrier (model);
 }
 
-- 
1.9.1



Re: [PATCH 2/4][ARM][PR target/65697][5.1] Backport stronger barriers for __sync,compare-and-swap builtins.

2015-07-27 Thread Matthew Wahab

Ping. Updated patch attached.

Also, retested for arm-none-linux-gnueabihf with native bootstrap and make
check and for arm-none-eabi with cross compiled make check.


On 02/07/15 14:15, Matthew Wahab wrote:

This patch backports the changes made to strengthen the barriers emitted for
the __sync compare-and-swap builtins.

The trunk patch submission is at
https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01411.html
The commit is at https://gcc.gnu.org/ml/gcc-cvs/2015-06/msg01236.html

Tested the series for arm-none-linux-gnueabihf with check-gcc

Ok for the branch?
Matthew

2015-07-02  Matthew Wahab  

 Backport from trunk:
 2015-06-29  Matthew Wahab  

 PR target/65697
 * config/arm/arm.c (arm_split_compare_and_swap): For ARMv8,
 replace an initial acquire barrier with final barrier.



From fdcde1aa0b852f2a01bb45115e28f694b0225fcf Mon Sep 17 00:00:00 2001
From: mwahab 
Date: Mon, 29 Jun 2015 16:09:10 +
Subject: [PATCH 2/4] 2015-07-01  Matthew Wahab  

	Backport
	PR target/65697
	* config/arm/arm.c (arm_split_compare_and_swap): For ARMv8, replace an
	initial acquire barrier with final barrier.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@225133 138bc75d-0d04-0410-961f-82ee72b054a4

Conflicts:
	gcc/ChangeLog

Change-Id: Ifab505d792d6227c7d2231813dfb2e7826f0f450
---
 gcc/config/arm/arm.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index f694e74..1e67a73 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -27757,6 +27757,8 @@ arm_split_compare_and_swap (rtx operands[])
   scratch = operands[7];
   mode = GET_MODE (mem);
 
+  bool is_armv8_sync = arm_arch8 && is_mm_sync (mod_s);
+
   bool use_acquire = TARGET_HAVE_LDACQ
  && !(is_mm_relaxed (mod_s) || is_mm_consume (mod_s)
 			  || is_mm_release (mod_s));
@@ -27765,6 +27767,11 @@ arm_split_compare_and_swap (rtx operands[])
  && !(is_mm_relaxed (mod_s) || is_mm_consume (mod_s)
 			  || is_mm_acquire (mod_s));
 
+  /* For ARMv8, the load-acquire is too weak for __sync memory orders.  Instead,
+     a full barrier is emitted after the store-release.  */
+  if (is_armv8_sync)
+    use_acquire = false;
+
   /* Checks whether a barrier is needed and emits one accordingly.  */
   if (!(use_acquire || use_release))
 arm_pre_atomic_barrier (mod_s);
@@ -27805,7 +27812,8 @@ arm_split_compare_and_swap (rtx operands[])
 emit_label (label2);
 
   /* Checks whether a barrier is needed and emits one accordingly.  */
-  if (!(use_acquire || use_release))
+  if (is_armv8_sync
+  || !(use_acquire || use_release))
 arm_post_atomic_barrier (mod_s);
 
   if (is_mm_relaxed (mod_f))
-- 
1.9.1



Re: [PATCH 3/4][ARM][PR target/65697][5.1] Add tests for __sync_builtins.

2015-07-27 Thread Matthew Wahab

Ping. Updated patch attached.

Also, retested for arm-none-linux-gnueabihf with native bootstrap and make
check and for arm-none-eabi with cross compiled make check.

On 02/07/15 14:17, Matthew Wahab wrote:

This patch backports the tests added for code generated by the ARM back-end for
the __sync builtins.

The trunk patch submission is at
https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01412.html
The commit is at https://gcc.gnu.org/ml/gcc-cvs/2015-06/msg01237.html

Tested the series for arm-none-linux-gnueabihf with check-gcc

Ok for the branch?
Matthew

gcc/testsuite
2015-07-02  Matthew Wahab  

 Backport from trunk:
 2015-06-29  Matthew Wahab  

 PR target/65697
 * gcc.target/arm/armv8-sync-comp-swap.c: New.
 * gcc.target/arm/armv8-sync-op-acquire.c: New.
 * gcc.target/arm/armv8-sync-op-full.c: New.
 * gcc.target/arm/armv8-sync-op-release.c: New.



From d1a53325eb47a5da55a8267deb5fbe168f7db4de Mon Sep 17 00:00:00 2001
From: mwahab 
Date: Mon, 29 Jun 2015 16:12:12 +
Subject: [PATCH 3/4] 2015-07-01  Matthew Wahab  

	Backport
	PR target/65697
	* gcc.target/arm/armv8-sync-comp-swap.c: New.
	* gcc.target/arm/armv8-sync-op-acquire.c: New.
	* gcc.target/arm/armv8-sync-op-full.c: New.
	* gcc.target/arm/armv8-sync-op-release.c: New.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@225134 138bc75d-0d04-0410-961f-82ee72b054a4

Conflicts:
	gcc/ChangeLog

Change-Id: I16c02786765bbbfbb287fba863ba27fb6a56ddc5
---
 gcc/testsuite/gcc.target/arm/armv8-sync-comp-swap.c  | 10 ++
 gcc/testsuite/gcc.target/arm/armv8-sync-op-acquire.c | 10 ++
 gcc/testsuite/gcc.target/arm/armv8-sync-op-full.c| 10 ++
 gcc/testsuite/gcc.target/arm/armv8-sync-op-release.c |  8 
 4 files changed, 38 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/arm/armv8-sync-comp-swap.c
 create mode 100644 gcc/testsuite/gcc.target/arm/armv8-sync-op-acquire.c
 create mode 100644 gcc/testsuite/gcc.target/arm/armv8-sync-op-full.c
 create mode 100644 gcc/testsuite/gcc.target/arm/armv8-sync-op-release.c

diff --git a/gcc/testsuite/gcc.target/arm/armv8-sync-comp-swap.c b/gcc/testsuite/gcc.target/arm/armv8-sync-comp-swap.c
new file mode 100644
index 000..f96c81a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/armv8-sync-comp-swap.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { do-require-effective-target arm_arch_v8a_ok } */
+/* { dg-options "-O2" } */
+/* { dg-add-options arm_arch_v8a } */
+
+#include "../aarch64/sync-comp-swap.x"
+
+/* { dg-final { scan-assembler-times "ldrex" 2 } } */
+/* { dg-final { scan-assembler-times "stlex" 2 } } */
+/* { dg-final { scan-assembler-times "dmb" 2 } } */
diff --git a/gcc/testsuite/gcc.target/arm/armv8-sync-op-acquire.c b/gcc/testsuite/gcc.target/arm/armv8-sync-op-acquire.c
new file mode 100644
index 000..8d6659b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/armv8-sync-op-acquire.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { do-require-effective-target arm_arch_v8a_ok } */
+/* { dg-options "-O2" } */
+/* { dg-add-options arm_arch_v8a } */
+
+#include "../aarch64/sync-op-acquire.x"
+
+/* { dg-final { scan-assembler-times "ldrex" 1 } } */
+/* { dg-final { scan-assembler-times "stlex" 1 } } */
+/* { dg-final { scan-assembler-times "dmb" 1 } } */
diff --git a/gcc/testsuite/gcc.target/arm/armv8-sync-op-full.c b/gcc/testsuite/gcc.target/arm/armv8-sync-op-full.c
new file mode 100644
index 000..a5ad3bd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/armv8-sync-op-full.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { do-require-effective-target arm_arch_v8a_ok } */
+/* { dg-options "-O2" } */
+/* { dg-add-options arm_arch_v8a } */
+
+#include "../aarch64/sync-op-full.x"
+
+/* { dg-final { scan-assembler-times "ldrex" 12 } } */
+/* { dg-final { scan-assembler-times "stlex" 12 } } */
+/* { dg-final { scan-assembler-times "dmb" 12 } } */
diff --git a/gcc/testsuite/gcc.target/arm/armv8-sync-op-release.c b/gcc/testsuite/gcc.target/arm/armv8-sync-op-release.c
new file mode 100644
index 000..0d3be7b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/armv8-sync-op-release.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { do-require-effective-target arm_arch_v8a_ok } */
+/* { dg-options "-O2" } */
+/* { dg-add-options arm_arch_v8a } */
+
+#include "../aarch64/sync-op-release.x"
+
+/* { dg-final { scan-assembler-times "stl" 1 } } */
-- 
1.9.1



Re: [PATCH 4/4][ARM][PR target/65697][5.1] Fix tests for __sync_builtins.

2015-07-27 Thread Matthew Wahab

Ping. Updated patch attached.

Also, retested for arm-none-linux-gnueabihf with native bootstrap and make
check and for arm-none-eabi with cross compiled make check.

On 02/07/15 14:18, Matthew Wahab wrote:

This patch backports fixes for the __sync builtin tests.

The trunk patch submission is at
https://gcc.gnu.org/ml/gcc-patches/2015-07/msg00031.html
The commit is at https://gcc.gnu.org/ml/gcc-cvs/2015-07/msg00025.html

Tested the series for arm-none-linux-gnueabihf with check-gcc

Ok for the branch?
Matthew

gcc/testsuite
2015-07-02  Matthew Wahab  

 Backport from trunk:
 2015-07-01  Matthew Wahab  

 * gcc.target/arm/armv8-sync-comp-swap.c: Replace
 'do-require-effective-target' with 'dg-require-effective-target'.
 * gcc.target/arm/armv8-sync-op-full.c: Likewise.
 * gcc.target/arm/armv8-sync-op-release.c: Likewise.
 * gcc.target/arm/armv8-sync-op-acquire.c: Likewise.  Also, replace
 'stlex' with 'strex' as the expected output.



From d058686fe1027927a5fdfbb81a83526e3f9b9d6d Mon Sep 17 00:00:00 2001
From: mwahab 
Date: Wed, 1 Jul 2015 12:16:01 +0000
Subject: [PATCH 4/4] 2015-07-01  Matthew Wahab  

	Backport
	* gcc.target/arm/armv8-sync-comp-swap.c: Replace
	'do-require-effective-target' with 'dg-require-effective-target'.
	* gcc.target/arm/armv8-sync-op-full.c: Likewise.
	* gcc.target/arm/armv8-sync-op-release.c: Likewise.
	* gcc.target/arm/armv8-sync-op-acquire.c: Likewise.  Also, replace
'stlex' with 'strex' as the expected output.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@225241 138bc75d-0d04-0410-961f-82ee72b054a4

Conflicts:
	gcc/testsuite/ChangeLog

Change-Id: I19f2013f7bdd2dd035f36f0f7c9829cf6a86fb8e
---
 gcc/testsuite/gcc.target/arm/armv8-sync-comp-swap.c  | 2 +-
 gcc/testsuite/gcc.target/arm/armv8-sync-op-acquire.c | 4 ++--
 gcc/testsuite/gcc.target/arm/armv8-sync-op-full.c| 2 +-
 gcc/testsuite/gcc.target/arm/armv8-sync-op-release.c | 2 +-
 4 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/armv8-sync-comp-swap.c b/gcc/testsuite/gcc.target/arm/armv8-sync-comp-swap.c
index f96c81a..0e95986 100644
--- a/gcc/testsuite/gcc.target/arm/armv8-sync-comp-swap.c
+++ b/gcc/testsuite/gcc.target/arm/armv8-sync-comp-swap.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { do-require-effective-target arm_arch_v8a_ok } */
+/* { dg-require-effective-target arm_arch_v8a_ok } */
 /* { dg-options "-O2" } */
 /* { dg-add-options arm_arch_v8a } */
 
diff --git a/gcc/testsuite/gcc.target/arm/armv8-sync-op-acquire.c b/gcc/testsuite/gcc.target/arm/armv8-sync-op-acquire.c
index 8d6659b..c448599 100644
--- a/gcc/testsuite/gcc.target/arm/armv8-sync-op-acquire.c
+++ b/gcc/testsuite/gcc.target/arm/armv8-sync-op-acquire.c
@@ -1,10 +1,10 @@
 /* { dg-do compile } */
-/* { do-require-effective-target arm_arch_v8a_ok } */
+/* { dg-require-effective-target arm_arch_v8a_ok } */
 /* { dg-options "-O2" } */
 /* { dg-add-options arm_arch_v8a } */
 
 #include "../aarch64/sync-op-acquire.x"
 
 /* { dg-final { scan-assembler-times "ldrex" 1 } } */
-/* { dg-final { scan-assembler-times "stlex" 1 } } */
+/* { dg-final { scan-assembler-times "strex" 1 } } */
 /* { dg-final { scan-assembler-times "dmb" 1 } } */
diff --git a/gcc/testsuite/gcc.target/arm/armv8-sync-op-full.c b/gcc/testsuite/gcc.target/arm/armv8-sync-op-full.c
index a5ad3bd..cce9e00 100644
--- a/gcc/testsuite/gcc.target/arm/armv8-sync-op-full.c
+++ b/gcc/testsuite/gcc.target/arm/armv8-sync-op-full.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { do-require-effective-target arm_arch_v8a_ok } */
+/* { dg-require-effective-target arm_arch_v8a_ok } */
 /* { dg-options "-O2" } */
 /* { dg-add-options arm_arch_v8a } */
 
diff --git a/gcc/testsuite/gcc.target/arm/armv8-sync-op-release.c b/gcc/testsuite/gcc.target/arm/armv8-sync-op-release.c
index 0d3be7b..502a266 100644
--- a/gcc/testsuite/gcc.target/arm/armv8-sync-op-release.c
+++ b/gcc/testsuite/gcc.target/arm/armv8-sync-op-release.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { do-require-effective-target arm_arch_v8a_ok } */
+/* { dg-require-effective-target arm_arch_v8a_ok } */
 /* { dg-options "-O2" } */
 /* { dg-add-options arm_arch_v8a } */
 
-- 
1.9.1



[AArch64][PATCH 1/5] Use atomic instructions for swap and fetch-update operations.

2015-09-17 Thread Matthew Wahab

Hello,

ARMv8.1 adds atomic swap and atomic load-operate instructions with
optional memory ordering specifiers. This patch series adds the
instructions to GCC, making them available with -march=armv8.1-a or
-march=armv8-a+lse, and uses them to implement the __sync and __atomic
builtins.

The ARMv8.1 swap instruction swaps the value in a register with a value
in memory. The load-operate instructions load a value from memory,
update it with the result of an operation and store the result in
memory.

This series uses the swap instruction to implement the atomic_exchange
patterns and the load-operate instructions to implement the
atomic_fetch_<op> and atomic_<op>_fetch patterns. For the
atomic_<op>_fetch patterns, the value returned as the result of the
operation has to be recalculated from the loaded data. The ARMv8 BIC
instruction is added so that it can be used for this recalculation.

The patches in this series
- add and use the atomic swap instruction,
- add the AArch64 BIC instruction,
- add the ARMv8.1 load-operate instructions,
- use the load-operate instructions to implement the atomic_fetch_<op>
  patterns,
- use the load-operate instructions to implement the
  atomic_<op>_fetch patterns.

The code-generation changes in this series are based around a new
function, aarch64_gen_atomic_ldop, which takes the operation to be
implemented and emits the appropriate code, making use of the atomic
instruction. This follows the existing use of aarch64_split_atomic_op for
the same purpose when atomic instructions aren't available.

This patch adds the ARMv8.1 SWAP instruction and function
aarch64_gen_atomic_ldop and changes the implementation of the
atomic_exchange pattern to the atomic instruction when it is available.

The general form of the code generated for an atomic_exchange, with
destination D, source S, memory address A and memory order MO is:

   swp<mo><sz> S, D, [A]

where
<mo> is one of {'', 'a', 'l', 'al'} depending on memory order MO.
<sz> is one of {'', 'b', 'h'} depending on the data size.

This patch also adds tests for the changes. These reuse the support code
introduced for the atomic CAS tests, adding macros to test functions
taking one memory ordering argument. These are used to iteratively
define functions using the __atomic_exchange builtins, which should be
implemented using the atomic swap.

Tested the series for aarch64-none-linux-gnu with native bootstrap and
make check. Also tested for aarch64-none-elf with cross-compiled
check-gcc on an ARMv8.1 emulator with +lse enabled by default.

Ok for trunk?
Matthew

gcc/
2015-09-17  Matthew Wahab  

* config/aarch64/aarch64-protos.h (aarch64_gen_atomic_ldop):
Declare.
* config/aarch64/aarch64.c (aarch64_emit_atomic_swp): New.
(aarch64_gen_atomic_ldop): New.
(aarch64_split_atomic_op): Fix whitespace and add a comment.
* config/aarch64/atomics.md (UNSPECV_ATOMIC_SWP): New.
(atomic_compare_and_swap<mode>_lse): Remove comments and fix
whitespace.
(atomic_exchange<mode>): Replace with an expander.
(aarch64_atomic_exchange<mode>): New.
(aarch64_atomic_exchange<mode>_lse): New.
(aarch64_atomic_<atomic_optab><mode>): Fix some whitespace.
(aarch64_atomic_swp<mode>): New.


gcc/testsuite/
2015-09-17  Matthew Wahab  

* gcc.target/aarch64/atomic-inst-ops.inc: (TEST_MODEL): New.
(TEST_ONE): New.
 * gcc.target/aarch64/atomic-inst-swap.c: New.

From 425fa05a5e3656c8d6d0d085628424b4c846cd49 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Fri, 7 Aug 2015 17:18:37 +0100
Subject: [PATCH 1/5] Add atomic SWP instruction

Change-Id: I87bf48526cb11e65edd15691f5eab20446e418c9
---
 gcc/config/aarch64/aarch64-protos.h|  1 +
 gcc/config/aarch64/aarch64.c   | 46 ++-
 gcc/config/aarch64/atomics.md  | 92 +++---
 .../gcc.target/aarch64/atomic-inst-ops.inc | 13 +++
 gcc/testsuite/gcc.target/aarch64/atomic-inst-swp.c | 44 +++
 5 files changed, 183 insertions(+), 13 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/atomic-inst-swp.c

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index ff19851..eba4c76 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -378,6 +378,7 @@ rtx aarch64_load_tp (rtx);
 void aarch64_expand_compare_and_swap (rtx op[]);
 void aarch64_split_compare_and_swap (rtx op[]);
 void aarch64_gen_atomic_cas (rtx, rtx, rtx, rtx, rtx);
+void aarch64_gen_atomic_ldop (enum rtx_code, rtx, rtx, rtx, rtx);
 void aarch64_split_atomic_op (enum rtx_code, rtx, rtx, rtx, rtx, rtx, rtx);
 
 bool aarch64_gen_adjusted_ldpstp (rtx *, bool, enum machine_mode, RTX_CODE);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 4d2126b..dc05c6e 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aa

[AArch64][PATCH 2/5] Add BIC instruction.

2015-09-17 Thread Matthew Wahab

Hello,

ARMv8.1 adds atomic swap and atomic load-operate instructions with
optional memory ordering specifiers. This patch adds an expander to
generate a BIC instruction that can be explicitly called when
implementing the atomic_<op>_fetch pattern to calculate the value to
be returned by the operation.

Tested the series for aarch64-none-linux-gnu with native bootstrap and
make check. Also tested for aarch64-none-elf with cross-compiled
check-gcc on an ARMv8.1 emulator with +lse enabled by default.

Ok for trunk?
Matthew

2015-09-17  Matthew Wahab  

* config/aarch64/aarch64.md (bic_<SHIFT:optab><mode>3): New.


From 14e122ee98aa20826ee070d20c58c94206cdd91b Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Mon, 17 Aug 2015 17:48:27 +0100
Subject: [PATCH 2/5] Add BIC instruction

Change-Id: Ibef049bfa1bfe5e168feada3dc358f28383e6410
---
 gcc/config/aarch64/aarch64.md | 13 +
 1 file changed, 13 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 88ba72e..bae4af4 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3351,6 +3351,19 @@
(set_attr "simd" "*,yes")]
 )
 
+(define_expand "bic_<SHIFT:optab><mode>3"
+ [(set (match_operand:GPI 0 "register_operand" "=r")
+   (and:GPI
+(not:GPI
+ (SHIFT:GPI
+  (match_operand:GPI 1 "register_operand" "r")
+  (match_operand:QI 2 "aarch64_shift_imm_si" "n")))
+(match_operand:GPI 3 "register_operand" "r")))]
+ ""
+ ""
+ [(set_attr "type" "logics_shift_imm")]
+)
+
 (define_insn "*and_one_cmpl3_compare0"
   [(set (reg:CC_NZ CC_REGNUM)
 	(compare:CC_NZ
-- 
2.1.4



[AArch64][PATCH 3/5] Add atomic load-operate instructions.

2015-09-17 Thread Matthew Wahab

Hello,

ARMv8.1 adds atomic swap and atomic load-operate instructions with
optional memory ordering specifiers. This patch adds the ARMv8.1 atomic
load-operate instructions.

Tested the series for aarch64-none-linux-gnu with native bootstrap and
make check. Also tested for aarch64-none-elf with cross-compiled
check-gcc on an ARMv8.1 emulator with +lse enabled by default.

Ok for trunk?
Matthew

2015-09-17  Matthew Wahab  

* config/aarch64/atomics.md (UNSPECV_ATOMIC_LDOP): New.
(UNSPECV_ATOMIC_LDOP_OR): New.
(UNSPECV_ATOMIC_LDOP_BIC): New.
(UNSPECV_ATOMIC_LDOP_XOR): New.
(UNSPECV_ATOMIC_LDOP_PLUS): New.
(ATOMIC_LDOP): New.
(atomic_ldop): New.
(aarch64_atomic_load<atomic_ldop><mode>): New.

From 6a8a83c4efbd607924f0630779d4915c9dad079c Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Mon, 10 Aug 2015 17:02:08 +0100
Subject: [PATCH 3/5] Add atomic load-operate instructions.

Change-Id: I3746875bad7469403bee7df952f0ba565e4abc71
---
 gcc/config/aarch64/atomics.md | 41 +
 1 file changed, 41 insertions(+)

diff --git a/gcc/config/aarch64/atomics.md b/gcc/config/aarch64/atomics.md
index 0e71002..b7b6fb5 100644
--- a/gcc/config/aarch64/atomics.md
+++ b/gcc/config/aarch64/atomics.md
@@ -29,8 +29,25 @@
 UNSPECV_ATOMIC_CAS			; Represent an atomic CAS.
 UNSPECV_ATOMIC_SWP			; Represent an atomic SWP.
 UNSPECV_ATOMIC_OP			; Represent an atomic operation.
+UNSPECV_ATOMIC_LDOP			; Represent an atomic load-operation
+UNSPECV_ATOMIC_LDOP_OR		; Represent an atomic load-or
+UNSPECV_ATOMIC_LDOP_BIC		; Represent an atomic load-bic
+UNSPECV_ATOMIC_LDOP_XOR		; Represent an atomic load-xor
+UNSPECV_ATOMIC_LDOP_PLUS		; Represent an atomic load-add
 ])
 
+;; Iterators for load-operate instructions.
+
+(define_int_iterator ATOMIC_LDOP
+ [UNSPECV_ATOMIC_LDOP_OR UNSPECV_ATOMIC_LDOP_BIC
+  UNSPECV_ATOMIC_LDOP_XOR UNSPECV_ATOMIC_LDOP_PLUS])
+
+(define_int_attr atomic_ldop
+ [(UNSPECV_ATOMIC_LDOP_OR "set") (UNSPECV_ATOMIC_LDOP_BIC "clr")
+  (UNSPECV_ATOMIC_LDOP_XOR "eor") (UNSPECV_ATOMIC_LDOP_PLUS "add")])
+
+;; Instruction patterns.
+
 (define_expand "atomic_compare_and_swap"
   [(match_operand:SI 0 "register_operand" "")			;; bool out
(match_operand:ALLI 1 "register_operand" "")			;; val out
@@ -541,3 +558,27 @@
 else
   return "casal\t%0, %2, %1";
 })
+
+;; Atomic load-op: Load data, operate, store result, keep data.
+
+(define_insn "aarch64_atomic_load<atomic_ldop><mode>"
+ [(set (match_operand:ALLI 0 "register_operand" "=r")
+   (match_operand:ALLI 1 "aarch64_sync_memory_operand" "+Q"))
+  (set (match_dup 1)
+   (unspec_volatile:ALLI
+    [(match_dup 1)
+     (match_operand:ALLI 2 "register_operand")
+     (match_operand:SI 3 "const_int_operand")]
+    ATOMIC_LDOP))]
+ "TARGET_LSE && reload_completed"
+ {
+   enum memmodel model = memmodel_from_int (INTVAL (operands[3]));
+   if (is_mm_relaxed (model))
+     return "ld<atomic_ldop><atomic_sfx>\t%<w>2, %<w>0, %1";
+   else if (is_mm_acquire (model) || is_mm_consume (model))
+     return "ld<atomic_ldop>a<atomic_sfx>\t%<w>2, %<w>0, %1";
+   else if (is_mm_release (model))
+     return "ld<atomic_ldop>l<atomic_sfx>\t%<w>2, %<w>0, %1";
+   else
+     return "ld<atomic_ldop>al<atomic_sfx>\t%<w>2, %<w>0, %1";
+ })
-- 
2.1.4



[AArch64][PATCH 4/5] Use atomic load-operate instructions for fetch-update patterns.

2015-09-17 Thread Matthew Wahab

Hello,

ARMv8.1 adds atomic swap and atomic load-operate instructions with
optional memory ordering specifiers. This patch uses the ARMv8.1 atomic
load-operate instructions to implement the atomic_fetch_<op>
patterns. This patch also updates the implementation of the atomic_<op>
patterns, which are treated as versions of the atomic_fetch_<op> that
discard the loaded data.

The general form of the code generated for an atomic_fetch_, with
destination D, source S, memory address A and memory order MO, depends
on whether the operation is directly supported by the instruction. If
 is one of PLUS, IOR or XOR, the code generated is:

ld S, D, [A]

where
   is one {add, set, eor}
   is one of {'', 'a', 'l', 'al'} depending on memory order MO.
   is one of {'', 'b', 'h'} depending on the data size.

If  is SUB, the code generated, with scratch register r, is:

neg r, S
ldadd r, D, [A]

If  is AND, the code generated is:
not r, S
ldclr r, D, [A]

Any operation not in {PLUS, IOR, XOR, SUB, AND} is passed to the
existing aarch64_split_atomic_op function, to implement the operation
using sequences built with the ARMv8 load-exclusive/store-exclusive
instructions
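
As a hedged illustration of that mapping (expected output under the
assumptions -O2 -march=armv8.1-a; not taken from the patch itself):

int
fetch_add (int *p, int v)
{
  /* Directly supported: expected to become a single ldadd.  */
  return __atomic_fetch_add (p, v, __ATOMIC_RELAXED);
}

int
fetch_sub (int *p, int v)
{
  /* Not directly supported: the source is expected to be negated into
     a scratch register, followed by ldadd, as described above.  */
  return __atomic_fetch_sub (p, v, __ATOMIC_RELAXED);
}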

Tested the series for aarch64-none-linux-gnu with native bootstrap and
make check. Also tested for aarch64-none-elf with cross-compiled
check-gcc on an ARMv8.1 emulator with +lse enabled by default.

Ok for trunk?
Matthew

gcc/
2015-09-17  Matthew Wahab  

* config/aarch64/aarch64-protos.h
(aarch64_atomic_ldop_supported_p): Declare.
* config/aarch64/aarch64.c (aarch64_atomic_ldop_supported_p): New.
(enum aarch64_atomic_load_op_code): New.
(aarch64_emit_atomic_load_op): New.
(aarch64_gen_atomic_load_op): Update to support load-operate
patterns.
* config/aarch64/atomics.md (atomic_): Change
to an expander.
(aarch64_atomic_): New.
(aarch64_atomic__lse): New.
(atomic_fetch_): Change to an expander.
(aarch64_atomic_fetch_): New.
    (aarch64_atomic_fetch__lse): New.

gcc/testsuite/
2015-09-17  Matthew Wahab  

* gcc.target/aarch64/atomic-inst-ldadd.c: New.
* gcc.target/aarch64/atomic-inst-ldlogic.c: New.

From c4b8eb6d2ca62c57f4a011e06335b918f603ad2a Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Fri, 7 Aug 2015 17:10:42 +0100
Subject: [PATCH 4/5] Use atomic instructions for fetch-update patterns.

Change-Id: I39759f02e61039067ccaabfd52039e4804eddf2f
---
 gcc/config/aarch64/aarch64-protos.h|   2 +
 gcc/config/aarch64/aarch64.c   | 176 -
 gcc/config/aarch64/atomics.md  | 109 -
 .../gcc.target/aarch64/atomic-inst-ldadd.c |  58 +++
 .../gcc.target/aarch64/atomic-inst-ldlogic.c   | 109 +
 5 files changed, 444 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/atomic-inst-ldadd.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/atomic-inst-ldlogic.c

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index eba4c76..76ebd6f 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -378,6 +378,8 @@ rtx aarch64_load_tp (rtx);
 void aarch64_expand_compare_and_swap (rtx op[]);
 void aarch64_split_compare_and_swap (rtx op[]);
 void aarch64_gen_atomic_cas (rtx, rtx, rtx, rtx, rtx);
+
+bool aarch64_atomic_ldop_supported_p (enum rtx_code);
 void aarch64_gen_atomic_ldop (enum rtx_code, rtx, rtx, rtx, rtx);
 void aarch64_split_atomic_op (enum rtx_code, rtx, rtx, rtx, rtx, rtx, rtx);
 
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index dc05c6e..33f9ef3 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -11064,6 +11064,33 @@ aarch64_expand_compare_and_swap (rtx operands[])
   emit_insn (gen_rtx_SET (bval, x));
 }
 
+/* Test whether the target supports using an atomic load-operate instruction.
+   CODE is the operation and AFTER is TRUE if the data in memory after the
+   operation should be returned and FALSE if the data before the operation
+   should be returned.  Returns FALSE if the operation isn't supported by the
+   architecture.
+  */
+
+bool
+aarch64_atomic_ldop_supported_p (enum rtx_code code)
+{
+  if (!TARGET_LSE)
+    return false;
+
+  switch (code)
+    {
+    case SET:
+    case AND:
+    case IOR:
+    case XOR:
+    case MINUS:
+    case PLUS:
+      return true;
+    default:
+      return false;
+    }
+}
+
 /* Emit a barrier, that is appropriate for memory model MODEL, at the end of a
sequence implementing an atomic operation.  */
 
@@ -11206,26 +11233,169 @@ aarch64_emit_atomic_swap (machine_mode mode, rtx dst, rtx value,
   emit_insn (gen (dst, mem, value, model));
 }
 
-/* Emit an atomic operation where the architecture supports it.  */
+/* Operations sup

[AArch64][PATCH 5/5] Use atomic load-operate instructions for update-fetch patterns.

2015-09-17 Thread Matthew Wahab

Hello,

ARMv8.1 adds atomic swap and atomic load-operate instructions with
optional memory ordering specifiers. This patch uses the ARMv8.1
load-operate instructions to implement the atomic_<op>_fetch patterns.

The approach is to use the atomic load-operate instruction to atomically
load the data and update memory and then to use the loaded data to
calculate the value that the instruction would have stored. The
calculation attempts to mirror the operation of the atomic instruction.
For example, atomic_and_fetch is implemented with an atomic
load-bic so the result is also calculated using a BIC instruction.

The general form of the code generated for an atomic_<op>_fetch, with
destination D, source S, memory address A and memory order MO, depends
on whether or not the operation is directly supported by the
instruction. If <op> is one of PLUS, IOR or XOR, the code generated is:

ld<op><mb><size> S, D, [A]
<inst> D, D, S

where
 <op> is one of {add, set, eor}
 <inst> is one of {add, orr, eor}
 <mb> is one of {'', 'a', 'l', 'al'} depending on memory order MO.
 <size> is one of {'', 'b', 'h'} depending on the data size.

If <op> is SUB, the code generated is:

neg S, S
ldadd S, D, [A]
add D, D, S

If <op> is AND, the code generated is:

not S, S
ldclr S, D, [A]
bic D, D, S

Any operation not in {PLUS, IOR, XOR, SUB, AND} is passed to the
existing aarch64_split_atomic_op function, to implement the operation
using sequences built with the ARMv8 load-exclusive/store-exclusive
instructions.
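
A hedged illustration of the update-fetch expansion (under the same
assumptions as before, -O2 -march=armv8.1-a; not from the patch):

int
and_fetch (int *p, int v)
{
  /* Expected: not, ldclr, then bic to recompute the value the
     instruction stored, as described above.  */
  return __atomic_and_fetch (p, v, __ATOMIC_RELAXED);
}

int
add_fetch (int *p, int v)
{
  /* Expected: ldadd followed by an add to recompute the stored value.  */
  return __atomic_add_fetch (p, v, __ATOMIC_RELAXED);
}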

Tested the series for aarch64-none-linux-gnu with native bootstrap and
make check. Also tested for aarch64-none-elf with cross-compiled
check-gcc on an ARMv8.1 emulator with +lse enabled by default.

Ok for trunk?
Matthew

2015-09-17  Matthew Wahab  

* config/aarch64/aarch64-protos.h (aarch64_gen_atomic_ldop):
Adjust declaration.
* config/aarch64/aarch64.c (aarch64_emit_bic): New.
(aarch64_gen_atomic_load_op): Adjust comment.  Add parameter
out_result.  Update to support update-fetch operations.
* config/aarch64/atomics.md (aarch64_atomic_exchange_lse):
Adjust for change to aarch64_gen_atomic_ldop.
(aarch64_atomic__lse): Likewise.
(aarch64_atomic_fetch__lse): Likewise.
(atomic__fetch): Change to an expander.
(aarch64_atomic__fetch): New.
    (aarch64_atomic__fetch_lse): New.

gcc/testsuite
2015-09-17  Matthew Wahab  

* gcc.target/aarch64/atomic-inst-ldadd.c: Add tests for
update-fetch operations.
* gcc.target/aarch64/atomic-inst-ldlogic.c: Likewise.

From 577bdb656e451df5ce1c8c651a642c3ac4d7c73b Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Mon, 17 Aug 2015 11:27:18 +0100
Subject: [PATCH 5/5] Use atomic instructions for update-fetch patterns.

Change-Id: I5eef48586fe904f0d2df8c581fb3c12a4a2d9c78
---
 gcc/config/aarch64/aarch64-protos.h|   2 +-
 gcc/config/aarch64/aarch64.c   |  72 +++--
 gcc/config/aarch64/atomics.md  |  61 ++-
 .../gcc.target/aarch64/atomic-inst-ldadd.c |  53 ++---
 .../gcc.target/aarch64/atomic-inst-ldlogic.c   | 118 ++---
 5 files changed, 247 insertions(+), 59 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 76ebd6f..dd8ebcc 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -380,7 +380,7 @@ void aarch64_split_compare_and_swap (rtx op[]);
 void aarch64_gen_atomic_cas (rtx, rtx, rtx, rtx, rtx);
 
 bool aarch64_atomic_ldop_supported_p (enum rtx_code);
-void aarch64_gen_atomic_ldop (enum rtx_code, rtx, rtx, rtx, rtx);
+void aarch64_gen_atomic_ldop (enum rtx_code, rtx, rtx, rtx, rtx, rtx);
 void aarch64_split_atomic_op (enum rtx_code, rtx, rtx, rtx, rtx, rtx, rtx);
 
 bool aarch64_gen_adjusted_ldpstp (rtx *, bool, enum machine_mode, RTX_CODE);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 33f9ef3..d95b81f 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -11212,6 +11212,25 @@ aarch64_split_compare_and_swap (rtx operands[])
 aarch64_emit_post_barrier (model);
 }
 
+/* Emit a BIC instruction.  */
+
+static void
+aarch64_emit_bic (machine_mode mode, rtx dst, rtx s1, rtx s2, int shift)
+{
+  rtx shift_rtx = GEN_INT (shift);
+  rtx (*gen) (rtx, rtx, rtx, rtx);
+
+  switch (mode)
+    {
+    case SImode: gen = gen_bic_lshrsi3; break;
+    case DImode: gen = gen_bic_lshrdi3; break;
+    default:
+      gcc_unreachable ();
+    }
+
+  emit_insn (gen (dst, s2, shift_rtx, s1));
+}
+
 /* Emit an atomic swap.  */
 
 static void
@@ -11306,13 +11325,14 @@ aarch64_emit_atomic_load_op (enum aarch64_atomic_load_op_code code,
 }
 
 /* Emit an atomic load+operate.  CODE is the operation.  OUT_DATA is the
-   location to store the data read from memory.  MEM is the memory location to
-  

[ARM] Add ARMv8.1 command line options.

2015-09-17 Thread Matthew Wahab

Hello,

ARMv8.1 is a set of architectural extensions to ARMv8. Support for the
ARMv8.1 architecture has been enabled in binutils, using the name
"armv8.1-a".

This patch adds support to gcc for specifying an ARMv8.1 architecture
using options "-march=armv8.1-a" and "-march=armv8.1-a+crc". It also
adds the FPU options "-mfpu=neon-fp-armv8.1" and
"-mfpu=crypto-neon-fp-armv8.1", to specify the ARMv8.1 Adv.SIMD
instruction set.  The changes set the appropriate architecture and fpu
options for binutils but don't otherwise change the code generated by
gcc.

Tested for arm-none-linux-gnueabihf with native bootstrap and make
check.

Ok for trunk?
Matthew

2015-09-17  Matthew Wahab  

* config/arm/arm-arches.def: Add "armv8.1-a" and "armv8.1-a+crc".
* config/arm/arm-fpus.def: Add "neon-fp-armv8.1" and
"crypto-neon-fp-armv8.1".
* config/arm/arm-protos.h (FL2_ARCH8_1): New.
(FL2_FOR_ARCH8_1A): New.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/arm.h (FPU_FL_RDMA): New.
* doc/invoke.texi (ARM -march): Add "armv8.1-a" and
"armv8.1-a+crc".
(ARM -mfpu): Add "neon-fp-armv8.1" and "crypto-neon-fp-armv8.1".
diff --git a/gcc/config/arm/arm-arches.def b/gcc/config/arm/arm-arches.def
index ddf6c3c..4cf71fd 100644
--- a/gcc/config/arm/arm-arches.def
+++ b/gcc/config/arm/arm-arches.def
@@ -57,6 +57,8 @@ ARM_ARCH("armv7-m", cortexm3,	7M,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC |	  FL_FOR_
 ARM_ARCH("armv7e-m", cortexm4,  7EM,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC |	  FL_FOR_ARCH7EM))
 ARM_ARCH("armv8-a", cortexa53,  8A,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_FOR_ARCH8A))
 ARM_ARCH("armv8-a+crc",cortexa53, 8A,   ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_CRC32  | FL_FOR_ARCH8A))
+ARM_ARCH("armv8.1-a", cortexa53,  8A,	ARM_FSET_MAKE (FL_CO_PROC | FL_FOR_ARCH8A,  FL2_FOR_ARCH8_1A))
+ARM_ARCH("armv8.1-a+crc",cortexa53, 8A,	ARM_FSET_MAKE (FL_CO_PROC | FL_CRC32 | FL_FOR_ARCH8A, FL2_FOR_ARCH8_1A))
 ARM_ARCH("iwmmxt",  iwmmxt, 5TE,	ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_STRONG | FL_FOR_ARCH5TE | FL_XSCALE | FL_IWMMXT))
 ARM_ARCH("iwmmxt2", iwmmxt2,5TE,	ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_STRONG | FL_FOR_ARCH5TE | FL_XSCALE | FL_IWMMXT | FL_IWMMXT2))
 
diff --git a/gcc/config/arm/arm-fpus.def b/gcc/config/arm/arm-fpus.def
index efd5896..065fb3d9 100644
--- a/gcc/config/arm/arm-fpus.def
+++ b/gcc/config/arm/arm-fpus.def
@@ -44,5 +44,9 @@ ARM_FPU("fp-armv8",	ARM_FP_MODEL_VFP, 8, VFP_REG_D32, FPU_FL_FP16)
 ARM_FPU("neon-fp-armv8",ARM_FP_MODEL_VFP, 8, VFP_REG_D32, FPU_FL_NEON | FPU_FL_FP16)
 ARM_FPU("crypto-neon-fp-armv8",
 			ARM_FP_MODEL_VFP, 8, VFP_REG_D32, FPU_FL_NEON | FPU_FL_FP16 | FPU_FL_CRYPTO)
+ARM_FPU("neon-fp-armv8.1",
+			ARM_FP_MODEL_VFP, 8, VFP_REG_D32, FPU_FL_NEON | FPU_FL_FP16 | FPU_FL_RDMA)
+ARM_FPU("crypto-neon-fp-armv8.1",
+			ARM_FP_MODEL_VFP, 8, VFP_REG_D32, FPU_FL_NEON | FPU_FL_FP16 | FPU_FL_RDMA | FPU_FL_CRYPTO)
 /* Compatibility aliases.  */
 ARM_FPU("vfp3",		ARM_FP_MODEL_VFP, 3, VFP_REG_D32, FPU_FL_NONE)
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 8df312f..e60ad4c 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -387,6 +387,8 @@ extern bool arm_is_constant_pool_ref (rtx);
 #define FL_IWMMXT2(1 << 30)   /* "Intel Wireless MMX2 technology".  */
 #define FL_ARCH6KZ(1 << 31)   /* ARMv6KZ architecture.  */
 
+#define FL2_ARCH8_1   (1 << 0)	  /* Architecture 8.1.  */
+
 /* Flags that only effect tuning, not available instructions.  */
 #define FL_TUNE		(FL_WBUF | FL_VFPV2 | FL_STRONG | FL_LDSCHED \
 			 | FL_CO_PROC)
@@ -415,6 +417,7 @@ extern bool arm_is_constant_pool_ref (rtx);
 #define FL_FOR_ARCH7M	(FL_FOR_ARCH7 | FL_THUMB_DIV)
 #define FL_FOR_ARCH7EM  (FL_FOR_ARCH7M | FL_ARCH7EM)
 #define FL_FOR_ARCH8A	(FL_FOR_ARCH7VE | FL_ARCH8)
+#define FL2_FOR_ARCH8_1A	FL2_ARCH8_1
 
 /* There are too many feature bits to fit in a single word so the set of cpu and
fpu capabilities is a structure.  A feature set is created and manipulated
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index f7a9d63..274bc46 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -336,6 +336,7 @@ typedef unsigned long arm_fpu_feature_set;
 #define FPU_FL_NEON	(1 << 0)	/* NEON instructions.  */
 #define FPU_FL_FP16	(1 << 1)	/* Half-precision.  */
 #define FPU_FL_CRYPTO	(1 << 2)	/* Crypto extensions.  */
+#define FPU_FL_RDMA	(1 << 3)	/* ARMv8.1 extensions.  */
 
 /* Which floating point model to use.  */
 enum arm_fp_model
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 99c9685..9f49189 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/

Re: [AArch64][PATCH 1/5] Use atomic instructions for swap and fetch-update operations.

2015-09-21 Thread Matthew Wahab

On 18/09/15 08:58, James Greenhalgh wrote:

On Thu, Sep 17, 2015 at 05:37:55PM +0100, Matthew Wahab wrote:



diff --git a/gcc/config/aarch64/atomics.md b/gcc/config/aarch64/atomics.md
index 65d2cc9..0e71002 100644
--- a/gcc/config/aarch64/atomics.md
+++ b/gcc/config/aarch64/atomics.md
@@ -27,6 +27,7 @@
  UNSPECV_ATOMIC_CMPSW  ; Represent an atomic compare swap.
  UNSPECV_ATOMIC_EXCHG  ; Represent an atomic exchange.
  UNSPECV_ATOMIC_CAS; Represent an atomic CAS.
+UNSPECV_ATOMIC_SWP ; Represent an atomic SWP.
  UNSPECV_ATOMIC_OP ; Represent an atomic operation.
  ])

@@ -122,19 +123,19 @@
  )

  (define_insn_and_split "aarch64_compare_and_swap_lse"
-  [(set (reg:CC CC_REGNUM) ;; bool out
+  [(set (reg:CC CC_REGNUM)
  (unspec_volatile:CC [(const_int 0)] UNSPECV_ATOMIC_CMPSW))
-   (set (match_operand:GPI 0 "register_operand" "=&r") ;; val out
-(match_operand:GPI 1 "aarch64_sync_memory_operand" "+Q"))   ;; memory
+   (set (match_operand:GPI 0 "register_operand" "=&r")
+(match_operand:GPI 1 "aarch64_sync_memory_operand" "+Q"))
 (set (match_dup 1)
  (unspec_volatile:GPI
-  [(match_operand:GPI 2 "aarch64_plus_operand" "rI")   ;; expect
-   (match_operand:GPI 3 "register_operand" "r");; desired
-   (match_operand:SI 4 "const_int_operand")  ;; is_weak
-   (match_operand:SI 5 "const_int_operand")  ;; mod_s
-   (match_operand:SI 6 "const_int_operand")] ;; mod_f
+  [(match_operand:GPI 2 "aarch64_plus_operand" "rI")
+   (match_operand:GPI 3 "register_operand" "r")
+   (match_operand:SI 4 "const_int_operand")
+   (match_operand:SI 5 "const_int_operand")
+   (match_operand:SI 6 "const_int_operand")]


I'm not sure I understand the change here, those comments still look helpful
enough for understanding the pattern, what have I missed?


That was part of an attempt to clean up some code. It's unnecessary and I've dropped 
the change.


Attached is the updated patch with some other changes:
- Simplified the atomic_exchange expander in line with reviews for
  other patches in the series.
- Removed the CC clobber from aarch64_atomic_exchange_lse, it was
  over-cautious.
- Added a missing entry to the change log (noting a whitespace fix).

Ok for trunk?
Matthew

gcc/
2015-09-21  Matthew Wahab  

* config/aarch64/aarch64-protos.h (aarch64_gen_atomic_ldop):
Declare.
* config/aarch64/aarch64.c (aarch64_emit_atomic_swap): New.
(aarch64_gen_atomic_ldop): New.
(aarch64_split_atomic_op): Fix whitespace and add a comment.
* config/aarch64/atomics.md (UNSPECV_ATOMIC_SWP): New.
(aarch64_compare_and_swap_lse): Fix some whitespace.
    (atomic_exchange): Replace with an expander.
(aarch64_atomic_exchange): New.
(aarch64_atomic_exchange_lse): New.
(aarch64_atomic_): Fix some whitespace.
(aarch64_atomic_swp): New.


gcc/testsuite/
2015-09-21  Matthew Wahab  

* gcc.target/aarch64/atomic-inst-ops.inc: (TEST_MODEL): New.
(TEST_ONE): New.
* gcc.target/aarch64/atomic-inst-swap.c: New.


From 31226dce8d36be98ca95d9165d4147a3bf84d180 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Fri, 7 Aug 2015 17:18:37 +0100
Subject: [PATCH 1/5] Add atomic SWP instruction

Change-Id: I87bf48526cb11e65edd15691f5eab20446e418c9
---
 gcc/config/aarch64/aarch64-protos.h|  1 +
 gcc/config/aarch64/aarch64.c   | 46 +-
 gcc/config/aarch64/atomics.md  | 71 --
 .../gcc.target/aarch64/atomic-inst-ops.inc | 13 
 gcc/testsuite/gcc.target/aarch64/atomic-inst-swp.c | 44 ++
 5 files changed, 170 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/atomic-inst-swp.c

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index ff19851..eba4c76 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -378,6 +378,7 @@ rtx aarch64_load_tp (rtx);
 void aarch64_expand_compare_and_swap (rtx op[]);
 void aarch64_split_compare_and_swap (rtx op[]);
 void aarch64_gen_atomic_cas (rtx, rtx, rtx, rtx, rtx);
+void aarch64_gen_atomic_ldop (enum rtx_code, rtx, rtx, rtx, rtx);
 void aarch64_split_atomic_op (enum rtx_code, rtx, rtx, rtx, rtx, rtx, rtx);
 
 bool aarch64_gen_adjusted_ldpstp (rtx *, bool, enum machine_mode, RTX_CODE);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 4d2126b..dc05c6e 100644
--

Re: [AArch64][PATCH 2/5] Make BIC, other logical instructions, available. (was: Add BIC instruction.)

2015-09-21 Thread Matthew Wahab

On 18/09/15 09:05, James Greenhalgh wrote:

On Thu, Sep 17, 2015 at 05:40:48PM +0100, Matthew Wahab wrote:

Hello,

ARMv8.1 adds atomic swap and atomic load-operate instructions with
optional memory ordering specifiers. This patch adds an expander to
generate a BIC instruction that can be explicitly called when
implementing the atomic_<op>_fetch pattern to calculate the value to
be returned by the operation.



Why not make the "*_one_cmpl_3" pattern
named (remove the leading *) and call that in your atomic__fetch
patterns as:

   and_one_cmpl_3

I'd rather that than to add a pattern that simply expands to the same
thing.


I overlooked that pattern when I was trying to find the bic emitter. I've attached an 
updated patch.


Tested as part of the series for aarch64-none-linux-gnu with native bootstrap 
and
make check. Also tested for aarch64-none-elf with cross-compiled
check-gcc on an ARMv8.1 emulator with +lse enabled by default.

Ok for trunk?
Matthew

2015-09-21  Matthew Wahab  

* config/aarch64/aarch64.md
(_one_cmpl_3): Make a named
pattern.

From 0e2ae8739d70e4d1c14fa848f67847b1ecf94f71 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Mon, 17 Aug 2015 17:48:27 +0100
Subject: [PATCH 2/5] Make BIC, other logical instructions, available for use.

Change-Id: Ibef049bfa1bfe5e168feada3dc358f28383e6410
---
 gcc/config/aarch64/aarch64.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 88ba72e..72384ce 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3392,7 +3392,7 @@
   [(set_attr "type" "logics_reg")]
 )
 
-(define_insn "*_one_cmpl_3"
+(define_insn "_one_cmpl_3"
   [(set (match_operand:GPI 0 "register_operand" "=r")
 	(LOGICAL:GPI (not:GPI
 		  (SHIFT:GPI
-- 
2.1.4



Re: [AArch64][PATCH 3/5] Add atomic load-operate instructions.

2015-09-21 Thread Matthew Wahab

On 18/09/15 09:39, James Greenhalgh wrote:

On Thu, Sep 17, 2015 at 05:42:35PM +0100, Matthew Wahab wrote:

---
  gcc/config/aarch64/atomics.md | 41 +
  1 file changed, 41 insertions(+)

diff --git a/gcc/config/aarch64/atomics.md b/gcc/config/aarch64/atomics.md
index 0e71002..b7b6fb5 100644
--- a/gcc/config/aarch64/atomics.md
+++ b/gcc/config/aarch64/atomics.md
@@ -29,8 +29,25 @@
  UNSPECV_ATOMIC_CAS; Represent an atomic CAS.
  UNSPECV_ATOMIC_SWP; Represent an atomic SWP.
  UNSPECV_ATOMIC_OP ; Represent an atomic operation.
+UNSPECV_ATOMIC_LDOP; Represent an atomic 
load-operation
+UNSPECV_ATOMIC_LDOP_OR ; Represent an atomic load-or
+UNSPECV_ATOMIC_LDOP_BIC; Represent an atomic load-bic
+UNSPECV_ATOMIC_LDOP_XOR; Represent an atomic load-xor
+UNSPECV_ATOMIC_LDOP_PLUS   ; Represent an atomic load-add
  ])

+;; Iterators for load-operate instructions.
+
+(define_int_iterator ATOMIC_LDOP
+ [UNSPECV_ATOMIC_LDOP_OR UNSPECV_ATOMIC_LDOP_BIC
+  UNSPECV_ATOMIC_LDOP_XOR UNSPECV_ATOMIC_LDOP_PLUS])
+
+(define_int_attr atomic_ldop
+ [(UNSPECV_ATOMIC_LDOP_OR "set") (UNSPECV_ATOMIC_LDOP_BIC "clr")
+  (UNSPECV_ATOMIC_LDOP_XOR "eor") (UNSPECV_ATOMIC_LDOP_PLUS "add")])


There is precedent (atomic_optab, atomic_op_operand, const_atomic, etc.) for
these living in config/aarch64/iterators.md so they should be moved there.
Presumably the difficulty with that is to do with the position of the
"unspecv" define_c_enum? I'd argue that is in the wrong place too...

If you want to leave this to a cleanup patch in stage 3 that is fine.

This patch is OK for trunk.



I'd prefer to keep the clean-up separate from this series. I'll commit the patch as 
it is and then deal with the iterator move later.


Thanks,
Matthew


Re: [AArch64][PATCH 4/5] Use atomic load-operate instructions for fetch-update patterns.

2015-09-21 Thread Matthew Wahab

On 18/09/15 09:55, James Greenhalgh wrote:

On Thu, Sep 17, 2015 at 05:47:43PM +0100, Matthew Wahab wrote:

Hello,

ARMv8.1 adds atomic swap and atomic load-operate instructions with
optional memory ordering specifiers. This patch uses the ARMv8.1 atomic
load-operate instructions to implement the atomic_fetch_<op>
patterns. This patch also updates the implementation of the atomic_<op>
patterns, which are treated as versions of the atomic_fetch_<op> that
discard the loaded data.


[..]


Ok for trunk?


Some comments in line below.



diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index eba4c76..76ebd6f 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -378,6 +378,8 @@ rtx aarch64_load_tp (rtx);
  void aarch64_expand_compare_and_swap (rtx op[]);
  void aarch64_split_compare_and_swap (rtx op[]);
  void aarch64_gen_atomic_cas (rtx, rtx, rtx, rtx, rtx);
+
+bool aarch64_atomic_ldop_supported_p (enum rtx_code);
  void aarch64_gen_atomic_ldop (enum rtx_code, rtx, rtx, rtx, rtx);
  void aarch64_split_atomic_op (enum rtx_code, rtx, rtx, rtx, rtx, rtx, rtx);

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index dc05c6e..33f9ef3 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -11064,6 +11064,33 @@ aarch64_expand_compare_and_swap (rtx operands[])
emit_insn (gen_rtx_SET (bval, x));
  }

+/* Test whether the target supports using an atomic load-operate instruction.
+   CODE is the operation and AFTER is TRUE if the data in memory after the
+   operation should be returned and FALSE if the data before the operation
+   should be returned.  Returns FALSE if the operation isn't supported by the
+   architecture.
+  */


Stray newline, leave the */ on the line before.


Fixed this.


+
+bool
+aarch64_atomic_ldop_supported_p (enum rtx_code code)
+{
+  if (!TARGET_LSE)
+    return false;
+
+  switch (code)
+    {
+    case SET:
+    case AND:
+    case IOR:
+    case XOR:
+    case MINUS:
+    case PLUS:
+      return true;
+    default:
+      return false;
+    }
+}
+
  /* Emit a barrier, that is appropriate for memory model MODEL, at the end of a
 sequence implementing an atomic operation.  */

@@ -11206,26 +11233,169 @@ aarch64_emit_atomic_swap (machine_mode mode, rtx 
dst, rtx value,
emit_insn (gen (dst, mem, value, model));
  }

-/* Emit an atomic operation where the architecture supports it.  */
+/* Operations supported by aarch64_emit_atomic_load_op.  */
+
+enum aarch64_atomic_load_op_code
+{
+  AARCH64_LDOP_PLUS,   /* A + B  */
+  AARCH64_LDOP_XOR,/* A ^ B  */
+  AARCH64_LDOP_OR, /* A | B  */
+  AARCH64_LDOP_BIC /* A & ~B  */
+};


I have a small preference to calling these the same name as the
instructions they will generate, so AARCH64_LDOP_ADD, AARCH64_LDOP_EOR,
AARCH64_LDOP_SET and AARCH64_LDOP_CLR, but I'm happy fo you to leave it
this way if you prefer.



I prefer to keep them related to the operation being implemented rather than how it 
is implemented so I've left them as they are.




-(define_insn_and_split "atomic_"
+(define_expand "atomic_"
+ [(set (match_operand:ALLI 0 "aarch64_sync_memory_operand" "")
+   (unspec_volatile:ALLI
+[(atomic_op:ALLI (match_dup 0)
+  (match_operand:ALLI 1 "" ""))
+ (match_operand:SI 2 "const_int_operand")]
+UNSPECV_ATOMIC_OP))
+  (clobber (reg:CC CC_REGNUM))]


This is not needed for the LSE forms of these instructions and may result
in less optimal code generation. On the other hand, that will only be in
a corner case and this is only a define_expand. Because of that, it would
be clearer to a reader if you dropped the detailed description of this
in RTL (which is never used) and rewrote it using just the uses of the
operands, as so:


+(define_expand "atomic_"
+ [(match_operand:ALLI 0 "aarch64_sync_memory_operand" "")
+  (match_operand:ALLI 1 "" "")
+  (match_operand:SI 2 "const_int_operand")]




Switched the new expanders in this and the other patches to the simpler form.




+(define_insn_and_split "aarch64_atomic_"
+ [(set (match_operand:ALLI 0 "aarch64_sync_memory_operand" "+Q")
+   (unspec_volatile:ALLI
+[(atomic_op:ALLI (match_dup 0)
+  (match_operand:ALLI 1 "" "r"))
+ (match_operand:SI 2 "const_int_operand")]
+UNSPECV_ATOMIC_OP))
+  (clobber (reg:CC CC_REGNUM))
+  (clobber (match_scratch:ALLI 3 "=&r"))
+  (clobber (match_scratch:SI 4 "=&r"))]
+  ""


TARGET_LSE ?


It's not needed here because this pattern is always available.


+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+  {
+aarch64_split_atomic_op (, NULL, operands[3], operands[0],
+operands[1], operands[2], op

Re: [AArch64][PATCH 5/5] Use atomic load-operate instructions for update-fetch patterns.

2015-09-21 Thread Matthew Wahab

On 17/09/15 17:54, Matthew Wahab wrote:

ARMv8.1 adds atomic swap and atomic load-operate instructions with
optional memory ordering specifiers. This patch uses the ARMv8.1
load-operate instructions to implement the atomic_<op>_fetch patterns.

The approach is to use the atomic load-operate instruction to atomically
load the data and update memory and then to use the loaded data to
calculate the value that the instruction would have stored. The
calculation attempts to mirror the operation of the atomic instruction.
For example, atomic_and_fetch is implemented with an atomic
load-bic so the result is also calculated using a BIC instruction.


[...]


2015-09-17  Matthew Wahab  

 * config/aarch64/aarch64-protos.h (aarch64_gen_atomic_ldop):
 Adjust declaration.
 * config/aarch64/aarch64.c (aarch64_emit_bic): New.
 (aarch64_gen_atomic_load_op): Adjust comment.  Add parameter
 out_result.  Update to support update-fetch operations.
 * config/aarch64/atomics.md (aarch64_atomic_exchange_lse):
 Adjust for change to aarch64_gen_atomic_ldop.
 (aarch64_atomic__lse): Likewise.
 (aarch64_atomic_fetch__lse): Likewise.
 (atomic__fetch): Change to an expander.
 (aarch64_atomic__fetch): New.
 (aarch64_atomic__fetch_lse): New.

gcc/testsuite
2015-09-17  Matthew Wahab  

 * gcc.target/aarch64/atomic-inst-ldadd.c: Add tests for
 update-fetch operations.
 * gcc.target/aarch64/atomic-inst-ldlogic.c: Likewise.



Attached an updated patch that takes into account the review comments and changes for 
the rest of the series.


The changes in this patch:
- Updated emit_bic for changes in the earlier patch.
- Simplified the patterns used in the new expanders.
- Dropped CC clobber from the _lse patterns.

Tested the series for aarch64-none-linux-gnu with native bootstrap and
make check. Also tested for aarch64-none-elf with cross-compiled
check-gcc on an ARMv8.1 emulator with +lse enabled by default.

Ok for trunk?
Matthew

2015-09-21  Matthew Wahab  

* config/aarch64/aarch64-protos.h (aarch64_gen_atomic_ldop):
Adjust declaration.
* config/aarch64/aarch64.c (aarch64_emit_bic): New.
(aarch64_gen_atomic_ldop): Adjust comment.  Add parameter
out_result.  Update to support update-fetch operations.
* config/aarch64/atomics.md (aarch64_atomic_exchange_lse):
Adjust for change to aarch64_gen_atomic_ldop.
(aarch64_atomic__lse): Likewise.
(aarch64_atomic_fetch__lse): Likewise.
(atomic__fetch): Change to an expander.
(aarch64_atomic__fetch): New.
(aarch64_atomic__fetch_lse): New.

gcc/testsuite
2015-09-21  Matthew Wahab  

* gcc.target/aarch64/atomic-inst-ldadd.c: Add tests for
update-fetch operations.
* gcc.target/aarch64/atomic-inst-ldlogic.c: Likewise.


From abd313723964e90b6e7d7785b646c657f6b072f9 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Mon, 17 Aug 2015 11:27:18 +0100
Subject: [PATCH 5/5] Use atomic instructions for update-fetch patterns.

Change-Id: I5eef48586fe904f0d2df8c581fb3c12a4a2d9c78
---
 gcc/config/aarch64/aarch64-protos.h|   2 +-
 gcc/config/aarch64/aarch64.c   |  72 +++--
 gcc/config/aarch64/atomics.md  |  55 +-
 .../gcc.target/aarch64/atomic-inst-ldadd.c |  53 ++---
 .../gcc.target/aarch64/atomic-inst-ldlogic.c   | 118 ++---
 5 files changed, 241 insertions(+), 59 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 76ebd6f..dd8ebcc 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -380,7 +380,7 @@ void aarch64_split_compare_and_swap (rtx op[]);
 void aarch64_gen_atomic_cas (rtx, rtx, rtx, rtx, rtx);
 
 bool aarch64_atomic_ldop_supported_p (enum rtx_code);
-void aarch64_gen_atomic_ldop (enum rtx_code, rtx, rtx, rtx, rtx);
+void aarch64_gen_atomic_ldop (enum rtx_code, rtx, rtx, rtx, rtx, rtx);
 void aarch64_split_atomic_op (enum rtx_code, rtx, rtx, rtx, rtx, rtx, rtx);
 
 bool aarch64_gen_adjusted_ldpstp (rtx *, bool, enum machine_mode, RTX_CODE);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 3a1b434..b6cdf7c 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -11211,6 +11211,25 @@ aarch64_split_compare_and_swap (rtx operands[])
 aarch64_emit_post_barrier (model);
 }
 
+/* Emit a BIC instruction.  */
+
+static void
+aarch64_emit_bic (machine_mode mode, rtx dst, rtx s1, rtx s2, int shift)
+{
+  rtx shift_rtx = GEN_INT (shift);
+  rtx (*gen) (rtx, rtx, rtx, rtx);
+
+  switch (mode)
+    {
+    case SImode: gen = gen_and_one_cmpl_lshrsi3; break;
+    case DImode: gen = gen_and_one_cmpl_lshrdi3; break;
+    default:
+      gcc_unreachable ();
+    }
+
+  emit_insn (gen (dst, s2, shift_rtx, s1));
+}
+
 /* Emit an atomic swap.  */
 
 static void
@@ -11305,13 +11324

Re: [ARM] Add ARMv8.1 command line options.

2015-10-08 Thread Matthew Wahab

Ping.

Updated patch attached, I've broken the over-long lines added to arm-arches.def and 
arm-fpus.def.


Matthew

On 17/09/15 18:54, Matthew Wahab wrote:

Hello,

ARMv8.1 is a set of architectural extensions to ARMv8. Support for the
ARMv8.1 architecture has been enabled in binutils, using the name
"armv8.1-a".

This patch adds support to gcc for specifying an ARMv8.1 architecture
using options "-march=armv8.1-a" and "-march=armv8.1-a+crc". It also
adds the FPU options "-mfpu=neon-fp-armv8.1" and
"-mfpu=crypto-neon-fp-armv8.1", to specify the ARMv8.1 Adv.SIMD
instruction set.  The changes set the appropriate architecture and fpu
options for binutils but don't otherwise change the code generated by
gcc.

Tested for arm-none-linux-gnueabihf with native bootstrap and make
check.

Ok for trunk?
Matthew

2015-09-17  Matthew Wahab  

 * config/arm/arm-arches.def: Add "armv8.1-a" and "armv8.1-a+crc".
 * config/arm/arm-fpus.def: Add "neon-fp-armv8.1" and
 "crypto-neon-fp-armv8.1".
 * config/arm/arm-protos.h (FL2_ARCH8_1): New.
 (FL2_FOR_ARCH8_1A): New.
 * config/arm/arm-tables.opt: Regenerate.
 * config/arm/arm.h (FPU_FL_RDMA): New.
 * doc/invoke.texi (ARM -march): Add "armv8.1-a" and
 "armv8.1-a+crc".
 (ARM -mfpu): Add "neon-fp-armv8.1" and "crypto-neon-fp-armv8.1".


diff --git a/gcc/config/arm/arm-arches.def b/gcc/config/arm/arm-arches.def
index ddf6c3c..2635c7b 100644
--- a/gcc/config/arm/arm-arches.def
+++ b/gcc/config/arm/arm-arches.def
@@ -57,6 +57,11 @@ ARM_ARCH("armv7-m", cortexm3,	7M,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC |	  FL_FOR_
 ARM_ARCH("armv7e-m", cortexm4,  7EM,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC |	  FL_FOR_ARCH7EM))
 ARM_ARCH("armv8-a", cortexa53,  8A,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_FOR_ARCH8A))
 ARM_ARCH("armv8-a+crc",cortexa53, 8A,   ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_CRC32  | FL_FOR_ARCH8A))
+ARM_ARCH("armv8.1-a", cortexa53,  8A,
+	 ARM_FSET_MAKE (FL_CO_PROC | FL_FOR_ARCH8A,  FL2_FOR_ARCH8_1A))
+ARM_ARCH("armv8.1-a+crc",cortexa53, 8A,
+	 ARM_FSET_MAKE (FL_CO_PROC | FL_CRC32 | FL_FOR_ARCH8A,
+			FL2_FOR_ARCH8_1A))
 ARM_ARCH("iwmmxt",  iwmmxt, 5TE,	ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_STRONG | FL_FOR_ARCH5TE | FL_XSCALE | FL_IWMMXT))
 ARM_ARCH("iwmmxt2", iwmmxt2,5TE,	ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_STRONG | FL_FOR_ARCH5TE | FL_XSCALE | FL_IWMMXT | FL_IWMMXT2))
 
diff --git a/gcc/config/arm/arm-fpus.def b/gcc/config/arm/arm-fpus.def
index efd5896..2c7b82e 100644
--- a/gcc/config/arm/arm-fpus.def
+++ b/gcc/config/arm/arm-fpus.def
@@ -44,5 +44,9 @@ ARM_FPU("fp-armv8",	ARM_FP_MODEL_VFP, 8, VFP_REG_D32, FPU_FL_FP16)
 ARM_FPU("neon-fp-armv8",ARM_FP_MODEL_VFP, 8, VFP_REG_D32, FPU_FL_NEON | FPU_FL_FP16)
 ARM_FPU("crypto-neon-fp-armv8",
 			ARM_FP_MODEL_VFP, 8, VFP_REG_D32, FPU_FL_NEON | FPU_FL_FP16 | FPU_FL_CRYPTO)
+ARM_FPU("neon-fp-armv8.1", ARM_FP_MODEL_VFP, 8, VFP_REG_D32,
+	FPU_FL_NEON | FPU_FL_FP16 | FPU_FL_RDMA)
+ARM_FPU("crypto-neon-fp-armv8.1", ARM_FP_MODEL_VFP, 8, VFP_REG_D32,
+	FPU_FL_NEON | FPU_FL_FP16 | FPU_FL_RDMA | FPU_FL_CRYPTO)
 /* Compatibility aliases.  */
 ARM_FPU("vfp3",		ARM_FP_MODEL_VFP, 3, VFP_REG_D32, FPU_FL_NONE)
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index f9b1276..9631ac9 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -387,6 +387,8 @@ extern bool arm_is_constant_pool_ref (rtx);
 #define FL_IWMMXT2(1 << 30)   /* "Intel Wireless MMX2 technology".  */
 #define FL_ARCH6KZ(1 << 31)   /* ARMv6KZ architecture.  */
 
+#define FL2_ARCH8_1   (1 << 0)	  /* Architecture 8.1.  */
+
 /* Flags that only effect tuning, not available instructions.  */
 #define FL_TUNE		(FL_WBUF | FL_VFPV2 | FL_STRONG | FL_LDSCHED \
 			 | FL_CO_PROC)
@@ -415,6 +417,7 @@ extern bool arm_is_constant_pool_ref (rtx);
 #define FL_FOR_ARCH7M	(FL_FOR_ARCH7 | FL_THUMB_DIV)
 #define FL_FOR_ARCH7EM  (FL_FOR_ARCH7M | FL_ARCH7EM)
 #define FL_FOR_ARCH8A	(FL_FOR_ARCH7VE | FL_ARCH8)
+#define FL2_FOR_ARCH8_1A	FL2_ARCH8_1
 
 /* There are too many feature bits to fit in a single word so the set of cpu and
fpu capabilities is a structure.  A feature set is created and manipulated
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 87c9f90..4037933 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -320,6 +320,7 @@ typedef unsigned long arm_fpu_feature_set;
 #define FPU_FL_NEON	(1 << 0)	/* NEON instructions.  */
 #define FPU_FL_FP16	(1 << 1)	/* Half-precision.  */
 #define FPU_FL_CRYPTO	(1 << 2)	/* Crypto extensions.  */
+#define FPU_FL_RDMA	(1 << 3)	/* ARMv8.1 extensions.  */
 
 /* Which floating point model to use.  */

Re: [ARM] Add ARMv8.1 command line options.

2015-10-13 Thread Matthew Wahab

Some of the command line options may be unnecessary so I'll drop this patch.
Matthew

On 08/10/15 12:00, Matthew Wahab wrote:

Ping.

Updated patch attached, I've broken the over-long lines added to arm-arches.def 
and
arm-fpus.def.

Matthew

On 17/09/15 18:54, Matthew Wahab wrote:

Hello,

ARMv8.1 is a set of architectural extensions to ARMv8. Support for the
ARMv8.1 architecture has been enabled in binutils, using the name
"armv8.1-a".

This patch adds support to gcc for specifying an ARMv8.1 architecture
using options "-march=armv8.1-a" and "-march=armv8.1-a+crc". It also
adds the FPU options "-mfpu=neon-fp-armv8.1" and
"-mfpu=crypto-neon-fp-armv8.1", to specify the ARMv8.1 Adv.SIMD
instruction set.  The changes set the appropriate architecture and fpu
options for binutils but don't otherwise change the code generated by
gcc.

Tested for arm-none-linux-gnueabihf with native bootstrap and make
check.

Ok for trunk?
Matthew

2015-09-17  Matthew Wahab  

 * config/arm/arm-arches.def: Add "armv8.1-a" and "armv8.1-a+crc".
 * config/arm/arm-fpus.def: Add "neon-fp-armv8.1" and
 "crypto-neon-fp-armv8.1".
 * config/arm/arm-protos.h (FL2_ARCH8_1): New.
 (FL2_FOR_ARCH8_1A): New.
 * config/arm/arm-tables.opt: Regenerate.
 * config/arm/arm.h (FPU_FL_RDMA): New.
 * doc/invoke.texi (ARM -march): Add "armv8.1-a" and
 "armv8.1-a+crc".
 (ARM -mfpu): Add "neon-fp-armv8.1" and "crypto-neon-fp-armv8.1".






[AArch64][PATCH 1/7] Add support for ARMv8.1 Adv.SIMD instructions.

2015-10-23 Thread Matthew Wahab

The ARMv8.1 architecture extension adds two Adv.SIMD instructions,
sqrdmlah and sqrdmlsh. This patch series adds the instructions to the
AArch64 backend together with the ACLE feature macro and NEON intrinsics
to make use of them. The instructions are enabled when -march=armv8.1-a
is selected.

To support execution tests for the instructions, code is also added to
the testsuite to check the target capabilities and to specify required
compiler options.

This patch adds target feature macros for the instructions. Subsequent
patches:
- add the instructions to the aarch64-simd patterns,
- add GCC builtins to generate the instructions,
- add the ACLE feature macro __ARM_FEATURE_QRDMX,
- add support for ARMv8.1-A Adv.SIMD tests to the dejagnu support code,
- add NEON intrinsics for the basic form of the instructions.
- add NEON intrinsics for the *_lane forms of the instructions.

Tested the series for aarch64-none-linux-gnu with native bootstrap and
make check on an ARMv8 architecture. Also tested aarch64-none-elf with
cross-compiled check-gcc on an ARMv8.1 emulator.

Ok for trunk?
Matthew

gcc/
2015-10-23  Matthew Wahab  

* config/aarch64/aarch64.h (AARCH64_ISA_RDMA): New.
(TARGET_SIMD_RDMA): New.

From 4933ff4839406cdff2d2ec87920cab257a90474d Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Thu, 27 Aug 2015 13:31:17 +0100
Subject: [PATCH 1/7] Add RDMA target feature.

Change-Id: Ic22d5ae4c8dc012bd8e63dfd82a21935f44be50c
---
 gcc/config/aarch64/aarch64.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index b041a1e..c67eac9 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -157,6 +157,7 @@ extern unsigned aarch64_architecture_version;
 #define AARCH64_ISA_FP (aarch64_isa_flags & AARCH64_FL_FP)
 #define AARCH64_ISA_SIMD   (aarch64_isa_flags & AARCH64_FL_SIMD)
 #define AARCH64_ISA_LSE		   (aarch64_isa_flags & AARCH64_FL_LSE)
+#define AARCH64_ISA_RDMA	   (aarch64_isa_flags & AARCH64_FL_RDMA)
 
 /* Crypto is an optional extension to AdvSIMD.  */
 #define TARGET_CRYPTO (TARGET_SIMD && AARCH64_ISA_CRYPTO)
@@ -181,6 +182,9 @@ extern unsigned aarch64_architecture_version;
   ((aarch64_fix_a53_err835769 == 2)	\
   ? TARGET_FIX_ERR_A53_835769_DEFAULT : aarch64_fix_a53_err835769)
 
+/* ARMv8.1 Adv.SIMD support.  */
+#define TARGET_SIMD_RDMA (TARGET_SIMD && AARCH64_ISA_RDMA)
+
 /* Standard register usage.  */
 
 /* 31 64-bit general purpose registers R0-R30:
-- 
2.1.4



[AArch64][PATCH 2/7] Add sqrdmlah, sqrdmlsh instructions.

2015-10-23 Thread Matthew Wahab

The ARMv8.1 architecture extension adds two Adv.SIMD instructions,
sqrdmlah and sqrdmlsh. This patch adds the instructions to the
aarch64-simd patterns, making them conditional on the TARGET_SIMD_RDMA
feature macro introduced in the previous patch.

The instruction patterns are defined using unspec expressions, so that
they are only generated through builtins added by this patch series. To
simplify the definition, iterators SQRDMLAH and rdma_as are added, to
iterate over the add (sqrdmlah) and subtract (sqrdmlsh) forms of the
instructions.
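
For reference, a scalar C model of what one sqrdmlah lane computes,
based on the instruction's documented semantics (an illustration, not
code from this patch):

#include <stdint.h>

static int16_t
sqrdmlah_s16_model (int16_t acc, int16_t a, int16_t b)
{
  /* (acc << 16) + 2*a*b + rounding constant, then take the high half
     and saturate to the int16_t range.  */
  int64_t sum = ((int64_t) acc << 16)
		+ 2 * (int64_t) a * (int64_t) b
		+ (1 << 15);
  int64_t res = sum >> 16;
  if (res > INT16_MAX)
    res = INT16_MAX;
  else if (res < INT16_MIN)
    res = INT16_MIN;
  return (int16_t) res;
}

sqrdmlsh is the same except that the doubled product is subtracted
rather than added.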

Tested the series for aarch64-none-linux-gnu with native bootstrap and
make check on an ARMv8 architecture. Also tested aarch64-none-elf with
cross-compiled check-gcc on an ARMv8.1 emulator.

Ok for trunk?
Matthew

gcc/
2015-10-23  Matthew Wahab  

* config/aarch64/aarch64-simd.md
(aarch64_sqmovun): Fix some white-space.
(aarch64_qmovun): Likewise.
(aarch64_sqrdmlh): New.
(aarch64_sqrdmlh_lane): New.
(aarch64_sqrdmlh_laneq): New.
* config/aarch64/iterators.md (UNSPEC_SQRDMLAH): New.
(UNSPEC_SQRDMLSH): New.
(SQRDMLAH): New.
(rdma_as): New.

From 3505963108eac78ad5e224a0e558cce82ac8e127 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Mon, 7 Sep 2015 18:57:37 +0100
Subject: [PATCH 2/7] Add RDMA simd instruction patterns.

Change-Id: I87043d052c660b7ce9b6d881293abe880efb795e
---
 gcc/config/aarch64/aarch64-simd.md | 94 +-
 gcc/config/aarch64/iterators.md|  6 +++
 2 files changed, 98 insertions(+), 2 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 167277e..cf87ac2 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -2852,7 +2852,7 @@
"TARGET_SIMD"
"sqxtun\\t%0, %1"
[(set_attr "type" "neon_sat_shift_imm_narrow_q")]
- )
+)
 
 ;; sqmovn and uqmovn
 
@@ -2863,7 +2863,7 @@
   "TARGET_SIMD"
   "qxtn\\t%0, %1"
[(set_attr "type" "neon_sat_shift_imm_narrow_q")]
- )
+)
 
 ;; q
 
@@ -2951,6 +2951,96 @@
   [(set_attr "type" "neon_sat_mul__scalar")]
 )
 
+;; sqrdml[as]h.
+
+(define_insn "aarch64_sqrdmlh"
+  [(set (match_operand:VSDQ_HSI 0 "register_operand" "=w")
+	(unspec:VSDQ_HSI
+	  [(match_operand:VSDQ_HSI 1 "register_operand" "0")
+	   (match_operand:VSDQ_HSI 2 "register_operand" "w")
+	   (match_operand:VSDQ_HSI 3 "register_operand" "w")]
+	  SQRDMLAH))]
+   "TARGET_SIMD_RDMA"
+   "sqrdmlh\\t%0, %2, %3"
+   [(set_attr "type" "neon_sat_mla__long")]
+)
+
+;; sqrdml[as]h_lane.
+
+(define_insn "aarch64_sqrdmlh_lane"
+  [(set (match_operand:VDQHS 0 "register_operand" "=w")
+	(unspec:VDQHS
+	  [(match_operand:VDQHS 1 "register_operand" "0")
+	   (match_operand:VDQHS 2 "register_operand" "w")
+	   (vec_select:
+	 (match_operand: 3 "register_operand" "w")
+	 (parallel [(match_operand:SI 4 "immediate_operand" "i")]))]
+	  SQRDMLAH))]
+   "TARGET_SIMD_RDMA"
+   {
+ operands[4] = GEN_INT (ENDIAN_LANE_N (mode, INTVAL (operands[4])));
+ return
+  "sqrdmlh\\t%0., %2., %3.[%4]";
+   }
+   [(set_attr "type" "neon_sat_mla__scalar_long")]
+)
+
+(define_insn "aarch64_sqrdmlh_lane"
+  [(set (match_operand:SD_HSI 0 "register_operand" "=w")
+	(unspec:SD_HSI
+	  [(match_operand:SD_HSI 1 "register_operand" "0")
+	   (match_operand:SD_HSI 2 "register_operand" "w")
+	   (vec_select:
+	 (match_operand: 3 "register_operand" "w")
+	 (parallel [(match_operand:SI 4 "immediate_operand" "i")]))]
+	  SQRDMLAH))]
+   "TARGET_SIMD_RDMA"
+   {
+ operands[4] = GEN_INT (ENDIAN_LANE_N (mode, INTVAL (operands[4])));
+ return
+  "sqrdmlh\\t%0, %2, %3.[%4]";
+   }
+   [(set_attr "type" "neon_sat_mla__scalar_long")]
+)
+
+;; sqrdml[as]h_laneq.
+
+(define_insn "aarch64_sqrdmlh_laneq"
+  [(set (match_operand:VDQHS 0 "register_operand" "=w")
+	(unspec:VDQHS
+	  [(match_operand:VDQHS 1 "register_operand" "0")
+	   (match_operand:VDQHS 2 "register_operand" "w")
+	   (vec_select:
+	 (match_operand: 3 "register_operand" "w")
+	 (parallel [(match_operand:SI 4 "immediate_operand" "i")]))]
+	  SQRDMLAH))]
+   "TARGET_SIMD_RDMA"
+   {
+ operands[4] = GEN_INT (ENDIAN_LANE_N (mode, INTVAL (operands[4])));
+ return
+  "sqrdmlh\\t%0., %2., %3.[%4]";
+   }
+   [(set_attr "type" "

[AArch64][PATCH 3/7] Add builtins for ARMv8.1 Adv.SIMD instructions.

2015-10-23 Thread Matthew Wahab

The ARMv8.1 architecture extension adds two Adv.SIMD instructions,
sqrdmlah and sqrdmlsh. This patch adds the GCC builtins to generate the new
instructions, which are needed for the NEON intrinsics added later in
this series.
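
For illustration, a hypothetical direct call to one of the new builtins
(user code is expected to reach them only through the arm_neon.h
intrinsics added later in the series):

typedef short v4hi __attribute__ ((vector_size (8)));

v4hi
sqrdmlah_v4hi (v4hi acc, v4hi a, v4hi b)
{
  /* Expands to the sqrdmlah insn pattern for the V4HI mode.  */
  return __builtin_aarch64_sqrdmlahv4hi (acc, a, b);
}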

Tested the series for aarch64-none-linux-gnu with native bootstrap and
make check on an ARMv8 architecture. Also tested aarch64-none-elf with
cross-compiled check-gcc on an ARMv8.1 emulator.

Ok for trunk?
Matthew

gcc/
2015-10-23  Matthew Wahab  

* config/aarch64/aarch64-simd-builtins.def
(sqrdmlah, sqrdmlsh): New.
(sqrdmlah_lane, sqrdmlsh_lane): New.
(sqrdmlah_laneq, sqrdmlsh_laneq): New.

From b4a480cf0e38caa156b2fa15fc30b12ab8e0e7ad Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Thu, 27 Aug 2015 13:15:34 +0100
Subject: [PATCH 3/7] Add builtins for RDMA instructions.

Change-Id: I5156884010b1f6171583229c816aef4daab23b8f
---
 gcc/config/aarch64/aarch64-simd-builtins.def | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index 654e963..4cc4559 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -412,3 +412,17 @@
 
   /* Implemented by aarch64_tbx4v8qi.  */
   VAR1 (TERNOP, tbx4, 0, v8qi)
+
+  /* Builtins for ARMv8.1 Adv.SIMD instructions.  */
+
+  /* Implemented by aarch64_sqrdmlh.  */
+  BUILTIN_VSDQ_HSI (TERNOP, sqrdmlah, 0)
+  BUILTIN_VSDQ_HSI (TERNOP, sqrdmlsh, 0)
+
+  /* Implemented by aarch64_sqrdmlh_lane.  */
+  BUILTIN_VSDQ_HSI (QUADOP_LANE, sqrdmlah_lane, 0)
+  BUILTIN_VSDQ_HSI (QUADOP_LANE, sqrdmlsh_lane, 0)
+
+  /* Implemented by aarch64_sqrdmlh_laneq.  */
+  BUILTIN_VSDQ_HSI (QUADOP_LANE, sqrdmlah_laneq, 0)
+  BUILTIN_VSDQ_HSI (QUADOP_LANE, sqrdmlsh_laneq, 0)
-- 
2.1.4



[AArch64][PATCH 4/7] Add ACLE feature macro for ARMv8.1 Adv.SIMD instructions.

2015-10-23 Thread Matthew Wahab

The ARMv8.1 architecture extension adds two Adv.SIMD instructions,
sqrdmlah and sqrdmlsh. This patch adds the feature macro
__ARM_FEATURE_QRDMX to indicate the presence of these instructions,
generating it when the feature is available, as it is when
-march=armv8.1-a is selected.
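
A sketch of how the macro might be used in client code (illustrative
only; vqrdmlah_s16 is the intrinsic added later in this series, and the
fallback shown is only an approximation of the fused instruction, since
it rounds twice):

#include <arm_neon.h>

int16x4_t
qrdmlah_or_fallback (int16x4_t acc, int16x4_t a, int16x4_t b)
{
#ifdef __ARM_FEATURE_QRDMX
  return vqrdmlah_s16 (acc, a, b);		/* single sqrdmlah */
#else
  return vqadd_s16 (acc, vqrdmulh_s16 (a, b));	/* two instructions */
#endif
}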

Tested the series for aarch64-none-linux-gnu with native bootstrap and
make check on an ARMv8 architecture. Also tested aarch64-none-elf with
cross-compiled check-gcc on an ARMv8.1 emulator.

Ok for trunk?
Matthew

gcc/
2015-10-23  Matthew Wahab  

* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Add
ARM_FEATURE_QRDMX.

From 3af8c483a2def95abec264ca8591547d6c0e0b3e Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Thu, 27 Aug 2015 13:31:49 +0100
Subject: [PATCH 4/7] Add ACLE QRDMX feature macro.

Change-Id: I91af172637603ea89fc93a8e715973d7d304a92f
---
 gcc/config/aarch64/aarch64-c.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/config/aarch64/aarch64-c.c b/gcc/config/aarch64/aarch64-c.c
index 303025f..ad95c78 100644
--- a/gcc/config/aarch64/aarch64-c.c
+++ b/gcc/config/aarch64/aarch64-c.c
@@ -126,6 +126,7 @@ aarch64_update_cpp_builtins (cpp_reader *pfile)
   aarch64_def_or_undef (TARGET_ILP32, "__ILP32__", pfile);
 
   aarch64_def_or_undef (TARGET_CRYPTO, "__ARM_FEATURE_CRYPTO", pfile);
+  aarch64_def_or_undef (TARGET_SIMD_RDMA, "__ARM_FEATURE_QRDMX", pfile);
 }
 
 /* Implement TARGET_CPU_CPP_BUILTINS.  */
-- 
2.1.4



[AArch64][dejagnu][PATCH 5/7] Dejagnu support for ARMv8.1 Adv.SIMD.

2015-10-23 Thread Matthew Wahab

The ARMv8.1 architecture extension adds two Adv.SIMD instructions,
sqrdmlah and sqrdmlsh. This patch adds support in Dejagnu for ARMv8.1
Adv.SIMD specifiers and checks.

The new test options are
- { dg-add-options arm_v8_1a_neon }: Add compiler options needed to
  enable ARMv8.1 Adv.SIMD.
- { dg-require-effective-target arm_v8_1a_neon_hw }: Require a target
  capable of executing ARMv8.1 Adv.SIMD instructions.

The new options support AArch64 only.
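
A test using the new directives would follow the existing conventions,
along these lines (a hypothetical example, not one of the new files):

/* { dg-do run } */
/* { dg-require-effective-target arm_v8_1a_neon_hw } */
/* { dg-add-options arm_v8_1a_neon } */

int
main (void)
{
  /* Code exercising the ARMv8.1 Adv.SIMD instructions goes here.  */
  return 0;
}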

Tested the series for aarch64-none-linux-gnu with native bootstrap and
make check on an ARMv8 architecture. Also tested aarch64-none-elf with
cross-compiled check-gcc on an ARMv8.1 emulator.

Ok for trunk?
Matthew

gcc/testsuite
2015-10-23  Matthew Wahab  

* lib/target-supports.exp (add_options_for_arm_v8_1a_neon): New.
(check_effective_target_arm_arch_FUNC_ok)
(add_options_for_arm_arch_FUNC)
(check_effective_target_arm_arch_FUNC_multilib): Add "armv8.1-a"
to the list to be generated.
(check_effective_target_arm_v8_1a_neon_ok_nocache): New.
(check_effective_target_arm_v8_1a_neon_ok): New.
(check_effective_target_arm_v8_1a_neon_hw): New.

From 4c218c6972f510aee2b438180084baafda80b37f Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Thu, 27 Aug 2015 13:41:15 +0100
Subject: [PATCH 5/7] [Testsuite] Add dejagnu options for armv8.1 neon

Change-Id: Ic8edc48aa701aa159303f13154710a6fdae816d0
---
 gcc/testsuite/lib/target-supports.exp | 50 ++-
 1 file changed, 49 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 4d5b0a3d..b03ea02 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -2700,6 +2700,16 @@ proc add_options_for_arm_v8_neon { flags } {
 return "$flags $et_arm_v8_neon_flags -march=armv8-a"
 }
 
+# Add the options needed for ARMv8.1 Adv.SIMD.
+
+proc add_options_for_arm_v8_1a_neon { flags } {
+    if { [istarget aarch64*-*-*] } {
+	return "$flags -march=armv8.1-a"
+    } else {
+	return "$flags"
+    }
+}
+
 proc add_options_for_arm_crc { flags } {
 if { ! [check_effective_target_arm_crc_ok] } {
 return "$flags"
@@ -2984,7 +2994,8 @@ foreach { armfunc armflag armdef } { v4 "-march=armv4 -marm" __ARM_ARCH_4__
  v7r "-march=armv7-r" __ARM_ARCH_7R__
  v7m "-march=armv7-m -mthumb" __ARM_ARCH_7M__
  v7em "-march=armv7e-m -mthumb" __ARM_ARCH_7EM__
- v8a "-march=armv8-a" __ARM_ARCH_8A__ } {
+ v8a "-march=armv8-a" __ARM_ARCH_8A__
+ v8_1a "-march=armv8.1a" __ARM_ARCH_8A__ } {
 eval [string map [list FUNC $armfunc FLAG $armflag DEF $armdef ] {
 	proc check_effective_target_arm_arch_FUNC_ok { } {
 	if { [ string match "*-marm*" "FLAG" ] &&
@@ -3141,6 +3152,22 @@ proc check_effective_target_arm_neonv2_hw { } {
 } [add_options_for_arm_neonv2 ""]]
 }
 
+# Return 1 if the target supports the ARMv8.1 Adv.SIMD extension, 0
+# otherwise.  The test is valid for AArch64.
+
+proc check_effective_target_arm_v8_1a_neon_ok_nocache { } {
+    return [check_no_compiler_messages_nocache arm_v8_1a_neon_ok assembly {
+	#if !defined (__ARM_FEATURE_QRDMX)
+	#error FOO
+	#endif
+    } [add_options_for_arm_v8_1a_neon ""]]
+}
+
+proc check_effective_target_arm_v8_1a_neon_ok { } {
+    return [check_cached_effective_target arm_v8_1a_neon_ok \
+		check_effective_target_arm_v8_1a_neon_ok_nocache]
+}
+
 # Return 1 if the target supports executing ARMv8 NEON instructions, 0
 # otherwise.
 
@@ -3159,6 +3186,27 @@ proc check_effective_target_arm_v8_neon_hw { } {
 } [add_options_for_arm_v8_neon ""]]
 }
 
+# Return 1 if the target supports executing the ARMv8.1 Adv.SIMD extension, 0
+# otherwise.  The test is valid for AArch64.
+
+proc check_effective_target_arm_v8_1a_neon_hw { } {
+    return [check_runtime_nocache arm_v8_1a_neon_hw_available {
+	int
+	main (void)
+	{
+	  long long a = 0, b = 1;
+	  long long result = 0;
+
+	  asm ("sqrdmlah %s0,%s1,%s2"
+	   : "=w"(result)
+	   : "w"(a), "w"(b)
+	   : /* No clobbers.  */);
+
+	  return result;
+	}
+    }  [add_options_for_arm_v8_1a_neon ""]]
+}
+
 # Return 1 if this is a ARM target with NEON enabled.
 
 proc check_effective_target_arm_neon { } {
-- 
2.1.4



[AArch64][PATCH 6/7] Add NEON intrinsics vqrdmlah and vqrdmlsh.

2015-10-23 Thread Matthew Wahab

The ARMv8.1 architecture extension adds two Adv.SIMD instructions,
sqrdmlah and sqrdmlsh. This patch adds the NEON intrinsics vqrdmlah and
vqrdmlsh for these instructions. The new intrinsics are of the form
vqrdml{as}h[q]_<type>, for example vqrdmlah_s16 and vqrdmlshq_s32.
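
Illustrative uses of the new intrinsics (assuming -march=armv8.1-a; not
part of the patch):

#include <arm_neon.h>

int16x8_t
mla_high (int16x8_t acc, int16x8_t a, int16x8_t b)
{
  /* Expected to map to a single sqrdmlah on the 8x16-bit vector.  */
  return vqrdmlahq_s16 (acc, a, b);
}

int32x2_t
mls_high (int32x2_t acc, int32x2_t a, int32x2_t b)
{
  /* Subtract form, expected to map to sqrdmlsh.  */
  return vqrdmlsh_s32 (acc, a, b);
}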

Tested the series for aarch64-none-linux-gnu with native bootstrap and
make check on an ARMv8 architecture. Also tested aarch64-none-elf with
cross-compiled check-gcc on an ARMv8.1 emulator.

Ok for trunk?
Matthew

gcc/
2015-10-23  Matthew Wahab  

* gcc/config/aarch64/arm_neon.h (vqrdmlah_s16, vqrdmlah_s32): New.
(vqrdmlahq_s16, vqrdmlahq_s32): New.
(vqrdmlsh_s16, vqrdmlsh_s32): New.
(vqrdmlshq_s16, vqrdmlshq_s32): New.

gcc/testsuite
2015-10-23  Matthew Wahab  

* gcc.target/aarch64/advsimd-intrinsics/vqrdmlXh.inc: New file,
support code for vqrdml{as}h tests.
* gcc.target/aarch64/advsimd-intrinsics/vqrdmlah.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vqrdmlsh.c: New.

From 611e1232a59dfe42f2cd980407d67abcfea5 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Thu, 27 Aug 2015 13:22:41 +0100
Subject: [PATCH 6/7] Add neon intrinsics: vqrdmlah, vqrdmlsh.

Change-Id: I5c7f8d36ee980d280c1d50f6f212b286084c5acf
---
 gcc/config/aarch64/arm_neon.h  |  53 
 .../aarch64/advsimd-intrinsics/vqrdmlXh.inc| 138 +
 .../aarch64/advsimd-intrinsics/vqrdmlah.c  |  57 +
 .../aarch64/advsimd-intrinsics/vqrdmlsh.c  |  61 +
 4 files changed, 309 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlXh.inc
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlah.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlsh.c

diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index e186348..9e73809 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -2649,6 +2649,59 @@ vqrdmulhq_s32 (int32x4_t __a, int32x4_t __b)
   return (int32x4_t) __builtin_aarch64_sqrdmulhv4si (__a, __b);
 }
 
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8.1-a")
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vqrdmlah_s16 (int16x4_t __a, int16x4_t __b, int16x4_t __c)
+{
+  return (int16x4_t) __builtin_aarch64_sqrdmlahv4hi (__a, __b, __c);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vqrdmlah_s32 (int32x2_t __a, int32x2_t __b, int32x2_t __c)
+{
+  return (int32x2_t) __builtin_aarch64_sqrdmlahv2si (__a, __b, __c);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vqrdmlahq_s16 (int16x8_t __a, int16x8_t __b, int16x8_t __c)
+{
+  return (int16x8_t) __builtin_aarch64_sqrdmlahv8hi (__a, __b, __c);
+}
+
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vqrdmlahq_s32 (int32x4_t __a, int32x4_t __b, int32x4_t __c)
+{
+  return (int32x4_t) __builtin_aarch64_sqrdmlahv4si (__a, __b, __c);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vqrdmlsh_s16 (int16x4_t __a, int16x4_t __b, int16x4_t __c)
+{
+  return (int16x4_t) __builtin_aarch64_sqrdmlshv4hi (__a, __b, __c);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vqrdmlsh_s32 (int32x2_t __a, int32x2_t __b, int32x2_t __c)
+{
+  return (int32x2_t) __builtin_aarch64_sqrdmlshv2si (__a, __b, __c);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vqrdmlshq_s16 (int16x8_t __a, int16x8_t __b, int16x8_t __c)
+{
+  return (int16x8_t) __builtin_aarch64_sqrdmlshv8hi (__a, __b, __c);
+}
+
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vqrdmlshq_s32 (int32x4_t __a, int32x4_t __b, int32x4_t __c)
+{
+  return (int32x4_t) __builtin_aarch64_sqrdmlshv4si (__a, __b, __c);
+}
+
+#pragma GCC pop_options
+
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
 vcreate_s8 (uint64_t __a)
 {
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlXh.inc b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlXh.inc
new file mode 100644
index 000..a504ca6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlXh.inc
@@ -0,0 +1,138 @@
+#define FNNAME1(NAME) exec_ ## NAME
+#define FNNAME(NAME) FNNAME1 (NAME)
+
+void FNNAME (INSN) (void)
+{
+  /* vector_res = vqrdmlah (vector, vector2, vector3, vector4),
+ then store the result.  */
+#define TEST_VQRDMLAH2(INSN, Q, T1, T2, W, N, EXPECTED_CUMULATIVE_SAT, CMT) \
+  Set_Neon_Cumulative_Sat (0, VECT_VAR (vector_res, T1, W, N));		\
+  VECT_VAR (vector_res, T1, W, N) =	\
+INSN##Q##_##T2##W (VECT_VAR (vector, T1, W, N),			\
+		   VECT_VAR (vector2, T1, W, N),			\
+		   VECT_VAR (vector3, T1, W, N));			\
+  vst1##Q##_##T2##W (VECT_VAR (result, T1, W, N),			\
+		 VECT_VAR (vector_res, T1, W, N));			\
+  CHECK_C

[AArch64][PATCH 7/7] Add NEON intrinsics vqrdmlah_lane and vqrdmlsh_lane.

2015-10-23 Thread Matthew Wahab

The ARMv8.1 architecture extension adds two Adv.SIMD instructions,
sqrdmlah and sqrdmlsh. This patch adds the NEON intrinsics vqrdmlah_lane
and vqrdmlsh_lane for these instructions. The new intrinsics are of the
form vqrdml{as}h[q]_lane_<type>.
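
For illustration, a minimal use of one of the new lane intrinsics (a
sketch, not part of the patch; the function name is made up):

#include <arm_neon.h>

/* Each lane i: saturating accumulate into acc[i] of the rounding
   doubling multiply-high of a[i] and lane 1 of b.  Build for AArch64
   with -march=armv8.1-a.  */
int16x4_t
mla_high_lane1 (int16x4_t acc, int16x4_t a, int16x4_t b)
{
  return vqrdmlah_lane_s16 (acc, a, b, 1);
}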

Tested the series for aarch64-none-linux-gnu with native bootstrap and
make check on an ARMv8 architecture. Also tested aarch64-none-elf with
cross-compiled check-gcc on an ARMv8.1 emulator.

Ok for trunk?
Matthew

gcc/
2015-10-23  Matthew Wahab  

* gcc/config/aarch64/arm_neon.h
(vqrdmlah_laneq_s16, vqrdmlah_laneq_s32): New.
(vqrdmlahq_laneq_s16, vqrdmlahq_laneq_s32): New.
(vqrdmlsh_laneq_s16, vqrdmlsh_laneq_s32): New.
(vqrdmlshq_laneq_s16, vqrdmlshq_laneq_s32): New.
(vqrdmlah_lane_s16, vqrdmlah_lane_s32): New.
(vqrdmlahq_lane_s16, vqrdmlahq_lane_s32): New.
(vqrdmlahh_s16, vqrdmlahh_lane_s16, vqrdmlahh_laneq_s16): New.
(vqrdmlahs_s32, vqrdmlahs_lane_s32, vqrdmlahs_laneq_s32): New.
(vqrdmlsh_lane_s16, vqrdmlsh_lane_s32): New.
(vqrdmlshq_lane_s16, vqrdmlshq_lane_s32): New.
(vqrdmlshh_s16, vqrdmlshh_lane_s16, vqrdmlshh_laneq_s16): New.
(vqrdmlshs_s32, vqrdmlshs_lane_s32, vqrdmlshs_laneq_s32): New.

gcc/testsuite
2015-10-23  Matthew Wahab  

* gcc.target/aarch64/advsimd-intrinsics/vqrdmlXh_lane.inc: New file,
support code for vqrdml{as}h_lane tests.
* gcc.target/aarch64/advsimd-intrinsics/vqrdmlah_lane.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vqrdmlsh_lane.c: New.

From a2399818dba85ff2801a28bad77ef51697990da7 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Thu, 27 Aug 2015 14:17:26 +0100
Subject: [PATCH 7/7] Add neon intrinsics: vqrdmlah_lane, vqrdmlsh_lane.

Change-Id: I6d7a372e0a5b83ef0846ab62abbe9b24ada69fc4
---
 gcc/config/aarch64/arm_neon.h  | 182 +
 .../aarch64/advsimd-intrinsics/vqrdmlXh_lane.inc   | 154 +
 .../aarch64/advsimd-intrinsics/vqrdmlah_lane.c |  57 +++
 .../aarch64/advsimd-intrinsics/vqrdmlsh_lane.c |  61 +++
 4 files changed, 454 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlXh_lane.inc
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlah_lane.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlsh_lane.c

diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 9e73809..9b68e4a 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -10675,6 +10675,59 @@ vqrdmulhq_laneq_s32 (int32x4_t __a, int32x4_t __b, const int __c)
   return __builtin_aarch64_sqrdmulh_laneqv4si (__a, __b, __c);
 }
 
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8.1-a")
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vqrdmlah_laneq_s16 (int16x4_t __a, int16x4_t __b, int16x8_t __c, const int __d)
+{
+  return  __builtin_aarch64_sqrdmlah_laneqv4hi (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vqrdmlah_laneq_s32 (int32x2_t __a, int32x2_t __b, int32x4_t __c, const int __d)
+{
+  return __builtin_aarch64_sqrdmlah_laneqv2si (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vqrdmlahq_laneq_s16 (int16x8_t __a, int16x8_t __b, int16x8_t __c, const int __d)
+{
+  return __builtin_aarch64_sqrdmlah_laneqv8hi (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vqrdmlahq_laneq_s32 (int32x4_t __a, int32x4_t __b, int32x4_t __c, const int __d)
+{
+  return __builtin_aarch64_sqrdmlah_laneqv4si (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vqrdmlsh_laneq_s16 (int16x4_t __a, int16x4_t __b, int16x8_t __c, const int __d)
+{
+  return  __builtin_aarch64_sqrdmlsh_laneqv4hi (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vqrdmlsh_laneq_s32 (int32x2_t __a, int32x2_t __b, int32x4_t __c, const int __d)
+{
+  return __builtin_aarch64_sqrdmlsh_laneqv2si (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vqrdmlshq_laneq_s16 (int16x8_t __a, int16x8_t __b, int16x8_t __c, const int __d)
+{
+  return __builtin_aarch64_sqrdmlsh_laneqv8hi (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vqrdmlshq_laneq_s32 (int32x4_t __a, int32x4_t __b, int32x4_t __c, const int __d)
+{
+  return __builtin_aarch64_sqrdmlsh_laneqv4si (__a, __b, __c, __d);
+}
+
+#pragma GCC pop_options
+
 /* Table intrinsics.  */
 
 __extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
@@ -20014,6 +20067,135 @@ vqrdmulhs_laneq_s32 (int32_t __a, int32x4_t __b, const int __c)
   return __builtin_aarch64_sqrdmulh_laneqsi (

Re: [AArch64][dejagnu][PATCH 5/7] Dejagnu support for ARMv8.1 Adv.SIMD.

2015-10-27 Thread Matthew Wahab

On 24/10/15 08:16, Bernhard Reutner-Fischer wrote:

On October 23, 2015 2:24:26 PM GMT+02:00, Matthew Wahab 
 wrote:

The ARMv8.1 architecture extension adds two Adv.SIMD instructions,
sqrdmlah and sqrdmlsh. This patch adds support in Dejagnu for ARMv8.1
Adv.SIMD specifiers and checks.

The new test options are
- { dg-add-options arm_v8_1a_neon }: Add compiler options needed to
   enable ARMv8.1 Adv.SIMD.
- { dg-require-effective-target arm_v8_1a_neon_hw }: Require a target
   capable of executing ARMv8.1 Adv.SIMD instructions.



Please error with something more meaningful than FOO, !__ARM_FEATURE_QRDMX 
comes to mind.

TIA,



I've reworked the patch so that the error is "__ARM_FEATURE_QRDMX not
defined" and also strengthened the check_effective_target tests.

Retested for aarch64-none-elf with cross-compiled check-gcc on an
ARMv8.1 emulator. Also tested with a version of the compiler that
doesn't define the ACLE feature macro.
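
As a sketch (not part of the patch), a test would use the new options
like this:

/* { dg-do run } */
/* { dg-require-effective-target arm_v8_1a_neon_hw } */
/* { dg-add-options arm_v8_1a_neon } */

/* Test body exercising the ARMv8.1 Adv.SIMD instructions.  */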

Matthew

gcc/testsuite
2015-10-27  Matthew Wahab  

* lib/target-supports.exp (add_options_for_arm_v8_1a_neon): New.
(check_effective_target_arm_arch_FUNC_ok)
(add_options_for_arm_arch_FUNC)
(check_effective_target_arm_arch_FUNC_multilib): Add "armv8.1-a"
to the list to be generated.
(check_effective_target_arm_v8_1a_neon_ok_nocache): New.
(check_effective_target_arm_v8_1a_neon_ok): New.
(check_effective_target_arm_v8_1a_neon_hw): New.


From b12969882298cb79737e882c48398c58a45161b9 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Mon, 26 Oct 2015 14:58:36 +
Subject: [PATCH 5/7] [Testsuite] Add dejagnu options for armv8.1 neon

Change-Id: Ib58b8c4930ad3971af3ea682eda043e14cd2e8b3
---
 gcc/testsuite/lib/target-supports.exp | 56 ++-
 1 file changed, 55 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 4d5b0a3d..0fb679d 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -2700,6 +2700,16 @@ proc add_options_for_arm_v8_neon { flags } {
 return "$flags $et_arm_v8_neon_flags -march=armv8-a"
 }
 
+# Add the options needed for ARMv8.1 Adv.SIMD.
+
+proc add_options_for_arm_v8_1a_neon { flags } {
+if { [istarget aarch64*-*-*] } {
+	return "$flags -march=armv8.1-a"
+} else {
+	return "$flags"
+}
+}
+
 proc add_options_for_arm_crc { flags } {
 if { ! [check_effective_target_arm_crc_ok] } {
 return "$flags"
@@ -2984,7 +2994,8 @@ foreach { armfunc armflag armdef } { v4 "-march=armv4 -marm" __ARM_ARCH_4__
  v7r "-march=armv7-r" __ARM_ARCH_7R__
  v7m "-march=armv7-m -mthumb" __ARM_ARCH_7M__
  v7em "-march=armv7e-m -mthumb" __ARM_ARCH_7EM__
- v8a "-march=armv8-a" __ARM_ARCH_8A__ } {
+ v8a "-march=armv8-a" __ARM_ARCH_8A__
+ v8_1a "-march=armv8.1a" __ARM_ARCH_8A__ } {
 eval [string map [list FUNC $armfunc FLAG $armflag DEF $armdef ] {
 	proc check_effective_target_arm_arch_FUNC_ok { } {
 	if { [ string match "*-marm*" "FLAG" ] &&
@@ -3141,6 +3152,25 @@ proc check_effective_target_arm_neonv2_hw { } {
 } [add_options_for_arm_neonv2 ""]]
 }
 
+# Return 1 if the target supports the ARMv8.1 Adv.SIMD extension, 0
+# otherwise.  The test is valid for AArch64.
+
+proc check_effective_target_arm_v8_1a_neon_ok_nocache { } {
+if { ![istarget aarch64*-*-*] } {
+	return 0
+}
+return [check_no_compiler_messages_nocache arm_v8_1a_neon_ok assembly {
+	#if !defined (__ARM_FEATURE_QRDMX)
+	#error "__ARM_FEATURE_QRDMX not defined"
+	#endif
+} [add_options_for_arm_v8_1a_neon ""]]
+}
+
+proc check_effective_target_arm_v8_1a_neon_ok { } {
+return [check_cached_effective_target arm_v8_1a_neon_ok \
+		check_effective_target_arm_v8_1a_neon_ok_nocache]
+}
+
 # Return 1 if the target supports executing ARMv8 NEON instructions, 0
 # otherwise.
 
@@ -3159,6 +3189,30 @@ proc check_effective_target_arm_v8_neon_hw { } {
 } [add_options_for_arm_v8_neon ""]]
 }
 
+# Return 1 if the target supports executing the ARMv8.1 Adv.SIMD extension, 0
+# otherwise.  The test is valid for AArch64.
+
+proc check_effective_target_arm_v8_1a_neon_hw { } {
+if { ![check_effective_target_arm_v8_1a_neon_ok] } {
+	return 0;
+}
+return [check_runtime_nocache arm_v8_1a_neon_hw_available {
+	int
+	main (void)
+	{
+	  long long a = 0, b = 1;
+	  long long result = 0;
+
+	  asm ("sqrdmlah %s0,%s1,%s2"
+	   : "=w"(result)
+	   : "w"(a), "w"(b)
+	   : /* No clobbers.  */);
+
+	  return result;
+	}
+}  [add_options_for_arm_v8_1a_neon ""]]
+}
+
 # Return 1 if this is a ARM target with NEON enabled.
 
 proc check_effective_target_arm_neon { } {
-- 
2.1.4



Re: [AArch64][PATCH 2/7] Add sqrdmah, sqrdmsh instructions.

2015-10-27 Thread Matthew Wahab

On 27/10/15 11:18, James Greenhalgh wrote:


  ;; ---
@@ -932,6 +934,8 @@
 UNSPEC_SQSHRN UNSPEC_UQSHRN
 UNSPEC_SQRSHRN UNSPEC_UQRSHRN])

+(define_int_iterator SQRDMLAH [UNSPEC_SQRDMLAH UNSPEC_SQRDMLSH])
+


This name does not make it clear that you will iterate over an "A" and an
"S" form. I'd like to see a clearer naming choice, RDMAS? SQRDMLHADDSUB? etc.


SQRDMLHADDSUB is a little difficult to read. How about SQRDMLH_AS, to keep the link 
to the instruction?


Matthew




Re: [AArch64][PATCH 6/7] Add NEON intrinsics vqrdmlah and vqrdmlsh.

2015-10-30 Thread Matthew Wahab

On 30/10/15 12:51, Christophe Lyon wrote:

On 23 October 2015 at 14:26, Matthew Wahab  wrote:

The ARMv8.1 architecture extension adds two Adv.SIMD instructions,
sqrdmlah and sqrdmlsh. This patch adds the NEON intrinsics vqrdmlah and
vqrdmlsh for these instructions. The new intrinsics are of the form
vqrdml{as}h[q]_<type>.

Tested the series for aarch64-none-linux-gnu with native bootstrap and
make check on an ARMv8 architecture. Also tested aarch64-none-elf with
cross-compiled check-gcc on an ARMv8.1 emulator.



Is there a publicly available simulator for v8.1? QEMU or Foundation Model?



Sorry, I don't know.
Matthew



[AArch64] Move iterators from atomics.md to iterators.md

2015-11-02 Thread Matthew Wahab

Hello

One of the review comments for the v8.1 atomics patches was that the
iterators and unspec declarations should be moved out of the atomics.md
file (https://gcc.gnu.org/ml/gcc-patches/2015-09/msg01375.html).

The iterators in atomics.md are tied to the unspecv definition in the
same file. This patch moves both into iterators.md.
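
For context, these unspecs back the expansion of the __atomic
built-ins; a minimal example (a sketch, not part of the patch):

#include <stdint.h>

uint32_t
fetch_or (uint32_t *p, uint32_t mask)
{
  /* With the LSE atomics enabled (e.g. -march=armv8.1-a), this can
     expand through the load-operate patterns to an ldset instruction;
     otherwise it becomes a load-exclusive/store-exclusive loop.  */
  return __atomic_fetch_or (p, mask, __ATOMIC_RELAXED);
}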

Tested aarch64-none-elf with cross-compiled check-gcc and
aarch64-none-linux-gnu with native bootstrap and make check.

Ok for trunk?
Matthew

gcc/
2015-11-02  Matthew Wahab  

* config/aarch64/atomics.md (unspecv): Move to iterators.md.
(ATOMIC_LDOP): Likewise.
(atomic_ldop): Likewise.
* config/aarch64/iterators.md (unspecv): Moved from atomics.md.
(ATOMIC_LDOP): Likewise.
(atomic_ldop): Likewise.
From 90471e373421b838d1069cddb54abe0377fdc244 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Thu, 29 Oct 2015 15:44:41 +
Subject: [PATCH] [AArch64] Move atomics iterators into iterators.md

Change-Id: Ie83ae9c983762c10920db6bf3f2e2d4fa33167b2
---
 gcc/config/aarch64/atomics.md   | 28 
 gcc/config/aarch64/iterators.md | 33 +
 2 files changed, 33 insertions(+), 28 deletions(-)

diff --git a/gcc/config/aarch64/atomics.md b/gcc/config/aarch64/atomics.md
index e7ac5f6..3c034fb 100644
--- a/gcc/config/aarch64/atomics.md
+++ b/gcc/config/aarch64/atomics.md
@@ -18,34 +18,6 @@
 ;; along with GCC; see the file COPYING3.  If not see
 ;; <http://www.gnu.org/licenses/>.
 
-(define_c_enum "unspecv"
- [
-UNSPECV_LX; Represent a load-exclusive.
-UNSPECV_SX; Represent a store-exclusive.
-UNSPECV_LDA; Represent an atomic load or load-acquire.
-UNSPECV_STL; Represent an atomic store or store-release.
-UNSPECV_ATOMIC_CMPSW		; Represent an atomic compare swap.
-UNSPECV_ATOMIC_EXCHG		; Represent an atomic exchange.
-UNSPECV_ATOMIC_CAS			; Represent an atomic CAS.
-UNSPECV_ATOMIC_SWP			; Represent an atomic SWP.
-UNSPECV_ATOMIC_OP			; Represent an atomic operation.
-UNSPECV_ATOMIC_LDOP			; Represent an atomic load-operation
-UNSPECV_ATOMIC_LDOP_OR		; Represent an atomic load-or
-UNSPECV_ATOMIC_LDOP_BIC		; Represent an atomic load-bic
-UNSPECV_ATOMIC_LDOP_XOR		; Represent an atomic load-xor
-UNSPECV_ATOMIC_LDOP_PLUS		; Represent an atomic load-add
-])
-
-;; Iterators for load-operate instructions.
-
-(define_int_iterator ATOMIC_LDOP
- [UNSPECV_ATOMIC_LDOP_OR UNSPECV_ATOMIC_LDOP_BIC
-  UNSPECV_ATOMIC_LDOP_XOR UNSPECV_ATOMIC_LDOP_PLUS])
-
-(define_int_attr atomic_ldop
- [(UNSPECV_ATOMIC_LDOP_OR "set") (UNSPECV_ATOMIC_LDOP_BIC "clr")
-  (UNSPECV_ATOMIC_LDOP_XOR "eor") (UNSPECV_ATOMIC_LDOP_PLUS "add")])
-
 ;; Instruction patterns.
 
 (define_expand "atomic_compare_and_swap"
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 964f8f1..fe7ca39 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -305,6 +305,29 @@
 UNSPEC_VEC_SHR  ; Used in aarch64-simd.md.
 ])
 
+;; --
+;; Unspec enumerations for Atomics.  They are here so that they can be
+;; used in the int_iterators for atomic operations.
+;; --
+
+(define_c_enum "unspecv"
+ [
+UNSPECV_LX			; Represent a load-exclusive.
+UNSPECV_SX			; Represent a store-exclusive.
+UNSPECV_LDA			; Represent an atomic load or load-acquire.
+UNSPECV_STL			; Represent an atomic store or store-release.
+UNSPECV_ATOMIC_CMPSW	; Represent an atomic compare swap.
+UNSPECV_ATOMIC_EXCHG	; Represent an atomic exchange.
+UNSPECV_ATOMIC_CAS		; Represent an atomic CAS.
+UNSPECV_ATOMIC_SWP		; Represent an atomic SWP.
+UNSPECV_ATOMIC_OP		; Represent an atomic operation.
+UNSPECV_ATOMIC_LDOP		; Represent an atomic load-operation
+UNSPECV_ATOMIC_LDOP_OR	; Represent an atomic load-or
+UNSPECV_ATOMIC_LDOP_BIC	; Represent an atomic load-bic
+UNSPECV_ATOMIC_LDOP_XOR	; Represent an atomic load-xor
+UNSPECV_ATOMIC_LDOP_PLUS	; Represent an atomic load-add
+])
+
 ;; ---
 ;; Mode attributes
 ;; ---
@@ -958,6 +981,16 @@
 
 (define_int_iterator CRYPTO_SHA256 [UNSPEC_SHA256H UNSPEC_SHA256H2])
 
+;; Iterators for atomic operations.
+
+(define_int_iterator ATOMIC_LDOP
+ [UNSPECV_ATOMIC_LDOP_OR UNSPECV_ATOMIC_LDOP_BIC
+  UNSPECV_ATOMIC_LDOP_XOR UNSPECV_ATOMIC_LDOP_PLUS])
+
+(define_int_attr atomic_ldop
+ [(UNSPECV_ATOMIC_LDOP_OR "set") (UNSPECV_ATOMIC_LDOP_BIC "clr")
+  (UNSPECV_ATOMIC_LDOP_XOR "eor") (UNSPECV_ATOMIC_LDOP_PLUS "add")])
+
 ;; ---
 ;; Int Iterators Attributes.
 ;; ---
-- 
2.1.4



Re: [AArch64][PATCH 6/7] Add NEON intrinsics vqrdmlah and vqrdmlsh.

2015-11-09 Thread Matthew Wahab

On 09/11/15 13:31, Christophe Lyon wrote:

On 30 October 2015 at 16:52, Matthew Wahab  wrote:

On 30/10/15 12:51, Christophe Lyon wrote:


On 23 October 2015 at 14:26, Matthew Wahab 
wrote:


The ARMv8.1 architecture extension adds two Adv.SIMD instructions,
sqrdmlah and sqrdmlsh. This patch adds the NEON intrinsics vqrdmlah and
vqrdmlsh for these instructions. The new intrinsics are of the form
vqrdml{as}h[q]_<type>.

Tested the series for aarch64-none-linux-gnu with native bootstrap and
make check on an ARMv8 architecture. Also tested aarch64-none-elf with
cross-compiled check-gcc on an ARMv8.1 emulator.


Is there a publicly available simulator for v8.1? QEMU or Foundation
Model?


Sorry, I don't know.
Matthew



So, what will happen to the testsuite once this is committed?
Are we going to see FAILs when using QEMU?



No, the check at the top of the test files

+/* { dg-require-effective-target arm_v8_1a_neon_hw } */

should make this test UNSUPPORTED if the HW/simulator can't execute it. (Support 
for this check is added in patch #5 in this series.) Note that the aarch64-none-linux 
make check was run on ARMv8 HW which can't execute the test and correctly reported it 
as unsupported.


Matthew


[AArch64] Rework ARMv8.1 command line options.

2015-11-16 Thread Matthew Wahab

Hello,

The command line options for target selection allow ARMv8.1 extensions
to be individually enabled/disabled. They also allow the extensions to
be enabled with -march=armv8-a. This doesn't reflect the ARMv8.1
architecture which requires all extensions to be enabled and doesn't make
them available for ARMv8.

This patch removes the options for the individual ARMv8.1 extensions
except for +lse. This means that setting -march=armv8.1-a will enable
all extensions required by ARMv8.1 and that the ARMv8.1 extensions can't
be used with -march=armv8.

The exception to this is +lse since there may be existing code expecting
to be built with -march=armv8-a+lse. Note that +crc, which is enabled by
-march=armv8.1-a, is still an option for -march=armv8-a.
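
Illustrative invocations of the intended behaviour:

gcc -c -march=armv8.1-a file.c        # ARMv8.1 with all its extensions
gcc -c -march=armv8-a+lse file.c      # still accepted, for existing code
gcc -c -march=armv8-a+rdma file.c     # rejected once "rdma" is removed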

This patch depends on the patch series
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg02429.html.

Tested aarch64-none-elf with cross-compiled check-gcc and
aarch64-none-linux-gnu with native bootstrap and make check.

Ok for trunk?
Matthew

gcc/
2015-11-16  Matthew Wahab  

* config/aarch64/aarch64-option-extensions.def: Remove
AARCH64_FL_RDMA from "fp" and "simd".  Remove "pan", "lor",
"rdma".
* config/aarch64/aarch64.h (AARCH64_FL_PAN): Remove.
(AARCH64_FL_LOR): Remove.
(AARCH64_FL_RDMA): Remove.
(AARCH64_FL_V8_1): New.
(AARCH64_FL_FOR_AARCH8_1): Replace AARCH64_FL_PAN, AARCH64_FL_LOR
and AARCH64_FL_RDMA with AARCH64_FL_V8_1.
(AARCH64_ISA_RDMA): Replace AARCH64_FL_RDMA with AARCH64_FL_V8_1.
* doc/invoke.texi (AArch64 - Feature Modifiers): Remove "pan",
"lor" and "rdma".
From bc4ea389754127ec639ea2de085a7c82aebae117 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Fri, 30 Oct 2015 10:32:59 +
Subject: [PATCH] [AArch64] Rework ARMv8.1 command line options.

Change-Id: Ib9053719f45980255a3d7727e226a53d9f214049
---
 gcc/config/aarch64/aarch64-option-extensions.def | 9 -
 gcc/config/aarch64/aarch64.h | 9 +++--
 gcc/doc/invoke.texi  | 7 ---
 3 files changed, 7 insertions(+), 18 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def
index b261a0f..4f1d535 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -34,11 +34,10 @@
should contain a whitespace-separated list of the strings in 'Features'
that are required.  Their order is not important.  */
 
-AARCH64_OPT_EXTENSION("fp",	AARCH64_FL_FP,  AARCH64_FL_FPSIMD | AARCH64_FL_CRYPTO | AARCH64_FL_RDMA, "fp")
-AARCH64_OPT_EXTENSION("simd",	AARCH64_FL_FPSIMD,  AARCH64_FL_SIMD | AARCH64_FL_CRYPTO | AARCH64_FL_RDMA,   "asimd")
+AARCH64_OPT_EXTENSION ("fp", AARCH64_FL_FP,
+		   AARCH64_FL_FPSIMD | AARCH64_FL_CRYPTO, "fp")
+AARCH64_OPT_EXTENSION ("simd", AARCH64_FL_FPSIMD,
+		   AARCH64_FL_SIMD | AARCH64_FL_CRYPTO, "asimd")
 AARCH64_OPT_EXTENSION("crypto",	AARCH64_FL_CRYPTO | AARCH64_FL_FPSIMD,  AARCH64_FL_CRYPTO,   "aes pmull sha1 sha2")
 AARCH64_OPT_EXTENSION("crc",	AARCH64_FL_CRC, AARCH64_FL_CRC,"crc32")
 AARCH64_OPT_EXTENSION("lse",	AARCH64_FL_LSE, AARCH64_FL_LSE,"lse")
-AARCH64_OPT_EXTENSION("pan",	AARCH64_FL_PAN,		AARCH64_FL_PAN,		"pan")
-AARCH64_OPT_EXTENSION("lor",	AARCH64_FL_LOR,		AARCH64_FL_LOR,		"lor")
-AARCH64_OPT_EXTENSION("rdma",	AARCH64_FL_RDMA | AARCH64_FL_FPSIMD,	AARCH64_FL_RDMA,	"rdma")
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 68c006f..06345f0 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -134,9 +134,7 @@ extern unsigned aarch64_architecture_version;
 #define AARCH64_FL_CRC(1 << 3)	/* Has CRC.  */
 /* ARMv8.1 architecture extensions.  */
 #define AARCH64_FL_LSE	  (1 << 4)  /* Has Large System Extensions.  */
-#define AARCH64_FL_PAN	  (1 << 5)  /* Has Privileged Access Never.  */
-#define AARCH64_FL_LOR	  (1 << 6)  /* Has Limited Ordering regions.  */
-#define AARCH64_FL_RDMA	  (1 << 7)  /* Has ARMv8.1 Adv.SIMD.  */
+#define AARCH64_FL_V8_1	  (1 << 5)  /* Has ARMv8.1 extensions.  */
 
 /* Has FP and SIMD.  */
 #define AARCH64_FL_FPSIMD (AARCH64_FL_FP | AARCH64_FL_SIMD)
@@ -147,8 +145,7 @@ extern unsigned aarch64_architecture_version;
 /* Architecture flags that effect instruction selection.  */
 #define AARCH64_FL_FOR_ARCH8   (AARCH64_FL_FPSIMD)
 #define AARCH64_FL_FOR_ARCH8_1			   \
-  (AARCH64_FL_FOR_

Re: [AArch64][dejagnu][PATCH 5/7] Dejagnu support for ARMv8.1 Adv.SIMD.

2015-11-23 Thread Matthew Wahab

On 23/11/15 12:24, James Greenhalgh wrote:

On Tue, Oct 27, 2015 at 03:32:04PM +, Matthew Wahab wrote:

On 24/10/15 08:16, Bernhard Reutner-Fischer wrote:

On October 23, 2015 2:24:26 PM GMT+02:00, Matthew Wahab 
 wrote:

The ARMv8.1 architecture extension adds two Adv.SIMD instructions,
sqrdmlah and sqrdmlsh. This patch adds support in Dejagnu for ARMv8.1
Adv.SIMD specifiers and checks.

The new test options are
- { dg-add-options arm_v8_1a_neon }: Add compiler options needed to
   enable ARMv8.1 Adv.SIMD.
- { dg-require-effective-target arm_v8_1a_neon_hw }: Require a target
   capable of executing ARMv8.1 Adv.SIMD instructions.




Hi Matthew,

I have a couple of comments below. Neither need to block the patch, but
I'd appreciate a reply before I say OK.



diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index 4d5b0a3d..0fb679d 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -2700,6 +2700,16 @@ proc add_options_for_arm_v8_neon { flags } {
  return "$flags $et_arm_v8_neon_flags -march=armv8-a"
  }

+# Add the options needed for ARMv8.1 Adv.SIMD.
+
+proc add_options_for_arm_v8_1a_neon { flags } {
+if { [istarget aarch64*-*-*] } {
+   return "$flags -march=armv8.1-a"


Should this be -march=armv8.1-a+simd or some other feature flag?



I think it should be armv8.1-a only. +simd is enabled by all -march settings so it 
seems redundant to add it here. An alternative is to add +rdma but that's also 
enabled by armv8.1-a. (I've a patch at 
https://gcc.gnu.org/ml/gcc-patches/2015-11/msg01973.html which gets rid of +rdma as 
part of an armv8.1-a command line clean-up.)



+# Return 1 if the target supports executing the ARMv8.1 Adv.SIMD extension, 0
+# otherwise.  The test is valid for AArch64.
+
+proc check_effective_target_arm_v8_1a_neon_hw { } {
+if { ![check_effective_target_arm_v8_1a_neon_ok] } {
+   return 0;
+}
+return [check_runtime_nocache arm_v8_1a_neon_hw_available {
+   int
+   main (void)
+   {
+ long long a = 0, b = 1;
+ long long result = 0;
+
+ asm ("sqrdmlah %s0,%s1,%s2"
+  : "=w"(result)
+  : "w"(a), "w"(b)
+  : /* No clobbers.  */);


Hm, those types look wrong, I guess this works but it is an unusual way
to write it. I presume this is to avoid including arm_neon.h each time, but
you could just directly use the internal type names for the arm_neon types.
That is to say __Int32x4_t (or whichever mode you intend to use).



I'll rework the patch to use the internal types names.

Matthew



Re: [AArch64][dejagnu][PATCH 5/7] Dejagnu support for ARMv8.1 Adv.SIMD.

2015-11-25 Thread Matthew Wahab

On 23/11/15 16:38, Matthew Wahab wrote:

On 23/11/15 12:24, James Greenhalgh wrote:

On Tue, Oct 27, 2015 at 03:32:04PM +, Matthew Wahab wrote:

On 24/10/15 08:16, Bernhard Reutner-Fischer wrote:

On October 23, 2015 2:24:26 PM GMT+02:00, Matthew Wahab
 wrote:

The ARMv8.1 architecture extension adds two Adv.SIMD instructions,
sqrdmlah and sqrdmlsh. This patch adds support in Dejagnu for ARMv8.1
Adv.SIMD specifiers and checks.

The new test options are
- { dg-add-options arm_v8_1a_neon }: Add compiler options needed to
   enable ARMv8.1 Adv.SIMD.
- { dg-require-effective-target arm_v8_1a_neon_hw }: Require a target
   capable of executing ARMv8.1 Adv.SIMD instructions.




+# Return 1 if the target supports executing the ARMv8.1 Adv.SIMD extension, 0
+# otherwise.  The test is valid for AArch64.
+
+proc check_effective_target_arm_v8_1a_neon_hw { } {
+if { ![check_effective_target_arm_v8_1a_neon_ok] } {
+return 0;
+}
+return [check_runtime_nocache arm_v8_1a_neon_hw_available {
+int
+main (void)
+{
+  long long a = 0, b = 1;
+  long long result = 0;
+
+  asm ("sqrdmlah %s0,%s1,%s2"
+   : "=w"(result)
+   : "w"(a), "w"(b)
+   : /* No clobbers.  */);


Hm, those types look wrong, I guess this works but it is an unusual way
to write it. I presume this is to avoid including arm_neon.h each time, but
you could just directly use the internal type names for the arm_neon types.
That is to say __Int32x4_t (or whichever mode you intend to use).



I'll rework the patch to use the internal types names.


Attached, the reworked patch which uses internal type __Int32x2_t and
cleans up the assembler.

Retested aarch64-none-elf with cross-compiled check-gcc on an ARMv8.1
emulator. Also re-ran the cross-compiled
gcc.target/aarch64/advsimd-intrinsics tests for aarch64-none-elf on an
ARMv8 emulator.

Matthew

gcc/testsuite
2015-11-24  Matthew Wahab  

* lib/target-supports.exp (add_options_for_arm_v8_1a_neon): New.
(check_effective_target_arm_arch_FUNC_ok)
(add_options_for_arm_arch_FUNC)
(check_effective_target_arm_arch_FUNC_multilib): Add "armv8.1-a"
to the list to be generated.
(check_effective_target_arm_v8_1a_neon_ok_nocache): New.
(check_effective_target_arm_v8_1a_neon_ok): New.
(check_effective_target_arm_v8_1a_neon_hw): New.



From 262c24946b2da5833a30b2e3e696bb7ea271059f Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Mon, 26 Oct 2015 14:58:36 +
Subject: [PATCH 5/7] [Testsuite] Add dejagnu options for armv8.1 neon

Change-Id: Ib58b8c4930ad3971af3ea682eda043e14cd2e8b3
---
 gcc/testsuite/lib/target-supports.exp | 57 ++-
 1 file changed, 56 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 3eb46f2..dcd51fd 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -2816,6 +2816,16 @@ proc add_options_for_arm_v8_neon { flags } {
 return "$flags $et_arm_v8_neon_flags -march=armv8-a"
 }
 
+# Add the options needed for ARMv8.1 Adv.SIMD.
+
+proc add_options_for_arm_v8_1a_neon { flags } {
+if { [istarget aarch64*-*-*] } {
+	return "$flags -march=armv8.1-a"
+} else {
+	return "$flags"
+}
+}
+
 proc add_options_for_arm_crc { flags } {
 if { ! [check_effective_target_arm_crc_ok] } {
 return "$flags"
@@ -3102,7 +3112,8 @@ foreach { armfunc armflag armdef } { v4 "-march=armv4 -marm" __ARM_ARCH_4__
  v7r "-march=armv7-r" __ARM_ARCH_7R__
  v7m "-march=armv7-m -mthumb" __ARM_ARCH_7M__
  v7em "-march=armv7e-m -mthumb" __ARM_ARCH_7EM__
- v8a "-march=armv8-a" __ARM_ARCH_8A__ } {
+ v8a "-march=armv8-a" __ARM_ARCH_8A__
+ v8_1a "-march=armv8.1a" __ARM_ARCH_8A__ } {
 eval [string map [list FUNC $armfunc FLAG $armflag DEF $armdef ] {
 	proc check_effective_target_arm_arch_FUNC_ok { } {
 	if { [ string match "*-marm*" "FLAG" ] &&
@@ -3259,6 +3270,25 @@ proc check_effective_target_arm_neonv2_hw { } {
 } [add_options_for_arm_neonv2 ""]]
 }
 
+# Return 1 if the target supports the ARMv8.1 Adv.SIMD extension, 0
+# otherwise.  The test is valid for AArch64.
+
+proc check_effective_target_arm_v8_1a_neon_ok_nocache { } {
+if { ![istarget aarch64*-*-*] } {
+	return 0
+}
+return [check_no_compiler_messages_nocache arm_v8_1a_neon_ok assembly {
+	#if !defined (__ARM_FEATURE_QRDMX)
+	#error "__ARM_FEATURE_QRDMX not defined"
+	#endif
+} [add_options_for_arm_v8_1a_neon ""]]
+}
+
+proc check_effective_target_arm_v8_1a_neon_ok { } {
+return [check_cached_effective_target arm_v8_1a_neon_ok \
+		check_effective_target_arm_v8_1a_neon_ok_nocache]
+}
+
 # Return 1 if

Re: [AArch64][PATCH 6/7] Add NEON intrinsics vqrdmlah and vqrdmlsh.

2015-11-25 Thread Matthew Wahab

On 23/11/15 13:35, James Greenhalgh wrote:

On Fri, Oct 23, 2015 at 01:26:11PM +0100, Matthew Wahab wrote:

The ARMv8.1 architecture extension adds two Adv.SIMD instructions,
sqrdmlah and sqrdmlsh. This patch adds the NEON intrinsics vqrdmlah and
vqrdmlsh for these instructions. The new intrinsics are of the form
vqrdml{as}h[q]_<type>.




diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index e186348..9e73809 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -2649,6 +2649,59 @@ vqrdmulhq_s32 (int32x4_t __a, int32x4_t __b)
return (int32x4_t) __builtin_aarch64_sqrdmulhv4si (__a, __b);
  }

+#pragma GCC push_options
+#pragma GCC target ("arch=armv8.1-a")


Can we please patch the documentation to make it clear that -march=armv8.1-a
always implies -march=armv8.1-a+rdma ? The documentation around which
feature modifiers are implied when leaves much to be desired.


I'll rework the documentation as part of the (separate) command lines clean-up 
patch.


+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vqrdmlah_s16 (int16x4_t __a, int16x4_t __b, int16x4_t __c)
+{
+  return (int16x4_t) __builtin_aarch64_sqrdmlahv4hi (__a, __b, __c);


We don't need this cast (likewise the other instances)?



Attached, a reworked patch that removes the casts from the new
intrinsics. It also moves the new intrinsics to before the crypto
intrinsics. The intention is that the intrinsics added in this and the
next patch in the set are put in the same place and bracketed by a
single target pragma.

Retested aarch64-none-elf with cross-compiled check-gcc on an ARMv8.1
emulator. Also re-ran the cross-compiled
gcc.target/aarch64/advsimd-intrinsics tests for aarch64-none-elf on an
ARMv8 emulator.

Matthew

gcc/
2015-11-24  Matthew Wahab  

* gcc/config/aarch64/arm_neon.h (vqrdmlah_s16, vqrdmlah_s32): New.
(vqrdmlahq_s16, vqrdmlahq_s32): New.
(vqrdmlsh_s16, vqrdmlsh_s32): New.
(vqrdmlshq_s16, vqrdmlshq_s32): New.

gcc/testsuite
2015-11-24  Matthew Wahab  

* gcc.target/aarch64/advsimd-intrinsics/vqrdmlXh.inc: New file,
support code for vqrdml{as}h tests.
* gcc.target/aarch64/advsimd-intrinsics/vqrdmlah.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vqrdmlsh.c: New.


From e623828ac2d033a9a51766d9843a650aab9f42e9 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Thu, 27 Aug 2015 13:22:41 +0100
Subject: [PATCH 6/7] Add neon intrinsics: vqrdmlah, vqrdmlsh.

Change-Id: I5c7f8d36ee980d280c1d50f6f212b286084c5acf
---
 gcc/config/aarch64/arm_neon.h  |  53 
 .../aarch64/advsimd-intrinsics/vqrdmlXh.inc| 138 +
 .../aarch64/advsimd-intrinsics/vqrdmlah.c  |  57 +
 .../aarch64/advsimd-intrinsics/vqrdmlsh.c  |  61 +
 4 files changed, 309 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlXh.inc
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlah.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlsh.c

diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 138b108..63f1627 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -11213,6 +11213,59 @@ vbslq_u64 (uint64x2_t __a, uint64x2_t __b, uint64x2_t __c)
   return __builtin_aarch64_simd_bslv2di_ (__a, __b, __c);
 }
 
+/* ARMv8.1 intrinsics.  */
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8.1-a")
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vqrdmlah_s16 (int16x4_t __a, int16x4_t __b, int16x4_t __c)
+{
+  return __builtin_aarch64_sqrdmlahv4hi (__a, __b, __c);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vqrdmlah_s32 (int32x2_t __a, int32x2_t __b, int32x2_t __c)
+{
+  return __builtin_aarch64_sqrdmlahv2si (__a, __b, __c);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vqrdmlahq_s16 (int16x8_t __a, int16x8_t __b, int16x8_t __c)
+{
+  return __builtin_aarch64_sqrdmlahv8hi (__a, __b, __c);
+}
+
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vqrdmlahq_s32 (int32x4_t __a, int32x4_t __b, int32x4_t __c)
+{
+  return __builtin_aarch64_sqrdmlahv4si (__a, __b, __c);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vqrdmlsh_s16 (int16x4_t __a, int16x4_t __b, int16x4_t __c)
+{
+  return __builtin_aarch64_sqrdmlshv4hi (__a, __b, __c);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vqrdmlsh_s32 (int32x2_t __a, int32x2_t __b, int32x2_t __c)
+{
+  return __builtin_aarch64_sqrdmlshv2si (__a, __b, __c);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vqrdmlshq_s16 (int16x8_t __a, int16x8_t __b, int16x8_t __c)
+{
+  return __builtin_aa

Re: [AArch64][PATCH 7/7] Add NEON intrinsics vqrdmlah_lane and vqrdmlsh_lane.

2015-11-25 Thread Matthew Wahab

On 23/11/15 13:37, James Greenhalgh wrote:

On Fri, Oct 23, 2015 at 01:30:46PM +0100, Matthew Wahab wrote:

The ARMv8.1 architecture extension adds two Adv.SIMD instructions,
sqrdmlah and sqrdmlsh. This patch adds the NEON intrinsics vqrdmlah_lane
and vqrdmlsh_lane for these instructions. The new intrinsics are of the
form vqrdml{as}h[q]_lane_<type>.




diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 9e73809..9b68e4a 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -10675,6 +10675,59 @@ vqrdmulhq_laneq_s32 (int32x4_t __a, int32x4_t __b, 
const int __c)
return __builtin_aarch64_sqrdmulh_laneqv4si (__a, __b, __c);
  }

+#pragma GCC push_options
+#pragma GCC target ("arch=armv8.1-a")


Rather than strict alphabetical order, can we group everything which is
under one set of extensions together, to save on the push_options/pop_options
pairs.



Attached the reworked patch that keeps the ARMv8.1 intrinsics together,
bracketed by a single target pragma.

Retested aarch64-none-elf with cross-compiled check-gcc on an ARMv8.1
emulator. Also re-ran the cross-compiled
gcc.target/aarch64/advsimd-intrinsics tests for aarch64-none-elf on an
ARMv8 emulator.

Matthew

gcc/
2015-11-24  Matthew Wahab  

* gcc/config/aarch64/arm_neon.h
(vqrdmlah_laneq_s16, vqrdmlah_laneq_s32): New.
(vqrdmlahq_laneq_s16, vqrdmlahq_laneq_s32): New.
(vqrdmlsh_laneq_s16, vqrdmlsh_laneq_s32): New.
(vqrdmlshq_laneq_s16, vqrdmlshq_laneq_s32): New.
(vqrdmlah_lane_s16, vqrdmlah_lane_s32): New.
(vqrdmlahq_lane_s16, vqrdmlahq_lane_s32): New.
(vqrdmlahh_s16, vqrdmlahh_lane_s16, vqrdmlahh_laneq_s16): New.
(vqrdmlahs_s32, vqrdmlahs_lane_s32, vqrdmlahs_laneq_s32): New.
(vqrdmlsh_lane_s16, vqrdmlsh_lane_s32): New.
(vqrdmlshq_lane_s16, vqrdmlshq_lane_s32): New.
(vqrdmlshh_s16, vqrdmlshh_lane_s16, vqrdmlshh_laneq_s16): New.
(vqrdmlshs_s32, vqrdmlshs_lane_s32, vqrdmlshs_laneq_s32): New.

gcc/testsuite
2015-11-24  Matthew Wahab  

* gcc.target/aarch64/advsimd-intrinsics/vqrdmlXh_lane.inc: New file,
support code for vqrdml{as}h_lane tests.
* gcc.target/aarch64/advsimd-intrinsics/vqrdmlah_lane.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vqrdmlsh_lane.c: New.

From 03cb214eaf07cceb65f0dc07dca1be739bfe5375 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Thu, 27 Aug 2015 14:17:26 +0100
Subject: [PATCH 7/7] Add neon intrinsics: vqrdmlah_lane, vqrdmlsh_lane.

Change-Id: I6d7a372e0a5b83ef0846ab62abbe9b24ada69fc4
---
 gcc/config/aarch64/arm_neon.h  | 168 +
 .../aarch64/advsimd-intrinsics/vqrdmlXh_lane.inc   | 154 +++
 .../aarch64/advsimd-intrinsics/vqrdmlah_lane.c |  57 +++
 .../aarch64/advsimd-intrinsics/vqrdmlsh_lane.c |  61 
 4 files changed, 440 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlXh_lane.inc
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlah_lane.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlsh_lane.c

diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 63f1627..56db339 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -11264,6 +11264,174 @@ vqrdmlshq_s32 (int32x4_t __a, int32x4_t __b, int32x4_t __c)
 {
   return __builtin_aarch64_sqrdmlshv4si (__a, __b, __c);
 }
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vqrdmlah_laneq_s16 (int16x4_t __a, int16x4_t __b, int16x8_t __c, const int __d)
+{
+  return  __builtin_aarch64_sqrdmlah_laneqv4hi (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vqrdmlah_laneq_s32 (int32x2_t __a, int32x2_t __b, int32x4_t __c, const int __d)
+{
+  return __builtin_aarch64_sqrdmlah_laneqv2si (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vqrdmlahq_laneq_s16 (int16x8_t __a, int16x8_t __b, int16x8_t __c, const int __d)
+{
+  return __builtin_aarch64_sqrdmlah_laneqv8hi (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vqrdmlahq_laneq_s32 (int32x4_t __a, int32x4_t __b, int32x4_t __c, const int __d)
+{
+  return __builtin_aarch64_sqrdmlah_laneqv4si (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vqrdmlsh_laneq_s16 (int16x4_t __a, int16x4_t __b, int16x8_t __c, const int __d)
+{
+  return  __builtin_aarch64_sqrdmlsh_laneqv4hi (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vqrdmlsh_laneq_s32 (int32x2_t __a, int32x2_t __b, int32x4_t __c, const int __d)
+{
+  return __builtin_aarch64_sqrdmlsh_laneqv2si (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int16x8_t 

[PATCH 1/7][ARM] Add support for ARMv8.1.

2015-11-26 Thread Matthew Wahab

Hello,


ARMv8.1 includes an extension to ARM which adds two Adv.SIMD
instructions, vqrdmlah and vqrdmlsh. This patch set adds support for
ARMv8.1 and for the new instructions, enabling the architecture with
-march=armv8.1-a. The new instructions are enabled when both ARMv8.1
and suitable fpu options are set, for instance with -march=armv8.1-a
-mfpu=neon-fp-armv8 -mfloat-abi=hard.
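
For example, the architecture macros can be inspected to check that
the new option is accepted (illustrative; a dedicated feature macro is
added in patch 4/7):

arm-none-eabi-gcc -march=armv8.1-a -mfpu=neon-fp-armv8 \
    -mfloat-abi=hard -dM -E - < /dev/null | grep __ARM_ARCH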

This patch set adds the command line options and internal feature
macros. Following patches
- enable multilib support for ARMv8.1,
- add patterns for the new instructions,
- add the ACLE feature macro for the ARMv8.1 extensions,
- extend target support in the testsuite to ARMv8.1,
- add the ACLE intrinsics for vqrmdl{as}h and
- add the ACLE intrinsics for vqrmdl{as}h_lane.

Tested the series for arm-none-eabi with cross-compiled check-gcc on an
ARMv8.1 emulator. Also tested arm-none-linux-gnueabihf with native
bootstrap and make check.

Is this ok for trunk?
Matthew

gcc/
2015-11-26  Matthew Wahab  

* config/arm/arm-arches.def: Add "armv8.1-a" and "armv8.1-a+crc".
* config/arm/arm-protos.h (FL2_ARCH8_1): New.
(FL2_FOR_ARCH8_1A): New.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/arm.c (arm_arch8_1): New.
(arm_option_override): Set arm_arch8_1.
* config/arm/arm.h (TARGET_NEON_RDMA): New.
(arm_arch8_1): Declare.
* doc/invoke.texi (ARM Options, -march): Add "armv8.1-a" and
"armv8.1-a+crc".
(ARM Options, -mfpu): Fix a typo.
From 3ee3a16839c1c316906e33f5384da05ee70dd831 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Tue, 1 Sep 2015 11:31:25 +0100
Subject: [PATCH 1/7] [ARM] Add ARMv8.1 architecture flags and options.

Change-Id: I6bb0c7f020613a1a17e40bccc28b00c30d644c70
---
 gcc/config/arm/arm-arches.def |  5 +
 gcc/config/arm/arm-protos.h   |  3 +++
 gcc/config/arm/arm-tables.opt | 10 --
 gcc/config/arm/arm.c  |  4 
 gcc/config/arm/arm.h  |  6 ++
 gcc/doc/invoke.texi   |  6 +++---
 6 files changed, 29 insertions(+), 5 deletions(-)

diff --git a/gcc/config/arm/arm-arches.def b/gcc/config/arm/arm-arches.def
index ddf6c3c..6c83153 100644
--- a/gcc/config/arm/arm-arches.def
+++ b/gcc/config/arm/arm-arches.def
@@ -57,6 +57,11 @@ ARM_ARCH("armv7-m", cortexm3,	7M,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC |	  FL_FOR_
 ARM_ARCH("armv7e-m", cortexm4,  7EM,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC |	  FL_FOR_ARCH7EM))
 ARM_ARCH("armv8-a", cortexa53,  8A,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_FOR_ARCH8A))
 ARM_ARCH("armv8-a+crc",cortexa53, 8A,   ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_CRC32  | FL_FOR_ARCH8A))
+ARM_ARCH ("armv8.1-a", cortexa53,  8A,
+	  ARM_FSET_MAKE (FL_CO_PROC | FL_FOR_ARCH8A,  FL2_FOR_ARCH8_1A))
+ARM_ARCH ("armv8.1-a+crc",cortexa53, 8A,
+	  ARM_FSET_MAKE (FL_CO_PROC | FL_CRC32 | FL_FOR_ARCH8A,
+			 FL2_FOR_ARCH8_1A))
 ARM_ARCH("iwmmxt",  iwmmxt, 5TE,	ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_STRONG | FL_FOR_ARCH5TE | FL_XSCALE | FL_IWMMXT))
 ARM_ARCH("iwmmxt2", iwmmxt2,5TE,	ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_STRONG | FL_FOR_ARCH5TE | FL_XSCALE | FL_IWMMXT | FL_IWMMXT2))
 
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index e4b8fb3..c3eb6d3 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -388,6 +388,8 @@ extern bool arm_is_constant_pool_ref (rtx);
 #define FL_IWMMXT2(1 << 30)   /* "Intel Wireless MMX2 technology".  */
 #define FL_ARCH6KZ(1 << 31)   /* ARMv6KZ architecture.  */
 
+#define FL2_ARCH8_1   (1 << 0)	  /* Architecture 8.1.  */
+
 /* Flags that only effect tuning, not available instructions.  */
 #define FL_TUNE		(FL_WBUF | FL_VFPV2 | FL_STRONG | FL_LDSCHED \
 			 | FL_CO_PROC)
@@ -416,6 +418,7 @@ extern bool arm_is_constant_pool_ref (rtx);
 #define FL_FOR_ARCH7M	(FL_FOR_ARCH7 | FL_THUMB_DIV)
 #define FL_FOR_ARCH7EM  (FL_FOR_ARCH7M | FL_ARCH7EM)
 #define FL_FOR_ARCH8A	(FL_FOR_ARCH7VE | FL_ARCH8)
+#define FL2_FOR_ARCH8_1A	FL2_ARCH8_1
 
 /* There are too many feature bits to fit in a single word so the set of cpu and
fpu capabilities is a structure.  A feature set is created and manipulated
diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-tables.opt
index 48aac41..db17f6e 100644
--- a/gcc/config/arm/arm-tables.opt
+++ b/gcc/config/arm/arm-tables.opt
@@ -416,10 +416,16 @@ EnumValue
 Enum(arm_arch) String(armv8-a+crc) Value(26)
 
 EnumValue
-Enum(arm_arch) String(iwmmxt) Value(27)
+Enum(arm_arch) String(armv8.1-a) Value(27)
 
 EnumValue
-Enum(arm_arch) String(iwmmxt2) Value(28)
+Enum(arm_arch) String(armv8.1-a+crc) Value(28)
+
+EnumValue
+Enum(arm_arch) String(iwmmxt) Value(29)
+
+EnumValue
+Enum(arm_arch) String(iwmmxt2) Value(30)
 
 Enum
 Name(arm_fpu) Type(int)
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index e0cdc2

[PATCH 2/7][ARM] Multilib support for ARMv8.1.

2015-11-26 Thread Matthew Wahab

This patch sets up multilib support for ARMv8.1, treating it as a
synonym for ARMv8. Since ARMv8.1 integer, FP or SIMD
instructions are only generated for the new, instruction-specific
intrinsics, mapping to ARMv8 rather than adding a new multilib variant
is sufficient.

Tested the series for arm-none-eabi with cross-compiled check-gcc on an
ARMv8.1 emulator. Also tested arm-none-linux-gnueabihf with native
bootstrap and make check.

Ok for trunk?
Matthew

gcc/
2015-11-26  Matthew Wahab  

* config/arm/t-aprofile: Make "armv8.1-a" and "armv8.1-a+crc"
matches for "armv8-a".

From 9cd389bf72cff391423e17423f4624904aff5474 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Fri, 23 Oct 2015 09:37:12 +0100
Subject: [PATCH 2/7] [ARM] Multilib support for ARMv8.1

Change-Id: I65ee77768e22452ac15452cf6d4fdec3079ef852
---
 gcc/config/arm/t-aprofile | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/config/arm/t-aprofile b/gcc/config/arm/t-aprofile
index cf34161..b23f1bc 100644
--- a/gcc/config/arm/t-aprofile
+++ b/gcc/config/arm/t-aprofile
@@ -98,6 +98,8 @@ MULTILIB_MATCHES   += march?armv8-a=mcpu?xgene1
 
 # Arch Matches
 MULTILIB_MATCHES   += march?armv8-a=march?armv8-a+crc
+MULTILIB_MATCHES   += march?armv8-a=march?armv8.1-a
+MULTILIB_MATCHES   += march?armv8-a=march?armv8.1-a+crc
 
 # FPU matches
 MULTILIB_MATCHES   += mfpu?vfpv3-d16=mfpu?vfpv3
-- 
2.1.4



[PATCH 3/7][ARM] Add patterns for new instructions

2015-11-26 Thread Matthew Wahab

Hello,

This patch adds patterns for the instructions, vqrdmlah and vqrdmlsh,
introduced in the ARMv8.1 architecture. The instructions are made
available when -march=armv8.1-a is enabled with suitable fpu settings,
such as -mfpu=neon-fp-armv8 -mfloat-abi=hard.
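
For reference, a scalar model of what one 16-bit lane of vqrdmlah
computes, following my reading of the architecture pseudocode (a
sketch, not part of the patch; assumes arithmetic right shift of
signed values):

#include <stdint.h>

static int16_t
sqrdmlah_model (int16_t acc, int16_t a, int16_t b)
{
  /* ((acc << 16) + 2*a*b + rounding constant) >> 16, saturated to
     16 bits; vqrdmlsh subtracts the product term instead.  */
  int64_t t = ((int64_t) acc << 16)
	      + 2 * (int64_t) a * (int64_t) b
	      + (1 << 15);
  t >>= 16;
  if (t > INT16_MAX)
    t = INT16_MAX;
  if (t < INT16_MIN)
    t = INT16_MIN;
  return (int16_t) t;
}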

Tested the series for arm-none-eabi with cross-compiled check-gcc on an
ARMv8.1 emulator. Also tested arm-none-linux-gnueabihf with native
bootstrap and make check.

Ok for trunk?
Matthew

gcc/
2015-11-26  Matthew Wahab  

* config/arm/iterators.md (VQRDMLH_AS): New.
(neon_rdma_as): New.
* config/arm/neon.md
(neon_vqrdmlh): New.
(neon_vqrdmlh_lane): New.
* config/arm/unspecs.md (UNSPEC_VQRDMLAH): New.
(UNSPEC_VQRDMLSH): New.

From fea646491d51548b775fdfb5a4fd6d6bc72d4c83 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Wed, 17 Jun 2015 12:00:50 +0100
Subject: [PATCH 3/7] [ARM] Add patterns for new instructions.

Change-Id: Ia84c345019c7beda2d3c6c39074043d2e005347a
---
 gcc/config/arm/iterators.md |  5 +
 gcc/config/arm/neon.md  | 45 +
 gcc/config/arm/unspecs.md   |  2 ++
 3 files changed, 52 insertions(+)

diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 6a54125..c7a6880 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -362,6 +362,8 @@
 (define_int_iterator CRYPTO_SELECTING [UNSPEC_SHA1C UNSPEC_SHA1M
UNSPEC_SHA1P])
 
+(define_int_iterator VQRDMLH_AS [UNSPEC_VQRDMLAH UNSPEC_VQRDMLSH])
+
 ;;
 ;; Mode attributes
 ;;
@@ -831,3 +833,6 @@
(simple_return " && use_simple_return_p ()")])
 (define_code_attr return_cond_true [(return " && USE_RETURN_INSN (TRUE)")
(simple_return " && use_simple_return_p ()")])
+
+;; Attributes for VQRDMLAH/VQRDMLSH
+(define_int_attr neon_rdma_as [(UNSPEC_VQRDMLAH "a") (UNSPEC_VQRDMLSH "s")])
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 62fb6da..844ef5e 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -2014,6 +2014,18 @@
   [(set_attr "type" "neon_sat_mul_")]
 )
 
+;; vqrdmlah, vqrdmlsh
+(define_insn "neon_vqrdmlh"
+  [(set (match_operand:VMDQI 0 "s_register_operand" "=w")
+	(unspec:VMDQI [(match_operand:VMDQI 1 "s_register_operand" "0")
+		   (match_operand:VMDQI 2 "s_register_operand" "w")
+		   (match_operand:VMDQI 3 "s_register_operand" "w")]
+		  VQRDMLH_AS))]
+  "TARGET_NEON_RDMA"
+  "vqrdmlh.\t%0, %2, %3"
+  [(set_attr "type" "neon_sat_mla__long")]
+)
+
 (define_insn "neon_vqdmlal"
   [(set (match_operand: 0 "s_register_operand" "=w")
 (unspec: [(match_operand: 1 "s_register_operand" "0")
@@ -3176,6 +3188,39 @@ if (BYTES_BIG_ENDIAN)
   [(set_attr "type" "neon_sat_mul__scalar_q")]
 )
 
+;; vqrdmlah_lane, vqrdmlsh_lane
+(define_insn "neon_vqrdmlh_lane"
+  [(set (match_operand:VMQI 0 "s_register_operand" "=w")
+	(unspec:VMQI [(match_operand:VMQI 1 "s_register_operand" "0")
+		  (match_operand:VMQI 2 "s_register_operand" "w")
+		  (match_operand: 3 "s_register_operand"
+	  "")
+		  (match_operand:SI 4 "immediate_operand" "i")]
+		 VQRDMLH_AS))]
+  "TARGET_NEON_RDMA"
+{
+  return
+   "vqrdmlh.\t%q0, %q2, %P3[%c4]";
+}
+  [(set_attr "type" "neon_mla__scalar")]
+)
+
+(define_insn "neon_vqrdmlh_lane"
+  [(set (match_operand:VMDI 0 "s_register_operand" "=w")
+	(unspec:VMDI [(match_operand:VMDI 1 "s_register_operand" "0")
+		  (match_operand:VMDI 2 "s_register_operand" "w")
+		  (match_operand:VMDI 3 "s_register_operand"
+	  "")
+		  (match_operand:SI 4 "immediate_operand" "i")]
+		 VQRDMLH_AS))]
+  "TARGET_NEON_RDMA"
+{
+  return
+   "vqrdmlh.\t%P0, %P2, %P3[%c4]";
+}
+  [(set_attr "type" "neon_mla__scalar")]
+)
+
 (define_insn "neon_vmla_lane"
   [(set (match_operand:VMD 0 "s_register_operand" "=w")
 	(unspec:VMD [(match_operand:VMD 1 "s_register_operand" "0")
diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index 44d4e7d..e7ae9a2 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -360,5 +360,7 @@
   UNSPEC_NVRINTX
   UNSPEC_NVRINTA
   UNSPEC_NVRINTN
+  UNSPEC_VQRDMLAH
+  UNSPEC_VQRDMLSH
 ])
 
-- 
2.1.4



[PATCH 4/7][ARM] Add ACLE feature macro for ARMv8.1 instructions.

2015-11-26 Thread Matthew Wahab

Hello,

This patch adds the feature macro __ARM_FEATURE_QRDMX to indicate the
presence of the ARMv8.1 instructions vqrdmlah and vqrdmlsh. It is
defined when the instructions are available, as it is when
-march=armv8.1-a is enabled with suitable fpu options.
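
Code can then guard its use of the new intrinsics on the macro, for
example (a sketch):

#ifdef __ARM_FEATURE_QRDMX
  /* vqrdmlah/vqrdmlsh intrinsics are available.  */
#else
  /* Fall back to a vqrdmulh/vqadd sequence, which saturates the
     intermediate product, unlike the fused instruction.  */
#endif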

Tested the series for arm-none-eabi with cross-compiled check-gcc on an
ARMv8.1 emulator. Also tested arm-none-linux-gnueabihf with native
bootstrap and make check.

Ok for trunk?
Matthew

gcc/
2015-11-26  Matthew Wahab  

* config/arm/arm-c.c (arm_cpu_builtins): Define __ARM_FEATURE_QRDMX.

From 4009cf5c0455429a415be9ca239ac09ac86b17dd Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Wed, 17 Jun 2015 13:25:09 +0100
Subject: [PATCH 4/7] [ARM] Add __ARM_FEATURE_QRDMX

Change-Id: I26cde507e8844a731e4fd857fbd30bf87f213f89
---
 gcc/config/arm/arm-c.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c
index c336a16..6bf740b 100644
--- a/gcc/config/arm/arm-c.c
+++ b/gcc/config/arm/arm-c.c
@@ -66,6 +66,8 @@ arm_cpu_builtins (struct cpp_reader* pfile)
   def_or_undef_macro (pfile, "__ARM_FEATURE_SAT", TARGET_ARM_SAT);
   def_or_undef_macro (pfile, "__ARM_FEATURE_CRYPTO", TARGET_CRYPTO);
 
+  if (TARGET_NEON_RDMA)
+builtin_define ("__ARM_FEATURE_QRDMX");
   if (unaligned_access)
 builtin_define ("__ARM_FEATURE_UNALIGNED");
   if (TARGET_CRC32)
-- 
2.1.4



[PATCH 5/7][Testsuite] Support ARMv8.1 ARM tests.

2015-11-26 Thread Matthew Wahab

Hello,

This patch adds ARMv8.1 support to GCC Dejagnu, to allow ARM
tests to specify targets and to set up command line options.
It builds on the ARMv8.1 target support added for AArch64 tests, partly
reworking that support to take into account the different configurations
that tests may be run under.

The main changes are
- add_options_for_arm_v8_1a_neon: Call
  check_effective_target_arm_v8_1a_neon_ok to select a suitable set of
  options.
- check_effective_target_arm_v8_1a_neon_ok: Test possible command line
  options, recording the first set that works.
- check_effective_target_arm_v8_1a_neon_hw: Add a test for ARM targets.

Tested the series for arm-none-eabi with cross-compiled check-gcc on an
ARMv8.1 emulator. Also tested arm-none-linux-gnueabihf with native
bootstrap and make check.

Ok for trunk?
Matthew

testsuite/
2015-11-26  Matthew Wahab  

* lib/target-supports.exp (add_options_for_arm_v8_1a_neon): Update
comment.  Use check_effective_target_arm_v8_1a_neon_ok to select
the command line options.
(check_effective_target_arm_v8_1a_neon_ok_nocache): Update initial
test to allow ARM targets.  Select and record a working set of
command line options.
(check_effective_target_arm_v8_1a_neon_hw): Add tests for ARM
targets.



[PATCH 6/7][ARM] Add ACLE intrinsics vqrdmlah and vqrdmlsh

2015-11-26 Thread Matthew Wahab

Hello,

This patch adds the ACLE intrinsics for the instructions introduced in
ARMv8.1. It adds the vqrdmlah and vqrdmlsh forms of the intrinsics to
the arm_neon.h header, together with the ARM builtins used to implement
them. The intrinsics are available when -march=armv8.1-a is enabled
together with appropriate fpu options.
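
A minimal use of the new intrinsics (a sketch, not part of the patch;
the function name is made up):

#include <arm_neon.h>

/* Per lane: saturating accumulate of the rounding doubling
   multiply-high of a and b.  Build with -march=armv8.1-a
   -mfpu=neon-fp-armv8 -mfloat-abi=hard.  */
int16x8_t
mla_high (int16x8_t acc, int16x8_t a, int16x8_t b)
{
  return vqrdmlahq_s16 (acc, a, b);
}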

Tested the series for arm-none-eabi with cross-compiled check-gcc on an
ARMv8.1 emulator. Also tested arm-none-linux-gnueabihf with native
bootstrap and make check.

Ok for trunk?
Matthew

gcc/
2015-11-26  Matthew Wahab  

* config/arm/arm_neon.h (vqrdmlah_s16, vqrdmlah_s32): New.
(vqrdmlahq_s16, vqrdmlahq_s32): New.
(vqrdmlsh_s16, vqrdmlsh_s32): New.
(vqrdmlshq_s16, vqrdmlshq_s32): New.
* config/arm/arm_neon_builtins.def: Add "vqrdmlah" and "vqrdmlsh".

From 93e9db5bf06172f18f4e89e9533c66d8a0c4f2ca Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Tue, 1 Sep 2015 16:21:44 +0100
Subject: [PATCH 6/7] [ARM] Add neon intrinsics vqrdmlah, vqrdmlsh.

Change-Id: Ic40ff4d477f36ec01714c68e3b83b66208c7958b
---
 gcc/config/arm/arm_neon.h| 50 
 gcc/config/arm/arm_neon_builtins.def |  2 ++
 2 files changed, 52 insertions(+)

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 0a33d21..b617f80 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -1158,6 +1158,56 @@ vqrdmulhq_s32 (int32x4_t __a, int32x4_t __b)
   return (int32x4_t)__builtin_neon_vqrdmulhv4si (__a, __b);
 }
 
+#ifdef __ARM_FEATURE_QRDMX
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vqrdmlah_s16 (int16x4_t __a, int16x4_t __b, int16x4_t __c)
+{
+  return (int16x4_t)__builtin_neon_vqrdmlahv4hi (__a, __b, __c);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vqrdmlah_s32 (int32x2_t __a, int32x2_t __b, int32x2_t __c)
+{
+  return (int32x2_t)__builtin_neon_vqrdmlahv2si (__a, __b, __c);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vqrdmlahq_s16 (int16x8_t __a, int16x8_t __b, int16x8_t __c)
+{
+  return (int16x8_t)__builtin_neon_vqrdmlahv8hi (__a, __b, __c);
+}
+
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vqrdmlahq_s32 (int32x4_t __a, int32x4_t __b, int32x4_t __c)
+{
+  return (int32x4_t)__builtin_neon_vqrdmlahv4si (__a, __b, __c);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vqrdmlsh_s16 (int16x4_t __a, int16x4_t __b, int16x4_t __c)
+{
+  return (int16x4_t)__builtin_neon_vqrdmlshv4hi (__a, __b, __c);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vqrdmlsh_s32 (int32x2_t __a, int32x2_t __b, int32x2_t __c)
+{
+  return (int32x2_t)__builtin_neon_vqrdmlshv2si (__a, __b, __c);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vqrdmlshq_s16 (int16x8_t __a, int16x8_t __b, int16x8_t __c)
+{
+  return (int16x8_t)__builtin_neon_vqrdmlshv8hi (__a, __b, __c);
+}
+
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vqrdmlshq_s32 (int32x4_t __a, int32x4_t __b, int32x4_t __c)
+{
+  return (int32x4_t)__builtin_neon_vqrdmlshv4si (__a, __b, __c);
+}
+#endif
+
 __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
 vmull_s8 (int8x8_t __a, int8x8_t __b)
 {
diff --git a/gcc/config/arm/arm_neon_builtins.def b/gcc/config/arm/arm_neon_builtins.def
index 0b719df..8d5c0ca 100644
--- a/gcc/config/arm/arm_neon_builtins.def
+++ b/gcc/config/arm/arm_neon_builtins.def
@@ -45,6 +45,8 @@ VAR4 (BINOP, vqdmulh, v4hi, v2si, v8hi, v4si)
 VAR4 (BINOP, vqrdmulh, v4hi, v2si, v8hi, v4si)
 VAR2 (TERNOP, vqdmlal, v4hi, v2si)
 VAR2 (TERNOP, vqdmlsl, v4hi, v2si)
+VAR4 (TERNOP, vqrdmlah, v4hi, v2si, v8hi, v4si)
+VAR4 (TERNOP, vqrdmlsh, v4hi, v2si, v8hi, v4si)
 VAR3 (BINOP, vmullp, v8qi, v4hi, v2si)
 VAR3 (BINOP, vmulls, v8qi, v4hi, v2si)
 VAR3 (BINOP, vmullu, v8qi, v4hi, v2si)
-- 
2.1.4



[PATCH 7/7][ARM] Add ACLE intrinsics vqrdmlah_lane and vqrdmlsh_lane

2015-11-26 Thread Matthew Wahab

Hello,

This patch adds the ACLE intrinsics for the instructions introduced in
ARMv8.1. It adds the vqrdmlah_lane and vqrdmlsh_lane forms of the
intrinsics to the arm_neon.h header, together with the ARM builtins
used to implement them. The intrinsics are available when
-march=armv8.1-a is enabled together with appropriate fpu options.

Tested the series for arm-none-eabi with cross-compiled check-gcc on an
ARMv8.1 emulator. Also tested arm-none-linux-gnueabihf with native
bootstrap and make check.

Ok for trunk?
Matthew

gcc/
2015-11-26  Matthew Wahab  

* config/arm/arm_neon.h (vqrdmlahq_lane_s16): New.
(vqrdmlahq_lane_s32): New.
(vqrdmlah_lane_s16): New.
(vqrdmlah_lane_s32): New.
(vqrdmlshq_lane_s16): New.
(vqrdmlshq_lane_s32): New.
(vqrdmlsh_lane_s16): New.
(vqrdmlsh_lane_s32): New.
* config/arm/arm_neon_builtins.def: Add "vqrdmlah_lane" and
"vqrdmlsh_lane".



Re: [PATCH 5/7][Testsuite] Support ARMv8.1 ARM tests.

2015-11-26 Thread Matthew Wahab

Attached the missing patch.
Matthew

On 26/11/15 16:02, Matthew Wahab wrote:

Hello,

This patch adds ARMv8.1 support to GCC Dejagnu, to allow ARM
tests to specify targets and to set up command line options.
It builds on the ARMv8.1 target support added for AArch64 tests, partly
reworking that support to take into account the different configurations
that tests may be run under.

The main changes are
- add_options_for_arm_v8_1a_neon: Call
   check_effective_target_arm_v8_1a_neon_ok to select a suitable set of
   options.
- check_effective_target_arm_v8_1a_neon_ok: Test possible command line
   options, recording the first set that works.
- check_effective_target_arm_v8_1a_neon_hw: Add a test for ARM targets.

Tested the series for arm-none-eabi with cross-compiled check-gcc on an
ARMv8.1 emulator. Also tested arm-none-linux-gnueabihf with native
bootstrap and make check.

Ok for trunk?
Matthew

testsuite/
2015-11-26  Matthew Wahab  

 * lib/target-supports.exp (add_options_for_arm_v8_1a_neon): Update
 comment.  Use check_effective_target_arm_v8_1a_neon_ok to select
 the command line options.
 (check_effective_target_arm_v8_1a_neon_ok_nocache): Update initial
 test to allow ARM targets.  Select and record a working set of
 command line options.
 (check_effective_target_arm_v8_1a_neon_hw): Add tests for ARM
 targets.



From 6f767289ce83be88bc088c7adf66d137ed335762 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Fri, 9 Oct 2015 17:38:12 +0100
Subject: [PATCH 5/7] [Testsuite] Support ARMv8.1 NEON on ARM.

Change-Id: I35436b64996789d54f215d66ed4b0ec5ffe48e37
---
 gcc/testsuite/lib/target-supports.exp | 56 +--
 1 file changed, 41 insertions(+), 15 deletions(-)

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index dcd51fd..34bb45d 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -2816,14 +2816,15 @@ proc add_options_for_arm_v8_neon { flags } {
 return "$flags $et_arm_v8_neon_flags -march=armv8-a"
 }
 
-# Add the options needed for ARMv8.1 Adv.SIMD.
+# Add the options needed for ARMv8.1 Adv.SIMD.  Also adds the ARMv8 NEON
+# options for AArch64 and for ARM.
 
 proc add_options_for_arm_v8_1a_neon { flags } {
-if { [istarget aarch64*-*-*] } {
-	return "$flags -march=armv8.1-a"
-} else {
+if { ! [check_effective_target_arm_v8_1a_neon_ok] } {
 	return "$flags"
 }
+global et_arm_v8_1a_neon_flags
+return "$flags $et_arm_v8_1a_neon_flags -march=armv8.1-a"
 }
 
 proc add_options_for_arm_crc { flags } {
@@ -3271,17 +3272,29 @@ proc check_effective_target_arm_neonv2_hw { } {
 }
 
 # Return 1 if the target supports the ARMv8.1 Adv.SIMD extension, 0
-# otherwise.  The test is valid for AArch64.
+# otherwise.  The test is valid for AArch64 and ARM.
 
 proc check_effective_target_arm_v8_1a_neon_ok_nocache { } {
-if { ![istarget aarch64*-*-*] } {
-	return 0
+global et_arm_v8_1a_neon_flags
+set et_arm_v8_1a_neon_flags ""
+
+if { ![istarget arm*-*-*] && ![istarget aarch64*-*-*] } {
+	return 0;
 }
-return [check_no_compiler_messages_nocache arm_v8_1a_neon_ok assembly {
-	#if !defined (__ARM_FEATURE_QRDMX)
-	#error "__ARM_FEATURE_QRDMX not defined"
-	#endif
-} [add_options_for_arm_v8_1a_neon ""]]
+
+foreach flags {"" "-mfpu=neon-fp-armv8" "-mfloat-abi=softfp" \
+		   "-mfpu=neon-fp-armv8 -mfloat-abi=softfp"} {
+	if { [check_no_compiler_messages_nocache arm_v8_1a_neon_ok object {
+	#if !defined (__ARM_FEATURE_QRDMX)
+	#error "__ARM_FEATURE_QRDMX not defined"
+	#endif
+	} "$flags -march=armv8.1-a"] } {
+	set et_arm_v8_1a_neon_flags "$flags -march=armv8.1-a"
+	return 1
+	}
+}
+
+return 0;
 }
 
 proc check_effective_target_arm_v8_1a_neon_ok { } {
@@ -3308,16 +3321,17 @@ proc check_effective_target_arm_v8_neon_hw { } {
 }
 
 # Return 1 if the target supports executing the ARMv8.1 Adv.SIMD extension, 0
-# otherwise.  The test is valid for AArch64.
+# otherwise.  The test is valid for AArch64 and ARM.
 
 proc check_effective_target_arm_v8_1a_neon_hw { } {
 if { ![check_effective_target_arm_v8_1a_neon_ok] } {
 	return 0;
 }
-return [check_runtime_nocache arm_v8_1a_neon_hw_available {
+return [check_runtime arm_v8_1a_neon_hw_available {
 	int
 	main (void)
 	{
+	  #ifdef __ARM_ARCH_ISA_A64
 	  __Int32x2_t a = {0, 1};
 	  __Int32x2_t b = {0, 2};
 	  __Int32x2_t result;
@@ -3327,9 +3341,21 @@ proc check_effective_target_arm_v8_1a_neon_hw { } {
 	   : "w"(a), "w"(b)
 	   : /* No clobbers.  */);
 
+	  #else
+
+	  __simd64_int32_t a = {0, 1};
+	  __simd64_int32_t b = {0, 2};
+	  __simd64_int32_t result;
+
+	  asm ("vqrdmlah.s32 %P0, %P1, %P2"
+	   : "=w"(result)
+	   : "w"(a), "w"(b)

Re: [PATCH 7/7][ARM] Add ACLE intrinsics vqrdmlah_lane and vqrdmlsh_lane

2015-11-26 Thread Matthew Wahab

Attached the missing patch.
Matthew

On 26/11/15 16:04, Matthew Wahab wrote:

Hello,

This patch adds the ACLE intrinsics for the instructions introduced in
ARMv8.1. It adds the vqrdmlah_lane and vqrdmlsh_lane forms of the
intrinsics to the arm_neon.h header, together with the ARM builtins
used to implement them. The intrinsics are available when
-march=armv8.1-a is enabled together with appropriate fpu options.

Tested the series for arm-none-eabi with cross-compiled check-gcc on an
ARMv8.1 emulator. Also tested arm-none-linux-gnueabihf with native
bootstrap and make check.

Ok for trunk?
Matthew

gcc/
2015-11-26  Matthew Wahab  

 * config/arm/arm_neon.h (vqrdmlahq_lane_s16): New.
 (vqrdmlahq_lane_s32): New.
 (vqrdmlah_lane_s16): New.
 (vqrdmlah_lane_s32): New.
 (vqrdmlshq_lane_s16): New.
 (vqrdmlshq_lane_s32): New.
 (vqrdmlsh_lane_s16): New.
 (vqrdmlsh_lane_s32): New.
 * config/arm/arm_neon_builtins.def: Add "vqrdmlah_lane" and
 "vqrdmlsh_lane".



From cdfee6be49e52056de8999fbc33a432f2cc7254f Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Tue, 1 Sep 2015 16:22:34 +0100
Subject: [PATCH 7/7] [ARM] Add neon intrinsics vqrdmlah_lane, vqrdmlsh_lane.

Change-Id: Ia0ab4bbe683af2d019d18a34302a7b9798193a79
---
 gcc/config/arm/arm_neon.h| 50 
 gcc/config/arm/arm_neon_builtins.def |  2 ++
 2 files changed, 52 insertions(+)

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index b617f80..ed50253 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -7096,6 +7096,56 @@ vqrdmulh_lane_s32 (int32x2_t __a, int32x2_t __b, const int __c)
   return (int32x2_t)__builtin_neon_vqrdmulh_lanev2si (__a, __b, __c);
 }
 
+#ifdef __ARM_FEATURE_QRDMX
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vqrdmlahq_lane_s16 (int16x8_t __a, int16x8_t __b, int16x4_t __c, const int __d)
+{
+  return (int16x8_t)__builtin_neon_vqrdmlah_lanev8hi (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vqrdmlahq_lane_s32 (int32x4_t __a, int32x4_t __b, int32x2_t __c, const int __d)
+{
+  return (int32x4_t)__builtin_neon_vqrdmlah_lanev4si (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vqrdmlah_lane_s16 (int16x4_t __a, int16x4_t __b, int16x4_t __c, const int __d)
+{
+  return (int16x4_t)__builtin_neon_vqrdmlah_lanev4hi (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vqrdmlah_lane_s32 (int32x2_t __a, int32x2_t __b, int32x2_t __c, const int __d)
+{
+  return (int32x2_t)__builtin_neon_vqrdmlah_lanev2si (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vqrdmlshq_lane_s16 (int16x8_t __a, int16x8_t __b, int16x4_t __c, const int __d)
+{
+  return (int16x8_t)__builtin_neon_vqrdmlsh_lanev8hi (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vqrdmlshq_lane_s32 (int32x4_t __a, int32x4_t __b, int32x2_t __c, const int __d)
+{
+  return (int32x4_t)__builtin_neon_vqrdmlsh_lanev4si (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vqrdmlsh_lane_s16 (int16x4_t __a, int16x4_t __b, int16x4_t __c, const int __d)
+{
+  return (int16x4_t)__builtin_neon_vqrdmlsh_lanev4hi (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vqrdmlsh_lane_s32 (int32x2_t __a, int32x2_t __b, int32x2_t __c, const int __d)
+{
+  return (int32x2_t)__builtin_neon_vqrdmlsh_lanev2si (__a, __b, __c, __d);
+}
+#endif
+
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
 vmul_n_s16 (int16x4_t __a, int16_t __b)
 {
diff --git a/gcc/config/arm/arm_neon_builtins.def b/gcc/config/arm/arm_neon_builtins.def
index 8d5c0ca..1fdb2a8 100644
--- a/gcc/config/arm/arm_neon_builtins.def
+++ b/gcc/config/arm/arm_neon_builtins.def
@@ -60,6 +60,8 @@ VAR4 (BINOP, vqdmulh_n, v4hi, v2si, v8hi, v4si)
 VAR4 (BINOP, vqrdmulh_n, v4hi, v2si, v8hi, v4si)
 VAR4 (SETLANE, vqdmulh_lane, v4hi, v2si, v8hi, v4si)
 VAR4 (SETLANE, vqrdmulh_lane, v4hi, v2si, v8hi, v4si)
+VAR4 (MAC_LANE, vqrdmlah_lane, v4hi, v2si, v8hi, v4si)
+VAR4 (MAC_LANE, vqrdmlsh_lane, v4hi, v2si, v8hi, v4si)
 VAR2 (BINOP, vqdmull, v4hi, v2si)
 VAR8 (BINOP, vshls, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
 VAR8 (BINOP, vshlu, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
-- 
2.1.4



Re: [AArch64] Rework ARMv8.1 command line options.

2015-11-27 Thread Matthew Wahab

On 24/11/15 15:22, James Greenhalgh wrote:
> On Mon, Nov 16, 2015 at 04:31:32PM +0000, Matthew Wahab wrote:
>>
>> The command line options for target selection allow ARMv8.1 extensions
>> to be individually enabled/disabled. They also allow the extensions to
>> be enabled with -march=armv8-a. This doesn't reflect the ARMv8.1
>> architecture which requires all extensions to be enabled and doesn't make
>> them available for ARMv8.
>>
>> This patch removes the options for the individual ARMv8.1 extensions
>> except for +lse. This means that setting -march=armv8.1-a will enable
>> all extensions required by ARMv8.1 and that the ARMv8.1 extensions can't
>> be used with -march=armv8.

> I think I mentioned it in another review, but this patch seems a good place
> to solve the problem. Could you please update the documentation to explain
> what you've written above. As it stands I find myself confused by which
> features GCC will make available at -march=armv8-a and -march=armv8.1-a.

Attached is a patch with the documentation for the AArch64 -march option
reworked to try to make it clearer what the -march=armv8.1-a option will
do. Extensions with feature modifiers (+crc, +lse) are explicitly stated
as being enabled by -march=armv8.1-a. Extensions without feature
modifiers (RDMA, PAN, LOR) are treated as part of the generic 'ARMv8.1
architecture extension' term in the description of -march=armv8.1-a.
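
Concretely, this gives invocations like the following (illustrative;
the +nolse spelling assumes the usual +no<ext> form of the feature
modifiers):

  gcc -march=armv8.1-a test.c         # ARMv8.1, crc and lse enabled
  gcc -march=armv8.1-a+nolse test.c   # ARMv8.1 with lse disabled again
  gcc -march=armv8-a+crc test.c       # ARMv8 with only the crc extension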

I've also rearranged the -march section, to put the description of the
values for -march together and reworded the description of the
-march=native option.

Matthew

2015-11-26  Matthew Wahab  

* config/aarch64/aarch64-option-extensions.def: Remove
AARCH64_FL_RDMA from "fp" and "simd".  Remove "pan", "lor",
"rdma".
* config/aarch64/aarch64.h (AARCH64_FL_PAN): Remove.
(AARCH64_FL_LOR): Remove.
(AARCH64_FL_RDMA): Remove.
(AARCH64_FL_V8_1): New.
(AARCH64_FL_FOR_AARCH8_1): Replace AARCH64_FL_PAN, AARCH64_FL_LOR
and AARCH64_FL_RDMA with AARCH64_FL_V8_1.
(AARCH64_ISA_RDMA): Replace AARCH64_FL_RDMA with AARCH64_FL_V8_1.
* doc/invoke.texi (AArch64 -march): Rewrite initial paragraph and
section on -march=native.  Group descriptions of permitted
architecture names together.  Expand description of
-march=armv8.1-a.
(AArch64 -mtune): Slightly rework section on -march=native.
(AArch64 -mcpu): Slightly rework section on -march=native.
(AArch64 Feature Modifiers): Remove "pan", "lor" and "rdma".
State that -march=armv8.1-a enables "crc" and "lse".

diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def
index b261a0f7c3c6f5264fe4f95c85a59535aa951ce4..4f1d53515a9a4ff8920fadb13164c85e39990db5 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -34,11 +34,10 @@
should contain a whitespace-separated list of the strings in 'Features'
that are required.  Their order is not important.  */
 
-AARCH64_OPT_EXTENSION("fp",	AARCH64_FL_FP,  AARCH64_FL_FPSIMD | AARCH64_FL_CRYPTO | AARCH64_FL_RDMA, "fp")
-AARCH64_OPT_EXTENSION("simd",	AARCH64_FL_FPSIMD,  AARCH64_FL_SIMD | AARCH64_FL_CRYPTO | AARCH64_FL_RDMA,   "asimd")
+AARCH64_OPT_EXTENSION ("fp", AARCH64_FL_FP,
+		   AARCH64_FL_FPSIMD | AARCH64_FL_CRYPTO, "fp")
+AARCH64_OPT_EXTENSION ("simd", AARCH64_FL_FPSIMD,
+		   AARCH64_FL_SIMD | AARCH64_FL_CRYPTO, "asimd")
 AARCH64_OPT_EXTENSION("crypto",	AARCH64_FL_CRYPTO | AARCH64_FL_FPSIMD,  AARCH64_FL_CRYPTO,   "aes pmull sha1 sha2")
 AARCH64_OPT_EXTENSION("crc",	AARCH64_FL_CRC, AARCH64_FL_CRC,"crc32")
 AARCH64_OPT_EXTENSION("lse",	AARCH64_FL_LSE, AARCH64_FL_LSE,"lse")
-AARCH64_OPT_EXTENSION("pan",	AARCH64_FL_PAN,		AARCH64_FL_PAN,		"pan")
-AARCH64_OPT_EXTENSION("lor",	AARCH64_FL_LOR,		AARCH64_FL_LOR,		"lor")
-AARCH64_OPT_EXTENSION("rdma",	AARCH64_FL_RDMA | AARCH64_FL_FPSIMD,	AARCH64_FL_RDMA,	"rdma")
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 68c006fa91f6326140cf447c7f4578ac46c24f79..06345f0215ea190b7b089264a0039a201437ecec 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -134,9 +134,7 @@ extern unsigned aarch64_architecture_version;
 #define AARCH64_FL_CRC(1 << 3)	/* Has CRC.  */
 /* ARMv8.1 architecture extensions.  */
 #define AA

Re: [PATCH 1/7][ARM] Add support for ARMv8.1.

2015-11-27 Thread Matthew Wahab

On 27/11/15 14:05, Christophe Lyon wrote:

On 26 November 2015 at 16:55, Matthew Wahab  wrote:



ARMv8.1 includes an extension to ARM which adds two Adv.SIMD
instructions, vqrdmlah and vqrdmlsh. This patch set adds support for
ARMv8.1 and for the new instructions, enabling the architecture with
-march=armv8.1-a. The new instructions are enabled when both ARMv8.1
and suitable fpu options are set, for instance with -march=armv8.1-a
-mfpu=neon-fp-armv8 -mfloat-abi=hard.
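
For example, an illustrative command line for a hard-float ARM target
(any compatible cross compiler works the same way):

  arm-none-linux-gnueabihf-gcc -O2 -march=armv8.1-a \
      -mfpu=neon-fp-armv8 -mfloat-abi=hard test.c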

This patch set adds the command line options and internal feature
macros. Following patches
- enable multilib support for ARMv8.1,
- add patterns for the new instructions,
- add the ACLE feature macro for the ARMv8.1 extensions,
- extend target support in the testsuite to ARMv8.1,
- add the ACLE intrinsics for vqrdml{as}h and
- add the ACLE intrinsics for vqrdml{as}h_lane.





The whole series LGTM, but do you plan to add tests for the new intrinsics?


The Adv.SIMD intrinsics tests are in gcc.target/aarch64/advsimd-intrinsics, they get 
run for both AArch64 and ARM backends. The tests for the new intrinsics were added 
(yesterday) by the AArch64 version of this patch.


Matthew


Re: [PATCH 5/7][Testsuite] Support ARMv8.1 ARM tests.

2015-11-27 Thread Matthew Wahab

On 27/11/15 13:44, Christophe Lyon wrote:

On 26/11/15 16:02, Matthew Wahab wrote:



This patch adds ARMv8.1 support to GCC Dejagnu, to allow ARM
tests to specify targets and to set up command line options.
It builds on the ARMv8.1 target support added for AArch64 tests, partly
reworking that support to take into account the different configurations
that tests may be run under.



I may be mistaken, but -mfpu=neon-fp-armv8 and -mfloat-abi=softfp are not
supported by aarch64-gcc. So it seems to me that
check_effective_target_arm_v8_1a_neon_ok_nocache will not always work
for aarch64 after your patch.



Or does it work because no option is needed and thus "" always
matches and thus the loop always exits after the first iteration
on aarch64?


Yes, the idea is that the empty string will make the function first try 
'-march=armv8.1-a' without any other flag. That will work for AArch64 because it 
doesn't need any other option.



Maybe a more accurate comment would help remembering that, in case
-mfpu option becomes necessary for aarch64.



Agreed, it's worth having a comment to explain what the 'foreach' construct is 
doing.

Matthew




Re: [AArch64] Rework ARMv8.1 command line options.

2015-12-07 Thread Matthew Wahab

Ping. Updated patch attached.

Matthew

On 27/11/15 09:23, Matthew Wahab wrote:

On 24/11/15 15:22, James Greenhalgh wrote:
 > On Mon, Nov 16, 2015 at 04:31:32PM +0000, Matthew Wahab wrote:
 >>
 >> The command line options for target selection allow ARMv8.1 extensions
 >> to be individually enabled/disabled. They also allow the extensions to
 >> be enabled with -march=armv8-a. This doesn't reflect the ARMv8.1
 >> architecture which requires all extensions to be enabled and doesn't make
 >> them available for ARMv8.
 >>
 >> This patch removes the options for the individual ARMv8.1 extensions
 >> except for +lse. This means that setting -march=armv8.1-a will enable
 >> all extensions required by ARMv8.1 and that the ARMv8.1 extensions can't
 >> be used with -march=armv8.

 > I think I mentioned it in another review, but this patch seems a good place
 > to solve the problem. Could you please update the documentation to explain
 > what you've written above. As it stands I find myself confused by which
 > features GCC will make available at -march=armv8-a and -march=armv8.1-a.

Attached is a patch with the documentation for the AArch64 -march option
reworked to try to make it clearer what the -march=armv8.1-a option will
do. Extensions with feature modifiers (+crc, +lse) are explicitly stated
as being enabled by -march=armv8.1-a. Extensions without feature
modifiers (RDMA, PAN, LOR) are treated as part of the generic 'ARMv8.1
architecture extension' term in the description of -march=armv8.1-a.

I've also rearranged the -march section, to put the description of the
values for -march together and reworded the description of the
-march=native option.

Matthew

2015-11-26  Matthew Wahab  

 * config/aarch64/aarch64-option-extensions.def: Remove
 AARCH64_FL_RDMA from "fp" and "simd".  Remove "pan", "lor",
 "rdma".
 * config/aarch64/aarch64.h (AARCH64_FL_PAN): Remove.
 (AARCH64_FL_LOR): Remove.
 (AARCH64_FL_RDMA): Remove.
 (AARCH64_FL_V8_1): New.
 (AARCH64_FL_FOR_AARCH8_1): Replace AARCH64_FL_PAN, AARCH64_FL_LOR
 and AARCH64_FL_RDMA with AARCH64_FL_V8_1.
 (AARCH64_ISA_RDMA): Replace AARCH64_FL_RDMA with AARCH64_FL_V8_1.
 * doc/invoke.texi (AArch64 -march): Rewrite initial paragraph and
 section on -march=native.  Group descriptions of permitted
 architecture names together.  Expand description of
 -march=armv8.1-a.
 (AArch64 -mtune): Slightly rework section on -march=native.
 (AArch64 -mcpu): Slightly rework section on -march=native.
 (AArch64 Feature Modifiers): Remove "pan", "lor" and "rdma".
 State that -march=armv8.1-a enables "crc" and "lse".



From 498323fc1992cd75070e86f195d4bba09a5e02e0 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Fri, 30 Oct 2015 10:32:59 +
Subject: [PATCH] [AArch64] Rework ARMv8.1 command line options.

Change-Id: Ib9053719f45980255a3d7727e226a53d9f214049
---
 gcc/config/aarch64/aarch64-option-extensions.def |  9 ++---
 gcc/config/aarch64/aarch64.h |  9 ++---
 gcc/doc/invoke.texi  | 47 
 3 files changed, 30 insertions(+), 35 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def
index b261a0f..4f1d535 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -34,11 +34,10 @@
should contain a whitespace-separated list of the strings in 'Features'
that are required.  Their order is not important.  */
 
-AARCH64_OPT_EXTENSION("fp",	AARCH64_FL_FP,  AARCH64_FL_FPSIMD | AARCH64_FL_CRYPTO | AARCH64_FL_RDMA, "fp")
-AARCH64_OPT_EXTENSION("simd",	AARCH64_FL_FPSIMD,  AARCH64_FL_SIMD | AARCH64_FL_CRYPTO | AARCH64_FL_RDMA,   "asimd")
+AARCH64_OPT_EXTENSION ("fp", AARCH64_FL_FP,
+		   AARCH64_FL_FPSIMD | AARCH64_FL_CRYPTO, "fp")
+AARCH64_OPT_EXTENSION ("simd", AARCH64_FL_FPSIMD,
+		   AARCH64_FL_SIMD | AARCH64_FL_CRYPTO, "asimd")
 AARCH64_OPT_EXTENSION("crypto",	AARCH64_FL_CRYPTO | AARCH64_FL_FPSIMD,  AARCH64_FL_CRYPTO,   "aes pmull sha1 sha2")
 AARCH64_OPT_EXTENSION("crc",	AARCH64_FL_CRC, AARCH64_FL_CRC,"crc32")
 AARCH64_OPT_EXTENSION("lse",	AARCH64_FL_LSE, AARCH64_FL_LSE,"lse")
-AARCH64_OPT_EXTENSION("pan",	AARCH64_FL_PAN,		AARCH64_FL_PAN,		"pan")
-AARCH64_OPT_EXTENSION("lor",	AARCH64_FL_LOR,		AARCH64_FL_LOR,		"lor")
-AARCH64_OPT_EXTENSION("rdma",	AA

Re: [Fortran, Patch] Memory sync after coarray image control statements and assignment

2015-12-07 Thread Matthew Wahab

On 07/12/15 10:06, Tobias Burnus wrote:

I wrote:

I wonder whether using

__asm__ __volatile__ ("":::"memory");

would be sufficient as it has a way lower overhead than
__sync_synchronize().


Namely, something like the attached patch.

Regarding the original patch submission: Is there a reason that you didn't
include the test case of Deepak from 
https://gcc.gnu.org/ml/fortran/2015-04/msg00062.html
It should work as -fcoarray=lib -lcaf_single "dg-do run" test.

Tobias



I don't know anything about Fortran or coarrays and I'm curious whether this affects 
architectures with weak memory models. Is the barrier only needed to stop reordering 
by the compiler or does it also need to stop reordering by the hardware?


Matthew




Re: [PATCH 1/7][ARM] Add support for ARMv8.1.

2015-12-07 Thread Matthew Wahab

Ping. Updated patch attached.
Matthew

On 26/11/15 15:55, Matthew Wahab wrote:

Hello,


ARMv8.1 includes an extension to ARM which adds two Adv.SIMD
instructions, vqrdmlah and vqrdmlsh. This patch set adds support for
ARMv8.1 and for the new instructions, enabling the architecture with
-march=armv8.1-a. The new instructions are enabled when both ARMv8.1
and suitable fpu options are set, for instance with -march=armv8.1-a
-mfpu=neon-fp-armv8 -mfloat-abi=hard.

This patch set adds the command line options and internal feature
macros. Following patches
- enable multilib support for ARMv8.1,
- add patterns for the new instructions,
- add the ACLE feature macro for the ARMv8.1 extensions,
- extend target support in the testsuite to ARMv8.1,
- add the ACLE intrinsics for vqrdml{as}h and
- add the ACLE intrinsics for vqrdml{as}h_lane.

Tested the series for arm-none-eabi with cross-compiled check-gcc on an
ARMv8.1 emulator. Also tested arm-none-linux-gnueabihf with native
bootstrap and make check.

Is this ok for trunk?
Matthew

gcc/
2015-11-26  Matthew Wahab  

 * config/arm/arm-arches.def: Add "armv8.1-a" and "armv8.1-a+crc".
 * config/arm/arm-protos.h (FL2_ARCH8_1): New.
 (FL2_FOR_ARCH8_1A): New.
 * config/arm/arm-tables.opt: Regenerate.
 * config/arm/arm.c (arm_arch8_1): New.
 (arm_option_override): Set arm_arch8_1.
 * config/arm/arm.h (TARGET_NEON_RDMA): New.
 (arm_arch8_1): Declare.
 * doc/invoke.texi (ARM Options, -march): Add "armv8.1-a" and
 "armv8.1-a+crc".
 (ARM Options, -mfpu): Fix a typo.


From 65bcf9a875fd31f6201e64cbbd4fdfb0b8f4719e Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Tue, 1 Sep 2015 11:31:25 +0100
Subject: [PATCH 1/7] [ARM] Add ARMv8.1 architecture flags and options.

Change-Id: I6bb0c7f020613a1a17e40bccc28b00c30d644c70
---
 gcc/config/arm/arm-arches.def |  5 +
 gcc/config/arm/arm-protos.h   |  3 +++
 gcc/config/arm/arm-tables.opt | 10 --
 gcc/config/arm/arm.c  |  4 
 gcc/config/arm/arm.h  |  6 ++
 gcc/doc/invoke.texi   |  6 +++---
 6 files changed, 29 insertions(+), 5 deletions(-)

diff --git a/gcc/config/arm/arm-arches.def b/gcc/config/arm/arm-arches.def
index ddf6c3c..6c83153 100644
--- a/gcc/config/arm/arm-arches.def
+++ b/gcc/config/arm/arm-arches.def
@@ -57,6 +57,11 @@ ARM_ARCH("armv7-m", cortexm3,	7M,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC |	  FL_FOR_
 ARM_ARCH("armv7e-m", cortexm4,  7EM,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC |	  FL_FOR_ARCH7EM))
 ARM_ARCH("armv8-a", cortexa53,  8A,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_FOR_ARCH8A))
 ARM_ARCH("armv8-a+crc",cortexa53, 8A,   ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_CRC32  | FL_FOR_ARCH8A))
+ARM_ARCH ("armv8.1-a", cortexa53,  8A,
+	  ARM_FSET_MAKE (FL_CO_PROC | FL_FOR_ARCH8A,  FL2_FOR_ARCH8_1A))
+ARM_ARCH ("armv8.1-a+crc",cortexa53, 8A,
+	  ARM_FSET_MAKE (FL_CO_PROC | FL_CRC32 | FL_FOR_ARCH8A,
+			 FL2_FOR_ARCH8_1A))
 ARM_ARCH("iwmmxt",  iwmmxt, 5TE,	ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_STRONG | FL_FOR_ARCH5TE | FL_XSCALE | FL_IWMMXT))
 ARM_ARCH("iwmmxt2", iwmmxt2,5TE,	ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_STRONG | FL_FOR_ARCH5TE | FL_XSCALE | FL_IWMMXT | FL_IWMMXT2))
 
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index e7328e7..d649e86 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -387,6 +387,8 @@ extern bool arm_is_constant_pool_ref (rtx);
 #define FL_IWMMXT2(1 << 30)   /* "Intel Wireless MMX2 technology".  */
 #define FL_ARCH6KZ(1 << 31)   /* ARMv6KZ architecture.  */
 
+#define FL2_ARCH8_1   (1 << 0)	  /* Architecture 8.1.  */
+
 /* Flags that only effect tuning, not available instructions.  */
 #define FL_TUNE		(FL_WBUF | FL_VFPV2 | FL_STRONG | FL_LDSCHED \
 			 | FL_CO_PROC)
@@ -415,6 +417,7 @@ extern bool arm_is_constant_pool_ref (rtx);
 #define FL_FOR_ARCH7M	(FL_FOR_ARCH7 | FL_THUMB_DIV)
 #define FL_FOR_ARCH7EM  (FL_FOR_ARCH7M | FL_ARCH7EM)
 #define FL_FOR_ARCH8A	(FL_FOR_ARCH7VE | FL_ARCH8)
+#define FL2_FOR_ARCH8_1A	FL2_ARCH8_1
 
 /* There are too many feature bits to fit in a single word so the set of cpu and
fpu capabilities is a structure.  A feature set is created and manipulated
diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-tables.opt
index 48aac41..db17f6e 100644
--- a/gcc/config/arm/arm-tables.opt
+++ b/gcc/config/arm/arm-tables.opt
@@ -416,10 +416,16 @@ EnumValue
 Enum(arm_arch) String(armv8-a+crc) Value(26)
 
 EnumValue
-Enum(arm_arch) String(iwmmxt) Value(27)
+Enum(arm_arch) String(armv8.1-a) Value(27)
 
 EnumValue
-Enum(arm_arch) String(iwmmxt2) Value(28)
+Enum(arm_arch) String(armv8.1-a+crc) Value(28)
+
+EnumValue
+Enum(arm_arch) String(iwmmxt) Value(29)
+
+EnumValue
+Enum(arm_arch) String(iwmmxt2) Value(30)
 
 Enum
 Name(arm_fpu) Type(int)
diff --git a/

Re: [PATCH 2/7][ARM] Multilib support for ARMv8.1.

2015-12-07 Thread Matthew Wahab

Ping. Updated patch attached.
Matthew

On 26/11/15 15:58, Matthew Wahab wrote:

This patch sets up multilib support for ARMv8.1, treating it as a
synonym for ARMv8. Since ARMv8.1 integer, FP or SIMD
instructions are only generated for the new, instruction-specific
instrinsics, mapping to ARMv8 rather than adding a new multilib variant
is sufficient.

Tested the series for arm-none-eabi with cross-compiled check-gcc on an
ARMv8.1 emulator. Also tested arm-none-linux-gnueabihf with native
bootstrap and make check.

Ok for trunk?
Matthew

gcc/
2015-11-26  Matthew Wahab  

 * config/arm/t-aprofile: Make "armv8.1-a" and "armv8.1-a+crc"
 matches for "armv8-a".



From c5c0f983e03135fe0cde29077353b429c0c502a2 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Fri, 23 Oct 2015 09:37:12 +0100
Subject: [PATCH 2/7] [ARM] Multilib support for ARMv8.1

Change-Id: I65ee77768e22452ac15452cf6d4fdec3079ef852
---
 gcc/config/arm/t-aprofile | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/config/arm/t-aprofile b/gcc/config/arm/t-aprofile
index cf34161..b23f1bc 100644
--- a/gcc/config/arm/t-aprofile
+++ b/gcc/config/arm/t-aprofile
@@ -98,6 +98,8 @@ MULTILIB_MATCHES   += march?armv8-a=mcpu?xgene1
 
 # Arch Matches
 MULTILIB_MATCHES   += march?armv8-a=march?armv8-a+crc
+MULTILIB_MATCHES   += march?armv8-a=march?armv8.1-a
+MULTILIB_MATCHES   += march?armv8-a=march?armv8.1-a+crc
 
 # FPU matches
 MULTILIB_MATCHES   += mfpu?vfpv3-d16=mfpu?vfpv3
-- 
2.1.4



Re: [PATCH 3/7][ARM] Add patterns for new instructions

2015-12-07 Thread Matthew Wahab

Ping. Updated patch attached.
Matthew

On 26/11/15 16:00, Matthew Wahab wrote:

Hello,

This patch adds patterns for the instructions, vqrdmlah and vqrdmlsh,
introduced in the ARMv8.1 architecture. The instructions are made
available when -march=armv8.1-a is enabled with suitable fpu settings,
such as -mfpu=neon-fp-armv8 -mfloat-abi=hard.

Tested the series for arm-none-eabi with cross-compiled check-gcc on an
ARMv8.1 emulator. Also tested arm-none-linux-gnueabihf with native
bootstrap and make check.

Ok for trunk?
Matthew

gcc/
2015-11-26  Matthew Wahab  

 * config/arm/iterators.md (VQRDMLH_AS): New.
 (neon_rdma_as): New.
 * config/arm/neon.md
 (neon_vqrdmlh): New.
 (neon_vqrdmlh_lane): New.
 * config/arm/unspecs.md (UNSPEC_VQRDMLAH): New.
 (UNSPEC_VQRDMLSH): New.



From 8b69bae2f0057be09d3cbe3fe3c29155085e260d Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Wed, 17 Jun 2015 12:00:50 +0100
Subject: [PATCH 3/7] [ARM] Add patterns for new instructions.

Change-Id: Ia84c345019c7beda2d3c6c39074043d2e005347a
---
 gcc/config/arm/iterators.md |  5 +
 gcc/config/arm/neon.md  | 45 +
 gcc/config/arm/unspecs.md   |  2 ++
 3 files changed, 52 insertions(+)

diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 6a54125..c7a6880 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -362,6 +362,8 @@
 (define_int_iterator CRYPTO_SELECTING [UNSPEC_SHA1C UNSPEC_SHA1M
UNSPEC_SHA1P])
 
+(define_int_iterator VQRDMLH_AS [UNSPEC_VQRDMLAH UNSPEC_VQRDMLSH])
+
 ;;
 ;; Mode attributes
 ;;
@@ -831,3 +833,6 @@
(simple_return " && use_simple_return_p ()")])
 (define_code_attr return_cond_true [(return " && USE_RETURN_INSN (TRUE)")
(simple_return " && use_simple_return_p ()")])
+
+;; Attributes for VQRDMLAH/VQRDMLSH
+(define_int_attr neon_rdma_as [(UNSPEC_VQRDMLAH "a") (UNSPEC_VQRDMLSH "s")])
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 62fb6da..844ef5e 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -2014,6 +2014,18 @@
   [(set_attr "type" "neon_sat_mul_")]
 )
 
+;; vqrdmlah, vqrdmlsh
+(define_insn "neon_vqrdmlh"
+  [(set (match_operand:VMDQI 0 "s_register_operand" "=w")
+	(unspec:VMDQI [(match_operand:VMDQI 1 "s_register_operand" "0")
+		   (match_operand:VMDQI 2 "s_register_operand" "w")
+		   (match_operand:VMDQI 3 "s_register_operand" "w")]
+		  VQRDMLH_AS))]
+  "TARGET_NEON_RDMA"
+  "vqrdmlh.\t%0, %2, %3"
+  [(set_attr "type" "neon_sat_mla__long")]
+)
+
 (define_insn "neon_vqdmlal"
   [(set (match_operand: 0 "s_register_operand" "=w")
 (unspec: [(match_operand: 1 "s_register_operand" "0")
@@ -3176,6 +3188,39 @@ if (BYTES_BIG_ENDIAN)
   [(set_attr "type" "neon_sat_mul__scalar_q")]
 )
 
+;; vqrdmlah_lane, vqrdmlsh_lane
+(define_insn "neon_vqrdmlh_lane"
+  [(set (match_operand:VMQI 0 "s_register_operand" "=w")
+	(unspec:VMQI [(match_operand:VMQI 1 "s_register_operand" "0")
+		  (match_operand:VMQI 2 "s_register_operand" "w")
+		  (match_operand: 3 "s_register_operand"
+	  "")
+		  (match_operand:SI 4 "immediate_operand" "i")]
+		 VQRDMLH_AS))]
+  "TARGET_NEON_RDMA"
+{
+  return
+   "vqrdmlh.\t%q0, %q2, %P3[%c4]";
+}
+  [(set_attr "type" "neon_mla__scalar")]
+)
+
+(define_insn "neon_vqrdmlh_lane"
+  [(set (match_operand:VMDI 0 "s_register_operand" "=w")
+	(unspec:VMDI [(match_operand:VMDI 1 "s_register_operand" "0")
+		  (match_operand:VMDI 2 "s_register_operand" "w")
+		  (match_operand:VMDI 3 "s_register_operand"
+	  "")
+		  (match_operand:SI 4 "immediate_operand" "i")]
+		 VQRDMLH_AS))]
+  "TARGET_NEON_RDMA"
+{
+  return
+   "vqrdmlh.\t%P0, %P2, %P3[%c4]";
+}
+  [(set_attr "type" "neon_mla__scalar")]
+)
+
 (define_insn "neon_vmla_lane"
   [(set (match_operand:VMD 0 "s_register_operand" "=w")
 	(unspec:VMD [(match_operand:VMD 1 "s_register_operand" "0")
diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index 67acafd..ffe703c 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -360,5 +360,7 @@
   UNSPEC_NVRINTX
   UNSPEC_NVRINTA
   UNSPEC_NVRINTN
+  UNSPEC_VQRDMLAH
+  UNSPEC_VQRDMLSH
 ])
 
-- 
2.1.4



Re: [PATCH 4/7][ARM] Add ACLE feature macro for ARMv8.1 instructions.

2015-12-07 Thread Matthew Wahab

Ping. Updated patch attached.
Matthew


On 26/11/15 16:01, Matthew Wahab wrote:

Hello,

This patch adds the feature macro __ARM_FEATURE_QRDMX to indicate the
presence of the ARMv8.1 instructions vqrdmlah and vqrdmlsh. It is
defined when the instructions are available, as it is when
-march=armv8.1-a is enabled with suitable fpu options.
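
For reference, a sketch of how user code would key off the macro.  The
fallback shown is only an approximation: vqrdmlah rounds and saturates
in a single step, so the two-instruction sequence is not bit-exact.

  #include <arm_neon.h>

  int16x4_t
  mla (int16x4_t acc, int16x4_t a, int16x4_t b)
  {
  #ifdef __ARM_FEATURE_QRDMX
    return vqrdmlah_s16 (acc, a, b);              /* one instruction */
  #else
    return vqadd_s16 (acc, vqrdmulh_s16 (a, b));  /* approximate fallback */
  #endif
  }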

Tested the series for arm-none-eabi with cross-compiled check-gcc on an
ARMv8.1 emulator. Also tested arm-none-linux-gnueabihf with native
bootstrap and make check.

Ok for trunk?
Matthew

gcc/
2015-11-26  Matthew Wahab  

 * config/arm/arm-c.c (arm_cpu_builtins): Define __ARM_FEATURE_QRDMX.



From 721586aad45f7f75a0c198517602125c9d8f76f2 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Wed, 17 Jun 2015 13:25:09 +0100
Subject: [PATCH 4/7] [ARM] Add __ARM_FEATURE_QRDMX

Change-Id: I26cde507e8844a731e4fd857fbd30bf87f213f89
---
 gcc/config/arm/arm-c.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c
index 7dee28e..62c9304 100644
--- a/gcc/config/arm/arm-c.c
+++ b/gcc/config/arm/arm-c.c
@@ -68,6 +68,9 @@ arm_cpu_builtins (struct cpp_reader* pfile)
 
   def_or_undef_macro (pfile, "__ARM_FEATURE_UNALIGNED", unaligned_access);
 
+  if (TARGET_NEON_RDMA)
+builtin_define ("__ARM_FEATURE_QRDMX");
+
   if (TARGET_CRC32)
 builtin_define ("__ARM_FEATURE_CRC32");
 
-- 
2.1.4



Re: [PATCH 5/7][Testsuite] Support ARMv8.1 ARM tests.

2015-12-07 Thread Matthew Wahab

On 27/11/15 17:11, Matthew Wahab wrote:

On 27/11/15 13:44, Christophe Lyon wrote:

On 26/11/15 16:02, Matthew Wahab wrote:



This patch adds ARMv8.1 support to GCC Dejagnu, to allow ARM
tests to specify targest and to set up command line options.
It builds on the ARMv8.1 target support added for AArch64 tests, partly
reworking that support to take into account the different configurations
that tests may be run under.



I may be mistaken, but -mfpu=neon-fp-armv8 and -mfloat-abi=softfp are not
supported by aarch64-gcc. So it seems to me that
check_effective_target_arm_v8_1a_neon_ok_nocache will not always work
for aarch64 after your patch.



Or does it work because no option is needed and thus "" always
matches and thus the loop always exits after the first iteration
on aarch64?


Yes, the idea is that the empty string will make the function first try
'-march=armv8.1-a' without any other flag. That will work for AArch64 because it
doesn't need any other option.


Maybe a more accurate comment would help remembering that, in case
-mfpu option becomes necessary for aarch64.



Agreed, it's worth having a comment to explain what the 'foreach' construct is 
doing.

Matthew


I've added a comment to the foreach construct, to make it clearer what
it's doing.

Matthew

testsuite/
2015-12-07  Matthew Wahab  

* lib/target-supports.exp (add_options_for_arm_v8_1a_neon): Update
comment.  Use check_effective_target_arm_v8_1a_neon_ok to select
the command line options.
(check_effective_target_arm_v8_1a_neon_ok_nocache): Update initial
test to allow ARM targets.  Select and record a working set of
command line options.
(check_effective_target_arm_v8_1a_neon_hw): Add tests for ARM
targets.

From 7e2cd1ef475a5c7f4a4722b9ba32bd46e3b30eae Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Fri, 9 Oct 2015 17:38:12 +0100
Subject: [PATCH 5/7] [Testsuite] Support ARMv8.1 NEON on ARM.

---
 gcc/testsuite/lib/target-supports.exp | 60 ++-
 1 file changed, 45 insertions(+), 15 deletions(-)

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 4e349e9..6dfb6f6 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -2816,14 +2816,15 @@ proc add_options_for_arm_v8_neon { flags } {
 return "$flags $et_arm_v8_neon_flags -march=armv8-a"
 }
 
-# Add the options needed for ARMv8.1 Adv.SIMD.
+# Add the options needed for ARMv8.1 Adv.SIMD.  Also adds the ARMv8 NEON
+# options for AArch64 and for ARM.
 
 proc add_options_for_arm_v8_1a_neon { flags } {
-if { [istarget aarch64*-*-*] } {
-	return "$flags -march=armv8.1-a"
-} else {
+if { ! [check_effective_target_arm_v8_1a_neon_ok] } {
 	return "$flags"
 }
+global et_arm_v8_1a_neon_flags
+return "$flags $et_arm_v8_1a_neon_flags -march=armv8.1-a"
 }
 
 proc add_options_for_arm_crc { flags } {
@@ -3271,17 +3272,33 @@ proc check_effective_target_arm_neonv2_hw { } {
 }
 
 # Return 1 if the target supports the ARMv8.1 Adv.SIMD extension, 0
-# otherwise.  The test is valid for AArch64.
+# otherwise.  The test is valid for AArch64 and ARM.  Record the command
+# line options that are needed.
 
 proc check_effective_target_arm_v8_1a_neon_ok_nocache { } {
-if { ![istarget aarch64*-*-*] } {
-	return 0
+global et_arm_v8_1a_neon_flags
+set et_arm_v8_1a_neon_flags ""
+
+if { ![istarget arm*-*-*] && ![istarget aarch64*-*-*] } {
+	return 0;
 }
-return [check_no_compiler_messages_nocache arm_v8_1a_neon_ok assembly {
-	#if !defined (__ARM_FEATURE_QRDMX)
-	#error "__ARM_FEATURE_QRDMX not defined"
-	#endif
-} [add_options_for_arm_v8_1a_neon ""]]
+
+# Iterate through sets of options to find the compiler flags that
+# need to be added to the -march option.  Start with the empty set
+# since AArch64 only needs the -march setting.
+foreach flags {"" "-mfpu=neon-fp-armv8" "-mfloat-abi=softfp" \
+		   "-mfpu=neon-fp-armv8 -mfloat-abi=softfp"} {
+	if { [check_no_compiler_messages_nocache arm_v8_1a_neon_ok object {
+	#if !defined (__ARM_FEATURE_QRDMX)
+	#error "__ARM_FEATURE_QRDMX not defined"
+	#endif
+	} "$flags -march=armv8.1-a"] } {
+	set et_arm_v8_1a_neon_flags "$flags -march=armv8.1-a"
+	return 1
+	}
+}
+
+return 0;
 }
 
 proc check_effective_target_arm_v8_1a_neon_ok { } {
@@ -3308,16 +3325,17 @@ proc check_effective_target_arm_v8_neon_hw { } {
 }
 
 # Return 1 if the target supports executing the ARMv8.1 Adv.SIMD extension, 0
-# otherwise.  The test is valid for AArch64.
+# otherwise.  The test is valid for AArch64 and ARM.
 
 proc check_effective_target_arm_v8_1a_neon_hw { } {
if { ![check_effective_target_arm_v8_1a_neon_ok] } {

Re: [PATCH 7/7][ARM] Add ACLE intrinsics vqrdmlah_lane and vqrdmlsh_lane

2015-12-07 Thread Matthew Wahab

Ping. Updated patch attached.
Matthew

On 26/11/15 16:10, Matthew Wahab wrote:

Attached the missing patch.
Matthew

On 26/11/15 16:04, Matthew Wahab wrote:

Hello,

This patch adds the ACLE intrinsics for the instructions introduced in
ARMv8.1. It adds the vqrdmlah_lane and vqrdmlsh_lane forms of the
intrinsics to the arm_neon.h header, together with the ARM builtins
used to implement them. The intrinsics are available when
-march=armv8.1-a is enabled together with appropriate fpu options.

Tested the series for arm-none-eabi with cross-compiled check-gcc on an
ARMv8.1 emulator. Also tested arm-none-linux-gnueabihf with native
bootstrap and make check.

Ok for trunk?
Matthew

gcc/
2015-11-26  Matthew Wahab  

 * config/arm/arm_neon.h (vqrdmlahq_lane_s16): New.
 (vqrdmlahq_lane_s32): New.
 (vqrdmlah_lane_s16): New.
 (vqrdmlah_lane_s32): New.
 (vqrdmlshq_lane_s16): New.
 (vqrdmlshq_lane_s32): New.
 (vqrdmlsh_lane_s16): New.
 (vqrdmlsh_lane_s32): New.
 * config/arm/arm_neon_builtins.def: Add "vqrdmlah_lane" and
 "vqrdmlsh_lane".





From 9928f1e8e30c500933fa68f95311cf0f78dd6712 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Tue, 1 Sep 2015 16:22:34 +0100
Subject: [PATCH 7/7] [ARM] Add neon intrinsics vqrdmlah_lane, vqrdmlsh_lane.

Change-Id: Ia0ab4bbe683af2d019d18a34302a7b9798193a79
---
 gcc/config/arm/arm_neon.h| 50 
 gcc/config/arm/arm_neon_builtins.def |  2 ++
 2 files changed, 52 insertions(+)

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index b617f80..ed50253 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -7096,6 +7096,56 @@ vqrdmulh_lane_s32 (int32x2_t __a, int32x2_t __b, const int __c)
   return (int32x2_t)__builtin_neon_vqrdmulh_lanev2si (__a, __b, __c);
 }
 
+#ifdef __ARM_FEATURE_QRDMX
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vqrdmlahq_lane_s16 (int16x8_t __a, int16x8_t __b, int16x4_t __c, const int __d)
+{
+  return (int16x8_t)__builtin_neon_vqrdmlah_lanev8hi (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vqrdmlahq_lane_s32 (int32x4_t __a, int32x4_t __b, int32x2_t __c, const int __d)
+{
+  return (int32x4_t)__builtin_neon_vqrdmlah_lanev4si (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vqrdmlah_lane_s16 (int16x4_t __a, int16x4_t __b, int16x4_t __c, const int __d)
+{
+  return (int16x4_t)__builtin_neon_vqrdmlah_lanev4hi (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vqrdmlah_lane_s32 (int32x2_t __a, int32x2_t __b, int32x2_t __c, const int __d)
+{
+  return (int32x2_t)__builtin_neon_vqrdmlah_lanev2si (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vqrdmlshq_lane_s16 (int16x8_t __a, int16x8_t __b, int16x4_t __c, const int __d)
+{
+  return (int16x8_t)__builtin_neon_vqrdmlsh_lanev8hi (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vqrdmlshq_lane_s32 (int32x4_t __a, int32x4_t __b, int32x2_t __c, const int __d)
+{
+  return (int32x4_t)__builtin_neon_vqrdmlsh_lanev4si (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vqrdmlsh_lane_s16 (int16x4_t __a, int16x4_t __b, int16x4_t __c, const int __d)
+{
+  return (int16x4_t)__builtin_neon_vqrdmlsh_lanev4hi (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vqrdmlsh_lane_s32 (int32x2_t __a, int32x2_t __b, int32x2_t __c, const int __d)
+{
+  return (int32x2_t)__builtin_neon_vqrdmlsh_lanev2si (__a, __b, __c, __d);
+}
+#endif
+
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
 vmul_n_s16 (int16x4_t __a, int16_t __b)
 {
diff --git a/gcc/config/arm/arm_neon_builtins.def b/gcc/config/arm/arm_neon_builtins.def
index 8d5c0ca..1fdb2a8 100644
--- a/gcc/config/arm/arm_neon_builtins.def
+++ b/gcc/config/arm/arm_neon_builtins.def
@@ -60,6 +60,8 @@ VAR4 (BINOP, vqdmulh_n, v4hi, v2si, v8hi, v4si)
 VAR4 (BINOP, vqrdmulh_n, v4hi, v2si, v8hi, v4si)
 VAR4 (SETLANE, vqdmulh_lane, v4hi, v2si, v8hi, v4si)
 VAR4 (SETLANE, vqrdmulh_lane, v4hi, v2si, v8hi, v4si)
+VAR4 (MAC_LANE, vqrdmlah_lane, v4hi, v2si, v8hi, v4si)
+VAR4 (MAC_LANE, vqrdmlsh_lane, v4hi, v2si, v8hi, v4si)
 VAR2 (BINOP, vqdmull, v4hi, v2si)
 VAR8 (BINOP, vshls, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
 VAR8 (BINOP, vshlu, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
-- 
2.1.4



Re: [PATCH 6/7][ARM] Add ACLE intrinsics vqrdmlah and vqrdmlsh

2015-12-07 Thread Matthew Wahab

Ping. Updated patch attached.
Matthew

On 26/11/15 16:04, Matthew Wahab wrote:

Hello,

This patch adds the ACLE intrinsics for the instructions introduced in
ARMv8.1. It adds the vqrdmlah and vqrdmlsh forms of the intrinsics to
the arm_neon.h header, together with the ARM builtins used to implement
them. The intrinsics are available when -march=armv8.1-a is enabled
together with appropriate fpu options.

Tested the series for arm-none-eabi with cross-compiled check-gcc on an
ARMv8.1 emulator. Also tested arm-none-linux-gnueabihf with native
bootstrap and make check.

Ok for trunk?
Matthew

gcc/
2015-11-26  Matthew Wahab  

 * config/arm/arm_neon.h (vqrdmlah_s16, vqrdmlah_s32): New.
 (vqrdmlahq_s16, vqrdmlahq_s32): New.
 (vqrdmlsh_s16, vqrdmlsh_s32): New.
 (vqrdmlshq_s16, vqrdmlshq_s32): New.
 * config/arm/arm_neon_builtins.def: Add "vqrdmlah" and "vqrdmlsh".



From 1844027592d818e0de53a3da904ae6bfe1aef534 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Tue, 1 Sep 2015 16:21:44 +0100
Subject: [PATCH 6/7] [ARM] Add neon intrinsics vqrdmlah, vqrdmlsh.

Change-Id: Ic40ff4d477f36ec01714c68e3b83b66208c7958b
---
 gcc/config/arm/arm_neon.h| 50 
 gcc/config/arm/arm_neon_builtins.def |  2 ++
 2 files changed, 52 insertions(+)

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 0a33d21..b617f80 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -1158,6 +1158,56 @@ vqrdmulhq_s32 (int32x4_t __a, int32x4_t __b)
   return (int32x4_t)__builtin_neon_vqrdmulhv4si (__a, __b);
 }
 
+#ifdef __ARM_FEATURE_QRDMX
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vqrdmlah_s16 (int16x4_t __a, int16x4_t __b, int16x4_t __c)
+{
+  return (int16x4_t)__builtin_neon_vqrdmlahv4hi (__a, __b, __c);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vqrdmlah_s32 (int32x2_t __a, int32x2_t __b, int32x2_t __c)
+{
+  return (int32x2_t)__builtin_neon_vqrdmlahv2si (__a, __b, __c);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vqrdmlahq_s16 (int16x8_t __a, int16x8_t __b, int16x8_t __c)
+{
+  return (int16x8_t)__builtin_neon_vqrdmlahv8hi (__a, __b, __c);
+}
+
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vqrdmlahq_s32 (int32x4_t __a, int32x4_t __b, int32x4_t __c)
+{
+  return (int32x4_t)__builtin_neon_vqrdmlahv4si (__a, __b, __c);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vqrdmlsh_s16 (int16x4_t __a, int16x4_t __b, int16x4_t __c)
+{
+  return (int16x4_t)__builtin_neon_vqrdmlshv4hi (__a, __b, __c);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vqrdmlsh_s32 (int32x2_t __a, int32x2_t __b, int32x2_t __c)
+{
+  return (int32x2_t)__builtin_neon_vqrdmlshv2si (__a, __b, __c);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vqrdmlshq_s16 (int16x8_t __a, int16x8_t __b, int16x8_t __c)
+{
+  return (int16x8_t)__builtin_neon_vqrdmlshv8hi (__a, __b, __c);
+}
+
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vqrdmlshq_s32 (int32x4_t __a, int32x4_t __b, int32x4_t __c)
+{
+  return (int32x4_t)__builtin_neon_vqrdmlshv4si (__a, __b, __c);
+}
+#endif
+
 __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
 vmull_s8 (int8x8_t __a, int8x8_t __b)
 {
diff --git a/gcc/config/arm/arm_neon_builtins.def b/gcc/config/arm/arm_neon_builtins.def
index 0b719df..8d5c0ca 100644
--- a/gcc/config/arm/arm_neon_builtins.def
+++ b/gcc/config/arm/arm_neon_builtins.def
@@ -45,6 +45,8 @@ VAR4 (BINOP, vqdmulh, v4hi, v2si, v8hi, v4si)
 VAR4 (BINOP, vqrdmulh, v4hi, v2si, v8hi, v4si)
 VAR2 (TERNOP, vqdmlal, v4hi, v2si)
 VAR2 (TERNOP, vqdmlsl, v4hi, v2si)
+VAR4 (TERNOP, vqrdmlah, v4hi, v2si, v8hi, v4si)
+VAR4 (TERNOP, vqrdmlsh, v4hi, v2si, v8hi, v4si)
 VAR3 (BINOP, vmullp, v8qi, v4hi, v2si)
 VAR3 (BINOP, vmulls, v8qi, v4hi, v2si)
 VAR3 (BINOP, vmullu, v8qi, v4hi, v2si)
-- 
2.1.4



Re: [C] Issue an error on scalar va_list with reverse storage order

2015-12-08 Thread Matthew Wahab

Hello

On 03/12/15 14:53, Eric Botcazou wrote:

further testing revealed an issue with va_arg handling and reverse scalar 
storage
order on some platforms: when va_list is scalar, passing a field of a structure
with reverse SSO as first argument to va_start/va_arg/va_end doesn't work 
because
the machinery takes its address and this is not allowed for such a field (it's
really a corner case but gcc.c-torture/execute/stdarg-2.c does exercise it). 
Hence
the attached patch which issues an error in this case.


The new gcc.dg/sso-9.c test is failing for aarch64 and arm targets. There's no
error generated if I compile the test from the command line for
aarch64-none-elf. GCC for x86_64 does generate the error.

Matthew


2015-12-03  Eric Botcazou  

* c-tree.h (c_build_va_arg): Adjust prototype.
* c-parser.c (c_parser_postfix_expression): Adjust call to above.
* c-typeck.c (c_build_va_arg): Rename LOC parameter to LOC2, add LOC1
parameter, adjust throughout and issue an error if EXPR is a component
with reverse storage order.


2015-12-03  Eric Botcazou  

* gcc.dg/sso-9.c: New test.





Re: [Fortran, Patch] Memory sync after coarray image control statements and assignment

2015-12-09 Thread Matthew Wahab

On 08/12/15 09:25, Tobias Burnus wrote:

On Mon, Dec 07, 2015 at 02:09:22PM +, Matthew Wahab wrote:

I wonder whether using
__asm__ __volatile__ ("":::"memory");
would be sufficient as it has a way lower overhead than
__sync_synchronize().


I don't know anything about Fortran or coarrays and I'm curious
whether this affects architectures with weak memory models. Is the
barrier only needed to stop reordering by the compiler or does it
also need to stop reordering by the hardware?


Short answer: I think no mfence is needed as either the communication
is local (to the thread/process) - in which case the hardware will act
correctly - or the communication is remote (different thread, process,
communication to different computer via interlink [ethernet, infiniband,
...]); and in the later case, the communication library has to deal with
it.


Thanks for explaining this, it made things clear. Based on your description, I agree 
that hardware reordering shouldn't be a problem.
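
For the record, a sketch of the difference being discussed:

  void
  barrier_examples (void)
  {
    /* Compiler-only barrier: stops the compiler caching or moving
       memory accesses across this point; emits no instruction.  */
    __asm__ __volatile__ ("" ::: "memory");

    /* Full barrier: additionally emits a hardware fence (e.g. dmb on
       ARM), ordering the accesses for other observers as well.  */
    __sync_synchronize ();
  }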



and the (main) program code (slightly trimmed):

   static void * restrict caf_token.0;
   static integer(kind=4) * restrict var;
   void _caf_init.1 (void);

   *var = 4;

   desc.3.data = 42;
   _gfortran_caf_send (caf_token.0, 0B /* offset */ var,
   _gfortran_caf_this_image (0), &desc.2, 0B, &desc.3, 4, 
4, 0);
   __asm__ __volatile__("":::"memory");  // new
   tmp = *var;

The problem is that in that case the compiler does not know that
"_gfortran_caf_send (caf_token.0," can modify "*var".



Is the restrict attribute on var correct? From what you say, it sounds like *var 
could be accessed through other pointers (assuming restrict has the same meaning as 
in C).


Matthew


Re: [PATCH] Fix memory orders description in atomic ops built-ins docs.

2015-05-18 Thread Matthew Wahab

Hello,

On 15/05/15 17:22, Torvald Riegel wrote:

This patch improves the documentation of the built-ins for atomic
operations.


The "memory model" to "memory order" change does improve things but I think that
the patch has some problems. As it is now, it makes some of the descriptions
quite difficult to understand and seems to assume more familiarity with details
of the C++11 specification than might be expected.

Generally, the memory order descriptions seem to be targeted towards language
designers but don't provide much for anybody trying to understand how to implement or
to use the built-ins. Adding a less formal, programmers view to some of the
descriptions would help. That implies the descriptions would be more than just
illustrative, but I'd suggest that would be appropriate for the GCC manual.

I'm also not sure about the use of C++11 terms in some of the
descriptions. In particular, using happens-before seems wrong because
happens-before isn't described anywhere in the GCC manual and because it has a
specific meaning in the C++11 specification that doesn't apply to the GCC
built-ins (which C++11 doesn't know about).

Some more comments below.

Regards,
Matthew


diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 6004681..5b2ded8 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -8853,19 +8853,19 @@ are not prevented from being speculated to before the 
barrier.

 [...]  If the data type size maps to one
-of the integral sizes that may have lock free support, the generic
-version uses the lock free built-in function.  Otherwise an
+of the integral sizes that may support lock-freedom, the generic
+version uses the lock-free built-in function.  Otherwise an
 external call is left to be resolved at run time.

=
This is a slightly awkward sentence. Maybe it could be replaced with something
on the lines of "The generic function uses the lock-free built-in function when
the data-type size makes that possible, otherwise an external call is left to be
resolved at run-time."
=

-The memory models integrate both barriers to code motion as well as
-synchronization requirements with other threads.  They are listed here
-in approximately ascending order of strength.
+An atomic operation can both constrain code motion by the compiler and
+be mapped to a hardware instruction for synchronization between threads
+(e.g., a fence).  [...]

=
This is a little unclear (and inaccurate, aarch64 can use two instructions
for fences). I also thought that atomic operations constrain code motion by the
hardware. Maybe break the link with the compiler and hardware: "An atomic
operation can both constrain code motion and act as a synchronization point
between threads".
=

 @table  @code
 @item __ATOMIC_RELAXED
-No barriers or synchronization.
+Implies no inter-thread ordering constraints.


It may be useful to be explicit that there are no restrictions on code motion.
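
For instance, a sketch of what the absence of restrictions permits:

  extern int a, b;

  int
  sum_relaxed (void)
  {
    /* Each load is atomic, but the compiler and the hardware may
       freely reorder them with each other and with surrounding
       accesses; no inter-thread ordering is implied.  */
    int x = __atomic_load_n (&a, __ATOMIC_RELAXED);
    int y = __atomic_load_n (&b, __ATOMIC_RELAXED);
    return x + y;
  }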


 @item __ATOMIC_CONSUME
-Data dependency only for both barrier and synchronization with another
-thread.
+This is currently implemented using the stronger @code{__ATOMIC_ACQUIRE}
+memory order because of a deficiency in C++11's semantics for
+@code{memory_order_consume}.

=
It would be useful to have a description of what the __ATOMIC_CONSUME was
meant to do, as well as the fact that it currently just maps to
__ATOMIC_ACQUIRE. (Or maybe just drop it from the documentation until it's
fixed.)
=

 @item __ATOMIC_ACQUIRE
-Barrier to hoisting of code and synchronizes with release (or stronger)
-semantic stores from another thread.
+Creates an inter-thread happens-before constraint from the release (or
+stronger) semantic store to this acquire load.  Can prevent hoisting
+of code to before the operation.

=
As noted before, it's not clear what the "inter-thread happens-before"
means in this context.

Here and elsewhere:
"Can prevent  of code" is ambiguous: it doesn't say under what
conditions code would or wouldn't be prevented from moving.
=
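
For reference, the conventional acquire/release pairing the quoted text
describes (a standard illustration, not taken from the patch):

  int data;
  int flag;

  void
  producer (void)
  {
    data = 42;
    /* Release store: the write to data cannot be sunk below it.  */
    __atomic_store_n (&flag, 1, __ATOMIC_RELEASE);
  }

  int
  consumer (void)
  {
    /* Acquire load: the read of data cannot be hoisted above it.  */
    while (!__atomic_load_n (&flag, __ATOMIC_ACQUIRE))
      ;
    return data;  /* observes 42 once flag reads as 1 */
  }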

-Note that the scope of a C++11 memory model depends on whether or not
-the function being called is a @emph{fence} (such as
-@samp{__atomic_thread_fence}).  In a fence, all memory accesses are
-subject to the restrictions of the memory model.  When the function is
-an operation on a location, the restrictions apply only to those
-memory accesses that could affect or that could depend on the
-location.
+Note that in the C++11 memory model, @emph{fences} (e.g.,
+@samp{__atomic_thread_fence}) take effect in combination with other
+atomic operations on specific memory locations (e.g., atomic loads);
+operations on specific memory locations do not necessarily affect other
+operations in the same way.


It's very unclear what this paragraph is saying. It seems to suggest that fences
only work in combination with other operations. But that doesn't seem right
since a __atomic_thread_fence (with appropriate memory order) can be dropped
into any piece of code and will act i

Re: [PATCH] Fix memory orders description in atomic ops built-ins docs.

2015-05-21 Thread Matthew Wahab

On 19/05/15 20:20, Torvald Riegel wrote:

On Mon, 2015-05-18 at 17:36 +0100, Matthew Wahab wrote:

Hello,

On 15/05/15 17:22, Torvald Riegel wrote:

This patch improves the documentation of the built-ins for atomic
operations.


The "memory model" to "memory order" change does improve things but I think that
the patch has some problems. As it is now, it makes some of the descriptions
quite difficult to understand and seems to assume more familiarity with details
of the C++11 specification then might be expected.


I'd say that's a side effect of the C++11 memory model being the
reference specification of the built-ins.


Generally, the memory order descriptions seem to be targeted towards language
designers but don't provide for anybody trying to understand how to implement or
to use the built-ins.


I agree that the current descriptions aren't a tutorial on the C++11
memory model.  However, given that the model is not GCC-specific, we
aren't really in a need to provide a tutorial, in the same way that we
don't provide a C++ tutorial.  Users can pick the C++11 memory model
educational material of their choice, and we need to document what's
missing to apply the C++11 knowledge to the built-ins we provide.



We seem to have different views about the purpose of the manual page. I'm treating it 
as a description of the built-in functions provided by gcc to generate the code 
needed to implement the C++11 model. That is, the built-ins are distinct from C++11 
and their descriptions should be, as far as possible, independent of the methods used 
in the C++11 specification to describe the C++11 memory model.


I understand of course that the __atomics were added in order to support C++11 but 
that doesn't make them part of C++11 and, since __atomic functions can be made 
available when C11/C++11 may not be, it seems to make sense to try for stand-alone 
descriptions.


I'm also concerned that the patch, by describing things in terms of formal C++11 
concepts, makes it more difficult for people to know what the built-ins can be 
expected to do, and so makes the built-ins more difficult to use. There is a danger that 
rather than take a risk with uncertainty about the behaviour of the __atomics, people 
will fall-back to the __sync functions simply because their expected behaviour is 
easier to work out.


I don't think that linking to external sites will help either, unless people already 
want to know C++11. Anybody who just wants to (e.g.) add a memory barrier will take 
one look at the __sync manual page and use the closest match from there instead.


Note that none of this requires a tutorial of any kind. I'm just suggesting that the 
manual should describe what behaviour should be expected of the code generated for 
the functions. For the memory orders, that would mean describing what constraints 
need to be met by the generated code. The requirement that the atomics should support 
C++11 could be met by making sure that the description of the expected behaviour is 
sufficient for C++11.



There are several resources for implementers, for example the mappings
maintained by the Cambridge research group.  I guess it would be
sufficient to have such material on the wiki.  Is there something
specific that you'd like to see documented for implementers?
[...]
I agree it's not described in the manual, but we're implementing C++11.


(As above) I believe we're supporting the implementation of C++11 and that the 
distinction is important.



However, I don't see why happens-before semantics wouldn't apply to
GCC's implementation of the built-ins; there may be cases where we
guarantee more, but if one uses the builtins in way allowed by the C++11
model, one certainly gets behavior and happens-before relationships as
specified by C++11.



My understanding is that happens-before is a relation used in the C++11 specification 
for a specific meaning. I believe that it's used to decide whether something is or is 
not a data race so saying that it applies to a gcc built-in would be wrong. Using the 
gcc built-in rather than the equivalent C++11 library function would result in a 
program that C++11 regards as invalid. (Again, as I understand it.)





diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 6004681..5b2ded8 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -8853,19 +8853,19 @@ are not prevented from being speculated to before the 
barrier.

   [...]  If the data type size maps to one
-of the integral sizes that may have lock free support, the generic
-version uses the lock free built-in function.  Otherwise an
+of the integral sizes that may support lock-freedom, the generic
+version uses the lock-free built-in function.  Otherwise an
   external call is left to be resolved at run time.

=
This is a slightly awkward sentence. Maybe it could be replaced with something clearer.

[PATCH 1/3][AArch64] Strengthen barriers for sync-fetch-op builtins.

2015-05-21 Thread Matthew Wahab

On AArch64, the __sync builtins are implemented using the __atomic operations
and barriers. This makes the __sync builtins inconsistent with their
documentation, which requires stronger barriers than those for the __atomic
builtins.

The difference between __sync and __atomic builtins is that the restrictions
imposed by a __sync operation's barrier apply to all memory references while the
restrictions of an __atomic operation's barrier only need to apply to a
subset. This affects AArch64 in particular because, although its implementation
of __atomic builtins is correct, the barriers generated are too weak for the
__sync builtins.

The affected __sync builtins are the __sync_fetch_and_op (and
__sync_op_and_fetch) functions, __sync_compare_and_swap and
__sync_lock_test_and_set. This and a following patch modify the code generated
for these functions to weaken initial load-acquires to a simple load and to add
a final fence to prevent code-hoisting. The last patch will add tests for the
code generated by the AArch64 backend for the __sync builtins.

- Full barriers:  __sync_fetch_and_op, __sync_op_and_fetch
  __sync_*_compare_and_swap

  [load-acquire; code; store-release]
  becomes
  [load; code ; store-release; fence].

- Acquire barriers:  __sync_lock_test_and_set

  [load-acquire; code; store]
  becomes
  [load; code; store; fence]

The code generated for release barriers and for the __atomic builtins is
unchanged.
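
As a concrete illustration of the mapping above (a hedged sketch; the register
choices in the comments are illustrative only):

int counter;

int
add_one (void)
{
  /* Documented as a full barrier.  */
  return __sync_fetch_and_add (&counter, 1);
}

/* Before the patch (load-acquire, too weak for a full barrier):
     loop: ldaxr w1, [x0]; add w2, w1, 1; stlxr w3, w2, [x0]; cbnz w3, loop
   After the patch (plain load, trailing barrier):
     loop: ldxr w1, [x0]; add w2, w1, 1; stlxr w3, w2, [x0]; cbnz w3, loop
     dmb ish  */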

This patch changes the code generated for the __sync_fetch_and_<op> and
__sync_<op>_and_fetch builtins.

Tested with check-gcc for aarch64-none-linux-gnu.

Ok for trunk?
Matthew

gcc/
2015-05-21  Matthew Wahab  

* config/aarch64/aarch64.c (aarch64_emit_post_barrier): New.
(aarch64_split_atomic_op): Check for __sync memory models, emit
appropriate initial and final barriers.


From 2092902d2738b0c24a6272e0b3480bb9cffd275c Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Fri, 15 May 2015 09:26:28 +0100
Subject: [PATCH 1/3] [AArch64] Strengthen barriers for sync-fetch-op builtin.

Change-Id: I3342a572d672163ffc703e4e51603744680334fc
---
 gcc/config/aarch64/aarch64.c | 30 +-
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 7f0cc0d..778571f 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -9249,6 +9249,22 @@ aarch64_expand_compare_and_swap (rtx operands[])
   emit_insn (gen_rtx_SET (bval, x));
 }
 
+/* Emit a post-operation barrier.  */
+
+static void
+aarch64_emit_post_barrier (enum memmodel model)
+{
+  const enum memmodel base_model = memmodel_base (model);
+
+  if (is_mm_sync (model)
+  && (base_model == MEMMODEL_ACQUIRE
+	  || base_model == MEMMODEL_ACQ_REL
+	  || base_model == MEMMODEL_SEQ_CST))
+{
+  emit_insn (gen_mem_thread_fence (GEN_INT (MEMMODEL_SEQ_CST)));
+}
+}
+
 /* Split a compare and swap pattern.  */
 
 void
@@ -9311,12 +9327,20 @@ aarch64_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem,
 {
   machine_mode mode = GET_MODE (mem);
   machine_mode wmode = (mode == DImode ? DImode : SImode);
+  const enum memmodel model = memmodel_from_int (INTVAL (model_rtx));
+  const bool is_sync = is_mm_sync (model);
+  rtx load_model_rtx = model_rtx;
   rtx_code_label *label;
   rtx x;
 
   label = gen_label_rtx ();
   emit_label (label);
 
+  /* A __sync operation will emit a final fence to stop code hoisting, so the
+ load can be relaxed.  */
+  if (is_sync)
+load_model_rtx = GEN_INT (MEMMODEL_RELAXED);
+
   if (new_out)
 new_out = gen_lowpart (wmode, new_out);
   if (old_out)
@@ -9325,7 +9349,7 @@ aarch64_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem,
 old_out = new_out;
   value = simplify_gen_subreg (wmode, value, mode, 0);
 
-  aarch64_emit_load_exclusive (mode, old_out, mem, model_rtx);
+  aarch64_emit_load_exclusive (mode, old_out, mem, load_model_rtx);
 
   switch (code)
 {
@@ -9361,6 +9385,10 @@ aarch64_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem,
   x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
 			gen_rtx_LABEL_REF (Pmode, label), pc_rtx);
   aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
+
+  /* Emit any fence needed for a __sync operation.  */
+  if (is_sync)
+aarch64_emit_post_barrier (model);
 }
 
 static void
-- 
1.9.1



[AArch64][PATCH 2/3] Strengthen barriers for sync-compare-swap builtins.

2015-05-21 Thread Matthew Wahab

This patch changes the code generated for __sync_type_compare_and_swap to

  ldxr reg; cmp; bne label; stlxr; cbnz; label: dmb ish; mov .., reg

This removes the acquire-barrier from the load and ends the operation with a
fence to prevent memory references appearing after the __sync operation from
being moved ahead of the store-release.

This also strengthens the acquire barrier generated for __sync_lock_test_and_set
(which, like compare-and-swap, is implemented as a form of atomic exchange):

  ldaxr; stxr; cbnz
becomes
  ldxr; stxr; cbnz; dmb ish
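
For context, a hedged sketch of where the stronger __sync_lock_test_and_set
barrier matters — a simple spinlock (names hypothetical):

int lock;

void
acquire_lock (void)
{
  /* Documented as an acquire barrier: accesses inside the critical
     section must not move before the lock is taken.  With this patch
     the sequence becomes ldxr; stxr; cbnz; dmb ish.  */
  while (__sync_lock_test_and_set (&lock, 1))
    ;	/* Spin while the previous value was 1 (lock held).  */
}

void
release_lock (void)
{
  __sync_lock_release (&lock);	/* Release barrier.  */
}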

Tested with check-gcc for aarch64-none-linux-gnu.

Ok for trunk?
Matthew

2015-05-21  Matthew Wahab  

* config/aarch64/aarch64.c (aarch64_split_compare_and_swap): Check
for __sync memory models, emit appropriate initial and final
barriers.

From 6f748034d25b75ea7829192d94e54189c2fbf99e Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Fri, 15 May 2015 09:31:06 +0100
Subject: [PATCH 2/3] [AArch64] Strengthen barriers for sync-compare-swap
 builtins.

Change-Id: I335771f2f42ea951d227f20f6cb9daa07330614d
---
 gcc/config/aarch64/aarch64.c | 19 +--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 778571f..11a8cd0 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -9275,14 +9275,19 @@ aarch64_split_compare_and_swap (rtx operands[])
   bool is_weak;
   rtx_code_label *label1, *label2;
   rtx x, cond;
+  enum memmodel model;
+  rtx model_rtx;
+  rtx load_model_rtx;
 
   rval = operands[0];
   mem = operands[1];
   oldval = operands[2];
   newval = operands[3];
   is_weak = (operands[4] != const0_rtx);
+  model_rtx = operands[5];
   scratch = operands[7];
   mode = GET_MODE (mem);
+  model = memmodel_from_int (INTVAL (model_rtx));
 
   label1 = NULL;
   if (!is_weak)
@@ -9292,7 +9297,13 @@ aarch64_split_compare_and_swap (rtx operands[])
 }
   label2 = gen_label_rtx ();
 
-  aarch64_emit_load_exclusive (mode, rval, mem, operands[5]);
+  /* A __sync operation will end with a fence so the load can be relaxed.  */
+  if (is_mm_sync (model))
+load_model_rtx = GEN_INT (MEMMODEL_RELAXED);
+  else
+load_model_rtx = model_rtx;
+
+  aarch64_emit_load_exclusive (mode, rval, mem, load_model_rtx);
 
   cond = aarch64_gen_compare_reg (NE, rval, oldval);
   x = gen_rtx_NE (VOIDmode, cond, const0_rtx);
@@ -9300,7 +9311,7 @@ aarch64_split_compare_and_swap (rtx operands[])
 			gen_rtx_LABEL_REF (Pmode, label2), pc_rtx);
   aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
 
-  aarch64_emit_store_exclusive (mode, scratch, mem, newval, operands[5]);
+  aarch64_emit_store_exclusive (mode, scratch, mem, newval, model_rtx);
 
   if (!is_weak)
 {
@@ -9317,6 +9328,10 @@ aarch64_split_compare_and_swap (rtx operands[])
 }
 
   emit_label (label2);
+
+  /* A __sync operation may need a final fence.  */
+  if (is_mm_sync (model))
+aarch64_emit_post_barrier (model);
 }
 
 /* Split an atomic operation.  */
-- 
1.9.1



[PATCH 3/3][Aarch64] Add tests for __sync_builtins.

2015-05-21 Thread Matthew Wahab

This patch adds tests for the code generated by the AArch64 backend for the
__sync builtins.

Tested aarch64-none-linux-gnu with check-gcc.
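
For reference, the new tests can presumably be run on their own with something
like the following (the exact RUNTESTFLAGS spelling may vary with the setup):

  make check-gcc RUNTESTFLAGS="aarch64.exp=sync-*.c"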

Ok for trunk?
Matthew

gcc/testsuite/
2015-05-21  Matthew Wahab  

* gcc.target/aarch64/sync-comp-swap.c: New.
* gcc.target/aarch64/sync-comp-swap.x: New.
* gcc.target/aarch64/sync-op-acquire.c: New.
* gcc.target/aarch64/sync-op-acquire.x: New.
* gcc.target/aarch64/sync-op-full.c: New.
* gcc.target/aarch64/sync-op-full.x: New.
* gcc.target/aarch64/sync-op-release.c: New.
* gcc.target/aarch64/sync-op-release.x: New.

From 74738b2c0ceb9d5cae281b9609c134fde1d459e9 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Fri, 15 May 2015 09:31:42 +0100
Subject: [PATCH 3/3] [Aarch64] Add tests for __sync_builtins.

Change-Id: I9f7cde85613dfe2cb6df55cbc732e683092f14d8
---
 gcc/testsuite/gcc.target/aarch64/sync-comp-swap.c  |  8 +++
 gcc/testsuite/gcc.target/aarch64/sync-comp-swap.x  | 13 
 gcc/testsuite/gcc.target/aarch64/sync-op-acquire.c |  8 +++
 gcc/testsuite/gcc.target/aarch64/sync-op-acquire.x |  7 +++
 gcc/testsuite/gcc.target/aarch64/sync-op-full.c|  8 +++
 gcc/testsuite/gcc.target/aarch64/sync-op-full.x| 73 ++
 gcc/testsuite/gcc.target/aarch64/sync-op-release.c |  6 ++
 gcc/testsuite/gcc.target/aarch64/sync-op-release.x |  7 +++
 8 files changed, 130 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sync-comp-swap.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sync-comp-swap.x
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sync-op-acquire.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sync-op-acquire.x
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sync-op-full.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sync-op-full.x
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sync-op-release.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sync-op-release.x

diff --git a/gcc/testsuite/gcc.target/aarch64/sync-comp-swap.c b/gcc/testsuite/gcc.target/aarch64/sync-comp-swap.c
new file mode 100644
index 000..126b997
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sync-comp-swap.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-ipa-icf" } */
+
+#include "sync-comp-swap.x"
+
+/* { dg-final { scan-assembler-times "ldxr\tw\[0-9\]+, \\\[x\[0-9\]+\\\]" 2 } } */
+/* { dg-final { scan-assembler-times "stlxr\tw\[0-9\]+, w\[0-9\]+, \\\[x\[0-9\]+\\\]" 2 } } */
+/* { dg-final { scan-assembler-times "dmb\tish" 2 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sync-comp-swap.x b/gcc/testsuite/gcc.target/aarch64/sync-comp-swap.x
new file mode 100644
index 000..eda52e40
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sync-comp-swap.x
@@ -0,0 +1,13 @@
+int v = 0;
+
+int
+sync_bool_compare_swap (int a, int b)
+{
+  return __sync_bool_compare_and_swap (&v, &a, &b);
+}
+
+int
+sync_val_compare_swap (int a, int b)
+{
+  return __sync_val_compare_and_swap (&v, &a, &b);
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/sync-op-acquire.c b/gcc/testsuite/gcc.target/aarch64/sync-op-acquire.c
new file mode 100644
index 000..2639f9f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sync-op-acquire.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+#include "sync-op-acquire.x"
+
+/* { dg-final { scan-assembler-times "ldxr\tw\[0-9\]+, \\\[x\[0-9\]+\\\]" 1 } } */
+/* { dg-final { scan-assembler-times "stxr\tw\[0-9\]+, w\[0-9\]+, \\\[x\[0-9\]+\\\]" 1 } } */
+/* { dg-final { scan-assembler-times "dmb\tish" 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sync-op-acquire.x b/gcc/testsuite/gcc.target/aarch64/sync-op-acquire.x
new file mode 100644
index 000..4c4548c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sync-op-acquire.x
@@ -0,0 +1,7 @@
+int v;
+
+int
+sync_lock_test_and_set (int a)
+{
+  return __sync_lock_test_and_set (&v, a);
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/sync-op-full.c b/gcc/testsuite/gcc.target/aarch64/sync-op-full.c
new file mode 100644
index 000..10fc8fc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sync-op-full.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+#include "sync-op-full.x"
+
+/* { dg-final { scan-assembler-times "ldxr\tw\[0-9\]+, \\\[x\[0-9\]+\\\]" 12 } } */
+/* { dg-final { scan-assembler-times "stlxr\tw\[0-9\]+, w\[0-9\]+, \\\[x\[0-9\]+\\\]" 12 } } */
+/* { dg-final { scan-assembler-times "dmb\tish" 12 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sync-op-full.x b/gcc/testsuite/gcc.target/aarch64/sync-op-full.x
new file mode 100644
index 000..c24223d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sync-op-full.x
@@ -0,0 +1,73 @@
+int v = 0;
+
+int
+sync_fetch_and_add (int a)
+{
+  return __sync_fetch_and_add (&v, a);
+}
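
The file is truncated here by the archive.  Judging from the function above and
the scan-assembler counts of 12, the remaining lines presumably define the
other wrappers in the same pattern (a hypothetical reconstruction, not the
posted file):

int
sync_fetch_and_sub (int a)
{
  return __sync_fetch_and_sub (&v, a);	/* Presumed; follows the pattern above.  */
}

and likewise for the and/or/xor/nand variants and the __sync_<op>_and_fetch
forms.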

Re: [PATCH 1/3][AArch64][PR target/65697] Strengthen barriers for sync-fetch-op builtins.

2015-05-22 Thread Matthew Wahab

[Added PR number and updated patches]

On AArch64, the __sync builtins are implemented using the __atomic operations
and barriers. This makes the __sync builtins inconsistent with their
documentation, which requires stronger barriers than those for the __atomic
builtins.

The difference between __sync and __atomic builtins is that the restrictions
imposed by a __sync operation's barrier apply to all memory references while the
restrictions of an __atomic operation's barrier only need to apply to a
subset. This affects AArch64 in particular because, although its implementation
of __atomic builtins is correct, the barriers generated are too weak for the
__sync builtins.

The affected __sync builtins are the __sync_fetch_and_op (and
__sync_op_and_fetch) functions, __sync_compare_and_swap and
__sync_lock_test_and_set. This and a following patch modify the code generated
for these functions to weaken initial load-acquires to a simple load and to add
a final fence to prevent code-hoisting. The last patch will add tests for the
code generated by the AArch64 backend for the __sync builtins.

- Full barriers:  __sync_fetch_and_op, __sync_op_and_fetch
  __sync_*_compare_and_swap

  [load-acquire; code; store-release]
  becomes
  [load; code ; store-release; fence].

- Acquire barriers:  __sync_lock_test_and_set

  [load-acquire; code; store]
  becomes
  [load; code; store; fence]

The code generated for release barriers and for the __atomic builtins is
unchanged.

This patch changes the code generated for the __sync_fetch_and_<op> and
__sync_<op>_and_fetch builtins.

Tested with check-gcc for aarch64-none-linux-gnu.

Ok for trunk?
Matthew

gcc/
2015-05-22  Matthew Wahab  

PR target/65697
* config/aarch64/aarch64.c (aarch64_emit_post_barrier): New.
(aarch64_split_atomic_op): Check for __sync memory models, emit
appropriate initial and final barriers.



From 5a2d546359f78cd3f304a62617f0fc385664374e Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Fri, 15 May 2015 09:26:28 +0100
Subject: [PATCH 1/3] [AArch64] Strengthen barriers for sync-fetch-op builtin.

Change-Id: I3342a572d672163ffc703e4e51603744680334fc
---
 gcc/config/aarch64/aarch64.c | 30 +-
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 8c25d75..182dbad 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -9407,6 +9407,22 @@ aarch64_expand_compare_and_swap (rtx operands[])
   emit_insn (gen_rtx_SET (bval, x));
 }
 
+/* Emit a post-operation barrier.  */
+
+static void
+aarch64_emit_post_barrier (enum memmodel model)
+{
+  const enum memmodel base_model = memmodel_base (model);
+
+  if (is_mm_sync (model)
+  && (base_model == MEMMODEL_ACQUIRE
+	  || base_model == MEMMODEL_ACQ_REL
+	  || base_model == MEMMODEL_SEQ_CST))
+{
+  emit_insn (gen_mem_thread_fence (GEN_INT (MEMMODEL_SEQ_CST)));
+}
+}
+
 /* Split a compare and swap pattern.  */
 
 void
@@ -9469,12 +9485,20 @@ aarch64_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem,
 {
   machine_mode mode = GET_MODE (mem);
   machine_mode wmode = (mode == DImode ? DImode : SImode);
+  const enum memmodel model = memmodel_from_int (INTVAL (model_rtx));
+  const bool is_sync = is_mm_sync (model);
+  rtx load_model_rtx = model_rtx;
   rtx_code_label *label;
   rtx x;
 
   label = gen_label_rtx ();
   emit_label (label);
 
+  /* A __sync operation will emit a final fence to stop code hoisting, so the
+ load can be relaxed.  */
+  if (is_sync)
+load_model_rtx = GEN_INT (MEMMODEL_RELAXED);
+
   if (new_out)
 new_out = gen_lowpart (wmode, new_out);
   if (old_out)
@@ -9483,7 +9507,7 @@ aarch64_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem,
 old_out = new_out;
   value = simplify_gen_subreg (wmode, value, mode, 0);
 
-  aarch64_emit_load_exclusive (mode, old_out, mem, model_rtx);
+  aarch64_emit_load_exclusive (mode, old_out, mem, load_model_rtx);
 
   switch (code)
 {
@@ -9519,6 +9543,10 @@ aarch64_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem,
   x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
 			gen_rtx_LABEL_REF (Pmode, label), pc_rtx);
   aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
+
+  /* Emit any fence needed for a __sync operation.  */
+  if (is_sync)
+aarch64_emit_post_barrier (model);
 }
 
 static void
-- 
1.9.1



Re: [PATCH 2/3][AArch64][PR target/65697] Strengthen barriers for sync-compare-swap builtins.

2015-05-22 Thread Matthew Wahab

[Added PR number and updated patches]

This patch changes the code generated for __sync_type_compare_and_swap to

  ldxr reg; cmp; bne label; stlxr; cbnz; label: dmb ish; mov .., reg

This removes the acquire-barrier from the load and ends the operation with a
fence to prevent memory references appearing after the __sync operation from
being moved ahead of the store-release.

This also strengthens the acquire barrier generated for __sync_lock_test_and_set
(which, like compare-and-swap, is implemented as a form of atomic exchange):

  ldaxr; stxr; cbnz
becomes
  ldxr; stxr; cbnz; dmb ish


Tested with check-gcc for aarch64-none-linux-gnu.

Ok for trunk?
Matthew

2015-05-22  Matthew Wahab  

PR target/65697
* config/aarch64/aarch64.c (aarch64_split_compare_and_swap): Check
for __sync memory models, emit appropriate initial and final
barriers.

From 1e5cda95944e7176b8934296b1bb1ec4c9fb1362 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Fri, 15 May 2015 09:31:06 +0100
Subject: [PATCH 2/3] [AArch64] Strengthen barriers for sync-compare-swap
 builtins.

Change-Id: I335771f2f42ea951d227f20f6cb9daa07330614d
---
 gcc/config/aarch64/aarch64.c | 19 +--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 182dbad..5b9feee 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -9433,14 +9433,19 @@ aarch64_split_compare_and_swap (rtx operands[])
   bool is_weak;
   rtx_code_label *label1, *label2;
   rtx x, cond;
+  enum memmodel model;
+  rtx model_rtx;
+  rtx load_model_rtx;
 
   rval = operands[0];
   mem = operands[1];
   oldval = operands[2];
   newval = operands[3];
   is_weak = (operands[4] != const0_rtx);
+  model_rtx = operands[5];
   scratch = operands[7];
   mode = GET_MODE (mem);
+  model = memmodel_from_int (INTVAL (model_rtx));
 
   label1 = NULL;
   if (!is_weak)
@@ -9450,7 +9455,13 @@ aarch64_split_compare_and_swap (rtx operands[])
 }
   label2 = gen_label_rtx ();
 
-  aarch64_emit_load_exclusive (mode, rval, mem, operands[5]);
+  /* A __sync operation will end with a fence so the load can be relaxed.  */
+  if (is_mm_sync (model))
+load_model_rtx = GEN_INT (MEMMODEL_RELAXED);
+  else
+load_model_rtx = model_rtx;
+
+  aarch64_emit_load_exclusive (mode, rval, mem, load_model_rtx);
 
   cond = aarch64_gen_compare_reg (NE, rval, oldval);
   x = gen_rtx_NE (VOIDmode, cond, const0_rtx);
@@ -9458,7 +9469,7 @@ aarch64_split_compare_and_swap (rtx operands[])
 			gen_rtx_LABEL_REF (Pmode, label2), pc_rtx);
   aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
 
-  aarch64_emit_store_exclusive (mode, scratch, mem, newval, operands[5]);
+  aarch64_emit_store_exclusive (mode, scratch, mem, newval, model_rtx);
 
   if (!is_weak)
 {
@@ -9475,6 +9486,10 @@ aarch64_split_compare_and_swap (rtx operands[])
 }
 
   emit_label (label2);
+
+  /* A __sync operation may need a final fence.  */
+  if (is_mm_sync (model))
+aarch64_emit_post_barrier (model);
 }
 
 /* Split an atomic operation.  */
-- 
1.9.1



Re: [PATCH 3/3][Aarch64][PR target/65697] Add tests for __sync_builtins.

2015-05-22 Thread Matthew Wahab

[Added PR number and updated patches]

This patch adds tests for the code generated by the AArch64 backend for the
__sync builtins.

Tested aarch64-none-linux-gnu with check-gcc.

Ok for trunk?
Matthew

gcc/testsuite/
2015-05-21  Matthew Wahab  

PR target/65697
* gcc.target/aarch64/sync-comp-swap.c: New.
* gcc.target/aarch64/sync-comp-swap.x: New.
* gcc.target/aarch64/sync-op-acquire.c: New.
* gcc.target/aarch64/sync-op-acquire.x: New.
* gcc.target/aarch64/sync-op-full.c: New.
* gcc.target/aarch64/sync-op-full.x: New.
* gcc.target/aarch64/sync-op-release.c: New.
* gcc.target/aarch64/sync-op-release.x: New.


From a3e8df9afce1098c1c616d66a309ce5bc5b95593 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Fri, 15 May 2015 09:31:42 +0100
Subject: [PATCH 3/3] [Aarch64] Add tests for __sync_builtins.

Change-Id: I9f7cde85613dfe2cb6df55cbc732e683092f14d8
---
 gcc/testsuite/gcc.target/aarch64/sync-comp-swap.c  |  8 +++
 gcc/testsuite/gcc.target/aarch64/sync-comp-swap.x  | 13 
 gcc/testsuite/gcc.target/aarch64/sync-op-acquire.c |  8 +++
 gcc/testsuite/gcc.target/aarch64/sync-op-acquire.x |  7 +++
 gcc/testsuite/gcc.target/aarch64/sync-op-full.c|  8 +++
 gcc/testsuite/gcc.target/aarch64/sync-op-full.x| 73 ++
 gcc/testsuite/gcc.target/aarch64/sync-op-release.c |  6 ++
 gcc/testsuite/gcc.target/aarch64/sync-op-release.x |  7 +++
 8 files changed, 130 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sync-comp-swap.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sync-comp-swap.x
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sync-op-acquire.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sync-op-acquire.x
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sync-op-full.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sync-op-full.x
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sync-op-release.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sync-op-release.x

diff --git a/gcc/testsuite/gcc.target/aarch64/sync-comp-swap.c b/gcc/testsuite/gcc.target/aarch64/sync-comp-swap.c
new file mode 100644
index 000..126b997
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sync-comp-swap.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-ipa-icf" } */
+
+#include "sync-comp-swap.x"
+
+/* { dg-final { scan-assembler-times "ldxr\tw\[0-9\]+, \\\[x\[0-9\]+\\\]" 2 } } */
+/* { dg-final { scan-assembler-times "stlxr\tw\[0-9\]+, w\[0-9\]+, \\\[x\[0-9\]+\\\]" 2 } } */
+/* { dg-final { scan-assembler-times "dmb\tish" 2 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sync-comp-swap.x b/gcc/testsuite/gcc.target/aarch64/sync-comp-swap.x
new file mode 100644
index 000..eda52e40
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sync-comp-swap.x
@@ -0,0 +1,13 @@
+int v = 0;
+
+int
+sync_bool_compare_swap (int a, int b)
+{
+  return __sync_bool_compare_and_swap (&v, &a, &b);
+}
+
+int
+sync_val_compare_swap (int a, int b)
+{
+  return __sync_val_compare_and_swap (&v, &a, &b);
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/sync-op-acquire.c b/gcc/testsuite/gcc.target/aarch64/sync-op-acquire.c
new file mode 100644
index 000..2639f9f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sync-op-acquire.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+#include "sync-op-acquire.x"
+
+/* { dg-final { scan-assembler-times "ldxr\tw\[0-9\]+, \\\[x\[0-9\]+\\\]" 1 } } */
+/* { dg-final { scan-assembler-times "stxr\tw\[0-9\]+, w\[0-9\]+, \\\[x\[0-9\]+\\\]" 1 } } */
+/* { dg-final { scan-assembler-times "dmb\tish" 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sync-op-acquire.x b/gcc/testsuite/gcc.target/aarch64/sync-op-acquire.x
new file mode 100644
index 000..4c4548c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sync-op-acquire.x
@@ -0,0 +1,7 @@
+int v;
+
+int
+sync_lock_test_and_set (int a)
+{
+  return __sync_lock_test_and_set (&v, a);
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/sync-op-full.c b/gcc/testsuite/gcc.target/aarch64/sync-op-full.c
new file mode 100644
index 000..10fc8fc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sync-op-full.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+#include "sync-op-full.x"
+
+/* { dg-final { scan-assembler-times "ldxr\tw\[0-9\]+, \\\[x\[0-9\]+\\\]" 12 } } */
+/* { dg-final { scan-assembler-times "stlxr\tw\[0-9\]+, w\[0-9\]+, \\\[x\[0-9\]+\\\]" 12 } } */
+/* { dg-final { scan-assembler-times "dmb\tish" 12 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sync-op-full.x b/gcc/testsuite/gcc.target/aarch64/sync-op-full.x
new file mode 100644
index 000..c24223d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sync-op-full.x
@@ -0,0 +1,73 @@
+int v = 0;

Re: [PATCH 1/3][AArch64] Strengthen barriers for sync-fetch-op builtins.

2015-05-22 Thread Matthew Wahab

On 22/05/15 12:26, Ramana Radhakrishnan wrote:


Ok for trunk?


I can't approve but do you mind taking care of -march=armv8-a in the
arm backend too as that would have the same issues.



Will do,
Matthew



Re: [PATCH] Fix memory orders description in atomic ops built-ins docs.

2015-05-22 Thread Matthew Wahab

On 21/05/15 19:26, Torvald Riegel wrote:

On Thu, 2015-05-21 at 16:45 +0100, Matthew Wahab wrote:

On 19/05/15 20:20, Torvald Riegel wrote:

On Mon, 2015-05-18 at 17:36 +0100, Matthew Wahab wrote:

Hello,

On 15/05/15 17:22, Torvald Riegel wrote:

This patch improves the documentation of the built-ins for atomic
operations.




I think we're talking at cross-purposes and not really getting anywhere. I've replied 
to some of your comments below, but it's mostly a restatement of points already made.


I'll repeat that, although I have concerns about the patch, I don't object to it 
going in. Maybe wait a few days to see if anybody else wants to comment but, at this 
point, since it's a documentation patch and won't break anything, it's better to 
just commit and deal with any problems that come up.



We seem to have different views about the purpose of the manual page. I'm 
treating it
as a description of the built-in functions provided by gcc to generate the code
needed to implement the C++11 model. That is, the built-ins are distinct from 
C++11
and their descriptions should be, as far as possible, independent of the 
methods used
in the C++11 specification to describe the C++11 memory model.


OK.  But we'd need a *precise* specification of what they do if we'd
want to make them separate from the C++11 memory model.  And we don't
have that, would you agree?


There is a difference between the sort of description that is needed for a formal 
specification and the sort that would be needed for a programmers manual. The best 
example of this that I can think of is the Standard ML definition 
(http://sml-family.org). That is a mathematical (so precise) definition that is 
invaluable if you want an unambiguous specification of the language. But it's useless 
for anybody who just wants to use Standard ML to write programs. For that, you need 
to go to the imprecise descriptions that are given in books about SML and in the 
documentation for SML compilers and libraries.


The problem with using the formal SML definition is the same as with using the formal 
C++11 definition: most of it is detail needed to make things in the formal 
specification come out the right way. That detail, about things that are internal to 
the definition of the specification, makes it difficult to understand what is 
intended to be available for the user.


The GCC manual seems to me to be aimed more at the people who want to use GCC to 
write code and I don't think that the patch makes much allowance for them. I do think 
that more precise statements about the relationship to C++11 are useful to have. Its 
the sort of constraint that ought to be documented somewhere. But it seems to be more 
of interest to compiler writers or, at least, to users who are as knowledgeable as 
compiler writers. A document targeting that group, such as the GCC internals or a GCC 
wiki-page, would seem to be a better place for the information.


(Another example of the distinction may be the Intel Itanium ABI documentation which 
has a programmers description of the synchronization primitives and a separate, 
formal description of their behaviour.)


For what it's worth, my view of how C++11, the __atomics and the machine code line up 
is that each is a distinct layer. Each layer implements the requirements of the 
higher (more abstract) layer but is otherwise entirely independent. That's why I 
think that a description of the __atomic built-in, aimed at compiler users rather 
than writers and that doesn't expect knowledge of C++11 is desirable and possible.



I'm also concerned that the patch, by describing things in terms of formal C++11
concepts, makes it more difficult for people to know what the built-ins can be
expected to do and so make the built-in more difficult to use[..]


I hadn't thought about that possible danger, but that would be right.
The way I would prefer to counter that is that we add a big fat warning
to the __sync built-ins that we don't have a precise specification for
them and that there are several corners of hand-waving and potentially
further issues, and that this is another reason to prefer the __atomic
built-ins.  PR 65697 etc. are enough indication for me that we indeed
lack a proper specification.


Increasing uncertainty about the __sync built-ins wouldn't make people move to 
equally uncertain __atomic built-ins. There's enough knowledge and use of the __sync 
builtins to make them a more comfortable choice than the C++11 atomics, and in the 
worst case it would push people to roll their own synchronization functions with 
assembler or system calls.



Well, "just wants to add a memory barrier" is a the start of the
problem.  The same way one needs to understand a hardware memory model
to pick the right HW instruction(s), the same one needs to understand a
programming language memory model to pick a fence and 

Re: [PATCH 1/3][AArch64][PR target/65697] Strengthen barriers for sync-fetch-op builtins.

2015-06-01 Thread Matthew Wahab

On 26/05/15 10:32, James Greenhalgh wrote:

Please tie this to the PR which was open in the ChangeLog entry.


(aarch64_split_atomic_op): Check for __sync memory models, emit
appropriate initial and final barriers.


I don't see any new initial barriers. I think you are referring to
relaxing the ldaxr to an ldxr for __sync primitives, in which case, say
that.


+/* Emit a post-operation barrier.  */


This comment could do with some more detail. What is a post-operation
barrier? When do we need one? What is the MODEL parameter?


+  /* A __sync operation will emit a final fence to stop code hoisting, so the


Can we pick a consistent terminology between fence/barrier? They are
currently used interchangeably, but I think we generally prefer barrier
in the AArch64 port.


-  aarch64_emit_load_exclusive (mode, old_out, mem, model_rtx);
+  aarch64_emit_load_exclusive (mode, old_out, mem, load_model_rtx);


To my mind, these two hunks would be marginally easier to follow if
we combined them, as so:


Attached updated patch:
- Expanded the comment for aarch64_emit_post_barrier.
- Used 'barrier' rather than 'fence' in comments.
- Simplified the code for the initial load.

Tested with check-gcc for aarch64-none-linux-gnu.

Ok?
Matthew

2015-06-01  Matthew Wahab  

PR target/65697
* config/aarch64/aarch64.c (aarch64_emit_post_barrier): New.
(aarch64_split_atomic_op): Check for __sync memory models, emit
initial loads and final barriers as appropriate.

From 02effa4c1e3e219f727c88091ebd9938d90c3f8a Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Fri, 15 May 2015 09:26:28 +0100
Subject: [PATCH 1/3] [AArch64] Strengthen barriers for sync-fetch-op builtin.

Change-Id: I3342a572d672163ffc703e4e51603744680334fc
---
 gcc/config/aarch64/aarch64.c | 31 ++-
 1 file changed, 30 insertions(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 083b9b4..b083e12 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -9409,6 +9409,23 @@ aarch64_expand_compare_and_swap (rtx operands[])
   emit_insn (gen_rtx_SET (bval, x));
 }
 
+/* Emit a barrier, that is appropriate for memory model MODEL, at the end of a
+   sequence implementing an atomic operation.  */
+
+static void
+aarch64_emit_post_barrier (enum memmodel model)
+{
+  const enum memmodel base_model = memmodel_base (model);
+
+  if (is_mm_sync (model)
+  && (base_model == MEMMODEL_ACQUIRE
+	  || base_model == MEMMODEL_ACQ_REL
+	  || base_model == MEMMODEL_SEQ_CST))
+{
+  emit_insn (gen_mem_thread_fence (GEN_INT (MEMMODEL_SEQ_CST)));
+}
+}
+
 /* Split a compare and swap pattern.  */
 
 void
@@ -9471,6 +9488,8 @@ aarch64_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem,
 {
   machine_mode mode = GET_MODE (mem);
   machine_mode wmode = (mode == DImode ? DImode : SImode);
+  const enum memmodel model = memmodel_from_int (INTVAL (model_rtx));
+  const bool is_sync = is_mm_sync (model);
   rtx_code_label *label;
   rtx x;
 
@@ -9485,7 +9504,13 @@ aarch64_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem,
 old_out = new_out;
   value = simplify_gen_subreg (wmode, value, mode, 0);
 
-  aarch64_emit_load_exclusive (mode, old_out, mem, model_rtx);
+  /* The initial load can be relaxed for a __sync operation since a final
+ barrier will be emitted to stop code hoisting.  */
+ if (is_sync)
+aarch64_emit_load_exclusive (mode, old_out, mem,
+ GEN_INT (MEMMODEL_RELAXED));
+  else
+aarch64_emit_load_exclusive (mode, old_out, mem, model_rtx);
 
   switch (code)
 {
@@ -9521,6 +9546,10 @@ aarch64_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem,
   x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
 			gen_rtx_LABEL_REF (Pmode, label), pc_rtx);
   aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
+
+  /* Emit any final barrier needed for a __sync operation.  */
+  if (is_sync)
+aarch64_emit_post_barrier (model);
 }
 
 static void
-- 
1.9.1



Re: [PATCH 2/3][AArch64][PR target/65697] Strengthen barriers for sync-compare-swap builtins.

2015-06-01 Thread Matthew Wahab

On 22/05/15 09:28, Matthew Wahab wrote:

[Added PR number and updated patches]

This patch changes the code generated for __sync_type_compare_and_swap to

ldxr reg; cmp; bne label; stlxr; cbnz; label: dmb ish; mov .., reg

This removes the acquire-barrier from the load and ends the operation with a
fence to prevent memory references appearing after the __sync operation from
being moved ahead of the store-release.

This also strengthens the acquire barrier generated for __sync_lock_test_and_set
(which, like compare-and-swap, is implemented as a form of atomic exchange):

ldaxr; stxr; cbnz
becomes
ldxr; stxr; cbnz; dmb ish



Updated patch:
- Used 'barrier' rather than 'fence' in comments.
- Simplified the code for the initial load.

Tested with check-gcc for aarch64-none-linux-gnu.

Ok for trunk?
Matthew

2015-06-01  Matthew Wahab  

PR target/65697
* config/aarch64/aarch64.c (aarch64_split_compare_and_swap): Check
for __sync memory models, emit initial loads and final barriers as
appropriate.

From e1f68896db3b367d43a7ae863339dbe100244360 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Fri, 15 May 2015 09:31:06 +0100
Subject: [PATCH 2/3] [AArch64] Strengthen barriers for sync-compare-swap
 builtins.

Change-Id: I335771f2f42ea951d227f20f6cb9daa07330614d
---
 gcc/config/aarch64/aarch64.c | 18 --
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index b083e12..2db8f4f 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -9436,14 +9436,18 @@ aarch64_split_compare_and_swap (rtx operands[])
   bool is_weak;
   rtx_code_label *label1, *label2;
   rtx x, cond;
+  enum memmodel model;
+  rtx model_rtx;
 
   rval = operands[0];
   mem = operands[1];
   oldval = operands[2];
   newval = operands[3];
   is_weak = (operands[4] != const0_rtx);
+  model_rtx = operands[5];
   scratch = operands[7];
   mode = GET_MODE (mem);
+  model = memmodel_from_int (INTVAL (model_rtx));
 
   label1 = NULL;
   if (!is_weak)
@@ -9453,7 +9457,13 @@ aarch64_split_compare_and_swap (rtx operands[])
 }
   label2 = gen_label_rtx ();
 
-  aarch64_emit_load_exclusive (mode, rval, mem, operands[5]);
+  /* The initial load can be relaxed for a __sync operation since a final
+ barrier will be emitted to stop code hoisting.  */
+  if (is_mm_sync (model))
+aarch64_emit_load_exclusive (mode, rval, mem,
+ GEN_INT (MEMMODEL_RELAXED));
+  else
+aarch64_emit_load_exclusive (mode, rval, mem, model_rtx);
 
   cond = aarch64_gen_compare_reg (NE, rval, oldval);
   x = gen_rtx_NE (VOIDmode, cond, const0_rtx);
@@ -9461,7 +9471,7 @@ aarch64_split_compare_and_swap (rtx operands[])
 			gen_rtx_LABEL_REF (Pmode, label2), pc_rtx);
   aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
 
-  aarch64_emit_store_exclusive (mode, scratch, mem, newval, operands[5]);
+  aarch64_emit_store_exclusive (mode, scratch, mem, newval, model_rtx);
 
   if (!is_weak)
 {
@@ -9478,6 +9488,10 @@ aarch64_split_compare_and_swap (rtx operands[])
 }
 
   emit_label (label2);
+
+  /* Emit any final barrier needed for a __sync operation.  */
+  if (is_mm_sync (model))
+aarch64_emit_post_barrier (model);
 }
 
 /* Split an atomic operation.  */
-- 
1.9.1



[Aarch64] Add support for ARMv8.1 command line options.

2015-06-04 Thread Matthew Wahab

ARMv8.1 is a set of optional architectural extensions to ARMv8. Support, added
by other patches, is enabled in binutils for ARMv8.1 and for the individual
extensions by using the architecture name "armv8.1-a" or by adding the extension
name to "armv8-a".

This patch adds support to gcc for using "armv8.1-a" as an architecture name and
for using "armv8-a" with one or more of the ARMv8.1 extension names "lse",
"pan", "rdma" or "lor" . The new options are passed through to the toolchain and
don't affect code generation in gcc.
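
For example, with this patch the following invocations should be accepted and
the architecture selection passed through to the assembler (illustrative
command lines):

  gcc -march=armv8.1-a -c foo.c
  gcc -march=armv8-a+lse -c foo.c
  gcc -march=armv8-a+pan+lor -c foo.c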

Tested aarch64-none-linux-gnu with check-gcc.

Ok for trunk?
Matthew

gcc/
2015-06-04  Matthew Wahab  

* config/aarch64/aarch64-arches.def: Add "armv8.1-a".
* config/aarch64/aarch64-option-extensions.def: Update "fp",
"simd" and "crypto".  Add "lse", "pan", "lor" and "rdma".
* gcc/config/aarch64/aarch64.h (AARCH64_FL_LSE): New.
(AARCH64_FL_PAN): New.
(AARCH64_FL_LOR): New.
(AARCH64_FL_RDMA): New.
(AARCH64_FL_FOR_ARCH8_1): New.
* doc/invoke.texi (AArch64 Options): Add "armv8.1-a" to
-march. Add "lse", "pan", "lor", "rdma" to feature modifiers.
diff --git a/gcc/config/aarch64/aarch64-arches.def b/gcc/config/aarch64/aarch64-arches.def
index bf4e185..abbfce6 100644
--- a/gcc/config/aarch64/aarch64-arches.def
+++ b/gcc/config/aarch64/aarch64-arches.def
@@ -27,3 +27,4 @@
the flags implied by the architecture.  */
 
 AARCH64_ARCH("armv8-a",	  generic,	 8,  AARCH64_FL_FOR_ARCH8)
+AARCH64_ARCH("armv8.1-a", generic,	 8,  AARCH64_FL_FOR_ARCH8_1)
diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def
index f296296..1762cc8 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -39,7 +39,11 @@
AArch64, and therefore serves as a template for adding more CPUs in the
future.  */
 
-AARCH64_OPT_EXTENSION("fp",	AARCH64_FL_FP,  AARCH64_FL_FPSIMD | AARCH64_FL_CRYPTO, "fp")
-AARCH64_OPT_EXTENSION("simd",	AARCH64_FL_FPSIMD,  AARCH64_FL_SIMD | AARCH64_FL_CRYPTO,   "asimd")
-AARCH64_OPT_EXTENSION("crypto",	AARCH64_FL_CRYPTO | AARCH64_FL_FPSIMD,  AARCH64_FL_CRYPTO, "aes pmull sha1 sha2")
+AARCH64_OPT_EXTENSION("fp",	AARCH64_FL_FP,  AARCH64_FL_FPSIMD | AARCH64_FL_CRYPTO | AARCH64_FL_RDMA, "fp")
+AARCH64_OPT_EXTENSION("simd",	AARCH64_FL_FPSIMD,  AARCH64_FL_SIMD | AARCH64_FL_CRYPTO | AARCH64_FL_RDMA,   "asimd")
+AARCH64_OPT_EXTENSION("crypto",	AARCH64_FL_CRYPTO | AARCH64_FL_FPSIMD,  AARCH64_FL_CRYPTO,   "aes pmull sha1 sha2")
 AARCH64_OPT_EXTENSION("crc",	AARCH64_FL_CRC, AARCH64_FL_CRC,"crc32")
+AARCH64_OPT_EXTENSION("lse",	AARCH64_FL_LSE, AARCH64_FL_LSE,"lse")
+AARCH64_OPT_EXTENSION("pan",	AARCH64_FL_PAN,		AARCH64_FL_PAN,		"pan")
+AARCH64_OPT_EXTENSION("lor",	AARCH64_FL_LOR,		AARCH64_FL_LOR,		"lor")
+AARCH64_OPT_EXTENSION("rdma",	AARCH64_FL_RDMA | AARCH64_FL_FPSIMD,	AARCH64_FL_RDMA,	"rdma")
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 25b9927..a22c6e4 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -201,6 +201,11 @@ extern unsigned aarch64_architecture_version;
 #define AARCH64_FL_CRC(1 << 3)	/* Has CRC.  */
 /* Has static dispatch of FMA.  */
 #define AARCH64_FL_USE_FMA_STEERING_PASS (1 << 4)
+/* ARMv8.1 architecture extensions.  */
+#define AARCH64_FL_LSE	  (1 << 5)  /* Has Large System Extensions.  */
+#define AARCH64_FL_PAN	  (1 << 6)  /* Has Privileged Access Never.  */
+#define AARCH64_FL_LOR	  (1 << 7)  /* Has Limited Ordering regions.  */
+#define AARCH64_FL_RDMA	  (1 << 8)  /* Has ARMv8.1 Adv.SIMD.  */
 
 /* Has FP and SIMD.  */
 #define AARCH64_FL_FPSIMD (AARCH64_FL_FP | AARCH64_FL_SIMD)
@@ -210,6 +215,9 @@ extern unsigned aarch64_architecture_version;
 
 /* Architecture flags that effect instruction selection.  */
 #define AARCH64_FL_FOR_ARCH8   (AARCH64_FL_FPSIMD)
+#define AARCH64_FL_FOR_ARCH8_1			   \
+  (AARCH64_FL_FOR_ARCH8 | AARCH64_FL_LSE | AARCH64_FL_PAN \
+   | AARCH64_FL_LOR | AARCH64_FL_RDMA)
 
 /* Macros to test ISA flags.  */
 extern unsigned long aarch64_isa_flags;
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index e25bd62..96033db 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi

Update __atomic builtins documentation.

2015-04-20 Thread Matthew Wahab

Hello,

The documentation for the __atomic builtins isn't clear about their expectations
and behaviour. In particular, assumptions about the C11/C++11 restrictions on
programs should be stated and the different behaviour of memory models in fences
and in operations should be noted. The behaviour of compare-exchange when the
compare fails is also confusing and the description of the implementation of the
__atomics is mixed in with the description of their functionality.

This patch tries to deal with some of these problems.
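
For reference, the compare-exchange failure case works like this (a hedged
sketch; the names are hypothetical):

int v;

void
cas_example (void)
{
  int expected = 0;
  /* On failure, the current value of v is written back into expected
     and the failure memory order (the last argument) is applied.  */
  if (!__atomic_compare_exchange_n (&v, &expected, 1, /*weak=*/0,
				    __ATOMIC_SEQ_CST, __ATOMIC_RELAXED))
    {
      /* expected now holds the value that was found in v.  */
    }
}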

Tested by looking at the html.

Ok for trunk?
Matthew

2015-04-20  Matthew Wahab  

* doc/extend.texi (__atomic Builtins): Move implementation details
to the end of the description, rewrite opening paragraphs, state
difference with __sync builtins, state C11/C++11 assumptions,
weaken itemized descriptions, add explanation of memory model
behaviour, expand description of compare-exchange, simplify text.
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 7470e40..5b551c1 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -8353,45 +8353,47 @@ are not prevented from being speculated to before the barrier.
 @node __atomic Builtins
 @section Built-in Functions for Memory Model Aware Atomic Operations
 
-The following built-in functions approximately match the requirements for
-C++11 memory model. Many are similar to the @samp{__sync} prefixed built-in
-functions, but all also have a memory model parameter.  These are all
-identified by being prefixed with @samp{__atomic}, and most are overloaded
-such that they work with multiple types.
-
-GCC allows any integral scalar or pointer type that is 1, 2, 4, or 8
-bytes in length. 16-byte integral types are also allowed if
-@samp{__int128} (@pxref{__int128}) is supported by the architecture.
-
-Target architectures are encouraged to provide their own patterns for
-each of these built-in functions.  If no target is provided, the original 
-non-memory model set of @samp{__sync} atomic built-in functions are
-utilized, along with any required synchronization fences surrounding it in
-order to achieve the proper behavior.  Execution in this case is subject
-to the same restrictions as those built-in functions.
-
-If there is no pattern or mechanism to provide a lock free instruction
-sequence, a call is made to an external routine with the same parameters
-to be resolved at run time.
+The following built-in functions approximately match the requirements
+for C++11 concurrency and memory models.  They are all
+identified by being prefixed with @samp{__atomic} and most are
+overloaded so that they work with multiple types.
+
+These functions are intended to replace the legacy @samp{__sync}
+builtins.  The main difference is that the memory model to be used is a
+parameter to the functions.  New code should always use the
+@samp{__atomic} builtins rather than the @samp{__sync} builtins.
+
+Note that the @samp{__atomic} builtins assume that programs will
+conform to the C++11 model for concurrency.  In particular, they assume
+that programs are free of data races.  See the C++11 standard for
+detailed definitions.
+
+The @samp{__atomic} builtins can be used with any integral scalar or
+pointer type that is 1, 2, 4, or 8 bytes in length.  16-byte integral
+types are also allowed if @samp{__int128} (@pxref{__int128}) is
+supported by the architecture.
 
 The four non-arithmetic functions (load, store, exchange, and 
 compare_exchange) all have a generic version as well.  This generic
 version works on any data type.  If the data type size maps to one
 of the integral sizes that may have lock free support, the generic
-version utilizes the lock free built-in function.  Otherwise an
+version uses the lock free built-in function.  Otherwise an
 external call is left to be resolved at run time.  This external call is
 the same format with the addition of a @samp{size_t} parameter inserted
 as the first parameter indicating the size of the object being pointed to.
 All objects must be the same size.
 
 There are 6 different memory models that can be specified.  These map
-to the same names in the C++11 standard.  Refer there or to the
-@uref{http://gcc.gnu.org/wiki/Atomic/GCCMM/AtomicSync,GCC wiki on
-atomic synchronization} for more detailed definitions.  These memory
-models integrate both barriers to code motion as well as synchronization
-requirements with other threads. These are listed in approximately
-ascending order of strength. It is also possible to use target specific
-flags for memory model flags, like Hardware Lock Elision.
+to the C++11 memory models with the same names, see the C++11 standard
+or the @uref{http://gcc.gnu.org/wiki/Atomic/GCCMM/AtomicSync,GCC wiki
+on atomic synchronization} for detailed definitions.  Individual
+targets may also support additional memory models for use on specific
+architectures.  Refer to the target documentation for details of
+these.
+
+The m
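
The message is truncated above.  To illustrate the generic versus sized forms
it describes (a hedged sketch; the struct type is hypothetical):

int i;
struct pair { int a; int b; } p, newp, oldp;

void
examples (void)
{
  /* Sized (_n) form: a 4-byte integral type, eligible for lock-free
     support where the target provides it.  */
  int old = __atomic_exchange_n (&i, 1, __ATOMIC_SEQ_CST);

  /* Generic form: works on any data type; where the size has no
     lock-free mapping this becomes an external call whose first
     parameter is a size_t giving the object size.  */
  __atomic_exchange (&p, &newp, &oldp, __ATOMIC_SEQ_CST);

  (void) old;
}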

[PATCH][AArch64] Add branch-cost to cpu tuning information.

2015-04-21 Thread Matthew Wahab

The AArch64 backend sets BRANCH_COST to the constant value 2 for all CPUs,
meaning that the compiler treats branches as costing the same on every core.

This patch reworks the handling of branch costs to allow per-CPU values to be
set. The actual branch-cost values are unchanged, as the correct values will
need to be decided for each core.

Tested aarch64-none-linux-gnu with gcc-check.

Ok for trunk?
Matthew

2015-05-21  Matthew Wahab  

* gcc/config/aarch64/aarch64-protos.h (struct cpu_branch_cost): New.
(tune_params): Add field branch_costs.
(aarch64_branch_cost): Declare.
* gcc/config/aarch64/aarch64.c (generic_branch_cost): New.
(generic_tunings): Set field cpu_branch_cost to generic_branch_cost.
(cortexa53_tunings): Likewise.
(cortexa57_tunings): Likewise.
(thunderx_tunings): Likewise.
(xgene1_tunings): Likewise.
(aarch64_branch_cost): Define.
* gcc/config/aarch64/aarch64.h (BRANCH_COST): Redefine.

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 8676c5c..77b01fa 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -162,12 +162,20 @@ struct cpu_vector_cost
   const int cond_not_taken_branch_cost;  /* Cost of not taken branch.  */
 };
 
+/* Branch costs.  */
+struct cpu_branch_cost
+{
+  const int predictable;/* Predictable branch or optimizing for size.  */
+  const int unpredictable;  /* Unpredictable branch or optimizing for speed.  */
+};
+
 struct tune_params
 {
   const struct cpu_cost_table *const insn_extra_cost;
   const struct cpu_addrcost_table *const addr_cost;
   const struct cpu_regmove_cost *const regmove_cost;
   const struct cpu_vector_cost *const vec_costs;
+  const struct cpu_branch_cost *const branch_costs;
   const int memmov_cost;
   const int issue_rate;
   const unsigned int fuseable_ops;
@@ -259,6 +267,8 @@ void aarch64_print_operand (FILE *, rtx, char);
 void aarch64_print_operand_address (FILE *, rtx);
 void aarch64_emit_call_insn (rtx);
 
+int aarch64_branch_cost (bool, bool);
+
 /* Initialize builtins for SIMD intrinsics.  */
 void init_aarch64_simd_builtins (void);
 
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 77a641e..a020316 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -339,12 +339,20 @@ static const struct cpu_vector_cost xgene1_vector_cost =
 #define AARCH64_FUSE_ADRP_LDR	(1 << 3)
 #define AARCH64_FUSE_CMP_BRANCH	(1 << 4)
 
+/* Generic costs for branch instructions.  */
+static const struct cpu_branch_cost generic_branch_cost =
+{
+  2,  /* Predictable.  */
+  2   /* Unpredictable.  */
+};
+
 static const struct tune_params generic_tunings =
 {
   &cortexa57_extra_costs,
   &generic_addrcost_table,
   &generic_regmove_cost,
   &generic_vector_cost,
+  &generic_branch_cost,
   4, /* memmov_cost  */
   2, /* issue_rate  */
   AARCH64_FUSE_NOTHING, /* fuseable_ops  */
@@ -362,6 +370,7 @@ static const struct tune_params cortexa53_tunings =
   &generic_addrcost_table,
   &cortexa53_regmove_cost,
   &generic_vector_cost,
+  &generic_branch_cost,
   4, /* memmov_cost  */
   2, /* issue_rate  */
   (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
@@ -380,6 +389,7 @@ static const struct tune_params cortexa57_tunings =
   &cortexa57_addrcost_table,
   &cortexa57_regmove_cost,
   &cortexa57_vector_cost,
+  &generic_branch_cost,
   4, /* memmov_cost  */
   3, /* issue_rate  */
   (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
@@ -398,6 +408,7 @@ static const struct tune_params thunderx_tunings =
   &generic_addrcost_table,
   &thunderx_regmove_cost,
   &generic_vector_cost,
+  &generic_branch_cost,
   6, /* memmov_cost  */
   2, /* issue_rate  */
   AARCH64_FUSE_CMP_BRANCH, /* fuseable_ops  */
@@ -415,6 +426,7 @@ static const struct tune_params xgene1_tunings =
   &xgene1_addrcost_table,
   &xgene1_regmove_cost,
   &xgene1_vector_cost,
+  &generic_branch_cost,
   6, /* memmov_cost  */
   4, /* issue_rate  */
   AARCH64_FUSE_NOTHING, /* fuseable_ops  */
@@ -5361,6 +5373,19 @@ aarch64_address_cost (rtx x,
   return cost;
 }
 
+int
+aarch64_branch_cost (bool speed_p, bool predictable_p)
+{
+  /* When optimizing for speed, use the cost of unpredictable branches.  */
+  const struct cpu_branch_cost *branch_costs =
+aarch64_tune_params->branch_costs;
+
+  if (!speed_p || predictable_p)
+return branch_costs->predictable;
+  else
+return branch_costs->unpredictable;
+}
+
 /* Return true if the RTX X in mode MODE is a zero or sign extract
usable in an ADD or SUB (extended register) instruction.  */
 static bool
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index bf59e40..93a32f5 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -823,7 +823,8 @@ do {	 \
 #defin
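
The hunk is cut off by the archive; the redefinition presumably forwards
BRANCH_COST to the new hook, along the lines of the following (an assumption,
not the verbatim patch):

/* Presumed shape of the BRANCH_COST redefinition.  */
#define BRANCH_COST(SPEED_P, PREDICTABLE_P) \
  (aarch64_branch_cost (SPEED_P, PREDICTABLE_P))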

Re: [PATCH][ARM] Remove an unused reload hook.

2015-04-23 Thread Matthew Wahab

On 27/02/15 09:41, Richard Earnshaw wrote:

On 19/02/15 12:19, Matthew Wahab wrote:

The LEGITIMIZE_RELOAD_ADDRESS macro is only needed for reload. Since the
ARM backend no longer supports reload, this macro is not needed and this
patch removes it.



This is OK for stage 1.


Committed as r222359.
Matthew

2015-04-23  Matthew Wahab  

* config/arm/arm.h (LEGITIMIZE_RELOAD_ADDRESS): Remove.
(ARM_LEGITIMIZE_RELOAD_ADDRESS): Remove.
(THUMB_LEGITIMIZE_RELOAD_ADDRESS): Remove.
* config/arm/arm.c (arm_legitimize_reload_address): Remove.
(thumb_legitimize_reload_address): Remove.
* config/arm/arm-protos.h (arm_legitimize_reload_address):
Remove.
(thumb_legitimize_reload_address): Remove.



[PATCH][docs] Re: Update __atomic builtins documentation.

2015-04-30 Thread Matthew Wahab

[added tags to subject]

Ping.

On 20/04/15 14:29, Matthew Wahab wrote:

Hello,

The documentation for the __atomic builtins isn't clear about their expectations
and behaviour. In particular, assumptions about the C11/C++11 restrictions on
programs should be stated and the different behaviour of memory models in fences
and in operations should be noted. The behaviour of compare-exchange when the
compare fails is also confusing and the description of the implementation of the
__atomics is mixed in with the description of their functionality.

This patch tries to deal with some of these problems.

Tested by looking at the html.

Ok for trunk?
Matthew

2015-04-20  Matthew Wahab  

* doc/extend.texi (__atomic Builtins): Move implementation details
to the end of the description, rewrite opening paragraphs, state
difference with __sync builtins, state C11/C++11 assumptions,
weaken itemized descriptions, add explanation of memory model
behaviour, expand description of compare-exchange, simplify text.



diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 7470e40..5b551c1 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -8353,45 +8353,47 @@ are not prevented from being speculated to before the barrier.
 @node __atomic Builtins
 @section Built-in Functions for Memory Model Aware Atomic Operations
 
-The following built-in functions approximately match the requirements for
-C++11 memory model. Many are similar to the @samp{__sync} prefixed built-in
-functions, but all also have a memory model parameter.  These are all
-identified by being prefixed with @samp{__atomic}, and most are overloaded
-such that they work with multiple types.
-
-GCC allows any integral scalar or pointer type that is 1, 2, 4, or 8
-bytes in length. 16-byte integral types are also allowed if
-@samp{__int128} (@pxref{__int128}) is supported by the architecture.
-
-Target architectures are encouraged to provide their own patterns for
-each of these built-in functions.  If no target is provided, the original 
-non-memory model set of @samp{__sync} atomic built-in functions are
-utilized, along with any required synchronization fences surrounding it in
-order to achieve the proper behavior.  Execution in this case is subject
-to the same restrictions as those built-in functions.
-
-If there is no pattern or mechanism to provide a lock free instruction
-sequence, a call is made to an external routine with the same parameters
-to be resolved at run time.
+The following built-in functions approximately match the requirements
+for C++11 concurrency and memory models.  They are all
+identified by being prefixed with @samp{__atomic} and most are
+overloaded so that they work with multiple types.
+
+These functions are intended to replace the legacy @samp{__sync}
+builtins.  The main difference is that the memory model to be used is a
+parameter to the functions.  New code should always use the
+@samp{__atomic} builtins rather than the @samp{__sync} builtins.
+
+Note that the @samp{__atomic} builtins assume that programs will
+conform to the C++11 model for concurrency.  In particular, they assume
+that programs are free of data races.  See the C++11 standard for
+detailed definitions.
+
+The @samp{__atomic} builtins can be used with any integral scalar or
+pointer type that is 1, 2, 4, or 8 bytes in length.  16-byte integral
+types are also allowed if @samp{__int128} (@pxref{__int128}) is
+supported by the architecture.
 
 The four non-arithmetic functions (load, store, exchange, and 
 compare_exchange) all have a generic version as well.  This generic
 version works on any data type.  If the data type size maps to one
 of the integral sizes that may have lock free support, the generic
-version utilizes the lock free built-in function.  Otherwise an
+version uses the lock free built-in function.  Otherwise an
 external call is left to be resolved at run time.  This external call is
 the same format with the addition of a @samp{size_t} parameter inserted
 as the first parameter indicating the size of the object being pointed to.
 All objects must be the same size.
 
 There are 6 different memory models that can be specified.  These map
-to the same names in the C++11 standard.  Refer there or to the
-@uref{http://gcc.gnu.org/wiki/Atomic/GCCMM/AtomicSync,GCC wiki on
-atomic synchronization} for more detailed definitions.  These memory
-models integrate both barriers to code motion as well as synchronization
-requirements with other threads. These are listed in approximately
-ascending order of strength. It is also possible to use target specific
-flags for memory model flags, like Hardware Lock Elision.
+to the C++11 memory models with the same names, see the C++11 standard
+or the @uref{http://gcc.gnu.org/wiki/Atomic/GCCMM/AtomicSync,GCC wiki
+on atomic synchronization} for detailed definitions.  Individual
+targets may also support additional memory models for use on specific
+architec

Re: [PATCH][AArch64] Add branch-cost to cpu tuning information.

2015-05-05 Thread Matthew Wahab

On 01/05/15 10:18, Marcus Shawcroft wrote:

On 21 April 2015 at 15:00, Matthew Wahab  wrote:

+int aarch64_branch_cost (bool, bool);
+

You would never guess looking at this .h today, but long ago there was
something close to alphabetical order by function name in place.
Please lift this definition between aarch64_bitmask_imm and
aarch64_classify_symbolic_expression.

+int
+aarch64_branch_cost (bool speed_p, bool predictable_p)
+{

Add an appropriate comment before the function please.


Attached reworked patch:

- Moved declaration of aarch64_branch_cost to after aarch64_bitmask_imm.
- Added comment before definition of aarch64_branch_cost.
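
For reference, the aarch64.h change is only expected to route the
BRANCH_COST macro to the new function; a sketch (the exact hunk is in
the attached patch):

  #define BRANCH_COST(SPEED_P, PREDICTABLE_P) \
    (aarch64_branch_cost (SPEED_P, PREDICTABLE_P))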

Tested aarch64-none-linux-gnu with gcc-check.

Ok for trunk?
Matthew

2015-05-05  Matthew Wahab  

* gcc/config/aarch64/aarch64-protos.h (struct cpu_branch_cost): New.
(tune_params): Add field branch_costs.
(aarch64_branch_cost): Declare.
* gcc/config/aarch64/aarch64.c (generic_branch_cost): New.
(generic_tunings): Set field cpu_branch_cost to generic_branch_cost.
(cortexa53_tunings): Likewise.
(cortexa57_tunings): Likewise.
(thunderx_tunings): Likewise.
(xgene1_tunings): Likewise.
(aarch64_branch_cost): Define.
* gcc/config/aarch64/aarch64.h (BRANCH_COST): Redefine.

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 08ce5f1..931c8b8 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -162,12 +162,20 @@ struct cpu_vector_cost
   const int cond_not_taken_branch_cost;  /* Cost of not taken branch.  */
 };
 
+/* Branch costs.  */
+struct cpu_branch_cost
+{
+  const int predictable;/* Predictable branch or optimizing for size.  */
+  const int unpredictable;  /* Unpredictable branch or optimizing for speed.  */
+};
+
 struct tune_params
 {
   const struct cpu_cost_table *const insn_extra_cost;
   const struct cpu_addrcost_table *const addr_cost;
   const struct cpu_regmove_cost *const regmove_cost;
   const struct cpu_vector_cost *const vec_costs;
+  const struct cpu_branch_cost *const branch_costs;
   const int memmov_cost;
   const int issue_rate;
   const unsigned int fuseable_ops;
@@ -184,6 +192,7 @@ struct tune_params
 HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned, unsigned);
 int aarch64_get_condition_code (rtx);
 bool aarch64_bitmask_imm (HOST_WIDE_INT val, machine_mode);
+int aarch64_branch_cost (bool, bool);
 enum aarch64_symbol_type
 aarch64_classify_symbolic_expression (rtx, enum aarch64_symbol_context);
 bool aarch64_const_vec_all_same_int_p (rtx, HOST_WIDE_INT);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 374b0a9..7bc28ae 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -340,12 +340,20 @@ static const struct cpu_vector_cost xgene1_vector_cost =
 #define AARCH64_FUSE_ADRP_LDR	(1 << 3)
 #define AARCH64_FUSE_CMP_BRANCH	(1 << 4)
 
+/* Generic costs for branch instructions.  */
+static const struct cpu_branch_cost generic_branch_cost =
+{
+  2,  /* Predictable.  */
+  2   /* Unpredictable.  */
+};
+
 static const struct tune_params generic_tunings =
 {
   &cortexa57_extra_costs,
   &generic_addrcost_table,
   &generic_regmove_cost,
   &generic_vector_cost,
+  &generic_branch_cost,
   4, /* memmov_cost  */
   2, /* issue_rate  */
   AARCH64_FUSE_NOTHING, /* fuseable_ops  */
@@ -365,6 +373,7 @@ static const struct tune_params cortexa53_tunings =
   &generic_addrcost_table,
   &cortexa53_regmove_cost,
   &generic_vector_cost,
+  &generic_branch_cost,
   4, /* memmov_cost  */
   2, /* issue_rate  */
   (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
@@ -385,6 +394,7 @@ static const struct tune_params cortexa57_tunings =
   &cortexa57_addrcost_table,
   &cortexa57_regmove_cost,
   &cortexa57_vector_cost,
+  &generic_branch_cost,
   4, /* memmov_cost  */
   3, /* issue_rate  */
   (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
@@ -405,6 +415,7 @@ static const struct tune_params thunderx_tunings =
   &generic_addrcost_table,
   &thunderx_regmove_cost,
   &generic_vector_cost,
+  &generic_branch_cost,
   6, /* memmov_cost  */
   2, /* issue_rate  */
   AARCH64_FUSE_CMP_BRANCH, /* fuseable_ops  */
@@ -424,6 +435,7 @@ static const struct tune_params xgene1_tunings =
   &xgene1_addrcost_table,
   &xgene1_regmove_cost,
   &xgene1_vector_cost,
+  &generic_branch_cost,
   6, /* memmov_cost  */
   4, /* issue_rate  */
   AARCH64_FUSE_NOTHING, /* fuseable_ops  */
@@ -5409,6 +5421,23 @@ aarch64_address_cost (rtx x,
   return cost;
 }
 
+/* Return the cost of a branch.  If SPEED_P is true then the compiler is
+   optimizing for speed.  If PREDICTABLE_P is true then the branch is predicted
+   to be taken.  */
+
+int
+aarch64_branch_cost (bool speed_p, bool predictable_p)
+{
+  /* When optimizing for speed, use the cost of unpredictable branches.  */

Re: [PATCH][ARM] Remove an unused reload hook.

2015-03-05 Thread Matthew Wahab

On 27/02/15 09:41, Richard Earnshaw wrote:

On 19/02/15 12:19, Matthew Wahab wrote:

The LEGITIMIZE_RELOAD_ADDRESS macro is only needed for reload. Since the
ARM backend no longer supports reload, this macro is not needed and this
patch removes it.

gcc/
2015-02-19  Matthew Wahab  

 * config/arm/arm.h (LEGITIMIZE_RELOAD_ADDRESS): Remove.
 (ARM_LEGITIMIZE_RELOAD_ADDRESS): Remove.
 (THUMB_LEGITIMIZE_RELOAD_ADDRESS): Remove.
 * config/arm/arm.c (arm_legitimize_reload_address): Remove.
 (thumb_legitimize_reload_address): Remove.
 * config/arm/arm-protos.h (arm_legitimize_reload_address):
 Remove.
 (thumb_legitimize_reload_address): Remove.



This is OK for stage 1.

I have one open question: can LRA generate the optimizations that these
hooks used to provide through reload?  If not, please could you file
some bugzilla reports so that we don't lose them.

Thanks,
R.


arm_legitimize_reload_address was added by 
https://gcc.gnu.org/ml/gcc-patches/2011-04/msg00605.html. From 
config/arm/arm.c, the optimization turns

 add t1, r2, #4096
 ldr r0, [t1, #4]
 add t2, r2, #4096
 ldr r1, [t2, #8]
into
 add t1, r2, #4096
 ldr r0, [t1, #4]
 ldr r1, [t1, #8]

As far as I can tell, LRA does do this. Compiling the following with -O1:

int bar(int, int, int);
int test1(int* buf)
{
  int a = buf[41000];
  int b = buf[41004];
  int c = buf[41008];
  bar(a, b, c);
  return a +  b + c;
}

gcc version 4.5.1 (Sourcery G++ Lite 2010.09-51), which predates the 
optimization, produces

ldr r3, .L2
ldr r4, [r0, r3]
add r3, r3, #16
ldr r5, [r0, r3]
add r3, r3, #16
ldr r6, [r0, r3]

gcc version 4.9.3 20141119 with and without -mno-lra produce
add r0, r0, #163840
ldr r4, [r0, #160]
ldr r6, [r0, #176]
ldr r5, [r0, #192]
so it looks like the better sequence gets generated.

thumb_legitimize_reload_address was added by 
https://gcc.gnu.org/ml/gcc-patches/2005-08/msg01140.html to fix PR 
23436. It replaces sequences like

mov r3, r9
mov r2, r10
ldr r0, [r3, r2]
with
mov r3, r9
add r3, r3, r10
ldr r0, [r3]

This looks like it's missing from trunk so I'll open a bugzilla report 
for it.


It's quite possible that I've got this all wrong so if I've missed 
something or you'd like me to open a bugzilla report for the ARM 
optimization as well, let me know.


Matthew




Re: [PATCH][ARM] Remove an unused reload hook.

2015-03-05 Thread Matthew Wahab

On 05/03/15 16:34, Matthew Wahab wrote:


thumb_legitimize_reload_address was added by
https://gcc.gnu.org/ml/gcc-patches/2005-08/msg01140.html to fix PR
23436. It replaces sequences like
mov r3, r9
mov r2, r10
ldr r0, [r3, r2]
with
mov r3, r9
add r3, r3, r10
ldr r0, [r3]

This looks like it's missing from trunk so I'll open a bugzilla report
for it.


PR 65326 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65326).
Matthew



Re: [PATCH][libstdc++][Testsuite] isctype test fails for newlib.

2015-03-10 Thread Matthew Wahab

On 09/03/15 12:47, Jonathan Wakely wrote:

On 13/02/15 13:48 +, Matthew Wahab wrote:

Some DOS line endings were introduced into the char/isctype.cc file
when I committed this change.  These aren't visible in a terminal or
with svn diff but do show up in emacs. This is causing the test to
fail in local runs. The wchar_t/isctype.cc file isn't affected.

I've committed the attached patch as obvious, it just removes the DOS
line endings from the file.


That patch still left DOS line-endings in the file.


Sorry, I thought I'd got them all.
Matthew




[PATCH][doc] Update __sync builtins, preferring __atomics.

2015-04-14 Thread Matthew Wahab

Hello,

The documentation for the __sync builtins calls them legacy but doesn't clearly
say that the __atomic builtins should be preferred. This patch adds a statement
to that effect. It also simplifies some of the text and weakens a suggestion of
future change in the __sync builtins' behaviour.
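
As a concrete illustration of the preference the new text states
(sketch only, not part of the patch):

  static int counter;

  int
  old_style (void)
  {
    /* Legacy builtin: behaves as a full barrier.  */
    return __sync_fetch_and_add (&counter, 1);
  }

  int
  new_style (void)
  {
    /* Preferred replacement: the memory order is explicit.  */
    return __atomic_fetch_add (&counter, 1, __ATOMIC_SEQ_CST);
  }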

Tested by looking at the html and info files.

Ok for trunk?
Matthew

2015-04-14  Matthew Wahab  

* doc/extend.texi (__sync Builtins): Simplify some text.  Update
details about implementation.  Make clear preference for __atomic
builtins.  Reduce possibility of future change.
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index d4c41c6..7470e40 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -8213,15 +8213,19 @@ identifier, or a sequence of member accesses and array references.
 The following built-in functions
 are intended to be compatible with those described
 in the @cite{Intel Itanium Processor-specific Application Binary Interface},
-section 7.4.  As such, they depart from the normal GCC practice of using
-the @samp{__builtin_} prefix, and further that they are overloaded such that
-they work on multiple types.
+section 7.4.  As such, they depart from normal GCC practice by not using
+the @samp{__builtin_} prefix and also by being overloaded so that they
+work on multiple types.
 
 The definition given in the Intel documentation allows only for the use of
-the types @code{int}, @code{long}, @code{long long} as well as their unsigned
+the types @code{int}, @code{long}, @code{long long} or their unsigned
 counterparts.  GCC allows any integral scalar or pointer type that is
 1, 2, 4 or 8 bytes in length.
 
+These functions are implemented in terms of the @samp{__atomic}
+builtins (@pxref{__atomic Builtins}).  They should not be used for new
+code which should use the @samp{__atomic} builtins instead.
+
 Not all operations are supported by all target processors.  If a particular
 operation cannot be implemented on the target processor, a warning is
 generated and a call to an external function is generated.  The external
@@ -8243,11 +8247,10 @@ after the operation.
 All of the routines are described in the Intel documentation to take
 ``an optional list of variables protected by the memory barrier''.  It's
 not clear what is meant by that; it could mean that @emph{only} the
-following variables are protected, or it could mean that these variables
-should in addition be protected.  At present GCC ignores this list and
-protects all variables that are globally accessible.  If in the future
-we make some use of this list, an empty list will continue to mean all
-globally accessible variables.
+listed variables are protected, or it could mean a list of additional
+variables to be protected.  The list is ignored by GCC which treats it as
+empty.  GCC interprets an empty list as meaning that all globally
+accessible variables should be protected.
 
 @table @code
 @item @var{type} __sync_fetch_and_add (@var{type} *ptr, @var{type} value, ...)


Re: [PATCH][doc] Update __sync builtins, preferring __atomics.

2015-04-15 Thread Matthew Wahab

Committed with a typo-fix in the text, /an list/a list/, and in the change log
/details about implementation/details about the implementation/.

2015-04-14  Matthew Wahab  

* doc/extend.texi (__sync Builtins): Simplify some text.  Update
details about the implementation.  Make clear preference for
__atomic builtins.  Reduce possibility of future change.

Matthew



Re: [Aarch64] Add support for ARMv8.1 command line options.

2015-06-16 Thread Matthew Wahab

Ping.
Updated patch attached.

On 04/06/15 10:16, Matthew Wahab wrote:

ARMv8.1 is a set of optional architectural extensions to ARMv8. Support, added
by other patches, is enabled in binutils for ARMv8.1 and for the individual
extensions by using the architecture name "armv8.1-a" or by adding the extension
name to "armv8-a".

This patch adds support to gcc for using "armv8.1-a" as an architecture name and
for using "armv8-a" with one or more of the ARMv8.1 extension names "lse",
"pan", "rdma" or "lor" . The new options are passed through to the toolchain and
don't affect code generation in gcc.
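
For example (illustrative command lines only, assuming an
aarch64-none-linux-gnu cross toolchain):

  aarch64-none-linux-gnu-gcc -march=armv8.1-a -c foo.c
  aarch64-none-linux-gnu-gcc -march=armv8-a+lse+pan -c foo.c

Both forms are accepted and the architecture string is passed through
to the toolchain.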

Tested aarch64-none-linux-gnu with check-gcc.

Ok for trunk?
Matthew

gcc/
2015-06-04  Matthew Wahab  

* config/aarch64/aarch64-arches.def: Add "armv8.1-a".
* config/aarch64/aarch64-option-extensions.def: Update "fp",
"simd" and "crypto".  Add "lse", "pan", "lor" and "rdma".
* config/aarch64/aarch64.h (AARCH64_FL_LSE): New.
(AARCH64_FL_PAN): New.
(AARCH64_FL_LOR): New.
(AARCH64_FL_RDMA): New.
(AARCH64_FL_FOR_ARCH8_1): New.
* doc/invoke.texi (AArch64 Options): Add "armv8.1-a" to
-march. Add "lse", "pan", "lor", "rdma" to feature modifiers.



diff --git a/gcc/config/aarch64/aarch64-arches.def b/gcc/config/aarch64/aarch64-arches.def
index bf4e185..abbfce6 100644
--- a/gcc/config/aarch64/aarch64-arches.def
+++ b/gcc/config/aarch64/aarch64-arches.def
@@ -27,3 +27,4 @@
the flags implied by the architecture.  */
 
 AARCH64_ARCH("armv8-a",	  generic,	 8,  AARCH64_FL_FOR_ARCH8)
+AARCH64_ARCH("armv8.1-a", generic,	 8,  AARCH64_FL_FOR_ARCH8_1)
diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def
index f296296..1762cc8 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -39,7 +39,11 @@
AArch64, and therefore serves as a template for adding more CPUs in the
future.  */
 
-AARCH64_OPT_EXTENSION("fp",	AARCH64_FL_FP,  AARCH64_FL_FPSIMD | AARCH64_FL_CRYPTO, "fp")
-AARCH64_OPT_EXTENSION("simd",	AARCH64_FL_FPSIMD,  AARCH64_FL_SIMD | AARCH64_FL_CRYPTO,   "asimd")
-AARCH64_OPT_EXTENSION("crypto",	AARCH64_FL_CRYPTO | AARCH64_FL_FPSIMD,  AARCH64_FL_CRYPTO, "aes pmull sha1 sha2")
+AARCH64_OPT_EXTENSION("fp",	AARCH64_FL_FP,  AARCH64_FL_FPSIMD | AARCH64_FL_CRYPTO | AARCH64_FL_RDMA, "fp")
+AARCH64_OPT_EXTENSION("simd",	AARCH64_FL_FPSIMD,  AARCH64_FL_SIMD | AARCH64_FL_CRYPTO | AARCH64_FL_RDMA,   "asimd")
+AARCH64_OPT_EXTENSION("crypto",	AARCH64_FL_CRYPTO | AARCH64_FL_FPSIMD,  AARCH64_FL_CRYPTO,   "aes pmull sha1 sha2")
 AARCH64_OPT_EXTENSION("crc",	AARCH64_FL_CRC, AARCH64_FL_CRC,"crc32")
+AARCH64_OPT_EXTENSION("lse",	AARCH64_FL_LSE, AARCH64_FL_LSE,"lse")
+AARCH64_OPT_EXTENSION("pan",	AARCH64_FL_PAN,		AARCH64_FL_PAN,		"pan")
+AARCH64_OPT_EXTENSION("lor",	AARCH64_FL_LOR,		AARCH64_FL_LOR,		"lor")
+AARCH64_OPT_EXTENSION("rdma",	AARCH64_FL_RDMA | AARCH64_FL_FPSIMD,	AARCH64_FL_RDMA,	"rdma")
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 25b9927..a22c6e4 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -201,6 +201,11 @@ extern unsigned aarch64_architecture_version;
#define AARCH64_FL_CRC	  (1 << 3)	/* Has CRC.  */
 /* Has static dispatch of FMA.  */
 #define AARCH64_FL_USE_FMA_STEERING_PASS (1 << 4)
+/* ARMv8.1 architecture extensions.  */
+#define AARCH64_FL_LSE	  (1 << 5)  /* Has Large System Extensions.  */
+#define AARCH64_FL_PAN	  (1 << 6)  /* Has Privileged Access Never.  */
+#define AARCH64_FL_LOR	  (1 << 7)  /* Has Limited Ordering regions.  */
+#define AARCH64_FL_RDMA	  (1 << 8)  /* Has ARMv8.1 Adv.SIMD.  */
 
 /* Has FP and SIMD.  */
 #define AARCH64_FL_FPSIMD (AARCH64_FL_FP | AARCH64_FL_SIMD)
@@ -210,6 +215,9 @@ extern unsigned aarch64_architecture_version;
 
 /* Architecture flags that effect instruction selection.  */
 #define AARCH64_FL_FOR_ARCH8   (AARCH64_FL_FPSIMD)
+#define AARCH64_FL_FOR_ARCH8_1			   \
+  (AARCH64_FL_FOR_ARCH8 | AARCH64_FL_LSE | AARCH64_FL_PAN \
+   | AARCH64_FL_LOR | AARCH64_FL_RDMA)
 
 /* Macros to test ISA flags.  */
 extern unsigned long aarch64_isa_flags;
diff --git a/gcc/doc/invoke.texi b/gcc/

[PATCH 1/3][ARM][PR target/65697] Strengthen memory barriers for __sync builtins

2015-06-22 Thread Matthew Wahab

This is the ARM version of the patches to strengthen memory barriers for the
__sync builtins on ARMv8 targets
(https://gcc.gnu.org/ml/gcc-patches/2015-05/msg01989.html).

The problem is that the barriers generated for the __sync builtins for ARMv8
targets are too weak. This affects the full and the acquire barriers in the
__sync fetch-and-op, compare-and-swap functions and __sync_lock_test_and_set.

This patch series changes the code to strengthen the barriers by replacing
initial load-acquires with a simple load and adding a final memory barrier to
prevent code hoisting.

- Full barriers:  __sync_fetch_and_op, __sync_op_and_fetch
  __sync_*_compare_and_swap

  [load-acquire; code; store-release]
  becomes
  [load; code ; store-release; barrier].

- Acquire barriers:  __sync_lock_test_and_set

  [load-acquire; code; store]
  becomes
  [load; code; store; barrier]
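
At the source level the affected sequences come from code like this
(sketch only):

  int v;

  int
  f (int n)
  {
    return __sync_fetch_and_add (&v, n);
  }

On ARMv8 the initial ldaex of the exclusive loop becomes a plain ldrex
and a dmb is emitted after the final store-release.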

This patch changes the code generated for __sync_fetch_and_ and
__sync__and_fetch builtins.

Tested as part of a series for arm-none-linux-gnueabihf with check-gcc.

Ok for trunk?
Matthew

gcc/
2015-06-22  Matthew Wahab  

PR target/65697
* config/arm/arm.c (arm_split_atomic_op): For ARMv8, replace an
initial acquire barrier with a final full barrier.
From 3e9f71c04dba20ba66b5c9bae284fcac5fdd91ec Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Fri, 22 May 2015 13:31:58 +0100
Subject: [PATCH 1/3] [ARM] Strengthen barriers for sync-fetch-op builtin.

Change-Id: I18f5af5ba4b2e74b5866009d3a090e251eff4a45
---
 gcc/config/arm/arm.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index e79a369..94118f4 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -27668,6 +27668,8 @@ arm_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem,
   rtx_code_label *label;
   rtx x;
 
+  bool is_armv8_sync = arm_arch8 && is_mm_sync (model);
+
   bool use_acquire = TARGET_HAVE_LDACQ
  && !(is_mm_relaxed (model) || is_mm_consume (model)
 			  || is_mm_release (model));
@@ -27676,6 +27678,11 @@ arm_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem,
  && !(is_mm_relaxed (model) || is_mm_consume (model)
 			  || is_mm_acquire (model));
 
+  /* For ARMv8, a load-acquire is too weak for __sync memory orders.  Instead,
+ a full barrier is emitted after the store-release.  */
+  if (is_armv8_sync)
+use_acquire = false;
+
   /* Checks whether a barrier is needed and emits one accordingly.  */
   if (!(use_acquire || use_release))
 arm_pre_atomic_barrier (model);
@@ -27746,7 +27753,8 @@ arm_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem,
   emit_unlikely_jump (gen_cbranchsi4 (x, cond, const0_rtx, label));
 
   /* Checks whether a barrier is needed and emits one accordingly.  */
-  if (!(use_acquire || use_release))
+  if (is_armv8_sync
+  || !(use_acquire || use_release))
 arm_post_atomic_barrier (model);
 }
 
-- 
1.9.1



[PATCH 2/3][ARM][PR target/65697] Strengthen barriers for compare-and-swap builtin.

2015-06-22 Thread Matthew Wahab

This is the ARM version of the patches to strengthen memory barriers for the
__sync builtins on ARMv8 targets
(https://gcc.gnu.org/ml/gcc-patches/2015-05/msg01989.html).

This patch changes the code generated for __sync_type_compare_and_swap to remove
the acquire-barrier from the load and end the operation with a fence. This also
strengthens the acquire barrier generated for __sync_lock_test_and_set which,
like compare-and-swap, is implemented as a form of atomic exchange.
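
For reference, the __sync_lock_test_and_set case corresponds to source
like this sketch (illustrative only):

  static int lock;

  void
  take_lock (void)
  {
    /* Atomic exchange: stores 1 and returns the previous value, with
       acquire semantics.  */
    while (__sync_lock_test_and_set (&lock, 1))
      ;
  }

  void
  drop_lock (void)
  {
    __sync_lock_release (&lock);
  }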

Tested as part of a series for arm-none-linux-gnueabihf with check-gcc.

Ok for trunk?
Matthew

gcc/
2015-06-22  Matthew Wahab  

PR target/65697
* config/arm/arm.c (arm_split_compare_and_swap): For ARMv8, replace an
initial acquire barrier with a final full barrier.

From ddb9a45acda7bb64d91c446bc40afe4b78fcc1e1 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Fri, 22 May 2015 13:36:39 +0100
Subject: [PATCH 2/3] [ARM] Strengthen barriers for compare-and-swap builtin.

Change-Id: I43381b2ea88492f807d85a73d233369334c99881
---
 gcc/config/arm/arm.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 94118f4..4610ff6 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -27603,6 +27603,8 @@ arm_split_compare_and_swap (rtx operands[])
   scratch = operands[7];
   mode = GET_MODE (mem);
 
+  bool is_armv8_sync = arm_arch8 && is_mm_sync (mod_s);
+
   bool use_acquire = TARGET_HAVE_LDACQ
  && !(is_mm_relaxed (mod_s) || is_mm_consume (mod_s)
 			  || is_mm_release (mod_s));
@@ -27611,6 +27613,11 @@ arm_split_compare_and_swap (rtx operands[])
  && !(is_mm_relaxed (mod_s) || is_mm_consume (mod_s)
 			  || is_mm_acquire (mod_s));
 
+  /* For ARMv8, the load-acquire is too weak for __sync memory orders.  Instead,
+ a full barrier is emitted after the store-release.  */
+  if (is_armv8_sync)
+use_acquire = false;
+
   /* Checks whether a barrier is needed and emits one accordingly.  */
   if (!(use_acquire || use_release))
 arm_pre_atomic_barrier (mod_s);
@@ -27651,7 +27658,8 @@ arm_split_compare_and_swap (rtx operands[])
 emit_label (label2);
 
   /* Checks whether a barrier is needed and emits one accordingly.  */
-  if (!(use_acquire || use_release))
+  if (is_armv8_sync
+  || !(use_acquire || use_release))
 arm_post_atomic_barrier (mod_s);
 
   if (is_mm_relaxed (mod_f))
-- 
1.9.1



[PATCH 3/3][ARM][PR target/65697] Add tests for __sync builtins.

2015-06-22 Thread Matthew Wahab

This is the ARM version of the patches to strengthen memory barriers for the
__sync builtins on ARMv8 targets
(https://gcc.gnu.org/ml/gcc-patches/2015-05/msg01989.html).

This patch adds tests for the code generated by the ARM backend for the __sync
builtins.

Tested the series for arm-none-linux-gnueabihf with check-gcc.

Ok for trunk?
Matthew

gcc/testsuite
2015-06-22  Matthew Wahab  

PR target/65697
* gcc.target/arm/armv8-sync-comp-swap.c: New.
* gcc.target/arm/armv8-sync-op-acquire.c: New.
* gcc.target/arm/armv8-sync-op-full.c: New.
* gcc.target/arm/armv8-sync-op-release.c: New.

From 8157c7480a9d6d559013d02e24519d1b7ba1ed5b Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Wed, 3 Jun 2015 16:27:55 +0100
Subject: [PATCH 3/3] [ARM] Add test cases.

Change-Id: I0f2257ce5b5e7f9d0f75e57e6be22fd9733ed3ca
---
 gcc/testsuite/gcc.target/arm/armv8-sync-comp-swap.c  | 10 ++
 gcc/testsuite/gcc.target/arm/armv8-sync-op-acquire.c | 10 ++
 gcc/testsuite/gcc.target/arm/armv8-sync-op-full.c| 10 ++
 gcc/testsuite/gcc.target/arm/armv8-sync-op-release.c |  8 
 4 files changed, 38 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/arm/armv8-sync-comp-swap.c
 create mode 100644 gcc/testsuite/gcc.target/arm/armv8-sync-op-acquire.c
 create mode 100644 gcc/testsuite/gcc.target/arm/armv8-sync-op-full.c
 create mode 100644 gcc/testsuite/gcc.target/arm/armv8-sync-op-release.c

diff --git a/gcc/testsuite/gcc.target/arm/armv8-sync-comp-swap.c b/gcc/testsuite/gcc.target/arm/armv8-sync-comp-swap.c
new file mode 100644
index 000..f96c81a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/armv8-sync-comp-swap.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_arch_v8a_ok } */
+/* { dg-options "-O2" } */
+/* { dg-add-options arm_arch_v8a } */
+
+#include "../aarch64/sync-comp-swap.x"
+
+/* { dg-final { scan-assembler-times "ldrex" 2 } } */
+/* { dg-final { scan-assembler-times "stlex" 2 } } */
+/* { dg-final { scan-assembler-times "dmb" 2 } } */
diff --git a/gcc/testsuite/gcc.target/arm/armv8-sync-op-acquire.c b/gcc/testsuite/gcc.target/arm/armv8-sync-op-acquire.c
new file mode 100644
index 000..8d6659b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/armv8-sync-op-acquire.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_arch_v8a_ok } */
+/* { dg-options "-O2" } */
+/* { dg-add-options arm_arch_v8a } */
+
+#include "../aarch64/sync-op-acquire.x"
+
+/* { dg-final { scan-assembler-times "ldrex" 1 } } */
+/* { dg-final { scan-assembler-times "stlex" 1 } } */
+/* { dg-final { scan-assembler-times "dmb" 1 } } */
diff --git a/gcc/testsuite/gcc.target/arm/armv8-sync-op-full.c b/gcc/testsuite/gcc.target/arm/armv8-sync-op-full.c
new file mode 100644
index 000..a5ad3bd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/armv8-sync-op-full.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_arch_v8a_ok } */
+/* { dg-options "-O2" } */
+/* { dg-add-options arm_arch_v8a } */
+
+#include "../aarch64/sync-op-full.x"
+
+/* { dg-final { scan-assembler-times "ldrex" 12 } } */
+/* { dg-final { scan-assembler-times "stlex" 12 } } */
+/* { dg-final { scan-assembler-times "dmb" 12 } } */
diff --git a/gcc/testsuite/gcc.target/arm/armv8-sync-op-release.c b/gcc/testsuite/gcc.target/arm/armv8-sync-op-release.c
new file mode 100644
index 000..0d3be7b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/armv8-sync-op-release.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_arch_v8a_ok } */
+/* { dg-options "-O2" } */
+/* { dg-add-options arm_arch_v8a } */
+
+#include "../aarch64/sync-op-release.x"
+
+/* { dg-final { scan-assembler-times "stl" 1 } } */
-- 
1.9.1



[PATCH 1/2][ARM] Record FPU features as a bit-set

2015-06-22 Thread Matthew Wahab

Hello,

The ARM backend records FPU features as booleans, one for each feature. This
means that adding support for a new feature involves updating every entry in the
list of FPU descriptions in arm-fpus.def. This patch series changes the
representation of FPU features to use a simple bit-set and flags, as is done
elsewhere.

This patch adds the new FPU feature representation, with feature sets
represented as unsigned longs.
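
The intended usage pattern is a simple flag test (sketch only, using
the names added by this patch):

  arm_fpu_fset features = FPU_FL_NEON | FPU_FL_FP16;

  if (ARM_FPU_FSET_HAS (features, FPU_FL_NEON))
    {
      /* NEON-specific handling.  */
    }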

Tested the series for arm-none-linux-gnueabihf with check-gcc

Ok for trunk?
Matthew

gcc/
2015-06-22  Matthew Wahab  

* config/arm/arm.h (arm_fpu_fset): New.
(ARM_FPU_FSET_HAS): New.
(FPU_FL_NONE): New.
(FPU_FL_NEON): New.
(FPU_FL_FP16): New.
(FPU_FL_CRYPTO): New.
From 0ae697751afd9420ece15432e4892a60574b1d56 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Wed, 10 Jun 2015 09:57:55 +0100
Subject: [PATCH 1/2] Add fpu feature set definitions.

Change-Id: I9614d12b19f068ae2e0cebc1a6c3903972c73d6a
---
 gcc/config/arm/arm.h | 13 +
 1 file changed, 13 insertions(+)

diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 373dc85..eadbcec 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -318,6 +318,19 @@ extern void (*arm_lang_output_object_attributes_hook)(void);
   {"mode", "%{!marm:%{!mthumb:-m%(VALUE)}}"}, \
   {"tls", "%{!mtls-dialect=*:-mtls-dialect=%(VALUE)}"},
 
+/* FPU feature sets.  */
+
+typedef unsigned long arm_fpu_fset;
+
+/* Test for an FPU feature.  */
+#define ARM_FPU_FSET_HAS(S,F) (((S) & (F)) == F)
+
+/* FPU Features.  */
+#define FPU_FL_NONE	(0)
+#define FPU_FL_NEON	(1 << 0)	/* NEON instructions.  */
+#define FPU_FL_FP16	(1 << 1)	/* Half-precision.  */
+#define FPU_FL_CRYPTO	(1 << 2)	/* Crypto extensions.  */
+
 /* Which floating point model to use.  */
 enum arm_fp_model
 {
-- 
1.9.1



[PATCH 2/2][ARM] Use new FPU features representation

2015-06-22 Thread Matthew Wahab

Hello,

This patch series changes the representation of FPU features to use a simple
bit-set and flags, as is done elsewhere.

This patch uses the new representation of FPU feature sets.

Tested the series for arm-none-linux-gnueabihf with check-gcc

Ok for trunk?
Matthew

gcc/
2015-06-22  Matthew Wahab  

* config/arm/arm-fpus.def: Replace neon, fp16 and crypto boolean
fields with feature flags.  Update comment.
* config/arm/arm.c (ARM_FPU): Update macro.
* config/arm/arm.h (TARGET_NEON_FP16): Update feature test.
(TARGET_FP16): Likewise.
(TARGET_CRYPTO): Likewise.
(TARGET_NEON): Likewise.
(struct arm_fpu_desc): Remove fields neon, fp16 and crypto.  Add
field features.

From 6f9cd1b41d7597d95bd80aa21344f8e6e011e168 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Wed, 10 Jun 2015 10:11:56 +0100
Subject: [PATCH 2/2] Use new FPU feature definitions.

Change-Id: I0c45e52b08b31433ec2b30fcb666584cabcb826b
---
 gcc/config/arm/arm-fpus.def | 40 
 gcc/config/arm/arm.c|  4 ++--
 gcc/config/arm/arm.h| 22 +-
 3 files changed, 35 insertions(+), 31 deletions(-)

diff --git a/gcc/config/arm/arm-fpus.def b/gcc/config/arm/arm-fpus.def
index 2dfefd6..efd5896 100644
--- a/gcc/config/arm/arm-fpus.def
+++ b/gcc/config/arm/arm-fpus.def
@@ -19,30 +19,30 @@
 
 /* Before using #include to read this file, define a macro:
 
-  ARM_FPU(NAME, MODEL, REV, VFP_REGS, NEON, FP16, CRYPTO)
+  ARM_FPU(NAME, MODEL, REV, VFP_REGS, FEATURES)
 
The arguments are the fields of struct arm_fpu_desc.
 
genopt.sh assumes no whitespace up to the first "," in each entry.  */
 
-ARM_FPU("vfp",		ARM_FP_MODEL_VFP, 2, VFP_REG_D16, false, false, false)
-ARM_FPU("vfpv3",	ARM_FP_MODEL_VFP, 3, VFP_REG_D32, false, false, false)
-ARM_FPU("vfpv3-fp16",	ARM_FP_MODEL_VFP, 3, VFP_REG_D32, false, true, false)
-ARM_FPU("vfpv3-d16",	ARM_FP_MODEL_VFP, 3, VFP_REG_D16, false, false, false)
-ARM_FPU("vfpv3-d16-fp16",	ARM_FP_MODEL_VFP, 3, VFP_REG_D16, false, true, false)
-ARM_FPU("vfpv3xd",	ARM_FP_MODEL_VFP, 3, VFP_REG_SINGLE, false, false, false)
-ARM_FPU("vfpv3xd-fp16",	ARM_FP_MODEL_VFP, 3, VFP_REG_SINGLE, false, true, false)
-ARM_FPU("neon",		ARM_FP_MODEL_VFP, 3, VFP_REG_D32, true , false, false)
-ARM_FPU("neon-fp16",	ARM_FP_MODEL_VFP, 3, VFP_REG_D32, true, true, false)
-ARM_FPU("vfpv4",	ARM_FP_MODEL_VFP, 4, VFP_REG_D32, false, true, false)
-ARM_FPU("vfpv4-d16",	ARM_FP_MODEL_VFP, 4, VFP_REG_D16, false, true, false)
-ARM_FPU("fpv4-sp-d16",	ARM_FP_MODEL_VFP, 4, VFP_REG_SINGLE, false, true, false)
-ARM_FPU("fpv5-sp-d16",	ARM_FP_MODEL_VFP, 5, VFP_REG_SINGLE, false, true, false)
-ARM_FPU("fpv5-d16",	ARM_FP_MODEL_VFP, 5, VFP_REG_D16, false, true, false)
-ARM_FPU("neon-vfpv4",	ARM_FP_MODEL_VFP, 4, VFP_REG_D32, true, true, false)
-ARM_FPU("fp-armv8",	ARM_FP_MODEL_VFP, 8, VFP_REG_D32, false, true, false)
-ARM_FPU("neon-fp-armv8",ARM_FP_MODEL_VFP, 8, VFP_REG_D32, true, true, false)
+ARM_FPU("vfp",		ARM_FP_MODEL_VFP, 2, VFP_REG_D16, FPU_FL_NONE)
+ARM_FPU("vfpv3",	ARM_FP_MODEL_VFP, 3, VFP_REG_D32, FPU_FL_NONE)
+ARM_FPU("vfpv3-fp16",	ARM_FP_MODEL_VFP, 3, VFP_REG_D32, FPU_FL_FP16)
+ARM_FPU("vfpv3-d16",	ARM_FP_MODEL_VFP, 3, VFP_REG_D16, FPU_FL_NONE)
+ARM_FPU("vfpv3-d16-fp16",	ARM_FP_MODEL_VFP, 3, VFP_REG_D16, FPU_FL_FP16)
+ARM_FPU("vfpv3xd",	ARM_FP_MODEL_VFP, 3, VFP_REG_SINGLE, FPU_FL_NONE)
+ARM_FPU("vfpv3xd-fp16",	ARM_FP_MODEL_VFP, 3, VFP_REG_SINGLE, FPU_FL_FP16)
+ARM_FPU("neon",		ARM_FP_MODEL_VFP, 3, VFP_REG_D32, FPU_FL_NEON)
+ARM_FPU("neon-fp16",	ARM_FP_MODEL_VFP, 3, VFP_REG_D32, FPU_FL_NEON | FPU_FL_FP16)
+ARM_FPU("vfpv4",	ARM_FP_MODEL_VFP, 4, VFP_REG_D32, FPU_FL_FP16)
+ARM_FPU("vfpv4-d16",	ARM_FP_MODEL_VFP, 4, VFP_REG_D16, FPU_FL_FP16)
+ARM_FPU("fpv4-sp-d16",	ARM_FP_MODEL_VFP, 4, VFP_REG_SINGLE, FPU_FL_FP16)
+ARM_FPU("fpv5-sp-d16",	ARM_FP_MODEL_VFP, 5, VFP_REG_SINGLE, FPU_FL_FP16)
+ARM_FPU("fpv5-d16",	ARM_FP_MODEL_VFP, 5, VFP_REG_D16, FPU_FL_FP16)
+ARM_FPU("neon-vfpv4",	ARM_FP_MODEL_VFP, 4, VFP_REG_D32, FPU_FL_NEON | FPU_FL_FP16)
+ARM_FPU("fp-armv8",	ARM_FP_MODEL_VFP, 8, VFP_REG_D32, FPU_FL_FP16)
+ARM_FPU("neon-fp-armv8",ARM_FP_MODEL_VFP, 8, VFP_REG_D32, FPU_FL_NEON | FPU_FL_FP16)
 ARM_FPU("crypto-neon-fp-armv8",
-			ARM_FP_MODEL_VFP, 8, VFP_REG_D32, true, true, true)
+			ARM_FP_MODEL_VFP, 8, VFP_REG_D32, FPU_FL_NEON | FPU_FL_FP16 | FPU_FL_CRYPTO)
 /* Compatibility aliases.  */
-ARM_FPU("vfp3",		ARM_FP_MODEL_VFP, 3, VFP_REG_D32, false, false, false)
+ARM_FPU("vfp3",		ARM_FP_MODEL_VFP, 3, VFP_REG_D32, FPU_FL_NONE)
diff --git a/gcc/confi

[PATCH 1/4][ARM] Make room for more CPU feature flags.

2015-06-22 Thread Matthew Wahab

Hello,

The ARM backend uses an unsigned long to record CPU feature flags and there are
currently 30 bits in use. To be able to support new architecture features, the
current representation will need to be replaced so that more flags can be
recorded.

This series of patches replaces the single unsigned long with a representation
based on an array of unsigned longs. Constructors and operations are explicitly
defined for the new representation and the backend is updated to use the new
operations.

The individual patches:
- Make architecture flags explicit in arm-cores.def, to prepare for the changes.
- Add definitions for the new representation as type arm_feature_set and macros
  with prefix ARM_FSET.
- Replace uses of the old representation with the arm_feature_set type and
  operations.
- Rework arm-cores.def and arm-arches.def to make the feature set constructions
  explicit.

The series tested for arm-none-linux-gnueabihf with check-gcc.

This patch moves the derived FL_FOR_ARCH##ARCH flags out of the ARM_CORE macro
expansion in arm.c and makes them explicit in the entries in arm-cores.def.

This patch tested for arm-none-linux-gnueabihf with check-gcc.

Ok for trunk?
Matthew

2015-06-22  Matthew Wahab  

* gcc/config/arm/arm-cores.def: Add FL_FOR_ARCH flag for each
ARM_CORE entry.  Fix some white-space.
* gcc/config/arm/arm.c: Remove FL_FOR_ARCH derivation from
ARM_CORE definition.
From b8d4b4ef938d64996d0d20aaa9974757057aaad2 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Fri, 5 Jun 2015 12:33:34 +0100
Subject: [PATCH 1/4] [ARM] Make ARCH flags explicit in arm-cores.def

Change-Id: I13a79c89bebaf82aa921f0502b721ff5d9b92dbe
---
 gcc/config/arm/arm-cores.def | 200 +--
 gcc/config/arm/arm.c |   2 +-
 2 files changed, 101 insertions(+), 101 deletions(-)

diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def
index 103c314..f362c27 100644
--- a/gcc/config/arm/arm-cores.def
+++ b/gcc/config/arm/arm-cores.def
@@ -43,134 +43,134 @@
Some tools assume no whitespace up to the first "," in each entry.  */
 
 /* V2/V2A Architecture Processors */
-ARM_CORE("arm2", 	arm2, arm2,	2, FL_CO_PROC | FL_MODE26, slowmul)
-ARM_CORE("arm250", 	arm250, arm250,	2, FL_CO_PROC | FL_MODE26, slowmul)
-ARM_CORE("arm3",	arm3, arm3,	2, FL_CO_PROC | FL_MODE26, slowmul)
+ARM_CORE("arm2",	arm2, arm2,	2, FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH2, slowmul)
+ARM_CORE("arm250",	arm250, arm250,	2, FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH2, slowmul)
+ARM_CORE("arm3",	arm3, arm3,	2, FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH2, slowmul)
 
 /* V3 Architecture Processors */
-ARM_CORE("arm6",	arm6, arm6,		3, FL_CO_PROC | FL_MODE26, slowmul)
-ARM_CORE("arm60",	arm60, arm60,		3, FL_CO_PROC | FL_MODE26, slowmul)
-ARM_CORE("arm600",	arm600, arm600,		3, FL_CO_PROC | FL_MODE26 | FL_WBUF, slowmul)
-ARM_CORE("arm610",	arm610, arm610,		3, FL_MODE26 | FL_WBUF, slowmul)
-ARM_CORE("arm620",	arm620, arm620,		3, FL_CO_PROC | FL_MODE26 | FL_WBUF, slowmul)
-ARM_CORE("arm7",	arm7, arm7,		3, FL_CO_PROC | FL_MODE26, slowmul)
-ARM_CORE("arm7d",	arm7d, arm7d,		3, FL_CO_PROC | FL_MODE26, slowmul)
-ARM_CORE("arm7di",	arm7di, arm7di,		3, FL_CO_PROC | FL_MODE26, slowmul)
-ARM_CORE("arm70",	arm70, arm70,		3, FL_CO_PROC | FL_MODE26, slowmul)
-ARM_CORE("arm700",	arm700, arm700,		3, FL_CO_PROC | FL_MODE26 | FL_WBUF, slowmul)
-ARM_CORE("arm700i",	arm700i, arm700i,	3, FL_CO_PROC | FL_MODE26 | FL_WBUF, slowmul)
-ARM_CORE("arm710",	arm710, arm710,		3, FL_MODE26 | FL_WBUF, slowmul)
-ARM_CORE("arm720",	arm720, arm720,		3, FL_MODE26 | FL_WBUF, slowmul)
-ARM_CORE("arm710c",	arm710c, arm710c,	3, FL_MODE26 | FL_WBUF, slowmul)
-ARM_CORE("arm7100",	arm7100, arm7100,	3, FL_MODE26 | FL_WBUF, slowmul)
-ARM_CORE("arm7500",	arm7500, arm7500,	3, FL_MODE26 | FL_WBUF, slowmul)
+ARM_CORE("arm6",	arm6, arm6,		3, FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH3, slowmul)
+ARM_CORE("arm60",	arm60, arm60,		3, FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH3, slowmul)
+ARM_CORE("arm600",	arm600, arm600,		3, FL_CO_PROC | FL_MODE26 | FL_WBUF | FL_FOR_ARCH3, slowmul)
+ARM_CORE("arm610",	arm610, arm610,		3, FL_MODE26 | FL_WBUF | FL_FOR_ARCH3, slowmul)
+ARM_CORE("arm620",	arm620, arm620,		3, FL_CO_PROC | FL_MODE26 | FL_WBUF | FL_FOR_ARCH3, slowmul)
+ARM_CORE("arm7",	arm7, arm7,		3, FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH3, slowmul)
+ARM_CORE("arm7d",	arm7d, arm7d,		3, FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH3, slowmul)
+ARM_CORE("arm7di",	arm7di, arm7di,		3, FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH3, slowmul)
+ARM_CORE("arm70",	arm70, arm70,		3, FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH3, slowmul)
+ARM_CORE("arm700",	arm700,

[PATCH 2/4][ARM] Add feature set definitions.

2015-06-22 Thread Matthew Wahab

Hello,

The ARM backend uses an unsigned long to record CPU feature flags and there are
currently 30 bits in use. This series of patches replaces the single unsigned
long with a representation based on an array of values.

This patch adds, but doesn't use, type arm_feature_set and macros prefixed
with ARM_FSET to represent and operate on feature sets.
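
The macros are meant to be used along these lines (names from the
patch below; sketch only):

  arm_feature_set fset = ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_MODE26);

  ARM_FSET_ADD_CPU1 (fset, FL_WBUF);

  if (ARM_FSET_HAS_CPU1 (fset, FL_MODE26)
      && !ARM_FSET_IS_EMPTY (fset))
    {
      /* ...  */
    }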

Tested by building with no errors. Also tested as part of the series, for
arm-none-linux-gnueabihf with check-gcc.

Ok for trunk?
Matthew

gcc/
2015-06-22  Matthew Wahab  

* config/arm/arm-protos.h (FL_NONE): New.
(FL_ANY): New.
(arm_feature_set): New.
(ARM_FSET_MAKE): New.
(ARM_FSET_MAKE_CPU1): New.
(ARM_FSET_MAKE_CPU2): New.
(ARM_FSET_CPU1): New.
(ARM_FSET_CPU2): New.
(ARM_FSET_EMPTY): New.
(ARM_FSET_ANY): New.
(ARM_FSET_HAS_CPU1): New.
(ARM_FSET_HAS_CPU2): New.
(ARM_FSET_ADD_CPU1): New.
(ARM_FSET_ADD_CPU2): New.
(ARM_FSET_DEL_CPU1): New.
(ARM_FSET_DEL_CPU2): New.
(ARM_FSET_UNION): New.
(ARM_FSET_INTER): New.
(ARM_FSET_XOR): New.
(ARM_FSET_EXCLUDE): New.
(ARM_FSET_IS_EMPTY): New.
(ARM_FSET_CPU_SUBSET): New.

From 1a98a80b64427f7bb97212ae9ecff515e980ddb7 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Thu, 4 Jun 2015 15:35:25 +0100
Subject: [PATCH 2/4] Add feature set definitions.

Change-Id: I5f89b46ea57e35f477ec4751fea3cb6ee8fce251
---
 gcc/config/arm/arm-protos.h | 101 
 1 file changed, 101 insertions(+)

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 62f91ef..a19d54d 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -346,6 +346,8 @@ extern bool arm_is_constant_pool_ref (rtx);
 /* Flags used to identify the presence of processor capabilities.  */
 
 /* Bit values used to identify processor capabilities.  */
+#define FL_NONE	  (0)	  /* No flags.  */
+#define FL_ANY	  (0xffffffff)	/* All flags.  */
#define FL_CO_PROC  (1 << 0)	/* Has external co-processor bus */
#define FL_ARCH3M   (1 << 1)	/* Extended multiply */
#define FL_MODE26   (1 << 2)	/* 26-bit mode support */
@@ -412,6 +414,105 @@ extern bool arm_is_constant_pool_ref (rtx);
 #define FL_FOR_ARCH7EM  (FL_FOR_ARCH7M | FL_ARCH7EM)
 #define FL_FOR_ARCH8A	(FL_FOR_ARCH7VE | FL_ARCH8)
 
+/* There are too many feature bits to fit in a single word so the set of cpu and
+   fpu capabilities is a structure.  A feature set is created and manipulated
+   with the ARM_FSET macros.  */
+
+typedef struct
+{
+  unsigned long cpu[2];
+} arm_feature_set;
+
+
+/* Initialize a feature set.  */
+
+#define ARM_FSET_MAKE(CPU1,CPU2) { { (CPU1), (CPU2) } }
+
+#define ARM_FSET_MAKE_CPU1(CPU1) ARM_FSET_MAKE ((CPU1), (FL_NONE))
+#define ARM_FSET_MAKE_CPU2(CPU2) ARM_FSET_MAKE ((FL_NONE), (CPU2))
+
+/* Accessors.  */
+
+#define ARM_FSET_CPU1(S) ((S).cpu[0])
+#define ARM_FSET_CPU2(S) ((S).cpu[1])
+
+/* Useful combinations.  */
+
+#define ARM_FSET_EMPTY ARM_FSET_MAKE (FL_NONE, FL_NONE)
+#define ARM_FSET_ANY ARM_FSET_MAKE (FL_ANY, FL_ANY)
+
+/* Tests for a specific CPU feature.  */
+
+#define ARM_FSET_HAS_CPU1(A, F)  (((A).cpu[0] & (F)) == F)
+#define ARM_FSET_HAS_CPU2(A, F)  (((A).cpu[1] & (F)) == F)
+
+/* Add a feature to a feature set.  */
+
+#define ARM_FSET_ADD_CPU1(DST, F)		\
+  do {		\
+(DST).cpu[0] |= (F);			\
+  } while (0)
+
+#define ARM_FSET_ADD_CPU2(DST, F)		\
+  do {		\
+(DST).cpu[1] |= (F);			\
+  } while (0)
+
+/* Remove a feature from a feature set.  */
+
+#define ARM_FSET_DEL_CPU1(DST, F)		\
+  do {		\
+(DST).cpu[0] &= ~(F);			\
+  } while (0)
+
+#define ARM_FSET_DEL_CPU2(DST, F)		\
+  do {		\
+(DST).cpu[1] &= ~(F);			\
+  } while (0)
+
+/* Union of feature sets.  */
+
+#define ARM_FSET_UNION(DST,F1,F2)		\
+  do {		\
+(DST).cpu[0] = (F1).cpu[0] | (F2).cpu[0];	\
+(DST).cpu[1] = (F1).cpu[1] | (F2).cpu[1];	\
+  } while (0)
+
+/* Intersection of feature sets.  */
+
+#define ARM_FSET_INTER(DST,F1,F2)		\
+  do {		\
+(DST).cpu[0] = (F1).cpu[0] & (F2).cpu[0];	\
+(DST).cpu[1] = (F1).cpu[1] & (F2).cpu[1];	\
+  } while (0)
+
+/* Exclusive disjunction.  */
+
+#define ARM_FSET_XOR(DST,F1,F2)\
+  do {			\
+(DST).cpu[0] = (F1).cpu[0] ^ (F2).cpu[0];		\
+(DST).cpu[1] = (F1).cpu[1] ^ (F2).cpu[1];		\
+  } while (0)
+
+/* Difference of feature sets: F1 excluding the elements of F2.  */
+
+#define ARM_FSET_EXCLUDE(DST,F1,F2)		\
+  do {		\
+(DST).cpu[0] = (F1).cpu[0] & ~(F2).cpu[0];	\
+(DST).cpu[1] = (F1).cpu[1] & ~(F2).cpu[1];	\
+  } while (0)
+
+/* Test for an empty feature set.  */
+
+#define ARM_FSET_IS_EMPTY(A)		\
+  (!((A).cpu[0]) && !((A).cpu[1]))
+
+/* Tests whether the cpu features of A are a subset of B.  */
+
+#define ARM_FSET_CPU_SUBSET(A,B)	\
+  (((

[PATCH 3/4][ARM] Use new feature set representation.

2015-06-22 Thread Matthew Wahab

Hello,

The ARM backend uses an unsigned long to record CPU feature flags and there are
currently 30 bits in use. This series of patches replaces the single unsigned
long with a representation based on an array of values.

This patch replaces the existing representation of CPU feature sets with the
type arm_feature_set and ARM_FSET macros added in an earlier patch in this
series.

Tested arm-none-linux-gnueabihf with check-gcc. Also tested as part of the
series for arm-none-linux-gnueabihf with check-gcc.

Ok for trunk?
Matthew

gcc/
2015-06-22  Matthew Wahab  

* config/arm/arm-builtins.c (def_mbuiltin): Use ARM_FSET macro.
* config/arm/arm-protos.h (insn_flags): Declare as type
arm_feature_set.
(tune_flags): Likewise.
* config/arm/arm.c (feature_count): New.
(insn_flags): Define as type arm_feature_set.
(tune_flags): Likewise.
(struct processors): Define field flags as type arm_feature_set.
(all_cores): Update for change to struct processors.
(all_architectures): Likewise.
(arm_option_check_internal): Use arm_feature_set and ARM_FSET macros.
(arm_option_override_internal): Likewise.
(arm_option_override): Likewise.

From 8b5e132868da066eb8a8673286b796656b9ed127 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Mon, 8 Jun 2015 14:11:13 +0100
Subject: [PATCH 3/4] Use feature sets.

Change-Id: I5a1b162102dd19b6376637218dc548502112cf4b
---
 gcc/config/arm/arm-builtins.c |   4 +-
 gcc/config/arm/arm-protos.h   |   4 +-
 gcc/config/arm/arm.c  | 131 --
 3 files changed, 80 insertions(+), 59 deletions(-)

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index f960e0a..31203d4 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -1074,10 +1074,10 @@ arm_init_neon_builtins (void)
 #undef NUM_DREG_TYPES
 #undef NUM_QREG_TYPES
 
-#define def_mbuiltin(MASK, NAME, TYPE, CODE)\
+#define def_mbuiltin(FLAG, NAME, TYPE, CODE)\
   do	\
 {	\
-  if ((MASK) & insn_flags)		\
+  if (ARM_FSET_HAS_CPU1 (insn_flags, (FLAG)))			\
 	{\
 	  tree bdecl;			\
 	  bdecl = add_builtin_function ((NAME), (TYPE), (CODE),		\
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index a19d54d..859b5d2 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -515,11 +515,11 @@ typedef struct
 
 /* The bits in this mask specify which
instructions we are allowed to generate.  */
-extern unsigned long insn_flags;
+extern arm_feature_set insn_flags;
 
 /* The bits in this mask specify which instruction scheduling options should
be used.  */
-extern unsigned long tune_flags;
+extern arm_feature_set tune_flags;
 
 /* Nonzero if this chip supports the ARM Architecture 3M extensions.  */
 extern int arm_arch3m;
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index b21f433..dd892a7 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -105,6 +105,7 @@ static void arm_add_gc_roots (void);
 static int arm_gen_constant (enum rtx_code, machine_mode, rtx,
 			 HOST_WIDE_INT, rtx, rtx, int, int);
 static unsigned bit_count (unsigned long);
+static unsigned feature_count (const arm_feature_set*);
 static int arm_address_register_rtx_p (rtx, int);
 static int arm_legitimate_index_p (machine_mode, rtx, RTX_CODE, int);
 static bool is_called_in_ARM_mode (tree);
@@ -771,11 +772,11 @@ static int thumb_call_reg_needed;
 
 /* The bits in this mask specify which
instructions we are allowed to generate.  */
-unsigned long insn_flags = 0;
+arm_feature_set insn_flags = ARM_FSET_EMPTY;
 
 /* The bits in this mask specify which instruction scheduling options should
be used.  */
-unsigned long tune_flags = 0;
+arm_feature_set tune_flags = ARM_FSET_EMPTY;
 
 /* The highest ARM architecture version supported by the
target.  */
@@ -928,7 +929,7 @@ struct processors
   enum processor_type core;
   const char *arch;
   enum base_architecture base_arch;
-  const unsigned long flags;
+  const arm_feature_set flags;
   const struct tune_params *const tune;
 };
 
@@ -2197,10 +2198,10 @@ static const struct processors all_cores[] =
   /* ARM Cores */
 #define ARM_CORE(NAME, X, IDENT, ARCH, FLAGS, COSTS) \
   {NAME, IDENT, #ARCH, BASE_ARCH_##ARCH,	  \
-FLAGS, &arm_##COSTS##_tune},
+   ARM_FSET_MAKE_CPU1 (FLAGS), &arm_##COSTS##_tune},
 #include "arm-cores.def"
 #undef ARM_CORE
-  {NULL, arm_none, NULL, BASE_ARCH_0, 0, NULL}
+  {NULL, arm_none, NULL, BASE_ARCH_0, ARM_FSET_EMPTY, NULL}
 };
 
 static const struct processors all_architectures[] =
@@ -2210,10 +2211,10 @@ static const struct processors all_architectures[] =
  from the core.  */
 
 #define ARM_ARCH(NAME, CORE, ARCH, FLAGS) \
-  {NAME, CORE, #ARCH, BASE_ARCH_##ARCH, FLAGS, NULL},
+  {NAME, CORE, #ARCH, BASE_ARCH_##ARCH, ARM_FSET_MAKE_CPU1 (FLAGS), NULL},
 #i

[PATCH 4/4][ARM] Move initializer into arm-cores.def and arm-arches.def

2015-06-22 Thread Matthew Wahab

Hello,

The ARM backend uses an unsigned long to record CPU feature flags and there are
currently 30 bits in use. This series of patches replaces the single unsigned
long with a representation based on an array of values.

This patch updates the entries in the arm-core.def and arm-arches.def files
for the new arm_feature_set representation, moving the initializers from a macro
expansion and making them explicit in the file entries.

Tested for arm-none-linux-gnueabihf with check-gcc.

Ok for trunk?
Matthew

gcc/
2015-06-22  Matthew Wahab  

* config/arm/arm-arches.def: Replace single value flags with
initializer built from ARM_FSET_MAKE_CPU1.
* config/arm/arm-cores.def: Likewise.
* config/arm/arm.c: (all_cores): Remove ARM_FSET_MAKE_CPU1
derivation from the ARM_CORE macro definition, use the given value
instead.
(all_architectures): Remove ARM_FSET_MAKE_CPU1 derivation from the
ARM_ARCH macro definition, use the given value instead.

From 389cfb0e1046b1d84dd3d8920aa5bed50dc19164 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Mon, 8 Jun 2015 16:15:52 +0100
Subject: [PATCH 4/4] Move feature sets into core and arch def files.

Change-Id: Ica484c7d9f46413c196b26a630ff49413b10289b
---
 gcc/config/arm/arm-arches.def |  56 ++--
 gcc/config/arm/arm-cores.def  | 200 +-
 gcc/config/arm/arm.c  |   4 +-
 3 files changed, 130 insertions(+), 130 deletions(-)

diff --git a/gcc/config/arm/arm-arches.def b/gcc/config/arm/arm-arches.def
index 840c1ff..6d0374a 100644
--- a/gcc/config/arm/arm-arches.def
+++ b/gcc/config/arm/arm-arches.def
@@ -28,33 +28,33 @@
 
genopt.sh assumes no whitespace up to the first "," in each entry.  */
 
-ARM_ARCH("armv2",   arm2,   2,   FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH2)
-ARM_ARCH("armv2a",  arm2,   2,   FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH2)
-ARM_ARCH("armv3",   arm6,   3,   FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH3)
-ARM_ARCH("armv3m",  arm7m,  3M,  FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH3M)
-ARM_ARCH("armv4",   arm7tdmi,   4,   FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH4)
+ARM_ARCH("armv2",   arm2,   2,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH2))
+ARM_ARCH("armv2a",  arm2,   2,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH2))
+ARM_ARCH("armv3",   arm6,   3,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH3))
+ARM_ARCH("armv3m",  arm7m,  3M,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH3M))
+ARM_ARCH("armv4",   arm7tdmi,   4,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH4))
 /* Strictly, FL_MODE26 is a permitted option for v4t, but there are no
implementations that support it, so we will leave it out for now.  */
-ARM_ARCH("armv4t",  arm7tdmi,   4T,  FL_CO_PROC | FL_FOR_ARCH4T)
-ARM_ARCH("armv5",   arm10tdmi,  5,   FL_CO_PROC | FL_FOR_ARCH5)
-ARM_ARCH("armv5t",  arm10tdmi,  5T,  FL_CO_PROC | FL_FOR_ARCH5T)
-ARM_ARCH("armv5e",  arm1026ejs, 5E,  FL_CO_PROC | FL_FOR_ARCH5E)
-ARM_ARCH("armv5te", arm1026ejs, 5TE, FL_CO_PROC | FL_FOR_ARCH5TE)
-ARM_ARCH("armv6",   arm1136js,  6,   FL_CO_PROC | FL_FOR_ARCH6)
-ARM_ARCH("armv6j",  arm1136js,  6J,  FL_CO_PROC | FL_FOR_ARCH6J)
-ARM_ARCH("armv6k",  mpcore,	6K,  FL_CO_PROC | FL_FOR_ARCH6K)
-ARM_ARCH("armv6z",  arm1176jzs, 6Z,  FL_CO_PROC | FL_FOR_ARCH6Z)
-ARM_ARCH("armv6zk", arm1176jzs, 6ZK, FL_CO_PROC | FL_FOR_ARCH6ZK)
-ARM_ARCH("armv6t2", arm1156t2s, 6T2, FL_CO_PROC | FL_FOR_ARCH6T2)
-ARM_ARCH("armv6-m", cortexm1,	6M,			  FL_FOR_ARCH6M)
-ARM_ARCH("armv6s-m", cortexm1,	6M,			  FL_FOR_ARCH6M)
-ARM_ARCH("armv7",   cortexa8,	7,   FL_CO_PROC |	  FL_FOR_ARCH7)
-ARM_ARCH("armv7-a", cortexa8,	7A,  FL_CO_PROC |	  FL_FOR_ARCH7A)
-ARM_ARCH("armv7ve", cortexa8,	7A,  FL_CO_PROC |	  FL_FOR_ARCH7VE)
-ARM_ARCH("armv7-r", cortexr4,	7R,  FL_CO_PROC |	  FL_FOR_ARCH7R)
-ARM_ARCH("armv7-m", cortexm3,	7M,  FL_CO_PROC |	  FL_FOR_ARCH7M)
-ARM_ARCH("armv7e-m", cortexm4,  7EM, FL_CO_PROC |	  FL_FOR_ARCH7EM)
-ARM_ARCH("armv8-a", cortexa53,  8A,  FL_CO_PROC | FL_FOR_ARCH8A)
-ARM_ARCH("armv8-a+crc",cortexa53, 8A,FL_CO_PROC | FL_CRC32  | FL_FOR_ARCH8A)
-ARM_ARCH("iwmmxt",  iwmmxt, 5TE, FL_LDSCHED | FL_STRONG | FL_FOR_ARCH5TE | FL_XSCALE | FL_IWMMXT)
-ARM_ARCH("iwmmxt2", iwmmxt2,5TE, FL_LDSCHED | FL_STRONG | FL_FOR_ARCH5TE | FL_XSCALE | FL_IWMMXT | FL_IWMMXT2)
+ARM_ARCH("armv4t",  arm7tdmi,   4T,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC |   
