Re: [PATCH/libiberty] Remove use of strtod in libiberty/d-demangle.c

2015-08-10 Thread Ian Lance Taylor
On Tue, Aug 4, 2015 at 7:23 AM, Iain Buclaw  wrote:
>
> Fixes PR 18669 raised against gdb/binutils.
>
> https://sourceware.org/bugzilla/show_bug.cgi?id=18669
>
> While it is possible to roll our own strtod that handles hexadecimal
> to float conversion, I'm no longer interested in taking time out to
> implement or maintain such a thing.  So the next obvious thing to do
> is nothing, which is what I've settled for.

This is OK.

Thanks.

Ian


Re: [PR64164] drop copyrename, integrate into expand

2015-08-10 Thread James Greenhalgh
On Tue, Aug 04, 2015 at 12:45:28AM +0100, Alexandre Oliva wrote:
> On Jul 30, 2015, "H.J. Lu"  wrote:
> 
> > aoliva/pr64164  is fine on x32.
> 
> Thanks.  I have made a large number of changes since you tested it,
> fixing all the reported issues and then some.  Now, x86_64-linux-gnu
> (-m64 and -m32), i686-pc-linux-gnu, powerpc64-linux-gnu and
> powerpc64el-linux-gnu pass regstrap (r226317), and the many tens of
> targets I cross-tested still get the same 'make all' errors that the
> pristine tree did.

For what it is worth, I bootstrapped and tested the consolidated patch
on arm-none-linux-gnueabihf and aarch64-none-linux-gnu with trunk at
r226516 over the weekend, and didn't see any new issues.

Thanks,
James



Re: [PATCH][RTL-ifcvt] Improve conditional select ops on immediates

2015-08-10 Thread Kyrill Tkachov


On 04/08/15 09:44, Kyrill Tkachov wrote:

On 03/08/15 18:37, Uros Bizjak wrote:

On Mon, Aug 3, 2015 at 7:20 PM, Kyrill Tkachov  wrote:


Looking at the x86 movcc expansion code (ix86_expand_int_movcc) I
don't think this is a good idea. In the expander, there is already
quite some target-dependent code that goes to great lengths to use the
sbb insn as much as possible before cmove is used.

IMO, as far as x86 is concerned, the best solution would be to revert
the change. ix86_expand_int_movcc already does some tricks from your
patch in a target-efficient way. The generic change introduced by
your patch now interferes with this expansion.

Well, technically the transformation was already there; it was just
never reached for an x86 compilation because noce_try_cmove was tried
in front of it and used a target-specific expansion.
In any case, how's this proposal?
The transformation
  /* if (test) x = a; else x = b;
     =>   x = (-(test != 0) & (b - a)) + a;  */

is a catch-all-immediates transformation in
noce_try_store_flag_constants.
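For readers less familiar with the store-flag trick, a minimal C sketch of the idea behind this transformation follows. It is an illustration only, not GCC's RTL code, and the function name is hypothetical. Evaluated literally, the formula in the comment yields b when test is nonzero and a when it is zero, so the sketch orders the operands to return test ? a : b. Unsigned arithmetic keeps the subtraction and addition well defined on wrap-around.

```c
#include <stdint.h>

/* Branch-free select built from a store-flag result.
   Returns test ? a : b without a conditional branch.  */
static uint32_t
select_via_store_flag (int test, uint32_t a, uint32_t b)
{
  uint32_t flag = (uint32_t) (test != 0); /* store-flag: 0 or 1 */
  uint32_t mask = 0u - flag;              /* -(test != 0): 0 or all ones */
  return (mask & (a - b)) + b;            /* mask selects (a - b) or 0 */
}
```

The whole select costs a compare, a negate, an AND and an ADD, which is why it is attractive on targets whose conditional move expanders cannot handle arbitrary immediates.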
What if we moved it to noce_try_cmove and performed it only if the
target-specific conditional move expansion there failed?

That way we can try the x86_64-specific sequence first and still give
noce_try_store_flag_constants the opportunity to perform the
transformations that can benefit targets that don't have highly
specific conditional move expanders.

Yes, let's try this approach. As was found out, some targets (e.g.
x86) hide lots of different target-dependent expansion strategies into
movcc expander. Perhaps this fact should be documented in the comment
in the generic code?

Ok, I'll work on that approach and add a comment.

I'm testing a patch that fixes the testcases on x86_64 and does not
harm codegen on aarch64. Feel free to file a PR and assign it to me.

PR67103 [1]

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67103

Thanks,
Here's the patch to move that transformation from noce_try_store_flag_constants
to noce_try_cmove after the target-specific expansion has had a go.

This fixes the testcases for me on x86_64.
In i386.exp I only see:
FAIL: gcc.target/i386/pr49781-1.c scan-assembler-not lea[lq]?[ \t]\\((%|)r[a-z0-9]*
FAIL: gcc.target/i386/pr61403.c scan-assembler blend

which were there before my patch.
Bootstrap and testing on x86_64, arm and aarch64 is successful for me.

Is this ok?


Ping.
Uros, does the codegen with this patch look ok to you?

Thanks,
Kyrill



Thanks,
Kyrill

2015-08-04  Kyrylo Tkachov  

  PR rtl-optimization/67103
  * ifcvt.c (noce_try_store_flag_constants): Move
  x = (-(test != 0) & (b - a)) + a transformation to...
  (noce_try_cmove): ... Here.  Try it if normal conditional
  move fails.



Thanks,
Uros.





Re: [PATCH][RTL-ifcvt] Improve conditional select ops on immediates

2015-08-10 Thread Uros Bizjak
On Mon, Aug 10, 2015 at 11:36 AM, Kyrill Tkachov  wrote:

 I'm testing a patch that fixes the testcases on x86_64 and does not
 harm codegen on aarch64. Feel free to file a PR and assign it to me.
>>>
>>> PR67103 [1]
>>>
>>> [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67103
>>
>> Thanks,
>> Here's the patch to move that transformation from
>> noce_try_store_flag_constants
>> to noce_try_cmove after the target-specific expansion has had a go.
>>
>> This fixes the testcases for me on x86_64.
>> In i386.exp I only see:
>> FAIL: gcc.target/i386/pr49781-1.c scan-assembler-not lea[lq]?[ \t]\\((%|)r[a-z0-9]*
>> FAIL: gcc.target/i386/pr61403.c scan-assembler blend
>>
>> which were there before my patch.
>> Bootstrap and testing on x86_64, arm and aarch64 is successful for me.
>>
>> Is this ok?
>
>
> Ping.
> Uros, does the codegen with this patch look ok to you?

Yes, the code of previously failing testcases looks OK.

You will need approval from an rtl-optimization maintainer, though.

Uros.


Re: [PATCH][RTL-ifcvt] Improve conditional select ops on immediates

2015-08-10 Thread Kyrill Tkachov


On 10/08/15 10:43, Uros Bizjak wrote:

On Mon, Aug 10, 2015 at 11:36 AM, Kyrill Tkachov  wrote:


I'm testing a patch that fixes the testcases on x86_64 and does not
harm codegen on aarch64. Feel free to file a PR and assign it to me.

PR67103 [1]

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67103

Thanks,
Here's the patch to move that transformation from
noce_try_store_flag_constants
to noce_try_cmove after the target-specific expansion has had a go.

This fixes the testcases for me on x86_64.
In i386.exp I only see:
FAIL: gcc.target/i386/pr49781-1.c scan-assembler-not lea[lq]?[ \t]\\((%|)r[a-z0-9]*
FAIL: gcc.target/i386/pr61403.c scan-assembler blend

which were there before my patch.
Bootstrap and testing on x86_64, arm and aarch64 is successful for me.

Is this ok?


Ping.
Uros, does the codegen with this patch look ok to you?

Yes, the code of previously failing testcases looks OK.

You will need approval from an rtl-optimization maintainer, though.


Sure, thanks for confirming.

Kyrill



Uros.





Re: [COMMITTED][AArch64] Improve TLS Descriptor pattern to release RTL loop IV opt

2015-08-10 Thread Jiong Wang

Andreas Schwab writes:

> Jiong Wang  writes:
>
>> Index: gcc/ChangeLog
>> ===
>> --- gcc/ChangeLog	(revision 226682)
>> +++ gcc/ChangeLog	(working copy)
>> @@ -1,3 +1,16 @@
>> +2015-08-06  Ramana Radhakrishnan
>> +  Jiong Wang  
>> +
>> +* config/aarch64/aarch64.d (tlsdesc_small_pseudo_): New pattern.
>> +* config/aarch64/aarch64.h (reg_class): New enumeration FIXED_REG0.
>> +(REG_CLASS_NAMES): Likewise.
>> +(REG_CLASS_CONTENTS): Likewise.
>> +* config/aarch64/aarch64.c (aarch64_class_max_nregs): Likewise.
>> +(aarch64_register_move_cost): Likewise.
>> +(aarch64_load_symref_appropriately): Invoke the new added pattern if
>> +possible.
>> +* config/aarch64/constraints.md (Uc0): New constraint.
>
> That breaks go, all tests are crashing now.

Andreas,

  Thanks for the information.

  * I found I committed the wrong patch!
There are two patches in my local directory, "tlsdesc_hoist.patch"
and "tlsdesc-hoist.patch". The approved, up-to-date one is
tlsdesc-hoist.patch, but I committed tlsdesc_hoist.patch.

Reverted the wrong commit and committed the correct/approved
version.

  * Even with the correct patch applied, I still see go check failures
in my local native testing.

I'm trying to understand why; if I can't figure it out today I will
revert the patch.

  Sorry about the trouble!
-- 
Regards,
Jiong



Re: [PATCH 2/2][ARM] Use new FPU features representation

2015-08-10 Thread Matthew Wahab

Ping. Updated patch attached.

Also, retested the series for arm-none-linux-gnueabihf with native bootstrap and make check.



On 22/06/15 16:18, Matthew Wahab wrote:

Hello,

This patch series changes the representation of FPU features to use a simple
bit-set and flags, as is done elsewhere.

This patch uses the new representation of FPU feature sets.

Tested the series for arm-none-linux-gnueabihf with check-gcc

Ok for trunk?
Matthew

gcc/
2015-06-22  Matthew Wahab  

* config/arm/arm-fpus.def: Replace neon, fp16 and crypto boolean
fields with feature flags.  Update comment.
* config/arm/arm.c (ARM_FPU): Update macro.
* config/arm/arm.h (TARGET_NEON_FP16): Update feature test.
(TARGET_FP16): Likewise.
(TARGET_CRYPTO): Likewise.
(TARGET_NEON): Likewise.
(struct arm_fpu_desc): Remove fields neon, fp16 and crypto.  Add
field features.



From 73e14e3f8ebde714735326a8745c929ac92759e8 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Thu, 23 Jul 2015 12:45:48 +0100
Subject: [PATCH 2/2] Use new FPU feature definitions.

Change-Id: Ic9e156efb7512455130260e115e9af8f27cda3a5
---
 gcc/config/arm/arm-fpus.def | 40 
 gcc/config/arm/arm.c        |  4 ++--
 gcc/config/arm/arm.h        | 22 +-
 3 files changed, 35 insertions(+), 31 deletions(-)

diff --git a/gcc/config/arm/arm-fpus.def b/gcc/config/arm/arm-fpus.def
index 2dfefd6..efd5896 100644
--- a/gcc/config/arm/arm-fpus.def
+++ b/gcc/config/arm/arm-fpus.def
@@ -19,30 +19,30 @@
 
 /* Before using #include to read this file, define a macro:
 
-  ARM_FPU(NAME, MODEL, REV, VFP_REGS, NEON, FP16, CRYPTO)
+  ARM_FPU(NAME, MODEL, REV, VFP_REGS, FEATURES)
 
The arguments are the fields of struct arm_fpu_desc.
 
genopt.sh assumes no whitespace up to the first "," in each entry.  */
 
-ARM_FPU("vfp",		ARM_FP_MODEL_VFP, 2, VFP_REG_D16, false, false, false)
-ARM_FPU("vfpv3",	ARM_FP_MODEL_VFP, 3, VFP_REG_D32, false, false, false)
-ARM_FPU("vfpv3-fp16",	ARM_FP_MODEL_VFP, 3, VFP_REG_D32, false, true, false)
-ARM_FPU("vfpv3-d16",	ARM_FP_MODEL_VFP, 3, VFP_REG_D16, false, false, false)
-ARM_FPU("vfpv3-d16-fp16",	ARM_FP_MODEL_VFP, 3, VFP_REG_D16, false, true, false)
-ARM_FPU("vfpv3xd",	ARM_FP_MODEL_VFP, 3, VFP_REG_SINGLE, false, false, false)
-ARM_FPU("vfpv3xd-fp16",	ARM_FP_MODEL_VFP, 3, VFP_REG_SINGLE, false, true, false)
-ARM_FPU("neon",		ARM_FP_MODEL_VFP, 3, VFP_REG_D32, true , false, false)
-ARM_FPU("neon-fp16",	ARM_FP_MODEL_VFP, 3, VFP_REG_D32, true, true, false)
-ARM_FPU("vfpv4",	ARM_FP_MODEL_VFP, 4, VFP_REG_D32, false, true, false)
-ARM_FPU("vfpv4-d16",	ARM_FP_MODEL_VFP, 4, VFP_REG_D16, false, true, false)
-ARM_FPU("fpv4-sp-d16",	ARM_FP_MODEL_VFP, 4, VFP_REG_SINGLE, false, true, false)
-ARM_FPU("fpv5-sp-d16",	ARM_FP_MODEL_VFP, 5, VFP_REG_SINGLE, false, true, false)
-ARM_FPU("fpv5-d16",	ARM_FP_MODEL_VFP, 5, VFP_REG_D16, false, true, false)
-ARM_FPU("neon-vfpv4",	ARM_FP_MODEL_VFP, 4, VFP_REG_D32, true, true, false)
-ARM_FPU("fp-armv8",	ARM_FP_MODEL_VFP, 8, VFP_REG_D32, false, true, false)
-ARM_FPU("neon-fp-armv8",ARM_FP_MODEL_VFP, 8, VFP_REG_D32, true, true, false)
+ARM_FPU("vfp",		ARM_FP_MODEL_VFP, 2, VFP_REG_D16, FPU_FL_NONE)
+ARM_FPU("vfpv3",	ARM_FP_MODEL_VFP, 3, VFP_REG_D32, FPU_FL_NONE)
+ARM_FPU("vfpv3-fp16",	ARM_FP_MODEL_VFP, 3, VFP_REG_D32, FPU_FL_FP16)
+ARM_FPU("vfpv3-d16",	ARM_FP_MODEL_VFP, 3, VFP_REG_D16, FPU_FL_NONE)
+ARM_FPU("vfpv3-d16-fp16",	ARM_FP_MODEL_VFP, 3, VFP_REG_D16, FPU_FL_FP16)
+ARM_FPU("vfpv3xd",	ARM_FP_MODEL_VFP, 3, VFP_REG_SINGLE, FPU_FL_NONE)
+ARM_FPU("vfpv3xd-fp16",	ARM_FP_MODEL_VFP, 3, VFP_REG_SINGLE, FPU_FL_FP16)
+ARM_FPU("neon",		ARM_FP_MODEL_VFP, 3, VFP_REG_D32, FPU_FL_NEON)
+ARM_FPU("neon-fp16",	ARM_FP_MODEL_VFP, 3, VFP_REG_D32, FPU_FL_NEON | FPU_FL_FP16)
+ARM_FPU("vfpv4",	ARM_FP_MODEL_VFP, 4, VFP_REG_D32, FPU_FL_FP16)
+ARM_FPU("vfpv4-d16",	ARM_FP_MODEL_VFP, 4, VFP_REG_D16, FPU_FL_FP16)
+ARM_FPU("fpv4-sp-d16",	ARM_FP_MODEL_VFP, 4, VFP_REG_SINGLE, FPU_FL_FP16)
+ARM_FPU("fpv5-sp-d16",	ARM_FP_MODEL_VFP, 5, VFP_REG_SINGLE, FPU_FL_FP16)
+ARM_FPU("fpv5-d16",	ARM_FP_MODEL_VFP, 5, VFP_REG_D16, FPU_FL_FP16)
+ARM_FPU("neon-vfpv4",	ARM_FP_MODEL_VFP, 4, VFP_REG_D32, FPU_FL_NEON | FPU_FL_FP16)
+ARM_FPU("fp-armv8",	ARM_FP_MODEL_VFP, 8, VFP_REG_D32, FPU_FL_FP16)
+ARM_FPU("neon-fp-armv8",ARM_FP_MODEL_VFP, 8, VFP_REG_D32, FPU_FL_NEON | FPU_FL_FP16)
 ARM_FPU("crypto-neon-fp-armv8",
-			ARM_FP_MODEL_VFP, 8, VFP_REG_D32, true, true, true)
+			ARM_FP_MODEL_VFP, 8, VFP_REG_D32, FPU_FL_NEON | FPU_FL_FP16 | FPU_FL_CRYPTO)
 /* Compatibility aliases.  */
-ARM_FPU("vfp3",		ARM_FP_MODEL_VFP, 3, VFP_REG_D32, false, false, false)
+ARM_FPU("vfp3",		ARM_FP_MODEL_VFP, 3, VFP_REG_D32, FPU_FL_NONE)
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 1ea9e27..6e15bf2 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -2230,8 +2230,8 @@ char arm_arch_name[] = "__ARM_ARCH_0UNK__";
 
 static const struct arm_fp

Re: [PATCH 1/2][ARM] Record FPU features as a bit-set

2015-08-10 Thread Matthew Wahab

Ping. Updated patch attached.

Also, retested the series for arm-none-linux-gnueabihf with native bootstrap and make check.


On 22/06/15 16:16, Matthew Wahab wrote:

Hello,

The ARM backend records FPU features as booleans, one for each feature. This
means that adding support for a new feature involves updating every entry in the
list of FPU descriptions in arm-fpus.def. This patch series changes the
representation of FPU features to use a simple bit-set and flags, as is done
elsewhere.

This patch adds the new FPU feature representation, with feature sets
represented as unsigned longs.

Tested the series for arm-none-linux-gnueabihf with check-gcc

Ok for trunk?
Matthew

gcc/
2015-06-22  Matthew Wahab  

* config/arm/arm.h (arm_fpu_feature_set): New.
(ARM_FPU_FSET_HAS): New.
(FPU_FL_NONE): New.
(FPU_FL_NEON): New.
(FPU_FL_FP16): New.
(FPU_FL_CRYPTO): New.



From 571416d9e7bc9cb6c16008486faf357873270991 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Thu, 23 Jul 2015 12:44:51 +0100
Subject: [PATCH 1/2] Add fpu feature set definitions.

Change-Id: I9f0fcc9627e3c435cbbc9056b9244781b438447e
---
 gcc/config/arm/arm.h | 13 +
 1 file changed, 13 insertions(+)

diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index bb64be0..f49eb48 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -318,6 +318,19 @@ extern void (*arm_lang_output_object_attributes_hook)(void);
   {"mode", "%{!marm:%{!mthumb:-m%(VALUE)}}"}, \
   {"tls", "%{!mtls-dialect=*:-mtls-dialect=%(VALUE)}"},
 
+/* FPU feature sets.  */
+
+typedef unsigned long arm_fpu_feature_set;
+
+/* Test for an FPU feature.  */
+#define ARM_FPU_FSET_HAS(S,F) (((S) & (F)) == (F))
+
+/* FPU Features.  */
+#define FPU_FL_NONE	(0)
+#define FPU_FL_NEON	(1 << 0)	/* NEON instructions.  */
+#define FPU_FL_FP16	(1 << 1)	/* Half-precision.  */
+#define FPU_FL_CRYPTO	(1 << 2)	/* Crypto extensions.  */
+
 /* Which floating point model to use.  */
 enum arm_fp_model
 {
-- 
1.9.1
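A minimal sketch, assuming a cut-down stand-in for struct arm_fpu_desc, of how the FEATURES field from patch 2/2 and ARM_FPU_FSET_HAS from this patch fit together. The flag and macro definitions are copied from the arm.h hunk above and the three table entries mirror arm-fpus.def; struct fpu_desc_sketch, the FPU_LIST X-macro (standing in for #include "arm-fpus.def") and fpu_features are hypothetical.

```c
#include <string.h>

/* Copied from the arm.h hunk above.  */
typedef unsigned long arm_fpu_feature_set;
#define ARM_FPU_FSET_HAS(S,F) (((S) & (F)) == (F))
#define FPU_FL_NONE	(0)
#define FPU_FL_NEON	(1 << 0)	/* NEON instructions.  */
#define FPU_FL_FP16	(1 << 1)	/* Half-precision.  */
#define FPU_FL_CRYPTO	(1 << 2)	/* Crypto extensions.  */

/* Hypothetical, simplified descriptor: name plus the features field.  */
struct fpu_desc_sketch
{
  const char *name;
  arm_fpu_feature_set features;
};

/* Hypothetical X-macro list with entries in the arm-fpus.def style.  */
#define FPU_LIST(X)						\
  X ("vfpv3", FPU_FL_NONE)					\
  X ("neon-fp16", FPU_FL_NEON | FPU_FL_FP16)			\
  X ("crypto-neon-fp-armv8", FPU_FL_NEON | FPU_FL_FP16 | FPU_FL_CRYPTO)

#define FPU_ENTRY(NAME, FEATURES) { NAME, FEATURES },
static const struct fpu_desc_sketch fpus[] = { FPU_LIST (FPU_ENTRY) };
#undef FPU_ENTRY

/* Return the feature set of the named FPU, or FPU_FL_NONE if unknown.  */
static arm_fpu_feature_set
fpu_features (const char *name)
{
  for (size_t i = 0; i < sizeof fpus / sizeof fpus[0]; i++)
    if (strcmp (fpus[i].name, name) == 0)
      return fpus[i].features;
  return FPU_FL_NONE;
}
```

Because ARM_FPU_FSET_HAS compares the masked value with F, a mask naming several flags only matches when every one of them is present, and FPU_FL_NONE matches any set. Adding a new feature then means adding one flag and touching only the entries that have it.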



Re: [PATCH 1/2][ARM] Record FPU features as a bit-set

2015-08-10 Thread Kyrill Tkachov

Hi Matthew,

On 10/08/15 11:28, Matthew Wahab wrote:

Ping. Updated patch attached.

Also, retested the series for arm-none-linux-gnueabihf with native bootstrap and make check.

On 22/06/15 16:16, Matthew Wahab wrote:

Hello,

The ARM backend records FPU features as booleans, one for each feature. This
means that adding support for a new feature involves updating every entry in the
list of FPU descriptions in arm-fpus.def. This patch series changes the
representation of FPU features to use a simple bit-set and flags, as is done
elsewhere.

This patch adds the new FPU feature representation, with feature sets
represented as unsigned longs.

Tested the series for arm-none-linux-gnueabihf with check-gcc

Ok for trunk?
Matthew

gcc/
2015-06-22  Matthew Wahab  

* config/arm/arm.h (arm_fpu_feature_set): New.
(ARM_FPU_FSET_HAS): New.
(FPU_FL_NONE): New.
(FPU_FL_NEON): New.
(FPU_FL_FP16): New.
(FPU_FL_CRYPTO): New.


This is ok.
Thanks,
Kyrill





Re: [PATCH] Fix default_binds_local_p_2 for extern protected data

2015-08-10 Thread Szabolcs Nagy


ping.

On 22/07/15 18:01, Szabolcs Nagy wrote:

The commit
https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=222184
changed a true to false in varasm.c:

  bool
  default_binds_local_p_2 (const_tree exp)
  {
-  return default_binds_local_p_3 (exp, flag_shlib != 0, true, true);
+  return default_binds_local_p_3 (exp, flag_shlib != 0, true, false,
+ !flag_pic);
  }

where

  default_binds_local_p_3 (const_tree exp, bool shlib, bool weak_dominate,
-bool extern_protected_data)
+bool extern_protected_data, bool common_local_p)
  {

false means that extern protected data binds locally,
which is wrong if the target can have copy relocations
against it (the address must then be loaded from the GOT,
otherwise the main executable will see a different address).

Currently S/390, ARM and AArch64 targets use this predicate
and the current default is wrong for all of them (they can
have copy relocs) so I changed the default instead of doing
it in a target specific way.
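The copy-relocation problem can be sketched from the library side in C. This is an illustration under stated assumptions: imagine the file built with -fPIC -shared and linked against a non-PIC main executable that reads lib_counter directly; the names are hypothetical. Compiled as a single file it simply returns the variable's address.

```c
/* Library side, imagined as built with -fPIC -shared.  Protected
   visibility: the symbol cannot be pre-empted by another definition,
   but a non-PIC main executable may still reference it directly.  */
__attribute__ ((visibility ("protected"))) int lib_counter = 0;

/* If lib_counter is assumed to bind locally, the compiler may form this
   address PC-relatively, i.e. the library's own copy.  A copy relocation
   in the executable moves the live object into the executable's .bss,
   so only a GOT load here would return that same address.  */
int *
lib_counter_address (void)
{
  return &lib_counter;
}
```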

The equivalent x86_64 bug was
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65248
the default was changed for
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65780
and now I have opened
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66912
for arm and aarch64.

A further binutils patch is also needed to emit R_*_GLOB_DAT
instead of R_*_RELATIVE relocs for protected data.
The glibc elf/tst-protected1a and elf/tst-protected1b
tests depend on this.

Tested ARM and AArch64 targets.

gcc/ChangeLog:

2015-07-22  Szabolcs Nagy  

PR target/66912
* varasm.c (default_binds_local_p_2): Turn on extern_protected_data.

gcc/testsuite/ChangeLog:

2015-07-22  Szabolcs Nagy  

PR target/66912
* gcc.target/aarch64/pr66912.c: New.
* gcc.target/arm/pr66912.c: New.





Re: [PATCH][ARM][3/3] Expand mod by power of 2

2015-08-10 Thread Kyrill Tkachov

Here is a slight respin.
The important parts are the same; the expander now uses the slightly shorter
arm_gen_compare_reg and the rtx costs hunk is moved under an explicit case MOD.

Note that the tests still require patch 1/3, which does this for aarch64;
I hope to post a respun version of it soon.

Ok after the prerequisite goes in?

Thanks,
Kyrill


2015-08-10  Kyrylo Tkachov  

* config/arm/arm.md (*subsi3_compare0): Rename to...
(subsi3_compare0): ... This.
(*arm_andsi3_insn): Rename to...
(arm_andsi3_insn): ... This.
(modsi3): New define_expand.
* config/arm/arm.c (arm_new_rtx_costs, MOD case): Handle case
when operand is power of 2.


2015-08-10  Kyrylo Tkachov  

* gcc.target/aarch64/mod_2.x: New file.
* gcc.target/aarch64/mod_256.x: Likewise.
* gcc.target/arm/mod_2.c: New test.
* gcc.target/arm/mod_256.c: Likewise.
* gcc.target/aarch64/mod_2.c: Likewise.
* gcc.target/aarch64/mod_256.c: Likewise.



On 31/07/15 09:20, Kyrill Tkachov wrote:

Ping.

https://gcc.gnu.org/ml/gcc-patches/2015-07/msg02037.html
Thanks,
Kyrill

On 24/07/15 11:55, Kyrill Tkachov wrote:

Hi all,

This third patch implements the same algorithm as patch 1/3 but for arm.
That is, for X % N where N is a power of 2 we do:

rsbs    r1, r0, #0
and     r0, r0, #(N - 1)
and     r1, r1, #(N - 1)
rsbpl   r0, r1, #0

For the special case where N is 2 we do the shorter:
 cmp     r0, #0
 and     r0, r0, #1
 rsblt   r0, r0, #0

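Both sequences compute C's truncating signed remainder. The C sketch below mirrors them step by step as a way to check the logic; it is not the expander's code, the function names are hypothetical, and it assumes n is a power of two with 2 <= n <= 2^31 (the patch additionally requires n - 1 to be a valid ARM immediate).

```c
#include <stdint.h>

/* Signed x % n for a power-of-two n, following the sequence
   rsbs r1, r0, #0 / and r0 / and r1 / rsbpl r0, r1, #0.  */
static int32_t
mod_pow2_sketch (int32_t x, uint32_t n)
{
  uint32_t mask = n - 1;
  uint32_t neg = 0u - (uint32_t) x;              /* rsbs: r1 = -x, sets N */
  int32_t r0 = (int32_t) ((uint32_t) x & mask);  /* and r0, r0, #(n - 1) */
  uint32_t r1 = neg & mask;                      /* and r1, r1, #(n - 1) */
  if ((neg & 0x80000000u) == 0)                  /* pl: N flag clear */
    r0 = -(int32_t) r1;                          /* rsbpl r0, r1, #0 */
  return r0;
}

/* Special case n == 2: cmp r0, #0 / and r0, r0, #1 / rsblt r0, r0, #0.  */
static int32_t
mod_2_sketch (int32_t x)
{
  int32_t r0 = (int32_t) ((uint32_t) x & 1u);    /* and r0, r0, #1 */
  if (x < 0)                                     /* lt after cmp r0, #0 */
    r0 = -r0;                                    /* rsblt r0, r0, #0 */
  return r0;
}
```

For INT32_MIN the negation wraps back to a negative value, so the "pl" condition fails and the result is the masked value 0, which matches INT32_MIN % n for any power-of-two n up to 2^31.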
Note that for the final conditional negate we expand to an IF_THEN_ELSE of a NEG
rather than a cond_exec rtx because the LRA dataflow analysis doesn't always deal
with cond_execs correctly. The splitters fixed in patch 2/3 then break it into a
cond_exec after reload, so it all works out.

Bootstrapped and tested on arm, with both ARM and Thumb2 states.

Tests are added and shared with aarch64.

Ok for trunk?

Thanks,
Kyrill

2015-07-24  Kyrylo Tkachov  

   * config/arm/arm.md (*subsi3_compare0): Rename to...
   (subsi3_compare0): ... This.
   (*arm_andsi3_insn): Rename to...
   (arm_andsi3_insn): ... This.
   (modsi3): New define_expand.
   * config/arm/arm.c (arm_new_rtx_costs, MOD case): Handle case
   when operand is power of 2.


2015-07-24  Kyrylo Tkachov  

   * gcc.target/aarch64/mod_2.x: New file.
   * gcc.target/aarch64/mod_256.x: Likewise.
   * gcc.target/arm/mod_2.c: New test.
   * gcc.target/arm/mod_256.c: Likewise.
   * gcc.target/aarch64/mod_2.c: Likewise.
   * gcc.target/aarch64/mod_256.c: Likewise.


commit 7d0da77d73552d8e683525f4e6fb8bc660ed1c56
Author: Kyrylo Tkachov 
Date:   Fri Jul 17 16:30:01 2015 +0100

[ARM][3/3] Expand mod by power of 2

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 1ea9e27..a607a5c 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -9559,6 +9559,24 @@ arm_new_rtx_costs (rtx x, enum rtx_code code, enum rtx_code outer_code,
   return false;	/* All arguments must be in registers.  */
 
 case MOD:
+  /* MOD by a power of 2 can be expanded as:
+	 rsbs    r1, r0, #0
+	 and     r0, r0, #(n - 1)
+	 and     r1, r1, #(n - 1)
+	 rsbpl   r0, r1, #0.  */
+  if (CONST_INT_P (XEXP (x, 1))
+	  && exact_log2 (INTVAL (XEXP (x, 1))) > 0
+	  && mode == SImode)
+	{
+	  *cost += COSTS_N_INSNS (3);
+
+	  if (speed_p)
+	*cost += 2 * extra_cost->alu.logical
+		 + extra_cost->alu.arith;
+	  return true;
+	}
+
+/* Fall-through.  */
 case UMOD:
   *cost = LIBCALL_COST (2);
   return false;	/* All arguments must be in registers.  */
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 817860d..652ec51 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -1229,7 +1229,7 @@ (define_peephole2
   ""
 )
 
-(define_insn "*subsi3_compare0"
+(define_insn "subsi3_compare0"
   [(set (reg:CC_NOOV CC_REGNUM)
 	(compare:CC_NOOV
 	 (minus:SI (match_operand:SI 1 "arm_rhs_operand" "r,r,I")
@@ -2158,7 +2158,7 @@ (define_expand "andsi3"
 )
 
 ; ??? Check split length for Thumb-2
-(define_insn_and_split "*arm_andsi3_insn"
+(define_insn_and_split "arm_andsi3_insn"
   [(set (match_operand:SI 0 "s_register_operand" "=r,l,r,r,r")
 	(and:SI (match_operand:SI 1 "s_register_operand" "%r,0,r,r,r")
 		(match_operand:SI 2 "reg_or_int_operand" "I,l,K,r,?n")))]
@@ -11143,6 +11143,76 @@ (define_expand "thumb_legacy_rev"
   ""
 )
 
+;; ARM-specific expansion of signed mod by power of 2
+;; using conditional negate.
+;; For r0 % n where n is a power of 2 produce:
+;; rsbs    r1, r0, #0
+;; and     r0, r0, #(n - 1)
+;; and     r1, r1, #(n - 1)
+;; rsbpl   r0, r1, #0
+
+(define_expand "modsi3"
+  [(match_operand:SI 0 "register_operand" "")
+   (match_operand:SI 1 "register_operand" "")
+   (match_operand:SI 2 "const_int_operand" "")]
+  "TARGET_32BIT"
+  {
+HOST_WIDE_INT val = INTVAL (operands[2]);
+
+if (val <= 0
+   || exact_log2 (INTVAL (operands[2])) <= 0
+   || !const_ok_for_arm (

[PATCH 1/5][ARM] Make room for more CPU feature flags.

2015-08-10 Thread Matthew Wahab

The ARM backend uses an unsigned long to record CPU feature flags and
there are currently 31 bits in use. To be able to support new
architecture features, the current representation will need to be
replaced so that more flags can be recorded.

This series of patches replaces the single unsigned long with a
representation based on an array of unsigned longs. Constructors and
operations are explicitly defined for the new representation and the
backend is updated to use the new operations.

The individual patches:
- Make architecture flags explicit in arm-cores.def, to prepare for the
  changes.
- Add definitions for the new representation as type arm_feature_set and
  macros with prefix ARM_FSET.
- Replace uses of the old representation with the arm_feature_set type
  and operations in the architecture specifiers.
- Use the new arm_feature_set type and operations in the descriptions of
  the builtins.
- Rework arm-cores.def and arm-arches.def to make the feature set
  constructions explicit.

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check.

This patch moves the derived FL_FOR_ARCH##ARCH flags from the expansion
of macro arm.c/ARM_CORE and makes them explicit in the entries in
arm-cores.def.
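The effect of this change on macro expansion can be sketched with a cut-down, hypothetical version of the ARM_CORE machinery. The struct, the CORE_OLD/CORE_NEW macros and the flag values below are illustrative assumptions, not the definitions in arm.c or arm-protos.h; the point is that pasting FL_FOR_ARCH##ARCH inside the macro and writing FL_FOR_ARCHn explicitly in each entry produce the same flags.

```c
/* Hypothetical flag values, for illustration only.  */
#define FL_CO_PROC    (1UL << 0)
#define FL_MODE26     (1UL << 2)
#define FL_FOR_ARCH2  (1UL << 8)
#define FL_FOR_ARCH3  (FL_FOR_ARCH2 | (1UL << 9))

/* Hypothetical, cut-down core descriptor.  */
struct core_sketch
{
  const char *name;
  unsigned long flags;
};

/* Before: the macro derives the architecture flags by token pasting,
   so ARCH 3 turns FL_FOR_ARCH##ARCH into FL_FOR_ARCH3.  */
#define CORE_OLD(NAME, ARCH, FLAGS) { NAME, (FLAGS) | FL_FOR_ARCH##ARCH },

/* After: each entry spells out FL_FOR_ARCHn itself.  */
#define CORE_NEW(NAME, ARCH, FLAGS) { NAME, (FLAGS) },

static const struct core_sketch old_cores[] = {
  CORE_OLD ("arm2", 2, FL_CO_PROC | FL_MODE26)
  CORE_OLD ("arm6", 3, FL_CO_PROC | FL_MODE26)
};

static const struct core_sketch new_cores[] = {
  CORE_NEW ("arm2", 2, FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH2)
  CORE_NEW ("arm6", 3, FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH3)
};

/* Return 1 if both tables give the same flags for every core.  */
static int
core_tables_agree (void)
{
  for (unsigned i = 0; i < sizeof old_cores / sizeof old_cores[0]; i++)
    if (old_cores[i].flags != new_cores[i].flags)
      return 0;
  return 1;
}
```

Making the flags explicit removes the dependency on token pasting, which matters once the flags stop being a single integer that can simply be OR-ed in the macro.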

2015-08-10  Matthew Wahab  

* gcc/config/arm/arm-cores.def: Add FL_FOR_ARCH flag for each
ARM_CORE entry.  Fix some white-space.
* gcc/config/arm/arm.c: Remove FL_FOR_ARCH derivation from
ARM_CORE definition.
From beb28417822950ca773742977bed28db84679ed5 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Tue, 28 Jul 2015 09:26:47 +0100
Subject: [PATCH 1/5] Make ARCH flags explicit in arm-cores.def.

Change-Id: I29d640e71b59177a984272335412b4e256909a26
---
 gcc/config/arm/arm-cores.def | 200 +--
 gcc/config/arm/arm.c         |   2 +-
 2 files changed, 101 insertions(+), 101 deletions(-)

diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def
index 9d47fcf..26a6b4b 100644
--- a/gcc/config/arm/arm-cores.def
+++ b/gcc/config/arm/arm-cores.def
@@ -43,134 +43,134 @@
Some tools assume no whitespace up to the first "," in each entry.  */
 
 /* V2/V2A Architecture Processors */
-ARM_CORE("arm2", 	arm2, arm2,	2, FL_CO_PROC | FL_MODE26, slowmul)
-ARM_CORE("arm250", 	arm250, arm250,	2, FL_CO_PROC | FL_MODE26, slowmul)
-ARM_CORE("arm3",	arm3, arm3,	2, FL_CO_PROC | FL_MODE26, slowmul)
+ARM_CORE("arm2",	arm2, arm2,	2, FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH2, slowmul)
+ARM_CORE("arm250",	arm250, arm250,	2, FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH2, slowmul)
+ARM_CORE("arm3",	arm3, arm3,	2, FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH2, slowmul)
 
 /* V3 Architecture Processors */
-ARM_CORE("arm6",	arm6, arm6,		3, FL_CO_PROC | FL_MODE26, slowmul)
-ARM_CORE("arm60",	arm60, arm60,		3, FL_CO_PROC | FL_MODE26, slowmul)
-ARM_CORE("arm600",	arm600, arm600,		3, FL_CO_PROC | FL_MODE26 | FL_WBUF, slowmul)
-ARM_CORE("arm610",	arm610, arm610,		3, FL_MODE26 | FL_WBUF, slowmul)
-ARM_CORE("arm620",	arm620, arm620,		3, FL_CO_PROC | FL_MODE26 | FL_WBUF, slowmul)
-ARM_CORE("arm7",	arm7, arm7,		3, FL_CO_PROC | FL_MODE26, slowmul)
-ARM_CORE("arm7d",	arm7d, arm7d,		3, FL_CO_PROC | FL_MODE26, slowmul)
-ARM_CORE("arm7di",	arm7di, arm7di,		3, FL_CO_PROC | FL_MODE26, slowmul)
-ARM_CORE("arm70",	arm70, arm70,		3, FL_CO_PROC | FL_MODE26, slowmul)
-ARM_CORE("arm700",	arm700, arm700,		3, FL_CO_PROC | FL_MODE26 | FL_WBUF, slowmul)
-ARM_CORE("arm700i",	arm700i, arm700i,	3, FL_CO_PROC | FL_MODE26 | FL_WBUF, slowmul)
-ARM_CORE("arm710",	arm710, arm710,		3, FL_MODE26 | FL_WBUF, slowmul)
-ARM_CORE("arm720",	arm720, arm720,		3, FL_MODE26 | FL_WBUF, slowmul)
-ARM_CORE("arm710c",	arm710c, arm710c,	3, FL_MODE26 | FL_WBUF, slowmul)
-ARM_CORE("arm7100",	arm7100, arm7100,	3, FL_MODE26 | FL_WBUF, slowmul)
-ARM_CORE("arm7500",	arm7500, arm7500,	3, FL_MODE26 | FL_WBUF, slowmul)
+ARM_CORE("arm6",	arm6, arm6,		3, FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH3, slowmul)
+ARM_CORE("arm60",	arm60, arm60,		3, FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH3, slowmul)
+ARM_CORE("arm600",	arm600, arm600,		3, FL_CO_PROC | FL_MODE26 | FL_WBUF | FL_FOR_ARCH3, slowmul)
+ARM_CORE("arm610",	arm610, arm610,		3, FL_MODE26 | FL_WBUF | FL_FOR_ARCH3, slowmul)
+ARM_CORE("arm620",	arm620, arm620,		3, FL_CO_PROC | FL_MODE26 | FL_WBUF | FL_FOR_ARCH3, slowmul)
+ARM_CORE("arm7",	arm7, arm7,		3, FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH3, slowmul)
+ARM_CORE("arm7d",	arm7d, arm7d,		3, FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH3, slowmul)
+ARM_CORE("arm7di",	arm7di, arm7di,		3, FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH3, slowmul)
+ARM_CORE("arm70",	arm70, arm70,		3, FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH3, slowmul)
+ARM_CORE("arm700",	arm700, arm700,		3, FL_CO_PROC | FL_MODE26 | FL_WBUF | FL_FOR_ARCH3, slowmul)
+ARM_CORE("arm700i",	arm700i, arm700i,	3, FL_CO_PROC | FL_MODE26 | FL_WBUF | FL_FOR_ARCH3, slowmul)
+ARM_CORE("arm710",	arm710, arm710,		3, FL_MODE26 | FL_WBUF | FL_FOR_ARCH3, slowmul)
+ARM_CORE("arm720",	arm720, arm72

[PATCH 2/5][ARM] Add feature set definitions.

2015-08-10 Thread Matthew Wahab

The ARM backend uses an unsigned long to record CPU feature flags and
there are currently 31 bits in use. This series of patches replaces the
single unsigned long with a representation based on an array of values.

This patch adds, but doesn't use, type arm_feature_set and macros
prefixed with ARM_FSET to represent and operate on feature sets.

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check.

gcc/
2015-08-10  Matthew Wahab  

* config/arm/arm-protos.h (FL_NONE): New.
(FL_ANY): New.
(arm_feature_set): New.
(ARM_FSET_MAKE): New.
(ARM_FSET_MAKE_CPU1): New.
(ARM_FSET_MAKE_CPU2): New.
(ARM_FSET_CPU1): New.
(ARM_FSET_CPU2): New.
(ARM_FSET_EMPTY): New.
(ARM_FSET_ANY): New.
(ARM_FSET_HAS_CPU1): New.
(ARM_FSET_HAS_CPU2): New.
(ARM_FSET_HAS_CPU): New.
(ARM_FSET_ADD_CPU1): New.
(ARM_FSET_ADD_CPU2): New.
(ARM_FSET_DEL_CPU1): New.
(ARM_FSET_DEL_CPU2): New.
(ARM_FSET_UNION): New.
(ARM_FSET_INTER): New.
(ARM_FSET_XOR): New.
(ARM_FSET_EXCLUDE): New.
(ARM_FSET_IS_EMPTY): New.
(ARM_FSET_CPU_SUBSET): New.

From fd51de4ebdbeff478716cf0a4329fd38cd861403 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Thu, 4 Jun 2015 15:35:25 +0100
Subject: [PATCH 2/5] Add feature set definitions.

Change-Id: I5f89b46ea57e35f477ec4751fea3cb6ee8fce251
---
 gcc/config/arm/arm-protos.h | 105 
 1 file changed, 105 insertions(+)

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index cef9eec..610c73e 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -346,6 +346,8 @@ extern bool arm_is_constant_pool_ref (rtx);
 /* Flags used to identify the presence of processor capabilities.  */
 
 /* Bit values used to identify processor capabilities.  */
+#define FL_NONE	  (0)	  /* No flags.  */
+#define FL_ANY	  (0x)/* All flags.  */
 #define FL_CO_PROC    (1 << 0)        /* Has external co-processor bus */
 #define FL_ARCH3M     (1 << 1)        /* Extended multiply */
 #define FL_MODE26     (1 << 2)        /* 26-bit mode support */
@@ -413,6 +415,109 @@ extern bool arm_is_constant_pool_ref (rtx);
 #define FL_FOR_ARCH7EM  (FL_FOR_ARCH7M | FL_ARCH7EM)
 #define FL_FOR_ARCH8A	(FL_FOR_ARCH7VE | FL_ARCH8)
 
+/* There are too many feature bits to fit in a single word so the set of cpu and
+   fpu capabilities is a structure.  A feature set is created and manipulated
+   with the ARM_FSET macros.  */
+
+typedef struct
+{
+  unsigned long cpu[2];
+} arm_feature_set;
+
+
+/* Initialize a feature set.  */
+
+#define ARM_FSET_MAKE(CPU1,CPU2) { { (CPU1), (CPU2) } }
+
+#define ARM_FSET_MAKE_CPU1(CPU1) ARM_FSET_MAKE ((CPU1), (FL_NONE))
+#define ARM_FSET_MAKE_CPU2(CPU2) ARM_FSET_MAKE ((FL_NONE), (CPU2))
+
+/* Accessors.  */
+
+#define ARM_FSET_CPU1(S) ((S).cpu[0])
+#define ARM_FSET_CPU2(S) ((S).cpu[1])
+
+/* Useful combinations.  */
+
+#define ARM_FSET_EMPTY ARM_FSET_MAKE (FL_NONE, FL_NONE)
+#define ARM_FSET_ANY ARM_FSET_MAKE (FL_ANY, FL_ANY)
+
+/* Tests for a specific CPU feature.  */
+
+#define ARM_FSET_HAS_CPU1(A, F)  \
+  (((A).cpu[0] & ((unsigned long)(F))) == ((unsigned long)(F)))
+#define ARM_FSET_HAS_CPU2(A, F)  \
+  (((A).cpu[1] & ((unsigned long)(F))) == ((unsigned long)(F)))
+#define ARM_FSET_HAS_CPU(A, F1, F2)\
+  (ARM_FSET_HAS_CPU1 ((A), (F1)) && ARM_FSET_HAS_CPU2 ((A), (F2)))
+
+/* Add a feature to a feature set.  */
+
+#define ARM_FSET_ADD_CPU1(DST, F)		\
+  do {		\
+(DST).cpu[0] |= (F);			\
+  } while (0)
+
+#define ARM_FSET_ADD_CPU2(DST, F)		\
+  do {		\
+(DST).cpu[1] |= (F);			\
+  } while (0)
+
+/* Remove a feature from a feature set.  */
+
+#define ARM_FSET_DEL_CPU1(DST, F)		\
+  do {		\
+(DST).cpu[0] &= ~(F);			\
+  } while (0)
+
+#define ARM_FSET_DEL_CPU2(DST, F)		\
+  do {		\
+(DST).cpu[1] &= ~(F);			\
+  } while (0)
+
+/* Union of feature sets.  */
+
+#define ARM_FSET_UNION(DST,F1,F2)		\
+  do {		\
+(DST).cpu[0] = (F1).cpu[0] | (F2).cpu[0];	\
+(DST).cpu[1] = (F1).cpu[1] | (F2).cpu[1];	\
+  } while (0)
+
+/* Intersection of feature sets.  */
+
+#define ARM_FSET_INTER(DST,F1,F2)		\
+  do {		\
+(DST).cpu[0] = (F1).cpu[0] & (F2).cpu[0];	\
+(DST).cpu[1] = (F1).cpu[1] & (F2).cpu[1];	\
+  } while (0)
+
+/* Exclusive disjunction.  */
+
+#define ARM_FSET_XOR(DST,F1,F2)\
+  do {			\
+(DST).cpu[0] = (F1).cpu[0] ^ (F2).cpu[0];		\
+(DST).cpu[1] = (F1).cpu[1] ^ (F2).cpu[1];		\
+  } while (0)
+
+/* Difference of feature sets: F1 excluding the elements of F2.  */
+
+#define ARM_FSET_EXCLUDE(DST,F1,F2)		\
+  do {		\
+(DST).cpu[0] = (F1).cpu[0] & ~(F2).cpu[0];	\
+(DST).cpu[1] = (F1).cpu[1] & ~(F2).cpu[1];	\
+  } while (0)
+
+/* Test for an empty feature set.  */
+
+#define ARM_FSET_IS_EMPTY(A)		\
+  (!((A).cpu[0]) && !((A).cpu[1]))
+
+/* Tests whe

[PATCH 3/5][ARM] Use new feature set representation.

2015-08-10 Thread Matthew Wahab

The ARM backend uses an unsigned long to record CPU feature flags and
there are currently 31 bits in use. This series of patches replaces the
single unsigned long with a representation based on an array of values.

This patch replaces the existing representation of CPU feature sets with
the type arm_feature_set and ARM_FSET macros added in an earlier patch
in this series.

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check.

gcc/
2015-08-10  Matthew Wahab  

* config/arm/arm-builtins.c (def_mbuiltin): Use ARM_FSET macro.
(struct builtin_description): Change type of mask to unsigned
long.
* config/arm/arm-protos.h (insn_flags): Declare as type
arm_feature_set.
(tune_flags): Likewise.
* config/arm/arm.c (feature_count): New.
(insn_flags): Define as type arm_feature_set.
(tune_flags): Likewise.
(struct processors): Define field flags as type arm_feature_set.
(all_cores): Update for change to struct processors.
(all_architectures): Likewise.
(arm_option_check_internal): Use arm_feature_set and ARM_FSET
macros.
(arm_option_override_internal): Likewise.
(arm_option_override): Likewise.

>From a717f57d0a1cada54294ad7ba47660b5bb3b1a64 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Tue, 28 Jul 2015 09:29:04 +0100
Subject: [PATCH 3/5]  Use feature sets.

Change-Id: Ia69777ef722ec09ac79081b7b653f0ba7bd2f014
---
 gcc/config/arm/arm-builtins.c |   6 +-
 gcc/config/arm/arm-protos.h   |   4 +-
 gcc/config/arm/arm.c  | 133 --
 3 files changed, 82 insertions(+), 61 deletions(-)

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 030d8d1..ecc364b 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -1106,10 +1106,10 @@ arm_init_neon_builtins (void)
 #undef NUM_DREG_TYPES
 #undef NUM_QREG_TYPES
 
-#define def_mbuiltin(MASK, NAME, TYPE, CODE)\
+#define def_mbuiltin(FLAG, NAME, TYPE, CODE)\
   do	\
 {	\
-  if ((MASK) & insn_flags)		\
+  if (ARM_FSET_HAS_CPU1 (insn_flags, (FLAG)))			\
 	{\
 	  tree bdecl;			\
 	  bdecl = add_builtin_function ((NAME), (TYPE), (CODE),		\
@@ -1121,7 +1121,7 @@ arm_init_neon_builtins (void)
 
 struct builtin_description
 {
-  const unsigned int   mask;
+  const unsigned long  mask;
   const enum insn_code icode;
   const char * const   name;
   const enum arm_builtins  code;
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 610c73e..07dd7c4 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -520,11 +520,11 @@ typedef struct
 
 /* The bits in this mask specify which
instructions we are allowed to generate.  */
-extern unsigned long insn_flags;
+extern arm_feature_set insn_flags;
 
 /* The bits in this mask specify which instruction scheduling options should
be used.  */
-extern unsigned long tune_flags;
+extern arm_feature_set tune_flags;
 
 /* Nonzero if this chip supports the ARM Architecture 3M extensions.  */
 extern int arm_arch3m;
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 6197dfe..70214e0 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -97,6 +97,7 @@ static void arm_add_gc_roots (void);
 static int arm_gen_constant (enum rtx_code, machine_mode, rtx,
 			 HOST_WIDE_INT, rtx, rtx, int, int);
 static unsigned bit_count (unsigned long);
+static unsigned feature_count (const arm_feature_set*);
 static int arm_address_register_rtx_p (rtx, int);
 static int arm_legitimate_index_p (machine_mode, rtx, RTX_CODE, int);
 static bool is_called_in_ARM_mode (tree);
@@ -767,11 +768,11 @@ static int thumb_call_reg_needed;
 
 /* The bits in this mask specify which
instructions we are allowed to generate.  */
-unsigned long insn_flags = 0;
+arm_feature_set insn_flags = ARM_FSET_EMPTY;
 
 /* The bits in this mask specify which instruction scheduling options should
be used.  */
-unsigned long tune_flags = 0;
+arm_feature_set tune_flags = ARM_FSET_EMPTY;
 
 /* The highest ARM architecture version supported by the
target.  */
@@ -927,7 +928,7 @@ struct processors
   enum processor_type core;
   const char *arch;
   enum base_architecture base_arch;
-  const unsigned long flags;
+  const arm_feature_set flags;
   const struct tune_params *const tune;
 };
 
@@ -2196,10 +2197,10 @@ static const struct processors all_cores[] =
   /* ARM Cores */
 #define ARM_CORE(NAME, X, IDENT, ARCH, FLAGS, COSTS) \
   {NAME, IDENT, #ARCH, BASE_ARCH_##ARCH,	  \
-FLAGS, &arm_##COSTS##_tune},
+   ARM_FSET_MAKE_CPU1 (FLAGS), &arm_##COSTS##_tune},
 #include "arm-cores.def"
 #undef ARM_CORE
-  {NULL, arm_none, NULL, BASE_ARCH_0, 0, NULL}
+  {NULL, arm_none, NULL, BASE_ARCH_0, ARM_FSET_EMPTY, NULL}
 };
 
 static const struct processors all_architectures[] =
@@ -2209,10 +2210,10 @@ static c

[PATCH 4/5][ARM] Use features sets for builtins.

2015-08-10 Thread Matthew Wahab

The ARM backend uses an unsigned long to record CPU feature flags and
there are currently 31 bits in use. This series of patches replaces the
single unsigned long with a representation based on an array of values.

This patch updates the feature flags usage in the builtins description
to be able to make use of all representable flags.

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check.

gcc/
2015-08-10  Matthew Wahab  

* config/arm/arm-builtins.c (def_mbuiltin): Test all flags in a
feature set.
(struct builtin_description): Replace field mask with field
features.
(IWMMXT_BUILTIN): Use ARM_FSET macros for feature flags.
(IWMMXT2_BUILTIN): Likewise.
(IWMMXT2_BUILTIN2): Likewise.
(FP_BUILTIN): Likewise.
(CRC32_BUILTIN): Likewise.
(CRYPTO_BUILTIN): Likewise.
(iwmmx_mbuiltin): Likewise.
(iwmmx2_mbuiltin): Likewise.
(arm_init_iwmmxt_builtins): Likewise. Also, update for change to
struct builtin_description.

>From b82186a2d9c1ae31f10827664163a486d3c7906d Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Thu, 30 Jul 2015 15:19:39 +0100
Subject: [PATCH 4/5] Use features sets for builtins.

Change-Id: I4693a011bb9a3a769c20a8ef3443f25d743b44bd
---
 gcc/config/arm/arm-builtins.c | 45 +--
 1 file changed, 26 insertions(+), 19 deletions(-)

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index ecc364b..64fbe7f 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -1106,10 +1106,11 @@ arm_init_neon_builtins (void)
 #undef NUM_DREG_TYPES
 #undef NUM_QREG_TYPES
 
-#define def_mbuiltin(FLAG, NAME, TYPE, CODE)\
+#define def_mbuiltin(FLAGS, NAME, TYPE, CODE)\
   do	\
 {	\
-  if (ARM_FSET_HAS_CPU1 (insn_flags, (FLAG)))			\
+  const arm_feature_set flags = FLAGS;\
+  if (ARM_FSET_CPU_SUBSET (flags, insn_flags))			\
 	{\
 	  tree bdecl;			\
 	  bdecl = add_builtin_function ((NAME), (TYPE), (CODE),		\
@@ -1121,7 +1122,7 @@ arm_init_neon_builtins (void)
 
 struct builtin_description
 {
-  const unsigned long  mask;
+  const arm_feature_set features;
   const enum insn_code icode;
   const char * const   name;
   const enum arm_builtins  code;
@@ -1132,11 +1133,13 @@ struct builtin_description
 static const struct builtin_description bdesc_2arg[] =
 {
 #define IWMMXT_BUILTIN(code, string, builtin) \
-  { FL_IWMMXT, CODE_FOR_##code, "__builtin_arm_" string, \
+  { ARM_FSET_MAKE_CPU1 (FL_IWMMXT), CODE_FOR_##code, \
+"__builtin_arm_" string,			 \
 ARM_BUILTIN_##builtin, UNKNOWN, 0 },
 
 #define IWMMXT2_BUILTIN(code, string, builtin) \
-  { FL_IWMMXT2, CODE_FOR_##code, "__builtin_arm_" string, \
+  { ARM_FSET_MAKE_CPU1 (FL_IWMMXT2), CODE_FOR_##code, \
+"__builtin_arm_" string,			  \
 ARM_BUILTIN_##builtin, UNKNOWN, 0 },
 
   IWMMXT_BUILTIN (addv8qi3, "waddb", WADDB)
@@ -1219,10 +1222,12 @@ static const struct builtin_description bdesc_2arg[] =
   IWMMXT_BUILTIN (iwmmxt_walignr3, "walignr3", WALIGNR3)
 
 #define IWMMXT_BUILTIN2(code, builtin) \
-  { FL_IWMMXT, CODE_FOR_##code, NULL, ARM_BUILTIN_##builtin, UNKNOWN, 0 },
+  { ARM_FSET_MAKE_CPU1 (FL_IWMMXT), CODE_FOR_##code, NULL, \
+ARM_BUILTIN_##builtin, UNKNOWN, 0 },
 
 #define IWMMXT2_BUILTIN2(code, builtin) \
-  { FL_IWMMXT2, CODE_FOR_##code, NULL, ARM_BUILTIN_##builtin, UNKNOWN, 0 },
+  { ARM_FSET_MAKE_CPU2 (FL_IWMMXT2), CODE_FOR_##code, NULL, \
+ARM_BUILTIN_##builtin, UNKNOWN, 0 },
 
   IWMMXT2_BUILTIN2 (iwmmxt_waddbhusm, WADDBHUSM)
   IWMMXT2_BUILTIN2 (iwmmxt_waddbhusl, WADDBHUSL)
@@ -1237,7 +1242,7 @@ static const struct builtin_description bdesc_2arg[] =
 
 
 #define FP_BUILTIN(L, U) \
-  {0, CODE_FOR_##L, "__builtin_arm_"#L, ARM_BUILTIN_##U, \
+  {ARM_FSET_EMPTY, CODE_FOR_##L, "__builtin_arm_"#L, ARM_BUILTIN_##U, \
UNKNOWN, 0},
 
   FP_BUILTIN (get_fpscr, GET_FPSCR)
@@ -1245,8 +1250,8 @@ static const struct builtin_description bdesc_2arg[] =
 #undef FP_BUILTIN
 
 #define CRC32_BUILTIN(L, U) \
-  {0, CODE_FOR_##L, "__builtin_arm_"#L, ARM_BUILTIN_##U, \
-   UNKNOWN, 0},
+  {ARM_FSET_EMPTY, CODE_FOR_##L, "__builtin_arm_"#L, \
+   ARM_BUILTIN_##U, UNKNOWN, 0},
CRC32_BUILTIN (crc32b, CRC32B)
CRC32_BUILTIN (crc32h, CRC32H)
CRC32_BUILTIN (crc32w, CRC32W)
@@ -1256,9 +1261,9 @@ static const struct builtin_description bdesc_2arg[] =
 #undef CRC32_BUILTIN
 
 
-#define CRYPTO_BUILTIN(L, U) \
-  {0, CODE_FOR_crypto_##L, "__builtin_arm_crypto_"#L, ARM_BUILTIN_CRYPTO_##U, \
-   UNKNOWN, 0},
+#define CRYPTO_BUILTIN(L, U)	   \
+  {ARM_FSET_EMPTY, CODE_FOR_crypto_##L,	"__builtin_arm_crypto_"#L, \
+   ARM_BUILTIN_CRYPTO_##U, UNKNOWN, 0},
 #undef CRYPTO1
 #undef CRYPTO2
 #undef CRYPTO3
@@ -1514,7 +1519,9 @@ arm_init_iwmmxt_builtins (void)
   machine_mode mode;
   tree type;
 
-  if (d->name == 0 || !(d->mask == FL_IWMMXT || d->

[PATCH 5/5][ARM] Move initializer into arm-cores.def and arm-arches.def.

2015-08-10 Thread Matthew Wahab

The ARM backend uses an unsigned long to record CPU feature flags and
there are currently 30 bits in use. This series of patches replaces the
single unsigned long with a representation based on an array of values.

This patch updates the entries in the arm-cores.def and arm-arches.def
files to use the new arm_feature_set representation, moving the
initializers out of a macro expansion and making them explicit in the file
entries.

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check.

gcc/
2015-08-10  Matthew Wahab  

* config/arm/arm-arches.def: Replace single value flags with
initializer built from ARM_FSET_MAKE_CPU1.
* config/arm/arm-cores.def: Likewise.
* config/arm/arm.c (all_cores): Remove ARM_FSET_MAKE_CPU1
derivation from the ARM_CORE macro definition, use the given value
instead.
(all_architectures): Remove ARM_FSET_MAKE_CPU1 derivation from the
ARM_ARCH macro definition, use the given value instead.


>From cc8a20a67f2ef7e81cc581dd4726c17903b3870e Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Tue, 28 Jul 2015 09:32:36 +0100
Subject: [PATCH 5/5] Move feature sets into core and arch def files.

Change-Id: Id57f2ced41248384a21148b6ece8646a9e3d841f
---
 gcc/config/arm/arm-arches.def |  59 +++--
 gcc/config/arm/arm-cores.def  | 200 +-
 gcc/config/arm/arm.c  |   4 +-
 3 files changed, 132 insertions(+), 131 deletions(-)

diff --git a/gcc/config/arm/arm-arches.def b/gcc/config/arm/arm-arches.def
index 3dafaa5..e30640f 100644
--- a/gcc/config/arm/arm-arches.def
+++ b/gcc/config/arm/arm-arches.def
@@ -28,34 +28,35 @@
 
genopt.sh assumes no whitespace up to the first "," in each entry.  */
 
-ARM_ARCH("armv2",   arm2,   2,   FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH2)
-ARM_ARCH("armv2a",  arm2,   2,   FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH2)
-ARM_ARCH("armv3",   arm6,   3,   FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH3)
-ARM_ARCH("armv3m",  arm7m,  3M,  FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH3M)
-ARM_ARCH("armv4",   arm7tdmi,   4,   FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH4)
+ARM_ARCH("armv2",   arm2,   2,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH2))
+ARM_ARCH("armv2a",  arm2,   2,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH2))
+ARM_ARCH("armv3",   arm6,   3,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH3))
+ARM_ARCH("armv3m",  arm7m,  3M,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH3M))
+ARM_ARCH("armv4",   arm7tdmi,   4,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH4))
 /* Strictly, FL_MODE26 is a permitted option for v4t, but there are no
implementations that support it, so we will leave it out for now.  */
-ARM_ARCH("armv4t",  arm7tdmi,   4T,  FL_CO_PROC | FL_FOR_ARCH4T)
-ARM_ARCH("armv5",   arm10tdmi,  5,   FL_CO_PROC | FL_FOR_ARCH5)
-ARM_ARCH("armv5t",  arm10tdmi,  5T,  FL_CO_PROC | FL_FOR_ARCH5T)
-ARM_ARCH("armv5e",  arm1026ejs, 5E,  FL_CO_PROC | FL_FOR_ARCH5E)
-ARM_ARCH("armv5te", arm1026ejs, 5TE, FL_CO_PROC | FL_FOR_ARCH5TE)
-ARM_ARCH("armv6",   arm1136js,  6,   FL_CO_PROC | FL_FOR_ARCH6)
-ARM_ARCH("armv6j",  arm1136js,  6J,  FL_CO_PROC | FL_FOR_ARCH6J)
-ARM_ARCH("armv6k",  mpcore,	6K,  FL_CO_PROC | FL_FOR_ARCH6K)
-ARM_ARCH("armv6z",  arm1176jzs, 6Z,  FL_CO_PROC | FL_FOR_ARCH6Z)
-ARM_ARCH("armv6kz", arm1176jzs, 6KZ, FL_CO_PROC | FL_FOR_ARCH6KZ)
-ARM_ARCH("armv6zk", arm1176jzs, 6KZ, FL_CO_PROC | FL_FOR_ARCH6KZ)
-ARM_ARCH("armv6t2", arm1156t2s, 6T2, FL_CO_PROC | FL_FOR_ARCH6T2)
-ARM_ARCH("armv6-m", cortexm1,	6M,			  FL_FOR_ARCH6M)
-ARM_ARCH("armv6s-m", cortexm1,	6M,			  FL_FOR_ARCH6M)
-ARM_ARCH("armv7",   cortexa8,	7,   FL_CO_PROC |	  FL_FOR_ARCH7)
-ARM_ARCH("armv7-a", cortexa8,	7A,  FL_CO_PROC |	  FL_FOR_ARCH7A)
-ARM_ARCH("armv7ve", cortexa8,	7A,  FL_CO_PROC |	  FL_FOR_ARCH7VE)
-ARM_ARCH("armv7-r", cortexr4,	7R,  FL_CO_PROC |	  FL_FOR_ARCH7R)
-ARM_ARCH("armv7-m", cortexm3,	7M,  FL_CO_PROC |	  FL_FOR_ARCH7M)
-ARM_ARCH("armv7e-m", cortexm4,  7EM, FL_CO_PROC |	  FL_FOR_ARCH7EM)
-ARM_ARCH("armv8-a", cortexa53,  8A,  FL_CO_PROC | FL_FOR_ARCH8A)
-ARM_ARCH("armv8-a+crc",cortexa53, 8A,FL_CO_PROC | FL_CRC32  | FL_FOR_ARCH8A)
-ARM_ARCH("iwmmxt",  iwmmxt, 5TE, FL_LDSCHED | FL_STRONG | FL_FOR_ARCH5TE | FL_XSCALE | FL_IWMMXT)
-ARM_ARCH("iwmmxt2", iwmmxt2,5TE, FL_LDSCHED | FL_STRONG | FL_FOR_ARCH5TE | FL_XSCALE | FL_IWMMXT | FL_IWMMXT2)
+ARM_ARCH("armv4t",  arm7tdmi,   4T,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_FOR_ARCH4T))
+ARM_ARCH("armv5",   arm10tdmi,  5,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_FOR_ARCH5))
+ARM_ARCH("armv5t",  arm10tdmi,  5T,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_FOR_ARCH5T))
+ARM_ARCH("armv5e",  arm1026ejs, 5E,	ARM_FSET_MAKE_CPU1 (FL_CO_PR

[PATCH 1/2] add GCC_FINAL to ansidecl.h

2015-08-10 Thread tbsaunde+gcc
From: Trevor Saunders 

Hi,

This allows classes and virtual functions to be marked as final if the compiler
supports C++11, or is gcc 4.7 or later.

bootstrapped + regtested on x86_64-linux-gnu, ok?

Trev

include/ChangeLog:

2015-08-10  Trevor Saunders  

* ansidecl.h (GCC_FINAL): New macro.
---
 include/ansidecl.h | 9 +
 1 file changed, 9 insertions(+)

diff --git a/include/ansidecl.h b/include/ansidecl.h
index 224627d..6e4bfc2 100644
--- a/include/ansidecl.h
+++ b/include/ansidecl.h
@@ -313,6 +313,15 @@ So instead we use the macro below and test it against specific values.  */
 #define ENUM_BITFIELD(TYPE) unsigned int
 #endif
 
+/* This is used to mark a class or virtual function as final.  */
+#if __cplusplus >= 201103L
+#define GCC_FINAL final
+#elif GCC_VERSION >= 4007
+#define GCC_FINAL __final
+#else
+#define GCC_FINAL
+#endif
+
 #ifdef __cplusplus
 }
 #endif
-- 
2.4.0



[PATCH 2/2] replace several uses of the anon namespace with GCC_FINAL

2015-08-10 Thread tbsaunde+gcc
From: Trevor Saunders 

Hi,

In many places gcc puts classes in the anon namespace so the compiler can tell
they are not inherited from, which enables better devirtualization.  However,
debugging code in the anon namespace can be a pain, and the same thing can be
accomplished more directly by marking the classes as final.  When bootstrapping,
stage 3 should always be built in C++14 mode now, and of course by a compiler
newer than gcc 4.7, so these classes will always be marked as final there.
AIUI cross compilers are supposed to be built with recent gcc, which I would
tend to think implies newer than 4.7, so they should also be built with these
classes marked as final.  I believe that means in all important cases this
works just as well as the anon namespace.

bootstrapped + regtested on x86_64-linux-gnu, ok?

Trev


gcc/ChangeLog:

2015-08-10  Trevor Saunders  

* compare-elim.c, dce.c, dse.c, gimple-ssa-isolate-paths.c,
gimple-ssa-strength-reduction.c, graphite.c, init-regs.c,
ipa-pure-const.c, ipa-visibility.c, ipa.c, mode-switching.c,
omp-low.c, reorg.c, sanopt.c, trans-mem.c, tree-eh.c,
tree-if-conv.c, tree-ssa-copyrename.c, tree-ssa-dce.c,
tree-ssa-dom.c, tree-ssa-dse.c, tree-ssa-forwprop.c,
tree-ssa-sink.c, tree-ssanames.c, tree-stdarg.c, tree-tailcall.c,
tree-vect-generic.c, tree.c, ubsan.c, var-tracking.c,
vtable-verify.c, web.c: Use GCC_FINAL instead of the anonymous
namespace.
---
 gcc/compare-elim.c  |  8 ++
 gcc/dce.c   | 16 +++
 gcc/dse.c   | 16 +++
 gcc/gimple-ssa-isolate-paths.c  |  6 ++--
 gcc/gimple-ssa-strength-reduction.c |  8 ++
 gcc/graphite.c  | 16 +++
 gcc/init-regs.c |  8 ++
 gcc/ipa-pure-const.c| 32 ++---
 gcc/ipa-visibility.c| 14 ++
 gcc/ipa.c   | 24 
 gcc/mode-switching.c|  8 ++
 gcc/omp-low.c   | 40 +++---
 gcc/reorg.c | 16 +++
 gcc/sanopt.c|  8 ++
 gcc/trans-mem.c | 56 ++---
 gcc/tree-eh.c   | 40 +++---
 gcc/tree-if-conv.c  |  8 ++
 gcc/tree-ssa-copyrename.c   |  8 ++
 gcc/tree-ssa-dce.c  | 16 +++
 gcc/tree-ssa-dom.c  |  8 ++
 gcc/tree-ssa-dse.c  |  8 ++
 gcc/tree-ssa-forwprop.c |  8 ++
 gcc/tree-ssa-sink.c |  8 ++
 gcc/tree-ssanames.c |  8 ++
 gcc/tree-stdarg.c   | 16 +++
 gcc/tree-tailcall.c | 16 +++
 gcc/tree-vect-generic.c | 16 +++
 gcc/tree.c  |  8 ++
 gcc/ubsan.c |  8 ++
 gcc/var-tracking.c  |  8 ++
 gcc/vtable-verify.c |  8 ++
 gcc/web.c   |  8 ++
 32 files changed, 119 insertions(+), 357 deletions(-)

diff --git a/gcc/compare-elim.c b/gcc/compare-elim.c
index b65d09e..ea94c4e 100644
--- a/gcc/compare-elim.c
+++ b/gcc/compare-elim.c
@@ -668,9 +668,7 @@ execute_compare_elim_after_reload (void)
   return 0;
 }
 
-namespace {
-
-const pass_data pass_data_compare_elim_after_reload =
+static const pass_data pass_data_compare_elim_after_reload =
 {
   RTL_PASS, /* type */
   "cmpelim", /* name */
@@ -683,7 +681,7 @@ const pass_data pass_data_compare_elim_after_reload =
   ( TODO_df_finish | TODO_df_verify ), /* todo_flags_finish */
 };
 
-class pass_compare_elim_after_reload : public rtl_opt_pass
+class pass_compare_elim_after_reload GCC_FINAL : public rtl_opt_pass
 {
 public:
   pass_compare_elim_after_reload (gcc::context *ctxt)
@@ -706,8 +704,6 @@ public:
 
 }; // class pass_compare_elim_after_reload
 
-} // anon namespace
-
 rtl_opt_pass *
 make_pass_compare_elim_after_reload (gcc::context *ctxt)
 {
diff --git a/gcc/dce.c b/gcc/dce.c
index c9cffc9..1b23eb7 100644
--- a/gcc/dce.c
+++ b/gcc/dce.c
@@ -784,9 +784,7 @@ rest_of_handle_ud_dce (void)
 }
 
 
-namespace {
-
-const pass_data pass_data_ud_rtl_dce =
+static const pass_data pass_data_ud_rtl_dce =
 {
   RTL_PASS, /* type */
   "ud_dce", /* name */
@@ -799,7 +797,7 @@ const pass_data pass_data_ud_rtl_dce =
   TODO_df_finish, /* todo_flags_finish */
 };
 
-class pass_ud_rtl_dce : public rtl_opt_pass
+class pass_ud_rtl_dce GCC_FINAL : public rtl_opt_pass
 {
 public:
   pass_ud_rtl_dce (gcc::context *ctxt)
@@ -819,8 +817,6 @@ public:
 
 }; // class pass_ud_rtl_dce
 
-} // anon namespace
-
 rtl_opt_pass *
 make_pass_ud_rtl_dce (gcc::context *ctxt)
 {
@@ -1215,9 +1211,7 @@ run_fast_dce (void)
 }
 
 
-namespace {
-
-const pass_data pass_data_fast_rtl_dce =
+static const pass_d

[PATCH 2/4] [MIPS] Add pipeline description for MSA

2015-08-10 Thread Robert Suchanek
Hi,

The patch adds a pipeline description for MSA to the I6400 and P5600 schedulers.

Regards,
Robert

gcc/ChangeLog:

* config/mips/i6400.md (i6400_fpu_intadd, i6400_fpu_logic)
(i6400_fpu_div, i6400_fpu_cmp, i6400_fpu_float, i6400_fpu_store)
(i6400_fpu_long_pipe, i6400_fpu_logic_l, i6400_fpu_float_l)
(i6400_fpu_mult): New cpu units.
(i6400_msa_add_d, i6400_msa_int_add, i6400_msa_short_logic3)
(i6400_msa_short_logic2, i6400_msa_short_logic, i6400_msa_move)
(i6400_msa_cmp, i6400_msa_short_float2, i6400_msa_div_d)
(i6400_msa_div_w, i6400_msa_div_h, i6400_msa_div_b, i6400_msa_copy)
(i6400_msa_branch, i6400_fpu_msa_store, i6400_fpu_msa_load)
(i6400_fpu_msa_move, i6400_msa_long_logic1, i6400_msa_long_logic2)
(i6400_msa_mult, i6400_msa_long_float2, i6400_msa_long_float4)
(i6400_msa_long_float5, i6400_msa_long_float8, i6400_msa_fdiv_df)
(i6400_msa_fdiv_sf): New reservations.
* config/mips/p5600.md (p5600_fpu_intadd, p5600_fpu_cmp)
(p5600_fpu_float, p5600_fpu_logic_a, p5600_fpu_logic_b, p5600_fpu_div)
(p5600_fpu_logic, p5600_fpu_float_a, p5600_fpu_float_b)
(p5600_fpu_float_c, p5600_fpu_float_d, p5600_fpu_mult, p5600_fpu_fdiv)
(p5600_fpu_load): New cpu units.
(msa_short_int_add, msa_short_logic, msa_short_logic_move_v)
(msa_short_cmp, msa_short_float2, msa_short_logic3, msa_short_store4)
(msa_long_load, msa_short_store, msa_long_logic, msa_long_float2)
(msa_long_float4, msa_long_float5, msa_long_float8, msa_long_mult)
(msa_long_fdiv, msa_long_div): New reservations.


0002-MIPS-Add-pipeline-description-for-MSA.patch


[PATCH 1/4] [MIPS] Add support for MIPS SIMD Architecture (MSA)

2015-08-10 Thread Robert Suchanek
Hi,

This series of patches adds support for the MIPS SIMD Architecture (MSA)
and has undergone a few updates since the last review to address the comments:

https://gcc.gnu.org/ml/gcc-patches/2014-05/msg01777.html

The series is split into four parts:

0001 [MIPS] Add support for MIPS SIMD Architecture (MSA)
0002 [MIPS] Add pipeline description for MSA
0003 Add support to run auto-vectorization tests for multiple effective targets
0004 [MIPS] Add tests for MSA

There are a couple of things to mention here:
- there is a minor regression on the O32 ABI, AFAICS due to the lack of stack
  realignment, and a patch will follow.  The vectorizer generates more unaligned
  accesses than the tests expect, and hence the tests fail their checks.
- the series doesn't add cost modelling for auto-vectorization
- patch 0003 is independent but must go in before 0004.

Regards,
Robert

gcc/ChangeLog:

* config.gcc: Add MSA header file for mips*-*-* target.
* config/mips/constraints.md (YI, YC, YZ, Unv5, Uuv5, Uuv6, Ubv8):
New constraints.
* config/mips/mips-ftypes.def: Add function types for MSA builtins.
* config/mips/mips-modes.def (V16QI, V8HI, V4SI, V2DI, V4SF, V2DF)
(V32QI, V16HI, V8SI, V4DI, V8SF, V4DF): New modes.
* config/mips/mips-msa.md: New file.
* config/mips/mips-protos.h
(mips_split_128bit_const_insns): New prototype.
(mips_msa_idiv_insns): Likewise.
(mips_split_128bit_move): Likewise.
(mips_split_128bit_move_p): Likewise.
(mips_split_msa_copy_d): Likewise.
(mips_split_msa_insert_d): Likewise.
(mips_split_msa_fill_d): Likewise.
(mips_expand_msa_branch): Likewise.
(mips_const_vector_same_val_p): Likewise.
(mips_const_vector_same_byte_p): Likewise.
(mips_const_vector_same_int_p): Likewise.
(mips_const_vector_bitimm_set_p): Likewise.
(mips_const_vector_bitimm_clr_p): Likewise.
(mips_msa_output_division): Likewise.
(mips_ldst_scaled_shift): Likewise.
(mips_expand_vec_cond_expr): Likewise.
* config/mips/mips.c (mips_const_vector_bitimm_set_p): New function.
(mips_const_vector_bitimm_clr_p): Likewise.
(mips_const_vector_same_val_p): Likewise.
(mips_const_vector_same_byte_p): Likewise.
(mips_const_vector_same_int_p): Likewise.
(mips_symbol_insns): Forbid loading symbols via immediate for MSA.
(mips_valid_offset_p): Limit offset to 10-bit for MSA loads and stores.
(mips_valid_lo_sum_p): Forbid loading symbols via %lo(base) for MSA.
(mips_lx_address_p): Add support for indexed load addresses for MSA.
(mips_address_insns): Add calculation of instructions needed for
stores and loads for MSA.
(mips_const_insns): Move CONST_DOUBLE below CONST_VECTOR.  Handle
CONST_VECTOR for MSA and let it fall through.
(mips_ldst_scaled_shift): New function.
(mips_subword_at_byte): Likewise.
(mips_msa_idiv_insns): Likewise.
(mips_legitimize_move): Validate MSA moves.
(mips_rtx_costs): Add UNGE, UNGT, UNLE, UNLT cases.  Add calculation of
costs for MSA division.
(mips_split_move_p): Check if MSA moves need splitting.
(mips_split_move): Split MSA moves if necessary.
(mips_split_128bit_move_p): New function.
(mips_split_128bit_move): Likewise.
(mips_split_msa_copy_d): Likewise.
(mips_split_msa_insert_d): Likewise.
(mips_split_msa_fill_d): Likewise.
(mips_output_move): Handle MSA moves.
(mips_expand_msa_branch): New function.
(mips_print_operand): Add 'E', 'B', 'w', 'v' modifiers.  Reinstate 'y'
modifier.
(mips_file_start): Add MSA .gnu_attribute.
(mips_hard_regno_mode_ok_p): Allow TImode and 128-bit vectors in FPRs.
(mips_hard_regno_nregs): Always return 1 for MSA supported mode.
(mips_class_max_nregs): Add register size for MSA supported mode.
(mips_cannot_change_mode_class): Allow conversion between MSA vector
modes and TImode.
(mips_mode_ok_for_mov_fmt_p): Allow MSA to use move.v instruction.
(mips_secondary_reload_class): Force MSA loads/stores via memory.
(mips_preferred_simd_mode): Add preferred modes for MSA.
(mips_vector_mode_supported_p): Add MSA supported modes.
(mips_autovectorize_vector_sizes): New function.
(mips_msa_output_division): Likewise.
(MSA_BUILTIN, MIPS_BUILTIN_DIRECT_NO_TARGET, MSA_NO_TARGET_BUILTIN):
New macros.
(CODE_FOR_msa_adds_s_b, CODE_FOR_msa_adds_s_h, CODE_FOR_msa_adds_s_w)
(CODE_FOR_msa_adds_s_d, CODE_FOR_msa_adds_u_b, CODE_FOR_msa_adds_u_h)
(CODE_FOR_msa_adds_u_w, CODE_FOR_msa_adds_u_d, CODE_FOR_msa_addv_b)
(CODE_FOR_msa_addv_h, CODE_FOR_msa_addv_w, CODE_FOR_msa_addv_d)
(CODE_FOR_msa_and_v, CODE_FOR_msa_bmnz_v, CODE_FOR_msa_bmz_v)
(CODE_FOR_msa_bnz_v, CODE_F

Re: [COMMITTED][AArch64] Improve TLS Descriptor pattern to release RTL loop IV opt

2015-08-10 Thread Jiong Wang

Jiong Wang writes:

> Andreas Schwab writes:
>
>> Jiong Wang  writes:
>>
>>> Index: gcc/ChangeLog
>>> ===
>>> --- gcc/ChangeLog   (revision 226682)
>>> +++ gcc/ChangeLog   (working copy)
>>> @@ -1,3 +1,16 @@
>>> +2015-08-06  Ramana Radhakrishnan  
>>> + Jiong Wang  
>>> +
>>> +   * config/aarch64/aarch64.d (tlsdesc_small_pseudo_): New pattern.
>>> +   * config/aarch64/aarch64.h (reg_class): New enumeration FIXED_REG0.
>>> +   (REG_CLASS_NAMES): Likewise.
>>> +   (REG_CLASS_CONTENTS): Likewise.
>>> +   * config/aarch64/aarch64.c (aarch64_class_max_nregs): Likewise.
>>> +   (aarch64_register_move_cost): Likewise.
>>> +   (aarch64_load_symref_appropriately): Invoke the new added pattern if
>>> +   possible.
>>> +   * config/aarch64/constraints.md (Uc0): New constraint.
>>
>> That breaks go, all tests are crashing now.
>
> Andreas,
>
>   Thanks for the information.
>
>   * I found I committed the wrong patch!
> there are two patches in my local directory, one is
> "tlsdesc_hoist.patch" the other is "tlsdesc-hoist.patch", the one
> approved and up-to-date is tlsdesc-hoist.patch while I committed
> tlsdesc_hoist.patch.
> 
> Reverted the wrong commit and committed the correct/approved
> version.
>
>   * Even after the correct patch applied, I still found go check failed
> on my local native check.
>
> Trying to understand why; if I can't figure it out today I will
> revert the patch.

And I just finished two rounds of native aarch64 build/check w/wo my patch.

I got the same go.sum.

And my patch only touches one TLS descriptor pattern, which will only be
used if there is a TLS variable.  So I suspect the go test regressions are
not caused by my patch.

Andreas, can you please double-check this?

Thanks

-- 
Regards,
Jiong



Re: [COMMITTED][AArch64] Improve TLS Descriptor pattern to release RTL loop IV opt

2015-08-10 Thread Andreas Schwab
Jiong Wang  writes:

> And I just finished two rounds of native aarch64 build/check w/wo my patch.

Did you rebuild everything?

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


Re: [COMMITTED][AArch64] Improve TLS Descriptor pattern to release RTL loop IV opt

2015-08-10 Thread Jiong Wang

Andreas Schwab writes:

> Jiong Wang  writes:
>
>> And I just finished two rounds of native aarch64 build/check w/wo my patch.
>
> Did you rebuild everything?

No.

Just applied the patch, then ran "make all" and re-checked.

>
> Andreas.

-- 
Regards,
Jiong



Re: [COMMITTED][AArch64] Improve TLS Descriptor pattern to release RTL loop IV opt

2015-08-10 Thread Andreas Schwab
Jiong Wang  writes:

> Andreas Schwab writes:
>
>> Jiong Wang  writes:
>>
>>> And I just finished two rounds of native aarch64 build/check w/wo my patch.
>>
>> Did you rebuild everything?
>
> No.

Please do.

Andreas.



Re: [PATCH 2/2] replace several uses of the anon namespace with GCC_FINAL

2015-08-10 Thread Markus Trippelsdorf
On 2015.08.10 at 08:05 -0400, tbsaunde+...@tbsaunde.org wrote:
> 
> In many places gcc puts classes in the anon namespace so the compiler can tell
> they do not get inherited from to enable better devirtualization.  However
> debugging code in the anon namespace can be a pain, and the same thing can be
> accomplished more directly by marking the classes as final.  When bootstrapping
> stage 3 should always be built in C++14 mode now, and of course will always be
> newer than gcc 4.7, so these classes will always be marked as final there.
> AIUI cross compilers are supposed to be built with recent gcc, which I would
> tend to think implies newer than 4.7, so they should also be built with these
> classes marked as final.  I believe that means in all important cases this 
> works just as well as the anon namespace.
> 
> bootstrapped + regtested on x86_64-linux-gnu, ok?

Are you sure that you don't unintentionally introduce new ODR
violations? 
An LTO bootstrap, where you look for new -Wodr warnings, should give the
answer.

-- 
Markus


Re: PR c/c++/diagnostics/66098 Take -Werror into account when deciding what was the command-line status

2015-08-10 Thread Manuel López-Ibáñez
PING^2: https://gcc.gnu.org/ml/gcc-patches/2015-07/msg02581.html

On 3 August 2015 at 20:47, Manuel López-Ibáñez  wrote:
> PING: https://gcc.gnu.org/ml/gcc-patches/2015-07/msg02581.html
>
> Thanks,
>
> Manuel.
>
> On 30 July 2015 at 17:35, Manuel López-Ibáñez  wrote:
>> When I fixed PR59304, I forgot that a command-line warning can also be
>> an error if -Werror was enabled. This introduced a regression since
>> anything enabled on the command line together with -Werror would get
>> initially classified as a warning when reaching the first #pragma GCC
>> diagnostic, and that would be the setting restored by a #pragma pop.
>>
>> Options that appear as arguments of -W[no-]error= are not affected by
>> this since those are initially classified as errors/warnings even
>> before reaching the first #pragma, thus the pop sets them correctly
>> (before and after this patch). Nonetheless, the tests also check that
>> they work correctly.
>>
>> Boot&regtested on x86_64-linux-gnu.
>>
>> OK?
>>
>>
>> gcc/ChangeLog:
>>
>> 2015-07-29  Manuel López-Ibáñez  
>>
>> PR c/66098
>> PR c/66711
>> * diagnostic.c (diagnostic_classify_diagnostic): Take -Werror into
>> account when deciding what was the command-line status.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2015-07-29  Manuel López-Ibáñez  
>>
>> PR c/66098
>> PR c/66711
>> * gcc.dg/pragma-diag-3.c: New test.
>> * gcc.dg/pragma-diag-4.c: New test.


Re: [PATCH 2/2] replace several uses of the anon namespace with GCC_FINAL

2015-08-10 Thread Trevor Saunders
On Mon, Aug 10, 2015 at 03:56:02PM +0200, Markus Trippelsdorf wrote:
> On 2015.08.10 at 08:05 -0400, tbsaunde+...@tbsaunde.org wrote:
> > 
> > In many places gcc puts classes in the anon namespace so the compiler can tell
> > they do not get inherited from to enable better devirtualization.  However
> > debugging code in the anon namespace can be a pain, and the same thing can be
> > accomplished more directly by marking the classes as final.  When bootstrapping,
> > stage 3 should always be built in C++14 mode now, and of course will always be
> > newer than gcc 4.7, so these classes will always be marked as final there.
> > AIUI cross compilers are supposed to be built with recent gcc, which I would
> > tend to think implies newer than 4.7, so they should also be built with these
> > classes marked as final.  I believe that means in all important cases this
> > works just as well as the anon namespace.
> > 
> > bootstrapped + regtested on x86_64-linux-gnu, ok?
> 
> Are you sure that you don't unintentionally introduce new ODR
> violations? 

yeah, I just looked at a list of all the class names, and they are all
different.  So since all the new symbols involve a class name they are
all unique.

Trev

> An LTO bootstrap, where you look for new -Wodr warnings, should give the
> answer.
> 
> -- 
> Markus
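A minimal sketch of why "final" can stand in for the anonymous namespace as far as devirtualization is concerned. EXAMPLE_FINAL is a hypothetical stand-in for GCC_FINAL, and its preprocessor condition is an assumption (the real macro's definition is not shown in this thread); the class names are invented.

```cpp
#include <cassert>

// Hypothetical GCC_FINAL-like macro: expands to the C++11 "final"
// specifier when the compiler is assumed to support it, else to nothing.
#if __cplusplus >= 201103L
# define EXAMPLE_FINAL final
#else
# define EXAMPLE_FINAL
#endif

struct opt_pass_base
{
  virtual ~opt_pass_base () {}
  virtual int execute () = 0;
};

// Marked final: no class can derive from it, so a call through a
// reference of this static type has exactly one possible target, as it
// would for a class in an anonymous namespace, but the class keeps an
// ordinary name that is easy to refer to in a debugger.
class pass_example EXAMPLE_FINAL : public opt_pass_base
{
public:
  virtual int execute () { return 42; }
};

int
run (pass_example &p)
{
  return p.execute ();  // candidate for a direct, non-virtual call
}
```

Whether the call in run is actually devirtualized depends on the compiler and optimization level; "final" only guarantees that it is safe to do so.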


Re: [PATCH 1/2][ARM] Record FPU features as a bit-set

2015-08-10 Thread Ramana Radhakrishnan
On Mon, Aug 10, 2015 at 11:28:06AM +0100, Matthew Wahab wrote:
> Ping. Updated patch attached.
> 
> Also, retested the series for arm-none-linux-gnueabihf with native
> bootstrap and make check.
> 
> On 22/06/15 16:16, Matthew Wahab wrote:
> >Hello,
> >
> >The ARM backend records FPU features as booleans, one for each feature. This
> >means that adding support for a new feature involves updating every entry in the
> >list of FPU descriptions in arm-fpus.def. This patch series changes the
> >representation of FPU features to use a simple bit-set and flags, as is done
> >elsewhere.
> >
> >This patch adds the new FPU feature representation, with feature sets
> >represented as unsigned longs.
> >
> >Tested the series for arm-none-linux-gnueabihf with check-gcc
> >
> >Ok for trunk?
> >Matthew

This is OK, thanks

Ramana
> >
> >gcc/
> >2015-06-22  Matthew Wahab  
> >
> > * config/arm/arm.h (arm_fpu_fset): New.
> > (ARM_FPU_FSET_HAS): New.
> > (FPU_FL_NONE): New.
> > (FPU_FL_NEON): New.
> > (FPU_FL_FP16): New.
> > (FPU_FL_CRYPTO): New.
> >
> 

> From 571416d9e7bc9cb6c16008486faf357873270991 Mon Sep 17 00:00:00 2001
> From: Matthew Wahab 
> Date: Thu, 23 Jul 2015 12:44:51 +0100
> Subject: [PATCH 1/2] Add fpu feature set definitions.
> 
> Change-Id: I9f0fcc9627e3c435cbbc9056b9244781b438447e
> ---
>  gcc/config/arm/arm.h | 13 +
>  1 file changed, 13 insertions(+)
> 
> diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
> index bb64be0..f49eb48 100644
> --- a/gcc/config/arm/arm.h
> +++ b/gcc/config/arm/arm.h
> @@ -318,6 +318,19 @@ extern void (*arm_lang_output_object_attributes_hook)(void);
>{"mode", "%{!marm:%{!mthumb:-m%(VALUE)}}"}, \
>{"tls", "%{!mtls-dialect=*:-mtls-dialect=%(VALUE)}"},
>  
> +/* FPU feature sets.  */
> +
> +typedef unsigned long arm_fpu_feature_set;
> +
> +/* Test for an FPU feature.  */
> +#define ARM_FPU_FSET_HAS(S,F) (((S) & (F)) == (F))
> +
> +/* FPU Features.  */
> +#define FPU_FL_NONE   (0)
> +#define FPU_FL_NEON   (1 << 0)  /* NEON instructions.  */
> +#define FPU_FL_FP16   (1 << 1)  /* Half-precision.  */
> +#define FPU_FL_CRYPTO (1 << 2)  /* Crypto extensions.  */
> +
>  /* Which floating point model to use.  */
>  enum arm_fp_model
>  {
> -- 
> 1.9.1
> 
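A short usage sketch for the new representation. The typedef and macros are copied from the patch (with the whitespace that the archive dropped restored); fpu_has_neon_fp16 is a hypothetical helper. Note that ARM_FPU_FSET_HAS requires every bit of F to be present, so passing several flags tests for all of them, and testing for FPU_FL_NONE is always true.

```cpp
#include <cassert>

// From the patch (arm.h).
typedef unsigned long arm_fpu_feature_set;

#define ARM_FPU_FSET_HAS(S,F) (((S) & (F)) == (F))

#define FPU_FL_NONE   (0)
#define FPU_FL_NEON   (1 << 0)  /* NEON instructions.  */
#define FPU_FL_FP16   (1 << 1)  /* Half-precision.  */
#define FPU_FL_CRYPTO (1 << 2)  /* Crypto extensions.  */

// Hypothetical helper: true only if both NEON and FP16 are present.
bool
fpu_has_neon_fp16 (arm_fpu_feature_set s)
{
  return ARM_FPU_FSET_HAS (s, FPU_FL_NEON | FPU_FL_FP16);
}
```

Adding a new capability then means defining one more FPU_FL_* bit and OR-ing it into the entries that have it, rather than adding a boolean column to every entry.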



Re: [PATCH 2/2][ARM] Use new FPU features representation

2015-08-10 Thread Ramana Radhakrishnan
On Mon, Jun 22, 2015 at 04:18:04PM +0100, Matthew Wahab wrote:
> Hello,
> 
> This patch series changes the representation of FPU features to use a simple
> bit-set and flags, as is done elsewhere.
> 
> This patch uses the new representation of FPU feature sets.
> 
> Tested the series for arm-none-linux-gnueabihf with check-gcc
> 
> Ok for trunk?
> Matthew


This is OK

thanks and sorry about the delay.

ramana

> 
> gcc/
> 2015-06-22  Matthew Wahab  
> 
>   * config/arm/arm-fpus.def: Replace neon, fp16 and crypto boolean
>   fields with feature flags.  Update comment.
>   * config/arm/arm.c (ARM_FPU): Update macro.
>   * config/arm/arm.h (TARGET_NEON_FP16): Update feature test.
>   (TARGET_FP16): Likewise.
>   (TARGET_CRYPTO): Likewise.
>   (TARGET_NEON): Likewise.
>   (struct arm_fpu_desc): Remove fields neon, fp16 and crypto.  Add
>   field features.
> 

> From 6f9cd1b41d7597d95bd80aa21344f8e6e011e168 Mon Sep 17 00:00:00 2001
> From: Matthew Wahab 
> Date: Wed, 10 Jun 2015 10:11:56 +0100
> Subject: [PATCH 2/2] Use new FPU feature definitions.
> 
> Change-Id: I0c45e52b08b31433ec2b30fcb666584cabcb826b
> ---
>  gcc/config/arm/arm-fpus.def | 40 
>  gcc/config/arm/arm.c|  4 ++--
>  gcc/config/arm/arm.h| 22 +-
>  3 files changed, 35 insertions(+), 31 deletions(-)
> 
> diff --git a/gcc/config/arm/arm-fpus.def b/gcc/config/arm/arm-fpus.def
> index 2dfefd6..efd5896 100644
> --- a/gcc/config/arm/arm-fpus.def
> +++ b/gcc/config/arm/arm-fpus.def
> @@ -19,30 +19,30 @@
>  
>  /* Before using #include to read this file, define a macro:
>  
> -  ARM_FPU(NAME, MODEL, REV, VFP_REGS, NEON, FP16, CRYPTO)
> +  ARM_FPU(NAME, MODEL, REV, VFP_REGS, FEATURES)
>  
> The arguments are the fields of struct arm_fpu_desc.
>  
> genopt.sh assumes no whitespace up to the first "," in each entry.  */
>  
> -ARM_FPU("vfp",   ARM_FP_MODEL_VFP, 2, VFP_REG_D16, false, false, false)
> -ARM_FPU("vfpv3", ARM_FP_MODEL_VFP, 3, VFP_REG_D32, false, false, false)
> -ARM_FPU("vfpv3-fp16",ARM_FP_MODEL_VFP, 3, VFP_REG_D32, false, true, false)
> -ARM_FPU("vfpv3-d16", ARM_FP_MODEL_VFP, 3, VFP_REG_D16, false, false, false)
> -ARM_FPU("vfpv3-d16-fp16",ARM_FP_MODEL_VFP, 3, VFP_REG_D16, false, true, false)
> -ARM_FPU("vfpv3xd",   ARM_FP_MODEL_VFP, 3, VFP_REG_SINGLE, false, false, false)
> -ARM_FPU("vfpv3xd-fp16",  ARM_FP_MODEL_VFP, 3, VFP_REG_SINGLE, false, true, false)
> -ARM_FPU("neon",  ARM_FP_MODEL_VFP, 3, VFP_REG_D32, true , false, false)
> -ARM_FPU("neon-fp16", ARM_FP_MODEL_VFP, 3, VFP_REG_D32, true, true, false)
> -ARM_FPU("vfpv4", ARM_FP_MODEL_VFP, 4, VFP_REG_D32, false, true, false)
> -ARM_FPU("vfpv4-d16", ARM_FP_MODEL_VFP, 4, VFP_REG_D16, false, true, false)
> -ARM_FPU("fpv4-sp-d16",   ARM_FP_MODEL_VFP, 4, VFP_REG_SINGLE, false, true, false)
> -ARM_FPU("fpv5-sp-d16",   ARM_FP_MODEL_VFP, 5, VFP_REG_SINGLE, false, true, false)
> -ARM_FPU("fpv5-d16",  ARM_FP_MODEL_VFP, 5, VFP_REG_D16, false, true, false)
> -ARM_FPU("neon-vfpv4",ARM_FP_MODEL_VFP, 4, VFP_REG_D32, true, true, false)
> -ARM_FPU("fp-armv8",  ARM_FP_MODEL_VFP, 8, VFP_REG_D32, false, true, false)
> -ARM_FPU("neon-fp-armv8",ARM_FP_MODEL_VFP, 8, VFP_REG_D32, true, true, false)
> +ARM_FPU("vfp",   ARM_FP_MODEL_VFP, 2, VFP_REG_D16, FPU_FL_NONE)
> +ARM_FPU("vfpv3", ARM_FP_MODEL_VFP, 3, VFP_REG_D32, FPU_FL_NONE)
> +ARM_FPU("vfpv3-fp16",ARM_FP_MODEL_VFP, 3, VFP_REG_D32, FPU_FL_FP16)
> +ARM_FPU("vfpv3-d16", ARM_FP_MODEL_VFP, 3, VFP_REG_D16, FPU_FL_NONE)
> +ARM_FPU("vfpv3-d16-fp16",ARM_FP_MODEL_VFP, 3, VFP_REG_D16, FPU_FL_FP16)
> +ARM_FPU("vfpv3xd",   ARM_FP_MODEL_VFP, 3, VFP_REG_SINGLE, FPU_FL_NONE)
> +ARM_FPU("vfpv3xd-fp16",  ARM_FP_MODEL_VFP, 3, VFP_REG_SINGLE, FPU_FL_FP16)
> +ARM_FPU("neon",  ARM_FP_MODEL_VFP, 3, VFP_REG_D32, FPU_FL_NEON)
> +ARM_FPU("neon-fp16", ARM_FP_MODEL_VFP, 3, VFP_REG_D32, FPU_FL_NEON | FPU_FL_FP16)
> +ARM_FPU("vfpv4", ARM_FP_MODEL_VFP, 4, VFP_REG_D32, FPU_FL_FP16)
> +ARM_FPU("vfpv4-d16", ARM_FP_MODEL_VFP, 4, VFP_REG_D16, FPU_FL_FP16)
> +ARM_FPU("fpv4-sp-d16",   ARM_FP_MODEL_VFP, 4, VFP_REG_SINGLE, FPU_FL_FP16)
> +ARM_FPU("fpv5-sp-d16",   ARM_FP_MODEL_VFP, 5, VFP_REG_SINGLE, FPU_FL_FP16)
> +ARM_FPU("fpv5-d16",  ARM_FP_MODEL_VFP, 5, VFP_REG_D16, FPU_FL_FP16)
> +ARM_FPU("neon-vfpv4",ARM_FP_MODEL_VFP, 4, VFP_REG_D32, FPU_FL_NEON | FPU_FL_FP16)
> +ARM_FPU("fp-armv8",  ARM_FP_MODEL_VFP, 8, VFP_REG_D32, FPU_FL_FP16)
> +ARM_FPU("neon-fp-armv8",ARM_FP_MODEL_VFP, 8, VFP_REG_D32, FPU_FL_NEON | FPU_FL_FP16)
>  ARM_FPU("crypto-neon-fp-armv8",
> - ARM_FP_MODEL_VFP, 8, VFP_REG_D32, true, true, true)
> + ARM_FP_MODEL_VFP, 8, VFP_REG_D32, FPU_FL_NEON | FPU_FL_FP16 | FPU_FL_CRYPTO)
>  /* Compatibility aliases.  */
> -ARM_FPU("vfp3",  ARM_FP
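The FEATURES column is consumed through the usual X-macro pattern: the includer defines ARM_FPU, includes the .def file, and gets one initializer per entry. The sketch below inlines three entries instead of including arm-fpus.def; fpu_desc_example and fpu_has_feature are hypothetical simplifications of struct arm_fpu_desc and the TARGET_NEON-style tests, not the code from the patch.

```cpp
#include <cassert>
#include <cstring>

#define FPU_FL_NONE   (0)
#define FPU_FL_NEON   (1 << 0)
#define FPU_FL_FP16   (1 << 1)
#define FPU_FL_CRYPTO (1 << 2)

// Simplified descriptor: one feature word replaces the old
// neon/fp16/crypto booleans.
struct fpu_desc_example
{
  const char *name;
  int rev;
  unsigned long features;
};

// Stand-in for the contents of a .def file.
#define EXAMPLE_FPUS                                    \
  ARM_FPU ("vfpv4",      4, FPU_FL_FP16)                \
  ARM_FPU ("neon-vfpv4", 4, FPU_FL_NEON | FPU_FL_FP16)  \
  ARM_FPU ("crypto-neon-fp-armv8", 8,                   \
           FPU_FL_NEON | FPU_FL_FP16 | FPU_FL_CRYPTO)

// Expand each entry into an initializer.
#define ARM_FPU(NAME, REV, FEATURES) { NAME, REV, FEATURES },
static const fpu_desc_example all_fpus_example[] = { EXAMPLE_FPUS };
#undef ARM_FPU

// Analogue of a TARGET_NEON-style test on a named FPU.
static bool
fpu_has_feature (const char *name, unsigned long feature)
{
  for (const fpu_desc_example &d : all_fpus_example)
    if (std::strcmp (d.name, name) == 0)
      return (d.features & feature) == feature;
  return false;  // unknown FPU name
}
```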

Re: [PATCH 2/5][ARM] Add feature set definitions.

2015-08-10 Thread Ramana Radhakrishnan
On Mon, Aug 10, 2015 at 12:56:59PM +0100, Matthew Wahab wrote:
> The ARM backend uses an unsigned long to record CPU feature flags and
> there are currently 31 bits in use. This series of patches replaces the
> single unsigned long with a representation based on an array of values.
> 
> This patch adds, but doesn't use, type arm_feature_set and macros
> prefixed with ARM_FSET to represent and operate on feature sets.
> 
> Tested the series for arm-none-linux-gnueabihf with native bootstrap and
> make check.
> 
> gcc/
> 2015-08-10  Matthew Wahab  
> 
>   * config/arm/arm-protos.h (FL_NONE): New.
>   (FL_ANY): New.
>   (arm_feature_set): New.
>   (ARM_FSET_MAKE): New.
>   (ARM_FSET_MAKE_CPU1): New.
>   (ARM_FSET_MAKE_CPU2): New.
>   (ARM_FSET_CPU1): New.
>   (ARM_FSET_CPU2): New.
>   (ARM_FSET_EMPTY): New.
>   (ARM_FSET_ANY): New.
>   (ARM_FSET_HAS_CPU1): New.
>   (ARM_FSET_HAS_CPU2): New.
>   (ARM_FSET_HAS_CPU): New.
>   (ARM_FSET_ADD_CPU1): New.
>   (ARM_FSET_ADD_CPU2): New.
>   (ARM_FSET_DEL_CPU1): New.
>   (ARM_FSET_DEL_CPU2): New.
>   (ARM_FSET_UNION): New.
>   (ARM_FSET_INTER): New.
>   (ARM_FSET_XOR): New.
>   (ARM_FSET_EXCLUDE): New.
>   (AFM_FSET_IS_EMPTY): New.
>   (ARM_FSET_CPU_SUBSET): New.
> 

> From fd51de4ebdbeff478716cf0a4329fd38cd861403 Mon Sep 17 00:00:00 2001
> From: Matthew Wahab 
> Date: Thu, 4 Jun 2015 15:35:25 +0100
> Subject: [PATCH 2/5] Add feature set definitions.
> 
> Change-Id: I5f89b46ea57e35f477ec4751fea3cb6ee8fce251
> ---
>  gcc/config/arm/arm-protos.h | 105 
> 
>  1 file changed, 105 insertions(+)
> 
> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
> index cef9eec..610c73e 100644
> --- a/gcc/config/arm/arm-protos.h
> +++ b/gcc/config/arm/arm-protos.h
> @@ -346,6 +346,8 @@ extern bool arm_is_constant_pool_ref (rtx);
>  /* Flags used to identify the presence of processor capabilities.  */
>  
>  /* Bit values used to identify processor capabilities.  */
> +#define FL_NONE (0) /* No flags.  */
> +#define FL_ANY (0x)/* All flags.  */
>  #define FL_CO_PROC (1 << 0)/* Has external co-processor bus */
>  #define FL_ARCH3M (1 << 1)/* Extended multiply */
>  #define FL_MODE26 (1 << 2)/* 26-bit mode support */
> @@ -413,6 +415,109 @@ extern bool arm_is_constant_pool_ref (rtx);
>  #define FL_FOR_ARCH7EM  (FL_FOR_ARCH7M | FL_ARCH7EM)
>  #define FL_FOR_ARCH8A (FL_FOR_ARCH7VE | FL_ARCH8)
>  
> +/* There are too many feature bits to fit in a single word so the set of cpu and
> +   fpu capabilities is a structure.  A feature set is created and manipulated
> +   with the ARM_FSET macros.  */
> +
> +typedef struct
> +{
> +  unsigned long cpu[2];
> +} arm_feature_set;
> +
> +
> +/* Initialize a feature set.  */
> +
> +#define ARM_FSET_MAKE(CPU1,CPU2) { { (CPU1), (CPU2) } }
> +
> +#define ARM_FSET_MAKE_CPU1(CPU1) ARM_FSET_MAKE ((CPU1), (FL_NONE))
> +#define ARM_FSET_MAKE_CPU2(CPU2) ARM_FSET_MAKE ((FL_NONE), (CPU2))
> +
> +/* Accessors.  */
> +
> +#define ARM_FSET_CPU1(S) ((S).cpu[0])
> +#define ARM_FSET_CPU2(S) ((S).cpu[1])
> +
> +/* Useful combinations.  */
> +
> +#define ARM_FSET_EMPTY ARM_FSET_MAKE (FL_NONE, FL_NONE)
> +#define ARM_FSET_ANY ARM_FSET_MAKE (FL_ANY, FL_ANY)
> +
> +/* Tests for a specific CPU feature.  */
> +
> +#define ARM_FSET_HAS_CPU1(A, F)  \
> +  (((A).cpu[0] & ((unsigned long)(F))) == ((unsigned long)(F)))
> +#define ARM_FSET_HAS_CPU2(A, F)  \
> +  (((A).cpu[1] & ((unsigned long)(F))) == ((unsigned long)(F)))
> +#define ARM_FSET_HAS_CPU(A, F1, F2)  \
> +  (ARM_FSET_HAS_CPU1 ((A), (F1)) && ARM_FSET_HAS_CPU2 ((A), (F2)))
> +
> +/* Add a feature to a feature set.  */
> +
> +#define ARM_FSET_ADD_CPU1(DST, F)\
> +  do {   \
> +(DST).cpu[0] |= (F); \
> +  } while (0)
> +
> +#define ARM_FSET_ADD_CPU2(DST, F)\
> +  do {   \
> +(DST).cpu[1] |= (F); \
> +  } while (0)
> +
> +/* Remove a feature from a feature set.  */
> +
> +#define ARM_FSET_DEL_CPU1(DST, F)\
> +  do {   \
> +(DST).cpu[0] &= ~(F);\
> +  } while (0)
> +
> +#define ARM_FSET_DEL_CPU2(DST, F)\
> +  do {   \
> +(DST).cpu[1] &= ~(F);\
> +  } while (0)
> +
> +/* Union of feature sets.  */
> +
> +#define ARM_FSET_UNION(DST,F1,F2)\
> +  do {   \
> +(DST).cpu[0] = (F1).cpu[0] | (F2).cpu[0];\
> +(DST).cpu[1] = (F1).cpu[1] | (F2).cpu[1];\
> +  } while (0)
> +
> +/* Intersection of feature sets.  */
> +
> +#define ARM_FSET_INTER(DST,F1,F2)\
> +  do {  
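A usage sketch for the two-word feature set, reproducing only the subset of the patch's definitions that it needs (FL_ANY is left out because its value is cut off in the archived mail). FL_ARCH3M uses its value from arm-protos.h, while FL2_EXAMPLE is a hypothetical flag living in the second word. Because the same bit position means different things in each word, callers have to pick the _CPU1 or _CPU2 variant explicitly.

```cpp
#include <cassert>

#define FL_NONE   (0)
#define FL_ARCH3M (1 << 1)     /* Extended multiply (from arm-protos.h).  */
#define FL2_EXAMPLE (1 << 0)   /* Hypothetical flag in the second word.  */

// From the patch.
typedef struct
{
  unsigned long cpu[2];
} arm_feature_set;

#define ARM_FSET_MAKE(CPU1,CPU2) { { (CPU1), (CPU2) } }
#define ARM_FSET_MAKE_CPU1(CPU1) ARM_FSET_MAKE ((CPU1), (FL_NONE))
#define ARM_FSET_MAKE_CPU2(CPU2) ARM_FSET_MAKE ((FL_NONE), (CPU2))

#define ARM_FSET_HAS_CPU1(A, F) \
  (((A).cpu[0] & ((unsigned long)(F))) == ((unsigned long)(F)))
#define ARM_FSET_HAS_CPU2(A, F) \
  (((A).cpu[1] & ((unsigned long)(F))) == ((unsigned long)(F)))

#define ARM_FSET_ADD_CPU2(DST, F) \
  do {                            \
    (DST).cpu[1] |= (F);          \
  } while (0)

#define ARM_FSET_UNION(DST,F1,F2)               \
  do {                                          \
    (DST).cpu[0] = (F1).cpu[0] | (F2).cpu[0];   \
    (DST).cpu[1] = (F1).cpu[1] | (F2).cpu[1];   \
  } while (0)

// Combine a first-word flag and a second-word flag into one set.
arm_feature_set
example_cpu_features (void)
{
  arm_feature_set base = ARM_FSET_MAKE_CPU1 (FL_ARCH3M);
  arm_feature_set extra = ARM_FSET_MAKE_CPU2 (FL2_EXAMPLE);
  arm_feature_set all;
  ARM_FSET_UNION (all, base, extra);
  return all;
}
```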

Re: [PATCH 1/5][ARM] Make room for more CPU feature flags.

2015-08-10 Thread Ramana Radhakrishnan
On Mon, Aug 10, 2015 at 12:55:45PM +0100, Matthew Wahab wrote:
> The ARM backend uses an unsigned long to record CPU feature flags and
> there are currently 31 bits in use. To be able to support new
> architecture features, the current representation will need to be
> replaced so that more flags can be recorded.
> 
> This series of patches replaces the single unsigned long with a
> representation based on an array of unsigned longs. Constructors and
> operations are explicitly defined for the new representation and the
> backend is updated to use the new operations.
> 
> The individual patches:
> - Make architecture flags explicit in arm-cores.def, to prepare for the
>   changes.
> - Add definitions for the new representation as type arm_feature_set and
>   macros with prefix ARM_FSET.
> - Replace uses of the old representation with the arm_feature_set type
>   and operations in the architecture specifiers.
> - Use the new arm_feature_set type and operations in the descriptions of
>   the builtins.
> - Rework arm-cores.def and arm-arches.def to make the feature set
>   constructions explicit.
> 
> Tested the series for arm-none-linux-gnueabihf with native bootstrap and
> make check.
> 
> This patch moves the derived FL_FOR_ARCH##ARCH flags from the expansion
> of macro arm.c/ARM_CORE and makes them explicit in the entries in
> arm-cores.def.
> 
> 2015-08-10  Matthew Wahab  
> 
>   * gcc/config/arm/arm-cores.def: Add FL_FOR_ARCH flag for each
>   ARM_CORE entry.  Fix some white-space.
>   * gcc/config/arm/arm.c: Remove FL_FOR_ARCH derivation from
>   ARM_CORE definition.


OK

Ramana
> From beb28417822950ca773742977bed28db84679ed5 Mon Sep 17 00:00:00 2001
> From: Matthew Wahab 
> Date: Tue, 28 Jul 2015 09:26:47 +0100
> Subject: [PATCH 1/5] Make ARCH flags explicit in arm-cores.def.
> 
> Change-Id: I29d640e71b59177a984272335412b4e256909a26
> ---
>  gcc/config/arm/arm-cores.def | 200 
> +--
>  gcc/config/arm/arm.c |   2 +-
>  2 files changed, 101 insertions(+), 101 deletions(-)
> 
> diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def
> index 9d47fcf..26a6b4b 100644
> --- a/gcc/config/arm/arm-cores.def
> +++ b/gcc/config/arm/arm-cores.def
> @@ -43,134 +43,134 @@
> Some tools assume no whitespace up to the first "," in each entry.  */
>  
>  /* V2/V2A Architecture Processors */
> -ARM_CORE("arm2", arm2, arm2, 2, FL_CO_PROC | FL_MODE26, slowmul)
> -ARM_CORE("arm250",   arm250, arm250, 2, FL_CO_PROC | FL_MODE26, slowmul)
> -ARM_CORE("arm3", arm3, arm3, 2, FL_CO_PROC | FL_MODE26, slowmul)
> +ARM_CORE("arm2", arm2, arm2, 2, FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH2, slowmul)
> +ARM_CORE("arm250",   arm250, arm250, 2, FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH2, slowmul)
> +ARM_CORE("arm3", arm3, arm3, 2, FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH2, slowmul)
> 
>  /* V3 Architecture Processors */
> -ARM_CORE("arm6", arm6, arm6, 3, FL_CO_PROC | FL_MODE26, slowmul)
> -ARM_CORE("arm60",arm60, arm60,   3, FL_CO_PROC | FL_MODE26, slowmul)
> -ARM_CORE("arm600",   arm600, arm600, 3, FL_CO_PROC | FL_MODE26 | FL_WBUF, slowmul)
> -ARM_CORE("arm610",   arm610, arm610, 3, FL_MODE26 | FL_WBUF, slowmul)
> -ARM_CORE("arm620",   arm620, arm620, 3, FL_CO_PROC | FL_MODE26 | FL_WBUF, slowmul)
> -ARM_CORE("arm7", arm7, arm7, 3, FL_CO_PROC | FL_MODE26, slowmul)
> -ARM_CORE("arm7d",arm7d, arm7d,   3, FL_CO_PROC | FL_MODE26, slowmul)
> -ARM_CORE("arm7di",   arm7di, arm7di, 3, FL_CO_PROC | FL_MODE26, slowmul)
> -ARM_CORE("arm70",arm70, arm70,   3, FL_CO_PROC | FL_MODE26, slowmul)
> -ARM_CORE("arm700",   arm700, arm700, 3, FL_CO_PROC | FL_MODE26 | FL_WBUF, slowmul)
> -ARM_CORE("arm700i",  arm700i, arm700i,   3, FL_CO_PROC | FL_MODE26 | FL_WBUF, slowmul)
> -ARM_CORE("arm710",   arm710, arm710, 3, FL_MODE26 | FL_WBUF, slowmul)
> -ARM_CORE("arm720",   arm720, arm720, 3, FL_MODE26 | FL_WBUF, slowmul)
> -ARM_CORE("arm710c",  arm710c, arm710c,   3, FL_MODE26 | FL_WBUF, slowmul)
> -ARM_CORE("arm7100",  arm7100, arm7100,   3, FL_MODE26 | FL_WBUF, slowmul)
> -ARM_CORE("arm7500",  arm7500, arm7500,   3, FL_MODE26 | FL_WBUF, slowmul)
> +ARM_CORE("arm6", arm6, arm6, 3, FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH3, slowmul)
> +ARM_CORE("arm60",arm60, arm60,   3, FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH3, slowmul)
> +ARM_CORE("arm600",   arm600, arm600, 3, FL_CO_PROC | FL_MODE26 | FL_WBUF | FL_FOR_ARCH3, slowmul)
> +ARM_CORE("arm610",   arm610, arm610, 3, FL_MODE26 | FL_WBUF | FL_FOR_ARCH3, slowmul)
> +ARM_CORE("arm620",   arm620, arm620, 3, FL_CO_PROC | FL_MODE26 | FL_WBUF | FL_FOR_ARCH3, slowmul)
> +ARM_CORE("arm7", arm7, arm7, 3, FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH3, slowmul)
> +ARM_CORE("arm7d",
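A simplified, hypothetical sketch of what moving the derivation means for the X-macro. Before the patch, the expansion in arm.c pastes FL_FOR_ARCH##ARCH onto each entry's flags; afterwards the entry spells out the architecture flags itself and the expansion only copies them. The three-argument ARM_CORE, the struct, and the value of FL_FOR_ARCH2 are invented for the example (the real macro takes more arguments); FL_CO_PROC and FL_MODE26 use their values from arm-protos.h.

```cpp
#include <cassert>

#define FL_CO_PROC   (1 << 0)   /* Has external co-processor bus */
#define FL_MODE26    (1 << 2)   /* 26-bit mode support */
#define FL_FOR_ARCH2 (1 << 20)  /* Hypothetical value, illustration only.  */

struct core_example
{
  const char *name;
  unsigned long flags;
};

// The same core described in the old and the new .def style.
#define CORES_BEFORE ARM_CORE ("arm2", 2, FL_CO_PROC | FL_MODE26)
#define CORES_AFTER  ARM_CORE ("arm2", 2, FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH2)

// Old expansion: derive the architecture flags by token pasting.
#define ARM_CORE(NAME, ARCH, FLAGS) { NAME, (FLAGS) | FL_FOR_ARCH##ARCH },
static const core_example cores_before[] = { CORES_BEFORE };
#undef ARM_CORE

// New expansion: the entry already carries them, so just copy FLAGS
// (ARCH is unused in this cut-down version).
#define ARM_CORE(NAME, ARCH, FLAGS) { NAME, (FLAGS) },
static const core_example cores_after[] = { CORES_AFTER };
#undef ARM_CORE
```

Both tables end up with identical flags; the point of the change, as the mail says, is to prepare for the later patches, where the flags become an explicitly constructed feature set.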

Re: [PR64164] drop copyrename, integrate into expand

2015-08-10 Thread Jeff Law

On 08/10/2015 02:23 AM, James Greenhalgh wrote:

> On Tue, Aug 04, 2015 at 12:45:28AM +0100, Alexandre Oliva wrote:
>> On Jul 30, 2015, "H.J. Lu"  wrote:
>>
>>> aoliva/pr64164  is fine on x32.
>>
>> Thanks.  I have made a large number of changes since you tested it,
>> fixing all the reported issues and then some.  Now, x86_64-linux-gnu
>> (-m64 and -m32), i686-pc-linux-gnu, powerpc64-linux-gnu and
>> powerpc64el-linux-gnu pass regstrap (r226317), and the many tens of
>> targets I cross-tested still get the same 'make all' errors that the
>> pristine tree did.
>
> For what it is worth, I bootstrapped and tested the consolidated patch
> on arm-none-linux-gnueabihf and aarch64-none-linux-gnu with trunk at
> r226516 over the weekend, and didn't see any new issues.

Thanks -- I know it's been a long road on this patch.  I don't think
anyone would have ever guessed fixing 64164 would be so complex.


jeff


Re: Fix offloading machine mode stream reading

2015-08-10 Thread Thomas Schwinge
Hi!

On Sat, 8 Aug 2015 07:25:42 +0200, Richard Biener  
wrote:
> Ok.

Committed in r226758 and r226759:

commit 7231f6b984806cceb30cacf0e79f8f5ae7a68803
Author: tschwinge 
Date:   Mon Aug 10 15:22:24 2015 +

Correctly advance iterator in offloading machine mode stream reading

gcc/
* lto-streamer-in.c (lto_input_mode_table): Correctly advance
iterator.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@226758 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog |6 ++
 gcc/lto-streamer-in.c |2 +-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git gcc/ChangeLog gcc/ChangeLog
index f103d41..c51aaf9 100644
--- gcc/ChangeLog
+++ gcc/ChangeLog
@@ -1,3 +1,9 @@
+2015-08-10  Thomas Schwinge  
+   Ilya Verbin  
+
+   * lto-streamer-in.c (lto_input_mode_table): Correctly advance
+   iterator.
+
 2015-08-09  Manuel López-Ibáñez  
 
* doc/options.texi (EnabledBy): Document that the argument must be
diff --git gcc/lto-streamer-in.c gcc/lto-streamer-in.c
index a56d3f3..299900a 100644
--- gcc/lto-streamer-in.c
+++ gcc/lto-streamer-in.c
@@ -1573,7 +1573,7 @@ lto_input_mode_table (struct lto_file_decl_data *file_data)
for (machine_mode mr = pass ? VOIDmode
: GET_CLASS_NARROWEST_MODE (mclass);
 pass ? mr < MAX_MACHINE_MODE : mr != VOIDmode;
-pass ? mr = (machine_mode) (m + 1)
+pass ? mr = (machine_mode) (mr + 1)
  : mr = GET_MODE_WIDER_MODE (mr))
  if (GET_MODE_CLASS (mr) != mclass
  || GET_MODE_SIZE (mr) != size
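The bug is easy to reproduce in miniature: when an enum-typed loop variable is advanced from the wrong variable, it stops moving. This sketch uses an invented four-value enum in place of machine_mode; buggy_position_after mimics the old "(m + 1)" step and takes a step limit so that the demonstration terminates.

```cpp
#include <cassert>

// Invented stand-in for machine_mode; MAX_EXAMPLE_MODE plays the role
// of MAX_MACHINE_MODE.
enum example_mode
{
  EX_VOIDmode, EX_QImode, EX_HImode, EX_SImode, MAX_EXAMPLE_MODE
};

// Fixed form: the loop variable advances from itself, as in r226758.
int
count_modes_fixed ()
{
  int visited = 0;
  for (example_mode mr = EX_VOIDmode; mr < MAX_EXAMPLE_MODE;
       mr = (example_mode) (mr + 1))
    visited++;
  return visited;
}

// Old form: advancing from the outer mode M pins MR at M + 1, so the
// loop keeps re-examining the same mode.  STEPS bounds this demo.
example_mode
buggy_position_after (example_mode m, int steps)
{
  example_mode mr = EX_VOIDmode;
  for (int i = 0; i < steps && mr < MAX_EXAMPLE_MODE; i++)
    mr = (example_mode) (m + 1);  // bug: should be mr + 1
  return mr;
}
```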

commit b308f4a0d03e67bdaf3f43416cfbd360db957a29
Author: tschwinge 
Date:   Mon Aug 10 15:22:30 2015 +

Fix offloading machine mode stream reading

... in context of the GET_MODE_INNER changes applied in r226328.

gcc/
* lto-streamer-in.c (lto_input_mode_table): Adjust to
GET_MODE_INNER changes.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/vector-type-1.c: New file.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@226759 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog  |5 
 gcc/lto-streamer-in.c  |8 ---
 gcc/lto-streamer-out.c |4 ++--
 libgomp/ChangeLog  |4 
 .../libgomp.oacc-c-c++-common/vector-type-1.c  |   24 
 5 files changed, 40 insertions(+), 5 deletions(-)

diff --git gcc/ChangeLog gcc/ChangeLog
index c51aaf9..f547931 100644
--- gcc/ChangeLog
+++ gcc/ChangeLog
@@ -1,4 +1,9 @@
 2015-08-10  Thomas Schwinge  
+
+   * lto-streamer-in.c (lto_input_mode_table): Adjust to
+   GET_MODE_INNER changes.
+
+2015-08-10  Thomas Schwinge  
Ilya Verbin  
 
* lto-streamer-in.c (lto_input_mode_table): Correctly advance
diff --git gcc/lto-streamer-in.c gcc/lto-streamer-in.c
index 299900a..2eb8051 100644
--- gcc/lto-streamer-in.c
+++ gcc/lto-streamer-in.c
@@ -1544,7 +1544,7 @@ lto_input_mode_table (struct lto_file_decl_data *file_data)
= bp_unpack_enum (&bp, mode_class, MAX_MODE_CLASS);
   unsigned int size = bp_unpack_value (&bp, 8);
   unsigned int prec = bp_unpack_value (&bp, 16);
-  machine_mode inner = (machine_mode) table[bp_unpack_value (&bp, 8)];
+  machine_mode inner = (machine_mode) bp_unpack_value (&bp, 8);
   unsigned int nunits = bp_unpack_value (&bp, 8);
   unsigned int ibit = 0, fbit = 0;
   unsigned int real_fmt_len = 0;
@@ -1578,7 +1578,9 @@ lto_input_mode_table (struct lto_file_decl_data *file_data)
  if (GET_MODE_CLASS (mr) != mclass
  || GET_MODE_SIZE (mr) != size
  || GET_MODE_PRECISION (mr) != prec
- || GET_MODE_INNER (mr) != inner
+ || (inner == m
+ ? GET_MODE_INNER (mr) != mr
+ : GET_MODE_INNER (mr) != table[(int) inner])
  || GET_MODE_IBIT (mr) != ibit
  || GET_MODE_FBIT (mr) != fbit
  || GET_MODE_NUNITS (mr) != nunits)
@@ -1606,7 +1608,7 @@ lto_input_mode_table (struct lto_file_decl_data *file_data)
case MODE_VECTOR_UACCUM:
  /* For unsupported vector modes just use BLKmode,
 if the scalar mode is supported.  */
- if (inner != VOIDmode)
+ if (table[(int) inner] != VOIDmode)
{
  table[m] = BLKmode;
  break;
diff --git gcc/lto-streamer-out.c gcc/lto-streamer-out.c
index 1b88115..3ca8855 100644
--- gcc/lto-streamer-out.c
+++ gcc/lto-streamer-out.c
@@ -2676,7 +2676,7 @@ lto_write_mode_table (void)
   ob = create_output_block (LTO_section_mode_table);
   bitpack_d bp = bitpack_create (ob->main_stream);
 
-  /* Ensure that for GET_MODE_INNER (m) != VOIDmode we have
+  /* Ensure that for GET_MODE_INNER (m) != m we have
  also the inner mode mark
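A hypothetical model of the new comparison, with plain ints in place of machine_mode. It assumes, as the patch suggests, that after r226328 a scalar mode's inner mode is the mode itself (hence "GET_MODE_INNER (m) != m" in the writer's comment), and that table maps streamed host mode numbers to offload-target modes. A self-referencing inner mode cannot be looked up in table, because the entry for the mode being read is not filled in yet, which is presumably why it is handled separately.

```cpp
#include <cassert>

// HOST_M: host mode number being read; STREAMED_INNER: its inner mode as
// streamed (a host mode number); CANDIDATE / CANDIDATE_INNER: an offload
// target mode and its inner mode; TABLE: host number -> target mode.
bool
inner_mode_matches (int host_m, int streamed_inner,
                    int candidate, int candidate_inner,
                    const int *table)
{
  if (streamed_inner == host_m)
    // Scalar mode: its inner mode is itself, so the candidate must be
    // its own inner mode too.
    return candidate_inner == candidate;
  // Composite mode (e.g. a vector): translate the host inner mode
  // through the already-built table before comparing.
  return candidate_inner == table[streamed_inner];
}
```

For example, with host modes 0 (void), 1 (a scalar) and 2 (a vector of mode 1), and the scalar already mapped to target mode 5, a target vector mode is accepted only if its inner mode is 5.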

Empty libgomp for nvptx (was: [WIP] OpenMP 4 NVPTX support)

2015-08-10 Thread Thomas Schwinge
Hi!

On Wed, 22 Jul 2015 18:38:44 +0200, Jakub Jelinek  wrote:
> On Wed, Jul 22, 2015 at 06:04:20PM +0200, Thomas Schwinge wrote:
> > On Tue, 21 Apr 2015 17:58:39 +0200, Jakub Jelinek  wrote:
> > > Attached is a minimal patch to get at least a trivial OpenMP 4.0 testcase
> > > offloading to NVPTX (the first patch).  The second patch is WIP, just first
> > > few needed changes to make libgomp to build for NVPTX (several weeks of work
> > > at least).
> > 
> > We're not in particular working on making nvptx offloading work for
> > OpenMP, but also for OpenACC offloading a tiny bit of code is required to
> > be shipped in an offloading device's runtime library -- code that
> > conceptually belongs into libgomp.  (On gomp-4_0-branch, it currently
> > lives in libgcc because that was easier to do.)  Actually, as I should
> > find out, building a "dummy" (empty) libgomp for nvptx is not actually
> > difficult.  Additionally to your second patch (U2; quoted at the end of
> > this email), we'll need the following:
> 
> The U2 version was a very early one, I've posted a newer version later,
> but supposedly we can go with my U2 (if you've tested it together with your
> patch, please check it in yourself) and your patch, and then
> incrementally start removing the zero sized stubs or replacing them with
> something real.

Yes, that's precisely the idea.  Committed in r226760:

commit fdcd05c84f79cec55fa61249febd4c1c21b772a7
Author: tschwinge 
Date:   Mon Aug 10 15:53:33 2015 +

Empty libgomp for nvptx

* configure.ac (noconfigdirs): Don't add "target-libgomp" for target
nvptx*-*-*.
* configure: Regenerate.
libgomp/
* config/nvptx/affinity.c: New file.
* config/nvptx/alloc.c: Likewise.
* config/nvptx/bar.c: Likewise.
* config/nvptx/barrier.c: Likewise.
* config/nvptx/critical.c: Likewise.
* config/nvptx/env.c: Likewise.
* config/nvptx/error.c: Likewise.
* config/nvptx/fortran.c: Likewise.
* config/nvptx/iter.c: Likewise.
* config/nvptx/iter_ull.c: Likewise.
* config/nvptx/libgomp-plugin.c: Likewise.
* config/nvptx/lock.c: Likewise.
* config/nvptx/loop.c: Likewise.
* config/nvptx/loop_ull.c: Likewise.
* config/nvptx/mutex.c: Likewise.
* config/nvptx/oacc-async.c: Likewise.
* config/nvptx/oacc-cuda.c: Likewise.
* config/nvptx/oacc-host.c: Likewise.
* config/nvptx/oacc-init.c: Likewise.
* config/nvptx/oacc-mem.c: Likewise.
* config/nvptx/oacc-parallel.c: Likewise.
* config/nvptx/oacc-plugin.c: Likewise.
* config/nvptx/omp-lock.h: Likewise.
* config/nvptx/ordered.c: Likewise.
* config/nvptx/parallel.c: Likewise.
* config/nvptx/proc.c: Likewise.
* config/nvptx/ptrlock.c: Likewise.
* config/nvptx/sections.c: Likewise.
* config/nvptx/sem.c: Likewise.
* config/nvptx/single.c: Likewise.
* config/nvptx/splay-tree.c: Likewise.
* config/nvptx/target.c: Likewise.
* config/nvptx/task.c: Likewise.
* config/nvptx/team.c: Likewise.
* config/nvptx/time.c: Likewise.
* config/nvptx/work.c: Likewise.
* configure.ac: Don't probe pthreads support for host nvptx*-*-*.
* configure: Regenerate.
* configure.tgt (config_path): Set to "nvptx" for target
nvptx*-*-*.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@226760 138bc75d-0d04-0410-961f-82ee72b054a4
---
 ChangeLog   |7 +++
 configure   |6 +++---
 configure.ac|6 +++---
 libgomp/ChangeLog   |   44 +++
 libgomp/config/nvptx/omp-lock.h |   12 +++
 libgomp/configure   |3 +++
 libgomp/configure.ac|3 +++
 libgomp/configure.tgt   |4 
 8 files changed, 79 insertions(+), 6 deletions(-)

diff --git ChangeLog ChangeLog
index bd0f35e..6d3a8a0 100644
--- ChangeLog
+++ ChangeLog
@@ -1,3 +1,10 @@
+2015-08-10  Thomas Schwinge  
+   Jakub Jelinek  
+
+   * configure.ac (noconfigdirs): Don't add "target-libgomp" for target
+   nvptx*-*-*.
+   * configure: Regenerate.
+
 2015-08-06  Yaakov Selkowitz  
 
* Makefile.def (libiconv): Define bootstrap=true.
diff --git configure configure
index 6d7152e..79257fd 100755
--- configure
+++ configure
@@ -3168,9 +3168,8 @@ if test x$enable_static_libjava != xyes ; then
 fi
 
 
-# Disable libgomp on non POSIX hosted systems.
+# Enable libgomp by default on hosted POSIX systems, and a few others.
 if test x$enable_libgomp = x ; then
-# Enable libgomp by default on hosted POSIX systems.
 case "${target}" in
 *-*-linux* | *-*-gnu* | *-*-k*bsd*-gnu | *-*-kopensolaris*-gnu)
;;
@@ -3180,6 +3179,8 @@ if test x$enable_libgomp = x ; then
;;
 *-*-darwin* | *-*-aix*)

Re: [PATCH] PR target/67127: [ARM] Avoiding odd-number ldrd/strd in movdi introduced a regression on armeb-linux-gnueabihf

2015-08-10 Thread Alan Lawrence

Yvan Roux wrote:

Hi,

this patch is a fix for PR 67127.  It avoids splitting the DI registers
into SI ones when that is not allowed, which breaks the newly introduced loop.
I haven't added a testcase as the bug is already exhibited by several
regressions (like g++.dg/ext/attribute-test-2.C or g++.dg/eh/simd-1.C)
but I can add one if you think it is needed.  Cross built and
regtested on trunk and gcc-5 branch and the regression mentioned in
https://gcc.gnu.org/ml/gcc-patches/2015-07/msg00216.html is not
observed.

Is it ok for trunk and branch ?

Thanks,
Yvan

gcc/

PR target/67127
* config/arm/arm.md (movdi): Avoid forbidden mode changes.


I've just looked into the above 2 testcases on armeb-none-eabi, and in both 
cases the infinite loop is due to an ldrd/strd with base register 16. So not an 
odd-numbered physical register, but rather something that isn't a physical 
register at all.


I observe that FIRST_VIRTUAL_REGISTER is 104, whereas LAST_ARM_REG is 15. So it 
might be that the pattern should check against the latter instead of the former 
- as arm_hard_regno_mode_ok does...


--Alan
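A hypothetical sketch of the two range checks contrasted above, using the constant values quoted in the mail. The predicate names are invented and this is not the actual condition in the arm.md movdi pattern; it only shows that register number 16 passes a test bounded by FIRST_VIRTUAL_REGISTER but fails one bounded by LAST_ARM_REG.

```cpp
#include <cassert>

// Values quoted in the mail for this configuration.
const unsigned LAST_ARM_REG_VALUE = 15;
const unsigned FIRST_VIRTUAL_REGISTER_VALUE = 104;

// Bounded by FIRST_VIRTUAL_REGISTER: every number below 104 counts,
// including numbers above 15 that are not core registers.
bool
below_first_virtual (unsigned regno)
{
  return regno < FIRST_VIRTUAL_REGISTER_VALUE;
}

// Bounded by LAST_ARM_REG, as suggested: only r0-r15 count.
bool
is_core_reg (unsigned regno)
{
  return regno <= LAST_ARM_REG_VALUE;
}
```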



Re: Empty libgomp for nvptx

2015-08-10 Thread Thomas Schwinge
Hi!

On Mon, 10 Aug 2015 17:55:57 +0200, I wrote:
> On Wed, 22 Jul 2015 18:38:44 +0200, Jakub Jelinek  wrote:
> > On Wed, Jul 22, 2015 at 06:04:20PM +0200, Thomas Schwinge wrote:
> > > On Tue, 21 Apr 2015 17:58:39 +0200, Jakub Jelinek  wrote:
> > > > Attached is a minimal patch to get at least a trivial OpenMP 4.0 testcase
> > > > offloading to NVPTX (the first patch).  The second patch is WIP, just first
> > > > few needed changes to make libgomp to build for NVPTX (several weeks of work
> > > > at least).
> > > 
> > > We're not in particular working on making nvptx offloading work for
> > > OpenMP, but also for OpenACC offloading a tiny bit of code is required to
> > > be shipped in an offloading device's runtime library -- code that
> > > conceptually belongs into libgomp.  (On gomp-4_0-branch, it currently
> > > lives in libgcc because that was easier to do.)  Actually, as I should
> > > find out, building a "dummy" (empty) libgomp for nvptx is not actually
> > > difficult.  Additionally to your second patch (U2; quoted at the end of
> > > this email), we'll need the following:
> > 
> > The U2 version was a very early one, I've posted a newer version later,
> > but supposedly we can go with my U2 (if you've tested it together with your
> > patch, please check it in yourself) and your patch, and then
> > incrementally start removing the zero sized stubs or replacing them with
> > something real.
> 
> Yes, that's precisely the idea.  Committed in r226760:
> 
> commit fdcd05c84f79cec55fa61249febd4c1c21b772a7
> Author: tschwinge 
> Date:   Mon Aug 10 15:53:33 2015 +
> 
> Empty libgomp for nvptx

Backported to gomp-4_0-branch in r226761:

commit d4ba3f3e41b5b647e4a3cc7bad12f2a4770cd15d
Author: tschwinge 
Date:   Mon Aug 10 16:26:39 2015 +

Empty libgomp for nvptx

Backport trunk r226760:

* configure.ac (noconfigdirs): Don't add "target-libgomp" for target
nvptx*-*-*.
* configure: Regenerate.
libgomp/
* config/nvptx/affinity.c: New file.
* config/nvptx/alloc.c: Likewise.
* config/nvptx/bar.c: Likewise.
* config/nvptx/barrier.c: Likewise.
* config/nvptx/critical.c: Likewise.
* config/nvptx/env.c: Likewise.
* config/nvptx/error.c: Likewise.
* config/nvptx/fortran.c: Likewise.
* config/nvptx/iter.c: Likewise.
* config/nvptx/iter_ull.c: Likewise.
* config/nvptx/libgomp-plugin.c: Likewise.
* config/nvptx/lock.c: Likewise.
* config/nvptx/loop.c: Likewise.
* config/nvptx/loop_ull.c: Likewise.
* config/nvptx/mutex.c: Likewise.
* config/nvptx/oacc-async.c: Likewise.
* config/nvptx/oacc-cuda.c: Likewise.
* config/nvptx/oacc-host.c: Likewise.
* config/nvptx/oacc-init.c: Likewise.
* config/nvptx/oacc-mem.c: Likewise.
* config/nvptx/oacc-parallel.c: Likewise.
* config/nvptx/oacc-plugin.c: Likewise.
* config/nvptx/omp-lock.h: Likewise.
* config/nvptx/ordered.c: Likewise.
* config/nvptx/parallel.c: Likewise.
* config/nvptx/proc.c: Likewise.
* config/nvptx/ptrlock.c: Likewise.
* config/nvptx/sections.c: Likewise.
* config/nvptx/sem.c: Likewise.
* config/nvptx/single.c: Likewise.
* config/nvptx/splay-tree.c: Likewise.
* config/nvptx/target.c: Likewise.
* config/nvptx/task.c: Likewise.
* config/nvptx/team.c: Likewise.
* config/nvptx/time.c: Likewise.
* config/nvptx/work.c: Likewise.
* configure.ac: Don't probe pthreads support for host nvptx*-*-*.
* configure: Regenerate.
* configure.tgt (config_path): Set to "nvptx" for target
nvptx*-*-*.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@226761 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 ChangeLog.gomp  |   11 +
 configure   |6 ++---
 configure.ac|6 ++---
 libgomp/ChangeLog.gomp  |   48 +++
 libgomp/config/nvptx/omp-lock.h |   12 ++
 libgomp/configure   |3 +++
 libgomp/configure.ac|3 +++
 libgomp/configure.tgt   |4 
 8 files changed, 87 insertions(+), 6 deletions(-)

diff --git ChangeLog.gomp ChangeLog.gomp
index 2a0ddb1..fd1a1e0 100644
--- ChangeLog.gomp
+++ ChangeLog.gomp
@@ -1,3 +1,14 @@
+2015-08-10  Thomas Schwinge  
+
+   Backport trunk r226760:
+
+   2015-08-10  Thomas Schwinge  
+   Jakub Jelinek  
+
+   * configure.ac (noconfigdirs): Don't add "target-libgomp" for target
+   nvptx*-*-*.
+   * configure: Regenerate.
+
 2015-06-30  Tom de Vries  
 
Revert:
diff --git configure configure
index 82e45f3..5d90445 100755
--- configure
+++ configure
@@ -3159,9 +3159,8 @@ if test x$enable_static_libjava != xyes ; then
 fi
 
 
-# Dis

[gomp4] [nvptx] Move GOMP stuff from libgcc to libgomp (was: [WIP] OpenMP 4 NVPTX support)

2015-08-10 Thread Thomas Schwinge
Hi!

On Wed, 22 Jul 2015 18:04:20 +0200, I wrote:
> On Tue, 21 Apr 2015 17:58:39 +0200, Jakub Jelinek  wrote:
> > Attached is a minimal patch to get at least a trivial OpenMP 4.0 testcase
> > offloading to NVPTX (the first patch).  The second patch is WIP, just first
> > few needed changes to make libgomp to build for NVPTX (several weeks of work
> > at least).
> 
> We're not specifically working on making nvptx offloading work for
> OpenMP, but OpenACC offloading, too, requires a tiny bit of code to be
> shipped in an offloading device's runtime library -- code that
> conceptually belongs in libgomp.  (On gomp-4_0-branch, it currently
> lives in libgcc because that was easier to do.)  [...]

> Next, we can then (on gomp-4_0-branch) move the libgcc code into libgomp:
> 
> commit d8d75d17630d7633be4f1733fd195a104cb2ccc4
> Author: Thomas Schwinge 
> Date:   Wed Jul 22 13:05:16 2015 +0200
> 
> [nvptx] Move GOMP stuff from libgcc to libgomp

Committed to gomp-4_0-branch in r226762:

commit c49a2b23a76591f26b4076401647011442df92df
Author: tschwinge 
Date:   Mon Aug 10 16:26:46 2015 +

[nvptx] Move GOMP stuff from libgcc to libgomp

libgcc/
* config.host [nvptx-*] (extra_parts): Don't add "libgomp.a", and
"libgomp.spec".
* config/nvptx/gomp-acc_on_device.c: Remove file.
* config/nvptx/gomp-atomic.asm: Likewise.
* config/nvptx/t-nvptx (OBJS_libgomp): Don't set.
(gomp-acc_on_device.o, gomp-atomic.o, libgomp.a, libgomp.spec):
Remove targets.
libgomp/
* config/nvptx/critical.c: New file, replacing empty file.
* config/nvptx/oacc-init.c: Likewise.
* config/nvptx/openacc.f90: New file.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@226762 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgcc/ChangeLog.gomp|   10 +++
 libgcc/config.host   |6 +-
 libgcc/config/nvptx/gomp-acc_on_device.c |   15 -
 libgcc/config/nvptx/gomp-atomic.asm  |   37 ---
 libgcc/config/nvptx/t-nvptx  |   11 
 libgomp/ChangeLog.gomp   |4 ++
 libgomp/config/nvptx/critical.c  |   57 +
 libgomp/config/nvptx/oacc-init.c |   40 
 libgomp/config/nvptx/openacc.f90 |  101 ++
 9 files changed, 213 insertions(+), 68 deletions(-)

diff --git libgcc/ChangeLog.gomp libgcc/ChangeLog.gomp
index 085bfda..7de8361 100644
--- libgcc/ChangeLog.gomp
+++ libgcc/ChangeLog.gomp
@@ -1,3 +1,13 @@
+2015-08-10  Thomas Schwinge  
+
+   * config.host [nvptx-*] (extra_parts): Don't add "libgomp.a", and
+   "libgomp.spec".
+   * config/nvptx/gomp-acc_on_device.c: Remove file.
+   * config/nvptx/gomp-atomic.asm: Likewise.
+   * config/nvptx/t-nvptx (OBJS_libgomp): Don't set.
+   (gomp-acc_on_device.o, gomp-atomic.o, libgomp.a, libgomp.spec):
+   Remove targets.
+
 2015-08-03  Thomas Schwinge  
 
* config/nvptx/gomp-acc_on_device.c: Don't include
diff --git libgcc/config.host libgcc/config.host
index ee7ce03..3a2c75d 100644
--- libgcc/config.host
+++ libgcc/config.host
@@ -1304,11 +1304,7 @@ mep*-*-*)
;;
 nvptx-*)
tmake_file="$tmake_file nvptx/t-nvptx"
-   if test "x${enable_as_accelerator_for}" != x; then
-   extra_parts="crt0.o libgomp.a libgomp.spec"
-   else
-   extra_parts="crt0.o"
-   fi
+   extra_parts="crt0.o"
;;
 *)
echo "*** Configuration ${host} not supported" 1>&2
diff --git libgcc/config/nvptx/gomp-acc_on_device.c libgcc/config/nvptx/gomp-acc_on_device.c
deleted file mode 100644
index db94350..000
--- libgcc/config/nvptx/gomp-acc_on_device.c
+++ /dev/null
@@ -1,15 +0,0 @@
-/* The compiler always attempts to expand acc_on_device, but if the
-   user disables the builtin, or calls it via a pointer, we have this
-   version.  */
-
-int
-acc_on_device (int dev)
-{
-  /* Just rely on the compiler builtin.  */
-  return __builtin_acc_on_device (dev);
-}
-
-int acc_on_device_h_(int *d)
-{
-  return acc_on_device(*d);
-}
diff --git libgcc/config/nvptx/gomp-atomic.asm libgcc/config/nvptx/gomp-atomic.asm
deleted file mode 100644
index ae9d925..000
--- libgcc/config/nvptx/gomp-atomic.asm
+++ /dev/null
@@ -1,37 +0,0 @@
-
-// BEGIN PREAMBLE
-   .version3.1
-   .target sm_30
-   .address_size 64
-   .extern .shared .u8 sdata[];
-// END PREAMBLE
-
-// BEGIN VAR DEF: libgomp_ptx_lock
-.global .align 4 .u32 libgomp_ptx_lock;
-
-// BEGIN GLOBAL FUNCTION DECL: GOMP_atomic_start
-.visible .func GOMP_atomic_start;
-// BEGIN GLOBAL FUNCTION DEF: GOMP_atomic_start
-.visible .func GOMP_atomic_start
-{
-   .reg .pred  %p<2>;
-   .reg .s32   %r<2>;
-   .reg .s64   %rd<2>;
-BB5_1:
-   mov.u64 %rd1, libgomp_ptx_lock;
-   atom.global.cas.b32 %r1, [%rd1], 0, 1;
-   setp.ne.s32 %p1, %r1, 0;
- 
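The deleted gomp-atomic.asm implements GOMP_atomic_start as a compare-and-swap on the global word libgomp_ptx_lock; the listing is cut off right after the setp.ne test, so the retry branch is an assumption here. A minimal host-side C sketch of that spin-lock idea, using GCC's __atomic builtins (the sketch_* names are hypothetical; this is not the libgomp code):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical host-side analogue of the removed PTX lock word.
   0 = unlocked, 1 = locked.  */
static unsigned int sketch_lock;

/* Spin until the lock word is atomically changed from 0 to 1, mirroring
   the atom.global.cas.b32 / test-old-value sequence in the PTX code.  */
void sketch_atomic_start (void)
{
  unsigned int expected;
  do
    expected = 0;  /* CAS overwrites EXPECTED on failure; reset each try.  */
  while (!__atomic_compare_exchange_n (&sketch_lock, &expected, 1, false,
                                       __ATOMIC_ACQUIRE, __ATOMIC_RELAXED));
}

/* Release the lock; the caller must currently hold it.  */
void sketch_atomic_end (void)
{
  assert (__atomic_load_n (&sketch_lock, __ATOMIC_RELAXED) == 1);
  __atomic_store_n (&sketch_lock, 0, __ATOMIC_RELEASE);
}

/* Observe the lock word (for testing only).  */
unsigned int sketch_lock_value (void)
{
  return __atomic_load_n (&sketch_lock, __ATOMIC_RELAXED);
}
```

Single-threaded use is enough to exercise it: start moves the word from 0 to 1, end resets it. A real GPU implementation may additionally have to worry about forward progress when threads of one warp spin on the same lock, which this sketch ignores.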

Re: libgomp: plugin for non-shared memory host execution

2015-08-10 Thread Thomas Schwinge
Hi!

On Fri, 31 Jul 2015 16:16:59 +0200, I wrote:
> On Thu, 30 Jul 2015 13:51:17 +0200, Jakub Jelinek  wrote:
> > On Thu, Jul 30, 2015 at 01:47:37PM +0200, Thomas Schwinge wrote:
> > > > Here is such a libgomp plugin plus the infrastructure for initial support
> > > > of non-shared memory host execution.  [...]
> > > 
> > > ... the libgomp plugin as it is currently implemented fails to adequately
> > > provide such functionality: nobody so far has implemented support for
> > > certain data mapping constructs; so it is not currently used for OpenMP
> > > offloading testing, and also disabled for certain OpenACC offloading test
> > > cases.  Its improper integration into the offloading compilation process,
> > > , is also
> > > causing issues: we use the target compiler for compiling "device" code --
> > > but it doesn't know that it's being used for that purpose, so cannot
> > > properly handle some constructs, such as efficiently implementing
> > > acc_on_device with a constant argument.
> > > 
> > > It has been useful for initial bring-up, to test-drive the libgomp plugin
> > > interface, when the nvptx backend and libgomp nvptx plugin as well as the
> > > intelmic plugin were not yet available, but it's now probably time to
> > > retire this plugin, at least until somebody feels like working on
> > > integrating and implementing it properly.  Unless there are any
> > > objections, I'll later propose a patch to this effect.
> > 
> > I agree with the removal.  It would be nice if somebody could add OpenACC
> > support to the IntelMIC plugin, then you'd get a non-shared memory host
> > execution testing for free, as it has a reasonable emulation mode.
> 
> I find the intelmic plugin, with all its emulation code, a bit
> heavy-weight for this purpose.  But, let's leave that for later.  ;-)
> 
> Committed to gomp-4_0-branch in r226444; will address trunk later (next
> week).
> 
> commit 0eefa17a15b9a58fff02239289fc9c40ed62634f
> Author: tschwinge 
> Date:   Fri Jul 31 14:13:59 2015 +
> 
> [PR libgomp/65742, PR middle-end/66332] libgomp: Remove plugin for non-shared memory host execution

Committed to trunk in r226763:

commit f212338e41d10436a48f04ea499f63dce5bf50ef
Author: tschwinge 
Date:   Mon Aug 10 16:48:26 2015 +

[PR libgomp/65742, PR middle-end/66332] libgomp: Remove plugin for non-shared memory host execution

gcc/
* builtins.c (expand_builtin_acc_on_device) [ACCEL_COMPILER]: Emit
open-coded sequence.
* omp-low.c (oacc_process_reduction_data): Remove handling of
GOMP_DEVICE_HOST_NONSHM.
gcc/testsuite/
* c-c++-common/goacc/acc_on_device-2.c: Remove XFAIL for C.
include/
* gomp-constants.h (GOMP_DEVICE_HOST_NONSHM): Remove.
libgomp/
* libgomp-plugin.h (enum offload_target_type): Remove
OFFLOAD_TARGET_TYPE_HOST_NONSHM.
* openacc.f90 (openacc_kinds): Remove acc_device_host_nonshm.
* openacc.h (enum acc_device_t): Likewise.
* openacc_lib.h: Likewise.
* oacc-init.c (name_of_acc_device_t): Don't handle it.
(acc_on_device): Just use __builtin_acc_on_device.
* testsuite/libgomp.oacc-c-c++-common/if-1.c: Don't forbid usage
of acc_on_device builtin.
* plugin/plugin-host.h: Remove file.
* plugin/plugin-host.c: Likewise, but salvage some content into...
* oacc-host.c: ... this file.
* plugin/Makefrag.am: Don't build libgomp-plugin-host_nonshm.la.
* plugin/configfrag.ac (offload_targets): Don't add host_nonshm.
* Makefile.in: Regenerate.
* configure: Likewise.
* testsuite/lib/libgomp.exp
(check_effective_target_openacc_host_nonshm_selected): Remove.
* testsuite/libgomp.oacc-c++/c++.exp: Don't handle
ACC_DEVICE_TYPE=host_nonshm.
* testsuite/libgomp.oacc-c/c.exp: Likewise.
* testsuite/libgomp.oacc-fortran/fortran.exp: Likewise.
* testsuite/libgomp.oacc-c-c++-common/acc_on_device-1.c: Likewise.
* testsuite/libgomp.oacc-fortran/acc_on_device-1-1.f90: Likewise.
* testsuite/libgomp.oacc-fortran/acc_on_device-1-2.f: Likewise.
* testsuite/libgomp.oacc-fortran/acc_on_device-1-3.f: Likewise.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@226763 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog  |7 +
 gcc/builtins.c |   12 +-
 gcc/omp-low.c  |   18 --
 gcc/testsuite/ChangeLog|6 +
 gcc/testsuite/c-c++-common/goacc/acc_on_device-2.c |   10 +-
 include/ChangeLog  |4 +
 include/gomp-constants.h   |4 +-
 libgomp/ChangeLog  |   29 +++
 libgomp/Makefile.in|   33 +--
 libgomp/configu

Re: [PATCH] PR target/67127: [ARM] Avoiding odd-number ldrd/strd in movdi introduced a regression on armeb-linux-gnueabihf

2015-08-10 Thread Yvan Roux
Hi Alan,

On 10 August 2015 at 18:02, Alan Lawrence  wrote:
> Yvan Roux wrote:
>>
>> Hi,
>>
>> this patch is a fix for pr67127.  It avoids splitting the DI registers
>> into SI ones when this is not allowed, which breaks the introduced loop.
>> I haven't added a testcase as the bug is already exhibited by several
>> regressions (like g++.dg/ext/attribute-test-2.C or g++.dg/eh/simd-1.C)
>> but I can add one if you think it is needed.  Cross built and
>> regtested on trunk and gcc-5 branch and the regression mentioned in
>> https://gcc.gnu.org/ml/gcc-patches/2015-07/msg00216.html is not
>> observed.
>>
>> Is it ok for trunk and branch ?
>>
>> Thanks,
>> Yvan
>>
>> gcc/
>>
>> PR target/67127
>> * config/arm/arm.md (movdi): Avoid forbidden mode changes.
>
>
> I've just looked into the above 2 testcases on armeb-none-eabi, and in both
> cases the infinite loop is due to an ldrd/strd with base register 16. So not
> an odd-numbered physical register, but rather something that isn't a
> physical register at all.

Yes, in big-endian, DImode values are stored in VFP registers, and
here register 16 is the first of them, s0.  In case you want to do
more testing, the issue can be seen with a one-line testcase:

__attribute__((__vector_size__(2 * sizeof(int)))) int fn1() {}

> I observe that FIRST_VIRTUAL_REGISTER is 104, whereas LAST_ARM_REG is 15. So
> it might be that the pattern should check against the latter instead of the
> former - as arm_hard_regno_mode_ok does...

Yes, when checking that the operand register number is lower than or
equal to LAST_ARM_REGNUM, the infinite loop is avoided.  I haven't
run a full validation so far, but it has the same effect as
checking that the mode change is allowed.  If you think that this
check makes more sense, I can rerun a full validation.
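To make the difference between the two checks concrete, here is a minimal sketch that uses only the numbers quoted in this thread (LAST_ARM_REGNUM = 15, s0 = 16, FIRST_VIRTUAL_REGISTER = 104). The macro and helper names are hypothetical; this is not the actual condition in arm.md:

```c
#include <assert.h>
#include <stdbool.h>

/* Register numbers quoted in this thread for the ARM port; hypothetical
   macro names, used here as plain constants.  */
#define SKETCH_LAST_ARM_REGNUM        15  /* last core register */
#define SKETCH_FIRST_VFP_REGNUM       16  /* s0 */
#define SKETCH_FIRST_VIRTUAL_REGISTER 104

/* Loose test: "is this a hard register at all?"  It also accepts s0,
   which is how a VFP register can slip through.  */
bool sketch_hard_reg_p (unsigned int regno)
{
  return regno < SKETCH_FIRST_VIRTUAL_REGISTER;
}

/* Tighter test: accept core registers only, so a DImode value held in
   a VFP register is not split into SImode halves.  */
bool sketch_core_reg_p (unsigned int regno)
{
  return regno <= SKETCH_LAST_ARM_REGNUM;
}
```

Register 16 passes the loose test but fails the tight one, which matches the observation that the looping ldrd/strd used base register 16.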

Thanks,
Yvan


yes


> --Alan
>


Re: [Bug fortran/52846] [F2008] Support submodules - part 3/3

2015-08-10 Thread Toon Moene

On 08/03/2015 02:36 PM, Paul Richard Thomas wrote:

Dear Mikael,

Thanks for your green light!

I have been mulling over the trans-decl part of the patch and have
been wondering if it is necessary. Without optimization, private
entities can be linked to. Given the discussion concerning the
combination of submodules and private entities, I wonder if this is
not sufficient? Within submodule scope, an advisory could be given for
undefined references to suggest recompiling the module without
optimization or making the entities public.

Cheers

Paul

On 3 August 2015 at 12:44, Mikael Morin  wrote:

On 29/07/2015 17:08, Paul Richard Thomas wrote:


Dear All,

On 24 July 2015 at 10:08, Damian Rouson 
wrote:


I love this idea and had similar thoughts as well.

:D



On Jul 24, 2015, at 1:06 AM, Paul Richard Thomas
 wrote:

Dear Mikael,

It had crossed my mind also that a .mod and a .smod file could be
written. Normally, the .smod files are produced by the submodules
themselves, so that their descendants can pick up the symbols that
they generate. There is no reason at all why this could not be
implemented; early on in the development I did just this, although I
think that it would now be easier to modify this patch.

One huge advantage of proceeding in this way is that any resulting
library can be distributed with the .mod file alone so that the
private entities are never exposed. The penalty is that a second file
is output.

With best regards

Paul



Please find attached the implementation of this suggestion.

Bootstraps and regtests on FC21/x86_64 - OK for trunk or is the
original preferred?


There haven't been many voices about this among the other active and less
active team members.
I prefer this "private members to separate smod" variant.
It's OK for trunk as far as I'm concerned.
Thanks.

Mikael

PS: Regarding redundant initializations: rather have too many than too few.
;-)


Although I do not immediately know if this is relevant for *this* 
debate, J3 passed the following (attached) interpretation on submodules 
the past week (it still has to go to several mail ballots, but still), 
overwhelmingly preferring option 3:


[attached]

Kind regards,

--
Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/
Progress of GNU Fortran: http://gcc.gnu.org/wiki/GFortran#news
 J3/15-208
To: J3
From: Malcolm Cohen
Subject: Interp of USE and submodules
Date: 2015 August 06


1. Introduction

Three options are provided for the answer to this interp:

Option 1: Basically what the actual text of the standard says now.
  Accessing a PROTECTED item by a USE statement will hide the
  host association, and therefore the item will be protected.

Option 2: Continue to allow a submodule to USE its ancestor, but say
  that PROTECTED has no effect in this case.

Option 3: Decide that it was a mistake to allow a submodule to access
  its ancestor module by use association, and forbid it.

2. The interpretation request

--

NUMBER: F08/0128
TITLE: Is recursive USE within a submodule permitted?
KEYWORDS: SUBMODULE, USE
DEFECT TYPE: Erratum
STATUS: J3 consideration in progress

QUESTION:

Consider
  Module m1
    Real x
  End Module
  Submodule(m1) subm1
    Use m1
  End Submodule

Q1. The module m1 is referenced from within one of its own
submodules.  Is this standard-conforming?

Note that the "submodule TR", Technical Report 19767, contains an edit
with the normative requirement:
  "A submodule shall not reference its ancestor module by use
   association, either directly or indirectly."
along with a note which says
  "It is possible for submodules with different ancestor modules to
   access each other's ancestor modules by use association."
It also contains an edit to insert the direct reference prohibition
as a constraint.

However, none of this text appears in ISO/IEC 1539-1:2010.

The Introduction simply comments that submodules are available, but
not that they have been extended beyond the Technical Report that
created them.

Also, consider

  Module m2
    Real, Private :: a
    Real, Protected :: b
    ...
  End Module
  Submodule(m2) subm2
  Contains
    Subroutine s
      Use m2
      Implicit None
      a = 3
      b = 4
    End Subroutine
  End Submodule

In submodule SUBM2, procedure S references M2 by use association.
Use association does not make "A" accessible.

Q2. Is "A" still accessible by host association?

Also, procedure S attempts to assign a value to B, which is accessed
by use association, but has the PROTECTED attribute.  Normally, this
attribute prevents assignment to variables accessed by use
association.

Q3. Is the assignment to "B" standard-conforming?

DISCUSSION:

The requirement appears 

Re: [Bug fortran/52846] [F2008] Support submodules - part 3/3

2015-08-10 Thread Bader, Reinhold
Hello Toon, all else, 

a bit unfortunate, in my opinion (I was present at the discussion). 
In the meantime, I've made an effort to implement what the design pattern
experts might call an "abstract factory with full dependency inversion" as
a bare-bones framework and have attached an archive with three variants:

* pre_interp contains the code that is presently valid (and indeed compiles fine
   with both gfortran and ifort), but would become invalid due to indirect
   parent module access
* post_interp contains a variant that uses a helper module (mod_glue) to avoid
   the indirect ancestor use access (if there is a more concise way to do this, 
   I'd like to know ... up to now this is the best I can do)
* post_interp_v2 contains another, shorter variant that pushes the extension
   types into a submodule (with the disadvantage that these types are not
   really reusable, and that the monster module problem is shifted to a
   monster submodule, or a chain of submodules)
You may need to edit the Makefiles to build. 

I would of course like to know how people feel about reintroducing this
restriction, especially since the only reason given was that ancestor module
access and its use association overriding host association would confuse
users ... which is a problem which in my opinion could have been dealt with in
a slightly different manner without removing the permission for indirect
parent module-referencing use statements. It is not clear to me whether
*implementations* other than gfortran have problems with this, though.

More germane to this thread's discussion, actually, is another interp that was
also passed and which appears entirely uncontroversial:
http://j3-fortran.org/doc/meeting/207/15-209.txt 
It seems to me that this would permit avoiding generation of the .smod files
for modules that do not specify a separate module procedure interface.

Cheers
Reinhold
 
> -Original Message-
> 
> Although I do not immediately know if this is relevant for *this*
> debate, J3 passed the following (attached) interpretation on submodules
> the past week (it still has to go to several mail ballots, but still),
> overwhelmingly preferring option 3:
> 
> [attached]


examples.tgz
Description: examples.tgz


[committed, PATCH] Update -mtune=knl for Knights Landing

2015-08-10 Thread H.J. Lu
According to:

https://software.intel.com/sites/default/files/managed/e9/b5/Knights-Corner-is-your-path-to-Knights-Landing.pdf

Knights Landing is “Based on Intel Atom core (based on Silvermont
microarchitecture) with many HPC enhancements.”

This patch replaces CPU_KNL with CPU_SLM to tune for Knights Landing.

* config/i386/i386.c (processor_alias_table): Replace CPU_KNL
with CPU_SLM.
* config/i386/i386.md (cpu): Remove knl.
---
 gcc/config/i386/i386.c  | 2 +-
 gcc/config/i386/i386.md | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 0b785d8..57d874b 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -3351,7 +3351,7 @@ ix86_option_override_internal (bool main_args_p,
   {"atom", PROCESSOR_BONNELL, CPU_ATOM, PTA_BONNELL},
   {"silvermont", PROCESSOR_SILVERMONT, CPU_SLM, PTA_SILVERMONT},
   {"slm", PROCESSOR_SILVERMONT, CPU_SLM, PTA_SILVERMONT},
-  {"knl", PROCESSOR_KNL, CPU_KNL, PTA_KNL},
+  {"knl", PROCESSOR_KNL, CPU_SLM, PTA_KNL},
   {"intel", PROCESSOR_INTEL, CPU_SLM, PTA_NEHALEM},
   {"geode", PROCESSOR_GEODE, CPU_GEODE,
PTA_MMX | PTA_3DNOW | PTA_3DNOW_A | PTA_PREFETCH_SSE | PTA_PRFCHW},
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 9ffe9aa..e6c2d30 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -409,7 +409,7 @@
 ;; Processor type.
 (define_attr "cpu" "none,pentium,pentiumpro,geode,k6,athlon,k8,core2,nehalem,
atom,slm,generic,amdfam10,bdver1,bdver2,bdver3,bdver4,
-   btver2,knl"
+   btver2"
   (const (symbol_ref "ix86_schedule")))
 
 ;; A basic instruction type.  Refinements due to arguments to be
-- 
2.4.3
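The patch can stay this small because each processor_alias_table entry carries the processor kind and the scheduling CPU as separate fields, and the "cpu" attribute in i386.md is derived from the scheduling one via ix86_schedule. A simplified C sketch of that split (the sketch_* names and the abridged table are hypothetical, not the real i386.c definitions):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down model of an alias table in which each entry
   names a processor (ISA/costs) and, separately, a scheduling model.  */
enum sketch_processor { P_BONNELL, P_SILVERMONT, P_KNL, P_INTEL };
enum sketch_schedule { S_ATOM, S_SLM };

struct sketch_alias
{
  const char *name;
  enum sketch_processor processor;
  enum sketch_schedule schedule;
};

static const struct sketch_alias sketch_alias_table[] = {
  { "atom", P_BONNELL, S_ATOM },
  { "silvermont", P_SILVERMONT, S_SLM },
  { "slm", P_SILVERMONT, S_SLM },
  { "knl", P_KNL, S_SLM },  /* KNL keeps its processor, borrows SLM scheduling */
  { "intel", P_INTEL, S_SLM },
};

/* Return the entry for NAME, or NULL if it is unknown.  */
const struct sketch_alias *
sketch_lookup (const char *name)
{
  size_t n = sizeof sketch_alias_table / sizeof sketch_alias_table[0];
  for (size_t i = 0; i < n; i++)
    if (strcmp (sketch_alias_table[i].name, name) == 0)
      return &sketch_alias_table[i];
  return NULL;
}
```

With this layout, "knl" still selects its own processor kind while picking up the Silvermont scheduling model, so no separate knl value is needed in the scheduling enumeration.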



Re: [PATCH] Treat model == 0x4f as Broadwell

2015-08-10 Thread H.J. Lu
On Sat, Aug 8, 2015 at 12:42 AM, Uros Bizjak  wrote:
> On Sat, Aug 8, 2015 at 12:57 AM, H.J. Lu  wrote:
>> From Intel SDM Vol 3:
>>
>> Table 35-29 lists MSRs that are common to processors based on the
>> Broadwell microarchitectures (including CPUID signatures 06_3DH, 06_47H,
>> 06_4FH, and 06_56H).
>>
>> OK for trunk?
>
> These kind of patches fall into trivial category. They don't need approval..
>
> Uros.

I also checked this in.

-- 
H.J.
---
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 146a730..34a5c1a 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@
+2015-08-10  H.J. Lu  
+
+ * gcc.target/i386/builtin_target.c (check_intel_cpu_model):
 + Treat model == 0x47 as Broadwell.
+
 2015-08-10  Francois-Xavier Coudert  

  PR libfortran/67140
diff --git a/gcc/testsuite/gcc.target/i386/builtin_target.c b/gcc/testsuite/gcc.target/i386/builtin_target.c
index 10c0568..4adea27 100644
--- a/gcc/testsuite/gcc.target/i386/builtin_target.c
+++ b/gcc/testsuite/gcc.target/i386/builtin_target.c
@@ -74,6 +74,7 @@ check_intel_cpu_model (unsigned int family, unsigned int model,
   assert (__builtin_cpu_is ("haswell"));
   break;
 case 0x3d:
+case 0x47:
 case 0x4f:
 case 0x56:
   /* Broadwell.  */
diff --git a/libgcc/ChangeLog b/libgcc/ChangeLog
index 79df462..95a10f2 100644
--- a/libgcc/ChangeLog
+++ b/libgcc/ChangeLog
@@ -1,3 +1,8 @@
+2015-08-10  H.J. Lu  
+
 + * config/i386/cpuinfo.c (get_intel_cpu): Treat model == 0x47 as
 + Broadwell.
+
 2015-07-22  Uros Bizjak  

  PR target/66954
diff --git a/libgcc/config/i386/cpuinfo.c b/libgcc/config/i386/cpuinfo.c
index 01dbb59..57711d0 100644
--- a/libgcc/config/i386/cpuinfo.c
+++ b/libgcc/config/i386/cpuinfo.c
@@ -232,6 +232,7 @@ get_intel_cpu (unsigned int family, unsigned int model, unsigned int brand_id)
   __cpu_model.__cpu_subtype = INTEL_COREI7_HASWELL;
   break;
 case 0x3d:
+case 0x47:
 case 0x4f:
 case 0x56:
   /* Broadwell.  */
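In the SDM notation, a signature such as 06_47H means DisplayFamily 6, DisplayModel 0x47; for family 6, the display model combines the extended-model field (CPUID leaf 1 EAX bits 19:16) with the model field (bits 7:4). A small standalone sketch of that decoding plus the Broadwell test this patch extends (the sketch_* helpers are hypothetical and not part of cpuinfo.c):

```c
#include <assert.h>
#include <stdbool.h>

/* Decode DisplayFamily/DisplayModel from CPUID leaf 1 EAX, following the
   Intel convention: the extended model is prepended for families 6 and
   15, and the extended family is added only for family 15.  */
void sketch_decode_signature (unsigned int eax, unsigned int *family,
                              unsigned int *model)
{
  unsigned int base_family = (eax >> 8) & 0x0f;
  unsigned int base_model = (eax >> 4) & 0x0f;
  unsigned int ext_model = (eax >> 16) & 0x0f;
  unsigned int ext_family = (eax >> 20) & 0xff;

  *family = base_family;
  *model = base_model;
  if (base_family == 0x0f)
    *family += ext_family;
  if (base_family == 0x06 || base_family == 0x0f)
    *model += ext_model << 4;
}

/* Broadwell signatures listed in the SDM excerpt: 06_3DH, 06_47H,
   06_4FH and 06_56H.  */
bool sketch_is_broadwell (unsigned int family, unsigned int model)
{
  if (family != 6)
    return false;
  switch (model)
    {
    case 0x3d:
    case 0x47:
    case 0x4f:
    case 0x56:
      return true;
    default:
      return false;
    }
}
```

For example, an EAX value of 0x40671 (extended model 4, family 6, model 7, stepping 1) decodes to family 6, model 0x47.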


Re: [gomp4] internal fns for id & nid

2015-08-10 Thread Thomas Schwinge
Hi!

On Mon, 3 Aug 2015 16:43:04 -0400, Nathan Sidwell  wrote:
> I've committed this to gomp4 branch.  It replaces the regular builtins
> __builtin_GOACC_nid/__builtin_GOACC_id with internal functions
> IFN_OACC_DIM_SIZE and IFN_OACC_DIM_POS -- moving further away from the
> PTX-specific naming of id & nid.

Thanks!

> --- gcc/internal-fn.c (revision 226515)
> +++ gcc/internal-fn.c (working copy)

> +static void
> +expand_GOACC_DIM_SIZE (gcall *stmt)
> +{
> +  tree lhs = gimple_call_lhs (stmt);
> +
> +  if (!lhs)
> +return;
> +  
> +  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
> +  rtx val = expand_expr (gimple_call_arg (stmt, 0), NULL_RTX,
> +  VOIDmode, EXPAND_NORMAL);
> +#ifdef HAVE_oacc_dim_size
> +  emit_insn (gen_oacc_dim_size (target, val));
> +#else
> +  emit_move_insn (target, const1_rtx);
> +#endif
> +}
> +
> +static void
> +expand_GOACC_DIM_POS (gcall *stmt)
> +{
> +  tree lhs = gimple_call_lhs (stmt);
> +
> +  if (!lhs)
> +return;
> +  
> +  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
> +  rtx val = expand_expr (gimple_call_arg (stmt, 0), NULL_RTX,
> +  VOIDmode, EXPAND_NORMAL);
> +#ifdef HAVE_oacc_dim_pos
> +  emit_insn (gen_oacc_dim_pos (target, val));
> +#else
> +  emit_move_insn (target, const0_rtx);
> +#endif
> +}

Bootstrap failure:

[...]/source-gcc/gcc/internal-fn.c: In function 'void expand_GOACC_DIM_SIZE(gcall*)':
[...]/source-gcc/gcc/internal-fn.c:1996:7: error: unused variable 'val' [-Werror=unused-variable]
   rtx val = expand_expr (gimple_call_arg (stmt, 0), NULL_RTX,
   ^
[...]/source-gcc/gcc/internal-fn.c: In function 'void expand_GOACC_DIM_POS(gcall*)':
[...]/source-gcc/gcc/internal-fn.c:2014:7: error: unused variable 'val' [-Werror=unused-variable]
   rtx val = expand_expr (gimple_call_arg (stmt, 0), NULL_RTX,
   ^

I'm assuming it is permissible to not expand_expr the call argument (for
side effects) in these two cases (please shout if that's wrong);
committed to gomp-4_0-branch in r226767:

commit f3907d648a9c9420deb4fb9f295b6e192a209f8d
Author: tschwinge 
Date:   Mon Aug 10 19:37:49 2015 +

Address -Werror=unused-variable diagnostic

Fixup for r226531.

gcc/
* internal-fn.c (expand_GOACC_DIM_SIZE) [!HAVE_oacc_dim_size]:
Don't define and set variable val.
(expand_GOACC_DIM_POS) [!HAVE_oacc_dim_pos]: Likewise.
* internal-fn.c (expand_GOACC_DIM_SIZE, expand_GOACC_DIM_POS)
[!HAVE_oacc_dim_size]: Don't define and set variable val.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@226767 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog.gomp |6 ++
 gcc/internal-fn.c  |8 
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
index 62f5e59..542b1af 100644
--- gcc/ChangeLog.gomp
+++ gcc/ChangeLog.gomp
@@ -1,3 +1,9 @@
+2015-08-10  Thomas Schwinge  
+
+   * internal-fn.c (expand_GOACC_DIM_SIZE) [!HAVE_oacc_dim_size]:
+   Don't define and set variable val.
+   (expand_GOACC_DIM_POS) [!HAVE_oacc_dim_pos]: Likewise.
+
 2015-08-06  Cesar Philippidis  
 
* config/nvptx/nvptx.c (nvptx_expand_lock_unlock): Pass an
diff --git gcc/internal-fn.c gcc/internal-fn.c
index 72bb0bd..05321e1 100644
--- gcc/internal-fn.c
+++ gcc/internal-fn.c
@@ -1993,9 +1993,9 @@ expand_GOACC_DIM_SIZE (gcall *stmt)
 return;
   
   rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
-  rtx val = expand_expr (gimple_call_arg (stmt, 0), NULL_RTX,
-VOIDmode, EXPAND_NORMAL);
 #ifdef HAVE_oacc_dim_size
+  rtx val = expand_expr (gimple_call_arg (stmt, 0), NULL_RTX,
+VOIDmode, EXPAND_NORMAL);
   emit_insn (gen_oacc_dim_size (target, val));
 #else
   emit_move_insn (target, const1_rtx);
@@ -2011,9 +2011,9 @@ expand_GOACC_DIM_POS (gcall *stmt)
 return;
   
   rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
-  rtx val = expand_expr (gimple_call_arg (stmt, 0), NULL_RTX,
-VOIDmode, EXPAND_NORMAL);
 #ifdef HAVE_oacc_dim_pos
+  rtx val = expand_expr (gimple_call_arg (stmt, 0), NULL_RTX,
+VOIDmode, EXPAND_NORMAL);
   emit_insn (gen_oacc_dim_pos (target, val));
 #else
   emit_move_insn (target, const0_rtx);
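The fix follows a common pattern: a variable that is only consumed inside an #ifdef branch is also declared inside it, so configurations without the target pattern never contain an unused variable that -Werror would reject. A minimal standalone sketch of the pattern (the SKETCH_* macro and sketch_* functions are hypothetical stand-ins, not GCC internals):

```c
#include <assert.h>

/* SKETCH_HAVE_DIM_SIZE plays the role of HAVE_oacc_dim_size; it is left
   undefined here, so only the fallback path is compiled.  */
/* #define SKETCH_HAVE_DIM_SIZE 1 */

#ifdef SKETCH_HAVE_DIM_SIZE
int sketch_expand_arg (int axis);        /* stand-in for expand_expr */
int sketch_target_dim_size (int val);    /* stand-in for the insn pattern */
#endif

int sketch_dim_size (int axis)
{
#ifdef SKETCH_HAVE_DIM_SIZE
  /* Declared inside the conditional, so it never exists unused.  */
  int val = sketch_expand_arg (axis);
  return sketch_target_dim_size (val);
#else
  (void) axis;  /* argument not needed without a target pattern */
  return 1;     /* fallback size, like const1_rtx in the patch */
#endif
}

int sketch_dim_pos (int axis)
{
  (void) axis;
  return 0;     /* fallback position, like const0_rtx in the patch */
}
```

As in the committed change, the fallback path simply skips evaluating the argument, which is only safe if that evaluation has no side effects that matter.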


Regards,
 Thomas




Re: [gomp4] OpenACC first private

2015-08-10 Thread Thomas Schwinge
Hi!

On Mon, 3 Aug 2015 10:30:49 -0400, Nathan Sidwell  wrote:
> I've committed this patch to gomp4.  The existing implementation of
> firstprivate presumes the existence of memory at the CTA level.  This
> patch does away with that, treating firstprivate as thread-private
> variables initialized from the host.
> 
> During development there was some fallout from declare handling, as
> that wasn't creating the expected omp_region context object.  The
> previous handling of firstprivate just happened to work.  Jim has been
> working on resolving that problem.

I'm seeing the following regressions after this r226508 commit -- are
those the ones that Jim is working on resolving?

[-PASS:-]{+FAIL: gfortran.dg/goacc/modules.f95   -O  (internal compiler 
error)+}
{+FAIL:+} gfortran.dg/goacc/modules.f95   -O  (test for excess errors)

PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-loop-1.c 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 (test for excess errors)
[-XFAIL:-]{+XPASS:+} 
libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-loop-1.c 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 execution test

PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/reduction-4.c 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 (test for excess errors)
{+WARNING: program timed out.+}
XFAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/reduction-4.c 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 execution test

PASS: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/parallel-loop-1.c 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 (test for excess errors)
[-XFAIL:-]{+XPASS:+} 
libgomp.oacc-c++/../libgomp.oacc-c-c++-common/parallel-loop-1.c 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 execution test

PASS: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/reduction-4.c 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 (test for excess errors)
{+WARNING: program timed out.+}
XFAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/reduction-4.c 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 execution test

PASS: libgomp.oacc-fortran/declare-1.f90 -DACC_DEVICE_TYPE_nvidia=1 
-DACC_MEM_SHARED=0  -O0  (test for excess errors)
[-PASS:-]{+FAIL:+} libgomp.oacc-fortran/declare-1.f90 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0  -O0  execution test
PASS: libgomp.oacc-fortran/declare-1.f90 -DACC_DEVICE_TYPE_nvidia=1 
-DACC_MEM_SHARED=0  -O1  (test for excess errors)
[-PASS:-]{+FAIL:+} libgomp.oacc-fortran/declare-1.f90 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0  -O1  execution test
PASS: libgomp.oacc-fortran/declare-1.f90 -DACC_DEVICE_TYPE_nvidia=1 
-DACC_MEM_SHARED=0  -O2  (test for excess errors)
[-PASS:-]{+FAIL:+} libgomp.oacc-fortran/declare-1.f90 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0  -O2  execution test
PASS: libgomp.oacc-fortran/declare-1.f90 -DACC_DEVICE_TYPE_nvidia=1 
-DACC_MEM_SHARED=0  -O3 -fomit-frame-pointer  (test for excess errors)
[-PASS:-]{+FAIL:+} libgomp.oacc-fortran/declare-1.f90 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0  -O3 -fomit-frame-pointer  
execution test
PASS: libgomp.oacc-fortran/declare-1.f90 -DACC_DEVICE_TYPE_nvidia=1 
-DACC_MEM_SHARED=0  -O3 -fomit-frame-pointer -funroll-loops  (test for excess 
errors)
[-PASS:-]{+FAIL:+} libgomp.oacc-fortran/declare-1.f90 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0  -O3 -fomit-frame-pointer 
-funroll-loops  execution test
PASS: libgomp.oacc-fortran/declare-1.f90 -DACC_DEVICE_TYPE_nvidia=1 
-DACC_MEM_SHARED=0  -O3 -fomit-frame-pointer -funroll-all-loops 
-finline-functions  (test for excess errors)
[-PASS:-]{+FAIL:+} libgomp.oacc-fortran/declare-1.f90 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0  -O3 -fomit-frame-pointer 
-funroll-all-loops -finline-functions  execution test
PASS: libgomp.oacc-fortran/declare-1.f90 -DACC_DEVICE_TYPE_nvidia=1 
-DACC_MEM_SHARED=0  -O3 -g  (test for excess errors)
[-PASS:-]{+FAIL:+} libgomp.oacc-fortran/declare-1.f90 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0  -O3 -g  execution test
PASS: libgomp.oacc-fortran/declare-1.f90 -DACC_DEVICE_TYPE_nvidia=1 
-DACC_MEM_SHARED=0  -Os  (test for excess errors)
[-PASS:-]{+FAIL:+} libgomp.oacc-fortran/declare-1.f90 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0  -Os  execution test

PASS: libgomp.oacc-fortran/lib-13.f90 -DACC_DEVICE_TYPE_nvidia=1 
-DACC_MEM_SHARED=0  -O0  (test for excess errors)
[-PASS:-]{+FAIL:+} libgomp.oacc-fortran/lib-13.f90 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0  -O0  execution test
PASS: libgomp.oacc-fortran/lib-13.f90 -DACC_DEVICE_TYPE_nvidia=1 
-DACC_MEM_SHARED=0  -O1  (test for excess errors)
[-PASS:-]{+FAIL:+} libgomp.oacc-fortran/lib-13.f90 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0  -O1  execution test
PASS: libgomp.oacc-fortran/lib-13.f90 -DACC_DEVICE_TYPE_nvidia=1 
-DACC_MEM_SHARED=0  -O2  (test for excess errors)
PASS: libgomp.oacc-fortran/lib-13.f90 -DACC_DEVICE_TYP

Re: offload data version number

2015-08-10 Thread Thomas Schwinge
Hi!

On Thu, 6 Aug 2015 21:52:43 +0200, Nathan Sidwell  wrote:
> Ping?
> 
> 1) updated version patch 
> https://gcc.gnu.org/ml/gcc-patches/2015-08/msg00010.html
> 
> 2) https://gcc.gnu.org/ml/gcc-patches/2015-08/msg00204.html
> An infrastructure piece from Thomas, who noticed liboffloadmic didn't have
> the include paths to #include  gomp-constants.h.

(I merged these two, and) I'm again attaching the patch
0001-Offload-data-version-number.patch, which is modified to apply
cleanly after today's trunk changes.


Grüße,
 Thomas


From 1c33c5b77abedb429b590f4eead1bfb82745fe27 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 7 Aug 2015 09:49:50 +0200
Subject: [PATCH] Offload data version number

-MM-DD  Nathan Sidwell  

	gcc/
	* config/nvptx/mkoffload.c (process): Replace
	GOMP_offload_{,un}register with GOMP_offload_{,un}register_ver.

	libgomp/
	* libgomp.map: Add 4.0.2 version.
	* target.c (offload_image_descr): Add version field.
	(gomp_load_image_to_device): Add version argument.  Adjust plugin
	call.  Improve load mismatch diagnostic.
	(gomp_unload_image_from_device): Add version argument.  Adjust plugin
	call.
	(GOMP_offload_register): Make stub function, move bulk to ...
	(GOMP_offload_register_ver): ... here.  Process version argument.
	(GOMP_offload_unregister): Make stub function, move bulk to ...
	(GOMP_offload_unregister_ver): ... here.  Process version argument.
	(gomp_init_device): Process version field.
	(gomp_unload_device): Process version field.
	(gomp_load_plugin_for_device): Reimplement DLSYM & DLSYM_OPT
	macros.  Check plugin version.
	* libgomp.h (gomp_device_descr): Add version function field.  Adjust
	loader and unloader types.
	* oacc-host.c (host_dispatch): Adjust.
	* plugin/plugin-nvptx.c: Include gomp-constants.h.
	(GOMP_OFFLOAD_version): New.
	(GOMP_OFFLOAD_load_image): Add version arg and check it.
	(GOMP_OFFLOAD_unload_image): Likewise.
	* plugin/plugin-host.c: Include gomp-constants.h.
	(GOMP_OFFLOAD_version): New.
	(GOMP_OFFLOAD_load_image): Add version arg.
	(GOMP_OFFLOAD_unload_image): Likewise.
	* oacc-host.c (host_dispatch): Init version field.

	liboffloadmic/
	* plugin/libgomp-plugin-intelmic.cpp (GOMP_OFFLOAD_version): New.
	(GOMP_OFFLOAD_load_image): Add version arg and check it.
	(GOMP_OFFLOAD_unload_image): Likewise.

	include/
	* gomp-constants.h (GOMP_VERSION, GOMP_VERSION_NVIDIA_PTX,
	GOMP_VERSION_INTEL_MIC): New.
	(GOMP_VERSION_PACK, GOMP_VERSION_LIB, GOMP_VERSION_DEV): New.

-MM-DD  Thomas Schwinge  

	liboffloadmic/
	* plugin/Makefile.am (include_src_dir): Set.
	[PLUGIN_HOST] (libgomp_plugin_intelmic_la_CPPFLAGS): Use it.
	* plugin/Makefile.in: Regenerate.
	* plugin/libgomp-plugin-intelmic.cpp: Include "gomp-constants.h".
---
 gcc/config/nvptx/mkoffload.c |   24 +--
 include/gomp-constants.h |9 ++
 libgomp/libgomp.h|5 +-
 libgomp/libgomp.map  |6 +
 libgomp/oacc-host.c  |   10 ++
 libgomp/plugin/plugin-nvptx.c|   22 ++-
 libgomp/target.c |  188 --
 liboffloadmic/plugin/Makefile.am |3 +-
 liboffloadmic/plugin/Makefile.in |3 +-
 liboffloadmic/plugin/libgomp-plugin-intelmic.cpp |   27 +++-
 10 files changed, 193 insertions(+), 104 deletions(-)

diff --git a/gcc/config/nvptx/mkoffload.c b/gcc/config/nvptx/mkoffload.c
index 1e154c8..ba0454e 100644
--- a/gcc/config/nvptx/mkoffload.c
+++ b/gcc/config/nvptx/mkoffload.c
@@ -881,10 +881,10 @@ process (FILE *in, FILE *out)
 	   "extern \"C\" {\n"
 	   "#endif\n");
 
-  fprintf (out, "extern void GOMP_offload_register"
-	   " (const void *, int, const void *);\n");
-  fprintf (out, "extern void GOMP_offload_unregister"
-	   " (const void *, int, const void *);\n");
+  fprintf (out, "extern void GOMP_offload_register_ver"
+	   " (unsigned, const void *, int, const void *);\n");
+  fprintf (out, "extern void GOMP_offload_unregister_ver"
+	   " (unsigned, const void *, int, const void *);\n");
 
   fprintf (out, "#ifdef __cplusplus\n"
 	   "}\n"
@@ -894,15 +894,19 @@ process (FILE *in, FILE *out)
 
   fprintf (out, "static __attribute__((constructor)) void init (void)\n"
 	   "{\n"
-	   "  GOMP_offload_register (__OFFLOAD_TABLE__, %d/*NVIDIA_PTX*/,\n"
-	   " &target_data);\n"
-	   "};\n", GOMP_DEVICE_NVIDIA_PTX);
+	   "  GOMP_offload_register_ver (%#x, __OFFLOAD_TABLE__,"
+	   "%d/*NVIDIA_PTX*/, &target_data);\n"
+	   "};\n",
+	   GOMP_VERSION_PACK (GOMP_VERSION, GOMP_VERSION_NVIDIA_PTX),
+	   GOMP_DEVICE_NVIDIA_PTX);
 
   fprintf (out, "static __attribute__((destructor)) void fini (void)\n"
 	   "{\n"
-	   "  GOMP_offload_unregister (__OFFLOAD_TABLE__, %d/*NVIDIA_PTX*/,\n"
-	   "   &target_data);\n"
-	   "};\n", GOMP_DEVICE_NVIDIA_PTX);
+	   "  GOMP_offload_unregister_ver (%#x, __OFFLOAD_TABLE__,"
+	   "%d

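The version plumbing in this patch packs a library-level version and a device-specific version into the single unsigned value that mkoffload prints with %#x and that each plugin checks in GOMP_OFFLOAD_load_image. The definitions of GOMP_VERSION_PACK, GOMP_VERSION_LIB and GOMP_VERSION_DEV are not part of this excerpt, so the following C sketch assumes a 16/16-bit split and made-up version values; every sketch_/SKETCH_ name is hypothetical and not the patch's code.

```c
#include <assert.h>

/* Hypothetical version values; the real GOMP_VERSION* constants are
   not shown in this excerpt.  */
enum { SKETCH_VERSION = 1, SKETCH_VERSION_NVIDIA_PTX = 0 };

/* Assumed layout: library version in the high 16 bits, device-specific
   version in the low 16 bits.  */
static unsigned
sketch_version_pack (unsigned lib, unsigned dev)
{
  assert (lib <= 0xffff && dev <= 0xffff);
  return lib << 16 | dev;
}

static unsigned
sketch_version_lib (unsigned pack)
{
  return (pack >> 16) & 0xffff;
}

static unsigned
sketch_version_dev (unsigned pack)
{
  return pack & 0xffff;
}

/* Hypothetical plugin-side check in the spirit of the "Add version arg
   and check it" ChangeLog entries: reject images built against a newer
   library version or for a different device version.  Returns 0 if
   the image is acceptable, -1 otherwise.  */
static int
sketch_check_image_version (unsigned version)
{
  if (sketch_version_lib (version) > SKETCH_VERSION)
    return -1;
  if (sketch_version_dev (version) != SKETCH_VERSION_NVIDIA_PTX)
    return -1;
  return 0;
}
```

Under these assumptions, sketch_version_pack (1, 0) is 0x10000, which %#x renders as "0x10000" in the generated GOMP_offload_register_ver call. An image carrying no version information behaves like version 0 in this model and is accepted; whether the unversioned GOMP_offload_register stub forwards in exactly that way is not visible in this excerpt.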
[optimize 1/3] Fix phi to min/max

2015-08-10 Thread Nathan Sidwell

Richard,

This patch fixes the problem I described earlier about min/max generation 
propagating incorrect range information 
(https://gcc.gnu.org/ml/gcc/2015-08/msg00024.html).


I went with creating a new name if the PHI being modified has more than two
edges.  I also modified the testcase that my min/max VRP patch tickled -- it
wasn't calling an init function and hence was copying zeroes rather than the
small values it thought it was copying.  I also added a new test for converting
zeroes, which was the failure mode I observed in the DFP lib.


This patch was tested in combination with the new min max optimization I'll 
post momentarily.


ok?

nathan
2015-08-10  Nathan Sidwell  

	* tree-ssa-phiopt.c (minmax_replacement): Create new ssa name if
	we're not the only contributor to target phi.

	testsuite/
	* c-c++-common/dfp/operator-comma.c: Call init function.
	* c-c++-common/dfp/convert-dfp-2.c: New test.

Index: tree-ssa-phiopt.c
===
--- tree-ssa-phiopt.c	(revision 226749)
+++ tree-ssa-phiopt.c	(working copy)
@@ -1277,8 +1277,16 @@ minmax_replacement (basic_block cond_bb,
   gsi_move_before (&gsi_from, &gsi);
 }
 
+  /* Create an SSA var to hold the min/max result.  If we're the only
+     things setting the target PHI, then we can clone the PHI
+     variable.  Otherwise we must create a new one.  */
+  result = PHI_RESULT (phi);
+  if (EDGE_COUNT (gimple_bb (phi)->preds) == 2)
+    result = duplicate_ssa_name (result, NULL);
+  else
+    result = make_ssa_name (TREE_TYPE (result));
+
   /* Emit the statement to compute min/max.  */
-  result = duplicate_ssa_name (PHI_RESULT (phi), NULL);
   new_stmt = gimple_build_assign (result, minmax, arg0, arg1);
   gsi = gsi_last_bb (cond_bb);
   gsi_insert_before (&gsi, new_stmt, GSI_NEW_STMT);
Index: testsuite/c-c++-common/dfp/convert-dfp-2.c
===
--- testsuite/c-c++-common/dfp/convert-dfp-2.c	(revision 0)
+++ testsuite/c-c++-common/dfp/convert-dfp-2.c	(working copy)
@@ -0,0 +1,45 @@
+/* { dg-options "-O0" } */
+
+/* Test decimal fp conversions of zero.  */
+
+#include "dfp-dbg.h"
+
+volatile _Decimal32 d32a, d32c;
+volatile _Decimal64 d64a, d64c;
+volatile _Decimal128 d128a, d128c;
+
+int
+main ()
+{
+  d32a = d32c;
+  if (d32a)
+    FAILURE
+  d32a = d64c;
+  if (d32a)
+    FAILURE
+  d32a = d128c;
+  if (d32a)
+    FAILURE
+
+  d64a = d32c;
+  if (d64a)
+    FAILURE
+  d64a = d64c;
+  if (d64a)
+    FAILURE
+  d64a = d128c;
+  if (d64a)
+    FAILURE
+  
+  d128a = d32c;
+  if (d128a)
+    FAILURE
+  d128a = d64c;
+  if (d128a)
+    FAILURE
+  d128a = d128c;
+  if (d128a)
+    FAILURE
+  
+  FINISH
+}
Index: testsuite/c-c++-common/dfp/operator-comma.c
===
--- testsuite/c-c++-common/dfp/operator-comma.c	(revision 226749)
+++ testsuite/c-c++-common/dfp/operator-comma.c	(working copy)
@@ -24,6 +24,8 @@ init ()
 int
 main ()
 {
+  init ();
+  
   d32a = (d32b, d32c);
   if (d32a != d32c)
     FAILURE


[optimize 2/3] Simplify vrp abs conversion

2015-08-10 Thread Nathan Sidwell

Richard,
In looking at how simplify_abs_using_ranges was doing its thing, as a guide to a
min/max VRP optimization, I noticed it was doing more work than necessary.


Firstly, it wasn't taking advantage of the range comparison functions only
returning TRUE or FALSE nodes when there's a definite answer, and NULL
otherwise.  Thus if we get a node, (a) we don't have to check that it is
either true or false, and (b) we only need to test for one of those values to
tell which answer was given.


Also, it was checking for 'NOT (A >= B)' by inverting the result of a '>='
check, rather than simply doing a '<' check.  (We're dealing with integer
ranges, so that's all well defined.)
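The simplified decision can be modelled outside GCC. The sketch below is hypothetical: a closed integer range stands in for value_range_t, a three-valued result stands in for compare_range_with_value returning a true node, a false node or NULL, and the strict-overflow (sop) bookkeeping is left out.

```c
#include <assert.h>

/* Three-valued answer: definitely true, definitely false, or unknown
   (GCC's compare_range_with_value returns NULL for the last case).  */
enum sketch_tri { SKETCH_FALSE, SKETCH_TRUE, SKETCH_UNKNOWN };

/* Hypothetical closed integer range [lo, hi]; not GCC's value_range_t.  */
struct sketch_range { long lo, hi; };

/* Is every value in R <= V?  */
static enum sketch_tri
sketch_range_le (struct sketch_range r, long v)
{
  assert (r.lo <= r.hi);
  if (r.hi <= v)
    return SKETCH_TRUE;
  if (r.lo > v)
    return SKETCH_FALSE;
  return SKETCH_UNKNOWN;
}

/* Is every value in R < V?  */
static enum sketch_tri
sketch_range_lt (struct sketch_range r, long v)
{
  assert (r.lo <= r.hi);
  if (r.hi < v)
    return SKETCH_TRUE;
  if (r.lo >= v)
    return SKETCH_FALSE;
  return SKETCH_UNKNOWN;
}

enum sketch_abs_action { SKETCH_KEEP_ABS, SKETCH_TO_NEGATE, SKETCH_TO_COPY };

/* Ask "<= 0" first; if undecided, ask "< 0" directly instead of
   inverting a ">= 0" answer.  A definite TRUE means the operand is
   never positive, so ABS (x) can become -x; a definite FALSE means it
   is never negative, so ABS (x) can become x.  */
static enum sketch_abs_action
sketch_simplify_abs (struct sketch_range r)
{
  enum sketch_tri val = sketch_range_le (r, 0);
  if (val == SKETCH_UNKNOWN)
    val = sketch_range_lt (r, 0);
  if (val == SKETCH_UNKNOWN)
    return SKETCH_KEEP_ABS;
  return val == SKETCH_TRUE ? SKETCH_TO_NEGATE : SKETCH_TO_COPY;
}
```

In this model, [-5, 0] is definitely <= 0, so ABS becomes a negation; [0, 7] is undecided for <= 0 but definitely not < 0, so ABS becomes a plain copy; [-5, 3] is undecided both ways and keeps its ABS. Because a decided answer is always exactly true or false, a single comparison against SKETCH_TRUE picks the rewrite, matching point (b) above.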


Finally, there's a useless check for TYPE_UNSIGNED, which ends up doing nothing.
AFAICT 'ABS (unsigned)' gets folded out very early on.


Bootstrapped and tested with the phi-min-max fix I just posted and the new VRP
min/max optimization I'm about to post.


ok?

nathan

2015-08-10  Nathan Sidwell  

	* tree-vrp.c (simplify_abs_using_ranges): Simplify.

Index: tree-vrp.c
===
--- tree-vrp.c	(revision 226749)
+++ tree-vrp.c	(working copy)
@@ -9152,37 +9215,25 @@ simplify_div_or_mod_using_ranges (gimple
 static bool
 simplify_abs_using_ranges (gimple stmt)
 {
-  tree val = NULL;
   tree op = gimple_assign_rhs1 (stmt);
-  tree type = TREE_TYPE (op);
   value_range_t *vr = get_value_range (op);
 
-  if (TYPE_UNSIGNED (type))
-    {
-      val = integer_zero_node;
-    }
-  else if (vr)
+  if (vr)
     {
+      tree val = NULL;
       bool sop = false;

       val = compare_range_with_value (LE_EXPR, vr, integer_zero_node, &sop);
       if (!val)
 	{
+	  /* The range is neither <= 0 nor > 0.  Now see if it is
+	     either < 0 or >= 0.  */
 	  sop = false;
-	  val = compare_range_with_value (GE_EXPR, vr, integer_zero_node,
+	  val = compare_range_with_value (LT_EXPR, vr, integer_zero_node,
 	  &sop);
-
-	  if (val)
-	    {
-	      if (integer_zerop (val))
-		val = integer_one_node;
-	      else if (integer_onep (val))
-		val = integer_zero_node;
-	    }
 	}

-      if (val
-	  && (integer_onep (val) || integer_zerop (val)))
+      if (val)
 	{
 	  if (sop && issue_strict_overflow_warning (WARN_STRICT_OVERFLOW_MISC))
 	    {
@@ -9198,10 +9249,10 @@ simplify_abs_using_ranges (gimple stmt)
 	    }

 	  gimple_assign_set_rhs1 (stmt, op);
-	  if (integer_onep (val))
-	    gimple_assign_set_rhs_code (stmt, NEGATE_EXPR);
-	  else
+	  if (integer_zerop (val))
 	    gimple_assign_set_rhs_code (stmt, SSA_NAME);
+	  else
+	    gimple_assign_set_rhs_code (stmt, NEGATE_EXPR);
 	  update_stmt (stmt);
 	  return true;
 	}


[optimize3/3] VRP min/max exprs

2015-08-10 Thread Nathan Sidwell

Richard,
This is the patch for the min/max optimization I was trying to implement before
getting sidetracked by the phi bug and the cleanup of the VRP abs optimization.


This patch handles both MIN and MAX, in the case where both operands have a
known range and in the case where the second operand is a constant.  When we
determine that the operand values are disjoint (modulo a possible single
overlapping value), we replace the MIN or MAX with the appropriate operand.
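The disjointness test and the choice of operand can be modelled in the same spirit. This is a sketch with hypothetical names, not GCC code: ranges are closed integer intervals, a constant second operand is the one-point range [c, c], strict-overflow handling is omitted, and the final selection reproduces the patch's ((code == MAX_EXPR) == integer_zerop (val)) ? op0 : op1.

```c
#include <assert.h>

/* Three-valued answer: definitely true, definitely false, or unknown
   (the patch's NULL result).  */
enum sketch_tri { SKETCH_FALSE, SKETCH_TRUE, SKETCH_UNKNOWN };

/* Hypothetical closed integer range [lo, hi].  */
struct sketch_range { long lo, hi; };

enum sketch_op { SKETCH_MIN, SKETCH_MAX };
enum sketch_pick { SKETCH_KEEP, SKETCH_PICK_OP0, SKETCH_PICK_OP1 };

/* Is every value in R0 <= every value in R1?  */
static enum sketch_tri
sketch_ranges_le (struct sketch_range r0, struct sketch_range r1)
{
  assert (r0.lo <= r0.hi && r1.lo <= r1.hi);
  if (r0.hi <= r1.lo)
    return SKETCH_TRUE;
  if (r0.lo > r1.hi)
    return SKETCH_FALSE;
  return SKETCH_UNKNOWN;
}

/* Is every value in R0 < every value in R1?  */
static enum sketch_tri
sketch_ranges_lt (struct sketch_range r0, struct sketch_range r1)
{
  assert (r0.lo <= r0.hi && r1.lo <= r1.hi);
  if (r0.hi < r1.lo)
    return SKETCH_TRUE;
  if (r0.lo >= r1.hi)
    return SKETCH_FALSE;
  return SKETCH_UNKNOWN;
}

/* Decide MIN/MAX (op0, op1) from the operand ranges.  */
static enum sketch_pick
sketch_simplify_min_max (enum sketch_op code, struct sketch_range r0,
			 struct sketch_range r1)
{
  enum sketch_tri val = sketch_ranges_le (r0, r1);
  if (val == SKETCH_UNKNOWN)
    val = sketch_ranges_lt (r0, r1);
  if (val == SKETCH_UNKNOWN)
    return SKETCH_KEEP;

  /* TRUE: op0 <= op1 (or <).  FALSE: op0 > op1 (or >=).  MAX keeps
     op0 exactly when the answer is FALSE; MIN keeps op0 when TRUE.  */
  return ((code == SKETCH_MAX) == (val == SKETCH_FALSE))
	 ? SKETCH_PICK_OP0 : SKETCH_PICK_OP1;
}
```

For instance, with x in [-1000, 9] and y in [10, 1000] (a narrowed version of foo1 and foo2 in vrp-min-max-1.c), MIN picks x and MAX picks y. Ranges that touch at one point, such as [0, 10] and [10, 20], are still decided, which is the "possible single overlapping value" mentioned above. For the clamp in vrp-min-max-2.c, X is unconstrained, so neither MAX (X, 0) nor the MIN against 191 that follows is decided, and both survive.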


Bootstrapped and tested with the other two patches I just posted.

ok?

nathan
2015-08-10  Nathan Sidwell  

	* tree-vrp.c (simplify_min_or_max_using_ranges): New.
	(simplify_stmt_using_ranges): Simplify MIN and MAX exprs.

	testsuite/
	* gcc.dg/vrp-min-max-1.c: New.
	* gcc.dg/vrp-min-max-2.c: New.

Index: tree-vrp.c
===
--- tree-vrp.c	(revision 226749)
+++ tree-vrp.c	(working copy)
@@ -9145,6 +9145,69 @@ simplify_div_or_mod_using_ranges (gimple
   return false;
 }
 
+/* Simplify a min or max if the ranges of the two operands are
+   disjoint.   Return true if we do simplify.  */
+
+static bool
+simplify_min_or_max_using_ranges (gimple stmt)
+{
+  tree op0 = gimple_assign_rhs1 (stmt);
+  tree op1 = gimple_assign_rhs2 (stmt);
+  bool sop = false;
+  tree val;
+  value_range_t *vr0 = get_value_range (op0);
+
+  if (TREE_CODE (op1) == SSA_NAME)
+    {
+      /* SSA_NAME MIN/MAX SSA_NAME.  Compare ranges.  */
+      value_range_t *vr1 = get_value_range (op1);
+
+      val = compare_ranges (LE_EXPR, vr0, vr1, &sop);
+      if (!val)
+	{
+	  sop = false;
+	  val = compare_ranges (LT_EXPR, vr0, vr1, &sop);
+	}
+    }
+  else
+    {
+      /* SSA_NAME MIN/MAX CONST.  Compare range to value.  */
+      val = compare_range_with_value (LE_EXPR, vr0, op1, &sop);
+      if (!val)
+	{
+	  sop = false;
+	  val = compare_range_with_value (LT_EXPR, vr0, op1, &sop);
+	}
+    }
+
+  if (val)
+    {
+      if (sop && issue_strict_overflow_warning (WARN_STRICT_OVERFLOW_MISC))
+	{
+	  location_t location;
+
+	  if (!gimple_has_location (stmt))
+	    location = input_location;
+	  else
+	    location = gimple_location (stmt);
+	  warning_at (location, OPT_Wstrict_overflow,
+		      "assuming signed overflow does not occur when "
+		      "simplifying % to % or %");
+	}
+
+      /* VAL == TRUE -> OP0 < or <= op1
+	 VAL == FALSE -> OP0 > or >= op1.  */
+      tree res = ((gimple_assign_rhs_code (stmt) == MAX_EXPR)
+		  == integer_zerop (val)) ? op0 : op1;
+      gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
+      gimple_assign_set_rhs_from_tree (&gsi, res);
+      update_stmt (stmt);
+      return true;
+    }
+
+  return false;
+}
+
 /* If the operand to an ABS_EXPR is >= 0, then eliminate the
ABS_EXPR.  If the operand is <= 0, then simplify the
ABS_EXPR into a NEGATE_EXPR.  */
@@ -,6 +10050,13 @@ simplify_stmt_using_ranges (gimple_stmt_
 	    return simplify_float_conversion_using_ranges (gsi, stmt);
 	  break;

+	case MIN_EXPR:
+	case MAX_EXPR:
+	  if (TREE_CODE (rhs1) == SSA_NAME
+	      && INTEGRAL_TYPE_P (TREE_TYPE (rhs1)))
+	    return simplify_min_or_max_using_ranges (stmt);
+	  break;
+
 	default:
 	  break;
 	}
Index: testsuite/gcc.dg/vrp-min-max-1.c
===
--- testsuite/gcc.dg/vrp-min-max-1.c	(revision 0)
+++ testsuite/gcc.dg/vrp-min-max-1.c	(working copy)
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-vrp1 -fdump-tree-mergephi2" } */
+
+int bar (void);
+
+int foo1 (int x, int y)
+{
+  if (y < 10) return bar ();
+  if (x > 9) return bar ();
+
+  return x < y ? x : y;
+}
+
+int foo2 (int x, int y)
+{
+  if (y < 10) return bar ();
+  if (x > 9) return bar ();
+
+  return x > y ? x : y;
+}
+
+/* We expect to optimize min/max in VRP.  */
+
+/* { dg-final { scan-tree-dump-times "MIN_EXPR" 1 "mergephi2" } } */
+/* { dg-final { scan-tree-dump-times "MAX_EXPR" 1 "mergephi2" } } */
+/* { dg-final { scan-tree-dump-not "MIN_EXPR" "vrp1" } } */
+/* { dg-final { scan-tree-dump-not "MAX_EXPR" "vrp1" } } */
Index: testsuite/gcc.dg/vrp-min-max-2.c
===
--- testsuite/gcc.dg/vrp-min-max-2.c	(revision 0)
+++ testsuite/gcc.dg/vrp-min-max-2.c	(working copy)
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-vrp2" } */
+
+int Foo (int X)
+{
+  if (X < 0)
+    X = 0;
+  if (X > 191)
+    X = 191;
+
+  return X << 23;
+}
+
+/* We expect this min/max pair to survive.  */
+
+/* { dg-final { scan-tree-dump-times "MIN_EXPR" 1 "vrp2" } } */
+/* { dg-final { scan-tree-dump-times "MAX_EXPR" 1 "vrp2" } } */


Re: [COMMITTED][AArch64] Improve TLS Descriptor pattern to release RTL loop IV opt

2015-08-10 Thread Jiong Wang

Andreas Schwab writes:

> Jiong Wang  writes:
>
>> Andreas Schwab writes:
>>
>>> Jiong Wang  writes:
>>>
>>>> And I just finished two rounds of native aarch64 build/check with and without my patch.
>>>
>>> Did you rebuild everything?
>>
>> No.
>
> Please do.

Andreas,

  I just finished several rounds of rebuilding and testing in a clean
  environment.

  The problem should have gone away with my new tlsdesc patch.

  I did see lots of failures with my old tlsdesc patch, though.

  Please let me know if you still have problems after my patch
  correction.

  Thanks.
  
-- 
Regards,
Jiong



Re: [PATCH] Fix PR67053

2015-08-10 Thread Marc Glisse

On Wed, 29 Jul 2015, Richard Biener wrote:


> The following fixes PR67053 by more closely mirroring what fold_binary's
> STRIP_NOPS does, to avoid a regression in the C++ FE constexpr code.
>
> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
>
> Yes, I'm thinking about an automated way to more closely mirror
> STRIP_[SIGN_]NOPS behavior (on toplevel args).


As far as I can see, you are not currently checking that these conversions 
are NOPs. I didn't test, but I am afraid this may simplify 
(char)p1==(char)p2 to false a bit too quickly.
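Marc's concern can be made concrete. On a target with 8-bit char and a flat address space, two distinct addresses 256 bytes apart share their low 8 bits, so comparing them after a truncating conversion yields true even though the pointers differ; folding such a comparison to false because the underlying addresses differ would be wrong. The sketch below is hypothetical (the names are made up) and spells the conversion as (unsigned char) (uintptr_t) p rather than a direct pointer-to-char cast, which compilers typically warn about.

```c
#include <assert.h>
#include <stdint.h>

/* Two addresses inside this buffer can be exactly 256 bytes apart.  */
static char sketch_buf[257];

/* Compare the low 8 bits of two addresses, roughly what a truncating
   (char) p1 == (char) p2 compares on a target with 8-bit char.  The
   volatile temporaries keep the compiler from folding the comparison,
   so the run-time address values are what get compared.  Converting a
   pointer to uintptr_t is implementation-defined; a flat address space
   is assumed.  */
static int
sketch_low_bytes_equal (const void *p1, const void *p2)
{
  assert (p1 && p2);
  volatile uintptr_t a = (uintptr_t) p1;
  volatile uintptr_t b = (uintptr_t) p2;
  return (unsigned char) a == (unsigned char) b;
}
```

Here &sketch_buf[0] and &sketch_buf[256] are different pointers, yet sketch_low_bytes_equal reports their truncated values as equal.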



> Richard.
>
> 2015-07-29  Richard Biener  
>
>	PR middle-end/67053
>	* match.pd: Allow both operands to independently have conversion
>	when simplifying compares of addresses.
>
> Index: gcc/match.pd
> ===
> --- gcc/match.pd	(revision 226345)
> +++ gcc/match.pd	(working copy)
> @@ -1814,7 +1814,7 @@ (define_operator_list CBRT BUILT_IN_CBRT
>     enough to make fold_stmt not regress when not dispatching to fold_binary.  */
>  (for cmp (simple_comparison)
>   (simplify
> -  (cmp (convert?@2 addr@0) (convert? addr@1))
> +  (cmp (convert1?@2 addr@0) (convert2? addr@1))
>    (with
>     {
>       HOST_WIDE_INT off0, off1;



--
Marc Glisse


Re: [PR64164] drop copyrename, integrate into expand

2015-08-10 Thread Patrick Marlier
On Mon, Aug 10, 2015 at 5:14 PM, Jeff Law  wrote:
> On 08/10/2015 02:23 AM, James Greenhalgh wrote:
>>
>> On Tue, Aug 04, 2015 at 12:45:28AM +0100, Alexandre Oliva wrote:
>>>
>>> On Jul 30, 2015, "H.J. Lu"  wrote:
>>>
>>>> aoliva/pr64164  is fine on x32.
>>>
>>>
>>> Thanks.  I have made a large number of changes since you tested it,
>>> fixing all the reported issues and then some.  Now, x86_64-linux-gnu
>>> (-m64 and -m32), i686-pc-linux-gnu, powerpc64-linux-gnu and
>>> powerpc64el-linux-gnu pass regstrap (r226317), and the many tens of
>>> targets I cross-tested still get the same 'make all' errors that the
>>> pristine tree did.
>>
>>
>> For what it is worth, I bootstrapped and tested the consolidated patch
>> on arm-none-linux-gnueabihf and aarch64-none-linux-gnu with trunk at
>> r226516 over the weekend, and didn't see any new issues.
>
> Thanks -- I know it's been a long road on this patch.  I don't think anyone
> would have ever guessed fixing 64164 would be so complex.

As the bug reporter especially, I am impressed that such a slight problem
could lead to such a patch! ;)
Thanks a lot, Alexandre!

I feel like I owe you something for this hard work!
Feel free to ping me if I can help you with anything, or at the very least I
owe you a beer when you are around Switzerland. :)
--
Pat


Re: Fix two more memory leaks in threader

2015-08-10 Thread Uros Bizjak
Hello!

> +2015-08-03  Jeff Law  
> +
> +	PR middle-end/66314
> +	PR gcov-profile/66899
> +	* tree-ssa-threadupdate.c (mark_threaded_blocks): Correctly
> +	iterate over the jump threading paths when an element in the
> +	jump threading paths array is eliminated.
> +
>  2015-08-03  Segher Boessenkool  
> 
>  	* Makefile.in (OBJS): Put gimple-match.o and generic-match.o first.
> diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
> index a403767..0a841b5 100644
> --- a/gcc/testsuite/ChangeLog
> +++ b/gcc/testsuite/ChangeLog
> @@ -1,3 +1,10 @@
> +2015-08-03  Jeff Law  
> +
> +	PR middle-end/66314
> +	PR gcov-profile/66899
> +	* gcc.dg/pr66899.c: New test.
> +	* gcc.dg/pr66314.c: New test.

The gcc.dg/pr66314.c testcase should go into the gcc.dg/asan directory.
On targets where -fsanitize=address or -fsanitize=kernel-address is not
supported, the testcase now emits a warning about an unsupported feature.

Uros.