[PATCH] driver: Also prune joined switches with negation

2019-09-23 Thread Matt Turner
When -march=native is expanded by host_detect_local_cpu and passed to the
backend, it overrides every -march= option after it.  That means

$ gcc -march=native -march=armv8-a

is treated as

$ gcc -march=armv8-a -march=native

Prune joined switches with Negative and RejectNegative to allow
-march=armv8-a to override previous -march=native on command-line.

This is the same fix as was applied for i386 in SVN revision 269164 but for
aarch64 and arm.
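
The intended "last one wins" behaviour can be sketched in plain C. This is
only an illustration of the rule, not the driver's actual pruning code;
the helper name last_joined_value is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of GCC: return the value of the last
   argument that starts with PREFIX (for example "-march="), or NULL
   if no argument matches.  */
const char *
last_joined_value (int argc, const char **argv, const char *prefix)
{
  assert (argv != NULL && prefix != NULL);

  size_t len = strlen (prefix);
  const char *value = NULL;

  for (int i = 0; i < argc; i++)
    if (strncmp (argv[i], prefix, len) == 0)
      value = argv[i] + len;  /* A later occurrence replaces an earlier one.  */

  return value;
}
```

Called on "gcc -march=native -O2 -march=armv8-a" with the prefix "-march=",
the helper yields "armv8-a", which is the result the patch aims for.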

gcc/

PR driver/69471
* config/aarch64/aarch64.opt (march=): Add Negative(march=).
(mtune=): Add Negative(mtune=).
* config/arm/arm.opt: Likewise.
---
 gcc/config/aarch64/aarch64.opt | 5 +++--
 gcc/config/arm/arm.opt | 4 ++--
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
index 865b6a6d8ca..908dca23b3c 100644
--- a/gcc/config/aarch64/aarch64.opt
+++ b/gcc/config/aarch64/aarch64.opt
@@ -119,7 +119,8 @@ EnumValue
 Enum(aarch64_tls_size) String(48) Value(48)
 
 march=
-Target RejectNegative ToLower Joined Var(aarch64_arch_string)
+Target RejectNegative Negative(march=) ToLower Joined Var(aarch64_arch_string)
+
 Use features of architecture ARCH.
 
 mcpu=
@@ -127,7 +128,7 @@ Target RejectNegative ToLower Joined Var(aarch64_cpu_string)
 Use features of and optimize for CPU.
 
 mtune=
-Target RejectNegative ToLower Joined Var(aarch64_tune_string)
+Target RejectNegative Negative(mtune=) ToLower Joined Var(aarch64_tune_string)
 Optimize for CPU.
 
 mabi=
diff --git a/gcc/config/arm/arm.opt b/gcc/config/arm/arm.opt
index 452f0cf6d67..e3ead5c95d1 100644
--- a/gcc/config/arm/arm.opt
+++ b/gcc/config/arm/arm.opt
@@ -82,7 +82,7 @@ mapcs-stack-check
 Target Report Mask(APCS_STACK) Undocumented
 
 march=
-Target RejectNegative ToLower Joined Var(arm_arch_string)
+Target RejectNegative Negative(march=) ToLower Joined Var(arm_arch_string)
 Specify the name of the target architecture.
 
 ; Other arm_arch values are loaded from arm-tables.opt
@@ -232,7 +232,7 @@ Target Report Mask(TPCS_LEAF_FRAME)
 Thumb: Generate (leaf) stack frames even if not needed.
 
 mtune=
-Target RejectNegative ToLower Joined Var(arm_tune_string)
+Target RejectNegative Negative(mtune=) ToLower Joined Var(arm_tune_string)
 Tune code for the given processor.
 
 mprint-tune-info
-- 
2.21.0



[PATCH] driver: Also prune joined switches with negation

2019-09-24 Thread Matt Turner
When -march=native is expanded by host_detect_local_cpu and passed to the
backend, it overrides every -march= option after it.  That means

$ gcc -march=native -march=armv8-a

is treated as

$ gcc -march=armv8-a -march=native

Prune joined switches with Negative and RejectNegative to allow
-march=armv8-a to override previous -march=native on command-line.

This is the same fix as was applied for i386 in SVN revision 269164 but for
aarch64 and arm.

gcc/

PR driver/69471
* config/aarch64/aarch64.opt (march=): Add Negative(march=).
(mtune=): Add Negative(mtune=). (mcpu=): Add Negative(mcpu=).
* config/arm/arm.opt: Likewise.
---
 gcc/config/aarch64/aarch64.opt | 6 +++---
 gcc/config/arm/arm.opt | 6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
index 865b6a6d8ca..fc43428b32a 100644
--- a/gcc/config/aarch64/aarch64.opt
+++ b/gcc/config/aarch64/aarch64.opt
@@ -119,15 +119,15 @@ EnumValue
 Enum(aarch64_tls_size) String(48) Value(48)
 
 march=
-Target RejectNegative ToLower Joined Var(aarch64_arch_string)
+Target RejectNegative Negative(march=) ToLower Joined Var(aarch64_arch_string)
 Use features of architecture ARCH.
 
 mcpu=
-Target RejectNegative ToLower Joined Var(aarch64_cpu_string)
+Target RejectNegative Negative(mcpu=) ToLower Joined Var(aarch64_cpu_string)
 Use features of and optimize for CPU.
 
 mtune=
-Target RejectNegative ToLower Joined Var(aarch64_tune_string)
+Target RejectNegative Negative(mtune=) ToLower Joined Var(aarch64_tune_string)
 Optimize for CPU.
 
 mabi=
diff --git a/gcc/config/arm/arm.opt b/gcc/config/arm/arm.opt
index 452f0cf6d67..76c10ab62a2 100644
--- a/gcc/config/arm/arm.opt
+++ b/gcc/config/arm/arm.opt
@@ -82,7 +82,7 @@ mapcs-stack-check
 Target Report Mask(APCS_STACK) Undocumented
 
 march=
-Target RejectNegative ToLower Joined Var(arm_arch_string)
+Target RejectNegative Negative(march=) ToLower Joined Var(arm_arch_string)
 Specify the name of the target architecture.
 
 ; Other arm_arch values are loaded from arm-tables.opt
@@ -107,7 +107,7 @@ Target Report Mask(CALLER_INTERWORKING)
 Thumb: Assume function pointers may go to non-Thumb aware code.
 
 mcpu=
-Target RejectNegative ToLower Joined Var(arm_cpu_string)
+Target RejectNegative Negative(mcpu=) ToLower Joined Var(arm_cpu_string)
 Specify the name of the target CPU.
 
 mfloat-abi=
@@ -232,7 +232,7 @@ Target Report Mask(TPCS_LEAF_FRAME)
 Thumb: Generate (leaf) stack frames even if not needed.
 
 mtune=
-Target RejectNegative ToLower Joined Var(arm_tune_string)
+Target RejectNegative Negative(mtune=) ToLower Joined Var(arm_tune_string)
 Tune code for the given processor.
 
 mprint-tune-info
-- 
2.21.0



Re: [PATCH] driver: Also prune joined switches with negation

2019-09-24 Thread Matt Turner
On Tue, Sep 24, 2019 at 1:24 AM Kyrill Tkachov
 wrote:
>
> Hi Matt,
>
> On 9/24/19 5:04 AM, Matt Turner wrote:
> > When -march=native is passed to host_detect_local_cpu to the backend,
> > it overrides all command lines after it.  That means
> >
> > $ gcc -march=native -march=armv8-a
> >
> > is treated as
> >
> > $ gcc -march=armv8-a -march=native
> >
> > Prune joined switches with Negative and RejectNegative to allow
> > -march=armv8-a to override previous -march=native on command-line.
> >
> > This is the same fix as was applied for i386 in SVN revision 269164
> > but for
> > aarch64 and arm.
> >
> The fix is ok for arm and LGTM for aarch64 FWIW.

Thanks!

> How has this been tested?

The problem was noticed in this bug report:

   https://bugs.gentoo.org/693522

I remembered seeing the i386 fix and I separately encountered the
problem on ARM when building the pixman library which has iwMMXt code
which requires march=iwmmxt (Could I bribe someone into fixing that by
giving gcc an -miwmmxt flag?)

I verified the fix works by patching gcc and seeing that nss (the
package from the Gentoo bug report) successfully builds with
CFLAGS="-march=native -O2 -pipe"

SVN revision 269164 also added some tests to the gcc test suite, but I
am not sufficiently familiar with building gcc and running the test
suite to verify that any test I speculatively add actually works.

> However...
>
>
> > gcc/
> >
> > PR driver/69471
> > * config/aarch64/aarch64.opt (march=): Add Negative(march=).
> > (mtune=): Add Negative(mtune=).
> > * config/arm/arm.opt: Likewise.
> > ---
> >  gcc/config/aarch64/aarch64.opt | 5 +++--
> >  gcc/config/arm/arm.opt | 4 ++--
> >  2 files changed, 5 insertions(+), 4 deletions(-)
> >
> > diff --git a/gcc/config/aarch64/aarch64.opt
> > b/gcc/config/aarch64/aarch64.opt
> > index 865b6a6d8ca..908dca23b3c 100644
> > --- a/gcc/config/aarch64/aarch64.opt
> > +++ b/gcc/config/aarch64/aarch64.opt
> > @@ -119,7 +119,8 @@ EnumValue
> >  Enum(aarch64_tls_size) String(48) Value(48)
> >
> >  march=
> > -Target RejectNegative ToLower Joined Var(aarch64_arch_string)
> > +Target RejectNegative Negative(march=) ToLower Joined
> > Var(aarch64_arch_string)
> > +
> >  Use features of architecture ARCH.
> >
> >  mcpu=
>
>
> ... Looks like we'll need something similar for -mcpu. On arm and
> aarch64 the -mcpu is the most commonly used option and that can also
> take a "native" value that would suffer from the same issue I presume.

Thank you. I've sent a second version with this addressed in reply to
my initial patch.

If the patch is okay, I think we'd appreciate it if it were backported
to the gcc-8 branch as well.


[PATCH 1/2] i386: Consider Kaby Lake to be equivalent to Skylake

2017-06-16 Thread Matt Turner
Currently -march=native selects -march=broadwell on Kaby Lake systems,
since its model numbers are missing from the switch statement. It falls
back to the default case and chooses -march=broadwell because of the
presence of the ADX instruction set.
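
A reduced sketch of the selection logic may make the failure mode clearer.
It is not the real host_detect_local_cpu: the function name and the
two-argument signature are hypothetical, and only the model numbers
touched by this patch are listed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the model switch.  */
const char *
sketch_cpu_from_model (unsigned int model, bool has_adx)
{
  switch (model)
    {
    case 0x4e:
    case 0x5e:
      /* Skylake.  */
    case 0x8e:
    case 0x9e:
      /* Kaby Lake, added by this patch.  */
      return "skylake";
    default:
      /* Unknown model: guess from feature bits instead.  */
      return has_adx ? "broadwell" : NULL;
    }
}

/* Usage check: Kaby Lake models no longer reach the default case.  */
void
sketch_check_model (void)
{
  assert (strcmp (sketch_cpu_from_model (0x9e, true), "skylake") == 0);
  assert (strcmp (sketch_cpu_from_model (0xff, true), "broadwell") == 0);
}
```

Without the 0x8e and 0x9e cases, a Kaby Lake model would take the default
branch and be reported as broadwell, as described above.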

gcc/
* config/i386/driver-i386.c (host_detect_local_cpu): Add Kaby
Lake models to skylake case.

gcc/testsuite/

* gcc.target/i386/builtin_target.c: Add Kaby Lake models to
skylake check.

libgcc/

* config/i386/cpuinfo.c (get_intel_cpu): Add Kaby Lake models to
skylake case.
---
 gcc/config/i386/driver-i386.c  | 3 +++
 gcc/testsuite/gcc.target/i386/builtin_target.c | 3 +++
 libgcc/config/i386/cpuinfo.c   | 3 +++
 3 files changed, 9 insertions(+)

diff --git a/gcc/config/i386/driver-i386.c b/gcc/config/i386/driver-i386.c
index 6c812514239..09faad0af0e 100644
--- a/gcc/config/i386/driver-i386.c
+++ b/gcc/config/i386/driver-i386.c
@@ -781,6 +781,9 @@ const char *host_detect_local_cpu (int argc, const char **argv)
case 0x4e:
case 0x5e:
  /* Skylake.  */
+   case 0x8e:
+   case 0x9e:
+ /* Kaby Lake.  */
  cpu = "skylake";
  break;
case 0x57:
diff --git a/gcc/testsuite/gcc.target/i386/builtin_target.c b/gcc/testsuite/gcc.target/i386/builtin_target.c
index 374f0292453..9c190eb7ebc 100644
--- a/gcc/testsuite/gcc.target/i386/builtin_target.c
+++ b/gcc/testsuite/gcc.target/i386/builtin_target.c
@@ -88,6 +88,9 @@ check_intel_cpu_model (unsigned int family, unsigned int model,
case 0x4e:
case 0x5e:
  /* Skylake.  */
+   case 0x8e:
+   case 0x9e:
+ /* Kaby Lake.  */
  assert (__builtin_cpu_is ("corei7"));
  assert (__builtin_cpu_is ("skylake"));
  break;
diff --git a/libgcc/config/i386/cpuinfo.c b/libgcc/config/i386/cpuinfo.c
index a1dc011525f..b008fb6e396 100644
--- a/libgcc/config/i386/cpuinfo.c
+++ b/libgcc/config/i386/cpuinfo.c
@@ -183,6 +183,9 @@ get_intel_cpu (unsigned int family, unsigned int model, unsigned int brand_id)
case 0x4e:
case 0x5e:
  /* Skylake.  */
+   case 0x8e:
+   case 0x9e:
+ /* Kaby Lake.  */
  __cpu_model.__cpu_type = INTEL_COREI7;
  __cpu_model.__cpu_subtype = INTEL_COREI7_SKYLAKE;
  break;
-- 
2.13.0



[PATCH 2/2] i386: Assume Skylake for unknown models with clflushopt

2017-06-16 Thread Matt Turner
gcc/
* config/i386/driver-i386.c (host_detect_local_cpu): Assume
skylake for unknown models with clflushopt.
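
The order of the feature tests matters here, so a small sketch of the
patched fallback chain follows. It models only the three tests visible in
the hunk below; the function name is hypothetical and the remaining
fallbacks of the real code are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reduction of the default-case fallback, in the order
   the patched code tests the feature bits.  */
const char *
sketch_cpu_from_features (bool has_avx512f, bool has_clflushopt, bool has_adx)
{
  if (has_avx512f)
    return "knl";        /* Assume Knights Landing.  */
  else if (has_clflushopt)
    return "skylake";    /* Assume Skylake (the new test).  */
  else if (has_adx)
    return "broadwell";  /* Assume Broadwell.  */
  return NULL;           /* Remaining fallbacks are omitted in this sketch.  */
}

/* Usage check: a CPU with both CLFLUSHOPT and ADX now maps to skylake.  */
void
sketch_check_fallback (void)
{
  assert (strcmp (sketch_cpu_from_features (false, true, true), "skylake") == 0);
  assert (strcmp (sketch_cpu_from_features (false, false, true), "broadwell") == 0);
}
```

Because the clflushopt test comes before the adx test, an unknown model
that has both features is treated as skylake rather than broadwell.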
---
 gcc/config/i386/driver-i386.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/config/i386/driver-i386.c b/gcc/config/i386/driver-i386.c
index 09faad0af0e..570c49031bd 100644
--- a/gcc/config/i386/driver-i386.c
+++ b/gcc/config/i386/driver-i386.c
@@ -797,6 +797,9 @@ const char *host_detect_local_cpu (int argc, const char **argv)
  /* Assume Knights Landing.  */
  if (has_avx512f)
cpu = "knl";
+ /* Assume Skylake.  */
+ else if (has_clflushopt)
+   cpu = "skylake";
  /* Assume Broadwell.  */
  else if (has_adx)
cpu = "broadwell";
-- 
2.13.0



Re: [PATCH 1/2] i386: Consider Kaby Lake to be equivalent to Skylake

2017-06-22 Thread Matt Turner
On Sun, Jun 18, 2017 at 10:56 AM, Uros Bizjak  wrote:
> On Fri, Jun 16, 2017 at 11:42 PM, Matt Turner  wrote:
>> Currently -march=native selects -march=broadwell on Kaby Lake systems,
>> since its model numbers are missing from the switch statement. It falls
>> back to the default case and chooses -march=broadwell because of the
>> presence of the ADX instruction set.
>>
>> gcc/
>> * config/i386/driver-i386.c (host_detect_local_cpu): Add Kaby
>> Lake models to skylake case.
>>
>> gcc/testsuite/
>>
>> * gcc.target/i386/builtin_target.c: Add Kaby Lake models to
>> skylake check.
>>
>> libgcc/
>>
>> * config/i386/cpuinfo.c (get_intel_cpu): Add Kaby Lake models to
>> skylake case.
>
> OK.
>
> Thanks,
> Uros.

Thank you very much. I do not have write access, so please check the
patches in for me if you would not mind.


Re: [PATCH ARM iWMMXt 0/5] Improve iWMMXt support

2013-01-27 Thread Matt Turner
On Tue, Jun 26, 2012 at 7:56 AM, nick clifton  wrote:
> Hi Matt,
>
>
>> There's also a trivial documentation fix:
>>
>> [PATCH 1/2] doc: Correct __builtin_arm_tinsr prototype documentation
>>
>> and a test to exercise the intrinsics:
>>
>> [PATCH 2/2] arm: add iwMMXt mmx-2.c test
>
>
> These have both been checked in.
>
> It turns out that both needed minor updates as some of the builtins have
> changed since these patches were written.  I have taken care of this
> however.
>
> Cheers
>   Nick

Hi Nick,

Could this patch, or perhaps the much smaller one I attached to bug
35294 be committed to the 4.7 branch?

Also, could you close its duplicates, bugs 36798 and 36966?

Thanks,
Matt


[PATCH] mips: Document r4700

2013-02-22 Thread Matt Turner
2013-02-22  Matt Turner  

gcc/
* doc/invoke.texi: Document r4700.
---
 gcc/doc/invoke.texi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 7d96467..63eb6a6 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -15867,7 +15867,7 @@ The processor names are:
 @samp{octeon}, @samp{octeon+}, @samp{octeon2},
 @samp{orion},
 @samp{r2000}, @samp{r3000}, @samp{r3900}, @samp{r4000}, @samp{r4400},
-@samp{r4600}, @samp{r4650}, @samp{r6000}, @samp{r8000},
+@samp{r4600}, @samp{r4650}, @samp{r4700}, @samp{r6000}, @samp{r8000},
 @samp{rm7000}, @samp{rm9000},
 @samp{r1}, @samp{r12000}, @samp{r14000}, @samp{r16000},
 @samp{sb1},
-- 
1.7.12.4



Re: [PATCH ARM iWMMXt 0/5] Improve iWMMXt support

2012-06-13 Thread Matt Turner
On Wed, Jun 13, 2012 at 3:26 AM, nick clifton  wrote:
> Hi Matt, Hi Xinyu,
>
>
>> This series was written by Marvell and sent by Xinyu Qi
>> a number of times in the last year.
>
>
> Sorry for the long delay in reviewing these patches.  Overall they were
> fine, with only a few, very minor, formatting issues.  I have committed the
> entire series of patches to the mainline.

Great! Thank you so much! Thanks to Ramana for the reviews!

>> For 4.7 and 4.6 please consider committing my patch
>> "[PATCH] arm: Fix iwmmxt shift and logical intrinsics (PR 35294)."
>> which only fixes the logical and shift intrinsics.

Sounds good.

There's also a trivial documentation fix:

[PATCH 1/2] doc: Correct __builtin_arm_tinsr prototype documentation

and a test to exercise the intrinsics:

[PATCH 2/2] arm: add iwMMXt mmx-2.c test

Thanks a lot!

Matt


Re: [PATCH ARM iWMMXt 0/5] Improve iWMMXt support

2012-06-27 Thread Matt Turner
On Tue, Jun 26, 2012 at 10:56 AM, nick clifton  wrote:
> Hi Matt,
>
>
>> There's also a trivial documentation fix:
>>
>> [PATCH 1/2] doc: Correct __builtin_arm_tinsr prototype documentation
>>
>> and a test to exercise the intrinsics:
>>
>> [PATCH 2/2] arm: add iwMMXt mmx-2.c test
>
>
> These have both been checked in.
>
> It turns out that both needed minor updates as some of the builtins have
> changed since these patches were written.  I have taken care of this
> however.
>
> Cheers
>  Nick

Thanks a lot, Nick!


Re: [PING] iwMMXt patches

2012-05-02 Thread Matt Turner
On Tue, Apr 17, 2012 at 4:17 PM, Matt Turner  wrote:
> Are these patches ready to go in? It looks like they were ack'd.
>
> http://gcc.gnu.org/ml/gcc-patches/2011-10/msg01815.html
> http://gcc.gnu.org/ml/gcc-patches/2011-10/msg01817.html
> http://gcc.gnu.org/ml/gcc-patches/2011-10/msg01816.html
> http://gcc.gnu.org/ml/gcc-patches/2011-10/msg01818.html
> http://gcc.gnu.org/ml/gcc-patches/2011-10/msg01819.html
>
> We (OLPC) will need these patches for reasonable iwMMXt performance
> and the ability to use VFP and iwMMXt together.
>
> Thanks,
> Matt

Xinyu,

With these patches I don't see a new -mcpu flag. Isn't a tune/cpu flag
the normal way to activate this code?

Other .md files have statements like (eq_attr "tune" "cortexa8"), but
I don't see how to turn on the marvell-f-iwmmxt attribute, ie (eq_attr
"marvell_f_iwmmxt" "yes").

Please let me know.

Thanks,
Matt


Re: [PING] iwMMXt patches

2012-05-03 Thread Matt Turner
On Thu, May 3, 2012 at 12:59 AM, Xinyu Qi  wrote:
>> From: Matt Turner [mailto:matts...@gmail.com]
>> To: Xinyu Qi
>> Cc: Ramana Radhakrishnan; GCC Patches
>> Subject: Re: [PING] iwMMXt patches
>>
>> On Tue, Apr 17, 2012 at 4:17 PM, Matt Turner  wrote:
>> > Are these patches ready to go in? It looks like they were ack'd.
>> >
>> > http://gcc.gnu.org/ml/gcc-patches/2011-10/msg01815.html
>> > http://gcc.gnu.org/ml/gcc-patches/2011-10/msg01817.html
>> > http://gcc.gnu.org/ml/gcc-patches/2011-10/msg01816.html
>> > http://gcc.gnu.org/ml/gcc-patches/2011-10/msg01818.html
>> > http://gcc.gnu.org/ml/gcc-patches/2011-10/msg01819.html
>> >
>> > We (OLPC) will need these patches for reasonable iwMMXt performance
>> > and the ability to use VFP and iwMMXt together.
>> >
>> > Thanks,
>> > Matt
>>
>> Xinyu,
>>
>> With these patches I don't see a new -mcpu flag. Isn't a tune/cpu flag
>> the normal way to activate this code?
>>
>> Other .md files have statements like (eq_attr "tune" "cortexa8"), but
>> I don't see how to turn on the marvell-f-iwmmxt attribute, ie (eq_attr
>> "marvell_f_iwmmxt" "yes").
>>
>> Please let me know.
>>
>> Thanks,
>> Matt
>
> Hi Matt,
>
> I updated the patches several months ago by following the review opinions
> from Richard Earnshaw [richard.earns...@arm.com]
> (unfortunately, no further feedback)
> The newest patches are
> http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01787.html
> http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01788.html
> http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01789.html
> http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01786.html
> http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01599.html
> The main discussion is in
> http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01786.html
>
> No new -mcpu flag is introduced in the patches. You can simply turn on 
> marvell-f-iwmmxt by -mcpu=iwmmxt2(or -march=iwmmxt2).
> (Of course it is odd to treat the "iwmmxt2" as a name of cpu)
>
>
> Thanks,
> Xinyu

Thanks for the email, Xinyu!

We (OLPC) will test the patches and then I'll resubmit them to
gcc-patches@ and try to get them included. They're definitely needed
for us, since they fix PR35294 (iwmmxt shift and logical intrinsics
are broken).

By the way, are there patches to add general instruction scheduling
support for Marvell CPUs like the Armada 610?

Thanks again,

Matt


Re: [PATCH 2/2] arm: add iwMMXt mmx-2.c test

2012-05-28 Thread Matt Turner
On Thu, Apr 5, 2012 at 4:53 AM, Ramana Radhakrishnan
 wrote:
> On 4 April 2012 19:35, Matt Turner  wrote:
>>  gcc/testsuite/gcc.target/arm/mmx-2.c |  158 
>> ++
>>  1 files changed, 158 insertions(+), 0 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.target/arm/mmx-2.c
>>
>> diff --git a/gcc/testsuite/gcc.target/arm/mmx-2.c b/gcc/testsuite/gcc.target/arm/mmx-2.c
>> new file mode 100644
>> index 000..603a63b
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/arm/mmx-2.c
>> @@ -0,0 +1,158 @@
>> +/* { dg-do compile } */
>> +/* { dg-skip-if "Test is specific to the iWMMXt" { arm*-*-* } { "-mcpu=*" } { "-mcpu=iwmmxt" } } */
>> +/* { dg-skip-if "Test is specific to the iWMMXt" { arm*-*-* } { "-mabi=*" } { "-mabi=iwmmxt" } } */
>> +/* { dg-skip-if "Test is specific to the iWMMXt" { arm*-*-* } { "-march=*" } { "-march=iwmmxt" } } */
>> +/* { dg-skip-if "Test is specific to ARM mode" { arm*-*-* } { "-mthumb" } { "" } } */
>
> How about simplifying this with a dg-require-effective-target
> arm_arm_ok instead of doing
> dg-require-effective-target arm32 and then skipping it for Thumb2 ?

I might not understand properly, but couldn't I just do this?

/* { dg-require-effective-target arm_iwmmxt_ok } */

Thanks,
Matt


Re: [PATCH 1/2] mips: Add R4600 scheduling support for imul and idiv

2012-05-28 Thread Matt Turner
On Sat, Feb 25, 2012 at 3:11 AM, Richard Sandiford
 wrote:
> Matt Turner  writes:
>> The r4600_imul and r4600_idiv reservations were correct for si, but
>> there were no *_di reservations.
>>
>> See page 4 of
>> http://www.sgistuff.net/hardware/other/documents/R4600_Prod_OV.pdf
>>
>> 2012-02-24  Matt Turner  
>>
>>       * config/mips/4600.md (r4600_imul_si): Rename from r4600_imul.
>>       (r4600_imul_di): New.
>>       (r4600_idiv_si): Rename from r4600_idiv.
>>       (r4600_idiv_di): New.
>
> Both patches look good, thanks.  Will commit once 4.8 is open and the
> copyright assignment is sorted.
>
> Richard

Copyright assignment is sorted. Please commit. :)


Re: [PATCH] alpha: add bypasses for fmul/fadd/fcmov -> fst/ftoi

2012-05-28 Thread Matt Turner
On Fri, Feb 24, 2012 at 10:53 PM, Matt Turner  wrote:
> See section 2.5.3 (page 28) of
> http://download.majix.org/dec/comp_guide_v2.pdf
>
> 2012-02-24  Matt Turner  
>
>        * config/alpha/ev6.md: (define_bypass "ev6_fmul,ev6_fadd"): New.
>        (define_bypass "ev6_fcmov"): New.
> ---
>  gcc/config/alpha/ev6.md |    4 
>  1 files changed, 4 insertions(+), 0 deletions(-)
>
> diff --git a/gcc/config/alpha/ev6.md b/gcc/config/alpha/ev6.md
> index adfe504..a16535a 100644
> --- a/gcc/config/alpha/ev6.md
> +++ b/gcc/config/alpha/ev6.md
> @@ -147,11 +147,15 @@
>        (eq_attr "type" "fadd,fcpys,fbr"))
>   "ev6_fa")
>
> +(define_bypass 6 "ev6_fmul,ev6_fadd" "ev6_fst,ev6_ftoi")
> +
>  (define_insn_reservation "ev6_fcmov" 8
>   (and (eq_attr "tune" "ev6")
>        (eq_attr "type" "fcmov"))
>   "ev6_fa,nothing*3,ev6_fa")
>
> +(define_bypass 10 "ev6_fcmov" "ev6_fst,ev6_ftoi")
> +
>  (define_insn_reservation "ev6_fdivsf" 12
>   (and (eq_attr "tune" "ev6")
>        (and (eq_attr "type" "fdiv")
> --
> 1.7.3.4
>

Copyright assignment is sorted. Please commit. :)


Re: [PATCH] arm: add _mm_empty to mmintrin.h for source compatibility

2012-05-28 Thread Matt Turner
On Tue, Feb 28, 2012 at 7:13 PM, Ramana Radhakrishnan
 wrote:
> On Fri, Feb 24, 2012 at 10:53:35PM -0500, Matt Turner wrote:
>> The x86/amd64 mmintrin.h provides the _mm_empty intrinsic for the 'emms'
>> MMX instruction. Although ARM does not need such an instruction, we
>> should provide an empty _mm_empty function nonetheless for source
>> compatibility.
>
> OK for 4.8 and after your copyright assignment has been
> sorted.
>
> Ramana
>
>>
>> 2012-02-24  Matt Turner  
>>
>>       * config/arm/mmintrin.h (_mm_empty): New.
>> ---
>>  gcc/config/arm/mmintrin.h |    7 +++
>>  1 files changed, 7 insertions(+), 0 deletions(-)
>>
>> diff --git a/gcc/config/arm/mmintrin.h b/gcc/config/arm/mmintrin.h
>> index 2cc500d..ea73bf1 100644
>> --- a/gcc/config/arm/mmintrin.h
>> +++ b/gcc/config/arm/mmintrin.h
>> @@ -32,6 +32,12 @@ typedef int __v2si __attribute__ ((vector_size (8)));
>>  typedef short __v4hi __attribute__ ((vector_size (8)));
>>  typedef char __v8qi __attribute__ ((vector_size (8)));
>>
>> +/* Provided for source compatibility with MMX.  */
>> +extern __inline void __attribute__((__gnu_inline__, __always_inline__, 
>> __artificial__))
>> +_mm_empty (void)
>> +{
>> +}
>> +
>>  /* "Convert" __m64 and __int64 into each other.  */
>>  static __inline __m64
>>  _mm_cvtsi64_m64 (__int64 __i)
>> @@ -1248,6 +1254,7 @@ _m_from_int (int __a)
>>  #define _m_psadzbw _mm_sadz_pu8
>>  #define _m_psadzwd _mm_sadz_pu16
>>  #define _m_paligniq _mm_align_si64
>> +#define _m_empty _mm_empty
>>  #define _m_cvt_si2pi _mm_cvtsi64_m64
>>  #define _m_cvt_pi2si _mm_cvtm64_si64
>>
>> --
>> 1.7.3.4
>>

Copyright assignment is sorted. Please commit. :)


Re: [PATCH] arm: Fix iwmmxt shift and logical intrinsics (PR 35294).

2012-05-28 Thread Matt Turner
On Fri, Feb 24, 2012 at 10:53 PM, Matt Turner  wrote:
> PR 36798 and 36966 are duplicates.
>
> 2012-02-24  Matt Turner  
>
>        PR target/35294
>        * config/arm/arm.c (arm_expand_builtin): Wire up missing
>        intrinsics.
> ---
>  gcc/config/arm/arm.c |   62 
> +-
>  1 files changed, 61 insertions(+), 1 deletions(-)

Drop this patch. Marvell has a five patch series that fixes this and
more. Maybe this patch would be suitable for the 4.6 and 4.7 branches,
since Marvell's adds some features?


Re: [PATCH 1/2] doc: Correct __builtin_arm_tinsr prototype documentation

2012-05-28 Thread Matt Turner
On Wed, Apr 4, 2012 at 2:34 PM, Matt Turner  wrote:
> 2012-04-04  Matt Turner  
>
>        gcc/
>        * doc/extend.texi (__builtin_arm_tinsrb): Add missing second
>        parameter.
>        (__builtin_arm_tinsrh): Likewise.
>        (__builtin_arm_tinsrw): Likewise.
> ---
> This patch and 2/2 are tie-ons to
> http://gcc.gnu.org/ml/gcc-patches/2012-02/msg01269.html
>
> Still waiting on copyright assignment, but I think this doc patch
> is trivial enough to be committed without it.
>
>  gcc/doc/extend.texi |    6 +++---
>  1 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index bb43825..966175d 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -8676,9 +8676,9 @@ int __builtin_arm_textrmsw (v2si, int)
>  int __builtin_arm_textrmub (v8qi, int)
>  int __builtin_arm_textrmuh (v4hi, int)
>  int __builtin_arm_textrmuw (v2si, int)
> -v8qi __builtin_arm_tinsrb (v8qi, int)
> -v4hi __builtin_arm_tinsrh (v4hi, int)
> -v2si __builtin_arm_tinsrw (v2si, int)
> +v8qi __builtin_arm_tinsrb (v8qi, int, int)
> +v4hi __builtin_arm_tinsrh (v4hi, int, int)
> +v2si __builtin_arm_tinsrw (v2si, int, int)
>  long long __builtin_arm_tmia (long long, int, int)
>  long long __builtin_arm_tmiabb (long long, int, int)
>  long long __builtin_arm_tmiabt (long long, int, int)
> --
> 1.7.3.4
>

Copyright assignment is sorted. Please commit. :)
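
For readers without iWMMXt hardware, a portable model of the three-argument
form may help. It assumes the arguments are (vector, value, lane), that lane
0 is the least-significant byte, and that the vector can be modelled as a
64-bit integer; the name model_tinsrb is hypothetical and this is not the
builtin itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical portable model of a byte insert: replace byte LANE of
   VEC with the low 8 bits of VALUE.  Lane numbering from the
   least-significant byte is an assumption of this sketch.  */
uint64_t
model_tinsrb (uint64_t vec, int value, int lane)
{
  assert (lane >= 0 && lane < 8);

  unsigned int shift = (unsigned int) lane * 8;
  uint64_t mask = (uint64_t) 0xff << shift;

  /* Clear the target lane, then merge in the new byte.  */
  return (vec & ~mask) | ((uint64_t) (value & 0xff) << shift);
}
```

The second int in the corrected prototypes is the lane selector that the
old two-argument documentation left out.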


[PATCH ARM iWMMXt 0/5] Improve iWMMXt support

2012-05-28 Thread Matt Turner

This series was written by Marvell and sent by Xinyu Qi 
a number of times in the last year.

We (One Laptop per Child) need these patches for reasonable iWMMXt support
and performance. Without them, logical and shift intrinsics cause ICEs,
see PR 35294 and its duplicates 36798 and 36966.

The software compositing library pixman uses MMX intrinsics to optimize
various compositing routines. The following are the minimum execution times
of cairo-perf-trace graphics work loads without and with iWMMXt-optimized
pixman for the image and image16 backends (32-bpp and 16-bpp respectively).

                            image                image16
           evolution   33.492 ->  29.590    30.334 ->  24.751
firefox-planet-gnome  191.465 -> 173.835   211.297 -> 187.570
gnome-system-monitor   51.956 ->  44.549    52.272 ->  40.525
  gnome-terminal-vim   53.625 ->  54.554    47.593 ->  47.341
      grads-heat-map    4.439 ->   4.165     4.548 ->   4.624
       midori-zoomed   38.033 ->  28.500    38.576 ->  26.937
             poppler   41.096 ->  31.949    41.230 ->  31.749
  swfdec-giant-steps   20.062 ->  16.912    28.294 ->  17.286
      swfdec-youtube   42.281 ->  37.335    52.848 ->  47.053
   xfce4-terminal-a1   64.311 ->  51.011    62.592 ->  51.191
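
The figures above are raw times only. A small helper, which is not part of
the benchmark tooling and whose name is hypothetical, turns a before/after
pair into a percentage reduction:

```c
#include <assert.h>

/* Hypothetical helper: percentage reduction in run time when going
   from BEFORE to AFTER.  A negative result means a slowdown.  */
double
percent_faster (double before, double after)
{
  assert (before > 0.0);
  return (before - after) / before * 100.0;
}
```

For the evolution trace on the image backend (33.492 -> 29.590) this gives
a little under 12 percent, while gnome-terminal-vim on the same backend
comes out slightly negative.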

We have cleaned up some white-space issues with the patches and fixed a
small bug in patch 4/5 since the last time they were posted in December
(added tandc,textrc,torc,torvsc to the "wtype" attribute)

Please commit them for 4.8.

For 4.7 and 4.6 please consider committing my patch
"[PATCH] arm: Fix iwmmxt shift and logical intrinsics (PR 35294)."
which only fixes the logical and shift intrinsics.

Thanks,

Matt Turner


[PATCH ARM iWMMXt 5/5] pipeline description

2012-05-28 Thread Matt Turner
From: Xinyu Qi 

gcc/
* config/arm/t-arm (MD_INCLUDES): Add marvell-f-iwmmxt.md.
* config/arm/marvell-f-iwmmxt.md: New file.
* config/arm/arm.md (marvell-f-iwmmxt.md): Include.
---
 gcc/config/arm/arm.md  |1 +
 gcc/config/arm/marvell-f-iwmmxt.md |  179 
 gcc/config/arm/t-arm   |1 +
 3 files changed, 181 insertions(+), 0 deletions(-)
 create mode 100644 gcc/config/arm/marvell-f-iwmmxt.md

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index b0333c2..baa3b7c 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -546,6 +546,7 @@
  (const_string "yes")
  (const_string "no"
 
+(include "marvell-f-iwmmxt.md")
 (include "arm-generic.md")
 (include "arm926ejs.md")
 (include "arm1020e.md")
diff --git a/gcc/config/arm/marvell-f-iwmmxt.md b/gcc/config/arm/marvell-f-iwmmxt.md
new file mode 100644
index 000..fe8e455
--- /dev/null
+++ b/gcc/config/arm/marvell-f-iwmmxt.md
@@ -0,0 +1,179 @@
+;; Marvell WMMX2 pipeline description
+;; Copyright (C) 2011 Free Software Foundation, Inc.
+;; Written by Marvell, Inc.
+
+;; This file is part of GCC.
+
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published
+;; by the Free Software Foundation; either version 3, or (at your
+;; option) any later version.
+
+;; GCC is distributed in the hope that it will be useful, but WITHOUT
+;; ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+;; or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+;; License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+
+(define_automaton "marvell_f_iwmmxt")
+
+
+;; Pipelines
+
+
+;; This is a 7-stage pipelines:
+;;
+;;MD | MI | ME1 | ME2 | ME3 | ME4 | MW
+;;
+;; There are various bypasses modelled to a greater or lesser extent.
+;;
+;; Latencies in this file correspond to the number of cycles after
+;; the issue stage that it takes for the result of the instruction to
+;; be computed, or for its side-effects to occur.
+
+(define_cpu_unit "mf_iwmmxt_MD" "marvell_f_iwmmxt")
+(define_cpu_unit "mf_iwmmxt_MI" "marvell_f_iwmmxt")
+(define_cpu_unit "mf_iwmmxt_ME1" "marvell_f_iwmmxt")
+(define_cpu_unit "mf_iwmmxt_ME2" "marvell_f_iwmmxt")
+(define_cpu_unit "mf_iwmmxt_ME3" "marvell_f_iwmmxt")
+(define_cpu_unit "mf_iwmmxt_ME4" "marvell_f_iwmmxt")
+(define_cpu_unit "mf_iwmmxt_MW" "marvell_f_iwmmxt")
+
+(define_reservation "mf_iwmmxt_ME"
+  "mf_iwmmxt_ME1,mf_iwmmxt_ME2,mf_iwmmxt_ME3,mf_iwmmxt_ME4"
+)
+
+(define_reservation "mf_iwmmxt_pipeline"
+  "mf_iwmmxt_MD, mf_iwmmxt_MI, mf_iwmmxt_ME, mf_iwmmxt_MW"
+)
+
+;; An attribute to indicate whether our reservations are applicable.
+(define_attr "marvell_f_iwmmxt" "yes,no"
+  (const (if_then_else (symbol_ref "arm_arch_iwmmxt")
+   (const_string "yes") (const_string "no"))))
+
+
+;; instruction classes
+
+
+;; An attribute appended to instructions for classification
+
+(define_attr "wmmxt_shift" "yes,no"
+  (if_then_else (eq_attr "wtype" "wror, wsll, wsra, wsrl")
+   (const_string "yes") (const_string "no"))
+)
+
+(define_attr "wmmxt_pack" "yes,no"
+  (if_then_else (eq_attr "wtype" "waligni, walignr, wmerge, wpack, wshufh, wunpckeh, wunpckih, wunpckel, wunpckil")
+   (const_string "yes") (const_string "no"))
+)
+
+(define_attr "wmmxt_mult_c1" "yes,no"
+  (if_then_else (eq_attr "wtype" "wmac, wmadd, wmiaxy, wmiawxy, wmulw, wqmiaxy, wqmulwm")
+   (const_string "yes") (const_string "no"))
+)
+
+(define_attr "wmmxt_mult_c2" "yes,no"
+  (if_then_else (eq_attr "wtype" "wmul, wqmulm")
+   (const_string "yes") (const_string "no"))
+)
+
+(define_attr "wmmxt_alu_c1" "yes,no"
+  (if_then_else (eq_attr "wtype" "wabs, wabsdiff, wand, wandn, wmov, wor, wxor")
+   (const_string "yes") (const_string "no"))
+)
+
+(define_attr "wmmxt_alu_c2" "yes,no"
+  (if_then_else (eq_attr "wtype" "wacc, wadd, waddsubhx, wavg2, wavg4, wcmpeq, wcmpgt, wmax, wmin, wsub, waddbhus, wsubaddhx")
+   (const_string "yes") (const_string "no"))
+)
+
+(define_attr "wmmxt_alu_c3" "yes,no"
+  (if_then_else (eq_attr "wtype" "wsad")
+   (const_string "yes") (const_string "no"))
+)
+
+(define_attr "wmmxt_transfer_c1" "yes,no"
+  (if_then_else (eq_attr "wtype" "tbcst, tinsr, tmcr, tmcrr")
+(const_string "yes") (const_string "no"))
+)
+
+(define_attr "wmmxt_transfer_c2" "yes,no"
+  (if_then_else (eq_attr "wtype" "textrm, tmo

[PATCH ARM iWMMXt 1/5] ARM code generic change

2012-05-28 Thread Matt Turner
From: Xinyu Qi 

gcc/
* config/arm/arm.c (FL_IWMMXT2): New define.
(arm_arch_iwmmxt2): New variable.
(arm_option_override): Enable use of iWMMXt with VFP.
Disable use of iWMMXt with NEON. Disable use of iWMMXt under
Thumb mode. Set arm_arch_iwmmxt2.
(arm_expand_binop_builtin): Accept VOIDmode op.
* config/arm/arm.h (TARGET_CPU_CPP_BUILTINS): Define __IWMMXT2__.
(TARGET_IWMMXT2): New define.
(TARGET_REALLY_IWMMXT2): Likewise.
(arm_arch_iwmmxt2): Declare.
* config/arm/arm-cores.def (iwmmxt2): Add FL_IWMMXT2.
* config/arm/arm-arches.def (iwmmxt2): Likewise.
* config/arm/arm.md (arch): Add "iwmmxt2".
(arch_enabled): Handle "iwmmxt2".
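
The arm_option_override entries above amount to a small compatibility check. A minimal standalone sketch follows, assuming hypothetical names (struct target_opts, check_iwmmxt_options, the enum values) that stand in for GCC's TARGET_* macros; the order of the checks is also an assumption, and the real code calls error () rather than returning a code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the TARGET_* macros tested in
   arm_option_override; not GCC's real data structures.  */
struct target_opts
{
  bool iwmmxt;      /* models TARGET_IWMMXT */
  bool hard_float;  /* models TARGET_HARD_FLOAT */
  bool vfp;         /* models TARGET_VFP */
  bool neon;        /* models TARGET_NEON */
  bool thumb;       /* models Thumb mode being selected */
};

/* Hypothetical result codes; the real code reports diagnostics.  */
enum iwmmxt_conflict
{
  IWMMXT_OK,
  IWMMXT_NON_VFP_FPU,
  IWMMXT_WITH_NEON,
  IWMMXT_WITH_THUMB
};

/* Return the first conflict found, or IWMMXT_OK if the options are
   compatible.  The check order is an assumption of this sketch.  */
enum iwmmxt_conflict
check_iwmmxt_options (const struct target_opts *o)
{
  assert (o != NULL);
  if (!o->iwmmxt)
    return IWMMXT_OK;            /* nothing to check without iWMMXt */
  if (o->hard_float && !o->vfp)
    return IWMMXT_NON_VFP_FPU;   /* hardware FP other than VFP is rejected */
  if (o->neon)
    return IWMMXT_WITH_NEON;     /* iWMMXt and NEON cannot be combined */
  if (o->thumb)
    return IWMMXT_WITH_THUMB;    /* iWMMXt is disabled under Thumb */
  return IWMMXT_OK;              /* includes iWMMXt together with VFP */
}
```

With this model, iWMMXt plus VFP hard float is accepted, while iWMMXt plus NEON or Thumb is rejected.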
---
 gcc/config/arm/arm-arches.def |2 +-
 gcc/config/arm/arm-cores.def  |2 +-
 gcc/config/arm/arm.c  |   25 +
 gcc/config/arm/arm.h  |7 +++
 gcc/config/arm/arm.md |6 +-
 5 files changed, 31 insertions(+), 11 deletions(-)

diff --git a/gcc/config/arm/arm-arches.def b/gcc/config/arm/arm-arches.def
index 3123426..f4dd6cc 100644
--- a/gcc/config/arm/arm-arches.def
+++ b/gcc/config/arm/arm-arches.def
@@ -57,4 +57,4 @@ ARM_ARCH("armv7-m", cortexm3, 7M,  FL_CO_PROC | FL_FOR_ARCH7M)
 ARM_ARCH("armv7e-m", cortexm4,  7EM, FL_CO_PROC |FL_FOR_ARCH7EM)
 ARM_ARCH("ep9312",  ep9312, 4T,  FL_LDSCHED | FL_CIRRUS | FL_FOR_ARCH4)
 ARM_ARCH("iwmmxt",  iwmmxt, 5TE, FL_LDSCHED | FL_STRONG | FL_FOR_ARCH5TE | FL_XSCALE | FL_IWMMXT)
-ARM_ARCH("iwmmxt2", iwmmxt2,5TE, FL_LDSCHED | FL_STRONG | FL_FOR_ARCH5TE | FL_XSCALE | FL_IWMMXT)
+ARM_ARCH("iwmmxt2", iwmmxt2,5TE, FL_LDSCHED | FL_STRONG | FL_FOR_ARCH5TE | FL_XSCALE | FL_IWMMXT | FL_IWMMXT2)
diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def
index d82b10b..c82eada 100644
--- a/gcc/config/arm/arm-cores.def
+++ b/gcc/config/arm/arm-cores.def
@@ -105,7 +105,7 @@ ARM_CORE("arm1020e",  arm1020e, 5TE, FL_LDSCHED, fastmul)
 ARM_CORE("arm1022e",  arm1022e,5TE, FL_LDSCHED, fastmul)
 ARM_CORE("xscale",xscale,  5TE, FL_LDSCHED | FL_STRONG | FL_XSCALE, xscale)
 ARM_CORE("iwmmxt",iwmmxt,  5TE, FL_LDSCHED | FL_STRONG | FL_XSCALE | FL_IWMMXT, xscale)
-ARM_CORE("iwmmxt2",   iwmmxt2, 5TE, FL_LDSCHED | FL_STRONG | FL_XSCALE | FL_IWMMXT, xscale)
+ARM_CORE("iwmmxt2",   iwmmxt2, 5TE, FL_LDSCHED | FL_STRONG | FL_XSCALE | FL_IWMMXT | FL_IWMMXT2, xscale)
 ARM_CORE("fa606te",   fa606te,  5TE, FL_LDSCHED, 9e)
 ARM_CORE("fa626te",   fa626te,  5TE, FL_LDSCHED, 9e)
 ARM_CORE("fmp626",fmp626,   5TE, FL_LDSCHED, 9e)
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 7a98197..b0680ab 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -685,6 +685,7 @@ static int thumb_call_reg_needed;
 #define FL_ARM_DIV (1 << 23)  /* Hardware divide (ARM mode).  */
 
 #define FL_IWMMXT (1 << 29)  /* XScale v2 or "Intel Wireless MMX technology".  */
+#define FL_IWMMXT2 (1 << 30)  /* "Intel Wireless MMX2 technology".  */
 
 /* Flags that only effect tuning, not available instructions.  */
 #define FL_TUNE (FL_WBUF | FL_VFPV2 | FL_STRONG | FL_LDSCHED \
@@ -766,6 +767,9 @@ int arm_arch_cirrus = 0;
 /* Nonzero if this chip supports Intel Wireless MMX technology.  */
 int arm_arch_iwmmxt = 0;
 
+/* Nonzero if this chip supports Intel Wireless MMX2 technology.  */
+int arm_arch_iwmmxt2 = 0;
+
 /* Nonzero if this chip is an XScale.  */
 int arm_arch_xscale = 0;
 
@@ -1717,6 +1721,7 @@ arm_option_override (void)
   arm_tune_wbuf = (tune_flags & FL_WBUF) != 0;
   arm_tune_xscale = (tune_flags & FL_XSCALE) != 0;
   arm_arch_iwmmxt = (insn_flags & FL_IWMMXT) != 0;
+  arm_arch_iwmmxt2 = (insn_flags & FL_IWMMXT2) != 0;
   arm_arch_thumb_hwdiv = (insn_flags & FL_THUMB_DIV) != 0;
   arm_arch_arm_hwdiv = (insn_flags & FL_ARM_DIV) != 0;
   arm_tune_cortex_a9 = (arm_tune == cortexa9) != 0;
@@ -1817,14 +1822,17 @@ arm_option_override (void)
 }
 
   /* FPA and iWMMXt are incompatible because the insn encodings overlap.
- VFP and iWMMXt can theoretically coexist, but it's unlikely such silicon
- will ever exist.  GCC makes no attempt to support this combination.  */
-  if (TARGET_IWMMXT && !TARGET_SOFT_FLOAT)
-sorry ("iWMMXt and hardware floating point");
+ VFP and iWMMXt however can coexist.  */
+  if (TARGET_IWMMXT && TARGET_HARD_FLOAT && !TARGET_VFP)
+error ("iWMMXt and non-VFP floating point unit are incompatible");
+
+  /* iWMMXt and NEON are incompatible.  */
+  if (TARGET_IWMMXT && TARGET_NEON)
+error ("iWMMXt and NEON are inc

[PATCH ARM iWMMXt 2/5] intrinsic head file change

2012-05-28 Thread Matt Turner
From: Xinyu Qi 

gcc/
* config/arm/mmintrin.h: Use __IWMMXT__ to enable iWMMXt intrinsics.
Use __IWMMXT2__ to enable iWMMXt2 intrinsics.
Use C name-mangling for intrinsics.
(__v8qi): Redefine.
(_mm_cvtsi32_si64, _mm_andnot_si64, _mm_sad_pu8): Revise.
(_mm_sad_pu16, _mm_align_si64, _mm_setwcx, _mm_getwcx): Likewise.
(_m_from_int): Likewise.
(_mm_sada_pu8, _mm_sada_pu16): New intrinsic.
(_mm_alignr0_si64, _mm_alignr1_si64, _mm_alignr2_si64): Likewise.
(_mm_alignr3_si64, _mm_tandcb, _mm_tandch, _mm_tandcw): Likewise.
(_mm_textrcb, _mm_textrch, _mm_textrcw, _mm_torcb): Likewise.
(_mm_torch, _mm_torcw, _mm_tbcst_pi8, _mm_tbcst_pi16): Likewise.
(_mm_tbcst_pi32): Likewise.
(_mm_abs_pi8, _mm_abs_pi16, _mm_abs_pi32): New iWMMXt2 intrinsic.
(_mm_addsubhx_pi16, _mm_absdiff_pu8, _mm_absdiff_pu16): Likewise.
(_mm_absdiff_pu32, _mm_addc_pu16, _mm_addc_pu32): Likewise.
(_mm_avg4_pu8, _mm_avg4r_pu8, _mm_maddx_pi16, _mm_maddx_pu16): Likewise.
(_mm_msub_pi16, _mm_msub_pu16, _mm_mulhi_pi32): Likewise.
(_mm_mulhi_pu32, _mm_mulhir_pi16, _mm_mulhir_pi32): Likewise.
(_mm_mulhir_pu16, _mm_mulhir_pu32, _mm_mullo_pi32): Likewise.
(_mm_qmulm_pi16, _mm_qmulm_pi32, _mm_qmulmr_pi16): Likewise.
(_mm_qmulmr_pi32, _mm_subaddhx_pi16, _mm_addbhusl_pu8): Likewise.
(_mm_addbhusm_pu8, _mm_qmiabb_pi32, _mm_qmiabbn_pi32): Likewise.
(_mm_qmiabt_pi32, _mm_qmiabtn_pi32, _mm_qmiatb_pi32): Likewise.
(_mm_qmiatbn_pi32, _mm_qmiatt_pi32, _mm_qmiattn_pi32): Likewise.
(_mm_wmiabb_si64, _mm_wmiabbn_si64, _mm_wmiabt_si64): Likewise.
(_mm_wmiabtn_si64, _mm_wmiatb_si64, _mm_wmiatbn_si64): Likewise.
(_mm_wmiatt_si64, _mm_wmiattn_si64, _mm_wmiawbb_si64): Likewise.
(_mm_wmiawbbn_si64, _mm_wmiawbt_si64, _mm_wmiawbtn_si64): Likewise.
(_mm_wmiawtb_si64, _mm_wmiawtbn_si64, _mm_wmiawtt_si64): Likewise.
(_mm_wmiawttn_si64, _mm_merge_si64): Likewise.
(_mm_torvscb, _mm_torvsch, _mm_torvscw): Likewise.
(_m_to_int): New define.
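
Several entries above revise _mm_sad_pu8 and add _mm_sada_pu8. In the diff below, _mm_sad_pu8 moves to the builtin with the "z" suffix and the new three-operand form takes over the plain one. The portable sketch below only models the arithmetic. It assumes the "z" form starts from a zero accumulator while the plain form adds to the low 32 bits of its first operand; the emul_* names are hypothetical and are not part of mmintrin.h.

```c
#include <assert.h>
#include <stdint.h>

/* Sum of absolute differences over the eight unsigned bytes packed
   into a and b.  Helper for this sketch only.  */
uint32_t
sad_u8 (uint64_t a, uint64_t b)
{
  uint32_t sum = 0;
  for (int i = 0; i < 8; i++)
    {
      uint8_t x = (uint8_t) (a >> (8 * i));
      uint8_t y = (uint8_t) (b >> (8 * i));
      sum += (x > y) ? (uint32_t) (x - y) : (uint32_t) (y - x);
    }
  assert (sum <= 8u * 255u);  /* eight lanes, each differing by at most 255 */
  return sum;
}

/* Models _mm_sad_pu8 under the assumption above: no accumulation.  */
uint64_t
emul_sad_pu8 (uint64_t a, uint64_t b)
{
  return sad_u8 (a, b);
}

/* Models _mm_sada_pu8 under the assumption above: the low 32 bits of
   acc are added to the sum, and the result is zero-extended.  */
uint64_t
emul_sada_pu8 (uint64_t acc, uint64_t a, uint64_t b)
{
  return (uint32_t) ((uint32_t) acc + sad_u8 (a, b));
}
```

A scalar model like this can serve as a reference when testing the intrinsics on real hardware, once the assumed accumulate semantics have been checked against the instruction set documentation.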
---
 gcc/config/arm/mmintrin.h |  649 ++---
 1 files changed, 614 insertions(+), 35 deletions(-)

diff --git a/gcc/config/arm/mmintrin.h b/gcc/config/arm/mmintrin.h
index 2cc500d..0fe551d 100644
--- a/gcc/config/arm/mmintrin.h
+++ b/gcc/config/arm/mmintrin.h
@@ -24,16 +24,30 @@
 #ifndef _MMINTRIN_H_INCLUDED
 #define _MMINTRIN_H_INCLUDED
 
+#ifndef __IWMMXT__
+#error You must enable WMMX/WMMX2 instructions (e.g. -march=iwmmxt or -march=iwmmxt2) to use iWMMXt/iWMMXt2 intrinsics
+#else
+
+#ifndef __IWMMXT2__
+#warning You only enable iWMMXt intrinsics. Extended iWMMXt2 intrinsics available only if WMMX2 instructions enabled (e.g. -march=iwmmxt2)
+#endif
+
+
+#if defined __cplusplus
+extern "C" { /* Begin "C" */
+/* Intrinsics use C name-mangling.  */
+#endif /* __cplusplus */
+
 /* The data type intended for user use.  */
 typedef unsigned long long __m64, __int64;
 
 /* Internal data types for implementing the intrinsics.  */
 typedef int __v2si __attribute__ ((vector_size (8)));
 typedef short __v4hi __attribute__ ((vector_size (8)));
-typedef char __v8qi __attribute__ ((vector_size (8)));
+typedef signed char __v8qi __attribute__ ((vector_size (8)));
 
 /* "Convert" __m64 and __int64 into each other.  */
-static __inline __m64 
+static __inline __m64
 _mm_cvtsi64_m64 (__int64 __i)
 {
   return __i;
@@ -54,7 +68,7 @@ _mm_cvtsi64_si32 (__int64 __i)
 static __inline __int64
 _mm_cvtsi32_si64 (int __i)
 {
-  return __i;
+  return (__i & 0x);
 }
 
 /* Pack the four 16-bit values from M1 into the lower four 8-bit values of
@@ -603,7 +617,7 @@ _mm_and_si64 (__m64 __m1, __m64 __m2)
 static __inline __m64
 _mm_andnot_si64 (__m64 __m1, __m64 __m2)
 {
-  return __builtin_arm_wandn (__m1, __m2);
+  return __builtin_arm_wandn (__m2, __m1);
 }
 
 /* Bit-wise inclusive OR the 64-bit values in M1 and M2.  */
@@ -935,7 +949,13 @@ _mm_avg2_pu16 (__m64 __A, __m64 __B)
 static __inline __m64
 _mm_sad_pu8 (__m64 __A, __m64 __B)
 {
-  return (__m64) __builtin_arm_wsadb ((__v8qi)__A, (__v8qi)__B);
+  return (__m64) __builtin_arm_wsadbz ((__v8qi)__A, (__v8qi)__B);
+}
+
+static __inline __m64
+_mm_sada_pu8 (__m64 __A, __m64 __B, __m64 __C)
+{
+  return (__m64) __builtin_arm_wsadb ((__v2si)__A, (__v8qi)__B, (__v8qi)__C);
 }
 
 /* Compute the sum of the absolute differences of the unsigned 16-bit
@@ -944,9 +964,16 @@ _mm_sad_pu8 (__m64 __A, __m64 __B)
 static __inline __m64
 _mm_sad_pu16 (__m64 __A, __m64 __B)
 {
-  return (__m64) __builtin_arm_wsadh ((__v4hi)__A, (__v4hi)__B);
+  return (__m64) __builtin_arm_wsadhz ((__v4hi)__A, (__v4hi)__B);
 }
 
+static __inline __m64
+_mm_sada_pu16 (__m64 __A, __m64 __B, __m64 __C)
+{
+  return (__m64) __builtin_arm_wsadh ((__v2si)__A, (__v4hi)__B, (__v4hi)__C);
+}
+
+
 /* Compute the sum of the absolute differences of th

[PATCH ARM iWMMXt 3/5] built in define and expand

2012-05-28 Thread Matt Turner
From: Xinyu Qi 

gcc/
* config/arm/arm.c (enum arm_builtins): Revise built-in fcode.
(IWMMXT2_BUILTIN): New define.
(IWMMXT2_BUILTIN2): Likewise.
(iwmmx2_mbuiltin): Likewise.
(builtin_description bdesc_2arg): Revise built in declaration.
(builtin_description bdesc_1arg): Likewise.
(arm_init_iwmmxt_builtins): Revise built in initialization.
(arm_expand_builtin): Revise built in expansion.
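
The IWMMXT2_BUILTIN macro added by this patch tags each table entry with FL_IWMMXT2, so a built-in can be matched against the selected CPU's feature flags. A reduced sketch of that table-plus-mask idea follows, assuming a simplified two-field struct builtin_desc and a hypothetical lookup function builtin_available. GCC's real struct builtin_description has more fields, and how the mask is consumed during initialization is not shown in this excerpt.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Flag values as defined in patch 1/5, written unsigned here so that
   bit 30 can be shifted safely.  */
#define FL_IWMMXT  (1UL << 29)
#define FL_IWMMXT2 (1UL << 30)

/* Simplified stand-in for struct builtin_description.  */
struct builtin_desc
{
  unsigned long mask;  /* feature flag required by the built-in */
  const char *name;    /* user-visible built-in name */
};

/* Reduced versions of the table macros: each expands to one entry.  */
#define IWMMXT_BUILTIN(string)  { FL_IWMMXT,  "__builtin_arm_" string },
#define IWMMXT2_BUILTIN(string) { FL_IWMMXT2, "__builtin_arm_" string },

const struct builtin_desc bdesc[] = {
  IWMMXT_BUILTIN ("waddb")
  IWMMXT_BUILTIN ("walignr0")
  IWMMXT2_BUILTIN ("waddsubhx")
  IWMMXT2_BUILTIN ("wqmulm")
};

/* Hypothetical helper: return 1 if NAME is in the table and its
   required flag is present in INSN_FLAGS, otherwise 0.  */
int
builtin_available (const char *name, unsigned long insn_flags)
{
  assert (name != NULL);
  for (size_t i = 0; i < sizeof bdesc / sizeof bdesc[0]; i++)
    if (strcmp (bdesc[i].name, name) == 0)
      return (bdesc[i].mask & insn_flags) != 0;
  return 0;
}
```

In this model an iwmmxt target (FL_IWMMXT only) sees waddb but not wqmulm, while an iwmmxt2 target, which carries both flags per patch 1/5, sees both.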
---
 gcc/config/arm/arm.c |  620 +-
 1 files changed, 559 insertions(+), 61 deletions(-)

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index b0680ab..51eed40 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -19637,8 +19637,15 @@ static neon_builtin_datum neon_builtin_data[] =
FIXME?  */
 enum arm_builtins
 {
-  ARM_BUILTIN_GETWCX,
-  ARM_BUILTIN_SETWCX,
+  ARM_BUILTIN_GETWCGR0,
+  ARM_BUILTIN_GETWCGR1,
+  ARM_BUILTIN_GETWCGR2,
+  ARM_BUILTIN_GETWCGR3,
+
+  ARM_BUILTIN_SETWCGR0,
+  ARM_BUILTIN_SETWCGR1,
+  ARM_BUILTIN_SETWCGR2,
+  ARM_BUILTIN_SETWCGR3,
 
   ARM_BUILTIN_WZERO,
 
@@ -19661,7 +19668,11 @@ enum arm_builtins
   ARM_BUILTIN_WSADH,
   ARM_BUILTIN_WSADHZ,
 
-  ARM_BUILTIN_WALIGN,
+  ARM_BUILTIN_WALIGNI,
+  ARM_BUILTIN_WALIGNR0,
+  ARM_BUILTIN_WALIGNR1,
+  ARM_BUILTIN_WALIGNR2,
+  ARM_BUILTIN_WALIGNR3,
 
   ARM_BUILTIN_TMIA,
   ARM_BUILTIN_TMIAPH,
@@ -19797,6 +19808,81 @@ enum arm_builtins
   ARM_BUILTIN_WUNPCKELUH,
   ARM_BUILTIN_WUNPCKELUW,
 
+  ARM_BUILTIN_WABSB,
+  ARM_BUILTIN_WABSH,
+  ARM_BUILTIN_WABSW,
+
+  ARM_BUILTIN_WADDSUBHX,
+  ARM_BUILTIN_WSUBADDHX,
+
+  ARM_BUILTIN_WABSDIFFB,
+  ARM_BUILTIN_WABSDIFFH,
+  ARM_BUILTIN_WABSDIFFW,
+
+  ARM_BUILTIN_WADDCH,
+  ARM_BUILTIN_WADDCW,
+
+  ARM_BUILTIN_WAVG4,
+  ARM_BUILTIN_WAVG4R,
+
+  ARM_BUILTIN_WMADDSX,
+  ARM_BUILTIN_WMADDUX,
+
+  ARM_BUILTIN_WMADDSN,
+  ARM_BUILTIN_WMADDUN,
+
+  ARM_BUILTIN_WMULWSM,
+  ARM_BUILTIN_WMULWUM,
+
+  ARM_BUILTIN_WMULWSMR,
+  ARM_BUILTIN_WMULWUMR,
+
+  ARM_BUILTIN_WMULWL,
+
+  ARM_BUILTIN_WMULSMR,
+  ARM_BUILTIN_WMULUMR,
+
+  ARM_BUILTIN_WQMULM,
+  ARM_BUILTIN_WQMULMR,
+
+  ARM_BUILTIN_WQMULWM,
+  ARM_BUILTIN_WQMULWMR,
+
+  ARM_BUILTIN_WADDBHUSM,
+  ARM_BUILTIN_WADDBHUSL,
+
+  ARM_BUILTIN_WQMIABB,
+  ARM_BUILTIN_WQMIABT,
+  ARM_BUILTIN_WQMIATB,
+  ARM_BUILTIN_WQMIATT,
+
+  ARM_BUILTIN_WQMIABBN,
+  ARM_BUILTIN_WQMIABTN,
+  ARM_BUILTIN_WQMIATBN,
+  ARM_BUILTIN_WQMIATTN,
+
+  ARM_BUILTIN_WMIABB,
+  ARM_BUILTIN_WMIABT,
+  ARM_BUILTIN_WMIATB,
+  ARM_BUILTIN_WMIATT,
+
+  ARM_BUILTIN_WMIABBN,
+  ARM_BUILTIN_WMIABTN,
+  ARM_BUILTIN_WMIATBN,
+  ARM_BUILTIN_WMIATTN,
+
+  ARM_BUILTIN_WMIAWBB,
+  ARM_BUILTIN_WMIAWBT,
+  ARM_BUILTIN_WMIAWTB,
+  ARM_BUILTIN_WMIAWTT,
+
+  ARM_BUILTIN_WMIAWBBN,
+  ARM_BUILTIN_WMIAWBTN,
+  ARM_BUILTIN_WMIAWTBN,
+  ARM_BUILTIN_WMIAWTTN,
+
+  ARM_BUILTIN_WMERGE,
+
   ARM_BUILTIN_THREAD_POINTER,
 
   ARM_BUILTIN_NEON_BASE,
@@ -20329,6 +20415,10 @@ static const struct builtin_description bdesc_2arg[] =
   { FL_IWMMXT, CODE_FOR_##code, "__builtin_arm_" string, \
 ARM_BUILTIN_##builtin, UNKNOWN, 0 },
 
+#define IWMMXT2_BUILTIN(code, string, builtin) \
+  { FL_IWMMXT2, CODE_FOR_##code, "__builtin_arm_" string, \
+ARM_BUILTIN_##builtin, UNKNOWN, 0 },
+
   IWMMXT_BUILTIN (addv8qi3, "waddb", WADDB)
   IWMMXT_BUILTIN (addv4hi3, "waddh", WADDH)
   IWMMXT_BUILTIN (addv2si3, "waddw", WADDW)
@@ -20385,44 +20475,45 @@ static const struct builtin_description bdesc_2arg[] =
   IWMMXT_BUILTIN (iwmmxt_wunpckihb, "wunpckihb", WUNPCKIHB)
   IWMMXT_BUILTIN (iwmmxt_wunpckihh, "wunpckihh", WUNPCKIHH)
   IWMMXT_BUILTIN (iwmmxt_wunpckihw, "wunpckihw", WUNPCKIHW)
-  IWMMXT_BUILTIN (iwmmxt_wmadds, "wmadds", WMADDS)
-  IWMMXT_BUILTIN (iwmmxt_wmaddu, "wmaddu", WMADDU)
+  IWMMXT2_BUILTIN (iwmmxt_waddsubhx, "waddsubhx", WADDSUBHX)
+  IWMMXT2_BUILTIN (iwmmxt_wsubaddhx, "wsubaddhx", WSUBADDHX)
+  IWMMXT2_BUILTIN (iwmmxt_wabsdiffb, "wabsdiffb", WABSDIFFB)
+  IWMMXT2_BUILTIN (iwmmxt_wabsdiffh, "wabsdiffh", WABSDIFFH)
+  IWMMXT2_BUILTIN (iwmmxt_wabsdiffw, "wabsdiffw", WABSDIFFW)
+  IWMMXT2_BUILTIN (iwmmxt_avg4, "wavg4", WAVG4)
+  IWMMXT2_BUILTIN (iwmmxt_avg4r, "wavg4r", WAVG4R)
+  IWMMXT2_BUILTIN (iwmmxt_wmulwsm, "wmulwsm", WMULWSM)
+  IWMMXT2_BUILTIN (iwmmxt_wmulwum, "wmulwum", WMULWUM)
+  IWMMXT2_BUILTIN (iwmmxt_wmulwsmr, "wmulwsmr", WMULWSMR)
+  IWMMXT2_BUILTIN (iwmmxt_wmulwumr, "wmulwumr", WMULWUMR)
+  IWMMXT2_BUILTIN (iwmmxt_wmulwl, "wmulwl", WMULWL)
+  IWMMXT2_BUILTIN (iwmmxt_wmulsmr, "wmulsmr", WMULSMR)
+  IWMMXT2_BUILTIN (iwmmxt_wmulumr, "wmulumr", WMULUMR)
+  IWMMXT2_BUILTIN (iwmmxt_wqmulm, "wqmulm", WQMULM)
+  IWMMXT2_BUILTIN (iwmmxt_wqmulmr, "wqmulmr", WQMULMR)
+  IWMMXT2_BUILTIN (iwmmxt_wqmulwm, "wqmulwm", WQMULWM)
+  IWMMXT2_BUILTIN (iwmmxt_wqmulwmr, "wqmulwmr", WQMULWMR)
+  IWMMXT_BUILTIN (iwmmxt_walignr0, "walignr0", WALIGNR0)
+  IWMMXT_BUILTIN (iwmmxt_walignr1, "walignr1", WALIGNR1)
+  IWMMXT_BUILTIN (iwmmxt_walignr2, "walignr2", WALIGNR2)
+  IWMMXT_BUILTIN (iwmmxt_walignr3, "wali

Re: [PATCH 1/2] mips: Add R4600 scheduling support for imul and idiv

2012-05-31 Thread Matt Turner
On Thu, May 31, 2012 at 5:35 PM, Richard Sandiford
 wrote:
> Matt Turner  writes:
>> On Sat, Feb 25, 2012 at 3:11 AM, Richard Sandiford
>>  wrote:
>>> Matt Turner  writes:
>>>> The r4600_imul and r4600_idiv reservations were correct for si, but
>>>> there were no *_di reservations.
>>>>
>>>> See page 4 of
>>>> http://www.sgistuff.net/hardware/other/documents/R4600_Prod_OV.pdf
>>>>
>>>> 2012-02-24  Matt Turner  
>>>>
>>>>       * config/mips/4600.md (r4600_imul_si): Rename from r4600_imul.
>>>>       (r4600_imul_di): New.
>>>>       (r4600_idiv_si): Rename from r4600_idiv.
>>>>       (r4600_idiv_di): New.
>>>
>>> Both patches look good, thanks.  Will commit once 4.8 is open and the
>>> copyright assignment is sorted.
>>>
>>> Richard
>>
>> Copyright assignment is sorted. Please commit. :)
>
> Applied this one.  Part 2 seems to be based on a different version
> of driver-native.c though.
>
> Thanks for persevering. :-)
>
> Richard

Thanks a lot!

Ah, right, 2/2 was written before IRIX support was removed, and that
removal changed driver-native.c significantly.

Updated patch in your inbox shortly.

Thanks!
Matt


[PATCH 2/2] mips: Add R4700 scheduling support

2012-05-31 Thread Matt Turner
The R4700 is identical to the R4600 except for the integer and
floating-point multiplication costs.

See page 4 of http://datasheets.chipdb.org/IDT/MIPS/79RV4700.pdf
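
For readers who do not read .md reservations fluently, the integer latencies that appear in the 4600.md hunks of this patch can be restated as a lookup. This is only a sketch: the enum and function names are hypothetical, the cycle counts are copied from the reservations visible in the diff, and -1 marks a combination whose reservation is not fully visible in this excerpt.

```c
#include <assert.h>

/* Hypothetical identifiers for this sketch only.  */
enum cpu { CPU_R4600, CPU_R4700 };
enum int_mode { MODE_SI, MODE_DI };

/* Integer multiply latency in cycles, from r4600_imul_si (10),
   r4700_imul_si (8) and r4700_imul_di (10).  Returns -1 where the
   excerpt does not show the value.  */
int
imul_latency (enum cpu cpu, enum int_mode mode)
{
  assert (cpu == CPU_R4600 || cpu == CPU_R4700);
  if (cpu == CPU_R4700)
    return mode == MODE_SI ? 8 : 10;
  if (mode == MODE_SI)
    return 10;
  return -1;  /* R4600 DI multiply: not fully visible in this excerpt */
}

/* Integer divide latency in cycles.  The r4600_idiv_si (42) and
   r4600_idiv_di (74) reservations now cover "r4600,r4700", so the
   CPU does not matter here.  */
int
idiv_latency (enum int_mode mode)
{
  return mode == MODE_SI ? 42 : 74;
}
```

The point of the patch is visible in the table: the R4700 shares the R4600's divide costs and differs only in its multiply costs.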

2012-03-24  Matt Turner  

gcc/
* config/mips/4600.md (r4700_imul_si): New.
(r4700_imul_di): New.
(r4700_fmul_single): New.
(r4700_fmul_double): New.
* config/mips/mips-cpus.def: Add r4700.
* config/mips/mips.c: Likewise.
* config/mips/mips.md: Likewise.
* config/mips/mips-tables.opt: Regenerate.
---
 gcc/config/mips/4600.md |   51 ++--
 gcc/config/mips/mips-cpus.def   |1 +
 gcc/config/mips/mips-tables.opt |  278 ---
 gcc/config/mips/mips.c  |3 +
 gcc/config/mips/mips.md |1 +
 5 files changed, 187 insertions(+), 147 deletions(-)

diff --git a/gcc/config/mips/4600.md b/gcc/config/mips/4600.md
index 53aa01b..36eab80 100644
--- a/gcc/config/mips/4600.md
+++ b/gcc/config/mips/4600.md
@@ -1,4 +1,4 @@
-;; R4600 and R4650 pipeline description.
+;; R4600, R4650, and R4700 pipeline description.
 ;;   Copyright (C) 2004, 2005, 2007, 2012 Free Software Foundation, Inc.
 ;;
 ;; This file is part of GCC.
@@ -21,8 +21,10 @@
 ;; This file overrides parts of generic.md.  It is derived from the
 ;; old define_function_unit description.
 ;;
-;; We handle the R4600 and R4650 in much the same way.  The only difference
-;; is in the integer multiplication and division costs.
+;; We handle the R4600, R4650, and R4700 in much the same way.  The only
+;; differences between R4600 and R4650 are the integer multiplication and
+;; division costs. The only differences between R4600 and R4700 are the
+;; integer and floating-point multiplication costs.
 
 (define_insn_reservation "r4600_imul_si" 10
   (and (eq_attr "cpu" "r4600")
@@ -37,13 +39,13 @@
   "imuldiv*12")
 
 (define_insn_reservation "r4600_idiv_si" 42
-  (and (eq_attr "cpu" "r4600")
+  (and (eq_attr "cpu" "r4600,r4700")
(eq_attr "type" "idiv")
(eq_attr "mode" "SI"))
   "imuldiv*42")
 
 (define_insn_reservation "r4600_idiv_di" 74
-  (and (eq_attr "cpu" "r4600")
+  (and (eq_attr "cpu" "r4600,r4700")
(eq_attr "type" "idiv")
(eq_attr "mode" "DI"))
   "imuldiv*74")
@@ -60,13 +62,26 @@
   "imuldiv*36")
 
 
+(define_insn_reservation "r4700_imul_si" 8
+  (and (eq_attr "cpu" "r4700")
+   (eq_attr "type" "imul,imul3,imadd")
+   (eq_attr "mode" "SI"))
+  "imuldiv*8")
+
+(define_insn_reservation "r4700_imul_di" 10
+  (and (eq_attr "cpu" "r4700")
+   (eq_attr "type" "imul,imul3,imadd")
+   (eq_attr "mode" "DI"))
+  "imuldiv*10")
+
+
 (define_insn_reservation "r4600_load" 2
-  (and (eq_attr "cpu" "r4600,r4650")
+  (and (eq_attr "cpu" "r4600,r4650,r4700")
(eq_attr "type" "load,fpload,fpidxload"))
   "alu")
 
 (define_insn_reservation "r4600_fmove" 1
-  (and (eq_attr "cpu" "r4600,r4650")
+  (and (eq_attr "cpu" "r4600,r4650,r4700")
(eq_attr "type" "fabs,fneg,fmove"))
   "alu")
 
@@ -82,26 +97,40 @@
(eq_attr "mode" "DF")))
   "alu")
 
+
+(define_insn_reservation "r4700_fmul_single" 4
+  (and (eq_attr "cpu" "r4700")
+   (and (eq_attr "type" "fmul,fmadd")
+   (eq_attr "mode" "SF")))
+  "alu")
+
+(define_insn_reservation "r4700_fmul_double" 5
+  (and (eq_attr "cpu" "r4700")
+   (and (eq_attr "type" "fmul,fmadd")
+   (eq_attr "mode" "DF")))
+  "alu")
+
+
 (define_insn_reservation "r4600_fdiv_single" 32
-  (and (eq_attr "cpu" "r4600,r4650")
+  (and (eq_attr "cpu" "r4600,r4650,r4700")
(and (eq_attr "type" "fdiv,frdiv")
(eq_attr "mode" "SF")))
   "alu")
 
 (define_insn_reservation "r4600_fdiv_double" 61
-  (and (eq_attr "cpu" "r4600,r4650")
+  (and (eq_attr "cpu" "r4600,r4650,r4700")
(and (eq_attr "type" "fdiv,frdiv")
(eq_attr "mode" "DF")))
   "alu")
 
 (define_insn_reservation "r4600_fsqrt_single" 31
-  (and (eq_attr "cpu" "r4600,r4650")
+  (and (eq_attr "cpu" &

[PATCH] Wire-up missing ARM iwmmxt intrinsics (bugs 35294, 36798, 36966)

2011-08-18 Thread Matt Turner
Hi,

Attached is a patch based on gcc-4.6.1 that wires up the missing ARM
iwmmxt intrinsics. Without it, a large portion of the intrinsics
documented on this page cannot be used:
http://gcc.gnu.org/onlinedocs/gcc/ARM-iWMMXt-Built_002din-Functions.html

The patch is based on the work of  in bug 35294.

I do not know why the check_opsmode hack is necessary. Perhaps serowk
can help with that. I also do not know if this wires up all the
missing intrinsics, but it is sufficient to build a working
iwmmxt-optimized pixman:
http://cgit.freedesktop.org/~mattst88/pixman/log/?h=iwmmxt-optimizations

I have seen much more extensive patches from Xinyu Qi, but I do not
suppose that they will be available in gcc 4.6.

Thanks,
Matt Turner
--- arm.c.orig	2011-08-19 00:03:06.163195724 -0400
+++ arm.c	2011-08-19 00:03:10.872195933 -0400
@@ -157,7 +157,7 @@
 static void arm_init_builtins (void);
 static void arm_init_iwmmxt_builtins (void);
 static rtx safe_vector_operand (rtx, enum machine_mode);
-static rtx arm_expand_binop_builtin (enum insn_code, tree, rtx);
+static rtx arm_expand_binop_builtin (enum insn_code, tree, rtx, bool);
 static rtx arm_expand_unop_builtin (enum insn_code, tree, rtx, int);
 static rtx arm_expand_builtin (tree, rtx, rtx, enum machine_mode, int);
 static void emit_constant_insn (rtx cond, rtx pattern);
@@ -19197,7 +19197,7 @@
 
 static rtx
 arm_expand_binop_builtin (enum insn_code icode,
-			  tree exp, rtx target)
+			  tree exp, rtx target, bool check_opsmode)
 {
   rtx pat;
   tree arg0 = CALL_EXPR_ARG (exp, 0);
@@ -19218,7 +19218,8 @@
   || ! (*insn_data[icode].operand[0].predicate) (target, tmode))
 target = gen_reg_rtx (tmode);
 
-  gcc_assert (GET_MODE (op0) == mode0 && GET_MODE (op1) == mode1);
+  if (check_opsmode)
+gcc_assert (GET_MODE (op0) == mode0 && GET_MODE (op1) == mode1);
 
   if (! (*insn_data[icode].operand[1].predicate) (op0, mode0))
 op0 = copy_to_mode_reg (mode0, op0);
@@ -19760,13 +19761,13 @@
   return target;
 
 case ARM_BUILTIN_WSADB:
-  return arm_expand_binop_builtin (CODE_FOR_iwmmxt_wsadb, exp, target);
+  return arm_expand_binop_builtin (CODE_FOR_iwmmxt_wsadb, exp, target, true);
 case ARM_BUILTIN_WSADH:
-  return arm_expand_binop_builtin (CODE_FOR_iwmmxt_wsadh, exp, target);
+  return arm_expand_binop_builtin (CODE_FOR_iwmmxt_wsadh, exp, target, true);
 case ARM_BUILTIN_WSADBZ:
-  return arm_expand_binop_builtin (CODE_FOR_iwmmxt_wsadbz, exp, target);
+  return arm_expand_binop_builtin (CODE_FOR_iwmmxt_wsadbz, exp, target, true);
 case ARM_BUILTIN_WSADHZ:
-  return arm_expand_binop_builtin (CODE_FOR_iwmmxt_wsadhz, exp, target);
+  return arm_expand_binop_builtin (CODE_FOR_iwmmxt_wsadhz, exp, target, true);
 
   /* Several three-argument builtins.  */
 case ARM_BUILTIN_WMACS:
@@ -19814,6 +19815,65 @@
   emit_insn (pat);
   return target;
 
+case ARM_BUILTIN_WSLLH:
+case ARM_BUILTIN_WSLLHI:
+case ARM_BUILTIN_WSLLW:
+case ARM_BUILTIN_WSLLWI:
+case ARM_BUILTIN_WSLLD:
+case ARM_BUILTIN_WSLLDI:
+case ARM_BUILTIN_WSRAH:
+case ARM_BUILTIN_WSRAHI:
+case ARM_BUILTIN_WSRAW:
+case ARM_BUILTIN_WSRAWI:
+case ARM_BUILTIN_WSRAD:
+case ARM_BUILTIN_WSRADI:
+case ARM_BUILTIN_WSRLH:
+case ARM_BUILTIN_WSRLHI:
+case ARM_BUILTIN_WSRLW:
+case ARM_BUILTIN_WSRLWI:
+case ARM_BUILTIN_WSRLD:
+case ARM_BUILTIN_WSRLDI:
+case ARM_BUILTIN_WRORH:
+case ARM_BUILTIN_WRORHI:
+case ARM_BUILTIN_WRORW:
+case ARM_BUILTIN_WRORWI:
+case ARM_BUILTIN_WRORD:
+case ARM_BUILTIN_WRORDI:
+case ARM_BUILTIN_WAND:
+case ARM_BUILTIN_WANDN:
+case ARM_BUILTIN_WOR:
+case ARM_BUILTIN_WXOR:
+  icode = (fcode == ARM_BUILTIN_WSLLH ? CODE_FOR_ashlv4hi3_di
+	   : fcode == ARM_BUILTIN_WSLLHI ? CODE_FOR_ashlv4hi3_iwmmxt
+	   : fcode == ARM_BUILTIN_WSLLW  ? CODE_FOR_ashlv2si3_di
+	   : fcode == ARM_BUILTIN_WSLLWI ? CODE_FOR_ashlv2si3_iwmmxt
+	   : fcode == ARM_BUILTIN_WSLLD  ? CODE_FOR_ashldi3_di
+	   : fcode == ARM_BUILTIN_WSLLDI ? CODE_FOR_ashldi3_iwmmxt
+	   : fcode == ARM_BUILTIN_WSRAH  ? CODE_FOR_ashrv4hi3_di
+	   : fcode == ARM_BUILTIN_WSRAHI ? CODE_FOR_ashrv4hi3_iwmmxt
+	   : fcode == ARM_BUILTIN_WSRAW  ? CODE_FOR_ashrv2si3_di
+	   : fcode == ARM_BUILTIN_WSRAWI ? CODE_FOR_ashrv2si3_iwmmxt
+	   : fcode == ARM_BUILTIN_WSRAD  ? CODE_FOR_ashrdi3_di
+	   : fcode == ARM_BUILTIN_WSRADI ? CODE_FOR_ashrdi3_iwmmxt
+	   : fcode == ARM_BUILTIN_WSRLH  ? CODE_FOR_lshrv4hi3_di
+	   : fcode == ARM_BUILTIN_WSRLHI ? CODE_FOR_lshrv4hi3_iwmmxt
+	   : fcode == ARM_BUILTIN_WSRLW  ? CODE_FOR_lshrv2si3_di
+	   : fcode == ARM_BUILTIN_WSRLWI ? CODE_FOR_lshrv2si3_iwmmxt
+	   : fcode == ARM_BUILTIN_WSRLD  ? CODE_FOR_lshrdi3_di
+	   : fcode == ARM_BUILTIN_WSRLDI ? CODE_FOR_lshrdi3_iwmmxt
+	   : fcode == ARM_BUILTIN_WRORH  ? CODE

Re: [PATCH] Wire-up missing ARM iwmmxt intrinsics (bugs 35294, 36798, 36966)

2011-08-18 Thread Matt Turner
On Fri, Aug 19, 2011 at 12:13 AM, Matt Turner  wrote:
> Hi,
>
> Attached is a patch based on gcc-4.6.1 that wires-up missing ARM
> iwmmxt intrinsics. Without it, gcc is completely useless when it comes
> to using a large portion of the intrinsics documented on this page:
> http://gcc.gnu.org/onlinedocs/gcc/ARM-iWMMXt-Built_002din-Functions.html
>
> The patch is based on the work of  in bug 35294.
>
> I do not know why the check_opsmode hack is necessary. Perhaps serowk
> can help with that. I also do not know if this wires up all the
> missing intrinsics, but it is sufficient to build a working
> iwmmxt-optimized pixman:
> http://cgit.freedesktop.org/~mattst88/pixman/log/?h=iwmmxt-optimizations
>
> I have seen much more extensive patches from Xinyu Qi, but I do not
> suppose that they will be available in gcc 4.6.
>
> Thanks,
> Matt Turner




Re: [PATCH] Wire-up missing ARM iwmmxt intrinsics (bugs 35294, 36798, 36966)

2011-08-19 Thread Matt Turner
On Fri, Aug 19, 2011 at 2:09 AM, Xinyu Qi  wrote:
> At 2011-08-19 12:18:10, "Matt Turner" wrote:
>>
>> On Fri, Aug 19, 2011 at 12:13 AM, Matt Turner  wrote:
>> > Hi,
>> >
>> > Attached is a patch based on gcc-4.6.1 that wires-up missing ARM
>> > iwmmxt intrinsics. Without it, gcc is completely useless when it comes
>> > to using a large portion of the intrinsics documented on this page:
>> > http://gcc.gnu.org/onlinedocs/gcc/ARM-iWMMXt-Built_002din-Functions.html
>> >
>> > The patch is based on the work of  in bug 35294.
>> >
>> > I do not know why the check_opsmode hack is necessary.
>
> Hi,
>
> I think check_opsmode in this patch works around something that could
> be solved by
> -  gcc_assert (GET_MODE (op0) == mode0 && GET_MODE (op1) == mode1);
> +  gcc_assert ((GET_MODE (op0) == mode0 || GET_MODE (op0) == VOIDmode)
> +             && (GET_MODE (op1) == mode1 || GET_MODE (op1) == VOIDmode));
> in my patch.
> For example, in the shift intrinsics, the shift count can be either a
> variable or a CONST_INT, which has VOIDmode.
>
>> >I also do not know if this wires up all the missing intrinsics.
>
> I'm afraid not. Trunk is missing all the iWMMXt2 intrinsics, and bugs
> can be found everywhere because the code has gone unmaintained for a
> long time.
>
>> > I have seen much more extensive patches from Xinyu Qi, but I do not
>> > suppose that they will be available in gcc 4.6.
>
> The patches I submitted conflict with the 4.6 code base.
>
> Thanks,
> Xinyu

Indeed, that seems like the way it should be done. Thanks very much.
See the attached patch.
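
For the record, the relaxed check Xinyu describes can be modelled outside GCC. The sketch below uses a hypothetical enum in place of GCC's machine modes and plain assert in place of gcc_assert; it only illustrates why a constant operand (VOIDmode) has to be let through.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for GCC's enum machine_mode.  */
enum mode_stub { VOIDmode_stub, SImode_stub, DImode_stub, V4HImode_stub };

/* An operand is acceptable if it already has the mode the insn
   pattern expects, or if it is a constant such as a CONST_INT,
   which carries VOIDmode.  */
bool
operand_mode_ok (enum mode_stub actual, enum mode_stub expected)
{
  return actual == expected || actual == VOIDmode_stub;
}

/* Predicate form of the relaxed two-operand check.  */
bool
binop_modes_ok (enum mode_stub op0, enum mode_stub mode0,
                enum mode_stub op1, enum mode_stub mode1)
{
  return operand_mode_ok (op0, mode0) && operand_mode_ok (op1, mode1);
}

/* Asserting form, mirroring the shape of the gcc_assert in the patch.  */
void
check_binop_modes (enum mode_stub op0, enum mode_stub mode0,
                   enum mode_stub op1, enum mode_stub mode1)
{
  assert (binop_modes_ok (op0, mode0, op1, mode1));
}
```

A shift by an immediate count passes (op1 is VOIDmode), whereas a register operand in the wrong mode still trips the assertion, which is the behaviour the stricter original check was meant to guarantee.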

Thanks,
Matt
--- arm.c.orig	2011-05-05 04:39:40.0 -0400
+++ arm.c	2011-08-19 13:48:21.548405102 -0400
@@ -19218,7 +19218,8 @@
   || ! (*insn_data[icode].operand[0].predicate) (target, tmode))
 target = gen_reg_rtx (tmode);
 
-  gcc_assert (GET_MODE (op0) == mode0 && GET_MODE (op1) == mode1);
+  gcc_assert ((GET_MODE (op0) == mode0 || GET_MODE (op0) == VOIDmode)
+ && (GET_MODE (op1) == mode1 || GET_MODE (op1) == VOIDmode));
 
   if (! (*insn_data[icode].operand[1].predicate) (op0, mode0))
 op0 = copy_to_mode_reg (mode0, op0);
@@ -19814,6 +19815,65 @@
   emit_insn (pat);
   return target;
 
+case ARM_BUILTIN_WSLLH:
+case ARM_BUILTIN_WSLLHI:
+case ARM_BUILTIN_WSLLW:
+case ARM_BUILTIN_WSLLWI:
+case ARM_BUILTIN_WSLLD:
+case ARM_BUILTIN_WSLLDI:
+case ARM_BUILTIN_WSRAH:
+case ARM_BUILTIN_WSRAHI:
+case ARM_BUILTIN_WSRAW:
+case ARM_BUILTIN_WSRAWI:
+case ARM_BUILTIN_WSRAD:
+case ARM_BUILTIN_WSRADI:
+case ARM_BUILTIN_WSRLH:
+case ARM_BUILTIN_WSRLHI:
+case ARM_BUILTIN_WSRLW:
+case ARM_BUILTIN_WSRLWI:
+case ARM_BUILTIN_WSRLD:
+case ARM_BUILTIN_WSRLDI:
+case ARM_BUILTIN_WRORH:
+case ARM_BUILTIN_WRORHI:
+case ARM_BUILTIN_WRORW:
+case ARM_BUILTIN_WRORWI:
+case ARM_BUILTIN_WRORD:
+case ARM_BUILTIN_WRORDI:
+case ARM_BUILTIN_WAND:
+case ARM_BUILTIN_WANDN:
+case ARM_BUILTIN_WOR:
+case ARM_BUILTIN_WXOR:
+  icode = (fcode == ARM_BUILTIN_WSLLH ? CODE_FOR_ashlv4hi3_di
+	   : fcode == ARM_BUILTIN_WSLLHI ? CODE_FOR_ashlv4hi3_iwmmxt
+	   : fcode == ARM_BUILTIN_WSLLW  ? CODE_FOR_ashlv2si3_di
+	   : fcode == ARM_BUILTIN_WSLLWI ? CODE_FOR_ashlv2si3_iwmmxt
+	   : fcode == ARM_BUILTIN_WSLLD  ? CODE_FOR_ashldi3_di
+	   : fcode == ARM_BUILTIN_WSLLDI ? CODE_FOR_ashldi3_iwmmxt
+	   : fcode == ARM_BUILTIN_WSRAH  ? CODE_FOR_ashrv4hi3_di
+	   : fcode == ARM_BUILTIN_WSRAHI ? CODE_FOR_ashrv4hi3_iwmmxt
+	   : fcode == ARM_BUILTIN_WSRAW  ? CODE_FOR_ashrv2si3_di
+	   : fcode == ARM_BUILTIN_WSRAWI ? CODE_FOR_ashrv2si3_iwmmxt
+	   : fcode == ARM_BUILTIN_WSRAD  ? CODE_FOR_ashrdi3_di
+	   : fcode == ARM_BUILTIN_WSRADI ? CODE_FOR_ashrdi3_iwmmxt
+	   : fcode == ARM_BUILTIN_WSRLH  ? CODE_FOR_lshrv4hi3_di
+	   : fcode == ARM_BUILTIN_WSRLHI ? CODE_FOR_lshrv4hi3_iwmmxt
+	   : fcode == ARM_BUILTIN_WSRLW  ? CODE_FOR_lshrv2si3_di
+	   : fcode == ARM_BUILTIN_WSRLWI ? CODE_FOR_lshrv2si3_iwmmxt
+	   : fcode == ARM_BUILTIN_WSRLD  ? CODE_FOR_lshrdi3_di
+	   : fcode == ARM_BUILTIN_WSRLDI ? CODE_FOR_lshrdi3_iwmmxt
+	   : fcode == ARM_BUILTIN_WRORH  ? CODE_FOR_rorv4hi3_di
+	   : fcode == ARM_BUILTIN_WRORHI ? CODE_FOR_rorv4hi3
+	   : fcode == ARM_BUILTIN_WRORW  ? CODE_FOR_rorv2si3_di
+	   : fcode == ARM_BUILTIN_WRORWI ? CODE_FOR_rorv2si3
+	   : fcode == ARM_BUILTIN_WRORD  ? CODE_FOR_rordi3_di
+	   : fcode == ARM_BUILTIN_WRORDI ? CODE_FOR_rordi3
+	   : fcode == ARM_BUILTIN_WAND   ? CODE_FOR_iwmmxt_anddi3
+	   : fcode == ARM_BUILTIN_WANDN  ? CODE_FOR_iwmmxt_nanddi3
+	   : fcode == ARM_BUILTIN_WOR

[PATCH] Wire-up missing ARM iwmmxt intrinsics (bugs 35294, 36798, 36966)

2011-10-01 Thread Matt Turner

--- arm.c.orig  2011-05-05 04:39:40.0 -0400
+++ arm.c   2011-08-19 13:48:21.548405102 -0400
@@ -19218,7 +19218,8 @@
   || ! (*insn_data[icode].operand[0].predicate) (target, tmode))
 target = gen_reg_rtx (tmode);
 
-  gcc_assert (GET_MODE (op0) == mode0 && GET_MODE (op1) == mode1);
+  gcc_assert ((GET_MODE (op0) == mode0 || GET_MODE (op0) == VOIDmode)
+ && (GET_MODE (op1) == mode1 || GET_MODE (op1) == VOIDmode));
 
   if (! (*insn_data[icode].operand[1].predicate) (op0, mode0))
 op0 = copy_to_mode_reg (mode0, op0);
@@ -19814,6 +19815,65 @@
   emit_insn (pat);
   return target;
 
+case ARM_BUILTIN_WSLLH:
+case ARM_BUILTIN_WSLLHI:
+case ARM_BUILTIN_WSLLW:
+case ARM_BUILTIN_WSLLWI:
+case ARM_BUILTIN_WSLLD:
+case ARM_BUILTIN_WSLLDI:
+case ARM_BUILTIN_WSRAH:
+case ARM_BUILTIN_WSRAHI:
+case ARM_BUILTIN_WSRAW:
+case ARM_BUILTIN_WSRAWI:
+case ARM_BUILTIN_WSRAD:
+case ARM_BUILTIN_WSRADI:
+case ARM_BUILTIN_WSRLH:
+case ARM_BUILTIN_WSRLHI:
+case ARM_BUILTIN_WSRLW:
+case ARM_BUILTIN_WSRLWI:
+case ARM_BUILTIN_WSRLD:
+case ARM_BUILTIN_WSRLDI:
+case ARM_BUILTIN_WRORH:
+case ARM_BUILTIN_WRORHI:
+case ARM_BUILTIN_WRORW:
+case ARM_BUILTIN_WRORWI:
+case ARM_BUILTIN_WRORD:
+case ARM_BUILTIN_WRORDI:
+case ARM_BUILTIN_WAND:
+case ARM_BUILTIN_WANDN:
+case ARM_BUILTIN_WOR:
+case ARM_BUILTIN_WXOR:
+  icode = (fcode == ARM_BUILTIN_WSLLH ? CODE_FOR_ashlv4hi3_di
+  : fcode == ARM_BUILTIN_WSLLHI ? CODE_FOR_ashlv4hi3_iwmmxt
+  : fcode == ARM_BUILTIN_WSLLW  ? CODE_FOR_ashlv2si3_di
+  : fcode == ARM_BUILTIN_WSLLWI ? CODE_FOR_ashlv2si3_iwmmxt
+  : fcode == ARM_BUILTIN_WSLLD  ? CODE_FOR_ashldi3_di
+  : fcode == ARM_BUILTIN_WSLLDI ? CODE_FOR_ashldi3_iwmmxt
+  : fcode == ARM_BUILTIN_WSRAH  ? CODE_FOR_ashrv4hi3_di
+  : fcode == ARM_BUILTIN_WSRAHI ? CODE_FOR_ashrv4hi3_iwmmxt
+  : fcode == ARM_BUILTIN_WSRAW  ? CODE_FOR_ashrv2si3_di
+  : fcode == ARM_BUILTIN_WSRAWI ? CODE_FOR_ashrv2si3_iwmmxt
+  : fcode == ARM_BUILTIN_WSRAD  ? CODE_FOR_ashrdi3_di
+  : fcode == ARM_BUILTIN_WSRADI ? CODE_FOR_ashrdi3_iwmmxt
+  : fcode == ARM_BUILTIN_WSRLH  ? CODE_FOR_lshrv4hi3_di
+  : fcode == ARM_BUILTIN_WSRLHI ? CODE_FOR_lshrv4hi3_iwmmxt
+  : fcode == ARM_BUILTIN_WSRLW  ? CODE_FOR_lshrv2si3_di
+  : fcode == ARM_BUILTIN_WSRLWI ? CODE_FOR_lshrv2si3_iwmmxt
+  : fcode == ARM_BUILTIN_WSRLD  ? CODE_FOR_lshrdi3_di
+  : fcode == ARM_BUILTIN_WSRLDI ? CODE_FOR_lshrdi3_iwmmxt
+  : fcode == ARM_BUILTIN_WRORH  ? CODE_FOR_rorv4hi3_di
+  : fcode == ARM_BUILTIN_WRORHI ? CODE_FOR_rorv4hi3
+  : fcode == ARM_BUILTIN_WRORW  ? CODE_FOR_rorv2si3_di
+  : fcode == ARM_BUILTIN_WRORWI ? CODE_FOR_rorv2si3
+  : fcode == ARM_BUILTIN_WRORD  ? CODE_FOR_rordi3_di
+  : fcode == ARM_BUILTIN_WRORDI ? CODE_FOR_rordi3
+  : fcode == ARM_BUILTIN_WAND   ? CODE_FOR_iwmmxt_anddi3
+  : fcode == ARM_BUILTIN_WANDN  ? CODE_FOR_iwmmxt_nanddi3
+  : fcode == ARM_BUILTIN_WOR? CODE_FOR_iwmmxt_iordi3
+  : fcode == ARM_BUILTIN_WXOR   ? CODE_FOR_iwmmxt_xordi3
+  : CODE_FOR_rordi3);
+  return arm_expand_binop_builtin (icode, exp, target);
+
 case ARM_BUILTIN_WZERO:
   target = gen_reg_rtx (DImode);
   emit_insn (gen_iwmmxt_clrdi (target));


[PATCH 1/2] doc: Correct __builtin_arm_tinsr prototype documentation

2012-04-04 Thread Matt Turner
2012-04-04  Matt Turner  

gcc/
* doc/extend.texi (__builtin_arm_tinsrb): Add missing second
parameter.
(__builtin_arm_tinsrh): Likewise.
(__builtin_arm_tinsrw): Likewise.
---
This patch and 2/2 are tie-ons to
http://gcc.gnu.org/ml/gcc-patches/2012-02/msg01269.html

Still waiting on copyright assignment, but I think this doc patch
is trivial enough to be committed without it.

 gcc/doc/extend.texi |6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index bb43825..966175d 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -8676,9 +8676,9 @@ int __builtin_arm_textrmsw (v2si, int)
 int __builtin_arm_textrmub (v8qi, int)
 int __builtin_arm_textrmuh (v4hi, int)
 int __builtin_arm_textrmuw (v2si, int)
-v8qi __builtin_arm_tinsrb (v8qi, int)
-v4hi __builtin_arm_tinsrh (v4hi, int)
-v2si __builtin_arm_tinsrw (v2si, int)
+v8qi __builtin_arm_tinsrb (v8qi, int, int)
+v4hi __builtin_arm_tinsrh (v4hi, int, int)
+v2si __builtin_arm_tinsrw (v2si, int, int)
 long long __builtin_arm_tmia (long long, int, int)
 long long __builtin_arm_tmiabb (long long, int, int)
 long long __builtin_arm_tmiabt (long long, int, int)
-- 
1.7.3.4
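
For readers without an iWMMXt toolchain, the corrected three-argument prototype can be illustrated with a small portable model. This is only a sketch: `tinsrb_model` is a hypothetical helper, not a GCC builtin, and it assumes that the second argument supplies the byte to insert, the third selects the lane, and lane 0 is the least significant byte.

```c
#include <stdint.h>

/* Hypothetical portable model of the three-argument form
     v8qi __builtin_arm_tinsrb (v8qi, int, int)
   Replaces one 8-bit lane of a 64-bit value with the low byte of VALUE.
   Assumption: lane 0 is the least significant byte.  */
static uint64_t
tinsrb_model (uint64_t vec, int value, int lane)
{
  unsigned shift = ((unsigned) lane & 7u) * 8u;   /* lane index is 3 bits */
  uint64_t mask = (uint64_t) 0xff << shift;

  /* Clear the selected lane, then OR in the low byte of VALUE.  */
  return (vec & ~mask) | ((uint64_t) (value & 0xff) << shift);
}
```

The old two-argument prototype had no place for the lane selector, which is why the documentation needed the extra `int`.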



[PATCH] doc: Fix typo: mno-lsc -> mno-llsc

2012-04-04 Thread Matt Turner
2012-04-04  Matt Turner  

gcc/
* doc/install.texi: Correct typo "-mno-lsc" -> "-mno-llsc".
---
Still waiting on copyright assignment, but I think this doc patch
is trivial enough to be committed without it.

 gcc/doc/install.texi |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 41dbf44..6da6c09 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -1238,7 +1238,7 @@ Division by zero checks use the break instruction.
 
 @item --with-llsc
 On MIPS targets, make @option{-mllsc} the default when no
-@option{-mno-lsc} option is passed.  This is the default for
+@option{-mno-llsc} option is passed.  This is the default for
 Linux-based targets, as the kernel will emulate them if the ISA does
 not provide them.
 
-- 
1.7.3.4



[PATCH 2/2] arm: add iwMMXt mmx-2.c test

2012-04-04 Thread Matt Turner
2012-04-04  Matt Turner  

PR target/35294
* gcc.target/arm/mmx-2.c: New.
---
This patch and 1/2 are tie-ons to
http://gcc.gnu.org/ml/gcc-patches/2012-02/msg01269.html

Still waiting on copyright assignment, but please review in the meantime.

Is there anything else I need to do to wire this into the test suite
other than putting it in the testsuite/gcc.target/arm/ folder?

 gcc/testsuite/gcc.target/arm/mmx-2.c |  158 ++
 1 files changed, 158 insertions(+), 0 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/mmx-2.c

diff --git a/gcc/testsuite/gcc.target/arm/mmx-2.c b/gcc/testsuite/gcc.target/arm/mmx-2.c
new file mode 100644
index 000..603a63b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mmx-2.c
@@ -0,0 +1,158 @@
+/* { dg-do compile } */
+/* { dg-skip-if "Test is specific to the iWMMXt" { arm*-*-* } { "-mcpu=*" } { "-mcpu=iwmmxt" } } */
+/* { dg-skip-if "Test is specific to the iWMMXt" { arm*-*-* } { "-mabi=*" } { "-mabi=iwmmxt" } } */
+/* { dg-skip-if "Test is specific to the iWMMXt" { arm*-*-* } { "-march=*" } { "-march=iwmmxt" } } */
+/* { dg-skip-if "Test is specific to ARM mode" { arm*-*-* } { "-mthumb" } { "" } } */
+/* { dg-require-effective-target arm32 } */
+/* { dg-require-effective-target arm_iwmmxt_ok } */
+
+/* Internal data types for implementing the intrinsics.  */
+typedef int __v2si __attribute__ ((vector_size (8)));
+typedef short __v4hi __attribute__ ((vector_size (8)));
+typedef signed char __v8qi __attribute__ ((vector_size (8)));
+
+void
+foo(void)
+{
+  volatile int isink;
+  volatile long long llsink;
+  volatile __v8qi v8sink;
+  volatile __v4hi v4sink;
+  volatile __v2si v2sink;
+
+  isink = __builtin_arm_getwcx (0);
+  __builtin_arm_setwcx (isink, 0);
+  isink = __builtin_arm_textrmsb (v8sink, 0);
+  isink = __builtin_arm_textrmsh (v4sink, 0);
+  isink = __builtin_arm_textrmsw (v2sink, 0);
+  isink = __builtin_arm_textrmub (v8sink, 0);
+  isink = __builtin_arm_textrmuh (v4sink, 0);
+  isink = __builtin_arm_textrmuw (v2sink, 0);
+  v8sink = __builtin_arm_tinsrb (v8sink, isink, 0);
+  v4sink = __builtin_arm_tinsrh (v4sink, isink, 0);
+  v2sink = __builtin_arm_tinsrw (v2sink, isink, 0);
+  llsink = __builtin_arm_tmia (llsink, isink, isink);
+  llsink = __builtin_arm_tmiabb (llsink, isink, isink);
+  llsink = __builtin_arm_tmiabt (llsink, isink, isink);
+  llsink = __builtin_arm_tmiaph (llsink, isink, isink);
+  llsink = __builtin_arm_tmiatb (llsink, isink, isink);
+  llsink = __builtin_arm_tmiatt (llsink, isink, isink);
+  isink = __builtin_arm_tmovmskb (v8sink);
+  isink = __builtin_arm_tmovmskh (v4sink);
+  isink = __builtin_arm_tmovmskw (v2sink);
+  llsink = __builtin_arm_waccb (v8sink);
+  llsink = __builtin_arm_wacch (v4sink);
+  llsink = __builtin_arm_waccw (v2sink);
+  v8sink = __builtin_arm_waddb (v8sink, v8sink);
+  v8sink = __builtin_arm_waddbss (v8sink, v8sink);
+  v8sink = __builtin_arm_waddbus (v8sink, v8sink);
+  v4sink = __builtin_arm_waddh (v4sink, v4sink);
+  v4sink = __builtin_arm_waddhss (v4sink, v4sink);
+  v4sink = __builtin_arm_waddhus (v4sink, v4sink);
+  v2sink = __builtin_arm_waddw (v2sink, v2sink);
+  v2sink = __builtin_arm_waddwss (v2sink, v2sink);
+  v2sink = __builtin_arm_waddwus (v2sink, v2sink);
+  v8sink = __builtin_arm_walign (v8sink, v8sink, 0);  /* waligni: 3-bit immediate.  */
+  v8sink = __builtin_arm_walign (v8sink, v8sink, isink); /* walignr: GP register.  */
+  llsink = __builtin_arm_wand (llsink, llsink);
+  llsink = __builtin_arm_wandn (llsink, llsink);
+  v8sink = __builtin_arm_wavg2b (v8sink, v8sink);
+  v8sink = __builtin_arm_wavg2br (v8sink, v8sink);
+  v4sink = __builtin_arm_wavg2h (v4sink, v4sink);
+  v4sink = __builtin_arm_wavg2hr (v4sink, v4sink);
+  v8sink = __builtin_arm_wcmpeqb (v8sink, v8sink);
+  v4sink = __builtin_arm_wcmpeqh (v4sink, v4sink);
+  v2sink = __builtin_arm_wcmpeqw (v2sink, v2sink);
+  v8sink = __builtin_arm_wcmpgtsb (v8sink, v8sink);
+  v4sink = __builtin_arm_wcmpgtsh (v4sink, v4sink);
+  v2sink = __builtin_arm_wcmpgtsw (v2sink, v2sink);
+  v8sink = __builtin_arm_wcmpgtub (v8sink, v8sink);
+  v4sink = __builtin_arm_wcmpgtuh (v4sink, v4sink);
+  v2sink = __builtin_arm_wcmpgtuw (v2sink, v2sink);
+  llsink = __builtin_arm_wmacs (llsink, v4sink, v4sink);
+  llsink = __builtin_arm_wmacsz (v4sink, v4sink);
+  llsink = __builtin_arm_wmacu (llsink, v4sink, v4sink);
+  llsink = __builtin_arm_wmacuz (v4sink, v4sink);
+  v4sink = __builtin_arm_wmadds (v4sink, v4sink);
+  v4sink = __builtin_arm_wmaddu (v4sink, v4sink);
+  v8sink = __builtin_arm_wmaxsb (v8sink, v8sink);
+  v4sink = __builtin_arm_wmaxsh (v4sink, v4sink);
+  v2sink = __builtin_arm_wmaxsw (v2sink, v2sink);
+  v8sink = __builtin_arm_wmaxub (v8sink, v8sink);
+  v4sink = __builtin_arm_wmaxuh (v4sink, v4sink);
+  v2sink = __builtin_arm_wmaxuw (v2sink, v2sink);
+  v8sink = __

[PING] iwMMXt patches

2012-04-17 Thread Matt Turner
Are these patches ready to go in? It looks like they were ack'd.

http://gcc.gnu.org/ml/gcc-patches/2011-10/msg01815.html
http://gcc.gnu.org/ml/gcc-patches/2011-10/msg01817.html
http://gcc.gnu.org/ml/gcc-patches/2011-10/msg01816.html
http://gcc.gnu.org/ml/gcc-patches/2011-10/msg01818.html
http://gcc.gnu.org/ml/gcc-patches/2011-10/msg01819.html

We (OLPC) will need these patches for reasonable iwMMXt performance
and the ability to use VFP and iwMMXt together.

Thanks,
Matt


[PATCH 1/2] mips: Add R4600 scheduling support for imul and idiv

2012-02-24 Thread Matt Turner
The r4600_imul and r4600_idiv reservations were correct for si, but
there were no *_di reservations.

See page 4 of
http://www.sgistuff.net/hardware/other/documents/R4600_Prod_OV.pdf

2012-02-24  Matt Turner  

* config/mips/4600.md (r4600_imul_si): Rename from r4600_imul.
(r4600_imul_di): New.
(r4600_idiv_si): Rename from r4600_idiv.
(r4600_idiv_di): New.
---
 gcc/config/mips/4600.md |   24 +++-
 1 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/gcc/config/mips/4600.md b/gcc/config/mips/4600.md
index c645cbc..fcdbf00 100644
--- a/gcc/config/mips/4600.md
+++ b/gcc/config/mips/4600.md
@@ -1,5 +1,5 @@
 ;; R4600 and R4650 pipeline description.
-;;   Copyright (C) 2004, 2005, 2007 Free Software Foundation, Inc.
+;;   Copyright (C) 2004, 2005, 2007, 2012 Free Software Foundation, Inc.
 ;;
 ;; This file is part of GCC.
 
@@ -24,16 +24,30 @@
 ;; We handle the R4600 and R4650 in much the same way.  The only difference
 ;; is in the integer multiplication and division costs.
 
-(define_insn_reservation "r4600_imul" 10
+(define_insn_reservation "r4600_imul_si" 10
   (and (eq_attr "cpu" "r4600")
-   (eq_attr "type" "imul,imul3,imadd"))
+   (eq_attr "type" "imul,imul3,imadd")
+   (eq_attr "mode" "SI"))
   "imuldiv*10")
 
-(define_insn_reservation "r4600_idiv" 42
+(define_insn_reservation "r4600_imul_di" 12
   (and (eq_attr "cpu" "r4600")
-   (eq_attr "type" "idiv"))
+   (eq_attr "type" "imul,imul3,imadd")
+   (eq_attr "mode" "DI"))
+  "imuldiv*12")
+
+(define_insn_reservation "r4600_idiv_si" 42
+  (and (eq_attr "cpu" "r4600")
+   (eq_attr "type" "idiv")
+   (eq_attr "mode" "SI"))
   "imuldiv*42")
 
+(define_insn_reservation "r4600_idiv_di" 74
+  (and (eq_attr "cpu" "r4600")
+   (eq_attr "type" "idiv")
+   (eq_attr "mode" "DI"))
+  "imuldiv*74")
+
 
 (define_insn_reservation "r4650_imul" 4
   (and (eq_attr "cpu" "r4650")
-- 
1.7.3.4
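
As a reading aid, the latencies encoded by these reservations can be written out as a plain C lookup. This is a sketch only: `r4x00_latency` and the two enums are hypothetical names, not GCC code. The R4600 numbers come from this patch; the R4700 multiply numbers come from the companion 2/2 patch, which also shares the R4600 divide reservations with the R4700.

```c
#include <string.h>

/* Hypothetical stand-ins for the "type" and "mode" insn attributes.
   TYPE_IMUL covers the imul, imul3 and imadd types.  */
enum insn_type { TYPE_IMUL, TYPE_IDIV };
enum insn_mode { MODE_SI, MODE_DI };

/* Return the latency in cycles that the reservations assign, or -1 for
   a CPU this sketch does not know about.  */
static int
r4x00_latency (const char *cpu, enum insn_type type, enum insn_mode mode)
{
  int is_r4600 = strcmp (cpu, "r4600") == 0;
  int is_r4700 = strcmp (cpu, "r4700") == 0;

  if (!is_r4600 && !is_r4700)
    return -1;

  /* r4600_idiv_si / r4600_idiv_di: shared by both CPUs.  */
  if (type == TYPE_IDIV)
    return mode == MODE_SI ? 42 : 74;

  /* r4600_imul_si / r4600_imul_di.  */
  if (is_r4600)
    return mode == MODE_SI ? 10 : 12;

  /* r4700_imul_si / r4700_imul_di.  */
  return mode == MODE_SI ? 8 : 10;
}
```

Before this patch the single `r4600_imul` and `r4600_idiv` reservations applied the SI numbers (10 and 42) regardless of mode, so 64-bit multiplies and divides were scheduled as if they were cheaper than they are.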



Miscellaneous mips, arm, and alpha patches

2012-02-24 Thread Matt Turner
Hi,

Following this email are five rather trivial patches that I've had
sitting around while waiting for my grad school and the Free Software
Foundation to decide it's okay for me to contribute. I don't have
copyright assignment for gcc yet, but I thought I would pipeline this
process and try to get the patches at least reviewed before the
paperwork is completed. If they're trivial enough to be committed
without copyright assignment, I'd love for them to be committed for
gcc 4.8.

The patches are

[PATCH 1/2] mips: Add R4600 scheduling support for imul and idiv
[PATCH 2/2] mips: Add R4700 scheduling support
[PATCH] arm: Fix iwmmxt shift and logical intrinsics (PR 35294)
[PATCH] arm: add _mm_empty to mmintrin.h for source compatibility
[PATCH] alpha: add bypasses for fmul/fadd/fcmov -> fst/ftoi

I have not contributed to gcc before, so please tell me if I've missed
a step or didn't format the ChangeLog entries properly, and so forth.
Please CC me on replies.

Thanks,
Matt Turner


[PATCH] alpha: add bypasses for fmul/fadd/fcmov -> fst/ftoi

2012-02-24 Thread Matt Turner
See section 2.5.3 (page 28) of
http://download.majix.org/dec/comp_guide_v2.pdf

2012-02-24  Matt Turner  

* config/alpha/ev6.md (define_bypass "ev6_fmul,ev6_fadd"): New.
(define_bypass "ev6_fcmov"): New.
---
 gcc/config/alpha/ev6.md |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/gcc/config/alpha/ev6.md b/gcc/config/alpha/ev6.md
index adfe504..a16535a 100644
--- a/gcc/config/alpha/ev6.md
+++ b/gcc/config/alpha/ev6.md
@@ -147,11 +147,15 @@
(eq_attr "type" "fadd,fcpys,fbr"))
   "ev6_fa")
 
+(define_bypass 6 "ev6_fmul,ev6_fadd" "ev6_fst,ev6_ftoi")
+
 (define_insn_reservation "ev6_fcmov" 8
   (and (eq_attr "tune" "ev6")
(eq_attr "type" "fcmov"))
   "ev6_fa,nothing*3,ev6_fa")
 
+(define_bypass 10 "ev6_fcmov" "ev6_fst,ev6_ftoi")
+
 (define_insn_reservation "ev6_fdivsf" 12
   (and (eq_attr "tune" "ev6")
(and (eq_attr "type" "fdiv")
-- 
1.7.3.4
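
The effect of a `define_bypass` can be sketched in C as a latency override for specific producer/consumer pairs. The names below (`ev6_consumer`, `ev6_result_latency`) are hypothetical and exist only for illustration; the scheduler itself is driven by the generated automaton, not by code like this.

```c
/* Hypothetical classification of the instruction that consumes a result.  */
enum ev6_consumer { EV6_OTHER, EV6_FST, EV6_FTOI };

/* Latency seen by CONSUMER for a producer whose reservation has
   DEFAULT_LATENCY and whose bypass to fst/ftoi has BYPASS_LATENCY.
   For ev6_fcmov the patch gives default 8 and bypass 10; for
   ev6_fmul/ev6_fadd the bypass is 6.  */
static int
ev6_result_latency (int default_latency, int bypass_latency,
                    enum ev6_consumer consumer)
{
  if (consumer == EV6_FST || consumer == EV6_FTOI)
    return bypass_latency;
  return default_latency;
}
```

For example, `ev6_result_latency (8, 10, EV6_FST)` models a floating-point store that waits on an `fcmov` result: it sees 10 cycles rather than the 8 given by the reservation.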



[PATCH 2/2] mips: Add R4700 scheduling support

2012-02-24 Thread Matt Turner
The R4700 is identical to the R4600 except for the integer and
floating-point multiplication costs.

See page 4 of http://datasheets.chipdb.org/IDT/MIPS/79RV4700.pdf

2012-02-24  Matt Turner  

* config/mips/4600.md (r4700_imul_si): New.
(r4700_imul_di): New.
(r4700_fmul_single): New.
(r4700_fmul_double): New.
* config/mips/driver-native.c (cpu_types): Add r4700.
* config/mips/mips-cpus.def: Likewise.
* config/mips/mips.c: Likewise.
* config/mips/mips.md: Likewise.
---
 gcc/config/mips/4600.md |   51 ++
 gcc/config/mips/driver-native.c |2 +-
 gcc/config/mips/mips-cpus.def   |1 +
 gcc/config/mips/mips.c  |3 ++
 gcc/config/mips/mips.md |1 +
 5 files changed, 46 insertions(+), 12 deletions(-)

diff --git a/gcc/config/mips/4600.md b/gcc/config/mips/4600.md
index fcdbf00..ef74fd3 100644
--- a/gcc/config/mips/4600.md
+++ b/gcc/config/mips/4600.md
@@ -1,4 +1,4 @@
-;; R4600 and R4650 pipeline description.
+;; R4600, R4650, and R4700 pipeline description.
 ;;   Copyright (C) 2004, 2005, 2007, 2012 Free Software Foundation, Inc.
 ;;
 ;; This file is part of GCC.
@@ -21,8 +21,10 @@
 ;; This file overrides parts of generic.md.  It is derived from the
 ;; old define_function_unit description.
 ;;
-;; We handle the R4600 and R4650 in much the same way.  The only difference
-;; is in the integer multiplication and division costs.
+;; We handle the R4600, R4650, and R4700 in much the same way.  The only
+;; differences between R4600 and R4650 are the integer multiplication and
+;; division costs. The only differences between R4600 and R4700 are the
+;; integer and floating-point multiplication costs.
 
 (define_insn_reservation "r4600_imul_si" 10
   (and (eq_attr "cpu" "r4600")
@@ -37,13 +39,13 @@
   "imuldiv*12")
 
 (define_insn_reservation "r4600_idiv_si" 42
-  (and (eq_attr "cpu" "r4600")
+  (and (eq_attr "cpu" "r4600,r4700")
(eq_attr "type" "idiv")
(eq_attr "mode" "SI"))
   "imuldiv*42")
 
 (define_insn_reservation "r4600_idiv_di" 74
-  (and (eq_attr "cpu" "r4600")
+  (and (eq_attr "cpu" "r4600,r4700")
(eq_attr "type" "idiv")
(eq_attr "mode" "DI"))
   "imuldiv*74")
@@ -60,13 +62,26 @@
   "imuldiv*36")
 
 
+(define_insn_reservation "r4700_imul_si" 8
+  (and (eq_attr "cpu" "r4700")
+   (eq_attr "type" "imul,imul3,imadd")
+   (eq_attr "mode" "SI"))
+  "imuldiv*8")
+
+(define_insn_reservation "r4700_imul_di" 10
+  (and (eq_attr "cpu" "r4700")
+   (eq_attr "type" "imul,imul3,imadd")
+   (eq_attr "mode" "DI"))
+  "imuldiv*10")
+
+
 (define_insn_reservation "r4600_load" 2
-  (and (eq_attr "cpu" "r4600,r4650")
+  (and (eq_attr "cpu" "r4600,r4650,r4700")
(eq_attr "type" "load,fpload,fpidxload"))
   "alu")
 
 (define_insn_reservation "r4600_fmove" 1
-  (and (eq_attr "cpu" "r4600,r4650")
+  (and (eq_attr "cpu" "r4600,r4650,r4700")
(eq_attr "type" "fabs,fneg,fmove"))
   "alu")
 
@@ -76,26 +91,40 @@
(eq_attr "mode" "SF")))
   "alu")
 
+
+(define_insn_reservation "r4700_fmul_single" 4
+  (and (eq_attr "cpu" "r4700")
+   (and (eq_attr "type" "fmul,fmadd")
+   (eq_attr "mode" "SF")))
+  "alu")
+
+(define_insn_reservation "r4700_fmul_double" 5
+  (and (eq_attr "cpu" "r4700")
+   (and (eq_attr "type" "fmul,fmadd")
+   (eq_attr "mode" "DF")))
+  "alu")
+
+
 (define_insn_reservation "r4600_fdiv_single" 32
-  (and (eq_attr "cpu" "r4600,r4650")
+  (and (eq_attr "cpu" "r4600,r4650,r4700")
(and (eq_attr "type" "fdiv,frdiv")
(eq_attr "mode" "SF")))
   "alu")
 
 (define_insn_reservation "r4600_fdiv_double" 61
-  (and (eq_attr "cpu" "r4600,r4650")
+  (and (eq_attr "cpu" "r4600,r4650,r4700")
(and (eq_attr "type" "fdiv,frdiv")
(eq_attr "mode" "DF")))
   "alu")
 
 (define_insn_reservation "r4600_fsqrt_single" 31
-  (and (eq_attr "cpu" "r4600,r4650")
+  (and (eq_attr "cpu" "r4600,r4

[PATCH] arm: Fix iwmmxt shift and logical intrinsics (PR 35294).

2012-02-24 Thread Matt Turner
PR 36798 and 36966 are duplicates.

2012-02-24  Matt Turner  

PR target/35294
* config/arm/arm.c (arm_expand_builtin): Wire up missing
intrinsics.
---
 gcc/config/arm/arm.c |   62 +-
 1 files changed, 61 insertions(+), 1 deletions(-)

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 7f0dc6b..f5935d6 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -20502,7 +20502,8 @@ arm_expand_binop_builtin (enum insn_code icode,
   || ! (*insn_data[icode].operand[0].predicate) (target, tmode))
 target = gen_reg_rtx (tmode);
 
-  gcc_assert (GET_MODE (op0) == mode0 && GET_MODE (op1) == mode1);
+  gcc_assert ((GET_MODE (op0) == mode0 || GET_MODE (op0) == VOIDmode)
+ && (GET_MODE (op1) == mode1 || GET_MODE (op1) == VOIDmode));
 
   if (! (*insn_data[icode].operand[1].predicate) (op0, mode0))
 op0 = copy_to_mode_reg (mode0, op0);
@@ -21181,6 +21182,65 @@ arm_expand_builtin (tree exp,
   emit_insn (pat);
   return target;
 
+case ARM_BUILTIN_WSLLH:
+case ARM_BUILTIN_WSLLHI:
+case ARM_BUILTIN_WSLLW:
+case ARM_BUILTIN_WSLLWI:
+case ARM_BUILTIN_WSLLD:
+case ARM_BUILTIN_WSLLDI:
+case ARM_BUILTIN_WSRAH:
+case ARM_BUILTIN_WSRAHI:
+case ARM_BUILTIN_WSRAW:
+case ARM_BUILTIN_WSRAWI:
+case ARM_BUILTIN_WSRAD:
+case ARM_BUILTIN_WSRADI:
+case ARM_BUILTIN_WSRLH:
+case ARM_BUILTIN_WSRLHI:
+case ARM_BUILTIN_WSRLW:
+case ARM_BUILTIN_WSRLWI:
+case ARM_BUILTIN_WSRLD:
+case ARM_BUILTIN_WSRLDI:
+case ARM_BUILTIN_WRORH:
+case ARM_BUILTIN_WRORHI:
+case ARM_BUILTIN_WRORW:
+case ARM_BUILTIN_WRORWI:
+case ARM_BUILTIN_WRORD:
+case ARM_BUILTIN_WRORDI:
+case ARM_BUILTIN_WAND:
+case ARM_BUILTIN_WANDN:
+case ARM_BUILTIN_WOR:
+case ARM_BUILTIN_WXOR:
+  icode = (fcode == ARM_BUILTIN_WSLLH ? CODE_FOR_ashlv4hi3_di
+  : fcode == ARM_BUILTIN_WSLLHI ? CODE_FOR_ashlv4hi3_iwmmxt
+  : fcode == ARM_BUILTIN_WSLLW  ? CODE_FOR_ashlv2si3_di
+  : fcode == ARM_BUILTIN_WSLLWI ? CODE_FOR_ashlv2si3_iwmmxt
+  : fcode == ARM_BUILTIN_WSLLD  ? CODE_FOR_ashldi3_di
+  : fcode == ARM_BUILTIN_WSLLDI ? CODE_FOR_ashldi3_iwmmxt
+  : fcode == ARM_BUILTIN_WSRAH  ? CODE_FOR_ashrv4hi3_di
+  : fcode == ARM_BUILTIN_WSRAHI ? CODE_FOR_ashrv4hi3_iwmmxt
+  : fcode == ARM_BUILTIN_WSRAW  ? CODE_FOR_ashrv2si3_di
+  : fcode == ARM_BUILTIN_WSRAWI ? CODE_FOR_ashrv2si3_iwmmxt
+  : fcode == ARM_BUILTIN_WSRAD  ? CODE_FOR_ashrdi3_di
+  : fcode == ARM_BUILTIN_WSRADI ? CODE_FOR_ashrdi3_iwmmxt
+  : fcode == ARM_BUILTIN_WSRLH  ? CODE_FOR_lshrv4hi3_di
+  : fcode == ARM_BUILTIN_WSRLHI ? CODE_FOR_lshrv4hi3_iwmmxt
+  : fcode == ARM_BUILTIN_WSRLW  ? CODE_FOR_lshrv2si3_di
+  : fcode == ARM_BUILTIN_WSRLWI ? CODE_FOR_lshrv2si3_iwmmxt
+  : fcode == ARM_BUILTIN_WSRLD  ? CODE_FOR_lshrdi3_di
+  : fcode == ARM_BUILTIN_WSRLDI ? CODE_FOR_lshrdi3_iwmmxt
+  : fcode == ARM_BUILTIN_WRORH  ? CODE_FOR_rorv4hi3_di
+  : fcode == ARM_BUILTIN_WRORHI ? CODE_FOR_rorv4hi3
+  : fcode == ARM_BUILTIN_WRORW  ? CODE_FOR_rorv2si3_di
+  : fcode == ARM_BUILTIN_WRORWI ? CODE_FOR_rorv2si3
+  : fcode == ARM_BUILTIN_WRORD  ? CODE_FOR_rordi3_di
+  : fcode == ARM_BUILTIN_WRORDI ? CODE_FOR_rordi3
+  : fcode == ARM_BUILTIN_WAND   ? CODE_FOR_iwmmxt_anddi3
+  : fcode == ARM_BUILTIN_WANDN  ? CODE_FOR_iwmmxt_nanddi3
+  : fcode == ARM_BUILTIN_WOR? CODE_FOR_iwmmxt_iordi3
+  : fcode == ARM_BUILTIN_WXOR   ? CODE_FOR_iwmmxt_xordi3
+  : CODE_FOR_rordi3);
+  return arm_expand_binop_builtin (icode, exp, target);
+
 case ARM_BUILTIN_WZERO:
   target = gen_reg_rtx (DImode);
   emit_insn (gen_iwmmxt_clrdi (target));
-- 
1.7.3.4
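
The conditional chain in the second hunk maps each builtin function code to exactly one insn code. The same mapping can be expressed as a table, which is sketched below for a handful of entries. The enums and `lookup_binop_icode` are simplified stand-ins invented for this illustration; they are not GCC's `ARM_BUILTIN_*` or `CODE_FOR_*` definitions, and the patch itself uses the conditional chain.

```c
#include <stddef.h>

/* Simplified stand-ins for a few builtin and insn codes.  */
enum builtin_code { BUILTIN_WSLLH, BUILTIN_WSLLHI, BUILTIN_WAND,
                    BUILTIN_WXOR, BUILTIN_WZERO };
enum insn_code { CODE_NONE, CODE_ashlv4hi3_di, CODE_ashlv4hi3_iwmmxt,
                 CODE_iwmmxt_anddi3, CODE_iwmmxt_xordi3 };

struct builtin_map
{
  enum builtin_code fcode;
  enum insn_code icode;
};

/* One row per binary-operation builtin.  */
static const struct builtin_map binop_map[] = {
  { BUILTIN_WSLLH,  CODE_ashlv4hi3_di },
  { BUILTIN_WSLLHI, CODE_ashlv4hi3_iwmmxt },
  { BUILTIN_WAND,   CODE_iwmmxt_anddi3 },
  { BUILTIN_WXOR,   CODE_iwmmxt_xordi3 },
};

/* Return the insn code for FCODE, or CODE_NONE if FCODE is not a
   binary-operation builtin handled by the table.  */
static enum insn_code
lookup_binop_icode (enum builtin_code fcode)
{
  size_t i;

  for (i = 0; i < sizeof binop_map / sizeof binop_map[0]; i++)
    if (binop_map[i].fcode == fcode)
      return binop_map[i].icode;
  return CODE_NONE;
}
```

A table makes a missing entry visible as `CODE_NONE`, whereas the chain in the patch falls through to `CODE_FOR_rordi3` as its final default.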



[PATCH] arm: add _mm_empty to mmintrin.h for source compatibility

2012-02-24 Thread Matt Turner
The x86/amd64 mmintrin.h provides the _mm_empty intrinsic for the 'emms'
MMX instruction. Although ARM does not need such an instruction, we
should provide an empty _mm_empty function nonetheless for source
compatibility.

2012-02-24  Matt Turner  

* config/arm/mmintrin.h (_mm_empty): New.
---
 gcc/config/arm/mmintrin.h |7 +++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/gcc/config/arm/mmintrin.h b/gcc/config/arm/mmintrin.h
index 2cc500d..ea73bf1 100644
--- a/gcc/config/arm/mmintrin.h
+++ b/gcc/config/arm/mmintrin.h
@@ -32,6 +32,12 @@ typedef int __v2si __attribute__ ((vector_size (8)));
 typedef short __v4hi __attribute__ ((vector_size (8)));
 typedef char __v8qi __attribute__ ((vector_size (8)));
 
+/* Provided for source compatibility with MMX.  */
+extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_empty (void)
+{
+}
+
 /* "Convert" __m64 and __int64 into each other.  */
 static __inline __m64 
 _mm_cvtsi64_m64 (__int64 __i)
@@ -1248,6 +1254,7 @@ _m_from_int (int __a)
 #define _m_psadzbw _mm_sadz_pu8
 #define _m_psadzwd _mm_sadz_pu16
 #define _m_paligniq _mm_align_si64
+#define _m_empty _mm_empty
 #define _m_cvt_si2pi _mm_cvtsi64_m64
 #define _m_cvt_pi2si _mm_cvtm64_si64
 
-- 
1.7.3.4
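
The kind of source this is meant to keep compiling looks roughly like the sketch below. It is written so that it builds on any host: `compat_mm_empty` is a hypothetical stand-in for `_mm_empty`, declared as a plain `static inline` function rather than with the `gnu_inline` attributes the patch uses, and `sum_bytes` is an invented example kernel.

```c
#include <stdint.h>

/* Stand-in for _mm_empty on a target with no 'emms' equivalent:
   the call compiles to nothing.  */
static inline void
compat_mm_empty (void)
{
}

/* Hypothetical kernel written in the x86 MMX style: do the packed work,
   then call the "empty" hook before returning to ordinary code.  */
static uint32_t
sum_bytes (uint64_t packed)
{
  uint32_t sum = 0;
  int i;

  for (i = 0; i < 8; i++)
    sum += (uint32_t) ((packed >> (8 * i)) & 0xff);

  compat_mm_empty ();   /* required on x86 MMX, a no-op here */
  return sum;
}
```

With the patch applied, code of this shape can call `_mm_empty` (or `_m_empty`) unconditionally instead of wrapping the call in target-specific `#ifdef`s.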