Re: [PATCH, GCC] Fix PR67781: wrong code generation for partial load on big endian targets

2016-01-20 Thread Thomas Preud'homme
On Friday, January 08, 2016 10:05:25 AM Richard Biener wrote:
> On Tue, 5 Jan 2016, Thomas Preud'homme wrote:
> > Hi,
> > 
> > The bswap optimization pass generates wrong code on big endian targets when
> > the result of a bit operation it analyzed is a partial load of the range of
> > memory accessed by the original expression (when one or more bytes at the
> > lowest address were lost in the computation). This is due to the way
> > cmpxchg and cmpnop are adjusted in find_bswap_or_nop before being
> > compared to the result of the symbolic expression. Part of the adjustment
> > is endian independent: it ignores the bytes that were not accessed by
> > the original gimple expression. However, when the result has fewer bytes
> > than the original expression, some more bytes need to be ignored, and this
> > is endian dependent.
> > 
> > The current code only supports loss of bytes at the highest addresses
> > because there is no code to adjust the address of the load. However, for
> > little and big endian targets the bytes at the highest address translate
> > into different byte significance in the result. This patch first separates
> > the cmpxchg and cmpnop adjustment into two steps and then deals with
> > endianness correctly for the second step.
> > 
> > ChangeLog entries are as follow:
> > 
> > 
> > *** gcc/ChangeLog ***
> > 
> > 2015-12-16  Thomas Preud'homme  
> > 
> > PR tree-optimization/67781
> > * tree-ssa-math-opts.c (find_bswap_or_nop): Zero out bytes in cmpxchg
> > and cmpnop in two steps: first the ones not accessed in the original
> > gimple expression in an endian independent way and then the ones not
> > accessed in the final result in an endian-specific way.
> > 
> > *** gcc/testsuite/ChangeLog ***
> > 
> > 2015-12-16  Thomas Preud'homme  
> > 
> > PR tree-optimization/67781
> > * gcc.c-torture/execute/pr67781.c: New file.
> > 
> > diff --git a/gcc/testsuite/gcc.c-torture/execute/pr67781.c
> > b/gcc/testsuite/gcc.c-torture/execute/pr67781.c
> > new file mode 100644
> > index 000..bf50aa2
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.c-torture/execute/pr67781.c
> > @@ -0,0 +1,34 @@
> > +#ifdef __UINT32_TYPE__
> > +typedef __UINT32_TYPE__ uint32_t;
> > +#else
> > +typedef unsigned uint32_t;
> > +#endif
> > +
> > +#ifdef __UINT8_TYPE__
> > +typedef __UINT8_TYPE__ uint8_t;
> > +#else
> > +typedef unsigned char uint8_t;
> > +#endif
> > +
> > +struct
> > +{
> > +  uint32_t a;
> > +  uint8_t b;
> > +} s = { 0x123456, 0x78 };
> > +
> > +int pr67781()
> > +{
> > +  uint32_t c = (s.a << 8) | s.b;
> > +  return c;
> > +}
> > +
> > +int
> > +main ()
> > +{
> > +  if (sizeof (uint32_t) * __CHAR_BIT__ != 32)
> > +    return 0;
> > +
> > +  if (pr67781 () != 0x12345678)
> > +    __builtin_abort ();
> > +  return 0;
> > +}
> > diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
> > index b00f046..e5a185f 100644
> > --- a/gcc/tree-ssa-math-opts.c
> > +++ b/gcc/tree-ssa-math-opts.c
> > @@ -2441,6 +2441,8 @@ find_bswap_or_nop_1 (gimple *stmt, struct
> > symbolic_number *n, int limit)
> > 
> >  static gimple *
> >  find_bswap_or_nop (gimple *stmt, struct symbolic_number *n, bool *bswap)
> >  {
> > 
> > +  unsigned rsize;
> > +  uint64_t tmpn, mask;
> > 
> >  /* The number which the find_bswap_or_nop_1 result should match in order
> >  
> > to have a full byte swap.  The number is shifted to the right
> > according to the size of the symbolic number before using it.  */
> > 
> > @@ -2464,24 +2466,38 @@ find_bswap_or_nop (gimple *stmt, struct
> > symbolic_number *n, bool *bswap)
> > 
> >/* Find real size of result (highest non-zero byte).  */
> >if (n->base_addr)
> > 
> > -{
> > -  int rsize;
> > -  uint64_t tmpn;
> > -
> > -  for (tmpn = n->n, rsize = 0; tmpn; tmpn >>= BITS_PER_MARKER, rsize++);
> > -  n->range = rsize;
> > -}
> > +for (tmpn = n->n, rsize = 0; tmpn; tmpn >>= BITS_PER_MARKER, rsize++);
> > +  else
> > +rsize = n->range;
> > 
> > -  /* Zero out the extra bits of N and CMP*.  */
> > +  /* Zero out the bits corresponding to untouched bytes in original gimple
> > + expression.  */

Re: [PATCH, GCC] Fix PR67781: wrong code generation for partial load on big endian targets

2016-01-21 Thread Thomas Preud'homme
On Thursday, January 21, 2016 09:21:52 AM Richard Biener wrote:
> On Thu, 21 Jan 2016, Thomas Preud'homme wrote:
> > On Friday, January 08, 2016 10:05:25 AM Richard Biener wrote:
> > > On Tue, 5 Jan 2016, Thomas Preud'homme wrote:
> > > > Hi,
> > > > 
> > > > The bswap optimization pass generates wrong code on big endian targets
> > > > when the result of a bit operation it analyzed is a partial load of the
> > > > range of memory accessed by the original expression (when one or more
> > > > bytes at the lowest address were lost in the computation). This is due
> > > > to the way cmpxchg and cmpnop are adjusted in find_bswap_or_nop before
> > > > being compared to the result of the symbolic expression. Part of the
> > > > adjustment is endian independent: it ignores the bytes that were not
> > > > accessed by the original gimple expression. However, when the result
> > > > has fewer bytes than the original expression, some more bytes need to
> > > > be ignored, and this is endian dependent.
> > > > 
> > > > The current code only supports loss of bytes at the highest addresses
> > > > because there is no code to adjust the address of the load. However,
> > > > for little and big endian targets the bytes at the highest address
> > > > translate into different byte significance in the result. This patch
> > > > first separates the cmpxchg and cmpnop adjustment into two steps and
> > > > then deals with endianness correctly for the second step.
> > > > 
> > > > ChangeLog entries are as follow:
> > > > 
> > > > 
> > > > *** gcc/ChangeLog ***
> > > > 
> > > > 2015-12-16  Thomas Preud'homme  
> > > > 
> > > > PR tree-optimization/67781
> > > > * tree-ssa-math-opts.c (find_bswap_or_nop): Zero out bytes in cmpxchg
> > > > and cmpnop in two steps: first the ones not accessed in the original
> > > > gimple expression in an endian independent way and then the ones not
> > > > accessed in the final result in an endian-specific way.
> > > > 
> > > > *** gcc/testsuite/ChangeLog ***
> > > > 
> > > > 2015-12-16  Thomas Preud'homme  
> > > > 
> > > > PR tree-optimization/67781
> > > > * gcc.c-torture/execute/pr67781.c: New file.
> > > > 
> > > > diff --git a/gcc/testsuite/gcc.c-torture/execute/pr67781.c
> > > > b/gcc/testsuite/gcc.c-torture/execute/pr67781.c
> > > > new file mode 100644
> > > > index 000..bf50aa2
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.c-torture/execute/pr67781.c
> > > > @@ -0,0 +1,34 @@
> > > > +#ifdef __UINT32_TYPE__
> > > > +typedef __UINT32_TYPE__ uint32_t;
> > > > +#else
> > > > +typedef unsigned uint32_t;
> > > > +#endif
> > > > +
> > > > +#ifdef __UINT8_TYPE__
> > > > +typedef __UINT8_TYPE__ uint8_t;
> > > > +#else
> > > > +typedef unsigned char uint8_t;
> > > > +#endif
> > > > +
> > > > +struct
> > > > +{
> > > > +  uint32_t a;
> > > > +  uint8_t b;
> > > > +} s = { 0x123456, 0x78 };
> > > > +
> > > > +int pr67781()
> > > > +{
> > > > +  uint32_t c = (s.a << 8) | s.b;
> > > > +  return c;
> > > > +}
> > > > +
> > > > +int
> > > > +main ()
> > > > +{
> > > > +  if (sizeof (uint32_t) * __CHAR_BIT__ != 32)
> > > > +    return 0;
> > > > +
> > > > +  if (pr67781 () != 0x12345678)
> > > > +    __builtin_abort ();
> > > > +  return 0;
> > > > +}
> > > > diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
> > > > index b00f046..e5a185f 100644
> > > > --- a/gcc/tree-ssa-math-opts.c
> > > > +++ b/gcc/tree-ssa-math-opts.c
> > > > @@ -2441,6 +2441,8 @@ find_bswap_or_nop_1 (gimple *stmt, struct
> > > > symbolic_number *n, int limit)
> > > > 
> > > >  static gimple *
> &

Re: [PATCH, testsuite] Fix g++.dg/pr67989.C test failure when running with -march or -mcpu

2016-01-26 Thread Thomas Preud'homme
Ping?

On Monday, January 18, 2016 11:33:47 AM Thomas Preud'homme wrote:
> On Wednesday, January 13, 2016 06:39:20 PM Bernd Schmidt wrote:
> > On 01/12/2016 08:55 AM, Thomas Preud'homme wrote:
> > > On Monday, January 11, 2016 04:57:18 PM Bernd Schmidt wrote:
> > >> On 01/08/2016 10:33 AM, Thomas Preud'homme wrote:
> > >>> 2016-01-08  Thomas Preud'homme  
> > >>> 
> > >>>   * g++.dg/pr67989.C: Remove ARM-specific option.
> > >>>   * gcc.target/arm/pr67989.C: New file.
> > >> 
> > >> I checked some other arm tests and they have things like
> > >> 
> > >> /* { dg-skip-if "avoid conflicting multilib options" { *-*-* } {
> > >> "-march=*" } { "-march=armv4t" } } */
> > >> /* { dg-skip-if "avoid conflicting multilib options" { *-*-* } {
> > >> "-mthumb" } { "" } } */
> > >> 
> > >> Do you need the same in your testcase?
> > > 
> > > That was the first approach I took but Kyrill suggested I use the
> > > arm_arch_v4t and arm_arch_v4t_ok machinery instead. It should take care
> > > of whether the architecture can be selected.
> > 
> > Hmm, the ones I looked at did use dg-add-options, but not the
> > corresponding _ok requirement. So I think this is OK.
> 
> Just to make sure: ok as in OK to commit as is?
> 
> Best regards,
> 
> Thomas



Re: bswap PRs 69714, 67781

2016-02-14 Thread Thomas Preud'homme
Hi Bernd,

First of all, my apologies for the late reply. I was on holiday this past week
to celebrate Chinese New Year.

On Friday, February 12, 2016 05:28:43 PM Bernd Schmidt wrote:
> PR69714 is an issue where the bswap pass makes an incorrect
> transformation on big-endian targets. The source has a 32-bit bswap, but
> PA doesn't have a pattern for that. Still, we recognize that there is a
> 16-bit bswap involved, and generate code for that - loading the halfword
> at offset 2 from the original memory, as per the proper big-endian
> correction.
> 
> The problem is that we recognized the rotation of the _high_ part, which
> is at offset 0 on big-endian. The symbolic number is 0x0304, rather than
> 0x0102 as it should be. Only the latter form should ever be matched.

Which is exactly what the patch for PR67781 set out to do (see the if
(BYTES_BIG_ENDIAN) block in find_bswap_or_nop).

The reason the offset is wrong is another if (BYTES_BIG_ENDIAN) block, in
bswap_replace. I will check the testcase added with that latter block; my
guess is that the change was trying to fix an issue similar to PR67781 and
PR69714. When it is removed, the load in avcrc is done without an offset. I
should have run the full testsuite on a big endian system as well, instead of
only a few selected testcases and a bootstrap in addition to the little endian
bootstrap+testsuite. Lesson learned.

> The
> problem is caused by the patch for PR67781, which was intended to solve
> a different big-endian problem. Unfortunately, I think it is based on an
> incorrect analysis.
> 
> The real issue with the PR67781 testcase is in fact the masking loop,
> identified by Thomas in comment #7 for 67781.
> 
> for (tmpn = n->n, rsize = 0; tmpn; tmpn >>= BITS_PER_MARKER, rsize++)
>;
> n->range = rsize;
> 
> If we have a value of 0x01020304, but a range of 5, it means that
> there's an "invisible" high-order byte that we don't care about. On
> little-endian, we can just ignore it. On big-endian, this implies that
> the data we're interested in is located at an offset. The code that does
> the replacements does not use the offset or bytepos fields, it assumes
> that the bytepos always matches that of the load instruction.

Yes, but the change in find_bswap_or_nop aims at checking that we have
0x05040302 or 0x02030405 for big endian targets and 0x04030201 or 0x01020304 
for little endian targets.

Before the "if (rsize < n->range)" block, cmpnop and cmpxchg are respectively 
0x0504030201 and 0x0102030405. Then for big endian it will only keep the 4 
least significant symbolic bytes of cmpxchg (if performs a bitwise and) and 
the 4 most significant symbolic bytes of cmpnop (it performs a right shift) so 
you'd get 0x05040302 for cmpnop and 0x02030405 for cmpxchg. Both would 
translate to a load at offset 0, and then a byteswap for the latter.
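
For concreteness, here is a small standalone sketch of that adjustment step
(assumptions: 8-bit symbolic markers as in the pass, n->range == 5 and
rsize == 4 as in the numbers above; this is only an illustration, not the
actual GCC code):

#include <stdint.h>
#include <stdio.h>

int main (void)
{
  const unsigned bits_per_marker = 8;      /* like BITS_PER_MARKER */
  unsigned range = 5, rsize = 4;           /* original load size vs. result size */
  uint64_t cmpnop = 0x0504030201ULL;       /* identity (nop) pattern */
  uint64_t cmpxchg = 0x0102030405ULL;      /* byte-swapped pattern */

  if (rsize < range)
    {
      /* Big endian adjustment described above: keep the 4 least significant
         markers of cmpxchg (bitwise AND) and the 4 most significant markers
         of cmpnop (right shift).  */
      uint64_t mask = ((uint64_t) 1 << (rsize * bits_per_marker)) - 1;
      cmpxchg &= mask;                               /* -> 0x02030405 */
      cmpnop >>= (range - rsize) * bits_per_marker;  /* -> 0x05040302 */
    }

  printf ("cmpnop  = 0x%08llx\n", (unsigned long long) cmpnop);
  printf ("cmpxchg = 0x%08llx\n", (unsigned long long) cmpxchg);
  return 0;
}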

As said earlier, the problem is in bswap_replace, which tries to adjust the
address of the load for big endian targets by adding a load offset. With the
change in find_bswap_or_nop, an offset is never needed because only patterns
that correspond to a load at offset 0 are recognized. I kept the change to
allow offsets and recognize all sub-loads and sub-bswaps for GCC 7.

> The only
> offset we can introduce is the big-endian correction, but that assumes
> we're always dealing with lowparts.
> 
> So, I think the correct/conservative fix for both bugs is to revert the
> earlier change for PR67781, and then apply the following on top:
> 
> --- revert.tree-ssa-math-opts.c   2016-02-12 15:22:57.098895058 +0100
> +++ tree-ssa-math-opts.c  2016-02-12 15:23:08.482228474 +0100
> @@ -2473,10 +2473,14 @@ find_bswap_or_nop (gimple *stmt, struct
> /* Find real size of result (highest non-zero byte).  */
> if (n->base_addr)
>   {
> -  int rsize;
> +  unsigned HOST_WIDE_INT rsize;
> uint64_t tmpn;
> 
> for (tmpn = n->n, rsize = 0; tmpn; tmpn >>= BITS_PER_MARKER, rsize++);
> +  if (BYTES_BIG_ENDIAN && n->range != rsize)
> + /* This implies an offset, which is currently not handled by
> +bswap_replace.  */
> + return NULL;
> n->range = rsize;
>   }

This works too, yes, with fewer optimizations for big endian. I'm fine with
either solution. This one is indeed a bit more conservative, so I see the
appeal of using it for GCC 5 and 6.

Best regards,

Thomas


[PATCH] Obvious fix for PR66828: left shift with undefined behavior in bswap pass

2015-07-28 Thread Thomas Preud'homme
The bswap pass contains the following loop:

for (i = 0; i < size; i++, inc <<= BITS_PER_MARKER)

In the update to inc and i just before exiting the loop, inc can be shifted by
a total of more than 62 bits, making the value too large to be represented by
int64_t. This is undefined behavior [1] and it triggers an error under an
ubsan bootstrap. This patch changes the type of inc to be unsigned, removing
the undefined behavior.

[1] C++ 98 standard section 5.8 paragraph 2:

"The value of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are 
zero-filled. If E1 has an unsigned type, the value of the result is E1 × 2E2 , 
reduced modulo one more than the maximum value representable in the result 
type. Otherwise, if E1 has a signed type and non-negative value, and E1 × 2E2 
is representable in the corresponding unsigned type of the result type, then 
that value, converted to the result type, is the resulting value; otherwise, the
behavior is undefined."
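
As a minimal self-contained illustration of the shift in question (assuming
8-bit markers so that inc advances by 8 bits per iteration; this is not the
pass itself, just a sketch of the arithmetic):

#include <stdint.h>
#include <stdio.h>

int main (void)
{
  /* With int64_t the last update below would make the value exceed what the
     type can represent, which is undefined behavior per the paragraph quoted
     above; with uint64_t the result is simply reduced modulo 2^64.  */
  uint64_t inc = 1;
  int size = 8;

  for (int i = 0; i < size; i++, inc <<= 8)
    printf ("i=%d inc=0x%016llx\n", i, (unsigned long long) inc);

  printf ("final inc=0x%016llx\n", (unsigned long long) inc);  /* 0 */
  return 0;
}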

ChangeLog entry is as follows:

2015-07-28  Thomas Preud'homme  

PR tree-optimization/66828
* tree-ssa-math-opts.c (perform_symbolic_merge): Change type of inc
from int64_t to uint64_t.

Testsuite was run on a native x86_64-linux-gnu bootstrapped GCC and an 
arm-none-eabi cross-compiler without any regression. Committed as obvious as 
suggested by Markus Trippelsdorf in PR66828.

Best regards,

Thomas




RE: [PATCH] Obvious fix for PR66828: left shift with undefined behavior in bswap pass

2015-07-28 Thread Thomas Preud'homme
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
> 
> ChangeLog entry is as follows:
> 
> 2015-07-28  Thomas Preud'homme  
> 
> PR tree-optimization/66828
> * tree-ssa-math-opts.c (perform_symbolic_merge): Change type of
> inc
> from int64_t to uint64_t.

And the patch is:

diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
index 55382f3..c3098db 100644
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -2122,7 +2122,7 @@ perform_symbolic_merge (gimple source_stmt1, struct 
symbolic_number *n1,
  the same base (array, structure, ...).  */
   if (gimple_assign_rhs1 (source_stmt1) != gimple_assign_rhs1 (source_stmt2))
 {
-  int64_t inc;
+  uint64_t inc;
   HOST_WIDE_INT start_sub, end_sub, end1, end2, end;
   struct symbolic_number *toinc_n_ptr, *n_end;


Best regards,

Thomas




[PATCH, loop-invariant] Fix PR67043: -fcompare-debug failure with -O3

2015-07-31 Thread Thomas Preud'homme
Hi,

Since commit r223113, the loop-invariant pass relies on luids to determine if
an invariant can be hoisted out of a loop without introducing temporaries.
However, nothing is done to ensure luids are up-to-date. This patch adds a
DF_LIVE problem and marks all blocks as dirty before using luids, to ensure
they will be recomputed.

ChangeLog entries are as follows:


2015-07-31  Thomas Preud'homme  

PR tree-optimization/67043
* loop-invariant.c (find_defs): Force recomputation of all luids.


2015-07-29  Thomas Preud'homme  

PR tree-optimization/67043
* gcc.dg/pr67043.c: New test.

Note: the testcase was heavily reduced from the Linux kernel sources by Markus 
Trippelsdorf and formatted to follow GNU code style.


diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
index 1fdb84d..fc53e09 100644
--- a/gcc/loop-invariant.c
+++ b/gcc/loop-invariant.c
@@ -676,6 +676,8 @@ find_defs (struct loop *loop)
   df_remove_problem (df_chain);
   df_process_deferred_rescans ();
   df_chain_add_problem (DF_UD_CHAIN);
+  df_live_add_problem ();
+  df_live_set_all_dirty ();
   df_set_flags (DF_RD_PRUNE_DEAD_DEFS);
   df_analyze_loop (loop);
   check_invariant_table_size ();
diff --git a/gcc/testsuite/gcc.dg/pr67043.c b/gcc/testsuite/gcc.dg/pr67043.c
new file mode 100644
index 000..36aa686
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr67043.c
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fcompare-debug -w" } */
+
+extern void rt_mutex_owner (void);
+extern void rt_mutex_deadlock_account_lock (int);
+extern void signal_pending (void);
+__typeof__ (int *) a;
+int b;
+
+int
+try_to_take_rt_mutex (int p1) {
+  rt_mutex_owner ();
+  if (b)
+    return 0;
+  rt_mutex_deadlock_account_lock (p1);
+  return 1;
+}
+
+void
+__rt_mutex_slowlock (int p1) {
+  int c;
+  for (;;) {
+    c = ({
+      asm ("" : "=r"(a));
+      a;
+    });
+    if (try_to_take_rt_mutex (c))
+      break;
+    if (__builtin_expect (p1 == 0, 0))
+      signal_pending ();
+  }
+}


Patch was tested by running the testsuite against a bootstrapped native 
x86_64-linux-gnu GCC and against an arm-none-eabi GCC cross-compiler without 
any regression.

Is this ok for trunk?

Best regards,

Thomas Preud'homme




RE: [PATCH] Obvious fix for PR66828: left shift with undefined behavior in bswap pass

2015-08-05 Thread Thomas Preud'homme
Hi,

> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
> Sent: Tuesday, July 28, 2015 3:04 PM
> 
> > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> > ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
> >
> > ChangeLog entry is as follows:
> >
> > 2015-07-28  Thomas Preud'homme  
> >
> > PR tree-optimization/66828
> > * tree-ssa-math-opts.c (perform_symbolic_merge): Change type
> of
> > inc
> > from int64_t to uint64_t.

Can I backport this change to GCC 5 branch? The patch applies cleanly on
GCC 5 and shows no regression on a native x86_64-linux-gnu bootstrapped
GCC and an arm-none-eabi GCC cross-compiler.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index ba37d96..a301c23 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,12 @@
+2015-08-04  Thomas Preud'homme  
+
+   Backport from mainline
+   2015-07-28  Thomas Preud'homme  
+
+   PR tree-optimization/66828
+   * tree-ssa-math-opts.c (perform_symbolic_merge): Change type of inc
+   from int64_t to uint64_t.
+
 2015-08-03  John David Anglin  
 
PR target/67060
diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
index c22a677..c699dcadb 100644
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -1856,7 +1856,7 @@ perform_symbolic_merge (gimple source_stmt1, struct 
symbolic_number *n1,
  the same base (array, structure, ...).  */
   if (gimple_assign_rhs1 (source_stmt1) != gimple_assign_rhs1 (source_stmt2))
 {
-  int64_t inc;
+  uint64_t inc;
   HOST_WIDE_INT start_sub, end_sub, end1, end2, end;
   struct symbolic_number *toinc_n_ptr, *n_end;


Best regards,

Thomas




FW: [PATCH, ARM/testsuite] Fix thumb2-slow-flash-data.c failures

2015-11-09 Thread Thomas Preud'homme
[Forwarding to gcc-patches, doh!]
 
Best regards,

Thomas
--- Begin Message ---
Hi,

The ARM-specific thumb2-slow-flash-data.c testcase shows 2 failures when
running for arm-none-eabi with -mcpu=cortex-m7:

FAIL: gcc.target/arm/thumb2-slow-flash-data.c (test for excess errors)
FAIL: gcc.target/arm/thumb2-slow-flash-data.c scan-assembler-times movt 13

The first one is due to a missing type specifier in the declaration of
labelref, while the second one is due to different constant synthesis as a
result of a different tuning for the selected CPU. This patch fixes these
issues by adding the missing type specifier and checking for .word and similar
directives instead of the number of movt instructions.

The new test passes for all of -mcpu=cortex-m{3,4,7} but fails when the
-mslow-flash-data switch is removed.

ChangeLog entry is as follows:


*** gcc/testsuite/ChangeLog ***

2015-11-04  Thomas Preud'homme  

* gcc.target/arm/thumb2-slow-flash-data.c: Add missing typespec for
labelref and check use of constant pool by looking for .word and
similar directives.


diff --git a/gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data.c 
b/gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data.c
index 9852ea5..089a72b 100644
--- a/gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data.c
+++ b/gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data.c
@@ -50,7 +50,7 @@ int
 foo (int a, int b)
 {
   int i;
-  volatile *labelref = &&label1;
+  volatile int *labelref = &&label1;
 
   if (a > b)
 {
@@ -70,5 +70,4 @@ label1:
   return a + b;
 }
 
-/* { dg-final { scan-assembler-times "movt" 13 } } */
-/* { dg-final { scan-assembler-times "movt.*LC0\\+4" 1 } } */
+/* { dg-final { scan-assembler-not "\\.(float|l\\?double|\d?byte|short|int|long|quad|word)\\s+\[^.\]" } } */


Is this ok for trunk?

Best regards,

Thomas
--- End Message ---


[arm-embedded][PATCHv2, ARM, libgcc] New aeabi_idiv function for armv6-m

2015-12-07 Thread Thomas Preud'homme
We decided to apply this to ARM/embedded-5-branch.

Best regards,

Thomas

> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Andre Vieira
> Sent: Wednesday, October 28, 2015 1:03 AM
> To: gcc-patches@gcc.gnu.org
> Subject: Re: [PING][PATCHv2, ARM, libgcc] New aeabi_idiv function for
> armv6-m
> 
> Ping.
> 
> BR,
> Andre
> 
> On 13/10/15 18:01, Andre Vieira wrote:
> > This patch ports the aeabi_idiv routine from Linaro Cortex-Strings
> > (https://git.linaro.org/toolchain/cortex-strings.git), which was
> > contributed by ARM under Free BSD license.
> >
> > The new aeabi_idiv routine is used to replace the one in
> > libgcc/config/arm/lib1funcs.S. This replacement happens within the
> > Thumb1 wrapper. The new routine is under LGPLv3 license.
> >
> > The main advantage of this version is that it can improve the
> > performance of the aeabi_idiv function for Thumb1. This solution will
> > also increase the code size. So it will only be used if
> > __OPTIMIZE_SIZE__ is not defined.
> >
> > Make check passed for armv6-m.
> >
> > libgcc/ChangeLog:
> > 2015-08-10  Hale Wang  
> >   Andre Vieira  
> >
> > * config/arm/lib1funcs.S: Add new wrapper.
> >





[PATCH, testsuite] Fix PR68629: attr-simd-3.c failure on arm-none-eabi targets

2015-12-09 Thread Thomas Preud'homme
c-c++-common/attr-simd-3.c fails to compile on arm-none-eabi targets due to
-fcilkplus needing -pthread, which is not available for those targets. This
patch solves the issue by adding a condition to the cilkplus effective target
that compiling with -fcilkplus succeeds, and by requiring cilkplus as an
effective target for the attr-simd-3.c testcase.

ChangeLog entry is as follows:


*** gcc/testsuite/ChangeLog ***

2015-12-08  Thomas Preud'homme  

PR testsuite/68629
* lib/target-supports.exp (check_effective_target_cilkplus): Also
check that compiling with -fcilkplus does not give an error.
* c-c++-common/attr-simd-3.c: Require cilkplus effective target.


diff --git a/gcc/testsuite/c-c++-common/attr-simd-3.c 
b/gcc/testsuite/c-c++-common/attr-simd-3.c
index d61ba82..1970c67 100644
--- a/gcc/testsuite/c-c++-common/attr-simd-3.c
+++ b/gcc/testsuite/c-c++-common/attr-simd-3.c
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-require-effective-target "cilkplus" } */
 /* { dg-options "-fcilkplus" } */
 /* { dg-prune-output "undeclared here \\(not in a function\\)|\[^\n\r\]* was 
not declared in this scope" } */
 
diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index 4e349e9..95b903c 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -1432,7 +1432,12 @@ proc check_effective_target_cilkplus { } {
 if { [istarget avr-*-*] } {
return 0;
 }
-    return 1
+    return [ check_no_compiler_messages_nocache fcilkplus_available executable {
+   #ifdef __cplusplus
+   extern "C"
+   #endif
+   int dummy;
+   } "-fcilkplus" ]
 }
 
 proc check_linker_plugin_available { } {


Testsuite shows no regression when run with
  + an arm-none-eabi GCC cross-compiler targeting Cortex-M3
  + a bootstrapped x86_64-linux-gnu GCC native compiler

Is this ok for trunk?

Best regards,

Thomas




[PATCH, testsuite] Fix PR68632: gcc.target/arm/lto/pr65837 failure on M profile ARM targets

2015-12-09 Thread Thomas Preud'homme
gcc.target/arm/lto/pr65837 fails on M profile ARM targets because of the lack
of Neon instructions. This patch adds the necessary arm_neon_ok effective
target requirement to avoid running this test for such targets.

ChangeLog entry is as follows:


*** gcc/testsuite/ChangeLog ***

2015-12-08  Thomas Preud'homme  

PR testsuite/68632
* gcc.target/arm/lto/pr65837_0.c: Require arm_neon_ok effective
target.


diff --git a/gcc/testsuite/gcc.target/arm/lto/pr65837_0.c 
b/gcc/testsuite/gcc.target/arm/lto/pr65837_0.c
index 000fc2a..fcc26a1 100644
--- a/gcc/testsuite/gcc.target/arm/lto/pr65837_0.c
+++ b/gcc/testsuite/gcc.target/arm/lto/pr65837_0.c
@@ -1,4 +1,5 @@
 /* { dg-lto-do run } */
+/* { dg-require-effective-target arm_neon_ok } */
 /* { dg-lto-options {{-flto -mfpu=neon}} } */
 /* { dg-suppress-ld-options {-mfpu=neon} } */



Testcase fails without the patch and succeeds with.

Is this ok for trunk?

Best regards,

Thomas




RE: [PATCH] Fix confusion between target, host and symbolic number byte sizes

2014-08-06 Thread Thomas Preud'homme
I suppose people were busy when I posted this patch and it got forgotten
since, so here is a little ping.

Best regards,

Thomas

> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
> Sent: Friday, July 04, 2014 12:53 PM
> To: gcc-patches@gcc.gnu.org
> Subject: [PATCH] Fix confusion between target, host and symbolic number
> byte sizes
> 
> The bswap pass deals with 3 possibly different byte sizes: host, target, and
> the size a byte marker occupies in the symbolic_number structure [1]. However,
> as of now the code mixes the three sizes. This works in practice as the pass
> is only enabled for targets with BITS_PER_UNIT == 8 and nobody runs GCC on a
> host with CHAR_BIT != 8. As prompted by Jakub Jelinek, this patch fixes this
> mess. Byte markers are 8-bit quantities (they could be made 4-bit quantities
> but I preferred to keep the code working the same as before) for which a
> new macro is introduced (BITS_PER_MARKER); anything related to storing
> the value of a byte marker in a variable should check for the host byte size
> or wide integer size, and anything aimed at manipulating the target value
> should check for BITS_PER_UNIT.
> 
> 
> [1] Although the comment for this structure implies that a byte marker has
> the same size as the host byte, the way it is used in the code (even before
> any of my patches) shows that it uses a fixed size of 8 [2].
> [2] Note that since the pass is only active for targets with BITS_PER_UNIT ==
> 8, it might be using the target byte size.
> 
> gcc/ChangeLog:
> 
> 2014-07-04  Thomas Preud'homme  
> 
>   * tree-ssa-math-opts.c (struct symbolic_number): Clarify comment
> about
>   the size of byte markers.
>   (do_shift_rotate): Fix confusion between host, target and marker
> byte
>   size.
>   (verify_symbolic_number_p): Likewise.
>   (find_bswap_or_nop_1): Likewise.
>   (find_bswap_or_nop): Likewise.
> 
> 
> diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
> index ca2b30d..55c5df7 100644
> --- a/gcc/tree-ssa-math-opts.c
> +++ b/gcc/tree-ssa-math-opts.c
> @@ -1602,11 +1602,10 @@ make_pass_cse_sincos (gcc::context *ctxt)
> 
>  /* A symbolic number is used to detect byte permutation and selection
> patterns.  Therefore the field N contains an artificial number
> -   consisting of byte size markers:
> +   consisting of octet sized markers:
> 
> -   0- byte has the value 0
> -   1..size - byte contains the content of the byte
> -   number indexed with that value minus one.
> +   0- target byte has the value 0
> +   1..size - marker value is the target byte index minus one.
> 
> To detect permutations on memory sources (arrays and structures), a
> symbolic
> number is also associated a base address (the array or structure the load 
> is
> @@ -1631,6 +1630,8 @@ struct symbolic_number {
>unsigned HOST_WIDE_INT range;
>  };
> 
> +#define BITS_PER_MARKER 8
> +
>  /* The number which the find_bswap_or_nop_1 result should match in
> order to have a nop.  The number is masked according to the size of
> the symbolic number before using it.  */
> @@ -1652,15 +1653,16 @@ do_shift_rotate (enum tree_code code,
>struct symbolic_number *n,
>int count)
>  {
> -  int bitsize = TYPE_PRECISION (n->type);
> +  int size = TYPE_PRECISION (n->type) / BITS_PER_UNIT;
> 
> -  if (count % 8 != 0)
> +  if (count % BITS_PER_UNIT != 0)
>  return false;
> +  count = (count / BITS_PER_UNIT) * BITS_PER_MARKER;
> 
>/* Zero out the extra bits of N in order to avoid them being shifted
>   into the significant bits.  */
> -  if (bitsize < 8 * (int)sizeof (int64_t))
> -n->n &= ((uint64_t)1 << bitsize) - 1;
> +  if (size < 64 / BITS_PER_MARKER)
> +n->n &= ((uint64_t) 1 << (size * BITS_PER_MARKER)) - 1;
> 
>switch (code)
>  {
> @@ -1670,22 +1672,22 @@ do_shift_rotate (enum tree_code code,
>  case RSHIFT_EXPR:
>/* Arithmetic shift of signed type: result is dependent on the value.  
> */
>if (!TYPE_UNSIGNED (n->type)
> -   && (n->n & ((uint64_t) 0xff << (bitsize - 8
> +   && (n->n & ((uint64_t) 0xff << ((size - 1) * BITS_PER_MARKER
>   return false;
>n->n >>= count;
>break;
>  case LROTATE_EXPR:
> -  n->n = (n->n << count) | (n->n >> (bitsize - count));
> +  n->n = (n->n << count) | (n->n >> ((size * BITS_PER_MARKER) - count));
>break;
>  ca

[PATCH] Cancel bswap opt when intermediate stmts are reused

2014-08-07 Thread Thomas Preud'homme
Hi all,

Currently, when an expression doing a manual load or bswap is detected, the
bswap optimization replaces it by an actual load and/or bswap instruction
without considering whether the intermediate values computed in the
expression are reused later. If that is the case, the construction of these
values has to be retained, and the sum of the load and/or bswap instruction
and the instructions to construct those values might be more expensive than
the initial fully manual expression. This patch aims at detecting such a case
and cancelling the bswap optimization. In addition, it takes advantage of the
infrastructure necessary for the detection to also clean up the stmts that
become useless when the bswap optimization is performed.
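
For reference, this is the kind of situation the patch targets (essentially
the same shape as the new read_le32_1 testcase in optimize-bswapsi-3.c below):
the intermediate value "low" has a use outside of the expression computing the
32-bit load, so replacing the expression with a single load would still
require keeping the statements that build "low".

#include <stdint.h>

extern unsigned char data[4];

uint32_t read_le32_reuse (uint32_t *low_out)
{
  uint32_t low = data[0] | (data[1] << 8);                 /* reused below */
  uint32_t ret = low | (data[2] << 16) | (data[3] << 24);  /* 32-bit LE load */
  *low_out = low;                          /* extra use keeps "low" alive */
  return ret;
}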

*** gcc/ChangeLog ***

2014-08-01  Thomas Preud'homme  

* tree-ssa-math-opts.c (struct usedtree): New.
(find_bswap_or_nop_1): Change prototype to take a hashtable and a list
of struct usedtree. Fill respectively these with visited stmts and
trees (along with the stmts where they are defined) that would not be
defined if bswap optimization is applied. Adapt recursion call to
prototype change.
(find_bswap_or_nop): Adapt to find_bswap_or_nop_1 prototype change.
Pass the hashtable and the list of struct usedtree from
pass_optimize_bswap::execute ().
(do_bswap_p): New.
(bswap_replace): Fix typo in heading comment. Stop returning whether
the bswap optimization was performed since this is now handled by
do_bswap_p (). Move alignment check to do_bswap_p ().
(free_usedtrees_list): New.
(pass_optimize_bswap::execute): Add allocation and deallocation
handling of the hashtable of browsed stmts. Free the list of struct
usedtree at the end of each iteration using free_usedtrees_list () and
the new bswap_check_end_iter label. Move some of the logic to perform
the bswap optimization to do_bswap_p (). Set gsi after performing the
bswap optimization and loop manually via the new
bswap_check_start_iter label so that all stmts are checked for
load/bswap even when cur_stmt is moved around by bswap_replace ().

*** gcc/testsuite/ChangeLog ***

2014-08-01  Thomas Preud'homme  

* gcc.dg/optimize-bswapsi-2.c (read_le32_1): Add an intermediate
variable in the mix to check that it is optimized when there is no
use outside the expression computing a load/bswap.
(read_be32_1): Likewise.
* gcc.dg/optimize-bswapsi-3.c: New. Create read_le32_1 () and
read_be32_1 () based on identically named function in
gcc.dg/optimize-bswapsi-2.c but reusing the intermediate variable so
as to check that bswap optimization is not performed in these cases.

diff --git a/gcc/testsuite/gcc.dg/optimize-bswapsi-2.c 
b/gcc/testsuite/gcc.dg/optimize-bswapsi-2.c
index de6e697..df7856b 100644
--- a/gcc/testsuite/gcc.dg/optimize-bswapsi-2.c
+++ b/gcc/testsuite/gcc.dg/optimize-bswapsi-2.c
@@ -14,7 +14,9 @@ struct uint32_st {
 
 uint32_t read_le32_1 (void)
 {
-  return data[0] | (data[1] << 8) | (data[2] << 16) | (data[3] << 24);
+  uint32_t low = data[0] | (data[1] << 8);
+  uint32_t ret = low | (data[2] << 16) | (data[3] << 24);
+  return ret;
 }
 
 uint32_t read_le32_2 (struct uint32_st data)
@@ -30,7 +32,9 @@ uint32_t read_le32_3 (unsigned char *data)
 
 uint32_t read_be32_1 (void)
 {
-  return data[3] | (data[2] << 8) | (data[1] << 16) | (data[0] << 24);
+  uint32_t low = data[3] | (data[2] << 8);
+  uint32_t ret = low | (data[1] << 16) | (data[0] << 24);
+  return ret;
 }
 
 uint32_t read_be32_2 (struct uint32_st data)
diff --git a/gcc/testsuite/gcc.dg/optimize-bswapsi-3.c 
b/gcc/testsuite/gcc.dg/optimize-bswapsi-3.c
new file mode 100644
index 000..dc48d9d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/optimize-bswapsi-3.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target bswap32 } */
+/* { dg-require-effective-target stdint_types } */
+/* { dg-options "-O2 -fdump-tree-bswap" } */
+/* { dg-additional-options "-march=z900" { target s390-*-* } } */
+
+#include <stdint.h>
+
+extern unsigned char data[4];
+
+/* No "bswap" optimization as low is reused */
+uint32_t read_le32_1 (unsigned char *data, uint32_t *low_neg)
+{
+  uint32_t low = data[0] | (data[1] << 8);
+  uint32_t ret = low | (data[2] << 16) | (data[3] << 24);
+  *low_neg = low;
+  return ret;
+}
+
+/* No "bswap" optimization as low is reused */
+uint32_t read_be32_1 (unsigned char *data, uint32_t *low_neg)
+{
+  uint32_t low = data[3] | (data[2] << 8);
+  uint32_t ret = low | (data[1] << 16) | (data[0] << 24);
+  *low_neg = low;
+  return ret;
+}
+
+/* { dg-final { scan-tree-dump-not "32 bit load in target endianness found at" 
"bswap" } } */
+/* { dg-fi

[PATCH] Fix incorrect folding of bitfield in a union on big endian target

2014-08-11 Thread Thomas Preud'homme
In the code dealing with folding of structure and union initialization, there
is a check that the size of the constructor is the same as the field being
read. However, in the case of a bitfield this test can be wrong because it
relies on TYPE_SIZE to get the size of the field being read, but TYPE_SIZE
returns the size of the enclosing integer in that case. This patch also checks
the size parameter, which contains the actual size of the field being read.

The patch was tested by running the testsuite with three different builds of 
GCC:
1) bootstrap of GCC on x86_64-linux-gnu
2) arm-none-eabi cross compiler (defaulting to little endian) with testsuite
run under qemu emulating a Cortex M4
3) arm-none-eabi cross compiler (defaulting to big endian, thanks to patch at 
[1])
with testsuite run under qemu emulating a Cortex M3.

[1] https://sourceware.org/ml/binutils/2014-08/msg00014.html

No regression were observed on any of the tests. The ChangeLog is as follows:


2014-08-11  Thomas Preud'homme  

* gimple-fold.c (fold_ctor_reference): Don't fold in presence of
bitfields, that is when size doesn't match the size of type or the
size of the constructor.


diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index 3dcb576..6270c34 100644
--- a/gcc/gimple-fold.c
+++ b/gcc/gimple-fold.c
@@ -3099,7 +3099,9 @@ fold_ctor_reference (tree type, tree ctor, unsigned 
HOST_WIDE_INT offset,
   if (!AGGREGATE_TYPE_P (TREE_TYPE (ctor)) && !offset
   /* VIEW_CONVERT_EXPR is defined only for matching sizes.  */
   && operand_equal_p (TYPE_SIZE (type),
- TYPE_SIZE (TREE_TYPE (ctor)), 0))
+ TYPE_SIZE (TREE_TYPE (ctor)), 0)
+  && !compare_tree_int (TYPE_SIZE (type), size)
+  && !compare_tree_int (TYPE_SIZE (TREE_TYPE (ctor)), size))
 {
   ret = canonicalize_constructor_val (unshare_expr (ctor), from_decl);
   ret = fold_unary (VIEW_CONVERT_EXPR, type, ret);
diff --git a/gcc/testsuite/gcc.c-torture/execute/bitfld-6.c 
b/gcc/testsuite/gcc.c-torture/execute/bitfld-6.c
new file mode 100644
index 000..50927dc
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/bitfld-6.c
@@ -0,0 +1,23 @@
+union U
+{
+  const int a;
+  unsigned b : 20;
+};
+
+static union U u = { 0x12345678 };
+
+/* Constant folding used to fail to account for endianness when folding a
+   union.  */
+
+int
+main (void)
+{
+#ifdef __BYTE_ORDER__
+#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
+  return u.b - 0x45678;
+#else
+  return u.b - 0x12345;
+#endif
+#endif
+  return 0;
+}

Is it ok for trunk?

Best regards,

Thomas




RE: [PATCH] Fix incorrect folding of bitfield in a union on big endian target

2014-08-11 Thread Thomas Preud'homme
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
> 
> No regression were observed on any of the tests. The ChangeLog is as
> follows:
> 
> 
> 2014-08-11  Thomas Preud'homme  
> 
>   * gimple-fold.c (fold_ctor_reference): Don't fold in presence of
>   bitfields, that is when size doesn't match the size of type or the
>   size of the constructor.

The ChangeLog is incomplete. This was for gcc/ChangeLog.

For gcc/testsuite/ChangeLog, there is:

2014-07-02  Thomas Preud'homme  

* gcc.c-torture/execute/bitfld-6.c: New test.

Best regards,

Thomas 





[PATCH, ARM] Fix incorrect alignment of small values in minipool

2014-08-11 Thread Thomas Preud'homme
Being 32 bits wide, constant pool entries are always filled as 32-bit
quantities. This works fine for little endian systems but gives incorrect
results for big endian systems when the value is *accessed* with a mode smaller
than 32 bits. Consider the value 0x1234 accessed as an HImode value. It would
be output as 0x0 0x0 0x12 0x34 in a constant pool entry, and the 16-bit load
that would be done would put the value 0x0 in the register. The approach here
is to make the value be output correctly by shifting it left so that the
highest byte in its mode ends up in the highest byte of the 32-bit value being
output.
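
For illustration, here is a small host-side sketch of the byte layout problem
and of the adjustment made in dump_minipool (assumptions: big endian target,
4-byte minipool slot, HImode access; this only simulates the layout, it is
not the ARM backend code):

#include <stdint.h>
#include <stdio.h>

/* Fill a 4-byte slot in big-endian order, like a 32-bit constant pool
   entry.  */
static void emit_be32 (uint8_t slot[4], uint32_t val)
{
  slot[0] = val >> 24; slot[1] = val >> 16; slot[2] = val >> 8; slot[3] = val;
}

/* A 16-bit (HImode) big-endian load at the slot's base address.  */
static uint16_t load_be16 (const uint8_t *p)
{
  return (uint16_t) ((p[0] << 8) | p[1]);
}

int main (void)
{
  uint8_t slot[4];
  uint32_t value = 0x1234;
  int byte_disp = 4 - 2;  /* fix_size - GET_MODE_SIZE (mode), as in the patch */

  emit_be32 (slot, value);
  printf ("unadjusted: 0x%04x\n", load_be16 (slot));   /* 0x0000: wrong */

  emit_be32 (slot, value << (byte_disp * 8));          /* the adjustment */
  printf ("adjusted:   0x%04x\n", load_be16 (slot));   /* 0x1234: correct */
  return 0;
}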

The patch was tested by running the testsuite with three different builds of 
GCC:
1) bootstrap of GCC on x86_64-linux-gnu
2) arm-none-eabi cross compiler (defaulting to little endian) with testsuite
run under qemu emulating a Cortex M4
3) arm-none-eabi cross compiler (defaulting to big endian, thanks to patch at 
[1])
with testsuite run under qemu emulating a Cortex M3. Due to the processor used,
the test constant-minipool was not run as part of the testsuite but manually 
with
-cpu=cortex-r4

[1] https://sourceware.org/ml/binutils/2014-08/msg00014.html

The ChangeLog is as follows:

*** gcc/ChangeLog ***

2014-07-14  Thomas Preud'homme  

* config/arm/arm.c (dump_minipool): Fix alignment in minipool of
values whose size is less than MINIPOOL_FIX_SIZE on big endian target.

*** gcc/testsuite/ChangeLog ***

2014-07-14  Thomas Preud'homme  

* gcc.target/arm/constant-pool.c: New test.


diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 0146fe8..e4e0ef4 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -16507,6 +16507,15 @@ dump_minipool (rtx scan)
  fputc ('\n', dump_file);
}
 
+ /* On big-endian target, make sure that padding for values whose
+mode size is smaller than MINIPOOL_FIX_SIZE comes after.  */
+ if (BYTES_BIG_ENDIAN && CONST_INT_P (mp->value))
+   {
+ int byte_disp = mp->fix_size - GET_MODE_SIZE (mp->mode);
+ HOST_WIDE_INT val = INTVAL (mp->value);
+ mp->value = GEN_INT (val << (byte_disp * BITS_PER_UNIT));
+   }
+
  switch (mp->fix_size)
{
 #ifdef HAVE_consttable_1
diff --git a/gcc/testsuite/gcc.target/arm/constant-pool.c 
b/gcc/testsuite/gcc.target/arm/constant-pool.c
new file mode 100644
index 000..46a1534
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/constant-pool.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_arm_ok } */
+/* { dg-options "-O1 -marm -mbig-endian" } */
+
+unsigned short v = 0x5678;
+int i;
+int j = 0;
+int *ptr = &j;
+
+int
+func (void)
+{
+  for (i = 0; i < 1; ++i)
+    {
+      *ptr = -1;
+      v = 0x1234;
+    }
+  return v;
+}
+
+int
+main (void)
+{
+  func ();
+  return v - 0x1234;
+}
+
+/* { dg-final { scan-assembler-not ".word 4660" } } */


Is this ok for trunk?

Best regards,

Thomas




[PATCH, C/C++] Add -fno-float to forbid floating point data types

2014-08-12 Thread Thomas Preud'homme
Hi,

As mentioned in PR60070, there are many cases where a programmer wants to
ensure that a program does not use any floating point data type. Another case
to consider is when a target has several floating point ABIs and the user
wants to ensure his/her code is compatible with all of the available floating
point ABIs. Adding such a check also provides an opportunity to adapt the
behavior of the compiler based on its result.

This patch adds the new option -fno-float to request that GCC throw an error
if any float or floating point operation is involved. The patch modifies the
C and C++ front ends (others could be modified later if people request it) so
as to throw an error whenever a keyword introducing a float type or a float
literal is encountered. The check is added to the front end rather than the
middle end as this allows the detection to be done as the file is parsed
rather than needing a pass. It also limits the check to only the places where
a float can be declared instead of having to look at all gimple stmts.
Finally, it allows catching some cases that would be absent from the middle
end due to simplification or limited folding that the front end might do.
Note though that things excluded by the preprocessor (think #ifdef) would not
be analyzed.
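
As a quick illustration of the intended behavior (hypothetical example; the
diagnostic wording is the one added by the patch), a translation unit like the
following would be rejected when compiled with gcc -fno-float, while
integer-only code is unaffected:

/* example.c -- compile with: gcc -fno-float -c example.c */

float scale (float x)      /* error: use of floating points forbidden in this
                              translation unit (-fno-float) */
{
  return x * 2.0f;         /* the float literal would be diagnosed as well */
}

int add (int a, int b)     /* integer-only code compiles as usual */
{
  return a + b;
}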

Note that the tests were written independently of the code so as to have more
confidence in the patch.

ChangeLog are as follows:

*** gcc/ChangeLog ***

2014-08-08  Thomas Preud'homme  

PR middle-end/60070
* doc/invoke.texi (fno-float): Add to the list of C options and explain
its meaning.

*** gcc/c/ChangeLog ***

2014-08-08  Thomas Preud'homme  

PR middle-end/60070
* c-decl.c (finish_declspecs): Throw an error if -fno-float is
specified by user and a default complex is encountered.
* c-parser.c (c_token_starts_typename): Throw an error if -fno-float
is specified by user and a float type name is encountered.
(c_parser_declspecs): Memorize token being tested. Throw an error if
-fno-float is specified by user and a float declaration specifier is
encountered.
(c_parser_postfix_expression): Throw an error if -fno-float is
specified by user and a float literal is encountered.

*** gcc/c-family/ChangeLog ***

2014-08-08  Thomas Preud'homme  

PR middle-end/60070
* c.opt (ffloat): New option.

*** gcc/cp/ChangeLog ***

2014-08-08  Thomas Preud'homme  

PR middle-end/60070
* decl.c (grokdeclarator): Throw an error if -fno-float is
specified by user and a default complex is encountered.
* parser.c (cp_parser_primary_expression): Throw an error if -fno-float
is specified by user and a float literal is encountered.
(cp_parser_simple_type_specifier): Throw an error if -fno-float is
specified by user and a float type specifier is encountered.

*** gcc/testsuite/ChangeLog ***

2014-08-08  Thomas Preud'homme  

PR middle-end/60070
* gcc.dg/fno-float.c: New test case.
* g++.dg/diagnostic/fno-float.C: Likewise.


diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index c318cad..2d22e15 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -1079,6 +1079,10 @@ fnil-receivers
 ObjC ObjC++ Var(flag_nil_receivers) Init(1)
 Assume that receivers of Objective-C messages may be nil
 
+ffloat
+C C++ LTO Var(flag_no_float, 0)
+Allow floating point data types to be used in C/C++
+
 flocal-ivars
 ObjC ObjC++ Var(flag_local_ivars) Init(1)
 Allow access to instance variables as if they were local declarations within 
instance method implementations.
diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
index 2a4b439..e68f0f3 100644
--- a/gcc/c/c-decl.c
+++ b/gcc/c/c-decl.c
@@ -10078,6 +10078,10 @@ finish_declspecs (struct c_declspecs *specs)
}
   else if (specs->complex_p)
{
+ if (flag_no_float)
+   error_at (specs->locations[cdw_complex],
+ "use of floating points forbidden in this translation "
+ "unit (-fno-float)");
  specs->typespec_word = cts_double;
  pedwarn (specs->locations[cdw_complex], OPT_Wpedantic,
   "ISO C does not support plain % meaning "
diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index e32bf04..74ac945 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -486,6 +486,15 @@ c_token_starts_typename (c_token *token)
 case CPP_KEYWORD:
   switch (token->keyword)
{
+   case RID_FLOAT:
+   case RID_DOUBLE:
+   case RID_DFLOAT32:
+   case RID_DFLOAT64:
+   case RID_DFLOAT128:
+ if (flag_no_float)
+   error_at (token->location, "use of floating points forbidden in "
+  "this translation unit (-fno-float)");
+ /* Fall through.  */
case RID_UNSIGNED:
case RID_LONG:
case RID_INT128:
@@ -494

RE: [PATCH, C/C++] Add -fno-float to forbid floating point data types

2014-08-12 Thread Thomas Preud'homme
> From: Marc Glisse [mailto:marc.gli...@inria.fr]
> Sent: Tuesday, August 12, 2014 5:47 PM
> 
> 
> Are you sure you want something that strict? If there are floats in the
> code but the compiler manages to get rid of them, it won't cause any
> trouble with the ABI. Besides, if you #include a header that declares a
> function returning a double, but you don't use that function, the
> compilation will still fail. And if you only want to forbid float/double
> at the source level, grep is a pretty good tool. I can't see a test for
> __builtin_sqrt(42). No ObjC?

You raise some valid points given the use case I had in mind when doing the
patch, which is code that aims to be independent of the float ABI in use
on targets with several float ABIs. Now, that said, let's consider each point
one by one.

1) Where to do the check

Initially my approach was to detect whether a given execution unit is
affected by the float ABI in use, and it was thus much less strict. It
worked but needed some new hooks. After some discussion about
this approach it appeared to be too complicated, difficult to make
target independent and also risky. Most importantly, it seemed
unnecessary: if you don't use floats at interfaces you probably don't
use them at all. Or at least, it shouldn't be difficult to make it so. And
as you can see from the PR mentioned, such an approach can be useful
for more use cases.

About doing the test later, I believe it makes the feature a bit less
useful. A float might be eliminated in a build due to an optimization
but still remain in another build. I feel it is more useful to tell the user
that such a line of code can lead to the FPU being used. It also seems more
natural to test such a thing in the front end, as you process the file.

However it's true that it prevents including math.h for instance.
It might be a good idea to ignore prototypes. 

2) Need for such an option at all

Grep might work but it does not give any hint to the compiler that
no float is used and therefore the file is independent of the float ABI.
This is important since it allows the compiler to emit a special
attribute to tell the linker about it.

3) __builtin_sqrt

True, I shall check whether it works with builtins. Thanks for the advice.

4) Objective C

As said in the description, I'm not opposed to adding other languages.
It's easier to add another language than to remove one after the fact
because very few people use it. Therefore I preferred to have
just C and C++ for now, which is what I expect most of the users of
such a switch to be interested in. Do you think I should add support
for that language up front, or can it wait for a later version of the patch
once people have started to use it?

> 
> I am not at all saying the approach is wrong, just making sure that's
> what we want.

Sure, I appreciate any constructive criticism. If you are unconvinced by
my arguments I'd be happy to hear the reasons.

Best regards,

Thomas




RE: [PATCH, ARM] Fix incorrect alignment of small values in minipool

2014-08-13 Thread Thomas Preud'homme
> From: Richard Earnshaw
> Sent: Monday, August 11, 2014 4:54 PM
> 
> I think this is the wrong place for this sort of fix up.  HFmode values
> are fixed up in the consttable_4 pattern and it looks wrong to be that
> HImode values are then fixed up in a different place.  We should be
> consistent and do all the fix ups in one location.

Sorry for the delay in answering.

The problem is that in the pattern for consttable_4 we don't have the
information about the access mode for this entry. In the testcase that comes
with this patch, the rtx manipulated in the pattern is VOIDmode while the
access mode is HImode. In dump_minipool, on the other hand, the information
can be found in mp->mode.

Best regards,

Thomas





RE: [PATCH] Fix incorrect folding of bitfield in a union on big endian target

2014-08-13 Thread Thomas Preud'homme
> From: Richard Biener [mailto:richard.guent...@gmail.com]
> Sent: Monday, August 11, 2014 8:29 PM
> 
> That's now extra compares (the operand_equal_p check now does
> a check that can be derived transitively).
> 
> So - ok with the operand_equal_p cehck removed.
> 
> Also see if this can be backported.

I checked and the bug is present in both GCC 4.8 and 4.9 branches. The
patch applies cleanly in both cases as well. I did the same testing as for trunk
on the GCC 4.8 branch, that is:

1) bootstrap of GCC on x86_64-linux-gnu
2) arm-none-eabi cross compiler (defaulting to little endian) with testsuite
run under qemu emulating a Cortex M4
3) arm-none-eabi cross compiler (defaulting to big endian, thanks to patch at 
[1])
with testsuite run under qemu emulating a Cortex M3.

I'm going to do the same testing for GCC 4.9 now. May I commit the backports?

Best regards,

Thomas





RE: [PATCH, C/C++] Add -fno-float to forbid floating point data types

2014-08-13 Thread Thomas Preud'homme
> From: Marek Polacek [mailto:pola...@redhat.com]
> Sent: Tuesday, August 12, 2014 5:43 PM
> On Tue, Aug 12, 2014 at 11:34:35AM +0200, Jakub Jelinek wrote:
> >
> > This looks wrong.  c_token_starts_typename is just a function which tells
> > you if certain token can start a typename, issuing diagnostics there doesn't
> > make sense, that routine doesn't actually parse the token.  You should
> > diagnose it where you actually parse it.
> 
> I'd say the proper place would be declspecs_add_type.

Wouldn't that miss casts and sizeof for instance? It's true that
c_token_starts_typename is not the place where the token is parsed but it
seemed a more central place: it catches all these cases in one check. Ok
maybe sizeof (float) should be ignored but I suppose if you use that you
intend to store a float later.

On the other hand I do want to distinguish float declared in prototypes from
float declared elsewhere.

Best regards,

Thomas




RE: [PATCH] Fix confusion between target, host and symbolic number byte sizes

2014-08-14 Thread Thomas Preud'homme
Ping?

Best regards,

Thomas

> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
> Sent: Thursday, August 07, 2014 1:57 PM
> To: gcc-patches@gcc.gnu.org
> Subject: RE: [PATCH] Fix confusion between target, host and symbolic
> number byte sizes
> 
> I suppose people were busy when I posted this patch and it got forgotten
> since so here is a little ping.
> 
> Best regards,
> 
> Thomas
> 
> > -Original Message-
> > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> > ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
> > Sent: Friday, July 04, 2014 12:53 PM
> > To: gcc-patches@gcc.gnu.org
> > Subject: [PATCH] Fix confusion between target, host and symbolic number
> > byte sizes
> >
> > The bswap pass deals with 3 possibly different byte sizes: host, target,
> > and the size a byte marker occupies in the symbolic_number structure [1].
> > However, as of now the code mixes the three sizes. This works in practice
> > as the pass is only enabled for targets with BITS_PER_UNIT == 8 and nobody
> > runs GCC on a host with CHAR_BIT != 8. As prompted by Jakub Jelinek, this
> > patch fixes this mess. Byte markers are 8-bit quantities (they could be
> > made 4-bit quantities but I preferred to keep the code working the same as
> > before) for which a new macro is introduced (BITS_PER_MARKER); anything
> > related to storing the value of a byte marker in a variable should check
> > for the host byte size or wide integer size, and anything aimed at
> > manipulating the target value should check for BITS_PER_UNIT.
> >
> >
> > [1] Although the comment for this structure implies that a byte marker has
> > the same size as the host byte, the way it is used in the code (even before
> > any of my patches) shows that it uses a fixed size of 8 [2].
> > [2] Note that since the pass is only active for targets with BITS_PER_UNIT
> > == 8, it might be using the target byte size.
> >
> > gcc/ChangeLog:
> >
> > 2014-07-04  Thomas Preud'homme  
> >
> > * tree-ssa-math-opts.c (struct symbolic_number): Clarify comment
> > about
> > the size of byte markers.
> > (do_shift_rotate): Fix confusion between host, target and marker
> > byte
> > size.
> > (verify_symbolic_number_p): Likewise.
> > (find_bswap_or_nop_1): Likewise.
> > (find_bswap_or_nop): Likewise.
> >
> >
> > diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
> > index ca2b30d..55c5df7 100644
> > --- a/gcc/tree-ssa-math-opts.c
> > +++ b/gcc/tree-ssa-math-opts.c
> > @@ -1602,11 +1602,10 @@ make_pass_cse_sincos (gcc::context *ctxt)
> >
> >  /* A symbolic number is used to detect byte permutation and selection
> > patterns.  Therefore the field N contains an artificial number
> > -   consisting of byte size markers:
> > +   consisting of octet sized markers:
> >
> > -   0- byte has the value 0
> > -   1..size - byte contains the content of the byte
> > -   number indexed with that value minus one.
> > +   0- target byte has the value 0
> > +   1..size - marker value is the target byte index minus one.
> >
> > To detect permutations on memory sources (arrays and structures), a
> > symbolic
> > number is also associated a base address (the array or structure the 
> > load
> is
> > @@ -1631,6 +1630,8 @@ struct symbolic_number {
> >unsigned HOST_WIDE_INT range;
> >  };
> >
> > +#define BITS_PER_MARKER 8
> > +
> >  /* The number which the find_bswap_or_nop_1 result should match in
> > order to have a nop.  The number is masked according to the size of
> > the symbolic number before using it.  */
> > @@ -1652,15 +1653,16 @@ do_shift_rotate (enum tree_code code,
> >  struct symbolic_number *n,
> >  int count)
> >  {
> > -  int bitsize = TYPE_PRECISION (n->type);
> > +  int size = TYPE_PRECISION (n->type) / BITS_PER_UNIT;
> >
> > -  if (count % 8 != 0)
> > +  if (count % BITS_PER_UNIT != 0)
> >  return false;
> > +  count = (count / BITS_PER_UNIT) * BITS_PER_MARKER;
> >
> >/* Zero out the extra bits of N in order to avoid them being shifted
> >   into the significant bits.  */
> > -  if (bitsize < 8 * (int)sizeof (int64_t))
> > -n->n &= ((uint64_t)1 << bitsize) - 1;
> > +  if (size < 64 / BITS_PER_MARKER)
> > +n-&g

RE: [PATCH] Cancel bswap opt when intermediate stmts are reused

2014-08-18 Thread Thomas Preud'homme
Ping?

> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
> Sent: Thursday, August 07, 2014 6:56 PM
> To: gcc-patches@gcc.gnu.org
> Subject: [PATCH] Cancel bswap opt when intermediate stmts are reused
> 
> Hi all,
> 
> Currently, when an expression doing a manual load or bswap is detected, the
> bswap optimization replaces it by an actual load and/or bswap instruction
> without considering whether the intermediate values computed in the
> expression are reused later. If that is the case, the construction of these
> values has to be retained, and the sum of the load and/or bswap instruction
> and the instructions to construct those values might be more expensive than
> the initial fully manual expression. This patch aims at detecting such a case
> and cancelling the bswap optimization. In addition, it takes advantage of the
> infrastructure necessary for the detection to also clean up the stmts that
> become useless when the bswap optimization is performed.
> 
> *** gcc/ChangeLog ***
> 
> 2014-08-01  Thomas Preud'homme  
> 
>   * tree-ssa-math-opts.c (struct usedtree): New.
>   (find_bswap_or_nop_1): Change prototype to take a hashtable and
> a list
>   of struct usedtree. Fill respectively these with visited stmts and
>   trees (along with the stmts where they are defined) that would not
> be
>   defined if bswap optimization is applied. Adapt recursion call to
>   prototype change.
>   (find_bswap_or_nop): Adapt to find_bswap_or_nop_1 prototype
> change.
>   Pass the hashtable and the list of struct usedtree from
>   pass_optimize_bswap::execute ().
>   (do_bswap_p): New.
>   (bswap_replace): Fix typo in heading comment. Stop returning
> whether
>   the bswap optimization was performed since this is now handled by
>   do_bswap_p (). Move alignment check to do_bswap_p ().
>   (free_usedtrees_list): New.
>   (pass_optimize_bswap::execute): Add allocation and deallocation
>   handling of the hashtable of browsed stmts. Free the list of struct
>   usedtree at the end of each iteration using free_usedtrees_list ()
> and
>   the new bswap_check_end_iter label. Move some of the logic to
> perform
>   the bswap optimization to do_bswap_p (). Set gsi after performing
> the
>   bswap optimization and loop manually via the new
>   bswap_check_start_iter label so that all stmts are checked for
>   load/bswap even when cur_stmt is moved around by bswap_replace
> ().
> 
> *** gcc/testsuite/ChangeLog ***
> 
> 2014-08-01  Thomas Preud'homme  
> 
>   * gcc.dg/optimize-bswapsi-2.c (read_le32_1): Add an intermediate
>   variable in the mix to check that it is optimized when there is no
>   use outside the expression computing a load/bswap.
>   (read_be32_1): Likewise.
>   * gcc.dg/optimize-bswapsi-3.c: New. Create read_le32_1 () and
>   read_be32_1 () based on identically named function in
>   gcc.dg/optimize-bswapsi-2.c but reusing the intermediate variable so
>   as to check that bswap optimization is not performed in these cases.
> 
> diff --git a/gcc/testsuite/gcc.dg/optimize-bswapsi-2.c
> b/gcc/testsuite/gcc.dg/optimize-bswapsi-2.c
> index de6e697..df7856b 100644
> --- a/gcc/testsuite/gcc.dg/optimize-bswapsi-2.c
> +++ b/gcc/testsuite/gcc.dg/optimize-bswapsi-2.c
> @@ -14,7 +14,9 @@ struct uint32_st {
> 
>  uint32_t read_le32_1 (void)
>  {
> -  return data[0] | (data[1] << 8) | (data[2] << 16) | (data[3] << 24);
> +  uint32_t low = data[0] | (data[1] << 8);
> +  uint32_t ret = low | (data[2] << 16) | (data[3] << 24);
> +  return ret;
>  }
> 
>  uint32_t read_le32_2 (struct uint32_st data)
> @@ -30,7 +32,9 @@ uint32_t read_le32_3 (unsigned char *data)
> 
>  uint32_t read_be32_1 (void)
>  {
> -  return data[3] | (data[2] << 8) | (data[1] << 16) | (data[0] << 24);
> +  uint32_t low = data[3] | (data[2] << 8);
> +  uint32_t ret = low | (data[1] << 16) | (data[0] << 24);
> +  return ret;
>  }
> 
>  uint32_t read_be32_2 (struct uint32_st data)
> diff --git a/gcc/testsuite/gcc.dg/optimize-bswapsi-3.c
> b/gcc/testsuite/gcc.dg/optimize-bswapsi-3.c
> new file mode 100644
> index 000..dc48d9d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/optimize-bswapsi-3.c
> @@ -0,0 +1,31 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target bswap32 } */
> +/* { dg-require-effective-target stdint_types } */
> +/* { dg-options "-O2 -fdump-tree-bswap" } */
> +/* { dg-additional-options "-march=z900" { target s390-*-*

[PATCH][ARM] Fix -fcall-saved-rX for X > 7

2014-08-19 Thread Thomas Preud'homme
This patch makes -fcall-saved-rX for X > 7 work on Thumb targets when optimizing 
for size. It works by adding a new field x_user_set_call_save_regs in struct 
target_hard_regs to track whether an entry in the fields x_fixed_regs, 
x_call_used_regs and x_call_really_used_regs was set by the user or is at its 
default value. The backend can then decide, based on this information, whether 
to set a given high register as caller saved when optimizing for size.

ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2014-08-15  Thomas Preud'homme  

* config/arm/arm.c (arm_conditional_register_usage): Only set high
registers as caller saved when optimizing for size *and* the user did
not ask otherwise through a -fcall-saved-* switch.
* hard-reg-set.h (x_user_set_call_save_regs): New.
(user_set_call_save_regs): Define.
* reginfo.c (init_reg_sets): Initialize user_set_call_save_regs.
(fix_register): Indicate in user_set_call_save_regs that the value set
in call_save_regs and fixed_regs is user set.


*** gcc/testsuite/ChangeLog ***

2014-08-15  Thomas Preud'homme  

* gcc.target/arm/fcall-save-rhigh.c: New.


diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 2f8d327..8324fa3 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -30084,7 +30084,8 @@ arm_conditional_register_usage (void)
 stacking them.  */
   for (regno = FIRST_HI_REGNUM;
   regno <= LAST_HI_REGNUM; ++regno)
-   fixed_regs[regno] = call_used_regs[regno] = 1;
+   if (!user_set_call_save_regs[regno])
+ fixed_regs[regno] = call_used_regs[regno] = 1;
 }
 
   /* The link register can be clobbered by any branch insn,
diff --git a/gcc/hard-reg-set.h b/gcc/hard-reg-set.h
index b8ab3df..b523637 100644
--- a/gcc/hard-reg-set.h
+++ b/gcc/hard-reg-set.h
@@ -614,6 +614,11 @@ struct target_hard_regs {
 
   char x_call_really_used_regs[FIRST_PSEUDO_REGISTER];
 
+  /* Indexed by hard register number, contains 1 for registers
+ whose saving at function call was decided by the user
+ with -fcall-saved-*, -fcall-used-* or -ffixed-*.  */
+  char x_user_set_call_save_regs[FIRST_PSEUDO_REGISTER];
+
   /* The same info as a HARD_REG_SET.  */
   HARD_REG_SET x_call_used_reg_set;
 
@@ -685,6 +690,8 @@ extern struct target_hard_regs *this_target_hard_regs;
   (this_target_hard_regs->x_call_used_regs)
 #define call_really_used_regs \
   (this_target_hard_regs->x_call_really_used_regs)
+#define user_set_call_save_regs \
+  (this_target_hard_regs->x_user_set_call_save_regs)
 #define call_used_reg_set \
   (this_target_hard_regs->x_call_used_reg_set)
 #define call_fixed_reg_set \
diff --git a/gcc/reginfo.c b/gcc/reginfo.c
index 7668be0..0b35f7f 100644
--- a/gcc/reginfo.c
+++ b/gcc/reginfo.c
@@ -183,6 +183,7 @@ init_reg_sets (void)
   memcpy (call_really_used_regs, initial_call_really_used_regs,
  sizeof call_really_used_regs);
 #endif
+  memset (user_set_call_save_regs, 0, sizeof user_set_call_save_regs);
 #ifdef REG_ALLOC_ORDER
   memcpy (reg_alloc_order, initial_reg_alloc_order, sizeof reg_alloc_order);
 #endif
@@ -742,6 +743,7 @@ fix_register (const char *name, int fixed, int call_used)
  if (fixed == 0)
call_really_used_regs[i] = call_used;
 #endif
+ user_set_call_save_regs[i] = 1;
}
}
 }
diff --git a/gcc/testsuite/gcc.target/arm/fcall-save-rhigh.c 
b/gcc/testsuite/gcc.target/arm/fcall-save-rhigh.c
new file mode 100644
index 000..a321a2b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/fcall-save-rhigh.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-final { scan-assembler "mov\\s+r.\\s*,\\s*r8" } } */
+/* { dg-require-effective-target arm_thumb1_ok } */
+/* { dg-options "-Os -mthumb -mcpu=cortex-m0 -fcall-saved-r8" } */
+
+void
+save_regs (void)
+{
+  asm volatile ("" ::: "r7", "r8");
+}

Ok for trunk?

Best regards,

Thomas




RE: [PATCH] Fix confusion between target, host and symbolic number byte sizes

2014-08-20 Thread Thomas Preud'homme
Ping?

> > -Original Message-
> > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> > ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
> > Sent: Thursday, August 07, 2014 1:57 PM
> > To: gcc-patches@gcc.gnu.org
> > Subject: RE: [PATCH] Fix confusion between target, host and symbolic
> > number byte sizes
> >
> > I suppose people were busy when I posted this patch and it got forgotten
> > since so here is a little ping.
> >
> > Best regards,
> >
> > Thomas
> >
> > > -Original Message-----
> > > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> > > ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
> > > Sent: Friday, July 04, 2014 12:53 PM
> > > To: gcc-patches@gcc.gnu.org
> > > Subject: [PATCH] Fix confusion between target, host and symbolic
> number
> > > byte sizes
> > >
> > > The bswap pass deals with 3 possibly different byte sizes: host, target
> > > and the size a byte marker occupies in the symbolic_number structure [1].
> > > However, as of now the code mixes the three sizes. This works in practice
> > > as the pass is only enabled for targets with BITS_PER_UNIT == 8 and
> > > nobody runs GCC on a host with CHAR_BIT != 8. As prompted by Jakub
> > > Jelinek, this patch fixes this mess. Byte markers are 8-bit quantities
> > > (they could be made 4-bit quantities but I preferred to keep the code
> > > working the same as before) for which a new macro is introduced
> > > (BITS_PER_MARKER); anything related to storing the value or a byte marker
> > > in a variable should check the host byte size or wide integer size, and
> > > anything aimed at manipulating the target value should check
> > > BITS_PER_UNIT.
> > >
> > >
> > > [1] Although the comment for this structure implies that a byte marker
> > > has the same size as the host byte, the way it is used in the code (even
> > > before any of my patch) shows that it uses a fixed size of 8 [2].
> > > [2] Note that since the pass is only active for targets with
> > > BITS_PER_UNIT == 8, it might be using the target byte size.
> > >
> > > gcc/ChangeLog:
> > >
> > > 2014-07-04  Thomas Preud'homme  
> > >
> > >   * tree-ssa-math-opts.c (struct symbolic_number): Clarify comment
> > > about
> > >   the size of byte markers.
> > >   (do_shift_rotate): Fix confusion between host, target and marker
> > > byte
> > >   size.
> > >   (verify_symbolic_number_p): Likewise.
> > >   (find_bswap_or_nop_1): Likewise.
> > >   (find_bswap_or_nop): Likewise.
> > >
> > >
> > > diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
> > > index ca2b30d..55c5df7 100644
> > > --- a/gcc/tree-ssa-math-opts.c
> > > +++ b/gcc/tree-ssa-math-opts.c
> > > @@ -1602,11 +1602,10 @@ make_pass_cse_sincos (gcc::context *ctxt)
> > >
> > >  /* A symbolic number is used to detect byte permutation and selection
> > > patterns.  Therefore the field N contains an artificial number
> > > -   consisting of byte size markers:
> > > +   consisting of octet sized markers:
> > >
> > > -   0- byte has the value 0
> > > -   1..size - byte contains the content of the byte
> > > -   number indexed with that value minus one.
> > > +   0- target byte has the value 0
> > > +   1..size - marker value is the target byte index minus one.
> > >
> > > To detect permutations on memory sources (arrays and structures), a
> > > symbolic
> > > number is also associated a base address (the array or structure the
> load
> > is
> > > @@ -1631,6 +1630,8 @@ struct symbolic_number {
> > >unsigned HOST_WIDE_INT range;
> > >  };
> > >
> > > +#define BITS_PER_MARKER 8
> > > +
> > >  /* The number which the find_bswap_or_nop_1 result should match in
> > > order to have a nop.  The number is masked according to the size of
> > > the symbolic number before using it.  */
> > > @@ -1652,15 +1653,16 @@ do_shift_rotate (enum tree_code code,
> > >struct symbolic_number *n,
> > >int count)
> > >  {
> > > -  int bitsize = TYPE_PRECISION (n->type);
> > > +  int size = TYPE_PRECISION (n->type) / BITS_PER_UNIT;
>

RE: [PATCH, ARM] Fix incorrect alignment of small values in minipool

2014-08-20 Thread Thomas Preud'homme
> From: Richard Earnshaw
> Sent: Monday, August 18, 2014 6:34 PM
> >
> > The problem is that in the pattern for consttable_4 we don't have the
> > information about the access mode for this entry. In the testcase
> > accompanying this patch the rtx manipulated in the pattern is VOIDmode
> > while the access mode is HImode. In dump_minipool on the other hand the
> > information can be found in mp->mode.
> >
> 
> I think it would be better to make sure the mode field never contains
> VOIDmode.  That's not really useful information.

I'm a bit confused. In the chapter about modes (13.6) of the GCC internals 
documentation, VOIDmode is said to be the correct mode for a constant. Do you 
want me to change this?

Best regards,

Thomas





RE: [PATCH][ARM] Fix -fcall-saved-rX for X > 7 When compiling for size for thumb targets

2014-08-26 Thread Thomas Preud'homme
Ping?

> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
> Sent: Wednesday, August 20, 2014 9:28 AM
> To: gcc-patches@gcc.gnu.org
> Subject: [PATCH][ARM] Fix -fcall-saved-rX for X > 7
> 
> This patch makes -fcall-saved-rX for X > 7 work on Thumb targets when
> optimizing for size. It works by adding a new field x_user_set_call_save_regs
> in struct target_hard_regs to track whether an entry in the fields
> x_fixed_regs, x_call_used_regs and x_call_really_used_regs was set by the user
> or is at its default value. The backend can then decide, based on this
> information, whether to set a given high register as caller saved when
> optimizing for size.
> 
> ChangeLog are as follows:
> 
> *** gcc/ChangeLog ***
> 
> 2014-08-15  Thomas Preud'homme  
> 
> * config/arm/arm.c (arm_conditional_register_usage): Only set high
> registers as caller saved when optimizing for size *and* the user did
> not asked otherwise through -fcall-saved-* switch.
> * hard-reg-set.h (x_user_set_call_save_regs): New.
> (user_set_call_save_regs): Define.
> * reginfo.c (init_reg_sets): Initialize user_set_call_save_regs.
> (fix_register): Indicate in user_set_call_save_regs that the value set
> in call_save_regs and fixed_regs is user set.
> 
> 
> *** gcc/testsuite/ChangeLog ***
> 
> 2014-08-15  Thomas Preud'homme  
> 
> * gcc.target/arm/fcall-save-rhigh.c: New.
> 
> 
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index 2f8d327..8324fa3 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -30084,7 +30084,8 @@ arm_conditional_register_usage (void)
>  stacking them.  */
>for (regno = FIRST_HI_REGNUM;
>  regno <= LAST_HI_REGNUM; ++regno)
> - fixed_regs[regno] = call_used_regs[regno] = 1;
> + if (!user_set_call_save_regs[regno])
> +   fixed_regs[regno] = call_used_regs[regno] = 1;
>  }
> 
>/* The link register can be clobbered by any branch insn,
> diff --git a/gcc/hard-reg-set.h b/gcc/hard-reg-set.h
> index b8ab3df..b523637 100644
> --- a/gcc/hard-reg-set.h
> +++ b/gcc/hard-reg-set.h
> @@ -614,6 +614,11 @@ struct target_hard_regs {
> 
>char x_call_really_used_regs[FIRST_PSEUDO_REGISTER];
> 
> +  /* Indexed by hard register number, contains 1 for registers
> + whose saving at function call was decided by the user
> + with -fcall-saved-*, -fcall-used-* or -ffixed-*.  */
> +  char x_user_set_call_save_regs[FIRST_PSEUDO_REGISTER];
> +
>/* The same info as a HARD_REG_SET.  */
>HARD_REG_SET x_call_used_reg_set;
> 
> @@ -685,6 +690,8 @@ extern struct target_hard_regs
> *this_target_hard_regs;
>(this_target_hard_regs->x_call_used_regs)
>  #define call_really_used_regs \
>(this_target_hard_regs->x_call_really_used_regs)
> +#define user_set_call_save_regs \
> +  (this_target_hard_regs->x_user_set_call_save_regs)
>  #define call_used_reg_set \
>(this_target_hard_regs->x_call_used_reg_set)
>  #define call_fixed_reg_set \
> diff --git a/gcc/reginfo.c b/gcc/reginfo.c
> index 7668be0..0b35f7f 100644
> --- a/gcc/reginfo.c
> +++ b/gcc/reginfo.c
> @@ -183,6 +183,7 @@ init_reg_sets (void)
>memcpy (call_really_used_regs, initial_call_really_used_regs,
> sizeof call_really_used_regs);
>  #endif
> +  memset (user_set_call_save_regs, 0, sizeof user_set_call_save_regs);
>  #ifdef REG_ALLOC_ORDER
>memcpy (reg_alloc_order, initial_reg_alloc_order, sizeof reg_alloc_order);
>  #endif
> @@ -742,6 +743,7 @@ fix_register (const char *name, int fixed, int call_used)
> if (fixed == 0)
>   call_really_used_regs[i] = call_used;
>  #endif
> +   user_set_call_save_regs[i] = 1;
>   }
>   }
>  }
> diff --git a/gcc/testsuite/gcc.target/arm/fcall-save-rhigh.c
> b/gcc/testsuite/gcc.target/arm/fcall-save-rhigh.c
> new file mode 100644
> index 000..a321a2b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/fcall-save-rhigh.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile } */
> +/* { dg-final { scan-assembler "mov\\s+r.\\s*,\\s*r8" } } */
> +/* { dg-require-effective-target arm_thumb1_ok } */
> +/* { dg-options "-Os -mthumb -mcpu=cortex-m0 -fcall-saved-r8" } */
> +
> +void
> +save_regs (void)
> +{
> +  asm volatile ("" ::: "r7", "r8");
> +}
> 
> Ok for trunk?
> 
> Best regards,
> 
> Thomas
> 
> 





[PATCH] Fix byte size confusion in bswap pass

2014-08-28 Thread Thomas Preud'homme
[CCing you Jakub as you are the one who raised this issue to me]

The bswap pass deals with 3 possibly different byte sizes: host, target and the 
size a byte marker occupies in the symbolic_number structure [1]. However, right 
now the code mixes the three sizes. This works in practice as the pass is only 
enabled for targets with BITS_PER_UNIT == 8 and nobody runs GCC on a host with 
CHAR_BIT != 8. As prompted by Jakub Jelinek, this patch fixes this mess. Byte 
markers are 8-bit quantities (they could be made 4-bit quantities but I 
preferred to keep the code working the same as before) for which a new macro is 
introduced (BITS_PER_MARKER); anything related to storing the value or a byte 
marker in a variable should check the host byte size or wide integer size, and 
anything aimed at manipulating the target value should check BITS_PER_UNIT.


[1] Although the comment for this structure implies that a byte marker has the 
same size as the host byte, the way it is used in the code (even before any of 
my patch) shows that it uses a fixed size of 8 [2].
[2] Note that since the pass is only active for targets with BITS_PER_UNIT == 
8, it might be using the target byte size.
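
To make the marker layout concrete, here is a small standalone sketch (not part
of the patch, and assuming BITS_PER_MARKER == 8) of the pattern a freshly
initialized 4-byte symbolic number holds, and of the pattern a full byte swap
of it would match:

#include <stdint.h>
#include <assert.h>

/* Build the "load in target endianness" (nop) pattern for a symbolic number
   of SIZE target bytes: marker i + 1 goes into marker slot i, so the lowest
   order marker is 1 and the highest is SIZE.  */
static uint64_t
nop_pattern (unsigned size)
{
  uint64_t n = 0;
  for (unsigned i = 0; i < size; i++)
    n |= (uint64_t) (i + 1) << (i * 8);
  return n;
}

int
main (void)
{
  assert (nop_pattern (4) == 0x04030201);   /* markers in order: plain load */
  /* A full 32-bit byte swap would instead match the reversed pattern
     0x01020304.  */
  return 0;
}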


gcc/ChangeLog:

2014-08-29  Thomas Preud'homme  

* tree-ssa-math-opts.c (struct symbolic_number): Clarify comment about
the size of byte markers.
(do_shift_rotate): Fix confusion between host, target and marker byte
size.
(verify_symbolic_number_p): Likewise.
(find_bswap_or_nop_1): Likewise.
(find_bswap_or_nop): Likewise.


diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
index ca2b30d..55c5df7 100644
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -1600,11 +1600,10 @@ make_pass_cse_sincos (gcc::context *ctxt)
 
 /* A symbolic number is used to detect byte permutation and selection
patterns.  Therefore the field N contains an artificial number
-   consisting of byte size markers:
+   consisting of octet sized markers:
 
-   0- byte has the value 0
-   1..size - byte contains the content of the byte
-   number indexed with that value minus one.
+   0- target byte has the value 0
+   1..size - marker value is the target byte index minus one.
 
To detect permutations on memory sources (arrays and structures), a symbolic
number is also associated a base address (the array or structure the load is
@@ -1629,6 +1628,8 @@ struct symbolic_number {
   unsigned HOST_WIDE_INT range;
 };
 
+#define BITS_PER_MARKER 8
+
 /* The number which the find_bswap_or_nop_1 result should match in
order to have a nop.  The number is masked according to the size of
the symbolic number before using it.  */
@@ -1650,15 +1651,16 @@ do_shift_rotate (enum tree_code code,
 struct symbolic_number *n,
 int count)
 {
-  int bitsize = TYPE_PRECISION (n->type);
+  int size = TYPE_PRECISION (n->type) / BITS_PER_UNIT;
 
-  if (count % 8 != 0)
+  if (count % BITS_PER_UNIT != 0)
 return false;
+  count = (count / BITS_PER_UNIT) * BITS_PER_MARKER;
 
   /* Zero out the extra bits of N in order to avoid them being shifted
  into the significant bits.  */
-  if (bitsize < 8 * (int)sizeof (int64_t))
-n->n &= ((uint64_t)1 << bitsize) - 1;
+  if (size < 64 / BITS_PER_MARKER)
+n->n &= ((uint64_t) 1 << (size * BITS_PER_MARKER)) - 1;
 
   switch (code)
 {
@@ -1668,22 +1670,22 @@ do_shift_rotate (enum tree_code code,
 case RSHIFT_EXPR:
   /* Arithmetic shift of signed type: result is dependent on the value.  */
   if (!TYPE_UNSIGNED (n->type)
- && (n->n & ((uint64_t) 0xff << (bitsize - 8
+ && (n->n & ((uint64_t) 0xff << ((size - 1) * BITS_PER_MARKER
return false;
   n->n >>= count;
   break;
 case LROTATE_EXPR:
-  n->n = (n->n << count) | (n->n >> (bitsize - count));
+  n->n = (n->n << count) | (n->n >> ((size * BITS_PER_MARKER) - count));
   break;
 case RROTATE_EXPR:
-  n->n = (n->n >> count) | (n->n << (bitsize - count));
+  n->n = (n->n >> count) | (n->n << ((size * BITS_PER_MARKER) - count));
   break;
 default:
   return false;
 }
   /* Zero unused bits for size.  */
-  if (bitsize < 8 * (int)sizeof (int64_t))
-n->n &= ((uint64_t)1 << bitsize) - 1;
+  if (size < 64 / BITS_PER_MARKER)
+n->n &= ((uint64_t) 1 << (size * BITS_PER_MARKER)) - 1;
   return true;
 }
 
@@ -1724,13 +1726,13 @@ init_symbolic_number (struct symbolic_number *n, tree 
src)
   if (size % BITS_PER_UNIT != 0)
 return false;
   size /= BITS_PER_UNIT;
-  if (size > (int)sizeof (uint64_t))
+  if (size > 64 / BITS_PER_MARKER)
 return false;
   n->range = size;
   n->n = CMPNOP;
 
-  if (size < (int)sizeof (int64_t))
-   

RE: [PATCH][ARM] Fix -fcall-saved-rX for X > 7 with -Os -mthumb

2014-08-31 Thread Thomas Preud'homme
Ping?
 
> > -Original Message-
> > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> > ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
> > Sent: Wednesday, August 20, 2014 9:28 AM
> > To: gcc-patches@gcc.gnu.org
> > Subject: [PATCH][ARM] Fix -fcall-saved-rX for X > 7
> >
> > This patch makes -fcall-saved-rX for X > 7 work on Thumb targets when
> > optimizing for size. It works by adding a new field
> > x_user_set_call_save_regs in struct target_hard_regs to track whether an
> > entry in the fields x_fixed_regs, x_call_used_regs and
> > x_call_really_used_regs was set by the user or is at its default value. The
> > backend can then decide, based on this information, whether to set a given
> > high register as caller saved when optimizing for size.
> >
> > ChangeLog are as follows:
> >
> > *** gcc/ChangeLog ***
> >
> > 2014-08-15  Thomas Preud'homme  
> >
> > * config/arm/arm.c (arm_conditional_register_usage): Only set high
> > registers as caller saved when optimizing for size *and* the user 
> > did
> > not asked otherwise through -fcall-saved-* switch.
> > * hard-reg-set.h (x_user_set_call_save_regs): New.
> > (user_set_call_save_regs): Define.
> > * reginfo.c (init_reg_sets): Initialize user_set_call_save_regs.
> > (fix_register): Indicate in user_set_call_save_regs that the value 
> > set
> > in call_save_regs and fixed_regs is user set.
> >
> >
> > *** gcc/testsuite/ChangeLog ***
> >
> > 2014-08-15  Thomas Preud'homme  
> >
> > * gcc.target/arm/fcall-save-rhigh.c: New.
> >
> >
> > diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> > index 2f8d327..8324fa3 100644
> > --- a/gcc/config/arm/arm.c
> > +++ b/gcc/config/arm/arm.c
> > @@ -30084,7 +30084,8 @@ arm_conditional_register_usage (void)
> >  stacking them.  */
> >for (regno = FIRST_HI_REGNUM;
> >regno <= LAST_HI_REGNUM; ++regno)
> > -   fixed_regs[regno] = call_used_regs[regno] = 1;
> > +   if (!user_set_call_save_regs[regno])
> > + fixed_regs[regno] = call_used_regs[regno] = 1;
> >  }
> >
> >/* The link register can be clobbered by any branch insn,
> > diff --git a/gcc/hard-reg-set.h b/gcc/hard-reg-set.h
> > index b8ab3df..b523637 100644
> > --- a/gcc/hard-reg-set.h
> > +++ b/gcc/hard-reg-set.h
> > @@ -614,6 +614,11 @@ struct target_hard_regs {
> >
> >char x_call_really_used_regs[FIRST_PSEUDO_REGISTER];
> >
> > +  /* Indexed by hard register number, contains 1 for registers
> > + whose saving at function call was decided by the user
> > + with -fcall-saved-*, -fcall-used-* or -ffixed-*.  */
> > +  char x_user_set_call_save_regs[FIRST_PSEUDO_REGISTER];
> > +
> >/* The same info as a HARD_REG_SET.  */
> >HARD_REG_SET x_call_used_reg_set;
> >
> > @@ -685,6 +690,8 @@ extern struct target_hard_regs
> > *this_target_hard_regs;
> >(this_target_hard_regs->x_call_used_regs)
> >  #define call_really_used_regs \
> >(this_target_hard_regs->x_call_really_used_regs)
> > +#define user_set_call_save_regs \
> > +  (this_target_hard_regs->x_user_set_call_save_regs)
> >  #define call_used_reg_set \
> >(this_target_hard_regs->x_call_used_reg_set)
> >  #define call_fixed_reg_set \
> > diff --git a/gcc/reginfo.c b/gcc/reginfo.c
> > index 7668be0..0b35f7f 100644
> > --- a/gcc/reginfo.c
> > +++ b/gcc/reginfo.c
> > @@ -183,6 +183,7 @@ init_reg_sets (void)
> >memcpy (call_really_used_regs, initial_call_really_used_regs,
> >   sizeof call_really_used_regs);
> >  #endif
> > +  memset (user_set_call_save_regs, 0, sizeof user_set_call_save_regs);
> >  #ifdef REG_ALLOC_ORDER
> >memcpy (reg_alloc_order, initial_reg_alloc_order, sizeof 
> > reg_alloc_order);
> >  #endif
> > @@ -742,6 +743,7 @@ fix_register (const char *name, int fixed, int
> call_used)
> >   if (fixed == 0)
> > call_really_used_regs[i] = call_used;
> >  #endif
> > + user_set_call_save_regs[i] = 1;
> > }
> > }
> >  }
> > diff --git a/gcc/testsuite/gcc.target/arm/fcall-save-rhigh.c
> > b/gcc/testsuite/gcc.target/arm/fcall-save-rhigh.c
> > new file mode 100644
> > index 000..a321a2b
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/arm/fcall-save-rhigh.c
> > @@ -0,0 +1,10 @@
> > +/* { dg-do compile } */
> > +/* { dg-final { scan-assembler "mov\\s+r.\\s*,\\s*r8" } } */
> > +/* { dg-require-effective-target arm_thumb1_ok } */
> > +/* { dg-options "-Os -mthumb -mcpu=cortex-m0 -fcall-saved-r8" } */
> > +
> > +void
> > +save_regs (void)
> > +{
> > +  asm volatile ("" ::: "r7", "r8");
> > +}
> >
> > Ok for trunk?
> >
> > Best regards,
> >
> > Thomas
> >
> >
> 
> 
> 





[PATCH] Fix PR63259: bswap not recognized when finishing with rotation

2014-10-07 Thread Thomas Preud'homme
Currently the bswap pass only looks for bswap patterns by examining bitwise
OR statements and following def-use chains. However a rotation
(left or right) can finish a manual byteswap, as shown in the following example:

unsigned
byteswap_ending_with_rotation (unsigned in)
{
in = ((in & 0xff00ff00) >>  8) | ((in & 0x00ff00ff) <<  8);
in = ((in & 0xffff0000) >> 16) | ((in & 0x0000ffff) << 16);
return in;
}

which is compiled into:

byteswap_ending_with_rotation (unsigned int in)
{
  unsigned int _2;
  unsigned int _3;
  unsigned int _4;
  unsigned int _5;

  <bb 2>:
  _2 = in_1(D) & 4278255360;
  _3 = _2 >> 8;
  _4 = in_1(D) & 16711935;
  _5 = _4 << 8;
  in_6 = _5 | _3;
  in_7 = in_6 r>> 16;
  return in_7;

}

This patch adds rotations (left and right) to the list of statements to consider 
for byte swap detection.
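
As a quick standalone check (not part of the patch) that the trailing rotation
in the dump above really completes the byte swap:

#include <stdint.h>
#include <assert.h>

/* What the trailing "in_6 r>> 16" computes.  */
static uint32_t
rot16 (uint32_t x)
{
  return (x >> 16) | (x << 16);
}

int
main (void)
{
  uint32_t in = 0x44332211;
  /* First statement of byteswap_ending_with_rotation: swap the bytes within
     each 16-bit half.  */
  uint32_t half = ((in & 0xff00ff00) >> 8) | ((in & 0x00ff00ff) << 8);
  /* The rotation then completes a full 32-bit byte swap.  */
  assert (rot16 (half) == 0x11223344);
  return 0;
}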

ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2014-09-30  Thomas Preud'homme  

PR tree-optimization/63259
* tree-ssa-math-opts.c (pass_optimize_bswap::execute): Also consider
bswap in LROTATE_EXPR and RROTATE_EXPR statements.

*** gcc/testsuite/ChangeLog ***

2014-09-30  Thomas Preud'homme  

PR tree-optimization/63259
* optimize-bswapsi-1.c (swap32_e): New bswap pass test.


diff --git a/gcc/testsuite/gcc.dg/optimize-bswapsi-1.c 
b/gcc/testsuite/gcc.dg/optimize-bswapsi-1.c
index 580e6e0..d4b5740 100644
--- a/gcc/testsuite/gcc.dg/optimize-bswapsi-1.c
+++ b/gcc/testsuite/gcc.dg/optimize-bswapsi-1.c
@@ -64,5 +64,16 @@ swap32_d (SItype in)
 | (((in >> 24) & 0xFF) << 0);
 }
 
-/* { dg-final { scan-tree-dump-times "32 bit bswap implementation found at" 4 
"bswap" } } */
+/* This variant comes from PR63259.  It compiles to a gimple sequence that ends
+   with a rotation instead of a bitwise OR.  */
+
+unsigned
+swap32_e (unsigned in)
+{
+  in = ((in & 0xff00ff00) >>  8) | ((in & 0x00ff00ff) <<  8);
+  in = ((in & 0xffff0000) >> 16) | ((in & 0x0000ffff) << 16);
+  return in;
+}
+
+/* { dg-final { scan-tree-dump-times "32 bit bswap implementation found at" 5 
"bswap" } } */
 /* { dg-final { cleanup-tree-dump "bswap" } } */
diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
index 3c6e935..2023f2e 100644
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -2377,11 +2377,16 @@ pass_optimize_bswap::execute (function *fun)
 {
  gimple src_stmt, cur_stmt = gsi_stmt (gsi);
  tree fndecl = NULL_TREE, bswap_type = NULL_TREE, load_type;
+ enum tree_code code;
  struct symbolic_number n;
  bool bswap;
 
- if (!is_gimple_assign (cur_stmt)
- || gimple_assign_rhs_code (cur_stmt) != BIT_IOR_EXPR)
+ if (!is_gimple_assign (cur_stmt))
+   continue;
+
+ code = gimple_assign_rhs_code (cur_stmt);
+ if (code != BIT_IOR_EXPR && code != LROTATE_EXPR
+ && code != RROTATE_EXPR)
continue;
 
  src_stmt = find_bswap_or_nop (cur_stmt, &n, &bswap);

Testing was done by running the testsuite on the arm-none-eabi target with QEMU
emulating Cortex-M3: no regressions were found. Due to the potential increase
in compilation time, a bootstrap with a sequential build (no -j option when
calling make) and with default options was made with and without the patch. The
results show no increase in compilation time:

r215662 with patch:
make  6167.48s user 401.03s system 99% cpu 1:49:52.07 total

r215662 without patch
make  6136.63s user 400.32s system 99% cpu 1:49:27.28 total

Is it ok for trunk?

Best regards,

Thomas Preud'homme





RE: [PATCH] Fix PR63259: bswap not recognized when finishing with rotation

2014-10-07 Thread Thomas Preud'homme
> From: Jakub Jelinek [mailto:ja...@redhat.com]
> Sent: Wednesday, October 08, 2014 2:39 PM
> 
> Doesn't it turn 16-bit {L,R}ROTATE_EXPR used alone into
> __builtin_bswap16?
> For those the question is if the canonical GIMPLE should be the rotation
> or
> byteswap, I'd think rotation would be perhaps better.  Or depending on
> if
> the backend has bswaphi2 or rotate pattern?

Good point. It seems better to keep the status quo.

> 
> Also, perhaps you could short-circuit this if the rotation isn't by constant
> or not a multiple of BITS_PER_UNIT.  So
> switch (code)
>   {
>   case BIT_IOR_EXPR:
> break;
>   case LROTATE_EXPR:
>   case RROTATE_EXPR:
> if (!tree_fits_uhwi_p (gimple_assign_rhs2 (cur_stmt))
> || (tree_to_uhwi (gimple_assign_rhs2 (cur_stmt))
> % BITS_PER_UNIT))
>   continue;
> break;
>   default:
> continue;
>   }
> ?

Right. Thanks for the comments.

Best regards,

Thomas






RE: [PATCH] Fix PR63259: bswap not recognized when finishing with rotation

2014-10-07 Thread Thomas Preud'homme
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
> Sent: Wednesday, October 08, 2014 2:43 PM
> > Also, perhaps you could short-circuit this if the rotation isn't by constant

Note that do_shift_rotate already checks for this. Is that enough?

Best regards,

Thomas 






RE: [PATCH, C++] Fix PR63366: __complex not equivalent to __complex double in C++

2014-10-09 Thread Thomas Preud'homme
Ping?

> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
> Sent: Monday, September 29, 2014 3:33 PM
> To: gcc-patches@gcc.gnu.org
> Subject: [PATCH, C++] Fix PR63366: __complex not equivalent to
> __complex double in C++
> 
> According to a comment in grokdeclarator in file gcc/cp/decl.c:
> 
> /* If we just have "complex", it is equivalent to
> "complex double", but if any modifiers at all are specified it is
> the complex form of TYPE. E.g, "complex short" is
> "complex short int". */
> 
> Yet, __complex is equivalent to __complex int, as the following testcase
> shows:
> 
> #include 
> 
> int
> main (void)
> {
>   return typeid (__complex) != typeid (__complex int);
> }
> 
> The following patch fixes the problem.
> 
> 
> ChangeLog are as follows:
> 
> *** gcc/cp/ChangeLog ***
> 
> 2014-09-26  Thomas Preud'homme  
> 
>   PR C++/63366
>   * decl.c (grokdeclarator): Set defaulted_int when defaulting to
> int
>   because type is null.
> 
> *** gcc/testsuite/ChangeLog ***
> 
> 2014-10-26  Thomas Preud'homme  
> 
>   PR C++/63366
>   * g++.dg/torture/pr63366.C: New test.
> 
> 
> diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
> index d26a432..449efdf 100644
> --- a/gcc/cp/decl.c
> +++ b/gcc/cp/decl.c
> @@ -9212,6 +9212,7 @@ grokdeclarator (const cp_declarator *declarator,
>   "ISO C++ forbids declaration of %qs with no type", name);
> 
>type = integer_type_node;
> +  defaulted_int = 1;
>  }
> 
>ctype = NULL_TREE;
> diff --git a/gcc/testsuite/g++.dg/torture/pr63366.C
> b/gcc/testsuite/g++.dg/torture/pr63366.C
> new file mode 100644
> index 000..af59b98
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/torture/pr63366.C
> @@ -0,0 +1,11 @@
> +// { dg-do run }
> +// { dg-options "-fpermissive" }
> +// { dg-prune-output "ISO C\\+\\+ forbids declaration of 'type name'
> with no type" }
> +
> +#include 
> +
> +int
> +main (void)
> +{
> +  return typeid (__complex) != typeid (__complex double);
> +}
> 
> 
> Is this ok for trunk?
> 
> Best regards,
> 
> Thomas Preud'homme
> 
> 
> 






RE: [PATCH] Fix PR63266: Keep track of impact of sign extension in bswap

2014-10-21 Thread Thomas Preud'homme
Hi Richard,

I realized, thanks to Christophe Lyon, that a shift was not right: the shift
count was a number of bytes instead of a number of bits.

This extra patch fixes the problem.
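
To illustrate the difference (a standalone sketch, not part of the patch): with
BITS_PER_MARKER == 8 and a 4-marker symbolic number whose top two markers must
be marked unknown, the shift counts have to be 24 and 16 bits; the buggy code
shifted by the marker indices 3 and 2 instead, which lands inside the lowest
marker.

#include <stdint.h>
#include <assert.h>

#define BITS_PER_MARKER 8
#define MARKER_BYTE_UNKNOWN 0xff

int
main (void)
{
  uint64_t n = 0x04030201;               /* 4 markers, e.g. a 32-bit value */
  int type_size = 4, old_type_size = 2;

  /* Fixed version: shift counts expressed in bits.  */
  for (int i = 0; i < type_size - old_type_size; i++)
    n |= (uint64_t) MARKER_BYTE_UNKNOWN
         << ((type_size - 1 - i) * BITS_PER_MARKER);

  assert (n == 0xffff0201);              /* only the two top markers changed */
  return 0;
}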

ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2014-09-26  Thomas Preud'homme  

* tree-ssa-math-opts.c (find_bswap_or_nop_1): Fix creation of
MARKER_BYTE_UNKNOWN markers when handling casts.

*** gcc/testsuite/ChangeLog ***

2014-10-08  Thomas Preud'homme  

* gcc.dg/optimize-bswaphi-1.c: New bswap pass test.

diff --git a/gcc/testsuite/gcc.dg/optimize-bswaphi-1.c 
b/gcc/testsuite/gcc.dg/optimize-bswaphi-1.c
index 3e51f04..18aba28 100644
--- a/gcc/testsuite/gcc.dg/optimize-bswaphi-1.c
+++ b/gcc/testsuite/gcc.dg/optimize-bswaphi-1.c
@@ -42,6 +42,20 @@ uint32_t read_be16_3 (unsigned char *data)
   return *(data + 1) | (*data << 8);
 }
 
+typedef int SItype __attribute__ ((mode (SI)));
+typedef int HItype __attribute__ ((mode (HI)));
+
+/* Test that detection of significant sign extension works correctly. This
+   checks that unknown byte marker are set correctly in cast of cast.  */
+
+HItype
+swap16 (HItype in)
+{
+  return (HItype) (((in >> 0) & 0xFF) << 8)
+   | (((in >> 8) & 0xFF) << 0);
+}
+
 /* { dg-final { scan-tree-dump-times "16 bit load in target endianness found 
at" 3 "bswap" } } */
-/* { dg-final { scan-tree-dump-times "16 bit bswap implementation found at" 3 
"bswap" { xfail alpha*-*-* arm*-*-* } } } */
+/* { dg-final { scan-tree-dump-times "16 bit bswap implementation found at" 1 
"bswap" { target alpha*-*-* arm*-*-* } } } */
+/* { dg-final { scan-tree-dump-times "16 bit bswap implementation found at" 4 
"bswap" { xfail alpha*-*-* arm*-*-* } } } */
 /* { dg-final { cleanup-tree-dump "bswap" } } */
diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
index 3c6e935..2ef2333 100644
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -1916,7 +1916,8 @@ find_bswap_or_nop_1 (gimple stmt, struct symbolic_number 
*n, int limit)
if (!TYPE_UNSIGNED (n->type) && type_size > old_type_size
&& HEAD_MARKER (n->n, old_type_size))
  for (i = 0; i < type_size - old_type_size; i++)
-   n->n |= MARKER_BYTE_UNKNOWN << (type_size - 1 - i);
+   n->n |= MARKER_BYTE_UNKNOWN
+   << ((type_size - 1 - i) * BITS_PER_MARKER);
 
if (type_size < 64 / BITS_PER_MARKER)
  {

The regression testsuite was run without regressions on x86_64-linux-gnu and the 
bswap tests all pass on the arm-none-eabi target.

Is it ok for trunk?

Best regards,

Thomas

> -Original Message-
> From: Richard Biener [mailto:richard.guent...@gmail.com]
> Sent: Wednesday, September 24, 2014 4:01 PM
> To: Thomas Preud'homme
> Cc: GCC Patches
> Subject: Re: [PATCH] Fix PR63266: Keep track of impact of sign extension
> in bswap
> 
> On Tue, Sep 16, 2014 at 12:24 PM, Thomas Preud'homme
>  wrote:
> > Hi all,
> >
> > The fix for PR61306 disabled bswap when a sign extension is detected.
> However this led to a test case regression (and potential performance
> regression) in case where a sign extension happens but its effect is
> canceled by other bit manipulation. This patch aims to fix that by having a
> special marker to track bytes whose value is unpredictable due to sign
> extension. If the final result of a bit manipulation doesn't contain any
> such marker then the bswap optimization can proceed.
> 
> Nice and simple idea.
> 
> Ok.
> 
> Thanks,
> Richard.
> 
> > *** gcc/ChangeLog ***
> >
> > 2014-09-15  Thomas Preud'homme  
> >
> > PR tree-optimization/63266
> > * tree-ssa-math-opts.c (struct symbolic_number): Add comment
> about
> > marker for unknown byte value.
> > (MARKER_MASK): New macro.
> > (MARKER_BYTE_UNKNOWN): New macro.
> > (HEAD_MARKER): New macro.
> > (do_shift_rotate): Mark bytes with unknown values due to sign
> > extension when doing an arithmetic right shift. Replace hardcoded
> > mask for marker by new MARKER_MASK macro.
> > (find_bswap_or_nop_1): Likewise and adjust ORing of two
> symbolic
> > numbers accordingly.
> >
> > *** gcc/testsuite/ChangeLog ***
> >
> > 2014-09-15  Thomas Preud'homme  
> >
> > PR tree-optimization/63266
> > * gcc.dg/optimize-bswapsi-1.c (swap32_d): New bswap pass test.
> >
> >
> > Testing:
> >
> > * Built an arm-none-eabi-gcc cross-compiler and used it to run the
> testsuite on QEMU emulating Cortex-M3 without any regression
> > * Bootstrapped on x86_64-linux-gnu target and testsuite was run
> without regression
> >
> >
> > Ok for trunk?






RE: [PATCH] Fix PR63266: Keep track of impact of sign extension in bswap

2014-10-22 Thread Thomas Preud'homme
> From: Christophe Lyon [mailto:christophe.l...@linaro.org]
> Sent: Tuesday, October 21, 2014 10:03 PM
> > +typedef int SItype __attribute__ ((mode (SI)));
> What's the purpose of this? It seems unused.

Sigh. Bad copy/paste from optimize-bswapsi-1.c

I'll add it to my patch for pr63259.


> I believe this should be:
> "checks that unknown byte markers are set correctly in case of cast"

Indeed, there is a 's' missing for markers.

> 
> > +
> > +HItype
> > +swap16 (HItype in)
> > +{
> > +  return (HItype) (((in >> 0) & 0xFF) << 8)
> > +   | (((in >> 8) & 0xFF) << 0);
> > +}
> > +
> >  /* { dg-final { scan-tree-dump-times "16 bit load in target endianness
> found at" 3 "bswap" } } */
> > -/* { dg-final { scan-tree-dump-times "16 bit bswap implementation
> found at" 3 "bswap" { xfail alpha*-*-* arm*-*-* } } } */
> > +/* { dg-final { scan-tree-dump-times "16 bit bswap implementation
> found at" 1 "bswap" { target alpha*-*-* arm*-*-* } } } */
> 
> This line fails when forcing the compiler to target -march=armv5t for
> instance. I suspect this is because the check_effective_target_bswap
> test is too permissive.

Yep, that's likely the case. Feel free to add a version check to it.

Thanks for the review.

Best regards,

Thomas






RE: [PATCH] Fix PR63266: Keep track of impact of sign extension in bswap

2014-10-27 Thread Thomas Preud'homme
> From: Christophe Lyon [mailto:christophe.l...@linaro.org]
> Sent: Sunday, October 26, 2014 4:40 PM
> I tried to modify check_effective_target_bswap
> and added:
> +   } else {
> +   if { [istarget arm*-*-*]
> +&& [check_no_compiler_messages_nocache arm_v6_or_later
> object {
> +#if __ARM_ARCH < 6
> +#error not armv6 or later
> +#endif
> +int i;
> +} ""] } {
> +   set et_bswap_saved 1
> +   }
> since the rev* instructions appeared in v6.
> 
> Regarding the testsuite, it moves the tests to UNSUPPORTED vs a mix of
> PASS/FAIL/XFAIL

[SNIP PASS/FAIL/XFAIL changes]

> 
> The PASS seems not very informative, so it may not be a problem to
> loose these few PASS/XFAIL.

Agreed. A FAIL would only mean that the test was badly written. Only
the dump is relevant to tell whether the bswap pass did its job or not.

> 
> We can also explicitly skip optimize-bswaphi-1 when ARM_ARCH < 6.
> 
> Not sure what's preferred?

I prefer changing the effective target as it could eventually be reused for
some other tests. It also better reflects the reason why the test is disabled:
no 16-bit bswap.

Best regards,

Thomas






[PATCH] Fix PR61328: fully initialize symbolic number before using it

2014-06-03 Thread Thomas Preud'homme
When a bitwise OR gimple statement has as operands SSA_NAMEs initialized 
directly from a memory source (no cast or other unary statement intervening), a 
symbolic number will be used only partly initialized. This was discovered by 
valgrind and reported as PR61328. This patch fixes that by moving the 
initialization code into a separate function that can be called from the two 
places that need it. There was also a problem where a field of a structure was 
set in a function and the value of this field was read before checking the 
result of the function call; this would lead to missed optimizations.
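
The second issue in generic form (a sketch with made-up names, not the pass's
actual code): the field must only be read once the call is known to have
succeeded.

#include <stdbool.h>

struct number { int size; };

/* Fills N and returns true on success; on failure N is left untouched.  */
static bool
analyze (int input, struct number *n)
{
  if (input < 0)
    return false;
  n->size = input;
  return true;
}

static int
check (int input)
{
  struct number n;
  bool ok = analyze (input, &n);
  /* Reading n.size before testing OK would use an uninitialized value when
     analyze failed; test OK first, then read the field.  */
  if (!ok)
    return -1;
  return n.size;
}

int
main (void)
{
  return (check (3) == 3 && check (-1) == -1) ? 0 : 1;
}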

ChangeLog is as follows:

2014-05-29  Thomas Preud'homme  

PR tree-optimization/61328
* tree-ssa-math-opts.c (init_symbolic_number): Extract symbolic number
initialization from find_bswap_or_nop_1.
(find_bswap_or_nop_1): Test return value of find_bswap_or_nop_1 stored
in source_expr2 before using the size value the function sets. Also
make use of init_symbolic_number () in both the old place and
find_bswap_or_nop_load () to avoid reading uninitialized memory when
doing recursion in the GIMPLE_BINARY_RHS case.

Ok for trunk?

diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
index d9afccf..6c26d6d 100644
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -1701,6 +1701,30 @@ verify_symbolic_number_p (struct symbolic_number *n, 
gimple stmt)
   return true;
 }
 
+/* Initialize the symbolic number N for the bswap pass from the base element
+   SRC manipulated by the bitwise OR expression.  */
+
+static bool
+init_symbolic_number (struct symbolic_number *n, tree src)
+{
+  n->base_addr = n->offset = n->alias_set = n->vuse = NULL_TREE;
+
+  /* Set up the symbolic number N by setting each byte to a value between 1 and
+ the byte size of rhs1.  The highest order byte is set to n->size and the
+ lowest order byte to 1.  */
+  n->size = TYPE_PRECISION (TREE_TYPE (src));
+  if (n->size % BITS_PER_UNIT != 0)
+return false;
+  n->size /= BITS_PER_UNIT;
+  n->range = n->size;
+  n->n = CMPNOP;
+
+  if (n->size < (int)sizeof (int64_t))
+n->n &= ((uint64_t)1 << (n->size * BITS_PER_UNIT)) - 1;
+
+  return true;
+}
+
 /* Check if STMT might be a byte swap or a nop from a memory source and returns
the answer. If so, REF is that memory source and the base of the memory area
accessed and the offset of the access from that base are recorded in N.  */
@@ -1713,26 +1737,27 @@ find_bswap_or_nop_load (gimple stmt, tree ref, struct 
symbolic_number *n)
   HOST_WIDE_INT bitsize, bitpos;
   enum machine_mode mode;
   int unsignedp, volatilep;
+  tree offset, base_addr;
 
   if (!gimple_assign_load_p (stmt) || gimple_has_volatile_ops (stmt))
 return false;
 
-  n->base_addr = get_inner_reference (ref, &bitsize, &bitpos, &n->offset,
- &mode, &unsignedp, &volatilep, false);
+  base_addr = get_inner_reference (ref, &bitsize, &bitpos, &offset, &mode,
+  &unsignedp, &volatilep, false);
 
-  if (TREE_CODE (n->base_addr) == MEM_REF)
+  if (TREE_CODE (base_addr) == MEM_REF)
 {
   offset_int bit_offset = 0;
-  tree off = TREE_OPERAND (n->base_addr, 1);
+  tree off = TREE_OPERAND (base_addr, 1);
 
   if (!integer_zerop (off))
{
- offset_int boff, coff = mem_ref_offset (n->base_addr);
+ offset_int boff, coff = mem_ref_offset (base_addr);
  boff = wi::lshift (coff, LOG2_BITS_PER_UNIT);
  bit_offset += boff;
}
 
-  n->base_addr = TREE_OPERAND (n->base_addr, 0);
+  base_addr = TREE_OPERAND (base_addr, 0);
 
   /* Avoid returning a negative bitpos as this may wreak havoc later.  */
   if (wi::neg_p (bit_offset))
@@ -1743,11 +1768,11 @@ find_bswap_or_nop_load (gimple stmt, tree ref, struct 
symbolic_number *n)
 Subtract it to BIT_OFFSET and add it (scaled) to OFFSET.  */
  bit_offset -= tem;
  tem = wi::arshift (tem, LOG2_BITS_PER_UNIT);
- if (n->offset)
-   n->offset = size_binop (PLUS_EXPR, n->offset,
+ if (offset)
+   offset = size_binop (PLUS_EXPR, offset,
wide_int_to_tree (sizetype, tem));
  else
-   n->offset = wide_int_to_tree (sizetype, tem);
+   offset = wide_int_to_tree (sizetype, tem);
}
 
   bitpos += bit_offset.to_shwi ();
@@ -1758,6 +1783,9 @@ find_bswap_or_nop_load (gimple stmt, tree ref, struct 
symbolic_number *n)
   if (bitsize % BITS_PER_UNIT)
 return false;
 
+  init_symbolic_number (n, ref);
+  n->base_addr = base_addr;
+  n->offset = offset;
   n->bytepos = bitpos / BITS_PER_UNIT;
   n->alias_set = reference_alias_ptr_type (ref);
   n->vuse = gimple_vuse (stmt);
@@ -1816,28 +1

[PATCH] Fix PR61306: improve handling of sign and cast in bswap

2014-06-03 Thread Thomas Preud'homme
When bswap replaces a bitwise expression involving a memory source by a load 
possibly followed by a bswap, it is possible that the load has a size smaller 
than that of the target expression to which the bitwise expression was assigned. 
So some sort of cast is needed. But there might also be a difference between 
the size of the expression that was assigned and the size of the load. So 3 
different sizes might be involved. Consider the following example from binutils:

bfd_vma
bfd_getl16 (const void *p)
{
  const bfd_byte *addr = (const bfd_byte *) p;
  return (addr[1] << 8) | addr[0];
}

Here the load will have a size of 16 bits, while the bitwise expression is an 
int (don't ask me why) but is returned as a 64-bit quantity (bfd_vma maps to 
the size of host registers). In this case we need 2 separate casts. One from the 
16-bit quantity to int, with zero extension as the high bits are 0. It is always 
a zero extension because bswap will not do anything in the presence of a sign 
extension, as the result would then depend on the initial value (maybe a bswap 
for a positive number and a random value for a negative number). Then, we need 
to cast respecting the extension that would have happened had we not replaced 
the bitwise expression. Here, since the bitwise expression is int, it means we 
sign extend and then consider the content as being unsigned (bfd_vma is an 
unsigned quantity).

When a bswap is necessary we need to do this double cast after doing the bswap, 
as the bswap must be done on the loaded value since that's the size expected 
by the bswap builtin (see the sketch after the summary below). Finally, this 
patch also forbids any sign extension *in* the bitwise expression, as the result 
of the expression would then be unpredictable (it would depend on the initial 
value).

The patch works this way:

1) prevent sign extension of a bitwise expression
2) record the type of the bitwise expression instead of its size (the size can 
be determined from the type)
3) use this type to perform a double cast as explained above
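
As a rough standalone sketch of those two casts for the bfd_getl16 case
(illustrative only: it assumes a little-endian target where the OR expression
collapses to a plain 16-bit load, and a 64-bit bfd_vma):

#include <stdint.h>
#include <string.h>

typedef uint64_t bfd_vma;     /* assumption for the example: 64-bit registers */
typedef uint8_t bfd_byte;

static bfd_vma
getl16_after_replacement (const void *p)
{
  uint16_t load;
  memcpy (&load, p, sizeof load);  /* the 16-bit load replacing the OR */
  int as_int = load;               /* cast 1: zero extend to the type of the
                                      bitwise expression (int) */
  return (bfd_vma) as_int;         /* cast 2: the conversion the original int
                                      result would have undergone */
}

int
main (void)
{
  const bfd_byte buf[2] = { 0xff, 0xfe };
  return getl16_after_replacement (buf) == 0xfeff ? 0 : 1;
}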


2014-05-30  Thomas Preud'homme  

PR tree-optimization/61306
* tree-ssa-math-opts.c (struct symbolic_number): Store type of
expression instead of its size.
(do_shift_rotate): Adapt to change in struct symbolic_number.
(verify_symbolic_number_p): Likewise.
(init_symbolic_number): Likewise.
(find_bswap_or_nop_1): Likewise. Also prevent optimization when the
result of the expressions is unpredictable due to sign extension.
(convert_via): New function to deal with the casting involved from the
loaded value to the target SSA.
(bswap_replace): Rename load_type in range_type to reflect it's the
type the memory accessed shall have before being casted. Select load
type according to whether a bswap needs to be done. Cast first to rhs
with zero extend and then to lhs with sign extend to keep semantic of
original stmts.
(pass_optimize_bswap::execute): Adapt to change in struct
symbolic_number. Decide if the range accessed should be signed or
unsigned before being casted to lhs type based on rhs type and size.

2014-05-29  Thomas Preud'homme  

* gcc.c-torture/execute/pr61306.c: New test.

Patch is in attachment. Is this ok for trunk?

Best regards,

Thomas

PR61306.1.0.diff
Description: Binary data


[PATCH] Fix PR61320: disable bswap for unaligned access on SLOW_UNALIGNED_ACCESS targets

2014-06-03 Thread Thomas Preud'homme
Hi there,

It seems from PR61320 that the bswap pass causes some problems when it replaces
an OR expression by an unaligned access. Although it's not clear yet why the
unaligned load does not go through the extract_bit_field codepath, it is
necessary to provide a solution as this prevents sparc from bootstrapping. This
patch takes the simple approach of cancelling the bswap optimization when the
load that would replace the OR expression would be an unaligned load and the
target has SLOW_UNALIGNED_ACCESS. In the long run this patch should be reverted
as soon as the root cause of the current problem is found.
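
For reference, the kind of expression affected looks like this (a generic
sketch, not taken from the PR); on a strict-alignment target the pass would
have replaced the whole OR by a single 32-bit load from a possibly unaligned
pointer:

#include <stdint.h>

uint32_t
read_le32 (const unsigned char *p)   /* p may have any alignment */
{
  return (uint32_t) p[0]
         | ((uint32_t) p[1] << 8)
         | ((uint32_t) p[2] << 16)
         | ((uint32_t) p[3] << 24);
}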

The patch also rewrites the tests to take into account the fact that the
optimization is not done for some targets. It also adds some alignment hints so
that more tests can be run even on STRICT_ALIGNMENT targets.

ChangeLog changes are:

*** gcc/ChangeLog ***

2014-06-03  Thomas Preud'homme  

PR tree-optimization/61320
* tree-ssa-math-opts.c (bswap_replace): Cancel bswap optimization when
load is unaligned and would be slow for this target.

*** gcc/testsuite/ChangeLog ***

2014-06-03  Thomas Preud'homme  

* gcc.dg/optimize-bswaphi-1.c: Make variables global when possible to
enforce correct alignment and make the test work better on
STRICT_ALIGNMENT targets. Also adjust dg-final selectors when alignment
cannot be controlled (read_*_3 ()).
* gcc.dg/optimize-bswapsi-2.c: Likewise.
* gcc.dg/optimize-bswapdi-3.c: Likewise.

Bootstrapped on x86_64-linux-gnu and no regressions found in the testsuite. The
patch is attached. It applies on top of the one for PR61306 in the email titled
"[PATCH] Fix PR61306: improve handling of sign and cast in bswap" but can be
trivially modified to apply directly on trunk should that patch (PR61306) need
to be improved.

Is this ok for trunk?

Best regards,

Thomas

PR61320.1.0.diff
Description: Binary data


RE: [PATCH] Fix PR54733 Optimize endian independent load/store

2014-06-04 Thread Thomas Preud'homme
> From: Christophe Lyon [mailto:christophe.l...@linaro.org]
> On 29 May 2014 11:58, Thomas Preud'homme
>  wrote:
> >
> > Does the patch solve the problem you had? What about you Christophe?
> >
> >
> 
> Hi Thomas,
> 
> After a quick test, it looks OK to me.

Great. What about you Andreas? Does it work fine for you? If yes, is this ok 
for trunk?

Best regards,

Thomas




[PATCH] Adding myself to Write After Approval in MAINTAINERS

2014-06-04 Thread Thomas Preud'homme
Hi all,

I forgot to add myself to the MAINTAINERS file when I got Write After Approval
access. This patch does just that (already committed, as specified on the
website):

diff --git a/ChangeLog b/ChangeLog
index d35b315..80e9600 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,7 @@
+2014-06-04  Thomas Preud'homme  
+
+   * MAINTAINERS (Write After Approval): Add myself.
+
 2014-06-03  Andrew Bennett  
 
* MAINTAINERS (Write After Approval): Add myself.
diff --git a/MAINTAINERS b/MAINTAINERS
index 12c123d..71206e7 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -501,6 +501,7 @@ Paul Pluzhnikov 
ppluzhni...@google.com
 Marek Polacek  pola...@redhat.com
 Antoniu Popantoniu@gmail.com
 Vidya Praveen  vidyaprav...@arm.com
+Thomas Preud'homme thomas.preudho...@arm.com
 Vladimir Prus  vladi...@codesourcery.com
 Yao Qi y...@codesourcery.com
 Jerry Quinnjlqu...@optonline.net




RE: [PATCH] Fix PR61306: improve handling of sign and cast in bswap

2014-06-04 Thread Thomas Preud'homme
> From: Richard Biener [mailto:richard.guent...@gmail.com]
> 
> I'd rather change the comparisons
> 
> -  if (n->size < (int)sizeof (int64_t))
> -n->n &= ((uint64_t)1 << (n->size * BITS_PER_UNIT)) - 1;
> +  if (bitsize / BITS_PER_UNIT < (int)sizeof (int64_t))
> +n->n &= ((uint64_t)1 << bitsize) - 1;
> 
> to work in bits, thus bitsize < 8 * sizeof (int64_t) (note that using
> BITS_PER_UNIT is bogus here - you are dealing with 8-bit bytes
> on the host, not whatever the target uses).  Otherwise it smells
> like truncation may lose bits (probably not in practice).

Ah yes, right.

> 
> +   /* Sign extension: result is dependent on the value.  */
> +   if (!TYPE_UNSIGNED (type) && !TYPE_UNSIGNED (n->type)
> +   && type_size > TYPE_PRECISION (n->type))
> + return NULL_TREE;
> 
> whether it's sign-extended depends solely on the sign of the
> converted entity, so I'm not sure why you are restricting this
> to signed n->type.  Consider
> 
> signed char *p;
> ((unsigned int)p[0]) << 8 | ((unsigned int)p[1]) << 16
> 
> the loads are still sign-extended but n->type is unsigned.

Indeed, I understood it for convert_via (the requirement to be
unsigned) but got it wrong here.

> 
> I'm failing to get why you possibly need two casts ... you should
> only need one, from the bswap/load result to the final type
> (zero-extended as you say - so the load type should simply be
> unsigned which it is already).

Because of the type of the shift constant, the bitwise expression
is usually of type int. However, if you write a function that does
a 32 bit load in host endianness (like a 32 bit little endian load on
x86_64) with a result of size 64 bits, then you need to sign
extend, since the source type is signed. This is a situation I
encountered in bfd_getl32 in binutils I think.

Now if you consider bfd_getl16 instead, a direct sign extension
is out of the question. Suppose bfd_getl16 is called to read from
a memory address that contains 0xff 0xfe. The bitwise expression
would thus be equivalent to the value 0xfeff since it's of type
int. Then after the sign extension to 64 bits you'd have
0x000000000000feff. But after replacing the bitwise expression
you end up with a 16-bit load into a 16-bit SSA variable. If you sign
extend that directly to 64 bits you'll end up with
0xfffffffffffffeff, which is wrong. But if you zero extend to an
int value (the type of the bitwise OR expression) and then sign
extend to the target type you'll have the correct result.
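
A quick standalone check of that arithmetic (illustrative only):

#include <stdint.h>
#include <assert.h>

int
main (void)
{
  int16_t loaded = -257;                      /* bit pattern 0xfeff */
  int64_t direct  = loaded;                   /* direct sign extension */
  int64_t via_int = (int) (uint16_t) loaded;  /* zero extend to int first */

  assert ((uint64_t) direct  == 0xfffffffffffffeffULL);  /* wrong value */
  assert ((uint64_t) via_int == 0x000000000000feffULL);  /* expected value */
  return 0;
}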

But you're right, we can do this more simply by sign extending if
load size == size of bitwise expression and zero extending otherwise. The
change of load_type to range_type would still be needed
because in case of a load + bswap it's better to load in the
same type as the type of the parameter of bswap. After the bswap
you'd need to convert to a signed or unsigned value according
to the logic above (load size == size of bitwise expression).

In the original statement, the bitwise OR
expression would have the 2 bytes of higher weight be 0 while
the 2 bytes of lower weight would be the value read from
memory. The result of the sign extension would therefore keep the
upper bits zero (0x000000000000feff in the example above).

> 
> So I think that the testcase in the patch is fixed already by
> doing the n->type change (and a proper sign-extension detection).

I don't remember exactly but I think it didn't fix this bug
(but it is a necessary fix anyway).
 
> 
> Can you please split that part out?

Sure, that part would need to be applied on gcc 4.9 too. I'll try
to construct a testcase for that.

> 
> That range_type and convert_via looks wrong and unnecessary to me,
> and it doesn't look like you have a testcase excercising it?

See above.




RE: [PATCH] Fix PR61306: improve handling of sign and cast in bswap

2014-06-04 Thread Thomas Preud'homme
> From: Richard Biener [mailto:richard.guent...@gmail.com]
> 
> Err, but if you zero-extend directly to the target type you have the
> correct result, too.

Yep, but in some cases we need a sign extend (32-bit bitwise OR stored
into a 64-bit result). As I said, the logic could be simplified by sign
extending if load_size == bitwise expression size and zero extending
otherwise. I'll rework the patch in this direction.

> 
> But nothing for the testsuite?  The testcase you add fails foul of
> sign-extending the loads.

Ack, I'll add a test for zero extension and one for sign extension.

Cheers,

Thomas





[PATCH] Clean bswap messages and tests

2014-06-05 Thread Thomas Preud'homme
This patch includes 2 cleanups that were requested in PR61320:

* Use dg-additional-options to specify the extra option the s390 target needs
* Use the correct vocabulary of target endianness instead of host endianness in 
comments, the pass dump and the past ChangeLog entry.

Here are the ChangeLog:

*** gcc/ChangeLog ***

2014-06-04  Thomas Preud'homme  

* ChangeLog (2014-05-23): Fix ChangeLog entry to refer to target
endianness instead of host endianness.
* tree-ssa-math-opts.c (find_bswap_or_nop_1): Likewise in dumps and
comments.

*** gcc/testsuite/ChangeLog ***

2014-06-04  Thomas Preud'homme  

* gcc.dg/optimize-bswaphi-1.c: Adapt test to change of dump output.
Specify -march=z900 as an additional option.
* gcc.dg/optimize-bswapsi-1.c: Likewise for s390 options.
* gcc.dg/optimize-bswapsi-2.c: Likewise.
* gcc.dg/optimize-bswapdi-3.c: Likewise for adaptation to dump change.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index de07e5c..09122aa 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1623,7 +1623,8 @@
(find_bswap_or_nop_1): This. Also add support for memory source.
(find_bswap): Renamed to ...
(find_bswap_or_nop): This. Also add support for memory source and
-   detection of bitwise operations equivalent to load in host endianness.
+   detection of bitwise operations equivalent to load in target
+   endianness.
(execute_optimize_bswap): Likewise. Also move its leading comment back
in place and split statement transformation into ...
(bswap_replace): This.
diff --git a/gcc/testsuite/gcc.dg/optimize-bswapdi-3.c 
b/gcc/testsuite/gcc.dg/optimize-bswapdi-3.c
index 0a8bf2e..d96d7e5 100644
--- a/gcc/testsuite/gcc.dg/optimize-bswapdi-3.c
+++ b/gcc/testsuite/gcc.dg/optimize-bswapdi-3.c
@@ -59,6 +59,6 @@ uint64_t read_be64_3 (unsigned char *data)
 | ((uint64_t) *(data + 1) << 48) | ((uint64_t) *data << 56);
 }
 
-/* { dg-final { scan-tree-dump-times "64 bit load in host endianness found at" 
3 "bswap" } } */
+/* { dg-final { scan-tree-dump-times "64 bit load in target endianness found 
at" 3 "bswap" } } */
 /* { dg-final { scan-tree-dump-times "64 bit bswap implementation found at" 3 
"bswap" { xfail alpha*-*-* arm*-*-* } } } */
 /* { dg-final { cleanup-tree-dump "bswap" } } */
diff --git a/gcc/testsuite/gcc.dg/optimize-bswaphi-1.c 
b/gcc/testsuite/gcc.dg/optimize-bswaphi-1.c
index 65bff98..3e51f04 100644
--- a/gcc/testsuite/gcc.dg/optimize-bswaphi-1.c
+++ b/gcc/testsuite/gcc.dg/optimize-bswaphi-1.c
@@ -2,7 +2,7 @@
 /* { dg-require-effective-target bswap16 } */
 /* { dg-require-effective-target stdint_types } */
 /* { dg-options "-O2 -fdump-tree-bswap" } */
-/* { dg-options "-O2 -fdump-tree-bswap -march=z900" { target s390-*-* } } */
+/* { dg-additional-options "-march=z900" { target s390-*-* } } */
 
 #include 
 
@@ -42,6 +42,6 @@ uint32_t read_be16_3 (unsigned char *data)
   return *(data + 1) | (*data << 8);
 }
 
-/* { dg-final { scan-tree-dump-times "16 bit load in host endianness found at" 
3 "bswap" } } */
+/* { dg-final { scan-tree-dump-times "16 bit load in target endianness found 
at" 3 "bswap" } } */
 /* { dg-final { scan-tree-dump-times "16 bit bswap implementation found at" 3 
"bswap" { xfail alpha*-*-* arm*-*-* } } } */
 /* { dg-final { cleanup-tree-dump "bswap" } } */
diff --git a/gcc/testsuite/gcc.dg/optimize-bswapsi-1.c 
b/gcc/testsuite/gcc.dg/optimize-bswapsi-1.c
index 33d0bb0..ebfca60 100644
--- a/gcc/testsuite/gcc.dg/optimize-bswapsi-1.c
+++ b/gcc/testsuite/gcc.dg/optimize-bswapsi-1.c
@@ -2,7 +2,7 @@
 /* { dg-require-effective-target bswap32 } */
 /* { dg-require-effective-target stdint_types } */
 /* { dg-options "-O2 -fdump-tree-bswap" } */
-/* { dg-options "-O2 -fdump-tree-bswap -march=z900" { target s390-*-* } } */
+/* { dg-additional-options "-march=z900" { target s390-*-* } } */
 
 #include 
 
diff --git a/gcc/testsuite/gcc.dg/optimize-bswapsi-2.c 
b/gcc/testsuite/gcc.dg/optimize-bswapsi-2.c
index 518b510..de6e697 100644
--- a/gcc/testsuite/gcc.dg/optimize-bswapsi-2.c
+++ b/gcc/testsuite/gcc.dg/optimize-bswapsi-2.c
@@ -2,7 +2,7 @@
 /* { dg-require-effective-target bswap32 } */
 /* { dg-require-effective-target stdint_types } */
 /* { dg-options "-O2 -fdump-tree-bswap" } */
-/* { dg-options "-O2 -fdump-tree-bswap -march=z900" { target s390-*-* } } */
+/* { dg-additional-options "-march=z900" { target s390-*-* } } */
 
 #include 
 
@@ -44,6 +44,6 @@ uint32_t read_be32_3 (unsigned char *data)
 | (*data << 24);
 }
 
-/* { dg-final { scan-tree-dump-times "32 bit load in host endianness found at" 
3 "bswap" } } */
+/* { dg-final {

RE: [PATCH] Fix PR54733 Optimize endian independent load/store

2014-06-06 Thread Thomas Preud'homme
> From: Richard Biener [mailto:richard.guent...@gmail.com]
> On Wed, Jun 4, 2014 at 9:04 AM, Thomas Preud'homme
>  wrote:
> >
> > Great. What about you Andreas? Does it work fine for you? If yes, is this ok
> for trunk?
> 
> Ok.
> 
> Thanks,
> Richard.

Committed since I got positive feedback from Christophe Lyon and
Rainer (on PR61320).

Best regards,

Thomas





[PATCH] Unchecked call to init_symbolic_number

2014-06-09 Thread Thomas Preud'homme
When doing a bootstrap on x86_64-linux-gnu to test a patch I'm writing,
I encountered a failure that turned out to be due to not checking the
return value of init_symbolic_number in find_bswap_or_nop_load
(tree-ssa-math-opts.c). The following patch fixes that and I committed
it as obvious, as per the GCC write access policies.

ChangeLog:

2014-06-09  Thomas Preud'homme  

* tree-ssa-math-opts.c (find_bswap_or_nop_load): Check return value of
init_symbolic_number ().


diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
index a928ad9..1f011a6 100644
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -1784,7 +1784,8 @@ find_bswap_or_nop_load (gimple stmt, tree ref, struct 
symbolic_number *n)
   if (bitsize % BITS_PER_UNIT)
 return false;
 
-  init_symbolic_number (n, ref);
+  if (!init_symbolic_number (n, ref))
+return false;
   n->base_addr = base_addr;
   n->offset = offset;
   n->bytepos = bitpos / BITS_PER_UNIT;




[PATCH] Fix PR61375: cancel bswap optimization when value doesn't fit in a HOST_WIDE_INT

2014-06-09 Thread Thomas Preud'homme
When analyzing a bitwise AND with a constant as part of a bitwise OR,
the bswap pass stores the constant in an int64_t variable without checking
whether it fits. As a result, we get an ICE when the constant is an __int128
value. This affects GCC trunk but also GCC 4.9 and 4.8 (and possibly earlier
versions as well).


ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2014-06-05  Thomas Preud'homme  

PR tree-optimization/61375
* tree-ssa-math-opts.c (init_symbolic_number): Cancel optimization if
symbolic number cannot be represented in an unsigned HOST_WIDE_INT.
(find_bswap_or_nop_1): Likewise.

*** gcc/testsuite/ChangeLog ***

2014-06-05  Thomas Preud'homme  

PR tree-optimization/61375
* gcc.c-torture/execute/pr61375-1.c: New test.


diff --git a/gcc/testsuite/gcc.c-torture/execute/pr61375.c 
b/gcc/testsuite/gcc.c-torture/execute/pr61375.c
new file mode 100644
index 000..58df57a
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr61375.c
@@ -0,0 +1,34 @@
+#ifdef __UINT64_TYPE__
+typedef __UINT64_TYPE__ uint64_t;
+#else
+typedef unsigned long long uint64_t;
+#endif
+
+#ifndef __SIZEOF_INT128__
+#define __int128 long long
+#endif
+
+/* Some version of bswap optimization would ICE when analyzing a mask constant
+   too big for an HOST_WIDE_INT (PR61375).  */
+
+__attribute__ ((noinline, noclone)) uint64_t
+uint128_central_bitsi_ior (unsigned __int128 in1, uint64_t in2)
+{
+  __int128 mask = (__int128)0x << 56;
+  return ((in1 & mask) >> 56) | in2;
+}
+
+int main(int argc)
+{
+  __int128 in = 1;
+#ifdef __SIZEOF_INT128__
+  in <<= 64;
+#endif
+  if (sizeof (uint64_t) * __CHAR_BIT__ != 64)
+return 0;
+  if (sizeof (unsigned __int128) * __CHAR_BIT__ != 128)
+return 0;
+  if (uint128_central_bitsi_ior (in, 2) != 0x102)
+__builtin_abort ();
+  return 0;
+}
diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
index 658b341..95b3f25 100644
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -1717,6 +1717,8 @@ init_symbolic_number (struct symbolic_number *n, tree src)
   if (n->size % BITS_PER_UNIT != 0)
 return false;
   n->size /= BITS_PER_UNIT;
+  if (n->size > (int)sizeof (unsigned HOST_WIDE_INT))
+return false;
   n->range = n->size;
   n->n = CMPNOP;
 
@@ -1883,6 +1885,8 @@ find_bswap_or_nop_1 (gimple stmt, struct symbolic_number 
*n, int limit)
type_size = TYPE_PRECISION (gimple_expr_type (stmt));
if (type_size % BITS_PER_UNIT != 0)
  return NULL_TREE;
+   if (type_size > (int)sizeof (unsigned HOST_WIDE_INT) * 8)
+ return NULL_TREE;
 
if (type_size / BITS_PER_UNIT < (int)(sizeof (int64_t)))
  {

Is this OK for trunk? What about backports for 4.8 and 4.9? Would a
reworked patch for these versions be accepted? The change would
be trivial: the code that is now in init_symbolic_number was moved
there from somewhere else.

Best regards,

Thomas




RE: [PATCH] Fix PR61306: improve handling of sign and cast in bswap

2014-06-10 Thread Thomas Preud'homme
> From: Richard Biener [mailto:richard.guent...@gmail.com]
> Sent: Wednesday, June 04, 2014 5:39 PM
> To: Thomas Preud'homme
> 

> 
> I'm failing to get why you possibly need two casts ... you should
> only need one, from the bswap/load result to the final type
> (zero-extended as you say - so the load type should simply be
> unsigned which it is already).

You are right indeed. I failed to realize that the problems I
encountered were caused by an initially wrong understanding of
the reason behind PR61306. None of this code is necessary.

> 
> So I think that the testcase in the patch is fixed already by
> doing the n->type change (and a proper sign-extension detection).
> 
> Can you please split that part out?

Doing so, I realized the patch was incomplete. Sign extension can
be triggered in two distinct places in the code (right shift and cast),
both of which can lead to incorrect code being generated. With some
effort I managed to create two testcases that apply to GCC trunk as
well as GCC 4.9 and 4.8.

ChangeLog entries are:

*** gcc/ChangeLog ***

2014-06-05  Thomas Preud'homme  

PR tree-optimization/61306
* tree-ssa-math-opts.c (struct symbolic_number): Store type of
expression instead of its size.
(do_shift_rotate): Adapt to change in struct symbolic_number. Return
false to prevent optimization when the result is unpredictable due to
arithmetic right shift of signed type with highest byte is set.
(verify_symbolic_number_p): Adapt to change in struct symbolic_number.
(init_symbolic_number): Likewise.
(find_bswap_or_nop_1): Likewise. Return NULL to prevent optimization
when the result is unpredictable due to sign extension.

*** gcc/testsuite/ChangeLog ***

2014-06-05  Thomas Preud'homme  

* gcc.c-torture/execute/pr61306-1.c: New test.
* gcc.c-torture/execute/pr61306-2.c: Likewise.
* gcc.c-torture/execute/pr61306-3.c: Likewise.


diff --git a/gcc/testsuite/gcc.c-torture/execute/pr61306-1.c 
b/gcc/testsuite/gcc.c-torture/execute/pr61306-1.c
new file mode 100644
index 000..f6e8ff3
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr61306-1.c
@@ -0,0 +1,39 @@
+#ifdef __INT32_TYPE__
+typedef __INT32_TYPE__ int32_t;
+#else
+typedef int int32_t;
+#endif
+
+#ifdef __UINT32_TYPE__
+typedef __UINT32_TYPE__ uint32_t;
+#else
+typedef unsigned uint32_t;
+#endif
+
+#define __fake_const_swab32(x) ((uint32_t)(  \
+(((uint32_t)(x) & (uint32_t)0x000000ffUL) << 24) |\
+(((uint32_t)(x) & (uint32_t)0x0000ff00UL) <<  8) |\
+(((uint32_t)(x) & (uint32_t)0x00ff0000UL) >>  8) |\
+(( (int32_t)(x) &  (int32_t)0xff000000UL) >> 24)))
+
+/* Previous version of bswap optimization failed to consider sign extension
+   and as a result would replace an expression *not* doing a bswap by a
+   bswap.  */
+
+__attribute__ ((noinline, noclone)) uint32_t
+fake_bswap32 (uint32_t in)
+{
+  return __fake_const_swab32 (in);
+}
+
+int
+main(void)
+{
+  if (sizeof (int32_t) * __CHAR_BIT__ != 32)
+return 0;
+  if (sizeof (uint32_t) * __CHAR_BIT__ != 32)
+return 0;
+  if (fake_bswap32 (0x87654321) != 0xffffff87)
+__builtin_abort ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr61306-2.c 
b/gcc/testsuite/gcc.c-torture/execute/pr61306-2.c
new file mode 100644
index 000..6cbbd19
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr61306-2.c
@@ -0,0 +1,40 @@
+#ifdef __INT16_TYPE__
+typedef __INT16_TYPE__ int16_t;
+#else
+typedef short int16_t;
+#endif
+
+#ifdef __UINT32_TYPE__
+typedef __UINT32_TYPE__ uint32_t;
+#else
+typedef unsigned uint32_t;
+#endif
+
+#define __fake_const_swab32(x) ((uint32_t)(  \
+(((uint32_t) (x) & (uint32_t)0x000000ffUL) << 24) |   \
+(((uint32_t)(int16_t)(x) & (uint32_t)0x00ffff00UL) <<  8) |   \
+(((uint32_t) (x) & (uint32_t)0x00ff0000UL) >>  8) |   \
+(((uint32_t) (x) & (uint32_t)0xff000000UL) >> 24)))
+
+
+/* Previous version of bswap optimization failed to consider sign extension
+   and as a result would replace an expression *not* doing a bswap by a
+   bswap.  */
+
+__attribute__ ((noinline, noclone)) uint32_t
+fake_bswap32 (uint32_t in)
+{
+  return __fake_const_swab32 (in);
+}
+
+int
+main(void)
+{
+  if (sizeof (uint32_t) * __CHAR_BIT__ != 32)
+return 0;
+  if (sizeof (int16_t) * __CHAR_BIT__ != 16)
+return 0;
+  if (fake_bswap32 (0x81828384) != 0xff838281)
+__builtin_abort ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr61306-3.c 
b/gcc/testsuite/gcc.c-torture/execute/pr61306-3.c
new file mode 100644
index 000..6086e27
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr61306-3.c
@@ -0,0 +1,13 @@
+short a = -1;
+int b;
+char c;
+
+int

RE: [PATCH] Fix PR61306: improve handling of sign and cast in bswap

2014-06-10 Thread Thomas Preud'homme
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme

> 
> Is this OK for trunk? Does this bug qualify for a backport patch to
> 4.8 and 4.9 branches?

I forgot to mention that this was tested via bootstrap on
x86_64-linux-gnu target, the testsuite then showing no
regressions and the 3 tests added now passing.

Best regards,

Thomas 





[PATCH] PR61517: fix stmt replacement in bswap pass

2014-06-17 Thread Thomas Preud'homme
Hi everybody,

Thanks to a comment from Richard Biener, the bswap pass takes care not to
perform its optimization if memory is modified between the loads of the original
expression. However, when it replaces these statements by a single load, it
does so in the gimple statement that computes the final bitwise OR of the
original expression. Memory could be modified between the last load statement
and this bitwise OR statement, in which case the result is to read memory
*after* it was changed instead of before.

This patch takes care to move the statement to be replaced close to one of the
original loads, thus avoiding this problem.

ChangeLog entries for this fix are:

*** gcc/ChangeLog ***

2014-06-16  Thomas Preud'homme  

* tree-ssa-math-opts.c (find_bswap_or_nop_1): Adapt to return a stmt
whose rhs's first tree is the source expression instead of the
expression itself.
(find_bswap_or_nop): Likewise.
(bswap_replace): Rename stmt into cur_stmt. Pass gsi by value and src as a
gimple stmt whose rhs's first tree is the source. In the memory source
case, move the stmt to be replaced close to one of the original load to
avoid the problem of a store between the load and the stmt's original
location.
(pass_optimize_bswap::execute): Adapt to change in bswap_replace's
signature.

*** gcc/testsuite/ChangeLog ***

2014-06-16  Thomas Preud'homme  

* gcc.c-torture/execute/bswap-2.c (incorrect_read_le32): New.
(incorrect_read_be32): Likewise.
(main): Call incorrect_read_* to test stmt replacement is made by
bswap at the right place.
* gcc.c-torture/execute/pr61517.c: New test.

Patch also attached for convenience. Is it ok for trunk?

diff --git a/gcc/testsuite/gcc.c-torture/execute/bswap-2.c 
b/gcc/testsuite/gcc.c-torture/execute/bswap-2.c
index a47e01a..88132fe 100644
--- a/gcc/testsuite/gcc.c-torture/execute/bswap-2.c
+++ b/gcc/testsuite/gcc.c-torture/execute/bswap-2.c
@@ -66,6 +66,32 @@ fake_read_be32 (char *x, char *y)
   return c3 | c2 << 8 | c1 << 16 | c0 << 24;
 }
 
+__attribute__ ((noinline, noclone)) uint32_t
+incorrect_read_le32 (char *x, char *y)
+{
+  unsigned char c0, c1, c2, c3;
+
+  c0 = x[0];
+  c1 = x[1];
+  c2 = x[2];
+  c3 = x[3];
+  *y = 1;
+  return c0 | c1 << 8 | c2 << 16 | c3 << 24;
+}
+
+__attribute__ ((noinline, noclone)) uint32_t
+incorrect_read_be32 (char *x, char *y)
+{
+  unsigned char c0, c1, c2, c3;
+
+  c0 = x[0];
+  c1 = x[1];
+  c2 = x[2];
+  c3 = x[3];
+  *y = 1;
+  return c3 | c2 << 8 | c1 << 16 | c0 << 24;
+}
+
 int
 main ()
 {
@@ -92,8 +118,17 @@ main ()
   out = fake_read_le32 (cin, &cin[2]);
   if (out != 0x89018583)
 __builtin_abort ();
+  cin[2] = 0x87;
   out = fake_read_be32 (cin, &cin[2]);
   if (out != 0x83850189)
 __builtin_abort ();
+  cin[2] = 0x87;
+  out = incorrect_read_le32 (cin, &cin[2]);
+  if (out != 0x89878583)
+__builtin_abort ();
+  cin[2] = 0x87;
+  out = incorrect_read_be32 (cin, &cin[2]);
+  if (out != 0x83858789)
+__builtin_abort ();
   return 0;
 }
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr61517.c 
b/gcc/testsuite/gcc.c-torture/execute/pr61517.c
new file mode 100644
index 000..fc9bbe8
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr61517.c
@@ -0,0 +1,19 @@
+int a, b, *c = &a;
+unsigned short d;
+
+int
+main ()
+{
+  unsigned int e = a;
+  *c = 1;
+  if (!b)
+{
+  d = e;
+  *c = d | e;
+}
+
+  if (a != 0)
+__builtin_abort ();
+
+  return 0;
+}
diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
index c868e92..1ee2ba8 100644
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -1804,28 +1804,28 @@ find_bswap_or_nop_load (gimple stmt, tree ref, struct 
symbolic_number *n)
 
 /* find_bswap_or_nop_1 invokes itself recursively with N and tries to perform
the operation given by the rhs of STMT on the result.  If the operation
-   could successfully be executed the function returns the tree expression of
-   the source operand and NULL otherwise.  */
+   could successfully be executed the function returns a gimple stmt whose
+   rhs's first tree is the expression of the source operand and NULL
+   otherwise.  */
 
-static tree
+static gimple
 find_bswap_or_nop_1 (gimple stmt, struct symbolic_number *n, int limit)
 {
   enum tree_code code;
   tree rhs1, rhs2 = NULL;
-  gimple rhs1_stmt, rhs2_stmt;
-  tree source_expr1;
+  gimple rhs1_stmt, rhs2_stmt, source_stmt1;
   enum gimple_rhs_class rhs_class;
 
   if (!limit || !is_gimple_assign (stmt))
-return NULL_TREE;
+return NULL;
 
   rhs1 = gimple_assign_rhs1 (stmt);
 
   if (find_bswap_or_nop_load (stmt, rhs1, n))
-return rhs1;
+return stmt;
 
   if (TREE_CODE (rhs1) != SSA_NAME)
-return NULL_TREE;
+return NULL;
 
   code = gimple_assign_rhs_code (stmt);
  

RE: [PATCH] Fix PR61306: improve handling of sign and cast in bswap

2014-06-17 Thread Thomas Preud'homme
> From: Richard Biener [mailto:richard.guent...@gmail.com]
> Sent: Wednesday, June 11, 2014 4:32 PM
> >
> >
> > Is this OK for trunk? Does this bug qualify for a backport patch to
> > 4.8 and 4.9 branches?
> 
> This is ok for trunk and also for backporting (after a short while to
> see if there is any fallout).

Below is the backported patch for 4.8/4.9. Is this ok for both 4.8 and
4.9? If yes, how much more should I wait before committing?

Tested on both 4.8 and 4.9 without regression in the testsuite after
a bootstrap.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 1e35bbe..0559b7f 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,16 @@
+2014-06-12  Thomas Preud'homme  
+
+   PR tree-optimization/61306
+   * tree-ssa-math-opts.c (struct symbolic_number): Store type of
+   expression instead of its size.
+   (do_shift_rotate): Adapt to change in struct symbolic_number. Return
+   false to prevent optimization when the result is unpredictable due to
+   arithmetic right shift of signed type with highest byte is set.
+   (verify_symbolic_number_p): Adapt to change in struct symbolic_number.
+   (find_bswap_1): Likewise. Return NULL to prevent optimization when the
+   result is unpredictable due to sign extension.
+   (find_bswap): Adapt to change in struct symbolic_number.
+
 2014-06-12  Alan Modra  
 
PR target/61300
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 757cb74..139f23c 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,9 @@
+2014-06-12  Thomas Preud'homme  
+
+   * gcc.c-torture/execute/pr61306-1.c: New test.
+   * gcc.c-torture/execute/pr61306-2.c: Likewise.
+   * gcc.c-torture/execute/pr61306-3.c: Likewise.
+
 2014-06-11  Richard Biener  
 
PR tree-optimization/61452
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr61306-1.c 
b/gcc/testsuite/gcc.c-torture/execute/pr61306-1.c
new file mode 100644
index 000..ebc90a3
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr61306-1.c
@@ -0,0 +1,39 @@
+#ifdef __INT32_TYPE__
+typedef __INT32_TYPE__ int32_t;
+#else
+typedef int int32_t;
+#endif
+
+#ifdef __UINT32_TYPE__
+typedef __UINT32_TYPE__ uint32_t;
+#else
+typedef unsigned uint32_t;
+#endif
+
+#define __fake_const_swab32(x) ((uint32_t)(  \
+   (((uint32_t)(x) & (uint32_t)0x000000ffUL) << 24) |\
+   (((uint32_t)(x) & (uint32_t)0x0000ff00UL) <<  8) |\
+   (((uint32_t)(x) & (uint32_t)0x00ff0000UL) >>  8) |\
+   (( (int32_t)(x) &  (int32_t)0xff000000UL) >> 24)))
+
+/* Previous version of bswap optimization failed to consider sign extension
+   and as a result would replace an expression *not* doing a bswap by a
+   bswap.  */
+
+__attribute__ ((noinline, noclone)) uint32_t
+fake_bswap32 (uint32_t in)
+{
+  return __fake_const_swab32 (in);
+}
+
+int
+main(void)
+{
+  if (sizeof (int32_t) * __CHAR_BIT__ != 32)
+return 0;
+  if (sizeof (uint32_t) * __CHAR_BIT__ != 32)
+return 0;
+  if (fake_bswap32 (0x87654321) != 0xffffff87)
+__builtin_abort ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr61306-2.c 
b/gcc/testsuite/gcc.c-torture/execute/pr61306-2.c
new file mode 100644
index 000..886ecfd
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr61306-2.c
@@ -0,0 +1,40 @@
+#ifdef __INT16_TYPE__
+typedef __INT16_TYPE__ int16_t;
+#else
+typedef short int16_t;
+#endif
+
+#ifdef __UINT32_TYPE__
+typedef __UINT32_TYPE__ uint32_t;
+#else
+typedef unsigned uint32_t;
+#endif
+
+#define __fake_const_swab32(x) ((uint32_t)(  \
+   (((uint32_t) (x) & (uint32_t)0x000000ffUL) << 24) |   \
+   (((uint32_t)(int16_t)(x) & (uint32_t)0x00ffff00UL) <<  8) |   \
+   (((uint32_t) (x) & (uint32_t)0x00ff0000UL) >>  8) |   \
+   (((uint32_t) (x) & (uint32_t)0xff000000UL) >> 24)))
+
+
+/* Previous version of bswap optimization failed to consider sign extension
+   and as a result would replace an expression *not* doing a bswap by a
+   bswap.  */
+
+__attribute__ ((noinline, noclone)) uint32_t
+fake_bswap32 (uint32_t in)
+{
+  return __fake_const_swab32 (in);
+}
+
+int
+main(void)
+{
+  if (sizeof (uint32_t) * __CHAR_BIT__ != 32)
+return 0;
+  if (sizeof (int16_t) * __CHAR_BIT__ != 16)
+return 0;
+  if (fake_bswap32 (0x81828384) != 0xff838281)
+__builtin_abort ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr61306-3.c 
b/gcc/testsuite/gcc.c-torture/execute/pr61306-3.c
new file mode 100644
index 000..6086e27
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr61306-3.c
@@ -0,0 +1,13 @@
+short a = -1;
+int b;
+char c;
+
+int
+main ()
+{
+  c = a;
+  b = a | c;
+  if (b != -1)
+__builtin_abort ();
+  return 0;
+}
diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tr

RE: [PATCH] Fix PR61306: improve handling of sign and cast in bswap

2014-06-18 Thread Thomas Preud'homme
> From: Jakub Jelinek [mailto:ja...@redhat.com]
> Sent: Thursday, June 19, 2014 1:54 AM
> 
> Seems there are actually two spots with this, not just one.
> 
> Completely untested fix:
> 
> 2014-06-18  Jakub Jelinek  
> 
>   * tree-ssa-math-opts.c (do_shift_rotate, find_bswap_or_nop_1):
> Cast
>   0xff to uint64_t before shifting it up.
> 
> --- gcc/tree-ssa-math-opts.c  2014-06-13 08:08:42.354136356 +0200
> +++ gcc/tree-ssa-math-opts.c  2014-06-18 19:50:59.486916201 +0200
> @@ -1669,7 +1669,8 @@ do_shift_rotate (enum tree_code code,
>break;
>  case RSHIFT_EXPR:
>/* Arithmetic shift of signed type: result is dependent on the value.  
> */
> -  if (!TYPE_UNSIGNED (n->type) && (n->n & (0xff << (bitsize - 8
> +  if (!TYPE_UNSIGNED (n->type)
> +   && (n->n & ((uint64_t) 0xff << (bitsize - 8
>   return false;
>n->n >>= count;
>break;
> @@ -1903,7 +1904,7 @@ find_bswap_or_nop_1 (gimple stmt, struct
>   old_type_size = TYPE_PRECISION (n->type);
>   if (!TYPE_UNSIGNED (n->type)
>   && type_size > old_type_size
> - && n->n & (0xff << (old_type_size - 8)))
> + && n->n & ((uint64_t) 0xff << (old_type_size - 8)))
> return NULL_TREE;
> 
>   if (type_size / BITS_PER_UNIT < (int)(sizeof (int64_t)))
> 
> 

Yep, that's the right fix. I tested it on both a bootstrapped gcc on 
x86_64-linux-gnu
and an arm-none-eabi cross-compiler with no regression on the testsuite.

Jakub, since you made the patch, the honor of committing it should be yours.

Richard, given this issue, I think we should wait a few more days before I
commit a backported (and fixed, of course) version to 4.8 and 4.9.

Best regards,

Thomas





RE: [PATCH] Fix PR61375: cancel bswap optimization when value doesn't fit in a HOST_WIDE_INT

2014-06-20 Thread Thomas Preud'homme
> From: Richard Biener [mailto:richard.guent...@gmail.com]
> Sent: Tuesday, June 10, 2014 5:05 PM
> 
> Backports are welcome - please post a patch.
> 

Sorry for the delay. Here you are:

diff --git a/gcc/testsuite/gcc.c-torture/execute/pr61375.c 
b/gcc/testsuite/gcc.c-torture/execute/pr61375.c
new file mode 100644
index 000..d3b54a8
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr61375.c
@@ -0,0 +1,35 @@
+#ifdef __UINT64_TYPE__
+typedef __UINT64_TYPE__ uint64_t;
+#else
+typedef unsigned long long uint64_t;
+#endif
+
+#ifndef __SIZEOF_INT128__
+#define __int128 long long
+#endif
+
+/* Some version of bswap optimization would ICE when analyzing a mask constant
+   too big for an HOST_WIDE_INT (PR61375).  */
+
+__attribute__ ((noinline, noclone)) uint64_t
+uint128_central_bitsi_ior (unsigned __int128 in1, uint64_t in2)
+{
+  __int128 mask = (__int128)0x << 56;
+  return ((in1 & mask) >> 56) | in2;
+}
+
+int
+main (int argc)
+{
+  __int128 in = 1;
+#ifdef __SIZEOF_INT128__
+  in <<= 64;
+#endif
+  if (sizeof (uint64_t) * __CHAR_BIT__ != 64)
+return 0;
+  if (sizeof (unsigned __int128) * __CHAR_BIT__ != 128)
+return 0;
+  if (uint128_central_bitsi_ior (in, 2) != 0x102)
+__builtin_abort ();
+  return 0;
+}
diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
index 9ff857c..9d64205 100644
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -1741,6 +1741,8 @@ find_bswap_1 (gimple stmt, struct symbolic_number *n, int 
limit)
  if (n->size % BITS_PER_UNIT != 0)
return NULL_TREE;
  n->size /= BITS_PER_UNIT;
+ if (n->size > (int)sizeof (unsigned HOST_WIDEST_INT))
+   return NULL_TREE;
  n->n = (sizeof (HOST_WIDEST_INT) < 8 ? 0 :
  (unsigned HOST_WIDEST_INT)0x08070605 << 32 | 0x04030201);
 
@@ -1781,6 +1783,8 @@ find_bswap_1 (gimple stmt, struct symbolic_number *n, int 
limit)
type_size = TYPE_PRECISION (gimple_expr_type (stmt));
if (type_size % BITS_PER_UNIT != 0)
  return NULL_TREE;
+   if (type_size > (int)HOST_BITS_PER_WIDEST_INT)
+ return NULL_TREE;
 
if (type_size / BITS_PER_UNIT < (int)(sizeof (HOST_WIDEST_INT)))
  {

Ok for GCC 4.8 and GCC 4.9 branches?

Best regards,

Thomas




RE: [PATCH] Fix PR61375: cancel bswap optimization when value doesn't fit in a HOST_WIDE_INT

2014-06-23 Thread Thomas Preud'homme
> From: Jakub Jelinek [mailto:ja...@redhat.com]
> Sent: Monday, June 23, 2014 4:37 PM
> 
> On Mon, Jun 23, 2014 at 10:18:16AM +0200, Richard Biener wrote:
> > > --- a/gcc/tree-ssa-math-opts.c
> > > +++ b/gcc/tree-ssa-math-opts.c
> > > @@ -1741,6 +1741,8 @@ find_bswap_1 (gimple stmt, struct
> symbolic_number *n, int limit)
> > >   if (n->size % BITS_PER_UNIT != 0)
> > > return NULL_TREE;
> > >   n->size /= BITS_PER_UNIT;
> > > + if (n->size > (int)sizeof (unsigned HOST_WIDEST_INT))
> > > +   return NULL_TREE;
> 
> This looks wrong, while the bswap pass is guarded with BITS_PER_UNIT == 8
> check (i.e. target), you don't know of HOST_BITS_PER_CHAR is 8.
> I'd move the test before the division by BITS_PER_UNIT, and compare
> against HOST_BITS_PER_WIDEST_INT.

I may misunderstand you but I don't think there is a problem here because we
just check if we can create a value on the host with as many bytes as the value
on the target. The value on the host is different, with each byte being a
number from 1 to SIZE, SIZE being the number of bytes on the target. So this
would fail only if the target value has so many bytes that this number of bytes
cannot be represented in a HOST_WIDEST_INT.

> 
> > >   n->n = (sizeof (HOST_WIDEST_INT) < 8 ? 0 :
> > >   (unsigned HOST_WIDEST_INT)0x08070605 << 32 | 
> > > 0x04030201);
> > >
> > > @@ -1781,6 +1783,8 @@ find_bswap_1 (gimple stmt, struct
> symbolic_number *n, int limit)
> > > type_size = TYPE_PRECISION (gimple_expr_type (stmt));
> > > if (type_size % BITS_PER_UNIT != 0)
> > >   return NULL_TREE;
> > > +   if (type_size > (int)HOST_BITS_PER_WIDEST_INT)
> > > + return NULL_TREE;
> > >
> > > if (type_size / BITS_PER_UNIT < (int)(sizeof 
> > > (HOST_WIDEST_INT)))
> > >   {
> 
> Similarly here.

I agree that here the test is not correct, as we look at the number of bits on
the host, which should be enough to count the number of bytes on the target. To
better reflect the intent, we should first compute the number of bytes that
type_size forms and then compare that to the size in bytes of HOST_WIDEST_INT.

I'll rework the patch in this direction.

> 
> BTW, the formatting is wrong too, the (int) cast should be followed by space.

Right, but note that I merely followed the current style in this file. There are
many more occurrences of this style mistake in this file. Do you want me to
fix this one anyway?

Best regards,

Thomas




RE: [PATCH] Fix PR61375: cancel bswap optimization when value doesn't fit in a HOST_WIDE_INT

2014-06-23 Thread Thomas Preud'homme
> From: Jakub Jelinek [mailto:ja...@redhat.com]
> Sent: Monday, June 23, 2014 4:59 PM
> 
> Host could e.g. in theory have CHAR_BIT 32, while target BITS_PER_UNIT 8
> (otherwise bswap pass would give up).  sizeof (unsigned HOST_WIDE_INT)
> could
> very well be 2 in that case.

In this case the pass would skip any value of more than 2 bytes. However,
although the original comment on struct symbolic_number implies that there is
a mapping between host bytes (the bytes of the symbolic number) and target
bytes, that isn't the case, since do_shift_rotate () shifts the symbolic number
in quantities of BYTES_PER_UNIT instead of CHAR_BIT. Also, there are quite a
few 8s here and there. Although not a problem in practice, the mix of 8 and
BITS_PER_UNIT does not look very good. I guess a quick review would be in
order. Of course, with regards to the backport, the mix of 8 and BITS_PER_UNIT
should be left as is and only the confusion about how to represent a target
value in a host type should be fixed, if any.
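
As a toy illustration of what the byte markers are meant to track (this
is standalone example code written for this message, not the GCC
sources), with 8-bit target bytes a right shift of the target value by 8
bits should shift the markers by exactly one marker position:

#include <stdint.h>
#include <stdio.h>

#define BITS_PER_UNIT   8  /* target byte size assumed here        */
#define BITS_PER_MARKER 8  /* size of one marker in the uint64_t n */

int
main (void)
{
  /* Symbolic number of a freshly loaded 4-byte value: marker i + 1
     stands for target byte i.  */
  uint64_t n = 0x04030201;
  int count = 8;           /* shift amount on the target value, in bits */

  /* Translate the target bit count into a marker count, as a reworked
     do_shift_rotate would do for a logical right shift.  */
  n >>= (count / BITS_PER_UNIT) * BITS_PER_MARKER;

  printf ("0x%08llx\n", (unsigned long long) n);  /* prints 0x00040302 */
  return 0;
}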

I'll come back to you whenever this is done.

Best regards,

Thomas





RE: [PATCH] Fix PR61375: cancel bswap optimization when value doesn't fit in a HOST_WIDE_INT

2014-06-23 Thread Thomas Preud'homme
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
> 
> However
> although the original comments on struct symbolic_number implies that
> there is a mapping between host bytes (the bytes of the symbolic number)
> and target bytes, it isn't the case since do_shift_rotate () shift the 
> symbolic
> number by quantity of BYTES_PER_UNIT instead of CHAR_BIT.

My bad, the comment can be understood both ways.

Best regards,

Thomas




RE: [PATCH] Fix PR61375: cancel bswap optimization when value doesn't fit in a HOST_WIDE_INT

2014-06-25 Thread Thomas Preud'homme
Ok, what about the following patch and associated ChangeLog entries?

2014-06-24  Thomas Preud'homme  

PR tree-optimization/61375
* tree-ssa-math-opts.c (find_bswap_or_nop_1): Cancel optimization if
symbolic number cannot be represented in an unsigned HOST_WIDE_INT.
(execute_optimize_bswap): Cancel optimization if CHAR_BIT != 8.

2014-06-24  Thomas Preud'homme  

PR tree-optimization/61375
* gcc.c-torture/execute/pr61375-1.c: New test.


diff --git a/gcc/testsuite/gcc.c-torture/execute/pr61375.c 
b/gcc/testsuite/gcc.c-torture/execute/pr61375.c
new file mode 100644
index 000..6fb4693
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr61375.c
@@ -0,0 +1,35 @@
+#ifdef __UINT64_TYPE__
+typedef __UINT64_TYPE__ uint64_t;
+#else
+typedef unsigned long long uint64_t;
+#endif
+
+#ifndef __SIZEOF_INT128__
+#define __int128 long long
+#endif
+
+/* Some version of bswap optimization would ICE when analyzing a mask constant
+   too big for an HOST_WIDE_INT (PR61375).  */
+
+__attribute__ ((noinline, noclone)) uint64_t
+uint128_central_bitsi_ior (unsigned __int128 in1, uint64_t in2)
+{
+  __int128 mask = (__int128)0x << 56;
+  return ((in1 & mask) >> 56) | in2;
+}
+
+int
+main (int argc)
+{
+  __int128 in = 1;
+#ifdef __SIZEOF_INT128__
+  in <<= 64;
+#endif
+  if (sizeof (uint64_t) * __CHAR_BIT__ != 64)
+return 0;
+  if (sizeof (unsigned __int128) * __CHAR_BIT__ != 128)
+return 0;
+  if (uint128_central_bitsi_ior (in, 2) != 0x102)
+__builtin_abort ();
+  return 0;
+}
diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
index 9ff857c..045bf48 100644
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -1740,6 +1740,8 @@ find_bswap_1 (gimple stmt, struct symbolic_number *n, int 
limit)
  n->size = TYPE_PRECISION (TREE_TYPE (rhs1));
  if (n->size % BITS_PER_UNIT != 0)
return NULL_TREE;
+ if (n->size > HOST_BITS_PER_WIDEST_INT)
+   return NULL_TREE;
  n->size /= BITS_PER_UNIT;
  n->n = (sizeof (HOST_WIDEST_INT) < 8 ? 0 :
  (unsigned HOST_WIDEST_INT)0x08070605 << 32 | 0x04030201);
@@ -1781,6 +1783,8 @@ find_bswap_1 (gimple stmt, struct symbolic_number *n, int 
limit)
type_size = TYPE_PRECISION (gimple_expr_type (stmt));
if (type_size % BITS_PER_UNIT != 0)
  return NULL_TREE;
+   if (type_size > (int) HOST_BITS_PER_WIDEST_INT)
+ return NULL_TREE;
 
if (type_size / BITS_PER_UNIT < (int)(sizeof (HOST_WIDEST_INT)))
  {
@@ -1911,7 +1915,7 @@ execute_optimize_bswap (void)
   bool changed = false;
   tree bswap16_type = NULL_TREE, bswap32_type = NULL_TREE, bswap64_type = 
NULL_TREE;
 
-  if (BITS_PER_UNIT != 8)
+  if (BITS_PER_UNIT != 8 || CHAR_BIT != 8)
 return 0;
 
   if (sizeof (HOST_WIDEST_INT) < 8)

Is this ok for 4.8 and 4.9 branches?

Best regards,

Thomas




RE: [PATCH] Fix PR61306: improve handling of sign and cast in bswap

2014-06-26 Thread Thomas Preud'homme
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
> Sent: Thursday, June 19, 2014 1:36 PM
> 
> Richard, given this issue, I think we should wait a few more days before I
> commit
> A backported (and fixed of course) version to 4.8 and 4.9.

No new issues were reported since then. Is it ok to commit the backport
(with Jakub's fix) now or should we wait more?

Best regards,

Thomas




RE: [PATCH] Fix PR61306: improve handling of sign and cast in bswap

2014-06-29 Thread Thomas Preud'homme
> From: Richard Biener [mailto:richard.guent...@gmail.com]
> Sent: Friday, June 27, 2014 4:49 PM
> 
> FIne with me now.

Commited.

Best regards,

Thomas





RE: [PATCH] Fix PR61375: cancel bswap optimization when value doesn't fit in a HOST_WIDE_INT

2014-07-01 Thread Thomas Preud'homme
Ping?

> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
> Sent: Thursday, June 26, 2014 9:11 AM
> To: 'Jakub Jelinek'
> Cc: Richard Biener; GCC Patches
> Subject: RE: [PATCH] Fix PR61375: cancel bswap optimization when value
> doesn't fit in a HOST_WIDE_INT
> 
> Ok, what about the following patch and associated ChangeLog entries?
> 
> 2014-06-24  Thomas Preud'homme  
> 
>   PR tree-optimization/61375
>   * tree-ssa-math-opts.c (find_bswap_or_nop_1): Cancel optimization
> if
>   symbolic number cannot be represented in an unsigned
> HOST_WIDE_INT.
>   (execute_optimize_bswap): Cancel optimization if CHAR_BIT != 8.
> 
> 2014-06-24  Thomas Preud'homme  
> 
>   PR tree-optimization/61375
>   * gcc.c-torture/execute/pr61375-1.c: New test.
> 
> 
> diff --git a/gcc/testsuite/gcc.c-torture/execute/pr61375.c
> b/gcc/testsuite/gcc.c-torture/execute/pr61375.c
> new file mode 100644
> index 000..6fb4693
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/execute/pr61375.c
> @@ -0,0 +1,35 @@
> +#ifdef __UINT64_TYPE__
> +typedef __UINT64_TYPE__ uint64_t;
> +#else
> +typedef unsigned long long uint64_t;
> +#endif
> +
> +#ifndef __SIZEOF_INT128__
> +#define __int128 long long
> +#endif
> +
> +/* Some version of bswap optimization would ICE when analyzing a mask
> constant
> +   too big for an HOST_WIDE_INT (PR61375).  */
> +
> +__attribute__ ((noinline, noclone)) uint64_t
> +uint128_central_bitsi_ior (unsigned __int128 in1, uint64_t in2)
> +{
> +  __int128 mask = (__int128)0x << 56;
> +  return ((in1 & mask) >> 56) | in2;
> +}
> +
> +int
> +main (int argc)
> +{
> +  __int128 in = 1;
> +#ifdef __SIZEOF_INT128__
> +  in <<= 64;
> +#endif
> +  if (sizeof (uint64_t) * __CHAR_BIT__ != 64)
> +return 0;
> +  if (sizeof (unsigned __int128) * __CHAR_BIT__ != 128)
> +return 0;
> +  if (uint128_central_bitsi_ior (in, 2) != 0x102)
> +__builtin_abort ();
> +  return 0;
> +}
> diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
> index 9ff857c..045bf48 100644
> --- a/gcc/tree-ssa-math-opts.c
> +++ b/gcc/tree-ssa-math-opts.c
> @@ -1740,6 +1740,8 @@ find_bswap_1 (gimple stmt, struct
> symbolic_number *n, int limit)
> n->size = TYPE_PRECISION (TREE_TYPE (rhs1));
> if (n->size % BITS_PER_UNIT != 0)
>   return NULL_TREE;
> +   if (n->size > HOST_BITS_PER_WIDEST_INT)
> + return NULL_TREE;
> n->size /= BITS_PER_UNIT;
> n->n = (sizeof (HOST_WIDEST_INT) < 8 ? 0 :
> (unsigned HOST_WIDEST_INT)0x08070605 << 32 |
> 0x04030201);
> @@ -1781,6 +1783,8 @@ find_bswap_1 (gimple stmt, struct
> symbolic_number *n, int limit)
>   type_size = TYPE_PRECISION (gimple_expr_type (stmt));
>   if (type_size % BITS_PER_UNIT != 0)
> return NULL_TREE;
> + if (type_size > (int) HOST_BITS_PER_WIDEST_INT)
> +   return NULL_TREE;
> 
>   if (type_size / BITS_PER_UNIT < (int)(sizeof (HOST_WIDEST_INT)))
> {
> @@ -1911,7 +1915,7 @@ execute_optimize_bswap (void)
>bool changed = false;
>tree bswap16_type = NULL_TREE, bswap32_type = NULL_TREE,
> bswap64_type = NULL_TREE;
> 
> -  if (BITS_PER_UNIT != 8)
> +  if (BITS_PER_UNIT != 8 || CHAR_BIT != 8)
>  return 0;
> 
>if (sizeof (HOST_WIDEST_INT) < 8)
> 
> Is this ok for 4.8 and 4.9 branches?
> 
> Best regards,
> 
> Thomas
> 
> 





[PATCH] Fix confusion between target, host and symbolic number byte sizes

2014-07-03 Thread Thomas Preud'homme
The bswap pass deals with 3 possibly different byte sizes: the host byte size,
the target byte size, and the size a byte marker occupies in the symbolic_number
structure [1]. However, as of now the code mixes the three sizes. This works in
practice as the pass is only enabled for targets with BITS_PER_UNIT == 8 and
nobody runs GCC on a host with CHAR_BIT != 8. As prompted by Jakub Jelinek, this
patch fixes this mess. Byte markers are 8-bit quantities (they could be made
4-bit quantities but I preferred to keep the code working the same as before)
for which a new macro is introduced (BITS_PER_MARKER). Anything related to
storing the value of a byte marker in a variable should check for the host byte
size or wide integer size, and anything aimed at manipulating the target value
should check for BITS_PER_UNIT.


[1] Although the comment for this structure implies that a byte marker has the
same size as a host byte, the way it is used in the code (even before any of
my patches) shows that it uses a fixed size of 8 [2].
[2] Note that since the pass is only active for targets with BITS_PER_UNIT ==
8, it might be using the target byte size.
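
As a rough standalone sketch of the idea (illustration only; the real
definitions and the full detection logic live in tree-ssa-math-opts.c),
for a 4-byte target value the pass ends up comparing the symbolic number
it computed against an identity pattern and a byte-swapped pattern, and
neither comparison depends on the host byte size as long as the markers
fit in a uint64_t:

#include <stdint.h>
#include <stdio.h>

#define BITS_PER_MARKER 8  /* one octet per marker inside n */

/* Patterns for a 4-byte target value: marker i + 1 stands for target
   byte i, 0 stands for a byte known to be zero.  */
static const uint64_t CMPNOP  = 0x04030201;  /* load in target endianness */
static const uint64_t CMPXCHG = 0x01020304;  /* byte-swapped load         */

int
main (void)
{
  /* Suppose the analysis of some bitwise expression over a 4-byte load
     produced this symbolic number.  */
  uint64_t n = 0x01020304;

  if (n == CMPNOP)
    puts ("equivalent to a load in target endianness");
  else if (n == CMPXCHG)
    puts ("equivalent to a bswap of the load");
  else
    puts ("neither");
  return 0;
}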

gcc/ChangeLog:

2014-07-04  Thomas Preud'homme  

* tree-ssa-math-opts.c (struct symbolic_number): Clarify comment about
the size of byte markers.
(do_shift_rotate): Fix confusion between host, target and marker byte
size.
(verify_symbolic_number_p): Likewise.
(find_bswap_or_nop_1): Likewise.
(find_bswap_or_nop): Likewise.


diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
index ca2b30d..55c5df7 100644
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -1602,11 +1602,10 @@ make_pass_cse_sincos (gcc::context *ctxt)
 
 /* A symbolic number is used to detect byte permutation and selection
patterns.  Therefore the field N contains an artificial number
-   consisting of byte size markers:
+   consisting of octet sized markers:
 
-   0- byte has the value 0
-   1..size - byte contains the content of the byte
-   number indexed with that value minus one.
+   0- target byte has the value 0
+   1..size - marker value is the target byte index minus one.
 
To detect permutations on memory sources (arrays and structures), a symbolic
number is also associated a base address (the array or structure the load is
@@ -1631,6 +1630,8 @@ struct symbolic_number {
   unsigned HOST_WIDE_INT range;
 };
 
+#define BITS_PER_MARKER 8
+
 /* The number which the find_bswap_or_nop_1 result should match in
order to have a nop.  The number is masked according to the size of
the symbolic number before using it.  */
@@ -1652,15 +1653,16 @@ do_shift_rotate (enum tree_code code,
 struct symbolic_number *n,
 int count)
 {
-  int bitsize = TYPE_PRECISION (n->type);
+  int size = TYPE_PRECISION (n->type) / BITS_PER_UNIT;
 
-  if (count % 8 != 0)
+  if (count % BITS_PER_UNIT != 0)
 return false;
+  count = (count / BITS_PER_UNIT) * BITS_PER_MARKER;
 
   /* Zero out the extra bits of N in order to avoid them being shifted
  into the significant bits.  */
-  if (bitsize < 8 * (int)sizeof (int64_t))
-n->n &= ((uint64_t)1 << bitsize) - 1;
+  if (size < 64 / BITS_PER_MARKER)
+n->n &= ((uint64_t) 1 << (size * BITS_PER_MARKER)) - 1;
 
   switch (code)
 {
@@ -1670,22 +1672,22 @@ do_shift_rotate (enum tree_code code,
 case RSHIFT_EXPR:
   /* Arithmetic shift of signed type: result is dependent on the value.  */
   if (!TYPE_UNSIGNED (n->type)
- && (n->n & ((uint64_t) 0xff << (bitsize - 8
+ && (n->n & ((uint64_t) 0xff << ((size - 1) * BITS_PER_MARKER
return false;
   n->n >>= count;
   break;
 case LROTATE_EXPR:
-  n->n = (n->n << count) | (n->n >> (bitsize - count));
+  n->n = (n->n << count) | (n->n >> ((size * BITS_PER_MARKER) - count));
   break;
 case RROTATE_EXPR:
-  n->n = (n->n >> count) | (n->n << (bitsize - count));
+  n->n = (n->n >> count) | (n->n << ((size * BITS_PER_MARKER) - count));
   break;
 default:
   return false;
 }
   /* Zero unused bits for size.  */
-  if (bitsize < 8 * (int)sizeof (int64_t))
-n->n &= ((uint64_t)1 << bitsize) - 1;
+  if (size < 64 / BITS_PER_MARKER)
+n->n &= ((uint64_t) 1 << (size * BITS_PER_MARKER)) - 1;
   return true;
 }
 
@@ -1726,13 +1728,13 @@ init_symbolic_number (struct symbolic_number *n, tree 
src)
   if (size % BITS_PER_UNIT != 0)
 return false;
   size /= BITS_PER_UNIT;
-  if (size > (int)sizeof (uint64_t))
+  if (size > 64 / BITS_PER_MARKER)
 return false;
   n->range = size;
   n->n = CMPNOP;
 
-  if (size < (int)sizeof (int64_t))
-n->n &= ((uint64_t)1 << (size * BITS_PE

RE: [PATCH, ping 1] Move insns without introducing new temporaries in loop2_invariant

2015-05-12 Thread Thomas Preud'homme
> From: Jeff Law [mailto:l...@redhat.com]
> Sent: Tuesday, May 12, 2015 4:17 AM
> 
> On 05/06/2015 03:47 AM, Thomas Preud'homme wrote:
> > Ping?
> Something to consider as future work -- I'm pretty sure PRE sets up the
> same kind of problematical pattern with a new pseudo (reaching reg)
> holding the result of the redundant expression and the original
> evaluations turned into copies from the reaching reg to the final
> destination.

Yes absolutely, this is how the pattern I was interested in was created. The
reason I solved it in loop-invariant is that I thought this was on purpose
with the cleanup left to loop-invariant. When I found a TODO comment
about this in loop-invariant, I thought it confirmed my initial thoughts.

> 
> That style is easy to prove correct.  There was an issue with the copies
> not propagating away that was pretty inherent in the partial redundancy
> cases that I could probably dig out of my archives if you're interested.

If you think this should also (or instead) be fixed in PRE I can take a look
at some point later since it shouldn't be much more work.

> It looks like there's a variety of line wrapping issues.  Please
> double-check line wrapping using an 80 column window.  Minor I know,
> but
> the consistency with the rest of the code is good.

Looking in vim, the lines seem to be systematically cut at 80 columns, and
check_GNU_style.sh only complains about the dg-final line in the
new testcases. Could you point me to such an occurrence?

> 
> >>
> >> +
> >> +  /* Check that all uses reached by the def in insn would still be
> reached
> >> + it.  */
> >> +  dest_regno = REGNO (reg);
> >> +  for (use = DF_REG_USE_CHAIN (dest_regno); use; use =
> >> DF_REF_NEXT_REG (use))
> [ ... ]
> So isn't this overly conservative if DEST_REGNO is set multiple times
> since it's going to look at all the uses, even those not necessarily
> reached by the original SET of DEST_REGNO?
> 
> Or is that not an issue for some reason?  And I'm not requiring you to
> make this optimal, but if I'm right, a comment here seems wise.

My apologies, it is the comment that is incorrect since it doesn't match
the code (a remnant of an old version of this patch). The code actually
checks that the use was dominated by the instruction before it is moved
out of the loop. This is to prevent the code motion in cases like:

foo = 1;
bar = 0;
for ()
  {
bar += foo;
foo = 42;
  }

which I met in some of the testsuite cases.

> 
> 
> I think with the wrapping nits fixed and closure on the multi-set issue
> noted immediately above and this will be good for the trunk.

I'll fix this comment right away.

Best regards,

Thomas





RE: [PATCH, ping 1] Move insns without introducing new temporaries in loop2_invariant

2015-05-12 Thread Thomas Preud'homme
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
> 
> > From: Jeff Law [mailto:l...@redhat.com]
> > Sent: Tuesday, May 12, 2015 4:17 AM
> >
> > >>
> > >> +
> > >> +  /* Check that all uses reached by the def in insn would still be
> > reached
> > >> + it.  */
> > >> +  dest_regno = REGNO (reg);
> > >> +  for (use = DF_REG_USE_CHAIN (dest_regno); use; use =
> > >> DF_REF_NEXT_REG (use))
> > [ ... ]
> > So isn't this overly conservative if DEST_REGNO is set multiple times
> > since it's going to look at all the uses, even those not necessarily
> > reached by the original SET of DEST_REGNO?
> >
> > Or is that not an issue for some reason?  And I'm not requiring you to
> > make this optimal, but if I'm right, a comment here seems wise.
> 
> My apologize, it is the comment that is incorrect since it doesn't match
> the code (a remaining of an old version of this patch). The code actually
> checks that the use was dominated by the instruction before it is moved
> out of the loop.

> >
> >
> > I think with the wrapping nits fixed and closure on the multi-set issue
> > noted immediately above and this will be good for the trunk.
> 
> I'll fix this comment right away.

Please find below a patch with the comment fixed.


*** gcc/ChangeLog ***

2015-05-12  Thomas Preud'homme  

* loop-invariant.c (can_move_invariant_reg): New.
(move_invariant_reg): Call above new function to decide whether
instruction can just be moved, skipping creation of temporary
register.

*** gcc/testsuite/ChangeLog ***

2015-05-12  Thomas Preud'homme  

* gcc.dg/loop-8.c: New test.
* gcc.dg/loop-9.c: New test.


diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
index e3b560d..76a009f 100644
--- a/gcc/loop-invariant.c
+++ b/gcc/loop-invariant.c
@@ -1511,6 +1511,79 @@ replace_uses (struct invariant *inv, rtx reg, bool 
in_group)
   return 1;
 }
 
+/* Whether invariant INV setting REG can be moved out of LOOP, at the end of
+   the block preceding its header.  */
+
+static bool
+can_move_invariant_reg (struct loop *loop, struct invariant *inv, rtx reg)
+{
+  df_ref def, use;
+  unsigned int dest_regno, defs_in_loop_count = 0;
+  rtx_insn *insn = inv->insn;
+  basic_block bb = BLOCK_FOR_INSN (inv->insn);
+
+  /* We ignore hard register and memory access for cost and complexity reasons.
+ Hard register are few at this stage and expensive to consider as they
+ require building a separate data flow.  Memory access would require using
+ df_simulate_* and can_move_insns_across functions and is more complex.  */
+  if (!REG_P (reg) || HARD_REGISTER_P (reg))
+return false;
+
+  /* Check whether the set is always executed.  We could omit this condition if
+ we know that the register is unused outside of the loop, but it does not
+ seem worth finding out.  */
+  if (!inv->always_executed)
+return false;
+
+  /* Check that all uses that would be dominated by def are already dominated
+ by it.  */
+  dest_regno = REGNO (reg);
+  for (use = DF_REG_USE_CHAIN (dest_regno); use; use = DF_REF_NEXT_REG (use))
+{
+  rtx_insn *use_insn;
+  basic_block use_bb;
+
+  use_insn = DF_REF_INSN (use);
+  use_bb = BLOCK_FOR_INSN (use_insn);
+
+  /* Ignore instruction considered for moving.  */
+  if (use_insn == insn)
+   continue;
+
+  /* Don't consider uses outside loop.  */
+  if (!flow_bb_inside_loop_p (loop, use_bb))
+   continue;
+
+  /* Don't move if a use is not dominated by def in insn.  */
+  if (use_bb == bb && DF_INSN_LUID (insn) >= DF_INSN_LUID (use_insn))
+   return false;
+  if (!dominated_by_p (CDI_DOMINATORS, use_bb, bb))
+   return false;
+}
+
+  /* Check for other defs.  Any other def in the loop might reach a use
+ currently reached by the def in insn.  */
+  for (def = DF_REG_DEF_CHAIN (dest_regno); def; def = DF_REF_NEXT_REG (def))
+{
+  basic_block def_bb = DF_REF_BB (def);
+
+  /* Defs in exit block cannot reach a use they weren't already.  */
+  if (single_succ_p (def_bb))
+   {
+ basic_block def_bb_succ;
+
+ def_bb_succ = single_succ (def_bb);
+ if (!flow_bb_inside_loop_p (loop, def_bb_succ))
+   continue;
+   }
+
+  if (++defs_in_loop_count > 1)
+   return false;
+}
+
+  return true;
+}
+
 /* Move invariant INVNO out of the LOOP.  Returns true if this succeeds, false
otherwise.  */
 
@@ -1544,11 +1617,8 @@ move_invariant_reg (struct loop *loop, unsigned invno)
}
}
 
-  /* Move the set out of the loop.  If the set is always executed (we could

RE: [PATCH, ping 1] Move insns without introducing new temporaries in loop2_invariant

2015-05-12 Thread Thomas Preud'homme
> From: Jeff Law [mailto:l...@redhat.com]
> Sent: Wednesday, May 13, 2015 4:05 AM
> OK for the trunk.
> 
> Thanks for your patience,

Thanks. Committed with the added "PR rtl-optimization/64616" to both
ChangeLog entries.

Best regards,

Thomas




[PATCH] Fix PR66168: ICE due to incorrect invariant register info

2015-05-19 Thread Thomas Preud'homme
Hi,

r223113 made it possible for an invariant to actually be moved rather than
having its source moved to a new pseudoregister. However, when doing so,
inv->reg is not set up properly: in the case of a subreg destination it
holds the inner register rather than the subreg expression.

This patch fixes that.


ChangeLog entries are as follow:

*** gcc/ChangeLog ***

2015-05-18  Thomas Preud'homme  

PR rtl-optimization/66168
* loop-invariant.c (move_invariant_reg): Set inv->reg to destination
of inv->insn when moving an invariant without introducing a temporary
register.

*** gcc/testsuite/ChangeLog ***

2015-05-18  Thomas Preud'homme  

PR rtl-optimization/66168
* gcc.c-torture/compile/pr66168.c: New test.


diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
index 76a009f..30e1945 100644
--- a/gcc/loop-invariant.c
+++ b/gcc/loop-invariant.c
@@ -1642,9 +1642,13 @@ move_invariant_reg (struct loop *loop, unsigned invno)
 
  emit_insn_after (gen_move_insn (dest, reg), inv->insn);
}
-  else if (dump_file)
-   fprintf (dump_file, "Invariant %d moved without introducing a new "
-   "temporary register\n", invno);
+  else
+   {
+ reg = SET_DEST (set);
+ if (dump_file)
+   fprintf (dump_file, "Invariant %d moved without introducing a new "
+   "temporary register\n", invno);
+   }
   reorder_insns (inv->insn, inv->insn, BB_END (preheader));
 
   /* If there is a REG_EQUAL note on the insn we just moved, and the
diff --git a/gcc/testsuite/gcc.c-torture/compile/pr66168.c 
b/gcc/testsuite/gcc.c-torture/compile/pr66168.c
new file mode 100644
index 000..d6bfc7b
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr66168.c
@@ -0,0 +1,15 @@
+int a, b;
+
+void
+fn1 ()
+{
+  for (;;)
+{
+  for (b = 0; b < 3; b++)
+   {
+ char e[2];
+ char f = e[1];
+ a ^= f ? 1 / f : 0;
+   }
+}
+}


Tested by bootstrapping on x86_64-linux-gnu and building an arm-none-eabi
cross-compiler. Testsuite run shows no regression for both of them.

Ok for trunk?

Best regards,

Thomas




RE: [PATCH] Fix PR66168: ICE due to incorrect invariant register info

2015-05-20 Thread Thomas Preud'homme
> From: Steven Bosscher [mailto:stevenb@gmail.com]
> Sent: Tuesday, May 19, 2015 7:21 PM
> 
> Not OK.
> This will break in move_invariants() when it looks at REGNO (inv->reg).

Indeed. I'm even surprised all tests passed. Ok I will just prevent moving
in such a case. I'm running the tests now and will get back to you tomorrow.

Best regards,

Thomas




RE: [PATCH] Fix PR66168: ICE due to incorrect invariant register info

2015-05-20 Thread Thomas Preud'homme
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
> > From: Steven Bosscher [mailto:stevenb@gmail.com]
> > Sent: Tuesday, May 19, 2015 7:21 PM
> >
> > Not OK.
> > This will break in move_invariants() when it looks at REGNO (inv->reg).
> 
> Indeed. I'm even surprised all tests passed. Ok I will just prevent moving
> in such a case. I'm running the tests now and will get back to you
> tomorrow.

The patch is now tested via bootstrap + testsuite run on x86_64-linux-gnu and
by building an arm-none-eabi cross-compiler + testsuite run. Both testsuite
runs show no regressions.

diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
index 76a009f..4ce3576 100644
--- a/gcc/loop-invariant.c
+++ b/gcc/loop-invariant.c
@@ -1626,7 +1626,7 @@ move_invariant_reg (struct loop *loop, unsigned invno)
   if (REG_P (reg))
regno = REGNO (reg);
 
-  if (!can_move_invariant_reg (loop, inv, reg))
+  if (!can_move_invariant_reg (loop, inv, dest))
{
  reg = gen_reg_rtx_and_attrs (dest);
 
diff --git a/gcc/testsuite/gcc.c-torture/compile/pr66168.c 
b/gcc/testsuite/gcc.c-torture/compile/pr66168.c
new file mode 100644
index 000..d6bfc7b
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr66168.c
@@ -0,0 +1,15 @@
+int a, b;
+
+void
+fn1 ()
+{
+  for (;;)
+{
+  for (b = 0; b < 3; b++)
+   {
+ char e[2];
+ char f = e[1];
+ a ^= f ? 1 / f : 0;
+   }
+}
+}

Ok for trunk?

Best regards,

Thomas 





RE: [PATCH] Fix PR66168: ICE due to incorrect invariant register info

2015-05-24 Thread Thomas Preud'homme
> From: Jeff Law [mailto:l...@redhat.com]
> Sent: Saturday, May 23, 2015 6:54 AM
> >
> > -  if (!can_move_invariant_reg (loop, inv, reg))
> > +  if (!can_move_invariant_reg (loop, inv, dest))
> Won't this run into into the same problem if DEST is a SUBREG?

One of the very first tests in can_move_invariant_reg is:

if (!REG_P (reg) || HARD_REGISTER_P (reg))
  return false;

So in the case of a subreg the insn will not be moved, which executes the same
code as before my patch. It would be nicer if it could work with subregs of
course, but this makes for a much smaller and safer patch.

Best regards,

Thomas




RE: [PATCH 2/3, ARM, libgcc, ping7] Code size optimization for the fmul/fdiv and dmul/ddiv function in libgcc

2015-05-26 Thread Thomas Preud'homme
Ping?

> -Original Message-
> From: Thomas Preud'homme [mailto:thomas.preudho...@arm.com]
> Sent: Thursday, April 30, 2015 3:19 PM
> To: Thomas Preud'homme; Richard Earnshaw; 'gcc-patches@gcc.gnu.org';
> Marcus Shawcroft; Ramana Radhakrishnan
> (ramana.radhakrish...@arm.com)
> Subject: RE: [PATCH 2/3, ARM, libgcc, ping6] Code size optimization for
> the fmul/fdiv and dmul/ddiv function in libgcc
> 
> Here is an updated patch that prefix local symbols with __ for more
> safety.
> They appear in the symtab as local so it is not strictly necessary but one is
> never too cautious. Being local, they also do not generate any PLT entry.
> They appear only because the jumps are from one section to another
> (which is the whole purpose of this patch) and thus need a static
> relocation.
> 
> I hope this revised version addresses all your concerns.
> 
> ChangeLog entry is unchanged:
> 
> *** gcc/libgcc/ChangeLog ***
> 
> 2015-04-30   Tony Wang 
> 
> * config/arm/ieee754-sf.S: Expose symbols around fragment
> boundaries as function symbols.
> * config/arm/ieee754-df.S: Same with above
> 
> diff --git a/libgcc/config/arm/ieee754-df.S b/libgcc/config/arm/ieee754-
> df.S
> index c1468dc..39b0028 100644
> --- a/libgcc/config/arm/ieee754-df.S
> +++ b/libgcc/config/arm/ieee754-df.S
> @@ -559,7 +559,7 @@ ARM_FUNC_ALIAS aeabi_l2d floatdidf
> 
>  #ifdef L_arm_muldivdf3
> 
> -ARM_FUNC_START muldf3
> +ARM_FUNC_START muldf3, function_section
>  ARM_FUNC_ALIAS aeabi_dmul muldf3
>   do_push {r4, r5, r6, lr}
> 
> @@ -571,7 +571,7 @@ ARM_FUNC_ALIAS aeabi_dmul muldf3
>   COND(and,s,ne)  r5, ip, yh, lsr #20
>   teqne   r4, ip
>   teqne   r5, ip
> - bleqLSYM(Lml_s)
> + bleq__Lml_s
> 
>   @ Add exponents together
>   add r4, r4, r5
> @@ -689,7 +689,7 @@ ARM_FUNC_ALIAS aeabi_dmul muldf3
>   subsip, r4, #(254 - 1)
>   do_it   hi
>   cmphi   ip, #0x700
> - bhi LSYM(Lml_u)
> + bhi __Lml_u
> 
>   @ Round the result, merge final exponent.
>   cmp lr, #0x8000
> @@ -716,9 +716,12 @@ LSYM(Lml_1):
>   mov lr, #0
>   subsr4, r4, #1
> 
> -LSYM(Lml_u):
> + FUNC_END aeabi_dmul
> + FUNC_END muldf3
> +
> +ARM_SYM_START __Lml_u
>   @ Overflow?
> - bgt LSYM(Lml_o)
> + bgt __Lml_o
> 
>   @ Check if denormalized result is possible, otherwise return
> signed 0.
>   cmn r4, #(53 + 1)
> @@ -778,10 +781,11 @@ LSYM(Lml_u):
>   do_it   eq
>   biceq   xl, xl, r3, lsr #31
>   RETLDM  "r4, r5, r6"
> + SYM_END __Lml_u
> 
>   @ One or both arguments are denormalized.
>   @ Scale them leftwards and preserve sign bit.
> -LSYM(Lml_d):
> +ARM_SYM_START __Lml_d
>   teq r4, #0
>   bne 2f
>   and r6, xh, #0x8000
> @@ -804,8 +808,9 @@ LSYM(Lml_d):
>   beq 3b
>   orr yh, yh, r6
>   RET
> + SYM_END __Lml_d
> 
> -LSYM(Lml_s):
> +ARM_SYM_START __Lml_s
>   @ Isolate the INF and NAN cases away
>   teq r4, ip
>   and r5, ip, yh, lsr #20
> @@ -817,10 +822,11 @@ LSYM(Lml_s):
>   orrsr6, xl, xh, lsl #1
>   do_it   ne
>   COND(orr,s,ne)  r6, yl, yh, lsl #1
> - bne LSYM(Lml_d)
> + bne __Lml_d
> + SYM_END __Lml_s
> 
>   @ Result is 0, but determine sign anyway.
> -LSYM(Lml_z):
> +ARM_SYM_START __Lml_z
>   eor xh, xh, yh
>   and xh, xh, #0x8000
>   mov xl, #0
> @@ -832,41 +838,42 @@ LSYM(Lml_z):
>   moveq   xl, yl
>   moveq   xh, yh
>   COND(orr,s,ne)  r6, yl, yh, lsl #1
> - beq LSYM(Lml_n) @ 0 * INF or INF * 0 -> NAN
> + beq __Lml_n @ 0 * INF or INF * 0 -> NAN
>   teq r4, ip
>   bne 1f
>   orrsr6, xl, xh, lsl #12
> - bne LSYM(Lml_n) @ NAN *  -> NAN
> + bne __Lml_n @ NAN *  -> NAN
>  1:   teq r5, ip
> - bne LSYM(Lml_i)
> + bne __Lml_i
>   orrsr6, yl, yh, lsl #12
>   do_it   ne, t
>   movne   xl, yl
>   movne   xh, yh
> - bne LSYM(Lml_n) @  * NAN -> NAN
> + bne __Lml_n @  * NAN -> NAN
> + SYM_END __Lml_z
> 
>   @ Result is INF, but we need to determine its sign.
> -LSYM(Lml_i):
> +ARM_SYM_START __Lml_i
>   eor xh, xh, yh
> + SYM_END __Lml_i
> 
>   @ Overflow: return INF (sign already in xh).
> -LSYM(Lml_o):
> +ARM_SYM_START __Lml_o
>   and xh, xh, #0x8000

RE: [PATCH] Fix PR66168: ICE due to incorrect invariant register info

2015-05-27 Thread Thomas Preud'homme
> From: Jeff Law [mailto:l...@redhat.com]
> Sent: Wednesday, May 27, 2015 11:24 PM
> Ah, OK.  I was looking at the code prior to the call for
> can_move_invariant_reg in move_invariant_reg which implies that DEST
> can
> be a subreg, but REG can not.
> 
> But with that check in can_move_invariant_reg obviously won't matter.
> It feels like we've likely got some dead code here, but that can be a
> follow-up if you want to pursue.

Are you referring to the subreg code? It's used at the end of the function:

inv->reg = reg;
inv->orig_regno = regno;

> 
> OK for the trunk.

Thanks, committed.

Best regards,

Thomas




RE: [PATCH, ping1] Fix removing of df problem in df_finish_pass

2015-04-20 Thread Thomas Preud'homme
Ping?

> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
> Sent: Tuesday, March 03, 2015 12:02 PM
> To: 'Bernhard Reutner-Fischer'; gcc-patches@gcc.gnu.org; 'Paolo Bonzini';
> 'Seongbae Park'; 'Kenneth Zadeck'
> Subject: RE: [PATCH] Fix removing of df problem in df_finish_pass
> 
> > From: Bernhard Reutner-Fischer [mailto:rep.dot@gmail.com]
> > Sent: Saturday, February 28, 2015 4:00 AM
> > >   use df_remove_problem rather than manually removing problems,
> > living
> >
> > leaving
> 
> Indeed. Please find updated changelog below:
> 
> 2015-03-03  Thomas Preud'homme  
> 
>   * df-core.c (df_finish_pass): Iterate over df-
> >problems_by_index[] and
>   use df_remove_problem rather than manually removing
> problems, leaving
>   holes in df->problems_in_order[].
> 
> Best regards,
> 
> Thomas
> 
> 
> 
> 





RE: [PATCH, ping1] Fix removing of df problem in df_finish_pass

2015-04-21 Thread Thomas Preud'homme
Committed. I'll wait a week and then ask for approval for a backport to 5.1.1 
once 5.1 is released.

Best regards,

Thomas

> -Original Message-
> From: Kenneth Zadeck [mailto:zad...@naturalbridge.com]
> Sent: Monday, April 20, 2015 9:26 PM
> To: Thomas Preud'homme; 'Bernhard Reutner-Fischer'; gcc-
> patc...@gcc.gnu.org; 'Paolo Bonzini'; 'Seongbae Park'
> Subject: Re: [PATCH, ping1] Fix removing of df problem in df_finish_pass
> 
> As a dataflow maintainer, I approve this patch for the next release.
> However, you will have to get approval of a release manager to get it
> into 5.0.
> 
> 
> 
> On 04/20/2015 04:22 AM, Thomas Preud'homme wrote:
> > Ping?
> >
> >> -Original Message-----
> >> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> >> ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
> >> Sent: Tuesday, March 03, 2015 12:02 PM
> >> To: 'Bernhard Reutner-Fischer'; gcc-patches@gcc.gnu.org; 'Paolo
> Bonzini';
> >> 'Seongbae Park'; 'Kenneth Zadeck'
> >> Subject: RE: [PATCH] Fix removing of df problem in df_finish_pass
> >>
> >>> From: Bernhard Reutner-Fischer [mailto:rep.dot@gmail.com]
> >>> Sent: Saturday, February 28, 2015 4:00 AM
> >>>>use df_remove_problem rather than manually removing
> problems,
> >>> living
> >>>
> >>> leaving
> >> Indeed. Please find updated changelog below:
> >>
> >> 2015-03-03  Thomas Preud'homme
> 
> >>
> >>* df-core.c (df_finish_pass): Iterate over df-
> >>> problems_by_index[] and
> >>use df_remove_problem rather than manually removing
> >> problems, leaving
> >>holes in df->problems_in_order[].
> >>
> >> Best regards,
> >>
> >> Thomas
> >>
> >>
> >>
> >>
> >
> >





RE: [PATCH, GCC, stage1] Fallback to copy-prop if constant-prop not possible

2015-04-23 Thread Thomas Preud'homme
> From: Jeff Law [mailto:l...@redhat.com]
> Sent: Friday, April 24, 2015 10:59 AM
> 

Hi Jeff,

> > +
> > +static bool
> > +cprop_reg_p (const_rtx x)
> > +{
> > +  return REG_P (x) && !HARD_REGISTER_P (x);
> > +}
> How about instead this move to a more visible location (perhaps a macro
> in regs.h or an inline function).  Then as a followup, change the
> various places that have this sequence to use that common definition
> that exist outside of cprop.c.

According to Steven this was proposed in the past but was refused (see
end of [1]).

[1] https://gcc.gnu.org/ml/gcc-patches/2015-03/msg01066.html

> 
> > @@ -1191,7 +1192,7 @@ do_local_cprop (rtx x, rtx_insn *insn)
> > /* Rule out USE instructions and ASM statements as we don't want
> to
> >change the hard registers mentioned.  */
> > if (REG_P (x)
> > -  && (REGNO (x) >= FIRST_PSEUDO_REGISTER
> > +  && (cprop_reg_p (x)
> > || (GET_CODE (PATTERN (insn)) != USE
> >   && asm_noperands (PATTERN (insn)) < 0)))
> Isn't the REG_P test now redundant?

I made the same mistake when reviewing that change and indeed it's not.
Note the opening parenthesis before cprop_reg_p that contains a bitwise
OR expression. So in the case where cprop_reg_p is false, REG_P still
needs to be true.
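
With the grouping made explicit (whitespace only, this is the same condition
as in the hunk quoted above):

  if (REG_P (x)
      && (cprop_reg_p (x)
          || (GET_CODE (PATTERN (insn)) != USE
              && asm_noperands (PATTERN (insn)) < 0)))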

We could keep a check on FIRST_PSEUDO_REGISTER but the intent (checking
that the register is suitable for propagation) is clearer now, as Steven
pointed out to me.

> 
> OK for the trunk with those changes.
> 
> jeff

Given the above I intend to keep the REG_P in the second excerpt and will
wait for your input about moving cprop_reg_p to rtl.h.

Best regards,

Thomas




RE: [PATCH, GCC, stage1] Fallback to copy-prop if constant-prop not possible

2015-04-23 Thread Thomas Preud'homme
> From: Jeff Law [mailto:l...@redhat.com]
> Sent: Friday, April 24, 2015 11:15 AM
> 
> So revised review is "ok for the trunk" :-)

Committed.

Best regards,

Thomas





[PATCH, ARM, regression] Fix ternary operator in arm/unknown-elf.h

2015-04-23 Thread Thomas Preud'homme
I just committed the obvious fix below that fix build failure introduced by
revision 222371.


*** gcc/ChangeLog ***

2015-04-24  Thomas Preud'homme  

* config/arm/unknown-elf.h (ASM_OUTPUT_ALIGNED_DECL_LOCAL): fix
ternary operator in fprintf and harmonize spacing.


diff --git a/gcc/config/arm/unknown-elf.h b/gcc/config/arm/unknown-elf.h
index df0b9ce..2e5ab7e 100644
--- a/gcc/config/arm/unknown-elf.h
+++ b/gcc/config/arm/unknown-elf.h
@@ -80,9 +80,9 @@
\
   ASM_OUTPUT_ALIGN (FILE, floor_log2 (ALIGN / BITS_PER_UNIT)); \
   ASM_OUTPUT_LABEL (FILE, NAME);   \
-  fprintf (FILE, "\t.space\t%d\n", SIZE ? (int)(SIZE) : 1);
\
+  fprintf (FILE, "\t.space\t%d\n", SIZE ? (int) SIZE : 1); \
   fprintf (FILE, "\t.size\t%s, %d\n",  \
-  NAME, SIZE ? (int) SIZE, 1); \
+  NAME, SIZE ? (int) SIZE : 1);\
 }  \
   while (0)

Best regards,

Thomas




RE: [PATCH 1/2, combine] Try REG_EQUAL for nonzero_bits

2015-04-24 Thread Thomas Preud'homme
Hi,

First of all, sorry for the delay. We quickly entered stage 4 and I thought
it was best to wait for stage 1 to update you on this.

> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
> 
> Of course both approaches are not exclusive. I'll try to test with *both*
> rs6000 bootstrap and with a cross-compiler for one of these targets.

I did two experiments where I checked the impact of removing the code
guarded by SHORT_IMMEDIATES_SIGN_EXTEND. In the first one I removed
the code in both rtlanal.c and combine.c. In the second, I only removed the
code from combine.c (in both occurrences). In both cases powerpc
bootstrap succeeded.

I then used these two compilers to compile the same GCC source (actually
the source with all the code guarded by the macro removed). I compared the
output of objdump on the resulting g++ binaries and found that in both
cases the output was different from the one produced without any
modification. Both diffs look like:

Disassembly of section .init:
@@ -1359,7 +1359,7 @@ Disassembly of section .text:
 10003a94:  f8 21 ff 81 stdur1,-128(r1)
 10003a98:  eb e4 00 00 ld  r31,0(r4)
 10003a9c:  3c 82 ff f8 addis   r4,r2,-8
-10003aa0:  38 84 d7 60 addir4,r4,-10400
+10003aa0:  38 84 d7 70 addir4,r4,-10384
 10003aa4:  7f e3 fb 78 mr  r3,r31
 10003aa8:  4b ff f0 d9 bl  10002b80 
<003d.plt_call.strcmp@@GLIBC_2.3+0>
 10003aac:  e8 41 00 28 ld  r2,40(r1)
@@ -1371,7 +1371,7 @@ Disassembly of section .text:
 10003ac4:  79 2a ff e3 rldicl. r10,r9,63,63
 10003ac8:  41 82 00 78 beq-10003b40 
<._ZL22sanitize_spec_functioniPPKc+0xc0>
 10003acc:  3c 62 ff f8 addis   r3,r2,-8
-10003ad0:  38 63 f5 70 addir3,r3,-2704
+10003ad0:  38 63 f5 b0 addir3,r3,-2640
 10003ad4:  38 21 00 80 addir1,r1,128
 10003ad8:  e8 01 00 10 ld  r0,16(r1)
 10003adc:  eb e1 ff f8 ld  r31,-8(r1)

(This diff compares g++ built by the GCC with partial removal of the code
guarded by the macro against g++ built by an unmodified GCC.)

I may have made a mistake when doing the experiment though and can
do it again if you wish.

Best regards,

Thomas




RE: [PATCH, combine] Try REG_EQUAL for nonzero_bits

2015-04-27 Thread Thomas Preud'homme
> From: Jeff Law [mailto:l...@redhat.com]
> Sent: Saturday, April 25, 2015 3:00 AM
> Do you have a testcase where this change can result in better generated
> code.  If so please add that testcase.  It's OK if it's ARM specific.

Hi Jeff,

Last time I tried I couldn't reduce the code to a small testcase but, if I
remember correctly, it was mostly due to the problem of finding a good test
for creduce (zero extension is not unique enough). I'll try again with a
more manual approach and get back to you.

Best regards,

Thomas





RE: [PATCH 1/2, combine] Try REG_EQUAL for nonzero_bits

2015-04-27 Thread Thomas Preud'homme
> From: Jeff Law [mailto:l...@redhat.com]
> Sent: Saturday, April 25, 2015 2:57 AM

> > +static rtx
> > +sign_extend_short_imm (rtx src, machine_mode mode, unsigned int
> prec)
> > +{
> > +  if (GET_MODE_PRECISION (mode) < prec && CONST_INT_P (src)
> > +  && INTVAL (src) > 0 && val_signbit_known_set_p (mode, INTVAL
> (src)))
> > +src = GEN_INT (INTVAL (src) | ~GET_MODE_MASK (mode));
> Can you go ahead and put each condition of the && on a separate line.
> It uses more vertical space, but IMHO makes this easier to read.As I
> said, it was a nit :-)

You're perfectly right. Anything that can improve readability of source code
is a good thing.

> 
> OK with that fix.

Committed.

Best regards,

Thomas





RE: [PATCH, Aarch64] Add FMA steering pass for Cortex-A57

2015-04-28 Thread Thomas Preud'homme
> From: Marcus Shawcroft [mailto:marcus.shawcr...@gmail.com]
> Sent: Thursday, February 05, 2015 5:17 PM
> >
> > *** gcc/ChangeLog ***
> >
> > 2015-01-26 Thomas Preud'homme thomas.preudho...@arm.com
> >
> > * config.gcc: Add cortex-a57-fma-steering.o to extra_objs for
> > aarch64-*-*.
> > * config/aarch64/t-aarch64: Add a rule for cortex-a57-fma-steering.o.
> > * config/aarch64/aarch64.h
> (AARCH64_FL_USE_FMA_STEERING_PASS): Define.
> > (AARCH64_TUNE_FMA_STEERING): Likewise.
> > * config/aarch64/aarch64-cores.def: Set
> > AARCH64_FL_USE_FMA_STEERING_PASS for cores with dynamic
> steering of
> > FMUL/FMADD instructions.
> > * config/aarch64/aarch64.c (aarch64_register_fma_steering): Declare.
> > (aarch64_override_options): Include cortex-a57-fma-steering.h.  Call
> > aarch64_register_fma_steering () if
> AARCH64_TUNE_FMA_STEERING is true.
> > * config/aarch64/cortex-a57-fma-steering.h: New file.
> > * config/aarch64/cortex-a57-fma-steering.c: Likewise.
> 
> OK but wait for stage-1 to open for general development before you
> commit it please.

Done after rebasing it (context line change in aarch64.c due to new header
include and adaptation to new signature of AARCH64_CORE macro in
aarch64-cores.def).

Committed patch below:

diff --git a/gcc/config.gcc b/gcc/config.gcc
index a1df043..9fec1e8 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -302,7 +302,7 @@ m32c*-*-*)
 aarch64*-*-*)
cpu_type=aarch64
extra_headers="arm_neon.h arm_acle.h"
-   extra_objs="aarch64-builtins.o aarch-common.o"
+   extra_objs="aarch64-builtins.o aarch-common.o cortex-a57-fma-steering.o"
target_gtfiles="\$(srcdir)/config/aarch64/aarch64-builtins.c"
target_has_targetm_common=yes
;;
diff --git a/gcc/config/aarch64/aarch64-cores.def 
b/gcc/config/aarch64/aarch64-cores.def
index 7c285ba..dfc9cc8 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -40,7 +40,7 @@
 /* V8 Architecture Processors.  */
 
 AARCH64_CORE("cortex-a53",  cortexa53, cortexa53, 8,  AARCH64_FL_FOR_ARCH8 | 
AARCH64_FL_CRC, cortexa53, "0x41", "0xd03")
-AARCH64_CORE("cortex-a57",  cortexa57, cortexa57, 8,  AARCH64_FL_FOR_ARCH8 | 
AARCH64_FL_CRC, cortexa57, "0x41", "0xd07")
+AARCH64_CORE("cortex-a57",  cortexa57, cortexa57, 8,  AARCH64_FL_FOR_ARCH8 | 
AARCH64_FL_CRC | AARCH64_FL_USE_FMA_STEERING_PASS, cortexa57, "0x41", "0xd07")
 AARCH64_CORE("cortex-a72",  cortexa72, cortexa57, 8,  AARCH64_FL_FOR_ARCH8 | 
AARCH64_FL_CRC, cortexa57, "0x41", "0xd08")
 AARCH64_CORE("exynos-m1",   exynosm1,  cortexa57, 8,  AARCH64_FL_FOR_ARCH8 | 
AARCH64_FL_CRC | AARCH64_FL_CRYPTO, cortexa57, "0x53", "0x001")
 AARCH64_CORE("thunderx",thunderx,  thunderx,  8,  AARCH64_FL_FOR_ARCH8 | 
AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx,  "0x43", "0x0a1")
@@ -48,5 +48,5 @@ AARCH64_CORE("xgene1",  xgene1,xgene1,8,  
AARCH64_FL_FOR_ARCH8, xgen
 
 /* V8 big.LITTLE implementations.  */
 
-AARCH64_CORE("cortex-a57.cortex-a53",  cortexa57cortexa53, cortexa53, 8,  
AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57, "0x41", "0xd07.0xd03")
+AARCH64_CORE("cortex-a57.cortex-a53",  cortexa57cortexa53, cortexa53, 8,  
AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_USE_FMA_STEERING_PASS, 
cortexa57, "0x41", "0xd07.0xd03")
 AARCH64_CORE("cortex-a72.cortex-a53",  cortexa72cortexa53, cortexa53, 8,  
AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57, "0x41", "0xd08.0xd03")
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 1f7187b..3fd1b3f 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -200,6 +200,8 @@ extern unsigned aarch64_architecture_version;
 #define AARCH64_FL_CRYPTO (1 << 2) /* Has crypto.  */
 #define AARCH64_FL_SLOWMUL(1 << 3) /* A slow multiply core.  */
 #define AARCH64_FL_CRC(1 << 4) /* Has CRC.  */
+/* Has static dispatch of FMA.  */
+#define AARCH64_FL_USE_FMA_STEERING_PASS (1 << 5)
 
 /* Has FP and SIMD.  */
 #define AARCH64_FL_FPSIMD (AARCH64_FL_FP | AARCH64_FL_SIMD)
@@ -220,6 +222,8 @@ extern unsigned long aarch64_isa_flags;
 /* Macros to test tuning flags.  */
 extern unsigned long aarch64_tune_flags;
 #define AARCH64_TUNE_SLOWMUL   (aarch64_tune_flags & AARCH64_FL_SLOWMUL)
+#define AARCH64_TUNE_FMA_STEERING \
+  (aarch64_tune_flags & AARCH64_FL_USE_FMA_STEERING_PASS)
 
 /* Crypto is an optional extension to AdvSIMD.  */
 #define TARGET_CRYPTO (TARGET_SIMD &

RE: [PATCH 2/3, ARM, libgcc, ping6] Code size optimization for the fmul/fdiv and dmul/ddiv function in libgcc

2015-04-30 Thread Thomas Preud'homme
Here is an updated patch that prefixes local symbols with __ for more safety.
They appear in the symtab as local so it is not strictly necessary but one is
never too cautious. Being local, they also do not generate any PLT entry.
They appear only because the jumps are from one section to another
(which is the whole purpose of this patch) and thus need a static relocation.

I hope this revised version addresses all your concerns.

ChangeLog entry is unchanged:

*** gcc/libgcc/ChangeLog ***

2015-04-30   Tony Wang 

* config/arm/ieee754-sf.S: Expose symbols around fragment boundaries as 
function symbols.
* config/arm/ieee754-df.S: Same with above

diff --git a/libgcc/config/arm/ieee754-df.S b/libgcc/config/arm/ieee754-df.S
index c1468dc..39b0028 100644
--- a/libgcc/config/arm/ieee754-df.S
+++ b/libgcc/config/arm/ieee754-df.S
@@ -559,7 +559,7 @@ ARM_FUNC_ALIAS aeabi_l2d floatdidf
 
 #ifdef L_arm_muldivdf3
 
-ARM_FUNC_START muldf3
+ARM_FUNC_START muldf3, function_section
 ARM_FUNC_ALIAS aeabi_dmul muldf3
do_push {r4, r5, r6, lr}
 
@@ -571,7 +571,7 @@ ARM_FUNC_ALIAS aeabi_dmul muldf3
COND(and,s,ne)  r5, ip, yh, lsr #20
teqne   r4, ip
teqne   r5, ip
-   bleqLSYM(Lml_s)
+   bleq__Lml_s
 
@ Add exponents together
add r4, r4, r5
@@ -689,7 +689,7 @@ ARM_FUNC_ALIAS aeabi_dmul muldf3
subsip, r4, #(254 - 1)
do_it   hi
cmphi   ip, #0x700
-   bhi LSYM(Lml_u)
+   bhi __Lml_u
 
@ Round the result, merge final exponent.
cmp lr, #0x8000
@@ -716,9 +716,12 @@ LSYM(Lml_1):
mov lr, #0
subsr4, r4, #1
 
-LSYM(Lml_u):
+   FUNC_END aeabi_dmul
+   FUNC_END muldf3
+
+ARM_SYM_START __Lml_u
@ Overflow?
-   bgt LSYM(Lml_o)
+   bgt __Lml_o
 
@ Check if denormalized result is possible, otherwise return signed 0.
cmn r4, #(53 + 1)
@@ -778,10 +781,11 @@ LSYM(Lml_u):
do_it   eq
biceq   xl, xl, r3, lsr #31
RETLDM  "r4, r5, r6"
+   SYM_END __Lml_u
 
@ One or both arguments are denormalized.
@ Scale them leftwards and preserve sign bit.
-LSYM(Lml_d):
+ARM_SYM_START __Lml_d
teq r4, #0
bne 2f
and r6, xh, #0x8000
@@ -804,8 +808,9 @@ LSYM(Lml_d):
beq 3b
orr yh, yh, r6
RET
+   SYM_END __Lml_d
 
-LSYM(Lml_s):
+ARM_SYM_START __Lml_s
@ Isolate the INF and NAN cases away
teq r4, ip
and r5, ip, yh, lsr #20
@@ -817,10 +822,11 @@ LSYM(Lml_s):
orrsr6, xl, xh, lsl #1
do_it   ne
COND(orr,s,ne)  r6, yl, yh, lsl #1
-   bne LSYM(Lml_d)
+   bne __Lml_d
+   SYM_END __Lml_s
 
@ Result is 0, but determine sign anyway.
-LSYM(Lml_z):
+ARM_SYM_START __Lml_z
eor xh, xh, yh
and xh, xh, #0x8000
mov xl, #0
@@ -832,41 +838,42 @@ LSYM(Lml_z):
moveq   xl, yl
moveq   xh, yh
COND(orr,s,ne)  r6, yl, yh, lsl #1
-   beq LSYM(Lml_n) @ 0 * INF or INF * 0 -> NAN
+   beq __Lml_n @ 0 * INF or INF * 0 -> NAN
teq r4, ip
bne 1f
orrsr6, xl, xh, lsl #12
-   bne LSYM(Lml_n) @ NAN *  -> NAN
+   bne __Lml_n @ NAN *  -> NAN
 1: teq r5, ip
-   bne LSYM(Lml_i)
+   bne __Lml_i
orrsr6, yl, yh, lsl #12
do_it   ne, t
movne   xl, yl
movne   xh, yh
-   bne LSYM(Lml_n) @  * NAN -> NAN
+   bne __Lml_n @  * NAN -> NAN
+   SYM_END __Lml_z
 
@ Result is INF, but we need to determine its sign.
-LSYM(Lml_i):
+ARM_SYM_START __Lml_i
eor xh, xh, yh
+   SYM_END __Lml_i
 
@ Overflow: return INF (sign already in xh).
-LSYM(Lml_o):
+ARM_SYM_START __Lml_o
and xh, xh, #0x8000
orr xh, xh, #0x7f00
orr xh, xh, #0x00f0
mov xl, #0
RETLDM  "r4, r5, r6"
+   SYM_END __Lml_o
 
@ Return a quiet NAN.
-LSYM(Lml_n):
+ARM_SYM_START __Lml_n
orr xh, xh, #0x7f00
orr xh, xh, #0x00f8
RETLDM  "r4, r5, r6"
+   SYM_END __Lml_n
 
-   FUNC_END aeabi_dmul
-   FUNC_END muldf3
-
-ARM_FUNC_START divdf3
+ARM_FUNC_START divdf3 function_section
 ARM_FUNC_ALIAS aeabi_ddiv divdf3

do_push {r4, r5, r6, lr}
@@ -985,7 +992,7 @@ ARM_FUNC_ALIAS aeabi_ddiv divdf3
subsip, r4, #(254 - 1)
do_it   hi
cmphi   ip, #0x700
-   bhi LSYM(Lml_u)
+   bhi __Lml_u
 
@ Round the result, merge final exponent.
subsip, r5, yh
@@ -1009,13 +1016,13 @@ LSYM(Ldv_1):
orr xh, xh, #0x0010
mov lr, #0
subsr4, r4, #1
-   b   LSYM(Lml_u)
+   b   __Lml_u
 
@ Result mightt need to be denormaliz

RE: [PATCH, combine] Try REG_EQUAL for nonzero_bits

2015-04-30 Thread Thomas Preud'homme
> From: Jeff Law [mailto:l...@redhat.com]
> Sent: Tuesday, April 28, 2015 12:27 AM
> To: Thomas Preud'homme; 'Eric Botcazou'
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH, combine] Try REG_EQUAL for nonzero_bits
> 
> On 04/27/2015 04:26 AM, Thomas Preud'homme wrote:
> >> From: Jeff Law [mailto:l...@redhat.com]
> >> Sent: Saturday, April 25, 2015 3:00 AM
> >> Do you have a testcase where this change can result in better
> generated
> >> code.  If so please add that testcase.  It's OK if it's ARM specific.
> >
> > Hi Jeff,
> >
> > Last time I tried I couldn't reduce the code to a small testcase but if I
> remember
> > well it was mostly due to the problem of finding a good test for creduce
> > (zero extension is not unique enough). I'll try again with a more manual
> approach
> > and get back to you.
> OK.  No need for heroics -- give it a shot, but don't burn an insane
> amount of time on it.  If we can't get to a reasonable testcase, then so
> be it.

Sadly I couldn't get a testcase. I get almost the same sequence of instructions
as the program in which we found the problem, but couldn't get exactly the
same. In all the cases I constructed, the nonzero_bits info we already have was
enough for combine to do its job. I couldn't find what causes this information
to be inaccurate. I will try to investigate a bit further on Monday, as another
pass might not be doing its job properly. Or maybe there's something that
prevents the information from being propagated.

Best regards,

Thomas





RE: [PATCH, combine] Try REG_EQUAL for nonzero_bits

2015-05-04 Thread Thomas Preud'homme
> From: Jeff Law [mailto:l...@redhat.com]
> Sent: Tuesday, April 28, 2015 12:27 AM
> OK.  No need for heroics -- give it a shot, but don't burn an insane
> amount of time on it.  If we can't get to a reasonable testcase, then so
> be it.

Ok, I tried but really didn't manage to create a testcase. I did, however,
understand the conditions under which this patch is helpful. In the function
reg_nonzero_bits_for_combine () in combine.c there is a test to check if
last_set_nonzero_bits for a given register is still valid.

In the case I'm considering, the test evaluates to false because:

(i) the register rX whose nonzero bits are being evaluated was set in an
earlier basic block than the one containing the instruction using rX (hence
rsp->last_set_label < label_tick)
(ii) the predecessor of the basic block for that same insn is not the
previous basic block analyzed by combine_instructions (hence
label_tick_ebb_start == label_tick)
(iii) the register rX is set multiple times (hence
REG_N_SETS (REGNO (x)) != 1)

Yet the block being processed is dominated by the SET for rX, so there
is a REG_EQUAL note available to narrow down the set of nonzero bits.
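
For reference, the validity test in question has roughly the following shape
(a paraphrase rather than the exact combine.c source, with the mode and LUID
checks left out); each disjunct fails for one of the reasons above:

  /* Rough paraphrase of part of reg_nonzero_bits_for_combine.  */
  bool last_set_still_valid
    = (rsp->last_set_label >= label_tick_ebb_start
       && rsp->last_set_label < label_tick)   /* impossible given (ii) */
      || rsp->last_set_label == label_tick    /* contradicts (i) */
      || REG_N_SETS (REGNO (x)) == 1;         /* contradicts (iii) */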

Based on my understanding of your answer quoted above, I'll commit
it as is, despite not having been able to come up with a testcase. I'll
wait until tomorrow to do so, though, in case you change your mind about it.

Best regards,

Thomas




RE: [PATCH, ping 1] Move insns without introducing new temporaries in loop2_invariant

2015-05-06 Thread Thomas Preud'homme
Ping?

Best regards,

Thomas

> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
> Sent: Monday, March 16, 2015 8:39 PM
> To: 'Steven Bosscher'
> Cc: GCC Patches; Eric Botcazou
> Subject: RE: [PATCH, stage1] Move insns without introducing new
> temporaries in loop2_invariant
> 
> > From: Steven Bosscher [mailto:stevenb@gmail.com]
> > Sent: Monday, March 09, 2015 7:48 PM
> > To: Thomas Preud'homme
> > Cc: GCC Patches; Eric Botcazou
> > Subject: Re: [PATCH, stage1] Move insns without introducing new
> > temporaries in loop2_invariant
> 
> New patch below.
> 
> >
> > It looks like this would run for all candidate loop invariants, right?
> >
> > If so, you're creating run time of O(n_invariants*n_bbs_in_loop), a
> > potential compile time hog for large loops.
> >
> > But why compute this at all? Perhaps I'm missing something, but you
> > already have inv->always_executed available, no?
> 
> Indeed. I didn't realize the information was already there.
> 
> >
> >
> > > +  basic_block use_bb;
> > > +
> > > +  ref = DF_REF_INSN (use);
> > > +  use_bb = BLOCK_FOR_INSN (ref);
> >
> > You can use DF_REF_BB.
> 
> Since I need use_insn here I kept BLOCK_FOR_INSN but I used
> DF_REF_BB for the def below.
> 
> 
> So here are the new ChangeLog entries:
> 
> *** gcc/ChangeLog ***
> 
> 2015-03-11  Thomas Preud'homme  
> 
> * loop-invariant.c (can_move_invariant_reg): New.
> (move_invariant_reg): Call above new function to decide whether
> instruction can just be moved, skipping creation of temporary
> register.
> 
> *** gcc/testsuite/ChangeLog ***
> 
> 2015-03-12  Thomas Preud'homme  
> 
> * gcc.dg/loop-8.c: New test.
> * gcc.dg/loop-9.c: New test.
> 
> 
> diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
> index f79b497..8217d62 100644
> --- a/gcc/loop-invariant.c
> +++ b/gcc/loop-invariant.c
> @@ -1512,6 +1512,79 @@ replace_uses (struct invariant *inv, rtx reg,
> bool in_group)
>return 1;
>  }
> 
> And the new patch:
> 
> +/* Whether invariant INV setting REG can be moved out of LOOP, at the
> end of
> +   the block preceding its header.  */
> +
> +static bool
> +can_move_invariant_reg (struct loop *loop, struct invariant *inv, rtx
> reg)
> +{
> +  df_ref def, use;
> +  unsigned int dest_regno, defs_in_loop_count = 0;
> +  rtx_insn *insn = inv->insn;
> +  basic_block bb = BLOCK_FOR_INSN (inv->insn);
> +
> +  /* We ignore hard register and memory access for cost and complexity
> reasons.
> + Hard register are few at this stage and expensive to consider as they
> + require building a separate data flow.  Memory access would require
> using
> + df_simulate_* and can_move_insns_across functions and is more
> complex.  */
> +  if (!REG_P (reg) || HARD_REGISTER_P (reg))
> +return false;
> +
> +  /* Check whether the set is always executed.  We could omit this
> condition if
> + we know that the register is unused outside of the loop, but it does
> not
> + seem worth finding out.  */
> +  if (!inv->always_executed)
> +return false;
> +
> +  /* Check that all uses reached by the def in insn would still be reached
> + it.  */
> +  dest_regno = REGNO (reg);
> +  for (use = DF_REG_USE_CHAIN (dest_regno); use; use =
> DF_REF_NEXT_REG (use))
> +{
> +  rtx_insn *use_insn;
> +  basic_block use_bb;
> +
> +  use_insn = DF_REF_INSN (use);
> +  use_bb = BLOCK_FOR_INSN (use_insn);
> +
> +  /* Ignore instruction considered for moving.  */
> +  if (use_insn == insn)
> + continue;
> +
> +  /* Don't consider uses outside loop.  */
> +  if (!flow_bb_inside_loop_p (loop, use_bb))
> + continue;
> +
> +  /* Don't move if a use is not dominated by def in insn.  */
> +  if (use_bb == bb && DF_INSN_LUID (insn) >= DF_INSN_LUID
> (use_insn))
> + return false;
> +  if (!dominated_by_p (CDI_DOMINATORS, use_bb, bb))
> + return false;
> +}
> +
> +  /* Check for other defs.  Any other def in the loop might reach a use
> + currently reached by the def in insn.  */
> +  for (def = DF_REG_DEF_CHAIN (dest_regno); def; def =
> DF_REF_NEXT_REG (def))
> +{
> +  basic_block def_bb = DF_REF_BB (def);
> +
> +  /* Defs in exit block cannot reach a use they weren't already.  */
> +  if (single_succ_p (def_bb))
> +

[PATCH, ARM] Fix testcase for PR64616

2015-05-11 Thread Thomas Preud'homme
Hi,

The testcase made for PR64616 was only passing when using a literal pool.
Rather than having an alternative for systems where this is not true, this
patch changes the test to check that a global copy propagation occurs in
cprop2. This should work across all ARM targets (it works when targeting
Cortex-M0, Cortex-M3 and the default core for ARMv7-A with the vfpv3-d16 FPU).

ChangeLog entry is as follows:

*** gcc/testsuite/ChangeLog ***

2015-05-04  Thomas Preud'homme  

* gcc.target/arm/pr64616.c: Test dump rather than assembly to work
across ARM targets.

diff --git a/gcc/testsuite/gcc.target/arm/pr64616.c 
b/gcc/testsuite/gcc.target/arm/pr64616.c
index c686ffa..2280f21 100644
--- a/gcc/testsuite/gcc.target/arm/pr64616.c
+++ b/gcc/testsuite/gcc.target/arm/pr64616.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2" } */
+/* { dg-options "-O2 -fdump-rtl-cprop2" } */
 
 int f (int);
 unsigned int glob;
@@ -11,4 +11,5 @@ g ()
   glob = 5;
 }
 
-/* { dg-final { scan-assembler-times "ldr" 2 } } */
+/* { dg-final { scan-rtl-dump "GLOBAL COPY-PROP" "cprop2" } } */
+/* { dg-final { cleanup-rtl-dump "cprop2" } } */


Patch was tested by verifying that the pattern appears when targeting 
Cortex-M0, Cortex-M3 and
the default core for ARMv7-a with vfpv3-d16 FPU.

Best regards,

Thomas




RE: [PATCH, combine] Try REG_EQUAL for nonzero_bits

2015-05-12 Thread Thomas Preud'homme
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
> 
> Based on my understanding of your answer quoted above, I'll commit
> it as is, despite not having been able to come up with a testcase. I'll
> wait tomorrow to do so though in case you changed your mind about it.

Committed.

Best regards,

Thomas




RE: [PATCH, RFC, C] Add -fno-float to forbid floating point data types

2014-11-13 Thread Thomas Preud'homme
> From: Joseph Myers [mailto:jos...@codesourcery.com]
> Sent: Wednesday, November 12, 2014 10:11 PM
> >
> > This patch modifies the C parser to give an error if:
> >   - any variable or function parameter is declared with a float type or
> > a type containing a float (prototype are ignored)
> 
> But if you ignore prototypes at the time of declaration, don't you need to
> diagnose if a function with a floating-point parameter or return type gets
> called?  I don't see anything to do that.  (This includes the
> __builtin_sqrt case from the previous discussion.)

It would work by transitivity. To call a function with a float you'd have to
either declare a float variable or pass it a float literal, as shown below.
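
For instance (a made-up example just to illustrate the transitivity argument;
scale () is a hypothetical function, not a real API):

  extern float scale (float);   /* prototype alone is ignored */

  int
  g (void)
  {
    float f = 2.0f;             /* error: float variable and float literal */
    return (int) scale (f);     /* so the call is caught indirectly via f */
  }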

> 
> > specified by user and a float litteral is found.
> 
> "literal" (throughout).

Thanks. You'll have guessed that I'm not a native English speaker.

> 
> No, this is wrong.  (a) By tying this to CPP_SEMICOLON you'll only catch
> it if the variable is last in the list of declarators (consider "float f,
> g (void);", where what comes before the semicolon is a function
> declaration); better to check on each declarator, not just the last.

Indeed. I meant to do that at some point and I forgot about it.

> (b)
> declarator->kind reflects the outermost declarator, but what determines
> whether it's a function declaration is the innermost declarator (other
> than cdk_id or cdk_attrs declarators), so this looks like it would give an
> error for "float *f(void);" (wrongly treating it as a variable because the
> outermost declarator is cdk_pointer), but not for "float (*f)(void);"
> (wrongly treating it as a function because the outermost declarator is
> cdk_function ... you could of course decide that function pointers
> involving floating-point types are OK if you want). 

I see. I must say it's the first time I've looked at any GCC frontend, so I
missed this important bit.

(c) specs->type only
> covers the type specifiers, so if you want to diagnose function pointer
> variables you need to allow for "int (*f)(float);" where the declaration's
> type involves floating point but the type specifiers don't.

Ok.

(d) What do
> you want to do with typedef declarations (right now it looks like they'll
> be handled as variables, but your testcases don't consider them)?

Because typedefs can be used in headers, these types should be dealt with when
actually used, i.e. at declaration time like the rest. I'll make sure to add
some testcases for this.

> 
> I'd also suggest some refactoring: have a function that takes as
> arguments
> a location and a type, and does the
> 
> if (flag_no_float && contains_floating_point_type (type))
>   error_at (loc, ...);
> 
> to avoid repeating that pattern in several places.

Right.

Thanks a lot for the review. Beyond the patch itself, what do you think of
the general approach? Do you think doing this in the frontend is the right
approach?

Best regards,

Thomas 






RE: [PATCH 1/3, ARM, libgcc, ping5] Code size optimization for the fmul/fdiv and dmul/ddiv function in libgcc

2014-11-13 Thread Thomas Preud'homme
[Taking over Tony's patch]

Ping?

Best regards,

Thomas

> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Tony Wang
> Sent: Thursday, August 21, 2014 7:15 AM
> To: gcc-patches@gcc.gnu.org
> Subject: [PATCH 1/3,ARM,libgcc]Code size optimization for the fmul/fdiv
> and dmul/ddiv function in libgcc
> 
> Hi there,
> 
> In libgcc the files ieee754-sf.S and ieee754-df.S have some function pairs
> which will be bundled into one .o file and share the same .text section.
> For example, for fmul and fdiv the libgcc makefile will build them into one
> .o file and archive it into libgcc.a. So when the user only calls the
> single-precision floating-point multiply function, the fdiv function will
> also be linked, and as fmul and fdiv share the same .text section, the
> linker options --gc-sections or -flto can't remove the dead code.
> 
> So this optimization just separates the function pairs (fmul/fdiv and
> dmul/ddiv) into different sections, following the naming pattern of
> -ffunction-sections (.text.__functionname), through which the unused
> sections of fdiv/ddiv can be eliminated with --gc-sections when users
> only use fmul/dmul.  The solution is to add a conditional statement in the
> macro FUNC_START, which will conditionally change the section of a function
> from .text to .text.__\name when compiling with the L_arm_muldivsf3
> or L_arm_muldivdf3 macro.
> 
> GCC regression tests have been run on QEMU for Cortex-M3. No new
> regressions when this patch is turned on.
> 
> The code reduction for thumb2 on cortex-m3 is:
> 1. When user only use single float point multiply:
> fmul+fdiv => fmul will have a code size reduction of 318 bytes.
> 
> 2. When user only use double float point multiply:
> dmul+ddiv => dmul will have a code size reduction of 474 bytes.
> 
> Ok for trunk?
> 
> BR,
> Tony
> 
> Step 1: Provide another option: sp-scetion to control whether to split the
> section of a function pair into two
> part.
> 
> gcc/libgcc/ChangeLog:
> 2014-08-21  Tony Wang  
> 
> * config/arm/lib1funcs.S (FUNC_START): Add conditional section
> redefine for macro L_arm_muldivsf3 and L_arm_muldivdf3
> (SYM_END, ARM_SYM_START): Add macros used to expose function
> Symbols
> 
> diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S
> index b617137..0f87111 100644
> --- a/libgcc/config/arm/lib1funcs.S
> +++ b/libgcc/config/arm/lib1funcs.S
> @@ -418,8 +418,12 @@ SYM (\name):
>  #define THUMB_SYNTAX
>  #endif
> 
> -.macro FUNC_START name
> +.macro FUNC_START name sp_section=
> +  .ifc \sp_section, function_section
> + .section.text.__\name,"ax",%progbits
> +  .else
>   .text
> +  .endif
>   .globl SYM (__\name)
>   TYPE (__\name)
>   .align 0
> @@ -429,14 +433,24 @@ SYM (\name):
>  SYM (__\name):
>  .endm
> 
> +.macro ARM_SYM_START name
> +   TYPE (\name)
> +   .align 0
> +SYM (\name):
> +.endm
> +
> +.macro SYM_END name
> +   SIZE (\name)
> +.endm
> +
>  /* Special function that will always be coded in ARM assembly, even if
> in Thumb-only compilation.  */
> 
>  #if defined(__thumb2__)
> 
>  /* For Thumb-2 we build everything in thumb mode.  */
> -.macro ARM_FUNC_START name
> -   FUNC_START \name
> +.macro ARM_FUNC_START name sp_section=
> +   FUNC_START \name \sp_section
> .syntax unified
>  .endm
>  #define EQUIV .thumb_set
> @@ -467,8 +481,12 @@ _L__\name:
>  #ifdef __ARM_ARCH_6M__
>  #define EQUIV .thumb_set
>  #else
> -.macro   ARM_FUNC_START name
> +.macro   ARM_FUNC_START name sp_section=
> +  .ifc \sp_section, function_section
> + .section.text.__\name,"ax",%progbits
> +  .else
>   .text
> +  .endif
>   .globl SYM (__\name)
>   TYPE (__\name)
>   .align 0





RE: [PATCH 2/3, ARM, libgcc, ping5] Code size optimization for the fmul/fdiv and dmul/ddiv function in libgcc

2014-11-13 Thread Thomas Preud'homme
[Taking over Tony's patch]

Ping?

Best regards,

Thomas

> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Tony Wang
> Sent: Thursday, August 21, 2014 7:15 AM
> To: gcc-patches@gcc.gnu.org
> Subject: [PATCH 2/3,ARM,libgcc]Code size optimization for the fmul/fdiv
> and dmul/ddiv function in libgcc
> 
> Step 2: Mark all the symbols around the fragment boundaries as function
> symbols, so as to generate a veneer when the two sections are too far away
> from each other. Also, I have verified both manually and using some test
> cases that IP and PSR are not live at such points.
> 
> gcc/libgcc/ChangeLog:
> 2014-8-21   Tony Wang 
> 
> * config/arm/ieee754-sf.S: Expose symbols around fragment
> boundaries as function symbols.
> * config/arm/ieee754-df.S: Same with above
> 
> BR,
> Tony





RE: [PATCH 3/3, ARM, libgcc, ping5] Code size optimization for the fmul/fdiv and dmul/ddiv function in libgcc

2014-11-13 Thread Thomas Preud'homme
[Taking over Tony's patch]

Ping?

Best regards,

Thomas

> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Tony Wang
> Sent: Thursday, August 21, 2014 7:15 AM
> To: gcc-patches@gcc.gnu.org
> Subject: [PATCH 3/3,ARM,libgcc]Code size optimization for the fmul/fdiv
> and dmul/ddiv function in libgcc
> 
> Step 3: Test cases to verify the code size reduction.
> 
> gcc/gcc/testsuite/ChangeLog:
> 2014-08-21  Tony Wang  
> 
> * gcc.target/arm/size-optimization-ieee-1.c: New test case
> * gcc.target/arm/size-optimization-ieee-2.c: New test case
> * lib/gcc-dg.exp: Add new function scan-symbol-common, scan-
> symbol-yes,
> scan-symbol-no to scan a user defined symbol in final elf file
> 
> BR,
> Tony
> 
> diff --git a/gcc/testsuite/gcc.target/arm/size-optimization-ieee-1.c
> b/gcc/testsuite/gcc.target/arm/size-optimization-ieee-1.c
> new file mode 100644
> index 000..46e9cdf
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/size-optimization-ieee-1.c
> @@ -0,0 +1,30 @@
> +/* { dg-do link { target { arm_thumb2_ok } } } */
> +/* { dg-options "-Wl,--gc-sections" } */
> +int
> +foo ()
> +{
> +  volatile float a;
> +  volatile float b;
> +  volatile float c = a * b;
> +  return 0;
> +}
> +
> +int
> +bar ()
> +{
> +  volatile double a;
> +  volatile double b;
> +  volatile double c = a * b;
> +  return 0;
> +}
> +
> +int
> +main ()
> +{
> +  foo ();
> +  bar ();
> +  return 0;
> +}
> +/* { dg-final { scan-symbol-no "__aeabi_fdiv" } } */
> +/* { dg-final { scan-symbol-no "__aeabi_ddiv" } } */
> +
> diff --git a/gcc/testsuite/gcc.target/arm/size-optimization-ieee-2.c
> b/gcc/testsuite/gcc.target/arm/size-optimization-ieee-2.c
> new file mode 100644
> index 000..5007d62
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/size-optimization-ieee-2.c
> @@ -0,0 +1,30 @@
> +/* { dg-do link { target { arm_thumb2_ok } } } */
> +/* { dg-options "-Wl,--gc-sections" } */
> +int
> +foo ()
> +{
> +  volatile float a;
> +  volatile float b;
> +  volatile float c = a / b;
> +  return 0;
> +}
> +
> +int
> +bar ()
> +{
> +  volatile double a;
> +  volatile double b;
> +  volatile double c = a / b;
> +  return 0;
> +}
> +
> +int
> +main ()
> +{
> +  foo ();
> +  bar ();
> +  return 0;
> +}
> +/* { dg-final { scan-symbol-yes "__aeabi_fmul" } } */
> +/* { dg-final { scan-symbol-yes "__aeabi_dmul" } } */
> +
> diff --git a/gcc/testsuite/lib/gcc-dg.exp b/gcc/testsuite/lib/gcc-dg.exp
> index 3390caa..0d52e95 100644
> --- a/gcc/testsuite/lib/gcc-dg.exp
> +++ b/gcc/testsuite/lib/gcc-dg.exp
> @@ -880,5 +880,57 @@ proc gdb-exists { args } {
>  return 0;
>  }
> 
> +# Scan the OUTPUT_FILE for a symbol. Return 1 if it present, or
> +# return 0 if it doesn't present
> +
> +proc scan-symbol-common { args } {
> +global nm
> +global base_dir
> +
> +set testcase [testname-for-summary]
> +set output_file "[file rootname [file tail $testcase]].exe"
> +
> +# Find nm like we find g++ in g++.exp.
> +if ![info exists nm]  {
> +set nm [findfile $base_dir/../../../binutils/nm \
> +$base_dir/../../../binutils/nm \
> +[findfile $base_dir/../../nm $base_dir/../../nm \
> +  [findfile $base_dir/nm $base_dir/nm \
> +   [transform nm
> +verbose -log "nm is $nm"
> +}
> +
> +if { $output_file == "" } {
> +fail "scan-symbol-not $args: dump file does not exist"
> +return
> +}
> +
> +set fd [open "| $nm $output_file" r]
> +set text [read $fd]
> +close $fd
> +
> +if [regexp -- [lindex $args 0] $text] {
> +return 1
> +} else {
> +return 0
> +}
> +}
> +
> +proc scan-symbol-yes { args } {
> +if { [scan-symbol-common $args] == 1 } {
> + pass "scan-symbol-yes $args exists"
> +} else {
> + fail "scan-symbol-yes $args does not exist"
> +}
> +}
> +
> +proc scan-symbol-no { args } {
> +if { [scan-symbol-common $args] != 1 } {
> +pass "scan-symbol-no $args does not exist"
> +} else {
> +fail "scan-symbol-no $args exists"
> +}
> +}
> +
>  set additional_prunes ""
>  set dg_runtest_extra_prunes ""





[PATCH] Add force option to find_best_rename_reg in regrename pass

2014-11-14 Thread Thomas Preud'homme
We are planning to introduce a new optimization in the aarch64 backend, similar
to the FP load balancing pass in the LLVM project [1]. This pass would be
core-specific and involve doing some register renaming. An RFC version of this
patch should be posted later today. As part of this pass, we want to rename a
register to any register following some specific constraints. We wanted to
reuse the global (non-static) find_best_rename_reg function as the c6x backend
does, but this function is a bit too specific to the register renaming pass.

[1] 
http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/AArch64/AArch64A57FPLoadBalancing.cpp?view=markup

It looks at registers that respect the constraints of all the instructions in
the set and tries to pick one in the preferred class for all the instructions
involved. This is generally useful for any pass that wants to do register
renaming. However, it also contains some logic to only select a register that
hasn't been used for a longer time than the register to be replaced. This bit
is specific to the register renaming pass and, as a result, makes the function
unusable for this new pass, which would force us to make a copy of the
function.

This patch adds an extra parameter to skip this check and only consider the
constraints, trying to pick a register in the preferred class.


ChangeLog entry is as follows:

2014-11-14  Thomas Preud'homme  

* regrename.c (find_best_rename_reg): Rename to ...
(find_rename_reg): This. Also add a parameter to skip tick check.
* regrename.h: Likewise.
* config/c6x/c6x.c: Adapt to above renaming.


diff --git a/gcc/config/c6x/c6x.c b/gcc/config/c6x/c6x.c
index 06319d0..6aca1e3 100644
--- a/gcc/config/c6x/c6x.c
+++ b/gcc/config/c6x/c6x.c
@@ -3513,7 +3513,8 @@ try_rename_operands (rtx_insn *head, rtx_insn *tail, 
unit_req_table reqs,
   COMPL_HARD_REG_SET (unavailable, reg_class_contents[(int) super_class]);
 
   old_reg = this_head->regno;
-  best_reg = find_best_rename_reg (this_head, super_class, &unavailable, 
old_reg);
+  best_reg = find_rename_reg (this_head, super_class, &unavailable, old_reg,
+ true);
 
   regrename_do_replace (this_head, best_reg);
 
diff --git a/gcc/regrename.h b/gcc/regrename.h
index 03b7164..05c78ad 100644
--- a/gcc/regrename.h
+++ b/gcc/regrename.h
@@ -89,8 +89,8 @@ extern void regrename_init (bool);
 extern void regrename_finish (void);
 extern void regrename_analyze (bitmap);
 extern du_head_p regrename_chain_from_id (unsigned int);
-extern int find_best_rename_reg (du_head_p, enum reg_class, HARD_REG_SET *,
-int);
+extern int find_rename_reg (du_head_p, enum reg_class, HARD_REG_SET *, int,
+   bool);
 extern void regrename_do_replace (du_head_p, int);
 
 #endif
diff --git a/gcc/regrename.c b/gcc/regrename.c


index 66f562b..5de7826 100644
--- a/gcc/regrename.c
+++ b/gcc/regrename.c
@@ -357,11 +357,13 @@ check_new_reg_p (int reg ATTRIBUTE_UNUSED, int new_reg,
 /* For the chain THIS_HEAD, compute and return the best register to
rename to.  SUPER_CLASS is the superunion of register classes in
the chain.  UNAVAILABLE is a set of registers that cannot be used.
-   OLD_REG is the register currently used for the chain.  */
+   OLD_REG is the register currently used for the chain.  BEST_RENAME
+   controls whether the register chosen must be better than the
+   current one or just respect the given constraint.  */
 
 int
-find_best_rename_reg (du_head_p this_head, enum reg_class super_class,
- HARD_REG_SET *unavailable, int old_reg)
+find_rename_reg (du_head_p this_head, enum reg_class super_class,
+HARD_REG_SET *unavailable, int old_reg, bool best_rename)
 {
   bool has_preferred_class;
   enum reg_class preferred_class;
@@ -408,8 +410,13 @@ find_best_rename_reg (du_head_p this_head, enum reg_class 
super_class,
  && ((pass == 0
   && !TEST_HARD_REG_BIT (reg_class_contents[preferred_class],
  best_new_reg))
- || tick[best_new_reg] > tick[new_reg]))
-   best_new_reg = new_reg;
+ || !best_rename || tick[best_new_reg] > tick[new_reg]))
+   {
+ if (best_rename)
+   best_new_reg = new_reg;
+ else
+   return new_reg;
+   }
}
   if (pass == 0 && best_new_reg != old_reg)
break;
@@ -480,8 +487,8 @@ rename_chains (void)
   if (n_uses < 2)
continue;
 
-  best_new_reg = find_best_rename_reg (this_head, super_class,
-  &this_unavailable, reg);
+  best_new_reg = find_rename_reg (this_head, super_class,
+ &this_unavailable, reg, true);
 
   if (dump_file)
{

=== Testing ===

c6x ba

RE: [PATCH] Cancel bswap opt when intermediate stmts are reused

2014-11-18 Thread Thomas Preud'homme
> From: Richard Biener [mailto:richard.guent...@gmail.com]
> Sent: Monday, November 17, 2014 12:47 PM
> 
> Hmm.  I am a little bit concerned about the malloc traffic generated here.
> So why not use a vec, get rid of the ->next pointer and
> use a hash_map  to associate the stmt with
> an index into the vector?

Sure, it even makes things easier. However I don't understand why a vector is
better than malloc. Is it a matter of fragmentation?
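
For the record, here is roughly the layout I understand you are suggesting
(a sketch only, with illustrative names, not code from the patch):

  /* Nodes stored contiguously instead of one allocation per node.  */
  auto_vec<struct symbolic_number> nodes;
  /* Map a gimple statement to the index of its node in NODES.  */
  hash_map<gimple *, unsigned> stmt_to_node;

  /* Recording a node and looking it up later would then be:  */
  stmt_to_node.put (stmt, nodes.length ());
  nodes.safe_push (n);
  ...
  if (unsigned *idx = stmt_to_node.get (stmt))
    use (nodes[*idx]);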

> 
> At this point I'd rather leave DCE to DCE ...

I thought that since all the information is there, why not do it. It makes
the dump of the pass easier to read.

> 
> Ick - do we really have to use gotos here?  Can you refactor this
> a bit to avoid it?

Yes I can. I needed the same kind of thing for fixing PR63761 (r217409) and
found a way to avoid it.

> 
> The whole scheme wouldn't exactly fly with the idea of eventually
> using a lattice to propagate the symbolic numbers, but well...
 
> 
> I think the overall idea is sound and if you have time to adjust according
> to my comments that would be nice.

To be honest I think it should wait for the next stage1. After several pings I
took another look at the testcase with the idea of measuring the size reduction
the patch could give, and was surprised that in all the cases I could construct
the size was actually bigger. Performance might be improved nonetheless, but I
think this needs more careful consideration.

And as you said, the approach would need to be changed if bswap were rewritten
to do a forward analysis. Lastly, nobody has reported a code size or
performance regression due to the changes so far, so that might be a
non-issue. If such a report happens, then it will be a good time to revisit
that decision.

Do you agree? 
 

> 
> Sorry for the very late review.

That's alright, I know how it is. Thank you for keeping track of it. I actually
feel sorry I didn't mention my findings earlier. I thought the patch had fallen
through the cracks and didn't want to spam gcc-patches uselessly.

Best regards,

Thomas





RE: [Patch ARM] Fix PR target/56846

2014-11-19 Thread Thomas Preud'homme
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Tony Wang
> 

> 
> Hi all,
> 
> The bug is reported at
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56846, and it’s about the
> problem that
> when exception handler is involved in the function, then
> _Unwind_Backtrace function will run into deadloop on
> arm target.

The patch (in r215101) can be backported without any change on 4.8 and
4.9 branches. I checked in QEMU with and without the patch on both
branches and it indeed solves the problem.

Testsuite run without regression when compiled with arm-none-eabi
cross compiler and executed on QEMU emulating Cortex-M3.

I also bootstrapped gcc on x86_64-linux-gnu and run the testsuite without
regressions.

Is it ok for backport?

Best regards,

Thomas 






RE: [Patch, ARM, ping1] Fix PR target/56846

2014-11-26 Thread Thomas Preud'homme
Ping?

> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
> Sent: Wednesday, November 19, 2014 6:00 PM
> To: Tony Wang; gcc-patches@gcc.gnu.org; d...@debian.org; aph-
> g...@littlepinkcloud.com; Richard Earnshaw; Ramana Radhakrishnan;
> libstd...@gcc.gnu.org
> Subject: RE: [Patch ARM] Fix PR target/56846
> 
> > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> > ow...@gcc.gnu.org] On Behalf Of Tony Wang
> >
> 
> >
> > Hi all,
> >
> > The bug is reported at
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56846, and it’s about the
> > problem that
> > when exception handler is involved in the function, then
> > _Unwind_Backtrace function will run into deadloop on
> > arm target.
> 
> The patch (in r215101) can be backported without any change on 4.8 and
> 4.9 branches. I checked in QEMU with and without the patch on both
> branches and it indeed solves the problem.
> 
> Testsuite run without regression when compiled with arm-none-eabi
> cross compiler and executed on QEMU emulating Cortex-M3.
> 
> I also bootstrapped gcc on x86_64-linux-gnu and run the testsuite
> without
> regressions.
> 
> Is it ok for backport?
> 
> Best regards,
> 
> Thomas
> 
> 
> 
> 






RE: [Patch, ARM, ping1] Fix PR target/56846

2014-11-27 Thread Thomas Preud'homme
Thanks. Ccing release manager for their opinion.

Best regards,

Thomas

> -Original Message-
> From: Jonathan Wakely [mailto:jwak...@redhat.com]
> Sent: Wednesday, November 26, 2014 5:33 PM
> To: Thomas Preud'homme
> Cc: Tony Wang; gcc-patches@gcc.gnu.org; d...@debian.org; aph-
> g...@littlepinkcloud.com; Richard Earnshaw; Ramana Radhakrishnan;
> libstd...@gcc.gnu.org
> Subject: Re: [Patch, ARM, ping1] Fix PR target/56846
> 
> On 26/11/14 17:23 -, Thomas Preud'homme wrote:
> >Ping?
> 
> I'm OK with backporting it if a release manager approves it.
> 
> 
> >> -Original Message-
> >> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> >> ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
> >> Sent: Wednesday, November 19, 2014 6:00 PM
> >> To: Tony Wang; gcc-patches@gcc.gnu.org; d...@debian.org; aph-
> >> g...@littlepinkcloud.com; Richard Earnshaw; Ramana Radhakrishnan;
> >> libstd...@gcc.gnu.org
> >> Subject: RE: [Patch ARM] Fix PR target/56846
> >>
> >> > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> >> > ow...@gcc.gnu.org] On Behalf Of Tony Wang
> >> >
> >>
> >> >
> >> > Hi all,
> >> >
> >> > The bug is reported at
> >> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56846, and it’s about
> the
> >> > problem that
> >> > when exception handler is involved in the function, then
> >> > _Unwind_Backtrace function will run into deadloop on
> >> > arm target.
> >>
> >> The patch (in r215101) can be backported without any change on 4.8
> and
> >> 4.9 branches. I checked in QEMU with and without the patch on both
> >> branches and it indeed solves the problem.
> >>
> >> Testsuite run without regression when compiled with arm-none-eabi
> >> cross compiler and executed on QEMU emulating Cortex-M3.
> >>
> >> I also bootstrapped gcc on x86_64-linux-gnu and run the testsuite
> >> without
> >> regressions.
> >>
> >> Is it ok for backport?
> >>
> >> Best regards,
> >>
> >> Thomas
> >>
> >>
> >>
> >>
> >
> >
> >
> >






RE: [Patch, ARM, ping1] Fix PR target/56846

2014-11-27 Thread Thomas Preud'homme


> -Original Message-
> From: Richard Biener [mailto:rguent...@suse.de]
> Sent: Thursday, November 27, 2014 9:57 AM
> To: Ramana Radhakrishnan
> Cc: Thomas Preud'homme; 'Jonathan Wakely'; Jakub Jelinek; Tony Wang;
> gcc-patches@gcc.gnu.org; d...@debian.org; aph-
> g...@littlepinkcloud.com; Richard Earnshaw; libstd...@gcc.gnu.org
> Subject: Re: [Patch, ARM, ping1] Fix PR target/56846
> 
> On Thu, 27 Nov 2014, Ramana Radhakrishnan wrote:
> 
> >
> >
> > On 27/11/14 09:34, Richard Biener wrote:
> > > On Thu, 27 Nov 2014, Thomas Preud'homme wrote:
> > >
> > > > Thanks. Ccing release manager for their opinion.
> > >
> > > It doesn't look ARM specific and frankly I have not too much
> expertise
> > > in this area.  The patch has been on trunk for more than two months
> > > though so I guess it is ok to backport.
> > >
> >
> > It is ARM specific because the whole thing sits in a #ifdef
> > __ARM_EABI_UNWINDER__ in eh_personality.cc.
> 
> Ah, too little patch context then.

Sorry, my bad. Below is the patch that was sent by Tony at that time, with 20
lines of context:

diff --git a/libstdc++-v3/libsupc++/eh_personality.cc 
b/libstdc++-v3/libsupc++/eh_personality.cc
index f315a83..cb4467a 100644
--- a/libstdc++-v3/libsupc++/eh_personality.cc
+++ b/libstdc++-v3/libsupc++/eh_personality.cc
@@ -361,40 +361,46 @@ PERSONALITY_FUNCTION (int version,
 found_cleanup,
 found_handler
   } found_type;
 
   lsda_header_info info;
   const unsigned char *language_specific_data;
   const unsigned char *action_record;
   const unsigned char *p;
   _Unwind_Ptr landing_pad, ip;
   int handler_switch_value;
   void* thrown_ptr = 0;
   bool foreign_exception;
   int ip_before_insn = 0;
 
 #ifdef __ARM_EABI_UNWINDER__
   _Unwind_Action actions;
 
   switch (state & _US_ACTION_MASK)
 {
 case _US_VIRTUAL_UNWIND_FRAME:
+  // If the unwind state pattern is
+  // _US_VIRTUAL_UNWIND_FRAME | _US_FORCE_UNWIND
+  // then we don't need to search for any handler as it is not a real
+  // exception. Just unwind the stack.
+  if (state & _US_FORCE_UNWIND)
+   CONTINUE_UNWINDING;
   actions = _UA_SEARCH_PHASE;
   break;
 
 case _US_UNWIND_FRAME_STARTING:
   actions = _UA_CLEANUP_PHASE;
   if (!(state & _US_FORCE_UNWIND)
  && ue_header->barrier_cache.sp == _Unwind_GetGR(context,
  UNWIND_STACK_REG))
actions |= _UA_HANDLER_FRAME;
   break;
 
 case _US_UNWIND_FRAME_RESUME:
   CONTINUE_UNWINDING;
   break;
 
 default:
   std::abort();
 }
   actions |= state & _US_FORCE_UNWIND;

Best regards,

Thomas





[PATCH, contrib] Reduce check_GNU_style noise

2014-11-28 Thread Thomas Preud'homme
Currently check_GNU_style.sh gives the error "There should be exactly one space
between function name and parentheses." for lines like the following:
tab[(int) idx]

This patch changes the check to only warn if there is 0 of 2+ space(s) between 
a alphanumeric character and an opening parenthesis, rather than 2+ space or 
anything else than a single space (which also was redundant).

With the change, above lines are now not warned about but other incorrect lines 
are still reported.
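
To see the difference, here is a small standalone program (a sketch of mine,
not part of the patch) that checks both extended regular expressions with
POSIX regcomp/regexec against a few sample lines; the literal '(' is written
[(] here instead of \( for portability with regcomp:

#include <regex.h>
#include <stdio.h>

static int
matches (const char *pattern, const char *line)
{
  regex_t re;
  int hit;
  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;
  hit = regexec (&re, line, 0, NULL, 0) == 0;
  regfree (&re);
  return hit;
}

int
main (void)
{
  /* Old check: one non-blank character or 2+ blanks between an alnum
     and '('.  */
  const char *old_re = "[[:alnum:]]([^[:blank:]]|[[:blank:]]{2,})[(]";
  /* New check: only 0 or 2+ blanks between an alnum and '('.  */
  const char *new_re = "[[:alnum:]]([[:blank:]]{2,})?[(]";
  const char *samples[] = {
    "tab[(int) idx]",  /* false positive with the old pattern, ok now */
    "foo  (x);",       /* two spaces: flagged by both */
    "foo (x);",        /* single space: flagged by neither */
    "foo(x);"          /* no space: flagged by both */
  };
  for (unsigned int i = 0; i < sizeof samples / sizeof *samples; i++)
    printf ("%-16s old:%d new:%d\n", samples[i],
            matches (old_re, samples[i]), matches (new_re, samples[i]));
  return 0;
}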

ChangeLog entry is as follows:

*** contrib/ChangeLog ***

2014-11-28  Thomas Preud'homme  

* check_GNU_style.sh: Warn for incorrect number of space in function
call only if 0 or 2+ spaces found.


diff --git a/contrib/check_GNU_style.sh b/contrib/check_GNU_style.sh
index ef8fdda..5f90190 100755
--- a/contrib/check_GNU_style.sh
+++ b/contrib/check_GNU_style.sh
@@ -113,7 +113,7 @@ g 'Sentences should end with a dot.  Dot, space, space, end of the comment.' \
 '[[:alnum:]][[:blank:]]*\*/' $*
 
 vg 'There should be exactly one space between function name and parentheses.' \
-'\#define' '[[:alnum:]]([^[:blank:]]|[[:blank:]]{2,})\(' $*
+'\#define' '[[:alnum:]]([[:blank:]]{2,})?\(' $*
 
 g 'There should be no space before closing parentheses.' \
 '[[:graph:]][[:blank:]]+\)' $*

Is this ok for trunk?

Best regards,

Thomas





[PATCH] Fix removing of df problem in df_finish_pass

2015-02-27 Thread Thomas Preud'homme
Hi,

In df_finish_pass, optional problems are removed manually, leaving the
non-null entries in df->problems_in_order non-contiguous. This may lead to a
null pointer dereference when accessing all problems from
df->problems_in_order[0] to
df->problems_in_order[df->num_problems_defined - 1], and to some other
problems being missed. Such a scenario was actually encountered while working
on a patch. This patch uses the existing function df_remove_problem to do the
deletion, which requires iterating over problems via the
df->problems_by_index[] array, since each call changes
df->num_problems_defined and the order of problems in
df->problems_in_order[].
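
To illustrate, here is a simplified standalone model of mine (not GCC code;
the real arrays and df_remove_problem live in df-core.c) of the two arrays
and of the by-index walk the patch switches to:

#include <stdio.h>
#include <stdbool.h>

#define LAST_PROBLEM_PLUS1 4

struct dataflow { int id; bool optional_p; };

static struct dataflow *problems_by_index[LAST_PROBLEM_PLUS1]; /* sparse, keyed by id */
static struct dataflow *problems_in_order[LAST_PROBLEM_PLUS1]; /* dense, creation order */
static int num_problems_defined;

static void
add_problem (struct dataflow *d)
{
  problems_by_index[d->id] = d;
  problems_in_order[num_problems_defined++] = d;
}

/* Illustration-only analogue of df_remove_problem: it clears the by-index
   slot and compacts the dense array, so num_problems_defined and the order
   of entries change under the feet of any loop walking problems_in_order.  */
static void
remove_problem (struct dataflow *d)
{
  int i = 0;
  problems_by_index[d->id] = NULL;
  while (problems_in_order[i] != d)
    i++;
  for (; i + 1 < num_problems_defined; i++)
    problems_in_order[i] = problems_in_order[i + 1];
  problems_in_order[--num_problems_defined] = NULL;
}

int
main (void)
{
  struct dataflow a = { 0, false }, b = { 1, true }, c = { 2, true };
  add_problem (&a);
  add_problem (&b);
  add_problem (&c);

  /* Walking by id, as the patch does, is immune to the compaction done by
     remove_problem; the NULL check skips ids that were never defined.  */
  for (int i = 0; i < LAST_PROBLEM_PLUS1; i++)
    {
      struct dataflow *d = problems_by_index[i];
      if (d && d->optional_p)
        remove_problem (d);
    }

  printf ("%d problem(s) left\n", num_problems_defined); /* prints 1 */
  return 0;
}

Walking problems_in_order[] while calling remove_problem would skip the entry
shifted into the slot just vacated by each removal; the old code avoided that
by NULLing entries in place instead, but that is what left the holes described
above.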

ChangeLog entry is as follows:

2015-02-12  Thomas Preud'homme  

* df-core.c (df_finish_pass): Iterate over df->problems_by_index[] and
use df_remove_problem rather than manually removing problems, living
holes in df->problems_in_order[].

diff --git a/gcc/df-core.c b/gcc/df-core.c
index 82f1364..67040a1 100644
--- a/gcc/df-core.c
+++ b/gcc/df-core.c
@@ -642,7 +642,6 @@ void
 df_finish_pass (bool verify ATTRIBUTE_UNUSED)
 {
   int i;
-  int removed = 0;
 
 #ifdef ENABLE_DF_CHECKING
   int saved_flags;
@@ -658,21 +657,15 @@ df_finish_pass (bool verify ATTRIBUTE_UNUSED)
   saved_flags = df->changeable_flags;
 #endif
 
-  for (i = 0; i < df->num_problems_defined; i++)
+  /* We iterate over problems by index as each problem removed will
+     cause problems_in_order to be reordered.  */
+  for (i = 0; i < DF_LAST_PROBLEM_PLUS1; i++)
 {
-  struct dataflow *dflow = df->problems_in_order[i];
-  struct df_problem *problem = dflow->problem;
+  struct dataflow *dflow = df->problems_by_index[i];
 
-  if (dflow->optional_p)
-   {
- gcc_assert (problem->remove_problem_fun);
- (problem->remove_problem_fun) ();
- df->problems_in_order[i] = NULL;
- df->problems_by_index[problem->id] = NULL;
- removed++;
-   }
+  if (dflow && dflow->optional_p)
+   df_remove_problem (dflow);
 }
-  df->num_problems_defined -= removed;
 
   /* Clear all of the flags.  */
   df->changeable_flags = 0;


The testsuite was run with a bootstrapped x86_64 native compiler and an
arm-none-eabi GCC cross-compiler targeting Cortex-M3, without any
regression.

Although the problem is real, it doesn't seem that GCC hits it at the moment
(I stumbled upon it while working on a patch). Therefore I'm not sure
whether this should go in during stage4 or not. Please advise me on this.

Ok for trunk/stage1?

Best regards,

Thomas





RE: [PATCH] Fix removing of df problem in df_finish_pass

2015-03-02 Thread Thomas Preud'homme
> From: Bernhard Reutner-Fischer [mailto:rep.dot@gmail.com]
> Sent: Saturday, February 28, 2015 4:00 AM
> >   use df_remove_problem rather than manually removing problems,
> living
> 
> leaving

Indeed. Please find the updated ChangeLog below:

2015-03-03  Thomas Preud'homme  

* df-core.c (df_finish_pass): Iterate over df->problems_by_index[] and
use df_remove_problem rather than manually removing problems, leaving
holes in df->problems_in_order[].

Best regards,

Thomas 






RE: [PATCH, ARM] Fix PR64453: live high register not saved in function prolog with -Os

2015-03-03 Thread Thomas Preud'homme
Just committed to the 4.9 branch; 4.8 will follow once the regression
testsuite for the 4.8 backport finishes running (the backport was done quite
some time ago now).

Best regards,

Thomas

> -----Original Message-----
> From: Ramana Radhakrishnan [mailto:ramana@googlemail.com]
> Sent: Tuesday, February 17, 2015 4:07 PM
> To: Thomas Preud'homme
> Cc: Ramana Radhakrishnan; gcc-patches; Richard Biener; Jakub Jelinek
> Subject: Re: [PATCH, ARM] Fix PR64453: live high register not saved in
> function prolog with -Os
> 
> On Fri, Jan 23, 2015 at 8:23 AM, Thomas Preud'homme
>  wrote:
> > Hi Ramana,
> >
> >> From: Ramana Radhakrishnan [mailto:ramana@googlemail.com]
> >> Sent: Wednesday, January 14, 2015 7:21 PM
> >> On Wed, Jan 14, 2015 at 10:20 AM, Thomas Preud'homme
> >>  wrote:
> >> > When compiling for size, live high registers are not saved in the
> >> > function prolog by the ARM backend in Thumb mode. The problem comes
> >> > from arm_conditional_register_usage setting call_used_regs for all
> >> > high registers to avoid them being allocated. However, this causes
> >> > the prolog to not save these registers even if they are used. This
> >> > patch marks high registers as really needing to be saved in the
> >> > prolog if live, no matter what the content of call_used_regs is.
> >> >
> >> > ChangeLog entries are as follows:
> >> >
> >> > gcc/ChangeLog
> >> >
> >> > 2015-01-12 Thomas Preud'homme thomas.preudho...@arm.com
> >> >
> >> > PR target/64453
> >> > * config/arm/arm.c (callee_saved_reg_p): Define.
> >> >     (arm_compute_save_reg0_reg12_mask): Use
> callee_saved_reg_p
> >> to check if
> >> > register is callee saved instead of !call_used_regs[reg].
> >> > (thumb1_compute_save_reg_mask): Likewise.
> >> >
> >> >
> >> > gcc/testsuite/ChangeLog
> >> >
> >> > 2014-12-31 Thomas Preud'homme thomas.preudho...@arm.com
> >> >
> >> > * gcc.target/arm/pr64453.c: New.
> >> >
> >> >
> >> >
> >>
> >> OK.
> >>
> >> Ramana
> >
> > The patch applies cleanly on the GCC 4.8 and 4.9 branches when omitting
> > the cosmetic change in arm_conditional_register_usage (), which was
> > unintended. I compiled an arm-none-eabi GCC cross compiler and ran the
> > testsuite for both backports without any regression.
> >
> > Is this ok for the 4.8 and 4.9 branches?
> >
> 
> OK for the branches if no RM objects in 24 hours.
> 
> Ramana
> 
> > Best regards,
> >
> > Thomas





