Re: [PATCH, GCC] Fix PR67781: wrong code generation for partial load on big endian targets
On Friday, January 08, 2016 10:05:25 AM Richard Biener wrote: > On Tue, 5 Jan 2016, Thomas Preud'homme wrote: > > Hi, > > > > The bswap optimization pass generates wrong code on big endian targets when the > > result of a bit operation it analyzes is a partial load of the range of > > memory accessed by the original expression (when one or more bytes at > > the lowest address were lost in the computation). This is due to the way > > cmpxchg and cmpnop are adjusted in find_bswap_or_nop before being > > compared to the result of the symbolic expression. Part of the adjustment > > is endian independent: it's to ignore the bytes that were not accessed by > > the original gimple expression. However, when the result has fewer bytes > > than the original expression, some more bytes need to be ignored and this > > is endian dependent. > > > > The current code only supports loss of bytes at the highest addresses > > because there is no code to adjust the address of the load. However, for > > little and big endian targets the bytes at the highest address translate into > > different byte significance in the result. This patch first separates the > > cmpxchg and cmpnop adjustment into two steps and then deals with endianness > > correctly for the second step. > > > > ChangeLog entries are as follows: > > > > > > *** gcc/ChangeLog *** > > > > 2015-12-16 Thomas Preud'homme > > > > PR tree-optimization/67781 > > * tree-ssa-math-opts.c (find_bswap_or_nop): Zero out bytes in > > cmpxchg > > and cmpnop in two steps: first the ones not accessed in the original > > gimple expression in an endian-independent way and then the ones > > not > > accessed in the final result in an endian-specific way. > > > > *** gcc/testsuite/ChangeLog *** > > > > 2015-12-16 Thomas Preud'homme > > > > PR tree-optimization/67781 > > * gcc.c-torture/execute/pr67781.c: New file. 
> > > > diff --git a/gcc/testsuite/gcc.c-torture/execute/pr67781.c > > b/gcc/testsuite/gcc.c-torture/execute/pr67781.c > > new file mode 100644 > > index 000..bf50aa2 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.c-torture/execute/pr67781.c > > @@ -0,0 +1,34 @@ > > +#ifdef __UINT32_TYPE__ > > +typedef __UINT32_TYPE__ uint32_t; > > +#else > > +typedef unsigned uint32_t; > > +#endif > > + > > +#ifdef __UINT8_TYPE__ > > +typedef __UINT8_TYPE__ uint8_t; > > +#else > > +typedef unsigned char uint8_t; > > +#endif > > + > > +struct > > +{ > > + uint32_t a; > > + uint8_t b; > > +} s = { 0x123456, 0x78 }; > > + > > +int pr67781() > > +{ > > + uint32_t c = (s.a << 8) | s.b; > > + return c; > > +} > > + > > +int > > +main () > > +{ > > + if (sizeof (uint32_t) * __CHAR_BIT__ != 32) > > +return 0; > > + > > + if (pr67781 () != 0x12345678) > > +__builtin_abort (); > > + return 0; > > +} > > diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c > > index b00f046..e5a185f 100644 > > --- a/gcc/tree-ssa-math-opts.c > > +++ b/gcc/tree-ssa-math-opts.c > > @@ -2441,6 +2441,8 @@ find_bswap_or_nop_1 (gimple *stmt, struct > > symbolic_number *n, int limit) > > > > static gimple * > > find_bswap_or_nop (gimple *stmt, struct symbolic_number *n, bool *bswap) > > { > > > > + unsigned rsize; > > + uint64_t tmpn, mask; > > > > /* The number which the find_bswap_or_nop_1 result should match in order > > > > to have a full byte swap. The number is shifted to the right > > according to the size of the symbolic number before using it. */ > > > > @@ -2464,24 +2466,38 @@ find_bswap_or_nop (gimple *stmt, struct > > symbolic_number *n, bool *bswap) > > > >/* Find real size of result (highest non-zero byte). 
*/ > >if (n->base_addr) > > > > -{ > > - int rsize; > > - uint64_t tmpn; > > - > > - for (tmpn = n->n, rsize = 0; tmpn; tmpn >>= BITS_PER_MARKER, > > rsize++); - n->range = rsize; > > -} > > +for (tmpn = n->n, rsize = 0; tmpn; tmpn >>= BITS_PER_MARKER, > > rsize++); > > + else > > +rsize = n->range; > > > > - /* Zero out the extra bits of N and CMP*. */ > > + /* Zero out the bits corresponding to untouched bytes in original > > gimple > > + expression. */
Re: [PATCH, GCC] Fix PR67781: wrong code generation for partial load on big endian targets
On Thursday, January 21, 2016 09:21:52 AM Richard Biener wrote: > On Thu, 21 Jan 2016, Thomas Preud'homme wrote: > > On Friday, January 08, 2016 10:05:25 AM Richard Biener wrote: > > > On Tue, 5 Jan 2016, Thomas Preud'homme wrote: > > > > Hi, > > > > > > > > bswap optimization pass generate wrong code on big endian targets when > > > > the > > > > result of a bit operation it analyzed is a partial load of the range > > > > of > > > > memory accessed by the original expression (when one or more bytes at > > > > lowest address were lost in the computation). This is due to the way > > > > cmpxchg and cmpnop are adjusted in find_bswap_or_nop before being > > > > compared to the result of the symbolic expression. Part of the > > > > adjustment > > > > is endian independent: it's to ignore the bytes that were not accessed > > > > by > > > > the original gimple expression. However, when the result has less byte > > > > than that original expression, some more byte need to be ignored and > > > > this > > > > is endian dependent. > > > > > > > > The current code only support loss of bytes at the highest addresses > > > > because there is no code to adjust the address of the load. However, > > > > for > > > > little and big endian targets the bytes at highest address translate > > > > into > > > > different byte significance in the result. This patch first separate > > > > cmpxchg and cmpnop adjustement into 2 steps and then deal with > > > > endianness > > > > correctly for the second step. 
> > > > > > > > ChangeLog entries are as follow: > > > > > > > > > > > > *** gcc/ChangeLog *** > > > > > > > > 2015-12-16 Thomas Preud'homme > > > > > > > > PR tree-optimization/67781 > > > > * tree-ssa-math-opts.c (find_bswap_or_nop): Zero out bytes in > > > > cmpxchg > > > > and cmpnop in two steps: first the ones not accessed in > > > > original > > > > gimple expression in a endian independent way and then the > > > > ones > > > > not > > > > accessed in the final result in an endian-specific way. > > > > > > > > *** gcc/testsuite/ChangeLog *** > > > > > > > > 2015-12-16 Thomas Preud'homme > > > > > > > > PR tree-optimization/67781 > > > > * gcc.c-torture/execute/pr67781.c: New file. > > > > > > > > diff --git a/gcc/testsuite/gcc.c-torture/execute/pr67781.c > > > > b/gcc/testsuite/gcc.c-torture/execute/pr67781.c > > > > new file mode 100644 > > > > index 000..bf50aa2 > > > > --- /dev/null > > > > +++ b/gcc/testsuite/gcc.c-torture/execute/pr67781.c > > > > @@ -0,0 +1,34 @@ > > > > +#ifdef __UINT32_TYPE__ > > > > +typedef __UINT32_TYPE__ uint32_t; > > > > +#else > > > > +typedef unsigned uint32_t; > > > > +#endif > > > > + > > > > +#ifdef __UINT8_TYPE__ > > > > +typedef __UINT8_TYPE__ uint8_t; > > > > +#else > > > > +typedef unsigned char uint8_t; > > > > +#endif > > > > + > > > > +struct > > > > +{ > > > > + uint32_t a; > > > > + uint8_t b; > > > > +} s = { 0x123456, 0x78 }; > > > > + > > > > +int pr67781() > > > > +{ > > > > + uint32_t c = (s.a << 8) | s.b; > > > > + return c; > > > > +} > > > > + > > > > +int > > > > +main () > > > > +{ > > > > + if (sizeof (uint32_t) * __CHAR_BIT__ != 32) > > > > +return 0; > > > > + > > > > + if (pr67781 () != 0x12345678) > > > > +__builtin_abort (); > > > > + return 0; > > > > +} > > > > diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c > > > > index b00f046..e5a185f 100644 > > > > --- a/gcc/tree-ssa-math-opts.c > > > > +++ b/gcc/tree-ssa-math-opts.c > > > > @@ -2441,6 +2441,8 @@ find_bswap_or_nop_1 (gimple 
*stmt, struct > > > > symbolic_number *n, int limit) > > > > > > > > static gimple * > &
Re: [PATCH, testsuite] Fix g++.dg/pr67989.C test failure when running with -march or -mcpu
Ping? On Monday, January 18, 2016 11:33:47 AM Thomas Preud'homme wrote: > On Wednesday, January 13, 2016 06:39:20 PM Bernd Schmidt wrote: > > On 01/12/2016 08:55 AM, Thomas Preud'homme wrote: > > > On Monday, January 11, 2016 04:57:18 PM Bernd Schmidt wrote: > > >> On 01/08/2016 10:33 AM, Thomas Preud'homme wrote: > > >>> 2016-01-08 Thomas Preud'homme > > >>> > > >>> * g++.dg/pr67989.C: Remove ARM-specific option. > > >>> * gcc.target/arm/pr67989.C: New file. > > >> > > >> I checked some other arm tests and they have things like > > >> > > >> /* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { > > >> "-march=*" } { "-march=armv4t" } } */ > > >> /* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { > > >> "-mthumb" } { "" } } */ > > >> > > >> Do you need the same in your testcase? > > > > > > That was the first approach I took but Kyrill suggested I use the > > > arm_arch_v4t and arm_arch_v4t_ok machinery instead. It should take care > > > of whether the architecture can be selected. > > > > Hmm, the ones I looked at did use dg-add-options, but not the > > corresponding _ok requirement. So I think this is OK. > > Just to make sure: ok as in OK to commit as is? > > Best regards, > > Thomas
Re: bswap PRs 69714, 67781
Hi Bernd, First of all, my apologies for the late reply. I was on holiday last week to celebrate Chinese New Year. On Friday, February 12, 2016 05:28:43 PM Bernd Schmidt wrote: > PR69714 is an issue where the bswap pass makes an incorrect > transformation on big-endian targets. The source has a 32-bit bswap, but > PA doesn't have a pattern for that. Still, we recognize that there is a > 16-bit bswap involved, and generate code for that - loading the halfword > at offset 2 from the original memory, as per the proper big-endian > correction. > > The problem is that we recognized the rotation of the _high_ part, which > is at offset 0 on big-endian. The symbolic number is 0x0304, rather than > 0x0102 as it should be. Only the latter form should ever be matched. Which is exactly what the patch for PR67781 set out to do (see the if (BYTES_BIG_ENDIAN) block in find_bswap_or_nop). The reason why the offset is wrong is due to another if (BYTES_BIG_ENDIAN) block in bswap_replace. I will check the testcase added with that latter block; my guess is that the change was trying to fix an issue similar to PR67781 and PR69714. With it removed, the load in avcrc is done without an offset. I should have run the full testsuite on a big endian system as well, instead of only a few selected testcases and a bootstrap in addition to the little endian bootstrap+testsuite. Lesson learned. > The > problem is caused by the patch for PR67781, which was intended to solve > a different big-endian problem. Unfortunately, I think it is based on an > incorrect analysis. > > The real issue with the PR67781 testcase is in fact the masking loop, > identified by Thomas in comment #7 for 67781. > > for (tmpn = n->n, rsize = 0; tmpn; tmpn >>= BITS_PER_MARKER, rsize++) >; > n->range = rsize; > > If we have a value of 0x01020304, but a range of 5, it means that > there's an "invisible" high-order byte that we don't care about. On > little-endian, we can just ignore it. 
On big-endian, this implies that > the data we're interested in is located at an offset. The code that does > the replacements does not use the offset or bytepos fields, it assumes > that the bytepos always matches that of the load instruction. Yes, but the change in find_bswap_or_nop aims at checking that we have 0x05040302 or 0x02030405 for big endian targets and 0x04030201 or 0x01020304 for little endian targets. Before the "if (rsize < n->range)" block, cmpnop and cmpxchg are respectively 0x0504030201 and 0x0102030405. Then for big endian it will only keep the 4 least significant symbolic bytes of cmpxchg (it performs a bitwise and) and the 4 most significant symbolic bytes of cmpnop (it performs a right shift), so you'd get 0x05040302 for cmpnop and 0x02030405 for cmpxchg. Both would translate to a load at offset 0, and then a byteswap for the latter. As said earlier, the problem is in bswap_replace, which tries to adjust the address of the load for big endian targets by adding a load offset. With the change in find_bswap_or_nop, an offset is never needed because only patterns that correspond to a load at offset 0 are recognized. I kept for GCC 7 the change to allow offsets and recognize all sub-loads and sub-bswaps. > The only > offset we can introduce is the big-endian correction, but that assumes > we're always dealing with lowparts. > > So, I think the correct/conservative fix for both bugs is to revert the > earlier change for PR67781, and then apply the following on top: > > --- revert.tree-ssa-math-opts.c 2016-02-12 15:22:57.098895058 +0100 > +++ tree-ssa-math-opts.c 2016-02-12 15:23:08.482228474 +0100 > @@ -2473,10 +2473,14 @@ find_bswap_or_nop (gimple *stmt, struct > /* Find real size of result (highest non-zero byte). 
*/ > if (n->base_addr) > { > - int rsize; > + unsigned HOST_WIDE_INT rsize; > uint64_t tmpn; > > for (tmpn = n->n, rsize = 0; tmpn; tmpn >>= BITS_PER_MARKER, > rsize++); > + if (BYTES_BIG_ENDIAN && n->range != rsize) > + /* This implies an offset, which is currently not handled by > +bswap_replace. */ > + return NULL; > n->range = rsize; > } Yes, this works too, with fewer optimizations for big endian. I'm fine with either solution. This one is indeed a bit more conservative, so I see the appeal of using it for GCC 5 and 6. Best regards, Thomas
[PATCH] Obvious fix for PR66828: left shift with undefined behavior in bswap pass
The bswap pass contains the following loop:

  for (i = 0; i < size; i++, inc <<= BITS_PER_MARKER)

In the update to inc and i just before exiting the loop, inc can be shifted by a total of more than 62 bits, making the value too large to be represented by int64_t. This is undefined behavior [1] and it triggers an error under an ubsan bootstrap. This patch changes the type of inc to unsigned, removing the undefined behavior.

[1] C++98 standard, section 5.8 paragraph 2: "The value of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are zero-filled. If E1 has an unsigned type, the value of the result is E1 × 2^E2, reduced modulo one more than the maximum value representable in the result type. Otherwise, if E1 has a signed type and non-negative value, and E1 × 2^E2 is representable in the corresponding unsigned type of the result type, then that value, converted to the result type, is the resulting value; otherwise, the behavior is undefined."

ChangeLog entry is as follows:

2015-07-28  Thomas Preud'homme

	PR tree-optimization/66828
	* tree-ssa-math-opts.c (perform_symbolic_merge): Change type of inc
	from int64_t to uint64_t.

Testsuite was run on a native x86_64-linux-gnu bootstrapped GCC and an arm-none-eabi cross-compiler without any regression. Committed as obvious, as suggested by Markus Trippelsdorf in PR66828.

Best regards,

Thomas
RE: [PATCH] Obvious fix for PR66828: left shift with undefined behavior in bswap pass
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme > > ChangeLog entry is as follows: > > 2015-07-28 Thomas Preud'homme > > PR tree-optimization/66828 > * tree-ssa-math-opts.c (perform_symbolic_merge): Change type of > inc > from int64_t to uint64_t. And the patch is: diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c index 55382f3..c3098db 100644 --- a/gcc/tree-ssa-math-opts.c +++ b/gcc/tree-ssa-math-opts.c @@ -2122,7 +2122,7 @@ perform_symbolic_merge (gimple source_stmt1, struct symbolic_number *n1, the same base (array, structure, ...). */ if (gimple_assign_rhs1 (source_stmt1) != gimple_assign_rhs1 (source_stmt2)) { - int64_t inc; + uint64_t inc; HOST_WIDE_INT start_sub, end_sub, end1, end2, end; struct symbolic_number *toinc_n_ptr, *n_end; Best regards, Thomas
[PATCH, loop-invariant] Fix PR67043: -fcompare-debug failure with -O3
Hi, Since commit r223113, the loop-invariant pass relies on luids to determine if an invariant can be hoisted out of a loop without introducing temporaries. However, nothing is done to ensure luids are up-to-date. This patch adds a DF_LIVE problem and marks all blocks as dirty before using luids, to ensure they will be recomputed. ChangeLog entries are as follows: 2015-07-31 Thomas Preud'homme PR tree-optimization/67043 * loop-invariant.c (find_defs): Force recomputation of all luids. 2015-07-29 Thomas Preud'homme PR tree-optimization/67043 * gcc.dg/pr67043.c: New test. Note: the testcase was heavily reduced from the Linux kernel sources by Markus Trippelsdorf and formatted to follow GNU code style. diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c index 1fdb84d..fc53e09 100644 --- a/gcc/loop-invariant.c +++ b/gcc/loop-invariant.c @@ -676,6 +676,8 @@ find_defs (struct loop *loop) df_remove_problem (df_chain); df_process_deferred_rescans (); df_chain_add_problem (DF_UD_CHAIN); + df_live_add_problem (); + df_live_set_all_dirty (); df_set_flags (DF_RD_PRUNE_DEAD_DEFS); df_analyze_loop (loop); check_invariant_table_size (); diff --git a/gcc/testsuite/gcc.dg/pr67043.c b/gcc/testsuite/gcc.dg/pr67043.c new file mode 100644 index 000..36aa686 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr67043.c @@ -0,0 +1,32 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -fcompare-debug -w" } */ + +extern void rt_mutex_owner (void); +extern void rt_mutex_deadlock_account_lock (int); +extern void signal_pending (void); +__typeof__ (int *) a; +int b; + +int +try_to_take_rt_mutex (int p1) { + rt_mutex_owner (); + if (b) +return 0; + rt_mutex_deadlock_account_lock (p1); + return 1; +} + +void +__rt_mutex_slowlock (int p1) { + int c; + for (;;) { +c = ({ + asm ("" : "=r"(a)); + a; +}); +if (try_to_take_rt_mutex (c)) + break; +if (__builtin_expect (p1 == 0, 0)) + signal_pending (); + } +} Patch was tested by running the testsuite against a bootstrapped native x86_64-linux-gnu GCC and against an 
arm-none-eabi GCC cross-compiler without any regression. Is this ok for trunk? Best regards, Thomas Preud'homme
RE: [PATCH] Obvious fix for PR66828: left shift with undefined behavior in bswap pass
Hi, > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme > Sent: Tuesday, July 28, 2015 3:04 PM > > > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > > ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme > > > > ChangeLog entry is as follows: > > > > 2015-07-28 Thomas Preud'homme > > > > PR tree-optimization/66828 > > * tree-ssa-math-opts.c (perform_symbolic_merge): Change type > of > > inc > > from int64_t to uint64_t. Can I backport this change to GCC 5 branch? The patch applies cleanly on GCC 5 and shows no regression on a native x86_64-linux-gnu bootstrapped GCC and an arm-none-eabi GCC cross-compiler. diff --git a/gcc/ChangeLog b/gcc/ChangeLog index ba37d96..a301c23 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,12 @@ +2015-08-04 Thomas Preud'homme + + Backport from mainline + 2015-07-28 Thomas Preud'homme + + PR tree-optimization/66828 + * tree-ssa-math-opts.c (perform_symbolic_merge): Change type of inc + from int64_t to uint64_t. + 2015-08-03 John David Anglin PR target/67060 diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c index c22a677..c699dcadb 100644 --- a/gcc/tree-ssa-math-opts.c +++ b/gcc/tree-ssa-math-opts.c @@ -1856,7 +1856,7 @@ perform_symbolic_merge (gimple source_stmt1, struct symbolic_number *n1, the same base (array, structure, ...). */ if (gimple_assign_rhs1 (source_stmt1) != gimple_assign_rhs1 (source_stmt2)) { - int64_t inc; + uint64_t inc; HOST_WIDE_INT start_sub, end_sub, end1, end2, end; struct symbolic_number *toinc_n_ptr, *n_end; Best regards, Thomas
FW: [PATCH, ARM/testsuite] Fix thumb2-slow-flash-data.c failures
[Forwarding to gcc-patches, doh!] Best regards, Thomas --- Begin Message --- Hi, The ARM-specific thumb2-slow-flash-data.c testcase shows 2 failures when running for arm-none-eabi with -mcpu=cortex-m7: FAIL: gcc.target/arm/thumb2-slow-flash-data.c (test for excess errors) FAIL: gcc.target/arm/thumb2-slow-flash-data.c scan-assembler-times movt 13 The first one is due to a missing type specifier in the declaration of labelref, while the second one is due to different constant synthesis as a result of different tuning for the selected CPU. This patch fixes these issues by adding the missing type specifier and checking for .word and similar directives instead of the number of movt instructions. The new test passes for all of -mcpu=cortex-m{3,4,7} but fails when removing the -mslow-flash-data switch. ChangeLog entry is as follows: *** gcc/testsuite/ChangeLog *** 2015-11-04 Thomas Preud'homme * gcc.target/arm/thumb2-slow-flash-data.c: Add missing typespec for labelref and check use of constant pool by looking for .word and similar directives. diff --git a/gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data.c b/gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data.c index 9852ea5..089a72b 100644 --- a/gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data.c +++ b/gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data.c @@ -50,7 +50,7 @@ int foo (int a, int b) { int i; - volatile *labelref = &&label1; + volatile int *labelref = &&label1; if (a > b) { @@ -70,5 +70,4 @@ label1: return a + b; } -/* { dg-final { scan-assembler-times "movt" 13 } } */ -/* { dg-final { scan-assembler-times "movt.*LC0\\+4" 1 } } */ +/* { dg-final { scan-assembler-not "\\.(float|l\\?double|\d?byte|short|int|long|quad|word)\\s+\[^.\]" } } */ Is this ok for trunk? Best regards, Thomas --- End Message ---
[arm-embedded][PATCHv2, ARM, libgcc] New aeabi_idiv function for armv6-m
We decided to apply this to ARM/embedded-5-branch. Best regards, Thomas > -Original Message- > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Andre Vieira > Sent: Wednesday, October 28, 2015 1:03 AM > To: gcc-patches@gcc.gnu.org > Subject: Re: [PING][PATCHv2, ARM, libgcc] New aeabi_idiv function for > armv6-m > > Ping. > > BR, > Andre > > On 13/10/15 18:01, Andre Vieira wrote: > > This patch ports the aeabi_idiv routine from Linaro Cortex-Strings > > (https://git.linaro.org/toolchain/cortex-strings.git), which was > > contributed by ARM under Free BSD license. > > > > The new aeabi_idiv routine is used to replace the one in > > libgcc/config/arm/lib1funcs.S. This replacement happens within the > > Thumb1 wrapper. The new routine is under LGPLv3 license. > > > > The main advantage of this version is that it can improve the > > performance of the aeabi_idiv function for Thumb1. This solution will > > also increase the code size. So it will only be used if > > __OPTIMIZE_SIZE__ is not defined. > > > > Make check passed for armv6-m. > > > > libgcc/ChangeLog: > > 2015-08-10 Hale Wang > > Andre Vieira > > > > * config/arm/lib1funcs.S: Add new wrapper. > >
[PATCH, testsuite] Fix PR68629: attr-simd-3.c failure on arm-none-eabi targets
c-c++-common/attr-simd-3.c fails to compile on arm-none-eabi targets due to -fcilkplus needing -pthread, which is not available for those targets. This patch solves the issue by making the cilkplus effective target also check that compiling with -fcilkplus succeeds, and by requiring the cilkplus effective target for the attr-simd-3.c testcase. ChangeLog entry is as follows: *** gcc/testsuite/ChangeLog *** 2015-12-08 Thomas Preud'homme PR testsuite/68629 * lib/target-supports.exp (check_effective_target_cilkplus): Also check that compiling with -fcilkplus does not give an error. * c-c++-common/attr-simd-3.c: Require cilkplus effective target. diff --git a/gcc/testsuite/c-c++-common/attr-simd-3.c b/gcc/testsuite/c-c++-common/attr-simd-3.c index d61ba82..1970c67 100644 --- a/gcc/testsuite/c-c++-common/attr-simd-3.c +++ b/gcc/testsuite/c-c++-common/attr-simd-3.c @@ -1,4 +1,5 @@ /* { dg-do compile } */ +/* { dg-require-effective-target "cilkplus" } */ /* { dg-options "-fcilkplus" } */ /* { dg-prune-output "undeclared here \\(not in a function\\)|\[^\n\r\]* was not declared in this scope" } */ diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index 4e349e9..95b903c 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -1432,7 +1432,12 @@ proc check_effective_target_cilkplus { } { if { [istarget avr-*-*] } { return 0; } -return 1 +return [ check_no_compiler_messages_nocache fcilkplus_available executable { + #ifdef __cplusplus + extern "C" + #endif + int dummy; + } "-fcilkplus" ] } proc check_linker_plugin_available { } { Testsuite shows no regression when run with + an arm-none-eabi GCC cross-compiler targeting Cortex-M3 + a bootstrapped x86_64-linux-gnu GCC native compiler Is this ok for trunk? Best regards, Thomas
[PATCH, testsuite] Fix PR68632: gcc.target/arm/lto/pr65837 failure on M profile ARM targets
gcc.target/arm/lto/pr65837 fails on M profile ARM targets because of the lack of Neon instructions. This patch adds the necessary arm_neon_ok effective target requirement to avoid running this test on such targets.

ChangeLog entry is as follows:

*** gcc/testsuite/ChangeLog ***

2015-12-08  Thomas Preud'homme

	PR testsuite/68632
	* gcc.target/arm/lto/pr65837_0.c: Require arm_neon_ok effective
	target.

diff --git a/gcc/testsuite/gcc.target/arm/lto/pr65837_0.c b/gcc/testsuite/gcc.target/arm/lto/pr65837_0.c
index 000fc2a..fcc26a1 100644
--- a/gcc/testsuite/gcc.target/arm/lto/pr65837_0.c
+++ b/gcc/testsuite/gcc.target/arm/lto/pr65837_0.c
@@ -1,4 +1,5 @@
 /* { dg-lto-do run } */
+/* { dg-require-effective-target arm_neon_ok } */
 /* { dg-lto-options {{-flto -mfpu=neon}} } */
 /* { dg-suppress-ld-options {-mfpu=neon} } */

The testcase fails without the patch and succeeds with it. Is this ok for trunk?

Best regards,

Thomas
RE: [PATCH] Fix confusion between target, host and symbolic number byte sizes
I suppose people were busy when I posted this patch and it got forgotten since, so here is a little ping. Best regards, Thomas > -Original Message- > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme > Sent: Friday, July 04, 2014 12:53 PM > To: gcc-patches@gcc.gnu.org > Subject: [PATCH] Fix confusion between target, host and symbolic number > byte sizes > > The bswap pass deals with 3 possibly different byte sizes: host, target and the > size a byte marker occupies in the symbolic_number structure [1]. However, > as of now the code mixes the three sizes. This works in practice as the pass is > only enabled for targets with BITS_PER_UNIT == 8 and nobody runs GCC on a > host with CHAR_BIT != 8. As prompted by Jakub Jelinek, this patch fixes this > mess. Byte markers are 8-bit quantities (they could be made 4-bit quantities > but I preferred to keep the code working the same as before) for which a > new macro is introduced (BITS_PER_MARKER); anything related to storing > the value of a byte marker in a variable should check for the host byte size > or > wide integer size, and anything aimed at manipulating the target value should > check for BITS_PER_UNIT. > > > [1] Although the comment for this structure implies that a byte marker has the > same size as the host byte, the way it is used in the code (even before any of > my patches) shows that it uses a fixed size of 8 [2]. > [2] Note that since the pass is only active for targets with BITS_PER_UNIT == > 8, it might be using the target byte size. > > gcc/ChangeLog: > > 2014-07-04 Thomas Preud'homme > > * tree-ssa-math-opts.c (struct symbolic_number): Clarify comment > about > the size of byte markers. > (do_shift_rotate): Fix confusion between host, target and marker > byte > size. > (verify_symbolic_number_p): Likewise. > (find_bswap_or_nop_1): Likewise. > (find_bswap_or_nop): Likewise. 
> > > diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c > index ca2b30d..55c5df7 100644 > --- a/gcc/tree-ssa-math-opts.c > +++ b/gcc/tree-ssa-math-opts.c > @@ -1602,11 +1602,10 @@ make_pass_cse_sincos (gcc::context *ctxt) > > /* A symbolic number is used to detect byte permutation and selection > patterns. Therefore the field N contains an artificial number > - consisting of byte size markers: > + consisting of octet sized markers: > > - 0- byte has the value 0 > - 1..size - byte contains the content of the byte > - number indexed with that value minus one. > + 0- target byte has the value 0 > + 1..size - marker value is the target byte index minus one. > > To detect permutations on memory sources (arrays and structures), a > symbolic > number is also associated a base address (the array or structure the load > is > @@ -1631,6 +1630,8 @@ struct symbolic_number { >unsigned HOST_WIDE_INT range; > }; > > +#define BITS_PER_MARKER 8 > + > /* The number which the find_bswap_or_nop_1 result should match in > order to have a nop. The number is masked according to the size of > the symbolic number before using it. */ > @@ -1652,15 +1653,16 @@ do_shift_rotate (enum tree_code code, >struct symbolic_number *n, >int count) > { > - int bitsize = TYPE_PRECISION (n->type); > + int size = TYPE_PRECISION (n->type) / BITS_PER_UNIT; > > - if (count % 8 != 0) > + if (count % BITS_PER_UNIT != 0) > return false; > + count = (count / BITS_PER_UNIT) * BITS_PER_MARKER; > >/* Zero out the extra bits of N in order to avoid them being shifted > into the significant bits. */ > - if (bitsize < 8 * (int)sizeof (int64_t)) > -n->n &= ((uint64_t)1 << bitsize) - 1; > + if (size < 64 / BITS_PER_MARKER) > +n->n &= ((uint64_t) 1 << (size * BITS_PER_MARKER)) - 1; > >switch (code) > { > @@ -1670,22 +1672,22 @@ do_shift_rotate (enum tree_code code, > case RSHIFT_EXPR: >/* Arithmetic shift of signed type: result is dependent on the value. 
> */ >if (!TYPE_UNSIGNED (n->type) > - && (n->n & ((uint64_t) 0xff << (bitsize - 8 > + && (n->n & ((uint64_t) 0xff << ((size - 1) * BITS_PER_MARKER > return false; >n->n >>= count; >break; > case LROTATE_EXPR: > - n->n = (n->n << count) | (n->n >> (bitsize - count)); > + n->n = (n->n << count) | (n->n >> ((size * BITS_PER_MARKER) - count)); >break; > ca
[PATCH] Cancel bswap opt when intermediate stmts are reused
Hi all, Currently, when an expression doing a manual load or bswap is detected, the bswap optimization replaces it with an actual load and/or bswap instruction without considering whether the intermediate values computed in the expression are reused later. If that is the case, the construction of these values has to be retained, and the sum of the load and/or bswap instruction and the instructions to construct those values might be more expensive than the initial fully manual expression. This patch aims at detecting such a case and cancelling the bswap optimization. In addition, it takes advantage of the infrastructure necessary for the detection to also clean up the stmts that become useless when the bswap optimization is performed. *** gcc/ChangeLog *** 2014-08-01 Thomas Preud'homme * tree-ssa-math-opts.c (struct usedtree): New. (find_bswap_or_nop_1): Change prototype to take a hashtable and a list of struct usedtree. Fill respectively these with visited stmts and trees (along with the stmts where they are defined) that would not be defined if bswap optimization is applied. Adapt recursion call to prototype change. (find_bswap_or_nop): Adapt to find_bswap_or_nop_1 prototype change. Pass the hashtable and the list of struct usedtree from pass_optimize_bswap::execute (). (do_bswap_p): New. (bswap_replace): Fix typo in heading comment. Stop returning whether the bswap optimization was performed since this is now handled by do_bswap_p (). Move alignment check to do_bswap_p (). (free_usedtrees_list): New. (pass_optimize_bswap::execute): Add allocation and deallocation handling of the hashtable of browsed stmts. Free the list of struct usedtree at the end of each iteration using free_usedtrees_list () and the new bswap_check_end_iter label. Move some of the logic to perform the bswap optimization to do_bswap_p (). 
Set gsi after performing the bswap optimization and loop manually via the new bswap_check_start_iter label so that all stmts are checked for load/bswap even when cur_stmt is moved around by bswap_replace (). *** gcc/testsuite/ChangeLog *** 2014-08-01 Thomas Preud'homme * gcc.dg/optimize-bswapsi-2.c (read_le32_1): Add an intermediate variable in the mix to check that it is optimized when there is no use outside the expression computing a load/bswap. (read_be32_1): Likewise. * gcc.dg/optimize-bswapsi-3.c: New. Create read_le32_1 () and read_be32_1 () based on identically named function in gcc.dg/optimize-bswapsi-2.c but reusing the intermediate variable so as to check that bswap optimization is not performed in these cases. diff --git a/gcc/testsuite/gcc.dg/optimize-bswapsi-2.c b/gcc/testsuite/gcc.dg/optimize-bswapsi-2.c index de6e697..df7856b 100644 --- a/gcc/testsuite/gcc.dg/optimize-bswapsi-2.c +++ b/gcc/testsuite/gcc.dg/optimize-bswapsi-2.c @@ -14,7 +14,9 @@ struct uint32_st { uint32_t read_le32_1 (void) { - return data[0] | (data[1] << 8) | (data[2] << 16) | (data[3] << 24); + uint32_t low = data[0] | (data[1] << 8); + uint32_t ret = low | (data[2] << 16) | (data[3] << 24); + return ret; } uint32_t read_le32_2 (struct uint32_st data) @@ -30,7 +32,9 @@ uint32_t read_le32_3 (unsigned char *data) uint32_t read_be32_1 (void) { - return data[3] | (data[2] << 8) | (data[1] << 16) | (data[0] << 24); + uint32_t low = data[3] | (data[2] << 8); + uint32_t ret = low | (data[1] << 16) | (data[0] << 24); + return ret; } uint32_t read_be32_2 (struct uint32_st data) diff --git a/gcc/testsuite/gcc.dg/optimize-bswapsi-3.c b/gcc/testsuite/gcc.dg/optimize-bswapsi-3.c new file mode 100644 index 000..dc48d9d --- /dev/null +++ b/gcc/testsuite/gcc.dg/optimize-bswapsi-3.c @@ -0,0 +1,31 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target bswap32 } */ +/* { dg-require-effective-target stdint_types } */ +/* { dg-options "-O2 -fdump-tree-bswap" } */ +/* { dg-additional-options 
"-march=z900" { target s390-*-* } } */ + +#include + +extern unsigned char data[4]; + +/* No "bswap" optimization as low is reused */ +uint32_t read_le32_1 (unsigned char *data, uint32_t *low_neg) +{ + uint32_t low = data[0] | (data[1] << 8); + uint32_t ret = low | (data[2] << 16) | (data[3] << 24); + *low_neg = low; + return ret; +} + +/* No "bswap" optimization as low is reused */ +uint32_t read_be32_1 (unsigned char *data, uint32_t *low_neg) +{ + uint32_t low = data[3] | (data[2] << 8); + uint32_t ret = low | (data[1] << 16) | (data[0] << 24); + *low_neg = low; + return ret; +} + +/* { dg-final { scan-tree-dump-not "32 bit load in target endianness found at" "bswap" } } */ +/* { dg-fi
[PATCH] Fix incorrect folding of bitfield in a union on big endian target
In the code dealing with folding of structure and union initialization, there is a check that the size of the constructor is the same as the field being read. However, in the case of a bitfield this test can be wrong because it relies on TYPE_SIZE to get the size of the field being read, but TYPE_SIZE returns the size of the enclosing integer in that case. This patch also checks the size parameter, which contains the actual size of the field being read. The patch was tested by running the testsuite with three different builds of GCC: 1) bootstrap of GCC on x86_64-linux-gnu 2) arm-none-eabi cross compiler (defaulting to little endian) with testsuite run under qemu emulating a Cortex M4 3) arm-none-eabi cross compiler (defaulting to big endian, thanks to patch at [1]) with testsuite run under qemu emulating a Cortex M3. [1] https://sourceware.org/ml/binutils/2014-08/msg00014.html No regressions were observed on any of the tests. The ChangeLog is as follows: 2014-08-11 Thomas Preud'homme * gimple-fold.c (fold_ctor_reference): Don't fold in presence of bitfields, that is when size doesn't match the size of type or the size of the constructor. diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c index 3dcb576..6270c34 100644 --- a/gcc/gimple-fold.c +++ b/gcc/gimple-fold.c @@ -3099,7 +3099,9 @@ fold_ctor_reference (tree type, tree ctor, unsigned HOST_WIDE_INT offset, if (!AGGREGATE_TYPE_P (TREE_TYPE (ctor)) && !offset /* VIEW_CONVERT_EXPR is defined only for matching sizes. 
*/ && operand_equal_p (TYPE_SIZE (type), - TYPE_SIZE (TREE_TYPE (ctor)), 0)) + TYPE_SIZE (TREE_TYPE (ctor)), 0) + && !compare_tree_int (TYPE_SIZE (type), size) + && !compare_tree_int (TYPE_SIZE (TREE_TYPE (ctor)), size)) { ret = canonicalize_constructor_val (unshare_expr (ctor), from_decl); ret = fold_unary (VIEW_CONVERT_EXPR, type, ret); diff --git a/gcc/testsuite/gcc.c-torture/execute/bitfld-6.c b/gcc/testsuite/gcc.c-torture/execute/bitfld-6.c new file mode 100644 index 000..50927dc --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/execute/bitfld-6.c @@ -0,0 +1,23 @@ +union U +{ + const int a; + unsigned b : 20; +}; + +static union U u = { 0x12345678 }; + +/* Constant folding used to fail to account for endianness when folding a + union. */ + +int +main (void) +{ +#ifdef __BYTE_ORDER__ +#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ + return u.b - 0x45678; +#else + return u.b - 0x12345; +#endif +#endif + return 0; +} Is it ok for trunk? Best regards, Thomas
RE: [PATCH] Fix incorrect folding of bitfield in a union on big endian target
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme > > No regression were observed on any of the tests. The ChangeLog is as > follows: > > > 2014-08-11 Thomas Preud'homme > > * gimple-fold.c (fold_ctor_reference): Don't fold in presence of > bitfields, that is when size doesn't match the size of type or the > size of the constructor. The ChangeLog is incomplete. This was for gcc/ChangeLog. For gcc/testsuite/ChangeLog, there is: 2014-07-02 Thomas Preud'homme * gcc.c-torture/execute/bitfld-6.c: New test. Best regards, Thomas
[PATCH, ARM] Fix incorrect alignment of small values in minipool
Being 32 bits wide, constant pool entries are always filled as 32-bit quantities. This works fine for little endian systems but gives some incorrect results for big endian systems when the value is *accessed* with a mode smaller than 32 bits in size. Consider the value 0x1234 accessed as an HImode value. It would be output as 0x0 0x0 0x12 0x34 in a constant pool entry and the 16-bit load that would be done would load the value 0x0 into the register. The approach here is to transform the value so that it is output correctly, by shifting the value left so that the highest byte in its mode ends up in the highest byte of the 32-bit value being output. The patch was tested by running the testsuite with three different builds of GCC: 1) bootstrap of GCC on x86_64-linux-gnu 2) arm-none-eabi cross compiler (defaulting to little endian) with testsuite run under qemu emulating a Cortex M4 3) arm-none-eabi cross compiler (defaulting to big endian, thanks to patch at [1]) with testsuite run under qemu emulating a Cortex M3. Due to the processor used, the test constant-pool.c was not run as part of the testsuite but manually with -cpu=cortex-r4 [1] https://sourceware.org/ml/binutils/2014-08/msg00014.html The ChangeLog is as follows: *** gcc/ChangeLog *** 2014-07-14 Thomas Preud'homme * config/arm/arm.c (dump_minipool): Fix alignment in minipool of values whose size is less than MINIPOOL_FIX_SIZE on big endian target. *** gcc/testsuite/ChangeLog *** 2014-07-14 Thomas Preud'homme * gcc.target/arm/constant-pool.c: New test. diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 0146fe8..e4e0ef4 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -16507,6 +16507,15 @@ dump_minipool (rtx scan) fputc ('\n', dump_file); } + /* On big-endian target, make sure that padding for values whose +mode size is smaller than MINIPOOL_FIX_SIZE comes after. 
*/ + if (BYTES_BIG_ENDIAN && CONST_INT_P (mp->value)) + { + int byte_disp = mp->fix_size - GET_MODE_SIZE (mp->mode); + HOST_WIDE_INT val = INTVAL (mp->value); + mp->value = GEN_INT (val << (byte_disp * BITS_PER_UNIT)); + } + switch (mp->fix_size) { #ifdef HAVE_consttable_1 diff --git a/gcc/testsuite/gcc.target/arm/constant-pool.c b/gcc/testsuite/gcc.target/arm/constant-pool.c new file mode 100644 index 000..46a1534 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/constant-pool.c @@ -0,0 +1,28 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target arm_arm_ok } */ +/* { dg-options "-O1 -marm -mbig-endian" } */ + +unsigned short v = 0x5678; +int i; +int j = 0; +int *ptr = &j; + +int +func (void) +{ + for (i = 0; i < 1; ++i) +{ + *ptr = -1; + v = 0x1234; +} + return v; +} + +int +main (void) +{ + func (); + return v - 0x1234; +} + +/* { dg-final { scan-assembler-not ".word 4660" } } */ Is this ok for trunk? Best regards, Thomas
[PATCH, C/C++] Add -fno-float to forbid floating point data types
Hi, As mentioned in PR60070, there are many cases where a programmer wants to ensure that a program does not use any floating point data types. Another case to consider is when a target has several floating point ABIs and the user wants to ensure his/her code is compatible with all available floating point ABIs. Adding such a check also provides an opportunity to adapt the behavior of the compiler based on its result. This patch adds the new option -fno-float to request gcc to throw an error if any float or floating point operation is involved. The patch modifies the C and C++ front ends (others could be modified later if people request it) so as to throw an error whenever a keyword introducing a float type or a float literal is encountered. The check is added to the front end rather than the middle end as this allows doing the detection as the file is parsed rather than needing a pass. It also limits the check to only the places where a float can be declared instead of having to look at all gimple stmts. Finally, it allows catching some cases that would be absent from the middle end due to simplification or limited folding that the front end might do. Note though that things excluded by the preprocessor (think #ifdef) would not be analyzed. Note that the tests were written independently of the code so as to have more confidence in the patch. ChangeLogs are as follows: *** gcc/ChangeLog *** 2014-08-08 Thomas Preud'homme PR middle-end/60070 * doc/invoke.texi (fno-float): Add to the list of C options and explain its meaning. *** gcc/c/ChangeLog *** 2014-08-08 Thomas Preud'homme PR middle-end/60070 * c-decl.c (finish_declspecs): Throw an error if -fno-float is specified by user and a default complex is encountered. * c-parser.c (c_token_starts_typename): Throw an error if -fno-float is specified by user and a float type name is encountered. (c_parser_declspecs): Memorize token being tested. Throw an error if -fno-float is specified by user and a float declaration specifier is encountered. 
(c_parser_postfix_expression): Throw an error if -fno-float is specified by user and a float literal is encountered. *** gcc/c-family/ChangeLog *** 2014-08-08 Thomas Preud'homme PR middle-end/60070 * c.opt (ffloat): New option. *** gcc/cp/ChangeLog *** 2014-08-08 Thomas Preud'homme PR middle-end/60070 * decl.c (grokdeclarator): Throw an error if -fno-float is specified by user and a default complex is encountered. * parser.c (cp_parser_primary_expression): Throw an error if -fno-float is specified by user and a float literal is encountered. (cp_parser_simple_type_specifier): Throw an error if -fno-float is specified by user and a float type specifier is encountered. *** gcc/testsuite/ChangeLog *** 2014-08-08 Thomas Preud'homme PR middle-end/60070 * gcc.dg/fno-float.c: New test case. * g++.dg/diagnostic/fno-float.C: Likewise. diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt index c318cad..2d22e15 100644 --- a/gcc/c-family/c.opt +++ b/gcc/c-family/c.opt @@ -1079,6 +1079,10 @@ fnil-receivers ObjC ObjC++ Var(flag_nil_receivers) Init(1) Assume that receivers of Objective-C messages may be nil +ffloat +C C++ LTO Var(flag_no_float, 0) +Allow floating point data types to be used in C/C++ + flocal-ivars ObjC ObjC++ Var(flag_local_ivars) Init(1) Allow access to instance variables as if they were local declarations within instance method implementations. 
diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c index 2a4b439..e68f0f3 100644 --- a/gcc/c/c-decl.c +++ b/gcc/c/c-decl.c @@ -10078,6 +10078,10 @@ finish_declspecs (struct c_declspecs *specs) } else if (specs->complex_p) { + if (flag_no_float) + error_at (specs->locations[cdw_complex], + "use of floating points forbidden in this translation " + "unit (-fno-float)"); specs->typespec_word = cts_double; pedwarn (specs->locations[cdw_complex], OPT_Wpedantic, "ISO C does not support plain % meaning " diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c index e32bf04..74ac945 100644 --- a/gcc/c/c-parser.c +++ b/gcc/c/c-parser.c @@ -486,6 +486,15 @@ c_token_starts_typename (c_token *token) case CPP_KEYWORD: switch (token->keyword) { + case RID_FLOAT: + case RID_DOUBLE: + case RID_DFLOAT32: + case RID_DFLOAT64: + case RID_DFLOAT128: + if (flag_no_float) + error_at (token->location, "use of floating points forbidden in " + "this translation unit (-fno-float)"); + /* Fall through. */ case RID_UNSIGNED: case RID_LONG: case RID_INT128: @@ -494
RE: [PATCH, C/C++] Add -fno-float to forbid floating point data types
> From: Marc Glisse [mailto:marc.gli...@inria.fr] > Sent: Tuesday, August 12, 2014 5:47 PM > > > Are you sure you want something that strict? If there are floats in the > code but the compiler manages to get rid of them, it won't cause any > trouble with the ABI. Besides, if you #include a header that declares a > function returning a double, but you don't use that function, the > compilation will still fail. And if you only want to forbid float/double > at the source level, grep is a pretty good tool. I can't see a test for > __builtin_sqrt(42). No ObjC? You raise some valid points due to the use case in mind when doing the patch, which is code that aims to be independent of the float ABI in use on targets with several float ABIs. Now, that said, let's consider each point one by one. 1) Where to do the check Initially my approach was to detect whether a given execution unit is affected by the float ABI in use and was thus much less strict. It worked but needed some new hooks. After some discussion about this approach it appeared to be too complicated, difficult to make target independent and also risky. Most importantly, it seemed unnecessary: if you don't use floats at the interface you probably don't use them at all. Or at least, it shouldn't be difficult to make it so. And as you can see from the PR mentioned, such an approach can be useful for more use cases. About doing the test later, I believe it makes the feature a bit less useful. A float might be eliminated in a build due to an optimization but still remain in another build. I find it more useful to tell the user that such a line of code can lead to the FPU being used. It also seems more natural to test such a thing in the front end, as you process the file. However it's true that it prevents including math.h for instance. It might be a good idea to ignore prototypes. 
2) Need for such an option at all Grep might work but it does not give any hint to the compiler that no float is used and therefore that the file is independent of the float ABI. This is important since it allows the compiler to emit a special attribute to tell the linker about it. 3) __builtin_sqrt True, I shall check whether it works with builtins. Thanks for the advice. 4) Objective C As said in the description, I'm not opposed to adding other languages. It's easier to add another language than to remove one after the fact because very few people use it. Therefore I preferred to have just C and C++ for now, which is what I expect most of the users of such a switch to be interested in. Do you think I should add support for that language up front or can it wait for a later version of the patch once people have started to use it? > > I am not at all saying the approach is wrong, just making sure that's > what we want. Sure, I appreciate any constructive criticism. If you are unconvinced by my arguments I'd be happy to hear about the reasons. Best regards, Thomas
RE: [PATCH, ARM] Fix incorrect alignment of small values in minipool
> From: Richard Earnshaw > Sent: Monday, August 11, 2014 4:54 PM > > I think this is the wrong place for this sort of fix up. HFmode values > are fixed up in the consttable_4 pattern and it looks wrong to me that > HImode values are then fixed up in a different place. We should be > consistent and do all the fix ups in one location. Sorry for the delay in answering. The problem is that in the pattern for consttable_4 we don't have the information about the access mode for this entry. In the testcase along with this patch the rtx manipulated in the pattern is VOIDmode while the access mode is HImode. In dump_minipool on the other hand the information can be found in mp->mode. Best regards, Thomas
RE: [PATCH] Fix incorrect folding of bitfield in a union on big endian target
> From: Richard Biener [mailto:richard.guent...@gmail.com] > Sent: Monday, August 11, 2014 8:29 PM > > That's now extra compares (the operand_equal_p check now does > a check that can be derived transitively). > > So - ok with the operand_equal_p check removed. > > Also see if this can be backported. I checked and the bug is present in both the GCC 4.8 and 4.9 branches. The patch applies cleanly in both cases as well. I did the same testing as for trunk on the GCC 4.8 branch, that is: 1) bootstrap of GCC on x86_64-linux-gnu 2) arm-none-eabi cross compiler (defaulting to little endian) with testsuite run under qemu emulating a Cortex M4 3) arm-none-eabi cross compiler (defaulting to big endian, thanks to patch at [1]) with testsuite run under qemu emulating a Cortex M3. I'm going to do the same testing for GCC 4.9 now. May I commit the backports? Best regards, Thomas
RE: [PATCH, C/C++] Add -fno-float to forbid floating point data types
> From: Marek Polacek [mailto:pola...@redhat.com] > Sent: Tuesday, August 12, 2014 5:43 PM > On Tue, Aug 12, 2014 at 11:34:35AM +0200, Jakub Jelinek wrote: > > > > This looks wrong. c_token_starts_typename is just a function which tells > > you if certain token can start a typename, issuing diagnostics there doesn't > > make sense, that routine doesn't actually parse the token. You should > > diagnose it where you actually parse it. > > I'd say the proper place would be declspecs_add_type. Wouldn't that miss casts and sizeof for instance? It's true that c_token_starts_typename is not the place where the token is parsed but it seemed a more central place: it catches all these cases in one check. Ok maybe sizeof (float) should be ignored but I suppose if you use that you intend to store a float later. On the other hand I do want to distinguish float declared in prototypes from float declared elsewhere. Best regards, Thomas
RE: [PATCH] Fix confusion between target, host and symbolic number byte sizes
Ping? Best regards, Thomas > -Original Message- > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme > Sent: Thursday, August 07, 2014 1:57 PM > To: gcc-patches@gcc.gnu.org > Subject: RE: [PATCH] Fix confusion between target, host and symbolic > number byte sizes > > I suppose people were busy when I posted this patch and it got forgotten > since so here is a little ping. > > Best regards, > > Thomas > > > -Original Message- > > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > > ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme > > Sent: Friday, July 04, 2014 12:53 PM > > To: gcc-patches@gcc.gnu.org > > Subject: [PATCH] Fix confusion between target, host and symbolic number > > byte sizes > > > > The bswap pass deals with 3 possibly different byte size: host, target and > the > > size a byte marker occupied in the symbolic_number structure [1]. > However, > > as of now the code mixes the three size. This works in practice as the pass > is > > only enabled for target with BITS_PER_UNIT == 8 and nobody runs GCC on > a > > host with CHAR_BIT != 8. As prompted by Jakub Jelinek, this patch fixes this > > mess. Byte marker are 8-bit quantities (they could be made 4-bit quantities > > but I prefered to keep the code working the same as before) for which a > > new macro is introduced (BITS_PER_MARKERS), anything related to storing > > the value or a byte marker in a variable should check for the host byte size > or > > wide integer size and anything aimed at manipulating the target value > should > > check for BITS_PER_UNIT. > > > > > > [1] Although the comment for this structure implies that a byte marker as > the > > same size as the host byte, the way it is used in the code (even before any > of > > my patch) shows that it uses a fixed size of 8 [2]. > > [2] Note that since the pass is only active for targets with BITS_PER_UNIT > == > > 8, it might be using the target byte size. 
> > > > gcc/ChangeLog: > > > > 2014-07-04 Thomas Preud'homme > > > > * tree-ssa-math-opts.c (struct symbolic_number): Clarify comment > > about > > the size of byte markers. > > (do_shift_rotate): Fix confusion between host, target and marker > > byte > > size. > > (verify_symbolic_number_p): Likewise. > > (find_bswap_or_nop_1): Likewise. > > (find_bswap_or_nop): Likewise. > > > > > > diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c > > index ca2b30d..55c5df7 100644 > > --- a/gcc/tree-ssa-math-opts.c > > +++ b/gcc/tree-ssa-math-opts.c > > @@ -1602,11 +1602,10 @@ make_pass_cse_sincos (gcc::context *ctxt) > > > > /* A symbolic number is used to detect byte permutation and selection > > patterns. Therefore the field N contains an artificial number > > - consisting of byte size markers: > > + consisting of octet sized markers: > > > > - 0- byte has the value 0 > > - 1..size - byte contains the content of the byte > > - number indexed with that value minus one. > > + 0- target byte has the value 0 > > + 1..size - marker value is the target byte index minus one. > > > > To detect permutations on memory sources (arrays and structures), a > > symbolic > > number is also associated a base address (the array or structure the > > load > is > > @@ -1631,6 +1630,8 @@ struct symbolic_number { > >unsigned HOST_WIDE_INT range; > > }; > > > > +#define BITS_PER_MARKER 8 > > + > > /* The number which the find_bswap_or_nop_1 result should match in > > order to have a nop. The number is masked according to the size of > > the symbolic number before using it. 
*/ > > @@ -1652,15 +1653,16 @@ do_shift_rotate (enum tree_code code, > > struct symbolic_number *n, > > int count) > > { > > - int bitsize = TYPE_PRECISION (n->type); > > + int size = TYPE_PRECISION (n->type) / BITS_PER_UNIT; > > > > - if (count % 8 != 0) > > + if (count % BITS_PER_UNIT != 0) > > return false; > > + count = (count / BITS_PER_UNIT) * BITS_PER_MARKER; > > > >/* Zero out the extra bits of N in order to avoid them being shifted > > into the significant bits. */ > > - if (bitsize < 8 * (int)sizeof (int64_t)) > > -n->n &= ((uint64_t)1 << bitsize) - 1; > > + if (size < 64 / BITS_PER_MARKER) > > +n-&g
RE: [PATCH] Cancel bswap opt when intermediate stmts are reused
Ping? > -Original Message- > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme > Sent: Thursday, August 07, 2014 6:56 PM > To: gcc-patches@gcc.gnu.org > Subject: [PATCH] Cancel bswap opt when intermediate stmts are reused > > Hi all, > > Currently, when an expression doing manual load or bswap is detected, the > bswap optimization replace it by an actual load and/or bswap instruction > without considering whether the intermediate values computed in the > expressions are reused later. If that is the case, the construction of these > values has to be retained and the sum of the load and/or bswap instruction > and the instruction to contruct those values might be more expensive than > the initial fully manual expression. This patch aims at detecting such a case > and cancel the bswap optimization. In addition, it takes advantage of the > infrastructure necessary for the detection to also cleanup the stmts that > become useless when the bswap optimization is performed. > > *** gcc/ChangeLog *** > > 2014-08-01 Thomas Preud'homme > > * tree-ssa-math-opts.c (struct usedtree): New. > (find_bswap_or_nop_1): Change prototype to take a hashtable and > a list > of struct usedtree. Fill respectively these with visited stmts and > trees (along with the stmts where they are defined) that would not > be > defined if bswap optimization is applied. Adapt recursion call to > prototype change. > (find_bswap_or_nop): Adapt to find_bswap_or_nop_1 prototype > change. > Pass the hashtable and the list of struct usedtree from > pass_optimize_bswap::execute (). > (do_bswap_p): New. > (bswap_replace): Fix typo in heading comment. Stop returning > whether > the bswap optimization was performed since this is now handled by > do_bswap_p (). Move alignment check to do_bswap_p (). > (free_usedtrees_list): New. > (pass_optimize_bswap::execute): Add allocation and deallocation > handling of the hashtable of browsed stmts. 
Free the list of struct > usedtree at the end of each iteration using free_usedtrees_list () > and > the new bswap_check_end_iter label. Move some of the logic to > perform > the bswap optimization to do_bswap_p (). Set gsi after performing > the > bswap optimization and loop manually via the new > bswap_check_start_iter label so that all stmts are checked for > load/bswap even when cur_stmt is moved around by bswap_replace > (). > > *** gcc/testsuite/ChangeLog *** > > 2014-08-01 Thomas Preud'homme > > * gcc.dg/optimize-bswapsi-2.c (read_le32_1): Add an intermediate > variable in the mix to check that it is optimized when there is no > use outside the expression computing a load/bswap. > (read_be32_1): Likewise. > * gcc.dg/optimize-bswapsi-3.c: New. Create read_le32_1 () and > read_be32_1 () based on identically named function in > gcc.dg/optimize-bswapsi-2.c but reusing the intermediate variable so > as to check that bswap optimization is not performed in these cases. > > diff --git a/gcc/testsuite/gcc.dg/optimize-bswapsi-2.c > b/gcc/testsuite/gcc.dg/optimize-bswapsi-2.c > index de6e697..df7856b 100644 > --- a/gcc/testsuite/gcc.dg/optimize-bswapsi-2.c > +++ b/gcc/testsuite/gcc.dg/optimize-bswapsi-2.c > @@ -14,7 +14,9 @@ struct uint32_st { > > uint32_t read_le32_1 (void) > { > - return data[0] | (data[1] << 8) | (data[2] << 16) | (data[3] << 24); > + uint32_t low = data[0] | (data[1] << 8); > + uint32_t ret = low | (data[2] << 16) | (data[3] << 24); > + return ret; > } > > uint32_t read_le32_2 (struct uint32_st data) > @@ -30,7 +32,9 @@ uint32_t read_le32_3 (unsigned char *data) > > uint32_t read_be32_1 (void) > { > - return data[3] | (data[2] << 8) | (data[1] << 16) | (data[0] << 24); > + uint32_t low = data[3] | (data[2] << 8); > + uint32_t ret = low | (data[1] << 16) | (data[0] << 24); > + return ret; > } > > uint32_t read_be32_2 (struct uint32_st data) > diff --git a/gcc/testsuite/gcc.dg/optimize-bswapsi-3.c > b/gcc/testsuite/gcc.dg/optimize-bswapsi-3.c > new 
file mode 100644 > index 000..dc48d9d > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/optimize-bswapsi-3.c > @@ -0,0 +1,31 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target bswap32 } */ > +/* { dg-require-effective-target stdint_types } */ > +/* { dg-options "-O2 -fdump-tree-bswap" } */ > +/* { dg-additional-options "-march=z900" { target s390-*-*
[PATCH][ARM] Fix -fcall-saved-rX for X > 7
This patch makes -fcall-saved-rX work for X > 7 on Thumb targets when optimizing for size. It works by adding a new field x_user_set_call_save_regs in struct target_hard_regs to track whether an entry in fields x_fixed_regs, x_call_used_regs and x_call_really_used_regs was user set or left at its default value. Then it can decide whether to set a given high register as caller saved or not when optimizing for size based on this information. ChangeLogs are as follows: *** gcc/ChangeLog *** 2014-08-15 Thomas Preud'homme * config/arm/arm.c (arm_conditional_register_usage): Only set high registers as caller saved when optimizing for size *and* the user did not ask otherwise through -fcall-saved-* switch. * hard-reg-set.h (x_user_set_call_save_regs): New. (user_set_call_save_regs): Define. * reginfo.c (init_reg_sets): Initialize user_set_call_save_regs. (fix_register): Indicate in user_set_call_save_regs that the value set in call_used_regs and fixed_regs is user set. *** gcc/testsuite/ChangeLog *** 2014-08-15 Thomas Preud'homme * gcc.target/arm/fcall-save-rhigh.c: New. diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 2f8d327..8324fa3 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -30084,7 +30084,8 @@ arm_conditional_register_usage (void) stacking them. */ for (regno = FIRST_HI_REGNUM; regno <= LAST_HI_REGNUM; ++regno) - fixed_regs[regno] = call_used_regs[regno] = 1; + if (!user_set_call_save_regs[regno]) + fixed_regs[regno] = call_used_regs[regno] = 1; } /* The link register can be clobbered by any branch insn, diff --git a/gcc/hard-reg-set.h b/gcc/hard-reg-set.h index b8ab3df..b523637 100644 --- a/gcc/hard-reg-set.h +++ b/gcc/hard-reg-set.h @@ -614,6 +614,11 @@ struct target_hard_regs { char x_call_really_used_regs[FIRST_PSEUDO_REGISTER]; + /* Indexed by hard register number, contains 1 for registers + whose saving at function call was decided by the user + with -fcall-saved-*, -fcall-used-* or -ffixed-*. 
*/ + char x_user_set_call_save_regs[FIRST_PSEUDO_REGISTER]; + /* The same info as a HARD_REG_SET. */ HARD_REG_SET x_call_used_reg_set; @@ -685,6 +690,8 @@ extern struct target_hard_regs *this_target_hard_regs; (this_target_hard_regs->x_call_used_regs) #define call_really_used_regs \ (this_target_hard_regs->x_call_really_used_regs) +#define user_set_call_save_regs \ + (this_target_hard_regs->x_user_set_call_save_regs) #define call_used_reg_set \ (this_target_hard_regs->x_call_used_reg_set) #define call_fixed_reg_set \ diff --git a/gcc/reginfo.c b/gcc/reginfo.c index 7668be0..0b35f7f 100644 --- a/gcc/reginfo.c +++ b/gcc/reginfo.c @@ -183,6 +183,7 @@ init_reg_sets (void) memcpy (call_really_used_regs, initial_call_really_used_regs, sizeof call_really_used_regs); #endif + memset (user_set_call_save_regs, 0, sizeof user_set_call_save_regs); #ifdef REG_ALLOC_ORDER memcpy (reg_alloc_order, initial_reg_alloc_order, sizeof reg_alloc_order); #endif @@ -742,6 +743,7 @@ fix_register (const char *name, int fixed, int call_used) if (fixed == 0) call_really_used_regs[i] = call_used; #endif + user_set_call_save_regs[i] = 1; } } } diff --git a/gcc/testsuite/gcc.target/arm/fcall-save-rhigh.c b/gcc/testsuite/gcc.target/arm/fcall-save-rhigh.c new file mode 100644 index 000..a321a2b --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/fcall-save-rhigh.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-final { scan-assembler "mov\\s+r.\\s*,\\s*r8" } } */ +/* { dg-require-effective-target arm_thumb1_ok } */ +/* { dg-options "-Os -mthumb -mcpu=cortex-m0 -fcall-saved-r8" } */ + +void +save_regs (void) +{ + asm volatile ("" ::: "r7", "r8"); +} Ok for trunk? Best regards, Thomas
RE: [PATCH] Fix confusion between target, host and symbolic number byte sizes
Ping? > > -Original Message- > > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > > ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme > > Sent: Thursday, August 07, 2014 1:57 PM > > To: gcc-patches@gcc.gnu.org > > Subject: RE: [PATCH] Fix confusion between target, host and symbolic > > number byte sizes > > > > I suppose people were busy when I posted this patch and it got forgotten > > since so here is a little ping. > > > > Best regards, > > > > Thomas > > > > > -Original Message----- > > > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > > > ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme > > > Sent: Friday, July 04, 2014 12:53 PM > > > To: gcc-patches@gcc.gnu.org > > > Subject: [PATCH] Fix confusion between target, host and symbolic > number > > > byte sizes > > > > > > The bswap pass deals with 3 possibly different byte size: host, target and > > the > > > size a byte marker occupied in the symbolic_number structure [1]. > > However, > > > as of now the code mixes the three size. This works in practice as the > > > pass > > is > > > only enabled for target with BITS_PER_UNIT == 8 and nobody runs GCC > on > > a > > > host with CHAR_BIT != 8. As prompted by Jakub Jelinek, this patch fixes > this > > > mess. Byte marker are 8-bit quantities (they could be made 4-bit > quantities > > > but I prefered to keep the code working the same as before) for which a > > > new macro is introduced (BITS_PER_MARKERS), anything related to > storing > > > the value or a byte marker in a variable should check for the host byte > size > > or > > > wide integer size and anything aimed at manipulating the target value > > should > > > check for BITS_PER_UNIT. > > > > > > > > > [1] Although the comment for this structure implies that a byte marker as > > the > > > same size as the host byte, the way it is used in the code (even before > any > > of > > > my patch) shows that it uses a fixed size of 8 [2]. 
> > > [2] Note that since the pass is only active for targets with BITS_PER_UNIT > > == > > > 8, it might be using the target byte size. > > > > > > gcc/ChangeLog: > > > > > > 2014-07-04 Thomas Preud'homme > > > > > > * tree-ssa-math-opts.c (struct symbolic_number): Clarify comment > > > about > > > the size of byte markers. > > > (do_shift_rotate): Fix confusion between host, target and marker > > > byte > > > size. > > > (verify_symbolic_number_p): Likewise. > > > (find_bswap_or_nop_1): Likewise. > > > (find_bswap_or_nop): Likewise. > > > > > > > > > diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c > > > index ca2b30d..55c5df7 100644 > > > --- a/gcc/tree-ssa-math-opts.c > > > +++ b/gcc/tree-ssa-math-opts.c > > > @@ -1602,11 +1602,10 @@ make_pass_cse_sincos (gcc::context *ctxt) > > > > > > /* A symbolic number is used to detect byte permutation and selection > > > patterns. Therefore the field N contains an artificial number > > > - consisting of byte size markers: > > > + consisting of octet sized markers: > > > > > > - 0- byte has the value 0 > > > - 1..size - byte contains the content of the byte > > > - number indexed with that value minus one. > > > + 0- target byte has the value 0 > > > + 1..size - marker value is the target byte index minus one. > > > > > > To detect permutations on memory sources (arrays and structures), a > > > symbolic > > > number is also associated a base address (the array or structure the > load > > is > > > @@ -1631,6 +1630,8 @@ struct symbolic_number { > > >unsigned HOST_WIDE_INT range; > > > }; > > > > > > +#define BITS_PER_MARKER 8 > > > + > > > /* The number which the find_bswap_or_nop_1 result should match in > > > order to have a nop. The number is masked according to the size of > > > the symbolic number before using it. 
*/ > > > @@ -1652,15 +1653,16 @@ do_shift_rotate (enum tree_code code, > > >struct symbolic_number *n, > > >int count) > > > { > > > - int bitsize = TYPE_PRECISION (n->type); > > > + int size = TYPE_PRECISION (n->type) / BITS_PER_UNIT; >
RE: [PATCH, ARM] Fix incorrect alignment of small values in minipool
> From: Richard Earnshaw > Sent: Monday, August 18, 2014 6:34 PM > > > > The problem is that in the pattern for consttable_4 we don't have the > information > > about the access mode for this entry. In the testcase accompanying this patch the > rtx > > manipulated in the pattern is VOIDmode while the access mode is HImode. > In > > dump_minipool on the other hand the information can be found in mp->mode. > > > > I think it would be better to make sure the mode field never contains > VOIDmode. That's not really useful information. I'm a bit confused. In the chapter about modes (13.6) of the GCC internals documentation VOIDmode is said to be the correct mode for constants. Do you want me to change this? Best regards, Thomas
RE: [PATCH][ARM] Fix -fcall-saved-rX for X > 7 When compiling for size for thumb targets
Ping? > -Original Message- > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme > Sent: Wednesday, August 20, 2014 9:28 AM > To: gcc-patches@gcc.gnu.org > Subject: [PATCH][ARM] Fix -fcall-saved-rX for X > 7 > > This patch makes -fcall-saved-rX work for X > 7 on Thumb targets when optimizing > for size. It works by adding a new field x_user_set_call_save_regs in struct > target_hard_regs to track whether an entry in fields x_fixed_regs, > x_call_used_regs and x_call_really_used_regs was user set or has its default > value. Then it can decide whether to set a given high register as caller saved > or not when optimizing for size based on this information. > > ChangeLog entries are as follows: > > *** gcc/ChangeLog *** > > 2014-08-15 Thomas Preud'homme > > * config/arm/arm.c (arm_conditional_register_usage): Only set high > registers as caller saved when optimizing for size *and* the user did > not ask otherwise through a -fcall-saved-* switch. > * hard-reg-set.h (x_user_set_call_save_regs): New. > (user_set_call_save_regs): Define. > * reginfo.c (init_reg_sets): Initialize user_set_call_save_regs. > (fix_register): Indicate in user_set_call_save_regs that the value set > in call_save_regs and fixed_regs is user set. > > > *** gcc/testsuite/ChangeLog *** > > 2014-08-15 Thomas Preud'homme > > * gcc.target/arm/fcall-save-rhigh.c: New. > > > diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c > index 2f8d327..8324fa3 100644 > --- a/gcc/config/arm/arm.c > +++ b/gcc/config/arm/arm.c > @@ -30084,7 +30084,8 @@ arm_conditional_register_usage (void) > stacking them. 
*/ >for (regno = FIRST_HI_REGNUM; > regno <= LAST_HI_REGNUM; ++regno) > - fixed_regs[regno] = call_used_regs[regno] = 1; > + if (!user_set_call_save_regs[regno]) > + fixed_regs[regno] = call_used_regs[regno] = 1; > } > >/* The link register can be clobbered by any branch insn, > diff --git a/gcc/hard-reg-set.h b/gcc/hard-reg-set.h > index b8ab3df..b523637 100644 > --- a/gcc/hard-reg-set.h > +++ b/gcc/hard-reg-set.h > @@ -614,6 +614,11 @@ struct target_hard_regs { > >char x_call_really_used_regs[FIRST_PSEUDO_REGISTER]; > > + /* Indexed by hard register number, contains 1 for registers > + whose saving at function call was decided by the user > + with -fcall-saved-*, -fcall-used-* or -ffixed-*. */ > + char x_user_set_call_save_regs[FIRST_PSEUDO_REGISTER]; > + >/* The same info as a HARD_REG_SET. */ >HARD_REG_SET x_call_used_reg_set; > > @@ -685,6 +690,8 @@ extern struct target_hard_regs > *this_target_hard_regs; >(this_target_hard_regs->x_call_used_regs) > #define call_really_used_regs \ >(this_target_hard_regs->x_call_really_used_regs) > +#define user_set_call_save_regs \ > + (this_target_hard_regs->x_user_set_call_save_regs) > #define call_used_reg_set \ >(this_target_hard_regs->x_call_used_reg_set) > #define call_fixed_reg_set \ > diff --git a/gcc/reginfo.c b/gcc/reginfo.c > index 7668be0..0b35f7f 100644 > --- a/gcc/reginfo.c > +++ b/gcc/reginfo.c > @@ -183,6 +183,7 @@ init_reg_sets (void) >memcpy (call_really_used_regs, initial_call_really_used_regs, > sizeof call_really_used_regs); > #endif > + memset (user_set_call_save_regs, 0, sizeof user_set_call_save_regs); > #ifdef REG_ALLOC_ORDER >memcpy (reg_alloc_order, initial_reg_alloc_order, sizeof reg_alloc_order); > #endif > @@ -742,6 +743,7 @@ fix_register (const char *name, int fixed, int call_used) > if (fixed == 0) > call_really_used_regs[i] = call_used; > #endif > + user_set_call_save_regs[i] = 1; > } > } > } > diff --git a/gcc/testsuite/gcc.target/arm/fcall-save-rhigh.c > 
b/gcc/testsuite/gcc.target/arm/fcall-save-rhigh.c > new file mode 100644 > index 000..a321a2b > --- /dev/null > +++ b/gcc/testsuite/gcc.target/arm/fcall-save-rhigh.c > @@ -0,0 +1,10 @@ > +/* { dg-do compile } */ > +/* { dg-final { scan-assembler "mov\\s+r.\\s*,\\s*r8" } } */ > +/* { dg-require-effective-target arm_thumb1_ok } */ > +/* { dg-options "-Os -mthumb -mcpu=cortex-m0 -fcall-saved-r8" } */ > + > +void > +save_regs (void) > +{ > + asm volatile ("" ::: "r7", "r8"); > +} > > Ok for trunk? > > Best regards, > > Thomas > >
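The bookkeeping the patch adds can be sketched outside of GCC. The following is a minimal illustration, not the real reginfo.c/arm.c code: the register count is made up, and fix_register_sketch and mark_high_regs_for_size are hypothetical stand-ins for fix_register and the patched arm_conditional_register_usage.

```c
#include <assert.h>

/* Illustrative register-file size, not ARM's real FIRST_PSEUDO_REGISTER.  */
#define NUM_REGS 16

static char fixed_regs[NUM_REGS];
static char call_used_regs[NUM_REGS];
/* 1 if the user chose the behaviour via -ffixed-*, -fcall-used-* or
   -fcall-saved-* (what the patch's x_user_set_call_save_regs records).  */
static char user_set_call_save_regs[NUM_REGS];

/* Modeled after fix_register: apply the user's choice and remember that it
   was user set.  */
static void
fix_register_sketch (int regno, int fixed, int call_used)
{
  fixed_regs[regno] = (char) fixed;
  call_used_regs[regno] = (char) call_used;
  user_set_call_save_regs[regno] = 1;
}

/* Modeled after the patched arm_conditional_register_usage: when optimizing
   for size, high registers default to fixed unless the user asked
   otherwise.  */
static void
mark_high_regs_for_size (int first_hi, int last_hi)
{
  for (int regno = first_hi; regno <= last_hi; ++regno)
    if (!user_set_call_save_regs[regno])
      fixed_regs[regno] = call_used_regs[regno] = 1;
}
```

With this split, a register the user configured with -fcall-saved-r8 keeps its user-chosen state while the remaining high registers still get the size-optimizing default.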
[PATCH] Fix byte size confusion in bswap pass
[CCing you Jakub as you are the one who raised this issue to me] The bswap pass deals with 3 possibly different byte sizes: the host's, the target's and the size a byte marker occupies in the symbolic_number structure [1]. However, right now the code mixes the three sizes. This works in practice as the pass is only enabled for targets with BITS_PER_UNIT == 8 and nobody runs GCC on a host with CHAR_BIT != 8. As prompted by Jakub Jelinek, this patch fixes this mess. Byte markers are 8-bit quantities (they could be made 4-bit quantities but I preferred to keep the code working the same as before) for which a new macro is introduced (BITS_PER_MARKER); anything related to storing the value of a byte marker in a variable should check for the host byte size or wide integer size and anything aimed at manipulating the target value should check for BITS_PER_UNIT. [1] Although the comment for this structure implies that a byte marker has the same size as the host byte, the way it is used in the code (even before any of my patches) shows that it uses a fixed size of 8 [2]. [2] Note that since the pass is only active for targets with BITS_PER_UNIT == 8, it might be using the target byte size. gcc/ChangeLog: 2014-08-29 Thomas Preud'homme * tree-ssa-math-opts.c (struct symbolic_number): Clarify comment about the size of byte markers. (do_shift_rotate): Fix confusion between host, target and marker byte size. (verify_symbolic_number_p): Likewise. (find_bswap_or_nop_1): Likewise. (find_bswap_or_nop): Likewise. diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c index ca2b30d..55c5df7 100644 --- a/gcc/tree-ssa-math-opts.c +++ b/gcc/tree-ssa-math-opts.c @@ -1600,11 +1600,10 @@ make_pass_cse_sincos (gcc::context *ctxt) /* A symbolic number is used to detect byte permutation and selection patterns. 
Therefore the field N contains an artificial number - consisting of byte size markers: + consisting of octet sized markers: - 0- byte has the value 0 - 1..size - byte contains the content of the byte - number indexed with that value minus one. + 0- target byte has the value 0 + 1..size - marker value is the target byte index minus one. To detect permutations on memory sources (arrays and structures), a symbolic number is also associated a base address (the array or structure the load is @@ -1629,6 +1628,8 @@ struct symbolic_number { unsigned HOST_WIDE_INT range; }; +#define BITS_PER_MARKER 8 + /* The number which the find_bswap_or_nop_1 result should match in order to have a nop. The number is masked according to the size of the symbolic number before using it. */ @@ -1650,15 +1651,16 @@ do_shift_rotate (enum tree_code code, struct symbolic_number *n, int count) { - int bitsize = TYPE_PRECISION (n->type); + int size = TYPE_PRECISION (n->type) / BITS_PER_UNIT; - if (count % 8 != 0) + if (count % BITS_PER_UNIT != 0) return false; + count = (count / BITS_PER_UNIT) * BITS_PER_MARKER; /* Zero out the extra bits of N in order to avoid them being shifted into the significant bits. */ - if (bitsize < 8 * (int)sizeof (int64_t)) -n->n &= ((uint64_t)1 << bitsize) - 1; + if (size < 64 / BITS_PER_MARKER) +n->n &= ((uint64_t) 1 << (size * BITS_PER_MARKER)) - 1; switch (code) { @@ -1668,22 +1670,22 @@ do_shift_rotate (enum tree_code code, case RSHIFT_EXPR: /* Arithmetic shift of signed type: result is dependent on the value. 
*/ if (!TYPE_UNSIGNED (n->type) - && (n->n & ((uint64_t) 0xff << (bitsize - 8 + && (n->n & ((uint64_t) 0xff << ((size - 1) * BITS_PER_MARKER return false; n->n >>= count; break; case LROTATE_EXPR: - n->n = (n->n << count) | (n->n >> (bitsize - count)); + n->n = (n->n << count) | (n->n >> ((size * BITS_PER_MARKER) - count)); break; case RROTATE_EXPR: - n->n = (n->n >> count) | (n->n << (bitsize - count)); + n->n = (n->n >> count) | (n->n << ((size * BITS_PER_MARKER) - count)); break; default: return false; } /* Zero unused bits for size. */ - if (bitsize < 8 * (int)sizeof (int64_t)) -n->n &= ((uint64_t)1 << bitsize) - 1; + if (size < 64 / BITS_PER_MARKER) +n->n &= ((uint64_t) 1 << (size * BITS_PER_MARKER)) - 1; return true; } @@ -1724,13 +1726,13 @@ init_symbolic_number (struct symbolic_number *n, tree src) if (size % BITS_PER_UNIT != 0) return false; size /= BITS_PER_UNIT; - if (size > (int)sizeof (uint64_t)) + if (size > 64 / BITS_PER_MARKER) return false; n->range = size; n->n = CMPNOP; - if (size < (int)sizeof (int64_t)) -
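To see why mixing bit counts and marker counts matters, here is a small standalone model, not GCC code: an 8-bit-per-marker symbolic number packed into a uint64_t, and a whole-byte right shift that converts the bit count on the target value into a marker count first (the conversion do_shift_rotate now performs). The helper name is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Each target byte is tracked by one marker of BITS_PER_MARKER bits.  */
#define BITS_PER_MARKER 8

/* Marker pattern of an untouched (no-op) 32-bit load: lowest byte slot
   holds 1, highest holds 4 (GCC's CMPNOP constant truncated to 4 bytes).  */
#define CMPNOP_32 0x04030201ULL

/* Model a logical right shift of the target value by COUNT bits: reject
   shifts that are not a whole number of target bytes (as the pass does),
   then convert the byte count into a marker count before shifting the
   symbolic number.  Forgetting that conversion is exactly the host/target/
   marker confusion the patch removes.  */
static uint64_t
shift_markers_right (uint64_t n, int count)
{
  if (count % 8 != 0)
    return 0;                       /* not representable: pass gives up */
  return n >> ((count / 8) * BITS_PER_MARKER);
}
```

Shifting the target value right by 8 bits drops the lowest marker and leaves the byte-index pattern otherwise intact.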
RE: [PATCH][ARM] Fix -fcall-saved-rX for X > 7 with -Os -mthumb
Ping? > > -Original Message- > > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > > ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme > > Sent: Wednesday, August 20, 2014 9:28 AM > > To: gcc-patches@gcc.gnu.org > > Subject: [PATCH][ARM] Fix -fcall-saved-rX for X > 7 > > > > This patch makes -fcall-saved-rX for X > 7 on Thumb target when optimizing > > for size. It works by adding a new field x_user_set_call_save_regs in struct > > target_hard_regs to track whether an entry in fields x_fixed_regs, > > x_call_used_regs and x_call_really_used_regs was user set or is in its > default > > value. Then it can decide whether to set a given high register as caller > > saved > > or not when optimizing for size based on this information. > > > > ChangeLog are as follows: > > > > *** gcc/ChangeLog *** > > > > 2014-08-15 Thomas Preud'homme > > > > * config/arm/arm.c (arm_conditional_register_usage): Only set high > > registers as caller saved when optimizing for size *and* the user > > did > > not asked otherwise through -fcall-saved-* switch. > > * hard-reg-set.h (x_user_set_call_save_regs): New. > > (user_set_call_save_regs): Define. > > * reginfo.c (init_reg_sets): Initialize user_set_call_save_regs. > > (fix_register): Indicate in user_set_call_save_regs that the value > > set > > in call_save_regs and fixed_regs is user set. > > > > > > *** gcc/testsuite/ChangeLog *** > > > > 2014-08-15 Thomas Preud'homme > > > > * gcc.target/arm/fcall-save-rhigh.c: New. > > > > > > diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c > > index 2f8d327..8324fa3 100644 > > --- a/gcc/config/arm/arm.c > > +++ b/gcc/config/arm/arm.c > > @@ -30084,7 +30084,8 @@ arm_conditional_register_usage (void) > > stacking them. 
*/ > >for (regno = FIRST_HI_REGNUM; > >regno <= LAST_HI_REGNUM; ++regno) > > - fixed_regs[regno] = call_used_regs[regno] = 1; > > + if (!user_set_call_save_regs[regno]) > > + fixed_regs[regno] = call_used_regs[regno] = 1; > > } > > > >/* The link register can be clobbered by any branch insn, > > diff --git a/gcc/hard-reg-set.h b/gcc/hard-reg-set.h > > index b8ab3df..b523637 100644 > > --- a/gcc/hard-reg-set.h > > +++ b/gcc/hard-reg-set.h > > @@ -614,6 +614,11 @@ struct target_hard_regs { > > > >char x_call_really_used_regs[FIRST_PSEUDO_REGISTER]; > > > > + /* Indexed by hard register number, contains 1 for registers > > + whose saving at function call was decided by the user > > + with -fcall-saved-*, -fcall-used-* or -ffixed-*. */ > > + char x_user_set_call_save_regs[FIRST_PSEUDO_REGISTER]; > > + > >/* The same info as a HARD_REG_SET. */ > >HARD_REG_SET x_call_used_reg_set; > > > > @@ -685,6 +690,8 @@ extern struct target_hard_regs > > *this_target_hard_regs; > >(this_target_hard_regs->x_call_used_regs) > > #define call_really_used_regs \ > >(this_target_hard_regs->x_call_really_used_regs) > > +#define user_set_call_save_regs \ > > + (this_target_hard_regs->x_user_set_call_save_regs) > > #define call_used_reg_set \ > >(this_target_hard_regs->x_call_used_reg_set) > > #define call_fixed_reg_set \ > > diff --git a/gcc/reginfo.c b/gcc/reginfo.c > > index 7668be0..0b35f7f 100644 > > --- a/gcc/reginfo.c > > +++ b/gcc/reginfo.c > > @@ -183,6 +183,7 @@ init_reg_sets (void) > >memcpy (call_really_used_regs, initial_call_really_used_regs, > > sizeof call_really_used_regs); > > #endif > > + memset (user_set_call_save_regs, 0, sizeof user_set_call_save_regs); > > #ifdef REG_ALLOC_ORDER > >memcpy (reg_alloc_order, initial_reg_alloc_order, sizeof > > reg_alloc_order); > > #endif > > @@ -742,6 +743,7 @@ fix_register (const char *name, int fixed, int > call_used) > > if (fixed == 0) > > call_really_used_regs[i] = call_used; > > #endif > > + user_set_call_save_regs[i] = 1; > > } > 
> } > > } > > diff --git a/gcc/testsuite/gcc.target/arm/fcall-save-rhigh.c > > b/gcc/testsuite/gcc.target/arm/fcall-save-rhigh.c > > new file mode 100644 > > index 000..a321a2b > > --- /dev/null > > +++ b/gcc/testsuite/gcc.target/arm/fcall-save-rhigh.c > > @@ -0,0 +1,10 @@ > > +/* { dg-do compile } */ > > +/* { dg-final { scan-assembler "mov\\s+r.\\s*,\\s*r8" } } */ > > +/* { dg-require-effective-target arm_thumb1_ok } */ > > +/* { dg-options "-Os -mthumb -mcpu=cortex-m0 -fcall-saved-r8" } */ > > + > > +void > > +save_regs (void) > > +{ > > + asm volatile ("" ::: "r7", "r8"); > > +} > > > > Ok for trunk? > > > > Best regards, > > > > Thomas > > > > > > >
[PATCH] Fix PR63259: bswap not recognized when finishing with rotation
Currently the bswap pass only looks for bswap patterns by examining bitwise OR statements and following def-use chains. However, a rotation (left or right) can finish a manual byteswap, as shown in the following example: unsigned byteswap_ending_with_rotation (unsigned in) { in = ((in & 0xff00ff00) >> 8) | ((in & 0x00ff00ff) << 8); in = ((in & 0xffff0000) >> 16) | ((in & 0x0000ffff) << 16); return in; } which is compiled into: byteswap_ending_with_rotation (unsigned int in) { unsigned int _2; unsigned int _3; unsigned int _4; unsigned int _5; <bb 2>: _2 = in_1(D) & 4278255360; _3 = _2 >> 8; _4 = in_1(D) & 16711935; _5 = _4 << 8; in_6 = _5 | _3; in_7 = in_6 r>> 16; return in_7; } This patch adds rotation (left and right) to the list of statements to consider for byte swap. ChangeLog entries are as follows: *** gcc/ChangeLog *** 2014-09-30 Thomas Preud'homme PR tree-optimization/63259 * tree-ssa-math-opts.c (pass_optimize_bswap::execute): Also consider bswap in LROTATE_EXPR and RROTATE_EXPR statements. *** gcc/testsuite/ChangeLog *** 2014-09-30 Thomas Preud'homme PR tree-optimization/63259 * optimize-bswapsi-1.c (swap32_e): New bswap pass test. diff --git a/gcc/testsuite/gcc.dg/optimize-bswapsi-1.c b/gcc/testsuite/gcc.dg/optimize-bswapsi-1.c index 580e6e0..d4b5740 100644 --- a/gcc/testsuite/gcc.dg/optimize-bswapsi-1.c +++ b/gcc/testsuite/gcc.dg/optimize-bswapsi-1.c @@ -64,5 +64,16 @@ swap32_d (SItype in) | (((in >> 24) & 0xFF) << 0); } -/* { dg-final { scan-tree-dump-times "32 bit bswap implementation found at" 4 "bswap" } } */ +/* This variant comes from PR63259. It compiles to a gimple sequence that ends + with a rotation instead of a bitwise OR. 
*/ + +unsigned +swap32_e (unsigned in) +{ + in = ((in & 0xff00ff00) >> 8) | ((in & 0x00ff00ff) << 8); + in = ((in & 0xffff0000) >> 16) | ((in & 0x0000ffff) << 16); + return in; +} + +/* { dg-final { scan-tree-dump-times "32 bit bswap implementation found at" 5 "bswap" } } */ /* { dg-final { cleanup-tree-dump "bswap" } } */ diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c index 3c6e935..2023f2e 100644 --- a/gcc/tree-ssa-math-opts.c +++ b/gcc/tree-ssa-math-opts.c @@ -2377,11 +2377,16 @@ pass_optimize_bswap::execute (function *fun) { gimple src_stmt, cur_stmt = gsi_stmt (gsi); tree fndecl = NULL_TREE, bswap_type = NULL_TREE, load_type; + enum tree_code code; struct symbolic_number n; bool bswap; - if (!is_gimple_assign (cur_stmt) - || gimple_assign_rhs_code (cur_stmt) != BIT_IOR_EXPR) + if (!is_gimple_assign (cur_stmt)) + continue; + + code = gimple_assign_rhs_code (cur_stmt); + if (code != BIT_IOR_EXPR && code != LROTATE_EXPR + && code != RROTATE_EXPR) continue; src_stmt = find_bswap_or_nop (cur_stmt, &n, &bswap); Testing was done by running the testsuite on the arm-none-eabi target with QEMU emulating Cortex-M3: no regressions were found. Due to the potential increase in compilation time, a bootstrap with sequential build (no -j option when calling make) and with default options was made with and without the patch. The results show no increase in compilation time: r215662 with patch: make 6167.48s user 401.03s system 99% cpu 1:49:52.07 total r215662 without patch make 6136.63s user 400.32s system 99% cpu 1:49:27.28 total Is it ok for trunk? Best regards, Thomas Preud'homme
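The equivalence the pass now recognizes can be checked directly. This standalone C version of swap32_e writes the final statement as the rotation GCC folds it into (the `in_6 r>> 16` in the gimple dump), and the result matches a full 32-bit byteswap:

```c
#include <assert.h>
#include <stdint.h>

/* Manual byteswap from PR63259: swap adjacent bytes, then swap the two
   halfwords.  The second step is written as the 16-bit rotation that GCC
   folds the masked-shift pair into.  */
static uint32_t
byteswap_ending_with_rotation (uint32_t in)
{
  in = ((in & 0xff00ff00u) >> 8) | ((in & 0x00ff00ffu) << 8);
  in = (in >> 16) | (in << 16);   /* the trailing rotation */
  return in;
}
```

Before the patch, the pass only started its def-use walk at a bitwise OR, so this function's root rotation was never examined and the bswap went undetected.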
RE: [PATCH] Fix PR63259: bswap not recognized when finishing with rotation
> From: Jakub Jelinek [mailto:ja...@redhat.com] > Sent: Wednesday, October 08, 2014 2:39 PM > > Doesn't it turn 16-bit {L,R}ROTATE_EXPR used alone into > __builtin_bswap16? > For those the question is if the canonical GIMPLE should be the rotation > or > byteswap, I'd think rotation would be perhaps better. Or depending on > if > the backend has bswaphi2 or rotate pattern? Good point. It seems better to keep the status quo. > > Also, perhaps you could short-circuit this if the rotation isn't by constant > or not a multiple of BITS_PER_UNIT. So > switch (code) > { > case BIT_IOR_EXPR: > break; > case LROTATE_EXPR: > case RROTATE_EXPR: > if (!tree_fits_uhwi_p (gimple_assign_rhs2 (cur_stmt)) > || (tree_to_uhwi (gimple_assign_rhs2 (cur_stmt)) > % BITS_PER_UNIT)) > continue; > break; > default: > continue; > } > ? Right. Thanks for the comments. Best regards, Thomas
RE: [PATCH] Fix PR63259: bswap not recognized when finishing with rotation
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme > Sent: Wednesday, October 08, 2014 2:43 PM > > Also, perhaps you could short-circuit this if the rotation isn't by constant Note that do_shift_rotate already checks for this. Is it enough? Best regards, Thomas
RE: [PATCH, C++] Fix PR63366: __complex not equivalent to __complex double in C++
Ping? > -Original Message- > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme > Sent: Monday, September 29, 2014 3:33 PM > To: gcc-patches@gcc.gnu.org > Subject: [PATCH, C++] Fix PR63366: __complex not equivalent to > __complex double in C++ > > According to a comment in grokdeclarator in file gcc/cp/decl.c: > > /* If we just have "complex", it is equivalent to > "complex double", but if any modifiers at all are specified it is > the complex form of TYPE. E.g, "complex short" is > "complex short int". */ > > Yet, __complex is equivalent to __complex int as the following > testcase shows: > > #include <typeinfo> > > int > main (void) > { > return typeid (__complex) != typeid (__complex int); > } > > The following patch fixes the problem. > > > ChangeLog entries are as follows: > > *** gcc/cp/ChangeLog *** > > 2014-09-26 Thomas Preud'homme > > PR C++/63366 > * decl.c (grokdeclarator): Set defaulted_int when defaulting to > int > because type is null. > > *** gcc/testsuite/ChangeLog *** > > 2014-10-26 Thomas Preud'homme > > PR C++/63366 > * g++.dg/torture/pr63366.C: New test. > > > diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c > index d26a432..449efdf 100644 > --- a/gcc/cp/decl.c > +++ b/gcc/cp/decl.c > @@ -9212,6 +9212,7 @@ grokdeclarator (const cp_declarator *declarator, > "ISO C++ forbids declaration of %qs with no type", name); > >type = integer_type_node; > + defaulted_int = 1; > } > >ctype = NULL_TREE; > diff --git a/gcc/testsuite/g++.dg/torture/pr63366.C > b/gcc/testsuite/g++.dg/torture/pr63366.C > new file mode 100644 > index 000..af59b98 > --- /dev/null > +++ b/gcc/testsuite/g++.dg/torture/pr63366.C > @@ -0,0 +1,11 @@ > +// { dg-do run } > +// { dg-options "-fpermissive" } > +// { dg-prune-output "ISO C\\+\\+ forbids declaration of 'type name' > with no type" } > + > +#include <typeinfo> > + > +int > +main (void) > +{ > + return typeid (__complex) != typeid (__complex double); > +} > > > Is this ok for trunk? 
> > Best regards, > > Thomas Preud'homme > > >
RE: [PATCH] Fix PR63266: Keep track of impact of sign extension in bswap
Hi Richard, I realized thanks to Christophe Lyon that a shift was not right: the shift count is a number of bytes instead of a number of bits. This extra patch fixes the problem. ChangeLog are as follows: *** gcc/ChangeLog *** 2014-09-26 Thomas Preud'homme * tree-ssa-math-opts.c (find_bswap_or_nop_1): Fix creation of MARKER_BYTE_UNKNOWN markers when handling casts. *** gcc/testsuite/ChangeLog *** 2014-10-08 Thomas Preud'homme * gcc.dg/optimize-bswaphi-1.c: New bswap pass test. diff --git a/gcc/testsuite/gcc.dg/optimize-bswaphi-1.c b/gcc/testsuite/gcc.dg/optimize-bswaphi-1.c index 3e51f04..18aba28 100644 --- a/gcc/testsuite/gcc.dg/optimize-bswaphi-1.c +++ b/gcc/testsuite/gcc.dg/optimize-bswaphi-1.c @@ -42,6 +42,20 @@ uint32_t read_be16_3 (unsigned char *data) return *(data + 1) | (*data << 8); } +typedef int SItype __attribute__ ((mode (SI))); +typedef int HItype __attribute__ ((mode (HI))); + +/* Test that detection of significant sign extension works correctly. This + checks that unknown byte marker are set correctly in cast of cast. 
*/ + +HItype +swap16 (HItype in) +{ + return (HItype) (((in >> 0) & 0xFF) << 8) + | (((in >> 8) & 0xFF) << 0); +} + /* { dg-final { scan-tree-dump-times "16 bit load in target endianness found at" 3 "bswap" } } */ -/* { dg-final { scan-tree-dump-times "16 bit bswap implementation found at" 3 "bswap" { xfail alpha*-*-* arm*-*-* } } } */ +/* { dg-final { scan-tree-dump-times "16 bit bswap implementation found at" 1 "bswap" { target alpha*-*-* arm*-*-* } } } */ +/* { dg-final { scan-tree-dump-times "16 bit bswap implementation found at" 4 "bswap" { xfail alpha*-*-* arm*-*-* } } } */ /* { dg-final { cleanup-tree-dump "bswap" } } */ diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c index 3c6e935..2ef2333 100644 --- a/gcc/tree-ssa-math-opts.c +++ b/gcc/tree-ssa-math-opts.c @@ -1916,7 +1916,8 @@ find_bswap_or_nop_1 (gimple stmt, struct symbolic_number *n, int limit) if (!TYPE_UNSIGNED (n->type) && type_size > old_type_size && HEAD_MARKER (n->n, old_type_size)) for (i = 0; i < type_size - old_type_size; i++) - n->n |= MARKER_BYTE_UNKNOWN << (type_size - 1 - i); + n->n |= MARKER_BYTE_UNKNOWN + << ((type_size - 1 - i) * BITS_PER_MARKER); if (type_size < 64 / BITS_PER_MARKER) { regression testsuite run without regression on x86_64-linux-gnu and bswap tests all pass on arm-none-eabi target Is it ok for trunk? Best regards, Thomas > -Original Message- > From: Richard Biener [mailto:richard.guent...@gmail.com] > Sent: Wednesday, September 24, 2014 4:01 PM > To: Thomas Preud'homme > Cc: GCC Patches > Subject: Re: [PATCH] Fix PR63266: Keep track of impact of sign extension > in bswap > > On Tue, Sep 16, 2014 at 12:24 PM, Thomas Preud'homme > wrote: > > Hi all, > > > > The fix for PR61306 disabled bswap when a sign extension is detected. > However this led to a test case regression (and potential performance > regression) in case where a sign extension happens but its effect is > canceled by other bit manipulation. 
This patch aims to fix that by having a > special marker to track bytes whose value is unpredictable due to sign > extension. If the final result of a bit manipulation doesn't contain any > such marker then the bswap optimization can proceed. > > Nice and simple idea. > > Ok. > > Thanks, > Richard. > > > *** gcc/ChangeLog *** > > > > 2014-09-15 Thomas Preud'homme > > > > PR tree-optimization/63266 > > * tree-ssa-math-opts.c (struct symbolic_number): Add comment > about > > marker for unknown byte value. > > (MARKER_MASK): New macro. > > (MARKER_BYTE_UNKNOWN): New macro. > > (HEAD_MARKER): New macro. > > (do_shift_rotate): Mark bytes with unknown values due to sign > > extension when doing an arithmetic right shift. Replace hardcoded > > mask for marker by new MARKER_MASK macro. > > (find_bswap_or_nop_1): Likewise and adjust ORing of two > symbolic > > numbers accordingly. > > > > *** gcc/testsuite/ChangeLog *** > > > > 2014-09-15 Thomas Preud'homme > > > > PR tree-optimization/63266 > > * gcc.dg/optimize-bswapsi-1.c (swap32_d): New bswap pass test. > > > > > > Testing: > > > > * Built an arm-none-eabi-gcc cross-compiler and used it to run the > testsuite on QEMU emulating Cortex-M3 without any regression > > * Bootstrapped on x86_64-linux-gnu target and testsuite was run > without regression > > > > > > Ok for trunk?
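The byte-versus-bit confusion the follow-up fixes can be shown standalone. The function below is a made-up model (not the GCC code) of the loop in find_bswap_or_nop_1 that taints the marker slots added by a widening cast of a sign-extended value; the bug was shifting by a byte index instead of a marker position (the missing `* BITS_PER_MARKER` factor).

```c
#include <assert.h>
#include <stdint.h>

#define BITS_PER_MARKER 8
#define MARKER_BYTE_UNKNOWN 0xffULL

/* After widening a sign-extended value from OLD_TYPE_SIZE to TYPE_SIZE
   bytes, the new high bytes have unpredictable values, so their markers are
   set to MARKER_BYTE_UNKNOWN.  Shifting by the corrected amount places the
   unknown marker in the right 8-bit slot.  */
static uint64_t
mark_high_bytes_unknown (uint64_t n, int type_size, int old_type_size)
{
  for (int i = 0; i < type_size - old_type_size; i++)
    n |= MARKER_BYTE_UNKNOWN << ((type_size - 1 - i) * BITS_PER_MARKER);
  return n;
}
```

A 2-byte symbolic number 0x0201 widened to 4 bytes thus becomes 0xffff0201: the two original markers survive, the two new ones are unknown, and any expression still containing an unknown marker is rejected by the pass.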
RE: [PATCH] Fix PR63266: Keep track of impact of sign extension in bswap
> From: Christophe Lyon [mailto:christophe.l...@linaro.org] > Sent: Tuesday, October 21, 2014 10:03 PM > > +typedef int SItype __attribute__ ((mode (SI))); > What's the purpose of this? It seems unused. Sigh. Bad copy/paste from optimize-bswapsi-1.c. I'll add it to my patch for pr63259. > I believe this should be: > "checks that unknown byte markers are set correctly in case of cast" Indeed, there is a 's' missing for markers. > > > + > > +HItype > > +swap16 (HItype in) > > +{ > > + return (HItype) (((in >> 0) & 0xFF) << 8) > > + | (((in >> 8) & 0xFF) << 0); > > +} > > + > > /* { dg-final { scan-tree-dump-times "16 bit load in target endianness > found at" 3 "bswap" } } */ > > -/* { dg-final { scan-tree-dump-times "16 bit bswap implementation > found at" 3 "bswap" { xfail alpha*-*-* arm*-*-* } } } */ > > +/* { dg-final { scan-tree-dump-times "16 bit bswap implementation > found at" 1 "bswap" { target alpha*-*-* arm*-*-* } } } */ > > This line fails when forcing the compiler to target -march=armv5t for > instance. I suspect this is because the check_effective_target_bswap > test is too permissive. Yep, it's likely to be the case. Feel free to add a version check in it. Thanks for the review. Best regards, Thomas
RE: [PATCH] Fix PR63266: Keep track of impact of sign extension in bswap
> From: Christophe Lyon [mailto:christophe.l...@linaro.org] > Sent: Sunday, October 26, 2014 4:40 PM > I tried to modify check_effective_target_bswap > and added: > + } else { > + if { [istarget arm*-*-*] > +&& [check_no_compiler_messages_nocache arm_v6_or_later > object { > +#if __ARM_ARCH < 6 > +#error not armv6 or later > +#endif > +int i; > +} ""] } { > + set et_bswap_saved 1 > + } > since the rev* instructions appeared in v6. > > Regarding the testsuite, it moves the tests to UNSUPPORTED vs a mix of > PASS/FAIL/XFAIL [SNIP PASS/FAIL/XFAIL changes] > > The PASS seems not very informative, so it may not be a problem to > lose these few PASS/XFAIL. Agreed. A FAIL would only mean that the test was badly written. Only the dump is relevant to tell whether the bswap pass did its job or not. > > We can also explicitly skip optimize-bswaphi-1 when ARM_ARCH < 6. > > Not sure what's preferred? I prefer changing the effective target as it could be reused for some other tests eventually. It also better reflects the reason why the test is disabled: no 16-bit bswap. Best regards, Thomas
[PATCH] Fix PR61328: fully initialize symbolic number before using it
When a bitwise OR gimple statement has as operands SSA_NAMEs initialized directly from a memory source (no cast or other unary statement intervening), a symbolic number is used while only partly initialized. This was discovered by valgrind and reported as PR61328. This patch fixes that by moving the initialization code into a separate function that can be called from the two places that need it. There was also a problem of a structure field being set in a function whose return value was not checked before the field was read, which would lead to missed optimizations. ChangeLog is as follows: 2014-05-29 Thomas Preud'homme PR tree-optimization/61328 * tree-ssa-math-opts.c (init_symbolic_number): Extract symbolic number initialization from find_bswap_or_nop_1. (find_bswap_or_nop_1): Test return value of find_bswap_or_nop_1 stored in source_expr2 before using the size value the function sets. Also make use of init_symbolic_number () in both the old place and find_bswap_or_nop_load () to avoid reading uninitialized memory when doing recursion in the GIMPLE_BINARY_RHS case. Ok for trunk? diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c index d9afccf..6c26d6d 100644 --- a/gcc/tree-ssa-math-opts.c +++ b/gcc/tree-ssa-math-opts.c @@ -1701,6 +1701,30 @@ verify_symbolic_number_p (struct symbolic_number *n, gimple stmt) return true; } +/* Initialize the symbolic number N for the bswap pass from the base element + SRC manipulated by the bitwise OR expression. */ + +static bool +init_symbolic_number (struct symbolic_number *n, tree src) +{ + n->base_addr = n->offset = n->alias_set = n->vuse = NULL_TREE; + + /* Set up the symbolic number N by setting each byte to a value between 1 and + the byte size of rhs1. The highest order byte is set to n->size and the + lowest order byte to 1. 
*/ + n->size = TYPE_PRECISION (TREE_TYPE (src)); + if (n->size % BITS_PER_UNIT != 0) +return false; + n->size /= BITS_PER_UNIT; + n->range = n->size; + n->n = CMPNOP; + + if (n->size < (int)sizeof (int64_t)) +n->n &= ((uint64_t)1 << (n->size * BITS_PER_UNIT)) - 1; + + return true; +} + /* Check if STMT might be a byte swap or a nop from a memory source and returns the answer. If so, REF is that memory source and the base of the memory area accessed and the offset of the access from that base are recorded in N. */ @@ -1713,26 +1737,27 @@ find_bswap_or_nop_load (gimple stmt, tree ref, struct symbolic_number *n) HOST_WIDE_INT bitsize, bitpos; enum machine_mode mode; int unsignedp, volatilep; + tree offset, base_addr; if (!gimple_assign_load_p (stmt) || gimple_has_volatile_ops (stmt)) return false; - n->base_addr = get_inner_reference (ref, &bitsize, &bitpos, &n->offset, - &mode, &unsignedp, &volatilep, false); + base_addr = get_inner_reference (ref, &bitsize, &bitpos, &offset, &mode, + &unsignedp, &volatilep, false); - if (TREE_CODE (n->base_addr) == MEM_REF) + if (TREE_CODE (base_addr) == MEM_REF) { offset_int bit_offset = 0; - tree off = TREE_OPERAND (n->base_addr, 1); + tree off = TREE_OPERAND (base_addr, 1); if (!integer_zerop (off)) { - offset_int boff, coff = mem_ref_offset (n->base_addr); + offset_int boff, coff = mem_ref_offset (base_addr); boff = wi::lshift (coff, LOG2_BITS_PER_UNIT); bit_offset += boff; } - n->base_addr = TREE_OPERAND (n->base_addr, 0); + base_addr = TREE_OPERAND (base_addr, 0); /* Avoid returning a negative bitpos as this may wreak havoc later. */ if (wi::neg_p (bit_offset)) @@ -1743,11 +1768,11 @@ find_bswap_or_nop_load (gimple stmt, tree ref, struct symbolic_number *n) Subtract it to BIT_OFFSET and add it (scaled) to OFFSET. 
*/ bit_offset -= tem; tem = wi::arshift (tem, LOG2_BITS_PER_UNIT); - if (n->offset) - n->offset = size_binop (PLUS_EXPR, n->offset, + if (offset) + offset = size_binop (PLUS_EXPR, offset, wide_int_to_tree (sizetype, tem)); else - n->offset = wide_int_to_tree (sizetype, tem); + offset = wide_int_to_tree (sizetype, tem); } bitpos += bit_offset.to_shwi (); @@ -1758,6 +1783,9 @@ find_bswap_or_nop_load (gimple stmt, tree ref, struct symbolic_number *n) if (bitsize % BITS_PER_UNIT) return false; + init_symbolic_number (n, ref); + n->base_addr = base_addr; + n->offset = offset; n->bytepos = bitpos / BITS_PER_UNIT; n->alias_set = reference_alias_ptr_type (ref); n->vuse = gimple_vuse (stmt); @@ -1816,28 +1
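What init_symbolic_number computes can be modeled standalone. The `init_markers` function below is illustrative, not the GCC code: each byte slot of the number gets its 1-based index, and marker slots beyond the source size are cleared so a narrower source never carries stale high markers into later comparisons.

```c
#include <assert.h>
#include <stdint.h>

/* 8 markers of 8 bits each: byte slot i holds i + 1 (GCC's CMPNOP).  */
#define CMPNOP 0x0807060504030201ULL

/* Model of the initialization for a SIZE_IN_BYTES-wide source: start from
   the full marker pattern and mask off the slots the source does not
   cover.  */
static uint64_t
init_markers (int size_in_bytes)
{
  uint64_t n = CMPNOP;
  /* guard: shifting a uint64_t by 64 would be undefined behaviour */
  if (size_in_bytes < (int) sizeof (uint64_t))
    n &= ((uint64_t) 1 << (size_in_bytes * 8)) - 1;
  return n;
}
```

The PR61328 bug was precisely that this setup ran in only one of the two code paths reaching the OR handling, leaving the other path to read the masked fields uninitialized.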
[PATCH] Fix PR61306: improve handling of sign and cast in bswap
When the bswap pass replaces a bitwise expression involving a memory source by a load, possibly followed by a bswap, the load may have a size smaller than that of the target expression the bitwise expression was assigned to. So some sort of cast is needed. But there might also be a difference between the size of the expression that was assigned and the size of the load, so three different sizes might be involved. Consider the following example from binutils: bfd_vma bfd_getl16 (const void *p) { const bfd_byte *addr = (const bfd_byte *) p; return (addr[1] << 8) | addr[0]; } Here the load has a size of 16 bits, while the bitwise expression is an int (don't ask me why) but is returned as a 64-bit quantity (bfd_vma maps to the size of host registers). In this case we need two separate casts. One from the 16-bit quantity to int, with zero extension since the high bits are 0. It is always a zero extension because bswap will not do anything in the presence of a sign extension, as the result would then depend on the initial value (maybe a bswap for a positive number, and a random-looking value for a negative one). Then we need to cast respecting the extension that would have happened had we not replaced the bitwise expression. Here, since the bitwise expression is int, that means we sign extend and then consider the content as being unsigned (bfd_vma is an unsigned quantity). When a bswap is necessary, this double cast must be done after the bswap, since the bswap must be done on the loaded value, that being the size expected by the bswap builtin. Finally, this patch also forbids any sign extension *in* the bitwise expression, as the result of the expression would then be unpredictable (it would depend on the initial value). 
The patch works this way: 1) prevent size extension of a bitwise expression 2) record the type of the bitwise expression instead of its size (the size can be determined from the type) 3) use this type to perform a double cast as explained above 2014-05-30 Thomas Preud'homme PR tree-optimization/61306 * tree-ssa-math-opts.c (struct symbolic_number): Store type of expression instead of its size. (do_shift_rotate): Adapt to change in struct symbolic_number. (verify_symbolic_number_p): Likewise. (init_symbolic_number): Likewise. (find_bswap_or_nop_1): Likewise. Also prevent optimization when the result of the expression is unpredictable due to sign extension. (convert_via): New function to deal with the casts needed from the loaded value to the target SSA name. (bswap_replace): Rename load_type to range_type to reflect that it is the type the accessed memory shall have before being cast. Select load type according to whether a bswap needs to be done. Cast first to rhs with zero extension and then to lhs with sign extension to keep the semantics of the original stmts. (pass_optimize_bswap::execute): Adapt to change in struct symbolic_number. Decide if the range accessed should be signed or unsigned before being cast to the lhs type, based on the rhs type and size. 2014-05-29 Thomas Preud'homme * gcc.c-torture/execute/pr61306.c: New test. The patch is attached. Is this ok for trunk? Best regards, Thomas PR61306.1.0.diff Description: Binary data
[PATCH] Fix PR61320: disable bswap for unaligned access on SLOW_UNALIGNED_ACCESS targets
Hi there, It seems from PR61320 that the bswap pass causes some problems when it replaces an OR expression by an unaligned access. Although it's not clear yet why the unaligned load does not go through the extract_bit_field codepath, it is necessary to provide a solution as this prevents sparc from bootstrapping. This patch takes the simple approach of cancelling the bswap optimization when the load that would replace the OR expression would be an unaligned load and the target has SLOW_UNALIGNED_ACCESS. In the long run this patch should be reverted as soon as the root cause of the current problem is found. The patch also rewrites the tests to take into account the fact that the optimization is not done on some targets. It also adds some alignment hints so that more tests can be run even on STRICT_ALIGNMENT targets. ChangeLog changes are: *** gcc/ChangeLog *** 2014-06-03 Thomas Preud'homme PR tree-optimization/61320 * tree-ssa-math-opts.c (bswap_replace): Cancel bswap optimization when load is unaligned and would be slow for this target. *** gcc/testsuite/ChangeLog *** 2014-06-03 Thomas Preud'homme * gcc.dg/optimize-bswaphi-1.c: Make variables global when possible to enforce correct alignment and make the test work better on STRICT_ALIGNMENT targets. Also adjust dg-final selectors when alignment cannot be controlled (read_*_3 ()). * gcc.dg/optimize-bswapsi-2.c: Likewise. * gcc.dg/optimize-bswapdi-3.c: Likewise. Bootstrapped on x86_64-linux-gnu and no regression found in the testsuite. Patch is in attachment. It applies on top of the one for PR61306 in the email titled "[PATCH] Fix PR61306: improve handling of sign and cast in bswap" but can be trivially modified to apply directly on trunk should that patch (PR61306) need to be improved. Is this ok for trunk? Best regards, Thomas PR61320.1.0.diff Description: Binary data
RE: [PATCH] Fix PR54733 Optimize endian independent load/store
> From: Christophe Lyon [mailto:christophe.l...@linaro.org] > On 29 May 2014 11:58, Thomas Preud'homme > wrote: > > > > Does the patch solve the problem you had? What about you Christophe? > > > > > > Hi Thomas, > > After a quick test, it looks OK to me. Great. What about you Andreas? Does it work fine for you? If yes, is this ok for trunk? Best regards, Thomas
[PATCH] Adding myself to Write After Approval in MAINTAINERS
Hi all, I forgot to add myself to the MAINTAINERS file when I got Write After Approval access. This patch does just that (already committed, as specified on the website): diff --git a/ChangeLog b/ChangeLog index d35b315..80e9600 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,3 +1,7 @@ +2014-06-04 Thomas Preud'homme + + * MAINTAINERS (Write After Approval): Add myself. + 2014-06-03 Andrew Bennett * MAINTAINERS (Write After Approval): Add myself. diff --git a/MAINTAINERS b/MAINTAINERS index 12c123d..71206e7 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -501,6 +501,7 @@ Paul Pluzhnikov ppluzhni...@google.com Marek Polacek pola...@redhat.com Antoniu Pop antoniu@gmail.com Vidya Praveen vidyaprav...@arm.com +Thomas Preud'homme thomas.preudho...@arm.com Vladimir Prus vladi...@codesourcery.com Yao Qi y...@codesourcery.com Jerry Quinn jlqu...@optonline.net
RE: [PATCH] Fix PR61306: improve handling of sign and cast in bswap
> From: Richard Biener [mailto:richard.guent...@gmail.com] > > I'd rather change the comparisons > > - if (n->size < (int)sizeof (int64_t)) > -n->n &= ((uint64_t)1 << (n->size * BITS_PER_UNIT)) - 1; > + if (bitsize / BITS_PER_UNIT < (int)sizeof (int64_t)) > +n->n &= ((uint64_t)1 << bitsize) - 1; > > to work in bits, thus bitsize < 8 * sizeof (int64_t) (note that using > BITS_PER_UNIT is bogus here - you are dealing with 8-bit bytes > on the host, not whatever the target uses). Otherwise it smells > like truncation may lose bits (probably not in practice). Ah yes, right. > > + /* Sign extension: result is dependent on the value. */ > + if (!TYPE_UNSIGNED (type) && !TYPE_UNSIGNED (n->type) > + && type_size > TYPE_PRECISION (n->type)) > + return NULL_TREE; > > whether it's sign-extended depends solely on the sign of the > converted entity, so I'm not sure why you are restricting this > to signed n->type. Consider > > signed char *p; > ((unsigned int)p[0]) << 8 | ((unsigned int)p[1]) << 16 > > the loads are still sign-extended but n->type is unsigned. Indeed, I understood it for convert_via (the requirement to be unsigned) but got it wrong here. > > I'm failing to get why you possibly need two casts ... you should > only need one, from the bswap/load result to the final type > (zero-extended as you say - so the load type should simply be > unsigned which it is already). Because of the type of the shift constant, the bitwise expression is usually of type int. However, if you write a function that does a 32 bit load in host endianness (like a 32 bit little endian load on x86_64) with a result of size 64 bits, then you need to sign extend, since the source type is signed. This is a situation I encountered in bfd_getl32 in binutils I think. Now if you consider bfd_getl16 instead a direct sign extension is out of the question. Suppose bfd_getl16 is called to read from a memory address that contains 0xff 0xfe. 
The bitwise expression would thus be equivalent to the value 0xfeff since it's of type int. Then after the sign extension to 64 bits you'd still have 0xfeff. But after replacing the bitwise expression you end up with a 16-bit load into a 16-bit SSA variable. If you sign extend that directly to 64 bits you'll end up with 0xfffffffffffffeff, which is wrong. But if you zero extend to an int value (the type of the bitwise OR expression) and then sign extend to the target type, you'll have the correct result. But you're right, we can do it more simply by sign extending if load size == size of the bitwise expression and zero extending otherwise. The change of load_type to range_type would still be needed because in the case of a load + bswap it's better to load in the same type as the type of the parameter of bswap. After the bswap you'd need to convert to a signed or unsigned value according to the logic above (load size == size of bitwise expression). In the original statement, the bitwise OR expression would have the 2 bytes of higher weight be 0 while the 2 bytes of lower weight would be the value read from memory, so the result of the sign extension would be the same as that of a zero extension. > > So I think that the testcase in the patch is fixed already by > doing the n->type change (and a proper sign-extension detection). I don't remember exactly but I think it didn't fix this bug (but it is a necessary fix anyway). > > Can you please split that part out? Sure, that part would need to be applied on gcc 4.9 too. I'll try to construct a testcase for that. > > That range_type and convert_via looks wrong and unnecessary to me, > and it doesn't look like you have a testcase exercising it? See above.
RE: [PATCH] Fix PR61306: improve handling of sign and cast in bswap
> From: Richard Biener [mailto:richard.guent...@gmail.com] > > Err, but if you zero-extend directly to the target type you have the > correct result, too. Yep, but in some cases we need to sign extend (a 32-bit bitwise OR stored into a 64-bit result). As I said, the logic could be simplified by sign extending if load_size == bitwise expression size and zero extending otherwise. I'll rework the patch in this direction. > > But nothing for the testsuite? The testcase you add falls foul of > sign-extending the loads. Ack, I'll add a test for zero extension and one for sign extension. Cheers, Thomas
[PATCH] Clean bswap messages and tests
This patch includes 2 cleanups that were requested in PR61320: * Use dg-additional-options to specify the extra option the s390 target needs * Use the correct vocabulary of target endianness instead of host endianness in comments, pass dumps and a past ChangeLog entry. Here are the ChangeLog entries: *** gcc/ChangeLog *** 2014-06-04 Thomas Preud'homme * ChangeLog (2014-05-23): Fix ChangeLog entry to refer to target endianness instead of host endianness. * tree-ssa-math-opts.c (find_bswap_or_nop_1): Likewise in dumps and comments. *** gcc/testsuite/ChangeLog *** 2014-06-04 Thomas Preud'homme * gcc.dg/optimize-bswaphi-1.c: Adapt test to change of dump output. Specify -march=z900 as an additional option. * gcc.dg/optimize-bswapsi-1.c: Likewise for s390 options. * gcc.dg/optimize-bswapsi-2.c: Likewise. * gcc.dg/optimize-bswapdi-3.c: Likewise for adaptation to dump change. diff --git a/gcc/ChangeLog b/gcc/ChangeLog index de07e5c..09122aa 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1623,7 +1623,8 @@ (find_bswap_or_nop_1): This. Also add support for memory source. (find_bswap): Renamed to ... (find_bswap_or_nop): This. Also add support for memory source and - detection of bitwise operations equivalent to load in host endianness. + detection of bitwise operations equivalent to load in target + endianness. (execute_optimize_bswap): Likewise. Also move its leading comment back in place and split statement transformation into ... (bswap_replace): This. 
diff --git a/gcc/testsuite/gcc.dg/optimize-bswapdi-3.c b/gcc/testsuite/gcc.dg/optimize-bswapdi-3.c index 0a8bf2e..d96d7e5 100644 --- a/gcc/testsuite/gcc.dg/optimize-bswapdi-3.c +++ b/gcc/testsuite/gcc.dg/optimize-bswapdi-3.c @@ -59,6 +59,6 @@ uint64_t read_be64_3 (unsigned char *data) | ((uint64_t) *(data + 1) << 48) | ((uint64_t) *data << 56); } -/* { dg-final { scan-tree-dump-times "64 bit load in host endianness found at" 3 "bswap" } } */ +/* { dg-final { scan-tree-dump-times "64 bit load in target endianness found at" 3 "bswap" } } */ /* { dg-final { scan-tree-dump-times "64 bit bswap implementation found at" 3 "bswap" { xfail alpha*-*-* arm*-*-* } } } */ /* { dg-final { cleanup-tree-dump "bswap" } } */ diff --git a/gcc/testsuite/gcc.dg/optimize-bswaphi-1.c b/gcc/testsuite/gcc.dg/optimize-bswaphi-1.c index 65bff98..3e51f04 100644 --- a/gcc/testsuite/gcc.dg/optimize-bswaphi-1.c +++ b/gcc/testsuite/gcc.dg/optimize-bswaphi-1.c @@ -2,7 +2,7 @@ /* { dg-require-effective-target bswap16 } */ /* { dg-require-effective-target stdint_types } */ /* { dg-options "-O2 -fdump-tree-bswap" } */ -/* { dg-options "-O2 -fdump-tree-bswap -march=z900" { target s390-*-* } } */ +/* { dg-additional-options "-march=z900" { target s390-*-* } } */ #include @@ -42,6 +42,6 @@ uint32_t read_be16_3 (unsigned char *data) return *(data + 1) | (*data << 8); } -/* { dg-final { scan-tree-dump-times "16 bit load in host endianness found at" 3 "bswap" } } */ +/* { dg-final { scan-tree-dump-times "16 bit load in target endianness found at" 3 "bswap" } } */ /* { dg-final { scan-tree-dump-times "16 bit bswap implementation found at" 3 "bswap" { xfail alpha*-*-* arm*-*-* } } } */ /* { dg-final { cleanup-tree-dump "bswap" } } */ diff --git a/gcc/testsuite/gcc.dg/optimize-bswapsi-1.c b/gcc/testsuite/gcc.dg/optimize-bswapsi-1.c index 33d0bb0..ebfca60 100644 --- a/gcc/testsuite/gcc.dg/optimize-bswapsi-1.c +++ b/gcc/testsuite/gcc.dg/optimize-bswapsi-1.c @@ -2,7 +2,7 @@ /* { dg-require-effective-target 
bswap32 } */ /* { dg-require-effective-target stdint_types } */ /* { dg-options "-O2 -fdump-tree-bswap" } */ -/* { dg-options "-O2 -fdump-tree-bswap -march=z900" { target s390-*-* } } */ +/* { dg-additional-options "-march=z900" { target s390-*-* } } */ #include diff --git a/gcc/testsuite/gcc.dg/optimize-bswapsi-2.c b/gcc/testsuite/gcc.dg/optimize-bswapsi-2.c index 518b510..de6e697 100644 --- a/gcc/testsuite/gcc.dg/optimize-bswapsi-2.c +++ b/gcc/testsuite/gcc.dg/optimize-bswapsi-2.c @@ -2,7 +2,7 @@ /* { dg-require-effective-target bswap32 } */ /* { dg-require-effective-target stdint_types } */ /* { dg-options "-O2 -fdump-tree-bswap" } */ -/* { dg-options "-O2 -fdump-tree-bswap -march=z900" { target s390-*-* } } */ +/* { dg-additional-options "-march=z900" { target s390-*-* } } */ #include @@ -44,6 +44,6 @@ uint32_t read_be32_3 (unsigned char *data) | (*data << 24); } -/* { dg-final { scan-tree-dump-times "32 bit load in host endianness found at" 3 "bswap" } } */ +/* { dg-final {
RE: [PATCH] Fix PR54733 Optimize endian independent load/store
> From: Richard Biener [mailto:richard.guent...@gmail.com] > On Wed, Jun 4, 2014 at 9:04 AM, Thomas Preud'homme > wrote: > > > > Great. What about you Andreas? Does it work fine for you? If yes, is this ok > for trunk? > > Ok. > > Thanks, > Richard. Commited since I got positive feedback from Christophe Lyon and Rainer (on PR61320). Best regards, Thomas
[PATCH] Unchecked call to init_symbolic_number
When doing a bootstrap on x86_64-linux-gnu to test a patch I'm writing, I encountered a failure that turned out to be due to not checking the return value of init_symbolic_number in find_bswap_or_nop_load (tree-ssa-math-opts.c). The following patch fixes that; I committed it as obvious, as per the GCC write access policies. ChangeLog: 2014-06-09 Thomas Preud'homme * tree-ssa-math-opts.c (find_bswap_or_nop_load): Check return value of init_symbolic_number (). diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c index a928ad9..1f011a6 100644 --- a/gcc/tree-ssa-math-opts.c +++ b/gcc/tree-ssa-math-opts.c @@ -1784,7 +1784,8 @@ find_bswap_or_nop_load (gimple stmt, tree ref, struct symbolic_number *n) if (bitsize % BITS_PER_UNIT) return false; - init_symbolic_number (n, ref); + if (!init_symbolic_number (n, ref)) +return false; n->base_addr = base_addr; n->offset = offset; n->bytepos = bitpos / BITS_PER_UNIT;
[PATCH] Fix PR61375: cancel bswap optimization when value doesn't fit in a HOST_WIDE_INT
When analyzing a bitwise AND with a constant as part of a bitwise OR, the bswap pass stores the constant in an int64_t variable without checking whether it fits. As a result, we get an ICE when the constant is an __int128 value. This affects GCC trunk but also GCC 4.9 and 4.8 (and possibly earlier versions as well). The ChangeLogs are changed as follows: *** gcc/ChangeLog *** 2014-06-05 Thomas Preud'homme PR tree-optimization/61375 * tree-ssa-math-opts.c (init_symbolic_number): Cancel optimization if symbolic number cannot be represented in an unsigned HOST_WIDE_INT. (find_bswap_or_nop_1): Likewise. *** gcc/testsuite/ChangeLog *** 2014-06-05 Thomas Preud'homme PR tree-optimization/61375 * gcc.c-torture/execute/pr61375-1.c: New test. diff --git a/gcc/testsuite/gcc.c-torture/execute/pr61375.c b/gcc/testsuite/gcc.c-torture/execute/pr61375.c new file mode 100644 index 000..58df57a --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/execute/pr61375.c @@ -0,0 +1,34 @@ +#ifdef __UINT64_TYPE__ +typedef __UINT64_TYPE__ uint64_t; +#else +typedef unsigned long long uint64_t; +#endif + +#ifndef __SIZEOF_INT128__ +#define __int128 long long +#endif + +/* Some version of bswap optimization would ICE when analyzing a mask constant + too big for an HOST_WIDE_INT (PR61375). 
*/ + +__attribute__ ((noinline, noclone)) uint64_t +uint128_central_bitsi_ior (unsigned __int128 in1, uint64_t in2) +{ + __int128 mask = (__int128)0x << 56; + return ((in1 & mask) >> 56) | in2; +} + +int main(int argc) +{ + __int128 in = 1; +#ifdef __SIZEOF_INT128__ + in <<= 64; +#endif + if (sizeof (uint64_t) * __CHAR_BIT__ != 64) +return 0; + if (sizeof (unsigned __int128) * __CHAR_BIT__ != 128) +return 0; + if (uint128_central_bitsi_ior (in, 2) != 0x102) +__builtin_abort (); + return 0; +} diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c index 658b341..95b3f25 100644 --- a/gcc/tree-ssa-math-opts.c +++ b/gcc/tree-ssa-math-opts.c @@ -1717,6 +1717,8 @@ init_symbolic_number (struct symbolic_number *n, tree src) if (n->size % BITS_PER_UNIT != 0) return false; n->size /= BITS_PER_UNIT; + if (n->size > (int)sizeof (unsigned HOST_WIDE_INT)) +return false; n->range = n->size; n->n = CMPNOP; @@ -1883,6 +1885,8 @@ find_bswap_or_nop_1 (gimple stmt, struct symbolic_number *n, int limit) type_size = TYPE_PRECISION (gimple_expr_type (stmt)); if (type_size % BITS_PER_UNIT != 0) return NULL_TREE; + if (type_size > (int)sizeof (unsigned HOST_WIDE_INT) * 8) + return NULL_TREE; if (type_size / BITS_PER_UNIT < (int)(sizeof (int64_t))) { Is this OK for trunk? What about backports for 4.8 and 4.9? Would a reworked patch for these versions be accepted? The change would be trivial: the code in init_symbolic_number now was moved from some other place. Best regards, Thomas
RE: [PATCH] Fix PR61306: improve handling of sign and cast in bswap
> From: Richard Biener [mailto:richard.guent...@gmail.com] > Sent: Wednesday, June 04, 2014 5:39 PM > To: Thomas Preud'homme > > > I'm failing to get why you possibly need two casts ... you should > only need one, from the bswap/load result to the final type > (zero-extended as you say - so the load type should simply be > unsigned which it is already). You are right indeed. I failed to realize that the problems I encountered were caused by an initially wrong understanding of the reason behind PR61306. None of this code is necessary. > > So I think that the testcase in the patch is fixed already by > doing the n->type change (and a proper sign-extension detection). > > Can you please split that part out? Doing so, I realized the patch was incomplete. Sign extension can be triggered in two distinct places in the code (right shift and cast), both of which can lead to incorrect code being generated. With some effort I managed to create two testcases that work on GCC trunk as well as on GCC 4.9 and 4.8. ChangeLog entries are: *** gcc/ChangeLog *** 2014-06-05 Thomas Preud'homme PR tree-optimization/61306 * tree-ssa-math-opts.c (struct symbolic_number): Store type of expression instead of its size. (do_shift_rotate): Adapt to change in struct symbolic_number. Return false to prevent optimization when the result is unpredictable due to arithmetic right shift of a signed type whose highest byte is set. (verify_symbolic_number_p): Adapt to change in struct symbolic_number. (init_symbolic_number): Likewise. (find_bswap_or_nop_1): Likewise. Return NULL to prevent optimization when the result is unpredictable due to sign extension. *** gcc/testsuite/ChangeLog *** 2014-06-05 Thomas Preud'homme * gcc.c-torture/execute/pr61306-1.c: New test. * gcc.c-torture/execute/pr61306-2.c: Likewise. * gcc.c-torture/execute/pr61306-3.c: Likewise. 
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr61306-1.c b/gcc/testsuite/gcc.c-torture/execute/pr61306-1.c new file mode 100644 index 000..f6e8ff3 --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/execute/pr61306-1.c @@ -0,0 +1,39 @@ +#ifdef __INT32_TYPE__ +typedef __INT32_TYPE__ int32_t; +#else +typedef int int32_t; +#endif + +#ifdef __UINT32_TYPE__ +typedef __UINT32_TYPE__ uint32_t; +#else +typedef unsigned uint32_t; +#endif + +#define __fake_const_swab32(x) ((uint32_t)( \ +(((uint32_t)(x) & (uint32_t)0x000000ffUL) << 24) |\ +(((uint32_t)(x) & (uint32_t)0x0000ff00UL) << 8) |\ +(((uint32_t)(x) & (uint32_t)0x00ff0000UL) >> 8) |\ +(( (int32_t)(x) & (int32_t)0xff000000UL) >> 24))) + +/* Previous version of bswap optimization failed to consider sign extension + and as a result would replace an expression *not* doing a bswap by a + bswap. */ + +__attribute__ ((noinline, noclone)) uint32_t +fake_bswap32 (uint32_t in) +{ + return __fake_const_swab32 (in); +} + +int +main(void) +{ + if (sizeof (int32_t) * __CHAR_BIT__ != 32) +return 0; + if (sizeof (uint32_t) * __CHAR_BIT__ != 32) +return 0; + if (fake_bswap32 (0x87654321) != 0xffffff87) +__builtin_abort (); + return 0; +} diff --git a/gcc/testsuite/gcc.c-torture/execute/pr61306-2.c b/gcc/testsuite/gcc.c-torture/execute/pr61306-2.c new file mode 100644 index 000..6cbbd19 --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/execute/pr61306-2.c @@ -0,0 +1,40 @@ +#ifdef __INT16_TYPE__ +typedef __INT16_TYPE__ int16_t; +#else +typedef short int16_t; +#endif + +#ifdef __UINT32_TYPE__ +typedef __UINT32_TYPE__ uint32_t; +#else +typedef unsigned uint32_t; +#endif + +#define __fake_const_swab32(x) ((uint32_t)( \ +(((uint32_t) (x) & (uint32_t)0x000000ffUL) << 24) | \ +(((uint32_t)(int16_t)(x) & (uint32_t)0xffffff00UL) << 8) | \ +(((uint32_t) (x) & (uint32_t)0x00ff0000UL) >> 8) | \ +(((uint32_t) (x) & (uint32_t)0xff000000UL) >> 24))) + + +/* Previous version of bswap optimization failed to consider sign extension + and as a result would replace an expression *not* doing a
bswap by a + bswap. */ + +__attribute__ ((noinline, noclone)) uint32_t +fake_bswap32 (uint32_t in) +{ + return __fake_const_swab32 (in); +} + +int +main(void) +{ + if (sizeof (uint32_t) * __CHAR_BIT__ != 32) +return 0; + if (sizeof (int16_t) * __CHAR_BIT__ != 16) +return 0; + if (fake_bswap32 (0x81828384) != 0xff838281) +__builtin_abort (); + return 0; +} diff --git a/gcc/testsuite/gcc.c-torture/execute/pr61306-3.c b/gcc/testsuite/gcc.c-torture/execute/pr61306-3.c new file mode 100644 index 000..6086e27 --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/execute/pr61306-3.c @@ -0,0 +1,13 @@ +short a = -1; +int b; +char c; + +int
RE: [PATCH] Fix PR61306: improve handling of sign and cast in bswap
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme > > Is this OK for trunk? Does this bug qualify for a backport patch to > 4.8 and 4.9 branches? I forgot to mention that this was tested via bootstrap on x86_64-linux-gnu target, the testsuite then showing no regressions and the 3 tests added now passing. Best regards, Thomas
[PATCH] PR61517: fix stmt replacement in bswap pass
Hi everybody, Thanks to a comment from Richard Biener, the bswap pass takes care not to perform its optimization if memory is modified between the loads of the original expression. However, when it replaces these statements by a single load, it does so at the location of the gimple statement that computes the final bitwise OR of the original expression, and memory could be modified between the last load statement and this bitwise OR statement. The result would therefore be to read memory *after* it was changed instead of before. This patch takes care to move the statement to be replaced close to one of the original loads, thus avoiding this problem. ChangeLog entries for this fix are: *** gcc/ChangeLog *** 2014-06-16 Thomas Preud'homme * tree-ssa-math-opts.c (find_bswap_or_nop_1): Adapt to return a stmt whose rhs's first tree is the source expression instead of the expression itself. (find_bswap_or_nop): Likewise. (bswap_replace): Rename stmt to cur_stmt. Pass gsi by value and src as a gimple stmt whose rhs's first tree is the source. In the memory source case, move the stmt to be replaced close to one of the original loads to avoid the problem of a store between the loads and the stmt's original location. (pass_optimize_bswap::execute): Adapt to change in bswap_replace's signature. *** gcc/testsuite/ChangeLog *** 2014-06-16 Thomas Preud'homme * gcc.c-torture/execute/bswap-2.c (incorrect_read_le32): New. (incorrect_read_be32): Likewise. (main): Call incorrect_read_* to test stmt replacement is made by bswap at the right place. * gcc.c-torture/execute/pr61517.c: New test. Patch also attached for convenience. Is it ok for trunk? 
diff --git a/gcc/testsuite/gcc.c-torture/execute/bswap-2.c b/gcc/testsuite/gcc.c-torture/execute/bswap-2.c index a47e01a..88132fe 100644 --- a/gcc/testsuite/gcc.c-torture/execute/bswap-2.c +++ b/gcc/testsuite/gcc.c-torture/execute/bswap-2.c @@ -66,6 +66,32 @@ fake_read_be32 (char *x, char *y) return c3 | c2 << 8 | c1 << 16 | c0 << 24; } +__attribute__ ((noinline, noclone)) uint32_t +incorrect_read_le32 (char *x, char *y) +{ + unsigned char c0, c1, c2, c3; + + c0 = x[0]; + c1 = x[1]; + c2 = x[2]; + c3 = x[3]; + *y = 1; + return c0 | c1 << 8 | c2 << 16 | c3 << 24; +} + +__attribute__ ((noinline, noclone)) uint32_t +incorrect_read_be32 (char *x, char *y) +{ + unsigned char c0, c1, c2, c3; + + c0 = x[0]; + c1 = x[1]; + c2 = x[2]; + c3 = x[3]; + *y = 1; + return c3 | c2 << 8 | c1 << 16 | c0 << 24; +} + int main () { @@ -92,8 +118,17 @@ main () out = fake_read_le32 (cin, &cin[2]); if (out != 0x89018583) __builtin_abort (); + cin[2] = 0x87; out = fake_read_be32 (cin, &cin[2]); if (out != 0x83850189) __builtin_abort (); + cin[2] = 0x87; + out = incorrect_read_le32 (cin, &cin[2]); + if (out != 0x89878583) +__builtin_abort (); + cin[2] = 0x87; + out = incorrect_read_be32 (cin, &cin[2]); + if (out != 0x83858789) +__builtin_abort (); return 0; } diff --git a/gcc/testsuite/gcc.c-torture/execute/pr61517.c b/gcc/testsuite/gcc.c-torture/execute/pr61517.c new file mode 100644 index 000..fc9bbe8 --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/execute/pr61517.c @@ -0,0 +1,19 @@ +int a, b, *c = &a; +unsigned short d; + +int +main () +{ + unsigned int e = a; + *c = 1; + if (!b) +{ + d = e; + *c = d | e; +} + + if (a != 0) +__builtin_abort (); + + return 0; +} diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c index c868e92..1ee2ba8 100644 --- a/gcc/tree-ssa-math-opts.c +++ b/gcc/tree-ssa-math-opts.c @@ -1804,28 +1804,28 @@ find_bswap_or_nop_load (gimple stmt, tree ref, struct symbolic_number *n) /* find_bswap_or_nop_1 invokes itself recursively with N and tries to perform 
the operation given by the rhs of STMT on the result. If the operation - could successfully be executed the function returns the tree expression of - the source operand and NULL otherwise. */ + could successfully be executed the function returns a gimple stmt whose + rhs's first tree is the expression of the source operand and NULL + otherwise. */ -static tree +static gimple find_bswap_or_nop_1 (gimple stmt, struct symbolic_number *n, int limit) { enum tree_code code; tree rhs1, rhs2 = NULL; - gimple rhs1_stmt, rhs2_stmt; - tree source_expr1; + gimple rhs1_stmt, rhs2_stmt, source_stmt1; enum gimple_rhs_class rhs_class; if (!limit || !is_gimple_assign (stmt)) -return NULL_TREE; +return NULL; rhs1 = gimple_assign_rhs1 (stmt); if (find_bswap_or_nop_load (stmt, rhs1, n)) -return rhs1; +return stmt; if (TREE_CODE (rhs1) != SSA_NAME) -return NULL_TREE; +return NULL; code = gimple_assign_rhs_code (stmt);
RE: [PATCH] Fix PR61306: improve handling of sign and cast in bswap
> From: Richard Biener [mailto:richard.guent...@gmail.com] > Sent: Wednesday, June 11, 2014 4:32 PM > > > > > > Is this OK for trunk? Does this bug qualify for a backport patch to > > 4.8 and 4.9 branches? > > This is ok for trunk and also for backporting (after a short while to > see if there is any fallout). Below is the backported patch for 4.8/4.9. Is this ok for both 4.8 and 4.9? If yes, how much more should I wait before committing? Tested on both 4.8 and 4.9 without regression in the testsuite after a bootstrap. diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 1e35bbe..0559b7f 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,16 @@ +2014-06-12 Thomas Preud'homme + + PR tree-optimization/61306 + * tree-ssa-math-opts.c (struct symbolic_number): Store type of + expression instead of its size. + (do_shift_rotate): Adapt to change in struct symbolic_number. Return + false to prevent optimization when the result is unpredictable due to + arithmetic right shift of signed type with highest byte is set. + (verify_symbolic_number_p): Adapt to change in struct symbolic_number. + (find_bswap_1): Likewise. Return NULL to prevent optimization when the + result is unpredictable due to sign extension. + (find_bswap): Adapt to change in struct symbolic_number. + 2014-06-12 Alan Modra PR target/61300 diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 757cb74..139f23c 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,3 +1,9 @@ +2014-06-12 Thomas Preud'homme + + * gcc.c-torture/execute/pr61306-1.c: New test. + * gcc.c-torture/execute/pr61306-2.c: Likewise. + * gcc.c-torture/execute/pr61306-3.c: Likewise. 
+ 2014-06-11 Richard Biener PR tree-optimization/61452 diff --git a/gcc/testsuite/gcc.c-torture/execute/pr61306-1.c b/gcc/testsuite/gcc.c-torture/execute/pr61306-1.c new file mode 100644 index 000..ebc90a3 --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/execute/pr61306-1.c @@ -0,0 +1,39 @@ +#ifdef __INT32_TYPE__ +typedef __INT32_TYPE__ int32_t; +#else +typedef int int32_t; +#endif + +#ifdef __UINT32_TYPE__ +typedef __UINT32_TYPE__ uint32_t; +#else +typedef unsigned uint32_t; +#endif + +#define __fake_const_swab32(x) ((uint32_t)( \ + (((uint32_t)(x) & (uint32_t)0x000000ffUL) << 24) |\ + (((uint32_t)(x) & (uint32_t)0x0000ff00UL) << 8) |\ + (((uint32_t)(x) & (uint32_t)0x00ff0000UL) >> 8) |\ + (( (int32_t)(x) & (int32_t)0xff000000UL) >> 24))) + +/* Previous version of bswap optimization failed to consider sign extension + and as a result would replace an expression *not* doing a bswap by a + bswap. */ + +__attribute__ ((noinline, noclone)) uint32_t +fake_bswap32 (uint32_t in) +{ + return __fake_const_swab32 (in); +} + +int +main(void) +{ + if (sizeof (int32_t) * __CHAR_BIT__ != 32) +return 0; + if (sizeof (uint32_t) * __CHAR_BIT__ != 32) +return 0; + if (fake_bswap32 (0x87654321) != 0xffffff87) +__builtin_abort (); + return 0; +} diff --git a/gcc/testsuite/gcc.c-torture/execute/pr61306-2.c b/gcc/testsuite/gcc.c-torture/execute/pr61306-2.c new file mode 100644 index 000..886ecfd --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/execute/pr61306-2.c @@ -0,0 +1,40 @@ +#ifdef __INT16_TYPE__ +typedef __INT16_TYPE__ int16_t; +#else +typedef short int16_t; +#endif + +#ifdef __UINT32_TYPE__ +typedef __UINT32_TYPE__ uint32_t; +#else +typedef unsigned uint32_t; +#endif + +#define __fake_const_swab32(x) ((uint32_t)( \ + (((uint32_t) (x) & (uint32_t)0x000000ffUL) << 24) | \ + (((uint32_t)(int16_t)(x) & (uint32_t)0xffffff00UL) << 8) | \ + (((uint32_t) (x) & (uint32_t)0x00ff0000UL) >> 8) | \ + (((uint32_t) (x) & (uint32_t)0xff000000UL) >> 24))) + + +/* Previous version of bswap optimization failed to consider sign 
extension + and as a result would replace an expression *not* doing a bswap by a + bswap. */ + +__attribute__ ((noinline, noclone)) uint32_t +fake_bswap32 (uint32_t in) +{ + return __fake_const_swab32 (in); +} + +int +main(void) +{ + if (sizeof (uint32_t) * __CHAR_BIT__ != 32) +return 0; + if (sizeof (int16_t) * __CHAR_BIT__ != 16) +return 0; + if (fake_bswap32 (0x81828384) != 0xff838281) +__builtin_abort (); + return 0; +} diff --git a/gcc/testsuite/gcc.c-torture/execute/pr61306-3.c b/gcc/testsuite/gcc.c-torture/execute/pr61306-3.c new file mode 100644 index 000..6086e27 --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/execute/pr61306-3.c @@ -0,0 +1,13 @@ +short a = -1; +int b; +char c; + +int +main () +{ + c = a; + b = a | c; + if (b != -1) +__builtin_abort (); + return 0; +} diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tr
RE: [PATCH] Fix PR61306: improve handling of sign and cast in bswap
> From: Jakub Jelinek [mailto:ja...@redhat.com] > Sent: Thursday, June 19, 2014 1:54 AM > > Seems there are actually two spots with this, not just one. > > Completely untested fix: > > 2014-06-18 Jakub Jelinek > > * tree-ssa-math-opts.c (do_shift_rotate, find_bswap_or_nop_1): > Cast > 0xff to uint64_t before shifting it up. > > --- gcc/tree-ssa-math-opts.c 2014-06-13 08:08:42.354136356 +0200 > +++ gcc/tree-ssa-math-opts.c 2014-06-18 19:50:59.486916201 +0200 > @@ -1669,7 +1669,8 @@ do_shift_rotate (enum tree_code code, >break; > case RSHIFT_EXPR: >/* Arithmetic shift of signed type: result is dependent on the value. > */ > - if (!TYPE_UNSIGNED (n->type) && (n->n & (0xff << (bitsize - 8 > + if (!TYPE_UNSIGNED (n->type) > + && (n->n & ((uint64_t) 0xff << (bitsize - 8 > return false; >n->n >>= count; >break; > @@ -1903,7 +1904,7 @@ find_bswap_or_nop_1 (gimple stmt, struct > old_type_size = TYPE_PRECISION (n->type); > if (!TYPE_UNSIGNED (n->type) > && type_size > old_type_size > - && n->n & (0xff << (old_type_size - 8))) > + && n->n & ((uint64_t) 0xff << (old_type_size - 8))) > return NULL_TREE; > > if (type_size / BITS_PER_UNIT < (int)(sizeof (int64_t))) > > Yep, that's the right fix. I tested it on both a bootstrapped gcc on x86_64-linux-gnu and an arm-none-eabi cross-compiler with no regression on the testsuite. Jakub, since you made the patch, the honor of committing it should be yours. Richard, given this issue, I think we should wait a few more days before I commit a backported (and fixed, of course) version to 4.8 and 4.9. Best regards, Thomas
RE: [PATCH] Fix PR61375: cancel bswap optimization when value doesn't fit in a HOST_WIDE_INT
> From: Richard Biener [mailto:richard.guent...@gmail.com] > Sent: Tuesday, June 10, 2014 5:05 PM > > Backports are welcome - please post a patch. > Sorry for the delay. Here you are: diff --git a/gcc/testsuite/gcc.c-torture/execute/pr61375.c b/gcc/testsuite/gcc.c-torture/execute/pr61375.c new file mode 100644 index 000..d3b54a8 --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/execute/pr61375.c @@ -0,0 +1,35 @@ +#ifdef __UINT64_TYPE__ +typedef __UINT64_TYPE__ uint64_t; +#else +typedef unsigned long long uint64_t; +#endif + +#ifndef __SIZEOF_INT128__ +#define __int128 long long +#endif + +/* Some version of bswap optimization would ICE when analyzing a mask constant + too big for an HOST_WIDE_INT (PR61375). */ + +__attribute__ ((noinline, noclone)) uint64_t +uint128_central_bitsi_ior (unsigned __int128 in1, uint64_t in2) +{ + __int128 mask = (__int128)0x << 56; + return ((in1 & mask) >> 56) | in2; +} + +int +main (int argc) +{ + __int128 in = 1; +#ifdef __SIZEOF_INT128__ + in <<= 64; +#endif + if (sizeof (uint64_t) * __CHAR_BIT__ != 64) +return 0; + if (sizeof (unsigned __int128) * __CHAR_BIT__ != 128) +return 0; + if (uint128_central_bitsi_ior (in, 2) != 0x102) +__builtin_abort (); + return 0; +} diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c index 9ff857c..9d64205 100644 --- a/gcc/tree-ssa-math-opts.c +++ b/gcc/tree-ssa-math-opts.c @@ -1741,6 +1741,8 @@ find_bswap_1 (gimple stmt, struct symbolic_number *n, int limit) if (n->size % BITS_PER_UNIT != 0) return NULL_TREE; n->size /= BITS_PER_UNIT; + if (n->size > (int)sizeof (unsigned HOST_WIDEST_INT)) + return NULL_TREE; n->n = (sizeof (HOST_WIDEST_INT) < 8 ? 
0 : (unsigned HOST_WIDEST_INT)0x08070605 << 32 | 0x04030201); @@ -1781,6 +1783,8 @@ find_bswap_1 (gimple stmt, struct symbolic_number *n, int limit) type_size = TYPE_PRECISION (gimple_expr_type (stmt)); if (type_size % BITS_PER_UNIT != 0) return NULL_TREE; + if (type_size > (int)HOST_BITS_PER_WIDEST_INT) + return NULL_TREE; if (type_size / BITS_PER_UNIT < (int)(sizeof (HOST_WIDEST_INT))) { Ok for GCC 4.8 and GCC 4.9 branches? Best regards, Thomas
RE: [PATCH] Fix PR61375: cancel bswap optimization when value doesn't fit in a HOST_WIDE_INT
> From: Jakub Jelinek [mailto:ja...@redhat.com] > Sent: Monday, June 23, 2014 4:37 PM > > On Mon, Jun 23, 2014 at 10:18:16AM +0200, Richard Biener wrote: > > > --- a/gcc/tree-ssa-math-opts.c > > > +++ b/gcc/tree-ssa-math-opts.c > > > @@ -1741,6 +1741,8 @@ find_bswap_1 (gimple stmt, struct > symbolic_number *n, int limit) > > > if (n->size % BITS_PER_UNIT != 0) > > > return NULL_TREE; > > > n->size /= BITS_PER_UNIT; > > > + if (n->size > (int)sizeof (unsigned HOST_WIDEST_INT)) > > > + return NULL_TREE; > > This looks wrong, while the bswap pass is guarded with BITS_PER_UNIT == 8 > check (i.e. target), you don't know of HOST_BITS_PER_CHAR is 8. > I'd move the test before the division by BITS_PER_UNIT, and compare > against HOST_BITS_PER_WIDEST_INT. I may misunderstand you but I don't think there is a problem here because we just check if we can create a value on the host with as many bytes as the value on the target. The value on the host is different, with each byte being a number from 1 to SIZE, SIZE being the number of bytes on the target. So this would fail only if the target value has so many bytes that this number of byte cannot be represented in a HOST_WIDEST_INT. > > > > n->n = (sizeof (HOST_WIDEST_INT) < 8 ? 0 : > > > (unsigned HOST_WIDEST_INT)0x08070605 << 32 | > > > 0x04030201); > > > > > > @@ -1781,6 +1783,8 @@ find_bswap_1 (gimple stmt, struct > symbolic_number *n, int limit) > > > type_size = TYPE_PRECISION (gimple_expr_type (stmt)); > > > if (type_size % BITS_PER_UNIT != 0) > > > return NULL_TREE; > > > + if (type_size > (int)HOST_BITS_PER_WIDEST_INT) > > > + return NULL_TREE; > > > > > > if (type_size / BITS_PER_UNIT < (int)(sizeof > > > (HOST_WIDEST_INT))) > > > { > > Similarly here. I agree that here the test is not correct as we look at the number of bits on the host which should be enough to count the number of byte on the target. 
To reflect the intent better, we should first compute the number of bytes that type_size forms and then compare it to the size in bytes of HOST_WIDEST_INT. I'll rework the patch in this direction. > > BTW, the formatting is wrong too, the (int) cast should be followed by a space. Right, but note that I merely followed the current style in this file. There are many more occurrences of this style mistake in this file. Do you want me to fix this one anyway? Best regards, Thomas
RE: [PATCH] Fix PR61375: cancel bswap optimization when value doesn't fit in a HOST_WIDE_INT
> From: Jakub Jelinek [mailto:ja...@redhat.com] > Sent: Monday, June 23, 2014 4:59 PM > > Host could e.g. in theory have CHAR_BIT 32, while target BITS_PER_UNIT 8 > (otherwise bswap pass would give up). sizeof (unsigned HOST_WIDE_INT) > could > very well be 2 in that case. In this case the pass would skip any value of more than 2 bytes. However, although the original comment on struct symbolic_number implies that there is a mapping between host bytes (the bytes of the symbolic number) and target bytes, that isn't the case since do_shift_rotate () shifts the symbolic number by quantities of BYTES_PER_UNIT instead of CHAR_BIT. Also there are quite a few 8s here and there. Although not a problem in practice, the mix of 8 and BITS_PER_UNIT does not look very good. I guess a quick review would be in order. Of course, with regards to the backport, the mix of 8 and BITS_PER_UNIT should be left as is and only the confusion about how to represent a target value in a host type should be fixed, if any. I'll come back to you whenever this is done. Best regards, Thomas
RE: [PATCH] Fix PR61375: cancel bswap optimization when value doesn't fit in a HOST_WIDE_INT
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme > > However > although the original comments on struct symbolic_number implies that > there is a mapping between host bytes (the bytes of the symbolic number) > and target bytes, it isn't the case since do_shift_rotate () shift the > symbolic > number by quantity of BYTES_PER_UNIT instead of CHAR_BIT. My bad, the comment can be understood both ways. Best regards, Thomas
RE: [PATCH] Fix PR61375: cancel bswap optimization when value doesn't fit in a HOST_WIDE_INT
Ok, what about the following patch and associated ChangeLog entries? 2014-06-24 Thomas Preud'homme PR tree-optimization/61375 * tree-ssa-math-opts.c (find_bswap_or_nop_1): Cancel optimization if symbolic number cannot be represented in an unsigned HOST_WIDE_INT. (execute_optimize_bswap): Cancel optimization if CHAR_BIT != 8. 2014-06-24 Thomas Preud'homme PR tree-optimization/61375 * gcc.c-torture/execute/pr61375-1.c: New test. diff --git a/gcc/testsuite/gcc.c-torture/execute/pr61375.c b/gcc/testsuite/gcc.c-torture/execute/pr61375.c new file mode 100644 index 000..6fb4693 --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/execute/pr61375.c @@ -0,0 +1,35 @@ +#ifdef __UINT64_TYPE__ +typedef __UINT64_TYPE__ uint64_t; +#else +typedef unsigned long long uint64_t; +#endif + +#ifndef __SIZEOF_INT128__ +#define __int128 long long +#endif + +/* Some version of bswap optimization would ICE when analyzing a mask constant + too big for an HOST_WIDE_INT (PR61375). */ + +__attribute__ ((noinline, noclone)) uint64_t +uint128_central_bitsi_ior (unsigned __int128 in1, uint64_t in2) +{ + __int128 mask = (__int128)0x << 56; + return ((in1 & mask) >> 56) | in2; +} + +int +main (int argc) +{ + __int128 in = 1; +#ifdef __SIZEOF_INT128__ + in <<= 64; +#endif + if (sizeof (uint64_t) * __CHAR_BIT__ != 64) +return 0; + if (sizeof (unsigned __int128) * __CHAR_BIT__ != 128) +return 0; + if (uint128_central_bitsi_ior (in, 2) != 0x102) +__builtin_abort (); + return 0; +} diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c index 9ff857c..045bf48 100644 --- a/gcc/tree-ssa-math-opts.c +++ b/gcc/tree-ssa-math-opts.c @@ -1740,6 +1740,8 @@ find_bswap_1 (gimple stmt, struct symbolic_number *n, int limit) n->size = TYPE_PRECISION (TREE_TYPE (rhs1)); if (n->size % BITS_PER_UNIT != 0) return NULL_TREE; + if (n->size > HOST_BITS_PER_WIDEST_INT) + return NULL_TREE; n->size /= BITS_PER_UNIT; n->n = (sizeof (HOST_WIDEST_INT) < 8 ? 
0 : (unsigned HOST_WIDEST_INT)0x08070605 << 32 | 0x04030201); @@ -1781,6 +1783,8 @@ find_bswap_1 (gimple stmt, struct symbolic_number *n, int limit) type_size = TYPE_PRECISION (gimple_expr_type (stmt)); if (type_size % BITS_PER_UNIT != 0) return NULL_TREE; + if (type_size > (int) HOST_BITS_PER_WIDEST_INT) + return NULL_TREE; if (type_size / BITS_PER_UNIT < (int)(sizeof (HOST_WIDEST_INT))) { @@ -1911,7 +1915,7 @@ execute_optimize_bswap (void) bool changed = false; tree bswap16_type = NULL_TREE, bswap32_type = NULL_TREE, bswap64_type = NULL_TREE; - if (BITS_PER_UNIT != 8) + if (BITS_PER_UNIT != 8 || CHAR_BIT != 8) return 0; if (sizeof (HOST_WIDEST_INT) < 8) Is this ok for 4.8 and 4.9 branches? Best regards, Thomas
RE: [PATCH] Fix PR61306: improve handling of sign and cast in bswap
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme > Sent: Thursday, June 19, 2014 1:36 PM > > Richard, given this issue, I think we should wait a few more days before I > commit > a backported (and fixed of course) version to 4.8 and 4.9. No new issues were reported since then. Is it ok to commit the backport (with Jakub's fix) now or should we wait more? Best regards, Thomas
RE: [PATCH] Fix PR61306: improve handling of sign and cast in bswap
> From: Richard Biener [mailto:richard.guent...@gmail.com] > Sent: Friday, June 27, 2014 4:49 PM > > Fine with me now. Committed. Best regards, Thomas
RE: [PATCH] Fix PR61375: cancel bswap optimization when value doesn't fit in a HOST_WIDE_INT
Ping? > -Original Message- > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme > Sent: Thursday, June 26, 2014 9:11 AM > To: 'Jakub Jelinek' > Cc: Richard Biener; GCC Patches > Subject: RE: [PATCH] Fix PR61375: cancel bswap optimization when value > doesn't fit in a HOST_WIDE_INT > > Ok, what about the following patch and associated ChangeLog entries? > > 2014-06-24 Thomas Preud'homme > > PR tree-optimization/61375 > * tree-ssa-math-opts.c (find_bswap_or_nop_1): Cancel optimization > if > symbolic number cannot be represented in an unsigned > HOST_WIDE_INT. > (execute_optimize_bswap): Cancel optimization if CHAR_BIT != 8. > > 2014-06-24 Thomas Preud'homme > > PR tree-optimization/61375 > * gcc.c-torture/execute/pr61375-1.c: New test. > > > diff --git a/gcc/testsuite/gcc.c-torture/execute/pr61375.c > b/gcc/testsuite/gcc.c-torture/execute/pr61375.c > new file mode 100644 > index 000..6fb4693 > --- /dev/null > +++ b/gcc/testsuite/gcc.c-torture/execute/pr61375.c > @@ -0,0 +1,35 @@ > +#ifdef __UINT64_TYPE__ > +typedef __UINT64_TYPE__ uint64_t; > +#else > +typedef unsigned long long uint64_t; > +#endif > + > +#ifndef __SIZEOF_INT128__ > +#define __int128 long long > +#endif > + > +/* Some version of bswap optimization would ICE when analyzing a mask > constant > + too big for an HOST_WIDE_INT (PR61375). 
*/ > + > +__attribute__ ((noinline, noclone)) uint64_t > +uint128_central_bitsi_ior (unsigned __int128 in1, uint64_t in2) > +{ > + __int128 mask = (__int128)0x << 56; > + return ((in1 & mask) >> 56) | in2; > +} > + > +int > +main (int argc) > +{ > + __int128 in = 1; > +#ifdef __SIZEOF_INT128__ > + in <<= 64; > +#endif > + if (sizeof (uint64_t) * __CHAR_BIT__ != 64) > +return 0; > + if (sizeof (unsigned __int128) * __CHAR_BIT__ != 128) > +return 0; > + if (uint128_central_bitsi_ior (in, 2) != 0x102) > +__builtin_abort (); > + return 0; > +} > diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c > index 9ff857c..045bf48 100644 > --- a/gcc/tree-ssa-math-opts.c > +++ b/gcc/tree-ssa-math-opts.c > @@ -1740,6 +1740,8 @@ find_bswap_1 (gimple stmt, struct > symbolic_number *n, int limit) > n->size = TYPE_PRECISION (TREE_TYPE (rhs1)); > if (n->size % BITS_PER_UNIT != 0) > return NULL_TREE; > + if (n->size > HOST_BITS_PER_WIDEST_INT) > + return NULL_TREE; > n->size /= BITS_PER_UNIT; > n->n = (sizeof (HOST_WIDEST_INT) < 8 ? 0 : > (unsigned HOST_WIDEST_INT)0x08070605 << 32 | > 0x04030201); > @@ -1781,6 +1783,8 @@ find_bswap_1 (gimple stmt, struct > symbolic_number *n, int limit) > type_size = TYPE_PRECISION (gimple_expr_type (stmt)); > if (type_size % BITS_PER_UNIT != 0) > return NULL_TREE; > + if (type_size > (int) HOST_BITS_PER_WIDEST_INT) > + return NULL_TREE; > > if (type_size / BITS_PER_UNIT < (int)(sizeof (HOST_WIDEST_INT))) > { > @@ -1911,7 +1915,7 @@ execute_optimize_bswap (void) >bool changed = false; >tree bswap16_type = NULL_TREE, bswap32_type = NULL_TREE, > bswap64_type = NULL_TREE; > > - if (BITS_PER_UNIT != 8) > + if (BITS_PER_UNIT != 8 || CHAR_BIT != 8) > return 0; > >if (sizeof (HOST_WIDEST_INT) < 8) > > Is this ok for 4.8 and 4.9 branches? > > Best regards, > > Thomas > >
[PATCH] Fix confusion between target, host and symbolic number byte sizes
The bswap pass deals with three possibly different byte sizes: host, target, and the size a byte marker occupies in the symbolic_number structure [1]. However, as of now the code mixes the three sizes. This works in practice as the pass is only enabled for targets with BITS_PER_UNIT == 8 and nobody runs GCC on a host with CHAR_BIT != 8. As prompted by Jakub Jelinek, this patch fixes this mess. Byte markers are 8-bit quantities (they could be made 4-bit quantities but I preferred to keep the code working the same as before) for which a new macro is introduced (BITS_PER_MARKER); anything related to storing the value or a byte marker in a variable should check the host byte size or wide integer size, and anything aimed at manipulating the target value should check BITS_PER_UNIT. [1] Although the comment for this structure implies that a byte marker has the same size as a host byte, the way it is used in the code (even before any of my patches) shows that it uses a fixed size of 8 [2]. [2] Note that since the pass is only active for targets with BITS_PER_UNIT == 8, it might be using the target byte size. gcc/ChangeLog: 2014-07-04 Thomas Preud'homme * tree-ssa-math-opts.c (struct symbolic_number): Clarify comment about the size of byte markers. (do_shift_rotate): Fix confusion between host, target and marker byte size. (verify_symbolic_number_p): Likewise. (find_bswap_or_nop_1): Likewise. (find_bswap_or_nop): Likewise. diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c index ca2b30d..55c5df7 100644 --- a/gcc/tree-ssa-math-opts.c +++ b/gcc/tree-ssa-math-opts.c @@ -1602,11 +1602,10 @@ make_pass_cse_sincos (gcc::context *ctxt) /* A symbolic number is used to detect byte permutation and selection patterns. Therefore the field N contains an artificial number - consisting of byte size markers: + consisting of octet sized markers: - 0- byte has the value 0 - 1..size - byte contains the content of the byte - number indexed with that value minus one. 
+ 0- target byte has the value 0 + 1..size - marker value is the target byte index minus one. To detect permutations on memory sources (arrays and structures), a symbolic number is also associated a base address (the array or structure the load is @@ -1631,6 +1630,8 @@ struct symbolic_number { unsigned HOST_WIDE_INT range; }; +#define BITS_PER_MARKER 8 + /* The number which the find_bswap_or_nop_1 result should match in order to have a nop. The number is masked according to the size of the symbolic number before using it. */ @@ -1652,15 +1653,16 @@ do_shift_rotate (enum tree_code code, struct symbolic_number *n, int count) { - int bitsize = TYPE_PRECISION (n->type); + int size = TYPE_PRECISION (n->type) / BITS_PER_UNIT; - if (count % 8 != 0) + if (count % BITS_PER_UNIT != 0) return false; + count = (count / BITS_PER_UNIT) * BITS_PER_MARKER; /* Zero out the extra bits of N in order to avoid them being shifted into the significant bits. */ - if (bitsize < 8 * (int)sizeof (int64_t)) -n->n &= ((uint64_t)1 << bitsize) - 1; + if (size < 64 / BITS_PER_MARKER) +n->n &= ((uint64_t) 1 << (size * BITS_PER_MARKER)) - 1; switch (code) { @@ -1670,22 +1672,22 @@ do_shift_rotate (enum tree_code code, case RSHIFT_EXPR: /* Arithmetic shift of signed type: result is dependent on the value. */ if (!TYPE_UNSIGNED (n->type) - && (n->n & ((uint64_t) 0xff << (bitsize - 8 + && (n->n & ((uint64_t) 0xff << ((size - 1) * BITS_PER_MARKER return false; n->n >>= count; break; case LROTATE_EXPR: - n->n = (n->n << count) | (n->n >> (bitsize - count)); + n->n = (n->n << count) | (n->n >> ((size * BITS_PER_MARKER) - count)); break; case RROTATE_EXPR: - n->n = (n->n >> count) | (n->n << (bitsize - count)); + n->n = (n->n >> count) | (n->n << ((size * BITS_PER_MARKER) - count)); break; default: return false; } /* Zero unused bits for size. 
*/ - if (bitsize < 8 * (int)sizeof (int64_t)) -n->n &= ((uint64_t)1 << bitsize) - 1; + if (size < 64 / BITS_PER_MARKER) +n->n &= ((uint64_t) 1 << (size * BITS_PER_MARKER)) - 1; return true; } @@ -1726,13 +1728,13 @@ init_symbolic_number (struct symbolic_number *n, tree src) if (size % BITS_PER_UNIT != 0) return false; size /= BITS_PER_UNIT; - if (size > (int)sizeof (uint64_t)) + if (size > 64 / BITS_PER_MARKER) return false; n->range = size; n->n = CMPNOP; - if (size < (int)sizeof (int64_t)) -n->n &= ((uint64_t)1 << (size * BITS_PE
RE: [PATCH, ping 1] Move insns without introducing new temporaries in loop2_invariant
> From: Jeff Law [mailto:l...@redhat.com] > Sent: Tuesday, May 12, 2015 4:17 AM > > On 05/06/2015 03:47 AM, Thomas Preud'homme wrote: > > Ping? > Something to consider as future work -- I'm pretty sure PRE sets up the > same kind of problematical pattern with a new pseudo (reaching reg) > holding the result of the redundant expression and the original > evaluations turned into copies from the reaching reg to the final > destination. Yes, absolutely: this is how the pattern I was interested in was created. The reason I solved it in loop-invariant is that I thought this was on purpose, with the cleanup left to loop-invariant. Finding a TODO comment about this in loop-invariant seemed to confirm my initial thoughts. > > That style is easy to prove correct. There was an issue with the copies > not propagating away that was pretty inherent in the partial redundancy > cases that I could probably dig out of my archives if you're interested. If you think this should also (or instead) be fixed in PRE, I can take a look at some point later since it shouldn't be much more work. > It looks like there's a variety of line wrapping issues. Please > double-check line wrapping using an 80 column window. Minor I know, > but > the consistency with the rest of the code is good. Looking in vim, lines seem to be systematically cut at 80 columns, and check_GNU_style.sh only complains about the dg-final line in the new testcases. Could you point me to such an occurrence? > > >> > >> + > >> + /* Check that all uses reached by the def in insn would still be > reached > >> + it. */ > >> + dest_regno = REGNO (reg); > >> + for (use = DF_REG_USE_CHAIN (dest_regno); use; use = > >> DF_REF_NEXT_REG (use)) > [ ... ] > So isn't this overly conservative if DEST_REGNO is set multiple times > since it's going to look at all the uses, even those not necessarily > reached by the original SET of DEST_REGNO? > > Or is that not an issue for some reason? 
And I'm not requiring you to > make this optimal, but if I'm right, a comment here seems wise. My apologies, it is the comment that is incorrect since it doesn't match the code (a remnant of an old version of this patch). The code actually checks that the use was dominated by the instruction before it is moved out of the loop. This is to prevent the code motion in cases like: foo = 1; bar = 0; for () { bar += foo; foo = 42; } which I ran into in some of the testsuite cases. > > I think with the wrapping nits fixed and closure on the multi-set issue > noted immediately above and this will be good for the trunk. I'll fix this comment right away. Best regards, Thomas
RE: [PATCH, ping 1] Move insns without introducing new temporaries in loop2_invariant
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme > > > From: Jeff Law [mailto:l...@redhat.com] > > Sent: Tuesday, May 12, 2015 4:17 AM > > > > >> > > >> + > > >> + /* Check that all uses reached by the def in insn would still be > > reached > > >> + it. */ > > >> + dest_regno = REGNO (reg); > > >> + for (use = DF_REG_USE_CHAIN (dest_regno); use; use = > > >> DF_REF_NEXT_REG (use)) > > [ ... ] > > So isn't this overly conservative if DEST_REGNO is set multiple times > > since it's going to look at all the uses, even those not necessarily > > reached by the original SET of DEST_REGNO? > > > > Or is that not an issue for some reason? And I'm not requiring you to > > make this optimal, but if I'm right, a comment here seems wise. > > My apologize, it is the comment that is incorrect since it doesn't match > the code (a remaining of an old version of this patch). The code actually > checks that the use was dominated by the instruction before it is moved > out of the loop. > > > > > > I think with the wrapping nits fixed and closure on the multi-set issue > > noted immediately above and this will be good for the trunk. > > I'll fix this comment right away. Please find below a patch with the comment fixed. *** gcc/ChangeLog *** 2015-05-12 Thomas Preud'homme * loop-invariant.c (can_move_invariant_reg): New. (move_invariant_reg): Call above new function to decide whether instruction can just be moved, skipping creation of temporary register. *** gcc/testsuite/ChangeLog *** 2015-05-12 Thomas Preud'homme * gcc.dg/loop-8.c: New test. * gcc.dg/loop-9.c: New test. diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c index e3b560d..76a009f 100644 --- a/gcc/loop-invariant.c +++ b/gcc/loop-invariant.c @@ -1511,6 +1511,79 @@ replace_uses (struct invariant *inv, rtx reg, bool in_group) return 1; } +/* Whether invariant INV setting REG can be moved out of LOOP, at the end of + the block preceding its header. 
*/ + +static bool +can_move_invariant_reg (struct loop *loop, struct invariant *inv, rtx reg) +{ + df_ref def, use; + unsigned int dest_regno, defs_in_loop_count = 0; + rtx_insn *insn = inv->insn; + basic_block bb = BLOCK_FOR_INSN (inv->insn); + + /* We ignore hard register and memory access for cost and complexity reasons. + Hard register are few at this stage and expensive to consider as they + require building a separate data flow. Memory access would require using + df_simulate_* and can_move_insns_across functions and is more complex. */ + if (!REG_P (reg) || HARD_REGISTER_P (reg)) +return false; + + /* Check whether the set is always executed. We could omit this condition if + we know that the register is unused outside of the loop, but it does not + seem worth finding out. */ + if (!inv->always_executed) +return false; + + /* Check that all uses that would be dominated by def are already dominated + by it. */ + dest_regno = REGNO (reg); + for (use = DF_REG_USE_CHAIN (dest_regno); use; use = DF_REF_NEXT_REG (use)) +{ + rtx_insn *use_insn; + basic_block use_bb; + + use_insn = DF_REF_INSN (use); + use_bb = BLOCK_FOR_INSN (use_insn); + + /* Ignore instruction considered for moving. */ + if (use_insn == insn) + continue; + + /* Don't consider uses outside loop. */ + if (!flow_bb_inside_loop_p (loop, use_bb)) + continue; + + /* Don't move if a use is not dominated by def in insn. */ + if (use_bb == bb && DF_INSN_LUID (insn) >= DF_INSN_LUID (use_insn)) + return false; + if (!dominated_by_p (CDI_DOMINATORS, use_bb, bb)) + return false; +} + + /* Check for other defs. Any other def in the loop might reach a use + currently reached by the def in insn. */ + for (def = DF_REG_DEF_CHAIN (dest_regno); def; def = DF_REF_NEXT_REG (def)) +{ + basic_block def_bb = DF_REF_BB (def); + + /* Defs in exit block cannot reach a use they weren't already. 
*/ + if (single_succ_p (def_bb)) + { + basic_block def_bb_succ; + + def_bb_succ = single_succ (def_bb); + if (!flow_bb_inside_loop_p (loop, def_bb_succ)) + continue; + } + + if (++defs_in_loop_count > 1) + return false; +} + + return true; +} + /* Move invariant INVNO out of the LOOP. Returns true if this succeeds, false otherwise. */ @@ -1544,11 +1617,8 @@ move_invariant_reg (struct loop *loop, unsigned invno) } } - /* Move the set out of the loop. If the set is always executed (we could
RE: [PATCH, ping 1] Move insns without introducing new temporaries in loop2_invariant
> From: Jeff Law [mailto:l...@redhat.com] > Sent: Wednesday, May 13, 2015 4:05 AM > OK for the trunk. > > Thanks for your patience, Thanks. Committed with the added "PR rtl-optimization/64616" to both ChangeLog entries. Best regards, Thomas
[PATCH] Fix PR66168: ICE due to incorrect invariant register info
Hi, r223113 made it possible for invariant to actually be moved rather than moving the source to a new pseudoregister. However, when doing so the inv->reg is not set up properly: in case of a subreg destination it holds the inner register rather than the subreg expression. This patch fixes that. ChangeLog entries are as follow: *** gcc/ChangeLog *** 2015-05-18 Thomas Preud'homme PR rtl-optimization/66168 * loop-invariant.c (move_invariant_reg): Set inv->reg to destination of inv->insn when moving an invariant without introducing a temporary register. *** gcc/testsuite/ChangeLog *** 2015-05-18 Thomas Preud'homme PR rtl-optimization/66168 * gcc.c-torture/compile/pr66168.c: New test. diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c index 76a009f..30e1945 100644 --- a/gcc/loop-invariant.c +++ b/gcc/loop-invariant.c @@ -1642,9 +1642,13 @@ move_invariant_reg (struct loop *loop, unsigned invno) emit_insn_after (gen_move_insn (dest, reg), inv->insn); } - else if (dump_file) - fprintf (dump_file, "Invariant %d moved without introducing a new " - "temporary register\n", invno); + else + { + reg = SET_DEST (set); + if (dump_file) + fprintf (dump_file, "Invariant %d moved without introducing a new " + "temporary register\n", invno); + } reorder_insns (inv->insn, inv->insn, BB_END (preheader)); /* If there is a REG_EQUAL note on the insn we just moved, and the diff --git a/gcc/testsuite/gcc.c-torture/compile/pr66168.c b/gcc/testsuite/gcc.c-torture/compile/pr66168.c new file mode 100644 index 000..d6bfc7b --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/compile/pr66168.c @@ -0,0 +1,15 @@ +int a, b; + +void +fn1 () +{ + for (;;) +{ + for (b = 0; b < 3; b++) + { + char e[2]; + char f = e[1]; + a ^= f ? 1 / f : 0; + } +} +} Tested by bootstrapping on x86_64-linux-gnu and building an arm-none-eabi cross-compiler. Testsuite run shows no regression for both of them. Ok for trunk? Best regards, Thomas
RE: [PATCH] Fix PR66168: ICE due to incorrect invariant register info
> From: Steven Bosscher [mailto:stevenb@gmail.com] > Sent: Tuesday, May 19, 2015 7:21 PM > > Not OK. > This will break in move_invariants() when it looks at REGNO (inv->reg). Indeed. I'm even surprised all tests passed. OK, I will just prevent moving in such a case. I'm running the tests now and will get back to you tomorrow. Best regards, Thomas
RE: [PATCH] Fix PR66168: ICE due to incorrect invariant register info
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme > > From: Steven Bosscher [mailto:stevenb@gmail.com] > > Sent: Tuesday, May 19, 2015 7:21 PM > > > > Not OK. > > This will break in move_invariants() when it looks at REGNO (inv->reg). > > Indeed. I'm even surprised all tests passed. Ok I will just prevent moving > in such a case. I'm running the tests now and will get back to you > tomorrow. Patch is now tested via bootstrap + testsuite run on x86_64-linux-gnu and building arm-none-eabi cross-compiler + testsuite run. Both testsuite run show no regression. diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c index 76a009f..4ce3576 100644 --- a/gcc/loop-invariant.c +++ b/gcc/loop-invariant.c @@ -1626,7 +1626,7 @@ move_invariant_reg (struct loop *loop, unsigned invno) if (REG_P (reg)) regno = REGNO (reg); - if (!can_move_invariant_reg (loop, inv, reg)) + if (!can_move_invariant_reg (loop, inv, dest)) { reg = gen_reg_rtx_and_attrs (dest); diff --git a/gcc/testsuite/gcc.c-torture/compile/pr66168.c b/gcc/testsuite/gcc.c-torture/compile/pr66168.c new file mode 100644 index 000..d6bfc7b --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/compile/pr66168.c @@ -0,0 +1,15 @@ +int a, b; + +void +fn1 () +{ + for (;;) +{ + for (b = 0; b < 3; b++) + { + char e[2]; + char f = e[1]; + a ^= f ? 1 / f : 0; + } +} +} Ok for trunk? Best regards, Thomas
RE: [PATCH] Fix PR66168: ICE due to incorrect invariant register info
> From: Jeff Law [mailto:l...@redhat.com] > Sent: Saturday, May 23, 2015 6:54 AM > > > > - if (!can_move_invariant_reg (loop, inv, reg)) > > + if (!can_move_invariant_reg (loop, inv, dest)) > Won't this run into the same problem if DEST is a SUBREG? One of the very first tests in can_move_invariant_reg is: if (!REG_P (reg) || HARD_REGISTER_P (reg)) return false; So in the case of a subreg the insn will not be moved, which executes the same code as before my patch. It would be nicer if it could work with subregs of course, but this makes for a much smaller and safer patch. Best regards, Thomas
RE: [PATCH 2/3, ARM, libgcc, ping7] Code size optimization for the fmul/fdiv and dmul/ddiv function in libgcc
Ping? > -Original Message- > From: Thomas Preud'homme [mailto:thomas.preudho...@arm.com] > Sent: Thursday, April 30, 2015 3:19 PM > To: Thomas Preud'homme; Richard Earnshaw; 'gcc-patches@gcc.gnu.org'; > Marcus Shawcroft; Ramana Radhakrishnan > (ramana.radhakrish...@arm.com) > Subject: RE: [PATCH 2/3, ARM, libgcc, ping6] Code size optimization for > the fmul/fdiv and dmul/ddiv function in libgcc > > Here is an updated patch that prefix local symbols with __ for more > safety. > They appear in the symtab as local so it is not strictly necessary but one is > never too cautious. Being local, they also do not generate any PLT entry. > They appear only because the jumps are from one section to another > (which is the whole purpose of this patch) and thus need a static > relocation. > > I hope this revised version address all your concerns. > > ChangeLog entry is unchanged: > > *** gcc/libgcc/ChangeLog *** > > 2015-04-30 Tony Wang > > * config/arm/ieee754-sf.S: Expose symbols around fragment > boundaries as function symbols. > * config/arm/ieee754-df.S: Same with above > > diff --git a/libgcc/config/arm/ieee754-df.S b/libgcc/config/arm/ieee754- > df.S > index c1468dc..39b0028 100644 > --- a/libgcc/config/arm/ieee754-df.S > +++ b/libgcc/config/arm/ieee754-df.S > @@ -559,7 +559,7 @@ ARM_FUNC_ALIAS aeabi_l2d floatdidf > > #ifdef L_arm_muldivdf3 > > -ARM_FUNC_START muldf3 > +ARM_FUNC_START muldf3, function_section > ARM_FUNC_ALIAS aeabi_dmul muldf3 > do_push {r4, r5, r6, lr} > > @@ -571,7 +571,7 @@ ARM_FUNC_ALIAS aeabi_dmul muldf3 > COND(and,s,ne) r5, ip, yh, lsr #20 > teqne r4, ip > teqne r5, ip > - bleqLSYM(Lml_s) > + bleq__Lml_s > > @ Add exponents together > add r4, r4, r5 > @@ -689,7 +689,7 @@ ARM_FUNC_ALIAS aeabi_dmul muldf3 > subsip, r4, #(254 - 1) > do_it hi > cmphi ip, #0x700 > - bhi LSYM(Lml_u) > + bhi __Lml_u > > @ Round the result, merge final exponent. 
> cmp lr, #0x8000 > @@ -716,9 +716,12 @@ LSYM(Lml_1): > mov lr, #0 > subsr4, r4, #1 > > -LSYM(Lml_u): > + FUNC_END aeabi_dmul > + FUNC_END muldf3 > + > +ARM_SYM_START __Lml_u > @ Overflow? > - bgt LSYM(Lml_o) > + bgt __Lml_o > > @ Check if denormalized result is possible, otherwise return > signed 0. > cmn r4, #(53 + 1) > @@ -778,10 +781,11 @@ LSYM(Lml_u): > do_it eq > biceq xl, xl, r3, lsr #31 > RETLDM "r4, r5, r6" > + SYM_END __Lml_u > > @ One or both arguments are denormalized. > @ Scale them leftwards and preserve sign bit. > -LSYM(Lml_d): > +ARM_SYM_START __Lml_d > teq r4, #0 > bne 2f > and r6, xh, #0x8000 > @@ -804,8 +808,9 @@ LSYM(Lml_d): > beq 3b > orr yh, yh, r6 > RET > + SYM_END __Lml_d > > -LSYM(Lml_s): > +ARM_SYM_START __Lml_s > @ Isolate the INF and NAN cases away > teq r4, ip > and r5, ip, yh, lsr #20 > @@ -817,10 +822,11 @@ LSYM(Lml_s): > orrsr6, xl, xh, lsl #1 > do_it ne > COND(orr,s,ne) r6, yl, yh, lsl #1 > - bne LSYM(Lml_d) > + bne __Lml_d > + SYM_END __Lml_s > > @ Result is 0, but determine sign anyway. > -LSYM(Lml_z): > +ARM_SYM_START __Lml_z > eor xh, xh, yh > and xh, xh, #0x8000 > mov xl, #0 > @@ -832,41 +838,42 @@ LSYM(Lml_z): > moveq xl, yl > moveq xh, yh > COND(orr,s,ne) r6, yl, yh, lsl #1 > - beq LSYM(Lml_n) @ 0 * INF or INF * 0 -> NAN > + beq __Lml_n @ 0 * INF or INF * 0 -> NAN > teq r4, ip > bne 1f > orrsr6, xl, xh, lsl #12 > - bne LSYM(Lml_n) @ NAN * -> NAN > + bne __Lml_n @ NAN * -> NAN > 1: teq r5, ip > - bne LSYM(Lml_i) > + bne __Lml_i > orrsr6, yl, yh, lsl #12 > do_it ne, t > movne xl, yl > movne xh, yh > - bne LSYM(Lml_n) @ * NAN -> NAN > + bne __Lml_n @ * NAN -> NAN > + SYM_END __Lml_z > > @ Result is INF, but we need to determine its sign. > -LSYM(Lml_i): > +ARM_SYM_START __Lml_i > eor xh, xh, yh > + SYM_END __Lml_i > > @ Overflow: return INF (sign already in xh). > -LSYM(Lml_o): > +ARM_SYM_START __Lml_o > and xh, xh, #0x8000 &
RE: [PATCH] Fix PR66168: ICE due to incorrect invariant register info
> From: Jeff Law [mailto:l...@redhat.com] > Sent: Wednesday, May 27, 2015 11:24 PM > Ah, OK. I was looking at the code prior to the call for > can_move_invariant_reg in move_invariant_reg which implies that DEST > can > be a subreg, but REG can not. > > But with that check in can_move_invariant_reg obviously won't matter. > It feels like we've likely got some dead code here, but that can be a > follow-up if you want to pursue. Are you referring to the subreg code? It's used at the end of the function: inv->reg = reg; inv->orig_regno = regno; > > OK for the trunk. Thanks, committed. Best regards, Thomas
RE: [PATCH, ping1] Fix removing of df problem in df_finish_pass
Ping? > -Original Message- > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme > Sent: Tuesday, March 03, 2015 12:02 PM > To: 'Bernhard Reutner-Fischer'; gcc-patches@gcc.gnu.org; 'Paolo Bonzini'; > 'Seongbae Park'; 'Kenneth Zadeck' > Subject: RE: [PATCH] Fix removing of df problem in df_finish_pass > > > From: Bernhard Reutner-Fischer [mailto:rep.dot@gmail.com] > > Sent: Saturday, February 28, 2015 4:00 AM > > > use df_remove_problem rather than manually removing problems, > > living > > > > leaving > > Indeed. Please find updated changelog below: > > 2015-03-03 Thomas Preud'homme > > * df-core.c (df_finish_pass): Iterate over df- > >problems_by_index[] and > use df_remove_problem rather than manually removing > problems, leaving > holes in df->problems_in_order[]. > > Best regards, > > Thomas > > > >
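For readers unfamiliar with the df machinery, the point of the fix above can be shown on a toy array: going through a removal helper (the analogue of df_remove_problem) keeps the ordered array compact, whereas nulling entries in place leaves holes in problems_in_order[]. All names below are stand-ins for illustration, not GCC's real data structures.

```c
#include <assert.h>
#include <string.h>

/* Toy model of df->problems_in_order[]: an ordered array of active
   dataflow problems.  */
#define MAX_PROBLEMS 8

static const char *problems_in_order[MAX_PROBLEMS];
static int num_problems;

static void
add_problem (const char *name)
{
  problems_in_order[num_problems++] = name;
}

/* Analogue of df_remove_problem: shift later entries down so that no
   hole is left behind and the relative order of the rest is kept.  */
static void
remove_problem (int i)
{
  for (; i + 1 < num_problems; i++)
    problems_in_order[i] = problems_in_order[i + 1];
  num_problems--;
}
```

Removing the middle problem leaves a compact two-entry array, which is what iterating over problems_by_index[] and calling the removal helper achieves in the real patch.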
RE: [PATCH, ping1] Fix removing of df problem in df_finish_pass
Committed. I'll wait a week and then ask for approval for a backport to 5.1.1 once 5.1 is released. Best regards, Thomas > -Original Message- > From: Kenneth Zadeck [mailto:zad...@naturalbridge.com] > Sent: Monday, April 20, 2015 9:26 PM > To: Thomas Preud'homme; 'Bernhard Reutner-Fischer'; gcc- > patc...@gcc.gnu.org; 'Paolo Bonzini'; 'Seongbae Park' > Subject: Re: [PATCH, ping1] Fix removing of df problem in df_finish_pass > > As a dataflow maintainer, I approve this patch for the next release. > However, you will have to get approval of a release manager to get it > into 5.0. > > > > On 04/20/2015 04:22 AM, Thomas Preud'homme wrote: > > Ping? > > > >> -Original Message----- > >> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > >> ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme > >> Sent: Tuesday, March 03, 2015 12:02 PM > >> To: 'Bernhard Reutner-Fischer'; gcc-patches@gcc.gnu.org; 'Paolo > Bonzini'; > >> 'Seongbae Park'; 'Kenneth Zadeck' > >> Subject: RE: [PATCH] Fix removing of df problem in df_finish_pass > >> > >>> From: Bernhard Reutner-Fischer [mailto:rep.dot@gmail.com] > >>> Sent: Saturday, February 28, 2015 4:00 AM > >>>>use df_remove_problem rather than manually removing > problems, > >>> living > >>> > >>> leaving > >> Indeed. Please find updated changelog below: > >> > >> 2015-03-03 Thomas Preud'homme > > >> > >>* df-core.c (df_finish_pass): Iterate over df- > >>> problems_by_index[] and > >>use df_remove_problem rather than manually removing > >> problems, leaving > >>holes in df->problems_in_order[]. > >> > >> Best regards, > >> > >> Thomas > >> > >> > >> > >> > > > >
RE: [PATCH, GCC, stage1] Fallback to copy-prop if constant-prop not possible
> From: Jeff Law [mailto:l...@redhat.com]
> Sent: Friday, April 24, 2015 10:59 AM

Hi Jeff,

> > +
> > +static bool
> > +cprop_reg_p (const_rtx x)
> > +{
> > +  return REG_P (x) && !HARD_REGISTER_P (x);
> > +}
> How about instead this move to a more visible location (perhaps a macro
> in regs.h or an inline function).  Then as a followup, change the
> various places that have this sequence to use that common definition
> that exist outside of cprop.c.

According to Steven this was proposed in the past but was refused (see end of [1]).

[1] https://gcc.gnu.org/ml/gcc-patches/2015-03/msg01066.html

> > @@ -1191,7 +1192,7 @@ do_local_cprop (rtx x, rtx_insn *insn)
> >       /* Rule out USE instructions and ASM statements as we don't want to
> >          change the hard registers mentioned.  */
> >       if (REG_P (x)
> > -         && (REGNO (x) >= FIRST_PSEUDO_REGISTER
> > +         && (cprop_reg_p (x)
> >              || (GET_CODE (PATTERN (insn)) != USE
> >                  && asm_noperands (PATTERN (insn)) < 0)))
> Isn't the REG_P test now redundant?

I made the same mistake when reviewing that change, and indeed it is not. Note the opening parenthesis before cprop_reg_p, which contains a logical OR expression. So in the case where cprop_reg_p is false, REG_P still needs to be true. We could keep a check on FIRST_PSEUDO_REGISTER, but the intent (checking that the register is suitable for propagation) is clearer now, as pointed out to me by Steven.

> OK for the trunk with those changes.
>
> jeff

Given the above I intend to keep the REG_P test in the second excerpt and will wait for your input about moving cprop_reg_p to rtl.h.

Best regards,

Thomas
RE: [PATCH, GCC, stage1] Fallback to copy-prop if constant-prop not possible
> From: Jeff Law [mailto:l...@redhat.com] > Sent: Friday, April 24, 2015 11:15 AM > > So revised review is "ok for the trunk" :-) Committed. Best regards, Thomas
[PATCH, ARM, regression] Fix ternary operator in arm/unknown-elf.h
I just committed the obvious fix below, which fixes the build failure introduced by revision 222371.

*** gcc/ChangeLog ***

2015-04-24  Thomas Preud'homme

	* config/arm/unknown-elf.h (ASM_OUTPUT_ALIGNED_DECL_LOCAL): Fix
	ternary operator in fprintf and harmonize spacing.

diff --git a/gcc/config/arm/unknown-elf.h b/gcc/config/arm/unknown-elf.h
index df0b9ce..2e5ab7e 100644
--- a/gcc/config/arm/unknown-elf.h
+++ b/gcc/config/arm/unknown-elf.h
@@ -80,9 +80,9 @@
 									\
       ASM_OUTPUT_ALIGN (FILE, floor_log2 (ALIGN / BITS_PER_UNIT));	\
       ASM_OUTPUT_LABEL (FILE, NAME);					\
-      fprintf (FILE, "\t.space\t%d\n", SIZE ? (int)(SIZE) : 1);		\
+      fprintf (FILE, "\t.space\t%d\n", SIZE ? (int) SIZE : 1);		\
       fprintf (FILE, "\t.size\t%s, %d\n",				\
-	       NAME, SIZE ? (int) SIZE, 1);				\
+	       NAME, SIZE ? (int) SIZE : 1);				\
     }									\
   while (0)

Best regards,

Thomas
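The bug is easy to see in isolation: with a comma where the colon belongs, `SIZE ? (int) SIZE, 1` is not a valid conditional expression, hence the build failure. A minimal sketch of the corrected logic follows; emit_local, buf and the argument names are stand-ins for the macro's FILE/NAME/SIZE arguments, not GCC code.

```c
#include <stddef.h>
#include <stdio.h>

/* Sketch of the corrected ASM_OUTPUT_ALIGNED_DECL_LOCAL output: a
   zero-sized object still reserves one byte so that its label stays
   distinct from the next object's label.  */
static void
emit_local (char *buf, size_t len, const char *name, long size)
{
  int n = snprintf (buf, len, "\t.space\t%d\n", size ? (int) size : 1);
  snprintf (buf + n, len - n, "\t.size\t%s, %d\n", name,
	    size ? (int) size : 1);
}
```

For a zero-sized object this emits `.space 1` and `.size name, 1`; for a 16-byte object it emits `.space 16` and `.size name, 16`.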
RE: [PATCH 1/2, combine] Try REG_EQUAL for nonzero_bits
Hi, first of all, sorry for the delay. We quickly entered stage 4 and I thought it was best to wait for stage 1 to update you on this.

> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
> >
> > Of course both approaches are not exclusive. I'll try to test with *both*
> > rs6000 bootstrap and with a cross-compiler for one of these targets.

I did two experiments where I checked the impact of removing the code guarded by SHORT_IMMEDIATES_SIGN_EXTEND. In the first one I removed the code in both rtlanal.c and combine.c. In the second, I only removed the code from combine.c (in both occurrences). In both cases the powerpc bootstrap succeeded.

I then used these two compilers to build the same GCC source (actually the source with all code guarded by the macro removed). I compared the output of objdump on the resulting g++ and found that in both cases the output was different from the one produced without any modification. Both diffs look like:

Disassembly of section .init:
@@ -1359,7 +1359,7 @@ Disassembly of section .text:
 10003a94: f8 21 ff 81 	stdu    r1,-128(r1)
 10003a98: eb e4 00 00 	ld      r31,0(r4)
 10003a9c: 3c 82 ff f8 	addis   r4,r2,-8
-10003aa0: 38 84 d7 60 	addi    r4,r4,-10400
+10003aa0: 38 84 d7 70 	addi    r4,r4,-10384
 10003aa4: 7f e3 fb 78 	mr      r3,r31
 10003aa8: 4b ff f0 d9 	bl      10002b80 <003d.plt_call.strcmp@@GLIBC_2.3+0>
 10003aac: e8 41 00 28 	ld      r2,40(r1)
@@ -1371,7 +1371,7 @@ Disassembly of section .text:
 10003ac4: 79 2a ff e3 	rldicl. r10,r9,63,63
 10003ac8: 41 82 00 78 	beq-    10003b40 <._ZL22sanitize_spec_functioniPPKc+0xc0>
 10003acc: 3c 62 ff f8 	addis   r3,r2,-8
-10003ad0: 38 63 f5 70 	addi    r3,r3,-2704
+10003ad0: 38 63 f5 b0 	addi    r3,r3,-2640
 10003ad4: 38 21 00 80 	addi    r1,r1,128
 10003ad8: e8 01 00 10 	ld      r0,16(r1)
 10003adc: eb e1 ff f8 	ld      r31,-8(r1)

(This one compares the g++ built by a GCC with partial removal of the code guarded by the macro against one built by an unmodified GCC.)
I may have made a mistake when doing the experiment, though, and can do it again if you wish.

Best regards,

Thomas
RE: [PATCH, combine] Try REG_EQUAL for nonzero_bits
> From: Jeff Law [mailto:l...@redhat.com]
> Sent: Saturday, April 25, 2015 3:00 AM
> Do you have a testcase where this change can result in better generated
> code.  If so please add that testcase.  It's OK if it's ARM specific.

Hi Jeff,

Last time I tried, I couldn't reduce the code to a small testcase but, if I remember correctly, it was mostly due to the problem of finding a good interestingness test for creduce (zero extension is not unique enough). I'll try again with a more manual approach and get back to you.

Best regards,

Thomas
RE: [PATCH 1/2, combine] Try REG_EQUAL for nonzero_bits
> From: Jeff Law [mailto:l...@redhat.com]
> Sent: Saturday, April 25, 2015 2:57 AM
> > +static rtx
> > +sign_extend_short_imm (rtx src, machine_mode mode, unsigned int prec)
> > +{
> > +  if (GET_MODE_PRECISION (mode) < prec && CONST_INT_P (src)
> > +      && INTVAL (src) > 0 && val_signbit_known_set_p (mode, INTVAL (src)))
> > +    src = GEN_INT (INTVAL (src) | ~GET_MODE_MASK (mode));
> Can you go ahead and put each condition of the && on a separate line.
> It uses more vertical space, but IMHO makes this easier to read.  As I
> said, it was a nit :-)

You're perfectly right. Anything that can improve the readability of source code is a good thing.

> OK with that fix.

Committed.

Best regards,

Thomas
RE: [PATCH, Aarch64] Add FMA steering pass for Cortex-A57
> From: Marcus Shawcroft [mailto:marcus.shawcr...@gmail.com] > Sent: Thursday, February 05, 2015 5:17 PM > > > > *** gcc/ChangeLog *** > > > > 2015-01-26 Thomas Preud'homme thomas.preudho...@arm.com > > > > * config.gcc: Add cortex-a57-fma-steering.o to extra_objs for > > aarch64-*-*. > > * config/aarch64/t-aarch64: Add a rule for cortex-a57-fma-steering.o. > > * config/aarch64/aarch64.h > (AARCH64_FL_USE_FMA_STEERING_PASS): Define. > > (AARCH64_TUNE_FMA_STEERING): Likewise. > > * config/aarch64/aarch64-cores.def: Set > > AARCH64_FL_USE_FMA_STEERING_PASS for cores with dynamic > steering of > > FMUL/FMADD instructions. > > * config/aarch64/aarch64.c (aarch64_register_fma_steering): Declare. > > (aarch64_override_options): Include cortex-a57-fma-steering.h. Call > > aarch64_register_fma_steering () if > AARCH64_TUNE_FMA_STEERING is true. > > * config/aarch64/cortex-a57-fma-steering.h: New file. > > * config/aarch64/cortex-a57-fma-steering.c: Likewise. > > OK but wait for stage-1 to open for general development before you > commit it please. Done after rebasing it (context line change in aarch64.c due to new header include and adaptation to new signature of AARCH64_CORE macro in aarch64-cores.def). Committed patch below: diff --git a/gcc/config.gcc b/gcc/config.gcc index a1df043..9fec1e8 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -302,7 +302,7 @@ m32c*-*-*) aarch64*-*-*) cpu_type=aarch64 extra_headers="arm_neon.h arm_acle.h" - extra_objs="aarch64-builtins.o aarch-common.o" + extra_objs="aarch64-builtins.o aarch-common.o cortex-a57-fma-steering.o" target_gtfiles="\$(srcdir)/config/aarch64/aarch64-builtins.c" target_has_targetm_common=yes ;; diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def index 7c285ba..dfc9cc8 100644 --- a/gcc/config/aarch64/aarch64-cores.def +++ b/gcc/config/aarch64/aarch64-cores.def @@ -40,7 +40,7 @@ /* V8 Architecture Processors. 
*/ AARCH64_CORE("cortex-a53", cortexa53, cortexa53, 8, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa53, "0x41", "0xd03") -AARCH64_CORE("cortex-a57", cortexa57, cortexa57, 8, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57, "0x41", "0xd07") +AARCH64_CORE("cortex-a57", cortexa57, cortexa57, 8, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_USE_FMA_STEERING_PASS, cortexa57, "0x41", "0xd07") AARCH64_CORE("cortex-a72", cortexa72, cortexa57, 8, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57, "0x41", "0xd08") AARCH64_CORE("exynos-m1", exynosm1, cortexa57, 8, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, cortexa57, "0x53", "0x001") AARCH64_CORE("thunderx",thunderx, thunderx, 8, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx, "0x43", "0x0a1") @@ -48,5 +48,5 @@ AARCH64_CORE("xgene1", xgene1,xgene1,8, AARCH64_FL_FOR_ARCH8, xgen /* V8 big.LITTLE implementations. */ -AARCH64_CORE("cortex-a57.cortex-a53", cortexa57cortexa53, cortexa53, 8, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57, "0x41", "0xd07.0xd03") +AARCH64_CORE("cortex-a57.cortex-a53", cortexa57cortexa53, cortexa53, 8, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_USE_FMA_STEERING_PASS, cortexa57, "0x41", "0xd07.0xd03") AARCH64_CORE("cortex-a72.cortex-a53", cortexa72cortexa53, cortexa53, 8, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57, "0x41", "0xd08.0xd03") diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h index 1f7187b..3fd1b3f 100644 --- a/gcc/config/aarch64/aarch64.h +++ b/gcc/config/aarch64/aarch64.h @@ -200,6 +200,8 @@ extern unsigned aarch64_architecture_version; #define AARCH64_FL_CRYPTO (1 << 2) /* Has crypto. */ #define AARCH64_FL_SLOWMUL(1 << 3) /* A slow multiply core. */ #define AARCH64_FL_CRC(1 << 4) /* Has CRC. */ +/* Has static dispatch of FMA. */ +#define AARCH64_FL_USE_FMA_STEERING_PASS (1 << 5) /* Has FP and SIMD. 
*/ #define AARCH64_FL_FPSIMD (AARCH64_FL_FP | AARCH64_FL_SIMD) @@ -220,6 +222,8 @@ extern unsigned long aarch64_isa_flags; /* Macros to test tuning flags. */ extern unsigned long aarch64_tune_flags; #define AARCH64_TUNE_SLOWMUL (aarch64_tune_flags & AARCH64_FL_SLOWMUL) +#define AARCH64_TUNE_FMA_STEERING \ + (aarch64_tune_flags & AARCH64_FL_USE_FMA_STEERING_PASS) /* Crypto is an optional extension to AdvSIMD. */ #define TARGET_CRYPTO (TARGET_SIMD &
RE: [PATCH 2/3, ARM, libgcc, ping6] Code size optimization for the fmul/fdiv and dmul/ddiv function in libgcc
Here is an updated patch that prefix local symbols with __ for more safety. They appear in the symtab as local so it is not strictly necessary but one is never too cautious. Being local, they also do not generate any PLT entry. They appear only because the jumps are from one section to another (which is the whole purpose of this patch) and thus need a static relocation. I hope this revised version address all your concerns. ChangeLog entry is unchanged: *** gcc/libgcc/ChangeLog *** 2015-04-30 Tony Wang * config/arm/ieee754-sf.S: Expose symbols around fragment boundaries as function symbols. * config/arm/ieee754-df.S: Same with above diff --git a/libgcc/config/arm/ieee754-df.S b/libgcc/config/arm/ieee754-df.S index c1468dc..39b0028 100644 --- a/libgcc/config/arm/ieee754-df.S +++ b/libgcc/config/arm/ieee754-df.S @@ -559,7 +559,7 @@ ARM_FUNC_ALIAS aeabi_l2d floatdidf #ifdef L_arm_muldivdf3 -ARM_FUNC_START muldf3 +ARM_FUNC_START muldf3, function_section ARM_FUNC_ALIAS aeabi_dmul muldf3 do_push {r4, r5, r6, lr} @@ -571,7 +571,7 @@ ARM_FUNC_ALIAS aeabi_dmul muldf3 COND(and,s,ne) r5, ip, yh, lsr #20 teqne r4, ip teqne r5, ip - bleqLSYM(Lml_s) + bleq__Lml_s @ Add exponents together add r4, r4, r5 @@ -689,7 +689,7 @@ ARM_FUNC_ALIAS aeabi_dmul muldf3 subsip, r4, #(254 - 1) do_it hi cmphi ip, #0x700 - bhi LSYM(Lml_u) + bhi __Lml_u @ Round the result, merge final exponent. cmp lr, #0x8000 @@ -716,9 +716,12 @@ LSYM(Lml_1): mov lr, #0 subsr4, r4, #1 -LSYM(Lml_u): + FUNC_END aeabi_dmul + FUNC_END muldf3 + +ARM_SYM_START __Lml_u @ Overflow? - bgt LSYM(Lml_o) + bgt __Lml_o @ Check if denormalized result is possible, otherwise return signed 0. cmn r4, #(53 + 1) @@ -778,10 +781,11 @@ LSYM(Lml_u): do_it eq biceq xl, xl, r3, lsr #31 RETLDM "r4, r5, r6" + SYM_END __Lml_u @ One or both arguments are denormalized. @ Scale them leftwards and preserve sign bit. 
-LSYM(Lml_d): +ARM_SYM_START __Lml_d teq r4, #0 bne 2f and r6, xh, #0x8000 @@ -804,8 +808,9 @@ LSYM(Lml_d): beq 3b orr yh, yh, r6 RET + SYM_END __Lml_d -LSYM(Lml_s): +ARM_SYM_START __Lml_s @ Isolate the INF and NAN cases away teq r4, ip and r5, ip, yh, lsr #20 @@ -817,10 +822,11 @@ LSYM(Lml_s): orrsr6, xl, xh, lsl #1 do_it ne COND(orr,s,ne) r6, yl, yh, lsl #1 - bne LSYM(Lml_d) + bne __Lml_d + SYM_END __Lml_s @ Result is 0, but determine sign anyway. -LSYM(Lml_z): +ARM_SYM_START __Lml_z eor xh, xh, yh and xh, xh, #0x8000 mov xl, #0 @@ -832,41 +838,42 @@ LSYM(Lml_z): moveq xl, yl moveq xh, yh COND(orr,s,ne) r6, yl, yh, lsl #1 - beq LSYM(Lml_n) @ 0 * INF or INF * 0 -> NAN + beq __Lml_n @ 0 * INF or INF * 0 -> NAN teq r4, ip bne 1f orrsr6, xl, xh, lsl #12 - bne LSYM(Lml_n) @ NAN * -> NAN + bne __Lml_n @ NAN * -> NAN 1: teq r5, ip - bne LSYM(Lml_i) + bne __Lml_i orrsr6, yl, yh, lsl #12 do_it ne, t movne xl, yl movne xh, yh - bne LSYM(Lml_n) @ * NAN -> NAN + bne __Lml_n @ * NAN -> NAN + SYM_END __Lml_z @ Result is INF, but we need to determine its sign. -LSYM(Lml_i): +ARM_SYM_START __Lml_i eor xh, xh, yh + SYM_END __Lml_i @ Overflow: return INF (sign already in xh). -LSYM(Lml_o): +ARM_SYM_START __Lml_o and xh, xh, #0x8000 orr xh, xh, #0x7f00 orr xh, xh, #0x00f0 mov xl, #0 RETLDM "r4, r5, r6" + SYM_END __Lml_o @ Return a quiet NAN. -LSYM(Lml_n): +ARM_SYM_START __Lml_n orr xh, xh, #0x7f00 orr xh, xh, #0x00f8 RETLDM "r4, r5, r6" + SYM_END __Lml_n - FUNC_END aeabi_dmul - FUNC_END muldf3 - -ARM_FUNC_START divdf3 +ARM_FUNC_START divdf3 function_section ARM_FUNC_ALIAS aeabi_ddiv divdf3 do_push {r4, r5, r6, lr} @@ -985,7 +992,7 @@ ARM_FUNC_ALIAS aeabi_ddiv divdf3 subsip, r4, #(254 - 1) do_it hi cmphi ip, #0x700 - bhi LSYM(Lml_u) + bhi __Lml_u @ Round the result, merge final exponent. subsip, r5, yh @@ -1009,13 +1016,13 @@ LSYM(Ldv_1): orr xh, xh, #0x0010 mov lr, #0 subsr4, r4, #1 - b LSYM(Lml_u) + b __Lml_u @ Result mightt need to be denormaliz
RE: [PATCH, combine] Try REG_EQUAL for nonzero_bits
> From: Jeff Law [mailto:l...@redhat.com]
> Sent: Tuesday, April 28, 2015 12:27 AM
> To: Thomas Preud'homme; 'Eric Botcazou'
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH, combine] Try REG_EQUAL for nonzero_bits
>
> On 04/27/2015 04:26 AM, Thomas Preud'homme wrote:
> >> From: Jeff Law [mailto:l...@redhat.com]
> >> Sent: Saturday, April 25, 2015 3:00 AM
> >> Do you have a testcase where this change can result in better generated
> >> code.  If so please add that testcase.  It's OK if it's ARM specific.
> >
> > Hi Jeff,
> >
> > Last time I tried I couldn't reduce the code to a small testcase but if I
> > remember well it was mostly due to the problem of finding a good test for
> > creduce (zero extension is not unique enough). I'll try again with a more
> > manual approach and get back to you.
> OK.  No need for heroics -- give it a shot, but don't burn an insane
> amount of time on it.  If we can't get to a reasonable testcase, then so
> be it.

Sadly I couldn't get a testcase. I get almost the same sequence of instructions as in the program where we found the problem, but couldn't get exactly the same one. In all the cases I constructed, the nonzero_bits information we already have was enough for combine to do its job. I couldn't find what causes this information to be inaccurate.

I will try to investigate a bit further on Monday, as another pass might not be doing its job properly. Or maybe there is something that prevents the information from being propagated.

Best regards,

Thomas
RE: [PATCH, combine] Try REG_EQUAL for nonzero_bits
> From: Jeff Law [mailto:l...@redhat.com]
> Sent: Tuesday, April 28, 2015 12:27 AM
> OK.  No need for heroics -- give it a shot, but don't burn an insane
> amount of time on it.  If we can't get to a reasonable testcase, then so
> be it.

Ok, I tried but really didn't manage to create a testcase. I did, however, understand the conditions under which this patch is helpful. In the function reg_nonzero_bits_for_combine () in combine.c there is a test to check whether last_set_nonzero_bits for a given register is still valid. In the case I'm considering, the test evaluates to false because:

(i) the register rX whose nonzero bits are being evaluated was set in an earlier basic block than the one containing the instruction using rX (hence rsp->last_set_label < label_tick)
(ii) the predecessor of the basic block containing that insn is not the previous basic block analyzed by combine_instructions (hence label_tick_ebb_start == label_tick)
(iii) the register rX is set multiple times (hence REG_N_SETS (REGNO (x)) != 1)

Yet the block being processed is dominated by the SET for rX, so there is a REG_EQUAL note available to narrow down the set of nonzero bits.

Based on my understanding of your answer quoted above, I'll commit it as is, despite not having been able to come up with a testcase. I'll wait until tomorrow to do so, though, in case you change your mind about it.

Best regards,

Thomas
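The three conditions above can be condensed into a single predicate. The sketch below mirrors the prose description only; it is not GCC's actual reg_nonzero_bits_for_combine code, and the parameter names are stand-ins for the real fields.

```c
#include <stdbool.h>

/* When all three conditions hold, combine cannot trust the cached
   last_set_nonzero_bits for register rX, and a REG_EQUAL note is the
   only remaining source of nonzero-bits information.  */
static bool
cached_nonzero_bits_stale (int last_set_label, int label_tick,
			   int label_tick_ebb_start, int n_sets)
{
  return last_set_label < label_tick	       /* (i) rX set in an earlier BB */
	 && label_tick_ebb_start == label_tick /* (ii) predecessor is not the
						  previously analyzed BB */
	 && n_sets != 1;		       /* (iii) rX set more than once */
}
```

Any one condition failing (same block, a continuing extended basic block, or a single set) means the cached information is still usable and the patch changes nothing.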
RE: [PATCH, ping 1] Move insns without introducing new temporaries in loop2_invariant
Ping? Best regards, Thomas > -Original Message- > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme > Sent: Monday, March 16, 2015 8:39 PM > To: 'Steven Bosscher' > Cc: GCC Patches; Eric Botcazou > Subject: RE: [PATCH, stage1] Move insns without introducing new > temporaries in loop2_invariant > > > From: Steven Bosscher [mailto:stevenb@gmail.com] > > Sent: Monday, March 09, 2015 7:48 PM > > To: Thomas Preud'homme > > Cc: GCC Patches; Eric Botcazou > > Subject: Re: [PATCH, stage1] Move insns without introducing new > > temporaries in loop2_invariant > > New patch below. > > > > > It looks like this would run for all candidate loop invariants, right? > > > > If so, you're creating run time of O(n_invariants*n_bbs_in_loop), a > > potential compile time hog for large loops. > > > > But why compute this at all? Perhaps I'm missing something, but you > > already have inv->always_executed available, no? > > Indeed. I didn't realize the information was already there. > > > > > > > > + basic_block use_bb; > > > + > > > + ref = DF_REF_INSN (use); > > > + use_bb = BLOCK_FOR_INSN (ref); > > > > You can use DF_REF_BB. > > Since I need use_insn here I kept BLOCK_FOR_INSN but I used > DF_REF_BB for the def below. > > > So here are the new ChangeLog entries: > > *** gcc/ChangeLog *** > > 2015-03-11 Thomas Preud'homme > > * loop-invariant.c (can_move_invariant_reg): New. > (move_invariant_reg): Call above new function to decide whether > instruction can just be moved, skipping creation of temporary > register. > > *** gcc/testsuite/ChangeLog *** > > 2015-03-12 Thomas Preud'homme > > * gcc.dg/loop-8.c: New test. > * gcc.dg/loop-9.c: New test. 
> > > diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c > index f79b497..8217d62 100644 > --- a/gcc/loop-invariant.c > +++ b/gcc/loop-invariant.c > @@ -1512,6 +1512,79 @@ replace_uses (struct invariant *inv, rtx reg, > bool in_group) >return 1; > } > > And the new patch: > > +/* Whether invariant INV setting REG can be moved out of LOOP, at the > end of > + the block preceding its header. */ > + > +static bool > +can_move_invariant_reg (struct loop *loop, struct invariant *inv, rtx > reg) > +{ > + df_ref def, use; > + unsigned int dest_regno, defs_in_loop_count = 0; > + rtx_insn *insn = inv->insn; > + basic_block bb = BLOCK_FOR_INSN (inv->insn); > + > + /* We ignore hard register and memory access for cost and complexity > reasons. > + Hard register are few at this stage and expensive to consider as they > + require building a separate data flow. Memory access would require > using > + df_simulate_* and can_move_insns_across functions and is more > complex. */ > + if (!REG_P (reg) || HARD_REGISTER_P (reg)) > +return false; > + > + /* Check whether the set is always executed. We could omit this > condition if > + we know that the register is unused outside of the loop, but it does > not > + seem worth finding out. */ > + if (!inv->always_executed) > +return false; > + > + /* Check that all uses reached by the def in insn would still be reached > + it. */ > + dest_regno = REGNO (reg); > + for (use = DF_REG_USE_CHAIN (dest_regno); use; use = > DF_REF_NEXT_REG (use)) > +{ > + rtx_insn *use_insn; > + basic_block use_bb; > + > + use_insn = DF_REF_INSN (use); > + use_bb = BLOCK_FOR_INSN (use_insn); > + > + /* Ignore instruction considered for moving. */ > + if (use_insn == insn) > + continue; > + > + /* Don't consider uses outside loop. */ > + if (!flow_bb_inside_loop_p (loop, use_bb)) > + continue; > + > + /* Don't move if a use is not dominated by def in insn. 
*/ > + if (use_bb == bb && DF_INSN_LUID (insn) >= DF_INSN_LUID > (use_insn)) > + return false; > + if (!dominated_by_p (CDI_DOMINATORS, use_bb, bb)) > + return false; > +} > + > + /* Check for other defs. Any other def in the loop might reach a use > + currently reached by the def in insn. */ > + for (def = DF_REG_DEF_CHAIN (dest_regno); def; def = > DF_REF_NEXT_REG (def)) > +{ > + basic_block def_bb = DF_REF_BB (def); > + > + /* Defs in exit block cannot reach a use they weren't already. */ > + if (single_succ_p (def_bb)) > +
[PATCH, ARM] Fix testcase for PR64616
Hi,

The testcase made for PR64616 was only passing when a literal pool was used. Rather than having an alternative for systems where this is not true, this patch changes the test to check that a global copy propagation occurs in cprop2. This should work across all ARM targets (it works when targeting Cortex-M0, Cortex-M3 and whatever the default core is for ARMv7-A with the vfpv3-d16 FPU).

ChangeLog entry is as follows:

*** gcc/testsuite/ChangeLog ***

2015-05-04  Thomas Preud'homme

	* gcc.target/arm/pr64616.c: Test dump rather than assembly to work
	across ARM targets.

diff --git a/gcc/testsuite/gcc.target/arm/pr64616.c b/gcc/testsuite/gcc.target/arm/pr64616.c
index c686ffa..2280f21 100644
--- a/gcc/testsuite/gcc.target/arm/pr64616.c
+++ b/gcc/testsuite/gcc.target/arm/pr64616.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2" } */
+/* { dg-options "-O2 -fdump-rtl-cprop2" } */
 
 int f (int);
 unsigned int glob;
 
@@ -11,4 +11,5 @@ g ()
   glob = 5;
 }
 
-/* { dg-final { scan-assembler-times "ldr" 2 } } */
+/* { dg-final { scan-rtl-dump "GLOBAL COPY-PROP" "cprop2" } } */
+/* { dg-final { cleanup-rtl-dump "cprop2" } } */

The patch was tested by verifying that the pattern appears when targeting Cortex-M0, Cortex-M3 and the default core for ARMv7-A with the vfpv3-d16 FPU.

Best regards,

Thomas
RE: [PATCH, combine] Try REG_EQUAL for nonzero_bits
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme > > Based on my understanding of your answer quoted above, I'll commit > it as is, despite not having been able to come up with a testcase. I'll > wait tomorrow to do so though in case you changed your mind about it. Committed. Best regards, Thomas
RE: [PATCH, RFC, C] Add -fno-float to forbid floating point data types
> From: Joseph Myers [mailto:jos...@codesourcery.com] > Sent: Wednesday, November 12, 2014 10:11 PM > > > > This patch modifies the C parser to give an error if: > > - any variable or function parameter is declared with a float type or > > a type containing a float (prototype are ignored) > > But if you ignore prototypes at the time of declaration, don't you need to > diagnose if a function with a floating-point parameter or return type gets > called? I don't see anything to do that. (This includes the > __builtin_sqrt case from the previous discussion.) It would work by transitivity. To call a function with a float you'd have to either declare a float variable or pass it a float literal. > > > specified by user and a float litteral is found. > > "literal" (throughout). Thanks. You'll have guessed that I'm not a native English speaker. > > No, this is wrong. (a) By tying this to CPP_SEMICOLON you'll only catch > it if the variable is last in the list of declarators (consider "float f, > g (void);", where what comes before the semicolon is a function > declaration); better to check on each declarator, not just the last. Indeed. I meant to do that at some point and I forgot about it. > (b) > declarator->kind reflects the outermost declarator, but what determines > whether it's a function declaration is the innermost declarator (other > than cdk_id or cdk_attrs declarators), so this looks like it would give an > error for "float *f(void);" (wrongly treating it as a variable because the > outermost declarator is cdk_pointer), but not for "float (*f)(void);" > (wrongly treating it as a function because the outermost declarator is > cdk_function ... you could of course decide that function pointers > involving floating-point types are OK if you want). I see. I must say it's the first time I look at any GCC frontend so I missed this important bit. 
(c) specs->type only > covers the type specifiers, so if you want to diagnose function pointer > variables you need to allow for "int (*f)(float);" where the declaration's > type involves floating point but the type specifiers don't. Ok. (d) What do > you want to do with typedef declarations (right now it looks like they'll > be handled as variables, but your testcases don't consider them)? Because typedefs can be used in headers, these types should be dealt with when actually used, that is at declaration time like the rest. I'll make sure to add some testcases for this. > > I'd also suggest some refactoring: have a function that takes as > arguments > a location and a type, and does the > > if (flag_no_float && contains_floating_point_type (type)) > error_at (loc, ...); > > to avoid repeating that pattern in several places. Right. Thanks a lot for the review. Apart from the patch itself, what do you think of the general approach? Do you think doing this in the frontend is the right approach? Best regards, Thomas
RE: [PATCH 1/3, ARM, libgcc, ping5] Code size optimization for the fmul/fdiv and dmul/ddiv function in libgcc
[Taking over Tony's patch] Ping? Best regards, Thomas > -Original Message- > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Tony Wang > Sent: Thursday, August 21, 2014 7:15 AM > To: gcc-patches@gcc.gnu.org > Subject: [PATCH 1/3,ARM,libgcc]Code size optimization for the fmul/fdiv > and dmul/ddiv function in libgcc > > Hi there, > > In libgcc the files ieee754-sf.S and ieee754-df.S have some function pairs > which will be bundled into one .o > file and share the same .text section. For example, for fmul and fdiv, > the libgcc makefile will build them > into one .o file archived into libgcc.a. So when a user only calls the single > floating point multiply functions, the > fdiv function will also be linked in, and as fmul and fdiv share the same .text > section, the linker options > --gc-sections or -flto can't remove the dead code. > > So this optimization just separates the function pairs (fmul/fdiv and > dmul/ddiv) into different sections, > following the naming pattern of > -ffunction-sections (.text.__functionname), through which the unused sections > of fdiv/ddiv can be eliminated by the option --gc-sections when users > only use fmul/dmul. The solution is to > add a conditional statement in the macro FUNC_START, which will > conditionally change the section of a function > from .text to .text.__\name when compiling with the L_arm_muldivsf3 > or L_arm_muldivdf3 macro. > > GCC regression tests have been run on QEMU for Cortex-M3. No new > regressions when this patch is turned on. > > The code reduction for thumb2 on cortex-m3 is: > 1. When users only use single floating point multiply: > fmul+fdiv => fmul will have a code size reduction of 318 bytes. > > 2. When users only use double floating point multiply: > dmul+ddiv => dmul will have a code size reduction of 474 bytes. > > Ok for trunk? > > BR, > Tony > > Step 1: Provide another option: sp-section to control whether to split the > section of a function pair into two > parts. 
> > gcc/libgcc/ChangeLog: > 2014-08-21 Tony Wang > > * config/arm/lib1funcs.S (FUNC_START): Add conditional section > redefine for macro L_arm_muldivsf3 and L_arm_muldivdf3 > (SYM_END, ARM_SYM_START): Add macros used to expose function > Symbols > > diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S > index b617137..0f87111 100644 > --- a/libgcc/config/arm/lib1funcs.S > +++ b/libgcc/config/arm/lib1funcs.S > @@ -418,8 +418,12 @@ SYM (\name): > #define THUMB_SYNTAX > #endif > > -.macro FUNC_START name > +.macro FUNC_START name sp_section= > + .ifc \sp_section, function_section > + .section.text.__\name,"ax",%progbits > + .else > .text > + .endif > .globl SYM (__\name) > TYPE (__\name) > .align 0 > @@ -429,14 +433,24 @@ SYM (\name): > SYM (__\name): > .endm > > +.macro ARM_SYM_START name > + TYPE (\name) > + .align 0 > +SYM (\name): > +.endm > + > +.macro SYM_END name > + SIZE (\name) > +.endm > + > /* Special function that will always be coded in ARM assembly, even if > in Thumb-only compilation. */ > > #if defined(__thumb2__) > > /* For Thumb-2 we build everything in thumb mode. */ > -.macro ARM_FUNC_START name > - FUNC_START \name > +.macro ARM_FUNC_START name sp_section= > + FUNC_START \name \sp_section > .syntax unified > .endm > #define EQUIV .thumb_set > @@ -467,8 +481,12 @@ _L__\name: > #ifdef __ARM_ARCH_6M__ > #define EQUIV .thumb_set > #else > -.macro ARM_FUNC_START name > +.macro ARM_FUNC_START name sp_section= > + .ifc \sp_section, function_section > + .section.text.__\name,"ax",%progbits > + .else > .text > + .endif > .globl SYM (__\name) > TYPE (__\name) > .align 0
RE: [PATCH 2/3, ARM, libgcc, ping5] Code size optimization for the fmul/fdiv and dmul/ddiv function in libgcc
[Taking over Tony's patch] Ping? Best regards, Thomas > -Original Message- > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Tony Wang > Sent: Thursday, August 21, 2014 7:15 AM > To: gcc-patches@gcc.gnu.org > Subject: [PATCH 2/3,ARM,libgcc]Code size optimization for the fmul/fdiv > and dmul/ddiv function in libgcc > > Step 2: Mark all the symbols around the fragment boundaries as function > symbols, so as to generate a veneer when > the two sections are too far away from each other. Also, I have verified, both > manually and with some test cases, > that IP and PSR are not live at such points. > > gcc/libgcc/ChangeLog: > 2014-8-21 Tony Wang > > * config/arm/ieee754-sf.S: Expose symbols around fragment > boundaries as function symbols. > * config/arm/ieee754-df.S: Likewise. > > BR, > Tony
RE: [PATCH 3/3, ARM, libgcc, ping5] Code size optimization for the fmul/fdiv and dmul/ddiv function in libgcc
[Taking over Tony's patch] Ping? Best regards, Thomas > -Original Message- > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Tony Wang > Sent: Thursday, August 21, 2014 7:15 AM > To: gcc-patches@gcc.gnu.org > Subject: [PATCH 3/3,ARM,libgcc]Code size optimization for the fmul/fdiv > and dmul/ddiv function in libgcc > > Step 3: Test cases to verify the code size reduction. > > gcc/gcc/testsuite/ChangeLog: > 2014-08-21 Tony Wang > > * gcc.target/arm/size-optimization-ieee-1.c: New test case > * gcc.target/arm/size-optimization-ieee-2.c: New test case > * lib/gcc-dg.exp: Add new functions scan-symbol-common, > scan-symbol-yes and scan-symbol-no to scan for a user defined symbol in the final elf file > > BR, > Tony > > diff --git a/gcc/testsuite/gcc.target/arm/size-optimization-ieee-1.c > b/gcc/testsuite/gcc.target/arm/size-optimization-ieee-1.c > new file mode 100644 > index 000..46e9cdf > --- /dev/null > +++ b/gcc/testsuite/gcc.target/arm/size-optimization-ieee-1.c > @@ -0,0 +1,30 @@ > +/* { dg-do link { target { arm_thumb2_ok } } } */ > +/* { dg-options "-Wl,--gc-sections" } */ > +int > +foo () > +{ > + volatile float a; > + volatile float b; > + volatile float c = a * b; > + return 0; > +} > + > +int > +bar () > +{ > + volatile double a; > + volatile double b; > + volatile double c = a * b; > + return 0; > +} > + > +int > +main () > +{ > + foo (); > + bar (); > + return 0; > +} > +/* { dg-final { scan-symbol-no "__aeabi_fdiv" } } */ > +/* { dg-final { scan-symbol-no "__aeabi_ddiv" } } */ > + > diff --git a/gcc/testsuite/gcc.target/arm/size-optimization-ieee-2.c > b/gcc/testsuite/gcc.target/arm/size-optimization-ieee-2.c > new file mode 100644 > index 000..5007d62 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/arm/size-optimization-ieee-2.c > @@ -0,0 +1,30 @@ > +/* { dg-do link { target { arm_thumb2_ok } } } */ > +/* { dg-options "-Wl,--gc-sections" } */ > +int > +foo () > +{ > + volatile float a; > + volatile float b; > + volatile 
float c = a / b; > + return 0; > +} > + > +int > +bar () > +{ > + volatile double a; > + volatile double b; > + volatile double c = a / b; > + return 0; > +} > + > +int > +main () > +{ > + foo (); > + bar (); > + return 0; > +} > +/* { dg-final { scan-symbol-yes "__aeabi_fmul" } } */ > +/* { dg-final { scan-symbol-yes "__aeabi_dmul" } } */ > + > diff --git a/gcc/testsuite/lib/gcc-dg.exp b/gcc/testsuite/lib/gcc-dg.exp > index 3390caa..0d52e95 100644 > --- a/gcc/testsuite/lib/gcc-dg.exp > +++ b/gcc/testsuite/lib/gcc-dg.exp > @@ -880,5 +880,57 @@ proc gdb-exists { args } { > return 0; > } > > +# Scan the OUTPUT_FILE for a symbol. Return 1 if it is present, or > +# 0 if it is not. > + > +proc scan-symbol-common { args } { > +global nm > +global base_dir > + > +set testcase [testname-for-summary] > +set output_file "[file rootname [file tail $testcase]].exe" > + > +# Find nm like we find g++ in g++.exp. > +if ![info exists nm] { > +set nm [findfile $base_dir/../../../binutils/nm \ > +$base_dir/../../../binutils/nm \ > +[findfile $base_dir/../../nm $base_dir/../../nm \ > + [findfile $base_dir/nm $base_dir/nm \ > + [transform nm]]]] > +verbose -log "nm is $nm" > +} > + > +if { $output_file == "" } { > +fail "scan-symbol-common $args: dump file does not exist" > +return > +} > + > +set fd [open "| $nm $output_file" r] > +set text [read $fd] > +close $fd > + > +if [regexp -- [lindex $args 0] $text] { > +return 1 > +} else { > +return 0 > +} > +} > + > +proc scan-symbol-yes { args } { > +if { [scan-symbol-common $args] == 1 } { > + pass "scan-symbol-yes $args exists" > +} else { > + fail "scan-symbol-yes $args does not exist" > +} > +} > + > +proc scan-symbol-no { args } { > +if { [scan-symbol-common $args] != 1 } { > +pass "scan-symbol-no $args does not exist" > +} else { > +fail "scan-symbol-no $args exists" > +} > +} > + > set additional_prunes "" > set dg_runtest_extra_prunes ""
[PATCH] Add force option to find_best_rename_reg in regrename pass
We are planning to introduce a new optimization in the aarch64 backend, similar to the FP load balancing pass in the LLVM project [1]. This pass would be core-specific and involve doing some register renaming. An RFC version of this patch should be posted later today. As part of this pass, we want to rename a register to any register satisfying some specific constraints. We wanted to reuse the global (non static) find_best_rename_reg function as the c6x backend does, but this function is a bit too specific to the register renaming pass. [1] http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/AArch64/AArch64A57FPLoadBalancing.cpp?view=markup It looks at registers that respect the constraints of all the instructions in the set and tries to pick one in the preferred class for all the instructions involved. This is generally useful for any pass that wants to do register renaming. However, it also contains some logic to only select a register whose last use is older than that of the register to be replaced. This bit is specific to the register renaming pass and makes the function unusable for this new pass, which would force us to make a copy of the function. This patch adds an extra parameter to skip this check, considering only the constraints while still trying to pick a register in the preferred class. ChangeLog entry is as follows: 2014-11-14 Thomas Preud'homme * regrename.c (find_best_rename_reg): Rename to ... (find_rename_reg): This. Also add a parameter to skip tick check. * regrename.h: Likewise. * config/c6x/c6x.c: Adapt to above renaming. 
diff --git a/gcc/config/c6x/c6x.c b/gcc/config/c6x/c6x.c index 06319d0..6aca1e3 100644 --- a/gcc/config/c6x/c6x.c +++ b/gcc/config/c6x/c6x.c @@ -3513,7 +3513,8 @@ try_rename_operands (rtx_insn *head, rtx_insn *tail, unit_req_table reqs, COMPL_HARD_REG_SET (unavailable, reg_class_contents[(int) super_class]); old_reg = this_head->regno; - best_reg = find_best_rename_reg (this_head, super_class, &unavailable, old_reg); + best_reg = find_rename_reg (this_head, super_class, &unavailable, old_reg, + true); regrename_do_replace (this_head, best_reg); diff --git a/gcc/regrename.h b/gcc/regrename.h index 03b7164..05c78ad 100644 --- a/gcc/regrename.h +++ b/gcc/regrename.h @@ -89,8 +89,8 @@ extern void regrename_init (bool); extern void regrename_finish (void); extern void regrename_analyze (bitmap); extern du_head_p regrename_chain_from_id (unsigned int); -extern int find_best_rename_reg (du_head_p, enum reg_class, HARD_REG_SET *, -int); +extern int find_rename_reg (du_head_p, enum reg_class, HARD_REG_SET *, int, + bool); extern void regrename_do_replace (du_head_p, int); #endif diff --git a/gcc/regrename.c b/gcc/regrename.c index 66f562b..5de7826 100644 --- a/gcc/regrename.c +++ b/gcc/regrename.c @@ -357,11 +357,13 @@ check_new_reg_p (int reg ATTRIBUTE_UNUSED, int new_reg, /* For the chain THIS_HEAD, compute and return the best register to rename to. SUPER_CLASS is the superunion of register classes in the chain. UNAVAILABLE is a set of registers that cannot be used. - OLD_REG is the register currently used for the chain. */ + OLD_REG is the register currently used for the chain. BEST_RENAME + controls whether the register chosen must be better than the + current one or just respect the given constraint. 
*/ int -find_best_rename_reg (du_head_p this_head, enum reg_class super_class, - HARD_REG_SET *unavailable, int old_reg) +find_rename_reg (du_head_p this_head, enum reg_class super_class, +HARD_REG_SET *unavailable, int old_reg, bool best_rename) { bool has_preferred_class; enum reg_class preferred_class; @@ -408,8 +410,13 @@ find_best_rename_reg (du_head_p this_head, enum reg_class super_class, && ((pass == 0 && !TEST_HARD_REG_BIT (reg_class_contents[preferred_class], best_new_reg)) - || tick[best_new_reg] > tick[new_reg])) - best_new_reg = new_reg; + || !best_rename || tick[best_new_reg] > tick[new_reg])) + { + if (best_rename) + best_new_reg = new_reg; + else + return new_reg; + } } if (pass == 0 && best_new_reg != old_reg) break; @@ -480,8 +487,8 @@ rename_chains (void) if (n_uses < 2) continue; - best_new_reg = find_best_rename_reg (this_head, super_class, - &this_unavailable, reg); + best_new_reg = find_rename_reg (this_head, super_class, + &this_unavailable, reg, true); if (dump_file) { === Testing === c6x ba
RE: [PATCH] Cancel bswap opt when intermediate stmts are reused
> From: Richard Biener [mailto:richard.guent...@gmail.com] > Sent: Monday, November 17, 2014 12:47 PM > > Hmm. I am a little bit concerned about the malloc traffic generated here. > So why not use a vec, get rid of the ->next pointer and > use a hash_map to associate the stmt with > an index into the vector? Sure, it even makes things easier. However, I don't understand why a vector is better than malloc. Is it a matter of fragmentation? > > At this point I'd rather leave DCE to DCE ... I thought that since all the information is there, why not do it. It makes it easier to read the dump of the pass. > > Ick - do we really have to use gotos here? Can you refactor this > a bit to avoid it? Yes I can. I needed the same kind of thing for fixing PR63761 (r217409) and found a way to avoid it. > > The whole scheme wouldn't exactly fly with the idea of eventually > using a lattice to propagate the symbolic numbers, but well... > > I think the overall idea is sound and if you have time to adjust according > to my comments that would be nice. To be honest, I think it should wait for the next stage1. After several pings I took another look at the testcase with the idea of measuring the size reduction the patch could give, and was surprised that, in all the cases I could construct, the size was actually bigger. Performance might be improved nonetheless, but I think this needs more careful consideration. And, as you said, the approach would need to be changed if bswap was rewritten to do a forward analysis. Finally, nobody has reported a code size or performance regression so far due to the changes, so this might be a non-issue. If such a report happens, then it will be a good time to revisit that decision. Do you agree? > > Sorry for the very late review. That's alright, I know how it is. Thank you for keeping track of it. I actually feel sorry I didn't warn you about my findings. I thought the patch had fallen through the cracks and didn't want to spam gcc-patches uselessly. Best regards, Thomas
RE: [Patch ARM] Fix PR target/56846
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Tony Wang > > > Hi all, > > The bug is reported at > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56846, and it’s about the > problem that > when an exception handler is involved in a function, the > _Unwind_Backtrace function will run into a deadloop on > arm targets. The patch (in r215101) can be backported without any change to the 4.8 and 4.9 branches. I checked in QEMU with and without the patch on both branches and it indeed solves the problem. The testsuite ran without regressions when compiled with an arm-none-eabi cross compiler and executed on QEMU emulating Cortex-M3. I also bootstrapped gcc on x86_64-linux-gnu and ran the testsuite without regressions. Is it ok for backport? Best regards, Thomas
RE: [Patch, ARM, ping1] Fix PR target/56846
Ping? > -Original Message- > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme > Sent: Wednesday, November 19, 2014 6:00 PM > To: Tony Wang; gcc-patches@gcc.gnu.org; d...@debian.org; aph- > g...@littlepinkcloud.com; Richard Earnshaw; Ramana Radhakrishnan; > libstd...@gcc.gnu.org > Subject: RE: [Patch ARM] Fix PR target/56846 > > > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > > ow...@gcc.gnu.org] On Behalf Of Tony Wang > > > > > > > Hi all, > > > > The bug is reported at > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56846, and it’s about the > > problem that > > when exception handler is involved in the function, then > > _Unwind_Backtrace function will run into deadloop on > > arm target. > > The patch (in r215101) can be backported without any change on 4.8 and > 4.9 branches. I checked in QEMU with and without the patch on both > branches and it indeed solves the problem. > > Testsuite run without regression when compiled with arm-none-eabi > cross compiler and executed on QEMU emulating Cortex-M3. > > I also bootstrapped gcc on x86_64-linux-gnu and run the testsuite > without > regressions. > > Is it ok for backport? > > Best regards, > > Thomas > > > >
RE: [Patch, ARM, ping1] Fix PR target/56846
Thanks. Ccing release manager for their opinion. Best regards, Thomas > -Original Message- > From: Jonathan Wakely [mailto:jwak...@redhat.com] > Sent: Wednesday, November 26, 2014 5:33 PM > To: Thomas Preud'homme > Cc: Tony Wang; gcc-patches@gcc.gnu.org; d...@debian.org; aph- > g...@littlepinkcloud.com; Richard Earnshaw; Ramana Radhakrishnan; > libstd...@gcc.gnu.org > Subject: Re: [Patch, ARM, ping1] Fix PR target/56846 > > On 26/11/14 17:23 -, Thomas Preud'homme wrote: > >Ping? > > I'm OK with backporting it if a release manager approves it. > > > >> -Original Message- > >> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > >> ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme > >> Sent: Wednesday, November 19, 2014 6:00 PM > >> To: Tony Wang; gcc-patches@gcc.gnu.org; d...@debian.org; aph- > >> g...@littlepinkcloud.com; Richard Earnshaw; Ramana Radhakrishnan; > >> libstd...@gcc.gnu.org > >> Subject: RE: [Patch ARM] Fix PR target/56846 > >> > >> > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > >> > ow...@gcc.gnu.org] On Behalf Of Tony Wang > >> > > >> > >> > > >> > Hi all, > >> > > >> > The bug is reported at > >> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56846, and it’s about > the > >> > problem that > >> > when exception handler is involved in the function, then > >> > _Unwind_Backtrace function will run into deadloop on > >> > arm target. > >> > >> The patch (in r215101) can be backported without any change on 4.8 > and > >> 4.9 branches. I checked in QEMU with and without the patch on both > >> branches and it indeed solves the problem. > >> > >> Testsuite run without regression when compiled with arm-none-eabi > >> cross compiler and executed on QEMU emulating Cortex-M3. > >> > >> I also bootstrapped gcc on x86_64-linux-gnu and run the testsuite > >> without > >> regressions. > >> > >> Is it ok for backport? > >> > >> Best regards, > >> > >> Thomas > >> > >> > >> > >> > > > > > > > >
RE: [Patch, ARM, ping1] Fix PR target/56846
> -Original Message- > From: Richard Biener [mailto:rguent...@suse.de] > Sent: Thursday, November 27, 2014 9:57 AM > To: Ramana Radhakrishnan > Cc: Thomas Preud'homme; 'Jonathan Wakely'; Jakub Jelinek; Tony Wang; > gcc-patches@gcc.gnu.org; d...@debian.org; aph- > g...@littlepinkcloud.com; Richard Earnshaw; libstd...@gcc.gnu.org > Subject: Re: [Patch, ARM, ping1] Fix PR target/56846 > > On Thu, 27 Nov 2014, Ramana Radhakrishnan wrote: > > > > > > > On 27/11/14 09:34, Richard Biener wrote: > > > On Thu, 27 Nov 2014, Thomas Preud'homme wrote: > > > > > > > Thanks. Ccing release manager for their opinion. > > > > > > It doesn't look ARM specific and frankly I have not too much > expertise > > > in this area. The patch has been on trunk for more than two months > > > though so I guess it is ok to backport. > > > > > > > It is ARM specific because the whole thing sits in a #ifdef > > __ARM_EABI_UNWINDER__ in eh_personality.cc. > > Ah, too little patch context then. Sorry, my bad. Below is the patch that were sent by Tony at that time with 20 lines of context: diff --git a/libstdc++-v3/libsupc++/eh_personality.cc b/libstdc++-v3/libsupc++/eh_personality.cc index f315a83..cb4467a 100644 --- a/libstdc++-v3/libsupc++/eh_personality.cc +++ b/libstdc++-v3/libsupc++/eh_personality.cc @@ -361,40 +361,46 @@ PERSONALITY_FUNCTION (int version, found_cleanup, found_handler } found_type; lsda_header_info info; const unsigned char *language_specific_data; const unsigned char *action_record; const unsigned char *p; _Unwind_Ptr landing_pad, ip; int handler_switch_value; void* thrown_ptr = 0; bool foreign_exception; int ip_before_insn = 0; #ifdef __ARM_EABI_UNWINDER__ _Unwind_Action actions; switch (state & _US_ACTION_MASK) { case _US_VIRTUAL_UNWIND_FRAME: + // If the unwind state pattern is + // _US_VIRTUAL_UNWIND_FRAME | _US_FORCE_UNWIND + // then we don't need to search for any handler as it is not a real + // exception. Just unwind the stack. 
+ if (state & _US_FORCE_UNWIND) + CONTINUE_UNWINDING; actions = _UA_SEARCH_PHASE; break; case _US_UNWIND_FRAME_STARTING: actions = _UA_CLEANUP_PHASE; if (!(state & _US_FORCE_UNWIND) && ue_header->barrier_cache.sp == _Unwind_GetGR(context, UNWIND_STACK_REG)) actions |= _UA_HANDLER_FRAME; break; case _US_UNWIND_FRAME_RESUME: CONTINUE_UNWINDING; break; default: std::abort(); } actions |= state & _US_FORCE_UNWIND; Best regards, Thomas
[PATCH, contrib] Reduce check_GNU_style noise
Currently check_GNU_style.sh gives the error "There should be exactly one space between function name and parentheses." for the following kind of lines: tab[(int) idx] This patch changes the check to only warn if there are 0 or 2+ spaces between an alphanumeric character and an opening parenthesis, rather than 2+ spaces or anything other than a single space (which was also redundant). With this change, lines like the above are no longer warned about while other incorrect lines are still reported. ChangeLog entry is as follows: *** contrib/ChangeLog *** 2014-11-28 Thomas Preud'homme * check_GNU_style.sh: Warn for incorrect number of spaces in function call only if 0 or 2+ spaces found. diff --git a/contrib/check_GNU_style.sh b/contrib/check_GNU_style.sh index ef8fdda..5f90190 100755 --- a/contrib/check_GNU_style.sh +++ b/contrib/check_GNU_style.sh @@ -113,7 +113,7 @@ g 'Sentences should end with a dot. Dot, space, space, end of the comment.' \ '[[:alnum:]][[:blank:]]*\*/' $* vg 'There should be exactly one space between function name and parentheses.' \ -'\#define' '[[:alnum:]]([^[:blank:]]|[[:blank:]]{2,})\(' $* +'\#define' '[[:alnum:]]([[:blank:]]{2,})?\(' $* g 'There should be no space before closing parentheses.' \ '[[:graph:]][[:blank:]]+\)' $* Is this ok for trunk? Best regards, Thomas
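The effect of the regex change can be checked directly with grep -E (a standalone demonstration; check_GNU_style.sh itself wraps the pattern in its g/vg helpers):

```shell
# Old and new patterns from the vg check above.
old='[[:alnum:]]([^[:blank:]]|[[:blank:]]{2,})\('
new='[[:alnum:]]([[:blank:]]{2,})?\('

printf 'foo(x)\nbar  (x)\nbaz (x)\ntab[(int) idx]\n' > sample.txt

# Old pattern: flags the genuine violations but also the cast line,
# because 'b' followed by the non-blank '[' and then '(' matches.
grep -E "$old" sample.txt   # foo(x), bar  (x) and tab[(int) idx]
# New pattern: only 0 or 2+ blanks directly between an alphanumeric
# character and '(' match, so the cast line is no longer flagged.
grep -E "$new" sample.txt   # foo(x) and bar  (x) only
```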
[PATCH] Fix removing of df problem in df_finish_pass
Hi, In df_finish_pass, optional problems are removed manually, making non-null entries in df->problems_in_order non-contiguous. This may lead to a null pointer dereference when accessing all problems from df->problems_in_order[0] to df->problems_in_order[df->num_problems_defined - 1] and may miss some other problems. Such a scenario was actually encountered when working on a patch. This patch uses the existing function df_remove_problem to do the deletion, which requires iterating on problems via the df->problems_by_index[] array since each call messes with df->num_problems_defined and the order of problems in df->problems_in_order[]. ChangeLog entry is as follows: 2015-02-12 Thomas Preud'homme * df-core.c (df_finish_pass): Iterate over df->problems_by_index[] and use df_remove_problem rather than manually removing problems, living holes in df->problems_in_order[]. diff --git a/gcc/df-core.c b/gcc/df-core.c index 82f1364..67040a1 100644 --- a/gcc/df-core.c +++ b/gcc/df-core.c @@ -642,7 +642,6 @@ void df_finish_pass (bool verify ATTRIBUTE_UNUSED) { int i; - int removed = 0; #ifdef ENABLE_DF_CHECKING int saved_flags; @@ -658,21 +657,15 @@ df_finish_pass (bool verify ATTRIBUTE_UNUSED) saved_flags = df->changeable_flags; #endif - for (i = 0; i < df->num_problems_defined; i++) + /* We iterate over problems by index as each problem removed will + lead to problems_in_order to be reordered. */ + for (i = 0; i < DF_LAST_PROBLEM_PLUS1; i++) { - struct dataflow *dflow = df->problems_in_order[i]; - struct df_problem *problem = dflow->problem; + struct dataflow *dflow = df->problems_by_index[i]; - if (dflow->optional_p) - { - gcc_assert (problem->remove_problem_fun); - (problem->remove_problem_fun) (); - df->problems_in_order[i] = NULL; - df->problems_by_index[problem->id] = NULL; - removed++; - } + if (dflow && dflow->optional_p) + df_remove_problem (dflow); } - df->num_problems_defined -= removed; /* Clear all of the flags. 
*/ df->changeable_flags = 0; The testsuite was run with a bootstrapped x86_64 native compiler and an arm-none-eabi GCC cross-compiler targeting Cortex-M3, without any regressions. Although the problem is real, it doesn't seem that GCC hits it at the moment (I stumbled upon it while working on a patch). Therefore I'm not sure if this should go in stage4 or not. Please advise me on this. Ok for trunk/stage1? Best regards, Thomas
RE: [PATCH] Fix removing of df problem in df_finish_pass
> From: Bernhard Reutner-Fischer [mailto:rep.dot@gmail.com] > Sent: Saturday, February 28, 2015 4:00 AM > > use df_remove_problem rather than manually removing problems, > living > > leaving Indeed. Please find updated changelog below: 2015-03-03 Thomas Preud'homme * df-core.c (df_finish_pass): Iterate over df->problems_by_index[] and use df_remove_problem rather than manually removing problems, leaving holes in df->problems_in_order[]. Best regards, Thomas
RE: [PATCH, ARM] Fix PR64453: live high register not saved in function prolog with -Os
Just committed to 4.9 branch, 4.8 to follow once the regression testsuite for the 4.8 backport finishes running (the backport was done quite some time ago now). Best regards, Thomas > -Original Message- > From: Ramana Radhakrishnan [mailto:ramana@googlemail.com] > Sent: Tuesday, February 17, 2015 4:07 PM > To: Thomas Preud'homme > Cc: Ramana Radhakrishnan; gcc-patches; Richard Biener; Jakub Jelinek > Subject: Re: [PATCH, ARM] Fix PR64453: live high register not saved in > function prolog with -Os > > On Fri, Jan 23, 2015 at 8:23 AM, Thomas Preud'homme > wrote: > > Hi Ramana, > > > >> From: Ramana Radhakrishnan [mailto:ramana@googlemail.com] > >> Sent: Wednesday, January 14, 2015 7:21 PM > >> On Wed, Jan 14, 2015 at 10:20 AM, Thomas Preud'homme > >> wrote: > >> > When compiling for size, live high registers are not saved in the function > >> prologue in the ARM backend in Thumb mode. The problem comes from > >> arm_conditional_register_usage setting call_used_regs for all high > >> registers to avoid them being allocated. However, this causes the prologue to > not > >> save these registers even if they are used. This patch marks high > registers > >> as really needing to be saved in the prologue if live, regardless of the > >> content of call_used_regs. > >> > > >> > ChangeLog entries are as follows: > >> > > >> > gcc/ChangeLog > >> > > >> > 2015-01-12 Thomas Preud'homme thomas.preudho...@arm.com > >> > > >> > PR target/64453 > >> > * config/arm/arm.c (callee_saved_reg_p): Define. > >> > (arm_compute_save_reg0_reg12_mask): Use > callee_saved_reg_p > >> to check if > >> > register is callee saved instead of !call_used_regs[reg]. > >> > (thumb1_compute_save_reg_mask): Likewise. > >> > > >> > > >> > gcc/testsuite/ChangeLog > >> > > >> > 2014-12-31 Thomas Preud'homme thomas.preudho...@arm.com > >> > > >> > * gcc.target/arm/pr64453.c: New. > >> > > >> > > >> > > >> > >> OK. 
> >> > >> Ramana > > > > The patch applies cleanly on the GCC 4.8 and 4.9 branches when omitting > the cosmetic change > > in arm_conditional_register_usage () which was unintended. I > compiled an arm-none-eabi > > GCC cross compiler and ran the testsuite for both backports without any > regressions. > > > > Is this ok for the 4.8 and 4.9 branches? > > > > OK for the branches if no RM objects in 24 hours. > > Ramana > > > Best regards, > > > > Thomas > > > > > >