On Tue, 5 Jan 2016, Thomas Preud'homme wrote: > Hi, > > bswap optimization pass generate wrong code on big endian targets when the > result of a bit operation it analyzed is a partial load of the range of > memory > accessed by the original expression (when one or more bytes at lowest address > were lost in the computation). This is due to the way cmpxchg and cmpnop are > adjusted in find_bswap_or_nop before being compared to the result of the > symbolic expression. Part of the adjustment is endian independent: it's to > ignore the bytes that were not accessed by the original gimple expression. > However, when the result has less byte than that original expression, some > more byte need to be ignored and this is endian dependent. > > The current code only support loss of bytes at the highest addresses because > there is no code to adjust the address of the load. However, for little and > big endian targets the bytes at highest address translate into different byte > significance in the result. This patch first separate cmpxchg and cmpnop > adjustement into 2 steps and then deal with endianness correctly for the > second step. > > ChangeLog entries are as follow: > > > *** gcc/ChangeLog *** > > 2015-12-16 Thomas Preud'homme <thomas.preudho...@arm.com> > > PR tree-optimization/67781 > * tree-ssa-math-opts.c (find_bswap_or_nop): Zero out bytes in cmpxchg > and cmpnop in two steps: first the ones not accessed in original > gimple expression in a endian independent way and then the ones not > accessed in the final result in an endian-specific way. > > > *** gcc/testsuite/ChangeLog *** > > 2015-12-16 Thomas Preud'homme <thomas.preudho...@arm.com> > > PR tree-optimization/67781 > * gcc.c-torture/execute/pr67781.c: New file. > > > diff --git a/gcc/testsuite/gcc.c-torture/execute/pr67781.c > b/gcc/testsuite/gcc.c-torture/execute/pr67781.c > new file mode 100644 > index 0000000..bf50aa2 > --- /dev/null > +++ b/gcc/testsuite/gcc.c-torture/execute/pr67781.c > @@ -0,0 +1,34 @@ > +#ifdef __UINT32_TYPE__ > +typedef __UINT32_TYPE__ uint32_t; > +#else > +typedef unsigned uint32_t; > +#endif > + > +#ifdef __UINT8_TYPE__ > +typedef __UINT8_TYPE__ uint8_t; > +#else > +typedef unsigned char uint8_t; > +#endif > + > +struct > +{ > + uint32_t a; > + uint8_t b; > +} s = { 0x123456, 0x78 }; > + > +int pr67781() > +{ > + uint32_t c = (s.a << 8) | s.b; > + return c; > +} > + > +int > +main () > +{ > + if (sizeof (uint32_t) * __CHAR_BIT__ != 32) > + return 0; > + > + if (pr67781 () != 0x12345678) > + __builtin_abort (); > + return 0; > +} > diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c > index b00f046..e5a185f 100644 > --- a/gcc/tree-ssa-math-opts.c > +++ b/gcc/tree-ssa-math-opts.c > @@ -2441,6 +2441,8 @@ find_bswap_or_nop_1 (gimple *stmt, struct > symbolic_number > *n, int limit) > static gimple * > find_bswap_or_nop (gimple *stmt, struct symbolic_number *n, bool *bswap) > { > + unsigned rsize; > + uint64_t tmpn, mask; > /* The number which the find_bswap_or_nop_1 result should match in order > to have a full byte swap. The number is shifted to the right > according to the size of the symbolic number before using it. */ > @@ -2464,24 +2466,38 @@ find_bswap_or_nop (gimple *stmt, struct > symbolic_number > *n, bool *bswap) > > /* Find real size of result (highest non-zero byte). */ > if (n->base_addr) > - { > - int rsize; > - uint64_t tmpn; > - > - for (tmpn = n->n, rsize = 0; tmpn; tmpn >>= BITS_PER_MARKER, rsize++); > - n->range = rsize; > - } > + for (tmpn = n->n, rsize = 0; tmpn; tmpn >>= BITS_PER_MARKER, rsize++); > + else > + rsize = n->range; > > - /* Zero out the extra bits of N and CMP*. */ > + /* Zero out the bits corresponding to untouched bytes in original gimple > + expression. */ > if (n->range < (int) sizeof (int64_t)) > { > - uint64_t mask; > - > mask = ((uint64_t) 1 << (n->range * BITS_PER_MARKER)) - 1; > cmpxchg >>= (64 / BITS_PER_MARKER - n->range) * BITS_PER_MARKER; > cmpnop &= mask; > } > > + /* Zero out the bits corresponding to unused bytes in the result of the > + gimple expression. */ > + if (rsize < n->range) > + { > + if (BYTES_BIG_ENDIAN) > + { > + mask = ((uint64_t) 1 << (rsize * BITS_PER_MARKER)) - 1; > + cmpxchg &= mask; > + cmpnop >>= (n->range - rsize) * BITS_PER_MARKER; > + } > + else > + { > + mask = ((uint64_t) 1 << (rsize * BITS_PER_MARKER)) - 1; > + cmpxchg >>= (n->range - rsize) * BITS_PER_MARKER; > + cmpnop &= mask; > + } > + n->range = rsize; > + } > + > /* A complete byte swap should make the symbolic number to start with > the largest digit in the highest order byte. Unchanged symbolic > number indicates a read with same endianness as target architecture. */ > > > > Regression testsuite was run on a bootstrapped native x86_64-linux-gnu GCC > and > on an arm-none-eabi GCC cross-compiler without any regression. I'm waiting > for > a slot on gcc110 to do a big endian bootstrap but at least the testcase works > on mips-linux. I'll send an update once bootstrap is complete. > > Is this ok for trunk and 5 branch in a week time if no regression is reported?
Yes. Thanks, Richar. > Best regards, > > Thomas > > -- Richard Biener <rguent...@suse.de> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)