Hi Bernd, First of all, my apologize for the late reply. I was in holidays the past week to celebrate Chinese new year.
On Friday, February 12, 2016 05:28:43 PM Bernd Schmidt wrote: > PR69714 is an issue where the bswap pass makes an incorrect > transformation on big-endian targets. The source has a 32-bit bswap, but > PA doesn't have a pattern for that. Still, we recognize that there is a > 16-bit bswap involved, and generate code for that - loading the halfword > at offset 2 from the original memory, as per the proper big-endian > correction. > > The problem is that we recognized the rotation of the _high_ part, which > is at offset 0 on big-endian. The symbolic number is 0x0304, rather than > 0x0102 as it should be. Only the latter form should ever be matched. Which is exactly what the patch for PR67781 was set out to do (see the if (BYTES_BIG_ENDIAN) block in find_bswap_or_nop. The reason why the offset is wrong is due to another if (BYTES_BIG_ENDIAN) block in bswap_replace. I will check the testcase added with that latter block, my guess is that the change was trying to fix a similar issue to PR67781 and PR69714. When removing it the load in avcrc is done without an offset. I should have run the full testsuite also on a big endian system instead of a few selected testcases and a bootstrap in addition to the little endian bootstrap+testsuite. Lesson learned. > The > problem is caused by the patch for PR67781, which was intended to solve > a different big-endian problem. Unfortunately, I think it is based on an > incorrect analysis. > > The real issue with the PR67781 testcase is in fact the masking loop, > identified by Thomas in comment #7 for 67781. > > for (tmpn = n->n, rsize = 0; tmpn; tmpn >>= BITS_PER_MARKER, rsize++) > ; > n->range = rsize; > > If we have a value of 0x01020304, but a range of 5, it means that > there's an "invisible" high-order byte that we don't care about. On > little-endian, we can just ignore it. On big-endian, this implies that > the data we're interested in is located at an offset. The code that does > the replacements does not use the offset or bytepos fields, it assumes > that the bytepos always matches that of the load instruction. Yes, but the change in find_bswap_or_nop aims at checking that we have 0x05040302 or 0x02030405 for big endian targets and 0x04030201 or 0x01020304 for little endian targets. Before the "if (rsize < n->range)" block, cmpnop and cmpxchg are respectively 0x0504030201 and 0x0102030405. Then for big endian it will only keep the 4 least significant symbolic bytes of cmpxchg (if performs a bitwise and) and the 4 most significant symbolic bytes of cmpnop (it performs a right shift) so you'd get 0x05040302 for cmpnop and 0x02030405 for cmpxchg. Both would translate to a load at offset 0, and then a byteswap for the latter. As said earlier, the problem is in bswap_replace which tries to adjust the address of the load for big endian targets by adding a load offset. With the change in find_bswap_or_nop, an offset is never needed because only pattern that correspond to a load at offset 0 are recognized. I kept for GCC 7 to change that to allow offset and recognize all sub-load and sub-bswap. > The only > offset we can introduce is the big-endian correction, but that assumes > we're always dealing with lowparts. > > So, I think the correct/conservative fix for both bugs is to revert the > earlier change for PR67781, and then apply the following on top: > > --- revert.tree-ssa-math-opts.c 2016-02-12 15:22:57.098895058 +0100 > +++ tree-ssa-math-opts.c 2016-02-12 15:23:08.482228474 +0100 > @@ -2473,10 +2473,14 @@ find_bswap_or_nop (gimple *stmt, struct > /* Find real size of result (highest non-zero byte). */ > if (n->base_addr) > { > - int rsize; > + unsigned HOST_WIDE_INT rsize; > uint64_t tmpn; > > for (tmpn = n->n, rsize = 0; tmpn; tmpn >>= BITS_PER_MARKER, > rsize++); > + if (BYTES_BIG_ENDIAN && n->range != rsize) > + /* This implies an offset, which is currently not handled by > + bswap_replace. */ > + return NULL; > n->range = rsize; > } This works too yes with less optimizations for big endian. I'm fine with either solutions. This one is indeed a bit more conservative so I see the appeal to use it for GCC 5 and 6. Best regards, Thomas