Hi everybody,

*** Motivation ***

Currently GCC can replace a hand-crafted byte swap implementation with a suitable instruction thanks to the bswap optimization pass. The patch proposed here extends this pass to also optimize loads in a specific endianness, independently of the host endianness.

*** Methodology ***

The patch adds support for memory sources (arrays or structures) and detects whether the result of a bitwise operation happens to be equivalent to a big endian or little endian load. If so, it replaces the operation with a load, or with a load and a byte swap, according to the host endianness.

The original code used the concept of a symbolic number: a number where the value of each byte indicates its position (in terms of weight) before the bitwise manipulation. After performing the bit manipulation on that symbolic number, the result tells how the bytes were shuffled (see variable cmp in function find_bswap). Detecting an operation resulting in a number in the host endianness is thus pretty straightforward: check whether the symbolic number has *not* changed.

As for supporting reads from arrays and structures, there is some logic to recognize the base of the array/structure and the offsets of the entries/fields accessed, in order to check whether the range of memory accessed would fit in an integer. Each entry is initially treated independently, and when entries are ORed together the values in the symbolic number are updated according to the host endianness: on a little endian machine, the entry at the higher address would see its values incremented.

Note that as it stands the patch does not work for arrays indexed with a variable (such as tab[a] | (tab[a+1] << 8)) because fold_const does not fold (a + 1) - a. If such cases were folded, the number of cases detected would automatically increase, since fold_build2 is used to compare two offsets.

This patch also adds a few testcases to check both (i) that the optimization works as expected and (ii) that the results are correct.
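
To make the methodology above concrete, here is a minimal sketch of the kind of source code it describes. The function names (load_le32, load_be32, hand_bswap32) are illustrative only and are not taken from the patch or its testcases. Whether a given compiler actually turns these into a single load and/or byte swap depends on the target and the optimization level.

```c
#include <stdint.h>

/* Little endian 32-bit read from a byte array, written with bitwise
   operations.  On a little endian host this is equivalent to a plain
   32-bit load; on a big endian host, to a load plus a byte swap.  */
uint32_t
load_le32 (const unsigned char *p)
{
  return (uint32_t) p[0]
	 | ((uint32_t) p[1] << 8)
	 | ((uint32_t) p[2] << 16)
	 | ((uint32_t) p[3] << 24);
}

/* Big endian 32-bit read: the mirror case (load plus byte swap on a
   little endian host, plain load on a big endian host).  */
uint32_t
load_be32 (const unsigned char *p)
{
  return ((uint32_t) p[0] << 24)
	 | ((uint32_t) p[1] << 16)
	 | ((uint32_t) p[2] << 8)
	 | (uint32_t) p[3];
}

/* Hand-crafted byte swap on a value already held in a register, i.e.
   the pattern the existing bswap pass handles.  */
uint32_t
hand_bswap32 (uint32_t x)
{
  return (x >> 24)
	 | ((x >> 8) & 0x0000ff00u)
	 | ((x << 8) & 0x00ff0000u)
	 | (x << 24);
}
```

The symbolic number idea can be tried by hand with the last function: passing 0x04030201, where each byte holds its original position, gives 0x01020304, so the bytes come out fully reversed and the operation is a byte swap. A sequence of operations that returned 0x04030201 unchanged would be a no-op in the host endianness.
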
The patch also defines new effective targets (bswap16, bswap32 and bswap64) to centralize the information about which targets support byte swap instructions for the testsuite, and modifies existing tests to use these new effective targets.

The patch is quite big but could be split if necessary. A big part of the added code handles memory sources and would be difficult to split, but the variable renaming and the introduction of the bswapXX effective targets could be submitted separately to reduce the noise.

The patch is too big to include inline, so it is only attached to this email. The ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2014-03-19  Thomas Preud'homme  <thomas.preudho...@arm.com>

	PR tree-optimization/54733
	* tree-ssa-math-opts.c (find_bswap_1): Renamed to ...
	(find_bswap_or_nop_1): This.  Also add support for memory source.
	(find_bswap): Renamed to ...
	(find_bswap_or_nop): This.  Also add support for memory source and
	detection of noop bitwise operations.
	(execute_optimize_bswap): Likewise.

*** gcc/testsuite/ChangeLog ***

2014-03-19  Thomas Preud'homme  <thomas.preudho...@arm.com>

	PR tree-optimization/54733
	* lib/target-supports.exp: New effective targets for architectures
	capable of performing byte swap.
	* gcc.dg/optimize-bswapdi-1.c: Convert to new bswap target.
	* gcc.dg/optimize-bswapdi-2.c: Likewise.
	* gcc.dg/optimize-bswapsi-1.c: Likewise.
	* gcc.dg/optimize-bswapdi-3.c: New test to check extension of bswap
	optimization to support memory sources.
	* gcc.dg/optimize-bswaphi-1.c: Likewise.
	* gcc.dg/optimize-bswapsi-2.c: Likewise.
	* gcc.c-torture/execute/bswap-2.c: Likewise.

Is this ok for stage 1?

Best regards,

Thomas

gcc32rm-84.3.diff