Hi everybody,

*** Motivation ***

Currently gcc is capable of replacing a hand-crafted implementation of byteswap
by a suitable instruction thanks to the bswap optimization pass. The patch
proposed here aims at extending this pass to also optimize loads in a specific
endianness, independent of the host endianness.

*** Methodology ***

The patch adds support for dealing with a memory source (array or structure).
It detects whether the result of a bitwise operation happens to be equivalent
to a big endian or little endian load and replaces it by a load, or a load and a
byteswap, according to the host endianness. The original code used the concept
of a symbolic number: a number where the value of each byte indicates its
position (in terms of weight) before the bitwise manipulation. After performing
the bit manipulation on that symbolic number, the result tells how the bytes
were shuffled (see variable cmp in function find_bswap). Detecting an operation
resulting in a number in the host endianness is thus pretty straightforward:
check whether the symbolic number has *not* changed.
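
To illustrate the symbolic number idea, here is a simplified 32-bit sketch. It
is not the code from tree-ssa-math-opts.c: the names SYM_IDENTITY, SYM_BSWAP,
sym_handwritten_bswap and sym_classify are made up for this example, and the
byte numbering (1 for the least significant byte) is an assumption chosen for
readability.

```c
#include <stdint.h>

/* Hypothetical 32-bit symbolic number: the byte of weight i (counting from
   the least significant byte) holds the marker value i + 1.  */
#define SYM_IDENTITY 0x04030201u
/* What the markers look like after a full 32-bit byte swap.  */
#define SYM_BSWAP    0x01020304u

enum sym_kind { SYM_NOP, SYM_SWAP, SYM_OTHER };

/* Apply to a symbolic number the same masks and shifts that a hand-written
   32-bit byteswap applies to its input.  */
static uint32_t
sym_handwritten_bswap (uint32_t n)
{
  return ((n & 0x000000ffu) << 24)
         | ((n & 0x0000ff00u) << 8)
         | ((n & 0x00ff0000u) >> 8)
         | ((n & 0xff000000u) >> 24);
}

/* Classify the outcome: unchanged markers mean the bytes are still in host
   order, fully reversed markers mean a byteswap, anything else is neither.  */
static enum sym_kind
sym_classify (uint32_t n)
{
  if (n == SYM_IDENTITY)
    return SYM_NOP;
  if (n == SYM_BSWAP)
    return SYM_SWAP;
  return SYM_OTHER;
}
```

With these definitions, sym_classify (sym_handwritten_bswap (SYM_IDENTITY))
yields SYM_SWAP, while a symbolic number left untouched yields SYM_NOP, which
is the "has not changed" case described above.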

As to supporting reads from arrays and structures, there is some logic to
recognize the base of the array/structure and the offset of the entries/fields
accessed, to check whether the range of memory accessed would fit in an integer.
Each entry is initially treated independently and when they are ORed together
the values in the symbolic number are updated according to the host endianness:
on a little endian machine, the entry at the higher address would see its values
incremented.
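
As an illustration of the kind of source code this targets, here is a sketch of
two endianness-specific loads written with bitwise operations on a byte array.
The function names load_le32 and load_be32 are only examples and are not taken
from the patch or its testcases. On a little endian host the first one is
equivalent to a plain 32-bit load and the second to a load followed by a
byteswap; on a big endian host the roles are reversed.

```c
#include <stdint.h>

/* Read a 32-bit little endian value: the byte at the lowest address has the
   lowest weight.  */
uint32_t
load_le32 (const unsigned char *p)
{
  return (uint32_t) p[0]
         | ((uint32_t) p[1] << 8)
         | ((uint32_t) p[2] << 16)
         | ((uint32_t) p[3] << 24);
}

/* Read a 32-bit big endian value: the byte at the lowest address has the
   highest weight.  */
uint32_t
load_be32 (const unsigned char *p)
{
  return ((uint32_t) p[0] << 24)
         | ((uint32_t) p[1] << 16)
         | ((uint32_t) p[2] << 8)
         | (uint32_t) p[3];
}
```

Both functions return the same value whatever the host endianness, which is why
the replacement has to pick between a load and a load plus byteswap depending
on the host.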

Note that as it stands the patch does not work for arrays indexed with a
variable (such as tab[a] | (tab[a+1] << 8)) because fold_const does not fold
(a + 1) - a. If such cases were folded, the number of cases detected would
automatically increase due to the use of fold_build2 to compare two offsets.
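
For reference, the variable-index form mentioned above could look like the
following sketch. The function name load_le16_at is hypothetical. The code is
a valid 16-bit little endian read, but as explained it is not expected to be
recognized by the patch as it stands, because the difference between the two
offsets is not folded to a constant.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a 16-bit little endian value starting at index a of tab.  The two
   accesses are at offsets a and a + 1 from the same base.  */
uint16_t
load_le16_at (const unsigned char *tab, size_t a)
{
  return (uint16_t) (tab[a] | (tab[a + 1] << 8));
}
```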

This patch also adds a few testcases to check both (i) that the optimization
works as expected and (ii) that the results are correct. It also defines new
effective targets (bswap16, bswap32 and bswap64) to centralize the information
about which targets support byte swap instructions for the testsuite and
modifies existing tests to use these new effective targets.

The patch is quite big but could be split if necessary. A big part of the code
added is for handling memory sources and it would be difficult to split, but the
variable renaming and the introduction of the bswapXX effective targets could be
done separately to reduce the noise. The patch is too big to inline, so it is
only attached to this email.

The ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2014-03-19  Thomas Preud'homme  <thomas.preudho...@arm.com>

        PR tree-optimization/54733
        * tree-ssa-math-opts.c (find_bswap_1): Renamed to ...
        (find_bswap_or_nop_1): This. Also add support for memory source.
        (find_bswap): Renamed to ...
        (find_bswap_or_nop): This. Also add support for memory source and
        detection of noop bitwise operations.
        (execute_optimize_bswap): Likewise.

*** gcc/testsuite/ChangeLog ***

2014-03-19  Thomas Preud'homme  <thomas.preudho...@arm.com>

        PR tree-optimization/54733
        * lib/target-supports.exp: New effective targets for architectures
        capable of performing byte swap.
        * gcc.dg/optimize-bswapdi-1.c: Convert to new bswap target.
        * gcc.dg/optimize-bswapdi-2.c: Likewise.
        * gcc.dg/optimize-bswapsi-1.c: Likewise.
        * gcc.dg/optimize-bswapdi-3.c: New test to check extension of bswap
        optimization to support memory sources.
        * gcc.dg/optimize-bswaphi-1.c: Likewise.
        * gcc.dg/optimize-bswapsi-2.c: Likewise.
        * gcc.c-torture/execute/bswap-2.c: Likewise.

Is this ok for stage 1?

Best regards,

Thomas

Attachment: gcc32rm-84.3.diff
Description: Binary data
