Looking at PR77308, one of the issues is that the bswap optimization phase doesn't work on ARM. This is due to an odd check that uses SLOW_UNALIGNED_ACCESS (which is always true on ARM). Since the testcase in PR77308 generates much better code with this patch (~13% fewer instructions), it seems best to remove this odd check.
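For readers unfamiliar with the pass, here is a minimal sketch of the kind of source pattern involved. It is modelled loosely on the read_be32 helpers named in the testsuite hunks, not copied from them; the function names read_be32_bytes and read_be32_load are hypothetical. The first function is the byte-wise idiom, and the second is roughly the form the bswap pass can turn it into when an unaligned load is allowed: one 32-bit load plus a byte swap. The sketch assumes GCC or Clang, since it relies on __builtin_bswap32 and the predefined __BYTE_ORDER__ macro.

```c
#include <stdint.h>
#include <string.h>

/* Byte-wise big-endian read: four byte loads combined with shifts and ors.
   DATA may have any alignment.  */
uint32_t
read_be32_bytes (const unsigned char *data)
{
  return ((uint32_t) data[0] << 24)
         | ((uint32_t) data[1] << 16)
         | ((uint32_t) data[2] << 8)
         | (uint32_t) data[3];
}

/* Hand-written equivalent: a single, possibly unaligned, 32-bit load
   (expressed portably with memcpy), followed by a byte swap on
   little-endian targets.  On big-endian targets the load already yields
   the desired value.  */
uint32_t
read_be32_load (const unsigned char *data)
{
  uint32_t v;
  memcpy (&v, data, sizeof v);
#if defined (__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
  v = __builtin_bswap32 (v);
#endif
  return v;
}
```

Both functions return the same value for any input pointer. The difference is only in how many instructions a target needs for each form, which is where a check like the one discussed here decides whether the transformation is applied.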
This exposes a problem with SLOW_UNALIGNED_ACCESS: what is it supposed to mean or do? According to its current definition, it means we should never emit an unaligned access for a given mode, as it would lead to a trap. However, that is not what happens. For example, all integer modes on ARM support really fast unaligned access, and we generate unaligned instructions without any issues. Some Thumb-1 targets automatically expand unaligned accesses if necessary. So this macro clearly doesn't stop unaligned accesses from being generated.

So I want to set it to false for most modes on ARM, as they are not slow. However, doing so causes incorrect code generation and unaligned traps. How can we differentiate between modes that support fast unaligned access in hardware, modes that get expanded, and modes that should never be used in an unaligned access?

Bootstrap & regress OK.

ChangeLog:
2015-11-01  Wilco Dijkstra  <wdijk...@arm.com>

    gcc/
        * tree-ssa-math-opts.c (bswap_replace): Remove test of
        SLOW_UNALIGNED_ACCESS.

    testsuite/
        * gcc.dg/optimize-bswapdi-3.c: Remove xfail.
        * gcc.dg/optimize-bswaphi-1.c: Likewise.
        * gcc.dg/optimize-bswapsi-2.c: Likewise.
--
diff --git a/gcc/testsuite/gcc.dg/optimize-bswapdi-3.c b/gcc/testsuite/gcc.dg/optimize-bswapdi-3.c
index 273b4bc622cb32564533e1352b5fc8ad52054f8b..6f682014622ab79e541cdf26d13f16a7d87f158d 100644
--- a/gcc/testsuite/gcc.dg/optimize-bswapdi-3.c
+++ b/gcc/testsuite/gcc.dg/optimize-bswapdi-3.c
@@ -61,4 +61,4 @@ uint64_t read_be64_3 (unsigned char *data)
 }
 
 /* { dg-final { scan-tree-dump-times "64 bit load in target endianness found at" 3 "bswap" } } */
-/* { dg-final { scan-tree-dump-times "64 bit bswap implementation found at" 3 "bswap" { xfail alpha*-*-* arm*-*-* } } } */
+/* { dg-final { scan-tree-dump-times "64 bit bswap implementation found at" 3 "bswap" } } */
diff --git a/gcc/testsuite/gcc.dg/optimize-bswaphi-1.c b/gcc/testsuite/gcc.dg/optimize-bswaphi-1.c
index c18ca6174d12a786a71252dfe47cfe78ca58750a..852ccfe5c1acd519f2cf340cc55f3ea74b1ec21f 100644
--- a/gcc/testsuite/gcc.dg/optimize-bswaphi-1.c
+++ b/gcc/testsuite/gcc.dg/optimize-bswaphi-1.c
@@ -55,5 +55,4 @@ swap16 (HItype in)
 }
 
 /* { dg-final { scan-tree-dump-times "16 bit load in target endianness found at" 3 "bswap" } } */
-/* { dg-final { scan-tree-dump-times "16 bit bswap implementation found at" 1 "bswap" { target alpha*-*-* arm*-*-* } } } */
-/* { dg-final { scan-tree-dump-times "16 bit bswap implementation found at" 4 "bswap" { xfail alpha*-*-* arm*-*-* } } } */
+/* { dg-final { scan-tree-dump-times "16 bit bswap implementation found at" 4 "bswap" } } */
diff --git a/gcc/testsuite/gcc.dg/optimize-bswapsi-2.c b/gcc/testsuite/gcc.dg/optimize-bswapsi-2.c
index a1558af2cc74adde439d42223b00977d9eeb9639..01ae3776ed3f44fbc300d001f8c67ec11625d03b 100644
--- a/gcc/testsuite/gcc.dg/optimize-bswapsi-2.c
+++ b/gcc/testsuite/gcc.dg/optimize-bswapsi-2.c
@@ -45,4 +45,4 @@ uint32_t read_be32_3 (unsigned char *data)
 }
 
 /* { dg-final { scan-tree-dump-times "32 bit load in target endianness found at" 3 "bswap" } } */
-/* { dg-final { scan-tree-dump-times "32 bit bswap implementation found at" 3 "bswap" { xfail alpha*-*-* arm*-*-* } } } */
+/* { dg-final { scan-tree-dump-times "32 bit bswap implementation found at" 3 "bswap" } } */
diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
index 0cea1a8472d5d9c4f0e4a7bd82930e201948c4ec..cbb2f9367a287ad8cfcfc5740c0e49b2c83bafd0 100644
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -2651,11 +2651,6 @@ bswap_replace (gimple *cur_stmt, gimple *src_stmt, tree fndecl,
         }
     }
 
-  if (bswap
-      && align < GET_MODE_ALIGNMENT (TYPE_MODE (load_type))
-      && SLOW_UNALIGNED_ACCESS (TYPE_MODE (load_type), align))
-    return false;
-
   /* Move cur_stmt just before one of the load of the original
      to ensure it has the same VUSE.  See PR61517 for what could
      go wrong.  */