Hi all,
Here is an updated patch that does the following:
* Adds a new constraint in config/aarch64/constraints.md to check for a
constant integer that is left consecutive. This addresses Richard
Henderson's suggestion about combining the aarch64_is_left_consecutive
call and the const_int match in the pattern.
* Merges the two patterns defined into one.
* Changes the pattern's type attribute to bfm.
* Improved the comment above the aarch64_is_left_consecutive implementation.
* Makes the pattern use the GPI iterator to accept smaller integer sizes
(an extra test is added to check for this).
* Improves the tests in combine_bfxil.c to ensure they aren't optimised
away and that they check for the pattern's correctness.
Below is a new changelog and the example given before.
Is this OK for trunk?
This patch adds an optimisation that exploits the AArch64 BFXIL instruction
when or-ing the result of two bitwise and operations with non-overlapping
bitmasks (e.g. (a & 0xFFFF0000) | (b & 0x0000FFFF)).
Example:
unsigned long long combine(unsigned long long a, unsigned long long b) {
return (a & 0xffffffff00000000ll) | (b & 0x00000000ffffffffll);
}
void read(unsigned long long a, unsigned long long b, unsigned long long
*c) {
*c = combine(a, b);
}
When compiled with -O2, read would result in:
read:
and x5, x1, #0xffffffff
and x4, x0, #0xffffffff00000000
orr x4, x4, x5
str x4, [x2]
ret
But with this patch results in:
read:
mov x4, x0
bfxil x4, x1, 0, 32
str x4, [x2]
ret
Bootstrapped and regtested on aarch64-none-linux-gnu and
aarch64-none-elf with no regressions.
gcc/
2018-07-11 Sam Tebbs <sam.te...@arm.com>
PR target/85628
* config/aarch64/aarch64.md (*aarch64_bfxil):
Define.
* config/aarch64/constraints.md (Ulc): Define
* config/aarch64/aarch64-protos.h (aarch64_is_left_consecutive):
Define.
* config/aarch64/aarch64.c (aarch64_is_left_consecutive): New
function.
gcc/testsuite
2018-07-11 Sam Tebbs <sam.te...@arm.com>
PR target/85628
* gcc.target/aarch64/combine_bfxil.c: New file.
* gcc.target/aarch64/combine_bfxil_2.c: New file.
On 07/19/2018 02:02 PM, Sam Tebbs wrote:
Hi Richard,
Thanks for the feedback. I find that using "is_left_consecutive" is
more descriptive than checking for it being a power of 2 - 1, since it
describes the requirement (having consecutive ones from the MSB) more
explicitly. I would be happy to change it though if that is the
consensus.
I have addressed your point about just returning the string instead of
using output_asm_insn and have changed it locally. I'll send an
updated patch soon.
On 07/17/2018 02:33 AM, Richard Henderson wrote:
On 07/16/2018 10:10 AM, Sam Tebbs wrote:
+++ b/gcc/config/aarch64/aarch64.c
@@ -1439,6 +1439,14 @@ aarch64_hard_regno_caller_save_mode (unsigned
regno, unsigned,
return SImode;
}
+/* Implement IS_LEFT_CONSECUTIVE. Check if an integer's bits are
consecutive
+ ones from the MSB. */
+bool
+aarch64_is_left_consecutive (HOST_WIDE_INT i)
+{
+ return (i | (i - 1)) == HOST_WIDE_INT_M1;
+}
+
...
+(define_insn "*aarch64_bfxil"
+ [(set (match_operand:DI 0 "register_operand" "=r")
+ (ior:DI (and:DI (match_operand:DI 1 "register_operand" "r")
+ (match_operand 3 "const_int_operand"))
+ (and:DI (match_operand:DI 2 "register_operand" "0")
+ (match_operand 4 "const_int_operand"))))]
+ "INTVAL (operands[3]) == ~INTVAL (operands[4])
+ && aarch64_is_left_consecutive (INTVAL (operands[4]))"
Better is to use a define_predicate to merge both that second test
and the
const_int_operand.
(I'm not sure about the "left_consecutive" language either.
Isn't it more descriptive to say that op3 is a power of 2 minus 1?)
(define_predicate "pow2m1_operand"
(and (match_code "const_int")
(match_test "exact_pow2 (INTVAL(op) + 1) > 0")))
and use
(match_operand:DI 3 "pow2m1_operand")
and then just the
INTVAL (operands[3]) == ~INTVAL (operands[4])
test.
Also, don't omit the modes for the constants.
Also, there's no reason this applies only to DI mode;
use the GPI iterator and %<w> in the output template.
+ HOST_WIDE_INT op3 = INTVAL (operands[3]);
+ operands[3] = GEN_INT (ceil_log2 (op3));
+ output_asm_insn ("bfxil\\t%0, %1, 0, %3", operands);
+ return "";
You can just return the string that you passed to output_asm_insn.
+ }
+ [(set_attr "type" "bfx")]
The other aliases of the BFM insn use type "bfm";
"bfx" appears to be aliases of UBFM and SBFM.
Not that it appears to matter to the scheduling
descriptions, but it is inconsistent.
r~