On 7/18/24 3:12 PM, Georg-Johann Lay wrote:
This new builtin provides a faster way to compute
expressions like 1 << x or ~(1 << x) that are sometimes
used as masks for setting bits in the I/O region, and when
x is not known at compile-time.
The open coded C expression takes 5 + 4 * x cycles to compute an
8-bit result, whereas the builtin takes only 7 cycles independent
of x.
The implementation is straight forward and uses 3 new insns.
Ok for trunk?
Johann
--
AVR: Support new built-in function __builtin_avr_mask1.
gcc/
* config/avr/builtins.def (MASK1): New DEF_BUILTIN.
* config/avr/avr.cc (avr_rtx_costs_1): Handle rtx costs for
expressions like __builtin_avr_mask1.
(avr_init_builtins) <uintQI_ftype_uintQI_uintQI>: New tree type.
(avr_expand_builtin) [AVR_BUILTIN_MASK1]: Diagnose unexpected forms.
(avr_fold_builtin) [AVR_BUILTIN_MASK1]: Handle case.
* config/avr/avr.md (gen_mask1): New expand helper.
(mask1_0x01_split, mask1_0x80_split, mask1_0xfe_split): New
insn-and-split.
(*mask1_0x01, *mask1_0x80, *mask1_0xfe): New insns.
* doc/extend.texi (AVR Built-in Functions) <__builtin_avr_mask1>:
Document new built-in function.
gcc/testsuite/
* gcc.target/avr/torture/builtin-mask1.c: New test.
OK from a technical standpoint. Just a few nits.
builtin-mask1.diff
diff --git a/gcc/config/avr/avr.cc b/gcc/config/avr/avr.cc
index b9064424ffe..d8417c4ba3e 100644
--- a/gcc/config/avr/avr.cc
+++ b/gcc/config/avr/avr.cc
@@ -13135,6 +13135,16 @@ avr_rtx_costs_1 (rtx x, machine_mode mode, int
outer_code,
switch (mode)
{
case E_QImode:
+ if (speed
+ && XEXP (x, 0) == const1_rtx
CONST1_RTX (mode) is probably more appropriate.
+ && AND == GET_CODE (XEXP (x, 1)))
We generally write conditions with the constant as the 2nd operand. So
GET_CODE (XEXP (x, 1) == AND
@@ -13308,6 +13318,15 @@ avr_rtx_costs_1 (rtx x, machine_mode mode, int
outer_code,
break;
case E_HImode:
+ if (CONST_INT_P (XEXP (x, 0))
+ && INTVAL (XEXP (x, 0)) == 128
+ && AND == GET_CODE (XEXP (x, 1)))
Similarly for this condition.
+ switch (0xff & INTVAL (operands[1]))
Similarly here.
+ {
+ case 0x01: emit (gen_mask1_0x01_split (operands[0], operands[2])); break;
+ case 0x80: emit (gen_mask1_0x80_split (operands[0], operands[2])); break;
+ case 0xfe: emit (gen_mask1_0xfe_split (operands[0], operands[2])); break;
Don't put code on the same line as the case. Bring it down to the next
line and indent 2 positions further.
Those are all NFC, so approved with those changes.
I'm not sure the builtin is strictly needed since this could likely be
handled in the shift expanders and/or combiner patterns. But I've got
no objection to adding the builtin.
jeff