On 7/9/24 12:05 PM, Jeff Law wrote:
So another minor improvement for bitmanip code generation.
Essentially we have a pattern which matches a bset idiom for
x = zero_extend (1 << n). That pattern only handles SI->DI extension.
For the QI/HI case the 1<<n is first wrapped in a narrowing subreg to QI/HI, i.e.
(zero_extend:DI (subreg:QI (ashift (...))))
The same principles apply to this case as it can be implemented with
bset target,x0,bitpos and by using x0 we'll get the desired zero extension.
I think this testcase is ultimately derived from 500.perlbench. Our
code for this testcase still isn't great, but this is an easy
improvement and makes one of the remaining inefficiencies more obvious:
bset a5,x0,a5 # 24 [c=8 l=4] *bsetdi_3
andn a3,a0,a5 # 52 [c=4 l=4] and_notdi3
beq a4,zero,.L3 # 41 [c=12 l=4] *branchdi
or a3,a0,a5 # 44 [c=4 l=4] *iordi3/0
The bset is what this patch generates instead of a li+sll sequence. In
the form above it's easier to see that the andn can be replaced with a bclr
and the or can be replaced with a bset which in turn would allow the
bset above to go away completely.
This has been tested in my tester for rv32 and rv64. I'll wait for pre-
commit testing to complete before moving forward.
This has to be dropped. It's wrong.
+(define_insn "*bset<X:mode>_3"
+ [(set (match_operand:X 0 "register_operand" "=r")
+ (zero_extend:X
+ (subreg:SHORT
+ (ashift:X (const_int 1)
+ (match_operand:QI 1 "register_operand" "r")) 0)))]
That can't be a naked bset. The problem is the DImode shift may have
set a bit outside of the mode of SHORT, and that bit would be cleared by
the outer zero_extend from SHORT to DI.
If we tried to implement that with a naked bset, the bit outside SHORT
would be left on, resulting in incorrect code.
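
To make the failure concrete, here is a rough C model of the two
behaviours for SHORT = QI on rv64. It is only a sketch: the function
names are made up for illustration, and the "& 63" masking of the shift
count is an assumption chosen to mirror how bset reads its bit index on
rv64.

```c
#include <stdint.h>

/* What the RTL asks for with SHORT = QI: do the shift in 64 bits,
   keep only the low 8 bits (the subreg), then zero-extend to 64 bits.
   Assumption: the shift count is truncated to 6 bits, as on rv64. */
static uint64_t rtl_semantics_qi(unsigned n)
{
    uint64_t shifted = (uint64_t)1 << (n & 63);
    return (uint64_t)(uint8_t)shifted;
}

/* Model of "bset rd,x0,rs2" on rv64: set bit (rs2 & 63) in zero.
   Nothing here clears bits above bit 7. */
static uint64_t naked_bset(unsigned n)
{
    return (uint64_t)0 | ((uint64_t)1 << (n & 63));
}

/* Returns 1 when both models produce the same value for bit position n. */
static int results_agree_qi(unsigned n)
{
    return rtl_semantics_qi(n) == naked_bset(n);
}
```

For n in 0..7 the two agree. For n = 8 the RTL semantics give 0 while
the bset model gives 0x100, which is the stray bit described above.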
It's too bad. There are two additional follow-up improvements that
aren't viable.
Jeff