On 7/9/24 12:05 PM, Jeff Law wrote:
So another minor improvement for bitmanip code generation.

Essentially we have a pattern which matches a bset idiom for
x = zero_extend (1 << n).   That pattern only handles SI->DI extension.

For the QI/HI case, the 1<<n is first wrapped in a narrowing subreg to QI/HI, i.e.

(zero_extend:DI (subreg:QI (ashift (...))))

The same principles apply to this case as it can be implemented with bset target,x0,bitpos and by using x0 we'll get the desired zero extension.

I think this testcase is ultimately derived from 500.perlbench.  Our code for this testcase still isn't great, but this is an easy improvement and makes one of the remaining inefficiencies more obvious:


        bset    a5,x0,a5        # 24    [c=8 l=4]  *bsetdi_3
        andn    a3,a0,a5        # 52    [c=4 l=4]  and_notdi3
        beq     a4,zero,.L3     # 41    [c=12 l=4]  *branchdi
        or      a3,a0,a5        # 44    [c=4 l=4]  *iordi3/0

The bset is what this patch generates instead of a li+sll sequence. In the form above it's easier to see that the andn can be replaced with a bclr and the or can be replaced with a bset, which in turn would allow the bset above to go away completely.


This has been tested in my tester for rv32 and rv64.  I'll wait for pre-commit testing to complete before moving forward.

This has to be dropped.  It's wrong.

+(define_insn "*bset<X:mode>_3"
+  [(set (match_operand:X 0 "register_operand" "=r")
+       (zero_extend:X
+         (subreg:SHORT
+           (ashift:X (const_int 1)
+                     (match_operand:QI 1 "register_operand" "r")) 0)))]

That can't be a naked bset. The problem is that the DImode shift may set a bit outside of the mode of SHORT, and that bit would then be cleared by the outer zero_extend from SHORT to DI.

If we tried to implement that with a naked bset, the bit outside SHORT would be left on, resulting in incorrect code.
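
To make the failure concrete, here is a small C model of the two computations for the QI case on rv64. This is only a sketch: the function names are made up for illustration, the shift count is assumed to be in the range 0..63, and the bset model just follows the usual Zbs description (set bit rs2 & 63 in rs1, with rs1 being x0 here).

```c
#include <stdint.h>

/* Model of the RTL in the pattern, specialized to SHORT == QI on rv64:
   (zero_extend:DI (subreg:QI (ashift:DI (const_int 1) n) 0))
   Assumes 0 <= n <= 63.  */
uint64_t rtl_semantics_qi(unsigned n)
{
    uint64_t shifted = (uint64_t)1 << n;    /* ashift:DI 1, n */
    uint8_t low_byte = (uint8_t)shifted;    /* subreg:QI ... 0 keeps bits 0..7 */
    return (uint64_t)low_byte;              /* zero_extend:DI */
}

/* Model of a naked "bset rd,x0,n" on rv64: sets bit (n & 63) in zero.  */
uint64_t naked_bset(unsigned n)
{
    return (uint64_t)0 | ((uint64_t)1 << (n & 63));
}

/* Smallest shift count in 0..63 where the two models disagree,
   or -1 if they always agree.  */
int first_mismatch_qi(void)
{
    for (unsigned n = 0; n < 64; n++)
        if (rtl_semantics_qi(n) != naked_bset(n))
            return (int)n;
    return -1;
}
```

For n in 0..7 the two agree, but from n == 8 onward the RTL yields 0 while the naked bset leaves the bit set. The same holds for HI starting at n == 16.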

It's too bad. There were two additional follow-up improvements that now aren't viable.

Jeff
