On 12/15/24 09:17, Paolo Bonzini wrote:


On Sun, 15 Dec 2024, 16:07 Richard Henderson <richard.hender...@linaro.org> wrote:

     > @@ -1384,6 +1409,12 @@ static void do_gen_rep(DisasContext *s, MemOp ot,

     >           gen_jcc_noeob(s, (JCC_Z << 1) | (nz ^ 1), done);
     >       }
     >
     > +    if (can_loop) {
     > +        tcg_gen_subi_tl(cx_next, cpu_regs[R_ECX], 1);

    Since we've just written back cx_next to ECX, this is the same as cx_next -= 1, yes?


Yeah, I wanted to make cx_next die at the assignment to ECX but it probably does not make a difference to generated code.

Not really. It would only make a difference if cx_next was never live outside the EBB. But it is live across the branches to LOOP and LAST.

What might make a difference is to rely on the values already known to be in ECX and to use cx_next itself less. Let cx_next die at the two

+        tcg_gen_brcondi_tl(TCG_COND_TSTEQ, cx_next, cx_mask, last);

by repeating the subtraction when updating ECX, i.e.

-    tcg_gen_mov_tl(cpu_regs[R_ECX], cx_next);
+    tcg_gen_subi_tl(cpu_regs[R_ECX], cpu_regs[R_ECX], 1);

This would avoid spilling cx_next to the stack.
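
To make the equivalence concrete, here is a minimal sketch in plain C rather than TCG ops. The helper names (tsteq, step_mov, step_sub) are made up for illustration and are not QEMU functions; tsteq only models the TCG_COND_TSTEQ condition, which is true when (a & b) == 0. Both steps compute the same new ECX and the same branch decision, but in the second one cx_next has no use after the branch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of TCG_COND_TSTEQ: true when no bit of b is set in a. */
static bool tsteq(uint64_t a, uint64_t b)
{
    return (a & b) == 0;
}

/* Current form: ECX is updated by copying cx_next, so cx_next must
 * stay live from the branch to the copy. */
static uint64_t step_mov(uint64_t ecx, uint64_t mask, bool *take_last)
{
    uint64_t cx_next = ecx - 1;
    *take_last = tsteq(cx_next, mask);   /* brcond tsteq cx_next, mask, last */
    return cx_next;                      /* mov ecx, cx_next */
}

/* Suggested form: cx_next dies at the branch and the subtraction is
 * repeated on ECX itself. */
static uint64_t step_sub(uint64_t ecx, uint64_t mask, bool *take_last)
{
    uint64_t cx_next = ecx - 1;
    *take_last = tsteq(cx_next, mask);   /* last use of cx_next */
    return ecx - 1;                      /* sub ecx, ecx, 1 */
}
```

This only shows that the two forms agree on values. Whether the repeated subtraction actually avoids a spill depends on the register allocator and on what FN does in between, which the sketch does not model.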

There's still the ext32u to place somewhere.

I guess you can't hoist it outside the loop, before the first invocation of FN, due to the fault path. To eliminate it from the main loop you'd have to unroll once.

        // no iteration
        brcond tsteq ecx, mask, done

        sub cxnext, ecx, 1
        brcond tsteq cxnext, mask, last

        // first iteration
        fn
        sub ecx, ecx, 1
        extu ecx, ecx

        sub cxnext, ecx, 1
        brcond eq cxnext, 0, last

        // subsequent iterations, ecx now known zero-extended.
 loop:
        fn
        sub ecx, ecx, 1

        sub cxnext, ecx, 1
        brcond tstne cxnext, mask, loop
        brcond eq cxnext, 0, last

etc. It doesn't seem worthwhile to eliminate one ext32u, which will almost certainly be scheduled into the noise.
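
For anyone who wants to check the control flow, here is a rough plain-C model of the unrolled shape above next to a straightforward loop. It assumes a 32-bit address size on a 64-bit target (mask = 0xffffffff), treats the "last" label as one more FN followed by a zero-extending update, and ignores the REPZ/REPNZ condition and the fault path. RepState, fn, rep_reference and rep_unrolled are names invented for the sketch, not QEMU code.

```c
#include <stdint.h>

/* Hypothetical state: the full 64-bit RCX plus a count of FN calls. */
typedef struct {
    uint64_t rcx;
    unsigned long calls;
} RepState;

/* Stand-in for one string-instruction iteration ("fn" above). */
static void fn(RepState *s)
{
    s->calls++;
}

/* Reference: plain loop that zero-extends ECX on every update. */
static void rep_reference(RepState *s)
{
    const uint64_t mask = 0xffffffffu;

    while ((s->rcx & mask) != 0) {
        fn(s);
        s->rcx = (uint32_t)(s->rcx - 1);
    }
}

/* Unrolled once: only the first and the last update zero-extend. */
static void rep_unrolled(RepState *s)
{
    const uint64_t mask = 0xffffffffu;
    uint64_t cx_next;

    if ((s->rcx & mask) == 0) {
        return;                            /* no iteration */
    }
    cx_next = s->rcx - 1;
    if ((cx_next & mask) == 0) {
        goto last;
    }

    fn(s);                                 /* first iteration */
    s->rcx = (uint32_t)(s->rcx - 1);       /* sub + extu */
    cx_next = s->rcx - 1;
    if (cx_next == 0) {
        goto last;
    }

    do {                                   /* ecx known zero-extended */
        fn(s);
        s->rcx -= 1;
        cx_next = s->rcx - 1;
    } while (cx_next != 0);

last:
    fn(s);
    s->rcx = (uint32_t)(s->rcx - 1);
}
```

Running both on the same starting RCX, including values with garbage in the upper 32 bits, should give the same call count and the same final RCX, which is the property the unrolling has to preserve.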


r~
