On 12/15/24 09:17, Paolo Bonzini wrote:
On Sun, Dec 15, 2024, 16:07 Richard Henderson <richard.hender...@linaro.org>
wrote:
> @@ -1384,6 +1409,12 @@ static void do_gen_rep(DisasContext *s, MemOp ot,
> gen_jcc_noeob(s, (JCC_Z << 1) | (nz ^ 1), done);
> }
>
> + if (can_loop) {
> + tcg_gen_subi_tl(cx_next, cpu_regs[R_ECX], 1);
Since we've just written back cx_next to ECX, this is the same as cx_next
-= 1, yes?
Yeah, I wanted to make cx_next die at the assignment to ECX but it probably does not make
a difference to generated code.
Not really. It would only make a difference if cx_next was never live outside the EBB.
But it is live across the branches to LOOP and LAST.
What might make a difference is relying on the known value in ECX and using
cx_next itself less. Let cx_next die at the two
+ tcg_gen_brcondi_tl(TCG_COND_TSTEQ, cx_next, cx_mask, last);
by repeating the subtraction when updating ECX, i.e.
- tcg_gen_mov_tl(cpu_regs[R_ECX], cx_next);
+ tcg_gen_subi_tl(cpu_regs[R_ECX], cpu_regs[R_ECX], 1);
This would avoid spilling cx_next to the stack.
There's also the ext32u to place somewhere.
I guess you can't hoist it outside the loop before the first invocation of FN, due to the
fault path. To eliminate it from the main loop you'd have to unroll once.
// no iteration
brcond tsteq ecx, mask, done
sub cxnext, ecx, 1
brcond tsteq cxnext, mask, last
// first iteration
fn
sub ecx, ecx, 1
extu ecx, ecx
sub cxnext, ecx, 1
brcond eq cxnext, 0, last
// subsequent iterations, ecx now known zero-extended.
loop:
fn
sub ecx, ecx, 1
sub cxnext, ecx, 1
brcond tstne cxnext, mask, loop
brcond eq cxnext, 0, last
etc. It doesn't seem worthwhile to eliminate one ext32u, which will almost certainly be
scheduled into the noise.
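
For what it's worth, the unrolled shape above can be modelled in plain C to check that it matches the simple loop. This is only a sketch under assumptions: a 32-bit address size on a 64-bit register (CX_MASK), fn() as a counting stand-in for one string iteration, and a "last" block that runs fn once more and then does the sub plus ext32u, which the pseudocode above does not spell out. All names here (rep_reference, rep_unrolled, models_agree, self_check) are made up for illustration and are not QEMU code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 32-bit address size, so only the low 32 bits count. */
#define CX_MASK 0xffffffffull

static uint64_t fn_calls;             /* how many times the string op ran */
static void fn(void) { fn_calls++; }  /* stand-in for one iteration */

/* Simple loop: test, run, decrement with zero-extension every time. */
static uint64_t rep_reference(uint64_t ecx)
{
    while (ecx & CX_MASK) {
        fn();
        ecx = (uint32_t)(ecx - 1);
    }
    return ecx;
}

/* Unrolled once, so the main loop carries no zero-extension. */
static uint64_t rep_unrolled(uint64_t ecx)
{
    uint64_t cx_next;

    /* no iteration */
    if ((ecx & CX_MASK) == 0) {
        goto done;
    }
    cx_next = ecx - 1;
    if ((cx_next & CX_MASK) == 0) {
        goto last;
    }

    /* first iteration: the only ext32u before "last" */
    fn();
    ecx = (uint32_t)(ecx - 1);
    cx_next = ecx - 1;
    if (cx_next == 0) {
        goto last;
    }

    /* subsequent iterations, ecx now known zero-extended */
    do {
        fn();
        ecx = ecx - 1;
        cx_next = ecx - 1;
    } while (cx_next & CX_MASK);

last:
    /* assumed shape of the final iteration: fn, sub, ext32u */
    fn();
    ecx = (uint32_t)(ecx - 1);
done:
    return ecx;
}

/* Returns 1 if both models agree on the final register and call count. */
static int models_agree(uint64_t ecx)
{
    uint64_t ref_ecx, ref_calls, unr_ecx, unr_calls;

    fn_calls = 0;
    ref_ecx = rep_reference(ecx);
    ref_calls = fn_calls;

    fn_calls = 0;
    unr_ecx = rep_unrolled(ecx);
    unr_calls = fn_calls;

    return ref_ecx == unr_ecx && ref_calls == unr_calls;
}

/* Small counts, with and without garbage in the upper 32 bits. */
static void self_check(void)
{
    for (uint64_t n = 0; n < 64; n++) {
        assert(models_agree(n));
        assert(models_agree(0xdeadbeef00000000ull | n));
    }
}
```

Calling self_check() from a test main exercises counts 0 through 63. The model leaves the register untouched when the masked count is zero, simply because the pseudocode branches straight to done in that case; it says nothing about liveness or register allocation, only that the control flow visits fn the right number of times.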
r~