On 12/18/2009 07:40 AM, malc wrote:
> After fixing a bug (crop was done after reading the cr) I ran some
> openssl speed benchmarks and, at least here on an MPC7447A, got a
> speed degradation, tiny but consistent.
Well, you could try rendering the setcond with branches instead of
logical operations. You'll still gain the benefit of not having ended
the TCG basic block, which would have forced the stores of globals to
their slots, etc.
> IN:
> 0x40082295: movzbl (%eax),%eax
> 0x40082298: cmp $0x3d,%al
> 0x4008229a: setne %dl
> 0x4008229d: test %al,%al
> 0x4008229f: je 0x400822d2
>
> OP after liveness analysis:
> mov_i32 tmp2,eax
> qemu_ld8u tmp0,tmp2,$0xffffffff
> mov_i32 eax,tmp0
> movi_i32 tmp1,$0x3d
> mov_i32 tmp0,eax
> nopn $0x2,$0x2
> sub_i32 cc_dst,tmp0,tmp1
> movi_i32 tmp13,$0xff
> and_i32 tmp4,cc_dst,tmp13
> movi_i32 tmp13,$0x0
> setcond_i32 tmp0,tmp4,tmp13,ne
> movi_i32 tmp14,$0xff
> and_i32 tmp13,tmp0,tmp14
> ....
>
> OUT: [size=204]
> 0x601051b0: lwz r14,0(r27)
> 0x601051b4: lbzx r14,0,r14
> 0x601051b8: mr r15,r14
> 0x601051bc: addi r15,r15,-61
> 0x601051c0: andi. r15,r15,255
> 0x601051c4: cmpwi cr6,r15,0
> 0x601051c8: crnot 4*cr7+eq,4*cr6+eq
> 0x601051cc: mfcr r0
> 0x601051d0: rlwinm r15,r0,31,31,31
> 0x601051d4: andi. r15,r15,255
> ...
> So the fact that setcond produces 0/1 was never communicated to
> TCG, not that I would claim that it's possible at all...
It isn't.
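
To make the redundancy concrete, here is a rough C model of the op sequence in the dump above (sub, and 0xff, setcond ne, and 0xff). It is only a sketch: the function names are made up for illustration and are not QEMU code, and the input is assumed to be the zero-extended byte left in eax by the movzbl.

```c
#include <assert.h>
#include <stdint.h>

/* Models: sub_i32 / and_i32 0xff / setcond_i32 ne / and_i32 0xff.
 * Hypothetical name; 'al' is the zero-extended byte held in eax. */
uint32_t setne_with_final_mask(uint32_t al)
{
    uint32_t cc_dst = al - 0x3d;        /* sub_i32 cc_dst,tmp0,tmp1 */
    uint32_t tmp4 = cc_dst & 0xff;      /* and_i32 tmp4,cc_dst,0xff */
    uint32_t tmp0 = (tmp4 != 0);        /* setcond_i32 tmp0,tmp4,0,ne */
    return tmp0 & 0xff;                 /* and_i32 tmp13,tmp0,0xff */
}

/* Same computation with the trailing mask dropped. */
uint32_t setne_without_final_mask(uint32_t al)
{
    uint32_t cc_dst = al - 0x3d;
    uint32_t tmp4 = cc_dst & 0xff;
    return tmp4 != 0;                   /* already 0 or 1 */
}

/* Exhaustive check over every possible byte value. */
void check_final_mask_is_redundant(void)
{
    for (uint32_t al = 0; al <= 0xff; al++) {
        assert(setne_with_final_mask(al) == setne_without_final_mask(al));
    }
}
```

The two functions agree for every byte value, which is why the last andi. in the PPC output does no useful work. The optimizer simply has no way to know that the setcond result is already 0 or 1.
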
And anyway, if you look at the opcodes generated without the setcond
patch you'll see that same "and 255" in there as well. Some more surgery
on the i386 translator could probably get rid of it. All I replaced were
sequences of
brcond c1,c2,$lab_true
movi dest,0
br $lab_over
set_label $lab_true
movi dest,1
set_label $lab_over
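
For illustration, the branchy sequence and the single setcond it was replaced by can be modelled in C roughly as follows. This is a sketch, not translator code: the function names are hypothetical, and the condition is assumed to be "ne" since the sequence above leaves it out.

```c
#include <assert.h>
#include <stdint.h>

/* Branch form: brcond / movi 0 / br / movi 1, written with gotos
 * so that each statement maps onto one op of the sequence. */
uint32_t ne_with_branches(uint32_t c1, uint32_t c2)
{
    uint32_t dest;

    if (c1 != c2) {
        goto lab_true;      /* brcond c1,c2,$lab_true (cond assumed ne) */
    }
    dest = 0;               /* movi dest,0 */
    goto lab_over;          /* br $lab_over */
lab_true:
    dest = 1;               /* movi dest,1 */
lab_over:
    return dest;
}

/* Single-op form: setcond dest,c1,c2,ne */
uint32_t ne_with_setcond(uint32_t c1, uint32_t c2)
{
    return c1 != c2;
}

/* Spot-check that both forms give the same 0/1 result. */
void check_forms_agree(void)
{
    static const uint32_t samples[] = { 0, 1, 0x3d, 0xff, 0xffffffffu };
    const int n = (int)(sizeof(samples) / sizeof(samples[0]));

    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            assert(ne_with_branches(samples[i], samples[j]) ==
                   ne_with_setcond(samples[i], samples[j]));
        }
    }
}
```

Both forms compute the same value. The difference is only that the branch form introduces labels in the op stream, while the setcond form stays straight-line.
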
r~