Hi,

On Wed, 29 Aug 2007, Michael Matz wrote:
> > > I solved that by placing one of the T[012] operands into memory
> > > for HOST_I386, thereby freeing one reg.  Here's some justification
> > > of why that doesn't really cost performance: with three free regs
> > > GCC is already spilling like mad in the snippets, we just trade one
> > > of those memory accesses (to stack) with one other mem access to
> > > the cpu_state structure, which will be in cache.
> >
> > Do you have any evidence to support this claim?
>
> Not really, only an apple and orange comparison.  A 10000 iteration
> tests/sha1 run in the same Linux image, with -no-kqemu, on host and target
> i386: time ./sha1
>
> with qemu-0.8.2 (compiled by gcc 3.3-hammer): 7.92 seconds
> with qemu-0.9.0-cvs (gcc4.1 compiled, with the patch): 8.15 seconds
>
> I'll try to get a better comparison.

So, I've now compared our 0.9.0 package, once without patch compiled by
3.3-hammer, and once with patch and compiled by gcc 4.2:

gcc33 compiled: 7.81 seconds (i.e. a bit faster than 0.8.2 was)
gcc42 compiled: 8.07 seconds

I.e. 3% slower.

Ciao,
Michael.
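
As a quick check of the 3% figure, here is a minimal sketch of the arithmetic using the two 0.9.0 timings quoted above. The helper name `relative_slowdown` is made up for illustration and is not part of qemu or its test suite; the inputs are single-run wall-clock numbers, so the result is only indicative.

```python
def relative_slowdown(baseline_s: float, candidate_s: float) -> float:
    """Return how much slower the candidate run is than the baseline, in percent."""
    return (candidate_s - baseline_s) / baseline_s * 100.0


# Timings quoted in this message (seconds for the 10000 iteration sha1 run).
gcc33_s = 7.81  # 0.9.0 package, without the patch, compiled by gcc 3.3-hammer
gcc42_s = 8.07  # 0.9.0 package, with the patch, compiled by gcc 4.2

slowdown = relative_slowdown(gcc33_s, gcc42_s)
print(f"{slowdown:.1f}% slower")  # prints "3.3% slower"
```

Note that the two builds differ in both compiler and patch, so this number does not separate the cost of the patch from the cost of the compiler change.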