Michael Matz wrote:
> Hi,
>
> [please keep me CCed, I'm not on this list]
>
> the below patch let's qemu be compiled by GCC 4.2 (probably also 4.1 and
> others) for most hosts (i386,x86_64,ia64,ppc). s390 as host is missing,
> and needs a compiler change to emit the literal store inline again, as the
> literal pool at the end fundamentally breaks the assumption that qemu can
> paste together the code snippets by patching out the return. I have no
> HOST_{ARM,MIPS*,ALPHA,SPARC*,M68K} machines to compile for that.
>
> It specifically changes these things:
>
> * ppc: adds -fno-section-anchors to OP_CFLAGS, as dyngen isn't prepared
>   to deal with the relocs resulting from using section anchors

Maybe this should be handled more generally then, not ppc-specific, like
other "offending" compiler options: check if the compiler knows the
option, and if so, disable the feature.

> * ppc: on target-alpha op_reset_FT GCC4 uses a floating point constant 0.0
>   to reset the ft regs, which in turn is loaded from the data
>   section. The reloc for that is unhandled. Using -ffast-math would
>   work around this, but I chose to be conservative and change only
>   the op.c snippet in question. See the comment there.
> * i386: well, most of you will know that GCC4 doesn't compile qemu because
>   of reload. The inherent problem is, that qemu uses 64bit
>   entities in some places (sometimes structs), which GCC (4.x)
>   manages to place in registers, i.e. needs 2 hardregs. But it
>   sometimes just so happens that an instruction needing such DImode
>   reg also has a memory operand with an indexed address (reg plus
>   reg), hence two hardregs more. But qemu by default leaves just
>   three free registers for compiling op.c --> boom. This is somewhat
>   hard to work around in GCC (trust me :) ).
>
>   I solved that by placing one of the T[012] operands into memory
>   for HOST_I386, thereby freeing one reg. Here's some justification
>   of why that doesn't really cost performance: with three free regs
>   GCC is already spilling like mad in the snippets, we just trade one
>   of those memory accesses (to stack) with one other mem access to
>   the cpu_state structure, which will be in cache.

Could you back up this assumption with some numbers? :-)
If there is a significant difference, I recommend making that workaround
conditional on GCC4 as well as HOST_I386.
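
A minimal sketch of such a guard, under stated assumptions: the names
T2_IN_MEMORY, CPUDemoState, demo_state and demo_op_addl_T0_T2 are
hypothetical and do not appear in the patch. The real exec.h binds env and
T0..T2 to host registers with asm(AREGn); plain static variables stand in
for them here so the example compiles on any host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical guard: keep T2 in memory only when an i386 host is
 * combined with GCC 4 or later; everything else keeps the old layout. */
#if defined(HOST_I386) && defined(__GNUC__) && (__GNUC__ >= 4)
#define T2_IN_MEMORY 1
#else
#define T2_IN_MEMORY 0
#endif

typedef struct CPUDemoState {
#if T2_IN_MEMORY
    uint32_t t2;            /* backing store for T2, named to match the macro */
#endif
    uint32_t regs[16];
} CPUDemoState;

static CPUDemoState demo_state;
static CPUDemoState *env = &demo_state; /* real code: register ... asm(AREG0) */
static uint32_t T0;                     /* real code: register ... asm(AREG1) */

#if T2_IN_MEMORY
#define T2 (env->t2)                    /* memory operand inside the CPU state */
#else
static uint32_t T2;                     /* real code: register ... asm(AREG3) */
#endif

/* Stand-in for an op.c snippet: the body is written once against T2 and
 * does not care whether T2 is a register variable or a struct field. */
uint32_t demo_op_addl_T0_T2(uint32_t a, uint32_t b)
{
    assert(env != NULL);
    T0 = a;
    T2 = b;
    T0 += T2;
    return T0;
}
```

Building this once with -DHOST_I386 and once without gives the two layouts;
the snippet source stays the same in both, which would also make a timing
comparison between the two variants straightforward.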

[snip]
> diff -urp qemu-0.9.0.cvs.orig/target-arm/cpu.h qemu-0.9.0.cvs/target-arm/cpu.h
> --- qemu-0.9.0.cvs.orig/target-arm/cpu.h 2007-06-24 14:09:48.000000000 +0200
> +++ qemu-0.9.0.cvs/target-arm/cpu.h 2007-08-21 21:38:36.000000000 +0200
> @@ -52,6 +52,9 @@ typedef uint32_t ARMReadCPFunc(void *opa
>   */
>
>  typedef struct CPUARMState {
> +#if defined(HOST_I386)
> +    uint32_t t1;
> +#endif
>      /* Regs for current mode. */
>      uint32_t regs[16];
>      /* Frequently accessed CPSR bits are stored separately for efficiently.
> diff -urp qemu-0.9.0.cvs.orig/target-arm/exec.h qemu-0.9.0.cvs/target-arm/exec.h
> --- qemu-0.9.0.cvs.orig/target-arm/exec.h 2007-06-03 19:44:36.000000000 +0200
> +++ qemu-0.9.0.cvs/target-arm/exec.h 2007-08-21 21:48:48.000000000 +0200
> @@ -23,7 +23,12 @@
>  register struct CPUARMState *env asm(AREG0);
>  register uint32_t T0 asm(AREG1);
>  register uint32_t T1 asm(AREG2);
> +#ifndef HOST_I386
>  register uint32_t T2 asm(AREG3);
> +#else
> +#define T2 (env->t1)
> +#endif

T2/t1 mismatch, it seems. Likewise for mips and ppc.


Thiemo