Re: [Qemu-devel] qemu vs gcc4
On Monday 23 October 2006 2:37 pm, Paul Brook wrote: > > > Better to just teach qemu how to generate code. > > > In fact I've already done most of the infrastructure (and a fair amount > > > of the legwork) for this. The only major missing function is code to do > > > softmmu load/store ops. > > > https://nowt.dyndns.org/ I looked at the big diff between that and mainline, and couldn't make heads nor tails of it in the half-hour I spent on it. I also looked at the svn history, but there's apparently a year and change of it. I don't suppose there's a design document somewhere? Or could you quickly explain "old one did this, new one does this, the code path diverges here, start reading at this point and expect this and this to happen, and if you go read this unrelated documentation to get up to speed it might help..." I'd like to add enough of the new code generation stuff to the existing targets so it doesn't break when built with gcc4, but so far my interest here greatly outstrips my ability. I don't even know where to start... Rob -- "Perfection is reached, not when there is no longer anything to add, but when there is no longer anything to take away." - Antoine de Saint-Exupery ___ Qemu-devel mailing list Qemu-devel@nongnu.org http://lists.nongnu.org/mailman/listinfo/qemu-devel
Re: [Qemu-devel] qemu vs gcc4
On Tuesday 31 October 2006 16:53, Rob Landley wrote: > On Monday 23 October 2006 2:37 pm, Paul Brook wrote: > > > > Better to just teach qemu how to generate code. > > > > In fact I've already done most of the infrastructure (and a fair > > > > amount of the legwork) for this. The only major missing function is > > > > code to do softmmu load/store ops. > > > > https://nowt.dyndns.org/ > > I looked at the big diff between that and mainline, and couldn't make heads > nor tails of it in the half-hour I spent on it. I also looked at the svn > history, but there's apparently a year and change of it. > > I don't suppose there's a design document somewhere? Or could you quickly > explain "old one did this, new one does this, the code path diverges here, > start reading at this point and expect this and this to happen, and if you > go read this unrelated documentation to get up to speed it might help..."

Not really.

The basic principle is very similar. Host code is decomposed into an intermediate form consisting of simple operations, then native code is generated from those operations.

In the existing dyngen implementation most operands to ops are implicit, with only a few ops taking explicit arguments. The principle with the new system is that all operands are explicit.

The intermediate representation used by the code generator resembles an imaginary machine. This machine has various different instructions (qops), and a nominally infinite register file (qregs). Each qop takes zero or more arguments, each of which may be an input or output.

In addition to dynamically allocated qregs there are a fixed set of qregs that map onto the guest CPU state. This is to simplify code generation.

Each qreg has a particular type (32/64 bit, integer or float). It's up to you to make sure the argument types match those expected by the qop. It's generally fairly obvious from the name. eg. add32 adds I32 values, addf64 adds F64 values, etc.
The exception is that I64 values can be used in place of I32. The upper 32 bits of outputs are undefined in this case, and the value must be explicitly extended before the full 64 bits are used.

The old dyngen ops are actually implemented as special-case qops.

As an example take the arm instruction

    add r0, r1, r2, lsl #2

This is equivalent to the C expression

    r0 = r1 + (r2 << 2)

The old dyngen translate.c would do:

    gen_op_movl_T1_r2();
    gen_op_shll_T1_im(2);
    gen_op_movl_T0_r1();
    gen_op_addl(); /* does T0 = T0 + T1 */
    gen_op_movl_r0_T0();

When fully converted to the new system this would become:

    int tmp = gen_new_qreg(); /* Allocate a temporary reg. */
    /* gen_im32 is a helper that allocates a new qreg and
       initializes it to an immediate value. */
    gen_op_add32(tmp, QREG_R2, gen_im32(2));
    gen_op_add32(QREG_R0, QREG_R1, tmp);

One of the changes I've made to target-arm/translate.c is to replace all uses of T2 with new pseudo-regs. In many cases I've left the code structure as it was (using the global T0/T1 temporaries), but replaced the dyngen ops with the equivalent qops. eg. movl and andl now generate mov32 and and32 qops.

The standard qops are defined in qops.def. A target can also define additional qops in qop-target.def. The target-specific qops are to simplify implementation of the i386 static flag propagation pass. the expand_op_* routines.

For operations that are too complicated to be expressed as qops there is a mechanism for calling helper functions. The m68k target uses this for division and a couple of other things.

The implementation makes fairly heavy use of the C preprocessor to generate code from .def files. There's also a small shell script that pulls the definitions of the helper routines out of qop-helper.c.

The debug dumps can be quite useful. In particular -d in_asm,op will dump the input asm and the resulting OPs.

For converting targets you can probably ignore most of the translate-all and host-*/ changes. These implement generating code from the qops.
This works by the host defining a set of "hard" qregs that correspond to host CPU registers, and constraints for the operands of each qop. Then we do register allocation and spilling to satisfy those constraints. The qops can then be assembled directly into binary code.

There are also mechanisms for implementing floating point and 64-bit arithmetic even if the target doesn't support this natively. The target code doesn't need to worry about this, it just generates 64-bit/fp qops and they will be decomposed as necessary.

Paul
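A minimal, self-contained sketch of the explicit-operand idea described above, in C. Only gen_new_qreg, gen_im32, gen_op_add32 and the QREG_R0..QREG_R2 names come from the message; the qinsn struct, the emit() helper, the QOP_* opcodes, gen_op_shl32 and the qop_run() interpreter are hypothetical stand-ins (the real opcode list lives in qops.def, and the real back end allocates host registers and assembles machine code instead of interpreting). It uses a shift qop for the `lsl #2`, which is what r0 = r1 + (r2 << 2) appears to call for.

```c
#include <stdint.h>

/* Hypothetical opcode set; in the real code the list is generated from qops.def. */
enum qop { QOP_MOVI32, QOP_ADD32, QOP_SHL32 };

/* Fixed qregs that map onto guest state come first (QREG_NULL mirrors qop.h),
   followed by dynamically allocated temporaries. */
enum { QREG_NULL, QREG_R0, QREG_R1, QREG_R2, QREG_FIRST_TEMP };

#define MAX_QREGS  64
#define MAX_QINSNS 64

/* One qop: every operand is explicit. For QOP_MOVI32, args[1] holds the
   immediate; otherwise args[] are qreg indices (dst, src1, src2). */
struct qinsn {
    enum qop op;
    int args[3];
};

static struct qinsn qbuf[MAX_QINSNS];
static int nqinsns;
static int next_qreg = QREG_FIRST_TEMP;

/* Allocate a fresh temporary; a qreg is just an integer index. */
static int gen_new_qreg(void)
{
    return next_qreg++;
}

/* Append one qop to the buffer (no overflow checks in this sketch). */
static void emit(enum qop op, int a0, int a1, int a2)
{
    struct qinsn *q = &qbuf[nqinsns++];
    q->op = op;
    q->args[0] = a0;
    q->args[1] = a1;
    q->args[2] = a2;
}

/* Allocate a qreg and initialise it to an immediate value. */
static int gen_im32(int32_t value)
{
    int r = gen_new_qreg();
    emit(QOP_MOVI32, r, value, 0);
    return r;
}

static void gen_op_add32(int dst, int a, int b) { emit(QOP_ADD32, dst, a, b); }
/* Hypothetical name for a 32-bit shift-left qop. */
static void gen_op_shl32(int dst, int a, int b) { emit(QOP_SHL32, dst, a, b); }

/* add r0, r1, r2, lsl #2  ==>  r0 = r1 + (r2 << 2) */
static void translate_add_lsl2(void)
{
    int tmp = gen_new_qreg();
    gen_op_shl32(tmp, QREG_R2, gen_im32(2));
    gen_op_add32(QREG_R0, QREG_R1, tmp);
}

/* Toy interpreter standing in for the host back end. */
static void qop_run(uint32_t regs[MAX_QREGS])
{
    for (int i = 0; i < nqinsns; i++) {
        const struct qinsn *q = &qbuf[i];
        switch (q->op) {
        case QOP_MOVI32:
            regs[q->args[0]] = (uint32_t)q->args[1];
            break;
        case QOP_ADD32:
            regs[q->args[0]] = regs[q->args[1]] + regs[q->args[2]];
            break;
        case QOP_SHL32:
            regs[q->args[0]] = regs[q->args[1]] << (regs[q->args[2]] & 31);
            break;
        }
    }
}
```

Calling translate_add_lsl2() records three qops (MOVI32, SHL32, ADD32); running them with r1 = 10 and r2 = 3 leaves 22 in r0. Unlike the dyngen sequence, nothing goes through implicit T0/T1 operands: the temporary is just another qreg index.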
Re: [Qemu-devel] ColdFire/m68k target
Paul,

I just checked out from the CVS repository and encountered the following problem while building the code. It seems you eliminated the original arguments of the functions gen_op_divs and gen_op_divu. Could you take a look?

    gcc -Wall -O2 -g -fno-strict-aliasing -I. -I.. -I/home/cjia/research/Operating_Systems/qemu_cvs/target-m68k -I/home/cjia/research/Operating_Systems/qemu_cvs -I/home/cjia/research/Operating_Systems/qemu_cvs/linux-user -I/home/cjia/research/Operating_Systems/qemu_cvs/linux-user/m68k -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -I/home/cjia/research/Operating_Systems/qemu_cvs/fpu -I/home/cjia/research/Operating_Systems/qemu_cvs/slirp -c -o translate.o /home/cjia/research/Operating_Systems/qemu_cvs/target-m68k/translate.c
    /home/cjia/research/Operating_Systems/qemu_cvs/target-m68k/translate.c: In function `disas_divw':
    /home/cjia/research/Operating_Systems/qemu_cvs/target-m68k/translate.c:717: error: too many arguments to function `gen_op_divs'
    /home/cjia/research/Operating_Systems/qemu_cvs/target-m68k/translate.c:719: error: too many arguments to function `gen_op_divu'
    /home/cjia/research/Operating_Systems/qemu_cvs/target-m68k/translate.c: In function `disas_divl':
    /home/cjia/research/Operating_Systems/qemu_cvs/target-m68k/translate.c:750: error: too many arguments to function `gen_op_divs'
    /home/cjia/research/Operating_Systems/qemu_cvs/target-m68k/translate.c:752: error: too many arguments to function `gen_op_divu'
    make[1]: *** [translate.o] Error 1
    make[1]: Leaving directory `/home/cjia/research/Operating_Systems/qemu_cvs/m68k-user'
    make: *** [subdir-m68k-user] Error 2

Thanks,
Neo

On 10/21/06, Paul Brook <[EMAIL PROTECTED]> wrote:
I've just committed ColdFire/M68K target support to cvs. This implements usermode emulation for ColdFire CPUs, including the FPU.
The CPU emulation has been reasonably well tested, but linux syscall emulation only lightly tested. For those that don't know, ColdFire is a subset of the old m68k architecture, with a few minor differences, and a slightly different FPU. I'll probably be implementing full system emulation sometime (Freescale M5xxxEVB dev board). M68k code will not run on the current emulation. Implementing the missing 68k bits (addressing modes and bitfield instructions) probably wouldn't be that hard. Implementing 68881 FPU emulation is a bit harder 'cos it uses "extended" precision registers. The code is a bit different to most other qemu targets because I originally wrote it for my code generation backend rather than dyngen. The main translation code is unmodified, with glue to make it work with dyngen. For this reason the generated code isn't as efficient as it could be.

Paul

P.S. Anyone wanting to play with ColdFire emulation can find toolchains at http://www.codesourcery.com/gnu_toolchains/coldfire/index_html

-- 
I would remember that if researchers were not ambitious probably today we haven't the technology we are using!
Re: [Qemu-devel] ColdFire/m68k target
On Tuesday 31 October 2006 20:20, Neo Jia wrote: > Paul, > > I just checked out from the CVS repository and encountered the following problem > while building the code. > > It seems you eliminated the original arguments of the functions gen_op_divs and > gen_op_divu. > > Could you take a look? > > gcc -Wall -O2 -g -fno-strict-aliasing -I. -I.. > -I/home/cjia/research/Operating_Systems/qemu_cvs/target-m68k > -I/home/cjia/research/Operating_Systems/qemu_cvs > -I/home/cjia/research/Operating_Systems/qemu_cvs/linux-user > -I/home/cjia/research/Operating_Systems/qemu_cvs/linux-user/m68k > -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE > -I/home/cjia/research/Operating_Systems/qemu_cvs/fpu > -I/home/cjia/research/Operating_Systems/qemu_cvs/slirp -c -o > translate.o /home/cjia/research/Operating_Systems/qemu_cvs/target-m68k/translate.c > /home/cjia/research/Operating_Systems/qemu_cvs/target-m68k/translate.c: In > function `disas_divw':

Ah, this is a dyngen bug. op_divw contains "if (PARAM1) {...}". PARAM1 is implemented by taking the address of a symbol. gcc knows that symbols can never have address zero (because the C standard says so), and eliminates the whole block of code. The patch below fixes it (weak symbols can be zero). However it breaks lame hosts that don't support weak symbols, eg. win32. I'm still trying to figure out a proper solution.
Index: dyngen-exec.h
===
RCS file: /sources/qemu/qemu/dyngen-exec.h,v
retrieving revision 1.29
diff -u -p -r1.29 dyngen-exec.h
--- dyngen-exec.h 18 Jul 2006 21:23:34 - 1.29
+++ dyngen-exec.h 31 Oct 2006 16:53:38 -
@@ -222,7 +222,7 @@ extern int __op_param3 __hidden;
 #if defined(__APPLE__)
 static int __op_param1, __op_param2, __op_param3;
 #else
-extern int __op_param1, __op_param2, __op_param3;
+extern int __attribute__((weak)) __op_param1, __op_param2, __op_param3;
 #endif
 #define PARAM1 ((long)(&__op_param1))
 #define PARAM2 ((long)(&__op_param2))
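A small sketch of the effect described above, assuming GCC or Clang on an ELF platform such as Linux, where an undefined weak symbol resolves to address 0. The names demo_param_strong, demo_param_weak, strong_param_is_set and weak_param_is_set are hypothetical; only the ((long)&symbol) trick mirrors PARAM1 in dyngen-exec.h. Real dyngen never defines the __op_param symbols and patches the relocations instead, so this only illustrates why the compiler may drop the test for an ordinary declaration but must keep it for a weak one.

```c
/* Ordinary (strong) object: the compiler may assume its address is non-zero.
   It is defined here only so the program links. */
int demo_param_strong;

/* Weak declaration with no definition anywhere: on ELF with a GNU-style
   linker, an undefined weak symbol resolves to address 0. */
extern int __attribute__((weak)) demo_param_weak;

/* Mimic dyngen's PARAM1: the "parameter" is the symbol's address. */
#define DEMO_PARAM_STRONG ((long)&demo_param_strong)
#define DEMO_PARAM_WEAK   ((long)&demo_param_weak)

int strong_param_is_set(void)
{
    /* The compiler is entitled to fold this test to "always true". */
    if (DEMO_PARAM_STRONG)
        return 1;
    return 0;
}

int weak_param_is_set(void)
{
    /* The address of a weak symbol may be 0, so this test must survive. */
    if (DEMO_PARAM_WEAK)
        return 1;
    return 0;
}
```

strong_param_is_set() always returns 1 (a compiler may warn that the address always evaluates as true), while weak_param_is_set() returns 0 here because nothing defines demo_param_weak. As the message notes, this fix is of no help on hosts without weak-symbol support.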
Re: [Qemu-devel] qemu vs gcc4
Welcome to Stupid Question Theatre! With your host, Paul Brook. Today's contestant is: Rob Landley. How dumb will it get? On Tuesday 31 October 2006 2:02 pm, Paul Brook wrote: > The basic principle is very similar. Host code is decomposed into an > intermediate form consisting of simple operations, then native code is > generated from those operations. I got that part. It's the how I'm still head-scratching over. The disassembly routines seem relatively compiler-independent, but I'm under the impression that turning the intermediate result (the string of qops) into large blocks of translated code involves gluing together a bunch of smaller blocks of pregenerated code. These pregenerated blocks were spit out by gcc and are where all the compiler dependencies that aren't clear bugs come from. I thought what you were doing was replacing the pregenerated blocks with hand-coded assembly statements, but your description here seems to be about changing the disassembly routines that figure out which qops to string together in part 2. > In the existing dyngen implementation most operands to ops are implicit, > with only a few ops taking explicit arguments. The principle with the new > system is that all operands are explicit. Having looked ahead to your example before replying to this, I think I understand that part now. (Just barely.) > The intermediate representation used by the code generator resembles an > imaginary machine. This machine has various different instructions (qops), > and a nominally infinite register file (qregs). Each qreg is represented as an integer index? > Each qop takes zero or more arguments, each of which may be an input or > output. The input or output is always one of these qreg indexes? (Some of the existing ones seem to take immediate values...) > In addition to dynamically allocated qregs there are a fixed set of qregs > that map onto the guest CPU state. This is to simplify code generation. These are indexes 0, 1, and 2?
Ok, looking at target-arm/translate.c, we have:

    static inline void gen_op_addl_T0_T1(void)
    {
        gen_op_add32(QREG_T0, QREG_T0, QREG_T1);
    }

So what is QREG_T0 anyway? This is hard to grep for. 'find . | grep -v svn | xargs grep "QREG_T0"' doesn't produce anything useful, so there's got to be preprocessor concatenation stuff with ## going on. Let's try just QREG on the *.h files, and yup, at the start of qop.h there's this:

    enum target_qregs {
        QREG_NULL,
    #define DEFO32(name, offset) QREG_ ## name,
    #define DEFO64(name, offset) DEFO32(name, offset)
    #define DEFF32(name, reg) DEFO32(name, reg)
    #define DEFF64(name, reg) DEFO32(name, reg)
    #define DEFR(name, reg, mode) DEFO32(name, reg)
    #include "qregs.def"

And that has "DEFR(T0, AREG1, QMODE_I32)" which... Ok, DEFR() discards the third argument ("mode") completely, and then DEFO32() discards the second argument (offset), and what's left is just the name, so it's position dependent (so why have the darn macros at ALL?)

My brain hurts a lot now. I'm just letting you know. What is all this complication actually trying to accomplish?

> Each qreg has a particular type (32/64 bit, integer or float).

You mean each qop's arguments have a particular type, and the arguments are always in qregs? Or each qreg has a type permanently associated with that qreg? Or the value currently in a qreg has a type associated with it, but the next value stored in that qreg may have a different type?

> It's up to
> you to make sure the argument types match those expected by the qop. It's
> generally fairly obvious from the name. eg. add32 adds I32 values, addf64
> adds F64 values, etc. The exception is that I64 values can be used in place
> of I32. The upper 32 bits of outputs are undefined in this case, and the
> value must be explicitly extended before the full 64 bits are used.
Possible translation: you can feed a qreg containing an I64 value to a qop taking an i32 argument, and it'll typecast the sucker down intelligently, but if you produce an I32 result and expect to use that qreg's value as an I64 argument later, you have to call a sign-extending qop on it first?

> The old dyngen ops are actually implemented as special-case qops.

You mean each dyngen op produces multiple qops? (And/or is a bundle of qops?)

> As an example take the arm instruction
>
>     add r0, r1, r2, lsl #2
>
> This is equivalent to the C expression
>
>     r0 = r1 + (r2 << 2)
>
> The old dyngen translate.c would do:
>
>     gen_op_movl_T1_r2();
>     gen_op_shll_T1_im(2);
>     gen_op_movl_T0_r1();
>     gen_op_addl(); /* does T0 = T0 + T1 */
>     gen_op_movl_r0_T0();

Digging down into target-arm/translate.c, function disas_arm_insn(), I'm... still having to take your word for it. All the gen_op_movl_T1 variants I'm seeing end with _im, which I presume means "immediate". The alternative is _cc, but what does that mean? (Presumably not "closed captioned".)

> When fully converted to the new system thi
[Qemu-devel] [PATCH] USB network interface
This patch contains an initial version of a USB network interface (RNDIS / CDC Ethernet) emulator. It has been tested with Linux (Fedora Core 6). It uses the same vendor and product IDs as the Linux gadget network device driver, therefore the "linux.inf" file from Documentation/usb of a linux-2.6 kernel source archive can be used under Windows.

Usage:

    -net user -net nic,model=usb -usbdevice net:0

gives you the default setup (i.e. without any -net option), but with the USB adapter instead of the default PCI adapter.

Problems: so far I couldn't get it to work under Windows, neither on W2K SP1 nor SP2, although on SP2 it gets a lot farther. Both versions start accessing the device, and then at some point an interrupt IN transfer terminates with USBD_STATUS_INTERNAL_HC_ERROR and the device more or less hangs. So it looks to me like a USB host controller emulation bug or a bug in the host controller driver... Maybe someone has an idea...

Tom

--- ./vl.c.usbnet 2006-10-28 17:46:08.0 +0200
+++ ./vl.c 2006-10-29 02:46:13.0 +0100
@@ -3766,6 +3766,11 @@
         dev = usb_tablet_init();
     } else if (strstart(devname, "disk:", &p)) {
         dev = usb_msd_init(p);
+    } else if (strstart(devname, "net:", &p)) {
+        unsigned int nr = strtoul(p, NULL, 0);
+        if (nr >= (unsigned int)nb_nics || strcmp(nd_table[nr].model, "usb"))
+            return -1;
+        dev = usb_net_init(&nd_table[nr]);
     } else {
         return -1;
     }
--- ./Makefile.target.usbnet 2006-10-28 17:44:51.0 +0200
+++ ./Makefile.target 2006-10-28 17:45:39.0 +0200
@@ -336,7 +336,7 @@
 VL_OBJS+= scsi-disk.o cdrom.o lsi53c895a.o
 
 # USB layer
-VL_OBJS+= usb.o usb-hub.o usb-linux.o usb-hid.o usb-ohci.o usb-msd.o
+VL_OBJS+= usb.o usb-hub.o usb-linux.o usb-hid.o usb-ohci.o usb-msd.o usb-net.o
 
 # PCI network cards
 VL_OBJS+= ne2000.o rtl8139.o pcnet.o
--- ./hw/pc.c.usbnet 2006-10-29 02:50:18.0 +0100
+++ ./hw/pc.c 2006-10-29 02:52:23.0 +0100
@@ -672,6 +672,8 @@
         }
         if (strcmp(nd->model, "ne2k_isa") == 0) {
             pc_init_ne2k_isa(nd);
+        } else if (strcmp(nd->model, "usb") == 0) {
+            /* ignore */
         } else if (pci_enabled) {
             pci_nic_init(pci_bus, nd);
         } else {
--- ./hw/usb-net.c.usbnet 2006-10-29 18:14:38.0 +0100
+++ ./hw/usb-net.c 2006-10-30 01:07:27.0 +0100
@@ -0,0 +1,1342 @@
+/*
+ * QEMU USB Net devices
+ *
+ * Copyright (c) 2006 Thomas Sailer
+ * based on usb-hid.c Copyright (c) 2005 Fabrice Bellard
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+#include "vl.h"
+#include "../audio/sys-queue.h"
+
+typedef uint32_t __le32;
+#include "ndis.h"
+
+/* Thanks to NetChip Technologies for donating this product ID.
+ * It's for devices with only CDC Ethernet configurations.
+ */
+#define CDC_VENDOR_NUM          0x0525  /* NetChip */
+#define CDC_PRODUCT_NUM         0xa4a1  /* Linux-USB Ethernet Gadget */
+/* For hardware that can talk RNDIS and either of the above protocols,
+ * use this ID ... the windows INF files will know it.
+ */
+#define RNDIS_VENDOR_NUM        0x0525  /* NetChip */
+#define RNDIS_PRODUCT_NUM       0xa4a2  /* Ethernet/RNDIS Gadget */
+
+#define STRING_MANUFACTURER     1
+#define STRING_PRODUCT          2
+#define STRING_ETHADDR          3
+#define STRING_DATA             4
+#define STRING_CONTROL          5
+#define STRING_RNDIS_CONTROL    6
+#define STRING_CDC              7
+#define STRING_SUBSET           8
+#define STRING_RNDIS            9
+#define STRING_SERIALNUMBER     10
+
+#define DEV_CONFIG_VALUE        1       /* cdc or subset */
+#define DEV_RNDIS_CONFIG_VALUE  2       /* rndis; optional */
+
+#define USB_CDC_SUBCLASS_ACM    0x02
+#define USB_CDC_SUBCLASS_ETHERNET 0x06
+
+#define USB_CDC_PROTO_NONE
Re: [Qemu-devel] qemu vs gcc4
On Tuesday 31 October 2006 20:41, Rob Landley wrote: > Welcome to Stupid Question Theatre! With your host, Paul Brook. Today's > contestant is: Rob Landley. How dumb will it get? > > On Tuesday 31 October 2006 2:02 pm, Paul Brook wrote: > > The basic principle is very similar. Host code is decomposed into an > > intermediate form consisting of simple operations, then native code is > > generated from those operations. > > I got that part. It's the how I'm still head-scratching over. > > The disassembly routines seem relatively compiler-independent, but I'm > under the impression that turning the intermediate result (the string of > qops) into large blocks of translated code involves gluing together a bunch > of smaller blocks of pregenerated code. These pregenerated blocks were > spit out by gcc and are where all the compiler dependencies that aren't > clear bugs come from. Correct. > I thought what you were doing was replacing the pregenerated blocks with > hand-coded assembly statements, but your description here seems to be about > changing the disassembly routines that figure out which qops to string > together in part 2. Replacing the pregenerated blocks with hand written assembly isn't feasible. Each target has its own set of ops, and each host would need its own assembly implementation of those ops. Multiply 11 targets by 11 hosts and you get an unmaintainable mess :-) > > In the existing dyngen implementation most operands to ops are implicit, > > with only a few ops taking explicit arguments. The principle with the new > > system is that all operands are explicit. > > Having looked ahead to your example before replying to this, I think I > understand that part now. (Just barely.) > > > The intermediate representation used by the code generator resembles an > > imaginary machine. This machine has various different instructions > > (qops), and a nominally infinite register file (qregs). > > Each qreg is represented as an integer index? Yes.
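Because a qreg is just an integer index, and elsewhere in the thread Paul says each qreg has a fixed type (changing it needs an explicit conversion qop), the type can live in a side table indexed by that integer. A hypothetical sketch in C: qreg_new, qreg_arg_ok and every QMODE_* tag except QMODE_I32 (which appears in qregs.def) are assumptions. Paul says matching types is up to the translator writer, so the real code may not check anything at all; the I64-for-I32 allowance simply follows the rule stated in the thread.

```c
#include <stdbool.h>

/* Hypothetical type tags; QMODE_I32 appears in qregs.def, the rest are assumed. */
enum qmode { QMODE_I32, QMODE_I64, QMODE_F32, QMODE_F64 };

#define MAX_QREGS 256

/* A qreg is only an index into this table; its type is fixed when allocated. */
static enum qmode qreg_mode[MAX_QREGS];
static int nqregs = 1;   /* index 0 reserved, like QREG_NULL */

/* Hypothetical allocator: returns the next free index (no overflow check). */
static int qreg_new(enum qmode mode)
{
    qreg_mode[nqregs] = mode;
    return nqregs++;
}

/* May this qreg be passed where a qop expects 'want'? Exact matches are
   fine and an I64 may stand in for an I32; anything else needs an explicit
   conversion qop. */
static bool qreg_arg_ok(int qreg, enum qmode want)
{
    enum qmode have = qreg_mode[qreg];
    return have == want || (have == QMODE_I64 && want == QMODE_I32);
}
```

Note the asymmetry: an I64 qreg passes an I32 check, but an I32 qreg does not pass an I64 check, since its upper bits would need an explicit extension first.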
> > Each qop takes zero or more arguments, each of which may be an input or > > output. > > The input or output is always one of these qreg indexes? (Some of the > existing ones seem to take immediate values...) It is always a qreg. Potentially we could decide that some qregs are constants rather than variables, and use that information for code generation, but that's a slightly different issue. > > In addition to dynamically allocated qregs there are a fixed set of qregs > > that map onto the guest CPU state. This is to simplify code generation. > > These are indexes 0, 1, and 2? They are defined by the code you quote below. However this is an implementation detail, and could change. You should use the named constants. > Ok, looking at target-arm/translate.c, we have: > > static inline void gen_op_addl_T0_T1(void) > { > gen_op_add32(QREG_T0, QREG_T0, QREG_T1); > } > > So what is QREG_T0 anyway? This is hard to grep for. 'find . | grep -v svn > | xargs grep "QREG_T0"' doesn't produce anything useful, so there's got to > be preprocessor concatenation stuff with ## going on, let's try just QREG > on the *.h files, and yup at the start of qop.h there's this: It corresponds to "T0" in dyngen. In addition to the actual CPU state, dyngen uses 3 fixed registers as scratch workspace. For qop purposes these are part of the guest CPU state. They're only there to aid conversion of the translation code, they'll go away eventually. > enum target_qregs { > QREG_NULL, > #define DEFO32(name, offset) QREG_ ## name, > #define DEFO64(name, offset) DEFO32(name, offset) > #define DEFF32(name, reg) DEFO32(name, reg) > #define DEFF64(name, reg) DEFO32(name, reg) > #define DEFR(name, reg, mode) DEFO32(name, reg) > #include "qregs.def" > > And that has "DEFR(T0, AREG1, QMODE_I32)" which...
Ok, DEFR() discards the > third argument ("mode") completely, and then DEFO32() discards the second > argument (offset), and what's left is just the name, so it's position > dependent (so why have the darn macros at ALL?) Because qregs.def is included in at least two other places. This is the C preprocessor trickery I mentioned :-) > My brain hurts a lot now. I'm just letting you know. What is all this > complication actually trying to accomplish? Generation of 3 different things (QREG_* constants, the target_reginfo structure, and qreg_names) from a single source. This avoids having to keep 3 big hairy arrays in sync with each other. It's also used to implement 64-bit qregs as a pair of 32-bit qregs on 32-bit hosts. > > Each qreg has a particular type (32/64 bit, integer or float). > > You mean each qop's arguments have a particular type, and the arguments are > always in qregs? Or each qreg has a type permanently associated with that > qreg? Both the above. > Or the value currently in a qreg has a type associated with it, but > the next value stored in that qreg may have a different type? A qreg has a fixed type. The value stored in that qreg has th
Re: [Qemu-devel] qemu vs gcc4
Paul Brook wrote:

> Replacing the pregenerated blocks with hand written assembly isn't feasible. Each target has its own set of ops, and each host would need its own assembly implementation of those ops. Multiply 11 targets by 11 hosts and you get an unmaintainable mess :-)

Shouldn't you have 11+11 and not 11*11, given your intermediate representation? And of these 11+11, 11 have to be written anyway (target). Or did I miss something?

> On RISC targets like ARM most instructions don't set the condition codes, so we don't bother doing this.

Except for ARM Thumb ISA which always sets flags. ARM is a bad RISC example :)

I was wondering if you did some profiling to know how much time is spent in disas_arm_insn. Of course the profiling results would be very different for a Linux boot or a synthetic benchmark (which makes me think that you don't support MMU, do you?).

There is a very nice trick to speed up decoding of ARM instructions: pick up bits 20-27 and 4-7 and you (almost) get one instruction per case entry; of course this means using a generator to write the 4096 entries, but the result was good for my interpreted ISS, reaching 44 M i/s on an Opteron @2.4GHz without any compiler dependent trick (such as gcc jump to labels).

Laurent
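A sketch of the decoding trick in C: bits 27..20 and 7..4 of the instruction word are packed into a 12-bit index (4096 possible values), and a table filled once per index replaces a chain of tests. The function names and the three-way classification are hypothetical and deliberately coarse (for instance, the miscellaneous instructions such as BX and MRS are lumped in with data processing); a real generator would emit one handler or case label per entry rather than fill a table at run time.

```c
#include <stdint.h>

/* Pack instruction bits 27..20 into index bits 11..4 and bits 7..4 into
   index bits 3..0, giving a 12-bit index (4096 entries). */
static inline unsigned arm_decode_index(uint32_t insn)
{
    return ((insn >> 16) & 0xff0) | ((insn >> 4) & 0x00f);
}

/* Hypothetical, deliberately coarse instruction classes. */
enum arm_class { ARM_OTHER, ARM_DP_REG, ARM_MUL };

/* Classify one index. Only two classes are recognised; the miscellaneous
   instructions that share the data-processing space are not separated out. */
static enum arm_class classify_index(unsigned idx)
{
    unsigned hi = idx >> 4;    /* instruction bits 27..20 */
    unsigned lo = idx & 0xf;   /* instruction bits 7..4 */

    if ((hi & 0xfc) == 0x00 && lo == 0x9)
        return ARM_MUL;        /* MUL/MLA: bits 27..22 zero, bits 7..4 = 1001 */
    if ((hi & 0xe0) == 0x00 && (lo & 0x9) != 0x9)
        return ARM_DP_REG;     /* data processing with a register operand */
    return ARM_OTHER;
}

static enum arm_class decode_table[4096];

/* Stands in for the offline generator: fill one entry per possible index. */
static void build_decode_table(void)
{
    for (unsigned i = 0; i < 4096; i++)
        decode_table[i] = classify_index(i);
}

/* Decoding is then a single table lookup. */
static enum arm_class arm_decode(uint32_t insn)
{
    return decode_table[arm_decode_index(insn)];
}
```

For example, `add r0, r1, r2, lsl #2` encodes as 0xE0810102 and maps to index 0x080, while `mul r0, r1, r2` encodes as 0xE0000291 and maps to index 0x009, so the two land in different table entries without any further bit tests.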
Re: [Qemu-devel] qemu vs gcc4
On Tuesday 31 October 2006 22:31, Laurent Desnogues wrote: > Paul Brook wrote: > > Replacing the pregenerated blocks with hand written assembly isn't > > feasible. Each target has its own set of ops, and each host would need > > its own assembly implementation of those ops. Multiply 11 targets by 11 > > hosts and you get an unmaintainable mess :-) > > Shouldn't you have 11+11 and not 11*11, given your intermediate > representation? And of these 11+11, 11 have to be written > anyway (target). Or did I miss something? If you use qops (which is a target and host independent intermediate representation) it's 11 + 11. If you just replace the existing dyngen op.c with hand written assembly it's 11 * 11. > > On RISC targets like ARM most instructions don't set the condition codes, > > so we don't bother doing this. > > Except for ARM Thumb ISA which always sets flags. ARM is a bad > RISC example :) Bah. Details :-) > I was wondering if you did some profiling to know how much time > is spent in disas_arm_insn. Of course the profiling results > would be very different for a Linux boot or a synthetic benchmark The qop generator does add some overhead to the code translation. I haven't done proper benchmarks, but in most cases it doesn't seem to be too bad (maybe 10%). I'm hoping we can get most of that back. > (which makes me think that you don't support MMU, do you?). qemu does implement an MMU. Currently this still uses the dyngen code, but that's fixable. > There is a very nice trick to speed up decoding of ARM > instructions: pick up bits 20-27 and 4-7 and you (almost) get > one instruction per case entry; of course this means using a > generator to write the 4096 entries, but the result was good for > my interpreted ISS, reaching 44 M i/s on an Opteron @2.4GHz > without any compiler dependent trick (such as gcc jump to labels). qemu generally gets 100-200MIPS on my 2GHz Opteron.
Paul
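To make the counting in this exchange concrete: with T target front ends and H host back ends, hand-writing each target's ops for each host scales with the product, while a shared intermediate representation scales with the sum (using the figure of 11 of each from the thread):

$$
T \times H = 11 \times 11 = 121
\qquad\text{versus}\qquad
T + H = 11 + 11 = 22
$$

And, as Laurent points out, the T target translators have to be written either way, so the extra work the qop approach demands is roughly the H host back ends.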
Re: [Qemu-devel] qemu vs gcc4
On Tuesday 31 October 2006 2:02 pm, Paul Brook wrote:
> As an example take the arm instruction
>
>     add r0, r1, r2, lsl #2
>
> This is equivalent to the C expression
>
>     r0 = r1 + (r2 << 2)
...
> When fully converted to the new system this would become:
>
>     int tmp = gen_new_qreg(); /* Allocate a temporary reg. */
>     /* gen_im32 is a helper that allocates a new qreg and
>        initializes it to an immediate value. */
>     gen_op_add32(tmp, QREG_R2, gen_im32(2));
>     gen_op_add32(QREG_R0, QREG_R1, tmp);

I forgot to ask: Where's the shift? I think the above code means you generate an immediate value (the 2), add it to R2 with the result going in a spill register, and then add the spill register to R1, with the result going to R0. Should that middle line be some kind of gen_op_lshift32() instead of gen_op_add32()?

Do qregs ever get freed? (I'm guessing gen_new_qreg() lasts until the end of the translated block, and then the next block has its own set of qregs?)

Rob
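Rob's reading of the quoted sequence appears correct: as written, both qops are additions, so the immediate 2 is added to R2 instead of being used as a shift count. The sketch below evaluates both readings in plain C; the function names are hypothetical, the comments only indicate which qop each line corresponds to, and the thread doesn't say what the real shift qop is called.

```c
#include <stdint.h>

/* The quoted qop sequence, read literally: both qops are add32. */
static uint32_t as_written(uint32_t r1, uint32_t r2)
{
    uint32_t tmp = r2 + 2;   /* gen_op_add32(tmp, QREG_R2, gen_im32(2)) */
    return r1 + tmp;         /* gen_op_add32(QREG_R0, QREG_R1, tmp)     */
}

/* What "add r0, r1, r2, lsl #2" requires: r0 = r1 + (r2 << 2). */
static uint32_t intended(uint32_t r1, uint32_t r2)
{
    uint32_t tmp = r2 << 2;  /* a 32-bit shift-left qop belongs here */
    return r1 + tmp;
}
```

With r1 = 10 and r2 = 3 the literal sequence yields 15, while the instruction should yield 22.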
Re: [Qemu-devel] qemu vs gcc4
On Tuesday 31 October 2006 5:08 pm, Paul Brook wrote: > On Tuesday 31 October 2006 20:41, Rob Landley wrote: > > Welcome to Stupid Question Theatre! With your host, Paul Brook. Today's > > contestant is: Rob Landley. How dumb will it get? Bonus round! > > I thought what you were doing was replacing the pregenerated blocks with > > hand-coded assembly statements, but your description here seems to be about > > changing the disassembly routines that figure out which qops to string > > together in part 2. > > Replacing the pregenerated blocks with hand written assembly isn't feasible. > Each target has its own set of ops, and each host would need its own > assembly implementation of those ops. Multiply 11 targets by 11 hosts and > you get an unmaintainable mess :-) Actually it sounds additive rather than multiplicative. Does each target have an entirely unrelated set of ops, or is there a shared set of primitive ops plus some oddballs? But backing up and just accepting that for a moment, in theory what you need is some way to compile a C function to machine code, and then unwrap that function into a .raw file containing just the machine code. So the only per-compiler thing would be this unwrapper thingy. But I already know that doesn't work because it doesn't explain the "unable to find spill register" problem. Presumably, just beating the right .raw contents out of the compiler is nontrivial, let alone unwrapping it... > It corresponds to "T0" in dyngen. In addition to the actual CPU state, dyngen > uses 3 fixed registers as scratch workspace. For qop purposes these are part > of the guest CPU state. They're only there to aid conversion of the > translation code, they'll go away eventually. Presumably the m68k target is pure qop, and hasn't got this sort of thing? > > My brain hurts a lot now. I'm just letting you know. What is all this > > complication actually trying to accomplish?
> > Generation of 3 different things (QREG_* constants, the target_reginfo structure, and qreg_names) from a single source. This avoids having to keep 3 big hairy arrays in sync with each other. > It's also used to implement 64-bit qregs as a pair of 32-bit qregs on 32-bit hosts. Ok, the QREG_* constants are for the intermediate code the decompiler stuff generates. I have no idea what target_reginfo and qreg_names are for, but maybe it'll come to me as I read the code... > > Or the value currently in a qreg has a type associated with it, but > > the next value stored in that qreg may have a different type? > > A qreg has a fixed type. The value stored in that qreg has that type. To > convert it to a different type you need to use an explicit conversion qop. So values don't have types, the qregs the values are _in_ have types. But I thought there were an unlimited number of them (well, 1024 or so), and they're dynamically allocated (at least some of the time). How does it keep track of the type of a given qreg? (When you convert, you copy values from one qreg into another?) > > Possible translation: you can feed a qreg containing an I64 value to a qop > > taking an i32 argument, and it'll typecast the sucker down intelligently, > > but if you produce an I32 result and expect to use that qreg's value as an > > I64 argument later, you have to call a sign-extending qop on it first? > > Exactly. > If you mix I32,F32 and/or F64 in this way Bad Things will happen. Presumably just the same kinds of Bad Things as "float f; *(int *)&f;"? > > seeing end with _im which I presume means "immediate". The alternative is > > _cc, but what does that mean? (Presumably not "closed captioned".) > > _cc are variants that set the condition codes. I may have got T0 and T1 > backwards in the first 3 lines. Ah! Is this written down anywhere? I've read Fabrice's paper and the design documentation, and I'm not remembering this.
It's quite possible I missed it when my brain filled up, though. > > Um, is my earlier characterization of "unwrapping stuff" at all close? > > Not entirely. I'm also replacing fixed locations (T2) with dynamically allocated qregs. The dynamic allocation buys you what? (Less spilling?) > > Ok, now I'm really lost. > > Most x86 instructions set the condition code flags. However most of the time > these flags are ignored. eg. if you have two consecutive add instructions the > first will set the flags, and the second will immediately overwrite them. > > qemu contains a back-propagation pass that will remove the code to set the > flags after the first instruction. Currently this is implemented by changing > an addl_cc op into a plain addl op. I actually understood that. Yay! > The flag-setting code would most likely require several qops to implement, so > it would be much harder to prove it is not needed and remove it. So there is > a mechanism for adding extra target qops, doing the flag elimination pass, > then expanding those to generic qops. Um, w
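Paul's answer about generating the QREG_* constants, target_reginfo and qreg_names "from a single source" describes what C programmers usually call the X-macro pattern. A minimal sketch, assuming a made-up CPUState layout and a made-up QREG_LIST macro; the branch's actual register-definition file, field names and entry format may well differ:

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>
#include <string.h>   /* strcmp */

/* Hypothetical guest CPU state; the field names are illustrative only. */
typedef struct {
    uint32_t r0, r1, r2;
    uint64_t acc;
} CPUState;

typedef enum { QTYPE_I32, QTYPE_I64 } QType;

/* The single source: one X(...) entry per fixed qreg, giving its name,
   its type and where it lives in CPUState. */
#define QREG_LIST(X)                               \
    X(R0,  QTYPE_I32, offsetof(CPUState, r0))      \
    X(R1,  QTYPE_I32, offsetof(CPUState, r1))      \
    X(R2,  QTYPE_I32, offsetof(CPUState, r2))      \
    X(ACC, QTYPE_I64, offsetof(CPUState, acc))

/* 1) The QREG_* constants used by the translator. */
enum {
#define X(name, type, off) QREG_##name,
    QREG_LIST(X)
#undef X
    QREG_NUM_FIXED
};

/* 2) Per-qreg information: type and location in the CPU state. */
typedef struct {
    QType  type;
    size_t offset;
} QRegInfo;

static const QRegInfo target_reginfo[] = {
#define X(name, type, off) { type, off },
    QREG_LIST(X)
#undef X
};

/* 3) Printable names, e.g. for debug dumps of the qop stream. */
static const char *const qreg_names[] = {
#define X(name, type, off) #name,
    QREG_LIST(X)
#undef X
};

/* All three tables come from the same list, so they cannot drift out
   of sync; these checks only make that explicit at compile time. */
static_assert(sizeof target_reginfo / sizeof target_reginfo[0] == QREG_NUM_FIXED,
              "target_reginfo size mismatch");
static_assert(sizeof qreg_names / sizeof qreg_names[0] == QREG_NUM_FIXED,
              "qreg_names size mismatch");

/* Reverse lookup by name (hypothetical helper); returns -1 if unknown. */
static int qreg_by_name(const char *name) {
    for (int i = 0; i < QREG_NUM_FIXED; i++)
        if (strcmp(qreg_names[i], name) == 0)
            return i;
    return -1;
}
```

Adding a register is then a one-line change to QREG_LIST. Paul also mentions using the same single source to split 64-bit qregs into pairs of 32-bit qregs on 32-bit hosts; this sketch does not attempt that.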
Re: [Qemu-devel] qemu vs gcc4
> Where's the shift? I think the above code means you generate an immediate > value (the 2), add it to R2 with the result going in a spill register, and > then add the spill register to R1, with the result going to R0. Should > that middle line be some kind of gen_op_lshift32() instead of > gen_op_add32()? Yes. > Do qregs ever get freed? (I'm guessing gen_new_qreg() lasts until the end > of the translated block, and then the next block has its own set of qregs?) Correct. Paul
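With Paul's two answers folded in (the middle op is a shift, and temporaries only live until the end of the translated block), the ARM example can be modelled end to end. This is a toy sketch, not the branch's code: apart from gen_new_qreg, gen_im32, gen_op_add32 and the QREG_* naming quoted in the thread, every name here (gen_op_shl32, gen_start_block, emit, run_block, the QOp struct) is hypothetical, and a tiny interpreter stands in for the host code generator so the result can be checked.

```c
#include <assert.h>
#include <stdint.h>

enum { QREG_R0, QREG_R1, QREG_R2, QREG_NUM_FIXED };  /* fixed guest-state qregs */
#define MAX_QREGS 64
#define MAX_OPS   128

typedef enum { OP_MOVI32, OP_ADD32, OP_SHL32 } OpKind;
typedef struct { OpKind kind; int dst, a, b; uint32_t imm; } QOp;

static QOp ops[MAX_OPS];
static int num_ops;
static int next_qreg = QREG_NUM_FIXED;   /* temporaries follow the fixed qregs */

/* Start a new translation block: every temporary from the previous
   block is implicitly freed by resetting the allocator. */
static void gen_start_block(void) {
    num_ops = 0;
    next_qreg = QREG_NUM_FIXED;
}

static int gen_new_qreg(void) {
    assert(next_qreg < MAX_QREGS);
    return next_qreg++;
}

static void emit(OpKind kind, int dst, int a, int b, uint32_t imm) {
    assert(num_ops < MAX_OPS);
    ops[num_ops++] = (QOp){ kind, dst, a, b, imm };
}

/* Allocate a new qreg and initialise it to an immediate value. */
static int gen_im32(uint32_t v) {
    int q = gen_new_qreg();
    emit(OP_MOVI32, q, -1, -1, v);
    return q;
}

static void gen_op_add32(int d, int a, int b) { emit(OP_ADD32, d, a, b, 0); }
static void gen_op_shl32(int d, int a, int b) { emit(OP_SHL32, d, a, b, 0); }

/* add r0, r1, r2, lsl #2   i.e.   r0 = r1 + (r2 << 2) */
static void translate_add_lsl2(void) {
    int tmp = gen_new_qreg();
    gen_op_shl32(tmp, QREG_R2, gen_im32(2));   /* a shift, not an add */
    gen_op_add32(QREG_R0, QREG_R1, tmp);
}

/* Stand-in for the code generator: interpret the qops directly. */
static void run_block(uint32_t *regs) {
    for (int i = 0; i < num_ops; i++) {
        const QOp *op = &ops[i];
        switch (op->kind) {
        case OP_MOVI32: regs[op->dst] = op->imm; break;
        case OP_ADD32:  regs[op->dst] = regs[op->a] + regs[op->b]; break;
        case OP_SHL32:  regs[op->dst] = regs[op->a] << (regs[op->b] & 31); break;
        }
    }
}
```

Reading the generated ops back in order gives a move of 2 into a fresh temporary, a shift of r2 by that temporary into tmp, and an add of r1 and tmp into r0, which matches r0 = r1 + (r2 << 2). With r1 = 10 and r2 = 3, r0 ends up as 22.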
Re: [Qemu-devel] qemu vs gcc4
> Actually it sounds additive rather than multiplicative. Does each target > have an entirely unrelated set of ops, or is there a shared set of > primitive ops plus some oddballs? The shared set of primitive ops is basically qops :-) You probably could figure out a single common set of qops, then write assembly and glue them together like we do with dyngen. However once you've done that you've implemented most of what's needed for fully dynamic qops, so it doesn't really seem worth it. > But backing up and just accepting that for a moment, in theory what you > need is some way to compile a C function to machine code, and then unwrap > that function into a .raw file containing just the machine code. So the > only per-compiler thing would be this unwrapper thingy. Right. > But I already know > that doesn't work because it doesn't explain the "unable to find spill > register" problem. That's a separate gcc bug. It gets stuck when you tell it not to use half the registers, then ask it to do 64-bit math. This is one of the reasons eliminating the fixed registers is a good idea. > > It corresponds to "T0" in dyngen. In addition to the actual CPU state, > > dyngen > > uses 3 fixed registers as scratch workspace. For qop purposes these are > > part of the guest CPU state. They're only there to aid conversion of the > > translation code, they'll go away eventually. > > > > Presumably the m68k target is pure qop, and hasn't got this sort of thing? Correct. There is one use of T0 left for communicating with the TB chaining code, but that's it and will probably go away eventually. > > > Or the value currently in a qreg has a type associated with it, but > > > the next value stored in that qreg may have a different type? > > > > A qreg has a fixed type. The value stored in that qreg has that type. To > > convert it to a different type you need to use an explicit conversion > > qop. > > So values don't have types, the qregs the values are _in_ have types.
But > I thought there were an unlimited number of them (well, 1024 or so), and > they're dynamically allocated (at least some of the time). How does it > keep track of the type of a given qreg? (When you convert, you copy values > from one qreg into another?) Yes. Conversion is just like any other qop. It reads one qreg, and writes the result to a different qreg which happens to be a different type. > > > Possible translation: you can feed a qreg containing an I64 value to a > > > qop taking an i32 argument, and it'll typecast the sucker down > > > intelligently, but if you produce an I32 result and expect to use that > > > qreg's value as an I64 argument later, you have to call a > > > sign-extending qop on it first? > > > > Exactly. > > If you mix I32,F32 and/or F64 in this way Bad Things will happen. > > Presumably just the same kinds of Bad Things as "float f; *(int *)&f;"? Or qemu will get confused and crash. > > > seeing end with _im which I presume means "immediate". The alternative > > > is _cc, but what does that mean? (Presumably not "closed captioned".) > > > > _cc are variants that set the condition codes. I may have got T0 and T1 > > backwards in the first 3 lines. > > Ah! > > Is this written down anywhere? I've read Fabrice's paper and the design > documentation, and I'm not remembering this. It's quite possible I missed > it when my brain filled up, though. Dunno. > > > Um, is my earlier characterization of "unwrapping stuff" at all close? > > > > Not entirely. I'm also replacing fixed locations (T2) with dynamically > > allocated qregs. > > The dynamic allocation buys you what? (Less spilling?) More-or-less. It makes it easier to optimize. The code generator can pick what to put in registers, or even not put them there at all, instead of having to do things exactly how you told it. It also means you don't need to reserve that register, avoiding the gcc unable to find spill register bug you mentioned above.
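Paul's description, in which the type belongs to the qreg and a conversion is an ordinary qop that writes a *different* qreg, can be sketched as follows. This is a simplification under stated assumptions: the qop names (gen_op_ext_s32_i64, gen_op_add64), the helper gen_new_qreg_typed, and the check-at-generation-time policy are all hypothetical, and each qop executes immediately instead of being emitted, purely to keep the example short.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { I32, I64, F32, F64 } QType;

#define MAX_QREGS 64

/* The type is a property of the qreg, fixed when it is allocated. */
static QType qreg_type[MAX_QREGS];
static int num_qregs;

/* Backing storage so the sketch can execute qops immediately. */
static union { int32_t i32; int64_t i64; float f32; double f64; } qval[MAX_QREGS];

static int gen_new_qreg_typed(QType t) {
    assert(num_qregs < MAX_QREGS);
    qreg_type[num_qregs] = t;
    return num_qregs++;
}

/* Catch translator bugs such as feeding an I32 qreg to a 64-bit qop. */
static void check_type(int q, QType t) {
    assert(q >= 0 && q < num_qregs);
    assert(qreg_type[q] == t);
}

/* Conversion is just another qop: read an I32 qreg, write a different
   I64 qreg, sign-extending on the way. */
static void gen_op_ext_s32_i64(int dst, int src) {
    check_type(dst, I64);
    check_type(src, I32);
    qval[dst].i64 = (int64_t)qval[src].i32;
}

static void gen_op_add64(int dst, int a, int b) {
    check_type(dst, I64);
    check_type(a, I64);
    check_type(b, I64);
    qval[dst].i64 = qval[a].i64 + qval[b].i64;
}
```

This sketch is stricter than what the thread describes: it rejects every type mismatch, including the I64-fed-to-an-I32-qop case the thread says is tolerated. Passing an I32 qreg straight to gen_op_add64 trips an assertion, standing in for the "Bad Things" Paul warns about.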
> > Most x86 instructions set the condition code flags. However most of the > > time these flags are ignored. eg. if you have two consecutive add > > instructions the first will set the flags, and the second will > > immediately overwrite them. > > > > qemu contains a back-propagation pass that will remove the code to set > > the flags after the first instruction. Currently this is implemented by > > changing an addl_cc op into a plain addl op. > > I actually understood that. Yay! > > > The flag-setting code would most likely require several qops to > > implement, so it would be much harder to prove it is not needed and remove it. So there > > is a mechanism for adding extra target qops, doing the flag elimination > > pass, then expanding those to generic qops. > > Um, wouldn't the flag setting code be fairly straightforward as a qop that > comes right _before_ the other op, as in "set the flags for doing this with > these registers", that does nothing but set the flags (I.E. it wouldn't > modify the contents of an
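The existing back-propagation pass Paul describes (turning an addl_cc op into a plain addl op when nothing reads the flags before they are overwritten) is a backward liveness walk over the op stream. Below is a minimal sketch with a made-up op enum, not qemu's actual op list. It conservatively treats the flags as live at the end of the block, because code in the next block may read them.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical op kinds: *_CC variants set the condition codes,
   OP_JCC reads them, and the plain variants leave them alone. */
typedef enum { OP_ADDL, OP_ADDL_CC, OP_SUBL, OP_SUBL_CC, OP_JCC } OpKind;

static bool sets_flags(OpKind k)  { return k == OP_ADDL_CC || k == OP_SUBL_CC; }
static bool reads_flags(OpKind k) { return k == OP_JCC; }

static OpKind without_cc(OpKind k) {
    switch (k) {
    case OP_ADDL_CC: return OP_ADDL;
    case OP_SUBL_CC: return OP_SUBL;
    default:         return k;
    }
}

/* Walk the block backwards, tracking whether the flags may still be
   read by a later op. */
static void eliminate_dead_flags(OpKind *ops, int n) {
    assert(n >= 0);
    bool live = true;   /* the next block might read the flags */
    for (int i = n - 1; i >= 0; i--) {
        if (sets_flags(ops[i])) {
            if (!live)
                ops[i] = without_cc(ops[i]);  /* flags overwritten unread */
            live = false;                     /* earlier flags are dead here */
        }
        if (reads_flags(ops[i]))
            live = true;
    }
}
```

For the block addl_cc, addl_cc, jcc, subl_cc, the first add loses its _cc suffix because the second add overwrites its flags unread, while the second add keeps it because the jump reads its flags. Paul's point is that once the flag computation is spread over several generic qops, proving it dead becomes much harder, which is why the pass runs on the extra target qops before they are expanded.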
[Qemu-devel] qemu vl.c
CVSROOT:/sources/qemu Module name:qemu Changes by: Paul Brook 06/11/01 01:44:16 Modified files: . : vl.c Log message: Remove FD on close. CVSWeb URLs: http://cvs.savannah.gnu.org/viewcvs/qemu/vl.c?cvsroot=qemu&r1=1.217&r2=1.218
Re: [Qemu-devel] qemu vs gcc4
On Tuesday 31 October 2006 7:29 pm, Paul Brook wrote: > > Actually it sounds additive rather than multiplicative. Does each target > > have an entirely unrelated set of ops, or is there a shared set of > > primitive ops plus some oddballs? > > The shared set of primitive ops is basically qops :-) > You probably could figure out a single common set of qops, then write assembly > and glue them together like we do with dyngen. However once you've done that > you've implemented most of what's needed for fully dynamic qops, so it > doesn't really seem worth it. I missed a curve. What's "fully dynamic qops"? (There's no translation cache?) > > But I already know > > that doesn't work because it doesn't explain the "unable to find spill > > register" problem. > > That's a separate gcc bug. It gets stuck when you tell it not to use half the > registers, then ask it to do 64-bit math. This is one of the reasons > eliminating the fixed registers is a good idea. Sigh. The problems motivating me to learn the code are highly esoteric breakage, yet I'm still not quite up to the task of understanding what's going on when all this works _right_. Grumble... > > > It corresponds to "T0" in dyngen. In addition to the actual CPU state, > > > dyngen > > > uses 3 fixed registers as scratch workspace. For qop purposes these are > > > part of the guest CPU state. They're only there to aid conversion of the > > > translation code, they'll go away eventually. > > > > Presumably the m68k target is pure qop, and hasn't got this sort of thing? > > Correct. > There is one use of T0 left for communicating with the TB chaining code, but > that's it and will probably go away eventually. Any idea where I can get a toolchain that can output a "hello world" program for m68k nommu? (Or perhaps you have a statically linked "hello world" program for the platform lying around?)
Building toolchains is one of my other hobbies but it's a royal pain because in order to get "hello world" to compile and link you have to supply kernel headers, build binutils and gcc with various configuration options and path overrides and such, build uClibc with the result and get them all talking to each other. I.E. you've got to do hours of work before you get to the first real "did it work" point, and then backtrack to figure out why the answer is usually "no". (Prebuilt binary toolchains are useful just to narrow down the number of possible things that could be broken when you first try out a new platform.) > > > > Possible translation: you can feed a qreg containing an I64 value to a > > > > qop taking an i32 argument, and it'll typecast the sucker down > > > > intelligently, but if you produce an I32 result and expect to use that > > > > qreg's value as an I64 argument later, you have to call a > > > > sign-extending qop on it first? > > > > > > Exactly. > > > If you mix I32,F32 and/or F64 in this way Bad Things will happen. > > > > Presumably just the same kinds of Bad Things as "float f; *(int *)&f;"? > > Or qemu will get confused and crash. I've had that happen without qops, although not recently. (I have this nasty habit of trying Ubuntu's PPC and x86-64 distros under qemu with each new release. They usually fail in amusing new ways.) > > > > seeing end with _im which I presume means "immediate". The alternative > > > > is _cc, but what does that mean? (Presumably not "closed captioned".) > > > > > > _cc are variants that set the condition codes. I may have got T0 and T1 > > > backwards in the first 3 lines. > > > > Ah! > > > > Is this written down anywhere? I've read Fabrice's paper and the design > > documentation, and I'm not remembering this. It's quite possible I missed > > it when my brain filled up, though. > > Dunno. So if at any point I actually understand this stuff, I need to write documentation? 
(I can do part 2, part 1 the jury's still out on...) > It also means you don't need to reserve that register, avoiding the gcc > unable to find spill register bug you mentioned above. I'm all for it. > > Um, wouldn't the flag setting code be fairly straightforward as a qop that > > comes right _before_ the other op, as in "set the flags for doing this with > > these registers", that does nothing but set the flags (I.E. it wouldn't > > modify the contents of any of the registers, so it could be immediately > > followed by the appropriate add or shift or so on), and then the flag > > setting pass could just turn all the ones that weren't needed into > > QOP_NULL? > > Theoretically possible, but not so easy in practice. Especially when you get > things like partial flag clobbers, and lazy flag evaluation. Doing it as a > target specific hack is much simpler and quicker. I think I know what partial flag clobbers are (although if you're working your way back, in theory you could handle it with a mask of exposed bits), but what's lazy flag evaluation? (I thought that was the point of eliminating the
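Rob's suggestion of handling partial flag clobbers "with a mask of exposed bits" amounts to running a backward walk over a bit set instead of a single live/dead boolean. A sketch under assumptions: the flag bits and the FlagOp table are invented for illustration (loosely x86-like, where INC leaves the carry flag untouched), `live_out` is the set of flags assumed to be read after the block, and the sketch addresses partial clobbers only, not lazy flag evaluation.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative flag bits. */
#define F_C   (1u << 0)   /* carry */
#define F_Z   (1u << 1)   /* zero */
#define F_S   (1u << 2)   /* sign */
#define F_O   (1u << 3)   /* overflow */
#define F_ALL (F_C | F_Z | F_S | F_O)

typedef struct {
    const char *name;
    uint32_t writes;   /* flags the op overwrites */
    uint32_t reads;    /* flags the op consumes */
    uint32_t keep;     /* output: written flags that a later op reads */
} FlagOp;

/* Backward liveness over a flag mask. An op whose 'keep' ends up 0 can
   have its flag computation dropped entirely; a nonzero mask says which
   of its flags still have to be computed. */
static void flag_liveness(FlagOp *ops, int n, uint32_t live_out) {
    assert(n >= 0);
    uint32_t live = live_out;
    for (int i = n - 1; i >= 0; i--) {
        ops[i].keep = ops[i].writes & live;
        live = (live & ~ops[i].writes) | ops[i].reads;
    }
}
```

For the sequence add, inc, jc with nothing live after the block, only the carry produced by add survives (jc reads it straight through inc, which does not touch carry), and all of inc's flags are dead.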
Re: [Qemu-devel] qemu vs gcc4
On Wednesday 01 November 2006 01:51, Rob Landley wrote: > On Tuesday 31 October 2006 7:29 pm, Paul Brook wrote: > > > Actually it sounds additive rather than multiplicative. Does each > > > target have an entirely unrelated set of ops, or is there a shared set > > > of primitive ops plus some oddballs? > > > > The shared set of primitive ops is basically qops :-) > > You probably could figure out a single common set of qops, then write > > assembly and glue them together like we do with dyngen. However once you've done > > that you've implemented most of what's needed for fully dynamic qops, so > > it doesn't really seem worth it. > > I missed a curve. What's "fully dynamic qops"? (There's no translation > cache?) I mean all the qop stuff I've implemented. > > > > It corresponds to "T0" in dyngen. In addition to the actual CPU > > > > state, dyngen > > > > uses 3 fixed registers as scratch workspace. For qop purposes these > > > > are part of the guest CPU state. They're only there to aid conversion > > > > of the translation code, they'll go away eventually. > > > > > > Presumably the m68k target is pure qop, and hasn't got this sort of > > > thing? > > > > Correct. > > There is one use of T0 left for communicating with the TB chaining code, > > but that's it and will probably go away eventually. > > Any idea where I can get a toolchain that can output a "hello world" > program for m68k nommu? (Or perhaps you have a statically linked "hello > world" program for the platform lying around?) Funnily enough I do :-) http://www.codesourcery.com/gnu_toolchains/coldfire/ > > Theoretically possible, but not so easy in practice. Especially when you > > get things like partial flag clobbers, and lazy flag evaluation. Doing it > > as a target specific hack is much simpler and quicker.
> > I think I know what partial flag clobbers are (although if you're working > your way back, in theory you could handle it with a mask of exposed bits), > but what's lazy flag evaluation? (I thought that was the point of > eliminating the unused flag setting. Are you saying the hardware also does > this and we have to emulate that?) Lazy flag evaluation is where you don't bother calculating the actual flags when executing the flag-setting instruction. Instead you save the operands/result and compute the flags when you actually need them. > > > > There are three fairly independent stages: > > > > 1) target-*/translate.c converts guest code into qops. > > > > 2) translate-all.c messes about with those qops a bit (allocates host > > > > registers, etc). > > > > 3) translate-op.c, translate-qop.c and target-*/ turns those qops into > > > > host code. > > > > > > Is pass 2 where the flag elimination pass goes (and presumably any > > > other optimizations that might get added)? No, that can't be the case > > > or the m68k code wouldn't need its own implementation of the flag > > > elimination pass... > > > > Flag elimination is at the end of step 1. > > Because it's platform specific? Yes. > > > > qops and dyngen ops are both small "functions" that are represented > > > > in a similar way. The difference is that dyngen ops are target > > > > specific fixed functions, whereas qops are generic parameterized > > > > functions. > > > > > > So the 11x11 exponential complexity of qemu producing its own assembly > > > output might not be as much of a problem after switching to qops? > > > > Right. The exponential complexity is if you write the assembly by hand > > instead of using gcc to generate it. > > The exponential complexity is if you have to write different code for each > combination of host and target.
If every target disassembles to the same > set of target QOPs, then you could have a hand-written assembly version of > each QOP for each host platform, and still have N rather than N^2 of them. Right, but by the time you've got everything to use the same set of ops you may as well teach qemu how to generate code instead of using potted fragments. Using hand-written assembly fragments probably doesn't make qemu any faster, it just removes the gcc dependency. Using qops also allows qemu to generate better (faster) translated code. Paul ___ Qemu-devel mailing list Qemu-devel@nongnu.org http://lists.nongnu.org/mailman/listinfo/qemu-devel