[Qemu-devel] Re: [PATCH 15/15] Implement the bus structure for PAPR virtual IO
On Sun, Feb 13, 2011 at 1:15 AM, Benjamin Herrenschmidt wrote: > On Sun, 2011-02-13 at 00:52 +0200, Blue Swirl wrote: >> On Sat, Feb 12, 2011 at 11:00 PM, Benjamin Herrenschmidt >> wrote: >> > On Sat, 2011-02-12 at 18:59 +0200, Blue Swirl wrote: >> >> >> >> Actually I don't quite understand the need for vty layer, why not use >> >> the chardev here directly? >> > >> > I'm not sure what you mean here... >> >> Maybe it would be reasonable to leave h_put_term_char to spapr_hcall.c >> instead of moving those to a separate file. > > Well, the VIO device instance gives the chardev instance which is all > nicely encapsulated inside spapr-vty... Also VIO devices tend to have > dedicated hcalls, not only VTY, so it makes a lot of sense to keep them > close to the rest of the VIO driver they belong to don't you think ? > > (Actually veth does, vscsi uses the more "generic" CRQ stuff which we've > added to the core VIO but you'll see that when we get those patches > ready, hopefully soon). This is a bit of a special case, much like semihosting modes for m68k or ARM, or like MOL hacks which were removed recently. From QEMU point of view, the most natural way of handling this would be hypervisor implemented in the guest side (for example BIOS). Then the hypervisor would use normal IO (or virtio) to communicate with the host. If inside QEMU, the interface of the hypervisor to the devices needs some thought. We'd like to avoid ugly interfaces like vmmouse where a device probes CPU registers directly or spaghetti interfaces like APIC. > Actually, one thing I noticed is that the current patches David posted > still have a single function with a switch/case statement for hcalls. > > I'm not 100% certain what David long term plans are here, but in our > internal "WIP" tree, we've subsequently turned that into a mechanism > where any module can call powerpc_register_hypercall() to add hcalls. > > So if David intends to move the "upstream candidate" tree in that > direction, then naturally, the calls in spapr_hcall.c are going to > disappear in favor of a pair of powerpc_register_hypercall() locally in > the vty module. Is the interface new design, or are you implementing what is used also on real HW?
Re: [Qemu-devel] [PATCH 1/2] linux-user: Define target alignment size
On Sun, Feb 13, 2011 at 4:22 AM, Laurent Vivier wrote: > Datatype alignment can be found using following application: > > int main(void) > { > printf("alignof(short) %ld\n", __alignof__(short)); > printf("alignof(int) %ld\n", __alignof__(int)); > printf("alignof(long) %ld\n", __alignof__(long)); > printf("alignof(long long) %ld\n", __alignof__(long long)); > } > > This patch includes following alignments: > > i386 > > alignof(short) 2 > alignof(int) 4 > alignof(long) 4 > alignof(long long) 8 > > x86_64 > > alignof(short) 2 > alignof(int) 4 > alignof(long) 8 > alignof(long long) 8 > > arm > > alignof(short) 2 > alignof(int) 4 > alignof(long) 4 > alignof(long long) 4 > > m68k (680x0) > > alignof(short) 2 > alignof(int) 2 > alignof(long) 2 > alignof(long long) 2 > > mips > > alignof(short) 2 > alignof(int) 4 > alignof(long) 4 > alignof(long long) 8 > > ppc > > alignof(short) 2 > alignof(int) 4 > alignof(long) 4 > alignof(long long) 8 > > for other targets, use by default (2,4,4,8). > > Please, update for your favorite target... For Sparc32 (I think also sparc32plus), the default is OK. For Sparc64, please use 2, 4, 8, 8. I'd guess other 64 bit platforms (Alpha, MIPS64, PPC64 etc) should use the same. Does GCC produce correct code using the attributes on strictly aligned host, when the target is less strictly aligned? Should the alignment of floating point variables be specified as well? The strict alignment required for doubles is 4, but recommended alignment is 8, I'm not sure which one is used for structures containing doubles.
Re: [Qemu-devel] Binary Translation hooking - reading registers
On Sun, Feb 13, 2011 at 5:48 AM, felix.matenaar@rwth-aachen wrote: > Hello everyone, > > i am working on a project adding instrumentation into qemu. My approach > is to use gen_helper stuff do hook specific opcodes like call or ret to > gain information about running processes in the virtual machine. > > Today I noticed that the CPUState* env is not in all cases up-to-date > when my hooks are called on block execution. That makes totally sense > since blocks are natively executed in one step as far as I understood so > there is no code which would keep the cpu environment up-to-date. > > To achieve my goal, it is necessary being able reading actual register > configuration like eax when a ret hook is called to get a function > return value. So my question is how I can do this. Are there already > some functions which generate code to update the cpu environment? If > not, is there anything you can point me towards for adding support? Without seeing your code, you are probably confusing translation phase and executing the code generated by TCG.
Re: [Qemu-devel] [PATCH] PS/2 keyboard Scancode Set 3 support
On Sun, Feb 13, 2011 at 9:32 AM, Roy Tam wrote: > The following patch adds PS/2 keyboard Scancode Set 3 support. > > Sign-off-by: Roy Tam > -- > diff --git a/hw/ps2.c b/hw/ps2.c > index 762bb00..4b73967 100644 > --- a/hw/ps2.c > +++ b/hw/ps2.c > @@ -143,12 +143,87 @@ static void ps2_put_keycode(void *opaque, int keycode) > { > PS2KbdState *s = opaque; > > - /* XXX: add support for scancode sets 1 and 3 */ > - if (!s->translate && keycode < 0xe0 && s->scancode_set == 2) > + /* XXX: add support for scancode sets 1 */ > + if (!s->translate && keycode < 0xe0 && s->scancode_set > 1) > { The braces belong to the previous line and the indentation is off. Please run the patch through scripts/checkpatch.pl. > if (keycode & 0x80) > ps2_queue(&s->common, 0xf0); > keycode = ps2_raw_keycode[keycode & 0x7f]; > + if (s->scancode_set == 3) > + { > + switch (keycode) > + { > + case 0x1: > + keycode = 0x47; > + break; > + case 0x3: > + keycode = 0x27; > + break; > + case 0x4: > + keycode = 0x17; > + break; > + case 0x5: > + keycode = 0x7; > + break; > + case 0x6: > + keycode = 0xf; > + break; > + case 0x7: > + keycode = 0x5e; > + break; > + case 0x9: > + keycode = 0x4f; > + break; > + case 0xa: > + keycode = 0x3f; > + break; > + case 0xb: > + keycode = 0x2f; > + break; > + case 0xc: > + keycode = 0x1f; > + break; > + case 0x11: > + keycode = 0x19; > + break; > + case 0x14: > + keycode = 0x11; > + break; > + case 0x58: > + keycode = 0x14; > + break; > + case 0x5d: > + keycode = 0x5c; > + break; > + case 0x76: > + keycode = 0x8; > + break; > + case 0x77: > + keycode = 0x76; > + break; > + case 0x78: > + keycode = 0x56; > + break; > + case 0x79: > + keycode = 0x7c; > + break; > + case 0x7b: > + keycode = 0x84; > + break; > + case 0x7c: > + keycode = 0x7e; > + break; > + case 0x7e: > + keycode = 0x5f; > + break; > + case 0x83: > + keycode = 0x37; > + break; > + case 0x84: > + keycode = 0x57; > + break; > + } > + } > } > ps2_queue(&s->common, keycode); > } > >
Re: [Qemu-devel] KVM call minutes for Feb 8
On Fri, Feb 11, 2011 at 08:14:16PM +0200, Blue Swirl wrote: > On Thu, Feb 10, 2011 at 6:05 PM, Anthony Liguori > wrote: > > On 02/10/2011 03:20 PM, Gleb Natapov wrote: > >> > >> Jugging by how well all previous conversion went we will end up with one > >> more way of creating devices. One legacy, another qdev and your new one. > >> And what is the problem with qdev again (not that I am a big qdev fan)? > >> > > > > We've really been arguing about probably the most minor aspect of the > > problem with qdev. Probably. But if this aspect affects the overall re-design of device tree it is not so minor after all. > > > > All I'm really saying is that we shouldn't tie device construction to a > > factory interface as we do with qdev. > > > > That simply means that we should be able to do: > > > > RTC *rtc_create(arg1, arg2, arg2); > > I don't see how that would help at all. Throwing qdev away and just > calling various functions directly, with all states exposed would be > like QEMU 0.9.0. > That is my sentiment exactly. This is what we had before. What was the reason we wanted device tree (let alone specific implementation of it that was committed without much external input)? Why do you no longer want device tree abstraction? > > And that a separate piece of code decides which devices are exposed through > > -device or device_add. Which devices are exposed is really a minor detail. > > > > That said, qdev has a number of significant limitations in my mind. The > > first is that the only relationship between devices is through the BusState > > interface. > > There's also qemu_irq for arbitrary signals. > > > I don't think we should even try to have a generic bus model. > > When you look at how badly broken PCI hotplug is current in qdev, I think > > this is symptomatic of this. > > And how should this be fixed? The API change would not help. > > > There's also no way in qdev to really have polymorphism. Interfaces really > > aren't meaningful in qdev so you have things like PCIDevice where some > > methods are stored in the object instead of the class dispatch table and you > > have overuse of static class members. > > QEMU is developed in C, not C++. > > > And it's all unrelated to VMState. > > Right, but this has also the good side that not all device state is > automatically exported. If other devices would be allowed to muck with > a devices internal state freely, bad things could happen. > > Device reset could also use standard register definitions, shared with > VMState. > > > And this is just the basic mechanisms of qdev. The actual implementation is > > worse. The use of qemu_irq as gpio in the base class and overuse of > > SystemBus is really quite insane. > > Maybe qemu_irq should be renamed to QEMUSignal (and I don't like > typedeffing pointers), otherwise it looks quite sane to me. > > Could you point to examples of SystemBus overuse? > > > And so far, the use of qdev has been entirely superficial. Devices still > > don't make use of bus level interfaces to do I/O so we don't have any better > > componentization than we did before qdev. > > > >> The fact that there is no enough interest to convert all devices to it? > >> > > > > I don't think there is any device that has been improved by qdev. -device > > is a nice feature, but it could have been implemented without qdev. > > We have 'info qtree' which can't be implemented easily without a > generic device class. Avi (or who was it) sent patches to expose even > more device state. > > With the patches I'm going to apply, if Redhat wants to disable > building various devices, it can be done without #ifdeffery. This is > not possible without a generic factory interface. -- Gleb.
Re: [Qemu-devel] Re: [PATCH 09/15] Parse SDR1 on mtspr instead of at translate time
On Sat, Feb 12, 2011 at 04:37:46PM +0100, Alexander Graf wrote: > On 12.02.2011, at 15:54, David Gibson wrote: [snip] > > +#define SDR_HTABORG_32 0xUL > > +#define SDR_HTABMASK 0x01FFUL > > Please mark this constant as ppc32 > > > + > > +#if defined(TARGET_PPC64) > > +#define SDR_HTABORG_64 0xFFFCULL > > +#define SDR_HTABSIZE 0x001FULL > > Please mark this constant as ppc64 Um.. I'm not sure what you mean by this. -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
Re: [Qemu-devel] RFC: Implement emulation of pSeries logical partitions
On Sat, Feb 12, 2011 at 05:26:20PM +0100, Laurent Vivier wrote: > Hi, > > Do you plan to boot AIX in one of these partitions ? Not really, no. This is aimed at existing pSeries Linux kernels. -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
Re: [Qemu-devel] Re: [PATCH 12/15] Support 1T segments on ppc
On Sat, Feb 12, 2011 at 04:57:39PM +0100, Alexander Graf wrote: > On 12.02.2011, at 15:54, David Gibson wrote: [snip] > > +if (rb & (0x1000 - env->slb_nr)) > > Braces... Oops, yeah. These later patches in the series I haven't really audited for coding style adequately yet. I'll fix these before the next version. [snip] > > + return -1; /* 1T segment on MMU that doesn't support it */ > > + > > +/* We stuff a copy of the B field into slb->esid to simplify > > + * lookup later */ > > +slb->esid = (rb & (SLB_ESID_ESID | SLB_ESID_V)) | > > +(rs >> SLB_VSID_SSIZE_SHIFT); > > Wouldn't it be easier to add another field? Easier for what? The reason I put these bits in here is that the rest of the things slb_lookup() needs to scan for are all in the esid field, so putting B in there means slb_lookup() needs only one comparison per-slot, per segment size. -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
Re: [Qemu-devel] Re: [PATCH 13/15] Add POWER7 support for ppc
On Sat, Feb 12, 2011 at 05:09:39PM +0100, Alexander Graf wrote: > On 12.02.2011, at 15:54, David Gibson wrote: [snip] > > +/* Don't generate spurious events */ > > +if ((cur_level == 1 && level == 0) || (cur_level == 0 && level != 0)) { > > Did you hit this? Qemu's irq framework should already ensure that > property. I'm also not sure it's actually correct - if a level > interrupt is on, the guest would get another interrupt injected, no? > That would be cur_level ==1 && level == 1 IIUC. [snip] > > +case POWER7_INPUT_CKSTP: > > POWER7 has checkstop? [snip] > > +case POWER7_INPUT_HRESET: > > Does this ever get triggered? POWER7 is run in lpar only, so there is no > hreset, right? [snip] > > +case POWER7_INPUT_TBEN: > > +LOG_IRQ("%s: set the TBEN state to %d\n", __func__, > > +level); > > +/* XXX: TODO */ > > Hrm - what is this? Ah, drat. I forgot about this. The POWER7 interrupt stuff I copied from 970 and them modified minimally to get it working. I meant to get around to auditing this stuff to see what was actually relevant to POWER7. I'll address this for the next version. [snip] > > +#if !defined(CONFIG_USER_ONLY) > > +env->slb_nr = 32; > > POWER7 has 64, no? Please check this :). Nope. POWER4 and POWER5 have 64, but POWER7 has 32. This one I did check and change. -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
[Qemu-devel] [PATCH v2] PS/2 keyboard Scancode Set 3 support
The following patch adds PS/2 keyboard Scancode Set 3 support. Sign-off-by: Roy Tam -- v2: checkpatch.pl style fixes diff --git a/hw/ps2.c b/hw/ps2.c index 762bb00..6bea0ef 100644 --- a/hw/ps2.c +++ b/hw/ps2.c @@ -143,13 +143,85 @@ static void ps2_put_keycode(void *opaque, int keycode) { PS2KbdState *s = opaque; -/* XXX: add support for scancode sets 1 and 3 */ -if (!s->translate && keycode < 0xe0 && s->scancode_set == 2) - { +/* XXX: add support for scancode sets 1 */ +if (!s->translate && keycode < 0xe0 && s->scancode_set > 1) { if (keycode & 0x80) ps2_queue(&s->common, 0xf0); keycode = ps2_raw_keycode[keycode & 0x7f]; - } +if (s->scancode_set == 3) { +switch (keycode) { +case 0x1: +keycode = 0x47; +break; +case 0x3: +keycode = 0x27; +break; +case 0x4: +keycode = 0x17; +break; +case 0x5: +keycode = 0x7; +break; +case 0x6: +keycode = 0xf; +break; +case 0x7: +keycode = 0x5e; +break; +case 0x9: +keycode = 0x4f; +break; +case 0xa: +keycode = 0x3f; +break; +case 0xb: +keycode = 0x2f; +break; +case 0xc: +keycode = 0x1f; +break; +case 0x11: +keycode = 0x19; +break; +case 0x14: +keycode = 0x11; +break; +case 0x58: +keycode = 0x14; +break; +case 0x5d: +keycode = 0x5c; +break; +case 0x76: +keycode = 0x8; +break; +case 0x77: +keycode = 0x76; +break; +case 0x78: +keycode = 0x56; +break; +case 0x79: +keycode = 0x7c; +break; +case 0x7b: +keycode = 0x84; +break; +case 0x7c: +keycode = 0x7e; +break; +case 0x7e: +keycode = 0x5f; +break; +case 0x83: +keycode = 0x37; +break; +case 0x84: +keycode = 0x57; +break; +} +} +} ps2_queue(&s->common, keycode); }
Re: [Qemu-devel] Re: [PATCH 15/15] Implement the bus structure for PAPR virtual IO
On Sun, Feb 13, 2011 at 10:08:23AM +0200, Blue Swirl wrote: > On Sun, Feb 13, 2011 at 1:15 AM, Benjamin Herrenschmidt > wrote: > > On Sun, 2011-02-13 at 00:52 +0200, Blue Swirl wrote: > >> On Sat, Feb 12, 2011 at 11:00 PM, Benjamin Herrenschmidt > >> wrote: > >> > On Sat, 2011-02-12 at 18:59 +0200, Blue Swirl wrote: > >> >> > >> >> Actually I don't quite understand the need for vty layer, why not use > >> >> the chardev here directly? > >> > > >> > I'm not sure what you mean here... > >> > >> Maybe it would be reasonable to leave h_put_term_char to spapr_hcall.c > >> instead of moving those to a separate file. > > > > Well, the VIO device instance gives the chardev instance which is all > > nicely encapsulated inside spapr-vty... Also VIO devices tend to have > > dedicated hcalls, not only VTY, so it makes a lot of sense to keep them > > close to the rest of the VIO driver they belong to don't you think ? > > > > (Actually veth does, vscsi uses the more "generic" CRQ stuff which we've > > added to the core VIO but you'll see that when we get those patches > > ready, hopefully soon). > > This is a bit of a special case, much like semihosting modes for m68k > or ARM, or like MOL hacks which were removed recently. From QEMU point > of view, the most natural way of handling this would be hypervisor > implemented in the guest side (for example BIOS). Then the hypervisor > would use normal IO (or virtio) to communicate with the host. If > inside QEMU, the interface of the hypervisor to the devices needs some > thought. We'd like to avoid ugly interfaces like vmmouse where a > device probes CPU registers directly or spaghetti interfaces like > APIC. I really don't follow what you're saying here. Running the hypervisor in the guest, rather than emulating its effect in qemu seems like an awful lot of complexity for no clear reason. > > Actually, one thing I noticed is that the current patches David posted > > still have a single function with a switch/case statement for hcalls. > > > > I'm not 100% certain what David long term plans are here, but in our > > internal "WIP" tree, we've subsequently turned that into a mechanism > > where any module can call powerpc_register_hypercall() to add hcalls. > > > > So if David intends to move the "upstream candidate" tree in that > > direction, then naturally, the calls in spapr_hcall.c are going to > > disappear in favor of a pair of powerpc_register_hypercall() locally in > > the vty module. > > Is the interface new design, or are you implementing what is used also > on real HW? The interface already exists on real HW. It's described in the PAPR document we keep mentioning. -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
Re: [Qemu-devel] Re: [PATCH 15/15] Implement the bus structure for PAPR virtual IO
On Sun, Feb 13, 2011 at 10:15:03AM +1100, Benjamin Herrenschmidt wrote: > On Sun, 2011-02-13 at 00:52 +0200, Blue Swirl wrote: > > On Sat, Feb 12, 2011 at 11:00 PM, Benjamin Herrenschmidt [snip] > Actually, one thing I noticed is that the current patches David posted > still have a single function with a switch/case statement for hcalls. > > I'm not 100% certain what David long term plans are here, but in our > internal "WIP" tree, we've subsequently turned that into a mechanism > where any module can call powerpc_register_hypercall() to add hcalls. > > So if David intends to move the "upstream candidate" tree in that > direction, then naturally, the calls in spapr_hcall.c are going to > disappear in favor of a pair of powerpc_register_hypercall() locally in > the vty module. Ah, yeah. I'm still not sure what to do about it. I was going to fold the dynamic hcall registration into the patch set before upstreaming. But then something paulus said made me rethink whether the dynamic registration was a good idea. Still need to sort this out before the series is really ready. -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
Re: [Qemu-devel] Re: [PATCH 15/15] Implement the bus structure for PAPR virtual IO
On Sat, Feb 12, 2011 at 05:47:53PM +0100, Alexander Graf wrote: > On 12.02.2011, at 15:54, David Gibson wrote: [snip] > > @@ -267,6 +295,7 @@ static QEMUMachine spapr_machine = { > > .desc = "pSeries Logical Partition (PAPR compliant)", > > .init = ppc_spapr_init, > > .max_cpus = 1, > > +.no_parallel = 1, > > duplicate? Oops, rebasing mistake. Fixed now. [snip] > > +VIOsPAPRDevice *spapr_vio_find_by_reg(VIOsPAPRBus *bus, uint32_t reg) > > +{ > > +DeviceState *qdev; > > +VIOsPAPRDevice *dev = NULL; > > + > > +QLIST_FOREACH(qdev, &bus->bus.children, sibling) { > > +dev = (VIOsPAPRDevice *)qdev; > > +if (dev->reg == reg) > > Braces > > > +break; > > +} > > + > > +return dev; > > What if the device doesn't exist? This returns NULL, the caller returns an error... -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
Re: [Qemu-devel] [PATCH] eepro100: pad to ensure minimum packet size
Am 11.02.2011 20:36, schrieb Bruce Rogers: Recent gpxe e100pro drivers will drop small packets because the emulated nic will report an error for small frames. In the qemu model we should instead have the e100pro pad out the received frames to be the minimum size and not report this case as an error. Signed-off-by: Bruce Rogers --- hw/eepro100.c | 23 +-- 1 files changed, 13 insertions(+), 10 deletions(-) diff --git a/hw/eepro100.c b/hw/eepro100.c index edf48f6..03b6934 100644 --- a/hw/eepro100.c +++ b/hw/eepro100.c @@ -1645,6 +1645,8 @@ static int nic_can_receive(VLANClientState *nc) #endif } +#define MIN_BUF_SIZE 60 /* Min. octets in an ethernet frame sans FCS */ + static ssize_t nic_receive(VLANClientState *nc, const uint8_t * buf, size_t size) { /* TODO: @@ -1653,6 +1655,7 @@ static ssize_t nic_receive(VLANClientState *nc, const uint8_t * buf, size_t size */ EEPRO100State *s = DO_UPCAST(NICState, nc, nc)->opaque; uint16_t rfd_status = 0xa000; + uint8_t min_buf[MIN_BUF_SIZE]; static const uint8_t broadcast_macaddr[6] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff }; @@ -1660,15 +1663,15 @@ static ssize_t nic_receive(VLANClientState *nc, const uint8_t * buf, size_t size /* CSMA is disabled. */ logout("%p received while CSMA is disabled\n", s); return -1; - } else if (size < 64 && (s->configuration[7] & BIT(0))) { - /* Short frame and configuration byte 7/0 (discard short receive) set: - * Short frame is discarded */ - logout("%p received short frame (%zu byte)\n", s, size); - s->statistics.rx_short_frame_errors++; -#if 0 - return -1; -#endif - } else if ((size > MAX_ETH_FRAME_SIZE + 4) && !(s->configuration[18] & BIT(3))) { + } + /* Pad to minimum Ethernet frame length */ + if (size < sizeof(min_buf)) { + memcpy(min_buf, buf, size); + memset(&min_buf[size], 0, sizeof(min_buf) - size); + buf = min_buf; + size = sizeof(min_buf); + } + if ((size > MAX_ETH_FRAME_SIZE + 4) && !(s->configuration[18] & BIT(3))) { /* Long frame and configuration byte 18/3 (long receive ok) not set: * Long frames are discarded. */ logout("%p received long frame (%zu byte), ignored\n", s, size); @@ -1744,7 +1747,7 @@ static ssize_t nic_receive(VLANClientState *nc, const uint8_t * buf, size_t size "(%zu bytes); data truncated\n", rfd_size, size); size = rfd_size; } - if (size < 64) { + if (size < MIN_BUF_SIZE) { rfd_status |= 0x0080; } TRACE(OTHER, logout("command 0x%04x, link 0x%08x, addr 0x%08x, size %u\n", Could you please give more details of the test scenario which fails? I'd like to reproduce it here and find a better solution. The configuration bit "discard short frame" exists in real hardware, so removing the code which emulates this behavior is not the correct solution - even if it helps in a special case. No acknowledge for this patch from me. Regards, Stefan Weil
Re: [Qemu-devel] Re: [PATCH 15/15] Implement the bus structure for PAPR virtual IO
On Sun, Feb 13, 2011 at 1:12 PM, David Gibson wrote: > On Sun, Feb 13, 2011 at 10:08:23AM +0200, Blue Swirl wrote: >> On Sun, Feb 13, 2011 at 1:15 AM, Benjamin Herrenschmidt >> wrote: >> > On Sun, 2011-02-13 at 00:52 +0200, Blue Swirl wrote: >> >> On Sat, Feb 12, 2011 at 11:00 PM, Benjamin Herrenschmidt >> >> wrote: >> >> > On Sat, 2011-02-12 at 18:59 +0200, Blue Swirl wrote: >> >> >> >> >> >> Actually I don't quite understand the need for vty layer, why not use >> >> >> the chardev here directly? >> >> > >> >> > I'm not sure what you mean here... >> >> >> >> Maybe it would be reasonable to leave h_put_term_char to spapr_hcall.c >> >> instead of moving those to a separate file. >> > >> > Well, the VIO device instance gives the chardev instance which is all >> > nicely encapsulated inside spapr-vty... Also VIO devices tend to have >> > dedicated hcalls, not only VTY, so it makes a lot of sense to keep them >> > close to the rest of the VIO driver they belong to don't you think ? >> > >> > (Actually veth does, vscsi uses the more "generic" CRQ stuff which we've >> > added to the core VIO but you'll see that when we get those patches >> > ready, hopefully soon). >> >> This is a bit of a special case, much like semihosting modes for m68k >> or ARM, or like MOL hacks which were removed recently. From QEMU point >> of view, the most natural way of handling this would be hypervisor >> implemented in the guest side (for example BIOS). Then the hypervisor >> would use normal IO (or virtio) to communicate with the host. If >> inside QEMU, the interface of the hypervisor to the devices needs some >> thought. We'd like to avoid ugly interfaces like vmmouse where a >> device probes CPU registers directly or spaghetti interfaces like >> APIC. > > I really don't follow what you're saying here. Running the hypervisor > in the guest, rather than emulating its effect in qemu seems like an > awful lot of complexity for no clear reason. Maybe it would be more complex but also emulation accuracy would be increased and the interfaces would be saner. We don't shortcut BIOS and implement its services to OS in QEMU for other machines either. I'd expect one problem with that approach though, the interface used on real HW between the hypervisor and the underlying HW may be undocumented, but then it could use for example existing virtio devices. One way to handle this could be to add the hypervisor interface now to QEMU and switch to guest hypervisor when (if) it becomes available. I'd just like to avoid duplication with virtio or messy interfaces like vmport.
[Qemu-devel] Re: [PATCH 15/15] Implement the bus structure for PAPR virtual IO
On 13.02.2011, at 09:08, Blue Swirl wrote: > On Sun, Feb 13, 2011 at 1:15 AM, Benjamin Herrenschmidt > wrote: >> On Sun, 2011-02-13 at 00:52 +0200, Blue Swirl wrote: >>> On Sat, Feb 12, 2011 at 11:00 PM, Benjamin Herrenschmidt >>> wrote: On Sat, 2011-02-12 at 18:59 +0200, Blue Swirl wrote: > > Actually I don't quite understand the need for vty layer, why not use > the chardev here directly? I'm not sure what you mean here... >>> >>> Maybe it would be reasonable to leave h_put_term_char to spapr_hcall.c >>> instead of moving those to a separate file. >> >> Well, the VIO device instance gives the chardev instance which is all >> nicely encapsulated inside spapr-vty... Also VIO devices tend to have >> dedicated hcalls, not only VTY, so it makes a lot of sense to keep them >> close to the rest of the VIO driver they belong to don't you think ? >> >> (Actually veth does, vscsi uses the more "generic" CRQ stuff which we've >> added to the core VIO but you'll see that when we get those patches >> ready, hopefully soon). > > This is a bit of a special case, much like semihosting modes for m68k > or ARM, or like MOL hacks which were removed recently. From QEMU point > of view, the most natural way of handling this would be hypervisor > implemented in the guest side (for example BIOS). Then the hypervisor > would use normal IO (or virtio) to communicate with the host. If > inside QEMU, the interface of the hypervisor to the devices needs some > thought. We'd like to avoid ugly interfaces like vmmouse where a > device probes CPU registers directly or spaghetti interfaces like > APIC. In this case I disagree. While the "emulate real hardware" case would be to have a full proprietary binary blob of firmware running in the guest that would handle all the hypervisor stuff, this is not feasible. Implementing PAPR in Qemu is hard (and slow) enough - doing it all emulated simply is overkill for what we're trying to achieve here. The PAPR interfaces are well specified (in the PAPR spec - seems to be power.org member restricted) and are the only thing you ever get to see on recent POWER hardware. The real hardware interface is simply inaccessible to you. It's basically the same as the S390 machine we have. On those machine we simply don't have access to real hw, so emulating it is moot. All interfaces that the OS sees are PV interfaces. > >> Actually, one thing I noticed is that the current patches David posted >> still have a single function with a switch/case statement for hcalls. >> >> I'm not 100% certain what David long term plans are here, but in our >> internal "WIP" tree, we've subsequently turned that into a mechanism >> where any module can call powerpc_register_hypercall() to add hcalls. >> >> So if David intends to move the "upstream candidate" tree in that >> direction, then naturally, the calls in spapr_hcall.c are going to >> disappear in favor of a pair of powerpc_register_hypercall() locally in >> the vty module. > > Is the interface new design, or are you implementing what is used also > on real HW? PAPR is the spec that defines the PV interface in use on POWER. Outside of IBM, nobody can run Linux on POWER without going through phyp which is their hypervisor. So this implements exactly what the OS sees on real HW :). Alex
Re: [Qemu-devel] Re: [PATCH 09/15] Parse SDR1 on mtspr instead of at translate time
On 13.02.2011, at 10:02, David Gibson wrote: > On Sat, Feb 12, 2011 at 04:37:46PM +0100, Alexander Graf wrote: >> On 12.02.2011, at 15:54, David Gibson wrote: > [snip] >>> +#define SDR_HTABORG_32 0xUL >>> +#define SDR_HTABMASK 0x01FFUL >> >> Please mark this constant as ppc32 >> >>> + >>> +#if defined(TARGET_PPC64) >>> +#define SDR_HTABORG_64 0xFFFCULL >>> +#define SDR_HTABSIZE 0x001FULL >> >> Please mark this constant as ppc64 > > Um.. I'm not sure what you mean by this. Well, while to you SDR_HTABMASK and SDR_HTABSIZE are "obviously" meant for ppc32/ppc64 respectively, the average code reader won't know the difference. What I'm proposing is: #define SDR_32_HTABORG #define SDR_32_HTABMASK #define SDR_64_HTABORG #define SDR_64_HTABSIZE This way it's a lot more obvious that the two constants really belong to two completely different semantics :). Alex
Re: [Qemu-devel] Re: [PATCH 12/15] Support 1T segments on ppc
On 13.02.2011, at 10:34, David Gibson wrote: > On Sat, Feb 12, 2011 at 04:57:39PM +0100, Alexander Graf wrote: >> On 12.02.2011, at 15:54, David Gibson wrote: > [snip] >>> +if (rb & (0x1000 - env->slb_nr)) >> >> Braces... > > Oops, yeah. These later patches in the series I haven't really > audited for coding style adequately yet. I'll fix these before the > next version. > > [snip] >>> + return -1; /* 1T segment on MMU that doesn't support it */ >>> + >>> +/* We stuff a copy of the B field into slb->esid to simplify >>> + * lookup later */ >>> +slb->esid = (rb & (SLB_ESID_ESID | SLB_ESID_V)) | >>> +(rs >> SLB_VSID_SSIZE_SHIFT); >> >> Wouldn't it be easier to add another field? > > Easier for what? The reason I put these bits in here is that the rest > of the things slb_lookup() needs to scan for are all in the esid > field, so putting B in there means slb_lookup() needs only one > comparison per-slot, per segment size. Hrm - but it also needs random & ~3 masking in other code which is very unpretty. Comparing two numbers really shouldn't hurt performance too much, but makes the code better maintainable. struct slb_entry { uint64_t esid; uint64_t vsid; int b; } or so :). Alex
Re: [Qemu-devel] Re: [PATCH 13/15] Add POWER7 support for ppc
On 13.02.2011, at 10:39, David Gibson wrote: > On Sat, Feb 12, 2011 at 05:09:39PM +0100, Alexander Graf wrote: >> On 12.02.2011, at 15:54, David Gibson wrote: > [snip] >>> +/* Don't generate spurious events */ >>> +if ((cur_level == 1 && level == 0) || (cur_level == 0 && level != 0)) { >> >> Did you hit this? Qemu's irq framework should already ensure that >> property. I'm also not sure it's actually correct - if a level >> interrupt is on, the guest would get another interrupt injected, no? >> That would be cur_level ==1 && level == 1 IIUC. > > [snip] >>> +case POWER7_INPUT_CKSTP: >> >> POWER7 has checkstop? > > [snip] >>> +case POWER7_INPUT_HRESET: >> >> Does this ever get triggered? POWER7 is run in lpar only, so there is no >> hreset, right? > > [snip] >>> +case POWER7_INPUT_TBEN: >>> +LOG_IRQ("%s: set the TBEN state to %d\n", __func__, >>> +level); >>> +/* XXX: TODO */ >> >> Hrm - what is this? > > Ah, drat. I forgot about this. The POWER7 interrupt stuff I copied > from 970 and them modified minimally to get it working. I meant to > get around to auditing this stuff to see what was actually relevant to > POWER7. I'll address this for the next version. > > [snip] >>> +#if !defined(CONFIG_USER_ONLY) >>> +env->slb_nr = 32; >> >> POWER7 has 64, no? Please check this :). > > Nope. POWER4 and POWER5 have 64, but POWER7 has 32. This one I did > check and change. Oh? Interesting. Good to know :) Alex
Re: [Qemu-devel] Re: [PATCH 15/15] Implement the bus structure for PAPR virtual IO
On 13.02.2011, at 12:09, David Gibson wrote: > On Sat, Feb 12, 2011 at 05:47:53PM +0100, Alexander Graf wrote: >> On 12.02.2011, at 15:54, David Gibson wrote: > [snip] >>> @@ -267,6 +295,7 @@ static QEMUMachine spapr_machine = { >>>.desc = "pSeries Logical Partition (PAPR compliant)", >>>.init = ppc_spapr_init, >>>.max_cpus = 1, >>> +.no_parallel = 1, >> >> duplicate? > > Oops, rebasing mistake. Fixed now. > > [snip] >>> +VIOsPAPRDevice *spapr_vio_find_by_reg(VIOsPAPRBus *bus, uint32_t reg) >>> +{ >>> +DeviceState *qdev; >>> +VIOsPAPRDevice *dev = NULL; >>> + >>> +QLIST_FOREACH(qdev, &bus->bus.children, sibling) { >>> +dev = (VIOsPAPRDevice *)qdev; >>> +if (dev->reg == reg) >> >> Braces >> >>> +break; >>> +} >>> + >>> +return dev; >> >> What if the device doesn't exist? > > This returns NULL, the caller returns an error... Makes sense :). Alex
Re: [Qemu-devel] Re: [PATCH 15/15] Implement the bus structure for PAPR virtual IO
On 13.02.2011, at 12:14, David Gibson wrote: > On Sun, Feb 13, 2011 at 10:15:03AM +1100, Benjamin Herrenschmidt wrote: >> On Sun, 2011-02-13 at 00:52 +0200, Blue Swirl wrote: >>> On Sat, Feb 12, 2011 at 11:00 PM, Benjamin Herrenschmidt > [snip] >> Actually, one thing I noticed is that the current patches David posted >> still have a single function with a switch/case statement for hcalls. >> >> I'm not 100% certain what David long term plans are here, but in our >> internal "WIP" tree, we've subsequently turned that into a mechanism >> where any module can call powerpc_register_hypercall() to add hcalls. >> >> So if David intends to move the "upstream candidate" tree in that >> direction, then naturally, the calls in spapr_hcall.c are going to >> disappear in favor of a pair of powerpc_register_hypercall() locally in >> the vty module. > > Ah, yeah. I'm still not sure what to do about it. I was going to > fold the dynamic hcall registration into the patch set before > upstreaming. But then something paulus said made me rethink whether > the dynamic registration was a good idea. Still need to sort this out > before the series is really ready. We can surely move it to dynamic later on. I think the "proper" way would be to populate a qdev bus and have the individual hypercall receivers register themselves through -device creations. But Blue really is the expert here :). Alex
Re: [Qemu-devel] Re: [PATCH 15/15] Implement the bus structure for PAPR virtual IO
On Sun, Feb 13, 2011 at 01:40:14PM +0100, Alexander Graf wrote: > > On 13.02.2011, at 12:14, David Gibson wrote: > > > On Sun, Feb 13, 2011 at 10:15:03AM +1100, Benjamin Herrenschmidt wrote: > >> On Sun, 2011-02-13 at 00:52 +0200, Blue Swirl wrote: > >>> On Sat, Feb 12, 2011 at 11:00 PM, Benjamin Herrenschmidt > > [snip] > >> Actually, one thing I noticed is that the current patches David posted > >> still have a single function with a switch/case statement for hcalls. > >> > >> I'm not 100% certain what David long term plans are here, but in our > >> internal "WIP" tree, we've subsequently turned that into a mechanism > >> where any module can call powerpc_register_hypercall() to add hcalls. > >> > >> So if David intends to move the "upstream candidate" tree in that > >> direction, then naturally, the calls in spapr_hcall.c are going to > >> disappear in favor of a pair of powerpc_register_hypercall() locally in > >> the vty module. > > > > Ah, yeah. I'm still not sure what to do about it. I was going to > > fold the dynamic hcall registration into the patch set before > > upstreaming. But then something paulus said made me rethink whether > > the dynamic registration was a good idea. Still need to sort this out > > before the series is really ready. > > We can surely move it to dynamic later on. I think the "proper" way > would be to populate a qdev bus and have the individual hypercall > receivers register themselves through -device creations. But Blue > really is the expert here :). Ok, not sure what you mean here. I already have a qdev bus for the VIO devices. With my tentative dynamic model as devices are created on the bus they may register hypercalls as well. Is that what you mean, or do you mean have a separate "hypercall" bus. That sounds like serious overkill for a simple number->function translation. -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
Re: [Qemu-devel] Re: [PATCH 05/15] Implement PowerPC slbmfee and slbmfev instructions
On Sat, Feb 12, 2011 at 04:23:39PM +0100, Alexander Graf wrote: > On 12.02.2011, at 15:54, David Gibson wrote: [snip] > > +target_ulong helper_load_slb_esid (target_ulong rb) > > +{ > > +target_ulong rt; > > + > > +if (ppc_load_slb_esid(env, rb, &rt) < 0) { > > +helper_raise_exception_err(POWERPC_EXCP_PROGRAM, > > POWERPC_EXCP_INVAL); > > The spec doesn't say what to do in this case. Have you checked what > real hardware does? Erm, I don't think I've checked this specific case, on this specific CPU. Generally I've found that invalid parameters to MMU management instructions results in invalid instruction program checks, so I assumed that's what would happen in this case. -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
[Qemu-devel] Re: [PATCH 15/15] Implement the bus structure for PAPR virtual IO
On Sun, Feb 13, 2011 at 2:31 PM, Alexander Graf wrote: > > On 13.02.2011, at 09:08, Blue Swirl wrote: > >> On Sun, Feb 13, 2011 at 1:15 AM, Benjamin Herrenschmidt >> wrote: >>> On Sun, 2011-02-13 at 00:52 +0200, Blue Swirl wrote: On Sat, Feb 12, 2011 at 11:00 PM, Benjamin Herrenschmidt wrote: > On Sat, 2011-02-12 at 18:59 +0200, Blue Swirl wrote: >> >> Actually I don't quite understand the need for vty layer, why not use >> the chardev here directly? > > I'm not sure what you mean here... Maybe it would be reasonable to leave h_put_term_char to spapr_hcall.c instead of moving those to a separate file. >>> >>> Well, the VIO device instance gives the chardev instance which is all >>> nicely encapsulated inside spapr-vty... Also VIO devices tend to have >>> dedicated hcalls, not only VTY, so it makes a lot of sense to keep them >>> close to the rest of the VIO driver they belong to don't you think ? >>> >>> (Actually veth does, vscsi uses the more "generic" CRQ stuff which we've >>> added to the core VIO but you'll see that when we get those patches >>> ready, hopefully soon). >> >> This is a bit of a special case, much like semihosting modes for m68k >> or ARM, or like MOL hacks which were removed recently. From QEMU point >> of view, the most natural way of handling this would be hypervisor >> implemented in the guest side (for example BIOS). Then the hypervisor >> would use normal IO (or virtio) to communicate with the host. If >> inside QEMU, the interface of the hypervisor to the devices needs some >> thought. We'd like to avoid ugly interfaces like vmmouse where a >> device probes CPU registers directly or spaghetti interfaces like >> APIC. > > In this case I disagree. While the "emulate real hardware" case would be to > have a full proprietary binary blob of firmware running in the guest that > would handle all the hypervisor stuff, this is not feasible. Implementing > PAPR in Qemu is hard (and slow) enough - doing it all emulated simply is > overkill for what we're trying to achieve here. > > The PAPR interfaces are well specified (in the PAPR spec - seems to be > power.org member restricted) and are the only thing you ever get to see on > recent POWER hardware. The real hardware interface is simply inaccessible to > you. Hah, I'm sure that if sufficiently motivated bunch of cool hackers got access to a truckload (or several, aren't these big machines?) of POWER hardware, which they could open and brick at leisure, like it is done with other firmware reverse engineering efforts, we'd have open hypervisor at no time. ;-) It's not impossible to imagine also IBM open sourcing theirs. > It's basically the same as the S390 machine we have. On those machine we > simply don't have access to real hw, so emulating it is moot. All interfaces > that the OS sees are PV interfaces. > >> >>> Actually, one thing I noticed is that the current patches David posted >>> still have a single function with a switch/case statement for hcalls. >>> >>> I'm not 100% certain what David long term plans are here, but in our >>> internal "WIP" tree, we've subsequently turned that into a mechanism >>> where any module can call powerpc_register_hypercall() to add hcalls. >>> >>> So if David intends to move the "upstream candidate" tree in that >>> direction, then naturally, the calls in spapr_hcall.c are going to >>> disappear in favor of a pair of powerpc_register_hypercall() locally in >>> the vty module. >> >> Is the interface new design, or are you implementing what is used also >> on real HW? > > PAPR is the spec that defines the PV interface in use on POWER. Outside of > IBM, nobody can run Linux on POWER without going through phyp which is their > hypervisor. > > So this implements exactly what the OS sees on real HW :). Right. As long as the resulting spaghetti is well contained, I'm OK with this approach. But should ever the millionaire hacker team (or IBM) appear with their open hypervisor, this shall make room for it.
Re: [Qemu-devel] Re: [PATCH 15/15] Implement the bus structure for PAPR virtual IO
On 13.02.2011, at 13:44, David Gibson wrote: > On Sun, Feb 13, 2011 at 01:40:14PM +0100, Alexander Graf wrote: >> >> On 13.02.2011, at 12:14, David Gibson wrote: >> >>> On Sun, Feb 13, 2011 at 10:15:03AM +1100, Benjamin Herrenschmidt wrote: On Sun, 2011-02-13 at 00:52 +0200, Blue Swirl wrote: > On Sat, Feb 12, 2011 at 11:00 PM, Benjamin Herrenschmidt >>> [snip] Actually, one thing I noticed is that the current patches David posted still have a single function with a switch/case statement for hcalls. I'm not 100% certain what David long term plans are here, but in our internal "WIP" tree, we've subsequently turned that into a mechanism where any module can call powerpc_register_hypercall() to add hcalls. So if David intends to move the "upstream candidate" tree in that direction, then naturally, the calls in spapr_hcall.c are going to disappear in favor of a pair of powerpc_register_hypercall() locally in the vty module. >>> >>> Ah, yeah. I'm still not sure what to do about it. I was going to >>> fold the dynamic hcall registration into the patch set before >>> upstreaming. But then something paulus said made me rethink whether >>> the dynamic registration was a good idea. Still need to sort this out >>> before the series is really ready. >> >> We can surely move it to dynamic later on. I think the "proper" way >> would be to populate a qdev bus and have the individual hypercall >> receivers register themselves through -device creations. But Blue >> really is the expert here :). > > Ok, not sure what you mean here. I already have a qdev bus for the > VIO devices. With my tentative dynamic model as devices are created > on the bus they may register hypercalls as well. Oh, guess I just overlooked that then, sorry :). > Is that what you mean, or do you mean have a separate "hypercall" > bus. That sounds like serious overkill for a simple number->function > translation. Yup. Alex
Re: [Qemu-devel] Re: [PATCH 09/15] Parse SDR1 on mtspr instead of at translate time
On Sun, Feb 13, 2011 at 01:33:44PM +0100, Alexander Graf wrote: > > On 13.02.2011, at 10:02, David Gibson wrote: > > > On Sat, Feb 12, 2011 at 04:37:46PM +0100, Alexander Graf wrote: > >> On 12.02.2011, at 15:54, David Gibson wrote: > > [snip] > >>> +#define SDR_HTABORG_32 0xUL > >>> +#define SDR_HTABMASK 0x01FFUL > >> > >> Please mark this constant as ppc32 > >> > >>> + > >>> +#if defined(TARGET_PPC64) > >>> +#define SDR_HTABORG_64 0xFFFCULL > >>> +#define SDR_HTABSIZE 0x001FULL > >> > >> Please mark this constant as ppc64 > > > > Um.. I'm not sure what you mean by this. > > Well, while to you SDR_HTABMASK and SDR_HTABSIZE are "obviously" > meant for ppc32/ppc64 respectively, the average code reader won't > know the difference. What I'm proposing is: > > #define SDR_32_HTABORG > #define SDR_32_HTABMASK > > #define SDR_64_HTABORG > #define SDR_64_HTABSIZE > > This way it's a lot more obvious that the two constants really > belong to two completely different semantics :). Ah! I see. Done, I'll have this in the next cut. -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
Re: [Qemu-devel] Re: [PATCH 12/15] Support 1T segments on ppc
On Sun, Feb 13, 2011 at 01:37:12PM +0100, Alexander Graf wrote: > On 13.02.2011, at 10:34, David Gibson wrote: > > On Sat, Feb 12, 2011 at 04:57:39PM +0100, Alexander Graf wrote: > >> On 12.02.2011, at 15:54, David Gibson wrote: > > [snip] > >>> +if (rb & (0x1000 - env->slb_nr)) > >> > >> Braces... > > > > Oops, yeah. These later patches in the series I haven't really > > audited for coding style adequately yet. I'll fix these before the > > next version. > > > > [snip] > >>> + return -1; /* 1T segment on MMU that doesn't support it */ > >>> + > >>> +/* We stuff a copy of the B field into slb->esid to simplify > >>> + * lookup later */ > >>> +slb->esid = (rb & (SLB_ESID_ESID | SLB_ESID_V)) | > >>> +(rs >> SLB_VSID_SSIZE_SHIFT); > >> > >> Wouldn't it be easier to add another field? > > > > Easier for what? The reason I put these bits in here is that the rest > > of the things slb_lookup() needs to scan for are all in the esid > > field, so putting B in there means slb_lookup() needs only one > > comparison per-slot, per segment size. > > Hrm - but it also needs random & ~3 masking in other code which is > very unpretty. Comparing two numbers really shouldn't hurt > performance too much, but makes the code better maintainable. Well, it's only one place. But fair enough, I'll avoid this hack in the next version. > struct slb_entry { > uint64_t esid; > uint64_t vsid; > int b; > } > > or so :). Actually, we don't even need that. The B field is already in slb->vsid. -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
[Qemu-devel] Re: Problem with DOS application and 286 DOS Extender application
Hello, After some fortune I found out that also Turbo Debugger 286 doesn't work under plain DOS 6.22 (without any memory mananger just pressing F5) or with some memory mananagers (HIMEM.SYS, EMM386, QEMM386). Error message is: Error 266 loading D:\DIR\TD286.EXE into extended memory. So it looks like that there is a major issue with extended memory. Any ideas how to fix or how to find the problem and fix it? Version is latest seabios and QEMU from git as of now (own builds). I'm pretty sure that it is the same reason that the 286 DOS Extender application doesn't work. For full reference of the previously discussed have a look here: http://www.mail-archive.com/qemu-devel@nongnu.org/msg29518.html http://www.mail-archive.com/qemu-devel@nongnu.org/msg29465.html Thnx. Ciao, Gerhard -- http://www.wiesinger.com/ On Wed, 14 Apr 2010, Jan Kiszka wrote: Jamie Lokier wrote: Gerhard Wiesinger wrote: It is a non public, proprietary application which uses the Ergo Computing 286 DOS Extender. I guess some other application which use the same DOS extender have the same problem. So best thing is to find another application which uses the Ergo Computing 286 DOS Extender, too. The 286 was obsolete 20 years ago, although code depending on it persisted for some years after. I'm fairly sure the number of people using (or trying to use) Qemu with 286-specific code is very small indeed, so unfortunately for a 286 problem, you will need to help reproduce it as much as you can for it to be fixed. In some scenarios, we use QEMU in emulation mode for such a legacy guest (16-bit protected mode), but we mostly run it in KVM mode these days. It works fairly well under QEMU, but also we did not explore all corner cases. Note that Qemu doesn't emulate segments properly even for 32-bit x86 code, and 16-bit (286) code depends on that all the more. That may be the problem. More precisely: QEMU does not check for segment limits. This can be a problem with buggy or pedantic guests, but usually one tried to avoid triggering this anyway. I once wrote a crude patch to add this, but it had significant performance impact and did not properly make use of the TCG to optimize the checks. You'll find it in the archives (but I guess it no longer applies). Or it may be the "reset using keyboard controller and BIOS" method used to switch from protected mode to real mode on a 286 is not implemented properly, or is not supported by the BIOS properly. Or it may simply be a bug in 16-bit task segment switching or something like that, which is quite complex and so rarely used that it might never have been properly tested. Task switching looks fairly stable in QEMU (in contrast to KVM where we just ran into some more corner cases). Did you try running the application under Bochs, which has a more accurate emulation of very old x86 CPUs? -- Jamie That said, having some test case to reproduce the issue is essential. I'm willing to have a look if you can provide such thing (publicly or privately). Before that, you could already try building QEMU with --enable-debug and run it with "-d exec,int". The generated /tmp/qemu.log may point out where things go wrong (usually where faults starts to occur). Jan
[Qemu-devel] Re: Problem with DOS application and 286 DOS Extender application
On Sun, Feb 13, 2011 at 03:06:44PM +0100, Gerhard Wiesinger wrote: > Hello, > > After some fortune I found out that also Turbo Debugger 286 doesn't > work under plain DOS 6.22 (without any memory mananger just pressing > F5) or with some memory mananagers (HIMEM.SYS, EMM386, QEMM386). > > Error message is: > Error 266 loading D:\DIR\TD286.EXE into extended memory. > > So it looks like that there is a major issue with extended memory. > Any ideas how to fix or how to find the problem and fix it? It would help if you could post the seabios log. The easiest way to get at that is to add the following to the qemu command line: -chardev stdio,id=seabios -device isa-debugcon,iobase=0x402,chardev=seabios It's also possible to recompile seabios with the debug level increased to get more info on specific calls. -Kevin
Re: [Qemu-devel] Re: [PATCH 15/15] Implement the bus structure for PAPR virtual IO
On 02/13/2011 05:12 AM, David Gibson wrote: On Sun, Feb 13, 2011 at 10:08:23AM +0200, Blue Swirl wrote: On Sun, Feb 13, 2011 at 1:15 AM, Benjamin Herrenschmidt wrote: On Sun, 2011-02-13 at 00:52 +0200, Blue Swirl wrote: On Sat, Feb 12, 2011 at 11:00 PM, Benjamin Herrenschmidt wrote: On Sat, 2011-02-12 at 18:59 +0200, Blue Swirl wrote: Actually I don't quite understand the need for vty layer, why not use the chardev here directly? I'm not sure what you mean here... Maybe it would be reasonable to leave h_put_term_char to spapr_hcall.c instead of moving those to a separate file. Well, the VIO device instance gives the chardev instance which is all nicely encapsulated inside spapr-vty... Also VIO devices tend to have dedicated hcalls, not only VTY, so it makes a lot of sense to keep them close to the rest of the VIO driver they belong to don't you think ? (Actually veth does, vscsi uses the more "generic" CRQ stuff which we've added to the core VIO but you'll see that when we get those patches ready, hopefully soon). This is a bit of a special case, much like semihosting modes for m68k or ARM, or like MOL hacks which were removed recently. From QEMU point of view, the most natural way of handling this would be hypervisor implemented in the guest side (for example BIOS). Then the hypervisor would use normal IO (or virtio) to communicate with the host. If inside QEMU, the interface of the hypervisor to the devices needs some thought. We'd like to avoid ugly interfaces like vmmouse where a device probes CPU registers directly or spaghetti interfaces like APIC. I really don't follow what you're saying here. Running the hypervisor in the guest, rather than emulating its effect in qemu seems like an awful lot of complexity for no clear reason. In KVM for x86, instead of using a secondary interface (like vmmcall/vmcall), we do all of our paravirtualization using native hardware interfaces that we can trap (PIO/MMIO). IIUC, on Power, trapping MMIO is not possible due to the MMU mode that is currently used (PFs are delivered directly to the guest). So it's not really possible to switch from using hypercalls to using MMIO. What I would suggest is modelling hypercalls as another I/O address space for the processor. So instead of having a function pointer in the CPUState, introduce a: typedef void (HypercallFunc)(CPUState *env, void *opaque); /* register a hypercall handler */ void register_hypercall(target_ulong index, HypercallFunc *handler, void *opaque); void unregister_hypercall(target_ulong index); /* dispatch a hypercall */ void cpu_hypercall(CPUState *env, target_ulong index); This interface could also be used to implement hypercall based interfaces on s390 and x86. The arguments will have to be extracted from the CPU state but I don't think we'll really ever have common hypercall implementations anyway so that's not a huge problem. on real HW? The interface already exists on real HW. It's described in the PAPR document we keep mentioning. Another thing to note is that the hypercall based I/O devices the interfaces that the current Power hypervisor uses so implementing this interface is necessary to support existing guests. Regards, Anthony Liguori
Re: [Qemu-devel] Re: [PATCH 15/15] Implement the bus structure for PAPR virtual IO
On 02/13/2011 06:40 AM, Alexander Graf wrote: Ah, yeah. I'm still not sure what to do about it. I was going to fold the dynamic hcall registration into the patch set before upstreaming. But then something paulus said made me rethink whether the dynamic registration was a good idea. Still need to sort this out before the series is really ready. We can surely move it to dynamic later on. I think the "proper" way would be to populate a qdev bus and have the individual hypercall receivers register themselves through -device creations. Ignore the qdev bit for a moment. Hypercalls could be plausibly implemented as another I/O space from a processor so the thing to model off of would be PIO dispatch (cpu_outb and friends). From a qdev perspective, having a VIO bus makes sense. The details of which I/O spaces are uses are not as important from a device tree perspective. Regards, Anthony Liguori Alex
Re: [Qemu-devel] KVM call minutes for Feb 8
On 02/11/2011 12:14 PM, Blue Swirl wrote: On Thu, Feb 10, 2011 at 6:05 PM, Anthony Liguori wrote: On 02/10/2011 03:20 PM, Gleb Natapov wrote: Jugging by how well all previous conversion went we will end up with one more way of creating devices. One legacy, another qdev and your new one. And what is the problem with qdev again (not that I am a big qdev fan)? We've really been arguing about probably the most minor aspect of the problem with qdev. All I'm really saying is that we shouldn't tie device construction to a factory interface as we do with qdev. That simply means that we should be able to do: RTC *rtc_create(arg1, arg2, arg2); I don't see how that would help at all. Throwing qdev away and just calling various functions directly, with all states exposed would be like QEMU 0.9.0. qdev doesn't expose any state today. qdev properties are construction-only properties that happen to be stored in each device state. What we really need is a full property framework that includes properties with hookable getters and setters along with the ability to mark properties as construct-only, read-only, or read-write. But I think it's reasonable to expose construct-only properties as just an initfn argument. And that a separate piece of code decides which devices are exposed through -device or device_add. Which devices are exposed is really a minor detail. That said, qdev has a number of significant limitations in my mind. The first is that the only relationship between devices is through the BusState interface. There's also qemu_irq for arbitrary signals. Yes, but qemu_irq is very restricted as it only models a signal bit of information and doesn't really have a mechanism to attach/detach in any generic way. I don't think we should even try to have a generic bus model. When you look at how badly broken PCI hotplug is current in qdev, I think this is symptomatic of this. And how should this be fixed? The API change would not help. Just as we have bus level creation functions, we should have bus level hotplug interfaces. There's also no way in qdev to really have polymorphism. Interfaces really aren't meaningful in qdev so you have things like PCIDevice where some methods are stored in the object instead of the class dispatch table and you have overuse of static class members. QEMU is developed in C, not C++. But we're trying to do object oriented programming in C so as long as we're doing that, we ought to do it right. And it's all unrelated to VMState. Right, but this has also the good side that not all device state is automatically exported. If other devices would be allowed to muck with a devices internal state freely, bad things could happen. Device reset could also use standard register definitions, shared with VMState. There's a way to have formally verifiable serialization/deserialization if we can satisfy two conditions 1) the devices rely on no global state (i.e. static variables) and 2) every field asssociated with a device is marshalled during serialization/deserialization. When we define a device, right now we say that certain state is writable during construction. It's not a stretch to want to have some properties writable during runtime. If we also had a mechanism to mark certain properties as read-only, but still were able to introspect them, we could implement serialization without having to have a second VMState definition. Compatibility will always require manipulating state, but once you have the state stored in a data structure, you can describe those transformations in a pretty high level fashion. And this is just the basic mechanisms of qdev. The actual implementation is worse. The use of qemu_irq as gpio in the base class and overuse of SystemBus is really quite insane. Maybe qemu_irq should be renamed to QEMUSignal (and I don't like typedeffing pointers), otherwise it looks quite sane to me. Any interfaces of a base class should make sense even for derived classes. That means if the base class is going to expose essentially a pin-out interface, that if I have a PCIDevice and cast it to Device, I should be able to interact with the GPIO interface to interact with the PCI device. Presumably, that means interfacing at the PCI signalling level. That's insane to model in QEMU :-) In reality, GPIO only makes sense for a small class of simple devices where modelling the pin-out interface makes sense (like a 7-segment LCD). That suggests that GPIO should not be in the DeviceState interface but instead should be in a SimpleDevice subclass or something like that. Could you point to examples of SystemBus overuse? anthony@titi:~/git/qemu/hw$ grep qdev_create *.c | wc -l 73 anthony@titi:~/git/qemu/hw$ grep 'qdev_create(NULL' *.c | wc -l 56 SystemBus has become a catch-all for shallow qdev conversions. We've got Northbridges
Re: [Qemu-devel] KVM call minutes for Feb 8
On 02/10/2011 04:29 AM, Avi Kivity wrote: On 02/10/2011 09:47 AM, Anthony Liguori wrote: So very concretely, I'm suggesting we do the following to target-i386: 1) make the i440fx device have an embedded ide controller, piix3, and usb controller that get initialized automatically. The piix3 embeds the PCI-to-ISA bridge along with all of the default ISA devices (rtc, serial, etc.). This I like. 2) get rid of the entire concept of machines. Creating a i440fx is essentially equivalent to creating a bare machine. No, it's not. The 440fx does not include an IOAPIC, for example. There may be other optional components, or differences in wiring, that make two machines with i440fx not identical. The IOAPIC is basically the only other component and I view it as part of the CPU interface to the chipset. But still, if we're creating a machine from scratch: qemu -device i440fx,id=nb -device piix3,id=sb,chipset=nb -device ioapic,id=ioapic,chipset=sb -device cpu,ioapic=ioapic,northbridge=nb Is not all that unreasonable and presents a fully functioning PC. 4) model the CPUs as devices that take a pointer to a host controller, for x86, the normal case would be giving it a pointer to i440fx. Surely the connection is via a bus? An x86 cpu talks to the bus, and there happens to be an 440fx north bridge at the end of it. It could also be a Q35 or something else. I see being on a bus as really just taking a pointer to an interface. So yes, the i440fx would implement a PentiumCpuInterface or something like that and the CPU would take a pointer to a PentiumCpuInterface[1]. This is part of why having proper polymorphism is important. We need it in order to be able to express concepts like interfaces. [1] This is just a Random Bad Name. Don't read anything into it. Regards, Anthony Liguori
Re: [Qemu-devel] KVM call minutes for Feb 8
On 02/10/2011 04:29 AM, Avi Kivity wrote: On 02/10/2011 09:47 AM, Anthony Liguori wrote: So very concretely, I'm suggesting we do the following to target-i386: 1) make the i440fx device have an embedded ide controller, piix3, and usb controller that get initialized automatically. The piix3 embeds the PCI-to-ISA bridge along with all of the default ISA devices (rtc, serial, etc.). This I like. 2) get rid of the entire concept of machines. Creating a i440fx is essentially equivalent to creating a bare machine. No, it's not. The 440fx does not include an IOAPIC, for example. There may be other optional components, or differences in wiring, that make two machines with i440fx not identical. The IOAPIC is basically the only other component and I view it as part of the CPU interface to the chipset. But still, if we're creating a machine from scratch: qemu -device i440fx,id=nb -device piix3,id=sb,chipset=nb -device ioapic,id=ioapic,chipset=sb -device cpu,ioapic=ioapic,northbridge=nb Is not all that unreasonable and presents a fully functioning PC. 4) model the CPUs as devices that take a pointer to a host controller, for x86, the normal case would be giving it a pointer to i440fx. Surely the connection is via a bus? An x86 cpu talks to the bus, and there happens to be an 440fx north bridge at the end of it. It could also be a Q35 or something else. I see being on a bus as really just taking a pointer to an interface. So yes, the i440fx would implement a PentiumCpuInterface or something like that and the CPU would take a pointer to a PentiumCpuInterface[1]. This is part of why having proper polymorphism is important. We need it in order to be able to express concepts like interfaces. [1] This is just a Random Bad Name. Don't read anything into it. Regards, Anthony Liguori
Re: [Qemu-devel] [PATCH 06/10] vmmouse: convert to qdev
On 02/12/2011 11:03 AM, Markus Armbruster wrote: Blue Swirl writes: Convert to qdev, also add a proper reset function. Signed-off-by: Blue Swirl --- hw/pc.c |5 +++-- hw/pc.h |3 --- hw/vmmouse.c | 37 + 3 files changed, 32 insertions(+), 13 deletions(-) diff --git a/hw/pc.c b/hw/pc.c index fcee09a..f66ac5d 100644 --- a/hw/pc.c +++ b/hw/pc.c @@ -1096,7 +1096,7 @@ void pc_basic_device_init(qemu_irq *isa_irq, PITState *pit; qemu_irq rtc_irq = NULL; qemu_irq *a20_line; -ISADevice *i8042, *port92; +ISADevice *i8042, *port92, *vmmouse; qemu_irq *cpu_exit_irq; register_ioport_write(0x80, 1, 1, ioport80_write, NULL); @@ -1133,7 +1133,8 @@ void pc_basic_device_init(qemu_irq *isa_irq, a20_line = qemu_allocate_irqs(handle_a20_line_change, first_cpu, 2); i8042 = isa_create_simple("i8042"); i8042_setup_a20_line(i8042,&a20_line[0]); -vmmouse_init(i8042); +vmmouse = isa_create("vmmouse"); +qdev_prop_set_ptr(&vmmouse->qdev, "ps2_mouse", i8042); port92 = isa_create_simple("port92"); port92_init(port92,&a20_line[1]); diff --git a/hw/pc.h b/hw/pc.h index 603a2a3..ae83934 100644 --- a/hw/pc.h +++ b/hw/pc.h @@ -67,9 +67,6 @@ void hpet_pit_enable(void); /* vmport.c */ void vmport_register(unsigned char command, IOPortReadFunc *func, void *opaque); -/* vmmouse.c */ -void *vmmouse_init(void *m); - /* pckbd.c */ void i8042_init(qemu_irq kbd_irq, qemu_irq mouse_irq, uint32_t io_base); diff --git a/hw/vmmouse.c b/hw/vmmouse.c index 2097119..3b39144 100644 --- a/hw/vmmouse.c +++ b/hw/vmmouse.c @@ -25,6 +25,7 @@ #include "console.h" #include "ps2.h" #include "pc.h" +#include "qdev.h" /* debug only vmmouse */ //#define DEBUG_VMMOUSE @@ -52,6 +53,7 @@ typedef struct _VMMouseState { +ISADevice dev; uint32_t queue[VMMOUSE_QUEUE_SIZE]; int32_t queue_size; uint16_t nb_queue; @@ -270,22 +272,41 @@ static const VMStateDescription vmstate_vmmouse = { } }; -void *vmmouse_init(void *m) +static void vmmouse_reset(DeviceState *d) { -VMMouseState *s = NULL; +VMMouseState *s = container_of(d, VMMouseState, dev.qdev); -DPRINTF("vmmouse_init\n"); +s->status = 0x; +} -s = qemu_mallocz(sizeof(VMMouseState)); +static int vmmouse_initfn(ISADevice *dev) +{ +VMMouseState *s = DO_UPCAST(VMMouseState, dev, dev); -s->status = 0x; -s->ps2_mouse = m; -s->queue_size = VMMOUSE_QUEUE_SIZE; Where is member queue_size initialized in your new code? +DPRINTF("vmmouse_init\n"); vmport_register(VMMOUSE_STATUS, vmmouse_ioport_read, s); vmport_register(VMMOUSE_COMMAND, vmmouse_ioport_read, s); vmport_register(VMMOUSE_DATA, vmmouse_ioport_read, s); vmstate_register(NULL, 0,&vmstate_vmmouse, s); -return s; +return 0; +} + +static ISADeviceInfo vmmouse_info = { +.init = vmmouse_initfn, +.qdev.name = "vmmouse", +.qdev.size = sizeof(VMMouseState), +.qdev.no_user = 1, +.qdev.reset= vmmouse_reset, +.qdev.props = (Property[]) { +DEFINE_PROP_PTR("ps2_mouse", VMMouseState, ps2_mouse), Pointer properties are for dirty hacks only. Is there really no better solution? Why does it have to be a property? vmmouse is really just an extension to the PS2 Mouse. It's definitely not an ISA device. In terms of qdev enablement, I would just make it a boolean option to the PS2Mouse and not expose it as a top level device at all. It cannot exist without a PS2Mouse. Regards, Anthony Liguori +DEFINE_PROP_END_OF_LIST(), +} +}; + +static void vmmouse_dev_register(void) +{ +isa_qdev_register(&vmmouse_info); } +device_init(vmmouse_dev_register)
Re: [Qemu-devel] KVM call minutes for Feb 8
On 02/13/2011 05:38 PM, Anthony Liguori wrote: 2) get rid of the entire concept of machines. Creating a i440fx is essentially equivalent to creating a bare machine. No, it's not. The 440fx does not include an IOAPIC, for example. There may be other optional components, or differences in wiring, that make two machines with i440fx not identical. The IOAPIC is basically the only other component and I view it as part of the CPU interface to the chipset. But it isn't. The IOAPIC is not per-core or per-socket. It's strictly board level. It's a board interface to the cpu. But still, if we're creating a machine from scratch: qemu -device i440fx,id=nb -device piix3,id=sb,chipset=nb -device ioapic,id=ioapic,chipset=sb -device cpu,ioapic=ioapic,northbridge=nb Is not all that unreasonable and presents a fully functioning PC. Sure. And -M blah is a shortcut. 4) model the CPUs as devices that take a pointer to a host controller, for x86, the normal case would be giving it a pointer to i440fx. Surely the connection is via a bus? An x86 cpu talks to the bus, and there happens to be an 440fx north bridge at the end of it. It could also be a Q35 or something else. I see being on a bus as really just taking a pointer to an interface. So yes, the i440fx would implement a PentiumCpuInterface or something like that and the CPU would take a pointer to a PentiumCpuInterface[1]. This is part of why having proper polymorphism is important. We need it in order to be able to express concepts like interfaces. The CPUs and northbridge are peers on the system bus. However, this isn't something good to model, since it keeps changing without any guest software impact. -- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] Re: [PATCH 15/15] Implement the bus structure for PAPR virtual IO
On 13.02.2011, at 16:08, Anthony Liguori wrote: > On 02/13/2011 05:12 AM, David Gibson wrote: >> On Sun, Feb 13, 2011 at 10:08:23AM +0200, Blue Swirl wrote: >> >>> On Sun, Feb 13, 2011 at 1:15 AM, Benjamin Herrenschmidt >>> wrote: >>> On Sun, 2011-02-13 at 00:52 +0200, Blue Swirl wrote: > On Sat, Feb 12, 2011 at 11:00 PM, Benjamin Herrenschmidt > wrote: > >> On Sat, 2011-02-12 at 18:59 +0200, Blue Swirl wrote: >> >>> Actually I don't quite understand the need for vty layer, why not use >>> the chardev here directly? >>> >> I'm not sure what you mean here... >> > Maybe it would be reasonable to leave h_put_term_char to spapr_hcall.c > instead of moving those to a separate file. > Well, the VIO device instance gives the chardev instance which is all nicely encapsulated inside spapr-vty... Also VIO devices tend to have dedicated hcalls, not only VTY, so it makes a lot of sense to keep them close to the rest of the VIO driver they belong to don't you think ? (Actually veth does, vscsi uses the more "generic" CRQ stuff which we've added to the core VIO but you'll see that when we get those patches ready, hopefully soon). >>> This is a bit of a special case, much like semihosting modes for m68k >>> or ARM, or like MOL hacks which were removed recently. From QEMU point >>> of view, the most natural way of handling this would be hypervisor >>> implemented in the guest side (for example BIOS). Then the hypervisor >>> would use normal IO (or virtio) to communicate with the host. If >>> inside QEMU, the interface of the hypervisor to the devices needs some >>> thought. We'd like to avoid ugly interfaces like vmmouse where a >>> device probes CPU registers directly or spaghetti interfaces like >>> APIC. >>> >> I really don't follow what you're saying here. Running the hypervisor >> in the guest, rather than emulating its effect in qemu seems like an >> awful lot of complexity for no clear reason. >> > > In KVM for x86, instead of using a secondary interface (like vmmcall/vmcall), > we do all of our paravirtualization using native hardware interfaces that we > can trap (PIO/MMIO). > > IIUC, on Power, trapping MMIO is not possible due to the MMU mode that is > currently used (PFs are delivered directly to the guest). > > So it's not really possible to switch from using hypercalls to using MMIO. > > What I would suggest is modelling hypercalls as another I/O address space for > the processor. So instead of having a function pointer in the CPUState, > introduce a: > > typedef void (HypercallFunc)(CPUState *env, void *opaque); > > /* register a hypercall handler */ > void register_hypercall(target_ulong index, HypercallFunc *handler, void > *opaque); > void unregister_hypercall(target_ulong index); > > /* dispatch a hypercall */ > void cpu_hypercall(CPUState *env, target_ulong index); > > This interface could also be used to implement hypercall based interfaces on > s390 and x86. > > The arguments will have to be extracted from the CPU state but I don't think > we'll really ever have common hypercall implementations anyway so that's not > a huge problem. Is there enough common ground between the hypercall interfaces that this makes sense? It sounds nice on paper, but in the end the "hypercall not implemented" return codes differ, the argument interpretation differs and even the amount of return values differ. So at the end of the day, all this interface could do is call a machine helper function to evaluate the hypercall id from its register state and dispatch to that, leaving the rest to the individual hypercall function implementation. The implementations themselves also couldn't be reused. A PAPR hypercall simply doesn't work on x86. PIO and MMIO interfaces make sense to share - devices implemented using them could potentially be reused later by other platforms. For the hypercall devices, that doesn't work. > >>> on real HW? >>> >> The interface already exists on real HW. It's described in the PAPR >> document we keep mentioning. >> > > Another thing to note is that the hypercall based I/O devices the interfaces > that the current Power hypervisor uses so implementing this interface is > necessary to support existing guests. Yes, implementing the existing PAPR interfaces is crucial to run existing guests. Implementing it at the hypercall level is required if we ever want to run it with hardware accelerated KVM on ppc, as there hypercalls simply get forwarded to the hypervisor (kvm) which would pass them on to qemu. And since the interface is not nesting capable, running a hypervisor inside the guest doesn't work. Alex
[Qemu-devel] Re: [PATCH 15/15] Implement the bus structure for PAPR virtual IO
On Sun, 2011-02-13 at 10:08 +0200, Blue Swirl wrote: > This is a bit of a special case, much like semihosting modes for m68k > or ARM, or like MOL hacks which were removed recently. From QEMU point > of view, the most natural way of handling this would be hypervisor > implemented in the guest side (for example BIOS). Then the hypervisor > would use normal IO (or virtio) to communicate with the host. If > inside QEMU, the interface of the hypervisor to the devices needs some > thought. We'd like to avoid ugly interfaces like vmmouse where a > device probes CPU registers directly or spaghetti interfaces like > APIC. I'm not sure I understand your point. We are emulating a guest running under pHyp, ie, qemu in that case is the hypervisor, and provides the same interfaces as pHyp does (the IBM proprietary hypervisor, aka sPAPR). This is how we enable booting existing kernels and distributions inside qemu/kvm. > Is the interface new design, or are you implementing what is used also > on real HW? We are implementing a subset of the interfaces implemented by pHyp today on "real HW". In the long run, when you throw KVM into the picture, on machines (hypothetical machines at this stage) where one can run a KVM has a hypervisor (currently you can only run pHyp on all released IBM machines), this will give you an environment that is compatible with pHyp and thus supports existing distributions and kernels. We -will- add support for the "real" virtio on top of that, but those will require modified guest kernels (and will provide better performances than the sPAPR emulation). But the sPAPR emulation is a necessary step to enable existing stuff to run. Cheers, Ben.
Re: [Qemu-devel] Re: [PATCH 15/15] Implement the bus structure for PAPR virtual IO
On Sun, 2011-02-13 at 14:15 +0200, Blue Swirl wrote: > > Maybe it would be more complex but also emulation accuracy would be > increased and the interfaces would be saner. We don't shortcut BIOS > and implement its services to OS in QEMU for other machines either. But that is not comparable. BIOS is comparable for example to Open Firmware and we do not 'emulate' OF, we will provide an implementation that runs inside the guest, just like you do for BIOS (SLOF based, tho people are welcome to play with OpenBIOS if they want, but SLOF is what we will provide and support). In this case, we are talking about a hypervisor which is somewhat a different beast. Sure you -could- run it into the guest, I suppose, if emulation accuracy was your ultimate goal. That would entail at least the followings: - Implement support for the complete "hypervisor" mode inside qemu - Re-implement a complete hypervisor compatible with pHyp An enormous amount of work, for a result that would have low performances and about zero interest to anybody. The goal here is to provide a runtime environment for kernels and distributions that is -compatible- with sPAPR/pHyp to enable existing distributions to operate in KVM. > I'd expect one problem with that approach though, the interface used > on real HW between the hypervisor and the underlying HW may be > undocumented, but then it could use for example existing virtio > devices. But what would be the point ? > One way to handle this could be to add the hypervisor interface now to > QEMU and switch to guest hypervisor when (if) it becomes available. > I'd just like to avoid duplication with virtio or messy interfaces > like vmport. Again, what would be the point ? Eventually, KVM will be available as an "alternate" hypervisor to pHyp which I suppose one could run entirely inside qemu once we add support for the HV mode to it, and that would somewhat do what you describe but that isn't what we are trying to get at here. Cheers, Ben.
Re: [Qemu-devel] Re: [PATCH 15/15] Implement the bus structure for PAPR virtual IO
On Sun, 2011-02-13 at 13:40 +0100, Alexander Graf wrote: > > We can surely move it to dynamic later on. I think the "proper" way > would be to populate a qdev bus and have the individual hypercall > receivers register themselves through -device creations. But Blue > really is the expert here :). Why would you want to go through a bus for all hcalls ? (ie including the ones that aren't device related ?). That doesn't quite "tick" but I'm sure I'm missing part of the picture here :-) A simple dispatch table based approach is the most efficient and simple way to do that (after a switch/case) in my opinion, this is more/less what we have done internally, with a pair of calls for "modules" to register hcalls if they need to. The hcalls numbers are fixed and specified in sPAPR. BTW. We are still missing in this picture RTAS. I suppose David is still cleaning up those patches. Basically, we use a 5 instruction trampoline that calls a private h-call, the RTAS emulation is entirely in qemu. This has to be done that way for various reasons, but essentially RTAS under pHyp also more/less turns into private pHyp calls under the hood. Cheers, Ben.
Re: [Qemu-devel] Re: [PATCH 15/15] Implement the bus structure for PAPR virtual IO
On 02/13/2011 09:56 AM, Alexander Graf wrote: This interface could also be used to implement hypercall based interfaces on s390 and x86. The arguments will have to be extracted from the CPU state but I don't think we'll really ever have common hypercall implementations anyway so that's not a huge problem. Is there enough common ground between the hypercall interfaces that this makes sense? Between the dispatch, registration, and tracing, yes. It sounds nice on paper, but in the end the "hypercall not implemented" return codes differ, the argument interpretation differs and even the amount of return values differ. Yes, that's why we pass CPUState. But keep in mind, CPUState needs to be sync'd before and after the invocation. So at the end of the day, all this interface could do is call a machine helper function to evaluate the hypercall id from its register state and dispatch to that, leaving the rest to the individual hypercall function implementation. The implementations themselves also couldn't be reused. A PAPR hypercall simply doesn't work on x86. PIO and MMIO interfaces make sense to share - devices implemented using them could potentially be reused later by other platforms. For the hypercall devices, that doesn't work. Yes, which is why I'm not advocating trying to unmarshal the calls or anything like that. But the dispatch, registration, tracing, and CPU sync'ing bits can all be reused. Regards, Anthony Liguori
Re: [Qemu-devel] Re: [PATCH 15/15] Implement the bus structure for PAPR virtual IO
On 02/13/2011 10:07 AM, Benjamin Herrenschmidt wrote: On Sun, 2011-02-13 at 10:08 +0200, Blue Swirl wrote: This is a bit of a special case, much like semihosting modes for m68k or ARM, or like MOL hacks which were removed recently. From QEMU point of view, the most natural way of handling this would be hypervisor implemented in the guest side (for example BIOS). Then the hypervisor would use normal IO (or virtio) to communicate with the host. If inside QEMU, the interface of the hypervisor to the devices needs some thought. We'd like to avoid ugly interfaces like vmmouse where a device probes CPU registers directly or spaghetti interfaces like APIC. I'm not sure I understand your point. We are emulating a guest running under pHyp, ie, qemu in that case is the hypervisor, and provides the same interfaces as pHyp does (the IBM proprietary hypervisor, aka sPAPR). This is how we enable booting existing kernels and distributions inside qemu/kvm. Is the interface new design, or are you implementing what is used also on real HW? We are implementing a subset of the interfaces implemented by pHyp today on "real HW". In the long run, when you throw KVM into the picture, on machines (hypothetical machines at this stage) where one can run a KVM has a hypervisor (currently you can only run pHyp on all released IBM machines), this will give you an environment that is compatible with pHyp and thus supports existing distributions and kernels. We try very, very hard to make our paravirtualization look like real hardware. When the paravirtualization interfaces are already defined, we have no choice in supporting those interfaces although sometimes we try to push that to firmware (like with Xenner). It's better from a security PoV. But in this case, we have no choice but to implement the paravirtualization interfaces in QEMU because of the way the hardware works and because these interfaces are already well defined. Regards, Anthony Liguori We -will- add support for the "real" virtio on top of that, but those will require modified guest kernels (and will provide better performances than the sPAPR emulation). But the sPAPR emulation is a necessary step to enable existing stuff to run. Cheers, Ben.
Re: [Qemu-devel] Re: [PATCH 15/15] Implement the bus structure for PAPR virtual IO
On 02/13/2011 10:17 AM, Benjamin Herrenschmidt wrote: On Sun, 2011-02-13 at 13:40 +0100, Alexander Graf wrote: We can surely move it to dynamic later on. I think the "proper" way would be to populate a qdev bus and have the individual hypercall receivers register themselves through -device creations. But Blue really is the expert here :). Why would you want to go through a bus for all hcalls ? (ie including the ones that aren't device related ?). That doesn't quite "tick" but I'm sure I'm missing part of the picture here :-) A virtual bus is just an interface. If all virtual devices that interact via hcalls would all reside on the same virtual bus, then having hypercalls registered through that interface makes sense because you can associate hypercalls with particular devices. This means that you can automatically deregister on device removal and things like that. But I don't think this will work out well. I think treating the hypercalls as a simple dispatch table just like ioport would make sense. Regards, Anthony Liguori A simple dispatch table based approach is the most efficient and simple way to do that (after a switch/case) in my opinion, this is more/less what we have done internally, with a pair of calls for "modules" to register hcalls if they need to. The hcalls numbers are fixed and specified in sPAPR. BTW. We are still missing in this picture RTAS. I suppose David is still cleaning up those patches. Basically, we use a 5 instruction trampoline that calls a private h-call, the RTAS emulation is entirely in qemu. This has to be done that way for various reasons, but essentially RTAS under pHyp also more/less turns into private pHyp calls under the hood. Cheers, Ben.
Re: [Qemu-devel] KVM call minutes for Feb 8
On 02/13/2011 09:56 AM, Avi Kivity wrote: On 02/13/2011 05:38 PM, Anthony Liguori wrote: 2) get rid of the entire concept of machines. Creating a i440fx is essentially equivalent to creating a bare machine. No, it's not. The 440fx does not include an IOAPIC, for example. There may be other optional components, or differences in wiring, that make two machines with i440fx not identical. The IOAPIC is basically the only other component and I view it as part of the CPU interface to the chipset. But it isn't. The IOAPIC is not per-core or per-socket. Yeah, I wasn't implying that it was. It's strictly board level. It's a board interface to the cpu. But still, if we're creating a machine from scratch: qemu -device i440fx,id=nb -device piix3,id=sb,chipset=nb -device ioapic,id=ioapic,chipset=sb -device cpu,ioapic=ioapic,northbridge=nb Is not all that unreasonable and presents a fully functioning PC. Sure. And -M blah is a shortcut. Exactly. Or better yet, blah is a config file that contains [device "nb"] driver=i440fx [device "sb"] driver=piix3 chipset=nb [device "ioapic"] driver=ioapic chipset=sb [device "cpu"] driver=cpu ioapic=ioapic northbridge=nb 4) model the CPUs as devices that take a pointer to a host controller, for x86, the normal case would be giving it a pointer to i440fx. Surely the connection is via a bus? An x86 cpu talks to the bus, and there happens to be an 440fx north bridge at the end of it. It could also be a Q35 or something else. I see being on a bus as really just taking a pointer to an interface. So yes, the i440fx would implement a PentiumCpuInterface or something like that and the CPU would take a pointer to a PentiumCpuInterface[1]. This is part of why having proper polymorphism is important. We need it in order to be able to express concepts like interfaces. The CPUs and northbridge are peers on the system bus. However, this isn't something good to model, since it keeps changing without any guest software impact. If we can move away from Bus abstraction and to a simpler interface mechanism, then we can express peer relationships by just having bidirection references. IOW: -device cpus,northbridge=nb,id=cpus,count=16 -device i440fx,cpus=cpus I don't think modelling each CPU makes sense. We should probably just model all cpus in a single device for the sake of simplicity. Regards, Anthony Liguori
Re: [Qemu-devel] [PATCH 1/2] linux-user: Define target alignment size
Le dimanche 13 février 2011 à 10:24 +0200, Blue Swirl a écrit : > On Sun, Feb 13, 2011 at 4:22 AM, Laurent Vivier wrote: > > Datatype alignment can be found using following application: > > > > int main(void) > > { > >printf("alignof(short) %ld\n", __alignof__(short)); > >printf("alignof(int) %ld\n", __alignof__(int)); > >printf("alignof(long) %ld\n", __alignof__(long)); > >printf("alignof(long long) %ld\n", __alignof__(long long)); > > } > > > > This patch includes following alignments: > > > > i386 > > > > alignof(short) 2 > > alignof(int) 4 > > alignof(long) 4 > > alignof(long long) 8 > > > > x86_64 > > > > alignof(short) 2 > > alignof(int) 4 > > alignof(long) 8 > > alignof(long long) 8 > > > > arm > > > > alignof(short) 2 > > alignof(int) 4 > > alignof(long) 4 > > alignof(long long) 4 > > > > m68k (680x0) > > > > alignof(short) 2 > > alignof(int) 2 > > alignof(long) 2 > > alignof(long long) 2 > > > > mips > > > > alignof(short) 2 > > alignof(int) 4 > > alignof(long) 4 > > alignof(long long) 8 > > > > ppc > > > > alignof(short) 2 > > alignof(int) 4 > > alignof(long) 4 > > alignof(long long) 8 > > > > for other targets, use by default (2,4,4,8). > > > > Please, update for your favorite target... > > For Sparc32 (I think also sparc32plus), the default is OK. > > For Sparc64, please use 2, 4, 8, 8. I'd guess other 64 bit platforms > (Alpha, MIPS64, PPC64 etc) should use the same. OK, I update my patch. > Does GCC produce correct code using the attributes on strictly aligned > host, when the target is less strictly aligned? It seems it is OK. I did some tests into a mips-linux-user chroot (sparc one is broken ;-) ) : mips is a strictly aligned host, from gcc/config/mips/mips.h: #define STRICT_ALIGNMENT 1 aligments are: alignof(short) 2 alignof(int) 4 alignof(long) 4 alignof(long long) 8 We try to align int on 2. #include typedef int target_int __attribute__((aligned(2))); struct { char Z; target_int A; } B; int main(void) { B.A = 0xdeadbeaf; printf("%d: %x\n", __alignof__(B.A), B.A); } ./test_align 2: deadbeaf disass: lw $2,%got(B)($28) li $3,-559087616 # 0xdead ori $3,$3,0xbeaf swl $3,2($2) swr $3,5($2) normal case: lw $3,%got(B)($28) li $2,-559087616 # 0xdead ori $2,$2,0xbeaf sw $2,4($3) So, gcc seems smart enough to split the memory access in several ones compatible with the strict alignment rules. > Should the alignment of floating point variables be specified as well? At the moment it seems useless. > The strict alignment required for doubles is 4, but recommended > alignment is 8, I'm not sure which one is used for structures > containing doubles. if necessary, some tests will be helpfull. Thank you for your comments. Regards, Laurent -- - laur...@vivier.eu -- "Tout ce qui est impossible reste à accomplir"Jules Verne "Things are only impossible until they're not" Jean-Luc Picard
Re: [Qemu-devel] Binary Translation hooking - reading registers
On 02/13/2011 06:38 AM, Mulyadi Santosa wrote: > Hi > > On Sun, Feb 13, 2011 at 10:48, felix.matenaar@rwth-aachen > wrote: >> To achieve my goal, it is necessary being able reading actual register >> configuration like eax when a ret hook is called to get a function >> return value. So my question is how I can do this. Are there already >> some functions which generate code to update the cpu environment? If >> not, is there anything you can point me towards for adding support? > I think you should look into the tracing infrastructure that is > gradually added to Qemu. I forgot the URL that provide the patch > (since I am not sure whether it's fully merged with mainline). Please > check this list archieve... > > NB: You're talking about qemu system emulation,right? not the user > mode emulation, I assume? Because you said "executed in one step" (or > something like that). AFAIK, although Qemu does lazy evalution, but > for general registers it should be always updated. The one that gets > lazy evalution for example is eflags. > Yes I am talking about full system emulation (i386). Tracing infrastructure may be suitable for my needs in the future but I think that my instrumentation patches which are already added as far as I need it have a bit less overhead. To be more specific on my env update problem, here an example: push ebp mv esp,ebp /* do something */ call 0xfoo test eax,eax /* do something */ ret The first line is the start of a block. What I did was adding a gen_helper_instrument_call in the 'call' opcode cases in disas_insn() which will call my analysis code to examine from where the call is launched, where it will go and what is the address we will return to after (in this case regarding to call 0xfoo). Because in this case, usage of tcg_const_i32(pc_start) and providing CPU_T[0] as parameters to my generated helper, I can provide all three values. But what I would _assume_ is that after gen_helper_instrument_call is executed, env->eip would point towards 'call 0xfoo', but it points to 'push ebp'. Thats why I am confused if I can trust the values in env to gain more information about the current register values of the guest machine. This particular case is indeed not a problem since I can provide all parameters to gen_helper_instrument_call here but there are cases where I would like to read e.g. esp for keeping an analysis shadow stack up to date or reading return values by examining eax when my ret hook is called. This is my call immediate handler: void helper_call_im_protected(target_ulong src_eip, target_ulong new_eip, int next_eip){ if (!(new_eip & 0x8000) ){ /* env->eip == src_eip will evaluate to 0 if 'call' opcode was not at the beginning of the translation block*/ analysis_examine_call(src_eip,new_eip,next_eip); } } Here my hook inserting in disas_insn: case 0xe8: /* call im */ { /* some code */ gen_movtl_T0_im(next_eip); gen_push_T0(s); /* will call helper_call_im_protected when call opcode is executed */ gen_helper_call_im_protected(tcg_const_i32(pc_start), tcg_const_i32(tval), cpu_T[0]); gen_jmp(s, tval); } Hope you understand.
Re: [Qemu-devel] [PATCH 02/10] parallel: make optional
On 02/12/11 15:40, Blue Swirl wrote: > Ignore failure with parallel device creation. > > Signed-off-by: Blue Swirl > --- > hw/pc.h |5 - > 1 files changed, 4 insertions(+), 1 deletions(-) > > diff --git a/hw/pc.h b/hw/pc.h > index 443ba34..f823b7d 100644 > --- a/hw/pc.h > +++ b/hw/pc.h > @@ -24,7 +24,10 @@ static inline bool parallel_init(int index, > CharDriverState *chr) > { > ISADevice *dev; > > -dev = isa_create("isa-parallel"); > +dev = isa_try_create("isa-parallel"); > +if (!dev) { > +return false; > +} > qdev_prop_set_uint32(&dev->qdev, "index", index); > qdev_prop_set_chr(&dev->qdev, "chardev", chr); > if (qdev_init(&dev->qdev) < 0) { How is this design supposed to be better than wrapping init functions in #ifdef CONFIG_ ... #endif? If a hardware model is compiled out via the CONFIG options qemu should fail to accept the command line parameters and not try to create the device. Instead with this design it tries and fails to create the device and yet continues on. David
Re: [Qemu-devel] Binary Translation hooking - reading registers
On 13 February 2011 17:14, felix.matenaar@rwth-aachen wrote: > To be more specific on my env update problem, here an example: > > push ebp > mv esp,ebp > /* do something */ > call 0xfoo > test eax,eax > /* do something */ > ret > > The first line is the start of a block. What I did was adding a > gen_helper_instrument_call in the 'call' opcode cases in disas_insn() > which will call my analysis code to examine from where the call is > launched, where it will go and what is the address we will return to > after (in this case regarding to call 0xfoo). > Because in this case, usage of tcg_const_i32(pc_start) and providing > CPU_T[0] as parameters to my generated helper, I can provide all three > values. But what I would _assume_ is that after > gen_helper_instrument_call is executed, env->eip would point towards > 'call 0xfoo', but it points to 'push ebp'. Ah, you should have said you were talking about EIP; you've hit an optimisation case. If we updated env->eip after every instruction then this would slow things down. Instead, since we know what the value is statically at translate time, we can just use that calculated value mostly. We only update env->eip: * at the end of the TB * when we translate jump commands (usually ends the TB) * when we translate something that will raise an exception or some other thing we know will need an accurate env->eip Unexpected interrupts (typically faults on load/store) are handled by constructing a mapping from host PC to guest PC and then looking up the host fault address in it and resetting env->eip (in gen_pc_load()). Which registers are dealt with like this varies from target to target; on target-arm we do this for both r15 (pc) and the IT state bits in the CPSR, for instance. The other case to watch out for is where the value of some register is kept in env, but as a broken out set of components which are only reassembled into the actual register value when the target needs to read it; see for example the comment in target-i386/cpu.h about env->eflags. If the registers you care about aren't in either of these categories you should be able to trust them in a helper fn. This typically includes most of the general purpose registers. -- PMM
Re: [Qemu-devel] KVM call minutes for Feb 8
On Sun, Feb 13, 2011 at 10:56:30AM -0600, Anthony Liguori wrote: > >> > >>qemu -device i440fx,id=nb -device piix3,id=sb,chipset=nb -device > >>ioapic,id=ioapic,chipset=sb -device > >>cpu,ioapic=ioapic,northbridge=nb > >> > >>Is not all that unreasonable and presents a fully functioning PC. > > > >Sure. And -M blah is a shortcut. > > Exactly. Or better yet, blah is a config file that contains > > [device "nb"] > driver=i440fx > You are trying to model how particular (very ancient) HW looked like, instead of emulating guest visible functionality, but is dead end since things are changing constantly. Northbridge functionality moves onto cpu for instance. What CPU i440fx was designed for? Pentium? What if user runs QEMU with emulated CPU that in real life has internal memory controller? Does you config have sense for such setup? Should we allow to specify only Pentium CPU since this is how real HW worked? > [device "sb"] > driver=piix3 And piix3 refers to piix3.cfg which describe devices that present on the chipset. > chipset=nb > > [device "ioapic"] > driver=ioapic > chipset=sb Here, for instance, IOAPIC is included in a chipset for a long time now. Why user should care that piix3 didn't have it. How this detail changes qemu functionality? If it doesn't why should we expose it? > > [device "cpu"] > driver=cpu > ioapic=ioapic Why ioapic here? Doesn't cpu talks to ioapic via northbridge? > northbridge=nb > -- Gleb.
[Qemu-devel] [RFC] qapi: events in QMP
Hi, In my QAPI branch[1], I've now got almost every existing QMP command converted with (hopefully) all of the hard problems solved. There is only one remaining thing to attack before posting for inclusion and that's events. Here's my current thinking about what to do. Events in QMP Today QMP events are asynchronous messages. They are not tied explicitly to any type of context, do not have a well defined format, and are have no mechanism to mask or filter events. As of right now, we have somewhere around a dozen events. Goals of QAPI 1) Make all interfaces consumable in C such that we can use the interfaces in QEMU 2) Make all interfaces exposed through a library using code generation from static introspection 3) Make all interfaces well specified in a formal schema Proposal for events in QAPI For QAPI, I'd like to model events on the notion of signals and slots[2]. A client would explicitly connect to a signal through a QMP interface which would result in a slot being added that then generates an event. Internally in QEMU, we could also connect slots to the same signals. Since we don't have an object API in QMP, we'd use a pair of connect/disconnect functions that had enough information to identify the signal. Example: We would define QEVENT_BLOCK_IO_EVENT as: # qmp-schema.json { 'BlockIOEvent': {'device': 'str', 'action': 'str', 'operation': 'str'} } The 'Event' suffix is used to determine that the type is an event and not a structure. This would generate the following structures for QEMU: typedef void (BlockIOEventFunc)(const char *device, const char *action, const char *operation, void *opaque); typedef struct BlockIOEvent { /* contents are private */ } BlockIOEvent; /* Connect a slot to a BlockIOEvent signal and return a handle to the connection */ int block_io_event_connect(BlockIOEvent *event, BlockIOEventFunc *cb, void *opaque); /* Signal any connect slots of a BlockIOEvent */ void block_io_event_signal(BlockIOEvent *event, const char *device, const char *action, const char *operation); /* Disconnect a slot from a signal based on a connection handle */ void block_io_event_disconnect(BlockIOEvent *event, int handle); In the case of BlockIOEvent, this is a global signal that is not tied to any specific object so there would be a global BlockIOEvent object within QEMU. From a QMP perspective, we would have the following function in the schema: [ 'block-io-event', {}, {}, 'BlockIOEvent' ] A function definition that returns an Event is a signal accessor. Within QEMU, we implement a signal accessor like so: BlockIOEvent *qmp_block_io_event(Error **errp) { return &global_block_io_event; } For QMP and libqmp, signal accessors are handled specially. A signal accessor is expanded into two QMP functions: [ 'block-io-event-connect', {}, {}, 'str' ] [ 'block-io-event-disconnect', {'tag' : 'str'}, {}, 'none' ] Connect returns a handle to the connection as an opaque string. Disconnect takes that handle. Both functions will also take any arguments that the signal accessor takes and under the cover uses the signal accessor to access the object. In libqmp, the following interfaces will be generated: int libqmp_block_io_event_connect(QmpSession *sess, BlockIOEventFunc *cb, void *opaque, Error **errp); void libqmp_block_io_event_disconnect(QmpSession *sess, int handle, Error **errp); Again, if the signal accessor takes additional arguments, they will be passed here. All existing events will be converted to signals with signal accessors that take no arguments. However, BlockIOEvent is a good example of a signal that we'll add a new version of that takes a 'const char *device' as an argument for the signal accessor. This will let clients register for block io events on specific block devices. Backwards Compatibility Right now, QMP events to not have any type of context information which means that there's no way to communicate the connection handle. We'll solve this by adding a new 'tag' field to events that are registered through one of the connect interfaces. Since this is only present on events registered through this new interface, backwards compatibility is preserved. Upon initial connection, we'll treat all existing signals as having a 'default signal connection'. We'll introduce a new QMP command that can be executed while in capabilities mode to disconnect default signal connections. The effect is that new clients can use the explicit register/unregister interface while old clients will continue to see the old style events. We'll deprecate the 'default signal connections' and make sure to never add a new one post 0.15. Old clients will need to be updated to explicitly register for events. Another Example Here's an example of BlockDeviceEvent, the successor to BlockIOEvent. It makes use of signal accessor arguments and QAPI enumeration support: # qmp-schema.json { 'BlockDeviceAction
Re: [Qemu-devel] [RFC] qapi: events in QMP
On 02/13/2011 12:08 PM, Anthony Liguori wrote: Hi, In my QAPI branch[1], I've now got almost every existing QMP command converted with (hopefully) all of the hard problems solved. There is only one remaining thing to attack before posting for inclusion and that's events. Here's my current thinking about what to do. Events in QMP Today QMP events are asynchronous messages. They are not tied explicitly to any type of context, do not have a well defined format, and are have no mechanism to mask or filter events. As of right now, we have somewhere around a dozen events. Goals of QAPI 1) Make all interfaces consumable in C such that we can use the interfaces in QEMU 2) Make all interfaces exposed through a library using code generation from static introspection 3) Make all interfaces well specified in a formal schema Proposal for events in QAPI For QAPI, I'd like to model events on the notion of signals and slots[2]. A client would explicitly connect to a signal through a QMP interface which would result in a slot being added that then generates an event. Internally in QEMU, we could also connect slots to the same signals. Since we don't have an object API in QMP, we'd use a pair of connect/disconnect functions that had enough information to identify the signal. Example: We would define QEVENT_BLOCK_IO_EVENT as: # qmp-schema.json { 'BlockIOEvent': {'device': 'str', 'action': 'str', 'operation': 'str'} } The 'Event' suffix is used to determine that the type is an event and not a structure. This would generate the following structures for QEMU: typedef void (BlockIOEventFunc)(const char *device, const char *action, const char *operation, void *opaque); One thing I'm on the fence about, is making the slot callback take a BlockIOEvent *event as the first argument. If we did this, the signals would be compatible with GObject signals. It's hard to map that to a libqmp though so I'm not sure. Regards, Anthony Liguori typedef struct BlockIOEvent { /* contents are private */ } BlockIOEvent; /* Connect a slot to a BlockIOEvent signal and return a handle to the connection */ int block_io_event_connect(BlockIOEvent *event, BlockIOEventFunc *cb, void *opaque); /* Signal any connect slots of a BlockIOEvent */ void block_io_event_signal(BlockIOEvent *event, const char *device, const char *action, const char *operation); /* Disconnect a slot from a signal based on a connection handle */ void block_io_event_disconnect(BlockIOEvent *event, int handle); In the case of BlockIOEvent, this is a global signal that is not tied to any specific object so there would be a global BlockIOEvent object within QEMU. From a QMP perspective, we would have the following function in the schema: [ 'block-io-event', {}, {}, 'BlockIOEvent' ] A function definition that returns an Event is a signal accessor. Within QEMU, we implement a signal accessor like so: BlockIOEvent *qmp_block_io_event(Error **errp) { return &global_block_io_event; } For QMP and libqmp, signal accessors are handled specially. A signal accessor is expanded into two QMP functions: [ 'block-io-event-connect', {}, {}, 'str' ] [ 'block-io-event-disconnect', {'tag' : 'str'}, {}, 'none' ] Connect returns a handle to the connection as an opaque string. Disconnect takes that handle. Both functions will also take any arguments that the signal accessor takes and under the cover uses the signal accessor to access the object. In libqmp, the following interfaces will be generated: int libqmp_block_io_event_connect(QmpSession *sess, BlockIOEventFunc *cb, void *opaque, Error **errp); void libqmp_block_io_event_disconnect(QmpSession *sess, int handle, Error **errp); Again, if the signal accessor takes additional arguments, they will be passed here. All existing events will be converted to signals with signal accessors that take no arguments. However, BlockIOEvent is a good example of a signal that we'll add a new version of that takes a 'const char *device' as an argument for the signal accessor. This will let clients register for block io events on specific block devices. Backwards Compatibility Right now, QMP events to not have any type of context information which means that there's no way to communicate the connection handle. We'll solve this by adding a new 'tag' field to events that are registered through one of the connect interfaces. Since this is only present on events registered through this new interface, backwards compatibility is preserved. Upon initial connection, we'll treat all existing signals as having a 'default signal connection'. We'll introduce a new QMP command that can be executed while in capabilities mode to disconnect default signal connections. The effect is that new clients can use the explicit register/unregister interface while old clients will continue to see the old style events. We'll deprecate the 'default signal connections
Re: [Qemu-devel] Re: [PATCH 15/15] Implement the bus structure for PAPR virtual IO
On Sun, 2011-02-13 at 10:48 -0600, Anthony Liguori wrote: > > We try very, very hard to make our paravirtualization look like real > hardware. Sure, that makes sense when you invent new paravirt interfaces, but that isn't the case. Note also that our current processors do not have the ability to emulate MMIOs in all cases, ie, when doing "real" KVM in HV mode, we cannot trap MMIO unless we redirect all page faults to the hypervisor, which comes at a cost. > When the paravirtualization interfaces are already defined, we have no > choice in supporting those interfaces although sometimes we try to > push that to firmware (like with Xenner). It's better from a > security PoV. > > But in this case, we have no choice but to implement the > paravirtualization interfaces in QEMU because of the way the hardware > works and because these interfaces are already well defined. Right. Now, in the long run, we might decide to "reflect" some of these back into some guest co-located FW or whatever of that kind, especially if we get to a point where linux virt-io becomes more prevalent and the sPAPR style IOs become purely legacy backward compat stubs, but we aren't there yet. Cheers, Ben.
Re: [Qemu-devel] Re: [PATCH 15/15] Implement the bus structure for PAPR virtual IO
On Sun, 2011-02-13 at 10:52 -0600, Anthony Liguori wrote: > > A virtual bus is just an interface. If all virtual devices that > interact via hcalls would all reside on the same virtual bus, then > having hypercalls registered through that interface makes sense > because > you can associate hypercalls with particular devices. This means > that > you can automatically deregister on device removal and things like > that. I see. Well, VIO related h-calls are only part of the picture here, I think we can live with having explicit de-registration if needed ;-) Besides the h-call is still implemented even if no device -instance- is currently plugged into the partition anyways. It just returns a (well defined per-hcall) error code if the instance handle passed to it is bogus. > But I don't think this will work out well. I think treating the > hypercalls as a simple dispatch table just like ioport would make > sense. Yup. Cheers, Ben.
Re: [Qemu-devel] Re: [PATCH 15/15] Implement the bus structure for PAPR virtual IO
On Sun, Feb 13, 2011 at 5:08 PM, Anthony Liguori wrote: > On 02/13/2011 05:12 AM, David Gibson wrote: >> >> On Sun, Feb 13, 2011 at 10:08:23AM +0200, Blue Swirl wrote: >> >>> >>> On Sun, Feb 13, 2011 at 1:15 AM, Benjamin Herrenschmidt >>> wrote: >>> On Sun, 2011-02-13 at 00:52 +0200, Blue Swirl wrote: > > On Sat, Feb 12, 2011 at 11:00 PM, Benjamin Herrenschmidt > wrote: > >> >> On Sat, 2011-02-12 at 18:59 +0200, Blue Swirl wrote: >> >>> >>> Actually I don't quite understand the need for vty layer, why not use >>> the chardev here directly? >>> >> >> I'm not sure what you mean here... >> > > Maybe it would be reasonable to leave h_put_term_char to spapr_hcall.c > instead of moving those to a separate file. > Well, the VIO device instance gives the chardev instance which is all nicely encapsulated inside spapr-vty... Also VIO devices tend to have dedicated hcalls, not only VTY, so it makes a lot of sense to keep them close to the rest of the VIO driver they belong to don't you think ? (Actually veth does, vscsi uses the more "generic" CRQ stuff which we've added to the core VIO but you'll see that when we get those patches ready, hopefully soon). >>> >>> This is a bit of a special case, much like semihosting modes for m68k >>> or ARM, or like MOL hacks which were removed recently. From QEMU point >>> of view, the most natural way of handling this would be hypervisor >>> implemented in the guest side (for example BIOS). Then the hypervisor >>> would use normal IO (or virtio) to communicate with the host. If >>> inside QEMU, the interface of the hypervisor to the devices needs some >>> thought. We'd like to avoid ugly interfaces like vmmouse where a >>> device probes CPU registers directly or spaghetti interfaces like >>> APIC. >>> >> >> I really don't follow what you're saying here. Running the hypervisor >> in the guest, rather than emulating its effect in qemu seems like an >> awful lot of complexity for no clear reason. >> > > In KVM for x86, instead of using a secondary interface (like > vmmcall/vmcall), we do all of our paravirtualization using native hardware > interfaces that we can trap (PIO/MMIO). > > IIUC, on Power, trapping MMIO is not possible due to the MMU mode that is > currently used (PFs are delivered directly to the guest). > > So it's not really possible to switch from using hypercalls to using MMIO. > > What I would suggest is modelling hypercalls as another I/O address space > for the processor. So instead of having a function pointer in the CPUState, > introduce a: > > typedef void (HypercallFunc)(CPUState *env, void *opaque); > > /* register a hypercall handler */ > void register_hypercall(target_ulong index, HypercallFunc *handler, void > *opaque); > void unregister_hypercall(target_ulong index); > > /* dispatch a hypercall */ > void cpu_hypercall(CPUState *env, target_ulong index); > > This interface could also be used to implement hypercall based interfaces on > s390 and x86. > > The arguments will have to be extracted from the CPU state but I don't think > we'll really ever have common hypercall implementations anyway so that's not > a huge problem. Nice idea. Then the part handling CPUState probably should belong to target-ppc/ rather than hw/.
Re: [Qemu-devel] [PATCH 02/10] parallel: make optional
On Sun, Feb 13, 2011 at 7:30 PM, David Ahern wrote: > > > On 02/12/11 15:40, Blue Swirl wrote: >> Ignore failure with parallel device creation. >> >> Signed-off-by: Blue Swirl >> --- >> hw/pc.h | 5 - >> 1 files changed, 4 insertions(+), 1 deletions(-) >> >> diff --git a/hw/pc.h b/hw/pc.h >> index 443ba34..f823b7d 100644 >> --- a/hw/pc.h >> +++ b/hw/pc.h >> @@ -24,7 +24,10 @@ static inline bool parallel_init(int index, >> CharDriverState *chr) >> { >> ISADevice *dev; >> >> - dev = isa_create("isa-parallel"); >> + dev = isa_try_create("isa-parallel"); >> + if (!dev) { >> + return false; >> + } >> qdev_prop_set_uint32(&dev->qdev, "index", index); >> qdev_prop_set_chr(&dev->qdev, "chardev", chr); >> if (qdev_init(&dev->qdev) < 0) { > > > How is this design supposed to be better than wrapping init functions in > #ifdef CONFIG_ ... #endif? We avoid the #ifdeffery and the device model is not violated. Most of the code is not affected in any way by leaving the device out. > If a hardware model is compiled out via the CONFIG options qemu should > fail to accept the command line parameters and not try to create the > device. Instead with this design it tries and fails to create the device > and yet continues on. Maybe command line parameter handling should be pushed to the devices. The devices should pull their parameters, removing them thus from a parameter pool. After the machine init finishes, errors could be reported for unhandled parameters.
Re: [Qemu-devel] Re: [PATCH 15/15] Implement the bus structure for PAPR virtual IO
On 02/13/2011 12:29 PM, Blue Swirl wrote: On Sun, Feb 13, 2011 at 5:08 PM, Anthony Liguori wrote: In KVM for x86, instead of using a secondary interface (like vmmcall/vmcall), we do all of our paravirtualization using native hardware interfaces that we can trap (PIO/MMIO). IIUC, on Power, trapping MMIO is not possible due to the MMU mode that is currently used (PFs are delivered directly to the guest). So it's not really possible to switch from using hypercalls to using MMIO. What I would suggest is modelling hypercalls as another I/O address space for the processor. So instead of having a function pointer in the CPUState, introduce a: typedef void (HypercallFunc)(CPUState *env, void *opaque); /* register a hypercall handler */ void register_hypercall(target_ulong index, HypercallFunc *handler, void *opaque); void unregister_hypercall(target_ulong index); /* dispatch a hypercall */ void cpu_hypercall(CPUState *env, target_ulong index); This interface could also be used to implement hypercall based interfaces on s390 and x86. The arguments will have to be extracted from the CPU state but I don't think we'll really ever have common hypercall implementations anyway so that's not a huge problem. Nice idea. Then the part handling CPUState probably should belong to target-ppc/ rather than hw/. Would be nice to have the target-ppc/ hypercall handlers call into a higher level VIO interface that didn't deal directly with CPUState. The actual hardware emulation would then be implemented in hw/ and would not be compiled for a specific target. That helps avoid future sloppiness. Regards, Anthony Liguori
Re: [Qemu-devel] KVM call minutes for Feb 8
On Sun, Feb 13, 2011 at 5:31 PM, Anthony Liguori wrote: > On 02/11/2011 12:14 PM, Blue Swirl wrote: >> >> On Thu, Feb 10, 2011 at 6:05 PM, Anthony Liguori >> wrote: >> >>> >>> On 02/10/2011 03:20 PM, Gleb Natapov wrote: >>> Jugging by how well all previous conversion went we will end up with one more way of creating devices. One legacy, another qdev and your new one. And what is the problem with qdev again (not that I am a big qdev fan)? >>> >>> We've really been arguing about probably the most minor aspect of the >>> problem with qdev. >>> >>> All I'm really saying is that we shouldn't tie device construction to a >>> factory interface as we do with qdev. >>> >>> That simply means that we should be able to do: >>> >>> RTC *rtc_create(arg1, arg2, arg2); >>> >> >> I don't see how that would help at all. Throwing qdev away and just >> calling various functions directly, with all states exposed would be >> like QEMU 0.9.0. >> > > qdev doesn't expose any state today. qdev properties are construction-only > properties that happen to be stored in each device state. > > What we really need is a full property framework that includes properties > with hookable getters and setters along with the ability to mark properties > as construct-only, read-only, or read-write. > > But I think it's reasonable to expose construct-only properties as just an > initfn argument. Sounds OK. About read-write properties, what happens if we one day have extensive threading, and locks are pushed to device level? I can imagine a deadlock involving one thread running in IO thread for a device and another trying to access that device's properties. Maybe that is not different from function call version. >>> And that a separate piece of code decides which devices are exposed >>> through >>> -device or device_add. Which devices are exposed is really a minor >>> detail. >>> >>> That said, qdev has a number of significant limitations in my mind. The >>> first is that the only relationship between devices is through the >>> BusState >>> interface. >>> >> >> There's also qemu_irq for arbitrary signals. >> > > Yes, but qemu_irq is very restricted as it only models a signal bit of > information and doesn't really have a mechanism to attach/detach in any > generic way. Basic signals are already very useful for many purposes, since they match digital logic signals in real HW. In theory, whole machines could be constructed with just qemu_irq and NAND gate emulator. ;-) In the message passing IRQ discussion earlier, it was IIRC decided that the one bit version would not be changed but a separate message passing version would be created if ever needed. >>> I don't think we should even try to have a generic bus model. >>> When you look at how badly broken PCI hotplug is current in qdev, I >>> think >>> this is symptomatic of this. >>> >> >> And how should this be fixed? The API change would not help. >> > > Just as we have bus level creation functions, we should have bus level > hotplug interfaces. > >>> There's also no way in qdev to really have polymorphism. Interfaces >>> really >>> aren't meaningful in qdev so you have things like PCIDevice where some >>> methods are stored in the object instead of the class dispatch table and >>> you >>> have overuse of static class members. >>> >> >> QEMU is developed in C, not C++. >> > > But we're trying to do object oriented programming in C so as long as we're > doing that, we ought to do it right. > >>> And it's all unrelated to VMState. >>> >> >> Right, but this has also the good side that not all device state is >> automatically exported. If other devices would be allowed to muck with >> a devices internal state freely, bad things could happen. >> >> Device reset could also use standard register definitions, shared with >> VMState. >> > > There's a way to have formally verifiable serialization/deserialization if > we can satisfy two conditions 1) the devices rely on no global state (i.e. > static variables) and 2) every field asssociated with a device is marshalled > during serialization/deserialization. > > When we define a device, right now we say that certain state is writable > during construction. It's not a stretch to want to have some properties > writable during runtime. If we also had a mechanism to mark certain > properties as read-only, but still were able to introspect them, we could > implement serialization without having to have a second VMState definition. > > Compatibility will always require manipulating state, but once you have the > state stored in a data structure, you can describe those transformations in > a pretty high level fashion. > >>> And this is just the basic mechanisms of qdev. The actual implementation >>> is >>> worse. The use of qemu_irq as gpio in the base class and overuse of >>> SystemBus is really quite insane. >>> >> >> Maybe qemu_irq should be renamed to QEMUSignal (and I don't like >> typedeffing pointers), otherwise it
Re: [Qemu-devel] KVM call minutes for Feb 8
On 02/13/2011 12:08 PM, Gleb Natapov wrote: On Sun, Feb 13, 2011 at 10:56:30AM -0600, Anthony Liguori wrote: qemu -device i440fx,id=nb -device piix3,id=sb,chipset=nb -device ioapic,id=ioapic,chipset=sb -device cpu,ioapic=ioapic,northbridge=nb Is not all that unreasonable and presents a fully functioning PC. Sure. And -M blah is a shortcut. Exactly. Or better yet, blah is a config file that contains [device "nb"] driver=i440fx You are trying to model how particular (very ancient) HW looked like, instead of emulating guest visible functionality, but is dead end since things are changing constantly. Northbridge functionality moves onto cpu for instance. What CPU i440fx was designed for? Pentium? What if user runs QEMU with emulated CPU that in real life has internal memory controller? Does you config have sense for such setup? Should we allow to specify only Pentium CPU since this is how real HW worked? Yes, how we structure the device tree will change over time. If users have to specify this stuff, we've already failed to start with. The i440fx is definitely guest visible. And yes, a guest could look at freak out that an i440fx is in use with a non-Pentium class CPU. We cheat here, and it works just fine. [device "sb"] driver=piix3 And piix3 refers to piix3.cfg which describe devices that present on the chipset. I disagree in this case that it's the best thing to do, but in general, yeah, we should provide a mechanism to have essentially device tree macros such that one high level device represents a larger tree of devices. This would be a good mechanism to support the concept of machines. But I don't think we should rely on having this tomorrow as doing this well is going to be challenging. chipset=nb [device "ioapic"] driver=ioapic chipset=sb Here, for instance, IOAPIC is included in a chipset for a long time now. Why user should care that piix3 didn't have it. How this detail changes qemu functionality? If it doesn't why should we expose it? Yeah, I'd be inclined to push the ioapic into the chipset here except it may be useful to have multiple ioapics at some point down the road. Yes, we're exposing gory details of the device model. No, I don't have a problem with that. [device "cpu"] driver=cpu ioapic=ioapic Why ioapic here? Doesn't cpu talks to ioapic via northbridge? The i440fx is the northbridge. For the i440fx, the I/O apic is a separate chip. Regards, Anthony Liguori
Re: [Qemu-devel] KVM call minutes for Feb 8
On 02/13/2011 01:37 PM, Blue Swirl wrote: On Sun, Feb 13, 2011 at 5:31 PM, Anthony Liguori wrote: qdev doesn't expose any state today. qdev properties are construction-only properties that happen to be stored in each device state. What we really need is a full property framework that includes properties with hookable getters and setters along with the ability to mark properties as construct-only, read-only, or read-write. But I think it's reasonable to expose construct-only properties as just an initfn argument. Sounds OK. About read-write properties, what happens if we one day have extensive threading, and locks are pushed to device level? I can imagine a deadlock involving one thread running in IO thread for a device and another trying to access that device's properties. Maybe that is not different from function call version. You need hookable setters/getters that can acquire a lock and do the right thing. It shouldn't be able to dead lock if the locking is designed right. Yes, but qemu_irq is very restricted as it only models a signal bit of information and doesn't really have a mechanism to attach/detach in any generic way. Basic signals are already very useful for many purposes, since they match digital logic signals in real HW. In theory, whole machines could be constructed with just qemu_irq and NAND gate emulator. ;-) It's not just in theory. In the C++ port of QEMU that I wrote, I implemented an AND, OR, and XOR gate and implemented a full 32-bit adder by just using a device config file. If done correctly, using referencing can be extremely powerful. A full adder is a good example. The gates really don't have any concept of bus and the relationship between gates is definitely not a tree. In the message passing IRQ discussion earlier, it was IIRC decided that the one bit version would not be changed but a separate message passing version would be created if ever needed. C already has a message passing interface that supports type safety called function pointers :-) An object that implements multiple interfaces where the interface becomes the "message passing interface" is exactly what I've been saying we need. It's flexible and the compiler helps us enforce typing. Any interfaces of a base class should make sense even for derived classes. That means if the base class is going to expose essentially a pin-out interface, that if I have a PCIDevice and cast it to Device, I should be able to interact with the GPIO interface to interact with the PCI device. Presumably, that means interfacing at the PCI signalling level. That's insane to model in QEMU :-) This would be doable, if we built buses from a bunch of signals, like in VHDL or Verilog. It would simplify aliased MMIO addresses nicely, the undecoded address pins would be ignored. I don't think it would be useful, but a separate interface could be added for connecting to PCIBus with just qemu_irqs. Yeah, it's possible, but I don't want to spend my time doing this. In reality, GPIO only makes sense for a small class of simple devices where modelling the pin-out interface makes sense (like a 7-segment LCD). That suggests that GPIO should not be in the DeviceState interface but instead should be in a SimpleDevice subclass or something like that. Could you point to examples of SystemBus overuse? anthony@titi:~/git/qemu/hw$ grep qdev_create *.c | wc -l 73 anthony@titi:~/git/qemu/hw$ grep 'qdev_create(NULL' *.c | wc -l 56 SystemBus has become a catch-all for shallow qdev conversions. We've got Northbridges, RAM, and network devices sitting on the same bus... On Sparc32 I have not bothered to create a SBus bus. Now it would be useful to get bootindex corrected. Most devices (even on-board IO) should use SBus. The only other bus (MBus) would exist between CPU, IOMMU and memory. But SysBus fitted the need until recently. A good way to judge where a device is using a bus interface correct: does all of it's interactions with any other part of the guest state involve method calls to the bus? Right now, the answer is no for just about every device in QEMU. This is the problem that qdev really was meant to solve and we're not really any further along solving it unfortunately. I'm not arguing against a generic factory interface, I'm arguing that it should be separate. IOW: SerialState *serial_create(int iobase, int irq, ...); static DeviceState *qdev_serial_create(QemuOpts *opts); static void serial_init(void) { qdev_register("serial", qdev_serial_create); } The key point is that when we create devices internally, we should have a C-friendly, type-safe interface to interact with. This will encourage composition and a richer device model than what we have today. Isn't this what we have now in most cases? No. The common pattern of shallow conversion is: struct SerialState { // device state }; struct I
Re: [Qemu-devel] KVM call minutes for Feb 8
On Sun, Feb 13, 2011 at 9:57 PM, Anthony Liguori wrote: > On 02/13/2011 01:37 PM, Blue Swirl wrote: >> >> On Sun, Feb 13, 2011 at 5:31 PM, Anthony Liguori >> wrote: >> >>> >>> qdev doesn't expose any state today. qdev properties are >>> construction-only >>> properties that happen to be stored in each device state. >>> >>> What we really need is a full property framework that includes properties >>> with hookable getters and setters along with the ability to mark >>> properties >>> as construct-only, read-only, or read-write. >>> >>> But I think it's reasonable to expose construct-only properties as just >>> an >>> initfn argument. >>> >> >> Sounds OK. About read-write properties, what happens if we one day >> have extensive threading, and locks are pushed to device level? I can >> imagine a deadlock involving one thread running in IO thread for a >> device and another trying to access that device's properties. Maybe >> that is not different from function call version. >> > > You need hookable setters/getters that can acquire a lock and do the right > thing. It shouldn't be able to dead lock if the locking is designed right. > > >>> Yes, but qemu_irq is very restricted as it only models a signal bit of >>> information and doesn't really have a mechanism to attach/detach in any >>> generic way. >>> >> >> Basic signals are already very useful for many purposes, since they >> match digital logic signals in real HW. In theory, whole machines >> could be constructed with just qemu_irq and NAND gate emulator. ;-) >> > > It's not just in theory. In the C++ port of QEMU that I wrote, I > implemented an AND, OR, and XOR gate and implemented a full 32-bit adder by > just using a device config file. > > If done correctly, using referencing can be extremely powerful. A full > adder is a good example. The gates really don't have any concept of bus and > the relationship between gates is definitely not a tree. > >> In the message passing IRQ discussion earlier, it was IIRC decided >> that the one bit version would not be changed but a separate message >> passing version would be created if ever needed. >> > > C already has a message passing interface that supports type safety called > function pointers :-) > > An object that implements multiple interfaces where the interface becomes > the "message passing interface" is exactly what I've been saying we need. > It's flexible and the compiler helps us enforce typing. > >>> >>> Any interfaces of a base class should make sense even for derived >>> classes. >>> >>> That means if the base class is going to expose essentially a pin-out >>> interface, that if I have a PCIDevice and cast it to Device, I should be >>> able to interact with the GPIO interface to interact with the PCI device. >>> Presumably, that means interfacing at the PCI signalling level. That's >>> insane to model in QEMU :-) >>> >> >> This would be doable, if we built buses from a bunch of signals, like >> in VHDL or Verilog. It would simplify aliased MMIO addresses nicely, >> the undecoded address pins would be ignored. I don't think it would be >> useful, but a separate interface could be added for connecting to >> PCIBus with just qemu_irqs. >> > > Yeah, it's possible, but I don't want to spend my time doing this. > >>> In reality, GPIO only makes sense for a small class of simple devices >>> where >>> modelling the pin-out interface makes sense (like a 7-segment LCD). That >>> suggests that GPIO should not be in the DeviceState interface but instead >>> should be in a SimpleDevice subclass or something like that. >>> >>> Could you point to examples of SystemBus overuse? >>> >>> anthony@titi:~/git/qemu/hw$ grep qdev_create *.c | wc -l >>> 73 >>> anthony@titi:~/git/qemu/hw$ grep 'qdev_create(NULL' *.c | wc -l >>> 56 >>> >>> SystemBus has become a catch-all for shallow qdev conversions. We've got >>> Northbridges, RAM, and network devices sitting on the same bus... >>> >> >> On Sparc32 I have not bothered to create a SBus bus. Now it would be >> useful to get bootindex corrected. Most devices (even on-board IO) >> should use SBus. >> >> The only other bus (MBus) would exist between CPU, IOMMU and memory. >> >> But SysBus fitted the need until recently. >> > > A good way to judge where a device is using a bus interface correct: does > all of it's interactions with any other part of the guest state involve > method calls to the bus? > > Right now, the answer is no for just about every device in QEMU. This is > the problem that qdev really was meant to solve and we're not really any > further along solving it unfortunately. > >>> I'm not arguing against a generic factory interface, I'm arguing that it >>> should be separate. >>> >>> IOW: >>> >>> SerialState *serial_create(int iobase, int irq, ...); >>> >>> static DeviceState *qdev_serial_create(QemuOpts *opts); >>> >>> static void serial_init(void) >>> { >>> qdev_register("serial", qdev_serial_create); >>> } >>> >>> The key p
[Qemu-devel] [PATCH 0/4] More qdev conversion and optional devices
Make applesmc and vga-isa optional, convert i8254 to qdev. Blue Swirl (4): applesmc: make optional vga-isa: convert to qdev vga-isa: make optional i8254: convert to qdev Makefile.objs |1 + Makefile.target|2 +- default-configs/i386-softmmu.mak |1 + default-configs/x86_64-softmmu.mak |1 + hw/i8254.c | 61 ++-- hw/mips_fulong2e.c |4 +- hw/mips_jazz.c |4 +- hw/mips_malta.c|4 +- hw/mips_r4k.c |4 +- hw/pc.c|5 +-- hw/pc.h| 39 +- hw/pcspk.c |4 +- hw/ppc_prep.c |4 +- hw/vga-isa.c | 51 ++--- hw/vga.c | 22 - hw/vga_int.h |1 + 16 files changed, 152 insertions(+), 56 deletions(-)
[Qemu-devel] [PATCH 1/4] applesmc: make optional
Based on patch by David Ahern. Signed-off-by: Blue Swirl --- Makefile.objs |1 + Makefile.target|2 +- default-configs/i386-softmmu.mak |1 + default-configs/x86_64-softmmu.mak |1 + 4 files changed, 4 insertions(+), 1 deletions(-) diff --git a/Makefile.objs b/Makefile.objs index adb5842..25ff8a5 100644 --- a/Makefile.objs +++ b/Makefile.objs @@ -196,6 +196,7 @@ hw-obj-$(CONFIG_ACPI) += acpi.o acpi_piix4.o hw-obj-$(CONFIG_APM) += pm_smbus.o apm.o hw-obj-$(CONFIG_DMA) += dma.o hw-obj-$(CONFIG_HPET) += hpet.o +hw-obj-$(CONFIG_APPLESMC) += applesmc.o # PPC devices hw-obj-$(CONFIG_OPENPIC) += openpic.o diff --git a/Makefile.target b/Makefile.target index a6c30dd..e4b1210 100644 --- a/Makefile.target +++ b/Makefile.target @@ -215,7 +215,7 @@ obj-$(CONFIG_KVM) += ivshmem.o obj-i386-y += vga.o obj-i386-y += mc146818rtc.o i8259.o pc.o obj-i386-y += cirrus_vga.o apic.o ioapic.o piix_pci.o -obj-i386-y += vmport.o applesmc.o +obj-i386-y += vmport.o obj-i386-y += device-hotplug.o pci-hotplug.o smbios.o wdt_ib700.o obj-i386-y += debugcon.o multiboot.o obj-i386-y += pc_piix.o diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak index 3e0eddf..55589fa 100644 --- a/default-configs/i386-softmmu.mak +++ b/default-configs/i386-softmmu.mak @@ -20,3 +20,4 @@ CONFIG_NE2000_ISA=y CONFIG_PIIX_PCI=y CONFIG_SOUND=y CONFIG_HPET=y +CONFIG_APPLESMC=y diff --git a/default-configs/x86_64-softmmu.mak b/default-configs/x86_64-softmmu.mak index 1cc1b61..59b7893 100644 --- a/default-configs/x86_64-softmmu.mak +++ b/default-configs/x86_64-softmmu.mak @@ -20,3 +20,4 @@ CONFIG_NE2000_ISA=y CONFIG_PIIX_PCI=y CONFIG_SOUND=y CONFIG_HPET=y +CONFIG_APPLESMC=y -- 1.6.2.4 0001-applesmc-make-optional.patch Description: application/mbox
[Qemu-devel] [PATCH 2/4] vga-isa: convert to qdev
Signed-off-by: Blue Swirl --- hw/pc.h |8 +++- hw/vga-isa.c | 51 +-- hw/vga.c | 22 ++ hw/vga_int.h |1 + 4 files changed, 67 insertions(+), 15 deletions(-) diff --git a/hw/pc.h b/hw/pc.h index 64a3a22..475484a 100644 --- a/hw/pc.h +++ b/hw/pc.h @@ -181,7 +181,13 @@ enum vga_retrace_method { extern enum vga_retrace_method vga_retrace_method; -int isa_vga_init(void); +static inline int isa_vga_init(void) +{ +isa_create_simple("isa-vga"); + +return 0; +} + int pci_vga_init(PCIBus *bus); int isa_vga_mm_init(target_phys_addr_t vram_base, target_phys_addr_t ctrl_base, int it_shift); diff --git a/hw/vga-isa.c b/hw/vga-isa.c index 3046054..fde0d56 100644 --- a/hw/vga-isa.c +++ b/hw/vga-isa.c @@ -29,16 +29,40 @@ #include "qemu-timer.h" #include "loader.h" -int isa_vga_init(void) +typedef struct ISAVGAState { +ISADevice dev; +struct VGACommonState state; +} ISAVGAState; + +static void vga_reset_isa(DeviceState *dev) { -VGACommonState *s; +ISAVGAState *d = container_of(dev, ISAVGAState, dev.qdev); +VGACommonState *s = &d->state; -s = qemu_mallocz(sizeof(*s)); +vga_common_reset(s); +} -vga_common_init(s, VGA_RAM_SIZE); -vga_init(s); -vmstate_register(NULL, 0, &vmstate_vga_common, s); +static int vga_initfn(ISADevice *dev) +{ +ISAVGAState *d = DO_UPCAST(ISAVGAState, dev, dev); +VGACommonState *s = &d->state; +int vga_io_memory; +vga_common_init(s, VGA_RAM_SIZE); +vga_io_memory = vga_init_io(s); +cpu_register_physical_memory(isa_mem_base + 0x000a, 0x2, + vga_io_memory); +qemu_register_coalesced_mmio(isa_mem_base + 0x000a, 0x2); +isa_init_ioport(dev, 0x3c0); +isa_init_ioport(dev, 0x3b4); +isa_init_ioport(dev, 0x3ba); +isa_init_ioport(dev, 0x3da); +isa_init_ioport(dev, 0x3c0); +#ifdef CONFIG_BOCHS_VBE +isa_init_ioport(dev, 0x1ce); +isa_init_ioport(dev, 0x1cf); +isa_init_ioport(dev, 0x1d0); +#endif /* CONFIG_BOCHS_VBE */ s->ds = graphic_console_init(s->update, s->invalidate, s->screen_dump, s->text_update, s); @@ -47,3 +71,18 @@ int isa_vga_init(void) rom_add_vga(VGABIOS_FILENAME); return 0; } + +static ISADeviceInfo vga_info = { +.qdev.name = "isa-vga", +.qdev.size = sizeof(ISAVGAState), +.qdev.vmsd = &vmstate_vga_common, +.qdev.reset = vga_reset_isa, +.qdev.no_user = 1, +.init = vga_initfn, +}; + +static void vga_register(void) +{ +isa_qdev_register(&vga_info); +} +device_init(vga_register) diff --git a/hw/vga.c b/hw/vga.c index e2151a2..fd492d0 100644 --- a/hw/vga.c +++ b/hw/vga.c @@ -2259,12 +2259,8 @@ void vga_common_init(VGACommonState *s, int vga_ram_size) } /* used by both ISA and PCI */ -void vga_init(VGACommonState *s) +int vga_init_io(VGACommonState *s) { -int vga_io_memory; - -qemu_register_reset(vga_reset, s); - register_ioport_write(0x3c0, 16, 1, vga_ioport_write, s); register_ioport_write(0x3b4, 2, 1, vga_ioport_write, s); @@ -2278,7 +2274,6 @@ void vga_init(VGACommonState *s) register_ioport_read(0x3d4, 2, 1, vga_ioport_read, s); register_ioport_read(0x3ba, 1, 1, vga_ioport_read, s); register_ioport_read(0x3da, 1, 1, vga_ioport_read, s); -s->bank_offset = 0; #ifdef CONFIG_BOCHS_VBE #if defined (TARGET_I386) @@ -2296,8 +2291,19 @@ void vga_init(VGACommonState *s) #endif #endif /* CONFIG_BOCHS_VBE */ -vga_io_memory = cpu_register_io_memory(vga_mem_read, vga_mem_write, s, - DEVICE_LITTLE_ENDIAN); +return cpu_register_io_memory(vga_mem_read, vga_mem_write, s, + DEVICE_LITTLE_ENDIAN); +} + +void vga_init(VGACommonState *s) +{ +int vga_io_memory; + +qemu_register_reset(vga_reset, s); + +s->bank_offset = 0; + +vga_io_memory = vga_init_io(s); cpu_register_physical_memory(isa_mem_base + 0x000a, 0x2, vga_io_memory); qemu_register_coalesced_mmio(isa_mem_base + 0x000a, 0x2); diff --git a/hw/vga_int.h b/hw/vga_int.h index 1067f2c..d2811bd 100644 --- a/hw/vga_int.h +++ b/hw/vga_int.h @@ -191,6 +191,7 @@ static inline int c6_to_8(int v) void vga_common_init(VGACommonState *s, int vga_ram_size); void vga_init(VGACommonState *s); +int vga_init_io(VGACommonState *s); void vga_common_reset(VGACommonState *s); void vga_dirty_log_start(VGACommonState *s); -- 1.6.2.4 0002-vga-isa-convert-to-qdev.patch Description: application/mbox
[Qemu-devel] [PATCH 3/4] vga-isa: make optional
Ignore failure with vga-isa device creation, but print a warning message. Signed-off-by: Blue Swirl --- hw/pc.h | 10 -- 1 files changed, 8 insertions(+), 2 deletions(-) diff --git a/hw/pc.h b/hw/pc.h index 475484a..60f8c42 100644 --- a/hw/pc.h +++ b/hw/pc.h @@ -183,9 +183,15 @@ extern enum vga_retrace_method vga_retrace_method; static inline int isa_vga_init(void) { -isa_create_simple("isa-vga"); +ISADevice *dev; -return 0; +dev = isa_try_create("isa-vga"); +if (!dev) { +fprintf(stderr, "Warning: isa-vga not available\n"); +return 0; +} +qdev_init_nofail(&dev->qdev); +return 1; } int pci_vga_init(PCIBus *bus); -- 1.6.2.4 0003-vga-isa-make-optional.patch Description: application/mbox
[Qemu-devel] [PATCH 4/4] i8254: convert to qdev
Convert to qdev. Don't expose PITState. Signed-off-by: Blue Swirl --- hw/i8254.c | 61 +-- hw/mips_fulong2e.c |4 +- hw/mips_jazz.c |4 +- hw/mips_malta.c|4 +- hw/mips_r4k.c |4 +- hw/pc.c|5 +-- hw/pc.h| 25 ++-- hw/pcspk.c |4 +- hw/ppc_prep.c |4 +- 9 files changed, 75 insertions(+), 40 deletions(-) diff --git a/hw/i8254.c b/hw/i8254.c index 06b225c..680caab 100644 --- a/hw/i8254.c +++ b/hw/i8254.c @@ -53,9 +53,12 @@ typedef struct PITChannelState { qemu_irq irq; } PITChannelState; -struct PITState { +typedef struct PITState { +ISADevice dev; +uint32_t irq; +uint32_t iobase; PITChannelState channels[3]; -}; +} PITState; static PITState pit_state; @@ -119,8 +122,9 @@ static int pit_get_out1(PITChannelState *s, int64_t current_time) return out; } -int pit_get_out(PITState *pit, int channel, int64_t current_time) +int pit_get_out(ISADevice *dev, int channel, int64_t current_time) { +PITState *pit = DO_UPCAST(PITState, dev, dev); PITChannelState *s = &pit->channels[channel]; return pit_get_out1(s, current_time); } @@ -179,8 +183,9 @@ static int64_t pit_get_next_transition_time(PITChannelState *s, } /* val must be 0 or 1 */ -void pit_set_gate(PITState *pit, int channel, int val) +void pit_set_gate(ISADevice *dev, int channel, int val) { +PITState *pit = DO_UPCAST(PITState, dev, dev); PITChannelState *s = &pit->channels[channel]; switch(s->mode) { @@ -210,20 +215,23 @@ void pit_set_gate(PITState *pit, int channel, int val) s->gate = val; } -int pit_get_gate(PITState *pit, int channel) +int pit_get_gate(ISADevice *dev, int channel) { +PITState *pit = DO_UPCAST(PITState, dev, dev); PITChannelState *s = &pit->channels[channel]; return s->gate; } -int pit_get_initial_count(PITState *pit, int channel) +int pit_get_initial_count(ISADevice *dev, int channel) { +PITState *pit = DO_UPCAST(PITState, dev, dev); PITChannelState *s = &pit->channels[channel]; return s->count; } -int pit_get_mode(PITState *pit, int channel) +int pit_get_mode(ISADevice *dev, int channel) { +PITState *pit = DO_UPCAST(PITState, dev, dev); PITChannelState *s = &pit->channels[channel]; return s->mode; } @@ -462,9 +470,9 @@ static const VMStateDescription vmstate_pit = { } }; -static void pit_reset(void *opaque) +static void pit_reset(DeviceState *dev) { -PITState *pit = opaque; +PITState *pit = container_of(dev, PITState, dev.qdev); PITChannelState *s; int i; @@ -498,20 +506,39 @@ void hpet_pit_enable(void) pit_load_count(s, 0); } -PITState *pit_init(int base, qemu_irq irq) +static int pit_initfn(ISADevice *dev) { -PITState *pit = &pit_state; +PITState *pit = DO_UPCAST(PITState, dev, dev); PITChannelState *s; s = &pit->channels[0]; /* the timer 0 is connected to an IRQ */ s->irq_timer = qemu_new_timer(vm_clock, pit_irq_timer, s); -s->irq = irq; +s->irq = isa_reserve_irq(pit->irq); -vmstate_register(NULL, base, &vmstate_pit, pit); -qemu_register_reset(pit_reset, pit); -register_ioport_write(base, 4, 1, pit_ioport_write, pit); -register_ioport_read(base, 3, 1, pit_ioport_read, pit); +register_ioport_write(pit->iobase, 4, 1, pit_ioport_write, pit); +register_ioport_read(pit->iobase, 3, 1, pit_ioport_read, pit); +isa_init_ioport(dev, pit->iobase); -return pit; +return 0; +} + +static ISADeviceInfo pit_info = { +.qdev.name = "isa-pit", +.qdev.size = sizeof(PITState), +.qdev.vmsd = &vmstate_pit, +.qdev.reset= pit_reset, +.qdev.no_user = 1, +.init = pit_initfn, +.qdev.props = (Property[]) { +DEFINE_PROP_UINT32("irq", PITState, irq, -1), +DEFINE_PROP_HEX32("iobase", PITState, iobase, -1), +DEFINE_PROP_END_OF_LIST(), +}, +}; + +static void pit_register(void) +{ +isa_qdev_register(&pit_info); } +device_init(pit_register) diff --git a/hw/mips_fulong2e.c b/hw/mips_fulong2e.c index 2783ed5..f5ae639 100644 --- a/hw/mips_fulong2e.c +++ b/hw/mips_fulong2e.c @@ -67,7 +67,7 @@ #define FULONG2E_ATI_SLOT6 #define FULONG2E_RTL8139_SLOT7 -static PITState *pit; +static ISADevice *pit; static struct _loaderparams { int ram_size; @@ -369,7 +369,7 @@ static void mips_fulong2e_init(ram_addr_t ram_size, const char *boot_device, qdev_init_nofail(eeprom); /* init other devices */ -pit = pit_init(0x40, isa_reserve_irq(0)); +pit = pit_init(0x40, 0); cpu_exit_irq = qemu_allocate_irqs(cpu_request_exit, NULL, 1); DMA_init(0, cpu_exit_irq); diff --git a/hw/mips_jazz.c b/hw/mips_jazz.c index 85eba5a..a100394 100644 --- a/hw/mips_jazz.c +++ b/hw/mips_jazz.c @@ -115,7 +115,7 @@ void mips_jazz_init (ram_addr_t ram_size, void* rc4030_opaque; int s_rt
Re: [Qemu-devel] KVM call minutes for Feb 8
On 13 February 2011 16:56, Anthony Liguori wrote: > If we can move away from Bus abstraction and to a simpler interface > mechanism, then we can express peer relationships by just having bidirection > references. IOW: > > -device cpus,northbridge=nb,id=cpus,count=16 -device i440fx,cpus=cpus > > I don't think modelling each CPU makes sense. We should probably just model > all cpus in a single device for the sake of simplicity. How would this work for systems with multiple CPUs which have different views of the world? (ie their memory maps differ so that eg some RAM is shared between them but some parts of the address space are different RAM for the two cores, some devices one core only, some devices shared between cores but the device can tell which core made an IO request) With a bus-style abstraction this is straightforward: each core has its own bus which is what defines its view of the world, some devices and RAM are wired up to both buses. I'm not sure how the bidirectional reference model would look for this? (Real world examples would be if we ever had any need to actually model any of the auxiliary cores in say an OMAP device, or the M3 in a versatile-express. Yes, most systems won't look that odd but it does come up, especially in testbench type designs, and our interface abstraction should be able to handle it.) -- PMM
[Qemu-devel] [PATCH 2/2][v3] linux-user: correct core dump format
This patch allows to really use the core dumped by qemu with guest architecture tools. - it adds a missing bswap_phdr() for the program headers of memory regions. "objdump -x" sample: BEFORE: 0x100 off0x0020 vaddr 0x0400 paddr 0x align 2**21 filesz 0x memsz 0x0010 flags --- 0x100 off0x0020 vaddr 0x00100400 paddr 0x align 2**21 filesz 0x memsz 0x0008 flags --- 600 AFTER: LOAD off0x2000 vaddr 0x0004 paddr 0x align 2**13 filesz 0x memsz 0x1000 flags --- LOAD off0x2000 vaddr 0x00041000 paddr 0x align 2**13 filesz 0x memsz 0x0800 flags rw- - it doesn't pad the note size to sizeof(int32_t). On m68k the NT_PRSTATUS note size is 154 and must not be rounded up to 156, because this value is checked by objdump and gdb. "gdb" symptoms: "warning: Couldn't find general-purpose registers in core file." "objdump -x" sample: BEFORE: Sections: Idx Name Size VMA LMA File off Algn 0 note0 01c4 03b4 2**0 CONTENTS, READONLY 1 .auxv 0070 0508 2**2 CONTENTS 2 proc1 0010 0400 0020 2**10 READONLY AFTER: Sections: Idx Name Size VMA LMA File off Algn 0 note0 01c4 03b4 2**0 CONTENTS, READONLY 1 .reg/190220050 040e 2**2 CONTENTS 2 .reg 0050 040e 2**2 CONTENTS 3 .auxv 0070 0508 2**2 CONTENTS 4 load1 0004 2000 2**13 ALLOC, READONLY Signed-off-by: Laurent Vivier --- v2: use a predefined alignment size for target_elf_prstatus v3: use target_ aligned according target properties linux-user/elfload.c | 34 ++ 1 files changed, 18 insertions(+), 16 deletions(-) diff --git a/linux-user/elfload.c b/linux-user/elfload.c index 2de83e4..fe5410e 100644 --- a/linux-user/elfload.c +++ b/linux-user/elfload.c @@ -103,13 +103,13 @@ enum { typedef target_ulongtarget_elf_greg_t; #ifdef USE_UID16 -typedef uint16_ttarget_uid_t; -typedef uint16_ttarget_gid_t; +typedef target_ushort target_uid_t; +typedef target_ushort target_gid_t; #else -typedef uint32_ttarget_uid_t; -typedef uint32_ttarget_gid_t; +typedef target_uint target_uid_t; +typedef target_uint target_gid_t; #endif -typedef int32_t target_pid_t; +typedef target_int target_pid_t; #ifdef TARGET_I386 @@ -1761,19 +1761,20 @@ struct memelfnote { size_t namesz_rounded; inttype; size_t datasz; +size_t datasz_rounded; void *data; size_t notesz; }; struct target_elf_siginfo { -int si_signo; /* signal number */ -int si_code; /* extra code */ -int si_errno; /* errno */ +target_int si_signo; /* signal number */ +target_int si_code; /* extra code */ +target_int si_errno; /* errno */ }; struct target_elf_prstatus { struct target_elf_siginfo pr_info; /* Info associated with signal */ -short pr_cursig;/* Current signal */ +target_short pr_cursig;/* Current signal */ target_ulong pr_sigpend; /* XXX */ target_ulong pr_sighold; /* XXX */ target_pid_t pr_pid; @@ -1785,7 +1786,7 @@ struct target_elf_prstatus { struct target_timeval pr_cutime; /* XXX Cumulative user time */ struct target_timeval pr_cstime; /* XXX Cumulative system time */ target_elf_gregset_t pr_reg; /* GP registers */ -intpr_fpvalid; /* XXX */ +target_int pr_fpvalid; /* XXX */ }; #define ELF_PRARGSZ (80) /* Number of chars for args */ @@ -2036,7 +2037,9 @@ static void fill_note(struct memelfnote *note, const char *name, int type, note->namesz = namesz; note->namesz_rounded = roundup(namesz, sizeof (int32_t)); note->type = type; -note->datasz = roundup(sz, sizeof (int32_t));; +note->datasz = sz; +note->datasz_rounded = roundup(sz, sizeof (int32_t)); + note->data = data; /* @@ -2044,7 +2047,7 @@ static void fill_note(struct memelfnote *note, const char *name, int type, * ELF document. */ note->notesz = sizeof (struct elf_note) + -note->namesz_rounded + note->datasz; +note->namesz_rounded + note->datasz_rounded; } static void fill_elf_header(struct elfhdr *elf, int segs, uint16_t machine, @@ -2264,7 +2267,7 @@ static int write_note(struct memelfnote *men, int fd) return (-1); if (dump_write(fd, men->name, me
[Qemu-devel] [PATCH 1/2][v3] linux-user: Define target alignment size
Datatype alignment can be found using following application: int main(void) { printf("alignof(short) %ld\n", __alignof__(short)); printf("alignof(int) %ld\n", __alignof__(int)); printf("alignof(long) %ld\n", __alignof__(long)); printf("alignof(long long) %ld\n", __alignof__(long long)); } This patch includes following alignments: i386 alignof(short) 2 alignof(int) 4 alignof(long) 4 alignof(long long) 8 x86_64 alignof(short) 2 alignof(int) 4 alignof(long) 8 alignof(long long) 8 arm alignof(short) 2 alignof(int) 4 alignof(long) 4 alignof(long long) 4 m68k (680x0) alignof(short) 2 alignof(int) 2 alignof(long) 2 alignof(long long) 2 mips alignof(short) 2 alignof(int) 4 alignof(long) 4 alignof(long long) 8 ppc alignof(short) 2 alignof(int) 4 alignof(long) 4 alignof(long long) 8 for other targets, use by default (2,4,4,8). Please, update for your favorite target... Signed-off-by: Laurent Vivier --- v2: compute align size for each basic datatype v3: add alignments for some 64bit targets, renanme long_long to llong configure | 17 + cpu-defs.h | 14 ++ 2 files changed, 27 insertions(+), 4 deletions(-) diff --git a/configure b/configure index 25381e5..c36c553 100755 --- a/configure +++ b/configure @@ -2919,6 +2919,10 @@ target_nptl="no" interp_prefix1=`echo "$interp_prefix" | sed "s/%M/$target_arch2/g"` echo "CONFIG_QEMU_INTERP_PREFIX=\"$interp_prefix1\"" >> $config_target_mak gdb_xml_files="" +target_short_alignment=2 +target_int_alignment=4 +target_long_alignment=4 +target_llong_alignment=8 TARGET_ARCH="$target_arch2" TARGET_BASE_ARCH="" @@ -2931,9 +2935,11 @@ case "$target_arch2" in x86_64) TARGET_BASE_ARCH=i386 target_phys_bits=64 +target_long_alignment=8 ;; alpha) target_phys_bits=64 +target_long_alignment=8 target_nptl="yes" ;; arm|armeb) @@ -2942,6 +2948,7 @@ case "$target_arch2" in target_nptl="yes" gdb_xml_files="arm-core.xml arm-vfp.xml arm-vfp3.xml arm-neon.xml" target_phys_bits=32 +target_llong_alignment=4 ;; cris) target_nptl="yes" @@ -2951,6 +2958,9 @@ case "$target_arch2" in bflt="yes" gdb_xml_files="cf-core.xml cf-fp.xml" target_phys_bits=32 +target_int_alignment=2 +target_long_alignment=2 +target_llong_alignment=2 ;; microblaze) bflt="yes" @@ -2974,6 +2984,7 @@ case "$target_arch2" in TARGET_BASE_ARCH=mips echo "TARGET_ABI_MIPSN64=y" >> $config_target_mak target_phys_bits=64 +target_long_alignment=8 ;; ppc) gdb_xml_files="power-core.xml power-fpu.xml power-altivec.xml power-spe.xml" @@ -2992,6 +3003,7 @@ case "$target_arch2" in TARGET_ABI_DIR=ppc gdb_xml_files="power64-core.xml power-fpu.xml power-altivec.xml power-spe.xml" target_phys_bits=64 +target_long_alignment=8 ;; ppc64abi32) TARGET_ARCH=ppc64 @@ -3013,6 +3025,7 @@ case "$target_arch2" in sparc64) TARGET_BASE_ARCH=sparc target_phys_bits=64 +target_long_alignment=8 ;; sparc32plus) TARGET_ARCH=sparc64 @@ -3029,6 +3042,10 @@ case "$target_arch2" in exit 1 ;; esac +echo "TARGET_SHORT_ALIGNMENT=$target_short_alignment" >> $config_target_mak +echo "TARGET_INT_ALIGNMENT=$target_int_alignment" >> $config_target_mak +echo "TARGET_LONG_ALIGNMENT=$target_long_alignment" >> $config_target_mak +echo "TARGET_LLONG_ALIGNMENT=$target_llong_alignment" >> $config_target_mak echo "TARGET_ARCH=$TARGET_ARCH" >> $config_target_mak target_arch_name="`echo $TARGET_ARCH | tr '[:lower:]' '[:upper:]'`" echo "TARGET_$target_arch_name=y" >> $config_target_mak diff --git a/cpu-defs.h b/cpu-defs.h index 8d4bf86..37780e7 100644 --- a/cpu-defs.h +++ b/cpu-defs.h @@ -37,16 +37,22 @@ #define TARGET_LONG_SIZE (TARGET_LONG_BITS / 8) +typedef int16_t target_short __attribute__ ((aligned(TARGET_SHORT_ALIGNMENT))); +typedef uint16_t target_ushort __attribute__((aligned(TARGET_SHORT_ALIGNMENT))); +typedef int32_t target_int __attribute__((aligned(TARGET_INT_ALIGNMENT))); +typedef uint32_t target_uint __attribute__((aligned(TARGET_INT_ALIGNMENT))); +typedef int64_t target_llong __attribute__((aligned(TARGET_LLONG_ALIGNMENT))); +typedef uint64_t target_ullong __attribute__((aligned(TARGET_LLONG_ALIGNMENT))); /* target_ulong is the type of a virtual address */ #if TARGET_LONG_SIZE == 4 -typedef int32_t target_long; -typedef uint32_t target_ulong; +typedef int32_t target_long __attribute__((aligned(TARGET_LONG_ALIGNMENT))); +typedef uint32_t target_ulong __attribute__((aligned(TARGET_LONG_ALIGNMENT))); #define TARGET_FMT_lx "%08x" #define TARGET_FMT_ld "%d" #define TARGET_FMT_lu "%u" #elif TARGET_LONG_SIZE == 8 -typedef int64_t target_long; -typedef uint64_t target_ulong; +typedef int64_t target_long __attribute__((aligned(TARGET_LONG_ALIGNMENT))); +typedef uint64_t target_ulong __attribute__((alig
[Qemu-devel] [PATCH 0/2][v4] correct core dump format
This is the v4 of my patch correcting the core dump format. (3 versions for patch 2, 3 versions for patch 1 starting at version 2 of patch 2...) v4 adds some long alignments for 64bit targets, renames target_long_long to target_llong, and so on... v3 introduces a new parameter of the target: the datatype alignment size. Targets like i386, mips or ppc align (short, int, long, long long) on (2, 4, 4, 8), target like x86_64 aligns on (2, 4, 8, 8) but arm aligns on (2, 4, 4, 4) and m68k (680x0) on (2, 2, 2, 2). And this knowledge is needed to correctly generate a core dump. For other targets, please update the patch with your favorite one. V2 introduces target_elf_prstatus alignment (to manage arm and m68k) v1 corrects core dump for m68k [PATCH 1/2][v3] linux-user: Define target alignment size [PATCH 2/2][v3] linux-user: correct core dump format
Re: [Qemu-devel] KVM call minutes for Feb 8
On 02/13/2011 03:00 PM, Blue Swirl wrote: On Sun, Feb 13, 2011 at 9:57 PM, Anthony Liguori wrote: On 02/13/2011 01:37 PM, Blue Swirl wrote: On Sun, Feb 13, 2011 at 5:31 PM, Anthony Liguori wrote: qdev doesn't expose any state today. qdev properties are construction-only properties that happen to be stored in each device state. What we really need is a full property framework that includes properties with hookable getters and setters along with the ability to mark properties as construct-only, read-only, or read-write. But I think it's reasonable to expose construct-only properties as just an initfn argument. Sounds OK. About read-write properties, what happens if we one day have extensive threading, and locks are pushed to device level? I can imagine a deadlock involving one thread running in IO thread for a device and another trying to access that device's properties. Maybe that is not different from function call version. You need hookable setters/getters that can acquire a lock and do the right thing. It shouldn't be able to dead lock if the locking is designed right. Yes, but qemu_irq is very restricted as it only models a signal bit of information and doesn't really have a mechanism to attach/detach in any generic way. Basic signals are already very useful for many purposes, since they match digital logic signals in real HW. In theory, whole machines could be constructed with just qemu_irq and NAND gate emulator. ;-) It's not just in theory. In the C++ port of QEMU that I wrote, I implemented an AND, OR, and XOR gate and implemented a full 32-bit adder by just using a device config file. If done correctly, using referencing can be extremely powerful. A full adder is a good example. The gates really don't have any concept of bus and the relationship between gates is definitely not a tree. In the message passing IRQ discussion earlier, it was IIRC decided that the one bit version would not be changed but a separate message passing version would be created if ever needed. C already has a message passing interface that supports type safety called function pointers :-) An object that implements multiple interfaces where the interface becomes the "message passing interface" is exactly what I've been saying we need. It's flexible and the compiler helps us enforce typing. Any interfaces of a base class should make sense even for derived classes. That means if the base class is going to expose essentially a pin-out interface, that if I have a PCIDevice and cast it to Device, I should be able to interact with the GPIO interface to interact with the PCI device. Presumably, that means interfacing at the PCI signalling level. That's insane to model in QEMU :-) This would be doable, if we built buses from a bunch of signals, like in VHDL or Verilog. It would simplify aliased MMIO addresses nicely, the undecoded address pins would be ignored. I don't think it would be useful, but a separate interface could be added for connecting to PCIBus with just qemu_irqs. Yeah, it's possible, but I don't want to spend my time doing this. In reality, GPIO only makes sense for a small class of simple devices where modelling the pin-out interface makes sense (like a 7-segment LCD). That suggests that GPIO should not be in the DeviceState interface but instead should be in a SimpleDevice subclass or something like that. Could you point to examples of SystemBus overuse? anthony@titi:~/git/qemu/hw$ grep qdev_create *.c | wc -l 73 anthony@titi:~/git/qemu/hw$ grep 'qdev_create(NULL' *.c | wc -l 56 SystemBus has become a catch-all for shallow qdev conversions. We've got Northbridges, RAM, and network devices sitting on the same bus... On Sparc32 I have not bothered to create a SBus bus. Now it would be useful to get bootindex corrected. Most devices (even on-board IO) should use SBus. The only other bus (MBus) would exist between CPU, IOMMU and memory. But SysBus fitted the need until recently. A good way to judge where a device is using a bus interface correct: does all of it's interactions with any other part of the guest state involve method calls to the bus? Right now, the answer is no for just about every device in QEMU. This is the problem that qdev really was meant to solve and we're not really any further along solving it unfortunately. I'm not arguing against a generic factory interface, I'm arguing that it should be separate. IOW: SerialState *serial_create(int iobase, int irq, ...); static DeviceState *qdev_serial_create(QemuOpts *opts); static void serial_init(void) { qdev_register("serial", qdev_serial_create); } The key point is that when we create devices internally, we should have a C-friendly, type-safe interface to interact with. This will encourage composition and a richer device model than what
[Qemu-devel] Re: RFC: New API for PPC for vcpu mmu access
On 12.02.2011, at 01:57, Scott Wood wrote: > On Fri, 11 Feb 2011 22:07:11 +0100 > Alexander Graf wrote: > >> >> On 11.02.2011, at 21:53, Scott Wood wrote: >> >>> On Fri, 11 Feb 2011 02:41:35 +0100 >>> Alexander Graf wrote: >>> >> Maybe we should go with Avi's proposal after all and simply keep the >> full soft-mmu synced between kernel and user space? That way we only >> need a setup call at first, no copying in between and simply update the >> user space version whenever something changes in the guest. We need to >> store the TLB's contents off somewhere anyways, so all we need is an >> additional in-kernel array with internal translation data, but that can >> be separate from the guest visible data, right? >>> >>> Hmm, the idea is growing on me. >>> So then everything we need to get all the functionality we need is a hint from kernel to user space that something changed and vice versa. From kernel to user space is simple. We can just document that after every RUN, all fields can be modified. From user space to kernel, we could modify the entries directly and then pass in an ioctl that passes in a dirty bitmap to kernel space. KVM can then decide what to do with it. I guess the easiest implementation for now would be to ignore the bitmap and simply flush the shadow tlb. That gives us the flush almost for free. All we need to do is set the tlb to all zeros (should be done by env init anyways) and pass in the "something changed" call. KVM can then decide to simply drop all of its shadow state or loop through every shadow entry and flush it individually. Maybe we should give a hint on the amount of flushes, so KVM can implement some threshold. >>> >>> OK. We'll also need a config ioctl to specify MMU type/size and the address >>> of the arrays. >> >> Right, a setup call basically :). > > OK, here goes v3: > > [Note: one final thought after writing this up -- this isn't going to work > too well in cases where the guest can directly manipulate its TLB, such as > with the LRAT feature of Power Arch 2.06. We'll still need a > copy-in/copy-out mechanism for that.] In that case qemu sets the mmu mode respectively and is aware of the fact that it needs get/set methods. We'll get to that when we have LRAT implemented :). > struct kvmppc_book3e_tlb_entry { > union { > __u64 mas8_1; > struct { > __u32 mas8; > __u32 mas1; > }; > }; > __u64 mas2; > union { > __u64 mas7_3 > struct { > __u32 mas7; > __u32 mas3; > }; > }; > }; Looks good to me, except for the anonymous unions and structs of course. Avi dislikes those :). Is there any obvious reason we need to have unions in the first place? The compiler should be clever enough to pick the right size accessors when writing/reading masked __u64 entries, no? The struct name should also have a version indicator - it's the data descriptor only a single specific mmu_type, right? > > For an MMU type of KVM_MMU_PPC_BOOK3E_NOHV, the mas8 in > kvmppc_book3e_tlb_entry is > present but not supported. > > struct kvmppc_book3e_tlb_params { > /* >* book3e defines 4 TLBs. Individual implementations may have >* fewer. TLBs that do not already exist on the target must be >* configured with a size of zero. A tlb_ways value of zero means >* the array is fully associative. Only TLBs that are already >* set-associative on the target may be configured with a different >* associativity. A set-associative TLB may not exceed 255 ways. >* >* KVM will adjust TLBnCFG based on the sizes configured here, >* though arrays greater than 2048 entries will have TLBnCFG[NENTRY] >* set to zero. >* >* The size of any TLB that is set-associative must be a multiple of >* the number of ways, and the number of sets must be a power of two. >* >* The page sizes supported by a TLB shall be determined by reading >* the TLB configuration registers. This is not adjustable by > userspace. >* [Note: need sregs] >*/ > __u32 tlb_sizes[4]; > __u8 tlb_ways[4]; > __u32 reserved[11]; > }; > > KVM_CONFIG_TLB > -- > > Capability: KVM_CAP_SW_TLB > Type: vcpu ioctl > Parameters: struct kvm_config_tlb (in) > Returns: 0 on success > -1 on error > > struct kvm_config_tlb { > __u64 params; > __u64 array; > __u32 mmu_type; > __u32 array_len; Some reserved bits please. IIRC Avi also really likes it when in addition to reserved fields there's also a "features" int that indicates which reserved fields are actually usable. Shouldn't hurt here either, right? > }; >
Re: [Qemu-devel] KVM call minutes for Feb 8
On 02/13/2011 03:24 PM, Peter Maydell wrote: On 13 February 2011 16:56, Anthony Liguori wrote: If we can move away from Bus abstraction and to a simpler interface mechanism, then we can express peer relationships by just having bidirection references. IOW: -device cpus,northbridge=nb,id=cpus,count=16 -device i440fx,cpus=cpus I don't think modelling each CPU makes sense. We should probably just model all cpus in a single device for the sake of simplicity. How would this work for systems with multiple CPUs which have different views of the world? (ie their memory maps differ so that eg some RAM is shared between them but some parts of the address space are different RAM for the two cores, some devices one core only, some devices shared between cores but the device can tell which core made an IO request) With a bus-style abstraction this is straightforward: each core has its own bus which is what defines its view of the world, some devices and RAM are wired up to both buses. I'm not sure how the bidirectional reference model would look for this? Each core has it's own northbridge. You would do: -device arm-cpu,northbridge=nb1 -device dsp,northbridge=nb2 Or whatever. Regards, Anthony Liguori (Real world examples would be if we ever had any need to actually model any of the auxiliary cores in say an OMAP device, or the M3 in a versatile-express. Yes, most systems won't look that odd but it does come up, especially in testbench type designs, and our interface abstraction should be able to handle it.) -- PMM
Re: [Qemu-devel] KVM call minutes for Feb 8
On 13 February 2011 22:43, Anthony Liguori wrote: > On 02/13/2011 03:24 PM, Peter Maydell wrote: >> How would this work for systems with multiple CPUs which have different >> views of the world? (ie their memory maps differ so that eg some RAM is >> shared between them but some parts of the address space are different >> RAM for the two cores, some devices one core only, some devices shared >> between cores but the device can tell which core made an IO request) >> With a bus-style abstraction this is straightforward: each core has its >> own bus which is what defines its view of the world, some devices >> and RAM are wired up to both buses. I'm not sure how the bidirectional >> reference model would look for this? > > Each core has it's own northbridge. You would do: > > -device arm-cpu,northbridge=nb1 -device dsp,northbridge=nb2 I'm afraid I don't really understand what you mean here. "Northbridge" as I understand it is a very PC-architecture specific term; can you explain a bit more about what would actually be going on here? (ie what are the various components in this kind of system model and what are they doing?) -- PMM
Re: [Qemu-devel] Re: [PATCH 15/15] Implement the bus structure for PAPR virtual IO
On Sun, Feb 13, 2011 at 08:29:05PM +0200, Blue Swirl wrote: > On Sun, Feb 13, 2011 at 5:08 PM, Anthony Liguori > wrote: > > On 02/13/2011 05:12 AM, David Gibson wrote: [snip] > > The arguments will have to be extracted from the CPU state but I don't think > > we'll really ever have common hypercall implementations anyway so that's not > > a huge problem. > > Nice idea. Then the part handling CPUState probably should belong to > target-ppc/ rather than hw/. Doesn't work. Different hypervisors may have arguments - even the hcall number itself - arranged differently in the registers. My earlier drafts had this in target-ppc/; I moved it to hw/ for a reason. -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
Re: [Qemu-devel] Re: [PATCH 15/15] Implement the bus structure for PAPR virtual IO
On Sun, Feb 13, 2011 at 09:08:22AM -0600, Anthony Liguori wrote: > On 02/13/2011 05:12 AM, David Gibson wrote: > >On Sun, Feb 13, 2011 at 10:08:23AM +0200, Blue Swirl wrote: > >>On Sun, Feb 13, 2011 at 1:15 AM, Benjamin Herrenschmidt [snip] > In KVM for x86, instead of using a secondary interface (like > vmmcall/vmcall), we do all of our paravirtualization using native > hardware interfaces that we can trap (PIO/MMIO). > > IIUC, on Power, trapping MMIO is not possible due to the MMU mode > that is currently used (PFs are delivered directly to the guest). > > So it's not really possible to switch from using hypercalls to using MMIO. That's correct. > What I would suggest is modelling hypercalls as another I/O address > space for the processor. So instead of having a function pointer in > the CPUState, introduce a: > > typedef void (HypercallFunc)(CPUState *env, void *opaque); > > /* register a hypercall handler */ > void register_hypercall(target_ulong index, HypercallFunc *handler, > void *opaque); > void unregister_hypercall(target_ulong index); > > /* dispatch a hypercall */ > void cpu_hypercall(CPUState *env, target_ulong index); Well, I can certainly implement dynamic registration - in fact I've done that, I just need to fold it into the earlier part of the patch series. But the only "address" we have for this hypercall address space is the hypercall number, and it's not architected where that will be supplied. So we'd still need a per-platform hook to extract the index from the CPUState. > This interface could also be used to implement hypercall based > interfaces on s390 and x86. > > The arguments will have to be extracted from the CPU state but I > don't think we'll really ever have common hypercall implementations > anyway so that's not a huge problem. > > >>on real HW? > >The interface already exists on real HW. It's described in the PAPR > >document we keep mentioning. > > Another thing to note is that the hypercall based I/O devices the > interfaces that the current Power hypervisor uses so implementing > this interface is necessary to support existing guests. -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
Re: [Qemu-devel] RFC: Implement emulation of pSeries logical partitions
On Sun, 13 Feb 2011 01:54:12 +1100 David Gibson wrote: > This patch series adds a "pseries" machine to qemu, allowing it to > emulate IBM pSeries logical partitions. Along the way we add a bunch > of support for more modern ppc CPUs than are currently supported. It > also makes some significant cleanups to the translation code for hash > page table based ppc MMUs. > > This is a first version of this series for review. There are a number > of additional patches adding features such as virtual IO devices to > the emulated pSeries platform, which will be added to the series once > they're a bit more polished. The communication between LPARs that can be used for something like VIO server is (or will be) supported?
Re: [Qemu-devel] [PATCH] eepro100: pad to ensure minimum packet size
>>> On 2/13/2011 at 05:17 AM, Stefan Weil wrote: > Am 11.02.2011 20:36, schrieb Bruce Rogers: >> Recent gpxe e100pro drivers will drop small packets because the emulated >> nic will report an error for small frames. In the qemu model we should >> instead have the e100pro pad out the received frames to be the minimum >> size and not report this case as an error. >> >> Signed-off-by: Bruce Rogers >> --- >> hw/eepro100.c | 23 +-- >> 1 files changed, 13 insertions(+), 10 deletions(-) >> >> diff --git a/hw/eepro100.c b/hw/eepro100.c >> index edf48f6..03b6934 100644 >> --- a/hw/eepro100.c >> +++ b/hw/eepro100.c >> @@ -1645,6 +1645,8 @@ static int nic_can_receive(VLANClientState *nc) >> #endif >> } >> >> +#define MIN_BUF_SIZE 60 /* Min. octets in an ethernet frame sans FCS */ >> + >> static ssize_t nic_receive(VLANClientState *nc, const uint8_t * buf, >> size_t size) >> { >> /* TODO: >> @@ -1653,6 +1655,7 @@ static ssize_t nic_receive(VLANClientState *nc, >> const uint8_t * buf, size_t size >> */ >> EEPRO100State *s = DO_UPCAST(NICState, nc, nc)->opaque; >> uint16_t rfd_status = 0xa000; >> + uint8_t min_buf[MIN_BUF_SIZE]; >> static const uint8_t broadcast_macaddr[6] = >> { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff }; >> >> @@ -1660,15 +1663,15 @@ static ssize_t nic_receive(VLANClientState >> *nc, const uint8_t * buf, size_t size >> /* CSMA is disabled. */ >> logout("%p received while CSMA is disabled\n", s); >> return -1; >> - } else if (size < 64 && (s->configuration[7] & BIT(0))) { >> - /* Short frame and configuration byte 7/0 (discard short receive) set: >> - * Short frame is discarded */ >> - logout("%p received short frame (%zu byte)\n", s, size); >> - s->statistics.rx_short_frame_errors++; >> -#if 0 >> - return -1; >> -#endif >> - } else if ((size > MAX_ETH_FRAME_SIZE + 4) && !(s->configuration[18] >> & BIT(3))) { >> + } >> + /* Pad to minimum Ethernet frame length */ >> + if (size < sizeof(min_buf)) { >> + memcpy(min_buf, buf, size); >> + memset(&min_buf[size], 0, sizeof(min_buf) - size); >> + buf = min_buf; >> + size = sizeof(min_buf); >> + } >> + if ((size > MAX_ETH_FRAME_SIZE + 4) && !(s->configuration[18] & >> BIT(3))) { >> /* Long frame and configuration byte 18/3 (long receive ok) not set: >> * Long frames are discarded. */ >> logout("%p received long frame (%zu byte), ignored\n", s, size); >> @@ -1744,7 +1747,7 @@ static ssize_t nic_receive(VLANClientState *nc, >> const uint8_t * buf, size_t size >> "(%zu bytes); data truncated\n", rfd_size, size); >> size = rfd_size; >> } >> - if (size < 64) { >> + if (size < MIN_BUF_SIZE) { >> rfd_status |= 0x0080; >> } >> TRACE(OTHER, logout("command 0x%04x, link 0x%08x, addr 0x%08x, size %u\n", > > > Could you please give more details of the test scenario which fails? > I'd like to reproduce it here and find a better solution. > > The configuration bit "discard short frame" exists in real hardware, > so removing the code which emulates this behavior is not the correct > solution - even if it helps in a special case. > > No acknowledge for this patch from me. > > Regards, > Stefan Weil What I'm doing is using the current v1.0.1 gpxe based epro100 rom to pxe boot using the user mode networking stack. For example, the following (in tree) command line fails to pxe boot successfully, when the above mentioned pxe rom is used: x86_64-softmmu/qemu-system-x86_64 -boot n -bootp http://boot.kernel.org/bko/pxelinux.0 -net user -net nic,model=i82559er (It doesn't have to be using http protocol however - using a local tftp path to pxelinux.0 fails for me as well.) I found the gpxe git commit which causes the failure to be the following: commit cdcb4165bdabf3adbc4b2e5937c279614d769aa9 Author: Thomas Miletich Date: Tue Oct 6 23:57:49 2009 +0200 [eepro100] Convert to native gPXE API This version is Based on Michael Decker's GSoC 2008 code. A number cleanups and fixes were applied. This commit causes packets which are short, such as come from the slirp code, to be rejected. Bruce
Re: [Qemu-devel] [PATCH] Remove a detached device from qemu_device_opts.
At 01/27/2011 05:00 PM, Ken'ichi Ohmichi Write: > > Hi, > > When I tried to attach the interface after detaching the same interface, > the virsh command output the following and it failed: > > # virsh detach-interface Domain01 network --mac 52:54:00:0d:78:92 > Interface detached successfully > > # virsh attach-interface Domain01 network default --mac 52:54:00:0d:78:92 > error: Failed to attach interface > error: internal error unable to execute QEMU command 'device_add': > Duplicate ID 'net0' for device > # I can reproduce this problem. > > The reason is that a detached device is not removed from the list > "qemu_device_opts", and this patch fixes it. The detached device will be removed from the list "qemu_device_opts" in qdev_free() asynchronously. > > > Thanks > Ken'ichi Ohmichi > > > Signed-off-by: Ken'ichi Ohmichi > --- > --- a/hw/qdev.c 2011-01-27 17:42:25.0 +0900 > +++ b/hw/qdev.c 2011-01-27 17:43:46.0 +0900 > @@ -905,6 +905,8 @@ int do_device_del(Monitor *mon, const QD > qerror_report(QERR_DEVICE_NOT_FOUND, id); > return -1; > } > +qemu_opts_del(qemu_opts_find(&qemu_device_opts, id)); > + We can not use qemu_device_opts here, because qemu_device_opts is a static variable in qemu-config.c. We should use qemu_find_opts("device") instead of qemu_device_opts. I test your patch by attach/detach interface, and qemu core dumps. The reason may be that we call qemu_opts_del() twice. Here is the log(/var/log/libvirt/qemu/.log): *** glibc detected *** /usr/libexec/qemu-system-x86_64: free(): invalid pointer: 0x0355f000 *** === Backtrace: = /lib64/libc.so.6[0x351f275676] /usr/libexec/qemu-system-x86_64[0x43e2a6] /usr/libexec/qemu-system-x86_64[0x43f9a0] /usr/libexec/qemu-system-x86_64[0x4400ee] /usr/libexec/qemu-system-x86_64[0x4cb176] /usr/libexec/qemu-system-x86_64[0x5af9f6] /usr/libexec/qemu-system-x86_64[0x4953c1] /usr/libexec/qemu-system-x86_64[0x495dab] /usr/libexec/qemu-system-x86_64[0x43a087] /usr/libexec/qemu-system-x86_64[0x43a5dd] /usr/libexec/qemu-system-x86_64[0x51f8f5] /usr/libexec/qemu-system-x86_64[0x40ad4b] /usr/libexec/qemu-system-x86_64[0x40ae52] /usr/libexec/qemu-system-x86_64[0x585c81] /usr/libexec/qemu-system-x86_64[0x589da7] /lib64/libc.so.6(__libc_start_main+0xfd)[0x351f21ec5d] /usr/libexec/qemu-system-x86_64[0x408a19] === Memory map: 0040-00784000 r-xp 08:06 679152 /usr/libexec/qemu-system-x86_64 00983000-009c9000 rw-p 00383000 08:06 679152 /usr/libexec/qemu-system-x86_64 009c9000-01177000 rw-p 00:00 0 01177000-01178000 rwxp 00:00 0 01178000-011d5000 rw-p 00:00 0 02d32000-03109000 rw-p 00:00 0 03109000-03119000 rw-p 00:00 0 03119000-03134000 rw-p 00:00 0 03134000-03144000 rw-p 00:00 0 03144000-0352c000 rw-p 00:00 0 0352c000-0353c000 rw-p 00:00 0 0353c000-0354f000 rw-p 00:00 0 0354f000-0355f000 rw-p 00:00 0 0355f000-0358 rw-p 00:00 0 4154f000-4954f000 rwxp 00:00 0 328d00-328d038000 r-xp 08:06 394303 /lib64/libgssapi_krb5.so.2.2 328d038000-328d237000 ---p 00038000 08:06 394303 /lib64/libgssapi_krb5.so.2.2 328d237000-328d239000 rw-p 00037000 08:06 394303 /lib64/libgssapi_krb5.so.2.2 328d40-328d453000 r-xp 08:06 685072 /usr/lib64/libssl.so.1.0.0 328d453000-328d653000 ---p 00053000 08:06 685072 /usr/lib64/libssl.so.1.0.0 328d653000-328d65b000 rw-p 00053000 08:06 685072 /usr/lib64/libssl.so.1.0.0 328d80-328d827000 r-xp 08:06 394300 /lib64/libk5crypto.so.3.1 328d827000-328da27000 ---p 00027000 08:06 394300 /lib64/libk5crypto.so.3.1 328da27000-328da29000 rw-p 00027000 08:06 394300 /lib64/libk5crypto.so.3.1 328ec0-328eccf000 r-xp 08:06 394302 /lib64/libkrb5.so.3.3 328eccf000-328eecf000 ---p 000cf000 08:06 394302 /lib64/libkrb5.so.3.3 328eecf000-328eeda000 rw-p 000cf000 08:06 394302 /lib64/libkrb5.so.3.3 328f00-328f003000 r-xp 08:06 394301 /lib64/libcom_err.so.2.1 328f003000-328f202000 ---p 3000 08:06 394301 /lib64/libcom_err.so.2.1 328f202000-328f203000 rw-p 2000 08:06 394301 /lib64/libcom_err.so.2.1 328f40-328f40a000 r-xp 08:06 393785 /lib64/libkrb5support.so.0.1 328f40a000-328f609000 ---p a000 08:06 393785 /lib64/libkrb5support.so.0.1 328f609000-328f60a000 rw-p 9000 08:06 393785 /lib64/libkrb5support.so.0.1 351ea0-351ea1e000 r-xp 000
[Qemu-devel] CREATING CUSTOMER LOYALTY THROUGH EXTRAORDINARY SERVICE
Good day, We would like to inform you that MarkPlus&Co Consultancy is holding a one-day training session on How to Create Customer Loyalty on February 22, 2011 at the Royale Chulan Hotel in Kuala Lumpur. This training will be helpful as it will discuss tools on how service can attract and keep your best customers. The training will feature a very experienced trainer and practitioner, Ms. Emilia Zainol, who will share her experience on boosting customer loyalty amongst retailers and mall employees. It was partly due to her efforts, that her company has won many accolades on service, has an impressive loyal following of visitors and retailers, and thus continues to become a major profit center. This training is perfect for Senior Executives, Head of Departments, Managers, Supervisors and Team Leaders and those that are responsible for service to customers. We can provide a special price for you if you call us at 03 2166 8197 or 019 309 9907 (and ask for Mien Aziz). Program Summary: CREATING CUSTOMER LOYALTY THROUGH EXTRAORDINARY SERVICE 22 February 2011 9 AM to 5 PM Royale Chulan Hotel 5 Jalan Conlay, 50450 Kuala Lumpur Certificate of attendance awarded for those who complete the course INTRODUCTION: In today's competitive business world, many companies are implementing marketing campaigns designed to attract new business. Unfortunately, the cost for these campaigns can be very high with little return on investment. What is often lost in the mix is the fact that it can be much more cost effective to have a loyal customer base that returns again and again rather than constantly seeking the next new customers. Every business has a need for loyal customers. Not only is it less expensive to build loyalty in your existing customers but often they will spend more with you and spread word of their satisfaction to their friends. Overall, the return of investment of keeping customers satisfied and loyal is much higher than the return on advertising to new customers. This one day training program will teach you how to create a service system in your company that will boost customer satisfaction and create customer loyalty. Learn how your company can increase revenue and profit through extraordinary service. LEARNING OBJECTIVES At the end of the course participants shall be able to: - Master the underlying concepts of customer expectations, experience, satisfaction and loyalty - Be able to identify regular, loyal and at risk customers and respond accordingly - Be able to deliver excellent service to customers and achieve higher satisfaction - Be able to proactively go the extra mile to deliver extraordinary service - Be able to manage customer relationships to maximize repurchase and referrals - Be able to winback lost customers through personalized approaches COURSE SYLLABUS Session 1: Understanding Customer Expectations, Customer Satisfaction and Customer Loyalty Session 2: Understanding & Managing Customer Types Based on Satisfaction Level Session 3: Service Excellence and Beyond Session 4: Building An Extraordinary Service Culture In Your Organization WHO SHOULD ATTEND? Senior Executives, Head of Departments, Managers, Supervisors and Team Leaders and those that are responsible for service to customers. SPEAKER Emilia Zainol has over 10 years experience as a service practitioner. She has helped retailers and malls improve their service and boost customer loyalty. As a trainer, Emilia has helped banks, retailers, and many service companies improve their service performance. COURSE FEE AND REGISTRATION RM 800 per person (includes Lunch, Tea breaks, Course Notes and Certificate of Completion) To register, please contact: MIMIEN AZIZ MARKPLUS&CO CONSULTANCY SDN BHD Mobile: 019 309 9907 Telephone: 03 2166 8197 Faxsimile: 03 2166 9197