Re: [Qemu-devel] [PATCH v3 1/3] uhci: fix segfault when hot-unplugging uhci controller
Hi, > -static void uhci_reset(void *opaque) > +static void uhci_reset(DeviceState *dev) > { > -UHCIState *s = opaque; > +PCIDevice *d = PCI_DEVICE(dev); > +UHCIState *s = DO_UPCAST(UHCIState, dev, d); Uh, oh, DO_UPCAST() is long deprecated. There are other instances of this in the uhci emulation though, so we need a cleanup & qom-ify pass for the code anyway. So I think it's ok for a bugfix patch. I'll queue it up (and the other two too of course). cheers, Gerd
Re: [Qemu-devel] [Qemu-block] [PATCH] qtest/ahci: Fix clang 3.5.0 error
John Snow writes: > A thinko that clang 3.5.0 caught. > Thankfully does not introduce any new failures. > > Signed-off-by: John Snow How you caught the bug is interesting enough to be mentioned in the commit message, but the nature of the bug surely is more interesting than that. Suggest to change the subject to qtest/ahci: Fix a bit mask expression Perhaps it can be fixed up on commit. Reviewed-by: Markus Armbruster
Re: [Qemu-devel] [PATCH] add pci-bridge-seat
Hi, > Could you explain a bit more please? What does it mean that > a bridge is a new seat? How is this used? seat: kbd+mouse+display adapter. multiseat: When you have multiple of them on a single machine. When configured properly logind will show a login screen on each seat. why a pci bridge: grouping all devices belonging to a seat by placing them all below a pci bridge simplifies guest configuration (instead of a list of devices you'll only need to say "this bridge and all devices underneath it"). why a new pci bridge device: this allows to make the guest-side seat configuration automatic with a udev rule. more details: see docs/multiseat.txt cheers, Gerd
Re: [Qemu-devel] [PATCH v3 1/3] uhci: fix segfault when hot-unplugging uhci controller
On 2015/3/18 15:02, Gerd Hoffmann wrote: > Hi, > >> -static void uhci_reset(void *opaque) >> +static void uhci_reset(DeviceState *dev) >> { >> -UHCIState *s = opaque; >> +PCIDevice *d = PCI_DEVICE(dev); >> +UHCIState *s = DO_UPCAST(UHCIState, dev, d); > > Uh, oh, DO_UPCAST() is long deprecated. There are other instances of > this in the uhci emulation though, so we need a cleanup & qom-ify pass > for the code anyway. So I think it's ok for a bugfix patch. > Yes, I noticed that, but I haven't a good idea for qom-ifing uhci. :) May we refer to the realization of ehci ? > I'll queue it up (and the other two too of course). > Thanks. Regards, -Gonglei
Re: [Qemu-devel] RFC: -object vs -chardev creation order
On 17/03/2015 20:08, Daniel P. Berrange wrote: > A third option is to not process -object args in one go, instead process > each type of object at a time. eg we'd first create all the > -object tls-credential instances, then create the -chardev instances, > then create the -object rng-egd instances. This is probably the least > amount of work in short term, but not all that scalable, unless we do > a catch-all default case, so we only need code up hacks for a few > particular object types. > > Thus my gut feeling is to do option 3, but I'd like other opinions before > embarking on this Yes, for now it is the best. Another idea is to build objects lazily. This requires adding a "missing property" interface and making /objects implement it. We can think about it later. Paolo
Re: [Qemu-devel] Duplicate definition of 'qemu_time' building git master
On 17/03/2015 21:36, B Cran wrote: > When trying to build the latest git master on openSUSE 13.2, it failed with: > > vl.c:711:15: error: ‘qemu_time’ redeclared as different kind of symbol > static time_t qemu_time(void) >^ > In file included from qemu/include/block/aio.h:23:0, > from qemu/include/hw/hw.h:13, > from vl.c:62: > qemu/include/qemu/timer.h:1005:16: note: previous declaration of > ‘qemu_time’ was here > extern int64_t qemu_time, qemu_time_start; > ^ > CCbackends/rng-egd.o > CCbackends/rng-random.o > qemu/rules.mak:57: recipe for target 'vl.o' failed > make: *** [vl.o] Error 1 > make: *** Waiting for unfinished jobs > > > The ./configure commandline I used (which has previously worked) was: > > ./configure --prefix=~ --enable-sdl --disable-xen --enable-kvm > --enable-curses --enable-fdt --disable-bluez --disable-user > --enable-uuid --disable-vde --enable-linux-aio --enable-vhost-net > --enable-spice --enable-libusb --enable-usb-redir --enable-lzo > --enable-libssh2 --enable-profiler --enable-system > --target-list=i386-softmmu,x86_64-softmmu > A patch has been posted to fix this. You can disable --enable-profiler, you probably have never used it. Paolo
Re: [Qemu-devel] [v2][PATCH 2/2] libxl: introduce gfx_passthru_kind
I think the _libxl_ message needs to be just "Unable to detect graphics passthru kind". i.e. it can't/shouldn't reference anything to do with xl config options etc (which would make no sense if libvirt was being used). That's not very user friendly though, so you may want to consider adding a new specific error code for this case and returning it here, such that xl or libvirt can then give a more comprehensible error message. But, args = libxl__build_device_model_args(gc, "stubdom-dm", guest_domid, guest_config, d_state, NULL); | + return libxl__build_device_model_args_new(gc, dm,guest_domid, guest_config, state, dm_state_fd); | + Currently we always return "NULL" inside. if (!args) { ret = ERROR_FAIL; goto out; } So I'm not very sure how to pass this new specific error code to xl/libvirt. So looks the whole policy should be something like this, diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c index 5c40e84..5518759 100644 --- a/tools/libxl/xl_cmdimpl.c +++ b/tools/libxl/xl_cmdimpl.c @@ -1953,8 +1953,27 @@ skip_vfb: xlu_cfg_replace_string (config, "spice_streaming_video", &b_info->u.hvm.spice.streaming_video, 0); xlu_cfg_get_defbool(config, "nographic", &b_info->u.hvm.nographic, 0); -xlu_cfg_get_defbool(config, "gfx_passthru", -&b_info->u.hvm.gfx_passthru, 0); +if (!xlu_cfg_get_long(config, "gfx_passthru", &l, 1)) { +libxl_defbool_set(&b_info->u.hvm.gfx_passthru, l); +} else if (!xlu_cfg_get_string(config, "gfx_passthru", &buf, 0)) { +if (libxl_gfx_passthru_kind_from_string(buf, + &b_info->u.hvm.gfx_passthru_kind)) { +fprintf(stderr, +"ERROR: invalid value \"%s\" for \"gfx_passthru\"\n", +buf); +exit (1); +} +libxl_defbool_set(&b_info->u.hvm.gfx_passthru, true); +} Up to here is fine. Thanks but I guess I'd better to paste this whole patch to avoid I'm still missing something :) --- tools/libxl/libxl.h | 6 ++ tools/libxl/libxl_dm.c | 25 ++--- tools/libxl/libxl_internal.h | 5 + tools/libxl/libxl_types.idl | 6 ++ tools/libxl/xl_cmdimpl.c | 14 -- 5 files changed, 51 insertions(+), 5 deletions(-) diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h index 6bbc52d..62b3ae5 100644 --- a/tools/libxl/libxl.h +++ b/tools/libxl/libxl.h @@ -720,6 +720,12 @@ void libxl_mac_copy(libxl_ctx *ctx, libxl_mac *dst, libxl_mac *src); #define LIBXL_HAVE_PSR_MBM 1 #endif +/* + * libxl_domain_build_info has the u.hvm.gfx_passthru_kind field and + * the libxl_gfx_passthru_kind enumeration is defined. +*/ +#define LIBXL_HAVE_GFX_PASSTHRU_KIND + typedef char **libxl_string_list; void libxl_string_list_dispose(libxl_string_list *sl); int libxl_string_list_length(const libxl_string_list *sl); diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c index 8599a6a..045d48a 100644 --- a/tools/libxl/libxl_dm.c +++ b/tools/libxl/libxl_dm.c @@ -710,9 +710,6 @@ static char ** libxl__build_device_model_args_new(libxl__gc *gc, flexarray_append(dm_args, "-net"); flexarray_append(dm_args, "none"); } -if (libxl_defbool_val(b_info->u.hvm.gfx_passthru)) { -flexarray_append(dm_args, "-gfx_passthru"); -} } else { if (!sdl && !vnc) { flexarray_append(dm_args, "-nographic"); @@ -757,6 +754,28 @@ static char ** libxl__build_device_model_args_new(libxl__gc *gc, machinearg, max_ram_below_4g); } } + +if (libxl_defbool_val(b_info->u.hvm.gfx_passthru)) { +switch (b_info->u.hvm.gfx_passthru_kind) { +case LIBXL_GFX_PASSTHRU_KIND_DEFAULT: +if (libxl__is_igd_vga_passthru(gc, guest_config)) { +machinearg = GCSPRINTF("%s,igd-passthru=on", machinearg); +} else { +LOG(ERROR, "Unable to detect graphics passthru kind," +" please set gfx_passthru_kind. See xl.cfg(5) for more" +" information.\n"); +return NULL; +} +break; +case LIBXL_GFX_PASSTHRU_KIND_IGD: +machinearg = GCSPRINTF("%s,igd-passthru=on", machinearg); +break; +default: +LOG(WARN, "gfx_passthru_kind is invalid so ignored.\n"); +break; +} +} + flexarray_append(dm_args, machinearg); for (i = 0; b_info->extra_hvm && b_info->extra_hvm[i] != NULL; i++) flexarray_append(dm_args, b_info->extra_hvm[i]); diff --git a/tools/libxl/libxl_internal.h
Re: [Qemu-devel] [PATCH] virtio-scsi-dataplane: fix memory leak in virtio_scsi_vring_init
On 18/03/2015 10:42, Bo Su wrote: > if k->set_host_notifier failed, VirtIOSCSIVring *r will leak > > Signed-off-by: Bo Su > --- > hw/scsi/virtio-scsi-dataplane.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/hw/scsi/virtio-scsi-dataplane.c b/hw/scsi/virtio-scsi-dataplane.c > index 3f40ff0..c069cd7 100644 > --- a/hw/scsi/virtio-scsi-dataplane.c > +++ b/hw/scsi/virtio-scsi-dataplane.c > @@ -45,7 +45,7 @@ static VirtIOSCSIVring *virtio_scsi_vring_init(VirtIOSCSI > *s, > { > BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(s))); > VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus); > -VirtIOSCSIVring *r = g_slice_new(VirtIOSCSIVring); > +VirtIOSCSIVring *r; > int rc; > > /* Set up virtqueue notify */ > @@ -56,6 +56,8 @@ static VirtIOSCSIVring *virtio_scsi_vring_init(VirtIOSCSI > *s, > s->dataplane_fenced = true; > return NULL; > } > + > +r = g_slice_new(VirtIOSCSIVring); > r->host_notifier = *virtio_queue_get_host_notifier(vq); > r->guest_notifier = *virtio_queue_get_guest_notifier(vq); > aio_set_event_notifier(s->ctx, &r->host_notifier, handler); > Thanks, queued for 2.3. Paolo
Re: [Qemu-devel] [PATCH v3 1/3] uhci: fix segfault when hot-unplugging uhci controller
Gerd Hoffmann writes: > Hi, > >> -static void uhci_reset(void *opaque) >> +static void uhci_reset(DeviceState *dev) >> { >> -UHCIState *s = opaque; >> +PCIDevice *d = PCI_DEVICE(dev); >> +UHCIState *s = DO_UPCAST(UHCIState, dev, d); > > Uh, oh, DO_UPCAST() is long deprecated. There are other instances of [...] Only 378 instances left in the code %-}
Re: [Qemu-devel] [PATCH] raw-posix: Deprecate host floppy passthrough
John Snow writes: > On 03/17/2015 12:02 PM, Markus Armbruster wrote: >> Raise your hand if you have a physical floppy drive in a computer >> you've powered on in 2015. Okay, I see we got a few weirdos in the >> audience. That's okay, weirdos are welcome here. >> > > May I introduce to you my reference Q35 machine: > http://i.imgur.com/4gCnetj.jpg Take good care of it, so it doesn't die on us before pc-q35 is done!
Re: [Qemu-devel] [PATCH] raw-posix: Deprecate host floppy passthrough
On Di, 2015-03-17 at 17:02 +0100, Markus Armbruster wrote: > Raise your hand if you have a physical floppy drive in a computer > you've powered on in 2015. /me raises the hand, my amd test box has one. I was surprised when I got the machine a few years back. And I never ever actually used the floppy drive in all those years. > Kidding aside, media change detection doesn't fully work, isn't going > to be fixed, and floppy passthrough just isn't earning its keep > anymore. If you really need floppies there is the option to go for usb. Recent machines don't have floppy connectors any more, so that is the only kind of floppy device you can use anyway. And you can usb passthrough it to the guest. Case closed. Reviewed-by: Gerd Hoffmann cheers, Gerd
Re: [Qemu-devel] [PATCH v3 1/3] uhci: fix segfault when hot-unplugging uhci controller
On 2015/3/18 15:35, Markus Armbruster wrote: > Gerd Hoffmann writes: > >> Hi, >> >>> -static void uhci_reset(void *opaque) >>> +static void uhci_reset(DeviceState *dev) >>> { >>> -UHCIState *s = opaque; >>> +PCIDevice *d = PCI_DEVICE(dev); >>> +UHCIState *s = DO_UPCAST(UHCIState, dev, d); >> >> Uh, oh, DO_UPCAST() is long deprecated. There are other instances of > [...] > > Only 378 instances left in the code %-} > Maybe I can do some cleanup work for 2.4 ;) Regards, -Gonglei
Re: [Qemu-devel] [PATCH v3 2/3] ehci: fix segfault when hot-unplugging ehci controller
On Mi, 2015-03-18 at 09:49 +0800, arei.gong...@huawei.com wrote: > From: Gonglei > > When hot-unplugging the usb controllers (ehci/uhci), > we have to clean all resouce of these devices, > involved registered reset handler. Otherwise, it > may cause NULL pointer access and/or segmentation fault > if we reboot the guest os after hot-unplugging. > > Let's hook up reset via DeviceClass->reset() and drop > the qemu_register_reset() call. Then Qemu will register > and unregister the reset handler automatically. Fails "make check" (for aarch64). My guess is the sysbus variants lost the reset hookup. cheers, Gerd
Re: [Qemu-devel] [RFC 0/5] Memory transaction attributes API
On Mon, Mar 16, 2015 at 05:20:17PM +, Peter Maydell wrote: > This is an RFC patchset aimed at getting comment on > some memory API changes to support "transaction attributes", > ie sideband information that goes along with a memory read > or write access to define things like ARM secure/nonsecure, > CPU/transaction master ID, privileged/nonprivileged. Hi Peter, Generally I think the direction of this looks very good, thanks! A few comments: 1. Would it make sense to instead of having the MemTxAttrs be a uint64_t instead have them be a struct with bitfields? We could still pass the struct by value I think and as long as it doesn't grow large the compiler will emit similar or the same code IIUC. 2. The series introduces the read and write _with_attrs. Another approach could be to add a (for example) single .access callback. The callback could take a structure pointer as input, e.g: struct MemTx { bool rw; hwaddr addr; int size; uint64_t data; MemTxAttrs attrs; } MemTxResult access(MemTx *tx) The benefit I see is that this will make it easier for us in the future to add new fields if needed. Best regards, Edgar > > (See also previous discussion: > https://lists.nongnu.org/archive/html/qemu-devel/2014-09/msg01026.html ) > > Patch 1 is the API changes themselves: > 1) new read_with_attrs and write_with_attrs fields in the >MemoryRegionOps struct; the old read/write still exist >for backwards compatibility, but devices that care about >attributes can register with these function pointers instead > 2) the functions return a success/failure status, so a device >can actually fail bad transactions rather than merely >returning bogus data. (This isn't wired up in this patchset >but I include it to avoid revving the memory API twice.) > > Patches 2, 3 and 4 then plumb the memory attribute parameters > through the various functions, working upwards to being able > to put them in the iotlb. Patch 5 implements the target-arm > changes to provide a secure/nonsecure tx attribute based on > the page table walk, as a demonstration. > > There are obviously more APIs within QEMU for memory access > functions which need to change to either always take a tx > attribute, or to have extra with-tx-attribute versions of the > functions. For the moment things are stubbed out with passing > in "no attributes specified" values. > > I've modelled the transaction attributes as a (typedefed) > uint64_t, whose bits will be defined as we find requirements > for them (the meaning will not be per-architecture). When > we originally discussed this on-list, Edgar suggested making > the attributes be a (pointer to a) struct; however I found > the ownership/copying semantics on this too awkward, because > the access path needs to take attributes set up in the TLB > and then modify them according to details of the access > actually being made before passing them to the device, so > took the simpler implementation route. > > I intend to continue working on this (filling in the gaps, > etc), but wanted to send this series out early for comment > on the memory API changes in particular. > > thanks > -- PMM > > Peter Maydell (5): > memory: Define API for MemoryRegionOps to take attrs and return status > memory: Add MemTxAttrs argument to io_mem_read and io_mem_write > Make CPU iotlb a structure rather than a plain hwaddr > Add MemTxAttrs to the IOTLB > target-arm: Honour NS bits in page tables > > cputlb.c | 22 +++--- > exec.c | 29 +++--- > hw/s390x/s390-pci-inst.c | 7 ++-- > hw/vfio/pci.c| 4 +- > include/exec/cpu-defs.h | 15 ++- > include/exec/exec-all.h | 8 +++- > include/exec/memattrs.h | 37 + > include/exec/memory.h| 22 ++ > memory.c | 102 > +-- > softmmu_template.h | 36 + > target-arm/helper.c | 83 -- > 11 files changed, 288 insertions(+), 77 deletions(-) > create mode 100644 include/exec/memattrs.h > > -- > 1.9.1 >
Re: [Qemu-devel] [PATCH v3 2/3] ehci: fix segfault when hot-unplugging ehci controller
On 2015/3/18 16:23, Gerd Hoffmann wrote: > On Mi, 2015-03-18 at 09:49 +0800, arei.gong...@huawei.com wrote: >> From: Gonglei >> >> When hot-unplugging the usb controllers (ehci/uhci), >> we have to clean all resouce of these devices, >> involved registered reset handler. Otherwise, it >> may cause NULL pointer access and/or segmentation fault >> if we reboot the guest os after hot-unplugging. >> >> Let's hook up reset via DeviceClass->reset() and drop >> the qemu_register_reset() call. Then Qemu will register >> and unregister the reset handler automatically. > > Fails "make check" (for aarch64). My guess is the sysbus variants lost > the reset hookup. > Actually, these fails were introduced by the follow patch: commit c3cf77cb63b71618224129df41f114488e0f74e4 Author: David Gibson Date: Wed Feb 18 16:01:01 2015 +1100 Make sysbus EHCI devices ARM only by default A number of ARM embedded boards include EHCI USB host controllers which appear as directly mapped devices, rather than sitting on a PCI bus. At present code to emulate such devices is included whenever EHCI support is included. This patch adjusts teh config options to only include them in builds targetting ARM by default. But on the other hand, the sysbus variants lost the reset hookup is a real bug, I will fix them in the next version. thanks! Regards, -Gonglei
[Qemu-devel] [v7][PATCH 00/10] xen: add Intel IGD passthrough support
v7: * Instead of "-gfx_passthru" we'd like to make that a machine option, "-machine xxx,igd-passthru=on"" * try to make something as common shared by others like KvmGT in the future * Just read those real value from host bridge pci configuration space when create host bridge then put in dev->config. v6: * Drop introducing a new machine specific to IGD passthrough * Try to share some codes from KVM stuff in qemu to retrive VGA BIOS * Currently IGD drivers always need to access PCH by 1f.0. But we don't want to poke that directly to get ID, and although in real world different GPU should have different PCH. But actually the different PCH DIDs likely map to different PCH SKUs. We do the same thing for the GPU. For PCH, the different SKUs are going to be all the same silicon design and implementation, just different features turn on and off with fuses. The SW interfaces should be consistent across all SKUs in a given family (eg LPT). But just same features may not be supported. Most of these different PCH features probably don't matter to the Gfx driver, but obviously any difference in display port connections will so it should be fine with any PCH in case of passthrough. So currently use one PCH version, 0x8c4e, to cover all HSW(Haswell) scenarios, 0x9cc3 for BDW(Broadwell). * Drop igd write ops since its fine to emulate that, and we also shrink those igd read ops as necessary. * Rebase and cleanup all patches. v5: * Simplify to make sure its really inherited from the standard one in patch #3 * Then drop the original patch #3 v4: * Rebase on latest tree * Drop patch #2 * Regenerate patches after Michael introduce patch #1 * We need to use this pci_type as a index to reuse I440FX_PCI_DEVICE() * Test: boot with a preinstalled winxp ./i386-softmmu/qemu-system-i386 -hda winxp-32.img -m 2560 -boot c -machine pc v3: * Drop patch #4 * Add one patch #1 from Michael * Rebase * In./i386-softmmu/qemu-system-i386 -hda test.img -m 2560 -boot c -machine pc v2: * Fix some coding style * New patch to separate i440fx_init * Just add prefix with XEN_IGD_PASSTHROUGH/xen_igd_passthrough * Based on patch #2 to regenerate * Unify prefix with XEN_IGD_PASSTHROUGH/xen_igd_passthrough like patch #3 * Test: boot with a preinstalled ubuntu 14.04 ./i386-softmmu/qemu-system-i386 -hda test.img -m 2560 -boot c -machine pc As we discussed we need to create a separate machine to support current IGD passthrough. Michael S. Tsirkin (1): i440fx: make types configurable at run-time Tiejun Chen (9): pc_init1: pass parameters just with types piix: create host bridge to passthrough hw/pci-assign: split pci-assign.c xen, gfx passthrough: basic graphics passthrough support xen, gfx passthrough: retrieve VGA BIOS to work igd gfx passthrough: create a isa bridge xen, gfx passthrough: register a isa bridge xen, gfx passthrough: register host bridge specific to passthrough xen, gfx passthrough: add opregion mapping hw/core/machine.c | 20 +++ hw/i386/Makefile.objs | 1 + hw/i386/kvm/pci-assign.c | 82 +- hw/i386/pc_piix.c | 149 ++- hw/i386/pci-assign-load-rom.c | 93 hw/pci-host/piix.c| 91 +++- hw/xen/Makefile.objs | 1 + hw/xen/xen-host-pci-device.c | 5 + hw/xen/xen-host-pci-device.h | 1 + hw/xen/xen_pt.c | 32 hw/xen/xen_pt.h | 21 ++- hw/xen/xen_pt_config_init.c | 52 ++- hw/xen/xen_pt_graphics.c | 272 ++ include/hw/boards.h | 1 + include/hw/i386/pc.h | 8 +- include/hw/pci/pci-assign.h | 27 include/hw/xen/xen.h | 1 + qemu-options.hx | 3 + vl.c | 10 ++ 19 files changed, 778 insertions(+), 92 deletions(-) create mode 100644 hw/i386/pci-assign-load-rom.c create mode 100644 hw/xen/xen_pt_graphics.c create mode 100644 include/hw/pci/pci-assign.h Thanks Tiejun
[Qemu-devel] [v7][PATCH 01/10] i440fx: make types configurable at run-time
From: "Michael S. Tsirkin" IGD passthrough wants to supply a different pci and host devices, inheriting i440fx devices. Make types configurable. Signed-off-by: Michael S. Tsirkin Signed-off-by: Tiejun Chen --- hw/i386/pc_piix.c| 4 +++- hw/pci-host/piix.c | 9 - include/hw/i386/pc.h | 6 +- 3 files changed, 12 insertions(+), 7 deletions(-) diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c index 36c69d7..07faec9 100644 --- a/hw/i386/pc_piix.c +++ b/hw/i386/pc_piix.c @@ -202,7 +202,9 @@ static void pc_init1(MachineState *machine, } if (pci_enabled) { -pci_bus = i440fx_init(&i440fx_state, &piix3_devfn, &isa_bus, gsi, +pci_bus = i440fx_init(TYPE_I440FX_PCI_HOST_BRIDGE, + TYPE_I440FX_PCI_DEVICE, + &i440fx_state, &piix3_devfn, &isa_bus, gsi, system_memory, system_io, machine->ram_size, below_4g_mem_size, above_4g_mem_size, diff --git a/hw/pci-host/piix.c b/hw/pci-host/piix.c index 723836f..c812eaa 100644 --- a/hw/pci-host/piix.c +++ b/hw/pci-host/piix.c @@ -40,7 +40,6 @@ * http://download.intel.com/design/chipsets/datashts/29054901.pdf */ -#define TYPE_I440FX_PCI_HOST_BRIDGE "i440FX-pcihost" #define I440FX_PCI_HOST_BRIDGE(obj) \ OBJECT_CHECK(I440FXState, (obj), TYPE_I440FX_PCI_HOST_BRIDGE) @@ -91,7 +90,6 @@ typedef struct PIIX3State { MemoryRegion rcr_mem; } PIIX3State; -#define TYPE_I440FX_PCI_DEVICE "i440FX" #define I440FX_PCI_DEVICE(obj) \ OBJECT_CHECK(PCII440FXState, (obj), TYPE_I440FX_PCI_DEVICE) @@ -304,7 +302,8 @@ static void i440fx_realize(PCIDevice *dev, Error **errp) cpu_smm_register(&i440fx_set_smm, d); } -PCIBus *i440fx_init(PCII440FXState **pi440fx_state, +PCIBus *i440fx_init(const char *host_type, const char *pci_type, +PCII440FXState **pi440fx_state, int *piix3_devfn, ISABus **isa_bus, qemu_irq *pic, MemoryRegion *address_space_mem, @@ -324,7 +323,7 @@ PCIBus *i440fx_init(PCII440FXState **pi440fx_state, unsigned i; I440FXState *i440fx; -dev = qdev_create(NULL, TYPE_I440FX_PCI_HOST_BRIDGE); +dev = qdev_create(NULL, host_type); s = PCI_HOST_BRIDGE(dev); b = pci_bus_new(dev, NULL, pci_address_space, address_space_io, 0, TYPE_PCI_BUS); @@ -332,7 +331,7 @@ PCIBus *i440fx_init(PCII440FXState **pi440fx_state, object_property_add_child(qdev_get_machine(), "i440fx", OBJECT(dev), NULL); qdev_init_nofail(dev); -d = pci_create_simple(b, 0, TYPE_I440FX_PCI_DEVICE); +d = pci_create_simple(b, 0, pci_type); *pi440fx_state = I440FX_PCI_DEVICE(d); f = *pi440fx_state; f->system_memory = address_space_mem; diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h index 1b35168..99dc26b 100644 --- a/include/hw/i386/pc.h +++ b/include/hw/i386/pc.h @@ -230,7 +230,11 @@ extern int no_hpet; struct PCII440FXState; typedef struct PCII440FXState PCII440FXState; -PCIBus *i440fx_init(PCII440FXState **pi440fx_state, int *piix_devfn, +#define TYPE_I440FX_PCI_HOST_BRIDGE "i440FX-pcihost" +#define TYPE_I440FX_PCI_DEVICE "i440FX" + +PCIBus *i440fx_init(const char *host_type, const char *pci_type, +PCII440FXState **pi440fx_state, int *piix_devfn, ISABus **isa_bus, qemu_irq *pic, MemoryRegion *address_space_mem, MemoryRegion *address_space_io, -- 1.9.1
[Qemu-devel] [v7][PATCH 06/10] xen, gfx passthrough: retrieve VGA BIOS to work
Now we retrieve VGA bios like kvm stuff in qemu but we need to fix Device Identification in case if its not matched with the real IGD device since Seabios is always trying to compare this ID to work out VGA BIOS. Signed-off-by: Tiejun Chen --- hw/xen/xen_pt.c | 10 ++ hw/xen/xen_pt.h | 5 +++ hw/xen/xen_pt_graphics.c | 79 3 files changed, 94 insertions(+) diff --git a/hw/xen/xen_pt.c b/hw/xen/xen_pt.c index 1d78021..fcc9f1c 100644 --- a/hw/xen/xen_pt.c +++ b/hw/xen/xen_pt.c @@ -672,6 +672,16 @@ static int xen_pt_initfn(PCIDevice *d) s->memory_listener = xen_pt_memory_listener; s->io_listener = xen_pt_io_listener; +/* Setup VGA bios for passthrough GFX */ +if ((s->real_device.domain == 0) && (s->real_device.bus == 0) && +(s->real_device.dev == 2) && (s->real_device.func == 0)) { +if (xen_pt_setup_vga(s, &s->real_device) < 0) { +XEN_PT_ERR(d, "Setup VGA BIOS of passthrough GFX failed!\n"); +xen_host_pci_device_put(&s->real_device); +return -1; +} +} + /* Handle real device's MMIO/PIO BARs */ xen_pt_register_regions(s); diff --git a/hw/xen/xen_pt.h b/hw/xen/xen_pt.h index 3325387..d9b1ce0 100644 --- a/hw/xen/xen_pt.h +++ b/hw/xen/xen_pt.h @@ -298,6 +298,11 @@ static inline bool xen_pt_has_msix_mapping(XenPCIPassthroughState *s, int bar) return s->msix && s->msix->bar_index == bar; } +extern void *pci_assign_dev_load_option_rom(PCIDevice *dev, +struct Object *owner, int *size, +unsigned int domain, +unsigned int bus, unsigned int slot, +unsigned int function); extern bool has_igd_gfx_passthru; static inline bool is_igd_vga_passthrough(XenHostPCIDevice *dev) { diff --git a/hw/xen/xen_pt_graphics.c b/hw/xen/xen_pt_graphics.c index 9b3df81..3232296 100644 --- a/hw/xen/xen_pt_graphics.c +++ b/hw/xen/xen_pt_graphics.c @@ -109,3 +109,82 @@ int xen_pt_unregister_vga_regions(XenHostPCIDevice *dev) return 0; } + +static void *get_vgabios(XenPCIPassthroughState *s, int *size, + XenHostPCIDevice *dev) +{ +return pci_assign_dev_load_option_rom(&s->dev, OBJECT(&s->dev), size, + dev->domain, dev->bus, + dev->dev, dev->func); +} + +/* Refer to Seabios. */ +struct rom_header { +uint16_t signature; +uint8_t size; +uint8_t initVector[4]; +uint8_t reserved[17]; +uint16_t pcioffset; +uint16_t pnpoffset; +} __attribute__((packed)); + +struct pci_data { +uint32_t signature; +uint16_t vendor; +uint16_t device; +uint16_t vitaldata; +uint16_t dlen; +uint8_t drevision; +uint8_t class_lo; +uint16_t class_hi; +uint16_t ilen; +uint16_t irevision; +uint8_t type; +uint8_t indicator; +uint16_t reserved; +} __attribute__((packed)); + +int xen_pt_setup_vga(XenPCIPassthroughState *s, XenHostPCIDevice *dev) +{ +unsigned char *bios = NULL; +struct rom_header *rom; +int bios_size; +char *c = NULL; +char checksum = 0; +uint32_t len = 0; +struct pci_data *pd = NULL; + +if (!is_igd_vga_passthrough(dev)) { +return -1; +} + +bios = get_vgabios(s, &bios_size, dev); +if (!bios) { +XEN_PT_ERR(&s->dev, "VGA: Can't getting VBIOS!\n"); +return -1; +} + +/* Currently we fixed this address as a primary. */ +rom = (struct rom_header *)bios; +pd = (void *)(bios + (unsigned char)rom->pcioffset); + +/* We may need to fixup Device Identification. */ +if (pd->device != s->real_device.device_id) { +pd->device = s->real_device.device_id; + +len = rom->size * 512; +/* Then adjust the bios checksum */ +for (c = (char *)bios; c < ((char *)bios + len); c++) { +checksum += *c; +} +if (checksum) { +bios[len - 1] -= checksum; +XEN_PT_LOG(&s->dev, "vga bios checksum is adjusted %x!\n", + checksum); +} +} + +/* Currently we fixed this address as a primary for legacy BIOS. */ +cpu_physical_memory_rw(0xc, bios, bios_size, 1); +return 0; +} -- 1.9.1
[Qemu-devel] [v7][PATCH 05/10] xen, gfx passthrough: basic graphics passthrough support
basic gfx passthrough support: - add a vga type for gfx passthrough - register/unregister legacy VGA I/O ports and MMIOs for passthrough GFX Signed-off-by: Tiejun Chen Signed-off-by: Yang Zhang --- hw/core/machine.c| 20 hw/xen/Makefile.objs | 1 + hw/xen/xen-host-pci-device.c | 5 ++ hw/xen/xen-host-pci-device.h | 1 + hw/xen/xen_pt.c | 4 ++ hw/xen/xen_pt.h | 10 +++- hw/xen/xen_pt_graphics.c | 111 +++ include/hw/boards.h | 1 + qemu-options.hx | 3 ++ vl.c | 10 10 files changed, 165 insertions(+), 1 deletion(-) create mode 100644 hw/xen/xen_pt_graphics.c diff --git a/hw/core/machine.c b/hw/core/machine.c index cb1185a..6ac3ee4 100644 --- a/hw/core/machine.c +++ b/hw/core/machine.c @@ -225,6 +225,20 @@ static void machine_set_usb(Object *obj, bool value, Error **errp) ms->usb = value; } +static bool machine_get_igd_gfx_passthru(Object *obj, Error **errp) +{ +MachineState *ms = MACHINE(obj); + +return ms->igd_gfx_passthru; +} + +static void machine_set_igd_gfx_passthru(Object *obj, bool value, Error **errp) +{ +MachineState *ms = MACHINE(obj); + +ms->igd_gfx_passthru = value; +} + static char *machine_get_firmware(Object *obj, Error **errp) { MachineState *ms = MACHINE(obj); @@ -379,6 +393,12 @@ static void machine_initfn(Object *obj) object_property_set_description(obj, "usb", "Set on/off to enable/disable usb", NULL); +object_property_add_bool(obj, "igd-passthru", + machine_get_igd_gfx_passthru, + machine_set_igd_gfx_passthru, NULL); +object_property_set_description(obj, "igd-passthru", +"Set on/off to enable/disable igd passthrou", +NULL); object_property_add_str(obj, "firmware", machine_get_firmware, machine_set_firmware, NULL); diff --git a/hw/xen/Makefile.objs b/hw/xen/Makefile.objs index a0ca0aa..a9ad7e7 100644 --- a/hw/xen/Makefile.objs +++ b/hw/xen/Makefile.objs @@ -3,3 +3,4 @@ common-obj-$(CONFIG_XEN_BACKEND) += xen_backend.o xen_devconfig.o obj-$(CONFIG_XEN_PCI_PASSTHROUGH) += xen-host-pci-device.o obj-$(CONFIG_XEN_PCI_PASSTHROUGH) += xen_pt.o xen_pt_config_init.o xen_pt_msi.o +obj-$(CONFIG_XEN_PCI_PASSTHROUGH) += xen_pt.o xen_pt_config_init.o xen_pt_msi.o xen_pt_graphics.o diff --git a/hw/xen/xen-host-pci-device.c b/hw/xen/xen-host-pci-device.c index 743b37b..a54b7de 100644 --- a/hw/xen/xen-host-pci-device.c +++ b/hw/xen/xen-host-pci-device.c @@ -376,6 +376,11 @@ int xen_host_pci_device_get(XenHostPCIDevice *d, uint16_t domain, goto error; } d->irq = v; +rc = xen_host_pci_get_hex_value(d, "class", &v); +if (rc) { +goto error; +} +d->class_code = v; d->is_virtfn = xen_host_pci_dev_is_virtfn(d); return 0; diff --git a/hw/xen/xen-host-pci-device.h b/hw/xen/xen-host-pci-device.h index c2486f0..f1e1c30 100644 --- a/hw/xen/xen-host-pci-device.h +++ b/hw/xen/xen-host-pci-device.h @@ -25,6 +25,7 @@ typedef struct XenHostPCIDevice { uint16_t vendor_id; uint16_t device_id; +uint32_t class_code; int irq; XenHostPCIIORegion io_regions[PCI_NUM_REGIONS - 1]; diff --git a/hw/xen/xen_pt.c b/hw/xen/xen_pt.c index f2893b2..1d78021 100644 --- a/hw/xen/xen_pt.c +++ b/hw/xen/xen_pt.c @@ -450,6 +450,7 @@ static int xen_pt_register_regions(XenPCIPassthroughState *s) d->rom.size, d->rom.base_addr); } +xen_pt_register_vga_regions(d); return 0; } @@ -748,6 +749,7 @@ out: static void xen_pt_unregister_device(PCIDevice *d) { XenPCIPassthroughState *s = DO_UPCAST(XenPCIPassthroughState, dev, d); +XenHostPCIDevice *host_dev = &s->real_device; uint8_t machine_irq = s->machine_irq; uint8_t intx = xen_pt_pci_intx(s); int rc; @@ -791,6 +793,8 @@ static void xen_pt_unregister_device(PCIDevice *d) /* delete all emulated config registers */ xen_pt_config_delete(s); +xen_pt_unregister_vga_regions(host_dev); + memory_listener_unregister(&s->memory_listener); memory_listener_unregister(&s->io_listener); diff --git a/hw/xen/xen_pt.h b/hw/xen/xen_pt.h index 942dc60..3325387 100644 --- a/hw/xen/xen_pt.h +++ b/hw/xen/xen_pt.h @@ -298,5 +298,13 @@ static inline bool xen_pt_has_msix_mapping(XenPCIPassthroughState *s, int bar) return s->msix && s->msix->bar_index == bar; } - +extern bool has_igd_gfx_passthru; +static inline bool is_igd_vga_passthrough(XenHostPCIDevice *dev) +{ +return (has_igd_gfx_passthru +&& ((dev->class_code >> 0x8) == PCI_CLASS_DISPLAY_VGA)); +} +int xen_pt_register_vga_regions(XenHostPCIDevice *dev); +int xen_pt_unregister_vga_regions(Xen
[Qemu-devel] [v7][PATCH 08/10] xen, gfx passthrough: register a isa bridge
Currently we just register this isa bridge when we use IGD passthrough in Xen side. Signed-off-by: Tiejun Chen --- hw/xen/xen_pt.c | 18 ++ include/hw/xen/xen.h | 1 + 2 files changed, 19 insertions(+) diff --git a/hw/xen/xen_pt.c b/hw/xen/xen_pt.c index fcc9f1c..2d5cebb 100644 --- a/hw/xen/xen_pt.c +++ b/hw/xen/xen_pt.c @@ -632,6 +632,21 @@ static const MemoryListener xen_pt_io_listener = { .priority = 10, }; +static void +xen_igd_passthrough_isa_bridge_create(XenPCIPassthroughState *s, + XenHostPCIDevice *dev) +{ +uint16_t gpu_dev_id; +PCIDevice *d = &s->dev; + +if (!is_igd_vga_passthrough(dev)) { +return; +} + +gpu_dev_id = dev->device_id; +igd_passthrough_isa_bridge_create(d->bus, gpu_dev_id); +} + /* init */ static int xen_pt_initfn(PCIDevice *d) @@ -680,6 +695,9 @@ static int xen_pt_initfn(PCIDevice *d) xen_host_pci_device_put(&s->real_device); return -1; } + +/* Register ISA bridge for passthrough GFX. */ +xen_igd_passthrough_isa_bridge_create(s, &s->real_device); } /* Handle real device's MMIO/PIO BARs */ diff --git a/include/hw/xen/xen.h b/include/hw/xen/xen.h index 4356af4..703148e 100644 --- a/include/hw/xen/xen.h +++ b/include/hw/xen/xen.h @@ -51,4 +51,5 @@ void xen_register_framebuffer(struct MemoryRegion *mr); # define HVM_MAX_VCPUS 32 #endif +extern void igd_passthrough_isa_bridge_create(PCIBus *bus, uint16_t gpu_dev_id); #endif /* QEMU_HW_XEN_H */ -- 1.9.1
[Qemu-devel] [v7][PATCH 07/10] igd gfx passthrough: create a isa bridge
Currently IGD drivers always need to access PCH by 1f.0. But we don't want to poke that directly to get ID, and although in real world different GPU should have different PCH. But actually the different PCH DIDs likely map to different PCH SKUs. We do the same thing for the GPU. For PCH, the different SKUs are going to be all the same silicon design and implementation, just different features turn on and off with fuses. The SW interfaces should be consistent across all SKUs in a given family (eg LPT). But just same features may not be supported. Most of these different PCH features probably don't matter to the Gfx driver, but obviously any difference in display port connections will so it should be fine with any PCH in case of passthrough. So currently use one PCH version, 0x8c4e, to cover all HSW(Haswell) scenarios, 0x9cc3 for BDW(Broadwell). Signed-off-by: Tiejun Chen --- hw/i386/pc_piix.c | 113 ++ 1 file changed, 113 insertions(+) diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c index cea3a5c..8fbfc09 100644 --- a/hw/i386/pc_piix.c +++ b/hw/i386/pc_piix.c @@ -956,6 +956,119 @@ static QEMUMachine pc_machine_v0_10 = { .hw_version = "0.10", }; +typedef struct { +uint16_t gpu_device_id; +uint16_t pch_device_id; +uint8_t pch_revision_id; +} IGDDeviceIDInfo; + +/* In real world different GPU should have different PCH. But actually + * the different PCH DIDs likely map to different PCH SKUs. We do the + * same thing for the GPU. For PCH, the different SKUs are going to be + * all the same silicon design and implementation, just different + * features turn on and off with fuses. The SW interfaces should be + * consistent across all SKUs in a given family (eg LPT). But just same + * features may not be supported. + * + * Most of these different PCH features probably don't matter to the + * Gfx driver, but obviously any difference in display port connections + * will so it should be fine with any PCH in case of passthrough. + * + * So currently use one PCH version, 0x8c4e, to cover all HSW(Haswell) + * scenarios, 0x9cc3 for BDW(Broadwell). + */ +static const IGDDeviceIDInfo igd_combo_id_infos[] = { +/* HSW Classic */ +{0x0402, 0x8c4e, 0x04}, /* HSWGT1D, HSWD_w7 */ +{0x0406, 0x8c4e, 0x04}, /* HSWGT1M, HSWM_w7 */ +{0x0412, 0x8c4e, 0x04}, /* HSWGT2D, HSWD_w7 */ +{0x0416, 0x8c4e, 0x04}, /* HSWGT2M, HSWM_w7 */ +{0x041E, 0x8c4e, 0x04}, /* HSWGT15D, HSWD_w7 */ +/* HSW ULT */ +{0x0A06, 0x8c4e, 0x04}, /* HSWGT1UT, HSWM_w7 */ +{0x0A16, 0x8c4e, 0x04}, /* HSWGT2UT, HSWM_w7 */ +{0x0A26, 0x8c4e, 0x06}, /* HSWGT3UT, HSWM_w7 */ +{0x0A2E, 0x8c4e, 0x04}, /* HSWGT3UT28W, HSWM_w7 */ +{0x0A1E, 0x8c4e, 0x04}, /* HSWGT2UX, HSWM_w7 */ +{0x0A0E, 0x8c4e, 0x04}, /* HSWGT1ULX, HSWM_w7 */ +/* HSW CRW */ +{0x0D26, 0x8c4e, 0x04}, /* HSWGT3CW, HSWM_w7 */ +{0x0D22, 0x8c4e, 0x04}, /* HSWGT3CWDT, HSWD_w7 */ +/* HSW Server */ +{0x041A, 0x8c4e, 0x04}, /* HSWSVGT2, HSWD_w7 */ +/* HSW SRVR */ +{0x040A, 0x8c4e, 0x04}, /* HSWSVGT1, HSWD_w7 */ +/* BSW */ +{0x1606, 0x9cc3, 0x03}, /* BDWULTGT1, BDWM_w7 */ +{0x1616, 0x9cc3, 0x03}, /* BDWULTGT2, BDWM_w7 */ +{0x1626, 0x9cc3, 0x03}, /* BDWULTGT3, BDWM_w7 */ +{0x160E, 0x9cc3, 0x03}, /* BDWULXGT1, BDWM_w7 */ +{0x161E, 0x9cc3, 0x03}, /* BDWULXGT2, BDWM_w7 */ +{0x1602, 0x9cc3, 0x03}, /* BDWHALOGT1, BDWM_w7 */ +{0x1612, 0x9cc3, 0x03}, /* BDWHALOGT2, BDWM_w7 */ +{0x1622, 0x9cc3, 0x03}, /* BDWHALOGT3, BDWM_w7 */ +{0x162B, 0x9cc3, 0x03}, /* BDWHALO28W, BDWM_w7 */ +{0x162A, 0x9cc3, 0x03}, /* BDWGT3WRKS, BDWM_w7 */ +{0x162D, 0x9cc3, 0x03}, /* BDWGT3SRVR, BDWM_w7 */ +}; + +static void isa_bridge_class_init(ObjectClass *klass, void *data) +{ +DeviceClass *dc = DEVICE_CLASS(klass); +PCIDeviceClass *k = PCI_DEVICE_CLASS(klass); + +dc->desc= "ISA bridge faked to support IGD PT"; +k->vendor_id= PCI_VENDOR_ID_INTEL; +k->class_id = PCI_CLASS_BRIDGE_ISA; +}; + +static TypeInfo isa_bridge_info = { +.name = "igd-passthrough-isa-bridge", +.parent= TYPE_PCI_DEVICE, +.instance_size = sizeof(PCIDevice), +.class_init = isa_bridge_class_init, +}; + +static void pt_graphics_register_types(void) +{ +type_register_static(&isa_bridge_info); +} +type_init(pt_graphics_register_types) + +void igd_passthrough_isa_bridge_create(PCIBus *bus, uint16_t gpu_dev_id) +{ +struct PCIDevice *bridge_dev; +int i, num; +uint16_t pch_dev_id = 0x; +uint8_t pch_rev_id; + +num = ARRAY_SIZE(igd_combo_id_infos); +for (i = 0; i < num; i++) { +if (gpu_dev_id == igd_combo_id_infos[i].gpu_device_id) { +pch_dev_id = igd_combo_id_infos[i].pch_device_id; +pch_rev_id = igd_combo_id_infos[i].pch_revision_id; +} +} + +if (pch_dev_id == 0x) { +return; +} + +/* Currently IGD drivers always need to access
[Qemu-devel] [v7][PATCH 03/10] piix: create host bridge to passthrough
Implement a pci host bridge specific to passthrough. Actually this just inherits the standard one. And we also just expose a minimal real host bridge pci configuration subset. Signed-off-by: Tiejun Chen --- hw/pci-host/piix.c | 82 include/hw/i386/pc.h | 2 ++ 2 files changed, 84 insertions(+) diff --git a/hw/pci-host/piix.c b/hw/pci-host/piix.c index c812eaa..0906ba5 100644 --- a/hw/pci-host/piix.c +++ b/hw/pci-host/piix.c @@ -728,6 +728,87 @@ static const TypeInfo i440fx_info = { .class_init= i440fx_class_init, }; +/* IGD Passthrough Host Bridge. */ +typedef struct { +uint8_t offset; +uint8_t len; +} IGDHostInfo; + +/* Here we just expose minimal host bridge offset subset. */ +static const IGDHostInfo igd_host_bridge_infos[] = { +{0x08, 2}, /* revision id */ +{0x2c, 2}, /* sybsystem vendor id */ +{0x2e, 2}, /* sybsystem id */ +{0x50, 2}, /* SNB: processor graphics control register */ +{0x52, 2}, /* processor graphics control register */ +{0xa4, 4}, /* SNB: graphics base of stolen memory */ +{0xa8, 4}, /* SNB: base of GTT stolen memory */ +}; + +static int host_pci_config_read(int pos, int len, uint32_t val) +{ +char path[PATH_MAX]; +int config_fd; +ssize_t size = sizeof(path); +/* Access real host bridge. */ +int rc = snprintf(path, size, "/sys/bus/pci/devices/%04x:%02x:%02x.%d/%s", + 0, 0, 0, 0, "config"); + +if (rc >= size || rc < 0) { +return -ENODEV; +} + +config_fd = open(path, O_RDWR); +if (config_fd < 0) { +return -ENODEV; +} + +do { +rc = pread(config_fd, (uint8_t *)&val, len, pos); +} while (rc < 0 && (errno == EINTR || errno == EAGAIN)); +if (rc != len) { +return -errno; +} + +return 0; +} + +static int igd_pt_i440fx_initfn(struct PCIDevice *pci_dev) +{ +uint32_t val = 0; +int rc, i, num; +int pos, len; + +num = ARRAY_SIZE(igd_host_bridge_infos); +for (i = 0; i < num; i++) { +pos = igd_host_bridge_infos[i].offset; +len = igd_host_bridge_infos[i].len; +rc = host_pci_config_read(pos, len, val); +if (rc) { +return -ENODEV; +} +pci_default_write_config(pci_dev, pos, val, len); +} + +return 0; +} + +static void igd_passthrough_i440fx_class_init(ObjectClass *klass, void *data) +{ +DeviceClass *dc = DEVICE_CLASS(klass); +PCIDeviceClass *k = PCI_DEVICE_CLASS(klass); + +k->init = igd_pt_i440fx_initfn; +dc->desc = "IGD Passthrough Host bridge"; +} + +static const TypeInfo igd_passthrough_i440fx_info = { +.name = TYPE_IGD_PASSTHROUGH_I440FX_PCI_DEVICE, +.parent= TYPE_I440FX_PCI_DEVICE, +.instance_size = sizeof(PCII440FXState), +.class_init= igd_passthrough_i440fx_class_init, +}; + static const char *i440fx_pcihost_root_bus_path(PCIHostState *host_bridge, PCIBus *rootbus) { @@ -769,6 +850,7 @@ static const TypeInfo i440fx_pcihost_info = { static void i440fx_register_types(void) { type_register_static(&i440fx_info); +type_register_static(&igd_passthrough_i440fx_info); type_register_static(&piix3_info); type_register_static(&piix3_xen_info); type_register_static(&i440fx_pcihost_info); diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h index 99dc26b..ec3ca88 100644 --- a/include/hw/i386/pc.h +++ b/include/hw/i386/pc.h @@ -233,6 +233,8 @@ typedef struct PCII440FXState PCII440FXState; #define TYPE_I440FX_PCI_HOST_BRIDGE "i440FX-pcihost" #define TYPE_I440FX_PCI_DEVICE "i440FX" +#define TYPE_IGD_PASSTHROUGH_I440FX_PCI_DEVICE "xen-igd-passthrough-i440FX" + PCIBus *i440fx_init(const char *host_type, const char *pci_type, PCII440FXState **pi440fx_state, int *piix_devfn, ISABus **isa_bus, qemu_irq *pic, -- 1.9.1
[Qemu-devel] [v7][PATCH 02/10] pc_init1: pass parameters just with types
Pass types to configure pc_init1(). Signed-off-by: Tiejun Chen --- hw/i386/pc_piix.c | 17 +++-- 1 file changed, 11 insertions(+), 6 deletions(-) diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c index 07faec9..cea3a5c 100644 --- a/hw/i386/pc_piix.c +++ b/hw/i386/pc_piix.c @@ -75,7 +75,9 @@ static bool has_reserved_memory = true; /* PC hardware initialisation */ static void pc_init1(MachineState *machine, int pci_enabled, - int kvmclock_enabled) + int kvmclock_enabled, + const char *host_type, + const char *pci_type) { PCMachineState *pc_machine = PC_MACHINE(machine); MemoryRegion *system_memory = get_system_memory(); @@ -202,8 +204,8 @@ static void pc_init1(MachineState *machine, } if (pci_enabled) { -pci_bus = i440fx_init(TYPE_I440FX_PCI_HOST_BRIDGE, - TYPE_I440FX_PCI_DEVICE, +pci_bus = i440fx_init(host_type, + pci_type, &i440fx_state, &piix3_devfn, &isa_bus, gsi, system_memory, system_io, machine->ram_size, below_4g_mem_size, @@ -309,7 +311,8 @@ static void pc_init1(MachineState *machine, static void pc_init_pci(MachineState *machine) { -pc_init1(machine, 1, 1); +pc_init1(machine, 1, 1, TYPE_I440FX_PCI_HOST_BRIDGE, + TYPE_I440FX_PCI_DEVICE); } static void pc_compat_2_2(MachineState *machine) @@ -478,7 +481,8 @@ static void pc_init_pci_1_2(MachineState *machine) static void pc_init_pci_no_kvmclock(MachineState *machine) { pc_compat_1_2(machine); -pc_init1(machine, 1, 0); +pc_init1(machine, 1, 0, TYPE_I440FX_PCI_HOST_BRIDGE, + TYPE_I440FX_PCI_DEVICE); } static void pc_init_isa(MachineState *machine) @@ -495,7 +499,8 @@ static void pc_init_isa(MachineState *machine) } x86_cpu_compat_kvm_no_autoenable(FEAT_KVM, 1 << KVM_FEATURE_PV_EOI); enable_compat_apic_id_mode(); -pc_init1(machine, 0, 1); +pc_init1(machine, 0, 1, TYPE_I440FX_PCI_HOST_BRIDGE, + TYPE_I440FX_PCI_DEVICE); } #ifdef CONFIG_XEN -- 1.9.1
[Qemu-devel] [v7][PATCH 10/10] xen, gfx passthrough: add opregion mapping
The OpRegion shouldn't be mapped 1:1 because the address in the host can't be used in the guest directly. This patch traps read and write access to the opregion of the Intel GPU config space (offset 0xfc). The original patch is from Jean Guyader Signed-off-by: Tiejun Chen Signed-off-by: Yang Zhang --- hw/xen/xen_pt.h | 6 +++- hw/xen/xen_pt_config_init.c | 52 ++-- hw/xen/xen_pt_graphics.c| 82 + 3 files changed, 137 insertions(+), 3 deletions(-) diff --git a/hw/xen/xen_pt.h b/hw/xen/xen_pt.h index d9b1ce0..20cc9b9 100644 --- a/hw/xen/xen_pt.h +++ b/hw/xen/xen_pt.h @@ -36,6 +36,9 @@ typedef struct XenPTReg XenPTReg; typedef struct XenPCIPassthroughState XenPCIPassthroughState; +uint32_t igd_read_opregion(XenPCIPassthroughState *s); +void igd_write_opregion(XenPCIPassthroughState *s, uint32_t val); + /* function type for config reg */ typedef int (*xen_pt_conf_reg_init) (XenPCIPassthroughState *, XenPTRegInfo *, uint32_t real_offset, @@ -62,8 +65,9 @@ typedef int (*xen_pt_conf_byte_read) #define XEN_PT_BAR_ALLF 0x #define XEN_PT_BAR_UNMAPPED (-1) -#define PCI_CAP_MAX 48 +#define XEN_PCI_CAP_MAX 48 +#define XEN_PCI_INTEL_OPREGION 0xfc typedef enum { XEN_PT_GRP_TYPE_HARDWIRED = 0, /* 0 Hardwired reg group */ diff --git a/hw/xen/xen_pt_config_init.c b/hw/xen/xen_pt_config_init.c index d99c22e..abcfe1b 100644 --- a/hw/xen/xen_pt_config_init.c +++ b/hw/xen/xen_pt_config_init.c @@ -573,6 +573,22 @@ static int xen_pt_exp_rom_bar_reg_write(XenPCIPassthroughState *s, return 0; } +static int xen_pt_intel_opregion_read(XenPCIPassthroughState *s, + XenPTReg *cfg_entry, + uint32_t *value, uint32_t valid_mask) +{ +*value = igd_read_opregion(s); +return 0; +} + +static int xen_pt_intel_opregion_write(XenPCIPassthroughState *s, + XenPTReg *cfg_entry, uint32_t *value, + uint32_t dev_value, uint32_t valid_mask) +{ +igd_write_opregion(s, *value); +return 0; +} + /* Header Type0 reg static information table */ static XenPTRegInfo xen_pt_emu_reg_header0[] = { /* Vendor ID reg */ @@ -1438,6 +1454,20 @@ static XenPTRegInfo xen_pt_emu_reg_msix[] = { }, }; +static XenPTRegInfo xen_pt_emu_reg_igd_opregion[] = { +/* Intel IGFX OpRegion reg */ +{ +.offset = 0x0, +.size = 4, +.init_val = 0, +.no_wb = 1, +.u.dw.read = xen_pt_intel_opregion_read, +.u.dw.write = xen_pt_intel_opregion_write, +}, +{ +.size = 0, +}, +}; / * Capabilities @@ -1675,6 +1705,14 @@ static const XenPTRegGroupInfo xen_pt_emu_reg_grps[] = { .size_init = xen_pt_msix_size_init, .emu_regs = xen_pt_emu_reg_msix, }, +/* Intel IGD Opregion group */ +{ +.grp_id = XEN_PCI_INTEL_OPREGION, +.grp_type= XEN_PT_GRP_TYPE_EMU, +.grp_size= 0x4, +.size_init = xen_pt_reg_grp_size_init, +.emu_regs= xen_pt_emu_reg_igd_opregion, +}, { .grp_size = 0, }, @@ -1725,7 +1763,7 @@ out: static uint8_t find_cap_offset(XenPCIPassthroughState *s, uint8_t cap) { uint8_t id; -unsigned max_cap = PCI_CAP_MAX; +unsigned max_cap = XEN_PCI_CAP_MAX; uint8_t pos = PCI_CAPABILITY_LIST; uint8_t status = 0; @@ -1804,7 +1842,8 @@ int xen_pt_config_init(XenPCIPassthroughState *s) uint32_t reg_grp_offset = 0; XenPTRegGroup *reg_grp_entry = NULL; -if (xen_pt_emu_reg_grps[i].grp_id != 0xFF) { +if (xen_pt_emu_reg_grps[i].grp_id != 0xFF +&& xen_pt_emu_reg_grps[i].grp_id != XEN_PCI_INTEL_OPREGION) { if (xen_pt_hide_dev_cap(&s->real_device, xen_pt_emu_reg_grps[i].grp_id)) { continue; @@ -1817,6 +1856,15 @@ int xen_pt_config_init(XenPCIPassthroughState *s) } } +/* + * By default we will trap up to 0x40 in the cfg space. + * If an intel device is pass through we need to trap 0xfc, + * therefore the size should be 0xff. + */ +if (xen_pt_emu_reg_grps[i].grp_id == XEN_PCI_INTEL_OPREGION) { +reg_grp_offset = XEN_PCI_INTEL_OPREGION; +} + reg_grp_entry = g_new0(XenPTRegGroup, 1); QLIST_INIT(®_grp_entry->reg_tbl_list); QLIST_INSERT_HEAD(&s->reg_grps, reg_grp_entry, entries); diff --git a/hw/xen/xen_pt_graphics.c b/hw/xen/xen_pt_graphics.c index 3232296..df6069b 100644 --- a/hw/xen/xen_pt_graphics.c +++ b/hw/xen/xen_pt_graphics.c @@ -5,6 +5,11 @@ #include "xen-host-pci-device.h" #include "hw/xen/xen_backend.h" +static unsigned long igd_guest_opregion; +static unsigned long igd_host_opregion; + +#define XEN_PCI_INTEL_O
[Qemu-devel] [v7][PATCH 09/10] xen, gfx passthrough: register host bridge specific to passthrough
Just register that pci host bridge specific to passthrough. Signed-off-by: Tiejun Chen --- hw/i386/pc_piix.c | 19 +-- 1 file changed, 17 insertions(+), 2 deletions(-) diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c index 8fbfc09..eae2d20 100644 --- a/hw/i386/pc_piix.c +++ b/hw/i386/pc_piix.c @@ -50,7 +50,8 @@ #include "cpu.h" #include "qemu/error-report.h" #ifdef CONFIG_XEN -# include +#include +#include "hw/xen/xen_pt.h" #endif #define MAX_IDE_BUS 2 @@ -504,11 +505,25 @@ static void pc_init_isa(MachineState *machine) } #ifdef CONFIG_XEN +static void igd_passthrough_pc_init_pci(MachineState *machine) +{ +pc_init1(machine, 1, 1, TYPE_I440FX_PCI_HOST_BRIDGE, + TYPE_IGD_PASSTHROUGH_I440FX_PCI_DEVICE); +} + +static void pc_igd_init_pci(MachineState *machine) +{ +if (has_igd_gfx_passthru) +igd_passthrough_pc_init_pci(machine); +else +pc_init_pci(machine); +} + static void pc_xen_hvm_init(MachineState *machine) { PCIBus *bus; -pc_init_pci(machine); +pc_igd_init_pci(machine); bus = pci_find_primary_bus(); if (bus != NULL) { -- 1.9.1
[Qemu-devel] [v7][PATCH 04/10] hw/pci-assign: split pci-assign.c
We will try to reuse assign_dev_load_option_rom in xen side, and especially its a good beginning to unify pci assign codes both on kvm and xen in the future. Signed-off-by: Tiejun Chen --- hw/i386/Makefile.objs | 1 + hw/i386/kvm/pci-assign.c | 82 -- hw/i386/pci-assign-load-rom.c | 93 +++ include/hw/pci/pci-assign.h | 27 + 4 files changed, 128 insertions(+), 75 deletions(-) create mode 100644 hw/i386/pci-assign-load-rom.c create mode 100644 include/hw/pci/pci-assign.h diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs index e058a39..8bd1932 100644 --- a/hw/i386/Makefile.objs +++ b/hw/i386/Makefile.objs @@ -7,6 +7,7 @@ obj-$(CONFIG_XEN) += ../xenpv/ xen/ obj-y += kvmvapic.o obj-y += acpi-build.o +obj-y += pci-assign-load-rom.o hw/i386/acpi-build.o: hw/i386/acpi-build.c \ hw/i386/acpi-dsdt.hex hw/i386/q35-acpi-dsdt.hex \ hw/i386/ssdt-tpm.hex diff --git a/hw/i386/kvm/pci-assign.c b/hw/i386/kvm/pci-assign.c index 9db7c77..fe870c9 100644 --- a/hw/i386/kvm/pci-assign.c +++ b/hw/i386/kvm/pci-assign.c @@ -37,6 +37,7 @@ #include "hw/pci/pci.h" #include "hw/pci/msi.h" #include "kvm_i386.h" +#include "hw/pci/pci-assign.h" #define MSIX_PAGE_SIZE 0x1000 @@ -48,17 +49,6 @@ #define IORESOURCE_PREFETCH 0x2000 /* No side effects */ #define IORESOURCE_MEM_64 0x0010 -//#define DEVICE_ASSIGNMENT_DEBUG - -#ifdef DEVICE_ASSIGNMENT_DEBUG -#define DEBUG(fmt, ...) \ -do { \ -fprintf(stderr, "%s: " fmt, __func__ , __VA_ARGS__); \ -} while (0) -#else -#define DEBUG(fmt, ...) -#endif - typedef struct PCIRegion { int type; /* Memory or port I/O */ int valid; @@ -1893,73 +1883,15 @@ static void assign_register_types(void) type_init(assign_register_types) -/* - * Scan the assigned devices for the devices that have an option ROM, and then - * load the corresponding ROM data to RAM. If an error occurs while loading an - * option ROM, we just ignore that option ROM and continue with the next one. - */ static void assigned_dev_load_option_rom(AssignedDevice *dev) { -char name[32], rom_file[64]; -FILE *fp; -uint8_t val; -struct stat st; -void *ptr; - -/* If loading ROM from file, pci handles it */ -if (dev->dev.romfile || !dev->dev.rom_bar) { -return; -} +int size = 0; -snprintf(rom_file, sizeof(rom_file), - "/sys/bus/pci/devices/%04x:%02x:%02x.%01x/rom", - dev->host.domain, dev->host.bus, dev->host.slot, - dev->host.function); +pci_assign_dev_load_option_rom(&dev->dev, OBJECT(dev), &size, + dev->host.domain, dev->host.bus, + dev->host.slot, dev->host.function); -if (stat(rom_file, &st)) { -return; -} - -if (access(rom_file, F_OK)) { -error_report("pci-assign: Insufficient privileges for %s", rom_file); -return; -} - -/* Write "1" to the ROM file to enable it */ -fp = fopen(rom_file, "r+"); -if (fp == NULL) { -return; +if (!size) { +error_report("pci-assign: Invalid ROM."); } -val = 1; -if (fwrite(&val, 1, 1, fp) != 1) { -goto close_rom; -} -fseek(fp, 0, SEEK_SET); - -snprintf(name, sizeof(name), "%s.rom", -object_get_typename(OBJECT(dev))); -memory_region_init_ram(&dev->dev.rom, OBJECT(dev), name, st.st_size, - &error_abort); -vmstate_register_ram(&dev->dev.rom, &dev->dev.qdev); -ptr = memory_region_get_ram_ptr(&dev->dev.rom); -memset(ptr, 0xff, st.st_size); - -if (!fread(ptr, 1, st.st_size, fp)) { -error_report("pci-assign: Cannot read from host %s", rom_file); -error_printf("Device option ROM contents are probably invalid " - "(check dmesg).\nSkip option ROM probe with rombar=0, " - "or load from file with romfile=\n"); -goto close_rom; -} - -pci_register_bar(&dev->dev, PCI_ROM_SLOT, 0, &dev->dev.rom); -dev->dev.has_rom = true; -close_rom: -/* Write "0" to disable ROM */ -fseek(fp, 0, SEEK_SET); -val = 0; -if (!fwrite(&val, 1, 1, fp)) { -DEBUG("%s\n", "Failed to disable pci-sysfs rom file"); -} -fclose(fp); } diff --git a/hw/i386/pci-assign-load-rom.c b/hw/i386/pci-assign-load-rom.c new file mode 100644 index 000..a04b540 --- /dev/null +++ b/hw/i386/pci-assign-load-rom.c @@ -0,0 +1,93 @@ +/* + * This is splited from hw/i386/kvm/pci-assign.c + */ +#include +#include +#include +#include +#include +#include +#include "hw/hw.h" +#include "hw/i386/pc.h" +#include "qemu/error-report.h" +#include "ui/console.h" +#include "hw/loader.h" +#include "monitor/monitor.h" +#include "qemu/range.h" +#include "sysemu/sysemu.h"
[Qemu-devel] [PATCH v4 1/3] uhci: fix segfault when hot-unplugging uhci controller
From: Gonglei When hot-unplugging the usb controllers (ehci/uhci), we have to clean all resouce of these devices, involved registered reset handler. Otherwise, it may cause NULL pointer access and/or segmentation fault if we reboot the guest os after hot-unplugging. Let's hook up reset via DeviceClass->reset() and drop the qemu_register_reset() call. Then Qemu will register and unregister the reset handler automatically. Cc: qemu-stable Reported-by: Lidonglin Signed-off-by: Gonglei --- hw/usb/hcd-uhci.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/hw/usb/hcd-uhci.c b/hw/usb/hcd-uhci.c index f903de7..d8f0577 100644 --- a/hw/usb/hcd-uhci.c +++ b/hw/usb/hcd-uhci.c @@ -348,9 +348,10 @@ static void uhci_update_irq(UHCIState *s) pci_set_irq(&s->dev, level); } -static void uhci_reset(void *opaque) +static void uhci_reset(DeviceState *dev) { -UHCIState *s = opaque; +PCIDevice *d = PCI_DEVICE(dev); +UHCIState *s = DO_UPCAST(UHCIState, dev, d); uint8_t *pci_conf; int i; UHCIPort *port; @@ -454,11 +455,11 @@ static void uhci_port_write(void *opaque, hwaddr addr, port = &s->ports[i]; usb_device_reset(port->port.dev); } -uhci_reset(s); +uhci_reset(DEVICE(s)); return; } if (val & UHCI_CMD_HCRESET) { -uhci_reset(s); +uhci_reset(DEVICE(s)); return; } s->cmd = val; @@ -1226,8 +1227,6 @@ static int usb_uhci_common_initfn(PCIDevice *dev) s->num_ports_vmstate = NB_PORTS; QTAILQ_INIT(&s->queues); -qemu_register_reset(uhci_reset, s); - memory_region_init_io(&s->io_bar, OBJECT(s), &uhci_ioport_ops, s, "uhci", 0x20); @@ -1303,6 +1302,7 @@ static void uhci_class_init(ObjectClass *klass, void *data) k->revision = info->revision; k->class_id = PCI_CLASS_SERIAL_USB; dc->vmsd = &vmstate_uhci; +dc->reset = uhci_reset; if (!info->unplug) { /* uhci controllers in companion setups can't be hotplugged */ dc->hotpluggable = false; -- 1.7.12.4
[Qemu-devel] [PATCH v4 0/3] usb: fix segfault when hot-unplugging usb host adapter
From: Gonglei When hot-unplugging the usb controllers (ehci/uhci), we have to clean all resouce of these devices, involved registered reset handler. Otherwise, it may cause NULL pointer access and/or segmentation fault if we reboot the guest os after hot-unplugging. Let's hook up reset via DeviceClass->reset() and drop the qemu_register_reset() call. Then Qemu will register and unregister the reset handler automatically. Cc: qemu-stable v4 -> v3: - add reset hookup for sysbus devices (ehci/ohci). (Gerd) v2 -> v3: - rewrite each patch's title v1 -> v2: - hooking up reset via DeviceClass->reset and drop the qemu_register_reset() calls. (Gerd) Gonglei (3): uhci: fix segfault when hot-unplugging uhci controller ehci: fix segfault when hot-unplugging ehci controller ohci: fix resource cleanup leak hw/usb/hcd-ehci-pci.c| 10 ++ hw/usb/hcd-ehci-sysbus.c | 10 ++ hw/usb/hcd-ehci.c| 3 +-- hw/usb/hcd-ehci.h| 1 + hw/usb/hcd-ohci.c| 20 +++- hw/usb/hcd-uhci.c| 12 ++-- 6 files changed, 47 insertions(+), 9 deletions(-) -- 1.7.12.4
[Qemu-devel] [PATCH V4 00/19] Support more virtio queues
We current limit the max virtio queues to 64. This is not sufficient to support multiqueue devices (e.g recent Linux support up to 256 tap queues). So this series tries to let virtio to support more queues. No much works need to be done except: - Introducing transport specific queue limitation. - Let each virtio transport to use specific limit. - Speedup the MSI-X masking and unmasking through per vector queue list, and increase the maximum MSI-X vectors supported by qemu. - With the above enhancements, increase the maximum number of virtqueues supported by PCI from 64 to 513. - Compat the changes for legacy machine types. With this patch, we can support up to 256 queues. Since x86 can only allow about 240 interrupt vectors for MSI-X, current Linux driver can only have about 80 queue pairs has their private MSI-X interrupt vectors. With sharing IRQ with queue pairs (RFC posted in https://lkml.org/lkml/2014/12/25/169), Linux driver can have up to about 186 queue pairs has their private MSI-X interrupt vectors. Stress/migration test on virtio-pci, compile test on other targets. And make check on s390x-softmmu and ppc64-softmmu. Cc: Paolo Bonzini Cc: Richard Henderson Cc: Michael S. Tsirkin Cc: Alexander Graf Cc: Keith Busch Cc: Kevin Wolf Cc: Stefan Hajnoczi Cc: Christian Borntraeger Cc: Cornelia Huck Cc: Amit Shah Cc: qemu-...@nongnu.org Please review Thanks Changes from V3: - rebase to master and target to 2.4 - handling compat issues for spapr - fixes for hmp command completion when we have a nic with 256 queues - using VIRTIO_NO_VECTOR instead of 0 for invalid virtqueue in virtio-ccw - fix compile issues for ppc64-softmmu - don't export VIRTIO_CCW_QUEUE_MAX by introducing a gsi limit and inherit in ccw - use transport specific queue limit in virtio-serial - correct the stale comment for AdapterRoutes and move it to ccw patch - replace 128 with queue_max + 64 and add a comment for this in virtio_ccw_notify() Changes from V2: - move transport specific limitation to their implementation. (The left is VIRTIO_CCW_QUEUE_MAX since I fail to find a common header files other than virtio.h) - use virtio_get_queue_max() if possible in virtio.c - AdapterRoutes should use ccw limit - introduce a vector to queue mapping for virito devices and speedup the MSI-X masking and unmasking through this. Changes from V1: - add a validation against the bus limitation - switch to use a bus specific queue limit instead of a global one, this will allow us to just increase the limit of one transport without disturbing others. - only increase the queue limit of virtio-pci - limit the maximum number of virtio queues to 64 for legacy machine types Jason Wang (19): pc: add 2.4 machine types spapr: add machine type specific instance init function ppc: spapr: add 2.4 machine type monitor: replace the magic number 255 with MAX_QUEUE_NUM monitor: check return value of qemu_find_net_clients_except() virtio-ccw: using VIRTIO_NO_VECTOR instead of 0 for invalid virtqueue virtio-net: validate backend queue numbers against bus limitation virtio-net: fix the upper bound when trying to delete queues virito: introduce bus specific queue limit virtio-ccw: introduce ccw specific queue limit virtio-s390: switch to bus specific queue limit virtio-mmio: switch to bus specific queue limit virtio-pci: switch to use bus specific queue limit virtio: introduce vector to virtqueues mapping virtio: introduce virtio_queue_get_index() virtio-pci: speedup MSI-X masking and unmasking virtio-pci: increase the maximum number of virtqueues to 513 pci: remove hard-coded bar size in msix_init_exclusive_bar() virtio-pci: introduce auto_msix_bar_size property hw/block/nvme.c| 2 +- hw/char/virtio-serial-bus.c| 2 +- hw/i386/pc_piix.c | 42 --- hw/i386/pc_q35.c | 39 +++-- hw/misc/ivshmem.c | 2 +- hw/net/virtio-net.c| 9 - hw/pci/msix.c | 18 -- hw/ppc/spapr.c | 70 +++-- hw/s390x/s390-virtio-bus.c | 7 ++-- hw/s390x/s390-virtio-ccw.c | 7 ++-- hw/s390x/virtio-ccw.c | 20 +++ hw/scsi/virtio-scsi.c | 4 +-- hw/virtio/virtio-mmio.c| 7 ++-- hw/virtio/virtio-pci.c | 78 -- hw/virtio/virtio-pci.h | 3 ++ hw/virtio/virtio.c | 75 +++- include/hw/compat.h| 11 ++ include/hw/pci/msix.h | 2 +- include/hw/s390x/s390_flic.h | 4 ++- include/hw/virtio/virtio-bus.h | 2 ++ include/hw/virtio/virtio.h | 7 ++-- monitor.c | 25 -- 22 files changed, 339 insertions(+), 97 deletions(-) -- 2.1.0
[Qemu-devel] [PATCH V4 02/19] spapr: add machine type specific instance init function
This patches adds machine type specific instance initialization functions. Those functions will be used by following patches to compat class properties for legacy machine types. Cc: Alexander Graf Cc: qemu-...@nongnu.org Signed-off-by: Jason Wang --- hw/ppc/spapr.c | 23 +++ 1 file changed, 23 insertions(+) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index 0487f52..7fb174f 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -1803,6 +1803,27 @@ static const TypeInfo spapr_machine_info = { #define SPAPR_COMPAT_2_1 \ SPAPR_COMPAT_2_2 +static void spapr_compat_2_2(Object *obj) +{ +} + +static void spapr_compat_2_1(Object *obj) +{ +spapr_compat_2_2(obj); +} + +static void spapr_machine_2_2_instance_init(Object *obj) +{ +spapr_compat_2_2(obj); +spapr_machine_initfn(obj); +} + +static void spapr_machine_2_1_instance_init(Object *obj) +{ +spapr_compat_2_1(obj); +spapr_machine_initfn(obj); +} + static void spapr_machine_2_1_class_init(ObjectClass *oc, void *data) { MachineClass *mc = MACHINE_CLASS(oc); @@ -1821,6 +1842,7 @@ static const TypeInfo spapr_machine_2_1_info = { .name = TYPE_SPAPR_MACHINE "2.1", .parent= TYPE_SPAPR_MACHINE, .class_init= spapr_machine_2_1_class_init, +.instance_init = spapr_machine_2_1_instance_init, }; static void spapr_machine_2_2_class_init(ObjectClass *oc, void *data) @@ -1840,6 +1862,7 @@ static const TypeInfo spapr_machine_2_2_info = { .name = TYPE_SPAPR_MACHINE "2.2", .parent= TYPE_SPAPR_MACHINE, .class_init= spapr_machine_2_2_class_init, +.instance_init = spapr_machine_2_2_instance_init, }; static void spapr_machine_2_3_class_init(ObjectClass *oc, void *data) -- 2.1.0
Re: [Qemu-devel] [PATCH] MAINTAINERS: Add myself as the maintainer of the Quorum driver
On Tue, Mar 17, 2015 at 05:23:56PM +, Stefan Hajnoczi wrote: > > + > > +Quorum > > +M: Alberto Garcia > > +S: Supported > > +F: block/quorum.c > > I'd like to also add the qemu-bl...@nongnu.org mailing list to your > MAINTAINERS stanza because patches should be CCed to the block list: > > L: qemu-bl...@nongnu.org > > Does that sound good? If you have any questions, just let me know. Yes, please go ahead. Thanks, Berto
[Qemu-devel] [PATCH V4 03/19] ppc: spapr: add 2.4 machine type
The following patches will limit the following things to legacy machine type: - maximum number of virtqueues for virtio-pci were limited to 64 - auto msix bar size for virtio-net-pci were disabled by default Cc: Alexander Graf Cc: qemu-...@nongnu.org Signed-off-by: Jason Wang --- hw/ppc/spapr.c | 31 +-- 1 file changed, 29 insertions(+), 2 deletions(-) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index 7fb174f..22f4ae4 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -1803,8 +1803,13 @@ static const TypeInfo spapr_machine_info = { #define SPAPR_COMPAT_2_1 \ SPAPR_COMPAT_2_2 +static void spapr_compat_2_3(Object *obj) +{ +} + static void spapr_compat_2_2(Object *obj) { +spapr_compat_2_3(obj); } static void spapr_compat_2_1(Object *obj) @@ -1812,6 +1817,12 @@ static void spapr_compat_2_1(Object *obj) spapr_compat_2_2(obj); } +static void spapr_machine_2_3_instance_init(Object *obj) +{ +spapr_compat_2_3(obj); +spapr_machine_initfn(obj); +} + static void spapr_machine_2_2_instance_init(Object *obj) { spapr_compat_2_2(obj); @@ -1871,14 +1882,29 @@ static void spapr_machine_2_3_class_init(ObjectClass *oc, void *data) mc->name = "pseries-2.3"; mc->desc = "pSeries Logical Partition (PAPR compliant) v2.3"; -mc->alias = "pseries"; -mc->is_default = 1; } static const TypeInfo spapr_machine_2_3_info = { .name = TYPE_SPAPR_MACHINE "2.3", .parent= TYPE_SPAPR_MACHINE, .class_init= spapr_machine_2_3_class_init, +.instance_init = spapr_machine_2_3_instance_init, +}; + +static void spapr_machine_2_4_class_init(ObjectClass *oc, void *data) +{ +MachineClass *mc = MACHINE_CLASS(oc); + +mc->name = "pseries-2.4"; +mc->desc = "pSeries Logical Partition (PAPR compliant) v2.4"; +mc->alias = "pseries"; +mc->is_default = 1; +} + +static const TypeInfo spapr_machine_2_4_info = { +.name = TYPE_SPAPR_MACHINE "2.4", +.parent= TYPE_SPAPR_MACHINE, +.class_init= spapr_machine_2_4_class_init, }; static void spapr_machine_register_types(void) @@ -1887,6 +1913,7 @@ static void spapr_machine_register_types(void) type_register_static(&spapr_machine_2_1_info); type_register_static(&spapr_machine_2_2_info); type_register_static(&spapr_machine_2_3_info); +type_register_static(&spapr_machine_2_4_info); } type_init(spapr_machine_register_types) -- 2.1.0
[Qemu-devel] [PATCH V4 08/19] virtio-net: fix the upper bound when trying to delete queues
Virtqueue were indexed from zero, so don't delete virtqueue whose index is n->max_queues * 2 + 1. Cc: Michael S. Tsirkin Signed-off-by: Jason Wang --- hw/net/virtio-net.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index 59f76bc..b6fac9c 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -1309,7 +1309,7 @@ static void virtio_net_set_multiqueue(VirtIONet *n, int multiqueue) n->multiqueue = multiqueue; -for (i = 2; i <= n->max_queues * 2 + 1; i++) { +for (i = 2; i < n->max_queues * 2 + 1; i++) { virtio_del_queue(vdev, i); } -- 2.1.0
[Qemu-devel] [PATCH V4 13/19] virtio-pci: switch to use bus specific queue limit
Instead of depending on a macro, switch to use a bus specific queue limit. Cc: Michael S. Tsirkin Signed-off-by: Jason Wang --- hw/virtio/virtio-pci.c | 12 +++- include/hw/virtio/virtio.h | 2 -- 2 files changed, 7 insertions(+), 7 deletions(-) diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c index 075b13b..e556919 100644 --- a/hw/virtio/virtio-pci.c +++ b/hw/virtio/virtio-pci.c @@ -42,6 +42,8 @@ * configuration space */ #define VIRTIO_PCI_CONFIG_SIZE(dev) VIRTIO_PCI_CONFIG_OFF(msix_enabled(dev)) +#define VIRTIO_PCI_QUEUE_MAX 64 + static void virtio_pci_bus_new(VirtioBusState *bus, size_t bus_size, VirtIOPCIProxy *dev); @@ -171,7 +173,7 @@ static void virtio_pci_start_ioeventfd(VirtIOPCIProxy *proxy) return; } -for (n = 0; n < VIRTIO_PCI_QUEUE_MAX; n++) { +for (n = 0; n < virtio_get_queue_max(vdev); n++) { if (!virtio_queue_get_num(vdev, n)) { continue; } @@ -207,7 +209,7 @@ static void virtio_pci_stop_ioeventfd(VirtIOPCIProxy *proxy) return; } -for (n = 0; n < VIRTIO_PCI_QUEUE_MAX; n++) { +for (n = 0; n < virtio_get_queue_max(vdev); n++) { if (!virtio_queue_get_num(vdev, n)) { continue; } @@ -243,11 +245,11 @@ static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val) virtio_queue_set_addr(vdev, vdev->queue_sel, pa); break; case VIRTIO_PCI_QUEUE_SEL: -if (val < VIRTIO_PCI_QUEUE_MAX) +if (val < virtio_get_queue_max(vdev)) vdev->queue_sel = val; break; case VIRTIO_PCI_QUEUE_NOTIFY: -if (val < VIRTIO_PCI_QUEUE_MAX) { +if (val < virtio_get_queue_max(vdev)) { virtio_queue_notify(vdev, val); } break; @@ -748,7 +750,7 @@ static int virtio_pci_set_guest_notifiers(DeviceState *d, int nvqs, bool assign) bool with_irqfd = msix_enabled(&proxy->pci_dev) && kvm_msi_via_irqfd_enabled(); -nvqs = MIN(nvqs, VIRTIO_PCI_QUEUE_MAX); +nvqs = MIN(nvqs, virtio_get_queue_max(vdev)); /* When deassigning, pass a consistent nvqs value * to avoid leaking notifiers. diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h index 91fd673..7ff40ac 100644 --- a/include/hw/virtio/virtio.h +++ b/include/hw/virtio/virtio.h @@ -48,8 +48,6 @@ typedef struct VirtQueueElement struct iovec out_sg[VIRTQUEUE_MAX_SIZE]; } VirtQueueElement; -#define VIRTIO_PCI_QUEUE_MAX 64 - #define VIRTIO_NO_VECTOR 0x #define TYPE_VIRTIO_DEVICE "virtio-device" -- 2.1.0
[Qemu-devel] [PATCH V4 01/19] pc: add 2.4 machine types
The following patches will limit the following things to legacy machine type: - maximum number of virtqueues for virtio-pci were limited to 64 - auto msix bar size for virtio-net-pci were disabled by default Cc: Paolo Bonzini Cc: Richard Henderson Cc: Michael S. Tsirkin Signed-off-by: Jason Wang --- hw/i386/pc_piix.c | 29 + hw/i386/pc_q35.c | 26 +++--- 2 files changed, 48 insertions(+), 7 deletions(-) diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c index 36c69d7..175e582 100644 --- a/hw/i386/pc_piix.c +++ b/hw/i386/pc_piix.c @@ -310,8 +310,13 @@ static void pc_init_pci(MachineState *machine) pc_init1(machine, 1, 1); } +static void pc_compat_2_3(MachineState *machine) +{ +} + static void pc_compat_2_2(MachineState *machine) { +pc_compat_2_3(machine); rsdp_in_ram = false; x86_cpu_compat_set_features("kvm64", FEAT_1_EDX, 0, CPUID_VME); x86_cpu_compat_set_features("kvm32", FEAT_1_EDX, 0, CPUID_VME); @@ -417,6 +422,12 @@ static void pc_compat_1_2(MachineState *machine) x86_cpu_compat_kvm_no_autoenable(FEAT_KVM, 1 << KVM_FEATURE_PV_EOI); } +static void pc_init_pci_2_3(MachineState *machine) +{ +pc_compat_2_3(machine); +pc_init_pci(machine); +} + static void pc_init_pci_2_2(MachineState *machine) { pc_compat_2_2(machine); @@ -516,19 +527,28 @@ static void pc_xen_hvm_init(MachineState *machine) .desc = "Standard PC (i440FX + PIIX, 1996)", \ .hot_add_cpu = pc_hot_add_cpu -#define PC_I440FX_2_3_MACHINE_OPTIONS \ +#define PC_I440FX_2_4_MACHINE_OPTIONS \ PC_I440FX_MACHINE_OPTIONS, \ .default_machine_opts = "firmware=bios-256k.bin", \ .default_display = "std" -static QEMUMachine pc_i440fx_machine_v2_3 = { -PC_I440FX_2_3_MACHINE_OPTIONS, -.name = "pc-i440fx-2.3", + +static QEMUMachine pc_i440fx_machine_v2_4 = { +PC_I440FX_2_4_MACHINE_OPTIONS, +.name = "pc-i440fx-2.4", .alias = "pc", .init = pc_init_pci, .is_default = 1, }; +#define PC_I440FX_2_3_MACHINE_OPTIONS PC_I440FX_2_4_MACHINE_OPTIONS + +static QEMUMachine pc_i440fx_machine_v2_3 = { +PC_I440FX_2_3_MACHINE_OPTIONS, +.name = "pc-i440fx-2.3", +.init = pc_init_pci_2_3, +}; + #define PC_I440FX_2_2_MACHINE_OPTIONS PC_I440FX_2_3_MACHINE_OPTIONS static QEMUMachine pc_i440fx_machine_v2_2 = { @@ -974,6 +994,7 @@ static QEMUMachine xenfv_machine = { static void pc_machine_init(void) { +qemu_register_pc_machine(&pc_i440fx_machine_v2_4); qemu_register_pc_machine(&pc_i440fx_machine_v2_3); qemu_register_pc_machine(&pc_i440fx_machine_v2_2); qemu_register_pc_machine(&pc_i440fx_machine_v2_1); diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c index bc40537..df44d25 100644 --- a/hw/i386/pc_q35.c +++ b/hw/i386/pc_q35.c @@ -289,8 +289,13 @@ static void pc_q35_init(MachineState *machine) } } +static void pc_compat_2_3(MachineState *machine) +{ +} + static void pc_compat_2_2(MachineState *machine) { +pc_compat_2_3(machine); rsdp_in_ram = false; x86_cpu_compat_set_features("kvm64", FEAT_1_EDX, 0, CPUID_VME); x86_cpu_compat_set_features("kvm32", FEAT_1_EDX, 0, CPUID_VME); @@ -365,6 +370,12 @@ static void pc_compat_1_4(MachineState *machine) x86_cpu_compat_set_features("Westmere", FEAT_1_ECX, 0, CPUID_EXT_PCLMULQDQ); } +static void pc_q35_init_2_3(MachineState *machine) +{ +pc_compat_2_3(machine); +pc_q35_init(machine); +} + static void pc_q35_init_2_2(MachineState *machine) { pc_compat_2_2(machine); @@ -414,16 +425,24 @@ static void pc_q35_init_1_4(MachineState *machine) .hot_add_cpu = pc_hot_add_cpu, \ .units_per_default_bus = 1 -#define PC_Q35_2_3_MACHINE_OPTIONS \ +#define PC_Q35_2_4_MACHINE_OPTIONS \ PC_Q35_MACHINE_OPTIONS, \ .default_machine_opts = "firmware=bios-256k.bin", \ .default_display = "std" +static QEMUMachine pc_q35_machine_v2_4 = { +PC_Q35_2_4_MACHINE_OPTIONS, +.name = "pc-q35-2.4", +.alias = "q35", +.init = pc_q35_init, +}; + +#define PC_Q35_2_3_MACHINE_OPTIONS PC_Q35_2_4_MACHINE_OPTIONS + static QEMUMachine pc_q35_machine_v2_3 = { PC_Q35_2_3_MACHINE_OPTIONS, .name = "pc-q35-2.3", -.alias = "q35", -.init = pc_q35_init, +.init = pc_q35_init_2_3, }; #define PC_Q35_2_2_MACHINE_OPTIONS PC_Q35_2_3_MACHINE_OPTIONS @@ -510,6 +529,7 @@ static QEMUMachine pc_q35_machine_v1_4 = { static void pc_q35_machine_init(void) { +qemu_register_pc_machine(&pc_q35_machine_v2_4); qemu_register_pc_machine(&pc_q35_machine_v2_3); qemu_register_pc_machine(&pc_q35_machine_v2_2); qemu_register_pc_machine(&pc_q35_machine_v2_1); -- 2.1.0
[Qemu-devel] [PATCH V4 16/19] virtio-pci: speedup MSI-X masking and unmasking
This patch tries to speed up the MSI-X masking and unmasking through the mapping between vector and queues. With this patch it will there's no need to go through all possible virtqueues, which may help to reduce the time spent when doing MSI-X masking/unmasking a single vector when more than hundreds or even thousands of virtqueues were supported. Tested with 80 queue pairs virito-net-pci by changing the smp affinity in the background and doing netperf in the same time: Before the patch: 5711.70 Gbits/sec After the patch: 6830.98 Gbits/sec About 19.6% improvements in throughput. Cc: Michael S. Tsirkin Signed-off-by: Jason Wang --- hw/virtio/virtio-pci.c | 40 +--- 1 file changed, 21 insertions(+), 19 deletions(-) diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c index c38f33f..9a5242a 100644 --- a/hw/virtio/virtio-pci.c +++ b/hw/virtio/virtio-pci.c @@ -632,28 +632,30 @@ static int virtio_pci_vector_unmask(PCIDevice *dev, unsigned vector, { VirtIOPCIProxy *proxy = container_of(dev, VirtIOPCIProxy, pci_dev); VirtIODevice *vdev = virtio_bus_get_device(&proxy->bus); -int ret, queue_no; +VirtQueue *vq = virtio_vector_first_queue(vdev, vector); +int ret, index, unmasked = 0; -for (queue_no = 0; queue_no < proxy->nvqs_with_notifiers; queue_no++) { -if (!virtio_queue_get_num(vdev, queue_no)) { +while (vq) { +index = virtio_queue_get_index(vdev, vq); +if (!virtio_queue_get_num(vdev, index)) { break; } -if (virtio_queue_vector(vdev, queue_no) != vector) { -continue; -} -ret = virtio_pci_vq_vector_unmask(proxy, queue_no, vector, msg); +ret = virtio_pci_vq_vector_unmask(proxy, index, vector, msg); if (ret < 0) { goto undo; } +vq = virtio_vector_next_queue(vq); +++unmasked; } + return 0; undo: -while (--queue_no >= 0) { -if (virtio_queue_vector(vdev, queue_no) != vector) { -continue; -} -virtio_pci_vq_vector_mask(proxy, queue_no, vector); +vq = virtio_vector_first_queue(vdev, vector); +while (vq && --unmasked >= 0) { +index = virtio_queue_get_index(vdev, vq); +virtio_pci_vq_vector_mask(proxy, index, vector); +vq = virtio_vector_next_queue(vq); } return ret; } @@ -662,16 +664,16 @@ static void virtio_pci_vector_mask(PCIDevice *dev, unsigned vector) { VirtIOPCIProxy *proxy = container_of(dev, VirtIOPCIProxy, pci_dev); VirtIODevice *vdev = virtio_bus_get_device(&proxy->bus); -int queue_no; +VirtQueue *vq = virtio_vector_first_queue(vdev, vector); +int index; -for (queue_no = 0; queue_no < proxy->nvqs_with_notifiers; queue_no++) { -if (!virtio_queue_get_num(vdev, queue_no)) { +while (vq) { +index = virtio_queue_get_index(vdev, vq); +if (!virtio_queue_get_num(vdev, index)) { break; } -if (virtio_queue_vector(vdev, queue_no) != vector) { -continue; -} -virtio_pci_vq_vector_mask(proxy, queue_no, vector); +virtio_pci_vq_vector_mask(proxy, index, vector); +vq = virtio_vector_next_queue(vq); } } -- 2.1.0
[Qemu-devel] [PATCH V4 06/19] virtio-ccw: using VIRTIO_NO_VECTOR instead of 0 for invalid virtqueue
There's no need to use vector 0 for invalid virtqueue. So this patch changes to use VIRTIO_NO_VECTOR instead. Cc: Michael S. Tsirkin Cc: Cornelia Huck CC: Christian Borntraeger Cc: Richard Henderson Cc: Alexander Graf Signed-off-by: Jason Wang --- hw/s390x/virtio-ccw.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c index 130535c..c8b87aa 100644 --- a/hw/s390x/virtio-ccw.c +++ b/hw/s390x/virtio-ccw.c @@ -281,7 +281,7 @@ static int virtio_ccw_set_vqs(SubchDev *sch, uint64_t addr, uint32_t align, virtio_queue_set_addr(vdev, index, addr); if (!addr) { -virtio_queue_set_vector(vdev, index, 0); +virtio_queue_set_vector(vdev, index, VIRTIO_NO_VECTOR); } else { /* Fail if we don't have a big enough queue. */ /* TODO: Add interface to handle vring.num changing */ -- 2.1.0
[Qemu-devel] [PATCH V4 09/19] virito: introduce bus specific queue limit
This patch introduces a bus specific queue limitation. It will be useful for increasing the limit for one of the bus without disturbing other buses. Cc: Michael S. Tsirkin Cc: Alexander Graf Cc: Richard Henderson Cc: Cornelia Huck Cc: Christian Borntraeger Cc: Paolo Bonzini Signed-off-by: Jason Wang --- hw/char/virtio-serial-bus.c| 2 +- hw/net/virtio-net.c| 4 ++-- hw/s390x/s390-virtio-bus.c | 1 + hw/s390x/virtio-ccw.c | 1 + hw/scsi/virtio-scsi.c | 4 ++-- hw/virtio/virtio-mmio.c| 1 + hw/virtio/virtio-pci.c | 1 + hw/virtio/virtio.c | 40 +--- include/hw/virtio/virtio-bus.h | 1 + include/hw/virtio/virtio.h | 1 + 10 files changed, 36 insertions(+), 20 deletions(-) diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c index c86814f..dec2f2d 100644 --- a/hw/char/virtio-serial-bus.c +++ b/hw/char/virtio-serial-bus.c @@ -942,7 +942,7 @@ static void virtio_serial_device_realize(DeviceState *dev, Error **errp) } /* Each port takes 2 queues, and one pair is for the control queue */ -max_supported_ports = VIRTIO_PCI_QUEUE_MAX / 2 - 1; +max_supported_ports = virtio_get_queue_max(vdev) / 2 - 1; if (vser->serial.max_virtserial_ports > max_supported_ports) { error_setg(errp, "maximum ports supported: %u", max_supported_ports); diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index b6fac9c..bf286f5 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -1588,10 +1588,10 @@ static void virtio_net_device_realize(DeviceState *dev, Error **errp) virtio_init(vdev, "virtio-net", VIRTIO_ID_NET, n->config_size); n->max_queues = MAX(n->nic_conf.peers.queues, 1); -if (n->max_queues * 2 + 1 > VIRTIO_PCI_QUEUE_MAX) { +if (n->max_queues * 2 + 1 > virtio_get_queue_max(vdev)) { error_setg(errp, "Invalid number of queues (= %" PRIu32 "), " "must be a postive integer less than %d.", - n->max_queues, (VIRTIO_PCI_QUEUE_MAX - 1) / 2); + n->max_queues, (virtio_get_queue_max(vdev) - 1) / 2); virtio_cleanup(vdev); return; } diff --git a/hw/s390x/s390-virtio-bus.c b/hw/s390x/s390-virtio-bus.c index 047c963..2b41e32 100644 --- a/hw/s390x/s390-virtio-bus.c +++ b/hw/s390x/s390-virtio-bus.c @@ -715,6 +715,7 @@ static void virtio_s390_bus_class_init(ObjectClass *klass, void *data) bus_class->max_dev = 1; k->notify = virtio_s390_notify; k->get_features = virtio_s390_get_features; +k->queue_max = VIRTIO_PCI_QUEUE_MAX; } static const TypeInfo virtio_s390_bus_info = { diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c index c8b87aa..dfe6ded 100644 --- a/hw/s390x/virtio-ccw.c +++ b/hw/s390x/virtio-ccw.c @@ -1711,6 +1711,7 @@ static void virtio_ccw_bus_class_init(ObjectClass *klass, void *data) k->load_queue = virtio_ccw_load_queue; k->save_config = virtio_ccw_save_config; k->load_config = virtio_ccw_load_config; +k->queue_max = VIRTIO_PCI_QUEUE_MAX; } static const TypeInfo virtio_ccw_bus_info = { diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c index da0cff8..feb5e07 100644 --- a/hw/scsi/virtio-scsi.c +++ b/hw/scsi/virtio-scsi.c @@ -822,10 +822,10 @@ void virtio_scsi_common_realize(DeviceState *dev, Error **errp, sizeof(VirtIOSCSIConfig)); if (s->conf.num_queues == 0 || -s->conf.num_queues > VIRTIO_PCI_QUEUE_MAX - 2) { +s->conf.num_queues > virtio_get_queue_max(vdev) - 2) { error_setg(errp, "Invalid number of queues (= %" PRIu32 "), " "must be a positive integer less than %d.", - s->conf.num_queues, VIRTIO_PCI_QUEUE_MAX - 2); + s->conf.num_queues, virtio_get_queue_max(vdev) - 2); virtio_cleanup(vdev); return; } diff --git a/hw/virtio/virtio-mmio.c b/hw/virtio/virtio-mmio.c index 10123f3..2ae6942 100644 --- a/hw/virtio/virtio-mmio.c +++ b/hw/virtio/virtio-mmio.c @@ -403,6 +403,7 @@ static void virtio_mmio_bus_class_init(ObjectClass *klass, void *data) k->device_plugged = virtio_mmio_device_plugged; k->has_variable_vring_alignment = true; bus_class->max_dev = 1; +k->queue_max = VIRTIO_PCI_QUEUE_MAX; } static const TypeInfo virtio_mmio_bus_info = { diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c index c7c3f72..075b13b 100644 --- a/hw/virtio/virtio-pci.c +++ b/hw/virtio/virtio-pci.c @@ -1498,6 +1498,7 @@ static void virtio_pci_bus_class_init(ObjectClass *klass, void *data) k->vmstate_change = virtio_pci_vmstate_change; k->device_plugged = virtio_pci_device_plugged; k->device_unplugged = virtio_pci_device_unplugged; +k->queue_max = VIRTIO_PCI_QUEUE_MAX; } static const TypeInfo virtio_pci_bus_info = { diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c index 3c6e430..023eb04 100644 --- a/hw/v
[Qemu-devel] [PATCH V4 04/19] monitor: replace the magic number 255 with MAX_QUEUE_NUM
This patch replace the magic number 255, and increase it to MAX_QUEUE_NUM which is maximum number of queues supported by a nic. Cc: Luiz Capitulino Signed-off-by: Jason Wang --- monitor.c | 17 ++--- 1 file changed, 10 insertions(+), 7 deletions(-) diff --git a/monitor.c b/monitor.c index 42116a9..07dfed0 100644 --- a/monitor.c +++ b/monitor.c @@ -4475,10 +4475,11 @@ void set_link_completion(ReadLineState *rs, int nb_args, const char *str) len = strlen(str); readline_set_completion_index(rs, len); if (nb_args == 2) { -NetClientState *ncs[255]; +NetClientState *ncs[MAX_QUEUE_NUM]; int count, i; count = qemu_find_net_clients_except(NULL, ncs, - NET_CLIENT_OPTIONS_KIND_NONE, 255); + NET_CLIENT_OPTIONS_KIND_NONE, + MAX_QUEUE_NUM); for (i = 0; i < count; i++) { const char *name = ncs[i]->name; if (!strncmp(str, name, len)) { @@ -4494,7 +4495,7 @@ void set_link_completion(ReadLineState *rs, int nb_args, const char *str) void netdev_del_completion(ReadLineState *rs, int nb_args, const char *str) { int len, count, i; -NetClientState *ncs[255]; +NetClientState *ncs[MAX_QUEUE_NUM]; if (nb_args != 2) { return; @@ -4503,7 +4504,7 @@ void netdev_del_completion(ReadLineState *rs, int nb_args, const char *str) len = strlen(str); readline_set_completion_index(rs, len); count = qemu_find_net_clients_except(NULL, ncs, NET_CLIENT_OPTIONS_KIND_NIC, - 255); + MAX_QUEUE_NUM); for (i = 0; i < count; i++) { QemuOpts *opts; const char *name = ncs[i]->name; @@ -4569,14 +4570,15 @@ void host_net_add_completion(ReadLineState *rs, int nb_args, const char *str) void host_net_remove_completion(ReadLineState *rs, int nb_args, const char *str) { -NetClientState *ncs[255]; +NetClientState *ncs[MAX_QUEUE_NUM]; int count, i, len; len = strlen(str); readline_set_completion_index(rs, len); if (nb_args == 2) { count = qemu_find_net_clients_except(NULL, ncs, - NET_CLIENT_OPTIONS_KIND_NONE, 255); + NET_CLIENT_OPTIONS_KIND_NONE, + MAX_QUEUE_NUM); for (i = 0; i < count; i++) { int id; char name[16]; @@ -4592,7 +4594,8 @@ void host_net_remove_completion(ReadLineState *rs, int nb_args, const char *str) return; } else if (nb_args == 3) { count = qemu_find_net_clients_except(NULL, ncs, - NET_CLIENT_OPTIONS_KIND_NIC, 255); + NET_CLIENT_OPTIONS_KIND_NIC, + MAX_QUEUE_NUM); for (i = 0; i < count; i++) { int id; const char *name; -- 2.1.0
[Qemu-devel] [PATCH V4 18/19] pci: remove hard-coded bar size in msix_init_exclusive_bar()
This patch let msix_init_exclusive_bar() can accept bar_size parameter other than a hard-coded limit 4096. Then caller can specify a bar_size depends on msix entries and can use up to 2048 msix entries as PCI spec allows. To keep migration compatibility, 4096 is used for all callers and pba were start from half of bar size. Cc: Keith Busch Cc: Kevin Wolf Cc: Stefan Hajnoczi Cc: Michael S. Tsirkin Signed-off-by: Jason Wang --- hw/block/nvme.c| 2 +- hw/misc/ivshmem.c | 2 +- hw/pci/msix.c | 18 +++--- hw/virtio/virtio-pci.c | 2 +- include/hw/pci/msix.h | 2 +- 5 files changed, 11 insertions(+), 15 deletions(-) diff --git a/hw/block/nvme.c b/hw/block/nvme.c index 0f3dfb9..09d7884 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -787,7 +787,7 @@ static int nvme_init(PCIDevice *pci_dev) pci_register_bar(&n->parent_obj, 0, PCI_BASE_ADDRESS_SPACE_MEMORY | PCI_BASE_ADDRESS_MEM_TYPE_64, &n->iomem); -msix_init_exclusive_bar(&n->parent_obj, n->num_queues, 4); +msix_init_exclusive_bar(&n->parent_obj, n->num_queues, 4, 4096); id->vid = cpu_to_le16(pci_get_word(pci_conf + PCI_VENDOR_ID)); id->ssvid = cpu_to_le16(pci_get_word(pci_conf + PCI_SUBSYSTEM_VENDOR_ID)); diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c index 5d272c8..3e2a2d4 100644 --- a/hw/misc/ivshmem.c +++ b/hw/misc/ivshmem.c @@ -631,7 +631,7 @@ static uint64_t ivshmem_get_size(IVShmemState * s) { static void ivshmem_setup_msi(IVShmemState * s) { -if (msix_init_exclusive_bar(PCI_DEVICE(s), s->vectors, 1)) { +if (msix_init_exclusive_bar(PCI_DEVICE(s), s->vectors, 1, 4096)) { IVSHMEM_DPRINTF("msix initialization failed\n"); exit(1); } diff --git a/hw/pci/msix.c b/hw/pci/msix.c index 24de260..9a1894f 100644 --- a/hw/pci/msix.c +++ b/hw/pci/msix.c @@ -291,33 +291,29 @@ int msix_init(struct PCIDevice *dev, unsigned short nentries, } int msix_init_exclusive_bar(PCIDevice *dev, unsigned short nentries, -uint8_t bar_nr) +uint8_t bar_nr, uint32_t bar_size) { int ret; char *name; +uint32_t bar_pba_offset = bar_size / 2; /* * Migration compatibility dictates that this remains a 4k * BAR with the vector table in the lower half and PBA in * the upper half. Do not use these elsewhere! */ -#define MSIX_EXCLUSIVE_BAR_SIZE 4096 -#define MSIX_EXCLUSIVE_BAR_TABLE_OFFSET 0 -#define MSIX_EXCLUSIVE_BAR_PBA_OFFSET (MSIX_EXCLUSIVE_BAR_SIZE / 2) -#define MSIX_EXCLUSIVE_CAP_OFFSET 0 - -if (nentries * PCI_MSIX_ENTRY_SIZE > MSIX_EXCLUSIVE_BAR_PBA_OFFSET) { +if (nentries * PCI_MSIX_ENTRY_SIZE > bar_pba_offset) { return -EINVAL; } name = g_strdup_printf("%s-msix", dev->name); -memory_region_init(&dev->msix_exclusive_bar, OBJECT(dev), name, MSIX_EXCLUSIVE_BAR_SIZE); +memory_region_init(&dev->msix_exclusive_bar, OBJECT(dev), name, bar_size); g_free(name); ret = msix_init(dev, nentries, &dev->msix_exclusive_bar, bar_nr, -MSIX_EXCLUSIVE_BAR_TABLE_OFFSET, &dev->msix_exclusive_bar, -bar_nr, MSIX_EXCLUSIVE_BAR_PBA_OFFSET, -MSIX_EXCLUSIVE_CAP_OFFSET); +0, &dev->msix_exclusive_bar, +bar_nr, bar_pba_offset, +0); if (ret) { return ret; } diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c index 02e3ce8..4a5febb 100644 --- a/hw/virtio/virtio-pci.c +++ b/hw/virtio/virtio-pci.c @@ -937,7 +937,7 @@ static void virtio_pci_device_plugged(DeviceState *d) config[PCI_INTERRUPT_PIN] = 1; if (proxy->nvectors && -msix_init_exclusive_bar(&proxy->pci_dev, proxy->nvectors, 1)) { +msix_init_exclusive_bar(&proxy->pci_dev, proxy->nvectors, 1, 4096)) { error_report("unable to init msix vectors to %" PRIu32, proxy->nvectors); proxy->nvectors = 0; diff --git a/include/hw/pci/msix.h b/include/hw/pci/msix.h index 954d82b..43edebc 100644 --- a/include/hw/pci/msix.h +++ b/include/hw/pci/msix.h @@ -11,7 +11,7 @@ int msix_init(PCIDevice *dev, unsigned short nentries, unsigned table_offset, MemoryRegion *pba_bar, uint8_t pba_bar_nr, unsigned pba_offset, uint8_t cap_pos); int msix_init_exclusive_bar(PCIDevice *dev, unsigned short nentries, -uint8_t bar_nr); +uint8_t bar_nr, uint32_t bar_size); void msix_write_config(PCIDevice *dev, uint32_t address, uint32_t val, int len); -- 2.1.0
[Qemu-devel] [PATCH V4 05/19] monitor: check return value of qemu_find_net_clients_except()
qemu_find_net_clients_except() may return a value which is greater than the size of array we provided. So we should check this value before using it, otherwise this may cause unexpected memory access. This patch fixes the net related command completion when we have a virtio-net nic with more than 255 queues. Cc: Luiz Capitulino Signed-off-by: Jason Wang --- monitor.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/monitor.c b/monitor.c index 07dfed0..3c0abfd 100644 --- a/monitor.c +++ b/monitor.c @@ -4480,7 +4480,7 @@ void set_link_completion(ReadLineState *rs, int nb_args, const char *str) count = qemu_find_net_clients_except(NULL, ncs, NET_CLIENT_OPTIONS_KIND_NONE, MAX_QUEUE_NUM); -for (i = 0; i < count; i++) { +for (i = 0; i < MIN(count, MAX_QUEUE_NUM); i++) { const char *name = ncs[i]->name; if (!strncmp(str, name, len)) { readline_add_completion(rs, name); @@ -4505,7 +4505,7 @@ void netdev_del_completion(ReadLineState *rs, int nb_args, const char *str) readline_set_completion_index(rs, len); count = qemu_find_net_clients_except(NULL, ncs, NET_CLIENT_OPTIONS_KIND_NIC, MAX_QUEUE_NUM); -for (i = 0; i < count; i++) { +for (i = 0; i < MIN(count, MAX_QUEUE_NUM); i++) { QemuOpts *opts; const char *name = ncs[i]->name; if (strncmp(str, name, len)) { @@ -4579,7 +4579,7 @@ void host_net_remove_completion(ReadLineState *rs, int nb_args, const char *str) count = qemu_find_net_clients_except(NULL, ncs, NET_CLIENT_OPTIONS_KIND_NONE, MAX_QUEUE_NUM); -for (i = 0; i < count; i++) { +for (i = 0; i < MIN(count, MAX_QUEUE_NUM); i++) { int id; char name[16]; @@ -4596,7 +4596,7 @@ void host_net_remove_completion(ReadLineState *rs, int nb_args, const char *str) count = qemu_find_net_clients_except(NULL, ncs, NET_CLIENT_OPTIONS_KIND_NIC, MAX_QUEUE_NUM); -for (i = 0; i < count; i++) { +for (i = 0; i < MIN(count, MAX_QUEUE_NUM); i++) { int id; const char *name; -- 2.1.0
[Qemu-devel] [PATCH V4 07/19] virtio-net: validate backend queue numbers against bus limitation
We don't validate the backend queue numbers against bus limitation, this will easily crash qemu if it exceeds the limitation. Fixing this by doing the validation and fail early. Cc: Michael S. Tsirkin Signed-off-by: Jason Wang --- hw/net/virtio-net.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index 27adcc5..59f76bc 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -1588,6 +1588,13 @@ static void virtio_net_device_realize(DeviceState *dev, Error **errp) virtio_init(vdev, "virtio-net", VIRTIO_ID_NET, n->config_size); n->max_queues = MAX(n->nic_conf.peers.queues, 1); +if (n->max_queues * 2 + 1 > VIRTIO_PCI_QUEUE_MAX) { +error_setg(errp, "Invalid number of queues (= %" PRIu32 "), " + "must be a postive integer less than %d.", + n->max_queues, (VIRTIO_PCI_QUEUE_MAX - 1) / 2); +virtio_cleanup(vdev); +return; +} n->vqs = g_malloc0(sizeof(VirtIONetQueue) * n->max_queues); n->vqs[0].rx_vq = virtio_add_queue(vdev, 256, virtio_net_handle_rx); n->curr_queues = 1; -- 2.1.0
[Qemu-devel] [PATCH V4 14/19] virtio: introduce vector to virtqueues mapping
Currently we will try to traverse all virtqueues to find a subset that using a specific vector. This is sub optimal when we will support hundreds or even thousands of virtqueues. So this patch introduces a method which could be used by transport to get all virtqueues that using a same vector. This is done through QLISTs and the number of QLISTs was queried through a transport specific method. When guest setting vectors, the virtqueue will be linked and helpers for traverse the list was also introduced. The first user will be virtio pci which will use this to speed up MSI-X masking and unmasking handling. Cc: Michael S. Tsirkin Signed-off-by: Jason Wang --- hw/virtio/virtio-pci.c | 8 hw/virtio/virtio.c | 32 ++-- include/hw/virtio/virtio-bus.h | 1 + include/hw/virtio/virtio.h | 3 +++ 4 files changed, 42 insertions(+), 2 deletions(-) diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c index e556919..c38f33f 100644 --- a/hw/virtio/virtio-pci.c +++ b/hw/virtio/virtio-pci.c @@ -910,6 +910,13 @@ static const TypeInfo virtio_9p_pci_info = { * virtio-pci: This is the PCIDevice which has a virtio-pci-bus. */ +static int virtio_pci_query_nvectors(DeviceState *d) +{ +VirtIOPCIProxy *proxy = VIRTIO_PCI(d); + +return proxy->nvectors; +} + /* This is called by virtio-bus just after the device is plugged. */ static void virtio_pci_device_plugged(DeviceState *d) { @@ -1500,6 +1507,7 @@ static void virtio_pci_bus_class_init(ObjectClass *klass, void *data) k->vmstate_change = virtio_pci_vmstate_change; k->device_plugged = virtio_pci_device_plugged; k->device_unplugged = virtio_pci_device_unplugged; +k->query_nvectors = virtio_pci_query_nvectors; k->queue_max = VIRTIO_PCI_QUEUE_MAX; } diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c index 023eb04..ca157e8 100644 --- a/hw/virtio/virtio.c +++ b/hw/virtio/virtio.c @@ -89,6 +89,7 @@ struct VirtQueue VirtIODevice *vdev; EventNotifier guest_notifier; EventNotifier host_notifier; +QLIST_ENTRY(VirtQueue) node; }; /* virt queue functions */ @@ -613,7 +614,7 @@ void virtio_reset(void *opaque) vdev->vq[i].vring.used = 0; vdev->vq[i].last_avail_idx = 0; vdev->vq[i].pa = 0; -vdev->vq[i].vector = VIRTIO_NO_VECTOR; +virtio_queue_set_vector(vdev, i, VIRTIO_NO_VECTOR); vdev->vq[i].signalled_used = 0; vdev->vq[i].signalled_used_valid = false; vdev->vq[i].notification = true; @@ -738,6 +739,16 @@ void virtio_queue_set_num(VirtIODevice *vdev, int n, int num) virtqueue_init(&vdev->vq[n]); } +VirtQueue *virtio_vector_first_queue(VirtIODevice *vdev, uint16_t vector) +{ +return QLIST_FIRST(&vdev->vector_queues[vector]); +} + +VirtQueue *virtio_vector_next_queue(VirtQueue *vq) +{ +return QLIST_NEXT(vq, node); +} + int virtio_queue_get_num(VirtIODevice *vdev, int n) { return vdev->vq[n].vring.num; @@ -788,8 +799,17 @@ uint16_t virtio_queue_vector(VirtIODevice *vdev, int n) void virtio_queue_set_vector(VirtIODevice *vdev, int n, uint16_t vector) { -if (n < virtio_get_queue_max(vdev)) +VirtQueue *vq = &vdev->vq[n]; + +if (n < virtio_get_queue_max(vdev)) { +if (vdev->vq[n].vector != VIRTIO_NO_VECTOR) { +QLIST_REMOVE(vq, node); +} vdev->vq[n].vector = vector; +if (vector != VIRTIO_NO_VECTOR) { +QLIST_INSERT_HEAD(&vdev->vector_queues[vector], vq, node); +} +} } VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size, @@ -1097,6 +1117,7 @@ void virtio_cleanup(VirtIODevice *vdev) qemu_del_vm_change_state_handler(vdev->vmstate); g_free(vdev->config); g_free(vdev->vq); +g_free(vdev->vector_queues); } static void virtio_vmstate_change(void *opaque, int running, RunState state) @@ -1134,7 +1155,14 @@ void virtio_instance_init_common(Object *proxy_obj, void *data, void virtio_init(VirtIODevice *vdev, const char *name, uint16_t device_id, size_t config_size) { +BusState *qbus = qdev_get_parent_bus(DEVICE(vdev)); +VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus); int i, queue_max = virtio_get_queue_max(vdev); +int nvectors = k->query_nvectors ? +k->query_nvectors(qbus->parent) : queue_max; + +vdev->vector_queues = g_malloc0(sizeof(*vdev->vector_queues) * nvectors); + vdev->device_id = device_id; vdev->status = 0; vdev->isr = 0; diff --git a/include/hw/virtio/virtio-bus.h b/include/hw/virtio/virtio-bus.h index 4da8022..12386d5 100644 --- a/include/hw/virtio/virtio-bus.h +++ b/include/hw/virtio/virtio-bus.h @@ -62,6 +62,7 @@ typedef struct VirtioBusClass { * This is called by virtio-bus just before the device is unplugged. */ void (*device_unplugged)(DeviceState *d); +int (*query_nvectors)(DeviceState *d); /* * Does the transport have variable vring alignment?
[Qemu-devel] [PATCH V4 11/19] virtio-s390: switch to bus specific queue limit
Instead of depending on marco, switch to use a bus specific queue limit. Cc: Alexander Graf Cc: Richard Henderson Cc: Christian Borntraeger Cc: Cornelia Huck Signed-off-by: Jason Wang --- hw/s390x/s390-virtio-bus.c | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/hw/s390x/s390-virtio-bus.c b/hw/s390x/s390-virtio-bus.c index 2b41e32..9d6028d 100644 --- a/hw/s390x/s390-virtio-bus.c +++ b/hw/s390x/s390-virtio-bus.c @@ -45,6 +45,8 @@ do { } while (0) #endif +#define VIRTIO_S390_QUEUE_MAX 64 + static void virtio_s390_bus_new(VirtioBusState *bus, size_t bus_size, VirtIOS390Device *dev); @@ -344,7 +346,7 @@ static ram_addr_t s390_virtio_device_num_vq(VirtIOS390Device *dev) VirtIODevice *vdev = dev->vdev; int num_vq; -for (num_vq = 0; num_vq < VIRTIO_PCI_QUEUE_MAX; num_vq++) { +for (num_vq = 0; num_vq < virtio_get_queue_max(vdev); num_vq++) { if (!virtio_queue_get_num(vdev, num_vq)) { break; } @@ -446,7 +448,7 @@ VirtIOS390Device *s390_virtio_bus_find_vring(VirtIOS390Bus *bus, QTAILQ_FOREACH(kid, &bus->bus.children, sibling) { VirtIOS390Device *dev = (VirtIOS390Device *)kid->child; -for(i = 0; i < VIRTIO_PCI_QUEUE_MAX; i++) { +for (i = 0; i < virtio_get_queue_max(dev->vdev); i++) { if (!virtio_queue_get_addr(dev->vdev, i)) break; if (virtio_queue_get_addr(dev->vdev, i) == mem) { @@ -715,7 +717,7 @@ static void virtio_s390_bus_class_init(ObjectClass *klass, void *data) bus_class->max_dev = 1; k->notify = virtio_s390_notify; k->get_features = virtio_s390_get_features; -k->queue_max = VIRTIO_PCI_QUEUE_MAX; +k->queue_max = VIRTIO_S390_QUEUE_MAX; } static const TypeInfo virtio_s390_bus_info = { -- 2.1.0
Re: [Qemu-devel] RFC: -object vs -chardev creation order
"Daniel P. Berrange" writes: > The current QEMU startup code will create -chardev backends first, then > create -object backends, then -fsdev backends and so on, in some pretty > arbitrary order of types. > > There is already a dependancy from the rng-egd object type, which has a > link to a chardev, which requires -chardev options be processed before > -object options. > > In my work to introduce an object type for TLS credentials though, we > encounter exactly the opposite dependancy. A -chardev option will > require that -object options be processed first, so that the TLS > credantials can be created. > > This is a no-win situation. Both orders of creation are wrong for some > set of use cases. This is only going to get worse over time, as more > types of user creatable objects are defined. > > The application invoking QEMU could easily solve it, by putting the > args on the command line in the order in which they need to processed, > but then QEMU throws away this ordering information and creates them > based on type of backend. Yes, and that's stupid. > I'm unclear on whether anyone has thought about ways to solve this yet, > but it is a blocker for my work now, so I need to figure out some kind > of solution asap, even if only a temporary workaround. > > One option is to convert the -chardev backends so they become user > creatable objects, and thus -object can obsolete use of -chardev > for these. This sounds simple, but I fear it could be a rather large > yak shaving exercise to get that done. > > Another idea is to perhaps change the way options are stored, so that > instead of having a separate QemuOptsList for each type of option, have > one single list and store the type with each entry in the list. Then we > can iterate over the single list creating things in exactly the same > order that they were passed on the command line. It's how we should've done it, but: > I fear this may cause > regressions for apps using QEMU though - eg if some app is placing > '-object rng-egd' in the args, before the -chardev it would currently > work, but if we switch to honour ordering it would fail. We could > introduce a flag -strict-ordering but that feels gross in itself. Yes. I'm afraid we'll have to break our byzantine command line at some point... I'd rather not break it now, though. > A third option is to not process -object args in one go, instead process > each type of object at a time. eg we'd first create all the > -object tls-credential instances, then create the -chardev instances, > then create the -object rng-egd instances. This is probably the least > amount of work in short term, but not all that scalable, unless we do > a catch-all default case, so we only need code up hacks for a few > particular object types. > > Thus my gut feeling is to do option 3, but I'd like other opinions before > embarking on this Piling yet another hack on top isn't great, put telling you "no more hacks" after everybody and his dog already got to pile on some doesn't feel fair. That said: can we sort the creation work topologically? Requires identifying references. In the QemuOpts world, a reference is a string-valued parameter that is interpreted as ID in a well-known QemuOptsList. Adding the necessary information to QemuOptDesc desc[] shouldn't be too hard. The ones with empty desc[] could be troublesome, as usual.
[Qemu-devel] [PATCH V4 10/19] virtio-ccw: introduce ccw specific queue limit
Instead of depending on marco, using a bus specific limit. Cc: Alexander Graf Cc: Cornelia Huck Cc: Christian Borntraeger Cc: Richard Henderson Signed-off-by: Jason Wang --- hw/s390x/s390-virtio-ccw.c | 7 +-- hw/s390x/virtio-ccw.c| 19 --- include/hw/s390x/s390_flic.h | 4 +++- 3 files changed, 20 insertions(+), 10 deletions(-) diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c index afb539a..96b84ab 100644 --- a/hw/s390x/s390-virtio-ccw.c +++ b/hw/s390x/s390-virtio-ccw.c @@ -55,6 +55,7 @@ void io_subsystem_reset(void) static int virtio_ccw_hcall_notify(const uint64_t *args) { +VirtIODevice *vdev; uint64_t subch_id = args[0]; uint64_t queue = args[1]; SubchDev *sch; @@ -67,10 +68,12 @@ static int virtio_ccw_hcall_notify(const uint64_t *args) if (!sch || !css_subch_visible(sch)) { return -EINVAL; } -if (queue >= VIRTIO_PCI_QUEUE_MAX) { + +vdev = virtio_ccw_get_vdev(sch); +if (queue >= virtio_get_queue_max(vdev)) { return -EINVAL; } -virtio_queue_notify(virtio_ccw_get_vdev(sch), queue); +virtio_queue_notify(vdev, queue); return 0; } diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c index dfe6ded..935b880 100644 --- a/hw/s390x/virtio-ccw.c +++ b/hw/s390x/virtio-ccw.c @@ -32,6 +32,8 @@ static QTAILQ_HEAD(, IndAddr) indicator_addresses = QTAILQ_HEAD_INITIALIZER(indicator_addresses); +#define VIRTIO_CCW_QUEUE_MAX ADAPTER_ROUTES_MAX_GSI + static IndAddr *get_indicator(hwaddr ind_addr, int len) { IndAddr *indicator; @@ -170,7 +172,7 @@ static void virtio_ccw_start_ioeventfd(VirtioCcwDevice *dev) return; } vdev = virtio_bus_get_device(&dev->bus); -for (n = 0; n < VIRTIO_PCI_QUEUE_MAX; n++) { +for (n = 0; n < virtio_get_queue_max(vdev); n++) { if (!virtio_queue_get_num(vdev, n)) { continue; } @@ -205,7 +207,7 @@ static void virtio_ccw_stop_ioeventfd(VirtioCcwDevice *dev) return; } vdev = virtio_bus_get_device(&dev->bus); -for (n = 0; n < VIRTIO_PCI_QUEUE_MAX; n++) { +for (n = 0; n < virtio_get_queue_max(vdev); n++) { if (!virtio_queue_get_num(vdev, n)) { continue; } @@ -265,8 +267,9 @@ static int virtio_ccw_set_vqs(SubchDev *sch, uint64_t addr, uint32_t align, uint16_t index, uint16_t num) { VirtIODevice *vdev = virtio_ccw_get_vdev(sch); +int queue_max = virtio_get_queue_max(vdev); -if (index > VIRTIO_PCI_QUEUE_MAX) { +if (index > queue_max) { return -EINVAL; } @@ -291,7 +294,7 @@ static int virtio_ccw_set_vqs(SubchDev *sch, uint64_t addr, uint32_t align, virtio_queue_set_vector(vdev, index, index); } /* tell notify handler in case of config change */ -vdev->config_vector = VIRTIO_PCI_QUEUE_MAX; +vdev->config_vector = queue_max; return 0; } @@ -1036,14 +1039,16 @@ static uint8_t virtio_set_ind_atomic(SubchDev *sch, uint64_t ind_loc, static void virtio_ccw_notify(DeviceState *d, uint16_t vector) { VirtioCcwDevice *dev = to_virtio_ccw_dev_fast(d); +VirtIODevice *vdev = virtio_bus_get_device(&dev->bus); SubchDev *sch = dev->sch; uint64_t indicators; -if (vector >= 128) { +/* queue indicators + secondary indicators */ +if (vector >= virtio_get_queue_max(vdev) + 64) { return; } -if (vector < VIRTIO_PCI_QUEUE_MAX) { +if (vector < virtio_get_queue_max(vdev)) { if (!dev->indicators) { return; } @@ -1711,7 +1716,7 @@ static void virtio_ccw_bus_class_init(ObjectClass *klass, void *data) k->load_queue = virtio_ccw_load_queue; k->save_config = virtio_ccw_save_config; k->load_config = virtio_ccw_load_config; -k->queue_max = VIRTIO_PCI_QUEUE_MAX; +k->queue_max = VIRTIO_CCW_QUEUE_MAX; } static const TypeInfo virtio_ccw_bus_info = { diff --git a/include/hw/s390x/s390_flic.h b/include/hw/s390x/s390_flic.h index 489d73b..e1b0389 100644 --- a/include/hw/s390x/s390_flic.h +++ b/include/hw/s390x/s390_flic.h @@ -17,10 +17,12 @@ #include "hw/s390x/adapter.h" #include "hw/virtio/virtio.h" +#define ADAPTER_ROUTES_MAX_GSI 64 + typedef struct AdapterRoutes { AdapterInfo adapter; int num_routes; -int gsi[VIRTIO_PCI_QUEUE_MAX]; +int gsi[ADAPTER_ROUTES_MAX_GSI]; } AdapterRoutes; #define TYPE_S390_FLIC_COMMON "s390-flic" -- 2.1.0
[Qemu-devel] [PATCH V4 12/19] virtio-mmio: switch to bus specific queue limit
Cc: Michael S. Tsirkin Signed-off-by: Jason Wang --- hw/virtio/virtio-mmio.c | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/hw/virtio/virtio-mmio.c b/hw/virtio/virtio-mmio.c index 2ae6942..dbd44b6 100644 --- a/hw/virtio/virtio-mmio.c +++ b/hw/virtio/virtio-mmio.c @@ -34,6 +34,8 @@ do { printf("virtio_mmio: " fmt , ## __VA_ARGS__); } while (0) #define DPRINTF(fmt, ...) do {} while (0) #endif +#define VIRTIO_MMIO_QUEUE_MAX 64 + /* QOM macros */ /* virtio-mmio-bus */ #define TYPE_VIRTIO_MMIO_BUS "virtio-mmio-bus" @@ -237,7 +239,7 @@ static void virtio_mmio_write(void *opaque, hwaddr offset, uint64_t value, proxy->guest_page_shift); break; case VIRTIO_MMIO_QUEUESEL: -if (value < VIRTIO_PCI_QUEUE_MAX) { +if (value < virtio_get_queue_max(vdev)) { vdev->queue_sel = value; } break; @@ -257,7 +259,7 @@ static void virtio_mmio_write(void *opaque, hwaddr offset, uint64_t value, } break; case VIRTIO_MMIO_QUEUENOTIFY: -if (value < VIRTIO_PCI_QUEUE_MAX) { +if (value < virtio_get_queue_max(vdev)) { virtio_queue_notify(vdev, value); } break; @@ -403,7 +405,7 @@ static void virtio_mmio_bus_class_init(ObjectClass *klass, void *data) k->device_plugged = virtio_mmio_device_plugged; k->has_variable_vring_alignment = true; bus_class->max_dev = 1; -k->queue_max = VIRTIO_PCI_QUEUE_MAX; +k->queue_max = VIRTIO_MMIO_QUEUE_MAX; } static const TypeInfo virtio_mmio_bus_info = { -- 2.1.0
[Qemu-devel] [PATCH V4 15/19] virtio: introduce virtio_queue_get_index()
This patch introduces a helper that can get the queue index of a VirtQueue. This is useful when traversing the list of VirtQueues. Cc: Michael S. Tsirkin Signed-off-by: Jason Wang --- hw/virtio/virtio.c | 5 + include/hw/virtio/virtio.h | 1 + 2 files changed, 6 insertions(+) diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c index ca157e8..d4c49ef 100644 --- a/hw/virtio/virtio.c +++ b/hw/virtio/virtio.c @@ -725,6 +725,11 @@ hwaddr virtio_queue_get_addr(VirtIODevice *vdev, int n) return vdev->vq[n].pa; } +int virtio_queue_get_index(VirtIODevice *vdev, VirtQueue *vq) +{ +return vq - vdev->vq; +} + void virtio_queue_set_num(VirtIODevice *vdev, int n, int num) { /* Don't allow guest to flip queue between existent and diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h index e3adb1d..f148590 100644 --- a/include/hw/virtio/virtio.h +++ b/include/hw/virtio/virtio.h @@ -171,6 +171,7 @@ void virtio_config_writew(VirtIODevice *vdev, uint32_t addr, uint32_t data); void virtio_config_writel(VirtIODevice *vdev, uint32_t addr, uint32_t data); void virtio_queue_set_addr(VirtIODevice *vdev, int n, hwaddr addr); hwaddr virtio_queue_get_addr(VirtIODevice *vdev, int n); +int virtio_queue_get_index(VirtIODevice *vdev, VirtQueue *vq); void virtio_queue_set_num(VirtIODevice *vdev, int n, int num); int virtio_queue_get_num(VirtIODevice *vdev, int n); void virtio_queue_set_align(VirtIODevice *vdev, int n, int align); -- 2.1.0
[Qemu-devel] [PATCH V4 17/19] virtio-pci: increase the maximum number of virtqueues to 513
This patch increases the maximum number of virtqueues for pci from 64 to 513. This will allow booting a virtio-net-pci device with 256 queue pairs. To keep migration compatibility, 64 was kept for legacy machine types. This is because qemu in fact allows guest to probe the limit of virtqueues through trying to set queue_sel. So for legacy machine type, we should make sure setting queue_sel to more than 63 won't make effect. Cc: Paolo Bonzini Cc: Richard Henderson Cc: Michael S. Tsirkin Cc: Alexander Graf Cc: qemu-...@nongnu.org Signed-off-by: Jason Wang --- hw/i386/pc_piix.c | 5 + hw/i386/pc_q35.c | 5 + hw/ppc/spapr.c | 5 + hw/virtio/virtio-pci.c | 2 +- 4 files changed, 16 insertions(+), 1 deletion(-) diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c index 175e582..0796719 100644 --- a/hw/i386/pc_piix.c +++ b/hw/i386/pc_piix.c @@ -49,6 +49,7 @@ #include "hw/acpi/acpi.h" #include "cpu.h" #include "qemu/error-report.h" +#include "hw/virtio/virtio-bus.h" #ifdef CONFIG_XEN # include #endif @@ -312,6 +313,10 @@ static void pc_init_pci(MachineState *machine) static void pc_compat_2_3(MachineState *machine) { +ObjectClass *klass = object_class_by_name("virtio-pci-bus"); +VirtioBusClass *k = VIRTIO_BUS_CLASS(klass); + +k->queue_max = 64; } static void pc_compat_2_2(MachineState *machine) diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c index df44d25..a8a34a4 100644 --- a/hw/i386/pc_q35.c +++ b/hw/i386/pc_q35.c @@ -45,6 +45,7 @@ #include "hw/usb.h" #include "hw/cpu/icc_bus.h" #include "qemu/error-report.h" +#include "include/hw/virtio/virtio-bus.h" /* ICH9 AHCI has 6 ports */ #define MAX_SATA_PORTS 6 @@ -291,6 +292,10 @@ static void pc_q35_init(MachineState *machine) static void pc_compat_2_3(MachineState *machine) { +ObjectClass *klass = object_class_by_name("virtio-pci-bus"); +VirtioBusClass *k = VIRTIO_BUS_CLASS(klass); + +k->queue_max = 64; } static void pc_compat_2_2(MachineState *machine) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index 22f4ae4..5f25dd3 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -50,6 +50,7 @@ #include "hw/pci/pci.h" #include "hw/scsi/scsi.h" #include "hw/virtio/virtio-scsi.h" +#include "hw/virtio/virtio-bus.h" #include "exec/address-spaces.h" #include "hw/usb.h" @@ -1805,6 +1806,10 @@ static const TypeInfo spapr_machine_info = { static void spapr_compat_2_3(Object *obj) { +ObjectClass *klass = object_class_by_name("virtio-pci-bus"); +VirtioBusClass *k = VIRTIO_BUS_CLASS(klass); + +k->queue_max = 64; } static void spapr_compat_2_2(Object *obj) diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c index 9a5242a..02e3ce8 100644 --- a/hw/virtio/virtio-pci.c +++ b/hw/virtio/virtio-pci.c @@ -42,7 +42,7 @@ * configuration space */ #define VIRTIO_PCI_CONFIG_SIZE(dev) VIRTIO_PCI_CONFIG_OFF(msix_enabled(dev)) -#define VIRTIO_PCI_QUEUE_MAX 64 +#define VIRTIO_PCI_QUEUE_MAX 513 static void virtio_pci_bus_new(VirtioBusState *bus, size_t bus_size, VirtIOPCIProxy *dev); -- 2.1.0
[Qemu-devel] [PATCH V4 19/19] virtio-pci: introduce auto_msix_bar_size property
Currently we don't support more than 128 MSI-X vectors for a pci devices, trying to use vector=129 for a virtio-net-pci device may get: qemu-system-x86_64: -device virtio-net-pci,netdev=hn0,vectors=129: unable to init msix vectors to 129 This this because the MSI-X bar size were hard-coded as 4096. So this patch introduces boolean auto_msix_bar_size property for virito-pci devices. Enable this will let the device calculate the msix bar size based on the number of MSI-X entries instead of previous 4096 hard-coded limit. This is a must to let virtio-net can up to 256 queues and each queue were associated with a specific MSI-X entry. Cc: Paolo Bonzini Cc: Richard Henderson Cc: Michael S. Tsirkin Cc: Alexander Graf Cc: qemu-...@nongnu.org Signed-off-by: Jason Wang --- hw/i386/pc_piix.c | 8 hw/i386/pc_q35.c | 8 hw/ppc/spapr.c | 11 ++- hw/virtio/virtio-pci.c | 17 +++-- hw/virtio/virtio-pci.h | 3 +++ include/hw/compat.h| 11 +++ 6 files changed, 55 insertions(+), 3 deletions(-) diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c index 0796719..8808500 100644 --- a/hw/i386/pc_piix.c +++ b/hw/i386/pc_piix.c @@ -552,6 +552,10 @@ static QEMUMachine pc_i440fx_machine_v2_3 = { PC_I440FX_2_3_MACHINE_OPTIONS, .name = "pc-i440fx-2.3", .init = pc_init_pci_2_3, +.compat_props = (GlobalProperty[]) { +HW_COMPAT_2_3, +{ /* end of list */ } +}, }; #define PC_I440FX_2_2_MACHINE_OPTIONS PC_I440FX_2_3_MACHINE_OPTIONS @@ -560,6 +564,10 @@ static QEMUMachine pc_i440fx_machine_v2_2 = { PC_I440FX_2_2_MACHINE_OPTIONS, .name = "pc-i440fx-2.2", .init = pc_init_pci_2_2, +.compat_props = (GlobalProperty[]) { +HW_COMPAT_2_2, +{ /* end of list */ } +}, }; #define PC_I440FX_2_1_MACHINE_OPTIONS \ diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c index a8a34a4..4a34349 100644 --- a/hw/i386/pc_q35.c +++ b/hw/i386/pc_q35.c @@ -448,6 +448,10 @@ static QEMUMachine pc_q35_machine_v2_3 = { PC_Q35_2_3_MACHINE_OPTIONS, .name = "pc-q35-2.3", .init = pc_q35_init_2_3, +.compat_props = (GlobalProperty[]) { +HW_COMPAT_2_3, +{ /* end of list */ } +}, }; #define PC_Q35_2_2_MACHINE_OPTIONS PC_Q35_2_3_MACHINE_OPTIONS @@ -456,6 +460,10 @@ static QEMUMachine pc_q35_machine_v2_2 = { PC_Q35_2_2_MACHINE_OPTIONS, .name = "pc-q35-2.2", .init = pc_q35_init_2_2, +.compat_props = (GlobalProperty[]) { +HW_COMPAT_2_2, +{ /* end of list */ } +}, }; #define PC_Q35_2_1_MACHINE_OPTIONS \ diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index 5f25dd3..853a5cc 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -1794,12 +1794,16 @@ static const TypeInfo spapr_machine_info = { }, }; +#define SPAPR_COMPAT_2_3 \ +HW_COMPAT_2_3 + #define SPAPR_COMPAT_2_2 \ +SPAPR_COMPAT_2_3, \ {\ .driver = TYPE_SPAPR_PCI_HOST_BRIDGE,\ .property = "mem_win_size",\ .value= "0x2000",\ -} +} \ #define SPAPR_COMPAT_2_1 \ SPAPR_COMPAT_2_2 @@ -1883,10 +1887,15 @@ static const TypeInfo spapr_machine_2_2_info = { static void spapr_machine_2_3_class_init(ObjectClass *oc, void *data) { +static GlobalProperty compat_props[] = { +SPAPR_COMPAT_2_3, +{ /* end of list */ } +}; MachineClass *mc = MACHINE_CLASS(oc); mc->name = "pseries-2.3"; mc->desc = "pSeries Logical Partition (PAPR compliant) v2.3"; +mc->compat_props = compat_props; } static const TypeInfo spapr_machine_2_3_info = { diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c index 4a5febb..f4cd405 100644 --- a/hw/virtio/virtio-pci.c +++ b/hw/virtio/virtio-pci.c @@ -925,7 +925,7 @@ static void virtio_pci_device_plugged(DeviceState *d) VirtIOPCIProxy *proxy = VIRTIO_PCI(d); VirtioBusState *bus = &proxy->bus; uint8_t *config; -uint32_t size; +uint32_t size, bar_size; config = proxy->pci_dev.config; if (proxy->class_code) { @@ -936,8 +936,19 @@ static void virtio_pci_device_plugged(DeviceState *d) pci_set_word(config + PCI_SUBSYSTEM_ID, virtio_bus_get_vdev_id(bus)); config[PCI_INTERRUPT_PIN] = 1; +if (proxy->flags & VIRTIO_PCI_FLAG_AUTO_MSIX_SIZE) { +bar_size = proxy->nvectors * PCI_MSIX_ENTRY_SIZE * 2; +if (bar_size & (bar_size - 1)) { +bar_size = 1 << qemu_fls(bar_size); +} +} else { +/* For migration compatibility */ +bar_size = 4096; +} + if (proxy->nvectors && -msix_init_exclusive_bar(&proxy->pci_dev, proxy->nvectors, 1, 4096)) { +msix_init_exclusive_bar(&proxy->pci_dev, proxy->nvectors, 1, +bar_size)) { error_report("unable to init msix vectors to %" PRIu32, proxy->nvectors); proxy->nvect
Re: [Qemu-devel] RFC: -object vs -chardev creation order
On Wed, Mar 18, 2015 at 10:43:25AM +0100, Markus Armbruster wrote: > "Daniel P. Berrange" writes: > > A third option is to not process -object args in one go, instead process > > each type of object at a time. eg we'd first create all the > > -object tls-credential instances, then create the -chardev instances, > > then create the -object rng-egd instances. This is probably the least > > amount of work in short term, but not all that scalable, unless we do > > a catch-all default case, so we only need code up hacks for a few > > particular object types. > > > > Thus my gut feeling is to do option 3, but I'd like other opinions before > > embarking on this > > Piling yet another hack on top isn't great, put telling you "no more > hacks" after everybody and his dog already got to pile on some doesn't > feel fair. I think I'll do this right now, since it should be a fast fix. > That said: can we sort the creation work topologically? Requires > identifying references. In the QemuOpts world, a reference is a > string-valued parameter that is interpreted as ID in a well-known > QemuOptsList. Adding the necessary information to QemuOptDesc desc[] > shouldn't be too hard. The ones with empty desc[] could be troublesome, > as usual. This sounds intriguing though - we should have enough info around to do a topological sort - its a shame that properties are only registered at time the object is created, rather than being registered against the classes, as then we wouldn't even need to add more info to the QemuOptsList. I'll have a think about this approach though and see if I can come up with something that works Regards, Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
[Qemu-devel] [PATCH v4 2/3] ehci: fix segfault when hot-unplugging ehci controller
From: Gonglei When hot-unplugging the usb controllers (ehci/uhci), we have to clean all resouce of these devices, involved registered reset handler. Otherwise, it may cause NULL pointer access and/or segmentation fault if we reboot the guest os after hot-unplugging. Let's hook up reset via DeviceClass->reset() and drop the qemu_register_reset() call. Then Qemu will register and unregister the reset handler automatically. Cc: qemu-stable Reported-by: Lidonglin Signed-off-by: Gonglei --- hw/usb/hcd-ehci-pci.c| 10 ++ hw/usb/hcd-ehci-sysbus.c | 10 ++ hw/usb/hcd-ehci.c| 3 +-- hw/usb/hcd-ehci.h| 1 + 4 files changed, 22 insertions(+), 2 deletions(-) diff --git a/hw/usb/hcd-ehci-pci.c b/hw/usb/hcd-ehci-pci.c index 4c80707..7afa5f9 100644 --- a/hw/usb/hcd-ehci-pci.c +++ b/hw/usb/hcd-ehci-pci.c @@ -101,6 +101,15 @@ static void usb_ehci_pci_exit(PCIDevice *dev) } } +static void usb_ehci_pci_reset(DeviceState *dev) +{ +PCIDevice *pci_dev = PCI_DEVICE(dev); +EHCIPCIState *i = PCI_EHCI(pci_dev); +EHCIState *s = &i->ehci; + +ehci_reset(s); +} + static void usb_ehci_pci_write_config(PCIDevice *dev, uint32_t addr, uint32_t val, int l) { @@ -143,6 +152,7 @@ static void ehci_class_init(ObjectClass *klass, void *data) k->config_write = usb_ehci_pci_write_config; dc->vmsd = &vmstate_ehci_pci; dc->props = ehci_pci_properties; +dc->reset = usb_ehci_pci_reset; } static const TypeInfo ehci_pci_type_info = { diff --git a/hw/usb/hcd-ehci-sysbus.c b/hw/usb/hcd-ehci-sysbus.c index 19ed2c2..cd1cc14 100644 --- a/hw/usb/hcd-ehci-sysbus.c +++ b/hw/usb/hcd-ehci-sysbus.c @@ -42,6 +42,15 @@ static void usb_ehci_sysbus_realize(DeviceState *dev, Error **errp) sysbus_init_irq(d, &s->irq); } +static void usb_ehci_sysbus_reset(DeviceState *dev) +{ +SysBusDevice *d = SYS_BUS_DEVICE(dev); +EHCISysBusState *i = SYS_BUS_EHCI(d); +EHCIState *s = &i->ehci; + +ehci_reset(s); +} + static void ehci_sysbus_init(Object *obj) { SysBusDevice *d = SYS_BUS_DEVICE(obj); @@ -70,6 +79,7 @@ static void ehci_sysbus_class_init(ObjectClass *klass, void *data) dc->realize = usb_ehci_sysbus_realize; dc->vmsd = &vmstate_ehci_sysbus; dc->props = ehci_sysbus_properties; +dc->reset = usb_ehci_sysbus_reset; set_bit(DEVICE_CATEGORY_USB, dc->categories); } diff --git a/hw/usb/hcd-ehci.c b/hw/usb/hcd-ehci.c index ccf54b6..2a84f46 100644 --- a/hw/usb/hcd-ehci.c +++ b/hw/usb/hcd-ehci.c @@ -845,7 +845,7 @@ static USBDevice *ehci_find_device(EHCIState *ehci, uint8_t addr) } /* 4.1 host controller initialization */ -static void ehci_reset(void *opaque) +void ehci_reset(void *opaque) { EHCIState *s = opaque; int i; @@ -2471,7 +2471,6 @@ void usb_ehci_realize(EHCIState *s, DeviceState *dev, Error **errp) s->async_bh = qemu_bh_new(ehci_frame_timer, s); s->device = dev; -qemu_register_reset(ehci_reset, s); s->vmstate = qemu_add_vm_change_state_handler(usb_ehci_vm_state_change, s); } diff --git a/hw/usb/hcd-ehci.h b/hw/usb/hcd-ehci.h index 2bc259c..87b240f 100644 --- a/hw/usb/hcd-ehci.h +++ b/hw/usb/hcd-ehci.h @@ -325,6 +325,7 @@ extern const VMStateDescription vmstate_ehci; void usb_ehci_init(EHCIState *s, DeviceState *dev); void usb_ehci_realize(EHCIState *s, DeviceState *dev, Error **errp); void usb_ehci_unrealize(EHCIState *s, DeviceState *dev, Error **errp); +void ehci_reset(void *opaque); #define TYPE_PCI_EHCI "pci-ehci-usb" #define PCI_EHCI(obj) OBJECT_CHECK(EHCIPCIState, (obj), TYPE_PCI_EHCI) -- 1.7.12.4
[Qemu-devel] [PATCH v4 3/3] ohci: fix resource cleanup leak
From: Gonglei When hot-unplugging the usb controllers (ehci/uhci), we have to clean all resouce of these devices, involved registered reset handler. Otherwise, it may cause NULL pointer access and/or segmentation fault if we reboot the guest os after hot-unplugging. Let's hook up reset via DeviceClass->reset() and drop the qemu_register_reset() call. Then Qemu will register and unregister the reset handler automatically. Ohci does't support hotplugging/hotunplugging yet, but existing resource cleanup leak logic likes ehci/uhci. Cc: qemu-stable Signed-off-by: Gonglei --- hw/usb/hcd-ohci.c | 20 +++- 1 file changed, 19 insertions(+), 1 deletion(-) diff --git a/hw/usb/hcd-ohci.c b/hw/usb/hcd-ohci.c index a0d478e..ca7aeb3 100644 --- a/hw/usb/hcd-ohci.c +++ b/hw/usb/hcd-ohci.c @@ -1879,7 +1879,6 @@ static int usb_ohci_init(OHCIState *ohci, DeviceState *dev, usb_packet_init(&ohci->usb_packet); ohci->async_td = 0; -qemu_register_reset(ohci_reset, ohci); return 0; } @@ -1951,6 +1950,15 @@ static void usb_ohci_exit(PCIDevice *dev) } } +static void usb_ohci_reset_pci(DeviceState *d) +{ +PCIDevice *dev = PCI_DEVICE(d); +OHCIPCIState *ohci = PCI_OHCI(dev); +OHCIState *s = &ohci->state; + +ohci_reset(s); +} + #define TYPE_SYSBUS_OHCI "sysbus-ohci" #define SYSBUS_OHCI(obj) OBJECT_CHECK(OHCISysBusState, (obj), TYPE_SYSBUS_OHCI) @@ -1976,6 +1984,14 @@ static void ohci_realize_pxa(DeviceState *dev, Error **errp) sysbus_init_mmio(sbd, &s->ohci.mem); } +static void usb_ohci_reset_sysbus(DeviceState *dev) +{ +OHCISysBusState *s = SYSBUS_OHCI(dev); +OHCIState *ohci = &s->ohci; + +ohci_reset(ohci); +} + static Property ohci_pci_properties[] = { DEFINE_PROP_STRING("masterbus", OHCIPCIState, masterbus), DEFINE_PROP_UINT32("num-ports", OHCIPCIState, num_ports, 3), @@ -2097,6 +2113,7 @@ static void ohci_pci_class_init(ObjectClass *klass, void *data) dc->props = ohci_pci_properties; dc->hotpluggable = false; dc->vmsd = &vmstate_ohci; +dc->reset = usb_ohci_reset_pci; } static const TypeInfo ohci_pci_info = { @@ -2120,6 +2137,7 @@ static void ohci_sysbus_class_init(ObjectClass *klass, void *data) set_bit(DEVICE_CATEGORY_USB, dc->categories); dc->desc = "OHCI USB Controller"; dc->props = ohci_sysbus_properties; +dc->reset = usb_ohci_reset_sysbus; } static const TypeInfo ohci_sysbus_info = { -- 1.7.12.4
Re: [Qemu-devel] [v7][PATCH 03/10] piix: create host bridge to passthrough
On Mi, 2015-03-18 at 17:06 +0800, Tiejun Chen wrote: > Implement a pci host bridge specific to passthrough. Actually > this just inherits the standard one. And we also just expose > a minimal real host bridge pci configuration subset. > +/* Here we just expose minimal host bridge offset subset. */ > +static const IGDHostInfo igd_host_bridge_infos[] = { > +{0x08, 2}, /* revision id */ > +{0x2c, 2}, /* sybsystem vendor id */ > +{0x2e, 2}, /* sybsystem id */ > +{0x50, 2}, /* SNB: processor graphics control register */ > +{0x52, 2}, /* processor graphics control register */ > +{0xa4, 4}, /* SNB: graphics base of stolen memory */ > +{0xa8, 4}, /* SNB: base of GTT stolen memory */ > +}; Hmm, no vendor/device id here? Will the idg guest drivers happily read graphics control registers from i440fx even though this chipset never existed in a igd variant? [ just wondering, if it works that way fine, it certainly makes things easier for the firmware which uses host bridge pci id to figure whenever it is running @ i440fx or q35 ]. > +static int igd_pt_i440fx_initfn(struct PCIDevice *pci_dev) > +{ > +uint32_t val = 0; > +int rc, i, num; > +int pos, len; > + > +num = ARRAY_SIZE(igd_host_bridge_infos); > +for (i = 0; i < num; i++) { > +pos = igd_host_bridge_infos[i].offset; > +len = igd_host_bridge_infos[i].len; > +rc = host_pci_config_read(pos, len, val); > +if (rc) { > +return -ENODEV; > +} > +pci_default_write_config(pci_dev, pos, val, len); > +} > + > +return 0; > +} Nothing i440fx specific in here, just copying some host bridge pci config space bits. I guess we can sub-class the q35 host bridge the same way and even reuse the init function? > +#define TYPE_IGD_PASSTHROUGH_I440FX_PCI_DEVICE "xen-igd-passthrough-i440FX" One xen leftover slipped through here. cheers, Gerd
Re: [Qemu-devel] [RFC 0/5] Memory transaction attributes API
On 18 March 2015 at 08:38, Edgar E. Iglesias wrote: > 2. The series introduces the read and write _with_attrs. Another > approach could be to add a (for example) single .access callback. > The callback could take a structure pointer as input, e.g: > > struct MemTx { > bool rw; > hwaddr addr; > int size; > uint64_t data; > MemTxAttrs attrs; > } > > MemTxResult access(MemTx *tx) > > The benefit I see is that this will make it easier for us in the > future to add new fields if needed. I see the reasoning, but I looked at this and it just felt awkward when I tried writing the code that way -- the code calling the access function ends up having to create and mutate this MemTx struct as it loops round calling the access callback. In particular, memory.c has kept the read and write paths separate all the way through, so squashing them back together into a single callback which will then immediately want to split them back out into read vs write seemed a bit daft. -- PMM
Re: [Qemu-devel] [v2][PATCH 2/2] libxl: introduce gfx_passthru_kind
On Wed, 2015-03-18 at 15:32 +0800, Chen, Tiejun wrote: > > I think the _libxl_ message needs to be just "Unable to detect graphics > > passthru kind". i.e. it can't/shouldn't reference anything to do with xl > > config options etc (which would make no sense if libvirt was being > > used). > > > > That's not very user friendly though, so you may want to consider adding > > a new specific error code for this case and returning it here, such that > > xl or libvirt can then give a more comprehensible error message. > > But, > args = libxl__build_device_model_args(gc, "stubdom-dm", guest_domid, >guest_config, d_state, NULL); > | > + return libxl__build_device_model_args_new(gc, dm,guest_domid, > guest_config, state, dm_state_fd); > | > + Currently we always return "NULL" inside. > > if (!args) { > ret = ERROR_FAIL; > goto out; > } > > So I'm not very sure how to pass this new specific error code to xl/libvirt. Bah, yes. I don't think it is reasonable to ask you to rework the error handling here completely for this. Lets leave this as a potential future enhancement. > Thanks but I guess I'd better to paste this whole patch to avoid I'm > still missing something :) > > --- > tools/libxl/libxl.h | 6 ++ > tools/libxl/libxl_dm.c | 25 ++--- > tools/libxl/libxl_internal.h | 5 + > tools/libxl/libxl_types.idl | 6 ++ > tools/libxl/xl_cmdimpl.c | 14 -- > 5 files changed, 51 insertions(+), 5 deletions(-) > > diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h > index 6bbc52d..62b3ae5 100644 > --- a/tools/libxl/libxl.h > +++ b/tools/libxl/libxl.h > @@ -720,6 +720,12 @@ void libxl_mac_copy(libxl_ctx *ctx, libxl_mac *dst, > libxl_mac *src); > #define LIBXL_HAVE_PSR_MBM 1 > #endif > > +/* > + * libxl_domain_build_info has the u.hvm.gfx_passthru_kind field and > + * the libxl_gfx_passthru_kind enumeration is defined. > +*/ > +#define LIBXL_HAVE_GFX_PASSTHRU_KIND > + > typedef char **libxl_string_list; > void libxl_string_list_dispose(libxl_string_list *sl); > int libxl_string_list_length(const libxl_string_list *sl); > diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c > index 8599a6a..045d48a 100644 > --- a/tools/libxl/libxl_dm.c > +++ b/tools/libxl/libxl_dm.c > @@ -710,9 +710,6 @@ static char ** > libxl__build_device_model_args_new(libxl__gc *gc, > flexarray_append(dm_args, "-net"); > flexarray_append(dm_args, "none"); > } > -if (libxl_defbool_val(b_info->u.hvm.gfx_passthru)) { > -flexarray_append(dm_args, "-gfx_passthru"); > -} > } else { > if (!sdl && !vnc) { > flexarray_append(dm_args, "-nographic"); > @@ -757,6 +754,28 @@ static char ** > libxl__build_device_model_args_new(libxl__gc *gc, > machinearg, max_ram_below_4g); > } > } > + > +if (libxl_defbool_val(b_info->u.hvm.gfx_passthru)) { > +switch (b_info->u.hvm.gfx_passthru_kind) { > +case _info->u.hvm.gfx_passthru_kind: > +if (libxl__is_igd_vga_passthru(gc, guest_config)) { > +machinearg = GCSPRINTF("%s,igd-passthru=on", > machinearg); > +} else { > +LOG(ERROR, "Unable to detect graphics passthru kind," > +" please set gfx_passthru_kind. See xl.cfg(5) > for more" > +" information.\n"); Please change this to "Unable to detect graphics passthru kind" to avoid talking xl-isms in libxl. > +return NULL; > +} > +break; > +case LIBXL_GFX_PASSTHRU_KIND_IGD: > +machinearg = GCSPRINTF("%s,igd-passthru=on", machinearg); > +break; This duplicates the code from above. I think this would be best done as: static int libxl__detect_gfx_passthru_kind(libxl__gc *gc, guest_config) { if (b_info->u.hvm.gfx_passthru_kind != LIBXL_GFX_PASSTHRU_KIND_DEFAULT) return 0; if (libxl__is_igd_vga_passthru(gc, guest_config)) { b_info->u.hvm.gfx_passthru_kind = LIBXL_GFX_PASSTHRU_KIND_IGD; return 0; } LOG(ERROR, "Unable to detect graphics passthru kind"); return 1; } Then for the code in libxl__build_device_model_args_new: if (libxl_defbool_val(b_info->u.hvm.gfx_passthru)) { if (!libxl__detect_gfx_passthru_kind(gc, guest_config)) return NULL switch (b_info->u.hvm.gfx_passthru_kind) { case LIBXL_GFX_PASSTHRU_KIND_IGD: machinearg = GCSPRINTF("%s,igd-passthru=on", machinearg); break; default: LOG(ERROR, "unknown gfx_passthru_kind\n"); return NULL; } } Tha
Re: [Qemu-devel] [PATCH for-2.3] numa: pc: fix default VCPU to node mapping
On Tue, 17 Mar 2015 13:42:36 -0300 Eduardo Habkost wrote: > On Tue, Mar 17, 2015 at 03:48:38PM +, Igor Mammedov wrote: > > since commit > >dd0247e0 pc: acpi: mark all possible CPUs as enabled in SRAT > > Linux kernel actually tries to use CPU to Node mapping from > > QEMU provided SRAT table instead of discarding it, and that > > in some cases breaks build_sched_domains() which expects > > sane mapping where cores/threads belonging to the same socket > > are on the same NUMA node. > > > > With current default round-robin mapping of VCPUs to nodes > > guest ends-up with cores/threads belonging to the same socket > > being on different NUMA nodes. > > > > For example with following CLI: > > qemu-kvm -m 4G -smp 5,sockets=1,cores=4,threads=1,maxcpus=8 \ > > -numa node,nodeid=0 -numa node,nodeid=1 > > 2.6.32 based kernels will hang on boot due to incorrectly build > > sched_group-s list in update_sd_lb_stats() > > so comment in QEMU justifying dumb default mapping: > > " > > guest OSes must cope with this anyway, because there are BIOSes > > out there in real machines which also use this scheme. > > " > > isn't really valid. > > > > Replacing default mapping withi a manual, where VCPUs belonging to > > the same socket are on the same NUMA node, fixes issue for > > guests which can't handle nonsense topology i.e. cnaging CLI to: > > -numa node,nodeid=0,cpus=0-3 -numa node,nodeid=1,cpus=4-7 > > > > So instead of simply scattering VCPUs around nodes, map > > the same socket VCPUs to the same NUMA node, which is what > > guest would expect from a sane hardware/BIOS. > > > > Signed-off-by: Igor Mammedov > > I believe the proposed behavior is much better. But if we are going to > break compatibility, shouldn't we at least do that before the first -rc > so we get feedback in case it break existing configurations? Aren't existing setups that actually care about node mapping typically do not use default mapping in favor of manually specifying which CPUs go to which node? BTW what it would break on already broken configuration? My thinking on this was that we prefer to not keep bugs in ACPI tables and fix them instead, since it doesn't affect currently running (migrating) guest and allows restarted guest use fixed ACPI tables. Considering that CPU info from SRAT is actually used only since 2.2, I'd prefer to fix it in 2.3 and 2.2 stable then allow default buggy SRAT spread over several releases. > > About qemu_cpu_socket_id_from_index(): all qemu-system-* binaries have > smp_cores and smp_threads available (even if machines ignore it), but > the default stub can return values that are larger than the number of > sockets if smp_cores*smp_threads > 1, which would be obviously > incorrect. Isn't it easier to simply make > "cpu_index/(smp_cores*smp_sockets)" be the default cpu_index->socket > mapping function, and allow machine-specific (not arch-specific) > overrides if necessary? Since it's hard-freeze now, patch focuses on fixing PC machine (which I can test) and do not affecting other targets (i.e. keep legacy behavior for them) that's why stub returns cpu_index. So I'd prefer hold off global "cpu_index/(smp_cores*smp_sockets)" change for now. So would something like below acceptable 2.3? node_id = cpu_index % nb_numa_nodes; if (machine_class->socket_from_cpu_index) { node_id = machine_class->socket_from_cpu_index(cpu_index) % nb_numa_nodes; } > > > > --- > > include/sysemu/cpus.h | 3 +++ > > numa.c| 14 ++ > > stubs/Makefile.objs | 1 + > > stubs/qemu_cpu_socket_id_from_index.c | 6 ++ > > target-i386/cpu.c | 11 +++ > > 5 files changed, 31 insertions(+), 4 deletions(-) > > create mode 100644 stubs/qemu_cpu_socket_id_from_index.c > > > > diff --git a/include/sysemu/cpus.h b/include/sysemu/cpus.h > > index 3f162a9..aacabcb 100644 > > --- a/include/sysemu/cpus.h > > +++ b/include/sysemu/cpus.h > > @@ -1,5 +1,6 @@ > > #ifndef QEMU_CPUS_H > > #define QEMU_CPUS_H > > +#include "qemu-common.h" > > > > /* cpus.c */ > > void qemu_init_cpu_loop(void); > > @@ -18,6 +19,8 @@ void qtest_clock_warp(int64_t dest); > > /* vl.c */ > > extern int smp_cores; > > extern int smp_threads; > > + > > +unsigned qemu_cpu_socket_id_from_index(unsigned int cpu_index); > > #else > > /* *-user doesn't have configurable SMP topology */ > > #define smp_cores 1 > > diff --git a/numa.c b/numa.c > > index ffbec68..5297749 100644 > > --- a/numa.c > > +++ b/numa.c > > @@ -26,6 +26,7 @@ > > #include "exec/cpu-common.h" > > #include "qemu/bitmap.h" > > #include "qom/cpu.h" > > +#include "sysemu/cpus.h" > > #include "qemu/error-report.h" > > #include "include/exec/cpu-common.h" /* for RAM_ADDR_FMT */ > > #include "qapi-visit.h" > > @@ -233,13 +234,18 @@ void parse_numa_opts(void) > > break; > > } > > } > > -/* assigning
Re: [Qemu-devel] [PATCH for-2.3] numa: pc: fix default VCPU to node mapping
On Tue, 17 Mar 2015 17:59:48 +0100 Andreas Färber wrote: > Am 17.03.2015 um 17:42 schrieb Eduardo Habkost: > > On Tue, Mar 17, 2015 at 03:48:38PM +, Igor Mammedov wrote: > >> since commit > >>dd0247e0 pc: acpi: mark all possible CPUs as enabled in SRAT > >> Linux kernel actually tries to use CPU to Node mapping from > >> QEMU provided SRAT table instead of discarding it, and that > >> in some cases breaks build_sched_domains() which expects > >> sane mapping where cores/threads belonging to the same socket > >> are on the same NUMA node. > >> > >> With current default round-robin mapping of VCPUs to nodes > >> guest ends-up with cores/threads belonging to the same socket > >> being on different NUMA nodes. > >> > >> For example with following CLI: > >> qemu-kvm -m 4G -smp 5,sockets=1,cores=4,threads=1,maxcpus=8 \ > >> -numa node,nodeid=0 -numa node,nodeid=1 > >> 2.6.32 based kernels will hang on boot due to incorrectly build > >> sched_group-s list in update_sd_lb_stats() > >> so comment in QEMU justifying dumb default mapping: > >> " > >> guest OSes must cope with this anyway, because there are BIOSes > >> out there in real machines which also use this scheme. > >> " > >> isn't really valid. > >> > >> Replacing default mapping withi a manual, where VCPUs belonging to > >> the same socket are on the same NUMA node, fixes issue for > >> guests which can't handle nonsense topology i.e. cnaging CLI to: > >> -numa node,nodeid=0,cpus=0-3 -numa node,nodeid=1,cpus=4-7 > >> > >> So instead of simply scattering VCPUs around nodes, map > >> the same socket VCPUs to the same NUMA node, which is what > >> guest would expect from a sane hardware/BIOS. > >> > >> Signed-off-by: Igor Mammedov > > > > I believe the proposed behavior is much better. But if we are going to > > break compatibility, shouldn't we at least do that before the first -rc > > so we get feedback in case it break existing configurations? > > > > About qemu_cpu_socket_id_from_index(): all qemu-system-* binaries have > > smp_cores and smp_threads available (even if machines ignore it), but > > the default stub can return values that are larger than the number of > > sockets if smp_cores*smp_threads > 1, which would be obviously > > incorrect. Isn't it easier to simply make > > "cpu_index/(smp_cores*smp_sockets)" be the default cpu_index->socket > > mapping function, and allow machine-specific (not arch-specific) > > overrides if necessary? > > Agree that the proposed stub solution is not so nice. Can you propose a > MachineClass based solution instead? sure > > The example I keep bringing up for x86 is that the Galileo boards or > even the Minnow boards don't really have sockets, being a SoC. > > Thanks, > Andreas >
Re: [Qemu-devel] [PATCH] exec: Respect as_tranlsate_internal length clamp
On 17/03/2015 06:35, Peter Crosthwaite wrote: > address_space_translate_internal will clamp the *plen length argument > based on the size of the memory region being queried. The iommu walker > logic in addresss_space_translate was ignoring this by discarding the > post fn call value of *plen. Fix by just always using *plen as the > length argument throughout the fn, removing the len local variable. > > This fixes a bootloader bug when a single elf section spans multiple > QEMU memory regions. > > Signed-off-by: Peter Crosthwaite > --- > exec.c | 6 ++ > 1 file changed, 2 insertions(+), 4 deletions(-) > > diff --git a/exec.c b/exec.c > index e97071a..8b922db 100644 > --- a/exec.c > +++ b/exec.c > @@ -380,7 +380,6 @@ MemoryRegion *address_space_translate(AddressSpace *as, > hwaddr addr, > IOMMUTLBEntry iotlb; > MemoryRegionSection *section; > MemoryRegion *mr; > -hwaddr len = *plen; > > rcu_read_lock(); > for (;;) { > @@ -395,7 +394,7 @@ MemoryRegion *address_space_translate(AddressSpace *as, > hwaddr addr, > iotlb = mr->iommu_ops->translate(mr, addr, is_write); > addr = ((iotlb.translated_addr & ~iotlb.addr_mask) > | (addr & iotlb.addr_mask)); > -len = MIN(len, (addr | iotlb.addr_mask) - addr + 1); > +*plen = MIN(*plen, (addr | iotlb.addr_mask) - addr + 1); > if (!(iotlb.perm & (1 << is_write))) { > mr = &io_mem_unassigned; > break; > @@ -406,10 +405,9 @@ MemoryRegion *address_space_translate(AddressSpace *as, > hwaddr addr, > > if (xen_enabled() && memory_access_is_direct(mr, is_write)) { > hwaddr page = ((addr & TARGET_PAGE_MASK) + TARGET_PAGE_SIZE) - addr; > -len = MIN(page, len); > +*plen = MIN(page, *plen); > } > > -*plen = len; > *xlat = addr; > rcu_read_unlock(); > return mr; > Applied, thanks. Paolo
Re: [Qemu-devel] [PATCH] migration: flush the bdrv before stopping VM
[ Cc: qemu-block ] Am 18.03.2015 um 04:19 hat Li, Liang Z geschrieben: > > This needs further review/changes on the block layer. > > > > First explanation, why I think this don't fix the full problem. > > Whith this patch, we fix the problem where we have a dirty block layer but > > basically nothing dirtying the memory on the guest (we are moving the 20 > > seconds from max_downtime for the blocklayer flush), to 20 seconds until > > we have decided that the amount of dirty memory is small enough to be > > transferred during max_downtime. But it is still going to take 20 seconds > > to > > flush the block layer, and during that 20 seconds, the amount of memory that > > can be dirty is HUGE. > > It's true. What kind of cache is it actually that takes 20s to flush here? Inside of qemu, we don't cache a whole lot. If you use qcow2, we do use a writeback cache for some metadata that might need to be written back, but it is small and shouldn't take any significant time. Then we have the kernel page cache, or for some network protocols caches in the respective libs. This sounds like the right size for a 20s stall, but don't we require cache.direct=on for live migration anyway for coherency, i.e. bypassing any such cache? Lastly there may be a disk cache, but it's too small either. > > I think our ouptions are: > > > > - tell the block layer at the beggining of migration > > Hey, we are migrating, could you please start flusing data now, and > > don't get the caches to grow too much, please, pretty please. > > (I left the API to the block layer) > > - Add on that point a new function: > > bdrvr_flush_all_start() > > That starts the sending of pages, and we "hope" that by the time that > > we have migrated all memory, they have also finished (so our last > > call to block_flush_all() have less work to do) > > - Add another function: > > int bdrv_flush_all_timeout(int timeout) > > that returns if timeout pass, telling if it has migrated all pages or > > timeout has passed. So we can got back to the iterative stage if it > > has taken too long. The problem is that the block layer really doesn't have an option to control what is getting synced once the data is cached outside of qemu. Essentially we can do an fdatasync() or we can leave it, that's the only choice we have. Now doing an asynchronous fdatasync() in the background is completely reasonable in order to reduce the amount of data to be flushed later. But the patch is doing it while holding both the BQL and the AIOContext lock of the device, which doesn't feel right. Maybe it should schedule a BH in the AIOContext instead and flush from there asynchronously. The other thing is that flushing once doesn't mean that new data isn't accumulating in the cache, especially if you decide to do the background flush at the start of the migration. The obvious way to avoid that would be to switch to a writethrough mode, so any write request goes directly to the disk. This will, however, impact performance so heavily that it's probably not a realistic option. An alternative approach could be to repeat the background flush periodically, either time based or after every x bytes that are written to a device. Time based should actually be quite easy to implement. Once we have the flushing in the background, the migration code can apply any timeouts it wants. If the timeout is exceeded, the flush wouldn't be aborted but keep running in the background, but migration can go back to the iterative state anyway. > > Notice that *normally* bdrv_flush_all() is very fast, the problem is that > > sometimes it get really, really slow (NFS decided to go slow, TCP drop a > > packet, whatever). > > > > Right now, we don't have an interface to detect that cases and got back to > > the iterative stage. > > How about go back to the iterative stage when detect that the pending_size is > larger > Than max_size, like this: > > +/* do flush here is aimed to shorten the VM downtime, > + * bdrv_flush_all is a time consuming operation > + * when the guest has done some file writing */ > +bdrv_flush_all(); > +pending_size = qemu_savevm_state_pending(s->file, max_size); > +if (pending_size && pending_size >= max_size) { > +qemu_mutex_unlock_iothread(); > +continue; > +} > ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE); > if (ret >= 0) { > qemu_file_set_rate_limit(s->file, INT64_MAX); > > and this is quite simple. Yes, but it is too simple. If you hold all the locks during bdrv_flush_all(), your VM will effectively stop as soon as it performs the next I/O access, so you don't win much. And you still don't have a timeout for cases where the flush takes really long. Kevin
[Qemu-devel] [PULL 01/19] nbd: Fix overflow return value
From: Yik Fang The value of reply.error should be the type unsigned int. Signed-off-by: Yik Fang Message-Id: <1423722111-12902-1-git-send-email-eric.fan...@huawei.com> Signed-off-by: Paolo Bonzini --- nbd.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/nbd.c b/nbd.c index 71159af..02ba0fe 100644 --- a/nbd.c +++ b/nbd.c @@ -1295,7 +1295,7 @@ static void nbd_trip(void *opaque) default: LOG("invalid request type (%u) received", request.type); invalid_request: -reply.error = -EINVAL; +reply.error = EINVAL; error_reply: if (nbd_co_send_reply(req, &reply, 0) < 0) { goto out; -- 2.3.0
[Qemu-devel] [PULL 06/19] nbd: Handle blk_getlength() failure
From: Max Reitz Signed-off-by: Max Reitz Message-Id: <1424887718-10800-9-git-send-email-mre...@redhat.com> Signed-off-by: Paolo Bonzini --- blockdev-nbd.c | 6 +- include/block/nbd.h | 3 ++- nbd.c | 16 ++-- qemu-nbd.c | 10 +- 4 files changed, 30 insertions(+), 5 deletions(-) diff --git a/blockdev-nbd.c b/blockdev-nbd.c index 22e95d1..b29e456 100644 --- a/blockdev-nbd.c +++ b/blockdev-nbd.c @@ -105,7 +105,11 @@ void qmp_nbd_server_add(const char *device, bool has_writable, bool writable, writable = false; } -exp = nbd_export_new(blk, 0, -1, writable ? 0 : NBD_FLAG_READ_ONLY, NULL); +exp = nbd_export_new(blk, 0, -1, writable ? 0 : NBD_FLAG_READ_ONLY, NULL, + errp); +if (!exp) { +return; +} nbd_export_set_name(exp, device); diff --git a/include/block/nbd.h b/include/block/nbd.h index ca9a5ac..2c20138 100644 --- a/include/block/nbd.h +++ b/include/block/nbd.h @@ -86,7 +86,8 @@ typedef struct NBDExport NBDExport; typedef struct NBDClient NBDClient; NBDExport *nbd_export_new(BlockBackend *blk, off_t dev_offset, off_t size, - uint32_t nbdflags, void (*close)(NBDExport *)); + uint32_t nbdflags, void (*close)(NBDExport *), + Error **errp); void nbd_export_close(NBDExport *exp); void nbd_export_get(NBDExport *exp); void nbd_export_put(NBDExport *exp); diff --git a/nbd.c b/nbd.c index 34f4dbb..8837e75 100644 --- a/nbd.c +++ b/nbd.c @@ -966,7 +966,8 @@ static void blk_aio_detach(void *opaque) } NBDExport *nbd_export_new(BlockBackend *blk, off_t dev_offset, off_t size, - uint32_t nbdflags, void (*close)(NBDExport *)) + uint32_t nbdflags, void (*close)(NBDExport *), + Error **errp) { NBDExport *exp = g_malloc0(sizeof(NBDExport)); exp->refcount = 1; @@ -974,7 +975,14 @@ NBDExport *nbd_export_new(BlockBackend *blk, off_t dev_offset, off_t size, exp->blk = blk; exp->dev_offset = dev_offset; exp->nbdflags = nbdflags; -exp->size = size == -1 ? blk_getlength(blk) : size; +exp->size = size < 0 ? blk_getlength(blk) : size; +if (exp->size < 0) { +error_setg_errno(errp, -exp->size, + "Failed to determine the NBD export's length"); +goto fail; +} +exp->size -= exp->size % BDRV_SECTOR_SIZE; + exp->close = close; exp->ctx = blk_get_aio_context(blk); blk_ref(blk); @@ -986,6 +994,10 @@ NBDExport *nbd_export_new(BlockBackend *blk, off_t dev_offset, off_t size, */ blk_invalidate_cache(blk, NULL); return exp; + +fail: +g_free(exp); +return NULL; } NBDExport *nbd_export_find(const char *name) diff --git a/qemu-nbd.c b/qemu-nbd.c index d8daf1d..0cb0e4e 100644 --- a/qemu-nbd.c +++ b/qemu-nbd.c @@ -717,6 +717,10 @@ int main(int argc, char **argv) bs->detect_zeroes = detect_zeroes; fd_size = blk_getlength(blk); +if (fd_size < 0) { +errx(EXIT_FAILURE, "Failed to determine the image length: %s", + strerror(-fd_size)); +} if (partition != -1) { ret = find_partition(blk, partition, &dev_offset, &fd_size); @@ -726,7 +730,11 @@ int main(int argc, char **argv) } } -exp = nbd_export_new(blk, dev_offset, fd_size, nbdflags, nbd_export_closed); +exp = nbd_export_new(blk, dev_offset, fd_size, nbdflags, nbd_export_closed, + &local_err); +if (!exp) { +errx(EXIT_FAILURE, "%s", error_get_pretty(local_err)); +} if (sockpath) { fd = unix_socket_incoming(sockpath); -- 2.3.0
[Qemu-devel] [PULL 03/19] qemu-nbd: Detect unused partitions by system == 0
From: Max Reitz Unused partitions do not necessarily have a total sector count of 0 (although they should have), but they always do have the system field set to 0, so use that for testing whether a partition is in use rather than the sector count field alone. Signed-off-by: Max Reitz Message-Id: <1424887718-10800-3-git-send-email-mre...@redhat.com> Signed-off-by: Paolo Bonzini --- qemu-nbd.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/qemu-nbd.c b/qemu-nbd.c index 064b000..d8daf1d 100644 --- a/qemu-nbd.c +++ b/qemu-nbd.c @@ -167,8 +167,9 @@ static int find_partition(BlockBackend *blk, int partition, for (i = 0; i < 4; i++) { read_partition(&data[446 + 16 * i], &mbr[i]); -if (!mbr[i].nb_sectors_abs) +if (!mbr[i].system || !mbr[i].nb_sectors_abs) { continue; +} if (mbr[i].system == 0xF || mbr[i].system == 0x5) { struct partition_record ext[4]; @@ -182,8 +183,9 @@ static int find_partition(BlockBackend *blk, int partition, for (j = 0; j < 4; j++) { read_partition(&data1[446 + 16 * j], &ext[j]); -if (!ext[j].nb_sectors_abs) +if (!ext[j].system || !ext[j].nb_sectors_abs) { continue; +} if ((ext_partnum + j + 1) == partition) { *offset = (uint64_t)ext[j].start_sector_abs << 9; -- 2.3.0
[Qemu-devel] [PULL 00/19] Misc bugfixes for 2.3.0-rc1
The following changes since commit cd232acfa0d70002fed89e9293f04afda577a513: Update version for v2.3.0-rc0 release (2015-03-17 18:58:33 +) are available in the git repository at: git://github.com/bonzini/qemu.git tags/for-upstream for you to fetch changes up to c3c1bb99d1c11978d9ce94d1bdcf0705378c1459: exec: Respect as_tranlsate_internal length clamp (2015-03-18 12:09:42 +0100) - kvm: ioeventfd fix for PPC64LE - virtio-scsi: misc fixes - fix for --enable-profiler - nbd: fixes from Max - build: fix for scripts/make_device_config.sh - exec: fix for address_space_translate Let's see if patchew likes the Message-Id tags from git-am. :) Alexey Kardashevskiy (1): profiler: Reenable built-in profiler Bo Su (1): virtio-scsi-dataplane: fix memory leak in virtio_scsi_vring_init Fam Zheng (1): virtio-scsi: Fix assert in virtio_scsi_push_event Greg Kurz (1): kvm: fix ioeventfd endianness on bi-endian architectures Max Reitz (12): util/uri: Add overflow check to rfc3986_parse_port qemu-nbd: Detect unused partitions by system == 0 nbd: Fix nbd_establish_connection()'s return value nbd: Pass return value from nbd_handle_list() nbd: Handle blk_getlength() failure qemu-nbd: fork() can fail nbd: Fix potential signed overflow issues nbd: Set block size to BDRV_SECTOR_SIZE nbd: Fix nbd_receive_options() nbd: Fix interpretation of the export flags nbd: Drop unexpected data for NBD_OPT_LIST coroutine-io: Return -errno in case of error Paolo Bonzini (1): build: pass .d file name to scripts/make_device_config.sh, fix makefile target Peter Crosthwaite (1): exec: Respect as_tranlsate_internal length clamp Yik Fang (1): nbd: Fix overflow return value Makefile| 3 +- block/nbd-client.c | 3 +- block/nbd-client.h | 1 - block/nbd.c | 2 +- blockdev-nbd.c | 6 ++- cpus.c | 2 +- exec.c | 6 +-- hw/scsi/virtio-scsi-dataplane.c | 4 +- hw/scsi/virtio-scsi.c | 8 ++- include/block/nbd.h | 11 ++-- include/qemu/timer.h| 5 +- kvm-all.c | 24 - monitor.c | 6 +-- nbd.c | 112 qemu-coroutine-io.c | 2 +- qemu-nbd.c | 30 +++ scripts/make_device_config.sh | 18 --- util/uri.c | 24 + 18 files changed, 178 insertions(+), 89 deletions(-) -- 2.3.0
[Qemu-devel] [PULL 04/19] nbd: Fix nbd_establish_connection()'s return value
From: Max Reitz unix_connect_opts() and inet_connect_opts() do not necessarily set errno (if at all); therefore, nbd_establish_connection() should not literally return -errno on error. Signed-off-by: Max Reitz Message-Id: <1424887718-10800-4-git-send-email-mre...@redhat.com> Signed-off-by: Paolo Bonzini --- block/nbd.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/block/nbd.c b/block/nbd.c index 6634a69..2176186 100644 --- a/block/nbd.c +++ b/block/nbd.c @@ -248,7 +248,7 @@ static int nbd_establish_connection(BlockDriverState *bs, Error **errp) /* Failed to establish connection */ if (sock < 0) { logout("Failed to establish connection to NBD server\n"); -return -errno; +return -EIO; } return sock; -- 2.3.0
[Qemu-devel] [PULL 02/19] util/uri: Add overflow check to rfc3986_parse_port
From: Max Reitz And while at it, replace tabs by eight spaces in this function. Signed-off-by: Max Reitz Message-Id: <1424887718-10800-2-git-send-email-mre...@redhat.com> Signed-off-by: Paolo Bonzini --- util/uri.c | 24 ++-- 1 file changed, 14 insertions(+), 10 deletions(-) diff --git a/util/uri.c b/util/uri.c index 1cfd78b..550b984 100644 --- a/util/uri.c +++ b/util/uri.c @@ -320,19 +320,23 @@ static int rfc3986_parse_port(URI *uri, const char **str) { const char *cur = *str; +int port = 0; if (ISA_DIGIT(cur)) { - if (uri != NULL) - uri->port = 0; - while (ISA_DIGIT(cur)) { - if (uri != NULL) - uri->port = uri->port * 10 + (*cur - '0'); - cur++; - } - *str = cur; - return(0); +while (ISA_DIGIT(cur)) { +port = port * 10 + (*cur - '0'); +if (port > 65535) { +return 1; +} +cur++; +} +if (uri) { +uri->port = port; +} +*str = cur; +return 0; } -return(1); +return 1; } /** -- 2.3.0
[Qemu-devel] [PULL 18/19] virtio-scsi-dataplane: fix memory leak in virtio_scsi_vring_init
From: Bo Su if k->set_host_notifier failed, VirtIOSCSIVring *r will leak Signed-off-by: Bo Su Message-Id: <1426671732-80213-1-git-send-email-su...@huawei.com> Reviewed-by: Fam Zheng Reviewed-by: Gonglei Signed-off-by: Paolo Bonzini --- hw/scsi/virtio-scsi-dataplane.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/hw/scsi/virtio-scsi-dataplane.c b/hw/scsi/virtio-scsi-dataplane.c index 3f40ff0..c069cd7 100644 --- a/hw/scsi/virtio-scsi-dataplane.c +++ b/hw/scsi/virtio-scsi-dataplane.c @@ -45,7 +45,7 @@ static VirtIOSCSIVring *virtio_scsi_vring_init(VirtIOSCSI *s, { BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(s))); VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus); -VirtIOSCSIVring *r = g_slice_new(VirtIOSCSIVring); +VirtIOSCSIVring *r; int rc; /* Set up virtqueue notify */ @@ -56,6 +56,8 @@ static VirtIOSCSIVring *virtio_scsi_vring_init(VirtIOSCSI *s, s->dataplane_fenced = true; return NULL; } + +r = g_slice_new(VirtIOSCSIVring); r->host_notifier = *virtio_queue_get_host_notifier(vq); r->guest_notifier = *virtio_queue_get_guest_notifier(vq); aio_set_event_notifier(s->ctx, &r->host_notifier, handler); -- 2.3.0
[Qemu-devel] [PULL 05/19] nbd: Pass return value from nbd_handle_list()
From: Max Reitz While it does not make a difference in practice, nbd_receive_options() generally returns -errno, so it should do that here as well; and the easiest way to achieve this is by passing on the value returned by nbd_handle_list(). Signed-off-by: Max Reitz Message-Id: <1424887718-10800-7-git-send-email-mre...@redhat.com> Signed-off-by: Paolo Bonzini --- nbd.c | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/nbd.c b/nbd.c index 02ba0fe..34f4dbb 100644 --- a/nbd.c +++ b/nbd.c @@ -351,7 +351,7 @@ fail: static int nbd_receive_options(NBDClient *client) { while (1) { -int csock = client->sock; +int csock = client->sock, ret; uint32_t tmp, length; uint64_t magic; @@ -398,8 +398,9 @@ static int nbd_receive_options(NBDClient *client) TRACE("Checking option"); switch (be32_to_cpu(tmp)) { case NBD_OPT_LIST: -if (nbd_handle_list(client, length) < 0) { -return 1; +ret = nbd_handle_list(client, length); +if (ret < 0) { +return ret; } break; -- 2.3.0
[Qemu-devel] [PULL 09/19] nbd: Set block size to BDRV_SECTOR_SIZE
From: Max Reitz Signed-off-by: Max Reitz Message-Id: <1424887718-10800-13-git-send-email-mre...@redhat.com> Signed-off-by: Paolo Bonzini --- block/nbd-client.c | 3 +-- block/nbd-client.h | 1 - include/block/nbd.h | 4 ++-- nbd.c | 15 +++ qemu-nbd.c | 5 ++--- 5 files changed, 12 insertions(+), 16 deletions(-) diff --git a/block/nbd-client.c b/block/nbd-client.c index 259f5a3..e1bb919 100644 --- a/block/nbd-client.c +++ b/block/nbd-client.c @@ -386,8 +386,7 @@ int nbd_client_init(BlockDriverState *bs, int sock, const char *export, logout("session init %s\n", export); qemu_set_block(sock); ret = nbd_receive_negotiate(sock, export, -&client->nbdflags, &client->size, -&client->blocksize, errp); +&client->nbdflags, &client->size, errp); if (ret < 0) { logout("Failed to negotiate with the NBD server\n"); closesocket(sock); diff --git a/block/nbd-client.h b/block/nbd-client.h index fa4ff42..e841340 100644 --- a/block/nbd-client.h +++ b/block/nbd-client.h @@ -20,7 +20,6 @@ typedef struct NbdClientSession { int sock; uint32_t nbdflags; off_t size; -size_t blocksize; CoMutex send_mutex; CoMutex free_sema; diff --git a/include/block/nbd.h b/include/block/nbd.h index 53726e8..65f409d 100644 --- a/include/block/nbd.h +++ b/include/block/nbd.h @@ -75,8 +75,8 @@ enum { ssize_t nbd_wr_sync(int fd, void *buffer, size_t size, bool do_read); int nbd_receive_negotiate(int csock, const char *name, uint32_t *flags, - off_t *size, size_t *blocksize, Error **errp); -int nbd_init(int fd, int csock, uint32_t flags, off_t size, size_t blocksize); + off_t *size, Error **errp); +int nbd_init(int fd, int csock, uint32_t flags, off_t size); ssize_t nbd_send_request(int csock, struct nbd_request *request); ssize_t nbd_receive_reply(int csock, struct nbd_reply *reply); int nbd_client(int fd); diff --git a/nbd.c b/nbd.c index 8837e75..3d550f7 100644 --- a/nbd.c +++ b/nbd.c @@ -495,7 +495,7 @@ fail: } int nbd_receive_negotiate(int csock, const char *name, uint32_t *flags, - off_t *size, size_t *blocksize, Error **errp) + off_t *size, Error **errp) { char buf[256]; uint64_t magic, s; @@ -603,7 +603,6 @@ int nbd_receive_negotiate(int csock, const char *name, uint32_t *flags, goto fail; } *size = be64_to_cpu(s); -*blocksize = 1024; TRACE("Size is %" PRIu64, *size); if (!name) { @@ -630,7 +629,7 @@ fail: } #ifdef __linux__ -int nbd_init(int fd, int csock, uint32_t flags, off_t size, size_t blocksize) +int nbd_init(int fd, int csock, uint32_t flags, off_t size) { TRACE("Setting NBD socket"); @@ -640,17 +639,17 @@ int nbd_init(int fd, int csock, uint32_t flags, off_t size, size_t blocksize) return -serrno; } -TRACE("Setting block size to %lu", (unsigned long)blocksize); +TRACE("Setting block size to %lu", (unsigned long)BDRV_SECTOR_SIZE); -if (ioctl(fd, NBD_SET_BLKSIZE, blocksize) < 0) { +if (ioctl(fd, NBD_SET_BLKSIZE, (size_t)BDRV_SECTOR_SIZE) < 0) { int serrno = errno; LOG("Failed setting NBD block size"); return -serrno; } -TRACE("Setting size to %zd block(s)", (size_t)(size / blocksize)); +TRACE("Setting size to %zd block(s)", (size_t)(size / BDRV_SECTOR_SIZE)); -if (ioctl(fd, NBD_SET_SIZE_BLOCKS, size / blocksize) < 0) { +if (ioctl(fd, NBD_SET_SIZE_BLOCKS, size / (size_t)BDRV_SECTOR_SIZE) < 0) { int serrno = errno; LOG("Failed setting size (in blocks)"); return -serrno; @@ -715,7 +714,7 @@ int nbd_client(int fd) return ret; } #else -int nbd_init(int fd, int csock, uint32_t flags, off_t size, size_t blocksize) +int nbd_init(int fd, int csock, uint32_t flags, off_t size) { return -ENOTSUP; } diff --git a/qemu-nbd.c b/qemu-nbd.c index a4a9a0c..7e690ff 100644 --- a/qemu-nbd.c +++ b/qemu-nbd.c @@ -279,7 +279,6 @@ static void *nbd_client_thread(void *arg) { char *device = arg; off_t size; -size_t blocksize; uint32_t nbdflags; int fd, sock; int ret; @@ -292,7 +291,7 @@ static void *nbd_client_thread(void *arg) } ret = nbd_receive_negotiate(sock, NULL, &nbdflags, -&size, &blocksize, &local_error); +&size, &local_error); if (ret < 0) { if (local_error) { fprintf(stderr, "%s\n", error_get_pretty(local_error)); @@ -308,7 +307,7 @@ static void *nbd_client_thread(void *arg) goto out_socket; } -ret = nbd_init(fd, sock, nbdflags, size, blocksize); +ret = nbd_init(fd, sock, nbdflags, size); if (ret < 0) { goto out_fd; } -- 2.3.0
[Qemu-devel] [PULL 12/19] nbd: Drop unexpected data for NBD_OPT_LIST
From: Max Reitz When requesting the list of exports, no data should be sent. If data is sent, the NBD server should not just inform the client of the invalid request, but also drop the data. Signed-off-by: Max Reitz Message-Id: <1424887718-10800-22-git-send-email-mre...@redhat.com> Signed-off-by: Paolo Bonzini --- nbd.c | 23 +++ 1 file changed, 23 insertions(+) diff --git a/nbd.c b/nbd.c index 563e820..91b7d56 100644 --- a/nbd.c +++ b/nbd.c @@ -193,6 +193,26 @@ static ssize_t read_sync(int fd, void *buffer, size_t size) return nbd_wr_sync(fd, buffer, size, true); } +static ssize_t drop_sync(int fd, size_t size) +{ +ssize_t ret, dropped = size; +uint8_t *buffer = g_malloc(MIN(65536, size)); + +while (size > 0) { +ret = read_sync(fd, buffer, MIN(65536, size)); +if (ret < 0) { +g_free(buffer); +return ret; +} + +assert(ret <= size); +size -= ret; +} + +g_free(buffer); +return dropped; +} + static ssize_t write_sync(int fd, void *buffer, size_t size) { int ret; @@ -303,6 +323,9 @@ static int nbd_handle_list(NBDClient *client, uint32_t length) csock = client->sock; if (length) { +if (drop_sync(csock, length) != length) { +return -EIO; +} return nbd_send_rep(csock, NBD_REP_ERR_INVALID, NBD_OPT_LIST); } -- 2.3.0
[Qemu-devel] [PULL 17/19] profiler: Reenable built-in profiler
From: Alexey Kardashevskiy 2ed1ebcf6 "timer: replace time() with QEMU_CLOCK_HOST" broke compile when configured with --enable-profiler. Turned out the profiler has been broken for a while. This does s/qemu_time/tcg_time/ as the profiler only works in a TCG mode. This also fixes the compile error. This changes profile_getclock() to return nanoseconds rather than CPU ticks as the "profile" HMP command prints seconds and there is no platform-independent way to get ticks-per-second rate. Since TCG is quite slow and get_clock() returns nanoseconds (fine enough), this should not affect precision much. This removes unused qemu_time_start and tlb_flush_time. Signed-off-by: Alexey Kardashevskiy Message-Id: <1426478258-29961-1-git-send-email-...@ozlabs.ru> Signed-off-by: Paolo Bonzini --- cpus.c | 2 +- include/qemu/timer.h | 5 ++--- monitor.c| 6 +++--- 3 files changed, 6 insertions(+), 7 deletions(-) diff --git a/cpus.c b/cpus.c index 1ce90a1..314df16 100644 --- a/cpus.c +++ b/cpus.c @@ -1353,7 +1353,7 @@ static int tcg_cpu_exec(CPUArchState *env) } ret = cpu_exec(env); #ifdef CONFIG_PROFILER -qemu_time += profile_getclock() - ti; +tcg_time += profile_getclock() - ti; #endif if (use_icount) { /* Fold pending instructions back into the diff --git a/include/qemu/timer.h b/include/qemu/timer.h index eba8b21..e5bd494 100644 --- a/include/qemu/timer.h +++ b/include/qemu/timer.h @@ -999,11 +999,10 @@ static inline int64_t cpu_get_real_ticks (void) #ifdef CONFIG_PROFILER static inline int64_t profile_getclock(void) { -return cpu_get_real_ticks(); +return get_clock(); } -extern int64_t qemu_time, qemu_time_start; -extern int64_t tlb_flush_time; +extern int64_t tcg_time; extern int64_t dev_time; #endif diff --git a/monitor.c b/monitor.c index 42116a9..65a63df 100644 --- a/monitor.c +++ b/monitor.c @@ -1975,7 +1975,7 @@ static void hmp_info_numa(Monitor *mon, const QDict *qdict) #ifdef CONFIG_PROFILER -int64_t qemu_time; +int64_t tcg_time; int64_t dev_time; static void hmp_info_profile(Monitor *mon, const QDict *qdict) @@ -1983,8 +1983,8 @@ static void hmp_info_profile(Monitor *mon, const QDict *qdict) monitor_printf(mon, "async time %" PRId64 " (%0.3f)\n", dev_time, dev_time / (double)get_ticks_per_sec()); monitor_printf(mon, "qemu time %" PRId64 " (%0.3f)\n", - qemu_time, qemu_time / (double)get_ticks_per_sec()); -qemu_time = 0; + tcg_time, tcg_time / (double)get_ticks_per_sec()); +tcg_time = 0; dev_time = 0; } #else -- 2.3.0
[Qemu-devel] [PULL 08/19] nbd: Fix potential signed overflow issues
From: Max Reitz Signed-off-by: Max Reitz Message-Id: <1424887718-10800-11-git-send-email-mre...@redhat.com> Signed-off-by: Paolo Bonzini --- include/block/nbd.h | 4 ++-- qemu-nbd.c | 5 +++-- 2 files changed, 5 insertions(+), 4 deletions(-) diff --git a/include/block/nbd.h b/include/block/nbd.h index 2c20138..53726e8 100644 --- a/include/block/nbd.h +++ b/include/block/nbd.h @@ -54,8 +54,8 @@ struct nbd_reply { /* Reply types. */ #define NBD_REP_ACK (1) /* Data sending finished. */ #define NBD_REP_SERVER (2) /* Export description. */ -#define NBD_REP_ERR_UNSUP ((1 << 31) | 1) /* Unknown option. */ -#define NBD_REP_ERR_INVALID ((1 << 31) | 3) /* Invalid length. */ +#define NBD_REP_ERR_UNSUP ((UINT32_C(1) << 31) | 1) /* Unknown option. */ +#define NBD_REP_ERR_INVALID ((UINT32_C(1) << 31) | 3) /* Invalid length. */ #define NBD_CMD_MASK_COMMAND 0x #define NBD_CMD_FLAG_FUA (1 << 16) diff --git a/qemu-nbd.c b/qemu-nbd.c index 0c9e807..a4a9a0c 100644 --- a/qemu-nbd.c +++ b/qemu-nbd.c @@ -142,8 +142,9 @@ static void read_partition(uint8_t *p, struct partition_record *r) r->end_head = p[5]; r->end_cylinder = p[7] | ((p[6] << 2) & 0x300); r->end_sector = p[6] & 0x3f; -r->start_sector_abs = p[8] | p[9] << 8 | p[10] << 16 | p[11] << 24; -r->nb_sectors_abs = p[12] | p[13] << 8 | p[14] << 16 | p[15] << 24; + +r->start_sector_abs = le32_to_cpup((uint32_t *)(p + 8)); +r->nb_sectors_abs = le32_to_cpup((uint32_t *)(p + 12)); } static int find_partition(BlockBackend *blk, int partition, -- 2.3.0
[Qemu-devel] [PULL 10/19] nbd: Fix nbd_receive_options()
From: Max Reitz The client flags are sent exactly once overall, not once per option. Signed-off-by: Max Reitz Message-Id: <1424887718-10800-19-git-send-email-mre...@redhat.com> Signed-off-by: Paolo Bonzini --- nbd.c | 49 + 1 file changed, 29 insertions(+), 20 deletions(-) diff --git a/nbd.c b/nbd.c index 3d550f7..fb8a4d4 100644 --- a/nbd.c +++ b/nbd.c @@ -350,30 +350,39 @@ fail: static int nbd_receive_options(NBDClient *client) { +int csock = client->sock; +uint32_t flags; + +/* Client sends: +[ 0 .. 3] client flags + +[ 0 .. 7] NBD_OPTS_MAGIC +[ 8 .. 11] NBD option +[12 .. 15] Data length +... Rest of request + +[ 0 .. 7] NBD_OPTS_MAGIC +[ 8 .. 11] Second NBD option +[12 .. 15] Data length +... Rest of request +*/ + +if (read_sync(csock, &flags, sizeof(flags)) != sizeof(flags)) { +LOG("read failed"); +return -EIO; +} +TRACE("Checking client flags"); +be32_to_cpus(&flags); +if (flags != 0 && flags != NBD_FLAG_C_FIXED_NEWSTYLE) { +LOG("Bad client flags received"); +return -EIO; +} + while (1) { -int csock = client->sock, ret; +int ret; uint32_t tmp, length; uint64_t magic; -/* Client sends: -[ 0 .. 3] client flags -[ 4 .. 11] NBD_OPTS_MAGIC -[12 .. 15] NBD option -[16 .. 19] length -... Rest of request -*/ - -if (read_sync(csock, &tmp, sizeof(tmp)) != sizeof(tmp)) { -LOG("read failed"); -return -EINVAL; -} -TRACE("Checking client flags"); -tmp = be32_to_cpu(tmp); -if (tmp != 0 && tmp != NBD_FLAG_C_FIXED_NEWSTYLE) { -LOG("Bad client flags received"); -return -EINVAL; -} - if (read_sync(csock, &magic, sizeof(magic)) != sizeof(magic)) { LOG("read failed"); return -EINVAL; -- 2.3.0
[Qemu-devel] [PULL 14/19] build: pass .d file name to scripts/make_device_config.sh, fix makefile target
The .d file name must match exactly what is used in the SUBDIR_DEVICES_MAK_DEP variable. Instead of making assumptions in the make_device_config.sh script, just pass it in. Similarly, the makefile target may not match the output file name, because Makefile uses a temporary file. Instead of making assumptions on what the Makefile does, emit the config-devices.mak file to stdout, and use the passed-in destination as the makefile target Reported-by: Peter Maydell Cc: Michael S. Tsirkin Signed-off-by: Paolo Bonzini --- Makefile | 3 ++- scripts/make_device_config.sh | 18 ++ 2 files changed, 12 insertions(+), 9 deletions(-) diff --git a/Makefile b/Makefile index 884b59d..88bce56 100644 --- a/Makefile +++ b/Makefile @@ -112,7 +112,8 @@ endif -include $(SUBDIR_DEVICES_MAK_DEP) %/config-devices.mak: default-configs/%.mak - $(call quiet-command,$(SHELL) $(SRC_PATH)/scripts/make_device_config.sh $@.tmp $<, " GEN $@.tmp") + $(call quiet-command, \ +$(SHELL) $(SRC_PATH)/scripts/make_device_config.sh $< $*-config-devices.mak.d $@ > $@.tmp, " GEN $@.tmp") $(call quiet-command, if test -f $@; then \ if cmp -s $@.old $@; then \ mv $@.tmp $@; \ diff --git a/scripts/make_device_config.sh b/scripts/make_device_config.sh index 7958086..c1afb3f 100644 --- a/scripts/make_device_config.sh +++ b/scripts/make_device_config.sh @@ -1,10 +1,12 @@ #! /bin/sh -# Construct a target device config file from a default, pulling in any -# files from include directives. +# Writes a target device config file to stdout, from a default and from +# include directives therein. Also emits Makefile dependencies. +# +# Usage: make_device_config.sh SRC DEPFILE-NAME DEPFILE-TARGET > DEST -dest=$1 -dep=`dirname $1`-`basename $1`.d -src=$2 +src=$1 +dep=$2 +target=$3 src_dir=`dirname $src` all_includes= @@ -22,7 +24,7 @@ while [ -n "$f" ] ; do [ $? = 0 ] || exit 1 all_includes="$all_includes $f" done -process_includes $src > $dest +process_includes $src -cat $src $all_includes | grep -v '^include' > $dest -echo "$1: $all_includes" > $dep +cat $src $all_includes | grep -v '^include' +echo "$target: $all_includes" > $dep -- 2.3.0
[Qemu-devel] [PULL 07/19] qemu-nbd: fork() can fail
From: Max Reitz It is very unlikely, but it is possible. Signed-off-by: Max Reitz Message-Id: <1424887718-10800-10-git-send-email-mre...@redhat.com> Signed-off-by: Paolo Bonzini --- qemu-nbd.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/qemu-nbd.c b/qemu-nbd.c index 0cb0e4e..0c9e807 100644 --- a/qemu-nbd.c +++ b/qemu-nbd.c @@ -635,7 +635,9 @@ int main(int argc, char **argv) * print errors and exit with the proper status code. */ pid = fork(); -if (pid == 0) { +if (pid < 0) { +err(EXIT_FAILURE, "Failed to fork"); +} else if (pid == 0) { close(stderr_fd[0]); ret = qemu_daemon(1, 0); -- 2.3.0
[Qemu-devel] [PULL 19/19] exec: Respect as_tranlsate_internal length clamp
From: Peter Crosthwaite address_space_translate_internal will clamp the *plen length argument based on the size of the memory region being queried. The iommu walker logic in addresss_space_translate was ignoring this by discarding the post fn call value of *plen. Fix by just always using *plen as the length argument throughout the fn, removing the len local variable. This fixes a bootloader bug when a single elf section spans multiple QEMU memory regions. Signed-off-by: Peter Crosthwaite Message-Id: <1426570554-15940-1-git-send-email-peter.crosthwa...@xilinx.com> Signed-off-by: Paolo Bonzini --- exec.c | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/exec.c b/exec.c index e97071a..8b922db 100644 --- a/exec.c +++ b/exec.c @@ -380,7 +380,6 @@ MemoryRegion *address_space_translate(AddressSpace *as, hwaddr addr, IOMMUTLBEntry iotlb; MemoryRegionSection *section; MemoryRegion *mr; -hwaddr len = *plen; rcu_read_lock(); for (;;) { @@ -395,7 +394,7 @@ MemoryRegion *address_space_translate(AddressSpace *as, hwaddr addr, iotlb = mr->iommu_ops->translate(mr, addr, is_write); addr = ((iotlb.translated_addr & ~iotlb.addr_mask) | (addr & iotlb.addr_mask)); -len = MIN(len, (addr | iotlb.addr_mask) - addr + 1); +*plen = MIN(*plen, (addr | iotlb.addr_mask) - addr + 1); if (!(iotlb.perm & (1 << is_write))) { mr = &io_mem_unassigned; break; @@ -406,10 +405,9 @@ MemoryRegion *address_space_translate(AddressSpace *as, hwaddr addr, if (xen_enabled() && memory_access_is_direct(mr, is_write)) { hwaddr page = ((addr & TARGET_PAGE_MASK) + TARGET_PAGE_SIZE) - addr; -len = MIN(page, len); +*plen = MIN(page, *plen); } -*plen = len; *xlat = addr; rcu_read_unlock(); return mr; -- 2.3.0
[Qemu-devel] [PULL 11/19] nbd: Fix interpretation of the export flags
From: Max Reitz The export flags are a 16 bit value, so be16_to_cpu() has to be used to interpret them correctly. This makes discard and flush actually work for named NBD exports (they did not work before, because the client always assumed them to be unsupported because of the bug fixed by this patch). Signed-off-by: Max Reitz Message-Id: <1424887718-10800-20-git-send-email-mre...@redhat.com> Signed-off-by: Paolo Bonzini --- nbd.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/nbd.c b/nbd.c index fb8a4d4..563e820 100644 --- a/nbd.c +++ b/nbd.c @@ -625,7 +625,7 @@ int nbd_receive_negotiate(int csock, const char *name, uint32_t *flags, error_setg(errp, "Failed to read export flags"); goto fail; } -*flags |= be32_to_cpu(tmp); +*flags |= be16_to_cpu(tmp); } if (read_sync(csock, &buf, 124) != 124) { error_setg(errp, "Failed to read reserved block"); -- 2.3.0
[Qemu-devel] [PULL 16/19] kvm: fix ioeventfd endianness on bi-endian architectures
From: Greg Kurz KVM expects host endian values. Hosts that don't use the default endianness need to negate the swap performed in adjust_endianness(). Suggested-by: Paolo Bonzini Signed-off-by: Greg Kurz Message-Id: <20150313212337.31142.3991.stgit@bahia.local> Signed-off-by: Paolo Bonzini --- kvm-all.c | 24 ++-- 1 file changed, 22 insertions(+), 2 deletions(-) diff --git a/kvm-all.c b/kvm-all.c index 55025cc..335438a 100644 --- a/kvm-all.c +++ b/kvm-all.c @@ -528,13 +528,33 @@ int kvm_vm_check_extension(KVMState *s, unsigned int extension) return ret; } +static uint32_t adjust_ioeventfd_endianness(uint32_t val, uint32_t size) +{ +#if defined(HOST_WORDS_BIGENDIAN) != defined(TARGET_WORDS_BIGENDIAN) +/* The kernel expects ioeventfd values in HOST_WORDS_BIGENDIAN + * endianness, but the memory core hands them in target endianness. + * For example, PPC is always treated as big-endian even if running + * on KVM and on PPC64LE. Correct here. + */ +switch (size) { +case 2: +val = bswap16(val); +break; +case 4: +val = bswap32(val); +break; +} +#endif +return val; +} + static int kvm_set_ioeventfd_mmio(int fd, hwaddr addr, uint32_t val, bool assign, uint32_t size, bool datamatch) { int ret; struct kvm_ioeventfd iofd; -iofd.datamatch = datamatch ? val : 0; +iofd.datamatch = datamatch ? adjust_ioeventfd_endianness(val, size) : 0; iofd.addr = addr; iofd.len = size; iofd.flags = 0; @@ -564,7 +584,7 @@ static int kvm_set_ioeventfd_pio(int fd, uint16_t addr, uint16_t val, bool assign, uint32_t size, bool datamatch) { struct kvm_ioeventfd kick = { -.datamatch = datamatch ? val : 0, +.datamatch = datamatch ? adjust_ioeventfd_endianness(val, size) : 0, .addr = addr, .flags = KVM_IOEVENTFD_FLAG_PIO, .len = size, -- 2.3.0
[Qemu-devel] [PULL 15/19] virtio-scsi: Fix assert in virtio_scsi_push_event
From: Fam Zheng Hotplugging a scsi-disk may trigger the assertion in qemu_sgl_concat. qemu-system-x86_64: qemu/hw/scsi/virtio-scsi.c:115: qemu_sgl_concat: Assertion `skip == 0' failed. This is introduced by commit 55783a55 (virtio-scsi: work around bug in old BIOSes) which didn't check out_num when accessing out_sg[0].iov_len (the same to in sg). For virtio_scsi_push_event, looking into out_sg doesn't make sense because 0 req_size is intended. Cc: qemu-sta...@nongnu.org [Cc'ing qemu-stable because 55783a55 did it too] Signed-off-by: Fam Zheng Message-Id: <1426233354-525-1-git-send-email-f...@redhat.com> Signed-off-by: Paolo Bonzini --- hw/scsi/virtio-scsi.c | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c index da0cff8..c9bea06 100644 --- a/hw/scsi/virtio-scsi.c +++ b/hw/scsi/virtio-scsi.c @@ -146,8 +146,12 @@ static int virtio_scsi_parse_req(VirtIOSCSIReq *req, * TODO: always disable this workaround for virtio 1.0 devices. */ if (!virtio_has_feature(vdev, VIRTIO_F_ANY_LAYOUT)) { -req_size = req->elem.out_sg[0].iov_len; -resp_size = req->elem.in_sg[0].iov_len; +if (req->elem.out_num) { +req_size = req->elem.out_sg[0].iov_len; +} +if (req->elem.in_num) { +resp_size = req->elem.in_sg[0].iov_len; +} } out_size = qemu_sgl_concat(req, req->elem.out_sg, -- 2.3.0
[Qemu-devel] [PULL 13/19] coroutine-io: Return -errno in case of error
From: Max Reitz In case qemu_co_sendv_recvv() fails without any data read, there is no reason not to return the perfectly fine error number retrieved from socket_error(). Signed-off-by: Max Reitz Message-Id: <1424887718-10800-16-git-send-email-mre...@redhat.com> Signed-off-by: Paolo Bonzini --- qemu-coroutine-io.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/qemu-coroutine-io.c b/qemu-coroutine-io.c index d404926..28dc735 100644 --- a/qemu-coroutine-io.c +++ b/qemu-coroutine-io.c @@ -45,7 +45,7 @@ qemu_co_sendv_recvv(int sockfd, struct iovec *iov, unsigned iov_cnt, if (err == EAGAIN || err == EWOULDBLOCK) { qemu_coroutine_yield(); } else if (done == 0) { -return -1; +return -err; } else { break; } -- 2.3.0
[Qemu-devel] [PATCH] virtio: move sanity checks to ifdef DEBUG
All that happens when virtqueue_fill is invoked incorrectly is that we corrupt guest memory, so this check is not a security measure. Move the check to ifdef DEBUG to make sure we don't introduce new crashes close to release. Array scans aren't free either, so it's a good idea from performance point of view, too. Cc: Rusty Russell Signed-off-by: Michael S. Tsirkin --- hw/virtio/virtio.c | 15 +++ 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c index 27429c2..37fb2ee 100644 --- a/hw/virtio/virtio.c +++ b/hw/virtio/virtio.c @@ -243,15 +243,22 @@ int virtio_queue_empty(VirtQueue *vq) void virtqueue_fill(VirtQueue *vq, const VirtQueueElement *elem, unsigned int len_written, unsigned int idx) { -unsigned int offset, tot_wlen; +unsigned int offset; int i; trace_virtqueue_fill(vq, elem, len_written, idx); -for (tot_wlen = i = 0; i < elem->in_num; i++) { -tot_wlen += elem->in_sg[i].iov_len; +#ifdef DEBUG_VIRTIO +{ +/* Check that len_written is <= the writable length. */ +unsigned int tot_wlen; + +for (tot_wlen = i = 0; i < elem->in_num; i++) { +tot_wlen += elem->in_sg[i].iov_len; +} +assert(len_written <= tot_wlen); } -assert(len_written <= tot_wlen); +#endif offset = 0; for (i = 0; i < elem->in_num; i++) { -- MST
Re: [Qemu-devel] GSOC 2015 Project Proposal
On Mon, Feb 9, 2015 at 12:44 AM, Xin Tong wrote: > I would like to do GSOC this summer. The project i have in mind is to > implement a set of facilities to make implementing Hardware > transactional memory (HTM) easier in QEMU. > > HTM has become available in many architecture supported by QEMU, e.g. > i386, PowerPC, etc. Currently, necessary memory tracking. conflict > detection and transaction rollbak/commit are not available in QEMU. As > a result, HTM is supported in a very rudimentary fashion in PowerPC, > i.e. the transaction begins (tbegin in PowerPC) always trigger a fault > to the fallback code path. Even though HTM is supported by different > architectures, the underlying principle are very similar and therefore > it is beneficial to provide a set of facilities to make implementing > HTM easier in QEMU. > > These facilities should include. > > A modified software TLB to make memory address and value tracking simple. > A performant and memory efficient value/address tracking facility to > detect read/write conflicts for transactions. > A performant and memory efficient mechanism to rollback and commit > memory accesses. > A mechanism to abort transactions on the current processor as well as > remote processor. > > I will come up with a more detailed proposal as application time draws > close. Any suggestions are appreciated at the moment. Hi Xin, Thanks for proposing this project idea. There haven't been any responses yet. I have CCed Alexander Graf and David Gibson, who have worked on the PowerPC target, and Richard Henderson in case he's interested in transactional memory. You need to find a mentor willing to supervise this project idea. Hopefully bumping this email thread will remind people to consider your idea. If no QEMU regular contributors are willing to mentor the project idea then I'm afraid you would have to choose another idea to apply for QEMU GSoC. Let me know if you have any questions or need help. Stefan
Re: [Qemu-devel] [PATCH v4 0/3] usb: fix segfault when hot-unplugging usb host adapter
On Mi, 2015-03-18 at 17:33 +0800, arei.gong...@huawei.com wrote: > v4 -> v3: > - add reset hookup for sysbus devices (ehci/ohci). (Gerd) Replaced patches. make check failure is gone. Good. thanks, Gerd
Re: [Qemu-devel] [PATCH v2] hw/usb: Include USB files only if necessary
On Di, 2015-03-17 at 14:52 +0100, Thomas Huth wrote: > Everything should now also compile fine again when configure has > been run with the --enable-usb-redir and/or --enable-libusb option. Survived test now. Added to usb queue. thanks, Gerd
[Qemu-devel] [PATCH] pcie_aer: fix comment to match pcie spec
Code comment says "table 6-2" but in fact it's is not a table, it is "Figure 6-2" on page 479. Cc: Chen Fan Reported-by: Michael Tokarev Signed-off-by: Michael S. Tsirkin --- hw/pci/pcie_aer.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/hw/pci/pcie_aer.c b/hw/pci/pcie_aer.c index 9126058..eaa3e6e 100644 --- a/hw/pci/pcie_aer.c +++ b/hw/pci/pcie_aer.c @@ -622,8 +622,8 @@ static bool pcie_aer_inject_uncor_error(PCIEAERInject *inj, bool is_fatal) * * 6.2.4 Error Logging * 6.2.5 Sequence of Device Error Signaling and Logging Operations - * table 6-2: Flowchart Showing Sequence of Device Error Signaling and Logging - *Operations + * Figure 6-2: Flowchart Showing Sequence of Device Error Signaling and Logging + * Operations */ int pcie_aer_inject_error(PCIDevice *dev, const PCIEAERErr *err) { -- MST
Re: [Qemu-devel] Google SoC 2015 Idea - Extracting Virtual Device Implementation from QEMU
On Tue, Mar 17, 2015 at 7:38 PM, Guodong Zhu wrote: > The purpose of the project is to implement an interface that can extract the > implementation of the virtual devices and make them executable outside of > the QEMU environment, which will benefit later testing and making the test > of the virtual devices more flexible and more thoroughly. Hi, Take a look at tests/libqtest.c. Test cases use this interface to exercise specific device models. Although qtest involves running a full QEMU system emulator, it allows isolated device model tests similar to what you have described. Is there a project idea on the wiki that you are interested in? http://qemu-project.org/Google_Summer_of_Code_2015 Or maybe you have another idea you'd like to propose? Stefa
[Qemu-devel] [Bug 1297218] Re: guest hangs after live migration due to tsc jump
Status changed to 'Confirmed' because the bug affects multiple users. ** Changed in: glusterfs (Ubuntu Trusty) Status: New => Confirmed -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1297218 Title: guest hangs after live migration due to tsc jump Status in QEMU: New Status in glusterfs package in Ubuntu: Invalid Status in qemu package in Ubuntu: Triaged Status in glusterfs source package in Trusty: Confirmed Status in qemu source package in Trusty: Confirmed Bug description: We have two identical Ubuntu servers running libvirt/kvm/qemu, sharing a Gluster filesystem. Guests can be live migrated between them. However, live migration often leads to the guest being stuck at 100% for a while. In that case, the dmesg output for such a guest will show (once it recovers): Clocksource tsc unstable (delta = 662463064082 ns). In this particular example, a guest was migrated and only after 11 minutes (662 seconds) did it become responsive again. It seems that newly booted guests doe not suffer from this problem, these can be migrated back and forth at will. After a day or so, the problem becomes apparent. It also seems that migrating from server A to server B causes much more problems than going from B back to A. If necessary, I can do more measurements to qualify these observations. The VM servers run Ubuntu 13.04 with these packages: Kernel: 3.8.0-35-generic x86_64 Libvirt: 1.0.2 Qemu: 1.4.0 Gluster-fs: 3.4.2 (libvirt access the images via the filesystem, not using libgfapi yet as the Ubuntu libvirt is not linked against libgfapi). The interconnect between both machines (both for migration and gluster) is 10GbE. Both servers are synced to NTP and well within 1ms form one another. Guests are either Ubuntu 13.04 or 13.10. On the guests, the current_clocksource is kvm-clock. The XML definition of the guests only contains: Now as far as I've read in the documentation of kvm-clock, it specifically supports live migrations, so I'm a bit surprised at these problems. There isn't all that much information to find on these issue, although I have found postings by others that seem to have run into the same issues, but without a solution. --- ApportVersion: 2.14.1-0ubuntu3 Architecture: amd64 DistroRelease: Ubuntu 14.04 Package: libvirt (not installed) ProcCmdline: BOOT_IMAGE=/boot/vmlinuz-3.13.0-24-generic root=UUID=1b0c3c6d-a9b8-4e84-b076-117ae267d178 ro console=ttyS1,115200n8 BOOTIF=01-00-25-90-75-b5-c8 ProcVersionSignature: Ubuntu 3.13.0-24.47-generic 3.13.9 Tags: trusty apparmor apparmor apparmor apparmor apparmor Uname: Linux 3.13.0-24-generic x86_64 UpgradeStatus: No upgrade log present (probably fresh install) UserGroups: _MarkForUpload: True modified.conffile..etc.default.libvirt.bin: [modified] modified.conffile..etc.libvirt.libvirtd.conf: [modified] modified.conffile..etc.libvirt.qemu.conf: [modified] modified.conffile..etc.libvirt.qemu.networks.default.xml: [deleted] mtime.conffile..etc.default.libvirt.bin: 2014-05-12T19:07:40.020662 mtime.conffile..etc.libvirt.libvirtd.conf: 2014-05-13T14:40:25.894837 mtime.conffile..etc.libvirt.qemu.conf: 2014-05-12T18:58:27.885506 To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1297218/+subscriptions
Re: [Qemu-devel] [PATCH v4] block/null: Latency simulation by adding new option "latency-ns"
On 03/18/2015 12:49 AM, Fam Zheng wrote: > Aio context switch should just work because the requests will be > drained, so the scheduled timer(s) on the old context will be freed. > > Signed-off-by: Fam Zheng > > --- > v4: If errp is not null, return -EINVAL. [Kevin] > --- > block/null.c | 59 > ++-- > qapi/block-core.json | 5 - > 2 files changed, 57 insertions(+), 7 deletions(-) Reviewed-by: Eric Blake -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [PATCH 2/3] block: Add QMP support for streaming to an intermediate layer
On Thu, Mar 12, 2015 at 04:45:17PM +0100, Kevin Wolf wrote: > > One issue that I'm finding is that when we move the block-stream > > job to an intermediate node, where the device name is empty, we > > get messages like "Device '' is busy". > My first thought was "then make it 'Source/Target device is busy' > without mentioning any name". In the context of any given command, > it would still be clear which BDS is meant. In fact, I have argued > before that mentioning the device name in an error to a command that > refers to a specific device is redundant and should be avoided. > > The problem here is that it's not stream_start() that generates the > error, but block_job_create(), which doesn't know which role it's bs > argument has for the block job. So it can't decide whether to say > "source device", "target device" or something completely different. The problem is actually not there. The error message generated by block_job_create() is "block device is in use by block job: stream". It's bdrv_op_is_blocked() that adds the extra "Device '' is busy". error_setg(errp, "Device '%s' is busy: %s", bdrv_get_device_name(bs), error_get_pretty(blocker->reason)); I can use the same approach as in the BlockJobInfo case and fall back to the node name if the device name is empty, but the problem is that bdrv_get_device_name() is used all over the place, so this probably needs a more general solution. Even at the moment the backing blocker set by bdrv_set_backing_hd() has problems: error_setg(&bs->backing_blocker, "device is used as backing hd of '%s'", bdrv_get_device_name(bs)); This only works if 'bs' is a root node, but if you try to perform an operation on the backing image of another backing image, you get a "device is used as backing hd of ''". Error messages aside, I would probably need to check all uses of bdrv_get_device_name() because there could be more surprises if the node is not at the root. Berto
Re: [Qemu-devel] [PATCH 0/2] virtio len fixes for qemu.
On Wed, Mar 11, 2015 at 04:29:30PM +1030, Rusty Russell wrote: > The virtio 'used' ring describes descriptors which have been used. It > also says how many bytes have been written to the ring. For some cases, > this value is ignored by Linux guests, thus errors have not been noticed. > I was working on increasing the checking in Linux when I noticed this > behaviour. > > The first patch changes the 'len' formal parameter name to 'len_written' to > make the API clearer, and adds an assert(). The second fixes block writes. > > Cheers, > Rusty. > PS. It's based on MST's virtio-1.0 tree, but should be easily ported. After going back and forth on this, I decided it's best to defer this change to 2.4. Guests can't depend on this behaviour without checking virtio-1 anyway. > Rusty Russell (2): > virtio: make it clear that "len" for a used descriptor is len written. > virtio-blk: fix length calculations for write operations. > > hw/block/virtio-blk.c | 9 - > hw/virtio/virtio.c | 19 --- > include/hw/virtio/virtio.h | 4 ++-- > 3 files changed, 22 insertions(+), 10 deletions(-) > > -- > 2.1.0
[Qemu-devel] [PULL 1/8] virtio: validate the existence of handle_output before calling it
From: Jason Wang We don't validate the existence of handle_output which may let a buggy guest to trigger a SIGSEV easily. E.g: 1) write 10 to queue_sel to a virtio net device with only 1 queue 2) setup an arbitrary pfn 3) then notify queue 10 Fixing this by validating the existence of handle_output before. Cc: qemu-sta...@nongnu.org Cc: Michael S. Tsirkin Signed-off-by: Jason Wang Reviewed-by: Michael S. Tsirkin Signed-off-by: Michael S. Tsirkin Reviewed-by: Don Koch Reviewed-by: Fam Zheng --- hw/virtio/virtio.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c index 3c6e430..17c1260 100644 --- a/hw/virtio/virtio.c +++ b/hw/virtio/virtio.c @@ -759,8 +759,9 @@ void virtio_queue_set_align(VirtIODevice *vdev, int n, int align) void virtio_queue_notify_vq(VirtQueue *vq) { -if (vq->vring.desc) { +if (vq->vring.desc && vq->handle_output) { VirtIODevice *vdev = vq->vdev; + trace_virtio_queue_notify(vdev, vq - vdev->vq, vq); vq->handle_output(vdev, vq); } -- MST
[Qemu-devel] [PULL 0/8] pci, virtio bugfixes for 2.3
The following changes since commit 18bf9e2f379334306530cbfd44218748eceaf67d: virtio-scsi: remove empty wrapper for cmd (2015-03-11 18:24:30 +0100) are available in the git repository at: git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_upstream for you to fetch changes up to ce394947a75296fc10f1676932473e92aa8be11a: pcie_aer: fix comment to match pcie spec (2015-03-18 12:48:21 +0100) pci, virtio bugfixes for 2.3 Just a bunch of bugfixes. Should be nothing remarkable here. Signed-off-by: Michael S. Tsirkin Chen Fan (5): pcie: correct mistaken register bit for End-End TLP Prefix Blocking aer: fix wrong check on expose aer tlp prefix log pcie_aer: fix typos in pcie_aer_inject_error comment aer: fix a wrong init PCI_ERR_COR_STATUS w1cmask type register pci: fix several trivial typos in comment Jason Wang (1): virtio: validate the existence of handle_output before calling it Michael S. Tsirkin (1): pcie_aer: fix comment to match pcie spec Stefan Weil (1): virtio: Fix memory leaks reported by Coverity include/hw/pci/pci.h | 2 +- include/hw/pci/pcie_aer.h | 2 +- include/hw/pci/pcie_regs.h | 2 +- hw/9pfs/virtio-9p-local.c | 28 hw/pci/pcie.c | 2 +- hw/pci/pcie_aer.c | 12 ++-- hw/virtio/virtio.c | 3 ++- 7 files changed, 20 insertions(+), 31 deletions(-)
Re: [Qemu-devel] [PATCH] migration: flush the bdrv before stopping VM
Kevin Wolf wrote: > [ Cc: qemu-block ] > > Am 18.03.2015 um 04:19 hat Li, Liang Z geschrieben: >> > This needs further review/changes on the block layer. >> > >> > First explanation, why I think this don't fix the full problem. >> > Whith this patch, we fix the problem where we have a dirty block layer but >> > basically nothing dirtying the memory on the guest (we are moving the 20 >> > seconds from max_downtime for the blocklayer flush), to 20 seconds until >> > we have decided that the amount of dirty memory is small enough to be >> > transferred during max_downtime. But it is still going to take 20 seconds >> > to >> > flush the block layer, and during that 20 seconds, the amount of memory >> > that >> > can be dirty is HUGE. >> >> It's true. > > What kind of cache is it actually that takes 20s to flush here? That is a very good question. When I meassured this (long, long ago), testing with the same workload, bdrv_flush_all() could take form 100-300ms (what I expected and I can live with that), to several seconds, what I can't live with. Basically (this was around RHEL6 times, whatever upstream were there at the time), my notes of the time point to: aio.c:quemu_aio_wait() . ret = select(max_fd, &rdfds, &wrfds, NULL, NULL); notice that we are doing a select without a timeout in the iothread, bad. I know that the code has changed a lot on that area, the select() don't exist anymore. But this exemplifies the problem, something asks the block layer to do an operation, and it blocks until it finishes, even if this time it is taking more than usual for whatever reason. What I would really is a way to be able to bdrv_flush_all() to take a timeout parameter and return an error case that it is taking too long. > Inside of qemu, we don't cache a whole lot. If you use qcow2, we do use > a writeback cache for some metadata that might need to be written back, > but it is small and shouldn't take any significant time. > > Then we have the kernel page cache, or for some network protocols caches > in the respective libs. This sounds like the right size for a 20s stall, > but don't we require cache.direct=on for live migration anyway for > coherency, i.e. bypassing any such cache? > > Lastly there may be a disk cache, but it's too small either. >> > I think our ouptions are: >> > >> > - tell the block layer at the beggining of migration >> > Hey, we are migrating, could you please start flusing data now, and >> > don't get the caches to grow too much, please, pretty please. >> > (I left the API to the block layer) >> > - Add on that point a new function: >> > bdrvr_flush_all_start() >> > That starts the sending of pages, and we "hope" that by the time that >> > we have migrated all memory, they have also finished (so our last >> > call to block_flush_all() have less work to do) >> > - Add another function: >> > int bdrv_flush_all_timeout(int timeout) >> > that returns if timeout pass, telling if it has migrated all pages or >> > timeout has passed. So we can got back to the iterative stage if it >> > has taken too long. > > The problem is that the block layer really doesn't have an option to > control what is getting synced once the data is cached outside of qemu. > Essentially we can do an fdatasync() or we can leave it, that's the only > choice we have. See my explanation, if all that qemu is doing is an fdatasync(), just spawn a thread that do the fdatasync() and if the thread don't finish in time, just return one error. You can implement that behaviour whatever you want. > Now doing an asynchronous fdatasync() in the background is completely > reasonable in order to reduce the amount of data to be flushed later. > But the patch is doing it while holding both the BQL and the AIOContext > lock of the device, which doesn't feel right. Maybe it should schedule a > BH in the AIOContext instead and flush from there asynchronously. Position is wrong, definitelly. We want to start the asynchronous fdatasync() at the start of migration, or each X milliseconds. At this point, we *think* that we can finish the migration on max_downtime (basically we are ignoring the time that is going to take to migrate device state and do the block layer flush, but normally this takes in the other of 100-200ms, so it don't matter at all). > The other thing is that flushing once doesn't mean that new data isn't > accumulating in the cache, especially if you decide to do the background > flush at the start of the migration. But "pontentially", we would arrive to this point with less "cached" data everything on the system. > The obvious way to avoid that would be to switch to a writethrough mode, > so any write request goes directly to the disk. This will, however, > impact performance so heavily that it's probably not a realistic option. > > An alternative approach could be to repeat the background flush > periodically, either time based or after every x bytes that are wri
[Qemu-devel] [PULL 4/8] aer: fix wrong check on expose aer tlp prefix log
From: Chen Fan when specify TLP Prefix log as using pcie_aer_inject_error, the TLP prefix log is always discarded. because the check is incorrect, the End-End TLP Prefix Supported bit (PCI_EXP_DEVCAP2_EETLPP) should be in Device Capabilities 2 Register. Signed-off-by: Chen Fan Reviewed-by: Michael S. Tsirkin Signed-off-by: Michael S. Tsirkin --- hw/pci/pcie_aer.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hw/pci/pcie_aer.c b/hw/pci/pcie_aer.c index 5a25c32..c7fad34 100644 --- a/hw/pci/pcie_aer.c +++ b/hw/pci/pcie_aer.c @@ -433,7 +433,7 @@ static void pcie_aer_update_log(PCIDevice *dev, const PCIEAERErr *err) } if ((err->flags & PCIE_AER_ERR_TLP_PREFIX_PRESENT) && -(pci_get_long(dev->config + dev->exp.exp_cap + PCI_EXP_DEVCTL2) & +(pci_get_long(dev->config + dev->exp.exp_cap + PCI_EXP_DEVCAP2) & PCI_EXP_DEVCAP2_EETLPP)) { for (i = 0; i < ARRAY_SIZE(err->prefix); ++i) { /* 7.10.12 tlp prefix log register */ -- MST
[Qemu-devel] [PULL 5/8] pcie_aer: fix typos in pcie_aer_inject_error comment
From: Chen Fan Refer to "PCI Express Base Spec3.0", this comments can't fit the description in spec, so we should fix them. Signed-off-by: Chen Fan Reviewed-by: Michael S. Tsirkin Signed-off-by: Michael S. Tsirkin --- hw/pci/pcie_aer.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/hw/pci/pcie_aer.c b/hw/pci/pcie_aer.c index c7fad34..9daebc2 100644 --- a/hw/pci/pcie_aer.c +++ b/hw/pci/pcie_aer.c @@ -618,11 +618,11 @@ static bool pcie_aer_inject_uncor_error(PCIEAERInject *inj, bool is_fatal) * non-Function specific error must be recorded in all functions. * It is the responsibility of the caller of this function. * It is also caller's responsibility to determine which function should - * report the rerror. + * report the error. * * 6.2.4 Error Logging - * 6.2.5 Sqeunce of Device Error Signaling and Logging Operations - * table 6-2: Flowchard Showing Sequence of Device Error Signaling and Logging + * 6.2.5 Sequence of Device Error Signaling and Logging Operations + * table 6-2: Flowchart Showing Sequence of Device Error Signaling and Logging *Operations */ int pcie_aer_inject_error(PCIDevice *dev, const PCIEAERErr *err) -- MST
[Qemu-devel] [PULL 2/8] virtio: Fix memory leaks reported by Coverity
From: Stefan Weil All four leaks are similar, so fix them in one patch. Signed-off-by: Stefan Weil Reviewed-by: Michael S. Tsirkin Signed-off-by: Michael S. Tsirkin --- hw/9pfs/virtio-9p-local.c | 28 1 file changed, 8 insertions(+), 20 deletions(-) diff --git a/hw/9pfs/virtio-9p-local.c b/hw/9pfs/virtio-9p-local.c index d05c917..d66abcd 100644 --- a/hw/9pfs/virtio-9p-local.c +++ b/hw/9pfs/virtio-9p-local.c @@ -488,7 +488,7 @@ static int local_mknod(FsContext *fs_ctx, V9fsPath *dir_path, int err = -1; int serrno = 0; V9fsString fullname; -char *buffer; +char *buffer = NULL; v9fs_string_init(&fullname); v9fs_string_sprintf(&fullname, "%s/%s", dir_path->data, name); @@ -499,7 +499,6 @@ static int local_mknod(FsContext *fs_ctx, V9fsPath *dir_path, buffer = rpath(fs_ctx, path); err = mknod(buffer, SM_LOCAL_MODE_BITS|S_IFREG, 0); if (err == -1) { -g_free(buffer); goto out; } err = local_set_xattr(buffer, credp); @@ -512,7 +511,6 @@ static int local_mknod(FsContext *fs_ctx, V9fsPath *dir_path, buffer = rpath(fs_ctx, path); err = mknod(buffer, SM_LOCAL_MODE_BITS|S_IFREG, 0); if (err == -1) { -g_free(buffer); goto out; } err = local_set_mapped_file_attr(fs_ctx, path, credp); @@ -525,7 +523,6 @@ static int local_mknod(FsContext *fs_ctx, V9fsPath *dir_path, buffer = rpath(fs_ctx, path); err = mknod(buffer, credp->fc_mode, credp->fc_rdev); if (err == -1) { -g_free(buffer); goto out; } err = local_post_create_passthrough(fs_ctx, path, credp); @@ -539,8 +536,8 @@ static int local_mknod(FsContext *fs_ctx, V9fsPath *dir_path, err_end: remove(buffer); errno = serrno; -g_free(buffer); out: +g_free(buffer); v9fs_string_free(&fullname); return err; } @@ -552,7 +549,7 @@ static int local_mkdir(FsContext *fs_ctx, V9fsPath *dir_path, int err = -1; int serrno = 0; V9fsString fullname; -char *buffer; +char *buffer = NULL; v9fs_string_init(&fullname); v9fs_string_sprintf(&fullname, "%s/%s", dir_path->data, name); @@ -563,7 +560,6 @@ static int local_mkdir(FsContext *fs_ctx, V9fsPath *dir_path, buffer = rpath(fs_ctx, path); err = mkdir(buffer, SM_LOCAL_DIR_MODE_BITS); if (err == -1) { -g_free(buffer); goto out; } credp->fc_mode = credp->fc_mode|S_IFDIR; @@ -576,7 +572,6 @@ static int local_mkdir(FsContext *fs_ctx, V9fsPath *dir_path, buffer = rpath(fs_ctx, path); err = mkdir(buffer, SM_LOCAL_DIR_MODE_BITS); if (err == -1) { -g_free(buffer); goto out; } credp->fc_mode = credp->fc_mode|S_IFDIR; @@ -590,7 +585,6 @@ static int local_mkdir(FsContext *fs_ctx, V9fsPath *dir_path, buffer = rpath(fs_ctx, path); err = mkdir(buffer, credp->fc_mode); if (err == -1) { -g_free(buffer); goto out; } err = local_post_create_passthrough(fs_ctx, path, credp); @@ -604,8 +598,8 @@ static int local_mkdir(FsContext *fs_ctx, V9fsPath *dir_path, err_end: remove(buffer); errno = serrno; -g_free(buffer); out: +g_free(buffer); v9fs_string_free(&fullname); return err; } @@ -659,7 +653,7 @@ static int local_open2(FsContext *fs_ctx, V9fsPath *dir_path, const char *name, int err = -1; int serrno = 0; V9fsString fullname; -char *buffer; +char *buffer = NULL; /* * Mark all the open to not follow symlinks @@ -675,7 +669,6 @@ static int local_open2(FsContext *fs_ctx, V9fsPath *dir_path, const char *name, buffer = rpath(fs_ctx, path); fd = open(buffer, flags, SM_LOCAL_MODE_BITS); if (fd == -1) { -g_free(buffer); err = fd; goto out; } @@ -690,7 +683,6 @@ static int local_open2(FsContext *fs_ctx, V9fsPath *dir_path, const char *name, buffer = rpath(fs_ctx, path); fd = open(buffer, flags, SM_LOCAL_MODE_BITS); if (fd == -1) { -g_free(buffer); err = fd; goto out; } @@ -706,7 +698,6 @@ static int local_open2(FsContext *fs_ctx, V9fsPath *dir_path, const char *name, buffer = rpath(fs_ctx, path); fd = open(buffer, flags, credp->fc_mode); if (fd == -1) { -g_free(buffer); err = fd; goto out; } @@ -724,8 +715,8 @@ err_end: close(fd); remove(buffer); errno = serrno; -g_free(buffer); out: +g_free(buffer); v9fs_string_free(&fullname); return err; } @@ -738,7 +729,7 @@ static int local_symlink(FsContext *fs_ctx, const char *oldpath, int serrno = 0; char *newpath; V9fsString fullname; -
[Qemu-devel] [PULL 8/8] pcie_aer: fix comment to match pcie spec
Code comment says "table 6-2" but in fact it's is not a table, it is "Figure 6-2" on page 479. Cc: Chen Fan Reported-by: Michael Tokarev Signed-off-by: Michael S. Tsirkin --- hw/pci/pcie_aer.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/hw/pci/pcie_aer.c b/hw/pci/pcie_aer.c index 9126058..eaa3e6e 100644 --- a/hw/pci/pcie_aer.c +++ b/hw/pci/pcie_aer.c @@ -622,8 +622,8 @@ static bool pcie_aer_inject_uncor_error(PCIEAERInject *inj, bool is_fatal) * * 6.2.4 Error Logging * 6.2.5 Sequence of Device Error Signaling and Logging Operations - * table 6-2: Flowchart Showing Sequence of Device Error Signaling and Logging - *Operations + * Figure 6-2: Flowchart Showing Sequence of Device Error Signaling and Logging + * Operations */ int pcie_aer_inject_error(PCIDevice *dev, const PCIEAERErr *err) { -- MST
[Qemu-devel] [PULL 3/8] pcie: correct mistaken register bit for End-End TLP Prefix Blocking
From: Chen Fan from pcie spec 7.8.17, the End-End TLP Prefix Blocking bit local is 15(e.g. 0x8000) in device control 2 register. Signed-off-by: Chen Fan Reviewed-by: Michael S. Tsirkin Signed-off-by: Michael S. Tsirkin --- include/hw/pci/pcie_regs.h | 2 +- hw/pci/pcie.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/include/hw/pci/pcie_regs.h b/include/hw/pci/pcie_regs.h index 652d9fc..848ab1c 100644 --- a/include/hw/pci/pcie_regs.h +++ b/include/hw/pci/pcie_regs.h @@ -72,7 +72,7 @@ #define PCI_EXP_DEVCAP2_EFF 0x10 #define PCI_EXP_DEVCAP2_EETLPP 0x20 -#define PCI_EXP_DEVCTL2_EETLPPB 0x80 +#define PCI_EXP_DEVCTL2_EETLPPB 0x8000 /* ARI */ #define PCI_ARI_VER 1 diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c index 1a1..1463e65 100644 --- a/hw/pci/pcie.c +++ b/hw/pci/pcie.c @@ -84,7 +84,7 @@ int pcie_cap_init(PCIDevice *dev, uint8_t offset, uint8_t type, uint8_t port) pci_set_long(exp_cap + PCI_EXP_DEVCAP2, PCI_EXP_DEVCAP2_EFF | PCI_EXP_DEVCAP2_EETLPP); -pci_set_word(dev->wmask + pos, PCI_EXP_DEVCTL2_EETLPPB); +pci_set_word(dev->wmask + pos + PCI_EXP_DEVCTL2, PCI_EXP_DEVCTL2_EETLPPB); return pos; } -- MST
[Qemu-devel] [PULL 7/8] pci: fix several trivial typos in comment
From: Chen Fan Signed-off-by: Chen Fan Reviewed-by: Michael S. Tsirkin Signed-off-by: Michael S. Tsirkin --- include/hw/pci/pci.h | 2 +- include/hw/pci/pcie_aer.h | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h index be2d9b8..b97c295 100644 --- a/include/hw/pci/pci.h +++ b/include/hw/pci/pci.h @@ -137,7 +137,7 @@ enum { #define PCI_CONFIG_HEADER_SIZE 0x40 /* Size of the standard PCI config space */ #define PCI_CONFIG_SPACE_SIZE 0x100 -/* Size of the standart PCIe config space: 4KB */ +/* Size of the standard PCIe config space: 4KB */ #define PCIE_CONFIG_SPACE_SIZE 0x1000 #define PCI_NUM_PINS 4 /* A-D */ diff --git a/include/hw/pci/pcie_aer.h b/include/hw/pci/pcie_aer.h index bcac80a..2fb8388 100644 --- a/include/hw/pci/pcie_aer.h +++ b/include/hw/pci/pcie_aer.h @@ -51,7 +51,7 @@ struct PCIEAERLog { PCIEAERErr *log; }; -/* aer error message: error signaling message has only error sevirity and +/* aer error message: error signaling message has only error severity and source id. See 2.2.8.3 error signaling messages */ struct PCIEAERMsg { /* -- MST
[Qemu-devel] [PULL 6/8] aer: fix a wrong init PCI_ERR_COR_STATUS w1cmask type register
From: Chen Fan Error Status Register, so this patch fix a wrong definition for PCI_ERR_COR_STATUS register with w1cmask type. Signed-off-by: Chen Fan Reviewed-by: Michael S. Tsirkin Signed-off-by: Michael S. Tsirkin --- hw/pci/pcie_aer.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hw/pci/pcie_aer.c b/hw/pci/pcie_aer.c index 9daebc2..9126058 100644 --- a/hw/pci/pcie_aer.c +++ b/hw/pci/pcie_aer.c @@ -123,7 +123,7 @@ int pcie_aer_init(PCIDevice *dev, uint16_t offset) PCI_ERR_UNC_SUPPORTED); pci_long_test_and_set_mask(dev->w1cmask + offset + PCI_ERR_COR_STATUS, - PCI_ERR_COR_STATUS); + PCI_ERR_COR_SUPPORTED); pci_set_long(dev->config + offset + PCI_ERR_COR_MASK, PCI_ERR_COR_MASK_DEFAULT); -- MST