Re: [Qemu-devel] [PATCH v9] spec: add qcow2 bitmaps extension specification
On Tue, 02/02 09:35, Vladimir Sementsov-Ogievskiy wrote: > The new feature for qcow2: storing bitmaps. > > This patch adds new header extension to qcow2 - Bitmaps Extension. It > provides an ability to store virtual disk related bitmaps in a qcow2 > image. For now there is only one type of such bitmaps: Dirty Tracking > Bitmap, which just tracks virtual disk changes from some moment. > > Note: Only bitmaps, relative to the virtual disk, stored in qcow2 file, > should be stored in this qcow2 file. The size of each bitmap > (considering its granularity) is equal to virtual disk size. > > Signed-off-by: Vladimir Sementsov-Ogievskiy > --- > > v9 > - rewordings, thanks to Max > > v8 > - rewordings > - bitmap_directory_size: 4b -> 8b > - add more descriptive description in == Bitmaps == section > - add paragraph "Dirty tracking bitmaps" > > Bitmap directory entry: > - extra data should not allocate additional clusters > - padding must be all-bytes-zero > - add extra_data_compatible flag (now behavior in case of unknown > extra data is defined by this flag) > > v7: > > - Rewordings, grammar. > Max, Eric, John, thank you very much. > > - add last paragraph: remaining bits in bitmap data clusters must be > zero. > > - s/Bitmap Directory/bitmap directory/ and other names like this at > the request of Max. > > v6: > > - reword bitmap_directory_size description > - bitmap type: make 0 reserved > - extra_data_size: resize to 4bytes > Also, I've marked this field as "must be zero". We can always change > it, if we decide allowing managing app to specify any extra data, by > defining some magic value as a top of user extra data.. So, for now > non zeor extra_data_size should be considered as an error. > - swap name and extra_data to give good alignment to extra_data. > > > v5: > > - 'Dirty bitmaps' renamed to 'Bitmaps', as we may have several types of > bitmaps. > - rewordings > - move upper bounds to "Notes about Qemu limits" > - s/should/must somewhere. (but not everywhere) > - move name_size field closer to name itself in bitmap header > - add extra data area to bitmap header > - move bitmap data description to separate section > > > docs/specs/qcow2.txt | 223 > ++- > 1 file changed, 222 insertions(+), 1 deletion(-) > > diff --git a/docs/specs/qcow2.txt b/docs/specs/qcow2.txt > index f236d8c..db5e666 100644 > --- a/docs/specs/qcow2.txt > +++ b/docs/specs/qcow2.txt > @@ -103,7 +103,18 @@ in the description of a field. > write to an image with unknown auto-clear features if it > clears the respective bits from this field first. > > -Bits 0-63: Reserved (set to 0) > +Bit 0: Bitmaps extension bit > +This bit indicates consistency for the > bitmaps > +extension data. > + > +It is an error if this bit is set without the > +bitmaps extension present. > + > +If the bitmaps extension is present but this > +bit is unset, the bitmaps extension data > must be > +considered inconsistent. > + > +Bits 1-63: Reserved (set to 0) > > 96 - 99: refcount_order > Describes the width of a reference count block entry > (width > @@ -123,6 +134,7 @@ be stored. Each extension has a structure like the > following: > 0x - End of the header extension area > 0xE2792ACA - Backing file format name > 0x6803f857 - Feature name table > +0x23852875 - Bitmaps extension > other - Unknown header extension, can be safely > ignored > > @@ -166,6 +178,36 @@ the header extension data. Each entry look like this: > terminated if it has full length) > > > +== Bitmaps extension == > + > +The bitmaps extension is an optional header extension. It provides the > ability > +to store bitmaps related to a virtual disk. For now, there is only one bitmap > +type: the dirty tracking bitmap, which tracks virtual disk changes from some > +point in time. > + > +The data of the extension should be considered consistent only if the > +corresponding auto-clear feature bit is set, see autoclear_features above. > + > +The fields of the bitmaps extension are: > + > +Byte 0 - 3: nb_bitmaps > + The number of bitmaps contained in the image. Must be > + greater than or equal to 1. > + > + Note: Qemu currently only supports up to 65535 bitmaps per > + image. > + > + 4 - 7: Reserved, must be zero. > + > +
[Qemu-devel] RE: [iGVT-g] VFIO based vGPU(was Re: [Announcement] 2015-Q3 release of XenGT - a Mediated ...)
> From: Zhiyuan Lv > Sent: Tuesday, February 02, 2016 3:35 PM > > Hi Gerd/Alex, > > On Mon, Feb 01, 2016 at 02:44:55PM -0700, Alex Williamson wrote: > > On Mon, 2016-02-01 at 14:10 +0100, Gerd Hoffmann wrote: > > > Hi, > > > > > > > > Unfortunately it's not the only one. Another example is, device-model > > > > > may want to write-protect a gfn (RAM). In case that this request goes > > > > > to VFIO .. how it is supposed to reach KVM MMU? > > > > > > > > Well, let's work through the problem. How is the GFN related to the > > > > device? Is this some sort of page table for device mappings with a base > > > > register in the vgpu hardware? > > > > > > IIRC this is needed to make sure the guest can't bypass execbuffer > > > verification and works like this: > > > > > > (1) guest submits execbuffer. > > > (2) host makes execbuffer readonly for the guest > > > (3) verify the buffer (make sure it only accesses resources owned by > > > the vm). > > > (4) pass on execbuffer to the hardware. > > > (5) when the gpu is done with it make the execbuffer writable again. > > > > Ok, so are there opportunities to do those page protections outside of > > KVM? We should be able to get the vma for the buffer, can we do > > something with that to make it read-only. Alternatively can the vgpu > > driver copy it to a private buffer and hardware can execute from that? > > I'm not a virtual memory expert, but it doesn't seem like an > > insurmountable problem. Thanks, > > Originally iGVT-g used write-protection for privilege execbuffers, as Gerd > described. Now the latest implementation has removed wp to do buffer copy > instead, since the privilege command buffers are usually small. So that part > is fine. > > But we need write-protection for graphics page table shadowing as well. Once > guest driver modifies gpu page table, we need to know that and manipulate > shadow page table accordingly. buffer copy cannot help here. Thanks! > After walking through the whole thread again, let me do a summary here so everyone can be on the same page. First, Jike told me before his vacation, that we cannot do any change to KVM module according to community comments. Now I think it's not true. We can do necessary changes, as long as it is done in a structural/layered approach, w/o hard assumption on KVMGT as the only user. That's the guideline we need to obey. :-) Mostly we care about two aspects regarding to a vgpu driver: - services/callbacks which vgpu driver provides to external framework (e.g. vgpu core driver and VFIO); - services/callbacks which vgpu driver relies on for proper emulation (e.g. from VFIO and/or hypervisor); The former is being discussed in another thread. So here let's focus on the latter. In general Intel GVT-g requires below services for emulation: 1) Selectively pass-through a region to a VM -- This can be supported by today's VFIO framework, by setting VFIO_REGION_INFO_FLAG_MMAP for concerned regions. Then Qemu will mmap that region which will finally be added to the EPT table of the target VM 2) Trap-and-emulate a region -- Similarly, this can be easily achieved by clearing MMAP flag for concerned regions. Then every access from VM will go through Qemu and then VFIO and finally reach vgpu driver. The only concern is in the performance part. We need some general mechanism to allow delivering I/O emulation request directly from KVM in kernel. For example, Alex mentioned some flavor based on file descriptor + offset. Likely let's move forward with the default Qemu forwarding, while brainstorming exit-less delivery in parallel. 3) Inject a virtual interrupt -- We can leverage existing VFIO IRQ injection interface, including configuration and irqfd interface. 4) Map/unmap guest memory -- It's there for KVM. 5) Pin/unpin guest memory -- IGD or any PCI passthru should have same requirement. So we should be able to leverage existing code in VFIO. The only tricky thing (Jike may elaborate after he is back), is that KVMGT requires to pin EPT entry too, which requires some further change in KVM side. But I'm not sure whether it still holds true after some design changes made in this thread. So I'll leave to Jike to further comment. 6) Write-protect a guest memory page -- The primary purpose is for GPU page table shadowing. We need to track modifications on guest GPU page table, so shadow part can be synchronized accordingly. Just think about CPU page table shadowing. And old example as Zhiyuan pointed out, is to write-protect guest cmd buffer. But it becomes not necessary now. So we need KVM to provide an interface so some agents can request such write-protection action (not just for KVMGT. could be for other tracking usages). Guangrong has been working on a general page tracking mechanism, upon which write-protection can be easily built on. The review is still in progress. 7) GPA->IOVA/HVA translation -- It's required in various places, e.g.: - read a guest structure accord
Re: [Qemu-devel] [PATCH v2 0/6] external backup api
On Sat, 01/30 13:56, Vladimir Sementsov-Ogievskiy wrote: > Hi all. > > These series which aims to add external backup api. This is needed to allow > backup software use our dirty bitmaps. > > Vmware and Parallels Cloud Server have this feature. What is the advantage of this appraoch over "drive-backup sync=incremental ..."? > > There are three things are done: > - add query-block-dirty-bitmap-ranges qmp command > - add qmp commands for dirty-bitmap functions: create_successor, abdicate, > reclaim. > - make create-successor command transaction-able > > Then, external backup should be done like this: > > 1. qmp transaction { > external-snapshot > bitmap-create-successor > } > > 2. qmp query frozen bitmap, not acquiring aio context. What do you do with the query response, read the snapshot image with an external tool? Fam > > 3. do external backup, using snapshot and bitmap > > > 4. if (success backup) > qmp bitmap-abdicate > else > qmp bitmap-reclaime > > 5. qmp merge snapshot > > > v2: a lot of additions and changes, no sense to compare with v1 > > > Vladimir Sementsov-Ogievskiy (6): > block dirty bitmap: add next_zero function > qmp: add query-block-dirty-bitmap-ranges > iotests: test query-block-dirty-bitmap-ranges > qapi: add qmp commands for some dirty bitmap functions > qapi: make block-dirty-bitmap-create-successor transaction-able > iotests: test external backup api > > block/dirty-bitmap.c | 60 ++ > blockdev.c | 113 + > include/block/dirty-bitmap.h | 9 > include/qemu/hbitmap.h | 8 +++ > qapi-schema.json | 4 +- > qapi/block-core.json | 90 + > qmp-commands.hx | 118 > +++ > tests/qemu-iotests/150 | 88 > tests/qemu-iotests/150.out | 21 > tests/qemu-iotests/151 | 77 > tests/qemu-iotests/151.out | 7 +++ > tests/qemu-iotests/group | 2 + > util/hbitmap.c | 26 ++ > 13 files changed, 622 insertions(+), 1 deletion(-) > create mode 100755 tests/qemu-iotests/150 > create mode 100644 tests/qemu-iotests/150.out > create mode 100755 tests/qemu-iotests/151 > create mode 100644 tests/qemu-iotests/151.out > > -- > 1.8.3.1 >
Re: [Qemu-devel] [PATCH v9 01/16] block: Release named dirty bitmaps in bdrv_close()
On Fri, 01/29 16:36, Max Reitz wrote: > bdrv_delete() is not very happy about deleting BlockDriverStates with > dirty bitmaps still attached to them. In the past, we got around that > very easily by relying on bdrv_close_all() bypassing bdrv_delete(), and > bdrv_close() simply ignoring that condition. We should fix that by > releasing all named dirty bitmaps in bdrv_close() (there should not be > any unnamed bitmaps left) and moving the assertion from bdrv_delete() > there. > > Signed-off-by: Max Reitz > --- > block.c | 39 +++ > 1 file changed, 31 insertions(+), 8 deletions(-) > > diff --git a/block.c b/block.c > index 5709d3d..41ab00e 100644 > --- a/block.c > +++ b/block.c > @@ -88,6 +88,8 @@ static int bdrv_open_inherit(BlockDriverState **pbs, const > char *filename, > const BdrvChildRole *child_role, Error **errp); > > static void bdrv_dirty_bitmap_truncate(BlockDriverState *bs); > +static void bdrv_release_named_dirty_bitmaps(BlockDriverState *bs); > + > /* If non-zero, use only whitelisted block drivers */ > static int use_bdrv_whitelist; > > @@ -2157,6 +2159,9 @@ void bdrv_close(BlockDriverState *bs) > > notifier_list_notify(&bs->close_notifiers, bs); > > +bdrv_release_named_dirty_bitmaps(bs); > +assert(QLIST_EMPTY(&bs->dirty_bitmaps)); > + > if (bs->blk) { > blk_dev_change_media_cb(bs->blk, false); > } > @@ -2366,7 +2371,6 @@ static void bdrv_delete(BlockDriverState *bs) > assert(!bs->job); > assert(bdrv_op_blocker_is_empty(bs)); > assert(!bs->refcnt); > -assert(QLIST_EMPTY(&bs->dirty_bitmaps)); > > bdrv_close(bs); > > @@ -3582,21 +3586,40 @@ static void > bdrv_dirty_bitmap_truncate(BlockDriverState *bs) > } > } > > -void bdrv_release_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *bitmap) > +static void bdrv_do_release_matching_dirty_bitmap(BlockDriverState *bs, > + BdrvDirtyBitmap *bitmap, > + bool only_named) > { > BdrvDirtyBitmap *bm, *next; > QLIST_FOREACH_SAFE(bm, &bs->dirty_bitmaps, list, next) { > -if (bm == bitmap) { > +if ((!bitmap || bm == bitmap) && (!only_named || bm->name)) { > assert(!bdrv_dirty_bitmap_frozen(bm)); > -QLIST_REMOVE(bitmap, list); > -hbitmap_free(bitmap->bitmap); > -g_free(bitmap->name); > -g_free(bitmap); > -return; > +QLIST_REMOVE(bm, list); > +hbitmap_free(bm->bitmap); > +g_free(bm->name); > +g_free(bm); > + > +if (bitmap) { > +return; > +} > } > } > } > > +void bdrv_release_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *bitmap) > +{ > +bdrv_do_release_matching_dirty_bitmap(bs, bitmap, false); > +} > + > +/** > + * Release all named dirty bitmaps attached to a BDS (for use in > bdrv_close()). > + * There must not be any frozen bitmaps attached. > + */ > +static void bdrv_release_named_dirty_bitmaps(BlockDriverState *bs) > +{ > +bdrv_do_release_matching_dirty_bitmap(bs, NULL, true); > +} > + > void bdrv_disable_dirty_bitmap(BdrvDirtyBitmap *bitmap) > { > assert(!bdrv_dirty_bitmap_frozen(bitmap)); > -- > 2.7.0 > Reviewed-by: Fam Zheng
Re: [Qemu-devel] [RFC PATCH v1 1/1] vGPU core driver : to provide common interface for vGPU.
Hi, > Actually I have a long puzzle in this area. Definitely libvirt will use UUID > to > mark a VM. And obviously UUID is not recorded within KVM. Then how does > libvirt talk to KVM based on UUID? It could be a good reference to this > design. libvirt keeps track which qemu instance belongs to which vm. qemu also gets started with "-uuid ...", so one can query qemu via monitor ("info uuid") to figure what the uuid is. It is also in the smbios tables so the guest can see it in the system information table. The uuid is not visible to the kernel though, the kvm kernel driver doesn't know what the uuid is (and neither does vfio). qemu uses file handles to talk to both kvm and vfio. qemu notifies both kvm and vfio about anything relevant events (guest address space changes etc) and connects file descriptors (eventfd -> irqfd). qemu needs a sysfs node as handle to the vfio device, something like /sys/devices/virtual/vgpu/. can be a uuid if you want have it that way, but it could be pretty much anything. The sysfs node will probably show up as-is in the libvirt xml when assign a vgpu to a vm. So the name should be something stable (i.e. when using a uuid as name you should better not generate a new one on each boot). cheers, Gerd
Re: [Qemu-devel] [PATCH v9 05/16] virtio-scsi: Catch BDS-BB removal/insertion
On Fri, 01/29 16:36, Max Reitz wrote: > Make use of the BDS-BB removal and insertion notifiers to remove or set > up, respectively, virtio-scsi's op blockers. > > Signed-off-by: Max Reitz Reviewed-by: Fam Zheng
Re: [Qemu-devel] RE: [iGVT-g] VFIO based vGPU(was Re: [Announcement] 2015-Q3 release of XenGT - a Mediated ...)
On Wed, Feb 03, 2016 at 08:04:16AM +, Tian, Kevin wrote: > > From: Zhiyuan Lv > > Sent: Tuesday, February 02, 2016 3:35 PM > > > > Hi Gerd/Alex, > > > > On Mon, Feb 01, 2016 at 02:44:55PM -0700, Alex Williamson wrote: > > > On Mon, 2016-02-01 at 14:10 +0100, Gerd Hoffmann wrote: > > > > Hi, > > > > > > > > > > Unfortunately it's not the only one. Another example is, > > > > > > device-model > > > > > > may want to write-protect a gfn (RAM). In case that this request > > > > > > goes > > > > > > to VFIO .. how it is supposed to reach KVM MMU? > > > > > > > > > > Well, let's work through the problem. How is the GFN related to the > > > > > device? Is this some sort of page table for device mappings with a > > > > > base > > > > > register in the vgpu hardware? > > > > > > > > IIRC this is needed to make sure the guest can't bypass execbuffer > > > > verification and works like this: > > > > > > > > (1) guest submits execbuffer. > > > > (2) host makes execbuffer readonly for the guest > > > > (3) verify the buffer (make sure it only accesses resources owned by > > > > the vm). > > > > (4) pass on execbuffer to the hardware. > > > > (5) when the gpu is done with it make the execbuffer writable again. > > > > > > Ok, so are there opportunities to do those page protections outside of > > > KVM? We should be able to get the vma for the buffer, can we do > > > something with that to make it read-only. Alternatively can the vgpu > > > driver copy it to a private buffer and hardware can execute from that? > > > I'm not a virtual memory expert, but it doesn't seem like an > > > insurmountable problem. Thanks, > > > > Originally iGVT-g used write-protection for privilege execbuffers, as Gerd > > described. Now the latest implementation has removed wp to do buffer copy > > instead, since the privilege command buffers are usually small. So that part > > is fine. > > > > But we need write-protection for graphics page table shadowing as well. Once > > guest driver modifies gpu page table, we need to know that and manipulate > > shadow page table accordingly. buffer copy cannot help here. Thanks! > > > > > 4) Map/unmap guest memory > -- > It's there for KVM. > > 5) Pin/unpin guest memory > -- > IGD or any PCI passthru should have same requirement. So we should be > able to leverage existing code in VFIO. The only tricky thing (Jike may > elaborate after he is back), is that KVMGT requires to pin EPT entry too, > which requires some further change in KVM side. But I'm not sure whether > it still holds true after some design changes made in this thread. So I'll > leave to Jike to further comment. > Hi Kevin, I think you should be able to map and pin guest memory via the IOMMU API, not KVM. > Well, then I realize pretty much opens have been covered with a solution > when ending this write-up. Then we should move forward to come up a > prototype upon which we can then identify anything missing or overlooked > (definitely there would be), and also discuss several remaining opens atop > (such as exit-less emulation, pin/unpin, etc.). Another thing we need > to think is whether this new design is still compatible to Xen side. > > Thanks a lot all for the great discussion (especially Alex with many good > inputs)! I believe it becomes much clearer now than 2 weeks ago, about > how to integrate KVMGT with VFIO. :-) > It is great to see you guys are onboard with VFIO solution! As Kirti has mentioned in other threads, let's review the current registration APIs and figure out what we need to add for both solutions. Thanks, Neo > Thanks > Kevin > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html
[Qemu-devel] [PATCH v2] scripts/kvm/kvm_stat: Fix tracefs access checking
On kernels build without CONFIG_TRACING kvm_stat will bail out even when traces are not used. This is not very helpful, especially if the user can't install a new kernel. Instead, we should warn the user and fall back to debugfs statistics. These changes check if trace statistics were selected without kernel support and warn the user. If they were not selected explicitly, we wait five seconds to let the user read the warning and then set the debugfs statistics to True and the tracefs ones to False. Otherwise we exit. Fixes: 7aa4ee5 ('scripts/kvm/kvm_stat: Improve debugfs access checking') Signed-off-by: Janosch Frank --- v1 to v2: Exit if -t is set explicitly scripts/kvm/kvm_stat | 23 ++- 1 file changed, 18 insertions(+), 5 deletions(-) diff --git a/scripts/kvm/kvm_stat b/scripts/kvm/kvm_stat index d43e8f3..4756eee 100755 --- a/scripts/kvm/kvm_stat +++ b/scripts/kvm/kvm_stat @@ -22,6 +22,7 @@ import resource import struct import re from collections import defaultdict +from time import sleep VMX_EXIT_REASONS = { 'EXCEPTION_NMI':0, @@ -778,7 +779,7 @@ def get_providers(options): return providers -def check_access(): +def check_access(options): if not os.path.exists('/sys/kernel/debug'): sys.stderr.write('Please enable CONFIG_DEBUG_FS in your kernel.') sys.exit(1) @@ -790,14 +791,26 @@ def check_access(): "Also ensure, that the kvm modules are loaded.\n") sys.exit(1) -if not os.path.exists(PATH_DEBUGFS_TRACING): -sys.stderr.write("Please make {0} readable by the current user.\n" +if not os.path.exists(PATH_DEBUGFS_TRACING) and (options.tracepoints + or not options.debugfs): +sys.stderr.write("Please enable CONFIG_TRACING in your kernel " + "when using the option -t (default).\n" + "If it is enabled, make {0} readable by the " + "current user.\n" .format(PATH_DEBUGFS_TRACING)) -sys.exit(1) + +if options.tracepoints: +sys.exit(1) + +sys.stderr.write("Falling back to debugfs statistics!\n") +options.debugfs = True +sleep(5) + +return options def main(): -check_access() options = get_options() +options = check_access(options) providers = get_providers(options) stats = Stats(providers, fields=options.fields) -- 2.3.0
Re: [Qemu-devel] [PATCH v4] Add optionrom compatible with fw_cfg DMA version
Hi, > I think the "dma_enabled" property is not exposed to the user. It is: "-global fw_cfg.dma_enabled=off" works (as in: doesn't throw an error). Has no effect through as it gets overridden later on. > The default value of "dma_enabled" in both fw_cfg_io_properties and > fw_cfg_mem_properties is irrelevant; the actual property value is always > overwritten in fw_cfg_init_io_dma() and fw_cfg_init_mem_wide(), which > all of the init paths go through. And IMHO we should not do that, so setting the property actually has an effect. > I agree that DMA capability should be filtered with machine type. > However, that distinction should not be made using the current > "dma_enabled" properties (i.e., of "fw_cfg_io_properties" and > "fw_cfg_mem_properties". Instead, it should be made in the > board-specific callers of fw_cfg_init_(io_dma|mem_wide). Why? cheers, Gerd
Re: [Qemu-devel] [RFC PATCH v2 7/7] vfio/pci: Find and expose Intel IGD OpRegion
On Di, 2016-02-02 at 13:09 -0700, Alex Williamson wrote: > This is provided via a device specific region, look for it on Intel > VGA class devices. Our default mechanism to expose this to the BIOS > is via fw_cfg where it's expected that the BIOS will copy the data > into a reserved RAM area and update the ASL Storage register to > reference the GPA of that buffer. > We also support directly mapping > the OpRegion through to the host in response to the ASL Storage > register write, which makes the data "live" and potentially provides > write access should the underlying vfio region support writes. This should better be splitted into a separate patch. > This > option is automatically enabled if we somehow don't support fw_cfg (is > this a good idea?). I think this can't happen. And even in case it can: we have bigger problems than the opregion then. cheers, Gerd
Re: [Qemu-devel] [PATCH v14 7/8] Implement new driver for block replication
On Wed, Feb 03, 2016 at 09:29:15AM +0800, Wen Congyang wrote: > On 02/02/2016 10:34 PM, Stefan Hajnoczi wrote: > > On Mon, Feb 01, 2016 at 09:13:36AM +0800, Wen Congyang wrote: > >> On 01/29/2016 11:46 PM, Stefan Hajnoczi wrote: > >>> On Fri, Jan 29, 2016 at 11:13:42AM +0800, Changlong Xie wrote: > On 01/28/2016 11:15 PM, Stefan Hajnoczi wrote: > > On Thu, Jan 28, 2016 at 09:13:24AM +0800, Wen Congyang wrote: > >> On 01/27/2016 10:46 PM, Stefan Hajnoczi wrote: > >>> On Wed, Jan 13, 2016 at 05:18:31PM +0800, Changlong Xie wrote: > > I'm concerned that the bdrv_drain_all() in vm_stop() can take a long > > time if the disk is slow/failing. bdrv_drain_all() blocks until all > > in-flight I/O requests have completed. What does the Primary do if the > > Secondary becomes unresponsive? > > Actually, we knew this problem. But currently, there seems no better way > to > resolve it. If you have any ideas? > >>> > >>> Is it possible to hold the checkpoint information and acknowledge the > >>> checkpoint right away, without waiting for bdrv_drain_all() or any > >>> Secondory guest activity to complete? > >> > >> There is no way to know that secondary becomes unreponsive. > > > > I meant whether it is necessary for the Secondary to vm_stop() and apply > > the checkpoint before acknowledging the checkpoint to the Primary? > > I don't understand this. > Here is the COLO checkpoint flow: > > PrimarySecondary > new checkpoint notice ---> > vm_stop() vm_stop() > vm state(device state, memory, cpu) ---> >load state > <--- done > vm_start() vm_start() If the Secondary's vm_stop() call blocks then the Primary is stuck too. I was wondering whether the Secondary can do: <--- done vm_stop() load state It simply receives the checkpoint data into a buffer and immediately replies with "done". vm_stop() and load state is only performed after sending "done". The advantage is that the Primary will not be delayed by the Secondary. It's an approach that doesn't block. But perhaps it's a problem if the Secondary is slower than the Primary since the Secondary still needs to complete vm_stop() and load state before it can resume execution? > >>> I think this really means falling back to microcheckpointing until the > >>> Secondary guest can checkpoint. Instead of a blocking vm_stop() we > >>> would prevent vcpus from running and when the last pending I/O finishes > >>> the Secondary could apply the last checkpoint. This approach does not > >>> block QEMU (the monitor, etc). > >>> > >> > >> If secondary host becomes unresponsive, it means that we cannot do > >> mocrocheckpointing. > >> We should do failover in this case. > > > > This is dangerous because it means that a delay/failure in the Secondary > > would cause the Primary to fail over to the broken Secondary. All the > > more reason not to perform blocking operations on the Secondary in the > > checkpoint code path. > > If the secondary is broken, primary qemu will take over. Does the Primary use a timeout between "new checkpoint notice" and Secondary's "done" so it can move on if the Secondary is unresponsive? Stefan signature.asc Description: PGP signature
Re: [Qemu-devel] [PATCH 05/16] iothread: release AioContext around aio_poll
On Tue, Feb 02, 2016 at 04:01:55PM +0100, Paolo Bonzini wrote: > -BEGIN PGP SIGNED MESSAGE- > Hash: SHA256 > > > > On 02/02/2016 15:52, Stefan Hajnoczi wrote: > >>> @@ -110,6 +111,8 @@ static void *test_acquire_thread(void > >>> *opaque) qemu_mutex_lock(&data->start_lock); > >>> qemu_mutex_unlock(&data->start_lock); > >>> > >>> +g_usleep(50); > > Sleep? > > What about it? :) Sleep in a loop is inefficient but at least correct. A sleep outside a loop is a race condition. If the machine is heavily loaded, whatever you are waiting for may not happen in 500 ms and the test case is unreliable. What is the purpose of this sleep? Stefan signature.asc Description: PGP signature
Re: [Qemu-devel] [PATCH] Fix inconsistency between comment and variable name
Michael Tokarev writes: > 03.02.2016 06:19, Cao jin wrote: >> Signed-off-by: Cao jin >> --- >> include/hw/qdev-core.h | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h >> index abcdee8..42fa5db 100644 >> --- a/include/hw/qdev-core.h >> +++ b/include/hw/qdev-core.h >> @@ -221,7 +221,7 @@ typedef struct BusChild { >> >> /** >> * BusState: >> - * @hotplug_device: link to a hotplug device associated with bus. >> + * @hotplug_handler: link to a hotplug device associated with bus. > > Hmm. Now while the field name in comment and in the structure > do match, the comment is still wrong, since it is a linke to > a handler, not a device… :) Do you mean to suggest the line should be changed to * @hotplug_device: link to a hotplug handler associated with bus. ?
[Qemu-devel] [PATCH 2/2] m68k: Build the opcode table only once to avoid multithreading issues
Signed-off-by: John Paul Adrian Glaubitz --- target-m68k/translate.c | 4 1 file changed, 4 insertions(+) diff --git a/target-m68k/translate.c b/target-m68k/translate.c index 535d7f9..f508d1e 100644 --- a/target-m68k/translate.c +++ b/target-m68k/translate.c @@ -2828,6 +2828,10 @@ register_opcode (disas_proc proc, uint16_t opcode, uint16_t mask) Later insn override earlier ones. */ void register_m68k_insns (CPUM68KState *env) { +/* Build the opcode table only once to avoid + issues with multithreading. */ +if(opcode_table[0] != NULL) + return; #define INSN(name, opcode, mask, feature) do { \ if (m68k_feature(env, M68K_FEATURE_##feature)) \ register_opcode(disas_##name, 0x##opcode, 0x##mask); \ -- 2.7.0
[Qemu-devel] [PATCH 1/2] m68k: Fix opcode mask for fbcc instruction
Signed-off-by: John Paul Adrian Glaubitz --- target-m68k/translate.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/target-m68k/translate.c b/target-m68k/translate.c index 342c040..535d7f9 100644 --- a/target-m68k/translate.c +++ b/target-m68k/translate.c @@ -2940,7 +2940,7 @@ void register_m68k_insns (CPUM68KState *env) INSN(shift_reg, e0a0, f0f0, CF_ISA_A); INSN(undef_fpu, f000, f000, CF_ISA_A); INSN(fpu, f200, ffc0, CF_FPU); -INSN(fbcc, f280, ffc0, CF_FPU); +INSN(fbcc, f280, ff80, CF_FPU); INSN(frestore, f340, ffc0, CF_FPU); INSN(fsave, f340, ffc0, CF_FPU); INSN(intouch, f340, ffc0, CF_ISA_A); -- 2.7.0
[Qemu-devel] m68k: More bug fixes for translation code
Hi Laurent! As promised, here are the fixes for the two recently discovered bugs in the m68k translation code. The first patch fixes the opcode mask for the fbcc instruction which is currently incorrect as it masks the 6th bit as constant (0xffc0). However, according to the ColdFire reference manual, this bit is used to determine the size of the displacement for the jump, either 16 or 32 bits: > http://www.nxp.com/files/dsp/doc/ref_manual/CFPRM.pdf (p. 229) Looking at DISAS_INSN(fbcc), the emulated instruction actually tests for the 6th bit and sets the offset accordingly. However, since the current opcode mask ignores this bit, long jumps can never work. In fact, what we actually see is an illegal instruction: 0xf2e0. Changing the opcode mask to 0xff80 makes the 6th bit variable and allows long jumps to work as expected. The second patch addresses a problem with the thread safety of register_m68k_insns(). It turns out, that the opcode table is rebuild for every thread that is started which means that in a multithreaded environment, one thread can destroy the opcode table of a concurrent thread which makes this thread crash with an illegal instruction. This patch changes register_m68k_insns() such that it returns without doing anything in case the opcode table has already been built and re-registering the instructions is therefore not necessary but rather harmful. Credits go to Michael Karcher for helping to debug these issues! Cheers, Adrian -- .''`. John Paul Adrian Glaubitz : :' : Debian Developer - glaub...@debian.org `. `' Freie Universitaet Berlin - glaub...@physik.fu-berlin.de `-GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913
Re: [Qemu-devel] [PATCH 1/2] m68k: Fix opcode mask for fbcc instruction
Le 03/02/2016 10:37, John Paul Adrian Glaubitz a écrit : > Signed-off-by: John Paul Adrian Glaubitz > --- > target-m68k/translate.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/target-m68k/translate.c b/target-m68k/translate.c > index 342c040..535d7f9 100644 > --- a/target-m68k/translate.c > +++ b/target-m68k/translate.c > @@ -2940,7 +2940,7 @@ void register_m68k_insns (CPUM68KState *env) > INSN(shift_reg, e0a0, f0f0, CF_ISA_A); > INSN(undef_fpu, f000, f000, CF_ISA_A); > INSN(fpu, f200, ffc0, CF_FPU); > -INSN(fbcc, f280, ffc0, CF_FPU); > +INSN(fbcc, f280, ff80, CF_FPU); > INSN(frestore, f340, ffc0, CF_FPU); > INSN(fsave, f340, ffc0, CF_FPU); > INSN(intouch, f340, ffc0, CF_ISA_A); > Reviewed-by: Laurent Vivier
Re: [Qemu-devel] [PATCH 2/2] m68k: Build the opcode table only once to avoid multithreading issues
Le 03/02/2016 10:37, John Paul Adrian Glaubitz a écrit : > Signed-off-by: John Paul Adrian Glaubitz > --- > target-m68k/translate.c | 4 > 1 file changed, 4 insertions(+) > > diff --git a/target-m68k/translate.c b/target-m68k/translate.c > index 535d7f9..f508d1e 100644 > --- a/target-m68k/translate.c > +++ b/target-m68k/translate.c > @@ -2828,6 +2828,10 @@ register_opcode (disas_proc proc, uint16_t opcode, > uint16_t mask) > Later insn override earlier ones. */ > void register_m68k_insns (CPUM68KState *env) > { > +/* Build the opcode table only once to avoid > + issues with multithreading. */ > +if(opcode_table[0] != NULL) > + return; > #define INSN(name, opcode, mask, feature) do { \ > if (m68k_feature(env, M68K_FEATURE_##feature)) \ > register_opcode(disas_##name, 0x##opcode, 0x##mask); \ > Reviewed-by: Laurent Vivier
Re: [Qemu-devel] [PATCH 1/2] m68k: Fix opcode mask for fbcc instruction
Strange. There should be a cover letter coming along as well which explains my changes. Did you get it? On 02/03/2016 10:37 AM, John Paul Adrian Glaubitz wrote: > Signed-off-by: John Paul Adrian Glaubitz > --- > target-m68k/translate.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/target-m68k/translate.c b/target-m68k/translate.c > index 342c040..535d7f9 100644 > --- a/target-m68k/translate.c > +++ b/target-m68k/translate.c > @@ -2940,7 +2940,7 @@ void register_m68k_insns (CPUM68KState *env) > INSN(shift_reg, e0a0, f0f0, CF_ISA_A); > INSN(undef_fpu, f000, f000, CF_ISA_A); > INSN(fpu, f200, ffc0, CF_FPU); > -INSN(fbcc, f280, ffc0, CF_FPU); > +INSN(fbcc, f280, ff80, CF_FPU); > INSN(frestore, f340, ffc0, CF_FPU); > INSN(fsave, f340, ffc0, CF_FPU); > INSN(intouch, f340, ffc0, CF_ISA_A); > -- .''`. John Paul Adrian Glaubitz : :' : Debian Developer - glaub...@debian.org `. `' Freie Universitaet Berlin - glaub...@physik.fu-berlin.de `-GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913
Re: [Qemu-devel] [PATCH 1/2] m68k: Fix opcode mask for fbcc instruction
Le 03/02/2016 10:40, John Paul Adrian Glaubitz a écrit : > Strange. There should be a cover letter coming along as well which > explains my changes. Did you get it? We have the cover letter, but it is never sent to the sender :) Laurent > > On 02/03/2016 10:37 AM, John Paul Adrian Glaubitz wrote: >> Signed-off-by: John Paul Adrian Glaubitz >> --- >> target-m68k/translate.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/target-m68k/translate.c b/target-m68k/translate.c >> index 342c040..535d7f9 100644 >> --- a/target-m68k/translate.c >> +++ b/target-m68k/translate.c >> @@ -2940,7 +2940,7 @@ void register_m68k_insns (CPUM68KState *env) >> INSN(shift_reg, e0a0, f0f0, CF_ISA_A); >> INSN(undef_fpu, f000, f000, CF_ISA_A); >> INSN(fpu, f200, ffc0, CF_FPU); >> -INSN(fbcc, f280, ffc0, CF_FPU); >> +INSN(fbcc, f280, ff80, CF_FPU); >> INSN(frestore, f340, ffc0, CF_FPU); >> INSN(fsave, f340, ffc0, CF_FPU); >> INSN(intouch, f340, ffc0, CF_ISA_A); >> > >
Re: [Qemu-devel] [PATCH v4] Add optionrom compatible with fw_cfg DMA version
On Tue, Feb 02, 2016 at 01:58:14PM +0100, Marc Marí wrote: > El Tue, 02 Feb 2016 12:06:27 +0100 > Gerd Hoffmann escribió: > > Hi, > > > > > %.img: %.o > > > - $(call quiet-command,$(LD) $(LDFLAGS_NOPIE) -Ttext 0 -e > > > _start -s -o $@ $<," Building $(TARGET_DIR)$@") > > > + $(call quiet-command,$(LD) $(LDFLAGS_NOPIE) -m elf_i386 > > > -Ttext 0 -e _start -s -o $@ $<," Building $(TARGET_DIR)$@") > > > > Hmm, that breaks the windows cross build: > > > > make: Entering directory `/home/kraxel/projects/qemu/build-win32' > > Building optionrom/linuxboot_dma.img > > i686-w64-mingw32-ld: unrecognised emulation mode: elf_i386 > > Supported emulations: i386pe > > make[1]: *** [linuxboot_dma.img] Error 1 > > Thanks for reporting. > > I don't know much about Windows cross-builds. Any idea on how to solve > the issue? The Windows toolchain doesn't use ELF binaries so -m elf_i386 doesn't work there (the emulation is called "i386pe" in i686-w64-mingw32-ld). I wonder whether it's possible to use gcc -m32 ... -o $@ %< as a wrapper that automatically does the right thing. But I guess it won't work since gcc wants a C source file and not an object file as input. You could make the emulation ("elf_i386" vs "i386pe") conditional on the host platform (CONFIG_WIN32=y). I'm not sure what the most elegant solution is. Stefan signature.asc Description: PGP signature
Re: [Qemu-devel] [PATCH v6 1/5] util: Introduce error reporting functions with fatal/abort
David Gibson writes: > On Tue, Feb 02, 2016 at 10:47:35PM +0100, Thomas Huth wrote: >> On 02.02.2016 19:53, Markus Armbruster wrote: >> > Lluís Vilanova writes: >> ... >> >> >> diff --git a/include/qemu/error-report.h b/include/qemu/error-report.h >> >> index 7ab2355..6c2f142 100644 >> >> --- a/include/qemu/error-report.h >> >> +++ b/include/qemu/error-report.h >> >> @@ -43,4 +43,23 @@ void error_report(const char *fmt, ...) >> >> GCC_FMT_ATTR(1, 2); >> >> const char *error_get_progname(void); >> >> extern bool enable_timestamp_msg; >> >> >> >> +/* Report message and exit with error */ >> >> +void QEMU_NORETURN error_vreport_fatal(const char *fmt, va_list ap) >> >> GCC_FMT_ATTR(1, 0); >> >> +void QEMU_NORETURN error_report_fatal(const char *fmt, ...) >> >> GCC_FMT_ATTR(1, 2); >> > >> > This lets people write things like >> > >> > error_report_fatal("The sky is falling"); >> > >> > instead of >> > >> > error_report("The sky is falling"); >> > exit(1); >> > >> > or >> > >> > fprintf(stderr, "The sky is falling\n"); >> > exit(1); >> > >> > I don't think that's an improvement in clarity. >> >> The problem is not the existing code, but that in a couple of new >> patches, I've now already seen that people are trying to use >> >> error_setg(&error_fatal, ... ); > > So, I don't actually see any real advantage to error_report_fatal(...) > over error_setg(&error_fatal, ...). I do. Compare: (a) error_report(...); exit(1); (b) error_report_fatal(...); (c) error_setg(&error_fatal, ...); In my opinion, (a) is clearest: even a relatively clueless reader will know what exit(1) does, can guess what error_report() approximately does, and doesn't need to know what it does exactly. (b) is slightly less obvious, and (c) is positively opaque. Let's stick to the obvious (a) and be done with it. [...]
[Qemu-devel] [PULL 1/1] hmp: fix sendkey out of bounds write (CVE-2015-8619)
From: Wolfgang Bumiller When processing 'sendkey' command, hmp_sendkey routine null terminates the 'keyname_buf' array. This results in an OOB write issue, if 'keyname_len' was to fall outside of 'keyname_buf' array. Since the keyname's length is known the keyname_buf can be removed altogether by adding a length parameter to index_from_key() and using it for the error output as well. Reported-by: Ling Liu Signed-off-by: Wolfgang Bumiller Message-Id: <20160113080958.GA18934@olga> [Comparison with "<" dumbed down, test for junk after strtoul() tweaked] Signed-off-by: Markus Armbruster --- hmp.c| 18 -- include/ui/console.h | 2 +- ui/input-legacy.c| 5 +++-- 3 files changed, 12 insertions(+), 13 deletions(-) diff --git a/hmp.c b/hmp.c index 54f2620..9c571f5 100644 --- a/hmp.c +++ b/hmp.c @@ -1731,21 +1731,18 @@ void hmp_sendkey(Monitor *mon, const QDict *qdict) int has_hold_time = qdict_haskey(qdict, "hold-time"); int hold_time = qdict_get_try_int(qdict, "hold-time", -1); Error *err = NULL; -char keyname_buf[16]; char *separator; int keyname_len; while (1) { separator = strchr(keys, '-'); keyname_len = separator ? separator - keys : strlen(keys); -pstrcpy(keyname_buf, sizeof(keyname_buf), keys); /* Be compatible with old interface, convert user inputted "<" */ -if (!strncmp(keyname_buf, "<", 1) && keyname_len == 1) { -pstrcpy(keyname_buf, sizeof(keyname_buf), "less"); +if (keys[0] == '<' && keyname_len == 1) { +keys = "less"; keyname_len = 4; } -keyname_buf[keyname_len] = 0; keylist = g_malloc0(sizeof(*keylist)); keylist->value = g_malloc0(sizeof(*keylist->value)); @@ -1758,16 +1755,17 @@ void hmp_sendkey(Monitor *mon, const QDict *qdict) } tmp = keylist; -if (strstart(keyname_buf, "0x", NULL)) { +if (strstart(keys, "0x", NULL)) { char *endp; -int value = strtoul(keyname_buf, &endp, 0); -if (*endp != '\0') { +int value = strtoul(keys, &endp, 0); +assert(endp <= keys + keyname_len); +if (endp != keys + keyname_len) { goto err_out; } keylist->value->type = KEY_VALUE_KIND_NUMBER; keylist->value->u.number = value; } else { -int idx = index_from_key(keyname_buf); +int idx = index_from_key(keys, keyname_len); if (idx == Q_KEY_CODE__MAX) { goto err_out; } @@ -1789,7 +1787,7 @@ out: return; err_out: -monitor_printf(mon, "invalid parameter: %s\n", keyname_buf); +monitor_printf(mon, "invalid parameter: %.*s\n", keyname_len, keys); goto out; } diff --git a/include/ui/console.h b/include/ui/console.h index adac36d..116bc2b 100644 --- a/include/ui/console.h +++ b/include/ui/console.h @@ -448,7 +448,7 @@ static inline int vnc_display_pw_expire(const char *id, time_t expires) void curses_display_init(DisplayState *ds, int full_screen); /* input.c */ -int index_from_key(const char *key); +int index_from_key(const char *key, size_t key_length); /* gtk.c */ void early_gtk_display_init(int opengl); diff --git a/ui/input-legacy.c b/ui/input-legacy.c index 35dfc27..3454055 100644 --- a/ui/input-legacy.c +++ b/ui/input-legacy.c @@ -57,12 +57,13 @@ struct QEMUPutLEDEntry { static QTAILQ_HEAD(, QEMUPutLEDEntry) led_handlers = QTAILQ_HEAD_INITIALIZER(led_handlers); -int index_from_key(const char *key) +int index_from_key(const char *key, size_t key_length) { int i; for (i = 0; QKeyCode_lookup[i] != NULL; i++) { -if (!strcmp(key, QKeyCode_lookup[i])) { +if (!strncmp(key, QKeyCode_lookup[i], key_length) && +!QKeyCode_lookup[i][key_length]) { break; } } -- 2.4.3
[Qemu-devel] [PULL 0/1] Monitor patches for 2016-02-03
The following changes since commit c65db7705b7926f4a084b93778e4bd5dd3990aad: Merge remote-tracking branch 'remotes/maxreitz/tags/pull-block-for-peter-2016-02-02' into staging (2016-02-02 18:04:04 +) are available in the git repository at: git://repo.or.cz/qemu/armbru.git tags/pull-monitor-2016-02-03 for you to fetch changes up to 64ffbe04eaafebf4045a3ace52a360c14959d196: hmp: fix sendkey out of bounds write (CVE-2015-8619) (2016-02-03 10:13:06 +0100) Monitor patches for 2016-02-03 Wolfgang Bumiller (1): hmp: fix sendkey out of bounds write (CVE-2015-8619) hmp.c| 18 -- include/ui/console.h | 2 +- ui/input-legacy.c| 5 +++-- 3 files changed, 12 insertions(+), 13 deletions(-) -- 2.4.3
Re: [Qemu-devel] [PATCH 05/16] iothread: release AioContext around aio_poll
On 03/02/2016 10:34, Stefan Hajnoczi wrote: > +g_usleep(50); > Sleep? >>> >>> What about it? :) > Sleep in a loop is inefficient but at least correct. > > A sleep outside a loop is a race condition. If the machine is > heavily loaded, whatever you are waiting for may not happen in 500 > ms and the test case is unreliable. > > What is the purpose of this sleep? In the test the desired ordering of events is main thread additional thread - lock start_lock start thread lock start_lock (waits) acquire context unlock start_lock lock start_lock (returns) unlock start_lock aio_poll poll() event_notifier_set Comparing the test to QEMU, the roles of the threads are reversed. The test's "main thread" is QEMU's dataplane thread, and the test's additional thread is QEMU's main thread. The sleep "ensures" that poll() blocks before event_notifier_set. The sleep represents how QEMU's main thread acquires the context rarely, and not right after starting the dataplane thread. Because this is a file descriptor source, there is really no difference between the code's behavior, no matter if aio_poll starts before or after the event_notifier_set. The test passes even if you remove the sleep. Do you have any suggestion? Just add a comment? Paolo
Re: [Qemu-devel] [PATCH v1 14/22] migration: convert savevm to use QIOChannel for writing to files
* Daniel P. Berrange (berra...@redhat.com) wrote: > Convert the exec savevm code to use QIOChannel and QEMUFileChannel, > instead of the stdio APIs. > > Signed-off-by: Daniel P. Berrange Reviewed-by: Dr. David Alan Gilbert > --- > migration/savevm.c | 8 +--- > tests/Makefile | 4 ++-- > tests/test-vmstate.c | 11 ++- > 3 files changed, 17 insertions(+), 6 deletions(-) > > diff --git a/migration/savevm.c b/migration/savevm.c > index f2e1880..e57d7ce 100644 > --- a/migration/savevm.c > +++ b/migration/savevm.c > @@ -51,6 +51,7 @@ > #include "block/snapshot.h" > #include "block/qapi.h" > #include "io/channel-buffer.h" > +#include "io/channel-file.h" > > > #ifndef ETH_P_RARP > @@ -1990,6 +1991,7 @@ void hmp_savevm(Monitor *mon, const QDict *qdict) > void qmp_xen_save_devices_state(const char *filename, Error **errp) > { > QEMUFile *f; > +QIOChannelFile *ioc; > int saved_vm_running; > int ret; > > @@ -1997,11 +1999,11 @@ void qmp_xen_save_devices_state(const char *filename, > Error **errp) > vm_stop(RUN_STATE_SAVE_VM); > global_state_store_running(); > > -f = qemu_fopen(filename, "wb"); > -if (!f) { > -error_setg_file_open(errp, errno, filename); > +ioc = qio_channel_file_new_path(filename, O_WRONLY | O_CREAT, 0660, > errp); > +if (!ioc) { > goto the_end; > } > +f = qemu_fopen_channel_output(QIO_CHANNEL(ioc)); > ret = qemu_save_device_state(f); > qemu_fclose(f); > if (ret < 0) { > diff --git a/tests/Makefile b/tests/Makefile > index 5a1b25f..e8f759a 100644 > --- a/tests/Makefile > +++ b/tests/Makefile > @@ -423,8 +423,8 @@ tests/test-qdev-global-props$(EXESUF): > tests/test-qdev-global-props.o \ > $(test-qapi-obj-y) > tests/test-vmstate$(EXESUF): tests/test-vmstate.o \ > migration/vmstate.o migration/qemu-file.o \ > -migration/qemu-file-unix.o qjson.o \ > - $(test-qom-obj-y) > +migration/qemu-file-channel.o qjson.o \ > + $(test-io-obj-y) > tests/test-timed-average$(EXESUF): tests/test-timed-average.o qemu-timer.o \ > $(test-util-obj-y) > tests/test-base64$(EXESUF): tests/test-base64.o \ > diff --git a/tests/test-vmstate.c b/tests/test-vmstate.c > index 0f943d5..6bbda8a 100644 > --- a/tests/test-vmstate.c > +++ b/tests/test-vmstate.c > @@ -28,6 +28,7 @@ > #include "migration/migration.h" > #include "migration/vmstate.h" > #include "qemu/coroutine.h" > +#include "io/channel-file.h" > > static char temp_file[] = "/tmp/vmst.test.XX"; > static int temp_fd; > @@ -48,11 +49,17 @@ void yield_until_fd_readable(int fd) > static QEMUFile *open_test_file(bool write) > { > int fd = dup(temp_fd); > +QIOChannel *ioc; > lseek(fd, 0, SEEK_SET); > if (write) { > g_assert_cmpint(ftruncate(fd, 0), ==, 0); > } > -return qemu_fdopen(fd, write ? "wb" : "rb"); > +ioc = QIO_CHANNEL(qio_channel_file_new_fd(fd)); > +if (write) { > +return qemu_fopen_channel_output(ioc); > +} else { > +return qemu_fopen_channel_input(ioc); > +} > } > > #define SUCCESS(val) \ > @@ -468,6 +475,8 @@ int main(int argc, char **argv) > { > temp_fd = mkstemp(temp_file); > > +module_call_init(MODULE_INIT_QOM); > + > g_test_init(&argc, &argv, NULL); > g_test_add_func("/vmstate/simple/primitive", test_simple_primitive); > g_test_add_func("/vmstate/versioned/load/v1", test_load_v1); > -- > 2.5.0 > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH v14 7/8] Implement new driver for block replication
On 02/03/2016 05:32 PM, Stefan Hajnoczi wrote: > On Wed, Feb 03, 2016 at 09:29:15AM +0800, Wen Congyang wrote: >> On 02/02/2016 10:34 PM, Stefan Hajnoczi wrote: >>> On Mon, Feb 01, 2016 at 09:13:36AM +0800, Wen Congyang wrote: On 01/29/2016 11:46 PM, Stefan Hajnoczi wrote: > On Fri, Jan 29, 2016 at 11:13:42AM +0800, Changlong Xie wrote: >> On 01/28/2016 11:15 PM, Stefan Hajnoczi wrote: >>> On Thu, Jan 28, 2016 at 09:13:24AM +0800, Wen Congyang wrote: On 01/27/2016 10:46 PM, Stefan Hajnoczi wrote: > On Wed, Jan 13, 2016 at 05:18:31PM +0800, Changlong Xie wrote: >>> I'm concerned that the bdrv_drain_all() in vm_stop() can take a long >>> time if the disk is slow/failing. bdrv_drain_all() blocks until all >>> in-flight I/O requests have completed. What does the Primary do if the >>> Secondary becomes unresponsive? >> >> Actually, we knew this problem. But currently, there seems no better way >> to >> resolve it. If you have any ideas? > > Is it possible to hold the checkpoint information and acknowledge the > checkpoint right away, without waiting for bdrv_drain_all() or any > Secondory guest activity to complete? There is no way to know that secondary becomes unreponsive. >>> >>> I meant whether it is necessary for the Secondary to vm_stop() and apply >>> the checkpoint before acknowledging the checkpoint to the Primary? >> >> I don't understand this. >> Here is the COLO checkpoint flow: >> >> PrimarySecondary >> new checkpoint notice ---> >> vm_stop() vm_stop() >> vm state(device state, memory, cpu) ---> >>load state >> <--- done >> vm_start() vm_start() > > If the Secondary's vm_stop() call blocks then the Primary is stuck too. > > I was wondering whether the Secondary can do: > > <--- done > vm_stop() > load state > > It simply receives the checkpoint data into a buffer and immediately > replies with "done". vm_stop() and load state is only performed after > sending "done". Secondary vm is running, so we should also get the pages that are dirtied by secondary vm, but not dirtied by primary vm. We have two ways to do it: 1. Cache all original memory in the secondary qemu 2. Send the dirty pfn list to primary qemu, and get it. If we ack the checkpoint and the call vm_stop(), we only can select 1. It means that secondary qemu costs more memory. In COLO mode, we will compare the output socket, and will do checkpoint if the application level data is different. If we ack the checkpoint and the call vm_stop(), the client can not get any more data until secondary vm is running again. So we still 'wait' the secondary vm. > > The advantage is that the Primary will not be delayed by the Secondary. > It's an approach that doesn't block. > > But perhaps it's a problem if the Secondary is slower than the Primary > since the Secondary still needs to complete vm_stop() and load state > before it can resume execution? > > I think this really means falling back to microcheckpointing until the > Secondary guest can checkpoint. Instead of a blocking vm_stop() we > would prevent vcpus from running and when the last pending I/O finishes > the Secondary could apply the last checkpoint. This approach does not > block QEMU (the monitor, etc). > If secondary host becomes unresponsive, it means that we cannot do mocrocheckpointing. We should do failover in this case. >>> >>> This is dangerous because it means that a delay/failure in the Secondary >>> would cause the Primary to fail over to the broken Secondary. All the >>> more reason not to perform blocking operations on the Secondary in the >>> checkpoint code path. >> >> If the secondary is broken, primary qemu will take over. > > Does the Primary use a timeout between "new checkpoint notice" and > Secondary's "done" so it can move on if the Secondary is unresponsive? To hailiang: IIRC, we don't use a timeout but I think we can do it. In our design, there is an exteranl heartbeat to check primary and secondary status, and decide when to do checkpoint. Thanks Wen Congyang > > Stefan >
Re: [Qemu-devel] [PATCH] block: add missing call to bdrv_drain_recurse
On 11/01/2016 09:32, Paolo Bonzini wrote: > > > On 25/12/2015 02:55, Fam Zheng wrote: >> On Wed, 12/23 11:48, Paolo Bonzini wrote: >>> This is also needed in bdrv_drain_all, not just in bdrv_drain. >>> >>> Signed-off-by: Paolo Bonzini >>> --- >>> block/io.c | 1 + >>> 1 file changed, 1 insertion(+) >>> >>> diff --git a/block/io.c b/block/io.c >>> index 841f5b5..bfe2544 100644 >>> --- a/block/io.c >>> +++ b/block/io.c >>> @@ -293,6 +293,7 @@ void bdrv_drain_all(void) >>> if (bs->job) { >>> block_job_pause(bs->job); >>> } >>> +bdrv_drain_recurse(bs); >>> aio_context_release(aio_context); >>> >>> if (!g_slist_find(aio_ctxs, aio_context)) { >>> -- >>> 2.5.0 >>> >>> >> >> Reviewed-by: Fam Zheng >> >> > > Ping? Ping^2?
Re: [Qemu-devel] [PATCH v1 15/22] migration: delete QEMUFile buffer implementation
* Daniel P. Berrange (berra...@redhat.com) wrote: > The qemu_bufopen() method is no longer used, so the memory > buffer based QEMUFile backend can be deleted entirely. > > Signed-off-by: Daniel P. Berrange Reviewed-by: Dr. David Alan Gilbert > --- > include/migration/qemu-file.h | 6 --- > migration/qemu-file-buf.c | 96 > --- > 2 files changed, 102 deletions(-) > > diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h > index cb79311..6b12960 100644 > --- a/include/migration/qemu-file.h > +++ b/include/migration/qemu-file.h > @@ -141,7 +141,6 @@ QEMUFile *qemu_fopen_socket(int fd, const char *mode); > QEMUFile *qemu_fopen_channel_input(QIOChannel *ioc); > QEMUFile *qemu_fopen_channel_output(QIOChannel *ioc); > QEMUFile *qemu_popen_cmd(const char *command, const char *mode); > -QEMUFile *qemu_bufopen(const char *mode, QEMUSizedBuffer *input); > void qemu_file_set_hooks(QEMUFile *f, const QEMUFileHooks *hooks); > int qemu_get_fd(QEMUFile *f); > int qemu_fclose(QEMUFile *f); > @@ -167,11 +166,6 @@ ssize_t qsb_write_at(QEMUSizedBuffer *qsb, const uint8_t > *buf, > off_t pos, size_t count); > > > -/* > - * For use on files opened with qemu_bufopen > - */ > -const QEMUSizedBuffer *qemu_buf_get(QEMUFile *f); > - > static inline void qemu_put_ubyte(QEMUFile *f, unsigned int v) > { > qemu_put_byte(f, (int)v); > diff --git a/migration/qemu-file-buf.c b/migration/qemu-file-buf.c > index 49516b8..c0bc38c 100644 > --- a/migration/qemu-file-buf.c > +++ b/migration/qemu-file-buf.c > @@ -365,99 +365,3 @@ ssize_t qsb_write_at(QEMUSizedBuffer *qsb, const uint8_t > *source, > > return count; > } > - > -typedef struct QEMUBuffer { > -QEMUSizedBuffer *qsb; > -QEMUFile *file; > -bool qsb_allocated; > -} QEMUBuffer; > - > -static ssize_t buf_get_buffer(void *opaque, uint8_t *buf, int64_t pos, > - size_t size) > -{ > -QEMUBuffer *s = opaque; > -ssize_t len = qsb_get_length(s->qsb) - pos; > - > -if (len <= 0) { > -return 0; > -} > - > -if (len > size) { > -len = size; > -} > -return qsb_get_buffer(s->qsb, pos, len, buf); > -} > - > -static ssize_t buf_put_buffer(void *opaque, const uint8_t *buf, > - int64_t pos, size_t size) > -{ > -QEMUBuffer *s = opaque; > - > -return qsb_write_at(s->qsb, buf, pos, size); > -} > - > -static int buf_close(void *opaque) > -{ > -QEMUBuffer *s = opaque; > - > -if (s->qsb_allocated) { > -qsb_free(s->qsb); > -} > - > -g_free(s); > - > -return 0; > -} > - > -const QEMUSizedBuffer *qemu_buf_get(QEMUFile *f) > -{ > -QEMUBuffer *p; > - > -qemu_fflush(f); > - > -p = f->opaque; > - > -return p->qsb; > -} > - > -static const QEMUFileOps buf_read_ops = { > -.get_buffer = buf_get_buffer, > -.close = buf_close, > -}; > - > -static const QEMUFileOps buf_write_ops = { > -.put_buffer = buf_put_buffer, > -.close = buf_close, > -}; > - > -QEMUFile *qemu_bufopen(const char *mode, QEMUSizedBuffer *input) > -{ > -QEMUBuffer *s; > - > -if (mode == NULL || (mode[0] != 'r' && mode[0] != 'w') || > -mode[1] != '\0') { > -error_report("qemu_bufopen: Argument validity check failed"); > -return NULL; > -} > - > -s = g_new0(QEMUBuffer, 1); > -s->qsb = input; > - > -if (s->qsb == NULL) { > -s->qsb = qsb_create(NULL, 0); > -s->qsb_allocated = true; > -} > -if (!s->qsb) { > -g_free(s); > -error_report("qemu_bufopen: qsb_create failed"); > -return NULL; > -} > - > - > -if (mode[0] == 'r') { > -s->file = qemu_fopen_ops(s, &buf_read_ops); > -} else { > -s->file = qemu_fopen_ops(s, &buf_write_ops); > -} > -return s->file; > -} > -- > 2.5.0 > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH v1 16/22] migration: delete QEMUSizedBuffer struct
* Daniel P. Berrange (berra...@redhat.com) wrote: > Now that we don't have have a buffer based QemuFile > implementation, the QEMUSizedBuffer code is also > unused and can be deleted. A simpler buffer class > also exists in util/buffer.c which other code can > used as needed. > > Signed-off-by: Daniel P. Berrange Reviewed-by: Dr. David Alan Gilbert > --- > include/migration/qemu-file.h | 16 -- > include/qemu/typedefs.h | 1 - > migration/Makefile.objs | 2 +- > migration/qemu-file-buf.c | 367 > -- > 4 files changed, 1 insertion(+), 385 deletions(-) > delete mode 100644 migration/qemu-file-buf.c > > diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h > index 6b12960..da67931 100644 > --- a/include/migration/qemu-file.h > +++ b/include/migration/qemu-file.h > @@ -127,13 +127,6 @@ typedef struct QEMUFileHooks { > QEMURamSaveFunc *save_page; > } QEMUFileHooks; > > -struct QEMUSizedBuffer { > -struct iovec *iov; > -size_t n_iov; > -size_t size; /* total allocated size in all iov's */ > -size_t used; /* number of used bytes */ > -}; > - > QEMUFile *qemu_fopen_ops(void *opaque, const QEMUFileOps *ops); > QEMUFile *qemu_fopen(const char *filename, const char *mode); > QEMUFile *qemu_fdopen(int fd, const char *mode); > @@ -156,15 +149,6 @@ void qemu_put_buffer_async(QEMUFile *f, const uint8_t > *buf, size_t size); > bool qemu_file_mode_is_not_valid(const char *mode); > bool qemu_file_is_writable(QEMUFile *f); > > -QEMUSizedBuffer *qsb_create(const uint8_t *buffer, size_t len); > -void qsb_free(QEMUSizedBuffer *); > -size_t qsb_set_length(QEMUSizedBuffer *qsb, size_t length); > -size_t qsb_get_length(const QEMUSizedBuffer *qsb); > -ssize_t qsb_get_buffer(const QEMUSizedBuffer *, off_t start, size_t count, > - uint8_t *buf); > -ssize_t qsb_write_at(QEMUSizedBuffer *qsb, const uint8_t *buf, > - off_t pos, size_t count); > - > > static inline void qemu_put_ubyte(QEMUFile *f, unsigned int v) > { > diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h > index 78fe6e8..3f8dfbf 100644 > --- a/include/qemu/typedefs.h > +++ b/include/qemu/typedefs.h > @@ -77,7 +77,6 @@ typedef struct QemuOpt QemuOpt; > typedef struct QemuOpts QemuOpts; > typedef struct QemuOptsList QemuOptsList; > typedef struct QEMUSGList QEMUSGList; > -typedef struct QEMUSizedBuffer QEMUSizedBuffer; > typedef struct QEMUTimer QEMUTimer; > typedef struct QEMUTimerListGroup QEMUTimerListGroup; > typedef struct QObject QObject; > diff --git a/migration/Makefile.objs b/migration/Makefile.objs > index 3c90c44..8ecb941 100644 > --- a/migration/Makefile.objs > +++ b/migration/Makefile.objs > @@ -1,6 +1,6 @@ > common-obj-y += migration.o tcp.o unix.o fd.o exec.o > common-obj-y += vmstate.o > -common-obj-y += qemu-file.o qemu-file-buf.o qemu-file-unix.o > qemu-file-stdio.o > +common-obj-y += qemu-file.o qemu-file-unix.o qemu-file-stdio.o > common-obj-y += qemu-file-channel.o > common-obj-y += xbzrle.o postcopy-ram.o > > diff --git a/migration/qemu-file-buf.c b/migration/qemu-file-buf.c > deleted file mode 100644 > index c0bc38c..000 > --- a/migration/qemu-file-buf.c > +++ /dev/null > @@ -1,367 +0,0 @@ > -/* > - * QEMU System Emulator > - * > - * Copyright (c) 2003-2008 Fabrice Bellard > - * Copyright (c) 2014 IBM Corp. > - * > - * Authors: > - * Stefan Berger > - * > - * Permission is hereby granted, free of charge, to any person obtaining a > copy > - * of this software and associated documentation files (the "Software"), to > deal > - * in the Software without restriction, including without limitation the > rights > - * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell > - * copies of the Software, and to permit persons to whom the Software is > - * furnished to do so, subject to the following conditions: > - * > - * The above copyright notice and this permission notice shall be included in > - * all copies or substantial portions of the Software. > - * > - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR > - * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, > - * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL > - * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER > - * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING > FROM, > - * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN > - * THE SOFTWARE. > - */ > -#include "qemu-common.h" > -#include "qemu/error-report.h" > -#include "qemu/iov.h" > -#include "qemu/sockets.h" > -#include "qemu/coroutine.h" > -#include "migration/migration.h" > -#include "migration/qemu-file.h" > -#include "migration/qemu-file-internal.h" > -#include "trace.h" > - > -#define QSB_CHUNK_SIZE (1 << 10) > -#define QSB_MAX_CHUNK_SIZE (16 * QSB_CHUNK_SIZE) > - >
Re: [Qemu-devel] [PATCH v1 17/22] migration: delete QEMUFile sockets implementation
* Daniel P. Berrange (berra...@redhat.com) wrote: > Now that the tcp, unix and fd migration backends have converted > to use the QIOChannel based QEMUFile, there is no user remaining > for the sockets based QEMUFile impl and it can be deleted. > > Signed-off-by: Daniel P. Berrange Reviewed-by: Dr. David Alan Gilbert > --- > include/migration/qemu-file.h | 2 - > migration/Makefile.objs | 2 +- > migration/qemu-file-unix.c| 324 > -- > 3 files changed, 1 insertion(+), 327 deletions(-) > delete mode 100644 migration/qemu-file-unix.c > > diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h > index da67931..dc4b7ab 100644 > --- a/include/migration/qemu-file.h > +++ b/include/migration/qemu-file.h > @@ -129,8 +129,6 @@ typedef struct QEMUFileHooks { > > QEMUFile *qemu_fopen_ops(void *opaque, const QEMUFileOps *ops); > QEMUFile *qemu_fopen(const char *filename, const char *mode); > -QEMUFile *qemu_fdopen(int fd, const char *mode); > -QEMUFile *qemu_fopen_socket(int fd, const char *mode); > QEMUFile *qemu_fopen_channel_input(QIOChannel *ioc); > QEMUFile *qemu_fopen_channel_output(QIOChannel *ioc); > QEMUFile *qemu_popen_cmd(const char *command, const char *mode); > diff --git a/migration/Makefile.objs b/migration/Makefile.objs > index 8ecb941..2c71056 100644 > --- a/migration/Makefile.objs > +++ b/migration/Makefile.objs > @@ -1,6 +1,6 @@ > common-obj-y += migration.o tcp.o unix.o fd.o exec.o > common-obj-y += vmstate.o > -common-obj-y += qemu-file.o qemu-file-unix.o qemu-file-stdio.o > +common-obj-y += qemu-file.o qemu-file-stdio.o > common-obj-y += qemu-file-channel.o > common-obj-y += xbzrle.o postcopy-ram.o > > diff --git a/migration/qemu-file-unix.c b/migration/qemu-file-unix.c > deleted file mode 100644 > index 6ca53e7..000 > --- a/migration/qemu-file-unix.c > +++ /dev/null > @@ -1,324 +0,0 @@ > -/* > - * QEMU System Emulator > - * > - * Copyright (c) 2003-2008 Fabrice Bellard > - * > - * Permission is hereby granted, free of charge, to any person obtaining a > copy > - * of this software and associated documentation files (the "Software"), to > deal > - * in the Software without restriction, including without limitation the > rights > - * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell > - * copies of the Software, and to permit persons to whom the Software is > - * furnished to do so, subject to the following conditions: > - * > - * The above copyright notice and this permission notice shall be included in > - * all copies or substantial portions of the Software. > - * > - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR > - * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, > - * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL > - * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER > - * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING > FROM, > - * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN > - * THE SOFTWARE. > - */ > -#include "qemu-common.h" > -#include "qemu/error-report.h" > -#include "qemu/iov.h" > -#include "qemu/sockets.h" > -#include "qemu/coroutine.h" > -#include "migration/qemu-file.h" > -#include "migration/qemu-file-internal.h" > - > -typedef struct QEMUFileSocket { > -int fd; > -QEMUFile *file; > -} QEMUFileSocket; > - > -static ssize_t socket_writev_buffer(void *opaque, struct iovec *iov, int > iovcnt, > -int64_t pos) > -{ > -QEMUFileSocket *s = opaque; > -ssize_t len; > -ssize_t size = iov_size(iov, iovcnt); > -ssize_t offset = 0; > -int err; > - > -while (size > 0) { > -len = iov_send(s->fd, iov, iovcnt, offset, size); > - > -if (len > 0) { > -size -= len; > -offset += len; > -} > - > -if (size > 0) { > -err = socket_error(); > - > -if (err != EAGAIN && err != EWOULDBLOCK) { > -error_report("socket_writev_buffer: Got err=%d for > (%zu/%zu)", > - err, (size_t)size, (size_t)len); > -/* > - * If I've already sent some but only just got the error, I > - * could return the amount validly sent so far and wait for > the > - * next call to report the error, but I'd rather flag the > error > - * immediately. > - */ > -return -err; > -} > - > -/* Emulate blocking */ > -GPollFD pfd; > - > -pfd.fd = s->fd; > -pfd.events = G_IO_OUT | G_IO_ERR; > -pfd.revents = 0; > -TFR(err = g_poll(&pfd, 1, -1 /* no timeout */)); > -/* Errors other than EINTR intentionally ignored */ > -} > - } > - > -return offset; >
Re: [Qemu-devel] [PATCH 2/2] m68k: Build the opcode table only once to avoid multithreading issues
Le 03/02/2016 10:39, Laurent Vivier a écrit : > > > Le 03/02/2016 10:37, John Paul Adrian Glaubitz a écrit : >> Signed-off-by: John Paul Adrian Glaubitz >> --- >> target-m68k/translate.c | 4 >> 1 file changed, 4 insertions(+) >> >> diff --git a/target-m68k/translate.c b/target-m68k/translate.c >> index 535d7f9..f508d1e 100644 >> --- a/target-m68k/translate.c >> +++ b/target-m68k/translate.c >> @@ -2828,6 +2828,10 @@ register_opcode (disas_proc proc, uint16_t opcode, >> uint16_t mask) >> Later insn override earlier ones. */ >> void register_m68k_insns (CPUM68KState *env) >> { >> +/* Build the opcode table only once to avoid >> + issues with multithreading. */ >> +if(opcode_table[0] != NULL) >> + return; >> #define INSN(name, opcode, mask, feature) do { \ >> if (m68k_feature(env, M68K_FEATURE_##feature)) \ >> register_opcode(disas_##name, 0x##opcode, 0x##mask); \ >> > Reviewed-by: Laurent Vivier In fact, the line should be: +if (opcode_table[0] != NULL) +return; thanks to scripts/checkpatch.pl. Laurent
Re: [Qemu-devel] [PATCH v6 1/5] util: Introduce error reporting functions with fatal/abort
On 03.02.2016 10:48, Markus Armbruster wrote: > David Gibson writes: > >> On Tue, Feb 02, 2016 at 10:47:35PM +0100, Thomas Huth wrote: >>> On 02.02.2016 19:53, Markus Armbruster wrote: Lluís Vilanova writes: >>> ... >>> > diff --git a/include/qemu/error-report.h b/include/qemu/error-report.h > index 7ab2355..6c2f142 100644 > --- a/include/qemu/error-report.h > +++ b/include/qemu/error-report.h > @@ -43,4 +43,23 @@ void error_report(const char *fmt, ...) > GCC_FMT_ATTR(1, 2); > const char *error_get_progname(void); > extern bool enable_timestamp_msg; > > +/* Report message and exit with error */ > +void QEMU_NORETURN error_vreport_fatal(const char *fmt, va_list ap) > GCC_FMT_ATTR(1, 0); > +void QEMU_NORETURN error_report_fatal(const char *fmt, ...) > GCC_FMT_ATTR(1, 2); This lets people write things like error_report_fatal("The sky is falling"); instead of error_report("The sky is falling"); exit(1); or fprintf(stderr, "The sky is falling\n"); exit(1); I don't think that's an improvement in clarity. >>> >>> The problem is not the existing code, but that in a couple of new >>> patches, I've now already seen that people are trying to use >>> >>> error_setg(&error_fatal, ... ); >> >> So, I don't actually see any real advantage to error_report_fatal(...) >> over error_setg(&error_fatal, ...). > > I do. Compare: > > (a) error_report(...); > exit(1); > > (b) error_report_fatal(...); > > (c) error_setg(&error_fatal, ...); > > In my opinion, (a) is clearest: even a relatively clueless reader will > know what exit(1) does, can guess what error_report() approximately > does, and doesn't need to know what it does exactly. (b) is slightly > less obvious, and (c) is positively opaque. > > Let's stick to the obvious (a) and be done with it. Ok, (a) is fine for me too, as long as we avoid (c). Lluís, could you maybe add that information to your patch that updates the HACKING text? (and sorry for the fuzz with error_report_fatal() ... I thought it would be a good solution to avoid (c), but if (a) is preferred instead, then we should go with that solution instead). And, by the way, what about the spots that currently already use error_setg(&error_abort, ) ? Should they be turned into error_report() + abort() instead? Or only abort(), without error message, since abort() is only about programming errors? Thomas
Re: [Qemu-devel] [PATCH v1 18/22] migration: delete QEMUFile stdio implementation
* Daniel P. Berrange (berra...@redhat.com) wrote: > Now that the exec migration backend and savevm have converted > to use the QIOChannel based QEMUFile, there is no user remaining > for the stdio based QEMUFile impl and it can be deleted. > > Signed-off-by: Daniel P. Berrange Reviewed-by: Dr. David Alan Gilbert > --- > include/migration/qemu-file.h | 2 - > migration/Makefile.objs | 2 +- > migration/qemu-file-stdio.c | 195 > -- > 3 files changed, 1 insertion(+), 198 deletions(-) > delete mode 100644 migration/qemu-file-stdio.c > > diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h > index dc4b7ab..6a66735 100644 > --- a/include/migration/qemu-file.h > +++ b/include/migration/qemu-file.h > @@ -128,10 +128,8 @@ typedef struct QEMUFileHooks { > } QEMUFileHooks; > > QEMUFile *qemu_fopen_ops(void *opaque, const QEMUFileOps *ops); > -QEMUFile *qemu_fopen(const char *filename, const char *mode); > QEMUFile *qemu_fopen_channel_input(QIOChannel *ioc); > QEMUFile *qemu_fopen_channel_output(QIOChannel *ioc); > -QEMUFile *qemu_popen_cmd(const char *command, const char *mode); > void qemu_file_set_hooks(QEMUFile *f, const QEMUFileHooks *hooks); > int qemu_get_fd(QEMUFile *f); > int qemu_fclose(QEMUFile *f); > diff --git a/migration/Makefile.objs b/migration/Makefile.objs > index 2c71056..a127f31 100644 > --- a/migration/Makefile.objs > +++ b/migration/Makefile.objs > @@ -1,6 +1,6 @@ > common-obj-y += migration.o tcp.o unix.o fd.o exec.o > common-obj-y += vmstate.o > -common-obj-y += qemu-file.o qemu-file-stdio.o > +common-obj-y += qemu-file.o > common-obj-y += qemu-file-channel.o > common-obj-y += xbzrle.o postcopy-ram.o > > diff --git a/migration/qemu-file-stdio.c b/migration/qemu-file-stdio.c > deleted file mode 100644 > index 9bde9db..000 > --- a/migration/qemu-file-stdio.c > +++ /dev/null > @@ -1,195 +0,0 @@ > -/* > - * QEMU System Emulator > - * > - * Copyright (c) 2003-2008 Fabrice Bellard > - * > - * Permission is hereby granted, free of charge, to any person obtaining a > copy > - * of this software and associated documentation files (the "Software"), to > deal > - * in the Software without restriction, including without limitation the > rights > - * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell > - * copies of the Software, and to permit persons to whom the Software is > - * furnished to do so, subject to the following conditions: > - * > - * The above copyright notice and this permission notice shall be included in > - * all copies or substantial portions of the Software. > - * > - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR > - * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, > - * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL > - * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER > - * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING > FROM, > - * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN > - * THE SOFTWARE. > - */ > -#include "qemu-common.h" > -#include "qemu/coroutine.h" > -#include "migration/qemu-file.h" > - > -typedef struct QEMUFileStdio { > -FILE *stdio_file; > -QEMUFile *file; > -} QEMUFileStdio; > - > -static int stdio_get_fd(void *opaque) > -{ > -QEMUFileStdio *s = opaque; > - > -return fileno(s->stdio_file); > -} > - > -static ssize_t stdio_put_buffer(void *opaque, const uint8_t *buf, int64_t > pos, > -size_t size) > -{ > -QEMUFileStdio *s = opaque; > -size_t res; > - > -res = fwrite(buf, 1, size, s->stdio_file); > - > -if (res != size) { > -return -errno; > -} > -return res; > -} > - > -static ssize_t stdio_get_buffer(void *opaque, uint8_t *buf, int64_t pos, > -size_t size) > -{ > -QEMUFileStdio *s = opaque; > -FILE *fp = s->stdio_file; > -ssize_t bytes; > - > -for (;;) { > -clearerr(fp); > -bytes = fread(buf, 1, size, fp); > -if (bytes != 0 || !ferror(fp)) { > -break; > -} > -if (errno == EAGAIN) { > -yield_until_fd_readable(fileno(fp)); > -} else if (errno != EINTR) { > -break; > -} > -} > -return bytes; > -} > - > -static int stdio_pclose(void *opaque) > -{ > -QEMUFileStdio *s = opaque; > -int ret; > -ret = pclose(s->stdio_file); > -if (ret == -1) { > -ret = -errno; > -} else if (!WIFEXITED(ret) || WEXITSTATUS(ret) != 0) { > -/* close succeeded, but non-zero exit code: */ > -ret = -EIO; /* fake errno value */ > -} > -g_free(s); > -return ret; > -} > - > -static int stdio_fclose(void *opaque) > -{ > -QEMUFileStdio *s = opaque; > -int ret = 0; > - > -if (qemu_file_is_writable(s->file)) { > -int f
Re: [Qemu-devel] [PATCHv3 2/4] Split serial-isa into its own config option
David Gibson writes: > At present, the core device model code for 8250-like serial ports > (serial.c) and the code for serial ports attached to ISA-style legacy IO > (serial-isa.c) are both controlled by the CONFIG_SERIAL variable. > > There are lots and lots of embedded platforms that have 8250-like serial > ports but have never had anything resembling ISA legacy IO. Therefore, > split serial-isa into its own CONFIG_SERIAL_ISA option so it can be > disabled for platforms where it's not appropriate. > > For now, I enabled CONFIG_SERIAL_ISA in every default-config where > CONFIG_SERIAL is enabled, excepting microblaze, moxie, or32, and > xtensa. As best as I can tell, those platforms never used legacy ISA, > and also don't include PCI support (which would allow connection of a > PCI->ISA bridge and/or a southbridge including legacy ISA serial > ports). > > Signed-off-by: David Gibson > --- > default-configs/alpha-softmmu.mak| 1 + > default-configs/arm-softmmu.mak | 1 + > default-configs/i386-softmmu.mak | 1 + > default-configs/mips-softmmu.mak | 1 + > default-configs/mips64-softmmu.mak | 1 + > default-configs/mips64el-softmmu.mak | 1 + > default-configs/mipsel-softmmu.mak | 1 + > default-configs/ppc-softmmu.mak | 1 + > default-configs/ppc64-softmmu.mak| 1 + > default-configs/ppcemb-softmmu.mak | 1 + > default-configs/sh4-softmmu.mak | 1 + > default-configs/sh4eb-softmmu.mak| 1 + > default-configs/sparc64-softmmu.mak | 1 + > default-configs/x86_64-softmmu.mak | 1 + > hw/char/Makefile.objs| 3 ++- > 15 files changed, 16 insertions(+), 1 deletion(-) Ignorant question: what about the CONFIG_SERIAL in pci.mak? Should it trigger CONFIG_SERIAL_ISA, too? If not, should the commit message explain why not?
Re: [Qemu-devel] [PATCH v1 10/22] migration: convert tcp socket protocol to use QIOChannel
On Tue, Feb 02, 2016 at 06:19:03PM +, Dr. David Alan Gilbert wrote: > * Daniel P. Berrange (berra...@redhat.com) wrote: > > Convert the tcp socket migration protocol driver to use > > QIOChannel and QEMUFileChannel, instead of plain sockets > > APIs. > > > > While this now looks pretty similar to the migration/unix.c > > file from the previous patch, it was decided not to merge > > the two, because when TLS is added to the TCP impl later, > > this file diverge from unix.c once again. > > Hmm OK, although I'd kind of like to see merging, but lets > see the TLS code later in the series > > > Signed-off-by: Daniel P. Berrange > > --- > > migration/tcp.c | 119 > > ++-- > > 1 file changed, 82 insertions(+), 37 deletions(-) > > > > diff --git a/migration/tcp.c b/migration/tcp.c > > index ae89172..ac73977 100644 > > --- a/migration/tcp.c > > +++ b/migration/tcp.c > > @@ -2,9 +2,11 @@ > > * QEMU live migration > > * > > * Copyright IBM, Corp. 2008 > > + * Copyright Red Hat, Inc. 2015 > > * > > * Authors: > > * Anthony Liguori > > + * Daniel P. Berrange > > * > > * This work is licensed under the terms of the GNU GPL, version 2. See > > * the COPYING file in the top-level directory. > > @@ -17,11 +19,9 @@ > > > > #include "qemu-common.h" > > #include "qemu/error-report.h" > > -#include "qemu/sockets.h" > > #include "migration/migration.h" > > #include "migration/qemu-file.h" > > -#include "block/block.h" > > -#include "qemu/main-loop.h" > > +#include "io/channel-socket.h" > > > > //#define DEBUG_MIGRATION_TCP > > > > @@ -33,71 +33,116 @@ > > do { } while (0) > > #endif > > > > -static void tcp_wait_for_connect(int fd, Error *err, void *opaque) > > + > > +static SocketAddress *tcp_build_address(const char *host_port, Error > > **errp) > > +{ > > +InetSocketAddress *iaddr = inet_parse(host_port, errp); > > +SocketAddress *saddr; > > + > > +if (!iaddr) { > > +return NULL; > > +} > > + > > +saddr = g_new0(SocketAddress, 1); > > +saddr->type = SOCKET_ADDRESS_KIND_INET; > > +saddr->u.inet = iaddr; > > + > > +return saddr; > > +} > > + > > + > > +static void tcp_outgoing_migration(Object *src, > > + Error *err, > > + gpointer opaque) > > { > > MigrationState *s = opaque; > > +QIOChannel *sioc = QIO_CHANNEL(src); > > > > -if (fd < 0) { > > +if (err) { > > DPRINTF("migrate connect error: %s\n", error_get_pretty(err)); > > s->file = NULL; > > migrate_fd_error(s); > > } else { > > DPRINTF("migrate connect success\n"); > > -s->file = qemu_fopen_socket(fd, "wb"); > > +s->file = qemu_fopen_channel_output(sioc); > > migrate_fd_connect(s); > > } > > +object_unref(src); > > } > > > > -void tcp_start_outgoing_migration(MigrationState *s, const char > > *host_port, Error **errp) > > + > > +void tcp_start_outgoing_migration(MigrationState *s, > > + const char *host_port, > > + Error **errp) > > { > > -inet_nonblocking_connect(host_port, tcp_wait_for_connect, s, errp); > > +SocketAddress *saddr = tcp_build_address(host_port, errp); > > +QIOChannelSocket *sioc; > > + > > +if (!saddr) { > > +return; > > +} > > + > > +sioc = qio_channel_socket_new(); > > +qio_channel_socket_connect_async(sioc, > > + saddr, > > + tcp_outgoing_migration, > > + s, > > + NULL); > > +qapi_free_SocketAddress(saddr); > > } > > > > -static void tcp_accept_incoming_migration(void *opaque) > > + > > +static gboolean tcp_accept_incoming_migration(QIOChannel *ioc, > > + GIOCondition condition, > > + gpointer opaque) > > { > > -struct sockaddr_in addr; > > -socklen_t addrlen = sizeof(addr); > > -int s = (intptr_t)opaque; > > QEMUFile *f; > > -int c, err; > > +QIOChannelSocket *cioc; > > +Error *err = NULL; > > > > -do { > > -c = qemu_accept(s, (struct sockaddr *)&addr, &addrlen); > > -err = socket_error(); > > -} while (c < 0 && err == EINTR); > > -qemu_set_fd_handler(s, NULL, NULL, NULL); > > -closesocket(s); > > - > > -DPRINTF("accepted migration\n"); > > - > > -if (c < 0) { > > +cioc = qio_channel_socket_accept(QIO_CHANNEL_SOCKET(ioc), > > + &err); > > +if (!cioc) { > > error_report("could not accept migration connection (%s)", > > - strerror(err)); > > -return; > > -} > > - > > -f = qemu_fopen_socket(c, "rb"); > > -if (f == NULL) { > > -error_
Re: [Qemu-devel] [PATCH v1 11/22] migration: convert fd socket protocol to use QIOChannel
On Tue, Feb 02, 2016 at 06:46:01PM +, Dr. David Alan Gilbert wrote: > * Daniel P. Berrange (berra...@redhat.com) wrote: > > Convert the fd socket migration protocol driver to use > > QIOChannel and QEMUFileChannel, instead of plain sockets > > APIs. It can be unconditionally built because the > > QIOChannel APIs it uses will take care to report suitable > > error messages if needed. > > > > Signed-off-by: Daniel P. Berrange > > --- > > migration/Makefile.objs | 4 ++-- > > migration/fd.c | 57 > > - > > migration/migration.c | 4 > > 3 files changed, 39 insertions(+), 26 deletions(-) > > > > diff --git a/migration/Makefile.objs b/migration/Makefile.objs > > index a5f8a03..64f95cd 100644 > > --- a/migration/Makefile.objs > > +++ b/migration/Makefile.objs > > @@ -1,11 +1,11 @@ > > -common-obj-y += migration.o tcp.o unix.o > > +common-obj-y += migration.o tcp.o unix.o fd.o > > common-obj-y += vmstate.o > > common-obj-y += qemu-file.o qemu-file-buf.o qemu-file-unix.o > > qemu-file-stdio.o > > common-obj-y += qemu-file-channel.o > > common-obj-y += xbzrle.o postcopy-ram.o > > > > common-obj-$(CONFIG_RDMA) += rdma.o > > -common-obj-$(CONFIG_POSIX) += exec.o fd.o > > +common-obj-$(CONFIG_POSIX) += exec.o > > > > common-obj-y += block.o > > > > diff --git a/migration/fd.c b/migration/fd.c > > index 3e4bed0..8d48e0d 100644 > > --- a/migration/fd.c > > +++ b/migration/fd.c > > @@ -20,6 +20,8 @@ > > #include "monitor/monitor.h" > > #include "migration/qemu-file.h" > > #include "block/block.h" > > +#include "io/channel-file.h" > > +#include "io/channel-socket.h" > > > > //#define DEBUG_MIGRATION_FD > > > > @@ -33,56 +35,71 @@ > > > > static bool fd_is_socket(int fd) > > { > > -struct stat stat; > > -int ret = fstat(fd, &stat); > > -if (ret == -1) { > > -/* When in doubt say no */ > > -return false; > > -} > > -return S_ISSOCK(stat.st_mode); > > +int optval; > > +socklen_t optlen; > > +optlen = sizeof(optval); > > +return getsockopt(fd, > > + SOL_SOCKET, > > + SO_TYPE, > > + (char *)&optval, > > + &optlen) == 0; > > Should that be qemu_getsockopt ? Yes. > > void fd_start_outgoing_migration(MigrationState *s, const char *fdname, > > Error **errp) > > { > > +QIOChannel *ioc; > > int fd = monitor_get_fd(cur_mon, fdname, errp); > > if (fd == -1) { > > return; > > } > > > > if (fd_is_socket(fd)) { > > -s->file = qemu_fopen_socket(fd, "wb"); > > +ioc = QIO_CHANNEL(qio_channel_socket_new_fd(fd, errp)); > > +if (!ioc) { > > +close(fd); > > +return; > > +} > > Have you considered moving this fd_is_socket detection > glue down into channel-file.c? It's here so that a socket > passed via an fd will be able to use a shutdown() method. > It would be nice to do the same thing to sockets in the same > situation in other places, so it would make sense if it worked > on any fd that went through channels. > The tricky bit is at that level you return a QIOChannelFile rather > than a generic QIOChannel pointer. > (One place I'd like to be able to a shutdown is on an nbd channel > for example). Yeah, the complexity is that we have to return two different class instances depending on the type of FD in use. We could introduce some helper method todo this though. > > -static void fd_accept_incoming_migration(void *opaque) > > +static gboolean fd_accept_incoming_migration(QIOChannel *ioc, > > + GIOCondition condition, > > + gpointer opaque) > > { > > QEMUFile *f = opaque; > > - > > -qemu_set_fd_handler(qemu_get_fd(f), NULL, NULL, NULL); > > process_incoming_migration(f); > > +return FALSE; > > Please comment the magical FALSE. FALSE is the standard value any glib event loop handled need to return to cause glib to unregister it. I'm not sure we really want to comment every event handle in this way. > > void fd_start_incoming_migration(const char *infd, Error **errp) > > { > > -int fd; > > QEMUFile *f; > > +QIOChannel *ioc; > > +int fd; > > > > DPRINTF("Attempting to start an incoming migration via fd\n"); > > > > fd = strtol(infd, NULL, 0); > > if (fd_is_socket(fd)) { > > -f = qemu_fopen_socket(fd, "rb"); > > +ioc = QIO_CHANNEL(qio_channel_socket_new_fd(fd, errp)); > > +if (!ioc) { > > +close(fd); > > +return; > > +} > > Wouldn't it be better to move this check outside of this if, so that > you test the output of both the socket_new_fd and the file_new_fd ? qio_channel_file_new_fd() can never fail - only the socket_new_fd can fail (when trying to query the sockaddr_t data). > > } else { > > -f = q
Re: [Qemu-devel] [PATCH 2/2] m68k: Build the opcode table only once to avoid multithreading issues
On 02/03/2016 10:57 AM, Laurent Vivier wrote: > In fact, the line should be: > > +if (opcode_table[0] != NULL) > +return; > > thanks to scripts/checkpatch.pl. Want me to re-send it? -- .''`. John Paul Adrian Glaubitz : :' : Debian Developer - glaub...@debian.org `. `' Freie Universitaet Berlin - glaub...@physik.fu-berlin.de `-GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913
Re: [Qemu-devel] [PATCHv3 3/4] Allow ISA bus to be configured out
David Gibson writes: > Currently, the code to handle the legacy ISA bus is always included in > qemu. However there are lots of platforms that don't include ISA legacy > devies, and quite a few that have never used ISA legacy devices at all. > > This patch allows the ISA bus code to be disabled in the configuration for > platforms where it doesn't make sense. > > For now, the default configs are adjusted to include ISA on all platforms > including PCI: anything with PCI can at least in principle add an i82378 > PCI->ISA bridge. Also, CONFIG_IDE_CORE which is already in pci.mak > requires ISA support. > > We also explicitly enable ISA on some other non-PCI platforms which include > ISA devices. We may want to pare this down in future. Impact? Please list the targets that lose ISA because of this patch. > Signed-off-by: David Gibson > Acked-by: Michael S. Tsirkin > --- > default-configs/moxie-softmmu.mak | 1 + > default-configs/pci.mak | 2 ++ > default-configs/sparc-softmmu.mak | 1 + > default-configs/unicore32-softmmu.mak | 1 + > hw/isa/Makefile.objs | 2 +- > 5 files changed, 6 insertions(+), 1 deletion(-) > > diff --git a/default-configs/moxie-softmmu.mak > b/default-configs/moxie-softmmu.mak > index 1a95476..3886275 100644 > --- a/default-configs/moxie-softmmu.mak > +++ b/default-configs/moxie-softmmu.mak > @@ -1,5 +1,6 @@ > # Default configuration for moxie-softmmu > > +CONFIG_ISA_BUS=y > CONFIG_MC146818RTC=y > CONFIG_SERIAL=y > CONFIG_VGA=y Uh, PATCH 2 excepted moxie from "add CONFIG_SERIAL_ISA to every config that has CONFIG_SERIAL", but now you're giving it an ISA bus. Please explain. > diff --git a/default-configs/pci.mak b/default-configs/pci.mak > index f250119..bcf18f0 100644 > --- a/default-configs/pci.mak > +++ b/default-configs/pci.mak > @@ -1,4 +1,6 @@ > CONFIG_PCI=y > +# For now, IDE_CORE requires ISA, so we enable it here You mean CONFIG_IDE_CORE? If yes, please spell it out, so grep finds it. > +CONFIG_ISA_BUS=y > CONFIG_VIRTIO_PCI=y > CONFIG_VIRTIO=y > CONFIG_USB_UHCI=y > diff --git a/default-configs/sparc-softmmu.mak > b/default-configs/sparc-softmmu.mak > index ab796b3..004b0f4 100644 > --- a/default-configs/sparc-softmmu.mak > +++ b/default-configs/sparc-softmmu.mak > @@ -1,5 +1,6 @@ > # Default configuration for sparc-softmmu > > +CONFIG_ISA_BUS=y > CONFIG_ECC=y > CONFIG_ESP=y > CONFIG_ESCC=y > diff --git a/default-configs/unicore32-softmmu.mak > b/default-configs/unicore32-softmmu.mak > index de38577..5f6c4a8 100644 > --- a/default-configs/unicore32-softmmu.mak > +++ b/default-configs/unicore32-softmmu.mak > @@ -1,4 +1,5 @@ > # Default configuration for unicore32-softmmu > +CONFIG_ISA_BUS=y > CONFIG_PUV3=y > CONFIG_PTIMER=y > CONFIG_PCKBD=y > diff --git a/hw/isa/Makefile.objs b/hw/isa/Makefile.objs > index 9164556..fb37c55 100644 > --- a/hw/isa/Makefile.objs > +++ b/hw/isa/Makefile.objs > @@ -1,4 +1,4 @@ > -common-obj-y += isa-bus.o > +common-obj-$(CONFIG_ISA_BUS) += isa-bus.o > common-obj-$(CONFIG_APM) += apm.o > common-obj-$(CONFIG_I82378) += i82378.o > common-obj-$(CONFIG_PC87312) += pc87312.o
Re: [Qemu-devel] [PATCH v5 02/10] qemu-img: add support for --object command line arg
On Tue, Feb 02, 2016 at 05:24:32PM -0700, Eric Blake wrote: > On 02/02/2016 05:57 AM, Daniel P. Berrange wrote: > > Allow creation of user creatable object types with qemu-img > > via a new --object command line arg. This will be used to supply > > passwords and/or encryption keys to the various block driver > > backends via the recently added 'secret' object type. > > > > # printf letmein > mypasswd.txt > > # qemu-img info --object secret,id=sec0,file=mypasswd.txt \ > > ...other info args... > > > > Signed-off-by: Daniel P. Berrange > > --- > > qemu-img-cmds.hx | 44 - > > qemu-img.c | 269 > > +-- > > qemu-img.texi| 8 ++ > > 3 files changed, 291 insertions(+), 30 deletions(-) > > > > > +++ b/qemu-img.c > > @@ -94,6 +97,10 @@ static void QEMU_NORETURN help(void) > > "\n" > > "Command parameters:\n" > > " 'filename' is a disk image filename\n" > > + " 'objectdef' is a QEMU user creatable object definition. See > > the @code{qemu(1)}\n" > > Drop @code; this is the --help text. > > > + "manual page for a description of the object properties. > > The common object\n" > > + "type that it makes sense to define is 'secret' object, > > which is used to\n" > > s/is/is a/ > > or maybe go for something shorter: > > The most common object type is a 'secret', which is used... > > or match the text you put in the info: > > The only object type that it makes sense to define is the 'secret' > object, which is used... > > > @@ -275,7 +291,14 @@ static int img_create(int argc, char **argv) > > bool quiet = false; > > > > for(;;) { > > -c = getopt(argc, argv, "F:b:f:he6o:q"); > > +int option_index = 0; > > +static const struct option long_options[] = { > > +{"help", no_argument, 0, 'h'}, > > +{"object", required_argument, 0, OPTION_OBJECT}, > > +{0, 0, 0, 0} > > +}; > > +c = getopt_long(argc, argv, "F:b:f:he6o:q", > > +long_options, &option_index); > > Can't you pass NULL for the last parameter, if you aren't going to use > option_index in your error reporting? Oh, I didn't realize that it allowed NULL for that parameter. Regards, Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
Re: [Qemu-devel] [PATCH 2/2] m68k: Build the opcode table only once to avoid multithreading issues
Le 03/02/2016 11:06, John Paul Adrian Glaubitz a écrit : > On 02/03/2016 10:57 AM, Laurent Vivier wrote: >> In fact, the line should be: >> >> +if (opcode_table[0] != NULL) >> +return; >> >> thanks to scripts/checkpatch.pl. > > Want me to re-send it? > I don't know if the one who will commit this to the tree will want to update this. BTW, Peter, perhaps it's the time to add me as m68k maintainer ? (so I will manage that :) ) Laurent
[Qemu-devel] [PATCH] m68k: Build the opcode table only once to avoid multithreading issues
Signed-off-by: John Paul Adrian Glaubitz --- target-m68k/translate.c | 4 1 file changed, 4 insertions(+) diff --git a/target-m68k/translate.c b/target-m68k/translate.c index 535d7f9..a989961 100644 --- a/target-m68k/translate.c +++ b/target-m68k/translate.c @@ -2828,6 +2828,10 @@ register_opcode (disas_proc proc, uint16_t opcode, uint16_t mask) Later insn override earlier ones. */ void register_m68k_insns (CPUM68KState *env) { +/* Build the opcode table only once to avoid + multithreading issues. */ +if (opcode_table[0] != NULL) +return; #define INSN(name, opcode, mask, feature) do { \ if (m68k_feature(env, M68K_FEATURE_##feature)) \ register_opcode(disas_##name, 0x##opcode, 0x##mask); \ -- 2.7.0
[Qemu-devel] Updated patch for the opcode table
Hi Laurent! Here's the second patch again, this time checked with ./scripts/checkpatch.pl. Thanks for the heads-up! Adrian
Re: [Qemu-devel] [PATCH 2/2] m68k: Build the opcode table only once to avoid multithreading issues
On 02/03/2016 11:13 AM, Laurent Vivier wrote: > I don't know if the one who will commit this to the tree will want to > update this. Just sent an updated one, just in case :). > BTW, Peter, perhaps it's the time to add me as m68k maintainer ? > (so I will manage that :) ) Yes, please. I'm all for that! I'm really motivated to get m68k support into best shape. qemu-m68k is fun to hack on! Adrian -- .''`. John Paul Adrian Glaubitz : :' : Debian Developer - glaub...@debian.org `. `' Freie Universitaet Berlin - glaub...@physik.fu-berlin.de `-GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913
Re: [Qemu-devel] [PATCH v1 11/22] migration: convert fd socket protocol to use QIOChannel
* Daniel P. Berrange (berra...@redhat.com) wrote: > On Tue, Feb 02, 2016 at 06:46:01PM +, Dr. David Alan Gilbert wrote: > > * Daniel P. Berrange (berra...@redhat.com) wrote: > > > Convert the fd socket migration protocol driver to use > > > QIOChannel and QEMUFileChannel, instead of plain sockets > > > APIs. It can be unconditionally built because the > > > QIOChannel APIs it uses will take care to report suitable > > > error messages if needed. > > > > > > Signed-off-by: Daniel P. Berrange > > > --- > > > migration/Makefile.objs | 4 ++-- > > > migration/fd.c | 57 > > > - > > > migration/migration.c | 4 > > > 3 files changed, 39 insertions(+), 26 deletions(-) > > > > > > diff --git a/migration/Makefile.objs b/migration/Makefile.objs > > > index a5f8a03..64f95cd 100644 > > > --- a/migration/Makefile.objs > > > +++ b/migration/Makefile.objs > > > @@ -1,11 +1,11 @@ > > > -common-obj-y += migration.o tcp.o unix.o > > > +common-obj-y += migration.o tcp.o unix.o fd.o > > > common-obj-y += vmstate.o > > > common-obj-y += qemu-file.o qemu-file-buf.o qemu-file-unix.o > > > qemu-file-stdio.o > > > common-obj-y += qemu-file-channel.o > > > common-obj-y += xbzrle.o postcopy-ram.o > > > > > > common-obj-$(CONFIG_RDMA) += rdma.o > > > -common-obj-$(CONFIG_POSIX) += exec.o fd.o > > > +common-obj-$(CONFIG_POSIX) += exec.o > > > > > > common-obj-y += block.o > > > > > > diff --git a/migration/fd.c b/migration/fd.c > > > index 3e4bed0..8d48e0d 100644 > > > --- a/migration/fd.c > > > +++ b/migration/fd.c > > > @@ -20,6 +20,8 @@ > > > #include "monitor/monitor.h" > > > #include "migration/qemu-file.h" > > > #include "block/block.h" > > > +#include "io/channel-file.h" > > > +#include "io/channel-socket.h" > > > > > > //#define DEBUG_MIGRATION_FD > > > > > > @@ -33,56 +35,71 @@ > > > > > > static bool fd_is_socket(int fd) > > > { > > > -struct stat stat; > > > -int ret = fstat(fd, &stat); > > > -if (ret == -1) { > > > -/* When in doubt say no */ > > > -return false; > > > -} > > > -return S_ISSOCK(stat.st_mode); > > > +int optval; > > > +socklen_t optlen; > > > +optlen = sizeof(optval); > > > +return getsockopt(fd, > > > + SOL_SOCKET, > > > + SO_TYPE, > > > + (char *)&optval, > > > + &optlen) == 0; > > > > Should that be qemu_getsockopt ? > > Yes. > > > > void fd_start_outgoing_migration(MigrationState *s, const char *fdname, > > > Error **errp) > > > { > > > +QIOChannel *ioc; > > > int fd = monitor_get_fd(cur_mon, fdname, errp); > > > if (fd == -1) { > > > return; > > > } > > > > > > if (fd_is_socket(fd)) { > > > -s->file = qemu_fopen_socket(fd, "wb"); > > > +ioc = QIO_CHANNEL(qio_channel_socket_new_fd(fd, errp)); > > > +if (!ioc) { > > > +close(fd); > > > +return; > > > +} > > > > Have you considered moving this fd_is_socket detection > > glue down into channel-file.c? It's here so that a socket > > passed via an fd will be able to use a shutdown() method. > > It would be nice to do the same thing to sockets in the same > > situation in other places, so it would make sense if it worked > > on any fd that went through channels. > > The tricky bit is at that level you return a QIOChannelFile rather > > than a generic QIOChannel pointer. > > (One place I'd like to be able to a shutdown is on an nbd channel > > for example). > > Yeah, the complexity is that we have to return two different > class instances depending on the type of FD in use. We could > introduce some helper method todo this though. Is there any reason they return the subclasses rather than the common parent? > > > -static void fd_accept_incoming_migration(void *opaque) > > > +static gboolean fd_accept_incoming_migration(QIOChannel *ioc, > > > + GIOCondition condition, > > > + gpointer opaque) > > > { > > > QEMUFile *f = opaque; > > > - > > > -qemu_set_fd_handler(qemu_get_fd(f), NULL, NULL, NULL); > > > process_incoming_migration(f); > > > +return FALSE; > > > > Please comment the magical FALSE. > > FALSE is the standard value any glib event loop handled need to > return to cause glib to unregister it. I'm not sure we really > want to comment every event handle in this way. I'd prefer we did. It doesn't need to be big and detailed, but either a function comment, or something as simple as: return FALSE; /* unregister */ it's not something I immediately recognised, and it took a bit of digging around for me to find the answer. > > > void fd_start_incoming_migration(const char *infd, Error **errp) > > > { > > > -int fd; > > > QEMUFile *f; > > > +QIOChannel *ioc; > > > +int fd; > > > > > > DPR
Re: [Qemu-devel] [PATCH v1 10/22] migration: convert tcp socket protocol to use QIOChannel
* Daniel P. Berrange (berra...@redhat.com) wrote: > On Tue, Feb 02, 2016 at 06:19:03PM +, Dr. David Alan Gilbert wrote: > > * Daniel P. Berrange (berra...@redhat.com) wrote: > > > Convert the tcp socket migration protocol driver to use > > > QIOChannel and QEMUFileChannel, instead of plain sockets > > > APIs. > > > > > > While this now looks pretty similar to the migration/unix.c > > > file from the previous patch, it was decided not to merge > > > the two, because when TLS is added to the TCP impl later, > > > this file diverge from unix.c once again. > > > > Hmm OK, although I'd kind of like to see merging, but lets > > see the TLS code later in the series > > > > > Signed-off-by: Daniel P. Berrange > > > --- > > > migration/tcp.c | 119 > > > ++-- > > > 1 file changed, 82 insertions(+), 37 deletions(-) > > > > > > diff --git a/migration/tcp.c b/migration/tcp.c > > > index ae89172..ac73977 100644 > > > --- a/migration/tcp.c > > > +++ b/migration/tcp.c > > > @@ -2,9 +2,11 @@ > > > * QEMU live migration > > > * > > > * Copyright IBM, Corp. 2008 > > > + * Copyright Red Hat, Inc. 2015 > > > * > > > * Authors: > > > * Anthony Liguori > > > + * Daniel P. Berrange > > > * > > > * This work is licensed under the terms of the GNU GPL, version 2. See > > > * the COPYING file in the top-level directory. > > > @@ -17,11 +19,9 @@ > > > > > > #include "qemu-common.h" > > > #include "qemu/error-report.h" > > > -#include "qemu/sockets.h" > > > #include "migration/migration.h" > > > #include "migration/qemu-file.h" > > > -#include "block/block.h" > > > -#include "qemu/main-loop.h" > > > +#include "io/channel-socket.h" > > > > > > //#define DEBUG_MIGRATION_TCP > > > > > > @@ -33,71 +33,116 @@ > > > do { } while (0) > > > #endif > > > > > > -static void tcp_wait_for_connect(int fd, Error *err, void *opaque) > > > + > > > +static SocketAddress *tcp_build_address(const char *host_port, Error > > > **errp) > > > +{ > > > +InetSocketAddress *iaddr = inet_parse(host_port, errp); > > > +SocketAddress *saddr; > > > + > > > +if (!iaddr) { > > > +return NULL; > > > +} > > > + > > > +saddr = g_new0(SocketAddress, 1); > > > +saddr->type = SOCKET_ADDRESS_KIND_INET; > > > +saddr->u.inet = iaddr; > > > + > > > +return saddr; > > > +} > > > + > > > + > > > +static void tcp_outgoing_migration(Object *src, > > > + Error *err, > > > + gpointer opaque) > > > { > > > MigrationState *s = opaque; > > > +QIOChannel *sioc = QIO_CHANNEL(src); > > > > > > -if (fd < 0) { > > > +if (err) { > > > DPRINTF("migrate connect error: %s\n", error_get_pretty(err)); > > > s->file = NULL; > > > migrate_fd_error(s); > > > } else { > > > DPRINTF("migrate connect success\n"); > > > -s->file = qemu_fopen_socket(fd, "wb"); > > > +s->file = qemu_fopen_channel_output(sioc); > > > migrate_fd_connect(s); > > > } > > > +object_unref(src); > > > } > > > > > > -void tcp_start_outgoing_migration(MigrationState *s, const char > > > *host_port, Error **errp) > > > + > > > +void tcp_start_outgoing_migration(MigrationState *s, > > > + const char *host_port, > > > + Error **errp) > > > { > > > -inet_nonblocking_connect(host_port, tcp_wait_for_connect, s, errp); > > > +SocketAddress *saddr = tcp_build_address(host_port, errp); > > > +QIOChannelSocket *sioc; > > > + > > > +if (!saddr) { > > > +return; > > > +} > > > + > > > +sioc = qio_channel_socket_new(); > > > +qio_channel_socket_connect_async(sioc, > > > + saddr, > > > + tcp_outgoing_migration, > > > + s, > > > + NULL); > > > +qapi_free_SocketAddress(saddr); > > > } > > > > > > -static void tcp_accept_incoming_migration(void *opaque) > > > + > > > +static gboolean tcp_accept_incoming_migration(QIOChannel *ioc, > > > + GIOCondition condition, > > > + gpointer opaque) > > > { > > > -struct sockaddr_in addr; > > > -socklen_t addrlen = sizeof(addr); > > > -int s = (intptr_t)opaque; > > > QEMUFile *f; > > > -int c, err; > > > +QIOChannelSocket *cioc; > > > +Error *err = NULL; > > > > > > -do { > > > -c = qemu_accept(s, (struct sockaddr *)&addr, &addrlen); > > > -err = socket_error(); > > > -} while (c < 0 && err == EINTR); > > > -qemu_set_fd_handler(s, NULL, NULL, NULL); > > > -closesocket(s); > > > - > > > -DPRINTF("accepted migration\n"); > > > - > > > -if (c < 0) { > > > +cioc = qio_channel_socket_accep
Re: [Qemu-devel] [PATCH v6 1/5] util: Introduce error reporting functions with fatal/abort
Thomas Huth writes: > On 03.02.2016 10:48, Markus Armbruster wrote: >> David Gibson writes: >> >>> On Tue, Feb 02, 2016 at 10:47:35PM +0100, Thomas Huth wrote: On 02.02.2016 19:53, Markus Armbruster wrote: > Lluís Vilanova writes: ... >> diff --git a/include/qemu/error-report.h b/include/qemu/error-report.h >> index 7ab2355..6c2f142 100644 >> --- a/include/qemu/error-report.h >> +++ b/include/qemu/error-report.h >> @@ -43,4 +43,23 @@ void error_report(const char *fmt, ...) >> GCC_FMT_ATTR(1, 2); >> const char *error_get_progname(void); >> extern bool enable_timestamp_msg; >> >> +/* Report message and exit with error */ >> +void QEMU_NORETURN error_vreport_fatal(const char *fmt, va_list ap) >> GCC_FMT_ATTR(1, 0); >> +void QEMU_NORETURN error_report_fatal(const char *fmt, ...) >> GCC_FMT_ATTR(1, 2); > > This lets people write things like > > error_report_fatal("The sky is falling"); > > instead of > > error_report("The sky is falling"); > exit(1); > > or > > fprintf(stderr, "The sky is falling\n"); > exit(1); > > I don't think that's an improvement in clarity. The problem is not the existing code, but that in a couple of new patches, I've now already seen that people are trying to use error_setg(&error_fatal, ... ); >>> >>> So, I don't actually see any real advantage to error_report_fatal(...) >>> over error_setg(&error_fatal, ...). >> >> I do. Compare: >> >> (a) error_report(...); >> exit(1); >> >> (b) error_report_fatal(...); >> >> (c) error_setg(&error_fatal, ...); >> >> In my opinion, (a) is clearest: even a relatively clueless reader will >> know what exit(1) does, can guess what error_report() approximately >> does, and doesn't need to know what it does exactly. (b) is slightly >> less obvious, and (c) is positively opaque. >> >> Let's stick to the obvious (a) and be done with it. > > Ok, (a) is fine for me too, as long as we avoid (c). Lluís, could you > maybe add that information to your patch that updates the HACKING text? I feel such detailed advice belings into error.h. Sketch appended. If that doesn't succeed in keeping (c) out, make checkpatch flag it. > (and sorry for the fuzz with error_report_fatal() ... I thought it would > be a good solution to avoid (c), but if (a) is preferred instead, then > we should go with that solution instead). > > And, by the way, what about the spots that currently already use > error_setg(&error_abort, ) ? Should they be turned into > error_report() + abort() instead? Or only abort(), without error > message, since abort() is only about programming errors? As I wrote in my first reply to this thread, I'd like them to be cleaned up to just abort() or assert(). I like assert(), because it gives me exactly what I can use to debug the programming error: a core dump (if enabled) and a source location (useful when no core dump). I never bought the argument that we should use abort() instead of assert(0) because "what if NDEBUG?!?". If you define NDEBUG, our 600+ abort()s won't save you from our 4000+ assert()s. diff --git a/include/qapi/error.h b/include/qapi/error.h index 45d6c72..ea7e74f 100644 --- a/include/qapi/error.h +++ b/include/qapi/error.h @@ -162,6 +162,9 @@ ErrorClass error_get_class(const Error *err); * human-readable error message is made from printf-style @fmt, ... * The resulting message should be a single phrase, with no newline or * trailing punctuation. + * Please don't error_setg(&error_fatal, ...), use error_report() and + * exit(), because that's more obvious. + * Likewise, don't error_setg(&error_abort, ...), use assert(). */ #define error_setg(errp, fmt, ...) \ error_setg_internal((errp), __FILE__, __LINE__, __func__, \ @@ -213,6 +216,8 @@ void error_setg_win32_internal(Error **errp, * the error object. * Else, move the error object from @local_err to *@dst_errp. * On return, @local_err is invalid. + * Please don't error_propagate(&error_fatal, ...), use + * error_report_err() and exit(), because that's more obvious. */ void error_propagate(Error **dst_errp, Error *local_err); @@ -291,12 +296,14 @@ void error_set_internal(Error **errp, GCC_FMT_ATTR(6, 7); /* - * Pass to error_setg() & friends to abort() on error. + * Special error destination to abort on error. + * See error_setg() and error_propagate() for details. */ extern Error *error_abort; /* - * Pass to error_setg() & friends to exit(1) on error. + * Special error destination to exit(1) on error. + * See error_setg() and error_propagate() for details. */ extern Error *error_fatal;
Re: [Qemu-devel] ping: [PATCH v13] block/raw-posix.c: Make physical devices usable in QEMU under Mac OS X host
Am 02.02.2016 um 20:29 hat Eric Blake geschrieben: > On 02/02/2016 10:28 AM, Programmingkid wrote: > > >> Whats the rationale here ? Using pre-allocated fixed > >> length arrays is pretty bad practice in general, but > >> especially so for filenames > > > > With an automatic variable there is no worry about when to release it. > > Yeah, but it comes with the downside of having to worry about exhausting > stack space (there are platforms where MAXPATHLEN is intentionally > undefined [hello, GNU Hurd]), or with the downside of arbitrary > limitations. And at the same time, MAXPATHLEN is usually wasteful - > allocating 1k or 4k of stack to store what is typically less than 100 > bytes is dumb. Really, storing file names in fixed length arrays is > better off to avoid, and just get used to dynamic management. Just let me add that while it's probably harmless here in .bdrv_open(), other paths in the block layer which run in a coroutine have a much more limited stack size and the danger of stack overflows is very real there. Kevin
Re: [Qemu-devel] [PATCH v1 11/22] migration: convert fd socket protocol to use QIOChannel
On Wed, Feb 03, 2016 at 10:29:50AM +, Dr. David Alan Gilbert wrote: > * Daniel P. Berrange (berra...@redhat.com) wrote: > > On Tue, Feb 02, 2016 at 06:46:01PM +, Dr. David Alan Gilbert wrote: > > > > void fd_start_incoming_migration(const char *infd, Error **errp) > > > > { > > > > -int fd; > > > > QEMUFile *f; > > > > +QIOChannel *ioc; > > > > +int fd; > > > > > > > > DPRINTF("Attempting to start an incoming migration via fd\n"); > > > > > > > > fd = strtol(infd, NULL, 0); > > > > if (fd_is_socket(fd)) { > > > > -f = qemu_fopen_socket(fd, "rb"); > > > > +ioc = QIO_CHANNEL(qio_channel_socket_new_fd(fd, errp)); > > > > +if (!ioc) { > > > > +close(fd); > > > > +return; > > > > +} > > > > > > Wouldn't it be better to move this check outside of this if, so that > > > you test the output of both the socket_new_fd and the file_new_fd ? > > > > qio_channel_file_new_fd() can never fail - only the socket_new_fd can > > fail (when trying to query the sockaddr_t data). > > OK, I'm not too fussed about this bit, but: > 1) It's easier to read - one level less of nesting > 2) It avoids making an assumption about qio_channel_file_new_fd() never > failing, which is something you happen to know but you treat as > API; it costs nothing to avoid making that assumption. Note that qio_channel_file_new_fd() does not accept a 'Error **errp' since there is no failure condition. So if we pushed the failure check up it'd be feel like a logic error as we'd be introducing a error return path from the method where 'errp' may not be set. Regards, Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
Re: [Qemu-devel] [Qemu-block] [PATCH] block: add missing call to bdrv_drain_recurse
Am 23.12.2015 um 11:48 hat Paolo Bonzini geschrieben: > This is also needed in bdrv_drain_all, not just in bdrv_drain. > > Signed-off-by: Paolo Bonzini Reviewed-by: Kevin Wolf
Re: [Qemu-devel] [PATCH 2/6] i.MX: simplify CCM to only handle clock required by timers.
On 2 February 2016 at 22:22, Jean-Christophe DUBOIS wrote: > Peter Maydell wrote: >> These are just renaming NOCLK to CLK_NONE and fixing formatting? >> Again, please don't put that in the same patch as substantive >> code changes. > > > I just wanted to make things more coherent at the naming convention level. > > But if you prefer NOCLK, I'll put it back. I have no preference either way about the name. It's just hard to review patches if they mix lots of cleanups in at once, and especially if they mix stylistic changes in with behaviour changes. It's probably sufficient just to split this patch up into logically distinct changes with suitable commit messages. thanks -- PMM
Re: [Qemu-devel] [PATCH 1/1] arm: virt: change GPIO trigger interrupt to pulse
On 3 February 2016 at 07:15, Michael Tokarev wrote: > 28.01.2016 21:22, Wei Huang wrote: >> When QEMU is hook'ed up with libvirt/virsh, the first ACPI reboot >> request will succeed; but the following shutdown/reboot requests >> fail to trigger VMs to react. Notice that in mach-virt machine >> model GPIO is defined as edge-triggered and active-high in ACPI. >> This patch changes the behavior of powerdown notifier from PULLUP >> to PULSE. It solves the problem described above (i.e. reboot >> continues to work). > > So, what's the outcome of this? :) This patch is definitely wrong. The patch to fix up the gpio reset stuff is definitely the right idea. Whether it fixes the reported failure or some further change is also needed is currently unclear. thanks -- PMM
Re: [Qemu-devel] [PATCH] Fix inconsistency between comment and variable name
On 02/03/2016 05:35 PM, Markus Armbruster wrote: Michael Tokarev writes: 03.02.2016 06:19, Cao jin wrote: Signed-off-by: Cao jin --- include/hw/qdev-core.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h index abcdee8..42fa5db 100644 --- a/include/hw/qdev-core.h +++ b/include/hw/qdev-core.h @@ -221,7 +221,7 @@ typedef struct BusChild { /** * BusState: - * @hotplug_device: link to a hotplug device associated with bus. + * @hotplug_handler: link to a hotplug device associated with bus. Hmm. Now while the field name in comment and in the structure do match, the comment is still wrong, since it is a linke to a handler, not a device… :) Do you mean to suggest the line should be changed to * @hotplug_device: link to a hotplug handler associated with bus. ? If I understand it right, hotplug_handler is a link property of BusState, see qbus_initfn(). Tt actually points to a device, see qbus_set_hotplug_handler(), and the actually hotplug handler is reside in this device, see how qdev_get_hotplug_handler() is used. So, base on the talking above, the original comment could be understood by me:) -- Yours Sincerely, Cao jin
Re: [Qemu-devel] [PATCH v2 0/6] external backup api
On 03.02.2016 11:14, Fam Zheng wrote: On Sat, 01/30 13:56, Vladimir Sementsov-Ogievskiy wrote: Hi all. These series which aims to add external backup api. This is needed to allow backup software use our dirty bitmaps. Vmware and Parallels Cloud Server have this feature. What is the advantage of this appraoch over "drive-backup sync=incremental ..."? Hmm, you are asking about advantage of external backup over internal? It depends on external backup tool. There are three things are done: - add query-block-dirty-bitmap-ranges qmp command - add qmp commands for dirty-bitmap functions: create_successor, abdicate, reclaim. - make create-successor command transaction-able Then, external backup should be done like this: 1. qmp transaction { external-snapshot bitmap-create-successor } 2. qmp query frozen bitmap, not acquiring aio context. What do you do with the query response, read the snapshot image with an external tool? Yes. Alternatively, image fleecing may be used instead of snapshot. Fam 3. do external backup, using snapshot and bitmap 4. if (success backup) qmp bitmap-abdicate else qmp bitmap-reclaime 5. qmp merge snapshot v2: a lot of additions and changes, no sense to compare with v1 Vladimir Sementsov-Ogievskiy (6): block dirty bitmap: add next_zero function qmp: add query-block-dirty-bitmap-ranges iotests: test query-block-dirty-bitmap-ranges qapi: add qmp commands for some dirty bitmap functions qapi: make block-dirty-bitmap-create-successor transaction-able iotests: test external backup api block/dirty-bitmap.c | 60 ++ blockdev.c | 113 + include/block/dirty-bitmap.h | 9 include/qemu/hbitmap.h | 8 +++ qapi-schema.json | 4 +- qapi/block-core.json | 90 + qmp-commands.hx | 118 +++ tests/qemu-iotests/150 | 88 tests/qemu-iotests/150.out | 21 tests/qemu-iotests/151 | 77 tests/qemu-iotests/151.out | 7 +++ tests/qemu-iotests/group | 2 + util/hbitmap.c | 26 ++ 13 files changed, 622 insertions(+), 1 deletion(-) create mode 100755 tests/qemu-iotests/150 create mode 100644 tests/qemu-iotests/150.out create mode 100755 tests/qemu-iotests/151 create mode 100644 tests/qemu-iotests/151.out -- 1.8.3.1 -- Best regards, Vladimir
Re: [Qemu-devel] [PATCH v4] blockjob: Fix hang in block_job_finish_sync
On Tue, Feb 02, 2016 at 10:12:24AM +0800, Fam Zheng wrote: > With a mirror job running on a virtio-blk dataplane disk, sending "q" to > HMP will cause a dead loop in block_job_finish_sync. > > This is because the aio_poll() only processes the AIO context of bs > which has no more work to do, while the main loop BH that is scheduled > for setting the job->completed flag is never processed. > > Fix this by adding a flag in BlockJob structure, to track which context > to poll for the block job to make progress. Its value is set to true > when block_job_coroutine_complete() is called, and is checked in > block_job_finish_sync to determine which context to poll. > > Suggested-by: Stefan Hajnoczi > Signed-off-by: Fam Zheng > --- > blockjob.c | 6 +- > include/block/blockjob.h | 5 + > 2 files changed, 10 insertions(+), 1 deletion(-) Thanks, applied to my block tree: https://github.com/stefanha/qemu/commits/block Stefan signature.asc Description: PGP signature
Re: [Qemu-devel] [PATCH v2 0/6] external backup api
On Wed, 02/03 13:57, Vladimir Sementsov-Ogievskiy wrote: > On 03.02.2016 11:14, Fam Zheng wrote: > >On Sat, 01/30 13:56, Vladimir Sementsov-Ogievskiy wrote: > >>Hi all. > >> > >>These series which aims to add external backup api. This is needed to allow > >>backup software use our dirty bitmaps. > >> > >>Vmware and Parallels Cloud Server have this feature. > >What is the advantage of this appraoch over "drive-backup sync=incremental > >..."? > > Hmm, you are asking about advantage of external backup over > internal? It depends on external backup tool. I'm asking why the tool can't use drive-backup to achieve what you want. They seem very comparable. Fam > > > > >>There are three things are done: > >>- add query-block-dirty-bitmap-ranges qmp command > >>- add qmp commands for dirty-bitmap functions: create_successor, abdicate, > >> reclaim. > >>- make create-successor command transaction-able > >> > >>Then, external backup should be done like this: > >> > >>1. qmp transaction { > >> external-snapshot > >> bitmap-create-successor > >> } > >> > >>2. qmp query frozen bitmap, not acquiring aio context. > >What do you do with the query response, read the snapshot image with an > >external tool? > > Yes. Alternatively, image fleecing may be used instead of snapshot. > > > > >Fam > > > >>3. do external backup, using snapshot and bitmap > >> > >> > >>4. if (success backup) > >> qmp bitmap-abdicate > >> else > >> qmp bitmap-reclaime > >> > >>5. qmp merge snapshot > >> > >> > >>v2: a lot of additions and changes, no sense to compare with v1 > >> > >> > >>Vladimir Sementsov-Ogievskiy (6): > >> block dirty bitmap: add next_zero function > >> qmp: add query-block-dirty-bitmap-ranges > >> iotests: test query-block-dirty-bitmap-ranges > >> qapi: add qmp commands for some dirty bitmap functions > >> qapi: make block-dirty-bitmap-create-successor transaction-able > >> iotests: test external backup api > >> > >> block/dirty-bitmap.c | 60 ++ > >> blockdev.c | 113 > >> + > >> include/block/dirty-bitmap.h | 9 > >> include/qemu/hbitmap.h | 8 +++ > >> qapi-schema.json | 4 +- > >> qapi/block-core.json | 90 + > >> qmp-commands.hx | 118 > >> +++ > >> tests/qemu-iotests/150 | 88 > >> tests/qemu-iotests/150.out | 21 > >> tests/qemu-iotests/151 | 77 > >> tests/qemu-iotests/151.out | 7 +++ > >> tests/qemu-iotests/group | 2 + > >> util/hbitmap.c | 26 ++ > >> 13 files changed, 622 insertions(+), 1 deletion(-) > >> create mode 100755 tests/qemu-iotests/150 > >> create mode 100644 tests/qemu-iotests/150.out > >> create mode 100755 tests/qemu-iotests/151 > >> create mode 100644 tests/qemu-iotests/151.out > >> > >>-- > >>1.8.3.1 > >> > > > -- > Best regards, > Vladimir >
Re: [Qemu-devel] [PATCH v16 00/14] vfio-pci: pass the aer error to guest
On 01/17/2016 02:34 AM, Michael S. Tsirkin wrote: On Tue, Jan 12, 2016 at 10:43:01AM +0800, Cao jin wrote: From: Chen Fan For now, for vfio pci passthough devices when qemu receives an error from host aer report, currentlly just terminate the guest, but usually user want to know what error occurred but stopping the guest, so this patches add aer capability support for vfio device, and pass the error to guest, and have guest driver to recover from the error. I would like to see a version of this patchset that doesn't depend on pci core changes. I think that if you make this simplifying assumption: - all devices on same bus in guest are on same bus in host then you can handle both reset and hotplug simply in function 0 since it will belong to vfio. So we can have a version without pci core changes that simply assumes this, and things will just work. Now, if we wanted to enforce this limitation, I think the cleanest way would be to add a callback in struct PCIDevice: bool is_valid_function(PCIDevice *newfunction) and call it as each function is added. This way aer function can validate that each function added shares the same bus. And this way issues will be detected directly and not when function 0 is added. I would prefer this validation code to be a patch on top so we can merge the functionality directly and avoid blocking it while we figure out the best api to validate things. I don't see why making guest topology match host would ever be a problem, but if it's required to support configurations where these differ, I'd like to see an attempt to address that be split out, after aer is supported. Hi Michael, Just think about this more, I think we also should check the vfio devices whether on the same bus at the time of function 0 is added. because we don't know the affected devices by a bus reset have already all been assigned to VM. for example, the multi-function's hotplug. devices on same bus in host are added to VM one by one. when we test one device, we haven't yet added the other devices. so I think the patch should like below. then we could add a vfio_is_valid_function in vfio to test each device whether the affected devices on the same bus. Thanks, Chen diff --git a/hw/pci/pci.c b/hw/pci/pci.c index d940f79..7163b56 100644 --- a/hw/pci/pci.c +++ b/hw/pci/pci.c @@ -1836,6 +1836,38 @@ PCIDevice *pci_find_device(PCIBus *bus, int bus_num, uint8_t devfn) return bus->devices[devfn]; } +static int pci_bus_check_devices(PCIBus *bus) +{ +PCIDeviceClass *pc; +int i, ret = 0; + +for (i = 0; i < ARRAY_SIZE(bus->devices); ++i) { +if (!bus->devices[i]) { +continue; +} + +pc = PCI_DEVICE_GET_CLASS(bus->devices[i]); +if (!pc->is_valid_func) { +continue; +} + +ret = pc->is_valid_func(bus->devices[i], bus); +if (!ret) { +return -1; +} +} +return 0; +} + +static bool pci_is_valid_function(PCIDevice *pdev, PCIBus *bus) +{ +if (pdev->bus == bus) { +return true; +} + +return false; +} + static void pci_qdev_realize(DeviceState *qdev, Error **errp) { PCIDevice *pci_dev = (PCIDevice *)qdev; @@ -1878,6 +1910,14 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp) pci_qdev_unrealize(DEVICE(pci_dev), NULL); return; } + +if (DEVICE(pci_dev)->hotplugged && +pci_get_function_0(pci_dev) == pci_dev && +pci_bus_check_devices(bus)) { +error_setg(errp, "failed to hotplug function 0"); +pci_qdev_unrealize(DEVICE(pci_dev), NULL); +return; +} } static void pci_default_realize(PCIDevice *dev, Error **errp) @@ -2390,6 +2430,7 @@ static void pci_device_class_init(ObjectClass *klass, void *data) k->bus_type = TYPE_PCI_BUS; k->props = pci_props; pc->realize = pci_default_realize; +pc->is_valid_func = pci_is_valid_function; } AddressSpace *pci_device_iommu_address_space(PCIDevice *dev) diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h index dedf277..a89580f 100644 --- a/include/hw/pci/pci.h +++ b/include/hw/pci/pci.h @@ -191,6 +191,7 @@ typedef struct PCIDeviceClass { void (*realize)(PCIDevice *dev, Error **errp); int (*init)(PCIDevice *dev);/* TODO convert to realize() and remove */ +bool (*is_valid_func)(PCIDevice *dev, PCIBus *bus); PCIUnregisterFunc *exit; PCIConfigReadFunc *config_read; PCIConfigWriteFunc *config_write; v15-v16: 10/14, 11/14 are new to introduce a reset sequence id to specify the vfio devices has been reset for that reset. other patches aren't modified. v14-v15: 1. add device hot reset callback 2. add bus_in_reset for vfio device to avoid multi do host bus reset v13-v14: 1. for multifunction device, requiring all functions enable AER.(9/13) 2. due to all affected functions receive error signal, ignore no error occurred function. (12/13) v12-v13:
Re: [Qemu-devel] [SeaBIOS] [RFC PATCH v2] fw/pci: Add support for mapping Intel IGD OpRegion via QEMU
Hi, > +static void intel_igd_opregion_setup(struct pci_device *dev, void *arg) > +{ > +struct romfile_s *file = romfile_find("etc/igd-opregion"); Is it possible to have multiple igd devices in a single machine? So, should we include the pci address in the file name? Guess not needed, it's chipset graphics after all ... > +pci_config_writel(bdf, 0xFC, cpu_to_le32((u32)opregion)); Looks a bit funny in code which is never ever going to run on big endian machines ;) Patch looks good to me. cheers, Gerd
Re: [Qemu-devel] [PULL 00/17] Net patches
Jason Wang, on Wed 03 Feb 2016 12:52:24 +0800, wrote: > > Hi. I'm afraid this failed to build on w32: > > > > declaration specifiers or ‘...’ before ‘sa_family_t’ > > - switch to use unsigned short > - or typedef sa_family_t as unsigned short for windows Doing the second runs the risk of windows eventually catching up with what posix requires. Using unsigned short instead should be fine. Samuel
[Qemu-devel] [PULL 0/6] virtio-gpu: bugfixes and spice support preparation
Hi, Minor fixes and some preparing work in console and virtio-gpu for spice support. please pull, Gerd The following changes since commit c65db7705b7926f4a084b93778e4bd5dd3990aad: Merge remote-tracking branch 'remotes/maxreitz/tags/pull-block-for-peter-2016-02-02' into staging (2016-02-02 18:04:04 +) are available in the git repository at: git://git.kraxel.org/qemu tags/pull-vga-20160203-1 for you to fetch changes up to 321c9adba5a64a1a9de2dd7db5433b62a5433439: virtio-gpu: block any rendering until client (ui) is done (2016-02-03 10:41:36 +0100) virtio-gpu: bugfixes and spice support preparation Gerd Hoffmann (6): zap qemu_egl_has_ext in include/ui/egl-helpers.h console: block rendering until client is done virtio-gpu: fix memory leak in error path virtio-gpu: maintain command queue virtio-gpu: add support to enable/disable command processing virtio-gpu: block any rendering until client (ui) is done hw/display/virtio-gpu-3d.c | 11 -- hw/display/virtio-gpu.c| 77 ++ hw/display/virtio-vga.c| 10 ++ include/hw/virtio/virtio-gpu.h | 4 +++ include/ui/console.h | 2 ++ include/ui/egl-helpers.h | 1 - ui/console.c | 10 ++ 7 files changed, 90 insertions(+), 25 deletions(-)
[Qemu-devel] [PULL 1/6] zap qemu_egl_has_ext in include/ui/egl-helpers.h
Drop leftover prototype which sneaked in by mistake Signed-off-by: Gerd Hoffmann Reviewed-by: Marc-André Lureau --- include/ui/egl-helpers.h | 1 - 1 file changed, 1 deletion(-) diff --git a/include/ui/egl-helpers.h b/include/ui/egl-helpers.h index 5ad5dc3..8c84398 100644 --- a/include/ui/egl-helpers.h +++ b/include/ui/egl-helpers.h @@ -11,6 +11,5 @@ EGLSurface qemu_egl_init_surface_x11(EGLContext ectx, Window win); int qemu_egl_init_dpy(EGLNativeDisplayType dpy, bool gles, bool debug); EGLContext qemu_egl_init_ctx(void); -bool qemu_egl_has_ext(const char *haystack, const char *needle); #endif /* EGL_HELPERS_H */ -- 1.8.3.1
[Qemu-devel] [PULL 4/6] virtio-gpu: maintain command queue
We'll go take out the commands we receive out of the virt queue and put them into a linked list, to decouple virtio queue handling from actual command processing. Also move cmd processing to new virtio_gpu_handle_ctrl func, so we can easily kick it from different places. Signed-off-by: Gerd Hoffmann --- hw/display/virtio-gpu.c| 63 +++--- include/hw/virtio/virtio-gpu.h | 1 + 2 files changed, 42 insertions(+), 22 deletions(-) diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c index 3c96f8e..a2ec7cb 100644 --- a/hw/display/virtio-gpu.c +++ b/hw/display/virtio-gpu.c @@ -755,33 +755,21 @@ static void virtio_gpu_handle_cursor_cb(VirtIODevice *vdev, VirtQueue *vq) qemu_bh_schedule(g->cursor_bh); } -static void virtio_gpu_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq) +static void virtio_gpu_process_cmdq(VirtIOGPU *g) { -VirtIOGPU *g = VIRTIO_GPU(vdev); struct virtio_gpu_ctrl_command *cmd; -if (!virtio_queue_ready(vq)) { -return; -} - -#ifdef CONFIG_VIRGL -if (!g->renderer_inited && g->use_virgl_renderer) { -virtio_gpu_virgl_init(g); -g->renderer_inited = true; -} -#endif - -cmd = g_new(struct virtio_gpu_ctrl_command, 1); -while (virtqueue_pop(vq, &cmd->elem)) { -cmd->vq = vq; -cmd->error = 0; -cmd->finished = false; -if (virtio_gpu_stats_enabled(g->conf)) { -g->stats.requests++; -} +while (!QTAILQ_EMPTY(&g->cmdq)) { +cmd = QTAILQ_FIRST(&g->cmdq); +/* process command */ VIRGL(g, virtio_gpu_virgl_process_cmd, virtio_gpu_simple_process_cmd, g, cmd); +QTAILQ_REMOVE(&g->cmdq, cmd, next); +if (virtio_gpu_stats_enabled(g->conf)) { +g->stats.requests++; +} + if (!cmd->finished) { QTAILQ_INSERT_TAIL(&g->fenceq, cmd, next); g->inflight++; @@ -791,11 +779,41 @@ static void virtio_gpu_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq) } fprintf(stderr, "inflight: %3d (+)\r", g->inflight); } -cmd = g_new(struct virtio_gpu_ctrl_command, 1); +} else { +g_free(cmd); } } +} + +static void virtio_gpu_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq) +{ +VirtIOGPU *g = VIRTIO_GPU(vdev); +struct virtio_gpu_ctrl_command *cmd; + +if (!virtio_queue_ready(vq)) { +return; +} + +#ifdef CONFIG_VIRGL +if (!g->renderer_inited && g->use_virgl_renderer) { +virtio_gpu_virgl_init(g); +g->renderer_inited = true; +} +#endif + +cmd = g_new(struct virtio_gpu_ctrl_command, 1); +while (virtqueue_pop(vq, &cmd->elem)) { +cmd->vq = vq; +cmd->error = 0; +cmd->finished = false; +cmd->waiting = false; +QTAILQ_INSERT_TAIL(&g->cmdq, cmd, next); +cmd = g_new(struct virtio_gpu_ctrl_command, 1); +} g_free(cmd); +virtio_gpu_process_cmdq(g); + #ifdef CONFIG_VIRGL if (g->use_virgl_renderer) { virtio_gpu_virgl_fence_poll(g); @@ -921,6 +939,7 @@ static void virtio_gpu_device_realize(DeviceState *qdev, Error **errp) g->ctrl_bh = qemu_bh_new(virtio_gpu_ctrl_bh, g); g->cursor_bh = qemu_bh_new(virtio_gpu_cursor_bh, g); QTAILQ_INIT(&g->reslist); +QTAILQ_INIT(&g->cmdq); QTAILQ_INIT(&g->fenceq); g->enabled_output_bitmask = 1; diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h index 9b279d7..f7e7a52 100644 --- a/include/hw/virtio/virtio-gpu.h +++ b/include/hw/virtio/virtio-gpu.h @@ -94,6 +94,7 @@ typedef struct VirtIOGPU { DeviceState *qdev; QTAILQ_HEAD(, virtio_gpu_simple_resource) reslist; +QTAILQ_HEAD(, virtio_gpu_ctrl_command) cmdq; QTAILQ_HEAD(, virtio_gpu_ctrl_command) fenceq; struct virtio_gpu_scanout scanout[VIRTIO_GPU_MAX_SCANOUT]; -- 1.8.3.1
[Qemu-devel] [PULL 3/6] virtio-gpu: fix memory leak in error path
Found by Coverity Scan, buf not freed on error. Signed-off-by: Gerd Hoffmann Reviewed-by: Marc-André Lureau --- hw/display/virtio-gpu-3d.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/hw/display/virtio-gpu-3d.c b/hw/display/virtio-gpu-3d.c index 59581a4..e13122d 100644 --- a/hw/display/virtio-gpu-3d.c +++ b/hw/display/virtio-gpu-3d.c @@ -198,7 +198,7 @@ static void virgl_cmd_submit_3d(VirtIOGPU *g, qemu_log_mask(LOG_GUEST_ERROR, "%s: size mismatch (%zd/%d)", __func__, s, cs.size); cmd->error = VIRTIO_GPU_RESP_ERR_INVALID_PARAMETER; -return; +goto out; } if (virtio_gpu_stats_enabled(g->conf)) { @@ -208,6 +208,7 @@ static void virgl_cmd_submit_3d(VirtIOGPU *g, virgl_renderer_submit_cmd(buf, cs.hdr.ctx_id, cs.size / 4); +out: g_free(buf); } -- 1.8.3.1
[Qemu-devel] [PULL 6/6] virtio-gpu: block any rendering until client (ui) is done
Wire up gl_block callback, so ui code can request to stop virtio-gpu rendering. Signed-off-by: Gerd Hoffmann --- hw/display/virtio-gpu-3d.c | 5 + hw/display/virtio-gpu.c| 11 +++ hw/display/virtio-vga.c| 10 ++ include/hw/virtio/virtio-gpu.h | 1 + 4 files changed, 27 insertions(+) diff --git a/hw/display/virtio-gpu-3d.c b/hw/display/virtio-gpu-3d.c index 6f646b1..fa19294 100644 --- a/hw/display/virtio-gpu-3d.c +++ b/hw/display/virtio-gpu-3d.c @@ -383,6 +383,11 @@ void virtio_gpu_virgl_process_cmd(VirtIOGPU *g, { VIRTIO_GPU_FILL_CMD(cmd->cmd_hdr); +cmd->waiting = g->renderer_blocked; +if (cmd->waiting) { +return; +} + virgl_renderer_force_ctx_0(); switch (cmd->cmd_hdr.type) { case VIRTIO_GPU_CMD_CTX_CREATE: diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c index af9b757..1cb4002 100644 --- a/hw/display/virtio-gpu.c +++ b/hw/display/virtio-gpu.c @@ -897,11 +897,22 @@ static int virtio_gpu_ui_info(void *opaque, uint32_t idx, QemuUIInfo *info) return 0; } +static void virtio_gpu_gl_block(void *opaque, bool block) +{ +VirtIOGPU *g = opaque; + +g->renderer_blocked = block; +if (!block) { +virtio_gpu_process_cmdq(g); +} +} + const GraphicHwOps virtio_gpu_ops = { .invalidate = virtio_gpu_invalidate_display, .gfx_update = virtio_gpu_update_display, .text_update = virtio_gpu_text_update, .ui_info = virtio_gpu_ui_info, +.gl_block = virtio_gpu_gl_block, }; static void virtio_gpu_device_realize(DeviceState *qdev, Error **errp) diff --git a/hw/display/virtio-vga.c b/hw/display/virtio-vga.c index 249dbc0..e58b165 100644 --- a/hw/display/virtio-vga.c +++ b/hw/display/virtio-vga.c @@ -66,11 +66,21 @@ static int virtio_vga_ui_info(void *opaque, uint32_t idx, QemuUIInfo *info) return -1; } +static void virtio_vga_gl_block(void *opaque, bool block) +{ +VirtIOVGA *vvga = opaque; + +if (virtio_gpu_ops.gl_block) { +virtio_gpu_ops.gl_block(&vvga->vdev, block); +} +} + static const GraphicHwOps virtio_vga_ops = { .invalidate = virtio_vga_invalidate_display, .gfx_update = virtio_vga_update_display, .text_update = virtio_vga_text_update, .ui_info = virtio_vga_ui_info, +.gl_block = virtio_vga_gl_block, }; /* VGA device wrapper around PCI device around virtio GPU */ diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h index f6cae0b..13b0ab0 100644 --- a/include/hw/virtio/virtio-gpu.h +++ b/include/hw/virtio/virtio-gpu.h @@ -107,6 +107,7 @@ typedef struct VirtIOGPU { bool use_virgl_renderer; bool renderer_inited; +bool renderer_blocked; QEMUTimer *fence_poll; QEMUTimer *print_stats; -- 1.8.3.1
[Qemu-devel] [PULL 2/6] console: block rendering until client is done
Allow gl user interfaces to block display device gl rendering. The ui code might want to do that in case it takes a little longer to bring things to screen, for example because we'll hand over a dma-buf to another process (spice will do that). Signed-off-by: Gerd Hoffmann Reviewed-by: Marc-André Lureau --- include/ui/console.h | 2 ++ ui/console.c | 10 ++ 2 files changed, 12 insertions(+) diff --git a/include/ui/console.h b/include/ui/console.h index adac36d..12ad627 100644 --- a/include/ui/console.h +++ b/include/ui/console.h @@ -362,6 +362,7 @@ typedef struct GraphicHwOps { void (*text_update)(void *opaque, console_ch_t *text); void (*update_interval)(void *opaque, uint64_t interval); int (*ui_info)(void *opaque, uint32_t head, QemuUIInfo *info); +void (*gl_block)(void *opaque, bool block); } GraphicHwOps; QemuConsole *graphic_console_init(DeviceState *dev, uint32_t head, @@ -374,6 +375,7 @@ void graphic_console_set_hwops(QemuConsole *con, void graphic_hw_update(QemuConsole *con); void graphic_hw_invalidate(QemuConsole *con); void graphic_hw_text_update(QemuConsole *con, console_ch_t *chardata); +void graphic_hw_gl_block(QemuConsole *con, bool block); QemuConsole *qemu_console_lookup_by_index(unsigned int index); QemuConsole *qemu_console_lookup_by_device(DeviceState *dev, uint32_t head); diff --git a/ui/console.c b/ui/console.c index fe950c6..791b4fc 100644 --- a/ui/console.c +++ b/ui/console.c @@ -261,6 +261,16 @@ void graphic_hw_update(QemuConsole *con) } } +void graphic_hw_gl_block(QemuConsole *con, bool block) +{ +if (!con) { +con = active_console; +} +if (con && con->hw_ops->gl_block) { +con->hw_ops->gl_block(con->hw, block); +} +} + void graphic_hw_invalidate(QemuConsole *con) { if (!con) { -- 1.8.3.1
[Qemu-devel] [PULL 5/6] virtio-gpu: add support to enable/disable command processing
So we can stop rendering for a while in case we have to. Signed-off-by: Gerd Hoffmann Reviewed-by: Marc-André Lureau --- hw/display/virtio-gpu-3d.c | 3 ++- hw/display/virtio-gpu.c| 5 - include/hw/virtio/virtio-gpu.h | 2 ++ 3 files changed, 8 insertions(+), 2 deletions(-) diff --git a/hw/display/virtio-gpu-3d.c b/hw/display/virtio-gpu-3d.c index e13122d..6f646b1 100644 --- a/hw/display/virtio-gpu-3d.c +++ b/hw/display/virtio-gpu-3d.c @@ -554,7 +554,8 @@ static void virtio_gpu_fence_poll(void *opaque) VirtIOGPU *g = opaque; virgl_renderer_poll(); -if (g->inflight) { +virtio_gpu_process_cmdq(g); +if (!QTAILQ_EMPTY(&g->cmdq) || !QTAILQ_EMPTY(&g->fenceq)) { timer_mod(g->fence_poll, qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 10); } } diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c index a2ec7cb..af9b757 100644 --- a/hw/display/virtio-gpu.c +++ b/hw/display/virtio-gpu.c @@ -755,7 +755,7 @@ static void virtio_gpu_handle_cursor_cb(VirtIODevice *vdev, VirtQueue *vq) qemu_bh_schedule(g->cursor_bh); } -static void virtio_gpu_process_cmdq(VirtIOGPU *g) +void virtio_gpu_process_cmdq(VirtIOGPU *g) { struct virtio_gpu_ctrl_command *cmd; @@ -765,6 +765,9 @@ static void virtio_gpu_process_cmdq(VirtIOGPU *g) /* process command */ VIRGL(g, virtio_gpu_virgl_process_cmd, virtio_gpu_simple_process_cmd, g, cmd); +if (cmd->waiting) { +break; +} QTAILQ_REMOVE(&g->cmdq, cmd, next); if (virtio_gpu_stats_enabled(g->conf)) { g->stats.requests++; diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h index f7e7a52..f6cae0b 100644 --- a/include/hw/virtio/virtio-gpu.h +++ b/include/hw/virtio/virtio-gpu.h @@ -76,6 +76,7 @@ struct virtio_gpu_ctrl_command { VirtQueue *vq; struct virtio_gpu_ctrl_hdr cmd_hdr; uint32_t error; +bool waiting; bool finished; QTAILQ_ENTRY(virtio_gpu_ctrl_command) next; }; @@ -152,6 +153,7 @@ int virtio_gpu_create_mapping_iov(struct virtio_gpu_resource_attach_backing *ab, struct virtio_gpu_ctrl_command *cmd, struct iovec **iov); void virtio_gpu_cleanup_mapping_iov(struct iovec *iov, uint32_t count); +void virtio_gpu_process_cmdq(VirtIOGPU *g); /* virtio-gpu-3d.c */ void virtio_gpu_virgl_process_cmd(VirtIOGPU *g, -- 1.8.3.1
Re: [Qemu-devel] [PATCH v2 0/6] external backup api
On 03.02.2016 14:02, Fam Zheng wrote: On Wed, 02/03 13:57, Vladimir Sementsov-Ogievskiy wrote: On 03.02.2016 11:14, Fam Zheng wrote: On Sat, 01/30 13:56, Vladimir Sementsov-Ogievskiy wrote: Hi all. These series which aims to add external backup api. This is needed to allow backup software use our dirty bitmaps. Vmware and Parallels Cloud Server have this feature. What is the advantage of this appraoch over "drive-backup sync=incremental ..."? Hmm, you are asking about advantage of external backup over internal? It depends on external backup tool. I'm asking why the tool can't use drive-backup to achieve what you want. They seem very comparable. drive-backup has limited range of maintained by qemu targets.. Third party tool uses its own internal targets. If we use drvie-backup there will be two copies instead of one: from qemu to drive-backup produced backup and then to internal target in third-party tool. Fam There are three things are done: - add query-block-dirty-bitmap-ranges qmp command - add qmp commands for dirty-bitmap functions: create_successor, abdicate, reclaim. - make create-successor command transaction-able Then, external backup should be done like this: 1. qmp transaction { external-snapshot bitmap-create-successor } 2. qmp query frozen bitmap, not acquiring aio context. What do you do with the query response, read the snapshot image with an external tool? Yes. Alternatively, image fleecing may be used instead of snapshot. Fam 3. do external backup, using snapshot and bitmap 4. if (success backup) qmp bitmap-abdicate else qmp bitmap-reclaime 5. qmp merge snapshot v2: a lot of additions and changes, no sense to compare with v1 Vladimir Sementsov-Ogievskiy (6): block dirty bitmap: add next_zero function qmp: add query-block-dirty-bitmap-ranges iotests: test query-block-dirty-bitmap-ranges qapi: add qmp commands for some dirty bitmap functions qapi: make block-dirty-bitmap-create-successor transaction-able iotests: test external backup api block/dirty-bitmap.c | 60 ++ blockdev.c | 113 + include/block/dirty-bitmap.h | 9 include/qemu/hbitmap.h | 8 +++ qapi-schema.json | 4 +- qapi/block-core.json | 90 + qmp-commands.hx | 118 +++ tests/qemu-iotests/150 | 88 tests/qemu-iotests/150.out | 21 tests/qemu-iotests/151 | 77 tests/qemu-iotests/151.out | 7 +++ tests/qemu-iotests/group | 2 + util/hbitmap.c | 26 ++ 13 files changed, 622 insertions(+), 1 deletion(-) create mode 100755 tests/qemu-iotests/150 create mode 100644 tests/qemu-iotests/150.out create mode 100755 tests/qemu-iotests/151 create mode 100644 tests/qemu-iotests/151.out -- 1.8.3.1 -- Best regards, Vladimir -- Best regards, Vladimir
Re: [Qemu-devel] [PATCH v1 09/22] migration: convert unix socket protocol to use QIOChannel
On Tue, Feb 02, 2016 at 06:02:13PM +, Dr. David Alan Gilbert wrote: > * Daniel P. Berrange (berra...@redhat.com) wrote: > > Convert the unix socket migration protocol driver to use > > QIOChannel and QEMUFileChannel, instead of plain sockets > > APIs. It can be unconditionally built, since the socket > > impl of QIOChannel will report a suitable error on platforms > > where UNIX sockets are unavailable. > > > > Signed-off-by: Daniel P. Berrange > > --- > > migration/Makefile.objs | 4 +- > > migration/migration.c | 4 ++ > > migration/unix.c| 103 > > +++- > > 3 files changed, 72 insertions(+), 39 deletions(-) > > > > diff --git a/migration/Makefile.objs b/migration/Makefile.objs > > index b357e2f..a5f8a03 100644 > > --- a/migration/Makefile.objs > > +++ b/migration/Makefile.objs > > @@ -1,11 +1,11 @@ > > -common-obj-y += migration.o tcp.o > > +common-obj-y += migration.o tcp.o unix.o > > common-obj-y += vmstate.o > > common-obj-y += qemu-file.o qemu-file-buf.o qemu-file-unix.o > > qemu-file-stdio.o > > common-obj-y += qemu-file-channel.o > > common-obj-y += xbzrle.o postcopy-ram.o > > > > common-obj-$(CONFIG_RDMA) += rdma.o > > -common-obj-$(CONFIG_POSIX) += exec.o unix.o fd.o > > +common-obj-$(CONFIG_POSIX) += exec.o fd.o > > > > common-obj-y += block.o > > > > diff --git a/migration/migration.c b/migration/migration.c > > index e921b20..1c5f12e 100644 > > --- a/migration/migration.c > > +++ b/migration/migration.c > > @@ -312,8 +312,10 @@ void qemu_start_incoming_migration(const char *uri, > > Error **errp) > > #if !defined(WIN32) > > } else if (strstart(uri, "exec:", &p)) { > > exec_start_incoming_migration(p, errp); > > +#endif > > } else if (strstart(uri, "unix:", &p)) { > > unix_start_incoming_migration(p, errp); > > +#if !defined(WIN32) > > } else if (strstart(uri, "fd:", &p)) { > > fd_start_incoming_migration(p, errp); > > #endif > > @@ -1017,8 +1019,10 @@ void qmp_migrate(const char *uri, bool has_blk, bool > > blk, > > #if !defined(WIN32) > > } else if (strstart(uri, "exec:", &p)) { > > exec_start_outgoing_migration(s, p, &local_err); > > +#endif > > } else if (strstart(uri, "unix:", &p)) { > > unix_start_outgoing_migration(s, p, &local_err); > > +#if !defined(WIN32) > > } else if (strstart(uri, "fd:", &p)) { > > fd_start_outgoing_migration(s, p, &local_err); > > #endif > > diff --git a/migration/unix.c b/migration/unix.c > > index b591813..4674640 100644 > > --- a/migration/unix.c > > +++ b/migration/unix.c > > @@ -1,10 +1,11 @@ > > /* > > * QEMU live migration via Unix Domain Sockets > > * > > - * Copyright Red Hat, Inc. 2009 > > + * Copyright Red Hat, Inc. 2009-2015 > > year++ ? > > > * > > * Authors: > > * Chris Lalancette > > + * Daniel P. Berrange > > * > > * This work is licensed under the terms of the GNU GPL, version 2. See > > * the COPYING file in the top-level directory. > > @@ -17,11 +18,9 @@ > > > > #include "qemu-common.h" > > #include "qemu/error-report.h" > > -#include "qemu/sockets.h" > > -#include "qemu/main-loop.h" > > #include "migration/migration.h" > > #include "migration/qemu-file.h" > > -#include "block/block.h" > > +#include "io/channel-socket.h" > > > > //#define DEBUG_MIGRATION_UNIX > > > > @@ -33,71 +32,101 @@ > > do { } while (0) > > #endif > > > > -static void unix_wait_for_connect(int fd, Error *err, void *opaque) > > + > > +static SocketAddress *unix_build_address(const char *path) > > +{ > > +SocketAddress *saddr; > > + > > +saddr = g_new0(SocketAddress, 1); > > +saddr->type = SOCKET_ADDRESS_KIND_UNIX; > > +saddr->u.q_unix = g_new0(UnixSocketAddress, 1); > > +saddr->u.q_unix->path = g_strdup(path); > > + > > +return saddr; > > +} > > + > > + > > +static void unix_outgoing_migration(Object *src, > > +Error *err, > > +gpointer opaque) > > { > > MigrationState *s = opaque; > > +QIOChannel *sioc = QIO_CHANNEL(src); > > > > -if (fd < 0) { > > +if (err) { > > DPRINTF("migrate connect error: %s\n", error_get_pretty(err)); > > s->file = NULL; > > migrate_fd_error(s); > > } else { > > DPRINTF("migrate connect success\n"); > > -s->file = qemu_fopen_socket(fd, "wb"); > > +s->file = qemu_fopen_channel_output(sioc); > > migrate_fd_connect(s); > > } > > +object_unref(src); > > } > > > > + > > void unix_start_outgoing_migration(MigrationState *s, const char *path, > > Error **errp) > > { > > -unix_nonblocking_connect(path, unix_wait_for_connect, s, errp); > > +SocketAddress *saddr = unix_build_address(path); > > +QIOChannelSocket *sioc; > > +sioc = qio_channel_socket_new(); > > +qio_channel_socket_connect_async(sioc, > > +
Re: [Qemu-devel] [PATCH v14 7/8] Implement new driver for block replication
On 2016/2/3 17:55, Wen Congyang wrote: On 02/03/2016 05:32 PM, Stefan Hajnoczi wrote: On Wed, Feb 03, 2016 at 09:29:15AM +0800, Wen Congyang wrote: On 02/02/2016 10:34 PM, Stefan Hajnoczi wrote: On Mon, Feb 01, 2016 at 09:13:36AM +0800, Wen Congyang wrote: On 01/29/2016 11:46 PM, Stefan Hajnoczi wrote: On Fri, Jan 29, 2016 at 11:13:42AM +0800, Changlong Xie wrote: On 01/28/2016 11:15 PM, Stefan Hajnoczi wrote: On Thu, Jan 28, 2016 at 09:13:24AM +0800, Wen Congyang wrote: On 01/27/2016 10:46 PM, Stefan Hajnoczi wrote: On Wed, Jan 13, 2016 at 05:18:31PM +0800, Changlong Xie wrote: I'm concerned that the bdrv_drain_all() in vm_stop() can take a long time if the disk is slow/failing. bdrv_drain_all() blocks until all in-flight I/O requests have completed. What does the Primary do if the Secondary becomes unresponsive? Actually, we knew this problem. But currently, there seems no better way to resolve it. If you have any ideas? Is it possible to hold the checkpoint information and acknowledge the checkpoint right away, without waiting for bdrv_drain_all() or any Secondory guest activity to complete? There is no way to know that secondary becomes unreponsive. I meant whether it is necessary for the Secondary to vm_stop() and apply the checkpoint before acknowledging the checkpoint to the Primary? I don't understand this. Here is the COLO checkpoint flow: PrimarySecondary new checkpoint notice ---> vm_stop() vm_stop() vm state(device state, memory, cpu) ---> load state <--- done vm_start() vm_start() If the Secondary's vm_stop() call blocks then the Primary is stuck too. I was wondering whether the Secondary can do: <--- done vm_stop() load state It simply receives the checkpoint data into a buffer and immediately replies with "done". vm_stop() and load state is only performed after sending "done". Secondary vm is running, so we should also get the pages that are dirtied by secondary vm, but not dirtied by primary vm. We have two ways to do it: 1. Cache all original memory in the secondary qemu 2. Send the dirty pfn list to primary qemu, and get it. If we ack the checkpoint and the call vm_stop(), we only can select 1. It means that secondary qemu costs more memory. In COLO mode, we will compare the output socket, and will do checkpoint if the application level data is different. If we ack the checkpoint and the call vm_stop(), the client can not get any more data until secondary vm is running again. So we still 'wait' the secondary vm. The advantage is that the Primary will not be delayed by the Secondary. It's an approach that doesn't block. But perhaps it's a problem if the Secondary is slower than the Primary since the Secondary still needs to complete vm_stop() and load state before it can resume execution? I think this really means falling back to microcheckpointing until the Secondary guest can checkpoint. Instead of a blocking vm_stop() we would prevent vcpus from running and when the last pending I/O finishes the Secondary could apply the last checkpoint. This approach does not block QEMU (the monitor, etc). If secondary host becomes unresponsive, it means that we cannot do mocrocheckpointing. We should do failover in this case. This is dangerous because it means that a delay/failure in the Secondary would cause the Primary to fail over to the broken Secondary. All the more reason not to perform blocking operations on the Secondary in the checkpoint code path. If the secondary is broken, primary qemu will take over. Does the Primary use a timeout between "new checkpoint notice" and Secondary's "done" so it can move on if the Secondary is unresponsive? To hailiang: IIRC, we don't use a timeout but I think we can do it. In our design, there is Yes, we may need a timeout to help detecting the unresponsive case which can not be caught by the external heartbeat module. I will investigate it. Thanks, Hailiang an exteranl heartbeat to check primary and secondary status, and decide when to do checkpoint. Thanks Wen Congyang Stefan .
Re: [Qemu-devel] [PATCH v1 13/22] migration: convert RDMA to use QIOChannel interface
On Tue, Feb 02, 2016 at 08:01:36PM +, Dr. David Alan Gilbert wrote: > * Daniel P. Berrange (berra...@redhat.com) wrote: > > This converts the RDMA code to provide a subclass of > > QIOChannel that uses RDMA for the data transport. > > > > The RDMA code would be much better off it it could > > be split up in a generic RDMA layer, a QIOChannel > > impl based on RMDA, and then the RMDA migration > > glue. This is left as a future exercise for the brave. > > > > Signed-off-by: Daniel P. Berrange > > --- > > migration/rdma.c | 260 > > ++- > > 1 file changed, 161 insertions(+), 99 deletions(-) > > > > +static ssize_t qio_channel_rdma_readv(QIOChannel *ioc, > > + const struct iovec *iov, > > + size_t niov, > > + int **fds, > > + size_t *nfds, > > + Error **errp) > > +{ > > +QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc); > > +RDMAContext *rdma = rioc->rdma; > > RDMAControlHeader head; > > int ret = 0; > > +ssize_t i; > > +size_t done = 0; > > > > CHECK_ERROR_STATE(); > > > > -/* > > - * First, we hold on to the last SEND message we > > - * were given and dish out the bytes until we run > > - * out of bytes. > > - */ > > -r->len = qemu_rdma_fill(r->rdma, buf, size, 0); > > -if (r->len) { > > -return r->len; > > -} > > +for (i = 0; i < niov; i++) { > > +size_t want = iov[i].iov_len; > > +uint8_t *data = (void *)iov[i].iov_base; > > > > -/* > > - * Once we run out, we block and wait for another > > - * SEND message to arrive. > > - */ > > -ret = qemu_rdma_exchange_recv(rdma, &head, RDMA_CONTROL_QEMU_FILE); > > +/* > > + * First, we hold on to the last SEND message we > > + * were given and dish out the bytes until we run > > + * out of bytes. > > + */ > > +ret = qemu_rdma_fill(rioc->rdma, data, want, 0); > > +if (ret > 0) { > > +done += ret; > > +if (ret < want) { > > +break; > > +} else { > > +continue; > > +} > > > +} > > > > -if (ret < 0) { > > -rdma->error_state = ret; > > -return ret; > > -} > > +/* > > + * Once we run out, we block and wait for another > > + * SEND message to arrive. > > + */ > > +ret = qemu_rdma_exchange_recv(rdma, &head, RDMA_CONTROL_QEMU_FILE); > > > > -/* > > - * SEND was received with new bytes, now try again. > > - */ > > -return qemu_rdma_fill(r->rdma, buf, size, 0); > > +if (ret < 0) { > > +rdma->error_state = ret; > > +return ret; > > +} > > + > > +/* > > + * SEND was received with new bytes, now try again. > > + */ > > +ret = qemu_rdma_fill(rioc->rdma, data, want, 0); > > +if (ret > 0) { > > +done += ret; > > +if (ret < want) { > > +break; > > +} > > +} > > I don't quite understand the behaviour of this loop. > If rdma_fill returns less than you wanted for the first iov we break. > If it returns 0 then we try and get some more. > The weird thing to me is if we have two iov entries; if the > amount returned by the qemu_rdma_fill happens to match the size of > the 1st iov then I think we end up doing the exchange_recv and > waiting for more. Is that what we want? Why? No, it isn't quite what we want. If we have successfully received some data in a preceeding iov, then we shouldn't wait for more data for any following iov. I'll rework this bit of code to work better In fact technically, we should not block for data at all when the channel is in non-blocking mode. Fixing that would require some refactoring of qemu_rdma_block_for_wrid() so that we could tell it to only look for an already completed work request and not block Regards, Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
Re: [Qemu-devel] [PATCH v4] Add optionrom compatible with fw_cfg DMA version
On 02/03/16 09:44, Gerd Hoffmann wrote: > Hi, > >> I think the "dma_enabled" property is not exposed to the user. > > It is: "-global fw_cfg.dma_enabled=off" works (as in: doesn't throw an > error). Has no effect through as it gets overridden later on. > >> The default value of "dma_enabled" in both fw_cfg_io_properties and >> fw_cfg_mem_properties is irrelevant; the actual property value is always >> overwritten in fw_cfg_init_io_dma() and fw_cfg_init_mem_wide(), which >> all of the init paths go through. > > And IMHO we should not do that, so setting the property actually has an > effect. Fair point. >> I agree that DMA capability should be filtered with machine type. >> However, that distinction should not be made using the current >> "dma_enabled" properties (i.e., of "fw_cfg_io_properties" and >> "fw_cfg_mem_properties". Instead, it should be made in the >> board-specific callers of fw_cfg_init_(io_dma|mem_wide). > > Why? That's how "has_reserved_memory" works as well, for example. But, if the property is made work, I guess PC_COMPAT_2_4 can be used too. (Or should it be HW_COMPAT_2_4?) Is that your point? Thanks Laszlo
[Qemu-devel] [PATCH] hw/pxb: add pxb devices to the bridge category
Signed-off-by: Marcel Apfelbaum --- hw/pci-bridge/pci_expander_bridge.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/hw/pci-bridge/pci_expander_bridge.c b/hw/pci-bridge/pci_expander_bridge.c index 62fd29d..d23b8da 100644 --- a/hw/pci-bridge/pci_expander_bridge.c +++ b/hw/pci-bridge/pci_expander_bridge.c @@ -302,6 +302,7 @@ static void pxb_dev_class_init(ObjectClass *klass, void *data) dc->desc = "PCI Expander Bridge"; dc->props = pxb_dev_properties; +set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories); } static const TypeInfo pxb_dev_info = { @@ -334,6 +335,7 @@ static void pxb_pcie_dev_class_init(ObjectClass *klass, void *data) dc->desc = "PCI Express Expander Bridge"; dc->props = pxb_dev_properties; +set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories); } static const TypeInfo pxb_pcie_dev_info = { -- 2.4.3
Re: [Qemu-devel] [PATCH 2/3] pcdimm: add 'type' field to PCDIMMDeviceInfo
On 03.02.2016 01:12, Eric Blake wrote: On 01/27/2016 11:51 PM, Vladimir Sementsov-Ogievskiy wrote: The field is needed to distinguish pc-dimm and nvdimm. Signed-off-by: Vladimir Sementsov-Ogievskiy Signed-off-by: Denis V. Lunev CC: Stefan Hajnoczi CC: Xiao Guangrong CC: "Michael S. Tsirkin" CC: Igor Mammedov CC: Eric Blake CC: Markus Armbruster --- +++ b/qapi-schema.json @@ -3924,6 +3924,8 @@ # # @hotpluggable: true if device if could be added/removed while machine is running # +# @type: device type: 'pc-dimm' or 'nvdimm' (since 2.6) +# # Since: 2.1 ## { 'struct': 'PCDIMMDeviceInfo', @@ -3934,7 +3936,8 @@ 'node': 'int', 'memdev': 'str', 'hotplugged': 'bool', -'hotpluggable': 'bool' +'hotpluggable': 'bool', +'type': 'str' No. Since it is a finite set of values (just two possible), you should be using an enum here rather than open-coded 'str'. Something like: { 'enum': 'DIMMType', 'data': [ 'pc-dimm', 'nvdimm' ] } Are you sure? This is only output Info, so user will never "set" this field. Also, qemu type system (as I understand) is based on string names. object_dynamic_cast and other functions uses "const char *typename". This enum will be out of qemu type system and we will have to sync it.. Is there already some practice of translating string typenames to enum values? -- Best regards, Vladimir
Re: [Qemu-devel] [PATCH 3/3] balloon: don't use NVDIMM for ballooning
On 02.02.2016 18:30, Markus Armbruster wrote: Vladimir Sementsov-Ogievskiy writes: NVDIMM for now is planned to use as a backing store for DAX filesystem in the guest and thus this memory is excluded from guest memory management and LRUs. In this case libvirt running QEMU along with configured balloon almost immediately inflates balloon and effectively kill the guest as qemu counts nvdimm as part of the ram. Counting dimm devices as part of the ram for ballooning was started from commit 463756d03: virtio-balloon: Fix balloon not working correctly when hotplug memory Signed-off-by: Vladimir Sementsov-Ogievskiy Signed-off-by: Denis V. Lunev CC: Stefan Hajnoczi CC: Xiao Guangrong CC: "Michael S. Tsirkin" CC: Igor Mammedov CC: Eric Blake CC: Markus Armbruster --- hw/virtio/virtio-balloon.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c index 6a4c4d2..749be25 100644 --- a/hw/virtio/virtio-balloon.c +++ b/hw/virtio/virtio-balloon.c @@ -26,6 +26,7 @@ #include "qapi/visitor.h" #include "qapi-event.h" #include "trace.h" +#include "hw/mem/nvdimm.h" #if defined(__linux__) #include @@ -308,7 +309,9 @@ static ram_addr_t get_current_ram_size(void) if (value) { switch (value->type) { case MEMORY_DEVICE_INFO_KIND_DIMM: -size += value->u.dimm->size; +if (strcmp(value->u.dimm->type, TYPE_NVDIMM)) { +size += value->u.dimm->size; +} break; default: break; Should this be a blacklist ("don't count TYPE_NVDIMM") or a whitelist ("count TYPE_PC_DIMM")? I guess that depends on whether we think future types are more likely to need counting or not counting. May be, to avoid such bugs in future, it would be better to make like a whitelist. -- Best regards, Vladimir
Re: [Qemu-devel] [PATCH] Fix inconsistency between comment and variable name
On 02/03/2016 05:35 PM, Markus Armbruster wrote: Michael Tokarev writes: 03.02.2016 06:19, Cao jin wrote: Signed-off-by: Cao jin --- include/hw/qdev-core.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h index abcdee8..42fa5db 100644 --- a/include/hw/qdev-core.h +++ b/include/hw/qdev-core.h @@ -221,7 +221,7 @@ typedef struct BusChild { /** * BusState: - * @hotplug_device: link to a hotplug device associated with bus. + * @hotplug_handler: link to a hotplug device associated with bus. Hmm. Now while the field name in comment and in the structure do match, the comment is still wrong, since it is a linke to a handler, not a device… :) Do you mean to suggest the line should be changed to * @hotplug_device: link to a hotplug handler associated with bus. ? And sometimes a hotplug handler could be a bus, see qbus_set_bus_hotplug_handler(). on both case(device & bus), the actual hotplug handler called at last is a TYPE_HOTPLUG_HANDLER which reside in that device/bus, so I guess, the @hotplug_handler maybe fine:) anyway, these terms could easily confuse newbies -- Yours Sincerely, Cao jin
Re: [Qemu-devel] [PATCH v2 00/10] virtio/vring: optimization patches
Hi, > Subject: [Qemu-devel] [PATCH v2 00/10] virtio/vring: optimization patches > > This includes two optimization of virtio: > > - "slimming down" VirtQueueElements by not including room for > 1024 buffers. This makes malloc much faster. > > - optimizations to limit the number of address_space_translate > calls in virtio.c, from Vincenzo and myself. > Very nice! After apply those optimizations (patch 8, 9, 10), I got 4MB/sec bonus of speed. Based on my virtio-crypto device benchmark, ase-128-cbc algorithm. I haven't rebase other patches of this patch set yet, maybe I can get other bonus, can I? Before applying those three patches: Testing AES-128-CBC cipher: Encrypting in chunks of 256 bytes: done. 246.84 MiB in 5.02 secs: 49.13 MiB/sec (1011061 packets) Encrypting in chunks of 256 bytes: done. 247.03 MiB in 5.02 secs: 49.16 MiB/sec (1011840 packets) Encrypting in chunks of 256 bytes: done. 246.98 MiB in 5.02 secs: 49.17 MiB/sec (1011636 packets) Encrypting in chunks of 256 bytes: done. 247.14 MiB in 5.02 secs: 49.19 MiB/sec (1012270 packets) Encrypting in chunks of 256 bytes: done. 246.96 MiB in 5.02 secs: 49.16 MiB/sec (1011565 packets) Encrypting in chunks of 256 bytes: done. 246.97 MiB in 5.02 secs: 49.18 MiB/sec (1011594 packets) Encrypting in chunks of 256 bytes: done. 246.89 MiB in 5.02 secs: 49.15 MiB/sec (1011259 packets) Encrypting in chunks of 256 bytes: done. 246.96 MiB in 5.02 secs: 49.15 MiB/sec (1011561 packets) 'Perf top' shows: 23.61% qemu-kvm [.] address_space_translate 14.49% qemu-kvm [.] qemu_get_ram_ptr 4.65% qemu-kvm [.] phys_page_find 4.31% qemu-kvm [.] address_space_translate_internal 3.18% libpthread-2.19.so [.] __pthread_mutex_unlock_usercnt 2.83% qemu-kvm [.] qemu_ram_addr_from_host 2.40% qemu-kvm [.] address_space_map 2.34% libc-2.19.so [.] _int_malloc 2.22% libc-2.19.so [.] _int_free 1.96% libc-2.19.so [.] malloc 1.71% libpthread-2.19.so [.] pthread_mutex_lock 1.40% qemu-kvm [.] find_next_zero_bit 1.38% libc-2.19.so [.] malloc_consolidate 1.31% qemu-kvm [.] lduw_le_phys 1.27% libc-2.19.so [.] __memcpy_sse2_unaligned 1.05% qemu-kvm [.] qemu_get_ram_block 1.05% qemu-kvm [.] object_unref 1.04% qemu-kvm [.] memory_region_get_ram_addr After applying those optimizations: Encrypting in chunks of 256 bytes: done. 267.92 MiB in 5.03 secs: 53.31 MiB/sec (1097399 packets) Encrypting in chunks of 256 bytes: done. 268.05 MiB in 5.02 secs: 53.35 MiB/sec (1097935 packets) Encrypting in chunks of 256 bytes: done. 265.40 MiB in 5.02 secs: 52.82 MiB/sec (1087091 packets) Encrypting in chunks of 256 bytes: done. 263.18 MiB in 5.01 secs: 52.50 MiB/sec (1077999 packets) Encrypting in chunks of 256 bytes: done. 266.85 MiB in 5.01 secs: 53.29 MiB/sec (1093010 packets) Encrypting in chunks of 256 bytes: done. 267.64 MiB in 5.02 secs: 53.28 MiB/sec (1096251 packets) Encrypting in chunks of 256 bytes: done. 267.30 MiB in 5.02 secs: 53.24 MiB/sec (1094861 packets) Encrypting in chunks of 256 bytes: done. 267.29 MiB in 5.02 secs: 53.25 MiB/sec (1094833 packets) 'Perf top' shows: 22.56% qemu-kvm [.] address_space_translate 13.29% qemu-kvm [.] qemu_get_ram_ptr 4.71% qemu-kvm [.] phys_page_find 4.43% qemu-kvm [.] address_space_translate_internal 3.47% libpthread-2.19.so [.] __pthread_mutex_unlock_usercnt 3.08% qemu-kvm [.] qemu_ram_addr_from_host 2.62% qemu-kvm [.] address_space_map 2.61% libc-2.19.so [.] _int_malloc 2.58% libc-2.19.so [.] _int_free 2.38% libc-2.19.so [.] malloc 2.06% libpthread-2.19.so [.] pthread_mutex_lock 1.68% libc-2.19.so [.] malloc_consolidate 1.35% libc-2.19.so [.] __memcpy_sse2_unaligned 1.23% qemu-kvm [.] lduw_le_phys 1.18% qemu-kvm [.] find_next_zero_bit 1.02% qemu-kvm [.] object_unref Regards, -Gonglei
Re: [Qemu-devel] [PULL 0/1] Monitor patches for 2016-02-03
On 3 February 2016 at 09:51, Markus Armbruster wrote: > The following changes since commit c65db7705b7926f4a084b93778e4bd5dd3990aad: > > Merge remote-tracking branch > 'remotes/maxreitz/tags/pull-block-for-peter-2016-02-02' into staging > (2016-02-02 18:04:04 +) > > are available in the git repository at: > > git://repo.or.cz/qemu/armbru.git tags/pull-monitor-2016-02-03 > > for you to fetch changes up to 64ffbe04eaafebf4045a3ace52a360c14959d196: > > hmp: fix sendkey out of bounds write (CVE-2015-8619) (2016-02-03 10:13:06 > +0100) > > > Monitor patches for 2016-02-03 > > Applied, thanks. -- PMM
Re: [Qemu-devel] [Qemu-arm] [PATCH] target-arm: Don't report presence of EL2 if it doesn't exist
On 02.02.2016 21:20, Peter Maydell wrote: > We already modify the processor feature bits to not report EL3 > support to the guest if EL3 isn't enabled for the CPU we're emulating. > Add similar support for not reporting EL2 unless it is enabled. > This is necessary because real world guest code running at EL3 > (trusted firmware or bootloaders) will query the ID registers to > determine whether it should start a guest Linux kernel in EL2 or EL3. > > Signed-off-by: Peter Maydell Reviewed-by: Sergey Fedorov > --- > When full EL2 arrives and we have the CPU property for it then > this will expand a bit to look like the 'if (!cpu->has_el3)' > condition just above it. > > target-arm/cpu.c | 9 + > 1 file changed, 9 insertions(+) > > diff --git a/target-arm/cpu.c b/target-arm/cpu.c > index 6c34476..0cc075d 100644 > --- a/target-arm/cpu.c > +++ b/target-arm/cpu.c > @@ -650,6 +650,15 @@ static void arm_cpu_realizefn(DeviceState *dev, Error > **errp) > cpu->id_aa64pfr0 &= ~0xf000; > } > > +if (!arm_feature(env, ARM_FEATURE_EL2)) { > +/* Disable the hypervisor feature bits in the processor feature > + * registers if we don't have EL2. These are id_pfr1[15:12] and > + * id_aa64pfr0_el1[11:8]. > + */ > +cpu->id_aa64pfr0 &= ~0xf00; > +cpu->id_pfr1 &= ~0xf000; > +} > + > if (!cpu->has_mpu) { > unset_feature(env, ARM_FEATURE_MPU); > }
[Qemu-devel] [PATCH v2] target-mips: implement R6 multi-threading
From: Yongbok Kim MIPS Release 6 provides multi-threading features which replace pre-R6 MT Module. CP0.Config3.MT is always 0 in R6, instead there is new CP0.Config5.VP (Virtual Processor) bit which indicates presence of multi-threading support which includes CP0.GlobalNumber register and DVP/EVP instructions. Signed-off-by: Yongbok Kim Signed-off-by: Leon Alrae --- This is an updated version of the patch originally sent by Yongbok some time ago. v2: * rebased * fix typo: GlobalNumer -> GlobalNumber * fix minor style issues * update the patch description --- disas/mips.c | 4 +++ target-mips/cpu.c| 9 +++ target-mips/cpu.h| 25 +++ target-mips/helper.h | 4 +++ target-mips/op_helper.c | 48 +++ target-mips/translate.c | 59 target-mips/translate_init.c | 3 ++- 7 files changed, 151 insertions(+), 1 deletion(-) diff --git a/disas/mips.c b/disas/mips.c index 0e488d8..249931b 100644 --- a/disas/mips.c +++ b/disas/mips.c @@ -1405,6 +1405,10 @@ const struct mips_opcode mips_builtin_opcodes[] = {"cmp.sor.d", "D,S,T", 0x46a00019, 0xffe0003f, RD_S|RD_T|WR_D|FP_D, 0, I32R6}, {"cmp.sune.d", "D,S,T", 0x46a0001a, 0xffe0003f, RD_S|RD_T|WR_D|FP_D, 0, I32R6}, {"cmp.sne.d", "D,S,T", 0x46a0001b, 0xffe0003f, RD_S|RD_T|WR_D|FP_D, 0, I32R6}, +{"dvp","", 0x41600024, 0x, TRAP, 0, I32R6}, +{"dvp","t", 0x41600024, 0xffe0, TRAP|WR_t,0, I32R6}, +{"evp","", 0x4164, 0x, TRAP, 0, I32R6}, +{"evp","t", 0x4164, 0xffe0, TRAP|WR_t,0, I32R6}, /* MSA */ {"sll.b", "+d,+e,+f", 0x780d, 0xffe0003f, WR_VD|RD_VS|RD_VT, 0, MSA}, diff --git a/target-mips/cpu.c b/target-mips/cpu.c index 0b3f130..7dc3a44 100644 --- a/target-mips/cpu.c +++ b/target-mips/cpu.c @@ -77,6 +77,15 @@ static bool mips_cpu_has_work(CPUState *cs) has_work = false; } } +/* MIPS Release 6 has the ability to halt the CPU. */ +if (env->CP0_Config5 & (1 << CP0C5_VP)) { +if (cs->interrupt_request & CPU_INTERRUPT_WAKE) { +has_work = true; +} +if (!mips_vp_active(env)) { +has_work = false; +} +} return has_work; } diff --git a/target-mips/cpu.h b/target-mips/cpu.h index 86b6333..ae8c575 100644 --- a/target-mips/cpu.h +++ b/target-mips/cpu.h @@ -238,6 +238,8 @@ struct CPUMIPSState { int32_t CP0_Index; /* CP0_MVP* are per MVP registers. */ +int32_t CP0_VPControl; +#define CP0VPCtl_DIS0 int32_t CP0_Random; int32_t CP0_VPEControl; #define CP0VPECo_YSI 21 @@ -287,6 +289,8 @@ struct CPUMIPSState { # define CP0EnLo_RI 31 # define CP0EnLo_XI 30 #endif +int32_t CP0_GlobalNumber; +#define CP0GN_VPId 0 target_ulong CP0_Context; target_ulong CP0_KScratch[MIPS_KSCRATCH_NUM]; int32_t CP0_PageMask; @@ -472,6 +476,7 @@ struct CPUMIPSState { #define CP0C5_XNP13 #define CP0C5_UFE9 #define CP0C5_FRE8 +#define CP0C5_VP 7 #define CP0C5_SBRI 6 #define CP0C5_MVH5 #define CP0C5_LLB4 @@ -859,6 +864,26 @@ static inline int mips_vpe_active(CPUMIPSState *env) return active; } +static inline int mips_vp_active(CPUMIPSState *env) +{ +CPUState *other_cs = first_cpu; + +/* Check if the VP disabled other VPs (which means the VP is enabled) */ +if ((env->CP0_VPControl >> CP0VPCtl_DIS) & 1) { +return 1; +} + +/* Check if the virtual processor is disabled due to a DVP */ +CPU_FOREACH(other_cs) { +MIPSCPU *other_cpu = MIPS_CPU(other_cs); +if ((&other_cpu->env != env) && +((other_cpu->env.CP0_VPControl >> CP0VPCtl_DIS) & 1)) { +return 0; +} +} +return 1; +} + #include "exec/exec-all.h" static inline void compute_hflags(CPUMIPSState *env) diff --git a/target-mips/helper.h b/target-mips/helper.h index 95b9149..1bc8bb2 100644 --- a/target-mips/helper.h +++ b/target-mips/helper.h @@ -176,6 +176,10 @@ DEF_HELPER_0(dmt, tl) DEF_HELPER_0(emt, tl) DEF_HELPER_1(dvpe, tl, env) DEF_HELPER_1(evpe, tl, env) + +/* R6 Multi-threading */ +DEF_HELPER_1(dvp, tl, env) +DEF_HELPER_1(evp, tl, env) #endif /* !CONFIG_USER_ONLY */ /* microMIPS functions */ diff --git a/target-mips/op_helper.c b/target-mips/op_helper.c index 684ec92..7c5669c 100644 --- a/target-mips/op_helper.c +++ b/target-mips/op_helper.c @@ -571,6 +571,14 @@ static bool mips_vpe_is_wfi(MIPSCPU *c) return cpu->halted && mips_vpe_active(env); } +static bool mips_vp_is_wfi(MIPSCPU *c) +{ +CPUState *cpu = CPU(c); +CPUMIPSState *env = &c->env; + +return cpu->halted && mips_vp_active(env); +} + static inline void mips_vpe_wake(MIPSCPU *c) { /* Dont set ->halted = 0 directly, let it be done via cpu_has_work @@ -18
Re: [Qemu-devel] [PATCH 07/10] virtio: combine the read of a descriptor
Hi, > Subject: [Qemu-devel] [PATCH 07/10] virtio: combine the read of a descriptor > > Compared to vring, virtio has a performance penalty of 10%. Fix it > by combining all the reads for a descriptor in a single address_space_read > call. This also simplifies the code nicely. > > Reviewed-by: Cornelia Huck > Signed-off-by: Paolo Bonzini > --- > hw/virtio/virtio.c | 86 > ++ > 1 file changed, 35 insertions(+), 51 deletions(-) > Unbelievable! After applying this patch, the virtio-crypto speed can attach 74MB/sec, host Cpu overhead is 180% (the main thread 100% and vcpu threads 80%) Testing AES-128-CBC cipher: Encrypting in chunks of 256 bytes: done. 371.94 MiB in 5.02 secs: 74.12 MiB/sec (1523475 packets) Encrypting in chunks of 256 bytes: done. 369.85 MiB in 5.01 secs: 73.88 MiB/sec (1514900 packets) Encrypting in chunks of 256 bytes: done. 371.07 MiB in 5.02 secs: 73.97 MiB/sec (1519914 packets) Encrypting in chunks of 256 bytes: done. 371.66 MiB in 5.02 secs: 74.09 MiB/sec (1522309 packets) Encrypting in chunks of 256 bytes: done. 371.79 MiB in 5.02 secs: 74.12 MiB/sec (1522868 packets) Encrypting in chunks of 256 bytes: done. 371.94 MiB in 5.02 secs: 74.15 MiB/sec (1523457 packets) Encrypting in chunks of 256 bytes: done. 371.90 MiB in 5.02 secs: 74.14 MiB/sec (1523317 packets) Encrypting in chunks of 256 bytes: done. 371.71 MiB in 5.02 secs: 74.10 MiB/sec (1522522 packets) 15.95% qemu-kvm [.] address_space_translate 6.98% qemu-kvm [.] qemu_get_ram_ptr 4.87% libpthread-2.19.so [.] __pthread_mutex_unlock_usercnt 4.40% qemu-kvm [.] qemu_ram_addr_from_host 3.79% qemu-kvm [.] address_space_map 3.41% libc-2.19.so [.] _int_malloc 3.29% libc-2.19.so [.] _int_free 3.07% libc-2.19.so [.] malloc 2.95% libpthread-2.19.so [.] pthread_mutex_lock 2.94% qemu-kvm [.] phys_page_find 2.73% qemu-kvm [.] address_space_translate_internal 2.65% libc-2.19.so [.] malloc_consolidate 2.35% libc-2.19.so [.] __memcpy_sse2_unaligned 1.72% qemu-kvm [.] find_next_zero_bit 1.38% qemu-kvm [.] address_space_rw 1.34% qemu-kvm [.] object_unref 1.30% qemu-kvm [.] object_ref 1.28% qemu-kvm [.] virtqueue_pop 1.20% libc-2.19.so [.] memset 1.11% qemu-kvm [.] virtio_notify Thank you so much! Regards, -Gonglei > diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c > index 79a635f..2433866 100644 > --- a/hw/virtio/virtio.c > +++ b/hw/virtio/virtio.c > @@ -107,35 +107,15 @@ void virtio_queue_update_rings(VirtIODevice *vdev, > int n) >vring->align); > } > > -static inline uint64_t vring_desc_addr(VirtIODevice *vdev, hwaddr desc_pa, > - int i) > +static void vring_desc_read(VirtIODevice *vdev, VRingDesc *desc, > +hwaddr desc_pa, int i) > { > -hwaddr pa; > -pa = desc_pa + sizeof(VRingDesc) * i + offsetof(VRingDesc, addr); > -return virtio_ldq_phys(vdev, pa); > -} > - > -static inline uint32_t vring_desc_len(VirtIODevice *vdev, hwaddr desc_pa, int > i) > -{ > -hwaddr pa; > -pa = desc_pa + sizeof(VRingDesc) * i + offsetof(VRingDesc, len); > -return virtio_ldl_phys(vdev, pa); > -} > - > -static inline uint16_t vring_desc_flags(VirtIODevice *vdev, hwaddr desc_pa, > -int i) > -{ > -hwaddr pa; > -pa = desc_pa + sizeof(VRingDesc) * i + offsetof(VRingDesc, flags); > -return virtio_lduw_phys(vdev, pa); > -} > - > -static inline uint16_t vring_desc_next(VirtIODevice *vdev, hwaddr desc_pa, > - int i) > -{ > -hwaddr pa; > -pa = desc_pa + sizeof(VRingDesc) * i + offsetof(VRingDesc, next); > -return virtio_lduw_phys(vdev, pa); > +address_space_read(&address_space_memory, desc_pa + i * > sizeof(VRingDesc), > + MEMTXATTRS_UNSPECIFIED, (void *)desc, > sizeof(VRingDesc)); > +virtio_tswap64s(vdev, &desc->addr); > +virtio_tswap32s(vdev, &desc->len); > +virtio_tswap16s(vdev, &desc->flags); > +virtio_tswap16s(vdev, &desc->next); > } > > static inline uint16_t vring_avail_flags(VirtQueue *vq) > @@ -345,18 +325,18 @@ static unsigned int virtqueue_get_head(VirtQueue > *vq, unsigned int idx) > return head; > } > > -static unsigned virtqueue_next_desc(VirtIODevice *vdev, hwaddr desc_pa, > -unsigned int i, unsigned int max) > +static unsigned virtqueue_read_next_desc(VirtIODevice *vdev, VRingDesc > *desc, > + hwaddr desc_pa, unsigned > int max) > { > unsigne
Re: [Qemu-devel] [PATCH] rng-random: implement request queue
Hi Ladi, Adding Pankaj to CC, he too looked at this recently. On (Fri) 22 Jan 2016 [13:19:58], Ladi Prosek wrote: > If the guest adds a buffer to the virtio queue while another buffer > is still pending and hasn't been filled and returned by the rng > device, rng-random internally discards the pending request, which > leads to the second buffer getting stuck in the queue. For the guest > this manifests as delayed completion of reads from virtio-rng, i.e. > a read is completed only after another read is issued. > > This patch adds an internal queue of requests, analogous to what > rng-egd uses, to make sure that requests and responses are balanced > and correctly ordered. ... and this can lead to breaking migration (the queue of requests on the host needs to be migrated, else the new host will have no idea of the queue). I think we should limit the queue size to 1 instead. Multiple rng requests should not be common, because if we did have entropy, we'd just service the guest request and be done with it. If we haven't replied to the guest, it just means that the host itself is waiting for more entropy, or is waiting for the timeout before the guest's ratelimit is lifted. So, instead of fixing this using a queue, how about limiting the size of the vq to have just one element at a time? Thanks, Amit
Re: [Qemu-devel] [PATCH v2 00/10] virtio/vring: optimization patches
> -Original Message- > From: qemu-devel-bounces+arei.gonglei=huawei@nongnu.org > [mailto:qemu-devel-bounces+arei.gonglei=huawei@nongnu.org] On > Behalf Of Paolo Bonzini > Sent: Sunday, January 31, 2016 6:29 PM > To: qemu-devel@nongnu.org > Cc: cornelia.h...@de.ibm.com; m...@redhat.com > Subject: [Qemu-devel] [PATCH v2 00/10] virtio/vring: optimization patches > > This includes two optimization of virtio: > > - "slimming down" VirtQueueElements by not including room for > 1024 buffers. This makes malloc much faster. > > - optimizations to limit the number of address_space_translate > calls in virtio.c, from Vincenzo and myself. > > Thanks, > > Paolo > > v1->v2: improved commit messages [Conny] > add assertions on sz [Conny] > change bools from 1 and 0 to "true" and "false" [Conny] > update shadow avail_idx in virtio_queue_set_last_avail_idx [Michael] > collect Reviewed-by > > Paolo Bonzini (7): > virtio: move VirtQueueElement at the beginning of the structs > virtio: move allocation to virtqueue_pop/vring_pop > virtio: introduce qemu_get/put_virtqueue_element > virtio: introduce virtqueue_alloc_element > virtio: slim down allocation of VirtQueueElements > vring: slim down allocation of VirtQueueElements > virtio: combine the read of a descriptor > > Vincenzo Maffione (3): > virtio: cache used_idx in a VirtQueue field > virtio: read avail_idx from VQ only when necessary > virtio: combine write of an entry into used ring > > hw/9pfs/9p.c| 2 +- > hw/9pfs/virtio-9p-device.c | 17 +- > hw/9pfs/virtio-9p.h | 2 +- > hw/block/dataplane/virtio-blk.c | 11 +- > hw/block/virtio-blk.c | 23 +-- > hw/char/virtio-serial-bus.c | 78 + > hw/display/virtio-gpu.c | 25 ++- > hw/input/virtio-input.c | 24 ++- > hw/net/virtio-net.c | 69 +--- > hw/scsi/virtio-scsi-dataplane.c | 15 +- > hw/scsi/virtio-scsi.c | 26 ++- > hw/virtio/dataplane/vring.c | 62 --- > hw/virtio/virtio-balloon.c | 22 ++- > hw/virtio/virtio-rng.c | 10 +- > hw/virtio/virtio.c | 340 > +--- > include/hw/virtio/dataplane/vring.h | 2 +- > include/hw/virtio/virtio-balloon.h | 2 +- > include/hw/virtio/virtio-blk.h | 5 +- > include/hw/virtio/virtio-net.h | 2 +- > include/hw/virtio/virtio-scsi.h | 15 +- > include/hw/virtio/virtio-serial.h | 2 +- > include/hw/virtio/virtio.h | 13 +- > 22 files changed, 486 insertions(+), 281 deletions(-) > > -- > 2.5.0 > For patch 7,8,9,10: Tested-by: Gonglei Regards, -Gonglei
Re: [Qemu-devel] [PATCH qemu] migration/vmstate: document VMStateFlags
On (Thu) 21 Jan 2016 [21:39:27], Sascha Silbe wrote: > The VMState API is rather sparsely documented. Start by describing the > meaning of all VMStateFlags. > > Signed-off-by: Sascha Silbe Thanks, this is much needed. I'm just returning from a long trip, will go through this in a bit. Amit
Re: [Qemu-devel] [PATCH V2 1/2] ARM: PL061: Clear PL061 device state after reset
Hi Wei, This still has a problem as I said before. If we execute "virsh reboot xxx" during VM booting(i.e. before the PL061 driver loaded), then after VM booting, the VM will not have any reaction to the "virsh reboot xxx". On 2016/2/2 4:49, Wei Huang wrote: > Current QEMU doesn't clear PL061 state after reset. This causes a > weird issue with guest reboot via GPIO. Here is the device state > description with two reboot requests: > > (PL061State fields) data old_in_data istate > VM boot 0 0 0 > After 1st ACPI reboot request 8 8 8 > After VM PL061 driver ACK 8 8 0 > After VM reboot 8 8 0 > > 2nd ACPI reboot request 8 > > In the second reboot request above, because old_in_data field is 8, > QEMU decides that there is a pending edge IRQ already (see > pl061_update()) in input; so it doesn't raise up IRQ again. As a result > the second reboot request is lost. The correct way is to clear PL061 > device state after reset. > > NOTE: The reset state is found from the following documentation: > - PL061 Technical Reference Manual > - Stellaris LM3S8962 Microcontroller Data Sheet > - Stellaris LM3S5P31 Microcontroller Data Sheet > > Signed-off-by: Wei Huang > --- > hw/gpio/pl061.c | 32 ++-- > 1 file changed, 30 insertions(+), 2 deletions(-) > > diff --git a/hw/gpio/pl061.c b/hw/gpio/pl061.c > index e5a696e..342a70d 100644 > --- a/hw/gpio/pl061.c > +++ b/hw/gpio/pl061.c > @@ -284,8 +284,35 @@ static void pl061_write(void *opaque, hwaddr offset, > > static void pl061_reset(PL061State *s) > { > - s->locked = 1; > - s->cr = 0xff; > +/* reset values from PL061 TRM, Stellaris LM3S5P31 & LM3S8962 Data Sheet > */ > +s->data = 0; > +s->old_out_data = 0; > +s->old_in_data = 0; > +s->dir = 0; > +s->isense = 0; > +s->ibe = 0; > +s->iev = 0; > +s->im = 0; > +s->istate = 0; > +s->afsel = 0; > +s->dr2r = 0xff; > +s->dr4r = 0; > +s->dr8r = 0; > +s->odr = 0; > +s->pur = 0; > +s->pdr = 0; > +s->slr = 0; > +s->den = 0; > +s->locked = 1; > +s->cr = 0xff; > +s->amsel = 0; > +} > + > +static void pl061_state_reset(DeviceState *dev) > +{ > +PL061State *s = PL061(dev); > + > +pl061_reset(s); > } > > static void pl061_set_irq(void * opaque, int irq, int level) > @@ -343,6 +370,7 @@ static void pl061_class_init(ObjectClass *klass, void > *data) > > k->init = pl061_initfn; > dc->vmsd = &vmstate_pl061; > +dc->reset = &pl061_state_reset; > } > > static const TypeInfo pl061_info = { > -- Shannon
Re: [Qemu-devel] [PULL 0/6] virtio-gpu: bugfixes and spice support preparation
On 3 February 2016 at 11:16, Gerd Hoffmann wrote: > Hi, > > Minor fixes and some preparing work in console > and virtio-gpu for spice support. > > please pull, > Gerd > > The following changes since commit c65db7705b7926f4a084b93778e4bd5dd3990aad: > > Merge remote-tracking branch > 'remotes/maxreitz/tags/pull-block-for-peter-2016-02-02' into staging > (2016-02-02 18:04:04 +) > > are available in the git repository at: > > > git://git.kraxel.org/qemu tags/pull-vga-20160203-1 > > for you to fetch changes up to 321c9adba5a64a1a9de2dd7db5433b62a5433439: > > virtio-gpu: block any rendering until client (ui) is done (2016-02-03 > 10:41:36 +0100) > > > virtio-gpu: bugfixes and spice support preparation > > Applied, thanks. -- PMM
Re: [Qemu-devel] [PATCH v4] Add optionrom compatible with fw_cfg DMA version
Hi, > >> I agree that DMA capability should be filtered with machine type. > >> However, that distinction should not be made using the current > >> "dma_enabled" properties (i.e., of "fw_cfg_io_properties" and > >> "fw_cfg_mem_properties". Instead, it should be made in the > >> board-specific callers of fw_cfg_init_(io_dma|mem_wide). > > > > Why? > > That's how "has_reserved_memory" works as well, for example. Those are machine options, not device options. If we have a device property there is no need to create a duplicate a machine option for the same purpose. > But, if the property is made work, I guess PC_COMPAT_2_4 can be used > too. (Or should it be HW_COMPAT_2_4?) HW_COMPAT_2_4 probably, so it applies to q35 too. cheers, Gerd
Re: [Qemu-devel] [RFC PATCH v1 1/1] vGPU core driver : to provide common interface for vGPU.
On 2/3/2016 11:26 AM, Tian, Kevin wrote: [...] * @vgpu_create:Called to allocate basic resouces in graphics * driver for a particular vgpu. * @dev: physical pci device structure on which vgpu *should be created * @uuid: uuid for which VM it is intended to * @instance: vgpu instance in that VM * @vgpu_id: This represents the type of vgpu to be *created * Returns integer: success (0) or error (< 0) Specifically for Intel GVT-g we didn't hard partition resource among vGPUs. Instead we allow user to accurately control how many physical resources are allocated to a vGPU. So this interface should be extensible to allow vendor specific resource control. This interface forwards the create request to vendor/GPU driver informing about which physical GPU this request is intended for and the type of vGPU. Then its vendor/GPU driver's responsibility to do resources allocation and manage resources in their own driver. However the current parameter definition disallows resource configuration information passed from user. As I said, Intel GVT-g doesn't do static allocation based on type. We provide flexibility to user for fine-grained resource management. int (*vgpu_create)(struct pci_dev *dev, uuid_le uuid, - uint32_t instance, uint32_t vgpu_id); + uint32_t instance, char *vgpu_params); If we change integer vgpu_id parameter to char *vgpu_param then GPU driver can have multiple parameters. Suppose there is a GPU at :85:00.0, then to create vgpu: # echo "::" > /sys/bus/pci/devices/\:85\:00.0/vgpu_create Common vgpu module will not parse vgpu_params string, it will be forwarded to GPU driver, then its GPU driver's responsibility to parse the string and act accordingly. This should give flexibility to have multiple parameters for GPU driver. * * Physical GPU that support vGPU should be register with vgpu module with * gpu_device_ops structure. */ Also it'd be good design to allow extensible usages, such as statistics, and other vendor specific control knobs (e.g. foreground/background VM switch in Intel GVT-g, etc.) Can you elaborate on what other control knobs that would be needed? Some examples: - foreground/background VM switch - resource query - various statistics info - virtual monitor configuration - ... Since this vgpu core driver will become the central point for all vgpu management, we need provide an easy way for vendor specific extension, e.g. exposing above callbacks as sysfs nodes and then vendor can create its own extensions under subdirectory (/intel, /nvidia, ...). Ok. Makes sense. Are these parameters per physical device or per vgpu device? Adding attribute groups to gpu_device_ops structure would provide a way to add vendor specific extensions. I think we need two types of attributes here, per physical device and per vgpu device. Right? const struct attribute_group **dev_groups; const struct attribute_group **vgpu_groups; Thanks, Kirti. Thanks Kevin
Re: [Qemu-devel] [PATCH v1 13/22] migration: convert RDMA to use QIOChannel interface
* Daniel P. Berrange (berra...@redhat.com) wrote: > On Tue, Feb 02, 2016 at 08:01:36PM +, Dr. David Alan Gilbert wrote: > > * Daniel P. Berrange (berra...@redhat.com) wrote: > > > This converts the RDMA code to provide a subclass of > > > QIOChannel that uses RDMA for the data transport. > > > > > > The RDMA code would be much better off it it could > > > be split up in a generic RDMA layer, a QIOChannel > > > impl based on RMDA, and then the RMDA migration > > > glue. This is left as a future exercise for the brave. > > > > > > Signed-off-by: Daniel P. Berrange > > > --- > > > migration/rdma.c | 260 > > > ++- > > > 1 file changed, 161 insertions(+), 99 deletions(-) > > > > > > > +static ssize_t qio_channel_rdma_readv(QIOChannel *ioc, > > > + const struct iovec *iov, > > > + size_t niov, > > > + int **fds, > > > + size_t *nfds, > > > + Error **errp) > > > +{ > > > +QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc); > > > +RDMAContext *rdma = rioc->rdma; > > > RDMAControlHeader head; > > > int ret = 0; > > > +ssize_t i; > > > +size_t done = 0; > > > > > > CHECK_ERROR_STATE(); > > > > > > -/* > > > - * First, we hold on to the last SEND message we > > > - * were given and dish out the bytes until we run > > > - * out of bytes. > > > - */ > > > -r->len = qemu_rdma_fill(r->rdma, buf, size, 0); > > > -if (r->len) { > > > -return r->len; > > > -} > > > +for (i = 0; i < niov; i++) { > > > +size_t want = iov[i].iov_len; > > > +uint8_t *data = (void *)iov[i].iov_base; > > > > > > -/* > > > - * Once we run out, we block and wait for another > > > - * SEND message to arrive. > > > - */ > > > -ret = qemu_rdma_exchange_recv(rdma, &head, RDMA_CONTROL_QEMU_FILE); > > > +/* > > > + * First, we hold on to the last SEND message we > > > + * were given and dish out the bytes until we run > > > + * out of bytes. > > > + */ > > > +ret = qemu_rdma_fill(rioc->rdma, data, want, 0); > > > +if (ret > 0) { > > > +done += ret; > > > +if (ret < want) { > > > +break; > > > +} else { > > > +continue; > > > +} > > > > > +} > > > > > > -if (ret < 0) { > > > -rdma->error_state = ret; > > > -return ret; > > > -} > > > +/* > > > + * Once we run out, we block and wait for another > > > + * SEND message to arrive. > > > + */ > > > +ret = qemu_rdma_exchange_recv(rdma, &head, > > > RDMA_CONTROL_QEMU_FILE); > > > > > > -/* > > > - * SEND was received with new bytes, now try again. > > > - */ > > > -return qemu_rdma_fill(r->rdma, buf, size, 0); > > > +if (ret < 0) { > > > +rdma->error_state = ret; > > > +return ret; > > > +} > > > + > > > +/* > > > + * SEND was received with new bytes, now try again. > > > + */ > > > +ret = qemu_rdma_fill(rioc->rdma, data, want, 0); > > > +if (ret > 0) { > > > +done += ret; > > > +if (ret < want) { > > > +break; > > > +} > > > +} > > > > I don't quite understand the behaviour of this loop. > > If rdma_fill returns less than you wanted for the first iov we break. > > If it returns 0 then we try and get some more. > > The weird thing to me is if we have two iov entries; if the > > amount returned by the qemu_rdma_fill happens to match the size of > > the 1st iov then I think we end up doing the exchange_recv and > > waiting for more. Is that what we want? Why? > > No, it isn't quite what we want. If we have successfully received > some data in a preceeding iov, then we shouldn't wait for more data > for any following iov. I'll rework this bit of code to work better > > In fact technically, we should not block for data at all when the > channel is in non-blocking mode. Fixing that would require some > refactoring of qemu_rdma_block_for_wrid() so that we could tell > it to only look for an already completed work request and not > block I wouldn't go changing qemu_rdma_block_for_wrid unless you need to. Dave > > Regards, > Daniel > -- > |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| > |: http://libvirt.org -o- http://virt-manager.org :| > |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| > |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH v1 13/22] migration: convert RDMA to use QIOChannel interface
On Wed, Feb 03, 2016 at 01:23:04PM +, Dr. David Alan Gilbert wrote: > * Daniel P. Berrange (berra...@redhat.com) wrote: > > On Tue, Feb 02, 2016 at 08:01:36PM +, Dr. David Alan Gilbert wrote: > > > * Daniel P. Berrange (berra...@redhat.com) wrote: > > > > This converts the RDMA code to provide a subclass of > > > > QIOChannel that uses RDMA for the data transport. > > > > > > > > The RDMA code would be much better off it it could > > > > be split up in a generic RDMA layer, a QIOChannel > > > > impl based on RMDA, and then the RMDA migration > > > > glue. This is left as a future exercise for the brave. > > > > > > > > Signed-off-by: Daniel P. Berrange > > > > --- > > > > migration/rdma.c | 260 > > > > ++- > > > > 1 file changed, 161 insertions(+), 99 deletions(-) > > > > > > > > > > I don't quite understand the behaviour of this loop. > > > If rdma_fill returns less than you wanted for the first iov we break. > > > If it returns 0 then we try and get some more. > > > The weird thing to me is if we have two iov entries; if the > > > amount returned by the qemu_rdma_fill happens to match the size of > > > the 1st iov then I think we end up doing the exchange_recv and > > > waiting for more. Is that what we want? Why? > > > > No, it isn't quite what we want. If we have successfully received > > some data in a preceeding iov, then we shouldn't wait for more data > > for any following iov. I'll rework this bit of code to work better > > > > In fact technically, we should not block for data at all when the > > channel is in non-blocking mode. Fixing that would require some > > refactoring of qemu_rdma_block_for_wrid() so that we could tell > > it to only look for an already completed work request and not > > block > > I wouldn't go changing qemu_rdma_block_for_wrid unless you need > to. Yeah, I won't do it now - just something to think about for the future to properly do non-blocking I/o channels. Regards, Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
[Qemu-devel] Memory on stellaris board
Hello, i was trying to understand how does the sram and flash size works on lm3s6965evb, i found a hardcoded 0x00ff007f as dc0 value and how flash_size and sram_size are calculated based on that hexadecimal. I mean this: flash_size = (((board->dc0 & 0x) + 1) << 1) * 1024; sram_size = ((board->dc0 >> 18) + 1) * 1024; On stellaris.c When i use the -m [size] flag while running qemu how does the flag affect does flash_size and sram_size values? Or it doesn't? Where can i see that the memory has indeed change? When i debug qemu with or whitout the flag i keep getting the same sram_size and flash_size values (the ones calculated with 0x00ff007f) Thank you!! -- Aurelio Remonda Software Engineer San Lorenzo 47, 3rd Floor, Office 5 Córdoba, Argentina Phone: +54-351-4217888 / 4218211
Re: [Qemu-devel] [PATCH v1 03/22] migration: ensure qemu_fflush() always writes full data amount
On Thu, Jan 28, 2016 at 05:53:46PM +, Dr. David Alan Gilbert wrote: > * Daniel P. Berrange (berra...@redhat.com) wrote: > > The QEMUFile writev_buffer / put_buffer functions are expected > > to write out the full set of requested data, blocking until > > complete. The qemu_fflush() caller does not expect to deal with > > partial writes. Clarify the function comments and add a sanity > > check to the code to catch mistaken implementations. > > > > Signed-off-by: Daniel P. Berrange > > --- > > include/migration/qemu-file.h | 6 -- > > migration/qemu-file.c | 16 > > 2 files changed, 16 insertions(+), 6 deletions(-) > > > > diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h > > index b5d08d2..5debe8c 100644 > > --- a/include/migration/qemu-file.h > > +++ b/include/migration/qemu-file.h > > @@ -29,7 +29,8 @@ > > > > /* This function writes a chunk of data to a file at the given position. > > * The pos argument can be ignored if the file is only being used for > > - * streaming. The handler should try to write all of the data it can. > > + * streaming. The handler must write all of the data or return a negative > > + * errno value. > > */ > > typedef ssize_t (QEMUFilePutBufferFunc)(void *opaque, const uint8_t *buf, > > int64_t pos, size_t size); > > @@ -55,7 +56,8 @@ typedef int (QEMUFileCloseFunc)(void *opaque); > > typedef int (QEMUFileGetFD)(void *opaque); > > > > /* > > - * This function writes an iovec to file. > > + * This function writes an iovec to file. The handler must write all > > + * of the data or return a negative errno value. > > */ > > typedef ssize_t (QEMUFileWritevBufferFunc)(void *opaque, struct iovec *iov, > > int iovcnt, int64_t pos); > > diff --git a/migration/qemu-file.c b/migration/qemu-file.c > > index 0bbd257..f89e64e 100644 > > --- a/migration/qemu-file.c > > +++ b/migration/qemu-file.c > > @@ -107,11 +107,13 @@ bool qemu_file_is_writable(QEMUFile *f) > > * Flushes QEMUFile buffer > > * > > * If there is writev_buffer QEMUFileOps it uses it otherwise uses > > - * put_buffer ops. > > + * put_buffer ops. This will flush all pending data. If data was > > + * only partially flushed, it will set an error state. > > */ > > void qemu_fflush(QEMUFile *f) > > { > > ssize_t ret = 0; > > +ssize_t expect = 0; > > > > if (!qemu_file_is_writable(f)) { > > return; > > @@ -119,21 +121,27 @@ void qemu_fflush(QEMUFile *f) > > > > if (f->ops->writev_buffer) { > > if (f->iovcnt > 0) { > > +expect = iov_size(f->iov, f->iovcnt); > > ret = f->ops->writev_buffer(f->opaque, f->iov, f->iovcnt, > > f->pos); > > } > > } else { > > if (f->buf_index > 0) { > > +expect = f->buf_index; > > ret = f->ops->put_buffer(f->opaque, f->buf, f->pos, > > f->buf_index); > > } > > } > > + > > if (ret >= 0) { > > f->pos += ret; > > } > > -f->buf_index = 0; > > -f->iovcnt = 0; > > -if (ret < 0) { > > +/* We expect the QEMUFile write impl to send the full > > + * data set we requested, so sanity check that. > > + */ > > +if (ret < 0 || ret != expect) { > > qemu_file_set_error(f, ret); > > You could simplify that to if (ret != expect) couldn't you? > > In the case you're trying to guard against, the value past > to qemu_file_set_error is potentially truncated; which in the worst > case could make it appear as success; although I doubt that > can happen in our uses. qemu_file_set_error expects an errni, so in fact my code was buggy if the ret != expect part got evaluated, so I'll change this to if (ret != expect) { qemu_file_set_error(f, ret < 0 ? ret : -EIO); } > > Reviewed-by: Dr. David Alan Gilbert Regards, Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
Re: [Qemu-devel] Memory on stellaris board
On 3 February 2016 at 13:00, Aurelio Remonda wrote: > Hello, i was trying to understand how does the sram and flash size > works on lm3s6965evb, i found a hardcoded 0x00ff007f as dc0 value and > how flash_size and sram_size are calculated based on that hexadecimal. > I mean this: > > flash_size = (((board->dc0 & 0x) + 1) << 1) * 1024; > sram_size = ((board->dc0 >> 18) + 1) * 1024; > On stellaris.c > > When i use the -m [size] flag while running qemu how does the flag > affect does flash_size and sram_size values? Or it doesn't? Where can > i see that the memory has indeed change? This board model ignores -m. We just implement a model of this particular bit of hardware, which has a fixed amount of RAM in it. thanks -- PMM
Re: [Qemu-devel] [PATCH v1 07/22] migration: introduce a new QEMUFile impl based on QIOChannel
On Tue, Feb 02, 2016 at 05:06:24PM +, Dr. David Alan Gilbert wrote: > * Daniel P. Berrange (berra...@redhat.com) wrote: > > Introduce a new QEMUFile implementation that is based on > > the QIOChannel objects. This impl is different from existing > > impls in that there is no file descriptor that can be made > > available, as some channels may be based on higher level > > protocols such as TLS. > > > > Although the QIOChannel based implementation can trivially > > provide a bi-directional stream, initially we have separate > > functions for opening input & output directions to fit with > > the expectation of the current QEMUFile interface. > > > > Signed-off-by: Daniel P. Berrange > > --- > > include/migration/qemu-file.h | 4 + > > migration/Makefile.objs | 1 + > > migration/qemu-file-channel.c | 201 > > ++ > > 3 files changed, 206 insertions(+) > > create mode 100644 migration/qemu-file-channel.c > > +static ssize_t channel_writev_buffer(void *opaque, > > + struct iovec *iov, > > + int iovcnt, > > + int64_t pos) > > +{ > > +QIOChannel *ioc = QIO_CHANNEL(opaque); > > +ssize_t done = 0; > > +ssize_t want = iov_size(iov, iovcnt); > > +struct iovec oldiov = { NULL, 0 }; > > + > > +while (done < want) { > > +ssize_t len; > > +struct iovec *cur = iov; > > +int curcnt = iovcnt; > > + > > +channel_skip_iov(done, &cur, &curcnt, &oldiov); > > + > > +len = qio_channel_writev(ioc, cur, curcnt, NULL); > > +if (oldiov.iov_base) { > > +/* Restore original caller's info in @iov */ > > +cur[0].iov_base = oldiov.iov_base; > > +cur[0].iov_len = oldiov.iov_len; > > +oldiov.iov_base = NULL; > > +oldiov.iov_len = 0; > > +} > > +if (len == QIO_CHANNEL_ERR_BLOCK) { > > +qio_channel_wait(ioc, G_IO_OUT); > > +continue; > > +} > > +if (len < 0) { > > +/* XXX handle Error objects */ > > +return -EIO; > > It's best to return 'len' rather than lose what little > error information you had (similarly below). In this case 'len' is in fact just '-1', as all error info is in the Error ** parameter, but the QEMUFile API contract requires an errno value. So we don't have much choice but to return a fixed EIO - returning 'len' would also be a fixed errno - whichever errno corresponds to the value -1. I'd like to switch QEMUFile over to use Error **errp parameters for error reporting, so that we can make detailed error info available throughout the migration I/O code. That ought to wait until after this series is done though, to avoid complicating it yet more. > > > +} > > + > > +done += len; > > +} > > +return done; > > +} > > + > > + > > +static ssize_t channel_get_buffer(void *opaque, > > + uint8_t *buf, > > + int64_t pos, > > + size_t size) > > +{ > > +QIOChannel *ioc = QIO_CHANNEL(opaque); > > +ssize_t ret; > > + > > + reread: > > +ret = qio_channel_read(ioc, (char *)buf, size, NULL); > > +if (ret < 0) { > > +if (ret == QIO_CHANNEL_ERR_BLOCK) { > > +qio_channel_yield(ioc, G_IO_IN); > > +goto reread; > > +} else { > > +/* XXX handle Error * object */ > > +return -EIO; > > +} > > +} > > +return ret; > > > I'd prefer a loop to a goto; generally the only places we > use goto is an error exit. > > do { >ret = qio_channel_read(ioc, (char *)buf, size, NULL); >if (ret == QIO_CHANNEL_ERR_BLOCK) { >qio_channel_yield(ioc, G_IO_IN); >} > } while (ret == QIO_CHANNEL_ERR_BLOCK); > > return ret; > > and IMHO the loop is clearer. Ok, will change that. Regards, Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
[Qemu-devel] [PATCH 7/7] target-arm: Enable EL3 for Cortex-A53 and Cortex-A57
Enable EL3 support for our Cortex-A53 and Cortex-A57 CPU models. We have enough implemented now to be able to run real world code at least to some extent (I can boot ARM Trusted Firmware to the point where it pulls in OP-TEE and then falls over because it doesn't have a UEFI image it can chain to). Signed-off-by: Peter Maydell --- target-arm/cpu64.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/target-arm/cpu64.c b/target-arm/cpu64.c index cc177bb..073677b5 100644 --- a/target-arm/cpu64.c +++ b/target-arm/cpu64.c @@ -109,6 +109,7 @@ static void aarch64_a57_initfn(Object *obj) set_feature(&cpu->env, ARM_FEATURE_V8_SHA256); set_feature(&cpu->env, ARM_FEATURE_V8_PMULL); set_feature(&cpu->env, ARM_FEATURE_CRC); +set_feature(&cpu->env, ARM_FEATURE_EL3); cpu->kvm_target = QEMU_KVM_ARM_TARGET_CORTEX_A57; cpu->midr = 0x411fd070; cpu->revidr = 0x; @@ -161,6 +162,7 @@ static void aarch64_a53_initfn(Object *obj) set_feature(&cpu->env, ARM_FEATURE_V8_SHA256); set_feature(&cpu->env, ARM_FEATURE_V8_PMULL); set_feature(&cpu->env, ARM_FEATURE_CRC); +set_feature(&cpu->env, ARM_FEATURE_EL3); cpu->kvm_target = QEMU_KVM_ARM_TARGET_CORTEX_A53; cpu->midr = 0x410fd034; cpu->revidr = 0x; -- 1.9.1
[Qemu-devel] [PATCH 2/7] target-arm: Implement MDCR_EL3 and SDCR
Implement the MDCR_EL3 register (which is SDCR for AArch32). For the moment we implement it as reads-as-written. Signed-off-by: Peter Maydell --- target-arm/cpu.h| 1 + target-arm/helper.c | 24 2 files changed, 25 insertions(+) diff --git a/target-arm/cpu.h b/target-arm/cpu.h index 52284e9..cf2df50 100644 --- a/target-arm/cpu.h +++ b/target-arm/cpu.h @@ -382,6 +382,7 @@ typedef struct CPUARMState { uint64_t mdscr_el1; uint64_t oslsr_el1; /* OS Lock Status */ uint64_t mdcr_el2; +uint64_t mdcr_el3; /* If the counter is enabled, this stores the last time the counter * was reset. Otherwise it stores the counter value */ diff --git a/target-arm/helper.c b/target-arm/helper.c index b631b83..8b96b80 100644 --- a/target-arm/helper.c +++ b/target-arm/helper.c @@ -364,6 +364,23 @@ static CPAccessResult access_el3_aa32ns_aa64any(CPUARMState *env, return CP_ACCESS_OK; } +/* Some secure-only AArch32 registers trap to EL3 if used from + * Secure EL1 (but are just ordinary UNDEF in other non-EL3 contexts). + * We assume that the .access field is set to PL1_RW. + */ +static CPAccessResult access_trap_aa32s_el1(CPUARMState *env, +const ARMCPRegInfo *ri) +{ +if (arm_current_el(env) == 3) { +return CP_ACCESS_OK; +} +if (arm_is_secure_below_el3(env)) { +return CP_ACCESS_TRAP_EL3; +} +/* This will be EL1 NS and EL2 NS, which just UNDEF */ +return CP_ACCESS_TRAP_UNCATEGORIZED; +} + static void dacr_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value) { ARMCPU *cpu = arm_env_get_cpu(env); @@ -3532,6 +3549,13 @@ static const ARMCPRegInfo el3_cp_reginfo[] = { .cp = 15, .opc1 = 0, .crn = 1, .crm = 1, .opc2 = 0, .access = PL3_RW, .fieldoffset = offsetoflow32(CPUARMState, cp15.scr_el3), .writefn = scr_write }, +{ .name = "MDCR_EL3", .state = ARM_CP_STATE_AA64, + .opc0 = 3, .opc1 = 6, .crn = 1, .crm = 3, .opc2 = 1, + .access = PL3_RW, .fieldoffset = offsetof(CPUARMState, cp15.mdcr_el3) }, +{ .name = "SDCR", .type = ARM_CP_ALIAS, + .cp = 15, .opc1 = 0, .crn = 1, .crm = 3, .opc2 = 1, + .access = PL1_RW, .accessfn = access_trap_aa32s_el1, + .fieldoffset = offsetoflow32(CPUARMState, cp15.mdcr_el3) }, { .name = "SDER32_EL3", .state = ARM_CP_STATE_AA64, .opc0 = 3, .opc1 = 6, .crn = 1, .crm = 1, .opc2 = 1, .access = PL3_RW, .resetvalue = 0, -- 1.9.1
[Qemu-devel] [PATCH 5/7] target-arm: Add isread parameter to CPAccessFns
System registers might have access requirements which need to be described via a CPAccessFn and which differ for reads and writes. For this to be possible we need to pass the access function a parameter to tell it whether the access being checked is a read or a write. Signed-off-by: Peter Maydell --- target-arm/cpu.h | 4 ++- target-arm/helper.c| 81 +- target-arm/helper.h| 2 +- target-arm/op_helper.c | 5 +-- target-arm/translate-a64.c | 6 ++-- target-arm/translate.c | 7 ++-- 6 files changed, 68 insertions(+), 37 deletions(-) diff --git a/target-arm/cpu.h b/target-arm/cpu.h index 0fb79d0..5137632 100644 --- a/target-arm/cpu.h +++ b/target-arm/cpu.h @@ -1319,7 +1319,9 @@ typedef uint64_t CPReadFn(CPUARMState *env, const ARMCPRegInfo *opaque); typedef void CPWriteFn(CPUARMState *env, const ARMCPRegInfo *opaque, uint64_t value); /* Access permission check functions for coprocessor registers. */ -typedef CPAccessResult CPAccessFn(CPUARMState *env, const ARMCPRegInfo *opaque); +typedef CPAccessResult CPAccessFn(CPUARMState *env, + const ARMCPRegInfo *opaque, + bool isread); /* Hook function for register reset */ typedef void CPResetFn(CPUARMState *env, const ARMCPRegInfo *opaque); diff --git a/target-arm/helper.c b/target-arm/helper.c index d85b04f..8bc3ea3 100644 --- a/target-arm/helper.c +++ b/target-arm/helper.c @@ -344,7 +344,8 @@ void init_cpreg_list(ARMCPU *cpu) * access_el3_aa32ns_aa64any: Used to check both AArch32/64 register views. */ static CPAccessResult access_el3_aa32ns(CPUARMState *env, -const ARMCPRegInfo *ri) +const ARMCPRegInfo *ri, +bool isread) { bool secure = arm_is_secure_below_el3(env); @@ -356,10 +357,11 @@ static CPAccessResult access_el3_aa32ns(CPUARMState *env, } static CPAccessResult access_el3_aa32ns_aa64any(CPUARMState *env, -const ARMCPRegInfo *ri) +const ARMCPRegInfo *ri, +bool isread) { if (!arm_el_is_aa64(env, 3)) { -return access_el3_aa32ns(env, ri); +return access_el3_aa32ns(env, ri, isread); } return CP_ACCESS_OK; } @@ -369,7 +371,8 @@ static CPAccessResult access_el3_aa32ns_aa64any(CPUARMState *env, * We assume that the .access field is set to PL1_RW. */ static CPAccessResult access_trap_aa32s_el1(CPUARMState *env, -const ARMCPRegInfo *ri) +const ARMCPRegInfo *ri, +bool isread) { if (arm_current_el(env) == 3) { return CP_ACCESS_OK; @@ -651,7 +654,8 @@ static void cpacr_write(CPUARMState *env, const ARMCPRegInfo *ri, env->cp15.cpacr_el1 = value; } -static CPAccessResult cpacr_access(CPUARMState *env, const ARMCPRegInfo *ri) +static CPAccessResult cpacr_access(CPUARMState *env, const ARMCPRegInfo *ri, + bool isread) { if (arm_feature(env, ARM_FEATURE_V8)) { /* Check if CPACR accesses are to be trapped to EL2 */ @@ -668,7 +672,8 @@ static CPAccessResult cpacr_access(CPUARMState *env, const ARMCPRegInfo *ri) return CP_ACCESS_OK; } -static CPAccessResult cptr_access(CPUARMState *env, const ARMCPRegInfo *ri) +static CPAccessResult cptr_access(CPUARMState *env, const ARMCPRegInfo *ri, + bool isread) { /* Check if CPTR accesses are set to trap to EL3 */ if (arm_current_el(env) == 2 && (env->cp15.cptr_el[3] & CPTR_TCPAC)) { @@ -710,7 +715,8 @@ static const ARMCPRegInfo v6_cp_reginfo[] = { REGINFO_SENTINEL }; -static CPAccessResult pmreg_access(CPUARMState *env, const ARMCPRegInfo *ri) +static CPAccessResult pmreg_access(CPUARMState *env, const ARMCPRegInfo *ri, + bool isread) { /* Performance monitor registers user accessibility is controlled * by PMUSERENR. @@ -1154,7 +1160,8 @@ static void teecr_write(CPUARMState *env, const ARMCPRegInfo *ri, env->teecr = value; } -static CPAccessResult teehbr_access(CPUARMState *env, const ARMCPRegInfo *ri) +static CPAccessResult teehbr_access(CPUARMState *env, const ARMCPRegInfo *ri, +bool isread) { if (arm_current_el(env) == 0 && (env->teecr & 1)) { return CP_ACCESS_TRAP; @@ -1207,7 +1214,8 @@ static const ARMCPRegInfo v6k_cp_reginfo[] = { #ifndef CONFIG_USER_ONLY -static CPAccessResult gt_cntfrq_access(CPUARMState *env, const ARMCPRegInfo *ri) +static CPAccessResult gt_cntfrq_access(CPUARMState *env, const ARMCPRegInfo *ri, + bool isre
Re: [Qemu-devel] [PATCH 07/10] virtio: combine the read of a descriptor
On 03/02/2016 13:34, Gonglei (Arei) wrote: > Hi, > >> Subject: [Qemu-devel] [PATCH 07/10] virtio: combine the read of a descriptor >> >> Compared to vring, virtio has a performance penalty of 10%. Fix it >> by combining all the reads for a descriptor in a single address_space_read >> call. This also simplifies the code nicely. >> >> Reviewed-by: Cornelia Huck >> Signed-off-by: Paolo Bonzini >> --- >> hw/virtio/virtio.c | 86 >> ++ >> 1 file changed, 35 insertions(+), 51 deletions(-) >> > > Unbelievable! After applying this patch, the virtio-crypto speed can attach > 74MB/sec, host > Cpu overhead is 180% (the main thread 100% and vcpu threads 80%) The three patches from Vincenzo will help too. What was it like before? Also, are you using ioeventfd or dataplane? virtio-crypto sounds like something that could be very easily run outside the "big QEMU lock". Paolo > Testing AES-128-CBC cipher: > Encrypting in chunks of 256 bytes: done. 371.94 MiB in 5.02 secs: > 74.12 MiB/sec (1523475 packets) > Encrypting in chunks of 256 bytes: done. 369.85 MiB in 5.01 secs: > 73.88 MiB/sec (1514900 packets) > Encrypting in chunks of 256 bytes: done. 371.07 MiB in 5.02 secs: > 73.97 MiB/sec (1519914 packets) > Encrypting in chunks of 256 bytes: done. 371.66 MiB in 5.02 secs: > 74.09 MiB/sec (1522309 packets) > Encrypting in chunks of 256 bytes: done. 371.79 MiB in 5.02 secs: > 74.12 MiB/sec (1522868 packets) > Encrypting in chunks of 256 bytes: done. 371.94 MiB in 5.02 secs: > 74.15 MiB/sec (1523457 packets) > Encrypting in chunks of 256 bytes: done. 371.90 MiB in 5.02 secs: > 74.14 MiB/sec (1523317 packets) > Encrypting in chunks of 256 bytes: done. 371.71 MiB in 5.02 secs: > 74.10 MiB/sec (1522522 packets) > > 15.95% qemu-kvm [.] address_space_translate > 6.98% qemu-kvm [.] qemu_get_ram_ptr > 4.87% libpthread-2.19.so [.] __pthread_mutex_unlock_usercnt > 4.40% qemu-kvm [.] qemu_ram_addr_from_host > 3.79% qemu-kvm [.] address_space_map > 3.41% libc-2.19.so [.] _int_malloc > 3.29% libc-2.19.so [.] _int_free > 3.07% libc-2.19.so [.] malloc > 2.95% libpthread-2.19.so [.] pthread_mutex_lock > 2.94% qemu-kvm [.] phys_page_find > 2.73% qemu-kvm [.] address_space_translate_internal > 2.65% libc-2.19.so [.] malloc_consolidate > 2.35% libc-2.19.so [.] __memcpy_sse2_unaligned > 1.72% qemu-kvm [.] find_next_zero_bit > 1.38% qemu-kvm [.] address_space_rw > 1.34% qemu-kvm [.] object_unref > 1.30% qemu-kvm [.] object_ref > 1.28% qemu-kvm [.] virtqueue_pop > 1.20% libc-2.19.so [.] memset > 1.11% qemu-kvm [.] virtio_notify > > Thank you so much! > > Regards, > -Gonglei > >> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c >> index 79a635f..2433866 100644 >> --- a/hw/virtio/virtio.c >> +++ b/hw/virtio/virtio.c >> @@ -107,35 +107,15 @@ void virtio_queue_update_rings(VirtIODevice *vdev, >> int n) >>vring->align); >> } >> >> -static inline uint64_t vring_desc_addr(VirtIODevice *vdev, hwaddr desc_pa, >> - int i) >> +static void vring_desc_read(VirtIODevice *vdev, VRingDesc *desc, >> +hwaddr desc_pa, int i) >> { >> -hwaddr pa; >> -pa = desc_pa + sizeof(VRingDesc) * i + offsetof(VRingDesc, addr); >> -return virtio_ldq_phys(vdev, pa); >> -} >> - >> -static inline uint32_t vring_desc_len(VirtIODevice *vdev, hwaddr desc_pa, >> int >> i) >> -{ >> -hwaddr pa; >> -pa = desc_pa + sizeof(VRingDesc) * i + offsetof(VRingDesc, len); >> -return virtio_ldl_phys(vdev, pa); >> -} >> - >> -static inline uint16_t vring_desc_flags(VirtIODevice *vdev, hwaddr desc_pa, >> -int i) >> -{ >> -hwaddr pa; >> -pa = desc_pa + sizeof(VRingDesc) * i + offsetof(VRingDesc, flags); >> -return virtio_lduw_phys(vdev, pa); >> -} >> - >> -static inline uint16_t vring_desc_next(VirtIODevice *vdev, hwaddr desc_pa, >> - int i) >> -{ >> -hwaddr pa; >> -pa = desc_pa + sizeof(VRingDesc) * i + offsetof(VRingDesc, next); >> -return virtio_lduw_phys(vdev, pa); >> +address_space_read(&address_space_memory, desc_pa + i * >> sizeof(VRingDesc), >> + MEMTXATTRS_UNSPECIFIED, (void *)desc, >> sizeof(VRingDesc)); >> +virtio_tswap64s(vdev, &desc->addr); >> +virtio_tswap32s(vdev, &desc->len); >> +virtio_tswap16s(vdev, &desc->flags); >> +virtio_tswap16s(vdev, &desc->next); >> } >> >> static inline uint16_t vring_avail_flags(VirtQueue *vq) >> @@ -345,18 +325,18 @@
Re: [Qemu-devel] [PATCH v6 1/5] util: Introduce error reporting functions with fatal/abort
Markus Armbruster writes: > Thomas Huth writes: >> On 03.02.2016 10:48, Markus Armbruster wrote: >>> David Gibson writes: >>> On Tue, Feb 02, 2016 at 10:47:35PM +0100, Thomas Huth wrote: > On 02.02.2016 19:53, Markus Armbruster wrote: >> Lluís Vilanova writes: > ... > >>> diff --git a/include/qemu/error-report.h b/include/qemu/error-report.h >>> index 7ab2355..6c2f142 100644 >>> --- a/include/qemu/error-report.h >>> +++ b/include/qemu/error-report.h >>> @@ -43,4 +43,23 @@ void error_report(const char *fmt, ...) >>> GCC_FMT_ATTR(1, 2); >>> const char *error_get_progname(void); >>> extern bool enable_timestamp_msg; >>> >>> +/* Report message and exit with error */ >>> +void QEMU_NORETURN error_vreport_fatal(const char *fmt, va_list ap) >>> GCC_FMT_ATTR(1, 0); >>> +void QEMU_NORETURN error_report_fatal(const char *fmt, ...) >>> GCC_FMT_ATTR(1, 2); >> >> This lets people write things like >> >> error_report_fatal("The sky is falling"); >> >> instead of >> >> error_report("The sky is falling"); >> exit(1); >> >> or >> >> fprintf(stderr, "The sky is falling\n"); >> exit(1); >> >> I don't think that's an improvement in clarity. > > The problem is not the existing code, but that in a couple of new > patches, I've now already seen that people are trying to use > > error_setg(&error_fatal, ... ); So, I don't actually see any real advantage to error_report_fatal(...) over error_setg(&error_fatal, ...). >>> >>> I do. Compare: >>> >>> (a) error_report(...); >>> exit(1); >>> >>> (b) error_report_fatal(...); >>> >>> (c) error_setg(&error_fatal, ...); >>> >>> In my opinion, (a) is clearest: even a relatively clueless reader will >>> know what exit(1) does, can guess what error_report() approximately >>> does, and doesn't need to know what it does exactly. (b) is slightly >>> less obvious, and (c) is positively opaque. >>> >>> Let's stick to the obvious (a) and be done with it. >> >> Ok, (a) is fine for me too, as long as we avoid (c). Lluís, could you >> maybe add that information to your patch that updates the HACKING text? > I feel such detailed advice belings into error.h. Sketch appended. > If that doesn't succeed in keeping (c) out, make checkpatch flag it. >> (and sorry for the fuzz with error_report_fatal() ... I thought it would >> be a good solution to avoid (c), but if (a) is preferred instead, then >> we should go with that solution instead). I can easily change that, no problem. I'm just happy consensus is landing on this subject. >> And, by the way, what about the spots that currently already use >> error_setg(&error_abort, ) ? Should they be turned into >> error_report() + abort() instead? Or only abort(), without error >> message, since abort() is only about programming errors? > As I wrote in my first reply to this thread, I'd like them to be cleaned > up to just abort() or assert(). > I like assert(), because it gives me exactly what I can use to debug the > programming error: a core dump (if enabled) and a source location > (useful when no core dump). I never bought the argument that we should > use abort() instead of assert(0) because "what if NDEBUG?!?". If you > define NDEBUG, our 600+ abort()s won't save you from our 4000+ > assert()s. Sorry, but I don't buy the argument of, "I prefer assert() because there's already lots of them". To me, there's a semantic difference between debug builds and regular ones (aka, assert vs abort). Also, I think it adds to the confusion that assert and abort seem to be used interchangeably in the code. What about this definition? * exit(): user-triggered errors * abort(): general programming errors * assert(): additional sanity/consistency checks against programming errors Now, abort & assert have an overlap. Should we discourage one in favour of the other? Also: * error_report_fatal ensures the same exit code is always used (otherwise it can fail with inconsistent error codes) * error_report_abort brings the code information of assert into abort But of course, I'm happy either way :) > diff --git a/include/qapi/error.h b/include/qapi/error.h > index 45d6c72..ea7e74f 100644 > --- a/include/qapi/error.h > +++ b/include/qapi/error.h > @@ -162,6 +162,9 @@ ErrorClass error_get_class(const Error *err); > * human-readable error message is made from printf-style @fmt, ... > * The resulting message should be a single phrase, with no newline or > * trailing punctuation. > + * Please don't error_setg(&error_fatal, ...), use error_report() and > + * exit(), because that's more obvious. > + * Likewise, don't error_setg(&error_abort, ...), use assert(). > */ > #define error_setg(errp, fmt, ...) \ > error_setg_internal((errp), __FILE__, __LINE__, __func__, \ > @@ -213,6 +216,8 @@ void error_setg_win32_internal(Er
Re: [Qemu-devel] Memory on stellaris board
On Wed, Feb 3, 2016 at 10:34 AM, Peter Maydell wrote: > On 3 February 2016 at 13:00, Aurelio Remonda > wrote: >> Hello, i was trying to understand how does the sram and flash size >> works on lm3s6965evb, i found a hardcoded 0x00ff007f as dc0 value and >> how flash_size and sram_size are calculated based on that hexadecimal. >> I mean this: >> >> flash_size = (((board->dc0 & 0x) + 1) << 1) * 1024; >> sram_size = ((board->dc0 >> 18) + 1) * 1024; >> On stellaris.c >> >> When i use the -m [size] flag while running qemu how does the flag >> affect does flash_size and sram_size values? Or it doesn't? Where can >> i see that the memory has indeed change? > > This board model ignores -m. We just implement a model of this particular > bit of hardware, which has a fixed amount of RAM in it. Thanks for the quick answer, do you think it is worth to make the m flag work for this model? Can you give me a hint on where to look?(another board that use it) so i can add this feature for this model. Thanks! -- Aurelio Remonda Software Engineer San Lorenzo 47, 3rd Floor, Office 5 Córdoba, Argentina Phone: +54-351-4217888 / 4218211
Re: [Qemu-devel] [PATCH v9] spec: add qcow2 bitmaps extension specification
On 03.02.2016 11:04, Fam Zheng wrote: On Tue, 02/02 09:35, Vladimir Sementsov-Ogievskiy wrote: The new feature for qcow2: storing bitmaps. This patch adds new header extension to qcow2 - Bitmaps Extension. It provides an ability to store virtual disk related bitmaps in a qcow2 image. For now there is only one type of such bitmaps: Dirty Tracking Bitmap, which just tracks virtual disk changes from some moment. Note: Only bitmaps, relative to the virtual disk, stored in qcow2 file, should be stored in this qcow2 file. The size of each bitmap (considering its granularity) is equal to virtual disk size. Signed-off-by: Vladimir Sementsov-Ogievskiy --- v9 - rewordings, thanks to Max v8 - rewordings - bitmap_directory_size: 4b -> 8b - add more descriptive description in == Bitmaps == section - add paragraph "Dirty tracking bitmaps" Bitmap directory entry: - extra data should not allocate additional clusters - padding must be all-bytes-zero - add extra_data_compatible flag (now behavior in case of unknown extra data is defined by this flag) v7: - Rewordings, grammar. Max, Eric, John, thank you very much. - add last paragraph: remaining bits in bitmap data clusters must be zero. - s/Bitmap Directory/bitmap directory/ and other names like this at the request of Max. v6: - reword bitmap_directory_size description - bitmap type: make 0 reserved - extra_data_size: resize to 4bytes Also, I've marked this field as "must be zero". We can always change it, if we decide allowing managing app to specify any extra data, by defining some magic value as a top of user extra data.. So, for now non zeor extra_data_size should be considered as an error. - swap name and extra_data to give good alignment to extra_data. v5: - 'Dirty bitmaps' renamed to 'Bitmaps', as we may have several types of bitmaps. - rewordings - move upper bounds to "Notes about Qemu limits" - s/should/must somewhere. (but not everywhere) - move name_size field closer to name itself in bitmap header - add extra data area to bitmap header - move bitmap data description to separate section docs/specs/qcow2.txt | 223 ++- 1 file changed, 222 insertions(+), 1 deletion(-) diff --git a/docs/specs/qcow2.txt b/docs/specs/qcow2.txt index f236d8c..db5e666 100644 --- a/docs/specs/qcow2.txt +++ b/docs/specs/qcow2.txt @@ -103,7 +103,18 @@ in the description of a field. write to an image with unknown auto-clear features if it clears the respective bits from this field first. -Bits 0-63: Reserved (set to 0) +Bit 0: Bitmaps extension bit +This bit indicates consistency for the bitmaps +extension data. + +It is an error if this bit is set without the +bitmaps extension present. + +If the bitmaps extension is present but this +bit is unset, the bitmaps extension data must be +considered inconsistent. + +Bits 1-63: Reserved (set to 0) 96 - 99: refcount_order Describes the width of a reference count block entry (width @@ -123,6 +134,7 @@ be stored. Each extension has a structure like the following: 0x - End of the header extension area 0xE2792ACA - Backing file format name 0x6803f857 - Feature name table +0x23852875 - Bitmaps extension other - Unknown header extension, can be safely ignored @@ -166,6 +178,36 @@ the header extension data. Each entry look like this: terminated if it has full length) +== Bitmaps extension == + +The bitmaps extension is an optional header extension. It provides the ability +to store bitmaps related to a virtual disk. For now, there is only one bitmap +type: the dirty tracking bitmap, which tracks virtual disk changes from some +point in time. + +The data of the extension should be considered consistent only if the +corresponding auto-clear feature bit is set, see autoclear_features above. + +The fields of the bitmaps extension are: + +Byte 0 - 3: nb_bitmaps + The number of bitmaps contained in the image. Must be + greater than or equal to 1. + + Note: Qemu currently only supports up to 65535 bitmaps per + image. + + 4 - 7: Reserved, must be zero. + + 8 - 15: bitmap_directory_size + Size of the bitmap directory in bytes. It is the cumulative + size of all (nb_bitmaps) bitmap headers. + +