Re: [Qemu-devel] About hotplug multifunction
On Tue, Sep 13, 2011 at 09:52:49AM +0300, Gleb Natapov wrote:
> On Tue, Sep 13, 2011 at 02:57:20PM +0900, Isaku Yamahata wrote:
> > On Sun, Sep 11, 2011 at 12:05:17PM +0300, Michael S. Tsirkin wrote:
> > > On Sat, Sep 10, 2011 at 02:43:11AM +0900, Isaku Yamahata wrote:
> > > > pci/pcie hot plug needs clean up for multifunction hotplug in long term.
> > > > Only the single function device case works. The multifunction case is
> > > > broken somewhat.
> > > > Especially the current acpi based hotplug should be replaced by
> > > > the standardized hot plug controller in the long term.
> > >
> > > We'll need to keep supporting Windows XP, which IIUC only
> > > supports hotplug through ACPI. So it looks like we'll
> > > need both.
> >
> > Yes, we'll need both then.
> > It would be possible to implement acpi-based hotplug with the
> > standardized hotplug controller, not with a qemu-specific controller.
> >
> Where is this "standardized hotplug controller" documented?

PCI Hot-Plug 1.1 by the PCI SIG.
NOTE: I already implemented PCIe native hotplug in qemu, which is defined in
the PCIe spec.

> > It would require a fair amount of work to write ACPI code in the DSDT
> > that handles the standardized hotplug controller,
> > so I'm not sure it's worthwhile only for Windows XP support.
> > --
> > yamahata
> --
> Gleb.

--
yamahata
[Qemu-devel] Armel host (x86 emul.) img disk not writable
Hi,

It's my first message here, so a quick introduction: I have worked for a
while with VMware and VirtualBox, and I have integrated VirtualBox into the
x86 QNAP NAS system.

Platform: QNAP NAS TS-219 with a Marvell (ARM) SoC at 1.8 GHz. The goal is
to run QEMU x86 emulation on it (VMs like FreeDOS or Windows).

Context: only kernel modules are possible, no kernel change. Tests are done
in a chroot environment (Debian Squeeze for armel), with an X11 client added
to the NAS and X-Ming or a Debian box used as the X server.

What works: installing qemu through apt-get; starting the emulator from a
floppy (FreeDOS) or CD-ROM (Linux live CD, Windows 98 installation CD). The
window opens, menus and keyboard work, and I have also added a program under
FreeDOS to manage HLT, so everything seems to run. qemu-launcher works too.

Problem: each time I create a disk image to install onto (Windows or
FreeDOS), the installation fails because the guest doesn't see the disk, or
can't read from or write to it. I have tried qemu-img with raw, qcow and
qcow2 without success. The image files are read/write for all users, and
qemu runs as root, but it's the same under a normal user. Because the QNAP
kernel lacks IDE support, I also tried compiling and loading (insmod) the
ide-core module, but that changed nothing.

Any advice? Surely I forgot something?

Thanks for the help,
Philippe.
Re: [Qemu-devel] [PATCH v3 5/6] vga: Use linear mapping + dirty logging in chain 4 memory access mode
On 12.09.2011, at 17:57, Jan Kiszka wrote:

> On 2011-09-12 17:49, Jan Kiszka wrote:
>> On 2011-09-12 17:45, Andreas Färber wrote:
>>> Am 12.09.2011 17:33, schrieb Jan Kiszka:
>>>> On 2011-09-12 17:20, Alexander Graf wrote:
>>>>> Jan Kiszka wrote:
>>>>>> Most VGA memory access modes require MMIO handling as they demand weird
>>>>>> logic to get a byte from or into the video RAM. However, there is one
>>>>>> exception: chain 4 mode with all memory planes enabled for writing. This
>>>>>> mode actually allows linear mapping, which can then be combined with
>>>>>> dirty logging to accelerate KVM.
>>>>>>
>>>>>> This patch accelerates specifically VBE accesses like they are used by
>>>>>> grub in graphical mode. Not only the standard VGA adapter benefits from
>>>>>> this, also vmware and spice in VGA mode.
>>>>>>
>>>>>> CC: Gerd Hoffmann
>>>>>> CC: Avi Kivity
>>>>>> Signed-off-by: Jan Kiszka
>>>>>>
>>>>> [...]
>>>>>
>>>>>> +static void vga_update_memory_access(VGACommonState *s)
>>>>>> +{
>>>>>> +    MemoryRegion *region, *old_region = s->chain4_alias;
>>>>>> +    target_phys_addr_t base, offset, size;
>>>>>> +
>>>>>> +    s->chain4_alias = NULL;
>>>>>> +
>>>>>> +    if ((s->sr[0x02] & 0xf) == 0xf && s->sr[0x04] & 0x08) {
>>>>>> +        offset = 0;
>>>>>> +        switch ((s->gr[6] >> 2) & 3) {
>>>>>> +        case 0:
>>>>>> +            base = 0xa0000;
>>>>>> +            size = 0x20000;
>>>>>> +            break;
>>>>>> +        case 1:
>>>>>> +            base = 0xa0000;
>>>>>> +            size = 0x10000;
>>>>>> +            offset = s->bank_offset;
>>>>>> +            break;
>>>>>> +        case 2:
>>>>>> +            base = 0xb0000;
>>>>>> +            size = 0x8000;
>>>>>> +            break;
>>>>>> +        case 3:
>>>>>> +            base = 0xb8000;
>>>>>> +            size = 0x8000;
>>>>>> +            break;
>>>>>> +        }
>>>>>> +        region = g_malloc(sizeof(*region));
>>>>>> +        memory_region_init_alias(region, "vga.chain4", &s->vram,
>>>>>> +                                 offset, size);
>>>>>> +        memory_region_add_subregion_overlap(s->legacy_address_space,
>>>>>> +                                            base, region, 2);
>>>>>>
>>>>> This one eventually gives me the following in info mtree with -M g3beige
>>>>> on qemu-system-ppc:
>>>>>
>>>>> (qemu) info mtree
>>>>> memory
>>>>> system addr off size 7fff
>>>>> -vga.chain4 addr 000a off size 1
>>>>> -macio addr 8088 off size 8
>>>>> --macio-nvram addr 0006 off size 2
>>>>> --pmac-ide addr 0002 off size 1000
>>>>> --cuda addr 00016000 off size 2000
>>>>> --escc-bar addr 00013000 off size 40
>>>>> --dbdma addr 8000 off size 1000
>>>>> --heathrow-pic addr off size 1000
>>>>> -vga.rom addr 8080 off size 1
>>>>> -vga.vram addr 8000 off size 80
>>>>> -vga-lowmem addr 800a off size 2
>>>>> -escc addr 80013000 off size 40
>>>>> -isa-mmio addr fe00 off size 20
>>>>> I/O
>>>>> io addr off size 1
>>>>> -cmd646-bmdma addr 0700 off size 10
>>>>> --cmd646-bmdma-ioport addr 000c off size 4
>>>>> --cmd646-bmdma-bus addr 0008 off size 4
>>>>> --cmd646-bmdma-ioport addr 0004 off size 4
>>>>> --cmd646-bmdma-bus addr off size 4
>>>>> -cmd646-cmd addr 0680 off size 4
>>>>> -cmd646-data addr 0600 off size 8
>>>>> -cmd646-cmd addr 0580 off size 4
>>>>> -cmd646-data addr 0500 off size 8
>>>>> -ne2000 addr 0400 off size 100
>>>>>
>>>>> This ends up overmapping 0xa0000, effectively overwriting kernel data.
>>>>> If I #if 0 the offending chunk out, everything is fine. I would assume
>>>>> that chain4 really needs to be inside of lowmem? No idea about VGA, but
>>>>> I'm sure you know what's going on :).
>>>> Does this help?
>>>>
>>>> diff --git a/hw/vga.c b/hw/vga.c
>>>> index 125fb29..0a0c5a6 100644
>>>> --- a/hw/vga.c
>>>> +++ b/hw/vga.c
>>>> @@ -181,6 +181,7 @@ static void vga_update_memory_access(VGACommonState *s)
>>>>              size = 0x8000;
>>>>              break;
>>>>          }
>>>> +        base += isa_mem_base;
>>>>          region = g_malloc(sizeof(*region));
>>>>          memory_region_init_alias(region, "vga.chain4", &s->vram, offset, size);
>>>>          memory_region_add_subregion_overlap(s->legacy_address_space, base,
>>>
>>> No longer oopses, but the screen looks chaotic now (black bar at bottom,
>>> part of contents at top etc.).
>>
>> Does this PPC machine map the ISA range and forward VGA accesses to the
>> adapter in general?
>
> If it does, please post a dump of the VGACommonState while the screen is
> corrupted (gdb or via device_show [1]). Maybe I missed some condition
> that prevents chain4 optimizations, and your guest triggers this.

Picture: http://dl.dropbo
Re: [Qemu-devel] [PATCH v3 5/6] vga: Use linear mapping + dirty logging in chain 4 memory access mode
On 09/13/2011 09:54 AM, Alexander Graf wrote:
>>
>> I had similar problems with sun4u, fixed with
>> f69539b14bdba7a5cd22e1f4bed439b476b17286. I think also here, PCI
>> should be given a memory range at 0x80000000 and VGA should
>> automatically use that.
>
> Yeah, usually the ISA bus is behind an ISA-PCI bridge, so it should
> inherit the offset from its parent. Or do you mean something different?

He means that isa_mem_base should go away; instead isa_address_space()
should be a subregion at offset 0x80000000.

Which vga variant are you using?

--
I have a truly marvellous patch that fixes the bug which this signature is
too narrow to contain.
[Qemu-devel] QEMU Image problem
Hi,

I have some problems generating a KVM/QEMU image from an .iso file. I tried
virt-manager but could not create a proper one. Can you please tell me the
correct way to generate a QEMU image from an .iso file?

Regards,
Bala
[Qemu-devel] [PATCH][RFC][0/2] REF+/REF- optimization
These patches try to trade off between leaks and speed for cluster
refcounts. Refcount increments (REF+ or refp) are handled in a different
way from decrements (REF- or refm). The reason is that posting or not
flushing a REF- causes "just" a leak, while posting a REF+ causes a
corruption.

To optimize REF- I just used an array to store offsets; when a flush is
requested or the array reaches a limit (currently 1022 entries), the array
is sorted and written to disk. I use an array of offsets instead of ranges
to support compression (an offset can appear multiple times in the array).
I consider this patch quite ready.

To optimize REF+ I mark a range as allocated and use this range to get new
clusters (avoiding writing refcounts to disk). When a flush is requested,
or in some situations (like snapshots), this cache is disabled and flushed
(written as REF-). I do not consider this patch ready; it works and passes
all io-tests, but for instance I would avoid allocating new clusters for
refcounts during preallocation.

The end speedup is quite visible when allocating clusters (more than 20%).

Frediano Ziglio (2):
  qcow2: optimize refminus updates
  qcow2: ref+ optimization

 block/qcow2-refcount.c |  270 +++++++++++++++++++++++++++++++++++++++++++---
 block/qcow2.c          |    2 +
 block/qcow2.h          |   16 +++
 3 files changed, 275 insertions(+), 13 deletions(-)
[Qemu-devel] [PATCH][RFC][1/2] qcow2: optimize refminus updates
Cache refcount decrements in an array to trade off between leaks and speed.

Signed-off-by: Frediano Ziglio
---
 block/qcow2-refcount.c |  142 ++++++++++++++++++++++++++++++++++++++++++++--
 block/qcow2.c          |    1 +
 block/qcow2.h          |   14 ++++
 3 files changed, 153 insertions(+), 4 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 9605367..7d59b68 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -40,6 +40,13 @@ int qcow2_refcount_init(BlockDriverState *bs)
     BDRVQcowState *s = bs->opaque;
     int ret, refcount_table_size2, i;
 
+    s->refm_cache_index = 0;
+    s->refm_cache_len = 1024;
+    s->refm_cache = g_malloc(s->refm_cache_len * sizeof(uint64_t));
+    if (!s->refm_cache) {
+        goto fail;
+    }
+
     refcount_table_size2 = s->refcount_table_size * sizeof(uint64_t);
     s->refcount_table = g_malloc(refcount_table_size2);
     if (s->refcount_table_size > 0) {
@@ -53,12 +60,14 @@ int qcow2_refcount_init(BlockDriverState *bs)
     }
     return 0;
 fail:
+    g_free(s->refm_cache);
     return -ENOMEM;
 }
 
 void qcow2_refcount_close(BlockDriverState *bs)
 {
     BDRVQcowState *s = bs->opaque;
+    g_free(s->refm_cache);
     g_free(s->refcount_table);
 }
 
@@ -634,13 +643,21 @@ int64_t qcow2_alloc_bytes(BlockDriverState *bs, int size)
 void qcow2_free_clusters(BlockDriverState *bs,
                          int64_t offset, int64_t size)
 {
+    BDRVQcowState *s = bs->opaque;
     int ret;
+    int64_t start, last;
 
     BLKDBG_EVENT(bs->file, BLKDBG_CLUSTER_FREE);
-    ret = update_refcount(bs, offset, size, -1);
-    if (ret < 0) {
-        fprintf(stderr, "qcow2_free_clusters failed: %s\n", strerror(-ret));
-        /* TODO Remember the clusters to free them later and avoid leaking */
+    start = offset & ~(s->cluster_size - 1);
+    last = (offset + size - 1) & ~(s->cluster_size - 1);
+    for (; start <= last; start += s->cluster_size) {
+        ret = qcow2_refm_add(bs, start);
+        if (ret < 0) {
+            fprintf(stderr, "qcow2_free_clusters failed: %s\n", strerror(-ret));
+            /* TODO Remember the clusters to free them later
+             * and avoid leaking */
+            break;
+        }
     }
 }
 
@@ -1165,3 +1182,120 @@ fail:
     return ret;
 }
 
+int qcow2_refm_add_any(BlockDriverState *bs, int64_t offset)
+{
+    BDRVQcowState *s = bs->opaque;
+
+    offset &= ~QCOW_OFLAG_COPIED;
+    if (s->refm_cache_index + 2 > s->refm_cache_len) {
+        int ret = qcow2_refm_flush(bs);
+        if (ret < 0) {
+            return ret;
+        }
+    }
+
+    if ((offset & QCOW_OFLAG_COMPRESSED)) {
+        int nb_csectors = ((offset >> s->csize_shift) & s->csize_mask) + 1;
+        int64_t last;
+
+        offset = (offset & s->cluster_offset_mask) & ~511;
+        last = offset + nb_csectors * 512 - 1;
+        if (!in_same_refcount_block(s, offset, last)) {
+            s->refm_cache[s->refm_cache_index++] = last;
+        }
+    }
+    s->refm_cache[s->refm_cache_index++] = offset;
+    return 0;
+}
+
+static int uint64_cmp(const void *a, const void *b)
+{
+#define A (*((const uint64_t *)a))
+#define B (*((const uint64_t *)b))
+    if (A == B) {
+        return 0;
+    }
+    return A > B ? 1 : -1;
+#undef A
+#undef B
+}
+
+int qcow2_refm_flush(BlockDriverState *bs)
+{
+    BDRVQcowState *s = bs->opaque;
+    uint16_t *refcount_block = NULL;
+    int64_t old_table_index = -1;
+    int ret, i, saved_index = 0;
+    int len = s->refm_cache_index;
+
+    /* sort cache */
+    qsort(s->refm_cache, len, sizeof(uint64_t), uint64_cmp);
+
+    /* save */
+    for (i = 0; i < len; ++i) {
+        uint64_t cluster_offset = s->refm_cache[i];
+        int block_index, refcount;
+        int64_t cluster_index = cluster_offset >> s->cluster_bits;
+        int64_t table_index =
+            cluster_index >> (s->cluster_bits - REFCOUNT_SHIFT);
+
+        /* Load the refcount block and allocate it if needed */
+        if (table_index != old_table_index) {
+            if (refcount_block) {
+                ret = qcow2_cache_put(bs, s->refcount_block_cache,
+                                      (void **) &refcount_block);
+                if (ret < 0) {
+                    goto fail;
+                }
+                saved_index = i;
+                refcount_block = NULL;
+            }
+
+            ret = alloc_refcount_block(bs, cluster_index, &refcount_block);
+            if (ret < 0) {
+                goto fail;
+            }
+        }
+        old_table_index = table_index;
+
+        qcow2_cache_entry_mark_dirty(s->refcount_block_cache, refcount_block);
+
+        /* we can update the count and save it */
+        block_index = cluster_index &
+            ((1 << (s->cluster_bits - REFCOUNT_SHIFT)) - 1);
+
+        refcount = be16_to_cpu(refcount_block[block_index]);
+        refcount--;
+        if (refcount < 0) {
+            ret = -EINVAL;
+            goto fail;
+        }
+        if (refcount == 0 && clus
[Qemu-devel] [PATCH][RFC][2/2] qcow2: ref+ optimization
Preallocate multiple refcount increments in order to collapse allocation
writes. This causes leaks in case of a QEMU crash, but no corruption.

Signed-off-by: Frediano Ziglio
---
 block/qcow2-refcount.c |  128 ++++++++++++++++++++++++++++++++++++++++++---
 block/qcow2.c          |    1 +
 block/qcow2.h          |    2 +
 3 files changed, 122 insertions(+), 9 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 7d59b68..3792cda 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -30,6 +30,7 @@ static int64_t alloc_clusters_noref(BlockDriverState *bs, int64_t size);
 static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
                             int64_t offset, int64_t length, int addend);
+static void qcow2_refp_enable(BlockDriverState *bs);
 
 /*********************************************************/
 
@@ -117,6 +118,12 @@ static int get_refcount(BlockDriverState *bs, int64_t cluster_index)
         ((1 << (s->cluster_bits - REFCOUNT_SHIFT)) - 1);
     refcount = be16_to_cpu(refcount_block[block_index]);
 
+    /* ignore preallocation */
+    if (cluster_index >= s->refp_prealloc_begin
+        && cluster_index < s->refp_prealloc_end) {
+        --refcount;
+    }
+
     ret = qcow2_cache_put(bs, s->refcount_block_cache,
                           (void**) &refcount_block);
     if (ret < 0) {
@@ -207,6 +214,10 @@ static int alloc_refcount_block(BlockDriverState *bs,
      * refcount block into the cache
      */
 
+    uint64_t old_free_cluster_index = s->free_cluster_index;
+    qcow2_refp_flush(bs);
+    s->free_cluster_index = old_free_cluster_index;
+
     *refcount_block = NULL;
 
     /* We write to the refcount table, so we might depend on L2 tables */
@@ -215,6 +226,7 @@ static int alloc_refcount_block(BlockDriverState *bs,
     /* Allocate the refcount block itself and mark it as used */
     int64_t new_block = alloc_clusters_noref(bs, s->cluster_size);
     if (new_block < 0) {
+        qcow2_refp_enable(bs);
         return new_block;
     }
 
@@ -279,6 +291,7 @@ static int alloc_refcount_block(BlockDriverState *bs,
     }
 
     s->refcount_table[refcount_table_index] = new_block;
+    qcow2_refp_enable(bs);
     return 0;
 }
 
@@ -400,10 +413,11 @@ static int alloc_refcount_block(BlockDriverState *bs,
     s->refcount_table_offset = table_offset;
 
     /* Free old table. Remember, we must not change free_cluster_index */
-    uint64_t old_free_cluster_index = s->free_cluster_index;
+    old_free_cluster_index = s->free_cluster_index;
     qcow2_free_clusters(bs, old_table_offset, old_table_size * sizeof(uint64_t));
     s->free_cluster_index = old_free_cluster_index;
+    qcow2_refp_enable(bs);
 
     ret = load_refcount_block(bs, new_block, (void**) refcount_block);
     if (ret < 0) {
         return ret;
@@ -417,6 +431,7 @@ fail_block:
     if (*refcount_block != NULL) {
         qcow2_cache_put(bs, s->refcount_block_cache, (void**) refcount_block);
     }
+    qcow2_refp_enable(bs);
     return ret;
 }
 
@@ -529,9 +544,23 @@ static int update_cluster_refcount(BlockDriverState *bs,
     BDRVQcowState *s = bs->opaque;
     int ret;
 
-    ret = update_refcount(bs, cluster_index << s->cluster_bits, 1, addend);
-    if (ret < 0) {
-        return ret;
+    /* handle preallocation */
+    if (cluster_index >= s->refp_prealloc_begin
+        && cluster_index < s->refp_prealloc_end) {
+
+        /* free previous (should never happen) */
+        int64_t index = s->refp_prealloc_begin;
+        for (; index < cluster_index; ++index) {
+            qcow2_refm_add(bs, index << s->cluster_bits);
+        }
+        addend--;
+        s->refp_prealloc_begin = cluster_index + 1;
+    }
+    if (addend) {
+        ret = update_refcount(bs, cluster_index << s->cluster_bits, 1, addend);
+        if (ret < 0) {
+            return ret;
+        }
     }
 
     bdrv_flush(bs->file);
@@ -572,20 +601,94 @@ retry:
     return (s->free_cluster_index - nb_clusters) << s->cluster_bits;
 }
 
+static void qcow2_refp_enable(BlockDriverState *bs)
+{
+    BDRVQcowState *s = bs->opaque;
+
+    if (s->refp_prealloc_end < 0) {
+        /* enable again ? */
+        if (++s->refp_prealloc_end == 0) {
+            s->refp_prealloc_end = s->refp_prealloc_begin;
+        }
+    }
+}
+
+int qcow2_refp_flush(BlockDriverState *bs)
+{
+    BDRVQcowState *s = bs->opaque;
+    int64_t index, end = s->refp_prealloc_end;
+
+    if (end < 0) {
+        s->refp_prealloc_end = end - 1;
+        return 0;
+    }
+
+    index = s->refp_prealloc_begin;
+    /* this disables further allocations */
+    s->refp_prealloc_end = -1;
+    for (; index < end; ++index) {
+        qcow2_refm_add(bs, index << s->cluster_bits);
+    }
+    qcow2_refm_flush(bs);
+    return 0;
+}
+
 int64_t qcow2_alloc_clusters(BlockDriverState *bs, int64_t size)
 {
-    int64_t offset;
-    int ret;
+    BDRVQcowState *s = bs->opaque;
+    int64_t o
Re: [Qemu-devel] [PATCH v3 5/6] vga: Use linear mapping + dirty logging in chain 4 memory access mode
On 13.09.2011, at 09:51, Avi Kivity wrote:

> On 09/13/2011 09:54 AM, Alexander Graf wrote:
>>>
>>> I had similar problems with sun4u, fixed with
>>> f69539b14bdba7a5cd22e1f4bed439b476b17286. I think also here, PCI
>>> should be given a memory range at 0x80000000 and VGA should
>>> automatically use that.
>>
>> Yeah, usually the ISA bus is behind an ISA-PCI bridge, so it should
>> inherit the offset from its parent. Or do you mean something different?
>>
>
> He means that isa_mem_base should go away; instead isa_address_space()
> should be a subregion at offset 0x80000000.

So we are talking about the same thing. Logically speaking, ISA devices are
behind the ISA-PCI bridge, so the parent would be the bridge, right?

> Which vga variant are you using?

This is stdvga.

Alex
Re: [Qemu-devel] [PATCH 1/4] Sparc: convert mmu_helper to trace framework
On Sun, Sep 11, 2011 at 04:41:10PM +0000, Blue Swirl wrote:
> +mmu_helper_dmiss(uint64_t address, uint64_t context) "DMISS at %"PRIx64" context %"PRIx64
> +mmu_helper_tfault(uint64_t address, uint64_t context) "TFAULT at %"PRIx64" context %"PRIx64
> +mmu_helper_tmiss(uint64_t address, uint64_t context) "TMISS at %"PRIx64" context %"PRIx64

From docs/tracing.txt:

  Format strings must begin and end with double quotes. When using
  portability macros, ensure they are preceded and followed by double
  quotes: "value %"PRIx64"".

This is a parser limitation in scripts/tracetool. We could change it to
treat everything after the trace event arguments as the format string, but
today it explicitly looks for a pattern like ".*".

I've added this to my tracing TODO list and it should be possible to lift
the limitation soon. For now, please make sure the format string begins and
ends with a double quote.

Stefan
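Concretely, the events quoted above would need a closing `""` after the final PRIx64 so that the format string both begins and ends with a double quote, along the lines of:

```
mmu_helper_dmiss(uint64_t address, uint64_t context) "DMISS at %"PRIx64" context %"PRIx64""
mmu_helper_tfault(uint64_t address, uint64_t context) "TFAULT at %"PRIx64" context %"PRIx64""
mmu_helper_tmiss(uint64_t address, uint64_t context) "TMISS at %"PRIx64" context %"PRIx64""
```

The trailing `""` contributes an empty string to the concatenation, so the emitted format is unchanged; it only satisfies the tracetool parser's requirement that the string literally ends with a quote.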
Re: [Qemu-devel] Why qemu write/rw speed is so low?
On Tue, Sep 13, 2011 at 10:52:44AM +0800, Zhi Yong Wu wrote:
> This is a real log when fio is issued with bs=128K and bps=1000000 (block
> I/O throttling):

I would use 1024 * 1024 instead of 1000000 as the throughput limit. 10^6 is
not a multiple of 512 bytes and is not a nice value in KB/s (976.5625).

>   8,2    0    1    0.000000000 24332  A  WS 79958528 + 256 <- (253,2) 71830016

256 blocks = 256 * 512 bytes = 128 KB per request. We know the maximum
request size from Linux is 128 KB, so this makes sense.

> Throughput (R/W): 0KiB/s / 482KiB/s

What throughput do you get without I/O throttling? Either I/O throttling is
limiting too aggressively here or the physical disk is the bottleneck (I
doubt that, since the write throughput value is very low). We need to
compare against the throughput when throttling is not enabled.

Stefan
Re: [Qemu-devel] Why qemu write/rw speed is so low?
On Tue, Sep 13, 2011 at 10:38:28AM +0800, Zhi Yong Wu wrote:
> On Fri, Sep 9, 2011 at 6:38 PM, Stefan Hajnoczi wrote:
> > On Fri, Sep 09, 2011 at 05:44:36PM +0800, Zhi Yong Wu wrote:
> >> Today, i did some basic I/O testing, and suddenly found that qemu write
> >> and rw speed is so low now; my qemu binary is built on commit
> >> 344eecf6995f4a0ad1d887cec922f6806f91a3f8.
> >>
> >> Does qemu have a regression?
> >>
> >> The testing data is shown below:
> >>
> >> 1.) write
> >>
> >> test: (g=0): rw=write, bs=512-512/512-512, ioengine=libaio, iodepth=1
> >
> > Please post your QEMU command-line. If your -drive is using
> > cache=writethrough then small writes are slow because they require the
> > physical disk to write and then synchronize its write cache. Typically
> > cache=none is a good setting to use for local disks.
> >
> > The block size of 512 bytes is too small. Ext4 uses a 4 KB block size,
> > so I think a 512 byte write from the guest could cause a 4 KB
> > read-modify-write operation on the host filesystem.
> >
> > You can check this by running btrace(8) on the host during the
> > benchmark. The blktrace output and the summary statistics will show
> > what I/O pattern the host is issuing.
>   8,2    0    1    0.000000000   337  A  WS 425081504 + 8 <- (253,1) 42611360

8 blocks = 8 * 512 bytes = 4 KB

So we are not performing 512 byte writes. Some layer is changing the I/O
pattern.

Stefan
Re: [Qemu-devel] [PATCH v3 5/6] vga: Use linear mapping + dirty logging in chain 4 memory access mode
On 09/13/2011 10:54 AM, Alexander Graf wrote:
>>>
>>> Yeah, usually the ISA bus is behind an ISA-PCI bridge, so it should
>>> inherit the offset from its parent. Or do you mean something different?
>>>
>>
>> He means that isa_mem_base should go away; instead isa_address_space()
>> should be a subregion at offset 0x80000000.
>
> So we are talking about the same thing. Logically speaking, ISA devices
> are behind the ISA-PCI bridge, so the parent would be the bridge, right?

Right.

system_memory -> pci_address_space() -> isa_address_space() -> various vga
areas.

>> Which vga variant are you using?
>
> This is stdvga.

Don't see the call to vga_init() in that path (this is what passes
isa_address_space() to the vga core).

--
I have a truly marvellous patch that fixes the bug which this signature is
too narrow to contain.
Re: [Qemu-devel] [PATCH] qcow2: fix range check
2011/9/12 Kevin Wolf:
> Am 10.09.2011 10:23, schrieb Frediano Ziglio:
>> QCowL2Meta::offset is not cluster aligned but only sector aligned,
>> however nb_clusters counts clusters from the cluster start.
>> This fixes the range check. Note that the old code has no corruption
>> issues related to this check; it only causes an intersection to be
>> detected when it shouldn't be.
>
> Are you sure? See below. (I think it doesn't corrupt the image, but for
> a different reason)
>
>>
>> Signed-off-by: Frediano Ziglio
>> ---
>>  block/qcow2-cluster.c |   14 +++++++-------
>>  1 files changed, 7 insertions(+), 7 deletions(-)
>>
>> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
>> index 428b5ad..2f76311 100644
>> --- a/block/qcow2-cluster.c
>> +++ b/block/qcow2-cluster.c
>> @@ -776,17 +776,17 @@ again:
>>       */
>>      QLIST_FOREACH(old_alloc, &s->cluster_allocs, next_in_flight) {
>>
>> -        uint64_t end_offset = offset + nb_clusters * s->cluster_size;
>> -        uint64_t old_offset = old_alloc->offset;
>> -        uint64_t old_end_offset = old_alloc->offset +
>> -            old_alloc->nb_clusters * s->cluster_size;
>> +        uint64_t start = offset >> s->cluster_bits;
>> +        uint64_t end = start + nb_clusters;
>> +        uint64_t old_start = old_alloc->offset >> s->cluster_bits;
>> +        uint64_t old_end = old_start + old_alloc->nb_clusters;
>>
>> -        if (end_offset < old_offset || offset > old_end_offset) {
>> +        if (end < old_start || start > old_end) {
>>              /* No intersection */
>
> Consider request A from 0x0 + 0x1000 bytes and request B from 0x2000 +
> 0x1000 bytes. Both touch the same cluster and therefore should be
> serialised, but 0x2000 > 0x1000, so we decided here that there is no
> intersection and we don't have to care.
>
> Note that this doesn't corrupt the image, qcow2 can handle parallel
> requests allocating the same cluster. In qcow2_alloc_cluster_link_l2()
> we get an additional COW operation, so performance will be hurt, but
> correctness is maintained.

I tested this by adding some printfs and also with strace, and I can
confirm that the current code serialises the allocations.
Using ranges A (0-0x1000) and B (0x2000-0x3000) and assuming 0x10000 (64k)
as the cluster size, you get

A: offset 0, nb_clusters 1
B: offset 0x2000, nb_clusters 1

So without the patch you get the two byte ranges

A: 0-0x10000
B: 0x2000-0x12000

which intersect.

>>          } else {
>> -            if (offset < old_offset) {
>> +            if (start < old_start) {
>>                  /* Stop at the start of a running allocation */
>> -                nb_clusters = (old_offset - offset) >> s->cluster_bits;
>> +                nb_clusters = old_start - start;
>>              } else {
>>                  nb_clusters = 0;
>>              }
>
> Anyway, the patch looks good. Applied to the block branch.
>
> Kevin

Oh... I realize that ranges are [start, end) (end not inclusive), so the
intersection test should be

    if (end <= old_start || start >= old_end) {

instead of

    if (end < old_start || start > old_end) {

However I get some small speed penalty with this change and I don't
understand why (I have only done some small tests).

Frediano
Re: [Qemu-devel] [PATCH v3 5/6] vga: Use linear mapping + dirty logging in chain 4 memory access mode
On 2011-09-13 09:39, Alexander Graf wrote:
>
> On 12.09.2011, at 17:57, Jan Kiszka wrote:
>
>> On 2011-09-12 17:49, Jan Kiszka wrote:
>>> On 2011-09-12 17:45, Andreas Färber wrote:
>>>> Am 12.09.2011 17:33, schrieb Jan Kiszka:
>>>>> On 2011-09-12 17:20, Alexander Graf wrote:
>>>>>> Jan Kiszka wrote:
>>>>>>> Most VGA memory access modes require MMIO handling as they demand weird
>>>>>>> logic to get a byte from or into the video RAM. However, there is one
>>>>>>> exception: chain 4 mode with all memory planes enabled for writing. This
>>>>>>> mode actually allows linear mapping, which can then be combined with
>>>>>>> dirty logging to accelerate KVM.
>>>>>>>
>>>>>>> This patch accelerates specifically VBE accesses like they are used by
>>>>>>> grub in graphical mode. Not only the standard VGA adapter benefits from
>>>>>>> this, also vmware and spice in VGA mode.
>>>>>>>
>>>>>>> CC: Gerd Hoffmann
>>>>>>> CC: Avi Kivity
>>>>>>> Signed-off-by: Jan Kiszka
>>>>>>>
>>>>>> [...]
>>>>>>
>>>>>>> +static void vga_update_memory_access(VGACommonState *s)
>>>>>>> +{
>>>>>>> +    MemoryRegion *region, *old_region = s->chain4_alias;
>>>>>>> +    target_phys_addr_t base, offset, size;
>>>>>>> +
>>>>>>> +    s->chain4_alias = NULL;
>>>>>>> +
>>>>>>> +    if ((s->sr[0x02] & 0xf) == 0xf && s->sr[0x04] & 0x08) {
>>>>>>> +        offset = 0;
>>>>>>> +        switch ((s->gr[6] >> 2) & 3) {
>>>>>>> +        case 0:
>>>>>>> +            base = 0xa0000;
>>>>>>> +            size = 0x20000;
>>>>>>> +            break;
>>>>>>> +        case 1:
>>>>>>> +            base = 0xa0000;
>>>>>>> +            size = 0x10000;
>>>>>>> +            offset = s->bank_offset;
>>>>>>> +            break;
>>>>>>> +        case 2:
>>>>>>> +            base = 0xb0000;
>>>>>>> +            size = 0x8000;
>>>>>>> +            break;
>>>>>>> +        case 3:
>>>>>>> +            base = 0xb8000;
>>>>>>> +            size = 0x8000;
>>>>>>> +            break;
>>>>>>> +        }
>>>>>>> +        region = g_malloc(sizeof(*region));
>>>>>>> +        memory_region_init_alias(region, "vga.chain4", &s->vram,
>>>>>>> +                                 offset, size);
>>>>>>> +        memory_region_add_subregion_overlap(s->legacy_address_space,
>>>>>>> +                                            base, region, 2);
>>>>>>>
>>>>>> This one eventually gives me the following in info mtree with -M g3beige
>>>>>> on qemu-system-ppc:
>>>>>>
>>>>>> (qemu) info mtree
>>>>>> memory
>>>>>> system addr off size 7fff
>>>>>> -vga.chain4 addr 000a off size 1
>>>>>> -macio addr 8088 off size 8
>>>>>> --macio-nvram addr 0006 off size 2
>>>>>> --pmac-ide addr 0002 off size 1000
>>>>>> --cuda addr 00016000 off size 2000
>>>>>> --escc-bar addr 00013000 off size 40
>>>>>> --dbdma addr 8000 off size 1000
>>>>>> --heathrow-pic addr off size 1000
>>>>>> -vga.rom addr 8080 off size 1
>>>>>> -vga.vram addr 8000 off size 80
>>>>>> -vga-lowmem addr 800a off size 2
>>>>>> -escc addr 80013000 off size 40
>>>>>> -isa-mmio addr fe00 off size 20
>>>>>> I/O
>>>>>> io addr off size 1
>>>>>> -cmd646-bmdma addr 0700 off size 10
>>>>>> --cmd646-bmdma-ioport addr 000c off size 4
>>>>>> --cmd646-bmdma-bus addr 0008 off size 4
>>>>>> --cmd646-bmdma-ioport addr 0004 off size 4
>>>>>> --cmd646-bmdma-bus addr off size 4
>>>>>> -cmd646-cmd addr 0680 off size 4
>>>>>> -cmd646-data addr 0600 off size 8
>>>>>> -cmd646-cmd addr 0580 off size 4
>>>>>> -cmd646-data addr 0500 off size 8
>>>>>> -ne2000 addr 0400 off size 100
>>>>>>
>>>>>> This ends up overmapping 0xa0000, effectively overwriting kernel data.
>>>>>> If I #if 0 the offending chunk out, everything is fine. I would assume
>>>>>> that chain4 really needs to be inside of lowmem? No idea about VGA, but
>>>>>> I'm sure you know what's going on :).
>>>>> Does this help?
>>>>>
>>>>> diff --git a/hw/vga.c b/hw/vga.c
>>>>> index 125fb29..0a0c5a6 100644
>>>>> --- a/hw/vga.c
>>>>> +++ b/hw/vga.c
>>>>> @@ -181,6 +181,7 @@ static void vga_update_memory_access(VGACommonState *s)
>>>>>              size = 0x8000;
>>>>>              break;
>>>>>          }
>>>>> +        base += isa_mem_base;
>>>>>          region = g_malloc(sizeof(*region));
>>>>>          memory_region_init_alias(region, "vga.chain4", &s->vram, offset, size);
>>>>>          memory_region_add_subregion_overlap(s->legacy_address_space, base,
>>>>
>>>> No longer oopses, but the screen looks chaotic now (black bar at bottom,
>>>> part of contents at top etc.).
>>>
>>> Does this PPC machine map the ISA range and forward VGA accesses to the
>>> adapter in general?
>>
>> If it does, please post a dump of the VGACommonState while the screen is cor
Re: [Qemu-devel] [PATCH] qcow2: fix range check
Am 13.09.2011 10:10, schrieb Frediano Ziglio:
> 2011/9/12 Kevin Wolf:
>> Am 10.09.2011 10:23, schrieb Frediano Ziglio:
>>> QCowL2Meta::offset is not cluster aligned but only sector aligned,
>>> however nb_clusters counts clusters from the cluster start.
>>> This fixes the range check. Note that the old code has no corruption
>>> issues related to this check; it only causes an intersection to be
>>> detected when it shouldn't be.
>>
>> Are you sure? See below. (I think it doesn't corrupt the image, but for
>> a different reason)
>>
>>>
>>> Signed-off-by: Frediano Ziglio
>>> ---
>>>  block/qcow2-cluster.c |   14 +++++++-------
>>>  1 files changed, 7 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
>>> index 428b5ad..2f76311 100644
>>> --- a/block/qcow2-cluster.c
>>> +++ b/block/qcow2-cluster.c
>>> @@ -776,17 +776,17 @@ again:
>>>       */
>>>      QLIST_FOREACH(old_alloc, &s->cluster_allocs, next_in_flight) {
>>>
>>> -        uint64_t end_offset = offset + nb_clusters * s->cluster_size;
>>> -        uint64_t old_offset = old_alloc->offset;
>>> -        uint64_t old_end_offset = old_alloc->offset +
>>> -            old_alloc->nb_clusters * s->cluster_size;
>>> +        uint64_t start = offset >> s->cluster_bits;
>>> +        uint64_t end = start + nb_clusters;
>>> +        uint64_t old_start = old_alloc->offset >> s->cluster_bits;
>>> +        uint64_t old_end = old_start + old_alloc->nb_clusters;
>>>
>>> -        if (end_offset < old_offset || offset > old_end_offset) {
>>> +        if (end < old_start || start > old_end) {
>>>              /* No intersection */
>>
>> Consider request A from 0x0 + 0x1000 bytes and request B from 0x2000 +
>> 0x1000 bytes. Both touch the same cluster and therefore should be
>> serialised, but 0x2000 > 0x1000, so we decided here that there is no
>> intersection and we don't have to care.
>>
>> Note that this doesn't corrupt the image, qcow2 can handle parallel
>> requests allocating the same cluster. In qcow2_alloc_cluster_link_l2()
>> we get an additional COW operation, so performance will be hurt, but
>> correctness is maintained.
>>
>
> I tested this by adding some printfs and also with strace, and I can
> confirm that the current code serialises the allocations.
> Using ranges A (0-0x1000) and B (0x2000-0x3000) and assuming 0x10000
> (64k) as the cluster size, you get
>
> A: offset 0, nb_clusters 1
> B: offset 0x2000, nb_clusters 1
>
> So without the patch you get the two byte ranges
>
> A: 0-0x10000
> B: 0x2000-0x12000
>
> which intersect.
>
>>>          } else {
>>> -            if (offset < old_offset) {
>>> +            if (start < old_start) {
>>>                  /* Stop at the start of a running allocation */
>>> -                nb_clusters = (old_offset - offset) >> s->cluster_bits;
>>> +                nb_clusters = old_start - start;
>>>              } else {
>>>                  nb_clusters = 0;
>>>              }
>>
>> Anyway, the patch looks good. Applied to the block branch.
>>
>> Kevin
>
> Oh... I realize that ranges are [start, end) (end not inclusive), so the
> intersection test should be
>
>     if (end <= old_start || start >= old_end) {
>
> instead of
>
>     if (end < old_start || start > old_end) {
>
> However I get some small speed penalty with this change and I don't
> understand why (only done some small tests).

Hm, I think you are right. How do you measure performance?

Kevin
Re: [Qemu-devel] [PATCH v3 5/6] vga: Use linear mapping + dirty logging in chain 4 memory access mode
Am 13.09.2011 um 10:14 schrieb Jan Kiszka: On 2011-09-13 09:39, Alexander Graf wrote: (qemu) device_show #3 dev: VGA, id "#3", version 2 dev. version_id: 0002 config: 00 00 00 00 10 d1 cf 20 - 00 00 00 00 10 d1 d0 30 ... irq_state: 00 00 00 00 00 00 00 10 - 00 00 00 00 00 00 00 00 vga. latch: sr_index: 00 sr: 00 00 0f 00 08 00 00 00 gr_index: 00 gr: 00 00 00 00 00 40 05 00 - 00 00 00 00 00 00 00 00 ar_index: 20 ar: 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ... ar_flip_flop: 0001 cr_index: 00 cr: 00 63 00 00 00 00 00 50 - 00 40 00 00 00 00 00 00 ... msr:00 fcr:00 st00: 00 st01: 00 dac_state: 00 dac_sub_index: 00 dac_read_index: 00 dac_write_index:10 dac_cache: 3f 3f 3f palette:00 00 00 00 00 2a 00 2a - 00 00 2a 2a 2a 00 00 2a ... bank_offset: is_vbe_vmstate: 01 vbe_index: 0004 vbe_regs[00]: b0c5 vbe_regs[01]: 0320 vbe_regs[02]: 0258 vbe_regs[03]: 000f vbe_regs[04]: 0001 vbe_regs[05]: vbe_regs[06]: 0320 vbe_regs[07]: 0258 vbe_regs[08]: vbe_regs[09]: vbe_start_addr: vbe_line_offset:0640 vbe_bank_mask: 007f Makes no sense, must work with this setup. Maybe it's dynamic effect when switching modes of bank offsets. Do you have some test image for me? I've been using a Debian netinst image, but businesscard should do as well: http://cdimage.debian.org/debian-cd/6.0.2.1/powerpc/iso-cd/debian-6.0.2.1-powerpc-businesscard.iso qemu-system-ppc -boot d -cdrom path/to.iso # at yaboot prompt type: install Then in addition to the black bar in Alex' image, the penguin gets beheaded. Andreas
Re: [Qemu-devel] Why qemu write/rw speed is so low?
On Tue, Sep 13, 2011 at 3:15 PM, Stefan Hajnoczi wrote: > On Tue, Sep 13, 2011 at 10:38:28AM +0800, Zhi Yong Wu wrote: >> On Fri, Sep 9, 2011 at 6:38 PM, Stefan Hajnoczi >> wrote: >> > On Fri, Sep 09, 2011 at 05:44:36PM +0800, Zhi Yong Wu wrote: >> >> Today, i did some basical I/O testing, and suddenly found that qemu write >> >> and rw speed is so low now, my qemu binary is built on commit >> >> 344eecf6995f4a0ad1d887cec922f6806f91a3f8. >> >> >> >> Do qemu have regression? >> >> >> >> The testing data is shown as below: >> >> >> >> 1.) write >> >> >> >> test: (g=0): rw=write, bs=512-512/512-512, ioengine=libaio, iodepth=1 >> > >> > Please post your QEMU command-line. If your -drive is using >> > cache=writethrough then small writes are slow because they require the >> > physical disk to write and then synchronize its write cache. Typically >> > cache=none is a good setting to use for local disks. >> > >> > The block size of 512 bytes is too small. Ext4 uses a 4 KB block size, >> > so I think a 512 byte write from the guest could cause a 4 KB >> > read-modify-write operation on the host filesystem. >> > >> > You can check this by running btrace(8) on the host during the >> > benchmark. The blktrace output and the summary statistics will show >> > what I/O pattern the host is issuing. >> 8,2 0 1 0.0 337 A WS 425081504 + 8 <- >> (253,1) 42611360 > > 8 blocks = 8 * 512 bytes = 4 KB How do you know each block size is 512 bytes? > > So we are not performing 512 byte writes. Some layer is changing the > I/O pattern. > > Stefan > -- Regards, Zhi Yong Wu
Re: [Qemu-devel] [PATCH v3 5/6] vga: Use linear mapping + dirty logging in chain 4 memory access mode
On 13.09.2011, at 10:14, Jan Kiszka wrote: > On 2011-09-13 09:39, Alexander Graf wrote: >> >> On 12.09.2011, at 17:57, Jan Kiszka wrote: >> >>> On 2011-09-12 17:49, Jan Kiszka wrote: On 2011-09-12 17:45, Andreas Färber wrote: > Am 12.09.2011 17:33, schrieb Jan Kiszka: >> On 2011-09-12 17:20, Alexander Graf wrote: >>> Jan Kiszka wrote: Most VGA memory access modes require MMIO handling as they demand weird logic to get a byte from or into the video RAM. However, there is one exception: chain 4 mode with all memory planes enabled for writing. This mode actually allows linear mapping, which can then be combined with dirty logging to accelerate KVM. This patch specifically accelerates VBE accesses like they are used by grub in graphical mode. Not only the standard VGA adapter benefits from this, also vmware and spice in VGA mode. CC: Gerd Hoffmann CC: Avi Kivity Signed-off-by: Jan Kiszka >>> [...] >>> +static void vga_update_memory_access(VGACommonState *s) +{ +MemoryRegion *region, *old_region = s->chain4_alias; +target_phys_addr_t base, offset, size; + +s->chain4_alias = NULL; + +if ((s->sr[0x02] & 0xf) == 0xf && s->sr[0x04] & 0x08) { +offset = 0; +switch ((s->gr[6] >> 2) & 3) { +case 0: +base = 0xa0000; +size = 0x20000; +break; +case 1: +base = 0xa0000; +size = 0x10000; +offset = s->bank_offset; +break; +case 2: +base = 0xb0000; +size = 0x8000; +break; +case 3: +base = 0xb8000; +size = 0x8000; +break; +} +region = g_malloc(sizeof(*region)); +memory_region_init_alias(region, "vga.chain4", &s->vram, offset, size); +memory_region_add_subregion_overlap(s->legacy_address_space, base, +region, 2); >>> This one eventually gives me the following in info mtree with -M g3beige >>> on qemu-system-ppc: >>> >>> (qemu) info mtree >>> memory >>> system addr off size 7fff >>> -vga.chain4 addr 000a off size 1 >>> -macio addr 8088 off size 8 >>> --macio-nvram addr 0006 off size 2 >>> --pmac-ide addr 0002 off size 1000 >>> --cuda addr 00016000 off size 2000 >>> --escc-bar addr 00013000 off size 40 >>>
--dbdma addr 8000 off size 1000 >>> --heathrow-pic addr off size 1000 >>> -vga.rom addr 8080 off size 1 >>> -vga.vram addr 8000 off size 80 >>> -vga-lowmem addr 800a off size 2 >>> -escc addr 80013000 off size 40 >>> -isa-mmio addr fe00 off size 20 >>> I/O >>> io addr off size 1 >>> -cmd646-bmdma addr 0700 off size 10 >>> --cmd646-bmdma-ioport addr 000c off size 4 >>> --cmd646-bmdma-bus addr 0008 off size 4 >>> --cmd646-bmdma-ioport addr 0004 off size 4 >>> --cmd646-bmdma-bus addr off size 4 >>> -cmd646-cmd addr 0680 off size 4 >>> -cmd646-data addr 0600 off size 8 >>> -cmd646-cmd addr 0580 off size 4 >>> -cmd646-data addr 0500 off size 8 >>> -ne2000 addr 0400 off size 100 >>> >>> This ends up overmapping 0xa0000, effectively overwriting kernel data. >>> If I #if 0 the offending chunk out, everything is fine. I would assume >>> that chain4 really needs to be inside of lowmem? No idea about VGA, but >>> I'm sure you know what's going on :). >> Does this help? >> >> diff --git a/hw/vga.c b/hw/vga.c >> index 125fb29..0a0c5a6 100644 >> --- a/hw/vga.c >> +++ b/hw/vga.c >> @@ -181,6 +181,7 @@ static void vga_update_memory_access(VGACommonState >> *s) >>size = 0x8000; >>break; >>} >> +base += isa_mem_base; >>region = g_malloc(sizeof(*region)); >>memory_region_init_alias(region, "vga.chain4", &s->vram, offset, >> size); >>memory_region_add_subregion_overlap(s->legacy_address_space, base, > > No longer oopses, but the screen looks chaotic now (black bar at bottom, > part of contents at top etc.). Does
[Qemu-devel] [PATCH] PPC: Fix heathrow PIC to use little endian MMIO
During the memory API conversion, the indication on little endianness of MMIO for the heathrow PIC got dropped. This patch adds it back again. Signed-off-by: Alexander Graf --- hw/heathrow_pic.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/hw/heathrow_pic.c b/hw/heathrow_pic.c index 51996ab..16f48d1 100644 --- a/hw/heathrow_pic.c +++ b/hw/heathrow_pic.c @@ -126,7 +126,7 @@ static uint64_t pic_read(void *opaque, target_phys_addr_t addr, static const MemoryRegionOps heathrow_pic_ops = { .read = pic_read, .write = pic_write, -.endianness = DEVICE_NATIVE_ENDIAN, +.endianness = DEVICE_LITTLE_ENDIAN, }; static void heathrow_pic_set_irq(void *opaque, int num, int level) -- 1.6.0.2
Re: [Qemu-devel] Why qemu write/rw speed is so low?
On Tue, Sep 13, 2011 at 9:31 AM, Zhi Yong Wu wrote: > On Tue, Sep 13, 2011 at 3:15 PM, Stefan Hajnoczi > wrote: >> On Tue, Sep 13, 2011 at 10:38:28AM +0800, Zhi Yong Wu wrote: >>> On Fri, Sep 9, 2011 at 6:38 PM, Stefan Hajnoczi >>> wrote: >>> > On Fri, Sep 09, 2011 at 05:44:36PM +0800, Zhi Yong Wu wrote: >>> >> Today, i did some basical I/O testing, and suddenly found that qemu >>> >> write and rw speed is so low now, my qemu binary is built on commit >>> >> 344eecf6995f4a0ad1d887cec922f6806f91a3f8. >>> >> >>> >> Do qemu have regression? >>> >> >>> >> The testing data is shown as below: >>> >> >>> >> 1.) write >>> >> >>> >> test: (g=0): rw=write, bs=512-512/512-512, ioengine=libaio, iodepth=1 >>> > >>> > Please post your QEMU command-line. If your -drive is using >>> > cache=writethrough then small writes are slow because they require the >>> > physical disk to write and then synchronize its write cache. Typically >>> > cache=none is a good setting to use for local disks. >>> > >>> > The block size of 512 bytes is too small. Ext4 uses a 4 KB block size, >>> > so I think a 512 byte write from the guest could cause a 4 KB >>> > read-modify-write operation on the host filesystem. >>> > >>> > You can check this by running btrace(8) on the host during the >>> > benchmark. The blktrace output and the summary statistics will show >>> > what I/O pattern the host is issuing. >>> 8,2 0 1 0.0 337 A WS 425081504 + 8 <- >>> (253,1) 42611360 >> >> 8 blocks = 8 * 512 bytes = 4 KB > How do you know each block size is 512 bytes? The blkparse format specifier for blocks is 'n'. Here is the code to print it from blkparse_fmt.c: case 'n': fprintf(ofp, strcat(format, "u"), t_sec(t)); And t_sec() is: #define t_sec(t)((t)->bytes >> 9) So it divides the byte count by 512. Block size == sector size == 512 bytes. You can get the blktrace source code here: http://brick.kernel.dk/snaps/ Stefan
Re: [Qemu-devel] Why qemu write/rw speed is so low?
On Tue, Sep 13, 2011 at 4:49 PM, Stefan Hajnoczi wrote: > On Tue, Sep 13, 2011 at 9:31 AM, Zhi Yong Wu wrote: >> On Tue, Sep 13, 2011 at 3:15 PM, Stefan Hajnoczi >> wrote: >>> On Tue, Sep 13, 2011 at 10:38:28AM +0800, Zhi Yong Wu wrote: On Fri, Sep 9, 2011 at 6:38 PM, Stefan Hajnoczi wrote: > On Fri, Sep 09, 2011 at 05:44:36PM +0800, Zhi Yong Wu wrote: >> Today, i did some basical I/O testing, and suddenly found that qemu >> write and rw speed is so low now, my qemu binary is built on commit >> 344eecf6995f4a0ad1d887cec922f6806f91a3f8. >> >> Do qemu have regression? >> >> The testing data is shown as below: >> >> 1.) write >> >> test: (g=0): rw=write, bs=512-512/512-512, ioengine=libaio, iodepth=1 > > Please post your QEMU command-line. If your -drive is using > cache=writethrough then small writes are slow because they require the > physical disk to write and then synchronize its write cache. Typically > cache=none is a good setting to use for local disks. > > The block size of 512 bytes is too small. Ext4 uses a 4 KB block size, > so I think a 512 byte write from the guest could cause a 4 KB > read-modify-write operation on the host filesystem. > > You can check this by running btrace(8) on the host during the > benchmark. The blktrace output and the summary statistics will show > what I/O pattern the host is issuing. 8,2 0 1 0.0 337 A WS 425081504 + 8 <- (253,1) 42611360 >>> >>> 8 blocks = 8 * 512 bytes = 4 KB >> How do you know each block size is 512 bytes? > > The blkparse format specifier for blocks is 'n'. Here is the code to > print it from blkparse_fmt.c: > > case 'n': > fprintf(ofp, strcat(format, "u"), t_sec(t)); > > And t_sec() is: > > #define t_sec(t) ((t)->bytes >> 9) Great, it shift 9 bit in the right direction, i.e. its unit is changed from bytes to blocks, got it, thanks. > > So it divides the byte count by 512. Block size == sector size == 512 bytes. 
> > You can get the blktrace source code here: > > http://brick.kernel.dk/snaps/ > > Stefan > -- Regards, Zhi Yong Wu
Re: [Qemu-devel] [PATCH v3 5/6] vga: Use linear mapping + dirty logging in chain 4 memory access mode
On 2011-09-13 10:40, Alexander Graf wrote: > Btw, it still tries to execute invalid code even with your patch. #if 0'ing > out the memory region updates at least get the guest booting for me. Btw, to > get it working you also need a patch for the interrupt controller (another > breakage thanks to memory api). > > diff --git a/hw/heathrow_pic.c b/hw/heathrow_pic.c > index 51996ab..16f48d1 100644 > --- a/hw/heathrow_pic.c > +++ b/hw/heathrow_pic.c > @@ -126,7 +126,7 @@ static uint64_t pic_read(void *opaque, target_phys_addr_t > addr, > static const MemoryRegionOps heathrow_pic_ops = { > .read = pic_read, > .write = pic_write, > -.endianness = DEVICE_NATIVE_ENDIAN, > +.endianness = DEVICE_LITTLE_ENDIAN, > }; > > static void heathrow_pic_set_irq(void *opaque, int num, int level) > With or without this fix, with or without active chain-4 optimization, I just get an empty yellow screen when firing up qemu-system-ppc (also when using the Debian ISO). Do I need to specify a specific machine type? Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux
Re: [Qemu-devel] [Bug 848571] [NEW] qemu does not generate a qemu-kvm.stp tapset file
On Tue, Sep 13, 2011 at 5:08 AM, William Cohen <848...@bugs.launchpad.net> wrote: > Public bug reported: > > To make the systemtap probing easier to use qemu generates qemu*.stp > files with aliases for various events for each of the executables. The > installer places these files in /usr/share/systemtap/tapset/. These > files are generated by the tracetool. However, the /usr/bin/qemu-kvm is > produced with a copy: > > cp -a x86_64-softmmu/qemu-system-x86_64 qemu-kvm > > No matching qemu-kvm.stp generated for the qemu-kvm executable. It would > be really nice if that tapset file is generated so people can use more > symbolic probe points. Jes Sorensen added an option to make this possible: http://repo.or.cz/w/qemu.git/commitdiff/e323c93edf3abb67c37b8e08b78da4835880f12e Distro packaging scripts can make use of this tracetool option to generate .stp files for qemu-kvm in its install path. I think you need to file a bug with your distro. Stefan
Re: [Qemu-devel] qemu virtIO blocking operation - question
On Mon, Sep 12, 2011 at 10:05 PM, Sinha, Ani wrote: > > On Sep 11, 2011, at 6:34 AM, Stefan Hajnoczi wrote: > >> >> You may find these posts I wrote helpful, they explain vcpu threads and >> the I/O thread: >> http://blog.vmsplice.net/2011/03/qemu-internals-big-picture-overview.html >> http://blog.vmsplice.net/2011/03/qemu-internals-overall-architecture-and.html >> > > > "One example user of worker threads is posix-aio-compat.c, an asynchronous > file I/O implementation. When core QEMU issues an aio request it is placed on > a queue. Worker threads take requests off the queue and execute them outside > of core QEMU. They may perform blocking operations since they execute in > their own threads and do not block > > Another example is ui/vnc-jobs-async.c which performs compute-intensive image > compression and encoding in worker threads." > > > I wonder why there isn't a general framework for this? Some thread that would > take requests off a queue and process them without knowing the internals of > the request. There is, you can use GThreadPool. Stefan
Re: [Qemu-devel] qemu virtIO blocking operation - question
On Mon, Sep 12, 2011 at 7:31 PM, Sinha, Ani wrote: > > On Sep 11, 2011, at 6:34 AM, Stefan Hajnoczi wrote: > >> On Fri, Sep 09, 2011 at 07:45:17PM -0500, Sinha, Ani wrote: >>> So I am writing a virtIO driver for a device that supports blocking calls >>> like poll() etc. Now the front end paravirtualized driver mirrors the >>> request to the backend "host" qemu process that then does the actual call >>> on the host kernel on behalf of the guest. Now my question is, when I do a >>> blocking call from within the callback that I have registered when I >>> created the virtqueue, does it block the entire vcpu? If it does, do I have >>> to create an async context for it? Looking at virtio-net and virtio-block, >>> I can see the concept of bottom halves but I am not sure if this helps me >>> in any way. >> >> What device are you adding? It would help to share what you are trying >> to accomplish. > > > We are trying to paravirtualize the IPMI device (/dev/ipmi0). >From http://en.wikipedia.org/wiki/Intelligent_Platform_Management_Interface: "An implementation of IPMI version 1.5 can communicate via a direct serial connection or via a side-band local area network (LAN) connection to a remote client." Why do you need a new virtio device? Can't you use virtio-serial? This is what other management channels are using for host<->guest agents. What features and use cases does paravirtualized IPMI provide? Stefan
Re: [Qemu-devel] Why qemu write/rw speed is so low?
On Tue, Sep 13, 2011 at 3:14 PM, Stefan Hajnoczi wrote: > On Tue, Sep 13, 2011 at 10:52:44AM +0800, Zhi Yong Wu wrote: >> This is real log when fio issued with bs=128K and bps=100(block >> I/O throttling): > > I would use 1024 * 1024 instead of 100 as the throughput limit. > 10^5 is not a multiple of 512 bytes and is not a nice value in KB/s > (976.5625). OK. next time, i will adopt this. > >> >> 8,2 0 1 0.0 24332 A WS 79958528 + 256 <- >> (253,2) 71830016 > > 256 blocks = 256 * 512 bytes = 128 KB per request. We know the maximum > request size from Linux is 128 KB so this makes sense. > >> Throughput (R/W): 0KiB/s / 482KiB/s > > What throughput do you get without I/O throttling? Either I/O > throttling is limiting too aggressively here or the physical disk is the > bottleneck (I double that since the write throughput value is very low). > We need to compare against the throughput when throttling is not > enabled. Without block I/O throttling. test: (g=0): rw=write, bs=512-512/512-512, ioengine=libaio, iodepth=1 Starting 1 process Jobs: 1 (f=1): [W] [100.0% done] [0K/58K /s] [0/114 iops] [eta 00m:00s] test: (groupid=0, jobs=1): err= 0: pid=2659 write: io=51,200KB, bw=59,936B/s, iops=117, runt=874741msec slat (usec): min=25, max=44,515, avg=69.77, stdev=774.19 clat (usec): min=778, max=216K, avg=8460.67, stdev=2417.70 lat (usec): min=845, max=216K, avg=8531.11, stdev=2778.62 bw (KB/s) : min= 11, max= 60, per=100.89%, avg=58.52, stdev= 3.14 cpu : usr=0.04%, sys=0.76%, ctx=102601, majf=0, minf=49 IO depths: 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued r/w: total=0/102400, short=0/0 lat (usec): 1000=0.01% lat (msec): 2=0.01%, 4=0.01%, 10=99.17%, 20=0.24%, 50=0.53% lat (msec): 100=0.04%, 250=0.01% Run status group 0 (all jobs): WRITE: io=51,200KB, aggrb=58KB/s, minb=59KB/s, maxb=59KB/s, mint=874741msec, 
maxt=874741msec Disk stats (read/write): dm-0: ios=37/103237, merge=0/0, ticks=1935/901887, in_queue=903811, util=99.67%, aggrios=37/102904, aggrmerge=0/351, aggrticks=1935/889769, aggrin_queue=891623, aggrutil=99.64% vda: ios=37/102904, merge=0/351, ticks=1935/889769, in_queue=891623, util=99.64% test: (g=0): rw=write, bs=8K-8K/8K-8K, ioengine=libaio, iodepth=1 Starting 1 process Jobs: 1 (f=1): [W] [100.0% done] [0K/973K /s] [0/118 iops] [eta 00m:00s] test: (groupid=0, jobs=1): err= 0: pid=2716 write: io=51,200KB, bw=926KB/s, iops=115, runt= 55291msec slat (usec): min=20, max=36,133, avg=68.68, stdev=920.02 clat (msec): min=1, max=58, avg= 8.52, stdev= 1.99 lat (msec): min=1, max=66, avg= 8.58, stdev= 2.48 bw (KB/s) : min= 587, max= 972, per=100.23%, avg=928.14, stdev=54.43 cpu : usr=0.04%, sys=0.59%, ctx=6416, majf=0, minf=26 IO depths: 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued r/w: total=0/6400, short=0/0 lat (msec): 2=0.06%, 4=0.06%, 10=99.00%, 20=0.25%, 50=0.61% lat (msec): 100=0.02% Run status group 0 (all jobs): WRITE: io=51,200KB, aggrb=926KB/s, minb=948KB/s, maxb=948KB/s, mint=55291msec, maxt=55291msec Disk stats (read/write): dm-0: ios=3/6507, merge=0/0, ticks=33/68470, in_queue=68508, util=99.51%, aggrios=3/6462, aggrmerge=0/60, aggrticks=33/64291, aggrin_queue=64322, aggrutil=99.48% vda: ios=3/6462, merge=0/60, ticks=33/64291, in_queue=64322, util=99.48% test: (g=0): rw=write, bs=64K-64K/64K-64K, ioengine=libaio, iodepth=1 Starting 1 process Jobs: 1 (f=1): [W] [100.0% done] [0K/7,259K /s] [0/110 iops] [eta 00m:00s] test: (groupid=0, jobs=1): err= 0: pid=2727 write: io=51,200KB, bw=7,050KB/s, iops=110, runt= 7262msec slat (usec): min=30, max=46,393, avg=90.62, stdev=1639.10 clat (msec): min=2, max=39, avg= 8.98, stdev= 1.82 lat (msec): min=2, max=85, avg= 9.07, stdev= 3.08 bw (KB/s) : 
min= 6003, max= 7252, per=100.13%, avg=7058.86, stdev=362.31 cpu : usr=0.00%, sys=0.61%, ctx=801, majf=0, minf=23 IO depths: 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued r/w: total=0/800, short=0/0 lat (msec): 4=0.25%, 10=92.38%, 20=7.00%, 50=0.38% Run status group 0 (all jobs): WRITE: io=51,200KB, aggrb=7,050KB/s, minb=7,219KB/s, maxb=7,219KB/s, mint=7262msec, maxt=7262msec Disk stats (read/write): dm-0: ios=0/808, merge=0/0, ticks=0/8216, in_queue=8225, util=98.31%, aggrios=0/804, aggrmerge=0/18, aggrticks=0/7363, aggrin_queue=7363, aggrutil=98.19%
[Qemu-devel] qcow2: snapshot and resize possible?
Looking at the block TODOs I saw qcow2 resize with snapshots. However, I would ask whether this is technically possible with the current format. The reason is that snapshots have no size field (only l1_size; QCowHeader has a size field); however, I think that the size is part of the machine state and it is not possible to compute the exact size from l1_size. Should this resize be posted as a qcow3 update? Frediano
Re: [Qemu-devel] [PATCH v3 5/6] vga: Use linear mapping + dirty logging in chain 4 memory access mode
On 13.09.2011, at 11:00, Jan Kiszka wrote: > On 2011-09-13 10:40, Alexander Graf wrote: >> Btw, it still tries to execute invalid code even with your patch. #if 0'ing >> out the memory region updates at least get the guest booting for me. Btw, to >> get it working you also need a patch for the interrupt controller (another >> breakage thanks to memory api). >> >> diff --git a/hw/heathrow_pic.c b/hw/heathrow_pic.c >> index 51996ab..16f48d1 100644 >> --- a/hw/heathrow_pic.c >> +++ b/hw/heathrow_pic.c >> @@ -126,7 +126,7 @@ static uint64_t pic_read(void *opaque, >> target_phys_addr_t addr, >> static const MemoryRegionOps heathrow_pic_ops = { >> .read = pic_read, >> .write = pic_write, >> -.endianness = DEVICE_NATIVE_ENDIAN, >> +.endianness = DEVICE_LITTLE_ENDIAN, >> }; >> >> static void heathrow_pic_set_irq(void *opaque, int num, int level) >> > > With out without this fix, with or without active chain-4 optimization, > I just get an empty yellow screen when firing up qemu-system-ppc (also > when using the Debian ISO). Do I need to specify a specific machine type? Ugh. No, you only need this patch: [PATCH] PPC: Fix via-cuda memory registration which fixes another recently introduced regression :) Alex
Re: [Qemu-devel] [PATCH v3 5/6] vga: Use linear mapping + dirty logging in chain 4 memory access mode
Am 13.09.2011 um 11:00 schrieb Jan Kiszka: On 2011-09-13 10:40, Alexander Graf wrote: Btw, it still tries to execute invalid code even with your patch. #if 0'ing out the memory region updates at least get the guest booting for me. Btw, to get it working you also need a patch for the interrupt controller (another breakage thanks to memory api). diff --git a/hw/heathrow_pic.c b/hw/heathrow_pic.c index 51996ab..16f48d1 100644 --- a/hw/heathrow_pic.c +++ b/hw/heathrow_pic.c @@ -126,7 +126,7 @@ static uint64_t pic_read(void *opaque, target_phys_addr_t addr, static const MemoryRegionOps heathrow_pic_ops = { .read = pic_read, .write = pic_write, -.endianness = DEVICE_NATIVE_ENDIAN, +.endianness = DEVICE_LITTLE_ENDIAN, }; static void heathrow_pic_set_irq(void *opaque, int num, int level) With out without this fix, with or without active chain-4 optimization, I just get an empty yellow screen when firing up qemu-system-ppc (also when using the Debian ISO). Do I need to specify a specific machine type? No. Did you try with Alex' via-cuda patch? That's the only one I have on my branch for Linux host. Andreas
Re: [Qemu-devel] qcow2: snapshot and resize possible?
Am 13.09.2011 11:41, schrieb Frediano Ziglio: > Looking at block TODOs I saw qcow2 resize with snapshot. However I > would ask if this is technical possible with current format. The > reason is that snapshots have no size (only l1_size, QCowHeader have a > size field) however I think that size if part of machine state and is > not possible to compute exact size from l1_size. > Should this resize be posted to qcow3 update? Yes, I think it would require a format change. Maybe we should add a field for the image size to the qcow2 v3 proposal. Kevin
Re: [Qemu-devel] About hotplug multifunction
On Tue, Sep 13, 2011 at 09:52:49AM +0300, Gleb Natapov wrote: > On Tue, Sep 13, 2011 at 02:57:20PM +0900, Isaku Yamahata wrote: > > On Sun, Sep 11, 2011 at 12:05:17PM +0300, Michael S. Tsirkin wrote: > > > On Sat, Sep 10, 2011 at 02:43:11AM +0900, Isaku Yamahata wrote: > > > > pci/pcie hot plug needs clean up for multifunction hotplug in long term. > > > > Only single function device case works. Multifunction case is broken > > > > somwehat. > > > > Especially the current acpi based hotplug should be replaced by > > > > the standardized hot plug controller in long term. > > > > > > We'll need to keep supporting windows XP, which IIUC only > > > supports hotplug through ACPI. So it looks like we'll > > > need both. > > > > Yes, we'll need both then. > > It would be possible to implement acpi-based hotplug with > > standardized hotplug controller. Not with qemu-specific controller. > > > Where is this "standardized hotplug controller" documented? In the pci bridge spec. > > It would require a bit amount of work to write ACPI code in DSDT that > > handles standardized hotplug controller. > > So I'm not sure it's worth while only for windows XP support. > > -- > > yamahata > > -- > Gleb.
Re: [Qemu-devel] Using the qemu tracepoints with SystemTap
On Mon, Sep 12, 2011 at 4:33 PM, William Cohen wrote: > The RHEL-6 version of qemu-kvm makes the tracepoints available to SystemTap. > I have been working on useful examples for the SystemTap tracepoints in qemu. > There doesn't seem to be a great number of examples showing the utility of > the tracepoints in diagnosing problems. However, I came across the following > blog entry that had several examples: > > http://blog.vmsplice.net/2011/03/how-to-write-trace-analysis-scripts-for.html > > I reimplemented the VirtqueueRequestTracker example from the blog in > SystemTap (the attached virtqueueleaks.stp). I can run it on RHEL-6's > qemu-kvm-0.12.1.2-2.160.el6_1.8.x86_64 and get output like the following. It > outputs the pid and the address of the elem that leaked when the script is > stopped like the following: > > $ stap virtqueueleaks.stp > ^C > pid elem > 19503 1c4af28 > 19503 1c56f88 > 19503 1c62fe8 > 19503 1c6f048 > 19503 1c7b0a8 > 19503 1c87108 > 19503 1c93168 > ... > > I am not that familiar with the internals of qemu. The script seems to > indicate qemu is leaking, but is that really the case? If there are > resource leaks, what output would help debug those leaks? What enhancements > can be done to this script to provide more useful information? Leak tracing always has corner cases :). With virtio-blk this would indicate a leak because it uses a request-response model where the guest initiates I/O and the host responds. A guest that cleanly shuts down before you exit your SystemTap script should not leak requests for virtio-blk. With virtio-net the guest actually hands the host receive buffers and they are held until we can receive packets into them and return them to the host. We don't have a virtio_reset trace event, and due to this we're not accounting for clean shutdown (the guest driver resets the device to clear all virtqueues). I am submitting a patch to add virtio_reset() tracing. 
This will allow the script to delete all elements belonging to this virtio device. > Are there other examples of qemu probing people would like to see? The malloc/realloc/memalign/vmalloc/free/vfree trace events can be used for a few things: * Finding memory leaks. * Finding malloc/vfree or vmalloc/free mismatches. The rules are: malloc/realloc need free, memalign/vmalloc need vfree. They cannot be mixed. Stefan
Re: [Qemu-devel] About hotplug multifunction
On Tue, Sep 13, 2011 at 09:52:49AM +0300, Gleb Natapov wrote: > On Tue, Sep 13, 2011 at 02:57:20PM +0900, Isaku Yamahata wrote: > > On Sun, Sep 11, 2011 at 12:05:17PM +0300, Michael S. Tsirkin wrote: > > > On Sat, Sep 10, 2011 at 02:43:11AM +0900, Isaku Yamahata wrote: > > > > pci/pcie hot plug needs clean up for multifunction hotplug in long term. > > > > Only single function device case works. Multifunction case is broken > > > > somwehat. > > > > Especially the current acpi based hotplug should be replaced by > > > > the standardized hot plug controller in long term. > > > > > > We'll need to keep supporting windows XP, which IIUC only > > > supports hotplug through ACPI. So it looks like we'll > > > need both. > > > > Yes, we'll need both then. > > It would be possible to implement acpi-based hotplug with > > standardized hotplug controller. Not with qemu-specific controller. > > > Where is this "standardized hotplug controller" documented? Sorry both pci bridge and hotplug spec only reference shpc. The spec itself is PCI Standard Hot-Plug Controller and Subsystem Specification. Revision 1.0 - get it from pcisig > > It would require a bit amount of work to write ACPI code in DSDT that > > handles standardized hotplug controller. > > So I'm not sure it's worth while only for windows XP support. > > -- > > yamahata > > -- > Gleb.
Re: [Qemu-devel] About hotplug multifunction
On Tue, Sep 13, 2011 at 01:05:00PM +0300, Michael S. Tsirkin wrote: > On Tue, Sep 13, 2011 at 09:52:49AM +0300, Gleb Natapov wrote: > > On Tue, Sep 13, 2011 at 02:57:20PM +0900, Isaku Yamahata wrote: > > > On Sun, Sep 11, 2011 at 12:05:17PM +0300, Michael S. Tsirkin wrote: > > > > On Sat, Sep 10, 2011 at 02:43:11AM +0900, Isaku Yamahata wrote: > > > > > pci/pcie hot plug needs clean up for multifunction hotplug in long > > > > > term. > > > > > Only single function device case works. Multifunction case is broken > > > > > somwehat. > > > > > Especially the current acpi based hotplug should be replaced by > > > > > the standardized hot plug controller in long term. > > > > > > > > We'll need to keep supporting windows XP, which IIUC only > > > > supports hotplug through ACPI. So it looks like we'll > > > > need both. > > > > > > Yes, we'll need both then. > > > It would be possible to implement acpi-based hotplug with > > > standardized hotplug controller. Not with qemu-specific controller. > > > > > Where is this "standardized hotplug controller" documented? > > Sorry both pci bridge and hotplug spec only reference shpc. > The spec itself is PCI Standard Hot-Plug > Controller and Subsystem Specification. > > Revision 1.0 - get it from pcisig > Thanks, Isaku has already pointed it to me. -- Gleb.
Re: [Qemu-devel] Why qemu write/rw speed is so low?
On Tue, Sep 13, 2011 at 10:25 AM, Zhi Yong Wu wrote: > On Tue, Sep 13, 2011 at 3:14 PM, Stefan Hajnoczi > wrote: >> On Tue, Sep 13, 2011 at 10:52:44AM +0800, Zhi Yong Wu wrote: >>> This is the real log when fio is issued with bs=128K and bps=100 (block >>> I/O throttling): >> >> I would use 1024 * 1024 instead of 100 as the throughput limit. >> 10^5 is not a multiple of 512 bytes and is not a nice value in KB/s >> (976.5625). > OK. Next time I will adopt this. >> >>> >>> 8,2 0 1 0.0 24332 A WS 79958528 + 256 <- >>> (253,2) 71830016 >> >> 256 blocks = 256 * 512 bytes = 128 KB per request. We know the maximum >> request size from Linux is 128 KB so this makes sense. >> >>> Throughput (R/W): 0KiB/s / 482KiB/s >> >> What throughput do you get without I/O throttling? Either I/O >> throttling is limiting too aggressively here or the physical disk is the >> bottleneck (I doubt that since the write throughput value is very low). >> We need to compare against the throughput when throttling is not >> enabled. > Without block I/O throttling. [...] > test: (g=0): rw=write, bs=128K-128K/128K-128K, ioengine=libaio, iodepth=1 > Starting 1 process > Jobs: 1 (f=1): [W] [100.0% done] [0K/13M /s] [0/103 iops] [eta 00m:00s] > test: (groupid=0, jobs=1): err= 0: pid=2734 > write: io=51,200KB, bw=12,933KB/s, iops=101, runt= 3959msec This shows that the physical disk is capable of far exceeding 1 MB/s when I/O is not limited. So the earlier result, where the guest only gets 482 KiB/s under the 100 bps limit, shows that the I/O limits are being too aggressive. For some reason the algorithm is causing the guest to get lower throughput than expected. It would be interesting to try with bps=$((10 * 1024 * 1024)). I wonder if the algorithm has a constant overhead of a couple hundred KB/s or if it changes with the much larger bps value. Stefan
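The arithmetic behind the 1024 * 1024 suggestion can be checked directly. This is a standalone sketch, not QEMU code; bps_to_kib and sector_aligned are illustrative helper names (the 976.5625 KB/s figure quoted above corresponds to a limit of 10^6 bytes/s):

```c
#include <stdbool.h>

/* Sanity-check the throttle-limit arithmetic from the thread: a
 * power-of-two limit divides evenly into 512-byte sectors and gives a
 * round KiB/s figure, while a decimal limit of 10^6 bytes/s gives the
 * odd 976.5625 KB/s mentioned above and is not sector-aligned. */
static double bps_to_kib(long bps)
{
    return bps / 1024.0;
}

static bool sector_aligned(long bps)
{
    return bps % 512 == 0; /* 512-byte sectors, as in the 256-block request math */
}
```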
Re: [Qemu-devel] Why qemu write/rw speed is so low?
On Tue, Sep 13, 2011 at 6:14 PM, Stefan Hajnoczi wrote: > On Tue, Sep 13, 2011 at 10:25 AM, Zhi Yong Wu wrote: >> On Tue, Sep 13, 2011 at 3:14 PM, Stefan Hajnoczi >> wrote: >>> On Tue, Sep 13, 2011 at 10:52:44AM +0800, Zhi Yong Wu wrote: This is the real log when fio is issued with bs=128K and bps=100 (block I/O throttling): >>> >>> I would use 1024 * 1024 instead of 100 as the throughput limit. >>> 10^5 is not a multiple of 512 bytes and is not a nice value in KB/s >>> (976.5625). >> OK. Next time I will adopt this. >>> 8,2 0 1 0.0 24332 A WS 79958528 + 256 <- (253,2) 71830016 >>> >>> 256 blocks = 256 * 512 bytes = 128 KB per request. We know the maximum >>> request size from Linux is 128 KB so this makes sense. >>> Throughput (R/W): 0KiB/s / 482KiB/s >>> >>> What throughput do you get without I/O throttling? Either I/O >>> throttling is limiting too aggressively here or the physical disk is the >>> bottleneck (I doubt that since the write throughput value is very low). >>> We need to compare against the throughput when throttling is not >>> enabled. >> Without block I/O throttling. > [...] >> test: (g=0): rw=write, bs=128K-128K/128K-128K, ioengine=libaio, iodepth=1 >> Starting 1 process >> Jobs: 1 (f=1): [W] [100.0% done] [0K/13M /s] [0/103 iops] [eta 00m:00s] >> test: (groupid=0, jobs=1): err= 0: pid=2734 >> write: io=51,200KB, bw=12,933KB/s, iops=101, runt= 3959msec > > This shows that the physical disk is capable of far exceeding 1 MB/s > when I/O is not limited. So the earlier result, where the guest only > gets 482 KiB/s under the 100 bps limit, shows that I/O limits are being > too aggressive. For some reason the algorithm is causing the guest to > get lower throughput than expected. > > It would be interesting to try with bps=$((10 * 1024 * 1024)). I > wonder if the algorithm has a constant overhead of a couple hundred > KB/s or if it changes with the much larger bps value. OK, I will try it tomorrow.
By the way, I/O throttling can now be enabled from a libvirt guest's XML file when the guest is started. I have pushed the code changes to my git tree. git branch: ssh://wu...@repo.or.cz/srv/git/libvirt/zwu.git dev > > Stefan > -- Regards, Zhi Yong Wu
Re: [Qemu-devel] [PATCH][RFC][0/2] REF+/REF- optimization
On 13.09.2011 09:53, Frediano Ziglio wrote: > These patches try to trade off between leaks and speed for cluster > refcounts. > > Refcount increments (REF+ or refp) are handled in a different way from > decrements (REF- or refm). The reason is that posting or not flushing > a REF- causes "just" a leak, while posting a REF+ causes a corruption. > > To optimize REF- I just used an array to store offsets; then when a > flush is requested or the array reaches a limit (currently 1022) the array > is sorted and written to disk. I use an array of offsets instead of > ranges to support compression (an offset could appear multiple times > in the array). > I consider this patch quite ready. Ok, first of all let's clarify what this optimises. I don't think it changes anything at all for the writeback cache modes, because these already do most operations in memory only. So this must be about optimising some operations with cache=writethrough. REF- isn't about normal cluster allocation, it is about COW with internal snapshots or bdrv_discard. Do you have benchmarks for any of them? I strongly disagree with your approach for REF-. We already have a cache, and introducing a second one sounds like a bad idea. I think we could get a very similar effect if we introduced a qcow2_cache_entry_mark_dirty_wb() that marks a given refcount block as dirty, but at the same time tells the cache that even in write-through mode it can still treat this block as write-back. This should require far fewer code changes. But let's measure the effects first; I suspect that for cluster allocation it doesn't help much because every REF- comes with a REF+. > To optimize REF+ I mark a range as allocated and use this range to > get new ones (avoiding writing refcount to disk). When a flush is > requested or in some situations (like snapshot) this cache is disabled > and flushed (written as REF-).
> I do not consider this patch ready; it works and passes all io-tests, > but for instance I would avoid allocating new clusters for refcounts > during preallocation. The only question here is if improving cache=writethrough cluster allocation performance is worth the additional complexity in the already complex refcounting code. The alternative that was discussed before is the dirty bit approach that is used in QED and would allow us to use writeback for all refcount blocks, regardless of REF- or REF+. It would be an easier approach requiring fewer code changes, but it comes with the cost of requiring an fsck after a qemu crash. > End speed-up is quite visible when allocating clusters (more than 20%). What benchmark do you use for testing this? Kevin
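The REF- scheme described in the quoted mail - buffer freed-cluster offsets, then sort and apply them when a flush is requested or the buffer fills - can be sketched roughly as follows. This is illustrative only; RefmCache, refm_post and refm_flush are made-up names, not the actual patch code:

```c
#include <stdint.h>
#include <stdlib.h>

/* Rough model of the deferred REF- bookkeeping: offsets are buffered
 * and only sorted (and, in real code, written out to the refcount
 * table) when the buffer fills or an explicit flush is requested.
 * The 1022-entry limit matches the figure quoted above. */
#define REFM_MAX 1022

typedef struct {
    uint64_t offsets[REFM_MAX];
    int count;
} RefmCache;

static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Sort and drain the buffer; sorting groups offsets that share a
 * refcount block, so each block would be loaded and written back only
 * once. Returns the number of entries flushed. */
static int refm_flush(RefmCache *c)
{
    int flushed = c->count;
    qsort(c->offsets, c->count, sizeof(c->offsets[0]), cmp_u64);
    /* real code would walk c->offsets here and update refcounts */
    c->count = 0;
    return flushed;
}

static void refm_post(RefmCache *c, uint64_t offset)
{
    if (c->count == REFM_MAX) {
        refm_flush(c);
    }
    /* duplicates are allowed: with compression the same offset can be
     * decremented more than once */
    c->offsets[c->count++] = offset;
}
```

Losing the buffer before a flush would only leak clusters, never corrupt, which is why the mail treats REF- as safe to defer while REF+ is not.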
[Qemu-devel] Help needed to modify VVFAT
Hi, This is regarding vvfat. I want to make some modifications to vvfat, and I need some help. Currently vvfat scans all directories and sub-directories at the beginning. I want to modify vvfat so that it scans only the TOP directory content and not the sub-directory content. How can I do that? Please let me know. Thanks, Pintu
Re: [Qemu-devel] [PATCH v3 5/6] vga: Use linear mapping + dirty logging in chain 4 memory access mode
On 2011-09-13 11:42, Alexander Graf wrote: > > On 13.09.2011, at 11:00, Jan Kiszka wrote: > >> On 2011-09-13 10:40, Alexander Graf wrote: >>> Btw, it still tries to execute invalid code even with your patch. #if 0'ing >>> out the memory region updates at least gets the guest booting for me. Btw, >>> to get it working you also need a patch for the interrupt controller >>> (another breakage thanks to the memory api). >>> >>> diff --git a/hw/heathrow_pic.c b/hw/heathrow_pic.c >>> index 51996ab..16f48d1 100644 >>> --- a/hw/heathrow_pic.c >>> +++ b/hw/heathrow_pic.c >>> @@ -126,7 +126,7 @@ static uint64_t pic_read(void *opaque, >>> target_phys_addr_t addr, >>> static const MemoryRegionOps heathrow_pic_ops = { >>> .read = pic_read, >>> .write = pic_write, >>> -.endianness = DEVICE_NATIVE_ENDIAN, >>> +.endianness = DEVICE_LITTLE_ENDIAN, >>> }; >>> >>> static void heathrow_pic_set_irq(void *opaque, int num, int level) >>> >> >> With or without this fix, with or without the active chain-4 optimization, >> I just get an empty yellow screen when firing up qemu-system-ppc (also >> when using the Debian ISO). Do I need to specify a specific machine type? > > Ugh. No, you only need this patch: > > [PATCH] PPC: Fix via-cuda memory registration > > which fixes another recently introduced regression :) That works now - and allowed me to identify the bug after enhancing info mtree a bit: (qemu) info mtree memory addr prio 0 size 7fff system addr 8088 prio 1 size 8 macio addr 808e prio 0 size 2 macio-nvram addr 808a prio 0 size 1000 pmac-ide addr 80896000 prio 0 size 2000 cuda addr 80893000 prio 0 size 40 escc-bar addr 80888000 prio 0 size 1000 dbdma addr 8088 prio 0 size 1000 heathrow-pic addr 8000 prio 1 size 80 vga.vram addr 800a prio 1 size 2 vga-lowmem ... Here is the problem: Both the vram and the ISA range get mapped into system address space, but the former eclipses the latter as it shows up earlier in the list and has the same priority.
This picture changes with the chain-4 alias which has prio 2, thus maps over the vram. It looks to me like the ISA address space is either misplaced at 0x8000 or is not supposed to be mapped at all on PPC. Comments? Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux
Re: [Qemu-devel] Question on kvm_clock working ...
Thanks Philipp. My current source is "kvm-clock". When I start my guest, it is in sync with the host clock. Then, I change the time on my host - using "date --set ...". I don't see the guest update its time. I was expecting that the guest would detect the host time change and update its own time. So, when the host is exporting its system time and TSC values, does it go into the "emulated RTC" of the guest, which the guest checks only once? Or does the guest resync its clock with the host's value periodically? I can try to do: "hwclock --hctosys --utc" --- this is just to check. (I have kvm-clock as my clock source though). Thanks -a On Tue, Sep 13, 2011 at 2:49 AM, Philipp Hahn wrote: > Hello Al, > > I just debugged a kvmclock bug, so I claim to have some knowledge in this > area > now, but please take my answer with a grain of salt. > > On Monday 12 September 2011 15:21:25 al pat wrote: > > Still seeking your guidance on this. Appreciate any pointers you may > have. > > You have to distinguish between the real-time clock (RTC), which in hardware > is > a battery-powered clock running even when your PC is powered off. Since > it's > slow to access, most Linux distributions read out its value once during > boot > using "hwclock --hctosys --utc" and then don't care about that clock any > more > until shutdown, when they write back the system time to the RTC > using "... --systohc ...". > During runtime, other methods are used for timekeeping: either by counting > regular interrupts, using the ACPI-PM clock, or the High Precision Event > Timer (HPET), or the Time Stamp Counter (TSC) register, or ...; > see /sys/devices/system/clocksource/clocksource0/available_clocksource for > a > list of available clock sources. > > For virtual machines there is an additional clock source named "kvmclock", > which uses the host clock and the TSC: The host exports its current system > time (plus some configurable offset) and a snapshot value of the TSC register > when doing so.
Then the guest can interpolate the current time by using > exported_system_time + scale * (current_TSC_value - snapshot_TSC_value). This > kvmclock doesn't have anything to do with the RTC clock as far as I know. > > Now to your problem: You should check the value > of /sys/devices/system/clocksource/clocksource0/current_clocksource in your > guest. If it is something other than kvmclock, you should check whether > using "hwclock --hctosys --utc" re-synchronizes your guest clock to the > host. > > Sincerely > Philipp > -- > Philipp Hahn Open Source Software Engineer > h...@univention.de > Univention GmbH Linux for Your Business fon: +49 421 22 232-0 > Mary-Somerville-Str. 1 D-28359 Bremen fax: +49 421 22 232-99 > > http://www.univention.de/ > > > Meet Univention at IT&Business from 20 to 22 September 2011 at the > joint stand of the Open Source Business Alliance in Stuttgart, > Hall 3, Stand 3D27-7. >
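Philipp's interpolation formula can be written out as a small sketch. The field names are illustrative, not the real kvmclock/pvclock ABI:

```c
#include <stdint.h>

/* Guest-side time interpolation as described above: the host exports a
 * system-time snapshot plus the TSC value taken at the same instant,
 * and the guest extrapolates from the current TSC reading. */
typedef struct {
    uint64_t system_time_ns; /* host system time at snapshot */
    uint64_t tsc_snapshot;   /* TSC value at snapshot */
    double ns_per_tick;      /* scale factor: nanoseconds per TSC tick */
} KvmClockSnapshot;

static uint64_t kvmclock_now_ns(const KvmClockSnapshot *s, uint64_t tsc_now)
{
    /* exported_system_time + scale * (current_TSC - snapshot_TSC) */
    return s->system_time_ns +
           (uint64_t)((double)(tsc_now - s->tsc_snapshot) * s->ns_per_tick);
}
```

Note that the emulated RTC plays no role here; only the exported snapshot and the TSC do, which matches Philipp's point that kvmclock has nothing to do with the RTC.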
Re: [Qemu-devel] [PATCH V8 03/14] Add persistent state handling to TPM TIS frontend driver
On Monday, September 12, 2011 07:37:25 PM Stefan Berger wrote: > On 09/12/2011 05:16 PM, Paul Moore wrote: > > On Sunday, September 11, 2011 12:45:05 PM Stefan Berger wrote: > >> On 09/09/2011 05:13 PM, Paul Moore wrote: > >>> On Wednesday, August 31, 2011 10:35:54 AM Stefan Berger wrote: > Index: qemu-git/hw/tpm_tis.c > == > = > --- qemu-git.orig/hw/tpm_tis.c > +++ qemu-git/hw/tpm_tis.c > @@ -6,6 +6,8 @@ > > * Author: Stefan Berger > * David Safford > * > > + * Xen 4 support: Andrease > Niederl > + * > > * This program is free software; you can redistribute it > and/or > * modify it under the terms of the GNU General Public > License as > * published by the Free Software Foundation, version 2 of > the > > @@ -839,3 +841,167 @@ static int tis_init(ISADevice *dev) > > err_exit: > return -1; > > } > > + > +/* persistent state handling */ > + > +static void tis_pre_save(void *opaque) > +{ > +TPMState *s = opaque; > +uint8_t locty = s->active_locty; > >>> > >>> Is it safe to read s->active_locty without the state_lock? I'm not > >>> sure at this point but I saw it being protected by the lock > >>> elsewhere ...>> > >> It cannot change anymore since no vCPU is in the TPM TIS emulation > >> layer > >> anymore but all we're doing is wait for the last outstanding command > >> to > >> be returned to use from the TPM thread. > >> I don't mind putting this reading into the critical section, though, > >> just to have it be consistent. > > > > [Dropping the rest of the comments since they all cover the same issue] > > > > I'm a big fan of consistency, especially when it comes to locking; > > inconsistent lock usage can lead to confusion and that is almost never > > good. > > > > If we need a lock here because there is the potential for an outstanding > > TPM command, then I vote for locking in this function just as you would > > in any other. 
However, if we really don't need locks here because the > > outstanding TPM command will have _no_ effect on the TPMState or any > > related structure, then just do away with the lock completely and make > > a note of it in the function explaining why. > Let's go with the consistency argument and extend the locking > to those parts that usually also get locked. Great, thanks. -- paul moore virtualization @ redhat
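The consistency convention agreed on above can be illustrated with a minimal pthread-based sketch. The names (TPMStateSketch, state_lock) are illustrative, not the actual tpm_tis.c code:

```c
#include <pthread.h>
#include <stdint.h>

/* Minimal model of the locking convention from the thread: every read
 * and write of shared TPM state goes through state_lock, even accesses
 * that are arguably safe without it, so the locking stays uniform and
 * easy to reason about. */
typedef struct {
    pthread_mutex_t state_lock;
    uint8_t active_locty;
} TPMStateSketch;

static uint8_t read_active_locty(TPMStateSketch *s)
{
    pthread_mutex_lock(&s->state_lock);
    uint8_t locty = s->active_locty;
    pthread_mutex_unlock(&s->state_lock);
    return locty;
}
```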
Re: [Qemu-devel] About hotplug multifunction
Hi all, After reading the pci driver code, I found a problem. There is a list for each slot (slot->funcs); it is initialized in acpiphp_glue.c:register_slot() before hotplugging a device, and only one entry (func 0) is added to it; no new entry is added to the list when hotplugging devices to the slot. So when we release the device, there is only _one_ entry in the list (slot->funcs).

acpiphp_glue.c:disable_device()
    list_for_each_entry(func, &slot->funcs, sibling) {
        pdev = pci_get_slot(slot->bridge->pci_bus,
                            PCI_DEVFN(slot->device, func->function));
        ...release code...
        // this code can only be executed one time (func 0)
        pci_remove_bus_device(pdev);
    }

bus.c:pci_bus_add_device() is called for each func device in acpiphp_glue.c:enable_device(). bus.c:pci_remove_bus_device(pdev) is only called for func 0 in acpiphp_glue.c:disable_device().

Resolution (I've tested it successfully): enumerate all the funcs when disabling the device.

    list_for_each_entry(func, &slot->funcs, sibling) {
        for (i = 0; i < 8; i++) {
            pdev = pci_get_slot(slot->bridge->pci_bus,
                                PCI_DEVFN(slot->device, i));
            ...release code...
            pci_remove_bus_device(pdev);
        }
    }

Thanks, Amos
Re: [Qemu-devel] [PATCH 0/2] versatile: cleanups to remove use of sysbus_mmio_init_cb2
On 09/12/2011 06:01 PM, Peter Maydell wrote: Ping? Sorry; applied to memory/queue. On 1 September 2011 18:36, Peter Maydell wrote: > A couple of patches which do some cleanup work to versatile > devices following the recent MemoryRegion conversion. These > both remove uses of sysbus_mmio_init_cb2(), which strikes me > as kind of ugly and worth avoiding. (After these two patches > it will be used by only sh_pci.c and ppce500_pci.c...) > > Peter Maydell (2): >hw/arm11mpcore: Clean up to avoid using sysbus_mmio_init_cb2 >hw/versatile_pci: Expose multiple sysbus mmio regions > >hw/arm11mpcore.c | 13 + >hw/realview.c | 12 ++-- >hw/versatile_pci.c | 42 -- >hw/versatilepb.c | 12 ++-- >4 files changed, 29 insertions(+), 50 deletions(-) -- error compiling committee.c: too many arguments to function
[Qemu-devel] [PATCH 3/4] trace: remove trailing double quotes after PRI*64
Now that format strings can end in a PRI*64 macro, remove the workarounds from the trace-events file. Signed-off-by: Stefan Hajnoczi --- trace-events | 34 +- 1 files changed, 17 insertions(+), 17 deletions(-) diff --git a/trace-events b/trace-events index cfcdc9b..9a59525 100644 --- a/trace-events +++ b/trace-events @@ -88,8 +88,8 @@ balloon_event(void *opaque, unsigned long addr) "opaque %p addr %lu" # hw/apic.c apic_local_deliver(int vector, uint32_t lvt) "vector %d delivery mode %d" apic_deliver_irq(uint8_t dest, uint8_t dest_mode, uint8_t delivery_mode, uint8_t vector_num, uint8_t trigger_mode) "dest %d dest_mode %d delivery_mode %d vector %d trigger_mode %d" -cpu_set_apic_base(uint64_t val) "%016"PRIx64"" -cpu_get_apic_base(uint64_t val) "%016"PRIx64"" +cpu_set_apic_base(uint64_t val) "%016"PRIx64 +cpu_get_apic_base(uint64_t val) "%016"PRIx64 apic_mem_readl(uint64_t addr, uint32_t val) "%"PRIx64" = %08x" apic_mem_writel(uint64_t addr, uint32_t val) "%"PRIx64" = %08x" # coalescing @@ -169,21 +169,21 @@ slavio_led_mem_readw(uint32_t ret) "Read diagnostic LED %04x" # hw/slavio_timer.c slavio_timer_get_out(uint64_t limit, uint32_t counthigh, uint32_t count) "limit %"PRIx64" count %x%08x" slavio_timer_irq(uint32_t counthigh, uint32_t count) "callback: count %x%08x" -slavio_timer_mem_readl_invalid(uint64_t addr) "invalid read address %"PRIx64"" +slavio_timer_mem_readl_invalid(uint64_t addr) "invalid read address %"PRIx64 slavio_timer_mem_readl(uint64_t addr, uint32_t ret) "read %"PRIx64" = %08x" slavio_timer_mem_writel(uint64_t addr, uint32_t val) "write %"PRIx64" = %08x" -slavio_timer_mem_writel_limit(unsigned int timer_index, uint64_t count) "processor %d user timer set to %016"PRIx64"" +slavio_timer_mem_writel_limit(unsigned int timer_index, uint64_t count) "processor %d user timer set to %016"PRIx64 slavio_timer_mem_writel_counter_invalid(void) "not user timer" slavio_timer_mem_writel_status_start(unsigned int timer_index) "processor %d user timer started" 
slavio_timer_mem_writel_status_stop(unsigned int timer_index) "processor %d user timer stopped" slavio_timer_mem_writel_mode_user(unsigned int timer_index) "processor %d changed from counter to user timer" slavio_timer_mem_writel_mode_counter(unsigned int timer_index) "processor %d changed from user timer to counter" slavio_timer_mem_writel_mode_invalid(void) "not system timer" -slavio_timer_mem_writel_invalid(uint64_t addr) "invalid write address %"PRIx64"" +slavio_timer_mem_writel_invalid(uint64_t addr) "invalid write address %"PRIx64 # hw/sparc32_dma.c -ledma_memory_read(uint64_t addr) "DMA read addr 0x%"PRIx64"" -ledma_memory_write(uint64_t addr) "DMA write addr 0x%"PRIx64"" +ledma_memory_read(uint64_t addr) "DMA read addr 0x%"PRIx64 +ledma_memory_write(uint64_t addr) "DMA write addr 0x%"PRIx64 sparc32_dma_set_irq_raise(void) "Raise IRQ" sparc32_dma_set_irq_lower(void) "Lower IRQ" espdma_memory_read(uint32_t addr) "DMA read addr 0x%08x" @@ -202,12 +202,12 @@ sun4m_cpu_set_irq_lower(int level) "Lower CPU IRQ %d" # hw/sun4m_iommu.c sun4m_iommu_mem_readl(uint64_t addr, uint32_t ret) "read reg[%"PRIx64"] = %x" sun4m_iommu_mem_writel(uint64_t addr, uint32_t val) "write reg[%"PRIx64"] = %x" -sun4m_iommu_mem_writel_ctrl(uint64_t iostart) "iostart = %"PRIx64"" +sun4m_iommu_mem_writel_ctrl(uint64_t iostart) "iostart = %"PRIx64 sun4m_iommu_mem_writel_tlbflush(uint32_t val) "tlb flush %x" sun4m_iommu_mem_writel_pgflush(uint32_t val) "page flush %x" sun4m_iommu_page_get_flags(uint64_t pa, uint64_t iopte, uint32_t ret) "get flags addr %"PRIx64" => pte %"PRIx64", *pte = %x" sun4m_iommu_translate_pa(uint64_t addr, uint64_t pa, uint32_t iopte) "xlate dva %"PRIx64" => pa %"PRIx64" iopte = %x" -sun4m_iommu_bad_addr(uint64_t addr) "bad addr %"PRIx64"" +sun4m_iommu_bad_addr(uint64_t addr) "bad addr %"PRIx64 # hw/usb-bus.c usb_port_claim(int bus, const char *port) "bus %d, port %s" @@ -278,7 +278,7 @@ scsi_req_data(int target, int lun, int tag, int len) "target %d lun %d tag %d le 
scsi_req_dequeue(int target, int lun, int tag) "target %d lun %d tag %d" scsi_req_continue(int target, int lun, int tag) "target %d lun %d tag %d" scsi_req_parsed(int target, int lun, int tag, int cmd, int mode, int xfer) "target %d lun %d tag %d command %d dir %d length %d" -scsi_req_parsed_lba(int target, int lun, int tag, int cmd, uint64_t lba) "target %d lun %d tag %d command %d lba %"PRIu64"" +scsi_req_parsed_lba(int target, int lun, int tag, int cmd, uint64_t lba) "target %d lun %d tag %d command %d lba %"PRIu64 scsi_req_parse_bad(int target, int lun, int tag, int cmd) "target %d lun %d tag %d command %d" scsi_req_build_sense(int target, int lun, int tag, int key, int asc, int ascq) "target %d lun %d tag %d key %#02x asc %#02x ascq %#02x" scsi_report_luns(int target, int lun, int tag) "target %d lun %d tag %d" @@ -306,11 +306,11 @@ qed_start_need_check_timer(void *s) "s %p" qed_cancel_n
[Qemu-devel] [PATCH 2/3] wavcapture: port to FILE*
QEMUFile * is only intended for Migration. Using it for anything else just adds pain and a layer of buffers for no good reason. Signed-off-by: Juan Quintela --- audio/wavcapture.c | 38 +- 1 files changed, 25 insertions(+), 13 deletions(-) diff --git a/audio/wavcapture.c b/audio/wavcapture.c index c64f0ef..ecdb9ec 100644 --- a/audio/wavcapture.c +++ b/audio/wavcapture.c @@ -3,7 +3,7 @@ #include "audio.h" typedef struct { -QEMUFile *f; +FILE *f; int bytes; char *path; int freq; @@ -40,12 +40,16 @@ static void wav_destroy (void *opaque) le_store (rlen, rifflen, 4); le_store (dlen, datalen, 4); -qemu_fseek (wav->f, 4, SEEK_SET); -qemu_put_buffer (wav->f, rlen, 4); +fseek (wav->f, 4, SEEK_SET); +if (fwrite (rlen, 1, 4, wav->f) != 4) { +printf("wav_destroy: short write\n"); +} -qemu_fseek (wav->f, 32, SEEK_CUR); -qemu_put_buffer (wav->f, dlen, 4); -qemu_fclose (wav->f); +fseek (wav->f, 32, SEEK_CUR); +if (fwrite (dlen, 1, 4, wav->f) != 4) { +printf("wav_destroy: short write\n"); +} +fclose (wav->f); } g_free (wav->path); @@ -55,7 +59,9 @@ static void wav_capture (void *opaque, void *buf, int size) { WAVState *wav = opaque; -qemu_put_buffer (wav->f, buf, size); +if (fwrite (buf, size, 1, wav->f) != size) { +printf("wav_capture: short write\n"); +} wav->bytes += size; } @@ -130,7 +136,7 @@ int wav_start_capture (CaptureState *s, const char *path, int freq, le_store (hdr + 28, freq << shift, 4); le_store (hdr + 32, 1 << shift, 2); -wav->f = qemu_fopen (path, "wb"); +wav->f = fopen (path, "wb"); if (!wav->f) { monitor_printf(mon, "Failed to open wave file `%s'\nReason: %s\n", path, strerror (errno)); @@ -143,19 +149,25 @@ int wav_start_capture (CaptureState *s, const char *path, int freq, wav->nchannels = nchannels; wav->freq = freq; -qemu_put_buffer (wav->f, hdr, sizeof (hdr)); +if (fwrite (hdr, sizeof (hdr), 1, wav->f) != sizeof (hdr)) { +monitor_printf(mon, "wav_start_capture: short write\n"); +goto error_free; +} cap = AUD_add_capture (&as, &ops, wav); if (!cap) { 
monitor_printf(mon, "Failed to add audio capture\n"); -g_free (wav->path); -qemu_fclose (wav->f); -g_free (wav); -return -1; +goto error_free; } wav->cap = cap; s->opaque = wav; s->ops = wav_capture_ops; return 0; + +error_free: +g_free (wav->path); +fclose (wav->f); +g_free (wav); +return -1; } -- 1.7.6
[Qemu-devel] [PATCH 1/3] wavaudio: port to FILE*
QEMUFile * is only intended for Migration. Using it for anything else just adds pain and a layer of buffers for no good reason. Signed-off-by: Juan Quintela --- audio/wavaudio.c | 28 +++- 1 files changed, 19 insertions(+), 9 deletions(-) diff --git a/audio/wavaudio.c b/audio/wavaudio.c index aed1817..837b86d 100644 --- a/audio/wavaudio.c +++ b/audio/wavaudio.c @@ -30,7 +30,7 @@ typedef struct WAVVoiceOut { HWVoiceOut hw; -QEMUFile *f; +FILE *f; int64_t old_ticks; void *pcm_buf; int total_samples; @@ -76,7 +76,10 @@ static int wav_run_out (HWVoiceOut *hw, int live) dst = advance (wav->pcm_buf, rpos << hw->info.shift); hw->clip (dst, src, convert_samples); -qemu_put_buffer (wav->f, dst, convert_samples << hw->info.shift); +if (fwrite (dst, convert_samples << hw->info.shift, 1, wav->f) != +convert_samples << hw->info.shift) { +printf("wav_run_out: short write\n"); +} rpos = (rpos + convert_samples) % hw->samples; samples -= convert_samples; @@ -152,7 +155,7 @@ static int wav_init_out (HWVoiceOut *hw, struct audsettings *as) le_store (hdr + 28, hw->info.freq << (bits16 + stereo), 4); le_store (hdr + 32, 1 << (bits16 + stereo), 2); -wav->f = qemu_fopen (conf.wav_path, "wb"); +wav->f = fopen (conf.wav_path, "wb"); if (!wav->f) { dolog ("Failed to open wave file `%s'\nReason: %s\n", conf.wav_path, strerror (errno)); @@ -161,7 +164,10 @@ static int wav_init_out (HWVoiceOut *hw, struct audsettings *as) return -1; } -qemu_put_buffer (wav->f, hdr, sizeof (hdr)); +if (fwrite (hdr, sizeof (hdr), 1, wav->f) != sizeof (hdr)) { +printf("wav_init_out: short write\n"); +return -1; +} return 0; } @@ -180,13 +186,17 @@ static void wav_fini_out (HWVoiceOut *hw) le_store (rlen, rifflen, 4); le_store (dlen, datalen, 4); -qemu_fseek (wav->f, 4, SEEK_SET); -qemu_put_buffer (wav->f, rlen, 4); +fseek (wav->f, 4, SEEK_SET); +if (fwrite (rlen, 1, 4, wav->f) != 4) { +printf("wav_fini_out: short write\n"); +} -qemu_fseek (wav->f, 32, SEEK_CUR); -qemu_put_buffer (wav->f, dlen, 4); +fseek (wav->f, 
32, SEEK_CUR); +if (fwrite (dlen, 1, 4, wav->f) != 4) { +printf("wav_fini_out: short write\n"); +} -qemu_fclose (wav->f); +fclose (wav->f); wav->f = NULL; g_free (wav->pcm_buf); -- 1.7.6
[Qemu-devel] [PATCH 0/3] Remove QEMUFile abuse
Hi, QEMUFile is intended to be used only for migration. Change the other three users to use FILE * operations directly. gcc on Fedora 15 complains about fread/fwrite not checking their return values, so I added checks. But in several places I only print an error message (there is no error handling that I can hook into). Notice that this is not worse than it is today. Later, Juan. Juan Quintela (3): wavaudio: port to FILE* wavcapture: port to FILE* ds1225y: port to FILE* audio/wavaudio.c | 28 +++- audio/wavcapture.c | 38 +- hw/ds1225y.c | 28 3 files changed, 60 insertions(+), 34 deletions(-) -- 1.7.6
[Qemu-devel] [PATCH 3/3] ds1225y: port to FILE*
QEMUFile * is only intended for Migration. Using it for anything else just adds pain and a layer of buffers for no good reason. Signed-off-by: Juan Quintela --- hw/ds1225y.c | 28 1 files changed, 16 insertions(+), 12 deletions(-) diff --git a/hw/ds1225y.c b/hw/ds1225y.c index 9875c44..cd23668 100644 --- a/hw/ds1225y.c +++ b/hw/ds1225y.c @@ -29,7 +29,7 @@ typedef struct { DeviceState qdev; uint32_t chip_size; char *filename; -QEMUFile *file; +FILE *file; uint8_t *contents; } NvRamState; @@ -70,9 +70,9 @@ static void nvram_writeb (void *opaque, target_phys_addr_t addr, uint32_t val) s->contents[addr] = val; if (s->file) { -qemu_fseek(s->file, addr, SEEK_SET); -qemu_put_byte(s->file, (int)val); -qemu_fflush(s->file); +fseek(s->file, addr, SEEK_SET); +fputc(val, s->file); +fflush(s->file); } } @@ -108,15 +108,17 @@ static int nvram_post_load(void *opaque, int version_id) /* Close file, as filename may has changed in load/store process */ if (s->file) { -qemu_fclose(s->file); +fclose(s->file); } /* Write back nvram contents */ -s->file = qemu_fopen(s->filename, "wb"); +s->file = fopen(s->filename, "wb"); if (s->file) { /* Write back contents, as 'wb' mode cleaned the file */ -qemu_put_buffer(s->file, s->contents, s->chip_size); -qemu_fflush(s->file); +if (fwrite(s->contents, s->chip_size, 1, s->file) != s->chip_size) { +printf("nvram_post_load: short write\n"); +} +fflush(s->file); } return 0; @@ -143,7 +145,7 @@ typedef struct { static int nvram_sysbus_initfn(SysBusDevice *dev) { NvRamState *s = &FROM_SYSBUS(SysBusNvRamState, dev)->nvram; -QEMUFile *file; +FILE *file; int s_io; s->contents = g_malloc0(s->chip_size); @@ -153,11 +155,13 @@ static int nvram_sysbus_initfn(SysBusDevice *dev) sysbus_init_mmio(dev, s->chip_size, s_io); /* Read current file */ -file = qemu_fopen(s->filename, "rb"); +file = fopen(s->filename, "rb"); if (file) { /* Read nvram contents */ -qemu_get_buffer(file, s->contents, s->chip_size); -qemu_fclose(file); +if (fread(s->contents, s->chip_size, 
1, file) != s->chip_size) { +printf("nvram_sysbus_initfn: short read\n"); +} +fclose(file); } nvram_post_load(s, 0); -- 1.7.6
[Qemu-devel] [PATCH 0/4] Remove trailing double quote limitation and add virtio_set_status trace event
This series removes the tracetool parser limitation that format strings must begin and end with double quotes. In practice this means we need to work around PRI*64 usage by adding dummy "" at the end of the line. It's fairly easy to solve this parser limitation and do away with the workarounds. While we're at it, also add the virtio_set_status() trace event to properly follow the lifecycle of virtio devices. docs/tracing.txt |5 + hw/virtio.c | 10 ++ hw/virtio.h |9 + scripts/tracetool | 20 +--- trace-events | 37 +++-- 5 files changed, 44 insertions(+), 37 deletions(-)
[Qemu-devel] [PATCH 2/4] trace: allow PRI*64 at beginning and ending of format string
The tracetool parser only picks up PRI*64 and other format string macros when enclosed between double quoted strings. Lift this restriction by extracting everything after the closing ')' as the format string: cpu_set_apic_base(uint64_t val) "%016"PRIx64 ^^^^ One trick here: it turns out that backslashes in the format string like "\n" were being interpreted by echo(1). Fix this by using the POSIX printf(1) command instead. Although it normally does not make sense to include backslashes in trace event format strings, an injected newline causes tracetool to emit a broken header file and I want to eliminate cases where broken output is emitted, even if the input was bad. Signed-off-by: Stefan Hajnoczi --- docs/tracing.txt |5 + scripts/tracetool | 20 +--- 2 files changed, 14 insertions(+), 11 deletions(-) diff --git a/docs/tracing.txt b/docs/tracing.txt index e130a61..2c33a62 100644 --- a/docs/tracing.txt +++ b/docs/tracing.txt @@ -75,10 +75,7 @@ Trace events should use types as follows: Format strings should reflect the types defined in the trace event. Take special care to use PRId64 and PRIu64 for int64_t and uint64_t types, -respectively. This ensures portability between 32- and 64-bit platforms. Note -that format strings must begin and end with double quotes. When using -portability macros, ensure they are preceded and followed by double quotes: -"value %"PRIx64"". +respectively. This ensures portability between 32- and 64-bit platforms. === Hints for adding new trace events === diff --git a/scripts/tracetool b/scripts/tracetool index 743d246..4c9951d 100755 --- a/scripts/tracetool +++ b/scripts/tracetool @@ -40,6 +40,15 @@ EOF exit 1 } +# Print a line without interpreting backslash escapes +# +# The built-in echo command may interpret backslash escapes without an option +# to disable this behavior. 
+puts() +{ +printf "%s\n" "$1" +} + # Get the name of a trace event get_name() { @@ -111,13 +120,10 @@ get_argc() echo $argc } -# Get the format string for a trace event +# Get the format string including double quotes for a trace event get_fmt() { -local fmt -fmt=${1#*\"} -fmt=${fmt%\"*} -echo "$fmt" +puts "${1#*)}" } linetoh_begin_nop() @@ -266,7 +272,7 @@ linetoh_stderr() static inline void trace_$name($args) { if (trace_list[$stderr_event_num].state != 0) { -fprintf(stderr, "$name $fmt\n" $argnames); +fprintf(stderr, "$name " $fmt "\n" $argnames); } } EOF @@ -366,7 +372,7 @@ DEFINE_TRACE(ust_$name); static void ust_${name}_probe($args) { -trace_mark(ust, $name, "$fmt"$argnames); +trace_mark(ust, $name, $fmt$argnames); } EOF -- 1.7.5.4
Re: [Qemu-devel] [Qemu-ppc] [RESEND][PATCH] booke timers
On 12/09/2011 19:23, Scott Wood wrote: > On 09/09/2011 09:58 AM, Alexander Graf wrote: >> On 09.09.2011, at 16:22, Fabien Chouteau wrote: >>> if the interrupt is already set and you clear TCR.DIE, the interrupt has to >>> remain set. The only way to unset an interrupt is to clear the corresponding >>> bit in TSR (currently in store_booke_tsr). >> >> Are you sure? I see several things in the 2.06 spec: > [snip] >> To me that sounds as if the decrementer interrupt gets injected only >> when TSR.DIS=1, TCR.DIE=1 and MSR.EE=1. Unsetting any of these bits >> stops the interrupt from being delivered. >> >> Scott, can you please check up with the hardware guys if this is correct? > > This is how I've always understood it to work (assuming the interrupt > hasn't already been delivered, of course). Fabien, do you have real > hardware that you see behave the way you describe? > No I don't, it was just my understanding of Book-E documentation. I've tried your solution (below) with VxWorks, and it works like a charm. static void booke_update_irq(CPUState *env) { ppc_set_irq(env, PPC_INTERRUPT_DECR, (env->spr[SPR_BOOKE_TSR] & TSR_DIS && env->spr[SPR_BOOKE_TCR] & TCR_DIE)); ppc_set_irq(env, PPC_INTERRUPT_WDT, (env->spr[SPR_BOOKE_TSR] & TSR_WIS && env->spr[SPR_BOOKE_TCR] & TCR_WIE)); ppc_set_irq(env, PPC_INTERRUPT_FIT, (env->spr[SPR_BOOKE_TSR] & TSR_FIS && env->spr[SPR_BOOKE_TCR] & TCR_FIE)); } Regards, -- Fabien Chouteau
[Qemu-devel] [PATCH 4/4] trace: add virtio_set_status() trace event
The virtio device lifecycle can be observed by looking at the sequence of set status operations. This is especially important for catching the reset operation (status value 0), which resets the device and all virtqueues. Signed-off-by: Stefan Hajnoczi --- hw/virtio.c | 10 ++ hw/virtio.h |9 + trace-events |1 + 3 files changed, 12 insertions(+), 8 deletions(-) diff --git a/hw/virtio.c b/hw/virtio.c index 13aa0fa..946d911 100644 --- a/hw/virtio.c +++ b/hw/virtio.c @@ -498,6 +498,16 @@ void virtio_update_irq(VirtIODevice *vdev) virtio_notify_vector(vdev, VIRTIO_NO_VECTOR); } +void virtio_set_status(VirtIODevice *vdev, uint8_t val) +{ +trace_virtio_set_status(vdev, val); + +if (vdev->set_status) { +vdev->set_status(vdev, val); +} +vdev->status = val; +} + void virtio_reset(void *opaque) { VirtIODevice *vdev = opaque; diff --git a/hw/virtio.h b/hw/virtio.h index c129264..666e381 100644 --- a/hw/virtio.h +++ b/hw/virtio.h @@ -135,14 +135,6 @@ struct VirtIODevice VMChangeStateEntry *vmstate; }; -static inline void virtio_set_status(VirtIODevice *vdev, uint8_t val) -{ -if (vdev->set_status) { -vdev->set_status(vdev, val); -} -vdev->status = val; -} - VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size, void (*handle_output)(VirtIODevice *, VirtQueue *)); @@ -190,6 +182,7 @@ int virtio_queue_get_num(VirtIODevice *vdev, int n); void virtio_queue_notify(VirtIODevice *vdev, int n); uint16_t virtio_queue_vector(VirtIODevice *vdev, int n); void virtio_queue_set_vector(VirtIODevice *vdev, int n, uint16_t vector); +void virtio_set_status(VirtIODevice *vdev, uint8_t val); void virtio_reset(void *opaque); void virtio_update_irq(VirtIODevice *vdev); diff --git a/trace-events b/trace-events index 9a59525..99edc97 100644 --- a/trace-events +++ b/trace-events @@ -42,6 +42,7 @@ virtqueue_pop(void *vq, void *elem, unsigned int in_num, unsigned int out_num) " virtio_queue_notify(void *vdev, int n, void *vq) "vdev %p n %d vq %p" virtio_irq(void *vq) "vq %p" virtio_notify(void 
*vdev, void *vq) "vdev %p vq %p" +virtio_set_status(void *vdev, uint8_t val) "vdev %p val %u" # hw/virtio-serial-bus.c virtio_serial_send_control_event(unsigned int port, uint16_t event, uint16_t value) "port %u, event %u, value %u" -- 1.7.5.4
Re: [Qemu-devel] [Qemu-ppc] [RESEND][PATCH] booke timers
Fabien Chouteau wrote: > On 12/09/2011 19:23, Scott Wood wrote: > >> On 09/09/2011 09:58 AM, Alexander Graf wrote: >> >>> On 09.09.2011, at 16:22, Fabien Chouteau wrote: >>> if the interrupt is already set and you clear TCR.DIE, the interrupt has to remain set. The only way to unset an interrupt is to clear the corresponding bit in TSR (currently in store_booke_tsr). >>> Are you sure? I see several things in the 2.06 spec: >>> >> [snip] >> >>> To me that sounds as if the decrementer interrupt gets injected only >>> when TSR.DIS=1, TCR.DIE=1 and MSR.EE=1. Unsetting any of these bits >>> stops the interrupt from being delivered. >>> >>> Scott, can you please check up with the hardware guys if this is correct? >>> >> This is how I've always understood it to work (assuming the interrupt >> hasn't already been delivered, of course). Fabien, do you have real >> hardware that you see behave the way you describe? >> >> > > No I don't, it was just my understanding of Book-E documentation. I've tried > your solution (below) with VxWorks, and it works like a charm. > > static void booke_update_irq(CPUState *env) > { > ppc_set_irq(env, PPC_INTERRUPT_DECR, > (env->spr[SPR_BOOKE_TSR] & TSR_DIS > && env->spr[SPR_BOOKE_TCR] & TCR_DIE)); > > ppc_set_irq(env, PPC_INTERRUPT_WDT, > (env->spr[SPR_BOOKE_TSR] & TSR_WIS > && env->spr[SPR_BOOKE_TCR] & TCR_WIE)); > > ppc_set_irq(env, PPC_INTERRUPT_FIT, > (env->spr[SPR_BOOKE_TSR] & TSR_FIS > && env->spr[SPR_BOOKE_TCR] & TCR_FIE)); > } > Awesome! Please also check on MSR.EE and send a new patch then :) Alex
[Qemu-devel] [PATCH 1/4] trace: remove newline from grlib_irqmp_check_irqs format string
There is no need to put a newline in trace event format strings. The backend may use the format string within some context and takes care of how to display the event. The stderr backend automatically appends "\n" whereas the ust backend does not want a newline at all. Signed-off-by: Stefan Hajnoczi --- trace-events |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/trace-events b/trace-events index a8e7684..cfcdc9b 100644 --- a/trace-events +++ b/trace-events @@ -327,7 +327,7 @@ grlib_gptimer_readl(int id, uint64_t addr, uint32_t val) "timer:%d addr 0x%"PRIx grlib_gptimer_writel(int id, uint64_t addr, uint32_t val) "timer:%d addr 0x%"PRIx64" 0x%x" # hw/grlib_irqmp.c -grlib_irqmp_check_irqs(uint32_t pend, uint32_t force, uint32_t mask, uint32_t lvl1, uint32_t lvl2) "pend:0x%04x force:0x%04x mask:0x%04x lvl1:0x%04x lvl0:0x%04x\n" +grlib_irqmp_check_irqs(uint32_t pend, uint32_t force, uint32_t mask, uint32_t lvl1, uint32_t lvl2) "pend:0x%04x force:0x%04x mask:0x%04x lvl1:0x%04x lvl0:0x%04x" grlib_irqmp_ack(int intno) "interrupt:%d" grlib_irqmp_set_irq(int irq) "Raise CPU IRQ %d" grlib_irqmp_readl_unknown(uint64_t addr) "addr 0x%"PRIx64"" -- 1.7.5.4
Re: [Qemu-devel] Question on kvm_clock working ...
On 2011-09-13 13:38, al pat wrote: > Thanks Phillip. > > My current clock source is "kvm-clock". > > When I start my guest, it is in sync with the host clock. > > Then, I change the time on my host - using "date --set ...". I don't see the > guest update its time. > I was expecting that the guest would detect the host time change and follow it? That's not what kvmclock is supposed to provide. Besides a monotonic clock source, it basically offers an alternative persistent clock, to some degree replacing the emulated RTC. So updates of the host system time are only recognized by the guest kernel when it reboots or suspends/resumes. > > So, when the host is exporting its system time and TSC values, does it go > into the "emulated RTC" of the guest and the guest checks it only once? Or > does the guest resync its clock with the host's value periodically? > > I can try to do: "hwclock --hctosys --utc" --- this is just to check. (I > have kvm-clock as my clock source though). Re-reading the RTC is a brute-force approach to re-synchronize with the host. If potential clock jumps are OK for your use case, you can go that way. Alternatives are using NTP against a time server or writing an RTC plugin for NTP to synchronize against that local source (a colleague once wrote such a plugin for a special guest, but it was never made public AFAIK). Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux
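Jan's description above (kvmclock as a clock the guest extrapolates from, rather than a channel that pushes host time changes into the guest) is easier to see from the pvclock arithmetic itself. Below is a minimal C sketch following the Linux pvclock ABI; the struct mirrors pvclock_vcpu_time_info, but the names and the omission of version/seqlock handling are simplifications for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the guest-side kvmclock (pvclock) time calculation.  The host
 * publishes system_time, tsc_timestamp and a fixed-point TSC-to-nanoseconds
 * factor; the guest extrapolates from its current TSC.  Fields mirror
 * struct pvclock_vcpu_time_info; this is not the kernel implementation. */
typedef struct {
    uint64_t tsc_timestamp;     /* TSC value when the host wrote the record */
    uint64_t system_time;       /* host system time at that moment, in ns */
    uint32_t tsc_to_system_mul; /* 32.32 fixed-point TSC-to-ns multiplier */
    int8_t   tsc_shift;         /* pre-scale shift applied to the TSC delta */
} pvclock_time_info;

uint64_t pvclock_read_ns(const pvclock_time_info *info, uint64_t tsc)
{
    uint64_t delta = tsc - info->tsc_timestamp;

    if (info->tsc_shift >= 0) {
        delta <<= info->tsc_shift;
    } else {
        delta >>= -info->tsc_shift;
    }
    /* Multiply by the 32.32 factor and drop the fractional bits.  (The real
     * kernel code uses a 128-bit intermediate to avoid overflow on large
     * deltas; this sketch does not.) */
    return info->system_time + ((delta * info->tsc_to_system_mul) >> 32);
}
```

Note that system_time here tracks the host's monotonic clock, so a host-side `date --set` (which changes wall time, not the monotonic clock) never shows up in this calculation, matching what Jan describes.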
Re: [Qemu-devel] [Qemu-ppc] [RESEND][PATCH] booke timers
Alexander Graf wrote: > Fabien Chouteau wrote: > >> On 12/09/2011 19:23, Scott Wood wrote: >> >> >>> On 09/09/2011 09:58 AM, Alexander Graf wrote: >>> >>> On 09.09.2011, at 16:22, Fabien Chouteau wrote: > if the interrupt is already set and you clear TCR.DIE, the interrupt has > to > remain set. The only way to unset an interrupt is to clear the > corresponding > bit in TSR (currently in store_booke_tsr). > > Are you sure? I see several things in the 2.06 spec: >>> [snip] >>> >>> To me that sounds as if the decrementer interrupt gets injected only when TSR.DIS=1, TCR.DIE=1 and MSR.EE=1. Unsetting any of these bits stops the interrupt from being delivered. Scott, can you please check up with the hardware guys if this is correct? >>> This is how I've always understood it to work (assuming the interrupt >>> hasn't already been delivered, of course). Fabien, do you have real >>> hardware that you see behave the way you describe? >>> >>> >>> >> No I don't, it was just my understanding of Book-E documentation. I've tried >> your solution (below) with VxWorks, and it works like a charm. >> >> static void booke_update_irq(CPUState *env) >> { >> ppc_set_irq(env, PPC_INTERRUPT_DECR, >> (env->spr[SPR_BOOKE_TSR] & TSR_DIS >> && env->spr[SPR_BOOKE_TCR] & TCR_DIE)); >> >> ppc_set_irq(env, PPC_INTERRUPT_WDT, >> (env->spr[SPR_BOOKE_TSR] & TSR_WIS >> && env->spr[SPR_BOOKE_TCR] & TCR_WIE)); >> >> ppc_set_irq(env, PPC_INTERRUPT_FIT, >> (env->spr[SPR_BOOKE_TSR] & TSR_FIS >> && env->spr[SPR_BOOKE_TCR] & TCR_FIE)); >> } >> >> > > Awesome! Please also check on MSR.EE and send a new patch then :) > Ah, the EE check is in target-ppc/helper.c:ppc_hw_interrupt. Very confusing (and probably wrong because it could generate spurious interrupts), but it should be enough for now. Alex
[Qemu-devel] [PULL] VirtFS update
Hi Anthony, This series contains a few fixes to the VirtFS server. The patch set also adds two new 9p operations. Please pull. The following changes since commit 07ff2c4475df77e38a31d50ee7f3932631806c15: Merge remote-tracking branch 'origin/master' into staging (2011-09-08 09:25:36 -0500) are available in the git repository at: git://repo.or.cz/qemu/v9fs.git for-upstream-4 Aneesh Kumar K.V (5): hw/9pfs: Update the fidp path before opendir hw/9pfs: Initialize rest of qid field to zero. hw/9pfs: Fix memleaks in some 9p operation hw/9pfs: add 9P2000.L renameat operation hw/9pfs: add 9P2000.L unlinkat operation hw/9pfs/virtio-9p.c | 126 +++ hw/9pfs/virtio-9p.h |4 ++ 2 files changed, 130 insertions(+), 0 deletions(-) -aneesh
Re: [Qemu-devel] [Qemu-ppc] [RESEND][PATCH] booke timers
On 13/09/2011 15:13, Alexander Graf wrote: > Alexander Graf wrote: >> Fabien Chouteau wrote: >> >>> On 12/09/2011 19:23, Scott Wood wrote: >>> >>> On 09/09/2011 09:58 AM, Alexander Graf wrote: > On 09.09.2011, at 16:22, Fabien Chouteau wrote: > > >> if the interrupt is already set and you clear TCR.DIE, the interrupt has >> to >> remain set. The only way to unset an interrupt is to clear the >> corresponding >> bit in TSR (currently in store_booke_tsr). >> >> > Are you sure? I see several things in the 2.06 spec: > > [snip] > To me that sounds as if the decrementer interrupt gets injected only > when TSR.DIS=1, TCR.DIE=1 and MSR.EE=1. Unsetting any of these bits > stops the interrupt from being delivered. > > Scott, can you please check up with the hardware guys if this is correct? > > This is how I've always understood it to work (assuming the interrupt hasn't already been delivered, of course). Fabien, do you have real hardware that you see behave the way you describe? >>> No I don't, it was just my understanding of Book-E documentation. I've tried >>> your solution (below) with VxWorks, and it works like a charm. >>> >>> static void booke_update_irq(CPUState *env) >>> { >>> ppc_set_irq(env, PPC_INTERRUPT_DECR, >>> (env->spr[SPR_BOOKE_TSR] & TSR_DIS >>> && env->spr[SPR_BOOKE_TCR] & TCR_DIE)); >>> >>> ppc_set_irq(env, PPC_INTERRUPT_WDT, >>> (env->spr[SPR_BOOKE_TSR] & TSR_WIS >>> && env->spr[SPR_BOOKE_TCR] & TCR_WIE)); >>> >>> ppc_set_irq(env, PPC_INTERRUPT_FIT, >>> (env->spr[SPR_BOOKE_TSR] & TSR_FIS >>> && env->spr[SPR_BOOKE_TCR] & TCR_FIE)); >>> } >>> >>> >> >> Awesome! Please also check on MSR.EE and send a new patch then :) >> > > Ah, the EE check is in target-ppc/helper.c:ppc_hw_interrupt. Very > confusing (and probably wrong because it could generate spurious > interrupts), but it should be enough for now. > That's what I was looking for... So we are good. -- Fabien Chouteau
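The gating condition settled in this thread can be captured as a pure function: a Book E timer interrupt line stays asserted only while both the TSR status bit and the TCR enable bit are set. A minimal C sketch (bit positions are illustrative, not the real SPR layouts):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit masks; the real TSR/TCR layouts differ. */
#define TSR_DIS (1u << 3)   /* decrementer interrupt status */
#define TCR_DIE (1u << 6)   /* decrementer interrupt enable */

/* The decrementer line is asserted iff the status bit is pending in TSR
 * and the corresponding enable bit is set in TCR, mirroring the
 * booke_update_irq() approach quoted above. */
bool booke_decr_irq_pending(uint32_t tsr, uint32_t tcr)
{
    return (tsr & TSR_DIS) && (tcr & TCR_DIE);
}
```

Clearing TCR.DIE while TSR.DIS is still set drops the line, which matches the behavior Fabien confirmed under VxWorks; MSR.EE only gates delivery and, as Alex notes, is checked later in ppc_hw_interrupt.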
[Qemu-devel] [Bug 848571] Re: qemu does not generate a qemu-kvm.stp tapset file
You are correct. The qemu.spec file is doing the copy. ** Changed in: qemu Status: New => Invalid -- You received this bug notification because you are a member of qemu-devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/848571 Title: qemu does not generate a qemu-kvm.stp tapset file Status in QEMU: Invalid Bug description: To make the systemtap probing easier to use qemu generates qemu*.stp files with aliases for various events for each of the executables. The installer places these files in /usr/share/systemtap/tapset/. These files are generated by the tracetool. However, the /usr/bin/qemu-kvm is produced with a copy: cp -a x86_64-softmmu/qemu-system-x86_64 qemu-kvm No matching qemu-kvm.stp is generated for the qemu-kvm executable. It would be really nice if that tapset file is generated so people can use more symbolic probe points. To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/848571/+subscriptions
Re: [Qemu-devel] [PATCH][RFC][0/2] REF+/REF- optimization
2011/9/13 Kevin Wolf : > Am 13.09.2011 09:53, schrieb Frediano Ziglio: >> These patches try to trade off between leaks and speed for cluster >> refcounts. >> >> Refcount increments (REF+ or refp) are handled in a different way from >> decrements (REF- or refm). The reason is that posting or not flushing >> a REF- causes "just" a leak, while posting a REF+ causes corruption. >> >> To optimize REF- I just used an array to store offsets; then, when a >> flush is requested or the array reaches a limit (currently 1022), the array >> is sorted and written to disk. I use an array with offsets instead of >> ranges to support compression (an offset could appear multiple times >> in the array). >> I consider this patch quite ready. > > Ok, first of all let's clarify what this optimises. I don't think it > changes anything at all for the writeback cache modes, because these > already do most operations in memory only. So this must be about > optimising some operations with cache=writethrough. REF- isn't about > normal cluster allocation, it is about COW with internal snapshots or > bdrv_discard. Do you have benchmarks for any of them? > > I strongly disagree with your approach for REF-. We already have a > cache, and introducing a second one sounds like a bad idea. I think we > could get a very similar effect if we introduced a > qcow2_cache_entry_mark_dirty_wb() that marks a given refcount block as > dirty, but at the same time tells the cache that even in write-through > mode it can still treat this block as write-back. This should require > far fewer code changes. > Yes, this mainly optimizes for writethrough. I did not test with writeback, but it should improve that too (I think some flushes are needed there to keep consistency). I'll try to write a qcow2_cache_entry_mark_dirty_wb patch and test it. > But let's measure the effects first, I suspect that for cluster > allocation it doesn't help much because every REF- comes with a REF+.
> That's 50% of the effort if REF- clusters are far from REF+ :) >> To optimize REF+ I mark a range as allocated and use this range to >> get new ones (avoiding writing refcounts to disk). When a flush is >> requested or in some situations (like snapshot) this cache is disabled >> and flushed (written as REF-). >> I do not consider this patch ready; it works and passes all io-tests, >> but for instance I would avoid allocating new clusters for refcounts >> during preallocation. > > The only question here is if improving cache=writethrough cluster > allocation performance is worth the additional complexity in the already > complex refcounting code. > I didn't see this optimization as a second-level cache, but yes, for REF- it is a second cache. > The alternative that was discussed before is the dirty bit approach that > is used in QED and would allow us to use writeback for all refcount > blocks, regardless of REF- or REF+. It would be an easier approach > requiring fewer code changes, but it comes with the cost of requiring an > fsck after a qemu crash. > I was thinking about changing the header magic the first time we change a refcount, in order to mark the image as dirty, so that a newer QEMU recognizes the flag while an older one does not recognize the image. Obviously the magic would be reverted on image close. >> The end speedup is quite visible when allocating clusters (more than 20%). > > What benchmark do you use for testing this? > > Kevin > Currently I'm using bonnie++, but I noted similar improvements with iozone. The test script formats an image, then launches a Linux machine which runs a script and saves the result to a file. The test image is seen by this virtual machine as a separate disk. The file on the host resides in a separate LV. I got quite consistent results (of course I was not working on the machine while testing; it is not actually dedicated to this job). I'm currently running the tests (I added a test working in a snapshot image). Frediano
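The REF- batching Frediano describes (buffer offsets, sort at flush time, tolerate duplicate offsets for compressed clusters) can be sketched in a few lines. The helper names below are made up for illustration; this is a sketch of the scheme, not the actual patch:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Offsets queued for a refcount decrement are buffered in an array; at
 * flush time the array is sorted so that duplicate offsets -- possible
 * with compressed clusters -- coalesce into one refcount update each. */
#define REFM_CACHE_SIZE 1022

static uint64_t refm_cache[REFM_CACHE_SIZE];
static int refm_count;

static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a;
    uint64_t y = *(const uint64_t *)b;
    return x < y ? -1 : x > y;
}

void refm_add(uint64_t offset)
{
    /* The real code would flush to disk when the buffer fills. */
    if (refm_count < REFM_CACHE_SIZE) {
        refm_cache[refm_count++] = offset;
    }
}

/* Sort and coalesce the queued offsets; emit one (offset, count) pair per
 * distinct offset and return the number of pairs.  In the real scheme this
 * is where the refcount blocks would be updated and written back. */
int refm_flush(uint64_t *out_offset, int *out_count)
{
    int i = 0, pairs = 0;

    qsort(refm_cache, refm_count, sizeof(refm_cache[0]), cmp_u64);
    while (i < refm_count) {
        int run = 1;
        while (i + run < refm_count && refm_cache[i + run] == refm_cache[i]) {
            run++;
        }
        out_offset[pairs] = refm_cache[i];
        out_count[pairs] = run;
        pairs++;
        i += run;
    }
    refm_count = 0;
    return pairs;
}
```

Sorting before writeback also means each refcount block is touched once, in order, which is where the cache=writethrough savings would come from.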
[Qemu-devel] [PATCH] bswap.h: build fix
qemu build fails when CONFIG_MACHINE_BSWAP_H is defined because float32, float64, etc. are not defined. This makes qemu build. Signed-off-by: Christoph Egger diff --git a/bswap.h b/bswap.h index f41bebe..cc7f84d 100644 --- a/bswap.h +++ b/bswap.h @@ -4,6 +4,7 @@ #include "config-host.h" #include +#include "softfloat.h" #ifdef CONFIG_MACHINE_BSWAP_H #include @@ -11,8 +12,6 @@ #include #else -#include "softfloat.h" - #ifdef CONFIG_BYTESWAP_H #include #else -- ---to satisfy European Law for business letters: Advanced Micro Devices GmbH Einsteinring 24, 85689 Dornach b. Muenchen Geschaeftsfuehrer: Alberto Bozzo, Andrew Bowd Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen Registergericht Muenchen, HRB Nr. 43632
Re: [Qemu-devel] [PATCH 04/12] nbd: add support for NBD_CMD_FLUSH
Am 08.09.2011 17:24, schrieb Paolo Bonzini: > Note for the brace police: the style in this commit and the following > is consistent with the rest of the file. It is then fixed together with > the introduction of coroutines. > > Signed-off-by: Paolo Bonzini > --- > block/nbd.c | 31 +++ > nbd.c | 14 +- > 2 files changed, 44 insertions(+), 1 deletions(-) > > diff --git a/block/nbd.c b/block/nbd.c > index ffc57a9..4a195dc 100644 > --- a/block/nbd.c > +++ b/block/nbd.c > @@ -237,6 +237,36 @@ static int nbd_write(BlockDriverState *bs, int64_t > sector_num, > return 0; > } > > +static int nbd_flush(BlockDriverState *bs) > +{ > +BDRVNBDState *s = bs->opaque; > +struct nbd_request request; > +struct nbd_reply reply; > + > +if (!(s->nbdflags & NBD_FLAG_SEND_FLUSH)) { > +return 0; > +} > + > +request.type = NBD_CMD_FLUSH; > +request.handle = (uint64_t)(intptr_t)bs; > +request.from = 0; > +request.len = 0; > + > +if (nbd_send_request(s->sock, &request) == -1) > +return -errno; > + > +if (nbd_receive_reply(s->sock, &reply) == -1) > +return -errno; > + > +if (reply.error !=0) Missing space (this is not for consistency, right?) > @@ -682,6 +683,18 @@ int nbd_trip(BlockDriverState *bs, int csock, off_t > size, uint64_t dev_offset, > TRACE("Request type is DISCONNECT"); > errno = 0; > return 1; > +case NBD_CMD_FLUSH: > +TRACE("Request type is FLUSH"); > + > +if (bdrv_flush(bs) == -1) { bdrv_flush is supposed to return -errno, so please check for < 0. (I see that raw-posix needs to be fixed, but other block drivers already return error values other than -1) Kevin
Re: [Qemu-devel] [PATCH 05/12] nbd: add support for NBD_CMD_FLAG_FUA
Am 08.09.2011 17:24, schrieb Paolo Bonzini: > The server can use it to issue a flush automatically after a > write. The client can also use it to mimic a write-through > cache. > > Signed-off-by: Paolo Bonzini > --- > block/nbd.c |8 > nbd.c | 13 +++-- > 2 files changed, 19 insertions(+), 2 deletions(-) > @@ -674,6 +675,14 @@ int nbd_trip(BlockDriverState *bs, int csock, off_t > size, uint64_t dev_offset, > } > > *offset += request.len; > + > +if (request.type & NBD_CMD_FLAG_FUA) { > +if (bdrv_flush(bs) == -1) { Need to check for < 0 here as well. Kevin
Re: [Qemu-devel] [PATCH 06/12] nbd: support NBD_CMD_TRIM in the server
Am 08.09.2011 17:24, schrieb Paolo Bonzini: > Map it to bdrv_discard. The server can now expose NBD_FLAG_SEND_TRIM. > > Signed-off-by: Paolo Bonzini > --- > block/nbd.c | 31 +++ > nbd.c |9 - > 2 files changed, 39 insertions(+), 1 deletions(-) > > diff --git a/block/nbd.c b/block/nbd.c > index 5a7812c..964caa8 100644 > --- a/block/nbd.c > +++ b/block/nbd.c > @@ -275,6 +275,36 @@ static int nbd_flush(BlockDriverState *bs) > return 0; > } > > +static int nbd_discard(BlockDriverState *bs, int64_t sector_num, > + int nb_sectors) > +{ > +BDRVNBDState *s = bs->opaque; > +struct nbd_request request; > +struct nbd_reply reply; > + > +if (!(s->nbdflags & NBD_FLAG_SEND_TRIM)) { > +return 0; > +} > +request.type = NBD_CMD_TRIM; > +request.handle = (uint64_t)(intptr_t)bs; > +request.from = sector_num * 512;; > +request.len = nb_sectors * 512; > + > +if (nbd_send_request(s->sock, &request) == -1) > +return -errno; > + > +if (nbd_receive_reply(s->sock, &reply) == -1) > +return -errno; > + > +if (reply.error !=0) > +return -reply.error; > + > +if (reply.handle != request.handle) > +return -EIO; > + > +return 0; > +} > + > static void nbd_close(BlockDriverState *bs) > { > BDRVNBDState *s = bs->opaque; > @@ -299,6 +329,7 @@ static BlockDriver bdrv_nbd = { > .bdrv_write = nbd_write, > .bdrv_close = nbd_close, > .bdrv_flush = nbd_flush, > +.bdrv_discard= nbd_discard, > .bdrv_getlength = nbd_getlength, > .protocol_name = "nbd", > }; > diff --git a/nbd.c b/nbd.c > index b65fb4a..f089904 100644 > --- a/nbd.c > +++ b/nbd.c > @@ -194,7 +194,7 @@ int nbd_negotiate(int csock, off_t size, uint32_t flags) > cpu_to_be64w((uint64_t*)(buf + 8), 0x00420281861253LL); > cpu_to_be64w((uint64_t*)(buf + 16), size); > cpu_to_be32w((uint32_t*)(buf + 24), > - flags | NBD_FLAG_HAS_FLAGS | > + flags | NBD_FLAG_HAS_FLAGS | NBD_FLAG_SEND_TRIM | > NBD_FLAG_SEND_FLUSH | NBD_FLAG_SEND_FUA); > memset(buf + 28, 0, 124); > > @@ -703,6 +703,13 @@ int nbd_trip(BlockDriverState *bs, int csock, off_t > size, uint64_t 
dev_offset, > if (nbd_send_reply(csock, &reply) == -1) > return -1; > break; > +case NBD_CMD_TRIM: > +TRACE("Request type is TRIM"); > +bdrv_discard(bs, (request.from + dev_offset) / 512, > + request.len / 512); Errors are completely ignored? Does the NBD protocol not allow to return an error? Kevin
[Qemu-devel] [PATCH V4] booke timers
While working on the emulation of the freescale p2010 (e500v2) I realized that there's no implementation of booke's timers features. Currently mpc8544 uses ppc_emb (ppc_emb_timers_init) which is close but not exactly like booke (for example booke uses different SPR). Signed-off-by: Fabien Chouteau --- V2: - Fix fixed timer, now trigger each time the selected bit switch from 0 to 1. - Fix e500 criterion. - Trigger an interrupt when user set DIE/FIE/WIE while DIS/FIS/WIS is already set. - Minor fixes (mask definition, variable name...). - Rename ppc_emb to ppc_40x V3: - Fix bit selection for e500 fixed timers (fp == 00 selects msb) - Improved formula to compute the next event of a fixed timer v4: - Centralized interrupt handling - Timer flags (BOOKE, E500, DECR_UNDERFLOW_TRIGGERED, DECR_ZERO_TRIGGERED) Makefile.target |2 +- hw/ppc.c| 138 +-- hw/ppc.h| 37 ++- hw/ppc4xx_devs.c|2 +- hw/ppc_booke.c | 266 +++ hw/ppce500_mpc8544ds.c |4 +- hw/virtex_ml507.c | 11 +-- target-ppc/cpu.h| 27 + target-ppc/translate_init.c | 39 +++ 9 files changed, 427 insertions(+), 99 deletions(-) create mode 100644 hw/ppc_booke.c diff --git a/Makefile.target b/Makefile.target index f708453..5a85662 100644 --- a/Makefile.target +++ b/Makefile.target @@ -229,7 +229,7 @@ obj-i386-$(CONFIG_KVM) += kvmclock.o obj-i386-$(CONFIG_SPICE) += qxl.o qxl-logger.o qxl-render.o # shared objects -obj-ppc-y = ppc.o +obj-ppc-y = ppc.o ppc_booke.o obj-ppc-y += vga.o # PREP target obj-ppc-y += i8259.o mc146818rtc.o diff --git a/hw/ppc.c b/hw/ppc.c index 8870748..25b59dd 100644 --- a/hw/ppc.c +++ b/hw/ppc.c @@ -50,7 +50,7 @@ static void cpu_ppc_tb_stop (CPUState *env); static void cpu_ppc_tb_start (CPUState *env); -static void ppc_set_irq (CPUState *env, int n_IRQ, int level) +void ppc_set_irq(CPUState *env, int n_IRQ, int level) { unsigned int old_pending = env->pending_interrupts; @@ -423,25 +423,8 @@ void ppce500_irq_init (CPUState *env) } /*/ /* PowerPC time base and decrementer emulation */ -struct ppc_tb_t { 
-/* Time base management */ -int64_t tb_offset;/* Compensation*/ -int64_t atb_offset; /* Compensation*/ -uint32_t tb_freq; /* TB frequency*/ -/* Decrementer management */ -uint64_t decr_next;/* Tick for next decr interrupt*/ -uint32_t decr_freq;/* decrementer frequency */ -struct QEMUTimer *decr_timer; -/* Hypervisor decrementer management */ -uint64_t hdecr_next;/* Tick for next hdecr interrupt */ -struct QEMUTimer *hdecr_timer; -uint64_t purr_load; -uint64_t purr_start; -void *opaque; -}; -static inline uint64_t cpu_ppc_get_tb(ppc_tb_t *tb_env, uint64_t vmclk, - int64_t tb_offset) +uint64_t cpu_ppc_get_tb(ppc_tb_t *tb_env, uint64_t vmclk, int64_t tb_offset) { /* TB time in tb periods */ return muldiv64(vmclk, tb_env->tb_freq, get_ticks_per_sec()) + tb_offset; @@ -611,10 +594,13 @@ static inline uint32_t _cpu_ppc_load_decr(CPUState *env, uint64_t next) int64_t diff; diff = next - qemu_get_clock_ns(vm_clock); -if (diff >= 0) +if (diff >= 0) { decr = muldiv64(diff, tb_env->decr_freq, get_ticks_per_sec()); -else +} else if (tb_env->flags & PPC_TIMER_BOOKE) { +decr = 0; +} else { decr = -muldiv64(-diff, tb_env->decr_freq, get_ticks_per_sec()); +} LOG_TB("%s: %08" PRIx32 "\n", __func__, decr); return decr; @@ -678,18 +664,24 @@ static void __cpu_ppc_store_decr (CPUState *env, uint64_t *nextp, decr, value); now = qemu_get_clock_ns(vm_clock); next = now + muldiv64(value, get_ticks_per_sec(), tb_env->decr_freq); -if (is_excp) +if (is_excp) { next += *nextp - now; -if (next == now) +} +if (next == now) { next++; +} *nextp = next; /* Adjust timer */ qemu_mod_timer(timer, next); -/* If we set a negative value and the decrementer was positive, - * raise an exception. + +/* If we set a negative value and the decrementer was positive, raise an + * exception. 
*/ -if ((value & 0x8000) && !(decr & 0x8000)) +if ((tb_env->flags & PPC_DECR_UNDERFLOW_TRIGGERED) +&& (value & 0x8000) +&& !(decr & 0x8000)) { (*raise_excp)(env); +} } static inline void _cpu_ppc_store_decr(CPUState *env, uint32_t decr, @@ -763,6 +755,7 @@ clk_setup_cb cpu_ppc_tb_init (CPUState *env, uint32_t freq) tb_env = g_malloc0(sizeof(ppc_tb_t)); env->tb_env = tb_env; +tb_env->flags = PPC_DECR_UNDERFLOW_TRI
Re: [Qemu-devel] [PATCH v2 09/12] sheepdog: move coroutine send/recv function to generic code
Am 09.09.2011 10:11, schrieb Paolo Bonzini: > Outside coroutines, avoid busy waiting on EAGAIN by temporarily > making the socket blocking. > > The API of qemu_recvv/qemu_sendv is slightly different from > do_readv/do_writev because they do not handle coroutines. It > returns the number of bytes written before encountering an > EAGAIN. The specificity of yielding on EAGAIN is entirely in > qemu-coroutine.c. > > Cc: MORITA Kazutaka > Signed-off-by: Paolo Bonzini > --- > Thanks for the review. I checked with qemu-io that all of > > readv -v 0 524288 (x8) > readv -v 0 262144 (x16) > readv -v 0 1024 (x4096) > readv -v 0 1536 (x2730) 1024 > readv -v 0 1024 512 (x2730) 1024 > > work and produce the same output, while previously they would fail. > Looks like it's hard to trigger the code just with qemu. > > block/sheepdog.c | 225 > ++ > cutils.c | 103 + > qemu-common.h|3 + > qemu-coroutine.c | 70 + > qemu-coroutine.h | 26 ++ Can we move the code somewhere else? This is not core coroutine infrastructure. I would suggest qemu_socket.h/qemu-sockets.c. Kevin
Re: [Qemu-devel] About hotplug multifunction
- Original Message - > Hi all, I've tested with a WinXP guest, and the multifunction hotplug works. > After reading the pci driver code, I found a problem. > > There is a list for each slot (slot->funcs); > it is initialized in acpiphp_glue.c:register_slot() before hotplugging a > device, > and only one entry (func 0) is added to it; > no new entry is added to the list when hotplugging devices to the > slot. > > When we release the device, there is only _one_ entry in the > list (slot->funcs). This list (slot->funcs) is designed to store the func entries, but it only holds func 0. I changed the # to # in seabios: src/acpi-dsdt.dsl mf hotplug of Windows doesn't work. A Linux guest will only remove the last func; funcs 0~6 still exist in the guest. It seems to be a bug in the Linux pci driver (not calling pci_remove_bus_device() for funcs 1~7). --- a/src/acpi-dsdt.dsl +++ b/src/acpi-dsdt.dsl @@ -130,7 +130,7 @@ DefinitionBlock ( #define hotplug_slot(name, nr) \ Device (S##name) {\ - Name (_ADR, nr##) \ + Name (_ADR, nr##) \ Method (_EJ0,1) { \ Store(ShiftLeft(1, nr), B0EJ) \ Return (0x0) \ @@ -462,7 +462,7 @@ DefinitionBlock ( #define gen_pci_device(name, nr)\ Device(SL##name) { \ -Name (_ADR, nr##) \ +Name (_ADR, nr##) \ Method (_RMV) { \ == I tried to add new entries in acpiphp_glue.c:enable_device() for each func, but it doesn't work. > acpiphp_glue.c:disable_device() > list_for_each_entry(func, &slot->funcs, sibling) { > pdev = pci_get_slot(slot->bridge->pci_bus, > PCI_DEVFN(slot->device, func->function)); > ...release code... // this code can only be executed one time (func > 0) > pci_remove_bus_device(pdev); > } > > bus.c:pci_bus_add_device() is called for each func device in > acpiphp_glue.c:enable_device(). > bus.c:pci_remove_bus_device(pdev) is only called for func 0 in > acpiphp_glue.c:disable_device(). > > > Resolution: (I've tested it, success) > enumerate all the funcs when disabling a device.
> > list_for_each_entry(func, &slot->funcs, sibling) { > for (i=0; i<8; i++) { > pdev = pci_get_slot(slot->bridge->pci_bus, > PCI_DEVFN(slot->device, i)); > ...release code... > pci_remove_bus_device(pdev); > > } > }
[Qemu-devel] [PATCH 1/2] KVM: emulate lapic tsc deadline timer for guest
>From 7b12021e1d1b79797b49e41cc0a7be05a6180d9a Mon Sep 17 00:00:00 2001 From: Liu, Jinsong Date: Tue, 13 Sep 2011 21:52:54 +0800 Subject: [PATCH] KVM: emulate lapic tsc deadline timer for guest This patch emulate lapic tsc deadline timer for guest: Enumerate tsc deadline timer capability by CPUID; Enable tsc deadline timer mode by lapic MMIO; Start tsc deadline timer by WRMSR; Signed-off-by: Liu, Jinsong --- arch/x86/include/asm/apicdef.h|2 + arch/x86/include/asm/cpufeature.h |3 + arch/x86/include/asm/kvm_host.h |2 + arch/x86/include/asm/msr-index.h |2 + arch/x86/kvm/kvm_timer.h |2 + arch/x86/kvm/lapic.c | 122 ++--- arch/x86/kvm/lapic.h |3 + arch/x86/kvm/x86.c| 20 ++- 8 files changed, 132 insertions(+), 24 deletions(-) diff --git a/arch/x86/include/asm/apicdef.h b/arch/x86/include/asm/apicdef.h index 34595d5..3925d80 100644 --- a/arch/x86/include/asm/apicdef.h +++ b/arch/x86/include/asm/apicdef.h @@ -100,7 +100,9 @@ #defineAPIC_TIMER_BASE_CLKIN 0x0 #defineAPIC_TIMER_BASE_TMBASE 0x1 #defineAPIC_TIMER_BASE_DIV 0x2 +#defineAPIC_LVT_TIMER_ONESHOT (0 << 17) #defineAPIC_LVT_TIMER_PERIODIC (1 << 17) +#defineAPIC_LVT_TIMER_TSCDEADLINE (2 << 17) #defineAPIC_LVT_MASKED (1 << 16) #defineAPIC_LVT_LEVEL_TRIGGER (1 << 15) #defineAPIC_LVT_REMOTE_IRR (1 << 14) diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h index 4258aac..8a26e48 100644 --- a/arch/x86/include/asm/cpufeature.h +++ b/arch/x86/include/asm/cpufeature.h @@ -120,6 +120,7 @@ #define X86_FEATURE_X2APIC (4*32+21) /* x2APIC */ #define X86_FEATURE_MOVBE (4*32+22) /* MOVBE instruction */ #define X86_FEATURE_POPCNT (4*32+23) /* POPCNT instruction */ +#define X86_FEATURE_TSC_DEADLINE_TIMER(4*32+24) /* Tsc deadline timer */ #define X86_FEATURE_AES(4*32+25) /* AES instructions */ #define X86_FEATURE_XSAVE (4*32+26) /* XSAVE/XRSTOR/XSETBV/XGETBV */ #define X86_FEATURE_OSXSAVE(4*32+27) /* "" XSAVE enabled in the OS */ @@ -284,6 +285,8 @@ extern const char * const x86_power_flags[32]; #define 
cpu_has_xmm4_1 boot_cpu_has(X86_FEATURE_XMM4_1) #define cpu_has_xmm4_2 boot_cpu_has(X86_FEATURE_XMM4_2) #define cpu_has_x2apic boot_cpu_has(X86_FEATURE_X2APIC) +#define cpu_has_tsc_deadline_timer \ + boot_cpu_has(X86_FEATURE_TSC_DEADLINE_TIMER) #define cpu_has_xsave boot_cpu_has(X86_FEATURE_XSAVE) #define cpu_has_hypervisor boot_cpu_has(X86_FEATURE_HYPERVISOR) #define cpu_has_pclmulqdq boot_cpu_has(X86_FEATURE_PCLMULQDQ) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 307e3cf..2ce6529 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -671,6 +671,8 @@ u8 kvm_get_guest_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn); extern bool tdp_enabled; +extern u64 vcpu_tsc_khz(struct kvm_vcpu *vcpu); + /* control of guest tsc rate supported? */ extern bool kvm_has_tsc_control; /* minimum supported tsc_khz for guests */ diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h index d52609a..a6962d9 100644 --- a/arch/x86/include/asm/msr-index.h +++ b/arch/x86/include/asm/msr-index.h @@ -229,6 +229,8 @@ #define MSR_IA32_APICBASE_ENABLE (1<<11) #define MSR_IA32_APICBASE_BASE (0xf<<12) +#define MSR_IA32_TSCDEADLINE 0x06e0 + #define MSR_IA32_UCODE_WRITE 0x0079 #define MSR_IA32_UCODE_REV 0x008b diff --git a/arch/x86/kvm/kvm_timer.h b/arch/x86/kvm/kvm_timer.h index 64bc6ea..497dbaa 100644 --- a/arch/x86/kvm/kvm_timer.h +++ b/arch/x86/kvm/kvm_timer.h @@ -2,6 +2,8 @@ struct kvm_timer { struct hrtimer timer; s64 period; /* unit: ns */ + u32 timer_mode_mask; + u64 tscdeadline; atomic_t pending; /* accumulated triggered timers */ bool reinject; struct kvm_timer_ops *t_ops; diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 2b2255b..925d4b9 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -135,9 +135,23 @@ static inline int apic_lvt_vector(struct kvm_lapic *apic, int lvt_type) return apic_get_reg(apic, lvt_type) & APIC_VECTOR_MASK; } +static inline int 
apic_lvtt_oneshot(struct kvm_lapic *apic) +{ + return ((apic_get_reg(apic, APIC_LVTT) & + apic->lapic_timer.timer_mode_mask) == APIC_LVT_TIMER_ONESHOT); +} + static inline int apic_lvtt_period(struct kvm_lapic *apic) { - return apic_get_reg(apic, APIC_LVTT) & APIC_LVT_TIMER_PERIODIC; +
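The core arithmetic this patch implements, arming a one-shot timer for the distance between the guest's current TSC value and the deadline written via WRMSR, can be sketched as follows. This is an illustration of the conversion only, not the actual KVM code; the function name is made up:

```python
def tscdeadline_to_ns(guest_tsc_now, tscdeadline, tsc_khz):
    """Convert a guest TSC deadline into a host timer delay in ns.

    A TSC-deadline timer fires when the guest TSC reaches the value
    written to MSR_IA32_TSCDEADLINE.  The emulation computes the
    remaining ticks and scales by the vcpu TSC frequency (tsc_khz is
    ticks per millisecond) to arm an hrtimer for that many ns.
    """
    if tscdeadline <= guest_tsc_now:
        return 0  # deadline already passed: inject the interrupt now
    delta = tscdeadline - guest_tsc_now
    return delta * 1_000_000 // tsc_khz
```

For example, on a 2 GHz vcpu (tsc_khz = 2_000_000), a deadline two billion ticks in the future arms a timer for one second.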
[Qemu-devel] [PATCH 2/2] Qemu co-operation with kvm tsc deadline timer
>From c1b502d6548fcc41592cd90acc82109ee949df75 Mon Sep 17 00:00:00 2001
From: Liu, Jinsong
Date: Tue, 13 Sep 2011 22:05:30 +0800
Subject: [PATCH] Qemu co-operation with kvm tsc deadline timer

KVM add emulation of lapic tsc deadline timer for guest.
This patch is co-operation work at qemu side.

Signed-off-by: Liu, Jinsong
---
 target-i386/cpu.h |    2 ++
 target-i386/kvm.c |   14 ++++++++++++++
 2 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index 935d08a..62ff73c 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -283,6 +283,7 @@
 #define MSR_IA32_APICBASE_BSP           (1<<8)
 #define MSR_IA32_APICBASE_ENABLE        (1<<11)
 #define MSR_IA32_APICBASE_BASE          (0xf<<12)
+#define MSR_IA32_TSCDEADLINE            0x6e0

 #define MSR_MTRRcap                     0xfe
 #define MSR_MTRRcap_VCNT                8
@@ -687,6 +688,7 @@ typedef struct CPUX86State {
     uint64_t async_pf_en_msr;

     uint64_t tsc;
+    uint64_t tsc_deadline;

     uint64_t mcg_status;

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index aa843f0..206fcad 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -59,6 +59,7 @@ const KVMCapabilityInfo kvm_arch_required_capabilities[] = {

 static bool has_msr_star;
 static bool has_msr_hsave_pa;
+static bool has_msr_tsc_deadline;
 static bool has_msr_async_pf_en;
 static int lm_capable_kernel;

@@ -571,6 +572,10 @@ static int kvm_get_supported_msrs(KVMState *s)
             has_msr_hsave_pa = true;
             continue;
         }
+        if (kvm_msr_list->indices[i] == MSR_IA32_TSCDEADLINE) {
+            has_msr_tsc_deadline = true;
+            continue;
+        }
     }
 }

@@ -899,6 +904,9 @@ static int kvm_put_msrs(CPUState *env, int level)
     if (has_msr_hsave_pa) {
         kvm_msr_entry_set(&msrs[n++], MSR_VM_HSAVE_PA, env->vm_hsave);
     }
+    if (has_msr_tsc_deadline) {
+        kvm_msr_entry_set(&msrs[n++], MSR_IA32_TSCDEADLINE, env->tsc_deadline);
+    }
 #ifdef TARGET_X86_64
     if (lm_capable_kernel) {
         kvm_msr_entry_set(&msrs[n++], MSR_CSTAR, env->cstar);
@@ -1145,6 +1153,9 @@ static int kvm_get_msrs(CPUState *env)
     if (has_msr_hsave_pa) {
         msrs[n++].index = MSR_VM_HSAVE_PA;
     }
+    if (has_msr_tsc_deadline) {
+        msrs[n++].index = MSR_IA32_TSCDEADLINE;
+    }

     if (!env->tsc_valid) {
         msrs[n++].index = MSR_IA32_TSC;
@@ -1213,6 +1224,9 @@ static int kvm_get_msrs(CPUState *env)
     case MSR_IA32_TSC:
         env->tsc = msrs[i].data;
         break;
+    case MSR_IA32_TSCDEADLINE:
+        env->tsc_deadline = msrs[i].data;
+        break;
     case MSR_VM_HSAVE_PA:
         env->vm_hsave = msrs[i].data;
         break;
--
1.6.5.6

qemu-lapic-tsc-deadline-timer.patch Description: qemu-lapic-tsc-deadline-timer.patch
Re: [Qemu-devel] [PATCH][RFC][0/2] REF+/REF- optimization
2011/9/13 Kevin Wolf :
> Am 13.09.2011 09:53, schrieb Frediano Ziglio:
>> These patches try to trade off between leaks and speed for cluster
>> refcounts.
>>
>> Refcount increments (REF+ or refp) are handled in a different way from
>> decrements (REF- or refm). The reason is that posting or not flushing
>> a REF- causes "just" a leak, while posting a REF+ causes a corruption.
>>
>> To optimize REF- I just used an array to store offsets; then, when a
>> flush is requested or the array reaches a limit (currently 1022), the
>> array is sorted and written to disk. I use an array of offsets instead
>> of ranges to support compression (an offset could appear multiple
>> times in the array).
>> I consider this patch quite ready.
>
> Ok, first of all let's clarify what this optimises. I don't think it
> changes anything at all for the writeback cache modes, because these
> already do most operations in memory only. So this must be about
> optimising some operations with cache=writethrough. REF- isn't about
> normal cluster allocation, it is about COW with internal snapshots or
> bdrv_discard. Do you have benchmarks for any of them?
>
> I strongly disagree with your approach for REF-. We already have a
> cache, and introducing a second one sounds like a bad idea. I think we
> could get a very similar effect if we introduced a
> qcow2_cache_entry_mark_dirty_wb() that marks a given refcount block as
> dirty, but at the same time tells the cache that even in write-through
> mode it can still treat this block as write-back. This should require
> much less code changes.
>
> But let's measure the effects first, I suspect that for cluster
> allocation it doesn't help much because every REF- comes with a REF+.
>
>> To optimize REF+ I mark a range as allocated and use this range to
>> get new ones (avoiding writing refcounts to disk). When a flush is
>> requested, or in some situations (like snapshots), this cache is
>> disabled and flushed (written as REF-).
>> I do not consider this patch ready; it works and passes all io-tests,
>> but for instance I would avoid allocating new clusters for refcounts
>> during preallocation.
>
> The only question here is if improving cache=writethrough cluster
> allocation performance is worth the additional complexity in the
> already complex refcounting code.
>
> The alternative that was discussed before is the dirty bit approach
> that is used in QED and would allow us to use writeback for all
> refcount blocks, regardless of REF- or REF+. It would be an easier
> approach requiring fewer code changes, but it comes with the cost of
> requiring an fsck after a qemu crash.
>
>> End speed up is quite visible allocating clusters (more than 20%).
>
> What benchmark do you use for testing this?
>
> Kevin

Here are some results (KB/s); qcow2s is qcow2 with a snapshot:

with patches (ref-, ref+)
  run    raw     qcow2   qcow2s
  1      22748   4878    4792
  2      29557   15839   23144

without
  run    raw     qcow2   qcow2s
  1      21979   4308    1021
  2      26249   13776   24182

Frediano
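The posted-REF- scheme described in the cover letter (buffer decrements as raw cluster offsets, then sort and apply them in one pass) can be sketched as a simplified in-memory model. The class shape and the dict-based refcount table are illustrative assumptions, not the actual qcow2 code:

```python
class RefMinusCache:
    """Sketch of the posted REF- idea: buffer refcount decrements as raw
    cluster offsets (duplicates allowed, which keeps compressed clusters
    working), then sort and apply them in one pass on flush or when the
    buffer fills up."""

    def __init__(self, refcounts, limit=1022):
        self.refcounts = refcounts    # stand-in for the on-disk table
        self.pending = []
        self.limit = limit
        self.flushes = 0              # how many write-out passes happened

    def ref_minus(self, offset):
        self.pending.append(offset)
        if len(self.pending) >= self.limit:
            self.flush()

    def flush(self):
        if not self.pending:
            return
        # Sorting means the eventual refcount-table writes are sequential.
        for off in sorted(self.pending):
            self.refcounts[off] -= 1
        self.pending.clear()
        self.flushes += 1
```

Deferring only REF- this way matches the thread's safety argument: an unflushed decrement costs at most a leaked cluster, never a corruption.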
Re: [Qemu-devel] [PATCH 2/2] Qemu co-operation with kvm tsc deadline timer
On 2011-09-13 16:38, Liu, Jinsong wrote: > From c1b502d6548fcc41592cd90acc82109ee949df75 Mon Sep 17 00:00:00 2001 > From: Liu, Jinsong > Date: Tue, 13 Sep 2011 22:05:30 +0800 > Subject: [PATCH] Qemu co-operation with kvm tsc deadline timer > > KVM add emulation of lapic tsc deadline timer for guest. > This patch is co-operation work at qemu side. > > Signed-off-by: Liu, Jinsong > --- > target-i386/cpu.h |2 ++ > target-i386/kvm.c | 14 ++ > 2 files changed, 16 insertions(+), 0 deletions(-) > > diff --git a/target-i386/cpu.h b/target-i386/cpu.h > index 935d08a..62ff73c 100644 > --- a/target-i386/cpu.h > +++ b/target-i386/cpu.h > @@ -283,6 +283,7 @@ > #define MSR_IA32_APICBASE_BSP (1<<8) > #define MSR_IA32_APICBASE_ENABLE(1<<11) > #define MSR_IA32_APICBASE_BASE (0xf<<12) > +#define MSR_IA32_TSCDEADLINE0x6e0 > > #define MSR_MTRRcap 0xfe > #define MSR_MTRRcap_VCNT 8 > @@ -687,6 +688,7 @@ typedef struct CPUX86State { > uint64_t async_pf_en_msr; > > uint64_t tsc; > +uint64_t tsc_deadline; This field has to be saved/restored for snapshots/migrations. Frankly, I've no clue right now if substates are in vogue again (they had problems in their binary format) or if you can simply add a versioned top-level field and bump the CPUState version number. 
> > uint64_t mcg_status; > > diff --git a/target-i386/kvm.c b/target-i386/kvm.c > index aa843f0..206fcad 100644 > --- a/target-i386/kvm.c > +++ b/target-i386/kvm.c > @@ -59,6 +59,7 @@ const KVMCapabilityInfo kvm_arch_required_capabilities[] = { > > static bool has_msr_star; > static bool has_msr_hsave_pa; > +static bool has_msr_tsc_deadline; > static bool has_msr_async_pf_en; > static int lm_capable_kernel; > > @@ -571,6 +572,10 @@ static int kvm_get_supported_msrs(KVMState *s) > has_msr_hsave_pa = true; > continue; > } > +if (kvm_msr_list->indices[i] == MSR_IA32_TSCDEADLINE) { > +has_msr_tsc_deadline = true; > +continue; > +} > } > } > > @@ -899,6 +904,9 @@ static int kvm_put_msrs(CPUState *env, int level) > if (has_msr_hsave_pa) { > kvm_msr_entry_set(&msrs[n++], MSR_VM_HSAVE_PA, env->vm_hsave); > } > +if (has_msr_tsc_deadline) { > +kvm_msr_entry_set(&msrs[n++], MSR_IA32_TSCDEADLINE, > env->tsc_deadline); > +} > #ifdef TARGET_X86_64 > if (lm_capable_kernel) { > kvm_msr_entry_set(&msrs[n++], MSR_CSTAR, env->cstar); > @@ -1145,6 +1153,9 @@ static int kvm_get_msrs(CPUState *env) > if (has_msr_hsave_pa) { > msrs[n++].index = MSR_VM_HSAVE_PA; > } > +if (has_msr_tsc_deadline) { > +msrs[n++].index = MSR_IA32_TSCDEADLINE; > +} > > if (!env->tsc_valid) { > msrs[n++].index = MSR_IA32_TSC; > @@ -1213,6 +1224,9 @@ static int kvm_get_msrs(CPUState *env) > case MSR_IA32_TSC: > env->tsc = msrs[i].data; > break; > +case MSR_IA32_TSCDEADLINE: > +env->tsc_deadline = msrs[i].data; > +break; > case MSR_VM_HSAVE_PA: > env->vm_hsave = msrs[i].data; > break; Just to double check: This feature is exposed to the guest when A) the host CPU supports it and B) QEMU passed down guest CPU specifications (cpuid data) that allow it as well? Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux
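The migration concern Jan raises (a new tsc_deadline field must be carried in the saved CPU state without breaking older streams) boils down to version-gated fields. Here is a toy model of the versioned-field idea only; it is not QEMU's actual VMState API:

```python
def load_versioned_fields(field_specs, stream_version, values):
    """Load a flat list of saved values into named fields, skipping any
    field that did not exist at the stream's version (such a field keeps
    a default of 0 instead of consuming a value).

    field_specs: list of (name, min_version) pairs in save order.
    """
    state = {}
    vals = list(values)
    for name, min_version in field_specs:
        state[name] = vals.pop(0) if stream_version >= min_version else 0
    return state
```

With fields [("tsc", 1), ("tsc_deadline", 2)], a version-1 stream containing only the tsc value still loads, and tsc_deadline falls back to 0; a version-2 stream carries both.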
Re: [Qemu-devel] [PATCH 04/12] nbd: add support for NBD_CMD_FLUSH
On 09/13/2011 03:52 PM, Kevin Wolf wrote:
> > +
> > +if (reply.error !=0)
>
> Missing space (this is not for consistency, right?)

Well, cut and paste implies consistency. :) Will fix.

Paolo
Re: [Qemu-devel] [PATCH 06/12] nbd: support NBD_CMD_TRIM in the server
On 09/13/2011 03:58 PM, Kevin Wolf wrote:
> > +case NBD_CMD_TRIM:
> > +TRACE("Request type is TRIM");
> > +bdrv_discard(bs, (request.from + dev_offset) / 512,
> > + request.len / 512);
>
> Errors are completely ignored? Does the NBD protocol not allow to
> return an error?

Actually it does, will fix.

Paolo
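The fix Paolo agrees to make, propagating the bdrv_discard result into the NBD reply instead of dropping it, amounts to something like the following. This is a sketch in Python rather than the server's C, and the handler shape is a hypothetical stand-in:

```python
def handle_trim(discard, request_from, request_len, dev_offset=0):
    """Handle NBD_CMD_TRIM: forward the discard to the block layer and
    return the value for the NBD reply's error field (0 on success, a
    positive errno on failure) instead of ignoring the return code."""
    ret = discard((request_from + dev_offset) // 512, request_len // 512)
    return -ret if ret < 0 else 0
```

A discard backend that fails with -EIO (-5) would then surface error 5 in the reply rather than reporting success.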
Re: [Qemu-devel] [PATCH v2 09/12] sheepdog: move coroutine send/recv function to generic code
On 09/13/2011 04:14 PM, Kevin Wolf wrote: >block/sheepdog.c | 225 ++ >cutils.c | 103 + >qemu-common.h|3 + >qemu-coroutine.c | 70 + >qemu-coroutine.h | 26 ++ Can we move the code somewhere else? This is not core coroutine infrastructure. I would suggest qemu_socket.h/qemu-sockets.c. It's not really socket-specific either (it uses recv/send only because of Windows brokenness---it could use read/write if it wasn't for that). I hoped sooner or later it could become a qemu_co_readv/writev, hence the choice of qemu-coroutine.c. Paolo ps: I also hope that the Earth will start spinning slower and will give me 32 hour days, so just tell me if you really want that outside qemu-coroutine.c.
Re: [Qemu-devel] [PATCH v2 09/12] sheepdog: move coroutine send/recv function to generic code
Am 13.09.2011 17:16, schrieb Paolo Bonzini: > On 09/13/2011 04:14 PM, Kevin Wolf wrote: block/sheepdog.c | 225 ++ cutils.c | 103 + qemu-common.h|3 + qemu-coroutine.c | 70 + qemu-coroutine.h | 26 ++ >> >> Can we move the code somewhere else? This is not core coroutine >> infrastructure. I would suggest qemu_socket.h/qemu-sockets.c. > > It's not really socket-specific either (it uses recv/send only because > of Windows brokenness---it could use read/write if it wasn't for that). > I hoped sooner or later it could become a qemu_co_readv/writev, hence > the choice of qemu-coroutine.c. > > Paolo > > ps: I also hope that the Earth will start spinning slower and will give > me 32 hour days, so just tell me if you really want that outside > qemu-coroutine.c. Yes, I do want it outside qemu-coroutine.c. If you prefer putting it next to qemu_write_full() and friends rather than into the sockets file, feel free to do that. Kevin
Re: [Qemu-devel] [PATCH v2 09/12] sheepdog: move coroutine send/recv function to generic code
On 09/13/2011 05:36 PM, Kevin Wolf wrote: If you prefer putting it next to qemu_write_full() and friends rather than into the sockets file, feel free to do that. Yes, that makes good sense. Paolo
Re: [Qemu-devel] Armel host (x86 emul.) img disk not writable
Hi all,

Apologies ... I had tried the qemu installed via apt-get (0.12), and it seems it did not match the kernel (the QNAP firmware and the chroot are not at the same level); generally everything works fine that way (VirtualBox, for example), and only kernel modules, if needed, have to be rebuilt ...

So I downloaded 0.15 and compiled it myself (even if I built some platforms I don't use, but I will look into that later). Now I can write to the disk image. I will start with a Win98 install (on an ARM SoC) ... and see what happens.

So sorry for this mess. I hope to post something of more value in the future ...

Philippe.

Le 13/09/2011 08:52, Father Mande a écrit :
> [...]
[Qemu-devel] [PATCH] fix compilation with stderr trace backend
Signed-off-by: Paolo Bonzini
---
 trace-events |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/trace-events b/trace-events
index a8e7684..1e9e717 100644
--- a/trace-events
+++ b/trace-events
@@ -454,7 +454,7 @@ milkymist_vgafb_memory_write(uint32_t addr, uint32_t value) "addr %08x value %08
 mipsnet_send(uint32_t size) "sending len=%u"
 mipsnet_receive(uint32_t size) "receiving len=%u"
 mipsnet_read(uint64_t addr, uint32_t val) "read addr=0x%" PRIx64 " val=0x%x"
-mipsnet_write(uint64_t addr, uint64_t val) "write addr=0x%" PRIx64 " val=0x%" PRIx64
+mipsnet_write(uint64_t addr, uint64_t val) "write addr=0x%" PRIx64 " val=0x%" PRIx64 ""
 mipsnet_irq(uint32_t isr, uint32_t intctl) "set irq to %d (%02x)"

 # xen-all.c
--
1.7.6
Re: [Qemu-devel] Using the qemu tracepoints with SystemTap
On 09/13/2011 06:03 AM, Stefan Hajnoczi wrote: > On Mon, Sep 12, 2011 at 4:33 PM, William Cohen wrote: >> The RHEL-6 version of qemu-kvm makes the tracepoints available to SystemTap. >> I have been working on useful examples for the SystemTap tracepoints in >> qemu. There doesn't seem to be a great number of examples showing the >> utility of the tracepoints in diagnosing problems. However, I came across >> the following blog entry that had several examples: >> >> http://blog.vmsplice.net/2011/03/how-to-write-trace-analysis-scripts-for.html >> >> I reimplemented the VirtqueueRequestTracker example from the blog in >> SystemTap (the attached virtqueueleaks.stp). I can run it on RHEL-6's >> qemu-kvm-0.12.1.2-2.160.el6_1.8.x86_64 and get output like the following. It >> outputs the pid and the address of the elem that leaked when the script is >> stopped like the following: >> >> $ stap virtqueueleaks.stp >> ^C >> pid elem >> 19503 1c4af28 >> 19503 1c56f88 >> 19503 1c62fe8 >> 19503 1c6f048 >> 19503 1c7b0a8 >> 19503 1c87108 >> 19503 1c93168 >> ... >> >> I am not that familiar with the internals of qemu. The script seems to >> indicates qemu is leaking, but is that really the case? If there are >> resource leaks, what output would help debug those leaks? What enhancements >> can be done to this script to provide more useful information? > Hi Stefan, Thanks for the comments. > Leak tracing always has corner cases :). > > With virtio-blk this would indicate a leak because it uses a > request-response model where the guest initiates I/O and the host > responds. A guest that cleanly shuts down before you exit your > SystemTap script should not leak requests for virtio-blk. I stopped the systemtap script while the guest vm was still running. So when the guest vm cleanly shuts down there should be a series of virtqueue_fill operations that will remove those elements? Qemu uses a thread for each virtual processor, but a single thread to handle all IO. 
It seems like that might be a possible bottleneck. What would be the path of an io event from guest to host back to guest? Is there something that a script could do to gauge the delay due to the qemu io thread handling multiple processors?

>
> With virtio-net the guest actually hands the host receive buffers and
> they are held until we can receive packets into them and return them
> to the host. We don't have a virtio_reset trace event, and due to
> this we're not accounting for clean shutdown (the guest driver resets
> the device to clear all virtqueues).
>
> I am submitting a patch to add virtio_reset() tracing. This will
> allow the script to delete all elements belonging to this virtio
> device.
>
>> Are there other examples of qemu probing people would like to see?
>
> The malloc/realloc/memalign/vmalloc/free/vfree trace events can be
> used for a few things:
> * Finding memory leaks.
> * Finding malloc/vfree or vmalloc/free mismatches. The rules are:
> malloc/realloc need free, memalign/vmalloc need vfree. They cannot be
> mixed.
>
> Stefan

As a quick and simple experiment to see how often various probes are getting hit I used the following script on RHEL-6 (the probe points are a bit different on Fedora):

global counts
probe qemu.*.*? { counts[pn()]++ }
probe end {
    foreach (n+ in counts)
        printf("%s = %d\n", n, counts[n])
}

For starting up a Fedora 14 guest virtual machine and shutting it down I got the following output:

$ stap ~/research/profiling/examples/qemu_count.s
^C
qemu.kvm.balloon_event = 1
qemu.kvm.bdrv_aio_multiwrite = 155
qemu.kvm.bdrv_aio_readv = 13284
qemu.kvm.bdrv_aio_writev = 998
qemu.kvm.cpu_get_apic_base = 20
qemu.kvm.cpu_in = 94082
qemu.kvm.cpu_out = 165789
qemu.kvm.cpu_set_apic_base = 445752
qemu.kvm.multiwrite_cb = 654
qemu.kvm.paio_submit = 7141
qemu.kvm.qemu_free = 677704
qemu.kvm.qemu_malloc = 683031
qemu.kvm.qemu_memalign = 285
qemu.kvm.qemu_realloc = 47550
qemu.kvm.virtio_blk_handle_write = 504
qemu.kvm.virtio_blk_req_complete = 7146
qemu.kvm.virtio_blk_rw_complete = 7146
qemu.kvm.virtio_notify = 6574
qemu.kvm.virtio_queue_notify = 6680
qemu.kvm.virtqueue_fill = 7146
qemu.kvm.virtqueue_flush = 7146
qemu.kvm.virtqueue_pop = 7147
qemu.kvm.vm_state_notify = 1

I see a lot of qemu.kvm.qemu_malloc events. That is likely more than SystemTap can track if there are thousands of them live at the same time. There are no qemu_vmalloc events because of https://bugzilla.redhat.com/show_bug.cgi?id=714773.

Should the qemu.kvm.cpu_in and qemu.kvm.cpu_out counts match up? There are a lot more qemu.kvm.cpu_out than qemu.kvm.cpu_in.

Note that qemu.kvm.virtio_blk_req_complete, qemu.kvm.virtio_blk_rw_complete, qemu.kvm.virtqueue_fill, and qemu.kvm.virtqueue_flush all have the same count, 7146. The qemu.kvm.virtqueue_pop count is close, at 7147.

-Will
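The virtqueueleaks.stp bookkeeping discussed in this thread (pairing each virtqueue_pop with a virtqueue_fill and reporting whatever never came back) can be modeled compactly. This illustrates the accounting only, not the SystemTap script itself:

```python
class VirtqueueRequestTracker:
    """Sketch of the leak-tracing idea: remember each element address
    seen at virtqueue_pop and forget it again at virtqueue_fill;
    anything left over at the end is a candidate leak."""

    def __init__(self):
        self.live = set()

    def pop(self, elem):
        self.live.add(elem)      # request handed to the device model

    def fill(self, elem):
        self.live.discard(elem)  # request completed back to the guest

    def leaks(self):
        return sorted(self.live)
```

Anything still live when tracing stops is either a real leak or simply an in-flight or still-posted buffer, which is why the virtio_reset trace event Stefan mentions matters for clean accounting at guest shutdown.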
[Qemu-devel] [Bug 818673] Re: virtio: trying to map MMIO memory
I've made several unsuccessful attempts to reproduce this problem, running VMs on top of F14 and RHEL6.2 The only one relatively close problem, reported by our QE, was https://bugzilla.redhat.com/show_bug.cgi?id=727034 It must be fixed in our internal repository. (Public repository is out of sync, but I'm going to update it soon) I put our recent (WHQL candidates) drivers here: http://people.redhat.com/vrozenfe/virtio-win-prewhql-0.1.zip Please give them a try and share your experience. Best, Vadim. ** Bug watch added: Red Hat Bugzilla #727034 https://bugzilla.redhat.com/show_bug.cgi?id=727034 -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/818673 Title: virtio: trying to map MMIO memory Status in QEMU: New Bug description: Qemu host is Core i7, running Linux. Guest is Windows XP sp3. Often, qemu will crash shortly after starting (1-5 minutes) with a statement "qemu-system-x86_64: virtio: trying to map MMIO memory" This has occured with qemu-kvm 0.14, qemu-kvm 0.14.1, qemu-0.15.0-rc0 and qemu 0.15.0-rc1. Qemu is started as such: qemu-system-x86_64 -cpu host -enable-kvm -pidfile /home/rick/qemu/hds/wxp.pid -drive file=/home/rick/qemu/hds/wxp.raw,if=virtio -m 768 -name WinXP -net nic,model=virtio -net user -localtime -usb -vga qxl -device virtio-serial -chardev spicevmc,name=vdagent,id=vdagent -device virtserialport,chardev=vdagent,name=com.redhat.spice.0 -spice port=1234,disable-ticketing -daemonize -monitor telnet:localhost:12341,server,nowait The WXP guest has virtio 1.1.16 drivers for net and scsi, and the most current spice binaries from spice-space.org. To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/818673/+subscriptions
Re: [Qemu-devel] [Qemu-ppc] [RESEND][PATCH] booke timers
On 09/13/2011 08:08 AM, Alexander Graf wrote: > Fabien Chouteau wrote: >> static void booke_update_irq(CPUState *env) >> { >> ppc_set_irq(env, PPC_INTERRUPT_DECR, >> (env->spr[SPR_BOOKE_TSR] & TSR_DIS >> && env->spr[SPR_BOOKE_TCR] & TCR_DIE)); >> >> ppc_set_irq(env, PPC_INTERRUPT_WDT, >> (env->spr[SPR_BOOKE_TSR] & TSR_WIS >> && env->spr[SPR_BOOKE_TCR] & TCR_WIE)); >> >> ppc_set_irq(env, PPC_INTERRUPT_FIT, >> (env->spr[SPR_BOOKE_TSR] & TSR_FIS >> && env->spr[SPR_BOOKE_TCR] & TCR_FIE)); >> } >> > > Awesome! Please also check on MSR.EE and send a new patch then :) If you check on EE here, then you'll need to call booke_update_irq() when EE changes (not sure whether that's the plan). Another option would be to unset the irq if the condition is not valid. This would also be better in that you could have all three set (DIS, DIE, EE) and not deliver the interrupt because there's a higher priority exception. -Scott
Re: [Qemu-devel] [Qemu-ppc] [RESEND][PATCH] booke timers
Am 13.09.2011 um 18:44 schrieb Scott Wood : > On 09/13/2011 08:08 AM, Alexander Graf wrote: >> Fabien Chouteau wrote: >>> static void booke_update_irq(CPUState *env) >>> { >>>ppc_set_irq(env, PPC_INTERRUPT_DECR, >>>(env->spr[SPR_BOOKE_TSR] & TSR_DIS >>> && env->spr[SPR_BOOKE_TCR] & TCR_DIE)); >>> >>>ppc_set_irq(env, PPC_INTERRUPT_WDT, >>>(env->spr[SPR_BOOKE_TSR] & TSR_WIS >>> && env->spr[SPR_BOOKE_TCR] & TCR_WIE)); >>> >>>ppc_set_irq(env, PPC_INTERRUPT_FIT, >>>(env->spr[SPR_BOOKE_TSR] & TSR_FIS >>> && env->spr[SPR_BOOKE_TCR] & TCR_FIE)); >>> } >>> >> >> Awesome! Please also check on MSR.EE and send a new patch then :) > > If you check on EE here, then you'll need to call booke_update_irq() > when EE changes (not sure whether that's the plan). Another option > would be to unset the irq if the condition is not valid. This would > also be better in that you could have all three set (DIS, DIE, EE) and > not deliver the interrupt because there's a higher priority exception. Yup, which is what the patch actually does, so sorry for the fuss :). The subtile parts are in a different function and by lowering the irq line when TSR or TCR get set, we're good. Alex >
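The level computation under discussion reduces to ANDing each timer's status bit (TSR) with its enable bit (TCR); the open question in the thread is only where MSR.EE gets checked. A minimal model of the update function, with the bits represented as named flags rather than the real register encoding (this is not the QEMU code):

```python
def booke_update_irq(tsr, tcr):
    """Compute the level of each BookE timer interrupt line from the
    status (TSR) and enable (TCR) bits.  MSR.EE deliberately does not
    appear here: as the thread concludes, gating on EE belongs at
    interrupt delivery time, while the line itself simply follows
    TSR & TCR and is lowered again when a TSR/TCR write clears the
    condition."""
    return {
        "DECR": tsr["DIS"] and tcr["DIE"],  # decrementer
        "WDT":  tsr["WIS"] and tcr["WIE"],  # watchdog
        "FIT":  tsr["FIS"] and tcr["FIE"],  # fixed-interval timer
    }
```

Keeping EE out of this function means a pending-but-masked timer stays asserted on the line and is delivered as soon as EE is set, matching the priority argument Scott makes.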
Re: [Qemu-devel] qemu virtIO blocking operation - question
>> >> We are trying to paravirtualize the IPMI device (/dev/ipmi0).
>
> From http://en.wikipedia.org/wiki/Intelligent_Platform_Management_Interface:
> "An implementation of IPMI version 1.5 can communicate via a direct
> serial connection or via a side-band local area network (LAN)
> connection to a remote client."
>
> Why do you need a new virtio device? Can't you use virtio-serial?
> This is what other management channels are using for host<->guest
> agents.

It might be possible. However, we are doing it this way because:

(a) I am not sure of the interactions with the real ipmi device on the host when the device is shared with multiple guests (we are not using pci passthrough to attach the device to a single guest). The device itself is stateless; that is, it cannot handle multiple requests at one time. When multiple users use the device, the driver has to serialize the requests and send them to the device. The existing ipmi driver within Linux (drivers/char/ipmi) does that now. But this driver on one guest would not know of the existence of another guest. Maybe it would have to be reworked to work with virtio-serial. Dunno. So we wanted to keep this driver as-is on the host and build a lightweight interface layer in the guest that can talk to the real driver on the host. Multiple guests would then be like multiple processes accessing the device.

(b) We wanted to gain some experience paravirtualizing devices this way through virtio, since we have other proprietary hardware that needs to be paravirtualized.

Makes sense?

Ani

The information contained in this message may be privileged and confidential and protected from disclosure. If the reader of this message is not the intended recipient, or an employee or agent responsible for delivering this message to the intended recipient, you are hereby notified that any reproduction, dissemination or distribution of this communication is strictly prohibited.
If you have received this communication in error, please notify us immediately by replying to the message and deleting it from your computer. Thank you. Tellabs
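The serialization requirement Ani describes, a stateless device that can service only one request at a time while being shared by several guests, is the classic single-worker pattern. A sketch of that pattern only, not the actual ipmi driver:

```python
import queue
import threading

class SerializedDevice:
    """Sketch: many clients (guests) submit requests concurrently, but a
    single worker thread feeds them to the device one at a time, the way
    drivers/char/ipmi serializes requests from multiple host users."""

    def __init__(self, handler):
        self.requests = queue.Queue()   # pending requests, in order
        self.handler = handler          # the "device": one request at a time
        threading.Thread(target=self._worker, daemon=True).start()

    def submit(self, req):
        done = threading.Event()
        box = {}
        self.requests.put((req, box, done))
        done.wait()                     # block until our turn completes
        return box["reply"]

    def _worker(self):
        while True:
            req, box, done = self.requests.get()
            box["reply"] = self.handler(req)  # strictly serialized
            done.set()
```

In the paravirtualized design, each guest's lightweight frontend would play the role of one client here, and the host driver keeps its existing single-device serialization.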
Re: [Qemu-devel] [Qemu-ppc] [PATCH] PPC: Fix via-cuda memory registration
On Mon, Sep 12, 2011 at 9:05 PM, Anthony Liguori wrote: > On 09/12/2011 08:53 AM, Avi Kivity wrote: >> >> On 09/12/2011 04:46 PM, Lucas Meneghel Rodrigues wrote: >>> >>> On 09/12/2011 06:07 AM, Avi Kivity wrote: On 09/11/2011 02:38 PM, Alexander Graf wrote: > > Am 11.09.2011 um 12:41 schrieb Avi Kivity: > > > On 09/08/2011 07:54 PM, Alexander Graf wrote: > >> PS: Please test your patches. This one could have been found with > an invocation > >> as simple as "qemu-system-ppc". We boot into the OpenBIOS prompt by > default, > >> so you wouldn't even have required a guest image or kernel. > >> > > > > > > Sorry about that. > > > > Note that it's pretty hard to test these patches. I often don't even > know which binary as the device->target relationship is not > immediately visible, > > The patch was explicitly to convert ppc ;). Yes, in this case. Not in the general case. > > and I don't really know what to expect from the guest. > > The very easy check-fundamentals thing to do for ppc is to execute > qemu-system-ppc without arguments. It should drop you into an OF > prompt. Both memory api bugs on ppc I've seen now would have been > exposed with that. > > I agree that we should have something slightly more sophisticated, but > doing such a bare minimum test is almost for free to the tester and > covers at least basic functionality :). I don't mind people > introducibg subtle bugs in corner cases - these things happen. But an > abort() when you execute the binary? That really shouldn't happen > ever. This one is almost as bad. Yeah. > > It would be best if we had a kvm-autotest testset for tcg, it would > probably run in just a few minutes and increase confidence in these > patches. > > Yeah, I am using kvm-autotest today for regression testing, but it's > very hard to tell it to run multiple different binaries. The target > program variable can only be set for an execution job, making it > impossible to run multiple targets in one autotest run. 
>>> >>> Alexander, I've started to work on this, I'm clearing out my request >>> list, last week I've implemented ticket 50, that was related to >>> special block configuration for the tests, now I want to make it >>> possible to support multiple binaries. >>> Probably best to tell autotest about the directory, and let it pick up the binary. Still need some configuration to choose between qemu-kvm and qemu-system-x86_64. Lucas? >>> >>> Yes, that would also work, having different variants with different >>> qemu and qemu-img paths. Those binaries would have to be already >>> pre-built, but then we miss the ability autotest has of building the >>> binaries and prepare the environment. It'd be like: >>> >>> variant1: >>> qemu = /path/to/qemu1 >>> qemu-img = /path/to/qemu-img1 >>> extra_params = "--appropriate --extra --params2" >>> >>> >>> variant2: >>> qemu = /path/to/qemu2 >>> qemu-img = /path/to/qemu-img2 >>> extra_params = "--appropriate --extra --params2" >>> >>> Something like that. It's a feasible intermediate solution until I >>> finish work on supporting multiple userspaces. >>> >> >> Another option is, now that the binary name 'qemu' is available for >> general use, make it possible to invoke everything with just one binary: >> >> qemu -system -target mips ... >> qemu-system -target mips ... >> qemu-system-mips ... > > I have a fancy script that I'll post soon that does something like this. It > takes the git approach and expands: > > qemu foo --bar=baz > > To: > > qemu-foo --bar=baz > > Which means that you could do: > > qemu system-x86_64 -hda foo.img > > And it'd go to: > > qemu-system-x86_64 -hda foo.img > > But there is also a smarter 'run' command that let's you do: > > qemu run --target=x86_64 -hda foo.img How would this be better than Avi's version? There isn't even any compatibility like 'qemu' has with 'qemu' defaulting to 'qemu -system -target i386'. > I've made no attempt to unify linux-user. 
It's a very different executable > with a different usage model. > > My motivation is QOM as I don't want to have command line options to create > devices any more. Instead, a front end script will talk to the monitor to > setup devices/machines. > > Regards, > > Anthony Liguori > >> >> are all equivalent. autotest should easily be able to pass different >> -target based on the test being run. >> > > >
Re: [Qemu-devel] Using the qemu tracepoints with SystemTap
On 09/13/2011 12:10 PM, William Cohen wrote:
> Should the qemu.kvm.cpu_in and qemu.kvm.cpu_out match up? There are a
> lot more qemu.kvm.cpu_out than qemu.kvm.cpu_in count.

I found that cpu_in and cpu_out refer to input and output instructions. I wrote a little script to tally up the input and output operations on each port, to run against qemu on an fc15 machine. It generates output like the following:

cpu_in
  port    count
0x01f7     3000
0x03d5      120
0xc000     2000
0xc002     3000

cpu_out
  port    count
0x0080      480
0x01f1     2000
0x01f2     2000
0x01f3     2000
0x01f4     2000
0x01f5     2000
0x01f6     2000
0x01f7     1000
0x03d4      480
0x03d5      120
0x03f6     1000
0xc000     3000
0xc002     2000
0xc004     1000
0xc090        4

Looks like lots of touching of the IDE device ports (0x01f0-0x01ff) and some VGA controller ports (0x03d0-0x03df). This is kind of what would be expected when the machine is doing a fsck and an SELinux relabel on the guest virtual machines. Looks like some PCI device access (http://www.tech-pro.net/intro_pci.html) also.

-Will

global cpu_in, cpu_out

probe qemu.system.x86_64.cpu_in  { cpu_in[addr]++ }
probe qemu.system.x86_64.cpu_out { cpu_out[addr]++ }

probe end {
    # write out the data
    printf("\ncpu_in\n%6s %8s\n", "port", "count")
    foreach (addr+ in cpu_in)
        printf("0x%04x %8d\n", addr, cpu_in[addr])
    printf("\ncpu_out\n%6s %8s\n", "port", "count")
    foreach (addr+ in cpu_out)
        printf("0x%04x %8d\n", addr, cpu_out[addr])
}
Re: [Qemu-devel] [PATCH v3 5/6] vga: Use linear mapping + dirty logging in chain 4 memory access mode
On Tue, Sep 13, 2011 at 11:34 AM, Jan Kiszka wrote:
> On 2011-09-13 11:42, Alexander Graf wrote:
>>
>> On 13.09.2011, at 11:00, Jan Kiszka wrote:
>>
>>> On 2011-09-13 10:40, Alexander Graf wrote:
>>>> Btw, it still tries to execute invalid code even with your patch.
>>>> #if 0'ing out the memory region updates at least gets the guest
>>>> booting for me.
>>>>
>>>> Btw, to get it working you also need a patch for the interrupt
>>>> controller (another breakage thanks to the memory api).
>>>>
>>>> diff --git a/hw/heathrow_pic.c b/hw/heathrow_pic.c
>>>> index 51996ab..16f48d1 100644
>>>> --- a/hw/heathrow_pic.c
>>>> +++ b/hw/heathrow_pic.c
>>>> @@ -126,7 +126,7 @@ static uint64_t pic_read(void *opaque, target_phys_addr_t addr,
>>>>  static const MemoryRegionOps heathrow_pic_ops = {
>>>>      .read = pic_read,
>>>>      .write = pic_write,
>>>> -    .endianness = DEVICE_NATIVE_ENDIAN,
>>>> +    .endianness = DEVICE_LITTLE_ENDIAN,
>>>>  };
>>>>
>>>>  static void heathrow_pic_set_irq(void *opaque, int num, int level)
>>>
>>> With or without this fix, with or without active chain-4 optimization,
>>> I just get an empty yellow screen when firing up qemu-system-ppc (also
>>> when using the Debian ISO). Do I need to specify a specific machine type?
>>
>> Ugh. No, you only need this patch:
>>
>> [PATCH] PPC: Fix via-cuda memory registration
>>
>> which fixes another recently introduced regression :)
>
> That works now - and allowed me to identify the bug after enhancing info
> mtree a bit:
>
> (qemu) info mtree
> memory
> addr          prio 0 size 7fff     system
> addr 8088     prio 1 size 8        macio
> addr 808e     prio 0 size 2        macio-nvram
> addr 808a     prio 0 size 1000     pmac-ide
> addr 80896000 prio 0 size 2000     cuda
> addr 80893000 prio 0 size 40       escc-bar
> addr 80888000 prio 0 size 1000     dbdma
> addr 8088     prio 0 size 1000     heathrow-pic
> addr 8000     prio 1 size 80       vga.vram
> addr 800a     prio 1 size 2        vga-lowmem
> ...
>
> Here is the problem: Both the vram and the ISA range get mapped into
> system address space, but the former eclipses the latter as it shows up
> earlier in the list and has the same priority. This picture changes with
> the chain-4 alias, which has prio 2 and thus maps over the vram.
>
> It looks to me like the ISA address space is either misplaced at
> 0x8000 or is not supposed to be mapped at all on PPC. Comments?

Since there is no PCI-ISA bridge, the ISA address space shouldn't exist.
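The eclipsing behavior described above can be sketched with a toy model of
the memory API's flattening rule: at a given address, the highest-priority
subregion wins, and among equal priorities the earlier-registered region
wins (which is exactly how the vram hides the ISA range here). This is an
illustrative model with made-up sizes, not QEMU's actual implementation:

```python
class Region:
    def __init__(self, name, start, size, prio):
        self.name, self.start, self.size, self.prio = name, start, size, prio

def resolve(regions, addr):
    """Return the region visible at addr: highest priority wins; among
    equal priorities, the earlier-registered region wins (max() keeps
    the first maximal element)."""
    hits = [r for r in regions if r.start <= addr < r.start + r.size]
    return max(hits, key=lambda r: r.prio) if hits else None

regions = [
    Region("vga.vram",     0x80000000, 0x800000, 1),
    Region("isa-mmio",     0x80000000, 0x800000, 1),  # same prio: eclipsed
    Region("chain4-alias", 0x800a0000, 0x20000,  2),  # prio 2: maps over vram
]
print(resolve(regions, 0x80000000).name)  # vga.vram eclipses isa-mmio
print(resolve(regions, 0x800a0000).name)  # chain4-alias wins on priority
```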
Re: [Qemu-devel] [Qemu-ppc] [PATCH] PPC: Fix via-cuda memory registration
On 09/13/2011 02:31 PM, Blue Swirl wrote:
On Mon, Sep 12, 2011 at 9:05 PM, Anthony Liguori wrote:
On 09/12/2011 08:53 AM, Avi Kivity wrote:
On 09/12/2011 04:46 PM, Lucas Meneghel Rodrigues wrote:
On 09/12/2011 06:07 AM, Avi Kivity wrote:
On 09/11/2011 02:38 PM, Alexander Graf wrote:
Am 11.09.2011 um 12:41 schrieb Avi Kivity:
On 09/08/2011 07:54 PM, Alexander Graf wrote:

PS: Please test your patches. This one could have been found with an
invocation as simple as "qemu-system-ppc". We boot into the OpenBIOS
prompt by default, so you wouldn't even have required a guest image or
kernel.

Sorry about that. Note that it's pretty hard to test these patches. I
often don't even know which binary to use, as the device->target
relationship is not immediately visible,

The patch was explicitly to convert ppc ;).

Yes, in this case. Not in the general case.

and I don't really know what to expect from the guest.

The very easy check-fundamentals thing to do for ppc is to execute
qemu-system-ppc without arguments. It should drop you into an OF prompt.
Both memory api bugs on ppc I've seen now would have been exposed with
that. I agree that we should have something slightly more sophisticated,
but doing such a bare minimum test is almost for free to the tester and
covers at least basic functionality :).

I don't mind people introducing subtle bugs in corner cases - these
things happen. But an abort() when you execute the binary? That really
shouldn't happen ever. This one is almost as bad.

Yeah. It would be best if we had a kvm-autotest testset for tcg; it would
probably run in just a few minutes and increase confidence in these
patches.

Yeah, I am using kvm-autotest today for regression testing, but it's very
hard to tell it to run multiple different binaries. The target program
variable can only be set for an execution job, making it impossible to
run multiple targets in one autotest run.
Alexander, I've started to work on this, I'm clearing out my request list,
last week I've implemented ticket 50, that was related to special block
configuration for the tests, now I want to make it possible to support
multiple binaries.

Probably best to tell autotest about the directory, and let it pick up the
binary. Still need some configuration to choose between qemu-kvm and
qemu-system-x86_64. Lucas?

Yes, that would also work, having different variants with different qemu
and qemu-img paths. Those binaries would have to be already pre-built, but
then we lose the ability autotest has of building the binaries and
preparing the environment. It'd be like:

variant1:
    qemu = /path/to/qemu1
    qemu-img = /path/to/qemu-img1
    extra_params = "--appropriate --extra --params2"

variant2:
    qemu = /path/to/qemu2
    qemu-img = /path/to/qemu-img2
    extra_params = "--appropriate --extra --params2"

Something like that. It's a feasible intermediate solution until I finish
work on supporting multiple userspaces.

Another option is, now that the binary name 'qemu' is available for
general use, make it possible to invoke everything with just one binary:

qemu -system -target mips ...
qemu-system -target mips ...
qemu-system-mips ...

I have a fancy script that I'll post soon that does something like this.
It takes the git approach and expands:

qemu foo --bar=baz

To:

qemu-foo --bar=baz

Which means that you could do:

qemu system-x86_64 -hda foo.img

And it'd go to:

qemu-system-x86_64 -hda foo.img

But there is also a smarter 'run' command that lets you do:

qemu run --target=x86_64 -hda foo.img

How would this be better than Avi's version? There isn't even any
compatibility like 'qemu' defaulting to 'qemu -system -target i386'.
Because you can then do:

$ qemu run -hda foo.img -name bar
$ qemu monitor bar
info kvm
KVM enabled

Or you can do:

$ sudo qemu setup-nat foo eth0
$ sudo qemu create-vnic foo
Created vnic `vnet0'
$ qemu run -hda foo.img -net tap,ifname=vnet0

And all sorts of other interesting things. It means a much friendlier
interface for command line users and much better scriptability.

Regards,

Anthony Liguori

I've made no attempt to unify linux-user. It's a very different
executable with a different usage model.

My motivation is QOM, as I don't want to have command line options to
create devices any more. Instead, a front end script will talk to the
monitor to set up devices/machines.

Regards,

Anthony Liguori

are all equivalent. autotest should easily be able to pass different
-target based on the test being run.
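The git-style expansion discussed in this thread is simple to sketch. This
is a minimal illustration of the idea, not the script Anthony posted; a
real wrapper would finish with os.execvp() on the expanded argv:

```python
def expand(argv):
    """Expand 'qemu foo --bar=baz' into ['qemu-foo', '--bar=baz'],
    the way git expands 'git foo' into 'git-foo'."""
    if not argv:
        raise SystemExit("usage: qemu <command> [args...]")
    return ["qemu-" + argv[0]] + list(argv[1:])

# 'qemu system-x86_64 -hda foo.img' becomes 'qemu-system-x86_64 -hda foo.img'
print(expand(["system-x86_64", "-hda", "foo.img"]))
```

Note this sketch deliberately has no default subcommand, which is exactly
the compatibility gap the reply above complains about relative to 'qemu'
defaulting to 'qemu -system -target i386'.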
Re: [Qemu-devel] QEMU Image problem
On Tue, Sep 13, 2011 at 14:52, bala suru wrote:
> Hi,
>
> I have some problem generating the KVM-QEMU image from the .iso file.
> I tried virt-manager but could not create the proper one.

I am not really sure what you're trying to do, but if you just want to
access it, why not simply pass it to the -cdrom option?

--
regards,

Mulyadi Santosa
Freelance Linux trainer and consultant

blog: the-hydra.blogspot.com
training: mulyaditraining.blogspot.com
[Qemu-devel] [Bug 818673] Re: virtio: trying to map MMIO memory
Still crashes just the same. I updated the drivers for virt net, scsi &
serial from the XP and WXp folders in the zip file that you referenced.
Then I shut down the VM. Because it only seems to happen every other time
that Qemu is started, I started it back up and shut it down again. Then
the VM was started a third time and left idle prior to crashing.

Thanks, and sorry that I didn't have better news. (Also, note that I've
built qemu-kvm straight from www.linux-kvm.org, and qemu straight from
qemu.org.)

-Rick

--
You received this bug notification because you are a member of
qemu-devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/818673

Title:
  virtio: trying to map MMIO memory

Status in QEMU:
  New

Bug description:
  Qemu host is Core i7, running Linux. Guest is Windows XP sp3.

  Often, qemu will crash shortly after starting (1-5 minutes) with a
  statement "qemu-system-x86_64: virtio: trying to map MMIO memory"

  This has occurred with qemu-kvm 0.14, qemu-kvm 0.14.1, qemu-0.15.0-rc0
  and qemu 0.15.0-rc1.

  Qemu is started as such:
  qemu-system-x86_64 -cpu host -enable-kvm -pidfile /home/rick/qemu/hds/wxp.pid -drive file=/home/rick/qemu/hds/wxp.raw,if=virtio -m 768 -name WinXP -net nic,model=virtio -net user -localtime -usb -vga qxl -device virtio-serial -chardev spicevmc,name=vdagent,id=vdagent -device virtserialport,chardev=vdagent,name=com.redhat.spice.0 -spice port=1234,disable-ticketing -daemonize -monitor telnet:localhost:12341,server,nowait

  The WXP guest has virtio 1.1.16 drivers for net and scsi, and the most
  current spice binaries from spice-space.org.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/818673/+subscriptions
[Qemu-devel] buildbot failure in qemu on block_x86_64_debian_6_0
The Buildbot has detected a new failure on builder block_x86_64_debian_6_0
while building qemu.

Full details are available at:
 http://buildbot.b1-systems.de/qemu/builders/block_x86_64_debian_6_0/builds/30

Buildbot URL: http://buildbot.b1-systems.de/qemu/

Buildslave for this Build: yuzuki

Build Reason: The Nightly scheduler named 'nightly_block' triggered this build
Build Source Stamp: [branch block] HEAD
Blamelist:

BUILD FAILED: failed git

sincerely,
 -The Buildbot
Re: [Qemu-devel] [PATCH] pci: implement bridge filtering
At 09/05/2011 02:13 AM, Michael S. Tsirkin wrote:
> Support bridge filtering on top of the memory
> API as suggested by Avi Kivity:
>
> Create a memory region for the bridge's address space. This region is
> not directly added to system_memory or its descendants. Devices under
> the bridge see this region as its pci_address_space(). The region is
> as large as the entire address space - it does not take into account
> any windows.
>
> For each of the three windows (pref, non-pref, vga), create an alias
> with the appropriate start and size. Map the alias into the bridge's
> parent's pci_address_space(), as subregions.
>
> Signed-off-by: Michael S. Tsirkin
> ---
>
> The below seems to work fine for me so I applied this.
> Still need to test bridge filtering, any help with this
> appreciated.

I tested bridge filtering, and the BAR is still visible in the guest even
if I change the memory region.

Thanks
Wen Congyang
Re: [Qemu-devel] Why qemu write/rw speed is so low?
Log for bps=((10 * 1024 * 1024)).

test: (g=0): rw=write, bs=512-512/512-512, ioengine=libaio, iodepth=1
Starting 1 process
Jobs: 1 (f=1): [W] [100.0% done] [0K/58K /s] [0/114 iops] [eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=2657
  write: io=51,200KB, bw=58,527B/s, iops=114, runt=895793msec
    slat (usec): min=26, max=376K, avg=81.69, stdev=2104.09
    clat (usec): min=859, max=757K, avg=8648.07, stdev=8278.64
    lat (usec): min=921, max=1,133K, avg=8730.49, stdev=9239.57
    bw (KB/s) : min=0, max=60, per=101.03%, avg=57.59, stdev=7.41
  cpu : usr=0.05%, sys=0.75%, ctx=102611, majf=0, minf=51
  IO depths: 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued r/w: total=0/102400, short=0/0
     lat (usec): 1000=0.01%
     lat (msec): 2=0.01%, 4=0.02%, 10=98.99%, 20=0.24%, 50=0.66%
     lat (msec): 100=0.03%, 250=0.01%, 500=0.05%, 1000=0.01%

Run status group 0 (all jobs):
  WRITE: io=51,200KB, aggrb=57KB/s, minb=58KB/s, maxb=58KB/s, mint=895793msec, maxt=895793msec

Disk stats (read/write):
  dm-0: ios=28/103311, merge=0/0, ticks=1318/950537, in_queue=951852, util=99.63%, aggrios=28/102932, aggrmerge=0/379, aggrticks=1316/929743, aggrin_queue=930987, aggrutil=99.60%
  vda: ios=28/102932, merge=0/379, ticks=1316/929743, in_queue=930987, util=99.60%

test: (g=0): rw=write, bs=8K-8K/8K-8K, ioengine=libaio, iodepth=1
Starting 1 process
Jobs: 1 (f=1): [W] [100.0% done] [0K/892K /s] [0/108 iops] [eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=2782
  write: io=51,200KB, bw=926KB/s, iops=115, runt= 55269msec
    slat (usec): min=20, max=32,160, avg=66.43, stdev=935.62
    clat (msec): min=1, max=157, avg= 8.53, stdev= 2.55
    lat (msec): min=1, max=158, avg= 8.60, stdev= 2.93
    bw (KB/s) : min= 539, max= 968, per=100.12%, avg=927.09, stdev=63.89
  cpu : usr=0.10%, sys=0.47%, ctx=6415, majf=0, minf=26
  IO depths: 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued r/w: total=0/6400, short=0/0
     lat (msec): 2=0.06%, 4=0.05%, 10=99.19%, 20=0.06%, 50=0.62%
     lat (msec): 250=0.02%

Run status group 0 (all jobs):
  WRITE: io=51,200KB, aggrb=926KB/s, minb=948KB/s, maxb=948KB/s, mint=55269msec, maxt=55269msec

Disk stats (read/write):
  dm-0: ios=3/6546, merge=0/0, ticks=117/65262, in_queue=65387, util=99.58%, aggrios=3/6472, aggrmerge=0/79, aggrticks=117/62063, aggrin_queue=62178, aggrutil=99.54%
  vda: ios=3/6472, merge=0/79, ticks=117/62063, in_queue=62178, util=99.54%

test: (g=0): rw=write, bs=64K-64K/64K-64K, ioengine=libaio, iodepth=1
Starting 1 process
Jobs: 1 (f=1): [W] [100.0% done] [0K/7,332K /s] [0/111 iops] [eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=2793
  write: io=51,200KB, bw=7,074KB/s, iops=110, runt= 7238msec
    slat (usec): min=23, max=37,715, avg=82.08, stdev=1332.25
    clat (msec): min=2, max=34, avg= 8.96, stdev= 1.54
    lat (msec): min=2, max=58, avg= 9.04, stdev= 2.31
    bw (KB/s) : min= 6361, max= 7281, per=100.13%, avg=7082.07, stdev=274.31
  cpu : usr=0.08%, sys=0.53%, ctx=801, majf=0, minf=23
  IO depths: 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued r/w: total=0/800, short=0/0
     lat (msec): 4=0.25%, 10=92.12%, 20=7.25%, 50=0.38%

Run status group 0 (all jobs):
  WRITE: io=51,200KB, aggrb=7,073KB/s, minb=7,243KB/s, maxb=7,243KB/s, mint=7238msec, maxt=7238msec

Disk stats (read/write):
  dm-0: ios=0/811, merge=0/0, ticks=0/8003, in_queue=8003, util=98.35%, aggrios=0/804, aggrmerge=0/17, aggrticks=0/7319, aggrin_queue=7319, aggrutil=98.19%
  vda: ios=0/804, merge=0/17, ticks=0/7319, in_queue=7319, util=98.19%

test: (g=0): rw=write, bs=128K-128K/128K-128K, ioengine=libaio, iodepth=1
Starting 1 process
Jobs: 1 (f=1): [W] [83.3% done] [0K/10M /s] [0/81 iops] [eta 00m:01s]
test: (groupid=0, jobs=1): err= 0: pid=2800
  write: io=51,200KB, bw=10,113KB/s, iops=79, runt= 5063msec
    slat (usec): min=36, max=35,279, avg=130.55, stdev=1761.93
    clat (msec): min=3, max=134, avg=12.52, stdev=16.93
    lat (msec): min=3, max=134, avg=12.65, stdev=17.14
    bw (KB/s) : min= 7888, max=13128, per=100.41%, avg=10153.00, stdev=1607.48
  cpu : usr=0.00%, sys=0.51%, ctx=401, majf=0, minf=23
  IO depths: 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit: 0=0.0%, 4=100.0%
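A quick sanity check on the numbers above: iops stays pinned near ~110-115
regardless of block size, so bandwidth scales almost linearly with bs
(bw ≈ iops × bs) — the signature of a per-request latency ceiling rather
than a raw bandwidth limit. A minimal sketch of that arithmetic:

```python
def expected_bw_kb(iops, bs_bytes):
    """Approximate write bandwidth in KB/s for a fixed iops ceiling."""
    return iops * bs_bytes / 1024

# iops figures reported above for each block size
print(expected_bw_kb(114, 512))    # bs=512: fio reported 58 KB/s
print(expected_bw_kb(115, 8192))   # bs=8K:  fio reported 926 KB/s
print(expected_bw_kb(110, 65536))  # bs=64K: fio reported 7,074 KB/s
```

Each computed value lands within a couple of percent of the measured
aggregate bandwidth, confirming the runs are iops-bound.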
Re: [Qemu-devel] [PATCH 2/2] Qemu co-operation with kvm tsc deadline timer
Jan Kiszka wrote:
> On 2011-09-13 16:38, Liu, Jinsong wrote:
>> From c1b502d6548fcc41592cd90acc82109ee949df75 Mon Sep 17 00:00:00 2001
>> From: Liu, Jinsong
>> Date: Tue, 13 Sep 2011 22:05:30 +0800
>> Subject: [PATCH] Qemu co-operation with kvm tsc deadline timer
>>
>> KVM adds emulation of the lapic tsc deadline timer for the guest.
>> This patch is the co-operation work at the qemu side.
>>
>> Signed-off-by: Liu, Jinsong
>> ---
>>  target-i386/cpu.h |  2 ++
>>  target-i386/kvm.c | 14 ++
>>  2 files changed, 16 insertions(+), 0 deletions(-)
>>
>> diff --git a/target-i386/cpu.h b/target-i386/cpu.h
>> index 935d08a..62ff73c 100644
>> --- a/target-i386/cpu.h
>> +++ b/target-i386/cpu.h
>> @@ -283,6 +283,7 @@
>>  #define MSR_IA32_APICBASE_BSP           (1<<8)
>>  #define MSR_IA32_APICBASE_ENABLE        (1<<11)
>>  #define MSR_IA32_APICBASE_BASE          (0xf<<12)
>> +#define MSR_IA32_TSCDEADLINE            0x6e0
>>
>>  #define MSR_MTRRcap                     0xfe
>>  #define MSR_MTRRcap_VCNT                8
>> @@ -687,6 +688,7 @@ typedef struct CPUX86State {
>>      uint64_t async_pf_en_msr;
>>
>>      uint64_t tsc;
>> +    uint64_t tsc_deadline;
>
> This field has to be saved/restored for snapshots/migrations.
>
> Frankly, I've no clue right now if substates are in vogue again (they
> had problems in their binary format) or if you can simply add a
> versioned top-level field and bump the CPUState version number.

Yes, it would be saved/restored. After migration, tsc_deadline would be
written to MSR_IA32_TSCDEADLINE to trigger the tsc timer interrupt.
>>
>>      uint64_t mcg_status;
>>
>> diff --git a/target-i386/kvm.c b/target-i386/kvm.c
>> index aa843f0..206fcad 100644
>> --- a/target-i386/kvm.c
>> +++ b/target-i386/kvm.c
>> @@ -59,6 +59,7 @@ const KVMCapabilityInfo kvm_arch_required_capabilities[] = {
>>
>>  static bool has_msr_star;
>>  static bool has_msr_hsave_pa;
>> +static bool has_msr_tsc_deadline;
>>  static bool has_msr_async_pf_en;
>>  static int lm_capable_kernel;
>>
>> @@ -571,6 +572,10 @@ static int kvm_get_supported_msrs(KVMState *s)
>>              has_msr_hsave_pa = true;
>>              continue;
>>          }
>> +        if (kvm_msr_list->indices[i] == MSR_IA32_TSCDEADLINE) {
>> +            has_msr_tsc_deadline = true;
>> +            continue;
>> +        }
>>      }
>>  }
>>
>> @@ -899,6 +904,9 @@ static int kvm_put_msrs(CPUState *env, int level)
>>      if (has_msr_hsave_pa) {
>>          kvm_msr_entry_set(&msrs[n++], MSR_VM_HSAVE_PA, env->vm_hsave);
>>      }
>> +    if (has_msr_tsc_deadline) {
>> +        kvm_msr_entry_set(&msrs[n++], MSR_IA32_TSCDEADLINE, env->tsc_deadline);
>> +    }
>>  #ifdef TARGET_X86_64
>>      if (lm_capable_kernel) {
>>          kvm_msr_entry_set(&msrs[n++], MSR_CSTAR, env->cstar);
>> @@ -1145,6 +1153,9 @@ static int kvm_get_msrs(CPUState *env)
>>      if (has_msr_hsave_pa) {
>>          msrs[n++].index = MSR_VM_HSAVE_PA;
>>      }
>> +    if (has_msr_tsc_deadline) {
>> +        msrs[n++].index = MSR_IA32_TSCDEADLINE;
>> +    }
>>
>>      if (!env->tsc_valid) {
>>          msrs[n++].index = MSR_IA32_TSC;
>> @@ -1213,6 +1224,9 @@ static int kvm_get_msrs(CPUState *env)
>>          case MSR_IA32_TSC:
>>              env->tsc = msrs[i].data;
>>              break;
>> +        case MSR_IA32_TSCDEADLINE:
>> +            env->tsc_deadline = msrs[i].data;
>> +            break;
>>          case MSR_VM_HSAVE_PA:
>>              env->vm_hsave = msrs[i].data;
>>              break;
>
> Just to double check: This feature is exposed to the guest when A) the
> host CPU supports it and B) QEMU passed down guest CPU specifications
> (cpuid data) that allow it as well?
>
> Jan

Yes.

Thanks,
Jinsong
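For reference on point B) above: the guest discovers TSC-deadline support
via CPUID leaf 01H, ECX bit 24 (per the Intel SDM). A minimal sketch of
that feature-bit check:

```python
TSC_DEADLINE_BIT = 24  # CPUID.01H:ECX[24] = TSC-deadline timer support

def tsc_deadline_supported(cpuid_01_ecx):
    """Return True if the CPUID leaf-1 ECX value advertises the
    TSC-deadline timer."""
    return bool(cpuid_01_ecx & (1 << TSC_DEADLINE_BIT))

print(tsc_deadline_supported(1 << 24))  # bit set: supported
print(tsc_deadline_supported(0))        # bit clear: not supported
```

QEMU controls this bit in the cpuid data it passes to KVM, which is how
the feature can be masked off even when the host CPU supports it.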
[Qemu-devel] [PATCH] hid: vmstat fix
The commit "usb/hid: add hid_pointer_activate, use it" used
HIDMouseState.mouse_grabbed in hid_pointer_activate(), so mouse_grabbed
should be added into vmstate.

Signed-off-by: TeLeMan
---
 hw/hid.c | 3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/hw/hid.c b/hw/hid.c
index c608400..72b861f 100644
--- a/hw/hid.c
+++ b/hw/hid.c
@@ -433,7 +433,7 @@ static const VMStateDescription vmstate_hid_ptr_queue = {
 const VMStateDescription vmstate_hid_ptr_device = {
     .name = "HIDPointerDevice",
-    .version_id = 1,
+    .version_id = 2,
     .minimum_version_id = 1,
     .post_load = hid_post_load,
     .fields = (VMStateField[]) {
@@ -443,6 +443,7 @@ const VMStateDescription vmstate_hid_ptr_device = {
         VMSTATE_UINT32(n, HIDState),
         VMSTATE_INT32(protocol, HIDState),
         VMSTATE_UINT8(idle, HIDState),
+        VMSTATE_INT32_V(ptr.mouse_grabbed, HIDState, 2),
         VMSTATE_END_OF_LIST(),
     }
 };
--
1.7.6.msysgit.0
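The version bump in this patch works because VMSTATE_INT32_V gates the new
field on the section version: old snapshots (version 1, still accepted via
minimum_version_id = 1) simply lack the field, while new ones carry it. A
toy model of that version-gated serialization, with hypothetical field
names:

```python
def vmstate_fields(state, version):
    """Toy model of versioned vmstate fields: a field tagged with a
    minimum version is only (de)serialized at or above that version."""
    # (name, version the field first appeared in)
    fields = [("protocol", 1), ("idle", 1), ("mouse_grabbed", 2)]
    return {name: state[name] for name, v in fields if version >= v}

s = {"protocol": 1, "idle": 0, "mouse_grabbed": 1}
print(sorted(vmstate_fields(s, 1)))  # old format: no mouse_grabbed
print(sorted(vmstate_fields(s, 2)))  # new format: includes mouse_grabbed
```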
Re: [Qemu-devel] [PATCH 4/9] runstate_set(): Check for valid transitions
On Fri, 9 Sep 2011 17:25:41 -0300
Luiz Capitulino wrote:

> This commit could have been folded with the previous one, however
> doing it separately will allow for easy bisect and revert if needed.
>
> Checking and testing all valid transitions wasn't trivial, chances
> are this will need broader testing to become more stable.
>
> This is a transition table as suggested by Lluís Vilanova.
>
> Signed-off-by: Luiz Capitulino

Would be nice to get a reviewed-by for this patch, as it wasn't trivial
to get it right and tested...

> ---
>  sysemu.h |  1 +
>  vl.c     | 74 +-
>  2 files changed, 74 insertions(+), 1 deletions(-)
>
> diff --git a/sysemu.h b/sysemu.h
> index 19088aa..a01ddac 100644
> --- a/sysemu.h
> +++ b/sysemu.h
> @@ -36,6 +36,7 @@ extern uint8_t qemu_uuid[];
>  int qemu_uuid_parse(const char *str, uint8_t *uuid);
>  #define UUID_FMT "%02hhx%02hhx%02hhx%02hhx-%02hhx%02hhx-%02hhx%02hhx-%02hhx%02hhx-%02hhx%02hhx%02hhx%02hhx%02hhx%02hhx"
>
> +void runstate_init(void);
>  bool runstate_check(RunState state);
>  void runstate_set(RunState new_state);
>  typedef struct vm_change_state_entry VMChangeStateEntry;
> diff --git a/vl.c b/vl.c
> index 9926d2a..4a8edc7 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -327,14 +327,84 @@ static int default_driver_check(QemuOpts *opts, void *opaque)
>
>  static RunState current_run_state = RSTATE_NO_STATE;
>
> +typedef struct {
> +    RunState from;
> +    RunState to;
> +} RunStateTransition;
> +
> +static const RunStateTransition runstate_transitions_def[] = {
> +    /* from -> to */
> +    { RSTATE_NO_STATE, RSTATE_RUNNING },
> +    { RSTATE_NO_STATE, RSTATE_IN_MIGRATE },
> +    { RSTATE_NO_STATE, RSTATE_PRE_LAUNCH },
> +
> +    { RSTATE_DEBUG, RSTATE_RUNNING },
> +
> +    { RSTATE_IN_MIGRATE, RSTATE_RUNNING },
> +    { RSTATE_IN_MIGRATE, RSTATE_PRE_LAUNCH },
> +
> +    { RSTATE_PANICKED, RSTATE_PAUSED },
> +
> +    { RSTATE_IO_ERROR, RSTATE_RUNNING },
> +
> +    { RSTATE_PAUSED, RSTATE_RUNNING },
> +
> +    { RSTATE_POST_MIGRATE, RSTATE_RUNNING },
> +
> +    { RSTATE_PRE_LAUNCH, RSTATE_RUNNING },
> +    { RSTATE_PRE_LAUNCH, RSTATE_POST_MIGRATE },
> +
> +    { RSTATE_PRE_MIGRATE, RSTATE_RUNNING },
> +    { RSTATE_PRE_MIGRATE, RSTATE_POST_MIGRATE },
> +
> +    { RSTATE_RESTORE, RSTATE_RUNNING },
> +
> +    { RSTATE_RUNNING, RSTATE_DEBUG },
> +    { RSTATE_RUNNING, RSTATE_PANICKED },
> +    { RSTATE_RUNNING, RSTATE_IO_ERROR },
> +    { RSTATE_RUNNING, RSTATE_PAUSED },
> +    { RSTATE_RUNNING, RSTATE_PRE_MIGRATE },
> +    { RSTATE_RUNNING, RSTATE_RESTORE },
> +    { RSTATE_RUNNING, RSTATE_SAVEVM },
> +    { RSTATE_RUNNING, RSTATE_SHUTDOWN },
> +    { RSTATE_RUNNING, RSTATE_WATCHDOG },
> +
> +    { RSTATE_SAVEVM, RSTATE_RUNNING },
> +
> +    { RSTATE_SHUTDOWN, RSTATE_PAUSED },
> +
> +    { RSTATE_WATCHDOG, RSTATE_RUNNING },
> +
> +    { RSTATE_MAX, RSTATE_MAX },
> +};
> +
> +static bool runstate_valid_transitions[RSTATE_MAX][RSTATE_MAX];
> +
>  bool runstate_check(RunState state)
>  {
>      return current_run_state == state;
>  }
>
> +void runstate_init(void)
> +{
> +    const RunStateTransition *p;
> +
> +    memset(&runstate_valid_transitions, 0, sizeof(runstate_valid_transitions));
> +
> +    for (p = &runstate_transitions_def[0]; p->from != RSTATE_MAX; p++) {
> +        runstate_valid_transitions[p->from][p->to] = true;
> +    }
> +}
> +
> +/* This function will abort() on invalid state transitions */
>  void runstate_set(RunState new_state)
>  {
> -    assert(new_state < RSTATE_MAX);
> +    if (new_state >= RSTATE_MAX ||
> +        !runstate_valid_transitions[current_run_state][new_state]) {
> +        fprintf(stderr, "invalid runstate transition\n");
> +        abort();
> +    }
> +
>      current_run_state = new_state;
>  }
>
> @@ -2218,6 +2288,8 @@ int main(int argc, char **argv, char **envp)
>
>      g_mem_set_vtable(&mem_trace);
>
> +    runstate_init();
> +
>      init_clocks();
>
>      qemu_cache_utils_init(envp);
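The table-driven validation in the patch is easy to model for review
purposes. Here is a minimal Python sketch of the same idea — state names
are reused from the patch, but the transition set below is deliberately a
small illustrative subset, not the full table:

```python
# Subset of the valid-transition table from the patch above.
VALID_TRANSITIONS = {
    ("RSTATE_NO_STATE", "RSTATE_RUNNING"),
    ("RSTATE_RUNNING", "RSTATE_PAUSED"),
    ("RSTATE_RUNNING", "RSTATE_SHUTDOWN"),
    ("RSTATE_PAUSED", "RSTATE_RUNNING"),
}

class RunStateError(Exception):
    pass

class RunState:
    def __init__(self):
        self.current = "RSTATE_NO_STATE"

    def set(self, new_state):
        # Mirror of runstate_set(): reject transitions not in the table
        # (the C code aborts; we raise instead).
        if (self.current, new_state) not in VALID_TRANSITIONS:
            raise RunStateError("invalid runstate transition: "
                                f"{self.current} -> {new_state}")
        self.current = new_state

rs = RunState()
rs.set("RSTATE_RUNNING")
rs.set("RSTATE_PAUSED")       # valid per the table
try:
    rs.set("RSTATE_SHUTDOWN")  # PAUSED -> SHUTDOWN is not in the table
except RunStateError as e:
    print(e)
```

This makes the review question concrete: any transition the table omits is
a hard failure, which is exactly why the patch asks for broad testing.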