Re: [Qemu-devel] About hotplug multifunction

2011-09-13 Thread Isaku Yamahata
On Tue, Sep 13, 2011 at 09:52:49AM +0300, Gleb Natapov wrote:
> On Tue, Sep 13, 2011 at 02:57:20PM +0900, Isaku Yamahata wrote:
> > On Sun, Sep 11, 2011 at 12:05:17PM +0300, Michael S. Tsirkin wrote:
> > > On Sat, Sep 10, 2011 at 02:43:11AM +0900, Isaku Yamahata wrote:
> > > > pci/pcie hot plug needs cleanup for multifunction hotplug in the long
> > > > term. Only the single-function device case works; the multifunction
> > > > case is somewhat broken.
> > > > In particular, the current ACPI-based hotplug should be replaced by
> > > > the standardized hot plug controller in the long term.
> > > 
> > > We'll need to keep supporting Windows XP, which IIUC only
> > > supports hotplug through ACPI. So it looks like we'll
> > > need both.
> > 
> > Yes, we'll need both then.
> > It would be possible to implement ACPI-based hotplug with the
> > standardized hotplug controller, but not with a qemu-specific one.
> > 
> Where is this "standardized hotplug controller" documented?

PCI Hot-Plug 1.1 by PCI sig.

NOTE: I have already implemented PCIe native hotplug in qemu, as defined
in the PCIe spec.

> > It would require a fair amount of work to write the ACPI code in the
> > DSDT that handles the standardized hotplug controller.
> > So I'm not sure it's worthwhile just for Windows XP support.
> > -- 
> > yamahata
> 
> --
>   Gleb.
> 

-- 
yamahata



[Qemu-devel] Armel host (x86 emul.) img disk not writable

2011-09-13 Thread Father Mande

Hi,

It's my first message here, so a quick introduction:

Me: I have worked for a while with VMware and VirtualBox.

I have integrated VirtualBox into the x86 QNAP NAS system.



Armel platform: QNAP NAS TS-219 with a Marvell (ARM) SoC at 1.8 GHz.

Goal: run QEMU x86 emulation on it (VMs like FreeDOS or Windows).



Context: only kernel modules are possible, no kernel change ...

... tests are done in a chroot environment (Debian Squeeze for armel), so
I add an X11 client to the NAS and use Xming or a Debian box as the X server.



What works:

I installed qemu through apt-get.

Starting the emulator from a floppy (FreeDOS) or CD-ROM (Linux live CD,
Windows 98 installation CD) all works: the window opens, the menus and
keyboard work, and I have also added a program in FreeDOS to handle HLT,
so everything seems to run. I also tested qemu-launcher; it works.

Problem:

Each time I create a disk image to install to, the installation (Windows
or FreeDOS) fails because the guest doesn't see the disk, or can't read
from or write to it.

I have tested qemu-img with raw, qcow, and qcow2, without success.

The image files are read/write for everyone, and qemu runs as root, but
it's the same under a "normal" user.

Due to the lack of IDE support in the QNAP kernel, I have also tried to
compile the ide-core module and load it (insmod), but no change ...

Any advice? Surely I have forgotten something?



Thanks for the help.

Philippe.
  

Re: [Qemu-devel] [PATCH v3 5/6] vga: Use linear mapping + dirty logging in chain 4 memory access mode

2011-09-13 Thread Alexander Graf

On 12.09.2011, at 17:57, Jan Kiszka wrote:

> On 2011-09-12 17:49, Jan Kiszka wrote:
>> On 2011-09-12 17:45, Andreas Färber wrote:
>>> Am 12.09.2011 17:33, schrieb Jan Kiszka:
 On 2011-09-12 17:20, Alexander Graf wrote:
> Jan Kiszka wrote:
>> Most VGA memory access modes require MMIO handling as they demand weird
>> logic to get a byte from or into the video RAM. However, there is one
>> exception: chain 4 mode with all memory planes enabled for writing. This
>> mode actually allows linear mapping, which can then be combined with
>> dirty logging to accelerate KVM.
>> 
>> This patch accelerates specifically VBE accesses like they are used by
>> grub in graphical mode. Not only the standard VGA adapter benefits from
>> this, also vmware and spice in VGA mode.
>> 
>> CC: Gerd Hoffmann 
>> CC: Avi Kivity 
>> Signed-off-by: Jan Kiszka 
>> 
> [...]
> 
>> +static void vga_update_memory_access(VGACommonState *s)
>> +{
>> +MemoryRegion *region, *old_region = s->chain4_alias;
>> +target_phys_addr_t base, offset, size;
>> +
>> +s->chain4_alias = NULL;
>> +
>> +if ((s->sr[0x02] & 0xf) == 0xf && s->sr[0x04] & 0x08) {
>> +offset = 0;
>> +switch ((s->gr[6] >> 2) & 3) {
>> +case 0:
>> +base = 0xa0000;
>> +size = 0x20000;
>> +break;
>> +case 1:
>> +base = 0xa0000;
>> +size = 0x10000;
>> +offset = s->bank_offset;
>> +break;
>> +case 2:
>> +base = 0xb0000;
>> +size = 0x8000;
>> +break;
>> +case 3:
>> +base = 0xb8000;
>> +size = 0x8000;
>> +break;
>> +}
>> +region = g_malloc(sizeof(*region));
>> +memory_region_init_alias(region, "vga.chain4", &s->vram, 
>> offset, size);
>> +memory_region_add_subregion_overlap(s->legacy_address_space, 
>> base,
>> +region, 2);
>> 
> This one eventually gives me the following in info mtree with -M g3beige
> on qemu-system-ppc:
> 
> (qemu) info mtree
> memory
> system addr  off  size 7fff
> -vga.chain4 addr 000a off  size 1
> -macio addr 8088 off  size 8
> --macio-nvram addr 0006 off  size 2
> --pmac-ide addr 0002 off  size 1000
> --cuda addr 00016000 off  size 2000
> --escc-bar addr 00013000 off  size 40
> --dbdma addr 8000 off  size 1000
> --heathrow-pic addr  off  size 1000
> -vga.rom addr 8080 off  size 1
> -vga.vram addr 8000 off  size 80
> -vga-lowmem addr 800a off  size 2
> -escc addr 80013000 off  size 40
> -isa-mmio addr fe00 off  size 20
> I/O
> io addr  off  size 1
> -cmd646-bmdma addr 0700 off  size 10
> --cmd646-bmdma-ioport addr 000c off  size 4
> --cmd646-bmdma-bus addr 0008 off  size 4
> --cmd646-bmdma-ioport addr 0004 off  size 4
> --cmd646-bmdma-bus addr  off  size 4
> -cmd646-cmd addr 0680 off  size 4
> -cmd646-data addr 0600 off  size 8
> -cmd646-cmd addr 0580 off  size 4
> -cmd646-data addr 0500 off  size 8
> -ne2000 addr 0400 off  size 100
> 
> This ends up overmapping 0xa0000, effectively overwriting kernel data.
> If I #if 0 the offending chunk out, everything is fine. I would assume
> that chain4 really needs to be inside of lowmem? No idea about VGA, but
> I'm sure you know what's going on :).
 Does this help?
 
 diff --git a/hw/vga.c b/hw/vga.c
 index 125fb29..0a0c5a6 100644
 --- a/hw/vga.c
 +++ b/hw/vga.c
 @@ -181,6 +181,7 @@ static void vga_update_memory_access(VGACommonState *s)
 size = 0x8000;
 break;
 }
 +base += isa_mem_base;
 region = g_malloc(sizeof(*region));
 memory_region_init_alias(region, "vga.chain4", &s->vram, offset, 
 size);
 memory_region_add_subregion_overlap(s->legacy_address_space, base,
>>> 
>>> No longer oopses, but the screen looks chaotic now (black bar at bottom,
>>> part of contents at top etc.).
>> 
>> Does this PPC machine map the ISA range and forward VGA accesses to the
>> adapter in general?
> 
> If it does, please post a dump of the VGACommonState while the screen is
> corrupted (gdb or via device_show [1]. Maybe I missed some condition
> that prevents chain4 optimizations, and your guest triggers this.

Picture: 
http://dl.dropbo

Re: [Qemu-devel] [PATCH v3 5/6] vga: Use linear mapping + dirty logging in chain 4 memory access mode

2011-09-13 Thread Avi Kivity

On 09/13/2011 09:54 AM, Alexander Graf wrote:

>
>  I had similar problems with sun4u, fixed with
>  f69539b14bdba7a5cd22e1f4bed439b476b17286. I think also here, PCI
>  should be given a memory range at 0x80000000 and VGA should
>  automatically use that like.

Yeah, usually the ISA bus is behind an ISA-PCI bridge, so it should inherit the 
offset from its parent. Or do you mean something different?



He means that isa_mem_base should go away; instead isa_address_space() 
should be a subregion at offset 0x80000000.
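The idea of nesting isa_address_space() inside the PCI region, rather than carrying a global isa_mem_base constant, can be pictured as resolving addresses by summing subregion offsets up a tree. The following is only a toy sketch of that idea in plain C, not the QEMU memory API; all names and the 0x80000000 offset placement are illustrative.

```c
#include <stdint.h>
#include <stddef.h>

/* Toy model of nested memory regions: each region records its offset
 * within its parent, and an absolute address is resolved by summing the
 * offsets up the tree.  With this structure, an ISA region nested in a
 * PCI region at 0x80000000 needs no global isa_mem_base. */
typedef struct Region {
    const char *name;
    uint64_t offset;              /* offset within the parent region */
    const struct Region *parent;  /* NULL for the root (system memory) */
} Region;

/* Translate a region-relative address into an absolute one. */
static uint64_t region_abs_addr(const Region *r, uint64_t addr)
{
    for (; r != NULL; r = r->parent) {
        addr += r->offset;
    }
    return addr;
}
```

With a pci region at offset 0x80000000 and an isa region nested inside it, a VGA access at 0xa0000 resolves to 0x800a0000, matching the relocated ISA range on such machines.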


Which vga variant are you using?

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.




[Qemu-devel] QEMU Image problem

2011-09-13 Thread bala suru
Hi,

I have a problem generating a KVM/QEMU image from an .iso file.
I tried virt-manager but could not create a proper one.

Can you please tell me the correct way to generate a QEMU image from an
.iso file?

Regards
Bala


[Qemu-devel] [PATCH][RFC][0/2] REF+/REF- optimization

2011-09-13 Thread Frediano Ziglio
These patches try to trade off between leaks and speed for cluster
refcounts.

Refcount increments (REF+ or refp) are handled in a different way from
decrements (REF- or refm). The reason is that postponing or not flushing
a REF- causes "just" a leak, while postponing a REF+ causes corruption.

To optimize REF- I use an array to store offsets; when a flush is
requested, or the array reaches a limit (currently 1022 entries), the
array is sorted and written to disk. I use an array of offsets instead
of ranges in order to support compression (an offset can appear multiple
times in the array).
I consider this patch quite ready.

To optimize REF+ I mark a range as allocated and use this range to hand
out new clusters (avoiding writing refcounts to disk). When a flush is
requested, or in some situations (like snapshots), this cache is
disabled and flushed (written as REF-).
I do not consider this patch ready; it works and passes all io-tests,
but for instance I would like to avoid allocating new clusters for
refcounts during preallocation.

The resulting speedup is quite visible when allocating clusters (more
than 20%).
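The REF- batching scheme described in the cover letter can be sketched as follows. This is a toy model, not the actual patch: the cache length matches the 1022 mentioned above, but the names and the elided on-disk update are invented for illustration.

```c
#include <stdint.h>
#include <stdlib.h>

#define REFM_CACHE_LEN 1022

typedef struct {
    uint64_t cache[REFM_CACHE_LEN];
    int index;      /* number of queued offsets */
    int flushes;    /* how many batched writebacks happened */
} RefmCache;

static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Sort the queued offsets so that decrements touching the same refcount
 * block become adjacent, then apply them in a single pass. */
static void refm_flush(RefmCache *c)
{
    qsort(c->cache, c->index, sizeof(uint64_t), cmp_u64);
    /* ...here the real code would walk the sorted array and update the
     * refcount blocks on disk... */
    c->index = 0;
    c->flushes++;
}

/* Queue one decrement; flush first if the array is full.  The same
 * offset may be queued multiple times (needed for compressed clusters,
 * which is why offsets are stored rather than ranges). */
static void refm_add(RefmCache *c, uint64_t offset)
{
    if (c->index >= REFM_CACHE_LEN) {
        refm_flush(c);
    }
    c->cache[c->index++] = offset;
}
```

The point of the sort is locality: after sorting, consecutive entries usually fall into the same refcount block, so one loaded block serves many decrements before being written back.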


Frediano Ziglio (2):
  qcow2: optimize refminus updates
  qcow2: ref+ optimization

 block/qcow2-refcount.c |  270 +---
 block/qcow2.c  |2 +
 block/qcow2.h  |   16 +++
 3 files changed, 275 insertions(+), 13 deletions(-)




[Qemu-devel] [PATCH][RFC][1/2] qcow2: optimize refminus updates

2011-09-13 Thread Frediano Ziglio
Cache refcount decrements in an array to trade off between
leaks and speed.

Signed-off-by: Frediano Ziglio 
---
 block/qcow2-refcount.c |  142 ++--
 block/qcow2.c  |1 +
 block/qcow2.h  |   14 +
 3 files changed, 153 insertions(+), 4 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 9605367..7d59b68 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -40,6 +40,13 @@ int qcow2_refcount_init(BlockDriverState *bs)
 BDRVQcowState *s = bs->opaque;
 int ret, refcount_table_size2, i;
 
+s->refm_cache_index = 0;
+s->refm_cache_len = 1024;
+s->refm_cache = g_malloc(s->refm_cache_len * sizeof(uint64_t));
+if (!s->refm_cache) {
+goto fail;
+}
+
 refcount_table_size2 = s->refcount_table_size * sizeof(uint64_t);
 s->refcount_table = g_malloc(refcount_table_size2);
 if (s->refcount_table_size > 0) {
@@ -53,12 +60,14 @@ int qcow2_refcount_init(BlockDriverState *bs)
 }
 return 0;
  fail:
+g_free(s->refm_cache);
 return -ENOMEM;
 }
 
 void qcow2_refcount_close(BlockDriverState *bs)
 {
 BDRVQcowState *s = bs->opaque;
+g_free(s->refm_cache);
 g_free(s->refcount_table);
 }
 
@@ -634,13 +643,21 @@ int64_t qcow2_alloc_bytes(BlockDriverState *bs, int size)
 void qcow2_free_clusters(BlockDriverState *bs,
   int64_t offset, int64_t size)
 {
+BDRVQcowState *s = bs->opaque;
 int ret;
+int64_t start, last;
 
 BLKDBG_EVENT(bs->file, BLKDBG_CLUSTER_FREE);
-ret = update_refcount(bs, offset, size, -1);
-if (ret < 0) {
-fprintf(stderr, "qcow2_free_clusters failed: %s\n", strerror(-ret));
-/* TODO Remember the clusters to free them later and avoid leaking */
+start = offset & ~(s->cluster_size - 1);
+last = (offset + size - 1) & ~(s->cluster_size - 1);
+for (; start <= last; start += s->cluster_size) {
+ret = qcow2_refm_add(bs, start);
+if (ret < 0) {
+fprintf(stderr, "qcow2_free_clusters failed: %s\n", 
strerror(-ret));
+/* TODO Remember the clusters to free them later
+ * and avoid leaking */
+break;
+}
 }
 }
 
@@ -1165,3 +1182,120 @@ fail:
 return ret;
 }
 
+int qcow2_refm_add_any(BlockDriverState *bs, int64_t offset)
+{
+BDRVQcowState *s = bs->opaque;
+
+offset &= ~QCOW_OFLAG_COPIED;
+if (s->refm_cache_index + 2 > s->refm_cache_len) {
+int ret = qcow2_refm_flush(bs);
+if (ret < 0) {
+return ret;
+}
+}
+
+if ((offset & QCOW_OFLAG_COMPRESSED)) {
+int nb_csectors = ((offset >> s->csize_shift) & s->csize_mask) + 1;
+int64_t last;
+
+offset = (offset & s->cluster_offset_mask) & ~511;
+last  = offset + nb_csectors * 512 - 1;
+if (!in_same_refcount_block(s, offset, last)) {
+s->refm_cache[s->refm_cache_index++] = last;
+}
+}
+s->refm_cache[s->refm_cache_index++] = offset;
+return 0;
+}
+
+static int uint64_cmp(const void *a, const void *b)
+{
+#define A (*((const uint64_t *)a))
+#define B (*((const uint64_t *)b))
+if (A == B) {
+return 0;
+}
+return A > B ? 1 : -1;
+#undef A
+#undef B
+}
+
+int qcow2_refm_flush(BlockDriverState *bs)
+{
+BDRVQcowState *s = bs->opaque;
+uint16_t *refcount_block = NULL;
+int64_t old_table_index = -1;
+int ret, i, saved_index = 0;
+int len = s->refm_cache_index;
+
+/* sort cache */
+qsort(s->refm_cache, len, sizeof(uint64_t), uint64_cmp);
+
+/* save */
+for (i = 0; i < len; ++i) {
+uint64_t cluster_offset = s->refm_cache[i];
+int block_index, refcount;
+int64_t cluster_index = cluster_offset >> s->cluster_bits;
+int64_t table_index =
+cluster_index >> (s->cluster_bits - REFCOUNT_SHIFT);
+
+/* Load the refcount block and allocate it if needed */
+if (table_index != old_table_index) {
+if (refcount_block) {
+ret = qcow2_cache_put(bs, s->refcount_block_cache,
+(void **) &refcount_block);
+if (ret < 0) {
+goto fail;
+}
+saved_index = i;
+refcount_block = NULL;
+}
+
+ret = alloc_refcount_block(bs, cluster_index, &refcount_block);
+if (ret < 0) {
+goto fail;
+}
+}
+old_table_index = table_index;
+
+qcow2_cache_entry_mark_dirty(s->refcount_block_cache, refcount_block);
+
+/* we can update the count and save it */
+block_index = cluster_index &
+((1 << (s->cluster_bits - REFCOUNT_SHIFT)) - 1);
+
+refcount = be16_to_cpu(refcount_block[block_index]);
+refcount--;
+if (refcount < 0) {
+ret = -EINVAL;
+goto fail;
+}
+if (refcount == 0 && clus

[Qemu-devel] [PATCH][RFC][2/2] qcow2: ref+ optimization

2011-09-13 Thread Frediano Ziglio
Preallocate multiple refcount increments in order to collapse
allocation writes. This causes leaks in case of a QEMU crash, but
no corruption.

Signed-off-by: Frediano Ziglio 
---
 block/qcow2-refcount.c |  128 ---
 block/qcow2.c  |1 +
 block/qcow2.h  |2 +
 3 files changed, 122 insertions(+), 9 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 7d59b68..3792cda 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -30,6 +30,7 @@ static int64_t alloc_clusters_noref(BlockDriverState *bs, 
int64_t size);
 static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
 int64_t offset, int64_t length,
 int addend);
+static void qcow2_refp_enable(BlockDriverState *bs);
 
 
 /*/
@@ -117,6 +118,12 @@ static int get_refcount(BlockDriverState *bs, int64_t 
cluster_index)
 ((1 << (s->cluster_bits - REFCOUNT_SHIFT)) - 1);
 refcount = be16_to_cpu(refcount_block[block_index]);
 
+/* ignore preallocation */
+if (cluster_index >= s->refp_prealloc_begin
+&& cluster_index < s->refp_prealloc_end) {
+--refcount;
+}
+
 ret = qcow2_cache_put(bs, s->refcount_block_cache,
 (void**) &refcount_block);
 if (ret < 0) {
@@ -207,6 +214,10 @@ static int alloc_refcount_block(BlockDriverState *bs,
  *   refcount block into the cache
  */
 
+uint64_t old_free_cluster_index = s->free_cluster_index;
+qcow2_refp_flush(bs);
+s->free_cluster_index = old_free_cluster_index;
+
 *refcount_block = NULL;
 
 /* We write to the refcount table, so we might depend on L2 tables */
@@ -215,6 +226,7 @@ static int alloc_refcount_block(BlockDriverState *bs,
 /* Allocate the refcount block itself and mark it as used */
 int64_t new_block = alloc_clusters_noref(bs, s->cluster_size);
 if (new_block < 0) {
+qcow2_refp_enable(bs);
 return new_block;
 }
 
@@ -279,6 +291,7 @@ static int alloc_refcount_block(BlockDriverState *bs,
 }
 
 s->refcount_table[refcount_table_index] = new_block;
+qcow2_refp_enable(bs);
 return 0;
 }
 
@@ -400,10 +413,11 @@ static int alloc_refcount_block(BlockDriverState *bs,
 s->refcount_table_offset = table_offset;
 
 /* Free old table. Remember, we must not change free_cluster_index */
-uint64_t old_free_cluster_index = s->free_cluster_index;
+old_free_cluster_index = s->free_cluster_index;
 qcow2_free_clusters(bs, old_table_offset, old_table_size * 
sizeof(uint64_t));
 s->free_cluster_index = old_free_cluster_index;
 
+qcow2_refp_enable(bs);
 ret = load_refcount_block(bs, new_block, (void**) refcount_block);
 if (ret < 0) {
 return ret;
@@ -417,6 +431,7 @@ fail_block:
 if (*refcount_block != NULL) {
 qcow2_cache_put(bs, s->refcount_block_cache, (void**) refcount_block);
 }
+qcow2_refp_enable(bs);
 return ret;
 }
 
@@ -529,9 +544,23 @@ static int update_cluster_refcount(BlockDriverState *bs,
 BDRVQcowState *s = bs->opaque;
 int ret;
 
-ret = update_refcount(bs, cluster_index << s->cluster_bits, 1, addend);
-if (ret < 0) {
-return ret;
+/* handle preallocation */
+if (cluster_index >= s->refp_prealloc_begin
+&& cluster_index < s->refp_prealloc_end) {
+
+/* free previous (should never happen) */
+int64_t index = s->refp_prealloc_begin;
+for (; index < cluster_index; ++index) {
+qcow2_refm_add(bs, index << s->cluster_bits);
+}
+addend--;
+s->refp_prealloc_begin = cluster_index + 1;
+}
+if (addend) {
+ret = update_refcount(bs, cluster_index << s->cluster_bits, 1, addend);
+if (ret < 0) {
+return ret;
+}
 }
 
 bdrv_flush(bs->file);
@@ -572,20 +601,94 @@ retry:
 return (s->free_cluster_index - nb_clusters) << s->cluster_bits;
 }
 
+static void qcow2_refp_enable(BlockDriverState *bs)
+{
+BDRVQcowState *s = bs->opaque;
+
+if (s->refp_prealloc_end < 0) {
+/* enable again ? */
+if (++s->refp_prealloc_end == 0) {
+s->refp_prealloc_end =
+s->refp_prealloc_begin;
+}
+}
+}
+
+int qcow2_refp_flush(BlockDriverState *bs)
+{
+BDRVQcowState *s = bs->opaque;
+int64_t index, end = s->refp_prealloc_end;
+
+if (end < 0) {
+s->refp_prealloc_end = end - 1;
+return 0;
+}
+
+index = s->refp_prealloc_begin;
+/* this disables further allocations */
+s->refp_prealloc_end = -1;
+for (; index < end; ++index) {
+qcow2_refm_add(bs, index << s->cluster_bits);
+}
+qcow2_refm_flush(bs);
+return 0;
+}
+
 int64_t qcow2_alloc_clusters(BlockDriverState *bs, int64_t size)
 {
-int64_t offset;
-int ret;
+BDRVQcowState *s = bs->opaque;
+int64_t o

Re: [Qemu-devel] [PATCH v3 5/6] vga: Use linear mapping + dirty logging in chain 4 memory access mode

2011-09-13 Thread Alexander Graf

On 13.09.2011, at 09:51, Avi Kivity wrote:

> On 09/13/2011 09:54 AM, Alexander Graf wrote:
>> >
>> >  I had similar problems with sun4u, fixed with
>> >  f69539b14bdba7a5cd22e1f4bed439b476b17286. I think also here, PCI
>> >  should be given a memory range at 0x80000000 and VGA should
>> >  automatically use that like.
>> 
>> Yeah, usually the ISA bus is behind an ISA-PCI bridge, so it should inherit 
>> the offset from its parent. Or do you mean something different?
>> 
> 
> He means that isa_mem_base should go away; instead isa_address_space() should 
> be a subregion at offset 0x80000000.

So we are talking about the same thing. Logically speaking, ISA devices are 
behind the ISA-PCI bridge, so the parent would be the bridge, right?

> Which vga variant are you using?

This is stdvga.


Alex




Re: [Qemu-devel] [PATCH 1/4] Sparc: convert mmu_helper to trace framework

2011-09-13 Thread Stefan Hajnoczi
On Sun, Sep 11, 2011 at 04:41:10PM +, Blue Swirl wrote:
> +mmu_helper_dmiss(uint64_t address, uint64_t context) "DMISS at
> %"PRIx64" context %"PRIx64
> +mmu_helper_tfault(uint64_t address, uint64_t context) "TFAULT at
> %"PRIx64" context %"PRIx64
> +mmu_helper_tmiss(uint64_t address, uint64_t context) "TMISS at
> %"PRIx64" context %"PRIx64

From docs/tracing.txt:

  format strings must begin and end with double quotes.  When using
  portability macros, ensure they are preceded and followed by double
  quotes:
  "value %"PRIx64""

This is a parser limitation in scripts/tracetool.  We could change it to
treat everything after the trace event arguments as the format string,
but today it explicitly looks for a pattern like ".*".

I've added this to my tracing TODO list and it should be possible to
lift the limitation soon.  For now, please make sure the format string
begins and ends with double quote.
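Following that rule, the three events quoted above might be written with a closing pair of double quotes after the final PRIx64, e.g. (a sketch of the fixed lines, not the actual resubmitted patch):

```
mmu_helper_dmiss(uint64_t address, uint64_t context) "DMISS at %"PRIx64" context %"PRIx64""
mmu_helper_tfault(uint64_t address, uint64_t context) "TFAULT at %"PRIx64" context %"PRIx64""
mmu_helper_tmiss(uint64_t address, uint64_t context) "TMISS at %"PRIx64" context %"PRIx64""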

Stefan



Re: [Qemu-devel] Why qemu write/rw speed is so low?

2011-09-13 Thread Stefan Hajnoczi
On Tue, Sep 13, 2011 at 10:52:44AM +0800, Zhi Yong Wu wrote:
> This is the real log when fio is issued with bs=128K and bps=1000000
> (block I/O throttling):

I would use 1024 * 1024 instead of 1000000 as the throughput limit.
10^6 is not a multiple of 512 bytes and is not a nice value in KB/s
(976.5625).

> 
>   8,201 0.0 24332  A  WS 79958528 + 256 <-
> (253,2) 71830016

256 blocks = 256 * 512 bytes = 128 KB per request.  We know the maximum
request size from Linux is 128 KB so this makes sense.

> Throughput (R/W): 0KiB/s / 482KiB/s

What throughput do you get without I/O throttling?  Either I/O
throttling is limiting too aggressively here or the physical disk is the
bottleneck (I doubt that, since the write throughput value is very low).
We need to compare against the throughput when throttling is not
enabled.

Stefan



Re: [Qemu-devel] Why qemu write/rw speed is so low?

2011-09-13 Thread Stefan Hajnoczi
On Tue, Sep 13, 2011 at 10:38:28AM +0800, Zhi Yong Wu wrote:
> On Fri, Sep 9, 2011 at 6:38 PM, Stefan Hajnoczi
>  wrote:
> > On Fri, Sep 09, 2011 at 05:44:36PM +0800, Zhi Yong Wu wrote:
> >> Today, I did some basic I/O testing and suddenly found that qemu write
> >> and rw speed is very low now; my qemu binary is built on commit
> >> 344eecf6995f4a0ad1d887cec922f6806f91a3f8.
> >>
> >> Does qemu have a regression?
> >>
> >> The testing data is shown as below:
> >>
> >> 1.) write
> >>
> >> test: (g=0): rw=write, bs=512-512/512-512, ioengine=libaio, iodepth=1
> >
> > Please post your QEMU command-line.  If your -drive is using
> > cache=writethrough then small writes are slow because they require the
> > physical disk to write and then synchronize its write cache.  Typically
> > cache=none is a good setting to use for local disks.
> >
> > The block size of 512 bytes is too small.  Ext4 uses a 4 KB block size,
> > so I think a 512 byte write from the guest could cause a 4 KB
> > read-modify-write operation on the host filesystem.
> >
> > You can check this by running btrace(8) on the host during the
> > benchmark.  The blktrace output and the summary statistics will show
> > what I/O pattern the host is issuing.
>   8,201 0.0   337  A  WS 425081504 + 8 <-
> (253,1) 42611360

8 blocks = 8 * 512 bytes = 4 KB

So we are not performing 512 byte writes.  Some layer is changing the
I/O pattern.
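The read-modify-write effect described earlier in this thread can be illustrated by rounding a write out to the enclosing filesystem blocks. This is an illustrative helper under the assumption of power-of-two block sizes, not QEMU code.

```c
#include <stdint.h>

/* Round a write of [offset, offset + len) out to the enclosing
 * filesystem blocks (block must be a power of two).  On a 4 KB-block
 * filesystem, a 512-byte guest write covers a full 4 KB block, which
 * the host has to read, modify, and write back. */
static uint64_t rmw_span(uint64_t offset, uint64_t len, uint64_t block)
{
    uint64_t start = offset & ~(block - 1);            /* round down */
    uint64_t end = (offset + len + block - 1) & ~(block - 1); /* round up */
    return end - start;
}
```

A 512-byte write at offset 512 on a 4 KB-block filesystem spans 4 KB of host I/O; one that straddles a block boundary spans 8 KB.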

Stefan



Re: [Qemu-devel] [PATCH v3 5/6] vga: Use linear mapping + dirty logging in chain 4 memory access mode

2011-09-13 Thread Avi Kivity

On 09/13/2011 10:54 AM, Alexander Graf wrote:

>>
>>  Yeah, usually the ISA bus is behind an ISA-PCI bridge, so it should inherit 
the offset from its parent. Or do you mean something different?
>>
>
>  He means that isa_mem_base should go away; instead isa_address_space() 
should be a subregion at offset 0x80000000.

So we are talking about the same thing. Logically speaking, ISA devices are 
behind the ISA-PCI bridge, so the parent would be the bridge, right?



Right.  system_memory -> pci_address_space() -> isa_address_space() -> 
various vga areas.




>  Which vga variant are you using?

This is stdvga.




I don't see the call to vga_init() in that path (vga_init() is what passes 
isa_address_space() to the VGA core).


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.




Re: [Qemu-devel] [PATCH] qcow2: fix range check

2011-09-13 Thread Frediano Ziglio
2011/9/12 Kevin Wolf :
> Am 10.09.2011 10:23, schrieb Frediano Ziglio:
>> QCowL2Meta::offset is not cluster aligned but only sector aligned,
>> while nb_clusters counts clusters from the cluster start.
>> This fixes the range check. Note that the old code has no corruption
>> issues related to this check, since it only causes an intersection to
>> be detected when it shouldn't be.
>
> Are you sure? See below. (I think it doesn't corrupt the image, but for
> a different reason)
>
>>
>> Signed-off-by: Frediano Ziglio 
>> ---
>>  block/qcow2-cluster.c |   14 +++---
>>  1 files changed, 7 insertions(+), 7 deletions(-)
>>
>> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
>> index 428b5ad..2f76311 100644
>> --- a/block/qcow2-cluster.c
>> +++ b/block/qcow2-cluster.c
>> @@ -776,17 +776,17 @@ again:
>>       */
>>      QLIST_FOREACH(old_alloc, &s->cluster_allocs, next_in_flight) {
>>
>> -        uint64_t end_offset = offset + nb_clusters * s->cluster_size;
>> -        uint64_t old_offset = old_alloc->offset;
>> -        uint64_t old_end_offset = old_alloc->offset +
>> -            old_alloc->nb_clusters * s->cluster_size;
>> +        uint64_t start = offset >> s->cluster_bits;
>> +        uint64_t end = start + nb_clusters;
>> +        uint64_t old_start = old_alloc->offset >> s->cluster_bits;
>> +        uint64_t old_end = old_start + old_alloc->nb_clusters;
>>
>> -        if (end_offset < old_offset || offset > old_end_offset) {
>> +        if (end < old_start || start > old_end) {
>>              /* No intersection */
>
> Consider request A from 0x0 + 0x1000 bytes and request B from 0x2000 +
> 0x1000 bytes. Both touch the same cluster and therefore should be
> serialised, but 0x2000 > 0x1000, so we decided here that there is no
> intersection and we don't have to care.
>
> Note that this doesn't corrupt the image, qcow2 can handle parallel
> requests allocating the same cluster. In qcow2_alloc_cluster_link_l2()
> we get an additional COW operation, so performance will be hurt, but
> correctness is maintained.
>

I tested this by adding some printfs and also with strace, and I can
confirm that the current code serializes the allocations.
Using ranges A (0-0x1000) and B (0x2000-0x3000) and assuming 0x10000
(64k) as the cluster size you get
A:
   offset 0
   nb_clusters 1
B:
  offset 0x2000
  nb_clusters 1

So without the patch you get two ranges
A: 0-0x10000
B: 0x2000-0x12000
which intersect.

>>          } else {
>> -            if (offset < old_offset) {
>> +            if (start < old_start) {
>>                  /* Stop at the start of a running allocation */
>> -                nb_clusters = (old_offset - offset) >> s->cluster_bits;
>> +                nb_clusters = old_start - start;
>>              } else {
>>                  nb_clusters = 0;
>>              }
>
> Anyway, the patch looks good. Applied to the block branch.
>
> Kevin
>

Oh... I realize that the ranges are [start, end) (end not inclusive),
so the intersection test should be

   if (end <= old_start || start >= old_end) {

instead of

if (end < old_start || start > old_end) {

However, I don't understand why I got a small speed penalty with
this change (I've only done some small tests).
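The half-open test being discussed can be packaged as a small predicate (an illustrative sketch; the function name is invented):

```c
#include <stdint.h>
#include <stdbool.h>

/* Overlap test for half-open cluster ranges [start, end): there is no
 * intersection exactly when one range ends at or before the point where
 * the other begins. */
static bool ranges_overlap(uint64_t start, uint64_t end,
                           uint64_t old_start, uint64_t old_end)
{
    return !(end <= old_start || start >= old_end);
}
```

With half-open ranges, two ranges that merely share an endpoint (e.g. [0, 1) and [1, 2)) do not overlap, which is why `<=` and `>=` are the right comparisons.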

Frediano



Re: [Qemu-devel] [PATCH v3 5/6] vga: Use linear mapping + dirty logging in chain 4 memory access mode

2011-09-13 Thread Jan Kiszka
On 2011-09-13 09:39, Alexander Graf wrote:
> 
> On 12.09.2011, at 17:57, Jan Kiszka wrote:
> 
>> On 2011-09-12 17:49, Jan Kiszka wrote:
>>> On 2011-09-12 17:45, Andreas Färber wrote:
 Am 12.09.2011 17:33, schrieb Jan Kiszka:
> On 2011-09-12 17:20, Alexander Graf wrote:
>> Jan Kiszka wrote:
>>> Most VGA memory access modes require MMIO handling as they demand weird
>>> logic to get a byte from or into the video RAM. However, there is one
>>> exception: chain 4 mode with all memory planes enabled for writing. This
>>> mode actually allows linear mapping, which can then be combined with
>>> dirty logging to accelerate KVM.
>>>
>>> This patch accelerates specifically VBE accesses like they are used by
>>> grub in graphical mode. Not only the standard VGA adapter benefits from
>>> this, also vmware and spice in VGA mode.
>>>
>>> CC: Gerd Hoffmann 
>>> CC: Avi Kivity 
>>> Signed-off-by: Jan Kiszka 
>>>
>> [...]
>>
>>> +static void vga_update_memory_access(VGACommonState *s)
>>> +{
>>> +MemoryRegion *region, *old_region = s->chain4_alias;
>>> +target_phys_addr_t base, offset, size;
>>> +
>>> +s->chain4_alias = NULL;
>>> +
>>> +if ((s->sr[0x02] & 0xf) == 0xf && s->sr[0x04] & 0x08) {
>>> +offset = 0;
>>> +switch ((s->gr[6] >> 2) & 3) {
>>> +case 0:
>>> +base = 0xa0000;
>>> +size = 0x20000;
>>> +break;
>>> +case 1:
>>> +base = 0xa0000;
>>> +size = 0x10000;
>>> +offset = s->bank_offset;
>>> +break;
>>> +case 2:
>>> +base = 0xb0000;
>>> +size = 0x8000;
>>> +break;
>>> +case 3:
>>> +base = 0xb8000;
>>> +size = 0x8000;
>>> +break;
>>> +}
>>> +region = g_malloc(sizeof(*region));
>>> +memory_region_init_alias(region, "vga.chain4", &s->vram, 
>>> offset, size);
>>> +memory_region_add_subregion_overlap(s->legacy_address_space, 
>>> base,
>>> +region, 2);
>>>
>> This one eventually gives me the following in info mtree with -M g3beige
>> on qemu-system-ppc:
>>
>> (qemu) info mtree
>> memory
>> system addr  off  size 7fff
>> -vga.chain4 addr 000a off  size 1
>> -macio addr 8088 off  size 8
>> --macio-nvram addr 0006 off  size 2
>> --pmac-ide addr 0002 off  size 1000
>> --cuda addr 00016000 off  size 2000
>> --escc-bar addr 00013000 off  size 40
>> --dbdma addr 8000 off  size 1000
>> --heathrow-pic addr  off  size 1000
>> -vga.rom addr 8080 off  size 1
>> -vga.vram addr 8000 off  size 80
>> -vga-lowmem addr 800a off  size 2
>> -escc addr 80013000 off  size 40
>> -isa-mmio addr fe00 off  size 20
>> I/O
>> io addr  off  size 1
>> -cmd646-bmdma addr 0700 off  size 10
>> --cmd646-bmdma-ioport addr 000c off  size 4
>> --cmd646-bmdma-bus addr 0008 off  size 4
>> --cmd646-bmdma-ioport addr 0004 off  size 4
>> --cmd646-bmdma-bus addr  off  size 4
>> -cmd646-cmd addr 0680 off  size 4
>> -cmd646-data addr 0600 off  size 8
>> -cmd646-cmd addr 0580 off  size 4
>> -cmd646-data addr 0500 off  size 8
>> -ne2000 addr 0400 off  size 100
>>
>> This ends up overmapping 0xa0000, effectively overwriting kernel data.
>> If I #if 0 the offending chunk out, everything is fine. I would assume
>> that chain4 really needs to be inside of lowmem? No idea about VGA, but
>> I'm sure you know what's going on :).
> Does this help?
>
> diff --git a/hw/vga.c b/hw/vga.c
> index 125fb29..0a0c5a6 100644
> --- a/hw/vga.c
> +++ b/hw/vga.c
> @@ -181,6 +181,7 @@ static void vga_update_memory_access(VGACommonState 
> *s)
> size = 0x8000;
> break;
> }
> +base += isa_mem_base;
> region = g_malloc(sizeof(*region));
> memory_region_init_alias(region, "vga.chain4", &s->vram, offset, 
> size);
> memory_region_add_subregion_overlap(s->legacy_address_space, base,

 No longer oopses, but the screen looks chaotic now (black bar at bottom,
 part of contents at top etc.).
>>>
>>> Does this PPC machine map the ISA range and forward VGA accesses to the
>>> adapter in general?
>>
>> If it does, please post a dump of the VGACommonState while the screen is
>> cor

Re: [Qemu-devel] [PATCH] qcow2: fix range check

2011-09-13 Thread Kevin Wolf
Am 13.09.2011 10:10, schrieb Frediano Ziglio:
> 2011/9/12 Kevin Wolf :
>> Am 10.09.2011 10:23, schrieb Frediano Ziglio:
>>> QCowL2Meta::offset is not cluster aligned but only sector aligned,
>>> while nb_clusters counts clusters from the cluster start.
>>> This fixes the range check. Note that the old code has no corruption
>>> issues related to this check, since it only causes an intersection to
>>> be detected when it shouldn't be.
>>
>> Are you sure? See below. (I think it doesn't corrupt the image, but for
>> a different reason)
>>
>>>
>>> Signed-off-by: Frediano Ziglio 
>>> ---
>>>  block/qcow2-cluster.c |   14 +++---
>>>  1 files changed, 7 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
>>> index 428b5ad..2f76311 100644
>>> --- a/block/qcow2-cluster.c
>>> +++ b/block/qcow2-cluster.c
>>> @@ -776,17 +776,17 @@ again:
>>>   */
>>>  QLIST_FOREACH(old_alloc, &s->cluster_allocs, next_in_flight) {
>>>
>>> -uint64_t end_offset = offset + nb_clusters * s->cluster_size;
>>> -uint64_t old_offset = old_alloc->offset;
>>> -uint64_t old_end_offset = old_alloc->offset +
>>> -old_alloc->nb_clusters * s->cluster_size;
>>> +uint64_t start = offset >> s->cluster_bits;
>>> +uint64_t end = start + nb_clusters;
>>> +uint64_t old_start = old_alloc->offset >> s->cluster_bits;
>>> +uint64_t old_end = old_start + old_alloc->nb_clusters;
>>>
>>> -if (end_offset < old_offset || offset > old_end_offset) {
>>> +if (end < old_start || start > old_end) {
>>>  /* No intersection */
>>
>> Consider request A from 0x0 + 0x1000 bytes and request B from 0x2000 +
>> 0x1000 bytes. Both touch the same cluster and therefore should be
>> serialised, but 0x2000 > 0x1000, so we decided here that there is no
>> intersection and we don't have to care.
>>
>> Note that this doesn't corrupt the image, qcow2 can handle parallel
>> requests allocating the same cluster. In qcow2_alloc_cluster_link_l2()
>> we get an additional COW operation, so performance will be hurt, but
>> correctness is maintained.
>>
> 
> I tested this by adding some printfs and also with strace, and I can
> confirm that the current code serializes allocation.
> Using ranges A (0-0x1000) and B (0x2000-0x3000) and assuming 0x10000
> (64k) as the cluster size you get
> A:
>offset 0
>nb_clusters 1
> B:
>   offset 0x2000
>   nb_clusters 1
> 
> So without the patch you get two ranges
> A: 0-0x10000
> B: 0x2000-0x12000
> which intersects.
> 
>>>  } else {
>>> -if (offset < old_offset) {
>>> +if (start < old_start) {
>>>  /* Stop at the start of a running allocation */
>>> -nb_clusters = (old_offset - offset) >> s->cluster_bits;
>>> +nb_clusters = old_start - start;
>>>  } else {
>>>  nb_clusters = 0;
>>>  }
>>
>> Anyway, the patch looks good. Applied to the block branch.
>>
>> Kevin
>>
> 
> Oh... I realize that ranges are [start, end) (end not inclusive) so
> intersection test should be
> 
>if (end <= old_start || start >= old_end) {
> 
> instead of
> 
> if (end < old_start || start > old_end) {
> 
> However, I don't understand why I got a small speed penalty with
> this change (I've only done some small tests).

Hm, I think you are right. How do you measure performance?

Kevin



Re: [Qemu-devel] [PATCH v3 5/6] vga: Use linear mapping + dirty logging in chain 4 memory access mode

2011-09-13 Thread Andreas Färber

Am 13.09.2011 um 10:14 schrieb Jan Kiszka:


On 2011-09-13 09:39, Alexander Graf wrote:

(qemu) device_show #3
dev: VGA, id "#3", version 2
 dev.
   version_id: 0002
   config: 00 00 00 00 10 d1 cf 20 - 00 00 00 00 10 d1  
d0 30

   ...
   irq_state:  00 00 00 00 00 00 00 10 - 00 00 00 00 00 00  
00 00

 vga.
   latch:  
   sr_index:   00
   sr: 00 00 0f 00 08 00 00 00
   gr_index:   00
   gr: 00 00 00 00 00 40 05 00 - 00 00 00 00 00 00  
00 00

   ar_index:   20
   ar: 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00  
00 00

   ...
   ar_flip_flop:   0001
   cr_index:   00
   cr: 00 63 00 00 00 00 00 50 - 00 40 00 00 00 00  
00 00

   ...
   msr:00
   fcr:00
   st00:   00
   st01:   00
   dac_state:  00
   dac_sub_index:  00
   dac_read_index: 00
   dac_write_index:10
   dac_cache:  3f 3f 3f
   palette:00 00 00 00 00 2a 00 2a - 00 00 2a 2a 2a 00  
00 2a

   ...
   bank_offset:
   is_vbe_vmstate: 01
   vbe_index:  0004
   vbe_regs[00]:   b0c5
   vbe_regs[01]:   0320
   vbe_regs[02]:   0258
   vbe_regs[03]:   000f
   vbe_regs[04]:   0001
   vbe_regs[05]:   
   vbe_regs[06]:   0320
   vbe_regs[07]:   0258
   vbe_regs[08]:   
   vbe_regs[09]:   
   vbe_start_addr: 
   vbe_line_offset:0640
   vbe_bank_mask:  007f



Makes no sense, it must work with this setup. Maybe it's a dynamic effect
when switching modes or bank offsets. Do you have some test image  
for me?


I've been using a Debian netinst image, but businesscard should do as  
well:

http://cdimage.debian.org/debian-cd/6.0.2.1/powerpc/iso-cd/debian-6.0.2.1-powerpc-businesscard.iso

qemu-system-ppc -boot d -cdrom path/to.iso # at yaboot prompt type:  
install


Then in addition to the black bar in Alex' image, the penguin gets  
beheaded.


Andreas



Re: [Qemu-devel] Why qemu write/rw speed is so low?

2011-09-13 Thread Zhi Yong Wu
On Tue, Sep 13, 2011 at 3:15 PM, Stefan Hajnoczi
 wrote:
> On Tue, Sep 13, 2011 at 10:38:28AM +0800, Zhi Yong Wu wrote:
>> On Fri, Sep 9, 2011 at 6:38 PM, Stefan Hajnoczi
>>  wrote:
>> > On Fri, Sep 09, 2011 at 05:44:36PM +0800, Zhi Yong Wu wrote:
>> >> Today, I did some basic I/O testing, and suddenly found that qemu write 
>> >> and rw speed is very low now; my qemu binary is built on commit 
>> >> 344eecf6995f4a0ad1d887cec922f6806f91a3f8.
>> >>
>> >> Does qemu have a regression?
>> >>
>> >> The testing data is shown as below:
>> >>
>> >> 1.) write
>> >>
>> >> test: (g=0): rw=write, bs=512-512/512-512, ioengine=libaio, iodepth=1
>> >
>> > Please post your QEMU command-line.  If your -drive is using
>> > cache=writethrough then small writes are slow because they require the
>> > physical disk to write and then synchronize its write cache.  Typically
>> > cache=none is a good setting to use for local disks.
>> >
>> > The block size of 512 bytes is too small.  Ext4 uses a 4 KB block size,
>> > so I think a 512 byte write from the guest could cause a 4 KB
>> > read-modify-write operation on the host filesystem.
>> >
>> > You can check this by running btrace(8) on the host during the
>> > benchmark.  The blktrace output and the summary statistics will show
>> > what I/O pattern the host is issuing.
>>   8,2    0        1     0.0   337  A  WS 425081504 + 8 <-
>> (253,1) 42611360
>
> 8 blocks = 8 * 512 bytes = 4 KB
How do you know each block size is 512 bytes?

>
> So we are not performing 512 byte writes.  Some layer is changing the
> I/O pattern.
>
> Stefan
>



-- 
Regards,

Zhi Yong Wu



Re: [Qemu-devel] [PATCH v3 5/6] vga: Use linear mapping + dirty logging in chain 4 memory access mode

2011-09-13 Thread Alexander Graf

On 13.09.2011, at 10:14, Jan Kiszka wrote:

> On 2011-09-13 09:39, Alexander Graf wrote:
>> 
>> On 12.09.2011, at 17:57, Jan Kiszka wrote:
>> 
>>> On 2011-09-12 17:49, Jan Kiszka wrote:
 On 2011-09-12 17:45, Andreas Färber wrote:
> Am 12.09.2011 17:33, schrieb Jan Kiszka:
>> On 2011-09-12 17:20, Alexander Graf wrote:
>>> Jan Kiszka wrote:
 Most VGA memory access modes require MMIO handling as they demand weird
 logic to get a byte from or into the video RAM. However, there is one
 exception: chain 4 mode with all memory planes enabled for writing. 
 This
 mode actually allows linear mapping, which can then be combined with
 dirty logging to accelerate KVM.
 
 This patch accelerates specifically VBE accesses like they are used by
 grub in graphical mode. Not only the standard VGA adapter benefits from
 this, also vmware and spice in VGA mode.
 
 CC: Gerd Hoffmann 
 CC: Avi Kivity 
 Signed-off-by: Jan Kiszka 
 
>>> [...]
>>> 
 +static void vga_update_memory_access(VGACommonState *s)
 +{
 +MemoryRegion *region, *old_region = s->chain4_alias;
 +target_phys_addr_t base, offset, size;
 +
 +s->chain4_alias = NULL;
 +
 +if ((s->sr[0x02] & 0xf) == 0xf && s->sr[0x04] & 0x08) {
 +offset = 0;
 +switch ((s->gr[6] >> 2) & 3) {
 +case 0:
 +base = 0xa0000;
 +size = 0x20000;
 +break;
 +case 1:
 +base = 0xa0000;
 +size = 0x10000;
 +offset = s->bank_offset;
 +break;
 +case 2:
 +base = 0xb0000;
 +size = 0x8000;
 +break;
 +case 3:
 +base = 0xb8000;
 +size = 0x8000;
 +break;
 +}
 +region = g_malloc(sizeof(*region));
 +memory_region_init_alias(region, "vga.chain4", &s->vram, 
 offset, size);
 +memory_region_add_subregion_overlap(s->legacy_address_space, 
 base,
 +region, 2);
 
>>> This one eventually gives me the following in info mtree with -M g3beige
>>> on qemu-system-ppc:
>>> 
>>> (qemu) info mtree
>>> memory
>>> system addr  off  size 7fff
>>> -vga.chain4 addr 000a off  size 1
>>> -macio addr 8088 off  size 8
>>> --macio-nvram addr 0006 off  size 2
>>> --pmac-ide addr 0002 off  size 1000
>>> --cuda addr 00016000 off  size 2000
>>> --escc-bar addr 00013000 off  size 40
>>> --dbdma addr 8000 off  size 1000
>>> --heathrow-pic addr  off  size 1000
>>> -vga.rom addr 8080 off  size 1
>>> -vga.vram addr 8000 off  size 80
>>> -vga-lowmem addr 800a off  size 2
>>> -escc addr 80013000 off  size 40
>>> -isa-mmio addr fe00 off  size 20
>>> I/O
>>> io addr  off  size 1
>>> -cmd646-bmdma addr 0700 off  size 10
>>> --cmd646-bmdma-ioport addr 000c off  size 4
>>> --cmd646-bmdma-bus addr 0008 off  size 4
>>> --cmd646-bmdma-ioport addr 0004 off  size 4
>>> --cmd646-bmdma-bus addr  off  size 4
>>> -cmd646-cmd addr 0680 off  size 4
>>> -cmd646-data addr 0600 off  size 8
>>> -cmd646-cmd addr 0580 off  size 4
>>> -cmd646-data addr 0500 off  size 8
>>> -ne2000 addr 0400 off  size 100
>>> 
>>> This ends up overmapping 0xa0000, effectively overwriting kernel data.
>>> If I #if 0 the offending chunk out, everything is fine. I would assume
>>> that chain4 really needs to be inside of lowmem? No idea about VGA, but
>>> I'm sure you know what's going on :).
>> Does this help?
>> 
>> diff --git a/hw/vga.c b/hw/vga.c
>> index 125fb29..0a0c5a6 100644
>> --- a/hw/vga.c
>> +++ b/hw/vga.c
>> @@ -181,6 +181,7 @@ static void vga_update_memory_access(VGACommonState 
>> *s)
>>size = 0x8000;
>>break;
>>}
>> +base += isa_mem_base;
>>region = g_malloc(sizeof(*region));
>>memory_region_init_alias(region, "vga.chain4", &s->vram, offset, 
>> size);
>>memory_region_add_subregion_overlap(s->legacy_address_space, base,
> 
> No longer oopses, but the screen looks chaotic now (black bar at bottom,
> part of contents at top etc.).
 
 Does

[Qemu-devel] [PATCH] PPC: Fix heathrow PIC to use little endian MMIO

2011-09-13 Thread Alexander Graf
During the memory API conversion, the indication on little endianness of
MMIO for the heathrow PIC got dropped. This patch adds it back again.

Signed-off-by: Alexander Graf 
---
 hw/heathrow_pic.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/hw/heathrow_pic.c b/hw/heathrow_pic.c
index 51996ab..16f48d1 100644
--- a/hw/heathrow_pic.c
+++ b/hw/heathrow_pic.c
@@ -126,7 +126,7 @@ static uint64_t pic_read(void *opaque, target_phys_addr_t 
addr,
 static const MemoryRegionOps heathrow_pic_ops = {
 .read = pic_read,
 .write = pic_write,
-.endianness = DEVICE_NATIVE_ENDIAN,
+.endianness = DEVICE_LITTLE_ENDIAN,
 };
 
 static void heathrow_pic_set_irq(void *opaque, int num, int level)
-- 
1.6.0.2




Re: [Qemu-devel] Why qemu write/rw speed is so low?

2011-09-13 Thread Stefan Hajnoczi
On Tue, Sep 13, 2011 at 9:31 AM, Zhi Yong Wu  wrote:
> On Tue, Sep 13, 2011 at 3:15 PM, Stefan Hajnoczi
>  wrote:
>> On Tue, Sep 13, 2011 at 10:38:28AM +0800, Zhi Yong Wu wrote:
>>> On Fri, Sep 9, 2011 at 6:38 PM, Stefan Hajnoczi
>>>  wrote:
>>> > On Fri, Sep 09, 2011 at 05:44:36PM +0800, Zhi Yong Wu wrote:
>>> >> Today, I did some basic I/O testing, and suddenly found that qemu 
>>> >> write and rw speed is very low now; my qemu binary is built on commit 
>>> >> 344eecf6995f4a0ad1d887cec922f6806f91a3f8.
>>> >>
>>> >> Does qemu have a regression?
>>> >>
>>> >> The testing data is shown as below:
>>> >>
>>> >> 1.) write
>>> >>
>>> >> test: (g=0): rw=write, bs=512-512/512-512, ioengine=libaio, iodepth=1
>>> >
>>> > Please post your QEMU command-line.  If your -drive is using
>>> > cache=writethrough then small writes are slow because they require the
>>> > physical disk to write and then synchronize its write cache.  Typically
>>> > cache=none is a good setting to use for local disks.
>>> >
>>> > The block size of 512 bytes is too small.  Ext4 uses a 4 KB block size,
>>> > so I think a 512 byte write from the guest could cause a 4 KB
>>> > read-modify-write operation on the host filesystem.
>>> >
>>> > You can check this by running btrace(8) on the host during the
>>> > benchmark.  The blktrace output and the summary statistics will show
>>> > what I/O pattern the host is issuing.
>>>   8,2    0        1     0.0   337  A  WS 425081504 + 8 <-
>>> (253,1) 42611360
>>
>> 8 blocks = 8 * 512 bytes = 4 KB
> How do you know each block size is 512 bytes?

The blkparse format specifier for blocks is 'n'.  Here is the code to
print it from blkparse_fmt.c:

case 'n':
fprintf(ofp, strcat(format, "u"), t_sec(t));

And t_sec() is:

#define t_sec(t)        ((t)->bytes >> 9)

So it divides the byte count by 512.  Block size == sector size == 512 bytes.

You can get the blktrace source code here:

http://brick.kernel.dk/snaps/

Stefan



Re: [Qemu-devel] Why qemu write/rw speed is so low?

2011-09-13 Thread Zhi Yong Wu
On Tue, Sep 13, 2011 at 4:49 PM, Stefan Hajnoczi  wrote:
> On Tue, Sep 13, 2011 at 9:31 AM, Zhi Yong Wu  wrote:
>> On Tue, Sep 13, 2011 at 3:15 PM, Stefan Hajnoczi
>>  wrote:
>>> On Tue, Sep 13, 2011 at 10:38:28AM +0800, Zhi Yong Wu wrote:
 On Fri, Sep 9, 2011 at 6:38 PM, Stefan Hajnoczi
  wrote:
 > On Fri, Sep 09, 2011 at 05:44:36PM +0800, Zhi Yong Wu wrote:
 >> Today, I did some basic I/O testing, and suddenly found that qemu 
 >> write and rw speed is very low now; my qemu binary is built on commit 
 >> 344eecf6995f4a0ad1d887cec922f6806f91a3f8.
 >>
 >> Does qemu have a regression?
 >>
 >> The testing data is shown as below:
 >>
 >> 1.) write
 >>
 >> test: (g=0): rw=write, bs=512-512/512-512, ioengine=libaio, iodepth=1
 >
 > Please post your QEMU command-line.  If your -drive is using
 > cache=writethrough then small writes are slow because they require the
 > physical disk to write and then synchronize its write cache.  Typically
 > cache=none is a good setting to use for local disks.
 >
 > The block size of 512 bytes is too small.  Ext4 uses a 4 KB block size,
 > so I think a 512 byte write from the guest could cause a 4 KB
 > read-modify-write operation on the host filesystem.
 >
 > You can check this by running btrace(8) on the host during the
 > benchmark.  The blktrace output and the summary statistics will show
 > what I/O pattern the host is issuing.
   8,2    0        1     0.0   337  A  WS 425081504 + 8 <-
 (253,1) 42611360
>>>
>>> 8 blocks = 8 * 512 bytes = 4 KB
>> How do you know each block size is 512 bytes?
>
> The blkparse format specifier for blocks is 'n'.  Here is the code to
> print it from blkparse_fmt.c:
>
> case 'n':
>    fprintf(ofp, strcat(format, "u"), t_sec(t));
>
> And t_sec() is:
>
> #define t_sec(t)        ((t)->bytes >> 9)
Great, it shifts right by 9 bits, i.e. the unit is changed
from bytes to blocks. Got it, thanks.

>
> So it divides the byte count by 512.  Block size == sector size == 512 bytes.
>
> You can get the blktrace source code here:
>
> http://brick.kernel.dk/snaps/
>
> Stefan
>



-- 
Regards,

Zhi Yong Wu



Re: [Qemu-devel] [PATCH v3 5/6] vga: Use linear mapping + dirty logging in chain 4 memory access mode

2011-09-13 Thread Jan Kiszka
On 2011-09-13 10:40, Alexander Graf wrote:
> Btw, it still tries to execute invalid code even with your patch. #if 0'ing 
> out the memory region updates at least gets the guest booting for me. Btw, to 
> get it working you also need a patch for the interrupt controller (another 
> breakage thanks to memory api).
> 
> diff --git a/hw/heathrow_pic.c b/hw/heathrow_pic.c
> index 51996ab..16f48d1 100644
> --- a/hw/heathrow_pic.c
> +++ b/hw/heathrow_pic.c
> @@ -126,7 +126,7 @@ static uint64_t pic_read(void *opaque, target_phys_addr_t 
> addr,
>  static const MemoryRegionOps heathrow_pic_ops = {
>  .read = pic_read,
>  .write = pic_write,
> -.endianness = DEVICE_NATIVE_ENDIAN,
> +.endianness = DEVICE_LITTLE_ENDIAN,
>  };
>  
>  static void heathrow_pic_set_irq(void *opaque, int num, int level)
> 

With or without this fix, with or without active chain-4 optimization,
I just get an empty yellow screen when firing up qemu-system-ppc (also
when using the Debian ISO). Do I need to specify a specific machine type?

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



Re: [Qemu-devel] [Bug 848571] [NEW] qemu does not generate a qemu-kvm.stp tapset file

2011-09-13 Thread Stefan Hajnoczi
On Tue, Sep 13, 2011 at 5:08 AM, William Cohen
<848...@bugs.launchpad.net> wrote:
> Public bug reported:
>
> To make the systemtap probing easier to use qemu generates qemu*.stp
> files with aliases for various events for each of the executables. The
> installer places these files in /usr/share/systemtap/tapset/.  These
> files are generated by the tracetool. However, the /usr/bin/qemu-kvm is
> produced with a copy:
>
>  cp -a x86_64-softmmu/qemu-system-x86_64 qemu-kvm
>
> No matching qemu-kvm.stp is generated for the qemu-kvm executable. It would
> be really nice if that tapset file is generated so people can use more
> symbolic probe points.

Jes Sorensen added an option to make this possible:

http://repo.or.cz/w/qemu.git/commitdiff/e323c93edf3abb67c37b8e08b78da4835880f12e

Distro packaging scripts can make use of this tracetool option to
generate .stp files for qemu-kvm in its install path.

I think you need to file a bug with your distro.

Stefan



Re: [Qemu-devel] qemu virtIO blocking operation - question

2011-09-13 Thread Stefan Hajnoczi
On Mon, Sep 12, 2011 at 10:05 PM, Sinha, Ani  wrote:
>
> On Sep 11, 2011, at 6:34 AM, Stefan Hajnoczi wrote:
>
>>
>> You may find these posts I wrote helpful, they explain vcpu threads and
>> the I/O thread:
>> http://blog.vmsplice.net/2011/03/qemu-internals-big-picture-overview.html
>> http://blog.vmsplice.net/2011/03/qemu-internals-overall-architecture-and.html
>>
>
>
> "One example user of worker threads is posix-aio-compat.c, an asynchronous 
> file I/O implementation. When core QEMU issues an aio request it is placed on 
> a queue. Worker threads take requests off the queue and execute them outside 
> of core QEMU. They may perform blocking operations since they execute in 
> their own threads and do not block
>
> Another example is ui/vnc-jobs-async.c which performs compute-intensive image 
> compression and encoding in worker threads."
>
>
> I wonder why there isn't a general framework for this: some thread that would 
> take requests off a queue and process them without knowing the internals of 
> the request.

There is, you can use GThreadPool.

Stefan



Re: [Qemu-devel] qemu virtIO blocking operation - question

2011-09-13 Thread Stefan Hajnoczi
On Mon, Sep 12, 2011 at 7:31 PM, Sinha, Ani  wrote:
>
> On Sep 11, 2011, at 6:34 AM, Stefan Hajnoczi wrote:
>
>> On Fri, Sep 09, 2011 at 07:45:17PM -0500, Sinha, Ani wrote:
>>> So I am writing a virtIO driver for a device that supports blocking calls 
>>> like poll() etc. Now the front end paravirtualized driver mirrors the 
>>> request to the backend "host" qemu process that then does the actual call 
>>> on the host kernel on behalf of the guest. Now my question is, when I do a 
>>> blocking call from within the callback that I have registered when I 
>>> created the virtqueue, does it block the entire vcpu? If it does, do I have 
>>> to create an async context for it? Looking at virtio-net and virtio-block, 
>>> I can see the concept of bottom halves but I am not sure if this helps me 
>>> in any way.
>>
>> What device are you adding?  It would help to share what you are trying
>> to accomplish.
>
>
> We are trying to paravirtualize the IPMI device (/dev/ipmi0).

From http://en.wikipedia.org/wiki/Intelligent_Platform_Management_Interface:
"An implementation of IPMI version 1.5 can communicate via a direct
serial connection or via a side-band local area network (LAN)
connection to a remote client."

Why do you need a new virtio device?  Can't you use virtio-serial?
This is what other management channels are using for host<->guest
agents.

What features and use cases does paravirtualized IPMI provide?

Stefan



Re: [Qemu-devel] Why qemu write/rw speed is so low?

2011-09-13 Thread Zhi Yong Wu
On Tue, Sep 13, 2011 at 3:14 PM, Stefan Hajnoczi
 wrote:
> On Tue, Sep 13, 2011 at 10:52:44AM +0800, Zhi Yong Wu wrote:
>> This is the real log when fio is issued with bs=128K and bps=100000 (block
>> I/O throttling):
>
> I would use 1024 * 1024 instead of 100000 as the throughput limit.
> 10^5 is not a multiple of 512 bytes and is not a nice value in KB/s
> (976.5625).
OK, next time I will adopt this.
>
>>
>>   8,2    0        1     0.0 24332  A  WS 79958528 + 256 <-
>> (253,2) 71830016
>
> 256 blocks = 256 * 512 bytes = 128 KB per request.  We know the maximum
> request size from Linux is 128 KB so this makes sense.
>
>> Throughput (R/W): 0KiB/s / 482KiB/s
>
> What throughput do you get without I/O throttling?  Either I/O
> throttling is limiting too aggressively here or the physical disk is the
> bottleneck (I double that since the write throughput value is very low).
> We need to compare against the throughput when throttling is not
> enabled.
Without block I/O throttling.

test: (g=0): rw=write, bs=512-512/512-512, ioengine=libaio, iodepth=1
Starting 1 process
Jobs: 1 (f=1): [W] [100.0% done] [0K/58K /s] [0/114 iops] [eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=2659
  write: io=51,200KB, bw=59,936B/s, iops=117, runt=874741msec
slat (usec): min=25, max=44,515, avg=69.77, stdev=774.19
clat (usec): min=778, max=216K, avg=8460.67, stdev=2417.70
 lat (usec): min=845, max=216K, avg=8531.11, stdev=2778.62
bw (KB/s) : min=   11, max=   60, per=100.89%, avg=58.52, stdev= 3.14
  cpu  : usr=0.04%, sys=0.76%, ctx=102601, majf=0, minf=49
  IO depths: 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
 submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
 complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
 issued r/w: total=0/102400, short=0/0
 lat (usec): 1000=0.01%
 lat (msec): 2=0.01%, 4=0.01%, 10=99.17%, 20=0.24%, 50=0.53%
 lat (msec): 100=0.04%, 250=0.01%

Run status group 0 (all jobs):
  WRITE: io=51,200KB, aggrb=58KB/s, minb=59KB/s, maxb=59KB/s,
mint=874741msec, maxt=874741msec

Disk stats (read/write):
  dm-0: ios=37/103237, merge=0/0, ticks=1935/901887, in_queue=903811,
util=99.67%, aggrios=37/102904, aggrmerge=0/351,
aggrticks=1935/889769, aggrin_queue=891623, aggrutil=99.64%
vda: ios=37/102904, merge=0/351, ticks=1935/889769,
in_queue=891623, util=99.64%
test: (g=0): rw=write, bs=8K-8K/8K-8K, ioengine=libaio, iodepth=1
Starting 1 process
Jobs: 1 (f=1): [W] [100.0% done] [0K/973K /s] [0/118 iops] [eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=2716
  write: io=51,200KB, bw=926KB/s, iops=115, runt= 55291msec
slat (usec): min=20, max=36,133, avg=68.68, stdev=920.02
clat (msec): min=1, max=58, avg= 8.52, stdev= 1.99
 lat (msec): min=1, max=66, avg= 8.58, stdev= 2.48
bw (KB/s) : min=  587, max=  972, per=100.23%, avg=928.14, stdev=54.43
  cpu  : usr=0.04%, sys=0.59%, ctx=6416, majf=0, minf=26
  IO depths: 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
 submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
 complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
 issued r/w: total=0/6400, short=0/0

 lat (msec): 2=0.06%, 4=0.06%, 10=99.00%, 20=0.25%, 50=0.61%
 lat (msec): 100=0.02%

Run status group 0 (all jobs):
  WRITE: io=51,200KB, aggrb=926KB/s, minb=948KB/s, maxb=948KB/s,
mint=55291msec, maxt=55291msec

Disk stats (read/write):
  dm-0: ios=3/6507, merge=0/0, ticks=33/68470, in_queue=68508,
util=99.51%, aggrios=3/6462, aggrmerge=0/60, aggrticks=33/64291,
aggrin_queue=64322, aggrutil=99.48%
vda: ios=3/6462, merge=0/60, ticks=33/64291, in_queue=64322, util=99.48%
test: (g=0): rw=write, bs=64K-64K/64K-64K, ioengine=libaio, iodepth=1
Starting 1 process
Jobs: 1 (f=1): [W] [100.0% done] [0K/7,259K /s] [0/110 iops] [eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=2727
  write: io=51,200KB, bw=7,050KB/s, iops=110, runt=  7262msec
slat (usec): min=30, max=46,393, avg=90.62, stdev=1639.10
clat (msec): min=2, max=39, avg= 8.98, stdev= 1.82
 lat (msec): min=2, max=85, avg= 9.07, stdev= 3.08
bw (KB/s) : min= 6003, max= 7252, per=100.13%, avg=7058.86, stdev=362.31
  cpu  : usr=0.00%, sys=0.61%, ctx=801, majf=0, minf=23
  IO depths: 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
 submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
 complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
 issued r/w: total=0/800, short=0/0

 lat (msec): 4=0.25%, 10=92.38%, 20=7.00%, 50=0.38%

Run status group 0 (all jobs):
  WRITE: io=51,200KB, aggrb=7,050KB/s, minb=7,219KB/s, maxb=7,219KB/s,
mint=7262msec, maxt=7262msec

Disk stats (read/write):
  dm-0: ios=0/808, merge=0/0, ticks=0/8216, in_queue=8225,
util=98.31%, aggrios=0/804, aggrmerge=0/18, aggrticks=0/7363,
aggrin_queue=7363, aggrutil=98.19%

[Qemu-devel] qcow2: snapshot and resize possible?

2011-09-13 Thread Frediano Ziglio
Looking at the block TODOs I saw qcow2 resize with snapshots. However, I
would like to ask whether this is technically possible with the current
format. The reason is that snapshots have no size field (only l1_size;
QCowHeader has a size field), and I think that the size is part of the
machine state and the exact size cannot be computed from l1_size.
Should this resize be posted to the qcow3 update?

Frediano



Re: [Qemu-devel] [PATCH v3 5/6] vga: Use linear mapping + dirty logging in chain 4 memory access mode

2011-09-13 Thread Alexander Graf

On 13.09.2011, at 11:00, Jan Kiszka wrote:

> On 2011-09-13 10:40, Alexander Graf wrote:
>> Btw, it still tries to execute invalid code even with your patch. #if 0'ing 
>> out the memory region updates at least gets the guest booting for me. Btw, to 
>> get it working you also need a patch for the interrupt controller (another 
>> breakage thanks to memory api).
>> 
>> diff --git a/hw/heathrow_pic.c b/hw/heathrow_pic.c
>> index 51996ab..16f48d1 100644
>> --- a/hw/heathrow_pic.c
>> +++ b/hw/heathrow_pic.c
>> @@ -126,7 +126,7 @@ static uint64_t pic_read(void *opaque, 
>> target_phys_addr_t addr,
>> static const MemoryRegionOps heathrow_pic_ops = {
>> .read = pic_read,
>> .write = pic_write,
>> -.endianness = DEVICE_NATIVE_ENDIAN,
>> +.endianness = DEVICE_LITTLE_ENDIAN,
>> };
>> 
>> static void heathrow_pic_set_irq(void *opaque, int num, int level)
>> 
> 
> With or without this fix, with or without active chain-4 optimization,
> I just get an empty yellow screen when firing up qemu-system-ppc (also
> when using the Debian ISO). Do I need to specify a specific machine type?

Ugh. No, you only need this patch:

  [PATCH] PPC: Fix via-cuda memory registration

which fixes another recently introduced regression :)


Alex




Re: [Qemu-devel] [PATCH v3 5/6] vga: Use linear mapping + dirty logging in chain 4 memory access mode

2011-09-13 Thread Andreas Färber

Am 13.09.2011 um 11:00 schrieb Jan Kiszka:


On 2011-09-13 10:40, Alexander Graf wrote:
Btw, it still tries to execute invalid code even with your patch.  
#if 0'ing out the memory region updates at least gets the guest  
booting for me. Btw, to get it working you also need a patch for  
the interrupt controller (another breakage thanks to memory api).


diff --git a/hw/heathrow_pic.c b/hw/heathrow_pic.c
index 51996ab..16f48d1 100644
--- a/hw/heathrow_pic.c
+++ b/hw/heathrow_pic.c
@@ -126,7 +126,7 @@ static uint64_t pic_read(void *opaque,  
target_phys_addr_t addr,

static const MemoryRegionOps heathrow_pic_ops = {
.read = pic_read,
.write = pic_write,
-.endianness = DEVICE_NATIVE_ENDIAN,
+.endianness = DEVICE_LITTLE_ENDIAN,
};

static void heathrow_pic_set_irq(void *opaque, int num, int level)



With or without this fix, with or without active chain-4  
optimization,

I just get an empty yellow screen when firing up qemu-system-ppc (also
when using the Debian ISO). Do I need to specify a specific machine  
type?


No. Did you try with Alex' via-cuda patch? That's the only one I have  
on my branch for Linux host.


Andreas



Re: [Qemu-devel] qcow2: snapshot and resize possible?

2011-09-13 Thread Kevin Wolf
Am 13.09.2011 11:41, schrieb Frediano Ziglio:
> Looking at the block TODOs I saw qcow2 resize with snapshots. However, I
> would like to ask whether this is technically possible with the current
> format. The reason is that snapshots have no size field (only l1_size;
> QCowHeader has a size field), and I think that the size is part of the
> machine state and the exact size cannot be computed from l1_size.
> Should this resize be posted to the qcow3 update?

Yes, I think it would require a format change. Maybe we should add a
field for the image size to the qcow2 v3 proposal.

Kevin



Re: [Qemu-devel] About hotplug multifunction

2011-09-13 Thread Michael S. Tsirkin
On Tue, Sep 13, 2011 at 09:52:49AM +0300, Gleb Natapov wrote:
> On Tue, Sep 13, 2011 at 02:57:20PM +0900, Isaku Yamahata wrote:
> > On Sun, Sep 11, 2011 at 12:05:17PM +0300, Michael S. Tsirkin wrote:
> > > On Sat, Sep 10, 2011 at 02:43:11AM +0900, Isaku Yamahata wrote:
> > > > pci/pcie hot plug needs clean up for multifunction hotplug in the long 
> > > > term. Only the single function device case works; the multifunction 
> > > > case is somewhat broken.
> > > > Especially, the current acpi based hotplug should be replaced by
> > > > the standardized hot plug controller in the long term.
> > > 
> > > We'll need to keep supporting windows XP, which IIUC only
> > > supports hotplug through ACPI. So it looks like we'll
> > > need both.
> > 
> > Yes, we'll need both then.
> > It would be possible to implement acpi-based hotplug with a
> > standardized hotplug controller, but not with a qemu-specific controller.
> > 
> Where is this "standardized hotplug controller" documented?

In the pci bridge spec.

> > It would require a fair amount of work to write ACPI code in the DSDT that
> > handles a standardized hotplug controller.
> > So I'm not sure it's worthwhile only for Windows XP support.
> > -- 
> > yamahata
> 
> --
>   Gleb.



Re: [Qemu-devel] Using the qemu tracepoints with SystemTap

2011-09-13 Thread Stefan Hajnoczi
On Mon, Sep 12, 2011 at 4:33 PM, William Cohen  wrote:
> The RHEL-6 version of qemu-kvm makes the tracepoints available to SystemTap. 
> I have been working on useful examples for the SystemTap tracepoints in qemu. 
> There doesn't seem to be a great number of examples showing the utility of 
> the tracepoints in diagnosing problems. However, I came across the following 
> blog entry that had several examples:
>
> http://blog.vmsplice.net/2011/03/how-to-write-trace-analysis-scripts-for.html
>
> I reimplemented the VirtqueueRequestTracker example from the blog in 
> SystemTap (the attached virtqueueleaks.stp). I can run it on RHEL-6's 
> qemu-kvm-0.12.1.2-2.160.el6_1.8.x86_64 and get output like the following. It 
> outputs the pid and the address of the elem that leaked when the script is 
> stopped like the following:
>
> $ stap virtqueueleaks.stp
> ^C
>     pid     elem
>   19503  1c4af28
>   19503  1c56f88
>   19503  1c62fe8
>   19503  1c6f048
>   19503  1c7b0a8
>   19503  1c87108
>   19503  1c93168
> ...
>
> I am not that familiar with the internals of qemu. The script seems to 
> indicate qemu is leaking, but is that really the case?  If there are 
> resource leaks, what output would help debug those leaks? What enhancements 
> can be done to this script to provide more useful information?

Leak tracing always has corner cases :).

With virtio-blk this would indicate a leak because it uses a
request-response model where the guest initiates I/O and the host
responds.  A guest that cleanly shuts down before you exit your
SystemTap script should not leak requests for virtio-blk.

With virtio-net the guest actually hands the host receive buffers and
they are held until we can receive packets into them and return them
to the host.  We don't have a virtio_reset trace event, and due to
this we're not accounting for clean shutdown (the guest driver resets
the device to clear all virtqueues).

I am submitting a patch to add virtio_reset() tracing.  This will
allow the script to delete all elements belonging to this virtio
device.

> Are there other examples of qemu probing people would like to see?

The malloc/realloc/memalign/vmalloc/free/vfree trace events can be
used for a few things:
 * Finding memory leaks.
 * Finding malloc/vfree or vmalloc/free mismatches.  The rules are:
malloc/realloc need free, memalign/vmalloc need vfree.  They cannot be
mixed.

Stefan



Re: [Qemu-devel] About hotplug multifunction

2011-09-13 Thread Michael S. Tsirkin
On Tue, Sep 13, 2011 at 09:52:49AM +0300, Gleb Natapov wrote:
> On Tue, Sep 13, 2011 at 02:57:20PM +0900, Isaku Yamahata wrote:
> > On Sun, Sep 11, 2011 at 12:05:17PM +0300, Michael S. Tsirkin wrote:
> > > On Sat, Sep 10, 2011 at 02:43:11AM +0900, Isaku Yamahata wrote:
> > > > pci/pcie hot plug needs clean up for multifunction hotplug in long term.
> > > > Only single function device case works. Multifunction case is broken 
> > > > somewhat.
> > > > Especially the current acpi based hotplug should be replaced by
> > > > the standardized hot plug controller in long term.
> > > 
> > > We'll need to keep supporting windows XP, which IIUC only
> > > supports hotplug through ACPI. So it looks like we'll
> > > need both.
> > 
> > Yes, we'll need both then.
> > It would be possible to implement acpi-based hotplug with
> > standardized hotplug controller. Not with qemu-specific controller.
> > 
> Where is this "standardized hotplug controller" documented?

Sorry, both the PCI bridge and hotplug specs only reference SHPC.
The spec itself is the PCI Standard Hot-Plug
Controller and Subsystem Specification,
Revision 1.0 - get it from the PCI SIG.

> > It would require a fair amount of work to write ACPI code in DSDT that
> > handles standardized hotplug controller.
> > So I'm not sure it's worth while only for windows XP support.
> > -- 
> > yamahata
> 
> --
>   Gleb.



Re: [Qemu-devel] About hotplug multifunction

2011-09-13 Thread Gleb Natapov
On Tue, Sep 13, 2011 at 01:05:00PM +0300, Michael S. Tsirkin wrote:
> On Tue, Sep 13, 2011 at 09:52:49AM +0300, Gleb Natapov wrote:
> > On Tue, Sep 13, 2011 at 02:57:20PM +0900, Isaku Yamahata wrote:
> > > On Sun, Sep 11, 2011 at 12:05:17PM +0300, Michael S. Tsirkin wrote:
> > > > On Sat, Sep 10, 2011 at 02:43:11AM +0900, Isaku Yamahata wrote:
> > > > > pci/pcie hot plug needs clean up for multifunction hotplug in long 
> > > > > term.
> > > > > Only single function device case works. Multifunction case is broken 
> > > > > somewhat.
> > > > > Especially the current acpi based hotplug should be replaced by
> > > > > the standardized hot plug controller in long term.
> > > > 
> > > > We'll need to keep supporting windows XP, which IIUC only
> > > > supports hotplug through ACPI. So it looks like we'll
> > > > need both.
> > > 
> > > Yes, we'll need both then.
> > > It would be possible to implement acpi-based hotplug with
> > > standardized hotplug controller. Not with qemu-specific controller.
> > > 
> > Where is this "standardized hotplug controller" documented?
> 
> Sorry, both the PCI bridge and hotplug specs only reference SHPC.
> The spec itself is the PCI Standard Hot-Plug
> Controller and Subsystem Specification,
> Revision 1.0 - get it from the PCI SIG.
> 
> 
Thanks, Isaku already pointed it to me.

--
Gleb.



Re: [Qemu-devel] Why qemu write/rw speed is so low?

2011-09-13 Thread Stefan Hajnoczi
On Tue, Sep 13, 2011 at 10:25 AM, Zhi Yong Wu  wrote:
> On Tue, Sep 13, 2011 at 3:14 PM, Stefan Hajnoczi
>  wrote:
>> On Tue, Sep 13, 2011 at 10:52:44AM +0800, Zhi Yong Wu wrote:
>>> This is the real log when fio is issued with bs=128K and bps=1000000 (block
>>> I/O throttling):
>>
>> I would use 1024 * 1024 instead of 100 as the throughput limit.
>> 10^6 is not a multiple of 512 bytes and is not a nice value in KB/s
>> (976.5625).
> OK, next time I will adopt this.
>>
>>>
>>>   8,2    0        1     0.0 24332  A  WS 79958528 + 256 <-
>>> (253,2) 71830016
>>
>> 256 blocks = 256 * 512 bytes = 128 KB per request.  We know the maximum
>> request size from Linux is 128 KB so this makes sense.
>>
>>> Throughput (R/W): 0KiB/s / 482KiB/s
>>
>> What throughput do you get without I/O throttling?  Either I/O
>> throttling is limiting too aggressively here or the physical disk is the
>> bottleneck (I doubt that since the write throughput value is very low).
>> We need to compare against the throughput when throttling is not
>> enabled.
> Without block I/O throttling.
[...]
> test: (g=0): rw=write, bs=128K-128K/128K-128K, ioengine=libaio, iodepth=1
> Starting 1 process
> Jobs: 1 (f=1): [W] [100.0% done] [0K/13M /s] [0/103 iops] [eta 00m:00s]
> test: (groupid=0, jobs=1): err= 0: pid=2734
>  write: io=51,200KB, bw=12,933KB/s, iops=101, runt=  3959msec

This shows that the physical disk is capable of far exceeding 1 MB/s
when I/O is not limited.  So the earlier result where the guest only
gets 482 KiB/s under the 1000000 bps limit shows that I/O limits are being
too aggressive.  For some reason the algorithm is causing the guest to
get lower throughput than expected.

It would be interesting to try with bps=$((10 * 1024 * 1024)).  I
wonder if the algorithm has a constant overhead of a couple hundred
KB/s or if it changes with the much larger bps value.

Stefan



Re: [Qemu-devel] Why qemu write/rw speed is so low?

2011-09-13 Thread Zhi Yong Wu
On Tue, Sep 13, 2011 at 6:14 PM, Stefan Hajnoczi  wrote:
> On Tue, Sep 13, 2011 at 10:25 AM, Zhi Yong Wu  wrote:
>> On Tue, Sep 13, 2011 at 3:14 PM, Stefan Hajnoczi
>>  wrote:
>>> On Tue, Sep 13, 2011 at 10:52:44AM +0800, Zhi Yong Wu wrote:
 This is the real log when fio is issued with bs=128K and bps=1000000 (block
 I/O throttling):
>>>
>>> I would use 1024 * 1024 instead of 100 as the throughput limit.
>>> 10^6 is not a multiple of 512 bytes and is not a nice value in KB/s
>>> (976.5625).
>> OK, next time I will adopt this.
>>>

   8,2    0        1     0.0 24332  A  WS 79958528 + 256 <-
 (253,2) 71830016
>>>
>>> 256 blocks = 256 * 512 bytes = 128 KB per request.  We know the maximum
>>> request size from Linux is 128 KB so this makes sense.
>>>
 Throughput (R/W): 0KiB/s / 482KiB/s
>>>
>>> What throughput do you get without I/O throttling?  Either I/O
>>> throttling is limiting too aggressively here or the physical disk is the
>>> bottleneck (I doubt that since the write throughput value is very low).
>>> We need to compare against the throughput when throttling is not
>>> enabled.
>> Without block I/O throttling.
> [...]
>> test: (g=0): rw=write, bs=128K-128K/128K-128K, ioengine=libaio, iodepth=1
>> Starting 1 process
>> Jobs: 1 (f=1): [W] [100.0% done] [0K/13M /s] [0/103 iops] [eta 00m:00s]
>> test: (groupid=0, jobs=1): err= 0: pid=2734
>>  write: io=51,200KB, bw=12,933KB/s, iops=101, runt=  3959msec
>
> This shows that the physical disk is capable of far exceeding 1 MB/s
> when I/O is not limited.  So the earlier result where the guest only
> gets 482 KiB/s under the 1000000 bps limit shows that I/O limits are being
> too aggressive.  For some reason the algorithm is causing the guest to
> get lower throughput than expected.
>
> It would be interesting to try with bps=$((10 * 1024 * 1024)).  I
> wonder if the algorithm has a constant overhead of a couple hundred
> KB/s or if it changes with the much larger bps value.
OK, I will try it tomorrow.

By the way, I/O throttling can now be enabled from the libvirt guest's XML
file when the guest is started up.
I have pushed its code changes to my git tree.

git branch: ssh://wu...@repo.or.cz/srv/git/libvirt/zwu.git dev

>
> Stefan
>



-- 
Regards,

Zhi Yong Wu



Re: [Qemu-devel] [PATCH][RFC][0/2] REF+/REF- optimization

2011-09-13 Thread Kevin Wolf
Am 13.09.2011 09:53, schrieb Frediano Ziglio:
> These patches try to trade-off between leaks and speed for clusters
> refcounts.
> 
> Refcount increments (REF+ or refp) are handled in a different way from
> decrements (REF- or refm). The reason is that posting or not flushing
> a REF- causes "just" a leak while posting a REF+ causes a corruption.
> 
> To optimize REF- I just used an array to store offsets then when a
> flush is requested or the array reaches a limit (currently 1022), the array
> is sorted and written to disk. I use an array with offset instead of
> ranges to support compression (an offset could appear multiple times
> in the array).
> I consider this patch quite ready.

Ok, first of all let's clarify what this optimises. I don't think it
changes anything at all for the writeback cache modes, because these
already do most operations in memory only. So this must be about
optimising some operations with cache=writethrough. REF- isn't about
normal cluster allocation, it is about COW with internal snapshots or
bdrv_discard. Do you have benchmarks for any of them?

I strongly disagree with your approach for REF-. We already have a
cache, and introducing a second one sounds like a bad idea. I think we
could get a very similar effect if we introduced a
qcow2_cache_entry_mark_dirty_wb() that marks a given refcount block as
dirty, but at the same time tells the cache that even in write-through
mode it can still treat this block as write-back. This should require
much less code changes.

But let's measure the effects first, I suspect that for cluster
allocation it doesn't help much because every REF- comes with a REF+.

> To optimize REF+ I mark a range as allocated and use this range to
> get new ones (avoiding writing refcount to disk). When a flush is
> requested or in some situations (like snapshot) this cache is disabled
> and flushed (written as REF-).
> I do not consider this patch ready, it works and pass all io-tests
> but for instance I would avoid allocating new clusters for refcount
> during preallocation.

The only question here is if improving cache=writethrough cluster
allocation performance is worth the additional complexity in the already
complex refcounting code.

The alternative that was discussed before is the dirty bit approach that
is used in QED and would allow us to use writeback for all refcount
blocks, regardless of REF- or REF+. It would be an easier approach
requiring less code changes, but it comes with the cost of requiring an
fsck after a qemu crash.

> The resulting speedup is quite visible when allocating clusters (more than 20%).

What benchmark do you use for testing this?

Kevin



[Qemu-devel] Help needed to modify VVFAT

2011-09-13 Thread Pintu Agarwal
Hi, 
This is regarding vvfat.
I want to do some modification in vvfat. And I need some help.
Currently vvfat scans all directories and sub-directories at the beginning.

I want to modify vvfat so that it scans only the TOP directory content 
and not the sub-directory content.
How can I do that?

Please let me know.

 
 
Thanks,
Pintu



Re: [Qemu-devel] [PATCH v3 5/6] vga: Use linear mapping + dirty logging in chain 4 memory access mode

2011-09-13 Thread Jan Kiszka
On 2011-09-13 11:42, Alexander Graf wrote:
> 
> On 13.09.2011, at 11:00, Jan Kiszka wrote:
> 
>> On 2011-09-13 10:40, Alexander Graf wrote:
>>> Btw, it still tries to execute invalid code even with your patch. #if 0'ing 
>>> out the memory region updates at least get the guest booting for me. Btw, 
>>> to get it working you also need a patch for the interrupt controller 
>>> (another breakage thanks to memory api).
>>>
>>> diff --git a/hw/heathrow_pic.c b/hw/heathrow_pic.c
>>> index 51996ab..16f48d1 100644
>>> --- a/hw/heathrow_pic.c
>>> +++ b/hw/heathrow_pic.c
>>> @@ -126,7 +126,7 @@ static uint64_t pic_read(void *opaque, 
>>> target_phys_addr_t addr,
>>> static const MemoryRegionOps heathrow_pic_ops = {
>>> .read = pic_read,
>>> .write = pic_write,
>>> -.endianness = DEVICE_NATIVE_ENDIAN,
>>> +.endianness = DEVICE_LITTLE_ENDIAN,
>>> };
>>>
>>> static void heathrow_pic_set_irq(void *opaque, int num, int level)
>>>
>>
>> With or without this fix, with or without the active chain-4 optimization,
>> I just get an empty yellow screen when firing up qemu-system-ppc (also
>> when using the Debian ISO). Do I need to specify a specific machine type?
> 
> Ugh. No, you only need this patch:
> 
>   [PATCH] PPC: Fix via-cuda memory registration
> 
> which fixes another recently introduced regression :)

That works now - and allowed me to identify the bug after enhancing info
mtree a bit:

(qemu) info mtree
memory
  addr  prio 0 size 7fff system
addr 8088 prio 1 size 8 macio
  addr 808e prio 0 size 2 macio-nvram
  addr 808a prio 0 size 1000 pmac-ide
  addr 80896000 prio 0 size 2000 cuda
  addr 80893000 prio 0 size 40 escc-bar
  addr 80888000 prio 0 size 1000 dbdma
  addr 8088 prio 0 size 1000 heathrow-pic
addr 8000 prio 1 size 80 vga.vram
addr 800a prio 1 size 2 vga-lowmem
...

Here is the problem: Both the vram and the ISA range get mapped into
system address space, but the former eclipses the latter as it shows up
earlier in the list and has the same priority. This picture changes with
the chain-4 alias which has prio 2, thus maps over the vram.

It looks to me like the ISA address space is either misplaced at
0x8000 or is not supposed to be mapped at all on PPC. Comments?

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



Re: [Qemu-devel] Question on kvm_clock working ...

2011-09-13 Thread al pat
Thanks Philipp.

My current source is "kvm-clock".

When I start my guest, it is in sync with the host clock.

Then, I change the time on my host - using "date --set ...". I don't see the
guest update its time.
I was expecting that the guest would detect the host time change and follow it.

So, when the host is exporting its system time and TSC values, does it go
into the "emulated RTC" of the guest and the guest checks it only once? Or
does the guest resync its clock with the host's value periodically?

I can try to do: "hwclock --hctosys --utc" --- this is just to check. (I
have kvm-clock as my clock source though).

Thanks
-a

On Tue, Sep 13, 2011 at 2:49 AM, Philipp Hahn  wrote:

> Hello Al,
>
> I just debugged a kvmclock bug, so I claim to have some knowledge in this
> area
> now, but please take my answer with a grain of doubt.
>
> On Monday 12 September 2011 15:21:25 al pat wrote:
> > Still seeking your guidance on this. Appreciate any pointers you may
> have.
>
> You have to distinguish between the real-time-clock (RTC), which in hardware
> is
> a battery powered clock running even when your PC is powered off. Since
> it's
> slow to access, most Linux distributions read out its value once during
> boot
> using "hwclock --hctosys --utc" and than don't care about that clock any
> more
> until shutdown, when they write back the system time to the RTC
> using "... --systohc ...".
> During runtime, other methods are used for time keeping: Either by counting
> regular interrupts, using the ACPI-PM clock, or the High Performance Event
> Timer (HPET), or the Time Stamp Counter (TSC) register, or ...;
> see /sys/devices/system/clocksource/clocksource0/available_clocksource for
> a
> list of available clock sources.
>
> For virtual machines there is an additional clock source named "kvmclock",
> which uses the host clock and the TSC: The host exports its current system
> time (plus some configurable offset) and a snapshot value of TSC register
> when doing so. Then the guest can interpolate the current time by using the
> exported_system_time + scale * (current_TSC_value-snapshot_TSC_value). This
> kvmclock doesn't have anything to do with the RTC clock as far as I know.
>
> Now to your problem: You should check the value
> of /sys/devices/system/clocksource/clocksource0/current_clocksource in your
> guest. If it is something other than kvmclock, you should check if
> using "hwclock --hctosys --utc" re-synchronizes your guest clock to the
> host.
>
> Sincerely
> Philipp
> --
> Philipp Hahn   Open Source Software Engineer
> h...@univention.de
> Univention GmbHLinux for Your Businessfon: +49 421 22 232-
> 0
> Mary-Somerville-Str.1  D-28359 Bremen fax: +49 421 22
> 232-99
>
> http://www.univention.de/
>
> 
> Meet Univention at IT&Business from 20 to 22 September 2011
> at the joint stand of the Open Source Business Alliance in Stuttgart,
> Halle 3, Stand 3D27-7.
>


Re: [Qemu-devel] [PATCH V8 03/14] Add persistent state handling to TPM TIS frontend driver

2011-09-13 Thread Paul Moore
On Monday, September 12, 2011 07:37:25 PM Stefan Berger wrote:
> On 09/12/2011 05:16 PM, Paul Moore wrote:
> > On Sunday, September 11, 2011 12:45:05 PM Stefan Berger wrote:
> >> On 09/09/2011 05:13 PM, Paul Moore wrote:
> >>> On Wednesday, August 31, 2011 10:35:54 AM Stefan Berger wrote:
>  Index: qemu-git/hw/tpm_tis.c
>  ===================================================================
>  --- qemu-git.orig/hw/tpm_tis.c
>  +++ qemu-git/hw/tpm_tis.c
>  @@ -6,6 +6,8 @@
>  
>  * Author: Stefan Berger
>  * David Safford
>  *
>  
>  + * Xen 4 support: Andreas Niederl
>  + *
>  
>  * This program is free software; you can redistribute it
>  and/or
>  * modify it under the terms of the GNU General Public
>  License as
>  * published by the Free Software Foundation, version 2 of
>  the
>  
>  @@ -839,3 +841,167 @@ static int tis_init(ISADevice *dev)
>  
>  err_exit:
> return -1;
> 
> }
>  
>  +
>  +/* persistent state handling */
>  +
>  +static void tis_pre_save(void *opaque)
>  +{
>  +TPMState *s = opaque;
>  +uint8_t locty = s->active_locty;
> >>> 
> >>> Is it safe to read s->active_locty without the state_lock?  I'm not
> >>> sure at this point but I saw it being protected by the lock
> >>> elsewhere ...>> 
> >> It cannot change anymore since no vCPU is in the TPM TIS emulation
> >> layer
> >> anymore but all we're doing is wait for the last outstanding command
> >> to
> >> be returned to us from the TPM thread.
> >> I don't mind putting this reading into the critical section, though,
> >> just to have it be consistent.
> > 
> > [Dropping the rest of the comments since they all cover the same issue]
> > 
> > I'm a big fan of consistency, especially when it comes to locking;
> > inconsistent lock usage can lead to confusion and that is almost never
> > good.
> > 
> > If we need a lock here because there is the potential for an outstanding
> > TPM command, then I vote for locking in this function just as you would
> > in any other.  However, if we really don't need locks here because the
> > outstanding TPM command will have _no_ effect on the TPMState or any
> > related structure, then just do away with the lock completely and make
> > of note of it in the function explaining why.
> 
> Let's give the consistency argument the favor and extend the locking
> over those parts that usually also get locked.

Great, thanks.

-- 
paul moore
virtualization @ redhat




Re: [Qemu-devel] About hotplug multifunction

2011-09-13 Thread Amos Kong
Hi all,

After reading the pci driver code, I found a problem.

There is a list for each slot (slot->funcs);
it is initialized in acpiphp_glue.c:register_slot() before hotplugging a device,
and only one entry (func 0) is added to it;
no new entries are added to the list when hotplugging devices into the slot.

When we release the device, there is only _one_ entry in the list (slot->funcs).

acpiphp_glue.c:disable_device()
list_for_each_entry(func, &slot->funcs, sibling) {
pdev = pci_get_slot(slot->bridge->pci_bus,
PCI_DEVFN(slot->device, func->function));
...release code...  // this code is executed only once (for func 0)
pci_remove_bus_device(pdev);
}

bus.c:pci_bus_add_device() is called for each func device in 
acpiphp_glue.c:enable_device().
bus.c:pci_remove_bus_device(pdev) is only called for func 0 in 
acpiphp_glue.c:disable_device().


Resolution (I've tested it successfully):
enumerate all the funcs when disabling the device.

list_for_each_entry(func, &slot->funcs, sibling) {
for (i=0; i<8; i++) {
pdev = pci_get_slot(slot->bridge->pci_bus,
PCI_DEVFN(slot->device, i));
...release code...
pci_remove_bus_device(pdev);

}
}

Thanks,
Amos



Re: [Qemu-devel] [PATCH 0/2] versatile: cleanups to remove use of sysbus_mmio_init_cb2

2011-09-13 Thread Avi Kivity

On 09/12/2011 06:01 PM, Peter Maydell wrote:

Ping?


Sorry; applied to memory/queue.


On 1 September 2011 18:36, Peter Maydell  wrote:
>  A couple of patches which do some cleanup work to versatile
>  devices following the recent MemoryRegion conversion. These
>  both remove uses of sysbus_mmio_init_cb2(), which strikes me
>  as kind of ugly and worth avoiding. (After these two patches
>  it will be used by only sh_pci.c and ppce500_pci.c...)
>
>  Peter Maydell (2):
>hw/arm11mpcore: Clean up to avoid using sysbus_mmio_init_cb2
>hw/versatile_pci: Expose multiple sysbus mmio regions
>
>hw/arm11mpcore.c   |   13 +
>hw/realview.c  |   12 ++--
>hw/versatile_pci.c |   42 --
>hw/versatilepb.c   |   12 ++--
>4 files changed, 29 insertions(+), 50 deletions(-)



--
error compiling committee.c: too many arguments to function




[Qemu-devel] [PATCH 3/4] trace: remove trailing double quotes after PRI*64

2011-09-13 Thread Stefan Hajnoczi
Now that format strings can end in a PRI*64 macro, remove the
workarounds from the trace-events file.

Signed-off-by: Stefan Hajnoczi 
---
 trace-events |   34 +-
 1 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/trace-events b/trace-events
index cfcdc9b..9a59525 100644
--- a/trace-events
+++ b/trace-events
@@ -88,8 +88,8 @@ balloon_event(void *opaque, unsigned long addr) "opaque %p 
addr %lu"
 # hw/apic.c
 apic_local_deliver(int vector, uint32_t lvt) "vector %d delivery mode %d"
 apic_deliver_irq(uint8_t dest, uint8_t dest_mode, uint8_t delivery_mode, 
uint8_t vector_num, uint8_t trigger_mode) "dest %d dest_mode %d delivery_mode 
%d vector %d trigger_mode %d"
-cpu_set_apic_base(uint64_t val) "%016"PRIx64""
-cpu_get_apic_base(uint64_t val) "%016"PRIx64""
+cpu_set_apic_base(uint64_t val) "%016"PRIx64
+cpu_get_apic_base(uint64_t val) "%016"PRIx64
 apic_mem_readl(uint64_t addr, uint32_t val)  "%"PRIx64" = %08x"
 apic_mem_writel(uint64_t addr, uint32_t val) "%"PRIx64" = %08x"
 # coalescing
@@ -169,21 +169,21 @@ slavio_led_mem_readw(uint32_t ret) "Read diagnostic LED 
%04x"
 # hw/slavio_timer.c
 slavio_timer_get_out(uint64_t limit, uint32_t counthigh, uint32_t count) 
"limit %"PRIx64" count %x%08x"
 slavio_timer_irq(uint32_t counthigh, uint32_t count) "callback: count %x%08x"
-slavio_timer_mem_readl_invalid(uint64_t addr) "invalid read address %"PRIx64""
+slavio_timer_mem_readl_invalid(uint64_t addr) "invalid read address %"PRIx64
 slavio_timer_mem_readl(uint64_t addr, uint32_t ret) "read %"PRIx64" = %08x"
 slavio_timer_mem_writel(uint64_t addr, uint32_t val) "write %"PRIx64" = %08x"
-slavio_timer_mem_writel_limit(unsigned int timer_index, uint64_t count) 
"processor %d user timer set to %016"PRIx64""
+slavio_timer_mem_writel_limit(unsigned int timer_index, uint64_t count) 
"processor %d user timer set to %016"PRIx64
 slavio_timer_mem_writel_counter_invalid(void) "not user timer"
 slavio_timer_mem_writel_status_start(unsigned int timer_index) "processor %d 
user timer started"
 slavio_timer_mem_writel_status_stop(unsigned int timer_index) "processor %d 
user timer stopped"
 slavio_timer_mem_writel_mode_user(unsigned int timer_index) "processor %d 
changed from counter to user timer"
 slavio_timer_mem_writel_mode_counter(unsigned int timer_index) "processor %d 
changed from user timer to counter"
 slavio_timer_mem_writel_mode_invalid(void) "not system timer"
-slavio_timer_mem_writel_invalid(uint64_t addr) "invalid write address 
%"PRIx64""
+slavio_timer_mem_writel_invalid(uint64_t addr) "invalid write address %"PRIx64
 
 # hw/sparc32_dma.c
-ledma_memory_read(uint64_t addr) "DMA read addr 0x%"PRIx64""
-ledma_memory_write(uint64_t addr) "DMA write addr 0x%"PRIx64""
+ledma_memory_read(uint64_t addr) "DMA read addr 0x%"PRIx64
+ledma_memory_write(uint64_t addr) "DMA write addr 0x%"PRIx64
 sparc32_dma_set_irq_raise(void) "Raise IRQ"
 sparc32_dma_set_irq_lower(void) "Lower IRQ"
 espdma_memory_read(uint32_t addr) "DMA read addr 0x%08x"
@@ -202,12 +202,12 @@ sun4m_cpu_set_irq_lower(int level) "Lower CPU IRQ %d"
 # hw/sun4m_iommu.c
 sun4m_iommu_mem_readl(uint64_t addr, uint32_t ret) "read reg[%"PRIx64"] = %x"
 sun4m_iommu_mem_writel(uint64_t addr, uint32_t val) "write reg[%"PRIx64"] = %x"
-sun4m_iommu_mem_writel_ctrl(uint64_t iostart) "iostart = %"PRIx64""
+sun4m_iommu_mem_writel_ctrl(uint64_t iostart) "iostart = %"PRIx64
 sun4m_iommu_mem_writel_tlbflush(uint32_t val) "tlb flush %x"
 sun4m_iommu_mem_writel_pgflush(uint32_t val) "page flush %x"
 sun4m_iommu_page_get_flags(uint64_t pa, uint64_t iopte, uint32_t ret) "get 
flags addr %"PRIx64" => pte %"PRIx64", *pte = %x"
 sun4m_iommu_translate_pa(uint64_t addr, uint64_t pa, uint32_t iopte) "xlate 
dva %"PRIx64" => pa %"PRIx64" iopte = %x"
-sun4m_iommu_bad_addr(uint64_t addr) "bad addr %"PRIx64""
+sun4m_iommu_bad_addr(uint64_t addr) "bad addr %"PRIx64
 
 # hw/usb-bus.c
 usb_port_claim(int bus, const char *port) "bus %d, port %s"
@@ -278,7 +278,7 @@ scsi_req_data(int target, int lun, int tag, int len) 
"target %d lun %d tag %d le
 scsi_req_dequeue(int target, int lun, int tag) "target %d lun %d tag %d"
 scsi_req_continue(int target, int lun, int tag) "target %d lun %d tag %d"
 scsi_req_parsed(int target, int lun, int tag, int cmd, int mode, int xfer) 
"target %d lun %d tag %d command %d dir %d length %d"
-scsi_req_parsed_lba(int target, int lun, int tag, int cmd, uint64_t lba) 
"target %d lun %d tag %d command %d lba %"PRIu64""
+scsi_req_parsed_lba(int target, int lun, int tag, int cmd, uint64_t lba) 
"target %d lun %d tag %d command %d lba %"PRIu64
 scsi_req_parse_bad(int target, int lun, int tag, int cmd) "target %d lun %d 
tag %d command %d"
 scsi_req_build_sense(int target, int lun, int tag, int key, int asc, int ascq) 
"target %d lun %d tag %d key %#02x asc %#02x ascq %#02x"
 scsi_report_luns(int target, int lun, int tag) "target %d lun %d tag %d"
@@ -306,11 +306,11 @@ qed_start_need_check_timer(void *s) "s %p"
 qed_cancel_n

[Qemu-devel] [PATCH 2/3] wavcapture: port to FILE*

2011-09-13 Thread Juan Quintela
QEMUFile * is only intended for Migration.  Using it for anything else
just adds pain and a layer of buffers for no good reason.

Signed-off-by: Juan Quintela 
---
 audio/wavcapture.c |   38 +-
 1 files changed, 25 insertions(+), 13 deletions(-)

diff --git a/audio/wavcapture.c b/audio/wavcapture.c
index c64f0ef..ecdb9ec 100644
--- a/audio/wavcapture.c
+++ b/audio/wavcapture.c
@@ -3,7 +3,7 @@
 #include "audio.h"

 typedef struct {
-QEMUFile *f;
+FILE *f;
 int bytes;
 char *path;
 int freq;
@@ -40,12 +40,16 @@ static void wav_destroy (void *opaque)
 le_store (rlen, rifflen, 4);
 le_store (dlen, datalen, 4);

-qemu_fseek (wav->f, 4, SEEK_SET);
-qemu_put_buffer (wav->f, rlen, 4);
+fseek (wav->f, 4, SEEK_SET);
+if (fwrite (rlen, 1, 4, wav->f) != 4) {
+printf("wav_destroy: short write\n");
+}

-qemu_fseek (wav->f, 32, SEEK_CUR);
-qemu_put_buffer (wav->f, dlen, 4);
-qemu_fclose (wav->f);
+fseek (wav->f, 32, SEEK_CUR);
+if (fwrite (dlen, 1, 4, wav->f) != 4) {
+printf("wav_destroy: short write\n");
+}
+fclose (wav->f);
 }

 g_free (wav->path);
@@ -55,7 +59,9 @@ static void wav_capture (void *opaque, void *buf, int size)
 {
 WAVState *wav = opaque;

-qemu_put_buffer (wav->f, buf, size);
+if (fwrite (buf, 1, size, wav->f) != size) {
+printf("wav_capture: short write\n");
+}
 wav->bytes += size;
 }

@@ -130,7 +136,7 @@ int wav_start_capture (CaptureState *s, const char *path, 
int freq,
 le_store (hdr + 28, freq << shift, 4);
 le_store (hdr + 32, 1 << shift, 2);

-wav->f = qemu_fopen (path, "wb");
+wav->f = fopen (path, "wb");
 if (!wav->f) {
 monitor_printf(mon, "Failed to open wave file `%s'\nReason: %s\n",
path, strerror (errno));
@@ -143,19 +149,25 @@ int wav_start_capture (CaptureState *s, const char *path, 
int freq,
 wav->nchannels = nchannels;
 wav->freq = freq;

-qemu_put_buffer (wav->f, hdr, sizeof (hdr));
+if (fwrite (hdr, 1, sizeof (hdr), wav->f) != sizeof (hdr)) {
+monitor_printf(mon, "wav_start_capture: short write\n");
+goto error_free;
+}

 cap = AUD_add_capture (&as, &ops, wav);
 if (!cap) {
 monitor_printf(mon, "Failed to add audio capture\n");
-g_free (wav->path);
-qemu_fclose (wav->f);
-g_free (wav);
-return -1;
+goto error_free;
 }

 wav->cap = cap;
 s->opaque = wav;
 s->ops = wav_capture_ops;
 return 0;
+
+error_free:
+g_free (wav->path);
+fclose (wav->f);
+g_free (wav);
+return -1;
 }
-- 
1.7.6




[Qemu-devel] [PATCH 1/3] wavaudio: port to FILE*

2011-09-13 Thread Juan Quintela
QEMUFile * is only intended for Migration.  Using it for anything else
just adds pain and a layer of buffers for no good reason.

Signed-off-by: Juan Quintela 
---
 audio/wavaudio.c |   28 +++-
 1 files changed, 19 insertions(+), 9 deletions(-)

diff --git a/audio/wavaudio.c b/audio/wavaudio.c
index aed1817..837b86d 100644
--- a/audio/wavaudio.c
+++ b/audio/wavaudio.c
@@ -30,7 +30,7 @@

 typedef struct WAVVoiceOut {
 HWVoiceOut hw;
-QEMUFile *f;
+FILE *f;
 int64_t old_ticks;
 void *pcm_buf;
 int total_samples;
@@ -76,7 +76,10 @@ static int wav_run_out (HWVoiceOut *hw, int live)
 dst = advance (wav->pcm_buf, rpos << hw->info.shift);

 hw->clip (dst, src, convert_samples);
-qemu_put_buffer (wav->f, dst, convert_samples << hw->info.shift);
+if (fwrite (dst, 1, convert_samples << hw->info.shift, wav->f) !=
+convert_samples << hw->info.shift) {
+printf("wav_run_out: short write\n");
+}

 rpos = (rpos + convert_samples) % hw->samples;
 samples -= convert_samples;
@@ -152,7 +155,7 @@ static int wav_init_out (HWVoiceOut *hw, struct audsettings 
*as)
 le_store (hdr + 28, hw->info.freq << (bits16 + stereo), 4);
 le_store (hdr + 32, 1 << (bits16 + stereo), 2);

-wav->f = qemu_fopen (conf.wav_path, "wb");
+wav->f = fopen (conf.wav_path, "wb");
 if (!wav->f) {
 dolog ("Failed to open wave file `%s'\nReason: %s\n",
conf.wav_path, strerror (errno));
@@ -161,7 +164,10 @@ static int wav_init_out (HWVoiceOut *hw, struct 
audsettings *as)
 return -1;
 }

-qemu_put_buffer (wav->f, hdr, sizeof (hdr));
+if (fwrite (hdr, 1, sizeof (hdr), wav->f) != sizeof (hdr)) {
+printf("wav_init_out: short write\n");
+return -1;
+}
 return 0;
 }

@@ -180,13 +186,17 @@ static void wav_fini_out (HWVoiceOut *hw)
 le_store (rlen, rifflen, 4);
 le_store (dlen, datalen, 4);

-qemu_fseek (wav->f, 4, SEEK_SET);
-qemu_put_buffer (wav->f, rlen, 4);
+fseek (wav->f, 4, SEEK_SET);
+if (fwrite (rlen, 1, 4, wav->f) != 4) {
+printf("wav_fini_out: short write\n");
+}

-qemu_fseek (wav->f, 32, SEEK_CUR);
-qemu_put_buffer (wav->f, dlen, 4);
+fseek (wav->f, 32, SEEK_CUR);
+if (fwrite (dlen, 1, 4, wav->f) != 4) {
+printf("wav_fini_out: short write\n");
+}

-qemu_fclose (wav->f);
+fclose (wav->f);
 wav->f = NULL;

 g_free (wav->pcm_buf);
-- 
1.7.6




[Qemu-devel] [PATCH 0/3] Remove QEMUFile abuse

2011-09-13 Thread Juan Quintela
Hi

QEMUFile is intended to be used only for migration.  Change the other
three users to use FILE * operations directly.  gcc on Fedora 15
complains about fread/fwrite calls not checking their return values, so I added
checks.  But in several places they only print an error message (there is
no error handling that I can hook into).  Notice that this is not worse
than it is today.

Later, Juan.

Juan Quintela (3):
  wavaudio: port to FILE*
  wavcapture: port to FILE*
  ds1225y: port to FILE*

 audio/wavaudio.c   |   28 +++-
 audio/wavcapture.c |   38 +-
 hw/ds1225y.c   |   28 
 3 files changed, 60 insertions(+), 34 deletions(-)

-- 
1.7.6




[Qemu-devel] [PATCH 3/3] ds1225y: port to FILE*

2011-09-13 Thread Juan Quintela
QEMUFile * is only intended for migration.  Using it for anything else
just adds pain and a layer of buffers for no good reason.

Signed-off-by: Juan Quintela 
---
 hw/ds1225y.c |   28 
 1 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/hw/ds1225y.c b/hw/ds1225y.c
index 9875c44..cd23668 100644
--- a/hw/ds1225y.c
+++ b/hw/ds1225y.c
@@ -29,7 +29,7 @@ typedef struct {
 DeviceState qdev;
 uint32_t chip_size;
 char *filename;
-QEMUFile *file;
+FILE *file;
 uint8_t *contents;
 } NvRamState;

@@ -70,9 +70,9 @@ static void nvram_writeb (void *opaque, target_phys_addr_t addr, uint32_t val)

 s->contents[addr] = val;
 if (s->file) {
-qemu_fseek(s->file, addr, SEEK_SET);
-qemu_put_byte(s->file, (int)val);
-qemu_fflush(s->file);
+fseek(s->file, addr, SEEK_SET);
+fputc(val, s->file);
+fflush(s->file);
 }
 }

@@ -108,15 +108,17 @@ static int nvram_post_load(void *opaque, int version_id)

 /* Close file, as filename may has changed in load/store process */
 if (s->file) {
-qemu_fclose(s->file);
+fclose(s->file);
 }

 /* Write back nvram contents */
-s->file = qemu_fopen(s->filename, "wb");
+s->file = fopen(s->filename, "wb");
 if (s->file) {
 /* Write back contents, as 'wb' mode cleaned the file */
-qemu_put_buffer(s->file, s->contents, s->chip_size);
-qemu_fflush(s->file);
+if (fwrite(s->contents, s->chip_size, 1, s->file) != 1) {
+printf("nvram_post_load: short write\n");
+}
+fflush(s->file);
 }

 return 0;
@@ -143,7 +145,7 @@ typedef struct {
 static int nvram_sysbus_initfn(SysBusDevice *dev)
 {
 NvRamState *s = &FROM_SYSBUS(SysBusNvRamState, dev)->nvram;
-QEMUFile *file;
+FILE *file;
 int s_io;

 s->contents = g_malloc0(s->chip_size);
@@ -153,11 +155,13 @@ static int nvram_sysbus_initfn(SysBusDevice *dev)
 sysbus_init_mmio(dev, s->chip_size, s_io);

 /* Read current file */
-file = qemu_fopen(s->filename, "rb");
+file = fopen(s->filename, "rb");
 if (file) {
 /* Read nvram contents */
-qemu_get_buffer(file, s->contents, s->chip_size);
-qemu_fclose(file);
+if (fread(s->contents, s->chip_size, 1, file) != 1) {
+printf("nvram_sysbus_initfn: short read\n");
+}
+fclose(file);
 }
 nvram_post_load(s, 0);

-- 
1.7.6




[Qemu-devel] [PATCH 0/4] Remove trailing double quote limitation and add virtio_set_status trace event

2011-09-13 Thread Stefan Hajnoczi
This series removes the tracetool parser limitation that format strings must
begin and end with double quotes.  In practice this means we need to work
around PRI*64 usage by adding a dummy "" at the end of the line.  It's fairly
easy to solve this parser limitation and do away with the workarounds.

While we're at it, also add the virtio_set_status() trace event to properly
follow the lifecycle of virtio devices.

 docs/tracing.txt  |5 +
 hw/virtio.c   |   10 ++
 hw/virtio.h   |9 +
 scripts/tracetool |   20 +---
 trace-events  |   37 +++--
 5 files changed, 44 insertions(+), 37 deletions(-)




[Qemu-devel] [PATCH 2/4] trace: allow PRI*64 at beginning and ending of format string

2011-09-13 Thread Stefan Hajnoczi
The tracetool parser only picks up PRI*64 and other format string macros
when enclosed between double quoted strings.  Lift this restriction by
extracting everything after the closing ')' as the format string:

  cpu_set_apic_base(uint64_t val) "%016"PRIx64
  ^^^^

One trick here: it turns out that backslashes in the format string like
"\n" were being interpreted by echo(1).  Fix this by using the POSIX
printf(1) command instead.  Although it normally does not make sense to
include backslashes in trace event format strings, an injected newline
causes tracetool to emit a broken header file and I want to eliminate
cases where broken output is emitted, even if the input was bad.

Signed-off-by: Stefan Hajnoczi 
---
 docs/tracing.txt  |5 +
 scripts/tracetool |   20 +---
 2 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/docs/tracing.txt b/docs/tracing.txt
index e130a61..2c33a62 100644
--- a/docs/tracing.txt
+++ b/docs/tracing.txt
@@ -75,10 +75,7 @@ Trace events should use types as follows:
 
 Format strings should reflect the types defined in the trace event.  Take
 special care to use PRId64 and PRIu64 for int64_t and uint64_t types,
-respectively.  This ensures portability between 32- and 64-bit platforms.  Note
-that format strings must begin and end with double quotes.  When using
-portability macros, ensure they are preceded and followed by double quotes:
-"value %"PRIx64"".
+respectively.  This ensures portability between 32- and 64-bit platforms.
 
 === Hints for adding new trace events ===
 
diff --git a/scripts/tracetool b/scripts/tracetool
index 743d246..4c9951d 100755
--- a/scripts/tracetool
+++ b/scripts/tracetool
@@ -40,6 +40,15 @@ EOF
 exit 1
 }
 
+# Print a line without interpreting backslash escapes
+#
+# The built-in echo command may interpret backslash escapes without an option
+# to disable this behavior.
+puts()
+{
+printf "%s\n" "$1"
+}
+
 # Get the name of a trace event
 get_name()
 {
@@ -111,13 +120,10 @@ get_argc()
 echo $argc
 }
 
-# Get the format string for a trace event
+# Get the format string including double quotes for a trace event
 get_fmt()
 {
-local fmt
-fmt=${1#*\"}
-fmt=${fmt%\"*}
-echo "$fmt"
+puts "${1#*)}"
 }
 
 linetoh_begin_nop()
@@ -266,7 +272,7 @@ linetoh_stderr()
 static inline void trace_$name($args)
 {
 if (trace_list[$stderr_event_num].state != 0) {
-fprintf(stderr, "$name $fmt\n" $argnames);
+fprintf(stderr, "$name " $fmt "\n" $argnames);
 }
 }
 EOF
@@ -366,7 +372,7 @@ DEFINE_TRACE(ust_$name);
 
 static void ust_${name}_probe($args)
 {
-trace_mark(ust, $name, "$fmt"$argnames);
+trace_mark(ust, $name, $fmt$argnames);
 }
 EOF
 
-- 
1.7.5.4
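The two shell details the patch relies on can be tried in isolation. The sketch below reuses the `puts` helper and the `${1#*)}` expansion from the patch; the sample trace-event line is the one from the commit message:

```shell
#!/bin/sh

# printf(1) with a %s conversion passes its argument through verbatim;
# a built-in echo might have interpreted backslash escapes like \n.
puts()
{
    printf "%s\n" "$1"
}

line='cpu_set_apic_base(uint64_t val) "%016"PRIx64'

# ${1#*)} strips the shortest prefix ending at the first ')', i.e. the
# event name and argument list, leaving only the format string part.
get_fmt()
{
    puts "${1#*)}"
}

puts "$(get_fmt "$line")"     # prints: "%016"PRIx64 (with a leading space)
puts 'backslash test: a\nb'   # the backslash and n print literally
```

Because the shortest-match `#` operator stops at the first `)`, format strings containing parentheses are unaffected; only the argument list is removed.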




Re: [Qemu-devel] [Qemu-ppc] [RESEND][PATCH] booke timers

2011-09-13 Thread Fabien Chouteau
On 12/09/2011 19:23, Scott Wood wrote:
> On 09/09/2011 09:58 AM, Alexander Graf wrote:
>> On 09.09.2011, at 16:22, Fabien Chouteau wrote:
>>> if the interrupt is already set and you clear TCR.DIE, the interrupt has to
>>> remain set. The only way to unset an interrupt is to clear the corresponding
>>> bit in TSR (currently in store_booke_tsr).
>>
>> Are you sure? I see several things in the 2.06 spec:
> [snip]
>> To me that sounds as if the decrementer interrupt gets injected only
>> when TSR.DIS=1, TCR.DIE=1 and MSR.EE=1. Unsetting any of these bits
>> stops the interrupt from being delivered.
>>
>> Scott, can you please check up with the hardware guys if this is correct?
>
> This is how I've always understood it to work (assuming the interrupt
> hasn't already been delivered, of course).  Fabien, do you have real
> hardware that you see behave the way you describe?
>

No I don't, it was just my understanding of Book-E documentation. I've tried
your solution (below) with VxWorks, and it works like a charm.

static void booke_update_irq(CPUState *env)
{
ppc_set_irq(env, PPC_INTERRUPT_DECR,
(env->spr[SPR_BOOKE_TSR] & TSR_DIS
 && env->spr[SPR_BOOKE_TCR] & TCR_DIE));

ppc_set_irq(env, PPC_INTERRUPT_WDT,
(env->spr[SPR_BOOKE_TSR] & TSR_WIS
 && env->spr[SPR_BOOKE_TCR] & TCR_WIE));

ppc_set_irq(env, PPC_INTERRUPT_FIT,
(env->spr[SPR_BOOKE_TSR] & TSR_FIS
 && env->spr[SPR_BOOKE_TCR] & TCR_FIE));
}

Regards,

-- 
Fabien Chouteau



[Qemu-devel] [PATCH 4/4] trace: add virtio_set_status() trace event

2011-09-13 Thread Stefan Hajnoczi
The virtio device lifecycle can be observed by looking at the sequence
of set status operations.  This is especially important for catching the
reset operation (status value 0), which resets the device and all
virtqueues.

Signed-off-by: Stefan Hajnoczi 
---
 hw/virtio.c  |   10 ++
 hw/virtio.h  |9 +
 trace-events |1 +
 3 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/hw/virtio.c b/hw/virtio.c
index 13aa0fa..946d911 100644
--- a/hw/virtio.c
+++ b/hw/virtio.c
@@ -498,6 +498,16 @@ void virtio_update_irq(VirtIODevice *vdev)
 virtio_notify_vector(vdev, VIRTIO_NO_VECTOR);
 }
 
+void virtio_set_status(VirtIODevice *vdev, uint8_t val)
+{
+trace_virtio_set_status(vdev, val);
+
+if (vdev->set_status) {
+vdev->set_status(vdev, val);
+}
+vdev->status = val;
+}
+
 void virtio_reset(void *opaque)
 {
 VirtIODevice *vdev = opaque;
diff --git a/hw/virtio.h b/hw/virtio.h
index c129264..666e381 100644
--- a/hw/virtio.h
+++ b/hw/virtio.h
@@ -135,14 +135,6 @@ struct VirtIODevice
 VMChangeStateEntry *vmstate;
 };
 
-static inline void virtio_set_status(VirtIODevice *vdev, uint8_t val)
-{
-if (vdev->set_status) {
-vdev->set_status(vdev, val);
-}
-vdev->status = val;
-}
-
 VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size,
 void (*handle_output)(VirtIODevice *,
   VirtQueue *));
@@ -190,6 +182,7 @@ int virtio_queue_get_num(VirtIODevice *vdev, int n);
 void virtio_queue_notify(VirtIODevice *vdev, int n);
 uint16_t virtio_queue_vector(VirtIODevice *vdev, int n);
 void virtio_queue_set_vector(VirtIODevice *vdev, int n, uint16_t vector);
+void virtio_set_status(VirtIODevice *vdev, uint8_t val);
 void virtio_reset(void *opaque);
 void virtio_update_irq(VirtIODevice *vdev);
 
diff --git a/trace-events b/trace-events
index 9a59525..99edc97 100644
--- a/trace-events
+++ b/trace-events
virtqueue_pop(void *vq, void *elem, unsigned int in_num, unsigned int out_num) "
 virtio_queue_notify(void *vdev, int n, void *vq) "vdev %p n %d vq %p"
 virtio_irq(void *vq) "vq %p"
 virtio_notify(void *vdev, void *vq) "vdev %p vq %p"
+virtio_set_status(void *vdev, uint8_t val) "vdev %p val %u"
 
 # hw/virtio-serial-bus.c
virtio_serial_send_control_event(unsigned int port, uint16_t event, uint16_t value) "port %u, event %u, value %u"
-- 
1.7.5.4




Re: [Qemu-devel] [Qemu-ppc] [RESEND][PATCH] booke timers

2011-09-13 Thread Alexander Graf
Fabien Chouteau wrote:
> On 12/09/2011 19:23, Scott Wood wrote:
>   
>> On 09/09/2011 09:58 AM, Alexander Graf wrote:
>> 
>>> On 09.09.2011, at 16:22, Fabien Chouteau wrote:
>>>   
 if the interrupt is already set and you clear TCR.DIE, the interrupt has to
 remain set. The only way to unset an interrupt is to clear the 
 corresponding
 bit in TSR (currently in store_booke_tsr).
 
>>> Are you sure? I see several things in the 2.06 spec:
>>>   
>> [snip]
>> 
>>> To me that sounds as if the decrementer interrupt gets injected only
>>> when TSR.DIS=1, TCR.DIE=1 and MSR.EE=1. Unsetting any of these bits
>>> stops the interrupt from being delivered.
>>>
>>> Scott, can you please check up with the hardware guys if this is correct?
>>>   
>> This is how I've always understood it to work (assuming the interrupt
>> hasn't already been delivered, of course).  Fabien, do you have real
>> hardware that you see behave the way you describe?
>>
>> 
>
> No I don't, it was just my understanding of Book-E documentation. I've tried
> your solution (below) with VxWorks, and it works like a charm.
>
> static void booke_update_irq(CPUState *env)
> {
> ppc_set_irq(env, PPC_INTERRUPT_DECR,
> (env->spr[SPR_BOOKE_TSR] & TSR_DIS
>  && env->spr[SPR_BOOKE_TCR] & TCR_DIE));
>
> ppc_set_irq(env, PPC_INTERRUPT_WDT,
> (env->spr[SPR_BOOKE_TSR] & TSR_WIS
>  && env->spr[SPR_BOOKE_TCR] & TCR_WIE));
>
> ppc_set_irq(env, PPC_INTERRUPT_FIT,
> (env->spr[SPR_BOOKE_TSR] & TSR_FIS
>  && env->spr[SPR_BOOKE_TCR] & TCR_FIE));
> }
>   

Awesome! Please also check on MSR.EE and send a new patch then :)


Alex




[Qemu-devel] [PATCH 1/4] trace: remove newline from grlib_irqmp_check_irqs format string

2011-09-13 Thread Stefan Hajnoczi
There is no need to put a newline in trace event format strings.  The
backend may use the format string within some context and takes care of
how to display the event.  The stderr backend automatically appends "\n"
whereas the ust backend does not want a newline at all.

Signed-off-by: Stefan Hajnoczi 
---
 trace-events |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/trace-events b/trace-events
index a8e7684..cfcdc9b 100644
--- a/trace-events
+++ b/trace-events
@@ -327,7 +327,7 @@ grlib_gptimer_readl(int id, uint64_t addr, uint32_t val) "timer:%d addr 0x%"PRIx
grlib_gptimer_writel(int id, uint64_t addr, uint32_t val) "timer:%d addr 0x%"PRIx64" 0x%x"
 
 # hw/grlib_irqmp.c
-grlib_irqmp_check_irqs(uint32_t pend, uint32_t force, uint32_t mask, uint32_t lvl1, uint32_t lvl2) "pend:0x%04x force:0x%04x mask:0x%04x lvl1:0x%04x lvl0:0x%04x\n"
+grlib_irqmp_check_irqs(uint32_t pend, uint32_t force, uint32_t mask, uint32_t lvl1, uint32_t lvl2) "pend:0x%04x force:0x%04x mask:0x%04x lvl1:0x%04x lvl0:0x%04x"
 grlib_irqmp_ack(int intno) "interrupt:%d"
 grlib_irqmp_set_irq(int irq) "Raise CPU IRQ %d"
 grlib_irqmp_readl_unknown(uint64_t addr) "addr 0x%"PRIx64""
-- 
1.7.5.4




Re: [Qemu-devel] Question on kvm_clock working ...

2011-09-13 Thread Jan Kiszka
On 2011-09-13 13:38, al pat wrote:
> Thanks Phillip.
> 
> My current source is "kvm-clock".
> 
> WHen I start my guest, it is in sync with the host clock.
> 
> Then I change the time on my host using "date --set ...", but I don't see
> the guest update its time.
> I was expecting the guest to detect the host time change and follow it?

That's not what kvmclock is supposed to provide. Besides a monotonic
clock source, it basically offers an alternative persistent clock, to
some degree replacing the emulated RTC. So updates of the host system
time are only recognized by the guest kernel when it reboots or
suspends/resumes.

> 
> So, when the host is exporting its system time and TSC values, does it go
> into the "emulated RTC" of the guest and the guest checks it only once? Or
> does the guest resync its clock with the host's value periodically?
> 
> I can try to do: "hwclock --hctosys --utc" --- this is just to check. (I
> have kvm-clock as my clock source though).

Re-reading the RTC is a brute-force approach to re-synchronize with the
host. If potential clock jumps are OK for your use case, you can go that
way.

Alternatives are using NTP against a time server or writing an RTC plugin
for NTP to synchronize against that local source (a colleague once wrote
such a plugin for a special guest, but it was never released publicly AFAIK).

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



Re: [Qemu-devel] [Qemu-ppc] [RESEND][PATCH] booke timers

2011-09-13 Thread Alexander Graf
Alexander Graf wrote:
> Fabien Chouteau wrote:
>   
>> On 12/09/2011 19:23, Scott Wood wrote:
>>   
>> 
>>> On 09/09/2011 09:58 AM, Alexander Graf wrote:
>>> 
>>>   
 On 09.09.2011, at 16:22, Fabien Chouteau wrote:
   
 
> if the interrupt is already set and you clear TCR.DIE, the interrupt has 
> to
> remain set. The only way to unset an interrupt is to clear the 
> corresponding
> bit in TSR (currently in store_booke_tsr).
> 
>   
 Are you sure? I see several things in the 2.06 spec:
   
 
>>> [snip]
>>> 
>>>   
 To me that sounds as if the decrementer interrupt gets injected only
 when TSR.DIS=1, TCR.DIE=1 and MSR.EE=1. Unsetting any of these bits
 stops the interrupt from being delivered.

 Scott, can you please check up with the hardware guys if this is correct?
   
 
>>> This is how I've always understood it to work (assuming the interrupt
>>> hasn't already been delivered, of course).  Fabien, do you have real
>>> hardware that you see behave the way you describe?
>>>
>>> 
>>>   
>> No I don't, it was just my understanding of Book-E documentation. I've tried
>> your solution (below) with VxWorks, and it works like a charm.
>>
>> static void booke_update_irq(CPUState *env)
>> {
>> ppc_set_irq(env, PPC_INTERRUPT_DECR,
>> (env->spr[SPR_BOOKE_TSR] & TSR_DIS
>>  && env->spr[SPR_BOOKE_TCR] & TCR_DIE));
>>
>> ppc_set_irq(env, PPC_INTERRUPT_WDT,
>> (env->spr[SPR_BOOKE_TSR] & TSR_WIS
>>  && env->spr[SPR_BOOKE_TCR] & TCR_WIE));
>>
>> ppc_set_irq(env, PPC_INTERRUPT_FIT,
>> (env->spr[SPR_BOOKE_TSR] & TSR_FIS
>>  && env->spr[SPR_BOOKE_TCR] & TCR_FIE));
>> }
>>   
>> 
>
> Awesome! Please also check on MSR.EE and send a new patch then :)
>   

Ah, the EE check is in target-ppc/helper.c:ppc_hw_interrupt. Very
confusing (and probably wrong because it could generate spurious
interrupts), but it should be enough for now.


Alex




[Qemu-devel] [PULL] VirtFS update

2011-09-13 Thread Aneesh Kumar K.V

Hi Anthony,

This series contains a few fixes to the VirtFS server. The patch set also
adds two new 9p operations. Please pull.

The following changes since commit 07ff2c4475df77e38a31d50ee7f3932631806c15:

  Merge remote-tracking branch 'origin/master' into staging (2011-09-08 
09:25:36 -0500)

are available in the git repository at:

  git://repo.or.cz/qemu/v9fs.git for-upstream-4

Aneesh Kumar K.V (5):
  hw/9pfs: Update the fidp path before opendir
  hw/9pfs: Initialize rest of qid field to zero.
  hw/9pfs: Fix memleaks in some 9p operation
  hw/9pfs: add 9P2000.L renameat operation
  hw/9pfs: add 9P2000.L unlinkat operation

 hw/9pfs/virtio-9p.c |  126 +++
 hw/9pfs/virtio-9p.h |4 ++
 2 files changed, 130 insertions(+), 0 deletions(-)


-aneesh




Re: [Qemu-devel] [Qemu-ppc] [RESEND][PATCH] booke timers

2011-09-13 Thread Fabien Chouteau
On 13/09/2011 15:13, Alexander Graf wrote:
> Alexander Graf wrote:
>> Fabien Chouteau wrote:
>>   
>>> On 12/09/2011 19:23, Scott Wood wrote:
>>>   
>>> 
 On 09/09/2011 09:58 AM, Alexander Graf wrote:
 
   
> On 09.09.2011, at 16:22, Fabien Chouteau wrote:
>   
> 
>> if the interrupt is already set and you clear TCR.DIE, the interrupt has 
>> to
>> remain set. The only way to unset an interrupt is to clear the 
>> corresponding
>> bit in TSR (currently in store_booke_tsr).
>> 
>>   
> Are you sure? I see several things in the 2.06 spec:
>   
> 
 [snip]
 
   
> To me that sounds as if the decrementer interrupt gets injected only
> when TSR.DIS=1, TCR.DIE=1 and MSR.EE=1. Unsetting any of these bits
> stops the interrupt from being delivered.
>
> Scott, can you please check up with the hardware guys if this is correct?
>   
> 
 This is how I've always understood it to work (assuming the interrupt
 hasn't already been delivered, of course).  Fabien, do you have real
 hardware that you see behave the way you describe?

 
   
>>> No I don't, it was just my understanding of Book-E documentation. I've tried
>>> your solution (below) with VxWorks, and it works like a charm.
>>>
>>> static void booke_update_irq(CPUState *env)
>>> {
>>> ppc_set_irq(env, PPC_INTERRUPT_DECR,
>>> (env->spr[SPR_BOOKE_TSR] & TSR_DIS
>>>  && env->spr[SPR_BOOKE_TCR] & TCR_DIE));
>>>
>>> ppc_set_irq(env, PPC_INTERRUPT_WDT,
>>> (env->spr[SPR_BOOKE_TSR] & TSR_WIS
>>>  && env->spr[SPR_BOOKE_TCR] & TCR_WIE));
>>>
>>> ppc_set_irq(env, PPC_INTERRUPT_FIT,
>>> (env->spr[SPR_BOOKE_TSR] & TSR_FIS
>>>  && env->spr[SPR_BOOKE_TCR] & TCR_FIE));
>>> }
>>>   
>>> 
>>
>> Awesome! Please also check on MSR.EE and send a new patch then :)
>>   
> 
> Ah, the EE check is in target-ppc/helper.c:ppc_hw_interrupt. Very
> confusing (and probably wrong because it could generate spurious
> interrupts), but it should be enough for now.
> 

That's what I was looking for...

So we are good.

-- 
Fabien Chouteau



[Qemu-devel] [Bug 848571] Re: qemu does not generate a qemu-kvm.stp tapset file

2011-09-13 Thread William Cohen
You are correct, the qemu.spec file is doing the copy.

** Changed in: qemu
   Status: New => Invalid

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/848571

Title:
  qemu does not generate a qemu-kvm.stp tapset file

Status in QEMU:
  Invalid

Bug description:
  To make systemtap probing easier to use, qemu generates qemu*.stp
  files with aliases for various events for each of the executables. The
  installer places these files in /usr/share/systemtap/tapset/.  These
  files are generated by the tracetool. However, the /usr/bin/qemu-kvm
  is produced with a copy:

   cp -a x86_64-softmmu/qemu-system-x86_64 qemu-kvm

  No matching qemu-kvm.stp is generated for the qemu-kvm executable. It
  would be really nice if that tapset file were generated so people could
  use the more symbolic probe points.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/848571/+subscriptions



Re: [Qemu-devel] [PATCH][RFC][0/2] REF+/REF- optimization

2011-09-13 Thread Frediano Ziglio
2011/9/13 Kevin Wolf :
> Am 13.09.2011 09:53, schrieb Frediano Ziglio:
>> These patches try to trade off leaks against speed for cluster
>> refcounts.
>>
>> Refcount increments (REF+ or refp) are handled in a different way from
>> decrements (REF- or refm). The reason is that posting or not flushing
>> a REF- causes "just" a leak, while posting a REF+ causes corruption.
>>
>> To optimize REF- I just used an array to store offsets; when a
>> flush is requested or the array reaches a limit (currently 1022), the
>> array is sorted and written to disk. I use an array of offsets instead
>> of ranges to support compression (an offset can appear multiple times
>> in the array).
>> I consider this patch quite ready.
>
> Ok, first of all let's clarify what this optimises. I don't think it
> changes anything at all for the writeback cache modes, because these
> already do most operations in memory only. So this must be about
> optimising some operations with cache=writethrough. REF- isn't about
> normal cluster allocation, it is about COW with internal snapshots or
> bdrv_discard. Do you have benchmarks for any of them?
>
> I strongly disagree with your approach for REF-. We already have a
> cache, and introducing a second one sounds like a bad idea. I think we
> could get a very similar effect if we introduced a
> qcow2_cache_entry_mark_dirty_wb() that marks a given refcount block as
> dirty, but at the same time tells the cache that even in write-through
> mode it can still treat this block as write-back. This should require
> much less code changes.
>

Yes, this mainly optimizes writethrough. I did not test with writeback,
but it should improve that too (I think some flushes are still needed
there to keep consistency).
I'll try to write a qcow2_cache_entry_mark_dirty_wb patch and test it.

> But let's measure the effects first, I suspect that for cluster
> allocation it doesn't help much because every REF- comes with a REF+.
>

That still saves 50% of the effort if the REF- clusters are far from the REF+ ones :)

>> To optimize REF+ I mark a range as allocated and use this range to
>> hand out new clusters (avoiding writing refcounts to disk). When a flush
>> is requested, or in some situations (like snapshots), this cache is
>> disabled and flushed (written as REF-).
>> I do not consider this patch ready; it works and passes all io-tests,
>> but for instance I would like to avoid allocating new refcount clusters
>> during preallocation.
>
> The only question here is if improving cache=writethrough cluster
> allocation performance is worth the additional complexity in the already
> complex refcounting code.
>

I didn't see this optimization as a second-level cache, but yes, for
REF- it is a second cache.

> The alternative that was discussed before is the dirty bit approach that
> is used in QED and would allow us to use writeback for all refcount
> blocks, regardless of REF- or REF+. It would be an easier approach
> requiring less code changes, but it comes with the cost of requiring an
> fsck after a qemu crash.
>

I was thinking about changing the header magic the first time we change
a refcount, in order to mark the image as dirty: a newer qemu would
recognize the flag, while an older one would refuse to open the image.
The magic would obviously be reverted on image close.

>> End speed up is quite visible allocating clusters (more then 20%).
>
> What benchmark do you use for testing this?
>
> Kevin
>

Currently I'm using bonnie++, but I noted similar improvements with iozone.
The test script formats an image, then launches a Linux machine which runs
a script and saves the result to a file.
The test image is seen by this virtual machine as a separate disk.
The file on the host resides in a separate LV.
I got quite consistent results (of course I'm not working on the machine
while testing, though it is not actually dedicated to this job).

Right now I'm running the tests again (I added a test that works on a snapshot image).

Frediano



[Qemu-devel] [PATCH] bswap.h: build fix

2011-09-13 Thread Christoph Egger


qemu build fails when CONFIG_MACHINE_BSWAP_H is defined
because float32, float64, etc. are not defined.
This patch makes qemu build again.

Signed-off-by: Christoph Egger 

diff --git a/bswap.h b/bswap.h
index f41bebe..cc7f84d 100644
--- a/bswap.h
+++ b/bswap.h
@@ -4,6 +4,7 @@
 #include "config-host.h"

 #include 
+#include "softfloat.h"

 #ifdef CONFIG_MACHINE_BSWAP_H
 #include 
@@ -11,8 +12,6 @@
 #include 
 #else

-#include "softfloat.h"
-
 #ifdef CONFIG_BYTESWAP_H
 #include 
 #else



--
---to satisfy European Law for business letters:
Advanced Micro Devices GmbH
Einsteinring 24, 85689 Dornach b. Muenchen
Geschaeftsfuehrer: Alberto Bozzo, Andrew Bowd
Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632




Re: [Qemu-devel] [PATCH 04/12] nbd: add support for NBD_CMD_FLUSH

2011-09-13 Thread Kevin Wolf
Am 08.09.2011 17:24, schrieb Paolo Bonzini:
> Note for the brace police: the style in this commit and the following
> is consistent with the rest of the file.  It is then fixed together with
> the introduction of coroutines.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  block/nbd.c |   31 +++
>  nbd.c   |   14 +-
>  2 files changed, 44 insertions(+), 1 deletions(-)
> 
> diff --git a/block/nbd.c b/block/nbd.c
> index ffc57a9..4a195dc 100644
> --- a/block/nbd.c
> +++ b/block/nbd.c
> @@ -237,6 +237,36 @@ static int nbd_write(BlockDriverState *bs, int64_t sector_num,
>  return 0;
>  }
>  
> +static int nbd_flush(BlockDriverState *bs)
> +{
> +BDRVNBDState *s = bs->opaque;
> +struct nbd_request request;
> +struct nbd_reply reply;
> +
> +if (!(s->nbdflags & NBD_FLAG_SEND_FLUSH)) {
> +return 0;
> +}
> +
> +request.type = NBD_CMD_FLUSH;
> +request.handle = (uint64_t)(intptr_t)bs;
> +request.from = 0;
> +request.len = 0;
> +
> +if (nbd_send_request(s->sock, &request) == -1)
> +return -errno;
> +
> +if (nbd_receive_reply(s->sock, &reply) == -1)
> +return -errno;
> +
> +if (reply.error !=0)

Missing space (this is not for consistency, right?)

> @@ -682,6 +683,18 @@ int nbd_trip(BlockDriverState *bs, int csock, off_t size, uint64_t dev_offset,
>  TRACE("Request type is DISCONNECT");
>  errno = 0;
>  return 1;
> +case NBD_CMD_FLUSH:
> +TRACE("Request type is FLUSH");
> +
> +if (bdrv_flush(bs) == -1) {

bdrv_flush is supposed to return -errno, so please check for < 0. (I see
that raw-posix needs to be fixed, but other block drivers already return
error values other than -1)

Kevin



Re: [Qemu-devel] [PATCH 05/12] nbd: add support for NBD_CMD_FLAG_FUA

2011-09-13 Thread Kevin Wolf
Am 08.09.2011 17:24, schrieb Paolo Bonzini:
> The server can use it to issue a flush automatically after a
> write.  The client can also use it to mimic a write-through
> cache.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  block/nbd.c |8 
>  nbd.c   |   13 +++--
>  2 files changed, 19 insertions(+), 2 deletions(-)

> @@ -674,6 +675,14 @@ int nbd_trip(BlockDriverState *bs, int csock, off_t size, uint64_t dev_offset,
>  }
>  
>  *offset += request.len;
> +
> +if (request.type & NBD_CMD_FLAG_FUA) {
> +if (bdrv_flush(bs) == -1) {

Need to check for < 0 here as well.

Kevin



Re: [Qemu-devel] [PATCH 06/12] nbd: support NBD_CMD_TRIM in the server

2011-09-13 Thread Kevin Wolf
Am 08.09.2011 17:24, schrieb Paolo Bonzini:
> Map it to bdrv_discard.  The server can now expose NBD_FLAG_SEND_TRIM.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  block/nbd.c |   31 +++
>  nbd.c   |9 -
>  2 files changed, 39 insertions(+), 1 deletions(-)
> 
> diff --git a/block/nbd.c b/block/nbd.c
> index 5a7812c..964caa8 100644
> --- a/block/nbd.c
> +++ b/block/nbd.c
> @@ -275,6 +275,36 @@ static int nbd_flush(BlockDriverState *bs)
>  return 0;
>  }
>  
> +static int nbd_discard(BlockDriverState *bs, int64_t sector_num,
> +   int nb_sectors)
> +{
> +BDRVNBDState *s = bs->opaque;
> +struct nbd_request request;
> +struct nbd_reply reply;
> +
> +if (!(s->nbdflags & NBD_FLAG_SEND_TRIM)) {
> +return 0;
> +}
> +request.type = NBD_CMD_TRIM;
> +request.handle = (uint64_t)(intptr_t)bs;
> +request.from = sector_num * 512;
> +request.len = nb_sectors * 512;
> +
> +if (nbd_send_request(s->sock, &request) == -1)
> +return -errno;
> +
> +if (nbd_receive_reply(s->sock, &reply) == -1)
> +return -errno;
> +
> +if (reply.error !=0)
> +return -reply.error;
> +
> +if (reply.handle != request.handle)
> +return -EIO;
> +
> +return 0;
> +}
> +
>  static void nbd_close(BlockDriverState *bs)
>  {
>  BDRVNBDState *s = bs->opaque;
> @@ -299,6 +329,7 @@ static BlockDriver bdrv_nbd = {
>  .bdrv_write  = nbd_write,
>  .bdrv_close  = nbd_close,
>  .bdrv_flush  = nbd_flush,
> +.bdrv_discard= nbd_discard,
>  .bdrv_getlength  = nbd_getlength,
>  .protocol_name   = "nbd",
>  };
> diff --git a/nbd.c b/nbd.c
> index b65fb4a..f089904 100644
> --- a/nbd.c
> +++ b/nbd.c
> @@ -194,7 +194,7 @@ int nbd_negotiate(int csock, off_t size, uint32_t flags)
>  cpu_to_be64w((uint64_t*)(buf + 8), 0x00420281861253LL);
>  cpu_to_be64w((uint64_t*)(buf + 16), size);
>  cpu_to_be32w((uint32_t*)(buf + 24),
> - flags | NBD_FLAG_HAS_FLAGS |
> + flags | NBD_FLAG_HAS_FLAGS | NBD_FLAG_SEND_TRIM |
>   NBD_FLAG_SEND_FLUSH | NBD_FLAG_SEND_FUA);
>  memset(buf + 28, 0, 124);
>  
> @@ -703,6 +703,13 @@ int nbd_trip(BlockDriverState *bs, int csock, off_t size, uint64_t dev_offset,
>  if (nbd_send_reply(csock, &reply) == -1)
>  return -1;
>  break;
> +case NBD_CMD_TRIM:
> +TRACE("Request type is TRIM");
> +bdrv_discard(bs, (request.from + dev_offset) / 512,
> + request.len / 512);

Errors are completely ignored? Does the NBD protocol not allow to return
an error?

Kevin



[Qemu-devel] [PATCH V4] booke timers

2011-09-13 Thread Fabien Chouteau
While working on the emulation of the Freescale P2010 (e500v2) I realized that
there's no implementation of Book-E's timer features. Currently mpc8544 uses
ppc_emb (ppc_emb_timers_init), which is close to but not exactly like Book-E
(for example, Book-E uses different SPRs).

Signed-off-by: Fabien Chouteau 
---

V2:
  - Fix fixed timer, now trigger each time the selected
bit switch from 0 to 1.
  - Fix e500 criterion.
  - Trigger an interrupt when user set DIE/FIE/WIE while
DIS/FIS/WIS is already set.
  - Minor fixes (mask definition, variable name...).
  - Rename ppc_emb to ppc_40x

V3:

  - Fix bit selection for e500 fixed timers (fp == 00 selects msb)
  - Improved formula to compute the next event of a fixed timer

v4:

  - Centralized interrupt handling
  - Timer flags (BOOKE, E500, DECR_UNDERFLOW_TRIGGERED, DECR_ZERO_TRIGGERED)

 Makefile.target |2 +-
 hw/ppc.c|  138 +--
 hw/ppc.h|   37 ++-
 hw/ppc4xx_devs.c|2 +-
 hw/ppc_booke.c  |  266 +++
 hw/ppce500_mpc8544ds.c  |4 +-
 hw/virtex_ml507.c   |   11 +--
 target-ppc/cpu.h|   27 +
 target-ppc/translate_init.c |   39 +++
 9 files changed, 427 insertions(+), 99 deletions(-)
 create mode 100644 hw/ppc_booke.c

diff --git a/Makefile.target b/Makefile.target
index f708453..5a85662 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -229,7 +229,7 @@ obj-i386-$(CONFIG_KVM) += kvmclock.o
 obj-i386-$(CONFIG_SPICE) += qxl.o qxl-logger.o qxl-render.o
 
 # shared objects
-obj-ppc-y = ppc.o
+obj-ppc-y = ppc.o ppc_booke.o
 obj-ppc-y += vga.o
 # PREP target
 obj-ppc-y += i8259.o mc146818rtc.o
diff --git a/hw/ppc.c b/hw/ppc.c
index 8870748..25b59dd 100644
--- a/hw/ppc.c
+++ b/hw/ppc.c
@@ -50,7 +50,7 @@
 static void cpu_ppc_tb_stop (CPUState *env);
 static void cpu_ppc_tb_start (CPUState *env);
 
-static void ppc_set_irq (CPUState *env, int n_IRQ, int level)
+void ppc_set_irq(CPUState *env, int n_IRQ, int level)
 {
 unsigned int old_pending = env->pending_interrupts;
 
@@ -423,25 +423,8 @@ void ppce500_irq_init (CPUState *env)
 }
 /*/
 /* PowerPC time base and decrementer emulation */
-struct ppc_tb_t {
-/* Time base management */
-int64_t  tb_offset;/* Compensation*/
-int64_t  atb_offset;   /* Compensation*/
-uint32_t tb_freq;  /* TB frequency*/
-/* Decrementer management */
-uint64_t decr_next;/* Tick for next decr interrupt*/
-uint32_t decr_freq;/* decrementer frequency   */
-struct QEMUTimer *decr_timer;
-/* Hypervisor decrementer management */
-uint64_t hdecr_next;/* Tick for next hdecr interrupt  */
-struct QEMUTimer *hdecr_timer;
-uint64_t purr_load;
-uint64_t purr_start;
-void *opaque;
-};
 
-static inline uint64_t cpu_ppc_get_tb(ppc_tb_t *tb_env, uint64_t vmclk,
-  int64_t tb_offset)
+uint64_t cpu_ppc_get_tb(ppc_tb_t *tb_env, uint64_t vmclk, int64_t tb_offset)
 {
 /* TB time in tb periods */
 return muldiv64(vmclk, tb_env->tb_freq, get_ticks_per_sec()) + tb_offset;
@@ -611,10 +594,13 @@ static inline uint32_t _cpu_ppc_load_decr(CPUState *env, uint64_t next)
 int64_t diff;
 
 diff = next - qemu_get_clock_ns(vm_clock);
-if (diff >= 0)
+if (diff >= 0) {
 decr = muldiv64(diff, tb_env->decr_freq, get_ticks_per_sec());
-else
+} else if (tb_env->flags & PPC_TIMER_BOOKE) {
+decr = 0;
} else {
 decr = -muldiv64(-diff, tb_env->decr_freq, get_ticks_per_sec());
+}
 LOG_TB("%s: %08" PRIx32 "\n", __func__, decr);
 
 return decr;
@@ -678,18 +664,24 @@ static void __cpu_ppc_store_decr (CPUState *env, uint64_t *nextp,
 decr, value);
 now = qemu_get_clock_ns(vm_clock);
 next = now + muldiv64(value, get_ticks_per_sec(), tb_env->decr_freq);
-if (is_excp)
+if (is_excp) {
 next += *nextp - now;
-if (next == now)
+}
+if (next == now) {
 next++;
+}
 *nextp = next;
 /* Adjust timer */
 qemu_mod_timer(timer, next);
-/* If we set a negative value and the decrementer was positive,
- * raise an exception.
+
+/* If we set a negative value and the decrementer was positive, raise an
+ * exception.
  */
-if ((value & 0x8000) && !(decr & 0x8000))
+if ((tb_env->flags & PPC_DECR_UNDERFLOW_TRIGGERED)
+&& (value & 0x8000)
+&& !(decr & 0x8000)) {
 (*raise_excp)(env);
+}
 }
 
 static inline void _cpu_ppc_store_decr(CPUState *env, uint32_t decr,
@@ -763,6 +755,7 @@ clk_setup_cb cpu_ppc_tb_init (CPUState *env, uint32_t freq)
 
 tb_env = g_malloc0(sizeof(ppc_tb_t));
 env->tb_env = tb_env;
tb_env->flags = PPC_DECR_UNDERFLOW_TRIGGERED;
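The underflow rule the v4 changelog refers to (raise the decrementer exception only on flagged cores, when a write flips the sign bit from clear to set) can be sketched as follows. The flag name mirrors the patch; the helper function is illustrative, not QEMU's code:

```c
#include <assert.h>
#include <stdint.h>

/* Hedged sketch of the patch's rule: on cores with the
 * "decrementer underflow triggered" flavor, writing a value whose
 * MSB is set while the previous decrementer value had the MSB clear
 * raises the decrementer exception. */
#define PPC_DECR_UNDERFLOW_TRIGGERED (1u << 0)

static int decr_write_raises_excp(unsigned flags,
                                  uint32_t old_decr, uint32_t new_value)
{
    return (flags & PPC_DECR_UNDERFLOW_TRIGGERED)
        && (new_value & 0x80000000u)    /* new value is "negative" */
        && !(old_decr & 0x80000000u);   /* old value was positive  */
}
```

On BookE-flavored timers the same flags mechanism instead clamps the loaded decrementer to 0 rather than letting it go negative, which is what the `PPC_TIMER_BOOKE` branch in `_cpu_ppc_load_decr` above implements.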

Re: [Qemu-devel] [PATCH v2 09/12] sheepdog: move coroutine send/recv function to generic code

2011-09-13 Thread Kevin Wolf
On 09.09.2011 10:11, Paolo Bonzini wrote:
> Outside coroutines, avoid busy waiting on EAGAIN by temporarily
> making the socket blocking.
> 
> The API of qemu_recvv/qemu_sendv is slightly different from
> do_readv/do_writev because they do not handle coroutines.  It
> returns the number of bytes written before encountering an
> EAGAIN.  The specificity of yielding on EAGAIN is entirely in
> qemu-coroutine.c.
> 
> Cc: MORITA Kazutaka 
> Signed-off-by: Paolo Bonzini 
> ---
>   Thanks for the review.  I checked with qemu-io that all of
> 
>   readv -v 0 524288 (x8)
> readv -v 0 262144 (x16)
> readv -v 0 1024 (x4096)
> readv -v 0 1536 (x2730) 1024
> readv -v 0 1024 512 (x2730) 1024
> 
>   work and produce the same output, while previously they would fail.
>   Looks like it's hard to trigger the code just with qemu.
> 
>  block/sheepdog.c |  225 
> ++
>  cutils.c |  103 +
>  qemu-common.h|3 +
>  qemu-coroutine.c |   70 +
>  qemu-coroutine.h |   26 ++

Can we move the code somewhere else? This is not core coroutine
infrastructure. I would suggest qemu_socket.h/qemu-sockets.c.

Kevin
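The send path Paolo describes (return the number of bytes transferred before hitting EAGAIN, instead of busy-waiting) can be sketched against a generic write callback. The callback indirection and all names here are illustrative, not QEMU's actual API:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hedged sketch: write through writefn() until the buffer is drained;
 * on EAGAIN, return the bytes already transferred so a coroutine
 * wrapper can yield and retry instead of spinning. */
typedef ssize_t (*write_fn)(void *opaque, const void *buf, size_t len);

static ssize_t sendv_partial(void *opaque, write_fn writefn,
                             const char *buf, size_t len)
{
    size_t done = 0;

    while (done < len) {
        ssize_t n = writefn(opaque, buf + done, len - done);
        if (n < 0) {
            if (errno == EAGAIN) {
                return done;   /* partial progress; caller decides to wait */
            }
            return -1;         /* real error */
        }
        done += n;
    }
    return done;
}

/* Illustrative stand-in for a non-blocking socket: accepts *budget
 * bytes, then reports EAGAIN. */
static ssize_t mock_write(void *opaque, const void *buf, size_t len)
{
    int *budget = opaque;
    (void)buf;
    if (*budget == 0) {
        errno = EAGAIN;
        return -1;
    }
    size_t n = len < (size_t)*budget ? len : (size_t)*budget;
    *budget -= (int)n;
    return (ssize_t)n;
}
```

The coroutine-only part — yielding until the fd is writable again, then resuming the loop — is what stays in qemu-coroutine.c (or wherever the thread settles on).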



Re: [Qemu-devel] About hotplug multifunction

2011-09-13 Thread Amos Kong
- Original Message -
> Hi all,

I've tested with a WinXP guest; the multifunction hotplug works.

> After reading the pci driver code, I found a problem.
> 
> There is a list for each slot, (slot->funcs)
> it will be inited in acpiphp_glue.c:register_slot() before hotpluging
> device,
> and only one entry(func 0) will be added to it,
> no new entry will be added to the list when hotpluging devices to the
> slot.
> 
> When we release the device, there are only _one_ entry in the
> list(slot->funcs).

This list (slot->funcs) is designed to store the func entries,
but it only stores func 0.

I changed the # to # in seabios: src/acpi-dsdt.dsl
Multifunction hotplug of Windows doesn't work, and a Linux guest will only
remove the last func; funcs 0~6 still exist in the guest.
It seems to be a bug in the Linux PCI driver (not calling
pci_remove_bus_device() for funcs 1~7).

--- a/src/acpi-dsdt.dsl
+++ b/src/acpi-dsdt.dsl
@@ -130,7 +130,7 @@ DefinitionBlock (
 
 #define hotplug_slot(name, nr) \
 Device (S##name) {\
-   Name (_ADR, nr##)  \
+   Name (_ADR, nr##)  \
Method (_EJ0,1) {  \
 Store(ShiftLeft(1, nr), B0EJ) \
 Return (0x0)  \
@@ -462,7 +462,7 @@ DefinitionBlock (
 
 #define gen_pci_device(name, nr)\
 Device(SL##name) {  \
-Name (_ADR, nr##)   \
+Name (_ADR, nr##)   \
 Method (_RMV) { \

==
I tried to add new entries in acpiphp_glue.c:enable_device() for each func, but
it doesn't work.



> acpiphp_glue.c:disable_device()
> list_for_each_entry(func, &slot->funcs, sibling) {
> pdev = pci_get_slot(slot->bridge->pci_bus,
> PCI_DEVFN(slot->device, func->function));
> ...release code... // those code can only be executed one time (func
> 0)
> pci_remove_bus_device(pdev);
> }
> 
> bus.c:pci_bus_add_device() is called for each func device in
> acpiphp_glue.c:enable_device().
> bus.c:pci_remove_bus_device(pdev) is only called for func 0 in
> acpiphp_glue.c:disable_device().
> 
> 
> Resolution: (I've tested it, success)
> enumerate all the funcs when disable device.
> 
> list_for_each_entry(func, &slot->funcs, sibling) {
> for (i=0; i<8; i++) {
> pdev = pci_get_slot(slot->bridge->pci_bus,
> PCI_DEVFN(slot->device, i));
> ...release code...
> pci_remove_bus_device(pdev);
> 
> }
> }
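The proposed fix boils down to probing every possible function number on the slot, rather than only the funcs recorded in slot->funcs. A sketch using the standard Linux PCI_DEVFN encoding (the `slot_devfns` helper is illustrative, not kernel code):

```c
#include <assert.h>

/* Standard Linux devfn encoding: 5 bits of slot, 3 bits of function.
 * A PCI device can expose at most 8 functions, hence the loop bound. */
#define PCI_DEVFN(slot, func) ((((slot) & 0x1f) << 3) | ((func) & 0x07))

/* Illustrative helper: the devfns disable_device() would have to probe
 * (each one a candidate argument for pci_get_slot()). */
static int slot_devfns(int slot, unsigned char out[8])
{
    for (int func = 0; func < 8; func++) {
        out[func] = PCI_DEVFN(slot, func);
    }
    return 8;
}
```

Probing all 8 devfns is safe because pci_get_slot() simply returns NULL for functions that were never enumerated.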





[Qemu-devel] [PATCH 1/2] KVM: emulate lapic tsc deadline timer for guest

2011-09-13 Thread Liu, Jinsong
>From 7b12021e1d1b79797b49e41cc0a7be05a6180d9a Mon Sep 17 00:00:00 2001
From: Liu, Jinsong 
Date: Tue, 13 Sep 2011 21:52:54 +0800
Subject: [PATCH] KVM: emulate lapic tsc deadline timer for guest

This patch emulates the lapic tsc deadline timer for the guest:
Enumerate tsc deadline timer capability by CPUID;
Enable tsc deadline timer mode by lapic MMIO;
Start tsc deadline timer by WRMSR;

Signed-off-by: Liu, Jinsong 
---
 arch/x86/include/asm/apicdef.h|2 +
 arch/x86/include/asm/cpufeature.h |3 +
 arch/x86/include/asm/kvm_host.h   |2 +
 arch/x86/include/asm/msr-index.h  |2 +
 arch/x86/kvm/kvm_timer.h  |2 +
 arch/x86/kvm/lapic.c  |  122 ++---
 arch/x86/kvm/lapic.h  |3 +
 arch/x86/kvm/x86.c|   20 ++-
 8 files changed, 132 insertions(+), 24 deletions(-)

diff --git a/arch/x86/include/asm/apicdef.h b/arch/x86/include/asm/apicdef.h
index 34595d5..3925d80 100644
--- a/arch/x86/include/asm/apicdef.h
+++ b/arch/x86/include/asm/apicdef.h
@@ -100,7 +100,9 @@
 #defineAPIC_TIMER_BASE_CLKIN   0x0
 #defineAPIC_TIMER_BASE_TMBASE  0x1
 #defineAPIC_TIMER_BASE_DIV 0x2
+#defineAPIC_LVT_TIMER_ONESHOT  (0 << 17)
 #defineAPIC_LVT_TIMER_PERIODIC (1 << 17)
+#defineAPIC_LVT_TIMER_TSCDEADLINE  (2 << 17)
 #defineAPIC_LVT_MASKED (1 << 16)
 #defineAPIC_LVT_LEVEL_TRIGGER  (1 << 15)
 #defineAPIC_LVT_REMOTE_IRR (1 << 14)
diff --git a/arch/x86/include/asm/cpufeature.h 
b/arch/x86/include/asm/cpufeature.h
index 4258aac..8a26e48 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -120,6 +120,7 @@
 #define X86_FEATURE_X2APIC (4*32+21) /* x2APIC */
 #define X86_FEATURE_MOVBE  (4*32+22) /* MOVBE instruction */
 #define X86_FEATURE_POPCNT  (4*32+23) /* POPCNT instruction */
+#define X86_FEATURE_TSC_DEADLINE_TIMER(4*32+24) /* Tsc deadline timer */
 #define X86_FEATURE_AES(4*32+25) /* AES instructions */
 #define X86_FEATURE_XSAVE  (4*32+26) /* XSAVE/XRSTOR/XSETBV/XGETBV */
 #define X86_FEATURE_OSXSAVE(4*32+27) /* "" XSAVE enabled in the OS */
@@ -284,6 +285,8 @@ extern const char * const x86_power_flags[32];
 #define cpu_has_xmm4_1 boot_cpu_has(X86_FEATURE_XMM4_1)
 #define cpu_has_xmm4_2 boot_cpu_has(X86_FEATURE_XMM4_2)
 #define cpu_has_x2apic boot_cpu_has(X86_FEATURE_X2APIC)
+#define cpu_has_tsc_deadline_timer \
+   boot_cpu_has(X86_FEATURE_TSC_DEADLINE_TIMER)
 #define cpu_has_xsave  boot_cpu_has(X86_FEATURE_XSAVE)
 #define cpu_has_hypervisor boot_cpu_has(X86_FEATURE_HYPERVISOR)
 #define cpu_has_pclmulqdq  boot_cpu_has(X86_FEATURE_PCLMULQDQ)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 307e3cf..2ce6529 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -671,6 +671,8 @@ u8 kvm_get_guest_memory_type(struct kvm_vcpu *vcpu, gfn_t 
gfn);
 
 extern bool tdp_enabled;
 
+extern u64 vcpu_tsc_khz(struct kvm_vcpu *vcpu);
+
 /* control of guest tsc rate supported? */
 extern bool kvm_has_tsc_control;
 /* minimum supported tsc_khz for guests */
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index d52609a..a6962d9 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -229,6 +229,8 @@
 #define MSR_IA32_APICBASE_ENABLE   (1<<11)
 #define MSR_IA32_APICBASE_BASE (0xf<<12)
 
+#define MSR_IA32_TSCDEADLINE   0x06e0
+
 #define MSR_IA32_UCODE_WRITE   0x0079
 #define MSR_IA32_UCODE_REV 0x008b
 
diff --git a/arch/x86/kvm/kvm_timer.h b/arch/x86/kvm/kvm_timer.h
index 64bc6ea..497dbaa 100644
--- a/arch/x86/kvm/kvm_timer.h
+++ b/arch/x86/kvm/kvm_timer.h
@@ -2,6 +2,8 @@
 struct kvm_timer {
struct hrtimer timer;
s64 period; /* unit: ns */
+   u32 timer_mode_mask;
+   u64 tscdeadline;
atomic_t pending;   /* accumulated triggered timers 
*/
bool reinject;
struct kvm_timer_ops *t_ops;
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 2b2255b..925d4b9 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -135,9 +135,23 @@ static inline int apic_lvt_vector(struct kvm_lapic *apic, 
int lvt_type)
return apic_get_reg(apic, lvt_type) & APIC_VECTOR_MASK;
 }
 
+static inline int apic_lvtt_oneshot(struct kvm_lapic *apic)
+{
+   return ((apic_get_reg(apic, APIC_LVTT) & 
+   apic->lapic_timer.timer_mode_mask) == APIC_LVT_TIMER_ONESHOT);
+}
+
 static inline int apic_lvtt_period(struct kvm_lapic *apic)
 {
-   return apic_get_reg(apic, APIC_LVTT) & APIC_LVT_TIMER_PERIODIC;
+   
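Independent of the patch details, the arming math such a deadline timer needs is small: convert the distance from the guest's current TSC to the programmed deadline into nanoseconds using the vcpu's TSC frequency in kHz, and arm an hrtimer with the result. A hedged sketch (names are illustrative, not KVM's):

```c
#include <assert.h>
#include <stdint.h>

/* cycles * 1e6 / kHz == cycles * 1e9 / Hz == nanoseconds.
 * A deadline at or before the current TSC means the timer should
 * fire (inject the interrupt) immediately. */
static uint64_t tsc_deadline_to_ns(uint64_t guest_tsc,
                                   uint64_t tscdeadline,
                                   uint64_t tsc_khz)
{
    if (tscdeadline <= guest_tsc) {
        return 0;   /* deadline already passed: inject immediately */
    }
    return (tscdeadline - guest_tsc) * 1000000ULL / tsc_khz;
}
```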

[Qemu-devel] [PATCH 2/2] Qemu co-operation with kvm tsc deadline timer

2011-09-13 Thread Liu, Jinsong
>From c1b502d6548fcc41592cd90acc82109ee949df75 Mon Sep 17 00:00:00 2001
From: Liu, Jinsong 
Date: Tue, 13 Sep 2011 22:05:30 +0800
Subject: [PATCH] Qemu co-operation with kvm tsc deadline timer

KVM adds emulation of the lapic tsc deadline timer for the guest.
This patch is the co-operating work on the qemu side.

Signed-off-by: Liu, Jinsong 
---
 target-i386/cpu.h |2 ++
 target-i386/kvm.c |   14 ++
 2 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index 935d08a..62ff73c 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -283,6 +283,7 @@
 #define MSR_IA32_APICBASE_BSP   (1<<8)
 #define MSR_IA32_APICBASE_ENABLE(1<<11)
 #define MSR_IA32_APICBASE_BASE  (0xf<<12)
+#define MSR_IA32_TSCDEADLINE0x6e0
 
 #define MSR_MTRRcap0xfe
 #define MSR_MTRRcap_VCNT   8
@@ -687,6 +688,7 @@ typedef struct CPUX86State {
 uint64_t async_pf_en_msr;
 
 uint64_t tsc;
+uint64_t tsc_deadline;
 
 uint64_t mcg_status;
 
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index aa843f0..206fcad 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -59,6 +59,7 @@ const KVMCapabilityInfo kvm_arch_required_capabilities[] = {
 
 static bool has_msr_star;
 static bool has_msr_hsave_pa;
+static bool has_msr_tsc_deadline;
 static bool has_msr_async_pf_en;
 static int lm_capable_kernel;
 
@@ -571,6 +572,10 @@ static int kvm_get_supported_msrs(KVMState *s)
 has_msr_hsave_pa = true;
 continue;
 }
+if (kvm_msr_list->indices[i] == MSR_IA32_TSCDEADLINE) {
+has_msr_tsc_deadline = true;
+continue;
+}
 }
 }
 
@@ -899,6 +904,9 @@ static int kvm_put_msrs(CPUState *env, int level)
 if (has_msr_hsave_pa) {
 kvm_msr_entry_set(&msrs[n++], MSR_VM_HSAVE_PA, env->vm_hsave);
 }
+if (has_msr_tsc_deadline) {
+kvm_msr_entry_set(&msrs[n++], MSR_IA32_TSCDEADLINE, env->tsc_deadline);
+}
 #ifdef TARGET_X86_64
 if (lm_capable_kernel) {
 kvm_msr_entry_set(&msrs[n++], MSR_CSTAR, env->cstar);
@@ -1145,6 +1153,9 @@ static int kvm_get_msrs(CPUState *env)
 if (has_msr_hsave_pa) {
 msrs[n++].index = MSR_VM_HSAVE_PA;
 }
+if (has_msr_tsc_deadline) {
+msrs[n++].index = MSR_IA32_TSCDEADLINE;
+}
 
 if (!env->tsc_valid) {
 msrs[n++].index = MSR_IA32_TSC;
@@ -1213,6 +1224,9 @@ static int kvm_get_msrs(CPUState *env)
 case MSR_IA32_TSC:
 env->tsc = msrs[i].data;
 break;
+case MSR_IA32_TSCDEADLINE:
+env->tsc_deadline = msrs[i].data;
+break;
 case MSR_VM_HSAVE_PA:
 env->vm_hsave = msrs[i].data;
 break;
-- 
1.6.5.6


qemu-lapic-tsc-deadline-timer.patch
Description: qemu-lapic-tsc-deadline-timer.patch


Re: [Qemu-devel] [PATCH][RFC][0/2] REF+/REF- optimization

2011-09-13 Thread Frediano Ziglio
2011/9/13 Kevin Wolf :
> Am 13.09.2011 09:53, schrieb Frediano Ziglio:
>> These patches try to trade-off between leaks and speed for clusters
>> refcounts.
>>
>> Refcount increments (REF+ or refp) are handled in a different way from
>> decrements (REF- or refm). The reason is that posting or not flushing
>> a REF- causes "just" a leak, while posting a REF+ causes corruption.
>>
>> To optimize REF- I just used an array to store offsets; when a
>> flush is requested or the array reaches a limit (currently 1022), the
>> array is sorted and written to disk. I use an array of offsets instead
>> of ranges to support compression (an offset can appear multiple times
>> in the array).
>> I consider this patch quite ready.
>
> Ok, first of all let's clarify what this optimises. I don't think it
> changes anything at all for the writeback cache modes, because these
> already do most operations in memory only. So this must be about
> optimising some operations with cache=writethrough. REF- isn't about
> normal cluster allocation, it is about COW with internal snapshots or
> bdrv_discard. Do you have benchmarks for any of them?
>
> I strongly disagree with your approach for REF-. We already have a
> cache, and introducing a second one sounds like a bad idea. I think we
> could get a very similar effect if we introduced a
> qcow2_cache_entry_mark_dirty_wb() that marks a given refcount block as
> dirty, but at the same time tells the cache that even in write-through
> mode it can still treat this block as write-back. This should require
> much less code changes.
>
> But let's measure the effects first, I suspect that for cluster
> allocation it doesn't help much because every REF- comes with a REF+.
>
>> To optimize REF+ I mark a range as allocated and use this range to
>> get new ones (avoiding writing refcount to disk). When a flush is
>> requested or in some situations (like snapshot) this cache is disabled
>> and flushed (written as REF-).
>> I do not consider this patch ready; it works and passes all io-tests,
>> but for instance I would avoid allocating new clusters for refcounts
>> during preallocation.
>
> The only question here is if improving cache=writethrough cluster
> allocation performance is worth the additional complexity in the already
> complex refcounting code.
>
> The alternative that was discussed before is the dirty bit approach that
> is used in QED and would allow us to use writeback for all refcount
> blocks, regardless of REF- or REF+. It would be an easier approach
> requiring less code changes, but it comes with the cost of requiring an
> fsck after a qemu crash.
>
>> The end speedup is quite visible when allocating clusters (more than 20%).
>
> What benchmark do you use for testing this?
>
> Kevin
>

Here are some results (kb/s):

with patch (ref-, ref+), qcow2s is qcow2 with a snapshot

run   rawqcow2  qcow2s
1 22748  4878   4792
2 29557  15839  23144

without
run   rawqcow2  qcow2s
1 21979  4308   1021
2 26249  13776  24182

Frediano
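The REF- scheme described at the top of the thread (queue raw offsets, sort and apply them on flush or when the 1022-entry limit is hit, and allow repeated offsets so compressed clusters keep working) can be sketched like this. All names are illustrative, not the patch's code:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define REFM_MAX 1022

struct refm_queue {
    uint64_t off[REFM_MAX];
    int n;
};

static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* apply() stands in for the sorted write-out to the refcount table. */
static void refm_flush(struct refm_queue *q,
                       void (*apply)(uint64_t off, void *ctx), void *ctx)
{
    qsort(q->off, q->n, sizeof(q->off[0]), cmp_u64);
    for (int i = 0; i < q->n; i++) {
        apply(q->off[i], ctx);
    }
    q->n = 0;
}

/* Queue one decrement; spill automatically when the array is full. */
static void refm_add(struct refm_queue *q, uint64_t off,
                     void (*apply)(uint64_t, void *), void *ctx)
{
    if (q->n == REFM_MAX) {
        refm_flush(q, apply, ctx);
    }
    q->off[q->n++] = off;
}

/* Helpers for exercising the sketch: record applied offsets in order. */
static uint64_t applied[16];
static int applied_n;
static void record(uint64_t off, void *ctx)
{
    (void)ctx;
    applied[applied_n++] = off;
}
```

Sorting before the write-out groups decrements that land in the same refcount block, which is where the claimed speedup would come from.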



Re: [Qemu-devel] [PATCH 2/2] Qemu co-operation with kvm tsc deadline timer

2011-09-13 Thread Jan Kiszka
On 2011-09-13 16:38, Liu, Jinsong wrote:
> From c1b502d6548fcc41592cd90acc82109ee949df75 Mon Sep 17 00:00:00 2001
> From: Liu, Jinsong 
> Date: Tue, 13 Sep 2011 22:05:30 +0800
> Subject: [PATCH] Qemu co-operation with kvm tsc deadline timer
> 
> KVM adds emulation of the lapic tsc deadline timer for the guest.
> This patch is the co-operating work on the qemu side.
> 
> Signed-off-by: Liu, Jinsong 
> ---
>  target-i386/cpu.h |2 ++
>  target-i386/kvm.c |   14 ++
>  2 files changed, 16 insertions(+), 0 deletions(-)
> 
> diff --git a/target-i386/cpu.h b/target-i386/cpu.h
> index 935d08a..62ff73c 100644
> --- a/target-i386/cpu.h
> +++ b/target-i386/cpu.h
> @@ -283,6 +283,7 @@
>  #define MSR_IA32_APICBASE_BSP   (1<<8)
>  #define MSR_IA32_APICBASE_ENABLE(1<<11)
>  #define MSR_IA32_APICBASE_BASE  (0xf<<12)
> +#define MSR_IA32_TSCDEADLINE0x6e0
>  
>  #define MSR_MTRRcap  0xfe
>  #define MSR_MTRRcap_VCNT 8
> @@ -687,6 +688,7 @@ typedef struct CPUX86State {
>  uint64_t async_pf_en_msr;
>  
>  uint64_t tsc;
> +uint64_t tsc_deadline;

This field has to be saved/restored for snapshots/migrations.

Frankly, I've no clue right now if substates are in vogue again (they
had problems in their binary format) or if you can simply add a
versioned top-level field and bump the CPUState version number.

>  
>  uint64_t mcg_status;
>  
> diff --git a/target-i386/kvm.c b/target-i386/kvm.c
> index aa843f0..206fcad 100644
> --- a/target-i386/kvm.c
> +++ b/target-i386/kvm.c
> @@ -59,6 +59,7 @@ const KVMCapabilityInfo kvm_arch_required_capabilities[] = {
>  
>  static bool has_msr_star;
>  static bool has_msr_hsave_pa;
> +static bool has_msr_tsc_deadline;
>  static bool has_msr_async_pf_en;
>  static int lm_capable_kernel;
>  
> @@ -571,6 +572,10 @@ static int kvm_get_supported_msrs(KVMState *s)
>  has_msr_hsave_pa = true;
>  continue;
>  }
> +if (kvm_msr_list->indices[i] == MSR_IA32_TSCDEADLINE) {
> +has_msr_tsc_deadline = true;
> +continue;
> +}
>  }
>  }
>  
> @@ -899,6 +904,9 @@ static int kvm_put_msrs(CPUState *env, int level)
>  if (has_msr_hsave_pa) {
>  kvm_msr_entry_set(&msrs[n++], MSR_VM_HSAVE_PA, env->vm_hsave);
>  }
> +if (has_msr_tsc_deadline) {
> +kvm_msr_entry_set(&msrs[n++], MSR_IA32_TSCDEADLINE, 
> env->tsc_deadline);
> +}
>  #ifdef TARGET_X86_64
>  if (lm_capable_kernel) {
>  kvm_msr_entry_set(&msrs[n++], MSR_CSTAR, env->cstar);
> @@ -1145,6 +1153,9 @@ static int kvm_get_msrs(CPUState *env)
>  if (has_msr_hsave_pa) {
>  msrs[n++].index = MSR_VM_HSAVE_PA;
>  }
> +if (has_msr_tsc_deadline) {
> +msrs[n++].index = MSR_IA32_TSCDEADLINE;
> +}
>  
>  if (!env->tsc_valid) {
>  msrs[n++].index = MSR_IA32_TSC;
> @@ -1213,6 +1224,9 @@ static int kvm_get_msrs(CPUState *env)
>  case MSR_IA32_TSC:
>  env->tsc = msrs[i].data;
>  break;
> +case MSR_IA32_TSCDEADLINE:
> +env->tsc_deadline = msrs[i].data;
> +break;
>  case MSR_VM_HSAVE_PA:
>  env->vm_hsave = msrs[i].data;
>  break;

Just to double check: This feature is exposed to the guest when A) the
host CPU supports it and B) QEMU passed down guest CPU specifications
(cpuid data) that allow it as well?

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



Re: [Qemu-devel] [PATCH 04/12] nbd: add support for NBD_CMD_FLUSH

2011-09-13 Thread Paolo Bonzini

On 09/13/2011 03:52 PM, Kevin Wolf wrote:

>  +
>  +if (reply.error !=0)


Missing space (this is not for consistency, right?)


Well, cut and paste implies consistency. :)  Will fix.

Paolo



Re: [Qemu-devel] [PATCH 06/12] nbd: support NBD_CMD_TRIM in the server

2011-09-13 Thread Paolo Bonzini

On 09/13/2011 03:58 PM, Kevin Wolf wrote:

>  +case NBD_CMD_TRIM:
>  +TRACE("Request type is TRIM");
>  +bdrv_discard(bs, (request.from + dev_offset) / 512,
>  + request.len / 512);


Errors are completely ignored? Does the NBD protocol not allow to return
an error?


Actually it does, will fix.

Paolo



Re: [Qemu-devel] [PATCH v2 09/12] sheepdog: move coroutine send/recv function to generic code

2011-09-13 Thread Paolo Bonzini

On 09/13/2011 04:14 PM, Kevin Wolf wrote:

>block/sheepdog.c |  225 
++
>cutils.c |  103 +
>qemu-common.h|3 +
>qemu-coroutine.c |   70 +
>qemu-coroutine.h |   26 ++


Can we move the code somewhere else? This is not core coroutine
infrastructure. I would suggest qemu_socket.h/qemu-sockets.c.


It's not really socket-specific either (it uses recv/send only because 
of Windows brokenness---it could use read/write if it wasn't for that). 
 I hoped sooner or later it could become a qemu_co_readv/writev, hence 
the choice of qemu-coroutine.c.


Paolo

ps: I also hope that the Earth will start spinning slower and will give 
me 32 hour days, so just tell me if you really want that outside 
qemu-coroutine.c.




Re: [Qemu-devel] [PATCH v2 09/12] sheepdog: move coroutine send/recv function to generic code

2011-09-13 Thread Kevin Wolf
On 13.09.2011 17:16, Paolo Bonzini wrote:
> On 09/13/2011 04:14 PM, Kevin Wolf wrote:
block/sheepdog.c |  225 
 ++
cutils.c |  103 +
qemu-common.h|3 +
qemu-coroutine.c |   70 +
qemu-coroutine.h |   26 ++
>>
>> Can we move the code somewhere else? This is not core coroutine
>> infrastructure. I would suggest qemu_socket.h/qemu-sockets.c.
> 
> It's not really socket-specific either (it uses recv/send only because 
> of Windows brokenness---it could use read/write if it wasn't for that). 
>   I hoped sooner or later it could become a qemu_co_readv/writev, hence 
> the choice of qemu-coroutine.c.
> 
> Paolo
> 
> ps: I also hope that the Earth will start spinning slower and will give 
> me 32 hour days, so just tell me if you really want that outside 
> qemu-coroutine.c.

Yes, I do want it outside qemu-coroutine.c.

If you prefer putting it next to qemu_write_full() and friends rather
than into the sockets file, feel free to do that.

Kevin



Re: [Qemu-devel] [PATCH v2 09/12] sheepdog: move coroutine send/recv function to generic code

2011-09-13 Thread Paolo Bonzini

On 09/13/2011 05:36 PM, Kevin Wolf wrote:

If you prefer putting it next to qemu_write_full() and friends rather
than into the sockets file, feel free to do that.


Yes, that makes good sense.

Paolo



Re: [Qemu-devel] Armel host (x86 emul.) img disk not writable

2011-09-13 Thread father_mande

Hi all,

Apologies ... I had tried using apt-get install qemu (0.12), and it
seems that it does not target the kernel (because the Qnap and the chroot
are not at the same level); generally everything works fine (virtualbox,
for example), and only kernel modules, if needed, have to be rebuilt ...


In any case, I have downloaded 0.15 and compiled it (even if I compiled
some platforms that I don't use, which I will look at later ...).

Now I can write to the disk image.

I will start with a win98 install (it's an Arm SOC) ... and see what happens.

So sorry for this mess. I hope to post more value in the future ...

Philippe.

On 13/09/2011 08:52, Father Mande wrote:

Hi,
It's my first message, so:
About me: I have worked for a while with Vmware and Virtualbox,
and I have integrated Virtualbox into the x86 QNAP NAS system.

Armel platform: QNAP NAS TS-219 with a Marvell (Arm) SOC at 1.8 GHz.
Test: run QEMU x86 emulation inside it (a VM like freedos or Windows).

Context: only modules are possible, no kernel change ...
... tests are done in a chroot env (Debian Squeeze for Armel), so I add
an X11 client to the NAS and use X-Ming or a Debian box as the X server.


Tests working:
install qemu through apt-get.
Start emulation from fd (freedos) or cdrom (live_cd linux, Win98
installation CD) ... everything runs: the windows open, the menu and
keyboard work, and I have also added in fd a program to manage hlt ...
so everything seems to run. Also tested qemu-launcher ... it works.

*Problem:*
Each time I create a disk image to add or install on disk ... the
install (Windows or Freedos) fails because it doesn't see the disk or
can't read or write on it.

I have tested qemu-img with raw, qcow, and qcow2 without success ...
The img file is fully rw for all users, and qemu runs under root, but it's
the same under a "normal" user.
Due to the lack of IDE in the Qnap ... I have also tried compiling modules
for ide-core and loading them (insmod), but no change ...


Any advice? Certainly I forgot something?

Thanks for the help.
Philippe.




[Qemu-devel] [PATCH] fix compilation with stderr trace backend

2011-09-13 Thread Paolo Bonzini
Signed-off-by: Paolo Bonzini 
---
 trace-events |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/trace-events b/trace-events
index a8e7684..1e9e717 100644
--- a/trace-events
+++ b/trace-events
@@ -454,7 +454,7 @@ milkymist_vgafb_memory_write(uint32_t addr, uint32_t value) 
"addr %08x value %08
 mipsnet_send(uint32_t size) "sending len=%u"
 mipsnet_receive(uint32_t size) "receiving len=%u"
 mipsnet_read(uint64_t addr, uint32_t val) "read addr=0x%" PRIx64 " val=0x%x"
-mipsnet_write(uint64_t addr, uint64_t val) "write addr=0x%" PRIx64 " val=0x%" PRIx64
+mipsnet_write(uint64_t addr, uint64_t val) "write addr=0x%" PRIx64 " val=0x%" PRIx64 ""
 mipsnet_irq(uint32_t isr, uint32_t intctl) "set irq to %d (%02x)"
 
 # xen-all.c
-- 
1.7.6
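For context, trace-events format strings lean on C's compile-time concatenation of adjacent string literals: PRIx64 expands to a literal such as "llx", and the appended "" keeps the whole expression ending in a plain literal, which the stderr backend's generated code apparently requires. A minimal sketch of the concatenation (the helper name is illustrative):

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* "write addr=0x%" PRIx64 " val=0x%" PRIx64 "" is pasted into one
 * format string at compile time; the trailing "" is harmless. */
static void fmt_u64(char *out, size_t n, uint64_t addr, uint64_t val)
{
    snprintf(out, n, "write addr=0x%" PRIx64 " val=0x%" PRIx64 "", addr, val);
}
```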




Re: [Qemu-devel] Using the qemu tracepoints with SystemTap

2011-09-13 Thread William Cohen
On 09/13/2011 06:03 AM, Stefan Hajnoczi wrote:
> On Mon, Sep 12, 2011 at 4:33 PM, William Cohen  wrote:
>> The RHEL-6 version of qemu-kvm makes the tracepoints available to SystemTap. 
>> I have been working on useful examples for the SystemTap tracepoints in 
>> qemu. There doesn't seem to be a great number of examples showing the 
>> utility of the tracepoints in diagnosing problems. However, I came across 
>> the following blog entry that had several examples:
>>
>> http://blog.vmsplice.net/2011/03/how-to-write-trace-analysis-scripts-for.html
>>
>> I reimplemented the VirtqueueRequestTracker example from the blog in 
>> SystemTap (the attached virtqueueleaks.stp). I can run it on RHEL-6's 
>> qemu-kvm-0.12.1.2-2.160.el6_1.8.x86_64 and get output like the following. It 
>> outputs the pid and the address of the elem that leaked when the script is 
>> stopped like the following:
>>
>> $ stap virtqueueleaks.stp
>> ^C
>> pid elem
>>   19503  1c4af28
>>   19503  1c56f88
>>   19503  1c62fe8
>>   19503  1c6f048
>>   19503  1c7b0a8
>>   19503  1c87108
>>   19503  1c93168
>> ...
>>
>> I am not that familiar with the internals of qemu. The script seems to
>> indicate qemu is leaking, but is that really the case?  If there are
>> resource leaks, what output would help debug those leaks? What enhancements
>> can be done to this script to provide more useful information?
> 

Hi Stefan,

Thanks for the comments.

> Leak tracing always has corner cases :).
> 
> With virtio-blk this would indicate a leak because it uses a
> request-response model where the guest initiates I/O and the host
> responds.  A guest that cleanly shuts down before you exit your
> SystemTap script should not leak requests for virtio-blk.

I stopped the systemtap script while the guest vm was still running. So when 
the guest vm cleanly shuts down there should be a series of virtqueue_fill 
operations that will remove those elements?

Qemu uses a thread for each virtual processor, but a single thread to handle
all IO. It seems like that might be a possible bottleneck. What would be the
path of an IO event from guest to host and back to the guest? Is there
something that a script could do to gauge the delay due to the qemu IO thread
handling multiple processors?

> 
> With virtio-net the guest actually hands the host receive buffers and
> they are held until we can receive packets into them and return them
> to the host.  We don't have a virtio_reset trace event, and due to
> this we're not accounting for clean shutdown (the guest driver resets
> the device to clear all virtqueues).
> 
> I am submitting a patch to add virtio_reset() tracing.  This will
> allow the script to delete all elements belonging to this virtio
> device.
> 
>> Are there other examples of qemu probing people would like to see?
> 
> The malloc/realloc/memalign/vmalloc/free/vfree trace events can be
> used for a few things:
>  * Finding memory leaks.
>  * Finding malloc/vfree or vmalloc/free mismatches.  The rules are:
> malloc/realloc need free, memalign/vmalloc need vfree.  They cannot be
> mixed.
> 
> Stefan

As a quick and simple experiment to see how often various probes are getting 
hit I used the following script on RHEL-6 (the probe points are a bit different 
on Fedora):

global counts
probe qemu.*.*? {counts[pn()]++}
probe end {foreach(n+ in counts) printf ("%s = %d\n", n, counts[n])}

For starting up a fedora 14 guest virtual machine and shutting it down I got 
the following output:

$  stap ~/research/profiling/examples/qemu_count.s 
^Cqemu.kvm.balloon_event = 1
qemu.kvm.bdrv_aio_multiwrite = 155
qemu.kvm.bdrv_aio_readv = 13284
qemu.kvm.bdrv_aio_writev = 998
qemu.kvm.cpu_get_apic_base = 20
qemu.kvm.cpu_in = 94082
qemu.kvm.cpu_out = 165789
qemu.kvm.cpu_set_apic_base = 445752
qemu.kvm.multiwrite_cb = 654
qemu.kvm.paio_submit = 7141
qemu.kvm.qemu_free = 677704
qemu.kvm.qemu_malloc = 683031
qemu.kvm.qemu_memalign = 285
qemu.kvm.qemu_realloc = 47550
qemu.kvm.virtio_blk_handle_write = 504
qemu.kvm.virtio_blk_req_complete = 7146
qemu.kvm.virtio_blk_rw_complete = 7146
qemu.kvm.virtio_notify = 6574
qemu.kvm.virtio_queue_notify = 6680
qemu.kvm.virtqueue_fill = 7146
qemu.kvm.virtqueue_flush = 7146
qemu.kvm.virtqueue_pop = 7147
qemu.kvm.vm_state_notify = 1


I see a lot of qemu.kvm.qemu_malloc. This is likely more than systemtap can
track if there are thousands of them live at the same time. There are no
qemu_vmalloc events because of https://bugzilla.redhat.com/show_bug.cgi?id=714773.

Should the qemu.kvm.cpu_in and qemu.kvm.cpu_out counts match up? There are a
lot more qemu.kvm.cpu_out than qemu.kvm.cpu_in events.

Note that qemu.kvm.virtio_blk_req_complete, qemu.kvm.virtio_blk_rw_complete,
qemu.kvm.virtqueue_fill, and qemu.kvm.virtqueue_flush all have the same count,
7146. The qemu.kvm.virtqueue_pop count is close, at 7147.

-Will
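The pairing rule Stefan states (malloc/realloc need free, memalign/vmalloc need vfree, never mixed) is mechanical enough to encode directly in a checker; a sketch with illustrative enum names:

```c
#include <assert.h>

/* The two allocation families from the trace events: malloc/realloc
 * pair with free; memalign/vmalloc pair with vfree. */
enum alloc_kind { A_MALLOC, A_REALLOC, A_MEMALIGN, A_VMALLOC };
enum free_kind  { F_FREE, F_VFREE };

static int pairing_ok(enum alloc_kind a, enum free_kind f)
{
    if (a == A_MALLOC || a == A_REALLOC) {
        return f == F_FREE;
    }
    return f == F_VFREE;    /* A_MEMALIGN, A_VMALLOC */
}
```

A leak script would remember the alloc kind per pointer and check the release event against this table, flagging mismatches as it goes.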











[Qemu-devel] [Bug 818673] Re: virtio: trying to map MMIO memory

2011-09-13 Thread vrozenfe
I've made several unsuccessful attempts to reproduce this problem,
running VMs on top of F14 and RHEL6.2

The only relatively close problem, reported by our QE, was
https://bugzilla.redhat.com/show_bug.cgi?id=727034
It should already be fixed in our internal repository. (The public repository
is out of sync, but I'm going to update it soon.)

I put our recent (WHQL candidate) drivers here:
http://people.redhat.com/vrozenfe/virtio-win-prewhql-0.1.zip

Please give them a try and share your experience.

Best,
Vadim.

** Bug watch added: Red Hat Bugzilla #727034
   https://bugzilla.redhat.com/show_bug.cgi?id=727034

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/818673

Title:
  virtio: trying to map MMIO memory

Status in QEMU:
  New

Bug description:
  Qemu host is Core i7, running Linux.  Guest is Windows XP sp3.
  Often, qemu will crash shortly after starting (1-5 minutes) with a statement 
"qemu-system-x86_64: virtio: trying to map MMIO memory"
  This has occured with qemu-kvm 0.14, qemu-kvm 0.14.1, qemu-0.15.0-rc0 and 
qemu 0.15.0-rc1.
  Qemu is started as such:
  qemu-system-x86_64 -cpu host -enable-kvm -pidfile /home/rick/qemu/hds/wxp.pid 
-drive file=/home/rick/qemu/hds/wxp.raw,if=virtio -m 768 -name WinXP -net 
nic,model=virtio -net user -localtime -usb -vga qxl -device virtio-serial 
-chardev spicevmc,name=vdagent,id=vdagent -device 
virtserialport,chardev=vdagent,name=com.redhat.spice.0 -spice 
port=1234,disable-ticketing -daemonize -monitor 
telnet:localhost:12341,server,nowait
  The WXP guest has virtio 1.1.16 drivers for net and scsi, and the most 
current spice binaries from spice-space.org.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/818673/+subscriptions



Re: [Qemu-devel] [Qemu-ppc] [RESEND][PATCH] booke timers

2011-09-13 Thread Scott Wood
On 09/13/2011 08:08 AM, Alexander Graf wrote:
> Fabien Chouteau wrote:
>> static void booke_update_irq(CPUState *env)
>> {
>> ppc_set_irq(env, PPC_INTERRUPT_DECR,
>> (env->spr[SPR_BOOKE_TSR] & TSR_DIS
>>  && env->spr[SPR_BOOKE_TCR] & TCR_DIE));
>>
>> ppc_set_irq(env, PPC_INTERRUPT_WDT,
>> (env->spr[SPR_BOOKE_TSR] & TSR_WIS
>>  && env->spr[SPR_BOOKE_TCR] & TCR_WIE));
>>
>> ppc_set_irq(env, PPC_INTERRUPT_FIT,
>> (env->spr[SPR_BOOKE_TSR] & TSR_FIS
>>  && env->spr[SPR_BOOKE_TCR] & TCR_FIE));
>> }
>>   
> 
> Awesome! Please also check on MSR.EE and send a new patch then :)

If you check on EE here, then you'll need to call booke_update_irq()
when EE changes (not sure whether that's the plan).  Another option
would be to unset the irq if the condition is not valid.  This would
also be better in that you could have all three set (DIS, DIE, EE) and
not deliver the interrupt because there's a higher priority exception.

-Scott




Re: [Qemu-devel] [Qemu-ppc] [RESEND][PATCH] booke timers

2011-09-13 Thread Alexander Graf




Am 13.09.2011 um 18:44 schrieb Scott Wood :

> On 09/13/2011 08:08 AM, Alexander Graf wrote:
>> Fabien Chouteau wrote:
>>> static void booke_update_irq(CPUState *env)
>>> {
>>>ppc_set_irq(env, PPC_INTERRUPT_DECR,
>>>(env->spr[SPR_BOOKE_TSR] & TSR_DIS
>>> && env->spr[SPR_BOOKE_TCR] & TCR_DIE));
>>> 
>>>ppc_set_irq(env, PPC_INTERRUPT_WDT,
>>>(env->spr[SPR_BOOKE_TSR] & TSR_WIS
>>> && env->spr[SPR_BOOKE_TCR] & TCR_WIE));
>>> 
>>>ppc_set_irq(env, PPC_INTERRUPT_FIT,
>>>(env->spr[SPR_BOOKE_TSR] & TSR_FIS
>>> && env->spr[SPR_BOOKE_TCR] & TCR_FIE));
>>> }
>>> 
>> 
>> Awesome! Please also check on MSR.EE and send a new patch then :)
> 
> If you check on EE here, then you'll need to call booke_update_irq()
> when EE changes (not sure whether that's the plan).  Another option
> would be to unset the irq if the condition is not valid.  This would
> also be better in that you could have all three set (DIS, DIE, EE) and
> not deliver the interrupt because there's a higher priority exception.

Yup, which is what the patch actually does, so sorry for the fuss :). The 
subtle parts are in a different function, and by lowering the irq line when TSR 
or TCR get set, we're good.


Alex





Re: [Qemu-devel] qemu virtIO blocking operation - question

2011-09-13 Thread Sinha, Ani

>>
>> We are trying to paravirtualize the IPMI device (/dev/ipmi0).
>
> From http://en.wikipedia.org/wiki/Intelligent_Platform_Management_Interface:
> "An implementation of IPMI version 1.5 can communicate via a direct
> serial connection or via a side-band local area network (LAN)
> connection to a remote client."
>
> Why do you need a new virtio device?  Can't you use virtio-serial?
> This is what other management channels are using for host<->guest
> agents.


It might be possible. However, we are doing it this way because :

(a) I am not sure of the interactions with the real ipmi device on the host 
when the device is shared with multiple guests (we are not using pci 
passthrough to attach the device to a single guest).  The device itself is 
stateless - that is, it cannot handle multiple requests at one time. When 
multiple users use the device, the driver has to serialize the requests and 
send them to the device. The existing ipmi driver within Linux 
(drivers/char/ipmi) does that now.  But this driver in one guest would not 
know of the existence of another guest. Maybe it would have to be reworked to 
work with virtio-serial. Dunno.

So, we wanted to keep this driver as is in the host and build a lightweight 
interface layer in the guest that can talk to the real driver on the host. 
Multiple guests would then be like multiple processes accessing the device.

(b) We wanted to gain some experience paravirtualizing devices this way through 
virtio since we have other proprietary hardware that needs to be 
paravirtualized.


Makes sense?

Ani








Re: [Qemu-devel] [Qemu-ppc] [PATCH] PPC: Fix via-cuda memory registration

2011-09-13 Thread Blue Swirl
On Mon, Sep 12, 2011 at 9:05 PM, Anthony Liguori  wrote:
> On 09/12/2011 08:53 AM, Avi Kivity wrote:
>>
>> On 09/12/2011 04:46 PM, Lucas Meneghel Rodrigues wrote:
>>>
>>> On 09/12/2011 06:07 AM, Avi Kivity wrote:

 On 09/11/2011 02:38 PM, Alexander Graf wrote:
>
> Am 11.09.2011 um 12:41 schrieb Avi Kivity:
>
> > On 09/08/2011 07:54 PM, Alexander Graf wrote:
> >> PS: Please test your patches. This one could have been found with
> an invocation
> >> as simple as "qemu-system-ppc". We boot into the OpenBIOS prompt by
> default,
> >> so you wouldn't even have required a guest image or kernel.
> >>
> >
> >
> > Sorry about that.
> >
> > Note that it's pretty hard to test these patches. I often don't even
> > know which binary to run, as the device->target relationship is not
> > immediately visible,
>
> The patch was explicitly to convert ppc ;).

 Yes, in this case. Not in the general case.

> > and I don't really know what to expect from the guest.
>
> The very easy check-fundamentals thing to do for ppc is to execute
> qemu-system-ppc without arguments. It should drop you into an OF
> prompt. Both memory api bugs on ppc I've seen now would have been
> exposed with that.
>
> I agree that we should have something slightly more sophisticated, but
> doing such a bare minimum test is almost for free to the tester and
> covers at least basic functionality :). I don't mind people
> introducing subtle bugs in corner cases - these things happen. But an
> abort() when you execute the binary? That really shouldn't happen
> ever. This one is almost as bad.

 Yeah.

> > It would be best if we had a kvm-autotest testset for tcg, it would
> probably run in just a few minutes and increase confidence in these
> patches.
>
> Yeah, I am using kvm-autotest today for regression testing, but it's
> very hard to tell it to run multiple different binaries. The target
> program variable can only be set for an execution job, making it
> impossible to run multiple targets in one autotest run.
>>>
>>> Alexander, I've started to work on this, I'm clearing out my request
>>> list, last week I've implemented ticket 50, that was related to
>>> special block configuration for the tests, now I want to make it
>>> possible to support multiple binaries.
>>>
 Probably best to tell autotest about the directory, and let it pick up
 the binary. Still need some configuration to choose between qemu-kvm and
 qemu-system-x86_64.

 Lucas?
>>>
>>> Yes, that would also work, having different variants with different
>>> qemu and qemu-img paths. Those binaries would have to be already
>>> pre-built, but then we miss the ability autotest has of building the
>>> binaries and prepare the environment. It'd be like:
>>>
>>> variant1:
>>> qemu = /path/to/qemu1
>>> qemu-img = /path/to/qemu-img1
>>> extra_params = "--appropriate --extra --params2"
>>>
>>>
>>> variant2:
>>> qemu = /path/to/qemu2
>>> qemu-img = /path/to/qemu-img2
>>> extra_params = "--appropriate --extra --params2"
>>>
>>> Something like that. It's a feasible intermediate solution until I
>>> finish work on supporting multiple userspaces.
>>>
>>
>> Another option is, now that the binary name 'qemu' is available for
>> general use, make it possible to invoke everything with just one binary:
>>
>> qemu -system -target mips ...
>> qemu-system -target mips ...
>> qemu-system-mips ...
>
> I have a fancy script that I'll post soon that does something like this.  It
> takes the git approach and expands:
>
> qemu foo --bar=baz
>
> To:
>
> qemu-foo --bar=baz
>
> Which means that you could do:
>
> qemu system-x86_64 -hda foo.img
>
> And it'd go to:
>
> qemu-system-x86_64 -hda foo.img
>
> But there is also a smarter 'run' command that lets you do:
>
> qemu run --target=x86_64 -hda foo.img

How would this be better than Avi's version? There isn't even any
compatibility like 'qemu' has with 'qemu' defaulting to 'qemu -system
-target i386'.

> I've made no attempt to unify linux-user.  It's a very different executable
> with a different usage model.
>
> My motivation is QOM as I don't want to have command line options to create
> devices any more.  Instead, a front end script will talk to the monitor to
> setup devices/machines.
>
> Regards,
>
> Anthony Liguori
>
>>
>> are all equivalent. autotest should easily be able to pass different
>> -target based on the test being run.
>>
>
>
>



Re: [Qemu-devel] Using the qemu tracepoints with SystemTap

2011-09-13 Thread William Cohen
On 09/13/2011 12:10 PM, William Cohen wrote:

> Should the qemu.kvm.cpu_in and qemu.kvm.cpu_out counts match up?  There are a 
> lot more qemu.kvm.cpu_out events than qemu.kvm.cpu_in events.

I found that cpu_in and cpu_out refer to input and output instructions.  I 
wrote a little script to tally up the input and output operations on each port, 
run against qemu on an fc15 machine.

It generates output like the following:


cpu_in
  port    count
0x01f7     3000
0x03d5      120
0xc000     2000
0xc002     3000

cpu_out
  port    count
0x0080      480
0x01f1     2000
0x01f2     2000
0x01f3     2000
0x01f4     2000
0x01f5     2000
0x01f6     2000
0x01f7     1000
0x03d4      480
0x03d5      120
0x03f6     1000
0xc000     3000
0xc002     2000
0xc004     1000
0xc090        4

Looks like lots of touching of the ide device ports (0x01f0-0x01ff) and some 
vga controller ports (0x03d0-0x03df). This is kind of what would be expected 
when the machine is doing a fsck and selinux relabel on the guest virtual 
machines. Looks like some pci device access 
(http://www.tech-pro.net/intro_pci.html) also.

-Will
global cpu_in, cpu_out
probe qemu.system.x86_64.cpu_in { cpu_in[addr]++ }
probe qemu.system.x86_64.cpu_out {cpu_out[addr]++ }
probe end {
  # write out the data
  printf("\ncpu_in\n%6s %8s\n","port", "count")
  foreach (addr+ in cpu_in)
printf("0x%04x %8d\n", addr, cpu_in[addr])
  printf("\ncpu_out\n%6s %8s\n","port", "count")
  foreach (addr+ in cpu_out)
printf("0x%04x %8d\n", addr, cpu_out[addr])
}


Re: [Qemu-devel] [PATCH v3 5/6] vga: Use linear mapping + dirty logging in chain 4 memory access mode

2011-09-13 Thread Blue Swirl
On Tue, Sep 13, 2011 at 11:34 AM, Jan Kiszka  wrote:
> On 2011-09-13 11:42, Alexander Graf wrote:
>>
>> On 13.09.2011, at 11:00, Jan Kiszka wrote:
>>
>>> On 2011-09-13 10:40, Alexander Graf wrote:
 Btw, it still tries to execute invalid code even with your patch. #if 
 0'ing out the memory region updates at least gets the guest booting for me. 
 Also, to get it working you need a patch for the interrupt controller 
 (another breakage thanks to the memory api).

 diff --git a/hw/heathrow_pic.c b/hw/heathrow_pic.c
 index 51996ab..16f48d1 100644
 --- a/hw/heathrow_pic.c
 +++ b/hw/heathrow_pic.c
 @@ -126,7 +126,7 @@ static uint64_t pic_read(void *opaque, 
 target_phys_addr_t addr,
 static const MemoryRegionOps heathrow_pic_ops = {
     .read = pic_read,
     .write = pic_write,
 -    .endianness = DEVICE_NATIVE_ENDIAN,
 +    .endianness = DEVICE_LITTLE_ENDIAN,
 };

 static void heathrow_pic_set_irq(void *opaque, int num, int level)

>>>
>>> With or without this fix, with or without the active chain-4 optimization,
>>> I just get an empty yellow screen when firing up qemu-system-ppc (also
>>> when using the Debian ISO). Do I need to specify a specific machine type?
>>
>> Ugh. No, you only need this patch:
>>
>>   [PATCH] PPC: Fix via-cuda memory registration
>>
>> which fixes another recently introduced regression :)
>
> That works now - and allowed me to identify the bug after enhancing info
> mtree a bit:
>
> (qemu) info mtree
> memory
>  addr  prio 0 size 7fff system
>    addr 8088 prio 1 size 8 macio
>      addr 808e prio 0 size 2 macio-nvram
>      addr 808a prio 0 size 1000 pmac-ide
>      addr 80896000 prio 0 size 2000 cuda
>      addr 80893000 prio 0 size 40 escc-bar
>      addr 80888000 prio 0 size 1000 dbdma
>      addr 8088 prio 0 size 1000 heathrow-pic
>    addr 8000 prio 1 size 80 vga.vram
>    addr 800a prio 1 size 2 vga-lowmem
>    ...
>
> Here is the problem: Both the vram and the ISA range get mapped into
> system address space, but the former eclipses the latter as it shows up
> earlier in the list and has the same priority. This picture changes with
> the chain-4 alias which has prio 2, thus maps over the vram.
>
> It looks to me like the ISA address space is either misplaced at
> 0x8000 or is not supposed to be mapped at all on PPC. Comments?

Since there is no PCI-ISA bridge, ISA address space shouldn't exist.



Re: [Qemu-devel] [Qemu-ppc] [PATCH] PPC: Fix via-cuda memory registration

2011-09-13 Thread Anthony Liguori

On 09/13/2011 02:31 PM, Blue Swirl wrote:

> [deeply nested quoted history trimmed; it duplicates the thread above]
>
> How would this be better than Avi's version? There isn't even any
> compatibility like 'qemu' has with 'qemu' defaulting to 'qemu -system
> -target i386'.

Because you can then do:

$ qemu run -hda foo.img -name bar
$ qemu monitor bar info kvm
KVM enabled

Or you can do:

$ sudo qemu setup-nat foo eth0
$ sudo qemu create-vnic foo
Created vnic `vnet0'
$ qemu run -hda foo.img -net tap,ifname=vnet0

And all sorts of other interesting things.  It means a much friendlier 
interface for command line users and much better scriptability.


Regards,

Anthony Liguori




Re: [Qemu-devel] QEMU Image problem

2011-09-13 Thread Mulyadi Santosa
On Tue, Sep 13, 2011 at 14:52, bala suru  wrote:
> Hi,
>
> I have a problem generating a KVM-QEMU image from a .iso file.
> I tried virt-manager but could not create a proper one.

I am not really sure what you're trying to do, but if you just want to
access it, why not simply pass it to the -cdrom option?

-- 
regards,

Mulyadi Santosa
Freelance Linux trainer and consultant

blog: the-hydra.blogspot.com
training: mulyaditraining.blogspot.com



[Qemu-devel] [Bug 818673] Re: virtio: trying to map MMIO memory

2011-09-13 Thread Rick Vernam
Still crashes just the same.
I updated the drivers for virt net, scsi & serial from the XP and WXp folders 
in the zip file that you referenced.
Then I shutdown the VM.
Because it only seems to happen every other time that Qemu is started, I 
started it back up and shut it down again.
Then the VM was started a third time and left idle prior to crashing.

Thanks, and sorry that I didn't have better news.
(also, note that I've built qemu-kvm straight from www.linux-kvm.org, and qemu 
straight from qemu.org).

-Rick

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/818673




[Qemu-devel] buildbot failure in qemu on block_x86_64_debian_6_0

2011-09-13 Thread qemu
The Buildbot has detected a new failure on builder block_x86_64_debian_6_0 
while building qemu.
Full details are available at:
 http://buildbot.b1-systems.de/qemu/builders/block_x86_64_debian_6_0/builds/30

Buildbot URL: http://buildbot.b1-systems.de/qemu/

Buildslave for this Build: yuzuki

Build Reason: The Nightly scheduler named 'nightly_block' triggered this build
Build Source Stamp: [branch block] HEAD
Blamelist: 

BUILD FAILED: failed git

sincerely,
 -The Buildbot



Re: [Qemu-devel] [PATCH] pci: implement bridge filtering

2011-09-13 Thread Wen Congyang
At 09/05/2011 02:13 AM, Michael S. Tsirkin wrote:
> Support bridge filtering on top of the memory
> API as suggested by Avi Kivity:
> 
> Create a memory region for the bridge's address space.  This region is
> not directly added to system_memory or its descendants.  Devices under
> the bridge see this region as its pci_address_space().  The region is
> as large as the entire address space - it does not take into account
> any windows.
> 
> For each of the three windows (pref, non-pref, vga), create an alias
> with the appropriate start and size.  Map the alias into the bridge's
> parent's pci_address_space(), as subregions.
> 
> Signed-off-by: Michael S. Tsirkin 
> ---
> 
> The below seems to work fine for me so I applied this.
> Still need to test bridge filtering, any help with this
> appreciated.
> 


I tested bridge filtering, and the BAR is still visible in the guest even if
I change the memory region.

Thanks
Wen Congyang



Re: [Qemu-devel] Why qemu write/rw speed is so low?

2011-09-13 Thread Zhi Yong Wu
Log for bps=((10 * 1024 * 1024)).

test: (g=0): rw=write, bs=512-512/512-512, ioengine=libaio, iodepth=1
Starting 1 process
Jobs: 1 (f=1): [W] [100.0% done] [0K/58K /s] [0/114 iops] [eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=2657
  write: io=51,200KB, bw=58,527B/s, iops=114, runt=895793msec
slat (usec): min=26, max=376K, avg=81.69, stdev=2104.09
clat (usec): min=859, max=757K, avg=8648.07, stdev=8278.64
 lat (usec): min=921, max=1,133K, avg=8730.49, stdev=9239.57
bw (KB/s) : min=0, max=   60, per=101.03%, avg=57.59, stdev= 7.41
  cpu  : usr=0.05%, sys=0.75%, ctx=102611, majf=0, minf=51
  IO depths: 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
 submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
 complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
 issued r/w: total=0/102400, short=0/0
 lat (usec): 1000=0.01%
 lat (msec): 2=0.01%, 4=0.02%, 10=98.99%, 20=0.24%, 50=0.66%
 lat (msec): 100=0.03%, 250=0.01%, 500=0.05%, 1000=0.01%

Run status group 0 (all jobs):
  WRITE: io=51,200KB, aggrb=57KB/s, minb=58KB/s, maxb=58KB/s,
mint=895793msec, maxt=895793msec

Disk stats (read/write):
  dm-0: ios=28/103311, merge=0/0, ticks=1318/950537, in_queue=951852,
util=99.63%, aggrios=28/102932, aggrmerge=0/379,
aggrticks=1316/929743, aggrin_queue=930987, aggrutil=99.60%
vda: ios=28/102932, merge=0/379, ticks=1316/929743,
in_queue=930987, util=99.60%
test: (g=0): rw=write, bs=8K-8K/8K-8K, ioengine=libaio, iodepth=1
Starting 1 process
Jobs: 1 (f=1): [W] [100.0% done] [0K/892K /s] [0/108 iops] [eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=2782
  write: io=51,200KB, bw=926KB/s, iops=115, runt= 55269msec
slat (usec): min=20, max=32,160, avg=66.43, stdev=935.62
clat (msec): min=1, max=157, avg= 8.53, stdev= 2.55
 lat (msec): min=1, max=158, avg= 8.60, stdev= 2.93
bw (KB/s) : min=  539, max=  968, per=100.12%, avg=927.09, stdev=63.89
  cpu  : usr=0.10%, sys=0.47%, ctx=6415, majf=0, minf=26
  IO depths: 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
 submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
 complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
 issued r/w: total=0/6400, short=0/0

 lat (msec): 2=0.06%, 4=0.05%, 10=99.19%, 20=0.06%, 50=0.62%
 lat (msec): 250=0.02%

Run status group 0 (all jobs):
  WRITE: io=51,200KB, aggrb=926KB/s, minb=948KB/s, maxb=948KB/s,
mint=55269msec, maxt=55269msec

Disk stats (read/write):
  dm-0: ios=3/6546, merge=0/0, ticks=117/65262, in_queue=65387,
util=99.58%, aggrios=3/6472, aggrmerge=0/79, aggrticks=117/62063,
aggrin_queue=62178, aggrutil=99.54%
vda: ios=3/6472, merge=0/79, ticks=117/62063, in_queue=62178, util=99.54%
test: (g=0): rw=write, bs=64K-64K/64K-64K, ioengine=libaio, iodepth=1
Starting 1 process
Jobs: 1 (f=1): [W] [100.0% done] [0K/7,332K /s] [0/111 iops] [eta 00m:00s]
test: (g=0): rw=write, bs=64K-64K/64K-64K, ioengine=libaio, iodepth=1
Starting 1 process
Jobs: 1 (f=1): [W] [100.0% done] [0K/7,332K /s] [0/111 iops] [eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=2793
  write: io=51,200KB, bw=7,074KB/s, iops=110, runt=  7238msec
slat (usec): min=23, max=37,715, avg=82.08, stdev=1332.25
clat (msec): min=2, max=34, avg= 8.96, stdev= 1.54
 lat (msec): min=2, max=58, avg= 9.04, stdev= 2.31
bw (KB/s) : min= 6361, max= 7281, per=100.13%, avg=7082.07, stdev=274.31
  cpu  : usr=0.08%, sys=0.53%, ctx=801, majf=0, minf=23
  IO depths: 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
 submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
 complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
 issued r/w: total=0/800, short=0/0

 lat (msec): 4=0.25%, 10=92.12%, 20=7.25%, 50=0.38%

Run status group 0 (all jobs):
  WRITE: io=51,200KB, aggrb=7,073KB/s, minb=7,243KB/s, maxb=7,243KB/s,
mint=7238msec, maxt=7238msec

Disk stats (read/write):
  dm-0: ios=0/811, merge=0/0, ticks=0/8003, in_queue=8003,
util=98.35%, aggrios=0/804, aggrmerge=0/17, aggrticks=0/7319,
aggrin_queue=7319, aggrutil=98.19%
vda: ios=0/804, merge=0/17, ticks=0/7319, in_queue=7319, util=98.19%
test: (g=0): rw=write, bs=128K-128K/128K-128K, ioengine=libaio, iodepth=1
Starting 1 process
Jobs: 1 (f=1): [W] [83.3% done] [0K/10M /s] [0/81 iops] [eta 00m:01s]
test: (groupid=0, jobs=1): err= 0: pid=2800
  write: io=51,200KB, bw=10,113KB/s, iops=79, runt=  5063msec
slat (usec): min=36, max=35,279, avg=130.55, stdev=1761.93
clat (msec): min=3, max=134, avg=12.52, stdev=16.93
 lat (msec): min=3, max=134, avg=12.65, stdev=17.14
bw (KB/s) : min= 7888, max=13128, per=100.41%, avg=10153.00, stdev=1607.48
  cpu  : usr=0.00%, sys=0.51%, ctx=401, majf=0, minf=23
  IO depths: 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
 submit: 0=0.0%, 4=100.0%

Re: [Qemu-devel] [PATCH 2/2] Qemu co-operation with kvm tsc deadline timer

2011-09-13 Thread Liu, Jinsong
Jan Kiszka wrote:
> On 2011-09-13 16:38, Liu, Jinsong wrote:
>> From c1b502d6548fcc41592cd90acc82109ee949df75 Mon Sep 17 00:00:00
>> 2001 
>> From: Liu, Jinsong 
>> Date: Tue, 13 Sep 2011 22:05:30 +0800
>> Subject: [PATCH] Qemu co-operation with kvm tsc deadline timer
>> 
>> KVM adds emulation of the lapic tsc deadline timer for the guest.
>> This patch is the corresponding work on the qemu side.
>> 
>> Signed-off-by: Liu, Jinsong  ---
>>  target-i386/cpu.h |2 ++
>>  target-i386/kvm.c |   14 ++
>>  2 files changed, 16 insertions(+), 0 deletions(-)
>> 
>> diff --git a/target-i386/cpu.h b/target-i386/cpu.h
>> index 935d08a..62ff73c 100644
>> --- a/target-i386/cpu.h
>> +++ b/target-i386/cpu.h
>> @@ -283,6 +283,7 @@
>>  #define MSR_IA32_APICBASE_BSP   (1<<8)
>>  #define MSR_IA32_APICBASE_ENABLE(1<<11)
>>  #define MSR_IA32_APICBASE_BASE  (0xf<<12)
>> +#define MSR_IA32_TSCDEADLINE0x6e0
>> 
>>  #define MSR_MTRRcap 0xfe
>>  #define MSR_MTRRcap_VCNT8
>> @@ -687,6 +688,7 @@ typedef struct CPUX86State {
>>  uint64_t async_pf_en_msr;
>> 
>>  uint64_t tsc;
>> +uint64_t tsc_deadline;
> 
> This field has to be saved/restored for snapshots/migrations.
> 
> Frankly, I've no clue right now if substates are in vogue again (they
> had problems in their binary format) or if you can simply add a
> versioned top-level field and bump the CPUState version number.
> 

Yes, it will be saved/restored. After migration, tsc_deadline will be written 
back to MSR_IA32_TSCDEADLINE to re-arm the tsc timer interrupt.

>> 
>>  uint64_t mcg_status;
>> 
>> diff --git a/target-i386/kvm.c b/target-i386/kvm.c
>> index aa843f0..206fcad 100644
>> --- a/target-i386/kvm.c
>> +++ b/target-i386/kvm.c
>> @@ -59,6 +59,7 @@ const KVMCapabilityInfo
>> kvm_arch_required_capabilities[] = { 
>> 
>>  static bool has_msr_star;
>>  static bool has_msr_hsave_pa;
>> +static bool has_msr_tsc_deadline;
>>  static bool has_msr_async_pf_en;
>>  static int lm_capable_kernel;
>> 
>> @@ -571,6 +572,10 @@ static int kvm_get_supported_msrs(KVMState *s)
>>  has_msr_hsave_pa = true;
>>  continue;
>>  }
>> +if (kvm_msr_list->indices[i] == MSR_IA32_TSCDEADLINE) {
>> +has_msr_tsc_deadline = true;
>> +continue;
>> +}
>>  }
>>  }
>> 
>> @@ -899,6 +904,9 @@ static int kvm_put_msrs(CPUState *env, int level)
>>  if (has_msr_hsave_pa) {
>>  kvm_msr_entry_set(&msrs[n++], MSR_VM_HSAVE_PA, env->vm_hsave);
>>  }
>> +if (has_msr_tsc_deadline) {
>> +kvm_msr_entry_set(&msrs[n++], MSR_IA32_TSCDEADLINE, env->tsc_deadline);
>> +}
>>  #ifdef TARGET_X86_64
>>  if (lm_capable_kernel) {
>>  kvm_msr_entry_set(&msrs[n++], MSR_CSTAR, env->cstar);
>> @@ -1145,6 +1153,9 @@ static int kvm_get_msrs(CPUState *env)
>>  if (has_msr_hsave_pa) {
>>  msrs[n++].index = MSR_VM_HSAVE_PA;
>>  }
>> +if (has_msr_tsc_deadline) {
>> +msrs[n++].index = MSR_IA32_TSCDEADLINE;
>> +}
>> 
>>  if (!env->tsc_valid) {
>>  msrs[n++].index = MSR_IA32_TSC;
>> @@ -1213,6 +1224,9 @@ static int kvm_get_msrs(CPUState *env)
>>  case MSR_IA32_TSC: env->tsc = msrs[i].data;
>>  break;
>> +case MSR_IA32_TSCDEADLINE:
>> +env->tsc_deadline = msrs[i].data;
>> +break;
>>  case MSR_VM_HSAVE_PA:
>>  env->vm_hsave = msrs[i].data;
>>  break;
> 
> Just to double check: This feature is exposed to the guest when A) the
> host CPU supports it and B) QEMU passed down guest CPU specifications
> (cpuid data) that allow it as well?
> 
> Jan

Yes.

Thanks,
Jinsong




[Qemu-devel] [PATCH] hid: vmstate fix

2011-09-13 Thread TeLeMan
The commit "usb/hid: add hid_pointer_activate, use it" used
HIDMouseState.mouse_grabbed in hid_pointer_activate(), so
mouse_grabbed should be added to the vmstate.

Signed-off-by: TeLeMan 
---
 hw/hid.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/hw/hid.c b/hw/hid.c
index c608400..72b861f 100644
--- a/hw/hid.c
+++ b/hw/hid.c
@@ -433,7 +433,7 @@ static const VMStateDescription vmstate_hid_ptr_queue = {

 const VMStateDescription vmstate_hid_ptr_device = {
 .name = "HIDPointerDevice",
-.version_id = 1,
+.version_id = 2,
 .minimum_version_id = 1,
 .post_load = hid_post_load,
 .fields = (VMStateField[]) {
@@ -443,6 +443,7 @@ const VMStateDescription vmstate_hid_ptr_device = {
 VMSTATE_UINT32(n, HIDState),
 VMSTATE_INT32(protocol, HIDState),
 VMSTATE_UINT8(idle, HIDState),
+VMSTATE_INT32_V(ptr.mouse_grabbed, HIDState, 2),
 VMSTATE_END_OF_LIST(),
 }
 };
-- 
1.7.6.msysgit.0



Re: [Qemu-devel] [PATCH 4/9] runstate_set(): Check for valid transitions

2011-09-13 Thread Luiz Capitulino
On Fri,  9 Sep 2011 17:25:41 -0300
Luiz Capitulino  wrote:

> This commit could have been folded with the previous one, however
> doing it separately will allow for easy bisect and revert if needed.
> 
> Checking and testing all valid transitions wasn't trivial, chances
> are this will need broader testing to become more stable.
> 
> This is a transition table as suggested by Lluís Vilanova.
> 
> Signed-off-by: Luiz Capitulino 

Would be nice to get a reviewed-by for this patch, as it wasn't trivial
to get it right and tested...

> ---
>  sysemu.h |1 +
>  vl.c |   74 
> +-
>  2 files changed, 74 insertions(+), 1 deletions(-)
> 
> diff --git a/sysemu.h b/sysemu.h
> index 19088aa..a01ddac 100644
> --- a/sysemu.h
> +++ b/sysemu.h
> @@ -36,6 +36,7 @@ extern uint8_t qemu_uuid[];
>  int qemu_uuid_parse(const char *str, uint8_t *uuid);
>  #define UUID_FMT 
> "%02hhx%02hhx%02hhx%02hhx-%02hhx%02hhx-%02hhx%02hhx-%02hhx%02hhx-%02hhx%02hhx%02hhx%02hhx%02hhx%02hhx"
>  
> +void runstate_init(void);
>  bool runstate_check(RunState state);
>  void runstate_set(RunState new_state);
>  typedef struct vm_change_state_entry VMChangeStateEntry;
> diff --git a/vl.c b/vl.c
> index 9926d2a..4a8edc7 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -327,14 +327,84 @@ static int default_driver_check(QemuOpts *opts, void *opaque)
>  
>  static RunState current_run_state = RSTATE_NO_STATE;
>  
> +typedef struct {
> +RunState from;
> +RunState to;
> +} RunStateTransition;
> +
> +static const RunStateTransition runstate_transitions_def[] = {
> +/* from  -> to  */
> +{ RSTATE_NO_STATE, RSTATE_RUNNING },
> +{ RSTATE_NO_STATE, RSTATE_IN_MIGRATE },
> +{ RSTATE_NO_STATE, RSTATE_PRE_LAUNCH },
> +
> +{ RSTATE_DEBUG, RSTATE_RUNNING },
> +
> +{ RSTATE_IN_MIGRATE, RSTATE_RUNNING },
> +{ RSTATE_IN_MIGRATE, RSTATE_PRE_LAUNCH },
> +
> +{ RSTATE_PANICKED, RSTATE_PAUSED },
> +
> +{ RSTATE_IO_ERROR, RSTATE_RUNNING },
> +
> +{ RSTATE_PAUSED, RSTATE_RUNNING },
> +
> +{ RSTATE_POST_MIGRATE, RSTATE_RUNNING },
> +
> +{ RSTATE_PRE_LAUNCH, RSTATE_RUNNING },
> +{ RSTATE_PRE_LAUNCH, RSTATE_POST_MIGRATE },
> +
> +{ RSTATE_PRE_MIGRATE, RSTATE_RUNNING },
> +{ RSTATE_PRE_MIGRATE, RSTATE_POST_MIGRATE },
> +
> +{ RSTATE_RESTORE, RSTATE_RUNNING },
> +
> +{ RSTATE_RUNNING, RSTATE_DEBUG },
> +{ RSTATE_RUNNING, RSTATE_PANICKED },
> +{ RSTATE_RUNNING, RSTATE_IO_ERROR },
> +{ RSTATE_RUNNING, RSTATE_PAUSED },
> +{ RSTATE_RUNNING, RSTATE_PRE_MIGRATE },
> +{ RSTATE_RUNNING, RSTATE_RESTORE },
> +{ RSTATE_RUNNING, RSTATE_SAVEVM },
> +{ RSTATE_RUNNING, RSTATE_SHUTDOWN },
> +{ RSTATE_RUNNING, RSTATE_WATCHDOG },
> +
> +{ RSTATE_SAVEVM, RSTATE_RUNNING },
> +
> +{ RSTATE_SHUTDOWN, RSTATE_PAUSED },
> +
> +{ RSTATE_WATCHDOG, RSTATE_RUNNING },
> +
> +{ RSTATE_MAX, RSTATE_MAX },
> +};
> +
> +static bool runstate_valid_transitions[RSTATE_MAX][RSTATE_MAX];
> +
>  bool runstate_check(RunState state)
>  {
>  return current_run_state == state;
>  }
>  
> +void runstate_init(void)
> +{
> +const RunStateTransition *p;
> +
> +memset(&runstate_valid_transitions, 0, sizeof(runstate_valid_transitions));
> +
> +for (p = &runstate_transitions_def[0]; p->from != RSTATE_MAX; p++) {
> +runstate_valid_transitions[p->from][p->to] = true;
> +}
> +}
> +
> +/* This function will abort() on invalid state transitions */
>  void runstate_set(RunState new_state)
>  {
> -assert(new_state < RSTATE_MAX);
> +if (new_state >= RSTATE_MAX ||
> +!runstate_valid_transitions[current_run_state][new_state]) {
> +fprintf(stderr, "invalid runstate transition\n");
> +abort();
> +}
> +
>  current_run_state = new_state;
>  }
>  
> @@ -2218,6 +2288,8 @@ int main(int argc, char **argv, char **envp)
>  
>  g_mem_set_vtable(&mem_trace);
>  
> +runstate_init();
> +
>  init_clocks();
>  
>  qemu_cache_utils_init(envp);