date:20120221

Re: [Qemu-devel] [PATCHv2-RFC 2/2] pci: add standard bridge device

2012-02-21 Thread Gerd Hoffmann

  Hi,

>> This seems to not support 64bit prefetchable memory windows, at least
>> linux doesn't think it does, lspci looks like this:
>>
>> 00:10.0 PCI bridge: Red Hat, Inc. Device 0001 (prog-if 00 [Normal decode])
[ ... ]
>> Memory behind bridge: f600-f60f
>> Prefetchable memory behind bridge: f800-fbff

> What in the above tells you that 64 bit windows are not supported?

lspci prints 64bit addresses then, like this:

00:1c.0 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express
Port 1 (rev 03) (prog-if 00 [Normal decode])
[ ... ]
I/O behind bridge: 8000-8fff
Memory behind bridge: c000-c01f
Prefetchable memory behind bridge: c020-c03f

> BAR0 is 32 bit non prefetcheable. As far as I can see linux driver
> takes no precautions against access combining that can
> happen with prefetcheable BARs, so non-prefetcheable
> seems safer.

I'm not talking about bar #0, but about the prefetchable memory window
for devices behind the bridge.

cheers,
  Gerd

Re: [Qemu-devel] [RFC 5/7] qxl-render: call ppm_save on callback

2012-02-21 Thread Alon Levy

On Mon, Feb 20, 2012 at 02:29:11PM -0700, Eric Blake wrote:
> On 02/20/2012 04:32 AM, Gerd Hoffmann wrote:
> > Hmm, that is pretty lame.  There are users like autotest which expect
> > the screen dump being there when the monitor command is finished, that
> > change will break them.
> 
> Libvirt is another such user.
> 
> > 
> > Unfortunaly there is no easy way out.  I think the options are:
> > 
> >  (1) Keep existing behavior.  That means the screenshot might show old
> >  screen content.  Not very nice too.  Would work sort-of ok for
> >  autotest though as autotest does screenshots every second and thus
> >  the screen content wouldn't be older than a second.
> > 
> >  (2) Async monitor command.  Keeps interface and works nicely.  A bunch
> >  of QAPI bits tickled into master meanwhile, so we could look at
> >  this again.  Luiz?  What is the status here?
> > 
> >  (3) Something like this patch + additionally introduce a
> >  "your-screenshot-is-finished-now" qmp event.  Will break existing
> >  users too.  But at least they can be adapted without requiring
> >  some external, nonportable service like inotify ...
> 
> Libvirt would want 3) - any command that becomes async also needs an
> event to tell us when the command is completed, so that libvirt can
> maintain the synchronous interface to the user (and/or expose a new flag
> to allow the user to also benefit from the asynchronous command).

If I do 2) then libvirt won't notice because the monitor command will
block as usual. Only change would be internal, qemu would continue
processing other fds in the interim.

> 
> -- 
> Eric Blake   ebl...@redhat.com+1-919-301-3266
> Libvirt virtualization library http://libvirt.org
>

Re: [Qemu-devel] [RFC 7/7] qxl: add allocator

2012-02-21 Thread Alon Levy

On Tue, Feb 21, 2012 at 08:57:54AM +0100, Gerd Hoffmann wrote:
>   Hi,
> 
> > TBH I'm having problems without doing it with my own Allocator. In
> > particular calling ppm_save from spice_server thread seems to be a
> > problem.
> 
> Grabbed the qemu mutex?

Didn't we remove qemu mutex grabbing from the spice server thread a long
long time ago? I've switched to using a bh, it seems to work better but
still not perfectly. I've also found my notes on why I tried the
allocator approach in the first place (all the following is in
QXL_NATIVE mode):

 Boot with qxl only, no vnc or sdl (so no qxl_render_update users except
 screendump).

 Issue screendump.

 Right now qxl_render_update checks if the displaysurface buffer is not
 shared, meaning it was allocated by qemu, and in this case it replaces
 it with the flipped buffer.

 But right after that surface->data gets reset, by vga_hw_screen_dump:

vga_hw_screen_dump/console_select/qemu_resize_displaysurface/ds->allocator->resize_displaysurface/defaultallocator_resize_displaysurface/qemu_alloc_display

 Hence my line of thought that replacing the allocator with my own would
 prevent this. Since you have misgivings about using our own allocator
 that I don't know how to resolve, I'm instead doing a second
 reallocation in our dpy_resize callback qxl.c:display_resize, in affect
 it means that we have three allocations and three deallocations for every
 screendump. Do you still think it's less ugly then an allocator? note
 that I have sdl and vnc working with spice with my allocator scheme.
 (just didn't test all three together yet).

> 
> >> BTW: qxl insisting on a shared displaysurface isn't very clean too, it
> >> better should be able to fallback to just copying/converting for the
> >> non-shared case.
> > 
> > I'll try to get that working, although it seems to require having some
> > timer/bh to do the ppm_save.
> 
> bh should do, that is one of the things they are supposed to handle.
> 
> cheers,
>   Gerd
> 
>

Re: [Qemu-devel] [PATCH] Use DMADirection type for dma_bdrv_io

2012-02-21 Thread Paolo Bonzini

On 02/20/2012 11:50 AM, Alexander Graf wrote:
>> > DMAAIOCB *dbs = qemu_aio_get(&dma_aio_pool, bs, cb, opaque);
>> > 
>> > -trace_dma_bdrv_io(dbs, bs, sector_num, to_dev);
>> > +trace_dma_bdrv_io(dbs, bs, sector_num, dir);
> Was the trace wrong before or is it now? I don't see its definition changed 
> anywhere.

Not sure what you mean. :)

Paolo

[Qemu-devel] BlockDriverState stack and BlockListeners (was: [RFC] Replication agent design)

2012-02-21 Thread Kevin Wolf

Am 20.02.2012 15:32, schrieb Paolo Bonzini:
> On 02/19/2012 02:40 PM, Ori Mamluk wrote:
>>
>> I think it might be better to go back to my original less generic design.
>> We can regard it as a 'plugin' for a specific application - in this
>> case, replication.
>> I can add a plugin interface in the generic block layer that allows
>> building a proper storage stack.
>> The plugin will have capabilities like a filter driver - getting hold of
>> the request on its way down (from VM to storage) and on its way up (IO
>> completion), allowing to block or stall both.
> 
> I and Stefan talked about this recently... we called it a BlockListener.
>  It seems like a good idea, and probably copy-on-read should be
> converted in due to time to a BlockListener, too.

After thinking a bit about it, I tend to agree. However, I wouldn't call
it a BlockListener because it could do much more than just observing
requests, it can modify them. Basically it would take a request and do
anything with it. It could enqueue the request and do nothing for the
moment (I/O throttling), it could use a different buffer and do copy on
read, it could mirror writes, etc.

So let's check which features could make use of it:

- Copy on read
- I/O throttling
- blkmirror for precopy storage migration
- replication agent
- Old style block migration (btw, we should deprecate this)
- Maybe even bdrv_check_request and high watermark? However, they are
  not optional, so probably makes less sense.

I think these are enough cases to justify it. Now, which operations do
we need to intercept?

- bdrv_co_read
- bdrv_co_write
- bdrv_drain (btw, we need a version for only one BDS)
- Probably bdrv_co_discard as well

Anything I missed? Now the interesting question that comes to mind is:
What is really the difference between the proposed BlockListener and a
BlockDriver? Sure, a listener would implement much less functionality,
but we also have BlockDrivers today that implement very few of the
callbacks.

A bdrv_drain callback doesn't exist yet in BlockDrivers, but I consider
this a bug (qemu_aio_flush() is really the implementation for raw-posix
and possibly some network protocols), so we should just add this to
BlockDriver.

The main difference that I see is that the listeners stay always on top.
For example, let's assume that if implemented a blkmirror driver in
today's infrastructure, you would get a BlockDriverState stack like
blkmirror -> qcow2 -> file. If you take a live snapshot now, you don't
want to have the blkmirror applied to the old top-level image, which is
now a read-only backing file. Instead, it should move to the new
top-level image. I believe this is similar with I/O throttling, to some
degree with copy on read, etc.

So maybe we just need to extend the current BlockDriverState stack to
distinguish "normal" and "always on top" BlockDrivers, where the latter
would roughly correspond to BlockListeners?

Kevin

Re: [Qemu-devel] [PATCH] Use DMADirection type for dma_bdrv_io

2012-02-21 Thread Kevin Wolf

Am 21.02.2012 09:35, schrieb Paolo Bonzini:
> On 02/20/2012 11:50 AM, Alexander Graf wrote:
 DMAAIOCB *dbs = qemu_aio_get(&dma_aio_pool, bs, cb, opaque);

 -trace_dma_bdrv_io(dbs, bs, sector_num, to_dev);
 +trace_dma_bdrv_io(dbs, bs, sector_num, dir);
>> Was the trace wrong before or is it now? I don't see its definition changed 
>> anywhere.
> 
> Not sure what you mean. :)

trace-events:

dma_bdrv_io(void *dbs, void *bs, int64_t sector_num, bool to_dev)
"dbs=%p bs=%p sector_num=%" PRId64 " to_dev=%d"

to_dev is declared bool here, and it should also be renamed to dir (the
unfortunate thing about DMADirection is that it swaps 0 and 1 compared
to bool to_dev... We need to check carefully that all occurrences have
been caught.)

Kevin

Re: [Qemu-devel] [RFC PATCH] arm boot: added QOM device definition

2012-02-21 Thread Peter Crosthwaite

On Tue, Feb 21, 2012 at 5:56 AM, Peter Maydell  wrote:
> On 20 February 2012 19:51, Andreas Färber  wrote:
>> Am 20.02.2012 20:43, schrieb Peter Maydell:
>>> I don't particularly care how we QOMify arm_boot (it's not exactly at
>>> the top of my priority list demanding attention), I do care that (a)
>>> we have a sensible user-facing interface [ie command line options] and
>>> (b) vl.c can usefully just pass the information from those options
>>> straight to the boot loader code.

So when I trialled this patch I used the -device argument to
instantiate the bootloader on the command line:

qemu-system-arm -M versatilepb --device
arm_linux_loader,kernel_filename=kernel.foo,initrd=image.bar...

With this approach I was able to add command line args (if you will)
to arm_boot without touching vl.c, qemu_opts or any of the arm machine
models. With changing only arm_boot.c you would get your dtb argument:

qemu-system-arm -M versatilepb --device
arm_linux_loader,kernel_filename=kernel.foo,initrd=image.bar,-dtb=blah.dtb

Paul and Andreas reached the conclusion (correct me if I have
misunderstood) that the machine model should have still have to
explictly  instantiate the bootloader "device" (whether that be a QOM
device or arm_load_kernel() in its present form), but with the QOM
approach apparently --property arguments should allow you to
parameterise the bootloader directly, without vl.c or qemu-opts having
to know about the arguments at all. Unfortunately --property is a long
way from working so we are stuck with the status quo or your
machine-opts approach until that happens.

>>
>> A QOM'ified arm_boot could get a "virtual" callback method to check the
>> QemuOpts command line args. That way derived classes can decide what
>> additional options to accept.
>>
>> An alternative would be to expect QOM properties of the same name as the
>> command line parameters and fail if there isn't one of that name.
>
> You'd also need vl.c to be able to say "find me the first boot loader
> of any kind in the qom tree" so it could set the properties on the
> right thing.
>

> But really qomifying arm_boot doesn't excite me very much -- it
> doesn't really get us anything useful in the way that qomifying
> SysBus, for instance, does. So my preference is not to have to
> tie device tree support to it.
>

> -- PMM

Re: [Qemu-devel] BlockDriverState stack and BlockListeners

2012-02-21 Thread Paolo Bonzini

On 02/21/2012 10:03 AM, Kevin Wolf wrote:
> So let's check which features could make use of it:
> 
> - Copy on read
> - I/O throttling
> - blkmirror for precopy storage migration
> - replication agent
> - Old style block migratiom

More precisely, dirty bitmap handling.

> (btw, we should deprecate this)

Yeah, but we need blkmirror to provide an alternative.

> - Maybe even bdrv_check_request and high watermark? However, they are
>   not optional, so probably makes less sense.
> 
> I think these are enough cases to justify it. Now, which operations do
> we need to intercept?
> 
> - bdrv_co_read
> - bdrv_co_write
> - bdrv_drain (btw, we need a version for only one BDS)
> - Probably bdrv_co_discard as well

Yes.

> Anything I missed?

bdrv_co_flush, I think.

> Now the interesting question that comes to mind is:
> What is really the difference between the proposed BlockListener and a
> BlockDriver? Sure, a listener would implement much less functionality,
> but we also have BlockDrivers today that implement very few of the
> callbacks.

The two differences that come to mind are:

1) BlockListener would be by-design coroutine based; I know we disagree
on this (you want to change raw to coroutines long term; I would like to
reintroduce some AIO fastpaths when there are no active listeners).

2) BlockListener would be entirely an implementation detail, used in the
implementation of other commands.

Third, perhaps the interface to BlockListener could be
bdrv_before/after_read.  Cannot really say without writing one or two
BlockListeners whether this would be useful or a limitation.

> The main difference that I see is that the listeners stay always on top.
> For example, let's assume that if implemented a blkmirror driver in
> today's infrastructure, you would get a BlockDriverState stack like
> blkmirror -> qcow2 -> file. If you take a live snapshot now, you don't
> want to have the blkmirror applied to the old top-level image, which is
> now a read-only backing file. Instead, it should move to the new
> top-level image.

Yes, that's because a BlockListener always applies to a
BlockDriverState, and live snapshots close+reopen the BDS but do not
delete/recreate it.

> So maybe we just need to extend the current BlockDriverState stack to
> distinguish "normal" and "always on top" BlockDrivers, where the latter
> would roughly correspond to BlockListeners?

I would prefer having separate objects.  Even if you do not count fields
related to throttling or copy-on-read or other tasks in the list above,
there are many fields in BDS that do not really apply to BlockListeners.
 Backing files, device ops, encryption, even size.  Having extra methods
is not a big problem, but unwanted data items smell...

Paolo

Re: [Qemu-devel] [RFC 7/7] qxl: add allocator

2012-02-21 Thread Gerd Hoffmann

  Hi,

>  Right now qxl_render_update checks if the displaysurface buffer is not
>  shared, meaning it was allocated by qemu, and in this case it replaces
>  it with the flipped buffer.

I think we should first reqire spice-server 0.8.latest, so
update_area_complete is available unconditionally.  Then do any
displaysurface updates in that callback (or a bh kicked by that
callback).  Handle both shared & non-shared cases.  I think we also can
get rid of the flip buffer then and just use a non-shared displaysurface
in that case (and flip the upside-down qxl surface while copying to the
qemu displaysurface).

>  But right after that surface->data gets reset, by vga_hw_screen_dump:
>  
> vga_hw_screen_dump/console_select/qemu_resize_displaysurface/ds->allocator->resize_displaysurface/defaultallocator_resize_displaysurface/qemu_alloc_display
> 
>  Hence my line of thought that replacing the allocator with my own would
>  prevent this. Since you have misgivings about using our own allocator
>  that I don't know how to resolve, I'm instead doing a second
>  reallocation in our dpy_resize callback qxl.c:display_resize, in affect
>  it means that we have three allocations and three deallocations for every
>  screendump. Do you still think it's less ugly then an allocator? note
>  that I have sdl and vnc working with spice with my allocator scheme.
>  (just didn't test all three together yet).

IMHO that calls for a patch like this to get rid of the pointless
console_select() call:

--- a/console.c
+++ b/console.c
@@ -181,12 +181,14 @@ void vga_hw_screen_dump(const char *filename)

 /* There is currently no way of specifying which screen we want to
dump,
so always dump the first one.  */
-console_select(0);
+if (previous_active_console && previous_active_console->index != 0) {
+console_select(0);
+}
 if (consoles[0] && consoles[0]->hw_screen_dump) {
 consoles[0]->hw_screen_dump(consoles[0]->hw, filename);
 }

-if (previous_active_console) {
+if (previous_active_console && previous_active_console->index != 0) {
 console_select(previous_active_console->index);
 }
 }


cheers,
  Gerd

Re: [Qemu-devel] [PATCH 4/4] arm: add device tree support

2012-02-21 Thread Peter Crosthwaite

On Tue, Feb 21, 2012 at 5:47 AM, Peter Maydell  wrote:
> Ping re patch 3 and 4 here -- I know there was some discussion under the
> thread on Peter Crosthwaite's 'qomify arm_boot' patch series, but it's a
> bit hard to disentangle from the comments on that patch series.

Conclusion reached over there was the yet to be implemented --property
command line argument should work for this. But blocking dtb support
while waiting for this seems excessive to me. Can we push this
regardless of the issues identifed in that thread considering the the
groundwork for the ideal solution is yet to be implemented?

> Specifically, does anybody (a) dislike the user-facing command line
> interface here or (b) otherwise think it's misguided? I'm not a huge
> fan of the -machine options (they seem to be a bit of a grab-bag of
> stuff) but I can live with them as a mechanism for passing the data
> to the right place...
>
> thanks
> -- PMM
>
> On 8 February 2012 05:41, Peter Maydell  wrote:
>> From: Grant Likely 
>>
>> If compiled with CONFIG_FDT, allow user to specify a device tree file using
>> the -dtb argument.  If the machine supports it then the dtb will be loaded
>> into memory and passed to the kernel on boot.
>>
>> Signed-off-by: Jeremy Kerr 
>> Signed-off-by: Grant Likely 
>> [Peter Maydell: Use machine opt rather than global to pass dtb filename]
>> Signed-off-by: Peter Maydell 
>> ---
>>  Makefile.target |    1 +
>>  configure       |    1 +
>>  hw/arm-misc.h   |    1 +
>>  hw/arm_boot.c   |   96 
>> +++---
>>  qemu-config.c   |    4 ++
>>  qemu-options.hx |    9 +
>>  vl.c            |    9 +
>>  7 files changed, 115 insertions(+), 6 deletions(-)
>>
>> diff --git a/Makefile.target b/Makefile.target
>> index 68481a3..5e465ec 100644
>> --- a/Makefile.target
>> +++ b/Makefile.target
>> @@ -363,6 +363,7 @@ obj-arm-y += vexpress.o
>>  obj-arm-y += strongarm.o
>>  obj-arm-y += collie.o
>>  obj-arm-y += pl041.o lm4549.o
>> +obj-arm-$(CONFIG_FDT) += device_tree.o
>>
>>  obj-sh4-y = shix.o r2d.o sh7750.o sh7750_regnames.o tc58128.o
>>  obj-sh4-y += sh_timer.o sh_serial.o sh_intc.o sh_pci.o sm501.o
>> diff --git a/configure b/configure
>> index 763db24..1d8b3ac 100755
>> --- a/configure
>> +++ b/configure
>> @@ -3458,6 +3458,7 @@ case "$target_arch2" in
>>     gdb_xml_files="arm-core.xml arm-vfp.xml arm-vfp3.xml arm-neon.xml"
>>     target_phys_bits=32
>>     target_llong_alignment=4
>> +    target_libs_softmmu="$fdt_libs"
>>   ;;
>>   cris)
>>     target_nptl="yes"
>> diff --git a/hw/arm-misc.h b/hw/arm-misc.h
>> index 5e5204b..4b55fb8 100644
>> --- a/hw/arm-misc.h
>> +++ b/hw/arm-misc.h
>> @@ -29,6 +29,7 @@ struct arm_boot_info {
>>     const char *kernel_filename;
>>     const char *kernel_cmdline;
>>     const char *initrd_filename;
>> +    const char *dtb_filename;
>>     target_phys_addr_t loader_start;
>>     /* multicore boards that use the default secondary core boot functions
>>      * need to put the address of the secondary boot code, the boot reg,
>> diff --git a/hw/arm_boot.c b/hw/arm_boot.c
>> index 5f163fd..769fe2e 100644
>> --- a/hw/arm_boot.c
>> +++ b/hw/arm_boot.c
>> @@ -7,11 +7,14 @@
>>  * This code is licensed under the GPL.
>>  */
>>
>> +#include "config.h"
>>  #include "hw.h"
>>  #include "arm-misc.h"
>>  #include "sysemu.h"
>> +#include "boards.h"
>>  #include "loader.h"
>>  #include "elf.h"
>> +#include "device_tree.h"
>>
>>  #define KERNEL_ARGS_ADDR 0x100
>>  #define KERNEL_LOAD_ADDR 0x0001
>> @@ -207,6 +210,67 @@ static void set_kernel_args_old(const struct 
>> arm_boot_info *info,
>>     }
>>  }
>>
>> +static int load_dtb(target_phys_addr_t addr, const struct arm_boot_info 
>> *binfo)
>> +{
>> +#ifdef CONFIG_FDT
>> +    uint32_t mem_reg_property[] = { cpu_to_be32(binfo->loader_start),
>> +                                    cpu_to_be32(binfo->ram_size) };
>> +    void *fdt = NULL;
>> +    char *filename;
>> +    int size, rc;
>> +
>> +    filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, binfo->dtb_filename);
>> +    if (!filename) {
>> +        fprintf(stderr, "Couldn't open dtb file %s\n", binfo->dtb_filename);
>> +        return -1;
>> +    }
>> +
>> +    fdt = load_device_tree(filename, &size);
>> +    if (!fdt) {
>> +        fprintf(stderr, "Couldn't open dtb file %s\n", filename);
>> +        g_free(filename);
>> +        return -1;
>> +    }
>> +    g_free(filename);
>> +
>> +    rc = qemu_devtree_setprop(fdt, "/memory", "reg", mem_reg_property,
>> +                               sizeof(mem_reg_property));
>> +    if (rc < 0) {
>> +        fprintf(stderr, "couldn't set /memory/reg\n");
>> +    }
>> +
>> +    rc = qemu_devtree_setprop_string(fdt, "/chosen", "bootargs",
>> +                                      binfo->kernel_cmdline);
>> +    if (rc < 0) {
>> +        fprintf(stderr, "couldn't set /chosen/bootargs\n");
>> +    }
>> +
>> +    if (binfo->initrd_size) {
>> +        rc = qemu_devtree_setprop_cell(fdt, "/chosen", "linux,initr

Re: [Qemu-devel] [PATCH V7 11/11] xen passthrough: clean up MSI-X table handling

2012-02-21 Thread Jan Beulich

>>> On 17.02.12 at 18:08, Anthony PERARD  wrote:

Wouldn't thus much better be merged into the prior patch(es)? After
all you're not trying to reconstruct the Xen-side history of this code
anyway.

Jan

> From: Shan Haitao 
> 
> This patch does cleaning up of QEMU MSI handling. The fixes are:
> 1. Changes made to MSI-X table mapping handling to eliminate the small
> windows in which guest could have access to physical MSI-X table.
> 2. MSI-X table is mapped as read-only to QEMU, as masking of MSI-X is
> already in Xen now.
> 3. For registers that coexists inside the MSI-X table (this could be
> only PBA I think), value read from physical page would be returned.
> 
> Signed-off-by:  Shan Haitao 
> 
> Consolidated duplicate code into _pt_iomem_helper(). Fixed formatting.
> 
> Signed-off-by: Jan Beulich 
> Signed-off-by: Anthony PERARD 
> ---
>  hw/xen_pci_passthrough.c |   80 
> --
>  hw/xen_pci_passthrough.h |   11 +-
>  hw/xen_pci_passthrough_msi.c |   62 
>  3 files changed, 78 insertions(+), 75 deletions(-)
> 
> diff --git a/hw/xen_pci_passthrough.c b/hw/xen_pci_passthrough.c
> index 6ea2c45..fab9fbe 100644
> --- a/hw/xen_pci_passthrough.c
> +++ b/hw/xen_pci_passthrough.c
> @@ -356,6 +356,53 @@ out:
>  }
>  }
>  
> +static int pt_iomem_helper(XenPCIPassthroughState *s, int i,
> +   pcibus_t e_base, pcibus_t e_size, int op)
> +{
> +uint64_t msix_last_pfn = 0;
> +pcibus_t bar_last_pfn = 0;
> +pcibus_t msix_table_size = 0;
> +XenPTMSIX *msix = s->msix;
> +int rc = 0;
> +
> +if (!pt_has_msix_mapping(s, i)) {
> +return xc_domain_memory_mapping(xen_xc, xen_domid,
> +PFN(e_base),
> +PFN(s->bases[i].access.maddr),
> +PFN(e_size + XC_PAGE_SIZE - 1),
> +op);
> +
> +}
> +
> +if (msix->table_off) {
> +rc = xc_domain_memory_mapping(xen_xc, xen_domid,
> +  PFN(e_base),
> +  PFN(s->bases[i].access.maddr),
> +  PFN(msix->mmio_base_addr) - 
> PFN(e_base),
> +  op);
> +}
> +
> +msix_table_size = msix->total_entries * PCI_MSIX_ENTRY_SIZE;
> +msix_last_pfn = PFN(msix->mmio_base_addr + msix_table_size - 1);
> +bar_last_pfn = PFN(e_base + e_size - 1);
> +
> +if (rc == 0 && msix_last_pfn != bar_last_pfn) {
> +assert(msix_last_pfn < bar_last_pfn);
> +
> +rc = xc_domain_memory_mapping(xen_xc, xen_domid,
> +  msix_last_pfn + 1,
> +  PFN(s->bases[i].access.maddr
> +  + msix->table_off
> +  + msix_table_size
> +  + XC_PAGE_SIZE - 1),
> +  bar_last_pfn - msix_last_pfn,
> +  op);
> +}
> +
> +return rc;
> +}
> +
> +
>  /* ioport/iomem space*/
>  static void pt_iomem_map(XenPCIPassthroughState *s, int i,
>   pcibus_t e_phys, pcibus_t e_size)
> @@ -376,13 +423,10 @@ static void pt_iomem_map(XenPCIPassthroughState *s, int 
> i,
>  }
>  
>  if (!first_map && old_ebase != PT_PCI_BAR_UNMAPPED) {
> -pt_add_msix_mapping(s, i);
> -/* Remove old mapping */
> -rc = xc_domain_memory_mapping(xen_xc, xen_domid,
> -   old_ebase >> XC_PAGE_SHIFT,
> -   s->bases[i].access.maddr >> XC_PAGE_SHIFT,
> -   (e_size + XC_PAGE_SIZE - 1) >> XC_PAGE_SHIFT,
> -   DPCI_REMOVE_MAPPING);
> +if (pt_has_msix_mapping(s, i)) {
> +memory_region_set_enabled(&s->msix->mmio, false);
> +}
> +rc = pt_iomem_helper(s, i, old_ebase, e_size, DPCI_REMOVE_MAPPING);
>  if (rc) {
>  PT_ERR(&s->dev, "remove old mapping failed! (rc: %i)\n", rc);
>  return;
> @@ -391,21 +435,19 @@ static void pt_iomem_map(XenPCIPassthroughState *s, int 
> i,
>  
>  /* map only valid guest address */
>  if (e_phys != PCI_BAR_UNMAPPED) {
> -/* Create new mapping */
> -rc = xc_domain_memory_mapping(xen_xc, xen_domid,
> -   s->bases[i].e_physbase >> XC_PAGE_SHIFT,
> -   s->bases[i].access.maddr >> XC_PAGE_SHIFT,
> -   (e_size+XC_PAGE_SIZE-1) >> XC_PAGE_SHIFT,
> -   DPCI_ADD_MAPPING);
> +if (pt_has_msix_mapping(s, i)) {
> +XenPTMSIX *msix = s->msix;
>  
> -if (rc) {
> -PT_ERR(&s->dev, "create new mapping failed! (rc: %i)\n", rc);

Re: [Qemu-devel] BlockDriverState stack and BlockListeners

2012-02-21 Thread Kevin Wolf

Am 21.02.2012 10:15, schrieb Paolo Bonzini:
> On 02/21/2012 10:03 AM, Kevin Wolf wrote:
>> So let's check which features could make use of it:
>>
>> - Copy on read
>> - I/O throttling
>> - blkmirror for precopy storage migration
>> - replication agent
>> - Old style block migratiom
> 
> More precisely, dirty bitmap handling.

Yes, but is it used anywhere else?

>> (btw, we should deprecate this)
> 
> Yeah, but we need blkmirror to provide an alternative.

Does block migration even work meanwhile without corrupting things left
and right?

>> - Maybe even bdrv_check_request and high watermark? However, they are
>>   not optional, so probably makes less sense.
>>
>> I think these are enough cases to justify it. Now, which operations do
>> we need to intercept?
>>
>> - bdrv_co_read
>> - bdrv_co_write
>> - bdrv_drain (btw, we need a version for only one BDS)
>> - Probably bdrv_co_discard as well
> 
> Yes.
> 
>> Anything I missed?
> 
> bdrv_co_flush, I think.

Right, we'll need bdrv_co_flush as well.

>> Now the interesting question that comes to mind is:
>> What is really the difference between the proposed BlockListener and a
>> BlockDriver? Sure, a listener would implement much less functionality,
>> but we also have BlockDrivers today that implement very few of the
>> callbacks.
> 
> The two differences that come to mind are:
> 
> 1) BlockListener would be by-design coroutine based; I know we disagree
> on this (you want to change raw to coroutines long term; I would like to
> reintroduce some AIO fastpaths when there are no active listeners).

I can't see a technical reason why a BlockListener could not be callback
based. The only reason might be a "there are only coroutines" policy.

But even then, just don't implement bdrv_aio_* like all other
coroutine-based block drivers.

> 2) BlockListener would be entirely an implementation detail, used in the
> implementation of other commands.

Depending on what you mean by command (presumably QMP commands?), I
think I disagree. Management tools will want to start a VM with
BlockListeners already applied (which would be possible via -blockdev).

And on the other hand, protocols like file are entirely implementation
details as well, and still they are BlockDrivers.

> Third, perhaps the interface to BlockListener could be
> bdrv_before/after_read.  Cannot really say without writing one or two
> BlockListeners whether this would be useful or a limitation.

/* bdrv_before code here */
ret = bdrv_co_read(bs->file, ...);
/* bdrv_after code here */

Should be pretty much equivalent, no?

>> The main difference that I see is that the listeners stay always on top.
>> For example, let's assume that if implemented a blkmirror driver in
>> today's infrastructure, you would get a BlockDriverState stack like
>> blkmirror -> qcow2 -> file. If you take a live snapshot now, you don't
>> want to have the blkmirror applied to the old top-level image, which is
>> now a read-only backing file. Instead, it should move to the new
>> top-level image.
> 
> Yes, that's because a BlockListener always applies to a
> BlockDriverState, and live snapshots close+reopen the BDS but do not
> delete/recreate it.

Hm, that's an interesting angle to look at it. The reasoning makes sense
to me (though I would reword it as a BlockListener belongs to a
drive/block device rather than a BDS, which is an implementation detail).

However, live snapshots can't close and reopen the BDS in the future,
because the reopen could fail and you must not have closed the old image
yet in this case. So what Jeff and I were looking into recently is to
change this so that new top-level images are opened first without
backing file and then the backing file relationship is created with the
existing BDS.

Of course, we stumbled across the thing that you're telling me here,
that devices refer to the same BDS as before. So their content must be
swapped, but some data like the device name (and now BlockListeners)
stay with the top-level image instead.

Which in turn reminds me of a discussion I had with Stefan a while ago,
where we came to the conclusion that we need to separate the
representation of an image file and a "view" of it which represents a
block device (as a whole). One of the reasons then was that one qcow2
image could offer multiple views (one r/w view and for each snapshot a
r/o one). I think the separation that this would require might actually
be the same as the separation of things that stay top-level and that
belong to the image file.

Isn't it cool how everything is connected with everything? :-)

>> So maybe we just need to extend the current BlockDriverState stack to
>> distinguish "normal" and "always on top" BlockDrivers, where the latter
>> would roughly correspond to BlockListeners?
> 
> I would prefer having separate objects.  Even if you do not count fields
> related to throttling or copy-on-read or other tasks in the list above,
> there are many fields in BDS that do not really apply to BlockListen

Re: [Qemu-devel] [PATCH v5 07/12] suspend: add system_wakeup monitor command

2012-02-21 Thread Gerd Hoffmann

  Hi,

>> +# @system_wakeup:
>> +#
>> +# Wakeup guest from suspend
>> +#
>> +# Since:  1.1
>> +#
>> +# Returns:  nothing.
> 
> Would be nice to note that this command does nothing if the guest is already
> suspended (btw, does is_suspend account for guest initiated suspends too?).

s/suspended/running/, but yes.

There are only guest initiated suspends (well, libvirt-via-agent is
somewhat grey ...).

cheers,
  Gerd

>> diff --git a/qmp.c b/qmp.c
>> index 1f64844..a182b51 100644
>> --- a/qmp.c
>> +++ b/qmp.c
>> @@ -163,6 +163,11 @@ void qmp_cont(Error **errp)
>>  vm_start();
>>  }
>>  
>> +void qmp_system_wakeup(Error **errp)
>> +{
>> +qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
> 
> qemu_system_wakeup_request() does:
> 
> if (!(wakeup_reason_mask & (1 << reason))) {
> return;
> }
> 
> But I think you never disable REASON_OTHER, meaning that this will
> never be true when the wakeup request comes from the command, right?

Correct.  The mask is for wakeup events the guest is able to
enable/disable (i.e. rtc wakeups can be enabled/disabled via acpi).
REASON_OTHER can't be maked.

cheers,
  Gerd

Re: [Qemu-devel] [RFC 7/7] qxl: add allocator

2012-02-21 Thread Alon Levy

On Tue, Feb 21, 2012 at 10:20:49AM +0100, Gerd Hoffmann wrote:
>   Hi,
> 
> >  Right now qxl_render_update checks if the displaysurface buffer is not
> >  shared, meaning it was allocated by qemu, and in this case it replaces
> >  it with the flipped buffer.
> 
> I think we should first reqire spice-server 0.8.latest, so
> update_area_complete is available unconditionally.  Then do any
> displaysurface updates in that callback (or a bh kicked by that
> callback).  Handle both shared & non-shared cases.  I think we also can
> get rid of the flip buffer then and just use a non-shared displaysurface
> in that case (and flip the upside-down qxl surface while copying to the
> qemu displaysurface).

ok, I'll try that...

> 
> >  But right after that surface->data gets reset, by vga_hw_screen_dump:
> >  
> > vga_hw_screen_dump/console_select/qemu_resize_displaysurface/ds->allocator->resize_displaysurface/defaultallocator_resize_displaysurface/qemu_alloc_display
> > 
> >  Hence my line of thought that replacing the allocator with my own would
> >  prevent this. Since you have misgivings about using our own allocator
> >  that I don't know how to resolve, I'm instead doing a second
> >  reallocation in our dpy_resize callback qxl.c:display_resize, in affect
> >  it means that we have three allocations and three deallocations for every
> >  screendump. Do you still think it's less ugly then an allocator? note
> >  that I have sdl and vnc working with spice with my allocator scheme.
> >  (just didn't test all three together yet).
> 
> IMHO that calls for a patch like this to get rid of the pointless
> console_select() call:
> 
> --- a/console.c
> +++ b/console.c
> @@ -181,12 +181,14 @@ void vga_hw_screen_dump(const char *filename)
> 
>  /* There is currently no way of specifying which screen we want to
> dump,
> so always dump the first one.  */
> -console_select(0);
> +if (previous_active_console && previous_active_console->index != 0) {
> +console_select(0);
> +}
>  if (consoles[0] && consoles[0]->hw_screen_dump) {
>  consoles[0]->hw_screen_dump(consoles[0]->hw, filename);
>  }
> 
> -if (previous_active_console) {
> +if (previous_active_console && previous_active_console->index != 0) {
>  console_select(previous_active_console->index);
>  }
>  }

...and that. Thanks.

> 
> 
> cheers,
>   Gerd
>

Re: [Qemu-devel] [PATCH v5 12/12] suspend: add qmp events

2012-02-21 Thread Gerd Hoffmann

On 02/17/12 18:33, Luiz Capitulino wrote:
> On Wed, 15 Feb 2012 11:28:21 +0100
> Gerd Hoffmann  wrote:
> 
>> Send qmp events on suspend and wakeup so libvirt
>> has a chance to track the vm state.

[ added libvirt to Cc:, leaving full context.
  this is about qmp events when the guest enters/leaves s3 ].

>> Signed-off-by: Gerd Hoffmann 
>> ---
>>  monitor.c |6 ++
>>  monitor.h |2 ++
>>  vl.c  |   15 +++
>>  3 files changed, 23 insertions(+), 0 deletions(-)
>>
>> diff --git a/monitor.c b/monitor.c
>> index aadbdcb..e6f5fad 100644
>> --- a/monitor.c
>> +++ b/monitor.c
>> @@ -485,6 +485,12 @@ void monitor_protocol_event(MonitorEvent event, QObject 
>> *data)
>>  case QEVENT_BLOCK_JOB_CANCELLED:
>>  event_name = "BLOCK_JOB_CANCELLED";
>>  break;
>> +case QEVENT_SUSPEND:
>> +event_name = "SUSPEND";
>> +break;
>> +case QEVENT_WAKEUP:
>> +event_name = "WAKEUP";
>> +break;
>>  default:
>>  abort();
>>  break;
>> diff --git a/monitor.h b/monitor.h
>> index b72ea07..9df3bab 100644
>> --- a/monitor.h
>> +++ b/monitor.h
>> @@ -38,6 +38,8 @@ typedef enum MonitorEvent {
>>  QEVENT_SPICE_DISCONNECTED,
>>  QEVENT_BLOCK_JOB_COMPLETED,
>>  QEVENT_BLOCK_JOB_CANCELLED,
>> +QEVENT_SUSPEND,
>> +QEVENT_WAKEUP,
>>  QEVENT_MAX,
>>  } MonitorEvent;
>>  
>> diff --git a/vl.c b/vl.c
>> index bfdcb7c..570ea05 100644
>> --- a/vl.c
>> +++ b/vl.c
>> @@ -1416,6 +1416,7 @@ static void qemu_system_suspend(void)
>>  {
>>  pause_all_vcpus();
>>  notifier_list_notify(&suspend_notifiers, NULL);
>> +monitor_protocol_event(QEVENT_SUSPEND, NULL);
>>  is_suspended = true;
>>  }
>>  
>> @@ -1436,12 +1437,26 @@ void qemu_register_suspend_notifier(Notifier 
>> *notifier)
>>  
>>  void qemu_system_wakeup_request(WakeupReason reason)
>>  {
>> +static const char *names[] = {
>> +[QEMU_WAKEUP_REASON_OTHER]   = "other",
>> +[QEMU_WAKEUP_REASON_RTC] = "rtc",
>> +[QEMU_WAKEUP_REASON_PMTIMER] = "pmtimer",
> 
> Is the reason really important for mngt? Can we just leave it out?

I assumed the wakeup reason could be useful, but dunno.
Zapping the code is no problem of course.

Lets ask mgmt aka libvirt folks ;)

cheers,
  Gerd

>> +};
>> +const char *name;
>> +QObject *data;
>> +
>>  if (!is_suspended) {
>>  return;
>>  }
>>  if (!(wakeup_reason_mask & (1 << reason))) {
>>  return;
>>  }
>> +
>> +name = (reason < ARRAY_SIZE(names)) ? names[reason] : "unknown";
>> +data = qobject_from_jsonf("{ 'reason': %s }", name);
>> +monitor_protocol_event(QEVENT_WAKEUP, data);
>> +qobject_decref(data);
>> +
>>  notifier_list_notify(&wakeup_notifiers, &reason);
>>  reset_requested = 1;
>>  qemu_notify_event();
>

Re: [Qemu-devel] BlockDriverState stack and BlockListeners

2012-02-21 Thread Paolo Bonzini

On 02/21/2012 10:49 AM, Kevin Wolf wrote:
> Am 21.02.2012 10:15, schrieb Paolo Bonzini:
>> On 02/21/2012 10:03 AM, Kevin Wolf wrote:
>>> - Old style block migratiom
>>
>> More precisely, dirty bitmap handling.
> 
> Yes, but is it used anywhere else?

No, just nitpicking.

>>> (btw, we should deprecate this)
>>
>> Yeah, but we need blkmirror to provide an alternative.
> 
> Does block migration even work meanwhile without corrupting things left
> and right?

No idea.  Somebody must have used it at some point. :)

>> 1) BlockListener would be by-design coroutine based; I know we disagree
>> on this (you want to change raw to coroutines long term; I would like to
>> reintroduce some AIO fastpaths when there are no active listeners).
> 
> I can't see a technical reason why a BlockListener could not be callback
> based. The only reason might be a "there are only coroutines" policy.

Not a technical reason, more of a sanity reason.  Coroutines were
introduced to allow a number of the things in the list.

>> 2) BlockListener would be entirely an implementation detail, used in the
>> implementation of other commands.
> 
> Depending on what you mean by command (presumably QMP commands?),

QMP commands, command-line (-drive), whatever.

> And on the other hand, protocols like file are entirely implementation
> details as well, and still they are BlockDrivers.

True.  Formats and protocols also do not have perfectly overlapping
functionality (backing images are only a format thing, for example).

>>> The main difference that I see is that the listeners stay always on top.
>>> For example, let's assume that if implemented a blkmirror driver in
>>> today's infrastructure, you would get a BlockDriverState stack like
>>> blkmirror -> qcow2 -> file. If you take a live snapshot now, you don't
>>> want to have the blkmirror applied to the old top-level image, which is
>>> now a read-only backing file. Instead, it should move to the new
>>> top-level image.
>>
>> Yes, that's because a BlockListener always applies to a
>> BlockDriverState, and live snapshots close+reopen the BDS but do not
>> delete/recreate it.
> 
> Hm, that's an interesting angle to look at it. The reasoning makes sense
> to me (though I would reword it as a BlockListener belongs to a
> drive/block device rather than a BDS, which is an implementation detail).

Yes.

> However, live snapshots can't close and reopen the BDS in the future,
> because the reopen could fail and you must not have closed the old image
> yet in this case. So what Jeff and I were looking into recently is to
> change this so that new top-level images are opened first without
> backing file and then the backing file relationship is created with the
> existing BDS.
> 
> Of course, we stumbled across the thing that you're telling me here,
> that devices refer to the same BDS as before. So their content must be
> swapped, but some data like the device name (and now BlockListeners)
> stay with the top-level image instead.
> 
> Which in turn reminds me of a discussion I had with Stefan a while ago,
> where we came to the conclusion that we need to separate the
> representation of an image file and a "view" of it which represents a
> block device (as a whole). [...]
> 
> Isn't it cool how everything is connected with everything? :-)

:)

So we'd have:

- Protocols -- Protocols tell you where to get the raw bits.
- Formats -- Formats transform those raw bits into a block device.
- Views -- Views can move from a format to another.  A format can use a
default view implementation, or provide its own (e.g. to access
different snapshots).
- Listeners -- I think a view can have-a listener?

with the following relationship:

- A format has-a protocol for the raw bits and has-a view for the
backing file.
- A view has-a format, a device has-a view.
- A view can have-a listener?  Or is it formats?

But I think we're digressing...

>>> So maybe we just need to extend the current BlockDriverState stack to
>>> distinguish "normal" and "always on top" BlockDrivers, where the latter
>>> would roughly correspond to BlockListeners?
>>
>> I would prefer having separate objects.  Even if you do not count fields
>> related to throttling or copy-on-read or other tasks in the list above,
>> there are many fields in BDS that do not really apply to BlockListeners.
>>  Backing files, device ops, encryption, even size.  Having extra methods
>> is not a big problem, but unwanted data items smell...
> 
> Most other block drivers use only little of them. We can try to clean up
> some of them (and making the separation described above would probably
> help with it), but BlockListeners aren't really different here from
> existing drivers.

True, the question only matters insofar as having a separate data
structure simplifies the design.  ("Simplify" means "we can actually
understand it and be reasonably sure that it's correct and implementable").

Paolo

Re: [Qemu-devel] [PATCH V7 08/11] Introduce Xen PCI Passthrough, PCI config space helpers (2/3)

2012-02-21 Thread Stefano Stabellini

On Mon, 20 Feb 2012, Michael S. Tsirkin wrote:
> On Fri, Feb 17, 2012 at 05:08:42PM +, Anthony PERARD wrote:
> > From: Allen Kay 
> >
> > A more complete history can be found here:
> > git://xenbits.xensource.com/qemu-xen-unstable.git
> >
> > Signed-off-by: Allen Kay 
> > Signed-off-by: Guy Zana 
> > Signed-off-by: Anthony PERARD 
> 
> A lot of functionality seems to be duplicated with
> kvm's device assignment code.
> Makes sense to try separating generic bits so that code
> can be shared?

We had this discussion a while back, see Avi's suggestion to merge both
and abstracting them later:

http://marc.info/?l=qemu-devel&m=131805825314072&w=2

[Qemu-devel] [PATCH] qom: Document ways to retrieve child object added by object_property_add_child()

2012-02-21 Thread alexander_barabash

From: Alexander Barabash 

object_property_add_child() creates a property whose values as a string is
the child object's canonical path.

Signed-off-by: Alexander Barabash 
---
 include/qemu/object.h |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/include/qemu/object.h b/include/qemu/object.h
index ba2409d..d9e9221 100644
--- a/include/qemu/object.h
+++ b/include/qemu/object.h
@@ -808,6 +808,10 @@ Object *object_resolve_path_type(const char *path, const 
char *typename,
  *
  * There is no way for a child to determine what its parent is.  It is not
  * a bidirectional relationship.  This is by design.
+ *
+ * The value of a child property as a C string will be the child object's
+ * canonical path. It can be retrieved using object_property_get_str().
+ * The child object itself can be retrieved using object_property_get_link().
  */
 void object_property_add_child(Object *obj, const char *name,
Object *child, struct Error **errp);
-- 
1.7.5.4

Re: [Qemu-devel] [PATCH] qom: Document ways to retrieve child object added by object_property_add_child()

2012-02-21 Thread Paolo Bonzini

On 02/21/2012 11:14 AM, alexander_barab...@mentor.com wrote:
> From: Alexander Barabash 
> 
> object_property_add_child() creates a property whose values as a string is
> the child object's canonical path.
> 
> Signed-off-by: Alexander Barabash 
> ---
>  include/qemu/object.h |4 
>  1 files changed, 4 insertions(+), 0 deletions(-)
> 
> diff --git a/include/qemu/object.h b/include/qemu/object.h
> index ba2409d..d9e9221 100644
> --- a/include/qemu/object.h
> +++ b/include/qemu/object.h
> @@ -808,6 +808,10 @@ Object *object_resolve_path_type(const char *path, const 
> char *typename,
>   *
>   * There is no way for a child to determine what its parent is.  It is not
>   * a bidirectional relationship.  This is by design.
> + *
> + * The value of a child property as a C string will be the child object's
> + * canonical path. It can be retrieved using object_property_get_str().
> + * The child object itself can be retrieved using object_property_get_link().
>   */
>  void object_property_add_child(Object *obj, const char *name,
> Object *child, struct Error **errp);

Acked-by: Paolo Bonzini

Re: [Qemu-devel] [PATCH v4 0/6] save/restore on Xen

2012-02-21 Thread Stefano Stabellini

On Mon, 13 Feb 2012, Stefano Stabellini wrote:
> On Tue, 31 Jan 2012, Stefano Stabellini wrote:
> > On Wed, 25 Jan 2012, Stefano Stabellini wrote:
> > > Hi all,
> > > this is the fourth version of the Xen save/restore patch series.
> > > We have been discussing this issue for quite a while on #qemu and
> > > qemu-devel:
> > > 
> > > 
> > > http://marc.info/?l=qemu-devel&m=132346828427314&w=2
> > > http://marc.info/?l=qemu-devel&m=132377734605464&w=2
> > > 
> > > 
> > > The principal changes in the this version are:
> > > 
> > > - Following Anthony's suggestion I have introduced a new monitor command
> > > to save the non-ram device state to file.
> > > 
> > > - I have also removed the hack not to reset the cirrus videoram on
> > > restore, because it turns out that the videoram doesn't need to be
> > > reset in the reset handler at all (tested on Win2K, where the problem
> > > was found in the first place).
> > > 
> > 
> > Is everybody happy enough with this series?
> > Do you have any additional comments?
> > 
> 
> ping
> 

Can we at least agree on the basic approach (especially for patch #1,
#2 and #5 that are not xen specific), so that we can check in the
libxl side of the series?
It is currently blocking other patch series from being merged into
xen-unstable.

Re: [Qemu-devel] [RFC PATCH] arm boot: added QOM device definition

2012-02-21 Thread Peter Maydell

On 21 February 2012 09:15, Peter Crosthwaite
 wrote:
> On Tue, Feb 21, 2012 at 5:56 AM, Peter Maydell  
> wrote:
>> On 20 February 2012 19:51, Andreas Färber  wrote:
>>> Am 20.02.2012 20:43, schrieb Peter Maydell:
 I don't particularly care how we QOMify arm_boot (it's not exactly at
 the top of my priority list demanding attention), I do care that (a)
 we have a sensible user-facing interface [ie command line options] and
 (b) vl.c can usefully just pass the information from those options
 straight to the boot loader code.
>
> So when I trialled this patch I used the -device argument to
> instantiate the bootloader on the command line:
>
> qemu-system-arm -M versatilepb --device
> arm_linux_loader,kernel_filename=kernel.foo,initrd=image.bar...
>
> With this approach I was able to add command line args (if you will)
> to arm_boot without touching vl.c, qemu_opts or any of the arm machine
> models.

I think this is wrong, because it means the user has to know what the
name of the arm_boot device is (as well as because it means that the
machine model can't set the arm_boot properties that it needs to, as Paul
and Andreas point out). vl.c should search for "some device,
any device, which provides a kernel loading interface" and set
the kernel/initrd/cmdline/dtb properties on it, so that the same
kind of command line works on different machines even if they implement
different boot loader devices, subclass arm_boot or whatever.

As you say, the infrastructure to do it properly is kind of missing
at the moment; that's why I don't want to get basic DTB tied up
with QOMification.

-- PMM

Re: [Qemu-devel] BlockDriverState stack and BlockListeners

2012-02-21 Thread Ori Mamluk


On 21/02/2012 11:49, Kevin Wolf wrote:

Am 21.02.2012 10:15, schrieb Paolo Bonzini:

So maybe we just need to extend the current BlockDriverState stack to
distinguish "normal" and "always on top" BlockDrivers, where the latter
would roughly correspond to BlockListeners?

I would prefer having separate objects.  Even if you do not count fields
related to throttling or copy-on-read or other tasks in the list above,
there are many fields in BDS that do not really apply to BlockListeners.
  Backing files, device ops, encryption, even size.  Having extra methods
is not a big problem, but unwanted data items smell...

Most other block drivers use only little of them. We can try to clean up
some of them (and making the separation described above would probably
help with it), but BlockListeners aren't really different here from
existing drivers.
My impression as an outside observer was similar - it appears as though 
the block driver object contains stuff that belongs to the specific 
driver (e.g. bitmap).


An additional capability that I need in the replication filter is for 
the driver to initiate a new IO. It means that if we have a stack of 
drivers guest->bdrv1->bdrv2->bdrv3->image, then bdrv2 should be able to 
invoke a read - which will be processed only by deeper parts of the 
stack - i.e. bdrv3.

Makes sense?

For question of upper/lower drivers, I tend to think that there are two 
kinds - those who need the original guest IO transactions and those who 
transform them. Maybe two separate drivers stack (upper and lower) are 
enough to implement this difference.

Kevin

Re: [Qemu-devel] [PATCH][v13] megasas: LSI Megaraid SAS HBA emulation

2012-02-21 Thread Hannes Reinecke

On 02/21/2012 12:04 AM, Michael S. Tsirkin wrote:
> On Mon, Feb 20, 2012 at 04:17:05PM +0100, Alexander Graf wrote:
>>
>> On 20.02.2012, at 16:16, Hannes Reinecke wrote:
>>
>>> On 02/20/2012 04:13 PM, Alexander Graf wrote:

 On 20.02.2012, at 16:11, Hannes Reinecke wrote:

> This patch adds an emulation for the LSI Megaraid SAS 8708EM2 HBA.
> I've tested it to work with Linux, Windows Vista, and Windows7.
>
> Changes since v12:
> - Fixup flag setting via properties
> - Disable MSI-X per default
> - Fixup MSI-X handling

 If MSI-X works again, why disable it? :o Does it break in other cases?

>>> It doesn't. No idea why.
>>> That's why I disabled it :-)
>>>
>>> The MSI-X vector is signalled, but nothing on the OS side appear to
>>> receive it.
>>> If you got any idea on how to debug it I'm all ears ...
>>
>> Michael, any ideas? IIRC MSI-X used to work in earlier versions, so 
>> something recent must've broken it.
>>
>>
>> Alex
> 
> Hmm, there were's any recent changes I'm aware of.
> Try to check that MSIX vector as qemu sees it
> matches the value that the guest set.
> If using kvm, compare against kernel as well.
> Endian-ness issue somehow?
> Are you testing on x86?
> 
Testing on x86_64.
But apparently the (linux) driver doesn't activate MSI-X.
So either I didn't set it up properly in the HBA emulation
or something else went wrong.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries & Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

Re: [Qemu-devel] [PATCH][v13] megasas: LSI Megaraid SAS HBA emulation

2012-02-21 Thread Michael S. Tsirkin

On Tue, Feb 21, 2012 at 11:27:15AM +0100, Hannes Reinecke wrote:
> On 02/21/2012 12:04 AM, Michael S. Tsirkin wrote:
> > On Mon, Feb 20, 2012 at 04:17:05PM +0100, Alexander Graf wrote:
> >>
> >> On 20.02.2012, at 16:16, Hannes Reinecke wrote:
> >>
> >>> On 02/20/2012 04:13 PM, Alexander Graf wrote:
> 
>  On 20.02.2012, at 16:11, Hannes Reinecke wrote:
> 
> > This patch adds an emulation for the LSI Megaraid SAS 8708EM2 HBA.
> > I've tested it to work with Linux, Windows Vista, and Windows7.
> >
> > Changes since v12:
> > - Fixup flag setting via properties
> > - Disable MSI-X per default
> > - Fixup MSI-X handling
> 
>  If MSI-X works again, why disable it? :o Does it break in other cases?
> 
> >>> It doesn't. No idea why.
> >>> That's why I disabled it :-)
> >>>
> >>> The MSI-X vector is signalled, but nothing on the OS side appear to
> >>> receive it.
> >>> If you got any idea on how to debug it I'm all ears ...
> >>
> >> Michael, any ideas? IIRC MSI-X used to work in earlier versions, so 
> >> something recent must've broken it.
> >>
> >>
> >> Alex
> > 
> > Hmm, there were's any recent changes I'm aware of.
> > Try to check that MSIX vector as qemu sees it
> > matches the value that the guest set.
> > If using kvm, compare against kernel as well.
> > Endian-ness issue somehow?
> > Are you testing on x86?
> > 
> Testing on x86_64.
> But apparently the (linux) driver doesn't activate MSI-X.
> So either I didn't set it up properly in the HBA emulation
> or something else went wrong.
> 
> Cheers,
> 
> Hannes

Ah, the driver doesn't enable msix? Your earlier comment
made me think the driver did enable it, and you
send the interrupt but guest does not get it...

Look at output of lspci -vv in the guest. If you see
msix there, then we present msix capability,
look at what failure did driver detect.

> -- 
> Dr. Hannes Reinecke zSeries & Storage
> h...@suse.de+49 911 74053 688
> SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

[Qemu-devel] fix ELF loading for 0-length sections

2012-02-21 Thread Damian, Alexandru

Hi,

I got a problem with QEMU refusing to load an ELF binary with 0-length
sections,
while the kernel has no issue doing this.

This patch adds a check that has been in kernel since 2008 at least.

Cheers,
Alex


commit a42e5231c1be5f09caeb6c73e34933cd7efa7023
Author: Alexandru DAMIAN 
Date:   Tue Feb 21 12:34:36 2012 +0200

Do not attempt to map 0-length sections

Mmap will return an invalid argument, but 0-length sections
are valid in any case. The kernel as a similar check
when loading elf binaries courtesy of jkos...@suse.cz.

Signed-off-by: Alexandru Damian 

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index ea61d0d..71d0ae3 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -918,9 +918,9 @@ static inline void init_thread(struct target_pt_regs
*regs,

 #define elf_check_arch(x) ( (x) == ELF_ARCH )

-#define ELF_CLASSELFCLASS64
-#define ELF_DATAELFDATA2MSB
-#define ELF_ARCHEM_S390
+#define ELF_CLASSELFCLASS64
+#define ELF_DATAELFDATA2MSB
+#define ELF_ARCHEM_S390

 static inline void init_thread(struct target_pt_regs *regs, struct
image_info *infop)
 {
@@ -1565,11 +1565,16 @@ static void load_elf_image(const char *image_name,
int image_fd,
 vaddr_po = TARGET_ELF_PAGEOFFSET(vaddr);
 vaddr_ps = TARGET_ELF_PAGESTART(vaddr);

-error = target_mmap(vaddr_ps, eppnt->p_filesz + vaddr_po,
-elf_prot, MAP_PRIVATE | MAP_FIXED,
-image_fd, eppnt->p_offset - vaddr_po);
-if (error == -1) {
-goto exit_perror;
+/* Don't attempt to map 0 bytes len sections.
+   Kernel also has this check.
+*/
+if (eppnt->p_filesz + vaddr_po != 0) {
+error = target_mmap(vaddr_ps, eppnt->p_filesz +
vaddr_po,
+elf_prot, MAP_PRIVATE | MAP_FIXED,
+image_fd, eppnt->p_offset -
vaddr_po);
+if (error == -1) {
+goto exit_perror;
+}
 }

 vaddr_ef = vaddr + eppnt->p_filesz;

Re: [Qemu-devel] BlockDriverState stack and BlockListeners

2012-02-21 Thread Kevin Wolf

Am 21.02.2012 11:09, schrieb Paolo Bonzini:
>>> 2) BlockListener would be entirely an implementation detail, used in the
>>> implementation of other commands.
>>
>> Depending on what you mean by command (presumably QMP commands?),
> 
> QMP commands, command-line (-drive), whatever.
> 
>> And on the other hand, protocols like file are entirely implementation
>> details as well, and still they are BlockDrivers.
> 
> True.  Formats and protocols also do not have perfectly overlapping
> functionality (backing images are only a format thing, for example).

And even protocols and protocols don't. Compare file to blkdebug, for
example. In fact, blkdebug and blkverify are already very close to what
BlockListeners would be.

 The main difference that I see is that the listeners stay always on top.
 For example, let's assume that if implemented a blkmirror driver in
 today's infrastructure, you would get a BlockDriverState stack like
 blkmirror -> qcow2 -> file. If you take a live snapshot now, you don't
 want to have the blkmirror applied to the old top-level image, which is
 now a read-only backing file. Instead, it should move to the new
 top-level image.
>>>
>>> Yes, that's because a BlockListener always applies to a
>>> BlockDriverState, and live snapshots close+reopen the BDS but do not
>>> delete/recreate it.
>>
>> Hm, that's an interesting angle to look at it. The reasoning makes sense
>> to me (though I would reword it as a BlockListener belongs to a
>> drive/block device rather than a BDS, which is an implementation detail).
> 
> Yes.
> 
>> However, live snapshots can't close and reopen the BDS in the future,
>> because the reopen could fail and you must not have closed the old image
>> yet in this case. So what Jeff and I were looking into recently is to
>> change this so that new top-level images are opened first without
>> backing file and then the backing file relationship is created with the
>> existing BDS.
>>
>> Of course, we stumbled across the thing that you're telling me here,
>> that devices refer to the same BDS as before. So their content must be
>> swapped, but some data like the device name (and now BlockListeners)
>> stay with the top-level image instead.
>>
>> Which in turn reminds me of a discussion I had with Stefan a while ago,
>> where we came to the conclusion that we need to separate the
>> representation of an image file and a "view" of it which represents a
>> block device (as a whole). [...]
>>
>> Isn't it cool how everything is connected with everything? :-)
> 
> :)
> 
> So we'd have:
> 
> - Protocols -- Protocols tell you where to get the raw bits.
> - Formats -- Formats transform those raw bits into a block device.
> - Views -- Views can move from a format to another.  A format can use a
> default view implementation, or provide its own (e.g. to access
> different snapshots).
> - Listeners -- I think a view can have-a listener?

Where protocols, formats and listeners are-a image (Not the best name,
I'm open for suggestions. Something like "BDS stack building block"
would be most accurate...). Or actually not is-a in terms of
inheritance, but I think it would be the very same thing without any
subclassing, implemented by a BlockDriver.

> with the following relationship:
> 
> - A format has-a protocol for the raw bits and has-a view for the
> backing file.

An image has-a image from which it takes its data (bs->file). And it
has-a view for the backing file, yes. Both could be a listener.

> - A view has-a format, a device has-a view.
> - A view can have-a listener?  Or is it formats?

A view has-a image. This may happen to be a listener, which in turn
has-a image (could be another listener, a format or a protocol).

The question is what the semantics is with live snapshots (there are
probably similar problems, but this is the obvious one). For example we
could now have:

mirror -> qcow2 -> blkdebug -> file

There are two listeners here, mirror and blkdebug. (Things like blkdebug
are why view has-a listener isn't enough). After creating an external
snapshot, we expect the graph to look like this (the arrow down is the
backing file):

mirror -> qcow2 -> file
|
+-> qcow2 -> blkdebug -> file

The question is: Can we assume that any listeners that are on top of the
first format or protocol (i.e. those that would fit your model) should
move to the new top-level view? Or would it sometimes make sense to keep
it at the old one?

> But I think we're digressing...

No, in fact I think we need to get an idea of what we want to have in
the end. And we need to do it soon, almost any new topic that's coming
up ends up in a discussion about the same shortcomings of the block layer.

It doesn't make sense to hack in more and more stuff without having
proper infrastructure for it.

 So maybe we just need to extend the current BlockDriverState stack to
 distinguish "normal" and "always on top" BlockDrivers, where the latter
 would ro

Re: [Qemu-devel] win7 bad i/o performance, high insn_emulation and exists

2012-02-21 Thread Peter Lieven


On 20.02.2012 21:45, Gleb Natapov wrote:

On Mon, Feb 20, 2012 at 08:59:38PM +0100, Peter Lieven wrote:

On 20.02.2012 20:04, Gleb Natapov wrote:

On Mon, Feb 20, 2012 at 08:40:08PM +0200, Gleb Natapov wrote:

On Mon, Feb 20, 2012 at 07:17:55PM +0100, Peter Lieven wrote:

Hi,

I came a across an issue with a Windows 7 (32-bit) as well as with a
Windows 2008 R2 (64-bit) guest.

If I transfer a file from the VM via CIFS or FTP to a remote machine,
i get very poor read performance (around 13MB/s). The VM peaks at 100%
cpu and I see a lot of insn_emulations and all kinds of exists in kvm_stat

efer_reload0 0
exits2260976 79620
fpu_reload  619711
halt_exits114734  5011
halt_wakeup   95  4876
host_state_reload1499659 60962
hypercalls 0 0
insn_emulation   1577325 58488
insn_emulation_fail0 0
invlpg 0 0
io_exits  943949 40249

Hmm, too many of those.


irq_exits 108679  5434
irq_injections236545 10788
irq_window  7606   246
largepages   672 5
mmio_exits460020 16082
mmu_cache_miss   119 0
mmu_flooded0 0
mmu_pde_zapped 0 0
mmu_pte_updated0 0
mmu_pte_write  13474 9
mmu_recycled   0 0
mmu_shadow_zapped141 0
mmu_unsync 0 0
nmi_injections 0 0
nmi_window 0 0
pf_fixed   2280335
pf_guest   0 0
remote_tlb_flush 239 2
request_irq0 0
signal_exits   0 0
tlb_flush  20933 0

If I run the same VM with a Ubuntu 10.04.4 guest I get around 60MB/s
throughput. The kvm_stats look a lot more sane.

efer_reload0 0
exits6132004 17931
fpu_reload 19863 3
halt_exits264961  3083
halt_wakeup   236468  2959
host_state_reload1104468  3104
hypercalls 0 0
insn_emulation   1417443  7518
insn_emulation_fail0 0
invlpg 0 0
io_exits  869380  2795
irq_exits 253501  2362
irq_injections616967  6804
irq_window201186  2161
largepages  1019 0
mmio_exits205268 0
mmu_cache_miss   192 0
mmu_flooded0 0
mmu_pde_zapped 0 0
mmu_pte_updated0 0
mmu_pte_write7440546 0
mmu_recycled   0 0
mmu_shadow_zapped259 0
mmu_unsync 0 0
nmi_injections 0 0
nmi_window 0 0
pf_fixed   3852930
pf_guest   0 0
remote_tlb_flush 761 1
request_irq0 0
signal_exits   0 0
tlb_flush  0 0

I use virtio-net (with vhost-net) and virtio-blk. I tried disabling
hpet (which basically illiminated the mmio_exits, but does not
increase
performance) and also commit (39a7a362e16bb27e98738d63f24d1ab5811e26a8
) - no improvement.

My commandline:
/usr/bin/qemu-kvm-1.0 -netdev
type=tap,id=guest8,script=no,downscript=no,ifname=tap0,vhost=on
-device virtio-net-pci,netdev=guest8,mac=52:54:00:ff:00:d3 -drive 
format=host_device,file=/dev/mapper/iqn.2001-05.com.equallogic:0-8a0906-eeef4e007-a8a9f3818674f2fc-lieven-windows7-vc-r80788,if=virtio,cache=none,aio=native
-m 2048 -smp 2 -monitor tcp:0:4001,server,nowait -vnc :1 -name
lieven-win7-vc -boot order=dc,menu=off -k de -pidfile
/var/run/qemu/vm-187.pid -mem-path /hugepages -mem-prealloc -cpu
host -rtc base=localtime -vga std -usb -usbdevice tablet -no-hpet

What further information is needed to debug this further?


Which kernel version (looks like something recent)?
Which host CPU (looks like something old)?

Output of cat /proc/cpuinfo


Which Windows' virtio drivers are you using?

Take a trace like described here http://www.linux-kvm.org/page/Tracing
(with -no-hpet please).


And also "info pci" output from qemu monitor while we are at it.

here is the output while i was tracing. you can download the trace
i took while i did a ftp transfer from the vm:

->  http://82.141.21.156/report.txt.gz


Windows reads PM timer. A lot. 15152 times per second.

Can you try to run this command in Windows guest:

   bcdedit /set {default} use

Re: [Qemu-devel] win7 bad i/o performance, high insn_emulation and exists

2012-02-21 Thread Gleb Natapov

On Tue, Feb 21, 2012 at 11:50:47AM +0100, Peter Lieven wrote:
> >I hope it will make Windows use TSC instead, but you can't be sure
> >about anything with Windows :(
> Whatever it does now it eates more CPU has almost equal
> number of exits and throughput is about the same (15MB/s).
> If pmtimer is at 0xb008 it still reads it like hell.
> 
> I checked with bcedit /v that useplatformclock is set to "No".
Yeah, today I noticed that it is likely virtio drivers that hammer
on PM timer (at least rip of the instruction that access it is
very close to rip of the instruction that access virtio pio).
Vadim, Windows driver developer,  is CCed.

> 
> I still wonder why both virtio devices are on IRQ0 ?
> 
They use MSI like they should.

--
Gleb.

[Qemu-devel] [PATCH 6/9] spice: support ipv6 channel address in monitor events and in spice info

2012-02-21 Thread Gerd Hoffmann

From: Yonit Halperin 

RHBZ #788444

CC: Gerd Hoffmann 

Signed-off-by: Yonit Halperin 
Signed-off-by: Gerd Hoffmann 
---
 ui/spice-core.c |   37 -
 1 files changed, 32 insertions(+), 5 deletions(-)

diff --git a/ui/spice-core.c b/ui/spice-core.c
index 05cb745..1308a3d 100644
--- a/ui/spice-core.c
+++ b/ui/spice-core.c
@@ -220,10 +220,23 @@ static void channel_event(int event, 
SpiceChannelEventInfo *info)
 }
 
 client = qdict_new();
-add_addr_info(client, &info->paddr, info->plen);
-
 server = qdict_new();
-add_addr_info(server, &info->laddr, info->llen);
+
+#ifdef SPICE_CHANNEL_EVENT_FLAG_ADDR_EXT
+if (info->flags & SPICE_CHANNEL_EVENT_FLAG_ADDR_EXT) {
+add_addr_info(client, (struct sockaddr *)&info->paddr_ext,
+  info->plen_ext);
+add_addr_info(server, (struct sockaddr *)&info->laddr_ext,
+  info->llen_ext);
+} else {
+fprintf(stderr, "spice: %s, extended address is expected\n",
+__func__);
+#endif
+add_addr_info(client, &info->paddr, info->plen);
+add_addr_info(server, &info->laddr, info->llen);
+#ifdef SPICE_CHANNEL_EVENT_FLAG_ADDR_EXT
+}
+#endif
 
 if (event == SPICE_CHANNEL_EVENT_INITIALIZED) {
 qdict_put(server, "auth", qstring_from_str(auth));
@@ -376,16 +389,30 @@ static SpiceChannelList *qmp_query_spice_channels(void)
 QTAILQ_FOREACH(item, &channel_list, link) {
 SpiceChannelList *chan;
 char host[NI_MAXHOST], port[NI_MAXSERV];
+struct sockaddr *paddr;
+socklen_t plen;
 
 chan = g_malloc0(sizeof(*chan));
 chan->value = g_malloc0(sizeof(*chan->value));
 
-getnameinfo(&item->info->paddr, item->info->plen,
+#ifdef SPICE_CHANNEL_EVENT_FLAG_ADDR_EXT
+if (item->info->flags & SPICE_CHANNEL_EVENT_FLAG_ADDR_EXT) {
+paddr = (struct sockaddr *)&item->info->paddr_ext;
+plen = item->info->plen_ext;
+} else {
+#endif
+paddr = &item->info->paddr;
+plen = item->info->plen;
+#ifdef SPICE_CHANNEL_EVENT_FLAG_ADDR_EXT
+}
+#endif
+
+getnameinfo(paddr, plen,
 host, sizeof(host), port, sizeof(port),
 NI_NUMERICHOST | NI_NUMERICSERV);
 chan->value->host = g_strdup(host);
 chan->value->port = g_strdup(port);
-chan->value->family = 
g_strdup(inet_strfamily(item->info->paddr.sa_family));
+chan->value->family = g_strdup(inet_strfamily(paddr->sa_family));
 
 chan->value->connection_id = item->info->connection_id;
 chan->value->channel_type = item->info->type;
-- 
1.7.1

Re: [Qemu-devel] win7 bad i/o performance, high insn_emulation and exists

2012-02-21 Thread Gleb Natapov

On Tue, Feb 21, 2012 at 11:59:23AM +0100, Peter Lieven wrote:
> On 21.02.2012 11:56, Gleb Natapov wrote:
> >On Tue, Feb 21, 2012 at 11:50:47AM +0100, Peter Lieven wrote:
> >>>I hope it will make Windows use TSC instead, but you can't be sure
> >>>about anything with Windows :(
> >>Whatever it does now it eates more CPU has almost equal
> >>number of exits and throughput is about the same (15MB/s).
> >>If pmtimer is at 0xb008 it still reads it like hell.
> >>
> >>I checked with bcedit /v that useplatformclock is set to "No".
> >Yeah, today I noticed that it is likely virtio drivers that hammer
> >on PM timer (at least rip of the instruction that access it is
> >very close to rip of the instruction that access virtio pio).
> >Vadim, Windows driver developer,  is CCed.
> Ok, I will switch to IDE and e1000 to confirm this? Or does it not
> make sense?
> 
It make perfect sense! Please try it.

--
Gleb.

Re: [Qemu-devel] fix ELF loading for 0-length sections

2012-02-21 Thread Peter Maydell

On 21 February 2012 10:42, Damian, Alexandru  wrote:
> Hi,
>
> I got a problem with QEMU refusing to load an ELF binary with 0-length
> sections,
> while the kernel has no issue doing this.
>
> This patch adds a check that has been in kernel since 2008 at least.

CC'ing Riku (linux-user maintainer).

>
> Cheers,
> Alex
>
> 
> commit a42e5231c1be5f09caeb6c73e34933cd7efa7023
> Author: Alexandru DAMIAN 
> Date:   Tue Feb 21 12:34:36 2012 +0200
>
>     Do not attempt to map 0-length sections
>
>     Mmap will return an invalid argument, but 0-length sections
>     are valid in any case. The kernel as a similar check
>     when loading elf binaries courtesy of jkos...@suse.cz.
>
>     Signed-off-by: Alexandru Damian 

Convention is to submit patches as git-format-patch style emails.

> diff --git a/linux-user/elfload.c b/linux-user/elfload.c
> index ea61d0d..71d0ae3 100644
> --- a/linux-user/elfload.c
> +++ b/linux-user/elfload.c
> @@ -918,9 +918,9 @@ static inline void init_thread(struct target_pt_regs
> *regs,
>
>  #define elf_check_arch(x) ( (x) == ELF_ARCH )
>
> -#define ELF_CLASS    ELFCLASS64
> -#define ELF_DATA    ELFDATA2MSB
> -#define ELF_ARCH    EM_S390
> +#define ELF_CLASS    ELFCLASS64
> +#define ELF_DATA    ELFDATA2MSB
> +#define ELF_ARCH    EM_S390

Please don't make unrelated whitespace changes.

>
>  static inline void init_thread(struct target_pt_regs *regs, struct
> image_info *infop)
>  {
> @@ -1565,11 +1565,16 @@ static void load_elf_image(const char *image_name,
> int image_fd,
>  vaddr_po = TARGET_ELF_PAGEOFFSET(vaddr);
>  vaddr_ps = TARGET_ELF_PAGESTART(vaddr);
>
> -    error = target_mmap(vaddr_ps, eppnt->p_filesz + vaddr_po,
> -    elf_prot, MAP_PRIVATE | MAP_FIXED,
> -    image_fd, eppnt->p_offset - vaddr_po);
> -    if (error == -1) {
> -    goto exit_perror;
> +    /* Don't attempt to map 0 bytes len sections.
> +   Kernel also has this check.
> +    */

This comment could be better:
   /* mmap() will fail EINVAL if given a zero size, but a
* segment with zero filesize is perfectly valid (and
* handled by the kernel's ELF loading code).
*/

> +    if (eppnt->p_filesz + vaddr_po != 0) {
> +    error = target_mmap(vaddr_ps, eppnt->p_filesz +
> vaddr_po,
> +    elf_prot, MAP_PRIVATE | MAP_FIXED,
> +    image_fd, eppnt->p_offset -
> vaddr_po);

Something in your patch submission path is wrapping long lines.

> +    if (error == -1) {
> +    goto exit_perror;
> +    }

QEMU coding style is four-space indents, but the indent in your new
code is mostly eight-space.

>  }
>
>  vaddr_ef = vaddr + eppnt->p_filesz;
>

Otherwise looks OK.

-- PMM

Re: [Qemu-devel] [PATCH][v13] megasas: LSI Megaraid SAS HBA emulation

2012-02-21 Thread Hannes Reinecke

On 02/21/2012 11:34 AM, Michael S. Tsirkin wrote:
> On Tue, Feb 21, 2012 at 11:27:15AM +0100, Hannes Reinecke wrote:
>> On 02/21/2012 12:04 AM, Michael S. Tsirkin wrote:
>>> On Mon, Feb 20, 2012 at 04:17:05PM +0100, Alexander Graf wrote:

 On 20.02.2012, at 16:16, Hannes Reinecke wrote:

> On 02/20/2012 04:13 PM, Alexander Graf wrote:
>>
>> On 20.02.2012, at 16:11, Hannes Reinecke wrote:
>>
>>> This patch adds an emulation for the LSI Megaraid SAS 8708EM2 HBA.
>>> I've tested it to work with Linux, Windows Vista, and Windows7.
>>>
>>> Changes since v12:
>>> - Fixup flag setting via properties
>>> - Disable MSI-X per default
>>> - Fixup MSI-X handling
>>
>> If MSI-X works again, why disable it? :o Does it break in other cases?
>>
> It doesn't. No idea why.
> That's why I disabled it :-)
>
> The MSI-X vector is signalled, but nothing on the OS side appear to
> receive it.
> If you got any idea on how to debug it I'm all ears ...

 Michael, any ideas? IIRC MSI-X used to work in earlier versions, so 
 something recent must've broken it.


 Alex
>>>
>>> Hmm, there were's any recent changes I'm aware of.
>>> Try to check that MSIX vector as qemu sees it
>>> matches the value that the guest set.
>>> If using kvm, compare against kernel as well.
>>> Endian-ness issue somehow?
>>> Are you testing on x86?
>>>
>> Testing on x86_64.
>> But apparently the (linux) driver doesn't activate MSI-X.
>> So either I didn't set it up properly in the HBA emulation
>> or something else went wrong.
>>
>> Cheers,
>>
>> Hannes
> 
> Ah, the driver doesn't enable msix? Your earlier comment
> made me think the driver did enable it, and you
> send the interrupt but guest does not get it...
> 
It looks as if the HBA emulation sets up the BARs
correctly:

# lspci -s 00:04.0 -vv
00:04.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 1078
Subsystem: LSI Logic / Symbios Logic Device 1013
Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast
>TAbort- SERR- dev)) {
msix_notify(&s->dev, 0);
} else {
qemu_irq_raise(s->dev.irq[0]);
}

and guest is stuck just after the msix_notify().
So I would assume that the (linux) driver did setup MSI-X,
otherwise 'msix_enabled()' should return false.

But still the Linux driver doesn't receive any MSI-X interrupts.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries & Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

Re: [Qemu-devel] win7 bad i/o performance, high insn_emulation and exists

2012-02-21 Thread Peter Lieven


On 21.02.2012 12:00, Gleb Natapov wrote:

On Tue, Feb 21, 2012 at 11:59:23AM +0100, Peter Lieven wrote:

On 21.02.2012 11:56, Gleb Natapov wrote:

On Tue, Feb 21, 2012 at 11:50:47AM +0100, Peter Lieven wrote:

I hope it will make Windows use TSC instead, but you can't be sure
about anything with Windows :(

Whatever it does now it eates more CPU has almost equal
number of exits and throughput is about the same (15MB/s).
If pmtimer is at 0xb008 it still reads it like hell.

I checked with bcedit /v that useplatformclock is set to "No".

Yeah, today I noticed that it is likely virtio drivers that hammer
on PM timer (at least rip of the instruction that access it is
very close to rip of the instruction that access virtio pio).
Vadim, Windows driver developer,  is CCed.

Ok, I will switch to IDE and e1000 to confirm this? Or does it not
make sense?


It make perfect sense! Please try it.

~10MB/s. still a lot of 0xb008 reads.

efer_reload0 0
exits4389875 72341
fpu_reload 36729   342
halt_exits206204  3451
halt_wakeup   212953  3474
host_state_reload2976799 59043
hypercalls 0 0
insn_emulation   2936091 54921
insn_emulation_fail0 0
invlpg 0 0
io_exits 1821386 14108
irq_exits  81999  2798
irq_injections343720  8560
irq_window 12712   153
largepages   754 2
mmio_exits37 0
mmu_cache_miss   148 0
mmu_flooded0 0
mmu_pde_zapped 0 0
mmu_pte_updated0 0
mmu_pte_write  0 0
mmu_recycled   0 0
mmu_shadow_zapped189 0
mmu_unsync 0 0
nmi_injections 0 0
nmi_window 0 0
pf_fixed  13946121
pf_guest   0 0
remote_tlb_flush 248 0
request_irq0 0
signal_exits   0 0
tlb_flush  15366 0

trace at http://82.141.21.156/report3.txt.gz

Peter

--
Gleb.

[Qemu-devel] [PATCH 8/9] qxl: move ram size init to new function

2012-02-21 Thread Gerd Hoffmann

Factor memory bar sizing bits out to a separate function.

Signed-off-by: Gerd Hoffmann 
---
 hw/qxl.c |   41 ++---
 1 files changed, 22 insertions(+), 19 deletions(-)

diff --git a/hw/qxl.c b/hw/qxl.c
index 4de4b8d..38bb90e 100644
--- a/hw/qxl.c
+++ b/hw/qxl.c
@@ -1554,6 +1554,25 @@ static DisplayChangeListener display_listener = {
 .dpy_refresh = display_refresh,
 };
 
+static void qxl_init_ramsize(PCIQXLDevice *qxl, uint32_t ram_min_mb)
+{
+/* vga ram (bar 0) */
+if (qxl->vga.vram_size < ram_min_mb * 1024 * 1024) {
+qxl->vga.vram_size = ram_min_mb * 1024 * 1024;
+}
+
+/* vram (surfaces, bar 1) */
+if (qxl->vram_size < 4096) {
+qxl->vram_size = 4096;
+}
+if (qxl->revision == 1) {
+qxl->vram_size = 4096;
+}
+
+qxl->vga.vram_size = msb_mask(qxl->vga.vram_size * 2 - 1);
+qxl->vram_size = msb_mask(qxl->vram_size * 2 - 1);
+}
+
 static int qxl_init_common(PCIQXLDevice *qxl)
 {
 uint8_t* config = qxl->pci.config;
@@ -1592,13 +1611,6 @@ static int qxl_init_common(PCIQXLDevice *qxl)
 init_qxl_rom(qxl);
 init_qxl_ram(qxl);
 
-if (qxl->vram_size < 4096) {
-qxl->vram_size = 4096;
-}
-if (qxl->revision == 1) {
-qxl->vram_size = 4096;
-}
-qxl->vram_size = msb_mask(qxl->vram_size * 2 - 1);
 memory_region_init_ram(&qxl->vram_bar, "qxl.vram", qxl->vram_size);
 vmstate_register_ram(&qxl->vram_bar, &qxl->pci.qdev);
 
@@ -1641,15 +1653,11 @@ static int qxl_init_primary(PCIDevice *dev)
 {
 PCIQXLDevice *qxl = DO_UPCAST(PCIQXLDevice, pci, dev);
 VGACommonState *vga = &qxl->vga;
-ram_addr_t ram_size = msb_mask(qxl->vga.vram_size * 2 - 1);
 PortioList *qxl_vga_port_list = g_new(PortioList, 1);
 
 qxl->id = 0;
-
-if (ram_size < 32 * 1024 * 1024) {
-ram_size = 32 * 1024 * 1024;
-}
-vga_common_init(vga, ram_size);
+qxl_init_ramsize(qxl, 32);
+vga_common_init(vga, qxl->vga.vram_size);
 vga_init(vga, pci_address_space(dev), pci_address_space_io(dev), false);
 portio_list_init(qxl_vga_port_list, qxl_vga_portio_list, vga, "vga");
 portio_list_add(qxl_vga_port_list, pci_address_space_io(dev), 0x3b0);
@@ -1668,14 +1676,9 @@ static int qxl_init_secondary(PCIDevice *dev)
 {
 static int device_id = 1;
 PCIQXLDevice *qxl = DO_UPCAST(PCIQXLDevice, pci, dev);
-ram_addr_t ram_size = msb_mask(qxl->vga.vram_size * 2 - 1);
 
 qxl->id = device_id++;
-
-if (ram_size < 16 * 1024 * 1024) {
-ram_size = 16 * 1024 * 1024;
-}
-qxl->vga.vram_size = ram_size;
+qxl_init_ramsize(qxl, 16);
 memory_region_init_ram(&qxl->vga.vram, "qxl.vgavram", qxl->vga.vram_size);
 vmstate_register_ram(&qxl->vga.vram, &qxl->pci.qdev);
 qxl->vga.vram_ptr = memory_region_get_ram_ptr(&qxl->vga.vram);
-- 
1.7.1

Re: [Qemu-devel] [PATCH V14 2/7] Add TPM (frontend) hardware interface (TPM TIS) to Qemu

2012-02-21 Thread Stefan Berger


On 02/20/2012 10:18 PM, Michael S. Tsirkin wrote:

On Mon, Feb 20, 2012 at 07:43:05PM -0500, Stefan Berger wrote:

On 02/20/2012 05:02 PM, Michael S. Tsirkin wrote:

On Wed, Dec 14, 2011 at 08:43:17AM -0500, Stefan Berger wrote:

+/*
+ * Send a TPM request.
+ * Call this with the state_lock held so we can sync with the receive
+ * callback.
+ */
+static void tpm_tis_tpm_send(TPMState *s, uint8_t locty)
+{
+TPMTISState *tis =&s->s.tis;
+
+tpm_tis_show_buffer(&tis->loc[locty].w_buffer, "tpm_tis: To TPM");
+
+s->command_locty = locty;
+s->cmd_locty =&tis->loc[locty];
+
+/* w_offset serves as length indicator for length of data;
+   it's reset when the response comes back */
+tis->loc[locty].status = TPM_TIS_STATUS_EXECUTION;
+tis->loc[locty].sts&= ~TPM_TIS_STS_EXPECT;
+
+s->to_tpm_execute = true;
+qemu_cond_signal(&s->to_tpm_cond);
+}

What happens IIUC is that frondend sets to_tpm_execute
and signals a condition, and backend clears it
and waits on a condition.

So how about moving all the signalling
and locking out to backend, and have frontend
invoke a callback to signal it?

The whole threading thing then becomes a work-around
for a backend that does not support select,
instead of spilling out into frontend?


How do I get the lock calls (qemu_mutex_lock(&s->state_lock)) out of
the frontend? Do you want me to add callbacks to the backend
interface for locking (s->be_driver->ops->state_lock(s)) and one for
unlocking (s->be_driver->ops->state_unlock(tpm_be)) of the state
that really belongs to the front-end (state is 's') and invoke it as
shown in parenthesis and still keep s->state_lock around? Ideally
the locks would end up being 'nop's' if select() was available, but
in the end all backend will need to support that lock.

[The lock protects the common structure so that the thread in the
backend can deliver the response to a request while the OS for
example polls the hardware interface for its current state.]


Stefan


Well, this is just an idea, please do not take this as
a request or anything like that. Maybe it is a dumb one.

Maybe something like what you describe.


I am starting to wonder what we're trying to achieve? We have a 
producer-consumer problem here with different threads. Both threads need 
to have some locking constructs along with the signalling (condition). 
The backend needs to be written in a certain way to work with the 
frontend, locking and signalling is a part of this. So I don't see it 
makes much sense to move all that code around, especially since there is 
only one backend right now. Maybe something really great can be done 
once there is a 2nd backend.



Alternatively, I imagined that you can pass a copy
or pointer of the necessary state to the backend,
which queues the command and wakes the worker.
In the reverse direction, backend queues a response
and when OS polls you dequeue it and update state.



The OS doesn't necessarily need to poll. It is just one mode of 
operation of the OS, the other being interrupt-driven where the backend 
raises the interrupt once it has delivered the response to the frontend.



   Stefan


Can this work?

Re: [Qemu-devel] win7 bad i/o performance, high insn_emulation and exists

2012-02-21 Thread Peter Lieven


On 21.02.2012 11:56, Gleb Natapov wrote:

On Tue, Feb 21, 2012 at 11:50:47AM +0100, Peter Lieven wrote:

I hope it will make Windows use TSC instead, but you can't be sure
about anything with Windows :(

Whatever it does now it eates more CPU has almost equal
number of exits and throughput is about the same (15MB/s).
If pmtimer is at 0xb008 it still reads it like hell.

I checked with bcedit /v that useplatformclock is set to "No".

Yeah, today I noticed that it is likely virtio drivers that hammer
on PM timer (at least rip of the instruction that access it is
very close to rip of the instruction that access virtio pio).
Vadim, Windows driver developer,  is CCed.
Ok, I will switch to IDE and e1000 to confirm this? Or does it not make 
sense?


Peter

[Qemu-devel] [PATCH 2/9] qxl: don't render stuff when the vm is stopped.

2012-02-21 Thread Gerd Hoffmann

This patch fixes the local qxl renderer to not kick spice-server
in case the vm is stopped.  First it is largely pointless because
we ask spice-server to process all not-yet processed commands when
the vm is stopped, so there isn't much do do anyway.  Second we
avoid triggering an assert in spice-server.

The patch makes sure we still honor redraw requests, even if we don't
ask spice-server for updates.  This is needed to handle displaysurface
changes with a stopped vm correctly.

With this patch applied it is possible to take screen shots (via
screendump monitor command) from a qxl gpu even in case the guest
is stopped.

Signed-off-by: Gerd Hoffmann 
---
 hw/qxl-render.c |   12 +---
 1 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/hw/qxl-render.c b/hw/qxl-render.c
index 133d093..7084143 100644
--- a/hw/qxl-render.c
+++ b/hw/qxl-render.c
@@ -121,19 +121,17 @@ void qxl_render_update(PCIQXLDevice *qxl)
 dpy_resize(vga->ds);
 }
 
-if (!qxl->guest_primary.commands) {
-return;
-}
-qxl->guest_primary.commands = 0;
-
 update.left   = 0;
 update.right  = qxl->guest_primary.surface.width;
 update.top= 0;
 update.bottom = qxl->guest_primary.surface.height;
 
 memset(dirty, 0, sizeof(dirty));
-qxl_spice_update_area(qxl, 0, &update,
-  dirty, ARRAY_SIZE(dirty), 1, QXL_SYNC);
+if (runstate_is_running() && qxl->guest_primary.commands) {
+qxl->guest_primary.commands = 0;
+qxl_spice_update_area(qxl, 0, &update,
+  dirty, ARRAY_SIZE(dirty), 1, QXL_SYNC);
+}
 if (redraw) {
 memset(dirty, 0, sizeof(dirty));
 dirty[0] = update;
-- 
1.7.1

Re: [Qemu-devel] BlockDriverState stack and BlockListeners

2012-02-21 Thread Paolo Bonzini

On 02/21/2012 11:51 AM, Kevin Wolf wrote:
> And even protocols and protocols don't. Compare file to blkdebug, for
> example. In fact, blkdebug and blkverify are already very close to what
> BlockListeners would be.

Yes, and I think considering blkdebug and blkverify help in the design.
They provide the difference between views and listeners: listeners can
be applied to both a protocol and a view, while views can only be
applied to a format.

> > - Protocols -- Protocols tell you where to get the raw bits.
> > - Formats -- Formats transform those raw bits into a block device.
> > - Views -- Views can move from a format to another.  A format can use a
> > default view implementation, or provide its own (e.g. to access
> > different snapshots).
> > - Listeners -- I think a view can have-a listener?
> 
> Where protocols, formats and listeners are-a image (Not the best name,
> I'm open for suggestions. Something like "BDS stack building block"
> would be most accurate...). Or actually not is-a in terms of
> inheritance, but I think it would be the very same thing without any
> subclassing, implemented by a BlockDriver.
> 
> > with the following relationship:
> > 
> > - A format has-a protocol for the raw bits and has-a view for the
> > backing file.
> 
> An image has-a image from which it takes its data (bs->file).

No. A protocol has neither an image below it, nor a backing file.  I
think a view has no backing file either (except as provided by the
format).  I'm not sure that a listener can have a backing file, either;
you could say that blkverify has one, but it could just as well have
three or four, so it's not the same thing.

A format does not do much more than create views, do snapshots, and hold
state that is common to all of its views.

So, let's put aside listeners for a moment, and consider this
alternative hierarchy:

BlockDriver
Protocol
FileProtocol
...
View
QCOW2View
RawView
...
BlockFormat
QCOW2Format
RawFormat
...

Now we have to figure out how to fit listeners in this picture.

> And it has-a view for the backing file, yes. Both could be a listener.
> 
>> > - A view has-a format, a device has-a view.
>> > - A view can have-a listener?  Or is it formats?
> A view has-a image. This may happen to be a listener, which in turn
> has-a image (could be another listener, a format or a protocol).
> 
> The question is what the semantics is with live snapshots (there are
> probably similar problems, but this is the obvious one). For example we
> could now have:
> 
> mirror -> qcow2 -> blkdebug -> file

Let's be precise here:

mirror -> currentSnapshot -> qcow2 -> blkdebug -> file

- file is a protocol.

- blkdebug is a listener

- qcow2 is a format

- currentSnapshot is a view

- mirror is a listener

The difference between blkdebug/mirror on one side, and currentSnapshot
on the other, is that (as you said) blkdebug/mirror are always stacked
on top of something else. A driver that references currentSnapshot
actually gets mirror.

So we get to the actual hierarchy I'm proposing:

BlockDriver
BlockSource (see below)
Protocol (bdrv_file_open)
FileProtocol
...
View (has-a BlockFormat)
QCOW2View
RawView
...
BlockListener (has-a BlockDriver)
MirrorListener
BlkDebugListener
BlockFormat (has-a BlockSource)
QCOW2Format
RawFormat
...

Protocols and views are only internal.  Formats and devices in practice
will only ever see BlockSources.  A BlockSource is a reference a stack of
BlockDrivers, where the base must be a protocol or view and there can
be a number of listeners stacked on top of it.  Listeners can be
added or removed from the stack, and the bottom driver can be swapped
for another (for snapshots).

So, here is how it would look:

  .== BlockSource ==.   .== BlockSource ===.
  | MirrorListener  |   | BlkDebugListener |
  | QCOW2View --+--> QCOW2Format -> | FileProtocol |
  '='   '=='

> There are two listeners here, mirror and blkdebug. (Things like blkdebug
> are why view has-a listener isn't enough). After creating an external
> snapshot, we expect the graph to look like this (the arrow down is the
> backing file):
> 
> mirror -> qcow2 -> file
> |
> +-> qcow2 -> blkdebug -> file

And here:

  .== BlockSource ==.
  | MirrorListener  |   .== BlockSource ==.
  | QCOW2View --+--> QCOW2Format -> | FileProtocol|
  '='|  '='
 |  .== 
BlockSource ===.
 |   .== BlockSource ==.| 
BlkDebugListener |
 '-> | QCOW2View --+--> QCOW2Format --> | 
FileProtocol |
 '='
'=='

Does it

[Qemu-devel] [PATCH 7/9] qxl: drop vram bar minimum size

2012-02-21 Thread Gerd Hoffmann

There is no reason to require a minimum size of 16 MB for the vram.
Lower the limit to 4096 (one page).  Make it disapper completely would
break guests.
---
 hw/qxl.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/qxl.c b/hw/qxl.c
index 5c6b556..4de4b8d 100644
--- a/hw/qxl.c
+++ b/hw/qxl.c
@@ -1592,8 +1592,8 @@ static int qxl_init_common(PCIQXLDevice *qxl)
 init_qxl_rom(qxl);
 init_qxl_ram(qxl);
 
-if (qxl->vram_size < 16 * 1024 * 1024) {
-qxl->vram_size = 16 * 1024 * 1024;
+if (qxl->vram_size < 4096) {
+qxl->vram_size = 4096;
 }
 if (qxl->revision == 1) {
 qxl->vram_size = 4096;
-- 
1.7.1

[Qemu-devel] [PATCH 5/9] Add SPICE support to add_client monitor command

2012-02-21 Thread Gerd Hoffmann

From: Daniel P. Berrange 

With the acceptance of some new APIs to libspice-server.so it
is possible to add support for SPICE to the 'add_client'
monitor command, bringing parity with VNC. Since SPICE can
use TLS or plain connections, the command also gains a new
'tls' parameter to specify whether TLS should be attempted
on the injected client sockets.

This new feature is only enabled if building against a
libspice-server >= 0.10.1

* qmp-commands.hx: Add 'tls' parameter & missing doc for
  'skipauth' parameter
* monitor.c: Wire up SPICE for 'add_client' command
* ui/qemu-spice.h, ui/spice-core.c: Add qemu_spice_display_add_client
  API to wire up from monitor

[1] 
http://cgit.freedesktop.org/spice/spice/commit/server/spice.h?id=d55b68b6b44f2499278fa860fb47ff22f5011faa

http://cgit.freedesktop.org/spice/spice/commit/server/spice.h?id=bd07dde530d9504e1cfe7ed5837fc00c26f36716

Changes in v3:
 - Added 'optional' flag to new parameters documented
 - Added no-op impl of qemu_spice_display_add_client when
   SPICE is disabled during build

Signed-off-by: Daniel P. Berrange 
Signed-off-by: Gerd Hoffmann 
---
 monitor.c   |9 +++--
 qmp-commands.hx |6 --
 ui/qemu-spice.h |7 +++
 ui/spice-core.c |   13 +
 4 files changed, 31 insertions(+), 4 deletions(-)

diff --git a/monitor.c b/monitor.c
index aadbdcb..0d4daad 100644
--- a/monitor.c
+++ b/monitor.c
@@ -823,13 +823,18 @@ static int add_graphics_client(Monitor *mon, const QDict 
*qdict, QObject **ret_d
 CharDriverState *s;
 
 if (strcmp(protocol, "spice") == 0) {
+int fd = monitor_get_fd(mon, fdname);
+int skipauth = qdict_get_try_bool(qdict, "skipauth", 0);
+int tls = qdict_get_try_bool(qdict, "tls", 0);
 if (!using_spice) {
 /* correct one? spice isn't a device ,,, */
 qerror_report(QERR_DEVICE_NOT_ACTIVE, "spice");
 return -1;
 }
-   qerror_report(QERR_ADD_CLIENT_FAILED);
-   return -1;
+if (qemu_spice_display_add_client(fd, skipauth, tls) < 0) {
+close(fd);
+}
+return 0;
 #ifdef CONFIG_VNC
 } else if (strcmp(protocol, "vnc") == 0) {
int fd = monitor_get_fd(mon, fdname);
diff --git a/qmp-commands.hx b/qmp-commands.hx
index b5e2ab8..dee95f1 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -910,8 +910,8 @@ EQMP
 
 {
 .name   = "add_client",
-.args_type  = "protocol:s,fdname:s,skipauth:b?",
-.params = "protocol fdname skipauth",
+.args_type  = "protocol:s,fdname:s,skipauth:b?,tls:b?",
+.params = "protocol fdname skipauth tls",
 .help   = "add a graphics client",
 .user_print = monitor_user_noop,
 .mhandler.cmd_new = add_graphics_client,
@@ -927,6 +927,8 @@ Arguments:
 
 - "protocol": protocol name (json-string)
 - "fdname": file descriptor name (json-string)
+- "skipauth": whether to skip authentication (json-bool, optional)
+- "tls": whether to perform TLS (json-bool, optional)
 
 Example:
 
diff --git a/ui/qemu-spice.h b/ui/qemu-spice.h
index c35b29c..680206a 100644
--- a/ui/qemu-spice.h
+++ b/ui/qemu-spice.h
@@ -33,6 +33,7 @@ void qemu_spice_init(void);
 void qemu_spice_input_init(void);
 void qemu_spice_audio_init(void);
 void qemu_spice_display_init(DisplayState *ds);
+int qemu_spice_display_add_client(int csock, int skipauth, int tls);
 int qemu_spice_add_interface(SpiceBaseInstance *sin);
 int qemu_spice_set_passwd(const char *passwd,
   bool fail_if_connected, bool 
disconnect_if_connected);
@@ -68,6 +69,12 @@ static inline int qemu_spice_migrate_info(const char *h, int 
p, int t,
 return -1;
 }
 
+static inline int qemu_spice_display_add_client(int csock, int skipauth,
+int tls)
+{
+return -1;
+}
+
 #endif /* CONFIG_SPICE */
 
 #endif /* QEMU_SPICE_H */
diff --git a/ui/spice-core.c b/ui/spice-core.c
index b4629f8..05cb745 100644
--- a/ui/spice-core.c
+++ b/ui/spice-core.c
@@ -747,6 +747,19 @@ int qemu_spice_set_pw_expire(time_t expires)
 return qemu_spice_set_ticket(false, false);
 }
 
+int qemu_spice_display_add_client(int csock, int skipauth, int tls)
+{
+#if SPICE_SERVER_VERSION >= 0x000a01
+if (tls) {
+return spice_server_add_ssl_client(spice_server, csock, skipauth);
+} else {
+return spice_server_add_client(spice_server, csock, skipauth);
+}
+#else
+return -1;
+#endif
+}
+
 static void spice_register_config(void)
 {
 qemu_add_opts(&qemu_spice_opts);
-- 
1.7.1

[Qemu-devel] [PATCH 1/9] qxl: fix warnings on 32bit

2012-02-21 Thread Gerd Hoffmann

Signed-off-by: Gerd Hoffmann 
---
 hw/qxl.c |7 ---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/hw/qxl.c b/hw/qxl.c
index ac69125..f421a45 100644
--- a/hw/qxl.c
+++ b/hw/qxl.c
@@ -625,7 +625,7 @@ static void interface_release_resource(QXLInstance *sin,
 
 if (ext.group_id == MEMSLOT_GROUP_HOST) {
 /* host group -> vga mode update request */
-qemu_spice_destroy_update(&qxl->ssd, (void*)ext.info->id);
+qemu_spice_destroy_update(&qxl->ssd, (void *)(intptr_t)ext.info->id);
 return;
 }
 
@@ -748,7 +748,8 @@ static void interface_async_complete(QXLInstance *sin, 
uint64_t cookie)
 qxl->current_async = QXL_UNDEFINED_IO;
 qemu_mutex_unlock(&qxl->async_lock);
 
-dprint(qxl, 2, "async_complete: %d (%ld) done\n", current_async, cookie);
+dprint(qxl, 2, "async_complete: %d (%" PRId64 ") done\n",
+   current_async, cookie);
 switch (current_async) {
 case QXL_IO_CREATE_PRIMARY_ASYNC:
 qxl_create_guest_primary_complete(qxl);
@@ -1015,7 +1016,7 @@ void *qxl_phys2virt(PCIQXLDevice *qxl, QXLPHYSICAL pqxl, 
int group_id)
 
 switch (group_id) {
 case MEMSLOT_GROUP_HOST:
-return (void*)offset;
+return (void *)(intptr_t)offset;
 case MEMSLOT_GROUP_GUEST:
 PANIC_ON(slot >= NUM_MEMSLOTS);
 PANIC_ON(!qxl->guest_slots[slot].active);
-- 
1.7.1

[Qemu-devel] [PULL] spice patch queue

2012-02-21 Thread Gerd Hoffmann

  Hi,

Here is the spice patch queue with a collection of little improvements
and bugfixes.  No major stuff.  See individual patches for details.

please pull,
  Gerd

The following changes since commit 99c7f87826337fa81f2f0f9baa9ca0a44faf90e9:

  input: send kbd+mouse events only to running guests. (2012-02-17 11:02:55 
-0600)

are available in the git repository at:
  git://anongit.freedesktop.org/spice/qemu spice.v48

Daniel P. Berrange (1):
  Add SPICE support to add_client monitor command

Gerd Hoffmann (5):
  qxl: fix warnings on 32bit
  qxl: don't render stuff when the vm is stopped.
  qxl: drop vram bar minimum size
  qxl: move ram size init to new function
  qxl: add user-friendly bar size properties

Yonit Halperin (3):
  qxl: set only off-screen surfaces dirty instead of the whole vram
  qxl: make sure primary surface is saved on migration also in compat mode
  spice: support ipv6 channel address in monitor events and in spice info

 hw/qxl-render.c |   12 +++
 hw/qxl.c|  109 +++
 hw/qxl.h|4 ++
 monitor.c   |9 -
 qmp-commands.hx |6 ++-
 ui/qemu-spice.h |7 
 ui/spice-core.c |   50 +++---
 7 files changed, 150 insertions(+), 47 deletions(-)

Re: [Qemu-devel] win7 bad i/o performance, high insn_emulation and exists

2012-02-21 Thread Gleb Natapov

On Tue, Feb 21, 2012 at 12:16:16PM +0100, Peter Lieven wrote:
> On 21.02.2012 12:00, Gleb Natapov wrote:
> >On Tue, Feb 21, 2012 at 11:59:23AM +0100, Peter Lieven wrote:
> >>On 21.02.2012 11:56, Gleb Natapov wrote:
> >>>On Tue, Feb 21, 2012 at 11:50:47AM +0100, Peter Lieven wrote:
> >I hope it will make Windows use TSC instead, but you can't be sure
> >about anything with Windows :(
> Whatever it does now it eates more CPU has almost equal
> number of exits and throughput is about the same (15MB/s).
> If pmtimer is at 0xb008 it still reads it like hell.
> 
> I checked with bcedit /v that useplatformclock is set to "No".
> >>>Yeah, today I noticed that it is likely virtio drivers that hammer
> >>>on PM timer (at least rip of the instruction that access it is
> >>>very close to rip of the instruction that access virtio pio).
> >>>Vadim, Windows driver developer,  is CCed.
> >>Ok, I will switch to IDE and e1000 to confirm this? Or does it not
> >>make sense?
> >>
> >It make perfect sense! Please try it.
> ~10MB/s. still a lot of 0xb008 reads.
> 
The same amount of reads essentially. So my theory is incorrect. Virtio
driver probably calls Windows function to do IO and the function
happens to be near the function that access PM timer.

I wonder why time stamps in your traces are so coarse-grained. What do
you see in /sys/bus/clocksource/devices/clocksource0/current_clocksource ?

--
Gleb.

[Qemu-devel] [PATCH 4/9] qxl: make sure primary surface is saved on migration also in compat mode

2012-02-21 Thread Gerd Hoffmann

From: Yonit Halperin 

RHBZ #790083

Signed-off-by: Yonit Halperin 
Signed-off-by: Gerd Hoffmann 
---
 hw/qxl.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/hw/qxl.c b/hw/qxl.c
index b544f31..5c6b556 100644
--- a/hw/qxl.c
+++ b/hw/qxl.c
@@ -1471,7 +1471,7 @@ static void qxl_dirty_surfaces(PCIQXLDevice *qxl)
 intptr_t vram_start;
 int i;
 
-if (qxl->mode != QXL_MODE_NATIVE) {
+if (qxl->mode != QXL_MODE_NATIVE && qxl->mode != QXL_MODE_COMPAT) {
 return;
 }
 
-- 
1.7.1

[Qemu-devel] [PATCH 9/9] qxl: add user-friendly bar size properties

2012-02-21 Thread Gerd Hoffmann

Add two properties to specify bar sizes in megabytes instead of bytes,
which is alot more user-friendly.

Signed-off-by: Gerd Hoffmann 
---
 hw/qxl.c |8 
 hw/qxl.h |4 
 2 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/hw/qxl.c b/hw/qxl.c
index 38bb90e..87ad49a 100644
--- a/hw/qxl.c
+++ b/hw/qxl.c
@@ -1557,11 +1557,17 @@ static DisplayChangeListener display_listener = {
 static void qxl_init_ramsize(PCIQXLDevice *qxl, uint32_t ram_min_mb)
 {
 /* vga ram (bar 0) */
+if (qxl->ram_size_mb != -1) {
+qxl->vga.vram_size = qxl->ram_size_mb * 1024 * 1024;
+}
 if (qxl->vga.vram_size < ram_min_mb * 1024 * 1024) {
 qxl->vga.vram_size = ram_min_mb * 1024 * 1024;
 }
 
 /* vram (surfaces, bar 1) */
+if (qxl->vram_size_mb != -1) {
+qxl->vram_size = qxl->vram_size_mb * 1024 * 1024;
+}
 if (qxl->vram_size < 4096) {
 qxl->vram_size = 4096;
 }
@@ -1860,6 +1866,8 @@ static Property qxl_properties[] = {
 DEFINE_PROP_UINT32("debug", PCIQXLDevice, debug, 0),
 DEFINE_PROP_UINT32("guestdebug", PCIQXLDevice, guestdebug, 0),
 DEFINE_PROP_UINT32("cmdlog", PCIQXLDevice, cmdlog, 0),
+DEFINE_PROP_UINT32("ram_size_mb",  PCIQXLDevice, ram_size_mb, -1),
+DEFINE_PROP_UINT32("vram_size_mb", PCIQXLDevice, vram_size_mb, -1),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/qxl.h b/hw/qxl.h
index 766aa6d..d062991 100644
--- a/hw/qxl.h
+++ b/hw/qxl.h
@@ -89,6 +89,10 @@ typedef struct PCIQXLDevice {
 
 /* io bar */
 MemoryRegion   io_bar;
+
+/* user-friendly properties (in megabytes) */
+uint32_t  ram_size_mb;
+uint32_t  vram_size_mb;
 } PCIQXLDevice;
 
 #define PANIC_ON(x) if ((x)) { \
-- 
1.7.1

[Qemu-devel] [PATCH 3/9] qxl: set only off-screen surfaces dirty instead of the whole vram

2012-02-21 Thread Gerd Hoffmann

From: Yonit Halperin 

We used to assure the guest surfaces were saved before migration by
setting the whole vram dirty. This patch sets dirty only the areas
that are actually used in the vram.

Signed-off-by: Yonit Halperin 
Signed-off-by: Gerd Hoffmann 
---
 hw/qxl.c |   53 -
 1 files changed, 44 insertions(+), 9 deletions(-)

diff --git a/hw/qxl.c b/hw/qxl.c
index f421a45..b544f31 100644
--- a/hw/qxl.c
+++ b/hw/qxl.c
@@ -1007,7 +1007,7 @@ static void qxl_reset_surfaces(PCIQXLDevice *d)
 qxl_spice_destroy_surfaces(d, QXL_SYNC);
 }
 
-/* called from spice server thread context only */
+/* can be also called from spice server thread context */
 void *qxl_phys2virt(PCIQXLDevice *qxl, QXLPHYSICAL pqxl, int group_id)
 {
 uint64_t phys   = le64_to_cpu(pqxl);
@@ -1466,6 +1466,46 @@ static void qxl_hw_text_update(void *opaque, 
console_ch_t *chardata)
 }
 }
 
+static void qxl_dirty_surfaces(PCIQXLDevice *qxl)
+{
+intptr_t vram_start;
+int i;
+
+if (qxl->mode != QXL_MODE_NATIVE) {
+return;
+}
+
+/* dirty the primary surface */
+qxl_set_dirty(&qxl->vga.vram, qxl->shadow_rom.draw_area_offset,
+  qxl->shadow_rom.surface0_area_size);
+
+vram_start =  (intptr_t)memory_region_get_ram_ptr(&qxl->vram_bar);
+
+/* dirty the off-screen surfaces */
+for (i = 0; i < NUM_SURFACES; i++) {
+QXLSurfaceCmd *cmd;
+intptr_t surface_offset;
+int surface_size;
+
+if (qxl->guest_surfaces.cmds[i] == 0) {
+continue;
+}
+
+cmd = qxl_phys2virt(qxl, qxl->guest_surfaces.cmds[i],
+MEMSLOT_GROUP_GUEST);
+assert(cmd->type == QXL_SURFACE_CMD_CREATE);
+surface_offset = (intptr_t)qxl_phys2virt(qxl,
+ cmd->u.surface_create.data,
+ MEMSLOT_GROUP_GUEST);
+surface_offset -= vram_start;
+surface_size = cmd->u.surface_create.height *
+   abs(cmd->u.surface_create.stride);
+dprint(qxl, 3, "%s: dirty surface %d, offset %d, size %d\n", __func__,
+   i, (int)surface_offset, surface_size);
+qxl_set_dirty(&qxl->vram_bar, surface_offset, surface_size);
+}
+}
+
 static void qxl_vm_change_state_handler(void *opaque, int running,
 RunState state)
 {
@@ -1479,14 +1519,9 @@ static void qxl_vm_change_state_handler(void *opaque, 
int running,
  * called
  */
  qxl_update_irq(qxl);
-} else if (qxl->mode == QXL_MODE_NATIVE) {
-/* dirty all vram (which holds surfaces) and devram (primary surface)
- * to make sure they are saved */
-/* FIXME #1: should go out during "live" stage */
-/* FIXME #2: we only need to save the areas which are actually used */
-qxl_set_dirty(&qxl->vram_bar, 0, qxl->vram_size);
-qxl_set_dirty(&qxl->vga.vram, qxl->shadow_rom.draw_area_offset,
-  qxl->shadow_rom.surface0_area_size);
+} else {
+/* make sure surfaces are saved before migration */
+qxl_dirty_surfaces(qxl);
 }
 }
 
-- 
1.7.1

Re: [Qemu-devel] win7 bad i/o performance, high insn_emulation and exists

2012-02-21 Thread Peter Lieven


On 21.02.2012 12:46, Gleb Natapov wrote:

On Tue, Feb 21, 2012 at 12:16:16PM +0100, Peter Lieven wrote:

On 21.02.2012 12:00, Gleb Natapov wrote:

On Tue, Feb 21, 2012 at 11:59:23AM +0100, Peter Lieven wrote:

On 21.02.2012 11:56, Gleb Natapov wrote:

On Tue, Feb 21, 2012 at 11:50:47AM +0100, Peter Lieven wrote:

I hope it will make Windows use TSC instead, but you can't be sure
about anything with Windows :(

Whatever it does now it eates more CPU has almost equal
number of exits and throughput is about the same (15MB/s).
If pmtimer is at 0xb008 it still reads it like hell.

I checked with bcedit /v that useplatformclock is set to "No".

Yeah, today I noticed that it is likely virtio drivers that hammer
on PM timer (at least rip of the instruction that access it is
very close to rip of the instruction that access virtio pio).
Vadim, Windows driver developer,  is CCed.

Ok, I will switch to IDE and e1000 to confirm this? Or does it not
make sense?


It make perfect sense! Please try it.

~10MB/s. still a lot of 0xb008 reads.


The same amount of reads essentially. So my theory is incorrect. Virtio
driver probably calls Windows function to do IO and the function
happens to be near the function that access PM timer.

I wonder why time stamps in your traces are so coarse-grained. What do
you see in /sys/bus/clocksource/devices/clocksource0/current_clocksource ?


its set to acpi_pm on the host. we changed that from tsc (choosen by 
kernel) after we encountered
a kernel bug which ooops all hosts after approx. 270 days uptime. 
(https://lkml.org/lkml/2011/7/21/343).


i am not sure if this is fixed in 2.6.38 or later kernels and we could 
go back to tsc.


for testing i already checked this, but it doesn't give better performance.

peter



--
Gleb.

Re: [Qemu-devel] [PATCH V14 2/7] Add TPM (frontend) hardware interface (TPM TIS) to Qemu

2012-02-21 Thread Michael S. Tsirkin

On Tue, Feb 21, 2012 at 06:19:26AM -0500, Stefan Berger wrote:
> On 02/20/2012 10:18 PM, Michael S. Tsirkin wrote:
> >On Mon, Feb 20, 2012 at 07:43:05PM -0500, Stefan Berger wrote:
> >>On 02/20/2012 05:02 PM, Michael S. Tsirkin wrote:
> >>>On Wed, Dec 14, 2011 at 08:43:17AM -0500, Stefan Berger wrote:
> +/*
> + * Send a TPM request.
> + * Call this with the state_lock held so we can sync with the receive
> + * callback.
> + */
> +static void tpm_tis_tpm_send(TPMState *s, uint8_t locty)
> +{
> +TPMTISState *tis =&s->s.tis;
> +
> +tpm_tis_show_buffer(&tis->loc[locty].w_buffer, "tpm_tis: To TPM");
> +
> +s->command_locty = locty;
> +s->cmd_locty =&tis->loc[locty];
> +
> +/* w_offset serves as length indicator for length of data;
> +   it's reset when the response comes back */
> +tis->loc[locty].status = TPM_TIS_STATUS_EXECUTION;
> +tis->loc[locty].sts&= ~TPM_TIS_STS_EXPECT;
> +
> +s->to_tpm_execute = true;
> +qemu_cond_signal(&s->to_tpm_cond);
> +}
> >>>What happens IIUC is that frondend sets to_tpm_execute
> >>>and signals a condition, and backend clears it
> >>>and waits on a condition.
> >>>
> >>>So how about moving all the signalling
> >>>and locking out to backend, and have frontend
> >>>invoke a callback to signal it?
> >>>
> >>>The whole threading thing then becomes a work-around
> >>>for a backend that does not support select,
> >>>instead of spilling out into frontend?
> >>>
> >>How do I get the lock calls (qemu_mutex_lock(&s->state_lock)) out of
> >>the frontend? Do you want me to add callbacks to the backend
> >>interface for locking (s->be_driver->ops->state_lock(s)) and one for
> >>unlocking (s->be_driver->ops->state_unlock(tpm_be)) of the state
> >>that really belongs to the front-end (state is 's') and invoke it as
> >>shown in parenthesis and still keep s->state_lock around? Ideally
> >>the locks would end up being 'nop's' if select() was available, but
> >>in the end all backend will need to support that lock.
> >>
> >>[The lock protects the common structure so that the thread in the
> >>backend can deliver the response to a request while the OS for
> >>example polls the hardware interface for its current state.]
> >>
> >>
> >>Stefan
> >
> >Well, this is just an idea, please do not take this as
> >a request or anything like that. Maybe it is a dumb one.
> >
> >Maybe something like what you describe.
> 
> I am starting to wonder what we're trying to achieve? We have a
> producer-consumer problem here with different threads. Both threads
> need to have some locking constructs along with the signalling
> (condition). The backend needs to be written in a certain way to
> work with the frontend, locking and signalling is a part of this. So
> I don't see it makes much sense to move all that code around,
> especially since there is only one backend right now. Maybe
> something really great can be done once there is a 2nd backend.

There are three reasons I think where I think code
could be improved:

1. Your backend does not expose a reentrant asynchronous API,
   but another backend might.
   So it might be a better idea to hide this detail, and build a
   reentrant asynchronous API on top of what the OS supplies.
2. Your backend looks into the frontend data structures.
   This will make it impossible to implement another frontend.
3. I personally find it very hard to follow inter-thread
   communication based on shared memory and conditions
   if it is spread around between 2 different patches
   and different files. This can alternatively be addressed
   by documenting the synchronization/locking strategy.


> >Alternatively, I imagined that you can pass a copy
> >or pointer of the necessary state to the backend,
> >which queues the command and wakes the worker.
> >In the reverse direction, backend queues a response
> >and when OS polls you dequeue it and update state.
> >
> 
> The OS doesn't necessarily need to poll. It is just one mode of
> operation of the OS, the other being interrupt-driven where the
> backend raises the interrupt once it has delivered the response to
> the frontend.
> 
> 
>Stefan

So you will also need to signal the frontend when it
must interrupt the guest. This is not a problem,
for example you can use a qemu_eventfd object for this.

> 
> >Can this work?

Re: [Qemu-devel] BlockDriverState stack and BlockListeners

2012-02-21 Thread Stefan Hajnoczi

On Tue, Feb 21, 2012 at 11:36 AM, Paolo Bonzini  wrote:
> On 02/21/2012 11:51 AM, Kevin Wolf wrote:
>> And even protocols and protocols don't. Compare file to blkdebug, for
>> example. In fact, blkdebug and blkverify are already very close to what
>> BlockListeners would be.
>
> Yes, and I think considering blkdebug and blkverify help in the design.
> They provide the difference between views and listeners: listeners can
> be applied to both a protocol and a view, while views can only be
> applied to a format.
>
>> > - Protocols -- Protocols tell you where to get the raw bits.
>> > - Formats -- Formats transform those raw bits into a block device.
>> > - Views -- Views can move from a format to another.  A format can use a
>> > default view implementation, or provide its own (e.g. to access
>> > different snapshots).
>> > - Listeners -- I think a view can have-a listener?
>>
>> Where protocols, formats and listeners are-a image (Not the best name,
>> I'm open for suggestions. Something like "BDS stack building block"
>> would be most accurate...). Or actually not is-a in terms of
>> inheritance, but I think it would be the very same thing without any
>> subclassing, implemented by a BlockDriver.
>>
>> > with the following relationship:
>> >
>> > - A format has-a protocol for the raw bits and has-a view for the
>> > backing file.
>>
>> An image has-a image from which it takes its data (bs->file).
>
> No. A protocol has neither an image below it, nor a backing file.  I
> think a view has no backing file either (except as provided by the
> format).  I'm not sure that a listener can have a backing file, either;
> you could say that blkverify has one, but it could just as well have
> three or four, so it's not the same thing.
>
> A format does not do much more than create views, do snapshots, and hold
> state that is common to all of its views.
>
> So, let's put aside listeners for a moment, and consider this
> alternative hierarchy:
>
> BlockDriver
>    Protocol
>        FileProtocol
>        ...
>    View
>        QCOW2View
>        RawView
>        ...
> BlockFormat
>    QCOW2Format
>    RawFormat
>    ...
>
> Now we have to figure out how to fit listeners in this picture.
>
>> And it has-a view for the backing file, yes. Both could be a listener.
>>
>>> > - A view has-a format, a device has-a view.
>>> > - A view can have-a listener?  Or is it formats?
>> A view has-a image. This may happen to be a listener, which in turn
>> has-a image (could be another listener, a format or a protocol).
>>
>> The question is what the semantics is with live snapshots (there are
>> probably similar problems, but this is the obvious one). For example we
>> could now have:
>>
>> mirror -> qcow2 -> blkdebug -> file
>
> Let's be precise here:
>
> mirror -> currentSnapshot -> qcow2 -> blkdebug -> file
>
> - file is a protocol.
>
> - blkdebug is a listener
>
> - qcow2 is a format
>
> - currentSnapshot is a view
>
> - mirror is a listener
>
> The difference between blkdebug/mirror on one side, and currentSnapshot
> on the other, is that (as you said) blkdebug/mirror are always stacked
> on top of something else. A driver that references currentSnapshot
> actually gets mirror.
>
> So we get to the actual hierarchy I'm proposing:
>
> BlockDriver
>    BlockSource (see below)
>    Protocol (bdrv_file_open)
>        FileProtocol
>        ...
>    View (has-a BlockFormat)
>        QCOW2View
>        RawView
>        ...
>    BlockListener (has-a BlockDriver)
>        MirrorListener
>        BlkDebugListener
> BlockFormat (has-a BlockSource)
>    QCOW2Format
>    RawFormat
>    ...
>
> Protocols and views are only internal.  Formats and devices in practice
> will only ever see BlockSources.  A BlockSource is a reference a stack of
> BlockDrivers, where the base must be a protocol or view and there can
> be a number of listeners stacked on top of it.  Listeners can be
> added or removed from the stack, and the bottom driver can be swapped
> for another (for snapshots).
>
> So, here is how it would look:
>
>  .== BlockSource ==.                   .== BlockSource ===.
>  | MirrorListener  |                   | BlkDebugListener |
>  | QCOW2View --+--> QCOW2Format -> | FileProtocol     |
>  '='                   '=='
>
>
>> There are two listeners here, mirror and blkdebug. (Things like blkdebug
>> are why view has-a listener isn't enough). After creating an external
>> snapshot, we expect the graph to look like this (the arrow down is the
>> backing file):
>>
>> mirror -> qcow2 -> file
>>             |
>>             +-> qcow2 -> blkdebug -> file
>
> And here:
>
>  .== BlockSource ==.
>  | MirrorListener  |                   .== BlockSource ==.
>  | QCOW2View --+--> QCOW2Format -> | FileProtocol    |
>  '='    |              '='
>                         |                                          .== 
> BlockSource ===.
>                         |   .== BlockSource ==.

Re: [Qemu-devel] [Qemu-trivial] [PATCH] Fix dependency issue introduced by commit 7b93fadf3a38d1ed65ea5536a52efc2772c6e3b8

2012-02-21 Thread Stefan Hajnoczi

On Tue, Feb 21, 2012 at 10:12 AM, 陳韋任  wrote:
>  Commit 7b93fadf3a38d1ed65ea5536a52efc2772c6e3b8 adds basic version of bridge
> helper, but introduces dependency issue at the same time. While building 
> target
> xxx-linux-user, qemu-bridge-helper.c needs config-host.h which is not 
> generated
> at the moment. Building recurse-all target first fixes the problem.

The build system should not rely on ordering side-effects.  Instead,
please explicitly declare the dependency for the helper or see if the
dependency can be eliminated if it is inappropriate.

Stefan

Re: [Qemu-devel] [PATCH 1/2] arm: clean up GIC constants.

2012-02-21 Thread Peter Maydell

On 21 February 2012 02:33, Rusty Russell  wrote:
> Interrupts numbers 0-31 are private to the processor interface, 32-1019 are
> general interrups.  Add GIC_INTERNAL and substitute everywhere.
> @@ -73,8 +75,9 @@ typedef struct gic_irq_state
>  #define GIC_SET_TRIGGER(irq) s->irq_state[irq].trigger = 1
>  #define GIC_CLEAR_TRIGGER(irq) s->irq_state[irq].trigger = 0
>  #define GIC_TEST_TRIGGER(irq) s->irq_state[irq].trigger
> -#define GIC_GET_PRIORITY(irq, cpu) \
> -  (((irq) < 32) ? s->priority1[irq][cpu] : s->priority2[(irq) - 32])
> +#define GIC_GET_PRIORITY(irq, cpu) (((irq) < GIC_INTERNAL) ?           \
> +                                   s->priority1[irq][cpu] :            \
> +                                   s->priority2[(irq) - GIC_INTERNAL])

Hardcoded tabs are against coding style. In the interests of not taking
another round for what's really not a very significant patch, I've
fixed this (and the typo in the commit message) and put the fixed
result into arm-devs.next.

(http://git.linaro.org/gitweb?p=people/pmaydell/qemu-arm.git;a=shortlog;h=refs/heads/arm-devs.next)

-- PMM

Re: [Qemu-devel] [PATCH 2/2] arm: make sure that number of irqs can be represented in GICD_TYPER.

2012-02-21 Thread Peter Maydell

On 21 February 2012 02:33, Rusty Russell  wrote:
> We currently assume that the number of interrupts (ITLinesNumber in
> the architecture reference manual) is divisible by 32, since we
> present it to the guest when it reads GICD_TYPER (in gic_dist_readb())
> as (N / 32) - 1.
>
> Signed-off-by: Rusty Russell 

Reviewed-by: Peter Maydell 
and put into arm-devs.next.

-- PMM

Re: [Qemu-devel] [PATCH 2/3] qapi: Introduce blockdev-group-snapshot-sync command

2012-02-21 Thread Jeff Cody


On 02/20/2012 12:41 PM, Eric Blake wrote:

On 02/20/2012 10:31 AM, Jeff Cody wrote:

This is a QAPI/QMP only command to take a snapshot of a group of
devices. This is simlar to the blockdev-snapshot-sync command, except


s/simlar/similar/



Oops - fixed for v2.



blockdev-group-snapshot-sync accepts a list devices, filenames, and
formats.

It is attempted to keep the snapshot of the group atomic; if
any snapshot of a device in a given group fails, then the whole group
is reverted back to its original image, and error is reported.

This allows the group of disks to remain consistent with each other,
even across snapshot failures.

Signed-off-by: Jeff Cody
---
  blockdev.c   |  130 ++
  qapi-schema.json |   45 +++
  qmp-commands.hx  |   39 
  3 files changed, 214 insertions(+), 0 deletions(-)



+
+error_rollback:
+/* failure, undo everything as much as we can */
+QSIMPLEQ_FOREACH(snap_entry,&gsnp_list, entry) {
+if (snap_entry->has_pivoted) {
+ret = bdrv_open(snap_entry->bs, snap_entry->old_filename,
+snap_entry->flags, snap_entry->old_drv);
+if (ret != 0) {
+/* This is very very bad */
+error_set(errp, QERR_OPEN_FILE_FAILED,
+  snap_entry->old_filename);


Is there any way to reduce the likelihood of a rollback failure?


Good question.  Obviously, it is ideal for this to have the lowest 
possible likelihood of a rollback failure.  I guess the real question 
is: What are the events that would cause the original image re-open to 
fail, and how can we avoid them - or, as Kevin mentioned to me, do we 
never close the original file to avoid the re-open (and, will that avoid 
those failure events)?  Hopefully these are rare events, in any case.






+SQMP
+blockdev-group-snapshot-sync
+--
+
+Synchronous snapshot of one or more block devices.  A list array input
+is accepted, that contains the device, snapshot-file to be create as the
+target of the new image. If the file exists, or if it is a device, the
+snapshot will be created in the existing file/device. If does not
+exist, a new file will be created. format specifies the format of the
+snapshot image, default is qcow2.  On failure of any device, it is
+attempted to reopen the original image for all the devices that were
+specified.
+
+Arguments:
+
+- "device": device name to snapshot (json-string)
+- "snapshot-file": name of new image file (json-string)
+- "format": format of new image (json-string, optional)


Shouldn't this mention that the arguments is a JSON list, rather than a
single argument?


Yes, you are right - adding that in.




+
+Example:
+
+->  { "execute": "blockdev-group-snapshot-sync", "arguments": [{ "device": 
"ide-hd0",
+
"snapshot-file":
+
"/some/place/my-image",
+"format": 
"qcow2" },
+  { "device": 
"ide-hd1",
+
"snapshot-file":
+
"/some/place/my-image2",
+"format": 
"qcow2" } }


Are you missing a ']' before the final '}' here?



Yes, thanks.  Forgot that in the example.  Added for v2.

Jeff

Re: [Qemu-devel] BlockDriverState stack and BlockListeners

2012-02-21 Thread Paolo Bonzini

On 02/21/2012 01:22 PM, Stefan Hajnoczi wrote:
> This is a good discussion because BlockDriverState has become bloated
> and complex.  A lot of fields only apply to sub-cases and we should
> really split this struct.
> 
> Fields like "backing_file" *should* be in generic code, not duplicated
> in each Format.  But BlockDriverState is too generic since it also
> encompasses Protocols and Listeners.

Yes.

> You mentioned that some of these classes would be "internal".  I think
> everything should be exposed in the QOM just like Linux exposes kernel
> objects in sysfs.  It makes troubleshooting and debugging easier.

Yes, exposing in QOM makes sense.  But QOM can see all things internal
by design. :)  The question is more what to expose to the rest of QEMU.
 For me the answer would be: BlockSource to the device and monitor,
Format to the monitor only.

> As has been pointed out, "Listener" suggests a passive role.  Perhaps
> BlockFilter, BlockProcessor, or BlockModule is a better name.

BlockFilter sounds good.

> Ideally Formats can be isolated from the rest of the block layer so
> that it becomes easy to create a libimageformat.  If we bake
> coroutines, I/O APIs, and memory allocation functions too deeply into
> Formats then they are hard to test and impossible to use outside of
> QEMU.

I'm afraid that the only way to do this is to first replace coroutines
with threads.

Paolo

Re: [Qemu-devel] [PATCH v6 2/4] cadence_ttc: initial version of device model

2012-02-21 Thread Paul Brook

> > +static inline int64_t is_between(int64_t x, int64_t a, int64_t b)
> > +{
> > +if (a < b) {
> > +return x > a && x <= b;
> > +}
> > +return x < a && x >= b;
> > +}
> 
> This looks slightly odd -- should the boundary condition for whether
> a value equal to the max/min really change depending on :whether a
> or b is greater?

This is a ugly hack.  Instead of figuring out whether we have a count-up or 
count-down timer the code checks for both, and have the "in_between" function 
magically DTRT.  I haven't followed the paths through in enough detail to 
figure out whether it gets all the corner cases right.

Paul

Re: [Qemu-devel] BlockDriverState stack and BlockListeners

2012-02-21 Thread Kevin Wolf

Am 21.02.2012 12:36, schrieb Paolo Bonzini:
> And here:
> 
>   .== BlockSource ==.
>   | MirrorListener  |   .== BlockSource ==.
>   | QCOW2View --+--> QCOW2Format -> | FileProtocol|
>   '='|  '='
>  |  .== 
> BlockSource ===.
>  |   .== BlockSource ==.| 
> BlkDebugListener |
>  '-> | QCOW2View --+--> QCOW2Format --> | 
> FileProtocol |
>  '='
> '=='
> 
> Does it seem sane?

Yes, this (and the whole explanation that I didn't quote) looks good to
me. For now, at least, until I find the first example where it doesn't
work. Or maybe I can just trust Markus to bring something up.

>> The question is: Can we assume that any listeners that are on top of the
>> first format or protocol (i.e. those that would fit your model) should
>> move to the new top-level view? Or would it sometimes make sense to keep
>> it at the old one?
> 
> I think it depends, but both possibilities should be doable in this model.

Meh. :-)

Maybe we need to introduce something outside of the whole stack, an
entity that is referred to by the device (as in IDE, virtio-blk, ...)
and that refers to a stack of top-level listeners (which would be moved
to the new top-level BlockSource on live snapshot) and to the first
BlockSource (which can have more listeners, and those would stick with
the same BlockSource even if moves down the chain).

Oh, and just to open another can of worms: We should probably design in
the notion of media (which can be ejected etc.) and drives (which always
stay there). We don't have a clean separation today.

Kevin

Re: [Qemu-devel] BlockDriverState stack and BlockListeners

2012-02-21 Thread Paolo Bonzini

On 02/21/2012 02:10 PM, Kevin Wolf wrote:
>> > I think it depends, but both possibilities should be doable in this model.
> 
> Meh. :-)

Agreed. :)

> Maybe we need to introduce something outside of the whole stack, an
> entity that is referred to by the device (as in IDE, virtio-blk, ...)
> and that refers to a stack of top-level listeners (which would be moved
> to the new top-level BlockSource on live snapshot) and to the first
> BlockSource (which can have more listeners, and those would stick with
> the same BlockSource even if moves down the chain).

That would be a nested BlockSource.  I'm not sure why this is needed
though.  You can move BlockDrivers to another BlockSource down the chain
(stacking them on top of those that are there already), and leave the
upper BlockSource with the Protocol/View only.

> Oh, and just to open another can of worms: We should probably design in
> the notion of media (which can be ejected etc.) and drives (which always
> stay there). We don't have a clean separation today.

It is there: the drive is the BlockSource, the medium is the Protocol.
An empty drive is just another protocol.

Paolo

[Qemu-devel] [PATCH] pci: add accessors to get/set registers by mask

2012-02-21 Thread Michael S. Tsirkin

pci_regs.h specifies many registers by mask +
shifted register values.
There's always some duplication when using such:
for example to override device type, we would need:

pci_word_test_and_clear_mask(cap + PCI_EXP_FLAGS,
 PCI_EXP_FLAGS_TYPE);
pci_word_test_and_set_mask(cap + PCI_EXP_FLAGS,
PCI_EXP_TYPE_ENDPOINT << (ffs(PCI_EXP_FLAGS_TYPE) - 1));

Getting such registers also uses some duplication:

word = pci_get_word(cap + PCI_EXP_FLAGS) & PCI_EXP_FLAGS_TYPE;
if ((word >> ffs((PCI_EXP_FLAGS_TYPE) - 1)) == PCI_EXP_TYPE_ENDPOINT)

Add API to access such registers in one line:
pci_set_word_by_mask(cap + PCI_EXP_FLAGS, PCI_EXP_FLAGS_TYPE,
 PCI_EXP_TYPE_ENDPOINT)

and
word = pci_get_word_by_mask(cap + PCI_EXP_FLAGS, PCI_EXP_FLAGS_TYPE)
if (word == PCI_EXP_TYPE_ENDPOINT)

Signed-off-by: Michael S. Tsirkin 
---
 hw/pci.h |   61 +
 1 files changed, 61 insertions(+), 0 deletions(-)

diff --git a/hw/pci.h b/hw/pci.h
index 448c44e..1103838 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -469,6 +469,67 @@ pci_quad_test_and_set_mask(uint8_t *config, uint64_t mask)
 return val & mask;
 }
 
+/* Access a register specified by a mask */
+static inline void
+pci_set_byte_by_mask(uint8_t *config, uint8_t mask, uint8_t reg)
+{
+uint8_t val = pci_get_byte(config);
+uint8_t rval = reg << (ffs(mask) - 1);
+pci_set_byte(config, (~mask & val) | (mask & rval));
+}
+
+static inline uint8_t
+pci_get_byte_by_mask(uint8_t *config, uint8_t mask)
+{
+uint8_t val = pci_get_byte(config);
+return (val & mask) >> (ffs(mask) - 1);
+}
+
+static inline void
+pci_set_word_by_mask(uint8_t *config, uint16_t mask, uint16_t reg)
+{
+uint16_t val = pci_get_word(config);
+uint16_t rval = reg << (ffs(mask) - 1);
+pci_set_word(config, (~mask & val) | (mask & rval));
+}
+
+static inline uint16_t
+pci_get_word_by_mask(uint8_t *config, uint16_t mask)
+{
+uint16_t val = pci_get_word(config);
+return (val & mask) >> (ffs(mask) - 1);
+}
+
+static inline void
+pci_set_long_by_mask(uint8_t *config, uint32_t mask, uint32_t reg)
+{
+uint32_t val = pci_get_long(config);
+uint32_t rval = reg << (ffs(mask) - 1);
+pci_set_long(config, (~mask & val) | (mask & rval));
+}
+
+static inline uint32_t
+pci_get_long_by_mask(uint8_t *config, uint32_t mask)
+{
+uint32_t val = pci_get_long(config);
+return (val & mask) >> (ffs(mask) - 1);
+}
+
+static inline void
+pci_set_quad_by_mask(uint8_t *config, uint64_t mask, uint64_t reg)
+{
+uint64_t val = pci_get_quad(config);
+uint64_t rval = reg << (ffs(mask) - 1);
+pci_set_quad(config, (~mask & val) | (mask & rval));
+}
+
+static inline uint64_t
+pci_get_quad_by_mask(uint8_t *config, uint64_t mask)
+{
+uint64_t val = pci_get_quad(config);
+return (val & mask) >> (ffs(mask) - 1);
+}
+
 PCIDevice *pci_create_multifunction(PCIBus *bus, int devfn, bool multifunction,
 const char *name);
 PCIDevice *pci_create_simple_multifunction(PCIBus *bus, int devfn,
-- 
1.7.9.111.gf3fb0

[Qemu-devel] [PATCH] pci: set memory type for memory behind the bridge

2012-02-21 Thread Michael S. Tsirkin

As we make upper bits in IO and prefetcheable memory
registers writeable, we should declare support
for 64 bit prefetcheable memory and 32 bit io
in the bridge.

This changes the default for apb, dec, but I'm guessing
they got the defaults wrong by accident.
Alternatively, we could let bridges declare lack of
64 bit support and make the upper bits read-only zero.

With this applied, we can drop these bits
from express code.

Reported-by: Gerd Hoffmann 
Signed-off-by: Michael S. Tsirkin 

Could someone familiar with apb,dec ack this please?

---
 hw/pci.c |   15 +--
 1 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index 31d6a5f..3ca5f4c 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -611,7 +611,7 @@ static void pci_init_w1cmask(PCIDevice *dev)
  PCI_STATUS_SIG_SYSTEM_ERROR | PCI_STATUS_DETECTED_PARITY);
 }
 
-static void pci_init_wmask_bridge(PCIDevice *d)
+static void pci_init_mask_bridge(PCIDevice *d)
 {
 /* PCI_PRIMARY_BUS, PCI_SECONDARY_BUS, PCI_SUBORDINATE_BUS and
PCI_SEC_LETENCY_TIMER */
@@ -632,6 +632,14 @@ static void pci_init_wmask_bridge(PCIDevice *d)
 /* PCI_PREF_BASE_UPPER32 and PCI_PREF_LIMIT_UPPER32 */
 memset(d->wmask + PCI_PREF_BASE_UPPER32, 0xff, 8);
 
+/* Supported memory and i/o types */
+d->config[PCI_IO_BASE] |= PCI_IO_RANGE_TYPE_32;
+d->config[PCI_IO_LIMIT] |= PCI_IO_RANGE_TYPE_32;
+pci_word_test_and_set_mask(d->config + PCI_PREF_MEMORY_BASE,
+   PCI_PREF_RANGE_TYPE_64);
+pci_word_test_and_set_mask(d->config + PCI_PREF_MEMORY_LIMIT,
+   PCI_PREF_RANGE_TYPE_64);
+
 /* TODO: add this define to pci_regs.h in linux and then in qemu. */
 #define  PCI_BRIDGE_CTL_VGA_16BIT  0x10/* VGA 16-bit decode */
 #define  PCI_BRIDGE_CTL_DISCARD0x100   /* Primary discard 
timer */
@@ -654,6 +662,9 @@ static void pci_init_wmask_bridge(PCIDevice *d)
  * completeness. */
 pci_set_word(d->w1cmask + PCI_BRIDGE_CONTROL,
  PCI_BRIDGE_CTL_DISCARD_STATUS);
+d->cmask[PCI_IO_BASE] |= PCI_IO_RANGE_TYPE_MASK;
+pci_word_test_and_set_mask(d->cmask + PCI_PREF_MEMORY_BASE,
+   PCI_PREF_RANGE_TYPE_MASK);
 }
 
 static int pci_init_multifunction(PCIBus *bus, PCIDevice *dev)
@@ -775,7 +786,7 @@ static PCIDevice *do_pci_register_device(PCIDevice 
*pci_dev, PCIBus *bus,
 pci_init_wmask(pci_dev);
 pci_init_w1cmask(pci_dev);
 if (pc->is_bridge) {
-pci_init_wmask_bridge(pci_dev);
+pci_init_mask_bridge(pci_dev);
 }
 if (pci_init_multifunction(bus, pci_dev)) {
 pci_config_free(pci_dev);
-- 
1.7.9.111.gf3fb0

Re: [Qemu-devel] win7 bad i/o performance, high insn_emulation and exists

2012-02-21 Thread Vadim Rozenfeld



- Original Message -
From: "Peter Lieven" 
To: "Gleb Natapov" 
Cc: qemu-devel@nongnu.org, k...@vger.kernel.org, vroze...@redhat.com
Sent: Tuesday, February 21, 2012 2:05:25 PM
Subject: Re: win7 bad i/o performance, high insn_emulation and exists

On 21.02.2012 12:46, Gleb Natapov wrote:
> On Tue, Feb 21, 2012 at 12:16:16PM +0100, Peter Lieven wrote:
>> On 21.02.2012 12:00, Gleb Natapov wrote:
>>> On Tue, Feb 21, 2012 at 11:59:23AM +0100, Peter Lieven wrote:
 On 21.02.2012 11:56, Gleb Natapov wrote:
> On Tue, Feb 21, 2012 at 11:50:47AM +0100, Peter Lieven wrote:
>>> I hope it will make Windows use TSC instead, but you can't be sure
>>> about anything with Windows :(
>> Whatever it does now it eates more CPU has almost equal
>> number of exits and throughput is about the same (15MB/s).
>> If pmtimer is at 0xb008 it still reads it like hell.
>>
>> I checked with bcedit /v that useplatformclock is set to "No".
> Yeah, today I noticed that it is likely virtio drivers that hammer
> on PM timer (at least rip of the instruction that access it is
> very close to rip of the instruction that access virtio pio).
> Vadim, Windows driver developer,  is CCed.
 Ok, I will switch to IDE and e1000 to confirm this? Or does it not
 make sense?

>>> It make perfect sense! Please try it.
>> ~10MB/s. still a lot of 0xb008 reads.
>>
[VR]
Could it be that you have Driver Verifier running in you system?

> The same amount of reads essentially. So my theory is incorrect. Virtio
> driver probably calls Windows function to do IO and the function
> happens to be near the function that access PM timer.
>
> I wonder why time stamps in your traces are so coarse-grained. What do
> you see in /sys/bus/clocksource/devices/clocksource0/current_clocksource ?

its set to acpi_pm on the host. we changed that from tsc (choosen by 
kernel) after we encountered
a kernel bug which ooops all hosts after approx. 270 days uptime. 
(https://lkml.org/lkml/2011/7/21/343).

i am not sure if this is fixed in 2.6.38 or later kernels and we could 
go back to tsc.

for testing i already checked this, but it doesn't give better performance.

peter


> --
>   Gleb.

Re: [Qemu-devel] KVM call agenda for Tuesday 21th

2012-02-21 Thread Kevin Wolf

Am 20.02.2012 11:13, schrieb Juan Quintela:
> 
> Hi
> 
> Please send in any agenda items you are interested in covering.

What's the status with qtest? (Though probably a one-line email would
already answer this)

Kevin

Re: [Qemu-devel] win7 bad i/o performance, high insn_emulation and exists

2012-02-21 Thread Peter Lieven

On 21.02.2012 14:56, Vadim Rozenfeld wrote:

- Original Message -
From: "Peter Lieven"
To: "Gleb Natapov"
Cc: qemu-devel@nongnu.org, k...@vger.kernel.org, vroze...@redhat.com
Sent: Tuesday, February 21, 2012 2:05:25 PM
Subject: Re: win7 bad i/o performance, high insn_emulation and exists

On 21.02.2012 12:46, Gleb Natapov wrote:

On Tue, Feb 21, 2012 at 12:16:16PM +0100, Peter Lieven wrote:

On 21.02.2012 12:00, Gleb Natapov wrote:

On Tue, Feb 21, 2012 at 11:59:23AM +0100, Peter Lieven wrote:

On 21.02.2012 11:56, Gleb Natapov wrote:

On Tue, Feb 21, 2012 at 11:50:47AM +0100, Peter Lieven wrote:

I hope it will make Windows use TSC instead, but you can't be sure
about anything with Windows :(

Whatever it does now it eates more CPU has almost equal
number of exits and throughput is about the same (15MB/s).
If pmtimer is at 0xb008 it still reads it like hell.

I checked with bcedit /v that useplatformclock is set to "No".

Yeah, today I noticed that it is likely virtio drivers that hammer
on PM timer (at least rip of the instruction that access it is
very close to rip of the instruction that access virtio pio).
Vadim, Windows driver developer,  is CCed.

Ok, I will switch to IDE and e1000 to confirm this? Or does it not
make sense?

It make perfect sense! Please try it.

~10MB/s. still a lot of 0xb008 reads.

[VR]
Could it be that you have Driver Verifier running in you system?

unfortunately not.

i found the following in an old knowledge base article 
(http://support.microsoft.com/kb/938448):

"Only Windows Server 2003 with Service Pack 2 uniprocessor ACPI HALs use 
*PMTIMER* for QPC by default. Multiprocessor ACPI HALs will use 
*PMTIMER* only if *USE_PLATFORM_CLOCK *flag is set by the BIOS or if the 
*/usepmtimer *boot.ini option is used. Other HAL types don’t support 
*PMTIMER* and will use *TSC* by default for QPC

By default, Windows Server 2003 Service Pack 2 (SP2) uses the PM timer 
for all Advanced Configuration and Power Interface (ACPI) HALs unless 
one of the following conditions aretrue:

   * The check process to determine whether the BIOS supports the APIC
 or ACPI HALs fails.
   * *

 Note:* If the BIOS does not support the ACPI HAL, contact the
 original equipment manufacturer to determine whether a BIOS update
 is available that will resolve the problem. If a BIOS update is
 not available, you must use the PM timer by using the
 */usepmtimer* switch.

If you are not running Windows Server 2003 SP2, you must force the AMD 
computer to use the PM timer by using the */usepmtimer* switch.

*Note* The decision to use the PM timer or the TSC timer is made during 
a check that is performed at startup to query the BIOS and to determine 
whether the BIOS will support the PM timer functions. This check is not 
completely accurate on AMD chipsets. Therefore, you must use the 
*/usepmtimer* switch.

In Windows Server 2003 SP2, this section of code was rewritten. 
Therefore, the correct performance monitor data appears on AMD chipsets 
that have Windows Server 2003 SP2 installed, and you do not have to use 
the */usepmtimer* switch.

For more information about ACPI and APCI hardware support, click the 
following article number to view the article in the Microsoft Knowledge 
Base:
309283   HAL options after 
Windows XP or Windows Server 2003 Setup
The third-party products that this article discusses are manufactured by 
companies that are independent of Microsoft. Microsoft makes no 
warranty, implied or otherwise, about the performance or reliability of 
these products."

-

so it seems windows prefers pmtimer over tsc. has anyone an idea/hack to 
make the acpi_pm timer fail without disabling acpi completely?

thanks,
peter

Re: [Qemu-devel] [PATCH 3/3] qapi: Introduce blockdev-query-group-snapshot-failure

2012-02-21 Thread Jeff Cody

On 02/20/2012 12:48 PM, Eric Blake wrote:
> On 02/20/2012 10:31 AM, Jeff Cody wrote:
>> In the case of a failure in a group snapshot, it is possible for
>> multiple file image failures to occur - for instance, failure of
>> an original snapshot, and then failure of one or more of the
>> attempted reopens of the original.
>>
>> Knowing all of the file images which failed could be useful or
>> critical information, so this command returns a list of strings
>> containing the filenames of all failures from the last
>> invocation of blockdev-group-snapshot-sync.
> 
> Meta-question:
> 
> Suppose that the guest is running when we issue
> blockdev-group-snapshot-sync - in that case, qemu is responsible for
> pausing and then resuming the guest.  On success, this makes sense.  But
> what happens on failure?

The guest is not paused in blockdev-group-snapshot-sync; I don't think
that qemu should enforce pause/resume in the live snapshot commands.

> 
> If we only fail at creating one snapshot, but successfully roll back the
> rest of the set, should the guest be resumed (as if the command had
> never been attempted), or should the guest be left paused?
> 
> On the other hand, if we fail at creating one snapshot, as well as fail
> at rolling back, then that argues that we _cannot_ resume the guest,
> because we no longer have a block device open.

Is that really true, though?  Depending on what drive failed, the guest
may still be runnable.  It would be roughly equivalent to the guest as a
drive failure; a bad event, but not always fatal.

But, I think v2 of the patch may make this moot - I was talking with
Kevin, and he had some good ideas on how to do this without requiring a
close & reopen in the case of the snapshot failure; which means that we
shouldn't have to worry about the second scenario.  I am going to
incorporate those changes into v2.

> 
> This policy needs to be documented in one (or both) of the two new
> monitor commands, and we probably ought to make sure that if the guest
> is left paused where it had originally started as running, then an
> appropriate event is also emitted.

I agree, the documentation should make it clear what is going on - I
will add that to v2.

> 
> For blockdev-snapshot-sync, libvirt was always pausing qemu before
> issuing the snapshot, then resuming afterwards; but now that we have the
> ability to make the set atomic, I'm debating about whether libvirt still
> needs to pause qemu, or whether it can now rely on qemu doing the right
> things about pausing and resuming as part of the snapshot command.
> 

Again, it doesn't pause automatically, so that is up to libvirt.  The
guest agent is also available to freeze the filesystem, if libvirt wants
to trust it (and it is running); if not, then libvirt can still issue a
pause/resume around the snapshot command (and libvirt may be in a better
position to decide what to do in case of failure, if it has some
knowledge of the drives that failed and how they are used).

Re: [Qemu-devel] [PATCH] block: remove unused fields in BlockDriverState

2012-02-21 Thread Kevin Wolf

Am 20.02.2012 17:58, schrieb Paolo Bonzini:
> sync_aiocb is unused since commit ce1a14d (Dynamically allocate AIO
> Completion Blocks., 2006-08-07).
> 
> private is unused since commit 56a1493 (drive cleanup fixes., 2009-09-25).
> 
> Signed-off-by: Paolo Bonzini 

Thanks, applied to the block branch.

Kevin

[Qemu-devel] [PATCH] Makefile: Add dependency to fix linux-user-only build

2012-02-21 Thread Peter Maydell

Make qemu-bridge-helper explicitly depend on $(GENERATED_HEADERS)
so that it doesn't fail to build when we configured for linux-user
targets only. (Build breakage introduced in commit 7b93fad.)

Signed-off-by: Peter Maydell 
---
I suspect we could handle generated headers better in our makefile
than having to scatter $(GENERATED_HEADERS) dependencies into almost
everything (eg "%.o: $(GENERATED_HEADERS)" ??) but this is the minimal
fix for the build breakage which we should apply for now I think.

 Makefile |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/Makefile b/Makefile
index e66e885..c67493e 100644
--- a/Makefile
+++ b/Makefile
@@ -159,6 +159,7 @@ qemu-nbd$(EXESUF): qemu-nbd.o $(tools-obj-y) $(block-obj-y)
 qemu-io$(EXESUF): qemu-io.o cmd.o $(tools-obj-y) $(block-obj-y)
 
 qemu-bridge-helper$(EXESUF): qemu-bridge-helper.o
+qemu-bridge-helper.o: $(GENERATED_HEADERS)
 
 fsdev/virtfs-proxy-helper$(EXESUF): fsdev/virtfs-proxy-helper.o 
fsdev/virtio-9p-marshal.o oslib-posix.o $(trace-obj-y)
 fsdev/virtfs-proxy-helper$(EXESUF): LIBS += -lcap
-- 
1.7.1

Re: [Qemu-devel] KVM call agenda for Tuesday 21th

2012-02-21 Thread Juan Quintela

Kevin Wolf  wrote:
> Am 20.02.2012 11:13, schrieb Juan Quintela:
>> 
>> Hi
>> 
>> Please send in any agenda items you are interested in covering.
>
> What's the status with qtest? (Though probably a one-line email would
> already answer this)

As this is the only topic for today call, and you agree that a single
line email is enough, I am canceling today call.

Happy Hacking, Juan.

> Kevin

[Qemu-devel] 2 build issues for qemu v1.0.1

2012-02-21 Thread Virtbie


Hello
I am trying to build from source the qemu v1.0.1 (from git) on a Ubuntu 
mixed system (part is oneiric 11.10, part is precise 12.04)


Here are two issues:

During build of:
hw/9pfs/virtio-9p-handle.c
it gave errors saying it couldnt find definition of "AT_EMPTY_PATH".

So I added it at the head of the file (copied from my fcntl.h which is 
apparently NOT imported during this cc):

#define AT_EMPTY_PATH   0x1000  /* Allow empty relative pathname */
and then it apparently went ahead correctly.


Then it gave the following error:

../libhw32/pcspk.o: In function `pcspk_ioport_write':
../hw/pcspk.c:132: undefined reference to `pit_set_gate'
../libhw32/pcspk.o: In function `pcspk_ioport_read':
../hw/pcspk.c:121: undefined reference to `pit_get_out'
../hw/pcspk.c:123: undefined reference to `pit_get_gate'
../libhw32/pcspk.o: In function `pcspk_callback':
../hw/pcspk.c:75: undefined reference to `pit_get_mode'
../hw/pcspk.c:78: undefined reference to `pit_get_initial_count'
collect2: ld returned 1 exit status
make[1]: *** [qemu-system-cris] Error 1
make: *** [subdir-cris-softmmu] Error 2
make: *** Waiting for unfinished jobs

and for this I didn't want to delve into the issue so I disabled pcspk 
emulation at configure and rebuilt so to skip this.


After this, it built correctly.

Re: [Qemu-devel] [PATCH] pci: set memory type for memory behind the bridge

2012-02-21 Thread Alexander Graf


On 02/21/2012 02:57 PM, Michael S. Tsirkin wrote:

As we make upper bits in IO and prefetcheable memory
registers writeable, we should declare support
for 64 bit prefetcheable memory and 32 bit io
in the bridge.

This changes the default for apb, dec, but I'm guessing
they got the defaults wrong by accident.
Alternatively, we could let bridges declare lack of
64 bit support and make the upper bits read-only zero.

With this applied, we can drop these bits
from express code.


Ben, you know these devices better than us. Any comments? :)


Alex


Reported-by: Gerd Hoffmann
Signed-off-by: Michael S. Tsirkin

Could someone familiar with apb,dec ack this please?

---
  hw/pci.c |   15 +--
  1 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index 31d6a5f..3ca5f4c 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -611,7 +611,7 @@ static void pci_init_w1cmask(PCIDevice *dev)
   PCI_STATUS_SIG_SYSTEM_ERROR | PCI_STATUS_DETECTED_PARITY);
  }

-static void pci_init_wmask_bridge(PCIDevice *d)
+static void pci_init_mask_bridge(PCIDevice *d)
  {
  /* PCI_PRIMARY_BUS, PCI_SECONDARY_BUS, PCI_SUBORDINATE_BUS and
 PCI_SEC_LETENCY_TIMER */
@@ -632,6 +632,14 @@ static void pci_init_wmask_bridge(PCIDevice *d)
  /* PCI_PREF_BASE_UPPER32 and PCI_PREF_LIMIT_UPPER32 */
  memset(d->wmask + PCI_PREF_BASE_UPPER32, 0xff, 8);

+/* Supported memory and i/o types */
+d->config[PCI_IO_BASE] |= PCI_IO_RANGE_TYPE_32;
+d->config[PCI_IO_LIMIT] |= PCI_IO_RANGE_TYPE_32;
+pci_word_test_and_set_mask(d->config + PCI_PREF_MEMORY_BASE,
+   PCI_PREF_RANGE_TYPE_64);
+pci_word_test_and_set_mask(d->config + PCI_PREF_MEMORY_LIMIT,
+   PCI_PREF_RANGE_TYPE_64);
+
  /* TODO: add this define to pci_regs.h in linux and then in qemu. */
  #define  PCI_BRIDGE_CTL_VGA_16BIT 0x10/* VGA 16-bit decode */
  #define  PCI_BRIDGE_CTL_DISCARD   0x100   /* Primary discard 
timer */
@@ -654,6 +662,9 @@ static void pci_init_wmask_bridge(PCIDevice *d)
   * completeness. */
  pci_set_word(d->w1cmask + PCI_BRIDGE_CONTROL,
   PCI_BRIDGE_CTL_DISCARD_STATUS);
+d->cmask[PCI_IO_BASE] |= PCI_IO_RANGE_TYPE_MASK;
+pci_word_test_and_set_mask(d->cmask + PCI_PREF_MEMORY_BASE,
+   PCI_PREF_RANGE_TYPE_MASK);
  }

  static int pci_init_multifunction(PCIBus *bus, PCIDevice *dev)
@@ -775,7 +786,7 @@ static PCIDevice *do_pci_register_device(PCIDevice 
*pci_dev, PCIBus *bus,
  pci_init_wmask(pci_dev);
  pci_init_w1cmask(pci_dev);
  if (pc->is_bridge) {
-pci_init_wmask_bridge(pci_dev);
+pci_init_mask_bridge(pci_dev);
  }
  if (pci_init_multifunction(bus, pci_dev)) {
  pci_config_free(pci_dev);

Re: [Qemu-devel] KVM call agenda for Tuesday 21th

2012-02-21 Thread Anthony Liguori


On 02/21/2012 08:06 AM, Kevin Wolf wrote:

Am 20.02.2012 11:13, schrieb Juan Quintela:


Hi

Please send in any agenda items you are interested in covering.


What's the status with qtest? (Though probably a one-line email would
already answer this)


I still need to go through Paolo's patches.  I'll make sure to look at them this 
week to try to get something merged.


Regards,

Anthony Liguori



Kevin

[Qemu-devel] [PATCH] Fix dependency issue introduced by commit 7b93fadf3a38d1ed65ea5536a52efc2772c6e3b8

2012-02-21 Thread 陳韋任

  Commit 7b93fadf3a38d1ed65ea5536a52efc2772c6e3b8 adds basic version of bridge
helper, but introduces dependency issue at the same time. While building target
xxx-linux-user, qemu-bridge-helper.c needs config-host.h which is not generated
at the moment. Building recurse-all target first fixes the problem.  

Signed-off-by: Chen Wei-Ren 
---
 Makefile |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/Makefile b/Makefile
index e66e885..618b306 100644
--- a/Makefile
+++ b/Makefile
@@ -79,7 +79,7 @@ defconfig:
 
 -include config-all-devices.mak
 
-build-all: $(DOCS) $(TOOLS) $(HELPERS-y) recurse-all
+build-all: $(DOCS) $(TOOLS) recurse-all $(HELPERS-y)
 
 config-host.h: config-host.h-timestamp
 config-host.h-timestamp: config-host.mak
-- 
1.7.3.4

Re: [Qemu-devel] [PATCH V14 2/7] Add TPM (frontend) hardware interface (TPM TIS) to Qemu

2012-02-21 Thread Stefan Berger


On 02/21/2012 07:18 AM, Michael S. Tsirkin wrote:

On Tue, Feb 21, 2012 at 06:19:26AM -0500, Stefan Berger wrote:

On 02/20/2012 10:18 PM, Michael S. Tsirkin wrote:

On Mon, Feb 20, 2012 at 07:43:05PM -0500, Stefan Berger wrote:

On 02/20/2012 05:02 PM, Michael S. Tsirkin wrote:

On Wed, Dec 14, 2011 at 08:43:17AM -0500, Stefan Berger wrote:

Alternatively, I imagined that you can pass a copy
or pointer of the necessary state to the backend,
which queues the command and wakes the worker.
In the reverse direction, backend queues a response
and when OS polls you dequeue it and update state.


The OS doesn't necessarily need to poll. It is just one mode of
operation of the OS, the other being interrupt-driven where the
backend raises the interrupt once it has delivered the response to
the frontend.


Stefan

So you will also need to signal the frontend when it
must interrupt the guest. This is not a problem,
for example you can use a qemu_eventfd object for this.



When the backend delivers the response it checks whether the interface 
is used in interrupt mode and raises the interrupt. The backend enters 
the frontend code with a callback. In this function also a signal is 
sent that may wake up the main thread that, upon suspend, may be waiting 
for the last command to be processed and be sleeping on a condition 
variable.


I now added a function to the backend interface that is invoked by the 
frontend to notify the backend of a TPM request. The backend code can 
then either notify a thread (passthrough and libtpms driver) or create a 
response right away and invoke that callback to the front-end to deliver 
the response (null driver). How frontend and backend handle 
notifications is isolated to the frontend and backend with some backends 
(libtpms, passthough) sharing the code for how the notification is done.


Stefan

Re: [Qemu-devel] KVM call agenda for tuesday 31

2012-02-21 Thread Peter Maydell

On 9 February 2012 22:23, Peter Maydell  wrote:
> Ping re the VMState and variable sized arrays issue. I don't
> see any consensus in this discussion for a different approach,
> so should we just commit Mitsyanko's patchset?

>From an IRC conversation I just had with Anthony and Juan:
===begin==
14:51 < pm215> quintela: do you have an opinion on the vmstate variable-
length array stuff (needed for sd card) ?
14:51 < quintela> pm havent looked, email title?
14:52 < pm215> "KVM call agenda for tuesday 31" is the most recent
discussion :-)
14:53 < pm215> http://patchwork.ozlabs.org/patch/137732/ and
 http://patchwork.ozlabs.org/patch/137733/ are the relevant patches
14:54 < quintela> pm215: found it, that it is a difficult thing to do (TM)
14:54 < quintela> it should be on the "card" file, or whatever :-(
14:55 < quintela> notice the "should" part.
14:55 < pm215> I'm not sure what you mean, can you elaborate?
14:57 < quintela> pm215: sect is number of sectors, right?
14:58 < pm215> yes
14:59 < quintela> so, a 1GB card would have around 8MB array?
14:59 < quintela> took or left some byties here andthere.
14:59 < quintela> bytes indeed.
14:59 < quintela> I _think_ that we should put that in a save_live
 section, but that is just me (TM)
15:00 < quintela> I guess that at some point, people are going to need
 bigger SD cards (16GB are already on the wild, right)
15:00 < quintela> and that would make live migration just impossible
15:00 < quintela> or very slow, that is completely equivalent.
15:01 < quintela> it is my understanding that AHCI is using similar code,
 or did I missread some of the information?
15:03 < pm215> I think alex would like ahci to use a similar 'variable
 length array' thing, but in that case the array is much smaller
15:03 < aliguori> pm215, it's large but of bounded size, right?
15:03 < aliguori> for SD cards?
15:03 < quintela> aliguori: number of sectors * 4 bytes
15:03 < quintela> aliguori: so hugee
15:04 < quintela> 8MB array for each 1GB of disk.
15:04 < quintela> but or take some bytes.
15:04 < aliguori> quintela, you cannot save that much data via savevm
15:04 < quintela> this is more than all other devices combined in a
 normal instalaltion.
15:04 < aliguori> that's too much
15:04 < quintela> aliguori: see my answer, we need a save_live section,
 really.
15:04 < aliguori> it will screw up the live migration downtime algorithm
15:04 < aliguori> pm215, ^
15:04 < aliguori> or just treat it as a ram section
15:05 < aliguori> qemu_ram_alloc() it, and call it a day
15:05 < pm215> the spec isn't very clear, but I think technically this
 info should go in the sd card image, except there's no way to tack
 additional info into a raw file
15:05 < aliguori> pm215, yeah, qemu_ram_alloc() is the way to go I think,
 that makes it effectively volatile on-card RAM
15:05 < quintela> pm215: I fully agree that it should go into the card
 image, but . no space for it :-(
15:05 < aliguori> which i think makes sense
15:05 < quintela> pm215: another thing, forgetting about migration at all.
15:06 < quintela> how does this work if you stop marchine and restart
 it again?
15:06  * quintela guess that it is stored somewhere?
15:06 < quintela> s/marchine/machine/
15:06 < pm215> no, we just assume that any fresh sd card image has no
 write-protect set up
15:07 < quintela> pm215: what is stored on that image? /me would have
 assumed that wearing information
15:07 < quintela> but that is without reading the whole code.
15:09 < quintela> humm, it looks like only 1 bit is used for each sector,
 why are we storing 32 bits if we only use 1 bit?
15:09 < pm215> it's write-protect : you can set parts of the sd card to
 not be writeable (with a granularity of a "write-protect group size")
15:09 < pm215> we probably don't implement fantastically efficiently
15:10 < quintela> pm215: ok, only 1 bit is needed.
15:10 < quintela> we can move to 1bit/sector (8 times smaller)
15:10 < quintela> but I still think that doing the qemu_ram_alloc()
 trick that aliguori pointed is the easiest way to fix it
15:11 < quintela> you can create a ram_save_live section, but it is
 going to be more complex for almost no gain
15:11 < pm215> ah, so we qemu_ram_alloc() it and then the contents get
 transferred in the same way as main memory ?
15:12 < pm215> ...that is in exec-obsolete.h and marked as "to be
 removed soon"...
15:13 < aliguori> pm215, yeah, so you'll need to create it using
 whatever the new fancy memory api interface is
15:13 < aliguori> pm215, note that whenever you touch that memory, you
 have to set the dirty bits appropriately
15:13 < aliguori> or else live migration won't work
15:14 < quintela> aliguori: if they have to touch the dirty bits, it is
 equivalent to do their own save_live section.
15:14 < quintela> but I agree that this is the only "easy" solution.
15:17 < pm215> doesn't sound too hard...
15:18 < quintela> as usual with vmstate, problem is testing (althought
 shouldn't be very difficult, eith

[Qemu-devel] [PATCH] block: drop aio_multiwrite in BlockDriver

2012-02-21 Thread Paolo Bonzini

These were never used.

Signed-off-by: Paolo Bonzini 
---
 block.c |   23 ++---
 block_int.h |6 --
 2 files changed, 3 insertions(+), 26 deletions(-)

diff --git a/block.c b/block.c
index ae297bb..b395f7b 100644
--- a/block.c
+++ b/block.c
@@ -2767,7 +2793,6 @@ typedef struct MultiwriteCB {
 BlockDriverCompletionFunc *cb;
 void *opaque;
 QEMUIOVector *free_qiov;
-void *free_buf;
 } callbacks[];
 } MultiwriteCB;
 
@@ -2781,7 +2806,6 @@ static void multiwrite_user_cb(MultiwriteCB *mcb)
 qemu_iovec_destroy(mcb->callbacks[i].free_qiov);
 }
 g_free(mcb->callbacks[i].free_qiov);
-qemu_vfree(mcb->callbacks[i].free_buf);
 }
 }
 
@@ -2838,20 +2862,11 @@ static int multiwrite_merge(BlockDriverState *bs, 
BlockRequest *reqs,
 int merge = 0;
 int64_t oldreq_last = reqs[outidx].sector + reqs[outidx].nb_sectors;
 
-// This handles the cases that are valid for all block drivers, namely
-// exactly sequential writes and overlapping writes.
+// Handle exactly sequential writes and overlapping writes.
 if (reqs[i].sector <= oldreq_last) {
 merge = 1;
 }
 
-// The block driver may decide that it makes sense to combine requests
-// even if there is a gap of some sectors between them. In this case,
-// the gap is filled with zeros (therefore only applicable for yet
-// unused space in format like qcow2).
-if (!merge && bs->drv->bdrv_merge_requests) {
-merge = bs->drv->bdrv_merge_requests(bs, &reqs[outidx], &reqs[i]);
-}
-
 if (reqs[outidx].qiov->niov + reqs[i].qiov->niov + 1 > IOV_MAX) {
 merge = 0;
 }
@@ -2868,14 +2883,8 @@ static int multiwrite_merge(BlockDriverState *bs, 
BlockRequest *reqs,
 size = (reqs[i].sector - reqs[outidx].sector) << 9;
 qemu_iovec_concat(qiov, reqs[outidx].qiov, size);
 
-// We might need to add some zeros between the two requests
-if (reqs[i].sector > oldreq_last) {
-size_t zero_bytes = (reqs[i].sector - oldreq_last) << 9;
-uint8_t *buf = qemu_blockalign(bs, zero_bytes);
-memset(buf, 0, zero_bytes);
-qemu_iovec_add(qiov, buf, zero_bytes);
-mcb->callbacks[i].free_buf = buf;
-}
+// We should need to add any zeros between the two requests
+assert (reqs[i].sector <= oldreq_last);
 
 // Add the second request
 qemu_iovec_concat(qiov, reqs[i].qiov, reqs[i].qiov->size);
diff --git a/block_int.h b/block_int.h
index 7946cf6..3eeca58 100644
--- a/block_int.h
+++ b/block_int.h
@@ -162,12 +162,6 @@ struct BlockDriver {
  */
 int coroutine_fn (*bdrv_co_flush_to_os)(BlockDriverState *bs);
 
-int (*bdrv_aio_multiwrite)(BlockDriverState *bs, BlockRequest *reqs,
-int num_reqs);
-int (*bdrv_merge_requests)(BlockDriverState *bs, BlockRequest* a,
-BlockRequest *b);
-
-
 const char *protocol_name;
 int (*bdrv_truncate)(BlockDriverState *bs, int64_t offset);
 int64_t (*bdrv_getlength)(BlockDriverState *bs);
-- 
1.7.7.6

Re: [Qemu-devel] BlockDriverState stack and BlockListeners

2012-02-21 Thread Markus Armbruster

Stefan Hajnoczi  writes:

> This is a good discussion because BlockDriverState has become bloated
> and complex.  A lot of fields only apply to sub-cases and we should
> really split this struct.

Yup.

We've had discussions where couldn't even agree whether a specific block
driver is a format or a protocol or something else entirely.  This is
because the code doesn't really make distinctions.

> Fields like "backing_file" *should* be in generic code, not duplicated
> in each Format.

Debatable.

>  But BlockDriverState is too generic since it also
> encompasses Protocols and Listeners.
>
> You mentioned that some of these classes would be "internal".  I think
> everything should be exposed in the QOM just like Linux exposes kernel
> objects in sysfs.  It makes troubleshooting and debugging easier.
>
> As has been pointed out, "Listener" suggests a passive role.  Perhaps
> BlockFilter, BlockProcessor, or BlockModule is a better name.

BlockFilter sounds good to me.

> Ideally Formats can be isolated from the rest of the block layer so
> that it becomes easy to create a libimageformat.  If we bake
> coroutines, I/O APIs, and memory allocation functions too deeply into
> Formats then they are hard to test and impossible to use outside of
> QEMU.

Wouldn't that be nice.

Re: [Qemu-devel] BlockDriverState stack and BlockListeners

2012-02-21 Thread Markus Armbruster

Kevin Wolf  writes:

> Am 21.02.2012 12:36, schrieb Paolo Bonzini:
>> And here:
>> 
>>   .== BlockSource ==.
>>   | MirrorListener  |   .== BlockSource ==.
>>   | QCOW2View --+--> QCOW2Format -> | FileProtocol|
>>   '='|  '='
>>  |  .== 
>> BlockSource ===.
>>  |   .== BlockSource ==.| 
>> BlkDebugListener |
>>  '-> | QCOW2View --+--> QCOW2Format --> | 
>> FileProtocol |
>>  '='
>> '=='
>> 
>> Does it seem sane?
>
> Yes, this (and the whole explanation that I didn't quote) looks good to
> me. For now, at least, until I find the first example where it doesn't
> work. Or maybe I can just trust Markus to bring something up.
>
>>> The question is: Can we assume that any listeners that are on top of the
>>> first format or protocol (i.e. those that would fit your model) should
>>> move to the new top-level view? Or would it sometimes make sense to keep
>>> it at the old one?
>> 
>> I think it depends, but both possibilities should be doable in this model.
>
> Meh. :-)
>
> Maybe we need to introduce something outside of the whole stack, an
> entity that is referred to by the device (as in IDE, virtio-blk, ...)
> and that refers to a stack of top-level listeners (which would be moved
> to the new top-level BlockSource on live snapshot) and to the first
> BlockSource (which can have more listeners, and those would stick with
> the same BlockSource even if moves down the chain).

The top-level BDS is already special.  I think it makes sense to factor
out the specialness into a "block backend" type, and let it point to a
non-special block driver instance (root of a tree of block driver
instances, in general).

> Oh, and just to open another can of worms: We should probably design in
> the notion of media (which can be ejected etc.) and drives (which always
> stay there). We don't have a clean separation today.

The "closed BDS means no media" thing works, but it's odd.

Re: [Qemu-devel] virtio-blk performance regression and qemu-kvm

2012-02-21 Thread Dongsu Park

Hi Stefan,
see below.

On 13.02.2012 11:57, Stefan Hajnoczi wrote:
> On Fri, Feb 10, 2012 at 2:36 PM, Dongsu Park
>  wrote:
> >  Now I'm running benchmarks with both qemu-kvm 0.14.1 and 1.0.
> >
> >  - Sequential read (Running inside guest)
> >   # fio -name iops -rw=read -size=1G -iodepth 1 \
> >    -filename /dev/vdb -ioengine libaio -direct=1 -bs=4096
> >
> >  - Sequential write (Running inside guest)
> >   # fio -name iops -rw=write -size=1G -iodepth 1 \
> >    -filename /dev/vdb -ioengine libaio -direct=1 -bs=4096
> >
> >  For each one, I tested 3 times to get the average.
> >
> >  Result:
> >
> >  seqread with qemu-kvm 0.14.1   67,0 MByte/s
> >  seqread with qemu-kvm 1.0      30,9 MByte/s
> >
> >  seqwrite with qemu-kvm 0.14.1  65,8 MByte/s
> >  seqwrite with qemu-kvm 1.0     30,5 MByte/s
> 
> Please retry with the following commit or simply qemu-kvm.git/master.
> Avi discovered a performance regression which was introduced when the
> block layer was converted to use coroutines:
> 
> $ git describe 39a7a362e16bb27e98738d63f24d1ab5811e26a8
> v1.0-327-g39a7a36
> 
> (This commit is not in 1.0!)
> 
> Please post your qemu-kvm command-line.
> 
> 67 MB/s sequential 4 KB read means 67 * 1024 / 4 = 17152 requests per
> second, so 58 microseconds per request.
> 
> Please post the fio output so we can double-check what is reported.

As you mentioned above, I tested it again with the revision
v1.0-327-g39a7a36, which includes the commit 39a7a36.

Result is though still not good enough.

seqread   : 20.3 MByte/s
seqwrite  : 20.1 MByte/s
randread  : 20.5 MByte/s
randwrite : 20.0 MByte/s

My qemu-kvm commandline is like below:

===
/usr/bin/kvm -S -M pc-0.14 -enable-kvm -m 1024 \
-smp 1,sockets=1,cores=1,threads=1 -name mydebian3_8gb \
-uuid d99ad012-2fcc-6f7e-fbb9-bc48b424a258 -nodefconfig -nodefaults \
-chardev 
socket,id=charmonitor,path=/var/lib/libvirt/qemu/mydebian3_8gb.monitor,server,nowait
 \
-mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown \
-drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw \
-device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 \
-drive 
file=/var/lib/libvirt/images/mydebian3_8gb.img,if=none,id=drive-virtio-disk0,format=raw,cache=none,aio=native
 \
-device 
virtio-blk-pci,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
 \
-drive 
file=/dev/ram0,if=none,id=drive-virtio-disk1,format=raw,cache=none,aio=native \
-device 
virtio-blk-pci,bus=pci.0,addr=0x7,drive=drive-virtio-disk1,id=virtio-disk1 \
-netdev tap,fd=19,id=hostnet0 \
-device 
virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:68:9f:d0,bus=pci.0,addr=0x3 
\
-chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 \
-usb -device usb-tablet,id=input0 -vnc 127.0.0.1:0 -vga cirrus \
-device AC97,id=sound0,bus=pci.0,addr=0x4 \
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
===

As you see, /dev/ram0 is being mapped to /dev/vdb on the guest side,
which is used for fio tests.

Here is a sample of fio output:

===
# fio -name iops -rw=read -size=1G -iodepth 1 -filename /dev/vdb \
-ioengine libaio -direct=1 -bs=4096
iops: (g=0): rw=read, bs=4K-4K/4K-4K, ioengine=libaio, iodepth=1
Starting 1 process
Jobs: 1 (f=1): [R] [100.0% done] [21056K/0K /s] [5140/0 iops] [eta
00m:00s]
iops: (groupid=0, jobs=1): err= 0: pid=1588
  read : io=1024MB, bw=20101KB/s, iops=5025, runt= 52166msec
slat (usec): min=4, max=6461, avg=24.00, stdev=19.75
clat (usec): min=0, max=11934, avg=169.49, stdev=113.91
bw (KB/s) : min=18200, max=23048, per=100.03%, avg=20106.31, stdev=934.42
  cpu  : usr=5.43%, sys=23.25%, ctx=262363, majf=0, minf=28
  IO depths: 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
 submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
 complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
 issued r/w: total=262144/0, short=0/0
 lat (usec): 2=0.01%, 4=0.16%, 10=0.03%, 20=0.01%, 50=0.27%
 lat (usec): 100=4.07%, 250=89.12%, 500=5.76%, 750=0.30%, 1000=0.13%
 lat (msec): 2=0.12%, 4=0.02%, 10=0.01%, 20=0.01%

Run status group 0 (all jobs):
   READ: io=1024MB, aggrb=20100KB/s, minb=20583KB/s, maxb=20583KB/s,
mint=52166msec, maxt=52166msec

Disk stats (read/write):
  vdb: ios=261308/0, merge=0/0, ticks=40210/0, in_queue=40110, util=77.14%
===


So I think, the patch for coroutine-ucontext isn't about the bottleneck
I'm looking for.

Regards,
Dongsu

p.s. Sorry for the late reply. Last week I was on vacation.

Re: [Qemu-devel] BlockDriverState stack and BlockListeners

2012-02-21 Thread Kevin Wolf

Am 21.02.2012 16:56, schrieb Markus Armbruster:
> Kevin Wolf  writes:
> 
>> Am 21.02.2012 12:36, schrieb Paolo Bonzini:
>>> And here:
>>>
>>>   .== BlockSource ==.
>>>   | MirrorListener  |   .== BlockSource ==.
>>>   | QCOW2View --+--> QCOW2Format -> | FileProtocol|
>>>   '='|  '='
>>>  |  .== 
>>> BlockSource ===.
>>>  |   .== BlockSource ==.| 
>>> BlkDebugListener |
>>>  '-> | QCOW2View --+--> QCOW2Format --> | 
>>> FileProtocol |
>>>  '='
>>> '=='
>>>
>>> Does it seem sane?
>>
>> Yes, this (and the whole explanation that I didn't quote) looks good to
>> me. For now, at least, until I find the first example where it doesn't
>> work. Or maybe I can just trust Markus to bring something up.
>>
 The question is: Can we assume that any listeners that are on top of the
 first format or protocol (i.e. those that would fit your model) should
 move to the new top-level view? Or would it sometimes make sense to keep
 it at the old one?
>>>
>>> I think it depends, but both possibilities should be doable in this model.
>>
>> Meh. :-)
>>
>> Maybe we need to introduce something outside of the whole stack, an
>> entity that is referred to by the device (as in IDE, virtio-blk, ...)
>> and that refers to a stack of top-level listeners (which would be moved
>> to the new top-level BlockSource on live snapshot) and to the first
>> BlockSource (which can have more listeners, and those would stick with
>> the same BlockSource even if moves down the chain).
> 
> The top-level BDS is already special.  I think it makes sense to factor
> out the specialness into a "block backend" type, and let it point to a
> non-special block driver instance (root of a tree of block driver
> instances, in general).

I think this is what I meant.

>> Oh, and just to open another can of worms: We should probably design in
>> the notion of media (which can be ejected etc.) and drives (which always
>> stay there). We don't have a clean separation today.
> 
> The "closed BDS means no media" thing works, but it's odd.

I'm more talking about data that belongs to the media, like geometry.
This came up recently with Hervé's floppy patches.

Kevin

Re: [Qemu-devel] [RFC PATCH] replication agent module

2012-02-21 Thread Markus Armbruster

Anthony Liguori  writes:

> On 02/07/2012 07:50 AM, Stefan Hajnoczi wrote:
>> On Tue, Feb 7, 2012 at 1:34 PM, Kevin Wolf  wrote:
>>> Am 07.02.2012 11:29, schrieb Ori Mamluk:
 Repagent is a new module that allows an external replication system to
 replicate a volume of a Qemu VM.
>>
>> I recently joked with Kevin that QEMU is on its way to reimplementing
>> the Linux block and device-mapper layers.  Now we have drbd, thanks!
>> :P
>
> I don't think it's a joke.  Do we really want to get into this space?
> Why not just use drbd?
>
> If it's because we want to also work with image formats, perhaps we
> should export our image format code as a shared library and let drbd
> link against it.

I want to be able to "mount foo.qcow2 /mnt".

[Qemu-devel] Mascot Contest Results

2012-02-21 Thread Anthony Liguori

First, I'd like to thank everyone for participating!  We received a large number 
of entries and lot of great submissions.


But there is pretty clearly a favorite and with 12 votes, I'm pleased to 
announce that Benoit Canet's "Kew the Angry Emu" logo is the winner!


I'm attaching the SVG to this mail for everyone to see in it's full glory.  It's 
derived from the oksmith EMU SVG (which is public domain) and Liberation Sans 
text (GPL with a clause permitting any use when embedding).


Benoit has agreed to license the logo under the CC BY license[1].  I'll commit 
this logo (along with some derivatives) to the tree so we can start using it 
everywhere we can!


Benoit has also graciously volunteered to help re-theme the wiki to better match 
the new logo.  Stay tuned for more info here.


Thanks again to everyone who participated and congratulations to Benoit!

[1] http://creativecommons.org/licenses/by/3.0/

Regards,

Anthony Liguori
<>

Re: [Qemu-devel] [PATCH V7 03/11] pci_regs: Add PCI_EXP_TYPE_PCIE_BRIDGE

2012-02-21 Thread Anthony PERARD

On Mon, Feb 20, 2012 at 20:30, Michael S. Tsirkin  wrote:
> On Fri, Feb 17, 2012 at 05:08:37PM +, Anthony PERARD wrote:
>> Signed-off-by: Anthony PERARD 
>> ---
>>  hw/pci_regs.h |    1 +
>>  1 files changed, 1 insertions(+), 0 deletions(-)
>>
>> diff --git a/hw/pci_regs.h b/hw/pci_regs.h
>> index 6b42515..56a404b 100644
>> --- a/hw/pci_regs.h
>> +++ b/hw/pci_regs.h
>> @@ -392,6 +392,7 @@
>>  #define  PCI_EXP_TYPE_UPSTREAM       0x5     /* Upstream Port */
>>  #define  PCI_EXP_TYPE_DOWNSTREAM 0x6 /* Downstream Port */
>>  #define  PCI_EXP_TYPE_PCI_BRIDGE 0x7 /* PCI/PCI-X Bridge */
>> +#define  PCI_EXP_TYPE_PCIE_BRIDGE 0x8   /* PCI/PCI-X to PCIE Bridge */
>
> I don't know why but linux source lacks this value, and we
> try to stay in sync with that.
> Please push there then we'll add here.
> Meanwhile  put a define in your code.

Ok, I'll do that.
Thanks,

-- 
Anthony PERARD

Re: [Qemu-devel] [PATCH] linux-user: fix segfault deadlock

2012-02-21 Thread Peter Maydell

On 13 January 2012 16:21, Alexander Graf  wrote:
> On 13.01.2012, at 17:16, Peter Maydell wrote:
>> On 13 January 2012 15:46, Alexander Graf  wrote:
>>> This patch forces unlocking of that lock in the segv handler. I'm not sure
>>> this is the right approach though. Maybe we should rather make sure we don't
>>> segfault in the code? I would greatly appreciate someone more intelligible
>>> than me to look at this :).
>>
>> A segfault while we're walking the TB chains in QEMU C code?
>> That's just a bug (and we know we have one there) -- we should
>> fix it rather than papering over it like this.
>
> Well, we're segfaulting in this exact special case which calls setrlimit() 
> before an mmap which fails:

So what's actually happening here is not a problem with the failing
mmap requested by the guest, but with one we do ourselves slightly
later. The guest attempts to do a call to some code that hasn't been
translated yet (as it happens it's an attempt to call the tls function
in the ARM commpage at 0x0fe0, but that's not particularly
important). tb_gen_code() generates the code, and then calls
tb_link_page() to add it to our data structures. tb_link_page calls
tb_alloc_page() which calls page_find_alloc() which then needs
to allocate a level 2 page table entry in the l1_map[]. Unfortunately
(a) the mmap() it uses to do this fails because of the rlimit
(b) we don't check the mmap() return value, so we proceed on
with the bogus -1 and quickly segfault.

So "unlock and carry on regardless" is definitely wrong. We could
do the same as the system-mode (which is using g_malloc0) and
abort when the mmap() fails, although that won't help your test
program much I suspect.

For a proper fix we probably need to handle set/getrlimit for
RLIMIT_AS specially so we can apply them ourselves to the guest's
mmap/brk usage and don't get spurious allocation failures of
our private data structures...

-- PMM

Re: [Qemu-devel] [RFC 5/7] qxl-render: call ppm_save on callback

2012-02-21 Thread Eric Blake

On 02/21/2012 01:19 AM, Alon Levy wrote:

>>>  (2) Async monitor command.  Keeps interface and works nicely.  A bunch
>>>  of QAPI bits tickled into master meanwhile, so we could look at
>>>  this again.  Luiz?  What is the status here?
>>>
>>>  (3) Something like this patch + additionally introduce a
>>>  "your-screenshot-is-finished-now" qmp event.  Will break existing
>>>  users too.  But at least they can be adapted without requiring
>>>  some external, nonportable service like inotify ...
>>
>> Libvirt would want 3) - any command that becomes async also needs an
>> event to tell us when the command is completed, so that libvirt can
>> maintain the synchronous interface to the user (and/or expose a new flag
>> to allow the user to also benefit from the asynchronous command).
> 
> If I do 2) then libvirt won't notice because the monitor command will
> block as usual. Only change would be internal, qemu would continue
> processing other fds in the interim.

I guess I misunderstood things then.  I was assuming that an async
monitor command would mean that the monitor command would return control
to libvirt prior to the screenshot file being completely written; but
libvirt promises a synchronous interface to callers (that is, a caller
using virDomainScreenshot won't get a response until the screenshot file
has started streaming, but that means the file had better be consistent,
and not something that gets modified after libvirt has streamed it to
the caller).  I don't particularly care which solution we have, as long
as the overall result is still something where libvirt has guarantees
that the complete screenshot file is available before libvirt then hands
control of that file back to the caller.

-- 
Eric Blake   ebl...@redhat.com+1-919-301-3266
Libvirt virtualization library http://libvirt.org

signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH V7 11/11] xen passthrough: clean up MSI-X table handling

2012-02-21 Thread Anthony PERARD

On Tue, Feb 21, 2012 at 09:34, Jan Beulich  wrote:
> Wouldn't thus much better be merged into the prior patch(es)? After
> all you're not trying to reconstruct the Xen-side history of this code
> anyway.

Yes, that probably better than having an extra patch. There is a
"link" to the history, anyway. I'll merge the patch.

Thanks,

-- 
Anthony PERARD

Re: [Qemu-devel] [PATCH] linux-user: fix segfault deadlock

2012-02-21 Thread Alexander Graf


On 02/21/2012 05:11 PM, Peter Maydell wrote:

On 13 January 2012 16:21, Alexander Graf  wrote:

On 13.01.2012, at 17:16, Peter Maydell wrote:

On 13 January 2012 15:46, Alexander Graf  wrote:

This patch forces unlocking of that lock in the segv handler. I'm not sure
this is the right approach though. Maybe we should rather make sure we don't
segfault in the code? I would greatly appreciate someone more intelligible
than me to look at this :).

A segfault while we're walking the TB chains in QEMU C code?
That's just a bug (and we know we have one there) -- we should
fix it rather than papering over it like this.

Well, we're segfaulting in this exact special case which calls setrlimit() 
before an mmap which fails:

So what's actually happening here is not a problem with the failing
mmap requested by the guest, but with one we do ourselves slightly
later. The guest attempts to do a call to some code that hasn't been
translated yet (as it happens it's an attempt to call the tls function
in the ARM commpage at 0x0fe0, but that's not particularly
important). tb_gen_code() generates the code, and then calls
tb_link_page() to add it to our data structures. tb_link_page calls
tb_alloc_page() which calls page_find_alloc() which then needs
to allocate a level 2 page table entry in the l1_map[]. Unfortunately
(a) the mmap() it uses to do this fails because of the rlimit
(b) we don't check the mmap() return value, so we proceed on
with the bogus -1 and quickly segfault.

So "unlock and carry on regardless" is definitely wrong. We could
do the same as the system-mode (which is using g_malloc0) and
abort when the mmap() fails, although that won't help your test
program much I suspect.

For a proper fix we probably need to handle set/getrlimit for
RLIMIT_AS specially so we can apply them ourselves to the guest's
mmap/brk usage and don't get spurious allocation failures of
our private data structures...


Or we preallocate structures we need in those critical paths. Or we 
manage an allocation failure by not linking pages.


Alex

Re: [Qemu-devel] BlockDriverState stack and BlockListeners

2012-02-21 Thread Markus Armbruster

Kevin Wolf  writes:

> Am 21.02.2012 16:56, schrieb Markus Armbruster:
>> Kevin Wolf  writes:
[...]
>>> Maybe we need to introduce something outside of the whole stack, an
>>> entity that is referred to by the device (as in IDE, virtio-blk, ...)
>>> and that refers to a stack of top-level listeners (which would be moved
>>> to the new top-level BlockSource on live snapshot) and to the first
>>> BlockSource (which can have more listeners, and those would stick with
>>> the same BlockSource even if moves down the chain).
>> 
>> The top-level BDS is already special.  I think it makes sense to factor
>> out the specialness into a "block backend" type, and let it point to a
>> non-special block driver instance (root of a tree of block driver
>> instances, in general).
>
> I think this is what I meant.

Then we're in violent agreement :)

>>> Oh, and just to open another can of worms: We should probably design in
>>> the notion of media (which can be ejected etc.) and drives (which always
>>> stay there). We don't have a clean separation today.
>> 
>> The "closed BDS means no media" thing works, but it's odd.
>
> I'm more talking about data that belongs to the media, like geometry.
> This came up recently with Hervé's floppy patches.

Is geometry relevant to anything but floppies and really small disks
being accessed via really old interfaces?

Re: [Qemu-devel] [PATCH] linux-user: fix segfault deadlock

2012-02-21 Thread Peter Maydell

On 21 February 2012 16:19, Alexander Graf  wrote:
> On 02/21/2012 05:11 PM, Peter Maydell wrote:
>> For a proper fix we probably need to handle set/getrlimit for
>> RLIMIT_AS specially so we can apply them ourselves to the guest's
>> mmap/brk usage and don't get spurious allocation failures of
>> our private data structures...
>
> Or we preallocate structures we need in those critical paths. Or we manage
> an allocation failure by not linking pages.

Yeah. Feel free to submit patches :-)

-- PMM

[Qemu-devel] PCI: Add PCI_EXP_TYPE_PCIE_BRIDGE value

2012-02-21 Thread Anthony PERARD

Signed-off-by: Anthony PERARD 

---
 include/linux/pci_regs.h |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/include/linux/pci_regs.h b/include/linux/pci_regs.h
index e41a10f..4b608f5 100644
--- a/include/linux/pci_regs.h
+++ b/include/linux/pci_regs.h
@@ -391,6 +391,7 @@
 #define  PCI_EXP_TYPE_UPSTREAM 0x5 /* Upstream Port */
 #define  PCI_EXP_TYPE_DOWNSTREAM 0x6   /* Downstream Port */
 #define  PCI_EXP_TYPE_PCI_BRIDGE 0x7   /* PCI/PCI-X Bridge */
+#define  PCI_EXP_TYPE_PCIE_BRIDGE 0x8  /* PCI/PCI-X to PCIE Bridge */
 #define  PCI_EXP_TYPE_RC_END   0x9 /* Root Complex Integrated Endpoint */
 #define  PCI_EXP_TYPE_RC_EC0xa /* Root Complex Event Collector */
 #define PCI_EXP_FLAGS_SLOT 0x0100  /* Slot implemented */
-- 
tg: (27e74da..) add-new-pci-value (depends on: master)

Re: [Qemu-devel] [PATCH] usb-hid: activate usb tablet / mouse after migration.

2012-02-21 Thread Peter Lieven


On 16.10.2011 10:54, TeLeMan wrote:

On Thu, Oct 13, 2011 at 18:48, Gerd Hoffmann  wrote:

On 10/13/11 04:09, TeLeMan wrote:

On Wed, Oct 12, 2011 at 19:30, Gerd Hoffmannwrote:

qemu uses the ps/2 mouse by default.  The usb tablet (or mouse) is
activated as soon as qemu sees some guest activity on the device,
i.e. polling for HID events.  That used to work fine for both fresh
boot and migration.

It does not fix usb tablet/mouse when starting vm directly from snapshot.

What does "info mice" print before/after snapshotting?
Which guest?  WinXP IIRC?

Yes, the guest is WinXP.

Original:
* Mouse #1: QEMU HID Tablet (absolute)
   Mouse #0: QEMU PS/2 Mouse

Start from the snapshot:
* Mouse #0: QEMU PS/2 Mouse
   Mouse #1: QEMU HID Tablet (absolute)

The active mouse device is not be saved into the snapshot.
it seems that the active mouse event receiver is messed up up after 
loading a vm.


if you issue a 'mouse_set 1' in the qemu monitor the tablet should work 
again.


can someone who is more familiar with the loadvm code check if the 
devices are just added in wrong order?


thanks
peter

Re: [Qemu-devel] [PATCH v5 12/12] suspend: add qmp events

2012-02-21 Thread Eric Blake

On 02/21/2012 03:00 AM, Gerd Hoffmann wrote:
> [ added libvirt to Cc:, leaving full context.
>   this is about qmp events when the guest enters/leaves s3 ].
> 

>>> @@ -1436,12 +1437,26 @@ void qemu_register_suspend_notifier(Notifier 
>>> *notifier)
>>>  
>>>  void qemu_system_wakeup_request(WakeupReason reason)
>>>  {
>>> +static const char *names[] = {
>>> +[QEMU_WAKEUP_REASON_OTHER]   = "other",
>>> +[QEMU_WAKEUP_REASON_RTC] = "rtc",
>>> +[QEMU_WAKEUP_REASON_PMTIMER] = "pmtimer",
>>
>> Is the reason really important for mngt? Can we just leave it out?
> 
> I assumed the wakeup reason could be useful, but dunno.
> Zapping the code is no problem of course.
> 
> Lets ask mgmt aka libvirt folks ;)

If you present the reason, libvirt will pass it on to the management
app.  But I don't see any reason why libvirt itself would care about the
reason, so it won't hurt libvirt if you drop a reason field.  I don't
know whether higher-level management apps, like oVirt, would care or not.

Isn't this something where it is easier to omit first and add later once
we have a use case, than to add up front only to find that no one cares?

-- 
Eric Blake   ebl...@redhat.com+1-919-301-3266
Libvirt virtualization library http://libvirt.org

signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] BlockDriverState stack and BlockListeners

2012-02-21 Thread Kevin Wolf

Am 21.02.2012 17:19, schrieb Markus Armbruster:
 Oh, and just to open another can of worms: We should probably design in
 the notion of media (which can be ejected etc.) and drives (which always
 stay there). We don't have a clean separation today.
>>>
>>> The "closed BDS means no media" thing works, but it's odd.
>>
>> I'm more talking about data that belongs to the media, like geometry.
>> This came up recently with Hervé's floppy patches.
> 
> Is geometry relevant to anything but floppies and really small disks
> being accessed via really old interfaces?

Not sure what it's actually used for, but even virtio-blk does have a
geometry.

But is it really only geometry? I think the read-only flag belongs to
the medium as well. There are probably more candidates.

Kevin

Re: [Qemu-devel] virtio-blk performance regression and qemu-kvm

2012-02-21 Thread Dongsu Park

Hi Rusty,

On 13.02.2012 10:25, Rusty Russell wrote:
> On Fri, 10 Feb 2012 15:36:39 +0100, Dongsu Park 
>  wrote:
> > Hi,
> > 
> > Recently I observed performance regression regarding virtio-blk,
> > especially different IO bandwidths between qemu-kvm 0.14.1 and 1.0.
> > So I want to share the benchmark results, and ask you what the reason
> > would be.
> 
> Interesting.  There are two obvious possibilities here.  One is that
> qemu has regressed, the other is that virtio_blk has regressed; the new
> qemu may negotiate new features.  Please do the following in the guest
> with old and new qemus:
> 
> cat /sys/class/block/vdb/device/features
> 
> (eg, here that gives: 001010110110100e0).

I did that on guest VM, using both qemu-kvm 0.14.1 and 1.0.
(cat /sys/class/block/vdb/device/features)

using qemu-kvm 0.14.1:

0010101101101000

using qemu-kvm 1.0:

0010101101101100

>From my understanding, both of them have the same virtio features.
Please correct me if I'm wrong.

Regards,
Dongsu

Re: [Qemu-devel] win7 bad i/o performance, high insn_emulation and exists

2012-02-21 Thread Vadim Rozenfeld

- Original Message -
From: "Peter Lieven" 
To: "Vadim Rozenfeld" 
Cc: qemu-devel@nongnu.org, k...@vger.kernel.org, "Gleb Natapov" 

Sent: Tuesday, February 21, 2012 4:10:22 PM
Subject: Re: win7 bad i/o performance, high insn_emulation and exists

On 21.02.2012 14:56, Vadim Rozenfeld wrote:
>
> - Original Message -
> From: "Peter Lieven"
> To: "Gleb Natapov"
> Cc: qemu-devel@nongnu.org, k...@vger.kernel.org, vroze...@redhat.com
> Sent: Tuesday, February 21, 2012 2:05:25 PM
> Subject: Re: win7 bad i/o performance, high insn_emulation and exists
>
> On 21.02.2012 12:46, Gleb Natapov wrote:
>> On Tue, Feb 21, 2012 at 12:16:16PM +0100, Peter Lieven wrote:
>>> On 21.02.2012 12:00, Gleb Natapov wrote:
 On Tue, Feb 21, 2012 at 11:59:23AM +0100, Peter Lieven wrote:
> On 21.02.2012 11:56, Gleb Natapov wrote:
>> On Tue, Feb 21, 2012 at 11:50:47AM +0100, Peter Lieven wrote:
 I hope it will make Windows use TSC instead, but you can't be sure
 about anything with Windows :(
>>> Whatever it does now it eates more CPU has almost equal
>>> number of exits and throughput is about the same (15MB/s).
>>> If pmtimer is at 0xb008 it still reads it like hell.
>>>
>>> I checked with bcedit /v that useplatformclock is set to "No".
>> Yeah, today I noticed that it is likely virtio drivers that hammer
>> on PM timer (at least rip of the instruction that access it is
>> very close to rip of the instruction that access virtio pio).
>> Vadim, Windows driver developer,  is CCed.
> Ok, I will switch to IDE and e1000 to confirm this? Or does it not
> make sense?
>
 It make perfect sense! Please try it.
>>> ~10MB/s. still a lot of 0xb008 reads.
>>>
> [VR]
> Could it be that you have Driver Verifier running in you system?
>
unfortunately not.

[VR]
Then could you try booting into "Safe Mode"?

i found the following in an old knowledge base article
(http://support.microsoft.com/kb/938448):

"Only Windows Server 2003 with Service Pack 2 uniprocessor ACPI HALs use
*PMTIMER* for QPC by default. Multiprocessor ACPI HALs will use
*PMTIMER* only if *USE_PLATFORM_CLOCK *flag is set by the BIOS or if the
*/usepmtimer *boot.ini option is used. Other HAL types don’t support
*PMTIMER* and will use *TSC* by default for QPC

By default, Windows Server 2003 Service Pack 2 (SP2) uses the PM timer
for all Advanced Configuration and Power Interface (ACPI) HALs unless
one of the following conditions aretrue:

* The check process to determine whether the BIOS supports the APIC
  or ACPI HALs fails.
* *

  Note:* If the BIOS does not support the ACPI HAL, contact the
  original equipment manufacturer to determine whether a BIOS update
  is available that will resolve the problem. If a BIOS update is
  not available, you must use the PM timer by using the
  */usepmtimer* switch.

If you are not running Windows Server 2003 SP2, you must force the AMD
computer to use the PM timer by using the */usepmtimer* switch.

*Note* The decision to use the PM timer or the TSC timer is made during
a check that is performed at startup to query the BIOS and to determine
whether the BIOS will support the PM timer functions. This check is not
completely accurate on AMD chipsets. Therefore, you must use the
*/usepmtimer* switch.

In Windows Server 2003 SP2, this section of code was rewritten.
Therefore, the correct performance monitor data appears on AMD chipsets
that have Windows Server 2003 SP2 installed, and you do not have to use
the */usepmtimer* switch.

For more information about ACPI and APCI hardware support, click the
following article number to view the article in the Microsoft Knowledge
Base:
309283   HAL options after
Windows XP or Windows Server 2003 Setup
The third-party products that this article discusses are manufactured by
companies that are independent of Microsoft. Microsoft makes no
warranty, implied or otherwise, about the performance or reliability of
these products."

-

so it seems windows prefers pmtimer over tsc. has anyone an idea/hack to
make the acpi_pm timer fail without disabling acpi completely?

thanks,
peter
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Qemu-devel] [PULL v2 0/9] qdev deconstruction, command-line episode

2012-02-21 Thread Paolo Bonzini

Anthony,

I'm sending a pull request since you informally said on IRC that you're
okay with the patches.

The following changes since commit 99c7f87826337fa81f2f0f9baa9ca0a44faf90e9:

  input: send kbd+mouse events only to running guests. (2012-02-17 11:02:55 
-0600)

are available in the git repository at:
  git://github.com/bonzini/qemu.git qdev-props-for-anthony

v1->v2:
Fix bug in parsing booleans and strengthen test suite.
The interdiff is attached.

Paolo Bonzini (9):
  qapi: allow sharing enum implementation across visitors
  qapi: drop qmp_input_end_optional
  qapi: add string-based visitors
  qapi: add tests for string-based visitors
  qom: add generic string parsing/printing
  qdev: accept both strings and integers for PCI addresses
  qdev: accept hex properties only if prefixed by 0x
  qdev: use built-in QOM string parser
  qdev: drop unnecessary parse/print methods

 .gitignore   |2 +
 Makefile.objs|5 +-
 hw/qdev-properties.c |  186 +---
 include/qemu/object.h|   24 +
 qapi/qapi-visit-core.c   |   51 +++
 qapi/qapi-visit-impl.h   |   23 +
 qapi/qmp-input-visitor.c |   39 +
 qapi/qmp-output-visitor.c|   22 +-
 qapi/string-input-visitor.c  |  138 +
 qapi/string-input-visitor.h  |   25 ++
 qapi/string-output-visitor.c |   89 +++
 qapi/string-output-visitor.h |   26 ++
 qom/object.c |   24 +
 test-string-input-visitor.c  |  195 ++
 test-string-output-visitor.c |  188 
 tests/Makefile   |   12 ++-
 16 files changed, 841 insertions(+), 208 deletions(-)
 create mode 100644 qapi/qapi-visit-impl.h
 create mode 100644 qapi/string-input-visitor.c
 create mode 100644 qapi/string-input-visitor.h
 create mode 100644 qapi/string-output-visitor.c
 create mode 100644 qapi/string-output-visitor.h
 create mode 100644 test-string-input-visitor.c
 create mode 100644 test-string-output-visitor.c

diff --git a/include/qemu/object.h b/include/qemu/object.h
index 2081ca0..ff6be14 100644
--- a/include/qemu/object.h
+++ b/include/qemu/object.h
@@ -742,7 +742,7 @@ void object_property_parse(Object *obj, const char *string,
const char *name, struct Error **errp);
 
 /**
- * object_property_set:
+ * object_property_print:
  * @obj: the object
  * @name: the name of the property
  * @errp: returns an error if this function fails
diff --git a/qapi/string-input-visitor.c b/qapi/string-input-visitor.c
index ceee699..497eb9a 100644
--- a/qapi/string-input-visitor.c
+++ b/qapi/string-input-visitor.c
@@ -47,13 +47,15 @@ static void parse_type_bool(Visitor *v, bool *obj, const 
char *name,
 StringInputVisitor *siv = DO_UPCAST(StringInputVisitor, visitor, v);
 
 if (siv->string) {
-if (strcasecmp(siv->string, "on") || strcasecmp(siv->string, "yes") ||
-strcasecmp(siv->string, "true")) {
+if (!strcasecmp(siv->string, "on") ||
+!strcasecmp(siv->string, "yes") ||
+!strcasecmp(siv->string, "true")) {
 *obj = true;
 return;
 }
-if (strcasecmp(siv->string, "off") || strcasecmp(siv->string, "no") ||
-strcasecmp(siv->string, "false")) {
+if (!strcasecmp(siv->string, "off") ||
+!strcasecmp(siv->string, "no") ||
+!strcasecmp(siv->string, "false")) {
 *obj = false;
 return;
 }
diff --git a/test-string-input-visitor.c b/test-string-input-visitor.c
index ba2cc40..5370e32 100644
--- a/test-string-input-visitor.c
+++ b/test-string-input-visitor.c
@@ -75,6 +75,41 @@ static void test_visitor_in_bool(TestInputVisitorData *data,
 visit_type_bool(v, &res, NULL, &errp);
 g_assert(!error_is_set(&errp));
 g_assert_cmpint(res, ==, true);
+visitor_input_teardown(data, unused);
+
+v = visitor_input_test_init(data, "yes");
+
+visit_type_bool(v, &res, NULL, &errp);
+g_assert(!error_is_set(&errp));
+g_assert_cmpint(res, ==, true);
+visitor_input_teardown(data, unused);
+
+v = visitor_input_test_init(data, "on");
+
+visit_type_bool(v, &res, NULL, &errp);
+g_assert(!error_is_set(&errp));
+g_assert_cmpint(res, ==, true);
+visitor_input_teardown(data, unused);
+
+v = visitor_input_test_init(data, "false");
+
+visit_type_bool(v, &res, NULL, &errp);
+g_assert(!error_is_set(&errp));
+g_assert_cmpint(res, ==, false);
+visitor_input_teardown(data, unused);
+
+v = visitor_input_test_init(data, "no");
+
+visit_type_bool(v, &res, NULL, &errp);
+g_assert(!error_is_set(&errp));
+g_assert_cmpint(res, ==, false);
+visitor_input_teardown(data, unused);
+
+v = visitor_input_test_init(data, "off");
+
+visit_type_bool(v, &res, NULL,

Re: [Qemu-devel] BlockDriverState stack and BlockListeners

2012-02-21 Thread Stefan Hajnoczi

On Tue, Feb 21, 2012 at 1:10 PM, Kevin Wolf  wrote:
> Am 21.02.2012 12:36, schrieb Paolo Bonzini:
> Oh, and just to open another can of worms: We should probably design in
> the notion of media (which can be ejected etc.) and drives (which always
> stay there). We don't have a clean separation today.

Media and drives are guest state, they should be part of device
emulation and not host device implementation.

The layering issue is that things like floppy and CD-ROM passthrough
do deal with media change and status.

Does it make sense to have a hw/drive.c for the guest state associated
with a -drive?

Stefan

Re: [Qemu-devel] Mascot Contest Results

2012-02-21 Thread Mulyadi Santosa

On Tue, Feb 21, 2012 at 23:03, Anthony Liguori  wrote:
> But there is pretty clearly a favorite and with 12 votes, I'm pleased to
> announce that Benoit Canet's "Kew the Angry Emu" logo is the winner!

I am late to vote, but after seeing the SVG, I say it's a good logo.
Kudos for Benoit!

-- 
regards,

Mulyadi Santosa
Freelance Linux trainer and consultant

blog: the-hydra.blogspot.com
training: mulyaditraining.blogspot.com

Re: [Qemu-devel] virtio-blk performance regression and qemu-kvm

2012-02-21 Thread Stefan Hajnoczi

On Tue, Feb 21, 2012 at 3:57 PM, Dongsu Park
 wrote:
> On 13.02.2012 11:57, Stefan Hajnoczi wrote:
>> On Fri, Feb 10, 2012 at 2:36 PM, Dongsu Park
>>  wrote:
>> >  Now I'm running benchmarks with both qemu-kvm 0.14.1 and 1.0.
>> >
>> >  - Sequential read (Running inside guest)
>> >   # fio -name iops -rw=read -size=1G -iodepth 1 \
>> >    -filename /dev/vdb -ioengine libaio -direct=1 -bs=4096
>> >
>> >  - Sequential write (Running inside guest)
>> >   # fio -name iops -rw=write -size=1G -iodepth 1 \
>> >    -filename /dev/vdb -ioengine libaio -direct=1 -bs=4096
>> >
>> >  For each one, I tested 3 times to get the average.
>> >
>> >  Result:
>> >
>> >  seqread with qemu-kvm 0.14.1   67,0 MByte/s
>> >  seqread with qemu-kvm 1.0      30,9 MByte/s
>> >
>> >  seqwrite with qemu-kvm 0.14.1  65,8 MByte/s
>> >  seqwrite with qemu-kvm 1.0     30,5 MByte/s
>>
>> Please retry with the following commit or simply qemu-kvm.git/master.
>> Avi discovered a performance regression which was introduced when the
>> block layer was converted to use coroutines:
>>
>> $ git describe 39a7a362e16bb27e98738d63f24d1ab5811e26a8
>> v1.0-327-g39a7a36
>>
>> (This commit is not in 1.0!)
>>
>> Please post your qemu-kvm command-line.
>>
>> 67 MB/s sequential 4 KB read means 67 * 1024 / 4 = 17152 requests per
>> second, so 58 microseconds per request.
>>
>> Please post the fio output so we can double-check what is reported.
>
> As you mentioned above, I tested it again with the revision
> v1.0-327-g39a7a36, which includes the commit 39a7a36.
>
> Result is though still not good enough.
>
> seqread   : 20.3 MByte/s
> seqwrite  : 20.1 MByte/s
> randread  : 20.5 MByte/s
> randwrite : 20.0 MByte/s
>
> My qemu-kvm commandline is like below:
>
> ===
> /usr/bin/kvm -S -M pc-0.14 -enable-kvm -m 1024 \
> -smp 1,sockets=1,cores=1,threads=1 -name mydebian3_8gb \
> -uuid d99ad012-2fcc-6f7e-fbb9-bc48b424a258 -nodefconfig -nodefaults \
> -chardev 
> socket,id=charmonitor,path=/var/lib/libvirt/qemu/mydebian3_8gb.monitor,server,nowait
>  \
> -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown \
> -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw \
> -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 \
> -drive 
> file=/var/lib/libvirt/images/mydebian3_8gb.img,if=none,id=drive-virtio-disk0,format=raw,cache=none,aio=native
>  \
> -device 
> virtio-blk-pci,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
>  \
> -drive 
> file=/dev/ram0,if=none,id=drive-virtio-disk1,format=raw,cache=none,aio=native 
> \

I'm not sure if O_DIRECT and Linux AIO to /dev/ram0 is a good idea.
At least with tmpfs O_DIRECT does not even work - which kind of makes
sense there because tmpfs lives in the page cache.  My point here is
that ramdisk does not follow the same rules or have the same
performance characteristics as real disks do.  It's something to be
careful about.  Did you run this test because you noticed a real-world
regression?

> Here is a sample of fio output:
>
> ===
> # fio -name iops -rw=read -size=1G -iodepth 1 -filename /dev/vdb \
> -ioengine libaio -direct=1 -bs=4096
> iops: (g=0): rw=read, bs=4K-4K/4K-4K, ioengine=libaio, iodepth=1
> Starting 1 process
> Jobs: 1 (f=1): [R] [100.0% done] [21056K/0K /s] [5140/0 iops] [eta
> 00m:00s]
> iops: (groupid=0, jobs=1): err= 0: pid=1588
>  read : io=1024MB, bw=20101KB/s, iops=5025, runt= 52166msec
>    slat (usec): min=4, max=6461, avg=24.00, stdev=19.75
>    clat (usec): min=0, max=11934, avg=169.49, stdev=113.91
>    bw (KB/s) : min=18200, max=23048, per=100.03%, avg=20106.31, stdev=934.42
>  cpu          : usr=5.43%, sys=23.25%, ctx=262363, majf=0, minf=28
>  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
>     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>     issued r/w: total=262144/0, short=0/0
>     lat (usec): 2=0.01%, 4=0.16%, 10=0.03%, 20=0.01%, 50=0.27%
>     lat (usec): 100=4.07%, 250=89.12%, 500=5.76%, 750=0.30%, 1000=0.13%
>     lat (msec): 2=0.12%, 4=0.02%, 10=0.01%, 20=0.01%
>
> Run status group 0 (all jobs):
>   READ: io=1024MB, aggrb=20100KB/s, minb=20583KB/s, maxb=20583KB/s,
> mint=52166msec, maxt=52166msec
>
> Disk stats (read/write):
>  vdb: ios=261308/0, merge=0/0, ticks=40210/0, in_queue=40110, util=77.14%
> ===
>
>
> So I think, the patch for coroutine-ucontext isn't about the bottleneck
> I'm looking for.

Try turning ioeventfd off for the virtio-blk device:

-device virtio-blk-pci,ioeventfd=off,...

You might see better performance since ramdisk I/O should be very
low-latency.  The overhead of using ioeventfd might not make it
worthwhile.  The ioeventfd feature was added post-0.14

Re: [Qemu-devel] [RFC PATCH] replication agent module

2012-02-21 Thread Stefan Hajnoczi

On Tue, Feb 21, 2012 at 4:01 PM, Markus Armbruster  wrote:
> Anthony Liguori  writes:
>
>> On 02/07/2012 07:50 AM, Stefan Hajnoczi wrote:
>>> On Tue, Feb 7, 2012 at 1:34 PM, Kevin Wolf  wrote:
 Am 07.02.2012 11:29, schrieb Ori Mamluk:
> Repagent is a new module that allows an external replication system to
> replicate a volume of a Qemu VM.
>>>
>>> I recently joked with Kevin that QEMU is on its way to reimplementing
>>> the Linux block and device-mapper layers.  Now we have drbd, thanks!
>>> :P
>>
>> I don't think it's a joke.  Do we really want to get into this space?
>> Why not just use drbd?
>>
>> If it's because we want to also work with image formats, perhaps we
>> should export our image format code as a shared library and let drbd
>> link against it.
>
> I want to be able to "mount foo.qcow2 /mnt".

That would only be possible if foo.qcow2 is a flat file system without
a partition table or logical volumes.

But I completely agree that working with image files on the host
should be as easy as mount.

Stefan

Re: [Qemu-devel] win7 bad i/o performance, high insn_emulation and exists

2012-02-21 Thread Peter Lieven

On 21.02.2012 17:48, Vadim Rozenfeld wrote:

- Original Message -
From: "Peter Lieven"
To: "Vadim Rozenfeld"
Cc: qemu-devel@nongnu.org, k...@vger.kernel.org, "Gleb Natapov"
Sent: Tuesday, February 21, 2012 4:10:22 PM
Subject: Re: win7 bad i/o performance, high insn_emulation and exists

On 21.02.2012 14:56, Vadim Rozenfeld wrote:

- Original Message -
From: "Peter Lieven"
To: "Gleb Natapov"
Cc: qemu-devel@nongnu.org, k...@vger.kernel.org, vroze...@redhat.com
Sent: Tuesday, February 21, 2012 2:05:25 PM
Subject: Re: win7 bad i/o performance, high insn_emulation and exists

On 21.02.2012 12:46, Gleb Natapov wrote:

On Tue, Feb 21, 2012 at 12:16:16PM +0100, Peter Lieven wrote:

On 21.02.2012 12:00, Gleb Natapov wrote:

On Tue, Feb 21, 2012 at 11:59:23AM +0100, Peter Lieven wrote:

On 21.02.2012 11:56, Gleb Natapov wrote:

On Tue, Feb 21, 2012 at 11:50:47AM +0100, Peter Lieven wrote:

I hope it will make Windows use TSC instead, but you can't be sure
about anything with Windows :(

Whatever it does now it eates more CPU has almost equal
number of exits and throughput is about the same (15MB/s).
If pmtimer is at 0xb008 it still reads it like hell.

I checked with bcedit /v that useplatformclock is set to "No".

Yeah, today I noticed that it is likely virtio drivers that hammer
on PM timer (at least rip of the instruction that access it is
very close to rip of the instruction that access virtio pio).
Vadim, Windows driver developer,  is CCed.

Ok, I will switch to IDE and e1000 to confirm this? Or does it not
make sense?

It make perfect sense! Please try it.

~10MB/s. still a lot of 0xb008 reads.

[VR]
Could it be that you have Driver Verifier running in you system?

unfortunately not.

[VR]
Then could you try booting into "Safe Mode"?

same picture.

[Qemu-devel] [PATCH] PPC: 405: Fix ppc405ep initialization

2012-02-21 Thread Alexander Graf

When trying to run a ppc405 guest, it segfaults quite quickly, trying to
access timers that weren't initialized. Initialize them properly instead.

Reported-by: Andreas Faerber 
Signed-off-by: Alexander Graf 
---
 hw/ppc405_uc.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/hw/ppc405_uc.c b/hw/ppc405_uc.c
index 98079fa..1f4efbf 100644
--- a/hw/ppc405_uc.c
+++ b/hw/ppc405_uc.c
@@ -2483,6 +2483,8 @@ CPUState *ppc405ep_init(MemoryRegion *address_space_mem,
 ppc4xx_pob_init(env);
 /* OBP arbitrer */
 ppc4xx_opba_init(0xef600600);
+/* Initialize timers */
+ppc_booke_timers_init(env, sysclk, 0);
 /* Universal interrupt controller */
 irqs = g_malloc0(sizeof(qemu_irq) * PPCUIC_OUTPUT_NB);
 irqs[PPCUIC_OUTPUT_INT] =
-- 
1.6.0.2

[Qemu-devel] [PATCH] PPC: 405: Use proper CPU reset

2012-02-21 Thread Alexander Graf

On ppc405ep there is a register that allows for software to reset the
core, but not the whole system. Implement this reset using a reset
interrupt.

This gets rid of a bunch of #if 0'ed code.

Reported-by: Andreas Faerber 
Signed-off-by: Alexander Graf 
---
 cpu-exec.c   |2 --
 hw/ppc.c |   13 ++---
 hw/ppc405_uc.c   |   16 ++--
 target-ppc/cpu.h |3 +++
 4 files changed, 7 insertions(+), 27 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index a9fa608..8b1edb0 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -339,11 +339,9 @@ int cpu_exec(CPUState *env)
 }
 }
 #elif defined(TARGET_PPC)
-#if 0
 if ((interrupt_request & CPU_INTERRUPT_RESET)) {
 cpu_reset(env);
 }
-#endif
 if (interrupt_request & CPU_INTERRUPT_HARD) {
 ppc_hw_interrupt(env);
 if (env->pending_interrupts == 0)
diff --git a/hw/ppc.c b/hw/ppc.c
index 59882e2..a9516f1 100644
--- a/hw/ppc.c
+++ b/hw/ppc.c
@@ -131,13 +131,7 @@ static void ppc6xx_set_irq (void *opaque, int pin, int 
level)
 /* Level sensitive - active low */
 if (level) {
 LOG_IRQ("%s: reset the CPU\n", __func__);
-env->interrupt_request |= CPU_INTERRUPT_EXITTB;
-/* XXX: TOFIX */
-#if 0
-cpu_reset(env);
-#else
-qemu_system_reset_request();
-#endif
+cpu_interrupt(env, CPU_INTERRUPT_RESET);
 }
 break;
 case PPC6xx_INPUT_SRESET:
@@ -214,10 +208,7 @@ static void ppc970_set_irq (void *opaque, int pin, int 
level)
 case PPC970_INPUT_HRESET:
 /* Level sensitive - active low */
 if (level) {
-#if 0 // XXX: TOFIX
-LOG_IRQ("%s: reset the CPU\n", __func__);
-cpu_reset(env);
-#endif
+cpu_interrupt(env, CPU_INTERRUPT_RESET);
 }
 break;
 case PPC970_INPUT_SRESET:
diff --git a/hw/ppc405_uc.c b/hw/ppc405_uc.c
index 1f4efbf..f18d6d8 100644
--- a/hw/ppc405_uc.c
+++ b/hw/ppc405_uc.c
@@ -1769,13 +1769,7 @@ void ppc40x_core_reset (CPUState *env)
 target_ulong dbsr;
 
 printf("Reset PowerPC core\n");
-env->interrupt_request |= CPU_INTERRUPT_EXITTB;
-/* XXX: TOFIX */
-#if 0
-cpu_reset(env);
-#else
-qemu_system_reset_request();
-#endif
+cpu_interrupt(env, CPU_INTERRUPT_RESET);
 dbsr = env->spr[SPR_40x_DBSR];
 dbsr &= ~0x0300;
 dbsr |= 0x0100;
@@ -1787,13 +1781,7 @@ void ppc40x_chip_reset (CPUState *env)
 target_ulong dbsr;
 
 printf("Reset PowerPC chip\n");
-env->interrupt_request |= CPU_INTERRUPT_EXITTB;
-/* XXX: TOFIX */
-#if 0
-cpu_reset(env);
-#else
-qemu_system_reset_request();
-#endif
+cpu_interrupt(env, CPU_INTERRUPT_RESET);
 /* XXX: TODO reset all internal peripherals */
 dbsr = env->spr[SPR_40x_DBSR];
 dbsr &= ~0x0300;
diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
index fbcf488..ac753f3 100644
--- a/target-ppc/cpu.h
+++ b/target-ppc/cpu.h
@@ -2051,6 +2051,9 @@ enum {
 PPC_INTERRUPT_PERFM,  /* Performance monitor interrupt*/
 };
 
+/* CPU should be reset next, restart from scratch afterwards */
+#define CPU_INTERRUPT_RESET   CPU_INTERRUPT_TGT_INT_0
+
 /*/
 
 static inline void cpu_get_tb_cpu_state(CPUState *env, target_ulong *pc,
-- 
1.6.0.2

1 2 >

1 - 100 of 138 matches

Mail list logo