date:20120912

Re: [Qemu-devel] [PATCH] hw: Add support for new LSI Logic devices.

2012-09-12 Thread Gerhard Wiesinger


On 11.09.2012 19:00, Don Slutz wrote:

Add LSI53C1030, SAS1068, SAS1068e.  Based on code from "VirtualBox Open Source 
Edition".
Based on QEMU MegaRAID SAS 8708EM2.

This is a common VMware disk controller.

SEABIOS change for booting is in the works.

Tested with Fedora 16, 17.  CentoOS 6. Windows 2003R2 and 2008R2.

Signed-off-by: Don Slutz 



Hello Don,

Looks great! Will give it a try.

BTW: Anyone working on a Buslogic SCSI implementation. Then the list for 
VMWare is IHMO complete.


Ciao,
Gerhard

Re: [Qemu-devel] [PATCH V12 5/6] add-cow file format

2012-09-12 Thread Dong Xu Wang

On Tue, Sep 11, 2012 at 5:40 PM, Kevin Wolf  wrote:
> Am 10.08.2012 17:39, schrieb Dong Xu Wang:
>> add-cow file format core code. It use block-cache.c as cache code.
>>
>> Signed-off-by: Dong Xu Wang 
>> ---
>>  block/Makefile.objs |1 +
>>  block/add-cow.c |  613 
>> +++
>>  block/add-cow.h |   85 +++
>>  block_int.h |2 +
>>  4 files changed, 701 insertions(+), 0 deletions(-)
>>  create mode 100644 block/add-cow.c
>>  create mode 100644 block/add-cow.h
>>
>> diff --git a/block/Makefile.objs b/block/Makefile.objs
>> index 23bdfc8..7ed5051 100644
>> --- a/block/Makefile.objs
>> +++ b/block/Makefile.objs
>> @@ -2,6 +2,7 @@ block-obj-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o 
>> bochs.o vpc.o vvfat
>>  block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o
>>  block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
>>  block-obj-y += qed-check.o
>> +block-obj-y += add-cow.o
>>  block-obj-y += block-cache.o
>>  block-obj-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
>>  block-obj-y += stream.o
>> diff --git a/block/add-cow.c b/block/add-cow.c
>> new file mode 100644
>> index 000..d4711d5
>> --- /dev/null
>> +++ b/block/add-cow.c
>> @@ -0,0 +1,613 @@
>> +/*
>> + * QEMU ADD-COW Disk Format
>> + *
>> + * Copyright IBM, Corp. 2012
>> + *
>> + * Authors:
>> + *  Dong Xu Wang 
>> + *
>> + * This work is licensed under the terms of the GNU LGPL, version 2 or 
>> later.
>> + * See the COPYING.LIB file in the top-level directory.
>> + *
>> + */
>> +
>> +#include "qemu-common.h"
>> +#include "block_int.h"
>> +#include "module.h"
>> +#include "add-cow.h"
>> +
>> +static void add_cow_header_le_to_cpu(const AddCowHeader *le, AddCowHeader 
>> *cpu)
>> +{
>> +cpu->magic  = le64_to_cpu(le->magic);
>> +cpu->version= le32_to_cpu(le->version);
>> +
>> +cpu->backing_filename_offset= 
>> le32_to_cpu(le->backing_filename_offset);
>> +cpu->backing_filename_size  = 
>> le32_to_cpu(le->backing_filename_size);
>> +
>> +cpu->image_filename_offset  = 
>> le32_to_cpu(le->image_filename_offset);
>> +cpu->image_filename_size= le32_to_cpu(le->image_filename_size);
>> +
>> +cpu->features   = le64_to_cpu(le->features);
>> +cpu->optional_features  = le64_to_cpu(le->optional_features);
>> +cpu->header_pages_size  = le32_to_cpu(le->header_pages_size);
>> +}
>> +
>> +static void add_cow_header_cpu_to_le(const AddCowHeader *cpu, AddCowHeader 
>> *le)
>> +{
>> +le->magic   = cpu_to_le64(cpu->magic);
>> +le->version = cpu_to_le32(cpu->version);
>> +
>> +le->backing_filename_offset = 
>> cpu_to_le32(cpu->backing_filename_offset);
>> +le->backing_filename_size   = 
>> cpu_to_le32(cpu->backing_filename_size);
>> +
>> +le->image_filename_offset   = 
>> cpu_to_le32(cpu->image_filename_offset);
>> +le->image_filename_size = cpu_to_le32(cpu->image_filename_size);
>> +
>> +le->features= cpu_to_le64(cpu->features);
>> +le->optional_features   = cpu_to_le64(cpu->optional_features);
>> +le->header_pages_size   = cpu_to_le32(cpu->header_pages_size);
>> +}
>> +
>> +static int add_cow_probe(const uint8_t *buf, int buf_size, const char 
>> *filename)
>> +{
>> +const AddCowHeader *header = (const AddCowHeader *)buf;
>> +
>> +if (le64_to_cpu(header->magic) == ADD_COW_MAGIC &&
>> +le32_to_cpu(header->version) == ADD_COW_VERSION) {
>> +return 100;
>> +} else {
>> +return 0;
>> +}
>> +}
>> +
>> +static int add_cow_create(const char *filename, QEMUOptionParameter 
>> *options)
>> +{
>> +AddCowHeader header = {
>> +.magic = ADD_COW_MAGIC,
>> +.version = ADD_COW_VERSION,
>> +.features = 0,
>> +.optional_features = 0,
>> +.header_pages_size = ADD_COW_DEFAULT_PAGE_SIZE,
>> +};
>> +AddCowHeader le_header;
>> +int64_t image_len = 0;
>> +const char *backing_filename = NULL;
>> +const char *backing_fmt = NULL;
>> +const char *image_filename = NULL;
>> +const char *image_format = NULL;
>> +BlockDriverState *bs, *image_bs = NULL, *backing_bs = NULL;
>> +BlockDriver *drv = bdrv_find_format("add-cow");
>> +BDRVAddCowState s;
>> +int ret;
>> +
>> +while (options && options->name) {
>> +if (!strcmp(options->name, BLOCK_OPT_SIZE)) {
>> +image_len = options->value.n;
>> +} else if (!strcmp(options->name, BLOCK_OPT_BACKING_FILE)) {
>> +backing_filename = options->value.s;
>> +} else if (!strcmp(options->name, BLOCK_OPT_BACKING_FMT)) {
>> +backing_fmt = options->value.s;
>> +} else if (!strcmp(options->name, BLOCK_OPT_IMAGE_FILE)) {
>> +image_filename = options->value.s;
>> +} else if

Re: [Qemu-devel] [PATCH] isapc: Shadow ISA BIOS by default

2012-09-12 Thread Avi Kivity

On 09/12/2012 08:57 AM, Michael Tokarev wrote:
>> 
>> This unbreaks isapc for TCG, and keep it working for KVM once it starts
>> supporting read-only memslots.
> 
> Can you clarify please, -- referring to the above? :)

kvm only works due to a bug.  Once the kvm bug is fixed, isapc will
break, unless the patch above is applied.


-- 
error compiling committee.c: too many arguments to function

Re: [Qemu-devel] [PATCH V12 5/6] add-cow file format

2012-09-12 Thread Kevin Wolf

Am 12.09.2012 09:28, schrieb Dong Xu Wang:
>>> +static bool is_allocated(BlockDriverState *bs, int64_t sector_num)
>>> +{
>>> +BDRVAddCowState *s  = bs->opaque;
>>> +BlockCache *c = s->bitmap_cache;
>>> +int64_t cluster_num = sector_num / SECTORS_PER_CLUSTER;
>>> +uint8_t *table  = NULL;
>>> +uint64_t offset = ADD_COW_PAGE_SIZE * s->header.header_pages_size
>>> ++ (offset_in_bitmap(sector_num) & (~(c->entry_size - 1)));
>>> +int ret = block_cache_get(bs, s->bitmap_cache, offset,
>>> +(void **)&table, BLOCK_TABLE_BITMAP, ADD_COW_CACHE_ENTRY_SIZE);
>>
>> No matching block_cache_put?
>>
>>> +
>>> +if (ret < 0) {
>>> +return ret;
>>> +}
>>> +return table[cluster_num / 8 % ADD_COW_CACHE_ENTRY_SIZE]
>>> +& (1 << (cluster_num % 8));
>>> +}
>>> +
>>> +static coroutine_fn int add_cow_is_allocated(BlockDriverState *bs,
>>> +int64_t sector_num, int nb_sectors, int *num_same)
>>> +{
>>> +BDRVAddCowState *s = bs->opaque;
>>> +int changed;
>>> +
>>> +if (nb_sectors == 0) {
>>> +*num_same = 0;
>>> +return 0;
>>> +}
>>> +
>>> +if (s->header.features & ADD_COW_F_All_ALLOCATED) {
>>> +*num_same = nb_sectors - 1;
>>
>> Why - 1?
>>
>>> +return 1;
>>> +}
>>> +changed = is_allocated(bs, sector_num);
>>> +
>>> +for (*num_same = 1; *num_same < nb_sectors; (*num_same)++) {
>>> +if (is_allocated(bs, sector_num + *num_same) != changed) {
>>> +break;
>>> +}
>>> +}
>>> +return changed;
>>> +}

>>> +static int bdrv_add_cow_truncate(BlockDriverState *bs, int64_t size)
>>> +{
>>> +BDRVAddCowState *s = bs->opaque;
>>> +int sector_per_byte = SECTORS_PER_CLUSTER * 8;
>>> +int ret;
>>> +uint32_t bitmap_pos = s->header.header_pages_size * ADD_COW_PAGE_SIZE;
>>> +int64_t bitmap_size =
>>> +(size / BDRV_SECTOR_SIZE + sector_per_byte - 1) / sector_per_byte;
>>> +bitmap_size = (bitmap_size + ADD_COW_CACHE_ENTRY_SIZE - 1)
>>> +& (~(ADD_COW_CACHE_ENTRY_SIZE - 1));
>>> +
>>> +ret = bdrv_truncate(bs->file, bitmap_pos + bitmap_size);
>>> +if (ret < 0) {
>>> +return ret;
>>> +}
>>> +return 0;
>>> +}
>>
>> So you don't truncate s->image_file? Does this work?
> 
> s->image_file should be truncated? Image file can have a larger virtual size
> than backing_file, my understanding is we should not truncate image file.

I'm talking about s->image_hd, not bs->backing_hd. You are right that
the backing file should not be changed. But the associated raw image
should be resized, shouldn't it?

>>> +static coroutine_fn int add_cow_co_flush(BlockDriverState *bs)
>>> +{
>>> +BDRVAddCowState *s = bs->opaque;
>>> +int ret;
>>> +
>>> +qemu_co_mutex_lock(&s->lock);
>>> +ret = block_cache_flush(bs, s->bitmap_cache, BLOCK_TABLE_BITMAP,
>>> +ADD_COW_CACHE_ENTRY_SIZE);
>>> +qemu_co_mutex_unlock(&s->lock);
>>> +return ret;
>>> +}
>>
>> What about flushing s->image_file?

>>> +typedef struct AddCowHeader {
>>> +uint64_tmagic;
>>> +uint32_tversion;
>>> +
>>> +uint32_tbacking_filename_offset;
>>> +uint32_tbacking_filename_size;
>>> +
>>> +uint32_timage_filename_offset;
>>> +uint32_timage_filename_size;
>>> +
>>> +uint64_tfeatures;
>>> +uint64_toptional_features;
>>> +uint32_theader_pages_size;
>>> +} QEMU_PACKED AddCowHeader;
>>
>> Why aren't backing/image_file_format part of the header here? They are
>> in the spec. It would also simplify some offset calculation code.
>>
> 
> Anthony said "It's far better to shrink the size of the header and use
> an offset/len
> pointer to the backing file string.  Limiting backing files to 1023 is
> unacceptable"
> 
> http://lists.gnu.org/archive/html/qemu-devel/2012-05/msg04110.html
> 
> So I use offset  and length instead of using string directly.

I'm talking about the format, not the path.

Kevin

Re: [Qemu-devel] [PATCH 1/2] KVM: fix i8259 interrupt high to low transition logic

2012-09-12 Thread Avi Kivity

On 09/10/2012 04:29 AM, Matthew Ogilvie wrote:
> Intel's definition of "edge triggered" means: "asserted with a
> low-to-high transition at the time an interrupt is registered
> and then kept high until the interrupt is served via one of the
> EOI mechanisms or goes away unhandled."
> 
> So the only difference between edge triggered and level triggered
> is in the leading edge, with no difference in the trailing edge.
> 
> This bug manifested itself when the guest was Microport UNIX
> System V/386 v2.1 (ca. 1987), because it would sometimes mask
> off IRQ14 in the slave IMR after it had already been asserted.
> The master would still try to deliver an interrupt even though
> IRQ2 had dropped again, resulting in a spurious interupt
> (IRQ15) and a panicked UNIX kernel.
> diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
> index adba28f..5cbba99 100644
> --- a/arch/x86/kvm/i8254.c
> +++ b/arch/x86/kvm/i8254.c
> @@ -302,8 +302,12 @@ static void pit_do_work(struct kthread_work *work)
>   }
>   spin_unlock(&ps->inject_lock);
>   if (inject) {
> - kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 1);
> + /* Clear previous interrupt, then create a rising
> +  * edge to request another interupt, and leave it at
> +  * level=1 until time to inject another one.
> +  */
>   kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 0);
> + kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 1);
>  
>   /*

I thought I understood this, now I'm not sure.  How can this be correct?
 Real hardware doesn't act like this.

What if the PIT is disabled after this?  You're injecting a spurious
interrupt then.

-- 
error compiling committee.c: too many arguments to function

Re: [Qemu-devel] [PATCH V2 1/6] libqblock API design

2012-09-12 Thread Kevin Wolf

Am 11.09.2012 22:28, schrieb Blue Swirl:
> On Mon, Sep 10, 2012 at 8:26 AM, Wenchao Xia  
> wrote:
>>   This patch contains the major APIs in the library.
>> Important APIs:
>>   1 QBroker. These structure was used to retrieve errors, every thread must
>> create one first, later maybe thread related staff could be added into it.
>>   2 QBlockState. It stands for an block image object.
>>   3 QBlockStaticInfo. It contains static information such as location, 
>> backing
>> file, size.
>>   4 ABI was kept with reserved members.
>>   5 Sync I/O. It is similar to C file open, read, write and close operations.
>>
>> Signed-off-by: Wenchao Xia 
>> ---
>>  libqblock/libqblock.c | 1077 
>> +
>>  libqblock/libqblock.h |  292 +
>>  2 files changed, 1369 insertions(+), 0 deletions(-)
>>  create mode 100644 libqblock/libqblock.c
>>  create mode 100644 libqblock/libqblock.h
>>
>> diff --git a/libqblock/libqblock.c b/libqblock/libqblock.c
>> new file mode 100644
>> index 000..133ac0f
>> --- /dev/null
>> +++ b/libqblock/libqblock.c
>> @@ -0,0 +1,1077 @@
>> +/*
>> + * QEMU block layer library
>> + *
>> + * Copyright IBM, Corp. 2012
>> + *
>> + * Authors:
>> + *  Wenchao Xia   
>> + *
>> + * This work is licensed under the terms of the GNU LGPL, version 2 or 
>> later.
>> + * See the COPYING.LIB file in the top-level directory.
>> + *
>> + */
>> +
>> +#include 
>> +#include 
>> +
>> +#include "libqblock.h"
>> +#include "libqblock-internal.h"
>> +
>> +#include "qemu-aio.h"
>> +
>> +struct LibqblockGlobalData {
>> +int init_flag;
>> +};
>> +
>> +struct LibqblockGlobalData g_libqblock_data;
>> +
>> +__attribute__((constructor))
>> +static void libqblock_init(void)
>> +{
>> +if (g_libqblock_data.init_flag == 0) {
>> +bdrv_init();
>> +qemu_init_main_loop();
>> +}
>> +g_libqblock_data.init_flag = 1;
>> +}
>> +
>> +const char *qb_fmttype2str(enum QBlockFmtType fmt_type)
>> +{
>> +const char *ret = NULL;
>> +switch (fmt_type) {
>> +case QB_FMT_COW:
>> +ret = "cow";
>> +break;
>> +case QB_FMT_QED:
>> +ret = "qed";
>> +break;
>> +case QB_FMT_QCOW:
>> +ret = "qcow";
>> +break;
>> +case QB_FMT_QCOW2:
>> +ret = "qcow2";
>> +break;
>> +case QB_FMT_RAW:
>> +ret = "raw";
>> +break;
>> +case QB_FMT_RBD:
>> +ret = "rbd";
>> +break;
>> +case QB_FMT_SHEEPDOG:
>> +ret = "sheepdog";
>> +break;
>> +case QB_FMT_VDI:
>> +ret = "vdi";
>> +break;
>> +case QB_FMT_VMDK:
>> +ret = "vmdk";
>> +break;
>> +case QB_FMT_VPC:
>> +ret = "vpc";
>> +break;
>> +default:
>> +break;
>> +}
>> +return ret;
>> +}
>> +
>> +enum QBlockFmtType qb_str2fmttype(const char *fmt_str)
>> +{
>> +enum QBlockFmtType ret = QB_FMT_NONE;
>> +if (0 == strcmp(fmt_str, "cow")) {
> 
> This order is not common in QEMU.

How about just changing the whole thing to a table that maps
QBlockFmtType to strings, and then both conversion functions could just
search that table?

> 
>> +ret = QB_FMT_COW;
>> +} else if (0 == strcmp(fmt_str, "qed")) {
>> +ret = QB_FMT_QED;
>> +} else if (0 == strcmp(fmt_str, "qcow")) {
>> +ret = QB_FMT_QCOW;
>> +} else if (0 == strcmp(fmt_str, "qcow2")) {
>> +ret = QB_FMT_QCOW2;
>> +} else if (0 == strcmp(fmt_str, "raw")) {
>> +ret = QB_FMT_RAW;
>> +} else if (0 == strcmp(fmt_str, "rbd")) {
>> +ret = QB_FMT_RBD;
>> +} else if (0 == strcmp(fmt_str, "sheepdog")) {
>> +ret = QB_FMT_SHEEPDOG;
>> +} else if (0 == strcmp(fmt_str, "vdi")) {
>> +ret = QB_FMT_VDI;
>> +} else if (0 == strcmp(fmt_str, "vmdk")) {
>> +ret = QB_FMT_VMDK;
>> +} else if (0 == strcmp(fmt_str, "vpc")) {
>> +ret = QB_FMT_VPC;
>> +}
>> +return ret;
>> +}

Kevin

Re: [Qemu-devel] [PATCH] isapc: Shadow ISA BIOS by default

2012-09-12 Thread Jan Kiszka

[forgot to CC stable: this one apparently qualifies for older QEMU
releases as well but would require some adaptions for < 1.1]

On 2012-09-11 17:53, Jan Kiszka wrote:
> Our one and only BIOS depends on a writable shadowed BIOS in the ISA
> range. As we have no interface to control the write property, make that
> region writable by default.
> 
> Signed-off-by: Jan Kiszka 
> ---
> 
> This unbreaks isapc for TCG, and keep it working for KVM once it starts
> supporting read-only memslots.
> 
>  hw/pc_sysfw.c |   13 +
>  1 files changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/pc_sysfw.c b/hw/pc_sysfw.c
> index b45f0ac..027d98a 100644
> --- a/hw/pc_sysfw.c
> +++ b/hw/pc_sysfw.c
> @@ -136,6 +136,7 @@ static void old_pc_system_rom_init(MemoryRegion 
> *rom_memory)
>  {
>  char *filename;
>  MemoryRegion *bios, *isa_bios;
> +void *isa_bios_ptr;
>  int bios_size, isa_bios_size;
>  int ret;
>  
> @@ -167,19 +168,23 @@ static void old_pc_system_rom_init(MemoryRegion 
> *rom_memory)
>  g_free(filename);
>  }
>  
> -/* map the last 128KB of the BIOS in ISA space */
> +/* Shadow the last 128KB of the BIOS in ISA space as RAM -
> + * Seabios depends on this */
>  isa_bios_size = bios_size;
>  if (isa_bios_size > (128 * 1024)) {
>  isa_bios_size = 128 * 1024;
>  }
>  isa_bios = g_malloc(sizeof(*isa_bios));
> -memory_region_init_alias(isa_bios, "isa-bios", bios,
> - bios_size - isa_bios_size, isa_bios_size);
> +memory_region_init_ram(isa_bios, "isa-bios", isa_bios_size);
> +vmstate_register_ram_global(isa_bios);
>  memory_region_add_subregion_overlap(rom_memory,
>  0x10 - isa_bios_size,
>  isa_bios,
>  1);
> -memory_region_set_readonly(isa_bios, true);
> +
> +/* copy ISA rom image from top of the ROM */
> +isa_bios_ptr = memory_region_get_ram_ptr(isa_bios);
> +rom_copy(isa_bios_ptr, (uint32_t)(-isa_bios_size), isa_bios_size);
>  
>  /* map all the bios at the top of memory */
>  memory_region_add_subregion(rom_memory,
> 

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux

[Qemu-devel] qemu-nbd very slow, patch available

2012-09-12 Thread Tristan Wibberley

Hello qemu devs,

A couple of years ago Stephane Chazelas
(https://launchpad.net/~stephane-chazelas) reported a performance
issue with qemu-nbd due to the lack of an option to switch on
writeback caching (ie a mode like any typical program that just
modifies files of data where resilience is not provided for each write
operation). I can't find any information about why that wasn't
applied.

Perhaps it just got lost in the noise, but I could see some
improvements to be made so I've attached a variant to provide extra
speed via a commandline option for users for whom fast, less robust
operation is wanted (ie, where robustness is available through other
means).

The attached diff adds a commandline option "--cache=" with four modes
and makes no change to the default behaviour of qemu-nbd:

  --cache=writethrough [default, O_DSYNC, very slow, very resilient]
  --cache=off [same as --nocache, O_DIRECT, slow, resilient]
  --cache=none [O_DIRECT, O_DSYNC, very very slow, very very resilient]
  --cache=writeback [fast, not resilient]


Here is Stephane's original report:
https://bugs.launchpad.net/ubuntu/+source/qemu-kvm/+bug/595117

Versus Stephane's patch I've altered the meaning of --cache=off to
accurately match --nocache.
I have not included Stephane's extra fsync operation on close because
that would make --cache=writeback very slow on disconnect. Users can
sync after disconnect via alternatives available on their platform.

--
Tristan


fast-nbd.diff
Description: Binary data

Re: [Qemu-devel] [PATCH 1/2] KVM: fix i8259 interrupt high to low transition logic

2012-09-12 Thread Jan Kiszka

On 2012-09-12 10:01, Avi Kivity wrote:
> On 09/10/2012 04:29 AM, Matthew Ogilvie wrote:
>> Intel's definition of "edge triggered" means: "asserted with a
>> low-to-high transition at the time an interrupt is registered
>> and then kept high until the interrupt is served via one of the
>> EOI mechanisms or goes away unhandled."
>>
>> So the only difference between edge triggered and level triggered
>> is in the leading edge, with no difference in the trailing edge.
>>
>> This bug manifested itself when the guest was Microport UNIX
>> System V/386 v2.1 (ca. 1987), because it would sometimes mask
>> off IRQ14 in the slave IMR after it had already been asserted.
>> The master would still try to deliver an interrupt even though
>> IRQ2 had dropped again, resulting in a spurious interupt
>> (IRQ15) and a panicked UNIX kernel.
>> diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
>> index adba28f..5cbba99 100644
>> --- a/arch/x86/kvm/i8254.c
>> +++ b/arch/x86/kvm/i8254.c
>> @@ -302,8 +302,12 @@ static void pit_do_work(struct kthread_work *work)
>>  }
>>  spin_unlock(&ps->inject_lock);
>>  if (inject) {
>> -kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 1);
>> +/* Clear previous interrupt, then create a rising
>> + * edge to request another interupt, and leave it at
>> + * level=1 until time to inject another one.
>> + */
>>  kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 0);
>> +kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 1);
>>  
>>  /*
> 
> I thought I understood this, now I'm not sure.  How can this be correct?
>  Real hardware doesn't act like this.
> 
> What if the PIT is disabled after this?  You're injecting a spurious
> interrupt then.

Yes, the PIT has to raise the output as long as specified, i.e.
according to the datasheet. That's important now due to the corrections
to the PIC. We can then carefully check if there is room for
simplifications / optimizations. I also cannot imagine that the above
already fulfills these requirements.

And if the PIT is disabled by the HPET, we need to clear the output
explicitly as we inject the IRQ#0 under a different source ID than
userspace HPET does (which will logically take over IRQ#0 control). The
kernel would otherwise OR both sources to an incorrect result.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux

Re: [Qemu-devel] [PATCH 1/2] KVM: fix i8259 interrupt high to low transition logic

2012-09-12 Thread Avi Kivity

On 09/12/2012 11:48 AM, Jan Kiszka wrote:
> On 2012-09-12 10:01, Avi Kivity wrote:
>> On 09/10/2012 04:29 AM, Matthew Ogilvie wrote:
>>> Intel's definition of "edge triggered" means: "asserted with a
>>> low-to-high transition at the time an interrupt is registered
>>> and then kept high until the interrupt is served via one of the
>>> EOI mechanisms or goes away unhandled."
>>>
>>> So the only difference between edge triggered and level triggered
>>> is in the leading edge, with no difference in the trailing edge.
>>>
>>> This bug manifested itself when the guest was Microport UNIX
>>> System V/386 v2.1 (ca. 1987), because it would sometimes mask
>>> off IRQ14 in the slave IMR after it had already been asserted.
>>> The master would still try to deliver an interrupt even though
>>> IRQ2 had dropped again, resulting in a spurious interupt
>>> (IRQ15) and a panicked UNIX kernel.
>>> diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
>>> index adba28f..5cbba99 100644
>>> --- a/arch/x86/kvm/i8254.c
>>> +++ b/arch/x86/kvm/i8254.c
>>> @@ -302,8 +302,12 @@ static void pit_do_work(struct kthread_work *work)
>>> }
>>> spin_unlock(&ps->inject_lock);
>>> if (inject) {
>>> -   kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 1);
>>> +   /* Clear previous interrupt, then create a rising
>>> +* edge to request another interupt, and leave it at
>>> +* level=1 until time to inject another one.
>>> +*/
>>> kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 0);
>>> +   kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 1);
>>>  
>>> /*
>> 
>> I thought I understood this, now I'm not sure.  How can this be correct?
>>  Real hardware doesn't act like this.
>> 
>> What if the PIT is disabled after this?  You're injecting a spurious
>> interrupt then.
> 
> Yes, the PIT has to raise the output as long as specified, i.e.
> according to the datasheet. That's important now due to the corrections
> to the PIC. We can then carefully check if there is room for
> simplifications / optimizations. I also cannot imagine that the above
> already fulfills these requirements.
> 
> And if the PIT is disabled by the HPET, we need to clear the output
> explicitly as we inject the IRQ#0 under a different source ID than
> userspace HPET does (which will logically take over IRQ#0 control). The
> kernel would otherwise OR both sources to an incorrect result.
> 

I guess we need to double the hrtimer rate then in order to generate a
square wave.  It's getting ridiculous how accurate our model needs to be.


-- 
error compiling committee.c: too many arguments to function

Re: [Qemu-devel] [PATCH 1/2] KVM: fix i8259 interrupt high to low transition logic

2012-09-12 Thread Jan Kiszka

On 2012-09-12 10:51, Avi Kivity wrote:
> On 09/12/2012 11:48 AM, Jan Kiszka wrote:
>> On 2012-09-12 10:01, Avi Kivity wrote:
>>> On 09/10/2012 04:29 AM, Matthew Ogilvie wrote:
 Intel's definition of "edge triggered" means: "asserted with a
 low-to-high transition at the time an interrupt is registered
 and then kept high until the interrupt is served via one of the
 EOI mechanisms or goes away unhandled."

 So the only difference between edge triggered and level triggered
 is in the leading edge, with no difference in the trailing edge.

 This bug manifested itself when the guest was Microport UNIX
 System V/386 v2.1 (ca. 1987), because it would sometimes mask
 off IRQ14 in the slave IMR after it had already been asserted.
 The master would still try to deliver an interrupt even though
 IRQ2 had dropped again, resulting in a spurious interupt
 (IRQ15) and a panicked UNIX kernel.
 diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
 index adba28f..5cbba99 100644
 --- a/arch/x86/kvm/i8254.c
 +++ b/arch/x86/kvm/i8254.c
 @@ -302,8 +302,12 @@ static void pit_do_work(struct kthread_work *work)
}
spin_unlock(&ps->inject_lock);
if (inject) {
 -  kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 1);
 +  /* Clear previous interrupt, then create a rising
 +   * edge to request another interupt, and leave it at
 +   * level=1 until time to inject another one.
 +   */
kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 0);
 +  kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 1);
  
/*
>>>
>>> I thought I understood this, now I'm not sure.  How can this be correct?
>>>  Real hardware doesn't act like this.
>>>
>>> What if the PIT is disabled after this?  You're injecting a spurious
>>> interrupt then.
>>
>> Yes, the PIT has to raise the output as long as specified, i.e.
>> according to the datasheet. That's important now due to the corrections
>> to the PIC. We can then carefully check if there is room for
>> simplifications / optimizations. I also cannot imagine that the above
>> already fulfills these requirements.
>>
>> And if the PIT is disabled by the HPET, we need to clear the output
>> explicitly as we inject the IRQ#0 under a different source ID than
>> userspace HPET does (which will logically take over IRQ#0 control). The
>> kernel would otherwise OR both sources to an incorrect result.
>>
> 
> I guess we need to double the hrtimer rate then in order to generate a
> square wave.  It's getting ridiculous how accurate our model needs to be.

I would suggest to solve this for the userspace model first, ensure that
it works properly in all modes, maybe optimize it, and then decide how
to map all this on kernel space. As long as we have two models, we can
also make use of them.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux

Re: [Qemu-devel] [PATCH 1/2] KVM: fix i8259 interrupt high to low transition logic

2012-09-12 Thread Avi Kivity

On 09/12/2012 11:57 AM, Jan Kiszka wrote:
> On 2012-09-12 10:51, Avi Kivity wrote:
>> On 09/12/2012 11:48 AM, Jan Kiszka wrote:
>>> On 2012-09-12 10:01, Avi Kivity wrote:
 On 09/10/2012 04:29 AM, Matthew Ogilvie wrote:
> Intel's definition of "edge triggered" means: "asserted with a
> low-to-high transition at the time an interrupt is registered
> and then kept high until the interrupt is served via one of the
> EOI mechanisms or goes away unhandled."
>
> So the only difference between edge triggered and level triggered
> is in the leading edge, with no difference in the trailing edge.
>
> This bug manifested itself when the guest was Microport UNIX
> System V/386 v2.1 (ca. 1987), because it would sometimes mask
> off IRQ14 in the slave IMR after it had already been asserted.
> The master would still try to deliver an interrupt even though
> IRQ2 had dropped again, resulting in a spurious interupt
> (IRQ15) and a panicked UNIX kernel.
> diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
> index adba28f..5cbba99 100644
> --- a/arch/x86/kvm/i8254.c
> +++ b/arch/x86/kvm/i8254.c
> @@ -302,8 +302,12 @@ static void pit_do_work(struct kthread_work *work)
>   }
>   spin_unlock(&ps->inject_lock);
>   if (inject) {
> - kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 1);
> + /* Clear previous interrupt, then create a rising
> +  * edge to request another interupt, and leave it at
> +  * level=1 until time to inject another one.
> +  */
>   kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 0);
> + kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 1);
>  
>   /*

 I thought I understood this, now I'm not sure.  How can this be correct?
  Real hardware doesn't act like this.

 What if the PIT is disabled after this?  You're injecting a spurious
 interrupt then.
>>>
>>> Yes, the PIT has to raise the output as long as specified, i.e.
>>> according to the datasheet. That's important now due to the corrections
>>> to the PIC. We can then carefully check if there is room for
>>> simplifications / optimizations. I also cannot imagine that the above
>>> already fulfills these requirements.
>>>
>>> And if the PIT is disabled by the HPET, we need to clear the output
>>> explicitly as we inject the IRQ#0 under a different source ID than
>>> userspace HPET does (which will logically take over IRQ#0 control). The
>>> kernel would otherwise OR both sources to an incorrect result.
>>>
>> 
>> I guess we need to double the hrtimer rate then in order to generate a
>> square wave.  It's getting ridiculous how accurate our model needs to be.
> 
> I would suggest to solve this for the userspace model first, ensure that
> it works properly in all modes, maybe optimize it, and then decide how
> to map all this on kernel space. As long as we have two models, we can
> also make use of them.

Good idea.


-- 
error compiling committee.c: too many arguments to function

[Qemu-devel] [PATCH v5] configure: properly check if -lrt and -lm is needed

2012-09-12 Thread Natanael Copa

Fixes build against uClibc.

uClibc provides 2 versions of clock_gettime(), one with realtime
support and one without (this is so you can avoid linking in -lrt
unless actually needed). This means that the clock_gettime() don't
need -lrt. We still need it for timer_create() so we check for this
function in addition.

We also need check if -lm is needed for isnan().

Both -lm and -lrt are needed for libs_qga.

Signed-off-by: Natanael Copa 
---
Changes v4->v5:

 - Do not exit with error if librt fails.
   Apparently, mingw32 does not use those functions at all so we
   should not exit with error.

   This is how it originally worked.

 configure | 31 +--
 1 file changed, 29 insertions(+), 2 deletions(-)

diff --git a/configure b/configure
index edf9da4..c1ed856 100755
--- a/configure
+++ b/configure
@@ -2624,17 +2624,44 @@ fi
 
 
 ##
+# Do we need libm
+cat > $TMPC << EOF
+#include 
+int main(void) { return isnan(sin(0.0)); }
+EOF
+if compile_prog "" "" ; then
+  :
+elif compile_prog "" "-lm" ; then
+  LIBS="-lm $LIBS"
+  libs_qga="-lm $libs_qga"
+else
+  echo
+  echo "Error: libm check failed"
+  echo
+  exit 1
+fi
+
+##
 # Do we need librt
+# uClibc provides 2 versions of clock_gettime(), one with realtime
+# support and one without. This means that the clock_gettime() don't
+# need -lrt. We still need it for timer_create() so we check for this
+# function in addition.
 cat > $TMPC <
 #include 
-int main(void) { return clock_gettime(CLOCK_REALTIME, NULL); }
+int main(void) {
+  timer_create(CLOCK_REALTIME, NULL, NULL);
+  return clock_gettime(CLOCK_REALTIME, NULL);
+}
 EOF
 
 if compile_prog "" "" ; then
   :
-elif compile_prog "" "-lrt" ; then
+# we need pthread for static linking. use previous pthread test result 
+elif compile_prog "" "-lrt $pthread_lib" ; then
   LIBS="-lrt $LIBS"
+  libs_qga="-lrt $libs_qga"
 fi
 
 if test "$darwin" != "yes" -a "$mingw32" != "yes" -a "$solaris" != yes -a \
-- 
1.7.12

Re: [Qemu-devel] [PATCH v6 2/2] block: Support GlusterFS as a QEMU block backend

2012-09-12 Thread Bharata B Rao

On Fri, Sep 07, 2012 at 11:57:58AM +0200, Kevin Wolf wrote:
> Am 07.09.2012 11:36, schrieb Paolo Bonzini:
> > Hmm, why don't we do the exact same thing as libvirt 
> > (http://libvirt.org/remote.html):
> > 
> > ipv4 - gluster+tcp://1.2.3.4:0/testvol/dir/a.img
> > ipv6 - gluster+tcp://[1:2:3:4:5:6:7:8]:0/testvol/a.img
> > host - gluster+tcp://example.org/testvol/dir/a.img
> > unix - gluster+unix:///testvol/dir/a.img?socket=/tmp/glusterd.socket
> > rdma - gluster+rdma://1.2.3.4:0/testvol/a.img
> > 
> > Where the default transport is tcp (i.e. gluster+tcp is the same as 
> > gluster).
> > This is easily extensible, theoretically you could have something like this:
> > 
> > ssh - gluster+ssh://user@host/testvol/a.img?socket=/tmp/glusterd.socket
> 
> I like this. Having the type only as a parameter when it decides how the
> schema must be parsed has bothered me from the start, but I hadn't
> thought of this way.
> 
> Strong +1 from me.

FYI, bdrv_find_protocol() fails for protocols like this. It detects the protocol
as "gluster+tcp" and compares it with drv->protocol_name (which is only
"gluster").

I guess I will have to fix bdrv_find_protocol() to handle the '+' within
protocol string correctly.

Regards,
Bharata.

> 
> Kevin

Re: [Qemu-devel] [PATCH V2 1/6] libqblock API design

2012-09-12 Thread Wenchao Xia


于 2012-9-12 16:19, Kevin Wolf 写道:

Am 11.09.2012 22:28, schrieb Blue Swirl:

On Mon, Sep 10, 2012 at 8:26 AM, Wenchao Xia  wrote:

   This patch contains the major APIs in the library.
Important APIs:
   1 QBroker. These structure was used to retrieve errors, every thread must
create one first, later maybe thread related staff could be added into it.
   2 QBlockState. It stands for an block image object.
   3 QBlockStaticInfo. It contains static information such as location, backing
file, size.
   4 ABI was kept with reserved members.
   5 Sync I/O. It is similar to C file open, read, write and close operations.

Signed-off-by: Wenchao Xia 
---
  libqblock/libqblock.c | 1077 +
  libqblock/libqblock.h |  292 +
  2 files changed, 1369 insertions(+), 0 deletions(-)
  create mode 100644 libqblock/libqblock.c
  create mode 100644 libqblock/libqblock.h

diff --git a/libqblock/libqblock.c b/libqblock/libqblock.c
new file mode 100644
index 000..133ac0f
--- /dev/null
+++ b/libqblock/libqblock.c
@@ -0,0 +1,1077 @@
+/*
+ * QEMU block layer library
+ *
+ * Copyright IBM, Corp. 2012
+ *
+ * Authors:
+ *  Wenchao Xia   
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+#include 
+#include 
+
+#include "libqblock.h"
+#include "libqblock-internal.h"
+
+#include "qemu-aio.h"
+
+struct LibqblockGlobalData {
+int init_flag;
+};
+
+struct LibqblockGlobalData g_libqblock_data;
+
+__attribute__((constructor))
+static void libqblock_init(void)
+{
+if (g_libqblock_data.init_flag == 0) {
+bdrv_init();
+qemu_init_main_loop();
+}
+g_libqblock_data.init_flag = 1;
+}
+
+const char *qb_fmttype2str(enum QBlockFmtType fmt_type)
+{
+const char *ret = NULL;
+switch (fmt_type) {
+case QB_FMT_COW:
+ret = "cow";
+break;
+case QB_FMT_QED:
+ret = "qed";
+break;
+case QB_FMT_QCOW:
+ret = "qcow";
+break;
+case QB_FMT_QCOW2:
+ret = "qcow2";
+break;
+case QB_FMT_RAW:
+ret = "raw";
+break;
+case QB_FMT_RBD:
+ret = "rbd";
+break;
+case QB_FMT_SHEEPDOG:
+ret = "sheepdog";
+break;
+case QB_FMT_VDI:
+ret = "vdi";
+break;
+case QB_FMT_VMDK:
+ret = "vmdk";
+break;
+case QB_FMT_VPC:
+ret = "vpc";
+break;
+default:
+break;
+}
+return ret;
+}
+
+enum QBlockFmtType qb_str2fmttype(const char *fmt_str)
+{
+enum QBlockFmtType ret = QB_FMT_NONE;
+if (0 == strcmp(fmt_str, "cow")) {


This order is not common in QEMU.


How about just changing the whole thing to a table that maps
QBlockFmtType to strings, and then both conversion functions could just
search that table?


  Good idea, will go this way.




+ret = QB_FMT_COW;
+} else if (0 == strcmp(fmt_str, "qed")) {
+ret = QB_FMT_QED;
+} else if (0 == strcmp(fmt_str, "qcow")) {
+ret = QB_FMT_QCOW;
+} else if (0 == strcmp(fmt_str, "qcow2")) {
+ret = QB_FMT_QCOW2;
+} else if (0 == strcmp(fmt_str, "raw")) {
+ret = QB_FMT_RAW;
+} else if (0 == strcmp(fmt_str, "rbd")) {
+ret = QB_FMT_RBD;
+} else if (0 == strcmp(fmt_str, "sheepdog")) {
+ret = QB_FMT_SHEEPDOG;
+} else if (0 == strcmp(fmt_str, "vdi")) {
+ret = QB_FMT_VDI;
+} else if (0 == strcmp(fmt_str, "vmdk")) {
+ret = QB_FMT_VMDK;
+} else if (0 == strcmp(fmt_str, "vpc")) {
+ret = QB_FMT_VPC;
+}
+return ret;
+}


Kevin




--
Best Regards

Wenchao Xia

Re: [Qemu-devel] [PATCH v6 2/2] block: Support GlusterFS as a QEMU block backend

2012-09-12 Thread Paolo Bonzini

Il 12/09/2012 11:22, Bharata B Rao ha scritto:
> FYI, bdrv_find_protocol() fails for protocols like this. It detects the 
> protocol
> as "gluster+tcp" and compares it with drv->protocol_name (which is only
> "gluster").
> 
> I guess I will have to fix bdrv_find_protocol() to handle the '+' within
> protocol string correctly.

Yes, or you can declare separate protocols for each supported transport
(there are just 2 or 3, right?).

Paolo

Re: [Qemu-devel] qemu-nbd very slow, patch available

2012-09-12 Thread Paolo Bonzini

Il 12/09/2012 10:25, Tristan Wibberley ha scritto:
> The attached diff adds a commandline option "--cache=" with four modes
> and makes no change to the default behaviour of qemu-nbd:
> 
>   --cache=writethrough [default, O_DSYNC, very slow, very resilient]
>   --cache=off [same as --nocache, O_DIRECT, slow, resilient]
>   --cache=none [O_DIRECT, O_DSYNC, very very slow, very very resilient]
>   --cache=writeback [fast, not resilient]

A very similar patch is in QEMU 1.2.0, just released last week. :)

  --cache=writethrough [default, O_DSYNC]
  --cache=none [same as --nocache, O_DIRECT]
  --cache=directsync [O_DIRECT, O_DSYNC]
  --cache=writeback

> I have not included Stephane's extra fsync operation on close because
> that would make --cache=writeback very slow on disconnect. Users can
> sync after disconnect via alternatives available on their platform.

An fsync is already done by bdrv_close; however, this only syncs just
before qemu-nbd exits, not after each and every disconnect.  I think
it's a good compromise.  The best solution would be to add support for
fsync in the NBD kernel driver, so that --cache=writeback would be just
as good.

Paolo

Re: [Qemu-devel] Windows VM slow boot

2012-09-12 Thread Richard Davies

[ adding linux-mm - previously at http://marc.info/?t=13451150943 ]

Hi Rik,

Since qemu-kvm 1.2.0 and Linux 3.6.0-rc5 came out, I thought that I would
retest with these.

The typical symptom now appears to be that the Windows VMs boot reasonably
fast, but then there is high CPU use and load for many minutes afterwards -
the high CPU use is both for the qemu-kvm processes themselves and also for
% sys.

I attach a perf report which seems to show that the high CPU use is in the
memory manager.

Cheers,

Richard.


# 
# captured on: Wed Sep 12 10:25:43 2012
# os release : 3.6.0-rc5-elastic
# perf version : 3.5.2
# arch : x86_64
# nrcpus online : 16
# nrcpus avail : 16
# cpudesc : AMD Opteron(tm) Processor 6128
# cpuid : AuthenticAMD,16,9,1
# total memory : 131973280 kB
# cmdline : /home/root/bin/perf record -g -a 
# event : name = cycles, type = 0, config = 0x0, config1 = 0x0, config2 = 0x0, 
excl_usr = 0, excl_kern = 0, id = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 
14, 15, 16 }
# HEADER_CPU_TOPOLOGY info available, use -I to display
# HEADER_NUMA_TOPOLOGY info available, use -I to display
# 
#
# Samples: 870K of event 'cycles'
# Event count (approx.): 432968175910
#
# Overhead  Command Shared Object   
   Symbol
#   ...    
..
#
89.14% qemu-kvm  [kernel.kallsyms] [k] _raw_spin_lock_irqsave   
 
   |
   --- _raw_spin_lock_irqsave
  |  
  |--95.47%-- isolate_migratepages_range
  |  compact_zone
  |  compact_zone_order
  |  try_to_compact_pages
  |  __alloc_pages_direct_compact
  |  __alloc_pages_nodemask
  |  alloc_pages_vma
  |  do_huge_pmd_anonymous_page
  |  handle_mm_fault
  |  __get_user_pages
  |  get_user_page_nowait
  |  hva_to_pfn.isra.17
  |  __gfn_to_pfn
  |  gfn_to_pfn_async
  |  try_async_pf
  |  tdp_page_fault
  |  kvm_mmu_page_fault
  |  pf_interception
  |  handle_exit
  |  kvm_arch_vcpu_ioctl_run
  |  kvm_vcpu_ioctl
  |  do_vfs_ioctl
  |  sys_ioctl
  |  system_call_fastpath
  |  ioctl
  |  |  
  |  |--55.64%-- 0x1010002
  |  |  
  |   --44.36%-- 0x1010006
  |  
  |--4.53%-- compact_zone
  |  compact_zone_order
  |  try_to_compact_pages
  |  __alloc_pages_direct_compact
  |  __alloc_pages_nodemask
  |  alloc_pages_vma
  |  do_huge_pmd_anonymous_page
  |  handle_mm_fault
  |  __get_user_pages
  |  get_user_page_nowait
  |  hva_to_pfn.isra.17
  |  __gfn_to_pfn
  |  gfn_to_pfn_async
  |  try_async_pf
  |  tdp_page_fault
  |  kvm_mmu_page_fault
  |  pf_interception
  |  handle_exit
  |  kvm_arch_vcpu_ioctl_run
  |  kvm_vcpu_ioctl
  |  do_vfs_ioctl
  |  sys_ioctl
  |  system_call_fastpath
  |  ioctl
  |  |  
  |  |--55.36%-- 0x1010002
  |  |  
  |   --44.64%-- 0x1010006
   --0.00%-- [...]
 4.92% qemu-kvm  [kernel.kallsyms] [k] migrate_pages
 
   |
   --- migrate_pages
  |  
  |--99.74%-- compact_zone
  |  compact_zone_order
  |  try_to_compact_pages
  |  __alloc_pages_direct_compact

[Qemu-devel] [PATCH v2 1/3] Refactor inet_connect_opts function

2012-09-12 Thread Orit Wasserman

From: Michael S. Tsirkin 

refactor address resolution code to fix nonblocking connect

Signed-off-by: Michael S. Tsirkin 
Signed-off-by: Amos Kong 
Signed-off-by: Orit Wasserman 
---
 qemu-sockets.c |  139 +---
 1 files changed, 82 insertions(+), 57 deletions(-)

diff --git a/qemu-sockets.c b/qemu-sockets.c
index 361d890..68e4d30 100644
--- a/qemu-sockets.c
+++ b/qemu-sockets.c
@@ -209,32 +209,25 @@ listen:
 return slisten;
 }
 
-int inet_connect_opts(QemuOpts *opts, bool *in_progress, Error **errp)
+static struct addrinfo *inet_parse_connect_opts(QemuOpts *opts, Error **errp)
 {
-struct addrinfo ai,*res,*e;
+struct addrinfo ai, *res;
+int rc;
 const char *addr;
 const char *port;
-char uaddr[INET6_ADDRSTRLEN+1];
-char uport[33];
-int sock,rc;
-bool block;
 
 memset(&ai,0, sizeof(ai));
 ai.ai_flags = AI_CANONNAME | AI_ADDRCONFIG;
 ai.ai_family = PF_UNSPEC;
 ai.ai_socktype = SOCK_STREAM;
 
-if (in_progress) {
-*in_progress = false;
-}
-
 addr = qemu_opt_get(opts, "host");
 port = qemu_opt_get(opts, "port");
-block = qemu_opt_get_bool(opts, "block", 0);
 if (addr == NULL || port == NULL) {
-fprintf(stderr, "inet_connect: host and/or port not specified\n");
+fprintf(stderr,
+"inet_parse_connect_opts: host and/or port not specified\n");
 error_set(errp, QERR_SOCKET_CREATE_FAILED);
-return -1;
+return NULL;
 }
 
 if (qemu_opt_get_bool(opts, "ipv4", 0))
@@ -247,57 +240,89 @@ int inet_connect_opts(QemuOpts *opts, bool *in_progress, 
Error **errp)
 fprintf(stderr,"getaddrinfo(%s,%s): %s\n", addr, port,
 gai_strerror(rc));
 error_set(errp, QERR_SOCKET_CREATE_FAILED);
-   return -1;
+return NULL;
 }
+return res;
+}
 
-for (e = res; e != NULL; e = e->ai_next) {
-if (getnameinfo((struct sockaddr*)e->ai_addr,e->ai_addrlen,
-uaddr,INET6_ADDRSTRLEN,uport,32,
-NI_NUMERICHOST | NI_NUMERICSERV) != 0) {
-fprintf(stderr,"%s: getnameinfo: oops\n", __FUNCTION__);
-continue;
-}
-sock = qemu_socket(e->ai_family, e->ai_socktype, e->ai_protocol);
-if (sock < 0) {
-fprintf(stderr,"%s: socket(%s): %s\n", __FUNCTION__,
-inet_strfamily(e->ai_family), strerror(errno));
-continue;
+#ifdef _WIN32
+#define QEMU_SOCKET_RC_INPROGRESS(rc) \
+((rc) == -EINPROGRESS || rc == -EWOULDBLOCK || rc == -WSAEALREADY)
+#else
+#define QEMU_SOCKET_RC_INPROGRESS(rc) \
+((rc) == -EINPROGRESS)
+#endif
+
+static int inet_connect_addr(struct addrinfo *addr, bool block,
+ bool *in_progress, Error **errp)
+{
+char uaddr[INET6_ADDRSTRLEN + 1];
+char uport[33];
+int sock, rc;
+
+if (getnameinfo((struct sockaddr *)addr->ai_addr, addr->ai_addrlen,
+uaddr, INET6_ADDRSTRLEN, uport, 32,
+NI_NUMERICHOST | NI_NUMERICSERV)) {
+fprintf(stderr, "%s: getnameinfo: oops\n", __func__);
+return -1;
+}
+sock = qemu_socket(addr->ai_family, addr->ai_socktype, addr->ai_protocol);
+if (sock < 0) {
+fprintf(stderr, "%s: socket(%s): %s\n", __func__,
+inet_strfamily(addr->ai_family), strerror(errno));
+return -1;
+}
+setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));
+if (!block) {
+socket_set_nonblock(sock);
+}
+/* connect to peer */
+do {
+rc = 0;
+if (connect(sock, addr->ai_addr, addr->ai_addrlen) < 0) {
+rc = -socket_error();
 }
-setsockopt(sock,SOL_SOCKET,SO_REUSEADDR,(void*)&on,sizeof(on));
-if (!block) {
-socket_set_nonblock(sock);
+} while (rc == -EINTR);
+
+if (!block && QEMU_SOCKET_RC_INPROGRESS(rc)) {
+if (in_progress) {
+*in_progress = true;
 }
-/* connect to peer */
-do {
-rc = 0;
-if (connect(sock, e->ai_addr, e->ai_addrlen) < 0) {
-rc = -socket_error();
-}
-} while (rc == -EINTR);
-
-  #ifdef _WIN32
-if (!block && (rc == -EINPROGRESS || rc == -EWOULDBLOCK
-   || rc == -WSAEALREADY)) {
-  #else
-if (!block && (rc == -EINPROGRESS)) {
-  #endif
-if (in_progress) {
-*in_progress = true;
-}
-} else if (rc < 0) {
-if (NULL == e->ai_next)
-fprintf(stderr, "%s: connect(%s,%s,%s,%s): %s\n", __FUNCTION__,
-inet_strfamily(e->ai_family),
-e->ai_canonname, uaddr, uport, strerror(errno));
-closesocket(sock);
-continue;
+} else if (rc < 0) {
+closesocket(sock);
+return -1;
+}
+return sock;
+}
+
+int inet_connect_op

[Qemu-devel] [PATCH v2 3/3] Fix address handling in inet_nonblocking_connect

2012-09-12 Thread Orit Wasserman

getaddrinfo can give us a list of addresses, but we only try to
connect to the first one. If that fails we never proceed to
the next one.  This is common on desktop setups that often have ipv6
configured but not actually working.

To fix this make inet_connect_nonblocking retry connection with a different
address.
callers on inet_nonblocking_connect register a callback function that will
be called when connect opertion completes, in case of failure the fd will have
a negative value

Signed-off-by: Orit Wasserman 
Signed-off-by: Michael S. Tsirkin 
---
 migration-tcp.c |   29 +++---
 qemu-sockets.c  |  169 +--
 qemu_socket.h   |9 ++-
 3 files changed, 142 insertions(+), 65 deletions(-)

diff --git a/migration-tcp.c b/migration-tcp.c
index 7f6ad98..cadea36 100644
--- a/migration-tcp.c
+++ b/migration-tcp.c
@@ -53,29 +53,18 @@ static int tcp_close(MigrationState *s)
 return r;
 }
 
-static void tcp_wait_for_connect(void *opaque)
+static void tcp_wait_for_connect(int fd, void *opaque)
 {
 MigrationState *s = opaque;
-int val, ret;
-socklen_t valsize = sizeof(val);
 
-DPRINTF("connect completed\n");
-do {
-ret = getsockopt(s->fd, SOL_SOCKET, SO_ERROR, (void *) &val, &valsize);
-} while (ret == -1 && (socket_error()) == EINTR);
-
-if (ret < 0) {
+if (fd < 0) {
+DPRINTF("migrate connect error\n");
+s->fd = -1;
 migrate_fd_error(s);
-return;
-}
-
-qemu_set_fd_handler2(s->fd, NULL, NULL, NULL, NULL);
-
-if (val == 0)
+} else {
+DPRINTF("migrate connect success\n");
+s->fd = fd;
 migrate_fd_connect(s);
-else {
-DPRINTF("error connecting %d\n", val);
-migrate_fd_error(s);
 }
 }
 
@@ -88,7 +77,8 @@ int tcp_start_outgoing_migration(MigrationState *s, const 
char *host_port,
 s->write = socket_write;
 s->close = tcp_close;
 
-s->fd = inet_nonblocking_connect(host_port, &in_progress, errp);
+s->fd = inet_nonblocking_connect(host_port, tcp_wait_for_connect, s,
+ &in_progress, errp);
 if (error_is_set(errp)) {
 migrate_fd_error(s);
 return -1;
@@ -96,7 +86,6 @@ int tcp_start_outgoing_migration(MigrationState *s, const 
char *host_port,
 
 if (in_progress) {
 DPRINTF("connect in progress\n");
-qemu_set_fd_handler2(s->fd, NULL, NULL, tcp_wait_for_connect, s);
 } else {
 migrate_fd_connect(s);
 }
diff --git a/qemu-sockets.c b/qemu-sockets.c
index 4f5eca8..f5d64c8 100644
--- a/qemu-sockets.c
+++ b/qemu-sockets.c
@@ -24,6 +24,7 @@
 
 #include "qemu_socket.h"
 #include "qemu-common.h" /* for qemu_isdigit */
+#include "main-loop.h"
 
 #ifndef AI_ADDRCONFIG
 # define AI_ADDRCONFIG 0
@@ -209,41 +210,27 @@ listen:
 return slisten;
 }
 
-static struct addrinfo *inet_parse_connect_opts(QemuOpts *opts, Error **errp)
-{
-struct addrinfo ai, *res;
-int rc;
-const char *addr;
-const char *port;
-
-memset(&ai,0, sizeof(ai));
-ai.ai_flags = AI_CANONNAME | AI_ADDRCONFIG;
-ai.ai_family = PF_UNSPEC;
-ai.ai_socktype = SOCK_STREAM;
-
-addr = qemu_opt_get(opts, "host");
-port = qemu_opt_get(opts, "port");
-if (addr == NULL || port == NULL) {
-fprintf(stderr,
-"inet_parse_connect_opts: host and/or port not specified\n");
-error_set(errp, QERR_SOCKET_CREATE_FAILED);
-return NULL;
-}
-
-if (qemu_opt_get_bool(opts, "ipv4", 0))
-ai.ai_family = PF_INET;
-if (qemu_opt_get_bool(opts, "ipv6", 0))
-ai.ai_family = PF_INET6;
+#ifdef _WIN32
+#define QEMU_SOCKET_RC_INPROGRESS(rc) \
+((rc) == -EINPROGRESS || rc == -EWOULDBLOCK || rc == -WSAEALREADY)
+#else
+#define QEMU_SOCKET_RC_INPROGRESS(rc) \
+((rc) == -EINPROGRESS)
+#endif
 
-/* lookup */
-if (0 != (rc = getaddrinfo(addr, port, &ai, &res))) {
-fprintf(stderr,"getaddrinfo(%s,%s): %s\n", addr, port,
-gai_strerror(rc));
-error_set(errp, QERR_SOCKET_CREATE_FAILED);
-return NULL;
-}
-return res;
-}
+/* Struct to store connect state for non blocking connect */
+typedef struct ConnectState {
+int fd;
+struct addrinfo *addr_list;
+struct addrinfo *current_addr;
+ConnectHandler *callback;
+void *opaque;
+Error *errp;
+} ConnectState;
+
+static ConnectState connect_state = {
+.fd = -1,
+};
 
 #ifdef _WIN32
 #define QEMU_SOCKET_RC_INPROGRESS(rc) \
@@ -254,12 +241,17 @@ static struct addrinfo *inet_parse_connect_opts(QemuOpts 
*opts, Error **errp)
 #endif
 
 static int inet_connect_addr(struct addrinfo *addr, bool block,
- bool *in_progress, Error **errp)
+ IOHandler *handler, bool *in_progress,
+ Error **errp)
 {
 char uaddr[INET6_ADDRSTRLEN + 1];
 char uport[33];
 int sock, rc;
 
+if (in_progress) {
+*in

[Qemu-devel] [PATCH v2 2/3] Separate inet_connect into inet_connect (blocking) and inet_nonblocking_connect

2012-09-12 Thread Orit Wasserman

No need to add non blocking parameters to the blocking inet_connect

Signed-off-by: Orit Wasserman 
---
 migration-tcp.c |2 +-
 nbd.c   |2 +-
 qemu-sockets.c  |   24 
 qemu_socket.h   |4 +++-
 ui/vnc.c|2 +-
 5 files changed, 26 insertions(+), 8 deletions(-)

diff --git a/migration-tcp.c b/migration-tcp.c
index ac891c3..7f6ad98 100644
--- a/migration-tcp.c
+++ b/migration-tcp.c
@@ -88,7 +88,7 @@ int tcp_start_outgoing_migration(MigrationState *s, const 
char *host_port,
 s->write = socket_write;
 s->close = tcp_close;
 
-s->fd = inet_connect(host_port, false, &in_progress, errp);
+s->fd = inet_nonblocking_connect(host_port, &in_progress, errp);
 if (error_is_set(errp)) {
 migrate_fd_error(s);
 return -1;
diff --git a/nbd.c b/nbd.c
index 0dd60c5..206f75c 100644
--- a/nbd.c
+++ b/nbd.c
@@ -162,7 +162,7 @@ int tcp_socket_outgoing(const char *address, uint16_t port)
 
 int tcp_socket_outgoing_spec(const char *address_and_port)
 {
-return inet_connect(address_and_port, true, NULL, NULL);
+return inet_connect(address_and_port, NULL);
 }
 
 int tcp_socket_incoming(const char *address, uint16_t port)
diff --git a/qemu-sockets.c b/qemu-sockets.c
index 68e4d30..4f5eca8 100644
--- a/qemu-sockets.c
+++ b/qemu-sockets.c
@@ -518,16 +518,32 @@ int inet_listen(const char *str, char *ostr, int olen,
 return sock;
 }
 
-int inet_connect(const char *str, bool block, bool *in_progress, Error **errp)
+int inet_connect(const char *str, Error **errp)
 {
 QemuOpts *opts;
 int sock = -1;
 
 opts = qemu_opts_create(&dummy_opts, NULL, 0, NULL);
 if (inet_parse(opts, str) == 0) {
-if (block) {
-qemu_opt_set(opts, "block", "on");
-}
+qemu_opt_set(opts, "block", "on");
+sock = inet_connect_opts(opts, NULL, errp);
+} else {
+error_set(errp, QERR_SOCKET_CREATE_FAILED);
+}
+qemu_opts_del(opts);
+return sock;
+}
+
+
+int inet_nonblocking_connect(const char *str, bool *in_progress,
+ Error **errp)
+{
+QemuOpts *opts;
+int sock = -1;
+
+opts = qemu_opts_create(&dummy_opts, NULL, 0, NULL);
+if (inet_parse(opts, str) == 0) {
+qemu_opt_set(opts, "block", "off");
 sock = inet_connect_opts(opts, in_progress, errp);
 } else {
 error_set(errp, QERR_SOCKET_CREATE_FAILED);
diff --git a/qemu_socket.h b/qemu_socket.h
index 30ae6af..c47f2b0 100644
--- a/qemu_socket.h
+++ b/qemu_socket.h
@@ -43,7 +43,9 @@ int inet_listen_opts(QemuOpts *opts, int port_offset, Error 
**errp);
 int inet_listen(const char *str, char *ostr, int olen,
 int socktype, int port_offset, Error **errp);
 int inet_connect_opts(QemuOpts *opts, bool *in_progress, Error **errp);
-int inet_connect(const char *str, bool block, bool *in_progress, Error **errp);
+int inet_connect(const char *str, Error **errp);
+int inet_nonblocking_connect(const char *str, bool *in_progress,
+ Error **errp);
 int inet_dgram_opts(QemuOpts *opts);
 const char *inet_strfamily(int family);
 
diff --git a/ui/vnc.c b/ui/vnc.c
index 385e345..01b2daf 100644
--- a/ui/vnc.c
+++ b/ui/vnc.c
@@ -3061,7 +3061,7 @@ int vnc_display_open(DisplayState *ds, const char 
*display)
 if (strncmp(display, "unix:", 5) == 0)
 vs->lsock = unix_connect(display+5);
 else
-vs->lsock = inet_connect(display, true, NULL, NULL);
+vs->lsock = inet_connect(display, NULL);
 if (-1 == vs->lsock) {
 g_free(vs->display);
 vs->display = NULL;
-- 
1.7.7.6

[Qemu-devel] [PATCH v2 0/3] nonblocking connect address handling cleanup

2012-09-12 Thread Orit Wasserman

getaddrinfo can give us a list of addresses, but we only try to
connect to the first one. If that fails we never proceed to
the next one.  This is common on desktop setups that often have ipv6
configured but not actually working.
A simple way to reproduce the problem is migration:
for the destination use -incoming tcp:0:, run migrate -d tcp:localhost:
migration will fail on hosts that have both IPv4 and IPV6 address for localhost.

To fix this, refactor address resolution code and make inet_nonblocking_connect
retry connection with a different address.

Orit Wasserman (3):
  Refactor inet_connect_opts function
  Separate inet_connect into inet_connect (blocking) and
inet_nonblocking_connect
  Fix address handling in inet_nonblocking_connect

 migration-tcp.c |   29 ++-
 nbd.c   |2 +-
 qemu-sockets.c  |  254 +--
 qemu_socket.h   |9 ++-
 ui/vnc.c|2 +-
 5 files changed, 208 insertions(+), 88 deletions(-)

-- 
1.7.7.6

Re: [Qemu-devel] [PATCH v2] Add ability to disable build of all targets

2012-09-12 Thread Laurent Desnogues

On Tue, Sep 11, 2012 at 8:56 PM, Eduardo Habkost  wrote:
> On Mon, Sep 10, 2012 at 06:00:54PM -0500, Anthony Liguori wrote:
>> "Daniel P. Berrange"  writes:
>>
>> > From: "Daniel P. Berrange" 
>> >
>> > Allow passing of '--target-list=' to configure to request that
>> > all targets are to be disabled. This allows for doing a very
>> > fast tools-only build of things like qemu-img, qemu-io, qemu-nbd.
>> >
>> > Signed-off-by: Daniel P. Berrange 
>>
>> Applied. Thanks.
>
> This patch broke the --target-list option:
>
>   $ ./configure --target-list=x86_64-linux-user,x86_64-softmmu
>   [...]
>   ERROR: Target 'x86_64-linux-user,x86_64-softmmu' not recognised
>   $

It also seems to not even allow to use --target-list=.  On my setup
(CentOS 5.6, building in the git directory) the build process just
hangs here:

$ make V=1
cat  | grep =y | sort -u > config-all-devices.mak


Laurent

>>
>> Regards,
>>
>> Anthony Liguori
>>
>> > ---
>> >  configure | 13 -
>> >  1 file changed, 4 insertions(+), 9 deletions(-)
>> >
>> > diff --git a/configure b/configure
>> > index 75dc9da..472374e 100755
>> > --- a/configure
>> > +++ b/configure
>> > @@ -127,7 +127,7 @@ cc_i386=i386-pc-linux-gnu-gcc
>> >  libs_qga=""
>> >  debug_info="yes"
>> >
>> > -target_list=""
>> > +target_list="DEFAULT"
>> >
>> >  # Default value for a variable defining feature "foo".
>> >  #  * foo="no"  feature will only be used if --enable-foo arg is given
>> > @@ -1319,15 +1319,10 @@ if ! "$python" -c 'import sys; 
>> > sys.exit(sys.version_info < (2,4) or sys.version_
>> >exit 1
>> >  fi
>> >
>> > -if test -z "$target_list" ; then
>> > -target_list="$default_target_list"
>> > -else
>> > -target_list=`echo "$target_list" | sed -e 's/,/ /g'`
>> > -fi
>> > -if test -z "$target_list" ; then
>> > -echo "No targets enabled"
>> > -exit 1
>> > +if test "$target_list" = "DEFAULT" ; then
>> > +target_list=`echo "$default_target_list" | sed -e 's/,/ /g'`
>> >  fi
>> > +
>> >  # see if system emulation was really requested
>> >  case " $target_list " in
>> >*"-softmmu "*) softmmu=yes
>> > --
>> > 1.7.11.2
>>
>
> --
> Eduardo
>

[Qemu-devel] [PATCH 1/5] qemu-char: Add new char device CirMemCharDriver

2012-09-12 Thread Lei Li

Signed-off-by: Lei Li 
---
 qemu-char.c |   84 +++
 1 files changed, 84 insertions(+), 0 deletions(-)

diff --git a/qemu-char.c b/qemu-char.c
index 767da93..0470085 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -2591,6 +2591,90 @@ size_t qemu_chr_mem_osize(const CharDriverState *chr)
 return d->outbuf_size;
 }
 
+/*/
+/*CircularMemory chardev*/
+
+typedef struct {
+size_t cbuf_capacity;
+size_t cbuf_in;
+size_t cbuf_out;
+size_t cbuf_count;
+uint8_t *cbuf;
+} CircMemCharDriver;
+
+static int cirmem_chr_is_empty(CharDriverState *chr)
+{
+CircMemCharDriver *d = chr->opaque;
+
+return d->cbuf_count == 0;
+}
+
+static int cirmem_chr_is_full(CharDriverState *chr)
+{
+CircMemCharDriver *d = chr->opaque;
+
+return d->cbuf_count == d->cbuf_capacity;
+}
+
+static int cirmem_chr_write(CharDriverState *chr, const uint8_t *buf, int len)
+{
+CircMemCharDriver *d = chr->opaque;
+int left;
+
+if (d->cbuf_capacity < len) {
+return -1;
+}
+
+left = d->cbuf_capacity - d->cbuf_count % d->cbuf_capacity;
+
+/* Some of cbuf need to be overwrited */
+if (left < len) {
+memcpy(d->cbuf + d->cbuf_in, buf, left);
+memcpy(d->cbuf + d->cbuf_out, buf + left, len - left);
+d->cbuf_out = (d->cbuf_out + len - left) % d->cbuf_capacity;
+d->cbuf_count = d->cbuf_count + left;
+} else {
+/* Completely overwrite */
+if (cirmem_chr_is_full(chr)) {
+d->cbuf_out = (d->cbuf_out + len) % d->cbuf_capacity;
+} else {
+/* Enough cbuf to write */
+d->cbuf_count += len;
+}
+memcpy(d->cbuf + d->cbuf_in, buf, len);
+}
+
+d->cbuf_in = (d->cbuf_in + len) % d->cbuf_capacity;
+
+return len;
+}
+
+static void cirmem_chr_read(CharDriverState *chr, uint8_t *buf, int len)
+{
+CircMemCharDriver *d = chr->opaque;
+int left;
+
+if (cirmem_chr_is_empty(chr)) {
+return;
+}
+
+left = d->cbuf_capacity - d->cbuf_count % d->cbuf_capacity;
+
+if (d->cbuf_capacity < len) {
+len = d->cbuf_capacity;
+}
+
+if (left < len) {
+memcpy(buf, d->cbuf + d->cbuf_out, left);
+memcpy(buf + left, d->cbuf + d->cbuf_out + left, len - left);
+} else {
+memcpy(buf, d->cbuf + d->cbuf_out, len);
+}
+
+d->cbuf_out = (d->cbuf_out + len) % d->cbuf_capacity;
+d->cbuf_count -= len;
+}
+
 QemuOpts *qemu_chr_parse_compat(const char *label, const char *filename)
 {
 char host[65], port[33], width[8], height[8];
-- 
1.7.7.6

[Qemu-devel] [RFC v3 ATCH 0/5] char: expose CirMemCharDriver and provide QMP interface

2012-09-12 Thread Lei Li

This RFC series attempts to convert the MemCharDriver to use a circular
buffer for input and output, expose it to users by introducing QMP commands
memchar_write and memchar_read and via the command line like the other
CharDriverStates.

Serial ports in qemu always use CharDriverStates as there backends,
Right now, all of our backends always try to write the data from the
guest to a socket or file. The concern from OpenStack is that this could
lead to unbounded disk space usage since they log the serial output.
For more detail of the background info:
https://bugs.launchpad.net/nova/+bug/832507

So we want to use a circular buffer in QEMU instead, and then OpenStack
can periodically read the buffer in QEMU and log it.

The QMP commands introduced like:

{ 'command': 'memchar-write',
  'data': {'chardev': 'str', 'size': 'int', 'data': 'str',
   'format': 'str', 'control': 'str' } }

{ 'command': 'memchar-read',
  'data': {'chardev': 'str', 'size': 'int',
   'format': 'str', 'control': 'str' },
  'returns': 'str' }

Expose CirMemCharDriver via the command line like:

qemu -chardev memchr,id=foo,maxcapacity=640k -serial chardev:foo

Introduce HMP command 'console' like:

(qemu) console foo

Note:
Now all of the feature were implemented, this series is just a
sketch and not completely tested. Please comment and let me know
if this seems like the direction we should be headed, your suggestion
would be very appreciated!

Known issues:
There are still some problems I am fixing,
  - The circularMemoryDriver need to be improved.
  - For the 'block' option as sync command, will support it later when
we gain the necessary infrastructure.
  - The HMP command 'console' still have problems to be fixed. 

Changes since v2:
  - Add congestion mechanism. For the 'block' option as sync command,
will support it later when we gain the necessary infrastructure
enhancement.
  - Add HMP 'console' command so that can interact with multiple
chardevs via a single monitor socket.
  - Make the circular buffer backend and the current MemCharDriver
live in parallel, expose a new char backend with circular buffer
CirMemCharDriver suggested by Luiz.
  - Other fixs from Eric and Markus.

Changes since v1:
  - Exposing the MemCharDriver via command line.
  - Support base64 data format suggested by Anthony and Eric.
  - Follow the new rule for the name of qmp command from Eric.


Lei Li (5):
  qemu-char: Add new char device CirMemCharDriver
  Expose CirMemCharDriver via command line
  QAPI: Introduce memchar-write QMP command
  QAPI: Introduce memchar-read QMP command
  HMP: Introduce console command

 hmp-commands.hx  |   72 ++
 hmp.c|   79 
 hmp.h|3 +
 monitor.c|   18 +
 monitor.h|2 +
 qapi-schema.json |   96 
 qemu-char.c  |  212 ++
 qemu-config.c|3 +
 qemu-options.hx  |   10 +++
 qmp-commands.hx  |   76 +++
 10 files changed, 571 insertions(+), 0 deletions(-)

[Qemu-devel] [PATCH 4/5] QAPI: Introduce memchar-read QMP command

2012-09-12 Thread Lei Li

Signed-off-by: Lei Li 
---
 hmp-commands.hx  |   25 +
 hmp.c|   18 ++
 hmp.h|1 +
 qapi-schema.json |   27 +++
 qemu-char.c  |   48 
 qmp-commands.hx  |   37 +
 6 files changed, 156 insertions(+), 0 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index fe11926..1d2fccc 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -816,6 +816,31 @@ Support 'utf8' and 'base64', by default is 'utf8'.
 @var{control} is option('block', 'drop') for read and write command
 that specifies behavior when the queue is full/empty. By default is
 'drop'. Note that the 'block' option is not supported now.
+
+ETEXI
+
+{
+.name   = "memchar_read",
+.args_type  = "chardev:s,size:i,format:s?,control:s?",
+.params = "chardev size [format] [control]",
+.help   = "Provide read interface for CirMemCharDriver. Read from"
+  "it and return 'size' of the data",
+.mhandler.cmd = hmp_memchar_read,
+},
+
+STEXI
+@item memchar_read @var{chardev} @var{size}
+@findex memchar_read
+Provide read interface for CirMemCharDriver. Read from cirmemchr
+char device and return @var{size} of the data.
+
+@var{format} is the format of the data read from CirMemCharDriver.
+Support 'utf8' and 'base64', by default is 'utf8'.
+
+@var{control} is option['block', 'drop'] for read and write command
+that specifies behavior when the queue is full/empty. By default is
+'drop'. Note that the 'block' option is not supported now.
+
 ETEXI
 
 {
diff --git a/hmp.c b/hmp.c
index 97f5058..4397981 100644
--- a/hmp.c
+++ b/hmp.c
@@ -690,6 +690,24 @@ void hmp_memchar_write(Monitor *mon, const QDict *qdict)
 hmp_handle_error(mon, &errp);
 }
 
+void hmp_memchar_read(Monitor *mon, const QDict *qdict)
+{
+uint32_t size = qdict_get_int(qdict, "size");
+const char *chardev = qdict_get_str(qdict, "chardev");
+char *data;
+int val = qdict_get_try_bool(qdict, "base64", 0);
+enum DataFormat format;
+int con = qdict_get_try_bool(qdict, "block", 0);
+enum CongestionControl control;
+Error *errp = NULL;
+
+format = val ? DATA_FORMAT_BASE64 : DATA_FORMAT_UTF8;
+control = con ? CONGESTION_CONTROL_BLOCK : CONGESTION_CONTROL_DROP;
+data = qmp_memchar_read(chardev, size, true, format,
+true, control, &errp);
+monitor_printf(mon, "%s\n", data);
+}
+
 static void hmp_cont_cb(void *opaque, int err)
 {
 if (!err) {
diff --git a/hmp.h b/hmp.h
index 44b6463..ed2cda4 100644
--- a/hmp.h
+++ b/hmp.h
@@ -44,6 +44,7 @@ void hmp_cpu(Monitor *mon, const QDict *qdict);
 void hmp_memsave(Monitor *mon, const QDict *qdict);
 void hmp_pmemsave(Monitor *mon, const QDict *qdict);
 void hmp_memchar_write(Monitor *mon, const QDict *qdict);
+void hmp_memchar_read(Monitor *mon, const QDict *qdict);
 void hmp_cont(Monitor *mon, const QDict *qdict);
 void hmp_system_wakeup(Monitor *mon, const QDict *qdict);
 void hmp_inject_nmi(Monitor *mon, const QDict *qdict);
diff --git a/qapi-schema.json b/qapi-schema.json
index 371239a..5274b86 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -305,6 +305,33 @@
'*control': 'CongestionControl'} }
 
 ##
+# @memchar-read:
+#
+# Provide read interface for CirMemCharDriver. Read from cirmemchar
+# char device and return the data.
+#
+# @chardev: the name of the cirmemchar char device.
+#
+# @size: the size to read in bytes.
+#
+# @format: #optional the format of the data want to read from
+#  CirMemCharDriver, by default is 'utf8'.
+#
+# @control: #optional options for read and write command that specifies
+#   behavior when the queue is full/empty.
+#
+# Returns: The data read from cirmemchar as string.
+#  If @chardev is not a valid memchr device, DeviceNotFound
+#  If an I/O error occurs while reading, IOError
+#
+# Since: 1.3
+##
+{ 'command': 'memchar-read',
+  'data': {'chardev': 'str', 'size': 'int', '*format': 'DataFormat',
+   '*control': 'CongestionControl'},
+   'returns': 'str' }
+
+##
 # @CommandInfo:
 #
 # Information about a QMP command
diff --git a/qemu-char.c b/qemu-char.c
index be1d79a..bb3ddb9 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -2748,6 +2748,54 @@ void qmp_memchar_write(const char *chardev, int64_t size,
 g_free(write_data);
 }
 
+char *qmp_memchar_read(const char *chardev, int64_t size,
+   bool has_format, enum DataFormat format,
+   bool has_control, enum CongestionControl control,
+   Error **errp)
+{
+CharDriverState *chr;
+guchar *read_data;
+char *data = NULL;
+
+read_data = g_malloc0(size);
+
+chr = qemu_chr_find(chardev);
+if(!chr) {
+error_set(errp, QERR_DEVICE_NOT_FOUND, chardev);
+return NULL;
+}
+
+/* XXX: For the

[Qemu-devel] [PATCH 2/5] Expose CirMemCharDriver via command line

2012-09-12 Thread Lei Li

Signed-off-by: Lei Li 
---
 qemu-char.c |   31 +++
 qemu-config.c   |3 +++
 qemu-options.hx |   10 ++
 3 files changed, 44 insertions(+), 0 deletions(-)

diff --git a/qemu-char.c b/qemu-char.c
index 0470085..6e84acc 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -2675,6 +2675,31 @@ static void cirmem_chr_read(CharDriverState *chr, 
uint8_t *buf, int len)
 d->cbuf_count -= len;
 }
 
+static CharDriverState *qemu_chr_open_cirmemchr(QemuOpts *opts)
+{
+CharDriverState *chr;
+CircMemCharDriver *d;
+
+chr = g_malloc0(sizeof(CharDriverState));
+d = g_malloc(sizeof(*d));
+
+d->cbuf_capacity = qemu_opt_get_number(opts, "maxcapacity", 0);
+if (d->cbuf_capacity == 0) {
+d->cbuf_capacity = CBUFF_SIZE;
+}
+
+d->cbuf_in = 0;
+d->cbuf_out = 0;
+d->cbuf_count = 0;
+d->cbuf = g_malloc0(d->cbuf_capacity);
+
+memset(chr, 0, sizeof(*chr));
+chr->opaque = d;
+chr->chr_write = mem_chr_write;
+
+return chr;
+}
+
 QemuOpts *qemu_chr_parse_compat(const char *label, const char *filename)
 {
 char host[65], port[33], width[8], height[8];
@@ -2739,6 +2764,11 @@ QemuOpts *qemu_chr_parse_compat(const char *label, const 
char *filename)
 qemu_opt_set(opts, "path", p);
 return opts;
 }
+if (strstart(filename, "memchr", &p)) {
+qemu_opt_set(opts, "backend", "memchr");
+qemu_opt_set(opts, "maxcapacity", p);
+return opts;
+}
 if (strstart(filename, "tcp:", &p) ||
 strstart(filename, "telnet:", &p)) {
 if (sscanf(p, "%64[^:]:%32[^,]%n", host, port, &pos) < 2) {
@@ -2812,6 +2842,7 @@ static const struct {
 { .name = "udp",   .open = qemu_chr_open_udp },
 { .name = "msmouse",   .open = qemu_chr_open_msmouse },
 { .name = "vc",.open = text_console_init },
+{ .name = "memchr",.open = qemu_chr_open_cirmemchr },
 #ifdef _WIN32
 { .name = "file",  .open = qemu_chr_open_win_file_out },
 { .name = "pipe",  .open = qemu_chr_open_win_pipe },
diff --git a/qemu-config.c b/qemu-config.c
index eba977e..5cb6dcb 100644
--- a/qemu-config.c
+++ b/qemu-config.c
@@ -213,6 +213,9 @@ static QemuOptsList qemu_chardev_opts = {
 },{
 .name = "debug",
 .type = QEMU_OPT_NUMBER,
+},{
+.name = "maxcapacity",
+.type = QEMU_OPT_NUMBER,
 },
 { /* end of list */ }
 },
diff --git a/qemu-options.hx b/qemu-options.hx
index 804a2d1..3a7384d 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -1666,6 +1666,7 @@ DEF("chardev", HAS_ARG, QEMU_OPTION_chardev,
 "-chardev msmouse,id=id[,mux=on|off]\n"
 "-chardev 
vc,id=id[[,width=width][,height=height]][[,cols=cols][,rows=rows]]\n"
 " [,mux=on|off]\n"
+"-chardev memchr,id=id,maxcapacity=maxcapacity\n"
 "-chardev file,id=id,path=path[,mux=on|off]\n"
 "-chardev pipe,id=id,path=path[,mux=on|off]\n"
 #ifdef _WIN32
@@ -1704,6 +1705,7 @@ Backend is one of:
 @option{udp},
 @option{msmouse},
 @option{vc},
+@option{memchr},
 @option{file},
 @option{pipe},
 @option{console},
@@ -1810,6 +1812,14 @@ the console, in pixels.
 @option{cols} and @option{rows} specify that the console be sized to fit a text
 console with the given dimensions.
 
+@item -chardev memchr ,id=@var{id} ,maxcapacity=@var{maxcapacity}
+
+Create a circular buffer with fixed size indicated by optionally 
@option{maxcapacity}
+which will be default 64K if it is not given.
+
+@option{maxcapacity} specify the max capacity of the size of circular buffer
+want to create.
+
 @item -chardev file ,id=@var{id} ,path=@var{path}
 
 Log all traffic received from the guest to a file.
-- 
1.7.7.6

[Qemu-devel] [PATCH 5/5] HMP: Introduce console command

2012-09-12 Thread Lei Li

Signed-off-by: Lei Li 
---
 hmp.c |   42 ++
 monitor.c |   18 ++
 monitor.h |2 ++
 3 files changed, 62 insertions(+), 0 deletions(-)

diff --git a/hmp.c b/hmp.c
index 4397981..a016a5c 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1205,3 +1205,45 @@ void hmp_screen_dump(Monitor *mon, const QDict *qdict)
 qmp_screendump(filename, &err);
 hmp_handle_error(mon, &err);
 }
+
+int console_escape_char = 0x1d; /* ctrl-] is used for escape */
+
+static void hmp_read_console(Monitor *mon, const char *data,
+ void *opaque)
+{
+CharDriverState *chr = opaque;
+uint32_t size = strlen(data);
+enum DataFormat format = DATA_FORMAT_UTF8;
+enum CongestionControl control = CONGESTION_CONTROL_DROP;
+
+Error *err = NULL;
+
+if (*data == console_escape_char) {
+monitor_resume(mon);
+return;
+}
+
+qmp_memchar_write(chr->label, size, data, 0, format,
+  0, control, &err);
+monitor_read_command(mon, 1);
+}
+
+void hmp_console(Monitor *mon, const QDict *qdict)
+{
+const char *device = qdict_get_str(qdict, "chardev");
+CharDriverState *chr;
+Error *err = NULL;
+
+chr = qemu_chr_find(device);
+
+if (!chr) {
+error_set(&err, QERR_DEVICE_NOT_FOUND, device);
+hmp_handle_error(mon, &err);
+return;
+}
+
+if (monitor_read_console(mon, device, hmp_read_console, chr) < 0) {
+monitor_printf(mon, "Connect to console %s failed\n", device);
+}
+g_free(chr);
+}
diff --git a/monitor.c b/monitor.c
index 67064e2..285dc7b 100644
--- a/monitor.c
+++ b/monitor.c
@@ -256,6 +256,24 @@ int monitor_read_password(Monitor *mon, ReadLineFunc 
*readline_func,
 }
 }
 
+int monitor_read_console(Monitor *mon, const char *device,
+ ReadLineFunc *readline_func, void *opaque)
+{
+char prompt[60];
+
+if (!mon->rs)
+return -1;
+
+if (monitor_ctrl_mode(mon)) {
+qerror_report(QERR_MISSING_PARAMETER, "console");
+return -EINVAL;
+}
+
+snprintf(prompt, sizeof(prompt), "%s: ", device);
+readline_start(mon->rs, prompt, 0, readline_func, opaque);
+return 0;
+}
+
 void monitor_flush(Monitor *mon)
 {
 if (mon && mon->outbuf_index != 0 && !mon->mux_out) {
diff --git a/monitor.h b/monitor.h
index 64c1561..924a042 100644
--- a/monitor.h
+++ b/monitor.h
@@ -84,6 +84,8 @@ void monitor_read_command(Monitor *mon, int show_prompt);
 ReadLineState *monitor_get_rs(Monitor *mon);
 int monitor_read_password(Monitor *mon, ReadLineFunc *readline_func,
   void *opaque);
+int monitor_read_console(Monitor *mon, const char *device,
+ ReadLineFunc *readline_func, void *opaque);
 
 int qmp_qom_set(Monitor *mon, const QDict *qdict, QObject **ret);
 
-- 
1.7.7.6

[Qemu-devel] [PATCH 3/5] QAPI: Introduce memchar-write QMP command

2012-09-12 Thread Lei Li

Signed-off-by: Lei Li 
---
 hmp-commands.hx  |   23 ++
 hmp.c|   19 +++
 hmp.h|1 +
 qapi-schema.json |   69 ++
 qemu-char.c  |   48 +
 qmp-commands.hx  |   39 ++
 6 files changed, 199 insertions(+), 0 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index ed67e99..fe11926 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -796,6 +796,29 @@ Inject an NMI on the given CPU (x86 only).
 ETEXI
 
 {
+.name   = "memchar_write",
+.args_type  = "chardev:s,size:i,data:s,format:s?,control:s?",
+.params = "chardev size data [format] [control]",
+.help   = "Provide writing interface for CirMemCharDriver. Write"
+  "'data' to it with size 'size'",
+.mhandler.cmd = hmp_memchar_write,
+},
+
+STEXI
+@item memchar_write @var{chardev} @var{size} @var{data} [@var{format}] 
[@var{control}]
+@findex memchar_write
+Provide writing interface for CirMemCharDriver. Write @var{data}
+to cirmemchr char device with size @var{size}.
+
+@var{format} is the format of the data write to CirMemCharDriver.
+Support 'utf8' and 'base64', by default is 'utf8'.
+
+@var{control} is option('block', 'drop') for read and write command
+that specifies behavior when the queue is full/empty. By default is
+'drop'. Note that the 'block' option is not supported now.
+ETEXI
+
+{
 .name   = "migrate",
 .args_type  = "detach:-d,blk:-b,inc:-i,uri:s",
 .params = "[-d] [-b] [-i] uri",
diff --git a/hmp.c b/hmp.c
index ba6fbd3..97f5058 100644
--- a/hmp.c
+++ b/hmp.c
@@ -671,6 +671,25 @@ void hmp_pmemsave(Monitor *mon, const QDict *qdict)
 hmp_handle_error(mon, &errp);
 }
 
+void hmp_memchar_write(Monitor *mon, const QDict *qdict)
+{
+uint32_t size = qdict_get_int(qdict, "size");
+const char *chardev = qdict_get_str(qdict, "chardev");
+const char *data = qdict_get_str(qdict, "data");
+int val = qdict_get_try_bool(qdict, "base64", 0);
+enum DataFormat format;
+int con = qdict_get_try_bool(qdict, "block", 0);
+enum CongestionControl control;
+Error *errp = NULL;
+
+format = val ? DATA_FORMAT_BASE64 : DATA_FORMAT_UTF8;
+control = con ? CONGESTION_CONTROL_BLOCK : CONGESTION_CONTROL_DROP;
+qmp_memchar_write(chardev, size, data, true, format,
+  true, control, &errp);
+
+hmp_handle_error(mon, &errp);
+}
+
 static void hmp_cont_cb(void *opaque, int err)
 {
 if (!err) {
diff --git a/hmp.h b/hmp.h
index 48b9c59..44b6463 100644
--- a/hmp.h
+++ b/hmp.h
@@ -43,6 +43,7 @@ void hmp_system_powerdown(Monitor *mon, const QDict *qdict);
 void hmp_cpu(Monitor *mon, const QDict *qdict);
 void hmp_memsave(Monitor *mon, const QDict *qdict);
 void hmp_pmemsave(Monitor *mon, const QDict *qdict);
+void hmp_memchar_write(Monitor *mon, const QDict *qdict);
 void hmp_cont(Monitor *mon, const QDict *qdict);
 void hmp_system_wakeup(Monitor *mon, const QDict *qdict);
 void hmp_inject_nmi(Monitor *mon, const QDict *qdict);
diff --git a/qapi-schema.json b/qapi-schema.json
index a9f465a..371239a 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -236,6 +236,75 @@
 { 'command': 'query-chardev', 'returns': ['ChardevInfo'] }
 
 ##
+# @DataFormat:
+#
+# An enumeration of data format write to or read from
+# memchardev. The default value would be utf8.
+#
+# @utf8: The data format is 'utf8'.
+#
+# @base64: The data format is 'base64'.
+#
+# Note: The data format start with 'utf8' and 'base64', will support
+#   other data format as well.
+#
+# Since: 1.3
+##
+{ 'enum': 'DataFormat'
+  'data': [ 'utf8', 'base64' ] }
+
+##
+# @CongestionControl
+#
+# An enumeration of options for the read and write command that
+# specifies behavior when the queue is full/empty. The default
+# option would be dropped.
+#
+# @drop: Would result in reads returning empty strings and writes
+#dropping queued data.
+#
+# @block: Would make the session block until data was available
+# or the queue had space available.
+#
+# Note: The option 'block' is not supported now due to the miss
+#   feature in qmp. Will add it later when we gain the necessary
+#   infrastructure enhancement.
+#
+# Since: 1.3
+##
+{'enum': 'CongestionControl'
+ 'data': [ 'drop', 'block' ] }
+
+##
+# @memchar-write:
+#
+# Provide writing interface for CirMemCharDriver. Write data to cirmemchar
+# char device.
+#
+# @chardev: the name of the cirmemchar char device.
+#
+# @size: the size to write in bytes.
+#
+# @data: the source data write to cirmemchar.
+#
+# @format: #optional the format of the data write to cirmemchar, by
+#  default is 'utf8'.
+#
+# @control: #optional options for read and write command that specifies
+#   behavior when the queue is full/empty.
+#
+# Returns: Nothing on success
+#  If @chardev is not a

[Qemu-devel] [PATCH] Basic support for ARM A15 "architectured" (cp15) timers

2012-09-12 Thread Daniel Forsgren

This patch adds basic support for the "architected" timers (i.e. cp15)
found in A15. It's enough to allow Linux to boot, using arch_timer for
the tick. However - it is not a complete model of the timer block at
large, it is not that well structured, and it is currently tested with
qemu-linaro-1.1.50-2012.07 (not latest and greatest). It's simply a
prototype.

However, if anyone wants to play with the architectured (cp15) timers
instead of sp804, then please feel free to try it out. It has been
tested with linux-linaro-3.6-rc2-2012.08, and you can easily verify
the existence of these timers under /proc/interrupts:

root@linaro-developer:~# cat /proc/interrupts 
cat /proc/interrupts 
   CPU0   
 29:   7424   GIC  arch_timer
 30:  0   GIC  arch_timer

Please note that this also requires some minor fixes that are not part
of qemu-linaro-1.1.50-2012.07:

http://patches.linaro.org/9833/

Signed-off-by: Daniel Forsgren 

---

diff -Nupr qemu-linaro-1.1.50-2012.07/hw/a15mpcore.c 
qemu-linaro-1.1.50-2012.07-modified/hw/a15mpcore.c
--- qemu-linaro-1.1.50-2012.07/hw/a15mpcore.c   2012-07-05 16:48:28.0 
+0200
+++ qemu-linaro-1.1.50-2012.07-modified/hw/a15mpcore.c  2012-09-12 
11:24:25.844237405 +0200
@@ -28,6 +28,7 @@ typedef struct A15MPPrivState {
 uint32_t num_cpu;
 uint32_t num_irq;
 MemoryRegion container;
+DeviceState *archtimer;
 DeviceState *gic;
 } A15MPPrivState;
 
@@ -40,7 +41,8 @@ static void a15mp_priv_set_irq(void *opa
 static int a15mp_priv_init(SysBusDevice *dev)
 {
 A15MPPrivState *s = FROM_SYSBUS(A15MPPrivState, dev);
-SysBusDevice *busdev;
+SysBusDevice *busdev, *timerbusdev;
+int i;
 
 if (kvm_irqchip_in_kernel()) {
 s->gic = qdev_create(NULL, "kvm-arm_gic");
@@ -60,6 +62,11 @@ static int a15mp_priv_init(SysBusDevice
 /* Pass through inbound GPIO lines to the GIC */
 qdev_init_gpio_in(&s->busdev.qdev, a15mp_priv_set_irq, s->num_irq - 32);
 
+s->archtimer = qdev_create(NULL, "arm_archtimer");
+//qdev_prop_set_uint32(s->archtimer, "num-cpu", s->num_cpu);
+qdev_init_nofail(s->archtimer);
+timerbusdev = sysbus_from_qdev(s->archtimer);
+
 /* Memory map (addresses are offsets from PERIPHBASE):
  *  0x-0x0fff -- reserved
  *  0x1000-0x1fff -- GIC Distributor
@@ -75,6 +82,16 @@ static int a15mp_priv_init(SysBusDevice
 sysbus_mmio_get_region(busdev, 1));
 
 sysbus_init_mmio(dev, &s->container);
+
+
+for (i = 0; i < s->num_cpu; i++) {
+int ppibase = (s->num_irq - 32) + i * 32;
+sysbus_connect_irq(timerbusdev, i * 2,
+   qdev_get_gpio_in(s->gic, ppibase + 29));
+sysbus_connect_irq(timerbusdev, i * 2 + 1,
+   qdev_get_gpio_in(s->gic, ppibase + 30));
+}
+
 return 0;
 }
 
diff -Nupr qemu-linaro-1.1.50-2012.07/hw/arm/Makefile.objs 
qemu-linaro-1.1.50-2012.07-modified/hw/arm/Makefile.objs
--- qemu-linaro-1.1.50-2012.07/hw/arm/Makefile.objs 2012-07-05 
16:48:28.0 +0200
+++ qemu-linaro-1.1.50-2012.07-modified/hw/arm/Makefile.objs2012-09-12 
11:28:39.121053287 +0200
@@ -1,4 +1,4 @@
-obj-y = integratorcp.o versatilepb.o arm_pic.o arm_timer.o
+obj-y = integratorcp.o versatilepb.o arm_pic.o arm_timer.o arm_archtimer.o
 obj-y += arm_boot.o pl011.o pl031.o pl050.o pl080.o pl110.o pl181.o pl190.o
 obj-y += versatile_pci.o
 obj-y += versatile_i2c.o
diff -Nupr qemu-linaro-1.1.50-2012.07/hw/arm_archtimer.c 
qemu-linaro-1.1.50-2012.07-modified/hw/arm_archtimer.c
--- qemu-linaro-1.1.50-2012.07/hw/arm_archtimer.c   1970-01-01 
01:00:00.0 +0100
+++ qemu-linaro-1.1.50-2012.07-modified/hw/arm_archtimer.c  2012-09-12 
13:21:44.676267111 +0200
@@ -0,0 +1,232 @@
+/*
+ * "Architectured" timer for A15
+ *
+ * Copyright (c) 2012 Enea Software AB
+ * Written by Daniel Forsgren
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see .
+ */
+
+#include "sysbus.h"
+#include "qemu-timer.h"
+
+/* Primitive (and imcomplete) support for A15 "architectured" timers 
+
+   - Only PL1 timer supported.
+
+   - Only minimal subset of fuctionality requred by Linux.
+   
+   - Only tested with single-core.
+
+*/
+
+/* control register bit assignment */
+#define CTL_ENABLE  0x01
+#define CTL_MASK0x02
+#define CTL_INT 0x04
+
+/* state of per-core timers

[Qemu-devel] [PATCH 2/9] ehci: Walk async schedule before and after migration

2012-09-12 Thread Hans de Goede

Signed-off-by: Hans de Goede 
---
 hw/usb/hcd-ehci.c | 28 
 1 file changed, 28 insertions(+)

diff --git a/hw/usb/hcd-ehci.c b/hw/usb/hcd-ehci.c
index c5f2635..e67cbc7 100644
--- a/hw/usb/hcd-ehci.c
+++ b/hw/usb/hcd-ehci.c
@@ -34,6 +34,7 @@
 #include "monitor.h"
 #include "trace.h"
 #include "dma.h"
+#include "sysemu.h"
 
 #define EHCI_DEBUG   0
 
@@ -2574,6 +2575,32 @@ static int usb_ehci_post_load(void *opaque, int 
version_id)
 return 0;
 }
 
+static void usb_ehci_vm_state_change(void *opaque, int running, RunState state)
+{
+EHCIState *ehci = opaque;
+
+/*
+ * We don't migrate the EHCIQueue-s, instead we rebuild them for the
+ * schedule in guest memory. We must do the rebuilt ASAP, so that
+ * USB-devices which have async handled packages have a packet in the
+ * ep queue to match the completion with.
+ */
+if (state == RUN_STATE_RUNNING) {
+ehci_advance_async_state(ehci);
+}
+
+/*
+ * The schedule rebuilt from guest memory could cause the migration dest
+ * to miss a QH unlink, and fail to cancel packets, since the unlinked QH
+ * will never have existed on the destination. Therefor we must flush the
+ * async schedule on savevm to catch any not yet noticed unlinks.
+ */
+if (state == RUN_STATE_SAVE_VM) {
+ehci_advance_async_state(ehci);
+ehci_queues_rip_unseen(ehci, 1);
+}
+}
+
 static const VMStateDescription vmstate_ehci = {
 .name= "ehci",
 .version_id  = 2,
@@ -2723,6 +2750,7 @@ static int usb_ehci_initfn(PCIDevice *dev)
 usb_packet_init(&s->ipacket);
 
 qemu_register_reset(ehci_reset, s);
+qemu_add_vm_change_state_handler(usb_ehci_vm_state_change, s);
 
 memory_region_init_io(&s->mem, &ehci_mem_ops, s, "ehci", MMIO_SIZE);
 pci_register_bar(&s->dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, &s->mem);
-- 
1.7.12

[Qemu-devel] [PATCH 1/9] ehci: Don't set seen to 0 when removing unseen queue-heads

2012-09-12 Thread Hans de Goede

When removing unseen queue-heads from the async queue list, we should not
set the seen flag to 0, as this may cause them to be removed by
ehci_queues_rip_unused() during the next call to ehci_advance_async_state()
if the timer is late or running at a low frequency.

Note:
1) This *may* have caused the instant unlink / relinks described in commit
   9bc3a3a216e2689bfcdd36c3e079333bbdbf3ba0

2) Rather then putting more if-s inside ehci_queues_rip_unused, this patch
   instead introduces a new ehci_queues_rip_unseen function.

3) This patch also makes it save to call ehci_queues_rip_unseen() multiple
   times, which gets used in the folluw up patch titled:
   "ehci: Walk async schedule before and after migration"

Signed-off-by: Hans de Goede 
---
 hw/usb/hcd-ehci.c | 24 ++--
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/hw/usb/hcd-ehci.c b/hw/usb/hcd-ehci.c
index 2f3e9c0..c5f2635 100644
--- a/hw/usb/hcd-ehci.c
+++ b/hw/usb/hcd-ehci.c
@@ -853,10 +853,10 @@ static EHCIQueue *ehci_find_queue_by_qh(EHCIState *ehci, 
uint32_t addr,
 return NULL;
 }
 
-static void ehci_queues_rip_unused(EHCIState *ehci, int async, int flush)
+static void ehci_queues_rip_unused(EHCIState *ehci, int async)
 {
 EHCIQueueHead *head = async ? &ehci->aqueues : &ehci->pqueues;
-const char *warn = (async && !flush) ? "guest unlinked busy QH" : NULL;
+const char *warn = async ? "guest unlinked busy QH" : NULL;
 uint64_t maxage = FRAME_TIMER_NS * ehci->maxframes * 4;
 EHCIQueue *q, *tmp;
 
@@ -866,13 +866,25 @@ static void ehci_queues_rip_unused(EHCIState *ehci, int 
async, int flush)
 q->ts = ehci->last_run_ns;
 continue;
 }
-if (!flush && ehci->last_run_ns < q->ts + maxage) {
+if (ehci->last_run_ns < q->ts + maxage) {
 continue;
 }
 ehci_free_queue(q, warn);
 }
 }
 
+static void ehci_queues_rip_unseen(EHCIState *ehci, int async)
+{
+EHCIQueueHead *head = async ? &ehci->aqueues : &ehci->pqueues;
+EHCIQueue *q, *tmp;
+
+QTAILQ_FOREACH_SAFE(q, head, next, tmp) {
+if (!q->seen) {
+ehci_free_queue(q, NULL);
+}
+}
+}
+
 static void ehci_queues_rip_device(EHCIState *ehci, USBDevice *dev, int async)
 {
 EHCIQueueHead *head = async ? &ehci->aqueues : &ehci->pqueues;
@@ -1732,7 +1744,7 @@ static int ehci_state_waitlisthead(EHCIState *ehci,  int 
async)
 ehci_set_usbsts(ehci, USBSTS_REC);
 }
 
-ehci_queues_rip_unused(ehci, async, 0);
+ehci_queues_rip_unused(ehci, async);
 
 /*  Find the head of the list (4.9.1.1) */
 for(i = 0; i < MAX_QH; i++) {
@@ -2364,7 +2376,7 @@ static void ehci_advance_async_state(EHCIState *ehci)
  */
 if (ehci->usbcmd & USBCMD_IAAD) {
 /* Remove all unseen qhs from the async qhs queue */
-ehci_queues_rip_unused(ehci, async, 1);
+ehci_queues_rip_unseen(ehci, async);
 trace_usb_ehci_doorbell_ack();
 ehci->usbcmd &= ~USBCMD_IAAD;
 ehci_raise_irq(ehci, USBSTS_IAA);
@@ -2417,7 +2429,7 @@ static void ehci_advance_periodic_state(EHCIState *ehci)
 ehci_set_fetch_addr(ehci, async,entry);
 ehci_set_state(ehci, async, EST_FETCHENTRY);
 ehci_advance_state(ehci, async);
-ehci_queues_rip_unused(ehci, async, 0);
+ehci_queues_rip_unused(ehci, async);
 break;
 
 default:
-- 
1.7.12

Re: [Qemu-devel] [PATCH] hw: Add support for new LSI Logic devices.

2012-09-12 Thread Kevin Wolf

Am 12.09.2012 01:50, schrieb Michael S. Tsirkin:
> On Tue, Sep 11, 2012 at 01:00:13PM -0400, Don Slutz wrote:
>> Add LSI53C1030, SAS1068, SAS1068e.  Based on code from "VirtualBox Open 
>> Source Edition".
>> Based on QEMU MegaRAID SAS 8708EM2.
>>
>> This is a common VMware disk controller.
> 
> I think you mean VMware emulates this controller too,
> pls say it explicitly in the commit log.
> 
>> SEABIOS change for booting is in the works.
>>
>> Tested with Fedora 16, 17.  CentoOS 6. Windows 2003R2 and 2008R2.
>>
>> Signed-off-by: Don Slutz 
> 
> Minor comments below.
> 
> Coding style does not adhere to qemu standards,
> I guess you know that :)
> 
> Otherwise, from pci side of things this looks OK.
> I did not look at the scsi side of things.
> 
>> ---
>>  default-configs/pci.mak |1 +
>>  hw/Makefile.objs|1 +
>>  hw/lsilogic.c   | 2743 ++
>>  hw/lsilogic.h   | 3365 
>> +++
>>  hw/pci_ids.h|4 +
>>  trace-events|   26 +
>>  6 files changed, 6140 insertions(+), 0 deletions(-)
>>  create mode 100644 hw/lsilogic.c
>>  create mode 100644 hw/lsilogic.h
>>
>> diff --git a/default-configs/pci.mak b/default-configs/pci.mak
>> index 69e18f1..ae4873d 100644
>> --- a/default-configs/pci.mak
>> +++ b/default-configs/pci.mak
>> @@ -11,6 +11,7 @@ CONFIG_PCNET_PCI=y
>>  CONFIG_PCNET_COMMON=y
>>  CONFIG_LSI_SCSI_PCI=y
>>  CONFIG_MEGASAS_SCSI_PCI=y
>> +CONFIG_LSILOGIC_SCSI_PCI=y
>>  CONFIG_RTL8139_PCI=y
>>  CONFIG_E1000_PCI=y
>>  CONFIG_IDE_CORE=y
>> diff --git a/hw/Makefile.objs b/hw/Makefile.objs
>> index 6dfebd2..e5f939c 100644
>> --- a/hw/Makefile.objs
>> +++ b/hw/Makefile.objs
>> @@ -115,6 +115,7 @@ hw-obj-$(CONFIG_OPENCORES_ETH) += opencores_eth.o
>>  # SCSI layer
>>  hw-obj-$(CONFIG_LSI_SCSI_PCI) += lsi53c895a.o
>>  hw-obj-$(CONFIG_MEGASAS_SCSI_PCI) += megasas.o
>> +hw-obj-$(CONFIG_LSILOGIC_SCSI_PCI) += lsilogic.o
>>  hw-obj-$(CONFIG_ESP) += esp.o
>>  hw-obj-$(CONFIG_ESP_PCI) += esp-pci.o
>>  
>> diff --git a/hw/lsilogic.c b/hw/lsilogic.c
>> new file mode 100644
>> index 000..1c0a54f
>> --- /dev/null
>> +++ b/hw/lsilogic.c
>> @@ -0,0 +1,2743 @@
>> +/*
>> + * QEMU LSILOGIC LSI53C1030 SCSI and SAS1068 Host Bus Adapter emulation
>> + * Based on the QEMU Megaraid emulator and the VirtualBox LsiLogic
>> + * LSI53c1030 SCSI controller
>> + *
>> + * Copyright (c) 2009-2012 Hannes Reinecke, SUSE Labs
>> + *
>> + * This library is free software; you can redistribute it and/or
>> + * modify it under the terms of the GNU Lesser General Public
>> + * License as published by the Free Software Foundation; either
>> + * version 2 of the License, or (at your option) any later version.
>> + *
>> + * This library is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> + * Lesser General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU Lesser General Public
>> + * License along with this library; if not, see 
>> .
>> + */
>> +
>> +/* Id: DevLsiLogicSCSI.cpp 40642 2012-03-26 13:14:08Z vboxsync $ */
>> +/** @file
>> + * VBox storage devices: LsiLogic LSI53c1030 SCSI controller.
>> + */
>> +
>> +/*
>> + * Copyright (C) 2006-2009 Oracle Corporation
>> + *
>> + * This file is part of VirtualBox Open Source Edition (OSE), as
>> + * available from http://www.virtualbox.org. This file is free software;
>> + * you can redistribute it and/or modify it under the terms of the GNU
>> + * General Public License (GPL) as published by the Free Software
>> + * Foundation, in version 2 as it comes in the "COPYING" file of the
>> + * VirtualBox OSE distribution. VirtualBox OSE is distributed in the
>> + * hope that it will be useful, but WITHOUT ANY WARRANTY of any kind.
>> + */
>> +
>> +
> 
> I suspect you need to rewrite above: probably add
> all copyrights in 1st header and make it v2 only.

Do we even accept new GPLv2-only code?

Kevin

Re: [Qemu-devel] [PATCH V2 2/6] libqblock type and structure defines

2012-09-12 Thread Eric Blake

On 09/11/2012 09:05 PM, Wenchao Xia wrote:
>> Seriously?  We require a C99-compliant compiler, which is required to
>> treat the more compact version identically (all undefined names evaluate
>> to 0 in the preprocessor), and HACKING doesn't mandate that we spell out
>> a defined-ness check first.  Okay, so configure adds -Wundef to the set
>> of CFLAGS, but I fail to see why we want -Wundef (that's an
>> anachronistic warning, only there to help you if you are writing code
>> portable to K&R).
>>
>   So if the preprocessor replaced __GNUC__ to 0, is there difference
> between these two kinds of macoros?
> #if __GNUC__ >= 4
> #if defined(__GNUC__) && __GNUC__ >=4

The only difference is whether -Wundef will warn, and I'm trying to
argue that qemu's current use of -Wundef is pointless, as that warning
exists solely for K&R compatibility, not for modern standard-compliant
code correctness.

-- 
Eric Blake   ebl...@redhat.com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

[Qemu-devel] [PATCH 1/9] ehci: Don't set seen to 0 when removing unseen queue-heads

2012-09-12 Thread Hans de Goede

When removing unseen queue-heads from the async queue list, we should not
set the seen flag to 0, as this may cause them to be removed by
ehci_queues_rip_unused() during the next call to ehci_advance_async_state()
if the timer is late or running at a low frequency.

Note:
1) This *may* have caused the instant unlink / relinks described in commit
   9bc3a3a216e2689bfcdd36c3e079333bbdbf3ba0

2) Rather then putting more if-s inside ehci_queues_rip_unused, this patch
   instead introduces a new ehci_queues_rip_unseen function.

3) This patch also makes it save to call ehci_queues_rip_unseen() multiple
   times, which gets used in the folluw up patch titled:
   "ehci: Walk async schedule before and after migration"

Signed-off-by: Hans de Goede 
---
 hw/usb/hcd-ehci.c | 24 ++--
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/hw/usb/hcd-ehci.c b/hw/usb/hcd-ehci.c
index 2f3e9c0..c5f2635 100644
--- a/hw/usb/hcd-ehci.c
+++ b/hw/usb/hcd-ehci.c
@@ -853,10 +853,10 @@ static EHCIQueue *ehci_find_queue_by_qh(EHCIState *ehci, 
uint32_t addr,
 return NULL;
 }
 
-static void ehci_queues_rip_unused(EHCIState *ehci, int async, int flush)
+static void ehci_queues_rip_unused(EHCIState *ehci, int async)
 {
 EHCIQueueHead *head = async ? &ehci->aqueues : &ehci->pqueues;
-const char *warn = (async && !flush) ? "guest unlinked busy QH" : NULL;
+const char *warn = async ? "guest unlinked busy QH" : NULL;
 uint64_t maxage = FRAME_TIMER_NS * ehci->maxframes * 4;
 EHCIQueue *q, *tmp;
 
@@ -866,13 +866,25 @@ static void ehci_queues_rip_unused(EHCIState *ehci, int 
async, int flush)
 q->ts = ehci->last_run_ns;
 continue;
 }
-if (!flush && ehci->last_run_ns < q->ts + maxage) {
+if (ehci->last_run_ns < q->ts + maxage) {
 continue;
 }
 ehci_free_queue(q, warn);
 }
 }
 
+static void ehci_queues_rip_unseen(EHCIState *ehci, int async)
+{
+EHCIQueueHead *head = async ? &ehci->aqueues : &ehci->pqueues;
+EHCIQueue *q, *tmp;
+
+QTAILQ_FOREACH_SAFE(q, head, next, tmp) {
+if (!q->seen) {
+ehci_free_queue(q, NULL);
+}
+}
+}
+
 static void ehci_queues_rip_device(EHCIState *ehci, USBDevice *dev, int async)
 {
 EHCIQueueHead *head = async ? &ehci->aqueues : &ehci->pqueues;
@@ -1732,7 +1744,7 @@ static int ehci_state_waitlisthead(EHCIState *ehci,  int 
async)
 ehci_set_usbsts(ehci, USBSTS_REC);
 }
 
-ehci_queues_rip_unused(ehci, async, 0);
+ehci_queues_rip_unused(ehci, async);
 
 /*  Find the head of the list (4.9.1.1) */
 for(i = 0; i < MAX_QH; i++) {
@@ -2364,7 +2376,7 @@ static void ehci_advance_async_state(EHCIState *ehci)
  */
 if (ehci->usbcmd & USBCMD_IAAD) {
 /* Remove all unseen qhs from the async qhs queue */
-ehci_queues_rip_unused(ehci, async, 1);
+ehci_queues_rip_unseen(ehci, async);
 trace_usb_ehci_doorbell_ack();
 ehci->usbcmd &= ~USBCMD_IAAD;
 ehci_raise_irq(ehci, USBSTS_IAA);
@@ -2417,7 +2429,7 @@ static void ehci_advance_periodic_state(EHCIState *ehci)
 ehci_set_fetch_addr(ehci, async,entry);
 ehci_set_state(ehci, async, EST_FETCHENTRY);
 ehci_advance_state(ehci, async);
-ehci_queues_rip_unused(ehci, async, 0);
+ehci_queues_rip_unused(ehci, async);
 break;
 
 default:
-- 
1.7.12

[Qemu-devel] [PATCH 7/9] usb-redir: Add chardev open / close debug logging

2012-09-12 Thread Hans de Goede

Signed-off-by: Hans de Goede 
---
 hw/usb/redirect.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/hw/usb/redirect.c b/hw/usb/redirect.c
index 8d3cf3b..e438fd1 100644
--- a/hw/usb/redirect.c
+++ b/hw/usb/redirect.c
@@ -854,6 +854,7 @@ static void usbredir_chardev_close_bh(void *opaque)
 usbredir_device_disconnect(dev);
 
 if (dev->parser) {
+DPRINTF("destroying usbredirparser\n");
 usbredirparser_destroy(dev->parser);
 dev->parser = NULL;
 }
@@ -869,6 +870,8 @@ static void usbredir_chardev_open(USBRedirDevice *dev)
 usbredir_chardev_close_bh(dev);
 qemu_bh_cancel(dev->chardev_close_bh);
 
+DPRINTF("creating usbredirparser\n");
+
 strcpy(version, "qemu usb-redir guest ");
 pstrcat(version, sizeof(version), qemu_get_version());
 
@@ -980,9 +983,11 @@ static void usbredir_chardev_event(void *opaque, int event)
 
 switch (event) {
 case CHR_EVENT_OPENED:
+DPRINTF("chardev open\n");
 usbredir_chardev_open(dev);
 break;
 case CHR_EVENT_CLOSED:
+DPRINTF("chardev close\n");
 qemu_bh_schedule(dev->chardev_close_bh);
 break;
 }
@@ -1228,6 +1233,7 @@ static void usbredir_device_disconnect(void *priv)
 qemu_del_timer(dev->attach_timer);
 
 if (dev->dev.attached) {
+DPRINTF("detaching device\n");
 usb_device_detach(&dev->dev);
 /*
  * Delay next usb device attach to give the guest a chance to see
-- 
1.7.12

[Qemu-devel] [PATCH 9/9] uhci: Don't queue up packets after one with the SPD flag set

2012-09-12 Thread Hans de Goede

Don't queue up packets after a packet with the SPD (short packet detect)
flag set. Since we won't know if the packet will actually be short until it
has completed, and if it is short we should stop the queue.

This fixes a miniature photoframe emulating a USB cdrom with the windows
software for it not working.

Signed-off-by: Hans de Goede 
---
 hw/usb/hcd-uhci.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/hw/usb/hcd-uhci.c b/hw/usb/hcd-uhci.c
index c7c8786..cdc8bc3 100644
--- a/hw/usb/hcd-uhci.c
+++ b/hw/usb/hcd-uhci.c
@@ -1000,6 +1000,9 @@ static void uhci_fill_queue(UHCIState *s, UHCI_TD *td)
 }
 assert(ret == TD_RESULT_ASYNC_START);
 assert(int_mask == 0);
+if (ptd.ctrl & TD_CTRL_SPD) {
+break;
+}
 plink = ptd.link;
 }
 }
@@ -1097,7 +1100,7 @@ static void uhci_process_frame(UHCIState *s)
 
 case TD_RESULT_ASYNC_START:
 trace_usb_uhci_td_async(curr_qh & ~0xf, link & ~0xf);
-if (is_valid(td.link)) {
+if (is_valid(td.link) && !(td.ctrl & TD_CTRL_SPD)) {
 uhci_fill_queue(s, &td);
 }
 link = curr_qh ? qh.link : td.link;
-- 
1.7.12

[Qemu-devel] [PATCH 0/3] client monitors config support

2012-09-12 Thread Alon Levy

v3:
 - no addition of guest capabilities, use interrupt mask instead, ignore
   0 or ~0 that are set by current windows driver.
 - use crc to solve possible write while read.
 - limit heads to 64, statically allocated on rom by host.
 - some misc trace fixes.

QEMU:

Alon Levy (3):
  hw/qxl: tracing fixes
  qxl: add trace-event for QXL_IO_LOG
  hw/qxl: support client monitor configuration via device

 configure|  7 +
 hw/qxl.c | 88 +---
 trace-events | 11 ++--
 3 files changed, 101 insertions(+), 5 deletions(-)

spice-protocol:

Alon Levy (2):
  qxl_dev.h: add client monitors configuration notification to guest
  Release 0.12.2

 configure.ac|  2 +-
 spice/qxl_dev.h | 18 ++
 2 files changed, 19 insertions(+), 1 deletion(-)

spice-common:

Alon Levy (1):
  Update spice-protocol module

 spice-protocol | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

spice:

Alon Levy (7):
  server/red_dispatcher: change a printerr to debug
  update spice-common module
  server: add QXLInterface::client_monitors_config
  server/red_dispatcher: client_monitors_config support
  server: Filter VD_AGENT_MONITORS_CONFIG
  server/tests: agent mock, client_monitors_config
  spice-server 0.11.5

 configure.ac   |  6 ++--
 server/agent-msg-filter.c  |  8 +
 server/agent-msg-filter.h  |  1 +
 server/red_dispatcher.c| 51 +-
 server/red_dispatcher.h|  4 +++
 server/reds.c  | 65 --
 server/spice.h | 14 +---
 server/tests/basic_event_loop.c|  2 +-
 server/tests/test_display_base.c   | 46 +++
 server/tests/test_display_base.h   |  1 +
 server/tests/test_display_no_ssl.c |  1 +
 spice-common   |  2 +-
 12 files changed, 189 insertions(+), 12 deletions(-)



-- 
1.7.12

[Qemu-devel] [PATCH 3/3] hw/qxl: support client monitor configuration via device

2012-09-12 Thread Alon Levy

Until now we used only the agent to change the monitor count and each
monitor resolution. This patch introduces the qemu part of using the
device as the mediator instead of the agent via virtio-serial.

Spice (>=0.11.5) calls the new QXLInterface::client_monitors_config,
which returns wether the interrupt is enabled, and if so and given a non
NULL monitors config will
generate an interrupt QXL_INTERRUPT_CLIENT_MONITORS_CONFIG with crc
checksum for the guest to verify a second call hasn't interfered.

The maximal number of monitors is limited on the QXLRom to 64.

Signed-off-by: Alon Levy 
---
 configure|  7 ++
 hw/qxl.c | 79 
 trace-events |  6 -
 3 files changed, 91 insertions(+), 1 deletion(-)

diff --git a/configure b/configure
index 30be784..35b5f1b 100755
--- a/configure
+++ b/configure
@@ -2709,6 +2709,9 @@ EOF
 if $pkg_config --atleast-version=0.12.0 spice-protocol >/dev/null 2>&1; 
then
 spice_qxl_io_monitors_config_async="yes"
 fi
+if $pkg_config --atleast-version=0.12.2 spice-protocol > /dev/null 2>&1; 
then
+spice_qxl_client_monitors_config="yes"
+fi
   else
 if test "$spice" = "yes" ; then
   feature_not_found "spice"
@@ -3456,6 +3459,10 @@ if test "$spice_qxl_io_monitors_config_async" = "yes" ; 
then
   echo "CONFIG_QXL_IO_MONITORS_CONFIG_ASYNC=y" >> $config_host_mak
 fi
 
+if test "$spice_qxl_client_monitors_config" = "yes" ; then
+  echo "CONFIG_QXL_CLIENT_MONITORS_CONFIG=y" >> $config_host_mak
+fi
+
 if test "$smartcard" = "yes" ; then
   echo "CONFIG_SMARTCARD=y" >> $config_host_mak
 fi
diff --git a/hw/qxl.c b/hw/qxl.c
index 12dfc79..3a0c059 100644
--- a/hw/qxl.c
+++ b/hw/qxl.c
@@ -18,6 +18,8 @@
  * along with this program; if not, see .
  */
 
+#include 
+
 #include "qemu-common.h"
 #include "qemu-timer.h"
 #include "qemu-queue.h"
@@ -966,6 +968,79 @@ static void interface_set_client_capabilities(QXLInstance 
*sin,
 
 #endif
 
+#if defined(CONFIG_QXL_CLIENT_MONITORS_CONFIG) \
+&& SPICE_SERVER_VERSION >= 0x000b05
+
+static uint32_t qxl_crc32(const uint8_t *p, unsigned len)
+{
+/*
+ * zlib xors the seed with 0x, and xors the result
+ * again with 0x; Both are not done with linux's crc32,
+ * which we want to be compatible with, so undo that.
+ */
+return crc32(0x, p, len) ^ 0x;
+}
+
+/* called from main context only */
+static int interface_client_monitors_config(QXLInstance *sin,
+VDAgentMonitorsConfig *monitors_config)
+{
+PCIQXLDevice *qxl = container_of(sin, PCIQXLDevice, ssd.qxl);
+QXLRom *rom = memory_region_get_ram_ptr(&qxl->rom_bar);
+int i;
+
+/*
+ * Older windows drivers set int_mask to 0 when their ISR is called,
+ * then later set it to ~0. So it doesn't relate to the actual interrupts
+ * handled. However, they are old, so clearly they don't support this
+ * interrupt
+ */
+if (qxl->ram->int_mask == 0 || qxl->ram->int_mask == ~0 ||
+!(qxl->ram->int_mask & QXL_INTERRUPT_CLIENT_MONITORS_CONFIG)) {
+trace_qxl_client_monitors_config_unsupported_by_guest(qxl->id,
+qxl->ram->int_mask,
+monitors_config);
+return 0;
+}
+if (!monitors_config) {
+return 1;
+}
+memset(&rom->client_monitors_config, 0,
+   sizeof(rom->client_monitors_config));
+rom->client_monitors_config.count = monitors_config->num_of_monitors;
+/* monitors_config->flags ignored */
+if (rom->client_monitors_config.count >=
+ARRAY_SIZE(rom->client_monitors_config.heads)) {
+trace_qxl_client_monitors_config_capped(qxl->id,
+monitors_config->num_of_monitors,
+ARRAY_SIZE(rom->client_monitors_config.heads));
+rom->client_monitors_config.count =
+ARRAY_SIZE(rom->client_monitors_config.heads);
+}
+for (i = 0 ; i < rom->client_monitors_config.count ; ++i) {
+VDAgentMonConfig *monitor = &monitors_config->monitors[i];
+QXLURect *rect = &rom->client_monitors_config.heads[i];
+/* monitor->depth ignored */
+rect->left = monitor->x;
+rect->top = monitor->y;
+rect->right = monitor->x + monitor->width;
+rect->bottom = monitor->y + monitor->height;
+}
+rom->client_monitors_config_crc = qxl_crc32(
+(const uint8_t *)&rom->client_monitors_config,
+sizeof(rom->client_monitors_config));
+trace_qxl_client_monitors_config_crc(qxl->id,
+sizeof(rom->client_monitors_config),
+rom->client_monitors_config_crc);
+
+trace_qxl_interrupt_client_monitors_config(qxl->id,
+rom->client_monitors_config.count,
+rom->clie

[Qemu-devel] [PATCH 2/3] qxl: add trace-event for QXL_IO_LOG

2012-09-12 Thread Alon Levy

Signed-off-by: Alon Levy 
---
 hw/qxl.c | 1 +
 trace-events | 1 +
 2 files changed, 2 insertions(+)

diff --git a/hw/qxl.c b/hw/qxl.c
index 94eb3c8..12dfc79 100644
--- a/hw/qxl.c
+++ b/hw/qxl.c
@@ -1503,6 +1503,7 @@ async_common:
 qxl_set_mode(d, val, 0);
 break;
 case QXL_IO_LOG:
+trace_qxl_io_log(d->id, d->ram->log_buf);
 if (d->guestdebug) {
 fprintf(stderr, "qxl/guest-%d: %" PRId64 ": %s", d->id,
 qemu_get_clock_ns(vm_clock), d->ram->log_buf);
diff --git a/trace-events b/trace-events
index 83b332c..564b773 100644
--- a/trace-events
+++ b/trace-events
@@ -924,6 +924,7 @@ qxl_interface_update_area_complete_rest(int qid, uint32_t 
num_updated_rects) "%d
 qxl_interface_update_area_complete_overflow(int qid, int max) "%d max=%d"
 qxl_interface_update_area_complete_schedule_bh(int qid, uint32_t num_dirty) 
"%d #dirty=%d"
 qxl_io_destroy_primary_ignored(int qid, const char *mode) "%d %s"
+qxl_io_log(int qid, const uint8_t *str) "%d %s"
 qxl_io_read_unexpected(int qid) "%d"
 qxl_io_unexpected_vga_mode(int qid, uint64_t addr, uint64_t val, const char 
*desc) "%d 0x%"PRIx64"=%"PRIu64" (%s)"
 qxl_io_write(int qid, const char *mode, uint64_t addr, uint64_t val, unsigned 
size, int async) "%d %s addr=%"PRIu64 " val=%"PRIu64" size=%u async=%d"
-- 
1.7.12

[Qemu-devel] [PATCH 1/3] hw/qxl: tracing fixes

2012-09-12 Thread Alon Levy

Add two new trace events:
qxl_send_events(int qid, uint32_t events) "%d %d"
qxl_set_guest_bug(int qid) "%d"

Change qxl_io_unexpected_vga_mode parameters to be equivalent to those
of qxl_io_write for easier grouping under a single systemtap probe.

Change d to qxl in one place.

Signed-off-by: Alon Levy 
---
 hw/qxl.c | 8 +---
 trace-events | 6 --
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/hw/qxl.c b/hw/qxl.c
index 80ba401..94eb3c8 100644
--- a/hw/qxl.c
+++ b/hw/qxl.c
@@ -141,6 +141,7 @@ static void qxl_ring_set_dirty(PCIQXLDevice *qxl);
 
 void qxl_set_guest_bug(PCIQXLDevice *qxl, const char *msg, ...)
 {
+trace_qxl_set_guest_bug(qxl->id);
 qxl_send_events(qxl, QXL_INTERRUPT_ERROR);
 qxl->guest_bug = 1;
 if (qxl->guestdebug) {
@@ -1403,7 +1404,7 @@ static void ioport_write(void *opaque, target_phys_addr_t 
addr,
 break;
 }
 trace_qxl_io_unexpected_vga_mode(d->id,
-io_port, io_port_to_string(io_port));
+addr, val, io_port_to_string(io_port));
 /* be nice to buggy guest drivers */
 if (io_port >= QXL_IO_UPDATE_AREA_ASYNC &&
 io_port < QXL_IO_RANGE_SIZE) {
@@ -1595,9 +1596,9 @@ cancel_async:
 static uint64_t ioport_read(void *opaque, target_phys_addr_t addr,
 unsigned size)
 {
-PCIQXLDevice *d = opaque;
+PCIQXLDevice *qxl = opaque;
 
-trace_qxl_io_read_unexpected(d->id);
+trace_qxl_io_read_unexpected(qxl->id);
 return 0xff;
 }
 
@@ -1627,6 +1628,7 @@ static void qxl_send_events(PCIQXLDevice *d, uint32_t 
events)
 uint32_t old_pending;
 uint32_t le_events = cpu_to_le32(events);
 
+trace_qxl_send_events(d->id, events);
 assert(qemu_spice_display_is_running(&d->ssd));
 old_pending = __sync_fetch_and_or(&d->ram->int_pending, le_events);
 if ((old_pending & le_events) == le_events) {
diff --git a/trace-events b/trace-events
index 6e31d3c..83b332c 100644
--- a/trace-events
+++ b/trace-events
@@ -925,7 +925,7 @@ qxl_interface_update_area_complete_overflow(int qid, int 
max) "%d max=%d"
 qxl_interface_update_area_complete_schedule_bh(int qid, uint32_t num_dirty) 
"%d #dirty=%d"
 qxl_io_destroy_primary_ignored(int qid, const char *mode) "%d %s"
 qxl_io_read_unexpected(int qid) "%d"
-qxl_io_unexpected_vga_mode(int qid, uint32_t io_port, const char *desc) "%d 
0x%x (%s)"
+qxl_io_unexpected_vga_mode(int qid, uint64_t addr, uint64_t val, const char 
*desc) "%d 0x%"PRIx64"=%"PRIu64" (%s)"
 qxl_io_write(int qid, const char *mode, uint64_t addr, uint64_t val, unsigned 
size, int async) "%d %s addr=%"PRIu64 " val=%"PRIu64" size=%u async=%d"
 qxl_memslot_add_guest(int qid, uint32_t slot_id, uint64_t guest_start, 
uint64_t guest_end) "%d %u: guest phys 0x%"PRIx64 " - 0x%" PRIx64
 qxl_post_load(int qid, const char *mode) "%d %s"
@@ -956,7 +956,7 @@ qxl_spice_destroy_surfaces(int qid, int async) "%d async=%d"
 qxl_spice_destroy_surface_wait_complete(int qid, uint32_t id) "%d sid=%d"
 qxl_spice_destroy_surface_wait(int qid, uint32_t id, int async) "%d sid=%d 
async=%d"
 qxl_spice_flush_surfaces_async(int qid, uint32_t surface_count, uint32_t 
num_free_res) "%d s#=%d, res#=%d"
-qxl_spice_monitors_config(int id) "%d"
+qxl_spice_monitors_config(int qid) "%d"
 qxl_spice_loadvm_commands(int qid, void *ext, uint32_t count) "%d ext=%p 
count=%d"
 qxl_spice_oom(int qid) "%d"
 qxl_spice_reset_cursor(int qid) "%d"
@@ -965,6 +965,8 @@ qxl_spice_reset_memslots(int qid) "%d"
 qxl_spice_update_area(int qid, uint32_t surface_id, uint32_t left, uint32_t 
right, uint32_t top, uint32_t bottom) "%d sid=%d [%d,%d,%d,%d]"
 qxl_spice_update_area_rest(int qid, uint32_t num_dirty_rects, uint32_t 
clear_dirty_region) "%d #d=%d clear=%d"
 qxl_surfaces_dirty(int qid, int surface, int offset, int size) "%d surface=%d 
offset=%d size=%d"
+qxl_send_events(int qid, uint32_t events) "%d %d"
+qxl_set_guest_bug(int qid) "%d"
 
 # hw/qxl-render.c
 qxl_render_blit_guest_primary_initialized(void) ""
-- 
1.7.12

Re: [Qemu-devel] TCG questions

2012-09-12 Thread Lluís Vilanova

Xin Tong writes:

> i do not know. could be similar. I am doing architecture research. i
> need traces of memory access for programming running under a full
> system environment, so i wrote this.

> i do nto seem to be able to access the linked provided from the link
> you give me though.

> https://projects.gso.ac.upc.edu/projects/qemu-dbi/wiki

Well, if you tried to access it during the last few days, we've been having some
issues with the server.

It should all work now.

The main idea is to have an API similar in spirit to that of PIN [1]. You can
have a look at the instrumentation docs [2] for some simple examples.

I had some plans to modify QEMU's address translation mechanism to provide a
performant mechanism to retrieve physical addresses (traces of virtual addresses
are already supported), but that will have to wait until I finish some other
unrelated tasks.


[1] http://pintool.org
[2] 
https://projects.gso.ac.upc.edu/projects/qemu-dbi/repository/entry/docs/instrumentation.txt#L202

Lluis



> On Tue, Sep 11, 2012 at 6:52 PM, 陳韋任 (Wei-Ren Chen)
>  wrote:
>>> I have created a set of instrument API on QEMU. one can write client
>>> programs that compile into shared library. the shared library is then
>>> loaded into qemu and extract statistics out of QEMU.
>> 
>> Instrument API? Same as what Liuis did?
>> 
>> Regards,
>> chenwj
>> 
>> [1] http://lists.gnu.org/archive/html/qemu-devel/2012-09/msg00379.html
>> 
>> --
>> Wei-Ren Chen (陳韋任)
>> Computer Systems Lab, Institute of Information Science,
>> Academia Sinica, Taiwan (R.O.C.)
>> Tel:886-2-2788-3799 #1667
>> Homepage: http://people.cs.nctu.edu.tw/~chenwj


-- 
 "And it's much the same thing with knowledge, for whenever you learn
 something new, the whole world becomes that much richer."
 -- The Princess of Pure Reason, as told by Norton Juster in The Phantom
 Tollbooth

Re: [Qemu-devel] [PATCH] configure: fix "--target-list=, , ..." option

2012-09-12 Thread Laurent Desnogues

Sorry, I had missed this patch...

On Tue, Sep 11, 2012 at 9:02 PM, Eduardo Habkost  wrote:
> commit 66d5499b3754b83c09487259c08fe2ce73188a59 broke the support for
> comma-separated target lists on the --target-list option. e.g.:
>
>   $ ./configure --target-list=x86_64-linux-user,x86_64-softmmu
>   [...]
>   ERROR: Target 'x86_64-linux-user,x86_64-softmmu' not recognised
>   $
>
> This patch restores that ability.
>
> Signed-off-by: Eduardo Habkost 
> Cc: Daniel P. Berrange 
> Cc: Anthony Liguori 
> ---
>  configure | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/configure b/configure
> index 7656c32..9ee7038 100755
> --- a/configure
> +++ b/configure
> @@ -1323,7 +1323,9 @@ if ! "$python" -c 'import sys; 
> sys.exit(sys.version_info < (2,4) or sys.version_
>  fi
>
>  if test "$target_list" = "DEFAULT" ; then
> -target_list=`echo "$default_target_list" | sed -e 's/,/ /g'`
> +target_list="$default_target_list"
> +else
> +target_list=`echo "$target_list" | sed -e 's/,/ /g'`
>  fi

This works for me too.

But I still can't get what the original patch posted by
Daniel Berrange intended to do:

$ ./configure --target-list=
$ make V=1
cat  | grep =y | sort -u > config-all-devices.mak

And it of course hangs there.

Creating an empty config-all-devices.mak before running
make solves the issue.


Laurent

>  # see if system emulation was really requested
> --
> 1.7.11.4
>
>

Re: [Qemu-devel] [PATCH 005/126] target-s390: Fix gdbstub

2012-09-12 Thread Alexander Graf


On 09/09/2012 11:04 PM, Richard Henderson wrote:

The real gdb protocol doesn't split out pc or cc as real registers.
Those are pseudos that are extracted as needed from the PSW.  Don't
modify env->cc_op during read -- that way lies heisenbugs.

Fill in the XXX for the fp registers.

Remove duplicated defines in cpu.h.

Signed-off-by: Richard Henderson 
---
  gdbstub.c  | 78 +-
  target-s390x/cpu.h | 73 --
  2 files changed, 48 insertions(+), 103 deletions(-)

diff --git a/gdbstub.c b/gdbstub.c
index 5d37dd9..6aed0b4 100644
--- a/gdbstub.c
+++ b/gdbstub.c
@@ -1499,27 +1499,35 @@ static int cpu_gdb_write_register(CPUAlphaState *env, 
uint8_t *mem_buf, int n)
  }
  #elif defined (TARGET_S390X)
  
-#define NUM_CORE_REGS S390_NUM_TOTAL_REGS

+#define NUM_CORE_REGS  S390_NUM_REGS
  
  static int cpu_gdb_read_register(CPUS390XState *env, uint8_t *mem_buf, int n)

  {
+uint64_t val;
+int cc_op;
+
  switch (n) {
-case S390_PSWM_REGNUM: GET_REGL(env->psw.mask); break;
-case S390_PSWA_REGNUM: GET_REGL(env->psw.addr); break;
-case S390_R0_REGNUM ... S390_R15_REGNUM:
-GET_REGL(env->regs[n-S390_R0_REGNUM]); break;
-case S390_A0_REGNUM ... S390_A15_REGNUM:
-GET_REG32(env->aregs[n-S390_A0_REGNUM]); break;
-case S390_FPC_REGNUM: GET_REG32(env->fpc); break;
-case S390_F0_REGNUM ... S390_F15_REGNUM:
-/* XXX */
-break;
-case S390_PC_REGNUM: GET_REGL(env->psw.addr); break;
-case S390_CC_REGNUM:
-env->cc_op = calc_cc(env, env->cc_op, env->cc_src, env->cc_dst,
- env->cc_vr);
-GET_REG32(env->cc_op);
-break;
+case S390_PSWM_REGNUM:
+val = env->psw.mask & ~(3ull << 13);
+cc_op = calc_cc(env, env->cc_op, env->cc_src, env->cc_dst, env->cc_vr);
+val |= cc_op << 13;
+GET_REGL(val);
+break;
+case S390_PSWA_REGNUM:
+GET_REGL(env->psw.addr);
+break;
+case S390_R0_REGNUM ... S390_R15_REGNUM:
+GET_REGL(env->regs[n-S390_R0_REGNUM]);
+break;
+case S390_A0_REGNUM ... S390_A15_REGNUM:
+GET_REG32(env->aregs[n-S390_A0_REGNUM]);
+break;
+case S390_FPC_REGNUM:
+GET_REG32(env->fpc);
+break;
+case S390_F0_REGNUM ... S390_F15_REGNUM:
+GET_REG64(env->fregs[n-S390_F0_REGNUM].ll);
+break;
  }
  
  return 0;

@@ -1534,20 +1542,30 @@ static int cpu_gdb_write_register(CPUS390XState *env, 
uint8_t *mem_buf, int n)
  tmp32 = ldl_p(mem_buf);
  
  switch (n) {

-case S390_PSWM_REGNUM: env->psw.mask = tmpl; break;
-case S390_PSWA_REGNUM: env->psw.addr = tmpl; break;
-case S390_R0_REGNUM ... S390_R15_REGNUM:
-env->regs[n-S390_R0_REGNUM] = tmpl; break;
-case S390_A0_REGNUM ... S390_A15_REGNUM:
-env->aregs[n-S390_A0_REGNUM] = tmp32; r=4; break;
-case S390_FPC_REGNUM: env->fpc = tmp32; r=4; break;
-case S390_F0_REGNUM ... S390_F15_REGNUM:
-/* XXX */
-break;
-case S390_PC_REGNUM: env->psw.addr = tmpl; break;
-case S390_CC_REGNUM: env->cc_op = tmp32; r=4; break;
+case S390_PSWM_REGNUM:
+env->psw.mask = tmpl;
+env->cc_op = (tmpl >> 13) & 3;


Are you sure this is correct? I thought gdbstub would just ignore the cc 
bits.



Alex


+break;
+case S390_PSWA_REGNUM:
+env->psw.addr = tmpl;
+break;
+case S390_R0_REGNUM ... S390_R15_REGNUM:
+env->regs[n-S390_R0_REGNUM] = tmpl;
+break;
+case S390_A0_REGNUM ... S390_A15_REGNUM:
+env->aregs[n-S390_A0_REGNUM] = tmp32;
+r = 4;
+break;
+case S390_FPC_REGNUM:
+env->fpc = tmp32;
+r = 4;
+break;
+case S390_F0_REGNUM ... S390_F15_REGNUM:
+env->fregs[n-S390_F0_REGNUM].ll = tmpl;
+break;
+default:
+return 0;
  }
-
  return r;
  }
  #elif defined (TARGET_LM32)
diff --git a/target-s390x/cpu.h b/target-s390x/cpu.h
index ed81af3..471fb91 100644
--- a/target-s390x/cpu.h
+++ b/target-s390x/cpu.h
@@ -430,79 +430,6 @@ static inline void cpu_set_tls(CPUS390XState *env, 
target_ulong newtls)
  /* Total.  */
  #define S390_NUM_REGS 51
  
-/* Pseudo registers -- PC and condition code.  */

-#define S390_PC_REGNUM S390_NUM_REGS
-#define S390_CC_REGNUM (S390_NUM_REGS+1)
-#define S390_NUM_PSEUDO_REGS 2
-#define S390_NUM_TOTAL_REGS (S390_NUM_REGS+2)
-
-
-
-/* Program Status Word.  */
-#define S390_PSWM_REGNUM 0
-#define S390_PSWA_REGNUM 1
-/* General Purpose Registers.  */
-#define S390_R0_REGNUM 2
-#define S390_R1_REGNUM 3
-#define S390_R2_REGNUM 4
-#define S390_R3_REGNUM 5
-#define S390_R4_REGNUM 6
-#define S390_R5_REGNUM 7
-#define S390_R6_REGNUM 8
-#define S390_R7_REGNUM 9
-#define S390_R8_REGNUM 10
-#define S390_R9_REGNUM 11

[Qemu-devel] [PATCH 2/9] ehci: Walk async schedule before and after migration

2012-09-12 Thread Hans de Goede

Signed-off-by: Hans de Goede 
---
 hw/usb/hcd-ehci.c | 28 
 1 file changed, 28 insertions(+)

diff --git a/hw/usb/hcd-ehci.c b/hw/usb/hcd-ehci.c
index c5f2635..e67cbc7 100644
--- a/hw/usb/hcd-ehci.c
+++ b/hw/usb/hcd-ehci.c
@@ -34,6 +34,7 @@
 #include "monitor.h"
 #include "trace.h"
 #include "dma.h"
+#include "sysemu.h"
 
 #define EHCI_DEBUG   0
 
@@ -2574,6 +2575,32 @@ static int usb_ehci_post_load(void *opaque, int 
version_id)
 return 0;
 }
 
+static void usb_ehci_vm_state_change(void *opaque, int running, RunState state)
+{
+EHCIState *ehci = opaque;
+
+/*
+ * We don't migrate the EHCIQueue-s, instead we rebuild them for the
+ * schedule in guest memory. We must do the rebuilt ASAP, so that
+ * USB-devices which have async handled packages have a packet in the
+ * ep queue to match the completion with.
+ */
+if (state == RUN_STATE_RUNNING) {
+ehci_advance_async_state(ehci);
+}
+
+/*
+ * The schedule rebuilt from guest memory could cause the migration dest
+ * to miss a QH unlink, and fail to cancel packets, since the unlinked QH
+ * will never have existed on the destination. Therefor we must flush the
+ * async schedule on savevm to catch any not yet noticed unlinks.
+ */
+if (state == RUN_STATE_SAVE_VM) {
+ehci_advance_async_state(ehci);
+ehci_queues_rip_unseen(ehci, 1);
+}
+}
+
 static const VMStateDescription vmstate_ehci = {
 .name= "ehci",
 .version_id  = 2,
@@ -2723,6 +2750,7 @@ static int usb_ehci_initfn(PCIDevice *dev)
 usb_packet_init(&s->ipacket);
 
 qemu_register_reset(ehci_reset, s);
+qemu_add_vm_change_state_handler(usb_ehci_vm_state_change, s);
 
 memory_region_init_io(&s->mem, &ehci_mem_ops, s, "ehci", MMIO_SIZE);
 pci_register_bar(&s->dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, &s->mem);
-- 
1.7.12

[Qemu-devel] [PATCH 3/9] usb-redir: Change cancelled packet code into a generic packet-id queue

2012-09-12 Thread Hans de Goede

Signed-off-by: Hans de Goede 
---
 hw/usb/redirect.c | 102 +-
 1 file changed, 71 insertions(+), 31 deletions(-)

diff --git a/hw/usb/redirect.c b/hw/usb/redirect.c
index 5301a69..603262a 100644
--- a/hw/usb/redirect.c
+++ b/hw/usb/redirect.c
@@ -43,7 +43,6 @@
 #define EP2I(ep_address) (((ep_address & 0x80) >> 3) | (ep_address & 0x0f))
 #define I2EP(i) (((i & 0x10) << 3) | (i & 0x0f))
 
-typedef struct Cancelled Cancelled;
 typedef struct USBRedirDevice USBRedirDevice;
 
 /* Struct to hold buffered packets (iso or int input packets) */
@@ -69,6 +68,18 @@ struct endp_data {
 int bufpq_target_size;
 };
 
+struct PacketIdQueueEntry {
+uint64_t id;
+QTAILQ_ENTRY(PacketIdQueueEntry)next;
+};
+
+struct PacketIdQueue {
+USBRedirDevice *dev;
+const char *name;
+QTAILQ_HEAD(, PacketIdQueueEntry) head;
+int size;
+};
+
 struct USBRedirDevice {
 USBDevice dev;
 /* Properties */
@@ -86,7 +97,7 @@ struct USBRedirDevice {
 int64_t next_attach_time;
 struct usbredirparser *parser;
 struct endp_data endpoint[MAX_ENDPOINTS];
-QTAILQ_HEAD(, Cancelled) cancelled;
+struct PacketIdQueue cancelled;
 /* Data for device filtering */
 struct usb_redir_device_connect_header device_info;
 struct usb_redir_interface_info_header interface_info;
@@ -94,11 +105,6 @@ struct USBRedirDevice {
 int filter_rules_count;
 };
 
-struct Cancelled {
-uint64_t id;
-QTAILQ_ENTRY(Cancelled)next;
-};
-
 static void usbredir_hello(void *priv, struct usb_redir_hello_header *h);
 static void usbredir_device_connect(void *priv,
 struct usb_redir_device_connect_header *device_connect);
@@ -239,37 +245,75 @@ static int usbredir_write(void *priv, uint8_t *data, int 
count)
  * Cancelled and buffered packets helpers
  */
 
-static void usbredir_cancel_packet(USBDevice *udev, USBPacket *p)
+static void packet_id_queue_init(struct PacketIdQueue *q,
+USBRedirDevice *dev, const char *name)
 {
-USBRedirDevice *dev = DO_UPCAST(USBRedirDevice, dev, udev);
-Cancelled *c;
+q->dev = dev;
+q->name = name;
+QTAILQ_INIT(&q->head);
+q->size = 0;
+}
+
+static void packet_id_queue_add(struct PacketIdQueue *q, uint64_t id)
+{
+USBRedirDevice *dev = q->dev;
+struct PacketIdQueueEntry *e;
+
+DPRINTF("adding packet id %"PRIu64" to %s queue\n", id, q->name);
+
+e = g_malloc0(sizeof(struct PacketIdQueueEntry));
+e->id = id;
+QTAILQ_INSERT_TAIL(&q->head, e, next);
+q->size++;
+}
+
+static int packet_id_queue_remove(struct PacketIdQueue *q, uint64_t id)
+{
+USBRedirDevice *dev = q->dev;
+struct PacketIdQueueEntry *e;
+
+QTAILQ_FOREACH(e, &q->head, next) {
+if (e->id == id) {
+DPRINTF("removing packet id %"PRIu64" from %s queue\n",
+id, q->name);
+QTAILQ_REMOVE(&q->head, e, next);
+q->size--;
+g_free(e);
+return 1;
+}
+}
+return 0;
+}
+
+static void packet_id_queue_empty(struct PacketIdQueue *q)
+{
+USBRedirDevice *dev = q->dev;
+struct PacketIdQueueEntry *e, *next_e;
 
-DPRINTF("cancel packet id %"PRIu64"\n", p->id);
+DPRINTF("removing %d packet-ids from %s queue\n", q->size, q->name);
 
-c = g_malloc0(sizeof(Cancelled));
-c->id = p->id;
-QTAILQ_INSERT_TAIL(&dev->cancelled, c, next);
+QTAILQ_FOREACH_SAFE(e, &q->head, next, next_e) {
+QTAILQ_REMOVE(&q->head, e, next);
+g_free(e);
+}
+q->size = 0;
+}
 
+static void usbredir_cancel_packet(USBDevice *udev, USBPacket *p)
+{
+USBRedirDevice *dev = DO_UPCAST(USBRedirDevice, dev, udev);
+
+packet_id_queue_add(&dev->cancelled, p->id);
 usbredirparser_send_cancel_data_packet(dev->parser, p->id);
 usbredirparser_do_write(dev->parser);
 }
 
 static int usbredir_is_cancelled(USBRedirDevice *dev, uint64_t id)
 {
-Cancelled *c;
-
 if (!dev->dev.attached) {
 return 1; /* Treat everything as cancelled after a disconnect */
 }
-
-QTAILQ_FOREACH(c, &dev->cancelled, next) {
-if (c->id == id) {
-QTAILQ_REMOVE(&dev->cancelled, c, next);
-g_free(c);
-return 1;
-}
-}
-return 0;
+return packet_id_queue_remove(&dev->cancelled, id);
 }
 
 static USBPacket *usbredir_find_packet_by_id(USBRedirDevice *dev,
@@ -914,7 +958,7 @@ static int usbredir_initfn(USBDevice *udev)
 dev->chardev_close_bh = qemu_bh_new(usbredir_chardev_close_bh, dev);
 dev->attach_timer = qemu_new_timer_ms(vm_clock, usbredir_do_attach, dev);
 
-QTAILQ_INIT(&dev->cancelled);
+packet_id_queue_init(&dev->cancelled, dev, "cancelled");
 for (i = 0; i < MAX_ENDPOINTS; i++) {
 QTAILQ_INIT(&dev->endpoint[i].bufpq);
 }
@@ -933,13 +977,9 @@ static int usbredir_initfn(USBDevice *udev)
 
 static void usbredir_cleanup_device_queues(USBRedirDevice *dev)
 {
-Cancelled *c, *next_c;
 int i;
 
-QTAILQ_FOREA

[Qemu-devel] [PATCH 4/9] usb-redir: Add an already_in_flight packet-id queue

2012-09-12 Thread Hans de Goede

After a live migration, the usb-hcd will re-queue all packets by
walking over the schedule in the guest memory again, but requests which
were encountered on the migration source before will already be in flight,
so these should *not* be re-send to the usbredir-host.

This patch adds an already in flight packet ud queue, which will be filled by
the source before migration and then moved over to the migration dest, any
async handled packets are then checked against this queue to avoid sending
the same packet to the usbredir-host twice.

Signed-off-by: Hans de Goede 
---
 hw/usb/redirect.c | 43 +++
 1 file changed, 43 insertions(+)

diff --git a/hw/usb/redirect.c b/hw/usb/redirect.c
index 603262a..f474da8 100644
--- a/hw/usb/redirect.c
+++ b/hw/usb/redirect.c
@@ -98,6 +98,7 @@ struct USBRedirDevice {
 struct usbredirparser *parser;
 struct endp_data endpoint[MAX_ENDPOINTS];
 struct PacketIdQueue cancelled;
+struct PacketIdQueue already_in_flight;
 /* Data for device filtering */
 struct usb_redir_device_connect_header device_info;
 struct usb_redir_interface_info_header interface_info;
@@ -316,6 +317,34 @@ static int usbredir_is_cancelled(USBRedirDevice *dev, 
uint64_t id)
 return packet_id_queue_remove(&dev->cancelled, id);
 }
 
+static void usbredir_fill_already_in_flight_from_ep(USBRedirDevice *dev,
+struct USBEndpoint *ep)
+{
+static USBPacket *p;
+
+QTAILQ_FOREACH(p, &ep->queue, queue) {
+packet_id_queue_add(&dev->already_in_flight, p->id);
+}
+}
+
+static void usbredir_fill_already_in_flight(USBRedirDevice *dev)
+{
+int ep;
+struct USBDevice *udev = &dev->dev;
+
+usbredir_fill_already_in_flight_from_ep(dev, &udev->ep_ctl);
+
+for (ep = 0; ep < USB_MAX_ENDPOINTS; ep++) {
+usbredir_fill_already_in_flight_from_ep(dev, &udev->ep_in[ep]);
+usbredir_fill_already_in_flight_from_ep(dev, &udev->ep_out[ep]);
+}
+}
+
+static int usbredir_already_in_flight(USBRedirDevice *dev, uint64_t id)
+{
+return packet_id_queue_remove(&dev->already_in_flight, id);
+}
+
 static USBPacket *usbredir_find_packet_by_id(USBRedirDevice *dev,
 uint8_t ep, uint64_t id)
 {
@@ -531,6 +560,10 @@ static int usbredir_handle_bulk_data(USBRedirDevice *dev, 
USBPacket *p,
 
 DPRINTF("bulk-out ep %02X len %zd id %"PRIu64"\n", ep, p->iov.size, p->id);
 
+if (usbredir_already_in_flight(dev, p->id)) {
+return USB_RET_ASYNC;
+}
+
 bulk_packet.endpoint  = ep;
 bulk_packet.length= p->iov.size;
 bulk_packet.stream_id = 0;
@@ -611,6 +644,10 @@ static int usbredir_handle_interrupt_data(USBRedirDevice 
*dev,
 DPRINTF("interrupt-out ep %02X len %zd id %"PRIu64"\n", ep,
 p->iov.size, p->id);
 
+if (usbredir_already_in_flight(dev, p->id)) {
+return USB_RET_ASYNC;
+}
+
 interrupt_packet.endpoint  = ep;
 interrupt_packet.length= p->iov.size;
 
@@ -753,6 +790,10 @@ static int usbredir_handle_control(USBDevice *udev, 
USBPacket *p,
 USBRedirDevice *dev = DO_UPCAST(USBRedirDevice, dev, udev);
 struct usb_redir_control_packet_header control_packet;
 
+if (usbredir_already_in_flight(dev, p->id)) {
+return USB_RET_ASYNC;
+}
+
 /* Special cases for certain standard device requests */
 switch (request) {
 case DeviceOutRequest | USB_REQ_SET_ADDRESS:
@@ -959,6 +1000,7 @@ static int usbredir_initfn(USBDevice *udev)
 dev->attach_timer = qemu_new_timer_ms(vm_clock, usbredir_do_attach, dev);
 
 packet_id_queue_init(&dev->cancelled, dev, "cancelled");
+packet_id_queue_init(&dev->already_in_flight, dev, "already-in-flight");
 for (i = 0; i < MAX_ENDPOINTS; i++) {
 QTAILQ_INIT(&dev->endpoint[i].bufpq);
 }
@@ -980,6 +1022,7 @@ static void usbredir_cleanup_device_queues(USBRedirDevice 
*dev)
 int i;
 
 packet_id_queue_empty(&dev->cancelled);
+packet_id_queue_empty(&dev->already_in_flight);
 for (i = 0; i < MAX_ENDPOINTS; i++) {
 usbredir_free_bufpq(dev, I2EP(i));
 }
-- 
1.7.12

[Qemu-devel] [PATCH 5/9] usb-redir: Store max_packet_size in endp_data

2012-09-12 Thread Hans de Goede

So that we've a place to migrate it to / from to allow restoring it after
migration.

Signed-off-by: Hans de Goede 
---
 hw/usb/redirect.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/usb/redirect.c b/hw/usb/redirect.c
index f474da8..3196665 100644
--- a/hw/usb/redirect.c
+++ b/hw/usb/redirect.c
@@ -57,6 +57,7 @@ struct endp_data {
 uint8_t type;
 uint8_t interval;
 uint8_t interface; /* bInterfaceNumber this ep belongs to */
+uint16_t max_packet_size; /* In bytes, not wMaxPacketSize format !! */
 uint8_t iso_started;
 uint8_t iso_error; /* For reporting iso errors to the HC */
 uint8_t interrupt_started;
@@ -1278,7 +1279,8 @@ static void usbredir_ep_info(void *priv,
 usb_ep->ifnum = dev->endpoint[i].interface;
 if (usbredirparser_peer_has_cap(dev->parser,
  usb_redir_cap_ep_info_max_packet_size)) {
-usb_ep->max_packet_size = ep_info->max_packet_size[i];
+dev->endpoint[i].max_packet_size =
+usb_ep->max_packet_size = ep_info->max_packet_size[i];
 }
 if (ep_info->type[i] == usb_redir_type_bulk) {
 usb_ep->pipeline = true;
-- 
1.7.12

[Qemu-devel] [PATCH 6/9] usb-redir: Add support for migration

2012-09-12 Thread Hans de Goede

Signed-off-by: Hans de Goede 
---
 hw/usb/redirect.c | 349 +-
 1 file changed, 346 insertions(+), 3 deletions(-)

diff --git a/hw/usb/redirect.c b/hw/usb/redirect.c
index 3196665..8d3cf3b 100644
--- a/hw/usb/redirect.c
+++ b/hw/usb/redirect.c
@@ -65,8 +65,8 @@ struct endp_data {
 uint8_t bufpq_prefilled;
 uint8_t bufpq_dropping_packets;
 QTAILQ_HEAD(, buf_packet) bufpq;
-int bufpq_size;
-int bufpq_target_size;
+int32_t bufpq_size;
+int32_t bufpq_target_size;
 };
 
 struct PacketIdQueueEntry {
@@ -240,6 +240,11 @@ static int usbredir_write(void *priv, uint8_t *data, int 
count)
 return 0;
 }
 
+/* Don't send new data to the chardev until our state is fully synced */
+if (!runstate_check(RUN_STATE_RUNNING)) {
+return 0;
+}
+
 return qemu_chr_fe_write(dev->cs, data, count);
 }
 
@@ -858,6 +863,7 @@ static void usbredir_chardev_open(USBRedirDevice *dev)
 {
 uint32_t caps[USB_REDIR_CAPS_SIZE] = { 0, };
 char version[32];
+int flags = 0;
 
 /* Make sure any pending closes are handled (no-op if none pending) */
 usbredir_chardev_close_bh(dev);
@@ -893,7 +899,12 @@ static void usbredir_chardev_open(USBRedirDevice *dev)
 usbredirparser_caps_set_cap(caps, usb_redir_cap_filter);
 usbredirparser_caps_set_cap(caps, usb_redir_cap_ep_info_max_packet_size);
 usbredirparser_caps_set_cap(caps, usb_redir_cap_64bits_ids);
-usbredirparser_init(dev->parser, version, caps, USB_REDIR_CAPS_SIZE, 0);
+
+if (runstate_check(RUN_STATE_INMIGRATE)) {
+flags |= usbredirparser_fl_no_hello;
+}
+usbredirparser_init(dev->parser, version, caps, USB_REDIR_CAPS_SIZE,
+flags);
 usbredirparser_do_write(dev->parser);
 }
 
@@ -939,6 +950,11 @@ static int usbredir_chardev_can_read(void *opaque)
 return 0;
 }
 
+/* Don't read new data from the chardev until our state is fully synced */
+if (!runstate_check(RUN_STATE_RUNNING)) {
+return 0;
+}
+
 /* usbredir_parser_do_read will consume *all* data we give it */
 return 1024 * 1024;
 }
@@ -976,6 +992,15 @@ static void usbredir_chardev_event(void *opaque, int event)
  * init + destroy
  */
 
+static void usbredir_vm_state_change(void *priv, int running, RunState state)
+{
+USBRedirDevice *dev = priv;
+
+if (state == RUN_STATE_RUNNING && dev->parser != NULL) {
+usbredirparser_do_write(dev->parser); /* Flush any pending writes */
+}
+}
+
 static int usbredir_initfn(USBDevice *udev)
 {
 USBRedirDevice *dev = DO_UPCAST(USBRedirDevice, dev, udev);
@@ -1014,6 +1039,7 @@ static int usbredir_initfn(USBDevice *udev)
 qemu_chr_add_handlers(dev->cs, usbredir_chardev_can_read,
   usbredir_chardev_read, usbredir_chardev_event, dev);
 
+qemu_add_vm_change_state_handler(usbredir_vm_state_change, dev);
 add_boot_device_path(dev->bootindex, &udev->qdev, NULL);
 return 0;
 }
@@ -1503,6 +1529,322 @@ static void usbredir_interrupt_packet(void *priv, 
uint64_t id,
 }
 }
 
+/*
+ * Migration code
+ */
+
+static void usbredir_pre_save(void *priv)
+{
+USBRedirDevice *dev = priv;
+
+usbredir_fill_already_in_flight(dev);
+}
+
+static int usbredir_post_load(void *priv, int version_id)
+{
+USBRedirDevice *dev = priv;
+struct USBEndpoint *usb_ep;
+int i;
+
+switch (dev->device_info.speed) {
+case usb_redir_speed_low:
+dev->dev.speed = USB_SPEED_LOW;
+break;
+case usb_redir_speed_full:
+dev->dev.speed = USB_SPEED_FULL;
+break;
+case usb_redir_speed_high:
+dev->dev.speed = USB_SPEED_HIGH;
+break;
+case usb_redir_speed_super:
+dev->dev.speed = USB_SPEED_SUPER;
+break;
+default:
+dev->dev.speed = USB_SPEED_FULL;
+}
+dev->dev.speedmask = (1 << dev->dev.speed);
+
+for (i = 0; i < MAX_ENDPOINTS; i++) {
+usb_ep = usb_ep_get(&dev->dev,
+(i & 0x10) ? USB_TOKEN_IN : USB_TOKEN_OUT,
+i & 0x0f);
+usb_ep->type = dev->endpoint[i].type;
+usb_ep->ifnum = dev->endpoint[i].interface;
+usb_ep->max_packet_size = dev->endpoint[i].max_packet_size;
+if (dev->endpoint[i].type == usb_redir_type_bulk) {
+usb_ep->pipeline = true;
+}
+}
+return 0;
+}
+
+/* For usbredirparser migration */
+static void usbredir_put_parser(QEMUFile *f, void *priv, size_t unused)
+{
+USBRedirDevice *dev = priv;
+uint8_t *data;
+int len;
+
+if (dev->parser == NULL) {
+qemu_put_be32(f, 0);
+return;
+}
+
+usbredirparser_serialize(dev->parser, &data, &len);
+qemu_oom_check(data);
+
+qemu_put_be32(f, len);
+qemu_put_buffer(f, data, len);
+
+free(data);
+}
+
+static int usbredir_get_parser(QEMUFile *f, void *priv, size_t unused)
+{
+USBRedirDevice *dev = priv;
+uint8_t *data;
+

[Qemu-devel] [PATCH 8/9] usb-redir: Revert usb-redir part of commit 93bfef4c

2012-09-12 Thread Hans de Goede

Commit 93bfef4c6e4b23caea9d51e1099d06433d8835a4 makes qemu-devices
which report the qemu version string to the guest in some way use a
qemu_get_version function which reports a machine-specific version string.

However usb-redir does not expose the qemu version to the guest, only to
the usbredir-host as part of the initial handshake. This can then be logged
on the usbredir-host side for debugging purposes and is otherwise completely
unused! For debugging purposes it is important to have the real qemu version
in there, rather then the machine-specific version.

Signed-off-by: Hans de Goede 
---
 hw/usb/redirect.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/hw/usb/redirect.c b/hw/usb/redirect.c
index e438fd1..f327b94 100644
--- a/hw/usb/redirect.c
+++ b/hw/usb/redirect.c
@@ -142,6 +142,8 @@ static void usbredir_interrupt_packet(void *priv, uint64_t 
id,
 static int usbredir_handle_status(USBRedirDevice *dev,
int status, int actual_len);
 
+#define VERSION "qemu usb-redir guest " QEMU_VERSION
+
 /*
  * Logging stuff
  */
@@ -863,7 +865,6 @@ static void usbredir_chardev_close_bh(void *opaque)
 static void usbredir_chardev_open(USBRedirDevice *dev)
 {
 uint32_t caps[USB_REDIR_CAPS_SIZE] = { 0, };
-char version[32];
 int flags = 0;
 
 /* Make sure any pending closes are handled (no-op if none pending) */
@@ -872,9 +873,6 @@ static void usbredir_chardev_open(USBRedirDevice *dev)
 
 DPRINTF("creating usbredirparser\n");
 
-strcpy(version, "qemu usb-redir guest ");
-pstrcat(version, sizeof(version), qemu_get_version());
-
 dev->parser = qemu_oom_check(usbredirparser_create());
 dev->parser->priv = dev;
 dev->parser->log_func = usbredir_log;
@@ -906,7 +904,7 @@ static void usbredir_chardev_open(USBRedirDevice *dev)
 if (runstate_check(RUN_STATE_INMIGRATE)) {
 flags |= usbredirparser_fl_no_hello;
 }
-usbredirparser_init(dev->parser, version, caps, USB_REDIR_CAPS_SIZE,
+usbredirparser_init(dev->parser, VERSION, caps, USB_REDIR_CAPS_SIZE,
 flags);
 usbredirparser_do_write(dev->parser);
 }
-- 
1.7.12

[Qemu-devel] Rethinking missed tick catchup

2012-09-12 Thread Anthony Liguori


Hi,

We've been running into a lot of problems lately with Windows guests and
I think they all ultimately could be addressed by revisiting the missed
tick catchup algorithms that we use.  Mike and I spent a while talking
about it yesterday and I wanted to take the discussion to the list to
get some additional input.

Here are the problems we're seeing:

1) Rapid reinjection can lead to time moving faster for short bursts of
   time.  We've seen a number of RTC watchdog BSoDs and it's possible
   that at least one cause is reinjection speed.

2) When hibernating a host system, the guest gets is essentially paused
   for a long period of time.  This results in a very large tick catchup
   while also resulting in a large skew in guest time.

   I've gotten reports of the tick catchup consuming a lot of CPU time
   from rapid delivery of interrupts (although I haven't reproduced this
   yet).

3) Windows appears to have a service that periodically syncs the guest
   time with the hardware clock.  I've been told the resync period is an
   hour.  For large clock skews, this can compete with reinjection
   resulting in a positive skew in time (the guest can be ahead of the
   host).

I've been thinking about an algorithm like this to address these
problems:

A) Limit the number of interrupts that we reinject to the equivalent of
   a small period of wallclock time.  Something like 60 seconds.

B) In the event of (A), trigger a notification in QEMU.  This is easy
   for the RTC but harder for the in-kernel PIT.  Maybe it's a good time to
   revisit usage of the in-kernel PIT?

C) On acculumated tick overflow, rely on using a qemu-ga command to
   force a resync of the guest's time to the hardware wallclock time.

D) Whenever the guest reads the wallclock time from the RTC, reset all
   accumulated ticks.

In order to do (C), we'll need to plumb qemu-ga through QMP.  Mike and I
discussed a low-impact way of doing this (having a separate dispatch
path for guest agent commands) and I'm confident we could do this for
1.3.

This would mean that management tools would need to consume qemu-ga
through QMP.  Not sure if this is a problem for anyone.

I'm not sure whether it's worth trying to support this with the
in-kernel PIT or not either.

Are there other issues with reinjection that people are aware of?  Does
anything seem obviously wrong with the above?

Regards,

Anthony Liguori

Re: [Qemu-devel] [PATCH] hw: Add support for new LSI Logic devices.

2012-09-12 Thread Anthony Liguori

Kevin Wolf  writes:

> Am 12.09.2012 01:50, schrieb Michael S. Tsirkin:
>> On Tue, Sep 11, 2012 at 01:00:13PM -0400, Don Slutz wrote:
>>> Add LSI53C1030, SAS1068, SAS1068e.  Based on code from "VirtualBox Open 
>>> Source Edition".
>>> Based on QEMU MegaRAID SAS 8708EM2.
>>>
>>> This is a common VMware disk controller.
>> 
>> I think you mean VMware emulates this controller too,
>> pls say it explicitly in the commit log.
>> 
>>> SEABIOS change for booting is in the works.
>>>
>>> Tested with Fedora 16, 17.  CentoOS 6. Windows 2003R2 and 2008R2.
>>>
>>> Signed-off-by: Don Slutz 
>> 
>> Minor comments below.
>> 
>> Coding style does not adhere to qemu standards,
>> I guess you know that :)
>> 
>> Otherwise, from pci side of things this looks OK.
>> I did not look at the scsi side of things.
>> 
>>> ---
>>>  default-configs/pci.mak |1 +
>>>  hw/Makefile.objs|1 +
>>>  hw/lsilogic.c   | 2743 ++
>>>  hw/lsilogic.h   | 3365 
>>> +++
>>>  hw/pci_ids.h|4 +
>>>  trace-events|   26 +
>>>  6 files changed, 6140 insertions(+), 0 deletions(-)
>>>  create mode 100644 hw/lsilogic.c
>>>  create mode 100644 hw/lsilogic.h
>>>
>>> diff --git a/default-configs/pci.mak b/default-configs/pci.mak
>>> index 69e18f1..ae4873d 100644
>>> --- a/default-configs/pci.mak
>>> +++ b/default-configs/pci.mak
>>> @@ -11,6 +11,7 @@ CONFIG_PCNET_PCI=y
>>>  CONFIG_PCNET_COMMON=y
>>>  CONFIG_LSI_SCSI_PCI=y
>>>  CONFIG_MEGASAS_SCSI_PCI=y
>>> +CONFIG_LSILOGIC_SCSI_PCI=y
>>>  CONFIG_RTL8139_PCI=y
>>>  CONFIG_E1000_PCI=y
>>>  CONFIG_IDE_CORE=y
>>> diff --git a/hw/Makefile.objs b/hw/Makefile.objs
>>> index 6dfebd2..e5f939c 100644
>>> --- a/hw/Makefile.objs
>>> +++ b/hw/Makefile.objs
>>> @@ -115,6 +115,7 @@ hw-obj-$(CONFIG_OPENCORES_ETH) += opencores_eth.o
>>>  # SCSI layer
>>>  hw-obj-$(CONFIG_LSI_SCSI_PCI) += lsi53c895a.o
>>>  hw-obj-$(CONFIG_MEGASAS_SCSI_PCI) += megasas.o
>>> +hw-obj-$(CONFIG_LSILOGIC_SCSI_PCI) += lsilogic.o
>>>  hw-obj-$(CONFIG_ESP) += esp.o
>>>  hw-obj-$(CONFIG_ESP_PCI) += esp-pci.o
>>>  
>>> diff --git a/hw/lsilogic.c b/hw/lsilogic.c
>>> new file mode 100644
>>> index 000..1c0a54f
>>> --- /dev/null
>>> +++ b/hw/lsilogic.c
>>> @@ -0,0 +1,2743 @@
>>> +/*
>>> + * QEMU LSILOGIC LSI53C1030 SCSI and SAS1068 Host Bus Adapter emulation
>>> + * Based on the QEMU Megaraid emulator and the VirtualBox LsiLogic
>>> + * LSI53c1030 SCSI controller
>>> + *
>>> + * Copyright (c) 2009-2012 Hannes Reinecke, SUSE Labs
>>> + *
>>> + * This library is free software; you can redistribute it and/or
>>> + * modify it under the terms of the GNU Lesser General Public
>>> + * License as published by the Free Software Foundation; either
>>> + * version 2 of the License, or (at your option) any later version.
>>> + *
>>> + * This library is distributed in the hope that it will be useful,
>>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>>> + * Lesser General Public License for more details.
>>> + *
>>> + * You should have received a copy of the GNU Lesser General Public
>>> + * License along with this library; if not, see 
>>> .
>>> + */
>>> +
>>> +/* Id: DevLsiLogicSCSI.cpp 40642 2012-03-26 13:14:08Z vboxsync $ */
>>> +/** @file
>>> + * VBox storage devices: LsiLogic LSI53c1030 SCSI controller.
>>> + */
>>> +
>>> +/*
>>> + * Copyright (C) 2006-2009 Oracle Corporation
>>> + *
>>> + * This file is part of VirtualBox Open Source Edition (OSE), as
>>> + * available from http://www.virtualbox.org. This file is free software;
>>> + * you can redistribute it and/or modify it under the terms of the GNU
>>> + * General Public License (GPL) as published by the Free Software
>>> + * Foundation, in version 2 as it comes in the "COPYING" file of the
>>> + * VirtualBox OSE distribution. VirtualBox OSE is distributed in the
>>> + * hope that it will be useful, but WITHOUT ANY WARRANTY of any kind.
>>> + */
>>> +
>>> +
>> 
>> I suspect you need to rewrite above: probably add
>> all copyrights in 1st header and make it v2 only.
>
> Do we even accept new GPLv2-only code?

Yes.

I've got some concern about the maintainability of this though.  This is
code copied from another project and then heavily modified.

Are we prepared to independently fork this device?  How are we going to test it
regularly?

We seem to be growing SCSI controllers like weeds.  Why would someone
use this verses megasas vs. LSI vs virtio-scsi?

Regards,

Anthony Liguori

>
> Kevin

Re: [Qemu-devel] [PATCH v2 0/3] nonblocking connect address handling cleanup

2012-09-12 Thread Amos Kong


On 12/09/12 19:12, Orit Wasserman wrote:

getaddrinfo can give us a list of addresses, but we only try to
connect to the first one. If that fails we never proceed to
the next one.  This is common on desktop setups that often have ipv6
configured but not actually working.
A simple way to reproduce the problem is migration:
for the destination use -incoming tcp:0:, run migrate -d tcp:localhost:
migration will fail on hosts that have both IPv4 and IPV6 address for localhost.

To fix this, refactor address resolution code and make inet_nonblocking_connect
retry connection with a different address.



Looks good to me.

Reviewed-by: Amos Kong 
Tested-by: Amos Kong 



Orit Wasserman (3):
   Refactor inet_connect_opts function
   Separate inet_connect into inet_connect (blocking) and
 inet_nonblocking_connect
   Fix address handling in inet_nonblocking_connect

  migration-tcp.c |   29 ++-
  nbd.c   |2 +-
  qemu-sockets.c  |  254 +--
  qemu_socket.h   |9 ++-
  ui/vnc.c|2 +-
  5 files changed, 208 insertions(+), 88 deletions(-)



--
Amos.

Re: [Qemu-devel] Rethinking missed tick catchup

2012-09-12 Thread Jan Kiszka

On 2012-09-12 15:54, Anthony Liguori wrote:
> 
> Hi,
> 
> We've been running into a lot of problems lately with Windows guests and
> I think they all ultimately could be addressed by revisiting the missed
> tick catchup algorithms that we use.  Mike and I spent a while talking
> about it yesterday and I wanted to take the discussion to the list to
> get some additional input.
> 
> Here are the problems we're seeing:
> 
> 1) Rapid reinjection can lead to time moving faster for short bursts of
>time.  We've seen a number of RTC watchdog BSoDs and it's possible
>that at least one cause is reinjection speed.
> 
> 2) When hibernating a host system, the guest gets is essentially paused
>for a long period of time.  This results in a very large tick catchup
>while also resulting in a large skew in guest time.
> 
>I've gotten reports of the tick catchup consuming a lot of CPU time
>from rapid delivery of interrupts (although I haven't reproduced this
>yet).
> 
> 3) Windows appears to have a service that periodically syncs the guest
>time with the hardware clock.  I've been told the resync period is an
>hour.  For large clock skews, this can compete with reinjection
>resulting in a positive skew in time (the guest can be ahead of the
>host).
> 
> I've been thinking about an algorithm like this to address these
> problems:
> 
> A) Limit the number of interrupts that we reinject to the equivalent of
>a small period of wallclock time.  Something like 60 seconds.
> 
> B) In the event of (A), trigger a notification in QEMU.  This is easy
>for the RTC but harder for the in-kernel PIT.  Maybe it's a good time to
>revisit usage of the in-kernel PIT?
> 
> C) On acculumated tick overflow, rely on using a qemu-ga command to
>force a resync of the guest's time to the hardware wallclock time.
> 
> D) Whenever the guest reads the wallclock time from the RTC, reset all
>accumulated ticks.
> 
> In order to do (C), we'll need to plumb qemu-ga through QMP.  Mike and I
> discussed a low-impact way of doing this (having a separate dispatch
> path for guest agent commands) and I'm confident we could do this for
> 1.3.
> 
> This would mean that management tools would need to consume qemu-ga
> through QMP.  Not sure if this is a problem for anyone.
> 
> I'm not sure whether it's worth trying to support this with the
> in-kernel PIT or not either.

As with our current discussion around fixing the PIC and its impact on
the PIT, we should try on the userspace model first and then check if
the design can be adapted to support in-kernel as well.

For which guests is the PIT important again? Old Linux kernels? Windows
should be mostly happy with the RTC - or the HPET.

> 
> Are there other issues with reinjection that people are aware of?  Does
> anything seem obviously wrong with the above?

We should take the chance and design everything in a way that the HPET
can finally be (left) enabled.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux

Re: [Qemu-devel] Rethinking missed tick catchup

2012-09-12 Thread Anthony Liguori

Jan Kiszka  writes:

> On 2012-09-12 15:54, Anthony Liguori wrote:
>> 
>> Hi,
>> 
>> We've been running into a lot of problems lately with Windows guests and
>> I think they all ultimately could be addressed by revisiting the missed
>> tick catchup algorithms that we use.  Mike and I spent a while talking
>> about it yesterday and I wanted to take the discussion to the list to
>> get some additional input.
>> 
>> Here are the problems we're seeing:
>> 
>> 1) Rapid reinjection can lead to time moving faster for short bursts of
>>time.  We've seen a number of RTC watchdog BSoDs and it's possible
>>that at least one cause is reinjection speed.
>> 
>> 2) When hibernating a host system, the guest gets is essentially paused
>>for a long period of time.  This results in a very large tick catchup
>>while also resulting in a large skew in guest time.
>> 
>>I've gotten reports of the tick catchup consuming a lot of CPU time
>>from rapid delivery of interrupts (although I haven't reproduced this
>>yet).
>> 
>> 3) Windows appears to have a service that periodically syncs the guest
>>time with the hardware clock.  I've been told the resync period is an
>>hour.  For large clock skews, this can compete with reinjection
>>resulting in a positive skew in time (the guest can be ahead of the
>>host).
>> 
>> I've been thinking about an algorithm like this to address these
>> problems:
>> 
>> A) Limit the number of interrupts that we reinject to the equivalent of
>>a small period of wallclock time.  Something like 60 seconds.
>> 
>> B) In the event of (A), trigger a notification in QEMU.  This is easy
>>for the RTC but harder for the in-kernel PIT.  Maybe it's a good time to
>>revisit usage of the in-kernel PIT?
>> 
>> C) On acculumated tick overflow, rely on using a qemu-ga command to
>>force a resync of the guest's time to the hardware wallclock time.
>> 
>> D) Whenever the guest reads the wallclock time from the RTC, reset all
>>accumulated ticks.
>> 
>> In order to do (C), we'll need to plumb qemu-ga through QMP.  Mike and I
>> discussed a low-impact way of doing this (having a separate dispatch
>> path for guest agent commands) and I'm confident we could do this for
>> 1.3.
>> 
>> This would mean that management tools would need to consume qemu-ga
>> through QMP.  Not sure if this is a problem for anyone.
>> 
>> I'm not sure whether it's worth trying to support this with the
>> in-kernel PIT or not either.
>
> As with our current discussion around fixing the PIC and its impact on
> the PIT, we should try on the userspace model first and then check if
> the design can be adapted to support in-kernel as well.
>
> For which guests is the PIT important again? Old Linux kernels? Windows
> should be mostly happy with the RTC - or the HPET.

I thought that only 64-bit Win2k8+ used the RTC.

I thought win2k3 and even 32-bit win2k8 still used the PIT.

>> Are there other issues with reinjection that people are aware of?  Does
>> anything seem obviously wrong with the above?
>
> We should take the chance and design everything in a way that the HPET
> can finally be (left) enabled.

I thought the issue with the HPET was access frequency and the cost of
heavy weight exits.

I don't have concrete data here.  I've only heard it second hand.  Can
anyone comment more?

Regards,

Anthony Liguori

>
> Jan
>
> -- 
> Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
> Corporate Competence Center Embedded Linux

Re: [Qemu-devel] Rethinking missed tick catchup

2012-09-12 Thread Jan Kiszka

On 2012-09-12 16:44, Anthony Liguori wrote:
> Jan Kiszka  writes:
> 
>> On 2012-09-12 15:54, Anthony Liguori wrote:
>>>
>>> Hi,
>>>
>>> We've been running into a lot of problems lately with Windows guests and
>>> I think they all ultimately could be addressed by revisiting the missed
>>> tick catchup algorithms that we use.  Mike and I spent a while talking
>>> about it yesterday and I wanted to take the discussion to the list to
>>> get some additional input.
>>>
>>> Here are the problems we're seeing:
>>>
>>> 1) Rapid reinjection can lead to time moving faster for short bursts of
>>>time.  We've seen a number of RTC watchdog BSoDs and it's possible
>>>that at least one cause is reinjection speed.
>>>
>>> 2) When hibernating a host system, the guest gets is essentially paused
>>>for a long period of time.  This results in a very large tick catchup
>>>while also resulting in a large skew in guest time.
>>>
>>>I've gotten reports of the tick catchup consuming a lot of CPU time
>>>from rapid delivery of interrupts (although I haven't reproduced this
>>>yet).
>>>
>>> 3) Windows appears to have a service that periodically syncs the guest
>>>time with the hardware clock.  I've been told the resync period is an
>>>hour.  For large clock skews, this can compete with reinjection
>>>resulting in a positive skew in time (the guest can be ahead of the
>>>host).
>>>
>>> I've been thinking about an algorithm like this to address these
>>> problems:
>>>
>>> A) Limit the number of interrupts that we reinject to the equivalent of
>>>a small period of wallclock time.  Something like 60 seconds.
>>>
>>> B) In the event of (A), trigger a notification in QEMU.  This is easy
>>>for the RTC but harder for the in-kernel PIT.  Maybe it's a good time to
>>>revisit usage of the in-kernel PIT?
>>>
>>> C) On acculumated tick overflow, rely on using a qemu-ga command to
>>>force a resync of the guest's time to the hardware wallclock time.
>>>
>>> D) Whenever the guest reads the wallclock time from the RTC, reset all
>>>accumulated ticks.
>>>
>>> In order to do (C), we'll need to plumb qemu-ga through QMP.  Mike and I
>>> discussed a low-impact way of doing this (having a separate dispatch
>>> path for guest agent commands) and I'm confident we could do this for
>>> 1.3.
>>>
>>> This would mean that management tools would need to consume qemu-ga
>>> through QMP.  Not sure if this is a problem for anyone.
>>>
>>> I'm not sure whether it's worth trying to support this with the
>>> in-kernel PIT or not either.
>>
>> As with our current discussion around fixing the PIC and its impact on
>> the PIT, we should try on the userspace model first and then check if
>> the design can be adapted to support in-kernel as well.
>>
>> For which guests is the PIT important again? Old Linux kernels? Windows
>> should be mostly happy with the RTC - or the HPET.
> 
> I thought that only 64-bit Win2k8+ used the RTC.
> 
> I thought win2k3 and even 32-bit win2k8 still used the PIT.

Hmm, might be true.

> 
>>> Are there other issues with reinjection that people are aware of?  Does
>>> anything seem obviously wrong with the above?
>>
>> We should take the chance and design everything in a way that the HPET
>> can finally be (left) enabled.
> 
> I thought the issue with the HPET was access frequency and the cost of
> heavy weight exits.

Well, with common Win7-64 you can choose between RTC and HPET. There
former works well, the latter breaks timing. Both require userspace
exists. But HPET is enabled by default, so needs manual tuning to get
things right.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux

[Qemu-devel] Enablig DLPAR capacity on QEMU pSeries

2012-09-12 Thread Erlon Cruz

Hi all,

We are planning to implement DLPAR capacity on QEMU pSeries. As we
lack of experience in the internals of the arch we would like you guys
to give us some design directions
and confirm if we going in the right direction. Our first idea is:

1 - to patch 'spapr.c' so it can dynamically insert/remove basic
items into the device tree.
2 - create a host side device that will be used with a guest side
driver to perform guest side operations and communicate changes from
host to the guest (like DynamicRM does in PowerVM LPARs). We are not
planning to use powerpc-tools and want to make resource management
transparent (i.e. no need to run daemons or userspace programs in the
guest, only this kernel driver).
3 - create bindings to support adding/removal  ibmvscsi devices
4 - create bindings to support adding/removal  ibmveth devices
5 - create bindings to support adding/removal PCI devices
6 - create bindings to support adding/removal of memory
- Do we need to do this the way PowerVM does? We have tested
virtio ballooning and it can works with a few endiannes corrections.
7 - create bindings to support adding/removal  CPUs
- is SMP supported already? I tried to run SMP in a x86 host
and the guest stuck when SMP is enabled
- would be possible to work on this without a P7 baremetal
machine? We have a P7 8205-E6B, is that possible to kick PHYP out?

Any ideia on how much effort (time/people) the hole thing would take?
Any consideration about this is much appreciated :)

Kind regards,
Erlon

Re: [Qemu-devel] [PATCH 4/5] QAPI: Introduce memchar-read QMP command

2012-09-12 Thread Eric Blake

On 09/12/2012 05:57 AM, Lei Li wrote:
> Signed-off-by: Lei Li 
> ---
>  hmp-commands.hx  |   25 +
>  hmp.c|   18 ++
>  hmp.h|1 +
>  qapi-schema.json |   27 +++
>  qemu-char.c  |   48 
>  qmp-commands.hx  |   37 +
>  6 files changed, 156 insertions(+), 0 deletions(-)
> 

> +++ b/qapi-schema.json
> @@ -305,6 +305,33 @@
> '*control': 'CongestionControl'} }
>  
>  ##
> +# @memchar-read:
> +#
> +# Provide read interface for CirMemCharDriver. Read from cirmemchar
> +# char device and return the data.
> +#
> +# @chardev: the name of the cirmemchar char device.
> +#
> +# @size: the size to read in bytes.
> +#
> +# @format: #optional the format of the data want to read from
> +#  CirMemCharDriver, by default is 'utf8'.
> +#
> +# @control: #optional options for read and write command that specifies
> +#   behavior when the queue is full/empty.
> +#
> +# Returns: The data read from cirmemchar as string.
> +#  If @chardev is not a valid memchr device, DeviceNotFound
> +#  If an I/O error occurs while reading, IOError
> +#
> +# Since: 1.3
> +##
> +{ 'command': 'memchar-read',
> +  'data': {'chardev': 'str', 'size': 'int', '*format': 'DataFormat',
> +   '*control': 'CongestionControl'},
> +   'returns': 'str' }

What happens if the data to be read contains embedded NUL, but the
requested 'format' can't express that?  What happens if there is less
data available than the maximum requested size?  I'm wondering if the
return should be a JSON struct, { 'data':'str', 'size':'int' }, in order
to allow for the case of short read returns.

> +- "chardev": the name of the char device, must be unique (json-string)
> +- "size": the memory size in bytes, init size of the cirmemchar
> +  by default (json-int)
> +- "format": the data format write to CirMemCharDriver, default is
> +utf8. (json-string, optional)
> +  - Possible values: "utf8", "base64"

Also, you probably want to make it crystal-clear whether size is
referring to the unencoded size of the raw data, or the encoded size
after conversion to utf8 or base64.

-- 
Eric Blake   ebl...@redhat.com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] Rethinking missed tick catchup

2012-09-12 Thread Gleb Natapov

On Wed, Sep 12, 2012 at 09:44:10AM -0500, Anthony Liguori wrote:
> Jan Kiszka  writes:
> 
> > On 2012-09-12 15:54, Anthony Liguori wrote:
> >> 
> >> Hi,
> >> 
> >> We've been running into a lot of problems lately with Windows guests and
> >> I think they all ultimately could be addressed by revisiting the missed
> >> tick catchup algorithms that we use.  Mike and I spent a while talking
> >> about it yesterday and I wanted to take the discussion to the list to
> >> get some additional input.
> >> 
> >> Here are the problems we're seeing:
> >> 
> >> 1) Rapid reinjection can lead to time moving faster for short bursts of
> >>time.  We've seen a number of RTC watchdog BSoDs and it's possible
> >>that at least one cause is reinjection speed.
> >> 
> >> 2) When hibernating a host system, the guest gets is essentially paused
> >>for a long period of time.  This results in a very large tick catchup
> >>while also resulting in a large skew in guest time.
> >> 
> >>I've gotten reports of the tick catchup consuming a lot of CPU time
> >>from rapid delivery of interrupts (although I haven't reproduced this
> >>yet).
> >> 
> >> 3) Windows appears to have a service that periodically syncs the guest
> >>time with the hardware clock.  I've been told the resync period is an
> >>hour.  For large clock skews, this can compete with reinjection
> >>resulting in a positive skew in time (the guest can be ahead of the
> >>host).
> >> 
> >> I've been thinking about an algorithm like this to address these
> >> problems:
> >> 
> >> A) Limit the number of interrupts that we reinject to the equivalent of
> >>a small period of wallclock time.  Something like 60 seconds.
> >> 
> >> B) In the event of (A), trigger a notification in QEMU.  This is easy
> >>for the RTC but harder for the in-kernel PIT.  Maybe it's a good time to
> >>revisit usage of the in-kernel PIT?
> >> 
> >> C) On acculumated tick overflow, rely on using a qemu-ga command to
> >>force a resync of the guest's time to the hardware wallclock time.
> >> 
> >> D) Whenever the guest reads the wallclock time from the RTC, reset all
> >>accumulated ticks.
> >> 
> >> In order to do (C), we'll need to plumb qemu-ga through QMP.  Mike and I
> >> discussed a low-impact way of doing this (having a separate dispatch
> >> path for guest agent commands) and I'm confident we could do this for
> >> 1.3.
> >> 
> >> This would mean that management tools would need to consume qemu-ga
> >> through QMP.  Not sure if this is a problem for anyone.
> >> 
> >> I'm not sure whether it's worth trying to support this with the
> >> in-kernel PIT or not either.
> >
> > As with our current discussion around fixing the PIC and its impact on
> > the PIT, we should try on the userspace model first and then check if
> > the design can be adapted to support in-kernel as well.
> >
> > For which guests is the PIT important again? Old Linux kernels? Windows
> > should be mostly happy with the RTC - or the HPET.
> 
> I thought that only 64-bit Win2k8+ used the RTC.
> 
> I thought win2k3 and even 32-bit win2k8 still used the PIT.
> 
Only WindowsXP non-acpi hal uses PIT. Any other windows uses RTC. In
other words we do not care about PIT.

> >> Are there other issues with reinjection that people are aware of?  Does
> >> anything seem obviously wrong with the above?
> >
> > We should take the chance and design everything in a way that the HPET
> > can finally be (left) enabled.
> 
> I thought the issue with the HPET was access frequency and the cost of
> heavy weight exits.
> 
> I don't have concrete data here.  I've only heard it second hand.  Can
> anyone comment more?
> 
There is no any reason whatsoever to emulate HPET for Windows. It will
make it slower. Hyper-V does not emulate it. For proper time support in
Windows we need to implement relevant part of Hyper-V spec.

--
Gleb.

Re: [Qemu-devel] [PATCH 005/126] target-s390: Fix gdbstub

2012-09-12 Thread Richard Henderson

On 09/12/2012 06:25 AM, Alexander Graf wrote:
>> +case S390_PSWM_REGNUM:
>> +env->psw.mask = tmpl;
>> +env->cc_op = (tmpl >> 13) & 3;
> 
> Are you sure this is correct? I thought gdbstub would just ignore the cc bits.

Well... no it won't ignore the cc bits.  But it would appear that I've got
them at the wrong location.  From gdb/s390-tdep.c:

  if (regnum == tdep->cc_regnum)
{
  enum register_status status;

  status = regcache_raw_read_unsigned (regcache, S390_PSWM_REGNUM, &val);
  if (status == REG_VALID)
{
  if (register_size (gdbarch, S390_PSWA_REGNUM) == 4)
val = (val >> 12) & 3;
  else
val = (val >> 44) & 3;
  store_unsigned_integer (buf, regsize, byte_order, val);
}
  return status;
}


r~

Re: [Qemu-devel] Rethinking missed tick catchup

2012-09-12 Thread Gleb Natapov

On Wed, Sep 12, 2012 at 08:54:26AM -0500, Anthony Liguori wrote:
> 
> Hi,
> 
> We've been running into a lot of problems lately with Windows guests and
> I think they all ultimately could be addressed by revisiting the missed
> tick catchup algorithms that we use.  Mike and I spent a while talking
> about it yesterday and I wanted to take the discussion to the list to
> get some additional input.
> 
> Here are the problems we're seeing:
> 
> 1) Rapid reinjection can lead to time moving faster for short bursts of
>time.  We've seen a number of RTC watchdog BSoDs and it's possible
>that at least one cause is reinjection speed.
> 
> 2) When hibernating a host system, the guest gets is essentially paused
>for a long period of time.  This results in a very large tick catchup
>while also resulting in a large skew in guest time.
> 
>I've gotten reports of the tick catchup consuming a lot of CPU time
>from rapid delivery of interrupts (although I haven't reproduced this
>yet).
> 
> 3) Windows appears to have a service that periodically syncs the guest
>time with the hardware clock.  I've been told the resync period is an
>hour.  For large clock skews, this can compete with reinjection
>resulting in a positive skew in time (the guest can be ahead of the
>host).
> 
> I've been thinking about an algorithm like this to address these
> problems:
> 
> A) Limit the number of interrupts that we reinject to the equivalent of
>a small period of wallclock time.  Something like 60 seconds.
> 
How this will fix BSOD problem for instance? 60 seconds is long enough
to cause all the problem you are talking about above. We can make
amount of accumulated ticks easily configurable though to play with and
see.

> B) In the event of (A), trigger a notification in QEMU.  This is easy
>for the RTC but harder for the in-kernel PIT.  Maybe it's a good time to
>revisit usage of the in-kernel PIT?
> 
PIT does not matter for Windows guests.

> C) On acculumated tick overflow, rely on using a qemu-ga command to
>force a resync of the guest's time to the hardware wallclock time.
> 
Needs guest cooperation.

> D) Whenever the guest reads the wallclock time from the RTC, reset all
>accumulated ticks.
>
> In order to do (C), we'll need to plumb qemu-ga through QMP.  Mike and I
> discussed a low-impact way of doing this (having a separate dispatch
> path for guest agent commands) and I'm confident we could do this for
> 1.3.
> 
> This would mean that management tools would need to consume qemu-ga
> through QMP.  Not sure if this is a problem for anyone.
> 
> I'm not sure whether it's worth trying to support this with the
> in-kernel PIT or not either.
> 
> Are there other issues with reinjection that people are aware of?  Does
> anything seem obviously wrong with the above?
> 
It looks like you are trying to solve only pathologically big timedrift
problems. Those do not happen normally.

--
Gleb.

Re: [Qemu-devel] [PATCH] hw: Add support for new LSI Logic devices.

2012-09-12 Thread Avi Kivity

On 09/11/2012 08:00 PM, Don Slutz wrote:
> Add LSI53C1030, SAS1068, SAS1068e.  Based on code from "VirtualBox Open 
> Source Edition".
> Based on QEMU MegaRAID SAS 8708EM2.
> 
> This is a common VMware disk controller.
> 
> SEABIOS change for booting is in the works.
> 
> Tested with Fedora 16, 17.  CentoOS 6. Windows 2003R2 and 2008R2.
> 

Is the spec for these devices freely available?  Please provide a link
in the source.


-- 
error compiling committee.c: too many arguments to function

Re: [Qemu-devel] Rethinking missed tick catchup

2012-09-12 Thread Jan Kiszka

On 2012-09-12 17:06, Gleb Natapov wrote:
 Are there other issues with reinjection that people are aware of?  Does
 anything seem obviously wrong with the above?
>>>
>>> We should take the chance and design everything in a way that the HPET
>>> can finally be (left) enabled.
>>
>> I thought the issue with the HPET was access frequency and the cost of
>> heavy weight exits.
>>
>> I don't have concrete data here.  I've only heard it second hand.  Can
>> anyone comment more?
>>
> There is no any reason whatsoever to emulate HPET for Windows. It will
> make it slower. Hyper-V does not emulate it. For proper time support in
> Windows we need to implement relevant part of Hyper-V spec.

There are two reasons to do it nevertheless:

 - QEMU is not Hyper-V. We are emulating the HPET already, and we
   expose it by default. So we should do it properly.

 - The time drift fix for the RTC is still a hack. Adding a second user
   would force us to finally clean it up.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux

Re: [Qemu-devel] Rethinking missed tick catchup

2012-09-12 Thread Gleb Natapov

On Wed, Sep 12, 2012 at 05:42:58PM +0200, Jan Kiszka wrote:
> On 2012-09-12 17:06, Gleb Natapov wrote:
>  Are there other issues with reinjection that people are aware of?  Does
>  anything seem obviously wrong with the above?
> >>>
> >>> We should take the chance and design everything in a way that the HPET
> >>> can finally be (left) enabled.
> >>
> >> I thought the issue with the HPET was access frequency and the cost of
> >> heavy weight exits.
> >>
> >> I don't have concrete data here.  I've only heard it second hand.  Can
> >> anyone comment more?
> >>
> > There is no any reason whatsoever to emulate HPET for Windows. It will
> > make it slower. Hyper-V does not emulate it. For proper time support in
> > Windows we need to implement relevant part of Hyper-V spec.
> 
> There are two reasons to do it nevertheless:
> 
>  - QEMU is not Hyper-V. We are emulating the HPET already, and we
>expose it by default. So we should do it properly.
> 
>  - The time drift fix for the RTC is still a hack. Adding a second user
>would force us to finally clean it up.
> 
I am not saying we should not emulate HPET in QEMU,  I am saying there
is not reason to emulate it for Windows :)

--
Gleb.

Re: [Qemu-devel] Enablig DLPAR capacity on QEMU pSeries

2012-09-12 Thread Alexander Graf


On 09/12/2012 04:54 PM, Erlon Cruz wrote:

Hi all,

We are planning to implement DLPAR capacity on QEMU pSeries. As we


What is DLPAR? Hotplug support?


lack of experience in the internals of the arch we would like you guys
to give us some design directions
and confirm if we going in the right direction. Our first idea is:

 1 - to patch 'spapr.c' so it can dynamically insert/remove basic
items into the device tree.


What exactly would you like to patch into it? We already do have support 
for dynamic dt creation with the spapr target.



 2 - create a host side device that will be used with a guest side
driver to perform guest side operations and communicate changes from
host to the guest (like DynamicRM does in PowerVM LPARs). We are not


Why not just use hypercalls?


planning to use powerpc-tools and want to make resource management
transparent (i.e. no need to run daemons or userspace programs in the
guest, only this kernel driver).
 3 - create bindings to support adding/removal  ibmvscsi devices
 4 - create bindings to support adding/removal  ibmveth devices
 5 - create bindings to support adding/removal PCI devices
 6 - create bindings to support adding/removal of memory


This is going to be the hardest part. I don't think QEMU supports memory 
hotplug yet.



 - Do we need to do this the way PowerVM does? We have tested
virtio ballooning and it can works with a few endiannes corrections.


I don't know how PowerVM works. But if normal ballooning is all you 
need, you should certainly just enable virtio-balloon.



 7 - create bindings to support adding/removal  CPUs
 - is SMP supported already? I tried to run SMP in a x86 host
and the guest stuck when SMP is enabled


SMP should work just fine, yes. Where exactly does it get stuck?


 - would be possible to work on this without a P7 baremetal
machine?


At least for device hotplug, it should be perfectly possible to use an 
old G5 with PR KVM. I haven't gotten around to patch all the pieces of 
the puzzle to make -M pseries work with PR KVM when it's running on top 
of pHyp yet, so that won't work.



We have a P7 8205-E6B, is that possible to kick PHYP out?


Ben?


Any ideia on how much effort (time/people) the hole thing would take?
Any consideration about this is much appreciated :)


Phew. It's hard to tell. Depends heavily on how good your people are :).


Alex

Re: [Qemu-devel] [RFC v3 ATCH 0/5] char: expose CirMemCharDriver and provide QMP interface

2012-09-12 Thread Avi Kivity

On 09/12/2012 02:57 PM, Lei Li wrote:
> This RFC series attempts to convert the MemCharDriver to use a circular
> buffer for input and output, expose it to users by introducing QMP commands
> memchar_write and memchar_read and via the command line like the other
> CharDriverStates.
> 
> Serial ports in qemu always use CharDriverStates as there backends,
> Right now, all of our backends always try to write the data from the
> guest to a socket or file. The concern from OpenStack is that this could
> lead to unbounded disk space usage since they log the serial output.
> For more detail of the background info:
> https://bugs.launchpad.net/nova/+bug/832507
> 
> So we want to use a circular buffer in QEMU instead, and then OpenStack
> can periodically read the buffer in QEMU and log it.

Can't they do it themselves?  Have qemu write to a pipe, and on the
other side, do whatever rate limiting is needed.

-- 
error compiling committee.c: too many arguments to function

Re: [Qemu-devel] [PATCH 005/126] target-s390: Fix gdbstub

2012-09-12 Thread Alexander Graf


On 09/12/2012 05:11 PM, Richard Henderson wrote:

On 09/12/2012 06:25 AM, Alexander Graf wrote:

+case S390_PSWM_REGNUM:
+env->psw.mask = tmpl;
+env->cc_op = (tmpl >> 13) & 3;

Are you sure this is correct? I thought gdbstub would just ignore the cc bits.

Well... no it won't ignore the cc bits.


So the CC pseudo-register is never written to?


  But it would appear that I've got
them at the wrong location.  From gdb/s390-tdep.c:

   if (regnum == tdep->cc_regnum)
 {
   enum register_status status;

   status = regcache_raw_read_unsigned (regcache, S390_PSWM_REGNUM, &val);
   if (status == REG_VALID)
 {
   if (register_size (gdbarch, S390_PSWA_REGNUM) == 4)
 val = (val >> 12) & 3;


Oops :)


Alex

[Qemu-devel] Does TCG IR use static single assignment (SSA) form?

2012-09-12 Thread Yichen Yang

Dear all,

Excuse me for asking, does TCG-IR  use static single assignment (SSA) form?

I just wanna know how to translate a register-based bytecode to TCG-IR.

thanks :)

Yichen

Re: [Qemu-devel] Windows VM slow boot

2012-09-12 Thread Mel Gorman

On Wed, Sep 12, 2012 at 11:56:59AM +0100, Richard Davies wrote:
> [ adding linux-mm - previously at http://marc.info/?t=13451150943 ]
> 
> Hi Rik,
> 

I'm not Rik but hi anyway.

> Since qemu-kvm 1.2.0 and Linux 3.6.0-rc5 came out, I thought that I would
> retest with these.
> 

Ok. 3.6.0-rc5 contains [c67fe375: mm: compaction: Abort async compaction
if locks are contended or taking too long] that should have helped mitigate
some of the lock contention problem but not all of it as we'll see later.

> The typical symptom now appears to be that the Windows VMs boot reasonably
> fast,

I see that this is an old-ish bug but I did not read the full history.
Is it now booting faster than 3.5.0 was? I'm asking because I'm
interested to see if commit c67fe375 helped your particular case.

> but then there is high CPU use and load for many minutes afterwards -
> the high CPU use is both for the qemu-kvm processes themselves and also for
> % sys.
> 

Ok, I cannot comment on the userspace portion of things but the kernel
portion still indicates that there is a high percentage of time on what
appears to be lock contention.

> I attach a perf report which seems to show that the high CPU use is in the
> memory manager.
> 

A follow-on from commit c67fe375 was the following patch (author cc'd)
which addresses lock contention in isolate_migratepages_range where your
perf report indicates that we're spending 95% of the time. Would you be
willing to test it please?

---8<---
From: Shaohua Li 
Subject: mm: compaction: check lock contention first before taking lock

isolate_migratepages_range will take zone->lru_lock first and check if the
lock is contented, if yes, it will release the lock.  This isn't
efficient.  If the lock is truly contented, a lock/unlock pair will
increase the lock contention.  We'd better check if the lock is contended
first.  compact_trylock_irqsave perfectly meets the requirement.

Signed-off-by: Shaohua Li 
Acked-by: Mel Gorman 
Acked-by: Minchan Kim 
Signed-off-by: Andrew Morton 
---

 mm/compaction.c |5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff -puN 
mm/compaction.c~mm-compaction-check-lock-contention-first-before-taking-lock 
mm/compaction.c
--- 
a/mm/compaction.c~mm-compaction-check-lock-contention-first-before-taking-lock
+++ a/mm/compaction.c
@@ -349,8 +349,9 @@ isolate_migratepages_range(struct zone *

/* Time to isolate some pages for migration */
cond_resched();
-   spin_lock_irqsave(&zone->lru_lock, flags);
-   locked = true;
+   locked = compact_trylock_irqsave(&zone->lru_lock, &flags, cc);
+   if (!locked)
+   return 0;
for (; low_pfn < end_pfn; low_pfn++) {
struct page *page;

Re: [Qemu-devel] [RFC v3 ATCH 0/5] char: expose CirMemCharDriver and provide QMP interface

2012-09-12 Thread Daniel P. Berrange

On Wed, Sep 12, 2012 at 07:57:21PM +0800, Lei Li wrote:
> This RFC series attempts to convert the MemCharDriver to use a circular
> buffer for input and output, expose it to users by introducing QMP commands
> memchar_write and memchar_read and via the command line like the other
> CharDriverStates.
> 
> Serial ports in qemu always use CharDriverStates as there backends,
> Right now, all of our backends always try to write the data from the
> guest to a socket or file. The concern from OpenStack is that this could
> lead to unbounded disk space usage since they log the serial output.
> For more detail of the background info:
> https://bugs.launchpad.net/nova/+bug/832507

Unbounded disk usage is only relevant if telling QEMU to write directly
to its file backend. If you use a socket backend the mgmt app can provide
whatever policy it desires.

> So we want to use a circular buffer in QEMU instead, and then OpenStack
> can periodically read the buffer in QEMU and log it.

With any circular buffer you obviously have a race condition where
the buffer may overflow before the application can consume the data.
By implementing the circular buffer in QEMU you are imposing a
specific policy for overflow on the applications using QEMU, namely
that data gets overwritten/lost.

If the circular buffering is done outside QEMU, then the application
can implement a variety of policies on overflow. At the very least
they can detect when overflow would occur, and insert a marker to
the effect that there is a log discontinuity. Or they can pause the
VM for a period of time, or reduce its scheduling priority, or any
number of different things.

The further advantage of doing it outside QEMU, is that OpenStack will
work with all existing QEMU/KVM/libvirt versions.

I think most of the discussion in the quoted OpenStack bug is rather
short sighted only focusing on the immediate needs for their current
'get_log_output' API. I don't think this will be satisfactory for
OpenStack in the medium-long term. IMHO OpenStack needs to provide
a more comprensive logging capability than what is does today, and
thus this proposed QEMU feature would have a short lifetime of usefulness
to OpenStack. I've already recommended that OpenStack take advantage
of conserver to setup logging of all VMs, though there were some
issues around that. It is entirely possible for OpenStack to provide
its own logging system to process VM logs into fixed size files, as
well as satisfying the needs of its get_log_output API for circular
buffers.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

Re: [Qemu-devel] Rethinking missed tick catchup

2012-09-12 Thread Gleb Natapov

On Wed, Sep 12, 2012 at 06:06:47PM +0300, Gleb Natapov wrote:
> On Wed, Sep 12, 2012 at 09:44:10AM -0500, Anthony Liguori wrote:
> > Jan Kiszka  writes:
> > 
> > > On 2012-09-12 15:54, Anthony Liguori wrote:
> > >> 
> > >> Hi,
> > >> 
> > >> We've been running into a lot of problems lately with Windows guests and
> > >> I think they all ultimately could be addressed by revisiting the missed
> > >> tick catchup algorithms that we use.  Mike and I spent a while talking
> > >> about it yesterday and I wanted to take the discussion to the list to
> > >> get some additional input.
> > >> 
> > >> Here are the problems we're seeing:
> > >> 
> > >> 1) Rapid reinjection can lead to time moving faster for short bursts of
> > >>time.  We've seen a number of RTC watchdog BSoDs and it's possible
> > >>that at least one cause is reinjection speed.
> > >> 
> > >> 2) When hibernating a host system, the guest gets is essentially paused
> > >>for a long period of time.  This results in a very large tick catchup
> > >>while also resulting in a large skew in guest time.
> > >> 
> > >>I've gotten reports of the tick catchup consuming a lot of CPU time
> > >>from rapid delivery of interrupts (although I haven't reproduced this
> > >>yet).
> > >> 
> > >> 3) Windows appears to have a service that periodically syncs the guest
> > >>time with the hardware clock.  I've been told the resync period is an
> > >>hour.  For large clock skews, this can compete with reinjection
> > >>resulting in a positive skew in time (the guest can be ahead of the
> > >>host).
> > >> 
> > >> I've been thinking about an algorithm like this to address these
> > >> problems:
> > >> 
> > >> A) Limit the number of interrupts that we reinject to the equivalent of
> > >>a small period of wallclock time.  Something like 60 seconds.
> > >> 
> > >> B) In the event of (A), trigger a notification in QEMU.  This is easy
> > >>for the RTC but harder for the in-kernel PIT.  Maybe it's a good time 
> > >> to
> > >>revisit usage of the in-kernel PIT?
> > >> 
> > >> C) On acculumated tick overflow, rely on using a qemu-ga command to
> > >>force a resync of the guest's time to the hardware wallclock time.
> > >> 
> > >> D) Whenever the guest reads the wallclock time from the RTC, reset all
> > >>accumulated ticks.
> > >> 
> > >> In order to do (C), we'll need to plumb qemu-ga through QMP.  Mike and I
> > >> discussed a low-impact way of doing this (having a separate dispatch
> > >> path for guest agent commands) and I'm confident we could do this for
> > >> 1.3.
> > >> 
> > >> This would mean that management tools would need to consume qemu-ga
> > >> through QMP.  Not sure if this is a problem for anyone.
> > >> 
> > >> I'm not sure whether it's worth trying to support this with the
> > >> in-kernel PIT or not either.
> > >
> > > As with our current discussion around fixing the PIC and its impact on
> > > the PIT, we should try on the userspace model first and then check if
> > > the design can be adapted to support in-kernel as well.
> > >
> > > For which guests is the PIT important again? Old Linux kernels? Windows
> > > should be mostly happy with the RTC - or the HPET.
> > 
> > I thought that only 64-bit Win2k8+ used the RTC.
> > 
> > I thought win2k3 and even 32-bit win2k8 still used the PIT.
> > 
> Only WindowsXP non-acpi hal uses PIT. Any other windows uses RTC. In
> other words we do not care about PIT.
> 
Small clarification. They use RTC if HPET is not present. I don't know
at what version Windows started to prefer HPET over RTC.

--
Gleb.

Re: [Qemu-devel] [PATCH 005/126] target-s390: Fix gdbstub

2012-09-12 Thread Richard Henderson

On 09/12/2012 08:54 AM, Alexander Graf wrote:
> So the CC pseudo-register is never written to?

They do handle a write to cc_regnum in s390_pseudo_register_write.
They modify psw.mask as one would expect.


r~

Re: [Qemu-devel] Rethinking missed tick catchup

2012-09-12 Thread Stefan Weil


Am 12.09.2012 15:54, schrieb Anthony Liguori:


Hi,

We've been running into a lot of problems lately with Windows guests and
I think they all ultimately could be addressed by revisiting the missed
tick catchup algorithms that we use.  Mike and I spent a while talking
about it yesterday and I wanted to take the discussion to the list to
get some additional input.

Here are the problems we're seeing:

1) Rapid reinjection can lead to time moving faster for short bursts of
time.  We've seen a number of RTC watchdog BSoDs and it's possible
that at least one cause is reinjection speed.

2) When hibernating a host system, the guest gets is essentially paused
for a long period of time.  This results in a very large tick catchup
while also resulting in a large skew in guest time.

I've gotten reports of the tick catchup consuming a lot of CPU time
from rapid delivery of interrupts (although I haven't reproduced this
yet).

3) Windows appears to have a service that periodically syncs the guest
time with the hardware clock.  I've been told the resync period is an
hour.  For large clock skews, this can compete with reinjection
resulting in a positive skew in time (the guest can be ahead of the
host).


Nearly each modern OS (including Windows) uses NTP
or some other protocol to get the time via a TCP network.

If a guest OS detects a small difference of time, it will usually
accelerate or decelerate the OS clock until the time is
synchronised again.

Large jumps in network time will make the OS time jump, too.
With a little bad luck, QEMU's reinjection will add the
positive skew, no matter whether the guest is Linux or Windows.



I've been thinking about an algorithm like this to address these
problems:

A) Limit the number of interrupts that we reinject to the equivalent of
a small period of wallclock time.  Something like 60 seconds.

B) In the event of (A), trigger a notification in QEMU.  This is easy
for the RTC but harder for the in-kernel PIT.  Maybe it's a good time to
revisit usage of the in-kernel PIT?

C) On acculumated tick overflow, rely on using a qemu-ga command to
force a resync of the guest's time to the hardware wallclock time.

D) Whenever the guest reads the wallclock time from the RTC, reset all
accumulated ticks.


D) makes no sense, see my comment above.

Injection of additional timer interrupts should not be needed
after a hibernation. The guest must handle that situation
by reading either the hw clock (which must be updated
by QEMU when it resumes from hibernate) or by using
another time reference (like NTP, for example).



In order to do (C), we'll need to plumb qemu-ga through QMP.  Mike and I
discussed a low-impact way of doing this (having a separate dispatch
path for guest agent commands) and I'm confident we could do this for
1.3.

This would mean that management tools would need to consume qemu-ga
through QMP.  Not sure if this is a problem for anyone.

I'm not sure whether it's worth trying to support this with the
in-kernel PIT or not either.

Are there other issues with reinjection that people are aware of?  Does
anything seem obviously wrong with the above?

Regards,

Anthony Liguori

Re: [Qemu-devel] Rethinking missed tick catchup

2012-09-12 Thread Gleb Natapov

On Wed, Sep 12, 2012 at 06:27:14PM +0200, Stefan Weil wrote:
> Am 12.09.2012 15:54, schrieb Anthony Liguori:
> >
> >Hi,
> >
> >We've been running into a lot of problems lately with Windows guests and
> >I think they all ultimately could be addressed by revisiting the missed
> >tick catchup algorithms that we use.  Mike and I spent a while talking
> >about it yesterday and I wanted to take the discussion to the list to
> >get some additional input.
> >
> >Here are the problems we're seeing:
> >
> >1) Rapid reinjection can lead to time moving faster for short bursts of
> >time.  We've seen a number of RTC watchdog BSoDs and it's possible
> >that at least one cause is reinjection speed.
> >
> >2) When hibernating a host system, the guest gets is essentially paused
> >for a long period of time.  This results in a very large tick catchup
> >while also resulting in a large skew in guest time.
> >
> >I've gotten reports of the tick catchup consuming a lot of CPU time
> >from rapid delivery of interrupts (although I haven't reproduced this
> >yet).
> >
> >3) Windows appears to have a service that periodically syncs the guest
> >time with the hardware clock.  I've been told the resync period is an
> >hour.  For large clock skews, this can compete with reinjection
> >resulting in a positive skew in time (the guest can be ahead of the
> >host).
> 
> Nearly each modern OS (including Windows) uses NTP
> or some other protocol to get the time via a TCP network.
> 
The drifts we are talking about will take ages for NTP to fix.

> If a guest OS detects a small difference of time, it will usually
> accelerate or decelerate the OS clock until the time is
> synchronised again.
> 
> Large jumps in network time will make the OS time jump, too.
> With a little bad luck, QEMU's reinjection will add the
> positive skew, no matter whether the guest is Linux or Windows.
> 
As far as I know NTP will never make OS clock jump. The purpose of NTP
is to fix time gradually, so apps will not notice. npdate is used to
force clock synchronization, but is should be run manually.

> >
> >I've been thinking about an algorithm like this to address these
> >problems:
> >
> >A) Limit the number of interrupts that we reinject to the equivalent of
> >a small period of wallclock time.  Something like 60 seconds.
> >
> >B) In the event of (A), trigger a notification in QEMU.  This is easy
> >for the RTC but harder for the in-kernel PIT.  Maybe it's a good time to
> >revisit usage of the in-kernel PIT?
> >
> >C) On acculumated tick overflow, rely on using a qemu-ga command to
> >force a resync of the guest's time to the hardware wallclock time.
> >
> >D) Whenever the guest reads the wallclock time from the RTC, reset all
> >accumulated ticks.
> 
> D) makes no sense, see my comment above.
> 
> Injection of additional timer interrupts should not be needed
> after a hibernation. The guest must handle that situation
> by reading either the hw clock (which must be updated
> by QEMU when it resumes from hibernate) or by using
> another time reference (like NTP, for example).
> 
He is talking about host hibernation, not guest.

> >
> >In order to do (C), we'll need to plumb qemu-ga through QMP.  Mike and I
> >discussed a low-impact way of doing this (having a separate dispatch
> >path for guest agent commands) and I'm confident we could do this for
> >1.3.
> >
> >This would mean that management tools would need to consume qemu-ga
> >through QMP.  Not sure if this is a problem for anyone.
> >
> >I'm not sure whether it's worth trying to support this with the
> >in-kernel PIT or not either.
> >
> >Are there other issues with reinjection that people are aware of?  Does
> >anything seem obviously wrong with the above?
> >
> >Regards,
> >
> >Anthony Liguori

--
Gleb.

Re: [Qemu-devel] Windows VM slow boot

2012-09-12 Thread Richard Davies

Hi Mel - thanks for replying to my underhand bcc!

Mel Gorman wrote:
> I see that this is an old-ish bug but I did not read the full history.
> Is it now booting faster than 3.5.0 was? I'm asking because I'm
> interested to see if commit c67fe375 helped your particular case.

Yes, I think 3.6.0-rc5 is already better than 3.5.x but can still be
improved, as discussed.

> A follow-on from commit c67fe375 was the following patch (author cc'd)
> which addresses lock contention in isolate_migratepages_range where your
> perf report indicates that we're spending 95% of the time. Would you be
> willing to test it please?
>
> ---8<---
> From: Shaohua Li 
> Subject: mm: compaction: check lock contention first before taking lock
>
> isolate_migratepages_range will take zone->lru_lock first and check if the
> lock is contented, if yes, it will release the lock.  This isn't
> efficient.  If the lock is truly contented, a lock/unlock pair will
> increase the lock contention.  We'd better check if the lock is contended
> first.  compact_trylock_irqsave perfectly meets the requirement.
>
> Signed-off-by: Shaohua Li 
> Acked-by: Mel Gorman 
> Acked-by: Minchan Kim 
> Signed-off-by: Andrew Morton 
> ---
>
>  mm/compaction.c |5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff -puN 
> mm/compaction.c~mm-compaction-check-lock-contention-first-before-taking-lock 
> mm/compaction.c
> --- 
> a/mm/compaction.c~mm-compaction-check-lock-contention-first-before-taking-lock
> +++ a/mm/compaction.c
> @@ -349,8 +349,9 @@ isolate_migratepages_range(struct zone *
> 
>   /* Time to isolate some pages for migration */
>   cond_resched();
> - spin_lock_irqsave(&zone->lru_lock, flags);
> - locked = true;
> + locked = compact_trylock_irqsave(&zone->lru_lock, &flags, cc);
> + if (!locked)
> + return 0;
>   for (; low_pfn < end_pfn; low_pfn++) {
>   struct page *page;

I have applied and tested again - perf results below.

isolate_migratepages_range is indeed much reduced.

There is now a lot of time in isolate_freepages_block and still quite a lot
of lock contention, although in a different place.


# 
# captured on: Wed Sep 12 16:00:52 2012
# os release : 3.6.0-rc5-elastic+
# perf version : 3.5.2
# arch : x86_64
# nrcpus online : 16
# nrcpus avail : 16
# cpudesc : AMD Opteron(tm) Processor 6128
# cpuid : AuthenticAMD,16,9,1
# total memory : 131973280 kB
# cmdline : /home/root/bin/perf record -g -a 
# event : name = cycles, type = 0, config = 0x0, config1 = 0x0, config2 = 0x0, 
excl_usr = 0, excl_kern = 0, id = { 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 
76, 77, 78, 79, 80 }
# HEADER_CPU_TOPOLOGY info available, use -I to display
# HEADER_NUMA_TOPOLOGY info available, use -I to display
# 
#
# Samples: 1M of event 'cycles'
# Event count (approx.): 560365005583
#
# Overhead  Command Shared Object   
   Symbol
#   ...    
..
#
43.95% qemu-kvm  [kernel.kallsyms] [k] isolate_freepages_block  
 
   |
   --- isolate_freepages_block
  |  
  |--99.99%-- compaction_alloc
  |  migrate_pages
  |  compact_zone
  |  compact_zone_order
  |  try_to_compact_pages
  |  __alloc_pages_direct_compact
  |  __alloc_pages_nodemask
  |  alloc_pages_vma
  |  do_huge_pmd_anonymous_page
  |  handle_mm_fault
  |  __get_user_pages
  |  get_user_page_nowait
  |  hva_to_pfn.isra.17
  |  __gfn_to_pfn
  |  gfn_to_pfn_async
  |  try_async_pf
  |  tdp_page_fault
  |  kvm_mmu_page_fault
  |  pf_interception
  |  handle_exit
  |  kvm_arch_vcpu_ioctl_run
  |  kvm_vcpu_ioctl
  |  do_vfs_ioctl
  |  sys_ioctl
  |  system_call_fastpath
  |  ioctl
  |  |  
  |  |--95.17%-- 0x1010006
  |  |  
  |   --4.83%-- 0x1010002
   --0.01%-- [...]
15.98% qemu-kvm  [kernel.kallsyms] [k] _raw_spin_lock_irqsave   
 
   |
   --- _raw_spin_lock_irqsave

Re: [Qemu-devel] [PATCH v3 01/17] vga: rename pci_vga_init() into pci_std_vga_init()

2012-09-12 Thread Richard Henderson

On 09/11/2012 12:10 PM, Aurelien Jarno wrote:
> This better explains what is this function about. Adjust all callers.
> 
> Cc: Richard Henderson 
> Cc: Alexander Graf 
> Cc: Andreas Färber 
> Cc: David Gibson 
> Cc: Anthony Liguori 
> Acked-by: Blue Swirl 
> Acked-by: Andreas Färber 
> Signed-off-by: Aurelien Jarno 

Acked-by: Richard Henderson 


r~

Re: [Qemu-devel] Does TCG IR use static single assignment (SSA) form?

2012-09-12 Thread Peter Maydell

On 12 September 2012 10:45, (Yichen Yang)楊逸臣
 wrote:
> Excuse me for asking, does TCG-IR  use static single assignment (SSA) form?

No. The TCG IR is documented in tcg/README -- this should be enough to
be able to translate something else into it.

-- PMM

Re: [Qemu-devel] Does TCG IR use static single assignment (SSA) form?

2012-09-12 Thread Richard Henderson

On 09/12/2012 02:45 AM, (Yichen Yang)楊逸臣 wrote:
> Excuse me for asking, does TCG-IR  use static single assignment (SSA) form?

No.


r~

Re: [Qemu-devel] TCG questions

2012-09-12 Thread Xin Tong

On Wed, Sep 12, 2012 at 6:14 AM, Lluís Vilanova  wrote:
> Xin Tong writes:
>
>> i do not know. could be similar. I am doing architecture research. i
>> need traces of memory access for programming running under a full
>> system environment, so i wrote this.
>
>> i do nto seem to be able to access the linked provided from the link
>> you give me though.
>
>> https://projects.gso.ac.upc.edu/projects/qemu-dbi/wiki
>
> Well, if you tried to access it during the last few days, we've been having 
> some
> issues with the server.
>
> It should all work now.
>
> The main idea is to have an API similar in spirit to that of PIN [1]. You can
> have a look at the instrumentation docs [2] for some simple examples.
>
> I had some plans to modify QEMU's address translation mechanism to provide a
> performant mechanism to retrieve physical addresses (traces of virtual 
> addresses
> are already supported), but that will have to wait until I finish some other
> unrelated tasks.
>
>
> [1] http://pintool.org
> [2] 
> https://projects.gso.ac.upc.edu/projects/qemu-dbi/repository/entry/docs/instrumentation.txt#L202

By the way Luis. this is exactly what i am doing. i am writing up a
ISPASS paper for this as well. I am also using PIN to verify the
instrumentation interface.

Xin

>
> Lluis
>
>
>
>> On Tue, Sep 11, 2012 at 6:52 PM, 陳韋任 (Wei-Ren Chen)
>>  wrote:
 I have created a set of instrument API on QEMU. one can write client
 programs that compile into shared library. the shared library is then
 loaded into qemu and extract statistics out of QEMU.
>>>
>>> Instrument API? Same as what Liuis did?
>>>
>>> Regards,
>>> chenwj
>>>
>>> [1] http://lists.gnu.org/archive/html/qemu-devel/2012-09/msg00379.html
>>>
>>> --
>>> Wei-Ren Chen (陳韋任)
>>> Computer Systems Lab, Institute of Information Science,
>>> Academia Sinica, Taiwan (R.O.C.)
>>> Tel:886-2-2788-3799 #1667
>>> Homepage: http://people.cs.nctu.edu.tw/~chenwj
>
>
> --
>  "And it's much the same thing with knowledge, for whenever you learn
>  something new, the whole world becomes that much richer."
>  -- The Princess of Pure Reason, as told by Norton Juster in The Phantom
>  Tollbooth

Re: [Qemu-devel] TCG questions

2012-09-12 Thread Xin Tong

I have the code on http://code.google.com/p/qemu-trace/. I currently
have memory trace, branch trace and some special instructions traces
ready ( unverified though). we should discuss about what is the best
way to do this btw.

Xin


On Wed, Sep 12, 2012 at 10:09 AM, Xin Tong  wrote:
> On Wed, Sep 12, 2012 at 6:14 AM, Lluís Vilanova  wrote:
>> Xin Tong writes:
>>
>>> i do not know. could be similar. I am doing architecture research. i
>>> need traces of memory access for programming running under a full
>>> system environment, so i wrote this.
>>
>>> i do nto seem to be able to access the linked provided from the link
>>> you give me though.
>>
>>> https://projects.gso.ac.upc.edu/projects/qemu-dbi/wiki
>>
>> Well, if you tried to access it during the last few days, we've been having 
>> some
>> issues with the server.
>>
>> It should all work now.
>>
>> The main idea is to have an API similar in spirit to that of PIN [1]. You can
>> have a look at the instrumentation docs [2] for some simple examples.
>>
>> I had some plans to modify QEMU's address translation mechanism to provide a
>> performant mechanism to retrieve physical addresses (traces of virtual 
>> addresses
>> are already supported), but that will have to wait until I finish some other
>> unrelated tasks.
>>
>>
>> [1] http://pintool.org
>> [2] 
>> https://projects.gso.ac.upc.edu/projects/qemu-dbi/repository/entry/docs/instrumentation.txt#L202
>
> By the way Luis. this is exactly what i am doing. i am writing up a
> ISPASS paper for this as well. I am also using PIN to verify the
> instrumentation interface.
>
> Xin
>
>>
>> Lluis
>>
>>
>>
>>> On Tue, Sep 11, 2012 at 6:52 PM, 陳韋任 (Wei-Ren Chen)
>>>  wrote:
> I have created a set of instrument API on QEMU. one can write client
> programs that compile into shared library. the shared library is then
> loaded into qemu and extract statistics out of QEMU.

 Instrument API? Same as what Liuis did?

 Regards,
 chenwj

 [1] http://lists.gnu.org/archive/html/qemu-devel/2012-09/msg00379.html

 --
 Wei-Ren Chen (陳韋任)
 Computer Systems Lab, Institute of Information Science,
 Academia Sinica, Taiwan (R.O.C.)
 Tel:886-2-2788-3799 #1667
 Homepage: http://people.cs.nctu.edu.tw/~chenwj
>>
>>
>> --
>>  "And it's much the same thing with knowledge, for whenever you learn
>>  something new, the whole world becomes that much richer."
>>  -- The Princess of Pure Reason, as told by Norton Juster in The Phantom
>>  Tollbooth

[Qemu-devel] [PATCH] tcg: Fix MAX_OPC_PARAM_IARGS

2012-09-12 Thread Stefan Weil

DEF_HELPER_FLAGS_5 was added some time ago without adjusting
MAX_OPC_PARAM_IARGS.

Fixing the definition becomes more important as QEMU is using
an increasing number of helper functions called with 5 arguments.

Add also a comment to avoid future problems when DEF_HELPER_FLAGS_6
will be added.

Signed-off-by: Stefan Weil 
---

Hi,

I think this patch should be added to the latest stable versions, too.

Please note that this patch breaks compilation with --enable-tcg-interpreter.

TCI code is designed for up to 4 arguments and needs modifications.
The current TCI binaries crash at runtime, so the patch just makes it
obvious that TCI needs to be fixed.

Regards,
Stefan Weil

 def-helper.h |2 ++
 exec-all.h   |2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/def-helper.h b/def-helper.h
index b98ff69..022a9ce 100644
--- a/def-helper.h
+++ b/def-helper.h
@@ -128,6 +128,8 @@
 #define DEF_HELPER_5(name, ret, t1, t2, t3, t4, t5) \
 DEF_HELPER_FLAGS_5(name, 0, ret, t1, t2, t3, t4, t5)
 
+/* MAX_OPC_PARAM_IARGS must be set to n if last entry is DEF_HELPER_FLAGS_n. */
+
 #endif /* DEF_HELPER_H */
 
 #ifndef GEN_HELPER
diff --git a/exec-all.h b/exec-all.h
index ac19c02..8977729 100644
--- a/exec-all.h
+++ b/exec-all.h
@@ -51,7 +51,7 @@ typedef struct TranslationBlock TranslationBlock;
 #else
 #define MAX_OPC_PARAM_PER_ARG 1
 #endif
-#define MAX_OPC_PARAM_IARGS 4
+#define MAX_OPC_PARAM_IARGS 5
 #define MAX_OPC_PARAM_OARGS 1
 #define MAX_OPC_PARAM_ARGS (MAX_OPC_PARAM_IARGS + MAX_OPC_PARAM_OARGS)
 
-- 
1.7.10

Re: [Qemu-devel] [PATCH] tcg: Fix MAX_OPC_PARAM_IARGS

2012-09-12 Thread Richard Henderson

On 09/12/2012 10:18 AM, Stefan Weil wrote:
> DEF_HELPER_FLAGS_5 was added some time ago without adjusting
> MAX_OPC_PARAM_IARGS.
> 
> Fixing the definition becomes more important as QEMU is using
> an increasing number of helper functions called with 5 arguments.
> 
> Add also a comment to avoid future problems when DEF_HELPER_FLAGS_6
> will be added.
> 
> Signed-off-by: Stefan Weil 

Reviewed-by: Richard Henderson 


r~

Re: [Qemu-devel] Rethinking missed tick catchup

2012-09-12 Thread Luiz Capitulino

On Wed, 12 Sep 2012 08:54:26 -0500
Anthony Liguori  wrote:

> 
> Hi,
> 
> We've been running into a lot of problems lately with Windows guests and
> I think they all ultimately could be addressed by revisiting the missed
> tick catchup algorithms that we use.  Mike and I spent a while talking
> about it yesterday and I wanted to take the discussion to the list to
> get some additional input.
> 
> Here are the problems we're seeing:
> 
> 1) Rapid reinjection can lead to time moving faster for short bursts of
>time.  We've seen a number of RTC watchdog BSoDs and it's possible
>that at least one cause is reinjection speed.
> 
> 2) When hibernating a host system, the guest gets is essentially paused
>for a long period of time.  This results in a very large tick catchup
>while also resulting in a large skew in guest time.
> 
>I've gotten reports of the tick catchup consuming a lot of CPU time
>from rapid delivery of interrupts (although I haven't reproduced this
>yet).
> 
> 3) Windows appears to have a service that periodically syncs the guest
>time with the hardware clock.  I've been told the resync period is an
>hour.  For large clock skews, this can compete with reinjection
>resulting in a positive skew in time (the guest can be ahead of the
>host).
> 
> I've been thinking about an algorithm like this to address these
> problems:
> 
> A) Limit the number of interrupts that we reinject to the equivalent of
>a small period of wallclock time.  Something like 60 seconds.
> 
> B) In the event of (A), trigger a notification in QEMU.  This is easy
>for the RTC but harder for the in-kernel PIT.  Maybe it's a good time to
>revisit usage of the in-kernel PIT?
> 
> C) On acculumated tick overflow, rely on using a qemu-ga command to
>force a resync of the guest's time to the hardware wallclock time.
> 
> D) Whenever the guest reads the wallclock time from the RTC, reset all
>accumulated ticks.
> 
> In order to do (C), we'll need to plumb qemu-ga through QMP.  Mike and I
> discussed a low-impact way of doing this (having a separate dispatch
> path for guest agent commands) and I'm confident we could do this for
> 1.3.

Fine with me, but note that we're only two or three commands away from
having the qapi conversion done. So, it's possible that we'll merge this
and re-do it a few weeks later.

> This would mean that management tools would need to consume qemu-ga
> through QMP.  Not sure if this is a problem for anyone.

Shouldn't be a problem I think.

> 
> I'm not sure whether it's worth trying to support this with the
> in-kernel PIT or not either.
> 
> Are there other issues with reinjection that people are aware of?  Does
> anything seem obviously wrong with the above?
> 
> Regards,
> 
> Anthony Liguori
>

Re: [Qemu-devel] Rethinking missed tick catchup

2012-09-12 Thread Stefan Weil


Am 12.09.2012 18:45, schrieb Gleb Natapov:

On Wed, Sep 12, 2012 at 06:27:14PM +0200, Stefan Weil wrote:

Am 12.09.2012 15:54, schrieb Anthony Liguori:

Hi,

We've been running into a lot of problems lately with Windows guests and
I think they all ultimately could be addressed by revisiting the missed
tick catchup algorithms that we use.  Mike and I spent a while talking
about it yesterday and I wanted to take the discussion to the list to
get some additional input.

Here are the problems we're seeing:

1) Rapid reinjection can lead to time moving faster for short bursts of
time.  We've seen a number of RTC watchdog BSoDs and it's possible
that at least one cause is reinjection speed.

2) When hibernating a host system, the guest gets is essentially paused
for a long period of time.  This results in a very large tick catchup
while also resulting in a large skew in guest time.

I've gotten reports of the tick catchup consuming a lot of CPU time
from rapid delivery of interrupts (although I haven't reproduced this
yet).

3) Windows appears to have a service that periodically syncs the guest
time with the hardware clock.  I've been told the resync period is an
hour.  For large clock skews, this can compete with reinjection
resulting in a positive skew in time (the guest can be ahead of the
host).

Nearly each modern OS (including Windows) uses NTP
or some other protocol to get the time via a TCP network.


The drifts we are talking about will take ages for NTP to fix.


If a guest OS detects a small difference of time, it will usually
accelerate or decelerate the OS clock until the time is
synchronised again.

Large jumps in network time will make the OS time jump, too.
With a little bad luck, QEMU's reinjection will add the
positive skew, no matter whether the guest is Linux or Windows.


As far as I know NTP will never make OS clock jump. The purpose of NTP
is to fix time gradually, so apps will not notice. npdate is used to
force clock synchronization, but is should be run manually.


s/npdate/ntpdate. Yes, some Linux distros run it at system start,
and it's also usual to call it every hour (poor man's NTP, uses
less resources).




I've been thinking about an algorithm like this to address these
problems:

A) Limit the number of interrupts that we reinject to the equivalent of
a small period of wallclock time.  Something like 60 seconds.

B) In the event of (A), trigger a notification in QEMU.  This is easy
for the RTC but harder for the in-kernel PIT.  Maybe it's a good time to
revisit usage of the in-kernel PIT?

C) On acculumated tick overflow, rely on using a qemu-ga command to
force a resync of the guest's time to the hardware wallclock time.

D) Whenever the guest reads the wallclock time from the RTC, reset all
accumulated ticks.

D) makes no sense, see my comment above.

Injection of additional timer interrupts should not be needed
after a hibernation. The guest must handle that situation
by reading either the hw clock (which must be updated
by QEMU when it resumes from hibernate) or by using
another time reference (like NTP, for example).


He is talking about host hibernation, not guest.



I also meant host hibernation.

Maybe the host should tell the guest that it is going to
hibernate (ACPI event), then the guest can use its
normal hibernate entry and recovery code, too.

Re: [Qemu-devel] [libvirt] [PATCH] snapshot: fix rollback failure in transaction mode

2012-09-12 Thread Eric Blake

On 09/12/2012 09:22 AM, Guannan Ren wrote:
> After failure of qemu transaction command, the snapshot file still
> be there with non-zero in size. In order to unlink the file, the
> patch removes the file size checking.

Can you give some exact steps to reproduce this, so that I know who is
making the file have non-zero size?  I'm worried that unlinking a
non-empty file is discarding data, so the commit message needs a lot
more details about how we are proving that the only way the file can be
non-zero size is because qemu happened to put data into a previously
empty file prior to the failed 'transaction' attempt.

That is, after re-reading context code just now, I'm fairly confident
that this code can only be reached when qemu supports the 'transaction'
monitor command, and libvirt's --reuse-ext flag was not specified, and
therefore libvirt must have just created the file.  But I want that in
the commit message rather than having to re-read the code every time I
visit this commit in future reads of the git log.  It may also be that
qemu has a bug in that the 'transaction' command is modifying files even
when it fails, so even while this works around the bug, I'm cc'ing Jeff
to see if qemu also needs a bug fix.

> ---
>  src/qemu/qemu_driver.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
> index 8e8e00c..1fedfb8 100644
> --- a/src/qemu/qemu_driver.c
> +++ b/src/qemu/qemu_driver.c
> @@ -10833,7 +10833,7 @@ qemuDomainSnapshotUndoSingleDiskActive(struct 
> qemud_driver *driver,
>  if (virDomainLockDiskDetach(driver->lockManager, vm, disk) < 0)
>  VIR_WARN("Unable to release lock on %s", disk->src);
>  if (need_unlink && stat(disk->src, &st) == 0 &&
> -st.st_size == 0 && S_ISREG(st.st_mode) && unlink(disk->src) < 0)
> +S_ISREG(st.st_mode) && unlink(disk->src) < 0)
>  VIR_WARN("Unable to remove just-created %s", disk->src);
>  
>  /* Update vm in place to match changes.  */
> 

-- 
Eric Blake   ebl...@redhat.com+1-919-301-3266
Libvirt virtualization library http://libvirt.org

signature.asc
Description: OpenPGP digital signature

[Qemu-devel] [PATCH] w64: Fix calls of TCG helper functions with 5 arguments

2012-09-12 Thread Stefan Weil

TCG uses 6 registers for function arguments on 64 bit Linux hosts,
but only 4 registers on W64 hosts.

Commit 2999a0b20074a7e4a58f56572bb1436749368f59 increased the number
of arguments for some important helper functions from 4 to 5
which triggered a bug for W64 hosts: QEMU aborts when executing
helper_lcall_real in the guest's BIOS because function
tcg_target_get_call_iarg_regs_count always returned 6.

As W64 has only 4 registers for arguments, the 5th argument must be
passed on the stack using a correct stack offset.

Signed-off-by: Stefan Weil 
---

Without that patch, QEMU 1.2 is unusable on W64 hosts.
Please apply it to the stable versions.

Thanks,

Stefan W.

 tcg/i386/tcg-target.c |2 +-
 tcg/i386/tcg-target.h |4 
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/tcg/i386/tcg-target.c b/tcg/i386/tcg-target.c
index da17bba..43b5572 100644
--- a/tcg/i386/tcg-target.c
+++ b/tcg/i386/tcg-target.c
@@ -118,7 +118,7 @@ static void patch_reloc(uint8_t *code_ptr, int type,
 static inline int tcg_target_get_call_iarg_regs_count(int flags)
 {
 if (TCG_TARGET_REG_BITS == 64) {
-return 6;
+return ARRAY_SIZE(tcg_target_call_iarg_regs);
 }
 
 return 0;
diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index c3cfe05..87417d0 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -67,7 +67,11 @@ typedef enum {
 /* used for function call generation */
 #define TCG_REG_CALL_STACK TCG_REG_ESP 
 #define TCG_TARGET_STACK_ALIGN 16
+#if defined(_WIN64)
+#define TCG_TARGET_CALL_STACK_OFFSET 32
+#else
 #define TCG_TARGET_CALL_STACK_OFFSET 0
+#endif
 
 /* optional instructions */
 #define TCG_TARGET_HAS_div2_i32 1
-- 
1.7.10

Re: [Qemu-devel] Rethinking missed tick catchup

2012-09-12 Thread Clemens Kolbitsch

> On 2012-09-12 15:54, Anthony Liguori wrote:
>>
>> Hi,
>>
>> We've been running into a lot of problems lately with Windows guests and
>> I think they all ultimately could be addressed by revisiting the missed
>> tick catchup algorithms that we use.  Mike and I spent a while talking
>> about it yesterday and I wanted to take the discussion to the list to
>> get some additional input.
>>
>> Here are the problems we're seeing:
>>
>> 1) Rapid reinjection can lead to time moving faster for short bursts of
>>time.  We've seen a number of RTC watchdog BSoDs and it's possible
>>that at least one cause is reinjection speed.
>>
>> 2) When hibernating a host system, the guest gets is essentially paused
>>for a long period of time.  This results in a very large tick catchup
>>while also resulting in a large skew in guest time.
>>
>>I've gotten reports of the tick catchup consuming a lot of CPU time
>>from rapid delivery of interrupts (although I haven't reproduced this
>>yet).

Guys,

not much that I can contribute to solving the problem, but I have a
bunch of VMs where this happens _every_ time I resume a snapshot (but
without hibernating). In case this could be a connected problem and
you need help testing a patch, I'm more than happy to help.

-Clemens

Re: [Qemu-devel] [PATCH] w64: Fix calls of TCG helper functions with 5 arguments

2012-09-12 Thread Peter Maydell

On 12 September 2012 19:03, Stefan Weil  wrote:
> diff --git a/tcg/i386/tcg-target.c b/tcg/i386/tcg-target.c
> index da17bba..43b5572 100644
> --- a/tcg/i386/tcg-target.c
> +++ b/tcg/i386/tcg-target.c
> @@ -118,7 +118,7 @@ static void patch_reloc(uint8_t *code_ptr, int type,
>  static inline int tcg_target_get_call_iarg_regs_count(int flags)
>  {
>  if (TCG_TARGET_REG_BITS == 64) {
> -return 6;
> +return ARRAY_SIZE(tcg_target_call_iarg_regs);
>  }
>
>  return 0;

Hmm. Why can't we just return the array size in all cases?
Is there something special about 32 bit x86? I checked, and
all our other TCG targets return the same value as the size of
the iarg_regs array (either using ARRAY_SIZE or by just returning
the right number)...

-- PMM

Re: [Qemu-devel] Rethinking missed tick catchup

2012-09-12 Thread Gleb Natapov

On Wed, Sep 12, 2012 at 07:30:08PM +0200, Stefan Weil wrote:
> Am 12.09.2012 18:45, schrieb Gleb Natapov:
> >On Wed, Sep 12, 2012 at 06:27:14PM +0200, Stefan Weil wrote:
> >>Am 12.09.2012 15:54, schrieb Anthony Liguori:
> >>>Hi,
> >>>
> >>>We've been running into a lot of problems lately with Windows guests and
> >>>I think they all ultimately could be addressed by revisiting the missed
> >>>tick catchup algorithms that we use.  Mike and I spent a while talking
> >>>about it yesterday and I wanted to take the discussion to the list to
> >>>get some additional input.
> >>>
> >>>Here are the problems we're seeing:
> >>>
> >>>1) Rapid reinjection can lead to time moving faster for short bursts of
> >>>time.  We've seen a number of RTC watchdog BSoDs and it's possible
> >>>that at least one cause is reinjection speed.
> >>>
> >>>2) When hibernating a host system, the guest gets is essentially paused
> >>>for a long period of time.  This results in a very large tick catchup
> >>>while also resulting in a large skew in guest time.
> >>>
> >>>I've gotten reports of the tick catchup consuming a lot of CPU time
> >>>from rapid delivery of interrupts (although I haven't reproduced this
> >>>yet).
> >>>
> >>>3) Windows appears to have a service that periodically syncs the guest
> >>>time with the hardware clock.  I've been told the resync period is an
> >>>hour.  For large clock skews, this can compete with reinjection
> >>>resulting in a positive skew in time (the guest can be ahead of the
> >>>host).
> >>Nearly each modern OS (including Windows) uses NTP
> >>or some other protocol to get the time via a TCP network.
> >>
> >The drifts we are talking about will take ages for NTP to fix.
> >
> >>If a guest OS detects a small difference of time, it will usually
> >>accelerate or decelerate the OS clock until the time is
> >>synchronised again.
> >>
> >>Large jumps in network time will make the OS time jump, too.
> >>With a little bad luck, QEMU's reinjection will add the
> >>positive skew, no matter whether the guest is Linux or Windows.
> >>
> >As far as I know NTP will never make OS clock jump. The purpose of NTP
> >is to fix time gradually, so apps will not notice. npdate is used to
> >force clock synchronization, but is should be run manually.
> 
> s/npdate/ntpdate. Yes, some Linux distros run it at system start,
Yes, typo.

> and it's also usual to call it every hour (poor man's NTP, uses
> less resources).
> 
> >
> >>>I've been thinking about an algorithm like this to address these
> >>>problems:
> >>>
> >>>A) Limit the number of interrupts that we reinject to the equivalent of
> >>>a small period of wallclock time.  Something like 60 seconds.
> >>>
> >>>B) In the event of (A), trigger a notification in QEMU.  This is easy
> >>>for the RTC but harder for the in-kernel PIT.  Maybe it's a good time 
> >>> to
> >>>revisit usage of the in-kernel PIT?
> >>>
> >>>C) On acculumated tick overflow, rely on using a qemu-ga command to
> >>>force a resync of the guest's time to the hardware wallclock time.
> >>>
> >>>D) Whenever the guest reads the wallclock time from the RTC, reset all
> >>>accumulated ticks.
> >>D) makes no sense, see my comment above.
> >>
> >>Injection of additional timer interrupts should not be needed
> >>after a hibernation. The guest must handle that situation
> >>by reading either the hw clock (which must be updated
> >>by QEMU when it resumes from hibernate) or by using
> >>another time reference (like NTP, for example).
> >>
> >He is talking about host hibernation, not guest.
> >
> 
> I also meant host hibernation.
Than I don't see how guest can handle the situation since it has
no idea that it was stopped. Qemu has not idea about host hibernation
either.

> 
> Maybe the host should tell the guest that it is going to
> hibernate (ACPI event), then the guest can use its
> normal hibernate entry and recovery code, too.
Qemu does not emulate Sleep button, but even if it did guest may ignore
it. AFAIK libvirt migrate VM into a file during host hibernation. While
this does not require guest cooperation it have time keeping issues.

--
Gleb.

Re: [Qemu-devel] Rethinking missed tick catchup

2012-09-12 Thread Anthony Liguori

Gleb Natapov  writes:

> On Wed, Sep 12, 2012 at 08:54:26AM -0500, Anthony Liguori wrote:
>> 
>> Hi,
>> 
>> We've been running into a lot of problems lately with Windows guests and
>> I think they all ultimately could be addressed by revisiting the missed
>> tick catchup algorithms that we use.  Mike and I spent a while talking
>> about it yesterday and I wanted to take the discussion to the list to
>> get some additional input.
>> 
>> Here are the problems we're seeing:
>> 
>> 1) Rapid reinjection can lead to time moving faster for short bursts of
>>time.  We've seen a number of RTC watchdog BSoDs and it's possible
>>that at least one cause is reinjection speed.
>> 
>> 2) When hibernating a host system, the guest gets is essentially paused
>>for a long period of time.  This results in a very large tick catchup
>>while also resulting in a large skew in guest time.
>> 
>>I've gotten reports of the tick catchup consuming a lot of CPU time
>>from rapid delivery of interrupts (although I haven't reproduced this
>>yet).
>> 
>> 3) Windows appears to have a service that periodically syncs the guest
>>time with the hardware clock.  I've been told the resync period is an
>>hour.  For large clock skews, this can compete with reinjection
>>resulting in a positive skew in time (the guest can be ahead of the
>>host).
>> 
>> I've been thinking about an algorithm like this to address these
>> problems:
>> 
>> A) Limit the number of interrupts that we reinject to the equivalent of
>>a small period of wallclock time.  Something like 60 seconds.
>> 
> How this will fix BSOD problem for instance? 60 seconds is long enough
> to cause all the problem you are talking about above. We can make
> amount of accumulated ticks easily configurable though to play with and
> see.

It won't, but the goal of an upper limit is to cap time correction at
something reasonably caused by overcommit, not by suspend/resume.

60 seconds is probably way too long.  Maybe 5 seconds?  We can try
various amounts as you said.

What do you think about slowing down the catchup rate?  I think now we
increase wallclock time by 100-700%.

This is very fast.  I wonder if this makes sense anymore since hr timers
are pretty much ubiquitous.

I think we could probably even just increase wallclock time by as little
as 10-20%.  That should avoid false watchdog alerts but still give us a
chance to inject enough interrupts.

>
>> B) In the event of (A), trigger a notification in QEMU.  This is easy
>>for the RTC but harder for the in-kernel PIT.  Maybe it's a good time to
>>revisit usage of the in-kernel PIT?
>> 
> PIT does not matter for Windows guests.
>
>> C) On acculumated tick overflow, rely on using a qemu-ga command to
>>force a resync of the guest's time to the hardware wallclock time.
>> 
> Needs guest cooperation.

Yes, hence qemu-ga.  But is there any other choice?  Hibernation can
cause us to miss an unbounded number of ticks.   Days worth of time.  It
seems unreasonable to gradually catch up that much time.

>> D) Whenever the guest reads the wallclock time from the RTC, reset all
>>accumulated ticks.
>>
>> In order to do (C), we'll need to plumb qemu-ga through QMP.  Mike and I
>> discussed a low-impact way of doing this (having a separate dispatch
>> path for guest agent commands) and I'm confident we could do this for
>> 1.3.
>> 
>> This would mean that management tools would need to consume qemu-ga
>> through QMP.  Not sure if this is a problem for anyone.
>> 
>> I'm not sure whether it's worth trying to support this with the
>> in-kernel PIT or not either.
>> 
>> Are there other issues with reinjection that people are aware of?  Does
>> anything seem obviously wrong with the above?
>> 
> It looks like you are trying to solve only pathologically big timedrift
> problems. Those do not happen normally.

They do if you hibernate your laptop.

Regards,

Anthony Liguori

>
> --
>   Gleb.

Re: [Qemu-devel] [PATCH v2 3/4] target-i386: Allow changing of Hypervisor CPUIDs.

2012-09-12 Thread Marcelo Tosatti


The problem with integrating this is that it has little or 
no assurance from documentation. The Linux kernel source is a good
source, then say "accordingly to VMWare guest support code in version xyz"
in the changelog.

Also extracting this information in a text file (or comment in the code)
would be better than just adding code.

On Tue, Sep 11, 2012 at 10:07:46AM -0400, Don Slutz wrote:
> This is primarily done so that the guest will think it is running
> under vmware when hypervisor-vendor=vmware is specified as a
> property of a cpu.
> 
> Signed-off-by: Don Slutz 
> ---
>  target-i386/cpu.c |  214 
> +
>  target-i386/cpu.h |   21 +
>  target-i386/kvm.c |   33 +++--
>  3 files changed, 262 insertions(+), 6 deletions(-)
> 
> diff --git a/target-i386/cpu.c b/target-i386/cpu.c
> index 5f9866a..9f1f390 100644
> --- a/target-i386/cpu.c
> +++ b/target-i386/cpu.c
> @@ -1135,6 +1135,36 @@ static void x86_cpuid_set_model_id(Object *obj, const 
> char *model_id,
>  }
>  }
>  
> +static void x86_cpuid_set_vmware_extra(Object *obj)
> +{
> +X86CPU *cpu = X86_CPU(obj);
> +
> +if ((cpu->env.tsc_khz != 0) &&
> +(cpu->env.cpuid_hv_level == CPUID_HV_LEVEL_VMARE_4) &&
> +(cpu->env.cpuid_hv_vendor1 == CPUID_HV_VENDOR_VMWARE_1) &&
> +(cpu->env.cpuid_hv_vendor2 == CPUID_HV_VENDOR_VMWARE_2) &&
> +(cpu->env.cpuid_hv_vendor3 == CPUID_HV_VENDOR_VMWARE_3)) {
> +const uint32_t apic_khz = 100L;
> +
> +/*
> + * From article.gmane.org/gmane.comp.emulators.kvm.devel/22643
> + *
> + *Leaf 0x4010, Timing Information.
> + *
> + *VMware has defined the first generic leaf to provide timing
> + *information.  This leaf returns the current TSC frequency and
> + *current Bus frequency in kHz.
> + *
> + *# EAX: (Virtual) TSC frequency in kHz.
> + *# EBX: (Virtual) Bus (local apic timer) frequency in kHz.
> + *# ECX, EDX: RESERVED (Per above, reserved fields are set to 
> zero).
> + */
> +cpu->env.cpuid_hv_extra = 0x4010;
> +cpu->env.cpuid_hv_extra_a = (uint32_t)cpu->env.tsc_khz;
> +cpu->env.cpuid_hv_extra_b = apic_khz;
> +}
> +}

What happens in case you migrate the vmware guest to a host
with different frequency? How is that transmitted to the
vmware-guest-running-on-kvm ? Or is migration not supported?

> +static void x86_cpuid_set_hv_level(Object *obj, Visitor *v, void *opaque,
> +const char *name, Error **errp)
> +{
> +X86CPU *cpu = X86_CPU(obj);
> +uint32_t value;
> +
> +visit_type_uint32(v, &value, name, errp);
> +if (error_is_set(errp)) {
> +return;
> +}
> +
> +if ((value != 0) && (value < 0x4000)) {
> +value += 0x4000;
> +}
> +cpu->env.cpuid_hv_level = value;
> +}
> +
> +static char *x86_cpuid_get_hv_vendor(Object *obj, Error **errp)
> +{
> +X86CPU *cpu = X86_CPU(obj);
> +CPUX86State *env = &cpu->env;
> +char *value;
> +int i;
> +
> +value = (char *)g_malloc(CPUID_VENDOR_SZ + 1);
> +for (i = 0; i < 4; i++) {
> +value[i + 0] = env->cpuid_hv_vendor1 >> (8 * i);
> +value[i + 4] = env->cpuid_hv_vendor2 >> (8 * i);
> +value[i + 8] = env->cpuid_hv_vendor3 >> (8 * i);
> +}
> +value[CPUID_VENDOR_SZ] = '\0';
> +
> +/* Convert known names */
> +if (!strcmp(value, CPUID_HV_VENDOR_VMWARE)) {
> +if (env->cpuid_hv_level == CPUID_HV_LEVEL_VMARE_4) {
> +pstrcpy(value, sizeof(value), "vmware4");
> +} else if (env->cpuid_hv_level == CPUID_HV_LEVEL_VMARE_3) {
> +pstrcpy(value, sizeof(value), "vmware3");
> +}
> +} else if (!strcmp(value, CPUID_HV_VENDOR_XEN) &&
> +   env->cpuid_hv_level == CPUID_HV_LEVEL_XEN) {
> +pstrcpy(value, sizeof(value), "xen");
> +} else if (!strcmp(value, CPUID_HV_VENDOR_KVM) &&
> +   env->cpuid_hv_level == 0) {
> +pstrcpy(value, sizeof(value), "kvm");
> +}
> +return value;
> +}
> +
> +static void x86_cpuid_set_hv_vendor(Object *obj, const char *value,
> + Error **errp)
> +{
> +X86CPU *cpu = X86_CPU(obj);
> +CPUX86State *env = &cpu->env;
> +int i;
> +char adj_value[CPUID_VENDOR_SZ + 1];
> +
> +memset(adj_value, 0, sizeof(adj_value));
> +
> +/* Convert known names */
> +if (!strcmp(value, "vmware") || !strcmp(value, "vmware4")) {
> +if (env->cpuid_hv_level == 0) {
> +env->cpuid_hv_level = CPUID_HV_LEVEL_VMARE_4;
> +}
> +pstrcpy(adj_value, sizeof(adj_value), CPUID_HV_VENDOR_VMWARE);
> +} else if (!strcmp(value, "vmware3")) {
> +if (env->cpuid_hv_level == 0) {
> +env->cpuid_hv_level = CPUID_HV_LEVEL_VMARE_3;
> +}
> +pstrcpy(adj_value, sizeof(adj_value), CPUID_HV_VE

Re: [Qemu-devel] [libvirt] [PATCH] snapshot: fix rollback failure in transaction mode

2012-09-12 Thread Jeff Cody

On 09/12/2012 01:47 PM, Eric Blake wrote:
> On 09/12/2012 09:22 AM, Guannan Ren wrote:
>> After failure of qemu transaction command, the snapshot file still
>> be there with non-zero in size. In order to unlink the file, the
>> patch removes the file size checking.
> 
> Can you give some exact steps to reproduce this, so that I know who is
> making the file have non-zero size?  I'm worried that unlinking a
> non-empty file is discarding data, so the commit message needs a lot
> more details about how we are proving that the only way the file can be
> non-zero size is because qemu happened to put data into a previously
> empty file prior to the failed 'transaction' attempt.
> 
> That is, after re-reading context code just now, I'm fairly confident
> that this code can only be reached when qemu supports the 'transaction'
> monitor command, and libvirt's --reuse-ext flag was not specified, and
> therefore libvirt must have just created the file.  But I want that in
> the commit message rather than having to re-read the code every time I
> visit this commit in future reads of the git log.  It may also be that
> qemu has a bug in that the 'transaction' command is modifying files even
> when it fails, so even while this works around the bug, I'm cc'ing Jeff
> to see if qemu also needs a bug fix.

If QEMU creates the snapshot file, upon transaction failure it is
possible to have a newly created image file left, depending on when the
failure occurs.  The running QEMU instance, however, will not be
affected.

For instance, if we are performing qcow2 snapshots on 2 drives using the
transaction command (e.g. drive0 and drive1), we will:

1. create qcow2 image file for drive0 new active layer
2. create qcow2 image file for drive1 new active layer
3. If 1 & 2 were successful, then we modify the live image chain
   structure in memory to use the newly created files.  Otherwise, we
   abandon the change, notify libvirt of the error, and leave any newly
   created files intact.

That means, on a snapshot failure, QEMU's live running operation will
not be affected, but the management software (libvirt) should clean up
any resulting image files, if appropriate.

It sounds like you expect QEMU to unlink any of the newly created
snapshot files on failure - is that an accurate statement?

> 
>> ---
>>  src/qemu/qemu_driver.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
>> index 8e8e00c..1fedfb8 100644
>> --- a/src/qemu/qemu_driver.c
>> +++ b/src/qemu/qemu_driver.c
>> @@ -10833,7 +10833,7 @@ qemuDomainSnapshotUndoSingleDiskActive(struct 
>> qemud_driver *driver,
>>  if (virDomainLockDiskDetach(driver->lockManager, vm, disk) < 0)
>>  VIR_WARN("Unable to release lock on %s", disk->src);
>>  if (need_unlink && stat(disk->src, &st) == 0 &&
>> -st.st_size == 0 && S_ISREG(st.st_mode) && unlink(disk->src) < 0)
>> +S_ISREG(st.st_mode) && unlink(disk->src) < 0)
>>  VIR_WARN("Unable to remove just-created %s", disk->src);
>>  
>>  /* Update vm in place to match changes.  */
>>
>

Re: [Qemu-devel] [PATCH] socket: don't attempt to reconnect a TCP socket in server mode

2012-09-12 Thread Richard W.M. Jones

On Wed, Sep 05, 2012 at 02:01:36PM -0500, Anthony Liguori wrote:
> Commit c3767ed0eb5d0bb25fe409ae5dec06e3411ff1b6 introduced a possible SEGV 
> when
> using a socket chardev with server=on because it assumes that all TCP sockets
> are in client mode.
> 
> This patch adds a check to only reconnect when in client mode.
> 
> Cc: Lei Li 
> Reported-by: Michael Roth 
> Signed-off-by: Anthony Liguori 
> ---
>  qemu-char.c |4 +++-
>  1 files changed, 3 insertions(+), 1 deletions(-)
> 
> diff --git a/qemu-char.c b/qemu-char.c
> index 398baf1..767da93 100644
> --- a/qemu-char.c
> +++ b/qemu-char.c
> @@ -2148,10 +2148,12 @@ static int tcp_chr_write(CharDriverState *chr, const 
> uint8_t *buf, int len)
>  TCPCharDriver *s = chr->opaque;
>  if (s->connected) {
>  return send_all(s->fd, buf, len);
> -} else {
> +} else if (s->listen_fd == -1) {
>  /* (Re-)connect for unconnected writing */
>  tcp_chr_connect(chr);
>  return 0;
> +} else {
> +return len;
>  }
>  }

Hi Anthony,

I just came around this patch when I was trying to fix this
bug: https://bugzilla.redhat.com/show_bug.cgi?id=853408
qemu segfaults when trying to write to a serial socket which
is *not* a server socket and has been closed by the other end.

Unfortunately your patch above does not fix it.  Only a
complete revert of c3767ed0eb5d0 fixes it.

I don't understand the purpose of c3767ed0eb5d0 at all.  It
seems to set the s->connected flag and carries on regardless,
happily calling write (-1, ...), which is completely broken.

The other end closed the socket.  There's no one listening on the
other end, and setting the s->connected flag will not help that.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming blog: http://rwmj.wordpress.com
Fedora now supports 80 OCaml packages (the OPEN alternative to F#)
http://cocan.org/getting_started_with_ocaml_on_red_hat_and_fedora

Re: [Qemu-devel] [PATCH] w64: Fix calls of TCG helper functions with 5 arguments

2012-09-12 Thread Aurelien Jarno

On Wed, Sep 12, 2012 at 07:12:47PM +0100, Peter Maydell wrote:
> On 12 September 2012 19:03, Stefan Weil  wrote:
> > diff --git a/tcg/i386/tcg-target.c b/tcg/i386/tcg-target.c
> > index da17bba..43b5572 100644
> > --- a/tcg/i386/tcg-target.c
> > +++ b/tcg/i386/tcg-target.c
> > @@ -118,7 +118,7 @@ static void patch_reloc(uint8_t *code_ptr, int type,
> >  static inline int tcg_target_get_call_iarg_regs_count(int flags)
> >  {
> >  if (TCG_TARGET_REG_BITS == 64) {
> > -return 6;
> > +return ARRAY_SIZE(tcg_target_call_iarg_regs);
> >  }
> >
> >  return 0;
> 
> Hmm. Why can't we just return the array size in all cases?
> Is there something special about 32 bit x86? I checked, and
> all our other TCG targets return the same value as the size of
> the iarg_regs array (either using ARRAY_SIZE or by just returning
> the right number)...
> 

On 32-bit x86, all arguments are now being passed on the stack, that's
why the function returns 0. On the other hand when the change has been 
done, the registers haven't been removed from tcg_target_call_iarg_regs.

I think this patch is fine enough for 1.2, but a better patch is needed
for master.


-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH] socket: don't attempt to reconnect a TCP socket in server mode

2012-09-12 Thread Anthony Liguori

"Richard W.M. Jones"  writes:

> On Wed, Sep 05, 2012 at 02:01:36PM -0500, Anthony Liguori wrote:
>> Commit c3767ed0eb5d0bb25fe409ae5dec06e3411ff1b6 introduced a possible SEGV 
>> when
>> using a socket chardev with server=on because it assumes that all TCP sockets
>> are in client mode.
>> 
>> This patch adds a check to only reconnect when in client mode.
>> 
>> Cc: Lei Li 
>> Reported-by: Michael Roth 
>> Signed-off-by: Anthony Liguori 
>> ---
>>  qemu-char.c |4 +++-
>>  1 files changed, 3 insertions(+), 1 deletions(-)
>> 
>> diff --git a/qemu-char.c b/qemu-char.c
>> index 398baf1..767da93 100644
>> --- a/qemu-char.c
>> +++ b/qemu-char.c
>> @@ -2148,10 +2148,12 @@ static int tcp_chr_write(CharDriverState *chr, const 
>> uint8_t *buf, int len)
>>  TCPCharDriver *s = chr->opaque;
>>  if (s->connected) {
>>  return send_all(s->fd, buf, len);
>> -} else {
>> +} else if (s->listen_fd == -1) {
>>  /* (Re-)connect for unconnected writing */
>>  tcp_chr_connect(chr);
>>  return 0;
>> +} else {
>> +return len;
>>  }
>>  }
>
> Hi Anthony,
>
> I just came around this patch when I was trying to fix this
> bug: https://bugzilla.redhat.com/show_bug.cgi?id=853408
> qemu segfaults when trying to write to a serial socket which
> is *not* a server socket and has been closed by the other end.
>
> Unfortunately your patch above does not fix it.  Only a
> complete revert of c3767ed0eb5d0 fixes it.
>
> I don't understand the purpose of c3767ed0eb5d0 at all.  It
> seems to set the s->connected flag and carries on regardless,
> happily calling write (-1, ...), which is completely broken.
>
> The other end closed the socket.  There's no one listening on the
> other end, and setting the s->connected flag will not help that.

You're 100% correct.  I was only attempting to fix the server SEGV, I
didn't notice that client was hopelessly broken too.  Will send a patch
reverting both commits.

Regards,

Anthony Liguori

>
> Rich.
>
> -- 
> Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
> Read my programming blog: http://rwmj.wordpress.com
> Fedora now supports 80 OCaml packages (the OPEN alternative to F#)
> http://cocan.org/getting_started_with_ocaml_on_red_hat_and_fedora

Re: [Qemu-devel] [PATCH] w64: Fix calls of TCG helper functions with 5 arguments

2012-09-12 Thread Stefan Weil


Am 12.09.2012 21:14, schrieb Aurelien Jarno:

On Wed, Sep 12, 2012 at 07:12:47PM +0100, Peter Maydell wrote:

On 12 September 2012 19:03, Stefan Weil  wrote:

diff --git a/tcg/i386/tcg-target.c b/tcg/i386/tcg-target.c
index da17bba..43b5572 100644
--- a/tcg/i386/tcg-target.c
+++ b/tcg/i386/tcg-target.c
@@ -118,7 +118,7 @@ static void patch_reloc(uint8_t *code_ptr, int type,
  static inline int tcg_target_get_call_iarg_regs_count(int flags)
  {
  if (TCG_TARGET_REG_BITS == 64) {
-return 6;
+return ARRAY_SIZE(tcg_target_call_iarg_regs);
  }

  return 0;


Hmm. Why can't we just return the array size in all cases?
Is there something special about 32 bit x86? I checked, and
all our other TCG targets return the same value as the size of
the iarg_regs array (either using ARRAY_SIZE or by just returning
the right number)...



On 32-bit x86, all arguments are now being passed on the stack, that's
why the function returns 0. On the other hand when the change has been
done, the registers haven't been removed from tcg_target_call_iarg_regs.

I think this patch is fine enough for 1.2, but a better patch is needed
for master.


I noticed that Blue switched from register arguments to
arguments on the stack, but don't know the reason for that
change.

Maybe 32 bit x86 can also use a mixture of register / stack
arguments. This needs more testing and is the main reason
why I did not change tcg_target_call_iarg_regs for 32 bit
and return ARRAY_SIZE for both 32 and 64 bit.

I'd prefer to get the patch in master soon because it is
a minimalistic change which fixes the now unusable
64 bit mode on Windows. An additional patch can still
be applied on top.

Of course any better patch which also fixes 64 bit Windows
and which comes soon would also be very acceptable.

Regards

Stefan

Re: [Qemu-devel] Rethinking missed tick catchup

2012-09-12 Thread Stefan Weil


Am 12.09.2012 20:13, schrieb Gleb Natapov:

On Wed, Sep 12, 2012 at 07:30:08PM +0200, Stefan Weil wrote:

I also meant host hibernation.

Than I don't see how guest can handle the situation since it has
no idea that it was stopped. Qemu has not idea about host hibernation
either.


The guest can compare its internal timers (which stop
when the host hibernates) with external time references
(NTP, hw clock or any other clock reference).

Re: [Qemu-devel] [PATCH] w64: Fix calls of TCG helper functions with 5 arguments

2012-09-12 Thread Stefan Weil


Am 12.09.2012 21:14, schrieb Aurelien Jarno:

On Wed, Sep 12, 2012 at 07:12:47PM +0100, Peter Maydell wrote:

On 12 September 2012 19:03, Stefan Weil  wrote:

diff --git a/tcg/i386/tcg-target.c b/tcg/i386/tcg-target.c
index da17bba..43b5572 100644
--- a/tcg/i386/tcg-target.c
+++ b/tcg/i386/tcg-target.c
@@ -118,7 +118,7 @@ static void patch_reloc(uint8_t *code_ptr, int type,
  static inline int tcg_target_get_call_iarg_regs_count(int flags)
  {
  if (TCG_TARGET_REG_BITS == 64) {
-return 6;
+return ARRAY_SIZE(tcg_target_call_iarg_regs);
  }

  return 0;


Hmm. Why can't we just return the array size in all cases?
Is there something special about 32 bit x86? I checked, and
all our other TCG targets return the same value as the size of
the iarg_regs array (either using ARRAY_SIZE or by just returning
the right number)...



On 32-bit x86, all arguments are now being passed on the stack, that's
why the function returns 0. On the other hand when the change has been
done, the registers haven't been removed from tcg_target_call_iarg_regs.

I think this patch is fine enough for 1.2, but a better patch is needed
for master.


As soon as the special case x86 with 32 bit is fixed or eliminated,
it should be possible that all TCG targets share the same code for
tcg_target_get_call_iarg_regs_count. That function could be
removed from the target specific implementations and moved
to tcg.c.

Re: [Qemu-devel] Rethinking missed tick catchup

2012-09-12 Thread Michael Roth

On Wed, Sep 12, 2012 at 07:30:08PM +0200, Stefan Weil wrote:
> Am 12.09.2012 18:45, schrieb Gleb Natapov:
> >On Wed, Sep 12, 2012 at 06:27:14PM +0200, Stefan Weil wrote:
> >>Am 12.09.2012 15:54, schrieb Anthony Liguori:
> >>>Hi,
> >>>
> >>>We've been running into a lot of problems lately with Windows guests and
> >>>I think they all ultimately could be addressed by revisiting the missed
> >>>tick catchup algorithms that we use.  Mike and I spent a while talking
> >>>about it yesterday and I wanted to take the discussion to the list to
> >>>get some additional input.
> >>>
> >>>Here are the problems we're seeing:
> >>>
> >>>1) Rapid reinjection can lead to time moving faster for short bursts of
> >>>time.  We've seen a number of RTC watchdog BSoDs and it's possible
> >>>that at least one cause is reinjection speed.
> >>>
> >>>2) When hibernating a host system, the guest gets is essentially paused
> >>>for a long period of time.  This results in a very large tick catchup
> >>>while also resulting in a large skew in guest time.
> >>>
> >>>I've gotten reports of the tick catchup consuming a lot of CPU time
> >>>from rapid delivery of interrupts (although I haven't reproduced this
> >>>yet).
> >>>
> >>>3) Windows appears to have a service that periodically syncs the guest
> >>>time with the hardware clock.  I've been told the resync period is an
> >>>hour.  For large clock skews, this can compete with reinjection
> >>>resulting in a positive skew in time (the guest can be ahead of the
> >>>host).
> >>Nearly each modern OS (including Windows) uses NTP
> >>or some other protocol to get the time via a TCP network.
> >>
> >The drifts we are talking about will take ages for NTP to fix.
> >
> >>If a guest OS detects a small difference of time, it will usually
> >>accelerate or decelerate the OS clock until the time is
> >>synchronised again.
> >>
> >>Large jumps in network time will make the OS time jump, too.
> >>With a little bad luck, QEMU's reinjection will add the
> >>positive skew, no matter whether the guest is Linux or Windows.
> >>
> >As far as I know NTP will never make OS clock jump. The purpose of NTP
> >is to fix time gradually, so apps will not notice. npdate is used to
> >force clock synchronization, but is should be run manually.
> 
> s/npdate/ntpdate. Yes, some Linux distros run it at system start,
> and it's also usual to call it every hour (poor man's NTP, uses
> less resources).

Windows at least seems to generally default to a max correction of += 15
hours using this approach. The relevant registry values are listed here:

http://support.microsoft.com/kb/816042#method4 (under More Information)

On my Win7 instance I have:

MaxPosPhaseCorrection: 54000 (15 hours)
MaxNegPhaseCorrection: 54000 (15 hours)

So there are definitely situations where guests won't correct themselves
even with NTP or ntpdate-like services running.

Also:

MaxAllowedPhaseOffset: 1 (1 second)

So Windows won't attempt to "catch-up" via increased tickrate if the
delta is greater than 1 second, and will instead try to reset the clock
directly. Which is basically the policy we're looking to implement,
except from the host-side.


> 
> >
> >>>I've been thinking about an algorithm like this to address these
> >>>problems:
> >>>
> >>>A) Limit the number of interrupts that we reinject to the equivalent of
> >>>a small period of wallclock time.  Something like 60 seconds.
> >>>
> >>>B) In the event of (A), trigger a notification in QEMU.  This is easy
> >>>for the RTC but harder for the in-kernel PIT.  Maybe it's a good time 
> >>> to
> >>>revisit usage of the in-kernel PIT?
> >>>
> >>>C) On acculumated tick overflow, rely on using a qemu-ga command to
> >>>force a resync of the guest's time to the hardware wallclock time.
> >>>
> >>>D) Whenever the guest reads the wallclock time from the RTC, reset all
> >>>accumulated ticks.
> >>D) makes no sense, see my comment above.
> >>
> >>Injection of additional timer interrupts should not be needed
> >>after a hibernation. The guest must handle that situation
> >>by reading either the hw clock (which must be updated
> >>by QEMU when it resumes from hibernate) or by using
> >>another time reference (like NTP, for example).
> >>
> >He is talking about host hibernation, not guest.
> >
> 
> I also meant host hibernation.
> 
> Maybe the host should tell the guest that it is going to
> hibernate (ACPI event), then the guest can use its
> normal hibernate entry and recovery code, too.
> 

I think do that would be useful either way, but aren't there other
scenarios where big time jumps can occur? What about live migration?
Presumably we'd complete within the 15 hour limit above, but for other
operating systems or particular configurations thereof we might still
fall outside the threshold they're willing to correct for. At least with
an approach like this we can clearly define the requirements for proper
time-keeping.

Re: [Qemu-devel] [libvirt] [PATCH] snapshot: fix rollback failure in transaction mode

2012-09-12 Thread Eric Blake

On 09/12/2012 12:33 PM, Jeff Cody wrote:

> If QEMU creates the snapshot file, upon transaction failure it is
> possible to have a newly created image file left, depending on when the
> failure occurs.  The running QEMU instance, however, will not be
> affected.
> 
> For instance, if we are performing qcow2 snapshots on 2 drives using the
> transaction command (e.g. drive0 and drive1), we will:
> 
> 1. create qcow2 image file for drive0 new active layer
> 2. create qcow2 image file for drive1 new active layer
> 3. If 1 & 2 were successful, then we modify the live image chain
>structure in memory to use the newly created files.  Otherwise, we
>abandon the change, notify libvirt of the error, and leave any newly
>created files intact.
> 
> That means, on a snapshot failure, QEMU's live running operation will
> not be affected, but the management software (libvirt) should clean up
> any resulting image files, if appropriate.
> 
> It sounds like you expect QEMU to unlink any of the newly created
> snapshot files on failure - is that an accurate statement?

A non-empty file is being left behind by the transaction command, and it
seems like whoever did the successful open(O_CREAT) of that file should
then be tasked with the unlink() of the same file on the failure path.
But it's not quite as simple as having qemu unlink() the file.
Remember, with libvirt, we have to pre-create a 0-byte file, in order to
set appropriate SELinux labels, prior to telling qemu to make the
snapshot.  Remember, qemu is doing open(O_CREAT) but not
open(O_CREAT|O_EXCL), and has no idea whether it created a new file or
is merely reusing (and truncating) an existing file, so in a way, I'm
arguing that libvirt should be doing the unlink() if it handed a
pre-created file into qemu.

But even if unlink() on 'transaction' failure in qemu is asking too
much, it might be nice of qemu to ftruncate() the file back to 0 bytes
to return it to the state it had pre-'transaction'; any operation that
claims to have full rollback, but leaves altered state behind (even if
the altered state doesn't affect qemu operation), just seems odd.  And
while libvirt always hands in a pre-created empty file, there's still
the question of how qemu should behave when operated manually without
SELinux in the way and without libvirt driving things, where an unlink()
in qemu may actually make sense.

At any rate, you are confirming that qemu 1.1 leaves a non-empty file
behind, so even if we change qemu 1.3 to truncate the file back to 0
bytes on failure, libvirt still has to cope with the older behavior.  So
it's now up to the qemu folks to decide whether this is a bug worth
fixing, or merely an annoyance worth documenting, or even something
worth completely ignoring and throwing back at libvirt.  On libvirt's
side, Guannan's patch looks correct, although it needs better commentary
in the commit message and/or code before being applied.

-- 
Eric Blake   ebl...@redhat.com+1-919-301-3266
Libvirt virtualization library http://libvirt.org

signature.asc
Description: OpenPGP digital signature

1 2 >

1 - 100 of 149 matches

Mail list logo